Format string and dictionary options are incompatible #124

Open
sebastic opened this issue Nov 29, 2021 · 1 comment

sebastic commented Nov 29, 2021

Describe the bug
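stetl fails to load a configuration when an option has a dictionary value and config arguments are passed (e.g. with -a). The argument substitution in stetl/etl.py calls str.format() on the whole config text, so the braces of the dictionary literal are parsed as a replacement field.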

Example config from: https://github.com/geopython/stetl/blob/master/examples/basics/11_formatconvert/etl.cfg#L46

# The GML must be a simple features collection
[convert_to_geojson]
class = stetl.filters.formatconverter.FormatConverter
input_format = etree_doc
output_format = geojson_collection
converter_args = {
    'root_tag': 'FeatureCollection',
    'feature_tag': 'featureMember',
    'feature_id_attr': 'fid'
    }

To Reproduce

$ PYTHONPATH=. python3 bin/stetl -c examples/basics/11_formatconvert/etl.cfg -a foo=bar
2021-11-29 14:49:25,134 util INFO Found lxml.etree, native XML parsing, fabulous!
2021-11-29 14:49:25,188 util INFO Found GDAL/OGR Python bindings, super!!
2021-11-29 14:49:25,190 main INFO Stetl version = 2.1.dev0
2021-11-29 14:49:25,191 ETL INFO INIT - Stetl version is 2.1.dev0
2021-11-29 14:49:25,191 ETL INFO Config/working dir = /home/bas/git/nlextract/nlextract/externals/stetl/examples/basics/11_formatconvert
2021-11-29 14:49:25,191 ETL INFO Reading config_file = examples/basics/11_formatconvert/etl.cfg
2021-11-29 14:49:25,191 ETL INFO Substituting 0 args in config file from args_dict: []
2021-11-29 14:49:25,191 ETL ERROR Error substituting config arguments: err="\n    'root_tag'"
Traceback (most recent call last):
  File "/home/bas/git/nlextract/nlextract/externals/stetl/bin/stetl", line 43, in <module>
    main()
  File "/home/bas/git/nlextract/nlextract/externals/stetl/bin/stetl", line 35, in main
    etl = ETL(vars(args), args.config_args)
  File "/home/bas/git/nlextract/nlextract/externals/stetl/stetl/etl.py", line 97, in __init__
    raise e
  File "/home/bas/git/nlextract/nlextract/externals/stetl/stetl/etl.py", line 91, in __init__
    config_str = config_str.format(**args_dict)
KeyError: "\n    'root_tag'"
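The failure can be reproduced without Stetl. The following is a minimal sketch, assuming only what the traceback shows (the whole config text is passed through `str.format(**args_dict)`); the `config_str` below is a shortened stand-in for etl.cfg, not the real file.

```python
# Shortened stand-in for the config text; only the braces matter here.
config_str = """[convert_to_geojson]
converter_args = {
    'root_tag': 'FeatureCollection'
    }
"""

args_dict = {"foo": "bar"}

caught = None
try:
    # str.format() reads everything between "{" and ":" as a field name,
    # so it looks up the key "\n    'root_tag'" in args_dict.
    config_str.format(**args_dict)
except KeyError as err:
    caught = err
    print("KeyError:", err)
```

The field name that `str.format()` extracts includes the newline and indentation after the opening brace, which is why the key in the error message looks so odd.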

Expected Behavior
The configuration is loaded successfully, including argument substitution.
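One possible workaround on the config side is to double the literal braces, which is how `str.format()` escapes them. This is a sketch of the Python behaviour only: the `gml_file` argument name is hypothetical, and whether this is usable in Stetl depends on `format()` being applied on every run (including runs without arguments), which is not verified here.

```python
# "{{" and "}}" are emitted as literal braces; "{gml_file}" is substituted.
# The argument name gml_file is made up for this example.
template = """converter_args = {{
    'root_tag': 'FeatureCollection'
    }}
gml_file = {gml_file}
"""

rendered = template.format(gml_file="input/cities.gml")
print(rendered)
```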

Context (please complete one or more from the following information):

  • OS: Debian unstable
  • Python Version: 3.9.9
  • Stetl Version: 2.1.dev0
  • Stetl Input/Output/Filter Component: stetl/etl.py
  • Stetl Config file: examples/basics/11_formatconvert/etl.cfg

Additional context
A string2record converter was implemented:

--- a/stetl/filters/formatconverter.py
+++ b/stetl/filters/formatconverter.py
@@ -338,6 +338,29 @@ class FormatConverter(Filter):
         packet.data = etree.fromstring(packet.data)
         return packet
 
+    @staticmethod
+    def string2record(packet, converter_args=None):
+        if(
+            converter_args is not None and
+            'value_column' in converter_args
+        ):
+            key = converter_args['value_column']
+        else:
+            key = 'value'
+
+        record = dict({key: packet.data})
+
+        if(
+            converter_args is not None and
+            'column_data' in converter_args
+        ):
+            for key in converter_args['column_data']:
+                record[key] = converter_args['column_data'][key]
+
+        packet.data = record
+
+        return packet
+
     @staticmethod
     def struct2string(packet):
         packet.data = packet.to_string()
@@ -406,6 +429,7 @@ FORMAT_CONVERTERS = {
     },
     FORMAT.string: {
         FORMAT.etree_doc: FormatConverter.string2etree_doc,
+        FORMAT.record: FormatConverter.string2record,
         FORMAT.xml_doc_as_string: FormatConverter.no_op
     },
     FORMAT.struct: {

Which requires configuration like this:

# convert string to record
[convert_string_to_record]
class = stetl.filters.formatconverter.FormatConverter
input_format = string
output_format = record
converter_args = {
        'value_column': 'waarde',
        'column_data': {
            'sleutel': 'levering_xml',
        },
    }
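To illustrate what this configuration is meant to produce, here is a self-contained sketch that runs the converter logic from the patch above on a sample string. `FakePacket` is a hypothetical stand-in with only a `data` attribute, not Stetl's packet class, and the function body is a condensed copy of the patch rather than the code as merged anywhere.

```python
class FakePacket:
    """Hypothetical stand-in for a Stetl packet: just carries .data."""

    def __init__(self, data):
        self.data = data


def string2record(packet, converter_args=None):
    # Condensed copy of the converter from the patch above.
    converter_args = converter_args or {}
    key = converter_args.get('value_column', 'value')
    record = {key: packet.data}
    # Extra fixed columns are copied into the record as-is.
    record.update(converter_args.get('column_data', {}))
    packet.data = record
    return packet


converter_args = {
    'value_column': 'waarde',
    'column_data': {
        'sleutel': 'levering_xml',
    },
}

packet = string2record(FakePacket('<xml/>'), converter_args)
print(packet.data)
```

The string ends up under the `waarde` key and the fixed `sleutel` column is added next to it.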

Because of this issue, converters that require converter_args cannot be used in the NLExtract BAGv2 configuration, which sets its arguments via options/<hostname>.args.

@sebastic sebastic added the bug label Nov 29, 2021

sebastic commented

ast.literal_eval() does not support the alternative dict() syntax:

>>> import ast
>>> {}
{}
>>> dict()
{}
>>> ast.literal_eval('{}')
{}
>>> ast.literal_eval('dict()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/ast.py", line 105, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python3.9/ast.py", line 104, in _convert
    return _convert_signed_num(node)
  File "/usr/lib/python3.9/ast.py", line 78, in _convert_signed_num
    return _convert_num(node)
  File "/usr/lib/python3.9/ast.py", line 69, in _convert_num
    _raise_malformed_node(node)
  File "/usr/lib/python3.9/ast.py", line 66, in _raise_malformed_node
    raise ValueError(f'malformed node or string: {node!r}')
ValueError: malformed node or string: <ast.Call object at 0x7fad4e8fd460>
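In other words, rewriting the option with `dict(...)` to avoid braces is not an option if the value is parsed with `ast.literal_eval()` (an assumption based on this comment, not checked against the Stetl source). A small sketch showing that the nested literal from the config above parses fine, while the call form is rejected:

```python
import ast

# The converter_args value from the config, as literal text.
converter_args_text = """{
        'value_column': 'waarde',
        'column_data': {
            'sleutel': 'levering_xml',
        },
    }"""

parsed = ast.literal_eval(converter_args_text)
print(parsed)

# A function call is not a literal, so literal_eval refuses it.
try:
    ast.literal_eval("dict(value_column='waarde')")
    call_rejected = False
except ValueError:
    call_rejected = True
print("dict() call rejected:", call_rejected)
```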

Supporting both substitution variables and dictionary values may require changing the config file into a Jinja template.
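A lighter alternative to a template engine might be to replace only the placeholders whose names are actually present in the arguments dictionary, leaving all other braces alone. The sketch below is an assumption about how that could look, not Stetl code; `substitute_args` is a hypothetical helper, and unlike `str.format()` it silently leaves unknown `{name}` placeholders in place instead of raising an error.

```python
import re


def substitute_args(config_str, args_dict):
    """Replace {name} only for names present in args_dict (hypothetical helper)."""
    if not args_dict:
        return config_str
    names = "|".join(re.escape(name) for name in args_dict)
    pattern = re.compile(r"\{(" + names + r")\}")
    return pattern.sub(lambda m: str(args_dict[m.group(1)]), config_str)


config_str = """[input]
file_path = {foo}

[convert_to_geojson]
converter_args = {
    'root_tag': 'FeatureCollection'
    }
"""

result = substitute_args(config_str, {"foo": "bar"})
print(result)
```

The trade-off is that a typo in a placeholder name would go unnoticed at substitution time, so a real implementation would probably want a separate check for leftover placeholders.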
