Stetl bgt improvements #69
…with named groups. The extracted data is returned as a record.
The Travis build can easily be fixed: just run
Thanks Just, I didn't know about that tool. It could be useful for NLExtract as well ;)
I'm also considering more improvements to Stetl for the BGT extract. Right now I have "hacked" a way in NLExtract to prepare a custom GFS which contains only the feature type and the feature count. This greatly improves the import speed with ogr2ogr. I think it would be useful to add this as a filter in Stetl as well. It would depend on the OGR command-line tools and lxml.
stetl/filters/templatingfilter.py
@Config(ptype=bool, default=False, required=False)
def safe_substitution(self):
    """
    Apply safe substitution?
Possibly expand the comment (I did not know, for example, about this standard option in Python's string.Template), with something like:
"If placeholders are missing from mapping and keywords, instead of raising an exception, the original placeholder will appear in the resulting string intact."
Fair point. Usually I don't add comments for things which can be easily looked up.
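For readers unfamiliar with the option discussed above, here is a minimal sketch of the difference between `substitute()` and `safe_substitute()` on Python's standard `string.Template`. The template text and the placeholder names `feature_type` and `input_file` are made up for illustration and are not taken from Stetl or NLExtract.

```python
from string import Template

# Illustrative template; the placeholder names are assumptions.
template = Template("Importing $feature_type from $input_file")
values = {"feature_type": "pand"}  # 'input_file' is deliberately missing

# substitute() raises KeyError when a placeholder has no value.
try:
    template.substitute(values)
    strict_result = None
except KeyError as exc:
    strict_result = "missing placeholder: %s" % exc.args[0]

# safe_substitute() leaves the unknown placeholder in the string as-is.
safe_result = template.safe_substitute(values)

print(strict_result)
print(safe_result)
```

With safe substitution, a template that is filled in several passes can keep its not-yet-known placeholders intact, at the cost of silently hiding typos in placeholder names.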
@@ -0,0 +1,61 @@
#!/usr/bin/env python
Useful. An example would help; it is hard to grasp otherwise. Suggestions:
- can't the regexes be compiled once during init?
- are more uses expected? Maybe a base class RegexFilter and subclasses like RegexToRecordFilter?
Compilation: good point.
More uses: I haven't thought about it yet. It is possible, but at the moment I don't have any other concrete use cases. When looking at the possible formats, I think only struct would be a good option. Although formats like geojson_feature, ogr_feature and etree_element could represent the parsed data, they are too specialized. The output of the regex filter, a dictionary, is not something you would typically write directly.
Ok with PR, you may fill in the suggestions.
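The two ideas in this thread (a regex with named groups whose match is returned as a record, and compiling the pattern once during init) can be sketched as follows. This is only an illustration under stated assumptions: the class `RegexToRecordSketch`, its `invoke` method signature, and the file-name pattern are hypothetical and do not reproduce the filter added in this PR or Stetl's filter API.

```python
import re


class RegexToRecordSketch(object):
    """Hypothetical stand-in, not Stetl's actual filter class."""

    def __init__(self, pattern_string):
        # Compile once here instead of on every invocation.
        self.pattern = re.compile(pattern_string)

    def invoke(self, text):
        # Named groups become the keys of the returned record (a dict);
        # None signals that the input did not match.
        match = self.pattern.match(text)
        return match.groupdict() if match else None


# Illustrative pattern with two named groups (an assumption, not from the PR).
name_filter = RegexToRecordSketch(
    r"bgt_(?P<feature_type>[a-z]+)\.(?P<extension>\w+)$")

record = name_filter.invoke("bgt_pand.gml")
no_record = name_filter.invoke("readme.txt")
print(record)
```

Returning a plain dictionary keeps the output generic, which matches the remark above that the more specialized formats are a poor fit for arbitrary parsed data.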
I have added unit tests for the new filter classes. I haven't added a unit test for my change to the StringTemplatingFilter, since unit tests for that filter are missing entirely. We should continue working on them, but I'd like to do that outside the scope of this PR.
Thanks for the quick merge after these fixes!
While working on improving NLExtract's BGT Extract, I've found it necessary to add two filters, and improve two other filters. The changes should be self-explanatory. If not, please let me know.