Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stetl bgt improvements #69

Merged
merged 8 commits into from
Feb 27, 2018
Merged

Conversation

fsteggink
Copy link
Collaborator

While working on improving NLExtract's BGT Extract, I've found it necessary to add two filters, and improve two other filters. The changes should be self-explanatory. If not, please let me know.

@justb4 justb4 self-requested a review February 10, 2018 16:44
@justb4 justb4 added this to the Version 1.2 milestone Feb 10, 2018
@justb4
Copy link
Member

justb4 commented Feb 10, 2018

The Travis build can easily be fixed: just run flake8 in the project root. Just some minor code formatting issues, all regular tests passed.

@fsteggink
Copy link
Collaborator Author

Thanks Just, I didn't know that tool. Could be useful for NLExtract as well ;)

@fsteggink
Copy link
Collaborator Author

I'm also considering more improvements to Stetl for the BGT extract. Right now I have "hacked" a way in NLExtract to prepare a custom GFS which contains only the feature type and also the feature count. This greatly improves the import speed with ogr2ogr. I think it is useful to add this as a filter in Stetl as well. It will depend on OGR on the command line and LXML.

@Config(ptype=bool, default=False, required=False)
def safe_substitution(self):
"""
Apply safe substitution?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly add more comment (I did not know e.g. about this standard option in Python Templates), like
if placeholders are missing from mapping and keywords, instead of raising an exception, the original placeholder will appear in the resulting string intact.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Usually I don't add comments for things which can be easily looked up.

@@ -0,0 +1,61 @@
#!/usr/bin/env python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful, an example will help, hard to grasp otherwise. Suggestions:

  • can't regexes be compiled once during init?
  • more uses expected? Maybe a baseclass RegexFilter and subclasses RegexToRecordFilter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compilation: good point.
More uses: I haven't thought about it yet. It is possible, but at the moment I don't have any other concrete use cases yet. When looking at the possible formats, I think only struct will be a good option. Although formats like geojson_feature, ogr_feature and etree_element could represent the parsed data, they are too specialized. The output of regexfilter, a dictionary, is not something you would typically write directly.

Copy link
Member

@justb4 justb4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok with PR, you may fill in the suggestions.

@fsteggink
Copy link
Collaborator Author

I have added unit tests for the new filter classes. I haven't added a unit test for my change to the StringTemplatingFilter, since unit tests are missing entirely.

We should continue working on them, but I'd like to do that outside of the scope of this PR.

@justb4 justb4 merged commit d50832b into geopython:master Feb 27, 2018
@justb4
Copy link
Member

justb4 commented Feb 27, 2018

Ok, great, merged...Yes I also started adding tests like for splitting, and there is a separate issue #40 for Unit testing.

@fsteggink
Copy link
Collaborator Author

Thanks for the quick merge after these fixes!

@fsteggink fsteggink deleted the stetl_bgt_improvements branch February 27, 2018 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants