Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ogr2OgrExecOutput issues #76

Open
fsteggink opened this issue Jun 13, 2018 · 3 comments
Open

Ogr2OgrExecOutput issues #76

fsteggink opened this issue Jun 13, 2018 · 3 comments
Assignees
Milestone

Comments

@fsteggink
Copy link
Collaborator

I've noticed a couple of issues which make Ogr2OgrExecOutput a bit less flexible than necessary. I'm writing them here, because of pending changes in execoutput in PR #75, so these changes can be done once that PR has been closed (either accepted or dismissed). I'm prepared to do these changes myself. For now it would be wise to have a discussion about the proposed changes.

  • Handling of position arguments. The position arguments of ogr2ogr are dst_datasource_name src_datasource_name [layer [layer ...]]. The rest are keyword arguments and they can be ignored for this matter. So you'll need at least two position arguments for ogr2ogr to work. They can be expanded indefinitely by adding layers. So, I'd like to propose that, when composing the ogr2ogr command string the keyword arguments are all put in front of the position arguments. This means that self.dest_data_source should only be added to the command in the execute method. We're not handling layers yet, but they could be added to in the future.
  • The dest_data_source should be optional. For my current use case I'm exporting data from PostgreSQL to a couple of different files. I'm using a combination of LineStreamerFileInput (list of data to be exported) + FormatConverter (split list into separate strings) + StringSubstitutionFilter (handling temp dir) + Ogr2OgrExecOutput. So, ogr2ogr is invoked multiple times. In my command file I'm specifying the dest source, as well as an -sql option, which is also different for every invocation. In my config I've defined an empty dest_data_source.
  • Add an optional parameter for the source data source. In my use case (see above) this is always the same, namely the Postgres connection string.
  • Make it configurable whether the lco options should be only called once or during every invocation of ogr2ogr. Currently I've put a -lco argument in the options string.

As you might guess, I'm looking for a solution where not only the source can change after each invocation, but also the destination and parameters. Ideally they should be passed in through a record. While this is possible, I think this is a next step in the evolution of this output object.

@justb4
Copy link
Member

justb4 commented Jul 4, 2018

OK, #75 has been integrated, my proposal is to first release this week v.1.2 (Milestone 1.2) for which all issues, including #75 are closed/merged now.

How to progress further? Actually Stetl started as a pipeline of Bash scripts, calling xsltproc, ogr2ogr etc via Linux pipeline symbols (|), hence this notation still in [etl] Chains.

When moving to Python the first approach was to embed ogr2ogr in an ogr2ogr Input class that evolved into OgrInput, i.e. using GDAL Python bindings. The ogr2ogr Output class was (and is) initially Ogr2OgrOutput, which had limitations, but had one parameter specifying the entire ogr2ogr command line with options. See here.

Later I tried to develop a 'native' OgrOutput (see this example) with GDAL Python bindings, but this was never really completed.

IMHO it is still a bit hacky to embed commands/programs in any Stetl Component. Though Ogr2OgrExecOutput, is powerful, like in NLExtract, we, as this issue states, still have to deal with the myriad of commandline options. Plus:

  • it requires special installation and invocation of ogr2ogr on different platforms (Windows, Linux, Mac)
  • it is hard to control any error situations
  • we need the "temp input file"

Long story short: I would rather like to move to and enhance the native Python-binded OgrOutput. At this point I can't assess all the issues/complexity involved, but this (native OgrOutput) seems a much cleaner solution for the future...

@justb4 justb4 added this to the Version 1.3 milestone Jul 4, 2018
@justb4 justb4 modified the milestones: Version 2.1, Version 2.2 Jul 22, 2020
@justb4
Copy link
Member

justb4 commented Jul 22, 2020

Hasn't this been (partly) solved? e.g. "lco options" via #95?

@fsteggink
Copy link
Collaborator Author

The last issue (LCO) has been solved via #95. AFAIK, the rest still stands. When I'm in the occasion, I can submit a few PRs.
I agree that we should give more attention to OgrOutput, but I have no idea how much work still needs to be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants