I've noticed a couple of issues that make Ogr2OgrExecOutput a bit less flexible than necessary. I'm writing them here because of the pending changes to execoutput in PR #75, so these changes can be made once that PR has been closed (either accepted or dismissed). I'm prepared to make these changes myself. For now, it would be wise to discuss the proposed changes.
Handling of positional arguments. The positional arguments of ogr2ogr are dst_datasource_name src_datasource_name [layer [layer ...]]. The rest are keyword arguments, which can be ignored for this matter. So ogr2ogr needs at least two positional arguments to work, and the list can be extended indefinitely by adding layers. I'd therefore like to propose that, when composing the ogr2ogr command string, all keyword arguments are placed before the positional arguments. This means that self.dest_data_source should only be added to the command in the execute method. We're not handling layers yet, but they could be added in the future.
The dest_data_source should be optional. For my current use case I'm exporting data from PostgreSQL to several different files. I'm using a combination of LineStreamerFileInput (list of data to be exported) + FormatConverter (split list into separate strings) + StringSubstitutionFilter (handling the temp dir) + Ogr2OgrExecOutput, so ogr2ogr is invoked multiple times. In my command file I'm specifying the destination data source, as well as an -sql option, which is also different for every invocation. In my config I've defined an empty dest_data_source.
Add an optional parameter for the source data source. In my use case (see above) this is always the same, namely the Postgres connection string.
Make it configurable whether the -lco options are applied only once (on the first invocation) or on every invocation of ogr2ogr. Currently I've put a -lco argument in the options string.
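The proposals above (keyword options before positional arguments, optional destination and source data sources, and -lco applied once or on every invocation) can be sketched in Python as follows. This is a minimal sketch, not Stetl's actual Ogr2OgrExecOutput: the class Ogr2OgrCommandBuilder and its parameters (options, lco, lco_once, dest_data_source, source_data_source) are hypothetical names for illustration. Only the ogr2ogr flags -f, -sql and -lco are real.

```python
from typing import List, Optional, Sequence


class Ogr2OgrCommandBuilder:
    """Hypothetical helper, NOT Stetl's Ogr2OgrExecOutput API.

    Keyword options always go first; the positional arguments
    (dst_datasource_name src_datasource_name [layer ...]) are appended
    last, only when the command for one invocation is built.
    """

    def __init__(self, options: Sequence[str] = (), lco: Sequence[str] = (),
                 lco_once: bool = True,
                 dest_data_source: Optional[str] = None,
                 source_data_source: Optional[str] = None):
        self.options = list(options)
        self.lco = list(lco)
        self.lco_once = lco_once
        # Both data sources are optional defaults; an empty string counts as unset.
        self.dest_data_source = dest_data_source
        self.source_data_source = source_data_source
        self.invocations = 0

    def build(self, dest: Optional[str] = None, source: Optional[str] = None,
              layers: Sequence[str] = (), extra: Sequence[str] = ()) -> List[str]:
        # Per-invocation values override the configured defaults.
        dest = dest if dest is not None else self.dest_data_source
        source = source if source is not None else self.source_data_source
        if not dest or not source:
            raise ValueError("ogr2ogr needs both a destination and a source data source")

        # 1. Keyword arguments: fixed options, then per-invocation extras (e.g. -sql).
        cmd = ["ogr2ogr", *self.options, *extra]
        # 2. Layer creation options: only on the first call when lco_once is True.
        if self.lco and (not self.lco_once or self.invocations == 0):
            for opt in self.lco:
                cmd += ["-lco", opt]
        # 3. Positional arguments last: dst, src, then optional layers.
        cmd += [dest, source, *layers]
        self.invocations += 1
        return cmd
```

For the use case described above, one would configure source_data_source once (for example "PG:dbname=test") and leave dest_data_source empty, then call build(dest="/tmp/a.gpkg", extra=["-sql", "SELECT * FROM a"]) per invocation. The -lco options then appear only in the first generated command unless lco_once=False.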
As you might guess, I'm looking for a solution where not only the source can change after each invocation, but also the destination and parameters. Ideally, they should be passed in through a record. While this is possible, I think it is the next step in the evolution of this output object.
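Passing destination and parameters through a record could look roughly like the sketch below. It assumes, purely for illustration, that each record is a Python dict with the hypothetical keys "dest", "source", "sql" and "layers"; neither these keys nor the function command_from_record exist in Stetl. It keeps the ordering proposed earlier: keyword options first, positional arguments last.

```python
from typing import Any, Dict, List, Sequence


def command_from_record(record: Dict[str, Any],
                        base_options: Sequence[str] = (),
                        default_source: str = "") -> List[str]:
    """Build one ogr2ogr argument list from a per-invocation record.

    Hypothetical sketch: the record keys 'dest', 'source', 'sql' and
    'layers' are assumptions, not part of any Stetl API.
    """
    dest = record.get("dest")
    # Fall back to a fixed source (e.g. a PostgreSQL connection string).
    source = record.get("source") or default_source
    if not dest or not source:
        raise ValueError("record must supply 'dest' (and 'source' unless a default is set)")

    # Keyword arguments first: shared options, then per-record ones.
    cmd = ["ogr2ogr", *base_options]
    if record.get("sql"):
        cmd += ["-sql", record["sql"]]
    # Positional arguments always come last: dst, src, then optional layers.
    cmd += [dest, source, *record.get("layers", [])]
    return cmd
```

A stream of such records, one per export, would then yield one complete ogr2ogr command each, so the destination and -sql option no longer need to be encoded in a command file.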
OK, #75 has been integrated. My proposal is to first release v1.2 (Milestone 1.2) this week; all issues in that milestone, including #75, are now closed/merged.
How should we proceed from here? Stetl actually started as a pipeline of Bash scripts calling xsltproc, ogr2ogr etc. via Unix pipe symbols (|), hence the notation still used in [etl] chains.
When moving to Python, the first approach was to embed ogr2ogr in an ogr2ogr Input class, which evolved into OgrInput, i.e. using the GDAL Python bindings. The initial ogr2ogr Output class was (and still is) Ogr2OgrOutput, which had limitations but took a single parameter specifying the entire ogr2ogr command line with options. See here.
Later I tried to develop a 'native' OgrOutput (see this example) with the GDAL Python bindings, but it was never really completed.
IMHO it is still a bit hacky to embed commands/programs in any Stetl component. Although Ogr2OgrExecOutput is powerful, as used in NLExtract, we still have to deal with the myriad of command-line options, as this issue shows. Plus:
it requires special installation and invocation of ogr2ogr on different platforms (Windows, Linux, Mac)
it is hard to handle error situations
we need the "temp input file"
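The platform and error-handling drawbacks listed above show up as soon as an external program is wrapped. The sketch below is a hypothetical helper (run_ogr2ogr and Ogr2OgrError are illustrative names, not Stetl code): it has to locate the executable on PATH itself, and when the process fails, all it can report is an exit code and whatever text appeared on stderr, with no structured error information.

```python
import shutil
import subprocess
from typing import Optional, Sequence


class Ogr2OgrError(RuntimeError):
    """Raised when the external process exits non-zero (hypothetical class)."""

    def __init__(self, returncode: int, stderr: str):
        super().__init__(f"ogr2ogr exited with code {returncode}: {stderr.strip()}")
        self.returncode = returncode
        self.stderr = stderr


def run_ogr2ogr(args: Sequence[str], exe: str = "ogr2ogr") -> str:
    """Run an external ogr2ogr-like command and return its stdout."""
    # Installation and PATH differ per platform (Windows, Linux, macOS),
    # so the executable has to be found before it can be run.
    path: Optional[str] = shutil.which(exe)
    if path is None:
        raise FileNotFoundError(f"{exe!r} not found on PATH")
    result = subprocess.run([path, *args], capture_output=True, text=True)
    # The only failure signal is the exit code plus free-form stderr text.
    if result.returncode != 0:
        raise Ogr2OgrError(result.returncode, result.stderr)
    return result.stdout
```

The exe parameter exists only so the wrapper can be exercised without GDAL installed; in real use it would be left at "ogr2ogr".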
Long story short: I would rather move to, and enhance, the native Python-bound OgrOutput. At this point I can't assess all the issues/complexity involved, but this (native OgrOutput) seems a much cleaner solution for the future...
The last issue (LCO) has been solved via #95. AFAIK, the rest still stands. When I get the chance, I can submit a few PRs.
I agree that we should give more attention to OgrOutput, but I have no idea how much work still needs to be done.