FileInput and derived classes like StringFileInput can handle lists of files from directory and glob.glob parameters. Still, all file content is read and passed as a single Packet. In addition, .zip files are handled by a dedicated class, ZipFileInput.
It should be possible to generalize FileInput so that derived classes can read from files regardless of whether the files come from directory structures, glob.glob-expanded file lists or .zip files. Even a mixture of these should be handled. For example, within NLExtract, https://github.com/nlextract/NLExtract/blob/master/bag/src/bagfilereader.py can handle any file structure provided.
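As a rough illustration of the generalized file-spec handling described above, the following sketch expands a mixed list of specs into uniform entries. The names expand_file_specs and _expand_path are hypothetical and not part of the existing FileInput API; the sketch also assumes that nested .zip files do not need to be opened and that anything zipfile recognizes as an archive should be treated as one.

```python
import glob
import os
import zipfile


def _expand_path(path):
    """Yield (container, name) pairs for one existing path (hypothetical helper)."""
    if os.path.isdir(path):
        # Walk the directory tree in a deterministic order.
        for root, dirs, files in os.walk(path):
            dirs.sort()
            for fname in sorted(files):
                yield from _expand_path(os.path.join(root, fname))
    elif zipfile.is_zipfile(path):
        # List the archive members, skipping directory entries.
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                if not member.endswith('/'):
                    yield (path, member)
    else:
        # A plain file has no container.
        yield (None, path)


def expand_file_specs(specs):
    """Expand directories, glob patterns and .zip files into (container, name) pairs.

    container is None for a plain file, or the path of the .zip archive
    that holds the entry called name.
    """
    for spec in specs:
        # A spec without wildcards matches itself if it exists.
        for path in sorted(glob.glob(spec)):
            yield from _expand_path(path)
```

A derived class could then iterate over these pairs and open each entry in the same way, whether it came from a directory, a glob pattern or an archive.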
A second aspect is file chunking: a FileInput may split up a single file into Packets containing data structures extracted from that file. For example, FileInputs like XmlElementStreamerFileInput and LineStreamerFileInput open/parse a file but pass the file content (lines, parsed elements) in fine-grained chunks on each read(). Currently these classes implement this entirely within their read() function, but the generic pattern is that they maintain a "context" for the open/parsed file.
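The "context" pattern mentioned above could look roughly like the following sketch. LineChunker is a hypothetical name and does not reflect how LineStreamerFileInput is actually implemented; it only shows the open file being kept between read() calls so that each call can return one line.

```python
class LineChunker:
    """Hypothetical helper: keeps the open file as context between read() calls."""

    def __init__(self, file_paths):
        self._pending = list(file_paths)  # files not yet opened
        self._current = None              # context: the currently open file object

    def read(self):
        """Return the next line without its newline, or None when all files are done."""
        while True:
            if self._current is None:
                if not self._pending:
                    return None
                self._current = open(self._pending.pop(0), 'r', encoding='utf-8')
            line = self._current.readline()
            if line:
                return line.rstrip('\n')
            # End of this file: drop the context and move on to the next file.
            self._current.close()
            self._current = None
```

Because an empty line is returned as an empty string, callers would need to test for None explicitly to detect the end of input.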
So, all in all, this issue addresses two general aspects:
handle any file specs: directories, maps, globbing, zip files and any mix of these
handle fine-grained file chunking: each invoke()/read() may supply part of a file, such as a line or an XML element
See also issue #49 for additional discussion which led to this issue.
The Strategy Design Pattern may be applied (many refs on the web).
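One possible way to apply the Strategy pattern here is sketched below, assuming that every file source can be presented as an open binary stream. All class names (ChunkStrategy, WholeFileStrategy, LineStrategy, XmlElementStrategy, ChunkedInput) are hypothetical and only serve the illustration; they are not existing classes.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class ChunkStrategy(ABC):
    """Hypothetical strategy interface: how one open stream is split into chunks."""

    @abstractmethod
    def chunks(self, stream):
        """Yield chunks from an open binary stream."""


class WholeFileStrategy(ChunkStrategy):
    def chunks(self, stream):
        # The complete content as a single chunk.
        yield stream.read()


class LineStrategy(ChunkStrategy):
    def chunks(self, stream):
        # One chunk per line, without the trailing line ending.
        for line in stream:
            yield line.rstrip(b'\r\n')


class XmlElementStrategy(ChunkStrategy):
    def __init__(self, tag):
        self.tag = tag

    def chunks(self, stream):
        # One chunk per fully parsed element with the requested tag.
        for _event, elem in ET.iterparse(stream, events=('end',)):
            if elem.tag == self.tag:
                yield elem


class ChunkedInput:
    """Hypothetical input: the same read() loop for every chunking strategy."""

    def __init__(self, streams, strategy):
        # The generator holds the "context" (current stream and position).
        self._iter = (chunk for s in streams for chunk in strategy.chunks(s))

    def read(self):
        """Return the next chunk, or None when all streams are exhausted."""
        return next(self._iter, None)
```

With this split, where the streams come from (directory, glob or .zip) and how each stream is chunked can vary independently, for example ChunkedInput(streams, LineStrategy()) versus ChunkedInput(streams, XmlElementStrategy('item')).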