Genome Utilities (GeUtilities) provides open-source building blocks for genomic data analysis tools. The following components are currently implemented:
- IGenomics: interfaces to build portable objects, such as ChIP-seq peaks, variations, or general features.
- Parsers: highly customizable parsers that read source files into in-memory objects. The following parsers are currently implemented:
    - Interval-based data formats
        - Browser Extensible Data (BED)
        - Gene transfer format (GTF)
        - Variant Call Format (VCF)
        - Reference Sequence (RefSeq)

These components are highly customizable, making them suitable for a variety of application scenarios and variations of input data. For instance, using a parser can be as simple as passing the path of the file to be parsed and calling the `Parse()` function. However, if you have a tool that implements a class `Foo` for a ChIP-seq peak, you can set the parser to read BED files and produce peaks of the `Foo` type, so there is no need to cast or convert from the parsed data type to your application's own types. Additionally, if the file to be parsed orders its columns differently from the format specification (e.g., the p-value is given in the second column of a "BED" file), you can update the parser's column indexes to match your data. Moreover, if you only want to sniff your data (e.g., read only the first 10 lines of the input), you can specify the number of lines the parser should read. These design decisions deliver components that can be used out of the box with minimal configuration while remaining highly customizable.
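
The customization points described above can be illustrated with a small, self-contained sketch. It is written in Python purely for illustration and is not the GeUtilities API: the names `BedParser`, `BedColumns`, `Peak`, `factory`, `columns`, `max_lines`, and `parse_lines` are all hypothetical, and the `Foo` class here only stands in for an application-defined peak type.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class Peak:
    """Default in-memory type produced by this sketch parser (hypothetical)."""
    chrom: str
    start: int
    end: int
    name: str
    value: float


@dataclass
class Foo:
    """Stand-in for an application-defined peak type."""
    chrom: str
    start: int
    end: int
    name: str
    value: float

    @property
    def length(self) -> int:
        return self.end - self.start


@dataclass
class BedColumns:
    """Zero-based column indexes; the defaults assume chrom, start, end, name, value."""
    chrom: int = 0
    start: int = 1
    end: int = 2
    name: int = 3
    value: int = 4


class BedParser:
    """Hypothetical parser showing the three customization points."""

    def __init__(self,
                 factory: Callable[..., Any] = Peak,
                 columns: Optional[BedColumns] = None,
                 max_lines: Optional[int] = None):
        self.factory = factory            # type to build for each record
        self.columns = columns or BedColumns()
        self.max_lines = max_lines        # None means read the whole input

    def parse_lines(self, lines: Iterable[str]) -> List[Any]:
        c = self.columns
        records = []
        for number, line in enumerate(lines):
            # Stop once the requested number of input lines has been read.
            if self.max_lines is not None and number >= self.max_lines:
                break
            line = line.strip()
            # Skip blank lines and common header/comment lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split("\t")
            records.append(self.factory(fields[c.chrom],
                                        int(fields[c.start]),
                                        int(fields[c.end]),
                                        fields[c.name],
                                        float(fields[c.value])))
        return records

    def parse(self, path: str) -> List[Any]:
        """Simplest use: pass a path and get in-memory objects back."""
        with open(path, encoding="utf-8") as handle:
            return self.parse_lines(handle)


lines = ["chr1\t100\t200\tpeak_a\t0.01",
         "chr1\t300\t450\tpeak_b\t0.002",
         "chr2\t50\t90\tpeak_c\t0.5"]

default_peaks = BedParser().parse_lines(lines)          # default type
foo_peaks = BedParser(factory=Foo).parse_lines(lines)   # application type
sniffed = BedParser(max_lines=2).parse_lines(lines)     # read two lines only

# A "BED"-like file that puts the p-value in the second column.
shifted = ["chr1\t0.01\t100\t200\tpeak_a"]
remapped = BedParser(
    columns=BedColumns(chrom=0, value=1, start=2, end=3, name=4)
).parse_lines(shifted)
```

In this sketch the parser builds records through a caller-supplied factory, so the application receives its own type directly and no cast or conversion step is needed afterwards. Column indexes and the line limit are plain configuration values with defaults, which keeps the zero-configuration case a single call.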

x64 Release builds: Windows, Linux Ubuntu 14.04.