TASK-5564 - Update data sources for CellBase 6.2 #696

Open: wants to merge 158 commits into base: develop
6346c97
download: gwas catalog fixes
imedina Jan 21, 2024
19368e6
Merge branch 'TASK-5392' into TASK-5564
imedina Mar 1, 2024
89264c2
Update configuration.yml
imedina Mar 1, 2024
2be5f21
core: update pubmed URLs in the configuration file, #TASK-5775, #TASK…
jtarraga Mar 7, 2024
fe05795
core: update pubmed version in the configuration file, #TASK-5775, #T…
jtarraga Mar 7, 2024
50f7008
core: improve Ontology downloader, #TASK-5775, #TASK-5564
jtarraga Mar 7, 2024
a8a9328
lib: take into account PubMed version from config file, and fix sonna…
jtarraga Mar 7, 2024
f84734e
lib: improve clinvar and gwas downloader by removing hardcode filenam…
jtarraga Mar 7, 2024
1a5ba4a
core: update clinvar version in config file, #TASK-5775, #TASK-5564
jtarraga Mar 7, 2024
4cdd046
lib: improve gene downloader by updating versions from config file, a…
jtarraga Mar 8, 2024
3cea3f3
lib: improve repeat downloader by updating versions from config file,…
jtarraga Mar 8, 2024
f308f25
lib: improve conservation downloader by updating versions from config…
jtarraga Mar 22, 2024
2e6e895
lib: update regulation download manager, and the configuration file, …
jtarraga Mar 28, 2024
a6688d0
lib: update configuration file; and create version files for COSMIC a…
jtarraga Apr 3, 2024
c2345d4
lib: update CellBase builder for clinical variants, #TASK-5776, #TASK…
jtarraga Apr 3, 2024
df0f1e0
lib: fix Gwas Catalog builder for clinical variants, #TASK-5776, #TAS…
jtarraga Apr 4, 2024
d4cba15
lib: refactor code by changing the DownloadProperties.URLProperties, …
jtarraga Apr 5, 2024
a3e9684
lib: update CellBase downloaders according to the DownloadProperties.…
jtarraga Apr 11, 2024
c7ad55d
Rename get file name method
imedina Apr 11, 2024
e92b676
lib: update CellBase downloaders according to the DownloadProperties.…
jtarraga Apr 11, 2024
281fb22
Resolve conflicts, #TASK-5564
jtarraga Apr 11, 2024
e18506b
lib: update CellBase downloaders, #TASK-5775, #TASK-5564
jtarraga Apr 12, 2024
69a58bf
core: update CellBase configuration file, #TASK-5775, #TASK-5564
jtarraga Apr 15, 2024
d4e0cd6
lib: update MANE Select downloader, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
6ee2f78
lib: update LRG, HGNC, Cancer HotSpot, DGIDB, Gene Uniprot Xref, Gene…
jtarraga Apr 18, 2024
d794ceb
lib: update RefSeq downloader, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
1b751de
lib: update missense scores (REVEL) downloader, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
b635333
lib: update CADD and clinical variant downloaders, #TASK-5775, #TASK-…
jtarraga Apr 18, 2024
106b96d
lib: update protein downloaders, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
55afe6b
lib: update gene downloader (specially for ensembl data), and improve…
jtarraga Apr 19, 2024
88c2b17
core: add Ensembl primary fasta URL into the configuration file for t…
jtarraga Apr 22, 2024
eee13e3
lib: update genome download manager by declaring and using constants …
jtarraga Apr 22, 2024
cd367b9
app: update genome builder by using constants from the class EtlCommo…
jtarraga Apr 22, 2024
ce6f8d5
app: fix sonnar issues in BuildCommandExecutor, #TASK-5564
jtarraga Apr 22, 2024
3566e01
app: improve log/exception messages in DownloadCommandExecutor, #TASK…
jtarraga Apr 22, 2024
cd94452
app: update repeats builder, and improve log/exception messages, #TAS…
jtarraga Apr 22, 2024
148814f
lib: update the repeats builder by removing the hardcoded filenames a…
jtarraga Apr 22, 2024
30a4c87
lib: update conservation builder by removing the hardcoded filenames …
jtarraga Apr 22, 2024
85e17db
lib: call bigWigToBedGraph to convert the GERP bigwig to bed graph fi…
jtarraga Apr 23, 2024
0223cb5
lib: include log messages, #TASK-5564
jtarraga Apr 23, 2024
833c337
lib: improve ProteinBuilder by removing hardcoded file names, adding …
jtarraga Apr 23, 2024
01deb0c
lib: move DataSource reader from ConservationBuilder to the parent Ce…
jtarraga Apr 24, 2024
9416894
lib: move the function to split UniProt into chuncks from the protein…
jtarraga Apr 24, 2024
909c0b2
core: fix regulation URLs in the configuration file, #TASK-5775, #TAS…
jtarraga Apr 24, 2024
71d8056
lib: launch a CellBase exception if executing a command (wget, gunzip…
jtarraga Apr 24, 2024
1544824
lib: fix sonnar issues, #TASK-5775, #TASK-5564
jtarraga Apr 24, 2024
3e43874
lib: move the function to parse and build PFMs from the regulation do…
jtarraga Apr 24, 2024
959e423
core: update ontology section of the CellBase configuration since ont…
jtarraga Apr 25, 2024
158c259
lib: update ontology download since ontology versions will be taken f…
jtarraga Apr 25, 2024
0b83831
app: update the build command executor to check/copy the ontology ver…
jtarraga Apr 25, 2024
39f0f41
lib: improve the ontology builder by removing hardcoded filenames, ad…
jtarraga Apr 25, 2024
5c3dae0
lib: improve the PharmGKB downloader by moving the function to unzip …
jtarraga Apr 25, 2024
971235e
lib: improve the PharmGKB builder by adding checks and log messages; …
jtarraga Apr 25, 2024
cd444b0
lib: improve the PubMed downloader by adding log messages and fixing …
jtarraga Apr 25, 2024
e19fe73
lib: create maps to get the names, categories and version filenames f…
jtarraga Apr 26, 2024
a29afe3
lib: update according to the EtlCommons changes, #TASK-5775, #TASK-5564
jtarraga Apr 26, 2024
377ee9c
lib: improve PubMed builder by adding checks, log messages and fixing…
jtarraga Apr 26, 2024
997c8ec
lib: update CADD downloader according to last changes, #TASK-5775, #T…
jtarraga Apr 26, 2024
96078b7
lib: improve the CADD builder by adding checks, log messages, cleanin…
jtarraga Apr 26, 2024
3163a90
lib: update the REVEL downloader according to the last changes, and a…
jtarraga Apr 26, 2024
bc22fad
lib: add log messages, #TASK-5776, #TASK-5564
jtarraga Apr 29, 2024
0c9a299
lib: improve the Revel builder by fixing sonnar issues and adding che…
jtarraga Apr 29, 2024
4f9e39a
lib: update CellBase downloaders according to the last changes, #TASK…
jtarraga Apr 29, 2024
1586a77
app: update load command executor according to the EtlCommons changes…
jtarraga Apr 29, 2024
c7c398a
lib: update CellBase builders according to the EtlCommons changes, #T…
jtarraga Apr 29, 2024
754384a
lib: fix revel builder, #TASK-5776, #TASK-5564
jtarraga Apr 29, 2024
24eb091
configuration: update versions
imedina May 7, 2024
fc09da4
app: add bash script to fix the downloaded MirTarBase file, #TASK-577…
jtarraga May 7, 2024
09d33a0
core: add some comments to the configuration file, #TASK-5775, #TASK-…
jtarraga May 7, 2024
303585d
lib: update Ensembl/RefSeq indexers and builders (include major impro…
jtarraga May 7, 2024
68c47ef
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga May 7, 2024
e7c2385
lib: update clinical variant downloader by moving the split ClinVar f…
jtarraga May 10, 2024
f5b7c34
lib: update clinical variant builder by including the split ClinVar f…
jtarraga May 10, 2024
a4fca6b
lib: update code to the last changes, #TASK-5564
jtarraga May 10, 2024
57c6f6f
lib: include SpliceAI/MMSplice in the configuration file, and create …
jtarraga May 11, 2024
c131459
lib: remove deprecated functions, #TASK-5575, #TASK-5564
jtarraga May 11, 2024
a8a047c
lib: improve gene downloader by taking into account the manually down…
jtarraga May 16, 2024
100d6f3
lib: update gene builder (Ensembl/RefSeq) according to last changes, …
jtarraga May 17, 2024
0cd4b80
lib: udate Ensembl/RefSeq gene builder to gunzip FASTA files before b…
jtarraga May 27, 2024
e42cd7e
Merge branch 'develop' into TASK-5564
jtarraga May 27, 2024
4d965d7
Merge branch 'develop' into TASK-5564
imedina Jun 22, 2024
fc65d14
lib: add hpo filter to GeneQuery
imedina Jun 23, 2024
84ad97b
Many improvements and fixes:
imedina Jul 2, 2024
aaec065
* Add new ensembl_canonical.pl
imedina Jul 2, 2024
dad180d
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga Jul 2, 2024
694b81d
lib: use DockerUtils to execute Perl script from docker image, #TASK-…
jtarraga Jul 2, 2024
c8e719a
test: update JUnit tests, #TASK-5564
jtarraga Jul 3, 2024
19efdf4
cicd: update task.yml to deploy cellbase-builder docker, #TASK-5564
jtarraga Jul 3, 2024
fcbb680
build: create the MiRTarBase parser for .xlsx files, #TASK-5576, #TAS…
jtarraga Jul 4, 2024
10a579a
Builder improvements and several data cleaning
imedina Jul 4, 2024
87d95e8
Merge branch 'TASK-5564' of github.com:opencb/cellbase into TASK-5564
imedina Jul 4, 2024
c6bcbdd
Gene downloader fixes
imedina Jul 4, 2024
0a6a84f
Add VariationDownloader
imedina Jul 5, 2024
3dcad47
Add VariationDownloader
imedina Jul 5, 2024
5eb33ae
app: update Dockerfile for cellbase-builder in order to allow the scr…
jtarraga Jul 8, 2024
f44dcc5
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga Jul 8, 2024
2b226fe
lib: add variation to the EtlCommons dataVersionFilenamesMap, #TASK-5…
jtarraga Jul 9, 2024
510819c
Merge branch 'develop' into TASK-5564
jtarraga Jul 22, 2024
5514177
lib: remove unused variables, #TASK-5575, #TASK-5564
jtarraga Jul 23, 2024
ae9a817
core: add the field 'id' in DataSource model, #TASK-5575, #TASK-5564
jtarraga Jul 23, 2024
20c554b
core: update DGIdb in the configuration file, #TASK-5575, #TASK-5564
jtarraga Jul 23, 2024
9d2d4fe
lib: check if genome data is already downloaded before downloading to…
jtarraga Jul 23, 2024
299003b
lib: add the parameter 'assembly' to command line when calling the sc…
jtarraga Jul 23, 2024
1d171d5
lib: update GeneDownloadManager to call the script gene_extra_info.pl…
jtarraga Jul 24, 2024
d10931d
lib: improve genome and conservation downloaders by checking if data …
jtarraga Jul 24, 2024
b422f3a
lib: improve repeats downloaders by checking if data is already downl…
jtarraga Jul 24, 2024
d0c0ba3
lib: improve regulation downloader by checking if data is already dow…
jtarraga Jul 24, 2024
1dc504f
lib: fix motif features folder for regulation downloader, #TASK-5575,…
jtarraga Jul 24, 2024
4ba788d
lib: fix minor sonnar issue, #TASK-5575, #TASK-5564
jtarraga Jul 24, 2024
6fc7129
lib: improve protein downloader by checking if data is already downlo…
jtarraga Jul 24, 2024
8ed0e0d
lib: improve variation downloader by checking if data is already down…
jtarraga Jul 24, 2024
1442766
lib: fix variation folder in downloader, #TASK-5575, #TASK-5564
jtarraga Jul 24, 2024
e48d27d
core: remove DISGENET, #TASK-5575, #TASK-5564
jtarraga Jul 24, 2024
642935a
lib: improve gene downloader, removing DISGENET, fixing sonnar issues…
jtarraga Jul 24, 2024
8030b02
lib: fix command line to execute Perl script, #TASK-5575, #TASK-5564
jtarraga Jul 24, 2024
e17e51d
lib: add files generated by scripts in the version JSON files, #TASK-…
jtarraga Jul 25, 2024
733cade
lib: improve genome builder by checking files, and fixing sonnar issu…
jtarraga Jul 25, 2024
ddc1056
lib: take into account the parameter --keep when gunzip, #TASK-5576, …
jtarraga Jul 25, 2024
8c6dc78
lib: improve conservation builder by adding checks, log messages and …
jtarraga Jul 26, 2024
847f835
lib: add support for multi-species, checks and log messages in the re…
jtarraga Jul 26, 2024
b0d1c67
lib: add support for multi-species, checks and log messages in regula…
jtarraga Jul 26, 2024
039aa81
lib: fix protein builder, #TASK-5576, #TASK-5564
jtarraga Jul 29, 2024
7f77dec
lib: fix gene downloader for RefSeq files, #TASK-5575, #TASK-5564
jtarraga Jul 29, 2024
0eb898e
lib: improve gene (Ensembl/RefSeq) builder by supporting multi-specie…
jtarraga Jul 31, 2024
1d47fd9
lib: fix sonnar issues, #TASK-5576, #TASK-5564
jtarraga Jul 31, 2024
7fbc054
lib: add variant and variant_structural_variations in the configurati…
jtarraga Jul 31, 2024
d483dcf
app: improve CellBase loader by creating a new function to be reused …
jtarraga Aug 1, 2024
7f62ce7
lib: improve genome sequence and info loader, #TASK-6142, #TASK-5564
jtarraga Aug 1, 2024
0602bba
app: update CellBase loader for conservation data, #TASK-6142, #TASK-…
jtarraga Aug 1, 2024
2b4fbeb
app: update CellBase loader for genes and proteins according to the p…
jtarraga Aug 1, 2024
d693f57
lib: add VariantBuilder to generate the variation JSON files from VCF…
jtarraga Aug 1, 2024
38400c1
app: update the CellBase loader for variation data according to the l…
jtarraga Aug 1, 2024
3117337
app: add check before building variation data, #TASK-5776, #TASK-5564
jtarraga Aug 2, 2024
9c810e7
lib: skip API-KEY param when parsing variant quey, #TASK-5564
jtarraga Aug 2, 2024
ec5f21a
server: update RESTful server to take into account multi-species, #TA…
jtarraga Aug 2, 2024
36c3609
lib: extract the FutureSpliceScoreAnnotator in a file to reduce the V…
jtarraga Aug 2, 2024
efa4824
lib: update the VariantAnnotationCalculator to support multi-species,…
jtarraga Aug 2, 2024
4326fa3
lib: add log messages in protein builder, #TASK-5776, #TASK-5564
jtarraga Aug 2, 2024
2c7ddfb
lib: set variant ID in VariantBuilder, #TASK-5576, #TASK-5564
jtarraga Aug 5, 2024
78211d0
lib: remove System.exit, #TASK-5576, #TASK-5564
jtarraga Aug 5, 2024
e0c6a13
lib: fix VariationBuilder by converting SV values from Ensembl to sta…
jtarraga Aug 5, 2024
81e4cb1
lib: add new command 'data-list' to display the list of data supporte…
jtarraga Aug 6, 2024
280fd67
app: update build options and fix sonnar issues, #TASK-5576, #TASK-5564
jtarraga Aug 6, 2024
2235e5c
app: update CLI option descriptions for loading, exporting, indexing.…
jtarraga Aug 6, 2024
6a4c16a
test: update JUnit tests according to the latest changes, #TASK-5564
jtarraga Aug 7, 2024
68c9f43
lib: improve variation builder by setting xref and annotation, and re…
jtarraga Aug 7, 2024
914b9c1
lib: remove break for testing, #TASK-5576, #TASK-5564
jtarraga Aug 7, 2024
3538e14
core: add ontology data into configuration file for "mus musculus" an…
jtarraga Aug 7, 2024
d0d92a3
lib: update ontology downloader and take into account multi-species s…
jtarraga Aug 8, 2024
d51114b
lib: update ontology builder and take into account multi-species supp…
jtarraga Aug 8, 2024
24450d3
app: update load command executor for ontology data according to the …
jtarraga Aug 8, 2024
132382d
app: check data according to the species before loading data, #TASK-6…
jtarraga Aug 8, 2024
d556c4c
app: fix sonnar issues, #TASK-6142, #TASK-5564
jtarraga Aug 8, 2024
1f3572c
lib: fix the function to save status and message of the downloaded fi…
jtarraga Aug 9, 2024
6056655
Merge branch 'develop' into TASK-5564
jtarraga Aug 10, 2024
a8d6368
core: add dbSNP in config file (removed after merging), #TASK-5564
jtarraga Aug 10, 2024
162f34d
add: improve species and assembly parameter descriptions, #TASK-5575,…
jtarraga Aug 13, 2024
344e92e
test: fix JUnit tests by updating configuration files, #TASK-5564
jtarraga Aug 13, 2024
2 changes: 1 addition & 1 deletion .github/workflows/task.yml
@@ -21,5 +21,5 @@ jobs:
  uses: opencb/java-common-libs/.github/workflows/deploy-docker-hub-workflow.yml@develop
  needs: test
  with:
- cli: python3 ./build/cloud/docker/docker-build.py push --images base --tag ${{ github.ref_name }}
+ cli: python3 ./build/cloud/docker/docker-build.py push --images base,builder --tag ${{ github.ref_name }}
  secrets: inherit
10 changes: 7 additions & 3 deletions cellbase-app/app/cloud/docker/cellbase-builder/Dockerfile
@@ -11,7 +11,7 @@ LABEL org.label-schema.vendor="OpenCB" \
  ## We need to be root to install dependencies
  USER root
  RUN apt-get update -y && \
- apt-get install -y git default-mysql-client libjson-perl libdbi-perl libdbd-mysql-perl libdbd-mysql-perl libtry-tiny-perl && \
+ apt-get install -y git default-mysql-client libjson-perl libdbi-perl libdbd-mysql-perl libdbd-mysql-perl libtry-tiny-perl libxml-simple-perl liblog-log4perl-perl libxml-parser-perl libxml-dom-perl && \
  mkdir /opt/ensembl && chown cellbase:cellbase /opt/ensembl && \
  rm -rf /var/lib/apt/lists/*

@@ -26,6 +26,10 @@ RUN cd /opt/ensembl && \
  git clone https://github.com/Ensembl/ensembl-variation.git && \
  git clone https://github.com/Ensembl/ensembl-funcgen.git && \
  git clone https://github.com/Ensembl/ensembl-compara.git && \
- git clone https://github.com/Ensembl/ensembl-io.git
+ git clone https://github.com/Ensembl/ensembl-io.git && \
+ git clone --branch cvs/release-0_7 https://github.com/biomart/biomart-perl

- ENV PERL5LIB=$PERL5LIB:/opt/ensembl/bioperl-live:/opt/ensembl/ensembl/modules:/opt/ensembl/ensembl-variation/modules:/opt/ensembl/ensembl-funcgen/modules:/opt/ensembl/ensembl-compara/modules:/opt/ensembl/lib/perl/5.18.2:/opt/cellbase/scripts/ensembl-scripts
+ ## Give writting permissions to allow the script ensembl_canonical.pl to create sub-folder for cache purposes
+ RUN chmod -R 777 /opt/cellbase/scripts/ensembl-scripts/
+
+ ENV PERL5LIB=$PERL5LIB:/opt/ensembl/bioperl-live:/opt/ensembl/ensembl/modules:/opt/ensembl/ensembl-variation/modules:/opt/ensembl/ensembl-funcgen/modules:/opt/ensembl/ensembl-compara/modules:/opt/ensembl/lib/perl/5.18.2:/opt/cellbase/scripts/ensembl-scripts:/opt/ensembl/biomart-perl/lib
14 changes: 7 additions & 7 deletions cellbase-app/app/scripts/ensembl-scripts/DB_CONFIG.pm
@@ -134,16 +134,16 @@ our $ENSEMBL_GENOMES_PORT = "4157";
  our $ENSEMBL_GENOMES_USER = "anonymous";

  ## Vertebrates
- our $HOMO_SAPIENS_CORE = "homo_sapiens_core_110_38";
- our $HOMO_SAPIENS_VARIATION = "homo_sapiens_variation_110_38";
- our $HOMO_SAPIENS_FUNCTIONAL = "homo_sapiens_funcgen_110_38";
- our $HOMO_SAPIENS_COMPARA = "homo_sapiens_compara_110_38";
+ our $HOMO_SAPIENS_CORE = "homo_sapiens_core_111_38";
+ our $HOMO_SAPIENS_VARIATION = "homo_sapiens_variation_111_38";
+ our $HOMO_SAPIENS_FUNCTIONAL = "homo_sapiens_funcgen_111_38";
+ our $HOMO_SAPIENS_COMPARA = "homo_sapiens_compara_111_38";
  #our $HOMO_SAPIENS_CORE = "homo_sapiens_core_78_38";
  #our $HOMO_SAPIENS_VARIATION = "homo_sapiens_variation_78_38";
  #our $HOMO_SAPIENS_FUNCTIONAL = "homo_sapiens_funcgen_78_38";
- our $MUS_MUSCULUS_CORE = "mus_musculus_core_78_38";
- our $MUS_MUSCULUS_VARIATION = "mus_musculus_variation_78_38";
- our $MUS_MUSCULUS_FUNCTIONAL = "mus_musculus_funcgen_78_38";
+ our $MUS_MUSCULUS_CORE = "mus_musculus_core_111_39";
+ our $MUS_MUSCULUS_VARIATION = "mus_musculus_variation_111_39";
+ our $MUS_MUSCULUS_FUNCTIONAL = "mus_musculus_funcgen_111_39";
  our $RATTUS_NORVEGICUS_CORE = "rattus_norvegicus_core_78_5";
  our $RATTUS_NORVEGICUS_VARIATION = "rattus_norvegicus_variation_78_5";
  our $RATTUS_NORVEGICUS_FUNCTIONAL = "rattus_norvegicus_funcgen_78_5";
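
The database names in this hunk all appear to follow the pattern `<species>_<group>_<release>_<assembly version>`, which is why bumping Ensembl from 110 to 111 touches every line. A minimal sketch of that pattern in Java follows. It assumes the four-part layout inferred from the values above, and `EnsemblDbNameSketch` and `dbName` are hypothetical names that do not exist in CellBase.

```java
import java.util.Locale;

public class EnsemblDbNameSketch {

    /**
     * Joins the four parts seen in DB_CONFIG.pm into one database name.
     * The layout is inferred from the values in the diff, not from CellBase code.
     */
    static String dbName(String species, String group, int release, String assemblyVersion) {
        // "Homo sapiens" -> "homo_sapiens"
        String prefix = species.trim().toLowerCase(Locale.ROOT).replace(' ', '_');
        return String.join("_", prefix, group, String.valueOf(release), assemblyVersion);
    }

    public static void main(String[] args) {
        System.out.println(dbName("Homo sapiens", "core", 111, "38"));
        System.out.println(dbName("Mus musculus", "variation", 111, "39"));
    }
}
```

Deriving the names from a single release number would reduce a future version bump to a one-line change. Whether that is worth doing in the Perl module is a design decision for the maintainers.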
61 changes: 61 additions & 0 deletions cellbase-app/app/scripts/ensembl-scripts/ensembl_canonical.pl
@@ -0,0 +1,61 @@
#!/usr/bin/env perl

use strict;
use Getopt::Long;
use Data::Dumper;
use JSON;
use DB_CONFIG;

use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

## Default values
my $species = 'hsapiens';
my $outdir = "./";

## Parsing command line
GetOptions ('species=s' => \$species, 'outdir=s' => \$outdir);


my $confFile = "/opt/cellbase/scripts/ensembl-scripts/martURLLocation.xml";

# NB: change action to 'clean' if you wish to start a fresh configuration
# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
my $action='clean';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');

$query->setDataset($species."_gene_ensembl");

$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("transcript_is_canonical");

$query->formatter("TSV");

# Open the file for writing
open(my $fh, '>', "$outdir/ensembl_canonical.txt") or die "Cannot open ensembl_canonical.txt file: $!";

# Save the original stdout
my $original_stdout = *STDOUT;
open(STDOUT, '>&', $fh) or die "Can't redirect STDOUT: $!";

my $query_runner = BioMart::QueryRunner->new();

# to obtain unique rows only
$query_runner->uniqueRowsOnly(1);
$query_runner->execute($query);
#$query_runner->printHeader();
#print ENSEMBL_CANONICAL $query_runner->printResults();
# Call printResults which prints to STDOUT (now redirected to the file)
$query_runner->printResults();
#$query_runner->printFooter();

# Restore the original stdout
open(STDOUT, '>&', $original_stdout) or die "Can't restore STDOUT: $!";

# Close the filehandle
close($fh) or die "Failed to close file: $!";
8 changes: 5 additions & 3 deletions cellbase-app/app/scripts/ensembl-scripts/gene_extra_info.pl
@@ -16,7 +16,9 @@
  ####################################################################
  ## Parsing command line options ####################################
  ####################################################################
- # USAGE: ./gene_extra_info.pl --species "Homo sapiens" --outdir ../../appl_db/ird_v1/hsa ...
+ ##docker run -it --mount type=bind,source=/tmp,target=/tmp opencb/cellbase-builder:6.2.0-SNAPSHOT /opt/cellbase/scripts/ensembl-scripts/gene_extra_info.pl -s "Mus musculus" -o /tmp
+
+ # USAGE: ./gene_extra_info.pl --species "Homo sapiens" --assembly "GRCh38" --outdir ../../appl_db/ird_v1/hsa ...

  ## Parsing command line
  GetOptions ('species=s' => \$species, 'assembly=s' => \$assembly, 'outdir=s' => \$outdir, 'phylo=s' => \$phylo,
@@ -50,8 +52,8 @@

  if ($phylo eq "" || $phylo eq "vertebrate") {
  print ("In vertebrates section\n");
- if ($species eq "Homo sapiens" && $assembly eq "GRCh38") {
- print ("Human selected, assembly ".$assembly." selected, connecting to port ".$ENSEMBL_PORT."\n");
+ if ($species eq "Homo sapiens" || $species eq "Mus musculus") {
+ print ($species." selected, assembly ".$assembly." selected, connecting to port ".$ENSEMBL_PORT."\n");
  Bio::EnsEMBL::Registry->load_registry_from_db(
  -host => $ENSEMBL_HOST,
  -user => $ENSEMBL_USER,
32 changes: 12 additions & 20 deletions cellbase-app/app/scripts/ensembl-scripts/genome_info.pl
@@ -17,7 +17,9 @@
  ####################################################################
  ## Parsing command line options ####################################
  ####################################################################
- # USAGE: ./genome_info.pl --species "Homo sapiens" --outfile ../../appl_db/ird_v1/hsa ...
+ ##docker run -it --mount type=bind,source=/tmp,target=/tmp opencb/cellbase-builder:6.2.0-SNAPSHOT /opt/cellbase/scripts/ensembl-scripts/genome_info.pl --species "Mus musculus" --assembly GRCm39 --outfile /tmp
+
+ # USAGE: ./genome_info.pl --species "Homo sapiens" --assembly GRCh38 --outfile ../../appl_db/ird_v1/hsa ...

  ## Parsing command line
  GetOptions ('species=s' => \$species, 'assembly=s' => \$assembly, 'o|outfile=s' => \$outfile, 'phylo=s' => \$phylo,
@@ -29,7 +31,6 @@

  if ($outfile eq "") {
  $outfile = "/ensembl-data/genome_info.json";
- # $outfile = "/ensembl-data/$species.json";
  }

  ####################################################################
@@ -42,17 +43,13 @@
  # Bio::EnsEMBL::Registry->load_all("$ENSEMBL_REGISTRY");
  if($phylo eq "" || $phylo eq "vertebrate") {
  print ("In vertebrates section\n");
- if ($species eq "Homo sapiens" && $assembly eq "GRCh38") {
- print ("Human selected, assembly ".$assembly." selected, connecting to port ".$ENSEMBL_PORT."\n");
- Bio::EnsEMBL::Registry->load_registry_from_db(
- -host => $ENSEMBL_HOST,
- -user => $ENSEMBL_USER,
- -port => $ENSEMBL_PORT,
- -verbose => $verbose
- );
- } else {
- print ("Human selected, assembly ".$assembly." no supported\n");
- }
+ print ("Species: ".$species.", assembly ".$assembly.", connecting to: ".$ENSEMBL_HOST.":".$ENSEMBL_PORT."\n");
+ Bio::EnsEMBL::Registry->load_registry_from_db(
+ -host => $ENSEMBL_HOST,
+ -user => $ENSEMBL_USER,
+ -port => $ENSEMBL_PORT,
+ -verbose => $verbose
+ );
  } else {
  print ("In no-vertebrates section\n");
  Bio::EnsEMBL::Registry->load_registry_from_db(
@@ -64,7 +61,6 @@

  my $slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor($species, "core", "Slice");
  my $karyotype_adaptor = Bio::EnsEMBL::Registry->get_adaptor($species, "core", "KaryotypeBand");
- # my $gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor($species, "core", "Gene");
  ####################################################################

  my %info_stats = ();
@@ -81,12 +77,10 @@
  $chromosome{'start'} = int($chrom->start());
  $chromosome{'end'} = int($chrom->end());
  $chromosome{'size'} = int($chrom->seq_region_length());
- # $chromosome{'numberGenes'} = scalar @{$chrom->get_all_Genes()};
  $chromosome{'isCircular'} = $chrom->is_circular();

  my @cytobands = ();
  foreach my $cyto(@{$karyotype_adaptor->fetch_all_by_chr_name($chrom->seq_region_name)}) {
- # print $cytoband->name."\n";
  my %cytoband = ();
  $cytoband{'name'} = $cyto->name();
  $cytoband{'start'} = int($cyto->start());
@@ -96,7 +90,7 @@
  push(@cytobands, \%cytoband);
  }

- ## check if any cytoband has been added
+ ## Check if any cytoband has been added
  ## If not a unique cytoband covering all chromosome is added.
  if(@cytobands == 0) {
  my %cytoband = ();
@@ -110,7 +104,6 @@
  $chromosome{'cytobands'} = \@cytobands;

  push(@chromosomes, \%chromosome);
- # push(@chrom_ids, $chrom->seq_region_name);
  }
  $info_stats{'chromosomes'} = \@chromosomes;

@@ -124,7 +117,6 @@
  $supercontig{'start'} = int($supercon->start());
  $supercontig{'end'} = int($supercon->end());
  $supercontig{'size'} = int($supercon->seq_region_length());
- # $supercontig{'numberGenes'} = scalar @{$supercon->get_all_Genes()};
  $supercontig{'isCircular'} = $supercon->is_circular();

  ## Adding an unique cytoband covering all chromosome is added.
@@ -151,7 +143,7 @@

  sub print_parameters {
  print "Parameters: ";
- print "species: $species, outfile: $outfile, ";
+ print "species: $species, assembly: $assembly, outfile: $outfile, ";
  print "ensembl-registry: $ENSEMBL_REGISTRY, ";
  print "ensembl-host: $ENSEMBL_HOST, ensembl-port: $ENSEMBL_PORT, ";
  print "ensembl-user: $ENSEMBL_USER, verbose: $verbose, help: $help";
19 changes: 19 additions & 0 deletions cellbase-app/app/scripts/ensembl-scripts/martURLLocation.xml
@@ -0,0 +1,19 @@
<!--
~ Copyright 2015-2020 OpenCB
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<MartRegistry>
<MartURLLocation database="ensembl_mart_111" default="1" displayName="Ensembl Genes 111" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ENSEMBL" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
</MartRegistry>
@@ -66,8 +66,8 @@ public class CommonCommandOptions {
  description = "Set the logging level, accepted values are: debug, info, warn, error and fatal")
  public String logLevel = "info";

- @Parameter(names = {"-C", "--config"}, arity = 1,
- description = "Path to CellBase configuration.yml file")
+ @Deprecated
+ @Parameter(names = {"-C", "--config"}, arity = 1, hidden = true, description = "Path to CellBase configuration.yml file")
  public String conf;
  }

@@ -35,18 +35,12 @@
  import java.nio.file.Path;
  import java.nio.file.Paths;

- /**
- * Created by imedina on 03/02/15.
- */
+
  public abstract class CommandExecutor {

  protected String logLevel;
- // protected boolean verbose;
  protected String conf;

- @Deprecated
- protected String configFile;
-
  protected String appHome;

  protected CellBaseConfiguration configuration;
@@ -55,35 +49,13 @@ public abstract class CommandExecutor {
  protected Logger logger;

  public CommandExecutor() {
-
  }

  public CommandExecutor(String logLevel, String conf) {
  this.logLevel = logLevel;
  this.conf = conf;

- /**
- * System property 'app.home' is set up by cellbase.sh. If by any reason this is null
- * then CELLBASE_HOME environment variable is used instead.
- */
- this.appHome = System.getProperty("app.home", System.getenv("CELLBASE_HOME"));
-
- if (StringUtils.isEmpty(conf)) {
- this.conf = this.appHome + "/conf";
- }
-
- if (logLevel != null && !logLevel.isEmpty()) {
- // We must call to this method
- setLogLevel(logLevel);
- }
- }
-
- public CommandExecutor(String logLevel, boolean verbose, String conf) {
- this.logLevel = logLevel;
- // this.verbose = verbose;
- this.conf = conf;
-
- /**
+ /*
  * System property 'app.home' is set up by cellbase.sh. If by any reason this is null
  * then CELLBASE_HOME environment variable is used instead.
  */
@@ -124,41 +96,30 @@ public void setLogLevel(String logLevel) {
  this.logLevel = logLevel;
  }

- // public boolean isVerbose() {
- // return verbose;
- // }
- //
- // public void setVerbose(boolean verbose) {
- // this.verbose = verbose;
- // }
-
- public String getConfigFile() {
- return configFile;
- }
-
- public void setConfigFile(String configFile) {
- this.configFile = configFile;
- }
-
  public Logger getLogger() {
  return logger;
  }

- /*
+ /**
  * This method attempts to first data configuration from CLI parameter, if not present then uses
  * the configuration from installation directory, if not exists then loads JAR configuration.json or yml.
+ *
+ * @throws URISyntaxException If any URI problem occurs
+ * @throws IOException If any IO problem occurs
  */
  public void loadCellBaseConfiguration() throws URISyntaxException, IOException {
  Path confPath = Paths.get(this.conf);
  FileUtils.checkDirectory(confPath);

  if (Files.exists(confPath.resolve("configuration.json"))) {
  logger.debug("Loading configuration from '{}'", confPath.resolve("configuration.json").toAbsolutePath());
- this.configuration = CellBaseConfiguration.load(new FileInputStream(confPath.resolve("configuration.json").toFile()),
- CellBaseConfiguration.ConfigurationFileFormat.JSON);
+ this.configuration = CellBaseConfiguration
+ .load(Files.newInputStream(confPath.resolve("configuration.json").toFile().toPath()),
+ CellBaseConfiguration.ConfigurationFileFormat.JSON);
  } else if (Files.exists(Paths.get(this.appHome + "/conf/configuration.yml"))) {
  logger.debug("Loading configuration from '{}'", this.appHome + "/conf/configuration.yml");
- this.configuration = CellBaseConfiguration.load(new FileInputStream(new File(this.appHome + "/conf/configuration.yml")));
+ this.configuration = CellBaseConfiguration
+ .load(Files.newInputStream(new File(this.appHome + "/conf/configuration.yml").toPath()));
  } else {
  InputStream inputStream = CellBaseConfiguration.class.getClassLoader().getResourceAsStream("conf/configuration.json");
  String configurationFilePath = "conf/configuration.json";
@@ -198,10 +159,4 @@ public void loadClientConfiguration() throws IOException {
  }
  }
  }
-
- protected void makeDir(Path folderPath) throws IOException {
- if (!Files.exists(folderPath)) {
- Files.createDirectories(folderPath);
- }
- }
  }
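
The lookup order in `loadCellBaseConfiguration()` above is: `configuration.json` in the `conf` directory, then `configuration.yml` under `<app.home>/conf`, then the copy bundled in the JAR. It can be illustrated with a small standalone sketch. This is not the code from this PR: `ConfigLookupSketch`, the `Source` enum and the injectable `exists` predicate are hypothetical names, introduced only so the decision can be exercised without touching the file system.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

public class ConfigLookupSketch {

    /** Where the configuration would be read from (hypothetical labels). */
    enum Source { CONF_DIR_JSON, APP_HOME_YAML, CLASSPATH }

    /**
     * Mirrors the if / else-if / else chain in the diff: the first location
     * that exists wins, and the classpath resource is the last resort.
     */
    static Source resolve(Path confDir, Path appHome, Predicate<Path> exists) {
        if (exists.test(confDir.resolve("configuration.json"))) {
            return Source.CONF_DIR_JSON;
        }
        if (exists.test(appHome.resolve("conf").resolve("configuration.yml"))) {
            return Source.APP_HOME_YAML;
        }
        return Source.CLASSPATH;
    }

    public static void main(String[] args) {
        // Assumption: fall back to the current directory when app.home is unset.
        Path appHome = Paths.get(System.getProperty("app.home", "."));
        Path confDir = appHome.resolve("conf");
        System.out.println(resolve(confDir, appHome, Files::exists));
    }
}
```

Passing `Files::exists` gives the behaviour against the real file system, while a lambda lets a unit test pin down the precedence of the three locations.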