diff --git a/README.md b/README.md index 63f1acbb..38c04ce4 100644 --- a/README.md +++ b/README.md @@ -3,27 +3,31 @@ [![Version](https://img.shields.io/github/v/release/gagneurlab/drop?include_prereleases)](https://github.com/gagneurlab/drop/releases) [![Version](https://readthedocs.org/projects/gagneurlab-drop/badge/?version=latest)](https://gagneurlab-drop.readthedocs.io/en/latest) -The detection of RNA Outliers Pipeline (DROP) is an integrative workflow to detect aberrant expression, aberrant splicing, and mono-allelic expression from raw sequencing data. + +The detection of RNA Outliers Pipeline (DROP) is an integrative workflow to detect aberrant expression, aberrant splicing, and mono-allelic expression from raw RNA sequencing data. The manuscript is available in [Nature Protocols](https://www.nature.com/articles/s41596-020-00462-5). -The website containing the different reports of the Geuvadis demo dataset described in the paper can be found [here](https://cmm.in.tum.de/public/paper/drop_analysis/webDir/drop_analysis_index.html). +This [website](https://cmm.in.tum.de/public/paper/drop_analysis/webDir/drop_analysis_index.html) contains the different reports of the Geuvadis demo dataset described in the paper. + +This [video](https://www.youtube.com/watch?v=XvgjiFQClhM&t=2761s) presents the tools used in DROP and their application to rare disease diagnostics. -This [video](https://www.youtube.com/watch?v=XvgjiFQClhM&t=2761s) introduces the tools used in DROP and their application to rare disease diagnostics. drop logo ## Quickstart DROP is available on [bioconda](https://anaconda.org/bioconda/drop). -We recommend using a dedicated conda environment (`drop_env` in this example). Installation time: ~ 10min. +We recommend using a dedicated conda environment (`drop_env` in this example). Installation time: ~ 15 min. ``` mamba create -n drop_env -c conda-forge -c bioconda drop --override-channels ``` -In the case of mamba/conda troubles, we recommend using the fixed `DROP_.yaml` installation file we make available on our [public server](https://www.cmm.in.tum.de/public/paper/drop_analysis/). Install the current version and use the full path in the following command to install the conda environment `drop_env` + +In the case troubles with mamba or conda, we recommend using the fixed `DROP_.yaml` installation file we make available on our [public server](https://www.cmm.in.tum.de/public/paper/drop_analysis/). Install the current version and use the full path in the following command to install the conda environment `drop_env` ``` -mamba env create -f DROP_1.3.4.yaml +mamba env create -f DROP_1.4.0.yaml + ``` Test installation with demo project @@ -47,6 +51,8 @@ For more information on different installation options, refer to the ## What's new + +Version 1.4.0 fixes some bugs regarding the split reads counting for FRASER, which only affects cohorts containing samples with different strands (stranded forward, stranded reverse or unstranded). If your cohort contained samples with different strands, please rerun the AS module using version 1.4.0. In addition, due to snakemake updates affecting wBuild and the way we installed FRASER, installing DROP 1.3.3 no longer works. Version 1.3.4 fixes the FRASER version to ensure reproducibility and fixes certain scripts affected by the snakemake update. Running the pipeline with version 1.3.4 should provide the same outlier results as 1.3.3. Due to snakemake updates affecting wBuild and the way we installed FRASER, installing DROP 1.3.3 no longer works. Version 1.3.4 simply fixes the FRASER version to ensure reproducibility and fixes certain scripts affected by the snakemake update. Running the pipeline with the version 1.3.4 should provide the same outlier results as 1.3.3. Version 1.3.0 introduces the option to use FRASER 2.0 which is an improved version of FRASER that uses the Intron Jaccard Index metric instead of percent spliced in and splicing efficiency to quantify and later call aberrant splicing. To run FRASER 2.0, modify the `FRASER_version` parameter in the aberrantSplicing dictionary in the config file and adapt the `quantileForFiltering` and `deltaPsiCutoff` parameters. See the [config template](https://github.com/gagneurlab/drop/blob/master/drop/template/config.yaml) for more details. When switching between FRASER versions, we recommend running DROP in a @@ -56,7 +62,9 @@ separate folder for each version. Moreover, DROP now allows users to provide lis As of version 1.2.1 DROP has a new module that performs RNA-seq variant calling. The input are BAM files and the output either a single-sample or a multi-sample VCF file (option specified by the user) annotated with allele frequencies from gnomAD (if specified by the user). The sample annotation table does not need to be changed, but several new parameters in the config file have to be added and tuned. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#rna-variant-calling-dictionary). -Also, as of version 1.2.1 the integration of external split and non-split counts to detect aberrant splicing is now possible. Simply specify in a new column in the sample annotation the directory containing the counts. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#external-count-examples). + +Also, as of version 1.2.1 the integration of external split and non-split counts to detect aberrant splicing is now possible. In a new column in the sample annotation, simply specify the directory containing the counts. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#external-count-examples). + ## Set up a custom project diff --git a/docs/source/conf.py b/docs/source/conf.py index 749be82b..e675c861 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -23,7 +23,8 @@ author = 'Michaela Müller' # The full version, including alpha/beta/rc tags -release_ = '1.3.4' +release_ = '1.4.0' + diff --git a/docs/source/installation.rst b/docs/source/installation.rst index ad120274..002defe1 100644 --- a/docs/source/installation.rst +++ b/docs/source/installation.rst @@ -20,7 +20,8 @@ Install the latest version and use the full path in the following command to ins .. code-block:: bash - mamba env create -f DROP_1.3.4.yaml + mamba env create -f DROP_1.4.0.yaml + Installation time: ~ 10min diff --git a/drop/__init__.py b/drop/__init__.py index 95a02cb1..24621ccd 100644 --- a/drop/__init__.py +++ b/drop/__init__.py @@ -4,5 +4,5 @@ from . import utils from . import demo -__version__ = "1.3.4" +__version__ = "1.4.0" diff --git a/drop/cli.py b/drop/cli.py index 0795ea87..4a788f38 100644 --- a/drop/cli.py +++ b/drop/cli.py @@ -17,7 +17,8 @@ @click.group() @click_log.simple_verbosity_option(logger) -@click.version_option('1.3.4',prog_name='drop') + +@click.version_option('1.4.0',prog_name='drop') def main(): diff --git a/drop/config/SampleAnnotation.py b/drop/config/SampleAnnotation.py index f6f7911b..4b9a9190 100644 --- a/drop/config/SampleAnnotation.py +++ b/drop/config/SampleAnnotation.py @@ -47,6 +47,14 @@ def parse(self, sep='\t'): optional_columns = {"GENE_COUNTS_FILE", "SPLICE_COUNTS_DIR", "GENE_ANNOTATION", "GENOME"} annotationTable = pd.read_csv(self.file, sep=sep, index_col=False) + + # FRASER cannot handle a mixture of stranded and unstranded samples, raise error in such cases + if (annotationTable['STRAND'] == 'no').any() & ((annotationTable['STRAND'] == 'reverse').any() | (annotationTable['STRAND'] == 'yes').any()): + raise ValueError("Data contains a mix of stranded and unstranded samples. Please analyze them separately.\n") + + if annotationTable['STRAND'].isnull().values.any(): + raise ValueError("STRAND is not provided for some samples. All samples should have STRAND value in the sample annotation file.\n") + missing_cols = [x for x in self.SAMPLE_ANNOTATION_COLUMNS if x not in annotationTable.columns.values] if len(missing_cols) > 0: if "GENE_ANNOTATION" in missing_cols and "ANNOTATION" in annotationTable.columns.values: diff --git a/drop/demo/sample_annotation_relative.tsv b/drop/demo/sample_annotation_relative.tsv index 92295724..2a1f248c 100644 --- a/drop/demo/sample_annotation_relative.tsv +++ b/drop/demo/sample_annotation_relative.tsv @@ -1,15 +1,15 @@ RNA_ID RNA_BAM_FILE DNA_VCF_FILE DNA_ID DROP_GROUP PAIRED_END COUNT_MODE COUNT_OVERLAPS STRAND HPO_TERMS GENE_COUNTS_FILE GENE_ANNOTATION GENOME SPLICE_COUNTS_DIR -HG00096 Data/rna_bam/HG00096_ncbi.bam Data/dna_vcf/demo_chr21_ncbi.vcf.gz HG00096 outrider,fraser,mae,batch_0 True IntersectionStrict True no HP:0009802,HP:0010896 ncbi -HG00103 Data/rna_bam/HG00103.bam Data/dna_vcf/demo_chr21.vcf.gz HG00103 outrider,fraser,mae,batch_1 True IntersectionStrict True no HP:0004582,HP:0031959 ucsc -HG00106 Data/rna_bam/HG00106.bam Data/dna_vcf/demo_chr21.vcf.gz HG00106 outrider,outrider_external,fraser,fraser_external,mae,batch_1 True IntersectionStrict True no HP:0002895,HP:0006731 ucsc -HG00111 Data/rna_bam/HG00111.bam Data/dna_vcf/demo_chr21.vcf.gz HG00111 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0100491,HP:0100871 -HG00116 Data/rna_bam/HG00116.bam Data/dna_vcf/demo_chr21.vcf.gz HG00116 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0030613,HP:0012767 -HG00126 Data/rna_bam/HG00126.bam Data/dna_vcf/demo_chr21.vcf.gz HG00126 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0000290,HP:0000293 -HG00132 Data/rna_bam/HG00132.bam Data/dna_vcf/demo_chr21.vcf.gz HG00132 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0006489,HP:0006490 -HG00149 Data/rna_bam/HG00149.bam Data/dna_vcf/demo_chr21.vcf.gz HG00149 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0000014,HP:0000020,HP:0032663 -HG00150 Data/rna_bam/HG00150.bam Data/dna_vcf/demo_chr21.vcf.gz HG00150 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0030809,HP:0006144 -HG00176 Data/rna_bam/HG00176.bam Data/dna_vcf/demo_chr21.vcf.gz HG00176 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0005215,HP:0010234 -HG00178 outrider_external Data/external_count_data/geneCounts.tsv.gz v29 -HG00181 outrider_external Data/external_count_data/geneCounts.tsv.gz v29 -HG00191 fraser_external Data/external_count_data -HG00201 fraser_external Data/external_count_data +HG00096 Data/rna_bam/HG00096_ncbi.bam Data/dna_vcf/demo_chr21_ncbi.vcf.gz HG00096 outrider,fraser,mae,batch_0 TRUE IntersectionStrict TRUE no HP:0009802,HP:0010896 ncbi +HG00103 Data/rna_bam/HG00103.bam Data/dna_vcf/demo_chr21.vcf.gz HG00103 outrider,fraser,mae,batch_1 TRUE IntersectionStrict TRUE no HP:0004582,HP:0031959 ucsc +HG00106 Data/rna_bam/HG00106.bam Data/dna_vcf/demo_chr21.vcf.gz HG00106 outrider,outrider_external,fraser,fraser_external,mae,batch_1 TRUE IntersectionStrict TRUE no HP:0002895,HP:0006731 ucsc +HG00111 Data/rna_bam/HG00111.bam Data/dna_vcf/demo_chr21.vcf.gz HG00111 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0100491,HP:0100871 +HG00116 Data/rna_bam/HG00116.bam Data/dna_vcf/demo_chr21.vcf.gz HG00116 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0030613,HP:0012767 +HG00126 Data/rna_bam/HG00126.bam Data/dna_vcf/demo_chr21.vcf.gz HG00126 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0000290,HP:0000293 +HG00132 Data/rna_bam/HG00132.bam Data/dna_vcf/demo_chr21.vcf.gz HG00132 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0006489,HP:0006490 +HG00149 Data/rna_bam/HG00149.bam Data/dna_vcf/demo_chr21.vcf.gz HG00149 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0000014,HP:0000020,HP:0032663 +HG00150 Data/rna_bam/HG00150.bam Data/dna_vcf/demo_chr21.vcf.gz HG00150 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0030809,HP:0006144 +HG00176 Data/rna_bam/HG00176.bam Data/dna_vcf/demo_chr21.vcf.gz HG00176 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0005215,HP:0010234 +HG00178 outrider_external no Data/external_count_data/geneCounts.tsv.gz v29 +HG00181 outrider_external no Data/external_count_data/geneCounts.tsv.gz v29 +HG00191 fraser_external no Data/external_count_data +HG00201 fraser_external no Data/external_count_data diff --git a/drop/installRPackages.R b/drop/installRPackages.R index 6635ce61..42148f58 100644 --- a/drop/installRPackages.R +++ b/drop/installRPackages.R @@ -23,20 +23,18 @@ if (file.exists(args[1])){ package=gsub("=.*", "", unlist(args)), version=gsub(".*=", "", unlist(args)), ref="") - packages[package == version, c("min_version", "max_version") := ""] + packages[package == version, version:=NA] } installed <- as.data.table(installed.packages()) for (pckg_name in packages$package) { package_dt <- packages[package == pckg_name] pckg_name <- gsub(".*/", "", pckg_name) - min_version <- package_dt$min_version - max_version <- package_dt$max_version + version <- package_dt$version branch <- package_dt$ref - if (!pckg_name %in% installed$Package || (min_version != "" && (compareVersion( - installed[Package == pckg_name, Version], min_version) < 0 || compareVersion( - installed[Package == pckg_name, Version], max_version) > 0))) { + if (!pckg_name %in% installed$Package || (!is.na(version) && compareVersion( + installed[Package == pckg_name, Version], version) < 0)) { package <- package_dt$package message(paste("install", package)) diff --git a/drop/modules/aberrant-splicing-pipeline/Counting/01_0_countRNA_init.R b/drop/modules/aberrant-splicing-pipeline/Counting/01_0_countRNA_init.R index 41e2ae5f..7b4117b5 100644 --- a/drop/modules/aberrant-splicing-pipeline/Counting/01_0_countRNA_init.R +++ b/drop/modules/aberrant-splicing-pipeline/Counting/01_0_countRNA_init.R @@ -29,16 +29,15 @@ params <- snakemake@config$aberrantSplicing # Create initial FRASER object col_data <- fread(colDataFile) +col_data$strand <- 0L + fds <- FraserDataSet(colData = col_data, workingDir = workingDir, name = paste0("raw-local-", dataset)) # Add paired end and strand specificity to the fds pairedEnd(fds) <- colData(fds)$PAIRED_END -strandSpecific(fds) <- 'no' -if(uniqueN(colData(fds)$STRAND) == 1){ - strandSpecific(fds) <- unique(colData(fds)$STRAND) -} +strandSpecific(fds) <- colData(fds)$STRAND # Save initial FRASER dataset fds <- saveFraserDataSet(fds) diff --git a/drop/modules/aberrant-splicing-pipeline/Counting/01_1_countRNA_splitReads_samplewise.R b/drop/modules/aberrant-splicing-pipeline/Counting/01_1_countRNA_splitReads_samplewise.R index 36eeb798..04d7cb78 100644 --- a/drop/modules/aberrant-splicing-pipeline/Counting/01_1_countRNA_splitReads_samplewise.R +++ b/drop/modules/aberrant-splicing-pipeline/Counting/01_1_countRNA_splitReads_samplewise.R @@ -35,7 +35,7 @@ sample_id <- snakemake@wildcards[["sample_id"]] # If data is not strand specific, add genome info genome <- NULL -if(strandSpecific(fds) == 0){ +if(strandSpecific(fds[, sample_id]) == 0){ genome <- getBSgenome(genomeAssembly) } diff --git a/drop/modules/aberrant-splicing-pipeline/Counting/03_filter_expression_FraseR.R b/drop/modules/aberrant-splicing-pipeline/Counting/03_filter_expression_FraseR.R index ae64b323..683495b4 100644 --- a/drop/modules/aberrant-splicing-pipeline/Counting/03_filter_expression_FraseR.R +++ b/drop/modules/aberrant-splicing-pipeline/Counting/03_filter_expression_FraseR.R @@ -53,6 +53,9 @@ if(length(exCountIDs) > 0){ for(resource in unique(exCountFiles)){ exSampleIDs <- exCountIDs[exCountFiles == resource] exAnno <- fread(sample_anno_file, key="RNA_ID")[J(exSampleIDs)] + exAnno[, strand:=exAnno$STRAND] + exAnno$strand<- sapply(exAnno[, strand], switch, 'no' = 0L, 'unstranded' = 0L, + 'yes' = 1L, 'stranded' = 1L, 'reverse' = 2L, -1L) setnames(exAnno, "RNA_ID", "sampleID") ctsNames <- c("k_j", "k_theta", "n_psi3", "n_psi5", "n_theta") diff --git a/drop/modules/aberrant-splicing-pipeline/FRASER/08_extract_results_FraseR.R b/drop/modules/aberrant-splicing-pipeline/FRASER/08_extract_results_FraseR.R index 87dba6ca..81dcb665 100644 --- a/drop/modules/aberrant-splicing-pipeline/FRASER/08_extract_results_FraseR.R +++ b/drop/modules/aberrant-splicing-pipeline/FRASER/08_extract_results_FraseR.R @@ -54,26 +54,10 @@ print('Results per junction extracted') # Add features if(nrow(res_junc_dt) > 0){ - - # dcast to have one row per outlier - subsets <- res_junc_dt[, unique(FDR_set)] - subsets <- subsets[subsets != "transcriptome-wide"] - colorder <- colnames(res_junc_dt[, !"FDR_set", with=FALSE]) - res_junc_dt <- dcast(res_junc_dt, ... ~ FDR_set, value.var="padjust") - setnames(res_junc_dt, "transcriptome-wide", "padjust") - for(subset_name in subsets){ - setnames(res_junc_dt, subset_name, paste0("padjust_", subset_name)) - } - setcolorder(res_junc_dt, colorder) - # number of samples per gene and variant res_junc_dt[, numSamplesPerGene := uniqueN(sampleID), by = hgncSymbol] res_junc_dt[, numEventsPerGene := .N, by = "hgncSymbol,sampleID"] res_junc_dt[, numSamplesPerJunc := uniqueN(sampleID), by = "seqnames,start,end,strand"] - - # add colData to the results - res_junc_dt <- merge(res_junc_dt, as.data.table(colData(fds)), by = "sampleID") - res_junc_dt[, c("bamFile", "pairedEnd", "STRAND", "RNA_BAM_FILE", "DNA_VCF_FILE", "COUNT_MODE", "COUNT_OVERLAPS") := NULL] } else{ warning("The aberrant splicing pipeline gave 0 intron-level results for the ", dataset, " dataset.") } @@ -83,15 +67,6 @@ res_gene <- results(fds, psiType=psiTypes, aggregate=TRUE, collapse=FALSE, all=TRUE) res_genes_dt <- as.data.table(res_gene) -subsets <- res_genes_dt[, unique(FDR_set)] -subsets <- subsets[subsets != "transcriptome-wide"] -colorder <- colnames(res_genes_dt[, !c("padjust", "FDR_set"), with=FALSE]) -res_genes_dt <- dcast(res_genes_dt[,!"padjust", with=FALSE], ... ~ FDR_set, value.var="padjustGene") -setnames(res_genes_dt, "transcriptome-wide", "padjustGene") -for(subset_name in subsets){ - setnames(res_genes_dt, subset_name, paste0("padjustGene_", subset_name)) -} -setcolorder(res_genes_dt, colorder) print('Results per gene extracted') write_tsv(res_genes_dt, file=snakemake@output$resultTableGene_full) @@ -103,9 +78,6 @@ res_genes_dt <- res_genes_dt[do.call(pmin, c(res_genes_dt[,padj_cols, with=FALSE totalCounts >= 5,] if(nrow(res_genes_dt) > 0){ - res_genes_dt <- merge(res_genes_dt, as.data.table(colData(fds)), by = "sampleID") - res_genes_dt[, c("bamFile", "pairedEnd", "STRAND", "RNA_BAM_FILE", "DNA_VCF_FILE", "COUNT_MODE", "COUNT_OVERLAPS") := NULL] - # add HPO overlap information sa <- fread(snakemake@config$sampleAnnotation, colClasses = c(RNA_ID = 'character', DNA_ID = 'character')) diff --git a/drop/requirementsR.txt b/drop/requirementsR.txt index 6461207f..bd4a3a8b 100644 --- a/drop/requirementsR.txt +++ b/drop/requirementsR.txt @@ -1,8 +1,8 @@ -package min_version max_version ref +package version ref devtools -gagneurlab/OUTRIDER 1.20.1 1.20.1 1.20.1 -gagneurlab/FRASER 1.99.3 1.99.3 d6a422c -gagneurlab/tMAE 1.0.4 1.0.4 1.0.4 +gagneurlab/OUTRIDER 1.20.1 1.20.1 +gagneurlab/FRASER 1.99.4 1.99.4 +gagneurlab/tMAE 1.0.4 1.0.4 VariantAnnotation rmarkdown knitr diff --git a/setup.cfg b/setup.cfg index ae813b4e..2548d125 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,5 +1,6 @@ [bumpversion] -current_version = 1.3.4 + +current_version = 1.4.0 commit = True [bumpversion:file:setup.py] diff --git a/setup.py b/setup.py index 9499792c..eefe3a86 100644 --- a/setup.py +++ b/setup.py @@ -21,7 +21,8 @@ setuptools.setup( name="drop", - version="1.3.4", + version="1.4.0", + author="Vicente A. Yépez, Michaela Müller, Nicholas H. Smith, Daniela Klaproth-Andrade, Luise Schuller, Ines Scheller, Christian Mertes , Julien Gagneur ", author_email="yepez@in.tum.de",