You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello, I am new to DROP. We have a single affected sample and hence wanted to use the publicly available control dataset of external counts. First I wanted to test the pipeline to see if I have setup the input files properly(using version 1.4.0).
This is my current sample annotation file taken from https://doi.org/10.5281/zenodo.3963470:
projectTitle: "DROP: Detection of RNA Outliers Pipeline"
root: /projectnb/4kajdatx/drop_project/Output # root directory of all output objects and tables
htmlOutputPath: Output/html # path for HTML rendered reports
indexWithFolderName: true # whether the root base name should be part of the index name
hpoFile: null # if null, downloads it from webserver
sampleAnnotation: /projectnb/4kajdatx/drop_project/Data/sample_annotation.tsv # path to sample annotation (see documentation on how to create it)
geneAnnotation:
gencode34: /projectnb/4kajdatx/drop_project/Data/gencode.v34lift37.annotation.gtf
genomeAssembly: hg19
genome:
hg19: /projectnb/4kajdatx/drop_project/Data/GRCh37.primary_assembly.genome.fa # path to reference genome sequence in fasta format.
# You can define multiple reference genomes in yaml format, ncbi: path/to/ncbi, ucsc: path/to/ucsc
# the keywords that define the path should be in the GENOME column of the sample annotation table
random_seed: true # just for demo data, remove for analysis
exportCounts:
# specify which gene annotations to include and which
# groups to exclude when exporting counts
geneAnnotations:
- gencode34
excludeGroups:
aberrantExpression:
run: true
groups:
- outrider
fpkmCutoff: 1
implementation: autoencoder
padjCutoff: 0.05
zScoreCutoff: 0
genesToTest: null
maxTestedDimensionProportion: 3
yieldSize: 2000000
aberrantSplicing:
run: false
groups:
- fraser
recount: false
longRead: false
keepNonStandardChrs: false
filter: true
minExpressionInOneSample: 20
quantileMinExpression: 10
minDeltaPsi: 0.05
implementation: PCA
padjCutoff: 0.1
maxTestedDimensionProportion: 6
genesToTest: null
### FRASER1 configuration
FRASER_version: "FRASER"
deltaPsiCutoff : 0.3
quantileForFiltering: 0.95
### For FRASER2, use the follwing parameters instead of the 3 lines above:
# FRASER_version: "FRASER2"
# deltaPsiCutoff : 0.1
# quantileForFiltering: 0.75
mae:
run: false
groups:
"config.yaml" 100L, 2981C
When I run snakemake aberrantExpression --cores 1, I get this error:
Error in rowRanges<-(*tmp*, value = new("CompressedGRangesList", unlistData = new("GRanges", :
nb of rows in 'assay' (62455) must equal nb of rows in 'rowData' (62492)
Calls: rowRanges<- -> rowRanges<-
Execution halted
[Mon Oct 28 11:47:03 2024]
Error in rule AberrantExpression_pipeline_Counting_mergeCounts_R:
jobid: 4
input: /projectnb/4kajdatx/drop_project/Data/control_counts/geneCounts.tsv.gz, /projectnb/4kajdatx/drop_project/Output/processed_data/aberrant_expression/gencode34/count_ranges.Rds, /projectnb/4kajdatx/drop_project/Output/processed_data/aberrant_expression/gencode34/params/merge/outrider_mergeParams.csv, Scripts/AberrantExpression/pipeline/Counting/mergeCounts.R
output: /projectnb/4kajdatx/drop_project/Output/processed_data/aberrant_expression/gencode34/outrider/outrider/total_counts.Rds
log: /projectnb/4kajdatx/drop_project/.drop/tmp/AE/gencode34/outrider/merge.Rds (check log file(s) for error details)
RuleException:
CalledProcessError in file /tmp/tmpn8qsm0fb, line 48:
Command 'set -euo pipefail; Rscript --vanilla /projectnb/4kajdatx/drop_project/.snakemake/scripts/tmpo5sn0bp9.mergeCounts.R' returned non-zero exit status 1.
File "/tmp/tmpn8qsm0fb", line 48, in __rule_AberrantExpression_pipeline_Counting_mergeCounts_R
File "/projectnb/4kajdatx/miniconda3/envs/drop_env/lib/python3.12/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-28T114347.025705.snakemake.log
I am not sure what I am doing wrong. I would greatly appreciate your guidance in this.
Best,
Madesh
The text was updated successfully, but these errors were encountered:
So apparently the annotation file has more genes than the counts file and the excess transcripts had to be removed? I had to edit countsRanges.Rds manually and rerun from the incomplete step. Why does the pipeline expect the same rows in both?
This is what I did to resolve the issue -
In R,
Load geneCounts.tsv.gz:
control_counts <- read.table("/projectnb/4kajdatx/drop_project/Data/control_counts/geneCounts.tsv.gz", header = TRUE,sep = "\t")
control_gene_ids <- unique(control_counts$geneID)
# Load count_ranges
count_ranges <- readRDS("/projectnb/4kajdatx/drop_project/Output/processed_data/aberrant_expression/gencode34/count_ranges.Rds")
# Filter count_ranges based on control_gene_ids
count_ranges_filtered <- count_ranges[names(count_ranges) %in% control_gene_ids]
# Save the filtered count_ranges
saveRDS(count_ranges_filtered, "/projectnb/4kajdatx/drop_project/Output/processed_data/aberrant_expression/gencode34/count_ranges.Rds")
Then I ran : snakemake aberrantExpression --rerun-incomplete --cores 1 again
Hello, I am new to DROP. We have a single affected sample and hence wanted to use the publicly available control dataset of external counts. First I wanted to test the pipeline to see if I have setup the input files properly(using version 1.4.0).
This is my current sample annotation file taken from https://doi.org/10.5281/zenodo.3963470:
The gene annotation file I use is from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/GRCh37_mapping/gencode.v34lift37.annotation.gtf.gz
This is my config.yaml:
When I run snakemake aberrantExpression --cores 1, I get this error:
I am not sure what I am doing wrong. I would greatly appreciate your guidance in this.
Best,
Madesh
The text was updated successfully, but these errors were encountered: