drop demo fails at rnaVariantCalling/pipeline/Snakefile line 446 #477

Open
nicholas-owen opened this issue Jun 20, 2023 · 18 comments

Comments

@nicholas-owen

nicholas-owen commented Jun 20, 2023

Hi, I am running DROP version 1.3.3 on the demo data with the command snakemake --cores 1, but I get the error:

Waiting at most 5 seconds for missing files.
MissingOutputException in rule bqsr in file /mnt/f/dropdemo/Scripts/rnaVariantCalling/pipeline/Snakefile, line 446:
Job 158  completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
/mnt/f/dropdemo/Output/processed_data/rnaVariantCalling/out/bqsr/HG00096_recal.table
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-06-20T223528.085086.snakemake.log

I have attached the log:
2023-06-20T223528.085086.snakemake.log

Thanks for looking into this.

Edit: I increased --latency-wait to 120 and it didn't change the outcome.

@nickhsmith
Collaborator

nickhsmith commented Jun 21, 2023

This is directly trying to run the demo? Can you please try to run the following?

snakemake -F -c1 /mnt/f/dropdemo/Output/processed_data/rnaVariantCalling/out/bqsr/HG00096_recal.table

and see how it goes? This tries to build just this one file, which should run everything up to that point.
Can you also please share the file /mnt/f/dropdemo/Output/processed_data/rnaVariantCalling/logs/bqsr/HG00096.log or copy and paste its contents?

@nicholas-owen
Author

nicholas-owen commented Jun 22, 2023

Hi, yes, this is just running drop demo.

I ran the command:
snakemake -F -c1 /mnt/f/dropdemo/Output/processed_data/rnaVariantCalling/out/bqsr/HG00096_recal.table

and got the error:

Structuring dependencies...
Dependencies file generated at: /tmp/tmp4gegkffg

Building DAG of jobs...
MissingRuleException:
No rule to produce /mnt/f/dropdemo/Output/processed_data/rnaVariantCalling/out/bqsr/HG00096_recal.tabl (if you use input functions make sure that they don't raise unexpected exceptions).

I have attached the log, but this file wasn't made today; it is dated 20/6/2023.
Thanks!
HG00096.log

@nickhsmith
Collaborator

nickhsmith commented Jun 22, 2023

First off, sorry I am unable to replicate your error with a fresh demo on my machine.

It also seems like there is a typo. Is there a chance the path wasn't copied and pasted correctly? The target in the error ends in .tabl rather than .table:

No rule to produce /mnt/f/dropdemo/Output/processed_data/rnaVariantCalling/out/bqsr/HG00096_recal.tabl (if you use input functions make sure that they don't raise unexpected exceptions).

Also, looking at your log file, it seems like there is a strange input error; maybe the data download got corrupted or altered by mistake. Can you please try to start fresh from a new demo directory? Alternatively, look through the rnaVariantCalling logs and check whether the previous steps produced clean log files.

cd /mnt/f/dropdemo
mkdir new_demo
cd new_demo
drop demo

snakemake -c1 rnaVariantCalling

You could also try (although I'm guessing this isn't the problem) making sure your conda build is up to date: conda update gatk4

@nicholas-owen
Author

Many thanks for the suggestions. I have created a new demo dir and downloaded the demo files all ok.

I ran snakemake -c1 rnaVariantCalling and got the error:

[June 23, 2023 at 10:42:26 PM BST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=127926272
java.lang.NumberFormatException: For input string: "AT"
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
        at java.base/java.lang.Integer.parseInt(Integer.java:668)
        at java.base/java.lang.Integer.parseInt(Integer.java:786)
        at htsjdk.tribble.readers.TabixReader.getIntv(TabixReader.java:337)
        at htsjdk.tribble.readers.TabixReader.access$500(TabixReader.java:48)
        at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:438)
        at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
        at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
        at htsjdk.tribble.TabixFeatureReader$FeatureIterator.<init>(TabixFeatureReader.java:159)
        at htsjdk.tribble.TabixFeatureReader.query(TabixFeatureReader.java:133)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:622)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:591)
        at org.broadinstitute.hellbender.engine.FeatureManager.getFeatures(FeatureManager.java:363)
        at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:173)
        at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:125)
        at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:263)
        at org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator.apply(BaseRecalibrator.java:189)
        at org.broadinstitute.hellbender.engine.ReadWalker.lambda$traverse$0(ReadWalker.java:100)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:98)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
Waiting at most 5 seconds for missing files.
MissingOutputException in rule bqsr in file /mnt/f/dropdemo/new_demo/Scripts/rnaVariantCalling/pipeline/Snakefile, line 446:
Job 19  completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
/mnt/f/dropdemo/new_demo/Output/processed_data/rnaVariantCalling/out/bqsr/HG00096_recal.table
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-06-23T224155.969947.snakemake.log

2023-06-23T224155.969947.snakemake.log
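
The stack trace above fails inside htsjdk's TabixReader while parsing an integer from the string "AT". One possible reading, not confirmed anywhere in this thread, is that a tabix-indexed input used by BaseRecalibrator (for example the known-sites VCF) has a non-numeric value where a coordinate is expected, or that its index no longer matches the file. A minimal sketch of a check for the first case follows; the function name check_vcf_pos and the example path are hypothetical.

```shell
# Report VCF data lines whose POS column (field 2) is not an unsigned integer.
# Reads uncompressed VCF text on stdin, prints "line_number<TAB>bad_value"
# for each offending line, and exits non-zero if any were found.
check_vcf_pos() {
    awk -F '\t' '
        /^#/ { next }                                   # skip header lines
        $2 !~ /^[0-9]+$/ { print NR "\t" $2; bad = 1 }  # POS must be digits only
        END { exit (bad ? 1 : 0) }
    '
}

# Example (hypothetical path; use whichever VCF your config points at):
#   gzip -dc path/to/known_sites.vcf.gz | check_vcf_pos
```

If the file itself looks clean, the index may simply be stale, and rebuilding it with tabix -f -p vcf on the same file is a cheap thing to try.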

@nickhsmith
Collaborator

How does the previous log file look?

/mnt/f/dropdemo/new_demo/Output/processed_data/rnaVariantCalling/logs/splitNcigar/HG00096.log

It's hard for me to recommend specific fixes since I can't replicate the problem myself. I'd guess there is some problem with the installation at some point. You could try to make sure that your conda environment is built correctly. We have a working yaml file for download, available here:

https://www.cmm.in.tum.de/public/paper/drop_analysis/DROP_1.3.3.yaml

You can then create a conda environment (called drop_env_133) with the following command:

mamba env create -f DROP_1.3.3.yaml

And subsequently retry the demo command using

conda deactivate
mamba activate drop_env_133
cd /mnt/f/dropdemo/new_demo/
drop update
snakemake -c1 -F rnaVariantCalling

I'm sorry you are having difficulty.

@nicholas-owen
Author

Thanks for helping out as much as you can 👍
The previous log is attached.

I will remake the env and see if anything crops up. I had problems with mamba, so I used conda; not sure if that created a problem.

Cheers

HG00096.log

@nickhsmith
Collaborator

One thing you can also try within that existing conda environment is to test whether gatk is working properly. Run gatk --version and see if that gives errors.

@nickhsmith
Collaborator

nickhsmith commented Jul 3, 2023

@nicholas-owen Any luck or updates on the installation?

@gaynora7

gaynora7 commented Jul 5, 2023

Hi! I wasn't sure if I should make this a new issue. If so, I will gladly open a new one!

I seem to be having the same issue as nicholas-owen, but only with my own data. I called rnaVariantCalling with the demo data, and it worked great.

I called snakemake rnaVariantCalling --cores 8 with my data (sample annotation file and config.yaml attached) and I got the following error:

rule changeHeader:
input: /vcu_gpfs2/home/gaynora/BAM/GRCh38/N_1037_RAligned.sortedByCoord.out.bam, /vcu_gpfs2/home/gaynora/BAM/GRCh38/N_1037_RAligned.sortedByCoord.out.bam.bai, /vcu_gpfs2/home/gaynora/DROP/udx_1037/Scripts/rnaVariantCalling/pipeline/GATK_BASH/changeHeader.sh
output: /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam, /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam.bai, /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_newDropHeader.txt
log: /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/logs/changeHeader/N_1037.log
jobid: 16
reason: Missing output files: /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam, /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam.bai
wildcards: sample=N_1037
resources: tmpdir=/tmp/134220.1.all.q

Waiting at most 360 seconds for missing files.
MissingOutputException in rule changeHeader in file /vcu_gpfs2/home/gaynora/DROP/udx_1037/Scripts/rnaVariantCalling/pipeline/Snakefile, line 617:
Job 16 completed successfully, but some output files are missing. Missing files after 360 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
/vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam
/vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam.bai
/vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_newDropHeader.txt
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

I changed the latency wait to 360 by adding --latency-wait 360 to my command, and I still got the same error. I am wondering whether there is an issue with the vcf, dbSNP, or repeat masker files I was using (as seen in config.yaml). I downloaded them all from the "files-to-download" page of the DROP documentation.

Not sure what is going on. Thank you for your help, much appreciated!

Best,

Ali

config.yaml.txt

sample_annotation.xlsx

@nickhsmith
Collaborator

Can you please post the file /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/logs/changeHeader/N_1037.log or its contents?

Thanks

@gaynora7

gaynora7 commented Jul 5, 2023

The log exists but it's empty!

@nickhsmith
Collaborator

Can you try running just the changeHeader job?

snakemake -c1 /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam

@gaynora7

gaynora7 commented Jul 5, 2023

Hi,

I executed snakemake -c1 /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam and got the following error:

rule changeHeader:
input: /vcu_gpfs2/home/gaynora/BAM/GRCh38/N_1037_RAligned.sortedByCoord.out.bam, /vcu_gpfs2/home/gaynora/BAM/GRCh38/N_1037_RAligned.sortedByCoord.out.bam.bai, /vcu_gpfs2/home/gaynora/DROP/udx_1037/Scripts/rnaVariantCalling/pipeline/GATK_BASH/changeHeader.sh
output: /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam, /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam.bai, /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_newDropHeader.txt
log: /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/logs/changeHeader/N_1037.log
jobid: 0
reason: Missing output files: /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam
wildcards: sample=N_1037
resources: tmpdir=/tmp/134250.1.all.q

Waiting at most 5 seconds for missing files.
MissingOutputException in rule changeHeader in file /vcu_gpfs2/home/gaynora/DROP/udx_1037/Scripts/rnaVariantCalling/pipeline/Snakefile, line 617:
Job 0 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
/vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam
/vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam.bai
/vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_newDropHeader.txt
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-07-05T153237.935533.snakemake.log

Also-- /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/logs/changeHeader/N_1037.log is still empty.

Thanks for your quick replies!
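
Since raising --latency-wait to 360 did not change the outcome, filesystem latency looks unlikely, and it may help to confirm directly which of the rule's declared outputs were never written or were written empty. A minimal sketch, assuming nothing about DROP itself; the function name report_missing_outputs is hypothetical.

```shell
# Print each expected output that is missing or empty.
# Returns non-zero if at least one file is missing or has zero size.
report_missing_outputs() {
    status=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            status=1
        elif [ ! -s "$f" ]; then
            echo "empty: $f"
            status=1
        fi
    done
    return "$status"
}

# Example: pass the three paths listed under "output:" in the rule above.
#   report_missing_outputs N_1037_dropHeader.bam N_1037_dropHeader.bam.bai N_1037_newDropHeader.txt
```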

@nickhsmith
Collaborator

nickhsmith commented Jul 5, 2023

Hmm, this is quite odd. Can you try to run the actual command for this step? First run

snakemake -np /vcu_gpfs2/home/gaynora/DROP/udx_1037/Output/processed_data/rnaVariantCalling/out/bam/N_1037/N_1037_dropHeader.bam

which should print the shell command for this step (although a long one), something like:

        /home/nicksmith/Documents/projects/drop_work/demo/Scripts/rnaVariantCalling/pipeline/GATK_BASH/changeHeader.sh /home/nicksmith/Documents/projects/drop_work/demo/Data/rna_bam/HG00096_ncbi.bam /home/nicksmith/Documents/projects/drop_work/demo/Data/rna_bam/HG00096_ncbi.bam.bai HG00096 /home/nicksmith/Documents/projects/drop_work/demo/Output/processed_data/rnaVariantCalling/logs/changeHeader/HG00096.log         /home/nicksmith/Documents/projects/drop_work/demo/Output/processed_data/rnaVariantCalling/out/bam/HG00096/HG00096_dropHeader.bam /home/nicksmith/Documents/projects/drop_work/demo/Output/processed_data/rnaVariantCalling/out/bam/HG00096/HG00096_dropHeader.bam.bai /home/nicksmith/Documents/projects/drop_work/demo/Output/processed_data/rnaVariantCalling/out/bam/HG00096/HG00096_newDropHeader.txt

You would then copy and paste that command and run it independently of snakemake. Ideally you should see a result like this:

WARNING
Internal Header is designated: 0
SampleID is HG00096
Forcing 0 to match HG00096
new header can be found here:/home/nicksmith/Documents/projects/drop_work/demo/Output/processed_data/rnaVariantCalling/out/bam/HG00096/HG00096_newDropHeader.txt

@nickhsmith
Collaborator

@gaynora7 I think I have an idea of why your bam file is failing. It's probably because there is no read group (@RG) named in the bam file header.

You can check by running the following:

samtools view -H /vcu_gpfs2/home/gaynora/BAM/GRCh38/N_1037_RAligned.sortedByCoord.out.bam | grep ^@RG

This help page has links that may help you resolve the problem (see problem 2):

https://github.com/gagneurlab/drop/blob/594d7daaff872604d65ae1537a0fe59f463de6b3/docs/source/help.rst
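
To turn the samtools check above into a pass/fail test, the header can be piped through a small filter. This is a minimal sketch; the function name rg_samples is hypothetical, and it only inspects header text, so it makes no claim about what changeHeader.sh itself requires.

```shell
# Read SAM header text on stdin and print the SM (sample) value of each @RG line.
# Exits 0 if at least one @RG line exists (even one without an SM tag), 1 otherwise.
rg_samples() {
    awk -F '\t' '
        /^@RG/ {
            found = 1
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SM:/) print substr($i, 4)   # strip the "SM:" prefix
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (requires samtools; sample.bam is a placeholder):
#   samtools view -H sample.bam | rg_samples || echo "no @RG line in header"
```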

@nickhsmith
Collaborator

@gaynora7 and @nicholas-owen I hope that this has been of some help, please let me know if you were able to get things working.

Thanks

@gaynora7

Hi @nickhsmith, sorry for the delay in response! I really appreciate all the time you have spent looking into my issue. You are right, I don't have any read groups in my bam files. I am looking into GATK AddOrReplaceReadGroups to address the issue.

Thanks again, you have been super helpful!
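
For anyone landing here with the same problem, the following sketch prints (rather than runs) a gatk AddOrReplaceReadGroups command for one BAM. The function name print_add_rg_cmd and the file names are hypothetical, the library, platform, and platform-unit values are placeholders, and using the sample ID for both RGID and RGSM is an assumption rather than something stated in this thread.

```shell
# Print (do not run) a gatk AddOrReplaceReadGroups command for one BAM.
# Arguments: input BAM, output BAM, sample ID.
# lib1 / ILLUMINA / unit1 are placeholder values; replace them with real metadata.
print_add_rg_cmd() {
    in_bam=$1
    out_bam=$2
    sample=$3
    printf 'gatk AddOrReplaceReadGroups -I %s -O %s --RGID %s --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM %s\n' \
        "$in_bam" "$out_bam" "$sample" "$sample"
}

print_add_rg_cmd N_1037.bam N_1037.rg.bam N_1037
```

After reviewing the printed command, run it in the DROP conda environment, index the new BAM with samtools index, and point the sample annotation at the new file.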

@vyepez88
Collaborator

Hi @gaynora7, were you able to have a look at this and sort it out?
