Unable to run v4.0.0 or develop (#d5f7d1f) locally #85

Open
leipzig opened this issue Apr 12, 2021 · 4 comments

@leipzig

leipzig commented Apr 12, 2021

Hi, I am interested in hosting this workflow on a Cromwell-enabled platform, but I see errors even when running it locally, on both the stable and develop branches, using these inputs derived from your internal tests:

{
    "RNAseq.cpatHex": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/CPAT/Human_Hexamer.tsv",
    "RNAseq.dbsnpVCF": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/wgs2.vcf.gz",
    "RNAseq.hisat2Index": [
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.1.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.2.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.3.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.4.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.5.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.6.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.7.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.8.ht2"
    ],
    "RNAseq.refflatFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.refflat",
    "RNAseq.strandedness": "None",
    "RNAseq.dbsnpVCFIndex": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/wgs2.vcf.gz.tbi",
    "RNAseq.cpatLogitModel": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/CPAT/Human_logitModel.RData",
    "RNAseq.referenceFasta": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.fasta",
    "RNAseq.variantCalling": true,
    "RNAseq.lncRNAdatabases": [
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.gtf"
    ],
    "RNAseq.lncRNAdetection": true,
    "RNAseq.dockerImagesFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/dockerImages.yml",
    "RNAseq.referenceGtfFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.gtf",
    "RNAseq.sampleConfigFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/samplesheets/Rna3PairedEnd.yml",
    "RNAseq.referenceFastaFai": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.fasta.fai",
    "RNAseq.referenceFastaDict": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.dict"
}

On v4.0.0 I see:

java -jar cromwell-59.jar run -i PairedEndHisat2.json RNA-seq.wdl
...
  File "/usr/local/bin/biowdl-input-converter", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/biowdl_input_converter/__init__.py", line 96, in main
    output_json = samplesheet_to_json(
  File "/usr/local/lib/python3.8/site-packages/biowdl_input_converter/__init__.py", line 77, in samplesheet_to_json
    raise NotImplementedError(
NotImplementedError: Unsupported extension: 

On develop I see:

Failed to import 'expression-quantification/multi-bam-quantify.wdl' (reason 1 of 1): Failed to process workflow definition 'MultiBamExpressionQuantification' (reason 1 of 1): Failed to process 'call collectColumns.CollectColumns as mergedStringtieFPKMs' (reason 1 of 1): The call supplied a value 'sumOnDuplicateId' that doesn't exist in the task (or sub-workflow)

Either of these might be easy to resolve but I'm not sure what direction I should take. Thanks!

@rhpvorderman
Contributor

Please use v4.0.0; that should be stable.

The error signifies that your sampleConfigFile does not have an extension, even though you provided "RNAseq.sampleConfigFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/samplesheets/Rna3PairedEnd.yml". So Cromwell probably changes the extension during the download process. This should be visible in the full log.

Can you try downloading the samplesheet first and adding it as a file path instead of a URI?
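
A minimal sketch of that workaround in Python, assuming the inputs JSON shown earlier has been loaded into a dict. The helper name `localize_input` and its injectable `fetch` parameter are illustrative only; they are not part of Cromwell or BioWDL. The helper downloads the file behind one input key to a local path that keeps the original file name (and therefore the `.yml` extension), then returns a copy of the inputs pointing at that path.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def localize_input(inputs, key, dest_dir, fetch=urllib.request.urlretrieve):
    """Download the URL stored under `key` and return inputs with a local path.

    The local file keeps the last component of the URL path, so the
    original extension (for example ".yml") is preserved.
    `fetch` is called as fetch(url, destination_path).
    """
    url = inputs[key]
    filename = Path(urlparse(url).path).name
    dest = Path(dest_dir).resolve() / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, str(dest))
    updated = dict(inputs)  # leave the caller's dict untouched
    updated[key] = str(dest)
    return updated
```

To use it, load `PairedEndHisat2.json` with `json.load`, call `localize_input(inputs, "RNAseq.sampleConfigFile", "samplesheets")`, and write the result back with `json.dump` before running Cromwell.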

@leipzig
Author

leipzig commented Apr 13, 2021

That works, or at least progresses to the same problem with chunked_scatter. It appears Cromwell renames files supplied as https URIs for security purposes:

#using https://
└── -239497156
  └── 7793893292351066636
#using local filesystem
└── -153306629
  └── Rna3PairedEnd.yml
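
The listing above is consistent with the `Unsupported extension:` error: the file localized from the https URI has a numeric name with no suffix. The sketch below mimics extension-based format detection to show the effect. It is an illustration under assumptions, not the actual biowdl-input-converter code; `detect_format` and the `KNOWN_EXTENSIONS` mapping are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping; the real tool's supported extensions may differ.
KNOWN_EXTENSIONS = {".yml": "yaml", ".yaml": "yaml", ".json": "json", ".csv": "csv"}


def detect_format(path):
    """Guess the samplesheet format from the file extension alone."""
    extension = Path(path).suffix  # empty string when the name has no dot
    if extension not in KNOWN_EXTENSIONS:
        raise NotImplementedError(f"Unsupported extension: {extension}")
    return KNOWN_EXTENSIONS[extension]
```

With the two paths from the listing, `detect_format("-153306629/Rna3PairedEnd.yml")` resolves to a format, while `detect_format("-239497156/7793893292351066636")` raises with an empty extension in the message, matching the shape of the traceback in the first comment.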

@leipzig
Author

leipzig commented Apr 13, 2021

The easiest way forward for me at this point might be a flag that allows an explicit file type to be passed to https://github.com/biowdl/biowdl-input-converter rather than relying on autodetection. I'll try to cook up a PR.
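
One way such a flag could look, sketched with `argparse`. The option name `--format`, the helper `resolve_format`, and the `EXTENSION_FORMATS` mapping are all hypothetical; the real interface is whatever the project defines.

```python
import argparse
from pathlib import Path

# Hypothetical mapping from file extension to format name.
EXTENSION_FORMATS = {".yml": "yaml", ".yaml": "yaml", ".json": "json", ".csv": "csv"}


def build_parser():
    """Build a parser with an optional explicit-format override."""
    parser = argparse.ArgumentParser(description="Samplesheet converter sketch")
    parser.add_argument("samplesheet", help="Path to the samplesheet")
    parser.add_argument(
        "--format",
        choices=sorted(set(EXTENSION_FORMATS.values())),
        default=None,
        help="Explicit file format; skips extension-based autodetection",
    )
    return parser


def resolve_format(path, explicit=None):
    """Prefer the explicit format; otherwise fall back to the extension."""
    if explicit is not None:
        return explicit
    extension = Path(path).suffix
    if extension not in EXTENSION_FORMATS:
        raise NotImplementedError(f"Unsupported extension: {extension}")
    return EXTENSION_FORMATS[extension]
```

With this shape, an extensionless file such as `7793893292351066636` can still be processed when `--format yaml` is supplied, and existing invocations without the flag keep their current autodetection behavior.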

@leipzig
Author

leipzig commented Apr 14, 2021

biowdl/biowdl-input-converter#12
Of course, the files listed in the YAML must still be found somehow.
