Pandoc error when running using cluster submission #137

Open
alexg9010 opened this issue Apr 3, 2024 · 6 comments

@alexg9010
Member

alexg9010 commented Apr 3, 2024

I was running the test data in a cluster environment.

I had to extend the memory limit for counts_from_SALMON in tests/settings.yaml:

execution:
  submit-to-cluster: yes
  rules:
    counts_from_SALMON:
      threads: 1
      memory: 2000

Then run via

export PYTHONPATH=$GUIX_PYTHONPATH
export PIGX_UNINSTALLED="1" ; ./pigx-rnaseq -s tests/settings.yaml tests/sample_sheet.csv

The pipeline failed for the report-generating jobs:

[...]
Error in rule report1:
    jobid: 40
    output: /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/report/hisat2/analysis1.deseq.report.html, /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/report/hisat2/analysis1.deseq_results.tsv
    log: /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/logs/hisat2/analysis1.report.log (check log file(s) for error message)
    shell:
        /gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/Rscript --vanilla /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/scripts/runDeseqReport.R --logo=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/images/Logo_PiGx.png --prefix='analysis1' --reportFile=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/scripts/deseqReport.Rmd --countDataFile=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/feature_counts/raw_counts/hisat2/counts.tsv --colDataFile=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/colData.tsv --gtfFile=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/sample_data/sample.gtf --caseSampleGroups='HBR' --controlSampleGroups='UHR' --covariates=''  --workdir=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/report/hisat2 --organism='' --description='This analysis is part of the pigx-rnaseq build-time tests.' --selfContained='True' >> /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/logs/hisat2/analysis1.report.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 7042317 ("snakejob.report1.40.sh") has been submitted

Error executing rule report1 on cluster (jobid: 40, external: Your job 7042317 ("snakejob.report1.40.sh") has been submitted, jobscript: /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/.snakemake/tmp.a2y6tlv1/snakejob.report1.40.sh). For error details see the cluster log and the log files of the involved rule(s).
[...]

This is the content of the log:

 $ cat  /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/logs/salmon/analysis1.report.salmon.genes.log

arguments: --logo=/gnu/store/1nwmyp16abzi3yhvk43g0m21plcbgw5g-pigx-rnaseq-0.1.0/share/pigx_rnaseq/Logo_PiGx.png --prefix=D3_VS_WILDTYPE.salmon.transcripts --reportFile=/gnu/store/1nwmyp16abzi3yhvk43g0m21plcbgw5g-pigx-rnaseq-0.1.0/libexec/pigx_rnaseq/scripts/deseqReport.Rmd --countDataFile=/fast/AG_Akalin/agosdsc/projects/testing_swaroop/role_of_pde3a_in_htnb/feature_counts/raw_counts/salmon/counts_from_SALMON.transcripts.tsv --colDataFile=/fast/AG_Akalin/agosdsc/projects/testing_swaroop/role_of_pde3a_in_htnb/colData.tsv --gtfFile=/fast/AG_Klussmann/swaroop/rat_annotation/gtf/Rattus_norvegicus.mRatBN7.2.111.gtf --caseSampleGroups=D3_MUTANT --controlSampleGroups=WILD_TYPE --covariates= --workdir=/fast/AG_Akalin/agosdsc/projects/testing_swaroop/role_of_pde3a_in_htnb/report/salmon --organism= --description=Comparison of D3 mutatants vs wildtype --selfContained=True
setting working directory to  /fast/AG_Akalin/agosdsc/projects/testing_swaroop/role_of_pde3a_in_htnb/report/salmon
Error: pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available).
Execution halted

I see this pandoc-related error:

Error: pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available).
@rekado
Member

rekado commented Apr 4, 2024

yikes. We use pandoc 2. Perhaps something broke in the rmarkdown check for pandoc? I'll take a look.

@rekado
Member

rekado commented Apr 4, 2024

I just did this and it works fine:

guix shell --container r-minimal r-rmarkdown -- R -e 'rmarkdown::pandoc_available("2.11")'

So, that's not it.

The reason is likely that you're using PiGx from a checkout. I would assume that on the cluster nodes you don't actually have Pandoc. What does the tools section of the settings file look like? Using PIGX_UNINSTALLED is also a red flag.

@alexg9010
Member Author

This is the tools section from the generated config.json. The test settings file itself does not contain any tool specification:

  "tools": {
        "Rscript": {
            "args": "--vanilla",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/Rscript"
        },
        "bamCoverage": {
            "args": "--normalizeUsing BPM --numberOfProcessors 2",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/bamCoverage"
        },
        "fastp": {
            "args": "--adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/fastp"
        },
        "gunzip": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/gunzip"
        },
        "hisat2": {
            "args": "--fast",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/hisat2"
        },
        "hisat2-build": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/hisat2-build"
        },
        "megadepth": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/megadepth"
        },
        "multiqc": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/multiqc"
        },
        "salmon_index": {
            "args": "index",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/salmon"
        },
        "salmon_quant": {
            "args": "quant",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/salmon"
        },
        "samtools": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/samtools"
        },
        "sed": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/sed"
        },
        "star_index": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/STAR"
        },
        "star_map": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/STAR"
        }
    }
}

@alexg9010
Member Author

It seems the pandoc path is resolved by rmarkdown::find_pandoc() (see https://bookdown.org/yihui/rmarkdown-cookbook/install-pandoc.html).

According to its documentation, find_pandoc():

Searches for the pandoc executable in a few places and use the highest version found, unless a specific version is requested.
Source: https://pkgs.rstudio.com/rmarkdown/reference/find_pandoc.html

Specifically, it searches the paths given by "RSTUDIO_PANDOC" and "PATH" (via rmarkdown:::find_program()) and the folder "~/opt/pandoc":

https://github.com/rstudio/rmarkdown/blob/ee69d59f8011ad7b717a409fcbf8060d6ffc4139/R/pandoc.R#L663C1-L668C34

There is no "~/opt/pandoc", but exporting "RSTUDIO_PANDOC" via qsub is possible by updating the qsub template:

qsub-template.sh.in:

#!@GNUBASH@
# properties = {properties}

if [ 'yes' = '@capture_environment@' ]; then
    export R_LIBS_SITE="@R_LIBS_SITE@"
    export PYTHONPATH="@PYTHONPATH@"
    export RSTUDIO_PANDOC="@PANDOC@"
fi

env

{exec_job}

To check which pandoc version is picked up, I added this line to rule report1:

{RSCRIPT_EXEC} -e 'rmarkdown::find_pandoc()'

We can inspect the job's environment by checking the job log output:

$ less tests/output/snakejob.report1.40.sh.o7043287

[...]
RSTUDIO_PANDOC=/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/pandoc
[...]
$version
[1] ‘0’

$dir
NULL

So it seems no matching dir was found.
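
As a narrower alternative to dumping the whole environment with env, the job script could print only the variables the template sets. This is just a sketch; the function name show_job_env is made up for illustration and is not part of PiGx:

```shell
# show_job_env: print the three variables exported by the qsub template.
# Hypothetical helper; "<unset>" marks a variable missing from the job environment.
show_job_env() {
    for var in R_LIBS_SITE PYTHONPATH RSTUDIO_PANDOC; do
        # printenv exits non-zero when the variable is not in the environment.
        value="$(printenv "$var" || echo '<unset>')"
        printf '%s=%s\n' "$var" "$value"
    done
}

show_job_env
```

Placed before {exec_job}, this keeps the job log short while still showing whether RSTUDIO_PANDOC reached the cluster node.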

Running find_pandoc() inside the environment created by guix environment -l guix.scm in the pigx folder works:

> rmarkdown::find_pandoc()
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.utf-8)
$version
[1] '2.19.2'

$dir
[1] "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin"

@rekado
Member

rekado commented Apr 5, 2024

My reading of pandoc.R tells me that RSTUDIO_PANDOC is meant to be a directory. Give it the dirname of @PANDOC@ instead.
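
A minimal sketch of why the executable path gets rejected, assuming rmarkdown looks for an executable named pandoc inside each candidate directory. The helper has_pandoc and the throwaway fake pandoc are made up for illustration; this only approximates the check in pandoc.R:

```shell
# Build a temporary directory with a fake "pandoc" so the check runs anywhere.
tmp="$(mktemp -d)"
mkdir -p "$tmp/bin"
printf '#!/bin/sh\necho "pandoc 2.19.2"\n' > "$tmp/bin/pandoc"
chmod +x "$tmp/bin/pandoc"

# has_pandoc DIR: succeed if DIR contains an executable named pandoc.
# Hypothetical helper, an approximation of the per-directory check.
has_pandoc() {
    [ -x "$1/pandoc" ]
}

as_file="no"
as_dir="no"
# Path to the executable itself: ".../bin/pandoc/pandoc" does not exist.
if has_pandoc "$tmp/bin/pandoc"; then as_file="yes"; fi
# Its dirname: ".../bin/pandoc" exists and is executable.
if has_pandoc "$tmp/bin"; then as_dir="yes"; fi

echo "executable path accepted: $as_file"
echo "directory accepted: $as_dir"
rm -rf "$tmp"
```

With RSTUDIO_PANDOC pointing at the executable, the lookup effectively becomes .../bin/pandoc/pandoc, which would explain the reported version '0' and dir NULL.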

@alexg9010
Member Author

Thanks, using dirname of pandoc works.
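
Roughly what the change amounts to, as a sketch only. The variable pandoc_exe is a stand-in for whatever replaces @PANDOC@ at configure time, and the actual fix in pigx-common may look different:

```shell
# Hypothetical stand-in for the value substituted for @PANDOC@ in the template.
pandoc_exe="/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/pandoc"

# RSTUDIO_PANDOC should name the directory that contains pandoc,
# so strip the trailing "/pandoc" before exporting.
RSTUDIO_PANDOC="$(dirname "$pandoc_exe")"
export RSTUDIO_PANDOC

echo "$RSTUDIO_PANDOC"
```

The resulting directory is the same one that find_pandoc() reports as $dir inside the guix environment.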

I will try to fix this in pigx-common.
