Pandoc error when running using cluster submission #137

alexg9010 · 2024-04-03T13:24:16Z

I was running the test data in a cluster environment.

I had to extend the memory limit for counts_from_SALMON in tests/settings.yaml:

execution:
  submit-to-cluster: yes
  rules:
    counts_from_SALMON:
      threads: 1
      memory: 2000

Then I ran the pipeline via

export PYTHONPATH=$GUIX_PYTHONPATH
export PIGX_UNINSTALLED="1" ; ./pigx-rnaseq -s tests/settings.yaml tests/sample_sheet.csv

The pipeline failed for the report generating jobs:

[...]
Error in rule report1:
    jobid: 40
    output: /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/report/hisat2/analysis1.deseq.report.html, /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/report/hisat2/analysis1.deseq_results.tsv
    log: /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/logs/hisat2/analysis1.report.log (check log file(s) for error message)
    shell:
        /gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/Rscript --vanilla /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/scripts/runDeseqReport.R --logo=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/images/Logo_PiGx.png --prefix='analysis1' --reportFile=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/scripts/deseqReport.Rmd --countDataFile=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/feature_counts/raw_counts/hisat2/counts.tsv --colDataFile=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/colData.tsv --gtfFile=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/sample_data/sample.gtf --caseSampleGroups='HBR' --controlSampleGroups='UHR' --covariates=''  --workdir=/fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/report/hisat2 --organism='' --description='This analysis is part of the pigx-rnaseq build-time tests.' --selfContained='True' >> /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/logs/hisat2/analysis1.report.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 7042317 ("snakejob.report1.40.sh") has been submitted

Error executing rule report1 on cluster (jobid: 40, external: Your job 7042317 ("snakejob.report1.40.sh") has been submitted, jobscript: /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/.snakemake/tmp.a2y6tlv1/snakejob.report1.40.sh). For error details see the cluster log and the log files of the involved rule(s).
[...]

This is the content of the log:

 $ cat  /fast/home/a/agosdsc/projects/pigx/pigx_rnaseq/tests/output/logs/salmon/analysis1.report.salmon.genes.log

arguments: --logo=/gnu/store/1nwmyp16abzi3yhvk43g0m21plcbgw5g-pigx-rnaseq-0.1.0/share/pigx_rnaseq/Logo_PiGx.png --prefix=D3_VS_WILDTYPE.salmon.transcripts --reportFile=/gnu/store/1nwmyp16abzi3yhvk43g0m21plcbgw5g-pigx-rnaseq-0.1.0/libexec/pigx_rnaseq/scripts/deseqReport.Rmd --countDataFile=/fast/AG_Akalin/agosdsc/projects/testing_swaroop/role_of_pde3a_in_htnb/feature_counts/raw_counts/salmon/counts_from_SALMON.transcripts.tsv --colDataFile=/fast/AG_Akalin/agosdsc/projects/testing_swaroop/role_of_pde3a_in_htnb/colData.tsv --gtfFile=/fast/AG_Klussmann/swaroop/rat_annotation/gtf/Rattus_norvegicus.mRatBN7.2.111.gtf --caseSampleGroups=D3_MUTANT --controlSampleGroups=WILD_TYPE --covariates= --workdir=/fast/AG_Akalin/agosdsc/projects/testing_swaroop/role_of_pde3a_in_htnb/report/salmon --organism= --description=Comparison of D3 mutatants vs wildtype --selfContained=True
setting working directory to  /fast/AG_Akalin/agosdsc/projects/testing_swaroop/role_of_pde3a_in_htnb/report/salmon
Error: pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available).
Execution halted

I see this pandoc related error:

Error: pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available).

rekado · 2024-04-04T12:45:07Z

yikes. We use pandoc 2. Perhaps something broke in the rmarkdown check for pandoc? I'll take a look.

rekado · 2024-04-04T12:50:33Z

I just did this and it works fine:

guix shell --container r-minimal r-rmarkdown -- R -e 'rmarkdown::pandoc_available("2.11")'

So, that's not it.

The reason is likely that you're using PiGx from a checkout. I would assume that on the cluster nodes you don't actually have Pandoc. What does the tools section of the settings file look like? Using PIGX_UNINSTALLED is also a red flag.

alexg9010 · 2024-04-05T09:24:09Z

This is the tools section from the generated config.json; the test settings file does not contain any tool specification:

  "tools": {
        "Rscript": {
            "args": "--vanilla",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/Rscript"
        },
        "bamCoverage": {
            "args": "--normalizeUsing BPM --numberOfProcessors 2",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/bamCoverage"
        },
        "fastp": {
            "args": "--adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/fastp"
        },
        "gunzip": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/gunzip"
        },
        "hisat2": {
            "args": "--fast",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/hisat2"
        },
        "hisat2-build": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/hisat2-build"
        },
        "megadepth": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/megadepth"
        },
        "multiqc": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/multiqc"
        },
        "salmon_index": {
            "args": "index",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/salmon"
        },
        "salmon_quant": {
            "args": "quant",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/salmon"
        },
        "samtools": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/samtools"
        },
        "sed": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/sed"
        },
        "star_index": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/STAR"
        },
        "star_map": {
            "args": "",
            "executable": "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/STAR"
        }
    }
}

alexg9010 · 2024-04-05T10:20:44Z

It seems the pandoc path is determined by rmarkdown::find_pandoc (see https://bookdown.org/yihui/rmarkdown-cookbook/install-pandoc.html).

According to its documentation, find_pandoc():

"Searches for the pandoc executable in a few places and use the highest version found, unless a specific version is requested."
Source: https://pkgs.rstudio.com/rmarkdown/reference/find_pandoc.html

Specifically, it searches the paths given by "RSTUDIO_PANDOC", "PATH" (via rmarkdown:::find_program()) and the folder "~/opt/pandoc":

https://github.com/rstudio/rmarkdown/blob/ee69d59f8011ad7b717a409fcbf8060d6ffc4139/R/pandoc.R#L663C1-L668C34

There is no "~/opt/pandoc", but exporting "RSTUDIO_PANDOC" via qsub is possible by updating the qsub template:

qsub-template.sh.in:

#!@GNUBASH@
# properties = {properties}

if [ 'yes' = '@capture_environment@' ]; then
    export R_LIBS_SITE="@R_LIBS_SITE@"
    export PYTHONPATH="@PYTHONPATH@"
    export RSTUDIO_PANDOC="@PANDOC@"
fi

env

{exec_job}

To check which pandoc version is used, I added this chunk to rule report1:

{RSCRIPT_EXEC} -e 'rmarkdown::find_pandoc()'

We can inspect the job's environment by checking the job log output:

$ less tests/output/snakejob.report1.40.sh.o7043287

[...]
RSTUDIO_PANDOC=/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/pandoc
[...]
$version
[1] ‘0’

$dir
NULL

So it seems no pandoc was found: the version is reported as 0 and dir is NULL.

Running find_pandoc() in a "guix environment -l guix.scm" shell in the pigx folder works:

> rmarkdown::find_pandoc()
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.utf-8)
$version
[1] '2.19.2'

$dir
[1] "/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin"

rekado · 2024-04-05T11:52:11Z

My reading of pandoc.R tells me that RSTUDIO_PANDOC is meant to be a directory. Give it the dirname of @PANDOC@ instead.
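
A quick way to test a candidate value for RSTUDIO_PANDOC on a cluster node is sketched below. It is a hypothetical diagnostic, not rmarkdown's own lookup code: the function name check_pandoc_dir is made up for illustration, and it assumes (per the reading of pandoc.R above) that the variable must name a directory containing an executable called pandoc.

```shell
#!/bin/sh
# Hypothetical diagnostic, not part of rmarkdown or PiGx: report whether a
# value would be usable as RSTUDIO_PANDOC, i.e. whether it is a directory
# that contains an executable file named "pandoc".
check_pandoc_dir() {
    candidate="$1"
    if [ -d "$candidate" ] && [ -x "$candidate/pandoc" ]; then
        echo "ok: $candidate contains pandoc"
        return 0
    fi
    echo "not usable: '$candidate' is not a directory containing pandoc"
    return 1
}

# Check whatever the current job environment provides (may be unset).
check_pandoc_dir "${RSTUDIO_PANDOC:-}" || true
```

With the value from the job log above, which points at the pandoc executable itself, this check would report "not usable", because a file path is not a directory.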

alexg9010 · 2024-04-05T12:56:15Z

Thanks, using dirname of pandoc works.

I will try to fix this in pigx-common.
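
For reference, the working change can be sketched as follows. This is a minimal illustration under assumptions, not the actual patch to pigx-common: the helper pandoc_dir and the variable PANDOC_EXECUTABLE are hypothetical names, and the path is the one shown in the job log above. In qsub-template.sh.in the executable path would come from the substituted @PANDOC@ value.

```shell
#!/bin/sh
# Hypothetical helper, not part of PiGx: derive the directory that
# RSTUDIO_PANDOC should hold from the full path of the pandoc executable.
pandoc_dir() {
    dirname "$1"
}

# Path copied from the job log in this thread; in the qsub template this
# would be the value substituted for @PANDOC@ (assumption).
PANDOC_EXECUTABLE="/gnu/store/b0skxv953fpsdg79cs4g9qz78ds6pvlz-profile/bin/pandoc"

# Export the containing directory rather than the executable itself.
RSTUDIO_PANDOC="$(pandoc_dir "$PANDOC_EXECUTABLE")"
export RSTUDIO_PANDOC

echo "RSTUDIO_PANDOC=$RSTUDIO_PANDOC"
```

The exported value then matches the $dir that find_pandoc() reports inside the guix environment.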
