diff --git a/.editorconfig b/.editorconfig new file mode 100644 index 00000000..95549501 --- /dev/null +++ b/.editorconfig @@ -0,0 +1,27 @@ +root = true + +[*] +charset = utf-8 +end_of_line = lf +insert_final_newline = true +trim_trailing_whitespace = true +indent_size = 4 +indent_style = space + +[*.{yml,yaml}] +indent_size = 2 + +[*.json] +insert_final_newline = unset + +# These files are edited and tested upstream in nf-core/modules +[/modules/nf-core/**] +charset = unset +end_of_line = unset +insert_final_newline = unset +trim_trailing_whitespace = unset +indent_style = unset +indent_size = unset + +[/assets/email*] +indent_size = unset diff --git a/.github/.dockstore.yml b/.github/.dockstore.yml index 030138a0..191fabd2 100644 --- a/.github/.dockstore.yml +++ b/.github/.dockstore.yml @@ -3,3 +3,4 @@ version: 1.2 workflows: - subclass: nfl primaryDescriptorPath: /nextflow.config + publish: True diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 58cb32e4..42b647d0 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -18,8 +18,9 @@ If you'd like to write some code for nf-core/bacass, the standard workflow is as 1. Check that there isn't already an issue about your idea in the [nf-core/bacass issues](https://github.com/nf-core/bacass/issues) to avoid duplicating work * If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/bacass repository](https://github.com/nf-core/bacass) to your GitHub account -3. Make the necessary changes / additions within your forked repository -4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged +3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) +4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -30,14 +31,14 @@ Typically, pull-requests are only fully reviewed when these tests are passing, t There are typically two types of tests that run: -### Lint Tests +### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. -### Pipeline Tests +### Pipeline tests Each `nf-core` pipeline should be set up with a minimal set of test-data. `GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully. 
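For reference, the same minimal test data can be exercised locally from a pipeline checkout; this mirrors the `nextflow run . -profile test,docker` command referenced in the pull-request checklist further down in this diff, and assumes Nextflow (>=21.04.0) and Docker are available:

```console
# From the root of a pipeline checkout: run the bundled minimal test profile with Docker
nextflow run . -profile test,docker
```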
@@ -55,3 +56,73 @@ These tests are run both with the latest available version of `Nextflow` and als ## Getting help For further information/help, please consult the [nf-core/bacass documentation](https://nf-co.re/bacass/usage) and don't hesitate to get in touch on the nf-core Slack [#bacass](https://nfcore.slack.com/channels/bacass) channel ([join our Slack here](https://nf-co.re/join/slack)). + +## Pipeline contribution conventions + +To make the nf-core/bacass code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written. + +### Adding a new step + +If you wish to contribute a new step, please use the following coding standards: + +1. Define the corresponding input channel into your new process from the expected previous process channel +2. Write the process block (see below). +3. Define the output channel if needed (see below). +4. Add any new flags/options to `nextflow.config` with a default (see below). +5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build`). +6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter). +7. Add sanity checks for all relevant parameters. +8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`. +9. Do local tests that the new code works properly and as expected. +10. Add a new test command in `.github/workflow/ci.yml`. +11. If applicable add a [MultiQC](https://https://multiqc.info/) module. +12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order. +13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`. + +### Default values + +Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. + +Once there, use `nf-core schema build` to add to `nextflow_schema.json`. + +### Default processes resource requirements + +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. + +The process resources can be passed on to the tool dynamically within the process with the `${task.cpu}` and `${task.memory}` variables in the `script:` block. + +### Naming schemes + +Please use the following naming schemes, to make it easy to understand what is going where. + +* initial process channel: `ch_output_from_` +* intermediate and terminal channels: `ch__for_` + +### Nextflow version bumping + +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . 
[min-nf-version]` + +### Software version reporting + +If you add a new tool to the pipeline, please ensure you add the information of the tool to the `get_software_version` process. + +Add to the script block of the process, something like the following: + +```bash + --version &> v_.txt 2>&1 || true +``` + +or + +```bash + --help | head -n 1 &> v_.txt 2>&1 || true +``` + +You then need to edit the script `bin/scrape_software_versions.py` to: + +1. Add a Python regex for your tool's `--version` output (as in stored in the `v_.txt` file), to ensure the version is reported as a `v` and the version number e.g. `v2.1.1` +2. Add a HTML entry to the `OrderedDict` for formatting in MultiQC. + +### Images and figures + +For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index 5d654575..8e0a6155 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -1,13 +1,25 @@ +--- +name: Bug report +about: Report something that is broken or incorrect +labels: bug +--- + +## Check Documentation + +I have checked the following places for your error: + +- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting) +- [ ] [nf-core/bacass pipeline documentation](https://nf-co.re/bacass/usage) + ## Description of the bug @@ -23,6 +35,13 @@ Steps to reproduce the behaviour: +## Log files + +Have you provided the following extra information/files: + +- [ ] The command used to run the pipeline +- [ ] The `.nextflow.log` file + ## System - Hardware: @@ -32,13 +51,12 @@ Steps to reproduce the behaviour: ## Nextflow Installation -- Version: +- Version: ## Container engine -- Engine: +- Engine: - version: -- Image tag: ## Additional context diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 00000000..eab75cde --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,8 @@ +blank_issues_enabled: false +contact_links: + - name: Join nf-core + url: https://nf-co.re/join + about: Please join the nf-core community here + - name: "Slack #bacass channel" + url: https://nfcore.slack.com/channels/bacass + about: Discussion about the nf-core/bacass pipeline diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 78f110cf..29121dbd 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -1,3 +1,9 @@ +--- +name: Feature request +about: Suggest an idea for the nf-core/bacass pipeline +labels: enhancement +--- + + ## PR checklist -- [ ] This comment contains a description of changes (with reason) -- [ ] `CHANGELOG.md` is updated +- [ ] This comment contains a description of changes (with reason). - [ ] If you've fixed a bug or added code that should be tested, add tests! -- [ ] Documentation in `docs` is updated -- [ ] If necessary, also make a PR on the [nf-core/bacass branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/bacass) + - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/bacass/tree/master/.github/CONTRIBUTING.md) + - [ ] If necessary, also make a PR on the nf-core/bacass _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. 
+- [ ] Make sure your code lints (`nf-core lint`). +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). +- [ ] Usage Documentation in `docs/usage.md` is updated. +- [ ] Output Documentation in `docs/output.md` is updated. +- [ ] `CHANGELOG.md` is updated. +- [ ] `README.md` is updated (including new tool citations and authors/contributors). diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml deleted file mode 100644 index 96b12a70..00000000 --- a/.github/markdownlint.yml +++ /dev/null @@ -1,5 +0,0 @@ -# Markdownlint configuration file -default: true, -line-length: false -no-duplicate-header: - siblings_only: true diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index c4a39103..5e94f7df 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,41 +1,33 @@ name: nf-core AWS full size tests # This workflow is triggered on published releases. -# It can be additionally triggered manually with GitHub actions workflow dispatch. +# It can be additionally triggered manually with GitHub actions workflow dispatch button. # It runs the -profile 'test_full' on AWS batch on: release: types: [published] workflow_dispatch: - jobs: - run-awstest: + run-tower: name: Run AWS full tests if: github.repository == 'nf-core/bacass' runs-on: ubuntu-latest steps: - - name: Setup Miniconda - uses: goanpeca/setup-miniconda@v1.0.2 - with: - auto-update-conda: true - python-version: 3.7 - - name: Install awscli - run: conda install -c conda-forge awscli - - name: Start AWS batch job + - name: Launch workflow via tower + uses: nf-core/tower-action@master # Add full size test data (but still relatively small datasets for few samples) # on the `test_full.config` test runs with only one set of parameters - # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command - env: - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} - TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} - AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} - AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} - AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} - run: | - aws batch submit-job \ - --region eu-west-1 \ - --job-name nf-core-bacass \ - --job-queue $AWS_JOB_QUEUE \ - --job-definition $AWS_JOB_DEFINITION \ - --container-overrides '{"command": ["nf-core/bacass", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/bacass/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/bacass/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + + with: + workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} + bearer_token: ${{ secrets.TOWER_BEARER_TOKEN }} + compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} + pipeline: ${{ github.repository }} + revision: ${{ github.sha }} + workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/bacass/work-${{ github.sha }} + parameters: | + { + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/bacass/results-${{ github.sha }}" + } + profiles: '[ "test_full", "aws_tower" ]' + diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index f687cf32..56b6ad8a 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -1,38 +1,28 @@ name: nf-core AWS test -# This workflow is triggered on push to the master branch. -# It can be additionally triggered manually with GitHub actions workflow dispatch. 
-# It runs the -profile 'test' on AWS batch. +# This workflow can be triggered manually with the GitHub actions workflow dispatch button. +# It runs the -profile 'test' on AWS batch on: workflow_dispatch: - jobs: - run-awstest: + run-tower: name: Run AWS tests if: github.repository == 'nf-core/bacass' runs-on: ubuntu-latest steps: - - name: Setup Miniconda - uses: goanpeca/setup-miniconda@v1.0.2 + - name: Launch workflow via tower + uses: nf-core/tower-action@master + with: - auto-update-conda: true - python-version: 3.7 - - name: Install awscli - run: conda install -c conda-forge awscli - - name: Start AWS batch job - # For example: adding multiple test runs with different parameters - # Remember that you can parallelise this by using strategy.matrix - env: - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} - TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} - AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} - AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} - AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} - run: | - aws batch submit-job \ - --region eu-west-1 \ - --job-name nf-core-bacass \ - --job-queue $AWS_JOB_QUEUE \ - --job-definition $AWS_JOB_DEFINITION \ - --container-overrides '{"command": ["nf-core/bacass", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/bacass/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/bacass/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} + bearer_token: ${{ secrets.TOWER_BEARER_TOKEN }} + compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} + pipeline: ${{ github.repository }} + revision: ${{ github.sha }} + workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/bacass/work-${{ github.sha }} + parameters: | + { + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/bacass/results-${{ github.sha }}" + } + profiles: '[ "test", "aws_tower" ]' + diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml index 6599d424..453a14f4 100644 --- a/.github/workflows/branch.yml +++ b/.github/workflows/branch.yml @@ -2,7 +2,7 @@ name: nf-core branch protection # This workflow is triggered on PRs to master branch on the repository # It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev` on: - pull_request: + pull_request_target: branches: [master] jobs: @@ -13,7 +13,7 @@ jobs: - name: Check PRs if: github.repository == 'nf-core/bacass' run: | - { [[ ${{github.event.pull_request.head.repo.full_name}} == nf-core/bacass ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] + { [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/bacass ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] # If the above check failed, post a comment on the PR explaining the failure @@ -23,13 +23,22 @@ jobs: uses: mshick/add-pr-comment@v1 with: message: | + ## This PR is against the `master` branch :x: + + * Do not close this PR + * Click _Edit_ and change the `base` to `dev` + * This CI test will remain failed until you push a new commit + + --- + Hi @${{ github.event.pull_request.user.login }}, - It looks like this pull-request is has been made against the ${{github.event.pull_request.head.repo.full_name}} `master` branch. 
+ It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch. The `master` branch on nf-core repositories should always contain code from the latest release. - Because of this, PRs to `master` are only allowed if they come from the ${{github.event.pull_request.head.repo.full_name}} `dev` branch. + Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch. You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page. + Note that even after this, the test will continue to show as failing until you push a new commit. Thanks again for your contribution! repo-token: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 132dcbc5..7f5d08e6 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -8,6 +8,9 @@ on: release: types: [published] +# Uncomment if we need an edge release of Nextflow again +# env: NXF_EDGE: 1 + jobs: test: name: Run workflow tests @@ -20,34 +23,46 @@ jobs: strategy: matrix: # Nextflow versions: check pipeline minimum and current latest - nxf_ver: ['19.10.0', ''] - options: ["test,docker", "test_long_miniasm,docker", "test_hybrid,docker", "test_long,docker --assembler canu"] + nxf_ver: ['21.04.0', ''] steps: - name: Check out pipeline code uses: actions/checkout@v2 - - name: Check if Dockerfile or Conda environment changed - uses: technote-space/get-diff-action@v1 - with: - PREFIX_FILTER: | - Dockerfile - environment.yml - - - name: Build new docker image - if: env.GIT_DIFF - run: docker build --no-cache . 
-t nfcore/bacass:1.1.1 + - name: Install Nextflow + env: + CAPSULE_LOG: none + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ - - name: Pull docker image - if: ${{ !env.GIT_DIFF }} + - name: Run pipeline with test data + # For example: adding multiple test runs with different parameters + # Remember that you can parallelise this by using strategy.matrix run: | - docker pull nfcore/bacass:dev - docker tag nfcore/bacass:dev nfcore/bacass:1.1.1 + nextflow run ${GITHUB_WORKSPACE} -profile test,docker + + profiles: + name: Run workflow profile + # Only run on push if this is the nf-core dev branch (merged PRs) + if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/bacass') }} + runs-on: ubuntu-latest + env: + NXF_VER: '21.04.0' + NXF_ANSI_LOG: false + strategy: + matrix: + # Run remaining test profiles with minimum nextflow version + profile: [test_long_miniasm, test_hybrid, test_long, test_dfast] + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 - name: Install Nextflow + env: + CAPSULE_LOG: none run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - - - name: Run pipeline with test data + - name: Run pipeline with ${{ matrix.profile }} test profile run: | - nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.options }} + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profile }},docker diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 3b2060e3..3b448773 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -18,7 +18,49 @@ jobs: - name: Install markdownlint run: npm install -g markdownlint-cli - name: Run Markdownlint - run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml + run: markdownlint . + + # If the above check failed, post a comment on the PR explaining the failure + - name: Post PR comment + if: failure() + uses: mshick/add-pr-comment@v1 + with: + message: | + ## Markdown linting is failing + + To keep the code consistent with lots of contributors, we run automated code consistency checks. + To fix this CI test, please run: + + * Install `markdownlint-cli` + * On Mac: `brew install markdownlint-cli` + * Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`) + * Fix the markdown errors + * Automatically: `markdownlint . --fix` + * Manually resolve anything left from `markdownlint .` + + Once you push these changes the test should pass, and you can hide this comment :+1: + + We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help! + + Thanks again for your contribution! 
+ repo-token: ${{ secrets.GITHUB_TOKEN }} + allow-repeats: false + + EditorConfig: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + + - uses: actions/setup-node@v1 + with: + node-version: '10' + + - name: Install editorconfig-checker + run: npm install -g editorconfig-checker + + - name: Run ECLint check + run: editorconfig-checker -exclude README.md $(git ls-files | grep -v test) + YAML: runs-on: ubuntu-latest steps: @@ -29,7 +71,33 @@ jobs: - name: Install yaml-lint run: npm install -g yaml-lint - name: Run yaml-lint - run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml") + run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml" -o -name "*.yaml") + + # If the above check failed, post a comment on the PR explaining the failure + - name: Post PR comment + if: failure() + uses: mshick/add-pr-comment@v1 + with: + message: | + ## YAML linting is failing + + To keep the code consistent with lots of contributors, we run automated code consistency checks. + To fix this CI test, please run: + + * Install `yaml-lint` + * [Install `npm`](https://www.npmjs.com/get-npm) then [install `yaml-lint`](https://www.npmjs.com/package/yaml-lint) (`npm install -g yaml-lint`) + * Fix the markdown errors + * Run the test locally: `yamllint $(find . -type f -name "*.yml" -o -name "*.yaml")` + * Fix any reported errors in your YAML files + + Once you push these changes the test should pass, and you can hide this comment :+1: + + We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help! + + Thanks again for your contribution! + repo-token: ${{ secrets.GITHUB_TOKEN }} + allow-repeats: false + nf-core: runs-on: ubuntu-latest steps: @@ -38,9 +106,12 @@ jobs: uses: actions/checkout@v2 - name: Install Nextflow + env: + CAPSULE_LOG: none run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ + - uses: actions/setup-python@v1 with: python-version: '3.6' @@ -56,12 +127,19 @@ jobs: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint ${GITHUB_WORKSPACE} + run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + + - name: Save PR number + if: ${{ always() }} + run: echo ${{ github.event.pull_request.number }} > PR_number.txt - name: Upload linting log file artifact if: ${{ always() }} uses: actions/upload-artifact@v2 with: - name: linting-log-file - path: lint_log.txt + name: linting-logs + path: | + lint_log.txt + lint_results.md + PR_number.txt diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml new file mode 100644 index 00000000..90f03c6f --- /dev/null +++ b/.github/workflows/linting_comment.yml @@ -0,0 +1,29 @@ + +name: nf-core linting comment +# This workflow is triggered after the linting action is complete +# It posts an automated comment to the PR, even if the PR is coming from a fork + +on: + workflow_run: + workflows: ["nf-core linting"] + +jobs: + test: + runs-on: ubuntu-latest + steps: + - name: Download lint results + uses: dawidd6/action-download-artifact@v2 + with: + workflow: linting.yml + + - name: Get PR number + id: pr_number + run: echo "::set-output name=pr_number::$(cat linting-logs/PR_number.txt)" + + - name: Post PR comment + uses: marocchino/sticky-pull-request-comment@v2 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + number: ${{ 
steps.pr_number.outputs.pr_number }} + path: linting-logs/lint_results.md + diff --git a/.github/workflows/push_dockerhub.yml b/.github/workflows/push_dockerhub.yml deleted file mode 100644 index 6831ce46..00000000 --- a/.github/workflows/push_dockerhub.yml +++ /dev/null @@ -1,40 +0,0 @@ -name: nf-core Docker push -# This builds the docker image and pushes it to DockerHub -# Runs on nf-core repo releases and push event to 'dev' branch (PR merges) -on: - push: - branches: - - dev - release: - types: [published] - -jobs: - push_dockerhub: - name: Push new Docker image to Docker Hub - runs-on: ubuntu-latest - # Only run for the nf-core repo, for releases and merged PRs - if: ${{ github.repository == 'nf-core/bacass' }} - env: - DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} - DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} - steps: - - name: Check out pipeline code - uses: actions/checkout@v2 - - - name: Build new docker image - run: docker build --no-cache . -t nfcore/bacass:latest - - - name: Push Docker image to DockerHub (dev) - if: ${{ github.event_name == 'push' }} - run: | - echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin - docker tag nfcore/bacass:latest nfcore/bacass:dev - docker push nfcore/bacass:dev - - - name: Push Docker image to DockerHub (release) - if: ${{ github.event_name == 'release' }} - run: | - echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin - docker push nfcore/bacass:latest - docker tag nfcore/bacass:latest nfcore/bacass:${{ github.event.release.tag_name }} - docker push nfcore/bacass:${{ github.event.release.tag_name }} diff --git a/.gitignore b/.gitignore index 1f916833..5124c9ac 100644 --- a/.gitignore +++ b/.gitignore @@ -3,10 +3,6 @@ work/ data/ results/ .DS_Store -tests/test_data -*.un~ -*.swp -nohup.out -tmp/ -._.DS_Store +testing/ +testing* *.pyc diff --git a/.markdownlint.yml b/.markdownlint.yml new file mode 100644 index 00000000..9e605fcf --- /dev/null +++ b/.markdownlint.yml @@ -0,0 +1,14 @@ +# Markdownlint configuration file +default: true +line-length: false +ul-indent: + indent: 4 +no-duplicate-header: + siblings_only: true +no-inline-html: + allowed_elements: + - img + - p + - kbd + - details + - summary diff --git a/CHANGELOG.md b/CHANGELOG.md index 230fdab7..d9b44c37 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,37 +1,74 @@ # nf-core/bacass: Changelog +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +## v2.0.0 nf-core/bacass: "Navy Steel Swordfish" 2021/08/27 + +### `Changed` + +* [#56](https://github.com/nf-core/bacass/pull/56) - Switched to DSL2 & update to new nf-core 2.1 `TEMPLATE` +* [#56](https://github.com/nf-core/bacass/pull/56) - `--krakendb` now expects a `.tar.gz`/`.tgz` (compressed tar archive) directly from `https://benlangmead.github.io/aws-indexes/k2` instead of an uncompressed folder. + +### `Added` + +* [#56](https://github.com/nf-core/bacass/pull/56) - Added full size test dataset, two Zetaproteobacteria sequenced with Illumina MiSeq Reagent Kit V2, PE250, 3 to 4 million read pairs. 
+ +### `Fixed` + +* [#51](https://github.com/nf-core/bacass/issues/51) - Fixed Unicycler + +### `Dependencies` + +* [#56](https://github.com/nf-core/bacass/pull/56) - Updated a bunch of dependencies (unchanged: FastQC, Miniasm, Prokka, Porechop, QUAST) + * Unicycler from 0.4.4 to 0.4.8 + * Kraken2 from 2.0.9beta to 2.1.1 + * MultiQC from 1.9 to 1.10.1 + * PYCOQC from 2.5.0.23 to 2.5.2 + * Samtools from 1.11 to 1.13 + * Canu from 2.0 to 2.1.1-2 + * dfast from 1.2.10 to 1.2.14 + * Medaka from 1.1.2 to 1.4.3-0 + * Minimap 2 from 2.17 to 2.21 + * Nanoplot from 1.32.1 to 1.38.0 + * Nanopolish from 0.13.2 to 0.13.2-5 + * Racon from 1.4.13 to 1.4.20-1 + * Skewer from 0.2.2 to 0.2.2-3 + +### `Deprecated` + ## v1.1.1 nf-core/bacass: "Green Aluminium Shark" 2020/11/05 This is basically a maintenance update that includes template updates, fixed environments and some minor bugfixes. * Merged in nf-core/tools template v 1.10.2 * Updated dependencies - * fastqc=0.11.8, 0.11.9 - * multiqc=1.8, 1.9 - * kraken2=2.0.8_beta, 2.0.9beta - * prokka=1.14.5, 1.14.6 - * nanopolish=0.11.2, 0.13.2 - * parallel=20191122, 20200922 - * racon=1.4.10, 1.4.13 - * canu=1.9, 2.0 - * samtools=1.9, 1.11 - * nanoplot=1.28.1, 1.32.1 - * pycoqc=2.5.0.3, 2.5.0.23 + * fastqc=0.11.8, 0.11.9 + * multiqc=1.8, 1.9 + * kraken2=2.0.8_beta, 2.0.9beta + * prokka=1.14.5, 1.14.6 + * nanopolish=0.11.2, 0.13.2 + * parallel=20191122, 20200922 + * racon=1.4.10, 1.4.13 + * canu=1.9, 2.0 + * samtools=1.9, 1.11 + * nanoplot=1.28.1, 1.32.1 + * pycoqc=2.5.0.3, 2.5.0.23 * Switched out containers for many tools to make DSLv2 transition easier (escape from dependency hell) ## v1.1.0 nf-core/bacass: "Green Aluminium Shark" 2019/12/13 * Added support for hybrid assembly using Nanopore and Illumina Short Reads * Added methods for long-read Nanopore data - * Nanopolish, for polishing of Nanopore data with Illumina reads - * Medaka, as alternative assembly polishing method - * PoreChop, for quality trimming of Nanopore data - * Nanoplot, for plotting quality metrics of Nanopore data - * PycoQC, to QC Nanopore data + * Nanopolish, for polishing of Nanopore data with Illumina reads + * Medaka, as alternative assembly polishing method + * PoreChop, for quality trimming of Nanopore data + * Nanoplot, for plotting quality metrics of Nanopore data + * PycoQC, to QC Nanopore data * Added multiple tools to assemble long-reads - * Miniasm + Racon - * Canu Assembler - * Unicycler in Long read Mode + * Miniasm + Racon + * Canu Assembler + * Unicycler in Long read Mode * Add alternative assembly annotation using DFAST * Add social preview image diff --git a/CITATIONS.md b/CITATIONS.md new file mode 100644 index 00000000..dfec0ec6 --- /dev/null +++ b/CITATIONS.md @@ -0,0 +1,74 @@ +# nf-core/bacass: Citations + +## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/) + +> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031. + +## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/) + +> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. 
+ +## Pipeline tools + +* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) + +* [Skewer](https://pubmed.ncbi.nlm.nih.gov/24925680/) + > Jiang H, Lei R, Ding SW, Zhu S. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics. 2014 Jun 12;15:182. doi: 10.1186/1471-2105-15-182. PMID: 24925680; PMCID: PMC4074385. + +* [Porechop](https://github.com/rrwick/Porechop) + +* [NanoPlot](https://doi.org/10.1093/bioinformatics/bty149) + > De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. doi: 10.1093/bioinformatics/bty149. + +* [pycoQC](https://github.com/tleonardi/pycoQC) + +* [Unicycler](https://pubmed.ncbi.nlm.nih.gov/28594827/) + > Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. PMID: 28594827; PMCID: PMC5481147. + +* [Miniasm](https://github.com/lh3/miniasm) with [Racon](https://github.com/isovic/racon) + +* [Canu](https://pubmed.ncbi.nlm.nih.gov/28298431/) + > Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017 May;27(5):722-736. doi: 10.1101/gr.215087.116. Epub 2017 Mar 15. PMID: 28298431; PMCID: PMC5411767. + +* [QUAST](https://pubmed.ncbi.nlm.nih.gov/23422339/) + > Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013 Apr 15;29(8):1072-5. doi: 10.1093/bioinformatics/btt086. Epub 2013 Feb 19. PMID: 23422339; PMCID: PMC3624806. + +* [Prokka](https://pubmed.ncbi.nlm.nih.gov/24642063/) + > Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 Jul 15;30(14):2068-9. doi: 10.1093/bioinformatics/btu153. Epub 2014 Mar 18. PMID: 24642063. + +* [DFAST](https://pubmed.ncbi.nlm.nih.gov/29106469/) + > Tanizawa Y, Fujisawa T, Nakamura Y. DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. Bioinformatics. 2018 Mar 15;34(6):1037-1039. doi: 10.1093/bioinformatics/btx713. PMID: 29106469; PMCID: PMC5860143. + +* [Medaka](https://github.com/nanoporetech/medaka) + +* [Nanopolish](https://github.com/jts/nanopolish) + +* [SAMtools](https://doi.org/10.1093/bioinformatics/btp352) + > Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics , 25(16), 2078–2079. doi: 10.1093/bioinformatics/btp352. + +* [Kraken2](https://doi.org/10.1186/s13059-019-1891-0) + > Wood, D et al., 2019. Improved metagenomic analysis with Kraken 2. Genome Biology volume 20, Article number: 257. doi: 10.1186/s13059-019-1891-0. + +* [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/) + > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924 + +## Data + +* [Full-size test data](https://pubmed.ncbi.nlm.nih.gov/32561582/) + > Blackwell N, Bryce C, Straub D, Kappler A, Kleindienst S. 
Genomic Insights into Two Novel Fe(II)-Oxidizing Zetaproteobacteria Isolates Reveal Lifestyle Adaption to Coastal Marine Sediments. Appl Environ Microbiol. 2020 Aug 18;86(17):e01160-20. doi: 10.1128/AEM.01160-20. PMID: 32561582; PMCID: PMC7440796. + +## Software packaging/containerisation tools + +* [Anaconda](https://anaconda.com) + > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. + +* [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) + > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. + +* [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) + > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. + +* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) + +* [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) + > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 405fb1bf..f4fd052f 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -1,46 +1,111 @@ -# Contributor Covenant Code of Conduct +# Code of Conduct at nf-core (v1.0) ## Our Pledge -In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. +In the interest of fostering an open, collaborative, and welcoming environment, we as contributors and maintainers of nf-core, pledge to making participation in our projects and community a harassment-free experience for everyone, regardless of: -## Our Standards +- Age +- Body size +- Familial status +- Gender identity and expression +- Geographical location +- Level of experience +- Nationality and national origins +- Native language +- Physical and neurological ability +- Race or ethnicity +- Religion +- Sexual identity and orientation +- Socioeconomic status -Examples of behavior that contributes to creating a positive environment include: +Please note that the list above is alphabetised and is therefore not ranked in any order of preference or importance. -* Using welcoming and inclusive language -* Being respectful of differing viewpoints and experiences -* Gracefully accepting constructive criticism -* Focusing on what is best for the community -* Showing empathy towards other community members +## Preamble -Examples of unacceptable behavior by participants include: +> Note: This Code of Conduct (CoC) has been drafted by the nf-core Safety Officer and been edited after input from members of the nf-core team and others. 
"We", in this document, refers to the Safety Officer and members of the nf-core core team, both of whom are deemed to be members of the nf-core community and are therefore required to abide by this Code of Conduct. This document will amended periodically to keep it up-to-date, and in case of any dispute, the most current version will apply. -* The use of sexualized language or imagery and unwelcome sexual attention or advances -* Trolling, insulting/derogatory comments, and personal or political attacks -* Public or private harassment -* Publishing others' private information, such as a physical or electronic address, without explicit permission -* Other conduct which could reasonably be considered inappropriate in a professional setting +An up-to-date list of members of the nf-core core team can be found [here](https://nf-co.re/about). Our current safety officer is Renuka Kudva. + +nf-core is a young and growing community that welcomes contributions from anyone with a shared vision for [Open Science Policies](https://www.fosteropenscience.eu/taxonomy/term/8). Open science policies encompass inclusive behaviours and we strive to build and maintain a safe and inclusive environment for all individuals. + +We have therefore adopted this code of conduct (CoC), which we require all members of our community and attendees in nf-core events to adhere to in all our workspaces at all times. Workspaces include but are not limited to Slack, meetings on Zoom, Jitsi, YouTube live etc. + +Our CoC will be strictly enforced and the nf-core team reserve the right to exclude participants who do not comply with our guidelines from our workspaces and future nf-core activities. + +We ask all members of our community to help maintain a supportive and productive workspace and to avoid behaviours that can make individuals feel unsafe or unwelcome. Please help us maintain and uphold this CoC. + +Questions, concerns or ideas on what we can include? Contact safety [at] nf-co [dot] re ## Our Responsibilities -Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. +The safety officer is responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behaviour. + +The safety officer in consultation with the nf-core core team have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. + +Members of the core team or the safety officer who violate the CoC will be required to recuse themselves pending investigation. They will not have access to any reports of the violations and be subject to the same actions as others in violation of the CoC. + +## When are where does this Code of Conduct apply? + +Participation in the nf-core community is contingent on following these guidelines in all our workspaces and events. This includes but is not limited to the following listed alphabetically and therefore in no order of preference: + +- Communicating with an official project email address. +- Communicating with community members within the nf-core Slack channel. 
+- Participating in hackathons organised by nf-core (both online and in-person events). +- Participating in collaborative work on GitHub, Google Suite, community calls, mentorship meetings, email correspondence. +- Participating in workshops, training, and seminar series organised by nf-core (both online and in-person events). This applies to events hosted on web-based platforms such as Zoom, Jitsi, YouTube live etc. +- Representing nf-core on social media. This includes both official and personal accounts. + +## nf-core cares 😊 + +nf-core's CoC and expectations of respectful behaviours for all participants (including organisers and the nf-core team) include but are not limited to the following (listed in alphabetical order): + +- Ask for consent before sharing another community member’s personal information (including photographs) on social media. +- Be respectful of differing viewpoints and experiences. We are all here to learn from one another and a difference in opinion can present a good learning opportunity. +- Celebrate your accomplishments at events! (Get creative with your use of emojis 🎉 🥳 💯 🙌 !) +- Demonstrate empathy towards other community members. (We don’t all have the same amount of time to dedicate to nf-core. If tasks are pending, don’t hesitate to gently remind members of your team. If you are leading a task, ask for help if you feel overwhelmed.) +- Engage with and enquire after others. (This is especially important given the geographically remote nature of the nf-core community, so let’s do this the best we can) +- Focus on what is best for the team and the community. (When in doubt, ask) +- Graciously accept constructive criticism, yet be unafraid to question, deliberate, and learn. +- Introduce yourself to members of the community. (We’ve all been outsiders and we know that talking to strangers can be hard for some, but remember we’re interested in getting to know you and your visions for open science!) +- Show appreciation and **provide clear feedback**. (This is especially important because we don’t see each other in person and it can be harder to interpret subtleties. Also remember that not everyone understands a certain language to the same extent as you do, so **be clear in your communications to be kind.**) +- Take breaks when you feel like you need them. +- Using welcoming and inclusive language. (Participants are encouraged to display their chosen pronouns on Zoom or in communication on Slack.) + +## nf-core frowns on 😕 + +The following behaviours from any participants within the nf-core community (including the organisers) will be considered unacceptable under this code of conduct. Engaging or advocating for any of the following could result in expulsion from nf-core workspaces. + +- Deliberate intimidation, stalking or following and sustained disruption of communication among participants of the community. This includes hijacking shared screens through actions such as using the annotate tool in conferencing software such as Zoom. +- “Doxing” i.e. posting (or threatening to post) another person’s personal identifying information online. +- Spamming or trolling of individuals on social media. +- Use of sexual or discriminatory imagery, comments, or jokes and unwelcome sexual attention. +- Verbal and text comments that reinforce social structures of domination related to gender, gender identity and expression, sexual orientation, ability, physical appearance, body size, race, age, religion or work experience. 
+ +### Online Trolling + +The majority of nf-core interactions and events are held online. Unfortunately, holding events online comes with the added issue of online trolling. This is unacceptable, reports of such behaviour will be taken very seriously, and perpetrators will be excluded from activities immediately. + +All community members are required to ask members of the group they are working within for explicit consent prior to taking screenshots of individuals during video calls. + +## Procedures for Reporting CoC violations -Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. +If someone makes you feel uncomfortable through their behaviours or actions, report it as soon as possible. -## Scope +You can reach out to members of the [nf-core core team](https://nf-co.re/about) and they will forward your concerns to the safety officer(s). -This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. +Issues directly concerning members of the core team will be dealt with by other members of the core team and the safety manager, and possible conflicts of interest will be taken into account. nf-core is also in discussions about having an ombudsperson, and details will be shared in due course. -## Enforcement +All reports will be handled with utmost discretion and confidentially. -Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. +## Attribution and Acknowledgements -Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 
+- The [Contributor Covenant, version 1.4](http://contributor-covenant.org/version/1/4) +- The [OpenCon 2017 Code of Conduct](http://www.opencon2017.org/code_of_conduct) (CC BY 4.0 OpenCon organisers, SPARC and Right to Research Coalition) +- The [eLife innovation sprint 2020 Code of Conduct](https://sprint.elifesciences.org/code-of-conduct/) +- The [Mozilla Community Participation Guidelines v3.1](https://www.mozilla.org/en-US/about/governance/policies/participation/) (version 3.1, CC BY-SA 3.0 Mozilla) -## Attribution +## Changelog -This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct/][version] +### v1.0 - March 12th, 2021 -[homepage]: https://contributor-covenant.org -[version]: https://www.contributor-covenant.org/version/1/4/code-of-conduct/ +- Complete rewrite from original [Contributor Covenant](http://contributor-covenant.org/) CoC. diff --git a/Dockerfile b/Dockerfile deleted file mode 100644 index cf0320ce..00000000 --- a/Dockerfile +++ /dev/null @@ -1,19 +0,0 @@ -FROM nfcore/base:1.11 -LABEL authors="Andreas Wilm, Alexander Peltzer" \ - description="Docker image containing all software requirements for the nf-core/bacass pipeline" - -# Install the conda environment -COPY environment.yml / -# for bandage :/ otherwise it complains about missing libGL.so.1 -#RUN apt-get install -y libgl1-mesa-glx && apt-get clean -y -RUN conda env create --quiet -f /environment.yml && conda clean -a - -# Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-bacass-1.1.1/bin:$PATH - -# Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-bacass-1.1.1 > nf-core-bacass-1.1.1.yml - -# Instruct R processes to use these empty files instead of clashing with a local version -RUN touch .Rprofile -RUN touch .Renviron diff --git a/README.md b/README.md index d3415746..3865ec47 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,29 @@ -# ![nf-core/bacass](docs/images/nfcore-bacass_logo.png) +# ![nf-core/bacass](docs/images/nf-core-bacass_logo.png) -A simple bacterial assembly and annotation pipeline +[![GitHub Actions CI Status](https://github.com/nf-core/bacass/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/bacass/actions?query=workflow%3A%22nf-core+CI%22) +[![GitHub Actions Linting Status](https://github.com/nf-core/bacass/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/bacass/actions?query=workflow%3A%22nf-core+linting%22) +[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/bacass/results) +[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.2669428-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.2669428) -[![GitHub Actions CI Status](https://github.com/nf-core/bacass/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/bacass/actions) -[![GitHub Actions Linting Status](https://github.com/nf-core/bacass/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/bacass/actions) -[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.04.0-23aa62.svg?labelColor=000000)](https://www.nextflow.io/) +[![run with 
conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) +[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) +[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) -[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](https://bioconda.github.io/) -[![Docker](https://img.shields.io/docker/automated/nfcore/bacass.svg)](https://hub.docker.com/r/nfcore/bacass) -[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23bacass-4A154B?logo=slack)](https://nfcore.slack.com/channels/bacass) -[![DOI](https://zenodo.org/badge/168486714.svg)](https://zenodo.org/badge/latestdoi/168486714) +[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23bacass-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/bacass) +[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core) +[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core) ## Introduction +**nf-core/bacass** is a bioinformatics best-practice analysis pipeline for simple bacterial assembly and annotation. The pipeline is able to assemble short reads, long reads, or a mixture of short and long reads (hybrid assembly). + +The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community! + +On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/bacass/results). + +## Pipeline summary + ### Short Read Assembly This pipeline is primarily for bacterial assembly of next-generation sequencing reads. It can be used to quality trim your reads using [Skewer](https://github.com/relipmoc/skewer) and performs basic sequencing QC using [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Afterwards, the pipeline performs read assembly using [Unicycler](https://github.com/rrwick/Unicycler). Contamination of the assembly is checked using [Kraken2](https://ccb.jhu.edu/software/kraken2/) to verify sample purity. 
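The contamination check needs a Kraken2 database, which from this release on is supplied as a compressed archive (`.tar.gz`/`.tgz`). A minimal sketch of a default short-read run, reusing the database URL from the Quick Start section below and assuming Docker as the container engine:

```console
# Default short-read assembly with Unicycler; any compressed Kraken2 database from
# https://benlangmead.github.io/aws-indexes/k2 can be used for the purity check
nextflow run nf-core/bacass -profile docker \
    --input samplesheet.tsv \
    --kraken2db "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz"
```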
@@ -20,30 +31,53 @@ This pipeline is primarily for bacterial assembly of next-generation sequencing ### Long Read Assembly For users that only have Nanopore data, the pipeline quality trims these using [PoreChop](https://github.com/rrwick/Porechop) and assesses basic sequencing QC utilizing [NanoPlot](https://github.com/wdecoster/NanoPlot) and [PycoQC](https://github.com/a-slide/pycoQC). -The pipeline can then perform long read assembly utilizing [Unicycler](https://github.com/rrwick/Unicycler), [Miniasm](https://github.com/lh3/miniasm) in combination with [Racon](https://github.com/isovic/racon) or [Canu](https://github.com/marbl/canu). Long reads can be polished using specified Fast5 files with [NanoPolish](https://github.com/jts/nanopolish). +The pipeline can then perform long read assembly utilizing [Unicycler](https://github.com/rrwick/Unicycler), [Miniasm](https://github.com/lh3/miniasm) in combination with [Racon](https://github.com/isovic/racon), or [Canu](https://github.com/marbl/canu). Long reads assembly can be polished using [Medaka](https://github.com/nanoporetech/medaka) or [NanoPolish](https://github.com/jts/nanopolish) with Fast5 files. ### Hybrid Assembly For users specifying both short read and long read (NanoPore) data, the pipeline can perform a hybrid assembly approach utilizing [Unicycler](https://github.com/rrwick/Unicycler), taking the full set of information from short reads and long reads into account. -### Shared QC across all forms of assembly +### Assembly QC and annotation -In all cases, the assembly is assessed using [QUAST](http://bioinf.spbau.ru/quast). -The resulting bacterial assembly is furthermore annotated using [Prokka](https://github.com/tseemann/prokka). +In all cases, the assembly is assessed using [QUAST](http://bioinf.spbau.ru/quast). The resulting bacterial assembly is furthermore annotated using [Prokka](https://github.com/tseemann/prokka) or [DFAST](https://github.com/nigyta/dfast_core). -In addition, the pipeline creates various reports in the `results` directory specified, including a [MultiQC](https://multiqc.info) report summarizing some of the findings and software versions. +## Quick Start -The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple computing infrastructures in a portable manner. It comes with docker or singularity containers as well as conda environments, making installation trivial and results highly reproducible. +1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.04.0`) -## Documentation +2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_ + +3. Download the pipeline and test it on a minimal dataset with a single command: + + ```console + nextflow run nf-core/bacass -profile test, + ``` + + > * Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile ` in your command. 
This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. + > * If you are using `singularity` then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to pre-download all of the required containers before running the pipeline and to set the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options to be able to store and re-use the images from a central location for future pipeline runs. + > * If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs. -The nf-core/bacass pipeline comes with documentation about the pipeline, found in the `docs/` directory: +4. Start running your own analysis! -The nf-core/bacass pipeline comes with documentation about the pipeline which you can read at [https://nf-core/bacass/docs](https://nf-core/bacass/docs) or find in the [`docs/` directory](docs). + Default: Short read assembly with Unicycler, `--kraken2db` can be any [compressed database (`.tar.gz`/`.tgz`)](https://benlangmead.github.io/aws-indexes/k2): + + ```console + nextflow run nf-core/bacass -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input samplesheet.tsv --kraken2db "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz" + ``` + + Long read assembly with Miniasm: + + ```console + nextflow run nf-core/bacass -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input samplesheet.tsv --assembly_type 'long' --assembler 'miniasm' --kraken2db "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz" + ``` + +## Documentation + +The nf-core/bacass pipeline comes with documentation about the pipeline [usage](https://nf-co.re/bacass/usage), [parameters](https://nf-co.re/bacass/parameters) and [output](https://nf-co.re/bacass/output). ## Credits -nf-core/bacass was originally written by Andreas Wilm, Alexander Peltzer. +nf-core/bacass was initiated by [Andreas Wilm](https://github.com/andreas-wilm), originally written by [Alex Peltzer](https://github.com/apeltzer) (DSL1) and rewritten by [Daniel Straub](https://github.com/d4straub) (DSL2). ## Contributions and Support @@ -51,9 +85,11 @@ If you would like to contribute to this pipeline, please see the [contributing g For further information or help, don't hesitate to get in touch on the [Slack `#bacass` channel](https://nfcore.slack.com/channels/bacass) (you can join with [this invite](https://nf-co.re/join/slack)). -## Citation +## Citations + +If you use nf-core/bacass for your analysis, please cite it using the following doi: [10.5281/zenodo.2669428](https://doi.org/10.5281/zenodo.2669428) -If you use nf-core/bacass for your analysis, please cite it using the following doi: [10.5281/zenodo.3574476](https://zenodo.org/badge/latestdoi/168486714) +An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
You can cite the `nf-core` publication as follows: @@ -62,4 +98,3 @@ You can cite the `nf-core` publication as follows: > Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. > > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). -> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) diff --git a/assets/email_template.html b/assets/email_template.html index 05b2d0c8..ebf8e7a3 100644 --- a/assets/email_template.html +++ b/assets/email_template.html @@ -1,11 +1,10 @@ - - + nf-core/bacass Pipeline Report diff --git a/assets/nf-core-bacass_logo.png b/assets/nf-core-bacass_logo.png index 37e3ec62..75f92635 100644 Binary files a/assets/nf-core-bacass_logo.png and b/assets/nf-core-bacass_logo.png differ diff --git a/assets/nf-core-bacass_social_preview.png b/assets/nf-core-bacass_social_preview.png deleted file mode 100644 index d06998ce..00000000 Binary files a/assets/nf-core-bacass_social_preview.png and /dev/null differ diff --git a/assets/nf-core-bacass_social_preview.svg b/assets/nf-core-bacass_social_preview.svg deleted file mode 100644 index 73f16e32..00000000 --- a/assets/nf-core-bacass_social_preview.svg +++ /dev/null @@ -1,448 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - image/svg+xml - - - - - - - Simple bacterial assembly and annotation pipeline - bacass - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv new file mode 100644 index 00000000..983c7011 --- /dev/null +++ b/assets/samplesheet.csv @@ -0,0 +1,4 @@ +ID R1 R2 LongFastQ Fast5 GenomeSize +shortreads ./data/S1_R1.fastq.gz ./data/S1_R2.fastq.gz NA NA NA +longreads NA NA ./data/S1_long_fastq.gz ./data/FAST5 2.8m +shortNlong ./data/S1_R1.fastq.gz ./data/S1_R2.fastq.gz ./data/S1_long_fastq.gz ./data/FAST5 2.8m diff --git a/assets/schema_input.json b/assets/schema_input.json new file mode 100644 index 00000000..9d286b07 --- /dev/null +++ b/assets/schema_input.json @@ -0,0 +1,39 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/bacass/master/assets/schema_input.json", + "title": "nf-core/bacass pipeline - params.input schema", + "description": "Schema for the file provided with params.input", + "type": "array", + "items": { + "type": "object", + "properties": { + "sample": { + "type": "string", + "pattern": "^\\S+$", + "errorMessage": "Sample name must be provided and cannot contain spaces" + }, + "fastq_1": { + "type": "string", + "pattern": "^\\S+\\.f(ast)?q\\.gz$", + "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" + }, + "fastq_2": { + "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'", + "anyOf": [ + { + "type": "string", + "pattern": "^\\S+\\.f(ast)?q\\.gz$" + }, + { + "type": "string", + "maxLength": 0 + } + ] + } + }, + "required": [ + "sample", + "fastq_1" + ] + } +} diff --git a/assets/sendmail_template.txt b/assets/sendmail_template.txt index 3302baf8..8f0cfd4c 100644 --- a/assets/sendmail_template.txt +++ b/assets/sendmail_template.txt @@ -14,16 +14,16 @@ Content-Transfer-Encoding: base64 Content-ID: Content-Disposition: inline; 
filename="nf-core-bacass_logo.png" -<% out << new File("$baseDir/assets/nf-core-bacass_logo.png"). - bytes. - encodeBase64(). - toString(). - tokenize( '\n' )*. - toList()*. - collate( 76 )*. - collect { it.join() }. - flatten(). - join( '\n' ) %> +<% out << new File("$projectDir/assets/nf-core-bacass_logo.png"). + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). + join( '\n' ) %> <% if (mqcFile){ @@ -37,15 +37,15 @@ Content-ID: Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\" ${mqcFileObj. - bytes. - encodeBase64(). - toString(). - tokenize( '\n' )*. - toList()*. - collate( 76 )*. - collect { it.join() }. - flatten(). - join( '\n' )} + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). + join( '\n' )} """ }} %> diff --git a/assets/test_config_dfast.py b/assets/test_config_dfast.py index 2e47c4e7..234b0bdc 100644 --- a/assets/test_config_dfast.py +++ b/assets/test_config_dfast.py @@ -204,7 +204,7 @@ class Config: "scov_cutoff": 75, "aligner": "ghostx", # ghostz, ghostx or blastp "aligner_options": {}, # Normally, leave this empty. (Current version does not use this option.) - "database": "@@APP_ROOT@@/db/protein/DFAST-default.ref", + "database": "./protein/DFAST-default.ref", }, }, { diff --git a/bin/markdown_to_html.py b/bin/markdown_to_html.py deleted file mode 100755 index a26d1ff5..00000000 --- a/bin/markdown_to_html.py +++ /dev/null @@ -1,91 +0,0 @@ -#!/usr/bin/env python -from __future__ import print_function -import argparse -import markdown -import os -import sys -import io - - -def convert_markdown(in_fn): - input_md = io.open(in_fn, mode="r", encoding="utf-8").read() - html = markdown.markdown( - "[TOC]\n" + input_md, - extensions=["pymdownx.extra", "pymdownx.b64", "pymdownx.highlight", "pymdownx.emoji", "pymdownx.tilde", "toc"], - extension_configs={ - "pymdownx.b64": {"base_path": os.path.dirname(in_fn)}, - "pymdownx.highlight": {"noclasses": True}, - "toc": {"title": "Table of Contents"}, - }, - ) - return html - - -def wrap_html(contents): - header = """ - - - - - -
- """ - footer = """ -
- - - """ - return header + contents + footer - - -def parse_args(args=None): - parser = argparse.ArgumentParser() - parser.add_argument("mdfile", type=argparse.FileType("r"), nargs="?", help="File to convert. Defaults to stdin.") - parser.add_argument( - "-o", "--out", type=argparse.FileType("w"), default=sys.stdout, help="Output file name. Defaults to stdout." - ) - return parser.parse_args(args) - - -def main(args=None): - args = parse_args(args) - converted_md = convert_markdown(args.mdfile.name) - html = wrap_html(converted_md) - args.out.write(html) - - -if __name__ == "__main__": - sys.exit(main()) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index adf3cbca..df04fa4a 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -3,19 +3,20 @@ import os results = {} -version_files = [x for x in os.listdir('.') if x.endswith('.version.txt')] +version_files = [x for x in os.listdir(".") if x.endswith(".version.txt")] for version_file in version_files: - software = version_file.replace('.version.txt','') - if software == 'pipeline': - software = 'nf-core/bacass' + software = version_file.replace(".version.txt", "") + if software == "pipeline": + software = "nf-core/bacass" with open(version_file) as fin: version = fin.read().strip() results[software] = version # Dump to YAML -print (''' +print( + """ id: 'software_versions' section_name: 'nf-core/bacass Software Versions' section_href: 'https://github.com/nf-core/bacass' @@ -23,12 +24,13 @@ description: 'are collected at run time from the software output.' data: |
    <dl class="dl-horizontal"> -''') -for k,v in sorted(results.items()): -    print("        <dt>{}</dt><dd><samp>{}</samp></dd>".format(k,v)) -print ("    </dl>") +""" +) +for k, v in sorted(results.items()): +    print("        <dt>{}</dt><dd><samp>{}</samp></dd>
".format(k, v)) +print(" ") -# Write out regexes as csv file: -with open('software_versions.csv', 'w') as f: - for k,v in sorted(results.items()): - f.write("{}\t{}\n".format(k,v)) +# Write out as tsv file: +with open("software_versions.tsv", "w") as f: + for k, v in sorted(results.items()): + f.write("{}\t{}\n".format(k, v)) diff --git a/conf/base.config b/conf/base.config index a98bbf14..842ca37c 100644 --- a/conf/base.config +++ b/conf/base.config @@ -1,81 +1,55 @@ -process { - - cpus = { check_max( 2, 'cpus' ) } - memory = { check_max( 8.GB * task.attempt, 'memory' ) } - time = { check_max( 2.h * task.attempt, 'time' ) } - - errorStrategy = { task.exitStatus in [1,143,137,104,134,139] ? 'retry' : 'finish' } - maxRetries = 3 - maxErrors = '-1' - - withLabel:'small'{ - cpus = { check_max( 2, 'cpus' ) } - memory = { check_max( 1.GB * task.attempt, 'memory' ) } - time = { check_max( 1.h * task.attempt, 'time' ) } - } - - withLabel:'medium' { - cpus = { check_max( 8, 'cpus' ) } - memory = { check_max( 8.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } - } - - withLabel: 'medium_extramem'{ - cpus = { check_max( 8, 'cpus' ) } - memory = { check_max( 16.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } - } - - withLabel:'large'{ - cpus = { check_max( 32, 'cpus' ) } - memory = { check_max( 350.GB * task.attempt, 'memory' ) } - time = { check_max( 160.h * task.attempt, 'time' ) } - } - -//Container definitions, should be replaced once move to DSLv2 is done - withName:'quast'{ - container = 'quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2' - } +/* +======================================================================================== + nf-core/bacass Nextflow base config file +======================================================================================== + A 'blank slate' config file, appropriate for general use on most high performance + compute environments. Assumes that all software is installed and available on + the PATH. Runs in `local` mode - all jobs will be run on the logged in environment. +---------------------------------------------------------------------------------------- +*/ - withName:'adapter_trimming'{ - container = 'quay.io/biocontainers/porechop:0.2.4--py38hed8969a_1' - } - - withName:'unicycler'{ - container = 'quay.io/biocontainers/unicycler:0.4.4--py38h8162308_3' - } - - withName:'dfast'{ - container = 'quay.io/biocontainers/dfast:1.2.10--h8b12597_0' - } - - withName:'medaka'{ - container = 'quay.io/biocontainers/medaka:1.1.2--py38hfcf0ad1_0' - } - - withName:'pycoqc' { - container = 'quay.io/biocontainers/pycoqc:2.5.0.23--py_0' - } - - withName:'prokka'{ - container = 'quay.io/biocontainers/prokka:1.14.6--pl526_0' - } - - withName:'nanopolish'{ - container = 'quay.io/biocontainers/mulled-v2-47609127678014b991aafdfe3c9141852f17fee7:11970b07a8fd8f9441c3235a979d527c7df52d34-0' - } - - withName:'nanoplot' { - container = 'quay.io/biocontainers/nanoplot:1.32.1--py_0' - } - - +process { - params { - // Defaults only, expecting to be overwritten - max_memory = 64.GB - max_cpus = 32 - max_time = 24.h - igenomes_base = 's3://ngi-igenomes/igenomes/' - } + cpus = { check_max( 1 * task.attempt, 'cpus' ) } + memory = { check_max( 1.GB * task.attempt, 'memory' ) } + time = { check_max( 1.h * task.attempt, 'time' ) } + + errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 
'retry' : 'finish' } + maxRetries = 1 + maxErrors = '-1' + + // Process-specific resource requirements + // NOTE - Please try and re-use the labels below as much as possible. + // These labels are used and recognised by default in DSL2 files hosted on nf-core/modules. + // If possible, it would be nice to keep the same label naming convention when + // adding in your local modules too. + // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors + withLabel:process_low { + cpus = { check_max( 2 * task.attempt, 'cpus' ) } + memory = { check_max( 1.GB * task.attempt, 'memory' ) } + time = { check_max( 1.h * task.attempt, 'time' ) } + } + withLabel:process_medium { + cpus = { check_max( 8 * task.attempt, 'cpus' ) } + memory = { check_max( 8.GB * task.attempt, 'memory' ) } + time = { check_max( 8.h * task.attempt, 'time' ) } + } + withLabel:process_high { + cpus = { check_max( 12 * task.attempt, 'cpus' ) } + memory = { check_max( 72.GB * task.attempt, 'memory' ) } + time = { check_max( 16.h * task.attempt, 'time' ) } + } + withLabel:process_long { + time = { check_max( 160.h * task.attempt, 'time' ) } + } + withLabel:process_high_memory { + memory = { check_max( 350.GB * task.attempt, 'memory' ) } + } + withLabel:error_ignore { + errorStrategy = 'ignore' + } + withLabel:error_retry { + errorStrategy = 'retry' + maxRetries = 3 + } } diff --git a/conf/igenomes.config b/conf/igenomes.config deleted file mode 100644 index caeafceb..00000000 --- a/conf/igenomes.config +++ /dev/null @@ -1,421 +0,0 @@ -/* - * ------------------------------------------------- - * Nextflow config file for iGenomes paths - * ------------------------------------------------- - * Defines reference genomes, using iGenome paths - * Can be used by any config that customises the base - * path using $params.igenomes_base / --igenomes_base - */ - -params { - // illumina iGenomes reference file paths - genomes { - 'GRCh37' { - fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" - mito_name = "MT" - macs_gsize = "2.7e9" - blacklist = "${baseDir}/assets/blacklists/GRCh37-blacklist.bed" - } - 'GRCh38' { - fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed" - mito_name = "chrM" - macs_gsize = "2.7e9" - blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" - } - 'GRCm38' { - fasta = 
"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" - mito_name = "MT" - macs_gsize = "1.87e9" - blacklist = "${baseDir}/assets/blacklists/GRCm38-blacklist.bed" - } - 'TAIR10' { - fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt" - mito_name = "Mt" - } - 'EB2' { - fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt" - } - 'UMD3.1' { - fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt" - mito_name = "MT" - } - 'WBcel235' { - fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" - star = 
"${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" - mito_name = "MtDNA" - macs_gsize = "9e7" - } - 'CanFam3.1' { - fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt" - mito_name = "MT" - } - 'GRCz10' { - fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" - mito_name = "MT" - } - 'BDGP6' { - fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" - mito_name = "M" - macs_gsize = "1.2e8" - } - 'EquCab2' { - fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt" - mito_name = "MT" - } - 'EB1' { - fasta = 
"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt" - } - 'Galgal4' { - fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" - mito_name = "MT" - } - 'Gm01' { - fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt" - } - 'Mmul_1' { - fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt" - mito_name = "MT" - } - 'IRGSP-1.0' { - fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/" - gtf = 
"${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" - mito_name = "Mt" - } - 'CHIMP2.1.4' { - fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt" - mito_name = "MT" - } - 'Rnor_6.0' { - fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" - mito_name = "MT" - } - 'R64-1-1' { - fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" - mito_name = "MT" - macs_gsize = "1.2e7" - } - 'EF2' { - fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt" - mito_name = "MT" - macs_gsize = "1.21e7" - } - 'Sbi1' { - fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" - bwa = 
"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt" - } - 'Sscrofa10.2' { - fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt" - mito_name = "MT" - } - 'AGPv3' { - fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" - mito_name = "Mt" - } - 'hg38' { - fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed" - mito_name = "chrM" - macs_gsize = "2.7e9" - blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" - } - 'hg19' { - fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" - mito_name = "chrM" - macs_gsize = "2.7e9" - blacklist = "${baseDir}/assets/blacklists/hg19-blacklist.bed" - } 
- 'mm10' { - fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" - mito_name = "chrM" - macs_gsize = "1.87e9" - blacklist = "${baseDir}/assets/blacklists/mm10-blacklist.bed" - } - 'bosTau8' { - fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed" - mito_name = "chrM" - } - 'ce10' { - fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt" - mito_name = "chrM" - macs_gsize = "9e7" - } - 'canFam3' { - fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt" - mito_name = "chrM" - } - 'danRer10' { - fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/" - gtf = 
"${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed" - mito_name = "chrM" - macs_gsize = "1.37e9" - } - 'dm6' { - fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed" - mito_name = "chrM" - macs_gsize = "1.2e8" - } - 'equCab2' { - fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt" - mito_name = "chrM" - } - 'galGal4' { - fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt" - mito_name = "chrM" - } - 'panTro4' { - fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt" - mito_name = "chrM" - } - 'rn6' { - fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" - star = 
"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed" - mito_name = "chrM" - } - 'sacCer3' { - fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" - readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt" - mito_name = "chrM" - macs_gsize = "1.2e7" - } - 'susScr3' { - fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt" - mito_name = "chrM" - } - } -} diff --git a/conf/modules.config b/conf/modules.config new file mode 100644 index 00000000..17944342 --- /dev/null +++ b/conf/modules.config @@ -0,0 +1,134 @@ +/* +======================================================================================== + Config file for defining DSL2 per module options +======================================================================================== + Available keys to override module options: + args = Additional arguments appended to command in module. + args2 = Second set of arguments appended to command in module (multi-tool modules). + args3 = Third set of arguments appended to command in module (multi-tool modules). + publish_dir = Directory to publish results. + publish_by_meta = Groovy list of keys available in meta map to append as directories to "publish_dir" path + If publish_by_meta = true - Value of ${meta['id']} is appended as a directory to "publish_dir" path + If publish_by_meta = ['id', 'custompath'] - If "id" is in meta map and "custompath" isn't then "${meta['id']}/custompath/" + is appended as a directory to "publish_dir" path + If publish_by_meta = false / null - No directories are appended to "publish_dir" path + publish_files = Groovy map where key = "file_ext" and value = "directory" to publish results for that file extension + The value of "directory" is appended to the standard "publish_dir" path as defined above. + If publish_files = null (unspecified) - All files are published. + If publish_files = false - No files are published. + suffix = File name suffix for output files. +---------------------------------------------------------------------------------------- +*/ + +params { + modules { + 'fastqc' { + args = "--quiet" + publish_by_meta = ['id', 'FastQC'] + publish_dir = "." 
+ } + 'skewer' { + args = "-m pe -q 3 -n --quiet" + publish_by_meta = ['id', 'trimming/shortreads'] + publish_dir = "." + } + 'nanoplot' { + args = "" + publish_by_meta = ['id', 'QC_longreads/NanoPlot'] + publish_dir = "." + } + 'pycoqc' { + args = "" + publish_files = [ '.html':'', '.json':'' ] + publish_by_meta = ['id', 'QC_longreads/PycoQC'] + publish_dir = "." + } + 'porechop' { + args = "" + publish_by_meta = ['id', 'trimming/longreads'] + publish_dir = "." + } + 'unicycler' { + args = "" + publish_by_meta = ['id', 'Unicycler'] + publish_dir = "." + } + 'canu' { + args = "" + publish_by_meta = ['id', 'Canu'] + publish_dir = "." + } + 'minimap_align' { + args = "-x ava-ont" + publish_files = false + publish_by_meta = ['id', 'minimap_align'] + publish_dir = "." + } + 'minimap_consensus' { + args = "-x map-ont" + publish_files = false + publish_by_meta = ['id', 'minimap_consensus'] + publish_dir = "." + } + 'minimap_polish' { + args = "-ax map-ont" + publish_files = false + publish_by_meta = ['id', 'minimap_polish'] + publish_dir = "." + } + 'miniasm' { + args = "" + publish_files = [ '_assembly.fasta':'' ] + publish_by_meta = ['id', 'Miniasm'] + publish_dir = "." + } + 'racon' { + args = "" + publish_files = [ '_assembly_consensus.fasta':'' ] + publish_by_meta = ['id', 'Miniasm'] + publish_dir = "." + } + 'medaka' { + args = "" + publish_by_meta = ['id', 'Medaka'] + publish_dir = "." + } + 'nanopolish' { + args = "" + publish_by_meta = ['id', 'Nanopolish'] + publish_dir = "." + } + 'kraken2' { + args = "" + publish_files = [ 'report.txt':'' ] + publish_by_meta = ['id', 'Kraken2'] + publish_dir = "." + } + 'kraken2_long' { + args = "" + suffix = "_longreads" + publish_files = [ 'report.txt':'' ] + publish_by_meta = ['id', 'Kraken2'] + publish_dir = "." + } + 'quast' { + args = "" + publish_by_meta = false //the module allows no meta, it collects all assemblies! + publish_dir = "./QUAST" + suffix = "other_files" + } + 'prokka' { + args = "" + publish_by_meta = ['id', 'Prokka'] + publish_dir = "." + } + 'dfast' { + args = "" + publish_by_meta = ['id', 'DFAST'] + publish_dir = "." + } + 'multiqc' { + args = "" + } + } +} diff --git a/conf/test.config b/conf/test.config index 6a25a110..38cb8622 100644 --- a/conf/test.config +++ b/conf/test.config @@ -1,25 +1,31 @@ /* - * ------------------------------------------------- - * Nextflow config file for running tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a fast and simple test. Use as follows: - * nextflow run nf-core/bacass -profile test, - */ +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/bacass -profile test, + +---------------------------------------------------------------------------------------- +*/ params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = 6.GB - max_time = 48.h - skip_kraken2 = true - // some extra args to speed tests up - unicycler_args="--no_correct --no_pilon" - prokka_args=" --fast" - assembly_type = 'short' - skip_pycoqc = true - // Input dataset - input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_short.csv' + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = 6.GB + max_time = 6.h + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_short.csv' + + // some extra args to speed tests up + unicycler_args="--no_correct --no_pilon" + prokka_args=" --fast" + assembly_type = 'short' + skip_pycoqc = true + skip_kraken2 = true } diff --git a/conf/test_dfast.config b/conf/test_dfast.config index 9138ed47..756554d9 100644 --- a/conf/test_dfast.config +++ b/conf/test_dfast.config @@ -1,25 +1,31 @@ /* - * ------------------------------------------------- - * Nextflow config file for running tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a fast and simple test. Use as follows: - * nextflow run nf-core/bacass -profile test_dfast - */ +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/bacass -profile test_dfast, + +---------------------------------------------------------------------------------------- +*/ params { - config_profile_name = 'Test profile with dfast as annotation tool' - config_profile_description = 'Minimal test dataset to check pipeline function, using dfast instead of prokka for annotation' - // Limit resources so that this can run on Travis - max_cpus = 2 - max_memory = 6.GB - max_time = 48.h - skip_kraken2 = true - // some extra args to speed tests up - unicycler_args="--no_correct --no_pilon" - annotation_tool = 'dfast' - assembly_type = 'short' - skip_pycoqc = true - // Input dataset - input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_short.csv' + config_profile_name = 'Test_dfast profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = 6.GB + max_time = 6.h + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_short.csv' + + // some extra args to speed tests up + unicycler_args="--no_correct --no_pilon" + annotation_tool = 'dfast' + assembly_type = 'short' + skip_pycoqc = true + skip_kraken2 = true } \ No newline at end of file diff --git a/conf/test_full.config b/conf/test_full.config index d5d28082..298a4447 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -1,22 +1,20 @@ /* - * ------------------------------------------------- - * Nextflow config file for running full-size tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a full size pipeline test. Use as follows: - * nextflow run nf-core/bacass -profile test_full, - */ -includeConfig 'test.config' +======================================================================================== + Nextflow config file for running full-size tests +======================================================================================== + Defines input files and everything required to run a full size pipeline test. + + Use as follows: + nextflow run nf-core/bacass -profile test_full, + +---------------------------------------------------------------------------------------- +*/ params { - config_profile_name = 'Full test profile' - config_profile_description = 'Full test dataset to check pipeline function' + config_profile_name = 'Full test profile' + config_profile_description = 'Full test dataset to check pipeline function' - // some extra args to speed tests up - prokka_args=" --fast" - canu_args='stopOnLowCoverage=0 minInputCoverage=0' - assembly_type='long' - skip_polish = true - // Input data - input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_long.csv' + // Input data for full size test + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_full.csv' + kraken2db = 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz' } diff --git a/conf/test_hybrid.config b/conf/test_hybrid.config index 40bd1abb..cd93e699 100644 --- a/conf/test_hybrid.config +++ b/conf/test_hybrid.config @@ -1,18 +1,29 @@ /* - * ------------------------------------------------- - * Nextflow config file for running tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a fast and simple test. 
Use as follows: - * nextflow run nf-core/bacass -profile test_long - */ -includeConfig 'test.config' +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. + + Use as follows: + nextflow run nf-core/bacass -profile test, + +---------------------------------------------------------------------------------------- +*/ + params { - config_profile_name = 'Test profile for long and short read data' - config_profile_description = 'Minimal test dataset to check pipeline function' - // some extra args to speed tests up - assembly_type='hybrid' - prokka_args=" --fast" - // Input data - input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_hybrid.csv' -} \ No newline at end of file + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = 6.GB + max_time = 6.h + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_hybrid.csv' + + // some extra args to speed tests up + assembly_type='hybrid' + prokka_args=" --fast" + skip_kraken2 = true +} diff --git a/conf/test_long.config b/conf/test_long.config index 7cbc76f9..be225894 100644 --- a/conf/test_long.config +++ b/conf/test_long.config @@ -1,22 +1,30 @@ /* - * ------------------------------------------------- - * Nextflow config file for running tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a fast and simple test. Use as follows: - * nextflow run nf-core/bacass -profile test_long - */ -includeConfig 'test.config' +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/bacass -profile test_long, + +---------------------------------------------------------------------------------------- +*/ params { - config_profile_name = 'Test profile for long-read data' - config_profile_description = 'Minimal test dataset to check pipeline function' + config_profile_name = 'Test_long profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = 6.GB + max_time = 6.h + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_long_miniasm.csv' - // some extra args to speed tests up - prokka_args=" --fast" - canu_args='stopOnLowCoverage=0 minInputCoverage=0' - assembly_type='long' - skip_polish = true - // Input data - input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_long.csv' -} \ No newline at end of file + // some extra args to speed tests up + prokka_args = " --fast" + assembly_type = 'long' + skip_polish = true + skip_kraken2 = true +} diff --git a/conf/test_long_miniasm.config b/conf/test_long_miniasm.config index 93dee34f..a68d3124 100644 --- a/conf/test_long_miniasm.config +++ b/conf/test_long_miniasm.config @@ -1,22 +1,30 @@ /* - * ------------------------------------------------- - * Nextflow config file for running tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a fast and simple test. Use as follows: - * nextflow run nf-core/bacass -profile test_long - */ -includeConfig 'test.config' +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/bacass -profile test_long_miniasm, + +---------------------------------------------------------------------------------------- +*/ params { - config_profile_name = 'Test profile for long-read data' - config_profile_description = 'Minimal test dataset to check pipeline function' + config_profile_name = 'Test_long_miniasm profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = 6.GB + max_time = 6.h + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_long_miniasm.csv' - // some extra args to speed tests up - prokka_args=" --fast" - assembly_type='long' - assembler = 'miniasm' - skip_polish = true - // Input data - input = 'https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_long_miniasm.csv' -} \ No newline at end of file + // some extra args to speed tests up + prokka_args = " --fast" + assembly_type = 'long' + assembler = 'miniasm' + kraken2db = "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz" +} diff --git a/docs/README.md b/docs/README.md index 88f11b6d..5fbf9904 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,11 +1,10 @@ # nf-core/bacass: Documentation -The nf-core/bacass documentation is split into the following files: +The nf-core/bacass documentation is split into the following pages: * [Usage](usage.md) - * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. + * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. * [Output](output.md) - * An overview of the different results produced by the pipeline and how to interpret them. -* [Containers](containers.md) + * An overview of the different results produced by the pipeline and how to interpret them. You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) diff --git a/docs/containers.md b/docs/containers.md deleted file mode 100644 index df229e06..00000000 --- a/docs/containers.md +++ /dev/null @@ -1,5 +0,0 @@ -# Containers - -## `nfcore/bacass` container - -Some of the tools we use in bacass are inherently relying on software dependencies that are not installable in the same environment due to dependency conflicts. In these cases we utilize biocontainers. With a potential switch to Nextflow DSLv2 and therefore nextflow modules, we intend to have separated containers for all processes. This subsequently also means, that we cannot provide a single conda environment for the entire pipeline - please use Singularity or Docker to run your analysis - there is _NO_ conda support with the pipeline. 
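The three minimal test profiles updated above are intended to be combined with a container profile when launched, as their `Use as follows` headers indicate. A hedged sketch of the corresponding invocations (substitute `singularity`, `podman` or another supported profile for `docker` as appropriate):

```console
# Hybrid short + long read test profile
nextflow run nf-core/bacass -profile test,docker

# Long-read-only test profiles
nextflow run nf-core/bacass -profile test_long,docker
nextflow run nf-core/bacass -profile test_long_miniasm,docker
```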
diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png new file mode 100755 index 00000000..361d0e47 Binary files /dev/null and b/docs/images/mqc_fastqc_adapter.png differ diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png new file mode 100755 index 00000000..cb39ebb8 Binary files /dev/null and b/docs/images/mqc_fastqc_counts.png differ diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png new file mode 100755 index 00000000..a4b89bf5 Binary files /dev/null and b/docs/images/mqc_fastqc_quality.png differ diff --git a/docs/images/nf-core-bacass_logo.png b/docs/images/nf-core-bacass_logo.png index 862027b5..d37cb23f 100644 Binary files a/docs/images/nf-core-bacass_logo.png and b/docs/images/nf-core-bacass_logo.png differ diff --git a/docs/images/nfcore-bacass_logo.png b/docs/images/nfcore-bacass_logo.png deleted file mode 100644 index ce8716b3..00000000 Binary files a/docs/images/nfcore-bacass_logo.png and /dev/null differ diff --git a/docs/images/nfcore-bacass_logo.svg b/docs/images/nfcore-bacass_logo.svg deleted file mode 100644 index 7c5217d1..00000000 --- a/docs/images/nfcore-bacass_logo.svg +++ /dev/null @@ -1,205 +0,0 @@ - -image/svg+xmlnf- -core/ -bacass - \ No newline at end of file diff --git a/docs/output.md b/docs/output.md index 99bd31c0..974f9e22 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,33 +1,27 @@ # nf-core/bacass: Output -## :warning: Please read this documentation on the nf-core website: [https://nf-co.re/bacass/output](https://nf-co.re/bacass/output) - -> _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._ - ## Introduction This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. +The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. 
+ ## Pipeline overview -The pipeline is built using [Nextflow](https://www.nextflow.io/) -and processes data using the following steps: +The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: -* [nf-core/bacass: Output](#nf-corebacass-output) - * [Pipeline overview](#pipeline-overview) - * [Quality trimming and QC](#quality-trimming-and-qc) +* [Quality trimming and QC](#quality-trimming-and-qc) * [Short Read Trimming](#short-read-trimming) * [Short Read RAW QC](#short-read-raw-qc) * [Long Read Trimming](#long-read-trimming) * [Long Read RAW QC](#long-read-raw-qc) - * [Taxonomic classification](#taxonomic-classification) - * [Kraken2 report screenshot](#kraken2-report-screenshot) - * [Assembly Output](#assembly-output) - * [Assembly Visualization with Bandage](#assembly-visualization-with-bandage) - * [Assembly QC with QUAST](#assembly-qc-with-quast) - * [Annotation with Prokka](#annotation-with-prokka) - * [Report](#report) - * [Pipeline information](#pipeline-information) +* [Taxonomic classification](#taxonomic-classification) +* [Assembly Output](#assembly-output) + * [Polished assemblies](#polished-assemblies) +* [Assembly QC with QUAST](#assembly-qc-with-quast) +* [Annotation](#annotation) +* [Report](#report) +* [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution ## Quality trimming and QC @@ -36,51 +30,64 @@ and processes data using the following steps: This step quality trims the end of reads, removes degenerate or too short reads and if needed, combines reads coming from multiple sequencing runs. -**Output directory: `{sample_id}/trimming/shortreads/`** +
+Output files -* `*.fastq.gz` - * trimmed (and combined reads) +* `{sample_id}/trimming/shortreads/` + * `*.fastq.gz`: Trimmed (and combined) reads + +
### Short Read RAW QC -This step runs FastQC which produces -general quality metrics on your (trimmed) samples and plots them. +[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). -**Output directory: `{sample_id}/trimming/shortreads/`** +> **NB:** The FastQC plots displayed in the MultiQC report show _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. -* `*_fastqc.html` - * FastQC report, containing quality metrics for your trimmed reads -* `*_fastqc.zip` - * zip file containing the FastQC report, tab-delimited data file and plot images +
+Output files -For further reading and documentation see the [FastQC help](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). +* `{sample_id}/FastQC/` + * `*_fastqc.html`: FastQC report containing quality metrics. + * `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. ![FastQC report](images/fastqc.png) +
+ ### Long Read Trimming This step performs long read trimming on Nanopore input (if provided). -**Output directory: `{sample_id}/trimming/longreads/`** +
+Output files + +* `{sample_id}/trimming/longreads/` + * `trimmed.fastq.gz`: The trimmed FASTQ file -* `trimmed.fastq` - * The trimmed FASTQ file +
### Long Read RAW QC These steps perform long read QC for input data (if provided). -**Output directory: `{sample_id}/QC_Longreads/`** +Please refer to the documentation of [NanoPlot](https://github.com/wdecoster/NanoPlot) and [PycoQC](https://a-slide.github.io/pycoQC/) if you want to know more about the plots created by these tools. -* `NanoPlot` -* `PycoQC` +
+Output files -Please refer to the documentation of [NanoPlot](https://github.com/wdecoster/NanoPlot) and [PycoQC](https://a-slide.github.io/pycoQC/) if you want to know more about the plots created by these tools. +* `{sample_id}/QC_Longreads/NanoPlot`: Various plots in HTML and PNG format + +* `{sample_id}/QC_Longreads/PycoQC` + * `{sample_id}_pycoqc.html`: QC report in HTML format + * `{sample_id}_pycoqc.json`: QC report in JSON format Example plot from Nanoplot: ![Nanoplot](images/nanoplot.png) +
+ ## Taxonomic classification This QC step classifies your reads using [Kraken2](https://ccb.jhu.edu/software/kraken2/) a k-mer based approach. This helps to identify samples that have purity @@ -88,105 +95,142 @@ issues. Ideally you will not want to assemble reads from samples that are contam multiple species. If you like to visualize the report, try [Pavian](https://github.com/fbreitwieser/pavian) or [Krakey](http://krakey.info/). -**Output directory: `{sample}/`** +
+Output files + +* `{sample}/Kraken2` + * `{sample}.kraken2.report.txt`: Classification of short reads in the Kraken(1) report format. + * `{sample}_longreads.kraken2.report.txt`: Classification of long reads in the Kraken(1) report format. -* `*_kraken2.report` - * Classification in the Kraken(1) report format. See - [webpage](http://ccb.jhu.edu/software/kraken/MANUAL.html#sample-reports) for more details +See [webpage](http://ccb.jhu.edu/software/kraken/MANUAL.html#sample-reports) for more details. -### Kraken2 report screenshot +Exemplary Kraken2 report screenshot: ![Kraken2 report](images/kraken2.png) +
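The test profiles in this changeset control this step with the `skip_kraken2` and `kraken2db` parameters; a hedged sketch of using the same parameters on the command line (the database URL is the one from `conf/test_long_miniasm.config`, the samplesheet path is a placeholder):

```console
# Classify reads against a small 16S Kraken2 database
nextflow run nf-core/bacass --input samplesheet.csv -profile docker \
    --kraken2db "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz"

# Or skip taxonomic classification entirely
nextflow run nf-core/bacass --input samplesheet.csv -profile docker --skip_kraken2
```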
+ + ## Assembly Output Trimmed reads are assembled with [Unicycler](https://github.com/rrwick/Unicycler) in `short` or `hybrid` assembly modes. For long-read assembly, `canu` and `miniasm` are also available. Unicycler is a pipeline in its own right, which, at least for Illumina reads, mainly acts as a frontend to SPAdes with added polishing steps. -**Output directory: `{sample_id}/unicycler`** +
+Output files -* `{sample}_assembly.fasta` - * Final assembly -* `{sample}_assembly.gfa` - * Final assembly in Graphical Fragment Assembly (GFA) format -* `{sample}_unicycler.log` - * Log file summarizing steps and intermediate results on the Unicycler execution +* `{sample_id}/Unicycler` + * `{sample}.scaffolds.fa`: Final assembly in fasta format + * `{sample}.assembly.gfa`: Final assembly in Graphical Fragment Assembly (GFA) format + * `{sample}.unicycler.log`: Log file summarizing steps and intermediate results on the Unicycler execution Check out the [Unicycler documentation](https://github.com/rrwick/Unicycler) for more information on Unicycler output. -**Output directory: `{sample_id}/canu`** +* `{sample_id}/Canu` + * `{sample}_assembly.fasta`: Final assembly in fasta format + * `{sample}_assembly.report`: Log file Check out the [Canu documentation](https://canu.readthedocs.io/en/latest/index.html) for more information on Canu output. -**Output directory: `{sample_id}/miniasm`** - -* `consensus` - * The consensus sequence created by `miniasm` +* `{sample_id}/Miniasm` + * `{sample}_assembly.fasta`: Assembly in fasta format + * `{sample}_assembly_consensus.fasta`: Consensus assembly in fasta format (polished by Racon) Check out the [Miniasm documentation](https://github.com/lh3/miniasm) for more information on Miniasm output. -## Assembly Visualization with Bandage +
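Which of these assemblers is run is controlled by the `assembler` and `assembly_type` parameters that the test configs in this changeset set (`unicycler` being the default). A hedged command-line sketch; the samplesheet path is a placeholder:

```console
# Long-read assembly with Miniasm instead of the default Unicycler
nextflow run nf-core/bacass --input samplesheet.csv -profile docker \
    --assembly_type long --assembler miniasm
```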
+ + ### Polished assemblies -The GFA file produced in the assembly step with Unicycler can be used to visualise the assembly graph, which is -done here with [Bandage](https://rrwick.github.io/Bandage/). We highly recommend to run the Bandage GUI for more versatile visualisation options (annotations etc). +Long read assemblies can be polished using [Medaka](https://github.com/nanoporetech/medaka) or [NanoPolish](https://github.com/jts/nanopolish) with Fast5 files. -**Output directory: `{sample_id}/unicycler`** +
+Output files -* `{sample}_assembly.png` - * Bandage visualization of assembly +* `{sample_id}/Medaka/{sample_id}_polished_genome.fa` + * `consensus.fasta`: Polished consensus assembly in fasta format + * `calls_to_draft.bam`: Alignment in bam format + * `calls_to_draft.bam.bai`: Index of alignment + * `consensus.fasta.gaps_in_draft_coords.bed` + * `consensus_probs.hdf` -![Assembly visualization](images/bandage.png) +* `{sample_id}/Nanopolish` + * `polished_genome.fa`: Polished consensus assembly in fasta format + +
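Polishing applies to long-read assemblies only; the long-read test configs in this changeset switch it off with `skip_polish = true` because no Fast5 data is bundled. A hedged sketch of the equivalent flag on the command line (the samplesheet path is a placeholder):

```console
# Long-read assembly without the polishing step, e.g. when no Fast5 files are available
nextflow run nf-core/bacass --input samplesheet.csv -profile docker \
    --assembly_type long --skip_polish
```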
## Assembly QC with QUAST -The assembly QC is performed with [QUAST](http://quast.sourceforge.net/quast). -It reports multiple metrics including number of contigs, N50, lengths etc in form of an html report. -It further creates an HTML file with integrated contig viewer (Icarus). +The assembly QC is performed with [QUAST](http://quast.sourceforge.net/quast) for all assemblies in one report. It reports multiple metrics, including the number of contigs, N50 and assembly lengths, in the form of an HTML report. It further creates an HTML file with an integrated contig viewer (Icarus). -**Output directory: `{sample_id}/QUAST`** +
+Output files -* `icarus.html` - * QUAST's contig browser as HTML -* `report.html` - * QUAST assembly QC as HTML report +* `QUAST` + * `report.tsv`: QUAST's report in text format +* `QUAST/other_files` + * `icarus.html`: QUAST's contig browser as HTML + * `report.html`: QUAST assembly QC as HTML report + * `report.pdf`: QUAST assembly QC as pdf ![QUAST QC](images/quast.png) ![Icarus](images/icarus.png) -## Annotation with Prokka +
+ + ## Annotation + By default, the assembly is annotated with [Prokka](https://github.com/tseemann/prokka), which acts as a frontend for several annotation tools and includes rRNA and ORF predictions. Alternatively, the assembly can be annotated with [DFAST](https://github.com/nigyta/dfast_core) on request. -The assembly is annotated with [Prokka](https://github.com/tseemann/prokka) which acts as frontend -for several annotation tools and includes rRNA and ORF predictions. See [its documentation](https://github.com/tseemann/prokka#output-files) for a full description of all output files. +
+Output files -**Output directory: `{sample_id}/{sample_id}_annotation`** +* `{sample_id}/Prokka/{sample_id}` + * `{sample_id}.gff`: Annotation in gff format + * `{sample_id}.txt`: Annotation in text format + * `{sample_id}.faa`: Protein sequences in fasta format + +See [Prokka's documentation](https://github.com/tseemann/prokka#output-files) for a full description of all output files. ![Prokka annotation](images/prokka.png) +* `{sample_id}/DFAST/RESULT_{dfast_profile_name}` + * `genome.gff`: Annotation in gff format + * `statistics.txt`: Annotation statistics in text format + * `protein.faa`: Protein sequences in fasta format + +
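Additional arguments can be forwarded to the annotation step via the `prokka_args` parameter, which the test configs in this changeset use to pass `--fast`. A hedged example with a standard Prokka option; the genus value is purely illustrative:

```console
# Forward extra options to Prokka, here its --genus flag
nextflow run nf-core/bacass --input samplesheet.csv -profile docker \
    --prokka_args " --genus Escherichia"
```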
+ + ## Report Some pipeline results are visualised by [MultiQC](http://multiqc.info), which is a visualisation tool that generates a single HTML report summarising all samples in your project. Further statistics are available within the report data directory. -[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarizing all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory. +[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. -For more information about how to use MultiQC reports, see [https://multiqc.info](https://multiqc.info). +Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. For more information about how to use MultiQC reports, see <http://multiqc.info>. -**Output files:** +
+Output files * `multiqc/` - * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. - * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. - * `multiqc_plots/`: directory containing static images from the report in various formats. + * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. + * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. + * `multiqc_plots/`: directory containing static images from the report in various formats. -## Pipeline information +
+ +### Pipeline information [Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. -**Output files:** +
+Output files * `pipeline_info/` - * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. - * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.csv`. - * Documentation for interpretation of results in HTML format: `results_description.html`. + * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. + * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.tsv`. + * Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`. + +
diff --git a/docs/usage.md b/docs/usage.md index 73bf960f..a2cab431 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -1,4 +1,4 @@ -# nf-core/bacass: Usage +# nf-core/bacass: Usage ## :warning: Please read this documentation on the nf-core website: [https://nf-co.re/bacass/usage](https://nf-co.re/bacass/usage) @@ -6,36 +6,51 @@ ## Introduction -* [General Nextflow info](#general-nextflow-info) -* [Running the pipeline](#running-the-pipeline) - * [Updating the pipeline](#updating-the-pipeline) - * [Reproducibility](#reproducibility) -* [Main Nextflow arguments](#main-nextflow-arguments) - * [`-profile`](#-profile) +You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 6 columns, and a header row as shown in the examples below. -## General Nextflow info +```console +--input '[path to samplesheet file]' +``` -Nextflow handles job submissions on SLURM or other environments, and supervises running the jobs. Thus the Nextflow process must run until the pipeline is finished. We recommend that you put the process running in the background through `screen` / `tmux` or similar tool. Alternatively you can run nextflow within a cluster job submitted your job scheduler. +### Samplesheet -It is recommended to limit the Nextflow Java virtual machines memory. We recommend adding the following line to your environment (typically in `~/.bashrc` or `~./bash_profile`): +The samplesheet must have the 6 columns defined in the table below. -```bash -NXF_OPTS='-Xms1g -Xmx4g' +A final samplesheet file may contain short reads only, long reads only, or both short and long reads. The example below shows 3 samples. + +```console +ID R1 R2 LongFastQ Fast5 GenomeSize +shortreads ./data/S1_R1.fastq.gz ./data/S1_R2.fastq.gz NA NA NA +longreads NA NA ./data/S1_long_fastq.gz ./data/FAST5 2.8m +shortNlong ./data/S1_R1.fastq.gz ./data/S1_R2.fastq.gz ./data/S1_long_fastq.gz ./data/FAST5 2.8m ``` +> **NB:** `./data/FAST5` points at a folder containing all (i.e. one or multiple) fast5 files that correspond to the long reads. `NA` indicates that the file is missing. + +| Column | Description | +|-|-| +| `ID` | Custom sample name. May not contain spaces. | +| `R1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". `NA` indicates that the file is missing. | +| `R2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". `NA` indicates that the file is missing. | +| `LongFastQ` | Full path to FastQ file for ONT long reads. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". `NA` indicates that the file is missing. | +| `Fast5` | Full path to a folder containing Fast5 file(s) for ONT long reads. `NA` indicates that there are no Fast5 files available. | +| `GenomeSize` | Expected genome size. For example, `2.8m` means an expected genome size of 2.8 million base pairs. This is only used by the Canu assembler. `NA` indicates that this value is unknown. | +An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
+ ## Running the pipeline The typical command for running the pipeline is as follows: -```bash -nextflow run nf-core/bacass --input design.tsv -profile docker +```console +nextflow run nf-core/bacass --input samplesheet.csv -profile docker --skip_kraken2 ``` This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. Note that the pipeline will create the following files in your working directory: -```bash +```console work # Directory containing the nextflow working files results # Finished results (configurable, see below) .nextflow_log # Log file from Nextflow @@ -46,41 +61,52 @@ results # Finished results (configurable, see below) When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: -```bash +```console nextflow pull nf-core/bacass ``` ### Reproducibility -It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. +It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. First, go to the [nf-core/bacass releases page](https://github.com/nf-core/bacass/releases) and find the latest version number - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. -## Main Nextflow arguments +## Core Nextflow arguments + +> **NB:** These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen). ### `-profile` Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. -Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Conda) - see below. -If `-profile` is not specified at all the pipeline will be run locally and expects all software to be installed and available on the `PATH`. +Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Conda) - see below. When using Biocontainers, most of these software packaging methods pull Docker containers from quay.io e.g [FastQC](https://quay.io/repository/biocontainers/fastqc) except for Singularity which directly downloads Singularity images via https hosted by the [Galaxy project](https://depot.galaxyproject.org/singularity/) and Conda which downloads and installs software locally from [Bioconda](https://bioconda.github.io/). 
+ +> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. + +The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). + +Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! +They are loaded in sequence, so later profiles can overwrite earlier profiles. + +If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. -* `awsbatch` - * A generic configuration profile to be used with AWS Batch. * `docker` - * A generic configuration profile to be used with [Docker](http://docker.com/) - * Pulls software from DockerHub: [`nfcore/bacass`](http://hub.docker.com/r/nfcore/bacass/) + * A generic configuration profile to be used with [Docker](https://docker.com/) * `singularity` - * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) - * Pulls software from Docker Hub: [`nfcore/bacass`](https://hub.docker.com/r/nfcore/bacass/) + * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) * `podman` - * A generic configuration profile to be used with [Podman](https://podman.io/) - * Pulls software from Docker Hub: [`nfcore/bacass`](https://hub.docker.com/r/nfcore/bacass/) + * A generic configuration profile to be used with [Podman](https://podman.io/) +* `shifter` + * A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/) +* `charliecloud` + * A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) +* `conda` + * A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud. * `test` - * A profile with a complete configuration for automated testing - * Includes links to test data so needs no other parameters + * A profile with a complete configuration for automated testing + * Includes links to test data so needs no other parameters ### `-resume` @@ -92,27 +118,140 @@ You can also supply a run name to resume a specific run: `-resume [run-name]`. U Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. -#### Custom resource requests +## Custom configuration + +### Resource requests -Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped. +Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. 
Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped. -Whilst these default requirements will hopefully work for most people with most data, you may find that you want to customise the compute resources that the pipeline requests. You can do this by creating a custom config file. For example, to give the workflow process `star` 32GB of memory, you could use the following config: +For example, if the nf-core/rnaseq pipeline is failing after multiple re-submissions of the `STAR_ALIGN` process due to an exit code of `137` this would indicate that there is an out of memory issue: + +```console +[62/149eb0] NOTE: Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) -- Execution is retried (1) +Error executing process > 'RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)' + +Caused by: + Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) + +Command executed: + STAR \ + --genomeDir star \ + --readFilesIn WT_REP1_trimmed.fq.gz \ + --runThreadN 2 \ + --outFileNamePrefix WT_REP1. \ + + +Command exit status: + 137 + +Command output: + (empty) + +Command error: + .command.sh: line 9: 30 Killed STAR --genomeDir star --readFilesIn WT_REP1_trimmed.fq.gz --runThreadN 2 --outFileNamePrefix WT_REP1. +Work dir: + /home/pipelinetest/work/9d/172ca5881234073e8d76f2a19c88fb + +Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run` +``` + +To bypass this error you would need to find exactly which resources are set by the `STAR_ALIGN` process. The quickest way is to search for `process STAR_ALIGN` in the [nf-core/rnaseq Github repo](https://github.com/nf-core/rnaseq/search?q=process+STAR_ALIGN). We have standardised the structure of Nextflow DSL2 pipelines such that all module files will be present in the `modules/` directory and so based on the search results the file we want is `modules/nf-core/software/star/align/main.nf`. If you click on the link to that file you will notice that there is a `label` directive at the top of the module that is set to [`label process_high`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L9). The [Nextflow `label`](https://www.nextflow.io/docs/latest/process.html#label) directive allows us to organise workflow processes in separate groups which can be referenced in a configuration file to select and configure subset of processes having similar computing requirements. The default values for the `process_high` label are set in the pipeline's [`base.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L33-L37) which in this case is defined as 72GB. Providing you haven't set any other standard nf-core parameters to __cap__ the [maximum resources](https://nf-co.re/usage/configuration#max-resources) used by the pipeline then we can try and bypass the `STAR_ALIGN` process failure by creating a custom config file that sets at least 72GB of memory, in this case increased to 100GB. 
The custom config below can then be provided to the pipeline via the [`-c`](#-c) parameter as highlighted in previous sections. ```nextflow process { - withName: star { - memory = 32.GB - } + withName: STAR_ALIGN { + memory = 100.GB + } +} +``` + +> **NB:** We specify just the process name i.e. `STAR_ALIGN` in the config file and not the full task name string that is printed to screen in the error message or on the terminal whilst the pipeline is running i.e. `RNASEQ:ALIGN_STAR:STAR_ALIGN`. You may get a warning suggesting that the process selector isn't recognised but you can ignore that if the process name has been specified correctly. This is something that needs to be fixed upstream in core Nextflow. + +### Tool-specific options + +For the ultimate flexibility, we have implemented and are using Nextflow DSL2 modules in a way where it is possible for both developers and users to change tool-specific command-line arguments (e.g. providing an additional command-line argument to the `STAR_ALIGN` process) as well as publishing options (e.g. saving files produced by the `STAR_ALIGN` process that aren't saved by default by the pipeline). In the majority of instances, as a user you won't have to change the default options set by the pipeline developer(s), however, there may be edge cases where creating a simple custom config file can improve the behaviour of the pipeline if for example it is failing due to a weird error that requires setting a tool-specific parameter to deal with smaller / larger genomes. + +The command-line arguments passed to STAR in the `STAR_ALIGN` module are a combination of: + +* Mandatory arguments or those that need to be evaluated within the scope of the module, as supplied in the [`script`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L49-L55) section of the module file. + +* An [`options.args`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L56) string of non-mandatory parameters that is set to be empty by default in the module but can be overwritten when including the module in the sub-workflow / workflow context via the `addParams` Nextflow option. + +The nf-core/rnaseq pipeline has a sub-workflow (see [terminology](https://github.com/nf-core/modules#terminology)) specifically to align reads with STAR and to sort, index and generate some basic stats on the resulting BAM files using SAMtools. At the top of this file we import the `STAR_ALIGN` module via the Nextflow [`include`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L10) keyword and by default the options passed to the module via the `addParams` option are set as an empty Groovy map [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L5); this in turn means `options.args` will be set to empty by default in the module file too. This is an intentional design choice and allows us to implement well-written sub-workflows composed of a chain of tools that by default run with the bare minimum parameter set for any given tool in order to make it much easier to share across pipelines and to provide the flexibility for users and developers to customise any non-mandatory arguments. 
+ +When including the sub-workflow above in the main pipeline workflow we use the same `include` statement, however, we now have the ability to overwrite options for each of the tools in the sub-workflow including the [`align_options`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L225) variable that will be used specifically to overwrite the optional arguments passed to the `STAR_ALIGN` module. In this case, the options to be provided to `STAR_ALIGN` have been assigned sensible defaults by the developer(s) in the pipeline's [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L70-L74) and can be accessed and customised in the [workflow context](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L201-L204) too before eventually passing them to the sub-workflow as a Groovy map called `star_align_options`. These options will then be propagated from `workflow -> sub-workflow -> module`. + +As mentioned at the beginning of this section it may also be necessary for users to overwrite the options passed to modules to be able to customise specific aspects of the way in which a particular tool is executed by the pipeline. Given that all of the default module options are stored in the pipeline's `modules.config` as a [`params` variable](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L24-L25) it is also possible to overwrite any of these options via a custom config file. + +Say for example we want to append an additional, non-mandatory parameter (i.e. `--outFilterMismatchNmax 16`) to the arguments passed to the `STAR_ALIGN` module. Firstly, we need to copy across the default `args` specified in the [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L71) and create a custom config file that is a composite of the default `args` as well as the additional options you would like to provide. This is very important because Nextflow will overwrite the default value of `args` that you provide via the custom config. + +As you will see in the example below, we have: + +* appended `--outFilterMismatchNmax 16` to the default `args` used by the module. +* changed the default `publish_dir` value to where the files will eventually be published in the main results directory. +* appended `'bam':''` to the default value of `publish_files` so that the BAM files generated by the process will also be saved in the top-level results directory for the module. Note: `'out':'log'` means any file/directory ending in `out` will now be saved in a separate directory called `my_star_directory/log/`. + +```nextflow +params { + modules { + 'star_align' { + args = "--quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand zcat --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend --outFilterMismatchNmax 16" + publish_dir = "my_star_directory" + publish_files = ['out':'log', 'tab':'log', 'bam':''] + } + } } ``` -See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information. 
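However the custom config is assembled, it is passed to the pipeline with Nextflow's `-c` option described earlier; a hedged example (the config file name is arbitrary):

```console
nextflow run nf-core/bacass --input samplesheet.csv -profile docker -c custom.config
```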
+### Updating containers + +The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. If for some reason you need to use a different version of a particular tool with the pipeline then you just need to identify the `process` name and override the Nextflow `container` definition for that process using the `withName` declaration. For example, in the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline a tool called [Pangolin](https://github.com/cov-lineages/pangolin) has been used during the COVID-19 pandemic to assign lineages to SARS-CoV-2 genome sequenced samples. Given that the lineage assignments change quite frequently it doesn't make sense to re-release the nf-core/viralrecon everytime a new version of Pangolin has been released. However, you can override the default container used by the pipeline by creating a custom config file and passing it as a command-line argument via `-c custom.config`. + +1. Check the default version used by the pipeline in the module file for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19) +2. Find the latest version of the Biocontainer available on [Quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags) +3. Create the custom config accordingly: + + * For Docker: + + ```nextflow + process { + withName: PANGOLIN { + container = 'quay.io/biocontainers/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` + + * For Singularity: + + ```nextflow + process { + withName: PANGOLIN { + container = 'https://depot.galaxyproject.org/singularity/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` + + * For Conda: + + ```nextflow + process { + withName: PANGOLIN { + conda = 'bioconda::pangolin=3.0.5' + } + } + ``` + +> **NB:** If you wish to periodically update individual tool-specific results (e.g. Pangolin) generated by the pipeline then you must ensure to keep the `work/` directory otherwise the `-resume` ability of the pipeline will be compromised and it will restart from scratch. + +### nf-core/configs + +In most cases, you will only need to create a custom config as a one-off but if you and others within your organisation are likely to be running nf-core pipelines regularly and need to use the same settings regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter. You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. -If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter (see definition above). 
You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. +See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information about creating your own configuration files. If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). -### Running in the background +## Running in the background Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. @@ -121,11 +260,11 @@ The Nextflow `-bg` flag launches Nextflow in the background, detached from your Alternatively, you can use `screen` / `tmux` or similar tool to create a detached session which you can log back into at a later time. Some HPC setups also allow you to run nextflow within a cluster job submitted your job scheduler (from where it submits more jobs). -#### Nextflow memory requirements +## Nextflow memory requirements In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~./bash_profile`): -```bash +```console NXF_OPTS='-Xms1g -Xmx4g' ``` diff --git a/environment.yml b/environment.yml deleted file mode 100644 index 2fd763e3..00000000 --- a/environment.yml +++ /dev/null @@ -1,25 +0,0 @@ -# You can use this file to create a conda environment for this pipeline: -# conda env create -f environment.yml -name: nf-core-bacass-1.1.1 -channels: - - conda-forge - - bioconda - - defaults -dependencies: - - python=3.7.3 - #Stuff for the documentation output - - conda-forge::markdown=3.3 - - conda-forge::pymdown-extensions=8.0.1 - - conda-forge::pygments=2.7.1 - #Other dependencies - - fastqc=0.11.9 - - multiqc=1.9 - - skewer=0.2.2 - - kraken2=2.0.9beta - # Nanopore analysis stuff - - conda-forge::parallel=20200922 - - miniasm=0.3_r179 - - racon=1.4.13 - - minimap2=2.17 - - canu=2.0 - - h5py=2.10.0 #until the pycoqc recipe has been updated diff --git a/lib/NfcoreSchema.groovy b/lib/NfcoreSchema.groovy new file mode 100755 index 00000000..8d6920dd --- /dev/null +++ b/lib/NfcoreSchema.groovy @@ -0,0 +1,517 @@ +// +// This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf-core pipeline template. 
+// + +import org.everit.json.schema.Schema +import org.everit.json.schema.loader.SchemaLoader +import org.everit.json.schema.ValidationException +import org.json.JSONObject +import org.json.JSONTokener +import org.json.JSONArray +import groovy.json.JsonSlurper +import groovy.json.JsonBuilder + +class NfcoreSchema { + + // + // Resolve Schema path relative to main workflow directory + // + public static String getSchemaPath(workflow, schema_filename='nextflow_schema.json') { + return "${workflow.projectDir}/${schema_filename}" + } + + // + // Function to loop over all parameters defined in schema and check + // whether the given parameters adhere to the specifications + // + /* groovylint-disable-next-line UnusedPrivateMethodParameter */ + public static void validateParameters(workflow, params, log, schema_filename='nextflow_schema.json') { + def has_error = false + //=====================================================================// + // Check for nextflow core params and unexpected params + def json = new File(getSchemaPath(workflow, schema_filename=schema_filename)).text + def Map schemaParams = (Map) new JsonSlurper().parseText(json).get('definitions') + def nf_params = [ + // Options for base `nextflow` command + 'bg', + 'c', + 'C', + 'config', + 'd', + 'D', + 'dockerize', + 'h', + 'log', + 'q', + 'quiet', + 'syslog', + 'v', + 'version', + + // Options for `nextflow run` command + 'ansi', + 'ansi-log', + 'bg', + 'bucket-dir', + 'c', + 'cache', + 'config', + 'dsl2', + 'dump-channels', + 'dump-hashes', + 'E', + 'entry', + 'latest', + 'lib', + 'main-script', + 'N', + 'name', + 'offline', + 'params-file', + 'pi', + 'plugins', + 'poll-interval', + 'pool-size', + 'profile', + 'ps', + 'qs', + 'queue-size', + 'r', + 'resume', + 'revision', + 'stdin', + 'stub', + 'stub-run', + 'test', + 'w', + 'with-charliecloud', + 'with-conda', + 'with-dag', + 'with-docker', + 'with-mpi', + 'with-notification', + 'with-podman', + 'with-report', + 'with-singularity', + 'with-timeline', + 'with-tower', + 'with-trace', + 'with-weblog', + 'without-docker', + 'without-podman', + 'work-dir' + ] + def unexpectedParams = [] + + // Collect expected parameters from the schema + def expectedParams = [] + for (group in schemaParams) { + for (p in group.value['properties']) { + expectedParams.push(p.key) + } + } + + for (specifiedParam in params.keySet()) { + // nextflow params + if (nf_params.contains(specifiedParam)) { + log.error "ERROR: You used a core Nextflow option with two hyphens: '--${specifiedParam}'. 
Please resubmit with '-${specifiedParam}'" + has_error = true + } + // unexpected params + def params_ignore = params.schema_ignore_params.split(',') + 'schema_ignore_params' + def expectedParamsLowerCase = expectedParams.collect{ it.replace("-", "").toLowerCase() } + def specifiedParamLowerCase = specifiedParam.replace("-", "").toLowerCase() + def isCamelCaseBug = (specifiedParam.contains("-") && !expectedParams.contains(specifiedParam) && expectedParamsLowerCase.contains(specifiedParamLowerCase)) + if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam) && !isCamelCaseBug) { + // Temporarily remove camelCase/camel-case params #1035 + def unexpectedParamsLowerCase = unexpectedParams.collect{ it.replace("-", "").toLowerCase()} + if (!unexpectedParamsLowerCase.contains(specifiedParamLowerCase)){ + unexpectedParams.push(specifiedParam) + } + } + } + + //=====================================================================// + // Validate parameters against the schema + InputStream input_stream = new File(getSchemaPath(workflow, schema_filename=schema_filename)).newInputStream() + JSONObject raw_schema = new JSONObject(new JSONTokener(input_stream)) + + // Remove anything that's in params.schema_ignore_params + raw_schema = removeIgnoredParams(raw_schema, params) + + Schema schema = SchemaLoader.load(raw_schema) + + // Clean the parameters + def cleanedParams = cleanParameters(params) + + // Convert to JSONObject + def jsonParams = new JsonBuilder(cleanedParams) + JSONObject params_json = new JSONObject(jsonParams.toString()) + + // Validate + try { + schema.validate(params_json) + } catch (ValidationException e) { + println '' + log.error 'ERROR: Validation of pipeline parameters failed!' + JSONObject exceptionJSON = e.toJSON() + printExceptions(exceptionJSON, params_json, log) + println '' + has_error = true + } + + // Check for unexpected parameters + if (unexpectedParams.size() > 0) { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) + println '' + def warn_msg = 'Found unexpected parameters:' + for (unexpectedParam in unexpectedParams) { + warn_msg = warn_msg + "\n* --${unexpectedParam}: ${params[unexpectedParam].toString()}" + } + log.warn warn_msg + log.info "- ${colors.dim}Ignore this warning: params.schema_ignore_params = \"${unexpectedParams.join(',')}\" ${colors.reset}" + println '' + } + + if (has_error) { + System.exit(1) + } + } + + // + // Beautify parameters for --help + // + public static String paramsHelp(workflow, params, command, schema_filename='nextflow_schema.json') { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) + Integer num_hidden = 0 + String output = '' + output += 'Typical pipeline command:\n\n' + output += " ${colors.cyan}${command}${colors.reset}\n\n" + Map params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) + Integer max_chars = paramsMaxChars(params_map) + 1 + Integer desc_indent = max_chars + 14 + Integer dec_linewidth = 160 - desc_indent + for (group in params_map.keySet()) { + Integer num_params = 0 + String group_output = colors.underlined + colors.bold + group + colors.reset + '\n' + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (group_params.get(param).hidden && !params.show_hidden_params) { + num_hidden += 1 + continue; + } + def type = '[' + group_params.get(param).type + ']' + def description = group_params.get(param).description + def defaultValue = 
group_params.get(param).default ? " [default: " + group_params.get(param).default.toString() + "]" : '' + def description_default = description + colors.dim + defaultValue + colors.reset + // Wrap long description texts + // Loosely based on https://dzone.com/articles/groovy-plain-text-word-wrap + if (description_default.length() > dec_linewidth){ + List olines = [] + String oline = "" // " " * indent + description_default.split(" ").each() { wrd -> + if ((oline.size() + wrd.size()) <= dec_linewidth) { + oline += wrd + " " + } else { + olines += oline + oline = wrd + " " + } + } + olines += oline + description_default = olines.join("\n" + " " * desc_indent) + } + group_output += " --" + param.padRight(max_chars) + colors.dim + type.padRight(10) + colors.reset + description_default + '\n' + num_params += 1 + } + group_output += '\n' + if (num_params > 0){ + output += group_output + } + } + if (num_hidden > 0){ + output += colors.dim + "!! Hiding $num_hidden params, use --show_hidden_params to show them !!\n" + colors.reset + } + output += NfcoreTemplate.dashedLine(params.monochrome_logs) + return output + } + + // + // Groovy Map summarising parameters/workflow options used by the pipeline + // + public static LinkedHashMap paramsSummaryMap(workflow, params, schema_filename='nextflow_schema.json') { + // Get a selection of core Nextflow workflow options + def Map workflow_summary = [:] + if (workflow.revision) { + workflow_summary['revision'] = workflow.revision + } + workflow_summary['runName'] = workflow.runName + if (workflow.containerEngine) { + workflow_summary['containerEngine'] = workflow.containerEngine + } + if (workflow.container) { + workflow_summary['container'] = workflow.container + } + workflow_summary['launchDir'] = workflow.launchDir + workflow_summary['workDir'] = workflow.workDir + workflow_summary['projectDir'] = workflow.projectDir + workflow_summary['userName'] = workflow.userName + workflow_summary['profile'] = workflow.profile + workflow_summary['configFiles'] = workflow.configFiles.join(', ') + + // Get pipeline parameters defined in JSON Schema + def Map params_summary = [:] + def blacklist = ['hostnames'] + def params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) + for (group in params_map.keySet()) { + def sub_params = new LinkedHashMap() + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (params.containsKey(param) && !blacklist.contains(param)) { + def params_value = params.get(param) + def schema_value = group_params.get(param).default + def param_type = group_params.get(param).type + if (schema_value != null) { + if (param_type == 'string') { + if (schema_value.contains('$projectDir') || schema_value.contains('${projectDir}')) { + def sub_string = schema_value.replace('\$projectDir', '') + sub_string = sub_string.replace('\${projectDir}', '') + if (params_value.contains(sub_string)) { + schema_value = params_value + } + } + if (schema_value.contains('$params.outdir') || schema_value.contains('${params.outdir}')) { + def sub_string = schema_value.replace('\$params.outdir', '') + sub_string = sub_string.replace('\${params.outdir}', '') + if ("${params.outdir}${sub_string}" == params_value) { + schema_value = params_value + } + } + } + } + + // We have a default in the schema, and this isn't it + if (schema_value != null && params_value != schema_value) { + sub_params.put(param, params_value) + } + // No default in the schema, and this isn't 
empty + else if (schema_value == null && params_value != "" && params_value != null && params_value != false) { + sub_params.put(param, params_value) + } + } + } + params_summary.put(group, sub_params) + } + return [ 'Core Nextflow options' : workflow_summary ] << params_summary + } + + // + // Beautify parameters for summary and return as string + // + public static String paramsSummaryLog(workflow, params) { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) + String output = '' + def params_map = paramsSummaryMap(workflow, params) + def max_chars = paramsMaxChars(params_map) + for (group in params_map.keySet()) { + def group_params = params_map.get(group) // This gets the parameters of that particular group + if (group_params) { + output += colors.bold + group + colors.reset + '\n' + for (param in group_params.keySet()) { + output += " " + colors.blue + param.padRight(max_chars) + ": " + colors.green + group_params.get(param) + colors.reset + '\n' + } + output += '\n' + } + } + output += "!! Only displaying parameters that differ from the pipeline defaults !!\n" + output += NfcoreTemplate.dashedLine(params.monochrome_logs) + return output + } + + // + // Loop over nested exceptions and print the causingException + // + private static void printExceptions(ex_json, params_json, log) { + def causingExceptions = ex_json['causingExceptions'] + if (causingExceptions.length() == 0) { + def m = ex_json['message'] =~ /required key \[([^\]]+)\] not found/ + // Missing required param + if (m.matches()) { + log.error "* Missing required parameter: --${m[0][1]}" + } + // Other base-level error + else if (ex_json['pointerToViolation'] == '#') { + log.error "* ${ex_json['message']}" + } + // Error with specific param + else { + def param = ex_json['pointerToViolation'] - ~/^#\// + def param_val = params_json[param].toString() + log.error "* --${param}: ${ex_json['message']} (${param_val})" + } + } + for (ex in causingExceptions) { + printExceptions(ex, params_json, log) + } + } + + // + // Remove an element from a JSONArray + // + private static JSONArray removeElement(json_array, element) { + def list = [] + int len = json_array.length() + for (int i=0;i + if(raw_schema.keySet().contains('definitions')){ + raw_schema.definitions.each { definition -> + for (key in definition.keySet()){ + if (definition[key].get("properties").keySet().contains(ignore_param)){ + // Remove the param to ignore + definition[key].get("properties").remove(ignore_param) + // If the param was required, change this + if (definition[key].has("required")) { + def cleaned_required = removeElement(definition[key].required, ignore_param) + definition[key].put("required", cleaned_required) + } + } + } + } + } + if(raw_schema.keySet().contains('properties') && raw_schema.get('properties').keySet().contains(ignore_param)) { + raw_schema.get("properties").remove(ignore_param) + } + if(raw_schema.keySet().contains('required') && raw_schema.required.contains(ignore_param)) { + def cleaned_required = removeElement(raw_schema.required, ignore_param) + raw_schema.put("required", cleaned_required) + } + } + return raw_schema + } + + // + // Clean and check parameters relative to Nextflow native classes + // + private static Map cleanParameters(params) { + def new_params = params.getClass().newInstance(params) + for (p in params) { + // remove anything evaluating to false + if (!p['value']) { + new_params.remove(p.key) + } + // Cast MemoryUnit to String + if (p['value'].getClass() == nextflow.util.MemoryUnit) { + 
new_params.replace(p.key, p['value'].toString()) + } + // Cast Duration to String + if (p['value'].getClass() == nextflow.util.Duration) { + new_params.replace(p.key, p['value'].toString().replaceFirst(/d(?!\S)/, "day")) + } + // Cast LinkedHashMap to String + if (p['value'].getClass() == LinkedHashMap) { + new_params.replace(p.key, p['value'].toString()) + } + } + return new_params + } + + // + // This function tries to read a JSON params file + // + private static LinkedHashMap paramsLoad(String json_schema) { + def params_map = new LinkedHashMap() + try { + params_map = paramsRead(json_schema) + } catch (Exception e) { + println "Could not read parameters settings from JSON. $e" + params_map = new LinkedHashMap() + } + return params_map + } + + // + // Method to actually read in JSON file using Groovy. + // Group (as Key), values are all parameters + // - Parameter1 as Key, Description as Value + // - Parameter2 as Key, Description as Value + // .... + // Group + // - + private static LinkedHashMap paramsRead(String json_schema) throws Exception { + def json = new File(json_schema).text + def Map schema_definitions = (Map) new JsonSlurper().parseText(json).get('definitions') + def Map schema_properties = (Map) new JsonSlurper().parseText(json).get('properties') + /* Tree looks like this in nf-core schema + * definitions <- this is what the first get('definitions') gets us + group 1 + title + description + properties + parameter 1 + type + description + parameter 2 + type + description + group 2 + title + description + properties + parameter 1 + type + description + * properties <- parameters can also be ungrouped, outside of definitions + parameter 1 + type + description + */ + + // Grouped params + def params_map = new LinkedHashMap() + schema_definitions.each { key, val -> + def Map group = schema_definitions."$key".properties // Gets the property object of the group + def title = schema_definitions."$key".title + def sub_params = new LinkedHashMap() + group.each { innerkey, value -> + sub_params.put(innerkey, value) + } + params_map.put(title, sub_params) + } + + // Ungrouped params + def ungrouped_params = new LinkedHashMap() + schema_properties.each { innerkey, value -> + ungrouped_params.put(innerkey, value) + } + params_map.put("Other parameters", ungrouped_params) + + return params_map + } + + // + // Get maximum number of characters across all parameter names + // + private static Integer paramsMaxChars(params_map) { + Integer max_chars = 0 + for (group in params_map.keySet()) { + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (param.size() > max_chars) { + max_chars = param.size() + } + } + } + return max_chars + } +} diff --git a/lib/NfcoreTemplate.groovy b/lib/NfcoreTemplate.groovy new file mode 100755 index 00000000..44551e0a --- /dev/null +++ b/lib/NfcoreTemplate.groovy @@ -0,0 +1,270 @@ +// +// This file holds several functions used within the nf-core pipeline template. +// + +import org.yaml.snakeyaml.Yaml + +class NfcoreTemplate { + + // + // Check AWS Batch related parameters have been specified correctly + // + public static void awsBatch(workflow, params) { + if (workflow.profile.contains('awsbatch')) { + // Check params.awsqueue and params.awsregion have been set if running on AWSBatch + assert (params.awsqueue && params.awsregion) : "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" 
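For orientation, the grouping that `paramsRead()` in `lib/NfcoreSchema.groovy` above builds from `nextflow_schema.json` can be hard to picture from the tree comment alone. The following standalone Groovy sketch uses hypothetical group and parameter names (not the actual bacass schema) to show the shape of the map returned by `paramsLoad()`/`paramsRead()` and how `paramsMaxChars()` derives the padding width used by `paramsHelp()` and `paramsSummaryLog()`:

```groovy
// Minimal sketch only — hypothetical group/parameter names, not taken from the bacass schema.
// paramsLoad() returns: group title -> (parameter name -> schema entry).
def params_map = [
    'Input/output options': [
        'input' : [type: 'string', description: 'Path to the design file'],
        'outdir': [type: 'string', description: 'Output directory', 'default': './results'],
    ],
    'Other parameters': [
        'help': [type: 'boolean', description: 'Display help text', hidden: true],
    ],
]

// paramsMaxChars() walks every group and keeps the longest parameter name,
// which paramsHelp()/paramsSummaryLog() then use to pad the name column.
def max_chars = params_map.values()*.keySet().flatten()*.size().max()
assert max_chars == 6  // 'outdir' is the longest name in this sketch
```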
+ // Check outdir paths to be S3 buckets if running on AWSBatch + assert params.outdir.startsWith('s3:') : "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" + } + } + + // + // Check params.hostnames + // + public static void hostName(workflow, params, log) { + Map colors = logColours(params.monochrome_logs) + if (params.hostnames) { + try { + def hostname = "hostname".execute().text.trim() + params.hostnames.each { prof, hnames -> + hnames.each { hname -> + if (hostname.contains(hname) && !workflow.profile.contains(prof)) { + log.info "=${colors.yellow}====================================================${colors.reset}=\n" + + "${colors.yellow}WARN: You are running with `-profile $workflow.profile`\n" + + " but your machine hostname is ${colors.white}'$hostname'${colors.reset}.\n" + + " ${colors.yellow_bold}Please use `-profile $prof${colors.reset}`\n" + + "=${colors.yellow}====================================================${colors.reset}=" + } + } + } + } catch (Exception e) { + log.warn "[$workflow.manifest.name] Could not determine 'hostname' - skipping check. Reason: ${e.message}." + } + } + } + + // + // Construct and send completion email + // + public static void email(workflow, params, summary_params, projectDir, log, multiqc_report=[]) { + + // Set up the e-mail variables + def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + if (!workflow.success) { + subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + } + + def summary = [:] + for (group in summary_params.keySet()) { + summary << summary_params[group] + } + + def misc_fields = [:] + misc_fields['Date Started'] = workflow.start + misc_fields['Date Completed'] = workflow.complete + misc_fields['Pipeline script file path'] = workflow.scriptFile + misc_fields['Pipeline script hash ID'] = workflow.scriptId + if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository + if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId + if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build + misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp + + def email_fields = [:] + email_fields['version'] = workflow.manifest.version + email_fields['runName'] = workflow.runName + email_fields['success'] = workflow.success + email_fields['dateComplete'] = workflow.complete + email_fields['duration'] = workflow.duration + email_fields['exitStatus'] = workflow.exitStatus + email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') + email_fields['errorReport'] = (workflow.errorReport ?: 'None') + email_fields['commandLine'] = workflow.commandLine + email_fields['projectDir'] = workflow.projectDir + email_fields['summary'] = summary << misc_fields + + // On success try attach the multiqc report + def mqc_report = null + try { + if (workflow.success) { + mqc_report = multiqc_report.getVal() + if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { + if (mqc_report.size() > 1) { + log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + } + mqc_report = mqc_report[0] + } + } + } catch (all) { + if (multiqc_report) { + log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + } + } + + // Check if we are only sending emails on failure + def email_address = params.email + if (!params.email && 
params.email_on_fail && !workflow.success) { + email_address = params.email_on_fail + } + + // Render the TXT template + def engine = new groovy.text.GStringTemplateEngine() + def tf = new File("$projectDir/assets/email_template.txt") + def txt_template = engine.createTemplate(tf).make(email_fields) + def email_txt = txt_template.toString() + + // Render the HTML template + def hf = new File("$projectDir/assets/email_template.html") + def html_template = engine.createTemplate(hf).make(email_fields) + def email_html = html_template.toString() + + // Render the sendmail template + def max_multiqc_email_size = params.max_multiqc_email_size as nextflow.util.MemoryUnit + def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "$projectDir", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def sf = new File("$projectDir/assets/sendmail_template.txt") + def sendmail_template = engine.createTemplate(sf).make(smail_fields) + def sendmail_html = sendmail_template.toString() + + // Send the HTML e-mail + Map colors = logColours(params.monochrome_logs) + if (email_address) { + try { + if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + // Try to send HTML e-mail using sendmail + [ 'sendmail', '-t' ].execute() << sendmail_html + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" + } catch (all) { + // Catch failures and try with plaintext + def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + if ( mqc_report.size() <= max_multiqc_email_size.toBytes() ) { + mail_cmd += [ '-A', mqc_report ] + } + mail_cmd.execute() << email_html + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + } + } + + // Write summary e-mail HTML to a file + def output_d = new File("${params.outdir}/pipeline_info/") + if (!output_d.exists()) { + output_d.mkdirs() + } + def output_hf = new File(output_d, "pipeline_report.html") + output_hf.withWriter { w -> w << email_html } + def output_tf = new File(output_d, "pipeline_report.txt") + output_tf.withWriter { w -> w << email_txt } + } + + // + // Print pipeline summary on completion + // + public static void summary(workflow, params, log) { + Map colors = logColours(params.monochrome_logs) + if (workflow.success) { + if (workflow.stats.ignoredCount == 0) { + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" + } else { + log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + } + } else { + hostName(workflow, params, log) + log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + } + + // + // ANSII Colours used for terminal logging + // + public static Map logColours(Boolean monochrome_logs) { + Map colorcodes = [:] + + // Reset / Meta + colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" + colorcodes['bold'] = monochrome_logs ? '' : "\033[1m" + colorcodes['dim'] = monochrome_logs ? '' : "\033[2m" + colorcodes['underlined'] = monochrome_logs ? '' : "\033[4m" + colorcodes['blink'] = monochrome_logs ? '' : "\033[5m" + colorcodes['reverse'] = monochrome_logs ? '' : "\033[7m" + colorcodes['hidden'] = monochrome_logs ? 
'' : "\033[8m" + + // Regular Colors + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + + // Bold + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + + // Underline + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + + // High Intensity + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + + // Bold High Intensity + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? 
'' : "\033[1;97m" + + return colorcodes + } + + // + // Does what is says on the tin + // + public static String dashedLine(monochrome_logs) { + Map colors = logColours(monochrome_logs) + return "-${colors.dim}----------------------------------------------------${colors.reset}-" + } + + // + // nf-core logo + // + public static String logo(workflow, monochrome_logs) { + Map colors = logColours(monochrome_logs) + String.format( + """\n + ${dashedLine(monochrome_logs)} + ${colors.green},--.${colors.black}/${colors.green},-.${colors.reset} + ${colors.blue} ___ __ __ __ ___ ${colors.green}/,-._.--~\'${colors.reset} + ${colors.blue} |\\ | |__ __ / ` / \\ |__) |__ ${colors.yellow}} {${colors.reset} + ${colors.blue} | \\| | \\__, \\__/ | \\ |___ ${colors.green}\\`-._,-`-,${colors.reset} + ${colors.green}`._,._,\'${colors.reset} + ${colors.purple} ${workflow.manifest.name} v${workflow.manifest.version}${colors.reset} + ${dashedLine(monochrome_logs)} + """.stripIndent() + ) + } +} diff --git a/lib/Utils.groovy b/lib/Utils.groovy new file mode 100755 index 00000000..18173e98 --- /dev/null +++ b/lib/Utils.groovy @@ -0,0 +1,47 @@ +// +// This file holds several Groovy functions that could be useful for any Nextflow pipeline +// + +import org.yaml.snakeyaml.Yaml + +class Utils { + + // + // When running with -profile conda, warn if channels have not been set-up appropriately + // + public static void checkCondaChannels(log) { + Yaml parser = new Yaml() + def channels = [] + try { + def config = parser.load("conda config --show channels".execute().text) + channels = config.channels + } catch(NullPointerException | IOException e) { + log.warn "Could not verify conda channel configuration." + return + } + + // Check that all channels are present + def required_channels = ['conda-forge', 'bioconda', 'defaults'] + def conda_check_failed = !required_channels.every { ch -> ch in channels } + + // Check that they are in the right order + conda_check_failed |= !(channels.indexOf('conda-forge') < channels.indexOf('bioconda')) + conda_check_failed |= !(channels.indexOf('bioconda') < channels.indexOf('defaults')) + + if (conda_check_failed) { + log.warn "=============================================================================\n" + + " There is a problem with your Conda configuration!\n\n" + + " You will need to set-up the conda-forge and bioconda channels correctly.\n" + + " Please refer to https://bioconda.github.io/user/install.html#set-up-channels\n" + + " NB: The order of the channels matters!\n" + + "===================================================================================" + } + } + + // + // Join module args with appropriate spacing + // + public static String joinModuleArgs(args_list) { + return ' ' + args_list.join(' ') + } +} diff --git a/lib/WorkflowBacass.groovy b/lib/WorkflowBacass.groovy new file mode 100755 index 00000000..4b99b4dc --- /dev/null +++ b/lib/WorkflowBacass.groovy @@ -0,0 +1,43 @@ +// +// This file holds several functions specific to the workflow/bacass.nf in the nf-core/bacass pipeline +// + +class WorkflowBacass { + + // + // Check and validate parameters + // + public static void initialise(params, log) { + if(("${params.assembler}" == 'canu' || "${params.assembler}" == 'miniasm') && ("${params.assembly_type}" == 'short' || "${params.assembly_type}" == 'hybrid')){ + log.error "Canu and Miniasm can only be used for long read assembly and neither for Hybrid nor Shortread assembly!" 
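To make the channel-ordering rule in `Utils.checkCondaChannels()` above concrete, here is a small standalone Groovy sketch that reruns the same checks against a hypothetical, correctly ordered channel list (in the real method the list is parsed from `conda config --show channels`):

```groovy
// Hypothetical channel list — Utils.checkCondaChannels() reads this from the user's conda config.
def channels = ['conda-forge', 'bioconda', 'defaults']

// All required channels must be present ...
def required_channels = ['conda-forge', 'bioconda', 'defaults']
def conda_check_failed = !required_channels.every { ch -> ch in channels }

// ... and in the right order: conda-forge before bioconda, bioconda before defaults.
conda_check_failed |= !(channels.indexOf('conda-forge') < channels.indexOf('bioconda'))
conda_check_failed |= !(channels.indexOf('bioconda') < channels.indexOf('defaults'))

assert !conda_check_failed  // this ordering passes; swapping any two entries would trigger the warning
```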
+ System.exit(1) + } + } + + // + // Get workflow summary for MultiQC + // + public static String paramsSummaryMultiqc(workflow, summary) { + String summary_section = '' + for (group in summary.keySet()) { + def group_params = summary.get(group) // This gets the parameters of that particular group + if (group_params) { + summary_section += "

<p style=\"font-size:110%\"><b>$group</b></p>\n" + summary_section += "    <dl class=\"dl-horizontal\">\n" + for (param in group_params.keySet()) { + summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>\n" + } + summary_section += "    </dl>
\n" + } + } + + String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += "data: |\n" + yaml_file_text += "${summary_section}" + return yaml_file_text + } +} diff --git a/lib/WorkflowMain.groovy b/lib/WorkflowMain.groovy new file mode 100755 index 00000000..049df694 --- /dev/null +++ b/lib/WorkflowMain.groovy @@ -0,0 +1,80 @@ +// +// This file holds several functions specific to the main.nf workflow in the nf-core/bacass pipeline +// + +class WorkflowMain { + + // + // Citation string for pipeline + // + public static String citation(workflow) { + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + + "* The pipeline\n" + + " https://doi.org/10.5281/zenodo.2669428\n\n" + + "* The nf-core framework\n" + + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + + "* Software dependencies\n" + + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + } + + // + // Print help to screen if required + // + public static String help(workflow, params, log) { + def command = "nextflow run ${workflow.manifest.name} --input samplesheet.csv --genome GRCh37 -profile docker" + def help_string = '' + help_string += NfcoreTemplate.logo(workflow, params.monochrome_logs) + help_string += NfcoreSchema.paramsHelp(workflow, params, command) + help_string += '\n' + citation(workflow) + '\n' + help_string += NfcoreTemplate.dashedLine(params.monochrome_logs) + return help_string + } + + // + // Print parameter summary log to screen + // + public static String paramsSummaryLog(workflow, params, log) { + def summary_log = '' + summary_log += NfcoreTemplate.logo(workflow, params.monochrome_logs) + summary_log += NfcoreSchema.paramsSummaryLog(workflow, params) + summary_log += '\n' + citation(workflow) + '\n' + summary_log += NfcoreTemplate.dashedLine(params.monochrome_logs) + return summary_log + } + + // + // Validate parameters and print summary to screen + // + public static void initialise(workflow, params, log) { + // Print help to screen if required + if (params.help) { + log.info help(workflow, params, log) + System.exit(0) + } + + // Validate workflow parameters via the JSON schema + if (params.validate_params) { + NfcoreSchema.validateParameters(workflow, params, log) + } + + // Print parameter summary log to screen + log.info paramsSummaryLog(workflow, params, log) + + // Check that conda channels are set-up correctly + if (params.enable_conda) { + Utils.checkCondaChannels(log) + } + + // Check AWS batch settings + NfcoreTemplate.awsBatch(workflow, params) + + // Check the hostnames against configured profiles + NfcoreTemplate.hostName(workflow, params, log) + + // Check input has been provided + if (!params.input) { + log.error "Please provide an input samplesheet to the pipeline e.g. 
'--input samplesheet.csv'" + System.exit(1) + } + } +} diff --git a/lib/nfcore_external_java_deps.jar b/lib/nfcore_external_java_deps.jar new file mode 100644 index 00000000..805c8bb5 Binary files /dev/null and b/lib/nfcore_external_java_deps.jar differ diff --git a/main.nf b/main.nf index a41d5768..2ce50b0f 100644 --- a/main.nf +++ b/main.nf @@ -1,927 +1,55 @@ #!/usr/bin/env nextflow /* ======================================================================================== - nf-core/bacass + nf-core/bacass ======================================================================================== - nf-core/bacass Analysis Pipeline. - #### Homepage / Documentation - https://github.com/nf-core/bacass + Github : https://github.com/nf-core/bacass + Website: https://nf-co.re/bacass + Slack : https://nfcore.slack.com/channels/bacass ---------------------------------------------------------------------------------------- */ -def helpMessage() { - log.info nfcoreHeader() - log.info""" - Usage: - - The typical command for running the pipeline is as follows: - - nextflow run nf-core/bacass --input input.csv --kraken2db 'path-to-kraken2db' -profile docker - - Mandatory arguments: - -profile Configuration profile to use. Can use multiple (comma separated) - Available: conda, docker, singularity, awsbatch, test and more. - --input The design file used for running the pipeline in TSV format. - - Pipeline arguments: - --assembler Default: "Unicycler", Available: "Canu", "Miniasm", "Unicycler". Short & Hybrid assembly always runs "Unicycler". - --assembly_type Default: "Short", Available: "Short", "Long", "Hybrid". - --kraken2db Path to Kraken2 Database directory - --prokka_args Advanced: Extra arguments to Prokka (quote and add leading space) - --unicycler_args Advanced: Extra arguments to Unicycler (quote and add leading space) - --canu_args Advanced: Extra arguments for Canu assembly (quote and add leading space) - - Other options: - --outdir The output directory where the results will be saved - --email Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits - -name Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. - - Skipping options: - --skip_annotation Skips the annotation with Prokka - --skip_kraken2 Skips the read classification with Kraken2 - --skip_polish Skips polishing long-reads with Nanopolish or Medaka - --skip_pycoqc Skips long-read raw signal QC - - AWSBatch options: - --awsqueue [str] The AWSBatch JobQueue that needs to be set when running on AWSBatch - --awsregion [str] The AWS Region for your AWS Batch job to run on - --awscli [str] Path to the AWS CLI tool - """.stripIndent() -} - -// Show help message -if (params.help) { - helpMessage() - exit 0 -} - -if(! params.skip_kraken2){ - if(params.kraken2db){ - kraken2db = file(params.kraken2db) - } else { - exit 1, "Missing Kraken2 DB arg" - } -} - -// Has the run name been specified by the user? -// this has the bonus effect of catching both -name and --name -custom_runName = params.name -if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { - custom_runName = workflow.runName -} - -// Check AWS batch settings -if (workflow.profile.contains('awsbatch')) { - // AWSBatch sanity checking - if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" 
- // Check outdir paths to be S3 buckets if running on AWSBatch - // related: https://github.com/nextflow-io/nextflow/issues/813 - if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" - // Prevent trace files to be stored on S3 since S3 does not support rolling files. - if (params.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace! S3 cannot be used for tracefiles." -} - -// Stage config files -ch_multiqc_config = file("$baseDir/assets/multiqc_config.yaml", checkIfExists: true) -ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() -ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) -ch_output_docs_images = file("$baseDir/docs/images/", checkIfExists: true) - -//Check whether we have a design file as input set -if(!params.input){ - exit 1, "Missing Design File - please see documentation how to create one." -} else { - //Design file looks like this - // ID R1 R2 Long-ReadFastQ Fast5Path GenomeSize - // ID is required, everything else (can!) be optional and causes some pipeline components to turn off! - // Tapping the parsed input design to multiple channels to get some data to specific downstream processes that don't need full information! - Channel - .fromPath(params.input) - .splitCsv(header: true, sep:'\t') - .map { col -> - def id = "${col.ID}" - def r1 = returnFile("${col.R1}") - def r2 = returnFile("${col.R2}") - def lr = returnFile("${col.LongFastQ}") - def f5 = returnFile("${col.Fast5}") - def genome_size = "${col.GenomeSize}" - tuple(id,r1,r2,lr,f5,genome_size) - } - .dump(tag: "input") - .tap {ch_all_data; ch_all_data_for_fast5; ch_all_data_for_genomesize} - .map { id,r1,r2,lr,f5,gs -> - tuple(id,r1,r2) - } - .filter{ id,r1,r2 -> - r1 != 'NA' && r2 != 'NA'} - //Filter to get rid of R1/R2 that are NA - .into {ch_for_short_trim; ch_for_fastqc} - //Dump long read info to different channel! 
- ch_all_data - .map { id, r1, r2, lr, f5, genomeSize -> - tuple(id, file(lr)) - } - .dump(tag: 'longinput') - .into {ch_for_long_trim; ch_for_nanoplot; ch_for_pycoqc; ch_for_nanopolish; ch_for_long_fastq} - - //Dump fast5 to separate channel - ch_all_data_for_fast5 - .map { id, r1, r2, lr, f5, genomeSize -> - tuple(id, f5) - } - .filter {id, fast5 -> - fast5 != 'NA' - } - .into {ch_fast5_for_pycoqc; ch_fast5_for_nanopolish} - - //Dump genomeSize to separate channel, too - ch_all_data_for_genomesize - .map { id, r1, r2, lr, f5, genomeSize -> - tuple(id,genomeSize) - } - .filter{id, genomeSize -> - genomeSize != 'NA' - } - .set {ch_genomeSize_forCanu} -} - -// Header log info -log.info nfcoreHeader() -def summary = [:] -if(workflow.revision) summary['Pipeline Release'] = workflow.revision -summary['Pipeline Name'] = 'nf-core/bacass' -summary['Run Name'] = custom_runName ?: workflow.runName -summary['Assembler Method'] = params.assembler -summary['Assembly Type'] = params.assembly_type -if (params.kraken2db) summary['Kraken2 DB'] = params.kraken2db -summary['Extra Prokka arguments'] = params.prokka_args -summary['Extra Unicycler arguments'] = params.unicycler_args -summary['Extra Canu arguments'] = params.canu_args -if (params.skip_annotation) summary['Skip Annotation'] = params.skip_annotation -if (params.skip_kraken2) summary['Skip Kraken2'] = params.skip_kraken2 -if (params.skip_polish) summary['Skip Polish'] = params.skip_polish -if (!params.skip_polish) summary['Polish Method'] = params.polish_method -if (params.skip_pycoqc) summary['Skip PycoQC'] = params.skip_pycoqc -summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -if(workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" -summary['Launch dir'] = workflow.launchDir -summary['Output dir'] = params.outdir -summary['Working dir'] = workflow.workDir -summary['Script dir'] = workflow.projectDir -summary['User'] = workflow.userName -if(workflow.profile == 'awsbatch'){ - summary['AWS Region'] = params.awsregion - summary['AWS Queue'] = params.awsqueue -} -summary['Config Profile'] = workflow.profile - -if(params.config_profile_description) summary['Config Description'] = params.config_profile_description -if(params.config_profile_contact) summary['Config Contact'] = params.config_profile_contact -if(params.config_profile_url) summary['Config URL'] = params.config_profile_url -if(params.email) { - summary['E-mail Address'] = params.email - summary['MultiQC maxsize'] = params.max_multiqc_email_size -} -log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") -log.info "----------------------------------------------------" - - -// Check the hostnames against configured profiles -checkHostname() - -Channel.from(summary.collect{ [it.key, it.value] }) - .map { k,v -> "
<dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>
" } - .reduce { a, b -> return [a, b].join("\n ") } - .map { x -> """ - id: 'nf-core-bacass-summary' - description: " - this information is collected when the pipeline is started." - section_name: 'nf-core/bacass Workflow Summary' - section_href: 'https://github.com/nf-core/bacass' - plot_type: 'html' - data: | -
<dl class=\"dl-horizontal\"> - $x - </dl>
- """.stripIndent() } - .set { ch_workflow_summary } - - -//Check compatible parameters -if(("${params.assembler}" == 'canu' || "${params.assembler}" == 'miniasm') && ("${params.assembly_type}" == 'short' || "${params.assembly_type}" == 'hybrid')){ - exit 1, "Canu and Miniasm can only be used for long read assembly and neither for Hybrid nor Shortread assembly!" -} - - -/* Trim and combine short read read-pairs per sample. Similar to nf-core vipr - */ -process trim_and_combine { - label 'medium' - - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/trimming/shortreads/", mode: params.publish_dir_mode - - input: - set sample_id, file(r1), file(r2) from ch_for_short_trim - - output: - set sample_id, file("${sample_id}_trm-cmb.R1.fastq.gz"), file("${sample_id}_trm-cmb.R2.fastq.gz") into (ch_short_for_kraken2, ch_short_for_unicycler, ch_short_for_fastqc) - // not keeping logs for multiqc input. for that to be useful we would need to concat first and then run skewer - - script: - """ - # loop over readunits in pairs per sample - pairno=0 - echo "${r1} ${r2}" | xargs -n2 | while read fq1 fq2; do - skewer --quiet -t ${task.cpus} -m pe -q 3 -n -z \$fq1 \$fq2; - done - cat \$(ls *trimmed-pair1.fastq.gz | sort) >> ${sample_id}_trm-cmb.R1.fastq.gz - cat \$(ls *trimmed-pair2.fastq.gz | sort) >> ${sample_id}_trm-cmb.R2.fastq.gz - """ -} - - -//AdapterTrimming for ONT reads -process adapter_trimming { - label 'medium' - publishDir "${params.outdir}/${sample_id}/trimming/longreads/", mode: params.publish_dir_mode - - when: params.assembly_type == 'hybrid' || params.assembly_type == 'long' - - input: - set sample_id, file(lr) from ch_for_long_trim - - output: - set sample_id, file('trimmed.fastq') into (ch_long_trimmed_unicycler, ch_long_trimmed_canu, ch_long_trimmed_miniasm, ch_long_trimmed_consensus, ch_long_trimmed_nanopolish, ch_long_trimmed_kraken, ch_long_trimmed_medaka) - file ("porechop.version.txt") into ch_porechop_version - - when: !('short' in params.assembly_type) - - script: - """ - porechop -i "${lr}" -t "${task.cpus}" -o trimmed.fastq - porechop --version > porechop.version.txt - """ -} +nextflow.enable.dsl = 2 /* - * STEP 1 - FastQC FOR SHORT READS +======================================================================================== + VALIDATE & PRINT PARAMETER SUMMARY +======================================================================================== */ -process fastqc { - label 'small' - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/FastQC", mode: params.publish_dir_mode - - input: - set sample_id, file(fq1), file(fq2) from ch_short_for_fastqc - output: - file "*_fastqc.{zip,html}" into ch_fastqc_results - - script: - """ - fastqc -t ${task.cpus} -q ${fq1} ${fq2} - """ -} +WorkflowMain.initialise(workflow, params, log) /* - * Quality check for nanopore reads and Quality/Length Plots - */ -process nanoplot { - label 'medium' - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/QC_longreads/NanoPlot", mode: params.publish_dir_mode - - when: (params.assembly_type != 'short') - - input: - set sample_id, file(lr) from ch_for_nanoplot - - output: - file '*.png' - file '*.html' - file '*.txt' - file 'nanoplot.version.txt' into ch_nanoplot_version - - script: - """ - NanoPlot -t "${task.cpus}" --title "${sample_id}" -c darkblue --fastq ${lr} - NanoPlot --version | sed -e "s/NanoPlot //g" > nanoplot.version.txt - """ -} - - -/** Quality check for nanopore Fast5 files 
+======================================================================================== + NAMED WORKFLOW FOR PIPELINE +======================================================================================== */ -process pycoqc{ - label 'medium' - - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/QC_longreads/PycoQC", mode: params.publish_dir_mode - - when: (params.assembly_type == 'hybrid' || params.assembly_type == 'long') && !params.skip_pycoqc && fast5 - - input: - set sample_id, file(lr), file(fast5) from ch_for_pycoqc.join(ch_fast5_for_pycoqc) - - output: - set sample_id, file('sequencing_summary.txt') into ch_summary_index_for_nanopolish - file("pycoQC_${sample_id}*") - file("pycoQC.version.txt") into ch_pycoqc_version - - script: - //Find out whether the sequencing_summary already exists - if(file("${fast5}/sequencing_summary.txt").exists()){ - run_summary = '' - prefix = "${fast5}/" - } else { - run_summary = "Fast5_to_seq_summary -f $fast5 -t ${task.cpus} -s './sequencing_summary.txt' --verbose_level 2" - prefix = '' - } - //Barcodes available? - barcode_me = file("${fast5}/barcoding_sequencing.txt").exists() ? "-b ${fast5}/barcoding_sequencing.txt" : '' - """ - $run_summary - pycoQC -f "${prefix}sequencing_summary.txt" $barcode_me -o pycoQC_${sample_id}.html -j pycoQC_${sample_id}.json - pycoQC --version | sed -e "s/pycoQC v//g" > pycoQC.version.txt - """ -} - -/* Join channels for unicycler, as trimming the files happens in two separate processes for paralellization of individual steps. As samples have the same sampleID, we can simply use join() to merge the channels based on this. If we only have one of the channels we insert 'NAs' which are not used in the unicycler process then subsequently, in case of short or long read only assembly. -*/ -if(params.assembly_type == 'hybrid'){ - ch_short_for_unicycler - .join(ch_long_trimmed_unicycler) - .dump(tag: 'unicycler') - .set {ch_short_long_joint_unicycler} -} else if(params.assembly_type == 'short'){ - ch_short_for_unicycler - .map{id,R1,R2 -> - tuple(id,R1,R2,'NA')} - .dump(tag: 'unicycler') - .set {ch_short_long_joint_unicycler} -} else if(params.assembly_type == 'long'){ - ch_long_trimmed_unicycler - .map{id,lr -> - tuple(id,'NA','NA',lr)} - .dump(tag: 'unicycler') - .set {ch_short_long_joint_unicycler} -} - -/* unicycler (short, long or hybrid mode!) - */ -process unicycler { - label 'large' - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/unicycler", mode: params.publish_dir_mode - - when: params.assembler == 'unicycler' - - input: - set sample_id, file(fq1), file(fq2), file(lrfastq) from ch_short_long_joint_unicycler - - output: - set sample_id, file("${sample_id}_assembly.fasta") into (quast_ch, prokka_ch, dfast_ch) - file("${sample_id}_assembly.fasta") into (ch_assembly_nanopolish_unicycler,ch_assembly_medaka_unicycler) - file("${sample_id}_assembly.gfa") - file("${sample_id}_unicycler.log") - file("unicycler.version.txt") into ch_unicycler_version - - script: - if(params.assembly_type == 'long'){ - data_param = "-l $lrfastq" - } else if (params.assembly_type == 'short'){ - data_param = "-1 $fq1 -2 $fq2" - } else if (params.assembly_type == 'hybrid'){ - data_param = "-1 $fq1 -2 $fq2 -l $lrfastq" - } - - """ - unicycler $data_param --threads ${task.cpus} ${params.unicycler_args} --keep 0 -o . 
- mv unicycler.log ${sample_id}_unicycler.log - # rename so that quast can use the name - mv assembly.gfa ${sample_id}_assembly.gfa - mv assembly.fasta ${sample_id}_assembly.fasta - unicycler --version | sed -e "s/Unicycler v//g" > unicycler.version.txt - """ -} - -process miniasm_assembly { - label 'large' - - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/miniasm", mode: params.publish_dir_mode, pattern: 'assembly.fasta' - - input: - set sample_id, file(lrfastq) from ch_long_trimmed_miniasm - - output: - file 'assembly.fasta' into ch_assembly_from_miniasm - - when: params.assembler == 'miniasm' - - script: - """ - minimap2 -x ava-ont -t "${task.cpus}" "${lrfastq}" "${lrfastq}" > "${lrfastq}.paf" - miniasm -f "${lrfastq}" "${lrfastq}.paf" > "${lrfastq}.gfa" - awk '/^S/{print ">"\$2"\\n"\$3}' "${lrfastq}.gfa" | fold > assembly.fasta - """ -} - -//Run consensus for miniasm, the others don't need it. -process consensus { - label 'large' - - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/miniasm/consensus", mode: params.publish_dir_mode, pattern: 'assembly_consensus.fasta' - - input: - set sample_id, file(lrfastq) from ch_long_trimmed_consensus - file(assembly) from ch_assembly_from_miniasm - - output: - file 'assembly_consensus.fasta' into (ch_assembly_consensus_for_nanopolish, ch_assembly_consensus_for_medaka) - - script: - """ - minimap2 -x map-ont -t "${task.cpus}" "${assembly}" "${lrfastq}" > assembly.paf - racon -t "${task.cpus}" "${lrfastq}" assembly.paf "${assembly}" > assembly_consensus.fasta - """ -} - -process canu_assembly { - label 'large' - - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/canu", mode: params.publish_dir_mode, pattern: 'assembly.fasta' - - input: - set sample_id, file(lrfastq), val(genomeSize) from ch_long_trimmed_canu.join(ch_genomeSize_forCanu) - - output: - file 'assembly.fasta' into (assembly_from_canu_for_nanopolish, assembly_from_canu_for_medaka) - file 'canu.version.txt' into ch_canu_version +include { BACASS } from './workflows/bacass' - when: params.assembler == 'canu' - - script: - """ - canu -p assembly -d canu_out \ - genomeSize="${genomeSize}" -nanopore "${lrfastq}" \ - maxThreads="${task.cpus}" merylMemory="${task.memory.toGiga()}G" \ - merylThreads="${task.cpus}" hapThreads="${task.cpus}" batMemory="${task.memory.toGiga()}G" \ - redMemory="${task.memory.toGiga()}G" redThreads="${task.cpus}" \ - oeaMemory="${task.memory.toGiga()}G" oeaThreads="${task.cpus}" \ - corMemory="${task.memory.toGiga()}G" corThreads="${task.cpus}" ${params.canu_args} - mv canu_out/assembly.contigs.fasta assembly.fasta - canu --version | sed -e "s/Canu //g" > canu.version.txt - """ -} - -/* kraken classification: QC for sample purity, only short end reads for now - */ -process kraken2 { - label 'large' - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/kraken", mode: params.publish_dir_mode - - input: - set sample_id, file(fq1), file(fq2) from ch_short_for_kraken2 - - output: - file("${sample_id}_kraken2.report") - - when: !params.skip_kraken2 - - script: - """ - # stdout reports per read which is not needed. 
kraken.report can be used with pavian - # braken would be nice but requires readlength and correspondingly build db - kraken2 --threads ${task.cpus} --paired --db ${kraken2db} \ - --report ${sample_id}_kraken2.report ${fq1} ${fq2} | gzip > kraken2.out.gz - """ -} - -/* kraken classification: QC for sample purity, only short end reads for now - */ -process kraken2_long { - label 'large' - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/kraken_long", mode: params.publish_dir_mode - - input: - set sample_id, file(lr) from ch_long_trimmed_kraken - - output: - file("${sample_id}_kraken2.report") - - when: !params.skip_kraken2 - - script: - """ - # stdout reports per read which is not needed. kraken.report can be used with pavian - # braken would be nice but requires readlength and correspondingly build db - kraken2 --threads ${task.cpus} --db ${kraken2db} \ - --report ${sample_id}_kraken2.report ${lr} | gzip > kraken2.out.gz - """ -} - -/* assembly qc with quast - */ -process quast { - label 'small' - tag {"$sample_id"} - publishDir "${params.outdir}/${sample_id}/QUAST", mode: params.publish_dir_mode - - input: - set sample_id, file(fasta) from quast_ch - - output: - // multiqc only detects a file called report.tsv. to avoid - // name clash with other samples we need a directory named by sample - file("${sample_id}_assembly_QC/") - file("${sample_id}_assembly_QC/${sample_id}_report.tsv") into quast_logs_ch - file("quast.version.txt") into ch_quast_version - - script: - """ - quast -t ${task.cpus} -o ${sample_id}_assembly_QC ${fasta} - quast --version | sed -e "s/QUAST v//g" > quast.version.txt - mv ${sample_id}_assembly_QC/report.tsv ${sample_id}_assembly_QC/${sample_id}_report.tsv - """ -} - -/* - * Annotation with prokka - */ -process prokka { - label 'large' - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/", mode: params.publish_dir_mode - - input: - set sample_id, file(fasta) from prokka_ch - - output: - file("${sample_id}_annotation/") - file("prokka.version.txt") into ch_prokka_version - - when: !params.skip_annotation && params.annotation_tool == 'prokka' - - script: - """ - prokka --cpus ${task.cpus} --prefix "${sample_id}" --outdir ${sample_id}_annotation ${params.prokka_args} ${fasta} - prokka --version | sed -e "s/prokka //g" > prokka.version.txt - """ -} - -process dfast { - label 'medium_extramem' - tag "$sample_id" - publishDir "${params.outdir}/${sample_id}/", mode: params.publish_dir_mode - - input: - set sample_id, file(fasta) from dfast_ch - file (config) from Channel.value(params.dfast_config ? file(params.dfast_config) : "") - - output: - file("RESULT*") - file("dfast.version.txt") into ch_dfast_version - - when: !params.skip_annotation && params.annotation_tool == 'dfast' - - script: - """ - dfast --genome ${fasta} --config $config - dfast --version | sed -e "s/DFAST ver. //g" > dfast.version.txt - """ -} - - -//Polishes assembly using FAST5 files -process nanopolish { - tag "$assembly" - label 'large' - - publishDir "${params.outdir}/${sample_id}/nanopolish/", mode: params.publish_dir_mode, pattern: 'polished_genome.fa' - - input: - file(assembly) from ch_assembly_consensus_for_nanopolish.mix(ch_assembly_nanopolish_unicycler,assembly_from_canu_for_nanopolish) //Should take either miniasm, canu, or unicycler consensus sequence (!) 
- set sample_id, file(lrfastq), file(fast5) from ch_long_trimmed_nanopolish.join(ch_fast5_for_nanopolish) - - output: - file 'polished_genome.fa' - file 'nanopolish.version.txt' into ch_nanopolish_version - file 'samtools.version.txt' into ch_samtools_version - - when: !params.skip_polish && params.assembly_type == 'long' && params.polish_method != 'medaka' - - script: - """ - nanopolish index -d "${fast5}" "${lrfastq}" - minimap2 -ax map-ont -t ${task.cpus} "${assembly}" "${lrfastq}"| \ - samtools sort -o reads.sorted.bam -T reads.tmp - - samtools index reads.sorted.bam - nanopolish_makerange.py "${assembly}" | parallel --results nanopolish.results -P "${task.cpus}" nanopolish variants --consensus -o polished.{1}.vcf -w {1} -r "${lrfastq}" -b reads.sorted.bam -g "${assembly}" -t "${task.cpus}" --min-candidate-frequency 0.1 - nanopolish vcf2fasta -g "${assembly}" polished.*.vcf > polished_genome.fa - - #Versions - nanopolish --version | sed -e "s/nanopolish version //g" | head -n 1 > nanopolish.version.txt - samtools --version | sed -e "s/samtools //g" | head -n 1 > samtools.version.txt - """ -} - -//Polishes assembly -process medaka { - tag "$assembly" - label 'large' - - publishDir "${params.outdir}/${sample_id}/medaka/", mode: params.publish_dir_mode, pattern: 'polished_genome.fa' - - input: - file(assembly) from ch_assembly_consensus_for_medaka.mix(ch_assembly_medaka_unicycler,assembly_from_canu_for_medaka) //Should take either miniasm, canu, or unicycler consensus sequence (!) - set sample_id, file(lrfastq) from ch_long_trimmed_medaka - - output: - file 'polished_genome.fa' - file 'medaka.version.txt' into ch_medaka_version - - when: !params.skip_polish && params.assembly_type == 'long' && params.polish_method == 'medaka' - - script: - """ - medaka_consensus -i ${lrfastq} -d ${assembly} -o "polished_genome.fa" -t ${task.cpus} - medaka --version | sed -e "s/medaka //g" > medaka.version.txt - """ +// +// WORKFLOW: Run main nf-core/bacass analysis pipeline +// +workflow NFCORE_BACASS { + BACASS () } /* - * Parse software version numbers - */ -process get_software_versions { - publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode, - saveAs: {filename -> - if (filename.indexOf(".csv") > 0) filename - else null - } - - input: - path quast_version from ch_quast_version.first().ifEmpty([]) - path porechop_version from ch_porechop_version.first().ifEmpty([]) - path pycoqc_version from ch_pycoqc_version.first().ifEmpty([]) - path unicycler_version from ch_unicycler_version.first().ifEmpty([]) - path canu_version from ch_canu_version.first().ifEmpty([]) - path prokka_version from ch_prokka_version.first().ifEmpty([]) - path dfast_version from ch_dfast_version.first().ifEmpty([]) - path nanopolish_version from ch_nanopolish_version.first().ifEmpty([]) - path samtools_version from ch_samtools_version.first().ifEmpty([]) - path nanoplot_version from ch_nanoplot_version.first().ifEmpty([]) - path medaka_version from ch_medaka_version.first().ifEmpty([]) - - output: - file 'software_versions_mqc.yaml' into software_versions_yaml - file "software_versions.csv" - - script: - """ - #All in main container - echo $workflow.manifest.version > pipeline.version.txt - echo $workflow.nextflow.version > nextflow.version.txt - fastqc --version | sed -e "s/FastQC v//g" > fastqc.version.txt - - #Inside main container - miniasm -V > miniasm.version.txt - minimap2 --version &> minimap2.version.txt - racon --version | sed -e "s/v//g" > racon.version.txt - skewer --version | sed -e "s/skewer 
version://g" | sed -e 's/\\s//g' | head -n 1 > skewer.version.txt - kraken2 --version | sed -e "s/Kraken version //g" | head -n 1 > kraken2.version.txt - multiqc --version | sed -e "s/multiqc, version//g" > multiqc.version.txt - scrape_software_versions.py > software_versions_mqc.yaml - """ -} - -/* - * STEP - MultiQC - */ - -process multiqc { - label 'small' - publishDir "${params.outdir}/MultiQC", mode: params.publish_dir_mode - - input: - - path (multiqc_config) from ch_multiqc_config - path (mqc_custom_config) from ch_multiqc_custom_config.collect().ifEmpty([]) - - //file prokka_logs from prokka_logs_ch.collect().ifEmpty([]) - file ('quast_logs/*') from quast_logs_ch.collect().ifEmpty([]) - // NOTE unicycler and kraken not supported - file ('fastqc/*') from ch_fastqc_results.collect().ifEmpty([]) - file ('software_versions/*') from software_versions_yaml.collect() - file workflow_summary from ch_workflow_summary.collectFile(name: "workflow_summary_mqc.yaml") - - output: - file "*multiqc_report.html" into ch_multiqc_report - file "*_data" - file "multiqc_plots" - - script: - rtitle = custom_runName ? "--title \"$custom_runName\"" : '' - rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' - custom_config_file = params.multiqc_config ? "--config $mqc_custom_config" : '' - """ - multiqc -f $rtitle $rfilename $custom_config_file . - """ -} - -/* - * STEP 3 - Output Description HTML - */ -process output_documentation { - publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode - - input: - file output_docs from ch_output_docs - file images from ch_output_docs_images - - output: - file "results_description.html" +======================================================================================== + RUN ALL WORKFLOWS +======================================================================================== +*/ - script: - """ - markdown_to_html.py $output_docs -o results_description.html - """ +// +// WORKFLOW: Execute a single named workflow for the pipeline +// See: https://github.com/nf-core/rnaseq/issues/619 +// +workflow { + NFCORE_BACASS () } /* - * Completion e-mail notification - */ -workflow.onComplete { - - // Set up the e-mail variables - def subject = "[nf-core/bacass] Successful: $workflow.runName" - if (!workflow.success) { - subject = "[nf-core/bacass] FAILED: $workflow.runName" - } - def email_fields = [:] - email_fields['version'] = workflow.manifest.version - email_fields['runName'] = custom_runName ?: workflow.runName - email_fields['success'] = workflow.success - email_fields['dateComplete'] = workflow.complete - email_fields['duration'] = workflow.duration - email_fields['exitStatus'] = workflow.exitStatus - email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') - email_fields['errorReport'] = (workflow.errorReport ?: 'None') - email_fields['commandLine'] = workflow.commandLine - email_fields['projectDir'] = workflow.projectDir - email_fields['summary'] = summary - email_fields['summary']['Date Started'] = workflow.start - email_fields['summary']['Date Completed'] = workflow.complete - email_fields['summary']['Pipeline script file path'] = workflow.scriptFile - email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) 
email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision - email_fields['summary']['Nextflow Version'] = workflow.nextflow.version - email_fields['summary']['Nextflow Build'] = workflow.nextflow.build - email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp - - // On success try attach the multiqc report - def mqc_report = null - try { - if (workflow.success) { - mqc_report = ch_multiqc_report.getVal() - if (mqc_report.getClass() == ArrayList) { - log.warn "[nf-core/bacass] Found multiple reports from process 'multiqc', will use only one" - mqc_report = mqc_report[0] - } - } - } catch (all) { - log.warn "[nf-core/bacass] Could not attach MultiQC report to summary email" - } - - // Check if we are only sending emails on failure - email_address = params.email - if (!params.email && params.email_on_fail && !workflow.success) { - email_address = params.email_on_fail - } - - // Render the TXT template - def engine = new groovy.text.GStringTemplateEngine() - def tf = new File("$baseDir/assets/email_template.txt") - def txt_template = engine.createTemplate(tf).make(email_fields) - def email_txt = txt_template.toString() - - // Render the HTML template - def hf = new File("$baseDir/assets/email_template.html") - def html_template = engine.createTemplate(hf).make(email_fields) - def email_html = html_template.toString() - - // Render the sendmail template - def smail_fields = [ email: params.email, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ] - def sf = new File("$baseDir/assets/sendmail_template.txt") - def sendmail_template = engine.createTemplate(sf).make(smail_fields) - def sendmail_html = sendmail_template.toString() - - // Send the HTML e-mail - if (email_address) { - try { - if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } - // Try to send HTML e-mail using sendmail - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "[nf-core/bacass] Sent summary e-mail to $email_address (sendmail)" - } catch (all) { - // Catch failures and try with plaintext - def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] - if ( mqc_report.size() <= params.max_multiqc_email_size.toBytes() ) { - mail_cmd += [ '-A', mqc_report ] - } - mail_cmd.execute() << email_html - log.info "[nf-core/bacass] Sent summary e-mail to $email_address (mail)" - } - } - - // Write summary e-mail HTML to a file - def output_d = new File("${params.outdir}/pipeline_info/") - if (!output_d.exists()) { - output_d.mkdirs() - } - def output_hf = new File(output_d, "pipeline_report.html") - output_hf.withWriter { w -> w << email_html } - def output_tf = new File(output_d, "pipeline_report.txt") - output_tf.withWriter { w -> w << email_txt } - - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_red = params.monochrome_logs ? '' : "\033[0;31m"; - c_reset = params.monochrome_logs ? 
'' : "\033[0m"; - - if (workflow.stats.ignoredCount > 0 && workflow.success) { - log.info "-${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}-" - log.info "-${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}-" - log.info "-${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}-" - } - - if (workflow.success) { - log.info "-${c_purple}[nf-core/bacass]${c_green} Pipeline completed successfully${c_reset}-" - } else { - checkHostname() - log.info "-${c_purple}[nf-core/bacass]${c_red} Pipeline completed with errors${c_reset}-" - } - -} - - -def nfcoreHeader() { - // Log colors ANSI codes - c_black = params.monochrome_logs ? '' : "\033[0;30m"; - c_blue = params.monochrome_logs ? '' : "\033[0;34m"; - c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; - c_dim = params.monochrome_logs ? '' : "\033[2m"; - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_white = params.monochrome_logs ? '' : "\033[0;37m"; - c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; - - return """ -${c_dim}--------------------------------------------------${c_reset}- - ${c_green},--.${c_black}/${c_green},-.${c_reset} - ${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset} - ${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset} - ${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset} - ${c_green}`._,._,\'${c_reset} - ${c_purple} nf-core/bacass v${workflow.manifest.version}${c_reset} - -${c_dim}--------------------------------------------------${c_reset}- - """.stripIndent() -} - -def checkHostname() { - def c_reset = params.monochrome_logs ? '' : "\033[0m" - def c_white = params.monochrome_logs ? '' : "\033[0;37m" - def c_red = params.monochrome_logs ? '' : "\033[1;91m" - def c_yellow_bold = params.monochrome_logs ? 
'' : "\033[1;93m" - if (params.hostnames) { - def hostname = "hostname".execute().text.trim() - params.hostnames.each { prof, hnames -> - hnames.each { hname -> - if (hostname.contains(hname) && !workflow.profile.contains(prof)) { - log.error "====================================================\n" + - " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + - " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + - " ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" + - "============================================================" - } - } - } - } -} - -// Return file if it exists, if NA is found this gets treated as a String information -static def returnFile(it) { - if(it == 'NA') { - return 'NA' - } else { - if (!file(it).exists()) exit 1, "Warning: Missing file in CSV file: ${it}, see --help for more information" - return file(it) - } -} +======================================================================================== + THE END +======================================================================================== +*/ diff --git a/modules.json b/modules.json new file mode 100644 index 00000000..2ef7020a --- /dev/null +++ b/modules.json @@ -0,0 +1,29 @@ +{ + "name": "nf-core/bacass", + "homePage": "https://github.com/nf-core/bacass", + "repos": { + "nf-core/modules": { + "fastqc": { + "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d" + }, + "kraken2/kraken2": { + "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d" + }, + "multiqc": { + "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d" + }, + "prokka": { + "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d" + }, + "quast": { + "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d" + }, + "samtools/index": { + "git_sha": "c5235a983d454787fa0c3247b02086969217163b" + }, + "samtools/sort": { + "git_sha": "c5235a983d454787fa0c3247b02086969217163b" + } + } + } +} \ No newline at end of file diff --git a/modules/local/canu.nf b/modules/local/canu.nf new file mode 100644 index 00000000..09afecf6 --- /dev/null +++ b/modules/local/canu.nf @@ -0,0 +1,50 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process CANU { + tag "$meta.id" + label 'process_high' + label 'process_long' + label 'process_high_memory' + label 'error_retry' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'canu=2.1.1-2' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/canu:2.1.1--h1b792b2_2" + } else { + container "quay.io/biocontainers/canu:2.1.1--h1b792b2_2" + } + + input: + tuple val(meta), val(reads), file(longreads) + + output: + tuple val(meta), path('*_assembly.fasta') , emit: assembly + tuple val(meta), path('*_assembly.report'), emit: log + path '*.version.txt' , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def genomeSize = meta.genome_size == 'NA' ? 
"5m" : "${meta.genome_size}" + """ + canu -p assembly -d canu_out \ + ${options.args} \ + genomeSize="${genomeSize}" -nanopore "${longreads}" \ + maxThreads="${task.cpus}" merylMemory="${task.memory.toGiga()}G" \ + merylThreads="${task.cpus}" hapThreads="${task.cpus}" batMemory="${task.memory.toGiga()}G" \ + redMemory="${task.memory.toGiga()}G" redThreads="${task.cpus}" \ + oeaMemory="${task.memory.toGiga()}G" oeaThreads="${task.cpus}" \ + corMemory="${task.memory.toGiga()}G" corThreads="${task.cpus}" + mv canu_out/assembly.contigs.fasta ${prefix}_assembly.fasta + mv canu_out/assembly.report ${prefix}_assembly.report + + echo \$(canu --version 2>&1) | sed -e 's/Canu //g' > ${software}.version.txt + """ +} diff --git a/modules/local/dfast.nf b/modules/local/dfast.nf new file mode 100644 index 00000000..bd515040 --- /dev/null +++ b/modules/local/dfast.nf @@ -0,0 +1,36 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process DFAST { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? "dfast=1.2.14" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/dfast:1.2.14--h2e03b76_0" + } else { + container "quay.io/biocontainers/dfast:1.2.14--h2e03b76_0" + } + + input: + tuple val(meta), path(fasta) + file (config) + + output: + tuple val(meta), path("RESULT*"), emit: reads + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + """ + dfast_file_downloader.py --protein dfast --dbroot . + dfast --genome ${fasta} --config $config + dfast --version | sed -e "s/DFAST ver. 
//g" > "${software}.version.txt" + """ +} diff --git a/modules/local/functions.nf b/modules/local/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/local/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/local/get_software_versions.nf b/modules/local/get_software_versions.nf new file mode 100644 index 00000000..d7a9a92e --- /dev/null +++ b/modules/local/get_software_versions.nf @@ -0,0 +1,33 @@ +// Import generic module functions +include { saveFiles } from './functions' + +params.options = [:] + +process GET_SOFTWARE_VERSIONS { + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'pipeline_info', meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 
"conda-forge::python=3.8.3" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/python:3.8.3" + } else { + container "quay.io/biocontainers/python:3.8.3" + } + + cache false + + input: + path versions + + output: + path "software_versions.tsv" , emit: tsv + path 'software_versions_mqc.yaml', emit: yaml + + script: // This script is bundled with the pipeline, in nf-core/bacass/bin/ + """ + echo $workflow.manifest.version > pipeline.version.txt + echo $workflow.nextflow.version > nextflow.version.txt + scrape_software_versions.py &> software_versions_mqc.yaml + """ +} diff --git a/modules/local/kraken2_db_preparation.nf b/modules/local/kraken2_db_preparation.nf new file mode 100644 index 00000000..559f4b2a --- /dev/null +++ b/modules/local/kraken2_db_preparation.nf @@ -0,0 +1,31 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process KRAKEN2_DB_PREPARATION { + tag "${db.simpleName}" + label 'process_low' + + conda (params.enable_conda ? "conda-forge::sed=4.7" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" + } else { + container "biocontainers/biocontainers:v1.2.0_cv1" + } + + input: + path db + + output: + tuple val("${db.simpleName}"), path("database"), emit: db + + script: + """ + mkdir db_tmp + tar -xf "${db}" -C db_tmp + mkdir database + mv `find db_tmp/ -name "*.k2d"` database/ + """ +} diff --git a/modules/local/medaka.nf b/modules/local/medaka.nf new file mode 100644 index 00000000..7af1e028 --- /dev/null +++ b/modules/local/medaka.nf @@ -0,0 +1,43 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process MEDAKA { + tag "$meta.id" + label 'process_high' + label 'process_long' + label 'process_high_memory' + label 'error_retry' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'medaka=1.4.3-0' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/medaka:1.4.3--py38h130def0_0" + } else { + container "quay.io/biocontainers/medaka:1.4.3--py38h130def0_0" + } + + input: + tuple val(meta), file(assembly), val(reads), file(longreads) + + output: + tuple val(meta), path('*_polished_genome.fa'), emit: assembly + path '*.version.txt' , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? 
"${meta.id}${options.suffix}" : "${meta.id}" + """ + medaka_consensus ${options.args} \ + -i ${longreads} \ + -d ${assembly} \ + -o "${prefix}_polished_genome.fa" \ + -t ${task.cpus} + + echo \$(medaka --version 2>&1) | sed -e 's/medaka //g' > ${software}.version.txt + """ +} diff --git a/modules/local/miniasm.nf b/modules/local/miniasm.nf new file mode 100644 index 00000000..5a3cda7d --- /dev/null +++ b/modules/local/miniasm.nf @@ -0,0 +1,41 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process MINIASM { + tag "$meta.id" + label 'process_high' + label 'process_long' + label 'process_high_memory' + label 'error_retry' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'bioconda::miniasm=0.3_r179' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/miniasm:0.3_r179--h5bf99c6_2" + } else { + container "quay.io/biocontainers/miniasm:0.3_r179--h5bf99c6_2" + } + + input: + tuple val(meta), val(reads), file(longreads), file(assembly), path(paf) + + output: + tuple val(meta), path('*_assembly.fasta') , emit: assembly + tuple val(meta), val(reads), file(longreads), path('*_assembly.fasta') , emit: all + path '*.version.txt' , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + """ + miniasm -f "${longreads}" "${paf}" > "${longreads}.gfa" + awk '/^S/{print ">"\$2"\\n"\$3}' "${longreads}.gfa" | fold > ${prefix}_assembly.fasta + + echo \$(miniasm -V 2>&1) > ${software}.version.txt + """ +} diff --git a/modules/local/minimap_align.nf b/modules/local/minimap_align.nf new file mode 100644 index 00000000..89af8661 --- /dev/null +++ b/modules/local/minimap_align.nf @@ -0,0 +1,41 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process MINIMAP2_ALIGN { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'bioconda::minimap2=2.21' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/minimap2:2.21--h5bf99c6_0" + } else { + container "quay.io/biocontainers/minimap2:2.21--h5bf99c6_0" + } + + input: + tuple val(meta), val(reads), file(longreads), file('reference') + + output: + tuple val(meta), val(reads), file(longreads), file('reference'), path("*.paf"), emit: paf + path "*.version.txt", emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? 
"${meta.id}${options.suffix}" : "${meta.id}" + """ + minimap2 \\ + $options.args \\ + -t $task.cpus \\ + reference \\ + $longreads \\ + > ${prefix}.paf + + echo \$(minimap2 --version 2>&1) > ${software}.version.txt + """ +} diff --git a/modules/local/nanoplot.nf b/modules/local/nanoplot.nf new file mode 100644 index 00000000..a0540aba --- /dev/null +++ b/modules/local/nanoplot.nf @@ -0,0 +1,41 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process NANOPLOT { + tag "$meta.id" + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'bioconda::nanoplot=1.38.0' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/nanoplot:1.38.0--pyhdfd78af_0" + } else { + container "quay.io/biocontainers/nanoplot:1.38.0--pyhdfd78af_0" + } + + input: + tuple val(meta), path(ontfile) + + output: + tuple val(meta), path("*.html"), emit: html + tuple val(meta), path("*.png") , emit: png + tuple val(meta), path("*.txt") , emit: txt + tuple val(meta), path("*.log") , emit: log + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + def input_file = "--fastq ${ontfile}" + """ + NanoPlot \\ + $options.args \\ + -t $task.cpus \\ + $input_file + echo \$(NanoPlot --version 2>&1) | sed 's/^.*NanoPlot //; s/ .*\$//' > ${software}.version.txt + """ +} diff --git a/modules/local/nanopolish.nf b/modules/local/nanopolish.nf new file mode 100644 index 00000000..c1e6c943 --- /dev/null +++ b/modules/local/nanopolish.nf @@ -0,0 +1,50 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process NANOPOLISH { + tag "$meta.id" + label 'process_high' + label 'process_long' + label 'process_high_memory' + label 'error_retry' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'nanopolish=0.13.2-5' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/nanopolish:0.13.2--h8cec615_5" + } else { + container "quay.io/biocontainers/nanopolish:0.13.2--h8cec615_5" + } + + input: + tuple val(meta), val(reads), file(longreads), file(assembly), file(bam), file(bai), file(fast5) + + output: + tuple val(meta), file('polished_genome.fa'), emit: assembly + path "*.version.txt", emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? 
"${meta.id}${options.suffix}" : "${meta.id}" + """ + nanopolish index -d "${fast5}" "${longreads}" + + nanopolish variants \ + --consensus \ + -o polished.vcf \ + -r "${longreads}" \ + -b "${bam}" \ + -g "${assembly}" \ + -t "${task.cpus}" \ + --min-candidate-frequency 0.1 + + nanopolish vcf2fasta -g "${assembly}" polished.vcf > polished_genome.fa + + nanopolish --version | sed -e "s/nanopolish version //g" | head -n 1 > ${software}.version.txt + """ +} diff --git a/modules/local/porechop.nf b/modules/local/porechop.nf new file mode 100644 index 00000000..57a2cc2f --- /dev/null +++ b/modules/local/porechop.nf @@ -0,0 +1,34 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process PORECHOP { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? "porechop=0.2.4" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/porechop:0.2.4--py38hed8969a_1" + } else { + container "quay.io/biocontainers/porechop:0.2.4--py38hed8969a_1" + } + + input: + tuple val(meta), path(reads) + + output: + tuple val(meta), path('trimmed.fastq.gz'), emit: reads + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + """ + porechop $options.args -i "${reads}" -t "${task.cpus}" -o trimmed.fastq.gz + porechop --version > "${software}.version.txt" + """ +} diff --git a/modules/local/pycoqc.nf b/modules/local/pycoqc.nf new file mode 100644 index 00000000..8c0de668 --- /dev/null +++ b/modules/local/pycoqc.nf @@ -0,0 +1,52 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process PYCOQC { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? "bioconda::pycoqc=2.5.2" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/pycoqc:2.5.2--py_0" + } else { + container "quay.io/biocontainers/pycoqc:2.5.2--py_0" + } + + input: + tuple val(meta), path(fast5) + + output: + tuple val(meta), path('sequencing_summary.txt'), emit: summary + path "*.html" , emit: html + path "*.json" , emit: json + path "*.version.txt", emit: version + + script: + def software = getSoftwareName(task.process) + //Find out whether the sequencing_summary already exists + if(file("${fast5}/sequencing_summary.txt").exists()){ + run_summary = "cp ${fast5}/sequencing_summary.txt ./sequencing_summary.txt" + } else { + run_summary = "Fast5_to_seq_summary -f $fast5 -t ${task.cpus} -s './sequencing_summary.txt' --verbose_level 2" + } + //Barcodes available? + barcode_me = file("${fast5}/barcoding_sequencing.txt").exists() ? 
"-b ${fast5}/barcoding_sequencing.txt" : '' + """ + $run_summary + + pycoQC \\ + $options.args \\ + -f "sequencing_summary.txt" \\ + $barcode_me \\ + -o ${meta.id}_pycoqc.html \\ + -j ${meta.id}_pycoqc.json + + echo \$(pycoQC --version 2>&1) | sed 's/^.*pycoQC v//; s/ .*\$//' > ${software}.version.txt + """ +} diff --git a/modules/local/racon.nf b/modules/local/racon.nf new file mode 100644 index 00000000..28973921 --- /dev/null +++ b/modules/local/racon.nf @@ -0,0 +1,39 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process RACON { + tag "$meta.id" + label 'process_high' + label 'process_long' + label 'process_high_memory' + label 'error_retry' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'racon=1.4.20-1' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/racon:1.4.20--h9a82719_1" + } else { + container "quay.io/biocontainers/racon:1.4.20--h9a82719_1" + } + + input: + tuple val(meta), val(reads), file(longreads), path('assembly.fasta'), path(paf) + + output: + tuple val(meta), path('*_assembly_consensus.fasta') , emit: assembly + path '*.version.txt' , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + """ + racon -t "${task.cpus}" "${longreads}" "${paf}" "assembly.fasta" > ${prefix}_assembly_consensus.fasta + + echo \$(racon --version 2>&1) | sed 's/^.*v//' > ${software}.version.txt + """ +} diff --git a/modules/local/skewer.nf b/modules/local/skewer.nf new file mode 100644 index 00000000..ac983b3b --- /dev/null +++ b/modules/local/skewer.nf @@ -0,0 +1,46 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process SKEWER { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 
"skewer=0.2.2-3" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/skewer:0.2.2--hc9558a2_3" + } else { + container "quay.io/biocontainers/skewer:0.2.2--hc9558a2_3" + } + + input: + tuple val(meta), path(reads) + + output: + tuple val(meta), path("*_trm-cmb.R{1,2}.fastq.gz"), emit: reads + path("*.log") , emit: log + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + """ + # loop over readunits in pairs per sample + pairno=0 + echo "${reads[0]} ${reads[1]}" | xargs -n2 | while read fq1 fq2; do + skewer $options.args -t ${task.cpus} \$fq1 \$fq2; + done + + # gzip, because skewer's -z returns an error + gzip *.fastq + + cat \$(ls *trimmed-pair1.fastq.gz | sort) >> ${meta.id}_trm-cmb.R1.fastq.gz + cat \$(ls *trimmed-pair2.fastq.gz | sort) >> ${meta.id}_trm-cmb.R2.fastq.gz + + echo \$(skewer --version 2>&1) | sed 's/^.*skewer version: //; s/ .*//' > ${software}.version.txt + """ +} diff --git a/modules/local/unicycler.nf b/modules/local/unicycler.nf new file mode 100644 index 00000000..83ce4703 --- /dev/null +++ b/modules/local/unicycler.nf @@ -0,0 +1,56 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process UNICYCLER { + tag "$meta.id" + label 'process_high' + label 'process_long' + label 'process_high_memory' + label 'error_retry' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'bioconda::unicycler=0.4.8' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/unicycler:0.4.8--py38h8162308_3" + } else { + container "quay.io/biocontainers/unicycler:0.4.8--py38h8162308_3" + } + + input: + tuple val(meta), file(reads), file(longreads) + + output: + tuple val(meta), path('*.scaffolds.fa'), emit: scaffolds + tuple val(meta), path('*.assembly.gfa'), emit: gfa + tuple val(meta), path('*.log') , emit: log + path '*.version.txt' , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? 
"${meta.id}${options.suffix}" : "${meta.id}" + if(params.assembly_type == 'long'){ + input_reads = "-l $longreads" + } else if (params.assembly_type == 'short'){ + input_reads = "-1 ${reads[0]} -2 ${reads[1]}" + } else if (params.assembly_type == 'hybrid'){ + input_reads = "-1 ${reads[0]} -2 ${reads[1]} -l $longreads" + } + """ + unicycler \\ + --threads $task.cpus \\ + $options.args \\ + $input_reads \\ + --out ./ + + mv assembly.fasta ${prefix}.scaffolds.fa + mv assembly.gfa ${prefix}.assembly.gfa + mv unicycler.log ${prefix}.unicycler.log + + echo \$(unicycler --version 2>&1) | sed 's/^.*Unicycler v//; s/ .*\$//' > ${software}.version.txt + """ +} diff --git a/modules/nf-core/modules/fastqc/functions.nf b/modules/nf-core/modules/fastqc/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/nf-core/modules/fastqc/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? 
path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/nf-core/modules/fastqc/main.nf b/modules/nf-core/modules/fastqc/main.nf new file mode 100644 index 00000000..39c327b2 --- /dev/null +++ b/modules/nf-core/modules/fastqc/main.nf @@ -0,0 +1,47 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process FASTQC { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? "bioconda::fastqc=0.11.9" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0" + } else { + container "quay.io/biocontainers/fastqc:0.11.9--0" + } + + input: + tuple val(meta), path(reads) + + output: + tuple val(meta), path("*.html"), emit: html + tuple val(meta), path("*.zip") , emit: zip + path "*.version.txt" , emit: version + + script: + // Add soft-links to original FastQs for consistent naming in pipeline + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + if (meta.single_end) { + """ + [ ! -f ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz + fastqc $options.args --threads $task.cpus ${prefix}.fastq.gz + fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt + """ + } else { + """ + [ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz + [ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz + fastqc $options.args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz + fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt + """ + } +} diff --git a/modules/nf-core/modules/fastqc/meta.yml b/modules/nf-core/modules/fastqc/meta.yml new file mode 100644 index 00000000..8eb9953d --- /dev/null +++ b/modules/nf-core/modules/fastqc/meta.yml @@ -0,0 +1,51 @@ +name: fastqc +description: Run FastQC on sequenced reads +keywords: + - quality control + - qc + - adapters + - fastq +tools: + - fastqc: + description: | + FastQC gives general quality metrics about your reads. + It provides information about the quality score distribution + across your reads, the per base sequence content (%A/C/G/T). + You get information about adapter contamination and other + overrepresented sequences. + homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ + documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - html: + type: file + description: FastQC report + pattern: "*_{fastqc.html}" + - zip: + type: file + description: FastQC report archive + pattern: "*_{fastqc.zip}" + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" +authors: + - "@drpatelh" + - "@grst" + - "@ewels" + - "@FelixKrueger" diff --git a/modules/nf-core/modules/kraken2/kraken2/functions.nf b/modules/nf-core/modules/kraken2/kraken2/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/nf-core/modules/kraken2/kraken2/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/nf-core/modules/kraken2/kraken2/main.nf b/modules/nf-core/modules/kraken2/kraken2/main.nf new file mode 100644 index 00000000..0fa86579 --- /dev/null +++ b/modules/nf-core/modules/kraken2/kraken2/main.nf @@ -0,0 +1,55 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process KRAKEN2_KRAKEN2 { + tag "$meta.id" + label 'process_high' + label 'process_long' + label 'process_high_memory' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 
'bioconda::kraken2=2.1.1 conda-forge::pigz=2.6' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container 'https://depot.galaxyproject.org/singularity/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:941789bd7fe00db16531c26de8bf3c5c985242a5-0' + } else { + container 'quay.io/biocontainers/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:941789bd7fe00db16531c26de8bf3c5c985242a5-0' + } + + input: + tuple val(meta), path(reads) + path db + + output: + tuple val(meta), path('*classified*') , emit: classified + tuple val(meta), path('*unclassified*'), emit: unclassified + tuple val(meta), path('*report.txt') , emit: txt + path '*.version.txt' , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def paired = meta.single_end ? "" : "--paired" + def classified = meta.single_end ? "${prefix}.classified.fastq" : "${prefix}.classified#.fastq" + def unclassified = meta.single_end ? "${prefix}.unclassified.fastq" : "${prefix}.unclassified#.fastq" + """ + kraken2 \\ + --db $db \\ + --threads $task.cpus \\ + --unclassified-out $unclassified \\ + --classified-out $classified \\ + --report ${prefix}.kraken2.report.txt \\ + --gzip-compressed \\ + $paired \\ + $options.args \\ + $reads + + pigz -p $task.cpus *.fastq + + echo \$(kraken2 --version 2>&1) | sed 's/^.*Kraken version //; s/ .*\$//' > ${software}.version.txt + """ +} diff --git a/modules/nf-core/modules/kraken2/kraken2/meta.yml b/modules/nf-core/modules/kraken2/kraken2/meta.yml new file mode 100644 index 00000000..cb1ec0de --- /dev/null +++ b/modules/nf-core/modules/kraken2/kraken2/meta.yml @@ -0,0 +1,59 @@ +name: kraken2_kraken2 +description: Classifies metagenomic sequence data +keywords: + - classify + - metagenomics + - fastq + - db +tools: + - kraken2: + description: | + Kraken2 is a taxonomic sequence classifier that assigns taxonomic labels to sequence reads + homepage: https://ccb.jhu.edu/software/kraken2/ + documentation: https://github.com/DerrickWood/kraken2/wiki/Manual + doi: 10.1186/s13059-019-1891-0 +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. + - db: + type: directory + description: Kraken2 database +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - classified: + type: file + description: | + Reads classified to belong to any of the taxa + on the Kraken2 database. + pattern: "*{fastq.gz}" + - unclassified: + type: file + description: | + Reads not classified to belong to any of the taxa + on the Kraken2 database. + pattern: "*{fastq.gz}" + - txt: + type: file + description: | + Kraken2 report containing stats about classified + and not classifed reads. 
+ pattern: "*.{report.txt}" + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" +authors: + - "@joseespinosa" + - "@drpatelh" diff --git a/modules/nf-core/modules/multiqc/functions.nf b/modules/nf-core/modules/multiqc/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/nf-core/modules/multiqc/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/nf-core/modules/multiqc/main.nf b/modules/nf-core/modules/multiqc/main.nf new file mode 100644 index 00000000..da780800 --- /dev/null +++ b/modules/nf-core/modules/multiqc/main.nf @@ -0,0 +1,35 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process MULTIQC { + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 
"bioconda::multiqc=1.10.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/multiqc:1.10.1--py_0" + } else { + container "quay.io/biocontainers/multiqc:1.10.1--py_0" + } + + input: + path multiqc_files + + output: + path "*multiqc_report.html", emit: report + path "*_data" , emit: data + path "*_plots" , optional:true, emit: plots + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + """ + multiqc -f $options.args . + multiqc --version | sed -e "s/multiqc, version //g" > ${software}.version.txt + """ +} diff --git a/modules/nf-core/modules/multiqc/meta.yml b/modules/nf-core/modules/multiqc/meta.yml new file mode 100644 index 00000000..532a8bb1 --- /dev/null +++ b/modules/nf-core/modules/multiqc/meta.yml @@ -0,0 +1,39 @@ +name: MultiQC +description: Aggregate results from bioinformatics analyses across many samples into a single report +keywords: + - QC + - bioinformatics tools + - Beautiful stand-alone HTML report +tools: + - multiqc: + description: | + MultiQC searches a given directory for analysis logs and compiles a HTML report. + It's a general use tool, perfect for summarising the output from numerous bioinformatics tools. + homepage: https://multiqc.info/ + documentation: https://multiqc.info/docs/ +input: + - multiqc_files: + type: file + description: | + List of reports / files recognised by MultiQC, for example the html and zip output of FastQC +output: + - report: + type: file + description: MultiQC report file + pattern: "multiqc_report.html" + - data: + type: dir + description: MultiQC data dir + pattern: "multiqc_data" + - plots: + type: file + description: Plots created by MultiQC + pattern: "*_data" + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" +authors: + - "@abhi18av" + - "@bunop" + - "@drpatelh" diff --git a/modules/nf-core/modules/prokka/functions.nf b/modules/nf-core/modules/prokka/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/nf-core/modules/prokka/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = 
ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/nf-core/modules/prokka/main.nf b/modules/nf-core/modules/prokka/main.nf new file mode 100644 index 00000000..1fa3f3d9 --- /dev/null +++ b/modules/nf-core/modules/prokka/main.nf @@ -0,0 +1,56 @@ +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process PROKKA { + tag "$meta.id" + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? "bioconda::prokka=1.14.6" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/prokka:1.14.6--pl526_0" + } else { + container "quay.io/biocontainers/prokka:1.14.6--pl526_0" + } + + input: + tuple val(meta), path(fasta) + path proteins + path prodigal_tf + + output: + tuple val(meta), path("${prefix}/*.gff"), emit: gff + tuple val(meta), path("${prefix}/*.gbk"), emit: gbk + tuple val(meta), path("${prefix}/*.fna"), emit: fna + tuple val(meta), path("${prefix}/*.faa"), emit: faa + tuple val(meta), path("${prefix}/*.ffn"), emit: ffn + tuple val(meta), path("${prefix}/*.sqn"), emit: sqn + tuple val(meta), path("${prefix}/*.fsa"), emit: fsa + tuple val(meta), path("${prefix}/*.tbl"), emit: tbl + tuple val(meta), path("${prefix}/*.err"), emit: err + tuple val(meta), path("${prefix}/*.log"), emit: log + tuple val(meta), path("${prefix}/*.txt"), emit: txt + tuple val(meta), path("${prefix}/*.tsv"), emit: tsv + path "*.version.txt", emit: version + + script: + def software = getSoftwareName(task.process) + prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def proteins_opt = proteins ? "--proteins ${proteins[0]}" : "" + def prodigal_opt = prodigal_tf ? 
"--prodigaltf ${prodigal_tf[0]}" : "" + """ + prokka \\ + $options.args \\ + --cpus $task.cpus \\ + --prefix $prefix \\ + $proteins_opt \\ + $prodigal_tf \\ + $fasta + + echo \$(prokka --version 2>&1) | sed 's/^.*prokka //' > ${software}.version.txt + """ +} diff --git a/modules/nf-core/modules/prokka/meta.yml b/modules/nf-core/modules/prokka/meta.yml new file mode 100644 index 00000000..4489b2fd --- /dev/null +++ b/modules/nf-core/modules/prokka/meta.yml @@ -0,0 +1,91 @@ +name: prokka +description: Whole genome annotation of small genomes (bacterial, archeal, viral) +keywords: + - annotation + - fasta + - prokka +tools: + - prokka: + description: Rapid annotation of prokaryotic genomes + homepage: https://github.com/tseemann/prokka + doi: "10.1093/bioinformatics/btu153" + licence: ['GPL v2'] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: | + FASTA file to be annotated. Has to contain at least a non-empty string dummy value. + - proteins: + type: file + description: FASTA file of trusted proteins to first annotate from (optional) + - prodigal_tf: + type: file + description: Training file to use for Prodigal (optional) + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" + - gff: + type: file + description: annotation in GFF3 format, containing both sequences and annotations + pattern: "*.{gff}" + - gbk: + type: file + description: annotation in GenBank format, containing both sequences and annotations + pattern: "*.{gbk}" + - fna: + type: file + description: nucleotide FASTA file of the input contig sequences + pattern: "*.{fna}" + - faa: + type: file + description: protein FASTA file of the translated CDS sequences + pattern: "*.{faa}" + - ffn: + type: file + description: nucleotide FASTA file of all the prediction transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA) + pattern: "*.{ffn}" + - sqn: + type: file + description: an ASN1 format "Sequin" file for submission to Genbank + pattern: "*.{sqn}" + - fsa: + type: file + description: nucleotide FASTA file of the input contig sequences, used by "tbl2asn" to create the .sqn file + pattern: "*.{fsa}" + - tbl: + type: file + description: feature Table file, used by "tbl2asn" to create the .sqn file + pattern: "*.{tbl}" + - err: + type: file + description: unacceptable annotations - the NCBI discrepancy report. 
+ pattern: "*.{err}" + - log: + type: file + description: contains all the output that Prokka produced during its run + pattern: "*.{log}" + - txt: + type: file + description: statistics relating to the annotated features found + pattern: "*.{txt}" + - tsv: + type: file + description: tab-separated file of all features (locus_tag,ftype,len_bp,gene,EC_number,COG,product) + pattern: "*.{tsv}" + +authors: + - "@rpetit3" diff --git a/modules/nf-core/modules/quast/functions.nf b/modules/nf-core/modules/quast/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/nf-core/modules/quast/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/nf-core/modules/quast/main.nf b/modules/nf-core/modules/quast/main.nf new file mode 100644 index 00000000..0b94c410 --- /dev/null +++ b/modules/nf-core/modules/quast/main.nf @@ -0,0 +1,48 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process QUAST { + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 
'bioconda::quast=5.0.2' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container 'https://depot.galaxyproject.org/singularity/quast:5.0.2--py37pl526hb5aa323_2' + } else { + container 'quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2' + } + + input: + path consensus + path fasta + path gff + val use_fasta + val use_gff + + output: + path "${prefix}" , emit: results + path '*.tsv' , emit: tsv + path '*.version.txt', emit: version + + script: + def software = getSoftwareName(task.process) + prefix = options.suffix ?: software + def features = use_gff ? "--features $gff" : '' + def reference = use_fasta ? "-r $fasta" : '' + """ + quast.py \\ + --output-dir $prefix \\ + $reference \\ + $features \\ + --threads $task.cpus \\ + $options.args \\ + ${consensus.join(' ')} + ln -s ${prefix}/report.tsv + echo \$(quast.py --version 2>&1) | sed 's/^.*QUAST v//; s/ .*\$//' > ${software}.version.txt + """ +} diff --git a/modules/nf-core/modules/quast/meta.yml b/modules/nf-core/modules/quast/meta.yml new file mode 100644 index 00000000..cc79486e --- /dev/null +++ b/modules/nf-core/modules/quast/meta.yml @@ -0,0 +1,46 @@ +name: quast +description: Quality Assessment Tool for Genome Assemblies +keywords: + - quast + - assembly + - quality +tools: + - quast: + description: | + QUAST calculates quality metrics for genome assemblies + homepage: http://bioinf.spbau.ru/quast + doi: +input: + - consensus: + type: file + description: | + Fasta file containing the assembly of interest + - fasta: + type: file + description: | + The genome assembly to be evaluated. Has to contain at least a non-empty string dummy value. + - use_fasta: + type: boolean + description: Whether to use the provided fasta reference genome file + - gff: + type: file + description: The genome GFF file. Has to contain at least a non-empty string dummy value. 
+ - use_gff: + type: boolean + description: Whether to use the provided gff reference annotation file + +output: + - quast: + type: directory + description: Directory containing complete quast report + pattern: "{prefix}.lineage_report.csv" + - report: + + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" + +authors: + - "@drpatelh" + - "@kevinmenden" diff --git a/modules/nf-core/modules/samtools/index/functions.nf b/modules/nf-core/modules/samtools/index/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/nf-core/modules/samtools/index/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/nf-core/modules/samtools/index/main.nf b/modules/nf-core/modules/samtools/index/main.nf new file mode 100644 index 00000000..e1966fb3 --- /dev/null +++ b/modules/nf-core/modules/samtools/index/main.nf @@ -0,0 +1,35 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process SAMTOOLS_INDEX { + tag "$meta.id" + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 
'bioconda::samtools=1.13' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/samtools:1.13--h8c37831_0" + } else { + container "quay.io/biocontainers/samtools:1.13--h8c37831_0" + } + + input: + tuple val(meta), path(bam) + + output: + tuple val(meta), path("*.bai"), optional:true, emit: bai + tuple val(meta), path("*.csi"), optional:true, emit: csi + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + """ + samtools index $options.args $bam + echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' > ${software}.version.txt + """ +} diff --git a/modules/nf-core/modules/samtools/index/meta.yml b/modules/nf-core/modules/samtools/index/meta.yml new file mode 100644 index 00000000..5d076e3b --- /dev/null +++ b/modules/nf-core/modules/samtools/index/meta.yml @@ -0,0 +1,47 @@ +name: samtools_index +description: Index SAM/BAM/CRAM file +keywords: + - index + - bam + - sam + - cram +tools: + - samtools: + description: | + SAMtools is a set of utilities for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: hhttp://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bai: + type: file + description: BAM/CRAM/SAM index file + pattern: "*.{bai,crai,sai}" + - csi: + type: file + description: CSI index file + pattern: "*.{csi}" + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" +authors: + - "@drpatelh" + - "@ewels" diff --git a/modules/nf-core/modules/samtools/sort/functions.nf b/modules/nf-core/modules/samtools/sort/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/nf-core/modules/samtools/sort/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if 
(!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/nf-core/modules/samtools/sort/main.nf b/modules/nf-core/modules/samtools/sort/main.nf new file mode 100644 index 00000000..0a6b7048 --- /dev/null +++ b/modules/nf-core/modules/samtools/sort/main.nf @@ -0,0 +1,35 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process SAMTOOLS_SORT { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'bioconda::samtools=1.13' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/samtools:1.13--h8c37831_0" + } else { + container "quay.io/biocontainers/samtools:1.13--h8c37831_0" + } + + input: + tuple val(meta), path(bam) + + output: + tuple val(meta), path("*.bam"), emit: bam + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + """ + samtools sort $options.args -@ $task.cpus -o ${prefix}.bam -T $prefix $bam + echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' > ${software}.version.txt + """ +} diff --git a/modules/nf-core/modules/samtools/sort/meta.yml b/modules/nf-core/modules/samtools/sort/meta.yml new file mode 100644 index 00000000..704e8c1f --- /dev/null +++ b/modules/nf-core/modules/samtools/sort/meta.yml @@ -0,0 +1,43 @@ +name: samtools_sort +description: Sort SAM/BAM/CRAM file +keywords: + - sort + - bam + - sam + - cram +tools: + - samtools: + description: | + SAMtools is a set of utilities for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: hhttp://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - bam: + type: file + description: Sorted BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" +authors: + - "@drpatelh" + - "@ewels" diff --git a/nextflow.config b/nextflow.config index 765a1aa6..3bce38bd 100644 --- a/nextflow.config +++ b/nextflow.config @@ -1,167 +1,212 @@ /* - * ------------------------------------------------- - * nf-core/bacass Nextflow config file - * ------------------------------------------------- - * Default config options for all environments. - */ +======================================================================================== + nf-core/bacass Nextflow config file +======================================================================================== + Default config options for all compute environments +---------------------------------------------------------------------------------------- +*/ // Global default params, used in configs params { - input = '' - // Workflow flags - outdir = './results' - skip_kraken2 = false - kraken2db = "" - unicycler_args = "" - prokka_args = "" - assembler = 'unicycler' //allowed are unicycler, canu, miniasm - //Short, Long or Hybrid assembly? - assembly_type = 'short' //allowed are short, long, hybrid (hybrid works only with Unicycler) - annotation_tool = 'prokka' //Default - canu_args = '' //Default no extra options, can be adjusted by the user - dfast_config = "$baseDir/assets/test_config_dfast.py" - polish_method = 'medaka' - //Skipping parts - skip_pycoqc = false - skip_annotation = false - skip_polish = false - - // Boilerplate options - name = false - multiqc_config = false - email = false - max_multiqc_email_size = 25.MB - plaintext_email = false - monochrome_logs = false - help = false - publish_dir_mode = 'copy' - igenomes_base = 's3://ngi-igenomes/igenomes/' - igenomes_ignore = true - tracedir = "${params.outdir}/pipeline_info" - custom_config_version = 'master' - custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" - hostnames = false - config_profile_description = false - config_profile_contact = false - config_profile_url = false - - // Defaults only, expecting to be overwritten - max_memory = 128.GB - max_cpus = 16 - max_time = 240.h -} + // Input options + input = null + + // Contamination_screening + kraken2db = "" + + // Assembly parameters + assembler = 'unicycler' //allowed are unicycler, canu, miniasm + assembly_type = 'short' //allowed are short, long, hybrid (hybrid works only with Unicycler) + unicycler_args = "" + canu_args = '' //Default no extra options, can be adjusted by the user + + // Assembly polishing + polish_method = 'medaka' + + // Annotation + annotation_tool = 'prokka' + prokka_args = "" + dfast_config = "$projectDir/assets/test_config_dfast.py" + + // Skipping options + skip_kraken2 = false + skip_pycoqc = false + skip_annotation = false + skip_polish = false -// Container slug. Stable releases should specify release tag! 
-// Developmental code should specify :dev -process.container = 'nfcore/bacass:1.1.1' + // MultiQC options + multiqc_config = null + multiqc_title = null + max_multiqc_email_size = '25.MB' + + // Boilerplate options + outdir = './results' + tracedir = "${params.outdir}/pipeline_info" + publish_dir_mode = 'copy' + email = null + email_on_fail = null + plaintext_email = false + monochrome_logs = false + help = false + validate_params = true + show_hidden_params = false + schema_ignore_params = 'modules,igenomes_base' + enable_conda = false + singularity_pull_docker_container = false + + // Config options + custom_config_version = 'master' + custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" + hostnames = [:] + config_profile_description = null + config_profile_contact = null + config_profile_url = null + config_profile_name = null + + // Max resource options + // Defaults only, expecting to be overwritten + max_memory = '128.GB' + max_cpus = 16 + max_time = '240.h' + +} // Load base.config by default for all pipelines includeConfig 'conf/base.config' +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' + // Load nf-core custom profiles from different Institutions try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" + includeConfig "${params.custom_config_base}/nfcore_custom.config" } catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") + System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") } profiles { - conda { process.conda = "$baseDir/environment.yml" } - debug { process.beforeScript = 'echo $HOSTNAME' } - docker { - docker.enabled = true - // Avoid this error: - // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. - // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 - // once this is established and works well, nextflow might implement this behavior as new default. 
- docker.runOptions = '-u \$(id -u):\$(id -g)' - } - singularity { - singularity.enabled = true - singularity.autoMounts = true - } - podman { - podman.enabled = true - } - test { includeConfig 'conf/test.config' } - test_dfast {includeConfig 'conf/test_dfast.config'} - test_long { includeConfig 'conf/test_long.config' } - test_long_miniasm { includeConfig 'conf/test_long_miniasm.config' } - test_hybrid { includeConfig 'conf/test_hybrid.config' } -} - -// Load igenomes.config if required -if (!params.igenomes_ignore) { - includeConfig 'conf/igenomes.config' + debug { process.beforeScript = 'echo $HOSTNAME' } + conda { + params.enable_conda = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + docker { + docker.enabled = true + docker.userEmulation = true + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + podman { + podman.enabled = true + docker.enabled = false + singularity.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + shifter { + shifter.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + charliecloud.enabled = false + } + charliecloud { + charliecloud.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + } + test { includeConfig 'conf/test.config' } + test_dfast { includeConfig 'conf/test_dfast.config' } + test_hybrid { includeConfig 'conf/test_hybrid.config' } + test_long { includeConfig 'conf/test_long.config' } + test_long_miniasm { includeConfig 'conf/test_long_miniasm.config' } + test_full { includeConfig 'conf/test_full.config' } } // Export these variables to prevent local Python/R libraries from conflicting with those in the container env { - PYTHONNOUSERSITE = 1 - R_PROFILE_USER = "/.Rprofile" - R_ENVIRON_USER = "/.Renviron" + PYTHONNOUSERSITE = 1 + R_PROFILE_USER = "/.Rprofile" + R_ENVIRON_USER = "/.Renviron" } // Capture exit codes from upstream processes when piping process.shell = ['/bin/bash', '-euo', 'pipefail'] +def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') timeline { - enabled = true - file = "${params.tracedir}/execution_timeline.html" + enabled = true + file = "${params.tracedir}/execution_timeline_${trace_timestamp}.html" } report { - enabled = true - file = "${params.tracedir}/execution_report.html" + enabled = true + file = "${params.tracedir}/execution_report_${trace_timestamp}.html" } trace { - enabled = true - file = "${params.tracedir}/execution_trace.txt" + enabled = true + file = "${params.tracedir}/execution_trace_${trace_timestamp}.txt" } dag { - enabled = true - file = "${params.tracedir}/pipeline_dag.svg" + enabled = true + file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg" } manifest { - name = 'nf-core/bacass' - author = 'Andreas Wilm, Alexander Peltzer' - homePage = 'https://github.com/nf-core/bacass' - description = 'Simple bacterial assembly and annotation pipeline.' 
- mainScript = 'main.nf' - nextflowVersion = '>=19.10.0' - version = '1.1.1' + name = 'nf-core/bacass' + author = 'Andreas Wilm, Alexander Peltzer' + homePage = 'https://github.com/nf-core/bacass' + description = 'Simple bacterial assembly and annotation' + mainScript = 'main.nf' + nextflowVersion = '!>=21.04.0' + version = '2.0.0' } // Function to ensure that resource requirements don't go beyond // a maximum limit def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" - return obj + if (type == 'memory') { + try { + if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) + return params.max_memory as nextflow.util.MemoryUnit + else + return obj + } catch (all) { + println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'time') { + try { + if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) + return params.max_time as nextflow.util.Duration + else + return obj + } catch (all) { + println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'cpus') { + try { + return Math.min( obj, params.max_cpus as int ) + } catch (all) { + println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" + return obj + } } - } } diff --git a/nextflow_schema.json b/nextflow_schema.json index e1029c3e..216c26fd 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -2,7 +2,7 @@ "$schema": "http://json-schema.org/draft-07/schema", "$id": "https://raw.githubusercontent.com/nf-core/bacass/master/nextflow_schema.json", "title": "nf-core/bacass pipeline parameters", - "description": "Simple bacterial assembly and annotation pipeline.", + "description": "Simple bacterial assembly and annotation", "type": "object", "definitions": { "input_output_options": { @@ -16,13 +16,17 @@ "properties": { "input": { "type": "string", - "fa_icon": "fas fa-dna", - "description": "The input design file for the pipeline.", - "help_text": "\nUse this to specify the location of your input design file. For example:\n\n```\n--input 'design_hybrid.tsv'\n```\n\nAn example of properly formatted input files can be found at the [nf-core/testData](https://github.com/nf-core/test-datasets/tree/bacass). Exemplarily, this is the input used for a hybrid assembly in testing:\n\n```\nID R1 R2 LongFastQ Fast5 GenomeSize\nERR044595 https://github.com/nf-core/test-datasets/raw/bacass/ERR044595_1M_1.fastq.gz https://github.com/nf-core/test-datasets/raw/bacass/ERR044595_1M_2.fastq.gz https://github.com/nf-core/test-datasets/raw/bacass/nanopore/subset15000.fq.gz NA 2.8m\n```\n\n* `ID` The identifier to use for handling the dataset e.g. 
sample name\n* `R1` The forward reads in case of available short-read data\n* `R2` The reverse reads in case of available short-read data\n* `LongFastQ` The long*read FastQ file with reads in FASTQ format\n* `Fast5` The folder containing the basecalled FAST5 files\n* `GenomeSize` The expected genome size of the assembly. Only used by the canu assembler.\n\nMissing values (e.g. FAST5 folder in case of short reads) can be omitted by using a `NA` in the TSV file. The pipeline will handle such cases appropriately then." + "format": "file-path", + "mimetype": "text/csv", + "pattern": "^\\S+\\.csv$", + "schema": "assets/schema_input.json", + "description": "Path to comma-separated file containing information about the samples in the experiment.", + "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a tab-separated file with 6 columns, and a header row. See [usage docs](https://nf-co.re/bacass/usage#samplesheet-input).\n\nFor example:\n\n`--input 'design_hybrid.csv'`\n\nAn example of properly formatted input files can be found at the [nf-core/test-datasets](https://github.com/nf-core/test-datasets/tree/bacass). \n\nFor example, this is the input used for a hybrid assembly in testing:\nID R1 R2 LongFastQ Fast5 GenomeSize\nERR044595 https://github.com/nf-core/test-datasets/raw/bacass/ERR044595_1M_1.fastq.gz https://github.com/nf-core/test-datasets/raw/bacass/ERR044595_1M_2.fastq.gz https://github.com/nf-core/test-datasets/raw/bacass/nanopore/subset15000.fq.gz NA 2.8m\n\n* `ID`: The identifier to use for handling the dataset e.g. sample name\n* `R1`: The forward reads in case of available short-read data\n* `R2`: The reverse reads in case of available short-read data\n* `LongFastQ`: The long read FastQ file with reads in FASTQ format\n* `Fast5`: The folder containing the basecalled fast5 files\n* `GenomeSize`: The expected genome size of the assembly. Only used by the canu assembler.\n\nMissing values (e.g. Fast5 folder in case of short reads) can be omitted by using a `NA` in the TSV file. 
The pipeline will handle such cases appropriately then.", + "fa_icon": "fas fa-file-csv" }, "outdir": { "type": "string", - "description": "The output directory where the results will be saved.", + "description": "Path to the output directory where the results will be saved.", "default": "./results", "fa_icon": "fas fa-folder-open" }, @@ -35,117 +39,121 @@ } } }, - "generic_options": { - "title": "Generic options", + "contamination_screening": { + "title": "Contamination Screening", "type": "object", - "fa_icon": "fas fa-file-import", - "description": "Less common options for the pipeline, typically set in a config file.", - "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", + "description": "", + "default": "", + "fa_icon": "fas fa-box", "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "hidden": true, - "fa_icon": "fas fa-question-circle" - }, - "publish_dir_mode": { + "kraken2db": { "type": "string", - "default": "copy", - "hidden": true, - "description": "Method used to save pipeline results to output directory.", - "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", - "fa_icon": "fas fa-copy", - "enum": [ - "symlink", - "rellink", - "link", - "copy", - "copyNoFollow", - "move" - ] + "fa_icon": "fab fa-gitkraken", + "help_text": "See [Kraken2 homepage](https://benlangmead.github.io/aws-indexes/k2) for download\nlinks. Minikraken2 8GB is a reasonable choice, since we run Kraken here mainly just to check for\nsample purity.", + "description": "Path to Kraken2 database." + } + } + }, + "assembly_parameters": { + "title": "Assembly parameters", + "type": "object", + "description": "Parameters for the assembly", + "default": "", + "fa_icon": "fas fa-puzzle-piece", + "properties": { + "assembler": { + "type": "string", + "default": "unicycler", + "fa_icon": "fas fa-puzzle-piece", + "description": "The assembler to use for assembly. Available options are `Unicycler`, `Canu`, `Miniasm`. The latter two are only available for long-read data, whereas Unicycler can be used for short or hybrid assembly projects." }, - "name": { + "assembly_type": { "type": "string", - "description": "Workflow name.", + "default": "short", "fa_icon": "fas fa-fingerprint", - "hidden": true, - "help_text": "A custom name for the pipeline run. Unlike the core nextflow `-name` option with one hyphen this parameter can be reused multiple times, for example if using `-resume`. Passed through to steps such as MultiQC and used for things like report filenames and titles." - }, - "plaintext_email": { - "type": "boolean", - "description": "Send plain-text email instead of HTML.", - "fa_icon": "fas fa-remove-format", - "hidden": true, - "help_text": "Set to receive plain-text e-mails instead of HTML formatted." + "help_text": "This adjusts the type of assembly done with the input data and can be any of `long`, `short` or `hybrid`. Short & Hybrid assembly will always run Unicycler, whereas long-read assembly can be configured separately using the `--assembler` parameter.", + "description": "Which type of assembly to perform." 
}, - "max_multiqc_email_size": { + "unicycler_args": { "type": "string", - "description": "File size limit when attaching MultiQC reports to summary emails.", - "default": "25.MB", - "fa_icon": "fas fa-file-upload", - "hidden": true, - "help_text": "If file generated by pipeline exceeds the threshold, it will not be attached." - }, - "monochrome_logs": { - "type": "boolean", - "description": "Do not use coloured log outputs.", - "fa_icon": "fas fa-palette", - "hidden": true, - "help_text": "Set to disable colourful command line output and live life in monochrome." + "fa_icon": "fas fa-bicycle", + "description": "Extra arguments for Unicycler", + "help_text": "This advanced option allows you to pass extra arguments to Unicycler (e.g. `\"--mode conservative\"` or `\"--no_correct\"`). For this to work you need to quote the arguments and add at least one space." }, - "multiqc_config": { + "canu_args": { "type": "string", - "description": "Custom config file to supply to MultiQC.", - "fa_icon": "fas fa-cog", - "hidden": true - }, - "tracedir": { + "fa_icon": "fas fa-ship", + "description": "This can be used to supply [extra options](https://canu.readthedocs.io/en/latest/quick-start.html) to the Canu assembler. Will be ignored when other assemblers are used." + } + } + }, + "assembly_polishing": { + "title": "Assembly Polishing", + "type": "object", + "description": "", + "default": "", + "fa_icon": "fas fa-user-astronaut", + "properties": { + "polish_method": { "type": "string", - "description": "Directory to keep pipeline Nextflow logs and reports.", - "default": "${params.outdir}/pipeline_info", - "fa_icon": "fas fa-cogs", - "hidden": true + "default": "medaka", + "fa_icon": "fas fa-hotdog", + "description": "Which assembly polishing method to use.", + "help_text": "Can be used to define which polishing method is used by default for long reads. Default is `medaka`, available options are `nanopolish` or `medaka`." + } + } + }, + "annotation": { + "title": "Annotation", + "type": "object", + "description": "", + "default": "", + "fa_icon": "fas fa-align-left", + "properties": { + "annotation_tool": { + "type": "string", + "default": "prokka", + "description": "The annotation method to annotate the final assembly. Default choice is `prokka`, but the `dfast` tool is also available. For the latter, make sure to create your specific config if you're not happy with the default one provided. See [#dfast_config](#dfastconfig) to find out how." }, - "igenomes_base": { + "prokka_args": { "type": "string", - "default": "s3://ngi-igenomes/igenomes/" + "description": "Extra arguments for prokka annotation tool.", + "help_text": "This advanced option allows you to pass extra arguments to Prokka (e.g. `\" --rfam\"` or `\" --genus name\"`). For this to work you need to quote the arguments and add at least one space between the arguments. Example:\n\n```bash\n--prokka_args `--rfam --genus Escherichia Coli`\n```\n" }, - "igenomes_ignore": { + "dfast_config": { "type": "string", - "default": "true" + "default": "assets/test_config_dfast.py", + "description": "Specifies a configuration file for the [DFAST](https://github.com/nigyta/dfast_core) annotation method.", + "help_text": "This can be used instead of PROKKA if required to specify a specific config file for annotation. If you want to know how to create your config file, please refer to the [DFAST](https://github.com/nigyta/dfast_core) readme on how to create one. 
The default config (`assets/test_config_dfast.py`) is just included for testing, so if you want to annotate using DFAST, you have to create a config!" } } }, - "max_job_request_options": { - "title": "Max job request options", + "skipping_options": { + "title": "Skipping Options", "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", + "description": "", + "default": "", + "fa_icon": "fas fa-forward", "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true, - "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" + "skip_kraken2": { + "type": "boolean", + "fa_icon": "fas fa-forward", + "description": "Skip running Kraken2 classifier on reads." }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", - "fa_icon": "fas fa-memory", - "hidden": true, - "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" + "skip_annotation": { + "type": "boolean", + "fa_icon": "fas fa-forward", + "description": "Skip annotating the assembly with Prokka /DFAST." }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job.", - "default": "240.h", - "fa_icon": "far fa-clock", - "hidden": true, - "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" + "skip_pycoqc": { + "type": "boolean", + "fa_icon": "fas fa-forward", + "description": "Skip running `PycoQC` on long read input." + }, + "skip_polish": { + "type": "boolean", + "fa_icon": "fas fa-forward", + "description": "Skip polishing the long-read assembly with fast5 input. Will not affect short/hybrid assemblies." } } }, @@ -161,15 +169,14 @@ "description": "Git commit id for Institutional configs.", "default": "master", "hidden": true, - "fa_icon": "fas fa-users-cog", - "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```" + "fa_icon": "fas fa-users-cog" }, "custom_config_base": { "type": "string", "description": "Base directory for Institutional configs.", "default": "https://raw.githubusercontent.com/nf-core/configs/master", "hidden": true, - "help_text": "If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. 
If you do need them, you should download the files from the repo and tell nextflow where to find them with the `custom_config_base` option. For example:\n\n```bash\n## Download and unzip the config files\ncd /path/to/my/configs\nwget https://github.com/nf-core/configs/archive/master.zip\nunzip master.zip\n\n## Run the pipeline\ncd /path/to/my/data\nnextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/\n```\n\n> Note that the nf-core/tools helper package has a `download` command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.", + "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.", "fa_icon": "fas fa-users-cog" }, "hostnames": { @@ -178,6 +185,12 @@ "hidden": true, "fa_icon": "fas fa-users-cog" }, + "config_profile_name": { + "type": "string", + "description": "Institutional config name.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, "config_profile_description": { "type": "string", "description": "Institutional config description.", @@ -198,120 +211,142 @@ } } }, - "contamination_screening": { - "title": "Contamination Screening", - "type": "object", - "description": "", - "default": "", - "fa_icon": "fas fa-box", - "properties": { - "kraken2db": { - "type": "string", - "fa_icon": "fab fa-gitkraken", - "help_text": "See [Kraken2 homepage](https://benlangmead.github.io/aws-indexes/k2) for download\nlinks. Minikraken2 8GB is a reasonable choice, since we run Kraken here mainly just to check for\nsample purity.", - "description": "Path to Kraken2 database." - } - } - }, - "assembly_parameters": { - "title": "Assembly parameters", + "max_job_request_options": { + "title": "Max job request options", "type": "object", - "description": "Parameters for the assembly", - "default": "", - "fa_icon": "fas fa-puzzle-piece", + "fa_icon": "fab fa-acquisitions-incorporated", + "description": "Set the top limit for requested resources for any single job.", + "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", "properties": { - "assembler": { - "type": "string", - "default": "unicycler", - "fa_icon": "fas fa-puzzle-piece", - "description": "The assembler to use for assembly. Available options are `Unicycler`, `Canu`, `Miniasm`. The latter two are only available for long-read data, whereas Unicycler can be used for short or hybrid assembly projects." - }, - "assembly_type": { - "type": "string", - "default": "short", - "fa_icon": "fas fa-fingerprint", - "help_text": "This adjusts the type of assembly done with the input data and can be any of `long`, `short` or `hybrid`. Short & Hybrid assembly will always run Unicycler, whereas long-read assembly can be configured separately using the `--assembler` parameter.", - "description": "Which type of assembly to perform." 
+ "max_cpus": { + "type": "integer", + "description": "Maximum number of CPUs that can be requested for any single job.", + "default": 16, + "fa_icon": "fas fa-microchip", + "hidden": true, + "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" }, - "unicycler_args": { + "max_memory": { "type": "string", - "fa_icon": "fas fa-bicycle", - "description": "Extra arguments for Unicycler", - "help_text": "This advanced option allows you to pass extra arguments to Unicycler (e.g. `\"--mode conservative\"` or `\"--no_correct\"`). For this to work you need to quote the arguments and add at least one space." + "description": "Maximum amount of memory that can be requested for any single job.", + "default": "128.GB", + "fa_icon": "fas fa-memory", + "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", + "hidden": true, + "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" }, - "canu_args": { + "max_time": { "type": "string", - "fa_icon": "fas fa-ship", - "description": "This can be used to supply [extra options](https://canu.readthedocs.io/en/latest/quick-start.html) to the Canu assembler. Will be ignored when other assemblers are used." + "description": "Maximum amount of time that can be requested for any single job.", + "default": "240.h", + "fa_icon": "far fa-clock", + "pattern": "^(\\d+\\.?\\s*(s|m|h|day)\\s*)+$", + "hidden": true, + "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" } } }, - "annotation": { - "title": "Annotation", + "generic_options": { + "title": "Generic options", "type": "object", - "description": "", - "default": "", - "fa_icon": "fas fa-align-left", + "fa_icon": "fas fa-file-import", + "description": "Less common options for the pipeline, typically set in a config file.", + "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", "properties": { - "annotation_tool": { - "type": "string", - "default": "prokka", - "description": "The annotation method to annotate the final assembly. Default choice is `prokka`, but the `dfast` tool is also available. For the latter, make sure to create your specific config if you're not happy with the default one provided. See [#dfast_config](#dfastconfig) to find out how." + "help": { + "type": "boolean", + "description": "Display help text.", + "fa_icon": "fas fa-question-circle", + "hidden": true }, - "prokka_args": { + "publish_dir_mode": { "type": "string", - "description": "Extra arguments for prokka annotation tool.", - "help_text": "This advanced option allows you to pass extra arguments to Prokka (e.g. `\" --rfam\"` or `\" --genus name\"`). For this to work you need to quote the arguments and add at least one space between the arguments. Example:\n\n```bash\n--prokka_args `--rfam --genus Escherichia Coli`\n```\n" + "default": "copy", + "description": "Method used to save pipeline results to output directory.", + "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. 
See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", + "fa_icon": "fas fa-copy", + "enum": [ + "symlink", + "rellink", + "link", + "copy", + "copyNoFollow", + "move" + ], + "hidden": true }, - "dfast_config": { - "type": "string", - "default": "/Users/alexanderpeltzer/IDEA/nf-core/bacass/assets/test_config_dfast.py", - "description": "Specifies a configuration file for the [DFAST](https://github.com/nigyta/dfast_core) annotation method. This can be used instead of PROKKA if required to specify a specific config file for annotation. If you want to know how to create your config file, please refer to the [DFAST](https://github.com/nigyta/dfast_core) readme on how to create one. > The default config is just included for testing, so if you want to annotate using DFAST, you have to create a config!" - } - } - }, - "assembly_polishing": { - "title": "Assembly Polishing", - "type": "object", - "description": "", - "default": "", - "fa_icon": "fas fa-user-astronaut", - "properties": { - "polish_method": { + "multiqc_title": { "type": "string", - "default": "medaka", - "fa_icon": "fas fa-hotdog", - "description": "Which assembly polishing method to use.", - "help_text": "Can be used to define which polishing method is used by default for long reads. Default is `medaka`, available options are `nanopolish` or `medaka`." - } - } - }, - "skipping_options": { - "title": "Skipping Options", - "type": "object", - "description": "", - "default": "", - "fa_icon": "fas fa-forward", - "properties": { - "skip_kraken2": { + "description": "MultiQC report title. Printed as page header, used for filename if not otherwise specified.", + "fa_icon": "fas fa-file-signature" + }, + "email_on_fail": { "type": "string", - "fa_icon": "fas fa-forward", - "description": "Skip running Kraken2 classifier on reads." + "description": "Email address for completion summary, only when pipeline fails.", + "fa_icon": "fas fa-exclamation-triangle", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$", + "help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.", + "hidden": true }, - "skip_annotation": { + "plaintext_email": { + "type": "boolean", + "description": "Send plain-text email instead of HTML.", + "fa_icon": "fas fa-remove-format", + "hidden": true + }, + "max_multiqc_email_size": { "type": "string", - "fa_icon": "fas fa-forward", - "description": "Skip annotating the assembly with Prokka /DFAST." + "description": "File size limit when attaching MultiQC reports to summary emails.", + "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", + "default": "25.MB", + "fa_icon": "fas fa-file-upload", + "hidden": true }, - "skip_pycoqc": { + "monochrome_logs": { + "type": "boolean", + "description": "Do not use coloured log outputs.", + "fa_icon": "fas fa-palette", + "hidden": true + }, + "multiqc_config": { "type": "string", - "fa_icon": "fas fa-forward", - "description": "Skip running `PycoQC` on long read input." + "description": "Custom config file to supply to MultiQC.", + "fa_icon": "fas fa-cog", + "hidden": true }, - "skip_polish": { + "tracedir": { "type": "string", - "fa_icon": "fas fa-forward", - "description": "Skip polishing the long-read assembly with FAST5 input. Will not affect short/hybrid assemblies." 
+ "description": "Directory to keep pipeline Nextflow logs and reports.", + "default": "${params.outdir}/pipeline_info", + "fa_icon": "fas fa-cogs", + "hidden": true + }, + "validate_params": { + "type": "boolean", + "description": "Boolean whether to validate parameters against the schema at runtime", + "default": true, + "fa_icon": "fas fa-check-square", + "hidden": true + }, + "show_hidden_params": { + "type": "boolean", + "fa_icon": "far fa-eye-slash", + "description": "Show all params when using `--help`", + "hidden": true, + "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." + }, + "enable_conda": { + "type": "boolean", + "description": "Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.", + "hidden": true, + "fa_icon": "fas fa-bacon" + }, + "singularity_pull_docker_container": { + "type": "boolean", + "description": "Instead of directly downloading Singularity images for use with Singularity, force the workflow to pull and convert Docker containers instead.", + "hidden": true, + "fa_icon": "fas fa-toolbox", + "help_text": "This may be useful for example if you are unable to directly pull Singularity containers to run the pipeline due to http/https proxy issues." } } } @@ -321,28 +356,28 @@ "$ref": "#/definitions/input_output_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/definitions/contamination_screening" }, { - "$ref": "#/definitions/max_job_request_options" + "$ref": "#/definitions/assembly_parameters" }, { - "$ref": "#/definitions/institutional_config_options" + "$ref": "#/definitions/assembly_polishing" }, { - "$ref": "#/definitions/contamination_screening" + "$ref": "#/definitions/annotation" }, { - "$ref": "#/definitions/assembly_parameters" + "$ref": "#/definitions/skipping_options" }, { - "$ref": "#/definitions/annotation" + "$ref": "#/definitions/institutional_config_options" }, { - "$ref": "#/definitions/assembly_polishing" + "$ref": "#/definitions/max_job_request_options" }, { - "$ref": "#/definitions/skipping_options" + "$ref": "#/definitions/generic_options" } ] } \ No newline at end of file diff --git a/subworkflows/local/input_check.nf b/subworkflows/local/input_check.nf new file mode 100644 index 00000000..7fb9249c --- /dev/null +++ b/subworkflows/local/input_check.nf @@ -0,0 +1,86 @@ +// +// Check input samplesheet and get read channels +// + +params.options = [:] + +workflow INPUT_CHECK { + take: + samplesheet // file: /path/to/samplesheet.csv + + main: + Channel + .fromPath( samplesheet ) + .ifEmpty {exit 1, log.info "Cannot find path file ${tsvFile}"} + .splitCsv ( header:true, sep:'\t' ) + .map { create_fastq_channels(it) } + .set { reads } + + // reconfigure channels + reads + .map { meta, reads, long_fastq, fast5 -> [ meta, reads ] } + .filter{ meta, reads -> reads != 'NA' } + .filter{ meta, reads -> reads[0] != 'NA' && reads[1] != 'NA' } + .set { shortreads } + reads + .map { meta, reads, long_fastq, fast5 -> [ meta, long_fastq ] } + .filter{ meta, long_fastq -> long_fastq != 'NA' } + .set { longreads } + reads + .map { meta, reads, long_fastq, fast5 -> [ meta, fast5 ] } + .filter{ meta, fast5 -> fast5 != 'NA' } + .set { fast5 } + + emit: + reads // channel: [ val(meta), [ reads ], long_fastq, fast5 ] + shortreads // channel: [ val(meta), [ reads ] ] + longreads // channel: [ val(meta), long_fastq ] + fast5 // channel: [ val(meta), 
fast5 ]
+}
+
+// Function to get list of [ meta, [ fastq_1, fastq_2 ], long_fastq, fast5 ]
+def create_fastq_channels(LinkedHashMap row) {
+    def meta = [:]
+    meta.id          = row.ID
+    meta.single_end  = false
+    meta.genome_size = row.GenomeSize == null ? 'NA' : row.GenomeSize
+
+    def array = []
+    // check short reads
+    if ( !(row.R1 == 'NA') ) {
+        if ( !file(row.R1).exists() ) {
+            exit 1, "ERROR: Please check input samplesheet -> Read 1 FastQ file does not exist!\n${row.R1}"
+        }
+        fastq_1 = file(row.R1)
+    } else { fastq_1 = 'NA' }
+    if ( !(row.R2 == 'NA') ) {
+        if ( !file(row.R2).exists() ) {
+            exit 1, "ERROR: Please check input samplesheet -> Read 2 FastQ file does not exist!\n${row.R2}"
+        }
+        fastq_2 = file(row.R2)
+    } else { fastq_2 = 'NA' }
+
+    // check long_fastq
+    if ( !(row.LongFastQ == 'NA') ) {
+        if ( !file(row.LongFastQ).exists() ) {
+            exit 1, "ERROR: Please check input samplesheet -> Long FastQ file does not exist!\n${row.LongFastQ}"
+        }
+        long_fastq = file(row.LongFastQ)
+    } else { long_fastq = 'NA' }
+
+    // check fast5
+    if ( !(row.Fast5 == 'NA') ) {
+        if ( !file(row.Fast5).exists() ) {
+            exit 1, "ERROR: Please check input samplesheet -> Fast5 file does not exist!\n${row.Fast5}"
+        }
+        fast5 = file(row.Fast5)
+    } else { fast5 = 'NA' }
+
+    // prepare output // currently does not allow single end data!
+    if ( meta.single_end ) {
+        array = [ meta, fastq_1, long_fastq, fast5 ]
+    } else {
+        array = [ meta, [ fastq_1, fastq_2 ], long_fastq, fast5 ]
+    }
+    return array
+}
diff --git a/workflows/bacass.nf b/workflows/bacass.nf
new file mode 100644
index 00000000..1a9aaaf7
--- /dev/null
+++ b/workflows/bacass.nf
@@ -0,0 +1,403 @@
+/*
+========================================================================================
+    VALIDATE INPUTS
+========================================================================================
+*/
+
+def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params)
+
+// Validate input parameters
+WorkflowBacass.initialise(params, log)
+
+// Check input path parameters to see if they exist
+def checkPathParamList = [ params.input, params.multiqc_config, params.kraken2db, params.dfast_config ]
+for (param in checkPathParamList) { if (param) { file(param, checkIfExists: true) } }
+
+// Check mandatory parameters
+if (params.input) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet not specified!' }
+
+// Check krakendb
+if (!params.skip_kraken2) {
+    if (params.kraken2db) {
+        kraken2db = file(params.kraken2db)
+    } else {
+        exit 1, "Missing Kraken2 DB arg"
+    }
+}
+
+/*
+========================================================================================
+    CONFIG FILES
+========================================================================================
+*/
+
+ch_multiqc_config        = file("$projectDir/assets/multiqc_config.yaml", checkIfExists: true)
+ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config) : Channel.empty()
+
+/*
+========================================================================================
+    IMPORT LOCAL MODULES/SUBWORKFLOWS
+========================================================================================
+*/
+
+// Don't overwrite global params.modules, create a copy instead and use that within the main script.
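+// Illustrative sketch of what the copy is used for (the option values below are assumptions, not
+// taken from conf/modules.config): if modules['unicycler'].args were '--min_fasta_length 300' and
+// the run used `--unicycler_args "--mode conservative"`, the UNICYCLER module would receive
+// args = '--min_fasta_length 300 --mode conservative' via addParams( options: unicycler_options ).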
+def modules = params.modules.clone() + +def unicycler_options = modules['unicycler'] +unicycler_options.args += " $params.unicycler_args" + +def canu_options = modules['canu'] +canu_options.args += " $params.canu_args" + +// +// MODULE: Local to the pipeline +// +include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files : ['tsv':'']] ) +include { SKEWER } from '../modules/local/skewer' addParams( options: modules['skewer'] ) +include { NANOPLOT } from '../modules/local/nanoplot' addParams( options: modules['nanoplot'] ) +include { PYCOQC } from '../modules/local/pycoqc' addParams( options: modules['pycoqc'] ) +include { PORECHOP } from '../modules/local/porechop' addParams( options: modules['porechop'] ) +include { UNICYCLER } from '../modules/local/unicycler' addParams( options: unicycler_options ) +include { CANU } from '../modules/local/canu' addParams( options: canu_options ) +include { MINIMAP2_ALIGN } from '../modules/local/minimap_align' addParams( options: modules['minimap_align'] ) +include { MINIMAP2_ALIGN as MINIMAP2_CONSENSUS } from '../modules/local/minimap_align' addParams( options: modules['minimap_consensus']) +include { MINIMAP2_ALIGN as MINIMAP2_POLISH } from '../modules/local/minimap_align' addParams( options: modules['minimap_polish']) +include { MINIASM } from '../modules/local/miniasm' addParams( options: modules['miniasm'] ) +include { RACON } from '../modules/local/racon' addParams( options: modules['racon'] ) +include { MEDAKA } from '../modules/local/medaka' addParams( options: modules['medaka'] ) +include { NANOPOLISH } from '../modules/local/nanopolish' addParams( options: modules['nanopolish'] ) +include { KRAKEN2_DB_PREPARATION} from '../modules/local/kraken2_db_preparation' +include { DFAST } from '../modules/local/dfast' addParams( options: modules['dfast'] ) + +// +// SUBWORKFLOW: Consisting of a mix of local and nf-core/modules +// +include { INPUT_CHECK } from '../subworkflows/local/input_check' addParams( options: [:] ) + +/* +======================================================================================== + IMPORT NF-CORE MODULES/SUBWORKFLOWS +======================================================================================== +*/ + +def multiqc_options = modules['multiqc'] +multiqc_options.args += params.multiqc_title ? 
Utils.joinModuleArgs(["--title \"$params.multiqc_title\""]) : ''
+
+def prokka_options = modules['prokka']
+prokka_options.args += " $params.prokka_args"
+
+//
+// MODULE: Installed directly from nf-core/modules
+//
+include { FASTQC                          } from '../modules/nf-core/modules/fastqc/main'          addParams( options: modules['fastqc'] )
+include { SAMTOOLS_SORT                   } from '../modules/nf-core/modules/samtools/sort/main'   addParams( options: [publish_files : false] )
+include { SAMTOOLS_INDEX                  } from '../modules/nf-core/modules/samtools/index/main'  addParams( options: [publish_files : false] )
+include { KRAKEN2_KRAKEN2 as KRAKEN2      } from '../modules/nf-core/modules/kraken2/kraken2/main' addParams( options: modules['kraken2'] )
+include { KRAKEN2_KRAKEN2 as KRAKEN2_LONG } from '../modules/nf-core/modules/kraken2/kraken2/main' addParams( options: modules['kraken2_long'] )
+include { QUAST                           } from '../modules/nf-core/modules/quast/main'           addParams( options: modules['quast'] )
+include { PROKKA                          } from '../modules/nf-core/modules/prokka/main'          addParams( options: prokka_options )
+include { MULTIQC                         } from '../modules/nf-core/modules/multiqc/main'         addParams( options: multiqc_options )
+
+/*
+========================================================================================
+    RUN MAIN WORKFLOW
+========================================================================================
+*/
+
+// Info required for completion email and summary
+def multiqc_report = []
+
+workflow BACASS {
+
+    ch_software_versions = Channel.empty()
+
+    //
+    // SUBWORKFLOW: Read in samplesheet, validate and stage input files
+    //
+    INPUT_CHECK (
+        ch_input
+    )
+
+    //
+    // MODULE: Run FastQC
+    //
+    FASTQC (
+        INPUT_CHECK.out.shortreads
+    )
+    ch_software_versions = ch_software_versions.mix(FASTQC.out.version.first().ifEmpty(null))
+
+    //
+    // MODULE: Skewer, trim and combine short read read-pairs per sample.
+    //
+    SKEWER (
+        INPUT_CHECK.out.shortreads.dump(tag: 'shortreads')
+    )
+    ch_software_versions = ch_software_versions.mix(SKEWER.out.version.first().ifEmpty(null))
+
+    //
+    // MODULE: Nanoplot, quality check for nanopore reads and Quality/Length Plots
+    //
+    NANOPLOT (
+        INPUT_CHECK.out.longreads
+    )
+    ch_software_versions = ch_software_versions.mix(NANOPLOT.out.version.first().ifEmpty(null))
+
+    //
+    // MODULE: PYCOQC, quality check for nanopore reads and Quality/Length Plots
+    //
+    if ( !params.skip_pycoqc ) {
+        PYCOQC (
+            INPUT_CHECK.out.fast5.dump(tag: 'fast5')
+        )
+        ch_software_versions = ch_software_versions.mix(PYCOQC.out.version.first().ifEmpty(null))
+    }
+
+    //
+    // MODULE: PORECHOP, adapter trimming for long reads (hybrid and long-read assemblies only)
+    //
+    if ( params.assembly_type == 'hybrid' || params.assembly_type == 'long' && !('short' in params.assembly_type) ) {
+        PORECHOP (
+            INPUT_CHECK.out.longreads.dump(tag: 'longreads')
+        )
+        ch_software_versions = ch_software_versions.mix(PORECHOP.out.version.first().ifEmpty(null))
+    }
+
+    //
+    // Join channels for the assemblers. As all samples carry the same meta data, join() can be used to merge the channels on it.
+    // If only one read type is available, 'NA' placeholders are inserted; these are not used by the Unicycler process downstream
+    // in the case of short-read-only or long-read-only assemblies.
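+    // For illustration only (file names are placeholders): depending on params.assembly_type,
+    // each element of the ch_for_assembly channel built below is shaped as
+    //   hybrid : [ meta, [ trimmed_R1.fastq.gz, trimmed_R2.fastq.gz ], porechopped.fastq.gz ]
+    //   short  : [ meta, [ trimmed_R1.fastq.gz, trimmed_R2.fastq.gz ], 'NA' ]
+    //   long   : [ meta, 'NA', porechopped.fastq.gz ]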
+ // Prepare channel for Kraken2 + // + if(params.assembly_type == 'hybrid'){ + ch_for_kraken2_short = SKEWER.out.reads + ch_for_kraken2_long = PORECHOP.out.reads.dump(tag: 'porechop') + SKEWER.out.reads + .dump(tag: 'skewer') + .join(PORECHOP.out.reads) + .dump(tag: 'ch_for_assembly') + .set { ch_for_assembly } + } else if ( params.assembly_type == 'short' ) { + ch_for_kraken2_short = SKEWER.out.reads + ch_for_kraken2_long = Channel.empty() + SKEWER.out.reads + .dump(tag: 'skewer') + .map{ meta,reads -> tuple(meta,reads,'NA') } + .dump(tag: 'ch_for_assembly') + .set { ch_for_assembly } + } else if ( params.assembly_type == 'long' ) { + ch_for_kraken2_short = Channel.empty() + ch_for_kraken2_long = PORECHOP.out.reads + PORECHOP.out.reads + .dump(tag: 'porechop') + .map{ meta,lr -> tuple(meta,'NA',lr) } + .dump(tag: 'ch_for_assembly') + .set { ch_for_assembly } + } + + // + // ASSEMBLY: Unicycler, Canu, Miniasm + // + ch_assembly = Channel.empty() + + // + // MODULE: Unicycler, genome assembly, nf-core module allows only short assembly + // + if ( params.assembler == 'unicycler' ) { + UNICYCLER ( + ch_for_assembly + ) + ch_assembly = ch_assembly.mix( UNICYCLER.out.scaffolds.dump(tag: 'unicycler') ) + ch_software_versions = ch_software_versions.mix(UNICYCLER.out.version.first().ifEmpty(null)) + } + + // + // MODULE: Canu, genome assembly, long reads + // + if ( params.assembler == 'canu' ) { + CANU ( + ch_for_assembly + ) + ch_assembly = ch_assembly.mix( CANU.out.assembly.dump(tag: 'canu') ) + ch_software_versions = ch_software_versions.mix(CANU.out.version.first().ifEmpty(null)) + } + + // + // MODULE: Miniasm, genome assembly, long reads + // + if ( params.assembler == 'miniasm' ) { + MINIMAP2_ALIGN ( + ch_for_assembly.map{ meta,sr,lr -> tuple(meta,sr,lr,lr) } + ) + ch_software_versions = ch_software_versions.mix(MINIMAP2_ALIGN.out.version.first().ifEmpty(null)) + MINIASM ( + MINIMAP2_ALIGN.out.paf.dump(tag: 'minimap2') + ) + ch_software_versions = ch_software_versions.mix(MINIASM.out.version.first().ifEmpty(null)) + MINIMAP2_CONSENSUS ( + MINIASM.out.all.dump(tag: 'miniasm') + ) + RACON ( + MINIMAP2_CONSENSUS.out.paf.dump(tag: 'minimap2_consensus') + ) + ch_assembly = ch_assembly.mix( RACON.out.assembly.dump(tag: 'miniasm') ) + ch_software_versions = ch_software_versions.mix(RACON.out.version.first().ifEmpty(null)) + } + + // + // MODULE: Nanopolish, polishes assembly using FAST5 files - should take either miniasm, canu, or unicycler consensus sequence + // + if ( !params.skip_polish && params.assembly_type == 'long' && params.polish_method != 'medaka' ) { + ch_for_assembly + .join( ch_assembly ) + .set { ch_for_polish } + MINIMAP2_POLISH ( + ch_for_polish.dump(tag: 'into_minimap2_polish') + ) + ch_software_versions = ch_software_versions.mix(MINIMAP2_POLISH.out.version.first().ifEmpty(null)) + SAMTOOLS_SORT ( + MINIMAP2_POLISH.out.paf.map{ meta,sr,lr,ref,paf -> tuple(meta,paf) }.dump(tag: 'minimap2_polish') + ) + ch_software_versions = ch_software_versions.mix(SAMTOOLS_SORT.out.version.first().ifEmpty(null)) + SAMTOOLS_INDEX ( + SAMTOOLS_SORT.out.bam.dump(tag: 'samtools_sort') + ) + ch_software_versions = ch_software_versions.mix(SAMTOOLS_INDEX.out.version.first().ifEmpty(null)) + ch_for_polish //tuple val(meta), val(reads), file(longreads), file(assembly) + .join( SAMTOOLS_SORT.out.bam ) //tuple val(meta), file(bam) + .join( SAMTOOLS_INDEX.out.bai ) //tuple val(meta), file(bai) + .join( INPUT_CHECK.out.fast5 ) //tuple val(meta), file(fast5) + .set { ch_for_nanopolish } //tuple 
val(meta), val(reads), file(longreads), file(assembly), file(bam), file(bai), file(fast5) + NANOPOLISH ( + ch_for_nanopolish.dump(tag: 'into_nanopolish') + ) + ch_software_versions = ch_software_versions.mix(NANOPOLISH.out.version.first().ifEmpty(null)) + } + + // + // MODULE: Medaka, polishes assembly - should take either miniasm, canu, or unicycler consensus sequence + // + if ( !params.skip_polish && params.assembly_type == 'long' && params.polish_method == 'medaka' ) { + ch_assembly + .join( ch_for_assembly ) + .set { ch_for_medaka } + MEDAKA ( ch_for_medaka.dump(tag: 'into_medaka') ) + ch_software_versions = ch_software_versions.mix(MEDAKA.out.version.first().ifEmpty(null)) + } + + // + // MODULE: Kraken2, QC for sample purity + // + if ( !params.skip_kraken2 ) { + KRAKEN2_DB_PREPARATION ( + kraken2db + ) + KRAKEN2 ( + ch_for_kraken2_short.dump(tag: 'kraken2_short'), + KRAKEN2_DB_PREPARATION.out.db.map { info, db -> db }.dump(tag: 'kraken2_db_preparation') + ) + ch_software_versions = ch_software_versions.mix(KRAKEN2.out.version.first().ifEmpty(null)) + KRAKEN2_LONG ( + ch_for_kraken2_long + .map { meta, reads -> + info = [:] + info.id = meta.id + info.single_end = true + [ info, reads ] + } + .dump(tag: 'kraken2_long'), + KRAKEN2_DB_PREPARATION.out.db.map { info, db -> db }.dump(tag: 'kraken2_db_preparation') + ) + ch_software_versions = ch_software_versions.mix(KRAKEN2_LONG.out.version.first().ifEmpty(null)) + } + + // + // MODULE: QUAST, assembly QC + // + ch_assembly + .map { meta, fasta -> fasta } + .collect() + .set { ch_to_quast } + QUAST ( + ch_to_quast, + [], + [], + false, + false + ) + ch_software_versions = ch_software_versions.mix(QUAST.out.version.ifEmpty(null)) + + // + // MODULE: PROKKA, gene annotation + // + if ( !params.skip_annotation && params.annotation_tool == 'prokka' ) { + PROKKA ( + ch_assembly, + [], + [] + ) + ch_software_versions = ch_software_versions.mix(PROKKA.out.version.first().ifEmpty(null)) + } + + // + // MODULE: DFAST, gene annotation + // + // TODO: "dfast_file_downloader.py --protein dfast --dbroot ." could be used in a separate process and the db could be forwarded + if ( !params.skip_annotation && params.annotation_tool == 'dfast' ) { + DFAST ( + ch_assembly, + Channel.value(params.dfast_config ? 
file(params.dfast_config) : "") + ) + ch_software_versions = ch_software_versions.mix(DFAST.out.version.first().ifEmpty(null)) + } + + // + // MODULE: Pipeline reporting + // + ch_software_versions + .map { it -> if (it) [ it.baseName, it ] } + .groupTuple() + .map { it[1][0] } + .flatten() + .collect() + .set { ch_software_versions } + + GET_SOFTWARE_VERSIONS ( + ch_software_versions.map { it }.collect() + ) + + // + // MODULE: MultiQC + // + workflow_summary = WorkflowBacass.paramsSummaryMultiqc(workflow, summary_params) + ch_workflow_summary = Channel.value(workflow_summary) + + ch_multiqc_files = Channel.empty() + ch_multiqc_files = ch_multiqc_files.mix(Channel.from(ch_multiqc_config)) + ch_multiqc_files = ch_multiqc_files.mix(ch_multiqc_custom_config.collect().ifEmpty([])) + ch_multiqc_files = ch_multiqc_files.mix(ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) + ch_multiqc_files = ch_multiqc_files.mix(GET_SOFTWARE_VERSIONS.out.yaml.collect()) + ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]}.ifEmpty([])) + + MULTIQC ( + ch_multiqc_files.collect() + ) + multiqc_report = MULTIQC.out.report.toList() + ch_software_versions = ch_software_versions.mix(MULTIQC.out.version.ifEmpty(null)) +} + +/* +======================================================================================== + COMPLETION EMAIL AND SUMMARY +======================================================================================== +*/ + +workflow.onComplete { + if (params.email || params.email_on_fail) { + NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report) + } + NfcoreTemplate.summary(workflow, params, log) +} + +/* +======================================================================================== + THE END +======================================================================================== +*/
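+
+/*
+========================================================================================
+    EXAMPLE USAGE (illustrative only; file paths, profile and database location are placeholders)
+========================================================================================
+    Samplesheet (tab-separated, columns as parsed in subworkflows/local/input_check.nf;
+    use 'NA' for fields that do not apply):
+
+        ID          R1                      R2                      LongFastQ   Fast5   GenomeSize
+        sample_1    sample_1_R1.fastq.gz    sample_1_R2.fastq.gz    NA          NA      2.8m
+
+    Short-read assembly with the default Unicycler assembler:
+
+        nextflow run nf-core/bacass -profile docker --input samplesheet.tsv --kraken2db /path/to/minikraken2
+*/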