Merge pull request #50 from mskcc/feature/update_modules_to_base_develop

Feature/update modules to base develop
mskcc · Jul 30, 2024 · 7f820a8
2 parents 0aa3a64 + 23e53ba

Showing 34 changed files with 508 additions and 685 deletions.
diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml
@@ -11,14 +11,14 @@ on:
         description: "The specific branch you wish to utilize for the test execution of nf-core download."
         required: true
         default: "dev"
-#  pull_request:
-#    types:
-#      - opened
-#    branches:
-#      - master
-#  pull_request_target:
-#    branches:
-#      - master
+  pull_request:
+    types:
+      - opened
+    branches:
+      - master
+  pull_request_target:
+    branches:
+      - master
 
 env:
   NXF_ANSI_LOG: false

diff --git a/docs/images/luksza2021_fig3.png b/docs/images/luksza2021_fig3.png
diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png
diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png
diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png
diff --git a/docs/output.md b/docs/output.md
@@ -12,48 +12,95 @@ The directories listed below will be created in the results directory after the
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
-- [FastQC](#fastqc) - Raw read QC
-- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
-- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
+1. Create phylogenetic trees using [PhyloWGS](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0602-8)
+2. Use [netMHCpan-4](https://services.healthtech.dtu.dk/services/NetMHCpan-4.1/) to calculate binding affinities
+3. Use [netMHCpanStab](https://services.healthtech.dtu.dk/services/NetMHCstabpan-1.0/) to calculate stability scores
+4. Use Luksza et al.'s neoantigen quality and fitness computations tool ([NeoantigenEditing](https://github.com/LukszaLab/NeoantigenEditing)) to evaluate peptides
 
-### FastQC
+### PhyloWGS
 
 <details markdown="1">
 <summary>Output files</summary>
 
-- `fastqc/`
-  - `*_fastqc.html`: FastQC report containing quality metrics.
-  - `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.
+- `phylowgs/`
+  - `*_.summ.json.gz`: Output file for JSON-formatted tree summaries
+  - `*.muts.json.gz`: Output file for JSON-formatted list of mutations
+  - `*.muts.json.gz`: Output file for JSON-formatted list of mutations
+  - `*.muts.json.gz`: Output zipped folder for JSON-formatted list of SSMs and CNVs
 
 </details>
 
-[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
+### netMHCpan
 
-![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)
-
-![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)
+<details markdown="1">
+<summary>Output files</summary>
 
-![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)
+- `netmhcpan/`
+  - `*.xls`: TSV/XLS file of netMHCpan. This contains the MUT or WT antigens
+  - `*.WT.netmhcpan.output,*.MUT.netmhcpan.output`: STDOUT file of netMHCpan. A uniquely formatted file of neoantigens, containing either the MUT or WT neoantigens. Neoantigenutils contains a parser for this file.
 
-:::note
-The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality.
-:::
+</details>
 
-### MultiQC
+### netMHCstabpan
 
 <details markdown="1">
 <summary>Output files</summary>
 
-- `multiqc/`
-  - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
-  - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
-  - `multiqc_plots/`: directory containing static images from the report in various formats.
+- `netmhcstabpan/`
+  - `*.xls`: TSV/XLS file of netMHCpan. This contains the MUT or WT antigens
+  - `*.WT.netmhcpan.output,*.MUT.netmhcpan.output`: STDOUT file of netMHCpan. A uniquely formatted file of neoantigens, containing either the MUT or WT neoantigens. Neoantigenutils contains a parser for this file.
 
 </details>
 
-[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
+### Neoantigen Editing Final Output
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `neoantigenediting/`
+
+  - `*._annotated.json`: The final output of the pipeline. This file is an annotated version of the tree output from phyloWGS with an extra property titled 'neoantigens'. Each entry in 'neoantigens' is an object whose properties describe the neoantigen. These neoantigen properties are described below.
+
+    "id": "XSYI_MG_M_9_C1203_11",
+
+    "mutation_id": "X_72667534_C_G",
+
+    "HLA_gene_id": "HLA-C\*12:03",
 
-Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.
+    "sequence": "ASRSRHSPY",
+
+    "WT_sequence": "PSRSRHSPY",
+
+    "mutated_position": 1,
+
+    "Kd": 192.03,
+
+    "KdWT": 4582.17,
+
+    "R": 0.8911371281207195,
+
+    "logC": 2.263955023939215,
+
+    "logA": 3.1722763542054815,
+
+    "quality": 2.645601185190205
+
+  The above is an example output from a run. Each neoantigenic mutation will have an output like this.
+
+  - id: A unique ID that combines an ID created from the mutation, the HLA allele, and the window.
+  - mutation_id: ID containing the chromosome, position, ref allele, and alt allele. I and D denote insertions and deletions, respectively.
+  - HLA_gene_id: The HLA gene this neoantigen binds to
+  - sequence: The mutated sequence
+  - WT_sequence: The wild-type sequence
+  - mutated_position: The position of the first difference between the mutated and wild-type sequences
+  - Kd: Binding affinity in nM from netMHCpan for the mutated peptide
+  - KdWT: Binding affinity in nM from netMHCpan for the wild-type peptide
+  - R: Similarity of the mutated peptide to IEDB peptides
+  - logC: Log of the cross-reactivity
+  - logA: Log of the amplitude. This is a function of Kd/KdWT and a constant
+  - quality: The final output of the pipeline and neoantigen editing. A higher quality indicates a better neoantigen. This is described in the Luksza et al. paper and is visualized below
+
+</details>
 
 ### Pipeline information
 

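Reviewer note on the neoantigen properties documented in `docs/output.md`: the sketch below shows one way to read a single entry of the `neoantigens` property. It copies the example record from the documentation. The helper names `first_difference` and `binding_fold_change` are hypothetical and are not part of the pipeline. It assumes `mutated_position` is 1-based, which is consistent with the example record but not confirmed by the source.

```python
import math

# Example record copied from the docs/output.md diff above.
neoantigen = {
    "id": "XSYI_MG_M_9_C1203_11",
    "mutation_id": "X_72667534_C_G",
    "HLA_gene_id": "HLA-C*12:03",
    "sequence": "ASRSRHSPY",
    "WT_sequence": "PSRSRHSPY",
    "mutated_position": 1,
    "Kd": 192.03,
    "KdWT": 4582.17,
    "R": 0.8911371281207195,
    "logC": 2.263955023939215,
    "logA": 3.1722763542054815,
    "quality": 2.645601185190205,
}


def first_difference(mutated, wild_type):
    """Return the 1-based position of the first differing residue, or None."""
    for position, (mut_aa, wt_aa) in enumerate(zip(mutated, wild_type), start=1):
        if mut_aa != wt_aa:
            return position
    return None


def binding_fold_change(record):
    """Ratio of wild-type Kd to mutant Kd (both in nM).

    A lower Kd means tighter binding, so a ratio above 1 means the
    mutated peptide binds more tightly than the wild-type peptide.
    """
    return record["KdWT"] / record["Kd"]


position = first_difference(neoantigen["sequence"], neoantigen["WT_sequence"])
fold_change = binding_fold_change(neoantigen)
print(position, round(fold_change, 2), round(math.log(fold_change), 4))
```

For this one record, the natural log of `KdWT / Kd` agrees closely with the listed `logA`. That is an observation about the example only; the documentation states just that `logA` is a function of the two affinities and a constant.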
diff --git a/modules.json b/modules.json
@@ -6,27 +6,27 @@
             "modules": {
                 "msk": {
                     "neoantigenediting/aligntoiedb": {
-                        "branch": "neoantigen",
-                        "git_sha": "14ae2b3db25701828835e7145cee14dfdaddf180",
+                        "branch": "develop",
+                        "git_sha": "cac9c047e374ee259fb612ba5816e7e6aae6b86f",
                         "installed_by": ["neoantigen_editing"]
                     },
                     "neoantigenediting/computefitness": {
-                        "branch": "neoantigen",
+                        "branch": "develop",
                         "git_sha": "1f65c2ecdc5010549055ff7f4e6b8bccee48d4ae",
                         "installed_by": ["neoantigen_editing"]
                     },
                     "neoantigenutils/formatnetmhcpan": {
-                        "branch": "neoantigen",
-                        "git_sha": "fceccd3f96fec678849bb9bc0c04e53d9965f973",
+                        "branch": "develop",
+                        "git_sha": "c5d1252252e15555abcc82ea537cebeb281a1856",
                         "installed_by": ["netmhcstabandpan"]
                     },
                     "neoantigenutils/generatehlastring": {
-                        "branch": "neoantigen",
+                        "branch": "develop",
                         "git_sha": "33f0bd33095fa15016ee24f4fb4d61e896dbb970",
                         "installed_by": ["netmhcstabandpan"]
                     },
                     "neoantigenutils/generatemutfasta": {
-                        "branch": "neoantigen",
+                        "branch": "develop",
                         "git_sha": "bb7975c796ab9a2d7a45ef733a6a226a0f5ad74a",
                         "installed_by": ["netmhcstabandpan"]
                     },
@@ -36,32 +36,32 @@
                         "installed_by": ["modules"]
                     },
                     "netmhcpan": {
-                        "branch": "neoantigen",
-                        "git_sha": "33f0bd33095fa15016ee24f4fb4d61e896dbb970",
+                        "branch": "develop",
+                        "git_sha": "503abeb67260f060d8228221b07d743aa4180345",
                         "installed_by": ["modules", "netmhcstabandpan"]
                     },
                     "netmhcstabpan": {
-                        "branch": "neoantigen",
-                        "git_sha": "33f0bd33095fa15016ee24f4fb4d61e896dbb970",
+                        "branch": "develop",
+                        "git_sha": "c1a473f8bc08f778269a36ab62d5adf24357225f",
                         "installed_by": ["modules", "netmhcstabandpan"]
                     },
                     "phylowgs/createinput": {
-                        "branch": "neoantigen",
+                        "branch": "develop",
                         "git_sha": "b031249dcf4279606c25e626da2a628756e75e8a",
                         "installed_by": ["phylowgs"]
                     },
                     "phylowgs/multievolve": {
-                        "branch": "neoantigen",
+                        "branch": "develop",
                         "git_sha": "535662d391a3533dea3b11c462c14799227e08b2",
                         "installed_by": ["phylowgs"]
                     },
                     "phylowgs/parsecnvs": {
-                        "branch": "neoantigen",
-                        "git_sha": "064ce5b42a8f711fb4d8107150aad2d382ae99c2",
+                        "branch": "develop",
+                        "git_sha": "8471691d7c29bc2f5f4fb92279c94fb2640b6c38",
                         "installed_by": ["phylowgs"]
                     },
                     "phylowgs/writeresults": {
-                        "branch": "neoantigen",
+                        "branch": "develop",
                         "git_sha": "6d27f08bf649e8680ace321d3127dcdf0e210973",
                         "installed_by": ["phylowgs"]
                     }
@@ -70,17 +70,17 @@
             "subworkflows": {
                 "msk": {
                     "neoantigen_editing": {
-                        "branch": "neoantigen",
+                        "branch": "develop",
                         "git_sha": "56a628201401866096d6307b9e8c690c5eb46ac2",
                         "installed_by": ["subworkflows"]
                     },
                     "netmhcstabandpan": {
-                        "branch": "neoantigen",
-                        "git_sha": "1b7ac020798572be26402a72dd9c1a22ce849a63",
+                        "branch": "develop",
+                        "git_sha": "d60211568e3709e9284bc06eef938e361d474d08",
                         "installed_by": ["subworkflows"]
                     },
                     "phylowgs": {
-                        "branch": "neoantigen",
+                        "branch": "develop",
                         "git_sha": "a5d61394af346f21ee2eb7ecfd97ab25bdbd1d0e",
                         "installed_by": ["subworkflows"]
                     }
@@ -90,14 +90,9 @@
         "https://github.com/nf-core/modules.git": {
             "modules": {
                 "nf-core": {
-                    "fastqc": {
-                        "branch": "master",
-                        "git_sha": "285a50500f9e02578d90b3ce6382ea3c30216acd",
-                        "installed_by": ["modules"]
-                    },
                     "multiqc": {
                         "branch": "master",
-                        "git_sha": "314d742bdb357a1df5f9b88427b3b6ac78aa33f7",
+                        "git_sha": "b80f5fd12ff7c43938f424dd76392a2704fa2396",
                         "installed_by": ["modules"]
                     }
                 }

diff --git a/modules/msk/neoantigenediting/aligntoiedb/meta.yml b/modules/msk/neoantigenediting/aligntoiedb/meta.yml
@@ -1,7 +1,7 @@
 ---
 # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json
 name: "neoantigenediting_aligntoiedb"
-description: Align neoantigens to the IEDB resource
+description: Align neoantigens to the IEDB file
 keywords:
   - neoantigenediting
   - neoantigens

diff --git a/modules/msk/neoantigenediting/aligntoiedb/resources/usr/bin/align_neoantigens_to_IEDB.py b/modules/msk/neoantigenediting/aligntoiedb/resources/usr/bin/align_neoantigens_to_IEDB.py
@@ -47,14 +47,20 @@ def load_blosum62_mat():
 * -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1
 """
     amino_acids = "ACDEFGHIKLMNPQRSTVWY"
-    blosum62_mat_str_list = [l.split() for l in raw_blosum62_mat_str.strip().split("\n")]
+    blosum62_mat_str_list = [
+        l.split() for l in raw_blosum62_mat_str.strip().split("\n")
+    ]
     blosum_aa_order = [blosum62_mat_str_list[0].index(aa) for aa in amino_acids]
 
     blosum62_mat = np.zeros((len(amino_acids), len(amino_acids)))
     for i, bl_ind in enumerate(blosum_aa_order):
-        blosum62_mat[i] = np.array([int(x) for x in blosum62_mat_str_list[bl_ind + 1][1:]])[blosum_aa_order]
+        blosum62_mat[i] = np.array(
+            [int(x) for x in blosum62_mat_str_list[bl_ind + 1][1:]]
+        )[blosum_aa_order]
     blosum62 = {
-        (aaA, aaB): blosum62_mat[i, j] for i, aaA in enumerate(amino_acids) for j, aaB in enumerate(amino_acids)
+        (aaA, aaB): blosum62_mat[i, j]
+        for i, aaA in enumerate(amino_acids)
+        for j, aaB in enumerate(amino_acids)
     }
     return blosum62
 
@@ -226,7 +232,9 @@ def load_epitopes(iedbfasta):
         pjson = json.load(f)
     patient = pjson["patient"]
     neoantigens = pjson["neoantigens"]
-    peptides = set([("_".join(neo["id"].split("_")[:-1]), neo["sequence"]) for neo in neoantigens])
+    peptides = set(
+        [("_".join(neo["id"].split("_")[:-1]), neo["sequence"]) for neo in neoantigens]
+    )
     pepseq2pepid = defaultdict(set)
     for pep_id, pep_seq in peptides:
         pepseq2pepid[pep_seq].add(pep_id)
@@ -251,5 +259,7 @@ def load_epitopes(iedbfasta):
             "Alignment_score",
         ]
     else:
-        aln_data = pd.DataFrame(columns=["Peptide_ID", "Peptide", "Epitope_ID", "Alignment_score"])
+        aln_data = pd.DataFrame(
+            columns=["Peptide_ID", "Peptide", "Epitope_ID", "Alignment_score"]
+        )
     aln_data.to_csv("iedb_alignments_" + patient + ".txt", sep="\t", index=False)
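Reviewer note on the reformatted `peptides` expression in `load_epitopes`: the sketch below isolates the ID handling, which drops the last underscore-separated field of each neoantigen `id` and then groups peptide IDs by sequence. The first record reuses the `id` and `sequence` from the docs example; the second record and the helper name `peptide_id` are hypothetical.

```python
from collections import defaultdict

# The first record reuses the id and sequence from the docs/output.md example;
# the second record is hypothetical and only illustrates the grouping.
neoantigens = [
    {"id": "XSYI_MG_M_9_C1203_11", "sequence": "ASRSRHSPY"},
    {"id": "XSYI_MG_M_9_A0201_11", "sequence": "ASRSRHSPY"},
]


def peptide_id(neoantigen_id):
    """Drop the last underscore-separated field of a neoantigen id."""
    return "_".join(neoantigen_id.split("_")[:-1])


# A set comprehension is equivalent to set([...]) in the script.
peptides = {(peptide_id(neo["id"]), neo["sequence"]) for neo in neoantigens}

# Map each peptide sequence to every peptide id that produced it.
pepseq2pepid = defaultdict(set)
for pep_id, pep_seq in peptides:
    pepseq2pepid[pep_seq].add(pep_id)

print(dict(pepseq2pepid))
```

Grouping by sequence means an identical peptide arising from several IDs only needs to be aligned once.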
diff --git a/modules/msk/neoantigenediting/computefitness/resources/usr/bin/EpitopeDistance.py b/modules/msk/neoantigenediting/computefitness/resources/usr/bin/EpitopeDistance.py
@@ -11,7 +11,6 @@
 import os
 
 
-# %
 class EpitopeDistance(object):
     """Base class for epitope crossreactivity.
 
@@ -39,7 +38,9 @@ class EpitopeDistance(object):
 
     def __init__(
         self,
-        model_file=os.path.join(os.path.dirname(__file__), "distance_data", "epitope_distance_model_parameters.json"),
+        model_file=os.path.join(
+            os.path.dirname(__file__), "distance_data", "epitope_distance_model_parameters.json"
+        ),
         amino_acids="ACDEFGHIKLMNPQRSTVWY",
     ):
         """Initialize class and compute M_ab."""
@@ -75,5 +76,11 @@ def epitope_dist(self, epiA, epiB):
         """
 
         return sum(
-            [self.d_i[i] * self.M_ab[self.amino_acid_dict[epiA[i]], self.amino_acid_dict[epiB[i]]] for i in range(9)]
+            [
+                self.d_i[i]
+                * self.M_ab[
+                    self.amino_acid_dict[epiA[i]], self.amino_acid_dict[epiB[i]]
+                ]
+                for i in range(9)
+            ]
         )
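Reviewer note on `epitope_dist`: the reformatted expression is a position-weighted sum of substitution costs over the nine residues of two epitopes. The sketch below reproduces that shape with hypothetical parameters (uniform position weights and a 0/1 substitution cost), under which the distance reduces to the number of mismatched positions. The real `d_i` and `M_ab` values are loaded from `epitope_distance_model_parameters.json` and are not shown in this diff. The length check is an addition for illustration; the original indexes `range(9)` directly.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
amino_acid_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Hypothetical parameters, NOT the model's: uniform position weights and a
# 0/1 substitution cost, so the distance equals the mismatch count.
position_weights = [1.0] * 9
substitution_cost = [
    [0.0 if a == b else 1.0 for b in range(len(AMINO_ACIDS))]
    for a in range(len(AMINO_ACIDS))
]


def epitope_dist(epi_a, epi_b):
    """Position-weighted sum of substitution costs between two 9-mers."""
    if len(epi_a) != 9 or len(epi_b) != 9:
        raise ValueError("both epitopes must be 9-mers")
    return sum(
        position_weights[i]
        * substitution_cost[amino_acid_index[epi_a[i]]][amino_acid_index[epi_b[i]]]
        for i in range(9)
    )


print(epitope_dist("ASRSRHSPY", "PSRSRHSPY"))
```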
diff --git a/modules/msk/neoantigenediting/computefitness/resources/usr/bin/compute_fitness.py b/modules/msk/neoantigenediting/computefitness/resources/usr/bin/compute_fitness.py
@@ -60,7 +60,9 @@ def fill_up_clone_neoantigens(tree, mut2neo):
     while len(nodes) > 0:
         node = nodes[0]
         nodes = nodes[1:]
-        node["neoantigens"] = [neo["id"] for mid in node["all_mutations"] for neo in mut2neo[mid]]
+        node["neoantigens"] = [
+            neo["id"] for mid in node["all_mutations"] for neo in mut2neo[mid]
+        ]
         node["neoantigen_load"] = len(node["neoantigens"])
         node["NA_Mut"] = sum([len(mut2neo[mid]) > 0 for mid in node["all_mutations"]])
         if "children" in node:
@@ -157,7 +159,9 @@ def compute_effective_sample_size(sample_json):
         for clone_muts, X in zip(clone_muts_list, freqs):
             for mid in clone_muts:
                 mut_freqs[mid].append(X)
-    avev = np.mean([np.var(mut_freqs[mid]) if mut_freqs[mid] else 0 for mid in mut_freqs])
+    avev = np.mean(
+        [np.var(mut_freqs[mid]) if mut_freqs[mid] else 0 for mid in mut_freqs]
+    )
     n = 1 / avev
     return n
 

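Reviewer note on `compute_effective_sample_size`: the hunk above computes `n = 1 / avev`, where `avev` is the mean of the per-mutation frequency variances. The sketch below restates that calculation with the standard library. `statistics.pvariance` is used because `np.var` defaults to the population variance. The input dictionary is made up for illustration, and the guards against empty input and zero average variance are additions, not behaviour present in the diff.

```python
from statistics import fmean, pvariance


def effective_sample_size(mut_freqs):
    """Inverse of the mean per-mutation variance of clone frequencies.

    mut_freqs maps a mutation id to the list of frequencies observed for it.
    Mutations with no frequencies contribute a variance of 0, as in the script.
    """
    variances = [pvariance(freqs) if freqs else 0.0 for freqs in mut_freqs.values()]
    if not variances:
        raise ValueError("no mutations supplied")
    average_variance = fmean(variances)
    if average_variance == 0:
        # Guard added for this sketch; the script divides unconditionally.
        raise ValueError("average variance is zero; sample size is undefined")
    return 1 / average_variance


# Hypothetical frequencies: variances are 0.01 and 0.04, so the mean is 0.025.
example = {"m1": [0.2, 0.4], "m2": [0.1, 0.5]}
print(effective_sample_size(example))
```

Whether a zero average variance can occur in practice depends on the sampled trees, which this diff does not show.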
diff --git a/modules/msk/neoantigenutils/formatnetmhcpan/main.nf b/modules/msk/neoantigenutils/formatnetmhcpan/main.nf
@@ -36,10 +36,10 @@ process NEOANTIGENUTILS_FORMATNETMHCPAN {
     stub:
     def args = task.ext.args ?: ''
     def prefix = task.ext.prefix ?: "${meta.id}"
+    def netmhcOutputType = meta.typeMut ? "MUT": "WT"
+    def netmhcOutputFrom = meta.fromStab ? "STAB": "PAN"
     """
-
-        touch ${prefix}.MUT.tsv
-        touch ${prefix}.WT.tsv
+        touch ${prefix}.${netmhcOutputType}.${netmhcOutputFrom}.tsv
         cat <<-END_VERSIONS > versions.yml
         "${task.process}":
             formatNetmhcpanOutput: \$(echo \$(format_netmhcpan_output.py -v))