Merge pull request #50 from mskcc/feature/update_modules_to_base_develop
Feature/update modules to base develop
nikhil authored Jul 30, 2024
2 parents 0aa3a64 + 23e53ba commit 7f820a8
Showing 34 changed files with 508 additions and 685 deletions.
16 changes: 8 additions & 8 deletions .github/workflows/download_pipeline.yml
@@ -11,14 +11,14 @@ on:
description: "The specific branch you wish to utilize for the test execution of nf-core download."
required: true
default: "dev"
# pull_request:
# types:
# - opened
# branches:
# - master
# pull_request_target:
# branches:
# - master
pull_request:
types:
- opened
branches:
- master
pull_request_target:
branches:
- master

env:
NXF_ANSI_LOG: false
Binary file added docs/images/luksza2021_fig3.png
Binary file removed docs/images/mqc_fastqc_adapter.png
Binary file not shown.
Binary file removed docs/images/mqc_fastqc_counts.png
Binary file not shown.
Binary file removed docs/images/mqc_fastqc_quality.png
Binary file not shown.
91 changes: 69 additions & 22 deletions docs/output.md
@@ -12,48 +12,95 @@ The directories listed below will be created in the results directory after the

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [FastQC](#fastqc) - Raw read QC
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
1. Create phylogenetic trees using [PhyloWGS](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0602-8)
2. Use [netMHCpan-4](https://services.healthtech.dtu.dk/services/NetMHCpan-4.1/) to calculate binding affinities
3. Use [netMHCpanStab](https://services.healthtech.dtu.dk/services/NetMHCstabpan-1.0/) to calculate stability scores
4. Use Luksza et al.'s neoantigen quality and fitness computations tool ([NeoantigenEditing](https://github.com/LukszaLab/NeoantigenEditing)) to evaluate peptides

### FastQC
### PhyloWGS

<details markdown="1">
<summary>Output files</summary>

- `fastqc/`
- `*_fastqc.html`: FastQC report containing quality metrics.
- `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.
- `phylowgs/`
- `*_.summ.json.gz`: Output file for JSON-formatted tree summaries
- `*.muts.json.gz`: Output file for JSON-formatted list of mutations
- `*.muts.json.gz`: Output file for JSON-formatted list of mutations
- `*.muts.json.gz`: Output zipped folder for JSON-formatted list of SSMs and CNVs

</details>

[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
### netMHCpan

![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)

![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)
<details markdown="1">
<summary>Output files</summary>

![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)
- `netmhcpan/`
- `*.xls`: TSV/XLS file of netMHCpan. This contains the MUT or WT antigens
- `*.WT.netmhcpan.output,*.MUT.netmhcpan.output`: STDOUT file of netMHCpan. A uniquely formatted file of neoantigens. This contains either the MUT or WT neoantigens. Neoantigenutils contains a parser for this file.

:::note
The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality.
:::
</details>

### MultiQC
### netMHCstabpan

<details markdown="1">
<summary>Output files</summary>

- `multiqc/`
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
- `multiqc_plots/`: directory containing static images from the report in various formats.
- `netmhcstabpan/`
- `*.xls`: TSV/XLS file of netMHCpan. This contains the MUT or WT antigens
- `*.WT.netmhcpan.output,*.MUT.netmhcpan.output`: STDOUT file of netMHCpan. A uniquely formatted file of neoantigens. This contains either the MUT or WT neoantigens. Neoantigenutils contains a parser for this file.

</details>

[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
### Neoantigen Editing Final Output

<details markdown="1">
<summary>Output files</summary>

- `neoantigenediting/`

- `*._annotated.json`: The final output of the pipeline. This file is an annotated version of the tree output from phyloWGS with an extra property titled 'neoantigens'. Each entry in 'neoantigens' is an object whose properties describe one neoantigen. These neoantigen properties are described below.

"id": "XSYI_MG_M_9_C1203_11",

"mutation_id": "X_72667534_C_G",

"HLA_gene_id": "HLA-C\*12:03",

Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.
"sequence": "ASRSRHSPY",

"WT_sequence": "PSRSRHSPY",

"mutated_position": 1,

"Kd": 192.03,

"KdWT": 4582.17,

"R": 0.8911371281207195,

"logC": 2.263955023939215,

"logA": 3.1722763542054815,

"quality": 2.645601185190205

The above is an example output from a run. Each neoantigenic mutation will have an output like this.

- id: A unique ID that combines an ID created from the mutation, the HLA allele, and the window.
- mutation_id: ID containing the chromosome, position, ref and alt allele. I and D denote insertions and deletions respectively.
- HLA_gene_id: The HLA gene this neoantigen binds to
- sequence: Mutated sequence
- WT_sequence: The wild-type sequence
- mutated_position: The position of the first difference
- Kd: Binding affinity in nM from netMHCpan for the mutated peptide
- KdWT: Binding affinity in nM from netMHCpan for the wild-type peptide
- R: Similarity of mutated peptide to IEDB peptides
- logC: Log of the cross-reactivity
- logA: Log of the amplitude. This is a function of Kd/KdWT and a constant
- quality: The final output of the pipeline and neoantigen editing. A higher quality indicates a better neoantigen. This is described in the Luksza et al. paper and is visualized below

</details>
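
As a rough illustration of how the annotated JSON described above might be consumed, the following Python sketch ranks neoantigens by `quality`. It assumes a top-level `neoantigens` list whose entries carry the fields listed above; the helper name `top_neoantigens` and the second record are hypothetical and exist only for this example.

```python
import json


def top_neoantigens(annotated, n=5):
    """Return the n highest-quality neoantigen records (assumed layout)."""
    # Assumption: "neoantigens" is a top-level list of dicts with a "quality" key.
    neoantigens = annotated.get("neoantigens", [])
    return sorted(neoantigens, key=lambda neo: neo["quality"], reverse=True)[:n]


# The first record reuses values from the example output above;
# the second record is made up purely to have something to rank against.
example = json.loads("""
{
  "neoantigens": [
    {"id": "HYPOTHETICAL_ID", "Kd": 850.0, "KdWT": 900.0, "quality": 0.4},
    {"id": "XSYI_MG_M_9_C1203_11", "Kd": 192.03, "KdWT": 4582.17,
     "quality": 2.645601185190205}
  ]
}
""")

for neo in top_neoantigens(example, n=1):
    print(neo["id"], neo["quality"])
```

For a real run, `example` would instead be loaded with `json.load` from the `*._annotated.json` file; the exact nesting in that file should be checked before relying on this sketch.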

### Pipeline information

47 changes: 21 additions & 26 deletions modules.json
@@ -6,27 +6,27 @@
"modules": {
"msk": {
"neoantigenediting/aligntoiedb": {
"branch": "neoantigen",
"git_sha": "14ae2b3db25701828835e7145cee14dfdaddf180",
"branch": "develop",
"git_sha": "cac9c047e374ee259fb612ba5816e7e6aae6b86f",
"installed_by": ["neoantigen_editing"]
},
"neoantigenediting/computefitness": {
"branch": "neoantigen",
"branch": "develop",
"git_sha": "1f65c2ecdc5010549055ff7f4e6b8bccee48d4ae",
"installed_by": ["neoantigen_editing"]
},
"neoantigenutils/formatnetmhcpan": {
"branch": "neoantigen",
"git_sha": "fceccd3f96fec678849bb9bc0c04e53d9965f973",
"branch": "develop",
"git_sha": "c5d1252252e15555abcc82ea537cebeb281a1856",
"installed_by": ["netmhcstabandpan"]
},
"neoantigenutils/generatehlastring": {
"branch": "neoantigen",
"branch": "develop",
"git_sha": "33f0bd33095fa15016ee24f4fb4d61e896dbb970",
"installed_by": ["netmhcstabandpan"]
},
"neoantigenutils/generatemutfasta": {
"branch": "neoantigen",
"branch": "develop",
"git_sha": "bb7975c796ab9a2d7a45ef733a6a226a0f5ad74a",
"installed_by": ["netmhcstabandpan"]
},
@@ -36,32 +36,32 @@
"installed_by": ["modules"]
},
"netmhcpan": {
"branch": "neoantigen",
"git_sha": "33f0bd33095fa15016ee24f4fb4d61e896dbb970",
"branch": "develop",
"git_sha": "503abeb67260f060d8228221b07d743aa4180345",
"installed_by": ["modules", "netmhcstabandpan"]
},
"netmhcstabpan": {
"branch": "neoantigen",
"git_sha": "33f0bd33095fa15016ee24f4fb4d61e896dbb970",
"branch": "develop",
"git_sha": "c1a473f8bc08f778269a36ab62d5adf24357225f",
"installed_by": ["modules", "netmhcstabandpan"]
},
"phylowgs/createinput": {
"branch": "neoantigen",
"branch": "develop",
"git_sha": "b031249dcf4279606c25e626da2a628756e75e8a",
"installed_by": ["phylowgs"]
},
"phylowgs/multievolve": {
"branch": "neoantigen",
"branch": "develop",
"git_sha": "535662d391a3533dea3b11c462c14799227e08b2",
"installed_by": ["phylowgs"]
},
"phylowgs/parsecnvs": {
"branch": "neoantigen",
"git_sha": "064ce5b42a8f711fb4d8107150aad2d382ae99c2",
"branch": "develop",
"git_sha": "8471691d7c29bc2f5f4fb92279c94fb2640b6c38",
"installed_by": ["phylowgs"]
},
"phylowgs/writeresults": {
"branch": "neoantigen",
"branch": "develop",
"git_sha": "6d27f08bf649e8680ace321d3127dcdf0e210973",
"installed_by": ["phylowgs"]
}
@@ -70,17 +70,17 @@
"subworkflows": {
"msk": {
"neoantigen_editing": {
"branch": "neoantigen",
"branch": "develop",
"git_sha": "56a628201401866096d6307b9e8c690c5eb46ac2",
"installed_by": ["subworkflows"]
},
"netmhcstabandpan": {
"branch": "neoantigen",
"git_sha": "1b7ac020798572be26402a72dd9c1a22ce849a63",
"branch": "develop",
"git_sha": "d60211568e3709e9284bc06eef938e361d474d08",
"installed_by": ["subworkflows"]
},
"phylowgs": {
"branch": "neoantigen",
"branch": "develop",
"git_sha": "a5d61394af346f21ee2eb7ecfd97ab25bdbd1d0e",
"installed_by": ["subworkflows"]
}
@@ -90,14 +90,9 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"fastqc": {
"branch": "master",
"git_sha": "285a50500f9e02578d90b3ce6382ea3c30216acd",
"installed_by": ["modules"]
},
"multiqc": {
"branch": "master",
"git_sha": "314d742bdb357a1df5f9b88427b3b6ac78aa33f7",
"git_sha": "b80f5fd12ff7c43938f424dd76392a2704fa2396",
"installed_by": ["modules"]
}
}
2 changes: 1 addition & 1 deletion modules/msk/neoantigenediting/aligntoiedb/meta.yml
@@ -1,7 +1,7 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json
name: "neoantigenediting_aligntoiedb"
description: Align neoantigens to the IEDB resource
description: Align neoantigens to the IEDB file
keywords:
- neoantigenediting
- neoantigens

@@ -47,14 +47,20 @@ def load_blosum62_mat():
* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1
"""
amino_acids = "ACDEFGHIKLMNPQRSTVWY"
blosum62_mat_str_list = [l.split() for l in raw_blosum62_mat_str.strip().split("\n")]
blosum62_mat_str_list = [
l.split() for l in raw_blosum62_mat_str.strip().split("\n")
]
blosum_aa_order = [blosum62_mat_str_list[0].index(aa) for aa in amino_acids]

blosum62_mat = np.zeros((len(amino_acids), len(amino_acids)))
for i, bl_ind in enumerate(blosum_aa_order):
blosum62_mat[i] = np.array([int(x) for x in blosum62_mat_str_list[bl_ind + 1][1:]])[blosum_aa_order]
blosum62_mat[i] = np.array(
[int(x) for x in blosum62_mat_str_list[bl_ind + 1][1:]]
)[blosum_aa_order]
blosum62 = {
(aaA, aaB): blosum62_mat[i, j] for i, aaA in enumerate(amino_acids) for j, aaB in enumerate(amino_acids)
(aaA, aaB): blosum62_mat[i, j]
for i, aaA in enumerate(amino_acids)
for j, aaB in enumerate(amino_acids)
}
return blosum62
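
The dictionary returned by `load_blosum62_mat()` is keyed by `(aaA, aaB)` pairs, so a lookup-based score is easy to sketch. The following is a minimal illustration, not the module's alignment routine (which lies outside this hunk). `ungapped_score` and `toy_scores` are hypothetical names, and the toy table holds only a few stand-in entries with the same key shape as the real dictionary.

```python
def ungapped_score(pep_a, pep_b, scores):
    """Sum pairwise substitution scores position by position (no gaps)."""
    if len(pep_a) != len(pep_b):
        raise ValueError("peptides must have equal length")
    return sum(scores[(a, b)] for a, b in zip(pep_a, pep_b))


# Hypothetical stand-in for the full 20x20 dictionary built in the diff above.
toy_scores = {
    ("A", "A"): 4,
    ("A", "S"): 1,
    ("S", "A"): 1,
    ("S", "S"): 4,
}

print(ungapped_score("AS", "SS", toy_scores))  # 1 + 4 = 5
```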

@@ -226,7 +232,9 @@ def load_epitopes(iedbfasta):
pjson = json.load(f)
patient = pjson["patient"]
neoantigens = pjson["neoantigens"]
peptides = set([("_".join(neo["id"].split("_")[:-1]), neo["sequence"]) for neo in neoantigens])
peptides = set(
[("_".join(neo["id"].split("_")[:-1]), neo["sequence"]) for neo in neoantigens]
)
pepseq2pepid = defaultdict(set)
for pep_id, pep_seq in peptides:
pepseq2pepid[pep_seq].add(pep_id)
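
To make the grouping in this hunk concrete, here is a small self-contained sketch of the same steps: strip the trailing `_`-separated field from each neoantigen `id`, then map each peptide sequence to the set of shortened IDs. The second record is hypothetical and differs from the first only in its trailing field.

```python
from collections import defaultdict

# First id/sequence pair is taken from the documentation example in this commit;
# the second is a made-up record that shares the same prefix and sequence.
neoantigens = [
    {"id": "XSYI_MG_M_9_C1203_11", "sequence": "ASRSRHSPY"},
    {"id": "XSYI_MG_M_9_C1203_12", "sequence": "ASRSRHSPY"},
]

# Drop the last "_"-separated field of each id and deduplicate the pairs.
peptides = {
    ("_".join(neo["id"].split("_")[:-1]), neo["sequence"]) for neo in neoantigens
}

# Map each peptide sequence to every shortened id that carries it.
pepseq2pepid = defaultdict(set)
for pep_id, pep_seq in peptides:
    pepseq2pepid[pep_seq].add(pep_id)

print(dict(pepseq2pepid))  # {'ASRSRHSPY': {'XSYI_MG_M_9_C1203'}}
```

Both records collapse to one `(id, sequence)` pair, so each distinct peptide is represented once per shortened ID.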
@@ -251,5 +259,7 @@ def load_epitopes(iedbfasta):
"Alignment_score",
]
else:
aln_data = pd.DataFrame(columns=["Peptide_ID", "Peptide", "Epitope_ID", "Alignment_score"])
aln_data = pd.DataFrame(
columns=["Peptide_ID", "Peptide", "Epitope_ID", "Alignment_score"]
)
aln_data.to_csv("iedb_alignments_" + patient + ".txt", sep="\t", index=False)

@@ -11,7 +11,6 @@
import os


# %
class EpitopeDistance(object):
"""Base class for epitope crossreactivity.
@@ -39,7 +38,9 @@ class EpitopeDistance(object):

def __init__(
self,
model_file=os.path.join(os.path.dirname(__file__), "distance_data", "epitope_distance_model_parameters.json"),
model_file=os.path.join(
os.path.dirname(__file__), "distance_data", "epitope_distance_model_parameters.json"
),
amino_acids="ACDEFGHIKLMNPQRSTVWY",
):
"""Initialize class and compute M_ab."""
@@ -75,5 +76,11 @@ def epitope_dist(self, epiA, epiB):
"""

return sum(
[self.d_i[i] * self.M_ab[self.amino_acid_dict[epiA[i]], self.amino_acid_dict[epiB[i]]] for i in range(9)]
[
self.d_i[i]
* self.M_ab[
self.amino_acid_dict[epiA[i]], self.amino_acid_dict[epiB[i]]
]
for i in range(9)
]
)
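
The sum in `epitope_dist` is a position-weighted lookup: for each of the nine positions, a weight `d_i[i]` multiplies a substitution term `M_ab[a, b]`. The sketch below reproduces that structure with placeholder parameters. The uniform weights and the 0/1 matrix are assumptions chosen so the result is easy to check by hand; they are not the values in `epitope_distance_model_parameters.json`.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Placeholder parameters (assumptions, not the fitted model):
# every position weighs 1.0, and any substitution costs 1.0.
d_i = [1.0] * 9
M_ab = [
    [0.0 if i == j else 1.0 for j in range(len(AMINO_ACIDS))]
    for i in range(len(AMINO_ACIDS))
]


def epitope_dist(epi_a, epi_b):
    """Position-weighted substitution distance between two 9-mers."""
    return sum(
        d_i[i] * M_ab[AA_INDEX[epi_a[i]]][AA_INDEX[epi_b[i]]] for i in range(9)
    )


# Mutant and wild-type peptides from the documentation example differ at one position.
print(epitope_dist("ASRSRHSPY", "PSRSRHSPY"))  # 1.0
```

With these placeholders the function reduces to a Hamming distance; the real model replaces the weights and matrix with values loaded from the model file.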

@@ -60,7 +60,9 @@ def fill_up_clone_neoantigens(tree, mut2neo):
while len(nodes) > 0:
node = nodes[0]
nodes = nodes[1:]
node["neoantigens"] = [neo["id"] for mid in node["all_mutations"] for neo in mut2neo[mid]]
node["neoantigens"] = [
neo["id"] for mid in node["all_mutations"] for neo in mut2neo[mid]
]
node["neoantigen_load"] = len(node["neoantigens"])
node["NA_Mut"] = sum([len(mut2neo[mid]) > 0 for mid in node["all_mutations"]])
if "children" in node:
@@ -157,7 +159,9 @@ def compute_effective_sample_size(sample_json):
for clone_muts, X in zip(clone_muts_list, freqs):
for mid in clone_muts:
mut_freqs[mid].append(X)
avev = np.mean([np.var(mut_freqs[mid]) if mut_freqs[mid] else 0 for mid in mut_freqs])
avev = np.mean(
[np.var(mut_freqs[mid]) if mut_freqs[mid] else 0 for mid in mut_freqs]
)
n = 1 / avev
return n
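
The last lines of `compute_effective_sample_size` take the reciprocal of the mean per-mutation frequency variance. A standalone sketch of that arithmetic follows, using the standard library instead of NumPy. `statistics.pvariance` is the population variance, which matches the default behaviour of `np.var`. The input dictionary is made-up data, and the guard against a zero mean variance is an addition of this sketch, not something shown in the diff.

```python
from statistics import fmean, pvariance


def effective_sample_size(mut_freqs):
    """Return 1 / mean(population variance of each mutation's frequencies)."""
    variances = [pvariance(freqs) if freqs else 0.0 for freqs in mut_freqs.values()]
    avev = fmean(variances)
    # Added guard (assumption): avoid dividing by zero when all variances vanish.
    return float("inf") if avev == 0 else 1.0 / avev


# Hypothetical frequencies of two mutations across sampled trees.
mut_freqs = {"m1": [0.2, 0.4], "m2": [0.5, 0.5]}

# Variances are about 0.01 and 0.0, so the mean is about 0.005 and n is about 200.
print(effective_sample_size(mut_freqs))
```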

6 changes: 3 additions & 3 deletions modules/msk/neoantigenutils/formatnetmhcpan/main.nf
@@ -36,10 +36,10 @@ process NEOANTIGENUTILS_FORMATNETMHCPAN {
stub:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def netmhcOutputType = meta.typeMut ? "MUT": "WT"
def netmhcOutputFrom = meta.fromStab ? "STAB": "PAN"
"""
touch ${prefix}.MUT.tsv
touch ${prefix}.WT.tsv
touch ${prefix}.${netmhcOutputType}.${netmhcOutputFrom}.tsv
cat <<-END_VERSIONS > versions.yml
"${task.process}":
formatNetmhcpanOutput: \$(echo \$(format_netmhcpan_output.py -v))