diff --git a/docs/output.md b/docs/output.md
index 0780b5e4..98d1f003 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -1,85 +1,9 @@
-# Outputs (ongoing)
+# Outputs

 This document describes the final outputs produced by the pipeline. Most of the plots are taken from report generated from the [full-sized test dataset](https://sandbox.zenodo.org/record/1074721) for the pipeline.

 The files listed below will be created in the selected results directory (`output_location` parameter). All paths are relative to the top-level results directory.

-## Directory structure (example for ``=_RPE-BM510_)
-
-```bash
-/
-|-- alfred
-|   |-- Celln.tsv.gz
-|   `-- Celln.json.gz
-|-- bam
-|   |-- Cell1.sort.mdup.bam
-|   |-- Cell2.sort.mdup.bam
-|   `-- Celln.sort.mdup.bam
-|-- cell_selection
-|   |-- labels_raw.tsv
-|   `-- labels.tsv
-|-- config
-|   |-- chroms_to_exclude.txt
-|   `-- single_paired_end_detection.txt
-|-- counts
-|   `-- RPE-BM510
-|       `-- counts-per-cell
-|-- fastq
-|   |-- Cell1.1.fastq.gz
-|   |-- Cell1.2.fastq.gz
-|   |-- Cell2.1.fastq.gz
-|   `-- Cell2.2.fastq.gz
-|-- haplotag
-|   |-- bam
-|   |   `-- RPE-BM510
-|   |-- bed
-|   `-- table
-|       `-- RPE-BM510
-|           `-- by-cell
-|-- log
-|   |-- ...
-|   `-- ...
-|-- merged_bam
-|   `-- merged_bam.bam
-|-- mosaiclassifier
-|   |-- haplotag_likelihoods
-|   |-- postprocessing
-|   |   |-- filter
-|   |   |   `-- RPE-BM510
-|   |   |-- group-table
-|   |   |   `-- RPE-BM510
-|   |   `-- merge
-|   |       `-- RPE-BM510
-|   |-- sv_calls
-|   |   `-- RPE-BM510
-|   `-- sv_probabilities
-|       `-- RPE-BM510
-|-- plots
-|   `-- RPE-BM510
-|       |-- counts
-|       |-- final_results
-|       |-- sv_calls
-|       |-- sv_clustering
-|       `-- sv_consistency
-|-- segmentation
-|   `-- RPE-BM510
-|       `-- segmentation-per-cell
-|-- snv_genotyping
-|   `-- RPE-BM510
-|-- stats
-|   `-- RPE-BM510
-`-- strandphaser
-    |-- phased-snvs
-    |-- RPE-BM510
-    |   `-- StrandPhaseR_analysis.chr21
-    |       |-- browserFiles
-    |       |-- data
-    |       |-- Phased
-    |       |-- SingleCellHaps
-    |       `-- VCFfiles
-    `-- R_setup
-```
-
 ## Plots folder

 ### Mosaic count - reads density across bins
@@ -208,7 +132,6 @@ By using these heatmaps, the user can easily identify subclones based on the SV

 ---

-
 File path: `//stats/stats-merged.tsv`

 Report category: `Stats`
diff --git a/docs/parameters.md b/docs/parameters.md
index 8fd766df..c34a62cf 100644
--- a/docs/parameters.md
+++ b/docs/parameters.md
@@ -33,8 +33,9 @@ All these arguments can be specified in two ways:
 | ---------------------------------------- | --------------------------------------------------------------------------------------------------- | ------- | ------------ |
 | `multistep_normalisation_analysis` | Allow to perform multistep normalisation including GC correction for visualization (Marco Cosenza). | False | False |
 | `multistep_normalisation_for_SV_calling` | Allow to use multistep normalisation count file during SV calling (Marco Cosenza). | False | False |
-| `arbigent` | Enable ArbiGent mode of execution to genotype SV based on arbitrary segments | False | True |
-| `scNOVA` | Enable scNOVA mode of execution to compute Nucleosome Occupancy (NO) of detected SV | False | True |
+| `hgsvc_based_normalized_counts` | Use HGSVC-based normalisation. | True | False |
+| `arbigent` | Enable ArbiGent mode of execution to genotype SV based on arbitrary segments | False | False |
+| `scNOVA` | Enable scNOVA mode of execution to compute Nucleosome Occupancy (NO) of detected SV | False | False |

 ### External files

@@ -66,10 +67,12 @@ All these arguments can be specified in two ways:

 ### EMBL specific options

-| Parameter | Comment | Default |
-| ---------------------- | ----------------------------------------------------------------------------------------------------- | ------- |
-| `genecore` | Enable/disable genecore mode to give as input the genecore shared folder in /g/korbel/shared/genecore | False |
-| `genecore_date_folder` | Specify folder to be processed | |
+| Parameter | Comment | Default |
+| ------------------------- | ----------------------------------------------------------------------------------------------------- | ------------------------------------------- |
+| `genecore` | Enable/disable genecore mode to give as input the genecore shared folder in /g/korbel/shared/genecore | False |
+| `genecore_date_folder` | Specify folder to be processed | |
+| `genecore_prefix` | Specify genecore prefix folder | /g/korbel/STOCKS/Data/Assay/sequencing/2023 |
+| `genecore_regex_elements` | Specify genecore regex element to be used to distinguish sample from well number | PE20 |

 If `genecore` and `genecore_date_folder` are correctly specified, each plate will be processed independently by creating a specific folder in the `data_location` folder.

diff --git a/docs/usage.md b/docs/usage.md
index 0a0bc50b..55bca02d 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -61,7 +61,7 @@ snakemake \

 **ℹ️ Note for 🇪🇺 EMBL users**

-- You can load already installed snakemake modusl on the HPC (by connecting to login01 & login02) using the following `module load snakemake/7.14.0-foss-2022a`
+- You can load already installed snakemake modules on the HPC (by connecting to login01 & login02) using the following `module load snakemake/7.14.0-foss-2022a`
 - Use the following command for singularity-args parameter: `--singularity-args "-B /g:/g -B /scratch:/scratch"`

 ---
diff --git a/workflow/rules/external_data.smk b/workflow/rules/external_data.smk
index da9fc9f3..365452a0 100644
--- a/workflow/rules/external_data.smk
+++ b/workflow/rules/external_data.smk
@@ -147,11 +147,31 @@ rule download_scnova_data:
             keep_local=True,
         ),
     output:
+        "workflow/data/scNOVA/utils/bin_chr_length.bed",
+        "workflow/data/scNOVA/utils/bin_Genebody_all.bed",
+        "workflow/data/scNOVA/utils/bin_Genes_for_CNN_num_sort_ann_sort_GC_ensemble.txt",
+        "workflow/data/scNOVA/utils/bin_Genes_for_CNN_num_sort.txt",
         "workflow/data/scNOVA/utils/bin_Genes_for_CNN_reshape_annot.txt",
+        "workflow/data/scNOVA/utils/bin_Genes_for_CNN_sort.txt.corrected",
+        "workflow/data/scNOVA/utils/Deeptool_Genes_for_CNN_merge_sort_lab_final.txt",
+        "workflow/data/scNOVA/utils/Features_reshape_CpG_orientation_impute.txt",
+        "workflow/data/scNOVA/utils/Features_reshape_CpG_orientation.txt",
+        "workflow/data/scNOVA/utils/Features_reshape_GC_orientation_impute.txt",
+        "workflow/data/scNOVA/utils/Features_reshape_GC_orientation.txt",
+        "workflow/data/scNOVA/utils/Features_reshape_RT_orientation.txt",
+        "workflow/data/scNOVA/utils/Features_reshape_size_orientation.txt",
+        "workflow/data/scNOVA/utils/FPKM_sort_LCL_RPE_19770_renamed.txt",
+        "workflow/data/scNOVA/utils/regions_all_hg38_v2_resize_2kb_sort_num_sort_for_chromVAR.bed",
+        "workflow/data/scNOVA/utils/regions_all_hg38_v2_resize_2kb_sort.bed",
+        "workflow/data/scNOVA/utils/Strand_seq_matrix_Genebody_for_SCDE.txt",
+        "workflow/data/scNOVA/utils/Strand_seq_matrix_Genebody_for_SVM.txt",
+        "workflow/data/scNOVA/utils/Strand_seq_matrix_TES_for_SVM.txt",
+        "workflow/data/scNOVA/utils/Strand_seq_matrix_TSS_for_SVM.txt",
     log:
         touch("log/config/dl_arbigent_mappability_track.ok"),
     conda:
         "../envs/scNOVA/scNOVA_DL.yaml"
+    container: None
     shell:
         """
         directory="workflow/data/ref_genomes/"