Merge branch 'release/0.1.4'

msk-access · Mar 12, 2023 · 4ce989d
2 parents bf9c2e3 + 0c91508
commit 4ce989d

Showing 9 changed files with 495 additions and 117 deletions.
diff --git a/.gitignore b/.gitignore
@@ -48,3 +48,129 @@ vignettes/*.pdf
 
 # MAC
 .DS_Store
+
+# Editors
+.vscode/
+.idea/
+
+# Vagrant
+.vagrant/
+
+# Mac/OSX
+.DS_Store
+
+# Windows
+Thumbs.db
+
+# Source for the following rules: https://raw.githubusercontent.com/github/gitignore/master/Python.gitignore
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
diff --git a/python/run_create_report/README.md b/python/run_create_report/README.md
@@ -1,30 +1,33 @@
-# Table of Contents
+# Run Create Report
 
-- [Table of Contents](#table-of-contents)
+- [Run Create Report](#run-create-report)
   - [Requirements](#requirements)
   - [run\_create\_report](#run_create_report)
-    - [main](#main)
+    - [Main Script (run\_create\_report.py)](#main-script-run_create_reportpy)
   - [Submodules](#submodules)
     - [check\_required\_columns](#check_required_columns)
       - [check\_required\_columns](#check_required_columns-1)
     - [generate\_repo\_paths](#generate_repo_paths)
       - [generate\_repo\_path](#generate_repo_path)
     - [read\_manifest](#read_manifest)
       - [read\_manifest](#read_manifest-1)
+      - [get\_row](#get_row)
     - [get\_small\_variant\_csv](#get_small_variant_csv)
       - [get\_small\_variant\_csv](#get_small_variant_csv-1)
     - [run\_cmd](#run_cmd)
       - [run\_cmd](#run_cmd-1)
       - [run\_multiple\_cmd](#run_multiple_cmd)
     - [generate\_facet\_maf\_path](#generate_facet_maf_path)
       - [generate\_facet\_maf\_path](#generate_facet_maf_path-1)
+      - [get\_maf\_path](#get_maf_path)
+      - [get\_best\_fit\_folder](#get_best_fit_folder)
     - [generate\_create\_report\_cmd](#generate_create_report_cmd)
       - [generate\_create\_report\_cmd](#generate_create_report_cmd-1)
 
 ## Requirements
 
 ```bash
-access_data_analysis==0.1.2 # works with this repo tag
+access_data_analysis>=0.1.2 # works with this repo tag
 typer==0.3.2
 typing_extensions==3.10.0.0
 pandas==1.2.5
@@ -37,7 +40,7 @@ rich==12.1.0
 
 <a id="run_create_report.main"></a>
 
-### main
+### Main Script (run\_create\_report.py)
 
 ```bash
 Usage: run_create_report.py [OPTIONS]
@@ -54,7 +57,17 @@ Options:
                                   create_report.R when `--repo` is not given
 
   -m, --manifest FILE             File containing meta information per sample.
-                                  [required]
+                                  Requires the following header columns:
+                                  cmo_patient_id, sample_id, dmp_patient_id,
+                                  collection_date or collection_day,
+                                  timepoint. If a dmp_sample_id column is
+                                  given and populated, it is used to run
+                                  facets. If dmp_sample_id is not given but
+                                  dmp_patient_id is, the latter is used to
+                                  get the Tumor sample with the lowest number.
+                                  If neither dmp_sample_id nor dmp_patient_id
+                                  is given, it will run without the facet maf
+                                  file  [required]
 
   -v, --variant-results DIRECTORY
                                   Base path for all results of small variants
@@ -71,6 +84,11 @@ Options:
                                   /work/ccs/shared/resources/impact/facets/all
                                   /]
 
+  -bf, --best-fit                 If this is set to True then we will attempt
+                                  to parse `facets_review.manifest` file to
+                                  pick the best fit for a given dmp_sample_id
+                                  [default: False]
+
   -l, --tumor-type TEXT           Tumor type label for the report  [required]
   -cfm, --copy-facet-maf          If this is set to True then we will copy the
                                   facet maf file in the directory specified in
@@ -87,10 +105,12 @@ Options:
   -gm, --generate-markdown        If given, the create_report.R will be run
                                   with `-md` flag to generate markdown
                                   [default: False]
-  -ff, --force                    If this is set to True then we will not 
-                                  stop if an error is encountered in a given
-                                  sample but keep on running for the next sample
-                                  [default: False]
+
+  -ff, --force                    If this is set to True then we will not stop
+                                  if an error is encountered in a given sample
+                                  while running create_report.R but keep on
+                                  running for the next sample  [default:
+                                  False]
 
   --install-completion            Install completion for the current shell.
   --show-completion               Show completion for the current shell, to
@@ -106,10 +126,11 @@ Wrapper script to run create_report.R
 - `repo_path` _Path, optional_ - "Base path to where the git repository is located for access_data_analysis".
 - `script_path` _Path, optional_ - "Path to the create_report.R script, fall back if `--repo` is not given".
 - `template_path` _Path, optional_ - "Path to the template.Rmd or template_days.Rmd to be used with create_report.R when `--repo` is not given".
-- `manifest` _Path, required_ - "File containing meta information per sample.".
+- `manifest` _Path, required_ - "File containing meta information per sample. Requires the following columns in the header: `cmo_patient_id`, `sample_id`, `dmp_patient_id`, `collection_date` or `collection_day`, `timepoint`. If a `dmp_sample_id` column is given and populated, it is used to run facets. If `dmp_sample_id` is not given but `dmp_patient_id` is, the latter is used to get the Tumor sample with the lowest number. If neither `dmp_sample_id` nor `dmp_patient_id` is given, it will run without the facet maf file".
 - `variant_path` _Path, required_ - "Base path for all results of small variants as generated by filter_calls.R script in access_data_analysis (Make sure only High Confidence calls are included)".
 - `cnv_path` _Path, required_ - "Base path for all results of CNV as generated by CNV_processing.R script in access_data_analysis".
 - `facet_repo` _Path, required_ - "Base path for all results of facets on Clinical MSK-IMPACT samples".
+- `best_fit` _bool, optional_ - "If this is set to True then we will attempt to parse `facets_review.manifest` file to pick the best fit for a given dmp_sample_id".
 - `tumor_type` _str, required_ - "Tumor type label for the report".
 - `copy_facet` _bool, optional_ - "If this is set to True then we will copy the facet maf file in the directory specified in `copy_facet_dir`".
 - `copy_facet_dir` _Path, optional_ - "Directory path where the facet maf file should be copied.".
@@ -119,18 +140,18 @@ Wrapper script to run create_report.R
 
 **Usage**
 
-- Using Generate Markdown, copy facet maf file, use template_days RMarkdown and force flag
+- Using Generate Markdown, copy facet maf file, use template_days RMarkdown, force flag and best fit for facets
 
 ```bash
 > python python/run_create_report/run_create_report.py \
 -m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
 -r /home/shahr2/github/access_data_analysis \
 -v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
 -c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
--l "Melanoma" -gm -d -cfm -ff
+-l "Melanoma" -gm -d -cfm -ff -bf
 ```
 
-- Using Generate Markdown and force flag
+- Using Generate Markdown, force flag and default fit for facets
 
 ```bash
 > python python/run_create_report/run_create_report.py \
@@ -221,7 +242,7 @@ Generate path to create_report.R and template RMarkdown file
 def read_manifest(manifest)
 ```
 
-_summary_
+Read manifest file
 
 **Arguments**:
 
@@ -232,6 +253,24 @@ _summary_
 
 - `data_frame` - _description_
 
+<a id="read_manifest.get_row"></a>
+
+#### get\_row
+
+```python
+def get_row(tsv_file)
+```
+
+Determine which lines of the file to skip when reading it
+
+**Arguments**:
+
+- `tsv_file` _file_ - file to be read
+  
+**Returns**:
+
+- `list` - lines to be skipped
+
 <a id="get_small_variant_csv"></a>
 
 ### get\_small\_variant\_csv
@@ -319,6 +358,44 @@ Get path of maf associated with facet-suite output
 
 - `str` - path of the facets maf
 
+<a id="generate_facet_maf_path.get_maf_path"></a>
+
+#### get\_maf\_path
+
+```python
+def get_maf_path(maf_path, patient_id, sample_id)
+```
+
+Get the path to the maf file
+
+**Arguments**:
+
+- `maf_path` _pathlib.Path_ - Base path of the maf file
+- `patient_id` _str_ - DMP Patient ID for facets
+- `sample_id` _str_ - DMP Sample ID if any for facets
+  
+**Returns**:
+
+- `str` - Path to the maf file
+
+<a id="generate_facet_maf_path.get_best_fit_folder"></a>
+
+#### get\_best\_fit\_folder
+
+```python
+def get_best_fit_folder(facet_manifest_path)
+```
+
+Get the best fit folder for the given facet manifest path
+
+**Arguments**:
+
+- `facet_manifest_path` _str_ - manifest path to be used for determining best fit
+  
+**Returns**:
+
+- `pathlib.Path` - path to the folder containing best fit maf files
+
 <a id="generate_create_report_cmd"></a>
 
 ### generate\_create\_report\_cmd

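The manifest rules described in the `--manifest` help text of the README change above can be sketched as follows. This is a minimal illustration and not the code in this commit: the helper names `missing_columns` and `facets_lookup_mode` are hypothetical, and each manifest row is assumed to be a plain dict of strings rather than a pandas row.

```python
# Hypothetical helpers; only the column names and the fallback order are
# taken from the --manifest help text in the README.
REQUIRED_COLUMNS = ("cmo_patient_id", "sample_id", "dmp_patient_id", "timepoint")
DATE_COLUMNS = ("collection_date", "collection_day")


def missing_columns(header):
    """Return the required manifest columns that are absent from `header`."""
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    # Either collection_date or collection_day satisfies the date requirement.
    if not any(col in header for col in DATE_COLUMNS):
        missing.append("collection_date or collection_day")
    return missing


def facets_lookup_mode(row):
    """Decide how a manifest row (assumed to be a dict) maps to facets output."""

    def has_value(key):
        value = row.get(key)
        return value is not None and str(value).strip() != ""

    if has_value("dmp_sample_id"):
        return "sample"   # use the given DMP sample ID directly
    if has_value("dmp_patient_id"):
        return "patient"  # fall back to the lowest-numbered Tumor sample
    return "none"         # run without a facet maf file


if __name__ == "__main__":
    header = ["cmo_patient_id", "sample_id", "dmp_patient_id",
              "collection_day", "timepoint"]
    print(missing_columns(header))
    print(facets_lookup_mode({"dmp_sample_id": "", "dmp_patient_id": "P-0000001"}))
```

Returning a mode label instead of resolving a path keeps the decision separate from the file-system lookup, which the real script performs against the facets repository.
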
diff --git a/python/run_create_report/modules/check_required_columns.py b/python/run_create_report/modules/check_required_columns.py
@@ -1,5 +1,5 @@
-import typer
 import pandas as pd
+import typer
 
 
 def check_required_columns(manifest, template_days=None):