Merge pull request #50 from MannLabs/run_ratiotests_by_default

Run ratiotests by default
MannLabs · Oct 24, 2024 · d04985f
2 parents d5183b2 + b19cfc6
commit d04985f
Showing 28 changed files with 664 additions and 2,909 deletions.
diff --git a/.github/workflows/create_release.yml b/.github/workflows/create_release.yml
@@ -18,5 +18,5 @@ jobs:
     with:
       package_name: directlfq
       commitish_to_release: ${{ inputs.commitish_to_release }}
-      python_version: 3.8
+      python_version: 3.9
       test_app: false
diff --git a/.github/workflows/quick_tests.yml b/.github/workflows/quick_tests.yml
@@ -2,7 +2,6 @@ on:
   push:
     branches: [ main, development ]
   pull_request:
-    branches: [ main, development ]
   workflow_dispatch:
 
 name: Quick tests based on default installation

diff --git a/.github/workflows/quick_tests_ubuntu.yml b/.github/workflows/quick_tests_ubuntu.yml
@@ -2,7 +2,6 @@ on:
   push:
     branches: [ main, development ]
   pull_request:
-    branches: [ main, development ]
   workflow_dispatch:
 
 name: Quick tests based on default installation, ubuntu stable
@@ -24,7 +23,7 @@ jobs:
       - name: Conda info
         shell: bash -l {0}
         run: conda info
-      - name: Test pip installation with all loose dependencies
+      - name: Test pip installation with stable dependencies
         shell: bash -l {0}
         run: |
           cd misc

diff --git a/.github/workflows/ratio_tests.yml b/.github/workflows/ratio_tests.yml
@@ -0,0 +1,34 @@
+on:
+  pull_request:
+  workflow_dispatch:
+
+name: Ratio tests, ubuntu stable
+
+jobs:
+  loose_installation:
+    name: Test stable pip installation on ${{ matrix.os }}
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest]
+    steps:
+      - uses: actions/checkout@v2
+      - uses: conda-incubator/setup-miniconda@v2
+        with:
+          auto-update-conda: true
+          python-version: ${{ matrix.python-version }}
+          miniconda-version: latest
+      - name: Conda info
+        shell: bash -l {0}
+        run: conda info
+      - name: Test pip installation with stable dependencies
+        shell: bash -l {0}
+        run: |
+          cd misc
+          . ./stable_pip_install.sh
+
+      - name: Ratio tests
+        shell: bash -l {0}
+        run: |
+          cd tests
+          . ./run_ratio_tests.sh
diff --git a/README.md b/README.md
@@ -54,7 +54,7 @@ There are currently two different types of installation possible:
 
 * [**One-click GUI installer:**](#one-click-gui) Choose this installation if you only want the GUI and/or keep things as simple as possible.
 <!---
-* [**Pip installer:**](#pip) Choose this installation if you want to use directlfq as a Python package in an existing Python 3.8 environment (e.g. a Jupyter notebook). If needed, the GUI and CLI can be installed with pip as well.
+* [**Pip installer:**](#pip) Choose this installation if you want to use directlfq as a Python package in an existing Python 3.9 environment (e.g. a Jupyter notebook). If needed, the GUI and CLI can be installed with pip as well.
 -->
 * [**Developer installer:**](#developer) Choose this installation if you are familiar with CLI tools, [conda](https://docs.conda.io/en/latest/) and Python. This installation allows access to all available features of directlfq and even allows to modify its source code directly. Generally, the developer version of directlfq outperforms the precompiled versions which makes this the installation of choice for high-throughput experiments.
 
@@ -71,7 +71,7 @@ Older releases remain available on the [release page](https://github.com/MannLab
 -
 ### Pip
 
-directLFQ can be installed in an existing Python 3.8 environment with a single `bash` command.
+directLFQ can be installed in an existing Python 3.9 environment with a single `bash` command.
 
 ```bash
 pip install directlfq
@@ -117,7 +117,7 @@ git clone https://github.com/MannLabs/directlfq.git
 For any Python package, it is highly recommended to use a separate [conda virtual environment](https://docs.conda.io/en/latest/), as otherwise *dependency conflicts can occur with already existing packages*.
 
 ```bash
-conda create --name directlfq python=3.8 -y
+conda create --name directlfq python=3.9 -y
 conda activate directlfq
 ```
 

diff --git a/directlfq/__init__.py b/directlfq/__init__.py
@@ -13,7 +13,7 @@
     "software",
     "AlphaPept ecosystem",
 ]
-__python_version__ = ">=3.8"
+__python_version__ = ">=3.9"
 __classifiers__ = [
     # "Development Status :: 1 - Planning",
     # "Development Status :: 2 - Pre-Alpha",

diff --git a/directlfq/benchmarking.py b/directlfq/benchmarking.py
@@ -1,14 +1,5 @@
-# AUTOGENERATED! DO NOT EDIT! File to edit: ../nbdev_nbs/06_benchmarking.ipynb.
 
-# %% auto 0
-__all__ = ['plot_lines', 'plot_points', 'get_tps_fps', 'annotate_dataframe', 'compare_to_reference', 'compare_normalization',
-           'print_nonref_hits', 'ResultsTable', 'ResultsTableDirectLFQ', 'ResultsTableIq', 'ResultsTableMaxQuant',
-           'ResultsTableMerger', 'OrganismAnnotator', 'OrganismAnnotatorMaxQuant', 'OrganismAnnotatorSpectronaut',
-           'OrganismAnnotatorDIANN', 'PlotConfig', 'MultiOrganismIntensityFCPlotter', 'ResultsTableBiological',
-           'CVInfoDataset', 'CVDistributionPlotter', 'HistPlotConfig', 'SampleListScaler', 'SampleIndexIQScaler',
-           'ScaledDFCreatorDirectLFQFormat', 'ScaledDFCreatorIQFormat', 'LFQTimer', 'TimedLFQRun', 'RuntimeInfo']
 
-# %% ../nbdev_nbs/06_benchmarking.ipynb 1
 import matplotlib.pyplot as plt
 import numpy as np
 
@@ -336,76 +327,7 @@ def __init__(self, x_axisboundaries = None, y_axisboundaries = None, colormaps =
         self.y_axisboundaries = y_axisboundaries
         self._colormaps = colormaps
 
-class MultiOrganismIntensityFCPlotter():
-    def __init__(self, ax, resultstable_w_ratios, organisms_to_plot = None, fcs_to_expect = None, title = ""):
-        self.ax = ax
-        self._color_list_hex = ['#bad566', '#325e7a', '#ffd479']
-        self._resultstable_w_ratios = resultstable_w_ratios
-        self._organism_column = resultstable_w_ratios.organism_column
-        self._log2fc_column = resultstable_w_ratios.log2fc_column
-        self._mean_intensity_column = resultstable_w_ratios.mean_intensity_column
-
-        self._organisms_to_plot = self._get_organisms_to_plot(organisms_to_plot)
-        self._fcs_to_expect = fcs_to_expect
-
-        self._title = self._get_title(title)
-        self._scatter_per_organism()
-        self._add_expected_lines()
 
-    def _get_organisms_to_plot(self, organisms_to_plot):
-        if organisms_to_plot is not None:
-            return organisms_to_plot
-        else:
-            return sorted(list(set(self._resultstable_w_ratios.formated_dataframe[self._organism_column].astype('str'))))
-
-    def _get_title(self, title):
-        if title !="":
-            self._print_infos_about_data()
-            return title
-        return self._generate_title()
-
-    def _print_infos_about_data(self):
-        for organism in self._organisms_to_plot:
-            subtable_organism = self._get_organism_subtable(organism)
-            print(self._get_stats_of_organism(organism, subtable_organism))
-
-    def _generate_title(self):
-        title = ""
-        for organism in self._organisms_to_plot:
-            subtable_organism = self._get_organism_subtable(organism)
-            title += self._get_stats_of_organism(organism, subtable_organism)
-        return title
-
-    def _scatter_per_organism(self):
-        complete_table = self._resultstable_w_ratios.formated_dataframe.copy()
-        complete_table[self._mean_intensity_column] = np.log2(complete_table[self._mean_intensity_column])
-        complete_table = self._remove_omitted_organisms_from_table(complete_table)
-        color_palette = sns.color_palette(self._color_list_hex, n_colors=len(self._organisms_to_plot))
-        sns.scatterplot(data= complete_table, x =self._mean_intensity_column, y= self._log2fc_column, hue=self._organism_column, alpha=0.15, ax=self.ax, 
-        hue_order=self._organisms_to_plot, palette=color_palette, size=0.2)
-        self.ax.set_title(self._title)
-
-    def _remove_omitted_organisms_from_table(self, complete_table):
-        row_w_permitted_organism = [x in self._organisms_to_plot for x in complete_table["organism"]]
-        return complete_table[row_w_permitted_organism]
-
-    def _add_expected_lines(self):
-        if self._fcs_to_expect is not None:
-            for idx, fc in enumerate(self._fcs_to_expect):
-                color = self._color_list_hex[idx]
-                self.ax.axhline(fc, color = color)
-
-    def _get_organism_subtable(self, organism):
-        complete_table = self._resultstable_w_ratios.formated_dataframe
-        return complete_table[complete_table[self._organism_column] == organism]
-
-    def _get_stats_of_organism(self, organism, subtable_organism):
-        fcs = subtable_organism[self._log2fc_column].to_numpy()
-        fcs = fcs[np.isfinite(fcs)]
-        median_fc = np.nanmedian(fcs)
-        std_fc = np.nanstd(fcs)
-        num_ratios = sum(~np.isnan(fcs))
-        return f"{organism} num:{num_ratios} median_FC:{median_fc:.2} STD:{std_fc:.2}\n"
 
 
 # %% ../nbdev_nbs/06_benchmarking.ipynb 16

diff --git a/directlfq/test_utils.py b/directlfq/test_utils.py
@@ -3,6 +3,12 @@
 
 from  numpy.random import MT19937
 from numpy.random import RandomState, SeedSequence
+import directlfq.config as config
+import logging
+config.setup_logging()
+
+LOGGER = logging.getLogger(__name__)
+
 
 class ProteinProfileGenerator():
     def __init__(self, peptide_profiles):
@@ -57,4 +63,59 @@ def _add_zeros_to_profilevector(self):
         num_elements_to_set_zero = int(self._num_samples*self._fraction_zeros_in_profile)
         idxs_to_set_zero = np.random.choice(self._num_samples,size=num_elements_to_set_zero, replace=False)
         self.peptide_profile_vector[idxs_to_set_zero] = 0
-
+
+
+
+
+
+class RatioChecker():
+    def __init__(self, formatted_df : pd.DataFrame, organism2expectedfc : dict, organism2CI95 : dict):
+        """Check that the log2 fold changes of a mixed-species experiment match the expected fold change of each organism.
+        
+        Args:
+            formatted_df (pd.DataFrame): dataframe with columns 'log2fc' and 'organism' from a mixed-species experiment
+            organism2expectedfc (dict): maps each organism to its expected log2fc in the mixed-species experiment
+            organism2CI95 (dict): maps each organism to the maximum allowed absolute deviation from the expected log2fc (i.e. more than 95% of protein ratios must lie within this deviation)
+        """
+        self._formatted_df = formatted_df
+        self._organism2expectedfc = organism2expectedfc
+        self._organism2deviation_threshold = organism2CI95
+        self._organism2fcs = self._get_organism2fcs()
+        self._check_ratio_consistency_per_organism()
+
+    def _get_organism2fcs(self):
+        return self._formatted_df.groupby('organism')['log2fc'].apply(list).to_dict()
+
+    def _check_ratio_consistency_per_organism(self):
+        for organism in self._organism2expectedfc.keys():
+            expected_fc = self._organism2expectedfc[organism]
+            deviation_threshold = self._organism2deviation_threshold[organism]
+            fcs = self._organism2fcs[organism]
+            RatioConsistencyChecker(fcs = fcs, expected_fc = expected_fc, deviation_threshold = deviation_threshold)
+
+class RatioConsistencyChecker():
+    def __init__(self, fcs, expected_fc, deviation_threshold):
+        self._fcs = np.array([fc for fc in fcs if np.isfinite(fc)])
+        self._expected_fc = expected_fc
+        self._fc_deviations = self._calc_fc_deviations()
+        self._deviation_threshold = deviation_threshold
+        self._fc_deviation_center = self._calc_deviation_center() # median absolute deviation from the expected fc; close to 0 if there is no bias
+        self._fraction_consistent = self._get_fraction_consistent()
+
+        self._assert_no_bias()
+
+    def _calc_fc_deviations(self):
+        return abs(self._fcs - self._expected_fc)
+
+    def _calc_deviation_center(self):
+        return np.nanmedian(self._fc_deviations)
+
+    def _get_fraction_consistent(self):
+        consistent_fcs= sum(self._fc_deviations <= self._deviation_threshold)
+        total_fcs = len(self._fcs)
+        return consistent_fcs/total_fcs
+
+    def _assert_no_bias(self):
+        assert self._fc_deviation_center < self._deviation_threshold, f"Deviation from center: {self._fc_deviation_center:.2f}"
+        assert self._fraction_consistent >0.95, f"Fraction consistent below 95%: {self._fraction_consistent:.2f}"
+        LOGGER.info("Checked ratios, no bias detected")
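Note on the check added above: the two quantities that `RatioConsistencyChecker` asserts on can be reproduced in a few lines. The following is only a minimal, dependency-free sketch, not the code from this PR. It assumes plain Python dicts as rows instead of a pandas DataFrame, and the function name `summarize_ratio_check` and the toy values are hypothetical.

```python
import math
from statistics import median


def summarize_ratio_check(rows, organism2expectedfc, organism2threshold):
    """Per organism, return (median absolute deviation from the expected
    log2fc, fraction of ratios within the allowed deviation)."""
    summary = {}
    for organism, expected_fc in organism2expectedfc.items():
        # Keep only finite log2 fold changes of this organism (drops NaN and +/-inf).
        fcs = [row["log2fc"] for row in rows
               if row["organism"] == organism and math.isfinite(row["log2fc"])]
        deviations = [abs(fc - expected_fc) for fc in fcs]
        threshold = organism2threshold[organism]
        fraction_consistent = sum(d <= threshold for d in deviations) / len(deviations)
        summary[organism] = (median(deviations), fraction_consistent)
    return summary


# Hypothetical toy data: human expected at log2fc 0, yeast expected at log2fc 1.
toy_rows = [
    {"organism": "human", "log2fc": 0.1},
    {"organism": "human", "log2fc": -0.2},
    {"organism": "human", "log2fc": 0.05},
    {"organism": "human", "log2fc": float("inf")},  # non-finite, ignored
    {"organism": "yeast", "log2fc": 1.0},
    {"organism": "yeast", "log2fc": 0.9},
    {"organism": "yeast", "log2fc": 1.2},
    {"organism": "yeast", "log2fc": 3.0},  # outlier, outside the threshold
]
toy_summary = summarize_ratio_check(
    toy_rows,
    organism2expectedfc={"human": 0.0, "yeast": 1.0},
    organism2threshold={"human": 0.5, "yeast": 0.5},
)
```

In this toy input all finite human ratios lie within the threshold, while one of four yeast ratios does not, so a 95% requirement like the one in `_assert_no_bias` would fail for yeast.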
diff --git a/directlfq/visualizations.py b/directlfq/visualizations.py
@@ -1,11 +1,10 @@
-# AUTOGENERATED! DO NOT EDIT! File to edit: ../nbdev_nbs/05_visualizations.ipynb.
+import numpy as np
+import directlfq.config as config
+import logging
+config.setup_logging()
 
-# %% auto 0
-__all__ = ['a4_dims', 'a4_width_no_margin', 'AlphaPeptColorMap', 'CmapRegistrator', 'IonTraceCompararisonPlotter',
-           'IonTraceCompararisonPlotterNoDirectLFQTrace', 'IonTraceVisualizer', 'MultiOrganismMultiMethodBoxPlot',
-           'plot_withincond_fcs', 'plot_relative_to_median_fcs']
+LOGGER = logging.getLogger(__name__)
 
-# %% ../nbdev_nbs/05_visualizations.ipynb 1
 a4_dims = (11.7, 8.27)
 a4_width_no_margin = 10.5
 
@@ -222,3 +221,76 @@ def plot_relative_to_median_fcs(normed_intensity_df):
     plt.hist(diff_fcs,80,density=True, histtype='step')
     plt.xlabel("log2 peptide fcs")
     plt.show()
+
+
+
+class MultiOrganismIntensityFCPlotter():
+    def __init__(self, ax, resultstable_w_ratios, organisms_to_plot = None, fcs_to_expect = None, title = ""):
+        self.ax = ax
+        self._color_list_hex = ['#bad566', '#325e7a', '#ffd479']
+        self._resultstable_w_ratios = resultstable_w_ratios
+        self._organism_column = resultstable_w_ratios.organism_column
+        self._log2fc_column = resultstable_w_ratios.log2fc_column
+        self._mean_intensity_column = resultstable_w_ratios.mean_intensity_column
+
+        self._organisms_to_plot = self._get_organisms_to_plot(organisms_to_plot)
+        self._fcs_to_expect = fcs_to_expect
+
+        self._title = self._get_title(title)
+        self._scatter_per_organism()
+        self._add_expected_lines()
+
+    def _get_organisms_to_plot(self, organisms_to_plot):
+        if organisms_to_plot is not None:
+            return organisms_to_plot
+        else:
+            return sorted(list(set(self._resultstable_w_ratios.formated_dataframe[self._organism_column].astype('str'))))
+
+    def _get_title(self, title):
+        if title !="":
+            self._print_infos_about_data()
+            return title
+        return self._generate_title()
+
+    def _print_infos_about_data(self):
+        for organism in self._organisms_to_plot:
+            subtable_organism = self._get_organism_subtable(organism)
+            LOGGER.info(self._get_stats_of_organism(organism, subtable_organism))
+
+    def _generate_title(self):
+        title = ""
+        for organism in self._organisms_to_plot:
+            subtable_organism = self._get_organism_subtable(organism)
+            title += self._get_stats_of_organism(organism, subtable_organism)
+        return title
+
+    def _scatter_per_organism(self):
+        complete_table = self._resultstable_w_ratios.formated_dataframe.copy()
+        complete_table[self._mean_intensity_column] = np.log2(complete_table[self._mean_intensity_column])
+        complete_table = self._remove_omitted_organisms_from_table(complete_table)
+        color_palette = sns.color_palette(self._color_list_hex, n_colors=len(self._organisms_to_plot))
+        sns.scatterplot(data= complete_table, x =self._mean_intensity_column, y= self._log2fc_column, hue=self._organism_column, alpha=0.15, ax=self.ax, 
+        hue_order=self._organisms_to_plot, palette=color_palette, size=0.2)
+        self.ax.set_title(self._title)
+
+    def _remove_omitted_organisms_from_table(self, complete_table):
+        row_w_permitted_organism = [x in self._organisms_to_plot for x in complete_table["organism"]]
+        return complete_table[row_w_permitted_organism]
+
+    def _add_expected_lines(self):
+        if self._fcs_to_expect is not None:
+            for idx, fc in enumerate(self._fcs_to_expect):
+                color = self._color_list_hex[idx]
+                self.ax.axhline(fc, color = color)
+
+    def _get_organism_subtable(self, organism):
+        complete_table = self._resultstable_w_ratios.formated_dataframe
+        return complete_table[complete_table[self._organism_column] == organism]
+
+    def _get_stats_of_organism(self, organism, subtable_organism):
+        fcs = subtable_organism[self._log2fc_column].to_numpy()
+        fcs = fcs[np.isfinite(fcs)]
+        median_fc = np.nanmedian(fcs)
+        std_fc = np.nanstd(fcs)
+        num_ratios = sum(~np.isnan(fcs))
+        return f"{organism} num:{num_ratios} median_FC:{median_fc:.2} STD:{std_fc:.2}\n"
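Note on `_get_stats_of_organism` above: the format spec `:.2` has no presentation type, so Python formats the floats with two significant digits rather than two decimal places. The sketch below illustrates the resulting string. It is an assumption-labeled stand-in, not the PR's code: the function name `stats_line` is hypothetical, and `statistics.median` and `statistics.pstdev` replace `np.nanmedian` and `np.nanstd` (population standard deviation in both cases) to keep the example dependency-free.

```python
import math
from statistics import median, pstdev


def stats_line(organism, log2fcs):
    """Build a per-organism summary string from the finite log2 fold changes."""
    # Drop NaN and +/-inf before computing the statistics.
    finite = [fc for fc in log2fcs if math.isfinite(fc)]
    # ':.2' without a type character means two significant digits.
    return (f"{organism} num:{len(finite)} "
            f"median_FC:{median(finite):.2} STD:{pstdev(finite):.2}\n")


# Hypothetical toy values; the NaN entry is excluded from all statistics.
example_line = stats_line("yeast", [1.0, 0.5, 1.5, float("nan")])
```

If two decimal places are wanted instead, `:.2f` would be the spec to use.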
diff --git a/misc/check_version.sh b/misc/check_version.sh
@@ -1,7 +1,7 @@
 # TODO remove with old release workflow
 current_version=$(grep "__version__" ../directlfq/__init__.py | cut -f3 -d ' ' | sed 's/"//g')
 current_version_as_regex=$(echo $current_version | sed 's/\./\\./g')
-conda create -n version_check python=3.8 pip=20.1 -y
+conda create -n version_check python=3.9 pip=20.1 -y
 conda activate version_check
 set +e
 already_on_pypi=$(pip install directlfq== 2>&1 | grep -c "$current_version_as_regex")

diff --git a/misc/loose_pip_install.sh b/misc/loose_pip_install.sh
@@ -1,4 +1,4 @@
-conda create -n directlfq python=3.8 -y
+conda create -n directlfq python=3.9 -y
 conda activate directlfq
 pip install -e '../.[development-stable, gui]'
 directlfq

diff --git a/misc/stable_pip_install.sh b/misc/stable_pip_install.sh
@@ -1,4 +1,4 @@
-conda create -n directlfq python=3.8 -y
+conda create -n directlfq python=3.9 -y
 conda activate directlfq
 pip install -e '../.[stable,development-stable, gui-stable]'
 directlfq