Merge pull request #50 from MannLabs/run_ratiotests_by_default
Run ratiotests by default
ammarcsj authored Oct 24, 2024
2 parents d5183b2 + b19cfc6 commit d04985f
Showing 28 changed files with 664 additions and 2,909 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/create_release.yml
@@ -18,5 +18,5 @@ jobs:
with:
package_name: directlfq
commitish_to_release: ${{ inputs.commitish_to_release }}
python_version: 3.8
python_version: 3.9
test_app: false
1 change: 0 additions & 1 deletion .github/workflows/quick_tests.yml
@@ -2,7 +2,6 @@ on:
push:
branches: [ main, development ]
pull_request:
branches: [ main, development ]
workflow_dispatch:

name: Quick tests based on default installation
3 changes: 1 addition & 2 deletions .github/workflows/quick_tests_ubuntu.yml
@@ -2,7 +2,6 @@ on:
push:
branches: [ main, development ]
pull_request:
branches: [ main, development ]
workflow_dispatch:

name: Quick tests based on default installation, ubuntu stable
@@ -24,7 +23,7 @@ jobs:
- name: Conda info
shell: bash -l {0}
run: conda info
- name: Test pip installation with all loose dependencies
- name: Test pip installation with stable dependencies
shell: bash -l {0}
run: |
cd misc
34 changes: 34 additions & 0 deletions .github/workflows/ratio_tests.yml
@@ -0,0 +1,34 @@
on:
  pull_request:
  workflow_dispatch:

name: Ratio tests, ubuntu stable

jobs:
  loose_installation:
    name: Test loose pip installation on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
    steps:
      - uses: actions/checkout@v2
      - uses: conda-incubator/setup-miniconda@v2
        with:
          auto-update-conda: true
          python-version: ${{ matrix.python-version }}
          miniconda-version: latest
      - name: Conda info
        shell: bash -l {0}
        run: conda info
      - name: Test pip installation with stable dependencies
        shell: bash -l {0}
        run: |
          cd misc
          . ./stable_pip_install.sh
      - name: Ratio tests
        shell: bash -l {0}
        run: |
          cd tests
          . ./run_ratio_tests.sh
6 changes: 3 additions & 3 deletions README.md
@@ -54,7 +54,7 @@ There are currently two different types of installation possible:

* [**One-click GUI installer:**](#one-click-gui) Choose this installation if you only want the GUI and/or keep things as simple as possible.
<!---
* [**Pip installer:**](#pip) Choose this installation if you want to use directlfq as a Python package in an existing Python 3.8 environment (e.g. a Jupyter notebook). If needed, the GUI and CLI can be installed with pip as well.
* [**Pip installer:**](#pip) Choose this installation if you want to use directlfq as a Python package in an existing Python 3.9 environment (e.g. a Jupyter notebook). If needed, the GUI and CLI can be installed with pip as well.
-->
* [**Developer installer:**](#developer) Choose this installation if you are familiar with CLI tools, [conda](https://docs.conda.io/en/latest/) and Python. This installation allows access to all available features of directlfq and even allows to modify its source code directly. Generally, the developer version of directlfq outperforms the precompiled versions which makes this the installation of choice for high-throughput experiments.

@@ -71,7 +71,7 @@ Older releases remain available on the [release page](https://github.com/MannLab
-
### Pip

directLFQ can be installed in an existing Python 3.8 environment with a single `bash` command.
directLFQ can be installed in an existing Python 3.9 environment with a single `bash` command.

```bash
pip install directlfq
@@ -117,7 +117,7 @@ git clone https://github.com/MannLabs/directlfq.git
For any Python package, it is highly recommended to use a separate [conda virtual environment](https://docs.conda.io/en/latest/), as otherwise *dependancy conflicts can occur with already existing packages*.

```bash
conda create --name directlfq python=3.8 -y
conda create --name directlfq python=3.9 -y
conda activate directlfq
```

2 changes: 1 addition & 1 deletion directlfq/__init__.py
@@ -13,7 +13,7 @@
"software",
"AlphaPept ecosystem",
]
__python_version__ = ">=3.8"
__python_version__ = ">=3.9"
__classifiers__ = [
# "Development Status :: 1 - Planning",
# "Development Status :: 2 - Pre-Alpha",
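The minimum-version bump from `>=3.8` to `>=3.9` in `__python_version__` can also be enforced at runtime with the standard library alone. The sketch below is a hypothetical helper, `meets_minimum`, that is not part of directlfq; it only handles the simple `>=X.Y` form used here, and richer version specifiers would need a proper parser such as the `packaging` library.

```python
import sys


# Hypothetical helper (not part of directlfq): checks a simple ">=X.Y" specifier,
# such as __python_version__ = ">=3.9", against an interpreter version tuple.
def meets_minimum(spec: str, version_info=None) -> bool:
    if version_info is None:
        version_info = sys.version_info  # default to the running interpreter
    if not spec.startswith(">="):
        raise ValueError(f"only '>=' specifiers are handled: {spec!r}")
    required = tuple(int(part) for part in spec[2:].split("."))
    # Compare only as many components as the specifier names (here: major, minor)
    return tuple(version_info[: len(required)]) >= required
```

Called as `meets_minimum(">=3.9")` at the top of a script, it returns `False` on a Python 3.8 interpreter, which is the environment the README and install scripts no longer target.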
78 changes: 0 additions & 78 deletions directlfq/benchmarking.py
@@ -1,14 +1,5 @@
# AUTOGENERATED! DO NOT EDIT! File to edit: ../nbdev_nbs/06_benchmarking.ipynb.

# %% auto 0
__all__ = ['plot_lines', 'plot_points', 'get_tps_fps', 'annotate_dataframe', 'compare_to_reference', 'compare_normalization',
'print_nonref_hits', 'ResultsTable', 'ResultsTableDirectLFQ', 'ResultsTableIq', 'ResultsTableMaxQuant',
'ResultsTableMerger', 'OrganismAnnotator', 'OrganismAnnotatorMaxQuant', 'OrganismAnnotatorSpectronaut',
'OrganismAnnotatorDIANN', 'PlotConfig', 'MultiOrganismIntensityFCPlotter', 'ResultsTableBiological',
'CVInfoDataset', 'CVDistributionPlotter', 'HistPlotConfig', 'SampleListScaler', 'SampleIndexIQScaler',
'ScaledDFCreatorDirectLFQFormat', 'ScaledDFCreatorIQFormat', 'LFQTimer', 'TimedLFQRun', 'RuntimeInfo']

# %% ../nbdev_nbs/06_benchmarking.ipynb 1
import matplotlib.pyplot as plt
import numpy as np

@@ -336,76 +327,7 @@ def __init__(self, x_axisboundaries = None, y_axisboundaries = None, colormaps =
        self.y_axisboundaries = y_axisboundaries
        self._colormaps = colormaps

class MultiOrganismIntensityFCPlotter():
    def __init__(self, ax, resultstable_w_ratios, organisms_to_plot = None, fcs_to_expect = None, title = ""):
        self.ax = ax
        self._color_list_hex = ['#bad566', '#325e7a', '#ffd479']
        self._resultstable_w_ratios = resultstable_w_ratios
        self._organism_column = resultstable_w_ratios.organism_column
        self._log2fc_column = resultstable_w_ratios.log2fc_column
        self._mean_intensity_column = resultstable_w_ratios.mean_intensity_column

        self._organisms_to_plot = self._get_organisms_to_plot(organisms_to_plot)
        self._fcs_to_expect = fcs_to_expect

        self._title = self._get_title(title)
        self._scatter_per_organism()
        self._add_expected_lines()

    def _get_organisms_to_plot(self, organisms_to_plot):
        if organisms_to_plot is not None:
            return organisms_to_plot
        else:
            return sorted(list(set(self._resultstable_w_ratios.formated_dataframe[self._organism_column].astype('str'))))

    def _get_title(self, title):
        if title !="":
            self._print_infos_about_data()
            return title
        return self._generate_title()

    def _print_infos_about_data(self):
        for organism in self._organisms_to_plot:
            subtable_organism = self._get_organism_subtable(organism)
            print(self._get_stats_of_organism(organism, subtable_organism))

    def _generate_title(self):
        title = ""
        for organism in self._organisms_to_plot:
            subtable_organism = self._get_organism_subtable(organism)
            title += self._get_stats_of_organism(organism, subtable_organism)
        return title

    def _scatter_per_organism(self):
        complete_table = self._resultstable_w_ratios.formated_dataframe.copy()
        complete_table[self._mean_intensity_column] = np.log2(complete_table[self._mean_intensity_column])
        complete_table = self._remove_omitted_organisms_from_table(complete_table)
        color_palette = sns.color_palette(self._color_list_hex, n_colors=len(self._organisms_to_plot))
        sns.scatterplot(data= complete_table, x =self._mean_intensity_column, y= self._log2fc_column, hue=self._organism_column, alpha=0.15, ax=self.ax,
                        hue_order=self._organisms_to_plot, palette=color_palette, size=0.2)
        self.ax.set_title(self._title)

    def _remove_omitted_organisms_from_table(self, complete_table):
        row_w_permitted_organism = [x in self._organisms_to_plot for x in complete_table["organism"]]
        return complete_table[row_w_permitted_organism]

    def _add_expected_lines(self):
        if self._fcs_to_expect is not None:
            for idx, fc in enumerate(self._fcs_to_expect):
                color = self._color_list_hex[idx]
                self.ax.axhline(fc, color = color)

    def _get_organism_subtable(self, organism):
        complete_table = self._resultstable_w_ratios.formated_dataframe
        return complete_table[complete_table[self._organism_column] == organism]

    def _get_stats_of_organism(self, organism, subtable_organism):
        fcs = subtable_organism[self._log2fc_column].to_numpy()
        fcs = fcs[np.isfinite(fcs)]
        median_fc = np.nanmedian(fcs)
        std_fc = np.nanstd(fcs)
        num_ratios = sum(~np.isnan(fcs))
        return f"{organism} num:{num_ratios} median_FC:{median_fc:.2} STD:{std_fc:.2}\n"


# %% ../nbdev_nbs/06_benchmarking.ipynb 16
63 changes: 62 additions & 1 deletion directlfq/test_utils.py
@@ -3,6 +3,12 @@

from numpy.random import MT19937
from numpy.random import RandomState, SeedSequence
import directlfq.config as config
import logging
config.setup_logging()

LOGGER = logging.getLogger(__name__)


class ProteinProfileGenerator():
    def __init__(self, peptide_profiles):
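This commit adds a module-level logger to `test_utils.py` and `visualizations.py` (`config.setup_logging()` followed by `LOGGER = logging.getLogger(__name__)`), and the moved plotter now calls `LOGGER.info` where the old copy called `print`. The body of `directlfq.config.setup_logging` is not shown in this diff, so the sketch below substitutes a hypothetical `logging.basicConfig` call; it only illustrates the standard-library pattern and how messages routed through the logger can be captured, for example in a test.

```python
import logging


# Hypothetical stand-in for directlfq.config.setup_logging(); the real function
# is not part of this diff. basicConfig only configures the root logger once.
def setup_logging(level=logging.INFO):
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(name)s: %(message)s")


setup_logging()
LOGGER = logging.getLogger(__name__)  # one logger per module, named after the module


def report_stats(stats_line: str) -> None:
    # Formerly print(); going through LOGGER lets callers filter or redirect the output
    LOGGER.info(stats_line.rstrip("\n"))


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be inspected afterwards."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())
```

Because each module asks for a logger by `__name__`, verbosity can be tuned per module (for instance silencing only the visualization messages) without touching the calling code.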
@@ -57,4 +63,59 @@ def _add_zeros_to_profilevector(self):
        num_elements_to_set_zero = int(self._num_samples*self._fraction_zeros_in_profile)
        idxs_to_set_zero = np.random.choice(self._num_samples,size=num_elements_to_set_zero, replace=False)
        self.peptide_profile_vector[idxs_to_set_zero] = 0






class RatioChecker():
    def __init__(self, formatted_df : pd.DataFrame, organism2expectedfc : dict, organism2CI95 : dict):
        """the ratio checker takes in a dataframe with columns 'log2fc' and 'organism' and checks if the ratios are consistent with the expected median fold changes and confidence intervals
        Args:
            formatted_df (pd.DataFrame): dataframe with columns 'log2fc' and 'organism' from mixed species experiment
            organism2expectedfc (dict): the expected log2fc for this organism in the mixed species experiment
            organism2CI95 (dict): the expected confidence interval for this organism in the mixed species experiment (i.e. 95% of protein ratios should be within this interval)
        """
        self._formatted_df = formatted_df
        self._organism2expectedfc = organism2expectedfc
        self._organism2deviation_threshold = organism2CI95
        self._organism2fcs = self._get_organism2fcs()
        self._check_ratio_consistency_per_organism()

    def _get_organism2fcs(self):
        return self._formatted_df.groupby('organism')['log2fc'].apply(list).to_dict()

    def _check_ratio_consistency_per_organism(self):
        for organism in self._organism2expectedfc.keys():
            expected_fc = self._organism2expectedfc[organism]
            deviation_threshod = self._organism2deviation_threshold[organism]
            fcs = self._organism2fcs[organism]
            RatioConsistencyChecker(fcs = fcs, expected_fc = expected_fc, deviation_threshold = deviation_threshod)

class RatioConsistencyChecker():
    def __init__(self, fcs, expected_fc, deviation_threshold):
        self._fcs = np.array([fc for fc in fcs if np.isfinite(fc)])
        self._expected_fc = expected_fc
        self._fc_deviations = self._calc_fc_deviations()
        self._deviation_threshold = deviation_threshold
        self._fc_deviation_center = self._calc_deviation_center() #should be 0 if no bias
        self._fraction_consistent = self._get_fraction_consistent()

        self._assert_no_bias()

    def _calc_fc_deviations(self):
        return abs(self._fcs - self._expected_fc)

    def _calc_deviation_center(self):
        return np.nanmedian(self._fc_deviations)

    def _get_fraction_consistent(self):
        consistent_fcs= sum(self._fc_deviations <= self._deviation_threshold)
        total_fcs = len(self._fcs)
        return consistent_fcs/total_fcs

    def _assert_no_bias(self):
        assert self._fc_deviation_center < self._deviation_threshold, f"Deviation from center: {self._fc_deviation_center:.2f}"
        assert self._fraction_consistent >0.95, f"Fraction consistent below 95%: {self._fraction_consistent:.2f}"
        LOGGER.info("Checked ratios, no bias detected")
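`RatioConsistencyChecker` applies two checks per organism: the median absolute deviation of the finite log2 fold changes from the expected value must be below the organism's threshold, and more than 95% of the ratios must lie within that threshold. The standalone sketch below reproduces that logic without importing directlfq, so it can be tried on synthetic data; the function name `check_ratios` and the simulated two-species values are hypothetical.

```python
import numpy as np
import pandas as pd


def check_ratios(formatted_df, organism2expectedfc, organism2ci95, min_fraction=0.95):
    """Hypothetical re-implementation of the RatioChecker checks.

    Returns {organism: (median_abs_deviation, fraction_within_threshold)} and
    raises AssertionError if either check fails for an organism.
    """
    organism2fcs = formatted_df.groupby("organism")["log2fc"].apply(list).to_dict()
    results = {}
    for organism, expected_fc in organism2expectedfc.items():
        threshold = organism2ci95[organism]
        fcs = np.asarray(organism2fcs[organism], dtype=float)
        fcs = fcs[np.isfinite(fcs)]  # NaN and +/-inf ratios are ignored, as in the original
        deviations = np.abs(fcs - expected_fc)
        center = float(np.median(deviations))  # grows with bias and with spread
        fraction = float(np.mean(deviations <= threshold))
        assert center < threshold, f"{organism}: deviation from center {center:.2f}"
        assert fraction > min_fraction, f"{organism}: fraction consistent {fraction:.2f}"
        results[organism] = (center, fraction)
    return results


# Synthetic mix: human unchanged (log2fc 0), yeast doubled (log2fc 1), spread 0.1
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "organism": ["human"] * 500 + ["yeast"] * 500,
    "log2fc": np.concatenate([rng.normal(0.0, 0.1, 500), rng.normal(1.0, 0.1, 500)]),
})
```

With thresholds of 0.5, `check_ratios(df, {"human": 0.0, "yeast": 1.0}, {"human": 0.5, "yeast": 0.5})` passes; declaring yeast's expected value as 2.0 instead makes the first assertion fail, since the median deviation is then about 1.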
84 changes: 78 additions & 6 deletions directlfq/visualizations.py
@@ -1,11 +1,10 @@
# AUTOGENERATED! DO NOT EDIT! File to edit: ../nbdev_nbs/05_visualizations.ipynb.
import numpy as np
import directlfq.config as config
import logging
config.setup_logging()

# %% auto 0
__all__ = ['a4_dims', 'a4_width_no_margin', 'AlphaPeptColorMap', 'CmapRegistrator', 'IonTraceCompararisonPlotter',
'IonTraceCompararisonPlotterNoDirectLFQTrace', 'IonTraceVisualizer', 'MultiOrganismMultiMethodBoxPlot',
'plot_withincond_fcs', 'plot_relative_to_median_fcs']
LOGGER = logging.getLogger(__name__)

# %% ../nbdev_nbs/05_visualizations.ipynb 1
a4_dims = (11.7, 8.27)
a4_width_no_margin = 10.5

@@ -222,3 +221,76 @@ def plot_relative_to_median_fcs(normed_intensity_df):
    plt.hist(diff_fcs,80,density=True, histtype='step')
    plt.xlabel("log2 peptide fcs")
    plt.show()



class MultiOrganismIntensityFCPlotter():
    def __init__(self, ax, resultstable_w_ratios, organisms_to_plot = None, fcs_to_expect = None, title = ""):
        self.ax = ax
        self._color_list_hex = ['#bad566', '#325e7a', '#ffd479']
        self._resultstable_w_ratios = resultstable_w_ratios
        self._organism_column = resultstable_w_ratios.organism_column
        self._log2fc_column = resultstable_w_ratios.log2fc_column
        self._mean_intensity_column = resultstable_w_ratios.mean_intensity_column

        self._organisms_to_plot = self._get_organisms_to_plot(organisms_to_plot)
        self._fcs_to_expect = fcs_to_expect

        self._title = self._get_title(title)
        self._scatter_per_organism()
        self._add_expected_lines()

    def _get_organisms_to_plot(self, organisms_to_plot):
        if organisms_to_plot is not None:
            return organisms_to_plot
        else:
            return sorted(list(set(self._resultstable_w_ratios.formated_dataframe[self._organism_column].astype('str'))))

    def _get_title(self, title):
        if title !="":
            self._print_infos_about_data()
            return title
        return self._generate_title()

    def _print_infos_about_data(self):
        for organism in self._organisms_to_plot:
            subtable_organism = self._get_organism_subtable(organism)
            LOGGER.info(self._get_stats_of_organism(organism, subtable_organism))

    def _generate_title(self):
        title = ""
        for organism in self._organisms_to_plot:
            subtable_organism = self._get_organism_subtable(organism)
            title += self._get_stats_of_organism(organism, subtable_organism)
        return title

    def _scatter_per_organism(self):
        complete_table = self._resultstable_w_ratios.formated_dataframe.copy()
        complete_table[self._mean_intensity_column] = np.log2(complete_table[self._mean_intensity_column])
        complete_table = self._remove_omitted_organisms_from_table(complete_table)
        color_palette = sns.color_palette(self._color_list_hex, n_colors=len(self._organisms_to_plot))
        sns.scatterplot(data= complete_table, x =self._mean_intensity_column, y= self._log2fc_column, hue=self._organism_column, alpha=0.15, ax=self.ax,
                        hue_order=self._organisms_to_plot, palette=color_palette, size=0.2)
        self.ax.set_title(self._title)

    def _remove_omitted_organisms_from_table(self, complete_table):
        row_w_permitted_organism = [x in self._organisms_to_plot for x in complete_table["organism"]]
        return complete_table[row_w_permitted_organism]

    def _add_expected_lines(self):
        if self._fcs_to_expect is not None:
            for idx, fc in enumerate(self._fcs_to_expect):
                color = self._color_list_hex[idx]
                self.ax.axhline(fc, color = color)

    def _get_organism_subtable(self, organism):
        complete_table = self._resultstable_w_ratios.formated_dataframe
        return complete_table[complete_table[self._organism_column] == organism]

    def _get_stats_of_organism(self, organism, subtable_organism):
        fcs = subtable_organism[self._log2fc_column].to_numpy()
        fcs = fcs[np.isfinite(fcs)]
        median_fc = np.nanmedian(fcs)
        std_fc = np.nanstd(fcs)
        num_ratios = sum(~np.isnan(fcs))
        return f"{organism} num:{num_ratios} median_FC:{median_fc:.2} STD:{std_fc:.2}\n"
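`_get_stats_of_organism` builds the per-organism summary that becomes the plot title (or is logged when a custom title is passed): the number of finite log2 fold changes, their median and their standard deviation. A minimal pandas sketch of the same computation follows; the function names are hypothetical, and the organism filter here uses `Series.isin` on the configured column, whereas `_remove_omitted_organisms_from_table` reads a hard-coded `"organism"` column. Note that the `:.2` format spec means two significant digits, not two decimal places, so a value such as 12.345 is printed as `1.2e+01`.

```python
import numpy as np
import pandas as pd


def organism_fc_stats(df, organism_col="organism", log2fc_col="log2fc", organisms=None):
    """Hypothetical helper: {organism: (num_ratios, median_fc, std_fc)} over finite log2 FCs."""
    if organisms is not None:
        df = df[df[organism_col].isin(organisms)]  # keep only the organisms to plot
    stats = {}
    for organism, sub in df.groupby(organism_col):
        fcs = sub[log2fc_col].to_numpy(dtype=float)
        fcs = fcs[np.isfinite(fcs)]  # drop NaN and +/-inf ratios
        # np.std uses ddof=0, matching np.nanstd's default in the original
        stats[organism] = (int(fcs.size), float(np.median(fcs)), float(np.std(fcs)))
    return stats


def format_stats(organism, num_ratios, median_fc, std_fc):
    # ":.2" = two significant digits (general format), as in the original f-string
    return f"{organism} num:{num_ratios} median_FC:{median_fc:.2} STD:{std_fc:.2}"
```

Filtering once up front and grouping once avoids re-scanning the full table for every organism, which is what the per-organism boolean subtables in the original do.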
2 changes: 1 addition & 1 deletion misc/check_version.sh
@@ -1,7 +1,7 @@
# TODO remove with old release workflow
current_version=$(grep "__version__" ../directlfq/__init__.py | cut -f3 -d ' ' | sed 's/"//g')
current_version_as_regex=$(echo $current_version | sed 's/\./\\./g')
conda create -n version_check python=3.8 pip=20.1 -y
conda create -n version_check python=3.9 pip=20.1 -y
conda activate version_check
set +e
already_on_pypi=$(pip install directlfq== 2>&1 | grep -c "$current_version_as_regex")
2 changes: 1 addition & 1 deletion misc/loose_pip_install.sh
@@ -1,4 +1,4 @@
conda create -n directlfq python=3.8 -y
conda create -n directlfq python=3.9 -y
conda activate directlfq
pip install -e '../.[development-stable, gui]'
directlfq
2 changes: 1 addition & 1 deletion misc/stable_pip_install.sh
@@ -1,4 +1,4 @@
conda create -n directlfq python=3.8 -y
conda create -n directlfq python=3.9 -y
conda activate directlfq
pip install -e '../.[stable,development-stable, gui-stable]'
directlfq