seaborn-v0_8-dark issue persisting in v0.1.8, attempts to re-run without re-modelling yields different passed variants #28

Open
rachwo opened this issue Jul 7, 2024 · 3 comments


rachwo commented Jul 7, 2024

Hi there!

After running cellsnp-lite, I attempted to run mquad using the following:

mquad \
-c /cellsnplite_allsamples/sample1 \
-o /outputfolder \
-p 20 \
--minDP 5

but I get this error:

Traceback (most recent call last):
  File "/hostservername/software/linux-x86_64-centos7/Anaconda3-4.10.1/lib/python3.8/site-packages/matplotlib/style/core.py", line 121, in use
    rc = rc_params_from_file(style, use_default_template=False)
  File "/hostservername/software/linux-x86_64-centos7/Anaconda3-4.10.1/lib/python3.8/site-packages/matplotlib/__init__.py", line 883, in rc_params_from_file
    config_from_file = _rc_params_in_file(fname, fail_on_error=fail_on_error)
  File "/hostservername/software/linux-x86_64-centos7/Anaconda3-4.10.1/lib/python3.8/site-packages/matplotlib/__init__.py", line 812, in _rc_params_in_file
    with _open_file_or_url(fname) as fd:
  File "/hostservername/software/linux-x86_64-centos7/Anaconda3-4.10.1/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/hostservername/software/linux-x86_64-centos7/Anaconda3-4.10.1/lib/python3.8/site-packages/matplotlib/__init__.py", line 790, in _open_file_or_url
    with open(fname, encoding=encoding) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'seaborn-v0_8-dark'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/myusername/.local/bin/mquad", line 8, in <module>
    sys.exit(main())
  File "/home/myusername/.local/lib/python3.8/site-packages/mquad/mquad_CLI.py", line 144, in main
    best_ad, best_dp = mdphd.selectInformativeVariants(min_cells = minCell, out_dir = out_dir, tenx_cutoff=cutoff)
  File "/home/myusername/.local/lib/python3.8/site-packages/mquad/mquad_batch_mixbin.py", line 242, in selectInformativeVariants
    plt.style.use('seaborn-v0_8-dark')
  File "/hostservername/software/linux-x86_64-centos7/Anaconda3-4.10.1/lib/python3.8/site-packages/matplotlib/style/core.py", line 124, in use
    raise IOError(
OSError: 'seaborn-v0_8-dark' not found in the style library and input is not a valid URL or path; see `style.available` for list of available styles

The output directory contained the following files:

  1. passed_variant_names.txt
  2. deltaBIC_cdf.pdf
  3. debug_unsorted_BIC_params.csv
  4. BIC_params.csv

The passed_dp.mtx, passed_ad.mtx, and top variants heatmap.pdf files were missing, so I attempted to re-run without re-fitting the model using:

mquad \
-c /cellsnplite_allsamples/sample1 \
-o /new_output_folder \
--BICparams /outputfolder/debug_unsorted_BIC_params.csv

This seemed to work, and it generated:

  1. passed_dp.mtx
  2. passed_ad.mtx
  3. top variants heatmap.pdf
  4. another passed_variant_names.txt file
  5. another deltaBIC_cdf.pdf

However, I noticed that the passed_variant_names.txt file contains a different list of variants than the first list. Why might that be? The variants in the top variants heatmap pdf match the variants in the passed_variant_names.txt file generated from the re-run attempt, but when I look at their scores* they seem to be incorrect. Is this expected behaviour?

*(I am judging this from the BIC_params.csv output. The variants in the passed_variant_names.txt from the initial run had higher num_cells and deltaBIC, whereas the passed variant names from the re-run pointed to variants with deltaBIC <= 0 and lower num_cells.)
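
For anyone wanting to make the same comparison reproducibly, here is a minimal sketch that diffs two passed-variant lists and attaches the scores from a BIC parameters table. It assumes passed_variant_names.txt holds one variant name per line and that the CSV has `variant_name`, `deltaBIC`, and `num_cells` columns; the helper names (`read_variant_names`, `load_bic_table`, `compare_runs`) are hypothetical and not part of MQuad.

```python
import csv


def read_variant_names(path):
    """Read one variant name per line, skipping blank lines (assumed layout)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def load_bic_table(path, key="variant_name"):
    """Map each variant name to its CSV row (a dict of strings)."""
    with open(path, newline="") as handle:
        return {row[key]: row for row in csv.DictReader(handle)}


def compare_runs(first_names, rerun_names, bic_rows,
                 columns=("deltaBIC", "num_cells")):
    """Return shared variants and those unique to each run, with scores attached."""
    first, rerun = set(first_names), set(rerun_names)

    def annotate(names):
        # Missing rows or columns yield None rather than raising.
        return [
            (name, {col: bic_rows.get(name, {}).get(col) for col in columns})
            for name in sorted(names)
        ]

    return {
        "shared": sorted(first & rerun),
        "only_first": annotate(first - rerun),
        "only_rerun": annotate(rerun - first),
    }
```

With the paths from this post, the inputs would be `read_variant_names("/outputfolder/passed_variant_names.txt")`, `read_variant_names("/new_output_folder/passed_variant_names.txt")`, and `load_bic_table("/outputfolder/BIC_params.csv")`. Variants listed under `only_rerun` with a deltaBIC at or below zero would correspond to the mismatch described above.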

Thanks in advance!

@rachwo rachwo changed the title seaborn-v0_8-dark issue persisting in v0.1.8 seaborn-v0_8-dark issue persisting in v0.1.8, attempts to re-run without re-modelling yields different passed variants Jul 8, 2024
aaronkwc (Collaborator) commented Jul 9, 2024

Hi @rachwo ,

Thanks for reporting this issue.

For the first (matplotlib) issue, can you please run the following in a Python session and check whether your matplotlib version is up to date?

>>> import matplotlib.pyplot as plt
>>> print(plt.style.available)
['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-nogrid', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-v0_8', 'seaborn-v0_8-bright', 'seaborn-v0_8-colorblind', 'seaborn-v0_8-dark', 'seaborn-v0_8-dark-palette', 'seaborn-v0_8-darkgrid', 'seaborn-v0_8-deep', 'seaborn-v0_8-muted', 'seaborn-v0_8-notebook', 'seaborn-v0_8-paper', 'seaborn-v0_8-pastel', 'seaborn-v0_8-poster', 'seaborn-v0_8-talk', 'seaborn-v0_8-ticks', 'seaborn-v0_8-white', 'seaborn-v0_8-whitegrid', 'tableau-colorblind10']

The v0.1.8 fix was motivated by a matplotlib update in which the style names changed, so the old style name is no longer available. You can check your version with:

>>> import matplotlib
>>> matplotlib.__version__
'3.8.0'
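
As a hedged aside, a script that has to run on both older and newer matplotlib releases could pick the first style name that is actually installed rather than hard-coding one. The sketch below is not MQuad's code: `pick_style` and `apply_first_available_style` are hypothetical helpers, and `seaborn-dark` is assumed to be the pre-rename spelling of the style.

```python
def pick_style(available, candidates, fallback="default"):
    """Return the first candidate present in `available`, else `fallback`."""
    installed = set(available)
    for name in candidates:
        if name in installed:
            return name
    return fallback


def apply_first_available_style(candidates=("seaborn-v0_8-dark", "seaborn-dark")):
    """Apply the first installed candidate style and return its name."""
    # Imported lazily so pick_style stays usable without matplotlib.
    import matplotlib.pyplot as plt

    style = pick_style(plt.style.available, candidates)
    plt.style.use(style)
    return style
```

Calling `apply_first_available_style()` before plotting would fall back to matplotlib's `default` style instead of raising the OSError shown in the traceback. Upgrading matplotlib remains the simpler fix.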

The second issue is definitely not expected behaviour and will need some debugging. Can you please check whether the new deltaBIC_cdf.pdf looks anything like the one from your previous run? A few different things could be going wrong, so I want to first check whether the knee point is behaving normally.

Thanks for your patience and sorry for the bug!

rachwo (Author) commented Jul 9, 2024

Thanks for the response!

My matplotlib version was older. I've since updated Python (v3.11) and matplotlib (3.9.1), and I can now generate all expected output files (BIC_params.csv, debug_unsorted_BIC_params.csv, passed_ad.mtx, passed_dp.mtx, deltaBIC_cdf.pdf, top variants heatmap.pdf, and passed_variant_names.txt) using this code:

# FIRST-RUN:
mquad \
-c /cellsnplite_allsamples/sample1 \
-o /outputfolder2 \
-p 20 \
--minDP 5

However, this time, there are more variants in the "passed_variant_names.txt" than what had been generated in my very first attempt (as in my initial post). As a second test, I attempted to rerun without remodelling using the following code:

# RE-RUN:
mquad \
-c /cellsnplite_allsamples/sample1 \
-o /new_output_folder2 \
--BICparams /outputfolder2/debug_unsorted_BIC_params.csv

Again, the variant names differ from those generated by the first run directly above.

I've attached the output top variant heatmaps and deltaBIC cdf plots from both (first run, plus rerun). Hopefully this is helpful to you, but let me know if you need any other info or clarification.

Unrelatedly, are the mitochondrial variant names in the variant_name column of debug_unsorted_BIC_params.csv named such that MT_130_G_C refers to #CHROM=MT, POS=130 of the cellSNP.base.vcf.gz file? I wanted to try some custom plots but want to confirm that this is the naming convention!

Thanks again for your help.

Generated from first run (first codeblock in this post):
deltaBIC_cdf_FIRST-RUN.pdf
top variants heatmap_FIRST-RUN.pdf
passed_variant_names_FIRST-RUN.txt
BIC_params.csv
debug_unsorted_BIC_params.csv

Generated from the re-run (second codeblock in this post):
deltaBIC_cdf_RE-RUN.pdf
top variants heatmap_RE-RUN.pdf
passed_variant_names_RE-RUN.txt

aaronkwc (Collaborator) commented

Thank you so much for providing me with the files, they were very helpful in debugging.

I have pushed a hotfix to the dev version of MQuad. Can you please try installing directly from the repo and see if it solves the problem? The output of your second run should now be the same as your first (I am not entirely sure, but that's what the dev branch is for).

Also, you are right about the nomenclature of the variants. The names in MQuad correspond to your cellSNP output. However, if you did not provide a ref genome to cellSNP, the allele with the largest count is considered REF, which can cause issues at positions where REF and ALT end up reversed. I would suggest checking for this if you are making custom plots. It does not affect the modelling in MQuad, though.
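
For custom plots, a rough sketch of how that check could look: split each name into its parts and compare the alleles against a table of trusted REF/ALT calls. This assumes names always follow the `CHROM_POS_REF_ALT` pattern discussed above and that the comparison VCF has one record per position (a later record at the same position overwrites an earlier one). The function names are hypothetical, not MQuad or cellSNP API.

```python
def parse_variant_name(name):
    """Split 'MT_130_G_C' into (chrom, pos, ref, alt)."""
    # rsplit keeps any underscores inside the chromosome name intact.
    chrom, pos, ref, alt = name.rsplit("_", 3)
    return chrom, int(pos), ref, alt


def vcf_alleles(lines):
    """Map (chrom, pos) -> (ref, alt) from VCF text lines, skipping headers."""
    alleles = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        alleles[(fields[0], int(fields[1]))] = (fields[3], fields[4])
    return alleles


def allele_status(name, alleles):
    """Classify a variant name as 'match', 'swapped', 'mismatch', or 'missing'."""
    chrom, pos, ref, alt = parse_variant_name(name)
    if (chrom, pos) not in alleles:
        return "missing"
    vcf_ref, vcf_alt = alleles[(chrom, pos)]
    if (ref, alt) == (vcf_ref, vcf_alt):
        return "match"
    if (ref, alt) == (vcf_alt, vcf_ref):
        return "swapped"
    return "mismatch"
```

A gzipped VCF can be fed in with `gzip.open(path, "rt")` from the standard library. Checking names against cellSNP.base.vcf.gz itself should only confirm the naming convention; spotting reversed alleles requires a VCF whose REF column comes from the actual reference genome.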
