MUMmer4/MUMmer3 versions reported differently/SLURM job limits #325

Closed
genomesandMGEs opened this issue Aug 30, 2021 · 12 comments

Comments

@genomesandMGEs

genomesandMGEs commented Aug 30, 2021

Summary:

There seems to be a problem with pyani anim

Description:

When running pyani anim --scheduler SGE -i . -o genomes_ANIm on my collection of ~2k genomes, I get an AttributeError (please see below). I tried to run this locally, without the --scheduler option, and I get the same error. Thanks for looking into this!

Current Output:

Traceback (most recent call last):
  File "/home/jbotelho/anaconda3/bin/pyani", line 11, in <module>
    load_entry_point('pyani', 'console_scripts', 'pyani')()
  File "/home/jbotelho/pyani/pyani/scripts/pyani_script.py", line 117, in run_main
    returnval = args.func(args)
  File "/home/jbotelho/pyani/pyani/scripts/subcommands/subcmd_anim.py", line 168, in subcmd_anim
    nucmer_version = anim.get_version(args.nucmer_exe)
  File "/home/jbotelho/pyani/pyani/anim.py", line 110, in get_version
    version = match.group() # type: ignore
AttributeError: 'NoneType' object has no attribute 'group'

pyani Version:

0.3.0

installed dependencies

System information
Platorm==Linux-4.4.0-19041-Microsoft-x86_64-with-debian-buster-sid
Python==3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0]
Installed pyani Python dependendencies...
Pillow==6.2.0 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
biopython==1.78 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
matplotlib==3.2.1 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
namedlist==1.8 (/home/jbotelho/anaconda3/lib/python3.7/site-packages/namedlist-1.8-py3.7.egg)
networkx==2.3 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
numpy==1.17.2 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
openpyxl==3.0.0 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
pandas==0.25.1 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
scipy==1.6.3 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
seaborn==0.9.0 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
sqlalchemy==1.3.9 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
tqdm==4.36.1 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
Installed pyani development dependendencies...
bandit==Not Installed (-)
black==Not Installed (-)
codecov==Not Installed (-)
coverage==Not Installed (-)
doc8==Not Installed (-)
flake8==Not Installed (-)
jinja2==2.10.3 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
mypy==Not Installed (-)
pydocstyle==Not Installed (-)
pylint==2.4.2 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
pytest==5.2.1 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
pytest-cov==Not Installed (-)
sphinx==2.2.0 (/home/jbotelho/anaconda3/lib/python3.7/site-packages)
Installed pyani pip-install dependendencies...
pre-commit==Not Installed (-)
pytest-ordering==Not Installed (-)
sphinx-rtd-theme==Not Installed (-)
Installed third-party tool versions...
blast+==Linux_2.9.0+
Traceback (most recent call last):
  File "/home/jbotelho/anaconda3/bin/pyani", line 11, in <module>
    load_entry_point('pyani', 'console_scripts', 'pyani')()
  File "/home/jbotelho/pyani/pyani/scripts/pyani_script.py", line 117, in run_main
    returnval = args.func(args)
  File "/home/jbotelho/pyani/pyani/scripts/subcommands/subcmd_listdeps.py", line 86, in subcmd_listdeps
    for tool, version in get_tool_versions():
  File "/home/jbotelho/pyani/pyani/dependencies.py", line 117, in get_tool_versions
    yield (name, func())
  File "/home/jbotelho/pyani/pyani/anim.py", line 110, in get_version
    version = match.group() # type: ignore
AttributeError: 'NoneType' object has no attribute 'group'

Python Version:

3.7.4

Operating System:

WSL

@baileythegreen
Contributor

Hi, @genomesandMGEs . Thanks for your interest in pyani.

Based on the error, and the dependency output you have provided, I believe the issue is that nucmer is either not installed, or that pyani cannot find it. nucmer is a required dependency for using pyani anim.

If you are unsure whether you have installed nucmer, can you please try running:

which nucmer

If this does not return a file location, then nucmer is either not installed or pyani is looking for it in the wrong place. If you know that you have installed it, then you can specify the location with the --nucmer_exe option.
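The same lookup can be scripted. The following is a minimal sketch, not pyani's own code: `locate_tool` is a hypothetical helper that wraps the standard-library `shutil.which`, and its `explicit_path` argument stands in for a value you might otherwise pass to --nucmer_exe.

```python
import shutil
from typing import Optional


def locate_tool(name: str, explicit_path: Optional[str] = None) -> Optional[str]:
    """Return the path to an executable, or None if it cannot be found.

    Hypothetical helper: if explicit_path is given it is checked directly,
    otherwise the bare command name is searched for on PATH.
    """
    candidate = explicit_path if explicit_path else name
    # shutil.which returns None when nothing executable matches.
    return shutil.which(candidate)


if __name__ == "__main__":
    path = locate_tool("nucmer")
    if path is None:
        print("nucmer not found on PATH; install it or pass its location explicitly")
    else:
        print(f"nucmer found at {path}")
```

Note that finding the executable only shows that it exists; it does not guarantee that a later version check on its output will succeed.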

@genomesandMGEs
Author

Hi @baileythegreen, many thanks for the quick reply. I have nucmer installed (v4.0.0rc1); I installed mummer with conda. Running which nucmer returns /home/jbotelho/anaconda3/bin/nucmer.

I followed your suggestion and ran pyani anim --nucmer_exe /home/jbotelho/anaconda3/bin/nucmer --scheduler SGE -i . -o genomes_ANIm, but I get the same error:

Traceback (most recent call last):
  File "/home/jbotelho/anaconda3/bin/pyani", line 11, in <module>
    load_entry_point('pyani', 'console_scripts', 'pyani')()
  File "/home/jbotelho/pyani/pyani/scripts/pyani_script.py", line 117, in run_main
    returnval = args.func(args)
  File "/home/jbotelho/pyani/pyani/scripts/subcommands/subcmd_anim.py", line 168, in subcmd_anim
    nucmer_version = anim.get_version(args.nucmer_exe)
  File "/home/jbotelho/pyani/pyani/anim.py", line 110, in get_version
    version = match.group() # type: ignore
AttributeError: 'NoneType' object has no attribute 'group'

@widdowquinn
Owner

Hi @genomesandMGEs,

It looks like the issue might be the move to MUMmer4 - what's failing there is the check for the reported version number, which appears to be reported differently by the new MUMmer version.

If you're working in a conda environment, the quick solution would be to step back to MUMmer3 until that check gets updated in pyani. But that version check is something we need to see to.

Cheers,

L.
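The traceback shows `match.group()` being called on `None`, which is what a regular-expression search returns when the pattern does not match the tool's output. As a rough sketch of a more defensive check (this is not pyani's actual implementation; the pattern, the function names, and the sample strings are all assumptions for illustration), the match can be tested before it is used:

```python
import re
from typing import Optional

# Hypothetical pattern: a dotted version number with an optional
# rc/beta suffix, e.g. "3.1" or "4.0.0rc1".
VERSION_RE = re.compile(r"\d+\.\d+(?:\.\d+)*(?:(?:rc|beta)\d*)?")


def parse_nucmer_version(output: str) -> Optional[str]:
    """Extract a version string from tool output, or return None."""
    match = VERSION_RE.search(output)
    # Guard against a failed match instead of calling .group() on None.
    return match.group() if match else None


def describe_nucmer_version(output: str) -> str:
    """Turn tool output into a human-readable version report."""
    version = parse_nucmer_version(output)
    if version is None:
        return "nucmer version could not be determined"
    return f"nucmer {version}"


if __name__ == "__main__":
    # Illustrative strings only; they are not verified captures of
    # what either MUMmer release prints.
    for sample in ("NUCmer (NUCleotide MUMmer) version 3.1", "4.0.0rc1", ""):
        print(repr(sample), "->", describe_nucmer_version(sample))
```

Returning `None` and reporting it explicitly turns an opaque AttributeError into a message that points at the version check itself.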

@genomesandMGEs
Author

Hi @widdowquinn, thanks for the reply. So, I created a new conda env, cloned the repository, and installed pyani with pip install -e . to get pyani 0.3.0-alpha. I then installed mummer and blast with conda, and now I have nucmer version 3.1. Running pyani anim, I now get a new error: sh: 1: qsub: not found. I can't find a good way to install qsub on my WSL. Do you recommend a procedure?

@widdowquinn
Owner

I see what's happening there - the check for dependencies is baulking because you don't have the SGE scheduler installed locally. I think we were on top of this with #232 and #276 but the PR hasn't been merged, yet. @baileythegreen - might you please be able to take a quick look at the conflict?

@baileythegreen
Contributor

baileythegreen commented Aug 31, 2021

@widdowquinn, I actually think the issue is a bit different.

@genomesandMGEs To clarify: When you ran it in the new environment with nucmer version 3.1, were you using the --scheduler SGE option? Unless you've specifically set up an SGE scheduler on your WSL, this is not going to work. qsub is the command used to submit a job to an SGE scheduler, so using this option is the only reason pyani should be invoking qsub.

The scheduler option is primarily intended for use on compute clusters, so if you are trying to use pyani locally (on a laptop or desktop), that option is probably not needed.

(assuming you just ran the command you sent originally)

Try running this:

pyani anim -i . -o genomes_ANIm

If you have a database file with the default name already created, this should work. (If you don't, it'll give you a pretty arcane SQL error.)

Let me know if that works.
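For anyone scripting this, a small sketch of the same decision follows. It assumes only the flags already quoted in this thread (-i, -o, --scheduler SGE); `sge_available` and `build_anim_command` are hypothetical helper names, not part of pyani.

```python
import shutil
from typing import List


def sge_available() -> bool:
    """Report whether qsub, the SGE submission command, is on PATH."""
    return shutil.which("qsub") is not None


def build_anim_command(indir: str, outdir: str, use_sge: bool) -> List[str]:
    """Assemble a pyani anim command line, adding the scheduler flag
    only when it has been requested."""
    cmd = ["pyani", "anim", "-i", indir, "-o", outdir]
    if use_sge:
        # Insert the scheduler option right after the subcommand.
        cmd[2:2] = ["--scheduler", "SGE"]
    return cmd


if __name__ == "__main__":
    # Fall back to a plain local run when no SGE client is installed.
    command = build_anim_command(".", "genomes_ANIm", use_sge=sge_available())
    print(" ".join(command))
```

On a machine without qsub this prints the plain local command shown above.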

@genomesandMGEs
Author

Thanks you two. Yes @baileythegreen, I ran with the --scheduler SGE option. Some months ago I tried to run a calculation with pyani v0.2 on my work cluster, but there was an incompatibility with SLURM (please see #267). On my HPC work system, one can use only Slurm on all clusters, and thus it is unfortunately not possible to make the SGE scheduler available. So, I copied the genomes to my laptop, and I'm trying to run these with the SGE scheduler. I have around 2k genomes, so running this locally on my laptop would take forever and would require a lot of RAM. In parallel, I'm running a pyani anim calculation on my work cluster, with the --scheduler SGE option disabled.

I'm now running the command you suggested, as well as the command I ran before with the --scheduler SGE option, and I can see the two progress bars. For the command with the --scheduler SGE option, though, I got the qsub error. However, I don't think running anim locally for ~2k bacterial genomes is feasible. Is there a workaround to make SGE functional on my WSL machine?

@baileythegreen
Contributor

@genomesandMGEs, you're right; running it locally for ~2k bacterial genomes probably isn't feasible.

As far as using SGE on your WSL machine, this isn't a question of a workaround to make it functional, it's a question of installing the SGE scheduler. This is theoretically possible, but is not something I have any experience with. There are various sites with instructions, though: search results.

However, I must stress that installing a scheduler on a local machine is not going to help. A scheduler's purpose is to efficiently assign computing tasks to individual nodes in a large computing cluster, such as a supercomputer. If you don't have a cluster, a scheduler can't do anything to help speed computation. It just allocates resources.

I'm sorry I don't have better news for you.

@widdowquinn
Owner

widdowquinn commented Aug 31, 2021

Glad to be of help.

There are a few different issues here, so I'll try to take them in turn.

  1. The original errors have been resolved (step down nucmer to v3; remove --scheduler SGE argument).

  2. pyani does not yet have SLURM support. We are working on it, and do have a version in development which will pass jobs to SLURM. However, there is still issue (3)…

  3. If you are comparing 2k genomes, then pyani has to run ((2k ^ 2 - 2k) / 2) ≈ 2m pairwise comparisons. It also has to run 2m filtering operations. That makes a total of 4m jobs. The way SLURM works, each task within an array counts as one "job", and the SLURM scheduler has an upper limit of jobs in the queue (MaxJobCount) which is typically set much lower than 4m - usually lower than 1m. That is - the server configuration imposes a limit on the number of pairwise comparisons you can schedule with pyani. That can't be fixed in code - it's about the SLURM configuration. We're looking at ways to deal with this so larger tasks can still be run, but there will not be a solution implemented in pyani in the short term because it will require significant refactoring of the back-end code. At the moment, you would need to convince your cluster managers to increase the number of jobs in the queue to >4m - as this affects cluster efficiency and the ability to track jobs in the database, I expect you may meet resistance. [NOTE: in previous runs on a different - SGE - cluster, I did not encounter this job limit in the queue, and so was able to compare sets of larger than 2k genomes.]

  4. @baileythegreen has already dealt with this: schedulers like SGE/SLURM will not speed up jobs on your laptop. You will probably not get faster on a single machine than pyani's multiprocessing will give you.
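The arithmetic in point 3 can be checked with a few lines of Python. This sketch uses the standard n(n-1)/2 count of unordered pairs and assumes, as described above, one comparison job plus one filtering job per pair. The function names are illustrative, and the 1m queue limit is only the ballpark figure mentioned in this thread, not a value read from any real cluster.

```python
def pairwise_comparisons(n_genomes: int) -> int:
    """Number of unordered genome pairs: n * (n - 1) / 2."""
    return n_genomes * (n_genomes - 1) // 2


def total_jobs(n_genomes: int) -> int:
    """One comparison job plus one filtering job per pair."""
    return 2 * pairwise_comparisons(n_genomes)


def max_genomes_for_limit(max_job_count: int) -> int:
    """Largest genome count whose total jobs fit under a queue limit."""
    n = 1
    while total_jobs(n + 1) <= max_job_count:
        n += 1
    return n


if __name__ == "__main__":
    n = 2000
    print(f"{n} genomes: {pairwise_comparisons(n):,} comparisons, "
          f"{total_jobs(n):,} jobs in total")
    # Assumed limit of 1,000,000 queued jobs, for illustration only.
    limit = 1_000_000
    print(f"a queue limit of {limit:,} jobs allows at most "
          f"{max_genomes_for_limit(limit):,} genomes")
```

Because the job count grows quadratically, a queue limit on the order of 1m caps a one-job-per-operation run at roughly a thousand genomes.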

@genomesandMGEs
Author

Many thanks to you both for trying to help me with this. @baileythegreen pyani anim is now running on my work cluster (slurm system), so keeping my fingers crossed. In the past I had problems running pyani with version 0.2, let's see if it works with version 0.3. If I reach the upper limit of jobs @widdowquinn mentioned, then I'll try to convince the cluster managers to increase the number of jobs in the queue.

@widdowquinn
Owner

From conversations here, it seems like - to avoid issues with overloading the SLURM queue - the recommendation is to have each array task handle multiple jobs, and submit a single array job (which will be limited, e.g. to 10k tasks). The individual tasks run longer (because there are more comparisons per task), but there is then less overhead on the scheduler.

With an array of 10k tasks, and ≈4m comparisons, that would mean 400 comparisons per task.
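A rough sketch of that batching model is shown below. It is an illustration under stated assumptions rather than the planned pyani backend: `chunk_size` and `pairs_for_task` are hypothetical names, and the zero-based task index would in practice come from the scheduler (SLURM exposes it to each array task as the SLURM_ARRAY_TASK_ID environment variable).

```python
import math
from itertools import combinations, islice
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def chunk_size(total_jobs: int, n_tasks: int) -> int:
    """Jobs each array task must handle so n_tasks cover everything."""
    return math.ceil(total_jobs / n_tasks)


def pairs_for_task(genomes: Sequence[str], task_id: int, size: int) -> List[Pair]:
    """Return the slice of genome pairs assigned to one array task.

    Pairs are enumerated in a fixed order, so every task can compute
    its own share independently from its zero-based task_id.
    """
    start = task_id * size
    return list(islice(combinations(genomes, 2), start, start + size))


if __name__ == "__main__":
    # The figures from the comment above: ~4m jobs over 10k tasks.
    print(chunk_size(4_000_000, 10_000), "jobs per task")

    # Toy example: 4 genomes give 6 pairs, split into batches of 4.
    genomes = ["A", "B", "C", "D"]
    for task_id in range(2):
        print(task_id, pairs_for_task(genomes, task_id, 4))
```

Enumerating pairs lazily with `islice` means no task has to hold the full list of pairs in memory, although tasks with high indices still iterate past the earlier pairs to reach their own.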

This is likely the model we will go with for the backend (as the total number of comparisons gets large). It might even be convenient for our plans regarding asynchronous population of the database in version 3.

@baileythegreen
Contributor

@widdowquinn Please advise as to whether we want to keep this issue open, rename it, create a new issue, et cetera given that the current topic of discussion is no longer at all related to the issue title.

@widdowquinn changed the title from "AttributeError: 'NoneType' object has no attribute 'group'" to "MUMmer4/MUMmer3 versions reported differently/SLURM job limits" on Sep 6, 2021
3 participants