"Clustering using HDBSCAN running" step does not complete #5
Thanks for the issue. I will look into this. I may update the Dockerfile accordingly and reply to you in a few days' time.
Hi @anuradhawick! We are having the same problem clustering using HDBSCAN, but we are using conda instead of docker. Have you had time to look into the issue yet? Thank you very much in advance!
Hi Anjuli, thanks for the issue. Can you tell me how you installed the packages using conda? I need to know the command used to install hdbscan. Thanks.
Hi @anuradhawick! Thank you for your reply! We used the following command, as suggested in the README.md.
Thank you very much for looking into it. We are eager to use your tool on our data! Cheers,
Hi @4njul1 and @crastr, could you please try to install HDBSCAN using the following command?
There are issues in the conda version, and it is not the latest release. Let me know if this helps. ~Anuradha
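The exact command was not preserved in this thread. A typical way to pull the latest HDBSCAN release from PyPI inside an active conda environment (an illustrative assumption, not the maintainer's verbatim command) would be:

```shell
# Upgrade hdbscan from PyPI inside the active conda environment;
# the conda package lagged behind the PyPI release at the time.
pip install --upgrade hdbscan
```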
Hi @anuradhawick, thanks a lot for looking into this. I tried to upgrade HDBSCAN with the command you posted, and it works! Thank you very much for your help! Best wishes,
@4njul1 fantastic. Please let me know how the tool performs, along with any artefacts and feedback, when you have time. Thanks.
Hi @anuradhawick!
We managed to launch LRBinner in docker, but the "Clustering using HDBSCAN running" step ended with the following error.
docker run --rm -it --gpus '"device=3"' -v $(pwd):$(pwd) -u $(id -u):$(id -g) anuradhawick/lrbinner contigs -r $PWD/c1.fq -c $PWD/c1.fasta --k-size 4 --cuda --output $PWD/result

Output:
2021-12-10 17:35:54,303 - INFO - Command /usr/LRBinner/LRBinner contigs -r /mnt/40_tb_10/work/alex/other_labs/andronov/andornov_metag_2021/complete_polished/c1/c1.fq -c /mnt/40_tb_10/work/alex/other_labs/andronov/andornov_metag_2021/complete_polished/c1/c1.fasta --k-size 4 -t 40 --cuda --output /mnt/40_tb_10/work/alex/other_labs/andronov/andornov_metag_2021/complete_polished/c1/result --resume
2021-12-10 17:35:57,360 - INFO - CUDA found in system
2021-12-10 17:35:57,362 - INFO - Resuming the program from previous checkpoints
2021-12-10 17:35:57,363 - INFO - Loading contig lengths
2021-12-10 17:35:57,485 - INFO - Loading marker genes from previous computations
2021-12-10 17:38:00,783 - INFO - Contigs already split
2021-12-10 17:38:00,783 - INFO - 15-mer counting already performed
2021-12-10 17:38:00,783 - INFO - K-mer vectors already computed
2021-12-10 17:38:00,783 - INFO - Coverage vectors already computed
2021-12-10 17:38:01,196 - INFO - Numpy arrays already computed
2021-12-10 17:38:01,196 - INFO - VAE already trained
2021-12-10 17:38:01,248 - INFO - Clustering using HDBSCAN running
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 407, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "/opt/conda/lib/python3.9/multiprocessing/queues.py", line 122, in get
return _ForkingPickler.loads(res)
File "sklearn/neighbors/_binary_tree.pxi", line 1057, in sklearn.neighbors._kd_tree.BinaryTree.__setstate__
File "sklearn/neighbors/_binary_tree.pxi", line 999, in sklearn.neighbors._kd_tree.BinaryTree._update_memviews
File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 349, in View.MemoryView.memoryview.cinit
ValueError: buffer source array is read-only
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/LRBinner/LRBinner", line 197, in <module>
main()
File "/usr/LRBinner/LRBinner", line 179, in main
pipelines.run_contig_binning(args)
File "/usr/LRBinner/mbcclr_utils/pipelines.py", line 242, in run_contig_binning
cluster_utils.perform_contig_binning_HDBSCAN(
File "/usr/LRBinner/mbcclr_utils/cluster_utils.py", line 494, in perform_contig_binning_HDBSCAN
labels = HDBSCAN(min_cluster_size=250).fit_predict(latent)
File "/opt/conda/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 941, in fit_predict
self.fit(X)
File "/opt/conda/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 919, in fit
self.min_spanning_tree) = hdbscan(X, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 610, in hdbscan
(single_linkage_tree, result_min_span_tree) = memory.cache(
File "/opt/conda/lib/python3.9/site-packages/joblib/memory.py", line 349, in __call__
return self.func(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 275, in _hdbscan_boruvka_kdtree
alg = KDTreeBoruvkaAlgorithm(tree, min_samples, metric=metric,
File "hdbscan/_hdbscan_boruvka.pyx", line 392, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm.__init__
File "hdbscan/_hdbscan_boruvka.pyx", line 426, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm._compute_bounds
File "/opt/conda/lib/python3.9/site-packages/joblib/parallel.py", line 1056, in __call__
self.retrieve()
File "/opt/conda/lib/python3.9/site-packages/joblib/parallel.py", line 935, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/opt/conda/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
return future.result(timeout=timeout)
File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 445, in result
return self.__get_result()
File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
raise self._exception
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
The same command runs perfectly on 10% of the contigs and 10% of the reads.
A quick search suggests that the problem may be related to the number of rows (as in https://githubmemory.com/repo/scikit-learn/scikit-learn/issues/21228).
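The "buffer source array is read-only" error typically arises because joblib memory-maps large inputs as read-only when shipping them to worker processes, and the Cython typed memoryviews inside scikit-learn's KDTree refuse a read-only buffer. A minimal sketch of the symptom, and of the usual workaround of passing a writable copy to `fit_predict`, assuming only NumPy is installed:

```python
import numpy as np

# Simulate what a joblib worker may receive for large inputs:
# a read-only view of the latent vectors (as with a read-only memory map).
latent = np.arange(12.0).reshape(4, 3)
latent.setflags(write=False)

try:
    latent[0, 0] = 1.0   # any in-place write on the read-only array fails
except ValueError:
    pass                 # "assignment destination is read-only"

# Workaround sketch: hand HDBSCAN a plain, writable, C-contiguous copy,
# e.g. HDBSCAN(min_cluster_size=250).fit_predict(np.array(latent))
writable = np.array(latent)  # np.array() copies, so the result is writable
writable[0, 0] = 1.0         # succeeds on the copy
print(writable.flags.writeable)  # → True
```

Another commonly reported mitigation for this hdbscan failure is disabling the parallel core-distance computation with `core_dist_n_jobs=1`, which avoids serializing the tree to worker processes altogether.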
Thanks in advance!
Alexey