I encountered an issue when running the midas2 merge_snps command. The process has been running for several days without noticeable progress. The log file seems to show repeated "start" and "finish" messages for various species, but many of the output folders for certain species are either empty or incomplete.
Steps to Reproduce:
First, I generated the SNP information for 200 samples with midas2, using the following script:
Expected Behavior:
The merge_snps command should complete within a reasonable time frame and produce merged SNP files for all species without leaving empty or incomplete folders.
System Information:
MIDAS2 version: [MIDAS2]
Database: UHGG
Number of cores: 120
Chunk size: 200,000
Operating system: Ubuntu 22
Log File Excerpts:
Here are some excerpts from the log file for reference:
Could you please investigate this issue? Any guidance on how to resolve it would be greatly appreciated. I'm particularly concerned about the empty species folders and the long runtime without progress.
Thank you for your help!
Thank you for providing the detailed log. The merge_snps process for 200 samples should not take 3 days. It seems like the issue might be related to memory limitations or CPU thrashing.
Could you confirm the total memory available on your machine? This task is memory-intensive, and if progress has stalled for 3 days, it’s possible the machine was overwhelmed. The call_and_write_population_snps step loads chunk pileups from all samples into memory to calculate population SNPs. The more cores you use, the more memory your system needs. For 200 samples, I recommend using a machine with at least 120 GB of memory and 16 cores (using --num_cores 16), while keeping the default chunk size. If your machine has more memory, you can try increasing to 32 cores.
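To make the memory-versus-cores trade-off concrete, here is a minimal Python sketch that reads the machine's total memory and derives a core count from it. It assumes memory use grows roughly linearly with the number of cores, at about 7.5 GB per core (extrapolated from the 120 GB for 16 cores figure above for 200 samples). That linear rule, the 32-core cap, and the helper names total_memory_gb and suggest_num_cores are illustrative assumptions, not part of MIDAS2.

```python
import os

# Rule of thumb taken from the recommendation above: ~120 GB for 16 cores
# with 200 samples. Treating this as linear per-core scaling is an assumption.
GB_PER_CORE = 120 / 16  # 7.5 GB per worker process


def total_memory_gb():
    """Return total physical memory in GB (works on Linux via sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    num_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * num_pages / 1024**3


def suggest_num_cores(mem_gb, gb_per_core=GB_PER_CORE, max_cores=32):
    """Largest core count whose estimated memory need fits in mem_gb.

    Always returns at least 1 and never more than max_cores.
    """
    cores = int(mem_gb // gb_per_core)
    return max(1, min(cores, max_cores))


if __name__ == "__main__":
    mem = total_memory_gb()
    print(f"Total memory: {mem:.0f} GB")
    print(f"Suggested --num_cores: {suggest_num_cores(mem)}")
```

The suggested value is only a starting point: watching actual memory use during the first few species is a more reliable guide than any fixed per-core estimate.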
A few additional notes:
Are you using vCPUs or physical CPUs?
The --chunk_size 200000 isn’t the default chunk size. I recommend running:
This will calculate the chunk information accordingly.
The empty species folders are created by MIDAS during the preprocessing phase before multiprocessing begins, so this is expected behavior and not a bug.
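Since empty folders are created up front, one way to gauge real progress is to list which species directories are still empty. The following is a rough Python sketch under the assumption that each species has its own subdirectory directly under midas2_merge/snps/ (the path mentioned in the report). The helper name find_empty_species_dirs is hypothetical, and the sketch only checks for emptiness, not whether the files inside are complete.

```python
from pathlib import Path


def find_empty_species_dirs(snps_dir):
    """Return the names of subdirectories of snps_dir that contain no entries."""
    root = Path(snps_dir)
    empty = []
    for entry in sorted(root.iterdir()):
        # any() stops at the first child, so large directories stay cheap to check
        if entry.is_dir() and not any(entry.iterdir()):
            empty.append(entry.name)
    return empty


if __name__ == "__main__":
    # Output path taken from the report; adjust to your own run (assumption).
    snps_dir = Path("midas2_merge/snps")
    if snps_dir.is_dir():
        pending = find_empty_species_dirs(snps_dir)
        print(f"{len(pending)} species directories are still empty")
        for name in pending:
            print(name)
    else:
        print(f"{snps_dir} not found")
```

Running this periodically and comparing the counts shows whether species are still being completed or whether the run has genuinely stalled.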
Observed Behavior:
The process has been running for 3 days with no significant progress.
The log file shows repeated "start" and "finish" messages for accumulate_samples and call_and_write_population_snps.
Many species result directories in midas2_merge/snps/ are empty or contain only partial files.