MIDAS2 merge_snps Command Stuck with No Progress and Incomplete Output #145

Open
yejunbin opened this issue Oct 1, 2024 · 1 comment

@yejunbin

yejunbin commented Oct 1, 2024

Description:

I encountered an issue when running the midas2 merge_snps command. The process has been running for several days without noticeable progress. The log file seems to show repeated "start" and "finish" messages for various species, but many of the output folders for certain species are either empty or incomplete.

Steps to Reproduce:

  1. First, I generated per-sample SNP information for 200 samples with MIDAS2, using the following script:
midas2 run_species \
    --sample_name ${sample} \
    -1 cleandata/${sample}.R1.fq.gz \
    -2 cleandata/${sample}.R2.fq.gz \
    --midasdb_name uhgg \
    --midasdb_dir /Disk2/database/midas2/uhgg/ \
    --num_cores 16 \
    midas2

midas2 run_snps \
    --sample_name ${sample} \
    -1 cleandata/${sample}.R1.fq.gz \
    -2 cleandata/${sample}.R2.fq.gz \
    --midasdb_name uhgg \
    --midasdb_dir /Disk2/database/midas2/uhgg/ \
    --num_cores 16 \
    midas2
   
  2. Then, I ran the following midas2 merge_snps command:

echo -e "sample_name\tmidas_outdir" >> midas2_sample_list.txt

ls midas2/ | while read line; do
    if [ -f midas2/${line}/snps/snps_summary.tsv ]; then
        echo -e "$line\tmidas2" >> midas2_sample_list.txt
    fi
done

midas2 merge_snps \
    --samples_list midas2_sample_list.txt \
    --midasdb_name uhgg \
    --midasdb_dir /Disk2/database/midas2/uhgg/ \
    --num_cores 120 \
    --chunk_size 200000 --robust_chunk \
    --sample_counts 10 \
    midas2_merge

Observed Behavior:

  • The process has been running for 3 days with no significant progress.

  • The log file shows repeated messages of "start" and "finish" for accumulate_samples and call_and_write_population_snps, as shown below:

    1727460035.2:      MIDAS2::species_worker::102298--2::start call_and_write_population_snps
    1727460042.5:    MIDAS2::process::102538-1::finish snps_worker
    1727460042.5:    MIDAS2::process::102538--1::start collect_chunks
    1727460043.6:      MIDAS2::species_worker::100273--2::finish accumulate_samples
    1727460043.6:      MIDAS2::species_worker::100273--2::start call_and_write_population_snps
    1727460043.9:    MIDAS2::process::102538--1::finish collect_chunks
    1727460060.8:      MIDAS2::species_worker::101367--2::finish accumulate_samples
    1727460060.8:      MIDAS2::species_worker::101367--2::start call_and_write_population_snps
    ...
    
  • Many species result directories in midas2_merge/snps/ are empty or contain only partial files. For example:

    midas2_merge/snps/100078:
    100078.snps_depth.tsv.lz4  100078.snps_freqs.tsv.lz4  100078.snps_info.tsv.lz4
    
    midas2_merge/snps/100084: [empty]
    
    midas2_merge/snps/100087: [empty]
    
    midas2_merge/snps/100099:
    100099.snps_depth.tsv.lz4  100099.snps_freqs.tsv.lz4  100099.snps_info.tsv.lz4
    ...
    

Expected Behavior:

  • The merge_snps command should complete within a reasonable time frame and produce merged SNP files for all species without leaving empty or incomplete folders.

System Information:

  • MIDAS2 version: [MIDAS2]
  • Database: UHGG
  • Number of cores: 120
  • Chunk size: 200,000
  • Operating system: [ubuntu 22]

Log File Excerpts:

Here are some excerpts from the log file for reference:

1727460035.2:      MIDAS2::species_worker::102298--2::start call_and_write_population_snps
1727460042.5:    MIDAS2::process::102538-1::finish snps_worker
1727460042.5:    MIDAS2::process::102538--1::start collect_chunks
1727460043.6:      MIDAS2::species_worker::100273--2::finish accumulate_samples
1727460043.6:      MIDAS2::species_worker::100273--2::start call_and_write_population_snps
...
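
One way to see which species are still in flight is to pair up the "start" and "finish" lines in the log. The sketch below is not part of MIDAS2: `unfinished_workers` is a hypothetical helper, and it assumes every log line follows the `MIDAS2::<worker type>::<worker id>::<start|finish> <step>` layout seen in the excerpts above. The log file path is passed in as an argument rather than assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a MIDAS2 command).
# Prints the worker ids that logged "start <step>" with no later "finish <step>".
# Usage: unfinished_workers LOG_FILE STEP_NAME
unfinished_workers() {
    local log_file=$1 step=$2
    awk -v step="$step" '
        {
            # Split on "::" into: prefix, worker type, worker id, "<action> <step>"
            n = split($0, parts, "::")
            if (n < 4) next                # skip lines that are not worker events
            split(parts[4], act, " ")
            if (act[2] != step) next       # keep only the requested step
            if (act[1] == "start")  pending[parts[3]] = 1
            if (act[1] == "finish") delete pending[parts[3]]
        }
        END { for (id in pending) print id }
    ' "$log_file" | sort
}
```

Calling `unfinished_workers <log file> call_and_write_population_snps` a few minutes apart shows whether the set of pending species shrinks. If the same ids stay pending for hours, the run is likely stalled rather than slow.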

Request:

Could you please investigate this issue? Any guidance on how to resolve it would be greatly appreciated. I'm particularly concerned about the empty species folders and the long runtime without progress.

Thank you for your help!

@zhaoc1
Contributor

zhaoc1 commented Oct 1, 2024

Hi,

Thank you for providing the detailed log. The merge_snps process for 200 samples should not take 3 days. It seems like the issue might be related to memory limitations or CPU thrashing.

Could you confirm the total memory available on your machine? This task is memory-intensive, and if progress has stalled for 3 days, it’s possible the machine was overwhelmed. The call_and_write_population_snps step loads chunk pileups from all samples into memory to calculate population SNPs. The more cores you use, the more memory your system needs. For 200 samples, I recommend using a machine with at least 120 GB of memory and 16 cores (using --num_cores 16), while keeping the default chunk size. If your machine has more memory, you can try increasing to 32 cores.
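
The memory question and the suggested rerun can be sketched as follows. This is a minimal sketch, assuming a Linux host where `/proc/meminfo` is available. `total_mem_gib`, `has_recommended_mem`, and `rerun_merge_snps` are hypothetical helper names, not MIDAS2 commands. The rerun reuses the flags from the original command and changes only two things: `--num_cores` is set to 16, and `--chunk_size` is omitted so the default applies. Keeping `--robust_chunk` is an assumption, since the advice above does not mention it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not MIDAS2 commands).

# Total RAM in whole GiB, read from a /proc/meminfo-style file (Linux only).
# MemTotal is reported in kB, so divide by 1024 twice.
total_mem_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeeds when total RAM reaches the roughly 120 GB suggested for 200 samples.
# GiB is compared against 120 here, which is slightly stricter than 120 GB.
has_recommended_mem() {
    [ "$(total_mem_gib "${1:-/proc/meminfo}")" -ge 120 ]
}

# Rerun with 16 cores and the default chunk size (no --chunk_size flag).
# All other flags and paths are copied from the original command in this issue.
rerun_merge_snps() {
    midas2 merge_snps \
        --samples_list midas2_sample_list.txt \
        --midasdb_name uhgg \
        --midasdb_dir /Disk2/database/midas2/uhgg/ \
        --num_cores 16 \
        --robust_chunk \
        --sample_counts 10 \
        midas2_merge
}
```

A typical use would be `has_recommended_mem && rerun_merge_snps`, so the merge only starts when the machine meets the memory guideline.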

A few additional notes:

  • Are you using vCPUs or physical CPUs?

  • The --chunk_size 200000 isn’t the default chunk size. I recommend running:

midas2 compute_chunks --chunk_type merge_snps --chunk_size 200000 --species all --midasdb_name $db_name --midasdb_dir $db_dir --debug --force -t ${num_cores}

This will calculate the chunk information accordingly.

  • The empty species folders are created by MIDAS during the preprocessing phase before multiprocessing begins, so this is expected behavior and not a bug.
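
Since empty folders are created up front, counting them gives a rough progress measure for a running merge. The sketch below is an illustration, not a MIDAS2 feature: `empty_species_dirs` and `complete_species_dirs` are hypothetical helpers. It assumes a species counts as finished once the three files listed in the report (`snps_info`, `snps_freqs`, `snps_depth`) are all present, which may not hold if a file is still being written.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not MIDAS2 commands).

# Species directories under SNPS_DIR that contain nothing yet.
# Usage: empty_species_dirs midas2_merge/snps
empty_species_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -empty | sort
}

# Species ids whose directory holds all three merged output files.
# Usage: complete_species_dirs midas2_merge/snps
complete_species_dirs() {
    local dir id
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        id=$(basename "$dir")
        if [ -f "${dir}${id}.snps_info.tsv.lz4" ] &&
           [ -f "${dir}${id}.snps_freqs.tsv.lz4" ] &&
           [ -f "${dir}${id}.snps_depth.tsv.lz4" ]; then
            echo "$id"
        fi
    done
}
```

Comparing `empty_species_dirs midas2_merge/snps | wc -l` across two points in time shows whether species are still being completed.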

Let me know if this works for you!

Best
Chunyu
