Split input fasta file, smartly #193

chasemc · 2021-10-18T16:15:06Z

This probably only matters for quite large metagenomes, and even then an uneven split is unlikely to be the slow step.
Currently, when running in parallel, the input contigs are naively split/chunked into x-number of files.
We had discussed before that it might be good to split contigs more evenly into files based on contig length, so that computation is more evenly distributed later.
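One way the length-aware split could work is a greedy "longest contig first" assignment: sort contigs by length and always place the next contig into the file with the smallest running total. The sketch below is only an illustration of that idea, not Autometa code; the function names `read_fasta_lengths` and `balanced_split` are hypothetical, and it assumes plain (uncompressed) FASTA text.

```python
import heapq
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_id: sequence_length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def balanced_split(lengths: Dict[str, int], num_splits: int) -> List[List[str]]:
    """Greedily assign contigs to num_splits bins, longest contig first.

    Each contig goes to the bin with the smallest total length so far,
    which keeps the total bases per bin roughly even.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    # Min-heap of (total_bases_in_bin, bin_index).
    heap = [(0, i) for i in range(num_splits)]
    heapq.heapify(heap)
    bins: List[List[str]] = [[] for _ in range(num_splits)]
    by_length = sorted(lengths.items(), key=lambda kv: kv[1], reverse=True)
    for contig, length in by_length:
        total, idx = heapq.heappop(heap)
        bins[idx].append(contig)
        heapq.heappush(heap, (total + length, idx))
    return bins
```

For example, `balanced_split(read_fasta_lengths(open("contigs.fasta")), 4)` would return four lists of contig IDs (the file name here is a placeholder). Writing each list out as its own FASTA file, and wiring that into the pipeline in place of the current split step, is left out of this sketch.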

Relevant code:

Autometa/modules/local/seqkit_split.nf

Lines 31 to 36 in f372b82

    
    seqkit \\
        split \\
        ${fasta} \\
        ${options.args} \\
        ${options.args2} \\
        -O outfolder

Autometa/conf/modules.config

Lines 78 to 82 in f372b82

    
    'seqkit_split_options' {
        publish_by_meta  = ['id']
        args             = "-p ${params.num_splits}"
        args2            = "--two-pass"
    }

chasemc added the enhancement and good first issue labels · Oct 18, 2021
