Skip to content

Commit

Permalink
update average genome size computing in metagenomes
Browse files Browse the repository at this point in the history
  • Loading branch information
caozhichongchong committed May 31, 2021
1 parent e57f6f9 commit fe3cb62
Show file tree
Hide file tree
Showing 4 changed files with 54 additions and 14 deletions.
Binary file modified .DS_Store
Binary file not shown.
32 changes: 21 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,15 +6,22 @@ arg_ranker evaluates the risk of ARGs in genomes and metagenomes

## Requirement
* python 3
* kraken2: `conda install -c bioconda kraken2`\
download kraken2 database: `kraken2-build --standard --db $KRAKENDB` \
where $krakenDB is your preferred database name/location\
* diamond: `conda install -c bioconda diamond`\
* diamond: `conda install -c bioconda diamond`
* blast+: `conda install -c bioconda blast`
* For metagenomes:
* kraken2: `conda install -c bioconda kraken2`
* to compute the abundance of ARGs as copy number of ARGs per bacterial cell (recommended)
* download the kraken2 standard database (50 GB of disk space): `kraken2-build --standard --db $KRAKENDB` \
where $KRAKENDB is your preferred database name/location
* MicrobeCensu: `git clone https://github.com/snayfach/MicrobeCensus && cd MicrobeCensus && python setup.py install` to estimate the average genome size for metagenomes.
* to compute the abundance of ARGs as copy number of ARGs per 16S
* download the kraken2 16S database (73.2 MB of disk space): `kraken2-build --db $DBNAME --special greengenes`

## How to use it
* put all your genomes (.fa or .fasta) and metagenomes (.fq or .fastq) into one folder ($INPUT)
* run `arg_ranker -i $INPUT --kkdb $KRAKENDB`
* run `arg_ranker -i $INPUT` (genomes only)
* run `arg_ranker -i $INPUT -kkdb $KRAKENDB` (genomes/metagenomes + kraken2 standard database)
* or run `arg_ranker -i $INPUT -kkdb $KRAKENDB -kkdbtype 16S` (kraken2 16S database)
* run `sh arg_ranking/script_output/arg_ranker.sh`

## Output
Expand All @@ -25,21 +32,24 @@ where $krakenDB is your preferred database name/location\
|WEE300_all-trimmed-decont_1.fastq|4.6E-02|0.0E+00|6.8E-02|7.5E-01|1.3E-01|5.4E-04|1.0-0.0-0.5-1.7-0.3|1.0|0.0|0.5|1.7|0.3|hospital_metagenome|
|EsCo_genome.fasta|0.0E+00|0.0E+00|0.0E+00|1.0E+00|0.0E+00|2.0E+00|0.0-0.0-0.0-2.2-0.0|0.0|0.0|0.0|2.2|0.0|E.coli_genome|

1. We compute the abundance of ARGs as the copy number of ARGs divided by the 16S copy number in a sample\
For metagenomes, the copy number of ARGs and 16S were computed as the number of reads mapped to them divided by their gene length.\
Rank_I_per - Unassessed_per: percentage of ARGs of a risk rank\
1. Rank_I_per - Unassessed_per: percentage of ARGs of a risk rank\
Total_abu: total abundance of all ARGs
2. We compute the risk of ARGs as the average abundance of ARGs of a risk rank divided the average abundance of all ARGs\
2. For genomes, we output the copy number of ARGs detected in each genome.
3. For metagenomes, we compute the abundance of ARGs as the copy number of ARGs divided by the bacterial cell number or 16S copy number in the same metagenome.\
If you downloaded the kraken2 standard database, we compute the copy number of ARGs divided by the bacterial cell number.\
If you downloaded the kraken2 16S database, we compute the copy number of ARGs divided by the 16S copy number.\
The copy number of ARGs, 16S, and bacterial cells were computed as the number of reads mapped to them divided by their gene/genome length.
4. We compute the risk of ARGs as the average abundance of ARGs of a risk rank divided by the average abundance of all ARGs\
Rank_I_risk - Unassessed_risk: the risk of ARGs of a risk rank\
Rank_code: a code of ARG risk from Rank I to Unassessed

* Sample_ARGpresence.txt:\
The abundance, the gene family, and the antibiotic of resistance of ARGs detected in the input samples

## Test
run `arg_ranker -i example --kkdb $KRAKENDB`\
run `arg_ranker -i example -kkdb $KRAKENDB`\
run `sh arg_ranking/script_output/arg_ranker.sh`\
The arg_ranking/Sample_ranking_results.txt should look like Table 1
The arg_ranking/Sample_ranking_results.txt should look like Table 1 (using kraken2 standard database)

## Metadata for your samples (optional)
arg_ranker can merge your sample metadata into the results of ARG ranking (i.e. note1 in Table 1).\
Expand Down
21 changes: 19 additions & 2 deletions arg_ranker/__main__.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,10 @@ def main():
help="Optional: complete path to blast if not in PATH",
metavar="/usr/local/bin/blast",
action='store', default='blast', type=str)
optional.add_argument('-mc',
help="Optional: complete path to MicrobeCensus if not in PATH",
metavar="/usr/local/bin/run_microbe_census.py",
action='store', default='run_microbe_census.py', type=str)
optional.add_argument('-kk',
help="Optional: complete path to kraken2 if not in PATH",
metavar="/usr/local/bin/kraken2",
Expand All @@ -62,6 +66,11 @@ def main():
help="Optional: complete path to kraken2 database if not in PATH",
metavar="/usr/local/bin/kraken2/krakendb",
action='store', default='krakendb', type=str)
optional.add_argument('-kkdbtype',
help="Optional: type of kraken2 database (default = standard)",
choice = ['standard','16S'],
metavar="standard or 16S",
action='store', default='standard', type=str)

################################################## Definition ########################################################
args = parser.parse_args()
Expand Down Expand Up @@ -191,11 +200,19 @@ def searchARG(allsamples):
workingdir, inputfasta, samplename, search_output)
# compute taxonomy
try:
sampleoutput = os.path.join(search_output, samplename + '.kraken')
sampleoutput = os.path.join(search_output, samplename + '.kraken.kreport')
f1 = open(sampleoutput, 'r')
except IOError:
cmds += '%skraken2 --db %s %s --output %s --report %s.kreport --threads %s\n' % (
split_string_last(args.kk, 'kraken'), args.kkdb, sample, sampleoutput, sampleoutput, args.t)
# compute average genome size
if args.kkdbtype != '16S':
try:
sampleoutput = os.path.join(search_output, samplename + '.AGS.txt')
f1 = open(sampleoutput, 'r')
except IOError:
cmds += '%run_microbe_census.py -t %s %s %s \n' % (
split_string_last(args.mc, 'run_microbe_census'),args.t, sample, sampleoutput)
sample = os.path.join(search_output, samplename + '.diamond.txt.aa')
# run blast
sampleoutput = os.path.join(search_output, samplename + '.blast.txt')
Expand Down Expand Up @@ -297,7 +314,7 @@ def ranking_arg(inputfasta,Metadata):
# search ARGs in all samples
cmds = searchARG(allsamples)
# output summary table
cmds += 'python %s/bin/ARG_table.sum.py -i %s -d %s\n' % (workingdir,search_output,ARGdatabase_mapping)
cmds += 'python %s/bin/ARG_table.sum.py -i %s -d %s -kkdbtype %s\n' % (workingdir,search_output,ARGdatabase_mapping,args.kkdbtype)
# risk ranking of ARGs
cmds += 'arg_ranker -i %s/Sample_ARGpresence.txt -m %s -o %s\n'%(output_dir,Metadata,output_dir)
f1 = open('%s/arg_ranker.sh'%(scripts_output),'w')
Expand Down
15 changes: 14 additions & 1 deletion arg_ranker/bin/ARG_table.sum.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,11 @@
parser.add_argument("-d",
help="database mapping file",
type=str, default='SARG.structure.txt ',metavar='SARG.structure.txt')
optional.add_argument('-kkdbtype',
help="Optional: type of kraken2 database (default = standard)",
choice = ['standard','16S'],
metavar="standard or 16S",
action='store', default='standard', type=str)

################################################## Definition ########################################################
args = parser.parse_args()
Expand Down Expand Up @@ -41,13 +46,21 @@ def sum_ARG(allsearchoutput):
# load kraken results
kraken = glob.glob(searchoutput.replace('.blast.txt.filter','.kraken.kreport'))
copy_16S = 1
gene_length = 1550 # 16S
if kraken!= []:
# metagenomes
for lines in open(kraken[0],'r'):
lines_set = lines.split('\n')[0].split('\t')
if 'Bacteria' in lines:
copy_16S = float(lines_set[1])*2 # pair end
break
if args.kkdbtype != '16S':
# load average genome size
AGS_result = glob.glob(searchoutput.replace('.blast.txt.filter', '.AGS.txt'))[0]
for lines in open(AGS_result,'r'):
if lines.startswith('average_genome_size'):
lines_set = lines.split('\n')[0].split('\t')
gene_length = float(lines_set[1])
# load ARG blast results
samplename = os.path.split(searchoutput)[-1].split('.blast.txt.filter')[0]
allsamplename.append(samplename)
Expand All @@ -58,7 +71,7 @@ def sum_ARG(allsearchoutput):
if copy_16S == 1: # genomes
sampleARG[ARG] += 1.0 / copy_16S
else: # metagenomes
sampleARG[ARG] += 1.0/copy_16S/ARG_length[ARG]*1550 # normalize against ARG length and 16S length
sampleARG[ARG] += 1.0 / copy_16S / ARG_length[ARG] * gene_length # normalize against ARG length and 16S/genome length
allsampleARG.append(sampleARG)
# sum ARG blast results
alloutput = []
Expand Down

0 comments on commit fe3cb62

Please sign in to comment.