--database argument seemingly not working #90

Open
nschcolnicov opened this issue Oct 2, 2024 · 2 comments

nschcolnicov commented Oct 2, 2024

Background

While working on the nf-core/smrnaseq pipeline, I came across an issue (nf-core/smrnaseq#329) where a user was trying to use a GFF file from a database that is not one of the ones mirtop looks for ("miRBase", "MirGeneDB" or "microRNAs"). The database is called RumimiR (http://rumimir.sigenae.org).

Expected behavior and actual behavior.

I expected the --database argument to let mirtop accept the custom source. Instead, even with --database set, I still get the error "Database not found in --mirna rumimir_sheep.gff. Use --database argument to add a custom source."

Steps to reproduce the problem.

I am attaching my workdir to help others reproduce the issue. Everything was run using the image "community.wave.seqera.io/library/mirtop_pybedtools_pysam_samtools_pruned:60b8208f3dbb2910".
workdir.tar.gz

Here is the command that I used:

mirtop \
    gff \
    --database RumimiR \
    --sps oar \
    --hairpin hairpin.fa_igenome.fa_idx.fa \
    --gtf rumimir_sheep.gff \
    -o mirtop \
    pc4.bam

And the error that I'm getting:

/opt/conda/lib/python3.12/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
10/02/2024 11:13:01 INFO Run annotation
10/02/2024 11:13:01 ERROR Database not found in --mirna rumimir_sheep.gff. Use --database argument to add a custom source.
Traceback (most recent call last):
  File "/opt/conda/bin/mirtop", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/mirtop/command_line.py", line 31, in main
    reader(kwargs["args"])
  File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/__init__.py", line 28, in reader
    matures = mapper.read_gtf_to_precursor(args.gtf)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/mirtop/mirna/mapper.py", line 164, in read_gtf_to_precursor
    if _guess_database_file(gtf).find("miRBase") > -1:
       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/mirtop/mirna/mapper.py", line 40, in _guess_database_file
    raise ValueError("Database not found in %s header" % gff)
ValueError: Database not found in rumimir_sheep.gff header

FYI @lpantano

nschcolnicov changed the title from "--database argument not working" to "--database argument seemingly not working" on Oct 2, 2024

atrigila commented Oct 2, 2024

@nschcolnicov I think you might need to pass --database RumimiR-Dec2022 instead of --database RumimiR, as that seems to be the identifier used in the file:

1	RumimiR-Dec2022	miRNA	89245	89266	.	-	.	Specie=Caprine; Name=chr1_811; RumimiRID=Rum-chi-00001; Tissue=Mammary_tissue
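
In GFF, the second tab-separated column is the source field, which is where RumimiR-Dec2022 appears in the line above. A minimal sketch for listing the distinct source values in the body of a file could look like this; `gff_sources` is a hypothetical helper (not part of mirtop), and the sample text reuses the line quoted above plus one header line from the same file.

```python
import io


def gff_sources(handle):
    """Collect distinct values of column 2 (source) from non-comment GFF lines.

    Hypothetical helper for illustration; not part of mirtop.
    """
    sources = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            sources.add(fields[1])
    return sources


sample = (
    "# File contains 22348 miRNA\n"
    "1\tRumimiR-Dec2022\tmiRNA\t89245\t89266\t.\t-\t.\t"
    "Specie=Caprine; Name=chr1_811; RumimiRID=Rum-chi-00001; Tissue=Mammary_tissue\n"
)
print(gff_sources(io.StringIO(sample)))  # {'RumimiR-Dec2022'}
```

On a real file, `open("rumimir_sheep.gff")` would replace the `io.StringIO` handle.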

nschcolnicov (Author) commented

Hi @atrigila, I tried that value and got the same error. Also, as far as I can see, mirtop extracts the database name from the header lines of the file rather than from the body, which is why I'm passing RumimiR:

if not line.startswith("#"):

# Provided by RumimiR database (v. Dec2022) available at http://rumimir.sigenae.org
# File contains 22348 miRNA
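
To illustrate the header-versus-body point, here is a rough sketch of how a header-only scan behaves on the two header lines above. It is a hypothetical stand-in, not mirtop's code: `guess_database_from_header` and `KNOWN_DATABASES` are illustrative names, and the case-insensitive substring match is an assumption. The traceback earlier in the thread shows `_guess_database_file(gtf)` being called with only the GFF path, so one possible explanation (not confirmed against the mirtop source) is that the --database value never reaches that check.

```python
import io

# The three names the issue says mirtop looks for; the tuple itself is illustrative.
KNOWN_DATABASES = ("miRBase", "MirGeneDB", "microRNAs")


def guess_database_from_header(handle, custom=None):
    """Return the first database name found in the leading '#' lines.

    Hypothetical helper, not mirtop's implementation.
    """
    candidates = KNOWN_DATABASES + ((custom,) if custom else ())
    for line in handle:
        if not line.startswith("#"):
            break  # header has ended, stop scanning
        for name in candidates:
            if name.lower() in line.lower():
                return name
    raise ValueError("Database not found in header")


header = (
    "# Provided by RumimiR database (v. Dec2022) available at http://rumimir.sigenae.org\n"
    "# File contains 22348 miRNA\n"
    "1\tRumimiR-Dec2022\tmiRNA\t89245\t89266\t.\t-\t.\tName=chr1_811\n"
)

# With the custom name forwarded, the header scan succeeds.
found = guess_database_from_header(io.StringIO(header), custom="RumimiR")

# Without it, the scan fails, similar to the ValueError in the traceback.
try:
    guess_database_from_header(io.StringIO(header))
    failed = False
except ValueError:
    failed = True
```

Under these assumptions, "RumimiR" matches the first header line, while "RumimiR-Dec2022" would not, because the header spells the version as "(v. Dec2022)" and the hyphenated form only appears in the body.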
