Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MUMmer 4 reports version differently to MUMmer 3 #326

Closed
baileythegreen opened this issue Aug 31, 2021 · 14 comments · Fixed by #335
Closed

MUMmer 4 reports version differently to MUMmer 3 #326

baileythegreen opened this issue Aug 31, 2021 · 14 comments · Fixed by #335

Comments

@baileythegreen
Copy link
Contributor

Summary:

The regex used in anim.py to retrieve the version of nucmer installed fails with MUMmer v4.0.0rc1, as reported in Issue #325. This issue will theoretically be fixed when PR #276 is merged, but this is a reminder to specifically check that.

Current Output:

The output you get from pyani or at the terminal, with this issue. It is very useful to know the current "wrong" behavior on your system.

Traceback (most recent call last):
File "~/anaconda3/bin/pyani", line 11, in
load_entry_point('pyani', 'console_scripts', 'pyani')()
File "~/pyani/pyani/scripts/pyani_script.py", line 117, in run_main
returnval = args.func(args)
File "~/pyani/pyani/scripts/subcommands/subcmd_anim.py", line 168, in subcmd_anim
nucmer_version = anim.get_version(args.nucmer_exe)
File "~/pyani/pyani/anim.py", line 110, in get_version
version = match.group() # type: ignore
AttributeError: 'NoneType' object has no attribute 'group'

pyani Version:

0.3.0-alpha

@baileythegreen
Copy link
Contributor Author

For the record, this is the change that is causing the problem:

Previous version output:

nucmer -V
nucmer 
NUCmer (NUCleotide MUMmer) version 3.1

New version output:

./nucmer -V
4.0.0rc1

@baileythegreen
Copy link
Contributor Author

baileythegreen commented Sep 4, 2021

So, there are two issues that need to be fixed:

  • The first is that the format of the version has been changed.
  • The second is that it now gets set to stdout, not stderr:
CompletedProcess(args=['/Users/baileythegreen/Software/mummer-4.0.0rc1/nucmer', '-V'],
returncode=0, stdout=b'4.0.0rc1\n', stderr=b'')

We currently use this regex to find the version for nucmer:

match = re.search(r"(?<=version\s)[0-9\.]*", str(result.stderr, "utf-8"))

PR #276 did not resolve this issue, so we need to decide whether we want to try for a global regex (if the version information is now sent to a new place, I'm not sure this is possible), or try a second one if the first one fails.

For reference, here is the output from version 3 (when run interactively):

CompletedProcess(args=['nucmer', '-V'], returncode=0, stdout=b'', stderr=b'nucmer \n
NUCmer (NUCleotide MUMmer) version 3.1\n    \n')

@widdowquinn
Copy link
Owner

It looks like this should be fairly straighforward (pseudocode follows):

try:
    retvals = subprocess.run("nucmer")
except:
    raise CouldntRunNUCmerError

if check_stderr_for_v3_version():
    return "suitable string"
elif check_stdout_for_v4_version():
    return "suitable string"
else:
    raise CouldntGetVersionError

Am I missing something?

@baileythegreen
Copy link
Contributor Author

I'll have to think. Looking at it, I think interspersing those lines into the get_version() function should work. (I am assuming this is where you would put it; and that check_stderr_for_v3_version(), et cetera are going to be nested functions.)

@widdowquinn
Copy link
Owner

If they were functions it wouldn't be my own style to nest them, but if they're never to be used outwith the enclosing function's scope it's a valid approach. I'd have them outside that scope, in case of potential reuse.

@baileythegreen
Copy link
Contributor Author

Ah, you wrote them as functions in your pseudocode, and I was assuming you intended them to be such. I don't mind a nested function, but it did seem odd for you to be suggesting one.

So, I can either write this with top-level functions, or just as an if/elif. Preference?

@widdowquinn
Copy link
Owner

So, I can either write this with top-level functions, or just as an if/elif. Preference?

I'm easy either way on this. But in general… ;)

Working, clear and elegant > Working and clear > Working > Broken

@baileythegreen
Copy link
Contributor Author

pyani doesn't define a lot of super-specific custom errors, though it has a few more general custom ones. In anim.py, for instance, I see PyaniANImException. Unless you specifically want new custom errors made (inheriting from PyaniANImException), I will use what is already defined and pass a specific error message explaining the proximal cause.

@widdowquinn
Copy link
Owner

Sometimes we want custom errors inheriting from PyaniException. My rule of thumb so far has been that, wherever there is an error arising from, say, a datatype mismatch, we use the standard Python exceptions. When something has gone wrong with something in the algorithm (e.g. a result is outwith expected bounds) we inherit from PyaniException in some way.

There is, as yet, no documentation/formal guide for when to make that call. It's something I was introducing in v3 and did not formalise.

@widdowquinn
Copy link
Owner

I think there may have been an implied question somewhere - "should I define a new kind of exception, here?"

I think this could be handled as a generic PyaniANImException with a specific error message, as you say - or something more specific inheriting from PyaniANImException like PyaniANImMUMmerException (with specific message to indicate it's a MUMmer version error, as opposed to one of MUMmer's other error modes).

@baileythegreen
Copy link
Contributor Author

Well, I'll write it one way, and then it can always be changed.

Another question that has just occurred to me: when a version can't be retrieved, maybe we don't want to raise an error; maybe we just want to log a warning and carry on. Raising an exception will halt execution (unless we then catch it in subcmd_anim.py and handle it).

@widdowquinn
Copy link
Owner

when a version can't be retrieved, maybe we don't want to raise an error; maybe we just want to log a warning and carry on.

I'd accept this behaviour if the software itself does not offer version information. Why else would we allow it?

Raising an exception will halt execution

Yes. If we're expecting a version number and can't obtain one, that suggests that something is wrong with the third party tool or our calls to it. If we're not expecting a version number, we can overlook it, as you note above.

Suppose someone installs MUMmer5, and it's got a completely different version string. When we check for the existence of nucmer we don't get the version we expect and we raise the error. That flags up that there's an issue - which we can then fix. I see that, if we ignored the error, we could have continued and (maybe) got the right answer, but raising the error here at least gives us the warning that we might have to adapt the code for the new version.

But suppose someone accidentally installs a different tool called nucmer that isn't part of the MUMmer suite, and it gives us bogus answers. We could have caught this by raising an error when we didn't get the expected version string. If we ignore the error we end up maybe catching an error deeper into the analysis and debugging something trivial that could have been caught earlier.

These are both hypotheticals, and there are no right answers, but I think I'm more comfortable with catching signs of unexpected behaviour early.

@baileythegreen
Copy link
Contributor Author

something more specific inheriting from PyaniANImException like PyaniANImMUMmerException (with specific message to indicate it's a MUMmer version error, as opposed to one of MUMmer's other error modes).

I am pro-hyper-custom exception classes (nested, of course) that can be used to pinpoint exactly which bit of code threw them (because it is the only place that exception is thrown, for instance). But I know this is not to everyone's taste. (An (extreme) example where I was playing with this idea can be seen here: https://github.com/baileythegreen/Mercury/blob/main/EnsemblAPIErrors.py.)


If we're expecting a version number and can't obtain one, that suggests that something is wrong with the third party tool or our calls to it.

I suppose raising an error does more to ensure we hear about an issue like this. I don't know why else we might want to let something like this pass; it just seems possible there is a reason.

@widdowquinn
Copy link
Owner

widdowquinn commented Sep 7, 2021

An (extreme) example where I was playing with this idea [...]

For some reason I'm reminded of writing interactive fiction in INFORM (actually, INFORM6)

I'd probably take a middle line - it makes sense to combine some errors under a single exception. When in doubt I'd look at Python's own hierarchy to help guide the level of granularity. It's one of those things where there can be more than one reasonable approach.

baileythegreen added a commit that referenced this issue Dec 17, 2021
Issue #326: Add new regex to catch MUMmer v4 version information
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants