
System for exceptions #97

Closed
abingham opened this issue Apr 18, 2015 · 29 comments

@abingham
Contributor

In some cases we'll find that surviving mutations are completely acceptable. Consider ways to allow users to add exceptions.


@abingham
Contributor Author

A reasonable approach might be to let users provide an exceptions list of some sort. They would specify the line number or something similar (though this is brittle). Then we would simply ignore survival results for that location.

This isn't as robust as embedding exceptions in the code itself, but it also doesn't force people to pollute their code with exception notes.


Original comment by: austin_bingham

@ghost commented Feb 16, 2016

Specifying whitelisted code statements would be helpful; it always likes to find survivors in `__name__ == "__main__"`.

Also, reusing comments designed for other systems may work as well; commonly ignored code is often already annotated with 'pragma: no cover', for instance.
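
A tool could honor such comments by scanning the source for pragma markers before mutating. A minimal sketch, assuming a hypothetical pragma_lines helper that recognizes the coverage-style marker alongside a mutation-specific one:

import re
from pathlib import Path

# Matches coverage-style and (hypothetical) mutation-specific pragmas.
PRAGMA_RE = re.compile(r"#\s*pragma:\s*no\s*(?:cover|mutate)")

def pragma_lines(path):
    """Return the 1-based line numbers annotated with a skip pragma."""
    return {
        lineno
        for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1)
        if PRAGMA_RE.search(line)
    }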

@abingham
Contributor Author

Hehehe...that's a really interesting one I hadn't thought about! That really would be nearly impossible to test for.

The main technical challenge with a code-based whitelist is that, for the most part, Cosmic Ray doesn't actually deal with the code at all. It works on the AST, so I'm not immediately sure how we could correlate code snippets with AST nodes in a general way. Maybe it's easier than I think, at least for some specific cases like this one.

@ghost commented Feb 16, 2016

For this case at least, it works pretty well. `ast.parse('__name__ == "__main__"')` just gives back an AST expression containing a name, an equality comparison, and the string "__main__", since it's a valid Python expression on its own.

As a first effort, only allowing valid Python in the blacklist would still help a lot, and it would allow direct comparisons against the AST before modifying it.

If you can somehow make "pass" generic, you could go even further and accept things like `if __name__ == '__main__': pass`, but it would be a more complicated AST comparison. It kind of reminds me of overriding functions to help pylint work around metaprogramming. I know pylint uses astroid heavily, so maybe that can help with some of the harder AST problems.
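
A minimal sketch of that first-effort blacklist, assuming entries are valid Python expressions: parse each entry and compare its ast.dump against the candidate node (ast.dump omits source positions by default, so structurally identical code compares equal):

import ast

def matches_blacklist(node, blacklist_sources):
    """True if `node` is structurally identical to a blacklisted expression."""
    node_repr = ast.dump(node)
    return any(
        ast.dump(ast.parse(source, mode="eval").body) == node_repr
        for source in blacklist_sources
    )

# Example: the node for `__name__ == "__main__"` matches itself.
expr = ast.parse('__name__ == "__main__"', mode="eval").body
print(matches_blacklist(expr, ['__name__ == "__main__"']))  # True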

@abingham
Contributor Author

Astroid is definitely on my "things to look at" list, but it hasn't risen to the top yet. Perhaps we can completely reuse pylint's black/whitelisting system; that would certainly be very convenient!

In all honesty, this isn't going to be my focus for quite a while. My main goal, when and if I have serious time to work on CR, is to get concurrency working better. So if you've got time and/or ideas for this area, I'd be more than happy to help you get them implemented.

@ghost commented Feb 16, 2016

I actually had a few ideas around that, but wasn't sure where to put them.

Most of the results for me seem to be incompetent (timeouts), which isn't too helpful, but for the killed ones I suspect many will fail the same tests. It would probably require a rethink of the test plugin system, but hinting which tests to run first could help soon-to-be-killed mutants fail fast for the same reasons.

Maybe we could force the same AST location to run on a single process and abuse pytest's --ff flag (although I'm not sure where it stores that state).

Another idea is tracking runtime per test: if a single test completes in 5ms during the healthy run, the mutant's run can be killed once it takes 15ms (baseline factor of 3) rather than waiting out the several seconds of the whole-suite timeout. I'm not sure unittest or py.test expose that much timing information easily, but it could greatly speed up timeouts.
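
A rough sketch of that baseline-multiplier idea; the runner plumbing here is hypothetical, but the core is just a budget derived from the healthy timing:

import subprocess

BASELINE_FACTOR = 3  # a 5ms healthy run gets a 15ms budget

def run_with_budget(test_command, baseline_seconds):
    """Kill the mutant's test run once it exceeds baseline * factor."""
    try:
        result = subprocess.run(test_command, timeout=baseline_seconds * BASELINE_FACTOR)
    except subprocess.TimeoutExpired:
        return "incompetent"  # the mutant timed out
    return "killed" if result.returncode != 0 else "survived"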

I'm happy to help with the speed improvements and reporting when I find some time, but I'm not a great programmer, so I probably won't be joining you in the AST stuff.

@abingham
Contributor Author

After a discussion with @rob-smallshire and @DRMacIver at ACCU, it might not be so difficult to have an external file containing processing instructions, exceptions, etc. The gist of the idea was that each entry would include the filename, the line number, and the line of code in question, along with one or two surrounding lines for context. With this information we could scan the code and reliably tell when and how the exception file had become stale, prompting the user to update it. The hope is that it would become stale infrequently enough not to be a real burden.

Based on this info we could construct a table of exceptions, and that would be used to trim the work order constructed during initialization.
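
A minimal sketch of what such an entry and its staleness check might look like; the format here is an assumption, not a settled design:

from dataclasses import dataclass
from typing import List

@dataclass
class ExceptionEntry:
    filename: str
    line_number: int    # 1-based location of the excepted line
    context: List[str]  # the line itself plus one or two surrounding lines

def is_stale(entry: ExceptionEntry) -> bool:
    """True if the recorded context no longer matches the file."""
    with open(entry.filename) as f:
        lines = f.read().splitlines()
    start = max(entry.line_number - 1 - len(entry.context) // 2, 0)
    return lines[start:start + len(entry.context)] != entry.context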

atodorov added a commit to MrSenko/cosmic-ray that referenced this issue Jul 13, 2016 (and again Jul 14, 2016)
Don't use `while True` b/c BooleanReplacer breaks this function's
test. This is a bit ugly but until Issue sixty-north#97 is fixed there is
no other way around it.
@abingham
Contributor Author

I've started some work on this in the spor branch. This is based on the spor project which is going to be an interesting technical challenge in its own right.

@abingham
Contributor Author

@rob-smallshire had the excellent idea that we could anchor exceptions using language constructs rather than source lines. So you could say "don't run operator X for module.submodule.funcA" and so forth.

The same idea is used in pytest where you can select your tests based on "node id". A good first approximation of this technique for CR would be to steal their syntax, etc.
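
For illustration, exception entries borrowing pytest's node-id syntax might look like this (the address format and operator names are hypothetical):

# Hypothetical exception list using pytest-style node ids.
EXCEPTIONS = [
    ("module/submodule.py::funcA", "ReplaceRelationalOperator"),
    ("module/other.py::ClassB::method", "*"),  # skip all operators here
]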

@abingham
Contributor Author

anchor exceptions using language constructs

After thinking about this a bit, it has some pretty substantial shortcomings. The biggest one is that it means we can't anchor exceptions to things with no obvious language construct. For example, suppose I had a class-level "constant":

class Foo:
    CONST = 42

How could I ask CR to not mutate the 42? I could imagine an address like "module/Foo.py::Foo::CONST", but I worry that the logic for this would get pretty hairy. In this case, we'd have to know to look for assignment AST nodes and check their targets for the name "CONST".
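
As a sketch of how hairy that gets, resolving an address like that means walking the class body for assignment nodes with a matching target name (illustrative only, not a proposed implementation):

import ast

def find_constant_assignment(tree, class_name, const_name):
    """Locate `class_name.const_name = ...` so its value can be protected."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for stmt in node.body:
                if isinstance(stmt, ast.Assign):
                    for target in stmt.targets:
                        if isinstance(target, ast.Name) and target.id == const_name:
                            return stmt  # the node whose value must not mutate
    return None

tree = ast.parse("class Foo:\n    CONST = 42\n")
print(find_constant_assignment(tree, "Foo", "CONST"))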

@boxed commented Nov 28, 2018

mutmut supports the "# pragma: no mutate" option that @xiaclo alludes to above. I think it would be great if Cosmic Ray and Mutpy supported this too so we can have a common standard.

@boxed commented Nov 28, 2018

Also see the discussion on more advanced exclusion rules here: boxed/mutmut#47 and boxed/mutmut#28

@abingham
Contributor Author

I'm not satisfied with that "embedded comment" style of directives. Getting it to do anything beyond a simple "don't mutate this line" is going to get ugly fast, and makes me think of people trying to use XML as a programming language. The approach we're working on with spor will, I think, be much more powerful, less intrusive, and (selfishly) far more interesting to work on.

@boxed commented Nov 28, 2018

I agree, but for the simple case it's pretty OK, and very simple and straightforward. I hadn't seen spor before (I saw you talk about it but didn't manage to find the repo). This seems pretty interesting!

Do you have support for mapping some metadata to the entire directory with spor? Because that's mostly what we've been discussing with mutmut: to have global patterns for exclude.

@boxed commented Nov 28, 2018

Ah, I should also mention that we've been using the pragma in practice with mutmut in quite a few projects where we've achieved zero surviving mutants (not counting excludes), and we've found that in practice it works surprisingly well. So while I agree in theory, in practice you can probably get 90% of the way with just the pragma.

@boxed commented Nov 28, 2018

Hmm... I had an idea just now: what if spor could import from pragmas? Then you could have a good argument for using spor for coverage.py. It seems like it should be simple to implement, and it gives a logical migration path.

@abingham
Contributor Author

Do you have support for mapping some metadata to the entire directory with spor?

Not right now, and I've only really given idle thought to it. I think it would mean introducing a new kind of "anchor" since the current anchors use file content as context for updating. I've added an issue to spor for this.

@abingham
Contributor Author

what if spor could import from pragmas?

I imagine it would be straightforward to write a pragma-to-spor converter. I'd even consider adding something like this to the spor repository, but my instinct is that it should be a standalone tool.

@boxed commented Nov 28, 2018

I'd say you're overthinking it :)

@abingham
Contributor Author

How so?

@boxed commented Nov 28, 2018

It's like a 15-line script; better to include it than to require users to install yet another dependency.

@abingham
Contributor Author

Right, but my objection is that, currently, spor doesn't really know anything about "Python code", so to speak. It's written in Python (though there's an experimental Rust implementation), but beyond that it's intended to be language agnostic. So while it likely won't hurt anyone to have a Python-specific tool in spor, I want to avoid any inclination to turn spor into a "Python tool".

@boxed commented Nov 28, 2018

Oh. Yea ok. Still though, you could include various mini tools to try to drive adoption...

@boxed commented Mar 13, 2020

I implemented a (quite hacky) way to plug into mutmut. You can see a practical use at https://github.com/TriOptima/iommi/blob/master/mutmut_config.py. It is very successful: this little config makes mutation testing of iommi pretty fast, compared to infeasible without it.

It would be nice if we would agree on a common API (maybe with specific extensions namespaced so we can still have differentiation before we standardize). I would love your feedback on this.
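
For readers who don't follow the link, the hook looks roughly like this: mutmut picks up a pre_mutation(context) function from mutmut_config.py and lets it skip mutants or swap the test command. The decorator check and test-path mapping below are illustrative assumptions, not a copy of the iommi config:

import os

def pre_mutation(context):
    """mutmut calls this before each mutation; context lets us skip or retarget."""
    if context.current_source_line.strip().startswith("@reinvokable"):
        context.skip = True  # hypothetical decorator we never want mutated
        return
    # Run only the test module matching the mutated file (illustrative mapping).
    module = os.path.basename(context.filename)
    context.config.test_command = f"pytest -x tests/test_{module}"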

@abingham
Contributor Author

If I understand what you've got, you're skipping mutations based on whether they've got tests associated with them or if certain decorators are applied. Is that right?

I think we could find an easy standardization around a sort of ad hoc predicate API. Users could specify a function that answers "mutate" or "do not mutate" for a given (source file, line number). CR and mutmut could then provide facilities for using such functions to filter mutations in a cross-tool way.

It's interesting to think about other parameters we could pass to this predicate. An obvious one would be the name of the mutation operator, so that users could prevent certain kinds of mutations. Another would be the range of text being mutated, supporting even more fine-grained control. This might be overkill for a simple, broadly applicable API, though. I suspect most users would get what they need from a (file, line) based approach.
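
One possible shape for that predicate, sketched under the assumptions above (the hook name and parameters are not a settled API):

def should_mutate(source_file, line_number, operator_name=None):
    """Return True to allow a mutation at (file, line), False to skip it.

    `operator_name` is the optional extension mentioned above, letting
    users veto whole classes of mutation.
    """
    if operator_name == "BooleanReplacer" and source_file.endswith("retry.py"):
        return False  # e.g. protect a deliberate `while True` loop
    return True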

@boxed commented Mar 14, 2020

If I understand what you've got, you're skipping mutations based on whether they've got tests associated with them or if certain decorators are applied. Is that right?

Sort of. I am skipping certain decorators (or trying to, see below) and I am setting the entire test command based on the file I'm in. It's pretty rough, but it's a HUGE improvement in speed and the accuracy is pretty ok.

Users could specify a function that answers "mutate" or "do not mutate" for a given (source file, line number)

I think the more important point is that this API should be able to change what test command is run. This is the big improvement here. It takes our test suite of ~9s and cuts it down to ~1s (of which almost all is the pytest overhead, again more on that later!).

CR and mutmut could then provide facilities for using such functions to filter mutations in a cross-tool way.

I think this would be awesome for both our projects and would make mutation testing seem much more production ready :)

An obvious one would be the name of the mutation operator so that user could prevent certain kinds of mutations.

That is a good idea!

Another would be the range of text being mutated, supporting even more fine-grained control. This might be overkill for a simple, broadly applicable API, though. I suspect most users would get what they need from a (file, line) based approach.

I assumed so too, but I've already hit the limit of the line-based approach! I will definitely implement a way to get the relevant section of code, I think; it might be smaller than a line or bigger than a line. And the range too, I guess...

All this work with mutmut has made it even more obvious that pytest really is a liability for mutation testing. This applies to both of us: even if I select the perfect subset of tests in zero time, I am still absolutely destroyed by the pytest overhead. This is largely irrelevant for normal use, but for mutation testing it is unacceptable.

I've been thinking about this for several days now and I'm starting to think I'm gonna have to write a new test runner. This seems a bit extreme, but I don't see another option if I want to move forward :( I will not accept going back to unittest; it's just not nice enough to work with, imo.

@abingham
Contributor Author

I am setting the entire test command based on the file I'm in

That's an interesting idea. CR would sorta support this by using multiple sessions, I think, but it's not a feature I've really thought about. I can imagine ways to support it more directly, but I need to think about it.

I am still absolutely destroyed by the pytest overhead.

I wonder if this is related to pkg_resources. Importing it is notoriously slow; this is a problem we face at Sixty North with a number of our programs that use plugins. So many people are experiencing this problem that I suspect it'll be fixed at some point.

write a new test runner.

That is a big step! It's not a step I will take for CR, I think...I really want it to work non-intrusively for as many projects as possible.

@boxed commented Mar 14, 2020

Yea, I've done some work on speeding up pytest, but the plugin system is suuuper slow, as you say. I'm not sure anyone is looking at it, though.

It is a big step, I agree, but I don't see another way to make mutation testing actually good. And there are some big advantages to having a test runner that prioritizes speed (and thus enables mutation testing). At the very least it can put pressure on pytest so they have to take speed more seriously :)

@abingham
Copy link
Contributor Author

I'm going to consider this solved, with the solution being filters. We have fairly sophisticated support for filtering/skipping mutations based on essentially arbitrary criteria. See e.g. the cr-filter-pragma tool.
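
For example, marking a line with the pragma lets the filter skip mutations on it (the exact CLI invocation may vary between Cosmic Ray versions):

def retry_forever(task):
    while True:  # pragma: no mutate
        if task():
            return

# After `cosmic-ray init`, running cr-filter-pragma over the session
# marks work items on pragma-annotated lines as skipped.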
