
System for exceptions #97

Closed
abingham opened this issue Apr 18, 2015 · 29 comments

@abingham
Contributor

In some cases we'll find that surviving mutations are completely acceptable. Consider ways to allow users to add exceptions.


@abingham
Contributor Author

A reasonable approach might be to let users provide an exceptions list of some sort. They would specify the line number or something similar (though this is brittle). Then we would simply ignore survival results for that location.

This isn't as robust as embedding exceptions in the code itself, but it also doesn't force people to pollute their code with exception notes.


Original comment by: austin_bingham

@ghost commented Feb 16, 2016

Specifying whitelisted code statements would be helpful; it always likes to find survivors in `__name__ == "__main__"`.

Also, reusing comments designed for other systems may work as well; commonly ignored code is often already annotated with 'pragma: no cover', for instance.
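
A tool could honor such comments by scanning the source for pragma markers before mutating. A minimal sketch, assuming a hypothetical pragma_lines helper that recognizes the coverage-style marker alongside a mutation-specific one:

import re
from pathlib import Path

# Matches coverage-style and (hypothetical) mutation-specific pragmas.
PRAGMA_RE = re.compile(r"#\s*pragma:\s*no\s*(?:cover|mutate)")

def pragma_lines(path):
    """Return the 1-based line numbers annotated with a skip pragma."""
    return {
        lineno
        for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1)
        if PRAGMA_RE.search(line)
    }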

@abingham
Contributor Author

Hehehe...that's a really interesting one I hadn't thought about! That really would be nearly impossible to test for.

The main technical challenge with a code-based whitelist is that, for the most part, Cosmic Ray doesn't actually deal with the code at all. It works on the AST, so I'm not immediately sure how we could correlate code snippets with AST nodes in a general way. Maybe it's easier than I think, at least for some specific cases like this one.

@ghost commented Feb 16, 2016

For this case at least, it works pretty well. `ast.parse('__name__ == "__main__"')` just gives back an AST expression containing a name, an equality comparison, and the string "__main__", since it's a valid Python expression on its own.

As a first effort, only allowing valid Python in the blacklist would still help a lot, and it would allow direct comparisons against the AST before modifying it.

If you can somehow make "pass" generic, you could go even further and accept things like `if __name__ == '__main__': pass`, but it would be a more complicated AST comparison. It kind of reminds me of overriding functions to help pylint work around metaprogramming. I know pylint uses astroid heavily, so maybe that can help with some of the harder AST problems.
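
A minimal sketch of that first-effort blacklist, assuming entries are valid Python expressions: parse each entry and compare its ast.dump against the candidate node (ast.dump omits source positions by default, so structurally identical code compares equal):

import ast

def matches_blacklist(node, blacklist_sources):
    """True if `node` is structurally identical to a blacklisted expression."""
    node_repr = ast.dump(node)
    return any(
        ast.dump(ast.parse(source, mode="eval").body) == node_repr
        for source in blacklist_sources
    )

# Example: the node for `__name__ == "__main__"` matches itself.
expr = ast.parse('__name__ == "__main__"', mode="eval").body
print(matches_blacklist(expr, ['__name__ == "__main__"']))  # True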

@abingham
Contributor Author

Astroid is definitely on my "things to look at" list, but it hasn't risen to the top yet. Perhaps we can completely reuse pylint's black/whitelisting system; that would certainly be very convenient!

In all honesty, this isn't going to be my focus for quite a while. My main goal, when and if I have serious time to work on CR, is to get concurrency working better. So if you've got time and/or ideas for this area, I'd be more than happy to help you get them implemented.

@ghost commented Feb 16, 2016

I actually had a few ideas around that, but wasn't sure where to put them.

Most of the results for me seem to be incompetent (timeouts), which isn't too helpful, but for the killed ones I suspect many will fail the same tests. It would probably require a rethink of the test plugin system, but hinting which tests to run first could help soon-to-be-killed mutants fail fast for the same reasons.

Maybe we could force the same AST location to run on a single process and abuse pytest's --ff flag (although I'm not sure where it stores that state).

Another idea is tracking runtime per test: if a single test completes in 5ms during the healthy run, the mutant's run can be killed once it takes 15ms (baseline factor of 3) rather than waiting out the several seconds of the whole-suite timeout. I'm not sure unittest or py.test expose that much timing information easily, but it could greatly speed up timeouts.
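
A rough sketch of that baseline-multiplier idea; the runner plumbing here is hypothetical, but the core is just a budget derived from the healthy timing:

import subprocess

BASELINE_FACTOR = 3  # a 5ms healthy run gets a 15ms budget

def run_with_budget(test_command, baseline_seconds):
    """Kill the mutant's test run once it exceeds baseline * factor."""
    try:
        result = subprocess.run(test_command, timeout=baseline_seconds * BASELINE_FACTOR)
    except subprocess.TimeoutExpired:
        return "incompetent"  # the mutant timed out
    return "killed" if result.returncode != 0 else "survived"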

I'm happy to help with the speed improvements and reporting when I find some time, but I'm not a great programmer, so I probably won't be joining you in the AST stuff.

@abingham
Contributor Author

After a discussion with @rob-smallshire and @DRMacIver at ACCU, it might not be so difficult to have an external file containing processing instructions, exceptions, etc. The gist of the idea was that each entry would include the filename, the line number, and the line of code in question, along with one or two surrounding lines for context. With this information we could scan the code and reliably tell when and how the exception file had become stale, prompting the user to update it. The hope is that it would become stale infrequently enough not to be a real burden.

Based on this info we could construct a table of exceptions, and that would be used to trim the work order constructed during initialization.
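
A minimal sketch of what such an entry and its staleness check might look like; the format here is an assumption, not a settled design:

from dataclasses import dataclass
from typing import List

@dataclass
class ExceptionEntry:
    filename: str
    line_number: int    # 1-based location of the excepted line
    context: List[str]  # the line itself plus one or two surrounding lines

def is_stale(entry: ExceptionEntry) -> bool:
    """True if the recorded context no longer matches the file."""
    with open(entry.filename) as f:
        lines = f.read().splitlines()
    start = max(entry.line_number - 1 - len(entry.context) // 2, 0)
    return lines[start:start + len(entry.context)] != entry.context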

atodorov added a commit to MrSenko/cosmic-ray that referenced this issue Jul 13, 2016 (and again Jul 14, 2016)
Don't use `while True` b/c BooleanReplacer breaks this function's
test. This is a bit ugly but until Issue sixty-north#97 is fixed there is
no other way around it.
@abingham
Contributor Author

I've started some work on this in the spor branch. This is based on the spor project which is going to be an interesting technical challenge in its own right.

@abingham
Contributor Author

@rob-smallshire had the excellent idea that we could anchor exceptions using language constructs rather than source lines. So you could say "don't run operator X for module.submodule.funcA" and so forth.

The same idea is used in pytest where you can select your tests based on "node id". A good first approximation of this technique for CR would be to steal their syntax, etc.
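
For illustration, exception entries borrowing pytest's node-id syntax might look like this (the address format and operator names are hypothetical):

# Hypothetical exception list using pytest-style node ids.
EXCEPTIONS = [
    ("module/submodule.py::funcA", "ReplaceRelationalOperator"),
    ("module/other.py::ClassB::method", "*"),  # skip all operators here
]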

@abingham
Contributor Author

anchor exceptions using language constructs

After thinking about this a bit, it has some pretty substantial shortcomings. The biggest one is that it means we can't anchor exceptions to things with no obvious language construct. For example, suppose I had a class-level "constant":

class Foo:
    CONST = 42

How could I ask CR to not mutate the 42? I could imagine an address like "module/Foo.py::Foo::CONST", but I worry that the logic for this would get pretty hairy. In this case, we'd have to know to look for assignment AST nodes and check their targets for the name "CONST".
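
As a sketch of how hairy that gets, resolving an address like that means walking the class body for assignment nodes with a matching target name (illustrative only, not a proposed implementation):

import ast

def find_constant_assignment(tree, class_name, const_name):
    """Locate `class_name.const_name = ...` so its value can be protected."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for stmt in node.body:
                if isinstance(stmt, ast.Assign):
                    for target in stmt.targets:
                        if isinstance(target, ast.Name) and target.id == const_name:
                            return stmt  # the node whose value must not mutate
    return None

tree = ast.parse("class Foo:\n    CONST = 42\n")
print(find_constant_assignment(tree, "Foo", "CONST"))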

@boxed commented Nov 28, 2018

mutmut supports the "# pragma: no mutate" option that @xiaclo alludes to above. I think it would be great if Cosmic Ray and Mutpy supported this too so we can have a common standard.

@boxed commented Nov 28, 2018

Also see the discussion on more advanced exclusion rules here: boxed/mutmut#47 and boxed/mutmut#28

@abingham
Contributor Author

I'm not satisfied with that "embedded comment" style of directives. Getting it to do anything beyond a simple "don't mutate this line" is going to get ugly fast, and makes me think of people trying to use XML as a programming language. The approach we're working on with spor will, I think, be much more powerful, less intrusive, and (selfishly) far more interesting to work on.

@boxed commented Nov 28, 2018

I agree, but for the simple case it's pretty OK, and very simple and straightforward. I hadn't seen spor before (I saw you talk about it but didn't manage to find the repo). This seems pretty interesting!

Do you have support for mapping some metadata to the entire directory with spor? Because that's mostly what we've been discussing with mutmut: to have global patterns for exclude.

@boxed commented Nov 28, 2018

Ah, I should also mention that we've been using the pragma in practice with mutmut in quite a few projects where we've achieved zero surviving mutants (not counting excludes), and we've found that in practice it works surprisingly well. So while I agree in theory, in practice you can probably get 90% of the way with just the pragma.

@boxed commented Nov 28, 2018

Hmm... I had an idea just now: what if spor could import from pragmas? Then you could have a good argument for using spor for coverage.py. It seems like it should be simple to implement, and it gives a logical migration path.

@abingham
Contributor Author

Do you have support for mapping some metadata to the entire directory with spor?

Not right now, and I've only really given idle thought to it. I think it would mean introducing a new kind of "anchor" since the current anchors use file content as context for updating. I've added an issue to spor for this.

@abingham
Contributor Author

what if spor could import from pragmas?

I imagine it would be straightforward to write a pragma-to-spor converter. I'd even consider adding something like this to the spor repository, but my instinct is that it should be a standalone tool.

@boxed commented Nov 28, 2018

I'd say you're overthinking it :)

@abingham
Contributor Author

How so?

@boxed commented Nov 28, 2018

It's like a 15-line script; better to include it than to require users to install yet another dependency.

@abingham
Contributor Author

Right, but my objection is that, currently, spor doesn't really know anything about "Python code", so to speak. It's written in Python (though there's an experimental Rust implementation), but beyond that it's intended to be language agnostic. So while it likely won't hurt anyone to have a Python-specific tool in spor, I want to avoid any inclination to turn spor into a "Python tool".

@boxed commented Nov 28, 2018

Oh. Yea ok. Still though, you could include various mini tools to try to drive adoption...

@boxed commented Mar 13, 2020

I implemented a (quite hacky) way to plug into mutmut. You can see a practical use at https://github.com/TriOptima/iommi/blob/master/mutmut_config.py. It is very successful: this little config makes mutation testing of iommi pretty fast, compared to infeasible without it.

It would be nice if we would agree on a common API (maybe with specific extensions namespaced so we can still have differentiation before we standardize). I would love your feedback on this.
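
For readers who don't follow the link, the hook looks roughly like this: mutmut picks up a pre_mutation(context) function from mutmut_config.py and lets it skip mutants or swap the test command. The decorator check and test-path mapping below are illustrative assumptions, not a copy of the iommi config:

import os

def pre_mutation(context):
    """mutmut calls this before each mutation; context lets us skip or retarget."""
    if context.current_source_line.strip().startswith("@reinvokable"):
        context.skip = True  # hypothetical decorator we never want mutated
        return
    # Run only the test module matching the mutated file (illustrative mapping).
    module = os.path.basename(context.filename)
    context.config.test_command = f"pytest -x tests/test_{module}"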

@abingham
Contributor Author

If I understand what you've got, you're skipping mutations based on whether they've got tests associated with them or if certain decorators are applied. Is that right?

I think we could find an easy standardization around a sort of ad hoc predicate API. Users could specify a function that answers "mutate" or "do not mutate" for a given (source file, line number). CR and mutmut could then provide facilities for using such functions to filter mutations in a cross-tool way.

It's interesting to think about other parameters we could pass to this predicate. An obvious one would be the name of the mutation operator, so that users could prevent certain kinds of mutations. Another would be the range of text being mutated, supporting even more fine-grained control. This might be overkill for a simple, broadly applicable API, though. I suspect most users would get what they need from a (file, line) based approach.
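
One possible shape for that predicate, sketched under the assumptions above (the hook name and parameters are not a settled API):

def should_mutate(source_file, line_number, operator_name=None):
    """Return True to allow a mutation at (file, line), False to skip it.

    `operator_name` is the optional extension mentioned above, letting
    users veto whole classes of mutation.
    """
    if operator_name == "BooleanReplacer" and source_file.endswith("retry.py"):
        return False  # e.g. protect a deliberate `while True` loop
    return True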

@boxed commented Mar 14, 2020

If I understand what you've got, you're skipping mutations based on whether they've got tests associated with them or if certain decorators are applied. Is that right?

Sort of. I am skipping certain decorators (or trying to, see below) and I am setting the entire test command based on the file I'm in. It's pretty rough, but it's a HUGE improvement in speed and the accuracy is pretty ok.

Users could specify a function that answers "mutate" or "do not mutate" for a given (source file, line number)

I think the more important point is that this API should be able to change what test command is run. This is the big improvement here. It takes our test suite of ~9s and cuts it down to ~1s (of which almost all is the pytest overhead, again more on that later!).

CR and mutmut could then provide facilities for using such functions to filter mutations in a cross-tool way.

I think this would be awesome for both our projects and would make mutation testing seem much more production ready :)

An obvious one would be the name of the mutation operator so that user could prevent certain kinds of mutations.

That is a good idea!

Another would be the range of text being mutated, supporting even more fine-grained control. This might be overkill for a simple, broadly applicable API, though. I suspect most users would get what they need from a (file, line) based approach.

I assumed so too, but I've already hit the limit of the line-based approach! I will definitely implement a way to get the relevant section of code, I think; it might be smaller than a line or bigger than a line. And the range too, I guess...

All this work with mutmut has made it even more obvious that pytest really is a liability for mutation testing. This applies to both of us: even if I select the perfect subset of tests in zero time, I am still absolutely destroyed by the pytest overhead. This is largely irrelevant for normal use, but for mutation testing it is unacceptable.

I've been thinking about this for several days now and I'm starting to think I'm gonna have to write a new test runner. This seems a bit extreme, but I don't see another option if I want to move forward :( I will not accept going back to unittest; it's just not nice enough to work with, imo.

@abingham
Contributor Author

I am setting the entire test command based on the file I'm in

That's an interesting idea. CR would sorta support this by using multiple sessions, I think, but it's not a feature I've really thought about. I can imagine ways to support it more directly, but I need to think about it.

I am still absolutely destroyed by the pytest overhead.

I wonder if this is related to pkg_resources. Importing it is notoriously slow; this is a problem we face at Sixty North with a number of our programs that use plugins. So many people are experiencing this problem that I suspect it'll be fixed at some point.

write a new test runner.

That is a big step! It's not a step I will take for CR, I think...I really want it to work non-intrusively for as many projects as possible.

@boxed commented Mar 14, 2020

Yea, I've done some work on speeding up pytest, but the plugin system is suuuper slow, as you say. I'm not sure anyone is looking at it, though.

It is a big step, I agree, but I don't see another way to make mutation testing actually good. And there are some big advantages to having a test runner that prioritizes speed (and thus enables mutation testing). At the very least it can put pressure on pytest so they have to take speed more seriously :)

@abingham
Copy link
Contributor Author

I'm going to consider this solved, with the solution being filters. We have fairly sophisticated support for filtering/skipping mutations based on essentially arbitrary criteria. See e.g. the cr-filter-pragma tool.
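
For example, marking a line with the pragma lets the filter skip mutations on it (the exact CLI invocation may vary between Cosmic Ray versions):

def retry_forever(task):
    while True:  # pragma: no mutate
        if task():
            return

# After `cosmic-ray init`, running cr-filter-pragma over the session
# marks work items on pragma-annotated lines as skipped.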
