System for exceptions #97
A reasonable approach might be to let users provide an exceptions list of some sort. They would specify the line number or something (though this is brittle). Then we would simply ignore survival results for that location. This isn't as robust as embedding exceptions in the code itself, but it also doesn't force people to pollute their code with exception notes. Original comment by: austin_bingham
Specifying whitelisted code statements would be helpful; it always likes to find survivors in `while True:` loops. Also, reusing comments designed for other systems may work as well; commonly ignored code may already be annotated with `# pragma: no cover`, for instance.
Hehehe... that's a really interesting one I hadn't thought about! That really would be nearly impossible to test for. The main technical challenge with a code-based whitelist is that, for the most part, Cosmic Ray doesn't actually deal with the code at all. It works on the AST, so I'm not immediately sure how we could correlate code snippets with AST nodes in a general way. Maybe it's easier than I think, at least for some specific cases like this one.
For this case at least, it works pretty well. As a first effort, only allowing valid Python in the blacklist would still help a lot and give direct comparisons to the AST before modifying it. If you can somehow make `pass` generic, you could go even further and accept things like …
Astroid is definitely on my "things to look at" list, but it hasn't risen to the top yet. Perhaps we can completely reuse pylint's black/whitelisting system; that would certainly be very convenient! In all honesty, this isn't going to be my focus for quite a while. My main goal, when and if I have serious time to work on CR, is to get concurrency working better. So if you've got time and/or ideas for this area, I'd be more than happy to help you get them implemented.
I actually had a few ideas around that, but wasn't sure where to put them. Most of the test cases for me seem to be incompetent (timeout), which isn't too helpful, but for the killed ones, I suspect many will fail the same tests. It would probably require a rethink of the test plugin system, but hinting which tests to run first could help killed mutants fail fast for the same properties. Maybe forcing the same AST location to run on a single process and abusing the right pytest flag could help here.

Another idea is tracking runtime per test: if a single test completes in 5ms during the healthy test run, the whole run can be killed if it takes 15ms (3× the baseline) rather than waiting the several seconds that the entire test-suite timeout may take. I'm not sure unittest or py.test give that much information easily, but it could greatly speed up timeouts. A rough sketch of what I mean is below.

I'm happy to help with the speed improvements and reporting when I find some time, but I'm not a great programmer, so I probably won't be joining you in the AST stuff.
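A minimal sketch of the per-test timeout idea, assuming the healthy-run baselines are already recorded somewhere; the function name, verdict strings, and the factor of 3 are all made up for illustration:

```python
import subprocess

BASELINE_FACTOR = 3      # kill a mutant's test run once it exceeds 3x its healthy runtime
MINIMUM_BUDGET = 0.1     # floor in seconds, so 1 ms tests aren't killed by scheduler noise

def run_test_against_mutant(test_id, baseline_seconds):
    """Run one test with a time budget derived from its healthy-run baseline."""
    budget = max(baseline_seconds * BASELINE_FACTOR, MINIMUM_BUDGET)
    try:
        result = subprocess.run(
            ["python", "-m", "pytest", "-x", "-q", test_id],
            capture_output=True,
            timeout=budget,
        )
    except subprocess.TimeoutExpired:
        return "incompetent"   # same verdict as today, just reached much sooner
    return "killed" if result.returncode != 0 else "survived"
```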
After a discussion with @rob-smallshire and @DRMacIver at ACCU, it might not be so difficult to have an external file containing processing instructions, exceptions, etc. The gist of the idea was that each entry would include filename, line number, and the line of code in question, along with one or two surrounding lines for context. With this information we could scan the code and reliably tell when and how the exception file had become stale, prompting the user to update things. The hope is that it would become stale infrequently enough not to be a real burden. Based on this info we could construct a table of exceptions, and that would be used to trim the work order constructed during initialization.
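No file format was settled on in that discussion; here is one entry of that table expressed as a Python literal, purely to show what the filename / line / context idea could hold:

```python
# One hypothetical exception entry: enough context to detect staleness.
exception_entry = {
    "filename": "mypackage/retry.py",
    "line_number": 42,
    "line": "    while True:",
    "context_before": ["def poll(client):"],
    "context_after": ["        result = client.fetch()"],
    # If the stored line or context no longer matches the file, the entry is
    # stale and the user would be prompted to update it.
}
```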
Don't use `while True` b/c BooleanReplacer breaks this function's test. This is a bit ugly but until Issue sixty-north#97 is fixed there is no other way around it.
I've started some work on this in the …
@rob-smallshire had the excellent idea that we could anchor exceptions using language constructs rather than source lines. So you could say "don't run operator X for module.submodule.funcA" and so forth. The same idea is used in pytest, where you can select your tests based on "node id". A good first approximation of this technique for CR would be to steal their syntax, etc.
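For example, entries in that style might look roughly like pytest node ids with an operator tacked on (purely illustrative; no such syntax exists in CR):

```python
# Hypothetical node-id-style exception entries, borrowing pytest's "::" syntax.
exceptions = [
    "mypackage/retry.py::poll::BooleanReplacer",  # skip one operator in one function
    "mypackage/models.py::Order",                 # skip everything inside one class
    "mypackage/generated.py",                     # skip a whole module
]
```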
After thinking about this a bit, it has some pretty substantial shortcomings. The biggest one is that it means we can't anchor exceptions to things with no obvious language construct. For example, suppose I had a class-level "constant":

```python
class Foo:
    CONST = 42
```

How could I ask CR to not mutate the `CONST` assignment?
mutmut supports the `# pragma: no mutate` option that @xiaclo alludes to above. I think it would be great if Cosmic Ray and MutPy supported this too so we can have a common standard.
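For reference, the pragma is just a trailing comment on the line you want left alone; mutmut skips any mutation whose line carries it:

```python
def wait_for_connection(client):
    while True:  # pragma: no mutate  (a survivor here would be meaningless)
        if client.connect():
            return
```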
Also see the discussion on more advanced exclusion rules here: boxed/mutmut#47 and boxed/mutmut#28
I'm not satisfied with that "embedded comment" style of directives. Getting it to do anything beyond a simple "don't mutate this line" is going to get ugly fast, and it makes me think of people trying to use XML as a programming language. The approach we're working on with spor will, I think, be much more powerful, less intrusive, and (selfishly) far more interesting to work on.
I agree, but for the simple case it's pretty OK, and very simple and straightforward. I hadn't seen spor before (I saw you talk about it but didn't manage to find the repo before). This seems pretty interesting! Do you have support for mapping some metadata to an entire directory with spor? Because that's mostly what we've been discussing for mutmut: having global exclude patterns.
Ah, I should also mention that we've been using the pragma in practice with mutmut in quite a few projects where we've achieved zero surviving mutants (not counting excludes), and in practice we've found it works surprisingly well. So while I agree on the theoretical approach, in practice you can probably get 90% of the way with just the pragma.
Hmm... I had an idea just now: what if spor could import from pragmas? Then you could have a good argument for using spor for coverage.py. Seems like it should be simple to implement and gives a logical migration path.
Not right now, and I've only really given it idle thought. I think it would mean introducing a new kind of "anchor", since the current anchors use file content as context for updating. I've added an issue to spor for this.
I imagine it would be straightforward to write a pragma-to-spor converter. I'd even consider adding something like this to the spor repository, but my instinct is that it should be a standalone tool.
I'd say you're overthinking it :)
How so?
It's like a 15-line script; better to include it than to require users to install yet another dependency.
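Something along these lines, presumably. The spor side is left out since its storage API isn't shown in this thread, so this sketch only collects the pragma locations that an importer would feed in:

```python
"""Rough sketch of a pragma importer: find every line carrying a no-mutate /
no-cover pragma so the locations could be handed to spor (or any other tool)."""
import re
import sys
from pathlib import Path

PRAGMA = re.compile(r"#\s*pragma:\s*no\s*(mutate|cover)")

def find_pragmas(root="."):
    for path in Path(root).rglob("*.py"):
        lines = path.read_text(errors="replace").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if PRAGMA.search(line):
                yield str(path), lineno, line.strip()

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for filename, lineno, line in find_pragmas(root):
        print(f"{filename}:{lineno}: {line}")
```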
Right, but my objection is that, currently, spor doesn't really know anything about "Python code", so to speak. It's written in Python (though there's an experimental Rust implementation), but beyond that it's intended to be language agnostic. So while it likely won't hurt anyone to have a Python-specific tool in spor, I want to avoid any inclination to turn spor into a "Python tool".
Oh. Yea, OK. Still, though, you could include various mini tools to try to drive adoption...
I implemented a (quite hacky) way to plug into mutmut. You can see a practical use at https://github.com/TriOptima/iommi/blob/master/mutmut_config.py. It has been very successful: this little config makes mutation testing of iommi pretty fast, compared to infeasible without it. It would be nice if we could agree on a common API (maybe with specific extensions namespaced so we can still have differentiation before we standardize). I would love your feedback on this.
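For anyone skimming, the linked config boils down to a hook roughly like this; it is a trimmed, simplified sketch, the attribute names follow mutmut's pre_mutation API as it stood at the time, and the test-file naming convention is invented:

```python
# mutmut_config.py -- simplified sketch, not the actual iommi config.

def pre_mutation(context):
    line = context.current_source_line.strip()
    # Skip lines we never want mutated, e.g. decorator lines or explicit pragmas.
    if line.startswith('@') or '# pragma: no mutate' in line:
        context.skip = True
        return
    # Run only the tests that correspond to the mutated file instead of the
    # whole suite (hypothetical convention: foo.py -> tests/test_foo.py).
    module = context.filename.rpartition('/')[-1]
    context.config.test_command = f'python -m pytest -x tests/test_{module}'
```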
If I understand what you've got, you're skipping mutations based on whether they've got tests associated with them or whether certain decorators are applied. Is that right?

I think we could find an easy standardization around a sort of ad hoc predicate API. Users could specify a function that answers "mutate" or "do not mutate" for a given …

It's interesting to think about other parameters we could pass to this predicate. An obvious one would be the name of the mutation operator, so that users could prevent certain kinds of mutations. Another would be the range of text being mutated in the …, supporting even more fine-grained control. This might be overkill for a simple, broadly applicable API, though. I suspect most users would get what they need from a …
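Concretely, I'm imagining a shape like the following. This is completely hypothetical; nothing like it exists in CR or mutmut yet, and the parameter names are just the candidates mentioned above:

```python
def should_skip(filename: str, line_number: int, source_line: str, operator: str) -> bool:
    """Answer "do not mutate" (True) or "mutate" (False) for one candidate site."""
    if filename.startswith("generated/"):
        return True                       # whole-directory exclusion
    if operator == "BooleanReplacer" and source_line.lstrip().startswith("while True"):
        return True                       # the classic un-killable mutant
    return False
```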
Sort of. I am skipping certain decorators (or trying to, see below) and I am setting the entire test command based on the file I'm in. It's pretty rough, but it's a HUGE improvement in speed and the accuracy is pretty ok.
I think the more important point is that this API should be able to change what test command is run. This is the big improvement here. It takes our test suite of ~9s and cuts it down to ~1s (of which almost all is the pytest overhead, again more on that later!).
I think this would be awesome for both our projects and would make mutation testing seem much more production ready :)
That is a good idea!
I assumed so too, but I've already hit the limit of the line-based approach! I will definitely implement a way to get the relevant section of code, I think; it might be smaller than a line or bigger than a line. And the range too, I guess...

All this work with mutmut has made it even more obvious that pytest really is a liability for mutation testing. This applies to you and me: even if I select the perfect subset of tests in zero time, I am still absolutely destroyed by the pytest overhead. This is largely irrelevant for normal use, but for mutation testing it is unacceptable. I've been thinking about this for several days now and I'm starting to think I'm gonna have to write a new test runner. This seems a bit extreme, but I don't see how I have another option if I want to move forward :( I will not accept going back to unittest; it's just not nice enough to work with imo.
That's an interesting idea. CR would sorta support this by using multiple sessions, I think, but it's not a feature I've really thought about. I can imagine ways to support it more directly, but I need to think about it.
I wonder if this is related to …
That is a big step! It's not a step I will take for CR, I think... I really want it to work non-intrusively for as many projects as possible.
Yea, I've done some work on speeding up pytest, but the plugin system is suuuper slow, as you say. I'm not sure anyone is looking at it, though. It is a big step, I agree, but I don't see another way to make mutation testing actually good. And there are some big advantages to having a test runner that prioritizes speed (and thus enables mutation testing). At the very least it can put pressure on pytest so they have to take speed more seriously :)
I'm going to consider this solved, with the solution being filters. We have fairly sophisticated support for filtering/skipping tests based on essentially arbitrary criteria. See e.g. the …
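For anyone landing here later: in recent Cosmic Ray versions the filters are driven from the configuration/session side. A sketch of the kind of entry involved is below; the exact section and operator names may have changed since this was written, so treat this as an assumption and check the current docs and the output of `cosmic-ray operators` rather than trusting it verbatim.

```toml
[cosmic-ray.filters.operators-filter]
exclude-operators = [
  "core/ReplaceTrueWithFalse",   # e.g. leave `while True:` loops alone
]
```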
In some cases we'll find that surviving mutations are completely acceptable. Consider ways to allow users to add exceptions.