-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tests Deadliness and Higher Order Mutations #174
Comments
Interesting.. I have had someone suggest to use mutmut to find irrelevant tests that can be deleted before, but deadliness is a new idea to me. Deadliness isn't as easy as you might expect though. There are a few factors that make it more difficult, the primary one being that mutmut is specifically designed to not have any insight into the test runner. So fail/pass is all it knows. This is what makes it so easy to support all test runners. What you suggest would require 1) that we get detailed information on which tests failed and 2) that we don't run the test runner in fail fast mode which is slower. I would think a deadliness of > 1 would be seen as unwanted, as it either means you're running more tests than you need for each mutant. I believe this can be implemented today by having your test runner output a test report at the end and then giving mutmut a post_mutation step where you can parse that file and do whatever you want with that data.
This is an interesting area but so far I don't know of anyone who has actually tried it successfully. Your arguments make sense a priori, but I have actually tried to implement higher order mutants and it doesn't seem to do anything useful. It makes everything slower for negligible gain. As far as I know there are precious few code bases that have reached zero surviving mutants, which you really should be doing before beginning to think about higher order mutants. And you probably have much more to gain by property based testing in a big chunk of those too before higher order mutants become relevant. If you still need more security and robustness you should probably change programming language I think :P All that being said.. do you have a code base where you have reached zero and still feel there is more to be done? If so, can I see it? |
Deadliness
I did not know that. Maybe the slow down from running all tests instead of fail fast could be compensated by using parallelisation of tests such as xdist for pytest. I passed it via the
I think there are two different ways to think about (a high) deadliness.
Higher Order Mutants
I would totally agree with that. Up to second/third/... order mutants takes roughly double/triple/... the number of mutants to test. Maybe parallelisation of tests (like I mentioned above) and of mutants (like mentioned in another issue) could help with that though.
I think I would argue that a surviving higher order mutant is more critical than a lower order mutant. After all this tells us, that there is similar implementation, which still satisfies all tests and has more changes than the lower order mutant applied to the code base. Or put differently, the test base is particularly insensitive to a larger set of changes. Anyways, thank you for your thoughtful response! I can see that the implementation of my ideas come with some challenges, but I just wanted to see if anybody had thought about these concepts before. |
Hey boxed and mutmut contributors,
thanks for this great library! It really helps in developing tests, which test the real corner cases!
I was wondering, whether it would be possible to implement two new features, namely test deadliness and higher order mutations.
Test Deadliness
As far as I am aware, mutmut does not report the number of tests which killed a mutant. The more tests fail after applying a mutant, the higher the deadliness for said mutant is. I think this would be interesting, as this reveals which locations are really critical for the correctness of the code base. It also reveals which locations are well tested and which are not.
As an example, a surviving mutant has a test deadliness of 0, a mutant killed by a single failed test has a test deadliness of 1 and so on.
Maybe this could be normalized by the number of tests to account for small or large test bases.
Reporting an aggregate measure (average, median) or the histogram of test deadliness over all mutants reveals whether
As this should be quite easy to obtain I would propose just outputting the results when calling
mutmut results
.Higher Order Mutations
Mutmut currently only applies a single mutation before running tests. Combining these mutations (two mutations -> second order, three mutations -> third order, an so on) would allow to evaluate the test robustness in a more extreme setting (edit: I noticed, that you mentioned this idea already in #113 (comment)). I would argue that this setting (for reasonable low orders) is still quite realistic, as the example below shows:
for i in range(0,5): ...
the reference codefor i in range(1,5): ...
a normal first order mutant, with changed iteration start and thus the iteration lengthfor i in range(0,6): ...
a normal first order mutant, with changed iteration end and thus the iteration lengthfor i in range(1,6): ...
a second order mutant, with changed iteration start and end, but not the iteration lengthAnother example would be an
if A and B or (C and not D): ...
statement with multiple logical expressionsA
,B
,C
andD
. A single mutant will likely change the behavior, but multiple mutations could partially restore parts of the logical behavior and thus have tests wrongly pass.Obviously this does not scale well for high orders if the number of mutants N is large, I think there are potentially N*(N-1)/2 higher order mutants, which would need to be tested. But within all potential mutant combinations there are some, which are more interesting and more likely to survive. Those combinations are most likely those, of which the independent mutant weren't able to kill (m)any tests.
Thus, building on the test deadliness proposed above, one could order or limit the higher order mutations by the test deadliness of their individual tests. Afterall two (or more) mutations, which were not killed, might also survive in conjunction.
Note, that test deadliness should also be measured for these higher order mutants, which allows us to do the following:
For example, if mutations
A
andB
have td=0, testingAB
would be interesting. On the other hand if mutationsA
andB
have td>>0, testingAB
would not be interesting.If if mutations
A
,B
andC
have td=0, the second order mutationsAB
,AC
andCB
can be tested. Furthermore the third order mutationABC
could be tested, and this would be determined by the test deadliness of he second order mutationsAB
,AC
andCB
.To use this feature, I would propose a new CLI argument
--max_order
, which defaults to 1 and thus defaults to the current mutmut behavior. A usage like--max_order=2
would additionally test all possible pair combinations of the mutants. Low digit values of--max_order
should reveal the most interesting combinations, as more and more concurrent mutations increase the probability of a test or simply a execution failure.Something like
--td_limit_higher_order
could be used to provide a test deadliness limit to skip mutant combinations, where the individuals already have a high average test deadliness.I would be happy to hear your or any of the mutmut contributors opinion on the usefulness of these two features!
The text was updated successfully, but these errors were encountered: