Mutation score can be used to compare test suites in terms of mutant detection. However, because the mutation score summarizes the detection ratios of different mutation types, it is not known whether it is a fair metric for such a comparison. In this paper, we present an empirical study on 10 open-source projects that compares developer-written and automatically generated test suites in terms of mutation score and of the detection ratios of 7 mutation types. Our results indicate that the mutation score is a fair metric for this comparison, but they also suggest equivalence among mutants that PIT generates with different mutation operators.

