Mayat is a code similarity detection tool developed by Tian(Maxwell) Yang. It works by comparing the Abstract Syntax Trees of students' code solutions and generate a similarity score for each pair of students' code.
pip install mayat
python -m mayat.install_langs
Let's say we need to check all students' uniq.c
for homework1. The path for each uniq.c
has the format homework1/<unique-id>/user/uniq.c
. All we need to do is run:
python -m mayat.frontends.TS_C homework1/*/user/uniq.c
If we only want to check the main
function, we can do:
python -m mayat.frontends.TS_C homework1/*/user/uniq.c -f main
Additionally, we can pass more optional arguments for C.py
:
--threshold
: Specify the granularity for the matching algorithm. Default to5
. A smaller value will cause it to check trivial details, which increases the similarity score of two code even though they might not be similar. A larger value will cause it to overlook some common cheat tricks such as swapping two function definitions.
- C:
mayat.TS_C
mayat.C
(Legacy)
- Python:
mayat.TS_Python
mayat.Python
(Legacy)
- Java:
mayat.TS_Java
We implement a new programming language's frontend by using classes and functions defined in mayat
. They are:
mayat.AST.AST
: The base class for Abstract Syntax Tree. For a new PL you should inherit this and implement theAST.create(path)
class method, which takes the path of a program as a parameter and returns the AST representation of that program. Currently it is preferred to usetree-sitter
parsers to implement language frontends, whose corresponding file should be prefixed withTS_
.mayat.args.arg_parser
: Aargparse.ArgumentParser
object. We need to use this object to retrieve command arguments. We can add new arguments if needed.mayat.driver.driver
: The driver function that takes the inherited AST class and the parsed arguments as parameters and run the plagiarism detection algorithm.
An example of this can be find in mayat/frontends/TS_C.py
, which is a C frontend implemented using tree-sitter-c
parser.
cd tests
python test.py -v
This tool will never work for assembly code as the code has to be written in a high level programming language that can be converted into an AST. We can potentially figure out a way to automatically reverse engineer assembly code back to C and then convert it to AST. However, there's no guarantee that the reverse-engineered code can be a good representation for its assembly counterpart.