Parser Generation
Solution 1:
If you want to create a parser in Python, you can look at these libraries: PLY, pyparsing, and Lepl (new but very powerful).
Solution 2:
Building a real C parser by yourself is a really big task.
I suggest you either find one that is already done, e.g. pycparser, or define a really simple subset of C that is easily parsed.
You'll have plenty of work to do for your plagiarism detector after you are done parsing C.
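To give a feel for what a "really simple subset of C" might look like at the lexing stage, here is a minimal sketch using only Python's standard `re` module. The token set, the `KEYWORDS` list, and the `tokenize` and `normalize` helpers are all assumptions made up for illustration; this is not a complete C lexer (no preprocessor, strings, or floats), and it is not how pycparser works.

```python
import re

# Hypothetical token set for a tiny C subset; deliberately incomplete.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),   # block and line comments
    ("NUMBER", r"\d+"),                   # integer literals only
    ("IDENT", r"[A-Za-z_]\w*"),           # identifiers and keywords
    ("OP", r"==|!=|<=|>=|&&|\|\||[-+*/%=<>!;,(){}\[\]]"),
    ("SKIP", r"\s+"),                     # whitespace
    ("MISMATCH", r"."),                   # anything outside the subset
]
KEYWORDS = {"int", "char", "void", "if", "else", "while", "for", "return"}
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Return a list of (kind, text) pairs, dropping whitespace and comments."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {text!r}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


def normalize(tokens):
    """Replace identifiers and numbers with placeholders (an assumed
    preprocessing step, so that renaming variables does not hide a copy)."""
    return [kind if kind in ("IDENT", "NUMBER") else text
            for kind, text in tokens]


if __name__ == "__main__":
    print(tokenize("int x = 1; // counter"))
    print(normalize(tokenize("int x = 1;")))
```

Under this sketch, `int a = 1;` and `int b = 2;` normalize to the same sequence, which is one cheap way to compare submissions without building a full parse tree.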
Solution 3:
I'm not sure you need to parse the token stream to detect the features you're looking for. In fact, it's probably going to complicate things more than anything.
What you're really looking for is sequences of original source code that have a very strong similarity with a suspect sample code being tested. This sounds very similar to the purpose of a Bayes classifier, like those used in spam filtering and language detection.
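As a rough illustration of the Bayes classifier idea, here is a minimal sketch of a multinomial naive Bayes classifier over token trigrams, written with the standard library only. The `ngrams` helper, the `NaiveBayes` class, the labels, and the choice of trigrams with add-one smoothing are all assumptions for the example; a real detector would need far more training data and tuning.

```python
import math
import re
from collections import Counter


def ngrams(source, n=3):
    """Split source into crude tokens and return overlapping n-grams."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.counts = {}       # label -> Counter of n-gram frequencies
        self.docs = Counter()  # label -> number of training samples

    def train(self, label, source):
        self.counts.setdefault(label, Counter()).update(ngrams(source))
        self.docs[label] += 1

    def log_scores(self, source):
        vocab = set().union(*self.counts.values())
        total_docs = sum(self.docs.values())
        scores = {}
        for label, feats in self.counts.items():
            # Log prior plus the smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / total_docs)
            denom = sum(feats.values()) + len(vocab)
            for gram in ngrams(source):
                score += math.log((feats[gram] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, source):
        scores = self.log_scores(source)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("original", "int sum(int a, int b) { return a + b; }")
    nb.train("other", "void print(char *s) { while (*s) putchar(*s++); }")
    print(nb.classify("int total(int a, int b) { return a + b; }"))
```

Here the suspect sample shares most of its trigrams with the sample trained under the "original" label, so that label gets the higher score. The same scoring could be run per original file to rank which one a suspect submission resembles most.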