Skip to content

alexeygrigorev/wsdmcup17-vandalism-detection

Repository files navigation

WSDM Cup 2017: Vandalism Detection

Running the solution:

  • download all the data, unpack, and put the files to the data folder,
  • run 01_xml_to_csv.py for converting the wikidata dump files into a bunch of csv files
  • run 02_join_data.py to join the data from the xml files with the meta information and labels
  • 03_extract_features.py processes the data so it can be used for the model. This includes
    • specifying the training, validation and testing folds
    • processing the information about the users (including the meta information)
    • extracting useful features from the comments
  • 04_train_svm.py creates two models:
    • vectorizer for creating a large one-hot-encoding matrix for all the string features
    • a linear SVM model with L1 penalty for performing the classification
  • tira_client.py is used for running the model on http://tira.io/

About

The 2nd place solution for WSDM Cup 2017: Vandalism Detection

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages