My code for the news scraping / topic prediction Kaggle competition.
You can find the scraping code (better than mine) in the competition Code section.
Build the extensions in place: python setup.py build_ext --inplace
- train_kernel_svm.py - approach #1
- train_rubert.py - approach #2 (trained on different data)
- fusion.py - late fusion (gives +0.5% over using approach #2 alone)
- fix_known_documents.py - sets the labels of test documents that also appear in the training set (uses 4 GB RAM)
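
For readers unfamiliar with late fusion, here is a minimal sketch of the general idea: each model scores every document independently, and the per-class scores are blended before picking a label. It is not taken from fusion.py. The weighted-average rule, the function name `late_fusion`, the `weight_b` parameter, and the toy scores are all assumptions made for illustration.

```python
def late_fusion(probs_a, probs_b, weight_b=0.5):
    """Blend per-class scores from two models and return the top class per document.

    probs_a, probs_b: lists of rows, one row per document, one score per class.
    weight_b: hypothetical blending weight given to the second model.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same documents")
    labels = []
    for row_a, row_b in zip(probs_a, probs_b):
        if len(row_a) != len(row_b):
            raise ValueError("both models must score the same classes")
        # Weighted average of the two models' scores for each class.
        fused = [(1.0 - weight_b) * a + weight_b * b for a, b in zip(row_a, row_b)]
        # Index of the highest fused score is the predicted class.
        labels.append(max(range(len(fused)), key=fused.__getitem__))
    return labels


# Toy scores (made up): three documents, two classes.
svm_probs = [[0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
bert_probs = [[0.3, 0.7], [0.1, 0.9], [0.5, 0.5]]
labels = late_fusion(svm_probs, bert_probs, weight_b=0.6)
```

In a sketch like this, the blending weight would normally be tuned on a held-out validation set rather than fixed by hand.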
Tested on macOS / Kaggle