- This Tamil fake news text corpus has 5,273 rows
- The data was collected automatically from various verified websites by using web scraping tools
- Download the dataset here: Dataset
News | Count |
---|---|
Fake | 2949 |
Real | 2324 |
Total | 5273 |
- The Corpus has data from the following domains:
Domain | Label | Count |
---|---|---|
Politics | politics | 1674 |
Miscellaneous (individual opinions, political) | miscellaneous | 1521 |
Business/Science | tech | 966 |
Entertainment | entertainment | 589 |
Sports | sport | 476 |
Model | Accuracy |
---|---|
Support Vector Machine | 87.85% |
Logisitic Regression | 86.80% |
Naive Bayes | 85.46% |
XG-Boost | 85.08% |
RNN (2 LSTM layers) | 75.04% |
14/11/2022: Our paper titled A Novel Dataset for Fake News Detection in Tamil Regional Language
has been accepted for SPELLL 2022!
03/06/2023: Read our paper here: Speech and Language Technologies for Low-Resource Languages