nopperl/text-char-dnn

Character-level Text Classification

Implementation of character-level deep neural networks for text classification. Three models (CNN, VDCNN and GRU) are evaluated on four binary text classification datasets (Blog Authorship Corpus, PAN13, PAN14 and Enron Email Dataset). Results:

| Model | Blogs | PAN13 | PAN14 | Enron |
|-------|-------|-------|-------|-------|
| CNN   | 65%   | 55%   | 69%   | 57%   |
| VDCNN | 66%   | 74%   | 67%   | 64%   |
| GRU   | 62%   | 60%   | 63%   | 62%   |

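Overall, the VDCNN model is the most accurate: it scores highest on three of the four datasets, with the CNN ahead only on PAN14. The GRU model gives the most consistent results, staying between 60% and 63% on every dataset.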
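The comparison can be checked with a few lines of arithmetic. The sketch below copies the figures from the results table and computes, for each model, the mean score and the spread (highest minus lowest). The dictionary layout and the `summarize` helper are illustrative and not part of the repository.

```python
# Scores (percent) copied from the results table above.
results = {
    "CNN":   {"Blogs": 65, "PAN13": 55, "PAN14": 69, "Enron": 57},
    "VDCNN": {"Blogs": 66, "PAN13": 74, "PAN14": 67, "Enron": 64},
    "GRU":   {"Blogs": 62, "PAN13": 60, "PAN14": 63, "Enron": 62},
}


def summarize(scores):
    """Return (mean, spread) for one model's per-dataset scores."""
    values = list(scores.values())
    mean = sum(values) / len(values)
    spread = max(values) - min(values)  # highest minus lowest score
    return mean, spread


summary = {model: summarize(scores) for model, scores in results.items()}

for model, (mean, spread) in summary.items():
    print(f"{model:<6} mean={mean:.2f}%  spread={spread} points")
```

On these numbers VDCNN has the highest mean (67.75%) and GRU the smallest spread (3 points), which matches the summary above.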

Installation

A working Python 3 installation is assumed. Install the required packages using:

pip install -r requirements.txt

Note that requirements.txt references the tensorflow-gpu package. It is recommended to use a GPU to train the models. If no GPU is used, install the tensorflow package instead.
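To see which case applies, it may help to ask TensorFlow whether it can see a GPU. The following is a minimal sketch, assuming a TensorFlow version that provides `tf.config.list_physical_devices` (TensorFlow 2.1 or later); the TensorFlow version pinned in requirements.txt is not stated here, so the code falls back to a plain message when that call is missing. The function name `describe_tensorflow_devices` is hypothetical.

```python
def describe_tensorflow_devices():
    """Return a short summary of whether TensorFlow can see a GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed"

    # Assumption: tf.config.list_physical_devices exists (TensorFlow >= 2.1).
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is None:
        return f"TensorFlow {tf.__version__}: device listing is not available"

    gpus = list_devices("GPU")
    if gpus:
        return f"TensorFlow {tf.__version__}: {len(gpus)} GPU(s) visible"
    return f"TensorFlow {tf.__version__}: no GPU visible, training will use the CPU"


print(describe_tensorflow_devices())
```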

Usage

Download the training data using:

./download.sh

Run the preprocessing steps using:

./process.sh
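The contents of process.sh are not shown in this document. As a general illustration of the input that character-level models consume, and not a description of the repository's actual preprocessing, the sketch below maps each character to an integer index over a fixed alphabet, then truncates or pads the result to a fixed length. The alphabet, the default length of 16 and the name `encode_text` are all assumptions made for this example.

```python
import string

# Assumed alphabet: lowercase letters, digits, punctuation and the space character.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "

# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_text(text, max_length=16):
    """Encode text as a fixed-length list of character indices."""
    # Lowercase first, then truncate to at most max_length characters.
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in text.lower()[:max_length]]
    # Right-pad with zeros so every encoded text has the same length.
    return indices + [0] * (max_length - len(indices))


print(encode_text("Hi!", max_length=6))  # [8, 9, 37, 0, 0, 0]
```

A real pipeline would normally use a much larger maximum length and might keep separate indices for padding and unknown characters. They are merged here only to keep the example short.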

Now, you can train a model using:

./train.py -a vdcnn -d blogs pan13_tr_en

Run ./train.py -h for more information on the available options.
