awesome-text-summarization

A curated list of resources dedicated to text summarization

Opinosis dataset contains 51 articles. Each article is about a product’s feature, like iPod’s Battery Life, etc. and is a collection of reviews by customers who purchased that product. Each article in the dataset has 5 manually written “gold” summaries. Usually the 5 gold summaries are different but they can also be the same text repeated 5 times.
Past DUC Data and TAC Data include summarization data.
English Gigaword: English Gigaword was produced by Linguistic Data Consortium (LDC).
Large Scale Chinese Short Text Summarization Dataset (LCSTS): This corpus is constructed from the Chinese microblogging website SinaWeibo. It consists of over 2 million real Chinese short texts with short summaries given by the writer of each text.
Ziqiang Cao, Chengyao Chen, Wenjie Li, Sujian Li, Furu Wei, Ming Zhou. TGSum: Build Tweet Guided Multi-Document Summarization Dataset. arXiv:1511.08417, 2015.
scisumm-corpus contains a release of the scientific document summarization corpus and annotations from the WING NUS group.
Avinesh P.V.S., Maxime Peyrard, Christian M. Meyer. Live Blog Corpus for Summarization. arXiv:1802.09884, 2018.
Alexander R. Fabbri, Irene Li, Prawat Trairatvorakul, Yijiao He, Wei Tai Ting, Robert Tung, Caitlin Westerfield, Dragomir R. Radev. TutorialBank: A Manually-Collected Corpus for Prerequisite Chains, Survey Extraction and Resource Recommendation. arXiv:1805.04617, 2018. The source code is TutorialBank. All the datasets can be found through the search engine. The blog TutorialBank: Learning NLP Made Easier is an excellent user guide with step-by-step instructions on how to use the search engine.
Legal Case Reports Data Set contains Australian legal cases from the Federal Court of Australia (FCA).
TIPSTER Text Summarization Evaluation Conference (SUMMAC) includes 183 documents.
NEWS SUMMARY consists of 4515 examples.
BBC News Summary consists of 417 political news articles of BBC from 2004 to 2005.
CNN / Daily Mail dataset (non-anonymized) for summarization is produced by the code cnn-dailymail.
sentence-compression is a large corpus of uncompressed and compressed sentences from news articles. The algorithm to collect the data is described here: Overcoming the Lack of Parallel Data in Sentence Compression by Katja Filippova and Yasemin Altun, EMNLP '13.
The Columbia Summarization Corpus (CSC) was retrieved from the output of the Newsblaster online news summarization system that crawls the Web for news articles, clusters them on specific topics and produces multidocument summaries for each cluster. They collected a total of 166,435 summaries containing 2.5 million sentences and covering 2,129 days in the 2003-2011 period. Additional references of the Columbia Newsblaster summarizer can be found on the website of Columbia NLP group publication page.
WikiHow-Dataset is a new large-scale dataset built from the online [WikiHow](http://www.wikihow.com) knowledge base. Each article consists of multiple paragraphs and each paragraph starts with a sentence summarizing it. By merging the paragraphs to form the article and the paragraph outlines to form the summary, the resulting version of the dataset contains more than 200,000 long-sequence pairs.

Text Summarization Software

sumeval, implemented in Python, is a well-tested, multi-language evaluation framework for text summarization.
sumy is a simple library and command line utility for extracting summaries from HTML pages or plain texts. The package also contains a simple evaluation framework for text summaries. Implemented summarization methods are Luhn, Edmundson, LSA, LexRank, TextRank, SumBasic and KL-Sum.
TextRank4ZH implements the TextRank algorithm to extract key words/phrases and text summarization in Chinese. It is written in Python.
snownlp is a Python library for processing Chinese text.
PKUSUMSUM is an integrated toolkit for automatic document summarization. It supports single-document, multi-document and topic-focused multi-document summarizations, and a variety of summarization methods have been implemented in the toolkit. It supports Western languages (e.g. English) and Chinese language.
fnlp is a toolkit for Chinese natural language processing.
fairseq is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. It provides reference implementations of various sequence-to-sequence models.

Word Representations

G. E. Hinton, J. L. McClelland, and D. E. Rumelhart. Distributed representations. In D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, MIT Press, Cambridge, MA. 1986. The related slides are here or here.
- "Distributed representation" means a many-to-many relationship between two types of representation (such as concepts and neurons): 1. Each concept is represented by many neurons; 2. Each neuron participates in the representation of many concepts.
Language Modeling with N-Grams. The related slides are here. It introduced language modeling and the N-gram, one of the most widely used tools in language processing.
- Language models offer a way to assign a probability to a sentence or other sequence of words, and to predict a word from preceding words.
- N-grams are Markov models that estimate words from a fixed window of previous words. N-gram probabilities can be estimated by counting in a corpus and normalizing (the maximum likelihood estimate).
- N-gram language models are evaluated extrinsically in some task, or intrinsically using perplexity.
- The perplexity of a test set according to a language model is the geometric mean of the inverse test set probability computed by the model.
- Smoothing algorithms provide a more sophisticated way to estimate the probability of N-grams. Commonly used smoothing algorithms for N-grams rely on lower-order N-gram counts through backoff or interpolation.
- There are at least two drawbacks to the n-gram language model. First, it does not take into account contexts farther than 1 or 2 words. N-grams with n up to 5 (i.e. 4 words of context) have been reported, but because of data scarcity most predictions are made with a much shorter context. Second, it does not take into account the “similarity” between words.
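The counting-and-normalizing estimate and the perplexity definition above can be sketched for the bigram case. This is a minimal illustration on a toy corpus, with no smoothing; the function names and the `<s>` / `</s>` boundary markers are choices made for this example, not part of any library.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def bigram_prob(prev, word, contexts, bigrams):
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

def perplexity(sentence, contexts, bigrams):
    """Geometric mean of the inverse probabilities of the predicted tokens."""
    tokens = ["<s>"] + sentence + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = bigram_prob(prev, word, contexts, bigrams)
        if p == 0.0:
            return math.inf                     # unseen bigram, no smoothing
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

corpus = [["i", "like", "tea"], ["i", "like", "coffee"], ["you", "like", "tea"]]
contexts, bigrams = train_bigram(corpus)
print(perplexity(["i", "like", "tea"], contexts, bigrams))
print(perplexity(["tea", "like"], contexts, bigrams))
```

The second call returns infinity because the bigram `(<s>, tea)` never occurs in the toy corpus, which is the situation smoothing is meant to handle.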
Yoshua Bengio, Réjean Ducharme, Pascal Vincent and Christian Jauvin. A Neural Probabilistic Language Model. Journal of Machine Learning Research, 2003.
- They propose continuous space LMs using neural networks to fight the curse of dimensionality by learning a distributed representation for words.
- The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations.
- Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence.
- The idea of the proposed approach can be summarized: 1. associate with each word in the vocabulary a distributed word feature vector, 2. express the joint probability function of word sequences in terms of the feature vectors of these words in the sequence, and 3. learn simultaneously the word feature vectors and the parameters of that probability function.
The following two papers show that a neural network with two hidden layers can both project all words of the context onto a continuous space and calculate the language model probability for the given context.
- Holger Schwenk and Jean-Luc Gauvain. Training Neural Network Language Models On Very Large Corpora. in Proc. Joint Conference HLT/EMNLP, 2005.
- Holger Schwenk. Continuous space language models. Computer Speech and Language, 2007.
Tomas Mikolov's series of papers improved the quality of word representations:
- T. Mikolov, J. Kopecky, L. Burget, O. Glembek and J. Cernocky. Neural network based language models for highly inflective languages. Proc. ICASSP, 2009. The first step in their architecture is training a bigram neural network: given a word w from vocabulary V, estimate the probability distribution of the next word in the text. To compute the projection of word w onto a continuous space, half of the bigram network (the first two layers) is used to compute the values in the hidden layer. Values from the hidden layer of the bigram network are used to form the input layer of the n-gram network.
- T. Mikolov, W.T. Yih and G. Zweig. Linguistic Regularities in Continuous Space Word Representations. NAACL HLT, 2013. They examine the vector-space word representations that are implicitly learned by the input-layer weights. They find that these representations are surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset. This allows vector-oriented reasoning based on the offsets between words. Remarkably, this method outperforms the best previous systems.
- Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781v3, 2013. They propose two new model architectures for learning distributed representations: 1. Continuous Bag-of-Words Model (CBOW) builds a log-linear classifier with context words at the input, where the training criterion is to correctly classify the current word; 2. Continuous Skip-gram Model uses each current word as an input to a log-linear classifier with continuous projection layer, and predicts words within a certain range before and after the current word.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546, 2013. The source code written in C is word2vec. They present several extensions of the original Skip-gram model. They show that sub-sampling of frequent words during training results in a significant speedup (around 2x - 10x), and improves accuracy of the representations of less frequent words. In addition, they present a simplified variant of Noise Contrastive Estimation for training the Skip-gram model that results in faster training and better vector representations for frequent words. The word-based model is extended to a phrase-based model. They find that simple vector addition can often produce meaningful results.
- Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch and Armand Joulin. Advances in Pre-Training Distributed Word Representations. arXiv:1712.09405, 2017. They show that several modifications of the standard word2vec training pipeline significantly improve the quality of the resulting word vectors: position-dependent weighting, phrase representations and subword information.
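To make the difference between the CBOW and Skip-gram training criteria concrete, the sketch below builds the training examples each model would see for one tokenized sentence. It assumes a fixed symmetric window; the word2vec tool itself applies further steps (such as sub-sampling of frequent words) that are left out here. The function names are hypothetical.

```python
def skipgram_pairs(tokens, window):
    """Skip-gram: each current word is the input, each nearby word is a target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window):
    """CBOW: the surrounding context words are the input, the current word is the target."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples

sentence = ["the", "quick", "brown", "fox"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

With the same window, Skip-gram produces one example per (word, neighbor) pair, while CBOW produces one example per position.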
Christopher Olah. Deep Learning, NLP, and Representations. This post reviews some extremely remarkable results in applying deep neural networks to NLP, where the representation perspective of deep learning is a powerful view that seems to answer why deep neural networks are so effective.
Levy, Omer, and Yoav Goldberg. Neural word embedding as implicit matrix factorization. NIPS. 2014.
Sanjeev Arora's series of blog posts and papers about word embeddings:
- The blog Semantic Word Embeddings is a very good overview about word embedding.
- The blog Word Embeddings: Explaining their properties introduces the main result about RAND-WALK: A Latent Variable Model Approach to Word Embeddings, which answers three interesting questions: 1. Why do low-dimensional embeddings capture huge statistical information? 2. Why do low dimensional embeddings work better than high-dimensional ones? 3. Why do Semantic Relations correspond to Directions?
- The blog Linear algebraic structure of word meanings introduces the main result about Linear Algebraic Structure of Word Senses, with Applications to Polysemy, which shows that word senses are easily accessible in many current word embeddings.
Word2Vec Resources: This is a post with links to and descriptions of word2vec tutorials, papers, and implementations.
Word embeddings: how to transform text into numbers
GloVe: Global Vectors for Word Representation is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus.
Li, Yitan, et al. Word embedding revisited: A new representation learning and explicit matrix factorization perspective. IJCAI. 2015.
O. Levy, Y. Goldberg, and I. Dagan. Improving Distributional Similarity with Lessons Learned from Word Embeddings. Trans. Assoc. Comput. Linguist., 2015.
Eric Nalisnick, Sachin Ravi. Learning the Dimensionality of Word Embeddings. arXiv:1511.05392, 2015.
- They describe a method for learning word embeddings with data-dependent dimensionality. Their Stochastic Dimensionality Skip-Gram (SD-SG) and Stochastic Dimensionality Continuous Bag-of-Words (SD-CBOW) are nonparametric analogs of Mikolov et al.'s (2013) well-known 'word2vec' model.
William L. Hamilton, Jure Leskovec, Dan Jurafsky. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change.
- Hamilton et al. model changes in word meaning by fitting word embeddings on consecutive corpora of historical language. They compare several ways of quantifying meaning (co-occurrence vectors weighted by PPMI, SVD embeddings and word2vec embeddings), and align historical embeddings from different corpora by finding the optimal rotational alignment that preserves the cosine similarities as much as possible.
Zijun Yao, Yifan Sun, Weicong Ding, Nikhil Rao, Hui Xiong. Dynamic Word Embeddings for Evolving Semantic Discovery. arXiv:1703.00607v2, International Conference on Web Search and Data Mining (WSDM 2018).
Yang, Wei and Lu, Wei and Zheng, Vincent. A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings. ACL, 2017. The source code in C is cross_domain_embedding.
- This paper presents a simple yet effective method for learning word embeddings based on text from different domains.
Sebastian Ruder. Word embeddings in 2017: Trends and future directions
Bryan McCann, James Bradbury, Caiming Xiong and Richard Socher. Learned in Translation: Contextualized Word Vectors. For a high-level overview of why CoVe are great, check out the post.
- A Keras/TensorFlow implementation of the MT-LSTM/CoVe is CoVe.
- A PyTorch implementation of the MT-LSTM/CoVe is cove.
Maria Pelevina, Nikolay Arefyev, Chris Biemann, Alexander Panchenko. Making Sense of Word Embeddings. arXiv:1708.03390, 2017. The source code written in Python is sensegram.
- It builds sense embeddings from word embeddings using graph-based word sense induction.
Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov. Enriching Word Vectors with Subword Information. arXiv:1607.04606v2, 2017. The source code in C++11 is fastText, which is a library for efficient learning of word representations and sentence classification.
- They propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated to each character n-gram; words being represented as the sum of these representations.
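The bag-of-character-n-grams idea can be sketched as follows. The boundary symbols `<` and `>`, the range of n from 3 to 6, and the inclusion of the whole word as an extra unit follow the paper; the helper names and the toy vectors are hypothetical, and the hashing of n-grams into buckets used by the fastText library is omitted.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of a word wrapped in boundary symbols, plus the whole word."""
    wrapped = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    if wrapped not in grams:
        grams.append(wrapped)                  # special sequence for the word itself
    return grams

def word_vector(word, ngram_vectors, dim, min_n=3, max_n=6):
    """Represent a word as the sum of the vectors of its known n-grams."""
    total = [0.0] * dim
    for gram in char_ngrams(word, min_n, max_n):
        vec = ngram_vectors.get(gram)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total

print(char_ngrams("where", 3, 3))
toy_vectors = {"<wh": [1.0, 0.0], "re>": [0.0, 2.0], "<where>": [0.5, 0.5]}
print(word_vector("where", toy_vectors, dim=2, min_n=3, max_n=3))
```

Because the n-grams are shared across words, a word that was never seen in training still receives a vector from whichever of its n-grams are known.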
Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer and Hervé Jégou. Word Translation Without Parallel Data. arXiv:1710.04087, 2017. The source code in Python is MUSE, which is a library for multilingual unsupervised or supervised word embeddings.
Gabriel Grand, Idan Asher Blank, Francisco Pereira, Evelina Fedorenko. Semantic projection: recovering human knowledge of multiple, distinct object features from word embeddings. arXiv:1802.01241, 2018.
- Could context-dependent relationships be recovered from word embeddings? To address this issue, they introduce a powerful, domain-general solution: "semantic projection" of word-vectors onto lines that represent various object features, like size (the line extending from the word "small" to "big"), intelligence (from "dumb" to "smart"), or danger (from "safe" to "dangerous").
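One plausible reading of projecting a word vector onto the line from "small" to "big" is a normalized scalar projection, sketched below with toy 2-D vectors. The scoring details in the paper may differ (for example in how the endpoints of a feature line are built); the vectors and function names here are made up for illustration.

```python
def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def semantic_projection(word_vec, low_vec, high_vec):
    """Position of word_vec along the line from low_vec (0.0) to high_vec (1.0)."""
    axis = subtract(high_vec, low_vec)
    return dot(subtract(word_vec, low_vec), axis) / dot(axis, axis)

# Toy embeddings, invented for this example.
small, big = [0.0, 1.0], [4.0, 1.0]
mouse, elephant = [1.0, 3.0], [3.0, 0.0]

print(semantic_projection(mouse, small, big))
print(semantic_projection(elephant, small, big))
```

A larger score places the word closer to the "big" end of the size line; the same function can be reused with other endpoint pairs such as "safe" and "dangerous".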
Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov. Learning Word Vectors for 157 Languages. arXiv:1802.06893v2, Proceedings of LREC, 2018.
- They describe how high quality word representations for 157 languages are trained. They used two sources of data to train these models: the free online encyclopedia Wikipedia and data from the common crawl project. Pre-trained word vectors for 157 languages are available.
Douwe Kiela, Changhan Wang and Kyunghyun Cho. Context-Attentive Embeddings for Improved Sentence Representations. arXiv:1804.07983, 2018.
- While one of the first steps in many NLP systems is selecting what embeddings to use, they argue that such a step is better left for neural networks to figure out by themselves. To that end, they introduce a novel, straightforward yet highly effective method for combining multiple types of word embeddings in a single model, leading to state-of-the-art performance within the same model class on a variety of tasks.
Laura Wendlandt, Jonathan K. Kummerfeld, Rada Mihalcea. Factors Influencing the Surprising Instability of Word Embeddings. arXiv:1804.09692, NAACL HLT 2018.
- They provide empirical evidence for how various factors contribute to the stability of word embeddings, and analyze the effects of stability on downstream tasks.
magnitude is a feature-packed Python package and vector storage file format for utilizing vector embeddings in machine learning models in a fast, efficient, and simple manner.
Jose Camacho-Collados, Mohammad Taher Pilehvar. From Word to Sense Embeddings: A Survey on Vector Representations of Meaning. arXiv:1805.04032v3, 2018.

Word Representations for Chinese

X. Chen, L. Xu, Z. Liu, M. Sun and H. Luan. Joint Learning of Character and Word Embeddings. IJCAI, 2015. The source code in C is CWE.
Jian Xu, Jiawei Liu, Liangang Zhang, Zhengyu Li, Huanhuan Chen. Improve Chinese Word Embeddings by Exploiting Internal Structure. NAACL 2016. The source code in C is SCWE.
Jinxing Yu, Xun Jian, Hao Xin and Yangqiu Song. Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components. EMNLP, 2017. The source code in C is JWE.
Shaosheng Cao and Wei Lu. Improving Word Embeddings with Convolutional Feature Learning and Subword Information. AAAI, 2017. The source code in C# is IWE.
Zhe Zhao, Tao Liu, Shen Li, Bofang Li and Xiaoyong Du. Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics. EMNLP, 2017. The source code in Python is ngram2vec.
Shaosheng Cao, Wei Lu, Jun Zhou, Xiaolong Li. cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information. AAAI, 2018. The source code in C++ is cw2vec.

Evaluation of Word Embeddings

Tobias Schnabel, Igor Labutov, David Mimno and Thorsten Joachims. Evaluation methods for unsupervised word embeddings. EMNLP, 2015. The slides are here.
Billy Chiu, Anna Korhonen and Sampo Pyysalo. Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance. Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, 2016.
Stanisław Jastrzebski, Damian Leśniak, Wojciech Marian Czarnecki. How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks. arXiv:1702.02170, 2017. The source code in Python is word-embeddings-benchmarks.
Amir Bakarov. A Survey of Word Embeddings Evaluation Methods. arXiv:1801.09536, 2018.

Evaluation of Word Embeddings for Chinese

Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du. Analogical Reasoning on Chinese Morphological and Semantic Relations. arXiv:1805.06504, ACL, 2018.
- The project Chinese-Word-Vectors provides 100+ Chinese Word Embeddings trained with different representations (dense and sparse), context features (word, ngram, character, and more), and corpora. Moreover, it provides a Chinese analogical reasoning dataset CA8 and an evaluation toolkit for users to evaluate the quality of their word vectors.
Yuanyuan Qiu, Hongzheng Li, Shen Li, Yingdi Jiang, Renfen Hu, Lijiao Yang. Revisiting Correlations between Intrinsic and Extrinsic Evaluations of Word Embeddings. Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2018.

Sentence Representations

Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. A convolutional neural network for modelling sentences. arXiv:1404.2188, 2014.
Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. arXiv:1405.4053v2, 2014.
- Distributed Memory Model of Paragraph Vectors (PV-DM): The inspiration is that the paragraph vectors are asked to contribute to the prediction task of the next word given many contexts sampled from the paragraph. The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. The contexts are fixed-length and sampled from a sliding window over the paragraph. The paragraph vector is shared across all contexts generated from the same paragraph but not across paragraphs. However, the word vector matrix is shared across paragraphs. The downside is that, at prediction time, inference needs to be performed to compute a new vector.
- Distributed Bag of Words version of Paragraph Vector (PV-DBOW): This model ignores the context words in the input, but is forced to predict words randomly sampled from the paragraph in the output.
Yoon Kim. Convolutional neural networks for sentence classification. arXiv:1408.5882, EMNLP 2014.
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun and Sanja Fidler. Skip-Thought Vectors. arXiv:1506.06726, 2015. The source code in Python is skip-thoughts. The TensorFlow implementation of Skip-Thought Vectors is skip_thoughts.
- Instead of using a word to predict its surrounding context, they encode a sentence to predict the sentences around it. Skip-thoughts follows the encoder-decoder framework: an encoder maps words to a sentence vector and a decoder is used to generate the surrounding sentences.
- The end product of skip-thoughts is the encoder, which can then be used to generate fixed length representations of sentences. The decoders are thrown away after training.
- A good tutorial to this paper is My Thoughts On Skip Thoughts.
Andrew M. Dai, Quoc V. Le. Semi-supervised Sequence Learning. arXiv:1511.01432, 2015.
- They present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a "pretraining" step for a later supervised sequence learning algorithm.
- Their semi-supervised learning approach is related to Skip-Thought vectors with two differences. The first difference is that Skip-Thought is a harder objective, because it predicts adjacent sentences. The second is that Skip-Thought is a pure unsupervised learning algorithm, without fine-tuning.
John Wieting and Mohit Bansal and Kevin Gimpel and Karen Livescu. Towards Universal Paraphrastic Sentence Embeddings. arXiv:1511.08198, ICLR 2016. The source code written in Python is iclr2016.
Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, Lawrence Carin. Learning Generic Sentence Representations Using Convolutional Neural Networks. arXiv:1611.07897, EMNLP 2017. The training code written in Python is ConvSent.
Matteo Pagliardini, Prakhar Gupta, Martin Jaggi. Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features. arXiv:1703.02507, NAACL 2018. The source code in Python is sent2vec.
Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio. A Structured Self-attentive Sentence Embedding. arXiv:1703.03130, ICLR 2017.
Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, Jason Weston. StarSpace: Embed All The Things. arXiv:1709.03856v5, 2017. The source code in C++11 is StarSpace.
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine Bordes. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. arXiv:1705.02364v5, EMNLP 2017. The source code in Python is InferSent.
Sanjeev Arora, Yingyu Liang, Tengyu Ma. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. ICLR 2017. The source code written in Python is SIF. SIF_mini_demo is a minimum example for the sentence embedding algorithm. sentence2vec is another implementation.
- A weighted average of words by their distance from the first principal component of a sentence is proposed, which yields a remarkably robust approximate sentence vector embedding.
- However, this “smooth inverse frequency” approach comes with limitations. Not only is calculating PCA for every sentence in a document computationally complex, but the first principal component of a small number of normally distributed words in a high dimensional space is subject to random fluctuation. Their calculation of word frequencies from the unigram count of the word in the corpus also means that their approach still does not work for out-of-vocab words, has no equivalent in other vector spaces and can’t be generated from the word vectors alone.
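The frequency-based weighting that gives "smooth inverse frequency" its name can be sketched as below: each word vector is weighted by a / (a + p(w)), where p(w) is the unigram probability of the word and a is a small constant, and the weighted vectors are averaged. The principal-component step discussed above is deliberately left out, and the toy probabilities, vectors and function names are assumptions made for this example.

```python
def sif_weights(tokens, word_prob, a=1e-3):
    """Smooth inverse frequency weight a / (a + p(w)) for each token."""
    return [a / (a + word_prob.get(tok, 0.0)) for tok in tokens]

def sif_average(tokens, vectors, word_prob, a=1e-3):
    """Weighted average of the word vectors of the tokens that have a vector."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    count = 0
    for tok, weight in zip(tokens, sif_weights(tokens, word_prob, a)):
        if tok not in vectors:
            continue                           # out-of-vocabulary words are skipped
        total = [t + weight * v for t, v in zip(total, vectors[tok])]
        count += 1
    return [t / count for t in total] if count else total

# Toy unigram probabilities and 2-D word vectors, invented for this example.
probs = {"the": 0.05, "cat": 0.001}
vecs = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}

print(sif_weights(["the", "cat"], probs))
print(sif_average(["the", "cat"], vecs, probs))
```

Frequent words such as "the" receive a much smaller weight than rare words, so they contribute little to the sentence vector. The skipped out-of-vocabulary branch is where the limitation noted above shows up.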
Yixin Nie, Mohit Bansal. Shortcut-Stacked Sentence Encoders for Multi-Domain Inference. arXiv:1708.02312, EMNLP 2017. The source code in Python is multiNLI_encoder. The new repo ResEncoder is for Residual-connected sentence encoder for NLI.
Allen Nie, Erin D. Bennett, Noah D. Goodman. DisSent: Sentence Representation Learning from Explicit Discourse Relations. arXiv:1710.04334v2, 2018.
Andreas Rücklé, Steffen Eger, Maxime Peyrard, Iryna Gurevych. Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations. arXiv:1803.01400v2, 2018. The source code written in Python is arxiv2018-xling-sentence-embeddings.
Lajanugen Logeswaran, Honglak Lee. An efficient framework for learning sentence representations. arXiv:1803.02893, ICLR 2018. The open review comments are listed here.
Eric Zelikman. Context is Everything: Finding Meaning Statistically in Semantic Spaces. arXiv:1803.08493, 2018.
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil. Universal Sentence Encoder. arXiv:1803.11175v2, 2018.
Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J Pal. Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning. arXiv:1804.00079, ICLR 2018.

Evaluation of Sentence Embeddings

Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, Yoav Goldberg. Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks. arXiv:1608.04207v3, 2017.
- They define prediction tasks around isolated aspects of sentence structure (namely sentence length, word content, and word order), and score representations by the ability to train a classifier to solve each prediction task when using the representation as input.
Alexis Conneau, Douwe Kiela. SentEval: An Evaluation Toolkit for Universal Sentence Representations. arXiv:1803.05449, LREC 2018. The source code in Python is SentEval. SentEval encompasses a variety of tasks, including binary and multi-class classification, natural language inference and sentence similarity.
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv:1804.07461, 2018.
Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni. What you can cram into a single vector: Probing sentence embeddings for linguistic properties. arXiv:1805.01070v2, 2018.
Christian S. Perone, Roberto Silveira, Thomas S. Paula. Evaluation of sentence embeddings in downstream and linguistic probing tasks. arXiv:1806.06259, 2018.

Cross-lingual Sentence Representations

LASER is a library to calculate multilingual sentence embeddings:
- Holger Schwenk and Matthijs Douze. Learning Joint Multilingual Sentence Representations with Neural Machine Translation. ACL workshop on Representation Learning for NLP, 2017.
- Holger Schwenk and Xian Li. A Corpus for Multilingual Document Classification in Eight Languages. LREC, 2018.
- Holger Schwenk. Filtering and Mining Parallel Data in a Joint Multilingual Space. arXiv:1805.09822, ACL, 2018.
- Mikel Artetxe, Holger Schwenk. Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings. arXiv:1811.01136, 2018.
- Mikel Artetxe, Holger Schwenk. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. arXiv:1812.10464, 2018.
  - They learn a single, language agnostic BiLSTM shared encoder that can handle 93 different languages, which is coupled with an auxiliary decoder and trained over parallel corpora.

Evaluation of Cross-lingual Sentence Representations

Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, Veselin Stoyanov. XNLI: Evaluating Cross-lingual Sentence Representations. arXiv:1809.05053, EMNLP 2018.

Language Representations

Jeremy Howard, Sebastian Ruder. Universal Language Model Fine-tuning for Text Classification. arXiv:1801.06146v5, ACL 2018.
- To address the lack of labeled data and to make NLP classification easier and less time-consuming, the researchers suggest applying transfer learning to NLP problems. Thus, instead of training the model from scratch, you can use another model that has been trained to solve a similar problem as the basis, and then fine-tune the original model to solve your specific problem.
- This fine-tuning should take several considerations into account: a) Different layers should be fine-tuned to different extents, as they capture different kinds of information. b) Adapting the model’s parameters to task-specific features is more efficient if the learning rate is first increased linearly and then decayed linearly. c) Fine-tuning all layers at once is likely to result in catastrophic forgetting; thus, it is better to gradually unfreeze the model starting from the last layer.
- ULMFiT consists of three stages: a) The LM is trained on a general-domain corpus to capture general features of the language in different layers. b) The full LM is fine-tuned on target task data using discriminative fine-tuning and slanted triangular learning rates to learn task-specific features. c) The classifier is fine-tuned on the target task using gradual unfreezing and STLR to preserve low-level representations and adapt high-level ones.
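The three fine-tuning techniques named above can be sketched as small schedule functions. The slanted triangular formula and the default values (cut_frac = 0.1, ratio = 32, a per-layer decay factor of 2.6) follow the ULMFiT paper; the function names are hypothetical, and this is a sketch of the schedules only, not of the training loop.

```python
import math

def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at step t: short linear increase, then long linear decay."""
    cut = math.floor(total_steps * cut_frac)
    if cut == 0:
        raise ValueError("total_steps * cut_frac must be at least 1")
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio

def discriminative_lrs(num_layers, lr_last, decay=2.6):
    """Discriminative fine-tuning: each lower layer uses the rate above it divided by decay."""
    return [lr_last / decay ** (num_layers - 1 - layer) for layer in range(num_layers)]

def unfrozen_layers(epoch, num_layers):
    """Gradual unfreezing: only the last layer trains at epoch 0, one more layer per epoch."""
    first = max(0, num_layers - 1 - epoch)
    return list(range(first, num_layers))

for step in (0, 5, 10, 55, 100):
    print(step, slanted_triangular_lr(step, total_steps=100))
print(discriminative_lrs(num_layers=3, lr_last=0.01))
print([unfrozen_layers(epoch, num_layers=3) for epoch in range(4)])
```

With 100 steps the rate peaks at step 10 and returns to lr_max / ratio at the end, which is the "slanted" shape the name refers to.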
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. Deep contextualized word representations. arXiv:1802.05365, NAACL 2018. The source code is ELMo.
- ELMo generates word embeddings as a weighted sum of the internal states of a deep bi-directional language model (biLM) pre-trained on a large text corpus.
- It includes representations from all layers of the biLM, because different layers represent different types of information.
- ELMo representations are based on characters, so the network can use morphological clues to “understand” out-of-vocabulary tokens unseen in training.
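The weighted sum over biLM layers can be sketched for a single token as follows. In the paper the layer weights are softmax-normalized and the result is scaled by a task-specific scalar; both are learned with the downstream task. Here they are plain arguments, the layer states are toy values, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Normalize raw layer scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def elmo_vector(layer_states, layer_scores, gamma=1.0):
    """Combine one token's per-layer states into a single embedding."""
    weights = softmax(layer_scores)
    dim = len(layer_states[0])
    return [
        gamma * sum(w * state[d] for w, state in zip(weights, layer_states))
        for d in range(dim)
    ]

# Toy states for one token from three biLM layers (values invented).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(elmo_vector(states, layer_scores=[0.0, 0.0, 0.0]))
print(elmo_vector(states, layer_scores=[5.0, 0.0, 0.0]))
```

Equal scores give a plain average of the layers, while a large score on one layer makes the combined vector follow that layer closely.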
Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih. Dissecting Contextual Word Embeddings: Architecture and Representation. arXiv:1808.08949v2, EMNLP 2018.
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving Language Understanding by Generative Pre-Training. Technical report, OpenAI, 2018. The source code written in Python is finetune-transformer-lm.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805, 2018.
- TensorFlow code and pre-trained models for BERT are in bert.
- PyTorch versions of BERT are pytorch-pretrained-BERT and BERT-pytorch.
- Chainer implementation of BERT is bert-chainer.
- Using BERT model as a sentence encoding service is implemented as bert-as-service.

Cross-lingual Language Representations

Guillaume Lample, Alexis Conneau. Cross-lingual Language Model Pretraining. arXiv:1901.07291, 2019.

Extractive Text Summarization

H. P. Luhn. The automatic creation of literature abstracts. IBM Journal of Research and Development, 1958. Luhn's method is as follows:
1. Ignore Stopwords: Common words (known as stopwords) are ignored.
2. Determine Top Words: The most frequently occurring words in the document are counted up.
3. Select Top Words: A small number of the top words are selected to be used for scoring.
4. Select Top Sentences: Sentences are scored according to how many of the top words they contain. The top four sentences are selected for the summary.
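The four steps above can be sketched in plain Python. This is a minimal illustration, not Luhn's original procedure in full: the tiny `STOPWORDS` set, the regex-based sentence splitter, and the function name `luhn_summarize` are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "are", "was"}


def luhn_summarize(text, num_top_words=5, num_sentences=4):
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Step 1: drop stopwords.
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    # Steps 2 and 3: count words and keep the most frequent ones.
    counts = Counter(w for words in tokenized for w in words)
    top_words = {w for w, _ in counts.most_common(num_top_words)}
    # Step 4: score each sentence by how many distinct top words it contains.
    scores = [len(top_words & set(words)) for words in tokenized]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(best[:num_sentences])]


if __name__ == "__main__":
    doc = "Cats chase mice. Cats sleep a lot. The weather is nice. Mice fear cats."
    print(luhn_summarize(doc, num_top_words=2, num_sentences=2))
```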
H. P. Edmundson. New Methods in Automatic Extracting. Journal of the Association for Computing Machinery, 1969.
David M. Blei, Andrew Y. Ng and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003. The source code in Python is sklearn.decomposition.LatentDirichletAllocation. Reimplement Luhn's algorithm, but with topics instead of words and applied to several documents instead of one.
1. Train LDA on all products of a certain type (e.g. all the books)
2. Treat all the reviews of a particular product as one document, and infer their topic distribution
3. Infer the topic distribution for each sentence
4. For each topic that dominates the reviews of a product, pick some sentences that are themselves dominated by that topic.
David M. Blei. Probabilistic Topic Models. Communications of the ACM, 2012.
Rada Mihalcea and Paul Tarau. TextRank: Bringing Order into Texts. ACL, 2004. The source code in Python is pytextrank. pytextrank works in four stages, each feeding its output to the next:
- Part-of-Speech Tagging and lemmatization are performed for every sentence in the document.
- Key phrases are extracted along with their counts, and are normalized.
- A score is calculated for each sentence by approximating the Jaccard distance between the sentence and the key phrases.
- The document is summarized based on the most significant sentences and key phrases.
Federico Barrios, Federico López, Luis Argerich and Rosa Wachenchauzer. Variations of the Similarity Function of TextRank for Automated Summarization. arXiv:1602.03606, 2016. The source code in Python is gensim.summarization. Gensim's summarization only works for English for now, because the text is pre-processed so that stop words are removed and the words are stemmed, and these processes are language-dependent. TextRank works as follows:
- Pre-process the text: remove stop words and stem the remaining words.
- Create a graph where vertices are sentences.
- Connect every sentence to every other sentence by an edge. The weight of the edge is how similar the two sentences are.
- Run the PageRank algorithm on the graph.
- Pick the vertices (sentences) with the highest PageRank score.
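The TextRank steps above can be sketched without any external library. This is an illustration under assumptions, not gensim's implementation: stemming is skipped, the stopword list is a toy one, the similarity is the word-overlap measure normalized by log sentence lengths from the original TextRank paper, and PageRank is run as a fixed number of power iterations. The name `textrank` and its parameters are hypothetical.

```python
import math
import re

# Toy stopword list; stemming is omitted to keep the sketch short.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on", "from"}


def _tokens(sentence):
    return {w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS}


def _similarity(a, b):
    """Shared words, normalized by the log lengths of both sentences."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(sentences, top_n=2, damping=0.85, iterations=50):
    token_sets = [_tokens(s) for s in sentences]
    n = len(sentences)
    # Fully connected graph: weights[i][j] is the similarity of sentences i and j.
    weights = [
        [_similarity(token_sets[i], token_sets[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    # Weighted PageRank by power iteration.
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat.",
        "The cat chased the mouse.",
        "The mouse hid from the cat.",
        "Stock markets fell sharply today.",
    ]
    print(textrank(docs, top_n=2))
```

A sentence that shares no words with the rest of the text ends up with the minimum score of `1 - damping`, so it is picked last.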
TextTeaser uses basic summarization features and builds on them. Those features are:
- Title feature scores the sentence with regard to the title. It is calculated as the count of words that are common to the title of the document and the sentence.
- Sentence length is scored depending on how many words are in the sentence. TextTeaser defines a constant “ideal” (with value 20), which represents the ideal length in terms of number of words. Sentence length is calculated as a normalized distance from this value.
- Sentence position is where the sentence is located. Sentences in the introduction and conclusion receive a higher score for this feature.
- Keyword frequency is just the frequency of the words used in the whole text in the bag-of-words model (after removing stop words).
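Two of the features above, the title feature and the sentence-length feature, are simple enough to sketch directly. The function names are hypothetical and this is not TextTeaser's actual code; in particular, turning the "normalized distance" into a score as `1 - distance` (clipped at zero) is an assumption made so that sentences closer to the ideal length score higher.

```python
def title_score(title_words, sentence_words):
    """Count the distinct words shared by the title and the sentence."""
    return len(set(title_words) & set(sentence_words))


def length_score(sentence_words, ideal=20):
    """1.0 at exactly `ideal` words, falling linearly to 0.0 as the gap grows."""
    # Assumption: score = 1 - normalized distance from the ideal length.
    return max(0.0, 1 - abs(ideal - len(sentence_words)) / ideal)


if __name__ == "__main__":
    title = "cats and dogs".split()
    sentence = "dogs chase cats".split()
    print(title_score(title, sentence), length_score(sentence))
```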
Güneş Erkan and Dragomir R. Radev. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. 2004.
- LexRank uses IDF-modified Cosine as the similarity measure between two sentences. This similarity is used as weight of the graph edge between two sentences. LexRank also incorporates an intelligent post-processing step which makes sure that top sentences chosen for the summary are not too similar to each other.
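The IDF-modified cosine used as the edge weight can be sketched as follows. This is an illustration under assumptions: IDF is computed by treating each sentence as a document (`log(N / df)`), the inputs are already tokenized, and the function names are hypothetical. The redundancy-removing post-processing step is not covered.

```python
import math
from collections import Counter


def idf_table(tokenized_sentences):
    """IDF per word, treating every sentence as one document (an assumption)."""
    n = len(tokenized_sentences)
    doc_freq = Counter(w for sent in tokenized_sentences for w in set(sent))
    return {w: math.log(n / df) for w, df in doc_freq.items()}


def idf_modified_cosine(x, y, idf):
    """Cosine similarity of two token lists with tf * idf term weights."""
    tf_x, tf_y = Counter(x), Counter(y)
    shared = tf_x.keys() & tf_y.keys()
    numerator = sum(tf_x[w] * tf_y[w] * idf[w] ** 2 for w in shared)
    norm_x = math.sqrt(sum((tf * idf[w]) ** 2 for w, tf in tf_x.items()))
    norm_y = math.sqrt(sum((tf * idf[w]) ** 2 for w, tf in tf_y.items()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return numerator / (norm_x * norm_y)


if __name__ == "__main__":
    sents = [["cat", "sat"], ["cat", "ran"], ["dog", "barked"]]
    idf = idf_table(sents)
    print(idf_modified_cosine(sents[0], sents[1], idf))
    print(idf_modified_cosine(sents[0], sents[2], idf))
```

Because the shared word "cat" is more common than "sat" or "ran", its lower IDF keeps the similarity of the first two sentences well below what a plain word-overlap measure would give.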
Latent Semantic Analysis (LSA) Tutorial.
Josef Steinberger and Karel Jezek. Using Latent Semantic Analysis in Text Summarization and Summary Evaluation. Proc. ISIM’04, 2004.
Josef Steinberger and Karel Ježek. Text summarization and singular value decomposition. International Conference on Advances in Information Systems, 2004.
Josef Steinberger, Massimo Poesio, Mijail A Kabadjov and Karel Ježek. Two uses of anaphora resolution in summarization. Information Processing & Management, 2007.
James Clarke and Mirella Lapata. Modelling Compression with Discourse Constraints. EMNLP-CoNLL, 2007.
Dan Gillick and Benoit Favre. A Scalable Global Model for Summarization. ACL, 2009.
Ani Nenkova and Kathleen McKeown. Automatic summarization. Foundations and Trends in Information Retrieval, 2011. The slides are also available.
Vahed Qazvinian, Dragomir R. Radev, Saif M. Mohammad, Bonnie Dorr, David Zajic, Michael Whidby, Taesun Moon. Generating Extractive Summaries of Scientific Paradigms. arXiv:1402.0556, 2014.
Kågebäck, Mikael, et al. Extractive summarization using continuous vector space models. Proceedings of the 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC)@ EACL. 2014.
Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, Oriol Vinyals. Sentence Compression by Deletion with LSTMs. EMNLP 2015.
Ramesh Nallapati, Bowen Zhou, Mingbo Ma. Classify or Select: Neural Architectures for Extractive Document Summarization. arXiv:1611.04244. 2016.
Liangguo Wang, Jing Jiang, Hai Leong Chieu, Chen Hui Ong, Dandan Song, Lejian Liao. Can Syntax Help? Improving an LSTM-based Sentence Compression Model for New Domains. ACL 2017.
Ramesh Nallapati, Feifei Zhai, Bowen Zhou. SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents. arXiv:1611.04230, AAAI 2017.
Shashi Narayan, Nikos Papasarantopoulos, Mirella Lapata, Shay B. Cohen. Neural Extractive Summarization with Side Information. arXiv:1704.04530, 2017.
Rakesh Verma, Daniel Lee. Extractive Summarization: Limits, Compression, Generalized Model and Heuristics. arXiv:1704.05550, 2017.
Ed Collins, Isabelle Augenstein, Sebastian Riedel. A Supervised Approach to Extractive Summarisation of Scientific Papers. arXiv:1706.03946, 2017.
Sukriti Verma, Vagisha Nidhi. Extractive Summarization using Deep Learning. arXiv:1708.04439, 2017.
Parth Mehta, Gaurav Arora, Prasenjit Majumder. Attention based Sentence Extraction from Scientific Articles using Pseudo-Labeled data. arXiv:1802.04675, 2018.
Shashi Narayan, Shay B. Cohen, Mirella Lapata. Ranking Sentences for Extractive Summarization with Reinforcement Learning. arXiv:1802.08636, NAACL, 2018.
Aakash Sinha, Abhishek Yadav, Akshay Gahlot. Extractive Text Summarization using Neural Networks. arXiv:1802.10137, 2018.
Yuxiang Wu, Baotian Hu. Learning to Extract Coherent Summary via Deep Reinforcement Learning. arXiv:1804.07036, AAAI, 2018.
Tanner A. Bohn, Charles X. Ling. Neural Sentence Location Prediction for Summarization. arXiv:1804.08053, 2018.
Kamal Al-Sabahi, Zhang Zuping, Mohammed Nadher. A Hierarchical Structured Self-Attentive Model for Extractive Document Summarization (HSSAS). arXiv:1805.07799, IEEE Access, 2018.
Sansiri Tarnpradab, Fei Liu, Kien A. Hua. Toward Extractive Summarization of Online Forum Discussions via Hierarchical Attention Networks. 2018.
Kristjan Arumae, Fei Liu. Reinforced Extractive Summarization with Question-Focused Rewards. arXiv:1805.10392, 2018.
Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao. Neural Document Summarization by Jointly Learning to Score and Select Sentences. arXiv:1807.02305, ACL 2018.
Xingxing Zhang, Mirella Lapata, Furu Wei, Ming Zhou. Neural Latent Extractive Document Summarization. arXiv:1808.07187, EMNLP 2018.
Chandra Shekhar Yadav. Automatic Text Document Summarization using Semantic-based Analysis. arXiv:1811.06567, 2018.
Jiacheng Xu, Greg Durrett. Neural Extractive Text Summarization with Syntactic Compression. arXiv:1902.00863v1, 2019.

Abstractive Text Summarization

Alexander M. Rush, Sumit Chopra, Jason Weston. A Neural Attention Model for Abstractive Sentence Summarization. EMNLP, 2015. The source code in LUA Torch7 is NAMAS.
- They use sequence-to-sequence encoder-decoder LSTM with attention.
- They use the first sentence of a document. The source document is quite small (about 1 paragraph or ~500 words in the training dataset of Gigaword) and the produced output is also very short (about 75 characters). It remains an open challenge to scale up these limits - to produce longer summaries over multi-paragraph text input (even good LSTM models with attention models fall victim to vanishing gradients when the input sequences become longer than a few hundred items).
- The evaluation method used for automatic summarization has traditionally been the ROUGE metric - which has been shown to correlate well with human judgment of summary quality, but also has a known tendency to encourage "extractive" summarization - so that using ROUGE as a target metric to optimize will lead a summarizer towards a copy-paste behavior of the input instead of the hoped-for reformulation type of summaries.
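To make the extractive bias of ROUGE concrete, here is a minimal sketch of ROUGE-N recall based on clipped n-gram overlap. It is not the official ROUGE toolkit: tokenization is a plain lowercase whitespace split, there is no stemming or stopword handling, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate (counts clipped)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count per n-gram
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(rouge_n_recall("the cat lay on the mat", reference, n=1))
    print(rouge_n_recall("the cat lay on the mat", reference, n=2))
    print(rouge_n_recall("a feline rested on a rug", reference, n=1))
```

A faithful paraphrase with different wording (the third call) scores far lower than a near-copy, which is why optimizing ROUGE directly pushes a summarizer towards copying.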
Peter Liu and Xin Pan. Sequence-to-Sequence with Attention Model for Text Summarization. 2016. The source code in Python is textsum.
- They use sequence-to-sequence encoder-decoder LSTM with attention and bidirectional neural net.
- They use the first 2 sentences of a document with a limit at 120 words.
- The scores achieved by Google’s textsum are 42.57 ROUGE-1 and 23.13 ROUGE-2.
Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. arXiv:1602.06023, 2016. The source code written in Python is Summarization or abstractive-text-summarization.
- They use GRU with attention and bidirectional neural net.
- They use the first 2 sentences of a document with a limit at 120 words.
- They use the Large vocabulary trick (LVT) of Jean et al. 2014, which means when you decode, use only the words that appear in the source - this reduces perplexity. But then you lose the capability to do "abstractive" summary. So they do "vocabulary expansion" by adding a layer of "word2vec nearest neighbors" to the words in the input.
- Feature-rich encoding: they concatenate TF-IDF and named-entity type features to the word embeddings, which adds dimensions to the encoding that reflect the "importance" of the words.
- The most interesting of all is what they call the "Switching Generator/Pointer" layer. In the decoder, they add a layer that decides to either generate a new word based on the context / previously generated word (usual decoder) or copy a word from the input (that is - add a pointer to the input). They learn when to do Generate vs. Pointer and when it is a Pointer which word of the input to Point to.
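The generate-or-copy idea can be illustrated with a toy decoding step. Note the swap: the paper describes a switch that decides between generating and pointing, whereas the sketch below uses the soft mixture popularized by pointer-generator networks, where a probability `p_gen` blends the vocabulary distribution with the attention over source positions. All names and inputs are hypothetical, and in a real model `p_gen`, `vocab_probs`, and `attention` would be produced by the network.

```python
def final_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Blend generation and copying into one distribution over words.

    p_gen: probability of generating from the vocabulary (1 - p_gen is for copying).
    vocab_probs: dict mapping vocabulary words to generation probabilities.
    attention: list of attention weights, one per source position.
    source_tokens: list of source words aligned with `attention`.
    """
    mixed = {word: p_gen * prob for word, prob in vocab_probs.items()}
    for token, weight in zip(source_tokens, attention):
        # Copy mass is added even for words missing from the vocabulary.
        mixed[token] = mixed.get(token, 0.0) + (1 - p_gen) * weight
    return mixed


if __name__ == "__main__":
    dist = final_distribution(
        p_gen=0.5,
        vocab_probs={"the": 0.6, "cat": 0.4},
        attention=[0.9, 0.1],
        source_tokens=["zanzibar", "cat"],
    )
    print(dist)
```

The out-of-vocabulary source word "zanzibar" receives probability only through the copy path, which is what lets such models reproduce rare names from the input.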
Konstantin Lopyrev. Generating News Headlines with Recurrent Neural Networks. arXiv:1512.01712, 2015. The source code in Python is headlines.
Jiwei Li, Minh-Thang Luong and Dan Jurafsky. A Hierarchical Neural Autoencoder for Paragraphs and Documents. arXiv:1506.01057, 2015. The source code in Matlab is Hierarchical-Neural-Autoencoder.
Sumit Chopra, Alexander M. Rush and Michael Auli. Abstractive Sentence Summarization with Attentive Recurrent Neural Networks. NAACL, 2016.
Jianpeng Cheng, Mirella Lapata. Neural Summarization by Extracting Sentences and Words. arXiv:1603.07252, 2016.
- This paper uses attention as a mechanism for identifying the best sentences to extract, and then goes beyond that to generate an abstractive summary.
Siddhartha Banerjee, Prasenjit Mitra, Kazunari Sugiyama. Generating Abstractive Summaries from Meeting Transcripts. arXiv:1609.07033, Proceedings of the 2015 ACM Symposium on Document Engineering, DocEng' 2015.
Siddhartha Banerjee, Prasenjit Mitra, Kazunari Sugiyama. Multi-document abstractive summarization using ILP based multi-sentence compression. arXiv:1609.07034, 2016.
Suzuki, Jun, and Masaaki Nagata. Cutting-off Redundant Repeating Generations for Neural Abstractive Summarization. EACL 2017 (2017): 291.
Jiwei Tan and Xiaojun Wan. Abstractive Document Summarization with a Graph-Based Attentional Neural Model. ACL, 2017.
Preksha Nema, Mitesh M. Khapra, Balaraman Ravindran and Anirban Laha. Diversity driven attention model for query-based abstractive summarization. ACL, 2017.
Romain Paulus, Caiming Xiong, Richard Socher. A Deep Reinforced Model for Abstractive Summarization. arXiv:1705.04304, 2017. The related blog is Your tldr by an ai: a deep reinforced model for abstractive summarization.
- Their model is trained with teacher forcing and reinforcement learning at the same time, being able to make use of both word-level and whole-summary-level supervision to make it more coherent and readable.
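Combining teacher forcing with reinforcement learning is usually expressed as a weighted sum of two losses. The sketch below assumes the self-critical form of the reinforcement term (the reward of a sampled summary compared against the reward of a greedy baseline) and a mixing weight `gamma`; these details come from the paper's general formulation rather than the note above, and the function names and toy numbers are hypothetical. Rewards would typically be a summary-level metric such as ROUGE.

```python
def ml_loss(gold_log_probs):
    """Teacher-forcing loss: negative log-likelihood of the reference tokens."""
    return -sum(gold_log_probs)


def self_critical_loss(sample_log_probs, sample_reward, baseline_reward):
    """Policy-gradient loss: lower when a better-than-baseline sample becomes more likely."""
    return (baseline_reward - sample_reward) * sum(sample_log_probs)


def mixed_loss(gold_log_probs, sample_log_probs, sample_reward, baseline_reward, gamma):
    """Weighted sum of the whole-summary (RL) and word-level (ML) objectives."""
    rl = self_critical_loss(sample_log_probs, sample_reward, baseline_reward)
    ml = ml_loss(gold_log_probs)
    return gamma * rl + (1 - gamma) * ml


if __name__ == "__main__":
    print(mixed_loss(
        gold_log_probs=[-0.1, -0.2],
        sample_log_probs=[-0.4, -0.6],
        sample_reward=0.5,
        baseline_reward=0.3,
        gamma=0.5,
    ))
```

When the sampled summary beats the baseline, the coefficient on its log-probability is negative, so minimizing the loss raises the probability of that sample.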
Shibhansh Dohare, Harish Karnick. Text Summarization using Abstract Meaning Representation. arXiv:1706.01678, 2017.
Piji Li, Wai Lam, Lidong Bing, Zihao Wang. Deep Recurrent Generative Decoder for Abstractive Text Summarization. arXiv:1708.00625, 2017.
Xinyu Hua, Lu Wang. A Pilot Study of Domain Adaptation Effect for Neural Abstractive Summarization. arXiv:1707.07062, 2017.
Angela Fan, David Grangier, Michael Auli. Controllable Abstractive Summarization. arXiv:1711.05217, 2017.
Linqing Liu, Yao Lu, Min Yang, Qiang Qu, Jia Zhu, Hongyan Li. Generative Adversarial Network for Abstractive Text Summarization. arXiv:1711.09357, 2017.
Johan Hasselqvist, Niklas Helmertz, Mikael Kågebäck. Query-Based Abstractive Summarization Using Neural Networks. arXiv:1712.06100, 2017.
Tal Baumel, Matan Eyal, Michael Elhadad. Query Focused Abstractive Summarization: Incorporating Query Relevance, Multi-Document Coverage, and Summary Length Constraints into seq2seq Models. arXiv:1801.07704, 2018.
André Cibils, Claudiu Musat, Andreea Hossman, Michael Baeriswyl. Diverse Beam Search for Increased Novelty in Abstractive Summarization. arXiv:1802.01457, 2018.
Chieh-Teng Chang, Chi-Chia Huang, Jane Yung-Jen Hsu. A Hybrid Word-Character Model for Abstractive Summarization. arXiv:1802.09968, 2018.
Asli Celikyilmaz, Antoine Bosselut, Xiaodong He, Yejin Choi. Deep Communicating Agents for Abstractive Summarization. arXiv:1803.10357, 2018.
Piji Li, Lidong Bing, Wai Lam. Actor-Critic based Training Framework for Abstractive Summarization. arXiv:1803.11070, 2018.
Paul Azunre, Craig Corcoran, David Sullivan, Garrett Honke, Rebecca Ruppel, Sandeep Verma, Jonathon Morgan. Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings. arXiv:1804.01503, 2018.
Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, Nazli Goharian. A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents. arXiv:1804.05685, 2018.
Ramakanth Pasunuru, Mohit Bansal. Multi-Reward Reinforced Summarization with Saliency and Entailment. arXiv:1804.06451, 2018.
Jianmin Zhang, Jiwei Tan, Xiaojun Wan. Towards a Neural Network Approach to Abstractive Multi-Document Summarization. arXiv:1804.09010, 2018.
Shuming Ma, Xu Sun, Junyang Lin, Xuancheng Ren. A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification. arXiv:1805.01089v2, IJCAI 2018.
Li Wang, Junlin Yao, Yunzhe Tao, Li Zhong, Wei Liu, Qiang Du. A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization. arXiv:1805.03616, International Joint Conference on Artificial Intelligence and European Conference on Artificial Intelligence (IJCAI-ECAI), 2018.
Guokan Shang, Wensi Ding, Zekun Zhang, Antoine J.-P. Tixier, Polykarpos Meladianos, Michalis Vazirgiannis, Jean-Pierre Lorré. Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization. arXiv:1805.05271, 2018.
Fei Liu, Jeffrey Flanigan, Sam Thomson, Norman Sadeh, Noah A. Smith. Toward Abstractive Summarization Using Semantic Representations. arXiv:1805.10399, 2018.
Han Guo, Ramakanth Pasunuru, Mohit Bansal. Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation. arXiv:1805.11004, ACL 2018.
Yen-Chun Chen, Mohit Bansal. Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting. arXiv:1805.11080, ACL 2018. The source code written in Python is fast_abs_rl.
Reinald Kim Amplayo, Seonjae Lim, Seung-won Hwang. Entity Commonsense Representation for Neural Abstractive Summarization. arXiv:1806.05504, NAACL 2018.
Kaiqiang Song, Lin Zhao, Fei Liu. Structure-Infused Copy Mechanisms for Abstractive Summarization. arXiv:1806.05658, 2018.
Kexin Liao, Logan Lebanoff, Fei Liu. Abstract Meaning Representation for Multi-Document Summarization. arXiv:1806.05655, 2018.
Shibhansh Dohare, Vivek Gupta and Harish Karnick. Unsupervised Semantic Abstractive Summarization. ACL, July 2018.
Niantao Xie, Sujian Li, Huiling Ren, Qibin Zhai. Abstractive Summarization Improved by WordNet-based Extractive Sentences. arXiv:1808.01426, NLPCC 2018.
Wojciech Kryściński, Romain Paulus, Caiming Xiong, Richard Socher. Improving Abstraction in Text Summarization. arXiv:1808.07913, 2018.
Hardy, Andreas Vlachos. Guided Neural Language Generation for Abstractive Summarization using Abstract Meaning Representation. arXiv:1808.09160, EMNLP 2018.
Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush. Bottom-Up Abstractive Summarization. arXiv:1808.10792, 2018.
Yichen Jiang, Mohit Bansal. Closed-Book Training to Improve Summarization Encoder Memory. arXiv:1809.04585, 2018.
Raphael Schumann. Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder. arXiv:1809.05233, 2018.
Kamal Al-Sabahi, Zhang Zuping, Yang Kang. Bidirectional Attentional Encoder-Decoder Model and Bidirectional Beam Search for Abstractive Summarization. arXiv:1809.06662, 2018.
Tomonori Kodaira, Mamoru Komachi. The Rule of Three: Abstractive Text Summarization in Three Bullet Points. arXiv:1809.10867, PACLIC 2018, 2018.
Byeongchang Kim, Hyunwoo Kim, Gunhee Kim. Abstractive Summarization of Reddit Posts with Multi-level Memory Networks. arXiv:1811.00783, 2018. The github project is MMN including the dataset.
Tian Shi, Yaser Keneshloo, Naren Ramakrishnan, Chandan K. Reddy. Neural Abstractive Text Summarization with Sequence-to-Sequence Models. arXiv:1812.02303v2, 2018.
Shen Gao, Xiuying Chen, Piji Li, Zhaochun Ren, Lidong Bing, Dongyan Zhao, Rui Yan. Abstractive Text Summarization by Incorporating Reader Comments. arXiv:1812.05407v1, AAAI 2019.
Haoyu Zhang, Yeyun Gong, Yu Yan, Nan Duan, Jianjun Xu, Ji Wang, Ming Gong, Ming Zhou. Pretraining-Based Natural Language Generation for Text Summarization. arXiv:1902.09243v2, 2019.
Soheil Esmaeilzadeh, Gao Xian Peh, Angela Xu. Neural Abstractive Text Summarization and Fake News Detection. arXiv:1904.00788v1, 2019.

Text Summarization

Eduard Hovy and Chin-Yew Lin. Automated text summarization and the summarist system. In Proceedings of a Workshop on Held at Baltimore, Maryland, ACL, 1998.
Eduard Hovy and Chin-Yew Lin. Automated Text Summarization in SUMMARIST. In Advances in Automatic Text Summarization, 1999.
Dipanjan Das and Andre F.T. Martins. A survey on automatic text summarization. Technical report, CMU, 2007.
J. Leskovec, L. Backstrom, J. Kleinberg. Meme-tracking and the Dynamics of the News Cycle. ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2009.
Ryang, Seonggi, and Takeshi Abekawa. "Framework of automatic text summarization using reinforcement learning." In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 256-265. Association for Computational Linguistics, 2012. [not neural-based methods]
King, Ben, Rahul Jha, Tyler Johnson, Vaishnavi Sundararajan, and Clayton Scott. "Experiments in Automatic Text Summarization Using Deep Neural Networks." Machine Learning (2011).
Liu, Yan, Sheng-hua Zhong, and Wenjie Li. "Query-Oriented Multi-Document Summarization via Unsupervised Deep Learning." AAAI. 2012.
He, Zhanying, Chun Chen, Jiajun Bu, Can Wang, Lijun Zhang, Deng Cai, and Xiaofei He. "Document Summarization Based on Data Reconstruction." In AAAI. 2012.
Mohsen Pourvali, Mohammad Saniee Abadeh. Automated Text Summarization Base on Lexicales Chain and graph Using of WordNet and Wikipedia Knowledge Base. arXiv:1203.3586, 2012.
PadmaPriya, G., and K. Duraiswamy. An Approach For Text Summarization Using Deep Learning Algorithm. Journal of Computer Science 10, no. 1 (2013): 1-9.
Rushdi Shams, M.M.A. Hashem, Afrina Hossain, Suraiya Rumana Akter, Monika Gope. Corpus-based Web Document Summarization using Statistical and Linguistic Approach. arXiv:1304.2476, Procs. of the IEEE International Conference on Computer and Communication Engineering (ICCCE10), pp. 115-120, Kuala Lumpur, Malaysia, May 11-13, (2010).
Juan-Manuel Torres-Moreno. Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization. arXiv:1209.3126, 2012.
Rioux, Cody, Sadid A. Hasan, and Yllias Chali. Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning. In EMNLP, pp. 681-690. 2014. [not neural-based methods]
Fatma El-Ghannam, Tarek El-Shishtawy. Multi-Topic Multi-Document Summarizer. arXiv:1401.0640, 2014.
Denil, Misha, Alban Demiraj, and Nando de Freitas. Extraction of Salient Sentences from Labelled Documents. arXiv:1412.6815, 2014.
Denil, Misha, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas. Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network. arXiv:1406.3830, 2014.
Cao, Ziqiang, Furu Wei, Li Dong, Sujian Li, and Ming Zhou. Ranking with Recursive Neural Networks and Its Application to Multi-document Summarization. AAAI, 2015.
Fei Liu, Jeffrey Flanigan, Sam Thomson, Norman Sadeh, and Noah A. Smith. Toward Abstractive Summarization Using Semantic Representations. NAACL, 2015.
Wenpeng Yin, Yulong Pei. Optimizing Sentence Modeling and Selection for Document Summarization. IJCAI, 2015.
Liu, He, Hongliang Yu, and Zhi-Hong Deng. Multi-Document Summarization Based on Two-Level Sparse Representation Model. In Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
Jin-ge Yao, Xiaojun Wan and Jianguo Xiao. Compressive Document Summarization via Sparse Optimization. IJCAI, 2015.
Piji Li, Lidong Bing, Wai Lam, Hang Li, and Yi Liao. Reader-Aware Multi-Document Summarization via Sparse Coding. arXiv:1504.07324, IJCAI, 2015.
Marta Aparício, Paulo Figueiredo, Francisco Raposo, David Martins de Matos, Ricardo Ribeiro, Luís Marujo. Summarization of Films and Documentaries Based on Subtitles and Scripts. arXiv:1506.01273, 2015.
Luís Marujo, Ricardo Ribeiro, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell. Extending a Single-Document Summarizer to Multi-Document: a Hierarchical Approach. arXiv:1507.02907, 2015.
Xiaojun Wan, Yansong Feng and Weiwei Sun. Automatic Text Generation: Research Progress and Future Trends. Book Chapter in CCF 2014-2015 Annual Report on Computer Science and Technology in China (In Chinese), 2015.
Xiaojun Wan, Ziqiang Cao, Furu Wei, Sujian Li, Ming Zhou. Multi-Document Summarization via Discriminative Summary Reranking. arXiv:1507.02062, 2015.
Gulcehre, Caglar, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. Pointing the Unknown Words. arXiv:1603.08148, 2016.
Jiatao Gu, Zhengdong Lu, Hang Li, Victor O.K. Li. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. arXiv:1603.06393, ACL, 2016.
- They addressed an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. In this paper, they incorporated copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with encoder-decoder structure. CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence.
Jianmin Zhang, Jin-ge Yao and Xiaojun Wan. Toward constructing sports news from live text commentary. In Proceedings of ACL, 2016.
Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei. AttSum: Joint Learning of Focusing and Summarization with Neural Attention. arXiv:1604.00125, 2016.
Ayana, Shiqi Shen, Yu Zhao, Zhiyuan Liu and Maosong Sun. Neural Headline Generation with Sentence-wise Optimization. arXiv:1604.01904, 2016.
Ayana, Shiqi Shen, Zhiyuan Liu and Maosong Sun. Neural Headline Generation with Minimum Risk Training. 2016.
Lu Wang, Hema Raghavan, Vittorio Castelli, Radu Florian, Claire Cardie. A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization. arXiv:1606.07548, 2016.
Milad Moradi, Nasser Ghadiri. Different approaches for identifying important concepts in probabilistic biomedical text summarization. arXiv:1605.02948, 2016.
Kikuchi, Yuta, Graham Neubig, Ryohei Sasano, Hiroya Takamura, and Manabu Okumura. Controlling Output Length in Neural Encoder-Decoders. arXiv:1609.09552, 2016.
Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei and Hui Jiang. Distraction-Based Neural Networks for Document Summarization. arXiv:1610.08462, IJCAI, 2016.
Wang, Lu, and Wang Ling. Neural Network-Based Abstract Generation for Opinions and Arguments. NAACL, 2016.
Yishu Miao, Phil Blunsom. Language as a Latent Variable: Discrete Generative Models for Sentence Compression. EMNLP, 2016.
Takase, Sho, Jun Suzuki, Naoaki Okazaki, Tsutomu Hirao, and Masaaki Nagata. Neural headline generation on abstract meaning representation. EMNLP, 1054-1059, 2016.
Wenyuan Zeng, Wenjie Luo, Sanja Fidler, Raquel Urtasun. Efficient Summarization with Read-Again and Copy Mechanism. arXiv:1611.03382, 2016.
Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei. Improving Multi-Document Summarization via Text Classification. arXiv:1611.09238, 2016.
Hongya Song, Zhaochun Ren, Piji Li, Shangsong Liang, Jun Ma, and Maarten de Rijke. Summarizing Answers in Non-Factoid Community Question-Answering. In WSDM 2017: The 10th International Conference on Web Search and Data Mining, 2017.
Piji Li, Zihao Wang, Wai Lam, Zhaochun Ren, Lidong Bing. Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization. In AAAI, 2017.
Yinfei Yang, Forrest Sheng Bao, Ani Nenkova. Detecting (Un)Important Content for Single-Document News Summarization. arXiv:1702.07998, 2017.
Rui Meng, Sanqiang Zhao, Shuguang Han, Daqing He, Peter Brusilovsky, Yu Chi. Deep Keyphrase Generation. arXiv:1704.06879, 2017. The source code written in Python is seq2seq-keyphrase.
Abigail See, Peter J. Liu and Christopher D. Manning. Get To The Point: Summarization with Pointer-Generator Networks. ACL, 2017. The source code is pointer-generator.
Qingyu Zhou, Nan Yang, Furu Wei and Ming Zhou. Selective Encoding for Abstractive Sentence Summarization. arXiv:1704.07073, ACL, 2017.
Maxime Peyrard and Judith Eckle-Kohler. Supervised Learning of Automatic Pyramid for Optimization-Based Multi-Document Summarization. ACL, 2017.
Jin-ge Yao, Xiaojun Wan and Jianguo Xiao. Recent Advances in Document Summarization. KAIS, survey paper, 2017.
Pranay Mathur, Aman Gill and Aayush Yadav. Text Summarization in Python: Extractive vs. Abstractive techniques revisited. 2017.
- They compared modern extractive methods like LexRank, LSA, Luhn and Gensim’s existing TextRank summarization module on the Opinosis dataset of 51 (article, summary) pairs. They also tried an abstractive technique using TensorFlow’s textsum, but didn’t obtain good results due to its extremely high hardware demands (7000 GPU hours).
Arman Cohan, Nazli Goharian. Scientific Article Summarization Using Citation-Context and Article's Discourse Structure. arXiv:1704.06619, EMNLP, 2015.
Shuming Ma, Xu Sun, Jingjing Xu, Houfeng Wang, Wenjie Li, Qi Su. Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization. The source code written in Python is SRB.
Arman Cohan, Nazli Goharian. Scientific document summarization via citation contextualization and scientific discourse. arXiv:1706.03449, 2017.
Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan Srinivasan, Dragomir Radev. Graph-based Neural Multi-Document Summarization. arXiv:1706.06681, CoNLL, 2017.
Abeed Sarker, Diego Molla, Cecile Paris. Automated text summarisation and evidence-based medicine: A survey of two domains. arXiv:1706.08162, 2017.
Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut. Text Summarization Techniques: A Brief Survey. arXiv:1707.02268, 2017.
Demian Gholipour Ghalandari. Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization. arXiv:1708.07690, EMNLP, 2017.
Shuming Ma, Xu Sun. A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification. arXiv:1710.02318, 2017. The source code written in Python is SRB.
Kaustubh Mani, Ishan Verma, Lipika Dey. Multi-Document Summarization using Distributed Bag-of-Words Model. arXiv:1710.02745, 2017.
Liqun Shao, Hao Zhang, Ming Jia, Jie Wang. Efficient and Effective Single-Document Summarizations and A Word-Embedding Measurement of Quality. arXiv:1710.00284, KDIR, 2017.
Mohammad Ebrahim Khademi, Mohammad Fakhredanesh, Seyed Mojtaba Hoseini. Conceptual Text Summarizer: A new model in continuous vector space. arXiv:1710.10994, 2017.
Jingjing Xu. Improving Social Media Text Summarization by Learning Sentence Weight Distribution. arXiv:1710.11332, 2017.
Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer. Generating Wikipedia by Summarizing Long Sequences. arXiv:1801.10198, 2018.
Parth Mehta, Prasenjit Majumder. Content based Weighted Consensus Summarization. arXiv:1802.00946, 2018.
Mayank Chaudhari, Aakash Nelson Mattukoyya. Tone Biased MMR Text Summarization. arXiv:1802.09426, 2018.
Divyanshu Daiya, Anukarsh Singh, Mukesh Jadon. Using Statistical and Semantic Models for Multi-Document Summarization. arXiv:1805.04579, 2018.
Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, Min Sun. A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss. arXiv:1805.06266, ACL 2018.
Pei Guo, Connor Anderson, Kolten Pearson, Ryan Farrell. Neural Network Interpretation via Fine Grained Textual Summarization. arXiv:1805.08969, 2018.
Kamal Al-Sabahi, Zhang Zuping, Yang Kang. Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings. arXiv:1807.02748, KSII Transactions on Internet and Information Systems, 2018.
Chandra Khatri, Gyanit Singh, Nish Parikh. Abstractive and Extractive Text Summarization using Document Context Vector and Recurrent Neural Networks. arXiv:1807.08000v2, ACM KDD 2018 Deep Learning Day, 2018.
Logan Lebanoff, Kaiqiang Song, Fei Liu. Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization. arXiv:1808.06218, 2018.
Shashi Narayan, Shay B. Cohen, Mirella Lapata. Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. arXiv:1808.08745, 2018.
Parth Mehta, Prasenjit Majumder. Exploiting local and global performance of candidate systems for aggregation of summarization techniques. arXiv:1809.02343, 2018.
Ritwik Mishra and Tirthankar Gayen. "Automatic Lossless-Summarization of News Articles with Abstract Meaning Representation." Procedia Computer Science 135 (September 2018): 178-185.
Chi Zhang, Shagan Sah, Thang Nguyen, Dheeraj Peri, Alexander Loui, Carl Salvaggio, Raymond Ptucha. Semantic Sentence Embeddings for Paraphrasing and Text Summarization. arXiv:1809.10267, IEEE GlobalSIP 2017 Conference, 2018.
Yaser Keneshloo, Naren Ramakrishnan, Chandan K. Reddy. Deep Transfer Reinforcement Learning for Text Summarization. arXiv:1810.06667, 2018.
Elvys Linhares Pontes, Stéphane Huet, Juan-Manuel Torres-Moreno. A Multilingual Study of Compressive Cross-Language Text Summarization. arXiv:1810.10639, 2018.
Patrick Fernandes, Miltiadis Allamanis, Marc Brockschmidt. Structured Neural Summarization. arXiv:1811.01824v2, ICLR 2019.
Hadrien Van Lierde, Tommy W. S. Chow. Query-oriented text summarization based on hypergraph transversals. arXiv:1902.00672v1, 2019.
Erion Çano, Ondřej Bojar. Keyphrase Generation: A Text Summarization Struggle. arXiv:1904.00110v2, 2019.
Abdelkrime Aries, Djamel eddine Zegour, Walid Khaled Hidouci. Automatic text summarization: What has been done and what has to be done. arXiv:1904.00688v1, 2019.

Chinese Text Summarization

Mao Song Sun. Natural Language Processing Based on Naturally Annotated Web Resources. Journal of Chinese Information Processing, 2011.
Baotian Hu, Qingcai Chen and Fangze Zhu. LCSTS: A Large Scale Chinese Short Text Summarization Dataset. 2015.
- They constructed a large-scale Chinese short text summarization dataset from the Chinese microblogging website Sina Weibo and released it to the public. They then applied a GRU-based encoder-decoder model to it to generate summaries. They treated each whole short text as a single sequence, which may not be very reasonable because most short texts contain several sentences.
- LCSTS contains 2,400,591 (short text, summary) pairs as the training set and 1,106 pairs as the test set.
- All the models were trained on Tesla M2090 GPUs for about one week.
- The results show that the RNN with context outperforms the RNN without context on both character-based and word-based input.
- Moreover, character-based input outperforms word-based input.
Bingzhen Wei, Xuancheng Ren, Xu Sun, Yi Zhang, Xiaoyan Cai, Qi Su. Regularizing Output Distribution of Abstractive Chinese Social Media Text Summarization for Improved Semantic Consistency. arXiv:1805.04033, 2018.
LancoSum is a toolkit for abstractive summarization that can achieve state-of-the-art (SOTA) performance.
- Shuming Ma, Xu Sun, Wei Li, Sujian Li, Wenjie Li, Xuancheng Ren. Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation. arXiv:1803.01465v3, NAACL HLT 2018.
- Junyang Lin, Xu Sun, Shuming Ma, Qi Su. Global Encoding for Abstractive Summarization. arXiv:1805.03989v2, ACL 2018. The source code written in Python is Global-Encoding.
- Shuming Ma, Xu Sun, Junyang Lin and Houfeng Wang. Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization. arXiv:1805.04869, ACL 2018. The source code written in Python is superAE.

Evaluation Metrics

Chin-Yew Lin and Eduard Hovy. Automatic Evaluation of Summaries Using N-gram Co-Occurrence Statistics. In Proceedings of the Human Language Technology Conference 2003 (HLT-NAACL-2003).
Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a Method for Automatic Evaluation of Machine Translation.
Arman Cohan, Nazli Goharian. Revisiting Summarization Evaluation for Scientific Articles. arXiv:1604.00400, LREC, 2016.
Maxime Peyrard. A Formal Definition of Importance for Summarization. arXiv:1801.08991, 2018.
Kavita Ganesan. ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks. arXiv:1803.01937, 2018. ROUGE works by comparing an automatically produced summary or translation against a set of reference summaries (typically human-produced). It is one of the standard ways to measure the effectiveness of automatically generated summaries. ROUGE 2.0 is an easy-to-use evaluation toolkit for automatic summarization tasks.
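
As a rough illustration of how ROUGE compares a generated summary against reference summaries, here is a minimal Python sketch of ROUGE-N recall. It is not the ROUGE 2.0 toolkit's implementation: the function names `ngrams` and `rouge_n_recall` are hypothetical, and it assumes lowercased whitespace tokenization with no stemming, stop-word removal, or synonym handling.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams.

    Assumption: tokens are lowercased and split on whitespace, which is
    a simplification of what real evaluation toolkits do.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clip each match by how often the n-gram occurs in the candidate.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    references = ["the cat was on the mat"]
    print(rouge_n_recall(candidate, references, n=1))
    print(rouge_n_recall(candidate, references, n=2))
```

With one reference, the score is the fraction of the reference's n-grams that also appear in the candidate. With several references, this sketch pools the counts over all of them.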

Opinion Summarization

Kavita Ganesan, ChengXiang Zhai and Jiawei Han. Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions. Proceedings of COLING '10, 2010.
Kavita Ganesan, ChengXiang Zhai and Evelyne Viegas. Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions. WWW'12, 2012.
Kavita Ganesan. Opinion Driven Decision Support System (ODSS). PhD Thesis, University of Illinois at Urbana-Champaign, 2013.
Ozan Irsoy and Claire Cardie. Opinion Mining with Deep Recurrent Neural Networks. In EMNLP, 2014.
Ahmad Kamal. Review Mining for Feature Based Opinion Summarization and Visualization. arXiv:1504.03068, 2015.
Haibing Wu, Yiwei Gu, Shangdi Sun and Xiaodong Gu. Aspect-based Opinion Summarization with Convolutional Neural Networks. 2015.
Lu Wang, Hema Raghavan, Claire Cardie, Vittorio Castelli. Query-Focused Opinion Summarization for User-Generated Content. arXiv:1606.05702, 2016.
