
Implemented a naive indexer for Reuters21578. Implemented single-term query processing. Implemented and compared results of lossy dictionary compression.


BlackSound1/Reuters21578-naive-indexer


Reuters21578 Naive Indexer

Installation

Download the Reuters21578 corpus from http://www.daviddlewis.com/resources/testcollections/reuters21578/. Unzip it into a folder named reuters21578 and place that folder at the same level as this project's folder (i.e. alongside it, not inside it).

Install all dependencies listed in requirements.txt (for example, with $ pip install -r requirements.txt).

Running

This project is split into three subprojects. Run them with $ python main.py.

Subproject 1

Creates a naive index out of the text of the Reuters21578 corpus.
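A naive index can be built by collecting (term, docID) pairs, sorting them, removing duplicates and merging pairs that share a term into a postings list. The sketch below shows that idea only. It assumes the documents have already been extracted from the corpus files into a dict mapping docID to text, and it uses a simple regex tokenizer. `tokenize` and `build_naive_index` are hypothetical names, not this project's code.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: runs of word characters, no case folding
    # (the project may tokenize differently).
    return re.findall(r"\w+", text)


def build_naive_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    # Step 1: collect (term, docID) pairs from every document.
    pairs: List[Tuple[str, int]] = [
        (term, doc_id) for doc_id, text in docs.items() for term in tokenize(text)
    ]
    # Step 2: drop duplicate pairs, then sort by term and then by docID.
    pairs = sorted(set(pairs))
    # Step 3: merge pairs with the same term into one sorted postings list.
    index: Dict[str, List[int]] = {}
    for term, doc_id in pairs:
        index.setdefault(term, []).append(doc_id)
    return index


docs = {1: "Oil prices rose", 2: "Oil exports fell", 3: "Prices fell"}
index = build_naive_index(docs)
print(index["Oil"])   # [1, 2]
print(index["fell"])  # [2, 3]
```

Because nothing is normalized, "prices" and "Prices" end up as separate dictionary terms; that is exactly the kind of redundancy the lossy compression steps later remove.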

Subproject 3

Reads the index created in subproject 1 and applies lossy compression techniques to its dictionary, then shows a table comparing the size of the index's dictionary before and after the various compression steps.
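Lossy dictionary compression merges or drops terms, so the index can no longer distinguish them. The sketch below applies removal of numbers, case folding and stop-word removal cumulatively, recording the number of distinct terms after each step. The exact steps, their order, the tiny `STOP_WORDS` list and the choice to measure size as a distinct-term count are all assumptions for illustration. Stemming is left out to keep the example dependency-free, and the function and step names are hypothetical.

```python
import re
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Hypothetical, deliberately tiny stop list used only for illustration.
STOP_WORDS = {"the", "of", "to", "and", "a", "in"}


def is_number(term: str) -> bool:
    # Treat tokens that start with a digit and contain only digits, '.' or ','
    # (e.g. "1987", "3.5", "1,000") as numbers.
    return re.fullmatch(r"\d[\d.,]*", term) is not None


# Each step maps a term to its new form, or to None to drop it. The steps are
# cumulative and applied in order, so the stop-word step sees lowercased terms.
STEPS: List[Tuple[str, Callable[[str], Optional[str]]]] = [
    ("no numbers", lambda t: None if is_number(t) else t),
    ("case folding", lambda t: t.lower()),
    ("stop words removed", lambda t: None if t in STOP_WORDS else t),
]


def compress_dictionary(terms: Iterable[str]) -> List[Tuple[str, int]]:
    # Dictionary size is measured here as the number of distinct terms.
    current: Set[str] = set(terms)
    rows = [("unfiltered", len(current))]
    for name, step in STEPS:
        mapped = (step(t) for t in current)
        current = {t for t in mapped if t is not None}
        rows.append((name, len(current)))
    return rows


terms = ["Oil", "oil", "The", "the", "1987", "3.5", "prices", "of"]
rows = compress_dictionary(terms)
for name, size in rows:
    print(f"{name:<20}{size:>6}")
```

On this toy input the dictionary shrinks from 8 terms to 6, then 4, then 2 ("oil" and "prices").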

Subproject 2

Queries the index with several single-term queries.
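A single-term query against an inverted index reduces to one dictionary lookup that returns the term's postings list. The sketch below assumes the index is a plain dict mapping each term to a sorted list of docIDs. `single_term_query` and the toy index are hypothetical and not taken from this project.

```python
from typing import Dict, List


def single_term_query(index: Dict[str, List[int]], term: str) -> List[int]:
    # Return the term's postings list, or an empty list if the term
    # does not appear in the dictionary.
    return index.get(term, [])


# Toy index (hypothetical): term -> sorted list of docIDs containing it.
index = {"oil": [1, 2], "prices": [1, 3], "fell": [2, 3]}
for query in ["oil", "fell", "copper"]:
    print(query, single_term_query(index, query))
```

If the query is run against a compressed dictionary, the query term must first go through the same steps (for example lowercasing), otherwise a query such as "Oil" would miss the folded term "oil".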
