TapSearch is a search engine that indexes documents and returns the top 10 matching documents in the collection.
- It takes in multiple paragraphs of text, assigns a unique ID to each paragraph, and stores the word-to-paragraph mappings in an inverted index. This is similar to what Elasticsearch does.
- Given a word to search for, it lists out the top 10 paragraphs in which the word is present.
- Add paragraphs.
- Search for words in the paragraphs.
- View the paragraphs and words that have been added to the index and can be searched.
- Go to the deployed link.
- The home page is the index page, which contains a textarea for typing the paragraphs.
- The search page contains a search box. Type a word and click Submit; if there are any matches, the top 10 documents are listed with their unique IDs and their respective BM25 scores.
- The clear page has a button to clear all the documents indexed.
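
The indexing and search flow described above can be sketched as follows. This is a minimal illustration, not TapSearch's actual code: the class name `InvertedIndex`, its methods, the tokenizer, and the BM25 parameters `k1=1.5` and `b=0.75` are all assumptions made for the example.

```python
import math
import re
import uuid
from collections import Counter, defaultdict


class InvertedIndex:
    """Toy inverted index with BM25 ranking (hypothetical, for illustration)."""

    def __init__(self, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed value)
        self.b = b    # length-normalisation strength (assumed value)
        self.postings = defaultdict(dict)  # word -> {paragraph_id: term frequency}
        self.lengths = {}                  # paragraph_id -> number of tokens

    @staticmethod
    def tokenize(text):
        # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_document(self, text):
        """Split text on blank lines and index each paragraph under a unique ID."""
        ids = []
        for paragraph in re.split(r"\n\s*\n", text.strip()):
            tokens = self.tokenize(paragraph)
            if not tokens:
                continue
            pid = uuid.uuid4().hex
            self.lengths[pid] = len(tokens)
            for word, tf in Counter(tokens).items():
                self.postings[word][pid] = tf
            ids.append(pid)
        return ids

    def search(self, word, limit=10):
        """Return up to `limit` (paragraph_id, bm25_score) pairs, best first."""
        matches = self.postings.get(word.lower(), {})
        if not matches:
            return []
        n = len(self.lengths)
        avg_len = sum(self.lengths.values()) / n
        # Non-negative IDF variant commonly used with BM25.
        idf = math.log(1 + (n - len(matches) + 0.5) / (len(matches) + 0.5))
        scored = []
        for pid, tf in matches.items():
            norm = 1 - self.b + self.b * self.lengths[pid] / avg_len
            scored.append((pid, idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)))
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:limit]

    def clear(self):
        """Remove every indexed paragraph."""
        self.postings.clear()
        self.lengths.clear()
```

Calling `add_document` with text containing blank lines indexes each paragraph separately, `search("word")` returns at most ten scored paragraph IDs, and `clear()` empties the index, mirroring the three pages listed above.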
$ git clone https://github.com/Cool-fire/tapsearch
$ cd tapsearch
$ docker-compose up
I would add several other features to this application, such as searching for an incomplete word and still showing the relevant documents. I would also take the project further by implementing image search: using ML techniques to extract the text from an image, then searching for that text in the indexed documents.
Eight million is the size of the population in Switzerland. Eight million is the number of Indians who wanted to go to last week’s local cricket match in a village out in the countryside, but couldn’t get tickets
\n\n
Indian food is an inspiration to the world. Swiss food is an inspiration to itself
\n\n
Switzerland is rich despite an official poverty rate of 8%. India has riches, including an 8% yearly growth potential which is currently held back by unattended poverty.
Results: Paragraphs 1 and 3 are returned
Results: Paragraph 2
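
The sample above can be reproduced with a small sketch that splits the input on blank lines and reports which paragraphs contain a word. The text does not say which words were searched, so the queries `switzerland` and `food` below are assumptions, chosen only because they match those paragraph sets. The function name `paragraphs_containing` is hypothetical as well.

```python
import re

SAMPLE = """Eight million is the size of the population in Switzerland. Eight million is the number of Indians who wanted to go to last week’s local cricket match in a village out in the countryside, but couldn’t get tickets

Indian food is an inspiration to the world. Swiss food is an inspiration to itself

Switzerland is rich despite an official poverty rate of 8%. India has riches, including an 8% yearly growth potential which is currently held back by unattended poverty."""


def paragraphs_containing(text, word):
    """Return 1-based numbers of the paragraphs that contain `word` as a whole token."""
    word = word.lower()
    hits = []
    # Paragraphs are separated by one or more blank lines.
    for number, paragraph in enumerate(re.split(r"\n\s*\n", text.strip()), start=1):
        tokens = set(re.findall(r"[a-z0-9]+", paragraph.lower()))
        if word in tokens:
            hits.append(number)
    return hits


print(paragraphs_containing(SAMPLE, "switzerland"))  # assumed query
print(paragraphs_containing(SAMPLE, "food"))         # assumed query
```

Matching here is on whole lowercase tokens, so `india` would not match `Indian` or `Indians`; a real deployment might add stemming, which this sketch leaves out.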