Twitter Trends is a web-based application that automatically detects and analyzes emerging topics in real time through hashtags and user mentions in tweets. Twitter, being a major microblogging service, is a reliable source for trend detection. The project involved extracting live streaming tweets, processing them to find top hashtags and user …

Medha11/Twitter-Trends

Plan

  • First, the extractor collects tweets from the live stream.
  • Tweets are cleaned and dumped into MongoDB.
  • Entities (hashtags and user mentions) are aggregated over the whole day.
  • Based on the aggregation, the top 100 entities are identified, and the tweets for each entity are grouped into one collection.
  • Before the tweets are written to the collection, sentiment analysis is run on them.
  • Using each of the 100 collections as a separate document, LDA (Latent Dirichlet Allocation) is performed. If 100 documents are too few, the big documents can be split into smaller ones.
  • Each tweet is then examined individually to find the topic to which it belongs.
  • For each topic, the URLs that seem most relevant are extracted.
  • The webpages corresponding to the URLs are downloaded and parsed.
  • A portion of the main content can be displayed after extraction.
  • The graph is approximated as usual, but the time span is still to be decided.
  • The graph, related tweets, and summaries of the URLs, along with the hyperlinks, are displayed for each topic on the portal.
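
The aggregation and grouping steps of the plan could look roughly like the sketch below. It is not the project's code: the regular expression, the function names (`extract_entities`, `top_entities`, `group_by_entity`), and the choice to count an entity once per tweet are all assumptions made for illustration, and tweets are represented as plain text strings rather than MongoDB documents.

```python
import re
from collections import Counter

# Assumption: an entity is '#' or '@' followed by word characters.
ENTITY_RE = re.compile(r"[#@]\w+")


def extract_entities(text):
    """Return the lowercased hashtags and mentions found in one tweet."""
    return [match.lower() for match in ENTITY_RE.findall(text)]


def top_entities(tweets, n=100):
    """Return the n most frequent entities as (entity, tweet_count) pairs."""
    counts = Counter()
    for text in tweets:
        # A set is used so that an entity repeated within a single
        # tweet is counted only once for that tweet.
        counts.update(set(extract_entities(text)))
    return counts.most_common(n)


def group_by_entity(tweets, entities):
    """Map each selected entity to the list of tweets that contain it."""
    groups = {entity: [] for entity in entities}
    for text in tweets:
        for entity in set(extract_entities(text)):
            if entity in groups:
                groups[entity].append(text)
    return groups
```

A possible use is `group_by_entity(tweets, [e for e, _ in top_entities(tweets)])`, which yields one group of tweets per top entity, similar in spirit to the per-entity collections described above.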

Workflow

  • Control of the engine starts with manager.py
  • manager.py makes use of multiprocess and subprocess to spawn the extractor, preprocessor and postprocessor as separate processes
  • config.py in the utilities package stores tuning parameters such as 'alarm' times, file limit, etc.
  • Refer to this .ppt for further information.
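
As a rough illustration of the spawning step, the sketch below launches one child process per stage with the standard-library `subprocess` module and waits for all of them. It is not the contents of manager.py: the `STAGES` list, the function names, and the placeholder child command (which only prints a message) are assumptions, and a real manager would launch each stage's own script instead.

```python
import subprocess
import sys

# Stage names taken from the workflow description above.
STAGES = ["extractor", "preprocessor", "postprocessor"]


def spawn_stages(stages):
    """Start one child Python process per stage and return them by name."""
    procs = {}
    for name in stages:
        # Placeholder command: the child only reports that it started.
        cmd = [sys.executable, "-c", f"print('{name} started')"]
        procs[name] = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    return procs


def wait_all(procs):
    """Wait for every child and collect (return_code, output) per stage."""
    results = {}
    for name, proc in procs.items():
        out, _ = proc.communicate()
        results[name] = (proc.returncode, out.strip())
    return results
```

Because the children are started before any of them is waited on, the three stages run concurrently, which matches the idea of separate processes for each stage.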

Dataset

  • Download dataset(s) from the Drive folder
    • The full_dataset.rar archive contains all 2 million tweets
    • Optionally, you can download parts of this dataset from the Parts folder; each part (dataset*.rar) contains 200,000 tweets
    • Each .json file contains 10,000 tweets
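
Once an archive has been extracted, the .json files could be read with something like the sketch below. The on-disk layout of the files is not documented here, so the loader makes an assumption: it first tries to parse a file as a single JSON value (for example an array of tweets) and falls back to one JSON object per line. The function names `load_tweets` and `iter_dataset` are hypothetical.

```python
import json
from pathlib import Path


def load_tweets(path):
    """Load the tweets from one .json file (array or one object per line)."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fallback assumption: the file holds one JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def iter_dataset(folder):
    """Yield tweets from every .json file in a folder, in file-name order."""
    for path in sorted(Path(folder).glob("*.json")):
        for tweet in load_tweets(path):
            yield tweet
```

Iterating lazily with `iter_dataset` avoids holding all of the tweets in memory at once, which may matter for the full dataset.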

init

  • Clone the git repository
  • Run python_path.bat to add the PYTHONPATH environment variable. This needs to be done only once
  • Make the necessary changes in the config.py file in engine\utilities
  • Run python init.py in Command Prompt to start the engine
  • To stop, close all Command Prompt and Python windows

Portal

  • The portal folder is the Django project for the web portal
  • Create a MySQL database called 'trends'
  • In the settings file, change the password for the MySQL root user if yours is different
  • Run createsuperuser to create an admin user
  • Create some top trends using the admin site. A screenshot of the UI after creating some sample topics (with ranks) is included. Clicking a topic redirects to its details page (see screenshots).
  • The homepage can be opened at 127.0.0.1:8000 or localhost:8000
  • The TopTrends model has a topic object and a rank object. It will be modified to include graphs and other details once the implementation is done.
