Our project’s goal is to analyze subjects that were misconstrued on social media (such as health misinformation or backlash against a law) to understand how certain myths are constructed and spread. We are also considering using GPT or our own messaging templates to build myth-busting communication that helps counter misinformation. Note that we define misinformation as claims that have been fact-checked as false by International Fact-Checking Network approved sources and InjusticeWatch (an NGO advocating for the SAFE-T Act).
Our first step was to scrape data from two social media sites, Twitter and Reddit, via their APIs. We attempted to scrape data from Meta but found that developer access was extremely limited and largely restricted to Ads Library information, which would mean that our data would not be comparable across sources (inorganic ads versus organic content from Reddit or Twitter). We already had a Twitter API key and found Reddit highly accessible, which tipped the scales in favor of these platforms.
To keep our data comparable, we also excluded YouTube and TikTok. We pulled text posts from Twitter and Reddit; mixing in a different entity type (videos/short reels) would distort our analysis because the sample groups would no longer be balanced.
Note that the following limitations applied to scraping from these platforms:
- Twitter: the search index had a 7-day limit, so it returned data only for the preceding week
- Reddit: pulls were limited to 200 posts when using requests, and filtering by specific keywords was difficult
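The two limitations above can be worked around partly on the client side. The following is a minimal sketch, not the project's actual scraping code: `clamp_to_search_window` and `filter_posts` are hypothetical helper names, and the `title` and `selftext` fields are an assumption about how the pulled Reddit posts are stored as dictionaries.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the 7-day search limit described in the list above.
SEARCH_WINDOW = timedelta(days=7)


def clamp_to_search_window(requested_start, now=None):
    """Return the later of the requested start and the oldest time in the window.

    Both datetimes are expected to be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    earliest = now - SEARCH_WINDOW
    return max(requested_start, earliest)


def filter_posts(posts, keywords, text_fields=("title", "selftext")):
    """Keep posts whose text fields mention at least one keyword.

    Matching is case-insensitive and done locally, after the posts
    have already been pulled. The field names are hypothetical.
    """
    if not keywords:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    kept = []
    for post in posts:
        # Join the chosen fields so a keyword can match in any of them.
        text = " ".join(str(post.get(field, "")) for field in text_fields)
        if pattern.search(text):
            kept.append(post)
    return kept
```

Filtering after the pull does not raise the 200-post cap; it only makes sure the posts that are kept actually mention the topic being studied.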
To install and run the project with Poetry:
- Clone the project repository.
- Go to the project directory:
  cd 30122-project-lie-brary
- From the project directory, install the virtual environment and dependencies:
  poetry install
- Activate the virtual environment:
  poetry shell
- Fill in your credentials in
  30122-project-lie-brary/lie_brary/scripts/scrap/key.py
  or replace it with an already filled key.py file.
- Run the project:
  - To run the dashboard:
    python3 -m lie_brary dashboard
    The dashboard runs on port 8051 by default. If it cannot start because the port is already in use, change port=8051 on line 16 of 30122-project-lie-brary/lie_brary/run.py to an available port.
  - To run the script that updates the data:
    python3 -m lie_brary getdata
Note: On Windows, you can use python instead of python3.
To install and run the project with venv and pip instead:
- Clone the project repository.
- Go to the project directory:
  cd 30122-project-lie-brary
- Create a virtual environment:
  python3 -m venv liebrary_env
- Activate the virtual environment:
  source liebrary_env/bin/activate
- Install the dependencies:
  pip install -r requirements.txt
- Fill in your credentials in
  30122-project-lie-brary/lie_brary/scripts/scrap/key.py
  or replace it with an already filled key.py file.
- Run the project:
  - To run the dashboard:
    python3 -m lie_brary dashboard
    The dashboard runs on port 8051 by default. If it cannot start because the port is already in use, change port=8051 on line 16 of 30122-project-lie-brary/lie_brary/run.py to an available port.
  - To run the script that updates the data:
    python3 -m lie_brary getdata
Note: Tested on Ubuntu with Python 3.8. (Not recommended because it is more prone to errors.)
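If the dashboard fails to start because port 8051 is taken, a quick way to find a replacement value is to probe for a free port before editing run.py. This is a small standalone sketch using only the standard library; `port_is_free` and `first_free_port` are hypothetical helpers and are not part of the lie_brary package.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port.
            return False
        return True


def first_free_port(start=8051, attempts=20, host="127.0.0.1"):
    """Scan upward from `start` and return the first port that can be bound."""
    # Stay inside the valid TCP port range.
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port found starting from {start}")


if __name__ == "__main__":
    print(first_free_port())
```

The printed number can then be used in place of 8051. The check is only a snapshot: another process could still claim the port between the probe and the dashboard start.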