stiles/survivoR2py

survivoR2py: Survivor data for Python users

About

The code in this repository converts the data files in an R package devoted to the Survivor television series from .rda to .csv and .json formats, for users who prefer Python or other data science tools.

Sources

The data comes from the canonical survivoR package created by David Ohm et al., which contains detailed datasets about the history of the show, including an episode summary, castaway listing, challenge results and vote history, among many others.

Process

Convert survivoR data

  • scripts/convert_data.py: This script fetches the latest .rda files from the source, stores copies locally in data/raw/rda, and converts them to comma-delimited text files in data/processed/csv. A GitHub Actions workflow also runs the script once daily at 8 p.m. Pacific Time to keep the files fresh during a season, storing the data both in the repo and on S3.
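For readers curious about what the conversion step might look like, here is a minimal sketch. It is not the contents of scripts/convert_data.py. It assumes the third-party pyreadr library is used to read .rda files (the actual script may use something else), it omits the download step, and the helper names `csv_path_for` and `convert_rda` are hypothetical. Only the two directory paths come from the description above.

```python
from pathlib import Path
from typing import List, Optional

RAW_DIR = Path("data/raw/rda")        # local copies of the source .rda files
CSV_DIR = Path("data/processed/csv")  # converted comma-delimited files


def csv_path_for(rda_path: Path, csv_dir: Path = CSV_DIR,
                 table: Optional[str] = None) -> Path:
    """Map an .rda file (or a named table inside it) to its CSV output path."""
    name = table or Path(rda_path).stem
    return csv_dir / f"{name}.csv"


def convert_rda(rda_path: Path, csv_dir: Path = CSV_DIR) -> List[Path]:
    """Write every data frame stored in one .rda file to its own CSV file."""
    # Assumption: pyreadr is installed. It is imported lazily so the path
    # helper above still works without it.
    import pyreadr

    csv_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # read_r returns a dict-like mapping of object name -> pandas DataFrame
    for table, frame in pyreadr.read_r(str(rda_path)).items():
        out = csv_path_for(rda_path, csv_dir, table)
        frame.to_csv(out, index=False)
        written.append(out)
    return written
```

Under these assumptions, looping over `sorted(RAW_DIR.glob("*.rda"))` and calling `convert_rda` on each file would produce one CSV per table in data/processed/csv.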

Storage

The latest version of each table can be downloaded here:

Notes: The files converted from the original project's .rda data are stored in this repo's data/processed/csv directory. The content of those files won't change, only the file formats. Any value errors can be flagged as issues in the original project's repo, where they are typically resolved quickly. Also: Please see the original repo for metadata about the individual files.
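As a rough illustration of using a converted table from Python, the sketch below reads one CSV file with only the standard library. The function name `load_table` is hypothetical, and any file name you pass in is your own choice. See the original repo's metadata for the actual tables and columns.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_table(csv_path: Path) -> List[Dict[str, str]]:
    """Read one converted table into a list of row dicts keyed by column name."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Every value comes back as a string, so numeric columns need to be cast before analysis. A data frame library such as pandas would handle type inference automatically if you prefer that route.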

Related repositories

  • survivor-voteoffs: How did each castaway react to his or her torch getting snuffed? There's data for that.
  • survivor-transcripts: Fetching and storing complete transcripts for each episode of the American television show and analyzing the text for keyword/phrase frequency.

Questions? Corrections?

Please let me know.