INFO 3402 "Information Exposition"

Spring 2022
Brian Keegan, Assistant Professor

Course Objectives

This course teaches students how to communicate the findings of their data analyses and to understand the ethics and implications of data communication and storytelling. Students will develop their computational skills for reshaping, analyzing, summarizing, and visualizing quantitative data using Python’s scientific libraries. Students will also develop their communication skills for interpreting their findings, designing how to present them, and telling stories about them for general and professional audiences. Students will be evaluated on both their computational and communication skills through quizzes, weekly assignments, blog posts, and a final project. There is no midterm or final exam. Students will:

  • Analyze and interpret relationships in quantitative data
  • Create effective data visualizations through iterative design
  • Communicate findings for general and professional audiences
  • Think critically about and critique data narratives and visualizations

Course Design

Class will meet twice per week, on Tuesdays and Thursdays from 11:00 to 12:15 in Eaton Humanities 1B80. Attendance is required. During class, expect to participate in coding exercises and discussions, critique visualizations and narratives, and assist your peers with their projects. Tuesday’s lecture will introduce a new concept with slides and a notebook and will have exercises to complete in class or take home. The Weekly Assignment will also be introduced on Tuesday and will be due on Friday. Thursday’s lecture will review concepts from Tuesday’s class and exercises, include a session where we critique a data narrative or visualization, provide time to work on the Weekly Assignment, and end with the Weekly Quiz. The class is split into six modules covering different data communication skills (shaping, distributions, comparisons, trends, relationships, and spatial). There will be a Module Assignment at the end of each module, in which students will write a blog post for a general or professional audience communicating the findings of their analysis of a data set.

Publishing

The class will use the Medium blogging platform. How to create accounts, read, write, and post to the class publication will be covered in class. There is extensive documentation in the Medium Help Center as well as multiple tutorials. Students will write their Module Assignments on Medium and submit links via Canvas with the expectation that their writing can be read by the general public. If students are unable or do not want to use the Medium platform, they should email the instructor before Friday, January 21 to work out an alternative arrangement.

Computing

Students will use programming languages for data analysis and visualization. Jupyter notebooks written in Python 3 will be used for all in-class examples and assignments. The Anaconda distribution of Python 3.8 (or above) is strongly recommended to provide all of these programs and other libraries. We will be using the Matplotlib and Seaborn libraries for data visualization. Lectures will include exercises and presentations with the expectation that students participate with their own laptop computers. If students cannot bring a laptop to class, they should email the instructor to work out an alternative arrangement.
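As a quick way to confirm that a laptop meets the requirements above, the following is a minimal sketch of an environment check. The function name `check_environment` is hypothetical and not part of any course material, and the check only covers the Python version and the two plotting libraries named in this section.

```python
import sys
from importlib.util import find_spec

# Minimum Python version recommended in this section.
REQUIRED_PYTHON = (3, 8)

# Import names of the visualization libraries named in this section.
LIBRARIES = ["matplotlib", "seaborn"]


def check_environment():
    """Return a dict mapping each requirement to True (met) or False (not met)."""
    report = {"python_ok": sys.version_info >= REQUIRED_PYTHON}
    for name in LIBRARIES:
        # find_spec returns None when the package cannot be located.
        report[name] = find_spec(name) is not None
    return report


for item, ok in check_environment().items():
    print(f"{item}: {'ok' if ok else 'missing'}")
```

Running this in a Jupyter notebook cell or a terminal prints one line per requirement. Anything reported as missing can usually be fixed by installing or updating the Anaconda distribution.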

If students wish to use alternative programmatic data analysis software (R, Matlab, Julia, etc.) or other Python data visualization libraries (Plotly, Altair, Bokeh, etc.), they are welcome to do so, but instructional support will only be provided for Python and Matplotlib. Students are not permitted to use spreadsheet (Microsoft Excel, Apple Numbers, etc.) or business intelligence (Tableau, Microsoft PowerBI, etc.) software for assignments. The instructors reserve the right to conduct a code review on any assignment submitted by a student to ensure academic integrity. Students who are unable or unwilling to describe how their submitted code works will lose all credit on the assignment.

Evaluation

Students will be evaluated through four mechanisms. The class has no midterm or final exam.

  • Weekly Assignments (30%). Weekly Assignments are intended to develop students’ skill and confidence applying the technical and expository skills introduced during lecture. There will be a total of 15 Weekly Assignments. Each Weekly Assignment is worth 2% of the final grade (30% cumulative) and is due on Canvas by Friday before midnight. In the absence of an approved excuse, late submissions will be docked 50% of their value for every day elapsed since the deadline: assignments not submitted by Sunday before midnight will lose all credit. The lowest Weekly Assignment grade will be automatically dropped, and there are no opportunities for re-grades on assignments.
  • Weekly Quizzes (15%). Weekly Quizzes are intended to evaluate students’ understanding of the concepts from the readings and lecture. There will be a total of 15 Weekly Quizzes. Each Weekly Quiz is worth 1% of the final grade (15% cumulative). The quizzes will be administered via Canvas and will take place during class on Thursdays. The lowest Weekly Quiz grade will be automatically dropped. There are no opportunities to retake a quiz outside of class.
  • Module Assignments (30%). Module Assignments are intended to (1) develop students’ confidence communicating with data to a general audience and (2) generate a public-facing portfolio. There are six Module Assignments in total, one per module. Each Module Assignment is worth 5% of the final grade (30% cumulative) and is due by 11am the Tuesday before the start of the next module. The format and evaluation criteria of each Module Assignment will vary, but will emphasize applying the module’s concepts to a novel dataset. Each Module Assignment will be published as a blog post via the class’s Medium publication. In the absence of an approved excuse, late submissions will be docked 50% of their value for every day elapsed since the deadline: assignments not submitted by Thursday at 11am will lose all credit.
  • Final Project (25%). The Final Project is intended to be a portfolio piece highlighting a student’s computational and communication skills. The project will be both a data analysis and a write-up with the goal of submitting it for external publication as an op-ed, guest column, etc. Further details about the Final Project will be collaboratively developed and detailed later in the course. In the absence of an approved excuse, late submissions will be docked 50% of their value for every day elapsed since the deadline.
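To see how the four weights above combine, here is a minimal sketch of a final-grade calculation. It is an illustration, not the instructor's grading code: it assumes every score is on a 0 to 100 scale, that "dropping the lowest grade" means averaging the remaining scores in that category, and that late penalties have already been applied. The names `WEIGHTS`, `category_average`, and `final_grade` are hypothetical, and the scores are made up.

```python
# Category weights from the list above (percent of the final grade).
WEIGHTS = {
    "weekly_assignments": 30,
    "weekly_quizzes": 15,
    "module_assignments": 30,
    "final_project": 25,
}

# Categories in which the lowest grade is dropped.
DROP_LOWEST = {"weekly_assignments", "weekly_quizzes"}


def category_average(scores, drop_lowest=False):
    """Mean of the scores, optionally ignoring the single lowest one."""
    kept = sorted(scores)
    if drop_lowest and len(kept) > 1:
        kept = kept[1:]  # assumption: the dropped score is simply excluded
    return sum(kept) / len(kept)


def final_grade(scores_by_category):
    """Weighted final grade (0-100) from a dict of category -> list of scores."""
    total = 0.0
    for category, weight in WEIGHTS.items():
        average = category_average(
            scores_by_category[category],
            drop_lowest=category in DROP_LOWEST,
        )
        total += average * weight / 100
    return total


# Made-up scores for one hypothetical student.
example = {
    "weekly_assignments": [100] * 14 + [0],  # one missed assignment
    "weekly_quizzes": [80] * 15,
    "module_assignments": [90] * 6,
    "final_project": [100],
}

assert sum(WEIGHTS.values()) == 100  # the four categories cover the whole grade
print(final_grade(example))  # 30 + 12 + 27 + 25 = 94.0
```

Under these assumptions, the single missed Weekly Assignment is the dropped score, so it does not lower the grade.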

Schedule

  • Week 1 — Shaping: Loading & Documentation
    • Introductions and loading data. Using documentation and markdown.
  • Week 2 — Shaping: Aggregating & Summarization
    • Pivot tables and groupby-aggregation. Paradoxes of summarization.
  • Week 3 — Shaping: Joining & Validation
    • Types of joins and handling duplicated and missing data. Evaluating a join.
  • Week 4 — Shaping: Tidying & Tables
    • Wide versus tidy data and melting. Designing tables.
  • Week 5 — Distribution: Histograms & Perception
    • Counts, cuts, and transformations. Theories of visual perception.
  • Week 6 — Distribution: Box plots & Audience
    • Distributions, outliers, and significance testing. Understanding your audience.
  • Week 7 — Comparison: Cat plots & Context
    • Cat and hued plots. Sensitizing to context.
  • Week 8 — Comparison: Faceted plots & Simplicity
    • Faceted plots and managing multiple plots. Customizing plots for simplicity.
  • Week 9 — Trend: Line plots & Trust
    • Temporal data and resampling. Establishing trust.
  • Week 10 — Trend: Stacked plots & Annotation
    • Stacked and area plots. Annotating plots.
  • Week 11 — Spring Break
    • No class!
  • Week 12 — Relationship: Scatter plots & Fallacies
    • Scatter plots and correlation. Statistical fallacies.
  • Week 13 — Relationship: Heatmaps & Persuasion
    • Heatmaps and clustering. Persuasion strategies.
  • Week 14 — Spatial: Choropleths & Conventions
    • Choropleths and spatial data. Socio-cultural conventions.
  • Week 15 — Spatial: Point plots & Design
    • Point plots and spatial joins. Designing infographics.
  • Week 16 — Projects
    • Working on final projects.
