Jupyter Notebooks for SOEP data analysis

This repository contains example Jupyter Notebooks for analysing SOEP data with Python and R.

Author: Heinz-Alexander Fütterer

Index

Datasets
Notebooks
1. Python Notebook
2. R Notebook
Acknowledgements

Datasets

The datasets used in these notebooks are part of the bilingual, Stata-based distribution of the SOEP data, version 34. They are contained in STATA_DEEN_v34.zip. The notebooks assume that the datasets have been extracted to a directory called data/.
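The extraction step can be sketched in Python as follows. This is a minimal sketch, not part of the notebooks: the helper name extract_datasets is hypothetical, and because the internal folder layout of the archive is not described here, the sketch matches members by file name only and writes them flat into the target directory.

```python
import zipfile
from pathlib import Path

# The three datasets the notebooks rely on.
WANTED = {"hgen.dta", "pgen.dta", "ppathl.dta"}


def extract_datasets(archive, target="data"):
    """Copy the wanted .dta members of `archive` into `target`, flattening paths.

    Assumption: the files may sit at any depth inside the archive,
    so members are matched by their base name only.
    """
    target_dir = Path(target)
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            name = Path(member).name
            if name in WANTED:
                (target_dir / name).write_bytes(zf.read(member))
                extracted.append(name)
    return sorted(extracted)
```

Calling extract_datasets("STATA_DEEN_v34.zip") from the repository root would then return the names of the files it wrote to data/.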

Citation:

Liebig, Stefan; Schupp, Jürgen; Goebel, Jan; Richter, David; Schröder, Carsten et al. (2019): Sozio-oekonomisches Panel (SOEP), Daten der Jahre 1984-2017. Version: v34. SOEP - Sozio-oekonomisches Panel. Dataset. http://doi.org/10.5684/soep.v34

The notebooks use three datasets, hgen, pgen and ppathl, with the following MD5 checksums:

$ md5sum data/*
510427c28ed0d7113d989a3651191af2  data/hgen.dta
096d87642640a076f4514b7163e716b2  data/pgen.dta
33fef93b16c406d2a82350772d6070cc  data/ppathl.dta
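On systems without md5sum, the same check can be approximated in Python. The following is a minimal sketch using only the standard library; the function names md5_of and verify are hypothetical, and the expected checksums are copied from the listing above.

```python
import hashlib
from pathlib import Path

# Checksums copied from the md5sum listing above.
EXPECTED_MD5 = {
    "hgen.dta": "510427c28ed0d7113d989a3651191af2",
    "pgen.dta": "096d87642640a076f4514b7163e716b2",
    "ppathl.dta": "33fef93b16c406d2a82350772d6070cc",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(data_dir="data", expected=EXPECTED_MD5):
    """Map each expected file name to True (match) or False (mismatch or missing)."""
    results = {}
    for name, checksum in expected.items():
        path = Path(data_dir) / name
        results[name] = path.is_file() and md5_of(path) == checksum
    return results
```

A result of False for any file suggests that the file is missing or comes from a different SOEP version than v34.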

Notebooks

The example notebooks in this repository demonstrate common processing and analysis steps for SOEP data using:

Python
R

The steps include, among others:

loading data from disk (tabular data in Stata's .dta format)
selecting columns of interest
plotting histograms
creating crosstables
grouping
merging multiple datasets based on key columns
setting values to NaN
plotting boxplots
creating new columns based on the content of existing columns
preparing a subset of the dataset for statistical modelling
statistical modelling
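Several of these steps can be sketched with pandas on a small synthetic example. This is an illustration under stated assumptions, not the code from the notebooks: the data frames below are made up, the column names (pid, syear, sex, gebjahr, pglabgro) and the treatment of negative values as missing-value codes are assumptions about the SOEP files, and the plotting and modelling steps are left out.

```python
import pandas as pd

# Tiny synthetic stand-ins for ppathl and pgen; all values are made up.
# With the real files one would load them from disk instead, for example:
#   pgen = pd.read_stata("data/pgen.dta", columns=["pid", "syear", "pglabgro"])
ppathl = pd.DataFrame({
    "pid": [1, 2, 3, 4],
    "syear": [2017, 2017, 2017, 2017],
    "sex": [1, 2, 2, 1],
    "gebjahr": [1970, 1985, -1, 1990],
})
pgen = pd.DataFrame({
    "pid": [1, 2, 3, 4],
    "syear": [2017, 2017, 2017, 2017],
    "pglabgro": [3000, -2, 2500, 4100],
})

# Merge both datasets on the key columns (assumed to be pid and syear).
df = ppathl.merge(pgen, on=["pid", "syear"], how="inner")

# Set values to NaN: negative values are assumed to be missing-value codes.
for col in ["gebjahr", "pglabgro"]:
    df[col] = df[col].mask(df[col] < 0)

# Create a new column from existing columns.
df["age"] = df["syear"] - df["gebjahr"]

# Grouping: mean income per value of sex (NaN values are skipped).
mean_income = df.groupby("sex")["pglabgro"].mean()

# Crosstable: sex against whether an income value is present.
counts = pd.crosstab(df["sex"], df["pglabgro"].notna())

# Subset for statistical modelling: selected columns, complete rows only.
model_df = df[["pglabgro", "age", "sex"]].dropna()
```

Dropping incomplete rows is only one way to prepare the modelling subset; whether it is appropriate depends on the analysis.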

Python Notebook

Installation and start:

git clone https://github.com/zbw/soep-notebooks.git
cd soep-notebooks/
pip install --user --upgrade pipenv
pipenv install
pipenv shell
jupyter notebook

The Python Notebook uses these libraries:

R Notebook

The R Notebook uses these libraries:

Acknowledgements

Thanks to Andreas Franken for the initial R and Stata scripts with the examples.

This article also proved useful for structuring the notebooks:

Rule, Adam, et al. "Ten simple rules for reproducible research in Jupyter notebooks." arXiv preprint arXiv:1810.08055 (2018).
