Wrapper around Python data analysis packages such as pandas, scikit-learn and matplotlib that makes data analysis in Python easier.
- Python
- pandas
- scikit-learn
- matplotlib
- rpy2
- tornado
Note: pandas is the only package required before installing copper, but it is recommended to have all the other packages installed too.
Note 2: The package is developed for Python 2 and Python 3 from a single code base. Python 3 is the main target and is recommended, since most packages already support it.
pip install copper
- Project structure for Data Analysis projects, à la Project Template for R.
- Dataset: Wrapper around pandas.DataFrame to introduce metadata
- Data transformation templates
- Custom matplotlib charts for exploration: histograms, scatterplots
- Exploration via D3.js (very experimental)
- More data imputation options via R (rpy2)
- Rapid Machine Learning prototyping:
  - Easy to compare classifiers
  - Ensemble (bagging)
Copper uses a project structure based on Project Template (from R) to give structure to a Data Analysis project.
The suggested structure is:
data -> project/data: All the data files: raw, cached, etc. It is suggested to use /data/raw for raw files such as .csv files.
Copper by default loads data from the data folder. For example, data = copper.read_csv('catalog.csv') will load the project/data/catalog.csv file into a pandas.DataFrame using the pandas read_csv method and parameters.
As expected, when saving files (copper.save(...) or copper.export(...)), copper saves them in the data folder.
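The lookup described above can be pictured with a small standard-library sketch. It is not copper's implementation: `resolve_data_path` and its `project_path` argument are hypothetical names, used only to show how a bare file name such as catalog.csv maps onto the project's data folder.

```python
from pathlib import Path


def resolve_data_path(project_path, filename):
    """Return the path of `filename` inside the project's data folder.

    Hypothetical helper for illustration only; copper's own lookup may differ.
    """
    return Path(project_path) / "data" / filename


# A bare file name resolves relative to <project>/data.
print(resolve_data_path("project", "catalog.csv").as_posix())
```

Keeping the data folder as the single default location means analysis scripts can refer to files by name alone, wherever the script itself lives in the project.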
source -> src: Python and IPython notebook files.
Following this layout, every file inside the source folder should start with:
import copper
copper.project.path = '../'
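To make the suggested layout concrete, here is a minimal sketch that creates the folders with the standard library. The `create_project` helper is hypothetical and not part of copper; only the folder names (data, data/raw, src) come from the structure described above.

```python
from pathlib import Path


def create_project(root):
    """Create the suggested folders under `root` (hypothetical helper)."""
    root = Path(root)
    for folder in ("data/raw", "src"):
        # parents=True also creates `data`; exist_ok=True makes reruns safe.
        (root / folder).mkdir(parents=True, exist_ok=True)
    return root
```

After calling create_project('project'), a script saved in project/src sits one level below the project root, which is why the relative path '../' points back to it.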
For other suggested folders see: Project Template
For more info about this see the examples below.
Donors:
Loans:
Catalog:
Kaggle Bulldozers:
For more information and more examples (some are possibly outdated), see my blog: danielfrg.github.com