This repository has been archived by the owner on Oct 8, 2023. It is now read-only.

mikemykhaylov/uber-fares


Uber Fares Data Science

Uber Fares is a Data Science and Machine Learning project I worked on in my free time. Its goals were to analyse a dataset of 200k NYC Uber rides and to build a model that predicts the fare of a trip.

Licensed under the MIT License.

Features

During development, I:

  • Downloaded the dataset from Kaggle
  • Formatted, cleaned, and enriched the dataset with additional data (NYC Neighborhoods and US Holidays)
  • Created qualitative, spatial, and temporal visualisations with Seaborn
  • Experimented with several ML algorithms, including polynomial regression, ElasticNet, and decision trees
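The holiday enrichment step can be illustrated with the standard library alone. This is a minimal sketch, not the project's code: it assumes each ride record carries a `pickup_datetime` field (an assumed column name), derives a subset of US federal holidays from their calendar rules instead of loading an external holiday dataset, ignores weekend "observed" dates, and omits Juneteenth (a federal holiday only since 2021). The helpers `nth_weekday`, `us_federal_holidays`, and `add_holiday_flag` are hypothetical names.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th `weekday` (Mon=0 .. Sun=6) of a month; n=-1 means the last one."""
    if n > 0:
        first = date(year, month, 1)
        offset = (weekday - first.weekday()) % 7
        return first + timedelta(days=offset + 7 * (n - 1))
    # n == -1: step back from the last day of the month to the wanted weekday.
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    last = first_of_next - timedelta(days=1)
    return last - timedelta(days=(last.weekday() - weekday) % 7)


def us_federal_holidays(year: int) -> set[date]:
    """A subset of US federal holidays on their calendar dates (no 'observed' shift)."""
    return {
        date(year, 1, 1),             # New Year's Day
        nth_weekday(year, 1, 0, 3),   # Martin Luther King Jr. Day: 3rd Monday in January
        nth_weekday(year, 2, 0, 3),   # Washington's Birthday: 3rd Monday in February
        nth_weekday(year, 5, 0, -1),  # Memorial Day: last Monday in May
        date(year, 7, 4),             # Independence Day
        nth_weekday(year, 9, 0, 1),   # Labor Day: 1st Monday in September
        nth_weekday(year, 10, 0, 2),  # Columbus Day: 2nd Monday in October
        date(year, 11, 11),           # Veterans Day
        nth_weekday(year, 11, 3, 4),  # Thanksgiving: 4th Thursday in November
        date(year, 12, 25),           # Christmas Day
    }


def add_holiday_flag(rides: list[dict]) -> list[dict]:
    """Return copies of the ride records with an added boolean 'is_holiday' field."""
    holidays_by_year: dict[int, set[date]] = {}  # compute each year's holidays once
    enriched = []
    for ride in rides:
        day = ride["pickup_datetime"].date()
        if day.year not in holidays_by_year:
            holidays_by_year[day.year] = us_federal_holidays(day.year)
        enriched.append({**ride, "is_holiday": day in holidays_by_year[day.year]})
    return enriched


# Toy records: Thanksgiving 2014 fell on 27 November.
rides = [
    {"pickup_datetime": datetime(2014, 11, 27, 18, 30)},
    {"pickup_datetime": datetime(2014, 11, 28, 9, 15)},
]
print([r["is_holiday"] for r in add_holiday_flag(rides)])  # [True, False]
```

In a pandas workflow, `pandas.tseries.holiday.USFederalHolidayCalendar().holidays(start, end)` is a ready-made alternative; which source the project actually used is not stated here.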

Documentation

The documentation is built with Sphinx and hosted on Netlify.

Project Structure

    ├── data
    │   ├── external            <- Data from third party sources.
    │   ├── interim             <- Intermediate data that has been transformed.
    │   ├── processed           <- The final, canonical data sets for modeling.
    │   └── raw                 <- The original, immutable data dump.
    ├── docs                    <- Sphinx Docs; see sphinx-doc.org for details
    ├── models                  <- Trained and serialized models
    ├── notebooks               <- Jupyter notebooks for explorations
    │   ├── 0.1_data_processing_tests
    │   ├── 0.2_exploration
    │   └── 0.3_machine_learning
    ├── references              <- Data dictionaries, manuals, and all other explanatory materials.
    ├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
    │   └── figures             <- Generated graphics and figures to be used in reporting
    ├── utils                   <- Source code for all analysis
    │   ├── data                <- Scripts to preprocess data for analysis
    │   ├── features            <- Scripts to build features
    │   ├── models              <- Scripts to train models
    │   └── visualization       <- Scripts to produce visualisations
    ├── web                     <- Web demo
    ├── environment.yml         <- Template for conda environment creation
    ├── Makefile                <- Makefile with commands like `make data` or `make model`
    ├── pyproject.toml          <- Python project config file
    ├── README.md               <- The top-level README for developers using this project.
    ├── requirements.txt        <- Pip requirements
    ├── test_environment.py     <- Script for testing the correct environment setup
    └── tox.ini                 <- tox file with settings for running tox; see tox.readthedocs.io
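The `data/raw` → `data/processed` split and the `make data` target describe a flow in which a script reads the immutable raw dump, drops unusable rows, and writes a canonical file. The sketch below illustrates that flow with the standard `csv` module. It is hypothetical: the `clean_rides` and `is_valid` names, the column names, the 1 to 6 passenger limit, and the rough NYC bounding box are assumptions for illustration, not values taken from the project's `utils/data` scripts.

```python
import csv
from pathlib import Path

# Rough NYC bounding box (an assumption for this sketch, not the project's values).
LON_RANGE = (-74.3, -73.7)
LAT_RANGE = (40.5, 40.95)
COORD_COLUMNS = [
    ("pickup_longitude", "pickup_latitude"),
    ("dropoff_longitude", "dropoff_latitude"),
]


def is_valid(row: dict) -> bool:
    """Keep rows with a positive fare, 1-6 passengers, and both endpoints inside the box."""
    try:
        if float(row["fare_amount"]) <= 0:
            return False
        if not 1 <= float(row["passenger_count"]) <= 6:
            return False
        for lon_col, lat_col in COORD_COLUMNS:
            lon, lat = float(row[lon_col]), float(row[lat_col])
            if not (LON_RANGE[0] <= lon <= LON_RANGE[1]
                    and LAT_RANGE[0] <= lat <= LAT_RANGE[1]):
                return False
    except (KeyError, ValueError):
        # Missing columns or unparsable values: treat the row as unusable.
        return False
    return True


def clean_rides(raw_path: Path, processed_path: Path) -> int:
    """Copy valid rows from raw_path to processed_path and return how many were kept."""
    processed_path.parent.mkdir(parents=True, exist_ok=True)
    kept = 0
    with raw_path.open(newline="") as src, processed_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in reader:
            if is_valid(row):
                writer.writerow(row)
                kept += 1
    return kept
```

Keeping the raw file untouched and writing a new processed file matches the "original, immutable data dump" convention in the tree; a `make data` rule could then invoke such a script with the two paths, although the project's actual rule is not shown here.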

Acknowledgements