Skip to content

Latest commit

 

History

History
54 lines (39 loc) · 1.59 KB

README.md

File metadata and controls

54 lines (39 loc) · 1.59 KB

Pachyderm/MLeap Demo

This is the codebase to support the Pachyderm/MLeap training and scoring demo. It is used to generate the Docker images used by the demo.

Docker Image

The Docker images are located here:

  1. Training Image
  2. Scoring Image

Building Locally

Build the Docker image locally with SBT.

  1. Install SBT with these instructions
  2. Make sure docker is running
  3. Use SBT to publish the image locally
sbt training/docker:publishLocal
sbt scoring/docker:publishLocal

This will publish two docker images named combustml/pmd-training:0.1-SNAPSHOT and combustml/pmd-scoring:0.1-SNAPSHOT.

Training

Download the Airbnb training dataset here: airbnb.clean.avro.

docker run -v /tmp/pmd-in:/data-in \
  -v /tmp/pmd-out:/data-training-out combustml/pmd-training:0.1-SNAPSHOT airbnb \
  -t random-forest \ # train a random forest model
  -i file:///data-in/airbnb.clean.avro \ # input airbnb dataset
  -o /data-out/model.zip \ # set the output location of the model file
  -s /data-out/summary.txt \ # output path for model summary
  -J-Xmx2048m # make sure Spark has enough memory

Scoring

docker run -v /tmp/pmd-out:/data-in1 \
  -v /tmp/pmd-training-in:/data-int2 \
  -v /tmp/pmd-scoring-out:/data-out combustml/pmd-scoring:0.1-SNAPSHOT \
  -m /data-in1/model.zip \
  -i /data-in2/good.avro \
  -o /data-out/test-docker.avro \
  -J-Xmx2048m