This codebase supports the Pachyderm/MLeap training and scoring demo and builds the Docker images the demo uses.
The Docker images are located here:
Build the Docker images locally with SBT.
- Install SBT with these instructions
- Make sure Docker is running
- Use SBT to publish the images locally
sbt training/docker:publishLocal
sbt scoring/docker:publishLocal
This will publish two Docker images named combustml/pmd-training:0.1-SNAPSHOT and combustml/pmd-scoring:0.1-SNAPSHOT.
Download the Airbnb training dataset here: airbnb.clean.avro. Save it to /tmp/pmd-in, which the training command mounts as /data-in.
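The docker commands in this README mount four host directories. A minimal sketch for creating them up front is shown here. The directory names come from the commands themselves; `PMD_BASE` is a hypothetical variable introduced only for this example, and the copy step assumes the dataset was downloaded to the current directory under its original name.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: PMD_BASE is a hypothetical override; it defaults to /tmp
# to match the host paths used in the docker commands.
PMD_BASE="${PMD_BASE:-/tmp}"

# Create every host directory that the training and scoring commands mount.
for dir in pmd-in pmd-out pmd-training-in pmd-scoring-out; do
  mkdir -p "$PMD_BASE/$dir"
done

# Assumption: the dataset sits in the current directory.
# If it is not there, this step is skipped.
if [ -f airbnb.clean.avro ]; then
  cp airbnb.clean.avro "$PMD_BASE/pmd-in/"
fi
```

Creating the directories before the first run may also avoid having Docker create them itself, which on some setups leaves them owned by root.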
# Train a random forest model on the Airbnb dataset.
#   -t  model type to train
#   -i  input Airbnb dataset
#   -o  output location of the model file
#   -s  output path for the model summary
#   -J-Xmx2048m  make sure Spark has enough memory
docker run -v /tmp/pmd-in:/data-in \
-v /tmp/pmd-out:/data-out combustml/pmd-training:0.1-SNAPSHOT airbnb \
-t random-forest \
-i file:///data-in/airbnb.clean.avro \
-o /data-out/model.zip \
-s /data-out/summary.txt \
-J-Xmx2048m
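After training, it can help to confirm that the model and summary files actually reached the host before moving on to scoring. The sketch below assumes they land in /tmp/pmd-out, the directory the scoring command reads model.zip from. `check_training_outputs` is a hypothetical helper written for this example and is not part of the demo.

```shell
# Hypothetical helper: verify that the training step produced non-empty
# model.zip and summary.txt files in the given host directory.
check_training_outputs() {
  out_dir="${1:-/tmp/pmd-out}"   # assumption: host side of the output mount
  for f in model.zip summary.txt; do
    if [ ! -s "$out_dir/$f" ]; then
      echo "missing or empty: $out_dir/$f" >&2
      return 1
    fi
  done
  echo "training outputs present in $out_dir"
}
```

Calling `check_training_outputs /tmp/pmd-out` returns a non-zero status when either file is absent or empty, so it can gate the scoring step in a script.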
docker run -v /tmp/pmd-out:/data-in1 \
-v /tmp/pmd-training-in:/data-in2 \
-v /tmp/pmd-scoring-out:/data-out combustml/pmd-scoring:0.1-SNAPSHOT \
-m /data-in1/model.zip \
-i /data-in2/good.avro \
-o /data-out/test-docker.avro \
-J-Xmx2048m
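The scoring command relies on three mounts lining up with the paths passed to `-m`, `-i`, and `-o`. One way to make that mapping explicit is a small wrapper, sketched here. The `score` function and the `DOCKER` variable are hypothetical names introduced for this example; the image name, flags, and container paths are taken from the command above.

```shell
# Assumption: DOCKER is a hypothetical override so the command can be
# dry-run (for example with DOCKER=echo) without starting a container.
DOCKER="${DOCKER:-docker}"

# Hypothetical wrapper around the scoring command.
#   $1  host directory holding model.zip
#   $2  host directory holding the input Avro file
#   $3  host directory that receives the scored output
#   $4  name of the input file inside $2
score() {
  "$DOCKER" run -v "$1:/data-in1" \
    -v "$2:/data-in2" \
    -v "$3:/data-out" combustml/pmd-scoring:0.1-SNAPSHOT \
    -m /data-in1/model.zip \
    -i "/data-in2/$4" \
    -o /data-out/test-docker.avro \
    -J-Xmx2048m
}
```

With this in place, `score /tmp/pmd-out /tmp/pmd-training-in /tmp/pmd-scoring-out good.avro` issues the same command as above, and setting `DOCKER=echo` first prints the arguments instead of running them.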