xd-demo with Pivotal HD Retail data
===================================
Contributors
- James Williams - jwilliams@gopivotal.com
- Michael Goddard - mgoddard@gopivotal.com
- Adam Zwickey - azwickey@gopivotal.com
Demo User Story
We want to ingest real-time orders from our POS system directly into HDFS as pipe-delimited records, delivered via an HTTP POST.
A sample post looks like:
Customer ID, Order ID, Order Amount, Store ID

curl -d "{\"orderid\":\"123\",\"storeid\":\"456\",\"customerid\":\"789\",\"orderamount\":\"5000.01\"}" http://localhost:8000 - Good Post

curl -d "{\"orderid\":\"BAD_DATA\",\"storeid\":\"456\",\"customerid\":\"789\",\"orderamount\":\"5000.01\"}" http://localhost:8000 - Bad Post

123|456|789|5000.01 - Dream State in HDFS with HAWQ and in-memory Query
We are going to reuse some earlier integration work, and we need to transform and filter the POS data before
ingesting it into Hadoop. The HTTP stream will accept JSON-formatted key/value pairs of order data.
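
As a rough illustration of the transformation described above, the following Python sketch turns one JSON order into the pipe-delimited record from the sample. The function name `to_pipe_record` is hypothetical, and the field order is an assumption read off the sample keys and the `123|456|789|5000.01` target line. In the demo itself this step happens inside Spring XD, not in this code.

```python
import json

# Field order is an assumption taken from the sample post and target HDFS line.
FIELDS = ("orderid", "storeid", "customerid", "orderamount")


def to_pipe_record(payload: str) -> str:
    """Convert a JSON order payload into a pipe-delimited record."""
    order = json.loads(payload)
    return "|".join(str(order[name]) for name in FIELDS)


sample = '{"orderid":"123","storeid":"456","customerid":"789","orderamount":"5000.01"}'
print(to_pipe_record(sample))  # prints 123|456|789|5000.01
```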
Some orders have bad data. We need to filter out these records before persisting them to HDFS. After landing the data in Hadoop,
we would like to run SQL analytics on the orders to see if they match known fraudulent orders from the past. Hive is not an option
because it does not provide fast enough response times or full ANSI SQL compliance. We want to run a logistic regression model on all
orders to feed our real-time fraud detection applications that aim to catch criminals before they leave the store. The logistic regression
model needs to be re-trained periodically via a scheduled process. The in-memory fraud data store needs to be flushed on a configurable
interval and HDFS files need to be archived via a scheduled process.
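
The filtering requirement could be sketched as below. This README does not spell out what counts as "bad data", so the rule used here is an assumption based on the `BAD_DATA` sample post: every field must be present, `orderid` must consist of digits, and `orderamount` must parse as a number. `is_valid_order` is a hypothetical helper and not part of the demo scripts.

```python
import json

REQUIRED = ("orderid", "storeid", "customerid", "orderamount")


def is_valid_order(payload: str) -> bool:
    """Return True if the payload looks like a well-formed order (assumed rule)."""
    try:
        order = json.loads(payload)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not isinstance(order, dict):
        return False
    if any(name not in order for name in REQUIRED):
        return False
    # Assumption: a good order ID is purely numeric, unlike "BAD_DATA".
    if not str(order["orderid"]).isdigit():
        return False
    try:
        float(order["orderamount"])
    except (TypeError, ValueError):
        return False
    return True


good = '{"orderid":"123","storeid":"456","customerid":"789","orderamount":"5000.01"}'
bad = '{"orderid":"BAD_DATA","storeid":"456","customerid":"789","orderamount":"5000.01"}'
print(is_valid_order(good), is_valid_order(bad))  # prints True False
```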
In order to get this running with Pivotal HD
- Start a Pivotal HD instance. Optionally, run the "pivotal-samples" data labs to populate the retail_demo DB with HAWQ tables/data. The "pivotal-samples" GitHub project is located at:
- Download and install the latest Spring XD binary. The project is located at:
- Update your spring-xd hadoop config ($SPRING_XD/xd/config/hadoop.properties) to reflect your HDFS address:
fs.default.name=hdfs://my-hadoop:8020
- Open config.py and add entries for each property. This is very important to ensure connectivity to Pivotal HD and SQLFire.
- In a terminal window, run install.py. This will scp the Python demo scripts to the Pivotal HD and SQLFire VMs, and copy the Spring XD scripts, lib jars, modules and sink config:
./install.py
- Run 3 Spring XD runtimes in separate terminal windows (redis, admin, container)
sudo sysctl -w net.inet.tcp.msl=1000
$SPRING_XD/redis/bin/redis-server
$SPRING_XD/xd/bin/xd-admin --hadoopDistro phd1
$SPRING_XD/xd/bin/xd-container --hadoopDistro phd1
- Run Spring XD Shell in a terminal window
$SPRING_XD/shell/bin/spring-xd-shell --hadoopDistro phd1
- In Spring XD Shell - Create Hadoop ingest, Pivotal HD analytics tap and SQLFire sink.
script --file ../../xd/cmd/create-all.cmds
- [PIVOTALHD TERMINAL] Open an ssh session to your Pivotal VM and run this script. You must do this before starting the data stream.
./demo.py setup_hdfs
- In a terminal window, run send_data.py to start a data stream simulation.
./send_data.py
- [SQLFIRE TERMINAL] Verify that SQLFire is getting only a small subset of orders
./demo.py query
- In Spring XD Shell - Re-run batch jobs (should delete SQLFire data, populate HAWQ tables, and re-run the analytic training model)
script --file ../../xd/cmd/deploy-batch.cmds
- In Spring XD Shell - Reset the richgauge taps to 0
script --file ../../xd/cmd/reset-taps.cmds
- [PIVOTALHD TERMINAL] Run a PXF and HAWQ Query
./demo.py query_hawq
- Install DbVisualizer (http://www.dbvis.com/) and run queries through a JDBC client GUI. You will need to add a new "Cache" Driver JAR for SQLFire. To allow remote connections, you will need to modify '/data/1/hawq_master/gpseg-1/pg_hba.conf' in your Pivotal HD VM.
- [PIVOTALHD TERMINAL] Restart Pivotal HD via the stop/start scripts.
/home/gpadmin/stop_all.sh; /home/gpadmin/start_all.sh;
- In Spring XD Shell - Remove all streams/taps from Spring XD (does not delete any data)
script --file ../../xd/cmd/destroy-all.cmds
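
For reference, a data stream simulation along the lines of the send_data.py step could look like the sketch below. This is not the repository's send_data.py: the function names, the random value ranges, and the 10% bad-record rate are all assumptions made for illustration. The URL is the one from the sample post (http://localhost:8000).

```python
import json
import random
import urllib.request

URL = "http://localhost:8000"  # endpoint taken from the sample curl post


def make_order(rng: random.Random, bad_rate: float = 0.1) -> dict:
    """Build one simulated order; a fraction gets a non-numeric order ID."""
    order_id = "BAD_DATA" if rng.random() < bad_rate else str(rng.randint(1, 99999))
    return {
        "orderid": order_id,
        "storeid": str(rng.randint(1, 999)),
        "customerid": str(rng.randint(1, 99999)),
        "orderamount": "%.2f" % rng.uniform(1, 10000),
    }


def post_order(order: dict, url: str = URL) -> int:
    """POST the order as a JSON body (like curl -d) and return the HTTP status."""
    data = json.dumps(order).encode("utf-8")
    request = urllib.request.Request(url, data=data)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling `post_order(make_order(random.Random()))` in a loop would produce a stream of mostly good orders with occasional bad ones. It only works while the Spring XD HTTP source is listening on the target URL.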
- Update app.properties (src/main/webapps/WEB-INF/classes) to reflect the IP addresses of your SQLFire environment
- Open a terminal and build the WAR via Maven
mvn install
- Copy the WAR file to a working tc Server or Tomcat server
- The application will be available at: http://localhost:8080/xd-demo-client/resources/index.html