This Hands-on Advanced Tutorial Session (HATS) is presented by the LPC to demonstrate a CMS analysis using Apache Spark, Spark-ROOT, Histogrammar, and Matplotlib. After an introduction to Spark and the programming paradigm it brings, students will learn some basic building blocks, then combine them to perform a basic measurement of the Z-boson mass using CMS data recorded in 2016.
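As a rough illustration of the quantity behind a Z-boson mass measurement, the sketch below computes the invariant mass of a muon pair from each muon's transverse momentum, pseudorapidity, and azimuthal angle. It is not taken from the tutorial notebooks: the function name `dimuon_mass` and its arguments are hypothetical, the muons are assumed to be massless (a good approximation at these energies), and the tutorial itself performs the calculation with Spark rather than plain Python.

```python
import math


def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two muons, treating each muon as massless.

    Uses M^2 = 2 * pt1 * pt2 * (cosh(eta1 - eta2) - cos(phi1 - phi2)).
    Hypothetical helper for illustration; not part of the tutorial code.
    """
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m2, 0.0))


# Two back-to-back central muons, each with pt = 45.6 GeV.
mass = dimuon_mass(45.6, 0.0, 0.0, 45.6, 0.0, math.pi)
```

For this back-to-back configuration the expression reduces to twice the muon transverse momentum, which lands close to the known Z-boson mass of roughly 91 GeV. Histogramming this quantity over many selected events is what produces the mass peak.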
Students of the HATS will be given access to Vanderbilt's Jupyter instance using their CERN username. The Jupyter instance contains this repository and all necessary software, preconfigured.
The day before the tutorial, it's critical that each student perform the pre-exercises. This way, any potential technical/login issues can be cleared up beforehand. To perform the pre-exercises, connect to Jupyter. You will first need to log in to CERN and authorize Jupyter to authenticate (don't worry, CERN doesn't transfer your password, just a secret authentication token).
Once you've given Jupyter permission to authenticate, click "Start My Server" to start your Jupyter instance.
Once your server starts, you'll be placed into the Jupyter file browser. From there, navigate to spark-hats/notebooks/10-building-blocks.ipynb to begin the tutorial.
Once logged into Jupyter, navigate to the spark-hats directory and open the notebook named setup-libraries.ipynb.
- Jupyter - Interactive Python notebook interface
- Apache Spark - Fast and general engine for large-scale data processing
- Spark-ROOT - Scala-based ROOT/IO interface to Spark
- Histogrammar - Functional histogramming framework, optimized for Spark
- Matplotlib - Python plotting library
- Andrew Melo - [http://lpc.fnal.gov/fellows/2017/Andrew_Melo.shtml]
- The LPC Distinguished Researcher Program (link) - Support for the author
- Advanced Computing Center for Research and Education (ACCRE) (link) - Host facility and sysadmin support
- The Diana-HEP project (link) - Interoperability and compatibility libraries
- Vanderbilt Trans Institutional Program (TIPs) Award (link) - Big Data hardware seed funding