Timemory

Damien L-G edited this page Oct 24, 2023 · 2 revisions

This profiling tool is designed to work with the modular performance analysis library timemory. Timemory is a template library whose goal is to provide a flexible and highly efficient API for implementing one or more performance analysis tools in the form of "components", which are either recursively injected by other components or explicitly listed in variadic tuple-like structures.

This connector provides an easy way for Kokkos developers to generate custom analysis metrics: Kokkos developers can write their own component specifications and then add the name of the component to the profile_entry_t component tuple. In general, a component that collects data should be a struct with the following minimum specification:

  • A default constructor
  • Inherit from CRTP base class base<this_type, value_type>
    • this_type is the component itself
    • value_type is the measurement data type
      • E.g. a wall-clock timer may store the start/stop values as integer timestamps
  • T get() const
    • T can be any type
    • E.g. a wall-clock timer may convert the difference of the integer timestamps into a double-precision floating-point value
  • U get_display() const
    • U can be any type
    • This should return a type that supports operator<< for printing to screen
  • static value_type record()
    • This is used for one-time measurements
  • static std::string label()
    • This is used for generating the output file name
  • static std::string description()
  • void start() member function
    • This should (generally) update the value member (of type value_type) inherited from the base class
  • void stop() member function
    • This should (generally) update the accum member (of type value_type) inherited from the base class with the delta
namespace tim
{
namespace component
{
struct trip_count : public base<trip_count, int64_t>
{
    using value_type = int64_t;
    using this_type  = trip_count;
    using base_type  = base<this_type, value_type>;

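    // Label used when generating the output file name.
    static std::string label() { return "trip_count"; }
    static std::string description() { return "Number of invocations"; }
    // Each measurement counts as one invocation.
    static value_type  record() { return 1; }

    // Report the accumulated number of invocations.
    value_type get() const { return accum; }
    value_type get_display() const { return get(); }

    // start() stores the measurement; stop() adds it to the running total.
    void start() { value = record(); }
    void stop() { accum += value; }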
};
}  // namespace component
}  // namespace tim
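
To see how the start()/stop() contract accumulates data without installing timemory, the following is a minimal sketch that swaps timemory's real CRTP base for a hypothetical stand-in. The sketch namespace, the simplified base template (which only holds value and accum), and the count_trips helper are all assumptions made for illustration; they are not part of the timemory API.

```cpp
#include <cstdint>
#include <string>

namespace sketch
{
// Hypothetical stand-in for timemory's CRTP base: it only provides the two
// data members that the specification above refers to.
template <typename Tp, typename Value>
struct base
{
    Value value = Value{};  // most recent measurement
    Value accum = Value{};  // running total over all start/stop pairs
};

struct trip_count : public base<trip_count, std::int64_t>
{
    using value_type = std::int64_t;

    static std::string label() { return "trip_count"; }
    static std::string description() { return "Number of invocations"; }
    static value_type  record() { return 1; }

    value_type get() const { return accum; }
    value_type get_display() const { return get(); }

    void start() { value = record(); }
    void stop() { accum += value; }
};

// Hypothetical helper: runs n start/stop pairs on a fresh component and
// returns the accumulated count.
inline std::int64_t count_trips(int n)
{
    trip_count tc;
    for(int i = 0; i < n; ++i)
    {
        tc.start();
        tc.stop();
    }
    return tc.get();
}
}  // namespace sketch
```

Calling sketch::count_trips(3) returns 3, because each start/stop pair records one invocation and adds it to accum.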

Several pre-built components are provided; they can be listed with the timemory-avail command-line tool.

Using the pre-built components or a user-built component is straightforward. Components can be explicitly specified at compile time or used within other components.

For example, component_tuple<wall_clock, cpu_clock> creates a single handle for a wall-clock timer and a CPU-clock timer, both of which are started and stopped via the obj.start() and obj.stop() member functions. Combining component_tuple<user_tuple_bundle> obj with user_tuple_bundle::configure<wall_clock, cpu_clock>() accomplishes the same result. The former allows direct access to the individual tools in C++, e.g. obj.get<wall_clock>().start(), but requires the components to be specified at compile time. The latter does not allow direct access to the tools, but allows the set of tools to be configured dynamically at run time.
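
The idea of a single handle that forwards start() and stop() to every listed component can be sketched in standard C++14 without timemory. Everything below is a hypothetical illustration: bundle is not timemory's component_tuple, and call_counter and open_regions are made-up components that only track their own calls.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

namespace sketch
{
// Hypothetical component: counts how often it is started and stopped.
struct call_counter
{
    int  starts = 0;
    int  stops  = 0;
    void start() { ++starts; }
    void stop() { ++stops; }
};

// Hypothetical component: tracks how many start() calls are still open.
struct open_regions
{
    int  open = 0;
    void start() { ++open; }
    void stop() { --open; }
};

// Stand-in for a compile-time bundle: one handle, many components.
template <typename... Types>
class bundle
{
public:
    void start() { start_all(std::index_sequence_for<Types...>{}); }
    void stop() { stop_all(std::index_sequence_for<Types...>{}); }

    // Direct access to one component, selected by its type.
    template <typename T>
    T& get()
    {
        return std::get<T>(m_data);
    }

private:
    template <std::size_t... Idx>
    void start_all(std::index_sequence<Idx...>)
    {
        // Braced-init expansion calls start() on each component in order.
        int expand[] = { 0, (std::get<Idx>(m_data).start(), 0)... };
        (void) expand;
    }

    template <std::size_t... Idx>
    void stop_all(std::index_sequence<Idx...>)
    {
        // Same expansion trick for stop().
        int expand[] = { 0, (std::get<Idx>(m_data).stop(), 0)... };
        (void) expand;
    }

    std::tuple<Types...> m_data;
};
}  // namespace sketch
```

With sketch::bundle<sketch::call_counter, sketch::open_regions> obj, a call to obj.start() starts both components, while obj.get<sketch::call_counter>().start() touches only one of them. Because the component list is a template parameter pack, it is fixed at compile time, which is the trade-off described above.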

Compilation

Install timemory

First install the timemory library. The timemory library uses a standard CMake build system. It is recommended to toggle the statistics settings as desired and to build the Python interface, which provides plotting for Kokkos output in addition to the Python profiling and instrumentation capabilities.

git clone https://github.com/NERSC/timemory.git timemory
mkdir build-timemory
cd build-timemory
cmake -DCMAKE_INSTALL_PREFIX=/usr/local -DBUILD_STATIC_LIBS=OFF ../timemory
make -j8
make install -j8

The TIMEMORY_REQUIRE_PACKAGES=ON option adds REQUIRED to every find_package(...) call. This is a useful way to make sure that the installation includes all the tools you might want to use in Kokkos-tools.

Building Connector

The build system uses CMake. Prepend the root folder of the timemory installation to the CMAKE_PREFIX_PATH environment variable, run CMake, and then build and install. Options of the form USE_<PACKAGE> should be configured to activate the corresponding external libraries. Timers and memory measurements require no external packages. By default, all forms of output are generated.

The output comes in four forms: printed to the screen at the end of the application, written to a text file at the end of the application, written to JSON at the end of the application, and, provided the Python interface was built and JSON output is enabled, a plot of the JSON output. If errors occur during plotting, ensure that the timemory Python installation is in PYTHONPATH and that the Python installation includes matplotlib and pillow.

The default kp_timemory.so connector library uses the KOKKOS_TIMEMORY_COMPONENTS environment variable to specify the components to measure.

The option BUILD_CONFIG=ON generates several additional connector libraries, each of which explicitly measures a fixed set of components. For example, kp_timemory_timers.so is a connector for component_tuple<wall_clock, cpu_clock, cpu_util>, kp_timemory_memory.so is a connector for component_tuple<peak_rss, page_rss, virtual_memory>, etc.

For roofline capabilities, PAPI is used to generate the CPU roofline and CUPTI is used to generate the (NVIDIA) GPU roofline. Rooflines require running the application twice. First set KOKKOS_ROOFLINE=ON and TIMEMORY_OUTPUT_PATH=<OUTPUT-DIR>, then run once with TIMEMORY_ROOFLINE_MODE=op and a second time with TIMEMORY_ROOFLINE_MODE=ai. At the end of the application, timemory will run roofline models to empirically determine the peak performance of the hardware. Then use the timemory-roofline command-line tool and specify the files output during the runs, e.g. timemory-roofline -t gpu_roofline -op timemory-output/gpu_roofline_counters.json -ai timemory-output/gpu_roofline_activity.json.

Output

Unless KOKKOS_ROOFLINE is set to ON, output will be located in timemory-output/<DATE-TIME>. The <DATE-TIME> component uses strftime formatting specifications and can be altered via the TIMEMORY_TIME_FORMAT environment variable. The default is "%F_%I.%M_%p".
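
To preview what a given format string expands to, the standard std::strftime function can be called directly. The sketch below is independent of timemory; the sketch namespace and the format_timestamp and make_tm helpers are hypothetical names introduced for illustration, and the result assumes the default "C" locale.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

namespace sketch
{
// Formats a broken-down time; the default pattern is the one quoted above.
inline std::string format_timestamp(const std::tm& when,
                                    const char*    fmt = "%F_%I.%M_%p")
{
    char              buffer[128];
    const std::size_t n = std::strftime(buffer, sizeof(buffer), fmt, &when);
    return std::string(buffer, n);
}

// Builds a std::tm from calendar fields (std::tm counts years from 1900 and
// months from 0, hence the offsets).
inline std::tm make_tm(int year, int month, int day, int hour, int minute)
{
    std::tm t{};
    t.tm_year = year - 1900;
    t.tm_mon  = month - 1;
    t.tm_mday = day;
    t.tm_hour = hour;
    t.tm_min  = minute;
    return t;
}
}  // namespace sketch
```

For 24 October 2023 at 14:05, sketch::format_timestamp(sketch::make_tm(2023, 10, 24, 14, 5)) yields 2023-10-24_02.05_PM: %F expands to the ISO date, %I to the 12-hour clock hour, %M to the minute, and %p to AM or PM.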
