Dynamic expansion of thread data #294
Merged
jrmadsen added the bug fix, timemory, libomnitrace, cmake, submodule, and libomnitrace-core labels on Jun 30, 2023
jrmadsen force-pushed the thread-data-update branch 3 times, most recently from 77eeb9f to aac3903 (July 6, 2023 23:19)
- tests which exceed the OMNITRACE_MAX_THREADS value for thread creation

- include source files in /tests/source directory

- fail if a timemory hash is not resolved to a name

- remove env disabling of critical-trace and process-sampling

- make_unique in concepts.hpp
- add OMNITRACE_USE_ROCM_SMI to "process_sampling" category
- remove forced disabling of critical-trace in sampling mode
- parentheses for OMNITRACE_PREFER
- use tim::get_hash_id instead of tim::get_combined_hash_id

- added aligned_static_vector.hpp
- similar to static_vector.hpp but attempts to align to cache line size
- alignment template parameter for stable_vector
- added missing aliases in static_vector
- consistent with aligned_static_vector aliases

- track the peak number of threads created
- thread_info::get_peak_num_threads() returns the peak number of threads

- generic thread_data inherits from base_thread_data
- thread_data reworked to support dynamic expansion
- base_thread_data updated to invoke private_instance() function
- thread_data<optional<T>> uses stable_vector aligned to cache line width
- thread_data<identity<T>> uses stable_vector aligned to cache line width
- thread_data for optional and identity provide private private_instance function + friend to base_thread_data
- component_bundle_cache<T> is now thread_data<component_bundle_cache_impl<T>>

- thread_data<T>::instances -> thread_data<T>::instance(construct_on_thread{ ... })
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
- tim::get_combined_hash_id -> tim::get_hash_id
- update progress_bundle usage to new thread_data API

- backtrace_metrics update
  - update to new thread_data API
  - add thread CPU time row in perfetto
  - fix potential bug when rusage categories are disabled
  - fix bug in operator-= not subtracting cpu time of rhs
- backtrace update
  - skip all child call-stack below 'tim::openmp::' if sampling_keep_internal = false

- pthread_gotcha::shutdown() invokes pthread_create_gotcha::shutdown()

- minor tweak to {start,stop}_bundle functions: pass in thread id
- update to new thread_data API
- track native handles of internal threads
- implement system with pthread_kill to stop dangling bundles

- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

- update to new thread_data API
- tim::get_combined_hash_id -> tim::get_hash_id

- update to new thread_data API

- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

- update to new thread_data API

- update to new thread_data API

- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

- invoke pthread_gotcha::shutdown before invoking OMPT finalize function
- this prevents signals from being delivered to OpenMP threads

- replace get_timemory_hash_{ids,aliases} functions with copy_timemory_hash_ids function
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
- tim::get_combined_hash_id -> tim::get_hash_id
- improvements to + error checking in thread_init function

- move copying timemory hash id/aliases to tracing.cpp
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

- add -Wno-interference-size to suppress warning about use of std::hardware_destructive_interference_size

- improve scheme for waiting on child processes via waitpid instead of wait
- support running main routine multiple times
- push/pop regions in child process

- allow user to specify misc values via -D <name>=<value>
  - OMNITRACE_CACHELINE_SIZE
  - OMNITRACE_CACHELINE_SIZE_MIN
  - OMNITRACE_ROCM_MAX_COUNTERS
- remove unused defines
  - OMNITRACE_ROCM_LOOK_AHEAD
  - OMNITRACE_MAX_ROCM_QUEUES

- OMNITRACE_MAX_ROCM_COUNTERS -> OMNITRACE_ROCM_MAX_COUNTERS

- set cacheline_align_v from max of OMNITRACE_CACHELINE_SIZE and OMNITRACE_CACHELINE_SIZE_MIN

- acquire locks for updating main hash ids/aliases
- only propagate ids/aliases when finalizing

- make sure hash for "start_thread" exists on main thread

- if OMNITRACE_BUILD_NUMBER is 1, set OMNITRACE_VERBOSE=0
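
Several of the commits above replace a loop over the compile-time max_supported_threads with a loop over thread_info::get_peak_num_threads(). The following is a minimal sketch of that idea, not omnitrace's implementation: the `sketch` namespace, `register_thread`, `scan`, the value 4096, and the counter-based bookkeeping are all assumptions made for illustration, and "peak" is taken here to mean the number of thread indices handed out so far.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
// Hypothetical compile-time cap, standing in for max_supported_threads.
constexpr std::size_t max_supported_threads = 4096;

// Hypothetical bookkeeping: how many thread indices have been handed out.
std::atomic<std::size_t> peak_num_threads{ 0 };

// Hands out the next sequential thread index (safe to call concurrently).
std::size_t register_thread() { return peak_num_threads.fetch_add(1); }

// Runtime bound used in place of the compile-time cap.
std::size_t get_peak_num_threads() { return peak_num_threads.load(); }

struct scan_result
{
    std::size_t visited;  // number of slots the loop touched
    long        total;    // sum of the per-thread values it saw
};

// Walks the first `bound` per-thread slots and accumulates their values.
scan_result scan(const std::vector<long>& data, std::size_t bound)
{
    assert(bound <= data.size());
    scan_result result{ 0, 0 };
    for(std::size_t i = 0; i < bound; ++i)
    {
        ++result.visited;
        result.total += data[i];
    }
    return result;
}
}  // namespace sketch
```

With eight registered threads writing into a buffer of max_supported_threads slots, scanning up to get_peak_num_threads() touches 8 slots instead of 4096 and produces the same total, since the untouched slots are never written.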
jrmadsen force-pushed the thread-data-update branch from aac3903 to 150703c (October 16, 2023 18:12)
Labels
- bug fix: Fixes a bug
- cmake: Modifies the CMake build system
- libomnitrace: Involves omnitrace library
- libomnitrace-core: Internal library containing core capabilities
- submodule: Updates a git submodule
- timemory: Issue affects/involves timemory features/capabilities
Previously, omnitrace was limited to OMNITRACE_MAX_THREADS (defined at compile-time) threads, with static thread_local allocations. If OMNITRACE_MAX_THREADS=32, then the user application could only create ~31 additional threads at the absolute max before omnitrace aborted.

This PR changes OMNITRACE_MAX_THREADS to support an unlimited number of threads, i.e. once the 32nd additional thread is created, omnitrace will resize all the thread_data instances to support 32 more threads (i.e. the size is originally 32 and after the resize, the size is 64).
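
The growth from 32 to 64 slots can be illustrated with a chunked container that adds one fixed-size block at a time, so existing per-thread slots never move. This is only a sketch under assumptions: `chunked_thread_data` and its members are hypothetical names, it is not the stable_vector used by the PR, and it omits the synchronization a real multi-threaded resize would need.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

namespace sketch
{
// Hypothetical container that grows in blocks of ChunkSize slots.
// Each block is its own heap allocation, so references to existing
// slots stay valid when more blocks are added.
template <typename T, std::size_t ChunkSize = 32>
class chunked_thread_data
{
public:
    chunked_thread_data() { grow(); }  // start with one block (32 slots)

    std::size_t capacity() const { return m_chunks.size() * ChunkSize; }

    // Returns the slot for thread index `idx`, adding blocks as needed.
    // NOTE: not thread-safe; a real implementation would need locking.
    T& at(std::size_t idx)
    {
        while(idx >= capacity())
            grow();
        assert(idx < capacity());
        return (*m_chunks[idx / ChunkSize])[idx % ChunkSize];
    }

private:
    using chunk_t = std::array<T, ChunkSize>;

    // Appends one value-initialized block of ChunkSize slots.
    void grow() { m_chunks.emplace_back(std::make_unique<chunk_t>()); }

    std::vector<std::unique_ptr<chunk_t>> m_chunks;
};
}  // namespace sketch
```

In this sketch, indices 0 through 31 fit in the initial block; the first access to index 32 (the 32nd additional thread, counting the main thread as index 0) adds a second block and the capacity becomes 64, while the address of slot 0 is unchanged.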