-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat[MQB]: Enhance queue consumption monitor alarm log with additional details #420
Feat[MQB]: Enhance queue consumption monitor alarm log with additional details #420
Conversation
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Few questions.
-
Maybe, we can print
Storage::numMessages
andStorage::numBytes
as well? -
Can we get rid of
QueueEngineUtil_AppState::head()
andQueueConsumptionMonitor::SubStreamInfo::d_headCb
now? -
Is
QueueConsumptionMonitor::onTransitionToIdle
"level triggered" (vs "edge triggered")? It can be noisy, how often we want that log?
@chrisbeard please take a look at the output
Regarding
Aren't they the same as Storage::numMessages() and Storage::numBytes? In my test they printed the same values. Are there any possible scenarios when they will differ? |
Regarding
Is Regarding |
Ok. This can flap, we often see a lot of subsequent logs. We are increasing the size of what's logged, so we may want to throttle the logging. |
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be a possibility to reduce mqbblp_queueconsumptionmonitor.t
dependencies. Let's explore it.
@@ -503,19 +498,7 @@ TEST_F(Test, putAliveIdleWithConsumer) | |||
ASSERT_EQ(logObserver.records().size(), ++expectedLogRecords); | |||
ASSERT(mwctst::ScopedLogObserverUtil::recordMessageMatch( | |||
logObserver.records().back(), | |||
"ALARM \\[QUEUE_CONSUMER_MONITOR\\].*It currently has 2 consumers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, there is no "test consumer 1/2" log because it is outside of the QueueConsumptionMonitor
now.
That is fine.
Question, maybe we do not need d_consumer1/2
now?
Maybe, the Test
can shrink now since the state of monitored objects is factored out by the Test::LoggingCb
?
If that is the case, we can remove Test::d_queue
, Test::d_queueState
, Test::d_domain
, Test::d_cluster
, and Test::createClient
. Possibly, d_storage
as well, if Test::putMessage()
changes the state of Test
instead of the Test::d_storage
(is that Test::d_advance
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the test by simplifying logic and removing consumers, but still need Test::d_queueState
and its dependencies, because it is required for d_monitor
.
BSLS_ASSERT_SAFE(d_queueState_p->queue()->dispatcher()->inDispatcherThread( | ||
d_queueState_p->queue())); | ||
|
||
// Construct AppId from appKey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can replace the App lookup code with
Apps::const_iterator cItApp = d_apps.findByKey2(AppKeyCount(appKey, 0));
AppKeyCount
is an (hopefully, not for too long) artefact of our transitioning from non-CSL to CSL way of registering Apps. In short, only registered Apps are supposed to have storage and consumption monitoring. And registered Apps should have 0
as the "count" .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced as proposed.
bdlma::LocalSequentialAllocator<4096> localAllocator(d_allocator_p); | ||
|
||
bmqt::Uri uri(&localAllocator); | ||
uriBuilder.uri(&uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d_queueState_p->uri()
<< " consumers." << ss.str() << '\n'; | ||
|
||
// Log un-delivered messages info | ||
mqbi::Storage* const storage = d_queueState_p->storage(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use storage
or d_queueState_p->storage()
everywhere?
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
QueueConsumptionMonitor::State::e_ALIVE); | ||
ASSERT_EQ(logObserver.records().size(), expectedLogRecords); | ||
} | ||
|
||
TEST_F(Test, logFormat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test originally tested the log format, but now logging is done by the callback, so this test makes no sense.
@@ -727,122 +497,6 @@ TEST_F(Test, putAliveIdleSendAliveTwoSubstreams) | |||
ASSERT_EQ(d_monitor.state(key2), QueueConsumptionMonitor::State::e_ALIVE); | |||
} | |||
|
|||
TEST_F(Test, putAliveIdleSendAliveTwoSubstreamsTwoConsumers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since consumers are removed, this test becomes the same as above, remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments
/// `enableLog` is `true` it logs alarm data. Return `true` if there are | ||
/// un-delivered messages and `false` otherwise. | ||
bool logAlarmCb(const mqbu::StorageKey& appKey, | ||
const bool enableLog) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const bool enableLog) const; | |
bool enableLog) const; |
Not necessary to qualify as const for a basic type in arguments
@@ -229,6 +215,7 @@ void QueueConsumptionMonitor::onTimer(bsls::Types::Int64 currentTimer) | |||
// PRECONDITIONS | |||
BSLS_ASSERT_SAFE(d_queueState_p->queue()->dispatcher()->inDispatcherThread( | |||
d_queueState_p->queue())); | |||
BSLS_ASSERT_SAFE(d_loggingCb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BSLS_ASSERT_SAFE(d_loggingCb); |
Might remove this precondition since we set this up in the only constructor and check it right there already. We don't have setters or other way to change d_loggingCb
to an invalid value
@@ -210,8 +210,7 @@ class QueueConsumptionMonitor { | |||
static const char* toAscii(Transition::Enum value); | |||
}; | |||
|
|||
typedef bsl::function<bslma::ManagedPtr<mqbi::StorageIterator>(void)> | |||
HeadCb; | |||
typedef bsl::function<bool(const mqbu::StorageKey&, bool)> LoggingCb; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to explain here what the first and the second args mean
@@ -220,8 +219,7 @@ class QueueConsumptionMonitor { | |||
struct SubStreamInfo { | |||
// CREATORS | |||
|
|||
SubStreamInfo(const HeadCb& headCb); | |||
SubStreamInfo(const SubStreamInfo& other); | |||
SubStreamInfo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good simplification
/// Update the specified 'subStreamInfo', associated to the specified | ||
/// 'appKey', and write log, upon transition to alive state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Update the specified 'subStreamInfo', associated to the specified | |
/// 'appKey', and write log, upon transition to alive state. | |
/// Update the specified `subStreamInfo`, associated to the specified | |
/// `appKey`, and write log, upon transition to alive state. |
int idx = 1; | ||
int numConsumers = 0; | ||
|
||
QueueEngineUtil_AppState::Consumers& consumers = app->consumers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QueueEngineUtil_AppState::Consumers& consumers = app->consumers(); | |
const QueueEngineUtil_AppState::Consumers& consumers = app->consumers(); |
out << k_EXPR_NUM_LIMIT << " of " | ||
<< " consumer subscription expressions: "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out << k_EXPR_NUM_LIMIT << " of " | |
<< " consumer subscription expressions: "; | |
out << k_EXPR_NUM_LIMIT << " of " | |
<< "consumer subscription expressions: "; |
Double space here, also, from the log itself it's not clear that not all of the existing expressions were printed due to limit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added First
word in the beginning for clarity, e.g. First 50 of consumer subscription expressions:
app->putAsideList().first()); | ||
if (rc == mqbi::StorageResult::e_SUCCESS) { | ||
// Log timestamp | ||
out << "Oldest message in a 'Put aside' list:\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out << "Oldest message in a 'Put aside' list:\n"; | |
out << "Oldest message in the 'Put aside' list:\n"; |
Since we are logging info about concrete queue
BALL_LOG_WARN << "Failed to streamIn MessageProperties, rc = " | ||
<< rc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BALL_LOG_WARN << "Failed to streamIn MessageProperties, rc = " | |
<< rc; | |
BALL_LOG_WARN << "Failed to streamIn MessageProperties, rc = " | |
<< rc; | |
out << "Message Properties: Failed to acquire [rc: " << rc << "]\n"; |
Do we want to print this in the alarm record itself?
BALL_LOG_WARN << "Failed to get storage iterator for GUID: " | ||
<< app->putAsideList().first() << ", rc = " << rc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BALL_LOG_WARN << "Failed to get storage iterator for GUID: " | |
<< app->putAsideList().first() << ", rc = " << rc; | |
BALL_LOG_WARN << "Failed to get storage iterator for GUID: " | |
<< app->putAsideList().first() << ", rc = " << rc; | |
out << "'Put aside' list: Failed to acquire [rc: " << rc << "]\n"; |
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
@678098 thank you, applied your suggestions. |
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
When a queue starts to fill up, it is valuable to see information about which AppIds are impacted, and information about the messages in the queue.
Especially in the case of subscriptions (which we are enabling for everyone now), messages that match no subscription expression will build up in the put aside list.
To help make this situation clearer to operators and users (what apps are impacted, why are messages building up, how old is the head of the queue for each app, etc), we can log more information when the watermark alarm is triggered:
storage()->capacityMeter()->printShortSummary()
);This is to help debug why a message doesn't match a subscription.
Alarm log looks like this:
Implementation details:
QueueConsumptionMonitor
intoRootQueueEngine
class, where more data is available;QueueConsumptionMonitor
and called in case of alarm to log alarm data;