test: TC for Metric P0 nv_load_time per model #7697
base: main
Conversation
…-server/server into ibhosale_metrics_google
Nice work! Make sure the CI passes before merging.
#### Load Time Per-Model
The *Model Load Duration* reflects the time to load a model from storage into GPU/CPU memory, in seconds.
```
# HELP nv_model_load_duration_secs Model load time in seconds
```
Do we need a sample output for a gauge metric?
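For reference, a sample for this gauge in the same exposition format might look like the following; the model label and value are illustrative, not taken from the PR:
```
# HELP nv_model_load_duration_secs Model load time in seconds
# TYPE nv_model_load_duration_secs gauge
nv_model_load_duration_secs{model="simple",version="1"} 2.34
```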
qa/L0_metrics/test.sh
```
# Test 3 for explicit mode UNLOAD
python3 -m pytest --junitxml="general_metrics_test.test_metrics_load_time_explicit_unload.report.xml" $CLIENT_PY::TestGeneralMetrics::test_metrics_load_time_explicit_unload >> $CLIENT_LOG 2>&1
kill_server
set -e

# Test 4 for explicit mode LOAD and UNLOAD with multiple versions
set +e
CLIENT_PY="./general_metrics_test.py"
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
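For context on what these tests exercise, a minimal sketch of such a check might look like the following; the metrics URL, model name, and parsing helper are illustrative assumptions, not the PR's actual code:
```
import requests
import tritonclient.http as httpclient

METRICS_URL = "http://localhost:8002/metrics"  # Triton's default metrics port

def get_load_duration(model_name):
    # Scan the Prometheus text output for the per-model gauge (illustrative parsing).
    for line in requests.get(METRICS_URL).text.splitlines():
        if line.startswith("nv_model_load_duration_secs") and model_name in line:
            return float(line.rsplit(" ", 1)[1])
    return None

client = httpclient.InferenceServerClient(url="localhost:8000")
client.load_model("simple")                 # explicit-mode LOAD
assert get_load_duration("simple") > 0      # gauge should report a positive load time
client.unload_model("simple")               # explicit-mode UNLOAD
```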
print(f"Model '{model_name}' loaded successfully.") | ||
else: | ||
except AssertionError: |
Do we want the test to pass if it fails to load the model? If not, you should remove the try...except.
Yes, that's the expected behaviour.
Models should load and unload; otherwise the test should fail, since subsequent metrics will be incorrect.
If a load or unload failure will make the test fail anyway, why not let it fail at the HTTP response code check instead of the metrics check? That way people can more easily identify the root cause of a job failure.
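To illustrate the suggestion, a minimal sketch of failing fast at the HTTP layer might look like this; the helper name is an assumption for illustration, while the endpoint is Triton's standard repository-load API:
```
import requests

def load_model_or_fail(model_name, host="localhost:8000"):
    # Assert on the repository-load response code first, so a load failure
    # surfaces directly instead of as a later, harder-to-trace metrics mismatch.
    r = requests.post(f"http://{host}/v2/repository/models/{model_name}/load")
    assert r.status_code == 200, (
        f"Failed to load '{model_name}': HTTP {r.status_code} {r.text}"
    )
```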
How come the core PR was merged way before this one finished? We currently have no ongoing tests for the merged feature on our nightly pipelines in core, right?
It was approved in parallel, a couple of days apart.
I was unable to get CI passing due to other build issues.
And then @yinggeh added more comments after it was approved, hence the delay.
Yes, I will get this in ASAP after the TRT-LLM code freeze.
What does the PR do?
Adds test cases for the per-model load time metric.
Checklist
`<commit_type>: <Title>`
Commit Type: check the conventional commit type box and add the label to the GitHub PR.
Related PRs:
Core: triton-inference-server/core#397
Where should the reviewer start?
qa/L0_metrics/general_metrics_test.py
Test plan:
Added tests for the model load time metric, covering explicit-mode load and unload, including multiple model versions.
Background
Improve metrics in Triton