test: TC for Metric P0 nv_load_time per model #7697
base: main
Changes to the Python test client (`general_metrics_test.py`):

```diff
@@ -62,10 +62,10 @@ def get_model_load_times():
 def load_model_explicit(model_name, server_url="http://localhost:8000"):
     endpoint = f"{server_url}/v2/repository/models/{model_name}/load"
     response = requests.post(endpoint)

-    if response.status_code == 200:
+    try:
+        self.assertEqual(response.status_code, 200)
         print(f"Model '{model_name}' loaded successfully.")
-    else:
+    except AssertionError:
         print(
             f"Failed to load model '{model_name}'. Status code: {response.status_code}"
         )
```

Review thread on the `except AssertionError:` line:

- **Reviewer:** Do we want the test to pass if it fails to load the model? If not, you should remove the `try...except`.
- **Author:** Yes, that's the expected behaviour.
- **Reviewer:** If a load or unload failure will cause the test to fail anyway, why not let it fail at the HTTP response-code check instead of the metrics check? That way people can identify the root cause of a job failure more easily.
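Below is a minimal sketch of what the reviewer's suggestion could look like. The helper name `load_model_or_fail` is hypothetical, and a plain `assert` stands in for the `unittest` assertion; the point is failing fast at the HTTP check instead of deferring the failure to the later metrics assertions:

```python
import requests


def load_model_or_fail(model_name, server_url="http://localhost:8000"):
    # Hypothetical variant of load_model_explicit: raise at the HTTP
    # response-code check so a load failure is reported at its source.
    endpoint = f"{server_url}/v2/repository/models/{model_name}/load"
    response = requests.post(endpoint)
    assert response.status_code == 200, (
        f"Failed to load model '{model_name}'. "
        f"Status code: {response.status_code}, response: {response.text}"
    )
    print(f"Model '{model_name}' loaded successfully.")
```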
```diff
@@ -75,12 +75,12 @@ def load_model_explicit(model_name, server_url="http://localhost:8000"):
 def unload_model_explicit(model_name, server_url="http://localhost:8000"):
     endpoint = f"{server_url}/v2/repository/models/{model_name}/unload"
     response = requests.post(endpoint)

-    if response.status_code == 200:
+    try:
+        self.assertEqual(response.status_code, 200)
         print(f"Model '{model_name}' unloaded successfully.")
-    else:
+    except AssertionError:
         print(
-            f"Failed to load model '{model_name}'. Status code: {response.status_code}"
+            f"Failed to unload model '{model_name}'. Status code: {response.status_code}"
         )
         print("Response:", response.text)
```
Changes to the shell test driver:

```diff
@@ -143,34 +143,32 @@ run_and_check_server
 # Test 1 for default model control mode (all models loaded at startup)
 python3 -m pytest --junitxml="general_metrics_test.test_metrics_load_time.report.xml" $CLIENT_PY::TestGeneralMetrics::test_metrics_load_time >> $CLIENT_LOG 2>&1
 kill_server
 set -e

 set +e
 CLIENT_PY="./general_metrics_test.py"
 CLIENT_LOG="general_metrics_test_client.log"
 SERVER_LOG="general_metrics_test_server.log"
 SERVER_ARGS="$BASE_SERVER_ARGS --model-control-mode=explicit --log-verbose=1"
 run_and_check_server
 MODEL_NAME='libtorch_float32_float32_float32'
-code=`curl -s -w %{http_code} -X POST ${TRITONSERVER_IPADDR}:8000/v2/repository/models/${MODEL_NAME}/load`
+curl -s -w %{http_code} -X POST ${TRITONSERVER_IPADDR}:8000/v2/repository/models/${MODEL_NAME}/load
 # Test 2 for explicit mode LOAD
 python3 -m pytest --junitxml="general_metrics_test.test_metrics_load_time_explicit_load.report.xml" $CLIENT_PY::TestGeneralMetrics::test_metrics_load_time_explicit_load >> $CLIENT_LOG 2>&1

-code=`curl -s -w %{http_code} -X POST ${TRITONSERVER_IPADDR}:8000/v2/repository/models/${MODEL_NAME}/unload`
+curl -s -w %{http_code} -X POST ${TRITONSERVER_IPADDR}:8000/v2/repository/models/${MODEL_NAME}/unload
 # Test 3 for explicit mode UNLOAD
 python3 -m pytest --junitxml="general_metrics_test.test_metrics_load_time_explicit_unload.report.xml" $CLIENT_PY::TestGeneralMetrics::test_metrics_load_time_explicit_unload >> $CLIENT_LOG 2>&1
 kill_server
 set -e

 # Test 4 for explicit mode LOAD and UNLOAD with multiple versions
 set +e
 CLIENT_PY="./general_metrics_test.py"
 CLIENT_LOG="general_metrics_test_client.log"
 SERVER_LOG="general_metrics_test_server.log"
 VERSION_DIR="${PWD}/version_models"
 SERVER_ARGS="$BASE_SERVER_ARGS --model-repository=${VERSION_DIR} --model-control-mode=explicit --log-verbose=1"
 run_and_check_server
 python3 -m pytest --junitxml="general_metrics_test.test_metrics_load_time_multiple_version_reload.report.xml" $CLIENT_PY::TestGeneralMetrics::test_metrics_load_time_multiple_version_reload >> $CLIENT_LOG 2>&1

 kill_server
 set -e

 ### Pinned memory metrics tests
 set +e
```

Review comment on the repeated `CLIENT_PY="./general_metrics_test.py"` assignment in the Test 4 block:

- **Reviewer:** Remove.
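For context, a hypothetical spot-check of the metric under test (not part of this PR): after an explicit load, the per-model load time should appear on Triton's Prometheus metrics endpoint, which listens on port 8002 by default. The metric-name substring below is an assumption inferred from the PR title ("nv_load_time per model"); verify the exact name against the server's /metrics output.

```python
import requests

# Fetch the Prometheus-format metrics and print any per-model load-time
# samples. "nv_model_load" is an assumed substring of the new metric's name.
metrics = requests.get("http://localhost:8002/metrics").text
for line in metrics.splitlines():
    if "nv_model_load" in line and not line.startswith("#"):
        print(line)
```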
Closing review thread:

- **Reviewer:** How come the core PR was merged well before this one finished? We currently have no ongoing tests for the merged feature on our nightly pipelines in core, right?
- **Author:** It was approved in parallel, a couple of days apart. I was unable to get CI passing due to other build issues, and then @yinggeh added more comments after it was approved, hence the delay. Yes, I will get this in ASAP after the trtllm code freeze.