How can I use dynamic batch? #5401
-
How can I use dynamic batching? I loaded the densenet_onnx example model and want to enable dynamic batching for it. I tried to do this by editing the model's config.pbtxt, but afterwards the model no longer loaded normally. I wish there were example code that even a beginner could follow.

Triton server docker command (Triton version == 22.12):

docker run --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it \
  -p8012:8012 -p8013:8013 -p8014:8014 \
  -v ${PWD}/model_repository:/models \
  triton_server:inference_server \
  tritonserver --model-repository=/models --http-port 8012 --log-verbose 1 \
    --allow-http True --strict-readiness True \
    --allow-grpc True --grpc-port 8013 \
    --allow-metrics True --allow-gpu-metrics True --metrics-port 8014
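Roughly, the edit I'm attempting in config.pbtxt looks like this (just a sketch; the max_batch_size and queue-delay values are arbitrary examples, and the input/output sections are omitted):

# Sketch of the dynamic-batching edit to config.pbtxt (example values only)
max_batch_size: 8
dynamic_batching {
  # wait briefly so the server can group incoming requests into larger batches
  max_queue_delay_microseconds: 100
}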
Replies: 5 comments
-
There's currently a pull request to create a conceptual walkthrough that goes into more detail about dynamic batching here. The dynamic batcher documentation is here. Your config looks correct. If you're getting an error, it's likely because that model does not support batching; you need a model that supports batching. By enabling dynamic batching here, you add an extra dimension before the others, which the model is not expecting. When creating a model for use with server-side batching, you want the first dimension to be the batch dimension. Depending on what kind of model you want to create, you can see some example model generation scripts we use for tests in this folder. Many of them have batching and non-batching variants, like this function.
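As a rough sketch (the tensor names, data types, and shapes below are placeholders, not the densenet_onnx model's actual values), a config.pbtxt for a model that supports batching would look something like this. The key points are that max_batch_size is greater than zero and that dims describe a single request without the batch dimension, since Triton adds that leading dimension itself:

# Sketch of a batching-capable model config (placeholder names and shapes)
name: "my_batching_model"
platform: "onnxruntime_onnx"
max_batch_size: 8            # > 0: the model accepts a leading batch dimension
input [
  {
    name: "INPUT__0"         # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]    # per-request shape; no batch dimension listed here
  }
]
output [
  {
    name: "OUTPUT__0"        # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 100
}

A non-batching model, by contrast, would set max_batch_size: 0 and list its full, fixed input shape in dims.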
-
Thank you for the reply, @dyastremsky.
-
The easiest way would be to use Perf Analyzer. You could also do some testing similar to what we do in the L0_batcher test, if you want to do your own detailed validation.
-
Thank you for always responding kindly. I tried Perf Analyzer, but it seems I'm doing it wrong. I have questions about the difference between the dynamic-batch model and the non-dynamic-batch model (Question 2, Question 3); it seems I'm not understanding this correctly.

# non-batch_model perf_analyzer command
perf_analyzer -m densenet_onnx -u localhost:8013 -i grpc --concurrency-range 1000:1005 -f non-batch_model.csv

# batch_model perf_analyzer command, batch 1
perf_analyzer -m densenet_onnx_batch -u localhost:8013 -i grpc --concurrency-range 1000:1005 -f batch_model.csv

# batch_model perf_analyzer command, batch 8
perf_analyzer -m densenet_onnx_batch -u localhost:8013 -i grpc --concurrency-range 1000:1005 -f result.csv -b 8

Result:
-
These results don't make sense to me. You shouldn't be seeing much of a difference between batch size 1 and non-batch, yet your batch size 1 run has more than 3x higher throughput. Your batch size 8 run should typically perform better than batch size 1, assuming you're not exhausting your system resources (using all of your CPU/GPU/etc.). Can you run …? Otherwise, we would need you to provide the models so that we can see whether we can reproduce the issue on our side. Depending on what you mean by auto scaling, you'll likely need to incorporate outside tools like Kubernetes to do so. A couple of blog posts about doing so: