Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmark launcher for AudioQnA #981

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 77 additions & 0 deletions AudioQnA/benchmark/performance/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
# AudioQnA Benchmarking

This folder contains a collection of scripts to enable inference benchmarking by leveraging a comprehensive benchmarking tool, [GenAIEval](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md), that enables throughput analysis to assess inference performance.

By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.

## Purpose

We aim to run these benchmarks and share them with the OPEA community for three primary reasons:

- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading llms, serving frameworks etc.

## Metrics

The benchmark will report the below metrics, including:

- Number of Concurrent Requests
- End-to-End Latency: P50, P90, P99 (in milliseconds)
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
- Average Next Token Latency (in milliseconds)
- Average Token Latency (in milliseconds)
- Requests Per Second (RPS)
- Output Tokens Per Second
- Input Tokens Per Second

Results will be displayed in the terminal and saved as CSV file named `1_stats.csv` for easy export to spreadsheets.

## Getting Started

We recommend using Kubernetes to deploy the AudioQnA service, as it offers benefits such as load balancing and improved scalability. However, you can also deploy the service using Docker if that better suits your needs.

### Prerequisites

- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).

- Every node has direct internet access
- Set up kubectl on the master node with access to the Kubernetes cluster.
- Install Python 3.8+ on the master node for running GenAIEval.
- Ensure all nodes have a local /mnt/models folder, which will be mounted by the pods.
- Ensure that the container's ulimit can meet the the number of requests.

```bash
# The way to modify the containered ulimit:
sudo systemctl edit containerd
# Add two lines:
[Service]
LimitNOFILE=65536:1048576

sudo systemctl daemon-reload; sudo systemctl restart containerd
```

### Test Steps

Please deploy AudioQnA service before benchmarking.

##### Run Benchmark Test

Before the benchmark, we can configure the number of test queries and test output directory by:

```bash
export USER_QUERIES="[128, 128, 128, 128]"
export TEST_OUTPUT_DIR="/tmp/benchmark_output"
```

And then run the benchmark by:

```bash
bash benchmark.sh -n <node_count>
```

The argument `-n` refers to the number of test nodes.

##### 4. Data collection

All the test results will come to this folder `/tmp/benchmark_output` configured by the environment variable `TEST_OUTPUT_DIR` in previous steps.
99 changes: 99 additions & 0 deletions AudioQnA/benchmark/performance/benchmark.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
#!/bin/bash

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

deployment_type="k8s"
node_number=1
service_port=8888
query_per_node=128

benchmark_tool_path="$(pwd)/GenAIEval"

usage() {
echo "Usage: $0 [-d deployment_type] [-n node_number] [-i service_ip] [-p service_port]"
echo " -d deployment_type AudioQnA deployment type, select between k8s and docker (default: k8s)"
echo " -n node_number Test node number, required only for k8s deployment_type, (default: 1)"
echo " -i service_ip AudioQnA service ip, required only for docker deployment_type"
echo " -p service_port AudioQnA service port, required only for docker deployment_type, (default: 8888)"
exit 1
}

while getopts ":d:n:i:p:" opt; do
case ${opt} in
d )
deployment_type=$OPTARG
;;
n )
node_number=$OPTARG
;;
i )
service_ip=$OPTARG
;;
p )
service_port=$OPTARG
;;
\? )
echo "Invalid option: -$OPTARG" 1>&2
usage
;;
: )
echo "Invalid option: -$OPTARG requires an argument" 1>&2
usage
;;
esac
done

if [[ "$deployment_type" == "docker" && -z "$service_ip" ]]; then
echo "Error: service_ip is required for docker deployment_type" 1>&2
usage
fi

if [[ "$deployment_type" == "k8s" && ( -n "$service_ip" || -n "$service_port" ) ]]; then
echo "Warning: service_ip and service_port are ignored for k8s deployment_type" 1>&2
fi

function main() {
if [[ ! -d ${benchmark_tool_path} ]]; then
echo "Benchmark tool not found, setting up..."
setup_env
fi
run_benchmark
}

function setup_env() {
git clone https://github.com/opea-project/GenAIEval.git
pushd ${benchmark_tool_path}
python3 -m venv stress_venv
source stress_venv/bin/activate
pip install -r requirements.txt
popd
}

function run_benchmark() {
source ${benchmark_tool_path}/stress_venv/bin/activate
export DEPLOYMENT_TYPE=${deployment_type}
export SERVICE_IP=${service_ip:-"None"}
export SERVICE_PORT=${service_port:-"None"}
if [[ -z $USER_QUERIES ]]; then
user_query=$((query_per_node*node_number))
export USER_QUERIES="[${user_query}, ${user_query}, ${user_query}, ${user_query}]"
echo "USER_QUERIES not configured, setting to: ${USER_QUERIES}."
fi
export WARMUP=$(echo $USER_QUERIES | sed -e 's/[][]//g' -e 's/,.*//')
if [[ -z $WARMUP ]]; then export WARMUP=0; fi
if [[ -z $TEST_OUTPUT_DIR ]]; then
if [[ $DEPLOYMENT_TYPE == "k8s" ]]; then
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/node_${node_number}"
else
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/docker"
fi
echo "TEST_OUTPUT_DIR not configured, setting to: ${TEST_OUTPUT_DIR}."
fi

envsubst < ./benchmark.yaml > ${benchmark_tool_path}/evals/benchmark/benchmark.yaml
cd ${benchmark_tool_path}/evals/benchmark
python benchmark.py
}

main
52 changes: 52 additions & 0 deletions AudioQnA/benchmark/performance/benchmark.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

test_suite_config: # Overall configuration settings for the test suite
examples: ["audioqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
deployment_type: "k8s" # Default is "k8s", can also be "docker"
service_ip: None # Leave as None for k8s, specify for Docker
service_port: None # Leave as None for k8s, specify for Docker
warm_ups: 0 # Number of test requests for warm-up
run_time: 60m # The max total run time for the test suite
seed: # The seed for all RNGs
user_queries: [1, 2, 4, 8, 16, 32, 64, 128] # Number of test requests at each concurrency level
query_timeout: 120 # Number of seconds to wait for a simulated user to complete any executing task before exiting. 120 sec by defeult.
random_prompt: false # Use random prompts if true, fixed prompts if false
collect_service_metric: false # Collect service metrics if true, do not collect service metrics if false
data_visualization: false # Generate data visualization if true, do not generate data visualization if false
llm_model: "Intel/neural-chat-7b-v3-3" # The LLM model used for the test
test_output_dir: "/tmp/benchmark_output" # The directory to store the test output
load_shape: # Tenant concurrency pattern
name: constant # poisson or constant(locust default load shape)
params: # Loadshape-specific parameters
constant: # Poisson load shape specific parameters, activate only if load_shape is poisson
concurrent_level: 4 # If user_queries is specified, concurrent_level is target number of requests per user. If not, it is the number of simulated users
poisson: # Poisson load shape specific parameters, activate only if load_shape is poisson
arrival-rate: 1.0 # Request arrival rate
namespace: "" # Fill the user-defined namespace. Otherwise, it will be default.

test_cases:
audioqna:
asr:
run_test: true
service_name: "asr-svc" # Replace with your service name
llm:
run_test: true
service_name: "llm-svc" # Replace with your service name
parameters:
model_name: "Intel/neural-chat-7b-v3-3"
max_new_tokens: 128
temperature: 0.01
top_k: 10
top_p: 0.95
repetition_penalty: 1.03
streaming: true
llmserve:
run_test: true
service_name: "llm-svc" # Replace with your service name
tts:
run_test: true
service_name: "tts-svc" # Replace with your service name
e2e:
run_test: true
service_name: "audioqna-backend-server-svc" # Replace with your service name
Loading