ChatQnA queries return just Gaudi TGI errors when rerank is used #487

Open

eero-t opened this issue Oct 18, 2024 · 3 comments

eero-t commented Oct 18, 2024

After installing git HEAD with Helm using -f chatqna/gaudi-values.yaml, querying ChatQnA:

curl http://${host_ip}:8888/v1/chatqna \
    -H "Content-Type: application/json" \
    -d '{
        "messages": "What is the revenue of Nike in 2023?"
    }'

Just gives TGI errors as answers:

data: b'{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 2048. Given: 1887 `inputs` tokens and 1024 `max_new_tokens`","error_type":"validation"}'

data: [DONE]

Which is indeed how GenAIInfra is configured for Gaudi:

$ git grep MAX | grep gaudi | grep -e chatqna -e common
chatqna/gaudi-values.yaml:  MAX_INPUT_LENGTH: "1024"
chatqna/gaudi-values.yaml:  MAX_TOTAL_TOKENS: "2048"
chatqna/guardrails-gaudi-values.yaml:  MAX_INPUT_LENGTH: "1024"
chatqna/guardrails-gaudi-values.yaml:  MAX_TOTAL_TOKENS: "2048"
chatqna/guardrails-gaudi-values.yaml:  MAX_INPUT_LENGTH: "1024"
chatqna/guardrails-gaudi-values.yaml:  MAX_TOTAL_TOKENS: "2048"
common/tgi/gaudi-values.yaml:MAX_INPUT_LENGTH: "1024"
common/tgi/gaudi-values.yaml:MAX_TOTAL_TOKENS: "2048"
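
For reference, the limits a running TGI instance actually applies can be checked via its /info endpoint (a quick sketch; the TGI service host/port below is a placeholder for my deployment, and older TGI releases report max_input_length instead of max_input_tokens):

# Port-forward or adjust ${host_ip}/${tgi_port} to wherever the tgi service is exposed
curl -s http://${host_ip}:${tgi_port}/info | jq '{max_input_tokens, max_total_tokens}'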

PS. To make things worse:


eero-t commented Oct 23, 2024

@lianhao @yongfengdu Any comments on this?


eero-t commented Oct 23, 2024

Note: TGI outputs this error only after a document has been uploaded with data-prep, i.e. when ChatQnA uses reranking (reranking adds more input tokens for TGI).

If I minimize the input and use a smaller max-tokens limit, there's still an error; it just changes a bit:

$ time curl --no-progress-meter http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json"  -d '{"messages":"1+1?", "max_tokens": 4}'
data: b'{"error":"Input validation error: `inputs` must have less than 1024 tokens. Given: 1835","error_type":"validation"}'

From a quick check of the HF TEI docs, TEI does not seem to have options for limiting tokens (except for warmup), but the issue goes away if I double the current TGI token limits:

  MAX_INPUT_LENGTH: "2048"
  MAX_TOTAL_TOKENS: "4096"
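
One way to apply that without editing the chart (a sketch; I'm assuming the keys sit under a tgi: section, as the indentation in chatqna/gaudi-values.yaml suggests, and the release/chart names below are placeholders):

# my-tgi-overrides.yaml (hypothetical override file)
tgi:
  MAX_INPUT_LENGTH: "2048"
  MAX_TOTAL_TOKENS: "4096"

# Re-deploy on top of the stock Gaudi values
helm upgrade chatqna ./chatqna -f chatqna/gaudi-values.yaml -f my-tgi-overrides.yaml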


eero-t commented Oct 23, 2024

Doesn't the current CI make sure that the data-prep upload worked, i.e. that rerank is used, before verifying that ChatQnA / TGI work with the gaudi-values.yaml settings?
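
Something along these lines could catch it (a rough sketch only; the data-prep endpoint, port, and form field are assumptions from my setup and may differ in CI):

# 1. Upload a document so that retrieval + rerank actually kick in
curl -sf -X POST http://${host_ip}:6007/v1/dataprep -F "files=@./sample-doc.pdf"

# 2. Query ChatQnA and fail if TGI answers with a validation error
if curl -s http://${host_ip}:8888/v1/chatqna \
    -H "Content-Type: application/json" \
    -d '{"messages": "What is the revenue of Nike in 2023?"}' \
    | grep -q '"error_type":"validation"'; then
    echo "ChatQnA returned a TGI validation error" >&2
    exit 1
fi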
