ChatQnA queries return just Gaudi TGI errors when rerank is used #487

Open

eero-t opened this issue Oct 18, 2024 · 3 comments

eero-t commented Oct 18, 2024

After installing git HEAD with Helm using -f chatqna/gaudi-values.yaml, querying ChatQnA:

curl http://${host_ip}:8888/v1/chatqna \
    -H "Content-Type: application/json" \
    -d '{
        "messages": "What is the revenue of Nike in 2023?"
    }'

Just gives TGI errors as answers:

data: b'{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 2048. Given: 1887 `inputs` tokens and 1024 `max_new_tokens`","error_type":"validation"}'

data: [DONE]

Which is indeed how GenAIInfra is configured for Gaudi:

$ git grep MAX | grep gaudi | grep -e chatqna -e common
chatqna/gaudi-values.yaml:  MAX_INPUT_LENGTH: "1024"
chatqna/gaudi-values.yaml:  MAX_TOTAL_TOKENS: "2048"
chatqna/guardrails-gaudi-values.yaml:  MAX_INPUT_LENGTH: "1024"
chatqna/guardrails-gaudi-values.yaml:  MAX_TOTAL_TOKENS: "2048"
chatqna/guardrails-gaudi-values.yaml:  MAX_INPUT_LENGTH: "1024"
chatqna/guardrails-gaudi-values.yaml:  MAX_TOTAL_TOKENS: "2048"
common/tgi/gaudi-values.yaml:MAX_INPUT_LENGTH: "1024"
common/tgi/gaudi-values.yaml:MAX_TOTAL_TOKENS: "2048"
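
For reference, the limits a running TGI instance actually applies can be checked via its /info endpoint (a quick sketch; the TGI service host/port below is a placeholder for my deployment, and older TGI releases report max_input_length instead of max_input_tokens):

# Port-forward or adjust ${host_ip}/${tgi_port} to wherever the tgi service is exposed
curl -s http://${host_ip}:${tgi_port}/info | jq '{max_input_tokens, max_total_tokens}'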

PS. To make things worse:


eero-t commented Oct 23, 2024

@lianhao @yongfengdu Any comments on this?


eero-t commented Oct 23, 2024

Note: TGI outputs this error only after a document has been uploaded with data-prep, i.e. when ChatQnA uses reranking (reranking adds more input tokens for TGI).

If I minimize the input and use a smaller max-tokens limit, there's still an error; it just changes a bit:

$ time curl --no-progress-meter http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json"  -d '{"messages":"1+1?", "max_tokens": 4}'
data: b'{"error":"Input validation error: `inputs` must have less than 1024 tokens. Given: 1835","error_type":"validation"}'

From a quick check of the HF TEI docs, TEI does not seem to have options for limiting tokens (except for warmup), but the issue goes away if I double the current TGI token limits:

  MAX_INPUT_LENGTH: "2048"
  MAX_TOTAL_TOKENS: "4096"
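
One way to apply that without editing the chart (a sketch; I'm assuming the keys sit under a tgi: section, as the indentation in chatqna/gaudi-values.yaml suggests, and the release/chart names below are placeholders):

# my-tgi-overrides.yaml (hypothetical override file)
tgi:
  MAX_INPUT_LENGTH: "2048"
  MAX_TOTAL_TOKENS: "4096"

# Re-deploy on top of the stock Gaudi values
helm upgrade chatqna ./chatqna -f chatqna/gaudi-values.yaml -f my-tgi-overrides.yaml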


eero-t commented Oct 23, 2024

Doesn't the current CI make sure that the data-prep upload worked, i.e. that rerank is used, before verifying that ChatQnA / TGI work with the gaudi-values.yaml settings?
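
Something along these lines could catch it (a rough sketch only; the data-prep endpoint, port, and form field are assumptions from my setup and may differ in CI):

# 1. Upload a document so that retrieval + rerank actually kick in
curl -sf -X POST http://${host_ip}:6007/v1/dataprep -F "files=@./sample-doc.pdf"

# 2. Query ChatQnA and fail if TGI answers with a validation error
if curl -s http://${host_ip}:8888/v1/chatqna \
    -H "Content-Type: application/json" \
    -d '{"messages": "What is the revenue of Nike in 2023?"}' \
    | grep -q '"error_type":"validation"'; then
    echo "ChatQnA returned a TGI validation error" >&2
    exit 1
fi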
