
Issue with Concurrent Query Processing and Document Upload #1848

Open

llmwesee opened this issue Sep 18, 2024 · 9 comments

@llmwesee

I have implemented a solution using vLLM on an A100 server to support multiple users. However, I have encountered an issue:

While one user's query is being processed, other users are unable to upload documents into the UserData or MyData collections. The document upload process gets stuck at the processing stage without any errors appearing in the terminal or UI. Additionally, the document is not uploaded successfully.

Can you suggest ways to decouple the query processing, document upload, and user interface programs so they can run independently of each other?

Alternatively, can we build (or use prebuilt) separate APIs to manage these programs in the backend?
Please share suggestions or potential solutions.

@pseudotensor
Collaborator

pseudotensor commented Sep 20, 2024

They should all be independent unless you changed CONCURRENCY_COUNT to 1. This is tested regularly; the backend has no issues with this at all.

@pseudotensor
Collaborator

Once you have that working, I can explain how to make it even more efficient using the function_server.

@llmwesee
Author

This is the command for running h2ogpt with login:

    python generate.py --base_model=meta-llama/Meta-Llama-3.1-8B-Instruct --score_model=None --langchain_mode='UserData' --user_path=user_path --auth='' --use_auth_token=True --visible_visible_models=False --max_seq_len=8192 --max_max_new_tokens=4096 --max_new_tokens=4096 --min_new_tokens=256

Can you show me some examples of running h2ogpt as a full backend server, with everything from query processing to document uploading working for multiple users concurrently and independently? I want to integrate its backend with React or Next.js as the frontend, with the same full functionality as h2ogpt's UI, and a datalake for all document-related storage.
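(For context, not an official recipe, just a sketch of the kind of decoupling being asked about: h2ogpt's Gradio server exposes an HTTP API, so a separate frontend or middleware service can call the backend directly. This assumes the standard gradio_client package and the /submit_nochat_api endpoint; the payload keys below, instruction_nochat and langchain_mode, are the commonly documented ones, so verify them against your h2ogpt version's client docs.)

    import ast

    from gradio_client import Client

    # Point at the running h2ogpt Gradio server (default port 7860).
    client = Client("http://localhost:7860")

    # h2ogpt's nochat API takes a single str(dict) payload and returns a
    # str(dict) result; the keys here are assumptions per the note above.
    kwargs = dict(instruction_nochat="What is in my documents?",
                  langchain_mode="UserData")
    res = client.predict(str(kwargs), api_name="/submit_nochat_api")
    print(ast.literal_eval(res)["response"])

A React or Next.js frontend would then talk to a thin service wrapping calls like this (or use Gradio's JS client) rather than to the h2ogpt UI itself.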

@pseudotensor
Collaborator

I guess I'd need to ask how you see things being blocked. E.g., if you have pytest test code that shows how things block each other (say, a long add of a doc blocking chat in another test run with -n 2), or you show a video of the UI and what you are doing, I can mimic it and see if I can reproduce what you are seeing.

@pseudotensor
Collaborator

As for the function server, you can try it. Just add to the CLI:

 --function_server=True --function_server_workers=5 --multiple_workers_gunicorn=True --function_server_port=5002 --function_api_key=API_KEY

@llmwesee
Author

llmwesee commented Oct 4, 2024

The function server has an issue when hit through upload_api and add_file_api:

Traceback (most recent call last):
  File "/home/abc/Documents/xxxx/xxxx/src/gpt_langchain.py", line 9383, in update_user_db
    return _update_user_db(file, db1s=db1s,
  File "/home/xxxx/src/gpt_langchain.py", line 9664, in _update_user_db
    sources = call_function_server('0.0.0.0', function_server_port, 'path_to_docs', (file,), simple_kwargs,
  File "/home/xxxx/src/function_client.py", line 50, in call_function_server
    execute_result = execute_function_on_server(host, port, function_name, args, kwargs, use_disk, use_pickle,
  File "/home/xxxx/src/function_client.py", line 21, in execute_function_on_server
    response = requests.post(url, json=payload, headers=headers)
  File "/home/xxxx/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/xxxx/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/xxxx/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/xxxx/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/xxxx/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=5002): Max retries exceeded with url: /execute_function/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1deb5867a0>: Failed to establish a new connection: [Errno 111] Connection refused'))

@pseudotensor
Collaborator

It just looks like the function server isn't even up. Perhaps you have something else on that port, etc. Check the startup logs.
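As a quick sanity check (a minimal sketch; host and port taken from the traceback above, adjust to your setup), you can test whether anything is listening on the function server port at all:

    import socket

    # connect_ex returns 0 if the connection succeeds, an errno otherwise
    # (e.g. 111 ECONNREFUSED, matching the traceback above).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2)
        result = s.connect_ex(("127.0.0.1", 5002))

    print("port 5002 open" if result == 0 else f"not reachable (errno {result})")

If this reports the port as unreachable right after startup, the gunicorn workers for the function server never came up, and the startup logs should say why.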

@llmwesee
Author

llmwesee commented Oct 7, 2024

> They should all be independent unless you changed CONCURRENCY_COUNT to 1. This is tested regularly; the backend has no issues with this at all.

When setting the concurrency count to 64:

    python generate.py --base_model=meta-llama/Meta-Llama-3.1-8B-Instruct --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --visible_visible_models=False --max_seq_len=8192 --max_max_new_tokens=4096 --max_new_tokens=4096 --min_new_tokens=256 --api_open=True --allow_api=True --max_quality=True --function_server=True --function_server_workers=5 --multiple_workers_gunicorn=True --function_server_port=5002 --function_api_key=API_KEY --concurrency_count=64

the following error is raised:

  File "/home/xxxx/src/gen.py", line 1736, in main
    raise ValueError(
ValueError: Concurrency count > 1 will lead to mixup in cache use for local LLMs, disable this raise at own risk.

@pseudotensor
Collaborator

Correct. I recommend vLLM for handling concurrency well; transformers itself is not thread-safe.
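For reference, a sketch of the vLLM route (the command forms assume vLLM's OpenAI-compatible server entrypoint and h2ogpt's vllm: inference_server prefix; verify both against your versions' docs). Generation then happens inside vLLM, which batches concurrent requests, instead of in the locally loaded transformers model:

    # Serve the model with vLLM (port 5000 is a hypothetical choice)
    python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --port 5000

    # Point h2ogpt at it instead of loading the model locally
    python generate.py --inference_server=vllm:0.0.0.0:5000 --base_model=meta-llama/Meta-Llama-3.1-8B-Instruct --langchain_mode='UserData' --user_path=user_path --concurrency_count=64

Since the model is then no longer local, the cache-mixup check above should not be triggered.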
