[Bug]: Missing embeddings in collections after a system reboot #2905

Open
ymzayek opened this issue Oct 7, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@ymzayek

ymzayek commented Oct 7, 2024

What happened?

Hello, I'm working on a project where we use chromadb:0.5.11 as part of RAG pipelines. We have successfully used it to create collections and query them. We use our own embedder for the queries and chunks and do not rely on the Chroma embedding method. We have just had an issue where the embeddings in a collection seem to have been "deleted", or at least went missing, over the weekend after a reboot of the servers we work on. To be clear, query tests on the collections before the weekend and the system reboot showed that the embeddings were correctly added at creation time: retrieval returned the relevant documents as expected. Now the same tests return an empty list of documents.

I did some debugging by connecting to the Chroma server with the Chroma HTTP client to manually check the collections, and I see that the chunks and metadata still exist but the embeddings are empty. Has anyone seen this problem before? Any ideas about what could have happened, or whether it could be related to a system reboot? In the Chroma logs everything looks fine. I sometimes see DEBUG: Starting component PersistentLocalHnswSegment, but I am not sure that is related.
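For anyone who wants to reproduce the manual check described above, here is a minimal sketch of how the records with empty embeddings could be counted. It assumes a collection object that exposes a Chroma-style `get(include=..., limit=..., offset=...)` call returning a dict with `ids` and `embeddings`. The helper name `count_missing_embeddings` and the `page_size` parameter are made up for illustration and are not part of Chroma.

```python
from typing import Any, Dict


def count_missing_embeddings(collection: Any, page_size: int = 500) -> Dict[str, int]:
    """Page through a collection and count records whose embedding is absent or empty.

    `collection` is assumed to behave like a Chroma collection: `get()` accepts
    `include`, `limit` and `offset` and returns a dict containing "ids" and,
    when requested, "embeddings".
    """
    total = 0
    missing = 0
    offset = 0
    while True:
        batch = collection.get(include=["embeddings"], limit=page_size, offset=offset)
        ids = batch["ids"]
        if not ids:
            break
        embeddings = batch.get("embeddings")
        for i in range(len(ids)):
            # Treat a missing list, a short list, a None entry and an empty
            # vector all as "no embedding stored for this record".
            if embeddings is None or i >= len(embeddings):
                emb = None
            else:
                emb = embeddings[i]
            if emb is None or len(emb) == 0:
                missing += 1
        total += len(ids)
        offset += len(ids)
    return {"total": total, "missing": missing}
```

With chromadb installed, the collection could be obtained through `chromadb.HttpClient(host=..., port=...)` followed by `client.get_collection(name)`, where host, port and name are whatever your deployment uses.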

Possibly related to the following issues linked below but in our use case we do not delete any documents and then add new ones to the same collection. We just create one collection at a time which we then query without any further modifications in any of the data/documents in the collection.

#2512
#2062
#870

Versions

0.5.11

Relevant log output

No response

@ymzayek ymzayek added the bug Something isn't working label Oct 7, 2024
@HammadB
Collaborator

HammadB commented Oct 9, 2024

Hi @ymzayek, is it possible to share your db if it's not sensitive data? Also happy to take it via Discord/email if you don't want to post it here. That would help us debug on our end.

Otherwise, can we start by knowing what files are under /<chroma_path>/<collection_id>/?
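One way to collect that file listing is a small script like the sketch below. It only uses the Python standard library and makes no assumption about what Chroma stores in the directory. The function name `list_persist_dir` is illustrative, and the path you pass in is whatever your persistence directory is.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def list_persist_dir(root: str) -> Dict[str, List[Tuple[str, int]]]:
    """Map each subdirectory (relative to `root`) to its files and their sizes in bytes."""
    base = Path(root)
    listing: Dict[str, List[Tuple[str, int]]] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            # Files directly under `root` are grouped under the key ".".
            rel_dir = str(path.parent.relative_to(base))
            listing.setdefault(rel_dir, []).append((path.name, path.stat().st_size))
    return listing
```

Printing the result for the Chroma data directory shows which files exist per subdirectory and whether any of them are empty, which is the information requested above.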

@tazarov
Contributor

tazarov commented Oct 10, 2024

@ymzayek, have a look at this issue #2922. I think the symptoms you are experiencing are similar.

tazarov added a commit that referenced this issue Oct 10, 2024
Closes #2922
Closes #2912

It might be related to #2905

@ymzayek
Author

ymzayek commented Oct 11, 2024

@tazarov yes it seems to explain our problem so I think you can close this issue. Thanks!

@tazarov
Contributor

tazarov commented Oct 11, 2024

@ymzayek, excellent. However, you will have to recreate the missing embeddings. Let me know if you need help with that.
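Since the chunks and metadata are reported as still present, one possible way to recreate the embeddings is to read the documents back and re-embed them with your own embedder. The following is a rough sketch under several assumptions: `source` and `target` behave like Chroma collections (`get` with `include`/`limit`/`offset`, and `upsert` with `ids`, `embeddings`, `documents`, `metadatas`), every record has a document, and `embed_fn` stands in for your embedder. The helper name `reembed_collection` is hypothetical. This thread does not establish whether writing back into the same collection is enough or whether a fresh collection is needed, so `target` is left as a separate argument.

```python
from typing import Any, Callable, List, Sequence


def reembed_collection(
    source: Any,
    target: Any,
    embed_fn: Callable[[List[str]], Sequence[Sequence[float]]],
    page_size: int = 100,
) -> int:
    """Re-embed every document in `source` and upsert the result into `target`.

    Returns the number of records written. `embed_fn` is assumed to take a list
    of document strings and return one vector per document, in the same order.
    """
    written = 0
    offset = 0
    while True:
        batch = source.get(
            include=["documents", "metadatas"], limit=page_size, offset=offset
        )
        ids = batch["ids"]
        if not ids:
            break
        documents = batch["documents"]
        embeddings = embed_fn(documents)
        # Write ids, documents and metadata back unchanged, with fresh vectors.
        target.upsert(
            ids=ids,
            embeddings=embeddings,
            documents=documents,
            metadatas=batch.get("metadatas"),
        )
        written += len(ids)
        offset += len(ids)
    return written
```

Passing the same collection as both `source` and `target` would attempt an in-place repair, while passing a newly created collection rebuilds the data elsewhere and leaves the original untouched for inspection.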

@ymzayek
Author

ymzayek commented Oct 11, 2024

@tazarov yes we will handle that. Thanks a lot :)

@tazarov
Contributor

tazarov commented Oct 14, 2024

@ymzayek, can you confirm whether this solved your problem?

@ymzayek
Author

ymzayek commented Oct 17, 2024

Apologies for the delay. For the moment we haven't upgraded (after a downgrade to 0.5.5). Are there any compatibility issues between the latest version and 0.5.5? In any case, feel free to close this issue.

@tazarov
Contributor

tazarov commented Oct 20, 2024

@ymzayek, there have been a number of bugfixes and improvements introduced in 0.5.13+. One notable thing that was fixed in 0.5.7 was a connection pool leak, #2014 (a long-standing problem in Chroma). In terms of server compatibility, I don't think you'll face any major hurdles. There have been some minor API changes, e.g. #2880.
