All metadata attribute lost when going through dsp.primitives.search.py #1592

Open
bhomass opened this issue Oct 7, 2024 · 4 comments

@bhomass

bhomass commented Oct 7, 2024

I use a ChromaDB retriever. ChromaDB itself returns the entire node, with its content and all metadata attributes. But once the results pass through search.py, only long_text is kept and all metadata is lost.

That's a heavy loss. The metadata potentially contains a lot of information that the calling client is looking for.

Can this be re-architected so that the metadata attributes are returned when calling dspy.Retrieve()?
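
To make the loss concrete, here is a minimal, self-contained sketch. It is not DSPy or ChromaDB code: the `hits` data, the field names, and the `flatten_passages` / `keep_passages` helpers are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical stand-in for what a vector store might return:
# each hit carries its text plus arbitrary metadata.
hits = [
    {"long_text": "First chunk of text.", "metadatas": {"source": "a.md", "page": 1}},
    {"long_text": "Second chunk of text.", "metadatas": {"source": "b.md", "page": 4}},
]


def flatten_passages(passages):
    """Keep only the text of each passage; every other field is dropped."""
    return [p["long_text"] for p in passages]


def keep_passages(passages):
    """Return a shallow copy of each passage with text and metadata intact."""
    return [dict(p) for p in passages]


texts = flatten_passages(hits)
full = keep_passages(hits)

print(texts)                 # plain strings; the metadata is gone
print(full[0]["metadatas"])  # metadata is still reachable here
```

Once the passages have been reduced to strings, nothing downstream can recover the source or page, which is why the flattening step is where the information disappears.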

@okhat
Collaborator

okhat commented Oct 7, 2024

Great point. Tagging @arnavsinghvi11 who is re-architecting the retriever abstractions

@arnavsinghvi11
Collaborator

Hi @bhomass, would this be solved by passing with_metadata=True in the dspy.Retrieve forward pass?

The ChromaDBRM integration in DSPy does return the metadata so setting this should retrieve it, but let me know if it doesn't!

@bhomass
Author

bhomass commented Oct 13, 2024

Indeed, I see that option now. But I ran into a new error complaining that there is no reranker. The code appears to be illogical.
dsp/primitives/search.py:119
    if not dsp.settings.reranker:
        return retrieveRerankEnsemblewithMetadata(queries=queries,k=k)

Why is this forcing a call to retrieveRerankEnsemblewithMetadata when the caller is NOT supplying a reranker? Shouldn't this be
    if dsp.settings.reranker:
        return retrieveRerankEnsemblewithMetadata(queries=queries,k=k)
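
The difference between the two conditions can be sketched in isolation. Everything below is a hypothetical stand-in rather than the real dsp source: `settings`, `retrieve_with_rerank`, and `retrieve_plain` are invented names, and the sketch assumes the rerank path raises when no reranker is configured, which is what the error described above suggests.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the global settings; no reranker is configured.
settings = SimpleNamespace(reranker=None)


def retrieve_with_rerank(queries, k):
    # Assumption: a rerank path cannot work without a reranker.
    if settings.reranker is None:
        raise RuntimeError("No reranker is loaded.")
    return f"reranked top-{k} for {queries}"


def retrieve_plain(queries, k):
    return f"plain top-{k} for {queries}"


def dispatch_as_reported(queries, k):
    # Mirrors the quoted condition: rerank path when NO reranker is set.
    if not settings.reranker:
        return retrieve_with_rerank(queries, k)
    return retrieve_plain(queries, k)


def dispatch_as_proposed(queries, k):
    # The proposed change: rerank path only when a reranker IS set.
    if settings.reranker:
        return retrieve_with_rerank(queries, k)
    return retrieve_plain(queries, k)


try:
    dispatch_as_reported(["q"], k=3)
    reported_outcome = "ok"
except RuntimeError as exc:
    reported_outcome = f"error: {exc}"

proposed_outcome = dispatch_as_proposed(["q"], k=3)

print(reported_outcome)
print(proposed_outcome)
```

Under these assumptions, the reported condition sends a caller without a reranker straight into the path that requires one, while the proposed condition falls through to the plain retrieval path.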

@bhomass
Author

bhomass commented Oct 13, 2024

I removed the "not" keyword, and now I see the metadata returned with the chunk! That's what I wanted.

Thanks!

What should I do with the "not" keyword? Should it remain there? If so, do I artificially supply a non-null reranker just to get past the if statement?
