WIP: feat: Add ColBERT #37

Draft: wants to merge 2 commits into main

@bclavie commented Aug 12, 2024

Hey @Muennighoff!

This is just the indexing code for now (I'll add the rest tomorrow), but I'm opening the draft PR in case you want to take a look before the rest comes in!

Goal of the PR

Add support for ColBERT models, starting with Answer.AI's ColBERT-small served through an API that Answer.AI will host (discussed with @okhat, who's also okay with this being the first ColBERT representative), to see how multi-vector models of various sizes fare on this benchmark. The querying mechanism behind the API is very simple and lives at AnswerDotAI/mteb_arena_colbert_api.
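For reference, "multi-vector" here means that ColBERT keeps one embedding per token rather than a single embedding per passage. As defined in the ColBERT paper, a query q is scored against a document d by late interaction (MaxSim): each query token embedding E_{q_i} is matched to its most similar document token embedding E_{d_j}, and these maxima are summed:

$$
S_{q,d} = \sum_{i \in [|E_q|]} \max_{j \in [|E_d|]} E_{q_i} \cdot E_{d_j}^{\top}
$$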

Changes

  • The PR relies on an external API that hosts and queries the index and simply returns documents. It doesn't change the logic of any existing mechanism.
  • It adds the ColBERT indexing code for full reproducibility.
  • TODO: It adds the querying mechanism, which uses API calls to fetch the highest-scoring document for a given query.
  • TODO: It adds utilities to download the pre-built Wikipedia indexes so they can be queried locally.
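To illustrate what "fetch the highest-scoring document for a given query" means for a multi-vector model, here is a minimal brute-force sketch. It assumes per-token embeddings have already been computed for the query and for every document (ColBERT L2-normalises them, so dot products act as cosine similarities). Each document is scored with ColBERT's MaxSim rule (for each query token, take its best dot product against the document's tokens, then sum), and the top document is returned. The names `maxsim_score` and `top_document` are hypothetical. This is not the code in AnswerDotAI/mteb_arena_colbert_api, and a real ColBERT index would normally avoid exhaustively scoring every document.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    # Plain dot product of two equal-length embeddings.
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    # Hypothetical helper: for each query token embedding, keep its best match
    # among the document's token embeddings, then sum those maxima.
    # Raises ValueError if doc_tokens is empty.
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def top_document(
    query_tokens: Sequence[Vector], docs: List[Sequence[Vector]]
) -> Tuple[int, float]:
    # Hypothetical brute-force retrieval: score every document and return
    # (index, score) of the highest-scoring one.
    scores = [maxsim_score(query_tokens, doc) for doc in docs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Toy 2-d "token embeddings"; real ColBERT vectors are much larger.
    query = [[1.0, 0.0], [0.0, 1.0]]
    docs = [
        [[1.0, 0.0]],              # matches only the first query token
        [[0.6, 0.8], [1.0, 0.0]],  # has a good match for both query tokens
    ]
    print(top_document(query, docs))
```

With these toy embeddings the second document wins: it scores 1.0 + 0.8 because each query token finds a good match, while the first document scores 1.0 + 0.0.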

@bclavie marked this pull request as draft August 12, 2024 19:41