Possible class conflict between faiss-cpu and pymupdf #3689
Comments
This may also be a potential security vulnerability, depending on what is actually happening under the hood. For example, I could modify the pymupdf vector class to include malicious code in its data() function; the pymupdf proxy class would then be used inadvertently, and that code would run whenever the .data() method is called.
This may be because both Faiss and pymupdf are wrapped with SWIG.
I think we could use SWIG_TYPE_TABLE to make a unique type table for Faiss.
@hairuoguo could you try installing Faiss through conda? Thanks. Instructions: https://github.com/facebookresearch/faiss/blob/main/INSTALL.md
Will try this out when I have the time (next week or so), thanks.
@hairuoguo I faced the same issue while using fitz, but there is no issue when I use PDFplumber.
Summary
Hello,
I am currently using the ColBERT model, which uses faiss, for a work project. We had pymupdf installed in the same conda environment because we are trying to work with scanned documents as a data source.
ColBERT calls faiss's kmeans.train(), which led to an AssertionError on line 109 in vector_to_array.py (assert classname.endswith('Vector')). When I looked at the input to that function, it was a pymupdf proxy object rather than an instance of one of the expected "[dtype]Vector" classes defined in faiss.
This error disappeared after uninstalling pymupdf.
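To make the failing check concrete, here is a minimal sketch that does not import faiss or pymupdf. Only the assertion on the class name comes from the report above; the two classes and the helper name `check_vector_classname` are hypothetical stand-ins for a faiss "[dtype]Vector" object and for a proxy object that belongs to another package.

```python
class Float32Vector:
    """Hypothetical stand-in for a faiss-style '[dtype]Vector' class."""


class ForeignProxy:
    """Hypothetical stand-in for a proxy object owned by another package."""


def check_vector_classname(v):
    """Apply the same kind of name check as the assertion quoted above."""
    classname = v.__class__.__name__
    # This is the check that fails when an unexpected proxy type arrives.
    assert classname.endswith('Vector'), f"unexpected class: {classname}"
    return classname


if __name__ == "__main__":
    print(check_vector_classname(Float32Vector()))
    try:
        check_vector_classname(ForeignProxy())
    except AssertionError as exc:
        print("AssertionError:", exc)
```

The sketch only illustrates why an object of the wrong class trips the assertion; it says nothing about how the wrong class ends up being returned in the first place.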
Platform
OS: Ubuntu 20.04.5 LTS (in docker container)
Faiss version: faiss-cpu 1.8.0.post1
Installed from: pip
Faiss compilation options: default flags
Running on:
Interface:
Reproduction instructions
Install faiss-cpu and pymupdf in a conda environment using pip.
Import fitz (pymupdf) and attempt to train a faiss kmeans object.
OR
Install ColBERT from the ColBERT repo, following its instructions.
Install pymupdf.
Import fitz (pymupdf) in code that runs ColBERT's Indexer class.
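The first set of steps could be scripted roughly as follows. This is a sketch under assumptions, not a confirmed reproducer: the import order (fitz first), the random data, and the sizes `d`, `k`, and `n` are my own choices, and the function name `reproduce` is hypothetical. It returns a status string instead of crashing, and it skips cleanly when any of the packages is missing.

```python
import importlib.util


def reproduce(d=8, k=4, n=256):
    """Try the reported sequence and return a short status string."""
    # Skip cleanly when a package under test is not installed.
    for name in ("numpy", "faiss", "fitz"):
        if importlib.util.find_spec(name) is None:
            return f"skipped: {name} is not installed"

    try:
        import fitz  # noqa: F401  pymupdf; imported first (an assumption)
        import numpy as np
        import faiss
    except ImportError as exc:
        return f"skipped: import failed ({exc})"

    # Small random float32 dataset; the values themselves do not matter.
    rng = np.random.default_rng(0)
    x = rng.random((n, d), dtype=np.float32)

    kmeans = faiss.Kmeans(d, k, niter=5)
    try:
        kmeans.train(x)
    except AssertionError as exc:
        return f"reproduced: AssertionError {exc!r}"
    return "not reproduced: training completed"


if __name__ == "__main__":
    print(reproduce())
```

Running it in an environment with both packages installed via pip, and again after uninstalling pymupdf, should show whether the behaviour matches the report for a given pair of versions.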