
Mystic-Slice/smart-vid-index


Smart Video Index

Finds the most relevant moments in a video collection for your question.

Demo:

Demo.mp4

More Examples:

Examples.mp4

How does it work?

The app indexes each video, or more specifically its transcript, at two levels.

  • Video level - an LLM-generated summary of the video content is stored.
  • Segment level - the transcript is split into 1-minute segments, and each segment is stored separately.
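As a rough illustration of the segment level, here is a minimal sketch of splitting a timestamped transcript into 1-minute windows. The input shape (a list of `(start_seconds, text)` pairs) and the function name `segment_transcript` are assumptions made for this example, not necessarily how the app stores its data.

```python
from collections import defaultdict


def segment_transcript(entries, window_seconds=60):
    """Group (start_seconds, text) transcript entries into fixed-length windows.

    Hypothetical helper: the entry format and the returned dict keys are
    assumptions for illustration only.
    """
    buckets = defaultdict(list)
    for start, text in entries:
        # Every entry goes into the window that contains its start time.
        buckets[int(start // window_seconds)].append(text)

    return [
        {
            "start": idx * window_seconds,
            "end": (idx + 1) * window_seconds,
            "text": " ".join(buckets[idx]),
        }
        for idx in sorted(buckets)
    ]


if __name__ == "__main__":
    transcript = [(0.0, "welcome back"), (30.5, "today we sort"), (61.0, "bubble sort swaps")]
    for segment in segment_transcript(transcript):
        print(segment)
```

Windows with no transcript entries are simply skipped in this sketch, so silent stretches of a video would not produce empty segments.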

The following happens when a question is asked:

  • The question is passed to the LLM to generate several variations of it (multi-query generation).
  • The generated queries are matched against the video level index to get the most relevant videos.
  • The generated queries are then matched against the segment level index, restricted to the videos selected in the previous step, to get the most relevant segments.
  • The video segments are then ranked using reciprocal rank fusion.
  • The top video segments are passed to the LLM along with the user query to build an answer.
  • The answer and the top video segments are displayed to the user.
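The ranking step above can be sketched as follows. Reciprocal rank fusion gives each segment a score of `1 / (k + rank)` for every query ranking it appears in and sums those scores. The constant `k = 60` is a commonly used default and an assumption here, as are the function name and the segment ids; the app's actual settings may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of segment ids into a single ranking.

    rankings: one list per generated query, best match first.
    k: smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, segment_id in enumerate(ranking, start=1):
            scores[segment_id] += 1.0 / (k + rank)

    # Highest fused score first; ties are broken by id to keep output stable.
    return sorted(scores, key=lambda sid: (-scores[sid], sid))


if __name__ == "__main__":
    # Hypothetical segment ids returned for three query variations.
    per_query_results = [
        ["seg_a", "seg_b", "seg_c"],
        ["seg_b", "seg_a", "seg_d"],
        ["seg_b", "seg_c", "seg_a"],
    ]
    print(reciprocal_rank_fusion(per_query_results))
```

Because only ranks are used, the fusion does not depend on the similarity scores from different queries being comparable with one another.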

Why two levels of indexing?

I added a video level index, instead of matching the query directly against the segment level index, because a question may contain context that does not appear in the relevant video segment yet is still needed to select the right segment.

For example, in the Examples video, the second-to-last question is "What caused Obito to despair?"

The name Obito is not mentioned anywhere in the relevant video segment, yet it is essential to answering the question correctly. The video level index restricts the search to videos that contain the context the question needs.
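The effect of the video level filter can be shown with a small sketch. It uses bag-of-words cosine similarity as a stand-in for whatever embedding search the app actually uses, and every name and sample text in it (`two_level_search`, the summaries, the segments) is invented for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector; a simple stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def two_level_search(query, video_summaries, segments, top_videos=1, top_segments=2):
    """Pick the best videos by summary first, then rank only their segments."""
    q = bow(query)
    ranked_videos = sorted(
        video_summaries,
        key=lambda vid: cosine(q, bow(video_summaries[vid])),
        reverse=True,
    )
    allowed = set(ranked_videos[:top_videos])

    # Segments from videos that failed the video level match are never scored.
    candidates = [s for s in segments if s["video_id"] in allowed]
    candidates.sort(key=lambda s: cosine(q, bow(s["text"])), reverse=True)
    return candidates[:top_segments]


if __name__ == "__main__":
    # Made-up sample data, not taken from the app's index.
    summaries = {
        "anime": "the story of obito and the events that led to his despair",
        "baking": "a guide to baking bread at home",
    }
    segs = [
        {"video_id": "anime", "start": 120, "text": "after that moment he lost all hope in the world"},
        {"video_id": "baking", "start": 0, "text": "what caused the dough to collapse"},
    ]
    print(two_level_search("what caused obito to despair", summaries, segs))
```

In this toy data the baking segment shares more words with the question than the correct segment does, so a segment-only search would rank it first. The video level step filters it out because only the anime summary mentions the name in the question.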

Where could this app be most useful?

It is most useful for searching a large video collection for specific moments. In its current form, I don't expect the app to retrieve well or generate good answers for general questions. It is best suited to specific questions that can be answered by a small segment of a video. Think of it as a finder tool more than a question answering system.

For example, if you are preparing for an algorithms exam or a job interview and want to quickly learn about bubble sort, ask the app to find the section in the video collection that explains bubble sort.

When is the retrieval the strongest?

Retrieval is strongest when the question is specific and contains key terms that appear in the video transcript.

When is the retrieval the weakest?

Retrieval is weakest when the question is general and does not contain key terms that appear in the video transcript. It also struggles when the answer is spread across multiple segments of a video.

Future extensions:

  • Improve segment retrieval
    • Make search more robust to queries that lack specificity.
    • Incorporate more metadata in the search.
  • Add a collections feature so users can organize videos into separate collections and query each one on its own.
  • Make the tool more conversational, using the chat history to improve or correct the answer.

Tech & Tools used to build this app:
