Merge branch 'main' into ko-rag_with_knowledge_graphs_neo4j

huggingface · Oct 21, 2024 · 5f2d56a
2 parents adfa5c5 + 688115c
commit 5f2d56a

Showing 27 changed files with 37,076 additions and 58 deletions.
diff --git a/notebooks/en/_toctree.yml b/notebooks/en/_toctree.yml
@@ -86,6 +86,8 @@
           title: Analyzing Artistic Styles with Multimodal Embeddings
         - local: faiss_with_hf_datasets_and_clip
           title: Embedding multimodal data for similarity search
+        - local: multimodal_rag_using_document_retrieval_and_vlms
+          title: Multimodal Retrieval-Augmented Generation (RAG) with Document Retrieval (ColPali) and Vision Language Models (VLMs)
 
     - title: Search Recipes
       isExpanded: false

diff --git a/notebooks/en/code_search.ipynb b/notebooks/en/code_search.ipynb
@@ -172,41 +172,39 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 3,
+      "execution_count": null,
       "metadata": {
         "colab": {
           "base_uri": "https://localhost:8080/"
         },
         "id": "nZEBXstzQQCk",
         "outputId": "81d723e6-08c2-4a25-8b8e-508c4a7e86b1"
       },
-      "outputs": [
-        {
-          "data": {
-            "text/plain": [
-              "{'name': 'InvertedIndexRam',\n",
-              " 'signature': '# [doc = \" Inverted flatten index from dimension id to posting list\"] # [derive (Debug , Clone , PartialEq)] pub struct InvertedIndexRam { # [doc = \" Posting lists for each dimension flattened (dimension id -> posting list)\"] # [doc = \" Gaps are filled with empty posting lists\"] pub postings : Vec < PostingList > , # [doc = \" Number of unique indexed vectors\"] # [doc = \" pre-computed on build and upsert to avoid having to traverse the posting lists.\"] pub vector_count : usize , }',\n",
-              " 'code_type': 'Struct',\n",
-              " 'docstring': '= \" Inverted flatten index from dimension id to posting list\"',\n",
-              " 'line': 15,\n",
-              " 'line_from': 13,\n",
-              " 'line_to': 22,\n",
-              " 'context': {'module': 'inverted_index',\n",
-              "  'file_path': 'lib/sparse/src/index/inverted_index/inverted_index_ram.rs',\n",
-              "  'file_name': 'inverted_index_ram.rs',\n",
-              "  'struct_name': None,\n",
-              "  'snippet': '/// Inverted flatten index from dimension id to posting list\\n#[derive(Debug, Clone, PartialEq)]\\npub struct InvertedIndexRam {\\n    /// Posting lists for each dimension flattened (dimension id -> posting list)\\n    /// Gaps are filled with empty posting lists\\n    pub postings: Vec<PostingList>,\\n    /// Number of unique indexed vectors\\n    /// pre-computed on build and upsert to avoid having to traverse the posting lists.\\n    pub vector_count: usize,\\n}\\n'}}"
-            ]
-          },
-          "execution_count": 3,
-          "metadata": {},
-          "output_type": "execute_result"
-        }
-      ],
+      "outputs": [],
       "source": [
         "structures[0]"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "```python\n",
+        "{'name': 'InvertedIndexRam',\n",
+        " 'signature': '# [doc = \" Inverted flatten index from dimension id to posting list\"] # [derive (Debug , Clone , PartialEq)] pub struct InvertedIndexRam { # [doc = \" Posting lists for each dimension flattened (dimension id -> posting list)\"] # [doc = \" Gaps are filled with empty posting lists\"] pub postings : Vec < PostingList > , # [doc = \" Number of unique indexed vectors\"] # [doc = \" pre-computed on build and upsert to avoid having to traverse the posting lists.\"] pub vector_count : usize , }',\n",
+        " 'code_type': 'Struct',\n",
+        " 'docstring': '= \" Inverted flatten index from dimension id to posting list\"',\n",
+        " 'line': 15,\n",
+        " 'line_from': 13,\n",
+        " 'line_to': 22,\n",
+        " 'context': {'module': 'inverted_index',\n",
+        "  'file_path': 'lib/sparse/src/index/inverted_index/inverted_index_ram.rs',\n",
+        "  'file_name': 'inverted_index_ram.rs',\n",
+        "  'struct_name': None,\n",
+        "  'snippet': '/// Inverted flatten index from dimension id to posting list\\n#[derive(Debug, Clone, PartialEq)]\\npub struct InvertedIndexRam {\\n    /// Posting lists for each dimension flattened (dimension id -> posting list)\\n    /// Gaps are filled with empty posting lists\\n    pub postings: Vec<PostingList>,\\n    /// Number of unique indexed vectors\\n    /// pre-computed on build and upsert to avoid having to traverse the posting lists.\\n    pub vector_count: usize,\\n}\\n'}}\n",
+        "  ```"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {
@@ -314,7 +312,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 7,
+      "execution_count": null,
       "metadata": {
         "colab": {
           "base_uri": "https://localhost:8080/",
@@ -323,25 +321,20 @@
         "id": "zosN7TC9QQCl",
         "outputId": "38e0d938-3fb1-4426-f00c-74a42267bf7d"
       },
-      "outputs": [
-        {
-          "data": {
-            "application/vnd.google.colaboratory.intrinsic+json": {
-              "type": "string"
-            },
-            "text/plain": [
-              "'Function Hnsw discover precision that does Checks discovery search precision when using hnsw index this is different from the tests in defined as Fn hnsw discover precision module integration file hnsw_discover_test rs'"
-            ]
-          },
-          "execution_count": 7,
-          "metadata": {},
-          "output_type": "execute_result"
-        }
-      ],
+      "outputs": [],
       "source": [
         "text_representations[1000]"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "```python\n",
+        "'Function Hnsw discover precision that does Checks discovery search precision when using hnsw index this is different from the tests in defined as Fn hnsw discover precision module integration file hnsw_discover_test rs'\n",
+        "```"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {

diff --git a/notebooks/en/index.md b/notebooks/en/index.md
@@ -23,6 +23,7 @@ Check out the recently added notebooks:
 - [Enhancing RAG Reasoning with Knowledge Graphs](rag_with_knowledge_graphs_neo4j)
 - [Fine-Tuning Object Detection on a Custom Dataset 🖼, Deployment in Spaces, and Gradio API Integration](fine_tuning_detr_custom_dataset)
 - [Fine-Tuning a Semantic Segmentation Model on a Custom Dataset and Usage via the Inference API](semantic_segmentation_fine_tuning_inference)
+- [Multimodal Retrieval-Augmented Generation (RAG) with Document Retrieval (ColPali) and Vision Language Models (VLMs)](multimodal_rag_using_document_retrieval_and_vlms)
 
 
 

diff --git a/notebooks/en/multimodal_rag_using_document_retrieval_and_vlms.ipynb b/notebooks/en/multimodal_rag_using_document_retrieval_and_vlms.ipynb
diff --git a/notebooks/ko/_toctree.yml b/notebooks/ko/_toctree.yml
@@ -9,5 +9,7 @@
       sections:
         - local: advanced_ko_rag
           title: 한국어로 Advanced RAG 구현하기 - Hugging Face와 LangChain을 활용한 Cookbook
+        - local: structured_generation
+          title: 구조화된 생성으로 근거 강조 표시가 있는 RAG 시스템 구축하기
         - local: ko_rag_with_knowledge_graphs_neo4j
-          title: 지식 그래프를 활용한 RAG 추론 향상
+          title: 지식 그래프를 활용한 RAG 추론 향상
diff --git a/notebooks/ko/index.md b/notebooks/ko/index.md
@@ -7,6 +7,7 @@
 최근 추가된 노트북을 살펴보세요:
 
 - [한국어 Advanced RAG 구현: Hugging Face와 LangChain 활용한 Cookbook](advanced_ko_rag)
+- [구조화된 생성으로 근거 강조 표시가 있는 RAG 시스템 구축하기](structured_generation)
 - [지식 그래프를 활용한 RAG 추론 향상](ko_rag_with_knowledge_graphs_neo4j)
 
 더 다양한 노트북을 확인하고 싶다면 Cookbook's [GitHub 리포지토리](https://github.com/huggingface/cookbook)에 방문해보세요.