diff --git a/docs-ua/community/rfcs/24-06-21-OPEA-001-DocSum_Video_Audio.md b/docs-ua/community/rfcs/24-06-21-OPEA-001-DocSum_Video_Audio.md
new file mode 100644
index 00000000..e6215dd9
--- /dev/null
+++ b/docs-ua/community/rfcs/24-06-21-OPEA-001-DocSum_Video_Audio.md
@@ -0,0 +1,107 @@
+# 24-06-21-OPEA-001-DocSum_Video_Audio
+RFC - Expanding Document Summary through Video and Audio
+
+## RFC Content
+
+### Author
+[Mustafa Cetin](https://github.com/MSCetin37)
+
+### Status
+Under Review
+
+### Objective
+This RFC aims to extend the current Document Summarization Application by incorporating video and audio summary features. This enhancement will enable the application to summarize video and audio content in addition to text documents, thereby broadening its utility and applicability.
+
+### Motivation
+The motivation for adding video and audio summary features stems from the increasing prevalence of multimedia content in various domains, including education, corporate training, marketing, and entertainment. Videos and audio recordings often contain valuable information that can be time-consuming to digest in their entirety. By summarizing video and audio content, users can quickly grasp the key points, saving time and improving productivity.
+
+Key motivations include:
+1. **Enhanced User Experience**: Users can quickly understand the essence of video and audio content without consuming the entire media.
+2. **Increased Efficiency**: Summarizing videos and audio can save time for professionals who need to review large amounts of multimedia content.
+3. **Broader Applicability**: Extending the application to handle video and audio content makes it more versatile and useful across different industries.
+4. **Competitive Advantage**: Offering video and audio summarization can differentiate the application from other text-only summarization tools.
+
+### Design Proposal
+
+#### Workflow of the Deployed Document Summarization Service
+The workflow of the Document Summarization Service, from the user's input query to the application's output response, is as follows:
+
+```mermaid
+flowchart LR
+ subgraph DocSum
+ direction LR
+ A[User] <--> |Input query| B[DocSum Gateway]
+ B <--> |Post| Megaservice
+ subgraph Megaservice["Megaservice"]
+ direction TB
+    C([Microservice - Video-to-AudioDoc : Will be implemented]) -. Post .-> D([Microservice - Audio-to-Text Transcription : opea/asr <br> 9099]) -. Post .-> E([Microservice : llm-docsum-tgi <br> 9000]) -. Post .-> F{{TGI Service <br> 8008}}
+ end
+ Megaservice --> |Output| G[Response]
+ end
+ subgraph Legend
+ X([Microservice])
+ Y{{Service from industry peers}}
+ Z[Gateway]
+ end
+```
+
+The proposed design for the video and audio summary features involves the following components:
+
+#### 1. DocSum Gateway:
+- **User Interface**: Update the user interface to allow uploading video and audio files in various formats for summarization alongside text documents.
+
+#### 2. Text Transcription, Video, and Audio Ingestion and Preprocessing:
+- **Audio Extraction Microservice**: Extract audio from video files for transcription.
+
+ Signature of audio extraction microservice:
+ ```python
+ @traceable(run_type="tool")
+ @register_statistics(names=["opea_service@audio_extraction"])
+ def audio_extraction(input: VideoDoc) -> AudioDoc:
+ ```
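+
+  As a hypothetical illustration (not the final implementation), the body of this microservice could shell out to `ffmpeg` to demux the audio track before wrapping the result in an `AudioDoc`. The helper below is a minimal sketch; the use of `ffmpeg`, the 16 kHz mono WAV format, and the temporary output path are assumptions:
+
+  ```python
+  import subprocess
+  import uuid
+
+
+  def extract_audio_track(video_path: str) -> str:
+      """Extract the audio track of a local video file into a WAV file using ffmpeg."""
+      audio_path = f"/tmp/{uuid.uuid4().hex}.wav"
+      # -vn drops the video stream; 16 kHz mono PCM is a common input format for ASR models.
+      subprocess.run(
+          ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
+          check=True,
+      )
+      return audio_path
+  ```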
+- **Audio-to-Text Transcription**: Use the Audio-Speech-Recognition microservice from OPEA, which generates a transcript for input audio using an audio-to-text model (Whisper).
+
+ Transcript generation microservice:
+ - opea/whisper:latest
+ - opea/asr:latest
+
+- **Text Transcription**: Plain text input requires no additional preprocessing; existing text summarization techniques apply directly.
+
+#### 3. Summarization:
+- **Text Summarization**: Apply existing text summarization techniques to the generated transcripts.
+- **Audio Summarization**: Extract the transcription from the audio, then apply the text summarization steps.
+- **Visual Summarization**: Extract the AudioDoc from the video, obtain the transcription via the audio summarization path, then apply the text summarization steps.
+
+#### 4. Integration and Output:
+- **Summary Generation**: Combine text, audio, and visual summaries to create comprehensive document summaries from different document formats.
+
+### Use-case Stories
+
+#### 1. Corporate Training:
+**Scenario**: A company conducts regular training sessions and records them as videos. Employees need to review these training videos to stay updated.
+
+**Solution**: The video summary feature can generate concise summaries of training videos, highlighting key points and important segments. Employees can quickly review the summaries to understand the training content without watching the entire video.
+
+#### 2. Educational Content:
+**Scenario**: An online education platform offers video lectures on various subjects. Students often need to review these lectures for exams.
+
+**Solution**: The video summary feature can create summaries of video lectures, providing students with a quick overview of the main topics covered. This helps students revise efficiently and focus on important concepts.
+
+#### 3. Marketing and Advertising:
+**Scenario**: A marketing team produces promotional videos for their products. They need to analyze the effectiveness of these videos.
+
+**Solution**: The video summary feature can generate summaries of promotional videos, highlighting key messages and visual elements. The marketing team can use these summaries to evaluate the impact of their videos and make data-driven decisions.
+
+#### 4. Research and Development:
+**Scenario**: Researchers record their experiments and presentations as videos. They need to document and share their findings with colleagues.
+
+**Solution**: The video summary feature can create summaries of research videos, capturing essential information and visual data. Researchers can share these summaries with their peers, facilitating knowledge sharing and collaboration.
+
+#### 5. Podcast and Audio Content:
+**Scenario**: A company produces a series of educational podcasts. Employees need to review these podcasts to stay informed about industry trends and best practices.
+
+**Solution**: The audio summary feature can generate concise summaries of podcast episodes, highlighting key points and important segments. Employees can quickly review the summaries to understand the podcast content without listening to the entire episode.
+
+By implementing the video and audio summary features, the Document Summarization Application will become a more powerful and versatile tool, capable of handling both text and multimedia content. This enhancement will significantly improve user experience, efficiency, and applicability across various domains.
+
+
diff --git a/docs-ua/community/rfcs/24-07-11-OPEA-Agent.md b/docs-ua/community/rfcs/24-07-11-OPEA-Agent.md
new file mode 100644
index 00000000..9c8a4571
--- /dev/null
+++ b/docs-ua/community/rfcs/24-07-11-OPEA-Agent.md
@@ -0,0 +1,278 @@
+## Status
+
+v0.1 team sharing completed (07/10/24)
+
+## Objective
+
+This RFC introduces the new concept of a "Hierarchical Agent," which includes two parts.
+
+* 'Agent': An agent is a framework that integrates the reasoning capabilities of large language models (LLMs) with the ability to take actionable steps, creating a more sophisticated system that can understand and process information, evaluate situations, take appropriate actions, communicate responses, track ongoing situations, and finally produce a result that meets the defined goals.
+
+Single Agent Example:
+
+ ![image](https://github.com/xuechendi/docs/assets/4355494/41a40edc-df73-4e3d-8b0a-c206724cc881)
+
+ Behind the scenes:
+
+ ![image](https://github.com/xuechendi/docs/assets/4355494/02232f5b-8034-44f9-a10c-545a13ec5e40)
+
+
+* 'Multi Agent' system: A multi-agent system is a design that leverages hierarchical agent teams to complete sub-tasks through individual agent working groups. Benefits of the multi-agent design: (1) Grouping tools/responsibilities gives better results; an agent is more likely to succeed on a focused task than if it must select from dozens of tools. (2) Each agent has its own assets, including prompt, LLM model, planning strategy, and toolset. (3) Users can use YAML files or a few lines of Python to build a 'Hierarchical Multi Agent' megaservice by cherry-picking ready-to-use individual agents. (4) For small tasks that a single agent can perform well, users can directly use the 'Agent' microservice with simple resource management.
+
+Multi Agent example:
+
+```
+curl ${ip_addr}:${SUPERVISOR_AGENT_PORT}/v1/chat/completions -X POST \
+-d '{"input": "Generate an analyst stock recommendation by taking an average of all analyst recommendations and classifying them as Strong Buy, Buy, Hold, Underperform or Sell."}'
+```
+![image](https://github.com/xuechendi/docs/assets/4355494/d96b5e26-95a5-4611-9a32-a546eaa324a4)
+
+## Motivation
+
+This RFC aims to provide low-code / no-code agents as a new microservice / megaservice for enterprise users who want to use their own tools with LLMs. Tools include domain_specific_search, knowledgebase_retrieval, enterprise_service_api_authorization_required, proprietary_tools, etc.
+
+## Persona
+
+We use the following terms to define the different personas mentioned in this document.
+
+* OPEA developer: OPEA developers are those who follow the current OPEA API SPEC, or expand the OPEA API SPEC to add new solutions. OPEA developers are expected to use this RFC to understand how this microservice communicates with other microservices and how it is chained in the megaflow. OPEA developers write the OPEA agent code and add new agent implementations by extending the current agent library with advanced agent strategies.
+
+* Enterprise User (DevOps): DevOps users are those who follow the OPEA YAML configuration format to update settings according to their real needs, or tune parts of the configuration for better performance, and who use the updated configuration to launch all microservices and obtain a functional endpoint and API call. DevOps users are expected to use this RFC to understand the keywords, how they work, and the rules for using this microservice. They are also expected to follow the custom tool template to provide their own tools and register them with the Agent microservice.
+
+* End user: End users are those who write applications that use the OPEA exposed endpoints and APIs to fulfill task goals. End users are expected to use this RFC to understand the API keywords and rules.
+
+
+## Design Proposal
+ ### Execution Plan
+ v0.8 (PR ready or merged to the OPEA agent branch)
+ * Agent component v0.1
+ * Support chat-completion API
+ * Agent example - Insight Assistant v0.1 (IT demo)
+ * hierarchical multi agents
+ * includes: research(rag, data_crawler); writer(format); reviewer(rule)
+ * Agent debug system
+
+V0.9
+* Agent component v0.1
+ * Support assistants API
+ * K8s helm chart
+* Agent Example - Insight Assistant v0.1
+ * Shared demo with IT
+ * Establish IT collaboration effort
+
+V1.0
+* Performance benchmark
+* Scaling
+* Concurrency
+
+ ### Part 1. API SPEC
+ Provide two types of APIs for different client applications.
+ 1. OpenAI chat completions API.
+ > Reference: https://platform.openai.com/docs/api-reference/chat/create
+
+    Advantages and limitations:
+    * The most common API; it should work with any existing client that uses the OpenAI API.
+    * It cannot memorize a user's historical session, so the human-in-the-loop agent will not work with this API.
+
+ ```
+ "/v1/chat/completions": {
+ "model": str,
+ "messages": list,
+ "tools": list,
+ }
+ ```
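+
+ As a minimal, hypothetical request against this endpoint (the service address, model name, and empty tool list below are placeholders, not part of the spec):
+
+ ```python
+ import requests
+
+ payload = {
+     "model": "meta-llama/Meta-Llama-3-8B-Instruct",   # placeholder model id
+     "messages": [{"role": "user", "content": "What is OPEA?"}],
+     "tools": [],                                       # optionally pass tool definitions, per the spec above
+ }
+ resp = requests.post("http://localhost:9090/v1/chat/completions", json=payload)
+ print(resp.json())
+ ```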
+
+ 2. OpenAI assistants API
+ > Reference: https://platform.openai.com/docs/api-reference/assistants
+
+    Advantages and limitations:
+    * Users can create a session thread that memorizes previous conversations as long-term memory, and the human-in-the-loop agent works only with this API.
+    * The user's client application may need code changes to work with this new API.
+    * The OpenAI assistants API is tagged as 'beta' and is not yet stable.
+
+ ```
+ # The assistants API creates an agent runtime instance with a set of tools and optional additional instructions
+ - "/v1/assistants": {
+ "instructions": str,
+ "name": str,
+ "tools": list
+ }
+
+ # The threads API maintains a conversation session with one user. A thread can be resumed from a previous session and can track long-term memories.
+ - "/v1/threads/": {}  # an empty body is allowed
+
+
+ # The threads messages API adds task content to thread_1 (the thread created by the threads API)
+ - "/v1/threads/thread_1/messages": {
+ "role": str,
+ "content": str
+ }
+
+ # The threads runs API starts executing the agent thread
+
+ - "/v1/threads/thread_1/runs": {
+ 'assistant_id': str,
+ 'instructions': str,
+ }
+ ```
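+
+ As a hedged illustration of how a client could drive this flow, the sketch below exercises the four endpoints above with `requests`. The service address/port, the returned `id` fields, and the example payload values are assumptions, not part of the spec:
+
+ ```python
+ import requests
+
+ base = "http://localhost:9090"  # assumed agent microservice address
+
+ # 1. Create an assistant (agent runtime instance) with instructions and tools.
+ assistant = requests.post(f"{base}/v1/assistants", json={
+     "name": "stock_analyst",
+     "instructions": "Answer finance questions using the registered tools.",
+     "tools": [],
+ }).json()
+
+ # 2. Open a thread that keeps the long-term conversation memory.
+ thread = requests.post(f"{base}/v1/threads/", json={}).json()
+
+ # 3. Append the user's task content to the thread.
+ requests.post(f"{base}/v1/threads/{thread['id']}/messages", json={
+     "role": "user",
+     "content": "Summarize the latest analyst recommendations.",
+ })
+
+ # 4. Execute the thread with the assistant and print the run result.
+ run = requests.post(f"{base}/v1/threads/{thread['id']}/runs", json={
+     "assistant_id": assistant["id"],
+ }).json()
+ print(run)
+ ```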
+
+ ### Part 2. 'Agent' genAI Component definition
+
+ The 'Agent' GenAI component is regarded as the resource management unit in the "Agent" design. It is launched as one microservice and can be instantiated as 'Agent', 'Planner', or 'Executor' according to its configuration. Tools are registered to the 'Agent' microservice at launch or at runtime.
+
+ ![image](https://github.com/user-attachments/assets/38e83fa4-57d8-4146-9061-e5153472b5f4)
+
+ #### SPEC for any agent Role - agent, planner, executor
+ ```
+ "/v1/chat/completions": {
+ "model": str,
+ "messages": list,
+ "tools": list,
+ }
+ "/v1/assistants": {
+ "instructions": str,
+ "name": str,
+ "tools": list
+ }
+ "/v1/threads/": {}
+ "/v1/threads/thread_1/runs": {
+ 'assistant_id': str,
+ 'instructions': str,
+ }
+ "/v1/threads/thread_1/messages": {
+ "role": str,
+ "content": str
+ }
+ ```
+
+ #### Agent Role microservice definition - 'Agent':
+ A complete implementation of an agent, which contains an LLM endpoint as the planner, a strategy algorithm for plan execution, tools, and a database handler to keep track of historical state and conversation.
+
+ configuration:
+ ```
+ strategy: choices([react, planexec, humanInLoopPlanExec])
+ require_human_feedback: bool
+ llm_endpoint_url: str
+ llm_engine: choices([tgi, vllm, openai])
+ llm_model_id: str
+ recursion_limit: int
+ tools: file_path or dict
+
+ # Tools definition
+ [tool_name]:
+ description: str
+ callable_api: choices([http://xxxx, xxx.py:func_name])
+ env: str
+ pip_dependencies: str # sep by ,
+ args_schema:
+ query:
+ type: choices([int, str, bool])
+ description: str
+ return_output: str
+ ```
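+
+ For illustration only, a hypothetical `custom_tools.yaml` entry instantiating the template above could look like the following; the tool name, module path, and pip package are examples, not part of the spec:
+
+ ```
+ web_search:
+   description: Search the web for up-to-date information about a query.
+   callable_api: tools.py:search_web
+   env: ""
+   pip_dependencies: duckduckgo-search
+   args_schema:
+     query:
+       type: str
+       description: Search keywords produced by the planner.
+   return_output: search_results
+ ```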
+
+ #### Agent Role microservice definition - 'Planner':
+ An agent without tools. The planner only contains LLM endpoints as the planner plus certain strategies to produce an optimized plan.
+
+ configuration:
+ ```
+ strategy: choices([react, planexec, humanInLoopPlanExec])
+ require_human_feedback: bool
+ llm_endpoint_url: str
+ llm_engine: choices([tgi, vllm, openai])
+ llm_model_id: str
+ recursion_limit: int
+ require_human_feedback: bool
+ ```
+
+ #### Agent Role microservice definition - 'Executor':
+ A tool executor. The executor processes input with the registered tools.
+
+ Configuration:
+ ```
+ [tool_name]:
+ description: str
+ callable_api: choices([http://xxxx, xxx.py:func_name])
+ env: str
+ pip_dependencies: str # sep by ,
+ args_schema:
+ query:
+ type: choices([int, str, bool])
+ description: str
+ return_output: str
+ ```
+
+ > Any microservice that follows this spec can be registered as a role in the Part 3 graph-based design
+
+### Part 3. 'Multi Agent' system overview
+
+We plan to provide the multi-agent system in two phases.
+
+* Phase I: Hierarchical Multi Agents
+  1. In this design, only the top-layer agent is exposed to the OPEA mega flow, and only the 'Agent' microservice is used to compose the hierarchical multi-agent system.
+  2. Users only use YAML files to provide the tool configuration, high-level instruction text, and the hierarchical relationship between agents.
+  3. This design simplifies agent configuration: a simple YAML definition is enough to compose a multi-agent system that handles complex tasks.
+  > For detailed configuration, please refer to Part 3.1
+ ![image](https://github.com/user-attachments/assets/be3bef3a-a1c9-4059-a8a1-e8e52e0d6c16)
+
+
+* Phase II: Graph-Based Multi Agent
+  1. In this design, we provide users a new SDK to compose a graph-based multi-agent system, with conditional edges defining all strategic rules.
+  2. Enterprise users will be able to use Python code to wrap an 'agent', a 'planner', or tools as a 'Role' and add conditional edges between them for complex task agent design.
+  3. This design gives users enough flexibility to handle very complex tasks, and also to manage resources when certain tools run much slower than others.
+  > For detailed configuration, please refer to Part 3.2
+ ![image](https://github.com/user-attachments/assets/35b36f64-eaa1-4f05-b25e-b8bea013680d)
+
+#### Part 3.1 Hierarchical Multi Agents
+__Example 1__: ‘Single Agent megaservice’
+Only one agent is present in this configuration.
+![image](https://github.com/user-attachments/assets/2e716dd4-2923-4ebd-97bf-fe7a44161280)
+
+3 tools are registered to this agent through custom_tools.yaml
+![image](https://github.com/user-attachments/assets/5b523ff2-9193-4b0c-b606-4149fd3e8612)
+
+![image](https://github.com/user-attachments/assets/5ad3c2a9-dc50-472b-8352-041ae4b6a9c6)
+![image](https://github.com/user-attachments/assets/ec89e35b-8ccc-474b-9fb7-3ed7210acc10)
+
+__Example 2__: ‘Hierarchical Multi Agents’
+Three agents are present in this configuration: the 1st-layer supervisor agent is the gateway that interacts with the user, and it manages the 2nd-layer worker agents.
+
+![image](https://github.com/user-attachments/assets/a83b51e6-ee08-473f-b389-51df48f1054f)
+
+Users are expected to register the 2nd-layer worker agents to the 1st-layer supervisor agent through the supervisor_agent_custom_tools.yaml file.
+![image](https://github.com/user-attachments/assets/d07223e9-4290-4ea7-8416-0caa2540bce1)
+
+![image](https://github.com/user-attachments/assets/9cc3825f-c77f-4482-bf10-292c08235f3b)
+![image](https://github.com/user-attachments/assets/62bc9644-5308-4d4b-9784-a022dc26c37a)
+
+> Users can follow this pattern to add more layers:
+![image](https://github.com/user-attachments/assets/cc42fe97-4adf-44c9-a95a-c4bef8e26000)
+
+__Example 3__: ‘Multi Steps Agent megaservice’:
+
+Users can also chain agents into a multi-step megaservice; see audioAgent_megaservice.yaml:
+![image](https://github.com/user-attachments/assets/5fb18d75-9c08-4d7b-97f7-25d7227147dd)
+
+#### Part 3.2 Graph-Based Multi Agent
+In Phase II, we propose to provide a graph-based multi-agent system in which enterprise users can define edges and conditional edges between agent nodes, planner nodes, and tools for complex task agent design.
+
+![image](https://github.com/user-attachments/assets/7c07e651-43ed-4056-b20a-cd39f3f883ee)
+
+The user can build and launch the graph-based message group with a combination of a Docker image and a YAML file:
+![image](https://github.com/user-attachments/assets/5c84f728-ff87-45c9-8f09-ecd5428da454)
+
+The YAML file contains the basic config information for each single "Role" in the agent architecture. The user builds a MessageGroup to define the link connections and the data flow via "edges" and "conditional_edges". An "edge" means the output of the head_node is the input of the tail_node. A "conditional_edge" means a decision is made among the candidate tail_nodes based on the output of the head_node; the logic of this selection is defined by the state component "Should_Continue".
+![image](https://github.com/user-attachments/assets/55ecb718-b134-4546-9496-40ac3a427a7b)
+
+Appending agents/roles to a MessageGroup:
+define the role class, define the action of the role, add edges, and recompile the MessageGroup.
+![image](https://github.com/user-attachments/assets/65a3fc1d-89f3-4bb3-a078-75db91400c58)
+
+#### Part 4. Agent Debug System
+
+TBD
+
+#### Part 5. Benchmark
+
+TBD
+
diff --git a/docs-ua/community/rfcs/24-08-20-OPEA-001-AI_Gateway_API.md b/docs-ua/community/rfcs/24-08-20-OPEA-001-AI_Gateway_API.md
new file mode 100644
index 00000000..afc36425
--- /dev/null
+++ b/docs-ua/community/rfcs/24-08-20-OPEA-001-AI_Gateway_API.md
@@ -0,0 +1,107 @@
+## RFC Title
+
+AI Gateway API
+
+## RFC Content
+
+### Author
+
+[daixiang0](https://github.com/daixiang0), [zhixie](https://github.com/zhxie), [gyohuangxin](https://github.com/gyohuangxin), [Forrest-zhao](https://github.com/Forrest-zhao), [ruijin-intel](https://github.com/ruijin-intel)
+
+### Status
+
+Under Review
+
+### Objective
+
+Design the API for AI Gateway.
+
+### Motivation
+
+- Introduce a gateway to handle mTLS, traffic control, observability, and so on.
+- Introduce the AI Gateway API to leverage existing gateway solutions rather than implementing our own.
+
+### Design Proposal
+
+The AI Gateway sits in front of all microservices:
+
+```mermaid
+graph TD;
+    A(AI Gateway)-->Retrieval;
+ A-->Rerank;
+ A-->LLM;
+ A-->Guardrails;
+ A-->B(Any microservice);
+```
+
+#### API overview
+
+To make the most of current resources, we choose to follow the [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/), since it is the gateway API standard supported by mainstream gateway implementations.
+
+Since AI-specific features of the Kubernetes Gateway API are still [under discussion](https://docs.google.com/document/d/1FQN_hGhTNeoTgV5Jj16ialzaSiAxC0ozxH1D9ngCVew/edit), we design the AI Gateway API to include the following two parts:
+
+- **Kubernetes Gateway API** for the features it already supports (a standard routing sketch is shown below)
+- **Extension API** for all other features
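+
+As a sketch of the first part, standard L7 routing to an OPEA microservice can already be expressed with the upstream Kubernetes Gateway API; the route, gateway, and backend Service names and port below are illustrative assumptions:
+
+```yaml
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  name: chatqna-route            # illustrative route name
+spec:
+  parentRefs:
+    - name: ai-gateway           # the Gateway fronting the OPEA microservices
+  rules:
+    - matches:
+        - path:
+            type: PathPrefix
+            value: /v1/chatqna
+      backendRefs:
+        - name: chatqna-megaservice   # illustrative backend Service
+          port: 8888
+```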
+
+#### API workflow
+
+```mermaid
+graph LR;
+ A(Config using AI Gateway API)-->B(Convert to specific gateway API)
+```
+
+The AI Gateway is not a brand-new gateway implementation; it does only one thing: convert.
+
+#### Extension API
+
+```yaml
+apiVersion: extension.gateway.opea.dev/v1
+kind: Gateway
+metadata:
+  name: extension-example
+spec:
+ gatewayClassName: envoy
+ extensions:
+ - name: extension-1
+ config:
+ extension-1-config: aaa
+ - name: extension-2
+ config:
+ extension-2-config: bbb
+```
+
+- gatewayClassName: the specific gateway implementation
+- name: the name of the extension feature; multiple extensions are supported
+- config: the content of the extension config, following the specified gateway's API
+
+#### Extension API example
+
+```yaml
+apiVersion: extension.gateway.opea.dev/v1
+kind: Gateway
+metadata:
+  name: envoy-extension-example
+spec:
+ gatewayClassName: envoy
+ extensions:
+ - name: token-ratelimit
+ config:
+ name: envoy.filters.http.guardrails
+ typed_config:
+ "@type": type.googleapis.com/envoy.extensions.filters.http.guardrails.v3.Guardrails
+ inference:
+ runtime: envoy.inference_runtime.openvino
+ typed_config:
+ "@type": type.googleapis.com/envoy.extensions.inference_runtime.openvino.v3.OpenvinoConfig
+ backend: CPU
+ plugins:
+ - /usr/lib/libopenvino_tokenizers.so
+ model_path: /home/zhihao/envoy/.project/openvino/models/OTIS-Official-Spam-Model.xml
+ source: RESPONSE
+ action: ALLOW
+```
+
+**Guardrails** is an AI-specific feature. Here we use the Extension API to configure Envoy to run inference on CPU with the specified model and check responses.
+
+The config field follows the Envoy API.
diff --git a/docs-ua/community/rfcs/24-08-21-GenAIExample-002-Edge_Craft_RAG.md b/docs-ua/community/rfcs/24-08-21-GenAIExample-002-Edge_Craft_RAG.md
new file mode 100644
index 00000000..76e097f3
--- /dev/null
+++ b/docs-ua/community/rfcs/24-08-21-GenAIExample-002-Edge_Craft_RAG.md
@@ -0,0 +1,189 @@
+# Edge Craft RAG
+
+This RFC describes a tunable RAG solution for edge scenarios.
+
+## RFC Content
+
+### Author
+
+[myqi](https://github.com/myqi)
+
+### Status
+
+Under Review
+
+### Objective
+
+Edge industry users face obstacles in building an "out-of-the-box" RAG
+application that meets both quality and performance requirements. Total Cost of
+Ownership (TCO) and pipeline optimization techniques are the two main blockers
+in this process.
+
+#### Total Cost of Ownership
+
+The HW requirement of a typical edge use case is a single host with one of the
+following combinations:
+- Intel(R) Core(TM) Ultra Processor
+- Intel(R) Core(TM) Processor + Intel(R) Iris(R) Xe Graphics
+- Intel(R) Core(TM) Processor + Intel(R) Arc(TM) A-Series Graphics
+- Intel(R) Xeon(R) Processor + Intel(R) Arc(TM) A-Series Graphics
+
+These hardware options prevent edge users from running LLMs with large
+parameter sizes on-prem, as well as from building sophisticated RAG pipelines
+over their data. Thus, the RAG pipeline at the edge needs to be highly curated
+for the underlying hardware, with suitable models chosen accordingly.
+
+#### RAG Pipeline Optimization Techniques
+
+Tuning a RAG pipeline is a systematic problem. First, quality depends on the
+result of each stage in the pipeline as well as on the end-to-end outcome. Second,
+optimization can be a trade-off among metrics: it is difficult to decide that one
+answer is better than another if it is slightly less accurate numerically but
+more relevant to the query. Third, optimization techniques may not translate
+intuitively into metric improvements; e.g., recursive retrieval may or may not
+improve recall and context relevancy.
+
+### Motivation
+
+Edge Craft RAG (EC-RAG) is a customizable, tunable and production-ready
+Retrieval-Augmented Generation system for edge solutions. It is designed to
+curate the RAG pipeline to meet hardware requirements at the edge with guaranteed
+quality and performance.
+
+From a quality perspective, EC-RAG is tunable in the indexing, retrieving,
+reranking and generation stages for particular edge use cases. From a performance
+perspective, the pipeline is consolidated into a single service to eliminate the
+overhead of inter-service communication on a single host. Meanwhile, the inferencing
+stages like embedding, reranking and generation are optimized for Intel(R) Iris(R)
+Xe Graphics and Intel(R) Arc(TM) A-Series Graphics.
+
+### Design Proposal
+
+EC-RAG is composed of the following components:
+- UI for doc loading and interactive chatbot.
+- Gateway
+- Mega-service with a single micro-service for the tunable* EC-RAG pipeline.
+- LLM serving microservice optimized for Intel(R) Iris(R) Xe Graphics and Intel(R) Arc(TM) A-Series
+Graphics
+- VectorDB microservice optimized for Intel(R) Iris(R) Xe Graphics and/or Intel(R) Arc(TM) A-Series
+Graphics
+- Docker compose file to launch the UI, Mega/Micro-services
+
+> [!NOTE]
+> *Advanced tuning of EC-RAG will need a tool co-piloting with the pipeline, which will be described
+> in a separate doc.
+
+The diagram below illustrates the overall components of EC-RAG:
+![EC-RAG Diagram](Edge_Craft_RAG.png)
+
+The EC-RAG pipeline will expose 3 types of REST API endpoints:
+- **/v1/data** for indexing
+- **/v1/settings** for configuration
+- **/v1/chatqna** for inferencing
+
+#### /v1/data
+
+| Description | Action | Endpoint | Data Schema |
+| ------------- | ------ | ------------- | ------------------ |
+| Upload a file | POST | /v1/data | FastAPI.UploadFile |
+| List files | GET | /v1/data | |
+| Remove | DELETE | /v1/data/{id} | |
+
+#### /v1/settings/pipelines
+
+| Description | Action | Endpoint | Data Schema |
+| ------------------ | ------ | ----------------------------- | ------------------ |
+| Setup a pipeline | POST | /v1/settings/pipelines | Pipeline object |
+| Get/list pipelines | GET | /v1/settings/pipelines(/{id}) | Pipeline object(s) |
+| Update pipelines | PATCH | /v1/settings/pipelines/{id} | Pipeline object |
+| Remove a pipeline | DELETE | /v1/settings/pipelines/{id} | |
+
+#### /v1/settings/models
+
+| Description | Action | Endpoint | Data Schema |
+| --------------- | ------ | -------------------------- | --------------- |
+| Load models | POST | /v1/settings/models | Model object |
+| Get/list models | GET | /v1/settings/models(/{id}) | Model object(s) |
+| Update models | PATCH | /v1/settings/models/{id} | Model object |
+| Remove a model | DELETE | /v1/settings/models/{id} | |
+
+#### Pipeline configuration example
+
+```json
+{
+ "name": "rag_demo",
+ "node_parser" : {
+ "chunk_size": 250,
+ "chunk_overlap": 48,
+ "type": "simple"
+ },
+ "indexer": {
+ "type": "faiss_vector",
+ "model": {
+ "model_id": "BAAI/bge-small-en-v1.5",
+ "model_path": "./bge_ov_embedding",
+ "device": "auto"
+ }
+ },
+ "retriever": {
+ "type": "vectorsimilarity",
+ "top_k": 30
+ },
+ "postprocessors": [
+ {
+ "type": "reranker",
+ "rerank_top_n": 5,
+ "model": {
+ "model_id": "BAAI/bge-reranker-large",
+ "model_path": "./bge_ov_reranker",
+ "device": "auto"
+ }
+ }
+ ],
+ "generator": {
+ "model": {
+ "model_id": "qwen2-7b-instruct",
+ "model_path": "./qwen2-7b-instruct/INT4_compressed_weights",
+ "device": "auto"
+ },
+ "prompt_path" : "./data/default_prompt.txt"
+ },
+ "active": "True"
+}
+```
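+
+As a hedged usage sketch, the configuration above can be posted to the settings endpoint and a document can be uploaded for indexing as follows; the host, port, file names, and the multipart field name `file` are placeholder assumptions:
+
+```python
+import requests
+
+base = "http://localhost:16010"  # placeholder EC-RAG service address
+
+# Create the tunable pipeline from the JSON object shown above.
+with open("rag_demo_pipeline.json") as f:
+    resp = requests.post(f"{base}/v1/settings/pipelines", data=f.read(),
+                         headers={"Content-Type": "application/json"})
+print(resp.status_code)
+
+# Upload a document for indexing (FastAPI.UploadFile implies a multipart form).
+with open("./docs/user_manual.pdf", "rb") as doc:
+    requests.post(f"{base}/v1/data", files={"file": doc})
+
+# List the files known to the indexer.
+print(requests.get(f"{base}/v1/data").json())
+```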
+
+#### UI
+
+The EC-RAG UI is based on Gradio. Users are able to select the models as well as input
+parameters for different stages of the pipeline. The chat box is also integrated
+in the UI.
+
+EC-RAG UI - Model Configuration
+![EC-RAG UI Model Configuration](Edge_Craft_RAG_screenshot_1.png)
+
+EC-RAG UI - Chatbot with settings
+![EC-RAG UI Chatbot](Edge_Craft_RAG_screenshot_2.png)
+
+### Compatibility
+
+EC-RAG megaservice and microservice are compatible with the existing OPEA
+GenAIExamples and GenAIComps repos. The EC-RAG leverages the LLM microservice
+and the VectorDB microservice from GenAIComps.
+
+### Miscellaneous
+
+The EC-RAG will be developed in 2 phases.
+
+#### Phase 1
+
+The UI, gateway, and EC-RAG pipeline will be finished without a vector DB as the
+persistent store. Instead, FAISS will be used for vector search, keeping the
+vector store in memory.
+
+In this phase, the LLM inferencing will happen in the pipeline until the LLM
+serving microservice supports Intel(R) Iris(R) Xe Graphics and Intel(R) Arc(TM)
+A-Series Graphics.
+
+#### Phase 2
+
+The vector DB will be enabled in this phase as well as LLM inferencing on
+Intel(R) Iris(R) Xe Graphics and Intel(R) Arc(TM) A-Series Graphics.