From 5a3f9797ecf826e5486e114bf215266a83134775 Mon Sep 17 00:00:00 2001
From: digitallysavvy
Date: Sat, 8 Jun 2024 00:51:57 -0400
Subject: [PATCH 01/14] added guide

---
 docs/GUIDE.md | 343 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 343 insertions(+)
 create mode 100644 docs/GUIDE.md

diff --git a/docs/GUIDE.md b/docs/GUIDE.md
new file mode 100644
index 0000000..d151b35
--- /dev/null
+++ b/docs/GUIDE.md
@@ -0,0 +1,343 @@

# Add Realtime 3D Avatars to Live Video Streams

In today's rapidly evolving digital landscape, live streaming video is dominating real-time communication, and users now expect more immersive and customizable streaming options. Content creators are increasingly seeking creative new ways to stream themselves, giving rise to the demand for dynamic 3D avatars that mirror their movements and expressions.

Traditionally, real-time virtual avatars required complex motion-capture equipment and sophisticated software, putting them out of reach for everyday users and independent creators. Artificial intelligence has changed this status quo as well. With advancements in computer vision, it's now possible to run sophisticated AI algorithms on-device that accurately capture human facial gestures and translate them into digital form in real time.

In this walkthrough, we'll look at how to integrate 3D virtual avatars into your [Agora](https://www.agora.io) live streams using [MediaPipe](https://ai.google.dev/edge/mediapipe/solutions/guide) and 3D avatars from [ReadyPlayerMe](https://readyplayer.me/). Whether you're looking to enhance audience engagement or just add a fun, creative twist to your app's video calls and live broadcasts, this guide provides the steps to bring 3D virtual personas to life.

## Prerequisites
- [Node.js](https://nodejs.org)
- A developer account with [Agora](https://console.agora.io)
- A basic understanding of HTML/CSS/JS
- A basic understanding of ThreeJS
- A basic understanding of Agora - [Web QuickStart](https://medium.com/agora-io/a-simple-approach-to-building-a-group-video-chat-web-app-0b8a6cacfdfd)
- A code editor; I use [VSCode](https://code.visualstudio.com)
- A 3D avatar from [ReadyPlayerMe](https://readyplayer.me/)

## Agora + MediaPipe project
To keep this guide concise, I'm assuming you have some understanding of how to implement the Agora Video SDK in a web app. If you don't, check out my guide on [Building a Group Video Chat Web App](https://medium.com/agora-io/a-simple-approach-to-building-a-group-video-chat-web-app-0b8a6cacfdfd).

To get started, download the [demo project](https://github.com/digitallysavvy/agora-mediapipe-readyplayerme). With the code downloaded, navigate to the project folder in the terminal and use `npm` to install the node packages.

```bash
git clone git@github.com:digitallysavvy/agora-mediapipe-readyplayerme.git
cd agora-mediapipe-readyplayerme
npm i
```

## Core Structure (HTML)
Let's start with the HTML structure in [`index.html`](index.html). At the top of the `<body>` are the "call" UI elements: a container for the remote videos, a container for the local user with buttons for muting and unmuting the audio/video, and a button to leave the chat.

Aside from the call UI, there's an overlay screen that lets users input the URL to their avatar, along with a button to join the channel. Here's the markup, trimmed to the elements we'll reference from `main.js`:

```HTML
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Agora Live Video Demo</title>
  </head>
  <body>
    <div id="container">
      <!-- container for the remote users' videos -->
      <div id="remote-video-container"></div>
      <!-- local user's container: hosts the 3D avatar canvas plus the mic, camera, and leave buttons -->
      <div id="local-user-container"></div>
    </div>
    <!-- overlay form: avatar URL input and a button to join the channel -->
    <div id="overlay">
      <form id="join-channel-form">
        <input type="url" placeholder="Avatar URL" />
        <button type="submit">Join</button>
      </form>
    </div>
    <script type="module" src="/main.js"></script>
  </body>
</html>
```
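
The join flow in `main.js` (shown in the next sections) references two helpers, `showOverlayForm()` and `handleJoin()`. The demo project ships its own implementations, so treat the following as an illustrative sketch of how they could tie into the markup above; the `input[type="url"]` lookup assumes the simplified form shown here.

```javascript
// Minimal sketch of the overlay helpers referenced later in main.js

// Show or hide the overlay screen that holds the join form
const showOverlayForm = (show) => {
  const overlay = document.getElementById('overlay')
  overlay.style.display = show ? '' : 'none'
}

// Handle the join form submission: grab the avatar URL,
// hide the overlay, then continue with the join and 3D setup
const handleJoin = async (event) => {
  event.preventDefault()
  const glbURL = event.target.querySelector('input[type="url"]').value
  showOverlayForm(false)
  // ...join the Agora channel and load the avatar using glbURL (covered below)
}
```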

## Agora Client and data stores
In [`main.js`](/main.js) we create a new Agora client to use Agora's SDK, and we use `localMedia` to keep a reference to the audio, video, and canvas tracks along with their active states. We'll also declare `headRotation` and `blendShapes` to store the data we get back from MediaPipe's computer vision.

```javascript
// Create the Agora Client
const client = AgoraRTC.createClient({
  codec: 'vp9',
  mode: 'live',
  role: 'host'
})

// Local media tracks and their active states
const localMedia = {
  audio: {
    track: null,
    isActive: false
  },
  video: {
    track: null,
    isActive: false
  },
  canvas: {
    track: null,
    isActive: false
  },
}

// Container for the remote streams
let remoteUsers = {}

// Store data from the facial landmarks
let headRotation
let blendShapes
```

### DOMContentLoaded and Event Listeners
When the page loads, we'll add listeners for the Agora events, the media controls, and the form submission. With the listeners in place, we're ready to show the overlay form.

```javascript
// Wait for DOM to load
document.addEventListener('DOMContentLoaded', async () => {
  // Add the Agora Event Listeners
  addAgoraEventListeners()
  // Add listeners to local media buttons
  addLocalMediaControlListeners()
  // Get the join channel form & handle form submission
  const joinform = document.getElementById('join-channel-form')
  joinform.addEventListener('submit', handleJoin)
  // Show the overlay form
  showOverlayForm(true)
})
```

> NOTE: Make sure to add the client event listeners before joining the channel; otherwise some events may not get triggered as expected.

## 3D & Avatar Setup
When the user clicks the join button, initialize the ThreeJS scene and append its `<canvas>` to the `localUserContainer`. After the scene is created, load the avatar using the `glbURL` provided by the user. Once the 3D avatar is loaded, we'll traverse its scene graph and build an object containing all of its nodes. This gives us quick access to the `headMesh`.

There is a noticeable delay between initializing the scene and the moment the 3D avatar is loaded and ready for use. To let the user know the avatar is loading, it's good practice to display a loading animation and remove it once the 3D avatar is added to the scene.

```javascript
// Get the local-user container div
const localUserContainer = document.getElementById('local-user-container')

// Show a loading animation
const loadingDiv = document.createElement('div')
loadingDiv.classList.add('lds-ripple')
loadingDiv.append(document.createElement('div'))
localUserContainer.append(loadingDiv)

// Create the scene and append the canvas to localUserContainer
const { scene, camera, renderer } = await initScene(localUserContainer)

// Append URL parameters to the glb URL - load the ReadyPlayerMe avatar with morph targets
const rpmMorphTargetsURL = glbURL + '?morphTargets=ARKit&textureAtlas=1024'

let nodes

// Load the GLB with morph targets
const loader = new GLTFLoader()
loader.load(rpmMorphTargetsURL,
  async (gltf) => {
    const avatar = gltf.scene
    // Build a graph of the avatar's nodes
    nodes = await getGraph(avatar)
    const headMesh = nodes['Wolf3D_Avatar']
    // Adjust the avatar's position
    avatar.position.y = -1.65
    avatar.position.z = 1

    // Add the avatar to the scene
    scene.add(avatar)
    // Remove the loading spinner
    loadingDiv.remove()
  },
  (event) => {
    // Output loading progress details
    console.log(event)
  })
```

> Note: There is a noticeable delay when loading 3D avatars directly from ReadyPlayerMe.
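
The snippet above leans on two helpers from the demo project, `initScene()` and `getGraph()`, which aren't shown here. If you're building this from scratch, here's a minimal sketch of what they could look like, e.g. in a small helper module you import into `main.js`; the camera placement, lighting, and render loop are assumptions, so adapt them to your scene.

```javascript
import * as THREE from 'three'

// Minimal sketch: create a ThreeJS scene sized to the given container
// and append the renderer's <canvas> to it
export const initScene = async (container) => {
  const scene = new THREE.Scene()
  const camera = new THREE.PerspectiveCamera(30, container.clientWidth / container.clientHeight, 0.1, 100)
  camera.position.set(0, 0, 2)

  const renderer = new THREE.WebGLRenderer({ antialias: true, alpha: true })
  renderer.setSize(container.clientWidth, container.clientHeight)
  container.appendChild(renderer.domElement)

  // Simple lighting so the avatar isn't rendered black
  scene.add(new THREE.AmbientLight(0xffffff, 1))

  // Basic render loop; you may prefer to call renderer.render() from the prediction loop instead
  renderer.setAnimationLoop(() => renderer.render(scene, camera))

  return { scene, camera, renderer }
}

// Minimal sketch: flatten the avatar's scene graph into a name -> node map
// so meshes like 'Wolf3D_Avatar' can be looked up directly
export const getGraph = async (object3D) => {
  const nodes = {}
  object3D.traverse((node) => {
    nodes[node.name] = node
  })
  return nodes
}
```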
+ +## Init video element with Agora +We're going to use Agora to get camera access and create our video and audio tracks. We'll use the camera's video track as the the source for the video element. If you'd like a deeper explanation check out my guide on using [Agora with custom video elements](https://medium.com/agora-io/custom-video-elements-with-javascript-and-agora-web-sdk-3c70d5dc1e09). + +```javascript +// Init the local mic and camera +await initDevices('music_standard', '1080_3') +// Create video element +const video = document.createElement('video') +video.setAttribute('webkit-playsinline', 'webkit-playsinline'); +video.setAttribute('playsinline', 'playsinline'); +// Create a new MediaStream using camera track and set it the video's source object +video.srcObject = new MediaStream([localMedia.video.track.getMediaStreamTrack()]) +``` +## MediaPipe Setup +Before we can detect faces and facial gestures, we need to download the lastest WASM files for MediaPipe's computer vision and configure a `FaceLandmarker` task. In the face landmarks configuration, we'll set the `faceLandmarker` to ouput blendshape weights and facial transformations. These two settings are important because when we run the prediction loop we'll need access to that data. + +```javascript +// initialize MediaPipe vision task +const faceLandmarker = await initVision() + +// init MediaPipe vision +const initVision = async () => { + // load latest Vision WASM + const vision = await FilesetResolver.forVisionTasks('https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm') + // configure face landmark tracker + const faceLandmarker = await FaceLandmarker.createFromOptions( + vision, { + baseOptions: { + modelAssetPath: `https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task`, + }, + outputFaceBlendshapes: true, + outputFacialTransformationMatrixes: true, + runningMode: 'VIDEO' + }) + return faceLandmarker +} +``` + +### Computer Vision Prediction Loop +With the `faceLandmarker` and `