feat: Example for Integrating Realtime API into Sotopia. #234

Open
wants to merge 77 commits into
base: main
72b82c0
add aact as a dependency
ProKil Oct 4, 2024
ceaeddf
minimal demo example of running custom model
ProKil Oct 6, 2024
51cbb7e
devcontainer setup and example
ProKil Oct 6, 2024
cf9dcd6
remove default_bad_process_model to allow using custom model entirely
ProKil Oct 6, 2024
a07be86
improve the demo to show parallel execution
ProKil Oct 6, 2024
38d182d
CI: update tests trigger from pull request target to pull request
ProKil Oct 6, 2024
4620fb4
fix mypy errors
ProKil Oct 6, 2024
1975021
adding stubs to pyproject.toml
ProKil Oct 6, 2024
d059f0b
poetry lock
ProKil Oct 6, 2024
a7603b4
install all extras in the devcontainer start script
ProKil Oct 6, 2024
a4f4c02
add dev containers instruction
ProKil Oct 7, 2024
569ef11
migration to uv
ProKil Oct 7, 2024
b9fcf3d
update mypy
ProKil Oct 7, 2024
0eed5fd
Merge branch 'feature/migrate-to-uv' into feature/integrate-aact
ProKil Oct 7, 2024
312e052
Merge remote-tracking branch 'origin/main' into feature/migrate-to-uv
ProKil Oct 7, 2024
d8d4708
Update index.mdx
ProKil Oct 7, 2024
5054a97
update uv venv path in the devcontainer and contributor's guide
ProKil Oct 7, 2024
66ba0ae
Merge branch 'feature/migrate-to-uv' of github.com:sotopia-lab/sotopi…
ProKil Oct 7, 2024
7491e75
simple examples of using aact for multi-agent async communication
ProKil Oct 8, 2024
b4a4f67
Merge branch 'feature/migrate-to-uv' into feature/integrate-aact
ProKil Oct 8, 2024
b7ccf10
allowing agents' aact function to return None
ProKil Oct 8, 2024
992320a
import Self for 3.10
ProKil Oct 8, 2024
5828002
Merge remote-tracking branch 'origin/main' into feature/integrate-aact
ProKil Oct 8, 2024
5e3eb86
Create readme.md
ProKil Oct 8, 2024
2b4b00e
dockerfile
ProKil Oct 8, 2024
dba0829
record node log
ProKil Oct 8, 2024
ceebc9f
frequency -> interval
ProKil Oct 8, 2024
fd7ef39
docker compose (it works)
ProKil Oct 9, 2024
65c19a1
use published images to speed up
ProKil Oct 9, 2024
7cee92d
add ci test with docker
ProKil Oct 10, 2024
631c53c
use compose action github action
ProKil Oct 10, 2024
4d7b3e6
update docker compose file
ProKil Oct 10, 2024
b06c75e
update compose file path
ProKil Oct 10, 2024
cdbff03
use github-action-docker-compose-test-run
ProKil Oct 10, 2024
29426aa
remove unused port binding in docker-compose
ProKil Oct 10, 2024
278403a
add quotes to docker compose command
ProKil Oct 10, 2024
b18f76b
test run
ProKil Oct 10, 2024
53b1845
test run
ProKil Oct 10, 2024
3190af5
write test script in tests.sh
ProKil Oct 10, 2024
506237b
use docker compose
ProKil Oct 10, 2024
aa114fe
test run
ProKil Oct 10, 2024
6205623
--rm
ProKil Oct 10, 2024
c8cd931
./ -> .
ProKil Oct 10, 2024
8cf257b
test
ProKil Oct 10, 2024
d453d39
change to arm64
ProKil Oct 10, 2024
2865dd3
fix docker platform problem
ProKil Oct 10, 2024
9c78d12
change test os
ProKil Oct 10, 2024
65b314d
fix some build bugs
ProKil Oct 10, 2024
5810007
fix runner dir
ProKil Oct 10, 2024
6f7ba6f
fix a test case for sample
ProKil Oct 10, 2024
703147a
update cli test to test_install
ProKil Oct 10, 2024
1d1da9b
update test benchmark to improve coverage
ProKil Oct 10, 2024
836d122
remove unused and maintain structured output compatibility
ProKil Oct 10, 2024
1659b34
fix evaluator bug
ProKil Oct 10, 2024
60ba747
Merge branch 'feature/docker-compose' into feature/integrate-aact
ProKil Oct 10, 2024
06b2172
add a test script which contributors can run locally
ProKil Oct 10, 2024
2851036
Merge branch 'feature/docker-compose' into feature/integrate-aact
ProKil Oct 10, 2024
2b2c1ab
Merge remote-tracking branch 'origin/main' into feature/integrate-aact
ProKil Oct 11, 2024
37d3d9f
bump the version to 0.1.1
ProKil Oct 11, 2024
dfd861d
add langchain openai back
ProKil Oct 11, 2024
ee41bbb
add langchain openai in uv lock
ProKil Oct 11, 2024
125cc01
remove redundant cast
ProKil Oct 11, 2024
f3c27f4
add test case
ProKil Oct 11, 2024
d241513
test base agent
ProKil Oct 11, 2024
69821d6
more coverage for agent.py
ProKil Oct 12, 2024
3d581d3
add __init__ to sotopia.experimental
ProKil Oct 12, 2024
30d2d3d
chore: Add experimental page and agents documentation
ProKil Oct 12, 2024
2663f52
first example of realtime api
ProKil Oct 12, 2024
102a535
user input text and realtime replys audio
ProKil Oct 12, 2024
9bf0cc4
finally got audio in audio out right
ProKil Oct 13, 2024
5092508
agent-to-agent async conv simulation example
ProKil Oct 13, 2024
631b3ac
fix mypy and add more instructions in readme
ProKil Oct 13, 2024
112c765
fix contribution
XuhuiZhou Oct 14, 2024
a3d04fc
add install pyaudio to mypy workflow
ProKil Oct 14, 2024
9bc495f
Merge branch 'feature/realtime-api' of github.com:sotopia-lab/sotopia…
ProKil Oct 14, 2024
7c3b26d
accidentally changed test script
ProKil Oct 14, 2024
9adf0e6
Merge remote-tracking branch 'origin/main' into feature/realtime-api
ProKil Oct 15, 2024
6 changes: 4 additions & 2 deletions .github/workflows/mypy.yml
@@ -29,14 +29,16 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install Pyaudio
run: sudo apt-get install -y portaudio19-dev
- name: Display Python version
run: python -c "import sys; print(sys.version)"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install uv
uv sync --extra test --extra chat
- name: Type-checking package with mypy
run: |
# Run this mypy instance against our main package.
- uv run mypy --strict .
+ uv run --all-extras mypy --strict .
2 changes: 1 addition & 1 deletion docs/pages/contribution/contribution.md
@@ -133,7 +133,7 @@ Please refer to [Dev Containers](https://containers.dev/supporting#editors) to s

You can also set up the development environment without Dev Containers. There are three things you will need to set up manually:

- - Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extra`.
+ - Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extras`.
- Redis: Please refer to introduction page for the set up of Redis.
- Local LLM (optional): If you don't have access to model endpoints (e.g. OpenAI, Anthropic or others), you can use a local model. You can use Ollama, Llama.cpp, vLLM or many others which support OpenAI compatible endpoints.

101 changes: 101 additions & 0 deletions examples/experimental/realtime/audio_mixer.py
@@ -0,0 +1,101 @@
from typing import AsyncIterator, Literal
from aact import Message, Node, NodeFactory
from aact.messages import Tick, Audio
import numpy as np


def merge_audio_streams(
    streams: list[bytes], sample_width: Literal[1, 2, 4] = 2
) -> bytes:
    # Convert byte streams to numpy arrays of audio samples
    format_str = {1: "B", 2: "h", 4: "i"}[sample_width]
    stream_samples = [
        np.frombuffer(stream, dtype=np.dtype(format_str)) for stream in streams
    ]

    # Make sure all streams are the same length
    stream_length = 0
    for stream in stream_samples:
        assert stream_length == 0 or len(stream) == stream_length
        if not stream_length:
            stream_length = len(stream)

    # Mix audio by adding the samples and avoiding clipping
    # mixed_samples = (stream1_samples.astype(np.int32) + stream2_samples.astype(np.int32)) // 2

    mixed_samples = np.zeros(stream_length, dtype=np.int32)
    for stream in stream_samples:
        mixed_samples += stream
    mixed_samples //= len(stream_samples)

    # Clip the values to ensure they remain within valid range for the bit depth
    mixed_samples = np.clip(
        mixed_samples, np.iinfo(format_str).min, np.iinfo(format_str).max
    )

    # Convert back to byte stream
    return mixed_samples.astype(np.dtype(format_str)).tobytes()


@NodeFactory.register("audio_mixer")
class AudioMixerNode(Node[Tick | Audio, Audio]):
    def __init__(
        self,
        input_channels: list[str],
        tick_input_channel: str,
        output_channel: str,
        redis_url: str,
        buffer_size: int = 1024,
    ):
        super().__init__(
            input_channel_types=[(channel, Audio) for channel in input_channels]
            + [(tick_input_channel, Tick)],
            output_channel_types=[(output_channel, Audio)],
            redis_url=redis_url,
        )
        self.input_channels = input_channels
        self.tick_input_channel = tick_input_channel
        self.output_channel = output_channel
        self.buffers: dict[str, bytes] = {channel: b"" for channel in input_channels}
        self.overflow_buffers: dict[str, bytes] = {
            channel: b"" for channel in input_channels
        }
        self.buffer_size = buffer_size

    async def event_handler(
        self, channel: str, message: Message[Tick | Audio]
    ) -> AsyncIterator[tuple[str, Message[Audio]]]:
        if channel == self.tick_input_channel:
            output_buffers = []

            for audio_channel in self.input_channels:
                output_buffers.append(
                    self.buffers[audio_channel]
                    + b"\x00" * (self.buffer_size - len(self.buffers[audio_channel]))
                )
                self.buffers[audio_channel] = self.overflow_buffers[audio_channel][
                    : self.buffer_size
                ]
                self.overflow_buffers[audio_channel] = self.overflow_buffers[
                    audio_channel
                ][self.buffer_size :]
            output_buffer = merge_audio_streams(output_buffers)
            yield self.output_channel, Message[Audio](data=Audio(audio=output_buffer))

        elif channel in self.input_channels:
            assert isinstance(message.data, Audio)
            if len(self.buffers[channel]) == self.buffer_size:
                self.overflow_buffers[channel] += message.data.audio
            else:
                self.buffers[channel] += message.data.audio
                if len(self.buffers[channel]) >= self.buffer_size:
                    self.overflow_buffers[channel] = self.buffers[channel][
                        self.buffer_size :
                    ]
                    self.buffers[channel] = self.buffers[channel][: self.buffer_size]
        else:
            raise ValueError(f"Unexpected channel: {channel}")
            yield (
                self.output_channel,
                Message(data=Audio(audio=b"")),
            )  # Unreachable code
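
The mixing step in `merge_audio_streams` accumulates samples in a wider integer type and divides by the number of streams before casting back. Below is a minimal standalone sketch of that idea, assuming signed 16-bit PCM and equal-length inputs. The helper name `mix_int16` and the sample values are illustrative only and are not part of this PR.

```python
import numpy as np


def mix_int16(streams: list[bytes]) -> bytes:
    """Average equal-length 16-bit PCM byte streams (illustrative helper)."""
    # Interpret each byte string as signed 16-bit samples.
    samples = [np.frombuffer(s, dtype=np.int16) for s in streams]
    # Accumulate in int32 so that adding two loud samples cannot wrap around.
    total = np.zeros(len(samples[0]), dtype=np.int32)
    for s in samples:
        total += s
    # Floor-divide by the number of streams, mirroring the function above.
    mixed = total // len(samples)
    return mixed.astype(np.int16).tobytes()


loud = np.array([30000, -30000, 100], dtype=np.int16).tobytes()
quiet = np.array([30000, -10000, -100], dtype=np.int16).tobytes()
mixed = np.frombuffer(mix_int16([loud, quiet]), dtype=np.int16)
print(mixed.tolist())  # [30000, -20000, 0]
```

Because the result is an average rather than a sum, a stream mixed with a zero-padded (silent) buffer comes out at half its original amplitude. That is worth keeping in mind when one of the mixer's input channels is idle.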
35 changes: 35 additions & 0 deletions examples/experimental/realtime/input_node.py
@@ -0,0 +1,35 @@
import sys
from typing import AsyncIterator

from sotopia.agents.llm_agent import ainput

if sys.version_info < (3, 11):
    pass
else:
    pass
from aact import Message, Node, NodeFactory
from aact.messages import Text, Zero


@NodeFactory.register("input")
class InputNode(Node[Zero, Text]):
    def __init__(self, output_channel: str, redis_url: str) -> None:
        super().__init__(
            input_channel_types=[],
            output_channel_types=[(output_channel, Text)],
            redis_url=redis_url,
        )
        self.output_channel = output_channel

    async def event_loop(self) -> None:
        while True:
            text = await ainput("Enter text: ")
            await self.r.publish(
                self.output_channel,
                Message[Text](data=Text(text=text)).model_dump_json(),
            )

    async def event_handler(
        self, _: str, __: Message[Zero]
    ) -> AsyncIterator[tuple[str, Message[Text]]]:
        yield self.output_channel, Message[Text](data=Text(text=""))
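
`InputNode` has no input channels, so it overrides `event_loop` and publishes whatever `ainput` returns. One common way to read console input without blocking an asyncio event loop is to run the blocking call in a worker thread. The sketch below assumes that approach. `ainput_sketch` and `fake_reader` are hypothetical names, and this is not claimed to match how sotopia's `ainput` is implemented.

```python
import asyncio
import time
from typing import Callable


async def ainput_sketch(prompt: str, reader: Callable[[str], str] = input) -> str:
    """Hypothetical stand-in for an async console-input helper."""
    # Run the blocking reader in a worker thread so other coroutines keep running.
    return await asyncio.to_thread(reader, prompt)


def fake_reader(prompt: str) -> str:
    # Replaces the console so the sketch runs unattended; blocks briefly like a user typing.
    time.sleep(0.05)
    return "hello"


async def demo() -> list[str]:
    events: list[str] = []

    async def ticker() -> None:
        # Stands in for other work that must continue while input is pending.
        for i in range(3):
            events.append(f"tick {i}")
            await asyncio.sleep(0.01)

    text, _ = await asyncio.gather(ainput_sketch("Enter text: ", fake_reader), ticker())
    events.append(text)
    return events


print(asyncio.run(demo()))
```

With the default `reader=input`, the same coroutine would wait on the real console instead of the fake reader.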
17 changes: 17 additions & 0 deletions examples/experimental/realtime/readme.md
@@ -0,0 +1,17 @@
## Demo Realtime API

You need `portaudio` to run this demo:

```bash
# On Mac
brew install portaudio

# On Linux
apt-get install portaudio19-dev
```

Execute this command in the repo folder to run the example:

```bash
uv run --extra realtime aact run-dataflow examples/experimental/realtime/realtime_chat.toml
```
74 changes: 74 additions & 0 deletions examples/experimental/realtime/realtime_chat.toml
@@ -0,0 +1,74 @@
redis_url = "redis://localhost:6379/0"
extra_modules = [
"examples.experimental.realtime.realtime_websocket",
"examples.experimental.realtime.input_node",
"examples.experimental.realtime.audio_mixer"
]

[[nodes]]
node_name = "speaker"
node_class = "speaker"

[nodes.node_args]
rate = 24000
input_channel = "Eve"

[[nodes]]
node_name = "speaker2"
node_class = "speaker"

[nodes.node_args]
rate = 24000
input_channel = "Jane"

# [[nodes]]
# node_name = "listener"
# node_class = "listener"

# [nodes.node_args]
# rate = 24000
# output_channel = "audio_input"


[[nodes]]
node_name = "Eve"
node_class = "openai_realtime"

[nodes.node_args]
input_channel = "Jane_mixed"
output_channel = "Eve"
instruction = "Your name is Eve, you are talking to your friend Jane. You want to convince her to play poker with you tonight. Please start every sentence with \"Jane,\""


[[nodes]]
node_name = "Jane"
node_class = "openai_realtime"

[nodes.node_args]
input_channel = "Eve_mixed"
output_channel = "Jane"
instruction = "Your name is Jane, you are talking to your friend Eve. You want to convince her to play soccer with you tonight. Please let her say the first sentence. Please start every sentence with \"Eve,\""

[[nodes]]
node_name = "audio_mixer_Jane"
node_class = "audio_mixer"

[nodes.node_args]
input_channels = ["Jane"]
tick_input_channel = "tick/millis/20"
output_channel = "Jane_mixed"
buffer_size = 960

[[nodes]]
node_name = "audio_mixer_Eve"
node_class = "audio_mixer"

[nodes.node_args]
input_channels = ["Eve"]
tick_input_channel = "tick/millis/20"
output_channel = "Eve_mixed"
buffer_size = 960

[[nodes]]
node_name = "tick"
node_class = "tick"
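
The `buffer_size = 960` used by both mixer nodes is consistent with exactly one tick of audio under a few assumptions: that `tick/millis/20` fires every 20 ms, that samples are 16-bit (the default `sample_width=2` in `merge_audio_streams`), and that the audio is mono (the config does not state a channel count). A small sketch of the arithmetic, with `bytes_per_tick` as an illustrative helper that is not part of this PR:

```python
def bytes_per_tick(
    rate_hz: int, tick_ms: int, sample_width: int = 2, channels: int = 1
) -> int:
    """Bytes of PCM audio covering one tick interval (illustrative helper)."""
    # Samples per tick: 24000 Hz * 20 ms = 480 samples.
    samples = rate_hz * tick_ms // 1000
    # Each sample occupies sample_width bytes per channel.
    return samples * sample_width * channels


print(bytes_per_tick(24000, 20))  # 960
```

If the sample rate or tick interval in the config changes, `buffer_size` would need to change with it to keep one buffer equal to one tick of audio.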
@@ -0,0 +1,74 @@
redis_url = "redis://localhost:6379/0"
extra_modules = [
"examples.experimental.realtime.realtime_websocket",
"examples.experimental.realtime.input_node",
"examples.experimental.realtime.audio_mixer"
]

[[nodes]]
node_name = "speaker"
node_class = "speaker"

[nodes.node_args]
rate = 24000
input_channel = "Eve"

[[nodes]]
node_name = "speaker2"
node_class = "speaker"

[nodes.node_args]
rate = 24000
input_channel = "Jane"

[[nodes]]
node_name = "listener"
node_class = "listener"

[nodes.node_args]
rate = 24000
output_channel = "Jack"


[[nodes]]
node_name = "Eve"
node_class = "openai_realtime"

[nodes.node_args]
input_channel = "Jane_mixed"
output_channel = "Eve"
instruction = "Your name is Eve, you are talking to your friends Jane and Jack. You want to convince them to play poker with you tonight. Please start every sentence with \"Jane,\" or \"Jack,\""


[[nodes]]
node_name = "Jane"
node_class = "openai_realtime"

[nodes.node_args]
input_channel = "Eve_mixed"
output_channel = "Jane"
instruction = "Your name is Jane, you are talking to your friends Eve and Jack. You want to convince them to play soccer with you tonight. Please let him say the first sentence. Please start every sentence with \"Eve,\" or \"Jack,\""

[[nodes]]
node_name = "audio_mixer_Jane"
node_class = "audio_mixer"

[nodes.node_args]
input_channels = ["Jane", "Jack"]
tick_input_channel = "tick/millis/20"
output_channel = "Jane_mixed"
buffer_size = 960

[[nodes]]
node_name = "audio_mixer_Eve"
node_class = "audio_mixer"

[nodes.node_args]
input_channels = ["Eve", "Jack"]
tick_input_channel = "tick/millis/20"
output_channel = "Eve_mixed"
buffer_size = 960

[[nodes]]
node_name = "tick"
node_class = "tick"
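
The channel wiring in this second config can be hard to follow by eye, since each agent listens to a mixed channel that is named after the other agent. The sketch below copies the input and output channels from the config into plain Python data and traces which source channels reach each node. The helper names (`producers`, `heard_by`) are illustrative. The tick channel is left out because the config does not list the tick node's output channels.

```python
# Channel wiring copied from the config above (tick channel omitted).
nodes: dict[str, dict[str, list[str]]] = {
    "speaker": {"in": ["Eve"], "out": []},
    "speaker2": {"in": ["Jane"], "out": []},
    "listener": {"in": [], "out": ["Jack"]},
    "Eve": {"in": ["Jane_mixed"], "out": ["Eve"]},
    "Jane": {"in": ["Eve_mixed"], "out": ["Jane"]},
    "audio_mixer_Jane": {"in": ["Jane", "Jack"], "out": ["Jane_mixed"]},
    "audio_mixer_Eve": {"in": ["Eve", "Jack"], "out": ["Eve_mixed"]},
}


def producers(graph: dict[str, dict[str, list[str]]]) -> dict[str, str]:
    """Map each channel name to the node that publishes on it."""
    return {ch: name for name, io in graph.items() for ch in io["out"]}


def heard_by(graph: dict[str, dict[str, list[str]]], listener: str) -> list[str]:
    """Source channels reaching `listener`, looking one hop through any mixer."""
    made_by = producers(graph)
    heard: list[str] = []
    for ch in graph[listener]["in"]:
        upstream = made_by.get(ch)
        if upstream is not None and upstream.startswith("audio_mixer"):
            # A mixed channel carries everything the mixer subscribes to.
            heard.extend(graph[upstream]["in"])
        else:
            heard.append(ch)
    return sorted(heard)


print(heard_by(nodes, "Eve"))   # ['Jack', 'Jane']
print(heard_by(nodes, "Jane"))  # ['Eve', 'Jack']
```

Read this way, each agent hears the other agent plus the human `listener` channel, while each speaker node plays back a single agent's raw output.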