A proxy for the OpenAI chat completion API that keeps track of used tokens and the resulting costs. Metadata about each query is stored in a database (SQLite by default). Note that openai-api-proxy performs absolutely no authentication or authorization, so never expose it outside of a secure internal network.
This project uses Spring Boot Web to build the REST service. Spring Boot Data JPA, sqlite-jdbc, and hibernate-community-dialects are used for the SQLite database access. JTokkit is used to estimate the number of tokens in a query before the API call is forwarded.
For illustration purposes, we assume that openai-api-proxy runs on localhost port 8080.
The chat completion endpoint has (almost) the same specification as the OpenAI API. This makes it possible to use openai-api-proxy as a transparent proxy for applications. There are three notable differences:
- You don't have to set the `Authorization` header with the OpenAI API token. openai-api-proxy will inject it into the forwarded API calls.
- You must set the additional HTTP header `x-user`, because openai-api-proxy associates all queries with a user. Failing to do so will result in a 401 Unauthorized HTTP response.
- Tool calls are not supported.
Example usage:
curl -H "content-type: application/json" -H "x-user: adam" http://localhost:8080/v1/chat/completions -d '{"model":"gpt-4-1106-preview","messages":[{"role":"user","content":"I am using a proxy to talk to you."}]}'
openai-api-proxy needs to know pricing metadata about the models. Prices are given in dollars per 1,000 tokens, separately for query tokens and answer tokens. Since the pricing of a model may change over time, each pricing entry carries a `validFrom` timestamp.
Add pricing metadata for a specific model:
curl -H "content-type: application/json" http://localhost:8080/model/gpt-4-1106-preview -d '{"validFrom":"2023-11-06T00:00:00Z","per1KQueryTokens": 0.01,"per1KAnswerTokens": 0.03}'
Show a specific model:
curl http://localhost:8080/model/gpt-4-1106-preview
Example output:
{
"name": "gpt-4-1106-preview",
"costs": [
{
"validFrom": "2023-11-06T00:00:00Z",
"per1KQueryTokens": 0.01,
"per1KAnswerTokens": 0.03
}
]
}
Show all models:
curl http://localhost:8080/models
Returns a JSON list of model metadata (see above).
Get aggregated information on all queries:
curl http://localhost:8080/usage
Example output:
{
"numQueries": 649,
"queryTokens": 868292,
"answerTokens": 302305,
"cost": 17.752929999999985,
"averageCostPerQuery": 0.02735428351309705
}
The usage can be filtered by user (parameter `user`) and/or timestamp (parameters `before` and `after`). Timestamps must be given with a timezone, e.g. `Z` at the end for GMT or `+01` for UTC+1 (the `+` is URL-encoded as `%2B`).
curl 'http://localhost:8080/usage?user=adam&after=2024-01-24T14:50%2B01'
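The `before` parameter works the same way. For example, to aggregate only adam's queries from January 2024 (the dates are placeholders):
curl 'http://localhost:8080/usage?user=adam&after=2024-01-01T00:00Z&before=2024-02-01T00:00Z'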
Get the metadata about all queries:
curl http://localhost:8080/queries
Example output:
[
{
"user": "adam",
"model": "gpt-4-1106-preview",
"timestamp": "2024-01-25T13:18:05.205Z",
"queryTokens": 3243,
"answerTokens": 1399,
"cost": 0.07440000000000001
},
{
"user": "adam",
"model": "gpt-4-1106-preview",
"timestamp": "2024-01-25T13:22:25.648Z",
"queryTokens": 1218,
"answerTokens": 477,
"cost": 0.02649
}
]
This can be filtered in the same way as the `/usage` endpoint.
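For example, to list only the queries of a single user (the user name is a placeholder):
curl 'http://localhost:8080/queries?user=adam'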
openai-api-proxy stores the ratelimit information of the latest call to the OpenAI API. It can be retrieved via:
curl http://localhost:8080/ratelimits
Example output:
{
"tokens": {
"maximum": 500000,
"remaining": 486895,
"reset": "2024-01-25T14:04:47.719Z"
},
"requests": {
"maximum": 500,
"remaining": 499,
"reset": "2024-01-25T13:27:03.44Z"
}
}
You can set configuration options in either of these places:
- An `application.yaml` in the current working directory.
- Via system properties in the JVM (`-D` command line switch of the `java` process).
- Via environment variables in `UPPER_CASE_FORMAT`.
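As an illustration, the same option can be set via any of these mechanisms; here the port (option `server.port`, see the table below) is changed to a made-up value via a JVM system property and via an environment variable:
# via a JVM system property
java -Dserver.port=9090 -cp openai-api-proxy-0.0.1-SNAPSHOT-jar-with-dependencies.jar net.ssehub.openai_api_proxy.OpenAiApiProxy
# via an environment variable
SERVER_PORT=9090 java -cp openai-api-proxy-0.0.1-SNAPSHOT-jar-with-dependencies.jar net.ssehub.openai_api_proxy.OpenAiApiProxy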
Useful configuration options:
Option | Default Value | Description |
---|---|---|
`forward.url` | `https://api.openai.com/v1/chat/completions` | The URL to forward API calls to. |
`forward.token` | not set | The OpenAI API token to use for forwarded API calls. |
`spring.datasource.url` | `jdbc:sqlite:openai-queries.db` | The path to the SQLite database (after the `jdbc:sqlite:` prefix). The default is the relative path `openai-queries.db` in the current working directory. |
`server.port` | `8080` | The TCP port to listen on. |
We recommend setting `forward.token` as an environment variable (`FORWARD_TOKEN`), so as not to leak the secret in open configuration files.
See also the Spring documentation for additional configuration options.
Run the main class `net.ssehub.openai_api_proxy.OpenAiApiProxy`, for example like this:
java -cp openai-api-proxy-0.0.1-SNAPSHOT-jar-with-dependencies.jar net.ssehub.openai_api_proxy.OpenAiApiProxy
For the first startup, you have to set the configuration option `spring.jpa.hibernate.ddl-auto` to `update` in order to create the correct schema for the SQLite database.
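A first startup could therefore look roughly like this (the token is a placeholder; `spring.jpa.hibernate.ddl-auto` is passed as a system property here, but any of the configuration mechanisms described above works):
export FORWARD_TOKEN="YOUR_TOKEN"
java -Dspring.jpa.hibernate.ddl-auto=update -cp openai-api-proxy-0.0.1-SNAPSHOT-jar-with-dependencies.jar net.ssehub.openai_api_proxy.OpenAiApiProxy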
You can set up openai-api-proxy as a systemd service roughly like this:
- Create the directory `/opt/openai-api-proxy/` and put the jar-with-dependencies there.
- Create a `run.sh` in `/opt/openai-api-proxy/`:
#!/bin/bash
export FORWARD_TOKEN="YOUR_TOKEN"
export SPRING_DATASOURCE_URL="jdbc:sqlite:/var/lib/openai-api-proxy/queries.db"
exec java -cp /opt/openai-api-proxy/openai-api-proxy.jar net.ssehub.openai_api_proxy.OpenAiApiProxy
Make this only accessible by `root` and `www-data`:
chown root:www-data /opt/openai-api-proxy/run.sh
chmod u=rwx,g=rx,o= /opt/openai-api-proxy/run.sh
- Create `/var/lib/openai-api-proxy/` and set its owner: `chown www-data:www-data /var/lib/openai-api-proxy/`
- Create `/etc/systemd/system/openai-api-proxy.service`:
[Unit]
Description=Proxy for the OpenAI API that keeps track of tokens and cost
After=network.target
[Service]
User=www-data
Group=www-data
ExecStart=/usr/bin/bash /opt/openai-api-proxy/run.sh
[Install]
WantedBy=multi-user.target
- Reload systemd: `systemctl daemon-reload`.
You can start the service with `systemctl start openai-api-proxy.service` and/or enable it for automatic startup with `systemctl enable openai-api-proxy.service`. Check the logs with `journalctl --unit openai-api-proxy.service`.
Remember that for the first startup, the SQLite database has to be set up with the correct schema. So either copy an existing database to `/var/lib/openai-api-proxy/queries.db` or set `spring.jpa.hibernate.ddl-auto` to `update` for the first startup.
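One way to do the latter is to temporarily add the option as an environment variable to `run.sh` and remove the line again after the schema has been created:
export SPRING_JPA_HIBERNATE_DDL_AUTO=update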
This project uses Maven for dependency management and the build process. To simply build jars, run:
mvn package
This creates two jar files in the `target` folder (`$version` is the version that was built, e.g. `0.0.1-SNAPSHOT`):

- `openai-api-proxy-$version.jar` just includes the class files of this program.
- `openai-api-proxy-$version-jar-with-dependencies.jar` includes the class files of this program, plus all dependencies. This means that this jar can be used when you don't want to manually provide all dependencies of this program each time you execute it.