Dashboard #68

Open
wants to merge 24 commits into
base: main
24 commits
2036378
dashboard v0
ryanhayame Aug 5, 2024
ffceab0
Merge branch 'main' of https://github.com/Trainy-ai/konduktor into da…
ryanhayame Aug 5, 2024
86054b1
websocket logs
ryanhayame Aug 8, 2024
00aeb1e
Merge branch 'main' of https://github.com/Trainy-ai/konduktor into da…
ryanhayame Aug 8, 2024
2a8b53e
logging
ryanhayame Sep 12, 2024
a50e1cb
Merge branch 'main' of https://github.com/Trainy-ai/konduktor into da…
ryanhayame Sep 12, 2024
de3a64c
update
ryanhayame Sep 12, 2024
c9f85a2
Merge branch 'main' of https://github.com/Trainy-ai/konduktor into da…
ryanhayame Sep 24, 2024
d0f58c3
dockerfiles + docker-compose
ryanhayame Sep 24, 2024
29b71ad
backend.default.svc.cluster.local:5001 (broken APIs)
ryanhayame Oct 3, 2024
337815b
Merge branch 'main' of https://github.com/Trainy-ai/konduktor into da…
ryanhayame Oct 3, 2024
828932c
Merge branch 'main' of https://github.com/Trainy-ai/konduktor into da…
ryanhayame Oct 11, 2024
77bc13f
v0.5 dashboard
ryanhayame Oct 11, 2024
da543a5
dashboard 0.7
ryanhayame Oct 17, 2024
a283ecc
Merge branch 'main' of https://github.com/Trainy-ai/konduktor into da…
ryanhayame Oct 17, 2024
14cc432
Merge branch 'main' of https://github.com/Trainy-ai/konduktor into da…
ryanhayame Oct 20, 2024
ba86274
dashboard v0.9
ryanhayame Oct 20, 2024
ff36f53
DASHBOARD V1.0
ryanhayame Oct 22, 2024
bb9aae0
dashboard 1.01
ryanhayame Oct 23, 2024
9142b26
fixes v1
ryanhayame Oct 29, 2024
5dd4c2d
fixes
ryanhayame Oct 29, 2024
ee17803
Merge branch 'main' of https://github.com/Trainy-ai/konduktor into da…
ryanhayame Nov 5, 2024
a178990
dashboard fixes minus socketio
ryanhayame Nov 5, 2024
b30df1b
DASHBOARD V2
ryanhayame Nov 5, 2024
57 changes: 57 additions & 0 deletions Dockerfile.backend
@@ -0,0 +1,57 @@
# Use Python 3.11 slim as the base image
FROM python:3.11-slim AS base

# Set environment variables for Python behavior
ENV PYTHONFAULTHANDLER=1 \
    PYTHONHASHSEED=random \
    PYTHONUNBUFFERED=1

# Set the working directory inside the container
WORKDIR /app

# Builder stage: Install dependencies and build the backend package
FROM base AS builder

# Set environment variables for pip and Poetry
ENV PIP_DEFAULT_TIMEOUT=100 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1 \
    POETRY_VERSION=1.3.1

# Install Poetry
RUN pip install "poetry==$POETRY_VERSION"

# Copy the pyproject.toml and poetry.lock files
COPY pyproject.toml poetry.lock ./

# Copy the entire konduktor directory to the container
COPY konduktor ./konduktor

# List the contents of the konduktor directory to verify the copy
RUN ls -la ./konduktor

# Configure Poetry and install dependencies only for the dashboard group
RUN poetry config virtualenvs.in-project true && \
    poetry install --with dashboard --no-root

# Final stage for production
FROM base AS final

# Set the working directory
WORKDIR /app

# Copy the virtual environment from the builder stage
COPY --from=builder /app/.venv ./.venv

# Copy the konduktor directory from the builder stage
COPY --from=builder /app/konduktor ./konduktor

# Copy the startup script
COPY startup.sh /app/startup.sh
RUN chmod +x /app/startup.sh

# Expose the port the app runs on
EXPOSE 5001

# Set the startup command
CMD ["/app/startup.sh"]
23 changes: 23 additions & 0 deletions Dockerfile.frontend
@@ -0,0 +1,23 @@
# Use the official Node.js 18 slim image
FROM node:18-slim

# Set the working directory
WORKDIR /app

# Copy package.json and package-lock.json from the /frontend folder
COPY konduktor/dashboard/frontend/package*.json ./

# Install dependencies
RUN npm install

# Copy the entire frontend source code
COPY konduktor/dashboard/frontend/ .

# Build the frontend for production
RUN npm run build

# Expose the frontend port
EXPOSE 5173

# Start the frontend app
CMD ["npm", "run", "start"]
4 changes: 2 additions & 2 deletions format.sh
@@ -1,8 +1,8 @@
#!/usr/bin/env bash
set -eo pipefail

-RUFF_VERSION=$(ruff --version | head -n 1 | awk '{print $2}')
-MYPY_VERSION=$(mypy --version | awk '{print $2}')
+RUFF_VERSION=$(poetry run ruff --version | head -n 1 | awk '{print $2}')
+MYPY_VERSION=$(poetry run mypy --version | awk '{print $2}')

echo "ruff ver $RUFF_VERSION"
echo "mypy ver $MYPY_VERSION"
21 changes: 21 additions & 0 deletions konduktor/dashboard/README.md
@@ -0,0 +1,21 @@
### Prereqs: `kubectl` is configured for the remote machine/cluster

## 1. Apply the Kubernetes manifest
From the manifests directory (the one containing `dashboard_deployment.yaml`):
```
kubectl apply -f dashboard_deployment.yaml
```
Then wait a minute or two for the pods to finish setting up.

## 2. Port-forward the frontend in a terminal
```
kubectl port-forward svc/frontend 5173:5173 -n konduktor-dashboard
```

## 3. Port-forward Grafana in a separate terminal
```
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n prometheus
```

## 4. Open the dashboard at http://localhost:5173/
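
Before opening the page, it can help to confirm that both port-forwards from steps 2 and 3 are actually listening. The sketch below is a minimal check using only the Python standard library. `port_is_open` is a hypothetical helper (not part of konduktor), and the ports 5173 and 3000 are taken from the commands above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports assumed from the port-forward commands in steps 2 and 3.
    for label, port in (("frontend", 5173), ("grafana", 3000)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{label} on localhost:{port}: {state}")
```

If either port is reported as not reachable, the corresponding `kubectl port-forward` command has probably exited or the pods are still starting.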

228 changes: 228 additions & 0 deletions konduktor/dashboard/backend/main.py
@@ -0,0 +1,228 @@
from typing import Any, Dict, List

from flask import Flask, jsonify, request
from flask_cors import CORS
from kubernetes import client
from kubernetes.client.exceptions import ApiException

from konduktor.kube_client import batch_api, core_api, crd_api

from .sockets import socketio

app = Flask(__name__)

# Ensure CORS is configured correctly
cors = CORS(app, resources={r"/*": {"origins": "*"}})

# Attach socketio to app after app is created
socketio.init_app(app, cors_allowed_origins="*")

# Use Kubernetes API clients
# Initialize BatchV1 and CoreV1 API (native kubernetes)
batch_client = batch_api()
core_client = core_api()
# Initialize Kueue API
crd_client = crd_api()


# Get a listing of workloads in kueue
def fetch_jobs():
    listing = crd_client.list_namespaced_custom_object(
        group="kueue.x-k8s.io",
        version="v1beta1",
        namespace="default",
        plural="workloads",
    )

    return format_workloads(listing)


def format_workloads(listing: Dict[str, Any]) -> List[Dict[str, Any]]:
    if not listing:
        return []

    res = []

    for job in listing.get("items", []):
        uid = job["metadata"]["uid"]
        # name = job["metadata"]["ownerReferences"][0]["name"]
        name = job["metadata"]["name"]
        created_at = job["metadata"]["creationTimestamp"]
        namespace = job["metadata"]["namespace"]
        localQueueName = job["spec"].get("queueName", "Unknown")
        # spec.priority is optional on a Workload, so default to 0
        priority = job["spec"].get("priority", 0)
        active = job["spec"].get("active", 0)
        status = "ADMITTED" if "admission" in job.get("status", {}) else "PENDING"

        statusVal = 1 if "admission" in job.get("status", {}) else 0
        order = (statusVal * 10) + priority

        res.append(
            {
                "id": uid,
                "name": name,
                "namespace": namespace,
                "localQueueName": localQueueName,
                "priority": priority,
                "status": status,
                "active": active,
                "created_at": created_at,
                "order": order,
            }
        )

    return res


"""
# for testing: prints workloads in kueue
def list_all_workloads(namespace="default"):
    try:
        # List workloads from the CRD API
        workloads = crd_client.list_namespaced_custom_object(
            group="kueue.x-k8s.io",
            version="v1beta1",
            namespace=namespace,
            plural="workloads",
        )
        for workload in workloads.get("items", []):
            print(f"Workload Name: {workload['metadata']['name']}")

    except ApiException as e:
        print(f"Failed to list workloads: {e}")


# for testing: prints jobs in native kubernetes kueue
def list_all_jobs():
    try:
        jobs = batch_client.list_job_for_all_namespaces(watch=False)  # Get all jobs

        if not jobs.items:
            print("No jobs found.")
        else:
            print("Jobs found:")
            for job in jobs.items:
                print(f"Name: {job.metadata.name}, Namespace: {job.metadata.namespace}")

    except ApiException as e:
        print(f"Failed to list jobs: {e}")
"""


# ROUTES


@app.route("/", methods=["GET"])
def home():
    return jsonify({"home": "/"})


@app.route("/ping", methods=["GET"])
def ping():
    return jsonify({"message": "Pong from backend!"})


@app.route("/deleteJob", methods=["DELETE"])
def delete_job():
    data = request.get_json()
    name = data.get("name", "")
    namespace = data.get("namespace", "default")

    """
    # This is because kueue and native kubernetes have different job names:
    # Split the name into parts using the '-' delimiter
    name_parts = name.split('-')
    # Slice the list to get all elements except the first and the last
    native_job_name_parts = name_parts[1:-1]
    # Join the sliced parts back together with '-'
    native_job_name = '-'.join(native_job_name_parts)
    """

    try:
        delete_options = client.V1DeleteOptions(propagation_policy="Background")

        crd_client.delete_namespaced_custom_object(
            group="kueue.x-k8s.io",
            version="v1beta1",
            namespace=namespace,
            plural="workloads",
            name=name,
            body=delete_options,
        )
        print(f"Kueue Workload '{name}' deleted successfully.")

        """
        list_all_workloads()
        list_all_jobs()

        print(f"Native Kubernetes Job Name: {native_job_name}")

        batch_client.delete_namespaced_job(
            name=native_job_name,
            namespace=namespace,
            body=delete_options
        )
        print(f"Native Kubernetes Job {native_job_name} deleted successfully.")
        """

        return jsonify({"success": True, "status": 200})

    except ApiException as e:
        print(f"Exception: {e}")
        return jsonify({"error": str(e)}), e.status


@app.route("/getJobs", methods=["GET"])
def get_jobs():
    rows = fetch_jobs()
    return jsonify(rows)


@app.route("/getNamespaces", methods=["GET"])
def get_namespaces():
    try:
        # Get the list of namespaces
        namespaces = core_client.list_namespace()
        # Extract the namespace names from the response
        namespace_list = [ns.metadata.name for ns in namespaces.items]
        return jsonify(namespace_list)
    except ApiException as e:
        print(f"Exception: {e}")
        return jsonify({"error": str(e)}), e.status


@app.route("/updatePriority", methods=["PUT"])
def update_priority():
    data = request.get_json()
    name = data.get("name", "")
    namespace = data.get("namespace", "default")
    priority = data.get("priority", 0)

    try:
        job = crd_client.get_namespaced_custom_object(
            group="kueue.x-k8s.io",
            version="v1beta1",
            namespace=namespace,
            plural="workloads",
            name=name,
        )

        job["spec"]["priority"] = priority

        crd_client.patch_namespaced_custom_object(
            group="kueue.x-k8s.io",
            version="v1beta1",
            namespace=namespace,
            plural="workloads",
            name=name,
            body=job,
        )
        return jsonify({"success": True, "status": 200})

    except ApiException as e:
        print(f"Exception: {e}")
        return jsonify({"error": str(e)}), e.status


if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5001, debug=True, allow_unsafe_werkzeug=True)
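
A note on the `order` field computed in `format_workloads`: it adds `statusVal * 10` to the priority, so admission status only dominates while priorities stay below 10. The sketch below reproduces that arithmetic on made-up workloads and contrasts it with a tuple sort key. `order_value`, `order_key`, and the sample dicts are hypothetical illustrations, not code from this PR, and whether the frontend actually sorts on `order` is not shown in this excerpt.

```python
from typing import Any, Dict, Tuple


def order_value(workload: Dict[str, Any]) -> int:
    """Mirror the arithmetic used for "order" in format_workloads."""
    admitted = 1 if "admission" in workload.get("status", {}) else 0
    return admitted * 10 + workload["spec"]["priority"]


def order_key(workload: Dict[str, Any]) -> Tuple[int, int]:
    """Alternative (an assumption, not the PR's approach): status first, then priority."""
    admitted = 1 if "admission" in workload.get("status", {}) else 0
    return (admitted, workload["spec"]["priority"])


# Made-up workloads for illustration only.
admitted_low = {"spec": {"priority": 0}, "status": {"admission": {}}}
pending_high = {"spec": {"priority": 15}, "status": {}}

print(order_value(admitted_low))  # 10
print(order_value(pending_high))  # 15, ranks above the admitted workload
print(order_key(admitted_low) > order_key(pending_high))  # True
```

With the tuple key, an admitted workload compares above a pending one regardless of how large the priority is, which may or may not be the intended ordering.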