basebox.ai Helm

Overview

The basebox umbrella chart is a Helm chart that bundles all basebox AI platform services into a single deployment. It orchestrates the deployment of frontend, backend services, databases, identity provider, and inference services.

Architecture

The umbrella chart deploys the following services:

| Service | Purpose | Port | Dependencies |
|---------|---------|------|--------------|
| Frontend | Vue 3 web application | 3000 | AISRV, IDP |
| IDP | Keycloak identity provider | 8080 | PostgreSQL (idp-db) |
| AISRV | Main API server (GraphQL) | 8888 | PostgreSQL (aisrv-db), IDP, Inference, RAGSRV |
| STORESRV | Store server (GraphQL) | 8889 | PostgreSQL (storesrv-db), IDP |
| RAGSRV | RAG vector database service | 3001 | PostgreSQL (ragsrv-db), RAGSRV-Support, GPU |
| RAGSRV-Support | RAG model inference service | 8000 | GPU |
| Inference | LLM inference server | 8080 | GPU |
| CNPG Operator | CloudNativePG database operator | N/A | - |

Prerequisites

Kubernetes Cluster Requirements

  • Kubernetes Version: 1.23 or later
  • Helm Version: 3.x
  • GPU Support: NVIDIA GPU Operator or similar (for inference and RAG services)
  • Ingress Controller: Any, though the chart's bundled ingress annotations target ingress-nginx
  • Storage: Dynamic volume provisioning with a default storage class

Resource Requirements

Minimum Production Deployment:

  • CPU: 5+ cores
  • Memory: 12GB+ RAM
  • GPU: 1+ NVIDIA GPUs (1 for inference)
  • Storage: 200GB+ for databases and model caching

Recommended Production Deployment:

  • CPU: 10+ cores
  • Memory: 32GB+ RAM
  • GPU: 2+ NVIDIA GPUs (1 for inference, 1 for RAGSRV)
  • Storage: 500GB+ SSD/NVMe

External Dependencies

  • Image Registry: Access to gitea.basebox.health/basebox.distribution/
  • Domain/DNS: Configured DNS for the application domain
  • TLS Certificates: Valid certificates for HTTPS (issued by cert-manager, pre-created as secrets, or provisioned by other means)
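Registry credentials are usually supplied to the cluster as an image pull secret. A minimal sketch, assuming the secret name basebox-registry and placeholder credentials (both the name and the credentials are assumptions, not values defined by the chart):

```yaml
# Hypothetical pull secret; replace the placeholder credentials with your
# registry account and reference the secret from the chart's pull-secret values.
apiVersion: v1
kind: Secret
metadata:
  name: basebox-registry   # assumed name
  namespace: basebox
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {"auths": {"gitea.basebox.health": {"username": "<user>", "password": "<token>"}}}
```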

Global Configuration

Service Configuration

CloudNativePG Operator

| Parameter | Default | Description |
|-----------|---------|-------------|
| `cnpg-operator.fullnameOverride` | `cnpg` | Name override for the CNPG operator |

IDP (Keycloak)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `idp.ingress.enabled` | `true` | Enable ingress for IDP |
| `idp.ingress.className` | `nginx` | Ingress class name |
| `idp.ingress.hosts[].host` | required | Hostname for IDP |
| `idp.ingress.hosts[].paths[].path` | `/auth` | Base path for Keycloak |
| `idp.ingress.tls` | `[]` | TLS configuration |

Key Annotations:

  • Proxy buffer sizes configured for large headers
  • CORS enabled for cross-origin requests
  • Caching disabled for authentication flows
  • Long timeouts for SSO operations (86400s)

AISRV

| Parameter | Default | Description |
|-----------|---------|-------------|
| `aisrv.database.enabled` | `true` | Enable PostgreSQL database |
| `aisrv.database.host` | `aisrv-db-rw` | Database host |
| `aisrv.database.port` | `5432` | Database port |
| `aisrv.database.user` | `aisrv` | Database username |
| `aisrv.database.password` | required | Database password (use a secure value) |
| `aisrv.database.dbname` | `aisrv` | Database name |
| `aisrv.ingress.enabled` | `true` | Enable ingress |

Exposed Paths:

  • /rest - REST API endpoints
  • /graphql - GraphQL API endpoint
  • /subscriptions - GraphQL subscriptions (WebSocket)
  • /media - Media file serving

STORESRV

| Parameter | Default | Description |
|-----------|---------|-------------|
| `storesrv.logLevel` | `trace` | Logging level |
| `storesrv.host` | `0.0.0.0` | Server bind address |
| `storesrv.port` | `8889` | Server port |
| `storesrv.oauth.IdpUrl` | `http://idp:8080/realms/master` | OAuth IDP URL |
| `storesrv.oauth.IdpAud` | `account` | OAuth audience |
| `storesrv.database.enabled` | `true` | Enable PostgreSQL database |
| `storesrv.graphql.allowIntrospection` | `true` | Allow GraphQL introspection |
| `storesrv.graphql.graphiql` | `true` | Enable GraphiQL interface |

RAGSRV

| Parameter | Default | Description |
|-----------|---------|-------------|
| `ragsrv.enabled` | `true` | Enable RAGSRV service |
| `ragsrv.resources.requests.nvidia.com/gpu` | `1` | Number of GPUs to request |
| `ragsrv.resources.limits.nvidia.com/gpu` | `1` | Maximum GPUs |
| `ragsrv.env.COMPUTE` | `gpu` | Compute mode (`gpu` or `cpu`) |

Note: RAGSRV-Support is automatically enabled when RAGSRV is enabled.
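For clusters without a free GPU, the COMPUTE flag suggests a CPU fallback. A hedged sketch of such an override (whether RAGSRV needs any further changes in CPU mode is chart-specific and not documented here):

```yaml
# CPU-only sketch: switch the compute mode and drop the GPU request/limit.
ragsrv:
  enabled: true
  env:
    COMPUTE: cpu
  resources:
    requests: {}   # no nvidia.com/gpu entry in CPU mode
    limits: {}
```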

RAGSRV-Support

This service is deployed automatically with RAGSRV and exposes no configurable parameters of its own.

Inference

| Parameter | Default | Description |
|-----------|---------|-------------|
| `inference.enabled` | `true` | Enable inference service |
| `inference.resources.requests.nvidia.com/gpu` | `1` | Number of GPUs to request |
| `inference.resources.limits.nvidia.com/gpu` | `1` | Maximum GPUs |

Frontend

| Parameter | Default | Description |
|-----------|---------|-------------|
| `frontend.ingress.enabled` | `true` | Enable ingress |
| `frontend.ingress.hosts[].paths[].path` | `/` | Serve frontend at root path |

The frontend is configured with the same proxy and CORS settings as other services.

Ingress Configuration

Common Ingress Annotations

All services use consistent nginx ingress annotations:

Proxy Configuration

nginx.ingress.kubernetes.io/proxy-body-size: "20m"
nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
nginx.ingress.kubernetes.io/proxy-busy-buffers-size: "256k"
nginx.ingress.kubernetes.io/client-header-buffer-size: "16k"
nginx.ingress.kubernetes.io/large-client-header-buffers: "4 16k"
nginx.ingress.kubernetes.io/proxy-read-timeout: "86400"
nginx.ingress.kubernetes.io/proxy-send-timeout: "86400"
nginx.ingress.kubernetes.io/proxy-buffering: "off"

Purpose:

  • Large body size (20MB) for file uploads
  • Large buffer sizes for authentication headers (JWT tokens)
  • Long timeouts (24h) for WebSocket connections
  • Buffering disabled for streaming responses

Cache Control

nginx.ingress.kubernetes.io/Cache-Control: "no-store, no-cache, must-revalidate, proxy-revalidate"
nginx.ingress.kubernetes.io/Pragma: "no-cache"
nginx.ingress.kubernetes.io/Expires: "0"
nginx.ingress.kubernetes.io/Surrogate-Control: "no-store"

Purpose: Prevent caching of API responses and authentication flows

CORS Configuration

nginx.ingress.kubernetes.io/enable-cors: "true"
nginx.ingress.kubernetes.io/cors-allow-origin: "*"
nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, OPTIONS"
nginx.ingress.kubernetes.io/cors-allow-headers: "Authorization, Content-Type, X-Realm"
nginx.ingress.kubernetes.io/cors-allow-credentials: "true"
nginx.ingress.kubernetes.io/cors-max-age: "86400"
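One caveat with the defaults above: browsers ignore Access-Control-Allow-Credentials when the allowed origin is the wildcard "*", so credentialed requests from the SPA will fail under that combination. Production deployments typically pin the origin instead; a sketch (the hostname is a placeholder):

```yaml
# Production sketch: pin CORS to the frontend's origin instead of "*",
# so that credentialed cross-origin requests are accepted by browsers.
nginx.ingress.kubernetes.io/enable-cors: "true"
nginx.ingress.kubernetes.io/cors-allow-origin: "https://app.company.com"
nginx.ingress.kubernetes.io/cors-allow-credentials: "true"
```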

Purpose: Enable cross-origin requests for SPA frontend

Configuration Examples

Minimal Production Configuration

cnpg-operator:
  fullnameOverride: cnpg

idp:
  ingress:
    enabled: true
    className: "nginx"
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: auth.company.com
        paths:
          - path: /auth
            pathType: Prefix
    tls:
      - hosts:
          - company.com
        secretName: company-tls

aisrv:
  database:
    enabled: true
    password: "<generate-secure-password>"
  ingress:
    enabled: true
    className: "nginx"
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: api.company.com
        paths:
          - path: /rest
            pathType: Prefix
          - path: /graphql
            pathType: Exact
          - path: /subscriptions
            pathType: Exact
          - path: /media
            pathType: Prefix
    tls:
      - hosts:
          - company.com
        secretName: company-tls

storesrv:
  logLevel: info
  database:
    enabled: true
    password: "<generate-secure-password>"
  graphql:
    allowIntrospection: false
    graphiql: false

ragsrv:
  enabled: true
  resources:
    requests:
      nvidia.com/gpu: 1
    limits:
      nvidia.com/gpu: 1
  env:
    COMPUTE: gpu

# ragsrv-support is deployed automatically alongside ragsrv; no overrides required
ragsrv-support: {}

inference:
  enabled: true
  resources:
    requests:
      nvidia.com/gpu: 1
    limits:
      nvidia.com/gpu: 1

frontend:
  ingress:
    enabled: true
    className: "nginx"
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: app.company.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - company.com
        secretName: company-tls

Installation

Prepare Values File

Critical values to change:

  • All database passwords
  • Domain names
  • TLS secret names
  • Image pull secrets
  • GPU resource allocations
  • Storage class names
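The full example above does not show pull secrets or storage classes. A hedged sketch of where they might live; the key names are assumptions, so verify them against each subchart's values.yaml:

```yaml
# Assumed keys; the cluster.storage layout follows the database examples later
# in this document, while global.imagePullSecrets is a common but unverified convention.
global:
  imagePullSecrets:
    - name: basebox-registry
aisrv:
  aisrv-db:
    cluster:
      storage:
        size: 50Gi
        storageClass: fast-ssd
```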

Install the Chart

# Install basebox umbrella chart
helm install basebox ./basebox.ai \
  --values values-production.yaml \
  --namespace basebox \
  --create-namespace \
  --timeout 30m

# Watch the deployment
kubectl get pods -n basebox -w

Verify Installation

# Check all pods are running
kubectl get pods -n basebox

# Check database clusters
kubectl get cluster -n basebox

# Check ingress
kubectl get ingress -n basebox

# Check GPU allocation
kubectl describe nodes | grep -A5 "Allocated resources"

Upgrade

Standard Upgrade

# Update values file
vim values-production.yaml

# Upgrade chart
helm upgrade --install basebox ./basebox.ai \
  --values values-production.yaml \
  --namespace basebox \
  --timeout 30m

Rolling Upgrade Strategy

For zero-downtime upgrades:

  1. Update databases first (if schema changes)
  2. Update backend services (AISRV, STORESRV, RAGSRV)
  3. Update IDP (authentication service)
  4. Update inference services
  5. Update frontend last

Uninstall

Complete Removal

# Uninstall chart
helm uninstall basebox --namespace basebox

# Delete PVCs (databases - DATA LOSS!)
kubectl delete pvc -n basebox --all

# Delete namespace
kubectl delete namespace basebox

Preserve Data

To preserve databases when uninstalling:

# Uninstall chart
helm uninstall basebox --namespace basebox

# PVCs remain - do not delete them
# To reinstall with existing data:
helm install basebox ./basebox.ai \
  --values values-production.yaml \
  --namespace basebox

Database Management

Backup All Databases

# IDP database
kubectl exec -n basebox idp-db-1 -- \
  pg_dump -U idp idp > idp-backup.sql

# AISRV database
kubectl exec -n basebox aisrv-db-1 -- \
  pg_dump -U aisrv aisrv > aisrv-backup.sql

# STORESRV database
kubectl exec -n basebox storesrv-db-1 -- \
  pg_dump -U storesrv storesrv > storesrv-backup.sql

# RAGSRV database
kubectl exec -n basebox ragsrv-db-1 -- \
  pg_dump -U ragsrv ragsrv > ragsrv-backup.sql

CloudNativePG Backups

Configure automated backups in values:

aisrv:
  aisrv-db:
    cluster:
      backup:
        barmanObjectStore:
          destinationPath: s3://backups/aisrv
          s3Credentials:
            accessKeyId:
              name: backup-s3-creds
              key: ACCESS_KEY_ID
            secretAccessKey:
              name: backup-s3-creds
              key: SECRET_ACCESS_KEY
        retentionPolicy: "30d"
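Restoring from such a backup is done by bootstrapping a fresh cluster from the object store. A sketch following CloudNativePG's recovery bootstrap; the exact nesting under the subchart's values is an assumption:

```yaml
# Recovery sketch: bootstrap a new aisrv-db cluster from the barman backup.
aisrv:
  aisrv-db:
    cluster:
      bootstrap:
        recovery:
          source: aisrv-backup
      externalClusters:
        - name: aisrv-backup
          barmanObjectStore:
            destinationPath: s3://backups/aisrv
            s3Credentials:
              accessKeyId:
                name: backup-s3-creds
                key: ACCESS_KEY_ID
              secretAccessKey:
                name: backup-s3-creds
                key: SECRET_ACCESS_KEY
```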

Monitoring

Health Checks

# Check all service health endpoints
kubectl get pods -n basebox -o wide

# Port-forward and test endpoints
kubectl port-forward -n basebox svc/aisrv 8888:8888
curl http://localhost:8888/health

kubectl port-forward -n basebox svc/inference 8080:8080
curl http://localhost:8080/health

Metrics

Enable Prometheus monitoring:

aisrv:
  metrics:
    enabled: true
    port: 9090

idp:
  idp-db:
    cluster:
      monitoring:
        enablePodMonitor: true

Logs

# View logs from all AISRV pods
kubectl logs -n basebox -l app.kubernetes.io/name=aisrv --tail=100 -f

# View logs from inference service
kubectl logs -n basebox -l app.kubernetes.io/name=inference --tail=100 -f

# View logs from specific pod
kubectl logs -n basebox <pod-name> -f

Troubleshooting

Common Issues

Pods Stuck in Pending:

  • Check GPU availability: kubectl describe nodes | grep nvidia.com/gpu
  • Check storage: kubectl get pvc -n basebox
  • Compare resource requests with cluster capacity

Database Connection Failures:

  • Verify database pods are running: kubectl get pods -n basebox | grep db
  • Check database secrets: kubectl get secrets -n basebox
  • Review database logs: kubectl logs -n basebox <db-pod-name>

Ingress Not Working:

  • Verify the ingress controller is running
  • Check ingress resources: kubectl get ingress -n basebox
  • Verify DNS points to the ingress controller
  • Check TLS certificates: kubectl get certificates -n basebox

GPU Not Available:

  • Verify the GPU operator: kubectl get pods -n gpu-operator
  • Check node labels: kubectl get nodes -L nvidia.com/gpu
  • Verify GPU scheduling: kubectl describe nodes | grep -A5 "nvidia.com/gpu"

Debug Commands

# Get all resources in namespace
kubectl get all -n basebox

# Describe pod for events and status
kubectl describe pod -n basebox <pod-name>

# Check resource usage
kubectl top pods -n basebox
kubectl top nodes

# View events
kubectl get events -n basebox --sort-by='.lastTimestamp'

# Test internal connectivity
kubectl run -it --rm debug --image=busybox --restart=Never -n basebox -- sh
# Inside pod:
wget -O- http://aisrv:8888/health
wget -O- http://inference:8080/health

Performance Tuning

Resource Allocation

Backend Services (AISRV, STORESRV):

resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 2000m
    memory: 4Gi

GPU Services (Inference, RAGSRV):

resources:
  requests:
    cpu: 4000m
    memory: 32Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 8000m
    memory: 64Gi
    nvidia.com/gpu: 1

Databases:

cluster:
  instances: 3
  storage:
    size: 50Gi
    storageClass: fast-ssd

Database Performance

  • Use read replicas (increase instances)
  • Use fast storage (SSD/NVMe)
  • Tune PostgreSQL parameters
  • Regular VACUUM and ANALYZE
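CloudNativePG exposes PostgreSQL settings under the cluster's postgresql.parameters map, which is one place to apply such tuning. A sketch; the values are illustrative starting points for a mid-sized database pod, not benchmarked recommendations:

```yaml
# Illustrative tuning for a database pod with a few GiB of memory;
# adjust to your workload and measure before adopting.
cluster:
  postgresql:
    parameters:
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      work_mem: "32MB"
      autovacuum_vacuum_scale_factor: "0.05"
```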