Licensed to be used in conjunction with basebox, only.
BaseBox - Quick Start Guide
Get your BaseBox up and running in minutes using the OCI Helm registry.
Prerequisites
- Kubernetes cluster (1.24+)
- kubectl configured
- Helm 3.8+ installed (OCI support required)
- Ingress controller (nginx recommended)
- Storage provisioner
- GPU nodes (optional, for inference service)
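Before starting, it can help to confirm the tool versions meet these requirements. The snippet below is a sketch of a portable "version A is at least version B" check using `sort -V`; the commented-out line shows how it might be applied to your local Helm client.

```shell
# Sketch: portable "is version A >= version B" check using sort -V,
# useful for verifying the Helm 3.8+ and Kubernetes 1.24+ prerequisites.
version_ge() {
  # Succeeds when $1 >= $2 in version order
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# With Helm on your PATH you could check the real client version:
#   version_ge "$(helm version --template '{{.Version}}' | tr -d v)" "3.8.0"
version_ge "3.12.1" "3.8.0" && echo "Helm 3.12.1 satisfies the 3.8+ requirement"
```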
Step 1: Install CloudNativePG Operator
```bash
# Add CloudNativePG Helm repository
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update

# Install the operator
helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg

# Verify installation
kubectl wait --for=condition=Available --timeout=300s \
  deployment/cnpg-cloudnative-pg \
  -n cnpg-system
```
Step 2: Create Database Clusters
Save this as databases.yaml:
```yaml
---
# Cluster for aisrv
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: aisrv-db
spec:
  instances: 3
  storage:
    size: 10Gi
    storageClass: standard  # Change to your storage class
  bootstrap:
    initdb:
      database: aisrv
      owner: aisrv
      secret:
        name: aisrv-db-secret
      postInitApplicationSQL:
        - CREATE EXTENSION IF NOT EXISTS vector;
---
apiVersion: v1
kind: Secret
metadata:
  name: aisrv-db-secret
type: kubernetes.io/basic-auth
stringData:
  username: aisrv
  password: "530D67F4B07E42F9ABFD2EB3B0E82D13"  # CHANGE IN PRODUCTION
---
# Cluster for idp (Keycloak)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: idp-db
spec:
  instances: 3
  storage:
    size: 5Gi
    storageClass: standard  # Change to your storage class
  bootstrap:
    initdb:
      database: idp
      owner: idp
      secret:
        name: idp-db-secret
---
apiVersion: v1
kind: Secret
metadata:
  name: idp-db-secret
type: kubernetes.io/basic-auth
stringData:
  username: idp
  password: "064AD2B3AA0C4AC389930F9A52F1DAA7"  # CHANGE IN PRODUCTION
---
# Cluster for storesrv
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: storesrv-db
spec:
  instances: 3
  storage:
    size: 10Gi
    storageClass: standard  # Change to your storage class
  bootstrap:
    initdb:
      database: storesrv
      owner: storesrv
      secret:
        name: storesrv-db-secret
---
apiVersion: v1
kind: Secret
metadata:
  name: storesrv-db-secret
type: kubernetes.io/basic-auth
stringData:
  username: storesrv
  password: "530D67F4B07E42F9ABFD2EB3B0E82D13"  # CHANGE IN PRODUCTION
```
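The placeholder passwords above should be replaced before any real deployment. One way to generate a fresh 32-character hex value for each secret (assuming `openssl` is installed):

```shell
# Generate a random 32-character hex password to replace the
# CHANGE IN PRODUCTION placeholders above (one per database secret).
openssl rand -hex 16
```

Run it once per secret and keep the same value in the matching section of `my-values.yaml`.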
Apply the databases:
```bash
# Create the target namespace if it does not exist yet
kubectl create namespace basebox --dry-run=client -o yaml | kubectl apply -f -

kubectl apply -f databases.yaml -n basebox

# Wait for clusters to be ready
kubectl wait --for=condition=Ready --timeout=600s cluster/aisrv-db -n basebox
kubectl wait --for=condition=Ready --timeout=600s cluster/idp-db -n basebox
kubectl wait --for=condition=Ready --timeout=600s cluster/storesrv-db -n basebox
```
Step 3: Configure Your Values
Create my-values.yaml and customize for your environment:
```yaml
global:
  # Storage configuration - applies to all components
  storageClass: "standard"  # Change to your storage class (gp3, premium-rwo, etc.)
  # Domain configuration
  domain: "basebox.example.com"  # Change to your domain
  # Common ingress configuration
  ingress:
    enabled: true
    className: "nginx"
    tls:
      enabled: true
      secretName: basebox-tls

# Identity Provider (Keycloak)
idp:
  database:
    host: idp-db-rw
    port: 5432
    username: idp
    password: "064AD2B3AA0C4AC389930F9A52F1DAA7"  # CHANGE IN PRODUCTION
    name: idp

# AI Service
aisrv:
  database:
    host: aisrv-db-rw
    port: 5432
    user: aisrv
    password: "530D67F4B07E42F9ABFD2EB3B0E82D13"  # CHANGE IN PRODUCTION
    dbname: aisrv
  # Application-level LLM configuration
  env:
    AISRV_LLM_MODEL: "casperhansen/llama-3.3-70b-instruct-awq"
    AISRV_LLM_PROVIDER: "vLLM"       # or "TGI" for text-generation-inference
    AISRV_LLM_CONTEXT_SIZE: "32000"  # Max context your app will request
    AISRV_LLM_WORD_LIMIT: "24000"    # Max output tokens
    AISRV_LLM_TEMPERATURE: "0.6"     # Default temperature
    AISRV_LLM_TOP_P: "0.9"           # Default top_p sampling

# Store Service
storesrv:
  database:
    host: storesrv-db-rw
    port: 5432
    user: storesrv
    password: "530D67F4B07E42F9ABFD2EB3B0E82D13"  # CHANGE IN PRODUCTION
    dbname: storesrv

# RAG Service
ragsrv:
  enabled: true
  mode: "cpu"  # Set to "gpu" if you have GPU nodes
  # Uncomment if using GPU mode
  # resources:
  #   limits:
  #     nvidia.com/gpu: 1
  #   requests:
  #     nvidia.com/gpu: 1

# RAG Support Service
ragsrv-support:
  enabled: true
  mode: "cpu"  # Set to "gpu" if you have GPU nodes
  # Uncomment if using GPU mode
  # resources:
  #   limits:
  #     nvidia.com/gpu: 1
  #   requests:
  #     nvidia.com/gpu: 1

# Inference Service (vLLM)
inference:
  enabled: true  # Set to false if you don't have GPU nodes
  # Persistent storage for model cache
  persistence:
    enabled: true
    size: 200Gi  # Adjust based on model size (see sizing guide below)
    accessModes:
      - ReadWriteOnce
    storageClassName: ""  # Uses global.storageClass if empty
  # GPU resources (required for vLLM)
  resources:
    limits:
      nvidia.com/gpu: 1  # Adjust based on model requirements
      memory: 32Gi
    requests:
      nvidia.com/gpu: 1
      memory: 16Gi
  # vLLM environment variables
  env:
    # Model configuration
    MODEL_NAME: "casperhansen/llama-3.3-70b-instruct-awq"
    HUGGING_FACE_HUB_TOKEN: ""  # Add token for gated models (e.g., Llama, Mistral)
    # Cache configuration (points to persistent volume)
    HF_HOME: "~/.cache/huggingface/hub"
    HF_HUB_DISABLE_SYMLINKS_WARNING: "1"
    # Server configuration
    VLLM_HOST: "0.0.0.0"
    VLLM_PORT: "8000"
    # Performance settings (must accommodate AISRV settings)
    VLLM_GPU_MEMORY_UTILIZATION: "0.9"
    VLLM_MAX_MODEL_LEN: "32768"  # Must be >= AISRV_LLM_CONTEXT_SIZE
    VLLM_MAX_NUM_SEQS: "128"     # Concurrent sequences
    VLLM_MAX_TOKENS: "24000"     # Default max output tokens
    # Optimization
    VLLM_DTYPE: "auto"           # Auto-detect best dtype
    VLLM_ENFORCE_EAGER: "false"  # Enable CUDA graphs for performance
    # Quantization (if using quantized models)
    # VLLM_QUANTIZATION: "awq"  # Uncomment for AWQ models
    # Miscellaneous
    VLLM_NO_USAGE_STATS: "1"
  # Health check probes (allow time for model download on first start)
  livenessProbe:
    httpGet:
      path: /health
      port: 8000
    initialDelaySeconds: 600  # 10 minutes for initial model download
    periodSeconds: 30
    timeoutSeconds: 10
    failureThreshold: 3
  readinessProbe:
    httpGet:
      path: /health
      port: 8000
    initialDelaySeconds: 600
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
```
Step 4: Install BaseBox from OCI Registry
Install the Complete Platform (Umbrella Chart)
```bash
# Install latest version
helm upgrade --install basebox \
  oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai \
  --namespace basebox \
  --create-namespace \
  --values my-values.yaml \
  --wait \
  --timeout 15m  # Increased timeout for model download

# Or install a specific version
helm upgrade --install basebox \
  oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai \
  --version 1.2.3 \
  --namespace basebox \
  --create-namespace \
  --values my-values.yaml \
  --wait \
  --timeout 15m
```
Note: First deployment will take longer (5-15 minutes) as the inference service downloads the model. Subsequent deployments will be much faster as the model is cached.
Step 5: Verify Installation
```bash
# Check all pods are running
kubectl get pods -n basebox

# Watch inference pod startup (model download progress)
kubectl logs -f -n basebox deployment/basebox-inference

# Check ingress
kubectl get ingress -n basebox

# Check database clusters
kubectl get clusters -n basebox

# Check PVCs (verify model cache is bound)
kubectl get pvc -n basebox

# Check all resources
kubectl get all -n basebox
```
Expected Pod Status
```
NAME                           READY   STATUS    RESTARTS   AGE
basebox-aisrv-xxxxx            1/1     Running   0          5m
basebox-storesrv-xxxxx         1/1     Running   0          5m
basebox-ragsrv-xxxxx           1/1     Running   0          5m
basebox-ragsrv-support-xxxxx   1/1     Running   0          5m
basebox-inference-xxxxx        1/1     Running   0          10m   # Takes longer on first start
basebox-idp-xxxxx              1/1     Running   0          5m
basebox-frontend-xxxxx         1/1     Running   0          5m
```
Step 6: Access Your Installation
Once all pods are running, access your installation at:
- Frontend: https://basebox.example.com
- GraphQL API: https://basebox.example.com/graphql
- Keycloak Admin: https://basebox.example.com/auth/admin
Default admin credentials (change after first login):
- Username: admin
- Password: Check your Keycloak configuration
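If the admin password is stored in a Kubernetes secret (the secret name below, `basebox-idp-admin`, is a placeholder; list the secrets in the namespace to find the real one), it can be read back with `kubectl`. Secret data is base64-encoded, so the final decoding step is shown here with a literal value:

```shell
# Placeholder secret name; verify with: kubectl get secrets -n basebox
# kubectl get secret basebox-idp-admin -n basebox \
#   -o jsonpath='{.data.password}' | base64 -d

# Secret values come back base64-encoded; decoding works like this:
echo 'cGFzc3dvcmQ=' | base64 -d
```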
Working with OCI Charts
View Chart Information
```bash
# Show chart details
helm show chart oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai

# Show all chart information (values, readme, etc.)
helm show all oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai

# Show default values
helm show values oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai
```
Pull Charts Locally
```bash
# Pull chart to local directory
helm pull oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai

# Pull and extract
helm pull oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai --untar

# Template the chart locally (without installing)
helm template basebox oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai \
  --values my-values.yaml > rendered-manifests.yaml
```
Update Your Installation
```bash
# Upgrade to latest version
helm upgrade basebox \
  oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai \
  --namespace basebox \
  --values my-values.yaml \
  --wait \
  --timeout 15m

# Upgrade to specific version
helm upgrade basebox \
  oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai \
  --version 1.2.4 \
  --namespace basebox \
  --values my-values.yaml \
  --wait \
  --timeout 15m
```
Troubleshooting
Pods not starting
```bash
# Check pod status
kubectl describe pod <pod-name> -n basebox

# Check logs
kubectl logs <pod-name> -n basebox

# Check events
kubectl get events -n basebox --sort-by='.lastTimestamp'
```
Inference Service Issues
Model Download Taking Too Long
```bash
# Check download progress
kubectl logs -f -n basebox deployment/basebox-inference

# Verify PVC is bound
kubectl get pvc -n basebox

# Check available storage
kubectl describe pvc basebox-inference-model-cache -n basebox
```
Out of Memory Errors
```bash
# Check GPU memory usage
kubectl exec -it -n basebox deployment/basebox-inference -- nvidia-smi

# Solution: Reduce VLLM_GPU_MEMORY_UTILIZATION or use a smaller model
```
Model Not Found / Authentication Errors
```bash
# For gated models (Llama, Mistral), verify token is set
kubectl get secret -n basebox

# Add HuggingFace token
helm upgrade basebox \
  oci://gitea.basebox.health/basebox-distribution/helm/basebox.ai \
  --namespace basebox \
  --set inference.env.HUGGING_FACE_HUB_TOKEN="hf_xxxxxxxxxxxxx" \
  --reuse-values
```
vLLM Server Not Responding
```bash
# Check vLLM logs
kubectl logs -n basebox deployment/basebox-inference --tail=100

# Common issues:
# - Model path incorrect: Check MODEL_NAME matches the HuggingFace repo
# - Context size too large: Reduce VLLM_MAX_MODEL_LEN
# - Out of GPU memory: Reduce VLLM_GPU_MEMORY_UTILIZATION or model size
```
Database connection issues
```bash
# Check database cluster status
kubectl get cluster -n basebox
kubectl describe cluster aisrv-db -n basebox

# Check database pods
kubectl get pods -l cnpg.io/cluster=aisrv-db -n basebox

# Test database connectivity
kubectl run -it --rm debug --image=postgres:16 --restart=Never -n basebox -- \
  psql -h aisrv-db-rw -U aisrv -d aisrv
```
Ingress not working
```bash
# Check ingress controller
kubectl get pods -n ingress-nginx

# Check ingress resources
kubectl describe ingress -n basebox

# Check TLS certificate (if using cert-manager)
kubectl get certificate -n basebox
kubectl describe certificate basebox-tls -n basebox
```
Helm installation issues
```bash
# List installed releases
helm list -n basebox

# Get release status
helm status basebox -n basebox

# Get release history
helm history basebox -n basebox

# Rollback if needed
helm rollback basebox 1 -n basebox
```
Configuration Examples
Using a Different Model
To use a different model (e.g., Mistral-7B):
```yaml
aisrv:
  env:
    AISRV_LLM_MODEL: "mistralai/Mistral-7B-Instruct-v0.2"

inference:
  persistence:
    size: 100Gi  # Smaller model, less storage needed
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: 24Gi
    requests:
      nvidia.com/gpu: 1
      memory: 12Gi
  env:
    MODEL_NAME: "mistralai/Mistral-7B-Instruct-v0.2"
    VLLM_MAX_MODEL_LEN: "32768"
    HUGGING_FACE_HUB_TOKEN: "hf_xxxxxxxxxxxxx"  # Required for Mistral
```
Using Quantized Models
For AWQ or GPTQ quantized models:
```yaml
inference:
  env:
    MODEL_NAME: "casperhansen/llama-3.3-70b-instruct-awq"
    VLLM_QUANTIZATION: "awq"  # or "gptq"
    VLLM_MAX_MODEL_LEN: "32768"
```
Multiple GPU Setup
For models requiring multiple GPUs:
```yaml
inference:
  resources:
    limits:
      nvidia.com/gpu: 2  # Use 2 GPUs
      memory: 64Gi
    requests:
      nvidia.com/gpu: 2
      memory: 32Gi
  env:
    VLLM_TENSOR_PARALLEL_SIZE: "2"  # Split model across 2 GPUs
```
Database Connection Endpoints
Your applications connect to these endpoints when using CloudNativePG:
- aisrv database:
  - Read-Write: `aisrv-db-rw:5432`
  - Read-Only: `aisrv-db-ro:5432`
- idp database:
  - Read-Write: `idp-db-rw:5432`
  - Read-Only: `idp-db-ro:5432`
- storesrv database:
  - Read-Write: `storesrv-db-rw:5432`
  - Read-Only: `storesrv-db-ro:5432`
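As an illustration, a libpq-style connection URL for the aisrv read-write endpoint can be assembled from these values (the `<password>` placeholder is deliberately left unfilled):

```shell
# Build a PostgreSQL connection URL from the CNPG service endpoints above
DB_USER=aisrv
DB_HOST=aisrv-db-rw
DB_PORT=5432
DB_NAME=aisrv
echo "postgresql://${DB_USER}:<password>@${DB_HOST}:${DB_PORT}/${DB_NAME}"
```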
Performance Tuning
vLLM Configuration Constraints
Ensure your settings are compatible: AISRV_LLM_CONTEXT_SIZE plus AISRV_LLM_WORD_LIMIT must fit within VLLM_MAX_MODEL_LEN.
Example:
- VLLM_MAX_MODEL_LEN: "32768"
- AISRV_LLM_CONTEXT_SIZE: "20000"
- AISRV_LLM_WORD_LIMIT: "12000"
- Total: 32000 tokens ✓ (fits within 32768)
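The arithmetic above can be checked with a quick shell sketch (values mirror the example; substitute your own before deploying):

```shell
# Sanity check: context + output budget must fit the vLLM max model length
CONTEXT=20000        # AISRV_LLM_CONTEXT_SIZE
WORD_LIMIT=12000     # AISRV_LLM_WORD_LIMIT
MAX_MODEL_LEN=32768  # VLLM_MAX_MODEL_LEN
TOTAL=$((CONTEXT + WORD_LIMIT))
if [ "$TOTAL" -le "$MAX_MODEL_LEN" ]; then
  echo "OK: $TOTAL tokens fit within $MAX_MODEL_LEN"
else
  echo "Incompatible: $TOTAL tokens exceed $MAX_MODEL_LEN" >&2
fi
```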
GPU Memory Optimization
If experiencing OOM errors:
- Reduce VLLM_GPU_MEMORY_UTILIZATION from 0.9 to 0.8
- Reduce VLLM_MAX_MODEL_LEN to match your actual needs
- Reduce VLLM_MAX_NUM_SEQS to limit concurrent requests
- Use quantized models (AWQ, GPTQ)
Storage Class Selection
Choose an appropriate storage class for your cluster and/or provider: