
RAGSRV Configuration

Overview

RAGSRV (Retrieval-Augmented Generation Server) provides vector database and document processing capabilities for the basebox AI platform. It handles document ingestion, embedding generation, and semantic search, and integrates with language models for RAG-based responses. The service stores vectors in PostgreSQL using the pgvector extension.

Deployment

RAGSRV is deployed to Kubernetes clusters via a Helm chart, together with an integrated PostgreSQL database managed by CloudNativePG. GPU resources are recommended for embedding performance; a CPU-only mode is available for clusters without GPUs.

Helm Chart Configuration

Basic Settings

Parameter         Default                                           Description
replicaCount      1                                                 Number of RAGSRV pod replicas
image.repository  gitea.basebox.health/basebox-distribution/ragsrv  Container image repository
image.pullPolicy  IfNotPresent                                      Image pull policy
image.tag         gpu-latest                                        Image tag (GPU-enabled version)
fullnameOverride  ragsrv                                            Override the full name of the deployment

Service Configuration

Parameter     Default    Description
service.type  ClusterIP  Kubernetes service type
service.port  3001       Service port

Resource Management

Parameter                                   Default  Description
resources                                   {}       CPU/memory resource requests and limits
autoscaling.enabled                         false    Enable horizontal pod autoscaling
autoscaling.minReplicas                     1        Minimum number of replicas
autoscaling.maxReplicas                     100      Maximum number of replicas
autoscaling.targetCPUUtilizationPercentage  80       Target CPU utilization for scaling

Health Checks

Parameter       Default  Description
livenessProbe   {}       Liveness probe configuration (disabled by default)
readinessProbe  {}       Readiness probe configuration (disabled by default)

Database Configuration

Database Settings

Parameter           Default                                                 Description
database.enabled    true                                                    Enable database creation
database.imageName  ghcr.io/cloudnative-pg/postgresql:16-standard-bookworm  PostgreSQL image
database.host       ragsrv-db-rw                                            Database host (read-write service)
database.port       5432                                                    Database port
database.user       ragsrv                                                  Database username
database.password   <secure-password>                                       Database password
database.name       ragsrv                                                  Database name

CloudNativePG Cluster Settings

Parameter                                      Default  Description
ragsrv-db.cluster.instances                    1        Number of PostgreSQL instances
ragsrv-db.cluster.storage.size                 5Gi      Storage size for database
ragsrv-db.cluster.storage.storageClass         default  Storage class to use
ragsrv-db.cluster.monitoring.enablePodMonitor  false    Enable Prometheus monitoring

Database Extensions

The database is automatically initialized with:

  • pgvector extension for vector similarity search
  • Custom schema from sql/01-schema.sql (provided via ConfigMap)
  • SUPERUSER role for application user

Environment Variables

Core Configuration

Variable             Default     Description
COMPUTE              gpu         Compute mode (gpu or cpu)
HOST                 0.0.0.0     Server host to bind to
SERVER_PORT          3001        Server port
LOG_LEVEL            trace       Logging level (trace, debug, info, warn, error)
API_KEY              (required)  API key for authentication (no default; must be set)
API_KEY_HEADER       X-API-KEY   HTTP header name for API key
SUPPORT_HTTP_STREAM  true        Enable HTTP streaming support

Database Connection (from Secrets)

Variable     Source                  Description
DB_HOST      ragsrv-database secret  Database hostname
DB_PORT      ragsrv-database secret  Database port
DB_USER      ragsrv-database secret  Database username
DB_NAME      ragsrv-database secret  Database name
DB_PASSWORD  ragsrv-database secret  Database password

OAuth/Authentication

Variable        Description
REQUIRE_AUTH    Enable/disable authentication requirement
OAUTH_IDP_URL   OAuth Identity Provider URL
OAUTH_IDP_AUD   OAuth audience claim
OAUTH_JWKS_URL  JWKS endpoint for token verification
ALLOWED_IPS     Comma-separated list of allowed IP addresses

GraphQL Configuration

Variable                     Default  Description
GRAPHQL_ALLOW_INTROSPECTION  true     Allow GraphQL introspection queries
GRAPHQL_GRAPHIQL             true     Enable GraphiQL interface

Integration Configuration

Variable              Description
SUPPORT_SERVICE_URL   URL of ragsrv-support service for model processing
WEBHOOK_STATE_URL     Webhook URL for state updates (AISRV endpoint)
OWNER_NAMESPACE_UUID  Default namespace UUID for ownership

Operational Settings

Variable           Default  Description
LOG_HEALTH_CHECKS  false    Log health check requests

Storage Configuration

Persistent Volumes

RAGSRV can use persistent volumes for temporary file storage:

volumes:
  - name: ragsrv-tmp
    persistentVolumeClaim:
      claimName: <name>

volumeMounts:
  - name: ragsrv-tmp
    mountPath: /tmp/ragsrv

Configuration Examples

Production Configuration

# values-production.yaml
replicaCount: 2

image:
  repository: gitea.basebox.health/basebox-distribution/ragsrv
  tag: "v1.2.3"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 3001

resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1

livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 5

database:
  enabled: true
  imageName: ghcr.io/cloudnative-pg/postgresql:16-standard-bookworm
  host: ragsrv-db-rw
  port: 5432
  user: ragsrv
  password: "<generate-secure-password>"
  name: ragsrv

ragsrv-db:
  cluster:
    instances: 3
    storage:
      size: 50Gi
      storageClass: fast-ssd
    monitoring:
      enablePodMonitor: true

volumes:
  - name: ragsrv-tmp
    persistentVolumeClaim:
      claimName: ragsrv-shared-storage

volumeMounts:
  - name: ragsrv-tmp
    mountPath: /tmp/ragsrv

nodeSelector:
  nvidia.com/gpu: "true"

env:
  COMPUTE: "gpu"
  SUPPORT_HTTP_STREAM: "true"
  API_KEY: "<generate-secure-api-key>"  # placeholder only; see API Key Management under Security Considerations
  LOG_LEVEL: "info"
  HOST: "0.0.0.0"
  SERVER_PORT: "3001"

  # OAuth/Auth
  REQUIRE_AUTH: "true"
  OAUTH_IDP_URL: "http://idp:8080/realms/production"
  OAUTH_IDP_AUD: "ragsrv"
  OAUTH_JWKS_URL: "http://idp:8080/realms/production/protocol/openid-connect/certs"

  # Integration
  SUPPORT_SERVICE_URL: "http://ragsrv-support:8000"
  WEBHOOK_STATE_URL: "http://aisrv:8888/rag/v1/state"
  OWNER_NAMESPACE_UUID: "<generate-uuid>"

  # GraphQL
  GRAPHQL_ALLOW_INTROSPECTION: "false"
  GRAPHQL_GRAPHIQL: "false"

  # API
  API_KEY_HEADER: "X-API-KEY"
  LOG_HEALTH_CHECKS: "false"

  # Database (from secrets)
  DB_HOST:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: host
  DB_PORT:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: port
  DB_USER:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: username
  DB_NAME:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: name
  DB_PASSWORD:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: password
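
The production example above leaves three placeholders to fill in: <generate-secure-password>, <generate-secure-api-key>, and <generate-uuid>. The following is a minimal sketch of generating such values in a POSIX shell. The helper names gen_hex and gen_uuid are illustrative and not part of the chart, and the byte lengths are arbitrary choices.

```shell
# Sketch: generate values for the placeholders in values-production.yaml.
# gen_hex and gen_uuid are illustrative helper names, not part of the chart.

# Print N random bytes from /dev/urandom as 2*N lowercase hex characters.
gen_hex() {
  head -c "$1" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Print a random UUID; falls back to uuidgen where /proc is unavailable.
gen_uuid() {
  if [ -r /proc/sys/kernel/random/uuid ]; then
    cat /proc/sys/kernel/random/uuid
  else
    uuidgen | tr 'A-Z' 'a-z'
  fi
}

# Example (not executed here): generate each value once and keep it.
#   DB_PASSWORD=$(gen_hex 24)
#   API_KEY=$(gen_hex 32)
#   OWNER_UUID=$(gen_uuid)
```

Generate each value once and store it safely. Regenerating on every upgrade would change the database password and API key under running clients. Assuming the chart reads the keys as shown in the example, the values could then be passed with Helm's --set-string flag (for instance --set-string env.API_KEY="$API_KEY") instead of being written into the values file.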

CPU-Only Configuration

For environments without GPU support:

# values-cpu.yaml
image:
  tag: "cpu-latest"

resources:
  requests:
    cpu: 4000m
    memory: 16Gi
  limits:
    cpu: 8000m
    memory: 32Gi

env:
  COMPUTE: "cpu"
  # ... other configuration

High Availability Configuration

# values-ha.yaml
replicaCount: 3

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1

ragsrv-db:
  cluster:
    instances: 3
    storage:
      size: 100Gi
      storageClass: fast-ssd

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - ragsrv
          topologyKey: kubernetes.io/hostname

Installation

Prerequisites

  • Kubernetes cluster (1.23+)
  • Helm 3.x
  • CloudNativePG operator installed
  • GPU nodes with NVIDIA GPU Operator (for GPU mode)
  • Storage provisioner with dynamic provisioning

Install CloudNativePG Operator

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg

Install RAGSRV Chart

# Install with custom values
helm install ragsrv oci://hub.basebox.ai/helm/ragsrv \
  --values values-production.yaml \
  --namespace basebox \
  --create-namespace

# Verify installation
kubectl get pods -n basebox -l app.kubernetes.io/name=ragsrv
kubectl get cluster -n basebox ragsrv-db
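
The verification commands above show a snapshot. For scripted installs it can help to block until both components are ready. The sketch below assumes the workload is a Deployment named ragsrv (following fullnameOverride) and that the CloudNativePG Cluster resource reports a Ready condition; wait_for_ragsrv is an illustrative helper name.

```shell
# Block until the ragsrv-db cluster and the RAGSRV Deployment report ready.
# Arguments: namespace (default basebox), timeout (default 300s).
# Assumes a Deployment named "ragsrv"; adjust if the chart names it differently.
wait_for_ragsrv() {
  ns="${1:-basebox}"
  timeout="${2:-300s}"
  kubectl wait --for=condition=Ready cluster/ragsrv-db -n "$ns" --timeout="$timeout" &&
    kubectl rollout status deployment/ragsrv -n "$ns" --timeout="$timeout"
}
```

Calling wait_for_ragsrv basebox 600s after helm install returns a non-zero status if either resource does not become ready within the timeout.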

Upgrade

helm upgrade ragsrv oci://hub.basebox.ai/helm/ragsrv \
  --values values-production.yaml \
  --namespace basebox

Uninstall

helm uninstall ragsrv --namespace basebox

# Optionally delete the database PVCs (this can permanently remove the stored data)
kubectl delete pvc -n basebox -l cnpg.io/cluster=ragsrv-db

Verification

Check Deployment

# Check RAGSRV pods
kubectl get pods -n basebox -l app.kubernetes.io/name=ragsrv

# Check database cluster
kubectl get cluster -n basebox ragsrv-db

# View logs
kubectl logs -n basebox -l app.kubernetes.io/name=ragsrv --tail=100

# Check GPU allocation (if using GPU mode)
kubectl describe pod -n basebox -l app.kubernetes.io/name=ragsrv | grep -A5 "nvidia.com/gpu"

Test Health Endpoint

# Port forward (keep this running and use a second terminal for curl)
kubectl port-forward -n basebox svc/ragsrv 3001:3001

# Test health endpoint
curl http://localhost:3001/health
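
Right after a rollout the service may need some time before /health responds, so a single curl can fail spuriously. A small retry loop is one way to handle this. The sketch below uses an illustrative helper name (wait_for_health) and arbitrary default attempt and delay values.

```shell
# Poll a health URL until curl succeeds or the attempts run out.
# Arguments: url, max attempts (default 30), delay in seconds (default 2).
wait_for_health() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error statuses; -s hides progress output.
    if curl -fs -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}
```

With the port forward running, wait_for_health http://localhost:3001/health would retry for about a minute with the defaults before giving up.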

Test GraphQL API

# Access GraphiQL interface (requires GRAPHQL_GRAPHIQL=true; the production example disables it)
kubectl port-forward -n basebox svc/ragsrv 3001:3001

# Open browser to http://localhost:3001/graphql
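
When GraphiQL is disabled, the endpoint can still be exercised from the command line. The sketch below sends the minimal query { __typename }, which is valid against any GraphQL schema, along with the API key header. It assumes the /graphql path accepts JSON POST requests authenticated by the API key alone; with REQUIRE_AUTH enabled an OAuth bearer token may be needed as well. graphql_ping is an illustrative helper name.

```shell
# Send a minimal GraphQL query with the API key header and print the response.
# Arguments: base URL, API key, header name (default X-API-KEY).
# Assumption: /graphql accepts a JSON POST authenticated by the API key header.
graphql_ping() {
  base_url="$1"
  api_key="$2"
  header="${3:-X-API-KEY}"
  curl -fsS -X POST "$base_url/graphql" \
    -H "Content-Type: application/json" \
    -H "$header: $api_key" \
    --data '{"query":"{ __typename }"}'
}
```

With the port forward running, graphql_ping http://localhost:3001 "$API_KEY" prints the JSON response, or fails with a non-zero status on an HTTP error.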

Secrets Management

The chart creates the ragsrv-database secret automatically with database connection details:

  • host - Database hostname
  • port - Database port
  • name - Database name
  • username - Database username
  • password - Database password

Important: Change default passwords in production.
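
To check what the ragsrv-database secret actually contains, its fields can be read and base64-decoded with kubectl. A minimal sketch, with db_secret_field as an illustrative helper name:

```shell
# Print one decoded field of the ragsrv-database secret.
# Arguments: key (host, port, name, username or password), namespace (default basebox).
db_secret_field() {
  key="$1"
  ns="${2:-basebox}"
  kubectl get secret ragsrv-database -n "$ns" -o "jsonpath={.data.$key}" | base64 -d
}
```

For example, db_secret_field host prints the hostname RAGSRV connects to. Avoid printing the password field in shared terminals or CI logs.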

Database Schema Initialization

The database is initialized with:

  1. pgvector extension - Vector similarity search
  2. Custom schema - Loaded from sql/01-schema.sql via ConfigMap
  3. SUPERUSER role - Required for extension management

The schema SQL file is embedded in the chart and applied during database initialization.
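
To confirm that initialization installed pgvector, the extension can be looked up in pg_extension, where pgvector registers under the name vector. The sketch below assumes that psql is available inside the CloudNativePG instance pod and can connect locally as postgres; pgvector_version is an illustrative helper name.

```shell
# Print the installed pgvector version (empty output if the extension is missing).
# Arguments: namespace (default basebox), database name (default ragsrv).
pgvector_version() {
  ns="${1:-basebox}"
  db="${2:-ragsrv}"
  # Pick an instance pod of the ragsrv-db cluster via the CloudNativePG label.
  pod=$(kubectl get pods -n "$ns" -l cnpg.io/cluster=ragsrv-db -o name | head -n 1)
  [ -n "$pod" ] || { echo "no ragsrv-db pod found in $ns" >&2; return 1; }
  # Assumption: local psql access as the postgres user works inside the pod.
  kubectl exec -n "$ns" "$pod" -- psql -U postgres -d "$db" -tAc \
    "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
}
```

An empty result from pgvector_version suggests the extension was not created in the ragsrv database.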

GPU Configuration

GPU Requirements

  • NVIDIA GPU with CUDA support
  • NVIDIA GPU Operator installed on cluster
  • Nodes labeled with nvidia.com/gpu=true

GPU Resource Allocation

resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1

nodeSelector:
  nvidia.com/gpu: "true"

Verify GPU Access

# Check GPU allocation
kubectl describe pod -n basebox <ragsrv-pod-name> | grep -A5 Limits

# Check GPU usage inside pod
kubectl exec -n basebox <ragsrv-pod-name> -- nvidia-smi
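
If a pod stays Pending, it is worth checking whether any node both carries the expected label and advertises allocatable GPUs. A minimal sketch, with list_gpu_nodes as an illustrative helper name:

```shell
# List nodes labeled nvidia.com/gpu=true together with their allocatable GPU count.
list_gpu_nodes() {
  # The dot inside the resource name must be escaped in the custom-columns path.
  kubectl get nodes -l nvidia.com/gpu=true \
    -o 'custom-columns=NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```

An empty list means no node matches the nodeSelector used above. A GPUS value of <none> means the node is labeled but does not currently expose GPUs to the scheduler.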

Integration with Other Services

AISRV Integration

RAGSRV sends state updates to AISRV via webhook:

env:
  WEBHOOK_STATE_URL: "http://aisrv:8888/rag/v1/state"

RAGSRV-Support Integration

RAGSRV delegates model inference to the support service:

env:
  SUPPORT_SERVICE_URL: "http://ragsrv-support:8000"

Authentication via IDP

OAuth/OIDC authentication through Keycloak:

env:
  OAUTH_IDP_URL: "http://idp:8080/realms/production"
  OAUTH_JWKS_URL: "http://idp:8080/realms/production/protocol/openid-connect/certs"
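
Token verification depends on RAGSRV being able to fetch the JWKS document. A quick reachability check can be sketched as follows. jwks_reachable is an illustrative helper name, and the check relies only on a JWK Set being a JSON object with a "keys" member.

```shell
# Succeed if the given JWKS URL returns a document containing a "keys" member.
jwks_reachable() {
  curl -fsS "$1" | grep -q '"keys"'
}
```

Because idp:8080 is a cluster-internal address, the check has to run from a pod that can resolve that hostname, for example: jwks_reachable "http://idp:8080/realms/production/protocol/openid-connect/certs" && echo ok.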

Security Considerations

API Key Management

  • Generate cryptographically secure API keys
  • Rotate API keys regularly
  • Store API keys in Kubernetes secrets (not in values files)
  • Use different API keys for different environments

Database Security

  • Use strong database passwords
  • Enable SSL/TLS for database connections in production
  • Take regular backups and test restore procedures
  • Monitor for unusual query patterns

Network Security

  • Use ClusterIP for internal-only services
  • Implement Network Policies to restrict traffic
  • Enable authentication (REQUIRE_AUTH: true) in production
  • Use IP allowlisting for additional security
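
As an illustration of the Network Policies item above, the sketch below prints a NetworkPolicy that admits traffic to RAGSRV pods on port 3001 only from AISRV pods. The policy name and the app.kubernetes.io/name=aisrv label on the client pods are assumptions. render_ragsrv_netpol is an illustrative helper name.

```shell
# Print a NetworkPolicy restricting ingress to RAGSRV pods on port 3001.
# Assumption: client pods carry the label app.kubernetes.io/name=aisrv.
# Argument: namespace (default basebox).
render_ragsrv_netpol() {
  ns="${1:-basebox}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ragsrv-ingress
  namespace: $ns
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ragsrv
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: aisrv
      ports:
        - protocol: TCP
          port: 3001
EOF
}
```

The manifest can be reviewed first and then applied with render_ragsrv_netpol basebox | kubectl apply -f -. It only takes effect if the cluster's network plugin enforces NetworkPolicy.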

Performance Tuning

GPU vs CPU Mode

  • GPU Mode: Faster embedding generation, lower latency, higher throughput
  • CPU Mode: Runs on any node without a GPU, at the cost of slower embedding generation and higher CPU and memory requirements

Resource Allocation

For GPU mode:

resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1

For CPU mode:

resources:
  requests:
    cpu: 4000m
    memory: 16Gi
  limits:
    cpu: 8000m
    memory: 32Gi

Database Performance

  • Increase the number of PostgreSQL instances (ragsrv-db.cluster.instances) to add read replicas
  • Use fast storage (SSD/NVMe) for better I/O
  • Monitor pgvector index performance
  • Tune PostgreSQL parameters for vector operations

Scaling Strategies

  • Horizontal scaling with autoscaling for variable load
  • Vertical scaling (more GPU/CPU) for consistent heavy load
  • Database read replicas for query-heavy workloads
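
To get a feel for how targetCPUUtilizationPercentage, minReplicas and maxReplicas interact, the core Horizontal Pod Autoscaler calculation, desired = ceil(current * observed / target), can be reproduced in a few lines. This is a simplified sketch: the real controller also applies a tolerance band and stabilization windows, which are omitted here. hpa_desired_replicas is an illustrative helper name.

```shell
# Replica count the autoscaler would aim for: ceil(current * observed / target),
# clamped to the min/max bounds. Tolerance and stabilization are not modeled.
# Arguments: current replicas, observed CPU %, target CPU %, min, max.
hpa_desired_replicas() {
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  echo "$desired"
}
```

With the high availability values (target 70, min 3, max 10), hpa_desired_replicas 3 95 70 3 10 prints 5: three replicas averaging 95% CPU would be scaled to five.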

Monitoring and Metrics

Health Checks

Configure health probes for production:

livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 60
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10

Database Monitoring

Enable CloudNativePG monitoring:

ragsrv-db:
  cluster:
    monitoring:
      enablePodMonitor: true