Copyright © 2022- basebox GmbH, all rights reserved.
Licensed to be used in conjunction with basebox, only.

RAGSRV Configuration

Overview

RAGSRV (Retrieval-Augmented Generation Server) provides vector database and document processing capabilities for the basebox AI platform. It handles document ingestion, embedding generation, and semantic search, and it integrates with language models to produce RAG-based responses. The service uses PostgreSQL with the pgvector extension for vector storage.

Deployment

RAGSRV is deployed to Kubernetes clusters via a Helm chart, together with an integrated PostgreSQL database managed by CloudNativePG. GPU resources are recommended for embedding performance; a CPU-only mode is also available (see CPU-Only Configuration).

Helm Chart Configuration

Basic Settings

Parameter	Default	Description
`replicaCount`	`1`	Number of RAGSRV pod replicas
`image.repository`	`gitea.basebox.health/basebox-distribution/ragsrv`	Container image repository
`image.pullPolicy`	`IfNotPresent`	Image pull policy
`image.tag`	`gpu-latest`	Image tag (GPU-enabled version)
`fullnameOverride`	`ragsrv`	Override the full name of the deployment

Service Configuration

Parameter	Default	Description
`service.type`	`ClusterIP`	Kubernetes service type
`service.port`	`3001`	Service port

Resource Management

Parameter	Default	Description
`resources`	`{}`	CPU/memory resource requests and limits
`autoscaling.enabled`	`false`	Enable horizontal pod autoscaling
`autoscaling.minReplicas`	`1`	Minimum number of replicas
`autoscaling.maxReplicas`	`100`	Maximum number of replicas
`autoscaling.targetCPUUtilizationPercentage`	`80`	Target CPU utilization for scaling
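
As a rough illustration of how these autoscaling parameters interact, the sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current * currentUtilization / target), and clamps the result to the configured minimum and maximum. The function name desired_replicas is made up for this example, and the sketch ignores the autoscaler's tolerance band and stabilization windows. CPU utilization is measured relative to the pod's CPU request, so resources.requests.cpu needs to be set (the default resources value is empty) for the target to be meaningful.

```shell
# Sketch of the HPA replica calculation (illustrative helper, not part of the chart).
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_CPU_PERCENT TARGET_PERCENT MIN MAX
desired_replicas() {
  current=$1
  utilization=$2
  target=$3
  min=$4
  max=$5

  # Integer ceiling of current * utilization / target.
  desired=$(( (current * utilization + target - 1) / target ))

  # Clamp to autoscaling.minReplicas / autoscaling.maxReplicas.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi

  echo "$desired"
}

# Chart defaults: target 80%, min 1, max 100. One replica running at 160% CPU.
desired_replicas 1 160 80 1 100   # prints 2
```

With the defaults, a single replica at twice the target utilization is scaled to two replicas; the maxReplicas default of 100 is rarely a practical ceiling when each replica requests a GPU.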

Health Checks

Parameter	Default	Description
`livenessProbe`	`{}`	Liveness probe configuration (disabled by default)
`readinessProbe`	`{}`	Readiness probe configuration (disabled by default)

Database Configuration

Database Settings

Parameter	Default	Description
`database.enabled`	`true`	Enable database creation
`database.imageName`	`ghcr.io/cloudnative-pg/postgresql:16-standard-bookworm`	PostgreSQL image
`database.host`	`ragsrv-db-rw`	Database host (read-write service)
`database.port`	`5432`	Database port
`database.user`	`ragsrv`	Database username
`database.password`	`<secure-password>`	Database password
`database.name`	`ragsrv`	Database name

CloudNativePG Cluster Settings

Parameter	Default	Description
`ragsrv-db.cluster.instances`	`1`	Number of PostgreSQL instances
`ragsrv-db.cluster.storage.size`	`5Gi`	Storage size for database
`ragsrv-db.cluster.storage.storageClass`	`default`	Storage class to use
`ragsrv-db.cluster.monitoring.enablePodMonitor`	`false`	Enable Prometheus monitoring

Database Extensions

The database is automatically initialized with:

pgvector extension for vector similarity search
Custom schema from sql/01-schema.sql (provided via ConfigMap)
SUPERUSER role for the application user

Environment Variables

Core Configuration

Variable	Default	Description
`COMPUTE`	`gpu`	Compute mode (`gpu` or `cpu`)
`HOST`	`0.0.0.0`	Server host to bind to
`SERVER_PORT`	`3001`	Server port
`LOG_LEVEL`	`trace`	Logging level (`trace`, `debug`, `info`, `warn`, `error`)
`API_KEY`	Required	API key for authentication
`API_KEY_HEADER`	`X-API-KEY`	HTTP header name for API key
`SUPPORT_HTTP_STREAM`	`true`	Enable HTTP streaming support
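
The allowed values in this table can be checked before deploying. The following is a minimal sketch, assuming the values are case-sensitive and that unset variables fall back to the defaults listed above; validate_core_env is a hypothetical helper for illustration and is not shipped with RAGSRV.

```shell
# Pre-deploy sanity check for the core variables (illustrative helper, not part of RAGSRV).
# Unset variables fall back to the documented defaults; API_KEY has no default.
validate_core_env() {
  status=0

  case "${COMPUTE:-gpu}" in
    gpu|cpu) ;;
    *) echo "COMPUTE must be gpu or cpu" >&2; status=1 ;;
  esac

  case "${LOG_LEVEL:-trace}" in
    trace|debug|info|warn|error) ;;
    *) echo "LOG_LEVEL must be one of trace, debug, info, warn, error" >&2; status=1 ;;
  esac

  # SERVER_PORT must consist of digits only.
  case "${SERVER_PORT:-3001}" in
    *[!0-9]*) echo "SERVER_PORT must be numeric" >&2; status=1 ;;
  esac

  if [ -z "${API_KEY:-}" ]; then
    echo "API_KEY is required" >&2
    status=1
  fi

  return "$status"
}

if validate_core_env; then
  echo "core settings look valid"
else
  echo "core settings need attention"
fi
```

Running the check in the shell that exports the intended values reports every problem at once rather than stopping at the first one.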

Database Connection (from Secrets)

Variable	Source	Description
`DB_HOST`	`ragsrv-database` secret	Database hostname
`DB_PORT`	`ragsrv-database` secret	Database port
`DB_USER`	`ragsrv-database` secret	Database username
`DB_NAME`	`ragsrv-database` secret	Database name
`DB_PASSWORD`	`ragsrv-database` secret	Database password
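
For debugging it can help to connect to the database with the same values the service receives. The sketch below assembles a PostgreSQL connection URI from the DB_* variables, falling back to the chart's database.* defaults. How RAGSRV itself builds its connection is not described here, so build_db_uri is only an illustrative helper. The host ragsrv-db-rw is a cluster-internal service name, so the URI is usable only from inside the cluster or with the host and port adjusted for a port-forward.

```shell
# Assemble a libpq-style connection URI from the DB_* variables (illustrative helper).
# Defaults mirror the chart's database.* defaults. The password is left out on purpose
# so that it does not end up in shell history or process listings.
build_db_uri() {
  printf 'postgresql://%s@%s:%s/%s\n' \
    "${DB_USER:-ragsrv}" \
    "${DB_HOST:-ragsrv-db-rw}" \
    "${DB_PORT:-5432}" \
    "${DB_NAME:-ragsrv}"
}

build_db_uri
```

A possible use is PGPASSWORD="$DB_PASSWORD" psql "$(build_db_uri)" -c '\dx', which lists the installed extensions; pgvector appears there under the name vector.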

OAuth/Authentication

Variable	Description
`REQUIRE_AUTH`	Enable/disable authentication requirement
`OAUTH_IDP_URL`	OAuth Identity Provider URL
`OAUTH_IDP_AUD`	OAuth audience claim
`OAUTH_JWKS_URL`	JWKS endpoint for token verification
`ALLOWED_IPS`	Comma-separated list of allowed IP addresses
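
To make the ALLOWED_IPS format concrete, the sketch below checks whether an address is an exact entry in a comma-separated list. It only illustrates the list format: whether RAGSRV tolerates spaces or accepts CIDR ranges is not stated here, and ip_allowed is a hypothetical helper, not the service's actual matching logic.

```shell
# Return 0 if the address in $1 is an exact entry of the comma-separated list in $2.
# Illustrative only; exact string matching, no CIDR support.
ip_allowed() {
  candidate=$1
  [ -n "$candidate" ] || return 1

  # Strip spaces so "a, b" and "a,b" are treated alike, then wrap the list in
  # commas so that only whole entries can match.
  list=",$(printf '%s' "$2" | tr -d ' '),"

  case "$list" in
    *",$candidate,"*) return 0 ;;
    *) return 1 ;;
  esac
}

ALLOWED_IPS="10.0.0.5,192.168.1.10"
if ip_allowed "192.168.1.10" "$ALLOWED_IPS"; then
  echo "allowed"
else
  echo "denied"
fi
# prints "allowed"
```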

GraphQL Configuration

Variable	Default	Description
`GRAPHQL_ALLOW_INTROSPECTION`	`true`	Allow GraphQL introspection queries
`GRAPHQL_GRAPHIQL`	`true`	Enable GraphiQL interface

Integration Configuration

Variable	Description
`SUPPORT_SERVICE_URL`	URL of ragsrv-support service for model processing
`WEBHOOK_STATE_URL`	Webhook URL for state updates (AISRV endpoint)
`OWNER_NAMESPACE_UUID`	Default namespace UUID for ownership

Operational Settings

Variable	Default	Description
`LOG_HEALTH_CHECKS`	`false`	Log health check requests

Storage Configuration

Persistent Volumes

RAGSRV can use persistent volumes for temporary file storage:

volumes:
  - name: ragsrv-tmp
    persistentVolumeClaim:
      claimName: <name>

volumeMounts:
  - name: ragsrv-tmp
    mountPath: /tmp/ragsrv

Configuration Examples

Production Configuration

# values-production.yaml
replicaCount: 2

image:
  repository: gitea.basebox.health/basebox-distribution/ragsrv
  tag: "v1.2.3"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 3001

resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1

livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 5

database:
  enabled: true
  imageName: ghcr.io/cloudnative-pg/postgresql:16-standard-bookworm
  host: ragsrv-db-rw
  port: 5432
  user: ragsrv
  password: "<generate-secure-password>"
  name: ragsrv

ragsrv-db:
  cluster:
    instances: 3
    storage:
      size: 50Gi
      storageClass: fast-ssd
    monitoring:
      enablePodMonitor: true

volumes:
  - name: ragsrv-tmp
    persistentVolumeClaim:
      claimName: ragsrv-shared-storage

volumeMounts:
  - name: ragsrv-tmp
    mountPath: /tmp/ragsrv

nodeSelector:
  nvidia.com/gpu: "true"

env:
  COMPUTE: "gpu"
  SUPPORT_HTTP_STREAM: "true"
  API_KEY: "<generate-secure-api-key>"
  LOG_LEVEL: "info"
  HOST: "0.0.0.0"
  SERVER_PORT: "3001"

  # OAuth/Auth
  REQUIRE_AUTH: "true"
  OAUTH_IDP_URL: "http://idp:8080/realms/production"
  OAUTH_IDP_AUD: "ragsrv"
  OAUTH_JWKS_URL: "http://idp:8080/realms/production/protocol/openid-connect/certs"

  # Integration
  SUPPORT_SERVICE_URL: "http://ragsrv-support:8000"
  WEBHOOK_STATE_URL: "http://aisrv:8888/rag/v1/state"
  OWNER_NAMESPACE_UUID: "<generate-uuid>"

  # GraphQL
  GRAPHQL_ALLOW_INTROSPECTION: "false"
  GRAPHQL_GRAPHIQL: "false"

  # API
  API_KEY_HEADER: "X-API-KEY"
  LOG_HEALTH_CHECKS: "false"

  # Database (from secrets)
  DB_HOST:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: host
  DB_PORT:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: port
  DB_USER:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: username
  DB_NAME:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: name
  DB_PASSWORD:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: password
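
The <generate-...> placeholders in the values above need real values before installation. One possible way to produce them on a Linux host is sketched below; the helper names random_hex and new_uuid are made up for this example, and the chosen lengths are assumptions rather than requirements documented for RAGSRV.

```shell
# Helpers for filling the <generate-...> placeholders (illustrative, not part of the chart).

# Print N random bytes from /dev/urandom as lowercase hex (2 characters per byte).
random_hex() {
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print a random UUID, preferring uuidgen and falling back to the Linux kernel source.
new_uuid() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen | tr 'A-Z' 'a-z'
  else
    cat /proc/sys/kernel/random/uuid
  fi
}

API_KEY=$(random_hex 32)           # 64 hex characters
DB_PASSWORD=$(random_hex 24)       # 48 hex characters
OWNER_NAMESPACE_UUID=$(new_uuid)

# Print only lengths for the secrets so they do not leak into terminal scrollback.
echo "API key length:     ${#API_KEY}"
echo "DB password length: ${#DB_PASSWORD}"
echo "Namespace UUID:     ${OWNER_NAMESPACE_UUID}"
```

Hex output avoids characters that would need quoting or escaping in YAML values and connection strings.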

CPU-Only Configuration

For environments without GPU support:

# values-cpu.yaml
image:
  tag: "cpu-latest"

resources:
  requests:
    cpu: 4000m
    memory: 16Gi
  limits:
    cpu: 8000m
    memory: 32Gi

env:
  COMPUTE: "cpu"
  # ... other configuration

High Availability Configuration

# values-ha.yaml
replicaCount: 3

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1

ragsrv-db:
  cluster:
    instances: 3
    storage:
      size: 100Gi
      storageClass: fast-ssd

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - ragsrv
          topologyKey: kubernetes.io/hostname

Installation

Prerequisites

Kubernetes cluster (1.23+)
Helm 3.x
CloudNativePG operator installed
GPU nodes with NVIDIA GPU Operator (for GPU mode)
Storage provisioner with dynamic provisioning
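
Some of these prerequisites can be checked from a workstation before installing. The sketch below provides two small helpers: one reports required command-line tools that are missing, the other compares a version string against the 1.23 minimum. Both helper names are hypothetical, and the comparison looks only at the major and minor components.

```shell
# Pre-flight helpers (illustrative names, not part of the chart).

# Print the names of any listed commands that are not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Return 0 if version $1 is at least version $2, comparing MAJOR.MINOR only.
# Accepts forms such as 1.23, v1.28.3 or 1.27+.
version_ge() {
  have=${1#v}
  want=${2#v}
  have_major=${have%%.*}
  want_major=${want%%.*}
  have_rest=${have#*.}
  want_rest=${want#*.}
  # Keep only the leading digits of the minor component.
  have_minor=${have_rest%%[!0-9]*}
  want_minor=${want_rest%%[!0-9]*}

  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
    return
  fi
  [ "$have_minor" -ge "$want_minor" ]
}

missing=$(missing_tools kubectl helm)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
fi

if version_ge "v1.28.3" "1.23"; then
  echo "Kubernetes version is sufficient"
fi
```

In practice the first argument to version_ge would be the server version reported by kubectl version for the target cluster.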

Install CloudNativePG Operator

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg

Install RAGSRV Chart

# Install with custom values
helm install ragsrv oci://gitea.basebox.health/basebox-distribution/helm/ragsrv \
  --values values-production.yaml \
  --namespace basebox \
  --create-namespace

# Verify installation
kubectl get pods -n basebox -l app.kubernetes.io/name=ragsrv
kubectl get cluster -n basebox ragsrv-db

Upgrade

helm upgrade ragsrv oci://gitea.basebox.health/basebox-distribution/helm/ragsrv \
  --values values-production.yaml \
  --namespace basebox

Uninstall

helm uninstall ragsrv --namespace basebox

# Delete the database PVCs if needed (this permanently removes the stored data)
kubectl delete pvc -n basebox -l cnpg.io/cluster=ragsrv-db

Verification

Check Deployment

# Check RAGSRV pods
kubectl get pods -n basebox -l app.kubernetes.io/name=ragsrv

# Check database cluster
kubectl get cluster -n basebox ragsrv-db

# View logs
kubectl logs -n basebox -l app.kubernetes.io/name=ragsrv --tail=100

# Check GPU allocation (if using GPU mode)
kubectl describe pod -n basebox -l app.kubernetes.io/name=ragsrv | grep -A5 "nvidia.com/gpu"

Test Health Endpoint

# Port forward (this command blocks; run the curl test from a second terminal)
kubectl port-forward -n basebox svc/ragsrv 3001:3001

# Test health endpoint
curl http://localhost:3001/health
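
Right after an install or upgrade the service may need some time before it answers. A minimal retry loop around the same health request is sketched below; wait_for_health is a hypothetical helper, and the attempt count and delay are arbitrary example values. It treats any HTTP success status from /health as healthy, which is an assumption about the endpoint's behavior.

```shell
# Poll a health URL until it answers successfully (illustrative helper).
# Usage: wait_for_health URL ATTEMPTS DELAY_SECONDS
wait_for_health() {
  url=$1
  attempts=$2
  delay=$3

  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP error statuses (4xx/5xx).
    if curl --silent --fail --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example call (requires the port-forward from above to be running):
#   wait_for_health http://localhost:3001/health 30 2 && echo "ragsrv is healthy"
```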

Test GraphQL API

# Access GraphiQL interface (available only when GRAPHQL_GRAPHIQL is "true")
kubectl port-forward -n basebox svc/ragsrv 3001:3001

# Open browser to http://localhost:3001/graphql

Secrets Management

The chart automatically creates the ragsrv-database secret with the database connection details:

host - Database hostname
port - Database port
name - Database name
username - Database username
password - Database password

Important: Do not keep default or placeholder passwords in production; set a strong, unique value for database.password.

Database Schema Initialization

The database is initialized with:

pgvector extension - Vector similarity search
Custom schema - Loaded from sql/01-schema.sql via ConfigMap
SUPERUSER role - Required for extension management

The schema SQL file is embedded in the chart and applied during database initialization.

GPU Configuration

GPU Requirements

NVIDIA GPU with CUDA support
NVIDIA GPU Operator installed on cluster
Nodes labeled with nvidia.com/gpu=true (the label matched by the nodeSelector below)

GPU Resource Allocation

resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1

nodeSelector:
  nvidia.com/gpu: "true"

Verify GPU Access

# Check GPU allocation
kubectl describe pod -n basebox <ragsrv-pod-name> | grep -A5 Limits

# Check GPU usage inside pod
kubectl exec -n basebox <ragsrv-pod-name> -- nvidia-smi

Integration with Other Services

AISRV Integration

RAGSRV sends state updates to AISRV via webhook:

env:
  WEBHOOK_STATE_URL: "http://aisrv:8888/rag/v1/state"

RAGSRV-Support Integration

RAGSRV delegates model inference to the support service:

env:
  SUPPORT_SERVICE_URL: "http://ragsrv-support:8000"

Authentication via IDP

OAuth/OIDC authentication through Keycloak:

env:
  OAUTH_IDP_URL: "http://idp:8080/realms/production"
  OAUTH_JWKS_URL: "http://idp:8080/realms/production/protocol/openid-connect/certs"

Security Considerations

API Key Management

Generate cryptographically secure API keys
Rotate API keys regularly
Store API keys in Kubernetes secrets (not in values files)
Use different API keys for different environments
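
One way to keep the API key out of values files is to create a Kubernetes Secret separately and reference it from the environment configuration. The sketch below renders such a Secret manifest. The secret name ragsrv-api-key, the key name api-key, and the helper render_api_key_secret are all hypothetical. Whether the chart accepts a valueFrom reference for API_KEY in the same way it is shown for the DB_* variables is an assumption that should be checked against the chart.

```shell
# Render a Secret manifest holding the API key (illustrative helper).
# Usage: render_api_key_secret SECRET_NAME NAMESPACE API_KEY
render_api_key_secret() {
  name=$1
  namespace=$2
  key=$3

  # stringData lets the API server handle base64 encoding.
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
  namespace: ${namespace}
type: Opaque
stringData:
  api-key: "${key}"
EOF
}

# Example with a placeholder value; in practice pipe the output to: kubectl apply -f -
render_api_key_secret ragsrv-api-key basebox "<generate-secure-api-key>"
```

The environment entry could then use the secretKeyRef form shown for DB_PASSWORD in the production example, with name ragsrv-api-key and key api-key.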

Database Security

Use strong database passwords
Enable SSL/TLS for database connections in production
Take regular backups and test the restore procedure
Monitor for unusual query patterns

Network Security

Use ClusterIP for internal-only services
Implement Network Policies to restrict traffic
Enable authentication (REQUIRE_AUTH: true) in production
Use IP allowlisting for additional security

Performance Tuning

GPU vs CPU Mode

GPU Mode: Faster embedding generation, lower latency, higher throughput
CPU Mode: More flexible, no GPU required, higher memory usage

Resource Allocation

For GPU mode:

resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1

For CPU mode:

resources:
  requests:
    cpu: 4000m
    memory: 16Gi
  limits:
    cpu: 8000m
    memory: 32Gi

Database Performance

Increase the number of PostgreSQL instances (ragsrv-db.cluster.instances) to add read replicas
Use fast storage (SSD/NVMe) for better I/O
Monitor pgvector index performance
Tune PostgreSQL parameters for vector operations

Scaling Strategies

Horizontal scaling with autoscaling for variable load
Vertical scaling (more GPU/CPU) for consistent heavy load
Database read replicas for query-heavy workloads

Monitoring and Metrics

Health Checks

Configure health probes for production:

livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 60
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10

Database Monitoring

Enable CloudNativePG monitoring:

ragsrv-db:
  cluster:
    monitoring:
      enablePodMonitor: true