Licensed to be used in conjunction with basebox, only.
RAGSRV Configuration
Overview
RAGSRV (Retrieval-Augmented Generation Server) provides vector database and document processing capabilities for the basebox AI platform. It handles document ingestion, embedding generation, semantic search, and integrates with language models for RAG-based responses. The service uses PostgreSQL with the pgvector extension for vector storage.
Deployment
RAGSRV is deployed via Helm chart to Kubernetes clusters with an integrated PostgreSQL database managed by CloudNativePG. It requires GPU resources for optimal embedding performance.
Helm Chart Configuration
Basic Settings
| Parameter | Default | Description |
|---|---|---|
| `replicaCount` | `1` | Number of RAGSRV pod replicas |
| `image.repository` | `gitea.basebox.health/basebox-distribution/ragsrv` | Container image repository |
| `image.pullPolicy` | `IfNotPresent` | Image pull policy |
| `image.tag` | `gpu-latest` | Image tag (GPU-enabled version) |
| `fullnameOverride` | `ragsrv` | Override the full name of the deployment |
Service Configuration
| Parameter | Default | Description |
|---|---|---|
| `service.type` | `ClusterIP` | Kubernetes service type |
| `service.port` | `3001` | Service port |
Resource Management
| Parameter | Default | Description |
|---|---|---|
| `resources` | `{}` | CPU/memory resource requests and limits |
| `autoscaling.enabled` | `false` | Enable horizontal pod autoscaling |
| `autoscaling.minReplicas` | `1` | Minimum number of replicas |
| `autoscaling.maxReplicas` | `100` | Maximum number of replicas |
| `autoscaling.targetCPUUtilizationPercentage` | `80` | Target CPU utilization for scaling |
Health Checks
| Parameter | Default | Description |
|---|---|---|
| `livenessProbe` | `{}` | Liveness probe configuration (disabled by default) |
| `readinessProbe` | `{}` | Readiness probe configuration (disabled by default) |
Database Configuration
Database Settings
| Parameter | Default | Description |
|---|---|---|
| `database.enabled` | `true` | Enable database creation |
| `database.imageName` | `ghcr.io/cloudnative-pg/postgresql:16-standard-bookworm` | PostgreSQL image |
| `database.host` | `ragsrv-db-rw` | Database host (read-write service) |
| `database.port` | `5432` | Database port |
| `database.user` | `ragsrv` | Database username |
| `database.password` | `<secure-password>` | Database password |
| `database.name` | `ragsrv` | Database name |
CloudNativePG Cluster Settings
| Parameter | Default | Description |
|---|---|---|
| `ragsrv-db.cluster.instances` | `1` | Number of PostgreSQL instances |
| `ragsrv-db.cluster.storage.size` | `5Gi` | Storage size for database |
| `ragsrv-db.cluster.storage.storageClass` | `default` | Storage class to use |
| `ragsrv-db.cluster.monitoring.enablePodMonitor` | `false` | Enable Prometheus monitoring |
Database Extensions
The database is automatically initialized with:
- pgvector extension for vector similarity search
- Custom schema from sql/01-schema.sql (provided via ConfigMap)
- SUPERUSER role for the application user
Environment Variables
Core Configuration
| Variable | Default | Description |
|---|---|---|
| `COMPUTE` | `gpu` | Compute mode (`gpu` or `cpu`) |
| `HOST` | `0.0.0.0` | Server host to bind to |
| `SERVER_PORT` | `3001` | Server port |
| `LOG_LEVEL` | `trace` | Logging level (`trace`, `debug`, `info`, `warn`, `error`) |
| `API_KEY` | Required | API key for authentication |
| `API_KEY_HEADER` | `X-API-KEY` | HTTP header name for API key |
| `SUPPORT_HTTP_STREAM` | `true` | Enable HTTP streaming support |
Database Connection (from Secrets)
| Variable | Source | Description |
|---|---|---|
| `DB_HOST` | `ragsrv-database` secret | Database hostname |
| `DB_PORT` | `ragsrv-database` secret | Database port |
| `DB_USER` | `ragsrv-database` secret | Database username |
| `DB_NAME` | `ragsrv-database` secret | Database name |
| `DB_PASSWORD` | `ragsrv-database` secret | Database password |
OAuth/Authentication
| Variable | Description |
|---|---|
| `REQUIRE_AUTH` | Enable/disable authentication requirement |
| `OAUTH_IDP_URL` | OAuth Identity Provider URL |
| `OAUTH_IDP_AUD` | OAuth audience claim |
| `OAUTH_JWKS_URL` | JWKS endpoint for token verification |
| `ALLOWED_IPS` | Comma-separated list of allowed IP addresses |
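To illustrate how these variables might be combined, the fragment below enables authentication and restricts callers by IP. It follows the `env` map layout used in the production example in this document. The two addresses are placeholders, and since this page does not say whether `ALLOWED_IPS` accepts CIDR ranges, only plain addresses are shown.

```yaml
env:
  REQUIRE_AUTH: "true"
  # Placeholder addresses (assumption): replace with the real client IPs.
  ALLOWED_IPS: "10.0.0.5,10.0.0.6"
```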
GraphQL Configuration
| Variable | Default | Description |
|---|---|---|
| `GRAPHQL_ALLOW_INTROSPECTION` | `true` | Allow GraphQL introspection queries |
| `GRAPHQL_GRAPHIQL` | `true` | Enable GraphiQL interface |
Integration Configuration
| Variable | Description |
|---|---|
| `SUPPORT_SERVICE_URL` | URL of ragsrv-support service for model processing |
| `WEBHOOK_STATE_URL` | Webhook URL for state updates (AISRV endpoint) |
| `OWNER_NAMESPACE_UUID` | Default namespace UUID for ownership |
Operational Settings
| Variable | Default | Description |
|---|---|---|
| `LOG_HEALTH_CHECKS` | `false` | Log health check requests |
Storage Configuration
Persistent Volumes
RAGSRV can use persistent volumes for temporary file storage:
volumes:
  - name: ragsrv-tmp
    persistentVolumeClaim:
      claimName: <name>
volumeMounts:
  - name: ragsrv-tmp
    mountPath: /tmp/ragsrv
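The claim referenced by `claimName` has to exist before the pod can start. A minimal sketch of such a claim follows, assuming the name `ragsrv-shared-storage` from the production example later in this document and the `basebox` namespace from the installation commands. The access mode and size are assumptions to adjust for the cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ragsrv-shared-storage
  namespace: basebox
spec:
  accessModes:
    # Assumption: ReadWriteMany lets replicas on different nodes share the
    # volume. Many provisioners only support ReadWriteOnce.
    - ReadWriteMany
  resources:
    requests:
      # Assumption: size the claim to the expected volume of temporary files.
      storage: 10Gi
```

With no `storageClassName` set, the cluster's default storage class is used.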
Configuration Examples
Production Configuration
# values-production.yaml
replicaCount: 2
image:
  repository: gitea.basebox.health/basebox-distribution/ragsrv
  tag: "v1.2.3"
  pullPolicy: IfNotPresent
service:
  type: ClusterIP
  port: 3001
resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 5
database:
  enabled: true
  imageName: ghcr.io/cloudnative-pg/postgresql:16-standard-bookworm
  host: ragsrv-db-rw
  port: 5432
  user: ragsrv
  password: "<generate-secure-password>"
  name: ragsrv
ragsrv-db:
  cluster:
    instances: 3
    storage:
      size: 50Gi
      storageClass: fast-ssd
    monitoring:
      enablePodMonitor: true
volumes:
  - name: ragsrv-tmp
    persistentVolumeClaim:
      claimName: ragsrv-shared-storage
volumeMounts:
  - name: ragsrv-tmp
    mountPath: /tmp/ragsrv
nodeSelector:
  nvidia.com/gpu: "true"
env:
  COMPUTE: "gpu"
  SUPPORT_HTTP_STREAM: "true"
  API_KEY: "<generate-secure-api-key>"
  LOG_LEVEL: "info"
  HOST: "0.0.0.0"
  SERVER_PORT: "3001"
  # OAuth/Auth
  REQUIRE_AUTH: "true"
  OAUTH_IDP_URL: "http://idp:8080/realms/production"
  OAUTH_IDP_AUD: "ragsrv"
  OAUTH_JWKS_URL: "http://idp:8080/realms/production/protocol/openid-connect/certs"
  # Integration
  SUPPORT_SERVICE_URL: "http://ragsrv-support:8000"
  WEBHOOK_STATE_URL: "http://aisrv:8888/rag/v1/state"
  OWNER_NAMESPACE_UUID: "<generate-uuid>"
  # GraphQL
  GRAPHQL_ALLOW_INTROSPECTION: "false"
  GRAPHQL_GRAPHIQL: "false"
  # API
  API_KEY_HEADER: "X-API-KEY"
  LOG_HEALTH_CHECKS: "false"
  # Database (from secrets)
  DB_HOST:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: host
  DB_PORT:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: port
  DB_USER:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: username
  DB_NAME:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: name
  DB_PASSWORD:
    valueFrom:
      secretKeyRef:
        name: ragsrv-database
        key: password
CPU-Only Configuration
For environments without GPU support:
# values-cpu.yaml
image:
  tag: "cpu-latest"
resources:
  requests:
    cpu: 4000m
    memory: 16Gi
  limits:
    cpu: 8000m
    memory: 32Gi
env:
  COMPUTE: "cpu"
  # ... other configuration
High Availability Configuration
# values-ha.yaml
replicaCount: 3
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1
ragsrv-db:
  cluster:
    instances: 3
    storage:
      size: 100Gi
      storageClass: fast-ssd
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - ragsrv
          topologyKey: kubernetes.io/hostname
Installation
Prerequisites
- Kubernetes cluster (1.23+)
- Helm 3.x
- CloudNativePG operator installed
- GPU nodes with NVIDIA GPU Operator (for GPU mode)
- Storage provisioner with dynamic provisioning
Install CloudNativePG Operator
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg
Install RAGSRV Chart
# Install with custom values
helm install ragsrv oci://hub.basebox.ai/helm/ragsrv \
  --values values-production.yaml \
  --namespace basebox \
  --create-namespace
# Verify installation
kubectl get pods -n basebox -l app.kubernetes.io/name=ragsrv
kubectl get cluster -n basebox ragsrv-db
Upgrade
helm upgrade ragsrv oci://hub.basebox.ai/helm/ragsrv \
  --values values-production.yaml \
  --namespace basebox
Uninstall
helm uninstall ragsrv --namespace basebox
# Delete PVCs if needed
kubectl delete pvc -n basebox -l cnpg.io/cluster=ragsrv-db
Verification
Check Deployment
# Check RAGSRV pods
kubectl get pods -n basebox -l app.kubernetes.io/name=ragsrv
# Check database cluster
kubectl get cluster -n basebox ragsrv-db
# View logs
kubectl logs -n basebox -l app.kubernetes.io/name=ragsrv --tail=100
# Check GPU allocation (if using GPU mode)
kubectl describe pod -n basebox -l app.kubernetes.io/name=ragsrv | grep -A5 "nvidia.com/gpu"
Test Health Endpoint
# Port forward
kubectl port-forward -n basebox svc/ragsrv 3001:3001
# Test health endpoint
curl http://localhost:3001/health
Test GraphQL API
# Access GraphiQL interface
kubectl port-forward -n basebox svc/ragsrv 3001:3001
# Open browser to http://localhost:3001/graphql
Secrets Management
The chart creates the ragsrv-database secret automatically with database connection details:
- host - Database hostname
- port - Database port
- name - Database name
- username - Database username
- password - Database password
Important: Change default passwords in production.
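For orientation, a Secret carrying the keys listed above would have roughly the following shape. This is an illustrative sketch assembled from the chart defaults documented on this page, not the manifest the chart actually renders. The chart creates the Secret itself, so nothing here needs to be applied by hand.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ragsrv-database
  namespace: basebox   # assumption: the namespace used in the install commands
type: Opaque
stringData:
  host: ragsrv-db-rw
  port: "5432"         # Secret values are strings, so the port is quoted
  name: ragsrv
  username: ragsrv
  password: "<secure-password>"
```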
Database Schema Initialization
The database is initialized with:
- pgvector extension - Vector similarity search
- Custom schema - Loaded from `sql/01-schema.sql` via ConfigMap
- SUPERUSER role - Required for extension management
The schema SQL file is embedded in the chart and applied during database initialization.
GPU Configuration
GPU Requirements
- NVIDIA GPU with CUDA support
- NVIDIA GPU Operator installed on cluster
- Node labeled with `nvidia.com/gpu=true`
GPU Resource Allocation
resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1
nodeSelector:
  nvidia.com/gpu: "true"
Verify GPU Access
# Check GPU allocation
kubectl describe pod -n basebox <ragsrv-pod-name> | grep -A5 Limits
# Check GPU usage inside pod
kubectl exec -n basebox <ragsrv-pod-name> -- nvidia-smi
Integration with Other Services
AISRV Integration
RAGSRV sends state updates to AISRV via a webhook whose target is set with `WEBHOOK_STATE_URL`.
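As a sketch, the webhook target can be supplied through the `env` map. The URL below is the one from the production example in this document; the host name and port depend on how AISRV is exposed in the cluster.

```yaml
env:
  # AISRV endpoint that receives RAGSRV state updates (value from the
  # production example; adjust host and port to the actual AISRV service).
  WEBHOOK_STATE_URL: "http://aisrv:8888/rag/v1/state"
```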
RAGSRV-Support Integration
RAGSRV delegates model inference to the support service, whose address is set with `SUPPORT_SERVICE_URL`.
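As a sketch, the support service address can be supplied through the `env` map. The URL is taken from the production example in this document and assumes the support service is reachable as `ragsrv-support` on port 8000 in the same namespace.

```yaml
env:
  # Address of the ragsrv-support service (value from the production example).
  SUPPORT_SERVICE_URL: "http://ragsrv-support:8000"
```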
Authentication via IDP
OAuth/OIDC authentication through Keycloak:
env:
  OAUTH_IDP_URL: "http://idp:8080/realms/production"
  OAUTH_JWKS_URL: "http://idp:8080/realms/production/protocol/openid-connect/certs"
Security Considerations
API Key Management
- Generate cryptographically secure API keys
- Rotate API keys regularly
- Store API keys in Kubernetes secrets (not in values files)
- Use different API keys for different environments
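One way to keep the key out of values files is sketched below. It assumes the chart accepts a `valueFrom.secretKeyRef` entry for `API_KEY` in the same way the production example does for the `DB_*` variables, which this page does not confirm. The Secret name `ragsrv-api-key` and its key `api-key` are hypothetical.

```yaml
# Hypothetical Secret holding the API key (name and key are assumptions).
apiVersion: v1
kind: Secret
metadata:
  name: ragsrv-api-key
  namespace: basebox
type: Opaque
stringData:
  api-key: "<generate-secure-api-key>"
---
# Values fragment: reference the Secret instead of a literal value.
env:
  API_KEY:
    valueFrom:
      secretKeyRef:
        name: ragsrv-api-key
        key: api-key
```

The Secret is applied separately from the chart values, so the values file can be committed without containing the key.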
Database Security
- Use strong database passwords
- Enable SSL/TLS for database connections in production
- Take regular backups and test the restore procedures
- Monitor for unusual query patterns
Network Security
- Use `ClusterIP` for internal-only services
- Implement Network Policies to restrict traffic
- Enable authentication (`REQUIRE_AUTH: true`) in production
- Use IP allowlisting for additional security
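A Network Policy along these lines could restrict inbound traffic to the RAGSRV port. The sketch assumes the pod label `app.kubernetes.io/name=ragsrv` used in the verification commands and the service port 3001. The policy name and the `aisrv` client label are hypothetical, and the policy only takes effect on clusters whose network plugin enforces Network Policies.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ragsrv-ingress          # hypothetical name
  namespace: basebox
spec:
  # Apply the policy to the RAGSRV pods.
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ragsrv
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Assumption: client pods carry this label; adjust to the real callers.
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: aisrv
      ports:
        - protocol: TCP
          port: 3001
```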
Performance Tuning
GPU vs CPU Mode
- GPU Mode: Faster embedding generation, lower latency, higher throughput
- CPU Mode: More flexible, no GPU required, higher memory usage
Resource Allocation
For GPU mode:
resources:
  requests:
    cpu: 2000m
    memory: 8Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 16Gi
    nvidia.com/gpu: 1
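For CPU mode, omit the GPU resource and raise the CPU and memory allocation, as in the CPU-Only Configuration example above.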
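Expressed as a values fragment, the CPU-mode allocation could look like the following. The figures mirror the CPU-Only Configuration example in this document and are a starting point rather than a measured recommendation.

```yaml
resources:
  requests:
    cpu: 4000m
    memory: 16Gi
  limits:
    cpu: 8000m
    memory: 32Gi
  # No nvidia.com/gpu entry: CPU mode does not request a GPU.
```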
Database Performance
- Increase PostgreSQL instances for read replicas
- Use fast storage (SSD/NVMe) for better I/O
- Monitor pgvector index performance
- Tune PostgreSQL parameters for vector operations
Scaling Strategies
- Horizontal scaling with autoscaling for variable load
- Vertical scaling (more GPU/CPU) for consistent heavy load
- Database read replicas for query-heavy workloads
Monitoring and Metrics
Health Checks
Configure health probes for production:
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 60
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
Database Monitoring
Enable CloudNativePG monitoring by setting `ragsrv-db.cluster.monitoring.enablePodMonitor` to `true`.
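As a values fragment, using the parameter path from the CloudNativePG Cluster Settings table, this could look like the sketch below. A PodMonitor is a Prometheus Operator resource, so the fragment assumes that operator's CRDs are already installed in the cluster.

```yaml
ragsrv-db:
  cluster:
    monitoring:
      # Creates a PodMonitor so Prometheus can scrape the database pods.
      enablePodMonitor: true
```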