Prerequisites for Deployment
Before deploying Langfuse, complete the following steps:
Step 1: Clone the Repository
Clone the codemie-helm-charts repository.
Step 2: Update langfuse/values.yaml
Before deployment, you MUST update the langfuse/values.yaml file with your environment-specific values:
```yaml
# Update the domain in these sections:
langfuse:
  langfuse:
    # Specify the Langfuse version
    image:
      tag: "3.129.0" # Pin to a specific version for production stability
    nextauth:
      url: "https://langfuse.%%DOMAIN%%" # Replace with your domain
  ingress:
    hosts:
      - host: "langfuse.%%DOMAIN%%" # Replace with your domain

# Adjust the storage class if needed:
global:
  defaultStorageClass: "your-storage-class" # e.g., "gp3", "standard", etc.

# Adjust resource limits based on your requirements:
langfuse:
  web:
    resources:
      limits:
        cpu: "2"      # Adjust as needed
        memory: "4Gi" # Adjust as needed
  worker:
    resources:
      limits:
        cpu: "2"      # Adjust as needed
        memory: "4Gi" # Adjust as needed

postgresql:
  deploy: false
  host: some-postgresql.database.example.com # Replace with your database host
  auth:
    username: langfuse_admin
    existingSecret: langfuse-postgresql
    secretKeys:
      userPasswordKey: password

# Adjust component resources:
clickhouse:
  resources:
    limits:
      cpu: "2"      # Adjust as needed
      memory: "8Gi" # Adjust as needed
  persistence:
    size: "100Gi" # Adjust as needed
  # Uncomment this to enable TTL (automatic deletion of old logs) for
  # ClickHouse system logs. Default is 90 days.
  # extraOverrides: |
  #   <clickhouse>
  #     <!--
  #       Standard system tables:
  #       These settings merge with the default configuration in
  #       /etc/clickhouse-server/config.xml. We simply inject the <ttl>
  #       parameter to enable retention, preserving the other default
  #       settings (partitioning, flush intervals).
  #     -->
  #     <query_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </query_log>
  #     <part_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </part_log>
  #     <trace_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </trace_log>
  #     <asynchronous_insert_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </asynchronous_insert_log>
  #     <asynchronous_metric_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </asynchronous_metric_log>
  #     <backup_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </backup_log>
  #     <blob_storage_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </blob_storage_log>
  #     <crash_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </crash_log>
  #     <metric_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </metric_log>
  #     <query_thread_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </query_thread_log>
  #     <query_views_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </query_views_log>
  #     <session_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </session_log>
  #     <zookeeper_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </zookeeper_log>
  #     <processors_profile_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </processors_profile_log>
  #     <text_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </text_log>
  #     <error_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </error_log>
  #     <query_metric_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </query_metric_log>
  #     <latency_log>
  #       <ttl>event_date + INTERVAL 90 DAY DELETE</ttl>
  #     </latency_log>
  #     <!--
  #       OpenTelemetry span log:
  #       We must fully redefine this table configuration (using replace="1")
  #       instead of merging. Adding a standalone <ttl> tag fails when
  #       <engine> is already defined in the default config.
  #       See bug: https://github.com/ClickHouse/ClickHouse/issues/88366
  #     -->
  #     <opentelemetry_span_log replace="1">
  #       <database>system</database>
  #       <table>opentelemetry_span_log</table>
  #       <engine>ENGINE = MergeTree PARTITION BY toYYYYMM(finish_date) ORDER BY (finish_date, finish_time_us, trace_id) TTL finish_date + INTERVAL 90 DAY DELETE</engine>
  #       <flush_interval_milliseconds>7500</flush_interval_milliseconds>
  #       <max_size_rows>1048576</max_size_rows>
  #       <reserved_size_rows>8192</reserved_size_rows>
  #       <buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
  #       <flush_on_crash>false</flush_on_crash>
  #     </opentelemetry_span_log>
  #   </clickhouse>

redis:
  persistence:
    size: "2Gi" # Adjust as needed

s3:
  persistence:
    size: "100Gi" # Adjust as needed

# Configure data retention policies for Langfuse (optional):
retention:
  langfuse:
    enabled: false             # Set to 'true' to automatically purge historical data
    observationsDays: 90       # Retain observations for 90 days (table: default.observations)
    tracesDays: 90             # Retain traces for 90 days (table: default.traces)
    blobstoragefilelogDays: 90 # Retain blob storage logs for 90 days (table: default.blob_storage_file_log)
```
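The `%%DOMAIN%%` placeholders must be substituted before deployment. A minimal sketch of doing this with `sed`, demonstrated here on a small stand-in file rather than the real `langfuse/values.yaml` (`example.com` stands in for your actual domain):

```shell
# Create a minimal stand-in for langfuse/values.yaml to demonstrate on
cat > /tmp/values.yaml <<'EOF'
langfuse:
  langfuse:
    nextauth:
      url: "https://langfuse.%%DOMAIN%%"
EOF

# Replace every %%DOMAIN%% occurrence in place (keeps a .bak backup)
DOMAIN="example.com"
sed -i.bak "s/%%DOMAIN%%/${DOMAIN}/g" /tmp/values.yaml

# Confirm the substitution
grep "langfuse.${DOMAIN}" /tmp/values.yaml
```

Running the same `sed` command against your real `langfuse/values.yaml` updates both the `nextauth.url` and ingress host entries in one pass.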
Step 3: Configure PostgreSQL
Configure a PostgreSQL instance running in your managed cloud.
3.1. Connect to PostgreSQL Database
Connect to the PostgreSQL database `codemie` using one of the following options, depending on your cloud provider:
- Some cloud providers have built-in query tools
- Deploy pgAdmin inside the cluster to access your private Postgres instance:
```shell
# Create namespace and secret
kubectl create ns pgadmin
kubectl create secret generic pgadmin4-credentials \
  --namespace pgadmin \
  --from-literal=password="$(openssl rand -hex 16)" \
  --type=Opaque

# Deploy the pgAdmin chart
helm upgrade --install pgadmin pgadmin/. -n pgadmin --values pgadmin/values.yaml --wait --timeout 900s --dependency-update

# Port-forward to the service
kubectl -n pgadmin port-forward svc/pgadmin-pgadmin4 8080:80

# Access via localhost:8080 with the secret from the pgadmin namespace
# Default user: "pgadmin4@example.com"

# Retrieve the pgAdmin password from the Kubernetes secret
kubectl -n pgadmin get secret pgadmin4-credentials -o jsonpath='{.data.password}' | base64 -d; echo
```
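The final `base64 -d` step works because Kubernetes stores secret data base64-encoded in the API; the round trip can be sanity-checked locally without a cluster:

```shell
# Secret values come back base64-encoded; decoding recovers the original.
password="s3cr3t-example"
encoded=$(printf '%s' "$password" | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)
[ "$decoded" = "$password" ] && echo "round-trip OK"
```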
3.2. Create Database and User
Execute the following SQL commands to create the database and user:
```sql
CREATE DATABASE postgres_langfuse;
CREATE USER langfuse_admin WITH PASSWORD 'your_strong_password_here';
GRANT ALL PRIVILEGES ON DATABASE postgres_langfuse TO langfuse_admin;
```
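The `'your_strong_password_here'` placeholder should be replaced with a generated value. One way to produce one, using the same `openssl` invocation as the pgAdmin secret above; the commented `kubectl` step (the `langfuse` namespace is an assumption — match your release namespace) stores it in the `langfuse-postgresql` secret that values.yaml references:

```shell
# Generate a random 32-hex-character password for langfuse_admin
PGPASS=$(openssl rand -hex 16)
echo "password length: ${#PGPASS}"

# Store it in the secret referenced by postgresql.auth.existingSecret:
# kubectl -n langfuse create secret generic langfuse-postgresql \
#   --from-literal=password="$PGPASS"
```

Use the same generated value in the `CREATE USER` statement so the secret and the database stay in sync.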
3.3. Grant Schema Privileges
Switch to the postgres_langfuse database and grant schema privileges:
```sql
GRANT ALL ON SCHEMA public TO langfuse_admin;
```
Step 4: Configure Data Retention (Optional)
To prevent disk overflow, configure TTL policies in values.yaml to automatically remove old data. Default retention: 90 days.
Langfuse Tables
Set `retention.langfuse.enabled: true` in values.yaml. TTL is configured for the following tables:
- default.observations
- default.traces
- default.blob_storage_file_log
These three tables consume the most disk space. Other Langfuse tables have minimal storage impact and do not require TTL configuration.
ClickHouse System Tables
Uncomment the `clickhouse.extraOverrides` section in values.yaml. TTL is configured for the following tables:
- system.query_log
- system.trace_log
- system.metric_log
- and other system tables
If you're enabling TTL on an existing Langfuse deployment or changing retention periods, TTL does not retroactively delete old data. You must manually clean up data that exceeds your new retention policy before redeploying.
Steps for changing TTL:
- Update values.yaml with the new retention settings (configure the desired TTL periods)
- Connect to ClickHouse - see how to connect to ClickHouse
- Delete old data manually - see Manual Data Deletion queries
- Redeploy Langfuse with `helm upgrade` to apply the new TTL configuration
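The final redeploy step might look like the following (the release name, namespace, and chart path are assumptions modeled on the pgAdmin command in Step 3.1 — adjust them to your setup):

```shell
helm upgrade --install langfuse langfuse/. -n langfuse \
  --values langfuse/values.yaml --wait --timeout 900s --dependency-update
```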
For detailed SQL queries and monitoring, see Data Volume Maintenance guide.