Skip to main content

CodeMie Native LLM Config

Native Integration Configuration

This section covers configuring models using CodeMie's native provider integration.

Native integration uses environment-specific YAML configuration files to define available models:

  • Configuration Path: /app/config/llms/ in CodeMie API container
  • File Naming Pattern: llm-<MODELS_ENV>-config.yaml
  • Environment Variable: MODELS_ENV determines which config file to load

Example: MODELS_ENV=production → loads llm-production-config.yaml

For detailed parameter descriptions, see the LLM Model Configuration Reference.

Reference Configurations

AWS Bedrock Configuration Example
# Configuration file for managing multiple LLM and embedding models
llm_models:
- base_name: "claude-4-5-haiku"
deployment_name: "us.anthropic.claude-haiku-4-5-20251001-v1:0"
label: "Bedrock Claude 4.5 Haiku"
multimodal: true
enabled: true
provider: "aws_bedrock"
max_output_tokens: 64000
cost:
input: 0.0000011
output: 0.0000055
cache_read_input_token_cost: 0.00000011
cache_creation_input_token_cost: 0.000001375

- base_name: "claude-4-5-sonnet"
deployment_name: "us.anthropic.claude-sonnet-4-5-20250929-v1:0"
label: "Bedrock Claude 4.5 Sonnet"
multimodal: true
enabled: true
provider: "aws_bedrock"
max_output_tokens: 64000
cost:
input: 0.0000033
output: 0.0000165
cache_read_input_token_cost: 0.00000033
cache_creation_input_token_cost: 0.000004125

- base_name: "claude-sonnet-4-6"
deployment_name: "us.anthropic.claude-sonnet-4-6"
label: "Bedrock Claude Sonnet 4.6"
multimodal: true
enabled: true
default_for_categories: [global]
provider: "aws_bedrock"
max_output_tokens: 64000
cost:
input: 0.0000033
output: 0.0000165
cache_read_input_token_cost: 0.00000033
cache_creation_input_token_cost: 0.000004125

- base_name: "us.meta.llama4-maverick-17b-instruct-v1:0"
deployment_name: "us.meta.llama4-maverick-17b-instruct-v1:0"
label: "LLaMa Maverick Instruct 17B"
multimodal: false
react_agent: false
enabled: true
provider: "aws_bedrock"
max_output_tokens: 8192
features:
streaming: false
tools: true
cost:
input: 0.00000024
output: 0.00000097
cache_read_input_token_cost: 0.00000024

- base_name: "claude-opus-4-5-20251101"
deployment_name: "us.anthropic.claude-opus-4-5-20251101-v1:0"
label: "Bedrock Claude Opus 4.5"
multimodal: true
enabled: true
provider: "aws_bedrock"
max_output_tokens: 64000
cost:
input: 0.0000055
output: 0.0000275
cache_read_input_token_cost: 0.00000055
cache_creation_input_token_cost: 0.000006875

- base_name: "claude-opus-4-6-20260205"
deployment_name: "us.anthropic.claude-opus-4-6-v1"
label: "Bedrock Claude Opus 4.6"
multimodal: true
enabled: true
provider: "aws_bedrock"
max_output_tokens: 128000
cost:
input: 0.0000055
output: 0.0000275
cache_read_input_token_cost: 0.00000055
cache_creation_input_token_cost: 0.000006875

embeddings_models:
- base_name: "titan"
deployment_name: "amazon.titan-embed-text-v2:0"
label: "Titan Embed Text v2.0"
enabled: true
default_for_categories: [global]
provider: "aws_bedrock"
cost:
input: 0.0000002
output: 0
Azure OpenAI Configuration Example
# Configuration file for managing multiple LLM and embedding models
# Keep it up to date as some models can be deprecated
# https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/model-retirements
llm_models:
- base_name: "gpt-4.1"
deployment_name: "gpt-4.1-2025-04-14"
label: "GPT-4.1 2025-04-14"
multimodal: true
enabled: true
provider: "azure_openai"
max_output_tokens: 32768
features:
max_tokens: false
cost:
input: 0.000002
output: 0.000008
input_cost_per_token_batches: 0.000001
output_cost_per_token_batches: 0.000004
cache_read_input_token_cost: 0.0000005

- base_name: "gpt-4.1-mini"
deployment_name: "gpt-4.1-mini-2025-04-14"
label: "GPT-4.1 mini 2025-04-14"
multimodal: true
enabled: true
provider: "azure_openai"
max_output_tokens: 32768
features:
max_tokens: false
cost:
input: 0.0000004
output: 0.0000016
input_cost_per_token_batches: 0.0000002
output_cost_per_token_batches: 0.0000008
cache_read_input_token_cost: 0.0000001

- base_name: "gpt-5-2025-08-07"
deployment_name: "gpt-5-2025-08-07"
label: "GPT-5 2025-08-07"
default_for_categories: [global]
multimodal: true
enabled: true
provider: "azure_openai"
max_output_tokens: 128000
features:
max_tokens: false
temperature: false
cost:
input: 0.00000125
output: 0.000010
cache_read_input_token_cost: 0.000000125

- base_name: "gpt-5-mini-2025-08-07"
deployment_name: "gpt-5-mini-2025-08-07"
label: "GPT-5 Mini 2025-08-07"
multimodal: true
enabled: true
provider: "azure_openai"
max_output_tokens: 128000
features:
max_tokens: false
temperature: false
cost:
input: 0.00000025
output: 0.000002
cache_read_input_token_cost: 0.000000025

- base_name: "gpt-5-nano-2025-08-07"
deployment_name: "gpt-5-nano-2025-08-07"
label: "GPT-5 Nano 2025-08-07"
multimodal: true
enabled: true
provider: "azure_openai"
max_output_tokens: 128000
features:
max_tokens: false
temperature: false
cost:
input: 0.00000005
output: 0.0000004
cache_read_input_token_cost: 0.000000005

- base_name: "gpt-4-vision"
deployment_name: "gpt-4-vision-preview"
multimodal: false
enabled: false
provider: "azure_openai"
cost:
input: 0.00001
output: 0.00003

- base_name: "o3-mini"
deployment_name: "o3-mini-2025-01-31"
label: "o3 Mini 2025-01-31"
multimodal: false
react_agent: false
enabled: true
provider: "azure_openai"
max_output_tokens: 100000
features:
streaming: false
tools: true
temperature: false
parallel_tool_calls: false
system_prompt: false
max_tokens: false
cost:
input: 0.0000011
output: 0.0000044
cache_read_input_token_cost: 0.00000055

- base_name: "o1"
deployment_name: "o1-2024-12-17"
label: "o1 2024-12-17"
multimodal: false
react_agent: false
enabled: true
provider: "azure_openai"
max_output_tokens: 100000
features:
streaming: false
tools: true
temperature: false
parallel_tool_calls: false
system_prompt: false
max_tokens: false
cost:
input: 0.000015
output: 0.00006
cache_read_input_token_cost: 0.0000075

- base_name: "o3-2025-04-16"
deployment_name: "o3-2025-04-16"
label: "o3 2025-04-16"
multimodal: true
react_agent: false
enabled: true
provider: "azure_openai"
max_output_tokens: 100000
features:
streaming: true
tools: true
temperature: false
parallel_tool_calls: false
system_prompt: false
max_tokens: false
reasoning: true
cost:
input: 0.000002
output: 0.000008
cache_read_input_token_cost: 0.0000005

- base_name: "o4-mini-2025-04-16"
deployment_name: "o4-mini-2025-04-16"
label: "o4-mini 2025-04-16"
multimodal: true
react_agent: false
enabled: true
provider: "azure_openai"
max_output_tokens: 100000
features:
streaming: true
tools: true
temperature: false
parallel_tool_calls: false
system_prompt: false
max_tokens: false
reasoning: true
cost:
input: 0.0000011
output: 0.0000044
cache_read_input_token_cost: 0.000000275

embeddings_models:
- base_name: "ada-002"
deployment_name: "text-embedding-ada-002"
label: "Text Embedding Ada"
enabled: true
default_for_categories: [global]
provider: "azure_openai"
cost:
input: 0.0000001
output: 0
Parallel tool calls and reasoning models

Standard models (gpt-4.1, Claude, Gemini) support parallel tool calls — the agent can issue multiple tool calls in one inference round and stream their results concurrently.

Reasoning models (o1, o3, o3-mini, o4-mini, and similar) do not support the parallel_tool_calls OpenAI parameter. Sending it causes an API error. Always set parallel_tool_calls: false in the features block for these models, as shown in the Azure examples above. When this flag is false, the platform automatically strips the parameter from every outgoing request to that model.

For standard models, you can omit the flag entirely — it defaults to false, and parallel execution is controlled by the model's own behavior. Set parallel_tool_calls: true only if you explicitly want to enable and advertise the capability for a specific model.

Google Vertex AI Configuration Example
# Configuration file for managing multiple LLM and embedding models
llm_models:
- base_name: "gemini-3-flash"
deployment_name: "gemini-3-flash-preview"
label: "Gemini 3 Flash"
multimodal: true
enabled: true
provider: "google_vertexai"
max_output_tokens: 65535
cost:
input: 0.0000005
output: 0.000003
cache_read_input_token_cost: 0.00000005

- base_name: "gemini-3.1-pro"
deployment_name: "gemini-3.1-pro-preview"
label: "Gemini 3.1 Pro"
multimodal: true
enabled: true
default_for_categories: [global]
provider: "google_vertexai"
max_output_tokens: 65536
cost:
input: 0.000002
output: 0.000012
cache_read_input_token_cost: 0.0000002
input_cost_per_token_batches: 0.000001
output_cost_per_token_batches: 0.000006

embeddings_models:
- base_name: "gecko"
deployment_name: "text-embedding-005"
label: "Text Embedding Gecko"
enabled: true
default_for_categories: [global]
provider: "google_vertexai"
cost:
input: 0.0000001
output: 0

Configuration Steps

Step 1: Create Model Configuration File

Create a custom model configuration YAML file with your LLM and embedding models:

llm_models:
- base_name: "gpt-4.1"
deployment_name: "gpt-4.1-2025-04-14"
label: "GPT-4.1 (Apr 2025)"
multimodal: true
enabled: true
default_for_categories: [global]
provider: "azure_openai"
max_output_tokens: 32768
features:
max_tokens: false
cost:
input: 0.000002
output: 0.000008
input_cost_per_token_batches: 0.000001
output_cost_per_token_batches: 0.000004
cache_read_input_token_cost: 0.0000005

- base_name: "gpt-4.1-mini"
deployment_name: "gpt-4.1-mini-2025-04-14"
label: "GPT-4.1 Mini (Apr 2025)"
multimodal: true
enabled: true
provider: "azure_openai"
max_output_tokens: 32768
features:
max_tokens: false
cost:
input: 0.0000004
output: 0.0000016
input_cost_per_token_batches: 0.0000002
output_cost_per_token_batches: 0.0000008
cache_read_input_token_cost: 0.0000001

- base_name: "claude-sonnet-4-6"
deployment_name: "us.anthropic.claude-sonnet-4-6"
label: "Bedrock Claude Sonnet 4.6"
multimodal: true
enabled: true
provider: "aws_bedrock"
max_output_tokens: 64000
cost:
input: 0.0000033
output: 0.0000165
cache_read_input_token_cost: 0.00000033
cache_creation_input_token_cost: 0.000004125

embeddings_models:
- base_name: "ada-002"
deployment_name: "text-embedding-ada-002"
label: "Text Embedding Ada 002"
enabled: true
provider: "azure_openai"
default_for_categories: [global]
cost:
input: 0.0000001
output: 0

Step 2: Update Helm Values

This step creates a Kubernetes ConfigMap with your model configuration and mounts it as a file inside the CodeMie API container.

Edit codemie-helm-charts/codemie-api/values.yaml:

# Set environment variable to load custom config
extraEnv:
- name: MODELS_ENV
value: production # Replace with your environment name (e.g., production, staging, dev)
- name: LLM_PROXY_MODE
value: internal # Use native provider integration

# Mount custom config file into the container
# This mounts the ConfigMap as a file at: /app/config/llms/llm-production-config.yaml
extraVolumeMounts: |
- name: codemie-llm-config # Reference to volume defined below
mountPath: /app/config/llms/llm-production-config.yaml # Full path inside container
subPath: llm-production-config.yaml # Key name from ConfigMap data

# Define volume that references the ConfigMap
extraVolumes: |
- name: codemie-llm-config # Volume name (referenced in extraVolumeMounts)
configMap:
name: codemie-llm-config # Name of the ConfigMap (defined in extraObjects)

# Create ConfigMap with model configuration
# The ConfigMap stores your YAML configuration and makes it available to mount
extraObjects:
- apiVersion: v1
kind: ConfigMap
metadata:
name: codemie-llm-config # ConfigMap name (referenced in extraVolumes)
data:
llm-production-config.yaml: | # Key name (referenced in subPath)
# Paste your model configuration here
llm_models:
- base_name: "gpt-4.1"
# ... (model config from Step 1)

How it works:

  1. ConfigMap Creation: extraObjects creates a ConfigMap named codemie-llm-config containing your model configuration
  2. Volume Definition: extraVolumes defines a volume that references the ConfigMap
  3. Volume Mount: extraVolumeMounts mounts the ConfigMap as a file at /app/config/llms/llm-production-config.yaml inside the container
  4. File Loading: CodeMie API reads the file based on the pattern /app/config/llms/llm-<MODELS_ENV>-config.yaml

Important: The file name in mountPath must match the pattern llm-<MODELS_ENV>-config.yaml where <MODELS_ENV> is the value of the MODELS_ENV environment variable. For example:

  • MODELS_ENV=production → loads /app/config/llms/llm-production-config.yaml
  • MODELS_ENV=staging → loads /app/config/llms/llm-staging-config.yaml

Step 3: Deploy Configuration

Before deploying, authenticate with the AI/Run CodeMie Helm registry:

export GOOGLE_APPLICATION_CREDENTIALS=key.json
gcloud auth application-default print-access-token | helm registry login -u oauth2accesstoken --password-stdin europe-west3-docker.pkg.dev

Then deploy using one of the following methods:

Option A: Using Automated Script (Recommended)

# Replace x.y.z with your version (e.g., 2.2.5)
bash helm-charts.sh --cloud <aws|azure|gcp> --version x.y.z --mode update

Option B: Manual Helm Upgrade

# Replace x.y.z with your version (e.g., 2.2.5)
helm upgrade --install codemie-api \
oci://europe-west3-docker.pkg.dev/or2-msq-epmd-edp-anthos-t1iylu/helm-charts/codemie \
--version x.y.z \
--namespace "codemie" \
-f "./codemie-api/values-<aws|azure|gcp>.yaml" \
--wait --timeout 600s
Full Deployment Guide

For detailed deployment instructions, troubleshooting, and additional options, see the Update AI/Run CodeMie documentation.

Step 4: Verify Models

Check that models are loaded successfully:

# View API logs
kubectl logs -n codemie deployment/codemie-api | grep "LLMConfig initiated"

# Expected output:
# LLMConfig initiated. Config=llm-production-config.yaml. LLMModels=[...]. EmbeddingModels=[...]

Configuration Parameters Reference

For detailed parameter descriptions, model categories, features, and embedding model configuration, see the LLM Model Configuration Reference.

Useful Resources

Model Information and Pricing

Provider Documentation