Azure OpenAI Configuration
Overview
This guide explains how to enable Azure OpenAI services in your Azure account and deploy AI models for use with AI/Run CodeMie. Azure OpenAI provides access to OpenAI models including GPT-4.1, GPT-5, o1, and o3-series models.
This configuration is required if you plan to use Azure OpenAI models such as GPT-4.1, GPT-5, or any OpenAI models hosted on Azure.
Prerequisites
Before starting, ensure you have:
- Azure Subscription: Active Azure subscription with appropriate permissions
- Resource Group: Existing or ability to create a resource group
- Regional Availability: Check Azure OpenAI model availability by region
- Quota Understanding: Review Azure OpenAI quota and limits
Step 1: Create Azure OpenAI Resource
1.1 Navigate to Azure Portal
- Go to Azure Portal
- Sign in with your Azure credentials
1.2 Create Azure OpenAI Service
- In the search bar, type "Azure OpenAI" and select Azure OpenAI
- Click Create or Create Azure OpenAI resource
- Fill in the required configuration:
Basics Tab:
| Field | Value | Description |
|---|---|---|
| Subscription | Your subscription | Azure subscription to use |
| Resource group | Select or create new | Logical grouping for resources |
| Region | Choose region | Select region closest to your users |
| Name | Unique name | Resource name (globally unique) |
| Pricing tier | S0 (Standard) | Recommended for production workloads |
Network Tab:
| Setting | Recommendation | Description |
|---|---|---|
| Network | All networks | Allow access from all networks (configure security later) |
- Click Review + create
- Review settings and click Create
- Wait for deployment to complete (typically 1-2 minutes)
Step 2: Configure API Access
2.1 Access Resource Keys and Endpoint
- Once deployment completes, click Go to resource
- In the left navigation, select Keys and Endpoint
- Note down the following information:
| Information | Where to Find | Purpose |
|---|---|---|
| Key 1 or Key 2 | Keys and Endpoint section | API authentication |
| Endpoint | Keys and Endpoint section | API base URL |
| Location | Overview page | Resource region |
2.2 Save Credentials Securely
Store the following information for later use:
# Example credentials (replace with your values)
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
AZURE_OPENAI_KEY=your-api-key-here
AZURE_OPENAI_REGION=eastus2
Step 3: Deploy AI Models
3.1 Access Azure AI Foundry Portal
- Return to your Azure OpenAI resource in Azure Portal
- On the Overview tab, click Go to Azure AI Foundry portal
- The Azure AI Foundry portal will open in a new tab
3.2 Navigate to Deployments
- In Azure AI Foundry portal, locate the left navigation menu
- Under Shared resources, click Deployments
- You'll see a list of existing deployments (if any)
3.3 Deploy a Model
- Click Deploy model or + Create new deployment
- Select Deploy base model
- Browse or search for your desired model (e.g., "gpt-4o")
- Select the model and click Confirm
3.4 Configure Deployment Settings
| Setting | Description |
|---|---|
| Deployment name | Identifier for this deployment |
| Deployment type | Standard or Global Standard |
| Model version | Specific model version |
| Tokens per Minute (TPM) | Rate limit for requests |
TPM Rate Limit Guidelines:
- Single Instance: Set to maximum available if this is your only deployment in the region
- Multiple Deployments: Divide quota across deployments based on expected usage
- Check Quota: Navigate to Quotas in left menu to see available capacity
- Review settings and click Deploy
- Wait for deployment to complete (typically 30-60 seconds)
3.5 Verify Deployment
- Once deployed, the model appears in your Deployments list
- Note the Deployment name - you'll use this in CodeMie configuration
- Verify Status shows as Succeeded
Step 4: Deploy Additional Models (Optional)
Repeat Step 3 to deploy additional models based on your requirements:
Recommended Models:
- GPT-4o: General-purpose, multimodal reasoning and code generation
- GPT-4o-mini: Cost-effective option for simpler tasks
- GPT-4.1: Advanced reasoning with larger context window
- GPT-5: Latest generation model with enhanced capabilities
- o1/o3-mini: Specialized reasoning models for complex problem-solving
- Embedding models:
text-embedding-ada-002for vector embeddings
Each model deployment allows you to set specific TPM limits and deployment configurations.
Step 5: Configure Regional Deployments (Optional)
For high availability and redundancy:
- Create Azure OpenAI resources in multiple regions
- Deploy same models in each region
- Configure load balancing using LiteLLM Proxy
Quota Management
Understanding Quotas
Azure OpenAI quotas are managed at the region + model level:
- Tokens Per Minute (TPM): Maximum tokens processed per minute
- Requests Per Minute (RPM): Maximum API requests per minute
- Deployment Limits: Maximum number of model deployments per resource
Viewing and Managing Quotas
- In Azure AI Foundry portal, click Quotas in left navigation
- View current quota allocation and usage
- Request quota increases if needed:
- Click Request quota
- Fill in justification and requirements
- Submit request (approval typically takes 2-5 business days)
Security Best Practices
Network Security
-
Enable Private Endpoints:
- Create private endpoint for Azure OpenAI resource
- Restrict access to your virtual network
-
Configure Network Rules:
- Navigate to resource Networking settings
- Add allowed IP ranges or virtual networks
- Enable Allow Azure services on the trusted services list