Chat Completions API
Introduction
This API provides access to state-of-the-art language models for chat completions. It is fully compatible with the OpenAI Chat Completions API specification, making it easy to integrate with existing applications and SDKs.
Authentication
All API requests require authentication using a Bearer token. Include your API key in the Authorization header of every request:
Authorization: Bearer YOUR_API_KEY
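As a minimal sketch, the header can be assembled with a small helper. The function name build_headers is hypothetical, and the Content-Type value is taken from the cURL examples in this document rather than from a stated requirement.

```python
def build_headers(api_key: str) -> dict:
    """Return request headers carrying the Bearer token (hypothetical helper)."""
    if not api_key:
        raise ValueError("API key must not be empty")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: JSON bodies, as in the cURL examples in this document
        "Content-Type": "application/json",
    }


headers = build_headers("YOUR_API_KEY")
print(headers["Authorization"])
```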
Available Models
The following models are available through the API. Prices shown are per 1 million tokens for prompt and completion, with separate pricing for image inputs where applicable.
Chat Completions: POST /api/v1/chat/completions
This endpoint generates chat completions based on a conversation history. It accepts a list of messages and returns the model's response.
Request
Send a POST request with a JSON body containing your messages and configuration parameters.
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | The ID of the model to use (see Available Models) |
| messages | array | An array of message objects representing the conversation history |
Message Object Structure
Each message in the messages array should have the following structure:
{
"role": "user", // "system", "user", or "assistant"
"content": "Hello!" // String or array (for multimodal)
}
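The structure above could be enforced on the client side with a small constructor. This is a sketch only: make_message and VALID_ROLES are hypothetical names, and the checks mirror just the three roles and two content shapes listed above.

```python
VALID_ROLES = {"system", "user", "assistant"}


def make_message(role: str, content) -> dict:
    """Build one message object, rejecting roles or content shapes not listed above."""
    if role not in VALID_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    # Content is a plain string, or a list of parts for multimodal input
    if not isinstance(content, (str, list)):
        raise TypeError("content must be a string or a list of content parts")
    return {"role": role, "content": content}


messages = [
    make_message("system", "You are a helpful assistant."),
    make_message("user", "Hello!"),
]
print(messages)
```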
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | float | 1.0 | Sampling temperature (0.0 to 2.0). Higher values make output more random. |
| max_tokens | integer | null | Maximum number of tokens to generate in the completion. |
| top_p | float | 1.0 | Nucleus sampling parameter (0.0 to 1.0). |
| frequency_penalty | float | 0.0 | Penalize repeated tokens (-2.0 to 2.0). |
| presence_penalty | float | 0.0 | Penalize tokens based on their presence (-2.0 to 2.0). |
| stream | boolean | false | If true, responses are streamed as Server-Sent Events. |
| stop | string or array | null | Stop sequence(s) where generation should end. |
| detect_media | boolean | false | If true, analyzes the conversation to detect whether media generation is needed. Cannot be used with streaming. |
| bypass_compliance | boolean | false | If true, bypass the default compliance system. |
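The ranges in the table can be checked before a request is sent. The following is a sketch of such a client-side check, not a description of the server's validation; check_params and PARAM_RANGES are hypothetical names, and the server remains the authority on what it accepts.

```python
# Ranges copied from the optional parameters table
PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_params(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for name, (low, high) in PARAM_RANGES.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    # The table states that detect_media cannot be used with streaming
    if params.get("stream") and params.get("detect_media"):
        problems.append("detect_media cannot be combined with stream")
    return problems


print(check_params({"temperature": 0.7, "max_tokens": 150}))
print(check_params({"temperature": 3.0, "stream": True, "detect_media": True}))
```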
Example Request
{
"model": "google/gemini-2.5-flash",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of France?"
}
],
"temperature": 0.7,
"max_tokens": 150
}
Response
A successful request returns a JSON object with the model's response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1699200000,
"model": "google/gemini-2.5-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 8,
"total_tokens": 28
}
}
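A response of this shape could be unpacked as in the sketch below. The sample body is copied from the example above; summarize_response is a hypothetical helper, and it assumes at least one entry in choices.

```python
import json

# Sample body copied from the example response above
raw_body = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699200000,
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 8, "total_tokens": 28}
}
"""


def summarize_response(body: dict) -> dict:
    """Pull out the reply text, finish reason, and total token count."""
    first_choice = body["choices"][0]
    return {
        "text": first_choice["message"]["content"],
        "finish_reason": first_choice["finish_reason"],
        "total_tokens": body["usage"]["total_tokens"],
    }


summary = summarize_response(json.loads(raw_body))
print(summary["text"])
```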
Streaming Responses
When stream is set to true, the API returns the response as Server-Sent Events (SSE). Each event is a line that starts with the prefix "data: " followed by a JSON chunk object.
Example streaming response chunks:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699200000,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699200000,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699200000,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}
data: [DONE]
The stream ends with data: [DONE].
Note: The detect_media parameter cannot be used with streaming. If both stream and detect_media are set to true, the API returns an error.
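When not using an SDK, the chunk format above can be consumed by hand. The sketch below works on lines that have already been read from the response; collect_stream_text is a hypothetical helper, and the sample lines are shortened versions of the chunks shown above.

```python
import json


def collect_stream_text(lines) -> str:
    """Concatenate delta.content from SSE lines until the [DONE] sentinel."""
    parts = []
    for line in lines:
        line = line.strip()
        # Skip blank separator lines and anything that is not a data event
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample_lines))
```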
Media Detection
When detect_media is set to true, the API analyzes the conversation to determine if the assistant's response indicates that media generation is required. This feature uses an additional LLM call to intelligently detect intent and agreement between user and assistant.
Combine this parameter with a system prompt that tells the model how to handle media requests.
Example:
You are allowed to send pictures if asked. Avoid describing the image; just act as if you sent it. The system will add the right image automatically.
Supported Media Types
| Media Type | Example Trigger Keywords | Use Case |
|---|---|---|
| image | Draw, Show me, Picture, Illustrate, Visualize | Image generation prompts |
| video | Animate, Video of, Show movement, Film | Video generation prompts for animation tools |
| voice | Say that, Speak, Voice message, Narrate | Text-to-speech prompts with emotional context |
| music | Compose, Play music, Song about, Create a beat | Music generation prompts with genre and mood |
Example Request with Media Detection
{
"model": "google/gemini-2.5-flash",
"messages": [
{
"role": "user",
"content": "Can you draw me a sunset over the ocean?"
}
],
"detect_media": true
}
Response with Media Object
When media is detected, the response includes a media object with the appropriate media type and generation prompt:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1699200000,
"model": "google/gemini-2.5-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "I'd be happy to create that for you! Here's a beautiful sunset scene."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 15,
"total_tokens": 40
},
"media": {
"image": "A breathtaking sunset over a calm ocean, with vibrant orange and pink hues reflecting on the water, wispy clouds in the sky, peaceful and serene atmosphere, photorealistic, 8k, cinematic composition"
}
}
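Client code usually needs to tell a plain reply from one that carries media. The sketch below reads the media object from a parsed response body; extract_media is a hypothetical helper, and treating a missing or null media key as "nothing detected" is an assumption, since the behavior without a detection is not specified here.

```python
def extract_media(body: dict) -> dict:
    """Return the media object, or an empty dict when none is present."""
    media = body.get("media")
    # Assumption: the key is absent or null when no media was detected
    return media if isinstance(media, dict) else {}


with_media = {
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Here it is."}}],
    "media": {"image": "A breathtaking sunset over a calm ocean"},
}
without_media = {"choices": []}

for media_type, prompt in extract_media(with_media).items():
    print(media_type, "->", prompt)
print(extract_media(without_media))
```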
Media Prompt Examples
The API generates context-aware prompts for different media types:
// Image generation
"media": {
"image": "A majestic wolf howling at a full moon, aggressive stance, detailed fur texture, dark forest background, dramatic lighting, tattoo art style, high contrast black and white"
}
// Video generation
"media": {
"video": "Camera slowly pans across a bustling city street at night, neon lights reflecting on wet pavement, people walking with umbrellas, cinematic 24fps, establishing shot"
}
// Voice synthesis
"media": {
"voice": "Speak warmly and clearly: Welcome to our store! We're so glad you're here. How can I help you today?"
}
// Music composition
"media": {
"music": "Upbeat electronic dance music, 128 BPM, energetic and motivating, featuring synthesizers and strong bass line, perfect for workout playlist"
}
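Once a media object is available, each prompt has to be handed to a matching generator. The sketch below routes by media type; dispatch_media and the handlers are hypothetical placeholders standing in for whatever image, video, voice, or music service an application uses.

```python
def dispatch_media(media: dict, handlers: dict) -> list:
    """Call the handler registered for each media type and collect the results."""
    results = []
    for media_type, prompt in media.items():
        handler = handlers.get(media_type)
        if handler is None:
            # Unknown media type: skip it rather than fail the whole response
            continue
        results.append(handler(prompt))
    return results


# Placeholder handlers; real ones would call a generation service
handlers = {
    "image": lambda prompt: f"[image job] {prompt}",
    "video": lambda prompt: f"[video job] {prompt}",
    "voice": lambda prompt: f"[voice job] {prompt}",
    "music": lambda prompt: f"[music job] {prompt}",
}

print(dispatch_media({"voice": "Speak warmly and clearly: Welcome to our store!"}, handlers))
```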
Multimodal Support
Many models support multimodal inputs, allowing you to include images or videos alongside text in your prompts. Check the model's capabilities before using these features.
Including Images
To include an image, use an array for the message content and add an image_url object:
{
"model": "google/gemini-flash-1.5",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
]
}
Using Base64-Encoded Images
You can also provide images as base64-encoded data URIs:
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
}
}
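A data URI of this form can be produced from raw image bytes with the standard library. In the sketch below, image_part_from_bytes is a hypothetical helper and the MIME type defaults to image/jpeg to match the example above; the four sample bytes are only the start of a JPEG header, not a complete image.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an image_url content part using a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# First four bytes of a JPEG file, used here only as sample input
part = image_part_from_bytes(b"\xff\xd8\xff\xe0")
print(part["image_url"]["url"])

# The part goes into the content array next to a text part
message = {
    "role": "user",
    "content": [{"type": "text", "text": "What's in this image?"}, part],
}
```

In practice the bytes would come from reading a file in binary mode, for example open(path, "rb").read().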
Advanced Parameters
The API supports pass-through of additional parameters that may be specific to certain models or providers. These parameters are sent directly to the underlying model.
Provider Routing
You can influence which providers serve a request, and whether fallbacks are allowed, using the provider parameter:
{
"model": "google/gemini-2.5-flash",
"messages": [...],
"provider": {
"order": ["Anthropic", "OpenAI"],
"allow_fallbacks": false
}
}
Additional Model Parameters
You can include other parameters supported by specific models:
- repetition_penalty: Penalize repeated sequences
- top_k: Limit sampling to the top K tokens
- min_p: Minimum probability threshold
- top_a: Alternative top-p sampling method
Example with additional parameters:
{
"model": "google/gemini-flash-1.5",
"messages": [...],
"temperature": 0.8,
"top_k": 40,
"repetition_penalty": 1.1
}
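With the OpenAI Python SDK, fields that are not part of the standard method signature are typically passed through extra_body, as the Python media detection example in this document does. The sketch below separates the two groups; split_params and STANDARD_PARAMS are hypothetical names, and the standard set is limited to the parameters listed in this document.

```python
# Parameters documented above as required or optional standard fields
STANDARD_PARAMS = {
    "model", "messages", "temperature", "max_tokens", "top_p",
    "frequency_penalty", "presence_penalty", "stream", "stop",
}


def split_params(params: dict) -> tuple:
    """Split a flat parameter dict into (standard kwargs, pass-through extras)."""
    standard = {k: v for k, v in params.items() if k in STANDARD_PARAMS}
    extra = {k: v for k, v in params.items() if k not in STANDARD_PARAMS}
    return standard, extra


standard, extra = split_params({
    "model": "google/gemini-flash-1.5",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.8,
    "top_k": 40,
    "repetition_penalty": 1.1,
})
print(extra)
```

The two dictionaries could then be passed as client.chat.completions.create(**standard, extra_body=extra).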
Error Handling
When an error occurs, the API returns an appropriate HTTP status code and a JSON error object:
{
"error": {
"message": "Invalid API key provided",
"type": "invalid_request_error",
"code": "invalid_api_key"
}
}
Common Error Codes
| Status Code | Error Type | Description |
|---|---|---|
| 400 | Bad Request | Invalid request format or parameters |
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Payment Required | Insufficient account balance |
| 429 | Rate Limit | Too many requests, slow down |
| 500 | Server Error | Internal server error, try again later |
| 503 | Service Unavailable | Model temporarily unavailable |
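The table suggests that 429, 500, and 503 are transient while the other codes need a change on the caller's side. One way to act on that is a retry loop with exponential backoff, sketched below. Treating exactly these three codes as retryable is an assumption, and call_with_retries and its send argument (a callable returning a status code and parsed body) are hypothetical; the error messages in the simulated responses are made-up sample values.

```python
import time

# Assumption: these statuses are transient and worth retrying
RETRYABLE_STATUSES = {429, 500, 503}


def call_with_retries(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, backing off exponentially on transient errors.

    send is a hypothetical callable returning (status_code, parsed_json_body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            error = body.get("error", {})
            raise RuntimeError(f"HTTP {status} ({error.get('code')}): {error.get('message')}")
        # Wait 1x, 2x, 4x, ... the base delay before the next attempt
        sleep(base_delay * 2 ** (attempt - 1))


# Simulated transport: two transient failures, then success
canned = iter([
    (429, {"error": {"message": "sample rate limit message"}}),
    (503, {"error": {"message": "sample unavailable message"}}),
    (200, {"id": "chatcmpl-abc123"}),
])
delays = []
result = call_with_retries(lambda: next(canned), sleep=delays.append)
print(result, delays)
```

Passing sleep as a parameter keeps the loop testable without real waiting; production code would leave the default in place.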
Code Examples
The examples below show a basic completion, a streaming request, and a media detection request in Python, JavaScript (Node.js), cURL, and PHP, in that order.
from openai import OpenAI

# Initialize the client with your API key and base URL
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://chat.api.efficientstack.com/api/v1"
)

# Create a chat completion
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=150
)
print(response.choices[0].message.content)

# Example with streaming
stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    # Guard against chunks that carry no choices or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Example with media detection
# detect_media is not a standard SDK argument, so it is sent through extra_body
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Can you draw a sunset?"}
    ],
    extra_body={"detect_media": True}
)
# media is a non-standard response field, so check that it exists before reading it
if hasattr(response, 'media') and response.media:
    print(f"Media detected: {response.media}")
import OpenAI from 'openai';

// Initialize the client
const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://chat.api.efficientstack.com/api/v1'
});

// Create a chat completion
async function main() {
  const completion = await openai.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'What is the capital of France?' }
    ],
    temperature: 0.7,
    max_tokens: 150
  });
  console.log(completion.choices[0].message.content);

  // Example with streaming
  const stream = await openai.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }

  // Example with media detection
  const mediaResponse = await openai.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [
      { role: 'user', content: 'Can you draw a sunset?' }
    ],
    detect_media: true
  });
  if (mediaResponse.media) {
    console.log('Media detected:', mediaResponse.media);
  }
}

main();
curl -X POST "https://chat.api.efficientstack.com/api/v1/chat/completions" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of France?"
}
],
"temperature": 0.7,
"max_tokens": 150
}'
# Example with streaming
curl -X POST "https://chat.api.efficientstack.com/api/v1/chat/completions" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-N \
-d '{
"model": "google/gemini-2.5-flash",
"messages": [{"role": "user", "content": "Tell me a story"}],
"stream": true
}'
# Example with media detection
curl -X POST "https://chat.api.efficientstack.com/api/v1/chat/completions" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash",
"messages": [
{"role": "user", "content": "Can you draw a sunset?"}
],
"detect_media": true
}'
<?php
require 'vendor/autoload.php';
// Initialize the client
$client = OpenAI::factory()
->withApiKey('YOUR_API_KEY')
->withBaseUri('https://chat.api.efficientstack.com/api/v1')
->make();
// Create a chat completion
$response = $client->chat()->create([
'model' => 'google/gemini-2.5-flash',
'messages' => [
['role' => 'system', 'content' => 'You are a helpful assistant.'],
['role' => 'user', 'content' => 'What is the capital of France?']
],
'temperature' => 0.7,
'max_tokens' => 150
]);
echo $response->choices[0]->message->content;
// Example with streaming
$stream = $client->chat()->createStreamed([
'model' => 'google/gemini-2.5-flash',
'messages' => [
['role' => 'user', 'content' => 'Tell me a story']
]
]);
foreach ($stream as $response) {
echo $response->choices[0]->delta->content ?? '';
}
// Example with media detection
$mediaResponse = $client->chat()->create([
'model' => 'google/gemini-2.5-flash',
'messages' => [
['role' => 'user', 'content' => 'Can you draw a sunset?']
],
'detect_media' => true
]);
if (isset($mediaResponse->media)) {
print_r($mediaResponse->media);
}
?>