Chat Completions API

Introduction

This API provides access to state-of-the-art language models for chat completions. It is compatible with the OpenAI Chat Completions API specification, so existing applications and OpenAI SDKs can use it by pointing them at this API's base URL with your API key.

Authentication

All API requests require authentication using a Bearer token. Include your API key in the Authorization header of every request:

Authorization: Bearer YOUR_API_KEY
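
If you are not using an SDK, any HTTP client can set this header. Below is a minimal sketch with Python's standard library. It assumes the base URL used in the Code Examples section, and `build_chat_request` is a hypothetical helper. The request is only built here, not sent:

```python
import json
import urllib.request

# Base URL as used in the SDK examples in this document.
BASE_URL = "https://chat.api.efficientstack.com/api/v1"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /chat/completions."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # required on every request
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "YOUR_API_KEY",
    {"model": "google/gemini-2.5-flash",
     "messages": [{"role": "user", "content": "Hello!"}]},
)
# Sending it would be: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```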

Available Models

The following models are available through the API. Prices shown are per 1 million tokens for prompt and completion, with separate pricing for image inputs where applicable.

Note: Context length represents the maximum number of tokens (prompt + completion) the model can handle. Pricing may vary based on specific features like vision capabilities.
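
Prices are quoted per 1 million tokens. A request therefore costs prompt_tokens / 1,000,000 × prompt price plus completion_tokens / 1,000,000 × completion price. The sketch below applies that formula to the token counts in a response's `usage` block. The prices in the usage line are placeholders, not this API's actual rates, and image-input pricing is not modeled:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_price_per_m: float, completion_price_per_m: float) -> float:
    """Estimate request cost from per-1M-token prompt and completion prices."""
    return ((prompt_tokens / 1_000_000) * prompt_price_per_m
            + (completion_tokens / 1_000_000) * completion_price_per_m)


# Placeholder prices per 1M tokens; substitute the rates listed for your model.
cost = estimate_cost(prompt_tokens=20, completion_tokens=8,
                     prompt_price_per_m=0.30, completion_price_per_m=2.50)
print(f"{cost:.8f}")
```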

Chat Completions: `POST /api/v1/chat/completions`

This endpoint generates chat completions based on a conversation history. It accepts a list of messages and returns the model's response.

Request

Send a POST request with a JSON body containing your messages and configuration parameters.

Required Parameters

Parameter	Type	Description
`model`	string	The ID of the model to use (see Available Models)
`messages`	array	An array of message objects representing the conversation history

Message Object Structure

Each message in the messages array should have the following structure:

{
  "role": "user",        // "system", "user", or "assistant"
  "content": "Hello!"    // String or array (for multimodal)
}
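
A small helper can enforce this structure before a request is sent. This is a sketch: `make_message` is a hypothetical name, and the checks cover only the roles and content types listed above:

```python
from typing import Union

ALLOWED_ROLES = {"system", "user", "assistant"}


def make_message(role: str, content: Union[str, list]) -> dict:
    """Return a message dict, rejecting roles or content types not documented above."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    # Plain text is a string; multimodal content is an array of content parts.
    if not isinstance(content, (str, list)):
        raise TypeError("content must be a string or a list of content parts")
    return {"role": role, "content": content}


messages = [
    make_message("system", "You are a helpful assistant."),
    make_message("user", "Hello!"),
]
```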

Optional Parameters

Parameter	Type	Default	Description
`temperature`	float	1.0	Sampling temperature (0.0 to 2.0). Higher values make output more random.
`max_tokens`	integer	null	Maximum number of tokens to generate in the completion.
`top_p`	float	1.0	Nucleus sampling parameter (0.0 to 1.0).
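`frequency_penalty`	float	0.0	Penalizes tokens in proportion to how often they have already appeared (-2.0 to 2.0). Positive values reduce verbatim repetition.
`presence_penalty`	float	0.0	Penalizes tokens that have already appeared at least once, regardless of how often (-2.0 to 2.0). Positive values encourage new topics.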
`stream`	boolean	false	If true, responses are streamed as Server-Sent Events.
`stop`	string/array	null	Stop sequence(s) where generation should end.
`detect_media`	boolean	false	If true, analyzes the conversation to detect whether media generation is needed. Cannot be used with streaming.
`bypass_compliance`	boolean	false	If true, bypasses the default compliance system.
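
You can check the documented ranges, and the rule that `detect_media` cannot be combined with streaming, on the client before sending. The sketch below assumes the bounds in the table are inclusive. `validate_params` is a hypothetical helper, and the server's own validation remains authoritative:

```python
# Inclusive bounds taken from the Optional Parameters table above.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def validate_params(params: dict) -> list:
    """Return a list of problems found in the optional parameters (empty if none)."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    # detect_media is only supported for non-streaming requests.
    if params.get("stream") and params.get("detect_media"):
        problems.append("detect_media cannot be used with stream=true")
    return problems


print(validate_params({"temperature": 0.7, "max_tokens": 150}))
print(validate_params({"temperature": 2.5, "stream": True, "detect_media": True}))
```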

Example Request

{
  "model": "google/gemini-2.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}

Response

A successful request returns a JSON object with the model's response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699200000,
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28
  }
}
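
Reading fields out of this object needs only the standard `json` module. The sketch below parses an abbreviated copy of the example response above. `summarize_completion` is a hypothetical helper name:

```python
import json

raw = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "google/gemini-2.5-flash",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "The capital of France is Paris."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 20, "completion_tokens": 8, "total_tokens": 28}
}"""


def summarize_completion(response: dict) -> tuple:
    """Return (assistant text, finish_reason, total tokens) for the first choice."""
    choice = response["choices"][0]
    # total_tokens is the sum of prompt_tokens and completion_tokens.
    return (choice["message"]["content"],
            choice["finish_reason"],
            response["usage"]["total_tokens"])


text, reason, total = summarize_completion(json.loads(raw))
print(text, reason, total)
```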

Streaming Responses

When `stream` is set to true, the API returns the response as Server-Sent Events (SSE). Each event is a line that starts with `data: ` and carries a JSON chunk object. The text arrives incrementally in `choices[0].delta.content`.

Example streaming response chunks:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699200000,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699200000,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699200000,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: [DONE]

The stream ends with `data: [DONE]`. This sentinel is not JSON, so stop reading when you receive it instead of parsing it as a chunk.
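
If you consume the stream without an SDK, decode each `data: ` line and concatenate the `delta.content` fragments until `[DONE]` arrives. The sketch below works on an in-memory list of lines, using abbreviated versions of the chunks above. In practice the lines would come from the HTTP response body, and `collect_stream_text` is a hypothetical name:

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from SSE lines until the [DONE] sentinel."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end of stream; the sentinel is not JSON
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        # Assumes a single choice per chunk, as in the examples above.
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample))
```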

Important: `detect_media` cannot be used with streaming. If `detect_media` and `stream` are both true, the API returns an error.

Media Detection

When `detect_media` is set to true, the API analyzes the conversation to determine whether the assistant's response indicates that media should be generated. It does this with an additional LLM call that detects the user's intent and whether the assistant agreed to fulfill it.

Combine this parameter with a system prompt that tells the model how to handle media requests. For example:

You are allowed to send pictures if asked. Avoid describing the image; just act as if you sent it. The system will add the right image automatically.

Supported Media Types

Media Type	Example Trigger Keywords	Use Case
`image`	Draw, Show me, Picture, Illustrate, Visualize	Image generation prompts
`video`	Animate, Video of, Show movement, Film	Video generation prompts for animation tools
`voice`	Say that, Speak, Voice message, Narrate	Text-to-speech prompts with emotional context
`music`	Compose, Play music, Song about, Create a beat	Music generation prompts with genre and mood

Example Request with Media Detection

{
  "model": "google/gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "Can you draw me a sunset over the ocean?"
    }
  ],
  "detect_media": true
}

Response with Media Object

When media is detected, the response includes a media object with the appropriate media type and generation prompt:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699200000,
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'd be happy to create that for you! Here's a beautiful sunset scene."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 15,
    "total_tokens": 40
  },
  "media": {
    "image": "A breathtaking sunset over a calm ocean, with vibrant orange and pink hues reflecting on the water, wispy clouds in the sky, peaceful and serene atmosphere, photorealistic, 8k, cinematic composition"
  }
}

Media Prompt Examples

The API generates context-aware prompts for different media types:

// Image generation
"media": {
  "image": "A majestic wolf howling at a full moon, aggressive stance, detailed fur texture, dark forest background, dramatic lighting, tattoo art style, high contrast black and white"
}

// Video generation
"media": {
  "video": "Camera slowly pans across a bustling city street at night, neon lights reflecting on wet pavement, people walking with umbrellas, cinematic 24fps, establishing shot"
}

// Voice synthesis
"media": {
  "voice": "Speak warmly and clearly: Welcome to our store! We're so glad you're here. How can I help you today?"
}

// Music composition
"media": {
  "music": "Upbeat electronic dance music, 128 BPM, energetic and motivating, featuring synthesizers and strong bass line, perfect for workout playlist"
}

Implementation Note: The media detection feature is designed for non-streaming responses only. Your client application should use the media prompts to make subsequent calls to specialized media generation APIs (image generators, TTS engines, music composers, etc.) as needed.
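
One way to act on the `media` object is to route each entry to a generator for that type. In the sketch below the handler functions are hypothetical placeholders that just return labels. Replace them with calls to your own image, video, TTS, or music services. Unknown media types are ignored rather than treated as errors:

```python
from typing import Callable, Dict, List


# Hypothetical handlers: replace each with a call to your own generation service.
def generate_image(prompt: str) -> str:
    return f"image job: {prompt}"


def generate_video(prompt: str) -> str:
    return f"video job: {prompt}"


def synthesize_voice(prompt: str) -> str:
    return f"voice job: {prompt}"


def compose_music(prompt: str) -> str:
    return f"music job: {prompt}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "image": generate_image,
    "video": generate_video,
    "voice": synthesize_voice,
    "music": compose_music,
}


def dispatch_media(response: dict) -> List[str]:
    """Route each prompt in the response's media object to its handler."""
    results = []
    for media_type, prompt in (response.get("media") or {}).items():
        handler = HANDLERS.get(media_type)
        if handler is not None:  # ignore media types this client does not handle
            results.append(handler(prompt))
    return results


response = {
    "choices": [{"message": {"role": "assistant", "content": "Here's a sunset."}}],
    "media": {"image": "A breathtaking sunset over a calm ocean"},
}
print(dispatch_media(response))
```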

Multimodal Support

Many models support multimodal inputs, allowing you to include images or videos alongside text in your prompts. Check the model's capabilities before using these features.

Including Images

To include an image, use an array for the message content and add an image_url object:

{
  "model": "google/gemini-flash-1.5",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}

Using Base64-Encoded Images

You can also provide images as base64-encoded data URIs:

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
  }
}
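
You can build the content array, including a base64 data URI from local bytes, with the standard library alone. In this sketch `to_data_uri` and `image_message` are hypothetical helpers. The caller must supply the correct MIME type, and the bytes in the usage line are a stand-in rather than a real JPEG:

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URI suitable for image_url.url."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(text: str, image_url: str) -> dict:
    """Build a user message with a text part followed by an image_url part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Stand-in bytes; in practice read a file with open("photo.jpg", "rb").read().
uri = to_data_uri(b"\xff\xd8\xff\xe0fake-jpeg")
message = image_message("What's in this image?", uri)
```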

Important: Not all models support all modalities. If you send an unsupported media type, the API will return an error. Always check the model's capabilities before including images or videos.

Advanced Parameters

The API supports pass-through of additional parameters that may be specific to certain models or providers. These parameters are sent directly to the underlying model.

Provider Routing

You can control which upstream providers serve the request, and whether the API may fall back to other providers, using the provider parameter:

{
  "model": "google/gemini-2.5-flash",
  "messages": [...],
  "provider": {
    "order": ["Anthropic", "OpenAI"],
    "allow_fallbacks": false
  }
}

Additional Model Parameters

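You can include other parameters supported by specific models, for example:

repetition_penalty - Penalize tokens that already appear in the prompt or output (values above 1.0 discourage repetition)
top_k - Limit sampling to the K most likely tokens
min_p - Minimum probability a token needs, relative to the most likely token's probability
top_a - Dynamic cutoff based on the square of the most likely token's probability, similar in spirit to top_p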

Example with additional parameters:

{
  "model": "google/gemini-flash-1.5",
  "messages": [...],
  "temperature": 0.8,
  "top_k": 40,
  "repetition_penalty": 1.1
}
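
The following sketch makes the differences between top_k, min_p, and top_a concrete, using their common definitions for filtering a next-token distribution. top_k keeps the K most likely tokens. min_p keeps tokens whose probability is at least min_p × the top probability. top_a keeps tokens whose probability is at least top_a × (top probability)². Providers may apply these filters in a different order or renormalize differently, so read this as an illustration of the concepts rather than the exact behavior of any model behind this API:

```python
from typing import Dict, Optional


def filter_probs(probs: Dict[str, float], top_k: Optional[int] = None,
                 min_p: Optional[float] = None,
                 top_a: Optional[float] = None) -> Dict[str, float]:
    """Apply top_k, min_p, and top_a filtering, then renormalize the survivors."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    p_max = ranked[0][1]  # probability of the most likely token
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the K most likely tokens
    if min_p is not None:
        ranked = [(t, p) for t, p in ranked if p >= min_p * p_max]
    if top_a is not None:
        ranked = [(t, p) for t, p in ranked if p >= top_a * p_max ** 2]
    total = sum(p for _, p in ranked)
    return {t: p / total for t, p in ranked}


probs = {"Paris": 0.6, "Lyon": 0.2, "Nice": 0.15, "Lille": 0.05}
print(filter_probs(probs, top_k=3))
print(filter_probs(probs, min_p=0.3))
print(filter_probs(probs, top_a=0.2))
```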

Error Handling

When an error occurs, the API returns an appropriate HTTP status code and a JSON error object:

{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

Common Error Codes

Status Code	Error Type	Description
`400`	Bad Request	Invalid request format or parameters
`401`	Unauthorized	Missing or invalid API key
`402`	Payment Required	Insufficient account balance
`429`	Rate Limit	Too many requests; slow down and retry later
`500`	Server Error	Internal server error, try again later
`503`	Service Unavailable	Model temporarily unavailable
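
Of these, 429, 500, and 503 describe conditions that may clear on their own, so a client can reasonably retry them with growing delays. A 400, 401, or 402 needs a change to the request or account first. The sketch below encodes that split, an exponential backoff schedule, and extraction of the error message from the format shown above. The base delay, cap, and attempt count are arbitrary choices, not limits published by this API:

```python
import json
from typing import List

RETRYABLE_STATUSES = {429, 500, 503}


def should_retry(status: int) -> bool:
    """Retry only transient failures: rate limits and server-side errors."""
    return status in RETRYABLE_STATUSES


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ..., capped."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def error_message(body: str) -> str:
    """Extract error.message from an error body, falling back to the raw text."""
    try:
        return json.loads(body)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        return body


body = ('{"error": {"message": "Invalid API key provided", '
        '"type": "invalid_request_error", "code": "invalid_api_key"}}')
print(should_retry(401), error_message(body))
print(backoff_delays(5))
```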

Code Examples

Here are examples of how to use the API in different programming languages: Python, JavaScript (Node.js), cURL, and PHP.

Python

from openai import OpenAI

# Initialize the client with your API key and base URL
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://chat.api.efficientstack.com/api/v1"
)

# Create a chat completion
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=150
)

print(response.choices[0].message.content)

# Example with streaming
stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    # Guard against chunks that carry no choices before reading the delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()

# Example with media detection
# detect_media is not a standard OpenAI parameter, so pass it through extra_body
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Can you draw a sunset?"}
    ],
    extra_body={"detect_media": True}
)

if hasattr(response, 'media') and response.media:
    print(f"Media detected: {response.media}")

JavaScript (Node.js)

import OpenAI from 'openai';

// Initialize the client
const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://chat.api.efficientstack.com/api/v1'
});

// Create a chat completion
async function main() {
  const completion = await openai.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'What is the capital of France?' }
    ],
    temperature: 0.7,
    max_tokens: 150
  });

  console.log(completion.choices[0].message.content);

  // Example with streaming
  const stream = await openai.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }

  // Example with media detection
  const mediaResponse = await openai.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [
      { role: 'user', content: 'Can you draw a sunset?' }
    ],
    detect_media: true
  });

  if (mediaResponse.media) {
    console.log('Media detected:', mediaResponse.media);
  }
}

main().catch(console.error);

cURL

curl -X POST "https://chat.api.efficientstack.com/api/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

# Example with streaming
curl -X POST "https://chat.api.efficientstack.com/api/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

# Example with media detection
curl -X POST "https://chat.api.efficientstack.com/api/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Can you draw a sunset?"}
    ],
    "detect_media": true
  }'

PHP

<?php
require 'vendor/autoload.php';

// Initialize the client
$client = OpenAI::factory()
    ->withApiKey('YOUR_API_KEY')
    ->withBaseUri('https://chat.api.efficientstack.com/api/v1')
    ->make();

// Create a chat completion
$response = $client->chat()->create([
    'model' => 'google/gemini-2.5-flash',
    'messages' => [
        ['role' => 'system', 'content' => 'You are a helpful assistant.'],
        ['role' => 'user', 'content' => 'What is the capital of France?']
    ],
    'temperature' => 0.7,
    'max_tokens' => 150
]);

echo $response->choices[0]->message->content;

// Example with streaming
$stream = $client->chat()->createStreamed([
    'model' => 'google/gemini-2.5-flash',
    'messages' => [
        ['role' => 'user', 'content' => 'Tell me a story']
    ]
]);

foreach ($stream as $response) {
    echo $response->choices[0]->delta->content ?? '';
}

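// Example with media detection (raw cURL, since the SDK's typed response object may not expose the non-standard `media` field)
$ch = curl_init('https://chat.api.efficientstack.com/api/v1/chat/completions');
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => ['Authorization: Bearer YOUR_API_KEY', 'Content-Type: application/json'],
    CURLOPT_POSTFIELDS => json_encode([
        'model' => 'google/gemini-2.5-flash',
        'messages' => [['role' => 'user', 'content' => 'Can you draw a sunset?']],
        'detect_media' => true
    ]),
]);
$mediaResponse = json_decode(curl_exec($ch), true);
curl_close($ch);

if (isset($mediaResponse['media'])) {
    print_r($mediaResponse['media']);
}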
?>