Chat Completions API

Introduction

This API provides access to state-of-the-art language models for chat completions. It is fully compatible with the OpenAI Chat Completions API specification, making it easy to integrate with existing applications and SDKs.

Authentication

All API requests require authentication using a Bearer token. Include your API key in the Authorization header of every request:

Authorization: Bearer YOUR_API_KEY
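The header above can be attached with nothing more than the Python standard library. This is a minimal sketch, not an official client: the base URL is the one used in the code examples of this document, `build_chat_request` is a hypothetical helper name, and the request is only prepared, not sent.

```python
import json
import urllib.request

# Base URL as it appears in the code examples of this document.
BASE_URL = "https://chat.api.efficientstack.com/api/v1"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST to /chat/completions."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "YOUR_API_KEY",
    {
        "model": "google/gemini-2.5-flash",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(request.get_header("Authorization"))
```

Passing the prepared request to `urllib.request.urlopen` would send it; in practice the key would normally come from an environment variable rather than a literal.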

Available Models

The following models are available through the API. Prices shown are per 1 million tokens for prompt and completion, with separate pricing for image inputs where applicable.

Note: Context length represents the maximum number of tokens (prompt + completion) the model can handle. Pricing may vary based on specific features like vision capabilities.

Chat Completions: POST /api/v1/chat/completions

This endpoint generates chat completions based on a conversation history. It accepts a list of messages and returns the model's response.

Request

Send a POST request with a JSON body containing your messages and configuration parameters.

Required Parameters

Parameter   Type     Description
model       string   The ID of the model to use (see Available Models)
messages    array    An array of message objects representing the conversation history

Message Object Structure

Each message in the messages array should have the following structure:

{
  "role": "user",        // "system", "user", or "assistant"
  "content": "Hello!"    // String or array (for multimodal)
}
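A client may want to check messages before sending them. The sketch below assumes only what the structure above states (three roles, content as a string or an array); `validate_message` is a hypothetical helper, and the server may apply further rules not covered here.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_message(message: dict) -> None:
    """Raise ValueError if a message does not match the documented structure."""
    role = message.get("role")
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    content = message.get("content")
    # Content is a plain string, or a list of parts for multimodal input.
    if not isinstance(content, (str, list)):
        raise ValueError("content must be a string or a list of parts")


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
for message in messages:
    validate_message(message)
print("all messages valid")
```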

Optional Parameters

Parameter           Type           Default   Description
temperature         float          1.0       Sampling temperature (0.0 to 2.0). Higher values make output more random.
max_tokens          integer        null      Maximum number of tokens to generate in the completion.
top_p               float          1.0       Nucleus sampling parameter (0.0 to 1.0).
frequency_penalty   float          0.0       Penalize repeated tokens (-2.0 to 2.0).
presence_penalty    float          0.0       Penalize tokens based on their presence (-2.0 to 2.0).
stream              boolean        false     If true, responses are streamed as Server-Sent Events.
stop                string/array   null      Stop sequence(s) where generation should end.
detect_media        boolean        false     If true, analyzes the conversation to detect whether media generation is needed. Cannot be used with streaming.
bypass_compliance   boolean        false     If true, bypasses the default compliance system.
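The ranges and the streaming restriction in the table can be checked on the client before a request is sent. This is a sketch under the assumption that the documented ranges are inclusive; `build_payload` and `RANGES` are hypothetical names, not part of the API.

```python
from typing import Any

# Allowed ranges copied from the optional-parameter table (assumed inclusive).
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_payload(model: str, messages: list, **options: Any) -> dict:
    """Assemble a request body, rejecting values the table rules out."""
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if options.get("stream") and options.get("detect_media"):
        raise ValueError("detect_media cannot be used with streaming")
    # Leave unset options out so the server applies its documented defaults.
    body = {"model": model, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return body


payload = build_payload(
    "google/gemini-2.5-flash",
    [{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.7,
    max_tokens=150,
)
print(payload)
```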

Example Request

{
  "model": "google/gemini-2.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}

Response

A successful request returns a JSON object with the model's response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699200000,
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28
  }
}
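Reading the fields of interest out of such a response is a matter of indexing into the parsed JSON. The sketch below uses a shortened copy of the sample response; `summarize` is a hypothetical helper and assumes at least one choice is present.

```python
import json

# Shortened copy of the sample response shown above.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 8, "total_tokens": 28}
}
"""


def summarize(response: dict) -> tuple:
    """Return (text, finish_reason, total_tokens) for the first choice."""
    choice = response["choices"][0]
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        response["usage"]["total_tokens"],
    )


text, finish_reason, total_tokens = summarize(json.loads(raw))
print(text)
```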

Streaming Responses

When stream is set to true, the API returns the response as Server-Sent Events (SSE). Each event is a line that starts with "data: " followed by a JSON object.

Example streaming response chunks:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699200000,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699200000,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699200000,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: [DONE]

The stream ends with data: [DONE].

Important: The detect_media parameter cannot be used with streaming mode. If both are set to true, the API will return an error.
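If you are not using an SDK, the stream can be consumed by hand. The following sketch assumes the event lines have already been read from the HTTP response as text; `iter_content` is a hypothetical helper, and the sample lines are abbreviated versions of the chunks shown above.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a sequence of SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            fragment = choice.get("delta", {}).get("content")
            if fragment:
                yield fragment


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample_lines)))
```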

Media Detection

When detect_media is set to true, the API analyzes the conversation to determine if the assistant's response indicates that media generation is required. This feature uses an additional LLM call to intelligently detect intent and agreement between user and assistant.

This parameter should be combined with a system prompt that tells the model how to handle media requests.

Example system prompt:

You are allowed to send pictures if asked. Avoid describing the image, just act as you sent it, the system will add the right image automatically.

Supported Media Types

Media Type   Example Trigger Keywords                          Use Case
image        Draw, Show me, Picture, Illustrate, Visualize     Image generation prompts
video        Animate, Video of, Show movement, Film            Video generation prompts for animation tools
voice        Say that, Speak, Voice message, Narrate           Text-to-speech prompts with emotional context
music        Compose, Play music, Song about, Create a beat    Music generation prompts with genre and mood

Example Request with Media Detection

{
  "model": "google/gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "Can you draw me a sunset over the ocean?"
    }
  ],
  "detect_media": true
}

Response with Media Object

When media is detected, the response includes a media object with the appropriate media type and generation prompt:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699200000,
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'd be happy to create that for you! Here's a beautiful sunset scene."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 15,
    "total_tokens": 40
  },
  "media": {
    "image": "A breathtaking sunset over a calm ocean, with vibrant orange and pink hues reflecting on the water, wispy clouds in the sky, peaceful and serene atmosphere, photorealistic, 8k, cinematic composition"
  }
}

Media Prompt Examples

The API generates context-aware prompts for different media types:

// Image generation
"media": {
  "image": "A majestic wolf howling at a full moon, aggressive stance, detailed fur texture, dark forest background, dramatic lighting, tattoo art style, high contrast black and white"
}

// Video generation
"media": {
  "video": "Camera slowly pans across a bustling city street at night, neon lights reflecting on wet pavement, people walking with umbrellas, cinematic 24fps, establishing shot"
}

// Voice synthesis
"media": {
  "voice": "Speak warmly and clearly: Welcome to our store! We're so glad you're here. How can I help you today?"
}

// Music composition
"media": {
  "music": "Upbeat electronic dance music, 128 BPM, energetic and motivating, featuring synthesizers and strong bass line, perfect for workout playlist"
}

Implementation Note: The media detection feature is designed for non-streaming responses only. Your client application should use the media prompts to make subsequent calls to specialized media generation APIs (image generators, TTS engines, music composers, etc.) as needed.
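A client could pull the prompts out of the media object along these lines. This is a minimal sketch, assuming the response has already been parsed into a dict; `media_requests` is a hypothetical helper, and the calls to actual media generators are deliberately left out.

```python
# Media types listed under Supported Media Types.
SUPPORTED_MEDIA = ("image", "video", "voice", "music")


def media_requests(response: dict) -> list:
    """Return (media_type, prompt) pairs found in a parsed response."""
    media = response.get("media") or {}
    return [(kind, media[kind]) for kind in SUPPORTED_MEDIA if media.get(kind)]


response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Here's a beautiful sunset scene."}}
    ],
    "media": {"image": "A breathtaking sunset over a calm ocean"},
}

for kind, prompt in media_requests(response):
    # A real client would call its own image, video, TTS or music service here.
    print(f"{kind}: {prompt}")
```

A response without a media object yields an empty list, so the same code path works whether or not media was detected.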

Multimodal Support

Many models support multimodal inputs, allowing you to include images or videos alongside text in your prompts. Check the model's capabilities before using these features.

Including Images

To include an image, use an array for the message content and add an image_url object:

{
  "model": "google/gemini-flash-1.5",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}

Using Base64-Encoded Images

You can also provide images as base64-encoded data URIs:

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
  }
}

Important: Not all models support all modalities. If you send an unsupported media type, the API will return an error. Always check the model's capabilities before including images or videos.
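Building the data URI from raw image bytes takes only the standard library. In this sketch `image_part` is a hypothetical helper and the MIME type is supplied by the caller; the three sample bytes are the JPEG start-of-image marker, which is why the encoded text begins with the same "/9j/" seen in the example above.

```python
import base64


def image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an image_url content part with a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# First three bytes of a JPEG file, used here instead of a real image.
part = image_part(b"\xff\xd8\xff")
print(part["image_url"]["url"])

message = {
    "role": "user",
    "content": [{"type": "text", "text": "What's in this image?"}, part],
}
```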

Advanced Parameters

The API supports pass-through of additional parameters that may be specific to certain models or providers. These parameters are sent directly to the underlying model.

Provider Routing

You can influence model selection and fallback behavior using the provider parameter:

{
  "model": "google/gemini-2.5-flash",
  "messages": [...],
  "provider": {
    "order": ["Anthropic", "OpenAI"],
    "allow_fallbacks": false
  }
}

Additional Model Parameters

You can include other parameters supported by specific models:

  • repetition_penalty - Penalize repeated sequences
  • top_k - Limit sampling to top K tokens
  • min_p - Minimum probability threshold
  • top_a - Alternative top-p sampling method

Example with additional parameters:

{
  "model": "google/gemini-flash-1.5",
  "messages": [...],
  "temperature": 0.8,
  "top_k": 40,
  "repetition_penalty": 1.1
}
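When calling through the OpenAI Python SDK, parameters that are not part of its method signature have to travel in extra_body, as the media detection example in this document does. The sketch below separates the two groups; `split_params` and `STANDARD_PARAMS` are hypothetical names, and the set lists only the optional parameters documented on this page.

```python
# Optional parameters from this page that the SDK accepts as named arguments.
STANDARD_PARAMS = {
    "temperature", "max_tokens", "top_p", "frequency_penalty",
    "presence_penalty", "stream", "stop",
}


def split_params(params: dict) -> tuple:
    """Separate SDK-native arguments from pass-through ones."""
    standard = {k: v for k, v in params.items() if k in STANDARD_PARAMS}
    extra = {k: v for k, v in params.items() if k not in STANDARD_PARAMS}
    return standard, extra


standard, extra_body = split_params(
    {"temperature": 0.8, "top_k": 40, "repetition_penalty": 1.1}
)
print(standard)
print(extra_body)
# With the SDK, the call would then take the form:
# client.chat.completions.create(model=..., messages=..., **standard, extra_body=extra_body)
```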

Error Handling

When an error occurs, the API returns an appropriate HTTP status code and a JSON error object:

{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

Common Error Codes

Status Code   Error Type            Description
400           Bad Request           Invalid request format or parameters
401           Unauthorized          Missing or invalid API key
402           Payment Required      Insufficient account balance
429           Rate Limit            Too many requests, slow down
500           Server Error          Internal server error, try again later
503           Service Unavailable   Model temporarily unavailable
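One way a client might act on these codes is to retry only the ones described as temporary. The sketch below treats 429, 500 and 503 as retryable, which is an interpretation of the descriptions in the table rather than documented behavior; the helper names and the backoff values are illustrative assumptions.

```python
# Codes whose descriptions suggest that trying again later can succeed (assumption).
RETRYABLE_STATUS = {429, 500, 503}


def should_retry(status_code: int) -> bool:
    """Return True for errors that look temporary."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def describe_error(body: dict) -> str:
    """Format the JSON error object as 'code: message'."""
    error = body.get("error", {})
    return f"{error.get('code', 'unknown')}: {error.get('message', '')}"


print(should_retry(429), should_retry(401))
print(backoff_delays(4))
print(describe_error({"error": {"message": "Invalid API key provided",
                                "type": "invalid_request_error",
                                "code": "invalid_api_key"}}))
```

Errors such as 400, 401 and 402 are left out of the retry set because resending the same request would not change the outcome.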

Code Examples

Here are examples of how to use the API in different programming languages.

Python

from openai import OpenAI

# Initialize the client with your API key and base URL
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://chat.api.efficientstack.com/api/v1"
)

# Create a chat completion
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=150
)

print(response.choices[0].message.content)

# Example with streaming
stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Example with media detection
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Can you draw a sunset?"}
    ],
    extra_body={"detect_media": True}
)

if hasattr(response, 'media') and response.media:
    print(f"Media detected: {response.media}")

JavaScript (Node.js)

import OpenAI from 'openai';

// Initialize the client
const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://chat.api.efficientstack.com/api/v1'
});

// Create a chat completion
async function main() {
  const completion = await openai.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'What is the capital of France?' }
    ],
    temperature: 0.7,
    max_tokens: 150
  });

  console.log(completion.choices[0].message.content);

  // Example with streaming
  const stream = await openai.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }

  // Example with media detection
  const mediaResponse = await openai.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [
      { role: 'user', content: 'Can you draw a sunset?' }
    ],
    detect_media: true
  });

  if (mediaResponse.media) {
    console.log('Media detected:', mediaResponse.media);
  }
}

main();

cURL

curl -X POST "https://chat.api.efficientstack.com/api/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

# Example with streaming
curl -X POST "https://chat.api.efficientstack.com/api/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

# Example with media detection
curl -X POST "https://chat.api.efficientstack.com/api/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Can you draw a sunset?"}
    ],
    "detect_media": true
  }'

PHP

<?php
require 'vendor/autoload.php';

// Initialize the client (the OpenAI class is in the global namespace, so no "use" statement is needed)
$client = OpenAI::factory()
    ->withApiKey('YOUR_API_KEY')
    ->withBaseUri('https://chat.api.efficientstack.com/api/v1')
    ->make();

// Create a chat completion
$response = $client->chat()->create([
    'model' => 'google/gemini-2.5-flash',
    'messages' => [
        ['role' => 'system', 'content' => 'You are a helpful assistant.'],
        ['role' => 'user', 'content' => 'What is the capital of France?']
    ],
    'temperature' => 0.7,
    'max_tokens' => 150
]);

echo $response->choices[0]->message->content;

// Example with streaming
$stream = $client->chat()->createStreamed([
    'model' => 'google/gemini-2.5-flash',
    'messages' => [
        ['role' => 'user', 'content' => 'Tell me a story']
    ]
]);

foreach ($stream as $response) {
    echo $response->choices[0]->delta->content ?? '';
}

// Example with media detection
$mediaResponse = $client->chat()->create([
    'model' => 'google/gemini-2.5-flash',
    'messages' => [
        ['role' => 'user', 'content' => 'Can you draw a sunset?']
    ],
    'detect_media' => true
]);

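// Note: the PHP client maps responses onto typed objects, so a non-standard
// top-level field such as "media" may not be exposed here. If this check never
// fires, inspect the raw HTTP response instead.
if (isset($mediaResponse->media)) {
    print_r($mediaResponse->media);
}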
?>