AI Model Gateway

The routing
layer for AI.

16 frontier models. One OpenAI-compatible endpoint. Smart routing that selects the best model for every request by cost, speed, or quality.

api.originalpoint.ai/v1  ·  OpenAI-compatible  ·  <50ms routing
16 Frontier Models
4 World-class Providers
99.9% Uptime SLA
<50ms Median Routing Latency
The Platform

Infrastructure built
for production AI.

OriginalPoint is the API layer that sits between your application and every major AI provider. One endpoint, one key, complete control. Ship faster. Spend less.

One OpenAI-compatible endpoint for every model. No SDK changes, no provider-specific code. Your existing integration works on day one.

Set a routing policy — cost, speed, or quality — and let OriginalPoint dispatch every request to the optimal model. Override anytime with an explicit model ID.

Bring your own API keys for OpenAI, Anthropic, Google, and xAI. Pay providers directly at cost. We charge only for routing infrastructure.

Per-key usage dashboards with token breakdowns, cost attribution, latency percentiles, and error rates. Export to Datadog, Grafana, or raw CSV.

Automatic fallback routing when a provider degrades. Retry with exponential backoff. Circuit breakers prevent cascade failures from reaching your users.

SOC 2 Type II certified. IP allowlists per API key. Audit logs with 90-day retention. GDPR-compliant with EU data residency options and zero training on your data.
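
The reliability behavior described above (retry with exponential backoff, circuit breakers, fallback to another provider) can be illustrated with a small local sketch. This is not OriginalPoint's implementation: the `CircuitBreaker` class, the `call_with_fallback` helper, and every threshold below are hypothetical names and values chosen for illustration.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors, then blocks calls for `cooldown` seconds."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # hypothetical threshold
        self.cooldown = cooldown          # hypothetical cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def backoff_delays(retries, base=0.5, cap=8.0):
    """Exponential schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def call_with_fallback(providers, breakers, retries=3, sleep=time.sleep):
    """Try providers in order; retry each with backoff until its breaker opens."""
    for name, call in providers:
        breaker = breakers[name]
        for delay in [0.0] + backoff_delays(retries):
            if not breaker.allow():
                break  # circuit is open: stop hitting this provider
            if delay:
                sleep(delay)
            try:
                result = call()
            except Exception:
                breaker.record(ok=False)
            else:
                breaker.record(ok=True)
                return name, result
    raise RuntimeError("all providers failed or are circuit-broken")


if __name__ == "__main__":
    def degraded():
        raise TimeoutError("provider degraded")

    breakers = {"primary": CircuitBreaker(max_failures=2), "backup": CircuitBreaker()}
    providers = [("primary", degraded), ("backup", lambda: "ok")]
    # Skip real sleeping in the demo by passing a no-op sleep function.
    print(call_with_fallback(providers, breakers, sleep=lambda s: None))
```

In the demo, the primary provider fails twice, its breaker opens, and the request falls through to the backup without waiting out the remaining retries.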

Quick Start
01  Create an account

Sign up and get your API key instantly. No approval process, no sales call. The free tier includes access to all 16 models.

02  Set base_url

Point your existing OpenAI SDK at our endpoint. A one-line change. No other code modifications needed.

03  Call any model

Use any model ID directly, or pass "auto" to activate smart routing and let OriginalPoint select the best model for each request.

# Python
from openai import OpenAI

client = OpenAI(
    api_key="op_...",
    base_url="https://api.originalpoint.ai/v1"
)

response = client.chat.completions.create(
    model="auto",          # or any model ID
    messages=[
        {"role": "user", "content": "Hello, world."}
    ]
)

print(response.choices[0].message.content)

// Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "op_...",
  baseURL: "https://api.originalpoint.ai/v1",
});

const response = await client.chat.completions.create({
  model: "auto",            // or any model ID
  messages: [
    { role: "user", content: "Hello, world." }
  ],
});

console.log(response.choices[0].message.content);

# cURL
curl https://api.originalpoint.ai/v1/chat/completions \
  -H "Authorization: Bearer op_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Hello, world."}
    ]
  }'
16 Models

Every frontier model,
one endpoint.

Model              Provider   Context  Input / 1M tokens  Output / 1M tokens  Tier
GPT-5 mini         OpenAI     16K      $0.40              $1.60               Fast
Grok Code Fast 1   xAI        131K     $0.50              $2.00               Fast
Gemini 3 Flash     Google     1M       $0.10              $0.40               Fast
GPT-5              OpenAI     128K     $5.00              $20.00              Versatile
GPT-5.1            OpenAI     128K     $8.00              $25.00              Versatile
Claude Sonnet 4    Anthropic  200K     $3.00              $15.00              Versatile
Claude Sonnet 4.5  Anthropic  200K     $3.00              $15.00              Versatile
Claude Haiku 4.5   Anthropic  200K     $0.80              $4.00               Versatile
GPT-5.2            OpenAI     128K     $10.00             $30.00              Versatile
GPT-4.1            OpenAI     128K     $2.00              $8.00               Versatile
GPT-4o             OpenAI     128K     $5.00              $15.00              Versatile
GPT-5.1-Codex-Max  OpenAI     200K     $30.00             $120.00             Powerful
Claude Opus 4.5    Anthropic  200K     $15.00             $75.00              Powerful
Claude Opus 4.1    Anthropic  200K     $15.00             $75.00              Powerful
Gemini 3 Pro       Google     1M       $3.50              $10.50              Powerful
Gemini 2.5 Pro     Google     2M       $1.25              $5.00               Powerful
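
As a worked example of the per-1M-token pricing above, the sketch below estimates the cost of a single request. The prices are copied from the table; the dictionary keys are the display names from the table, not necessarily the model IDs the API accepts, and `request_cost` is a hypothetical helper.

```python
# USD per 1M tokens (input, output), copied from the pricing table (a subset).
# Keys are display names, which may differ from the API's model IDs.
PRICES = {
    "Gemini 3 Flash": (0.10, 0.40),
    "Claude Sonnet 4": (3.00, 15.00),
    "GPT-5": (5.00, 20.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request at the listed per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # A 2,000-token prompt that produces a 500-token reply.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 2_000, 500):.4f}")
```

For that request size, the listed prices work out to $0.0004 on Gemini 3 Flash, $0.0135 on Claude Sonnet 4, and $0.0200 on GPT-5, which is the kind of spread that cost-based routing exploits.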
Smart Routing

Route by cost,
speed, or quality.

— Cost mode
Minimizes spend per token. Selects the cheapest model that meets your minimum capability threshold. Ideal for high-volume tasks like classification, extraction, and summarization.
— Speed mode
Minimizes time-to-first-token. Routes to the fastest available model at any given moment, factoring in real-time provider latency. Ideal for user-facing streaming interfaces.
— Quality mode
Selects the highest-capability model that fits your request. Evaluates context length, task complexity, and current provider performance scores.
Your Application
  ↓
OriginalPoint Router
  ↓ parse model="auto"
  ↓ evaluate routing policy
  ↓ score provider health
  ↓ select optimal model
OpenAI: GPT-5 · GPT-4.1
Anthropic: Sonnet · Opus
Google: Gemini 3 · 2.5
xAI: Grok Code Fast
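
The three routing modes and the router steps above can be sketched as a small selection function. This is an illustration under stated assumptions, not the actual router: the `Model` type, `select_model`, the `healthy` flag, and the `ttft_ms` latency figures are all hypothetical. Only the context sizes, input prices, and tiers come from the model table.

```python
from dataclasses import dataclass, replace

TIERS = {"Fast": 0, "Versatile": 1, "Powerful": 2}


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    context_tokens: int
    input_price: float    # USD per 1M input tokens (from the model table)
    tier: str             # tier label from the model table
    ttft_ms: float        # hypothetical time-to-first-token measurement
    healthy: bool = True  # hypothetical provider health flag


CATALOG = [
    Model("Gemini 3 Flash", "Google", 1_000_000, 0.10, "Fast", 180),
    Model("Claude Sonnet 4", "Anthropic", 200_000, 3.00, "Versatile", 400),
    Model("GPT-5", "OpenAI", 128_000, 5.00, "Versatile", 350),
    Model("Claude Opus 4.5", "Anthropic", 200_000, 15.00, "Powerful", 900),
    Model("Gemini 3 Pro", "Google", 1_000_000, 3.50, "Powerful", 700),
]


def select_model(models, policy, prompt_tokens, min_tier="Fast"):
    """Pick a model for model="auto" under a cost, speed, or quality policy."""
    # Keep only healthy models that fit the prompt and meet the capability floor.
    candidates = [
        m for m in models
        if m.healthy
        and m.context_tokens >= prompt_tokens
        and TIERS[m.tier] >= TIERS[min_tier]
    ]
    if not candidates:
        raise LookupError("no healthy model fits this request")
    if policy == "cost":
        return min(candidates, key=lambda m: m.input_price)
    if policy == "speed":
        return min(candidates, key=lambda m: m.ttft_ms)
    if policy == "quality":
        # Highest tier wins; ties go to the cheaper model.
        return max(candidates, key=lambda m: (TIERS[m.tier], -m.input_price))
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    for policy in ("cost", "speed", "quality"):
        print(policy, "->", select_model(CATALOG, policy, prompt_tokens=1_000).name)
    # Simulate a degraded provider: its models drop out of the candidate set.
    degraded = [replace(m, healthy=(m.provider != "Google")) for m in CATALOG]
    print("cost, Google down ->", select_model(degraded, "cost", 1_000).name)
```

Filtering on health and context length before scoring mirrors the order of the router steps: a model that cannot serve the request is never considered, whatever the policy.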
For Developers

Everything you need.
Nothing you don't.

# Install: pip install openai
from openai import OpenAI

# One line change from your existing OpenAI code
client = OpenAI(
    api_key="op_your_key_here",
    base_url="https://api.originalpoint.ai/v1"
)

# Use any model — or let OriginalPoint decide
response = client.chat.completions.create(
    model="claude-sonnet-4",   # or "auto"
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True
)

for chunk in response:
    # delta.content is None on role-only and final chunks
    print(chunk.choices[0].delta.content or "", end="", flush=True)

// npm install openai
import OpenAI from "openai";

// One line change from your existing OpenAI code
const client = new OpenAI({
  apiKey: "op_your_key_here",
  baseURL: "https://api.originalpoint.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4",  // or "auto"
  messages: [{ role: "user", content: "Explain quantum entanglement." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

# Stream a response from Claude Sonnet 4
curl https://api.originalpoint.ai/v1/chat/completions \
  -H "Authorization: Bearer op_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Explain quantum entanglement."
    }]
  }'
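
When consuming a stream, it is often useful to assemble the full reply as well as print it. The sketch below shows one way to do that, assuming the chunks follow the OpenAI streaming shape used in the samples above (`choices[0].delta.content`, which can be `None`, and occasional chunks with an empty `choices` list). `collect_text` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a live response so the example runs offline.

```python
from types import SimpleNamespace


def collect_text(chunks):
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:   # e.g. a usage-only chunk with no choices
            continue
        delta = chunk.choices[0].delta
        if delta.content:       # role-only and final chunks carry content=None
            parts.append(delta.content)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object shaped like a streamed chunk, for offline testing."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    stream = [
        fake_chunk(None),              # role-only opening chunk
        fake_chunk("Hello"),
        fake_chunk(", world."),
        SimpleNamespace(choices=[]),   # chunk without choices
        fake_chunk(None),              # closing chunk
    ]
    print(collect_text(stream))
```

With a real client, the same function can be passed the iterator returned by `client.chat.completions.create(..., stream=True)`.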
Enterprise Ready
SOC 2 Type II

Annual third-party audit. Report available under NDA.

GDPR + DPA

EU residency options. Zero training on your data.

IP Allowlists

Per-key network policies. Instant propagation.

99.9% SLA

Contractual uptime. Auto credits. No ticket needed.

Start in 2 minutes.

No credit card required. Free tier includes all 16 models.