AI Model Gateway

The routing layer for AI.

16 frontier models. One OpenAI-compatible endpoint. Smart routing that selects the best model for every request — by cost, speed, or capability.

api.originalpoint.ai/v1  ·  OpenAI-compatible  ·  <50ms
16 Frontier Models
4 World-class Providers
99.9% Uptime SLA
<50ms Median Latency
2.0 The Platform

Infrastructure built for production AI.

OriginalPoint is the API layer between your application and every major AI provider. One endpoint, one key, complete control.

One OpenAI-compatible endpoint for every model. No SDK changes, no provider-specific code. Your existing integration works on day one.

Set a routing policy — cost, speed, or quality — and let OriginalPoint dispatch every request to the optimal model.

Bring your own API keys for OpenAI, Anthropic, Google, and xAI. Pay providers directly at cost. We charge only for routing.

Per-key dashboards with token breakdowns, cost attribution, latency percentiles, and error rates. Export to Datadog, Grafana, or CSV.
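
The latency percentiles in those dashboards can be sanity-checked locally against your own request timings. A minimal nearest-rank percentile sketch (the timing values below are made up for illustration, not real gateway output):

```python
# Compute p50 / p95 latency percentiles from per-request timings.
def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (0 < p <= 100)."""
    ordered = sorted(values)
    k = max(0, -(-p * len(ordered) // 100) - 1)  # ceil(p/100 * n) - 1
    return ordered[k]

latencies_ms = [38, 41, 44, 47, 52, 61, 75, 90, 130, 410]

print(f"p50={percentile(latencies_ms, 50)}ms")  # → p50=52ms
print(f"p95={percentile(latencies_ms, 95)}ms")  # → p95=410ms
```

Note how a single slow outlier dominates the p95 while leaving the median untouched, which is why dashboards report both.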

Automatic fallback routing when a provider degrades. Retry with exponential backoff. Circuit breakers prevent cascade failures.
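
The same retry pattern is easy to sketch client-side. This illustrative loop (not OriginalPoint's actual implementation) shows exponential backoff between attempts:

```python
import time

def call_with_retry(fn, max_attempts=4, base_delay=0.5):
    """Retry fn() with exponential backoff: base_delay, 2x, 4x, ... between attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

# Illustrative flaky call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("provider degraded")
    return "ok"

print(call_with_retry(flaky, base_delay=0.05))  # → ok (after two retries)
```

A circuit breaker adds one more piece on top of this: after repeated failures it stops calling the provider entirely for a cooldown window, so retries can't pile up into a cascade.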

SOC 2 Type II certified. IP allowlists per API key. Audit logs with 90-day retention. GDPR-compliant with EU data residency options.
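
Per-key IP allowlists behave like standard CIDR matching. A minimal sketch of the check, using Python's stdlib ipaddress module (illustrative, not the gateway's code; the CIDR blocks are example values):

```python
import ipaddress

ALLOWLIST = [ipaddress.ip_network(cidr) for cidr in ("10.0.0.0/8", "203.0.113.0/24")]

def is_allowed(client_ip: str) -> bool:
    """True if client_ip falls inside any allowlisted CIDR block."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWLIST)

print(is_allowed("203.0.113.7"))   # → True
print(is_allowed("198.51.100.1"))  # → False
```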

3.0 Quick Start
01
Create an account

Sign up and get your API key instantly. No approval process, no sales call. Free tier includes access to all 16 models.

02
Set base_url

Point your existing OpenAI SDK at our endpoint. One line change. No other code modifications needed.

03
Call any model

Use any model ID directly, or pass "auto" to activate smart routing.

Python

from openai import OpenAI

client = OpenAI(
    api_key="op_...",
    base_url="https://api.originalpoint.ai/v1"
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello."}]
)
print(response.choices[0].message.content)

TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "op_...",
  baseURL: "https://api.originalpoint.ai/v1",
});

const res = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello." }],
});
console.log(res.choices[0].message.content);

cURL

curl https://api.originalpoint.ai/v1/chat/completions \
  -H "Authorization: Bearer op_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hello."}]}'
4.0 The Models

Every frontier model, one endpoint.

Model · Provider · Context · Input / 1M · Output / 1M · Tier
GPT-5 mini · OpenAI · 16K · $0.40 · $1.60 · Fast
Grok Code Fast 1 · xAI · 131K · $0.50 · $2.00 · Fast
Gemini 3 Flash · Google · 1M · $0.10 · $0.40 · Fast
GPT-5 · OpenAI · 128K · $5.00 · $20.00 · Versatile
GPT-5.1 · OpenAI · 128K · $8.00 · $25.00 · Versatile
Claude Sonnet 4 · Anthropic · 200K · $3.00 · $15.00 · Versatile
Claude Sonnet 4.5 · Anthropic · 200K · $3.00 · $15.00 · Versatile
Claude Haiku 4.5 · Anthropic · 200K · $0.80 · $4.00 · Versatile
GPT-5.2 · OpenAI · 128K · $10.00 · $30.00 · Versatile
GPT-4.1 · OpenAI · 128K · $2.00 · $8.00 · Versatile
GPT-4o · OpenAI · 128K · $5.00 · $15.00 · Versatile
GPT-5.1-Codex-Max · OpenAI · 200K · $30.00 · $120.00 · Powerful
Claude Opus 4.5 · Anthropic · 200K · $15.00 · $75.00 · Powerful
Claude Opus 4.1 · Anthropic · 200K · $15.00 · $75.00 · Powerful
Gemini 3 Pro · Google · 1M · $3.50 · $10.50 · Powerful
Gemini 2.5 Pro · Google · 2M · $1.25 · $5.00 · Powerful
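
The per-1M-token prices translate to per-request cost as (input_tokens / 1e6) × input price + (output_tokens / 1e6) × output price. A quick worked example using two rows from the table (the model IDs here are illustrative):

```python
# Cost of one request using the table's published per-1M-token prices.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "gemini-3-flash": (0.10, 0.40),
    "claude-sonnet-4": (3.00, 15.00),
}

def request_cost(model, input_tokens, output_tokens):
    inp, out = PRICES[model]
    return input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out

# 10K prompt tokens + 1K completion tokens:
print(round(request_cost("gemini-3-flash", 10_000, 1_000), 6))   # → 0.0014
print(round(request_cost("claude-sonnet-4", 10_000, 1_000), 6))  # → 0.045
```

The same request is roughly 30x cheaper on the Fast-tier model, which is the spread that cost-mode routing exploits.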
View all models with full specs →
5.0 Smart Routing

Route by cost, speed, or quality.

— Cost mode
Minimizes spend per token. Selects the cheapest model that meets your minimum capability threshold.
— Speed mode
Minimizes time-to-first-token. Routes to the fastest available model, factoring in real-time provider latency.
— Quality mode
Selects the highest-capability model that fits your request, evaluating context length, task complexity, and provider scores.
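
Cost mode, for instance, reduces to picking the cheapest candidate that clears a capability floor. A simplified sketch using tiers and input prices from the model table (the scoring is illustrative, not the production algorithm):

```python
# Simplified cost-mode routing: cheapest model whose tier meets the floor.
TIER_RANK = {"Fast": 1, "Versatile": 2, "Powerful": 3}

MODELS = [  # (model, tier, input $/1M tokens)
    ("gemini-3-flash", "Fast", 0.10),
    ("claude-haiku-4.5", "Versatile", 0.80),
    ("gpt-4.1", "Versatile", 2.00),
    ("gemini-2.5-pro", "Powerful", 1.25),
]

def route_by_cost(min_tier):
    candidates = [m for m in MODELS if TIER_RANK[m[1]] >= TIER_RANK[min_tier]]
    return min(candidates, key=lambda m: m[2])[0]

print(route_by_cost("Fast"))       # → gemini-3-flash
print(route_by_cost("Versatile"))  # → claude-haiku-4.5
print(route_by_cost("Powerful"))   # → gemini-2.5-pro
```

Speed and quality modes swap the sort key (measured time-to-first-token, or a capability score) while keeping the same filter-then-rank shape.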
Your Application
  ↓
OriginalPoint Router: parse model="auto" → evaluate routing policy → score provider health → select optimal model
  ↓
OpenAI (GPT-5 · 4.1) · Anthropic (Sonnet · Opus) · Google (Gemini 3 · 2.5) · xAI (Grok Code Fast)
6.0 For Developers

Everything you need.
Nothing you don't.

Python

# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="op_your_key_here",
    base_url="https://api.originalpoint.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

TypeScript

// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "op_your_key_here",
  baseURL: "https://api.originalpoint.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4",
  messages: [{ role: "user", content: "Explain quantum entanglement." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

cURL

curl https://api.originalpoint.ai/v1/chat/completions \
  -H "Authorization: Bearer op_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "stream": true,
    "messages": [{"role":"user","content":"Explain quantum entanglement."}]
  }'
7.0 Enterprise Ready
SOC 2 Type II

Annual third-party audit. Report available under NDA.

GDPR + DPA

EU residency options. Zero training on your data.

IP Allowlists

Per-key network policies. Instant propagation.

99.9% SLA

Contractual uptime. Auto credits. No ticket needed.
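
For reference, a 99.9% uptime SLA works out to roughly 43 minutes of allowed downtime per 30-day month:

```python
# Downtime budget implied by a 99.9% uptime SLA over a 30-day month.
minutes_per_month = 30 * 24 * 60          # 43,200 minutes
budget = minutes_per_month * (1 - 0.999)  # the 0.1% not covered by the SLA
print(round(budget, 1))  # → 43.2 minutes
```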

Start in 2 minutes.

No credit card required. Free tier includes all 16 models.