AI Model Gateway

The routing layer for AI.

16 frontier models. One OpenAI-compatible endpoint. Smart routing that selects the best model for every request — by cost, speed, or capability.

api.originalpoint.ai/v1  ·  OpenAI-compatible  ·  <50ms
16 Frontier Models
4 World-class Providers
99.9% Uptime SLA
<50ms Median Latency
2.0 The Platform

Infrastructure built for production AI.

OriginalPoint is the API layer between your application and every major AI provider. One endpoint, one key, complete control.

One OpenAI-compatible endpoint for every model. No SDK changes, no provider-specific code. Your existing integration works on day one.

Set a routing policy — cost, speed, or quality — and let OriginalPoint dispatch every request to the optimal model.

Bring your own API keys for OpenAI, Anthropic, Google, and xAI. Pay providers directly at cost. We charge only for routing.

Per-key dashboards with token breakdowns, cost attribution, latency percentiles, and error rates. Export to Datadog, Grafana, or CSV.
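
The latency percentiles in those dashboards can be sanity-checked locally against your own request timings. A minimal nearest-rank percentile sketch (the timing values below are made up for illustration, not real gateway output):

```python
# Compute p50 / p95 latency percentiles from per-request timings.
def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (0 < p <= 100)."""
    ordered = sorted(values)
    k = max(0, -(-p * len(ordered) // 100) - 1)  # ceil(p/100 * n) - 1
    return ordered[k]

latencies_ms = [38, 41, 44, 47, 52, 61, 75, 90, 130, 410]

print(f"p50={percentile(latencies_ms, 50)}ms")  # → p50=52ms
print(f"p95={percentile(latencies_ms, 95)}ms")  # → p95=410ms
```

Note how a single slow outlier dominates the p95 while leaving the median untouched, which is why dashboards report both.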

Automatic fallback routing when a provider degrades. Retry with exponential backoff. Circuit breakers prevent cascade failures.
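
The same retry pattern is easy to sketch client-side. This illustrative loop (not OriginalPoint's actual implementation) shows exponential backoff between attempts:

```python
import time

def call_with_retry(fn, max_attempts=4, base_delay=0.5):
    """Retry fn() with exponential backoff: base_delay, 2x, 4x, ... between attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

# Illustrative flaky call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("provider degraded")
    return "ok"

print(call_with_retry(flaky, base_delay=0.05))  # → ok (after two retries)
```

A circuit breaker adds one more piece on top of this: after repeated failures it stops calling the provider entirely for a cooldown window, so retries can't pile up into a cascade.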

SOC 2 Type II certified. IP allowlists per API key. Audit logs with 90-day retention. GDPR-compliant with EU data residency options.
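
Per-key IP allowlists behave like standard CIDR matching. A minimal sketch of the check, using Python's stdlib ipaddress module (illustrative, not the gateway's code; the CIDR blocks are example values):

```python
import ipaddress

ALLOWLIST = [ipaddress.ip_network(cidr) for cidr in ("10.0.0.0/8", "203.0.113.0/24")]

def is_allowed(client_ip: str) -> bool:
    """True if client_ip falls inside any allowlisted CIDR block."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWLIST)

print(is_allowed("203.0.113.7"))   # → True
print(is_allowed("198.51.100.1"))  # → False
```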

3.0 Quick Start
01
Create an account

Sign up and get your API key instantly. No approval process, no sales call. Free tier includes access to all 16 models.

02
Set base_url

Point your existing OpenAI SDK at our endpoint. One line change. No other code modifications needed.

03
Call any model

Use any model ID directly, or pass "auto" to activate smart routing.

Python

from openai import OpenAI

client = OpenAI(
    api_key="op_...",
    base_url="https://api.originalpoint.ai/v1"
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello."}]
)
print(response.choices[0].message.content)

TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "op_...",
  baseURL: "https://api.originalpoint.ai/v1",
});

const res = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello." }],
});
console.log(res.choices[0].message.content);

cURL

curl https://api.originalpoint.ai/v1/chat/completions \
  -H "Authorization: Bearer op_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hello."}]}'
4.0 The Models

Every frontier model, one endpoint.

Model · Provider · Context · Input / 1M · Output / 1M · Tier
GPT-5 mini · OpenAI · 16K · $0.40 · $1.60 · Fast
Grok Code Fast 1 · xAI · 131K · $0.50 · $2.00 · Fast
Gemini 3 Flash · Google · 1M · $0.10 · $0.40 · Fast
GPT-5 · OpenAI · 128K · $5.00 · $20.00 · Versatile
GPT-5.1 · OpenAI · 128K · $8.00 · $25.00 · Versatile
Claude Sonnet 4 · Anthropic · 200K · $3.00 · $15.00 · Versatile
Claude Sonnet 4.5 · Anthropic · 200K · $3.00 · $15.00 · Versatile
Claude Haiku 4.5 · Anthropic · 200K · $0.80 · $4.00 · Versatile
GPT-5.2 · OpenAI · 128K · $10.00 · $30.00 · Versatile
GPT-4.1 · OpenAI · 128K · $2.00 · $8.00 · Versatile
GPT-4o · OpenAI · 128K · $5.00 · $15.00 · Versatile
GPT-5.1-Codex-Max · OpenAI · 200K · $30.00 · $120.00 · Powerful
Claude Opus 4.5 · Anthropic · 200K · $15.00 · $75.00 · Powerful
Claude Opus 4.1 · Anthropic · 200K · $15.00 · $75.00 · Powerful
Gemini 3 Pro · Google · 1M · $3.50 · $10.50 · Powerful
Gemini 2.5 Pro · Google · 2M · $1.25 · $5.00 · Powerful
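
The per-1M-token prices translate to per-request cost as (input_tokens / 1e6) × input price + (output_tokens / 1e6) × output price. A quick worked example using two rows from the table (the model IDs here are illustrative):

```python
# Cost of one request using the table's published per-1M-token prices.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "gemini-3-flash": (0.10, 0.40),
    "claude-sonnet-4": (3.00, 15.00),
}

def request_cost(model, input_tokens, output_tokens):
    inp, out = PRICES[model]
    return input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out

# 10K prompt tokens + 1K completion tokens:
print(round(request_cost("gemini-3-flash", 10_000, 1_000), 6))   # → 0.0014
print(round(request_cost("claude-sonnet-4", 10_000, 1_000), 6))  # → 0.045
```

The same request is roughly 30x cheaper on the Fast-tier model, which is the spread that cost-mode routing exploits.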
View all models with full specs →
5.0 Smart Routing

Route by cost, speed, or quality.

— Cost mode
Minimizes spend per token. Selects the cheapest model that meets your minimum capability threshold.
— Speed mode
Minimizes time-to-first-token. Routes to the fastest available model, factoring in real-time provider latency.
— Quality mode
Selects the highest-capability model that fits your request, evaluating context length, task complexity, and provider scores.
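
Cost mode, for instance, reduces to picking the cheapest candidate that clears a capability floor. A simplified sketch using tiers and input prices from the model table (the scoring is illustrative, not the production algorithm):

```python
# Simplified cost-mode routing: cheapest model whose tier meets the floor.
TIER_RANK = {"Fast": 1, "Versatile": 2, "Powerful": 3}

MODELS = [  # (model, tier, input $/1M tokens)
    ("gemini-3-flash", "Fast", 0.10),
    ("claude-haiku-4.5", "Versatile", 0.80),
    ("gpt-4.1", "Versatile", 2.00),
    ("gemini-2.5-pro", "Powerful", 1.25),
]

def route_by_cost(min_tier):
    candidates = [m for m in MODELS if TIER_RANK[m[1]] >= TIER_RANK[min_tier]]
    return min(candidates, key=lambda m: m[2])[0]

print(route_by_cost("Fast"))       # → gemini-3-flash
print(route_by_cost("Versatile"))  # → claude-haiku-4.5
print(route_by_cost("Powerful"))   # → gemini-2.5-pro
```

Speed and quality modes swap the sort key (measured time-to-first-token, or a capability score) while keeping the same filter-then-rank shape.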
Your Application
  ↓
OriginalPoint Router: parse model="auto" → evaluate routing policy → score provider health → select optimal model
  ↓
OpenAI (GPT-5 · 4.1) · Anthropic (Sonnet · Opus) · Google (Gemini 3 · 2.5) · xAI (Grok Code Fast)
6.0 For Developers

Everything you need.
Nothing you don't.

Python

# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="op_your_key_here",
    base_url="https://api.originalpoint.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

TypeScript

// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "op_your_key_here",
  baseURL: "https://api.originalpoint.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4",
  messages: [{ role: "user", content: "Explain quantum entanglement." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

cURL

curl https://api.originalpoint.ai/v1/chat/completions \
  -H "Authorization: Bearer op_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "stream": true,
    "messages": [{"role":"user","content":"Explain quantum entanglement."}]
  }'
7.0 Enterprise Ready
SOC 2 Type II

Annual third-party audit. Report available under NDA.

GDPR + DPA

EU residency options. Zero training on your data.

IP Allowlists

Per-key network policies. Instant propagation.

99.9% SLA

Contractual uptime. Auto credits. No ticket needed.
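
For reference, a 99.9% uptime SLA works out to roughly 43 minutes of allowed downtime per 30-day month:

```python
# Downtime budget implied by a 99.9% uptime SLA over a 30-day month.
minutes_per_month = 30 * 24 * 60          # 43,200 minutes
budget = minutes_per_month * (1 - 0.999)  # the 0.1% not covered by the SLA
print(round(budget, 1))  # → 43.2 minutes
```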

Start in 2 minutes.

No credit card required. Free tier includes all 16 models.