Documentation

Everything you need to integrate OriginalPoint into your application.

Quickstart

OriginalPoint is a drop-in replacement for the OpenAI API. If you already use the OpenAI SDK, changing two lines of code gives you access to all 16 models.

Install the SDK

# Install: pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="op_your_api_key",
    base_url="https://api.originalpoint.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
// Install: npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'op_your_api_key',
  baseURL: 'https://api.originalpoint.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: 'Hello' }]
});
console.log(response.choices[0].message.content);
curl https://api.originalpoint.ai/v1/chat/completions \
  -H "Authorization: Bearer op_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Your API key is available in the dashboard immediately after signup. Keys are prefixed with op_.

Authentication

All API requests must include your API key in the Authorization header as a Bearer token.

Authorization: Bearer op_your_api_key

Key format

API keys are 40-character strings prefixed with op_. Example: op_a1b2c3d4e5f6...
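To fail fast on obviously malformed keys, you can sanity-check a key's shape before sending a request. A minimal sketch — the docs guarantee only the op_ prefix and the 40-character total length; the alphanumeric body is our assumption:

```python
import re

# 40 characters total: "op_" prefix + 37-character body.
# The alphanumeric body is an assumption; only the prefix and
# the total length are documented.
KEY_PATTERN = re.compile(r"^op_[A-Za-z0-9]{37}$")

def looks_like_api_key(key: str) -> bool:
    """Cheap client-side shape check; does not verify the key with the API."""
    return bool(KEY_PATTERN.fullmatch(key))
```

This catches copy-paste accidents (truncated keys, keys from another provider) before they turn into a 401 at runtime.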

Key rotation

You can create up to 3 keys (Free), 25 keys (Pro), or unlimited keys (Enterprise). To rotate a key:

  1. Create a new key in the dashboard
  2. Update your application to use the new key
  3. Delete the old key

Keys can be labeled with a name and scoped to specific IP ranges on Pro and Enterprise plans.

Security best practices

  • Never commit API keys to version control
  • Use environment variables: ORIGINALPOINT_API_KEY=op_...
  • Enable IP allowlists on production keys
  • Rotate keys immediately if you suspect compromise
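The environment-variable practice above can be sketched as a small loader that fails fast when the key is missing (the helper name is ours, not part of any SDK):

```python
import os

def load_api_key(var: str = "ORIGINALPOINT_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the app")
    return key
```

Pass the result to the client constructor, e.g. OpenAI(api_key=load_api_key(), base_url="https://api.originalpoint.ai/v1"), so the key never appears in source control.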

Chat Completions

The primary endpoint. 100% OpenAI-compatible — any code that works with api.openai.com/v1 works here.

POST https://api.originalpoint.ai/v1/chat/completions

Request parameters

Parameter    Type     Required  Description
model        string   Required  Model ID (e.g. claude-sonnet-4) or auto for smart routing.
messages     array    Required  Array of message objects with role (system/user/assistant) and content.
temperature  number   Optional  Sampling temperature 0–2. Default: 1. Higher = more random.
max_tokens   integer  Optional  Maximum tokens to generate. Defaults to the model's context limit.
stream       boolean  Optional  If true, returns a streamed Server-Sent Events response. Default: false.
top_p        number   Optional  Nucleus sampling parameter 0–1. Default: 1.
routing      string   Optional  OriginalPoint routing mode: cost, speed, or quality. Only applies when model=auto.

Response schema

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "claude-sonnet-4",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  },
  "x_op_routing_used": "claude-sonnet-4",
  "x_op_latency_ms": 340
}
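If you consume the raw JSON without an SDK, the fields above — including the two x_op_ extension fields — can be read directly (the sample values are the ones from the schema):

```python
import json

# The example response body from the schema above.
raw = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "claude-sonnet-4",
  "choices": [{"index": 0,
               "message": {"role": "assistant",
                           "content": "Hello! How can I help you?"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21},
  "x_op_routing_used": "claude-sonnet-4",
  "x_op_latency_ms": 340
}"""

resp = json.loads(raw)
content = resp["choices"][0]["message"]["content"]
total = resp["usage"]["total_tokens"]
routed = resp["x_op_routing_used"]   # OriginalPoint extension field
latency = resp["x_op_latency_ms"]    # OriginalPoint extension field
```

The x_op_ fields are extra keys on an otherwise standard OpenAI response, so SDKs that ignore unknown fields remain compatible.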

Models API

List all available models with pricing and capability metadata.

GET https://api.originalpoint.ai/v1/models

Response

{
  "object": "list",
  "data": [
    {
      "id": "gpt-5-mini",
      "object": "model",
      "created": 1714000000,
      "owned_by": "openai",
      "context_window": 16000,
      "pricing": { "input_per_1m": 0.40, "output_per_1m": 1.60 },
      "tier": "fast"
    },
    {
      "id": "claude-sonnet-4",
      "object": "model",
      "owned_by": "anthropic",
      "context_window": 200000,
      "pricing": { "input_per_1m": 3.00, "output_per_1m": 15.00 },
      "tier": "versatile"
    }
    // ... 14 more models
  ]
}
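The pricing fields are dollars per million tokens, so the cost of a single request follows directly from its usage block. A sketch using the two models shown above:

```python
# Dollars per 1M tokens, copied from the models response above.
PRICING = {
    "gpt-5-mini":      {"input_per_1m": 0.40, "output_per_1m": 1.60},
    "claude-sonnet-4": {"input_per_1m": 3.00, "output_per_1m": 15.00},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the dollar cost of one request from its token usage."""
    p = PRICING[model]
    return (prompt_tokens * p["input_per_1m"]
            + completion_tokens * p["output_per_1m"]) / 1_000_000
```

For example, the 12-prompt / 9-completion usage from the chat completions response schema costs 12 × 3.00/1M + 9 × 15.00/1M = $0.000171 on claude-sonnet-4.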

Smart Routing

Set model="auto" to let OriginalPoint pick the best model for each request. Combine with a routing objective to control the optimization target.

Routing modes

Mode            Parameter          Behavior
Cost-optimized  routing="cost"     Selects the cheapest capable model. Best for batch, classification, simple Q&A.
Latency-first   routing="speed"    Selects the lowest-latency model. Best for chat UIs, real-time features.
Quality-max     routing="quality"  Selects the highest-capability model. Best for reasoning, code, analysis.

Fallback logic

If a provider experiences degraded availability, OriginalPoint automatically falls back to the next-best model for your routing objective. Fallbacks happen within the same tier. No code changes needed — the response format is identical.

Example

response = client.chat.completions.create(
    model="auto",
    routing="cost",  # optional, defaults to "quality"
    messages=[{"role": "user", "content": "Summarize this."}]
)
# response.model shows which model was actually used
# response.x_op_routing_used shows the routing decision

SDKs

OriginalPoint is compatible with all existing OpenAI SDKs. Just change the base_url/baseURL.

Python

# Install: pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="op_your_api_key",
    base_url="https://api.originalpoint.ai/v1"
)

Node.js / TypeScript

// Install: npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ORIGINALPOINT_API_KEY,
  baseURL: 'https://api.originalpoint.ai/v1'
});

Other SDKs

Any SDK that supports a custom base URL will work: LangChain, LlamaIndex, Vercel AI SDK, Instructor, and more. Set the base URL to https://api.originalpoint.ai/v1.

Rate Limits

Rate limits are applied per API key. If you exceed a limit, you'll receive a 429 Too Many Requests response.

Limit                      Free     Pro         Enterprise
Requests per minute (RPM)  10       500         Custom
Tokens per minute (TPM)    40,000   2,000,000   Custom
Tokens per month           100,000  10,000,000  Unlimited
Concurrent requests        2        50          Custom
Max tokens per request     4,096    32,768      Model max
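A common way to handle the 429 response is exponential backoff. A minimal sketch — the exception class and delays here are ours; in practice you would catch the 429 error your SDK raises (e.g. openai.RateLimitError with the OpenAI SDK):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's 429 error."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on rate-limit errors, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_retries must be at least 1")
```

Wrap your request in a closure, e.g. with_backoff(lambda: client.chat.completions.create(...)). Doubling the delay (1s, 2s, 4s, ...) keeps retry traffic from compounding the overload that triggered the 429.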

Rate limit headers

Every response includes the following headers:

X-RateLimit-Limit-Requests: 500
X-RateLimit-Remaining-Requests: 487
X-RateLimit-Reset-Requests: 2025-01-01T00:00:30Z
X-RateLimit-Limit-Tokens: 2000000
X-RateLimit-Remaining-Tokens: 1987432
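These headers can drive client-side throttling. A sketch that pulls the numeric budgets out of a response's header mapping (the header names are from above; the parsing helper is ours):

```python
def parse_rate_limit(headers: dict) -> dict:
    """Extract remaining request/token budgets from response headers."""
    return {
        "requests_limit": int(headers["X-RateLimit-Limit-Requests"]),
        "requests_remaining": int(headers["X-RateLimit-Remaining-Requests"]),
        "tokens_limit": int(headers["X-RateLimit-Limit-Tokens"]),
        "tokens_remaining": int(headers["X-RateLimit-Remaining-Tokens"]),
        "reset_at": headers["X-RateLimit-Reset-Requests"],
    }
```

Checking tokens_remaining before dispatching a large batch lets you pause proactively instead of reacting to 429s.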