Documentation

Everything you need to integrate OriginalPoint into your application.

Quickstart

OriginalPoint is a drop-in replacement for the OpenAI API. If you already use the OpenAI SDK, changing two lines of code gives you access to all 16 models.

Install the SDK

# Install: pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="op_your_api_key",
    base_url="https://api.originalpoint.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
// Install: npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'op_your_api_key',
  baseURL: 'https://api.originalpoint.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: 'Hello' }]
});
console.log(response.choices[0].message.content);
curl https://api.originalpoint.ai/v1/chat/completions \
  -H "Authorization: Bearer op_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Your API key is available in the dashboard immediately after signup. Keys are prefixed with op_.

Authentication

All API requests must include your API key in the Authorization header as a Bearer token.

Authorization: Bearer op_your_api_key

Key format

API keys are 40-character strings prefixed with op_. Example: op_a1b2c3d4e5f6...
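To fail fast on obviously malformed keys, you can sanity-check a key's shape before sending a request. A minimal sketch — the docs guarantee only the op_ prefix and the 40-character total length; the alphanumeric body is our assumption:

```python
import re

# 40 characters total: "op_" prefix + 37-character body.
# The alphanumeric body is an assumption; only the prefix and
# the total length are documented.
KEY_PATTERN = re.compile(r"^op_[A-Za-z0-9]{37}$")

def looks_like_api_key(key: str) -> bool:
    """Cheap client-side shape check; does not verify the key with the API."""
    return bool(KEY_PATTERN.fullmatch(key))
```

This catches copy-paste accidents (truncated keys, keys from another provider) before they turn into a 401 at runtime.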

Key rotation

You can create up to 3 keys (Free), 25 keys (Pro), or unlimited keys (Enterprise). To rotate a key:

  1. Create a new key in the dashboard
  2. Update your application to use the new key
  3. Delete the old key

Keys can be labeled with a name and scoped to specific IP ranges on Pro and Enterprise plans.

Security best practices

  • Never commit API keys to version control
  • Use environment variables: ORIGINALPOINT_API_KEY=op_...
  • Enable IP allowlists on production keys
  • Rotate keys immediately if you suspect compromise
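The environment-variable practice above can be sketched as a small loader that fails fast when the key is missing (the helper name is ours, not part of any SDK):

```python
import os

def load_api_key(var: str = "ORIGINALPOINT_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the app")
    return key
```

Pass the result to the client constructor, e.g. OpenAI(api_key=load_api_key(), base_url="https://api.originalpoint.ai/v1"), so the key never appears in source control.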

Chat Completions

The primary endpoint. 100% OpenAI-compatible — any code that works with api.openai.com/v1 works here.

POST https://api.originalpoint.ai/v1/chat/completions

Request parameters

Parameter    Type     Required  Description
model        string   Required  Model ID (e.g. claude-sonnet-4) or auto for smart routing.
messages     array    Required  Array of message objects with role (system/user/assistant) and content.
temperature  number   Optional  Sampling temperature 0–2. Default: 1. Higher = more random.
max_tokens   integer  Optional  Maximum tokens to generate. Defaults to the model's context limit.
stream       boolean  Optional  If true, returns a streamed Server-Sent Events response. Default: false.
top_p        number   Optional  Nucleus sampling parameter 0–1. Default: 1.
routing      string   Optional  OriginalPoint routing mode: cost, speed, or quality. Only applies when model=auto.

Response schema

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "claude-sonnet-4",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  },
  "x_op_routing_used": "claude-sonnet-4",
  "x_op_latency_ms": 340
}
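If you consume the raw JSON without an SDK, the fields above — including the two x_op_ extension fields — can be read directly (the sample values are the ones from the schema):

```python
import json

# The example response body from the schema above.
raw = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "claude-sonnet-4",
  "choices": [{"index": 0,
               "message": {"role": "assistant",
                           "content": "Hello! How can I help you?"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21},
  "x_op_routing_used": "claude-sonnet-4",
  "x_op_latency_ms": 340
}"""

resp = json.loads(raw)
content = resp["choices"][0]["message"]["content"]
total = resp["usage"]["total_tokens"]
routed = resp["x_op_routing_used"]   # OriginalPoint extension field
latency = resp["x_op_latency_ms"]    # OriginalPoint extension field
```

The x_op_ fields are extra keys on an otherwise standard OpenAI response, so SDKs that ignore unknown fields remain compatible.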

Models API

List all available models with pricing and capability metadata.

GET https://api.originalpoint.ai/v1/models

Response

{
  "object": "list",
  "data": [
    {
      "id": "gpt-5-mini",
      "object": "model",
      "created": 1714000000,
      "owned_by": "openai",
      "context_window": 16000,
      "pricing": { "input_per_1m": 0.40, "output_per_1m": 1.60 },
      "tier": "fast"
    },
    {
      "id": "claude-sonnet-4",
      "object": "model",
      "owned_by": "anthropic",
      "context_window": 200000,
      "pricing": { "input_per_1m": 3.00, "output_per_1m": 15.00 },
      "tier": "versatile"
    }
    // ... 14 more models
  ]
}
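The pricing fields are dollars per million tokens, so the cost of a single request follows directly from its usage block. A sketch using the two models shown above:

```python
# Dollars per 1M tokens, copied from the models response above.
PRICING = {
    "gpt-5-mini":      {"input_per_1m": 0.40, "output_per_1m": 1.60},
    "claude-sonnet-4": {"input_per_1m": 3.00, "output_per_1m": 15.00},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the dollar cost of one request from its token usage."""
    p = PRICING[model]
    return (prompt_tokens * p["input_per_1m"]
            + completion_tokens * p["output_per_1m"]) / 1_000_000
```

For example, the 12-prompt / 9-completion usage from the chat completions response schema costs 12 × 3.00/1M + 9 × 15.00/1M = $0.000171 on claude-sonnet-4.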

Smart Routing

Set model="auto" to let OriginalPoint pick the best model for each request. Combine with a routing objective to control the optimization target.

Routing modes

Mode            Parameter          Behavior
Cost-optimized  routing="cost"     Selects the cheapest capable model. Best for batch, classification, simple Q&A.
Latency-first   routing="speed"    Selects the lowest-latency model. Best for chat UIs, real-time features.
Quality-max     routing="quality"  Selects the highest-capability model. Best for reasoning, code, analysis.

Fallback logic

If a provider experiences degraded availability, OriginalPoint automatically falls back to the next-best model for your routing objective. Fallbacks happen within the same tier. No code changes needed — the response format is identical.

Example

response = client.chat.completions.create(
    model="auto",
    routing="cost",  # optional, defaults to "quality"
    messages=[{"role": "user", "content": "Summarize this."}]
)
# response.model shows which model was actually used
# response.x_op_routing_used shows the routing decision

SDKs

OriginalPoint is compatible with all existing OpenAI SDKs. Just change the base_url/baseURL.

Python

# Install: pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="op_your_api_key",
    base_url="https://api.originalpoint.ai/v1"
)

Node.js / TypeScript

// Install: npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ORIGINALPOINT_API_KEY,
  baseURL: 'https://api.originalpoint.ai/v1'
});

Other SDKs

Any SDK that supports a custom base URL will work: LangChain, LlamaIndex, Vercel AI SDK, Instructor, and more. Set the base URL to https://api.originalpoint.ai/v1.

Rate Limits

Rate limits are applied per API key. If you exceed a limit, you'll receive a 429 Too Many Requests response.

Limit                      Free     Pro         Enterprise
Requests per minute (RPM)  10       500         Custom
Tokens per minute (TPM)    40,000   2,000,000   Custom
Tokens per month           100,000  10,000,000  Unlimited
Concurrent requests        2        50          Custom
Max tokens per request     4,096    32,768      Model max
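A common way to handle the 429 response is exponential backoff. A minimal sketch — the exception class and delays here are ours; in practice you would catch the 429 error your SDK raises (e.g. openai.RateLimitError with the OpenAI SDK):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's 429 error."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on rate-limit errors, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_retries must be at least 1")
```

Wrap your request in a closure, e.g. with_backoff(lambda: client.chat.completions.create(...)). Doubling the delay (1s, 2s, 4s, ...) keeps retry traffic from compounding the overload that triggered the 429.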

Rate limit headers

Every response includes the following headers:

X-RateLimit-Limit-Requests: 500
X-RateLimit-Remaining-Requests: 487
X-RateLimit-Reset-Requests: 2025-01-01T00:00:30Z
X-RateLimit-Limit-Tokens: 2000000
X-RateLimit-Remaining-Tokens: 1987432
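These headers can drive client-side throttling. A sketch that pulls the numeric budgets out of a response's header mapping (the header names are from above; the parsing helper is ours):

```python
def parse_rate_limit(headers: dict) -> dict:
    """Extract remaining request/token budgets from response headers."""
    return {
        "requests_limit": int(headers["X-RateLimit-Limit-Requests"]),
        "requests_remaining": int(headers["X-RateLimit-Remaining-Requests"]),
        "tokens_limit": int(headers["X-RateLimit-Limit-Tokens"]),
        "tokens_remaining": int(headers["X-RateLimit-Remaining-Tokens"]),
        "reset_at": headers["X-RateLimit-Reset-Requests"],
    }
```

Checking tokens_remaining before dispatching a large batch lets you pause proactively instead of reacting to 429s.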