Documentation
Everything you need to integrate OriginalPoint into your application.
Quickstart
OriginalPoint is a drop-in replacement for the OpenAI API. If you already use the OpenAI SDK, changing two lines of code gives you access to all 16 models.
Install the SDK
Your API key is available in the dashboard immediately after signup. Keys are prefixed with op_.
Authentication
All API requests must include your API key in the Authorization header as a Bearer token.
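As a sketch, building that header in Python (ORIGINALPOINT_API_KEY is the environment-variable name suggested under Security best practices below):

```python
import os

def auth_headers(api_key=None):
    """Build the Authorization header OriginalPoint expects.
    Falls back to the ORIGINALPOINT_API_KEY environment variable."""
    key = api_key or os.environ["ORIGINALPOINT_API_KEY"]
    if not key.startswith("op_"):
        raise ValueError("OriginalPoint API keys are prefixed with op_")
    return {"Authorization": f"Bearer {key}"}
```

Attach these headers to every request you send.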
Key format
API keys are 40-character strings prefixed with op_. Example: op_a1b2c3d4e5f6...
Key rotation
You can create up to 3 keys (Free), 25 keys (Pro), or unlimited keys (Enterprise). To rotate a key:
- Create a new key in the dashboard
- Update your application to use the new key
- Delete the old key
Keys can be labeled with a name and scoped to specific IP ranges on Pro and Enterprise plans.
Security best practices
- Never commit API keys to version control
- Use environment variables: ORIGINALPOINT_API_KEY=op_...
- Enable IP allowlists on production keys
- Rotate keys immediately if you suspect compromise
Chat Completions
The primary endpoint. 100% OpenAI-compatible — any code that works with api.openai.com/v1 works here.
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID (e.g. claude-sonnet-4) or auto for smart routing. |
| messages | array | Required | Array of message objects with role (system/user/assistant) and content. |
| temperature | number | Optional | Sampling temperature 0–2. Default: 1. Higher = more random. |
| max_tokens | integer | Optional | Maximum tokens to generate. Defaults to model's context limit. |
| stream | boolean | Optional | If true, returns a streamed Server-Sent Events response. Default: false. |
| top_p | number | Optional | Nucleus sampling parameter 0–1. Default: 1. |
| routing | string | Optional | OriginalPoint routing mode: cost, speed, or quality. Only applies when model=auto. |
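The parameters above can be assembled into a request with nothing but the standard library; a sketch, with the endpoint path assumed from OpenAI compatibility:

```python
import json
import urllib.request

CHAT_URL = "https://api.originalpoint.ai/v1/chat/completions"

def build_payload(model, messages, temperature=1, max_tokens=None,
                  stream=False, top_p=1, routing=None):
    """Assemble a chat-completions body per the parameter table above."""
    payload = {"model": model, "messages": messages,
               "temperature": temperature, "stream": stream, "top_p": top_p}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if routing is not None:  # only applies when model="auto"
        payload["routing"] = routing
    return payload

def post_chat(payload, api_key):
    """POST the payload (requires a live key)."""
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```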
Response schema
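Given the OpenAI compatibility noted above, responses follow the standard chat.completion shape. An illustrative example as a Python dict (all field values here are made up, not captured output):

```python
# Illustrative response shape, mirroring the standard OpenAI
# chat.completion object; values are invented for this example.
example_response = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1700000000,        # Unix timestamp
    "model": "claude-sonnet-4",   # model actually used (relevant with model="auto")
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello!"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12},
}
```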
Models API
List all available models with pricing and capability metadata.
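A minimal way to fetch the list with the standard library. This is a sketch: the /models path and list envelope are assumed from OpenAI compatibility, and the pricing and capability fields are OriginalPoint-specific additions not detailed here:

```python
import json
import urllib.request

def list_models(api_key, base_url="https://api.originalpoint.ai/v1"):
    """GET /v1/models; assumes the OpenAI-style
    {"object": "list", "data": [...]} envelope (requires a live key)."""
    req = urllib.request.Request(
        base_url + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]
```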
Response
Smart Routing
Set model="auto" to let OriginalPoint pick the best model for each request. Combine with a routing objective to control the optimization target.
Routing modes
| Mode | Parameter | Behavior |
|---|---|---|
| Cost-optimized | routing="cost" | Selects cheapest capable model. Best for batch, classification, simple Q&A. |
| Latency-first | routing="speed" | Selects lowest-latency model. Best for chat UIs, real-time features. |
| Quality-max | routing="quality" | Selects highest-capability model. Best for reasoning, code, analysis. |
Fallback logic
If a provider experiences degraded availability, OriginalPoint automatically falls back to the next-best model for your routing objective. Fallbacks happen within the same tier. No code changes needed — the response format is identical.
Example
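A sketch of the three modes; the payload shape follows the Chat Completions parameter table, and only the routing field varies:

```python
def routed_request(prompt, mode):
    """Build an auto-routed chat request; mode must be one of the
    routing modes in the table above."""
    if mode not in ("cost", "speed", "quality"):
        raise ValueError("routing must be cost, speed, or quality")
    return {
        "model": "auto",  # routing only applies when model="auto"
        "routing": mode,
        "messages": [{"role": "user", "content": prompt}],
    }
```

For example, routed_request("Summarize this ticket", "cost") suits batch work, while "speed" suits chat UIs.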
SDKs
OriginalPoint is compatible with all existing OpenAI SDKs. Just change the base URL (base_url in Python, baseURL in Node.js).
Python
Node.js / TypeScript
Other SDKs
Any SDK that supports a custom base URL will work: LangChain, LlamaIndex, Vercel AI SDK, Instructor, and more. Set the base URL to https://api.originalpoint.ai/v1.
Rate Limits
Rate limits are applied per API key. If you exceed a limit, you'll receive a 429 Too Many Requests response.
| Limit | Free | Pro | Enterprise |
|---|---|---|---|
| Requests per minute (RPM) | 10 | 500 | Custom |
| Tokens per minute (TPM) | 40,000 | 2,000,000 | Custom |
| Tokens per month | 100,000 | 10,000,000 | Unlimited |
| Concurrent requests | 2 | 50 | Custom |
| Max tokens per request | 4,096 | 32,768 | Model max |
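A common client-side pattern for 429 responses is exponential backoff; a sketch using only the standard library (a production version would also honor a Retry-After header if the API sends one):

```python
import time
import urllib.error

def with_backoff(do_request, max_retries=5, base_delay=1.0):
    """Call do_request(), retrying on HTTP 429 with exponential backoff.
    Re-raises any other error, or a 429 once retries are exhausted."""
    for attempt in range(max_retries):
        try:
            return do_request()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```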
Rate limit headers
Every response includes the following headers: