OriginalPoint Documentation

Quickstart

Get your first response in under 2 minutes. OriginalPoint is OpenAI-compatible — if you've used the OpenAI SDK, you already know how to use this API.

Step 1: Get your API key

Create a free account at originalpoint.ai/signup. After signup, navigate to Dashboard → API Keys and create your first key. Keys look like: op_xxxxxxxxxxxxxxxxxxxx

Step 2: Install the SDK

Terminal
# Python
pip install openai

# Node.js
npm install openai

Step 3: Make your first request

Python
from openai import OpenAI

client = OpenAI(
    api_key="op_your_key_here",
    base_url="https://api.originalpoint.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5",  # or "auto" for smart routing
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ]
)

print(response.choices[0].message.content)
# → "4"
That's it. The only changes from standard OpenAI usage are api_key (your OP key) and base_url. All models, parameters, and response formats are identical.

Authentication

All API requests require a Bearer token in the Authorization header.

HTTP Header
Authorization: Bearer op_your_key_here
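If you're calling the API without an SDK, the header is built the same way in any HTTP client. A minimal Python sketch (the actual `requests` call is shown as a comment; `ORIGINALPOINT_API_KEY` is the same environment variable used in the Node.js example below):

```python
import os

# Read the key from the environment rather than hard-coding it.
api_key = os.environ.get("ORIGINALPOINT_API_KEY", "op_your_key_here")

# Every request carries the key as a Bearer token:
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

# e.g. with the requests library:
# requests.post("https://api.originalpoint.ai/v1/chat/completions",
#               headers=headers, json=payload)
```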

Managing API Keys

You can create, rotate, and revoke keys from Dashboard → API Keys. Best practices:

  • Use one key per project/environment (dev, staging, prod)
  • Set per-key monthly spend caps to prevent runaway costs
  • Never commit keys to source code — use environment variables
  • Rotate immediately if you suspect a key is compromised
Security note: If your key is exposed in a public repo, revoke it immediately from the dashboard and create a new one. We do not automatically detect key exposure.

Chat Completions

POST /v1/chat/completions

Creates a model response for the given chat conversation. Fully compatible with the OpenAI Chat Completions API.

Request Parameters

Parameter Type Required Description
model string Required Model ID (e.g. gpt-5, claude-opus-4.5) or "auto" for smart routing.
messages array Required Array of message objects. Each has role (system/user/assistant) and content (string or array for vision).
temperature number Optional Sampling temperature 0–2. Higher = more random. Default: 1. Mutually exclusive with top_p.
max_tokens integer Optional Maximum tokens to generate. Model-specific maximum applies. If omitted, model uses its default max.
stream boolean Optional If true, returns a stream of Server-Sent Events. Each event contains a partial completion chunk. Default: false.
top_p number Optional Nucleus sampling: considers tokens comprising the top top_p probability mass. Range: 0–1. Default: 1.
n integer Optional Number of completion choices to generate. Each uses additional tokens. Default: 1.
stop string | array Optional Up to 4 sequences where generation stops. The token triggering stop is not included in the output.
user string Optional End-user ID for abuse monitoring. Passed through to providers that support it.
extra_body.routing string Optional OriginalPoint routing mode: "cost", "latency", or "reliability". Only used when model="auto".
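To see several of the optional parameters together, here is an illustrative set of keyword arguments for `client.chat.completions.create` (the specific values are examples, not recommendations):

```python
# Illustrative values; pass with client.chat.completions.create(**kwargs).
kwargs = {
    "model": "gpt-5",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Name three prime numbers."},
    ],
    "temperature": 0.2,   # low randomness for a factual task
    "max_tokens": 64,     # cap the completion length
    "stop": ["\n\n"],     # stop at the first blank line
    "n": 1,               # a single completion choice
}
```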

Response Schema

JSON Response
{
  "id": "chatcmpl-op_abc123",
  "object": "chat.completion",
  "created": 1746123456,
  "model": "gpt-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "4"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 1,
    "total_tokens": 24,
    "cost_usd": 0.000061  // OriginalPoint extension field
  }
}
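The fields above can be read straight off the parsed JSON. A quick sketch using the example response (values copied from above; with the SDK, the standard fields are attributes instead, e.g. `response.usage.total_tokens`):

```python
import json

# The example response from above, as raw JSON.
raw = """
{
  "id": "chatcmpl-op_abc123",
  "object": "chat.completion",
  "created": 1746123456,
  "model": "gpt-5",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "4"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 23, "completion_tokens": 1,
            "total_tokens": 24, "cost_usd": 0.000061}
}
"""
resp = json.loads(raw)

answer = resp["choices"][0]["message"]["content"]  # the model's reply
cost = resp["usage"]["cost_usd"]                   # OriginalPoint extension field
print(answer, cost)
```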

Models API

GET /v1/models

Returns a list of all available models with their IDs, providers, and capabilities.

Example Response
{
  "object": "list",
  "data": [
    {
      "id": "gpt-5",
      "object": "model",
      "created": 1746000000,
      "owned_by": "openai",
      "context_length": 131072,
      "pricing": {
        "input_per_million": 2.50,
        "output_per_million": 10.00
      }
    },
    {
      "id": "claude-opus-4.5",
      "object": "model",
      "created": 1746000000,
      "owned_by": "anthropic",
      "context_length": 204800,
      "pricing": {
        "input_per_million": 15.00,
        "output_per_million": 75.00
      }
    }
    // ... 14 more models
  ]
}
Python — list models
models = client.models.list()
for model in models.data:
    print(model.id, model.owned_by)
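The pricing data in /v1/models makes it easy to rank models by cost. A sketch using the two entries from the example response above (the 3:1 input:output blend is an arbitrary assumption, not an API convention):

```python
# The two models from the example response above (USD per million tokens).
models = [
    {"id": "gpt-5",
     "pricing": {"input_per_million": 2.50, "output_per_million": 10.00}},
    {"id": "claude-opus-4.5",
     "pricing": {"input_per_million": 15.00, "output_per_million": 75.00}},
]

def blended_cost(m):
    """Blended cost assuming a 3:1 input:output token ratio."""
    p = m["pricing"]
    return 0.75 * p["input_per_million"] + 0.25 * p["output_per_million"]

cheapest = min(models, key=blended_cost)
print(cheapest["id"])  # → gpt-5
```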

Streaming

Set stream=True to receive a Server-Sent Events stream. Tokens are emitted as they're generated — ideal for chat UIs.

Python — streaming
stream = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
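Each chunk's delta carries at most a few characters, and its content can be None (for example, on role-only or final chunks), so filter before joining. A local sketch with simulated deltas:

```python
# Simulated delta payloads; real ones arrive as chunk.choices[0].delta.content.
deltas = ["Snow ", None, "drifts ", "down", None]

full_text = "".join(d for d in deltas if d)
print(full_text)  # → Snow drifts down
```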

Python SDK

Use the official openai Python package — it's fully compatible.

Install
pip install openai
Setup (~/.bashrc or .env)
export OPENAI_API_KEY="op_your_key_here"
export OPENAI_BASE_URL="https://api.originalpoint.ai/v1"
Usage
from openai import OpenAI
import os

# Reads OPENAI_API_KEY and OPENAI_BASE_URL from environment
client = OpenAI()

resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(resp.choices[0].message.content)

Node.js SDK

Use the official openai npm package.

Install
npm install openai
JavaScript / TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ORIGINALPOINT_API_KEY,
  baseURL: 'https://api.originalpoint.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'gemini-3-pro',
  messages: [{role: 'user', content: 'Explain quantum entanglement simply.'}],
  max_tokens: 256
});

console.log(response.choices[0].message.content);

Smart Routing

Set model="auto" to enable smart routing. The OriginalPoint router selects the optimal model in real-time based on your configured routing mode.

How it works: When you use model="auto", the router evaluates all 16 available models against your request's requirements (token budget, context length, modality) and applies the routing strategy you specify.
Smart routing with cost mode
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this text: ..."}],
    extra_body={
        "routing": "cost"  # Always selects cheapest capable model
    }
)

# Check which model was actually used
print(f"Routed to: {response.model}")
print(f"Cost: ${response.usage.cost_usd:.6f}")

Routing Modes

Mode Value Best For
Cost cost High-volume tasks (classification, extraction, summarization) where cost matters most
Latency latency Real-time chat UIs, interactive applications, any UX where response time is critical
Reliability reliability Production workflows and automated pipelines where request success matters most
Default: If you use model="auto" without specifying a routing mode, "cost" is used. You can set a default routing mode per-key in the dashboard.
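The mode is set per request via extra_body, exactly as in the cost-mode example above. For instance, a latency-optimized call for an interactive UI (illustrative values):

```python
# Keyword arguments for client.chat.completions.create; "latency" asks the
# router to prioritize fast models for interactive use.
kwargs = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Quick: capital of France?"}],
    "extra_body": {"routing": "latency"},
}
```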

BYOK Setup

Connect your own provider API keys to route through your credentials and pay provider rates directly.

  1. Go to Dashboard → BYOK
     Navigate to the BYOK section in your OriginalPoint dashboard.
  2. Add your provider key
     Enter your OpenAI (sk-...), Anthropic (sk-ant-...), or Google API key. It's encrypted immediately.
  3. Use as normal
     Your API requests automatically use your BYOK credentials when available. No code changes needed.

Error Handling

OriginalPoint uses standard HTTP status codes. Error bodies follow the OpenAI error format.

Status Code Meaning
200 ok Request succeeded
400 invalid_request Malformed request (missing required field, invalid model ID, etc.)
401 invalid_api_key API key missing, invalid, or revoked
402 quota_exceeded Monthly spend cap or included token limit reached
429 rate_limit_exceeded Too many requests. Check the Retry-After header.
500 internal_error Unexpected server error. If persistent, check the status page.
503 provider_unavailable Provider is down. Use "routing": "reliability" to auto-failover.
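For 429 and 503 in particular, the usual client-side response is retry with exponential backoff. A generic sketch, not part of the SDK: `TransientAPIError` here is a stand-in; with the openai package you would catch its own exceptions (e.g. its RateLimitError) instead.

```python
import time

class TransientAPIError(Exception):
    """Stand-in for a 429/503-style error from the API."""

def call_with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on transient errors, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)

# Demo: a call that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientAPIError()
    return "ok"

print(call_with_retries(flaky, sleep=lambda s: None))  # → ok
```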

Rate Limits

Plan Requests/min Tokens/min Concurrent
Free 60 150K 5
Pro 300 1M 25
Enterprise Custom Custom Custom

Rate limit headers are included in every response: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
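These headers can be used to pause until the window resets. A sketch on illustrative values (this assumes X-RateLimit-Reset is a Unix timestamp; verify against your actual responses):

```python
import time

def seconds_until_reset(headers, now=None):
    """How long to wait before retrying, based on X-RateLimit-* headers.

    Assumes X-RateLimit-Reset is a Unix timestamp (an assumption here).
    """
    if int(headers["X-RateLimit-Remaining"]) > 0:
        return 0.0  # budget left in the current window
    now = time.time() if now is None else now
    return max(0.0, int(headers["X-RateLimit-Reset"]) - now)

# Illustrative values; real headers come from the HTTP response.
headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1746123516",
}
print(seconds_until_reset(headers, now=1746123500))  # → 16.0
```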