OriginalPoint Documentation

Quickstart

Get your first response in under 2 minutes. OriginalPoint is OpenAI-compatible — if you've used the OpenAI SDK, you already know how to use this API.

Step 1: Get your API key

Create a free account at originalpoint.ai/signup. After signup, navigate to Dashboard → API Keys and create your first key. Keys look like: op_xxxxxxxxxxxxxxxxxxxx

Step 2: Install the SDK

Terminal
# Python
pip install openai

# Node.js
npm install openai

Step 3: Make your first request

Python
from openai import OpenAI

client = OpenAI(
    api_key="op_your_key_here",
    base_url="https://api.originalpoint.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5",  # or "auto" for smart routing
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ]
)

print(response.choices[0].message.content)
# → "4"
That's it. The only changes from standard OpenAI usage are api_key (your OP key) and base_url. All models, parameters, and response formats are identical.

Authentication

All API requests require a Bearer token in the Authorization header.

HTTP Header
Authorization: Bearer op_your_key_here
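If you're calling the API without an SDK, the header is built the same way in any HTTP client. A minimal Python sketch (the actual `requests` call is shown as a comment; `ORIGINALPOINT_API_KEY` is the same environment variable used in the Node.js example below):

```python
import os

# Read the key from the environment rather than hard-coding it.
api_key = os.environ.get("ORIGINALPOINT_API_KEY", "op_your_key_here")

# Every request carries the key as a Bearer token:
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

# e.g. with the requests library:
# requests.post("https://api.originalpoint.ai/v1/chat/completions",
#               headers=headers, json=payload)
```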

Managing API Keys

You can create, rotate, and revoke keys from Dashboard → API Keys. Best practices:

  • Use one key per project/environment (dev, staging, prod)
  • Set per-key monthly spend caps to prevent runaway costs
  • Never commit keys to source code — use environment variables
  • Rotate immediately if you suspect a key is compromised
Security note: If your key is exposed in a public repo, revoke it immediately from the dashboard and create a new one. We do not automatically detect key exposure.

Chat Completions

POST /v1/chat/completions

Creates a model response for the given chat conversation. Fully compatible with the OpenAI Chat Completions API.

Request Parameters

Parameter Type Required Description
model string Required Model ID (e.g. gpt-5, claude-opus-4.5) or "auto" for smart routing.
messages array Required Array of message objects. Each has role (system/user/assistant) and content (string or array for vision).
temperature number Optional Sampling temperature 0–2. Higher = more random. Default: 1. Mutually exclusive with top_p.
max_tokens integer Optional Maximum tokens to generate. Model-specific maximum applies. If omitted, model uses its default max.
stream boolean Optional If true, returns a stream of Server-Sent Events. Each event contains a partial completion chunk. Default: false.
top_p number Optional Nucleus sampling: considers tokens comprising the top top_p probability mass. Range: 0–1. Default: 1.
n integer Optional Number of completion choices to generate. Each uses additional tokens. Default: 1.
stop string | array Optional Up to 4 sequences where generation stops. The token triggering stop is not included in the output.
user string Optional End-user ID for abuse monitoring. Passed through to providers that support it.
extra_body.routing string Optional OriginalPoint routing mode: "cost", "latency", or "reliability". Only used when model="auto".
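To see several of the optional parameters together, here is an illustrative set of keyword arguments for `client.chat.completions.create` (the specific values are examples, not recommendations):

```python
# Illustrative values; pass with client.chat.completions.create(**kwargs).
kwargs = {
    "model": "gpt-5",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Name three prime numbers."},
    ],
    "temperature": 0.2,   # low randomness for a factual task
    "max_tokens": 64,     # cap the completion length
    "stop": ["\n\n"],     # stop at the first blank line
    "n": 1,               # a single completion choice
}
```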

Response Schema

JSON Response
{
  "id": "chatcmpl-op_abc123",
  "object": "chat.completion",
  "created": 1746123456,
  "model": "gpt-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "4"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 1,
    "total_tokens": 24,
    "cost_usd": 0.000061  // OriginalPoint extension field
  }
}
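The fields above can be read straight off the parsed JSON. A quick sketch using the example response (values copied from above; with the SDK, the standard fields are attributes instead, e.g. `response.usage.total_tokens`):

```python
import json

# The example response from above, as raw JSON.
raw = """
{
  "id": "chatcmpl-op_abc123",
  "object": "chat.completion",
  "created": 1746123456,
  "model": "gpt-5",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "4"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 23, "completion_tokens": 1,
            "total_tokens": 24, "cost_usd": 0.000061}
}
"""
resp = json.loads(raw)

answer = resp["choices"][0]["message"]["content"]  # the model's reply
cost = resp["usage"]["cost_usd"]                   # OriginalPoint extension field
print(answer, cost)
```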

Models API

GET /v1/models

Returns a list of all available models with their IDs, providers, and capabilities.

Example Response
{
  "object": "list",
  "data": [
    {
      "id": "gpt-5",
      "object": "model",
      "created": 1746000000,
      "owned_by": "openai",
      "context_length": 131072,
      "pricing": {
        "input_per_million": 2.50,
        "output_per_million": 10.00
      }
    },
    {
      "id": "claude-opus-4.5",
      "object": "model",
      "created": 1746000000,
      "owned_by": "anthropic",
      "context_length": 204800,
      "pricing": {
        "input_per_million": 15.00,
        "output_per_million": 75.00
      }
    }
    // ... 14 more models
  ]
}
Python — list models
models = client.models.list()
for model in models.data:
    print(model.id, model.owned_by)
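The pricing data in /v1/models makes it easy to rank models by cost. A sketch using the two entries from the example response above (the 3:1 input:output blend is an arbitrary assumption, not an API convention):

```python
# The two models from the example response above (USD per million tokens).
models = [
    {"id": "gpt-5",
     "pricing": {"input_per_million": 2.50, "output_per_million": 10.00}},
    {"id": "claude-opus-4.5",
     "pricing": {"input_per_million": 15.00, "output_per_million": 75.00}},
]

def blended_cost(m):
    """Blended cost assuming a 3:1 input:output token ratio."""
    p = m["pricing"]
    return 0.75 * p["input_per_million"] + 0.25 * p["output_per_million"]

cheapest = min(models, key=blended_cost)
print(cheapest["id"])  # → gpt-5
```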

Streaming

Set stream=True to receive a Server-Sent Events stream. Tokens are emitted as they're generated — ideal for chat UIs.

Python — streaming
stream = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
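Each chunk's delta carries at most a few characters, and its content can be None (for example, on role-only or final chunks), so filter before joining. A local sketch with simulated deltas:

```python
# Simulated delta payloads; real ones arrive as chunk.choices[0].delta.content.
deltas = ["Snow ", None, "drifts ", "down", None]

full_text = "".join(d for d in deltas if d)
print(full_text)  # → Snow drifts down
```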

Python SDK

Use the official openai Python package — it's fully compatible.

Install
pip install openai
Setup (~/.bashrc or .env)
export OPENAI_API_KEY="op_your_key_here"
export OPENAI_BASE_URL="https://api.originalpoint.ai/v1"
Usage
from openai import OpenAI
import os

# Reads OPENAI_API_KEY and OPENAI_BASE_URL from environment
client = OpenAI()

resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(resp.choices[0].message.content)

Node.js SDK

Use the official openai npm package.

Install
npm install openai
JavaScript / TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ORIGINALPOINT_API_KEY,
  baseURL: 'https://api.originalpoint.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'gemini-3-pro',
  messages: [{role: 'user', content: 'Explain quantum entanglement simply.'}],
  max_tokens: 256
});

console.log(response.choices[0].message.content);

Smart Routing

Set model="auto" to enable smart routing. The OriginalPoint router selects the optimal model in real-time based on your configured routing mode.

How it works: When you use model="auto", the router evaluates all 16 available models against your request's requirements (token budget, context length, modality) and applies the routing strategy you specify.
Smart routing with cost mode
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this text: ..."}],
    extra_body={
        "routing": "cost"  # Always selects cheapest capable model
    }
)

# Check which model was actually used
print(f"Routed to: {response.model}")
print(f"Cost: ${response.usage.cost_usd:.6f}")

Routing Modes

Mode Value Best For
Cost cost High-volume tasks (classification, extraction, summarization) where cost matters most
Latency latency Real-time chat UIs, interactive applications, any UX where response time is critical
Reliability reliability Production workflows and automated pipelines where request success matters most
Default: If you use model="auto" without specifying a routing mode, "cost" is used. You can set a default routing mode per-key in the dashboard.
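The mode is set per request via extra_body, exactly as in the cost-mode example above. For instance, a latency-optimized call for an interactive UI (illustrative values):

```python
# Keyword arguments for client.chat.completions.create; "latency" asks the
# router to prioritize fast models for interactive use.
kwargs = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Quick: capital of France?"}],
    "extra_body": {"routing": "latency"},
}
```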

BYOK Setup

Connect your own provider API keys to route through your credentials and pay provider rates directly.

  1. Go to Dashboard → BYOK
     Navigate to the BYOK section in your OriginalPoint dashboard.
  2. Add your provider key
     Enter your OpenAI (sk-...), Anthropic (sk-ant-...), or Google API key. It's encrypted immediately.
  3. Use as normal
     Your API requests automatically use your BYOK credentials when available. No code changes needed.

Error Handling

OriginalPoint uses standard HTTP status codes. Error bodies follow the OpenAI error format.

Status Code Meaning
200 ok Request succeeded
400 invalid_request Malformed request (missing required field, invalid model ID, etc.)
401 invalid_api_key API key missing, invalid, or revoked
402 quota_exceeded Monthly spend cap or included token limit reached
429 rate_limit_exceeded Too many requests. Check the Retry-After header.
500 internal_error Unexpected server error. If persistent, check the status page.
503 provider_unavailable Provider is down. Use "routing": "reliability" to auto-failover.
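For 429 and 503 in particular, the usual client-side response is retry with exponential backoff. A generic sketch, not part of the SDK: `TransientAPIError` here is a stand-in; with the openai package you would catch its own exceptions (e.g. its RateLimitError) instead.

```python
import time

class TransientAPIError(Exception):
    """Stand-in for a 429/503-style error from the API."""

def call_with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on transient errors, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)

# Demo: a call that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientAPIError()
    return "ok"

print(call_with_retries(flaky, sleep=lambda s: None))  # → ok
```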

Rate Limits

Plan Requests/min Tokens/min Concurrent
Free 60 150K 5
Pro 300 1M 25
Enterprise Custom Custom Custom

Rate limit headers are included in every response: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
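These headers can be used to pause until the window resets. A sketch on illustrative values (this assumes X-RateLimit-Reset is a Unix timestamp; verify against your actual responses):

```python
import time

def seconds_until_reset(headers, now=None):
    """How long to wait before retrying, based on X-RateLimit-* headers.

    Assumes X-RateLimit-Reset is a Unix timestamp (an assumption here).
    """
    if int(headers["X-RateLimit-Remaining"]) > 0:
        return 0.0  # budget left in the current window
    now = time.time() if now is None else now
    return max(0.0, int(headers["X-RateLimit-Reset"]) - now)

# Illustrative values; real headers come from the HTTP response.
headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1746123516",
}
print(seconds_until_reset(headers, now=1746123500))  # → 16.0
```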