Find answers to the most common questions about using Pinaivu below. If you don’t see what you’re looking for, visit the explorer for request-level detail or reach out to support.
Yes. Pinaivu’s API is fully OpenAI-compatible. You only need to point base_url at https://api.pinaivu.com/v1 and supply your Pinaivu API key — no other changes are required.
from openai import OpenAI

client = OpenAI(
    api_key="sk-pnv-...",
    base_url="https://api.pinaivu.com/v1",
)
Sign up or log in at https://api.pinaivu.com. Once you’re in the dashboard, navigate to API Keys and create a new key. Your key will be prefixed with sk-pnv- — copy it immediately, as it won’t be shown again.
Store your key in an environment variable (e.g. PINAIVU_API_KEY) rather than hard-coding it in your source files.
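As a minimal sketch of the advice above, the helper below reads the key from the environment and fails early if it is missing. The function name `load_api_key` is hypothetical, and the prefix check only relies on the `sk-pnv-` prefix described above.

```python
import os


def load_api_key(env=os.environ, name="PINAIVU_API_KEY"):
    """Return the API key stored in env[name], or raise if it is missing or malformed."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set")
    # Pinaivu keys are described as starting with "sk-pnv-".
    if not key.startswith("sk-pnv-"):
        raise RuntimeError(f"{name} does not look like a Pinaivu key")
    return key
```

You could then pass `api_key=load_api_key()` when constructing the client instead of writing the key into the source file.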
Pinaivu currently serves open-source LLMs routed across its decentralized GPU network, including:
  • llama3.2:1b
  • llama3.2:3b
To fetch the live list of active models at any time, query the /v1/models endpoint:
curl https://api.pinaivu.com/v1/models \
  -H "Authorization: Bearer sk-pnv-..."
The response follows the standard OpenAI model-list schema, so any tooling that already parses that format will work without modification.
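For illustration, here is a small sketch of pulling model IDs out of such a response. It assumes the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); the sample payload is made up for the example and is not real output from the endpoint.

```python
import json


def model_ids(payload):
    """Return the sorted model IDs from an OpenAI-style model-list payload."""
    return sorted(item["id"] for item in payload.get("data", []))


# Hypothetical sample body, shaped like the OpenAI model-list schema.
sample = json.loads(
    '{"object": "list", "data": ['
    '{"id": "llama3.2:3b", "object": "model"},'
    '{"id": "llama3.2:1b", "object": "model"}]}'
)
print(model_ids(sample))  # prints ['llama3.2:1b', 'llama3.2:3b']
```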
A routing receipt is a signed proof of inference that Pinaivu attaches to every completed request. It records which node handled your request, the model used, and a cryptographic attestation that the computation ran as declared.
Every routing receipt includes a request_id that you can use to look up the full record on the explorer. You can retrieve a receipt programmatically via the GET /v1/receipts/ endpoint. For a deeper explanation, see Routing Receipts.
Every successful response includes a request_id field. To verify the inference:
  1. Copy the request_id. Find the request_id in the response body from your API call.
  2. Open the explorer.
  3. Search for your request. Paste the request_id into the search bar. The explorer shows the routing receipt, the attesting node, timestamps, and the cryptographic proof.
You can also retrieve the receipt directly via the API — see GET /v1/receipts/ and the Verifying Inference guide for full details.
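The first step can also be done in code. The sketch below assumes you have the raw JSON response body as a dictionary and that request_id is a top-level string field in it; the exact position of the field is an assumption, and the helper name is hypothetical.

```python
def extract_request_id(body):
    """Return the request_id from a decoded JSON response body."""
    # Assumption: request_id is a top-level string field of the body.
    request_id = body.get("request_id")
    if not isinstance(request_id, str) or not request_id:
        raise KeyError("response body has no request_id")
    return request_id
```

The returned value is what you would paste into the explorer search bar.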
If all GPU nodes on the network are busy or temporarily unreachable, the API returns a 503 Service Unavailable error. The network is self-healing — nodes come back online quickly — so retrying with exponential backoff is usually sufficient.
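import time

import openai

client = openai.OpenAI(
    api_key="sk-pnv-...",
    base_url="https://api.pinaivu.com/v1",
)

max_attempts = 5
for attempt in range(max_attempts):
    try:
        response = client.chat.completions.create(
            model="llama3.2:3b",
            messages=[{"role": "user", "content": "Hello"}],
        )
        break
    except openai.APIStatusError as e:
        # Retry only on 503, and re-raise once the final attempt has failed
        # so that `response` is never used while undefined.
        if e.status_code != 503 or attempt == max_attempts - 1:
            raise
        time.sleep(2 ** attempt)  # wait 1, 2, 4, then 8 seconds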
Avoid tight retry loops without backoff — hammering the API during a recovery window won’t speed things up and may trigger rate limiting.
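One common refinement is to cap the delay and add random jitter so that many clients do not retry in lockstep. This is a general technique rather than something Pinaivu prescribes; the helper name and the default base and cap values are assumptions chosen for the example.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Return a delay in seconds using capped exponential backoff with full jitter."""
    # The ceiling doubles with each attempt but never exceeds `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random point between 0 and the ceiling.
    return rng() * ceiling
```

In a retry loop you would call `time.sleep(backoff_delay(attempt))` in place of a fixed `2 ** attempt` wait.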
A 422 Unprocessable Entity response means your request was received and authenticated, but the body failed validation. This differs from a 400 Bad Request: your JSON was syntactically valid, but one or more fields had an incorrect type, an unrecognized value, or a missing required property.
Common causes:
  • Passing an unsupported value for model (check the exact ID via GET /v1/models).
  • Sending messages in the wrong format (each entry must include both role and content).
  • Setting parameters outside their allowed range (for example, a negative temperature).
The error response body includes a detail field that identifies which field failed and why — read it carefully to pinpoint the problem before retrying.
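Some of these mistakes can be caught before the request is sent. The sketch below checks only the causes listed above (a model ID, role and content on every message, a non-negative temperature). It is a hypothetical client-side helper, not a copy of the server's validation rules, and it cannot tell whether a model ID is actually supported.

```python
def validate_chat_request(payload):
    """Return a list of human-readable problems found in a chat request payload."""
    problems = []
    model = payload.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model: must be a non-empty string")
    messages = payload.get("messages")
    if not isinstance(messages, list):
        problems.append("messages: must be a list")
    else:
        for i, message in enumerate(messages):
            if not isinstance(message, dict):
                problems.append(f"messages[{i}]: must be an object")
                continue
            # Each entry must include both role and content.
            for field in ("role", "content"):
                if field not in message:
                    problems.append(f"messages[{i}]: missing {field}")
    temperature = payload.get("temperature")
    if temperature is not None:
        is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
        if not is_number or temperature < 0:
            problems.append("temperature: must be a non-negative number")
    return problems
```

An empty list means none of these particular checks failed; the server may still reject the request for other reasons.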
Yes. Pinaivu enforces per-key rate limits to keep the network stable for all users. When you exceed your limit, the API returns a 429 Too Many Requests error with a Retry-After header indicating how long to wait.
If your use case requires higher throughput, contact support to discuss raising your limits.
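A rough sketch of honoring that header is shown below. HTTP allows Retry-After to carry either a number of seconds or an HTTP date; which form Pinaivu sends is not stated here, so the hypothetical helper handles both and falls back to a default when the header is missing or unreadable.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=1.0, now=None):
    """Return how many seconds to wait according to a Retry-After header."""
    value = None
    for name, raw in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            value = raw.strip()
            break
    if not value:
        return default
    if value.isdigit():  # delay-seconds form, e.g. "7"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

With the openai Python SDK, the headers of a failed call should be reachable as `e.response.headers` on the raised `openai.APIStatusError`.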
Yes. Streaming works exactly like it does with the OpenAI API. Set stream: true (or stream=True in Python) in your request and consume the server-sent event stream as usual.
Python
from openai import OpenAI

client = OpenAI(api_key="sk-pnv-...", base_url="https://api.pinaivu.com/v1")

stream = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
          | API (api.pinaivu.com/v1)                      | Chat (chat.pinaivu.ai)
Access    | Programmatic (SDK / HTTP)                     | Browser-based
State     | Stateless: you manage conversation history    | Cross-session memory built in
Auth      | Bearer token (sk-pnv-...)                     | Account login
Best for  | Applications, automation, batch workloads     | Interactive exploration, prototyping
Use the API when you’re building a product or pipeline. Use the chat interface when you want to experiment with models interactively without writing code.
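Because the API is stateless, your code has to resend the earlier turns with each request. A minimal sketch of keeping that history follows; the helper name and the `max_messages` trimming rule are assumptions made for the example, not Pinaivu behavior.

```python
def add_turn(history, role, content, max_messages=20):
    """Return a new history list with one message appended, keeping the newest entries."""
    updated = history + [{"role": role, "content": content}]
    # Hypothetical trimming rule so the prompt does not grow without bound.
    return updated[-max_messages:]


history = []
history = add_turn(history, "user", "Hello")
history = add_turn(history, "assistant", "Hi! How can I help?")
history = add_turn(history, "user", "Tell me a joke.")
print(len(history))  # prints 3
```

The resulting list is what you would pass as `messages` on the next request.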
Billing is calculated on a per-token basis. The rate depends on which model you use: smaller models like llama3.2:1b cost less per token than larger ones.
You can review your usage in two ways:
  • Dashboard — log in at https://api.pinaivu.com and open the Usage tab.
  • API — query the usage endpoint programmatically:
curl https://api.pinaivu.com/v1/usage \
  -H "Authorization: Bearer sk-pnv-..."
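For a rough local estimate, you can also total the token counts your own application has seen. The sketch below assumes each chat response carries the usual OpenAI-style `usage` object (`prompt_tokens`, `completion_tokens`) and that prompt and completion tokens are billed at the same rate; both are assumptions. The rate is supplied by the caller because no actual prices are given here.

```python
def estimate_cost(usages, rate_per_million_tokens):
    """Return (total_tokens, estimated_cost) for a list of usage dictionaries."""
    total = sum(
        u.get("prompt_tokens", 0) + u.get("completion_tokens", 0) for u in usages
    )
    # Assumption: one flat rate per million tokens, chosen by the caller.
    return total, total * rate_per_million_tokens / 1_000_000
```

Treat the result as an approximation and rely on the dashboard or the usage endpoint for the authoritative figures.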