Pinaivu FAQ: API Keys, Models, Billing, and Inference

Is Pinaivu compatible with the OpenAI SDK?

Yes. Pinaivu’s API is fully OpenAI-compatible. You only need to point base_url at https://api.pinaivu.com/v1 and supply your Pinaivu API key — no other changes are required.

from openai import OpenAI

client = OpenAI(
    api_key="sk-pnv-...",
    base_url="https://api.pinaivu.com/v1",
)
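The SDK is optional. As a minimal sketch, the request below is built with only the Python standard library. It assumes the chat endpoint is POST /v1/chat/completions (the path the OpenAI SDK calls) and that the body follows the OpenAI chat schema. `build_chat_request` is a hypothetical helper, not part of any Pinaivu package.

```python
import json
import urllib.request

BASE_URL = "https://api.pinaivu.com/v1"


def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Assumption: Pinaivu serves POST {BASE_URL}/chat/completions, mirroring
    the path the OpenAI SDK uses.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # same bearer scheme as the curl examples
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the request to urllib.request.urlopen and decode the response with json.load.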

How do I get an API key?

Sign up or log in at https://api.pinaivu.com. Once you’re in the dashboard, navigate to API Keys and create a new key. Your key will be prefixed with sk-pnv- — copy it immediately, as it won’t be shown again.

Store your key in an environment variable (e.g. PINAIVU_API_KEY) rather than hard-coding it in your source files.
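A minimal sketch of loading the key at startup, assuming the variable is named PINAIVU_API_KEY as suggested above. Note that the OpenAI SDK only reads OPENAI_API_KEY on its own, so a Pinaivu key in a different variable has to be passed explicitly. The prefix check is just a sanity check based on the documented sk-pnv- prefix; `load_api_key` is a hypothetical helper.

```python
import os


def load_api_key(var_name: str = "PINAIVU_API_KEY") -> str:
    """Read the Pinaivu API key from the environment instead of source code."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    # Pinaivu keys start with "sk-pnv-"; this catches a key pasted from another provider.
    if not key.startswith("sk-pnv-"):
        raise ValueError(f"{var_name} does not look like a Pinaivu key (expected 'sk-pnv-' prefix)")
    return key
```

Usage: OpenAI(api_key=load_api_key(), base_url="https://api.pinaivu.com/v1").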

What models are available?

Pinaivu currently serves open-source LLMs routed across its decentralized GPU network, including:

llama3.2:1b
llama3.2:3b

To fetch the live list of active models at any time, query the /v1/models endpoint:

curl https://api.pinaivu.com/v1/models \
  -H "Authorization: Bearer sk-pnv-..."

The response follows the standard OpenAI model-list schema, so any tooling that already parses that format will work without modification.
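Because the payload follows the OpenAI model-list schema (a top-level object whose data array holds entries with an id), the exact model IDs can be pulled out with the standard library. This is a sketch: the field names come from the OpenAI schema the API is said to follow, and both helper names are hypothetical.

```python
import json
import urllib.request


def fetch_model_list(api_key: str, base_url: str = "https://api.pinaivu.com/v1") -> dict:
    """GET /v1/models and return the decoded JSON payload."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def model_ids(payload: dict) -> list:
    """Extract IDs from an OpenAI-style list: {"object": "list", "data": [{"id": ...}, ...]}."""
    return [entry["id"] for entry in payload.get("data", [])]
```

For example, checking "llama3.2:3b" in model_ids(fetch_model_list(os.environ["PINAIVU_API_KEY"])) before sending a request confirms the ID is currently served.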

What is a routing receipt?

A routing receipt is a signed proof of inference that Pinaivu attaches to every completed request. It records which node handled your request, the model used, and a cryptographic attestation that the computation ran as declared.

Every routing receipt includes a request_id that you can use to look up the full record on the explorer. You can retrieve a receipt programmatically via the GET /v1/receipts/ endpoint. For a deeper explanation, see Routing Receipts.

How do I verify my inference?

Every successful response includes a request_id field. To verify the inference:

1. Copy the request_id. Find the request_id in the response body from your API call.

2. Open the explorer. Navigate to https://explorer.pinaivu.com.

3. Search for your request. Paste the request_id into the search bar. The explorer shows the routing receipt, the attesting node, timestamps, and the cryptographic proof.
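The first step can be scripted. This sketch assumes request_id is a top-level string field in the JSON response body; the helper names are hypothetical, and any IDs you see in testing are your own. With the OpenAI SDK, response.model_dump_json() should produce a JSON string you can pass in, since the SDK's response models keep unrecognized fields.

```python
import json
from typing import Optional

EXPLORER_URL = "https://explorer.pinaivu.com"


def extract_request_id(raw_body: str) -> Optional[str]:
    """Return the top-level request_id from a raw JSON response body, or None."""
    body = json.loads(raw_body)
    if not isinstance(body, dict):
        return None
    request_id = body.get("request_id")  # assumed to sit at the top level
    return request_id if isinstance(request_id, str) else None


def explorer_hint(raw_body: str) -> str:
    """Describe what to paste into the explorer for this response."""
    request_id = extract_request_id(raw_body)
    if request_id is None:
        return "No request_id found in this response."
    return f"Paste {request_id} into the search bar at {EXPLORER_URL}"
```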

You can also retrieve the receipt directly via the API — see GET /v1/receipts/ and the Verifying Inference guide for full details.

What happens if no nodes are available?

If all GPU nodes on the network are busy or temporarily unreachable, the API returns a 503 Service Unavailable error. The network is self-healing — nodes come back online quickly — so retrying with exponential backoff is usually sufficient.

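import time

import openai

client = openai.OpenAI(
    api_key="sk-pnv-...",
    base_url="https://api.pinaivu.com/v1",
    max_retries=0,  # turn off the SDK's built-in retries so this loop controls backoff
)

MAX_ATTEMPTS = 5

for attempt in range(MAX_ATTEMPTS):
    try:
        response = client.chat.completions.create(
            model="llama3.2:3b",
            messages=[{"role": "user", "content": "Hello"}],
        )
        break  # success: stop retrying
    except openai.APIStatusError as e:
        # Re-raise anything other than 503, and give up after the last attempt
        if e.status_code != 503 or attempt == MAX_ATTEMPTS - 1:
            raise
        time.sleep(2 ** attempt)  # wait 1, 2, 4, 8 seconds between attempts

print(response.choices[0].message.content)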

Avoid tight retry loops without backoff — hammering the API during a recovery window won’t speed things up and may trigger rate limiting.

What does a 422 error mean?

A 422 Unprocessable Entity response means your request was received and authenticated, but the body failed validation. This is different from a 400 Bad Request: your JSON was syntactically valid, but one or more fields had an incorrect type, an unrecognized value, or a missing required property.

Common causes:

Passing an unsupported value for model (check the exact ID via GET /v1/models).
Sending messages in the wrong format (each entry must include both role and content).
Setting parameters outside their allowed range (for example, a negative temperature).
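Catching these mistakes before a request leaves your machine saves a round trip. The sketch below checks only the three causes listed above; it is not a full schema validator, and the server may enforce stricter ranges. `validate_chat_payload` is a hypothetical helper, and known_models is assumed to come from GET /v1/models.

```python
def validate_chat_payload(payload: dict, known_models: set) -> list:
    """Return likely causes of a 422; an empty list means none were found."""
    problems = []
    # Cause 1: model ID not served (compare against GET /v1/models)
    if payload.get("model") not in known_models:
        problems.append(f"unknown model: {payload.get('model')!r}")
    # Cause 2: every message needs both role and content
    messages = payload.get("messages")
    if not isinstance(messages, list):
        problems.append("messages must be a list")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
                problems.append(f"messages[{i}] must include both role and content")
    # Cause 3: parameter out of range (only the negative-temperature example is checked)
    temperature = payload.get("temperature")
    if temperature is not None:
        if not isinstance(temperature, (int, float)):
            problems.append("temperature must be a number")
        elif temperature < 0:
            problems.append("temperature must not be negative")
    return problems
```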

The error response body includes a detail field that identifies which field failed and why — read it carefully to pinpoint the problem before retrying.
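The exact shape of detail is not specified here. This sketch assumes it is either a plain string or a list of objects with loc (the path to the failing field) and msg (the reason), a common convention for validation errors; adjust it to what you actually observe. `describe_422` is a hypothetical helper.

```python
from typing import Any, Dict, List


def describe_422(body: Dict[str, Any]) -> List[str]:
    """Turn the `detail` field of a 422 response body into readable lines."""
    detail = body.get("detail")
    if detail is None:
        return []
    if isinstance(detail, str):
        return [detail]
    if isinstance(detail, dict):
        detail = [detail]  # treat a single error object like a one-item list
    lines = []
    for item in detail:
        if isinstance(item, dict):
            # Assumed shape: {"loc": ["body", "messages", 0, "role"], "msg": "..."}
            location = ".".join(str(part) for part in item.get("loc", []))
            lines.append(f"{location}: {item.get('msg', 'invalid value')}")
        else:
            lines.append(str(item))
    return lines
```

With the OpenAI SDK, a 422 is raised as openai.UnprocessableEntityError; when the body decodes as JSON, its body attribute should be a dict you can pass to this helper.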

Is there a rate limit?

Yes. Pinaivu enforces per-key rate limits to keep the network stable for all users. When you exceed your limit, the API returns a 429 Too Many Requests error with a Retry-After header indicating how long to wait.

If your use case requires higher throughput, contact support to discuss raising your limits.
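Rather than guessing a delay, a client can honor Retry-After. Under the HTTP specification the header holds either a number of seconds or an HTTP date; this sketch handles both and falls back to a default when the header is missing or unparseable. It assumes Pinaivu uses that standard format; `retry_after_seconds` is a hypothetical helper.

```python
import email.utils
import time
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Seconds to wait before retrying, based on a Retry-After header."""
    # Header names are case-insensitive, so match without relying on the mapping type
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return default
    try:
        return max(0.0, float(value))  # delay-seconds form, e.g. "7"
    except ValueError:
        pass
    try:
        retry_at = email.utils.parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    return max(0.0, retry_at.timestamp() - time.time())
```

With the OpenAI SDK, a 429 surfaces as openai.RateLimitError, whose response.headers can be passed in directly. The SDK's built-in retries already consider this header, so the helper matters mostly when those are disabled or when calling the HTTP API directly.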

Can I use Pinaivu for streaming responses?

Yes. Streaming works exactly like it does with the OpenAI API. Set stream: true (or stream=True in Python) in your request and consume the server-sent event stream as usual.

Python

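from openai import OpenAI

client = OpenAI(api_key="sk-pnv-...", base_url="https://api.pinaivu.com/v1")

stream = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
)

for chunk in stream:
    # Some chunks (for example a final usage chunk) may carry no choices; skip them
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
print()  # end the line once the stream finishes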
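Without the SDK, the stream arrives as raw server-sent events. Assuming Pinaivu uses the same event format as the OpenAI API (JSON chunks on data: lines, ending with data: [DONE]), the text can be reassembled with the standard library. `iter_sse_content` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from an OpenAI-style chat completion SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # role-only or null deltas carry no text
                yield content
```

For example, with a urllib response to a request whose body sets "stream": true, iterate over iter_sse_content(raw.decode("utf-8") for raw in resp) and print each piece as it arrives.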

What is the difference between the API and the chat interface?

	API (`api.pinaivu.com/v1`)	Chat (`chat.pinaivu.ai`)
Access	Programmatic (SDK / HTTP)	Browser-based
State	Stateless — you manage conversation history	Cross-session memory built in
Auth	Bearer token (`sk-pnv-...`)	Account login
Best for	Applications, automation, batch workloads	Interactive exploration, prototyping

Use the API when you’re building a product or pipeline. Use the chat interface when you want to experiment with models interactively without writing code.
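Because the API is stateless, every request must carry the conversation so far. A minimal sketch of keeping that history client-side is below; the Conversation class is hypothetical, and trimming by message count is a deliberately simple policy (a production client might trim by token count to fit the model's context window instead).

```python
from typing import List, Optional


class Conversation:
    """Client-side chat history for a stateless chat completions API."""

    def __init__(self, system_prompt: Optional[str] = None, max_messages: int = 20):
        self.max_messages = max_messages
        self.system = [{"role": "system", "content": system_prompt}] if system_prompt else []
        self.turns: List[dict] = []

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        # Drop the oldest turns to bound request size; the system prompt is always kept
        if len(self.turns) > self.max_messages:
            self.turns = self.turns[-self.max_messages:]

    def messages(self) -> List[dict]:
        """The full messages list to send with the next request."""
        return self.system + self.turns
```

Usage: call conv.add("user", question), send messages=conv.messages() with the request, then conv.add("assistant", reply.choices[0].message.content) so the next turn includes the answer.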

How is billing calculated?

Billing is calculated on a per-token basis. The rate depends on which model you use: smaller models like llama3.2:1b cost less per token than larger ones.

You can review your usage in two ways:

Dashboard — log in at https://api.pinaivu.com and open the Usage tab.
API — query the usage endpoint programmatically:

curl https://api.pinaivu.com/v1/usage \
  -H "Authorization: Bearer sk-pnv-..."
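To estimate what a single request costs, multiply its token count by the model's rate. This sketch assumes completion responses carry an OpenAI-style usage block (prompt_tokens, completion_tokens, total_tokens) and a single per-token rate per model; per-model prices are not listed here, so the rate is a parameter you fill in from your dashboard. `estimate_cost` is a hypothetical helper, and the figure it returns is an estimate, not your invoice.

```python
from typing import Mapping


def estimate_cost(usage: Mapping[str, int], price_per_million_tokens: float) -> float:
    """Estimate one request's charge from its usage block and a per-1M-token rate."""
    # Prefer total_tokens; otherwise add the prompt and completion counts
    tokens = usage.get(
        "total_tokens",
        usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0),
    )
    return tokens * price_per_million_tokens / 1_000_000
```

With the OpenAI SDK, response.usage.model_dump() returns a dict in this shape. If input and output tokens turn out to be priced differently, split the calculation per field.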