The chat completions endpoint is the primary way to run LLM inference on Pinaivu. You send a conversation as a messages array, and the API returns the model’s response along with token usage statistics. The endpoint is compatible with the OpenAI chat completions API, so tooling built for OpenAI works here once it is pointed at the Pinaivu base URL and given a Pinaivu API key.

Endpoint

POST https://api.pinaivu.com/v1/chat/completions

Request Body

model
string
required
The ID of the model to use. See GET /v1/models for the current list of available models. Example: llama3.2:1b.
messages
array
required
An array of message objects representing the conversation so far. Each object must include a role (system, user, or assistant) and a content string.
stream
boolean
default: false
When set to true, the API streams partial response tokens as server-sent events (SSE) instead of returning a single completed response.
temperature
number
Sampling temperature between 0 and 2. Higher values make the output more random; lower values make it more focused and deterministic. Defaults to 1.
max_tokens
integer
The maximum number of tokens the model may generate in its response. When omitted, the model generates until it reaches its natural stopping point or the context limit.
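
The field rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Pinaivu SDK: the helper name `build_chat_request` is hypothetical, and the checks that `messages` is non-empty and that `max_tokens` is positive are assumptions rather than documented server behavior.

```python
from __future__ import annotations

from typing import Any, Optional

# Roles listed in the messages field description above.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_request(
    model: str,
    messages: list[dict[str, str]],
    *,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Validate the documented fields and return a JSON-serializable body."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        # Assumption: an empty conversation is not useful to send.
        raise ValueError("messages must contain at least one message")
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"messages[{i}].role must be one of {sorted(ALLOWED_ROLES)}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}].content must be a string")

    body: dict[str, Any] = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if max_tokens is not None:
        if max_tokens < 1:
            # Assumption: only positive limits are meaningful.
            raise ValueError("max_tokens must be a positive integer")
        body["max_tokens"] = max_tokens
    return body
```

Optional fields are left out of the body when they are not set, so the server-side defaults described above still apply.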

Example Request

cURL
curl https://api.pinaivu.com/v1/chat/completions \
  -H "Authorization: Bearer sk-pnv-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is a peer-to-peer AI inference network?" }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
Python
from openai import OpenAI

client = OpenAI(
    api_key="sk-pnv-...",
    base_url="https://api.pinaivu.com/v1",
)

response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a peer-to-peer AI inference network?"},
    ],
    temperature=0.7,
    max_tokens=256,
)

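print(response.choices[0].message.content)
# request_id is a Pinaivu-specific field; the OpenAI SDK exposes
# response fields it does not model as plain attributes.
print("Request ID:", response.request_id)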
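
When stream is set to true, the response arrives as server-sent events rather than a single JSON object. The sketch below shows one way to reassemble the text from such a stream. It assumes the events follow the OpenAI chunk format (`data:` lines carrying `choices[].delta.content`, ending with `data: [DONE]`), which this page implies through its compatibility claim but does not spell out. The function name `iter_stream_text` and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the SSE lines of a streamed completion."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hypothetical sample of what a stream might look like on the wire.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
streamed_text = "".join(iter_stream_text(sample_lines))
print(streamed_text)
```

With the OpenAI Python client, passing `stream=True` to `client.chat.completions.create` returns an iterator of already-parsed chunks, so manual parsing like this is only needed when reading the HTTP response directly.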

Response

id
string
A unique identifier for this completion, prefixed with chatcmpl-.
object
string
Always "chat.completion".
choices
array
An array of completion choices. Most requests produce exactly one choice.
usage
object
Token consumption for this request.
request_id
string
A unique identifier for this inference request on the Pinaivu network. Use this value to retrieve the signed routing receipt via GET /v1/receipts/.
Example response
{
  "id": "chatcmpl-7f3a9b2c1d4e5f6a",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama3.2:3b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A peer-to-peer AI inference network is a decentralized system where independent GPU operators contribute compute capacity to serve machine learning model requests. Instead of routing traffic through a single cloud provider, each request is matched to an available node on the network — reducing latency, eliminating single points of failure, and allowing GPU owners to earn by sharing their hardware."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 38,
    "completion_tokens": 74,
    "total_tokens": 112
  },
  "request_id": "req_01j9z4kxm8vwqe3t5p6n7r2y0c"
}
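
As an illustration of how these fields fit together, the sketch below pulls the generated text, token counts, and request_id out of a response that has already been decoded from JSON. The helper name `summarize_completion` is hypothetical, and the sample dictionary is an abbreviated copy of the example response above.

```python
from typing import Any, Dict


def summarize_completion(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the fields most callers need from a chat completion response."""
    choice = resp["choices"][0]  # most requests produce exactly one choice
    usage = resp["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
        "request_id": resp["request_id"],
    }


# Abbreviated copy of the example response, with the content shortened.
example_response = {
    "id": "chatcmpl-7f3a9b2c1d4e5f6a",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A peer-to-peer AI inference network is ..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 38, "completion_tokens": 74, "total_tokens": 112},
    "request_id": "req_01j9z4kxm8vwqe3t5p6n7r2y0c",
}

summary = summarize_completion(example_response)
print(summary["total_tokens"], summary["request_id"])
```

In the example, total_tokens equals prompt_tokens plus completion_tokens (38 + 74 = 112).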
Save the request_id from every response. You can use it at https://explorer.pinaivu.com to verify the inference, including which node served your request and the coordinator’s cryptographic signature.
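
One simple way to keep request IDs is to append them to a line-delimited JSON log. The following is a sketch under assumptions: the helper name `record_request_id` and the log layout are hypothetical and not part of the Pinaivu API. It writes to any text stream, so it is shown here with an in-memory buffer.

```python
import io
import json
import time
from typing import Any, Dict, TextIO


def record_request_id(log: TextIO, response: Dict[str, Any]) -> Dict[str, Any]:
    """Append one JSON line holding the request_id and basic context."""
    entry = {
        "request_id": response["request_id"],
        "completion_id": response.get("id"),
        "model": response.get("model"),
        "logged_at": int(time.time()),  # local wall-clock time, in seconds
    }
    log.write(json.dumps(entry) + "\n")
    return entry


# In-memory buffer for demonstration; a real log would be an appended file.
buffer = io.StringIO()
record_request_id(
    buffer,
    {
        "id": "chatcmpl-7f3a9b2c1d4e5f6a",
        "model": "llama3.2:3b",
        "request_id": "req_01j9z4kxm8vwqe3t5p6n7r2y0c",
    },
)
logged = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

For a persistent log, the same function can be given a file opened in append mode, for example `open("request_ids.jsonl", "a", encoding="utf-8")`, where the file name is only an example.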