POST /v1/chat/completions — Generate a Chat Response

The chat completions endpoint is the primary way to run LLM inference on Pinaivu. You send a conversation as a messages array, and the API returns the model’s response along with token usage statistics. The endpoint follows the OpenAI chat completions request and response format, so OpenAI client libraries and tooling work once you point them at the Pinaivu base URL and supply a Pinaivu API key.

Endpoint

POST https://api.pinaivu.com/v1/chat/completions

Request Body

model

string

required

The ID of the model to use. See GET /v1/models for the current list of available models. Example: llama3.2:1b.

messages

array

required

An array of message objects representing the conversation so far. Each object must include a role (system, user, or assistant) and a content string.

Message object fields

role

string

required

The role of the message author. One of system, user, or assistant.

content

string

required

The text content of the message.

stream

boolean

default: false

When set to true, the API streams partial response tokens as server-sent events (SSE) instead of returning a single completed response.

temperature

number

Sampling temperature between 0 and 2. Higher values make the output more random; lower values make it more focused and deterministic. Defaults to 1.

max_tokens

integer

The maximum number of tokens the model may generate in its response. When omitted, the model generates until it reaches its natural stopping point or the context limit.
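The request fields above can be assembled and checked client-side before anything is sent. A minimal sketch, assuming the 0 to 2 temperature range is inclusive and that omitted optional fields fall back to the server defaults described above; build_chat_request is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Any, Optional

# Roles accepted in the messages array, per the field reference above.
VALID_ROLES = {"system", "user", "assistant"}


def build_chat_request(
    model: str,
    messages: list[dict[str, str]],
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Validate the arguments and return a JSON-serializable request body.

    Optional fields left as None are omitted so the server applies its own
    defaults (temperature 1, no explicit token cap).
    """
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"messages[{i}].role must be one of {sorted(VALID_ROLES)}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}].content must be a string")

    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    if temperature is not None:
        # Assumption: both endpoints of the documented 0-2 range are allowed.
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if max_tokens is not None:
        if not isinstance(max_tokens, int) or max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        body["max_tokens"] = max_tokens
    return body


# Usage: mirrors the cURL example, with stream set explicitly to false.
body = build_chat_request(
    "llama3.2:3b",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a peer-to-peer AI inference network?"},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(json.dumps(body, indent=2))
```

Omitting unset optional fields, rather than sending nulls, keeps the body identical to what a hand-written request would contain.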

Example Request

cURL

curl https://api.pinaivu.com/v1/chat/completions \
  -H "Authorization: Bearer sk-pnv-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is a peer-to-peer AI inference network?" }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-pnv-...",
    base_url="https://api.pinaivu.com/v1",
)

response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a peer-to-peer AI inference network?"},
    ],
    temperature=0.7,
    max_tokens=256,
)

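print(response.choices[0].message.content)
# request_id is a Pinaivu-specific field outside the OpenAI schema; getattr
# avoids an AttributeError if the SDK version in use does not expose it.
print("Request ID:", getattr(response, "request_id", None))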
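For environments without the OpenAI SDK, the cURL call above can be reproduced with Python's standard library. A minimal sketch: make_chat_request and send_chat_request are hypothetical helper names, and error handling is omitted (urllib.request.urlopen raises urllib.error.HTTPError for non-2xx responses).

```python
import json
import urllib.request

API_URL = "https://api.pinaivu.com/v1/chat/completions"


def make_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = make_chat_request(
    "sk-pnv-...",  # replace with your API key
    {
        "model": "llama3.2:3b",
        "messages": [{"role": "user", "content": "What is a peer-to-peer AI inference network?"}],
        "max_tokens": 256,
    },
)
# data = send_chat_request(req)  # uncomment to perform the network call
# print(data["choices"][0]["message"]["content"], data["request_id"])
```

Separating construction from sending makes the request easy to inspect or test without network access.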

Response

id

string

A unique identifier for this completion, prefixed with chatcmpl-.

object

string

Always "chat.completion".

choices

array

An array of completion choices. Most requests produce exactly one choice.

Choice object fields

message.role

string

Always "assistant" for completions.

message.content

string

The text generated by the model.

finish_reason

string

The reason the model stopped generating. One of stop (natural end), length (reached max_tokens), or content_filter.

usage

object

Token consumption for this request.

Usage object fields

prompt_tokens

integer

Number of tokens in the input messages.

completion_tokens

integer

Number of tokens generated in the response.

total_tokens

integer

Sum of prompt_tokens and completion_tokens.

request_id

string

A unique identifier for this inference request on the Pinaivu network. Use this value to retrieve the signed routing receipt via GET /v1/receipts/.
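When stream is set to true, the API returns server-sent events rather than a single object with the fields above. This page does not document the event payload, so the parser below assumes OpenAI-style framing: each event is a data: line carrying a chat completion chunk whose choices[0].delta may contain a content fragment, and the stream ends with data: [DONE]. iter_stream_text is a hypothetical helper; with the OpenAI Python SDK, passing stream=True to client.chat.completions.create returns an iterator of chunk objects instead.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from an SSE stream of chat completion chunks.

    Assumes OpenAI-style framing (see lead-in). Blank separator lines and
    non-data SSE fields are skipped; only the first choice is read.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # event separators, comments, or other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            text = (choices[0].get("delta") or {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints: Hello, world
```

When reading from an HTTP response, decode each byte line to str before passing it in.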

Example response

{
  "id": "chatcmpl-7f3a9b2c1d4e5f6a",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama3.2:3b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A peer-to-peer AI inference network is a decentralized system where independent GPU operators contribute compute capacity to serve machine learning model requests. Instead of routing traffic through a single cloud provider, each request is matched to an available node on the network — reducing latency, eliminating single points of failure, and allowing GPU owners to earn by sharing their hardware."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 38,
    "completion_tokens": 74,
    "total_tokens": 112
  },
  "request_id": "req_01j9z4kxm8vwqe3t5p6n7r2y0c"
}
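The fields documented above can be pulled out of the decoded JSON and sanity-checked in a few lines. A minimal sketch: ChatResult and parse_chat_response are hypothetical names, the check relies on the documented rule that total_tokens equals prompt_tokens plus completion_tokens, and only the first choice is read.

```python
from dataclasses import dataclass


@dataclass
class ChatResult:
    content: str
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    request_id: str


def parse_chat_response(data: dict) -> ChatResult:
    """Extract the documented fields from a decoded response body."""
    choice = data["choices"][0]  # most requests return exactly one choice
    usage = data["usage"]
    # total_tokens is documented as prompt_tokens + completion_tokens.
    if usage["total_tokens"] != usage["prompt_tokens"] + usage["completion_tokens"]:
        raise ValueError("usage.total_tokens does not match its components")
    return ChatResult(
        content=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
        request_id=data["request_id"],
    )


# A trimmed copy of the example response above.
example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A peer-to-peer AI inference network is a decentralized system."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 38, "completion_tokens": 74, "total_tokens": 112},
    "request_id": "req_01j9z4kxm8vwqe3t5p6n7r2y0c",
}
result = parse_chat_response(example)
if result.finish_reason == "length":
    print("Output stopped at max_tokens; consider raising the limit.")
print(result.request_id, result.total_tokens)  # prints: req_01j9z4kxm8vwqe3t5p6n7r2y0c 112
```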

Save the request_id from every response. You can use it to verify the inference — including which node served your request and the coordinator’s cryptographic signature — at https://explorer.pinaivu.com.
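One way to follow this advice is to append each request_id to a local JSON Lines log. The file layout, field names, and helper functions below are hypothetical, not part of the Pinaivu API.

```python
import json
import time
from pathlib import Path


def request_log_entry(response: dict) -> str:
    """Serialize the request_id and some context as one JSON line."""
    entry = {
        "request_id": response["request_id"],
        "model": response.get("model"),
        "saved_at": int(time.time()),  # Unix timestamp of when the entry was written
    }
    return json.dumps(entry)


def record_request_id(log_path: Path, response: dict) -> None:
    """Append one entry to the log file, creating the file if needed."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(request_log_entry(response) + "\n")


# Usage, where data is a decoded response body:
# record_request_id(Path("pinaivu_request_ids.jsonl"), data)
```

An append-only file keeps every ID even if a later step of your program fails.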
