messages array, and the API returns the model’s response along with token usage statistics. The endpoint is fully compatible with the OpenAI chat completions API, so any tooling built for OpenAI works here without modification.
Endpoint
Request Body
The ID of the model to use. See GET /v1/models for the current list of available models. Example:
llama3.2:1b.An array of message objects representing the conversation so far. Each object must include a
role (system, user, or assistant) and a content string.When set to
true, the API streams partial response tokens as server-sent events (SSE) instead of returning a single completed response.Sampling temperature between
0 and 2. Higher values make the output more random; lower values make it more focused and deterministic. Defaults to 1.The maximum number of tokens the model may generate in its response. When omitted, the model generates until it reaches its natural stopping point or the context limit.
Example Request
cURL
Response
A unique identifier for this completion, prefixed with
chatcmpl-.Always
"chat.completion".An array of completion choices. Most requests produce exactly one choice.
Token consumption for this request.
A unique identifier for this inference request on the Pinaivu network. Use this value to retrieve the signed routing receipt via GET /v1/receipts/.
Example response
Save the
request_id from every response. You can use it to verify the inference — including which node served your request and the coordinator’s cryptographic signature — at https://explorer.pinaivu.com.