The Request Flow
Send your prompt to the gateway
You call the Pinaivu gateway at
https://api.pinaivu.com/v1 using any OpenAI-compatible client. Your request looks exactly like a standard chat completions call — no special SDK required.The Coordinator runs a real-time auction
The gateway forwards your request to the Coordinator, a trusted component running inside an AWS Nitro Enclave. The Coordinator broadcasts the job to all available GPU nodes and selects the winning bid within milliseconds.
The winning node runs your inference
The selected GPU node receives your prompt, runs the model, and streams the response directly back through the gateway. No intermediate hops, no unnecessary latency.
The Coordinator settles payment and signs a receipt
Once your response is delivered, the Coordinator finalizes the payment to the winning node and produces a routing receipt — a signed record of the entire transaction.
Why decentralization matters
Traditional inference APIs rely on a single provider’s infrastructure. If that provider has an outage, raises prices, or changes its policies, you have no alternative. Pinaivu works differently:- No single point of failure. Requests are routed to whichever nodes are healthy and available. If one node goes offline, others continue to compete and serve traffic.
- Competitive pricing. GPU operators bid against each other for every job. That market pressure keeps costs fair and gives you better value than a fixed-price monopoly provider.
- Verifiable results. Every inference call produces a cryptographically signed receipt. You’re not asked to trust Pinaivu’s word — you can verify the outcome yourself on the public explorer.
What you get
Nodes & Coordinator
Learn how independent GPU operators and the attested Coordinator work together to deliver every inference request.
Signed Receipts
Every request generates a routing receipt you can verify on the public explorer. Audit who served your inference, when, and at what cost.
Available Models
Pinaivu runs open-source LLMs on its decentralized network. See which models are available and how to specify them in your requests.