Gcore Everywhere Inference deploys trained AI models on edge inference nodes across 180+ locations worldwide. Running models closer to users keeps response times low, with no infrastructure to manage, which suits latency-sensitive workloads in fintech, healthcare, gaming, media, and industrial applications. Gcore routes end-user queries to the nearest running model using anycast endpoints. Smart Routing selects the closest inference region through a single endpoint, so you do not have to handle scaling, routing, or node monitoring yourself.

How Everywhere Inference works

Everywhere Inference combines two technologies:
  1. Edge network — provides low latency via anycast balancing, smart routing, and built-in DDoS and bot protection.
  2. Serverless flexible GPU infrastructure — enables deployment of Application Catalog models or custom models on purpose-built NVIDIA GPUs.

Figure: How Smart Routing speeds up requests in Gcore Everywhere Inference.

Gcore uses Healthchecks to monitor pod availability. If a pod in one region goes down, requests are automatically routed to the next-closest inference region.

Figure: Healthchecks redirects traffic to the next-closest edge node if the closest node is unavailable.
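
The failover behavior described above can be illustrated with a minimal sketch. This is not Gcore's implementation: the `Region` type, the region names, and the use of a distance value as a stand-in for network proximity are all assumptions made for illustration. The sketch only shows the selection rule, which is to pick the closest region whose latest health check passed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    name: str           # hypothetical region label, not a real Gcore identifier
    distance_km: float  # assumed stand-in for network proximity to the client
    healthy: bool       # assumed result of the most recent health check


def pick_region(regions: List[Region]) -> Optional[Region]:
    """Return the closest healthy region, or None if every region is down."""
    healthy = [r for r in regions if r.healthy]
    return min(healthy, key=lambda r: r.distance_km, default=None)


regions = [
    Region("region-a", 120.0, healthy=False),  # closest, but its pod is down
    Region("region-b", 450.0, healthy=True),   # next-closest and available
    Region("region-c", 900.0, healthy=True),
]

chosen = pick_region(regions)
print(chosen.name if chosen else "no healthy region")  # prints "region-b"
```

In this example the closest region fails its health check, so the request goes to the next-closest one. In the real service this decision happens inside the network, so clients keep calling the same single endpoint.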

Supported VM flavors

The hardware options available to you depend on your account limits and region. To unlock GPU access or add more deployments, submit a quota request.
vGPUs     vCPUs   Memory (GiB)
-         4       16
-         8       32
1xL40S    16      232
2xL40S    32      464
1xH100    16      232
2xH100    32      464
4xH100    64      928
1xA100    16      232
2xA100    32      464
4xA100    64      928
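
When sizing a deployment, it can help to pick the smallest flavor that meets a model's requirements. The sketch below encodes the flavor rows listed above, reading each row as GPU count and model, vCPUs, and memory in GiB. The `Flavor` type and the `smallest_flavor` helper are hypothetical names used only for illustration and are not part of any Gcore API. Which flavors you can actually use still depends on your account limits and region.

```python
from typing import List, NamedTuple, Optional


class Flavor(NamedTuple):
    gpu_count: int            # 0 for the CPU-only rows
    gpu_model: Optional[str]  # None for the CPU-only rows
    vcpus: int
    memory_gib: int


# Rows transcribed from the flavor table in this section.
FLAVORS: List[Flavor] = [
    Flavor(0, None, 4, 16),
    Flavor(0, None, 8, 32),
    Flavor(1, "L40S", 16, 232),
    Flavor(2, "L40S", 32, 464),
    Flavor(1, "H100", 16, 232),
    Flavor(2, "H100", 32, 464),
    Flavor(4, "H100", 64, 928),
    Flavor(1, "A100", 16, 232),
    Flavor(2, "A100", 32, 464),
    Flavor(4, "A100", 64, 928),
]


def smallest_flavor(
    gpu_model: Optional[str],
    min_gpus: int = 0,
    min_vcpus: int = 0,
    min_memory_gib: int = 0,
) -> Optional[Flavor]:
    """Return the smallest listed flavor meeting the minimums, or None."""
    candidates = [
        f for f in FLAVORS
        if f.gpu_model == gpu_model
        and f.gpu_count >= min_gpus
        and f.vcpus >= min_vcpus
        and f.memory_gib >= min_memory_gib
    ]
    # "Smallest" is assumed here to mean fewest GPUs, then vCPUs, then memory.
    return min(
        candidates,
        key=lambda f: (f.gpu_count, f.vcpus, f.memory_gib),
        default=None,
    )


# An H100 deployment that needs at least 400 GiB of memory.
print(smallest_flavor("H100", min_gpus=1, min_memory_gib=400))
```

For this request the helper returns the 2xH100 row, because the 1xH100 flavor has only 232 GiB of memory. A request that no listed row satisfies, such as four L40S GPUs, returns `None`.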