After creating a GPU cluster, use the cluster details page to monitor nodes, resize the cluster, manage power state, configure network interfaces, and delete cluster resources.

Access cluster details

To view and manage an existing cluster, open the cluster details page.
  1. In the Gcore Customer Portal, navigate to GPU Cloud.
  2. In the sidebar, expand GPU Clusters and select Bare Metal GPU Clusters.
  3. Click on a cluster name to open the details page.
The cluster details page displays summary information in the header panel:
| Field | Description |
| --- | --- |
| Cluster ID | Unique identifier for the cluster |
| Pkey ID | InfiniBand Partition Key ID. Displayed as "-" if InfiniBand is not configured or during cluster provisioning |
| OS Distro | Operating system image installed on all nodes |
| Status | Current cluster state |
| Region | Data center location |
| Plan | Monthly pricing plan for the cluster |
The page is organized into tabs: Overview, Power, Networking, Tags, User Actions, and Delete.
[Figure: Cluster overview page showing header panel and tabs]

View cluster nodes

The Overview tab lists all nodes (servers) in the cluster. Each node entry shows the node name, flavor, assigned IP addresses, status, timestamps, cost, and other metadata. The table supports filtering by name, date range, and status.
All nodes in a cluster share the same configuration (image, network settings, file shares) defined at cluster creation. These settings cannot be changed after the cluster is created.

Resize a cluster

Cluster size can be adjusted after creation by adding or removing nodes. To resize a cluster:
  1. On the Overview tab, click Resize Cluster.
  2. Adjust the instance count using the + and - buttons.
  3. Click Resize.
[Figure: Resize cluster dialog with instance count controls]
New nodes inherit the cluster’s original configuration defined at creation time. The maximum number of nodes is limited by the current stock availability in the selected region.
While a cluster is in the Resizing state, most management actions are unavailable. Bare Metal GPU nodes, especially H100 configurations, may take 15–40 minutes to provision.
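For scripted workflows, waiting for the Resizing state to end can be sketched as a simple polling loop. This is an illustration only: `get_status` is a hypothetical caller-supplied function that returns the current cluster status, and the exact status strings returned by the API may differ from the label shown in the portal. The 45-minute default timeout is an assumption chosen to sit slightly above the 15–40 minute provisioning range.

```python
import time
from typing import Callable


def wait_while_resizing(
    get_status: Callable[[], str],
    timeout_s: float = 45 * 60,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the cluster leaves the Resizing state, then return the new status.

    get_status is a hypothetical callable supplied by the caller, for example
    a thin wrapper around an API request. "Resizing" is the portal label; the
    API value is an assumption and should be checked against the API reference.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status != "Resizing":
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"cluster still Resizing after {waited:.0f} s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.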
Scaling down to zero nodes is not allowed and returns an error. A cluster must contain at least one node. To remove the cluster entirely, use the Delete tab.
When scaling down, the system chooses the nodes to remove at random. To delete a specific node, use the per-node delete action instead of resize.
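The resize rules above can be expressed as a small pre-check that an automation script might run before submitting a resize. This is a sketch under stated assumptions: `plan_resize` and the keys of its result are hypothetical names, not part of any Gcore SDK, and the check does not account for regional stock limits.

```python
def plan_resize(current_nodes: int, target_nodes: int) -> dict:
    """Describe what a resize would do, following the rules in this section."""
    if target_nodes < 1:
        # Scaling to zero is rejected by the platform; deletion is a separate action.
        raise ValueError("a cluster must contain at least one node; delete the cluster instead")
    delta = target_nodes - current_nodes
    if delta > 0:
        action = "scale_up"
    elif delta < 0:
        action = "scale_down"
    else:
        action = "no_change"
    return {
        "action": action,
        "nodes_added": max(delta, 0),
        "nodes_removed": max(-delta, 0),
        # On scale-down the platform picks the nodes, so the caller cannot choose them.
        "removed_nodes_chosen_by_platform": delta < 0,
    }
```

A script could refuse to continue when `removed_nodes_chosen_by_platform` is true and a particular node must be kept, and fall back to per-node deletion instead.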

Delete a specific node

To remove a specific node without random selection:
  1. Locate the node in the cluster list.
  2. Click the actions menu (three dots) on the node row.
  3. Select Delete.
  4. Confirm the deletion.
Deleting the last node in a cluster automatically deletes the entire cluster. This applies to both UI and API/Terraform operations. When using the API or Terraform to delete nodes, no warning is displayed before the cluster is removed. Ensure at least one node remains if you want to preserve the cluster.
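Because the API and Terraform give no warning before the last node's deletion removes the cluster, a script can add its own guard. The following is a minimal sketch, assuming the caller already has the list of node IDs in the cluster; the function name and the `allow_cluster_removal` flag are hypothetical.

```python
from typing import List


def check_node_deletion(
    cluster_node_ids: List[str],
    node_id: str,
    allow_cluster_removal: bool = False,
) -> List[str]:
    """Return the node IDs that would remain after deleting node_id.

    Raises RuntimeError if the deletion would empty the cluster (and so
    delete it) unless the caller explicitly opts in.
    """
    if node_id not in cluster_node_ids:
        raise KeyError(f"node {node_id!r} is not part of this cluster")
    remaining = [n for n in cluster_node_ids if n != node_id]
    if not remaining and not allow_cluster_removal:
        raise RuntimeError("deleting the last node would delete the entire cluster")
    return remaining
```

Calling this check before each per-node delete request keeps an automated cleanup job from removing a cluster by accident.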

Power actions

Power actions control the running state of cluster nodes. Actions can be applied to individual nodes or in bulk.

Individual node actions

To control a single node:
  1. Locate the node in the cluster list.
  2. Click the actions menu (three dots) on the node row.
  3. Select the desired action:
    • Power on: Start the node
    • Power off: Shut down the node
    • Soft reboot: Graceful restart of the operating system
    • Hard reboot: Force a restart of the node; use this if a soft reboot fails
    • Rebuild: Reinstall the original operating system image used at cluster creation. All data on local storage is deleted.
    • Replace: Delete the current node and provision a new one with the same configuration (flavor, image, network settings). Available in the UI and via the Replace bare metal GPU cluster server API endpoint. Use this to recover a node stuck in a failed state without changing the cluster setup.
[Figure: Node actions menu showing power and management options]
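The soft-then-hard reboot order described in the list above can be sketched as a small fallback routine. The two callables are hypothetical placeholders for whatever performs each action (for example, an API request followed by a health check); each is assumed to return True on success.

```python
from typing import Callable


def reboot_node(soft_reboot: Callable[[], bool], hard_reboot: Callable[[], bool]) -> str:
    """Try a graceful restart first; force a restart only if that fails.

    Returns "soft" or "hard" to indicate which action succeeded.
    """
    if soft_reboot():
        return "soft"
    if hard_reboot():
        return "hard"
    # Replace is the documented recovery path for a node stuck in a failed state.
    raise RuntimeError("both soft and hard reboot failed; consider the Replace action")
```

Keeping the actions as injected callables means the same routine works whether the reboot is triggered through the API, Terraform, or a test double.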

Bulk actions

To apply actions to multiple nodes simultaneously:
  1. Select nodes using the checkboxes in the nodes table.
  2. Click Group actions in the toolbar.
  3. Select the action to apply to all selected nodes.
Alternatively, use the Power tab to perform a soft or hard reboot on all cluster nodes at once.
[Figure: Power tab with soft and hard reboot options]

Network interfaces

The Networking tab displays network interfaces for each node. Click on a node name to expand its interface details. Interface types include:
| Type | Description |
| --- | --- |
| Public | External IP address for internet access |
| Private | Internal network for communication with other cloud resources |
| InfiniBand | High-speed, low-latency inter-node network for GPU-to-GPU communication |
[Figure: Networking tab showing node and interface list]
For flavors with InfiniBand, multiple InfiniBand interfaces are created automatically (by default, 8 for H100 configurations). These appear as “GPU-cluster ib-subnet” entries in the interface list. Click on an interface to expand its details, including IP address, network configuration, and other network information.
[Figure: Node with expanded interface showing network details]

Modify network interfaces

To add network interfaces to a specific node:
  1. Navigate to the Networking tab.
  2. Click on a node to expand its details.
  3. Click Add Interface or Add Sub-Interface.
  4. Configure the interface type (public or private) and IP allocation settings as supported by the service.
Only network interfaces can be modified on individual nodes after cluster creation. Image and storage configuration cannot be changed at the node level. InfiniBand interfaces are managed automatically and cannot be modified or deleted.
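The constraints above (only public or private interfaces can be added; InfiniBand interfaces are managed automatically) can be checked before a request is sent. This is a sketch only: the payload field names `type` and `ip_address` are hypothetical and do not reflect the actual API schema, which should be taken from the API reference.

```python
import ipaddress
from typing import Optional

ADDABLE_TYPES = {"public", "private"}


def build_add_interface_payload(interface_type: str, ip_address: Optional[str] = None) -> dict:
    """Validate the interface type and optional IP, then return a request body.

    The returned keys are illustrative placeholders, not the real API schema.
    """
    kind = interface_type.strip().lower()
    if kind == "infiniband":
        raise ValueError("InfiniBand interfaces are managed automatically and cannot be added or changed")
    if kind not in ADDABLE_TYPES:
        raise ValueError(f"unsupported interface type: {interface_type!r}")
    payload = {"type": kind}
    if ip_address is not None:
        # ipaddress.ip_address raises ValueError for malformed addresses.
        payload["ip_address"] = str(ipaddress.ip_address(ip_address))
    return payload
```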

Console access

For troubleshooting or when SSH access is unavailable, use the browser-based console.
  1. In the Overview tab, locate the node you want to access.
  2. Click Open Console in the node row.
  3. The console opens in a new browser tab using noVNC.
  4. Log in using the same credentials as for SSH access.
[Figure: Browser-based noVNC console with login prompt]
The console provides direct terminal access to the node, useful when network connectivity issues prevent SSH access.

Tags

Tags are key-value pairs used to organize and categorize clusters. Tags are applied at the cluster level and inherited by all nodes. To manage tags:
  1. Navigate to the Tags tab on the cluster details page.
  2. Enable the Add custom tags checkbox.
  3. Enter the key and value for each tag.
  4. Click Save changes.
[Figure: Tags tab with custom tags toggle]
Use tags for billing allocation, environment identification, or organizational purposes.
Tags can be modified even while the cluster is in the Resizing state, unlike most other management actions.

User actions

The User Actions tab displays a log of all operations performed on the cluster, including creation, deletion, resize, power, and network actions. Use the date and action type filters to narrow results. This log is useful for auditing and troubleshooting.
[Figure: User Actions tab showing audit log of cluster operations]
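If the action log is exported for offline review, the same date and action-type filtering can be reproduced in a few lines. The record shape here (a `timestamp` in ISO 8601 format without a timezone suffix, and an `action` string) is an assumption made for illustration, not the portal's export format.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def filter_actions(
    records: Iterable[dict],
    action_type: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[dict]:
    """Keep records that match the action type and fall within [start, end]."""
    selected = []
    for record in records:
        when = datetime.fromisoformat(record["timestamp"])
        if action_type is not None and record["action"] != action_type:
            continue
        if start is not None and when < start:
            continue
        if end is not None and when > end:
            continue
        selected.append(record)
    return selected
```

For example, `filter_actions(records, action_type="resize", start=datetime(2024, 5, 2))` would return only resize entries logged on or after 2 May 2024.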

Delete a cluster

When a cluster is deleted, local NVMe storage is permanently erased; file shares and object storage remain intact. To delete a cluster:
  1. Navigate to the Delete tab on the cluster details page.
  2. Click Delete Cluster.
  3. Confirm the deletion.
[Figure: Delete tab with warning message and delete button]
Cluster deletion is irreversible. Data on local NVMe storage cannot be recovered.

Automating cluster management

The Gcore Customer Portal is suitable for managing individual clusters. For automated workflows, such as CI/CD pipelines, infrastructure-as-code, or batch provisioning, use the GPU Bare Metal API. The API supports:
  • Starting, stopping, and rebooting cluster nodes
  • Resizing clusters and replacing individual nodes
  • Managing network interfaces on cluster servers
  • Deleting clusters and individual nodes programmatically
The GPU Bare Metal API reference provides endpoint details and code examples.
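As a starting point for scripting, the sketch below builds (but does not send) an authenticated HTTP request using only the Python standard library. Several details are assumptions to verify against the API reference: the base URL, the `APIKey` authorization scheme, and especially the request path and body, which are placeholders and not real endpoints.

```python
import json
import urllib.request
from typing import Optional

# Assumption: base URL of the Gcore API; confirm in the API reference.
API_BASE = "https://api.gcore.com"


def build_request(
    method: str,
    path: str,
    api_token: str,
    body: Optional[dict] = None,
) -> urllib.request.Request:
    """Build an authenticated JSON request without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {
        # Assumption: permanent API tokens are sent with the "APIKey" scheme.
        "Authorization": f"APIKey {api_token}",
        "Accept": "application/json",
    }
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)


# Placeholder path and body: replace both with values from the API reference.
example = build_request(
    "POST",
    "/REPLACE/WITH/PATH/FROM/API/REFERENCE",
    api_token="YOUR_API_TOKEN",
    body={"hypothetical_field": 3},
)
```

Once the real path and body are filled in, the request can be sent with `urllib.request.urlopen(example)`. Separating request construction from sending makes the script straightforward to test without network access.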