On 03 June 2026, between 13:56 and 15:50 UTC (1 hour 54 minutes), customers in the us5: Cloud - US East region experienced a service disruption. During this window, users could not open or schedule workspaces, and workspace allocations remained unmapped, making the region effectively unavailable.
Root cause
A planned update to our cluster management infrastructure unexpectedly increased the memory requirements of our core scheduling service. The service reached its predefined resource limits and restarted repeatedly, which temporarily prevented the platform from processing workspace requests in the affected region.
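
As a rough illustration of the failure mode described above, the sketch below shows how a fixed memory limit that was comfortable before an update can be exceeded afterward. All names and numbers here are hypothetical and chosen only for the arithmetic; they are not the actual limits or usage figures from this incident.

```python
def headroom_fraction(limit_mib: float, peak_mib: float) -> float:
    """Return the share of the memory limit left unused at peak usage.

    A negative result means peak usage exceeds the limit, which is the
    condition under which a service with a hard limit would be restarted.
    """
    if limit_mib <= 0:
        raise ValueError("limit_mib must be positive")
    return (limit_mib - peak_mib) / limit_mib


def exceeds_limit(limit_mib: float, peak_mib: float) -> bool:
    """True when peak usage is above the configured limit."""
    return peak_mib > limit_mib


# Hypothetical values, for illustration only.
LIMIT_MIB = 2048          # assumed fixed memory limit
PEAK_BEFORE_MIB = 1536    # assumed peak usage before the update
PEAK_AFTER_MIB = 2304     # assumed peak usage after the update

before = headroom_fraction(LIMIT_MIB, PEAK_BEFORE_MIB)
after = headroom_fraction(LIMIT_MIB, PEAK_AFTER_MIB)

print(f"headroom before update: {before:.1%}")
print(f"headroom after update:  {after:.1%}")
print(f"limit exceeded after update: {exceeds_limit(LIMIT_MIB, PEAK_AFTER_MIB)}")
```

With these assumed figures, the service starts with a quarter of its limit free and ends up above the limit after the update, so each restart would run into the same ceiling again until the limit is raised or usage falls.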
Recovery
Our engineering team took immediate steps to resolve the resource constraints and restore service to the region. We applied a permanent fix to the scheduling infrastructure, which stabilized the environment and allowed all pending workspace requests to complete. Full service was restored by 15:50 UTC.
Corrective and preventative actions
We are implementing increased memory thresholds across all regional clusters to provide additional operational headroom during management system updates.
We are prioritizing the implementation of an active-standby service configuration to eliminate single points of failure and ensure seamless, automated failovers.
We are resolving an observability gap to ensure that service performance and resource metrics remain visible even during periods of high system stress.
We are developing a dedicated runbook for resource-related interruptions to ensure faster identification and standardized remediation steps.
We are conducting a detailed investigation into the management system update to understand why the memory increase occurred and prevent similar behavior in future upgrades.
We apologize for the impact this issue has had on your operations. We are committed to the improvements outlined above to prevent similar disruptions. If you have questions or concerns, please contact our Support team.