SN mainnet downtimes over 09.10-15.10

Ohad-StarkWare · November 3, 2025, 2:00pm

TL;DR

Between Oct 9th and Oct 15th, Starknet experienced a series of downtimes. Most of them lasted a few minutes, but three downtimes were longer, lasting 10-20 minutes each. Two separate issues caused these outages: an Aerospike hotspot issue and a misconfiguration in transaction execution time. Both issues were promptly handled and fixed.

Root Causes

Starknet identified two main bugs as responsible for these outages:
Bug 1 - Aerospike hotspot issue: Growing demand for fetching preconfirmed blocks created hotspots within the Aerospike database. As a result, the committer, a service that uses the database, could not update the block hashes, which stalled the network.

Bug 2 - Misconfiguration in transaction execution time: Starknet began receiving some transactions that required more than 5 seconds to execute. This caused a loop: such a transaction was picked to be sequenced, the block could not include it, it returned to the mempool, and the same thing happened in the next block until the transaction was evicted from the mempool.
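
To make the loop in Bug 2 concrete, here is a minimal toy simulation. It is not the sequencer's actual logic. The 5-second limit comes from the description above, while the one-transaction-per-block model, the `MAX_ATTEMPTS` eviction threshold, and all names (`Tx`, `run_blocks`) are assumptions made purely for illustration.

```python
from collections import deque
from dataclasses import dataclass

EXEC_LIMIT_S = 5.0   # from the post: transactions needing more than 5 s could not be included
MAX_ATTEMPTS = 3     # hypothetical: attempts before the mempool evicts a transaction


@dataclass
class Tx:
    tx_id: str
    exec_time_s: float
    attempts: int = 0


def run_blocks(mempool, n_blocks):
    """Simulate block building, one candidate transaction per block (a simplification).

    Returns (included_ids, evicted_ids, wasted_blocks).
    """
    included, evicted = [], []
    wasted_blocks = 0
    for _ in range(n_blocks):
        if not mempool:
            break
        tx = mempool.popleft()
        tx.attempts += 1
        if tx.exec_time_s <= EXEC_LIMIT_S:
            included.append(tx.tx_id)
            continue
        # The block could not include the transaction, so this block made no progress.
        wasted_blocks += 1
        if tx.attempts >= MAX_ATTEMPTS:
            evicted.append(tx.tx_id)
        else:
            # Back to the front of the mempool: it is picked again in the next block.
            mempool.appendleft(tx)
    return included, evicted, wasted_blocks


pool = deque([Tx("slow", 7.0), Tx("fast", 0.1)])
print(run_blocks(pool, 10))  # -> (['fast'], ['slow'], 3)
```

In this toy model the slow transaction blocks progress for three consecutive blocks before it is evicted, and only then is the fast transaction behind it included.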

How did the Starknet team react?

During this period, the team worked around the clock, including weekends, to accomplish the following:

Delivered several mainnet changes to ease and fix the core issues.
Cooperated with ecosystem applications to mitigate the incident impact and ensure resumed operation.
Transparently communicated the situation to the ecosystem in real time in the relevant Telegram group.

What went well

The team responded fast to the incidents, monitored the situation, and communicated downtimes transparently over all of the available means of communication.
The team was dedicated to solving the issues fast, including shipping mainnet changes on weekends, holidays, and days off, and working alongside the Aerospike team to understand how to solve the issue.
In the second incident, the core issues were detected and isolated quickly and efficiently.

What needs improvement

Add metrics and monitoring around Aerospike connection utilization and hotspot creation - Done
Make the caching mechanism more robust and comprehensive than what was done on 12.10 - Done
Add more logs to better investigate execution times - Currently WIP; many logs have already been added
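
The post does not describe how the caching mechanism works. As a generic illustration of why caching helps with a hotspot, the sketch below shows a short-lived read-through cache that absorbs repeated reads of one hot record, so only a small fraction of requests reach the database. The class name `TTLCache`, the `fetch` callback, and the TTL value are all hypothetical and do not reflect Starknet's implementation or the Aerospike client API.

```python
import time


class TTLCache:
    """Read-through cache: serve a stored value until it expires, then refetch."""

    def __init__(self, fetch, ttl_s, clock=time.monotonic):
        self._fetch = fetch      # hypothetical backend read, e.g. a database lookup
        self._ttl_s = ttl_s
        self._clock = clock      # injectable so the example can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)
        self.backend_reads = 0   # counts how many reads actually hit the backend

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no backend read
        value = self._fetch(key)
        self.backend_reads += 1
        self._entries[key] = (now + self._ttl_s, value)
        return value


# Usage: 100 requests for the same hot key within the TTL cause one backend read.
cache = TTLCache(fetch=lambda key: f"block:{key}", ttl_s=1.0)
for _ in range(100):
    cache.get("preconfirmed")
print(cache.backend_reads)
```

The trade-off is staleness: readers may see a value up to `ttl_s` seconds old, so the TTL has to be short relative to how quickly the underlying record changes.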

Conclusion

These incidents were a valuable stress test for Starknet’s infrastructure. They helped surface edge cases that could only appear under real network load and allowed the core team to harden key components for greater stability. Thanks to quick coordination between core contributors, node operators, and application teams, the network is now more resilient under pressure.
