TL;DR – ledger storage is an unsolved problem that creates risks of centralization, censorship, and loss of user funds. WeaveVM offers an out-of-the-box way to stream real-time block data and full chain history to a dedicated permanent storage layer.
Where are all the archive nodes?
Most blockchain networks give users a choice of what type of node they can run to participate. Light clients, full nodes, and archive nodes have different roles in the network, and different incentive models.
Archive nodes store the complete history of the network, including historical token balances, user interactions, and contract state. They are used to replay transactions for debugging or for forensic investigations into how a state evolved. They are also the fastest way to run detailed analytics queries, supply data to indexers, and act as the source of truth for new network participants to sync from.
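To make the distinction concrete: a query for state at an old block, such as the standard Ethereum JSON-RPC method `eth_getBalance` called with an explicit block number, generally needs an archive node, because pruned full nodes typically keep state only for recent blocks. A minimal Rust sketch that builds such a request body; the helper name `historical_balance_request`, the address, and the block height are illustrative placeholders, and sending the request to an RPC endpoint is left out:

```rust
/// Hypothetical helper: builds a JSON-RPC body asking for an account balance
/// at a specific historical block. Method name and parameter order follow the
/// standard Ethereum JSON-RPC API; block numbers are hex-encoded quantities.
fn historical_balance_request(id: u64, address: &str, block_number: u64) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"eth_getBalance","params":["{}","0x{:x}"]}}"#,
        id, address, block_number
    )
}

fn main() {
    // Placeholder address and height; an old height like this is the kind of
    // query a pruned full node typically cannot answer but an archive node can.
    let req = historical_balance_request(
        1,
        "0x0000000000000000000000000000000000000000",
        1_000_000,
    );
    println!("{}", req);
}
```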
Ethereum is one of the most decentralized chains, with over 6,000 nodes and 1 million validators. The number of archive nodes is less clear: there is no authoritative source for how many are live, and since there is no incentive to run one, we can infer that the number is far lower than the number of full nodes.
The coming purge
When EIP-4444 takes effect as part of The Purge, nodes will begin pruning historical block data (headers, bodies, and receipts) older than one year to reduce resource requirements. This shifts the responsibility for long-term archive storage away from the protocol and onto independent infrastructure providers or other third parties, raising concerns about the long-term availability and decentralization of historical blockchain data.
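For a sense of scale, the one-year window can be translated into a block count. A minimal Rust sketch, assuming post-Merge 12-second slots with a block in every slot (an upper bound, since missed slots produce no block) and a 365-day year; `approx_prune_cutoff` is a hypothetical helper, not how any client actually implements EIP-4444:

```rust
/// Post-Merge Ethereum uses 12-second slots.
const SECONDS_PER_SLOT: u64 = 12;
/// 365-day year; leap days are ignored in this estimate.
const SECONDS_PER_YEAR: u64 = 365 * 24 * 60 * 60;

/// Upper bound on blocks per year, assuming no missed slots.
fn max_blocks_per_year() -> u64 {
    SECONDS_PER_YEAR / SECONDS_PER_SLOT
}

/// Hypothetical helper: approximate oldest block still retained if a node
/// keeps roughly one year of history below `head`. Saturates at genesis.
fn approx_prune_cutoff(head: u64) -> u64 {
    head.saturating_sub(max_blocks_per_year())
}

fn main() {
    println!("blocks per year (upper bound): {}", max_blocks_per_year());
    println!("approx. cutoff for head 21,000,000: {}", approx_prune_cutoff(21_000_000));
}
```

Under these assumptions a year covers at most 2,628,000 blocks, so a node at height 21,000,000 would retain history back to roughly block 18,372,000, and everything older would have to come from somewhere else.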
Centralized archives for trillion-dollar data
Blockchain infrastructure businesses like Ankr, Alchemy, and Chainstack run archive nodes for customers as part of their RPC and data offerings, but the data will only live as long as there is demand for that side of the business. These centralized solutions create points of failure and risk data loss if the business model becomes unsustainable or priorities shift.
The storage problem scope for L1s and L2s
Beyond the Ethereum L1, storage is a particular challenge for L2s, less decentralized networks, and networks that rely on third-party data availability (DA) providers.
Arbitrum Nova’s risk analysis report from L2Beat highlights the challenges and risks that arise when networks rely on third-party DA providers without onchain historical storage. In this model, Data Availability Committees (DACs) take on the role of maintaining transaction data, but their centralized structure is a vulnerability in and of itself.
With a DAC and no onchain storage, the network risks losing access to transaction data entirely, which could result in the accidental permanent loss of user funds. This setup also lets the DAC censor data, whether under external pressure or through deliberate deception.
The DAC’s short-term guarantees make the issue worse, with data commitments expiring after 2 weeks. Beyond this window, the availability of historical data relies entirely on third-party mechanisms or assumptions of good behavior.
Another network that relies on a third party for DA is Dymension, an L1 built as the base layer for RollApps: interconnected appchains that settle to the Dymension L1 and use Celestia for data availability. In this model, Dymension RollApps are responsible for operating their own archival nodes to offset the regular data purges enforced by Celestia. This infrastructure overhead is unviable for RollApps, and frequent upstream DA layer updates that lack backward compatibility force RollApp developers to ship regular hotfixes.
WeaveVM as an archive node for your network
Networks like Arbitrum Nova and Dymension can solve these issues by offloading storage to a dedicated, EVM-compatible permanent storage network with long-term incentives.
WeaveVM acts as a decentralized archive node for networks like:
- Metis
- RSS3
- GOAT Network
- Sei
- Humanode
- Dymension
WeaveVM adds a layer of protection and transparency to these chains, ruling out some of the failure modes of centralization and giving any third party access to a permanent, trustless pool of ledger history.
WeaveVM storage is a fraction of the cost of storing data on Ethereum, and even cheaper than L2 calldata. That’s because it uses Arweave under the hood. Arweave is a purpose-built permanent storage chain, with a network size of 300 petabytes spread across hundreds of miners. WeaveVM provides a native EVM interface for chains to hook into Arweave storage at the ledger, DA, and smart contract level.
Running a ledger archiver on WeaveVM
At the time of writing, WeaveVM supports two network types for ledger archiving: EVM networks and Substrate networks. Each ledger archiver repository includes a detailed node setup guide in its README.
In brief, after setting up an archival node for your chosen network (either EVM or Substrate), you will establish a data pipeline between the target network and WeaveVM + Arweave.
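A minimal sketch of such a pipeline in Rust, assuming the two-thread layout the archivers use (a backfill thread for historical blocks and a livesync thread for new ones) feeding a single uploader over a channel. `fetch_block` and `run_pipeline` are hypothetical stand-ins; the real archivers' RPC calls and WeaveVM submission are not reproduced:

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for fetching a block from the target network's RPC.
fn fetch_block(height: u64) -> Vec<u8> {
    height.to_le_bytes().to_vec()
}

/// Runs backfill over [start, head) and livesync over [head, live_end),
/// funnelling every block into one uploader. Returns heights in arrival order.
fn run_pipeline(start: u64, head: u64, live_end: u64) -> Vec<u64> {
    let (tx, rx) = mpsc::channel::<(u64, Vec<u8>)>();

    // Backfill: walk historical blocks up to the head observed at launch.
    let backfill_tx = tx.clone();
    let backfill = thread::spawn(move || {
        for h in start..head {
            backfill_tx.send((h, fetch_block(h))).expect("uploader hung up");
        }
    });

    // Livesync: follow blocks from the launch head onward. New blocks are
    // simulated as a fixed range here instead of polling the chain.
    let livesync = thread::spawn(move || {
        for h in head..live_end {
            tx.send((h, fetch_block(h))).expect("uploader hung up");
        }
    });

    // Uploader: drains the channel until both senders are dropped. A real
    // archiver would encode each block and submit it to WeaveVM here.
    let mut pushed = Vec::new();
    for (height, _payload) in rx {
        pushed.push(height);
    }

    backfill.join().expect("backfill thread panicked");
    livesync.join().expect("livesync thread panicked");
    pushed
}

fn main() {
    let mut heights = run_pipeline(0, 5, 8);
    heights.sort_unstable();
    println!("{:?}", heights);
}
```

Because the two threads run concurrently, blocks can reach the uploader out of order, which is why the example sorts before printing; an actual archiver would also need to record which heights are done so it can resume after a restart.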
An archival node runs two threads, backfill and livesync, to index the target network more efficiently. For each block, it downloads the data from the target network, serializes it with Borsh and compresses it with Brotli (Borsh-Brotli) according to the Block type defined in each archiver’s schema.rs file [1][2], then pushes it to WeaveVM as calldata in a transaction sent to the 0x0 address. The WeaveVM network in turn Borsh-Brotli encodes its own block containing that transaction and pushes it to Arweave, producing a permanent backup of your target network’s ledger history.
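The per-block encoding step can be sketched with a hand-rolled encoder that follows Borsh's rules for these types: little-endian integers, fixed-size arrays written as-is, and u32 little-endian length prefixes for dynamic sequences. The `Block` struct and `ArchiveTx` envelope below are hypothetical illustrations, not the archivers' actual `schema.rs` types, and the Brotli step is only marked with a comment because it needs a third-party crate:

```rust
/// Hypothetical block shape; the real archivers define their own `Block`.
struct Block {
    number: u64,
    hash: [u8; 32],
    timestamp: u64,
    transactions: Vec<Vec<u8>>,
}

/// Borsh-style encoding: u64 as 8 LE bytes, [u8; 32] written directly,
/// Vec<T> as a u32 LE element count followed by each element.
fn borsh_encode(block: &Block) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&block.number.to_le_bytes());
    out.extend_from_slice(&block.hash);
    out.extend_from_slice(&block.timestamp.to_le_bytes());
    out.extend_from_slice(&(block.transactions.len() as u32).to_le_bytes());
    for tx in &block.transactions {
        out.extend_from_slice(&(tx.len() as u32).to_le_bytes());
        out.extend_from_slice(tx);
    }
    out
}

/// Hypothetical envelope: payload in calldata, recipient is the zero address.
/// Signing, gas, and nonce handling are omitted.
struct ArchiveTx {
    to: [u8; 20],
    data: Vec<u8>,
}

fn build_archive_tx(block: &Block) -> ArchiveTx {
    let encoded = borsh_encode(block);
    // A real archiver would Brotli-compress `encoded` here (Borsh-Brotli).
    ArchiveTx { to: [0u8; 20], data: encoded }
}

fn main() {
    let block = Block {
        number: 42,
        hash: [0xab; 32],
        timestamp: 1_700_000_000,
        transactions: vec![vec![1, 2, 3], vec![]],
    };
    let tx = build_archive_tx(&block);
    let to_hex: String = tx.to.iter().map(|b| format!("{:02x}", b)).collect();
    println!("to = 0x{}", to_hex);
    println!("calldata length = {} bytes", tx.data.len());
}
```

In practice the `borsh` and `brotli` crates would replace the hand-written encoder and supply compression; the sketch sticks to the standard library so the byte layout stays visible.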
Follow @weavevm on X to keep up to date as we release more features for wvm-archiver and beyond.