Cosmos SDK State Sync Guide

Erik Grinaker
Interchain Ecosystem Blog
Jan 20, 2021


The recent Tendermint Core 0.34 release includes support for state sync, which allows a new node to join a network by fetching a snapshot of the application state at a recent height instead of fetching and replaying all historical blocks. This can reduce the time needed to sync with the network from days to minutes.

A Tendermint blog post outlined the state sync protocol itself, and more details are available in the ABCI documentation. The good news is that the upcoming Cosmos SDK 0.40 release will include automatic support for state sync, so developers only need to enable it in their applications to make use of it. We’ll explain how in this article, but first some background.

State Sync Snapshots

Tendermint Core handles most of the grunt work of discovering, exchanging, and verifying state data for state sync. The application, however, must take snapshots of its state at regular intervals, make them available to Tendermint via ABCI calls, and be able to restore them when syncing a new node.

The Cosmos SDK stores application state in a data store called IAVL, and each module can set up its own IAVL stores. At regular height intervals (which are configurable), the Cosmos SDK will export the contents of each store at that height, Protobuf-encode and compress it, and save it to a snapshot store in the local filesystem. Since IAVL keeps historical versions of data, these snapshots can be generated while new blocks are being executed. These snapshots will then be fetched by Tendermint via ABCI when a new node is state syncing.
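The interval rule described above can be pictured with a short Go sketch. This is an illustration only, not the SDK's implementation: the function name `shouldSnapshot`, the interval of 1000, and the convention that an interval of 0 disables snapshots are assumptions made for this example.

```go
package main

import "fmt"

// shouldSnapshot reports whether a snapshot is due at the given block height.
// Treating an interval of 0 as "snapshots disabled" is an assumption of this
// sketch, not something stated in the article.
func shouldSnapshot(height, interval uint64) bool {
	return interval > 0 && height > 0 && height%interval == 0
}

func main() {
	// Hypothetical interval; it matches the example flags used later in this guide.
	const interval = 1000
	for _, h := range []uint64{999, 1000, 2500, 3000} {
		fmt.Printf("height=%d snapshot=%v\n", h, shouldSnapshot(h, interval))
	}
}
```

Only heights that are exact multiples of the interval qualify, which is why the log output later in this guide shows snapshots at heights 2000 and 3000.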

Note that only IAVL stores that are managed by the Cosmos SDK can be snapshotted. If the application stores additional data in external data stores, there is currently no mechanism to include these in state sync snapshots, so the application cannot make use of automatic state sync via the SDK. However, it is free to implement the state sync protocol itself as described in the ABCI documentation. Support for external data stores may be added in a later version.

When a new node is state synced, Tendermint will fetch a snapshot from peers in the network and provide it to the local (empty) application, which will import it into its IAVL stores. Tendermint then verifies the application’s app hash against the main blockchain using light client verification, and proceeds to execute blocks as usual. Note that a state synced node will only restore the application state for the height the snapshot was taken at, and will contain neither historical data nor historical blocks.

Enabling State Sync Snapshots

To enable state sync snapshots, an application using the Cosmos SDK BaseApp needs to set up a snapshot store (with a database and filesystem directory) and configure the snapshotting interval and the number of historical snapshots to keep. A minimal example of this might be as follows:

// Snapshots are stored under <home>/data/snapshots.
snapshotDir := filepath.Join(
	cast.ToString(appOpts.Get(flags.FlagHome)), "data", "snapshots")

// The snapshot metadata database lives in the same directory.
snapshotDB, err := sdk.NewLevelDB("metadata", snapshotDir)
if err != nil {
	panic(err)
}
snapshotStore, err := snapshots.NewStore(snapshotDB, snapshotDir)
if err != nil {
	panic(err)
}

// Pass the store, the snapshot interval, and the retention count to BaseApp.
app := baseapp.NewBaseApp(
	"app", logger, db, txDecoder,
	baseapp.SetSnapshotStore(snapshotStore),
	baseapp.SetSnapshotInterval(cast.ToUint64(appOpts.Get(
		server.FlagStateSyncSnapshotInterval))),
	baseapp.SetSnapshotKeepRecent(cast.ToUint32(appOpts.Get(
		server.FlagStateSyncSnapshotKeepRecent))),
)

When starting the application with appropriate flags, e.g. --state-sync.snapshot-interval 1000 --state-sync.snapshot-keep-recent 2, it should generate snapshots and output log messages about it:

Creating state snapshot    module=main height=3000
Completed state snapshot module=main height=3000 format=1

Note that the snapshot interval must currently be a multiple of the pruning-keep-every setting (default 100), to prevent heights from being pruned while snapshots are taken. It’s also usually a good idea to keep at least 2 recent snapshots, so that the previous snapshot isn’t removed while a node is attempting to state sync using it.
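The two constraints above can be expressed as a small pre-flight check. The Go sketch below is illustrative and is not the SDK's own validation code: the function names are hypothetical, and it assumes that a value of 0 means "disabled" for the interval and for pruning-keep-every, and "keep all" for the number of recent snapshots.

```go
package main

import "fmt"

// checkSnapshotInterval applies the rule from the text: the snapshot interval
// must be a multiple of pruning-keep-every. Skipping the check when either
// value is 0 is an assumption of this sketch.
func checkSnapshotInterval(interval, keepEvery uint64) error {
	if interval == 0 || keepEvery == 0 {
		return nil
	}
	if interval%keepEvery != 0 {
		return fmt.Errorf(
			"snapshot interval %d is not a multiple of pruning-keep-every %d",
			interval, keepEvery)
	}
	return nil
}

// keepRecentIsSafe reflects the advice to keep at least 2 recent snapshots.
// Treating 0 as "keep all snapshots" is an assumption of this sketch.
func keepRecentIsSafe(keepRecent uint32) bool {
	return keepRecent == 0 || keepRecent >= 2
}

func main() {
	fmt.Println(checkSnapshotInterval(1000, 100)) // valid combination
	fmt.Println(checkSnapshotInterval(150, 100))  // rejected combination
	fmt.Println(keepRecentIsSafe(1), keepRecentIsSafe(2))
}
```

Running a check like this before starting the node surfaces a bad combination early, rather than at startup.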

State Syncing a Node

Once a few nodes in a network have taken state sync snapshots, new nodes can join the network using state sync. To do this, the node should first be configured as usual, and the following pieces of information must be obtained for light client verification:

  • At least 2 available RPC servers.
  • A trusted height.
  • The block ID hash of the trusted height.

The trusted height and hash can be obtained, for example, via RPC (this must be from a trusted source):

$ curl -s http://foo.net:26657/block | \
    jq -r '.result.block.header.height + "\n" + .result.block_id.hash'
1964
6FD28DAAAC79B77F589AE692B6CD403412CE27D0D2629E81951607B297696E5B
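The same two fields can also be extracted programmatically. The Go sketch below parses a trimmed sample of the /block response and reads the fields that the jq expression above selects. The struct, the function name `parseTrustOptions`, and the sample JSON are written for this illustration; in practice the body would come from an HTTP request to a trusted RPC server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// blockResponse models only the fields selected by the jq expression above.
type blockResponse struct {
	Result struct {
		BlockID struct {
			Hash string `json:"hash"`
		} `json:"block_id"`
		Block struct {
			Header struct {
				Height string `json:"height"` // the height is a JSON string
			} `json:"header"`
		} `json:"block"`
	} `json:"result"`
}

// parseTrustOptions extracts the trusted height and block ID hash from a
// /block response body.
func parseTrustOptions(body []byte) (int64, string, error) {
	var resp blockResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, "", fmt.Errorf("decoding block response: %w", err)
	}
	if resp.Result.BlockID.Hash == "" {
		return 0, "", fmt.Errorf("block response has no block ID hash")
	}
	height, err := strconv.ParseInt(resp.Result.Block.Header.Height, 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("parsing height: %w", err)
	}
	return height, resp.Result.BlockID.Hash, nil
}

func main() {
	// Trimmed sample response, using the values from the example above.
	sample := []byte(`{"result":{"block_id":{"hash":"6FD28DAAAC79B77F589AE692B6CD403412CE27D0D2629E81951607B297696E5B"},"block":{"header":{"height":"1964"}}}}`)
	height, hash, err := parseTrustOptions(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("trust_height = %d\ntrust_hash = %q\n", height, hash)
}
```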

We can then configure Tendermint to use state sync in config.toml:

[statesync]
enable = true
rpc_servers = "foo.net:26657,bar.com:26657"
trust_height = 1964
trust_hash = "6FD28DAAAC79B77F589AE692B6CD403412CE27D0D2629E81951607B297696E5B"
trust_period = "336h" # 2/3 of unbonding time
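The trust_period value follows from the comment in the config above: two thirds of the chain's unbonding time. The Go sketch below shows the arithmetic, assuming an unbonding time of 21 days. That figure is an assumption chosen because it yields the 336h shown above, so check the actual staking parameters of your chain. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// assumedUnbonding is an assumed unbonding time of 21 days (504 hours).
// Substitute the value from your chain's staking parameters.
const assumedUnbonding = 21 * 24 * time.Hour

// trustPeriod returns two thirds of the unbonding time.
func trustPeriod(unbonding time.Duration) time.Duration {
	return unbonding * 2 / 3
}

// formatHours renders a duration in whole hours, e.g. "336h".
func formatHours(d time.Duration) string {
	return fmt.Sprintf("%dh", int64(d/time.Hour))
}

func main() {
	fmt.Printf("trust_period = %q\n", formatHours(trustPeriod(assumedUnbonding)))
}
```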

When the node is started it will then attempt to find a state sync snapshot in the network and restore it:

Started node                   module=main nodeInfo="..."
Discovering snapshots for 20s
Discovered new snapshot height=3000 format=1 hash=0F14A473
Discovered new snapshot height=2000 format=1 hash=C6209AF7
Offering snapshot to ABCI app height=3000 format=1 hash=0F14A473
Snapshot accepted, restoring height=3000 format=1 hash=0F14A473
Fetching snapshot chunk height=3000 format=1 chunk=0 total=3
Fetching snapshot chunk height=3000 format=1 chunk=1 total=3
Fetching snapshot chunk height=3000 format=1 chunk=2 total=3
Applied snapshot chunk height=3000 format=1 chunk=0 total=3
Applied snapshot chunk height=3000 format=1 chunk=1 total=3
Applied snapshot chunk height=3000 format=1 chunk=2 total=3
Verified ABCI app height=3000 appHash=F7D66BC9
Snapshot restored height=3000 format=1 hash=0F14A473
Executed block height=3001 validTxs=16 invalidTxs=0
Committed state height=3001 txs=16 appHash=0FDBB0D5F
Executed block height=3002 validTxs=25 invalidTxs=0
Committed state height=3002 txs=25 appHash=40D12E4B3

And that’s it: the node is now state synced, having joined the network in seconds.
