# GP5 Sync Optimization: Path to 2,000+ bl/s with Transaction Blocks

## Problem Statement

GP5 (Geth 1.17 fork for XDC) currently achieves ~2,000 bl/s only on **empty blocks**. When syncing blocks containing transactions (primarily XDPoS signing txs sent to `0x00..0088`), throughput drops to ~85 bl/s due to EVM execution overhead (65%), checkpoint stalls (20%), and log spam (10%).

**Goal**: Achieve 2,000+ bl/s on blocks with transactions while maintaining eth/62, eth/63, eth/100 compatibility with XDC v2.6.8 peers.

## Architecture Analysis

### Geth 1.17 Sync Pipeline (upstream)

Geth 1.17's sync uses a **5-stage concurrent pipeline** orchestrated by `spawnSync()`:
1. **Header Fetch** → concurrent fetcher with capacity tracking per-peer
2. **Body Fetch** → `concurrentFetch()` distributes across ALL peers with QoS-based batch sizing
3. **Receipt Fetch** (snap sync only)
4. **Header Processing** → `processHeaders()` validates and schedules bodies
5. **Block Processing** → `processFullSyncContent()` calls `InsertChain()`

Key features:
- **State Prefetcher** (`core/state_prefetcher.go`): Runs txs in parallel goroutines (4*NumCPU/5) on a throwaway state copy to warm the state and trie caches before real execution
- **Sender Recovery** (`SenderCacher().RecoverFromBlocks()`): Parallelizes ECDSA signature recovery across batch
- **Pebble DB**: Default backend, significantly faster than LevelDB for concurrent reads
- **Freezer/Ancient DB**: Separates old blocks into append-only flat files for I/O isolation
- **Block cache**: 8192 blocks max, 4096 initial — pipeline buffer between fetch and execution

### GP5's XDC Override (`downloader_xdc.go`)

GP5 replaces upstream's sync with `synchroniseXDC()` which:
- Uses **legacy message format** (no RequestId) for eth/62, eth/63, eth/100 compat
- Has its own header/body fetch loops with custom channel routing
- **v10 body fetcher**: Already event-driven multi-peer dispatch with QoS tracking — this is close to upstream quality
- Uses upstream's `processHeaders()` and `processFullSyncContent()` — these are SHARED
- `blockCacheMaxItems=8192`, `blockCacheInitialItems=4096` — already tuned

### What GP5 ALREADY Uses from Upstream
- `queue.go` — full queue scheduling (ReserveBodies/DeliverBodies)
- `processFullSyncContent()` → `InsertChain()` → `insertChain()`
- State prefetcher (active by default — `NoPrefetch` is false unless `--cache.noprefetch` set)
- Sender cacher (parallel ECDSA recovery)
- Pebble DB support (via `--db.engine pebble`, default)
- Freezer DB (active — `rawdb.freezerdb` wraps chain storage)

### What GP5 DOES NOT Use
- Upstream's `concurrentFetch()` for bodies — replaced by custom `fetchBodiesXDC()` (but v10 is similar quality)
- eth/66+ request IDs — not available with v268 peers

## Current Bottleneck Analysis

The 65% EVM execution cost comes from `ProcessBlock()` in `core/blockchain.go`:

```
ProcessBlock() {
  1. Create statedb from parent root (+ XDC cached root lookup)
  2. State Prefetcher: goroutine runs txs on throwaway copy (ACTIVE)
  3. Process(): iterate txs
     - Signing txs (to 0x88): ApplySigningTransaction() — nonce bump + log + receipt
     - Regular txs: full EVM execution via ApplyTransactionWithEVM()
  4. Finalize(): consensus rewards at checkpoints
  5. ValidateState(): bloom/receipt/state root checks (ALL BYPASSED for chainId 50/51)
  6. Commit state to DB
}
```

For signing txs specifically, `ApplySigningTransaction()` does:
- `Sender()` — ECDSA recovery (expensive! ~0.5ms per tx)
- `statedb.Finalise(true)` — trie update
- Nonce read + set
- Log creation + bloom computation

With ~18 signing txs per block, sender recovery alone costs ~9 ms per block (18 × 0.5 ms), which caps throughput at ~111 bl/s (1000 ms / 9 ms) even with zero other work.

## Ranked Optimizations

### 1. SKIP SIGNING TX SENDER RECOVERY DURING SYNC (Expected: +500-1000% speed)

**Impact**: CRITICAL — This is the single biggest win.

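**Rationale**: During bulk sync, we trust the block is valid because it came from a v268 peer that already validated it. The `SenderCacher().RecoverFromBlocks()` in `insertChain()` already does parallel recovery, but `ApplySigningTransaction()` calls `types.Sender()` again per-tx. Upstream `types.Sender()` returns the cached address only when the signer matches the one the cacher used, so the first step is to confirm whether these calls are cache hits or repeated recoveries.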

**Proposal**: Add a "bulk sync mode" flag that:
1. Skips `types.Sender()` in `ApplySigningTransaction()` — use cached sender from `SenderCacher`
2. Or better: **skip signing tx processing entirely** during sync. Signing txs only affect nonces (which diverge anyway due to state root differences) and produce logs (not needed during sync).

**Files to modify**:
- `core/state_processor.go`: Add sync-mode bypass in tx loop (lines 121-133)
- `core/blockchain.go`: Pass sync flag to `ProcessBlock()`

**Compatibility risk**: LOW — signing tx state is already divergent (nonce gaps are tolerated)
**Complexity**: LOW — ~30 lines changed
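
The "bulk sync mode" flag could be as small as one atomic boolean. A minimal sketch, assuming a process-wide flag that the downloader sets when a long sync starts and clears near the chain head; `SetBulkSyncing`, `IsBulkSyncing`, and `signingTxPath` are hypothetical names, not existing GP5 functions:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// bulkSyncing is a hypothetical process-wide flag. The downloader would set
// it when a long sync starts and clear it once the node is near the head.
var bulkSyncing atomic.Bool

// SetBulkSyncing switches the fast path on or off.
func SetBulkSyncing(on bool) { bulkSyncing.Store(on) }

// IsBulkSyncing is safe to call from the tx loop without extra locking.
func IsBulkSyncing() bool { return bulkSyncing.Load() }

// signingTxPath reports which code path a signing tx would take.
func signingTxPath() string {
	if IsBulkSyncing() {
		return "fast"
	}
	return "full"
}

func main() {
	SetBulkSyncing(true)
	fmt.Println(signingTxPath())
	SetBulkSyncing(false)
	fmt.Println(signingTxPath())
}
```

A global flag keeps the patch small; threading an explicit parameter through `ProcessBlock()` would be easier to test but touches more call sites.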

### 2. SKIP ALL TX EXECUTION DURING BULK SYNC (Expected: +1000-2000% speed)

**Impact**: TRANSFORMATIVE — This is how snap sync achieves its speed.

**Rationale**: GP5 already bypasses ALL validation for XDC chains (chainId 50/51):
- State root: BYPASSED (uses XdcCacheStateRoot)
- Gas used: BYPASSED
- Bloom filter: BYPASSED
- Receipt root: BYPASSED

Since EVERY validation is bypassed, we can skip tx execution entirely and just:
1. Write block header + body to DB
2. At checkpoint blocks (every 900), execute the Finalize() for reward distribution
3. Maintain a "last executed block" pointer

**The key insight**: If we skip execution, we don't compute state at all. But we need state for reward calculations at checkpoints. Solution: execute only checkpoint blocks (and their preceding block for parentState).

**Implementation**:
```go
// In processFullSyncContent or insertChain:
if bulkSync && isXDCChain {
    if !isCheckpointBlock(block) {
        // Write header and body only and skip execution (no receipts are written here)
        rawdb.WriteBlock(db, block)
        rawdb.WriteCanonicalHash(db, block.Hash(), block.NumberU64())
        continue
    }
    // Execute checkpoint block normally for reward state
}
```

**Files to modify**:
- `core/blockchain.go`: Add `InsertBlocksWithoutExecution()` method
- `eth/downloader/downloader.go`: `processFullSyncContent()` to use new method during sync
- `core/block_validator.go`: Already fully bypassed, no changes needed

**Compatibility risk**: MEDIUM — Need to ensure chain state is recoverable. After sync completes, may need a "state rebuild" pass from the last checkpoint.
**Complexity**: MEDIUM — ~100 lines, but needs careful testing
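
The selection rule described in this section (execute checkpoint blocks plus the block before each one) can be written as a pure function. This is a sketch of the rule only, with a hypothetical helper name; it does not address how the pre-state for the block before a checkpoint is obtained when earlier blocks were not executed, which the design leaves open:

```go
package main

import "fmt"

// needsExecution reports whether block n must be executed under the scheme
// in this section: checkpoint blocks, plus the block immediately before each
// checkpoint so that a parent state exists for reward calculation.
func needsExecution(n, epoch uint64) bool {
	if epoch == 0 {
		return true // no epoch configured: be conservative and execute
	}
	return n%epoch == 0 || (n+1)%epoch == 0
}

func main() {
	const epoch = 900
	for _, n := range []uint64{898, 899, 900, 901} {
		fmt.Println(n, needsExecution(n, epoch))
	}
}
```

With an epoch of 900, only two blocks in every 900 take the execution path under this rule.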

### 3. BATCH STATE COMMITS (Expected: +50-100% speed)

**Impact**: HIGH

**Rationale**: Currently, `ProcessBlock()` commits state to the trie DB after EVERY block. Geth's `insertChain()` processes blocks one-at-a-time in its main loop. The state prefetcher helps overlap I/O but doesn't batch commits.

**Proposal**: Accumulate state changes across N blocks (e.g., 64) before committing to the trie. This amortizes trie hashing and DB write overhead.

**Implementation**: Modify the block processing loop to:
1. Process block on statedb WITHOUT calling `statedb.Commit()` or `IntermediateRoot()`
2. Only commit every N blocks or at checkpoints
3. Since state root validation is bypassed for XDC, we don't need intermediate roots

**Files to modify**:
- `core/blockchain.go`: `ProcessBlock()` — add deferred commit mode
- `core/block_validator.go`: `ValidateState()` — skip `IntermediateRoot()` call entirely for XDC

**Note**: `IntermediateRoot()` at line 228 of `block_validator.go` is called even though its result is never compared against the header root for XDC chains. The root is only fed to `XdcCacheStateRoot()` (see optimization 4), so computing it on every block is mostly wasted work.

**Compatibility risk**: LOW — state root is already bypassed
**Complexity**: MEDIUM — ~80 lines
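
The "every N blocks or at checkpoints" rule can be isolated from the state code so it is easy to test. A minimal sketch, assuming block numbers arrive in increasing order; `commitPolicy` and its fields are hypothetical:

```go
package main

import "fmt"

// commitPolicy decides when accumulated state should be flushed.
type commitPolicy struct {
	batch uint64 // commit at least every batch blocks
	epoch uint64 // always commit at checkpoint blocks (0 disables this)
	last  uint64 // number of the last committed block
}

// shouldCommit reports whether block n should trigger a commit and, if so,
// records n as the last committed block. It assumes n only increases.
func (p *commitPolicy) shouldCommit(n uint64) bool {
	atCheckpoint := p.epoch != 0 && n%p.epoch == 0
	batchFull := n-p.last >= p.batch
	if atCheckpoint || batchFull {
		p.last = n
		return true
	}
	return false
}

func main() {
	p := &commitPolicy{batch: 64, epoch: 900, last: 832}
	for n := uint64(833); n <= 900; n++ {
		if p.shouldCommit(n) {
			fmt.Println("commit at", n)
		}
	}
}
```

Committing at checkpoints regardless of batch position keeps the reward state durable at the points where `Finalize()` depends on it.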

### 4. SKIP IntermediateRoot() FOR XDC CHAINS (Expected: +30-50% speed)

**Impact**: HIGH. This is immediate and low-risk.

**Rationale**: In `ValidateState()` (block_validator.go:228), GP5 computes `statedb.IntermediateRoot()`, which rehashes every trie path dirtied by the block, and then immediately bypasses the root check for chainId 50/51. The root computation is the most expensive single operation in block validation.

**Proposal**: Move the XDC chain check BEFORE the IntermediateRoot call:

```go
func (v *BlockValidator) ValidateState(...) error {
    // ... gas, bloom, receipt checks (all bypassed for XDC) ...
    
    // XDC: Skip expensive state root computation entirely
    if v.config.ChainID != nil {
        chainID := v.config.ChainID.Uint64()
        if chainID == 50 || chainID == 51 {
            // Still need to cache state for next block
            // Use a cheaper "dirty root" or just commit without hashing
            return nil  
        }
    }
    root := statedb.IntermediateRoot(...)  // Only for non-XDC
}
```

**Caveat**: The IntermediateRoot result is used by `XdcCacheStateRoot()` to map local→remote roots. Without it, the next block can't find its parent state.

**Revised approach**: Replace `IntermediateRoot()` with `statedb.Commit()`, which returns the root as a side effect of persisting, or call `IntermediateRoot()` only every N blocks. Note that upstream `Commit()` computes the root internally, so the first variant only saves work if it removes a second hashing pass; measure before adopting it.

**Files to modify**:
- `core/block_validator.go`: Lines 228-249 — restructure XDC bypass
- `core/xdc_state_root_cache.go`: May need alternative root tracking

**Compatibility risk**: LOW
**Complexity**: LOW — ~20 lines, but needs understanding of state commit flow

### 5. PARALLEL BLOCK PROCESSING PIPELINE (Expected: +100-200% speed)

**Impact**: HIGH but complex.

**Rationale**: Currently block processing is strictly sequential: process block N, commit, process block N+1. With the state root bypass, we can pipeline:
- Goroutine A: execute block N's transactions
- Goroutine B: commit block N-1's state to disk
- Goroutine C: prefetch block N+1's state

**Implementation**: Double-buffer statedb instances:
1. While block N commits to disk, start processing block N+1 on a copy
2. Since state roots are bypassed, we don't need to wait for N's commit to start N+1

**Files to modify**:
- `core/blockchain.go`: Major refactor of `insertChain()` main loop
- New file: `core/pipeline_processor.go`

**Compatibility risk**: MEDIUM — concurrent state access needs careful locking
**Complexity**: HIGH — ~200 lines, complex concurrency
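
The execute/commit overlap can be prototyped with one channel before touching `insertChain()`. A minimal sketch of the double-buffering idea, assuming `execute` and `commit` are caller-supplied stand-ins for the real stages; the `executed` type and `runPipeline` are hypothetical, and the prefetch stage is left out:

```go
package main

import (
	"fmt"
	"sync"
)

// executed stands in for the result of running one block's transactions.
type executed struct {
	number uint64
}

// runPipeline executes blocks in order on the calling goroutine while a
// second goroutine commits finished blocks, so committing block N-1 overlaps
// with executing block N.
func runPipeline(blocks []uint64, execute func(uint64) executed, commit func(executed)) {
	results := make(chan executed, 1) // depth 1 gives double buffering
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for r := range results {
			commit(r) // a single consumer keeps commits in block order
		}
	}()
	for _, n := range blocks {
		results <- execute(n)
	}
	close(results)
	wg.Wait()
}

func main() {
	var committed []uint64
	runPipeline([]uint64{1, 2, 3},
		func(n uint64) executed { return executed{number: n} },
		func(r executed) { committed = append(committed, r.number) })
	fmt.Println(committed)
}
```

The channel depth bounds how far execution can run ahead of the disk, which also bounds the memory held by uncommitted state.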

### 6. ENSURE PEBBLE DB IS ACTIVE (Expected: +20-40% speed)

**Impact**: MODERATE

**Rationale**: Geth 1.17 defaults to Pebble, which is significantly faster than LevelDB for concurrent read-heavy workloads (like sync). GP5 supports both.

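**Verification needed**: Check if GP5's default docker/systemd config explicitly sets `--db.engine`. If not set, Geth 1.17 defaults to Pebble for a fresh data directory; in upstream Geth an existing LevelDB data directory is still opened as LevelDB, so nodes with older datadirs need to be checked individually.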

**Action**: Confirm Pebble is the active engine in production GP5 deployments. If using LevelDB, switch to Pebble.

**Files to check**:
- Deployment scripts, Dockerfile, systemd unit files
- `node/database.go` for default engine selection

**Compatibility risk**: NONE — transparent to peers
**Complexity**: TRIVIAL — config change

### 7. ELIMINATE parentState COPY (Expected: +10-20% speed)

**Impact**: MODERATE

**Rationale**: In `state_processor.go:94`, EVERY block processing starts with `parentState := statedb.Copy()`. This copies the in-memory state (cached objects and pending changes) so that Finalize() can use it for reward calculation. But Finalize only uses parentState at checkpoint blocks (every 900).

**Proposal**: Only copy parentState at checkpoint blocks:

```go
var parentState *state.StateDB
if isCheckpointBlock(header) {
    parentState = statedb.Copy()
}
```

**Files to modify**:
- `core/state_processor.go`: Line 94 — conditional copy

**Compatibility risk**: LOW — parentState is only used by XDPoS Finalize
**Complexity**: LOW — ~10 lines

### 8. GP5-TO-GP5 SNAP SYNC (Expected: +5000% for 2nd+ nodes)

**Impact**: TRANSFORMATIVE for fleet scaling, but only for 2nd+ GP5 nodes.

**Rationale**: GP5 has full Geth 1.17 snap sync protocol support. Once one GP5 node reaches chain tip, it can serve snap sync to other GP5 nodes — downloading state directly instead of executing.

**Implementation**: 
1. First GP5 node syncs via full sync (with optimizations above)
2. Additional GP5 nodes use `--syncmode snap` and connect to the first GP5
3. v268 peers don't serve snap, but GP5 peers do natively

**Files to modify**: None — just deployment configuration
**Compatibility risk**: NONE
**Complexity**: TRIVIAL — deployment change

## Implementation Priority

| # | Optimization | Speed Gain | Risk | Effort | Priority |
|---|-------------|-----------|------|--------|----------|
| 4 | Skip IntermediateRoot for XDC | +30-50% | Low | Low | **P0 — Do first** |
| 7 | Eliminate parentState copy (non-checkpoint) | +10-20% | Low | Low | **P0 — Do first** |
| 1 | Skip signing tx sender recovery | +500-1000% | Low | Low | **P0 — Do first** |
| 6 | Verify Pebble DB active | +20-40% | None | Trivial | **P0 — Do first** |
| 3 | Batch state commits | +50-100% | Low | Medium | **P1 — Next sprint** |
| 2 | Skip all tx execution during sync | +1000-2000% | Medium | Medium | **P1 — Next sprint** |
| 5 | Parallel block processing pipeline | +100-200% | Medium | High | **P2 — Future** |
| 8 | GP5-to-GP5 snap sync | +5000% (2nd nodes) | None | Trivial | **P1 — Enable for fleet** |

## Quick Wins (P0) — Combined Expected Impact: 3-10x speedup

### Patch 1: Skip IntermediateRoot for XDC (block_validator.go)

```go
// Before line 228, add an early return for XDC (needs the "fmt" import):
if v.config.ChainID != nil {
    chainID := v.config.ChainID.Uint64()
    if chainID == 50 || chainID == 51 {
        // Skip the standalone IntermediateRoot() call and take the root from
        // Commit() instead. Adjust the Commit arguments to this fork's
        // signature, and make sure the normal write path does not commit
        // the same statedb a second time.
        root, err := statedb.Commit(block.NumberU64(), v.config.IsEIP158(header.Number))
        if err != nil {
            return fmt.Errorf("xdc state commit failed: %w", err)
        }
        XdcCacheStateRoot(header.Number.Uint64(), root, header.Root)
        return nil
    }
}
```

### Patch 2: Conditional parentState copy (state_processor.go)

```go
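// Replace line 94:
// parentState := statedb.Copy()
// With:
var parentState *state.StateDB // stays nil on non-checkpoint blocks
blockNum := header.Number.Uint64()
epochLength := uint64(900) // XDPoS epoch; prefer the chain config value over a literal
isCheckpoint := blockNum%epochLength == 0
if isCheckpoint {
    parentState = statedb.Copy()
}
// Finalize() must tolerate a nil parentState when isCheckpoint is false.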
```

### Patch 3: Skip signing tx processing during sync (state_processor.go)

```go
// In the tx loop (line 124-133), add sync mode check.
// IsBulkSyncing is a new helper in package core (no "core." qualifier here).
// from must be the sender already cached on the tx by SenderCacher,
// not a fresh ECDSA recovery.
if to := tx.To(); to != nil && *to == common.BlockSignersBinary && config.IsTIPSigning(blockNumber) {
    if IsBulkSyncing() {
        // During sync: just bump nonce, skip receipt/log computation
        statedb.SetNonce(from, statedb.GetNonce(from)+1, tracing.NonceChangeUnspecified)
        receipts = append(receipts, &types.Receipt{Status: types.ReceiptStatusSuccessful, GasUsed: 0, TxHash: tx.Hash()})
        continue
    }
    // Normal mode: full signing tx processing
    receipt, err := ApplySigningTransaction(...)
}
```

## Compatibility Guarantees

All optimizations maintain:
- eth/62, eth/63, eth/100 protocol compatibility with v268 peers
- Legacy message format (no RequestId wrappers)  
- 18-field XDC headers (Validators, Validator, Penalties)
- State root bypass for chainId 50/51 (already permanent)
- Block hash integrity (no modifications to stored blocks)

## Metrics to Track

- `chain/inserts` — blocks/second during sync
- `chain/prefetch/executes` — prefetcher effectiveness
- `chain/account/reads/cache/process/hit` — state cache hit rate
- New: `xdc/sync/signing_tx_skipped` — signing txs bypassed
- New: `xdc/sync/intermediate_root_skipped` — root computations saved

## Testing Plan

1. Apothem testnet sync (chainId 51) — full sync from genesis with each optimization
2. Benchmark suite: 1000-block batches at different chain heights
3. State integrity check: compare final state root with known-good v268 node
4. Regression: ensure post-sync node can produce blocks normally