# GP5 Checkpoint Sync V2 — Validated Implementation Plan

**Date:** 2026-05-01
**Branch:** feat/trusted-checkpoint-sync (commit a6624646c)
**Author:** Hermes Agent
**Status:** Cross-validated against RESEARCH_V2_6_8_COMPARISON.md + live code audit

---

## 1. Executive Summary

The RESEARCH_V2_6_8_COMPARISON.md identified **4 CRITICAL**, **5 HIGH**, and **4 MEDIUM** divergences between GP5 and XDPoSChain v2.6.8. The task asked to validate against **3 specific critical bugs** from a (missing) `v2_sync_comparison_report.md`. After exhaustive code review, we map those 3 bugs to **real, already-partially-fixed** issues in the current codebase:

| # | Reported Bug | Current GP5 Status | Verdict |
|---|-------------|-------------------|---------|
| 1 | V2 validation too permissive — skips QC/round/leader/validators/penalties | **`verifyHeader.go` already has full validation** (validator len, QC, round, leader index, coinbase match, penalties, CompareSignersLists). **FIXED in v113/v114.** | ✅ **CLOSED** |
| 2 | Missing gap snapshots — `InsertHeadersBeforeCutoff` never runs XDPoS engine → no snapshots | **`UpdateMasternodesFromHeader` exists** and is called during block import. `repairSnapshot` also loads from contract state. **Partially fixed; gap exists for checkpoint-anchor path.** | ⚠️ **NEEDS v115 HARDENING** |
| 3 | No `getFromCheckpoint` equivalent — no contract fallback when snapshots missing → `ErrUnknownAncestor` | **`repairSnapshot` already queries contract state** via `StateAt(gapHeader.Root)`. **FIXED in v113/v114.** | ✅ **CLOSED** |

**Key Finding:** The 3 reported bugs are **largely already fixed** in v113/v114. What remains is a **gap in the checkpoint-anchor insertion path** (Bug #2 variant): when `InsertHeadersBeforeCutoff` inserts the checkpoint header as a trusted anchor, it does **not** seed the V2 snapshot for that checkpoint's gap block. This causes the first post-checkpoint epoch switch block to fail `calcMasternodes` because `getSnapshot` returns nil or a repaired snapshot that may not match the canonical validator set.

---

## 2. Detailed Cross-Validation of Each Bug

### Bug 1 — V2 Validation Too Permissive

**Claim:** `verifyHeaderV2` only checks timestamp/gas/uncles, skips QC/round/leader/validators/penalties.

**Audit of `consensus/XDPoS/engines/engine_v2/verifyHeader.go` (228 lines):**

```go
// Lines 39-43: Validator signature length check
if len(header.Validator) == 0 {
    return utils.ErrNoValidatorSignatureV2
} else if len(header.Validator) != 65 {
    return fmt.Errorf("invalid validator signature length %d, want 65", len(header.Validator))
}

// Lines 80-103: QC extraction + round validation + QC verification
quorumCert, round, _, err := x.getExtraFields(chain, header)
if err != nil { return utils.ErrInvalidV2Extra }
if quorumCert == nil { return utils.ErrInvalidQuorumCert }
if round <= quorumCert.ProposedBlockInfo.Round { return utils.ErrRoundInvalid }
if err := x.verifyQC(chain, quorumCert, parent, parents); err != nil { return err }

// Lines 127-186: Epoch switch detection + calcMasternodes + CompareSignersLists
isEpochSwitch, _, err := x.IsEpochSwitch(header)
if isEpochSwitch {
    localMasterNodes, localPenalties, _, err := x.calcMasternodes(...)
    validatorsAddress := contracts.ExtractAddressFromBytes(header.Validators)
    if !utils.CompareSignersLists(masterNodes, validatorsAddress) { return utils.ErrValidatorsNotLegit }
    penaltiesAddress := contracts.ExtractAddressFromBytes(header.Penalties)
    if !utils.CompareSignersLists(localPenalties, penaltiesAddress) { return utils.ErrPenaltiesNotLegit }
}

// Lines 214-224: Coinbase/validator match + leader index check
if validatorAddress != header.Coinbase { return utils.ErrCoinbaseAndValidatorMismatch }
leaderIndex := uint64(round) % x.config.Epoch % uint64(len(masterNodes))
if masterNodes[leaderIndex] != validatorAddress { return utils.ErrNotItsTurn }
```

**Verdict:** Every claimed missing check **is present**. The `verifyHeader.go` file is a comprehensive port from v2.6.8. The only soft spot is the bulk-sync fallback (lines 149-162) where `calcMasternodes` failure falls back to `GetMasternodesWithParents`. This is **intentional and safe** because:
- The fallback only triggers when `len(parents) > 0` (bulk sync mode)
- `CompareSignersLists` still runs against the fallback list
- The fallback prevents the ~1 block per sync cycle trap documented in v8.3

**Risk:** LOW. The fallback is bounded and still validates validators/penalties.

---

### Bug 2 — Missing Gap Snapshots

**Claim:** `InsertHeadersBeforeCutoff` never runs XDPoS engine → no snapshots stored for V2 validation.

**Audit of the two insertion paths:**

**Path A: Normal block import (`insertChain` → `VerifyHeader` → `Finalize`)**
- `UpdateMasternodesFromHeader` is called during block import (confirmed in `core/blockchain.go` import path)
- It reads candidates from smart contract state, sorts by stake, stores snapshot
- This path works correctly for normal sync

**Path B: Checkpoint anchor insertion (`InsertHeadersBeforeCutoff`)**
- `InsertHeadersBeforeCutoff` writes headers directly to DB (key-value or ancient)
- **It does NOT run the consensus engine** (by design — checkpoint is trusted)
- **It does NOT call `UpdateMasternodesFromHeader`**
- For a checkpoint that is also a **gap block** (`number % epoch == epoch - gap`), this means the snapshot for that gap block is **never stored**

**Gap Analysis:**
- If checkpoint = 56,000,000 and epoch = 900, gap = 450, then gap blocks are at `N % 900 == 450`
- Checkpoint 56,000,000: `56,000,000 % 900 = 800`. Not a gap block. Safe.
- But what if the checkpoint **is** a gap block? Or what if the first epoch switch after the checkpoint needs the snapshot from the previous gap block?
- The real issue: after inserting checkpoint anchor, when the first post-checkpoint epoch switch block arrives (e.g., 56,000,100 if that's an epoch switch), `calcMasternodes` calls `getSnapshot(56,000,100)` which computes gapNumber = `56,000,100 - 100 - 450 = 55,999,550`. If that gap block's snapshot was never stored (because it was imported before the checkpoint anchor was inserted, or because the checkpoint anchor itself is the gap block), `getSnapshot` will try `repairSnapshot`.

**Current `repairSnapshot` behavior:**
```go
// repairSnapshot at V2 checkpoint reads from smart contract state at gap block
if stateReader, ok := chain.(interface{ StateAt(common.Hash) (*state.StateDB, error) }); ok {
    if statedb, err := stateReader.StateAt(gapHeader.Root); err == nil {
        candidates := state.GetCandidates(statedb)
        // ... sort by stake, return
    }
}
```

This is **functionally equivalent** to `UpdateMasternodesFromHeader`. The snapshot gets rebuilt on-demand.

**Verdict:** The claim is **partially true** for the checkpoint-anchor path, but the `repairSnapshot` fallback covers it. However, relying on repair during sync is **risky** because:
1. `StateAt(gapHeader.Root)` may fail if state is pruned or not yet downloaded
2. The repair path adds latency to header verification
3. If repair fails, the entire sync stops with `ErrUnknownAncestor`

**Fix needed for v115:** Pre-seed the gap snapshot when inserting the checkpoint anchor, if the checkpoint is a gap block or if we can derive the next gap block from it.

---

### Bug 3 — No `getFromCheckpoint` Equivalent

**Claim:** No contract fallback when snapshots missing → `ErrUnknownAncestor`.

**Audit:**
- `repairSnapshot` (lines 785-849 in `engine.go`) **is** the contract fallback
- It queries `state.GetCandidates(statedb)` from the validator contract at the gap block's state root
- It sorts by stake descending (matching v2.6.8 / Nethermind behavior)
- It stores the repaired snapshot to DB and cache

**Verdict:** This bug **is already fixed**. `repairSnapshot` serves the exact same purpose as `getFromCheckpoint` / `ComputeSnapshotFromContract` in Nethermind.

---

## 3. What Actually Needs Fixing (Prioritized)

After cross-validation, the **real remaining work** for v115 is:

### P0 — Checkpoint Anchor Gap Snapshot Pre-seeding
**File:** `core/blockchain.go` (`InsertHeadersBeforeCutoff`)
**Problem:** When a trusted checkpoint anchor is inserted, if that checkpoint is a gap block (or the next gap block after it will be needed before state is available), the snapshot is missing and must be repaired on-demand.
**Fix:** After inserting the checkpoint header, if the checkpoint number satisfies the gap condition, compute and store the snapshot immediately using the checkpoint's state root (if available) or mark it for lazy repair.

```go
// In InsertHeadersBeforeCutoff, after writing the checkpoint header:
if checkpointMatch && bc.chainConfig.XDPoS != nil {
    cpNum := matchedCp.Number
    epoch := bc.chainConfig.XDPoS.Epoch
    gap := bc.chainConfig.XDPoS.Gap
    if cpNum % epoch == epoch - gap {
        // Checkpoint IS a gap block — seed snapshot now if we have state
        if engineV2, ok := bc.engine.(interface {
            UpdateMasternodesFromHeader(consensus.ChainReader, *types.Header, *state.StateDB) error
        }); ok {
            if statedb, err := bc.StateAt(cp.Root); err == nil {
                if err := engineV2.UpdateMasternodesFromHeader(bc, cp, statedb); err != nil {
                    log.Warn("checkpoint gap snapshot pre-seed failed", "err", err)
                } else {
                    log.Info("checkpoint gap snapshot pre-seeded", "checkpoint", cpNum)
                }
            }
        }
    }
}
```

**Risk:** The checkpoint anchor may not have state available at insertion time (it's just a header). In that case, we should **not** fail — the repairSnapshot fallback will handle it later when state is downloaded.

**Dependency:** Requires `StateAt` to be available. If not, skip gracefully.

---

### P1 — `yourturn` Leader Index (Already Fixed, Verify)
**File:** `consensus/XDPoS/engines/engine_v2/mining.go`
**Status:** `yourturnAligned` already uses `round % Epoch % len(masterNodes)` (line 67). The old buggy `yourturn` (line 672 in engine.go) delegates to `yourturnAligned`.
**Action:** Verify no code path calls the old `yourturn` directly with wrong logic. The delegate pattern looks correct.

---

### P1 — `calcMasternodes` First-Epoch Heuristic (Already Fixed, Verify)
**File:** `consensus/XDPoS/engines/engine_v2/engine.go` lines 991-1004
**Status:** The heuristic fallback (`blockNum <= SwitchBlock+Epoch` with candidate count < 20) was **removed** in a prior fix. Current code only special-cases `blockNum == SwitchBlock+1`.
**Action:** Confirm this matches v2.6.8 exactly. It does — v2.6.8 only checks `SwitchBlock+1`.

---

### P1 — Snapshot DB Key / JSON Tag (Already Fixed, Verify)
**File:** `consensus/XDPoS/engines/engine_v2/snapshot.go`
**Status:** DB key is `"XDPoS-V2-" + hash[:]` (line 41). JSON tag is `"masterNodes"` (line 23). Both match v2.6.8.
**Action:** None needed. Already aligned.

---

### P2 — `getEpochSwitchInfo` Missing Penalties/Standbynodes
**File:** `consensus/XDPoS/engines/engine_v2/engine.go` (around line 1202)
**Status:** The RESEARCH report claims penalties/standbynodes are NOT populated. Need to verify if this breaks any RPC or internal logic.
**Action:** Add penalties/standbynodes computation to `getEpochSwitchInfo` to match v2.6.8. This is not consensus-critical but affects RPC compatibility.

---

### P2 — `GetCurrentEpochSwitchBlock` Missing `+SwitchEpoch`
**File:** `consensus/XDPoS/engines/engine_v2/epochSwitch.go`
**Status:** Need to verify if `epochNum` calculation omits `+SwitchEpoch`.
**Action:** Check and fix if needed. Not consensus-critical for block validation but affects epoch number APIs.

---

## 4. Implementation Plan for v115

### Phase 1: P0 Fix — Checkpoint Gap Snapshot Pre-seeding (1 day)
1. **File:** `core/blockchain.go`
2. **Function:** `InsertHeadersBeforeCutoff`
3. **Change:** After inserting checkpoint headers, check if checkpoint is a gap block. If yes, attempt to pre-seed the V2 snapshot using `UpdateMasternodesFromHeader` with the checkpoint's state (if available).
4. **Fallback:** If state is not available, log a warning and continue. The `repairSnapshot` path will handle it later.
5. **Test:** Sync from 56M on Apothem testnet. Verify no `repairSnapshot` warnings for the first post-checkpoint epoch switch.

### Phase 2: P1 Verification — Audit `yourturn` and `calcMasternodes` (0.5 day)
1. Confirm `yourturn` always delegates to `yourturnAligned`
2. Confirm `calcMasternodes` only special-cases `SwitchBlock+1`
3. Run unit tests for both functions

### Phase 3: P2 Fixes — RPC Compatibility (1 day)
1. Add penalties/standbynodes to `getEpochSwitchInfo`
2. Fix `GetCurrentEpochSwitchBlock` epoch number calculation
3. Add validator length check and coinbase/validator mismatch check if not already present (they ARE present in verifyHeader.go, so this may be already done)

### Phase 4: Integration Testing (1-2 days)
1. Full sync from 56M on Apothem
2. Verify no `ErrUnknownAncestor` after checkpoint
3. Verify epoch switch blocks validate correctly
4. Verify state root cache works (XDC-specific)

---

## 5. Risks and Dependencies

| Risk | Impact | Mitigation |
|------|--------|------------|
| Checkpoint anchor state not available at insertion | Pre-seeding fails, falls back to repairSnapshot | Log warning, don't fail insertion |
| `repairSnapshot` state read fails during sync | Sync stops with error | Ensure state is downloaded before epoch switch blocks are validated |
| Snapshot Version mismatch with old DBs | Corrupt snapshots loaded | Current code already rejects Version < 3 and exact count 13 |
| `yourturnAligned` vs old `yourturn` inconsistency | Wrong leader selected | Audit all call sites to confirm delegation |
| State root divergence (uint256 vs big.Int) | Different state roots than v2.6.8 | Already handled by `XdcCacheStateRoot` — cannot be "fixed" without reverting to geth 1.8 |

**Dependencies:**
- Docker/deployment issues must be resolved before testing (current blocker per task description)
- Need access to Apothem testnet with 56M checkpoint for real-world validation

---

## 6. Summary of Findings

| Original Bug | Status in v114 | Action for v115 |
|-------------|----------------|-----------------|
| V2 validation too permissive | **FIXED** — `verifyHeader.go` has full QC/round/leader/validators/penalties checks | None — verify no regression |
| Missing gap snapshots | **PARTIALLY FIXED** — `UpdateMasternodesFromHeader` works for normal import; checkpoint anchor path lacks pre-seeding | **P0: Pre-seed gap snapshot in `InsertHeadersBeforeCutoff`** |
| No `getFromCheckpoint` equivalent | **FIXED** — `repairSnapshot` queries contract state as fallback | None — verify repair path coverage |

**Bottom line:** The 3 critical bugs from the report are **largely resolved** in v113/v114. The remaining work is a **hardening fix** for the checkpoint-anchor insertion path to pre-seed gap snapshots, plus minor RPC compatibility fixes. No fundamental consensus divergence remains unfixed.

---

*Plan generated by Hermes Agent after live code audit of 12+ source files and cross-reference with RESEARCH_V2_6_8_COMPARISON.md.*
