GP5 CRITICAL FIX PLAN: Prioritized PR Strategy
Branch: xdc-network | Date: 2026-04-25
Author: AI Assistant | Reviewer: Anil Chinchawale

================================================================================
EXECUTIVE SUMMARY
================================================================================

13 critical issues are open. They fall into 3 tiers:

TIER 1 (HIGHEST, block-byte parity):
  C10 (Author/sigHash), C6 (Finalize reward gating), C7 (HookPenalty boundary),
  C8 (HookPenalty candidates)
  → Together, these 4 issues cause EVERY V2 block to diverge from the v2.6.8
    state root.
  → PR 1: they must be fixed together in a single PR because they touch the
    same code paths.

TIER 2 (HIGH, sync/bootstrap):
  C1.1 (initial() walk-back), #401 (v82 snapshot bootstrap), #395 (wrong
  masternode count), #392 (snapshot repair yields 13 candidates), #391 (nil panic)
  → These 5 issues prevent nodes from syncing past the V2 switch block or
    cause sync to stall.
  → PR 2: bootstrap/repair fixes + defensive nil checks.

TIER 3 (MEDIUM, structural):
  #399 (V1 HookValidator M2), #389 (Gap Analysis), #385 (Consensus
  Divergences), #379 (SetEngineV2 wiring)
  → These 4 issues are architectural gaps that do not cause immediate crashes
    but create consensus risk.
  → PR 3: architecture alignment (UpdateM1, verifyHeader, SetEngineV2, M2
    randomization).

================================================================================
TIER 1 PR: "fix/v2-consensus-byte-parity-C6-C7-C8-C10"
================================================================================

GOAL: Achieve byte-identical block parity with v2.6.8 for all V2 blocks.
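The parity goal is mechanical to check once both nodes have produced the same range of blocks. The Go sketch below compares the four header fields listed in this PR's VALIDATION step and reports the first block where they diverge. `ParityHeader`, `DiffHeaders` and `FirstDivergence` are hypothetical names for illustration; the struct is a trimmed stand-in, not GP5's or v2.6.8's `types.Header`, and fetching headers from the two nodes is left out.

```go
package main

import (
	"bytes"
	"fmt"
)

// ParityHeader is a hypothetical, trimmed-down header holding only the fields
// the Tier 1 validation compares. It is not GP5's or v2.6.8's types.Header.
type ParityHeader struct {
	Number     uint64
	Root       [32]byte // state root
	Extra      []byte   // raw Extra field
	Validators []byte   // raw Validators field
	Penalties  []byte   // raw Penalties field
}

// DiffHeaders returns the names of the fields that differ between a GP5 header
// and the v2.6.8 reference header. bytes.Equal treats nil and empty slices as
// equal, so an absent field and an empty one are not reported as a mismatch.
func DiffHeaders(gp5, ref ParityHeader) []string {
	var diffs []string
	if gp5.Number != ref.Number {
		diffs = append(diffs, "Number")
	}
	if gp5.Root != ref.Root {
		diffs = append(diffs, "Root")
	}
	if !bytes.Equal(gp5.Extra, ref.Extra) {
		diffs = append(diffs, "Extra")
	}
	if !bytes.Equal(gp5.Validators, ref.Validators) {
		diffs = append(diffs, "Validators")
	}
	if !bytes.Equal(gp5.Penalties, ref.Penalties) {
		diffs = append(diffs, "Penalties")
	}
	return diffs
}

// FirstDivergence compares the two sequences block by block over their common
// length and returns the index of the first mismatch (or -1 if none) together
// with the differing field names.
func FirstDivergence(gp5, ref []ParityHeader) (int, []string) {
	n := len(gp5)
	if len(ref) < n {
		n = len(ref)
	}
	for i := 0; i < n; i++ {
		if diffs := DiffHeaders(gp5[i], ref[i]); len(diffs) > 0 {
			return i, diffs
		}
	}
	return -1, nil
}

func main() {
	h1 := ParityHeader{Number: 100, Extra: []byte{0x02}}
	h2 := ParityHeader{Number: 101, Extra: []byte{0x02}}
	h2ref := h2
	h2ref.Penalties = []byte{0xaa} // reference node penalized someone; GP5 did not
	idx, fields := FirstDivergence([]ParityHeader{h1, h2}, []ParityHeader{h1, h2ref})
	fmt.Println(idx, fields) // 1 [Penalties]
}
```

Reporting field names instead of a single bool helps separate failure modes: a fee-routing (Author) bug would typically surface as a `Root` mismatch, while a HookPenalty bug would show up in `Penalties`. The sketch only checks the common prefix, so a length mismatch between the two ranges has to be checked separately.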
ISSUES ADDRESSED:
  #410  C7:  V2 HookPenalty uses a block-number boundary instead of round-based IsEpochSwitch
  #409  C6:  Wrapper Finalize uses block-number reward gating instead of round-based IsEpochSwitch
  #407  C10: Wrapper-level Author uses a V2-incompatible sigHash
  #402  C8:  V2 HookPenalty uses pre-penalty candidates instead of post-penalty active masternodes

FILES TOUCHED:
  consensus/XDPoS/xdpos.go                      Author() dispatch, Finalize() dispatch
  consensus/XDPoS/engines/engine_v2/engine.go   IsEpochSwitch integration
  eth/hooks/engine_v2_hooks.go                  HookPenalty boundary + candidates fix
  core/state_processor.go                       Fee routing (Author impact)
  core/evm.go                                   Fee routing (Author impact)
  core/state_transition.go                      Fee routing (Author impact)

CHANGES NEEDED:

1. C10: Author() dispatch (xdpos.go:382-384)
   BEFORE:
     func (c *XDPoS) Author(header *types.Header) (common.Address, error) {
         return ecrecover(header, c.signatures)
     }
   AFTER:
     func (c *XDPoS) Author(header *types.Header) (common.Address, error) {
         if c.IsV2Block(header) {
             return c.EngineV2.Author(header)
         }
         return ecrecover(header, c.signatures)
     }
   IMPACT: V2 blocks recover the signer with the correct 17-field sigHash, so
   fee routing matches v2.6.8.

2. C6: Finalize() dispatch (xdpos.go:1018)
   BEFORE:
     if number%rCheckpoint == 0 {
         HookReward(...)
     }
   AFTER:
     if c.IsV2Block(header) {
         // Let the V2 engine handle reward gating via IsEpochSwitch.
         return c.EngineV2.Finalize(chain, header, state, txs, uncles, receipts)
     }
     if number%rCheckpoint == 0 { // V1 only
         HookReward(...)
     }
   IMPACT: V2 rewards fire at the round-based epoch switch, which drifts away
   from block-number multiples whenever leader timeouts skip rounds.

3. C7: HookPenalty boundary (eth/hooks/engine_v2_hooks.go:83)
   BEFORE:
     for curNum := parentNumber; curNum > stopBlock; curNum-- {
         if curNum%chainConfig.XDPoS.Epoch == 0 {
             break
         }
         ...
     }
   AFTER:
     for curNum := parentNumber; curNum > stopBlock; curNum-- {
         curHeader := chain.GetHeaderByNumber(curNum)
         if adaptor.EngineV2.IsEpochSwitch(curHeader) {
             break
         }
         ...
     }
   IMPACT: The penalty window tracks round-based epoch boundaries.

4. C8: HookPenalty candidates (eth/hooks/engine_v2_hooks.go:104)
   BEFORE:
     preMasternodes := candidates // WRONG: pre-penalty list
   AFTER:
     preMasternodes := adaptor.EngineV2.GetMasternodesByHash(chain, currentHash)
   IMPACT: The penalty rule checks the post-penalty active masternodes,
   matching v2.6.8.

VALIDATION:
  - Run GP5 and v2.6.8 side by side, starting from copies of the same Apothem
    datadir.
  - Compare header.Root, header.Extra, header.Validators and header.Penalties
    for every V2 block.
  - These fields must be byte-identical for 1000+ consecutive blocks.

================================================================================
TIER 2 PR: "fix/v2-bootstrap-snapshot-repair-C1.1-391-392-395-401"
================================================================================

GOAL: Fix all sync/bootstrap/repair paths so nodes can sync from cold snapshots.

ISSUES ADDRESSED:
  #408  C1.1: initial() walk-back regression (cold restore at switchBlock)
  #391  Nil pointer panic at the V2 switch block
  #392  Snapshot repair produces 13 candidates at a V2 checkpoint
  #395  Snapshot bootstrap produces the wrong masternode count (2 vs 10)
  #401  v82 snapshot bootstrap fails (preIndex=-1 curIndex=-1)

FILES TOUCHED:
  consensus/XDPoS/engines/engine_v2/engine.go         initial(), repairSnapshot(), getSnapshot()
  consensus/XDPoS/engines/engine_v2/verifyHeader.go   calcMasternodes fallback
  consensus/XDPoS/snapshot.go                         loadSnapshot validation
  consensus/XDPoS/xdpos.go                            snapshot() bootstrap

CHANGES NEEDED:

1. C1.1: initial() walk-back guard (engine.go:366)
   BEFORE:
     for header.Number.Uint64() > switchBlock.Uint64() { ... }
   AFTER:
     for header.Number.Uint64() >= switchBlock.Uint64() { ... }
   IMPACT: The loop now also runs when the header is exactly at switchBlock,
   so the walk-back continues to switchBlock-1 (this covers a cold restore
   whose head is the switch block).

2. #391: Defensive nil checks
   - Add a nil check to getExtraFieldsNoChain (engine.go:~1060).
   - Add a nil check for lastGapHeader in initial() (engine.go:~348).
   IMPACT: Prevents a panic when chain data is partially missing.

3. #392/#395/#401: Snapshot bootstrap source selection
   BEFORE:
     // Always prefers header.Validators
     masternodes = x.GetMasternodesFromEpochSwitchHeader(chain, checkpointHeader)
   AFTER:
     // V1 checkpoints (< switchBlock): use header.Extra
     // V2 checkpoints (>= switchBlock): use header.Validators
     if checkpointNumber < switchBlock {
         masternodes = extractFromExtra(checkpointHeader.Extra)
     } else {
         masternodes = extractFromValidators(checkpointHeader.Validators)
     }
   IMPACT: V1-era snapshots get the correct 13/18 masternodes from Extra; V2
   snapshots get the correct count from Validators.

4. #401: Validate the snapshot after bootstrap
   AFTER bootstrap:
     if len(masternodes) < expectedMinForNetwork(chainID) {
         log.Error("Bootstrap produced suspicious masternode count", ...)
         // Reject the bad snapshot; the caller is expected to fall back to a
         // full replay from genesis instead of using it.
         return nil, fmt.Errorf("bootstrap validation failed: got %d masternodes", len(masternodes))
     }
   IMPACT: Prevents silent propagation of bad snapshots.

VALIDATION:
  - Test with the v62 cold snapshot on Apothem (block 55,343,169).
  - Test with the v82 snapshot on Apothem.
  - Both must sync past the switch block (56,828,700) without a stall or panic.

================================================================================
TIER 3 PR: "feat/v2-architecture-alignment-379-385-389-399"
================================================================================

GOAL: Close structural gaps with v2.6.8 for long-term maintainability.
ISSUES ADDRESSED:
  #379  SetEngineV2() not called before the first V2 block
  #385  Consensus divergences (UpdateM1, verifyHeader, sigHash, Version)
  #389  Gap Analysis (bootnodes, SkipV1Validation, protocol version)
  #399  V1 HookValidator missing M2 randomization via IPC

FILES TOUCHED:
  consensus/XDPoS/xdpos.go                            New() wiring
  eth/backend.go                                      Initialization order
  core/blockchain.go                                  UpdateM1 integration
  consensus/XDPoS/engines/engine_v2/verifyHeader.go   Port/enhance
  consensus/XDPoS/engines/engine_v2/snapshot.go       Version gate
  eth/hooks/hooks.go                                  M2 randomization
  params/config.go                                    SkipV1Validation
  params/bootnodes.go                                 Update lists

CHANGES NEEDED:

1. #379: Wire SetEngineV2 in New()
   BEFORE:
     func New(...) {
         // create V1 engine only; SetEngineV2 is called later by the backend
     }
   AFTER:
     func New(...) {
         // create V1 + V2 engines; wire channels
     }
   IMPACT: The V2 engine exists before the first V2 block is processed, so
   there is no risk of bypassing V2 consensus at the switch block.

2. #385: Port UpdateM1()
   - Read the candidate list from the contract at gap blocks.
   - Fall back to the snapshot if the contract call fails.
   IMPACT: Correct candidate list even if the snapshot is stale.

3. #385: Port verifyHeader.go
   - Add uncle hash, difficulty, timestamp and QC validation.
   - Remove the weak inline validation from the wrapper.
   IMPACT: Rejects invalid blocks before they enter the chain.

4. #385: Align sigHash RLP encoding
   - Ensure the 17-field encoding matches v2.6.8 exactly.
   IMPACT: sigHash, and therefore the signer recovered from a V2 header,
   matches v2.6.8.

5. #389: Update bootnode lists
   - Sync the Apothem/mainnet bootnodes with v2.6.8.
   IMPACT: Better peer discovery.

6. #399: M2 randomization (optional, mining-only)
   - Wire the IPC client for V1 checkpoint mining.
   - Add a mining guard for when IPC is unavailable.
   IMPACT: GP5 can mine V1 checkpoints (not needed for sync).

VALIDATION:
  - Full sync test on Apothem from genesis to head.
  - Bit-for-bit replay against a v2.6.8 archive node (issue #406).
================================================================================
PR DEPENDENCIES & MERGE ORDER
================================================================================

ORDER: Tier 1 → Tier 2 → Tier 3

REASONING:
  - Tier 1 fixes the fundamental consensus divergence that makes EVERY V2
    block wrong. Without Tier 1, Tier 2 validation (sync tests) is meaningless
    because the chain is forked.
  - Tier 2 fixes the bootstrap paths that allow nodes to REACH the V2 blocks.
    Without Tier 2, Tier 1 cannot be tested on a cold snapshot.
  - Tier 3 is structural cleanup that does not affect immediate sync but
    improves safety.

EXCEPTION:
  - C1.1 (the initial() ">=" fix) is a ONE-CHARACTER change that unblocks ALL
    testing.
  - It can be merged immediately as a hotfix PR, before Tier 1.

================================================================================
ESTIMATED EFFORT
================================================================================

  TIER 1 (C6/C7/C8/C10):    4-6 hours implementation + 8 hours validation
  TIER 2 (Bootstrap fixes): 3-4 hours implementation + 6 hours validation
  TIER 3 (Architecture):    6-8 hours implementation + 12 hours validation

CRITICAL PATH:
  - C1.1 hotfix: 15 minutes (merge first)
  - Tier 1: 1-2 days
  - Tier 2: 1 day
  - Tier 3: 2-3 days

================================================================================
IMMEDIATE ACTION ITEMS
================================================================================

   1. [YOU] Review this plan and confirm the merge order.
   2. [ME]  Create the C1.1 hotfix PR (single-character change).
   3. [ME]  Create the Tier 1 PR (C6/C7/C8/C10) with a detailed diff.
   4. [YOU] Review the Tier 1 PR, focusing on Author() dispatch and Finalize() gating.
   5. [ME]  Address review feedback and merge Tier 1.
   6. [ME]  Create the Tier 2 PR (bootstrap fixes).
   7. [YOU] Test Tier 2 with the v62/v82 snapshots on the local server.
   8. [ME]  Address feedback and merge Tier 2.
   9. [ME]  Create the Tier 3 PR (architecture alignment).
  10. [YOU] Review Tier 3, focusing on the UpdateM1 and verifyHeader ports.

================================================================================
RISK ASSESSMENT
================================================================================

HIGH RISK:
  - Tier 1 changes touch fee routing (Author). Any bug affects the state root.
    MITIGATION: Extensive side-by-side testing with v2.6.8.

MEDIUM RISK:
  - Tier 2 bootstrap changes may break existing warm snapshots.
    MITIGATION: Keep backward compatibility: accept old snapshots and validate
    only newly bootstrapped ones.

LOW RISK:
  - Tier 3 changes are mostly additive (new validation, new APIs); the one
    removal, the wrapper's inline validation, is replaced by the ported
    verifyHeader.
    MITIGATION: Feature-gate them behind config flags.

================================================================================
END OF PLAN
================================================================================