GP5 CRITICAL FIX PLAN — Prioritized PR Strategy
Branch: xdc-network | Date: 2026-04-25
Author: AI Assistant | Reviewer: Anil Chinchawale

================================================================================
EXECUTIVE SUMMARY
================================================================================

13 critical issues are open. They fall into 3 tiers:

TIER 1 (HIGHEST — Block-byte parity): C10 (Author/sigHash), C6 (Finalize reward gating), C7 (HookPenalty boundary), C8 (HookPenalty candidates)
  → These 4 issues collectively cause EVERY V2 block to diverge from the v2.6.8 state root.
  → PR 1: Fix all four in a single PR because they touch the same code paths.

TIER 2 (HIGH — Sync/bootstrap): C1.1 (initial() walk-back), 401 (v82 snapshot bootstrap), 395 (wrong masternode count), 392 (snapshot repair 13 candidates), 391 (nil panic)
  → These 5 issues prevent nodes from syncing past the V2 switch block or cause sync stall.
  → PR 2: Bootstrap/repair fixes + defensive nil checks.

TIER 3 (MEDIUM — Structural): 399 (V1 HookValidator M2), 389 (Gap Analysis), 385 (Consensus Divergences), 379 (SetEngineV2 wiring)
  → These 4 issues are architectural gaps that don't cause immediate crashes but create consensus risk.
  → PR 3: Architecture alignment (UpdateM1, verifyHeader, SetEngineV2, M2 randomization).

================================================================================
TIER 1 PR: "fix/v2-consensus-byte-parity-C6-C7-C8-C10"
================================================================================

GOAL: Achieve byte-identical block parity with v2.6.8 for all V2 blocks.

ISSUES ADDRESSED:
  #410 C7 — V2 HookPenalty uses block-number boundary instead of round-based IsEpochSwitch
  #409 C6 — Wrapper Finalize uses block-number reward gating instead of round-based IsEpochSwitch
  #407 C10 — Wrapper-level Author uses V2-incompatible sigHash
  #402 C8 — V2 HookPenalty uses pre-penalty candidates instead of post-penalty active masternodes

FILES TOUCHED:
  consensus/XDPoS/xdpos.go              — Author() dispatch, Finalize() dispatch
  consensus/XDPoS/engines/engine_v2/engine.go  — IsEpochSwitch integration
  eth/hooks/engine_v2_hooks.go          — HookPenalty boundary + candidates fix
  core/state_processor.go             — Fee routing (Author impact)
  core/evm.go                         — Fee routing (Author impact)
  core/state_transition.go              — Fee routing (Author impact)

CHANGES NEEDED:

1. C10 — Author() dispatch (xdpos.go:382-384)
   BEFORE:
     func (c *XDPoS) Author(header *types.Header) (common.Address, error) {
         return ecrecover(header, c.signatures)
     }
   AFTER:
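     func (c *XDPoS) Author(header *types.Header) (common.Address, error) {
         // V2 headers must be recovered by the V2 engine, which uses the V2 sigHash.
         if c.IsV2Block(header) {
             return c.EngineV2.Author(header)
         }
         // V1 headers keep the existing wrapper-level recovery.
         return ecrecover(header, c.signatures)
     }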
   IMPACT: V2 blocks use the correct 17-field sigHash, so fee routing matches v2.6.8.
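
   SKETCH (illustrative only, not GP5 code): why the hash used for signer
   recovery must be chosen per block version. The reduced header type, the
   helper names, and the choice of Validators/Penalties as "V2-only" fields
   are assumptions; SHA-256 over length-prefixed fields stands in for the
   real Keccak-256-over-RLP encoding.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// header is a hypothetical, reduced header. Validators and Penalties are
// placeholders for "fields that only the V2 signing hash covers".
type header struct {
	Number     uint64
	Extra      []byte
	Validators []byte
	Penalties  []byte
}

// digest hashes length-prefixed fields. SHA-256 is a stand-in for the real
// encoding and hash, which are outside the standard library.
func digest(fields ...[]byte) [32]byte {
	h := sha256.New()
	var n [8]byte
	for _, f := range fields {
		binary.BigEndian.PutUint64(n[:], uint64(len(f)))
		h.Write(n[:])
		h.Write(f)
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func numberBytes(n uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, n)
	return b
}

// sigHashV1 covers only the fields the wrapper-level hash is assumed to see.
func sigHashV1(h header) [32]byte {
	return digest(numberBytes(h.Number), h.Extra)
}

// sigHashV2 additionally covers the assumed V2-only fields.
func sigHashV2(h header) [32]byte {
	return digest(numberBytes(h.Number), h.Extra, h.Validators, h.Penalties)
}

// sigHashFor dispatches on block version, mirroring the Author() change.
func sigHashFor(h header, isV2 bool) [32]byte {
	if isV2 {
		return sigHashV2(h)
	}
	return sigHashV1(h)
}

func main() {
	h := header{Number: 42, Extra: []byte("vanity"), Validators: []byte{1, 2, 3}}
	fmt.Printf("v1 digest: %x\n", sigHashFor(h, false))
	fmt.Printf("v2 digest: %x\n", sigHashFor(h, true))
}
```

   The two digests differ for the same header, so recovering a signer from the
   wrong one yields a different address, which is how the fee recipient (and
   therefore the state root) can drift.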

2. C6 — Finalize() dispatch (xdpos.go:1018)
   BEFORE:
     if number%rCheckpoint == 0 { HookReward(...) }
   AFTER:
     if c.IsV2Block(header) {
         // Let V2 engine handle reward gating via IsEpochSwitch
         return c.EngineV2.Finalize(chain, header, state, txs, uncles, receipts)
     }
     if number%rCheckpoint == 0 { HookReward(...) }  // V1 only
   IMPACT: V2 reward fires correctly under leader timeout shifts.
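
   SKETCH (illustrative only): how block-number gating and round-based gating
   drift apart once a leader timeout skips rounds. The block struct, the epoch
   length of 10, and the simplified rule "parent round is before the first
   round of this block's epoch" are assumptions made for the example, not the
   v2.6.8 implementation of IsEpochSwitch.

```go
package main

import "fmt"

// block is a hypothetical header reduced to the fields both rules inspect.
type block struct {
	Number      uint64
	Round       uint64
	ParentRound uint64
}

// isCheckpointByNumber is the V1-style rule: fire on fixed block numbers.
func isCheckpointByNumber(b block, epoch uint64) bool {
	return b.Number%epoch == 0
}

// isEpochSwitchByRound is an assumed simplification of the round-based rule:
// the block is the first one whose round falls in a new epoch of rounds.
func isEpochSwitchByRound(b block, epoch uint64) bool {
	epochStartRound := b.Round - b.Round%epoch
	return b.ParentRound < epochStartRound
}

func main() {
	const epoch = 10
	// Rounds normally advance by one per block. At block 8 two rounds time
	// out, so block 8 is produced in round 10 instead of round 8.
	var chain []block
	round := uint64(0)
	for n := uint64(1); n <= 12; n++ {
		parent := round
		round++
		if n == 8 {
			round += 2 // simulated leader timeouts
		}
		chain = append(chain, block{Number: n, Round: round, ParentRound: parent})
	}
	for _, b := range chain {
		fmt.Printf("block=%d round=%d byNumber=%v byRound=%v\n",
			b.Number, b.Round,
			isCheckpointByNumber(b, epoch), isEpochSwitchByRound(b, epoch))
	}
}
```

   In this run the round-based rule fires at block 8 while the block-number
   rule fires at block 10, which is the kind of shift C6 describes.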

3. C7 — HookPenalty boundary (eth/hooks/engine_v2_hooks.go:83)
   BEFORE:
     for curNum := parentNumber; curNum > stopBlock; curNum-- {
         if curNum%chainConfig.XDPoS.Epoch == 0 { break }
         ...
     }
   AFTER:
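     for curNum := parentNumber; curNum > stopBlock; curNum-- {
         curHeader := chain.GetHeaderByNumber(curNum)
         if curHeader == nil {
             // Missing header: fail loudly instead of dereferencing nil.
             return nil, fmt.Errorf("HookPenalty: missing header %d", curNum)
         }
         // Stop at the round-based epoch boundary, not at a block-number multiple.
         if adaptor.EngineV2.IsEpochSwitch(curHeader) { break }
         ...
     }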
   IMPACT: Penalty window correctly tracks round-based epoch boundaries.
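
   SKETCH (illustrative only): the walk-back loop with the boundary test passed
   in as a predicate, so the two boundary rules can be compared on the same
   data. headerSource, penaltyWindow, and the round rule used in main are
   hypothetical stand-ins for the chain reader and IsEpochSwitch.

```go
package main

import (
	"errors"
	"fmt"
)

// header is a hypothetical header reduced to what the boundary rules need.
type header struct {
	Number      uint64
	Round       uint64
	ParentRound uint64
}

// headerSource is a stand-in for the chain reader; absent keys model
// headers that are missing from the local database.
type headerSource map[uint64]*header

var errMissingHeader = errors.New("missing header in penalty window")

// penaltyWindow walks back from parentNumber down to (but excluding)
// stopBlock and returns the block numbers visited before the first header
// for which isSwitch reports an epoch boundary.
func penaltyWindow(src headerSource, parentNumber, stopBlock uint64,
	isSwitch func(*header) bool) ([]uint64, error) {
	var window []uint64
	for n := parentNumber; n > stopBlock; n-- {
		h := src[n]
		if h == nil {
			return nil, fmt.Errorf("%w: block %d", errMissingHeader, n)
		}
		if isSwitch(h) {
			break
		}
		window = append(window, n)
	}
	return window, nil
}

func main() {
	const epoch = 10
	// Blocks 1..9; timeouts push block 8 from round 8 to round 10.
	src := headerSource{}
	round := uint64(0)
	for n := uint64(1); n <= 9; n++ {
		parent := round
		round++
		if n == 8 {
			round += 2
		}
		src[n] = &header{Number: n, Round: round, ParentRound: parent}
	}
	byNumber := func(h *header) bool { return h.Number%epoch == 0 }
	byRound := func(h *header) bool { return h.ParentRound < h.Round-h.Round%epoch }

	w1, _ := penaltyWindow(src, 9, 0, byNumber)
	w2, _ := penaltyWindow(src, 9, 0, byRound)
	fmt.Println("window by block number:", w1)
	fmt.Println("window by round:       ", w2)
}
```

   With the block-number rule the window runs all the way back to block 1;
   with the round rule it stops at block 8, so the two rules would penalize
   over different sets of blocks.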

4. C8 — HookPenalty candidates (eth/hooks/engine_v2_hooks.go:104)
   BEFORE:
     preMasternodes := candidates  // WRONG: pre-penalty list
   AFTER:
     preMasternodes := adaptor.EngineV2.GetMasternodesByHash(chain, currentHash)  // post-penalty active set
   IMPACT: Penalty rule checks post-penalty active masternodes, matching v2.6.8.

VALIDATION:
  - Run GP5 and v2.6.8 side-by-side on same Apothem datadir.
  - Compare header.Root, header.Extra, header.Validators, header.Penalties for every V2 block.
  - Must be byte-identical for 1000+ consecutive blocks.
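
SKETCH (illustrative only): one way to automate the comparison above. It
assumes the four fields have already been exported from each node as raw bytes;
headerFields and firstDivergence are hypothetical names, and fetching the
headers from GP5 and v2.6.8 is left out.

```go
package main

import (
	"bytes"
	"fmt"
)

// headerFields is a hypothetical record holding the compared fields as bytes.
type headerFields struct {
	Number     uint64
	Root       []byte
	Extra      []byte
	Validators []byte
	Penalties  []byte
}

// firstDivergence compares two header streams position by position and
// reports the block number and field name of the first mismatch.
// Only the overlapping prefix of the two streams is compared.
func firstDivergence(ours, ref []headerFields) (uint64, string, bool) {
	n := len(ours)
	if len(ref) < n {
		n = len(ref)
	}
	for i := 0; i < n; i++ {
		o, r := ours[i], ref[i]
		switch {
		case o.Number != r.Number:
			return o.Number, "Number", true
		case !bytes.Equal(o.Root, r.Root):
			return o.Number, "Root", true
		case !bytes.Equal(o.Extra, r.Extra):
			return o.Number, "Extra", true
		case !bytes.Equal(o.Validators, r.Validators):
			return o.Number, "Validators", true
		case !bytes.Equal(o.Penalties, r.Penalties):
			return o.Number, "Penalties", true
		}
	}
	return 0, "", false
}

func main() {
	ref := []headerFields{
		{Number: 1, Root: []byte{0xaa}},
		{Number: 2, Root: []byte{0xbb}, Penalties: []byte{0x01}},
	}
	ours := []headerFields{
		{Number: 1, Root: []byte{0xaa}},
		{Number: 2, Root: []byte{0xbb}, Penalties: []byte{0x02}},
	}
	if num, field, diverged := firstDivergence(ours, ref); diverged {
		fmt.Printf("first divergence at block %d in header.%s\n", num, field)
	} else {
		fmt.Println("streams are byte-identical")
	}
}
```

Reporting the first diverging field, rather than only the state root, helps
attribute a mismatch to a specific issue (for example Penalties to C7/C8).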

================================================================================
TIER 2 PR: "fix/v2-bootstrap-snapshot-repair-C1.1-391-392-395-401"
================================================================================

GOAL: Fix all sync/bootstrap/repair paths so nodes can sync from cold snapshots.

ISSUES ADDRESSED:
  #408 C1.1 — initial() walk-back regression (cold-restore at switchBlock)
  #391 — Nil pointer panic at V2 switch block
  #392 — Snapshot repair produces 13 candidates at V2 checkpoint
  #395 — Snapshot bootstrap produces wrong masternode count (2 vs 10)
  #401 — v82 snapshot bootstrap fails (preIndex=-1 curIndex=-1)

FILES TOUCHED:
  consensus/XDPoS/engines/engine_v2/engine.go  — initial(), repairSnapshot(), getSnapshot()
  consensus/XDPoS/engines/engine_v2/verifyHeader.go  — calcMasternodes fallback
  consensus/XDPoS/snapshot.go           — loadSnapshot validation
  consensus/XDPoS/xdpos.go              — snapshot() bootstrap

CHANGES NEEDED:

1. C1.1 — initial() walk-back guard (engine.go:366)
   BEFORE:
     for header.Number.Uint64() > switchBlock.Uint64() { ... }
   AFTER:
     for header.Number.Uint64() >= switchBlock.Uint64() { ... }
   IMPACT: With the strict > comparison, a cold restore whose head is exactly at switchBlock skipped the loop; with >= it now walks back past the switch block.
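
   SKETCH (illustrative only): the off-by-one in isolation. It assumes the
   elided loop body steps to the parent header on each iteration; walkBack is
   a hypothetical helper, and the block numbers are small placeholders rather
   than real Apothem heights.

```go
package main

import "fmt"

// walkBack returns the block number at which the walk-back loop stops.
// inclusive=false models the BEFORE condition (>), inclusive=true the
// AFTER condition (>=). The n > 0 guard avoids unsigned underflow.
func walkBack(head, switchBlock uint64, inclusive bool) uint64 {
	n := head
	for n > 0 {
		if inclusive && n < switchBlock {
			break
		}
		if !inclusive && n <= switchBlock {
			break
		}
		n-- // assumed loop body: step to the parent header
	}
	return n
}

func main() {
	const switchBlock = 100
	for _, head := range []uint64{100, 105} {
		fmt.Printf("head=%d  before(>) stops at %d  after(>=) stops at %d\n",
			head, walkBack(head, switchBlock, false), walkBack(head, switchBlock, true))
	}
}
```

   With the strict comparison a head sitting on the switch block never moves;
   with the inclusive comparison the walk ends one block below it.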

2. #391 — Defensive nil checks
   - Add nil check to getExtraFieldsNoChain (engine.go:~1060)
   - Add nil check for lastGapHeader in initial() (engine.go:~348)
   IMPACT: Prevents panic even if chain data is partially missing.

3. #392/#395/#401 — Snapshot bootstrap source selection
   BEFORE:
     // Always prefers header.Validators
     masternodes = x.GetMasternodesFromEpochSwitchHeader(chain, checkpointHeader)
   AFTER:
     // V1 checkpoints (< switchBlock): use header.Extra
     // V2 checkpoints (>= switchBlock): use header.Validators
     if checkpointNumber < switchBlock.Uint64() {  // checkpointNumber is a uint64
         masternodes = extractFromExtra(checkpointHeader.Extra)
     } else {
         masternodes = extractFromValidators(checkpointHeader.Validators)
     }
   IMPACT: V1-era snapshots get correct 13/18 masternodes from Extra; V2 gets correct count from Validators.
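
   SKETCH (illustrative only): the source-selection rule with both extractors
   filled in. The byte layouts are assumptions: Extra is taken to be a 32-byte
   vanity prefix, then 20-byte addresses, then a 65-byte seal, and Validators
   is taken to be a plain concatenation of 20-byte addresses. Verify both
   against v2.6.8 before relying on them.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	addressLength = 20
	extraVanity   = 32 // assumed vanity prefix length in header.Extra
	extraSeal     = 65 // assumed signature suffix length in header.Extra
)

type address [addressLength]byte

// splitAddresses cuts a byte slice into consecutive 20-byte addresses.
func splitAddresses(b []byte) ([]address, error) {
	if len(b)%addressLength != 0 {
		return nil, errors.New("payload length is not a multiple of 20 bytes")
	}
	out := make([]address, 0, len(b)/addressLength)
	for i := 0; i < len(b); i += addressLength {
		var a address
		copy(a[:], b[i:i+addressLength])
		out = append(out, a)
	}
	return out, nil
}

// extractFromExtra reads the address list between vanity and seal.
func extractFromExtra(extra []byte) ([]address, error) {
	if len(extra) < extraVanity+extraSeal {
		return nil, errors.New("extra is too short to hold vanity and seal")
	}
	return splitAddresses(extra[extraVanity : len(extra)-extraSeal])
}

// extractFromValidators reads the address list from the Validators field.
func extractFromValidators(validators []byte) ([]address, error) {
	return splitAddresses(validators)
}

// masternodesAtCheckpoint applies the rule from the plan: checkpoints below
// switchBlock read Extra, checkpoints at or above it read Validators.
func masternodesAtCheckpoint(number, switchBlock uint64, extra, validators []byte) ([]address, error) {
	if number < switchBlock {
		return extractFromExtra(extra)
	}
	return extractFromValidators(validators)
}

func main() {
	extra := make([]byte, extraVanity+3*addressLength+extraSeal) // 3 addresses
	validators := make([]byte, 2*addressLength)                  // 2 addresses
	v1, _ := masternodesAtCheckpoint(5, 10, extra, validators)
	v2, _ := masternodesAtCheckpoint(10, 10, extra, validators)
	fmt.Println("V1 checkpoint masternodes:", len(v1))
	fmt.Println("V2 checkpoint masternodes:", len(v2))
}
```

   Returning an error for malformed payloads, instead of a short list, keeps a
   truncated field from silently producing a wrong masternode count.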

4. #401 — Add snapshot validation after bootstrap
   AFTER bootstrap:
     if len(masternodes) < expectedMinForNetwork(chainID) {
         log.Error("Bootstrap produced suspicious masternode count", ...)
         // Reject the bad snapshot; the caller is expected to fall back to a full replay from genesis
         return nil, fmt.Errorf("bootstrap validation failed: got %d masternodes", len(masternodes))
     }
   IMPACT: Prevents silent propagation of bad snapshots.
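
   SKETCH (illustrative only): the validation step as a standalone function
   with a sentinel error the caller can test for. validateBootstrap and
   errBootstrapRejected are hypothetical names, and the threshold of 10 simply
   echoes the "2 vs 10" counts from #395; the real per-network minimum is not
   specified in this plan.

```go
package main

import (
	"errors"
	"fmt"
)

// errBootstrapRejected lets callers distinguish a rejected snapshot from
// other failures and choose a fallback path.
var errBootstrapRejected = errors.New("bootstrap validation failed")

// validateBootstrap rejects a masternode list shorter than minExpected.
func validateBootstrap(masternodes []string, minExpected int) error {
	if len(masternodes) < minExpected {
		return fmt.Errorf("%w: got %d masternodes, want at least %d",
			errBootstrapRejected, len(masternodes), minExpected)
	}
	return nil
}

func main() {
	const minExpected = 10 // illustrative threshold only
	suspicious := []string{"xdcA", "xdcB"}

	err := validateBootstrap(suspicious, minExpected)
	if errors.Is(err, errBootstrapRejected) {
		fmt.Println("snapshot rejected:", err)
	}
}
```

   Wrapping a sentinel error keeps the count in the log message while still
   letting the caller branch on the rejection itself.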

VALIDATION:
  - Test with v62 cold snapshot on Apothem (block 55,343,169).
  - Test with v82 snapshot on Apothem.
  - Must sync past switch block (56,828,700) without stall or panic.

================================================================================
TIER 3 PR: "feat/v2-architecture-alignment-379-385-389-399"
================================================================================

GOAL: Close structural gaps with v2.6.8 for long-term maintainability.

ISSUES ADDRESSED:
  #379 — SetEngineV2() not called before first V2 block
  #385 — Consensus divergences (UpdateM1, verifyHeader, sigHash, Version)
  #389 — Gap Analysis (bootnodes, SkipV1Validation, protocol version)
  #399 — V1 HookValidator missing M2 randomization via IPC

FILES TOUCHED:
  consensus/XDPoS/xdpos.go              — New() wiring
  eth/backend.go                        — Initialization order
  core/blockchain.go                    — UpdateM1 integration
  consensus/XDPoS/engines/engine_v2/verifyHeader.go  — Port/enhance
  consensus/XDPoS/engines/engine_v2/snapshot.go  — Version gate
  eth/hooks/hooks.go                    — M2 randomization
  params/config.go                      — SkipV1Validation
  params/bootnodes.go                   — Update lists

CHANGES NEEDED:

1. #379 — Wire SetEngineV2 in New()
   BEFORE:
     func New(...) { create V1 engine; return }
     // SetEngineV2 called later by backend
   AFTER:
     func New(...) { create V1 + V2 engines; wire channels; return }
   IMPACT: The V2 engine is wired before the first V2 block is processed, removing the consensus-bypass risk at the switch block.
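
   SKETCH (illustrative only): the difference between late and eager wiring,
   using toy engine types. All names here (engine, newLateWired, newFullyWired,
   engineFor) are hypothetical, and the V1/V2 cut-over at number < switchBlock
   follows the convention used in the Tier 2 source-selection rule.

```go
package main

import (
	"errors"
	"fmt"
)

type engine interface{ Version() string }

type v1Engine struct{}
type v2Engine struct{}

func (v1Engine) Version() string { return "v1" }
func (v2Engine) Version() string { return "v2" }

// xdpos is a toy wrapper; v2 stays nil until something wires it.
type xdpos struct {
	switchBlock uint64
	v1          engine
	v2          engine
}

var errV2NotWired = errors.New("V2 engine not wired")

// newLateWired mirrors BEFORE: only V1 exists after construction.
func newLateWired(switchBlock uint64) *xdpos {
	return &xdpos{switchBlock: switchBlock, v1: v1Engine{}}
}

// newFullyWired mirrors AFTER: both engines exist after construction.
func newFullyWired(switchBlock uint64) *xdpos {
	return &xdpos{switchBlock: switchBlock, v1: v1Engine{}, v2: v2Engine{}}
}

// engineFor picks the engine for a block and refuses to fall through to V1
// when the V2 engine is missing.
func (c *xdpos) engineFor(number uint64) (engine, error) {
	if number < c.switchBlock {
		return c.v1, nil
	}
	if c.v2 == nil {
		return nil, errV2NotWired
	}
	return c.v2, nil
}

func main() {
	const switchBlock = 100
	if _, err := newLateWired(switchBlock).engineFor(switchBlock); err != nil {
		fmt.Println("late wiring:", err)
	}
	e, _ := newFullyWired(switchBlock).engineFor(switchBlock)
	fmt.Println("eager wiring:", e.Version())
}
```

   Failing with an explicit error when V2 is absent is a defensive choice for
   the sketch; the point is that eager wiring makes that branch unreachable.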

2. #385 — Port UpdateM1()
   - Add contract-based candidate reading at gap blocks.
   - Fallback to snapshot if contract call fails.
   IMPACT: Correct candidate list even if snapshot is stale.

3. #385 — Port verifyHeader.go
   - Add uncle hash, difficulty, timestamp, QC validation.
   - Remove inline weak validation from wrapper.
   IMPACT: Rejects invalid blocks before they enter chain.

4. #385 — Align sigHash RLP encoding
   - Ensure 17-field encoding matches v2.6.8 exactly.
   IMPACT: Header.Hash() matches v2.6.8.

5. #389 — Update bootnode lists
   - Sync Apothem/mainnet bootnodes with v2.6.8.
   IMPACT: Peer discovery uses the same bootnode set as v2.6.8.

6. #399 — M2 randomization (optional, mining-only)
   - Wire IPC client for V1 checkpoint mining.
   - Add mining guard if IPC unavailable.
   IMPACT: GP5 can mine V1 checkpoints (not needed for sync).

VALIDATION:
  - Full sync test on Apothem from genesis to head.
  - Bit-for-bit replay against v2.6.8 archive node (issue #406).

================================================================================
PR DEPENDENCIES & MERGE ORDER
================================================================================

ORDER: Tier 1 → Tier 2 → Tier 3

REASONING:
  - Tier 1 fixes the fundamental consensus divergence that makes EVERY V2 block wrong.
    Without Tier 1, Tier 2 validation (sync tests) is meaningless because the chain is forked.
  - Tier 2 fixes the bootstrap paths that allow nodes to REACH the V2 blocks.
    Without Tier 2, Tier 1 cannot be tested from a cold snapshot, so Tier 1 validation
    relies on an existing Apothem datadir (see Tier 1 VALIDATION).
  - Tier 3 is structural cleanup that doesn't affect immediate sync but improves safety.

EXCEPTION:
  - C1.1 (initial() >= fix) is a ONE-CHARACTER change that unblocks ALL testing.
  - It can be merged as a hotfix PR immediately, before Tier 1.

================================================================================
ESTIMATED EFFORT
================================================================================

TIER 1 (C6/C7/C8/C10):  4-6 hours implementation + 8 hours validation
TIER 2 (Bootstrap fixes):  3-4 hours implementation + 6 hours validation
TIER 3 (Architecture):    6-8 hours implementation + 12 hours validation

CRITICAL PATH:
  - C1.1 hotfix: 15 minutes (merge first)
  - Tier 1: 1-2 days
  - Tier 2: 1 day
  - Tier 3: 2-3 days

================================================================================
IMMEDIATE ACTION ITEMS
================================================================================

1. [YOU] Review this plan and confirm merge order.
2. [ME] Create C1.1 hotfix PR (single-character change).
3. [ME] Create Tier 1 PR (C6/C7/C8/C10) with detailed diff.
4. [YOU] Review Tier 1 PR — focus on Author() dispatch and Finalize() gating.
5. [ME] Address review feedback, merge Tier 1.
6. [ME] Create Tier 2 PR (bootstrap fixes).
7. [YOU] Test Tier 2 with v62/v82 snapshots on local server.
8. [ME] Address feedback, merge Tier 2.
9. [ME] Create Tier 3 PR (architecture alignment).
10. [YOU] Review Tier 3 — focus on UpdateM1 and verifyHeader port.

================================================================================
RISK ASSESSMENT
================================================================================

HIGH RISK:
  - Tier 1 changes touch fee routing (Author). Any bug affects state root.
    MITIGATION: Extensive side-by-side testing with v2.6.8.

MEDIUM RISK:
  - Tier 2 bootstrap changes may break existing warm snapshots.
    MITIGATION: Add backward compatibility — accept old snapshots, only validate new ones.

LOW RISK:
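  - Tier 3 changes are mostly additive (new validation, new APIs), but the verifyHeader port
    removes the wrapper's inline validation and the New() rewiring changes initialization order.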
    MITIGATION: Feature-gate behind config flags.

================================================================================
END OF PLAN
================================================================================