# Validation Playbook Extensions — Stages 5 and 6

Extends the existing four-stage playbook with mining-side validation. Stages 5 and 6 confirm or refute the Phase 9-12 findings of `part2_phases_9-12.md` (mining-side audit) on actual binaries against actual chain data.

Pre-requisites for both stages:
- Patient binary (called GP5 below): built from `XDC-Geth` `xdc-network` HEAD `ace90b251` (or later).
- Reference binary (called v2.6.8 below): an XDC 2.6.8 binary built from a tagged release of `XDPoSChain`.
- Apothem genesis files for both: `genesis_apothem.json` (4 validators, Period 2s, Epoch 900, Gap 450, V2.SwitchBlock at a known epoch boundary).

Important: Stage 5 and Stage 6 will FAIL on current `xdc-network` HEAD because of M-1 through M-7 plus W-1 through W-6. The playbook is written against the post-fix binary. Until those findings are addressed, the test will surface them in the order: peering → block production → block acceptance → BFT commit.

---

## Stage 5 — Mining shadow test (paired miners)

### Goal

Confirm that GP5 and v2.6.8, configured with the same signer key and pointed at the same parent state, produce **byte-identical** V2 blocks.

If the test passes, Phase 9 (wrapper dispatch) and Phase 11 (worker pipeline) are confirmed correct on the binary at hand. If it fails, the test diagnostic isolates which header field diverges and which bug class (M-1..M-7) is responsible.

### 5.1 Test-network composition

Four-validator private testnet, all running on a single host with isolated devp2p ports:

| Node | Binary | Role | Signer key | Coinbase |
|---|---|---|---|---|
| `n1` | v2.6.8 | proposer for round R | `key_a` (`0x...A1`) | `0x...A1` |
| `n2` | v2.6.8 | voter | `key_b` | `0x...B2` |
| `n3` | GP5 | proposer for round R+1 | `key_c` (`0x...C3`) | `0x...C3` |
| `n4` | GP5 | voter | `key_d` | `0x...D4` |

All four registered as masternodes in the genesis state. Apothem genesis file with:
- `epoch = 900`, `gap = 450`, `period = 2`
- `v2.SwitchBlock = 4500` (5 epochs in for fast V2 entry)
- `v2.SwitchEpoch = 5`

Per-node config: identical `chainID = 51`, identical genesis file, identical V2 config table. Bootnode = n1; n2/n3/n4 dial n1.
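The switch parameters above have to agree with each other: `v2.SwitchBlock` should fall on an epoch boundary, and `v2.SwitchEpoch` should equal `SwitchBlock / epoch`. A minimal sketch of that check in plain shell arithmetic; the function name `check_v2_switch` is illustrative and not part of either binary:

```bash
# check_v2_switch <epoch> <switch_block> <switch_epoch>
# Returns 0 when the switch block sits on an epoch boundary and matches the
# declared switch epoch; prints the reason and returns 1 otherwise.
check_v2_switch() {
  epoch=$1; switch_block=$2; switch_epoch=$3
  if [ $((switch_block % epoch)) -ne 0 ]; then
    echo "SwitchBlock $switch_block is not a multiple of epoch $epoch"
    return 1
  fi
  if [ $((switch_block / epoch)) -ne "$switch_epoch" ]; then
    echo "SwitchEpoch $switch_epoch != $((switch_block / epoch))"
    return 1
  fi
  echo "ok"
}

check_v2_switch 900 4500 5   # prints: ok
```

Running it against the values read back from the genesis file before `init` catches a mistyped switch block early, long before the nodes reach block 4500.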

### 5.2 Bring-up sequence

```bash
# Build both binaries
cd $XDPOS_268_REPO && make geth && cp build/bin/geth /tmp/geth-268
cd $XDC_GETH_REPO && make geth && cp build/bin/geth /tmp/geth-gp5

# Initialize four datadirs from the same genesis
GENESIS=/tmp/genesis_apothem.json
for n in n1 n2 n3 n4; do
  rm -rf /tmp/$n && mkdir -p /tmp/$n/keystore
done

# Drop pre-generated keystores into each
cp keystores/key_a.json /tmp/n1/keystore/
cp keystores/key_b.json /tmp/n2/keystore/
cp keystores/key_c.json /tmp/n3/keystore/
cp keystores/key_d.json /tmp/n4/keystore/

# Init each
/tmp/geth-268 --datadir /tmp/n1 init $GENESIS
/tmp/geth-268 --datadir /tmp/n2 init $GENESIS
/tmp/geth-gp5 --datadir /tmp/n3 init $GENESIS
/tmp/geth-gp5 --datadir /tmp/n4 init $GENESIS

# Boot n1 first so we have an enode
N1_KEY=/tmp/n1/geth/nodekey
/tmp/geth-268 --datadir /tmp/n1 \
  --networkid 51 \
  --port 30311 --http --http.port 8541 --http.api eth,net,xdpos,debug \
  --mine --miner.etherbase 0x...A1 --unlock 0x...A1 --password /tmp/pwd \
  --syncmode full --gcmode archive \
  --verbosity 4 \
  &> /tmp/n1.log &
sleep 3
# attach prints the enode as a quoted JS string; strip the quotes before reuse
N1_ENODE=$(/tmp/geth-268 --datadir /tmp/n1 attach --exec admin.nodeInfo.enode | tr -d '"')

# Boot n2 with bootnode
/tmp/geth-268 --datadir /tmp/n2 \
  --networkid 51 \
  --port 30312 --http --http.port 8542 --http.api eth,net,xdpos,debug \
  --bootnodes "$N1_ENODE" \
  --mine --miner.etherbase 0x...B2 --unlock 0x...B2 --password /tmp/pwd \
  --syncmode full --gcmode archive \
  --verbosity 4 \
  &> /tmp/n2.log &

# Boot n3 (GP5)
/tmp/geth-gp5 --datadir /tmp/n3 \
  --networkid 51 \
  --port 30313 --http --http.port 8543 --http.api eth,net,xdpos,debug \
  --bootnodes "$N1_ENODE" \
  --mine --miner.etherbase 0x...C3 --unlock 0x...C3 --password /tmp/pwd \
  --syncmode full --gcmode archive \
  --verbosity 4 \
  &> /tmp/n3.log &

# Boot n4 (GP5)
/tmp/geth-gp5 --datadir /tmp/n4 \
  --networkid 51 \
  --port 30314 --http --http.port 8544 --http.api eth,net,xdpos,debug \
  --bootnodes "$N1_ENODE" \
  --mine --miner.etherbase 0x...D4 --unlock 0x...D4 --password /tmp/pwd \
  --syncmode full --gcmode archive \
  --verbosity 4 \
  &> /tmp/n4.log &
```

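Pre-flight: confirm that all four nodes have peered. The `--http.api` list above does not expose `admin`, so either query `net_peerCount` over HTTP or run `admin.peers` through `attach` on each node's IPC socket. If any node has fewer than 3 peers after 30 s, **STOP: Phase 12 W-9 (handshake) failure**. Capture the handshake logs and report which side disconnects first.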
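The 30 s peering gate can be polled over `net_peerCount`, which is reachable because `net` is in each node's `--http.api` list. A sketch, assuming the ports from 5.2; `enough_peers` and `wait_for_peers` are illustrative helper names:

```bash
# enough_peers <hex_count> <min>: true when a net_peerCount result such as
# "0x3" is at least <min>. Empty or "null" input counts as a failure.
enough_peers() {
  case $1 in
    0x*) [ "$(($1))" -ge "$2" ] ;;
    *)   return 1 ;;
  esac
}

# wait_for_peers <port> <min> <timeout_s>: poll one node until it has enough
# peers or the deadline passes.
wait_for_peers() {
  port=$1; min=$2; deadline=$(( $(date +%s) + $3 ))
  while [ "$(date +%s)" -le "$deadline" ]; do
    count=$(curl -s -X POST -H "Content-Type: application/json" \
      --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
      "http://localhost:$port" | jq -r .result)
    if enough_peers "$count" "$min"; then return 0; fi
    sleep 2
  done
  return 1
}

# Usage (not run here): flag any node with fewer than 3 peers after 30 s
# for p in 8541 8542 8543 8544; do
#   wait_for_peers "$p" 3 30 || echo "STOP: node on port $p has < 3 peers"
# done
```

The peer count alone does not say which side dropped the connection, so the handshake logs are still needed when the gate fails.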

### 5.3 Drive past V2 switch

```bash
# Watch height across all four until every node is past the V2 switch
TARGET=$((4500 + 100))   # V2.SwitchBlock + safety margin
while true; do
  min=""
  for p in 8541 8542 8543 8544; do
    h=$(curl -s -X POST -H "Content-Type: application/json" \
      --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
      http://localhost:$p | jq -r .result)
    printf "n%s: %d  " "$((p-8540))" "$((h))"
    # Track the lowest height seen in this pass
    if [ -z "$min" ] || [ "$((h))" -lt "$min" ]; then min=$((h)); fi
  done
  echo
  [ "$min" -ge "$TARGET" ] && break
  sleep 5
done
```

Expected behavior for blocks 0-4500 (V1): all four nodes mine in turn. Network advances ~1 block / 2s.

At block 4500 (V2 switch): mining shifts to round-based leader rotation. **Watch n3 and n4 logs for `[Prepare]` or `[yourturn]` traces.** If GP5 nodes never emit these logs, that's M-1 / F-11-1 confirmed — GP5's Prepare is V1 path and the worker isn't driving Seal.

If mining stalls past block 4500 even though the v2.6.8 nodes (n1, n2) are healthy: two nodes alone cannot reach the 75% QC threshold for 4 masternodes, which needs 3 of 4 votes. The stubbed W-1 inbound vote handler leaves the vote pool empty on GP5, but that only affects GP5's own view. What stalls the v2.6.8 nodes is the missing outbound votes from n3 and n4. **Observation diagnostic:** if the v2.6.8 nodes log "vote pool collected: 2" and never reach 3, Phase 12 W-4 is confirmed (GP5 not broadcasting votes).
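The 3-of-4 figure follows from the 75% threshold quoted above. A small sketch of that arithmetic, assuming the required vote count is the threshold rounded up to a whole vote (the rounding rule is an assumption here, not taken from either codebase); `votes_needed` is an illustrative name:

```bash
# votes_needed <masternodes> <threshold_percent>
# Smallest whole number of votes that is >= threshold_percent of masternodes.
votes_needed() {
  echo $(( ($1 * $2 + 99) / 100 ))
}

votes_needed 4 75     # prints 3: n1 + n2 alone (2 votes) cannot form a QC
```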

### 5.4 Deterministic-mining harness — the crux test

This is the comparison the audit really wants. Goal: ask both binaries to produce a block for the same parent and same round, with identical state and identical signer key, then compare the produced block bytes byte-for-byte.

Approach: use a debug RPC on each node to "force-mine round R at parent P".

```bash
# Stop all four miners after both have produced ≥ 5 V2 blocks each, then:
# Pick a parent header that both nodes agree on:
PARENT_HASH=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["0x11a8", false],"id":1}' \
  http://localhost:8541 | jq -r .result.hash)
# (0x11a8 = block 4520, a few blocks past V2.SwitchBlock = 4500 = 0x1194)

# Tell n1 (v2.6.8): "treat key_a as next leader, mine on top of PARENT_HASH"
curl -s -X POST -H "Content-Type: application/json" \
  --data "{\"jsonrpc\":\"2.0\",\"method\":\"debug_minePinned\",\"params\":[\"$PARENT_HASH\", \"key_a\", 4521], \"id\":1}" \
  http://localhost:8541 > /tmp/block_n1.json

# Tell n3 (GP5): same instruction with the same key/parent/round
curl -s -X POST -H "Content-Type: application/json" \
  --data "{\"jsonrpc\":\"2.0\",\"method\":\"debug_minePinned\",\"params\":[\"$PARENT_HASH\", \"key_a\", 4521], \"id\":1}" \
  http://localhost:8543 > /tmp/block_n3.json

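# Compare RLP bytes (the harness returns the header RLP as one 0x-prefixed hex string)
RAW_N1=$(jq -r '.result // empty' /tmp/block_n1.json)
RAW_N3=$(jq -r '.result // empty' /tmp/block_n3.json)
if [ -z "$RAW_N1" ] || [ -z "$RAW_N3" ]; then
  echo "FAIL: harness returned no header (check .error in /tmp/block_n*.json)"
fi
diff <(echo "${RAW_N1#0x}" | xxd -r -p | xxd) \
     <(echo "${RAW_N3#0x}" | xxd -r -p | xxd) > /tmp/header_diff.txt

if [ -s /tmp/header_diff.txt ]; then
  echo "FAIL: header bytes differ"
  cat /tmp/header_diff.txt
fi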
```
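When the two headers differ, the offset of the first differing byte is often enough to tell which header field diverged. A sketch in plain shell, assuming both inputs are even-length hex strings with an optional `0x` prefix; `first_diff_byte` is an illustrative helper, not an existing tool:

```bash
# first_diff_byte <hex_a> <hex_b>
# Prints "identical" (status 0) or the zero-based offset of the first byte
# that differs (status 1). Status 2 means an input was not whole bytes.
first_diff_byte() {
  a=${1#0x}; b=${2#0x}
  if [ $(( ${#a} % 2 )) -ne 0 ] || [ $(( ${#b} % 2 )) -ne 0 ]; then
    echo "odd-length hex input"; return 2
  fi
  i=0
  while [ -n "$a" ] || [ -n "$b" ]; do
    # Compare one byte (two hex characters) at a time
    ha=$(printf '%.2s' "$a"); hb=$(printf '%.2s' "$b")
    if [ "$ha" != "$hb" ]; then echo "$i"; return 1; fi
    a=${a#??}; b=${b#??}; i=$((i + 1))
  done
  echo "identical"
}

first_diff_byte 0xf90201a0 0xf90201a0   # prints: identical
```

Calling it with `0xf90201a0` and `0xf90202a0` prints `2`. If one input is a strict prefix of the other, the reported offset is the length of the shorter input.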

**Note:** `debug_minePinned` does not exist in either binary today. It must be added as a test harness:

```go
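// In eth/api_debug.go, conditional on `--debug.minePinned`.
// Test-only: prepares, finalizes and seals one block on top of `parent`, then returns the header RLP.
// Assumes fmt, math/big and time are imported alongside the packages already used below.
func (api *DebugAPI) MinePinned(parent common.Hash, signerKey string, round uint64) (hexutil.Bytes, error) {
    chain := api.eth.BlockChain()
    engine := api.eth.Engine()
    parentHeader := chain.GetHeaderByHash(parent)
    if parentHeader == nil { return nil, fmt.Errorf("parent %s not found", parent.Hex()) }
    // Force the engine's currentRound = round (test-only)
    engineV2 := engine.(*XDPoS.XDPoS).EngineV2.(*engine_v2.XDPoS_v2)
    engineV2.SetCurrentRoundForTesting(types.Round(round))
    // Build a fresh header on top of the parent
    header := &types.Header{
        ParentHash: parent,
        Number:     new(big.Int).Add(parentHeader.Number, big.NewInt(1)),
        GasLimit:   parentHeader.GasLimit,
        Time:       parentHeader.Time + 2, // parent time + Period (2 s)
        Coinbase:   signerAddrFromKey(signerKey),
    }
    if err := engine.Prepare(chain, header); err != nil { return nil, err }
    // Do not discard the StateAt error: a missing parent state would otherwise panic below
    statedb, err := chain.StateAt(parentHeader.Root)
    if err != nil { return nil, err }
    block, err := engine.FinalizeAndAssemble(chain, header, statedb, &types.Body{}, nil)
    if err != nil { return nil, err }
    // Seal signs with the signer the engine is currently authorized for, so the
    // node must be authorized with signerKey before this call.
    sealed := make(chan *types.Block, 1)
    stop := make(chan struct{})
    defer close(stop)
    if err := engine.Seal(chain, block, sealed, stop); err != nil { return nil, err }
    // Bound the wait so a Seal that never delivers cannot hang the RPC
    select {
    case b := <-sealed:
        return rlp.EncodeToBytes(b.Header())
    case <-time.After(10 * time.Second):
        return nil, fmt.Errorf("seal timed out for round %d", round)
    }
}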
```

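Both binaries need this RPC added behind a gate that keeps it out of release builds (the `--debug.minePinned` flag noted in the code comment, or a build tag). v2.6.8's `Seal` returns synchronously, so the harness must be adapted there.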

### 5.5 Pass criteria

For at least 5 consecutive V2 round-numbers, the deterministic harness must yield byte-identical headers:

```bash
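# harness_v268 / harness_gp5: wrappers that run the 5.4 debug_minePinned call
# for round R on n1 / n3 and print the raw header hex.
PASS=0
FAIL=0
for R in 4521 4522 4523 4524 4525; do
  HDR_268=$(harness_v268 $R)
  HDR_GP5=$(harness_gp5 $R)
  # An empty result means the harness failed; never count "" = "" as a pass
  if [ -n "$HDR_268" ] && [ "$HDR_268" = "$HDR_GP5" ]; then PASS=$((PASS+1)); else FAIL=$((FAIL+1)); fi
done
echo "Pass: $PASS / 5; fail: $FAIL"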

`PASS=5` is required.
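`harness_v268` and `harness_gp5` are not defined anywhere in this playbook. One way they might look, assuming the `debug_minePinned` RPC from 5.4 has been added to both binaries, that it returns the header RLP as a hex string in `.result`, and that `PARENT_HASH` is still set from 5.4; `mine_pinned_payload` and `harness` are illustrative helpers:

```bash
# mine_pinned_payload <parent_hash> <signer_key> <round>
# Builds the JSON-RPC request body for the test-only debug_minePinned call.
mine_pinned_payload() {
  printf '{"jsonrpc":"2.0","method":"debug_minePinned","params":["%s","%s",%d],"id":1}' \
    "$1" "$2" "$3"
}

# harness <port> <round>: prints the raw header hex, or nothing on error.
harness() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data "$(mine_pinned_payload "$PARENT_HASH" key_a "$2")" \
    "http://localhost:$1" | jq -r '.result // empty'
}

harness_v268() { harness 8541 "$1"; }   # n1, v2.6.8
harness_gp5()  { harness 8543 "$1"; }   # n3, GP5
```

Printing nothing on an RPC error keeps a failed call distinguishable from a real header, provided the caller treats an empty result as a failure.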

### 5.6 Diagnostic-on-failure decision tree

If `header_diff.txt` is non-empty:

1. **`Validator` field differs** (e.g., empty in GP5, populated in v2.6.8): M-4 (Seal V1 path doesn't write `header.Validator`). Confirm by decoding the header RLP saved in `/tmp/block_n3.json` and checking that its `Validator` field is empty.

2. **`Difficulty` field differs** (e.g., 2 in GP5, 1 in v2.6.8): M-5 (CalcDifficulty wrong dispatch).

3. **`Extra` field differs at byte 0** (V1: 32 bytes of vanity, normally `0x00 * 32`; V2: version byte `0x02` followed by the RLP-encoded extra fields): M-1 (Prepare V1 path).

4. **`Validators` or `Penalties` field differs** (empty in GP5 at epoch switch, populated in v2.6.8): M-1 — V2 engine populates these in `Prepare`.

5. **`Coinbase`, `Time`, `GasLimit` differ**: worker integration / Phase 11 — `genParams.coinbase` not equal to signer, or `header.Time` computed differently.

6. **`Root` differs** (state root): M-3 (FinalizeAndAssemble V1 reward gate). If round R is an epoch switch, GP5 may have skipped rewards while v2.6.8 applied them.

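7. **`TxHash` differs while the consensus fields match**: tx-pool ordering. The header commits to the body through `TxHash`, so two blocks with different transactions cannot have byte-identical headers. Geth 1.17's tx-pool in the patient orders txs differently than v2.6.8. Compare `block.Transactions()` element-by-element. The 5.4 harness assembles an empty body, so this case only arises in the multi-round run of 5.7.

8. **Every field except `Validator` matches, and `Validator` is populated on both sides with different bytes** (different signatures for what is allegedly the same hash signed by the same key): the signed hash differs. Bisect `sigHash` field-by-field by computing `sigHash(header)` on both binaries via debug RPC and comparing. M-6 is confirmed if the patient's wrapper-level sigHash includes geth-1.17 fields that v2.6.8 doesn't.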
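For scripted triage, the header-field cases of the decision tree can be collapsed into a lookup. A sketch that only restates the mapping given in the list above; the function name and the Go-style field spellings it accepts are illustrative:

```bash
# finding_for <header_field>: prints the audit finding(s) the decision tree
# associates with a divergence in that header field.
finding_for() {
  case $1 in
    Validator)                  echo "M-4 (empty on GP5) or M-6 (populated, different bytes)" ;;
    Difficulty)                 echo "M-5" ;;
    Extra|Validators|Penalties) echo "M-1" ;;
    Coinbase|Time|GasLimit)     echo "Phase 11 (worker integration)" ;;
    Root)                       echo "M-3" ;;
    *)                          echo "unmapped: bisect manually" ;;
  esac
}

finding_for Difficulty   # prints: M-5
```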

### 5.7 Stretch: multi-round mining

Once Stage 5 single-round passes, run the four-node testnet for ≥ 1 epoch (900 blocks) with mixed proposers. Verify:
- Every GP5-proposed block is committed (3-chain commit) by all four nodes.
- Every v2.6.8-proposed block is committed by all four nodes.
- No round stalls; no fork.

Capture all logs. Pass criterion: `current_block` advances by ≥ 900 in roughly 30 minutes (900 blocks × 2 s period) with zero rejected blocks logged on any node.

---

## Stage 6 — Mixed-peer canary (live Apothem testnet)

### Goal

Confirm GP5 and v2.6.8 nodes can run as peers on the live Apothem testnet without forking. Two phases:

**Phase 6A** — read-only sync. Run GP5 alongside v2.6.8 against live Apothem; both sync from genesis or from a snapshot; their canonical chain heads must match for ≥ 10,000 blocks.

**Phase 6B** — GP5 enabled as miner. Register a GP5-keyed masternode on Apothem; observe whether v2.6.8 peers accept GP5-produced blocks and whether the chain advances when GP5 is the proposer.

### 6.1 Phase 6A — read-only canary (low-risk)

Setup on the `xdc07` server:

```bash
# Option A: sync from snapshot (faster)
SNAPSHOT_URL="https://download.xinfin.network/apothem/latest.tar.gz"
mkdir -p /data/canary-268 /data/canary-gp5
cd /data/canary-268 && curl -L $SNAPSHOT_URL | tar xz
cd /data/canary-gp5 && curl -L $SNAPSHOT_URL | tar xz

# v2.6.8 binary
/tmp/geth-268 --datadir /data/canary-268 \
  --apothem \
  --syncmode full --gcmode archive \
  --port 30311 \
  --http --http.port 8541 --http.api eth,net,debug,xdpos \
  --maxpeers 25 \
  --verbosity 3 \
  &> /var/log/canary-268.log &

# GP5 binary
/tmp/geth-gp5 --datadir /data/canary-gp5 \
  --apothem \
  --syncmode full --gcmode archive \
  --port 30312 \
  --http --http.port 8542 --http.api eth,net,debug,xdpos \
  --maxpeers 25 \
  --verbosity 3 \
  &> /var/log/canary-gp5.log &
```

Both nodes peer with the live Apothem network. Run for ≥ 12 hours.

Continuous comparison cron (every 60 s):

```bash
#!/bin/bash
H_268=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8541 | jq -r .result)
H_GP5=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8542 | jq -r .result)

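# Compare canonical block hashes at H_268 - 64 (deeply confirmed)
TARGET=$((H_268 - 64))
# Skip this run if a node is unreachable or GP5 has not yet synced past TARGET;
# otherwise a missing block on GP5 (hash "null") would be logged as a fork.
if [ "$TARGET" -le 0 ] || [ "$((H_GP5))" -lt "$TARGET" ]; then exit 0; fi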
HASH_268=$(curl -s -X POST -H "Content-Type: application/json" \
  --data "{\"jsonrpc\":\"2.0\",\"method\":\"eth_getBlockByNumber\",\"params\":[\"0x$(printf '%x' $TARGET)\", false],\"id\":1}" \
  http://localhost:8541 | jq -r .result.hash)
HASH_GP5=$(curl -s -X POST -H "Content-Type: application/json" \
  --data "{\"jsonrpc\":\"2.0\",\"method\":\"eth_getBlockByNumber\",\"params\":[\"0x$(printf '%x' $TARGET)\", false],\"id\":1}" \
  http://localhost:8542 | jq -r .result.hash)

if [ "$HASH_268" != "$HASH_GP5" ]; then
  echo "$(date) FORK at block $TARGET: 268=$HASH_268 GP5=$HASH_GP5" >> /var/log/canary-fork.log
fi
```

### 6.1.1 Pass criteria for Phase 6A

- Zero divergence in canonical block hash at depth 64 for ≥ 10,000 consecutive blocks (~5.5 hours at 2s/block).
- Zero `unable to verify header`, `invalid stateRoot`, `invalid difficulty`, `bad seal`, `validators not legit`, `errForkIDRejected` errors in `/var/log/canary-gp5.log`.
- Zero peer disconnects logged with reason `useless peer` from v2.6.8 peers (would indicate handshake or message-decode failures).

If genesis hash is still placeholder (`CC-1`), Phase 6A will fail at boot with `errGenesisMismatch` from the very first peer dial. Fix CC-1 before running.
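The log criteria in 6.1.1 can be checked mechanically. A sketch that counts log lines matching the error strings listed there; it assumes those strings appear verbatim in the log, which should be confirmed against real output from both binaries. `count_sync_errors` is an illustrative name:

```bash
# count_sync_errors <logfile>: number of log lines containing any of the
# error strings from the Phase 6A pass criteria.
count_sync_errors() {
  grep -c -E 'unable to verify header|invalid stateRoot|invalid difficulty|bad seal|validators not legit|errForkIDRejected' "$1"
}

# Usage (not run here): the pass criterion is a count of 0
# count_sync_errors /var/log/canary-gp5.log
```

`grep -c` still prints `0` when nothing matches, so the output can be compared directly even though the exit status is non-zero in that case.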

### 6.1.2 Diagnostic-on-failure for 6A

| Symptom | Likely cause | Audit reference |
|---|---|---|
| Boot-time `genesis hash mismatch` | CC-1 placeholder genesis hash | Master prompt known divergences |
| Boot-time `errForkIDRejected` from peers | Fork-block configuration mismatch | Phase 12 W-9 |
| Sync stalls at block N where N == V2.SwitchBlock | V1→V2 transition handling | NF-2 / NF-3 / C-comeback (read-side) |
| `invalid stateRoot at block N` (V2 region) | Reward / penalty handling | NF-2, NF-3, C-comeback |
| `invalid difficulty at block N` (V2 region) | Read-side difficulty check (V1 vs V2) | Phase 7 (re-verify) |
| Hash divergence at random V2 block | Snapshot or masternode-list reconstruction | Phase 1, 4 (read-side) |

### 6.2 Phase 6B — mining canary (high-risk)

**ONLY** run after Phase 6A has been solid for 24+ hours, all M-1 through M-7 fixes are in place, and W-1 through W-6 are wired. Without those fixes, Phase 6B will produce malformed blocks that the Apothem network rejects. Worse, because GP5's BFT broadcast is stubbed, GP5 will commit blocks **locally** that no other node sees, which makes a local fork inevitable.

#### 6.2.1 Pre-flight checklist (must all be green)

- [ ] CC-1 (Apothem genesis hash) replaced with `0xbdea512b4f12ff1135ec92c00dc047ffb93890c2ea1aa0eefe9b013d80640075`.
- [ ] M-1, M-3, M-4, M-5, M-7 dispatches landed.
- [ ] M-6 (SealHash dispatch + drop geth-1.17 conditional fields from sigHash) landed.
- [ ] M-2 (Finalize) confirmed correct in practice via Stage 5 state-root comparison.
- [ ] W-1, W-2, W-3 inbound BFT handlers route to `EngineV2.{Vote,Timeout,SyncInfo}Handler`.
- [ ] W-4, W-5, W-6 outbound broadcasts actually call `p2p.Send(... msgCode, encodedBytes)`.
- [ ] F-11-1 sealing trigger pathway (XDC agent) implemented.
- [ ] Stage 5 deterministic harness PASS=5/5.

#### 6.2.2 Setup

Register a masternode key on Apothem (consult network ops). Confirm the address is in `getMasternodes(currentBlockNumber)` via reference RPC.

```bash
/tmp/geth-gp5 --datadir /data/canary-gp5-mine \
  --apothem \
  --syncmode full --gcmode archive \
  --port 30312 \
  --http --http.port 8542 --http.api eth,net,debug,xdpos,personal \
  --mine \
  --miner.etherbase 0x<masternode-address> \
  --unlock 0x<masternode-address> \
  --password /etc/xdc/pwd \
  --maxpeers 25 \
  --verbosity 4 \
  &> /var/log/canary-gp5-mine.log &
```

Monitor for ≥ 24 hours. The masternode key should be the proposer for ~1/N of rounds, where N is the active masternode count. With N ≈ 108 on Apothem and a 2 s period, that is one proposer turn roughly every 3.6 minutes (108 × 2 s = 216 s).
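The expected number of proposer turns gives a baseline for the "rounds observed" line of the report. A sketch of the arithmetic, assuming strict round-robin rotation with no timeouts or skipped rounds and the N ≈ 108 figure quoted above; both function names are illustrative:

```bash
# turn_interval_s <masternodes> <period_s>: seconds between proposer turns
turn_interval_s() { echo $(( $1 * $2 )); }

# expected_turns <duration_s> <masternodes> <period_s>: whole turns in the window
expected_turns() { echo $(( $1 / ($2 * $3) )); }

turn_interval_s 108 2          # prints 216 (about 3.6 minutes)
expected_turns 86400 108 2     # prints 400 (turns in 24 hours)
```

A GP5 proposer count far below this baseline over the monitoring window is worth investigating even when no explicit errors are logged.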

#### 6.2.3 Pass criteria for Phase 6B

- Every GP5-produced block is committed (3-chain rule) by v2.6.8 peers — verify by querying `eth_getBlockByNumber` on a v2.6.8 peer at GP5's claimed-mined block height and confirming hash match.
- Zero `bad seal`, `invalid signer`, `validators not legit`, `invalid difficulty`, `invalid stateRoot` errors logged on v2.6.8 peers when receiving GP5-produced blocks.
- Zero local fork events on GP5 (i.e., GP5 never has a `currentBlock` whose hash isn't on the canonical chain held by v2.6.8 peers).
- The next v2.6.8-proposed block (in the round after GP5's) builds on GP5's block (via `parentHash`) without reorg.
- BFT vote/timeout messages emitted by GP5 are received by v2.6.8 peers (verify via `xdpos_receivedVotes` or equivalent introspection on the v2.6.8 side).
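The first criterion amounts to a per-height hash comparison between the GP5 node and a v2.6.8 peer. A sketch, assuming the v2.6.8 reference node from Phase 6A is still serving HTTP on port 8541 and the GP5 miner on 8542; the helper names are illustrative:

```bash
# to_quantity <decimal_height>: JSON-RPC hex quantity, e.g. 4520 -> 0x11a8
to_quantity() { printf '0x%x\n' "$1"; }

# hash_at <port> <decimal_height>: canonical block hash on that node,
# or nothing if the node does not have the block.
hash_at() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data "{\"jsonrpc\":\"2.0\",\"method\":\"eth_getBlockByNumber\",\"params\":[\"$(to_quantity "$2")\", false],\"id\":1}" \
    "http://localhost:$1" | jq -r '.result.hash // empty'
}

# same_block <height>: true when both nodes have the block and agree on its hash
same_block() {
  a=$(hash_at 8541 "$1"); b=$(hash_at 8542 "$1")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (not run here), for a height GP5 claims to have mined:
# same_block 12345678 || echo "GP5 block not canonical on the v2.6.8 peer"
```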

#### 6.2.4 Diagnostic-on-failure for 6B

| Symptom on v2.6.8 peer | Likely cause | Audit reference |
|---|---|---|
| `invalid extra` on RLP decode | M-1 not landed (V1 Extra layout) | M-1 |
| `errInvalidDifficulty` | M-5 not landed | M-5 |
| `signer not in masternode list` | M-4 + M-6 (sigHash divergence) | M-4, M-6 |
| `invalid stateRoot` | M-3 (reward gate) or M-2 lingering | M-2, M-3 |
| `vote signature not verified` | sigHash divergence on Vote (W-8) | W-8 |
| GP5 commits block locally; v2.6.8 doesn't see it | W-4/W-5/W-6 not landed (broadcasts stubbed) | W-4..W-6 |
| GP5 doesn't see incoming votes | W-1/W-2/W-3 not landed (handlers stubbed) | W-1..W-3 |
| Round stalls at every GP5 leader-turn | composite — likely M-1/M-4 + W-4 | combination |

### 6.3 Rollback

If Phase 6B fails, immediately:

1. Stop GP5 mining (`miner.stop()` via attach console).
2. Confirm canonical Apothem head on a v2.6.8 reference node.
3. If GP5's local fork is shorter than 64 blocks, GP5 will recover via re-org once it sees the canonical head from peers. If longer, wipe GP5's chain data and re-sync from snapshot.
4. Re-run audit: which finding was the proximate cause? Add to the deferred-fixes list.
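Step 3 reduces to a threshold on fork depth, where depth is GP5's head height minus the height of the last block whose hash matches the reference node. A sketch of that decision; the text does not say what to do at exactly 64 blocks, so treating it as a re-sync is an assumption, and `rollback_action` is an illustrative name:

```bash
# rollback_action <fork_depth_blocks>
# Prints the recovery path from step 3 for a local fork of the given depth.
rollback_action() {
  if [ "$1" -lt 64 ]; then
    echo "wait-for-reorg"
  else
    echo "wipe-and-resync"
  fi
}

rollback_action 12    # prints: wait-for-reorg
```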

### 6.4 Reporting template

Stage 6 sign-off requires:

```
== Apothem canary report ==
Phase 6A:
- Duration: <hours>
- Blocks observed: <N>
- Hash divergences at depth 64: <count>
- Boot errors: <none|list>
- Peer drops with reason "useless peer": <count>
- Sync errors: <none|list>

Phase 6B:
- Duration: <hours>
- Rounds observed where GP5 was proposer: <N>
- Blocks GP5 produced: <N>
- Blocks accepted by v2.6.8 peers (canonical chain): <N>
- Blocks rejected: <N> (with reasons)
- BFT vote/timeout messages emitted by GP5 (count): <N>
- BFT vote/timeout messages received by GP5 from peers (count): <N>
- Local fork events on GP5: <count>
- Reorgs > 5 blocks: <count>

Verdict: <PASS|FAIL with audit reference IDs>
```

---

## Cross-references

- Stage 5 step 5.6 diagnostic decision tree maps to the Section B per-divergence dossier of `part2_phases_9-12.md`.
- Stage 6 pre-flight checklist (6.2.1) enumerates the gate findings.
- Both stages depend on CC-1 (genesis hash) and W-9 (handshake) being resolved first.
