# XDPoS Sealing Wireup on Geth 1.17 — Bugs, Fixes, and Validation

This document captures the three bugs that prevented XDPoS mining on the
post-merge geth 1.17 port, the fixes applied in PR #594, and the validation
performed against a 4-node private network.

## Status before PR #594

The deVTest fork is a port of XDPoSChain onto upstream geth 1.17. Sync paths
were validated extensively across PRs #577–#593 (Apothem to tip, mainnet to
~102M). Mining was assumed to "just work" through the same engine plumbing.

It did not. Three independent issues kept any XDPoS chain — mainnet, testnet,
or private — from minting blocks.

## Bug 1 — `engine.Authorize` was never called

`XDPoS.Authorize(signer, signFn)` is the API that injects the validator's
signing function into the engine. Without it, `c.signFn` is `nil` and every
`Seal` returns `ErrUnauthorized`.

In legacy XDPoSChain v2.6.8 this call was made in `eth/backend.go` after
unlocking the etherbase account. In the geth 1.17 port that code was lost in
translation: the wrapper's `Authorize` method exists, the `XdcAgent` is
constructed, but nothing connects the unlocked wallet to the engine.

Compounding the issue, geth 1.17 reduced `--unlock`, `--password`, and
`--allow-insecure-unlock` to deprecated no-ops. Even if `Authorize` had been
called, the keystore would still have been locked.

### Fix

`eth/backend.go` `Start()`: after `XdcAgent` is started and before sealing
begins, find the keystore wallet for `PendingFeeRecipient`, read the
passphrase from `XDC_MINER_PASSWORD`, call `ks.Unlock`, then call
`xdposEngine.Authorize(etherbase, signFn)`.
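The ordering described above can be sketched with stand-in types. This is not the code from PR #594: the `keystore` and `engine` interfaces, the simplified `signFn` signature, the fakes, and the `signer-1` address are all hypothetical. Only the sequence (read `XDC_MINER_PASSWORD`, unlock, then authorize) comes from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// signFn signs an already-hashed payload. Hypothetical, simplified signature.
type signFn func(hash []byte) ([]byte, error)

// keystore and engine are hypothetical stand-ins for the real keystore and
// XDPoS engine; only the call order matters in this sketch.
type keystore interface {
	Unlock(addr, passphrase string) error
	SignHash(addr string, hash []byte) ([]byte, error)
}

type engine interface {
	Authorize(addr string, fn signFn)
}

// wireSealing unlocks addr with the passphrase found under
// XDC_MINER_PASSWORD and only then hands a signing closure to the engine.
func wireSealing(ks keystore, eng engine, addr string, lookupEnv func(string) (string, bool)) error {
	pass, ok := lookupEnv("XDC_MINER_PASSWORD")
	if !ok || pass == "" {
		return errors.New("XDC_MINER_PASSWORD not set; sealing stays disabled")
	}
	if err := ks.Unlock(addr, pass); err != nil {
		return fmt.Errorf("unlock %s: %w", addr, err)
	}
	eng.Authorize(addr, func(hash []byte) ([]byte, error) {
		return ks.SignHash(addr, hash) // sign the hash as-is, no re-hashing
	})
	return nil
}

// fakeKeystore and fakeEngine exist only to make the sketch runnable.
type fakeKeystore struct{ unlocked bool }

func (k *fakeKeystore) Unlock(addr, passphrase string) error {
	k.unlocked = true
	return nil
}

func (k *fakeKeystore) SignHash(addr string, hash []byte) ([]byte, error) {
	if !k.unlocked {
		return nil, errors.New("account locked")
	}
	return append([]byte("sig:"), hash...), nil
}

type fakeEngine struct{ fn signFn }

func (e *fakeEngine) Authorize(addr string, fn signFn) { e.fn = fn }

func main() {
	ks, eng := &fakeKeystore{}, &fakeEngine{}
	env := func(string) (string, bool) { return "secret", true }
	if err := wireSealing(ks, eng, "signer-1", env); err != nil {
		panic(err)
	}
	sig, err := eng.fn([]byte{0x01})
	fmt.Println(len(sig), err)
}
```

Passing the environment lookup as a function (in real code, `os.LookupEnv`) keeps the sketch testable and makes the "no password, no authorization" case explicit.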

The `signFn` is a closure that calls `keystore.SignHash` **directly** on the
input:

```go
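// signFn ignores the account and string arguments passed by the engine and
// signs the already-hashed input as-is; `account` is captured from the
// enclosing scope.
signFn := func(_ accounts.Account, _ string, data []byte) ([]byte, error) {
    return ks.SignHash(account, data)
}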
```

It must not be `wallet.SignData`. `SignData` applies `keccak256` to its input
before signing, but the V1 engine passes `sigHash(header)`, which is already a
hash, as `data`. The verifier runs `ecrecover` against `sigHash` directly, so
hashing twice on the signing side makes every recovery land on a wrong but
deterministic address. Observed before the fix: ~99% BAD BLOCK rate with
`err=unauthorized`.
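The effect of the extra hash can be reproduced with the Go standard library alone. The sketch below is an analogue, not the XDPoS code path: it assumes P-256 and SHA-256 in place of secp256k1 and keccak256, and it uses signature verification where the engine uses address recovery. The function names are illustrative.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// signDirect signs the digest as-is (analogue of keystore.SignHash).
func signDirect(key *ecdsa.PrivateKey, digest []byte) ([]byte, error) {
	return ecdsa.SignASN1(rand.Reader, key, digest)
}

// signRehashed hashes the digest again before signing (analogue of
// wallet.SignData being handed an already-hashed value).
func signRehashed(key *ecdsa.PrivateKey, digest []byte) ([]byte, error) {
	h := sha256.Sum256(digest)
	return ecdsa.SignASN1(rand.Reader, key, h[:])
}

// checkBoth signs one digest both ways and verifies each signature against
// the original digest, which is what the verifying side would do.
func checkBoth() (directOK, rehashedOK bool, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return false, false, err
	}
	digest := sha256.Sum256([]byte("stand-in for header bytes"))

	sigDirect, err := signDirect(key, digest[:])
	if err != nil {
		return false, false, err
	}
	sigRehashed, err := signRehashed(key, digest[:])
	if err != nil {
		return false, false, err
	}
	return ecdsa.VerifyASN1(&key.PublicKey, digest[:], sigDirect),
		ecdsa.VerifyASN1(&key.PublicKey, digest[:], sigRehashed), nil
}

func main() {
	direct, rehashed, err := checkBoth()
	if err != nil {
		panic(err)
	}
	fmt.Println("direct signature verifies:   ", direct)
	fmt.Println("re-hashed signature verifies:", rehashed)
}
```

With recovery instead of verification, the re-hashed case would not fail outright; it would yield a different public key, which is why the engine reported an unauthorized signer rather than a malformed signature.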

### Why it propagates to V2

The wrapper's `Authorize` (`consensus/XDPoS/xdpos.go`) sets `c.signer` /
`c.signFn` and then calls `c.EngineV2.Authorize(signer, v2SignFn)` where
`v2SignFn` is a closure that wraps our `signFn`. So the same fix
unblocks mining on both V1 (private) and V2 (mainnet at block ≥80,370,000,
testnet at ≥56,828,700) without any V2-specific change.

## Bug 2 — `Seal` deadlocked `XdcAgent` on the cooldown path

V1 enforces a "signed recently" cooldown: a signer cannot seal again until
`len(signers)/2 + 1` blocks have passed. In the wrapper's `Seal`, the
cooldown branch was:

```go
log.Info("Signed recently, must wait for others", ...)
<-stop
return nil
```

This blocks on the stop channel — the legacy pattern, where Seal ran in
its own goroutine that the worker could abandon.

`XdcAgent.tryMine` is **synchronous**: it polls every 500 ms and calls
`engine.Seal(chain, block, results, stop)` inline. When `Seal` blocked on
`<-stop`, the agent goroutine froze and never returned to its `select` to
receive the next tick. The chain stalled as soon as the first signer hit
cooldown, observed on the 4-node network as a stall at block 4.
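The freeze can be reproduced in isolation. The sketch below is a simplified model, not `XdcAgent` itself: `runAgent`, `blockingSeal`, and `returningSeal` are hypothetical names, and the ticker is replaced by a channel fed by a helper goroutine so the outcome does not depend on timing. A real `time.Ticker` would drop ticks rather than queue them.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

type sealFn func(stop <-chan struct{})

// blockingSeal models the pre-fix cooldown branch: wait for stop.
func blockingSeal(stop <-chan struct{}) { <-stop }

// returningSeal models the post-fix cooldown branch: return immediately.
func returningSeal(stop <-chan struct{}) {}

// runAgent feeds n ticks to a loop that calls seal inline and reports how
// many seal attempts had started when the loop finished or timed out.
func runAgent(seal sealFn, n, timeoutMs int) int {
	ticks := make(chan struct{})
	stop := make(chan struct{})
	done := make(chan struct{})
	var attempts int32

	go func() { // synchronous agent loop
		defer close(done)
		for {
			select {
			case _, ok := <-ticks:
				if !ok {
					return
				}
				atomic.AddInt32(&attempts, 1)
				seal(stop) // inline call: the loop cannot tick while this blocks
			case <-stop:
				return
			}
		}
	}()

	go func() { // tick source
		defer close(ticks)
		for i := 0; i < n; i++ {
			select {
			case ticks <- struct{}{}:
			case <-stop:
				return
			}
		}
	}()

	select {
	case <-done:
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
	}
	got := int(atomic.LoadInt32(&attempts))
	close(stop) // release anything still blocked, then wait for the loop
	<-done
	return got
}

func main() {
	fmt.Println("attempts with blocking Seal: ", runAgent(blockingSeal, 5, 200))
	fmt.Println("attempts with returning Seal:", runAgent(returningSeal, 5, 200))
}
```

In this model the blocking variant starts a single attempt and then sits on `<-stop` until the timeout, while the returning variant consumes every tick.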

### Fix

`consensus/XDPoS/xdpos.go` Seal: on the cooldown branch, return `nil`
immediately. The next agent tick will retry with a refreshed snapshot —
by then this signer is likely eligible again, or the in-turn signer has
taken the slot.
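For the 4-signer test network the cooldown window works out to three blocks. The sketch below assumes one reading of "blocks have passed", namely that a signer who sealed block `s` may next seal block `s + limit`; the helper names are illustrative and the exact boundary in the engine may differ.

```go
package main

import "fmt"

// cooldownLimit mirrors the len(signers)/2 + 1 expression from the text.
func cooldownLimit(signers int) uint64 {
	return uint64(signers/2 + 1)
}

// eligible reports whether a signer that last sealed block lastSigned may
// seal block next. Assumption: eligibility returns once next is at least
// lastSigned + limit.
func eligible(lastSigned, next uint64, signers int) bool {
	return next >= lastSigned+cooldownLimit(signers)
}

func main() {
	const signers = 4
	fmt.Println("cooldown limit:", cooldownLimit(signers))
	for next := uint64(2); next <= 4; next++ {
		fmt.Printf("sealed block 1, may seal block %d: %v\n", next, eligible(1, next, signers))
	}
}
```

Under this reading a retry on the next 500 ms tick is cheap: the check is pure arithmetic on the snapshot, so returning early costs nothing beyond one log line.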

V2 doesn't have this bug. V2 uses round-based proposer selection: only the
current round leader seals, and there is no mid-Seal stop-blocking.

## Bug 3 — ethstats reported `consensus: unknown`

The upstream ethstats client's `nodeInfo` struct has no `consensus` field, so
the auth message sent to ethstats servers never declares which engine is in
use. Dashboards default to `unknown`.

### Fix

`ethstats/ethstats.go`: add ``Consensus string `json:"consensus,omitempty"` `` to
`nodeInfo` and populate it in `login()`:

```go
consensus := "ethash"
if s.engine != nil {
    if name := fmt.Sprintf("%T", s.engine); strings.Contains(name, "XDPoS") {
        consensus = "XDPoS"
    }
}
```

The field flows through the `hello` auth payload. Whether the dashboard at
`stats.xdcindia.com` renders it depends on the server-side template; the
client now sends correct data.
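How the new field appears on the wire can be checked with a trimmed-down struct. This sketch is an assumption-laden stand-in: the local `nodeInfo` carries only two of the upstream fields, `XDPoS` is a placeholder type whose name is all that matters, and `consensusName` and `encode` are illustrative helpers rather than functions from `ethstats.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nodeInfo is a reduced stand-in for the ethstats struct of the same name.
type nodeInfo struct {
	Name      string `json:"name"`
	Node      string `json:"node"`
	Consensus string `json:"consensus,omitempty"`
}

// XDPoS is a placeholder engine type; only its type name is inspected.
type XDPoS struct{}

// consensusName applies the same type-name check as the snippet above.
func consensusName(engine interface{}) string {
	name := "ethash"
	if engine != nil && strings.Contains(fmt.Sprintf("%T", engine), "XDPoS") {
		name = "XDPoS"
	}
	return name
}

// encode marshals the struct the way it would appear in the auth payload.
func encode(info nodeInfo) string {
	b, err := json.Marshal(info)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	withEngine := nodeInfo{Name: "n1", Node: "geth", Consensus: consensusName(&XDPoS{})}
	fmt.Println(encode(withEngine))
	// An empty Consensus is dropped entirely because of omitempty.
	fmt.Println(encode(nodeInfo{Name: "n1", Node: "geth"}))
}
```

Because of `omitempty`, servers that do not know the field see an unchanged payload whenever the client leaves it empty.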

## Validation — 4-node private XDPoS network

A scaled-down XDC chain (chainId 99999) was stood up using a patched
`puppeth` from `XinFinOrg/XDPoSChain` to generate a genesis with the same
system contracts mainnet ships. (The legacy puppeth had its own bug — its
storage extractor RLP-trimmed slot values and corrupted multi-byte fields
like stake caps — patched in the puppeth source out-of-tree.)

### Parameter parity vs mainnet and testnet

| Field | Private | Mainnet | Testnet | Identical |
|---|---|---|---|---|
| `period` | 2s | 2s | 2s | ✅ |
| `epoch` | 900 | 900 | 900 | ✅ |
| `gap` | 450 | 450 | 450 | ✅ |
| `reward` | 5000 | 5000 | 5000 | ✅ |
| `rewardCheckpoint` | 900 | 900 | 900 | ✅ |
| EVM forks (homestead..byzantium) | 1,2,3,3,4 | 1,2,3,3,4 | 1,2,3,3,4 | ✅ |
| `0x...0088` XDCValidator code | 14453 B | 14453 B | 14453 B | ✅ byte-identical |
| `0x...0089` BlockSigner code | 849 B | 849 B | 849 B | ✅ byte-identical |
| `0x...0090` Randomize code | 823 B | 823 B | 823 B | ✅ byte-identical |
| `0x...0099` MultiSig code | 5442 B | 5442 B | 5442 B | ✅ byte-identical |

Validator storage slots (stake cap, min stake, decimals, unlock duration) are
identical to mainnet after the puppeth storage-extractor fix.

Intentional differences: chainId, signer count (4 vs mainnet's evolving set),
V2 switch block (999,999,900 to keep private V1-only for the test).

### Mining test results

After all three fixes, the 4-node network was left running for 66 minutes:

| Metric | Result |
|---|---|
| Blocks produced | 2003 (≈ expected at 2s period; exactly 66 min would give 1980) |
| Average block time (last 20) | **2.00s** |
| Head agreement across 4 nodes | within 1 block |
| BAD BLOCK rate | 2 / 2003 = **0.1%** (transient `block in the future` races) |
| Pre-fix BAD BLOCK rate | ~99% (`err=unauthorized` on every seal) |

Signer participation was evenly distributed, with each signer within 1% of an equal share:

```
0xb5c6...c72b (node1)  7982 signatures observed
0x1000...f753 (node4)  7913 signatures observed
0x5f4a...6b40 (node2)  7899 signatures observed
0x6fc1...6ae3 (node3)  7898 signatures observed
```

`node1`'s `XdcAgent` alone successfully sealed 7991 times.
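The headline figures can be rechecked with a few lines of arithmetic. The inputs are the numbers reported above; the helper names are illustrative. `expectedBlocks` gives the nominal count for exactly 66 minutes, so a small difference from the observed total is consistent with a run slightly longer than the rounded duration.

```go
package main

import (
	"fmt"
	"math"
)

// expectedBlocks is the nominal block count for a run of the given length.
func expectedBlocks(minutes, periodSec int) int {
	return minutes * 60 / periodSec
}

// maxDeviationPct returns the largest relative distance of any count from
// the mean, as a percentage.
func maxDeviationPct(counts []int) float64 {
	sum := 0
	for _, c := range counts {
		sum += c
	}
	mean := float64(sum) / float64(len(counts))
	worst := 0.0
	for _, c := range counts {
		worst = math.Max(worst, math.Abs(float64(c)-mean)/mean*100)
	}
	return worst
}

func main() {
	counts := []int{7982, 7913, 7899, 7898} // signatures observed per signer
	fmt.Println("nominal blocks in 66 min at 2s:", expectedBlocks(66, 2))
	fmt.Printf("BAD BLOCK rate: %.2f%%\n", 2.0/2003*100)
	fmt.Printf("max deviation from equal share: %.2f%%\n", maxDeviationPct(counts))
}
```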

### Cross-source consensus identity

The V1 snapshot and the on-chain validator contract agree on the authorized signer set:

- `xdpos.getSigners()` (V1 snapshot) — 4 addresses, exact match with genesis
- `eth_call` to validator contract `getCandidates()` — same 4 addresses
- `getCandidateOwner` returns each signer as its own owner
- `getCandidateCap` returns 50,000 XDC per signer (matches puppeth's hardcoded validatorCap)

### Startup log evidence

```
[XDPoS] V1 consensus hooks attached (#72, #97)
XDPoS: keystore unlocked for sealing             address=0xb5c6...c72b
XDPoS: signer authorized for sealing (direct SignHash) address=0xb5c6...c72b
XdcAgent: started V2 sealing trigger loop        interval=500ms
XdcAgent: started sealing loop                   etherbase=0xb5c6...c72b
Stats server connected                           url=stats.xdcindia.com:443 node=XDC-Private-Node1
```

The `direct SignHash` literal confirms the patched code path is active.

## Applicability to mainnet and testnet

| Chain | Engine at current head | Affected by Bug 1 | Affected by Bug 2 | Affected by Bug 3 |
|---|---|---|---|---|
| Mainnet (chainId 50) | V2 (≥80,370,000) | Yes — `Authorize` propagates via wrapper to `EngineV2.Authorize` | No — V2 uses round-based leader, no cooldown block | Yes |
| Testnet (chainId 51) | V2 (≥56,828,700) | Yes (same path) | No | Yes |
| Private (chainId 99999) | V1 (no switch in genesis) | Yes | **Yes — observed stall at block 4 pre-fix** | Yes |

Sync to tip on Apothem and mainnet was already validated in PRs #577–#593.
With PR #594, the binary can also **mine** on whichever chain this node is
authorized to seal on. For mainnet/testnet that requires the local etherbase
to be in the active masternode set — not testable from a non-masternode
node, but the code paths are exercised end-to-end via the private V1 test.
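The engine column in the table follows from comparing the head height with the V2 switch block. The sketch below encodes only the switch heights quoted in this document; `engineAt` and `switchBlocks` are illustrative names, and the inclusive comparison follows the "≥" used in the table rather than a reading of the engine source.

```go
package main

import "fmt"

// switchBlocks holds the V2 switch heights quoted in this document,
// keyed by chainId.
var switchBlocks = map[uint64]uint64{
	50:    80370000,  // mainnet
	51:    56828700,  // testnet
	99999: 999999900, // private test network, kept V1-only
}

// engineAt returns "V2" at or above the switch height and "V1" below it.
func engineAt(chainID, number uint64) (string, error) {
	sw, ok := switchBlocks[chainID]
	if !ok {
		return "", fmt.Errorf("unknown chainId %d", chainID)
	}
	if number >= sw {
		return "V2", nil
	}
	return "V1", nil
}

func main() {
	for _, c := range []struct{ chainID, head uint64 }{
		{50, 102000000}, // roughly the mainnet height reached during sync validation
		{99999, 2003},   // private network head after the mining test
	} {
		v, err := engineAt(c.chainID, c.head)
		if err != nil {
			panic(err)
		}
		fmt.Printf("chainId %d at block %d: %s\n", c.chainID, c.head, v)
	}
}
```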

## Running the test yourself

The 4-node private network harness is kept in a local `private/`
subdirectory of a `XinFinOrg/XinFin-Node` checkout (not committed to that
repo at the time of writing). It contains:

- `genesis.json` — puppeth-generated, mainnet-identical contracts, V1-only
- `start-cluster.sh`, `stop-cluster.sh`
- `FORK_TESTING.md` — fork-by-fork test plan

To exercise the patched binary:

```bash
# from go-ethereum/
make geth

# from XinFin-Node/private/
./start-cluster.sh
# wait ~10s, then
curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://127.0.0.1:8501
# head should advance ~1 block every 2 seconds
```

The miner password is read from `XDC_MINER_PASSWORD`; `start-cluster.sh`
exports the per-node passphrase before exec'ing geth.
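The same head check can be scripted in Go for use in a longer-running harness. This is a minimal sketch: the RPC URL is the one from the `curl` example, the 10-second sampling interval is an arbitrary choice, and `parseBlockNumber` and `blockNumber` are illustrative helpers rather than part of the harness.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
	"time"
)

const request = `{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}`

// parseBlockNumber extracts the hex-encoded height from a JSON-RPC response.
func parseBlockNumber(body []byte) (uint64, error) {
	var resp struct {
		Result string `json:"result"`
		Error  *struct {
			Message string `json:"message"`
		} `json:"error"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	if resp.Error != nil {
		return 0, fmt.Errorf("rpc error: %s", resp.Error.Message)
	}
	return strconv.ParseUint(strings.TrimPrefix(resp.Result, "0x"), 16, 64)
}

// blockNumber posts eth_blockNumber to url and returns the parsed height.
func blockNumber(url string) (uint64, error) {
	res, err := http.Post(url, "application/json", bytes.NewBufferString(request))
	if err != nil {
		return 0, err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return 0, err
	}
	return parseBlockNumber(body)
}

func main() {
	const url = "http://127.0.0.1:8501"
	first, err := blockNumber(url)
	if err != nil {
		fmt.Println("node not reachable:", err)
		return
	}
	time.Sleep(10 * time.Second)
	second, err := blockNumber(url)
	if err != nil {
		fmt.Println("node not reachable:", err)
		return
	}
	// At a 2s period roughly 5 new blocks are expected in 10 seconds.
	fmt.Printf("head advanced %d blocks in 10s\n", int64(second)-int64(first))
}
```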

## Files touched

- `consensus/XDPoS/xdpos.go` (+10 / -2): cooldown deadlock fix
- `eth/backend.go` (+45 / -2): `engine.Authorize` wiring + keystore unlock
- `ethstats/ethstats.go` (+28 / -20): `consensus` field

Total: 3 files, +83 / -24.
