# Upstream Sync Alignment Proposal (v2 — true upstream alignment)

**Status:** Proposal · **Revised:** 2026-05-12 — replaces the v1 plan that proposed reviving deleted upstream code · **Target:** Q3 2026

## What changed from v1

The v1 proposal (the original issue #576 body) included porting the pre-merge `eth/fetcher.BlockFetcher` from `v1.10.26`. That code path was **deleted upstream** post-merge; porting it back means maintaining a fork of dead code that will never see upstream patches — the opposite of the alignment goal.

This v2 replaces the fetcher port with **engine-API-style block delivery**, the pattern current upstream geth uses. Same latency outcome (≤ block-time lag), but it tracks code that upstream still maintains.

The other changes are structural: the v1 plan bundled architectural rewrite + merge-workflow improvements + latency fix into one 9-week effort. v2 carves them into independently-shippable pieces so each one stands on its own merits.

## Problem

XDC currently diverges from upstream geth in four shared files:

| File | What XDC adds | LOC inline |
|---|---|---|
| `core/blockchain.go` | `checkpointSyncNoState`, `XdcBulkSyncMode` guards, `InsertHeadersBeforeCutoff`, V2 snapshot pre-seed, `RewindHead` | ~480 |
| `eth/handler.go` | `xdcSyncer` wiring, NewBlockHashes legacy cases | ~30 |
| `eth/handler_eth.go` | NewBlockPacket / NewBlockHashes legacy cases, propagation guard | ~80 |
| `eth/downloader/downloader.go` | Cutoff-aware processHeaders, ancient-store insertion, tip-fork recovery | ~50 |

Every quarterly upstream merge conflicts on these files. Today's behaviour is correct (chain syncs, ~2 s tail-lag after PRs #577/#579/#580) but the architectural debt grows with each merge.

A second class of divergence is the parallel sync pipeline (`xdcSyncer` + `downloader_xdc.go`), which exists because XDC peers (v268) speak a legacy hash-based protocol without RequestId framing. That divergence is permanent at the wire level, but it doesn't have to mean a parallel state machine.

## Goals

1. **Mergeable upstream tracking.** Quarterly upstream-version bumps should be hours, not weeks. Security patches and EVM forks apply cleanly.
2. **Single sync state machine.** Legacy-protocol peers become a peer-version-dispatched subcase inside upstream's `Downloader`, not a parallel codepath.
3. **Streaming tail-follow.** Propagated blocks import as they arrive (≤ 2 s lag = one block-time) using the pattern current upstream uses, not the deleted one.
4. **Track upstream, not v2.6.8.** Where current upstream geth and v2.6.8 diverge in patterns (e.g. `eth/fetcher.BlockFetcher` vs engine-API delivery), prefer upstream.

## Non-goals

- Removing trusted-checkpoint anchor (`--syncfromblock`) — XDC-specific, stays. Moves to a sidecar.
- Removing `checkpointSyncNoState` — XDC-specific, stays. Moves to a sidecar.
- Migrating XDC mainnet to eth/68+ wire protocol — out of scope. v268 stays at the wire level.
- Re-introducing pre-merge `eth/fetcher.BlockFetcher`. Use engine-API-style delivery instead.

## Phases — independently shippable

Each phase is a standalone deliverable with its own value proposition. Pick any one without committing to the others.

### Phase 0 — `processFullSyncContent` race fix (P1, ~half day)

Standalone bug fix. Under XDC's fetcher arrangement, `spawnSync` doesn't reach its `i == len-1` `queue.Close` call cleanly, so `processFullSyncContent` waits on `Results(true)` forever. Today the hang is masked by the stall watchdog.

The work is to root-cause the race in isolation with a reproducible test case, then fix it. That removes the need for the stall watchdog regardless of whether any other phase happens.
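As a starting point for the reproducible test case, the hang can be modelled outside the downloader. The sketch below is a toy, not the downloader's code: `resultQueue`, `runSync`, and `syncTimeout` are hypothetical names, and the only property it illustrates is that a consumer blocked in a `Results`-style wait exits if and only if `Close` runs after the last fetcher returns.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// syncTimeout bounds the wait so a regression fails instead of hanging.
const syncTimeout = 2 * time.Second

// resultQueue is a toy stand-in for the downloader queue (hypothetical type).
type resultQueue struct {
	mu     sync.Mutex
	cond   *sync.Cond
	items  []int
	closed bool
}

func newResultQueue() *resultQueue {
	q := &resultQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Deliver adds one result and wakes any waiting consumer.
func (q *resultQueue) Deliver(v int) {
	q.mu.Lock()
	q.items = append(q.items, v)
	q.mu.Unlock()
	q.cond.Broadcast()
}

// Close marks the queue finished and wakes any waiting consumer.
func (q *resultQueue) Close() {
	q.mu.Lock()
	q.closed = true
	q.mu.Unlock()
	q.cond.Broadcast()
}

// Results blocks until results exist or the queue is closed. An empty
// return value therefore means "closed and fully drained".
func (q *resultQueue) Results() []int {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 && !q.closed {
		q.cond.Wait()
	}
	out := q.items
	q.items = nil
	return out
}

// runSync runs the fetchers next to a consumer loop and closes the queue once
// every fetcher has returned. It reports how many results the consumer saw
// and whether the consumer exited before syncTimeout.
func runSync(q *resultQueue, fetchers []func()) (int, bool) {
	var wg sync.WaitGroup
	for _, fn := range fetchers {
		wg.Add(1)
		go func(fn func()) {
			defer wg.Done()
			fn()
		}(fn)
	}
	done := make(chan int, 1)
	go func() {
		n := 0
		for {
			res := q.Results()
			if len(res) == 0 {
				done <- n
				return
			}
			n += len(res)
		}
	}()
	wg.Wait()
	q.Close() // the step that must never be skipped
	select {
	case n := <-done:
		return n, true
	case <-time.After(syncTimeout):
		return 0, false
	}
}

func main() {
	q := newResultQueue()
	n, ok := runSync(q, []func(){
		func() { q.Deliver(1); q.Deliver(2) },
		func() { q.Deliver(3) },
	})
	fmt.Println(n, ok) // prints "3 true"
}
```

Removing the `q.Close()` line makes `runSync` report `false` after the timeout, which is the shape a regression test for the real fix could take.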

**Value:** removes ~80 lines of workaround in `eth/sync_xdc.go`. Latency unchanged (still cycle-based) but no more "timeout exceeded" / "stalled" log lines. Architectural-debt reduction.

### Phase 1 — Sidecar refactor (P1, ~1 week)

Move every inline XDC block in shared files to a sidecar:

- `core/blockchain.go`'s `checkpointSyncNoState` logic → `core/blockchain_xdc.go`, accessed via methods on a small interface that the shared `blockchain.go` calls.
- `eth/downloader/downloader.go`'s cutoff-aware processHeaders → `eth/downloader/downloader_xdc.go` (already partially split).
- `eth/handler*.go` legacy-packet cases → `eth/handler_xdc.go`.

Shared files end up with ~5 lines of indirection where they had ~50 lines of inline `if XdcBulkSyncMode.Load()` blocks. Conflict surface drops ~80 %.
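The indirection described above could look roughly like the following. This is a sketch under loud assumptions: `xdcHooks`, `noopHooks`, `checkpointHooks`, and `SkipStateCheck` are invented for illustration, and the rule "blocks below a cutoff are inserted without state" is a placeholder for whatever `checkpointSyncNoState` actually decides.

```go
package main

import "fmt"

// xdcHooks is a hypothetical extension surface: the only XDC-specific
// symbol the shared file would reference.
type xdcHooks interface {
	// SkipStateCheck reports whether a block at this height is inserted
	// without state (placeholder semantics, assumed for illustration).
	SkipStateCheck(number uint64) bool
}

// noopHooks preserves plain upstream behaviour.
type noopHooks struct{}

func (noopHooks) SkipStateCheck(uint64) bool { return false }

// checkpointHooks is what would live in the sidecar file.
type checkpointHooks struct{ cutoff uint64 }

func (h checkpointHooks) SkipStateCheck(n uint64) bool { return n < h.cutoff }

// chain is a toy stand-in for the shared BlockChain type.
type chain struct{ hooks xdcHooks }

// insert shows the single line of indirection left in the shared file.
func (c *chain) insert(number uint64) string {
	if c.hooks.SkipStateCheck(number) { // the only XDC-aware line
		return "inserted without state"
	}
	return "inserted with state"
}

func main() {
	upstream := &chain{hooks: noopHooks{}}
	xdc := &chain{hooks: checkpointHooks{cutoff: 100}}
	fmt.Println(upstream.insert(50)) // prints "inserted with state"
	fmt.Println(xdc.insert(50))      // prints "inserted without state"
}
```

The design point is that an upstream merge touching `insert` conflicts on at most one line, while everything behind the interface sits in a file upstream never edits.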

**Value:** quarterly upstream merge cost drops from "weeks" to "hours". Independently valuable — no behaviour change.

### Phase 2 — Upstream tracking workflow (P0, ~1 hour setup)

Operational, not architectural:

1. `git remote add upstream https://github.com/ethereum/go-ethereum.git`
2. Pin to tagged releases (`v1.14.x`), not master. Quarterly minor-version bumps.
3. `git config rerere.enabled true` — repeated conflict patterns auto-replay.
4. Tag every merged base: `xdc-base-v1.14.x`. Enables bisect across mixed upstream + XDC history.
5. CI matrix runs Apothem sync regression tests on every merge PR.

**Value:** the discipline that makes Phase 1's investment compound. Without Phase 2, even Phase 1's refactor decays as new XDC features add inline edits.

### Phase 3 — Engine-API-style block delivery (P0, ~1 week)

Replaces the current `NewBlockPacket` → `InsertChain` path (and the deleted-upstream fetcher pattern) with the **current** upstream pattern: a thin block-import API that mirrors the `engine_newPayloadV*` shape, fed from XDC's consensus layer at the network handler.

```go
// New: core/blockchain.go — upstream-style entry point
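func (bc *BlockChain) NewPayload(block *types.Block) error {
    // Mirror engine_newPayloadV3 semantics: validate, insert, return status.
    // Replaces the bespoke handler_eth.go InsertChain call.
    // Sketch only: delegating to InsertChain is an assumption, not the final design.
    _, err := bc.InsertChain(types.Blocks{block})
    return err
}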
```

`handler_eth.go`'s `NewBlockPacket` becomes:

```go
case *eth.NewBlockPacket:
    return h.chain.NewPayload(packet.Block) // or queue for skeleton sync if parent unknown
```

For missing parents: use upstream's existing **skeleton-sync** path (`eth/downloader/skeleton.go`) — the post-merge equivalent of pre-merge fetcher's ancestor walk-back. Skeleton sync is what upstream uses today; we get its improvements for free on every merge.
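The known-parent versus missing-parent split can be sketched as a status-returning import call. The model below is self-contained and makes no claim about the real `BlockChain` or skeleton-sync APIs: `toyChain`, `block`, and `syncTargets` are hypothetical, and only the status names (`VALID`, `INVALID`, `SYNCING`) are borrowed from the engine API.

```go
package main

import "fmt"

// payloadStatus borrows the status names used by the engine API.
type payloadStatus string

const (
	statusValid   payloadStatus = "VALID"
	statusInvalid payloadStatus = "INVALID"
	statusSyncing payloadStatus = "SYNCING"
)

// block is a toy header: just enough to link a child to its parent.
type block struct {
	Number uint64
	Hash   string
	Parent string
}

// toyChain stands in for BlockChain plus the sync trigger (hypothetical type).
type toyChain struct {
	known       map[string]uint64 // hash -> number of imported blocks
	syncTargets []string          // hashes handed to the backfill path
}

func newToyChain(genesis string) *toyChain {
	return &toyChain{known: map[string]uint64{genesis: 0}}
}

// NewPayload imports a block whose parent is known, rejects one whose number
// does not follow its parent, and defers one whose parent is missing.
func (c *toyChain) NewPayload(b block) payloadStatus {
	if _, ok := c.known[b.Hash]; ok {
		return statusValid // already imported, nothing to do
	}
	parentNum, ok := c.known[b.Parent]
	if !ok {
		// Parent unknown: record the block as a sync target instead of
		// importing it. In the proposal this is where skeleton sync starts.
		c.syncTargets = append(c.syncTargets, b.Hash)
		return statusSyncing
	}
	if b.Number != parentNum+1 {
		return statusInvalid
	}
	c.known[b.Hash] = b.Number
	return statusValid
}

func main() {
	c := newToyChain("g")
	fmt.Println(
		c.NewPayload(block{Number: 1, Hash: "a", Parent: "g"}),
		c.NewPayload(block{Number: 3, Hash: "c", Parent: "b"}),
		c.NewPayload(block{Number: 5, Hash: "x", Parent: "a"}),
	) // prints "VALID SYNCING INVALID"
}
```

Returning a status rather than a bare error lets the network handler distinguish "drop this peer's block" from "start backfilling", which a single `error` return cannot express without sentinel values.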

**Value:** tail-follow latency drops to ≤ 2 s (one block-time). Stall watchdog goes away. Propagation guard (`HasBlock(parent)`) goes away. The architecture matches what upstream maintains in 2026, not what it deleted in 2022.

### Phase 4 — Protocol bridge for v268 peers (P2, ~2–3 weeks)

The remaining permanent divergence is the wire protocol. v268 peers don't speak eth/68+ RequestId framing. Instead of forking the entire `Downloader`, **wrap each peer in a version-aware adapter**:

```go
// eth/protocols/eth/peer_v268.go (package eth, so the embedded type is plain Peer)
type v268Peer struct {
    *Peer
    pending map[uint64]chan *response // locally assigned reqId → response correlator
}

func (p *v268Peer) RequestHeadersByHash(hash common.Hash, ...) error {
    // Translate the upstream RequestId-based call to v268 wire format.
    // Correlate the response via local pending map.
}
```

Upstream `Downloader` calls `peer.RequestHeadersByHash(...)` polymorphically. The adapter translates underneath. **All XDC-specific protocol code lives in one file under `eth/protocols/eth/`**, not scattered across `downloader_xdc.go` + custom fetcher pipeline.
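One way the correlator could work, since v268 replies carry no id, is to assign ids locally and match replies to the oldest outstanding request. The sketch below assumes replies arrive in request order per peer and message type, which is exactly what open question 1 would need to confirm. `legacyWire`, `v268Adapter`, `response`, and `fakeWire` are hypothetical names, not upstream or XDC types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// response is a hypothetical reply envelope handed back to the caller.
type response struct {
	ReqID   uint64
	Headers []string
}

// legacyWire is a hypothetical send-only view of a v268 connection.
type legacyWire interface {
	SendGetHeaders(origin string, amount int) error
}

// v268Adapter gives id-less legacy traffic a RequestId-style surface.
type v268Adapter struct {
	wire    legacyWire
	mu      sync.Mutex
	nextID  uint64
	order   []uint64                  // outstanding ids, oldest first
	pending map[uint64]chan *response // local reqId -> waiting caller
}

func newV268Adapter(w legacyWire) *v268Adapter {
	return &v268Adapter{wire: w, pending: make(map[uint64]chan *response)}
}

// RequestHeaders sends an id-less request and returns the locally assigned id.
// The lock is held across the send so id order matches wire order.
func (a *v268Adapter) RequestHeaders(origin string, amount int, sink chan *response) (uint64, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if err := a.wire.SendGetHeaders(origin, amount); err != nil {
		return 0, err
	}
	a.nextID++
	a.order = append(a.order, a.nextID)
	a.pending[a.nextID] = sink
	return a.nextID, nil
}

// HandleHeaders matches an id-less reply to the oldest outstanding request
// (assumes in-order replies) and delivers it to that request's sink.
func (a *v268Adapter) HandleHeaders(headers []string) error {
	a.mu.Lock()
	if len(a.order) == 0 {
		a.mu.Unlock()
		return errors.New("unsolicited headers reply")
	}
	id := a.order[0]
	a.order = a.order[1:]
	sink := a.pending[id]
	delete(a.pending, id)
	a.mu.Unlock()
	sink <- &response{ReqID: id, Headers: headers}
	return nil
}

// fakeWire records sends so the example runs without a network.
type fakeWire struct{ sent []string }

func (f *fakeWire) SendGetHeaders(origin string, amount int) error {
	f.sent = append(f.sent, origin)
	return nil
}

func main() {
	a := newV268Adapter(&fakeWire{})
	sink := make(chan *response, 2)
	id1, _ := a.RequestHeaders("0xaa", 1, sink)
	id2, _ := a.RequestHeaders("0xbb", 1, sink)
	_ = a.HandleHeaders([]string{"h-aa"})
	_ = a.HandleHeaders([]string{"h-bb"})
	r1, r2 := <-sink, <-sink
	fmt.Println(r1.ReqID == id1, r2.ReqID == id2) // prints "true true"
}
```

If v268 peers can reorder or interleave replies, FIFO matching is not enough and the adapter would need content-based matching (for example, checking the first returned header against the requested origin).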

**Value:** `downloader_xdc.go` (~1500 lines) can be deleted, leaving a single sync state machine. The only XDC code in the sync path is the peer-version adapter, a well-defined extension point.

## Migration plan — ship in order, pause anywhere

| # | Phase | Effort | Risk | Stops watchdog | Stops divergence |
|---|---|---|---|---|---|
| 0 | `processFullSyncContent` race fix | ½ day | Low | Yes | Partially (~80 LOC) |
| 1 | Sidecar refactor | 1 week | Low | No | Yes (~640 LOC) |
| 2 | Upstream tracking workflow | 1 hr | None | No | Operational |
| 3 | Engine-API-style block delivery | 1 week | Medium | Yes | Yes |
| 4 | v268 peer adapter | 2–3 weeks | Medium | No | Yes (deletes `downloader_xdc.go`) |

**Recommended sequence:** 2 → 0 → 1 → 3 → bake on Apothem for 4 weeks → 4.

Phase 2 comes first because it's free and the workflow gates the value of every other phase. Phase 0 is next because it removes a workaround at low cost (about half a day). Phase 1 is third because it's the biggest merge-pain reduction and is independently reversible. Phase 3 is fourth because it gets us to the latency target on current upstream patterns. Phase 4 happens only if Phases 1–3 don't deliver enough alignment.

## What this proposal explicitly does NOT do

- **Does not port `eth/fetcher.BlockFetcher` from pre-merge geth.** That was the v1 plan; v2 replaces it with engine-API-style delivery because the former is dead code upstream.
- **Does not assume v268 protocol is going away.** Phase 4 wraps it cleanly rather than waiting for the ecosystem to migrate.
- **Does not commit to all phases.** Each phase ships standalone.

## Phase boundary checks (when to stop)

- After Phase 0+1+2: re-measure quarterly merge cost. If it's already in the "hours" range, Phases 3+4 become a latency project, not a merge project.
- After Phase 3: re-measure tail-lag. If ≤ 2 s and stable, the latency goal is met; Phase 4 becomes pure cleanup.
- If at any phase the new architecture regresses BFT consensus correctness (validator vote/timeout error rates above pre-rollout baseline), roll back and reassess.

## Open questions

1. Does v268 have semantic differences beyond RequestId framing (e.g. XDPoS vote/timeout messages multiplexed on the same channels as headers/bodies)? Affects whether Phase 4's peer adapter is sufficient or needs deeper abstraction.
2. Where does BFT-specific logic live in the unified architecture? Tip-fork rewind, V2 snapshot seeding, trusted-checkpoint anchor — these are `consensus.Engine`-level concerns that need clear hook points in the unified pipeline. Spell out the BFT extension surface as part of Phase 3.
3. Skeleton sync was designed for beacon-chain-driven targets. XDC doesn't have a beacon chain — the target comes from peer broadcast. Audit needed for whether skeleton sync's assumptions hold.

## References

- Issue #576 — this proposal's home.
- Issue #578 — tail-lag tuning (Phase 0 dependency — current watchdog will be removed by Phase 0+3).
- PR #577 — initial sync fixes (V2 snapshot split, single-round synchronise, propagation guard).
- PR #579 — adaptive stall window + tip-fork recovery.
- PR #580 — surgical `RewindHead`.
- Upstream `eth/downloader/skeleton.go` (skeleton sync, the post-merge counterpart to the pre-merge block fetcher's ancestor walk-back).
- Upstream `eth/catalyst/api.go` (the engine-API pattern for block delivery).
