# V6 Sync Fix Spec — Non-Restarting Body Fetch with Peer Demotion

## Problem
`fetchBodiesXDC()` in v5b returns `errTimeout` when the body download stalls. That error aborts the entire `synchronise()` call, and each restart costs a 3-5 s gap. Result: ~280 bl/s actual vs ~750+ bl/s potential.

## Solution
Port the model used by v268's `fetchParts()` to `fetchBodiesXDC()`: the fetch loop does not restart on a stall, and slow peers are demoted based on QoS measurements instead.

## Files to Change
- `eth/downloader/downloader_xdc.go` — main changes
- `eth/downloader/peer.go` — add throughput tracking (optional)

## Core Changes

### 1. Replace `return errTimeout` with Peer Demotion

**Before (v5b):**
```go
// fetchBodiesXDC line ~700
if time.Since(lastProgress) > stallTimeout {
    log.Warn("XDC sync: body download stalled", ...)
    return errTimeout  // KILLS SYNC CYCLE
}
```

**After (v6):**
```go
// On stall: expire timed-out requests, re-queue work, continue
if time.Since(lastProgress) > stallTimeout {
    for pid, req := range inFlight {  // req is a *peerRequest (see section 3)
        if time.Since(req.startTime) > peerTimeout {
            log.Debug("XDC sync: peer body timeout, demoting", "peer", pid[:8])
            d.queue.CancelBodies(pid)  // re-queue their work
            delete(inFlight, pid)
            peerTimeouts[pid]++
            lastProgress = time.Now()  // reset stall timer
        }
    }
    // Only return error if ALL peers exhausted
    if d.peers.Len() == 0 {
        return errNoPeers
    }
}
```

### 2. Add Ticker-Driven Dispatch (100ms)

**Before:** Dispatch happens in a tight loop with `select` + `default`.  
**After:** 100ms ticker drives dispatch like v268:

```go
ticker := time.NewTicker(100 * time.Millisecond)
defer ticker.Stop()

for {
    select {
    case <-d.cancelCh:
        return errCanceled
    case <-ticker.C:
        // Dispatch to idle peers
        // Check for timeouts
        // Deliver received bodies
    case body := <-xdcBodyCh:
        // Process delivery
    case cont := <-d.queue.blockWakeCh:
        // Header fetcher wake-up; in upstream go-ethereum, cont == false means no more headers are coming
    }
}
```

### 3. Track Per-Peer Request Timing

```go
type peerRequest struct {
    headers   []*types.Header
    startTime time.Time
}

inFlight := make(map[string]*peerRequest)
peerTimeout := 10 * time.Second  // per-peer timeout (not global stall)
```

### 4. QoS Throughput Tracking (Optional Enhancement)

```go
type peerThroughput struct {
    blocksDelivered int
    totalTime       time.Duration
    batchSize       int  // adjusted dynamically
}

// On successful delivery:
pt.blocksDelivered += delivered
pt.totalTime += time.Since(req.startTime)
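// int(Seconds()) truncates, so any total under 1s is treated as 1s.
// The built-in min/max require Go 1.21 or later.
pt.batchSize = min(512, pt.blocksDelivered * 256 / max(1, int(pt.totalTime.Seconds())))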

// On timeout:
pt.batchSize = max(64, pt.batchSize / 2)
```

### 5. Peer Lifecycle Events

```go
// Subscribe to peer connect/disconnect
peering := make(chan *peeringEvent, 64)
peeringSub := d.peers.SubscribeEvents(peering)
defer peeringSub.Unsubscribe()

// In main loop:
case event := <-peering:
    if event.join {
        // New peer — immediately dispatch pending work
    }
```
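A sketch of how a single peering event might be applied to the loop's state. Everything here is illustrative: the local `peeringEvent` carries only an ID and a join flag, `handlePeering` is a hypothetical helper, and re-queueing a departed peer's outstanding work is an assumption, since the snippet above only covers the join case.

```go
package main

import "fmt"

// peeringEvent is a local stand-in carrying only what the sketch needs.
type peeringEvent struct {
	id   string
	join bool
}

// handlePeering applies one event. On join, the peer is added to the idle
// set and the caller is told to dispatch immediately. On leave, the peer
// is dropped and the size of its outstanding request is returned so the
// caller can re-queue that work (ASSUMPTION, see the lead-in).
func handlePeering(ev peeringEvent, idle map[string]struct{}, inFlight map[string]int) (dispatchNow bool, requeue int) {
	if ev.join {
		idle[ev.id] = struct{}{}
		return true, 0
	}
	delete(idle, ev.id)
	n := inFlight[ev.id] // zero if the peer had nothing outstanding
	delete(inFlight, ev.id)
	return false, n
}

func main() {
	idle := map[string]struct{}{}
	inFlight := map[string]int{"peer-1": 128}

	dispatch, _ := handlePeering(peeringEvent{id: "peer-2", join: true}, idle, inFlight)
	fmt.Println("dispatch after join:", dispatch)

	_, n := handlePeering(peeringEvent{id: "peer-1", join: false}, idle, inFlight)
	fmt.Println("bodies to re-queue after leave:", n)
}
```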

## Testing Plan
1. Build canary image: `gx:fast-sync-v6-<commit>`
2. Deploy to xdc02 apothem (lowest risk, fastest to validate)
3. Compare bl/s: v5b (current) vs v6 over 10-min windows
4. If ≥500 bl/s is sustained with no stalls → promote to mainnet canary
5. If regressions appear → roll back to the v5b image

## Success Criteria
- Sustained sync rate ≥500 bl/s (vs v5b's ~280 bl/s)
- No sync cycle restarts in logs
- Peer count stable ≥5 (vs v5b's 1-6 fluctuating)
- No data corruption (verify block hashes match v268 reference)

## Rollback
- Keep v5b image tagged and ready
- All nodes can revert by stopping the v6 container and starting one from the v5b image (`docker stop`, then `docker run`)
