Files
chat_grid/refactor.md

102 lines
5.2 KiB
Markdown
Raw Normal View History

2026-02-20 08:16:43 -05:00
# Rewrite Plan: Modern Cross-Browser Spatial Grid
## What This Code Is Today
The current code is a functional prototype: one large HTML file with tightly coupled UI/game/network/audio logic, plus a single Python signaling script. It proves the concept, but it is hard to test, hard to evolve safely, and brittle across browser differences.
At its core, the product is a realtime spatial chat grid with movement and command-driven interaction.
## Rewrite Strategy (No Backward-Compatibility Constraints)
Build a new app in parallel, then cut over once parity + quality gates pass.
V1 explicitly ships without TURN relay infrastructure.
Design the new core so future features (objects, pickups, walls, collisions, and interaction rules) can be added without architectural rework.
## Target Architecture
- `client/`: TypeScript + Vite + Canvas renderer + state store.
- `server/`: Python signaling service using `websockets` (schema-validated).
- `shared/`: Message contracts (JSON schema + generated TS types).
- `tests/`: Unit + Playwright E2E (Chromium, Firefox, WebKit).
- `docs/`: Browser compatibility matrix and ops runbook.
- Core domain model:
- world map + tile metadata (walkable/blocked/zone)
- entities (`player`, `object`, future NPC/system entities)
- actions/commands (move, rename, locate, interact, pickup, use)
- simulation rules (movement, collision, proximity effects)
## Technology Choices
- Client: TypeScript, Vite, ESLint, Prettier, Vitest.
- Realtime: WebSocket signaling + WebRTC media.
- Server runtime: Python + `websockets`.
- Validation: Zod/JSON Schema for all inbound/outbound messages.
- Deployability: Environment-based config only (no hardcoded cert paths).
- ICE for v1: STUN-only (`stun:stun.l.google.com:19302`) with robust retry and failure handling.
## Phases
### Phase 1: Parity Baseline + Browser Hardening (5-8 days)
- Scaffold monorepo structure.
- Define protocol schemas: `welcome`, `signal`, `update_position`, `update_nickname`, `user_left`.
- Implement strict server validation and structured logging.
- Rebuild current behavior with parity:
- grid render + movement + presence sync
- existing commands: `c`, `l`, `shift+l`, `u`, `n`, `m`, `escape`
- nickname flow and reconnect/disconnect behavior
- Cross-browser hardening for latest Chrome/Edge/Firefox/Safari:
- keyboard handling via `event.code`
- capability checks for `setSinkId`, `StereoPannerNode`, autoplay/permissions differences
- explicit no-TURN recovery UX and bounded retry/backoff
- Keep grid/presence functional even if voice fails on restrictive networks.
- V1 server requirements (Python-focused):
- Use Python `websockets` for signaling transport.
- Enforce strict message validation (Pydantic/JSON schema) on receive and send.
- Add structured logging and websocket behavior tests.
### Phase 2: World + Extensibility Architecture (3-5 days)
- Introduce world + entity foundation:
- tile map abstraction with collision checks
- player entity and object entity schema
- action dispatcher for current commands and future interactions
- Keep simulation pure and testable (state in, state out).
### Phase 3: Advanced Audio + WebRTC Robustness (2-4 days)
- Implement peer connection manager and retry/timeout policy.
- Build browser capability layer:
- `setSinkId` optional
- `StereoPannerNode` optional
- autoplay/promise failure handling
- Graceful degradation: if unsupported, keep grid/presence fully functional and reduce to basic audio.
- Implement explicit no-TURN recovery UX:
- Detect ICE `failed`/`disconnected` states and auto-retry with bounded backoff.
- Surface actionable status text (network-restricted, retrying, voice unavailable).
- Keep text/status + grid presence fully functional when voice cannot connect.
### Phase 4: Quality Gates (2-4 days)
- Unit tests for state, protocol, input, and audio math.
- Playwright multi-user E2E in Chromium/Firefox/WebKit.
- Add CI for lint, typecheck, unit tests, and cross-browser smoke tests.
- Add world-rule tests:
- wall collision and blocked-tile movement rejection
- object pickup/use action validation
- deterministic command outcomes
### Phase 5: Cutover (1-2 days)
- Deploy rewrite behind new route or domain.
- Run soak tests and monitor connection/error metrics.
- Decommission old prototype once stable.
## Definition of Done
- Grid + movement + presence stable in latest Chrome, Edge, Firefox, Safari.
- Audio works where supported and degrades cleanly where not.
- Zero runtime dependence on inline scripts or CDN Tailwind runtime.
- One-command local startup and passing CI.
- Known no-TURN limitation documented: some restrictive NAT/firewall networks may not establish voice.
## Post-v1 TURN Trigger
Add TURN when either condition is met:
- Voice connection failure rate exceeds agreed threshold in production telemetry.
- Target users include enterprise/school/mobile networks where relay need is expected.
## Recommended First PR
Create repo skeleton + protocol schema + server validator + parity client slice that renders grid, syncs positions, and supports current commands.
## Recommended Second PR
Add browser hardening completion (capability fallbacks, reconnect UX) and Playwright parity tests across Chromium/Firefox/WebKit.