Professional Services Automated Build & Test
A layered testing methodology for a real-time game
Call of Orion is a Python/Arcade space-survival game with thousands of moving parts — physics, combat AI, pathfinding, save/load, and a live render loop. Keeping it stable takes a deliberate test pyramid: a wide base of fast unit tests, a middle tier of headless integration and performance gates that assert real frame-rate thresholds, and a narrow top of multi-minute soak runs — all bug-focused and green before anything merges.
Why a pyramid
A game's logic and its render loop fail in different ways, so they need different tests. Pure logic — damage routing, inventory math, A* pathfinding, save round-trips — is covered by a huge, fast unit suite that runs in a couple of minutes and pinpoints regressions precisely. Anything that depends on a real Arcade window — frame timing, GPU rendering, full-scene behaviour — moves up into headless integration and performance tests. And slow-burn problems like memory growth or frame-rate decay only show up under sustained load, so they get their own soak tier.
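As an illustration of the base tier, here is a minimal sketch of a save round-trip unit test that needs no window. The `Inventory` class and the `save`/`load` helpers are hypothetical stand-ins, not the game's actual code; the point is only that pure logic can be exercised in isolation.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Inventory:
    """Hypothetical stand-in for a piece of pure game state."""
    credits: int = 0
    items: dict = field(default_factory=dict)

    def add(self, name, qty):
        # Stack quantities under a single key per item name.
        self.items[name] = self.items.get(name, 0) + qty


def save(inv):
    """Serialise the inventory to a JSON string."""
    return json.dumps(asdict(inv))


def load(text):
    """Rebuild an inventory from the JSON produced by save()."""
    return Inventory(**json.loads(text))


def test_inventory_round_trip():
    inv = Inventory(credits=120)
    inv.add("iron", 3)
    inv.add("iron", 2)
    # The restored object must compare equal field by field.
    assert load(save(inv)) == inv
```

Because nothing here touches a window or the GPU, a test like this runs in microseconds and fails at the exact line where the logic regressed.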
The test pyramid
What each tier covers
| Tier | Scope | What it proves |
|---|---|---|
| Fast unit | Isolated logic — no window | Player physics, weapons & melee arcs, asteroids, alien AI, pickups, blueprints, shields, damage routing, buildings, ship modules & AI-pilot behaviour, drones & A* pathing, inventory math, fog of war, and save/restore round-trips all behave exactly as specified. |
| Integration + performance | Real Arcade window (headless) | Full-frame FPS holds above threshold across all three zones, trade and combat scenes, AI-pilot fleets and station shields; GPU rendering microbenchmarks and all six resolution presets stay within budget. |
| Soak / endurance | 5-minute sustained sessions | FPS and resident memory (RSS) stay flat over time — no leaks, no frame-rate decay — across idle, combat churn, dialogue, station-shield cycles, and Star-Maze pressure. |
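One way a soak test might decide that FPS and RSS "stay flat" is to compare the start of the session with the end. The sketch below assumes the samples have already been collected once per interval; the function names, the 20% window, and the 5% tolerances are illustrative assumptions, not the project's actual thresholds.

```python
from statistics import mean


def drift_ratio(samples, window=0.2):
    """Relative change between the first and last `window` fraction of samples.

    Assumes the samples are positive, so the head mean is never zero.
    """
    n = max(1, int(len(samples) * window))
    head = mean(samples[:n])
    tail = mean(samples[-n:])
    return (tail - head) / head


def assert_flat(fps, rss_mb, max_fps_drop=0.05, max_rss_growth=0.05):
    """Fail if FPS decays or resident memory grows beyond the tolerances."""
    fps_drift = drift_ratio(fps)
    rss_drift = drift_ratio(rss_mb)
    assert fps_drift >= -max_fps_drop, f"FPS decayed by {-fps_drift:.1%}"
    assert rss_drift <= max_rss_growth, f"RSS grew by {rss_drift:.1%}"
```

Comparing windowed means rather than single readings keeps one slow frame or one garbage-collection spike from failing a five-minute run.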
Fast by default, slow on demand. The default pytest run executes only the fast unit suite. The window-bound integration and soak tests are opt-in for two reasons: they share a single Arcade window, which can throw off the window-size calculations in other tests, and each one is comparatively slow. Developers get a tight feedback loop locally; the full multi-hour suite runs as the pre-merge gate.
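An opt-in split like this is commonly wired up in a `conftest.py` using pytest's collection hooks. The sketch below is one possible arrangement; the marker names (`integration`, `soak`) and the `--run-integration` / `--run-soak` flags are assumptions chosen for illustration, not the project's real configuration.

```python
# conftest.py (sketch): keep window-bound tiers out of the default run.
# Marker and flag names are hypothetical.

SLOW_TIERS = {
    "integration": "--run-integration",
    "soak": "--run-soak",
}


def pytest_addoption(parser):
    # One opt-in command-line flag per slow tier.
    for tier, flag in SLOW_TIERS.items():
        parser.addoption(flag, action="store_true", default=False,
                         help=f"also run tests marked @pytest.mark.{tier}")


def pytest_configure(config):
    # Register the markers so pytest does not warn about unknown marks.
    for tier in SLOW_TIERS:
        config.addinivalue_line("markers", f"{tier}: opt-in {tier} tier")


def pytest_collection_modifyitems(config, items):
    # Deselect any test whose tier was not explicitly enabled.
    selected, deselected = [], []
    for item in items:
        blocked = any(
            item.get_closest_marker(tier) is not None
            and not config.getoption(flag)
            for tier, flag in SLOW_TIERS.items()
        )
        (deselected if blocked else selected).append(item)
    if deselected:
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

Under these assumptions a plain `pytest` collects only unmarked unit tests, while `pytest --run-integration --run-soak` would run every tier, as a pre-merge gate might.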
The quality gate
Linting is treated as a bug gate, not as style policing. The rule set is deliberately narrow and targets the failure classes that have actually caused crashes (undefined names, variables used before assignment, mutable default arguments, and loop-variable closure bugs) without drowning the signal in whitespace nits. Every full cycle is written up with totals, durations, and any anomalies, so the suite's health is auditable over time.
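Two of those failure classes are easy to miss in review because the code looks correct. The sketch below shows a buggy and a fixed form of each; the function names are invented for illustration and do not come from the game's code base.

```python
def add_pickup_buggy(item, bag=[]):
    # Bug: the default list is created once and shared by every call.
    bag.append(item)
    return bag


def add_pickup(item, bag=None):
    # Fix: create a fresh list per call when none is supplied.
    if bag is None:
        bag = []
    bag.append(item)
    return bag


def make_handlers_buggy(slots):
    # Bug: each lambda looks up `slot` when called, after the loop has
    # finished, so every handler returns the last slot.
    return [lambda: slot for slot in slots]


def make_handlers(slots):
    # Fix: bind the current value as a default argument at definition time.
    return [lambda slot=slot: slot for slot in slots]
```

Both bugs pass a quick manual check and only misbehave on the second call or the second handler, which is why a narrow lint rule is a cheaper defence than waiting for a crash report.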
Want a test suite that catches regressions before your users do?
Layered, bug-focused testing — from fast unit checks to performance and endurance gates — built for whatever you ship.