# Testing
## Philosophy
This project is a tool and a runtime.
That means we must test both:
- correctness of authored data and transformations
- actual browser behavior experienced by the user
We do not rely on a single testing style. We use a layered strategy:
- unit tests
- domain/model tests
- geometry and serialization tests
- browser-level integration tests
- end-to-end tests
- manual QA for spatial/editor ergonomics
The goal is not maximal test count. The goal is confidence in the edit -> save/load -> run loop.
Early in the roadmap, “save/load” may mean local draft persistence plus JSON import/export. Once scenes depend on external binary assets, “save/load” must expand to cover the portable project package path as well.
## Testing priorities
Highest-priority confidence areas:
- document validity and migrations
- undo/redo correctness
- brush generation correctness
- per-face material/UV persistence
- runtime build correctness
- asset import survival
- imported-model collider generation and runtime collision correctness
- project package portability once binary assets exist
- runner navigation/input reliability
- spatial audio and interaction basics
- critical regressions caught in CI
## Test stack
Recommended baseline:
- Vitest for unit and integration tests
- Vitest Browser Mode where real browser behavior is needed at component/integration level
- Playwright for end-to-end testing
- optional lightweight golden fixtures for serialized documents and runtime builds
No snapshot-heavy strategy by default. Prefer explicit assertions over giant snapshots.
## Global testing rules

### Schema changes
Whenever the persisted SceneDocument schema changes:
- make the compatibility decision explicit
- bump the version when needed
- add at least one migration or compatibility test
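
A minimal shape for such a test, sketched against a hypothetical `migrateSceneDocument` export and an old-version JSON fixture (the function name, `CURRENT_SCENE_VERSION`, and the fixture path are illustrative):

```ts
import { describe, expect, it } from "vitest";
// Names are illustrative; the real module lives at
// src/document/migrate-scene-document.ts but may export differently.
import { migrateSceneDocument, CURRENT_SCENE_VERSION } from "../../src/document/migrate-scene-document";
// Requires resolveJsonModule (or an equivalent loader) in the test config.
import oldScene from "../fixtures/documents/migration-old-version.json";

describe("SceneDocument migration", () => {
  it("migrates an old document to the current version without losing entities", () => {
    const migrated = migrateSceneDocument(oldScene);

    expect(migrated.version).toBe(CURRENT_SCENE_VERSION);
    // Check semantics, not just shape: entities should survive the migration.
    expect(migrated.entities.length).toBe(oldScene.entities.length);
  });
});
```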
### Persistence coverage

For every authored feature whose state persists:
- add a round-trip save/load test
- cover the current persistence path used by the product at that milestone
- once asset-bearing scenes exist, cover the project package path where relevant
- avoid assuming that runtime-only state is persisted
### Small fixtures
Prefer tiny, explicit fixtures over large assets or giant snapshots.
## Test categories

### 1. Pure unit tests
Purpose:
- fast confidence on isolated logic
Scope:
- math helpers
- grid snapping
- ID utilities
- small schema defaults
- validation helpers
- transform calculations
- UV helper logic
- entity defaulting logic
Characteristics:
- no DOM
- no WebGL
- no three.js renderer boot if avoidable
- deterministic
- extremely fast
Examples:
- `snapValue(1.23, 0.5)` -> `1.0`
- UV rotate/flip calculations
- entity schema default application
- command label generation if logic matters
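
As a concrete sketch, the snapping example above as a Vitest unit test (the import path is an assumption):

```ts
import { describe, expect, it } from "vitest";
// Import path is illustrative; the real helper may live elsewhere.
import { snapValue } from "../../src/math/snap";

describe("snapValue", () => {
  it("snaps to the nearest grid increment", () => {
    expect(snapValue(1.23, 0.5)).toBe(1.0);
    expect(snapValue(1.26, 0.5)).toBe(1.5);
  });
});
```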
### 2. Domain/model tests
Purpose:
- validate the canonical document model and command behavior
Scope:
- document factories
- migrations
- command execution
- command undo/redo
- selection semantics where model-driven
- validation rules
Examples:
- create brush command adds valid brush
- undo removes it cleanly
- redo restores the same result
- invalid entity reference is detected
- old scene version migrates correctly
These tests should not need a browser renderer.
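
A sketch of that command lifecycle under test, assuming illustrative `CommandHistory` and command-factory names (the real document/command API may differ):

```ts
import { describe, expect, it } from "vitest";
// All names are illustrative; the real document/command API may differ.
import { createEmptySceneDocument } from "../../src/document/scene-document";
import { createBoxBrushCommand } from "../../src/document/commands";
import { CommandHistory } from "../../src/document/command-history";

describe("create box brush command", () => {
  it("adds a brush on execute, removes it on undo, restores it on redo", () => {
    const doc = createEmptySceneDocument();
    const history = new CommandHistory(doc);

    history.execute(createBoxBrushCommand({ size: [1, 1, 1] }));
    expect(doc.brushes).toHaveLength(1);
    const brushId = doc.brushes[0].id;

    history.undo();
    expect(doc.brushes).toHaveLength(0);

    history.redo();
    expect(doc.brushes).toHaveLength(1);
    expect(doc.brushes[0].id).toBe(brushId); // redo reproduces the execute result
  });
});
```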
### 3. Geometry tests
Purpose:
- verify brush/kernel correctness
Scope:
- primitive generation
- face generation
- topology expectations
- collision mesh generation
- imported-model collider generation
- Rapier-backed collider/query integration where relevant
- UV projection generation
- clipping results
- derived mesh determinism
Examples:
- box brush creates expected face count
- stairs generator creates expected step count
- fit-to-face UV produces finite values
- clipping yields valid child brushes
- generated geometry contains no NaNs
- rebuild is deterministic for the same input
- imported model collider generation produces finite valid data for the selected mode
- imported-model collider generation honors the authored mode semantics:
- terrain -> heightfield
- static -> triangle mesh
- dynamic -> compound convex pieces
- simple -> primitive or convex hull
- unsupported imported-model collision modes fail clearly instead of producing silent garbage
#### Geometry test principles
- assert invariants, not fragile exact arrays unless necessary
- prefer bounded numeric comparisons
- verify no degenerate triangles where required
- test edge cases: tiny sizes, rejected zero-like values, unsupported cases failing clearly
Geometry is a high-risk area and deserves dense testing.
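
For instance, the box-brush invariants above might be pinned down like this (generator name and result shape are assumptions):

```ts
import { describe, expect, it } from "vitest";
// Generator name and result shape are illustrative.
import { generateBoxBrushGeometry } from "../../src/geometry/brush-generation";

describe("box brush geometry", () => {
  const params = { width: 2, height: 1, depth: 3 };

  it("produces six faces with only finite vertex data", () => {
    const geometry = generateBoxBrushGeometry(params);

    expect(geometry.faces).toHaveLength(6);
    for (const value of geometry.positions) {
      expect(Number.isFinite(value)).toBe(true); // no NaN/Infinity leaks
    }
  });

  it("is deterministic for the same input", () => {
    expect(generateBoxBrushGeometry(params)).toEqual(generateBoxBrushGeometry(params));
  });
});
```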
### 4. Serialization tests
Purpose:
- ensure document persistence is trustworthy
Scope:
- save/load round trips
- migration paths
- invalid file handling
- missing refs behavior
- canonical normalization if any
Examples:
- scene round-trips without losing face materials
- UV state survives save/load
- imported asset refs survive save/load
- project package export/import preserves asset-backed scenes
- unsupported version throws an understandable error
- migration from v1 to v2 preserves semantics
#### Required pattern
For every substantial document feature, add at least:
- one round-trip save/load test
- one migration or backward-compatibility consideration if schema changed
For authored imported-model collision settings, also add at least:
- one round-trip test for the selected collision mode/settings
- one validation/build-path test for missing asset or incompatible collision-mode assumptions where relevant
- one runtime/query-path test proving the generated collider participates in the actual collision/query layer rather than only existing as dead metadata
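
A round-trip test in its smallest form, with illustrative serialization entry points and fixture:

```ts
import { describe, expect, it } from "vitest";
// Entry points and fixture are illustrative; the real API may differ.
import { serializeSceneDocument, deserializeSceneDocument } from "../../src/document/scene-document";
import { texturedRoomFixture } from "../fixtures/documents/textured-room";

describe("scene document round trip", () => {
  it("preserves per-face materials across save/load", () => {
    const original = texturedRoomFixture();
    const restored = deserializeSceneDocument(serializeSceneDocument(original));

    // Semantic equality is the invariant, not byte equality of the JSON.
    expect(restored).toEqual(original);
  });
});
```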
### 5. Browser integration tests
Purpose:
- verify real browser behavior that pure tests cannot cover
Use for:
- pointer interactions
- keyboard shortcut handling
- focus issues
- canvas/UI interaction boundaries
- panel interactions
- browser API edge behavior
- audio unlock flows where practical
- pointer lock flows where practical
Examples:
- clicking viewport selects a brush
- dragging a gizmo updates inspector values
- applying material through UI changes a selected face
- entering play mode mounts the runtime canvas
- pointer lock request path is handled correctly
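
A Vitest Browser Mode sketch of the first example, assuming a hypothetical `mountEditor` helper and test ids:

```ts
import { expect, it } from "vitest";
import { page, userEvent } from "@vitest/browser/context";
// Mount helper and test ids are illustrative.
import { mountEditor } from "./helpers/mount-editor";

it("clicking the viewport selects a brush", async () => {
  await mountEditor({ fixture: "one-box-room" });

  await userEvent.click(page.getByTestId("editor-viewport"));

  // Selection should be reflected in the inspector panel.
  await expect.element(page.getByTestId("inspector-selection")).toHaveTextContent("Brush");
});
```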
### 6. End-to-end tests
Purpose:
- verify the actual user flows across the product
Playwright covers:
- page loading
- cross-browser execution
- real input simulation
- visible UI assertions
- route/deployment behavior
- screenshot and trace capture on failures
#### Required e2e flows for early milestones

**E2E-01 Empty app boots**
- app loads
- viewport visible
- no fatal console errors

**E2E-02 Create box brush**
- create box brush
- select it
- persist through the current save path
- reload
- brush still exists

**E2E-03 Apply material**
- create room or brush
- assign material to a face
- persist through the current save path
- reload
- material persists

**E2E-04 Run scene**
- place PlayerStart
- enter run mode
- runtime loads
- first-person or orbit mode active

**E2E-04b World environment**
- author non-default world lighting/background settings
- save or persist through the current path
- reload
- editor and runner still reflect those settings

**E2E-05 Import asset**
- import test GLB
- place a model instance
- reload
- instance remains visible

**E2E-06 Trigger action**
- create trigger and target
- run scene
- activate trigger
- target effect occurs

**E2E-07 Project package portability**
- export a project package
- import or reopen it in the editor
- asset-backed scene remains usable
These flows should expand with milestones.
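
As a reference point, E2E-01 might look like this in Playwright (the route and the viewport test id are assumptions):

```ts
import { expect, test } from "@playwright/test";

// E2E-01 sketch; the route and the test id are illustrative.
test("empty app boots with a visible viewport and no fatal errors", async ({ page }) => {
  const pageErrors: string[] = [];
  page.on("pageerror", (error) => pageErrors.push(error.message));

  await page.goto("/");

  await expect(page.getByTestId("editor-viewport")).toBeVisible();
  expect(pageErrors).toEqual([]);
});
```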
### 7. Manual QA
Some qualities are hard to fully automate, especially in spatial tools.
Manual QA is required for:
- authoring feel
- camera comfort
- snapping quality
- transform ergonomics
- texture workflow speed
- runtime movement feel
- browser UX polish
- spatial audio perception
#### Manual QA checklist style
Every slice should include:
- setup
- expected steps
- expected result
- known limitations
- browser(s) tested
- screenshots or short recordings if helpful
## Test directory guidance

Suggested structure:

```
src/
  ...
tests/
  unit/
  domain/
  geometry/
  serialization/
  browser/
  e2e/
  fixtures/
    documents/
    assets/
    packages/
    exports/
```
Alternative layouts are fine if the categories remain conceptually clear.
## Naming conventions

Use descriptive names.

Good:

- `create-box-brush.command.test.ts`
- `scene-roundtrip.materials.test.ts`
- `runtime-trigger-teleport.e2e.ts`

Bad:

- `misc.test.ts`
- `editor2.test.ts`
- `utils.spec.ts`
Test names should tell a future reader:
- what behavior is being protected
- what broke if it fails
## Core invariants to protect

The following invariants are important enough to deserve repeated coverage:

### Document invariants
- IDs are unique
- references resolve or fail clearly
- version is known/migratable
- entity payload matches type schema
- model instances are not mixed into entity collections
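
The first of these, as a sketch (the fixture and the ID-collection helper are illustrative):

```ts
import { describe, expect, it } from "vitest";
// Fixture and helper names are illustrative; any document fixture works here.
import { triggerSceneFixture } from "../fixtures/documents/trigger-scene";
import { collectAllIds } from "../../src/document/scene-document-validation";

describe("document invariants", () => {
  it("all IDs in a document are unique", () => {
    const ids = collectAllIds(triggerSceneFixture());
    expect(new Set(ids).size).toBe(ids.length); // duplicates collapse in a Set
  });
});
```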
### Command invariants
- execute changes state correctly
- undo restores previous state
- redo reproduces execute result
- command history remains coherent
### Geometry invariants
- generated meshes contain finite numeric values
- expected face counts/topology rules hold
- collision output is deterministic
- invalid inputs fail safely
### Serialization invariants
- save/load preserves semantics
- unsupported versions do not silently corrupt
- migrations are explicit and tested
- binary asset persistence survives the current project-storage strategy
- project package export/import preserves semantics when that path exists
### Runtime invariants
- runner loads valid scenes
- missing optional systems fail gracefully
- navigation controller activation is exclusive and consistent
- interactions target the correct entities or model instances
## What to unit test vs what to e2e test

### Unit test
When logic is:
- deterministic
- isolated
- data-heavy
- performance-sensitive
- easier to debug outside the browser
Examples:
- brush face generation
- UV transforms
- validation
- migrations
- command sequencing
### E2E test
When behavior depends on:
- actual browser input behavior
- canvas and DOM interaction
- route/app boot
- browser APIs
- focus/pointer lock/input timing
- asset load flows
Examples:
- selecting and moving things via UI
- entering play mode
- first-person input behavior
- import workflow if browser-exposed
- prompt/click interactions
## Fixture strategy
Use small, explicit fixtures.
### Document fixtures
- minimal empty doc
- one-box-room
- textured-room
- lit-room
- trigger-scene
- imported-asset-scene
- packaged-project scene
- migration-old-version scene
### Asset fixtures
- tiny GLB static mesh
- tiny GLB animated mesh
- tiny environment image or skybox fixture
- simple audio file
- placeholder textures
Keep fixtures:
- tiny
- deterministic
- checked into the repo when legally safe
- documented
Do not use giant random assets in core CI.
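
Fixtures work well as small factory functions rather than shared mutable objects; a sketch of `one-box-room`, assuming an illustrative document API and brush shape:

```ts
// Illustrative fixture factory; the real document API may differ.
import { createEmptySceneDocument, type SceneDocument } from "../../src/document/scene-document";

// Returns a fresh document per call so tests cannot leak state into each other.
export function oneBoxRoomFixture(): SceneDocument {
  const doc = createEmptySceneDocument();
  doc.brushes.push({
    id: "brush-box-1", // stable ID keeps assertions deterministic
    type: "box",
    size: [2, 2, 2],
    position: [0, 1, 0],
  });
  return doc;
}
```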
## Browser support testing
At minimum, regularly test in:
- Chromium
- Firefox
- WebKit where relevant
Not every test must run in every browser on every iteration, but critical e2e coverage should provide cross-browser confidence at an appropriate cadence.
Early CI suggestion:
- smoke in Chromium on every push
- broader cross-browser on main branch / PR gate / nightly depending on cost
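
One way to express that split is through Playwright projects, selected per run with `--project` (project names and device choices are adjustable):

```ts
import { defineConfig, devices } from "@playwright/test";

// Sketch of a cross-browser split: Chromium on every push,
// Firefox/WebKit reserved for the broader gate.
export default defineConfig({
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
});
```

CI can then run `playwright test --project=chromium` as the per-push smoke and all projects on the main-branch gate.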
## CI expectations
Baseline CI pipeline should include:
- install
- typecheck
- lint
- unit/domain/geometry/serialization tests
- browser integration tests where stable
- Playwright smoke/e2e subset
- test artifact upload on failure
### Required artifacts on e2e failure
Capture where possible:
- screenshots
- traces
- video if worth the storage cost
- console logs
- failed document/export fixture if relevant
These artifacts materially reduce debugging time.
## Performance testing
Do not overcomplicate early performance testing, but do track basic regressions.
Recommended early checks:
- app boot time smoke metric
- scene build time for a representative small scene
- brush rebuild time for representative test cases
- asset import of a small reference GLB
- runtime frame stability in a standard test scene
This can begin as manual/dev benchmarking and later become more formal if needed.
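
A rough starting point for dev-side benchmarking, assuming an illustrative `buildRuntimeScene` entry point and reusing a document fixture:

```ts
// Rough dev benchmark sketch; build entry point and fixture are illustrative.
import { buildRuntimeScene } from "../../src/runtime-three/runtime-scene-build";
import { texturedRoomFixture } from "../fixtures/documents/textured-room";

const doc = texturedRoomFixture();

const start = performance.now();
buildRuntimeScene(doc);
const elapsed = performance.now() - start;

// Log as a trend signal first; promote to a hard CI gate only once numbers stabilize.
console.log(`runtime scene build: ${elapsed.toFixed(1)} ms`);
```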
## Audio testing notes
Spatial audio is important, but automated audio verification is limited.
### Automate what we can
- sound entities load
- trigger paths call correct audio system methods
- invalid audio refs surface errors
- autoplay rules behave as expected in app state
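
One automatable slice, sketched with a spied audio system (all names, including the trigger-dispatch entry point, are illustrative):

```ts
import { describe, expect, it, vi } from "vitest";
// All names are illustrative; the real trigger/audio wiring may differ.
import { runTriggerAction } from "../../src/runtime-three/runtime-host";

describe("trigger audio paths", () => {
  it("a play-sound trigger calls the audio system with the authored ref", () => {
    const audio = { playAtPosition: vi.fn() };

    runTriggerAction({ kind: "play-sound", soundRef: "sfx-door" }, { audio });

    expect(audio.playAtPosition).toHaveBeenCalledWith("sfx-door", expect.anything());
  });
});
```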
### Manually verify
- perceived spatial positioning
- distance attenuation feel
- loop transition quality
- browser-specific unlock friction
Include manual audio QA notes in slices touching audio.
## Input testing notes
Input in browser apps is full of edge cases.
Explicitly test:
- keyboard focus transitions
- pointer lock enter/exit
- escape handling
- canvas vs panel focus
- gamepad absent/present behavior
- drag cancellation when pointer leaves element/window
Where automating is hard, document the manual verification steps.
## Regression policy
Every bug fix should add one of:
- a unit/domain/geometry test
- a browser integration test
- an e2e test
- a documented manual regression step if automation is genuinely not feasible yet
Do not accept “fixed” without protecting against recurrence.
## Done criteria from a testing perspective
A slice is not done until:
- happy path is covered
- one obvious failure path is covered
- save/load or persistence path is covered if the feature persists authored data
- manual QA notes are written
- test commands are documented if new setup is needed
## Minimum test commands to maintain
Keep the project easy to verify.
Recommended scripts:
```json
{
  "test": "vitest run",
  "test:watch": "vitest",
  "test:browser": "vitest --browser --run",
  "test:e2e": "playwright test",
  "test:e2e:ui": "playwright test --ui",
  "test:typecheck": "tsc --noEmit"
}
```
Exact commands may evolve, but the repo should always expose a simple path for:
- fast local checks
- browser checks
- e2e checks
- CI checks
## What we do not test aggressively yet
Initially, avoid over-investing in:
- screenshot snapshot forests
- fragile pixel-perfect rendering tests
- massive browser matrix on every commit
- giant scene stress tests before the core workflow is stable
- plugin systems we do not yet have
Test the heart of the product first:
- data integrity
- brush correctness
- interaction correctness
- runtime usability