
Using Ocarina with AI

A working setup: a full test cycle built alongside Claude Code and Ocarina, against the public Katalon CURA demo.

📖 Get the AI example as a reference.

The three spiritual stones

  1. CLAUDE.md at the project root.
  2. skills/ with one <name>/SKILL.md per procedure.
  3. Verification rule: every SUT claim comes from observation (probe, gh api, curl -v), never inference.

CLAUDE.md

Two variants. CLAUDE.md is full (rules + project layout, hierarchy, conventions, CI shape, PR template). CLAUDE.slim.md is rules only. Slim when context is heavy; full for onboarding and reviews. Full wins on disagreement.

Onboarding steps (venv, pip install, the skill battery copied into Claude Code, ruff / mypy / pre-commit, runner smoke-check) live in setup-environment.

Rules:

Security testing is functional and static, never active. No payloads, no crafted requests, no DevTools DOM manipulation. Black-hat scenarios go through the normal UI.

Use constants. Named values aren't inlined.

Datasets are human decisions. Proposing doesn't run.

Verify SUT behaviour empirically. Probe, gh api, or curl -v. Never inference. Re-derive each time: a probe answers only for what it ran; a prior diagnosis only for that run.

Each rule carries a one-line "why."

skills/

One Markdown file per skill, YAML frontmatter + body. Ten families.
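As a rough illustration of that file shape, the sketch below splits a SKILL.md into its frontmatter and body. The `name` and `description` keys and the sample text are assumptions made up for the example (the only thing stated here is that the frontmatter is YAML). The parser handles flat `key: value` pairs only; a real setup would more likely use a proper YAML library.

```python
from __future__ import annotations


def split_skill_file(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter, body).

    Handles only flat `key: value` frontmatter between two `---` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("missing closing '---' frontmatter delimiter") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Hypothetical skill file; keys and wording are illustrative only.
EXAMPLE = """---
name: pick-reports
description: Pick the most recent report by mtime, never by filename.
---
List candidate reports, sort by modification time, surface the newest.
"""

meta, body = split_skill_file(EXAMPLE)
```

A check like this could run in pre-commit to catch a skill file that lost its frontmatter, though that wiring is not part of the setup described here.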

Review (13)

Static reads; surface findings.

  • review-spec-gaps — clarification questions on the FRD.
  • review-watcher-misuse — watcher.report(...) against the negative-only convention.
  • review-compartmentalisation-leaks — URLs, selectors, magic numbers out of place.
  • review-dead-code — unused connectors / POMs / scenarios / suites / fragments / constants; per finding: delete, incubate (<source-root>/incubator/, dependency tree preserved), or keep.
  • review-report — classify each FAIL / SKIP for one run.
  • Plus: review-type-ignore, review-match-candidates, review-unverified-transitions, review-submit-dispatchers, review-comment-drift, review-suite-stability, review-intent-collisions, review-watcher-emissions.

Analyse (4)

  • analyse-flakiness — widen the transient-error net; chronic deaths are real flakes.
  • analyse-fixture-flakiness — instrument setup/teardown; surface cross-test contamination.
  • analyse-watcher-flakiness — with/without each watcher, interval sweep.
  • analyse-screenshot-flakiness — group by (test, step, browser), spot differences.
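The grouping step in analyse-screenshot-flakiness can be sketched as follows. The `Shot` tuple and `differing_groups` are hypothetical names, and the byte-for-byte hash comparison is a simplifying assumption; a real analysis may need a perceptual or pixel-tolerance diff instead.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from typing import NamedTuple


class Shot(NamedTuple):
    """One captured screenshot (hypothetical structure)."""

    test: str
    step: str
    browser: str
    data: bytes  # raw image bytes


def differing_groups(shots: list[Shot]) -> dict[tuple[str, str, str], int]:
    """Map (test, step, browser) to its number of distinct images,
    keeping only groups whose screenshots are not byte-identical."""
    digests: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for shot in shots:
        key = (shot.test, shot.step, shot.browser)
        digests[key].add(hashlib.sha256(shot.data).hexdigest())
    return {key: len(seen) for key, seen in digests.items() if len(seen) > 1}
```

Groups that come back from this function are the ones worth opening side by side; identical groups can be skipped.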

Black-hat (6)

  • business-attack-ideation — bring the product down.
  • incoherence-attack-ideation — each step legal, the set impossible.
  • persistence-attack-ideation — repeated retries on blocked actions.
  • permission-appropriateness-audit — is the access model itself appropriate?
  • bfcache-exposure-ideation — BFCache attacks.
  • lateral-resource-ideation — IDOR via the address bar only.

Comprehend (4)

  • assess-test-base — catalogue the suite.
  • assess-ecosystem — bounded public research, token-budget capped.
  • understand-sut-constraints — SUT bounds that break parallel tests.
  • understand-ocarina — walk the docs.

Pick (3)

By mtime, never filename.

  • pick-screenshots, pick-logs, pick-reports.
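A minimal sketch of the by-mtime rule, using only the standard library. The function name `pick_latest` and its parameters are assumptions for illustration, not the skills' actual implementation.

```python
from __future__ import annotations

from pathlib import Path


def pick_latest(directory: Path, pattern: str = "*", count: int = 1) -> list[Path]:
    """Return the `count` most recently modified files matching `pattern`.

    Ordering uses st_mtime only; filenames play no part in it.
    """
    files = [p for p in directory.glob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]
```

For example, `pick_latest(Path("reports"), "*.html")` would return the newest HTML file in a hypothetical `reports` directory, whatever suffix its name carries.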

Author (8)

Each produces a deliverable.

  • empiricism — verify before encoding; don't overwrite intentional-fail gap tests.
  • write-a-probe — throwaway script, gitignored.
  • write-test-strategy — generate the test-strategy doc from the suite (scope, types, coverage tables, cycle tree, pass/fail, gaps, CI matrix).
  • extend-coverage — extend coverage from existing assets.
  • update-frd-and-tests — propagate a spec update.
  • manual-reproduction-guide — human-runnable repro.
  • manage-backlog — BACKLOG.md.
  • pr-report — PR-type-aware report.

Refactor (2)

  • refactor-fragmentation — DRY per user preference.
  • introduce-pom-retries — POM-internal retries with the two-test split (first-try + with-retries).
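One reading of the two-test split, sketched with a stand-in page object: the retry loop lives inside the POM, one test pins first-try success with retries disabled, and a second test exercises the retry path. `LoginPage`, `TransientError`, and `flaky_click` are hypothetical and are not Ocarina APIs.

```python
from __future__ import annotations

from typing import Callable


class TransientError(Exception):
    """Stand-in for whatever transient failure the POM retries on."""


class LoginPage:
    """Hypothetical page object; not part of Ocarina's API."""

    def __init__(self, click_submit: Callable[[], None], max_attempts: int = 3) -> None:
        self._click_submit = click_submit
        self._max_attempts = max_attempts

    def submit(self) -> int:
        """Click submit, retrying transient errors. Returns attempts used."""
        for attempt in range(1, self._max_attempts + 1):
            try:
                self._click_submit()
                return attempt
            except TransientError:
                if attempt == self._max_attempts:
                    raise
        raise AssertionError("unreachable")  # assumes max_attempts >= 1


def flaky_click(failures: int) -> Callable[[], None]:
    """Build a fake click that fails `failures` times, then succeeds."""
    remaining = [failures]

    def click() -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientError("element not ready")

    return click


def test_submit_first_try() -> None:
    # Retries disabled: the action must succeed on a single attempt.
    page = LoginPage(flaky_click(failures=0), max_attempts=1)
    assert page.submit() == 1


def test_submit_with_retries() -> None:
    # Retries enabled: two transient failures are absorbed by the POM.
    page = LoginPage(flaky_click(failures=2), max_attempts=3)
    assert page.submit() == 3
```

Keeping the first-try test separate means a regression that only passes thanks to retries still shows up as a failure somewhere.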

State (1)

  • question-state — interrogate the environment before trusting a result.

Setup (1)

  • setup-environment — venv, dev tooling, the Ocarina skill battery copied into Claude Code's skills directory, driver paths in CLAUDE.local.md, pre-commit loop, runner smoke-check.

Run (1)

  • propose-visual-review — before a local dispatch, offer --not-headless (watch the browser play out) vs headless (CI-shaped). Composes the command; user runs.
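The "composes the command; user runs" step might look like the sketch below. Only the `--not-headless` flag comes from this page; the function name and the base command passed in are placeholders. Nothing is executed, the function only returns a string for the user to run.

```python
from __future__ import annotations

import shlex


def compose_dispatch(base_command: list[str], headed: bool) -> str:
    """Return a shell-ready command string; nothing is executed here."""
    args = list(base_command)
    if headed:
        # Headed mode: the user watches the browser play out.
        args.append("--not-headless")
    return shlex.join(args)
```

With a placeholder runner, `compose_dispatch(["python", "run_suite.py"], headed=True)` yields `python run_suite.py --not-headless`.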

Recurring chains

Suite isn't green: review-report → analyse-* → write-a-probe → finding lands in IDENTIFIED_GAPS.md / FRD / scenario comment → probe deleted.

Black-hat scenario looks promising: empiricism → extend-coverage (often intentional-fail).

Spec changes: update-frd-and-tests (FRD first, tests follow). Gap tests are reframed, not flipped.

New Ocarina primitive needed: understand-ocarina first, then writing.

About to dispatch a run: propose-visual-review — headed (--not-headless) or headless (CI-shaped)? Composes the command; user runs.

Discipline

Surface, don't apply. Skills produce; the user decides.

Empirical, not assertive. Every SUT claim is observed, cited, dated. Ritual phrase: "Fair point, I'm assuming. Let me verify empirically."
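"Observed, cited, dated" can be made concrete with a small record type. This is a sketch under assumptions: the `Observation` class and its field names are invented for illustration, and the URL in the usage note is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """One empirically verified claim about the SUT (hypothetical structure)."""

    claim: str             # what is asserted about the SUT
    evidence: str          # how it was observed: probe, `gh api` call, or `curl -v` command
    observed_at: datetime  # timezone-aware timestamp of the observation

    def cite(self) -> str:
        """Render the claim with its evidence and a UTC date stamp."""
        stamp = self.observed_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        return f"{self.claim} [observed via {self.evidence}, {stamp}]"
```

For instance, `Observation("root path answers 200", "curl -v https://sut.example/", datetime.now(timezone.utc)).cite()` produces a line that can be pasted into a findings file with its evidence and date attached.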

Gap tests are reframed, not turned green. Invert the assertion, rename, move the strategy-doc row, log the resolution in IDENTIFIED_GAPS.md. One motion via update-frd-and-tests.

Watcher emissions are negative signals only. A watcher emitting "login succeeded" breaks the contract.

Distributed when scarcity is shared. If workers contend on a SUT-capped resource (sessions, slots, quotas), coordinate through distributed primitives. Otherwise a worker-local in-memory cache is fine — provided keys can't collide and generation is thread-safe.
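For the worker-local case, the two provisos could be met roughly as below: a lock makes generation thread-safe, and keys embed the worker id plus a random UUID so two workers never mint the same one. `WorkerLocalCache` and the `gw0`-style worker id are assumptions for illustration. This sketch does not cover the shared-scarcity case, which needs a real distributed primitive.

```python
from __future__ import annotations

import threading
import uuid
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class WorkerLocalCache(Generic[K, V]):
    """In-memory cache for one worker; each value is generated once per key."""

    def __init__(self, worker_id: str) -> None:
        self._worker_id = worker_id
        self._lock = threading.Lock()
        self._values: dict[K, V] = {}

    def unique_key(self, prefix: str) -> str:
        # Worker id plus a random UUID keeps keys distinct across workers.
        return f"{prefix}-{self._worker_id}-{uuid.uuid4().hex}"

    def get_or_create(self, key: K, factory: Callable[[], V]) -> V:
        # Holding the lock makes check-then-generate atomic across threads.
        with self._lock:
            if key not in self._values:
                self._values[key] = factory()
            return self._values[key]
```

The single lock serialises every factory call, which is acceptable for cheap generation; slow factories would argue for per-key locking instead.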

Mtime, not filename. UUID suffixes are random; pick-* sorts by mtime.

What this setup isn't

  • Doesn't generate tests autonomously.
  • Doesn't patch hallucinations in CI; a failure triggers review-report + analyse-*.
  • Doesn't rewrite the spec; only update-frd-and-tests does, with a revision line.
  • Doesn't run active security tests. Ever.

Exposed resources


