Infrastructure & platforms

What the bench depends on, what each platform does, how it's configured, and what breaks if any of it goes away. Operator-facing — read this when something's down, before rotating a credential, or when onboarding a future collaborator.

Stack at a glance

Layer	Tool
Server	FastAPI (Python 3.11) + uvicorn (single worker, in-process cache)
Frontend	Jinja2 templates + HTMX. No SPA framework.
Database	Postgres via Supabase (Singapore region, `ap-southeast-1` pooler)
Auth	Supabase Auth + GitHub OAuth (ES256 JWT signing, see `app/auth.py`)
LLM	Anthropic Claude (Sonnet 4.5 default; Haiku 4.5 for cheap paths)
Hosting	Fly.io — `sin` (Singapore) region, `shared-cpu-1x` VM, 512 MB
Edge / DNS	Cloudflare — orange-cloud proxy in front of Fly, Full (strict) SSL
Transactional email	Resend (operator alerts for access-request submissions)
Error monitoring	Sentry
Source + CI/CD	GitHub (Actions for test, docker-smoke, deploy, labeler, docs)
Issue tracking	Linear (`Bench` team, `BEN-XXX` prefix) ↔ auto-synced with GitHub Issues
Coding agent	Charlie Labs (installed as GitHub App)
Docs hosting	GitHub Pages (auto-built by mkdocs on push to main)

External services — detail

Each section: role · auth · setup · what breaks if it's down.

Anthropic API

Role — All LLM-touching paths: hints, code review, JD parsing, credentials analysis, hamming reflections, URL parsing for /library/add, IAC-fit interpretation
Auth — ANTHROPIC_API_KEY env var; stored as Fly secret in prod
Models — claude-sonnet-4-5 default (set in service modules), claude-haiku-4-5 for low-stakes / cheap calls (see _DEFAULT_MODEL in each service)
Setup — Get API key from https://console.anthropic.com → set as Fly secret: flyctl secrets set -a lis-bench ANTHROPIC_API_KEY="sk-ant-..."
Spend cap — ANTHROPIC_DAILY_TOKEN_CAP=50000 (per user per day, wired via app/services/spend_cap.py). If the variable is unset, the 200k default listed in the secrets inventory applies. Acts as a circuit breaker against runaway loops. Adjust via flyctl secrets set.
What breaks if it's down — Every LLM-touching service falls back to its deterministic offline path (keyword matching, canned hints, etc.). User-visible: hints get less specific, reviews are skipped, JD parsing uses keyword matching only. A user who hits the daily spend cap takes the same fallback path.
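The cap-then-fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the contents of app/services/spend_cap.py: the names DailyTokenCap, offline_hint, and hint_with_fallback are hypothetical, the in-memory counter stands in for whatever store the real module uses, and call_llm is an injected stand-in for the Anthropic client.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Optional, Tuple


class DailyTokenCap:
    """Hypothetical per-user-per-day token counter (in-memory, illustration only)."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self._used = defaultdict(int)  # (user_id, day) -> tokens spent

    def allows(self, user_id: str, day: date) -> bool:
        return self._used[(user_id, day)] < self.cap

    def record(self, user_id: str, day: date, tokens: int) -> None:
        self._used[(user_id, day)] += tokens


def offline_hint(problem: str) -> str:
    """Deterministic fallback: a canned hint, no network call."""
    return f"Work through a small example of '{problem}' by hand first."


def hint_with_fallback(
    user_id: str,
    problem: str,
    cap: DailyTokenCap,
    call_llm: Callable[[str], Tuple[str, int]],
    today: Optional[date] = None,
) -> str:
    """Return an LLM hint, or the offline hint when capped or when the call fails."""
    day = today or date.today()
    if not cap.allows(user_id, day):
        return offline_hint(problem)  # cap hit: same path as an outage
    try:
        text, tokens = call_llm(problem)  # stand-in for the Anthropic client
    except Exception:
        return offline_hint(problem)  # API down or erroring
    cap.record(user_id, day, tokens)
    return text
```

Because the cap check and the exception handler return the same offline result, callers do not need to distinguish "capped" from "API down", which matches the single fallback path described above.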

Supabase (Postgres + Auth)

Role — Single source of truth for application data (problems, attempts, content, JDs, reflections, feedback, access requests). Auth via GitHub OAuth → ES256-signed JWT.
Project URL — https://aemigbrrbykrwclhrpob.supabase.co
Region — Singapore (ap-southeast-1)
Auth — Multiple secrets:
SUPABASE_URL — public project URL
SUPABASE_ANON_KEY — public anon key (safe in browser)
SUPABASE_SERVICE_ROLE_KEY — server-only, never ship to browser
DATABASE_URL — Postgres connection string via Session pooler
SUPABASE_JWT_SECRET — legacy HS256 fallback (Supabase migrated this project to ES256 JWKS; app/auth.py dispatches on JWT header alg)
Setup — flyctl secrets set -a lis-bench SUPABASE_URL="..." SUPABASE_ANON_KEY="..." SUPABASE_SERVICE_ROLE_KEY="..." DATABASE_URL="..." SUPABASE_JWT_SECRET="..."
Migrations — Idempotent SQL files in migrations_pg/, applied at boot by app/db.py:apply_migrations(). Use the next sequential filename when adding (004_, 005_, etc.).
What breaks if it's down — App boot fails (apply_migrations errors), or runtime queries time out. Health check /__health is shallow (doesn't hit the DB) so Fly thinks the machine is up, but every page request 500s.
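The alg dispatch mentioned under SUPABASE_JWT_SECRET can be pictured as below. A sketch only, assuming app/auth.py reads the unverified JWT header to choose a verification path; pick_verifier and the returned labels are hypothetical, and the actual signature verification (JWKS lookup for ES256, shared secret for HS256) is deliberately left out.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def pick_verifier(token: str) -> str:
    """Choose a verification path from the JWT header.

    Reads the header only; it does NOT verify the signature.
    """
    header_segment = token.split(".", 1)[0]
    header = json.loads(_b64url_decode(header_segment))
    alg = header.get("alg")
    if alg == "ES256":
        return "jwks"  # verify against the project's published JWKS
    if alg == "HS256":
        return "hs256-secret"  # legacy path using SUPABASE_JWT_SECRET
    # Anything else (including "none") is rejected rather than guessed at.
    raise ValueError(f"unsupported JWT alg: {alg!r}")
```

Rejecting unknown algorithms outright means a future signing-key migration fails loudly at this point instead of silently falling through to the wrong verifier.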

Fly.io

Role — App hosting + auto-deploy. Container running uvicorn.
App name — lis-bench
Region — sin (Singapore) — matches Supabase region for sub-10ms DB round-trips
VM — shared-cpu-1x, 512 MB. Auto-stop when idle. ~$3-5/mo.
Auth — FLY_API_TOKEN stored as GitHub Actions secret for the deploy workflow
Setup — See Deploy guide for the full bootstrap. Auto-deploy fires from .github/workflows/deploy.yml on push to main.
Spend cap — Fly PAYG has no hard cap. Set email alerts at $20/mo, $50/mo in Fly's billing dashboard. Pinned at flyctl scale count 1 to prevent runaway autoscale.
What breaks if it's down — Bench is unreachable; Cloudflare returns 502/503. Recovery: run flyctl status -a lis-bench and flyctl logs -a lis-bench to diagnose, then roll back with flyctl image rollback <previous-version> -a lis-bench if a deploy went bad.

Cloudflare

Role — DNS for libearden.dev, edge proxy + CDN for bench.libearden.dev. Orange-cloud (proxied) for the bench subdomain.
Auth — Cloudflare account (operator's email). API access not currently used; everything configured via web UI.
DNS records for bench.libearden.dev:
CNAME bench → 6no30jo.lis-bench.fly.dev (orange-clouded)
CNAME _acme-challenge.bench → bench.libearden.dev.6no30jo.flydns.net (DNS-only, for ACME cert renewal)
TXT _fly-ownership.bench → app-6no30jo (required when Fly app is behind a CDN)
SSL/TLS mode — Full (strict): CF talks HTTPS to the origin AND validates Fly's cert. Do not downgrade; Flexible mode in particular can cause redirect loops.
Always Use HTTPS — ON.
What breaks if it's down — bench.libearden.dev is unreachable. The direct Fly URL lis-bench.fly.dev still works (no CF in front), so treat it as the emergency-only access path. Cloudflare outages are rare but loud.
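To tell an edge outage from an origin outage quickly, the two URLs can be probed in order. The sketch below is an assumption-laden helper, not part of the repo: diagnose and http_ok are hypothetical names, and it assumes the /__health route described in the Supabase section answers on both hostnames.

```python
import urllib.request
from typing import Callable

PRIMARY = "https://bench.libearden.dev/__health"  # through Cloudflare
DIRECT = "https://lis-bench.fly.dev/__health"     # straight to Fly


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a 2xx status within the timeout."""
    req = urllib.request.Request(url, headers={"User-Agent": "bench-ops-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        return False


def diagnose(probe: Callable[[str], bool] = http_ok) -> str:
    """Classify the outage: 'ok', 'edge-down', or 'origin-down'."""
    if probe(PRIMARY):
        return "ok"
    if probe(DIRECT):
        return "edge-down"  # Fly answers directly, so the problem is in front of it
    return "origin-down"    # neither path works: look at Fly first
```

Note that /__health is shallow, so "ok" here only means the machine is serving requests, not that the database is reachable.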

Resend (transactional email)

Role — Operator notification emails when users submit /request-access forms. Soft-fail: missing API key just logs and skips; the DB row is the source of truth.
Auth — Two required secrets plus one optional:
RESEND_API_KEY — from https://resend.com → API Keys
OPERATOR_NOTIFY_EMAIL — destination (e.g. li.bearden@proton.me)
OPERATOR_NOTIFY_FROM — optional, defaults to a Resend-default sender; set to bench@libearden.dev after domain verification
Domain verification — libearden.dev is verified on Resend with DKIM + SPF + return-path records in Cloudflare DNS. Lets emails come From: bench@libearden.dev instead of Resend's default sender.
Setup — flyctl secrets set -a lis-bench RESEND_API_KEY="re_..." OPERATOR_NOTIFY_EMAIL="..."
Cost — Free tier (3000/month, 100/day). Realistic volume: ~1-5 emails/week. Will not hit limit.
What breaks if it's down — Access-request form submissions still record to DB; operator alert emails don't send. Operator can poll SELECT * FROM access_requests WHERE status='pending' to see what's queued.
Cloudflare bot detection — Resend API sits behind CF. App's urlopen calls must include an explicit User-Agent header or they get HTTP 403 (CF error 1010). See app/services/notifications.py.
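A minimal sketch of a Resend call that carries an explicit User-Agent and soft-fails when secrets are missing. It is not the code in app/services/notifications.py: build_alert_request is a hypothetical name, the User-Agent string is made up, and the onboarding@resend.dev fallback sender is an assumption about what "Resend-default sender" means here.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional

RESEND_ENDPOINT = "https://api.resend.com/emails"
USER_AGENT = "lis-bench-notifier/0.1"  # hypothetical value; any explicit UA avoids the default


def build_alert_request(
    subject: str,
    body: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[urllib.request.Request]:
    """Build the POST request, or return None when the secrets are missing."""
    api_key = env.get("RESEND_API_KEY")
    to_addr = env.get("OPERATOR_NOTIFY_EMAIL")
    if not api_key or not to_addr:
        return None  # soft-fail: the caller logs and skips, the DB row is already saved
    payload = {
        # ASSUMPTION: fallback sender address when OPERATOR_NOTIFY_FROM is unset.
        "from": env.get("OPERATOR_NOTIFY_FROM", "onboarding@resend.dev"),
        "to": [to_addr],
        "subject": subject,
        "text": body,
    }
    return urllib.request.Request(
        RESEND_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "User-Agent": USER_AGENT,  # without this, urllib's default UA is sent
        },
    )
```

Sending is then a matter of passing a non-None request to urllib.request.urlopen inside a try/except that logs and swallows the error, so a Resend failure never blocks the form submission.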

Sentry

Role — Error monitoring + breadcrumbs. Captures unhandled exceptions, manual capture_exception calls in service modules, and breadcrumbs from spend_cap (cap-hit events).
Auth — SENTRY_DSN env var
Environment — SENTRY_ENVIRONMENT=prod for production deploys (also gates diagnostic routes — see app/services/security.py:diagnostics_enabled())
Trace sample rate — SENTRY_TRACES_SAMPLE_RATE=0.1 default for non-local
Setup — From sentry.io dashboard, create project; copy DSN; flyctl secrets set -a lis-bench SENTRY_DSN="..."
Cost — Free tier covers 5k errors + 10k performance units/month. We're well under.
What breaks if it's down — Errors still happen but go unobserved. Service modules try-import sentry_sdk defensively, so missing Sentry doesn't crash anything; observability just degrades.
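The defensive try-import pattern mentioned above looks roughly like this. A sketch under assumptions: report_error is a hypothetical helper name, and the injectable sdk parameter exists only to make the behaviour easy to exercise without Sentry installed.

```python
from typing import Any, Optional

try:
    import sentry_sdk as _sentry  # optional dependency
except ImportError:
    _sentry = None  # observability degrades, nothing crashes


def report_error(exc: BaseException, sdk: Optional[Any] = _sentry) -> bool:
    """Send exc to Sentry if the SDK is available; return whether it was sent."""
    if sdk is None:
        return False
    sdk.capture_exception(exc)
    return True
```

A service module would call report_error(exc) inside its except block and carry on with its fallback path regardless of the return value.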

GitHub

Role — Source code, issue tracking, Actions for CI/CD, GitHub Pages for docs, GitHub App platform for Charlie + Linear
Repos — LiBearden/research-engineer-interview-prep (private, will go public when ready to advertise)
Workflows (in .github/workflows/):
test.yml — unit tests on every push + PR
docker-smoke.yml — builds image + verifies import app.main succeeds inside it
labeler.yml — auto-applies human-only to PRs touching CODEOWNERS paths
sync-labels.yml — syncs labels.yml → repo labels on push to main
deploy.yml — auto-deploys to Fly on push to main (uses FLY_API_TOKEN secret)
docs.yml — builds mkdocs + publishes to GitHub Pages (the workflow that publishes this page)
Secrets (in repo Settings → Secrets and variables → Actions):
FLY_API_TOKEN — for deploy workflow
CODEOWNERS — .github/CODEOWNERS marks auth, security, LLM prompts, migrations, deploy infra as @LiBearden-owned. Auto-labeler reads this to apply human-only to PRs touching protected paths.
What breaks if it's down — Can't push code (catastrophic), CI doesn't fire, deploys halt. Existing prod deploy keeps running. Recovery is wait-it-out — GitHub's SLAs are good.

Linear (Bench team)

Role — Project tracking + the autonomous-loop dispatch surface. Issues sync bidirectionally with GitHub Issues. Team name: Bench, issue prefix: BEN. Migrated from the multi-project Li's Space team on 2026-05-16 so this project has its own isolated issue space.
Auth — Operator account (li.bearden@proton.me), Linear's GitHub integration installed
Sync rules — Two-way: GH issues create Linear issues and vice versa. Webhook URL configured at GitHub org-level. Magic-words enabled: commit / PR with Resolves #N or Closes BEN-XXX auto-transitions the Linear-side issue through In Progress → Done.
Project assignment — Manual during operator triage, since default-project routing isn't available on the Basic plan. Costs ~3 seconds extra per issue.
What breaks if it's down — Auto-status transitions stop, sync stops. New GH issues still get created; new Linear issues just don't auto-mirror until Linear's back. Catch-up: Linear's sync replays missed events.
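For checking locally whether a commit message contains the magic words before pushing, something like the following works. It only approximates the two phrasings named above (Resolves #N, Closes BEN-XXX) and is not Linear's actual parser; referenced_issues is a hypothetical helper.

```python
import re
from typing import List

# Matches only the two phrasings documented here; Linear may recognise more.
MAGIC_WORDS = re.compile(r"\b(?:resolves|closes)\s+(#\d+|BEN-\d+)\b", re.IGNORECASE)


def referenced_issues(message: str) -> List[str]:
    """Return issue references that follow a magic word, normalised to upper case."""
    return [ref.upper() for ref in MAGIC_WORDS.findall(message)]
```

A bare mention such as "see BEN-5" returns nothing, since only references preceded by a magic word are collected.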

Charlie Labs (autonomous coding agent)

Role — Picks up GitHub issues labeled ready-for-agent + complexity:s/m + vendor:charlie and ships draft PRs. Identity: charlie[bot] (GitHub App).
Auth — Charlie's GitHub App installed on the repo; no PAT to manage. One-click revoke from repo's "Installed GitHub Apps" settings.
Configuration (in Charlie's dashboard): only acts on complexity:s + complexity:m issues, draft PRs only, $10/day budget cap.
Daemons installed in repo (.agents/daemons/):
github-activity-digest — posts Slack digest of PR/CI activity
linear-issue-labeler — dormant; would label Linear issues if its taxonomy template were populated. Currently no-ops.
What breaks if it's down — ready-for-agent issues sit in the queue. Operator can still ship work manually. Charlie's outages have been rare; revisit if they become frequent.
Trial outcome — See agents.md for the full 2026-05-16 trial data (5 issues, 4 PRs, 1 cannot-reproduce close, all passed the rubric).

GitHub Pages (docs)

Role — Hosts the mkdocs-generated site at https://libearden.github.io/research-engineer-interview-prep/
Auth — Built by GH Actions, deployed to the gh-pages branch which GitHub Pages serves
Setup — Workflow .github/workflows/docs.yml runs on push to main when docs/** or mkdocs.yml changes. First-time setup: in repo Settings → Pages → set source to gh-pages branch (one-time operator action).
What breaks if it's down — Docs site stale or unreachable. No effect on the running app.

Secrets inventory

Every secret the app reads from environment, where it lives, what depends on it.

Secret	Where stored	Reader	Failure mode if missing
`ANTHROPIC_API_KEY`	Fly secrets	All LLM-touching services	Falls back to offline paths
`ANTHROPIC_DAILY_TOKEN_CAP`	Fly secrets (or default 200k)	`app/services/security.py`	Defaults to 200k tokens/user/day
`SUPABASE_URL`	Fly secrets	`app/auth.py` JWKS fetch, Supabase client	App boots but auth breaks
`SUPABASE_ANON_KEY`	Fly secrets	Browser-side magic-link client	Auth flow breaks
`SUPABASE_SERVICE_ROLE_KEY`	Fly secrets	Admin / verify flows	Some admin paths break
`SUPABASE_JWT_SECRET`	Fly secrets (legacy)	`app/auth.py` HS256 fallback	OK to be missing if project is fully on ES256
`DATABASE_URL`	Fly secrets	`app/db.py:connect()`	App boot fails
`SENTRY_DSN`	Fly secrets	`app/observability.py`	No error monitoring (degrades silently)
`SENTRY_ENVIRONMENT`	`fly.toml`	Various	Defaults to `local`
`RESEND_API_KEY`	Fly secrets	`app/services/notifications.py`	Operator emails skip (DB row still saved)
`OPERATOR_NOTIFY_EMAIL`	Fly secrets	`app/services/notifications.py`	Same as above
`OPERATOR_NOTIFY_FROM`	Fly secrets (optional)	`app/services/notifications.py`	Falls back to Resend default sender
`LIS_ALLOWED_USER_IDS`	Fly secrets	`app/services/access.py`	Allowlist disabled → anyone with GitHub OAuth can sign in
`LIS_ALLOWED_EMAILS`	Fly secrets	`app/services/access.py`	Same
`LIS_ADMIN_USER_IDS`	Fly secrets (or falls back)	`app/services/access.py`	Falls back to `LIS_ALLOWED_USER_IDS`
`GITHUB_FEEDBACK_TOKEN`	Fly secrets	`app/services/github_feedback.py`	Feedback → GH issue promotion soft-fails (row still saved)
`GITHUB_FEEDBACK_REPO`	Fly secrets (optional)	`app/services/github_feedback.py`	Defaults to `LiBearden/research-engineer-interview-prep`
`LIS_DIAGNOSTICS_ENABLED`	Fly secrets	`app/services/security.py`	`/__sentry/` + `/__claude/` diagnostic routes return 404 in prod
`FLY_API_TOKEN`	GitHub Actions secret	`.github/workflows/deploy.yml`	Auto-deploy from main breaks

Rotation cadence: every 6-12 months for Anthropic + Supabase service role + Fly tokens. Cloudflare uses operator account auth (no rotatable key in app). Charlie's App identity is revocable but not rotatable.
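The allowlist rows in the table imply a small fallback chain, sketched below. This is not app/services/access.py: load_access_config is a hypothetical name, the comma-separated value format is an assumption, and so is the rule that the allowlist counts as disabled only when both allowlist variables are empty.

```python
import os
from typing import Mapping, Optional, Set


def _csv(value: Optional[str]) -> Set[str]:
    # ASSUMPTION: values are comma-separated; the real format may differ.
    return {item.strip() for item in (value or "").split(",") if item.strip()}


def load_access_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the allowlist variables and apply the documented fallbacks."""
    allowed_ids = _csv(env.get("LIS_ALLOWED_USER_IDS"))
    allowed_emails = _csv(env.get("LIS_ALLOWED_EMAILS"))
    # LIS_ADMIN_USER_IDS falls back to LIS_ALLOWED_USER_IDS when unset or empty.
    admin_ids = _csv(env.get("LIS_ADMIN_USER_IDS")) or allowed_ids
    return {
        # ASSUMPTION: with both lists empty, any GitHub-authenticated user may sign in.
        "allowlist_enabled": bool(allowed_ids or allowed_emails),
        "allowed_ids": allowed_ids,
        "allowed_emails": allowed_emails,
        "admin_ids": admin_ids,
    }
```

Logging allowlist_enabled at boot is a cheap way to notice that a secret was dropped during rotation and the app has silently opened up.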

Cost summary

Monthly run-rate for the bench at current usage:

Service	Cost
Fly.io (compute + IPv4)	~$3-5
Cloudflare	$0 (free tier)
Supabase	$0 (free tier; well under limits)
Anthropic	~$1-3 (usage-based; capped per-user via `ANTHROPIC_DAILY_TOKEN_CAP`)
Resend	$0 (free tier)
Sentry	$0 (free tier)
GitHub	$0 (personal account, public-or-private repo)
Linear	(existing subscription — paid for cross-project use)
Charlie Labs	(per-task pricing; trial usage to date)
Bench-specific marginal cost	~$5-10/mo

Linear + Charlie aren't bench-only costs (Linear: multi-project use; Charlie: per-task). The bench's incremental burn is dominated by Fly + Anthropic.

Common failure modes + recovery

Symptom	Likely cause	Recovery
`bench.libearden.dev` returns 502	Fly machine auto-stopped + slow cold start, OR genuine crash	`flyctl status -a lis-bench` → if stopped, retry curl in 5s. If crash-looping, `flyctl logs -a lis-bench` to diagnose; rollback if a deploy went bad: `flyctl image rollback <version>`
Sign-in redirect loop	Supabase migrated JWT signing alg; `app/auth.py` falls behind	Verify `SUPABASE_URL` set; check Supabase dashboard for current signing key type; `app/auth.py` should dispatch via JWKS for ES256
Operator emails not arriving	`RESEND_API_KEY` missing/expired, OR Cloudflare bot detection blocking	Check Fly logs for `Resend HTTPError`. Both cases surface as HTTP 403: if the response carries Cloudflare error `1010`, the User-Agent header is missing (regression); if the `403` comes from Resend itself, the key is expired or the domain is not verified.
Auto-deploy from main not firing	`FLY_API_TOKEN` expired	Regenerate via `flyctl tokens create deploy`, update GitHub repo secret
Linear issues not auto-creating from GH	Webhook stopped firing or sync paused	Linear → Settings → Integrations → GitHub → click sync row → verify still active. Recent deliveries on the GitHub webhook page should show recent green checks.
Charlie not picking up issues	Charlie config drift OR Charlie's App permission revoked	Check Charlie dashboard for active config; verify the App is installed at GitHub repo Settings → Installed GitHub Apps
Spend cap hit unexpectedly	One of the per-user counters got bumped wrong	Reset via DB: `UPDATE public.users SET settings_json = jsonb_set(settings_json, '{anthropic_today}', '{}') WHERE id = '...'`
Docs site stale	`docs.yml` workflow didn't run on last push	Check Actions tab; manually run `mkdocs gh-deploy --force` from a local clone as fallback

Operational playbook — common procedures

Deploy a code change

git push origin main   # triggers test.yml + deploy.yml automatically

Watch progress with gh run watch, or visit https://fly.io/apps/lis-bench/monitoring

Roll back a bad deploy

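flyctl releases -a lis-bench --image   # see recent versions and their image refs
flyctl image rollback <version> -a lis-bench
# If your flyctl version has no `image rollback` subcommand, redeploy the earlier image instead:
# flyctl deploy -a lis-bench --image <image-ref>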

Tail logs

flyctl logs -a lis-bench

SSH into the running container

flyctl ssh console -a lis-bench
# Inside, source the venv if needed:
source /opt/venv/bin/activate
python -c "..."

Rotate Anthropic key

Generate new key at https://console.anthropic.com
flyctl secrets set -a lis-bench ANTHROPIC_API_KEY="<new-key>" (triggers redeploy automatically)
Leave the old key active until the new deploy is verified (Anthropic supports multiple active keys)
Revoke old key from Anthropic console once new deploy is verified

Enable diagnostics temporarily

flyctl secrets set -a lis-bench LIS_DIAGNOSTICS_ENABLED=1
# Investigate via /__sentry/test, /__claude/test
flyctl secrets set -a lis-bench LIS_DIAGNOSTICS_ENABLED=0