Infrastructure & platforms
What the bench depends on, what each platform does, how it's configured, and what breaks if any of it goes away. Operator-facing — read this when something's down, before rotating a credential, or when onboarding a future collaborator.
Stack at a glance
| Layer | Tool |
|---|---|
| Server | FastAPI (Python 3.11) + uvicorn (single worker, in-process cache) |
| Frontend | Jinja2 templates + HTMX. No SPA framework. |
| Database | Postgres via Supabase (Singapore region, ap-southeast-1 pooler) |
| Auth | Supabase Auth + GitHub OAuth (ES256 JWT signing, see app/auth.py) |
| LLM | Anthropic Claude (Sonnet 4.5 default; Haiku 4.5 for cheap paths) |
| Hosting | Fly.io — sin (Singapore) region, shared-cpu-1x VM, 512 MB |
| Edge / DNS | Cloudflare — orange-cloud proxy in front of Fly, Full (strict) SSL |
| Transactional email | Resend (operator alerts for access-request submissions) |
| Error monitoring | Sentry |
| Source + CI/CD | GitHub (Actions for test, docker-smoke, deploy, labeler, docs) |
| Issue tracking | Linear (Bench team, BEN-XXX prefix) ↔ auto-synced with GitHub Issues |
| Coding agent | Charlie Labs (installed as GitHub App) |
| Docs hosting | GitHub Pages (auto-built by mkdocs on push to main) |
External services: detail
Each section: role · auth · setup · what breaks if it's down.
Anthropic API
- Role: All LLM-touching paths (hints, code review, JD parsing, credentials analysis, hamming reflections, URL parsing for `/library/add`, IAC-fit interpretation)
- Auth: `ANTHROPIC_API_KEY` env var; stored as Fly secret in prod
- Models: `claude-sonnet-4-5` default (set in service modules), `claude-haiku-4-5` for low-stakes / cheap calls (see `_DEFAULT_MODEL` in each service)
- Setup: Get API key from https://console.anthropic.com → set as Fly secret: `flyctl secrets set -a lis-bench ANTHROPIC_API_KEY="sk-ant-..."`
- Spend cap: `ANTHROPIC_DAILY_TOKEN_CAP=50000` (per-user-per-day, wired via `app/services/spend_cap.py`). Circuit breaker against runaway loops. Adjust via `flyctl secrets set`.
- What breaks if it's down: Every LLM-touching service falls back to its deterministic offline path (keyword matching, canned hints, etc.). User-visible: hints get less specific, reviews are skipped, JDs use keyword-matching only. Spend cap also kicks in if the user has somehow burned through tokens; same fallback path.
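The cap-then-fallback behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `app/services/spend_cap.py`: `DailyTokenBudget`, `hint_for`, the in-memory counter, and the canned hint text are all hypothetical; only the 50000-token per-user-per-day figure comes from this page.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, Tuple


@dataclass
class DailyTokenBudget:
    """Hypothetical per-user, per-day token counter (in-memory for illustration)."""
    cap: int = 50_000
    _used: Dict[Tuple[str, date], int] = field(default_factory=dict)

    def remaining(self, user_id: str, today: date) -> int:
        return max(0, self.cap - self._used.get((user_id, today), 0))

    def record(self, user_id: str, today: date, tokens: int) -> None:
        key = (user_id, today)
        self._used[key] = self._used.get(key, 0) + tokens


def hint_for(user_id: str, today: date, budget: DailyTokenBudget,
             llm_call: Callable[[], Tuple[str, int]],
             estimated_tokens: int = 1_000) -> str:
    """Use the LLM while budget remains; otherwise return the offline fallback."""
    if budget.remaining(user_id, today) < estimated_tokens:
        return "canned hint"  # cap hit: same path as an API outage
    try:
        text, tokens_used = llm_call()
    except Exception:
        return "canned hint"  # API down or erroring: deterministic fallback
    budget.record(user_id, today, tokens_used)
    return text
```

Both the cap-hit branch and the API-error branch return the same fallback, which mirrors the "same fallback path" note above.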
Supabase (Postgres + Auth)
- Role: Single source of truth for application data (problems, attempts, content, JDs, reflections, feedback, access requests). Auth via GitHub OAuth → ES256-signed JWT.
- Project URL: `https://aemigbrrbykrwclhrpob.supabase.co`
- Region: Singapore (`ap-southeast-1`)
- Auth: multiple secrets
    - `SUPABASE_URL`: public project URL
    - `SUPABASE_ANON_KEY`: public anon key (safe in browser)
    - `SUPABASE_SERVICE_ROLE_KEY`: server-only, never ship to browser
    - `DATABASE_URL`: Postgres connection string via Session pooler
    - `SUPABASE_JWT_SECRET`: legacy HS256 fallback (Supabase migrated this project to ES256 JWKS; `app/auth.py` dispatches on JWT header alg)
- Setup: `flyctl secrets set -a lis-bench SUPABASE_URL="..." SUPABASE_ANON_KEY="..." SUPABASE_SERVICE_ROLE_KEY="..." DATABASE_URL="..." SUPABASE_JWT_SECRET="..."`
- Migrations: Idempotent SQL files in `migrations_pg/`, applied at boot by `app/db.py:apply_migrations()`. Use the next sequential filename when adding (`004_`, `005_`, etc.).
- What breaks if it's down: App boot fails (`apply_migrations` errors), or runtime queries time out. Health check `/__health` is shallow (doesn't hit the DB) so Fly thinks the machine is up, but every page request 500s.
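Dispatching on the JWT header `alg`, as mentioned for `SUPABASE_JWT_SECRET`, might look something like the sketch below. It only reads the unverified header to pick a strategy and does not verify any signature. The function names and the strategy labels are hypothetical and are not taken from `app/auth.py`.

```python
import base64
import json


def peek_jwt_alg(token: str) -> str:
    """Read the `alg` field from a JWT header WITHOUT verifying the signature."""
    header_b64 = token.split(".", 1)[0]
    # base64url segments omit padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    header = json.loads(base64.urlsafe_b64decode(padded))
    return header["alg"]


def choose_verifier(alg: str) -> str:
    """Map the header alg to a verification strategy (labels are illustrative)."""
    if alg == "ES256":
        return "jwks"           # verify against the project's published public keys
    if alg == "HS256":
        return "shared-secret"  # legacy path using SUPABASE_JWT_SECRET
    # Reject everything else, including "none", rather than guessing.
    raise ValueError(f"unsupported JWT alg: {alg!r}")
```

Rejecting unknown algorithms outright is the conservative choice here: the header is attacker-controlled until the signature has been checked.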
Fly.io
- Role: App hosting + auto-deploy. Container running uvicorn.
- App name: `lis-bench`
- Region: `sin` (Singapore), which matches the Supabase region for sub-10ms DB round-trips
- VM: `shared-cpu-1x`, 512 MB. Auto-stop when idle. ~$3-5/mo.
- Auth: `FLY_API_TOKEN` stored as GitHub Actions secret for the deploy workflow
- Setup: See Deploy guide for the full bootstrap. Auto-deploy fires from `.github/workflows/deploy.yml` on push to `main`.
- Spend cap: Fly PAYG has no hard cap. Set email alerts at $20/mo, $50/mo in Fly's billing dashboard. Pinned at `flyctl scale count 1` to prevent runaway autoscale.
- What breaks if it's down: Bench is unreachable. Cloudflare returns 502/503. Recovery: `flyctl status -a lis-bench` + `flyctl logs -a lis-bench` to diagnose; rollback via `flyctl image rollback <previous-version>` if a deploy went bad.
Cloudflare
- Role: DNS for `libearden.dev`, edge proxy + CDN for `bench.libearden.dev`. Orange-cloud (proxied) for the bench subdomain.
- Auth: Cloudflare account (operator's email). API access not currently used; everything configured via web UI.
- DNS records for `bench.libearden.dev`:
    - `CNAME bench → 6no30jo.lis-bench.fly.dev` (orange-clouded)
    - `CNAME _acme-challenge.bench → bench.libearden.dev.6no30jo.flydns.net` (DNS-only, for ACME cert renewal)
    - `TXT _fly-ownership.bench → app-6no30jo` (required when Fly app is behind a CDN)
- SSL/TLS mode: Full (strict). CF talks HTTPS to origin AND validates Fly's cert. Other modes will cause redirect loops.
- Always Use HTTPS: ON.
- What breaks if it's down: `bench.libearden.dev` unreachable. Direct Fly URL `lis-bench.fly.dev` still works (no CF in front), so set that as the emergency-only access path. Cloudflare outages are rare but loud.
Resend (transactional email)
- Role: Operator notification emails when users submit `/request-access` forms. Soft-fail: missing API key just logs and skips; the DB row is the source of truth.
- Auth: two required secrets plus one optional
    - `RESEND_API_KEY`: from https://resend.com → API Keys
    - `OPERATOR_NOTIFY_EMAIL`: destination (e.g. `li.bearden@proton.me`)
    - `OPERATOR_NOTIFY_FROM`: optional, defaults to a Resend-default sender; set to `bench@libearden.dev` after domain verification
- Domain verification: `libearden.dev` is verified on Resend with DKIM + SPF + return-path records in Cloudflare DNS. Lets emails come `From: bench@libearden.dev` instead of Resend's default sender.
- Setup: `flyctl secrets set -a lis-bench RESEND_API_KEY="re_..." OPERATOR_NOTIFY_EMAIL="..."`
- Cost: Free tier (3000/month, 100/day). Realistic volume: ~1-5 emails/week. Will not hit limit.
- What breaks if it's down: Access-request form submissions still record to DB; operator alert emails don't send. Operator can poll `SELECT * FROM access_requests WHERE status='pending'` to see what's queued.
- Cloudflare bot detection: Resend API sits behind CF. App's `urlopen` calls must include an explicit `User-Agent` header or they get HTTP 403 (CF error 1010). See `app/services/notifications.py`.
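As a rough illustration of the `User-Agent` requirement and the soft-fail behaviour, the sketch below builds (but does not send) a request to Resend's send-email endpoint. It is not the code in `app/services/notifications.py`: the function name and the `User-Agent` value are hypothetical.

```python
import json
import urllib.request
from typing import Optional

RESEND_ENDPOINT = "https://api.resend.com/emails"


def build_resend_request(api_key: Optional[str], sender: str, to: str,
                         subject: str, text: str) -> Optional[urllib.request.Request]:
    """Return a ready-to-send Request, or None when no API key is configured."""
    if not api_key:
        return None  # soft-fail: caller logs and skips; the DB row is already saved
    body = json.dumps({"from": sender, "to": [to], "subject": subject, "text": text})
    return urllib.request.Request(
        RESEND_ENDPOINT,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Explicit User-Agent: per this page, requests without one
            # are rejected with HTTP 403 (CF error 1010).
            "User-Agent": "lis-bench-notifier/1.0",  # hypothetical value
        },
    )
```

A caller would pass the result to `urllib.request.urlopen(req, timeout=...)` inside a `try`/`except urllib.error.HTTPError` block, skipping the send entirely when the function returns `None`.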
Sentry
- Role: Error monitoring + breadcrumbs. Captures unhandled exceptions, manual `capture_exception` calls in service modules, and breadcrumbs from `spend_cap` (cap-hit events).
- Auth: `SENTRY_DSN` env var
- Environment: `SENTRY_ENVIRONMENT=prod` for production deploys (also gates diagnostic routes; see `app/services/security.py:diagnostics_enabled()`)
- Trace sample rate: `SENTRY_TRACES_SAMPLE_RATE=0.1` default for non-local
- Setup: From sentry.io dashboard, create project; copy DSN; `flyctl secrets set -a lis-bench SENTRY_DSN="..."`
- Cost: Free tier covers 5k errors + 10k performance units/month. We're well under.
- What breaks if it's down: Errors still happen but go unobserved. Service modules try-import `sentry_sdk` defensively, so missing Sentry doesn't crash anything; observability just degrades.
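The defensive try-import pattern mentioned above might look like the following sketch. The helper name `report_exception` is hypothetical; the service modules may structure this differently.

```python
try:
    import sentry_sdk  # optional dependency
except ImportError:
    sentry_sdk = None  # Sentry not installed: degrade to a no-op


def report_exception(exc: BaseException) -> bool:
    """Forward exc to Sentry if available; return whether it was forwarded."""
    if sentry_sdk is None:
        return False
    try:
        sentry_sdk.capture_exception(exc)
    except Exception:
        return False  # observability must never crash the caller
    return True
```

The return value is only informational; callers can ignore it and continue with their normal error handling.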
GitHub
- Role: Source code, issue tracking, Actions for CI/CD, GitHub Pages for docs, GitHub App platform for Charlie + Linear
- Repos: `LiBearden/research-engineer-interview-prep` (private, will go public when ready to advertise)
- Workflows (in `.github/workflows/`):
    - `test.yml`: unit tests on every push + PR
    - `docker-smoke.yml`: builds image + verifies `import app.main` succeeds inside it
    - `labeler.yml`: auto-applies `human-only` to PRs touching CODEOWNERS paths
    - `sync-labels.yml`: syncs `labels.yml` → repo labels on push to main
    - `deploy.yml`: auto-deploys to Fly on push to main (uses `FLY_API_TOKEN` secret)
    - `docs.yml`: builds mkdocs + publishes to GitHub Pages (this file, added in this PR)
- Secrets (in repo Settings → Secrets and variables → Actions):
    - `FLY_API_TOKEN`: for deploy workflow
- CODEOWNERS: `.github/CODEOWNERS` marks auth, security, LLM prompts, migrations, deploy infra as `@LiBearden`-owned. Auto-labeler reads this to apply `human-only` to PRs touching protected paths.
- What breaks if it's down: Can't push code (catastrophic), CI doesn't fire, deploys halt. Existing prod deploy keeps running. Recovery is wait-it-out; GitHub's SLAs are good.
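The labeling rule (apply `human-only` when a PR touches a protected path) can be sketched as below. This is a simplification under assumptions: the pattern list is hypothetical (the real one lives in `.github/CODEOWNERS`), and `fnmatch` globbing only approximates CODEOWNERS pattern semantics.

```python
from fnmatch import fnmatch
from typing import Iterable, List

# Hypothetical protected patterns, for illustration only.
PROTECTED_PATTERNS = ["app/auth.py", "migrations_pg/*", ".github/workflows/*"]


def labels_for_pr(changed_paths: Iterable[str],
                  patterns: List[str] = PROTECTED_PATTERNS) -> List[str]:
    """Return ["human-only"] if any changed path matches a protected pattern."""
    touched = any(fnmatch(path, pat) for path in changed_paths for pat in patterns)
    return ["human-only"] if touched else []
```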
Linear (Bench team)
- Role: Project tracking + the autonomous-loop dispatch surface. Issues sync bidirectionally with GitHub Issues. Team name: `Bench`, issue prefix: `BEN`. Migrated from the multi-project `Li's Space` team on 2026-05-16 so this project has its own isolated issue space.
- Auth: Operator account (`li.bearden@proton.me`), Linear's GitHub integration installed
- Sync rules: Two-way. GH issues create Linear issues and vice versa. Webhook URL configured at GitHub org-level. Magic-words enabled: commit / PR with `Resolves #N` or `Closes BEN-XXX` auto-transitions the Linear-side issue through `In Progress` → `Done`.
- Project assignment: Manual during operator triage. Default-project routing isn't on Basic plan. ~3 seconds extra per issue when triaging.
- What breaks if it's down: Auto-status transitions stop, sync stops. New GH issues still get created; new Linear issues just don't auto-mirror until Linear's back. Catch-up: Linear's sync replays missed events.
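To check locally whether a commit or PR message contains one of the magic-word references named above, a small parser along these lines could help. It is a sketch that covers only the two verbs mentioned on this page (`Resolves`, `Closes`); the function name is hypothetical and this is not how Linear itself parses messages.

```python
import re
from typing import List

# Matches "Resolves #12" / "Closes BEN-34" (case-insensitive verbs).
_MAGIC = re.compile(r"\b(?:resolves|closes)\s+(#\d+|BEN-\d+)", re.IGNORECASE)


def referenced_issues(message: str) -> List[str]:
    """Return issue references such as "#42" or "BEN-17" found in a message."""
    return [ref.upper() for ref in _MAGIC.findall(message)]
```

For example, `referenced_issues("Fix cache. Resolves #42 and closes ben-17")` returns `["#42", "BEN-17"]`.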
Charlie Labs (autonomous coding agent)
- Role: Picks up GitHub issues labeled `ready-for-agent` + `complexity:s/m` + `vendor:charlie` and ships draft PRs. Identity: `charlie[bot]` (GitHub App).
- Auth: Charlie's GitHub App installed on the repo; no PAT to manage. One-click revoke from repo's "Installed GitHub Apps" settings.
- Configuration (in Charlie's dashboard): only acts on `complexity:s` + `complexity:m` issues, draft PRs only, $10/day budget cap.
- Daemons installed in repo (`.agents/daemons/`):
    - `github-activity-digest`: posts Slack digest of PR/CI activity
    - `linear-issue-labeler`: dormant; would label Linear issues if its taxonomy template were populated. Currently no-ops.
- What breaks if it's down: `ready-for-agent` issues sit in the queue. Operator can still ship work manually. Charlie's outages have been rare; revisit if they become frequent.
- Trial outcome: See `agents.md` for the full 2026-05-16 trial data (5 issues, 4 PRs, 1 cannot-reproduce close, all passed the rubric).
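The pickup rule for Charlie (all of `ready-for-agent`, `vendor:charlie`, and one of the two allowed complexity labels) reduces to a simple predicate. The sketch below is only a way to reason about label combinations during triage; the function name is hypothetical and Charlie's actual matching logic is configured in its dashboard.

```python
from typing import Set

ALLOWED_COMPLEXITY = {"complexity:s", "complexity:m"}


def charlie_eligible(labels: Set[str]) -> bool:
    """True when an issue carries every label the pickup rule requires."""
    has_complexity = bool(labels & ALLOWED_COMPLEXITY)
    return ("ready-for-agent" in labels
            and "vendor:charlie" in labels
            and has_complexity)
```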
GitHub Pages (docs)
- Role: Hosts the mkdocs-generated site at `https://libearden.github.io/research-engineer-interview-prep/`
- Auth: Built by GH Actions, deployed to the `gh-pages` branch which GitHub Pages serves
- Setup: Workflow `.github/workflows/docs.yml` runs on push to main when `docs/**` or `mkdocs.yml` changes. First-time setup: in repo Settings → Pages → set source to `gh-pages` branch (one-time operator action).
- What breaks if it's down: Docs site stale or unreachable. No effect on the running app.
Secrets inventory
Every secret the app reads from environment, where it lives, what depends on it.
| Secret | Where stored | Reader | Failure mode if missing |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | Fly secrets | All LLM-touching services | Falls back to offline paths |
| `ANTHROPIC_DAILY_TOKEN_CAP` | Fly secrets (or default 200k) | `app/services/security.py` | Defaults to 200k tokens/user/day |
| `SUPABASE_URL` | Fly secrets | `app/auth.py` JWKS fetch, Supabase client | App boots but auth breaks |
| `SUPABASE_ANON_KEY` | Fly secrets | Browser-side magic-link client | Auth flow breaks |
| `SUPABASE_SERVICE_ROLE_KEY` | Fly secrets | Admin / verify flows | Some admin paths break |
| `SUPABASE_JWT_SECRET` | Fly secrets (legacy) | `app/auth.py` HS256 fallback | OK to be missing if project is fully on ES256 |
| `DATABASE_URL` | Fly secrets | `app/db.py:connect()` | App boot fails |
| `SENTRY_DSN` | Fly secrets | `app/observability.py` | No error monitoring (degrades silently) |
| `SENTRY_ENVIRONMENT` | `fly.toml` | Various | Defaults to `local` |
| `RESEND_API_KEY` | Fly secrets | `app/services/notifications.py` | Operator emails skip (DB row still saved) |
| `OPERATOR_NOTIFY_EMAIL` | Fly secrets | `app/services/notifications.py` | Same as above |
| `OPERATOR_NOTIFY_FROM` | Fly secrets (optional) | `app/services/notifications.py` | Falls back to Resend default sender |
| `LIS_ALLOWED_USER_IDS` | Fly secrets | `app/services/access.py` | Allowlist disabled → anyone with GitHub OAuth can sign in |
| `LIS_ALLOWED_EMAILS` | Fly secrets | `app/services/access.py` | Same |
| `LIS_ADMIN_USER_IDS` | Fly secrets (or falls back) | `app/services/access.py` | Falls back to `LIS_ALLOWED_USER_IDS` |
| `GITHUB_FEEDBACK_TOKEN` | Fly secrets | `app/services/github_feedback.py` | Feedback → GH issue promotion soft-fails (row still saved) |
| `GITHUB_FEEDBACK_REPO` | Fly secrets (optional) | `app/services/github_feedback.py` | Defaults to `LiBearden/research-engineer-interview-prep` |
| `LIS_DIAGNOSTICS_ENABLED` | Fly secrets | `app/services/security.py` | `/__sentry/*` + `/__claude/*` diagnostic routes return 404 in prod |
| `FLY_API_TOKEN` | GitHub Actions secret | `.github/workflows/deploy.yml` | Auto-deploy from main breaks |
Rotation cadence: every 6-12 months for Anthropic + Supabase service role + Fly tokens. Cloudflare uses operator account auth (no rotatable key in app). Charlie's App identity is revocable but not rotatable.
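Before or after a rotation, it can be handy to list which secrets are unset and what that implies. The sketch below encodes a subset of the inventory above as a dictionary and reports missing entries. The helper name and the choice of subset are illustrative; nothing like this is claimed to exist in the app.

```python
from typing import Dict, List, Mapping

# Subset of the secrets inventory: name -> failure mode if missing.
FAILURE_MODES: Dict[str, str] = {
    "DATABASE_URL": "App boot fails",
    "SUPABASE_URL": "App boots but auth breaks",
    "ANTHROPIC_API_KEY": "Falls back to offline paths",
    "SENTRY_DSN": "No error monitoring (degrades silently)",
    "RESEND_API_KEY": "Operator emails skip (DB row still saved)",
}


def missing_secrets(env: Mapping[str, str]) -> List[str]:
    """Return "NAME: impact" lines for every listed secret that is unset or empty."""
    return [f"{name}: {impact}"
            for name, impact in FAILURE_MODES.items()
            if not env.get(name)]
```

Typical usage would be `missing_secrets(os.environ)` from a one-off script or a `flyctl ssh console` session.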
Cost summary
Monthly run-rate for the bench at current usage:
| Service | Cost |
|---|---|
| Fly.io (compute + IPv4) | ~$3-5 |
| Cloudflare | $0 (free tier) |
| Supabase | $0 (free tier; well under limits) |
| Anthropic | ~$1-3 (usage-based; capped per-user via ANTHROPIC_DAILY_TOKEN_CAP) |
| Resend | $0 (free tier) |
| Sentry | $0 (free tier) |
| GitHub | $0 (personal account, public-or-private repo) |
| Linear | (existing subscription — paid for cross-project use) |
| Charlie Labs | (per-task pricing; trial usage to date) |
| Bench-specific marginal cost | ~$5-10/mo |
Linear + Charlie aren't bench-only costs (Linear: multi-project use; Charlie: per-task). The bench's incremental burn is dominated by Fly + Anthropic.
Common failure modes + recovery
| Symptom | Likely cause | Recovery |
|---|---|---|
| `bench.libearden.dev` returns 502 | Fly machine auto-stopped + slow cold start, OR genuine crash | `flyctl status -a lis-bench` → if stopped, retry curl in 5s. If crash-looping, `flyctl logs -a lis-bench` to diagnose; rollback if a deploy went bad: `flyctl image rollback <version>` |
| Sign-in redirect loop | Supabase migrated JWT signing alg; `app/auth.py` falls behind | Verify `SUPABASE_URL` set; check Supabase dashboard for current signing key type; `app/auth.py` should dispatch via JWKS for ES256 |
| Operator emails not arriving | `RESEND_API_KEY` missing/expired, OR Cloudflare bot detection blocking | Check Fly logs for Resend `HTTPError`. If 1010, the `User-Agent` header is missing (regression). If 403 from Resend, key is expired or domain not verified. |
| Auto-deploy from main not firing | `FLY_API_TOKEN` expired | Regenerate via `flyctl tokens create deploy`, update GitHub repo secret |
| Linear issues not auto-creating from GH | Webhook stopped firing or sync paused | Linear → Settings → Integrations → GitHub → click sync row → verify still active. Recent deliveries on the GitHub webhook page should show recent green checks. |
| Charlie not picking up issues | Charlie config drift OR Charlie's App permission revoked | Check Charlie dashboard for active config; verify the App is installed at GitHub repo Settings → Installed GitHub Apps |
| Spend cap hit unexpectedly | One of the per-user counters got bumped wrong | Reset via DB: `UPDATE public.users SET settings_json = jsonb_set(settings_json, '{anthropic_today}', '{}') WHERE id = '...'` |
| Docs site stale | `docs.yml` workflow didn't run on last push | Check Actions tab; manually run `mkdocs gh-deploy --force` from a local clone as fallback |
Operational playbook: common procedures
Deploy a code change
Push to `main`; `.github/workflows/deploy.yml` deploys to Fly automatically. Watch the run with `gh run watch`, or visit https://fly.io/apps/lis-bench/monitoring
Roll back a bad deploy
See the rollback command under Fly.io, "What breaks if it's down".
Tail logs
`flyctl logs -a lis-bench`
SSH into the running container
flyctl ssh console -a lis-bench
# Inside, source the venv if needed:
source /opt/venv/bin/activate
python -c "..."
Rotate Anthropic key
- Generate new key at https://console.anthropic.com
- `flyctl secrets set -a lis-bench ANTHROPIC_API_KEY="<new-key>"` (triggers redeploy automatically)
- Keep the old key active until the new deploy is verified; Anthropic supports multiple active keys
- Revoke old key from Anthropic console once new deploy is verified
Enable diagnostics temporarily
flyctl secrets set -a lis-bench LIS_DIAGNOSTICS_ENABLED=1
# Investigate via /__sentry/test, /__claude/test
flyctl secrets set -a lis-bench LIS_DIAGNOSTICS_ENABLED=0
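The gate this toggle flips can be pictured with the sketch below. The logic is an assumption pieced together from this page (diagnostic routes return 404 in prod unless `LIS_DIAGNOSTICS_ENABLED` is set, and `SENTRY_ENVIRONMENT` defaults to `local`); it is not a copy of `app/services/security.py:diagnostics_enabled()`, and the function name is deliberately different.

```python
from typing import Mapping


def diagnostics_enabled_sketch(env: Mapping[str, str]) -> bool:
    """Assumed logic: always on outside prod; in prod only when explicitly enabled."""
    if env.get("SENTRY_ENVIRONMENT", "local") != "prod":
        return True
    return env.get("LIS_DIAGNOSTICS_ENABLED", "0") == "1"
```

Under this reading, setting the flag back to `0` (or unsetting it) makes the `/__sentry/*` and `/__claude/*` routes return 404 again in prod.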
See also
- Deploy guide — first-time bootstrap + auto-deploy setup
- Access control — single-tenant allowlist + Cloudflare Access (outer gate)
- Autonomous coding agents — Charlie + tier model + trial outcomes
- Common dev tasks
- Setup & tests