Infrastructure & platforms

What the bench depends on, what each platform does, how it's configured, and what breaks if any of it goes away. Operator-facing — read this when something's down, before rotating a credential, or when onboarding a future collaborator.

Stack at a glance

| Layer | Tool |
| --- | --- |
| Server | FastAPI (Python 3.11) + uvicorn (single worker, in-process cache) |
| Frontend | Jinja2 templates + HTMX. No SPA framework. |
| Database | Postgres via Supabase (Singapore region, ap-southeast-1 pooler) |
| Auth | Supabase Auth + GitHub OAuth (ES256 JWT signing, see app/auth.py) |
| LLM | Anthropic Claude (Sonnet 4.5 default; Haiku 4.5 for cheap paths) |
| Hosting | Fly.io — sin (Singapore) region, shared-cpu-1x VM, 512 MB |
| Edge / DNS | Cloudflare — orange-cloud proxy in front of Fly, Full (strict) SSL |
| Transactional email | Resend (operator alerts for access-request submissions) |
| Error monitoring | Sentry |
| Source + CI/CD | GitHub (Actions for test, docker-smoke, deploy, labeler, docs) |
| Issue tracking | Linear (Bench team, BEN-XXX prefix) ↔ auto-synced with GitHub Issues |
| Coding agent | Charlie Labs (installed as GitHub App) |
| Docs hosting | GitHub Pages (auto-built by mkdocs on push to main) |

External services — detail

Each section: role · auth · setup · what breaks if it's down.

Anthropic API

  • Role — All LLM-touching paths: hints, code review, JD parsing, credentials analysis, hamming reflections, URL parsing for /library/add, IAC-fit interpretation
  • Auth — ANTHROPIC_API_KEY env var; stored as Fly secret in prod
  • Models — claude-sonnet-4-5 default (set in service modules), claude-haiku-4-5 for low-stakes / cheap calls (see _DEFAULT_MODEL in each service)
  • Setup — Get API key from https://console.anthropic.com → set as Fly secret: flyctl secrets set -a lis-bench ANTHROPIC_API_KEY="sk-ant-..."
  • Spend cap — ANTHROPIC_DAILY_TOKEN_CAP=50000 (per-user-per-day, wired via app/services/spend_cap.py). Circuit breaker against runaway loops. Adjust via flyctl secrets set.
  • What breaks if it's down — Every LLM-touching service falls back to its deterministic offline path (keyword matching, canned hints, etc.). User-visible: hints get less specific, reviews are skipped, JDs use keyword-matching only. Spend cap also kicks in if the user has somehow burned through tokens; same fallback path.
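
The cap-then-fallback behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in app/services/spend_cap.py: the `SpendCap` class, its in-memory counter, and the `hint_with_fallback` helper are hypothetical names, and the real service may store its counters differently.

```python
import os
from collections import defaultdict
from datetime import date
from typing import Callable, Tuple

# Assumption: the cap is read from the env var named in this doc;
# the 50000 default here simply mirrors the value quoted above.
DAILY_CAP = int(os.environ.get("ANTHROPIC_DAILY_TOKEN_CAP", "50000"))


class SpendCap:
    """Hypothetical in-memory per-user-per-day token counter."""

    def __init__(self, cap: int = DAILY_CAP) -> None:
        self.cap = cap
        self._used = defaultdict(int)  # (user_id, day) -> tokens used

    def allows(self, user_id: str) -> bool:
        return self._used[(user_id, date.today())] < self.cap

    def record(self, user_id: str, tokens: int) -> None:
        self._used[(user_id, date.today())] += tokens


def hint_with_fallback(
    user_id: str,
    cap: SpendCap,
    call_llm: Callable[[], Tuple[str, int]],
    offline_hint: Callable[[], str],
) -> str:
    """Use the LLM when allowed; on a cap hit or API error, use the offline path."""
    if not cap.allows(user_id):
        return offline_hint()
    try:
        text, tokens = call_llm()  # returns (response text, tokens consumed)
    except Exception:
        return offline_hint()
    cap.record(user_id, tokens)
    return text
```

Both failure cases (cap exhausted, API unreachable) end in the same deterministic path, which matches the "same fallback path" note above.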

Supabase (Postgres + Auth)

  • Role — Single source of truth for application data (problems, attempts, content, JDs, reflections, feedback, access requests). Auth via GitHub OAuth → ES256-signed JWT.
  • Project URL — https://aemigbrrbykrwclhrpob.supabase.co
  • Region — Singapore (ap-southeast-1)
  • Auth — Multiple secrets:
      • SUPABASE_URL — public project URL
      • SUPABASE_ANON_KEY — public anon key (safe in browser)
      • SUPABASE_SERVICE_ROLE_KEY — server-only, never ship to browser
      • DATABASE_URL — Postgres connection string via Session pooler
      • SUPABASE_JWT_SECRET — legacy HS256 fallback (Supabase migrated this project to ES256 JWKS; app/auth.py dispatches on JWT header alg)
  • Setup — flyctl secrets set -a lis-bench SUPABASE_URL="..." SUPABASE_ANON_KEY="..." SUPABASE_SERVICE_ROLE_KEY="..." DATABASE_URL="..." SUPABASE_JWT_SECRET="..."
  • Migrations — Idempotent SQL files in migrations_pg/, applied at boot by app/db.py:apply_migrations(). Use the next sequential filename when adding (004_, 005_, etc.).
  • What breaks if it's down — App boot fails (apply_migrations errors), or runtime queries time out. Health check /__health is shallow (doesn't hit the DB) so Fly thinks the machine is up, but every page request 500s.
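
The alg-based dispatch mentioned for app/auth.py can be illustrated with the standard library alone. This sketch only reads the unverified JWT header to choose a verification path; it does not verify any signature. `pick_verifier` and its return labels are hypothetical, not the module's actual API.

```python
import base64
import json


def jwt_header(token: str) -> dict:
    """Decode the (unverified) header segment of a compact JWT."""
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


def pick_verifier(token: str) -> str:
    """Choose a verification path from the header's alg field."""
    alg = jwt_header(token).get("alg")
    if alg == "ES256":
        return "jwks"   # verify against the project's JWKS public keys
    if alg == "HS256":
        return "hs256"  # legacy path using SUPABASE_JWT_SECRET
    raise ValueError(f"unsupported JWT alg: {alg!r}")
```

Rejecting every other alg value outright (including "none") keeps an unexpected header from silently selecting a weaker path.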

Fly.io

  • Role — App hosting + auto-deploy. Container running uvicorn.
  • App name — lis-bench
  • Region — sin (Singapore), matching the Supabase region for sub-10ms DB round-trips
  • VM — shared-cpu-1x, 512 MB. Auto-stop when idle. ~$3-5/mo.
  • Auth — FLY_API_TOKEN stored as GitHub Actions secret for the deploy workflow
  • Setup — See Deploy guide for the full bootstrap. Auto-deploy fires from .github/workflows/deploy.yml on push to main.
  • Spend cap — Fly PAYG has no hard cap. Set email alerts at $20/mo, $50/mo in Fly's billing dashboard. Pinned at flyctl scale count 1 to prevent runaway autoscale.
  • What breaks if it's down — Bench is unreachable. Cloudflare returns 502/503. Recovery: flyctl status -a lis-bench + flyctl logs -a lis-bench to diagnose; rollback via flyctl image rollback <previous-version> if a deploy went bad.

Cloudflare

  • Role — DNS for libearden.dev, edge proxy + CDN for bench.libearden.dev. Orange-cloud (proxied) for the bench subdomain.
  • Auth — Cloudflare account (operator's email). API access not currently used; everything configured via web UI.
  • DNS records for bench.libearden.dev:
      • CNAME bench → 6no30jo.lis-bench.fly.dev (orange-clouded)
      • CNAME _acme-challenge.bench → bench.libearden.dev.6no30jo.flydns.net (DNS-only, for ACME cert renewal)
      • TXT _fly-ownership.bench → app-6no30jo (required when Fly app is behind a CDN)
  • SSL/TLS mode — Full (strict). CF talks HTTPS to origin AND validates Fly's cert. Other modes will cause redirect loops.
  • Always Use HTTPS — ON.
  • What breaks if it's down — bench.libearden.dev unreachable. Direct Fly URL lis-bench.fly.dev still works (no CF in front), so set that as the emergency-only access path. Cloudflare outages are rare but loud.

Resend (transactional email)

  • Role — Operator notification emails when users submit /request-access forms. Soft-fail: missing API key just logs and skips; the DB row is the source of truth.
  • Auth — Two required secrets plus one optional:
      • RESEND_API_KEY — from https://resend.com → API Keys
      • OPERATOR_NOTIFY_EMAIL — destination (e.g. li.bearden@proton.me)
      • OPERATOR_NOTIFY_FROM — optional, defaults to a Resend-default sender; set to bench@libearden.dev after domain verification
  • Domain verification — libearden.dev is verified on Resend with DKIM + SPF + return-path records in Cloudflare DNS. Lets emails come From: bench@libearden.dev instead of Resend's default sender.
  • Setup — flyctl secrets set -a lis-bench RESEND_API_KEY="re_..." OPERATOR_NOTIFY_EMAIL="..."
  • Cost — Free tier (3000/month, 100/day). Realistic volume: ~1-5 emails/week. Will not hit limit.
  • What breaks if it's down — Access-request form submissions still record to DB; operator alert emails don't send. Operator can poll SELECT * FROM access_requests WHERE status='pending' to see what's queued.
  • Cloudflare bot detection — Resend API sits behind CF. App's urlopen calls must include an explicit User-Agent header or they get HTTP 403 (CF error 1010). See app/services/notifications.py.
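
The User-Agent requirement and the soft-fail behaviour can be sketched together. This is a minimal illustration, not the contents of app/services/notifications.py: `build_resend_request` is a hypothetical helper and the User-Agent string is an invented placeholder. The request is only constructed here; the caller would pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional

RESEND_URL = "https://api.resend.com/emails"


def build_resend_request(
    api_key: Optional[str], sender: str, to: str, subject: str, text: str
) -> Optional[urllib.request.Request]:
    """Build the POST request, or return None when no key is configured (soft-fail)."""
    if not api_key:
        return None  # caller logs and skips; the DB row stays the source of truth
    payload = json.dumps(
        {"from": sender, "to": [to], "subject": subject, "text": text}
    ).encode()
    return urllib.request.Request(
        RESEND_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Explicit User-Agent; the value is a placeholder, not the app's real one.
            "User-Agent": "lis-bench-notifier/1.0",
        },
    )
```

A regression test that asserts the header is present on the built request would catch the 403 / error 1010 case before it reaches production.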

Sentry

  • Role — Error monitoring + breadcrumbs. Captures unhandled exceptions, manual capture_exception calls in service modules, and breadcrumbs from spend_cap (cap-hit events).
  • Auth — SENTRY_DSN env var
  • Environment — SENTRY_ENVIRONMENT=prod for production deploys (also gates diagnostic routes; see app/services/security.py:diagnostics_enabled())
  • Trace sample rate — SENTRY_TRACES_SAMPLE_RATE=0.1 default for non-local
  • Setup — From sentry.io dashboard, create project; copy DSN; flyctl secrets set -a lis-bench SENTRY_DSN="..."
  • Cost — Free tier covers 5k errors + 10k performance units/month. We're well under.
  • What breaks if it's down — Errors still happen but go unobserved. Service modules try-import sentry_sdk defensively, so missing Sentry doesn't crash anything; observability just degrades.
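
The defensive try-import pattern mentioned above looks roughly like this. `report_exception` is a hypothetical wrapper name; `sentry_sdk.capture_exception` is the SDK's own call.

```python
try:
    import sentry_sdk  # optional dependency; may be absent in some environments
except ImportError:
    sentry_sdk = None


def report_exception(exc: BaseException) -> bool:
    """Forward exc to Sentry when the SDK is importable; return whether it was forwarded."""
    if sentry_sdk is None:
        return False  # observability degrades, but nothing crashes
    sentry_sdk.capture_exception(exc)
    return True
```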

GitHub

  • Role — Source code, issue tracking, Actions for CI/CD, GitHub Pages for docs, GitHub App platform for Charlie + Linear
  • Repos — LiBearden/research-engineer-interview-prep (private, will go public when ready to advertise)
  • Workflows (in .github/workflows/):
      • test.yml — unit tests on every push + PR
      • docker-smoke.yml — builds image + verifies import app.main succeeds inside it
      • labeler.yml — auto-applies human-only to PRs touching CODEOWNERS paths
      • sync-labels.yml — syncs labels.yml → repo labels on push to main
      • deploy.yml — auto-deploys to Fly on push to main (uses FLY_API_TOKEN secret)
      • docs.yml — builds mkdocs + publishes to GitHub Pages (this file, added in this PR)
  • Secrets (in repo Settings → Secrets and variables → Actions):
      • FLY_API_TOKEN — for deploy workflow
  • CODEOWNERS — .github/CODEOWNERS marks auth, security, LLM prompts, migrations, deploy infra as @LiBearden-owned. Auto-labeler reads this to apply human-only to PRs touching protected paths.
  • What breaks if it's down — Can't push code (catastrophic), CI doesn't fire, deploys halt. Existing prod deploy keeps running. Recovery is wait-it-out — GitHub's SLAs are good.

Linear (Bench team)

  • Role — Project tracking + the autonomous-loop dispatch surface. Issues sync bidirectionally with GitHub Issues. Team name: Bench, issue prefix: BEN. Migrated from the multi-project Li's Space team on 2026-05-16 so this project has its own isolated issue space.
  • Auth — Operator account (li.bearden@proton.me), Linear's GitHub integration installed
  • Sync rules — Two-way: GH issues create Linear issues and vice versa. Webhook URL configured at GitHub org-level. Magic-words enabled: commit / PR with Resolves #N or Closes BEN-XXX auto-transitions the Linear-side issue through In Progress → Done.
  • Project assignment — Manual during operator triage. Default-project routing isn't on Basic plan. ~3 seconds extra per issue when triaging.
  • What breaks if it's down — Auto-status transitions stop, sync stops. New GH issues still get created; new Linear issues just don't auto-mirror until Linear's back. Catch-up: Linear's sync replays missed events.
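
To check locally which issues a commit or PR message would reference through the magic words, a small regex is enough. This is only an approximation for linting messages, not how Linear or GitHub parse them: the keyword list is an assumed subset, and `referenced_issues` is a hypothetical helper.

```python
import re
from typing import List

# Assumption: only a few closing keywords are covered; Linear and GitHub recognise more.
MAGIC = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+(#\d+|BEN-\d+)",
    re.IGNORECASE,
)


def referenced_issues(message: str) -> List[str]:
    """Return GitHub (#N) and Linear (BEN-N) references that follow a closing keyword."""
    return [ref.upper() for ref in MAGIC.findall(message)]
```

For example, `referenced_issues("Resolves #12 and closes BEN-345")` yields both references, while a bare mention such as "see BEN-9" yields none.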

Charlie Labs (autonomous coding agent)

  • Role — Picks up GitHub issues labeled ready-for-agent + complexity:s/m + vendor:charlie and ships draft PRs. Identity: charlie[bot] (GitHub App).
  • Auth — Charlie's GitHub App installed on the repo; no PAT to manage. One-click revoke from repo's "Installed GitHub Apps" settings.
  • Configuration (in Charlie's dashboard): only acts on complexity:s + complexity:m issues, draft PRs only, $10/day daily budget cap.
  • Daemons installed in repo (.agents/daemons/):
      • github-activity-digest — posts Slack digest of PR/CI activity
      • linear-issue-labeler — dormant; would label Linear issues if its taxonomy template were populated. Currently no-ops.
  • What breaks if it's down — ready-for-agent issues sit in the queue. Operator can still ship work manually. Charlie's outages have been rare; revisit if they become frequent.
  • Trial outcome — See agents.md for the full 2026-05-16 trial data (5 issues, 4 PRs, 1 cannot-reproduce close, all passed the rubric).
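
The pickup rule from the Role bullet (ready-for-agent, plus vendor:charlie, plus a complexity of s or m) reduces to a set check. This is a sketch for reasoning about why an issue was or was not picked up. The real filter lives in Charlie's dashboard configuration, and `eligible_for_charlie` is a hypothetical name.

```python
from typing import Set

REQUIRED = {"ready-for-agent", "vendor:charlie"}
ALLOWED_COMPLEXITY = {"complexity:s", "complexity:m"}


def eligible_for_charlie(labels: Set[str]) -> bool:
    """True when all required labels are present and at least one allowed complexity label."""
    return REQUIRED <= labels and bool(ALLOWED_COMPLEXITY & labels)
```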

GitHub Pages (docs)

  • Role — Hosts the mkdocs-generated site at https://libearden.github.io/research-engineer-interview-prep/
  • Auth — Built by GH Actions, deployed to the gh-pages branch which GitHub Pages serves
  • Setup — Workflow .github/workflows/docs.yml runs on push to main when docs/** or mkdocs.yml changes. First-time setup: in repo Settings → Pages → set source to gh-pages branch (one-time operator action).
  • What breaks if it's down — Docs site stale or unreachable. No effect on the running app.

Secrets inventory

Every secret the app reads from environment, where it lives, what depends on it.

| Secret | Where stored | Reader | Failure mode if missing |
| --- | --- | --- | --- |
| ANTHROPIC_API_KEY | Fly secrets | All LLM-touching services | Falls back to offline paths |
| ANTHROPIC_DAILY_TOKEN_CAP | Fly secrets (or default 200k) | app/services/security.py | Defaults to 200k tokens/user/day |
| SUPABASE_URL | Fly secrets | app/auth.py JWKS fetch, Supabase client | App boots but auth breaks |
| SUPABASE_ANON_KEY | Fly secrets | Browser-side magic-link client | Auth flow breaks |
| SUPABASE_SERVICE_ROLE_KEY | Fly secrets | Admin / verify flows | Some admin paths break |
| SUPABASE_JWT_SECRET | Fly secrets (legacy) | app/auth.py HS256 fallback | OK to be missing if project is fully on ES256 |
| DATABASE_URL | Fly secrets | app/db.py:connect() | App boot fails |
| SENTRY_DSN | Fly secrets | app/observability.py | No error monitoring (degrades silently) |
| SENTRY_ENVIRONMENT | fly.toml | Various | Defaults to local |
| RESEND_API_KEY | Fly secrets | app/services/notifications.py | Operator emails skip (DB row still saved) |
| OPERATOR_NOTIFY_EMAIL | Fly secrets | app/services/notifications.py | Same as above |
| OPERATOR_NOTIFY_FROM | Fly secrets (optional) | app/services/notifications.py | Falls back to Resend default sender |
| LIS_ALLOWED_USER_IDS | Fly secrets | app/services/access.py | Allowlist disabled → anyone with GitHub OAuth can sign in |
| LIS_ALLOWED_EMAILS | Fly secrets | app/services/access.py | Same |
| LIS_ADMIN_USER_IDS | Fly secrets (or falls back) | app/services/access.py | Falls back to LIS_ALLOWED_USER_IDS |
| GITHUB_FEEDBACK_TOKEN | Fly secrets | app/services/github_feedback.py | Feedback → GH issue promotion soft-fails (row still saved) |
| GITHUB_FEEDBACK_REPO | Fly secrets (optional) | app/services/github_feedback.py | Defaults to LiBearden/research-engineer-interview-prep |
| LIS_DIAGNOSTICS_ENABLED | Fly secrets | app/services/security.py | /__sentry/* + /__claude/* diagnostic routes return 404 in prod |
| FLY_API_TOKEN | GitHub Actions secret | .github/workflows/deploy.yml | Auto-deploy from main breaks |

Rotation cadence: every 6-12 months for Anthropic + Supabase service role + Fly tokens. Cloudflare uses operator account auth (no rotatable key in app). Charlie's App identity is revocable but not rotatable.
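
Before or after rotating a credential, it can help to list which secrets are unset and what each gap implies. The sketch below encodes a subset of the inventory as a plain mapping. `missing_secrets` is a hypothetical helper that does not exist in the app, and the failure-mode strings are copied from the inventory above.

```python
import os
from typing import Dict, Mapping

# Subset of the secrets inventory: name -> failure mode if missing.
FAILURE_MODES = {
    "DATABASE_URL": "App boot fails",
    "ANTHROPIC_API_KEY": "Falls back to offline paths",
    "SUPABASE_URL": "App boots but auth breaks",
    "RESEND_API_KEY": "Operator emails skip (DB row still saved)",
    "SENTRY_DSN": "No error monitoring (degrades silently)",
}


def missing_secrets(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return {secret name: failure mode} for every listed secret that is unset or empty."""
    return {name: mode for name, mode in FAILURE_MODES.items() if not env.get(name)}
```

Run inside the container (for example via flyctl ssh console), an empty result means every listed secret is present; it says nothing about whether the values are still valid.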

Cost summary

Monthly run-rate for the bench at current usage:

| Service | Cost |
| --- | --- |
| Fly.io (compute + IPv4) | ~$3-5 |
| Cloudflare | $0 (free tier) |
| Supabase | $0 (free tier; well under limits) |
| Anthropic | ~$1-3 (usage-based; capped per-user via ANTHROPIC_DAILY_TOKEN_CAP) |
| Resend | $0 (free tier) |
| Sentry | $0 (free tier) |
| GitHub | $0 (personal account, public-or-private repo) |
| Linear | (existing subscription — paid for cross-project use) |
| Charlie Labs | (per-task pricing; trial usage to date) |
| Bench-specific marginal cost | ~$5-10/mo |

Linear + Charlie aren't bench-only costs (Linear: multi-project use; Charlie: per-task). The bench's incremental burn is dominated by Fly + Anthropic.

Common failure modes + recovery

| Symptom | Likely cause | Recovery |
| --- | --- | --- |
| bench.libearden.dev returns 502 | Fly machine auto-stopped + slow cold start, OR genuine crash | flyctl status -a lis-bench → if stopped, retry curl in 5s. If crash-looping, flyctl logs -a lis-bench to diagnose; rollback if a deploy went bad: flyctl image rollback <version> |
| Sign-in redirect loop | Supabase migrated JWT signing alg; app/auth.py falls behind | Verify SUPABASE_URL set; check Supabase dashboard for current signing key type; app/auth.py should dispatch via JWKS for ES256 |
| Operator emails not arriving | RESEND_API_KEY missing/expired, OR Cloudflare bot detection blocking | Check Fly logs for Resend HTTPError. If 1010, the User-Agent header is missing (regression). If 403 from Resend, key is expired or domain not verified. |
| Auto-deploy from main not firing | FLY_API_TOKEN expired | Regenerate via flyctl tokens create deploy, update GitHub repo secret |
| Linear issues not auto-creating from GH | Webhook stopped firing or sync paused | Linear → Settings → Integrations → GitHub → click sync row → verify still active. Recent deliveries on the GitHub webhook page should show recent green checks. |
| Charlie not picking up issues | Charlie config drift OR Charlie's App permission revoked | Check Charlie dashboard for active config; verify the App is installed at GitHub repo Settings → Installed GitHub Apps |
| Spend cap hit unexpectedly | One of the per-user counters got bumped wrong | Reset via DB: UPDATE public.users SET settings_json = jsonb_set(settings_json, '{anthropic_today}', '{}') WHERE id = '...' |
| Docs site stale | docs.yml workflow didn't run on last push | Check Actions tab; manually run mkdocs gh-deploy --force from a local clone as fallback |
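
The "retry in 5s" advice for the cold-start 502 can be scripted. The helper below is a sketch with hypothetical names: the status fetcher is injected, so the retry logic stays independent of any HTTP client, and the attempt count is an assumption rather than a documented value.

```python
import time
from typing import Callable


def wait_for_healthy(
    fetch_status: Callable[[], int],
    attempts: int = 3,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once fetch_status() yields a non-5xx code, pausing delay_s between tries."""
    for attempt in range(attempts):
        if fetch_status() < 500:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # give an auto-stopped machine time to cold-start
    return False
```

If this still returns False after a few attempts, a cold start is unlikely to be the cause and the crash-loop branch of the first row (logs, then rollback) applies.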

Operational playbook — common procedures

Deploy a code change

git push origin main   # triggers test.yml + deploy.yml automatically
Watch progress with gh run watch, or visit https://fly.io/apps/lis-bench/monitoring

Roll back a bad deploy

flyctl releases -a lis-bench   # see recent versions
flyctl image rollback <version> -a lis-bench

Tail logs

flyctl logs -a lis-bench

SSH into the running container

flyctl ssh console -a lis-bench
# Inside, source the venv if needed:
source /opt/venv/bin/activate
python -c "..."

Rotate Anthropic key

  1. Generate new key at https://console.anthropic.com
  2. flyctl secrets set -a lis-bench ANTHROPIC_API_KEY="<new-key>" (triggers redeploy automatically)
  3. Keep the old key active until the new deploy is verified (Anthropic supports multiple active keys)
  4. Revoke old key from Anthropic console once new deploy is verified

Enable diagnostics temporarily

flyctl secrets set -a lis-bench LIS_DIAGNOSTICS_ENABLED=1
# Investigate via /__sentry/test, /__claude/test
flyctl secrets set -a lis-bench LIS_DIAGNOSTICS_ENABLED=0
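
For orientation, the gate behind this toggle might look like the sketch below. It is a guess assembled from the inventory entries for SENTRY_ENVIRONMENT and LIS_DIAGNOSTICS_ENABLED, not the actual body of app/services/security.py:diagnostics_enabled(). In particular, treating non-prod environments as always enabled is an assumption.

```python
from typing import Mapping


def diagnostics_enabled_sketch(env: Mapping[str, str]) -> bool:
    """Hypothetical gate: open outside prod, opt-in via LIS_DIAGNOSTICS_ENABLED=1 in prod."""
    environment = env.get("SENTRY_ENVIRONMENT", "local")  # documented default: local
    if environment != "prod":
        return True  # assumption: non-prod always exposes diagnostic routes
    return env.get("LIS_DIAGNOSTICS_ENABLED") == "1"
```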

See also