Addresses the gap where post-deploy validation required a terminal +
curl + gh CLI. Now everything lives on /admin as real buttons and
live data.
New /admin page (app/admin/page.js):
- Deploy health panel — /healthz reachability + /api/stats.readiness
flags rendered as green/red dots: upstash redis, upstash vector,
groq, cerebras, gemini, fireworks, cron secret.
- Admin token input (cached in localStorage per-device) for action
authorization when ADMIN_TOKEN is configured server-side.
- Action buttons with live output:
* Run bench (6 scenarios, in-process, ~30-60s)
* Seed blind-eval from live bench
* Recompute dead-block skip list
* Run dialectical audit (reuses /api/dialectical?run=1)
- Live telemetry snapshot: prompt size, phase timings (waveA/B p95),
gauntlet pass rate, graph nodes/edges, relational-time multiplier.
- Blind A/B snapshot: votes, win rate, 95% CI, actually-better flag.
- APK build card: link to GitHub Actions + manifest/icon preview
links. APK build remains GitHub-hosted (not runnable in a Vercel
function).
- Nav pills to /stats, /retro, /dev, /blind-eval, /memory, /prefs,
/meet, /, and GitHub Actions.
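The per-device token cache described above might reduce to a helper pair like this (a sketch; the storage key and function names are assumptions, and `storage` is injected rather than hardcoded to `window.localStorage` so it stays testable):

```javascript
// Sketch of the per-device admin-token cache; key and helper names are assumed.
const TOKEN_KEY = "admin-token";

// `storage` is injected (window.localStorage in the browser).
function loadToken(storage) {
  return storage.getItem(TOKEN_KEY) || "";
}

function saveToken(storage, value) {
  if (value) storage.setItem(TOKEN_KEY, value);
  else storage.removeItem(TOKEN_KEY); // clearing the input forgets the token
}
```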
New admin API endpoints (token-gated if ADMIN_TOKEN is set,
open in dev mode when not):
- POST /api/admin/bench — runs 6 privacy-mode scenarios against
self, measures TTFB / bridge latency / prose latency / sidecar
coverage, returns summary + per-scenario detail.
- POST /api/admin/recompute-skiplist — forces Step TT's skip-set
recomputation.
- POST /api/admin/seed-blind — runs the bench, then submits each
(gabriella-reply, baseline-stub) pair to /api/blind-eval so the
voting surface reflects current production output.
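The "token-gated if ADMIN_TOKEN is set, open in dev mode when not" check presumably looks something like the sketch below (helper name and exact header parsing are assumptions; the two header forms match the auth modes described in the review comment further down):

```javascript
// Sketch: authorize an admin request. If no ADMIN_TOKEN is configured,
// the endpoint is open (dev mode). Otherwise accept either
// "Authorization: Bearer <token>" or an "x-admin-token" header.
function isAuthorized(headers, adminToken) {
  if (!adminToken) return true; // open in dev mode
  const bearer = (headers["authorization"] || "").replace(/^Bearer\s+/i, "");
  const alt = headers["x-admin-token"] || "";
  return bearer === adminToken || alt === adminToken;
}
```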
Main chat header (/) now has an 'admin' link between 'stats' and
'about'.
The full 20-scenario bench remains at scripts/integration-bench.js /
npm run bench-remote for offline runs; it is too heavy to fit in a
60s Vercel function.
Reverts the standalone /admin page from the previous commit — /dev was
already the canonical ops surface (bootstrap training, datasets,
fine-tune jobs, debug logs, token-gated). Shipping /admin alongside it
created two admin consoles.
- Deleted app/admin/page.js and the 'admin' link in the main chat
header.
- Kept the three /api/admin/* action endpoints (bench, seed-blind,
recompute-skiplist) — they're real endpoints with no UI coupling.
- Extended /dev with a third tab 'Deploy & validate' that renders the
same cards the deleted /admin had: deploy health dots, run bench,
seed blind-eval, recompute skip list, run dialectical audit, blind
A/B snapshot, APK build link, live telemetry.
- DeployTab component is inlined at the bottom of app/dev/page.js,
reusing /dev's CSS and token conventions. The tab switch fetches
/api/stats on mount. One canonical ops entrypoint again.
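Neither commit states how the blind A/B snapshot's 95% CI is computed; as an illustrative reference, a normal-approximation interval over the win rate would look like this (a sketch, not the repo's actual code):

```javascript
// Sketch: 95% CI for a win rate from `wins` out of `n` votes, using the
// normal approximation. z = 1.96 for a 95% interval; clamped to [0, 1].
function winRateCI(wins, n, z = 1.96) {
  if (n === 0) return { rate: 0, lo: 0, hi: 0 };
  const p = wins / n;
  const half = z * Math.sqrt((p * (1 - p)) / n);
  return { rate: p, lo: Math.max(0, p - half), hi: Math.min(1, p + half) };
}
```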
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2df887b5e6
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": tokenHeader,
Forward x-admin-token when chaining to bench
POST /api/admin/seed-blind allows auth via either Authorization: Bearer ... or x-admin-token, but the internal call to /api/admin/bench forwards only the Authorization header. If a client uses the supported x-admin-token path, tokenHeader is empty and the bench request 401s, so seed-blind fails with "bench failed". Forwarding both auth headers (or sharing the same auth context) avoids breaking one of the advertised auth modes.
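A minimal version of the suggested fix might look like this (a sketch with assumed names; `incoming` stands for the seed-blind request's header map):

```javascript
// Sketch: build headers for the internal /api/admin/bench call so that
// both advertised auth modes survive the hop. Names are illustrative,
// not the repo's actual code.
function benchAuthHeaders(incoming) {
  const out = { "Content-Type": "application/json" };
  const bearer = incoming["authorization"];
  const adminToken = incoming["x-admin-token"];
  if (bearer) out["Authorization"] = bearer; // Bearer path
  if (adminToken) out["x-admin-token"] = adminToken; // x-admin-token path
  return out;
}
```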
    opener: lastUser(r),
    category: r.category,
  },
  a: { source: "gabriella-live", text: r.replyPreview },
Submit full bench reply to blind-eval pairs
This route submits r.replyPreview as Gabriella’s candidate text, but the bench producer truncates that field to 200 characters. For any longer response, blind-eval stores a clipped answer instead of the real model output, which can skew pairwise voting results and downstream win-rate stats. Use the full bench reply text (or add a non-truncated field) when building a.text.
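One way to implement the suggestion, sketched with a hypothetical `replyFull` field the bench producer would need to emit alongside the existing 200-character `replyPreview`:

```javascript
// Sketch: prefer an untruncated reply when building the blind-eval
// candidate. `replyFull` is a hypothetical non-truncated field; only
// replyPreview exists in the current bench output.
function candidateText(r) {
  return r.replyFull || r.replyPreview;
}
```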