Testing magic link auth: every edge case, no scripts
Magic link login looks simple and fails in fascinating ways — expired tokens, replay attacks, account enumeration, late delivery. Here's how to cover every edge case in plain-English prompts.
Magic links are the one auth flow that looks easy and is not. The happy path is six lines of code: generate a token, email it, accept it, sign the user in. The failure modes are where the bugs hide — and they're the kind of bugs you ship without noticing, because nobody clicks an expired link in the test environment unless they're trying to.
This is the testing playbook we'd actually run against a magic-link flow, written as five plain-English prompts. No Playwright, no selectors, no token-decoding logic in your test suite. If you've been here before, scroll to the prompts. If you haven't, the next two sections will save you a security bug.
What "covering magic-link auth" actually means
Most teams test the happy path: request a link, click it, end up signed in. That's roughly 20% of what can go wrong. A complete check needs to verify, at minimum:
- Token expiry. A link clicked after the configured TTL fails cleanly with a useful error. (OWASP's Authentication Cheat Sheet treats short-lived tokens as table stakes — the industry has converged on 15–30 minutes for login tokens, with 15 minutes being most common.)
- Single-use enforcement. A token consumed once can't be consumed again, even within the TTL. This is the protection against an attacker who got hold of the email after you used the link.
- Account enumeration. The page after "send me a link" shows the same message whether or not an account exists. If it doesn't, you've handed attackers a free user-existence oracle.
- Rate limiting. Submitting the same email a hundred times in a minute doesn't actually send a hundred emails, and the response timing doesn't reveal whether the address is valid.
- Late delivery. A link that arrives three minutes after the user requested it still works, assuming the TTL hasn't elapsed. (This one trips up people who use "request time" instead of "send time" as the issuance moment.)
- Cross-device flow. Requesting from one browser and clicking from another (typically requesting on a laptop and opening the email on a phone) works without weird "session not found" errors.
- Already-logged-in side effects. Clicking the link while already logged in as a different account does the right thing: it either signs you out of the wrong account and into the right one, or refuses cleanly, but never leaves you in a "logged in as both" state.
Anything short of that is a smoke test. We're going to write prompts for all of it.
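For a concrete picture of what the rate-limiting item in the list above is probing, here is a minimal sketch of a per-address sliding-window limiter. The class name, the limit of 3 requests, and the 60-second window are all illustrative assumptions, not values from any particular framework.

```python
import time
from collections import defaultdict, deque


class LinkRequestLimiter:
    """Allow at most `max_requests` link requests per address per window."""

    def __init__(self, max_requests=3, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # normalized email -> request timestamps

    def allow(self, email, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[email.strip().lower()]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: do not send another email
        hits.append(now)
        return True
```

Whatever `allow` returns, the page should show the same confirmation message, so the limiter itself doesn't become a signal about which addresses exist.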
The model bug everyone ships once
Before the prompts, the bug to look for first: tokens that aren't actually single-use.
Implemented naively, the verification step looks like this:
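A minimal sketch of that naive pattern, assuming a hypothetical `magic_tokens` table (`token`, `user_id`, `expires_at`, `used_at`) and Python's built-in `sqlite3`; your schema and database will differ:

```python
import sqlite3
import time

# Hypothetical schema: one row per issued magic-link token.
SCHEMA = """
CREATE TABLE magic_tokens (
    token      TEXT PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    expires_at REAL NOT NULL,
    used_at    REAL
)
"""


def verify_naive(conn, token, now=None):
    """Check-then-mark in two separate queries (the racy version)."""
    now = time.time() if now is None else now
    # Query 1: is the token unused and unexpired right now?
    row = conn.execute(
        "SELECT user_id FROM magic_tokens "
        "WHERE token = ? AND used_at IS NULL AND expires_at > ?",
        (token, now),
    ).fetchone()
    if row is None:
        return None
    # Race window: a parallel request can pass the SELECT before this runs.
    # Query 2: mark the token as used.
    conn.execute(
        "UPDATE magic_tokens SET used_at = ? WHERE token = ?", (now, token)
    )
    conn.commit()
    return row[0]
```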
That's two queries with a race window between them. Two parallel requests on the same token both pass the SELECT and both reach the UPDATE — the user (or an attacker) gets two sessions from one token. The fix is atomic consumption:
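One way to sketch atomic consumption, again assuming a hypothetical `magic_tokens` table and Python's built-in `sqlite3`: fold the validity check into the UPDATE itself, so only one request can flip `used_at` from NULL. The table and function names are illustrative.

```python
import sqlite3
import time

# Hypothetical schema: one row per issued magic-link token.
SCHEMA = """
CREATE TABLE magic_tokens (
    token      TEXT PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    expires_at REAL NOT NULL,
    used_at    REAL
)
"""


def consume_token(conn, token, now=None):
    """Claim the token in a single conditional UPDATE."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE magic_tokens SET used_at = ? "
        "WHERE token = ? AND used_at IS NULL AND expires_at > ?",
        (now, token, now),
    )
    conn.commit()
    # Exactly one row changes only for the request that won the claim.
    if cur.rowcount != 1:
        return None
    row = conn.execute(
        "SELECT user_id FROM magic_tokens WHERE token = ?", (token,)
    ).fetchone()
    return row[0]
```

The row count does the work here: a second request on the same token updates zero rows and is rejected, however closely it follows the first.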
You can't test that race condition reliably from a browser. You can test the visible consequence — clicking the same link twice from two tabs — and you can test the broader single-use rule, which is what prompt 2 below does. If you find single-use isn't enforced at the UI level, that's a strong hint the DB-level enforcement is also missing.
Setup: pick the environment
You need a staging environment where:
- Magic links go to a real test inbox you can read (a @yourapp.test address piped to a service like Mailosaur, or a catch-all on a domain you control).
- The token TTL matches production (don't run with MAGIC_LINK_TTL=24h "for testing").
- Rate limiting is on, with production-like thresholds.
The agent can read the email if you give it the inbox URL or its API. Most teams point it at a hosted test inbox; if you're using a service like Mailosaur or Mailtrap, give the agent the message URL or a credentials-protected mailbox page. We're not going to bake those credentials into the prompts — pass them via your scenario config and reference them as {{INBOX_URL}} or similar.
Throughout this guide, the URL we'll target is https://staging.yourapp.com/login. Substitute your own.
Prompt 1: the happy path
Start with the boring one. If this is broken, every other test is moot.
The bug class this catches that you didn't think to script: a malformed mailto: link, a token URL that's percent-encoded twice, a redirect target that breaks in incognito because of a missing third-party-cookie fallback.
Prompt 2: token reuse
This is the test most teams skip. It's also the one that most often surfaces a real bug.
The failure mode you're hunting: the token is single-use server-side, but the page that receives the link doesn't actually mark it consumed until the redirect resolves. So two near-simultaneous clicks both succeed. The agent won't reproduce the millisecond-level race, but it will catch the "I clicked it yesterday and it still works today" class of bug, which is structurally the same.
Prompt 3: expired token
The TTL test. You'll need a way to force-expire a token — either an admin endpoint that ages tokens, an environment flag that sets TTL to 5 seconds for tests, or a clock you can advance. Most teams have one of these in staging; if you don't, the cheapest version is a ?ttl_override=5s query parameter the request endpoint accepts in non-prod environments.
The bug to catch: an expired link that fails silently, leaves the user on a blank page, or — worse — accepts the token anyway because the expiry check is on the wrong column.
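To make the "wrong column" failure concrete, here is a minimal sketch of an expiry check that reads one explicit `expires_at` value, set once at issuance, and returns a status the page can turn into a clear error. The `TokenRecord` shape and the status strings are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TokenRecord:
    expires_at: datetime          # set once at issuance: issue time + TTL
    used_at: Optional[datetime]   # None until the token is consumed


def token_status(record: TokenRecord, now: datetime) -> str:
    """Return 'used', 'expired', or 'valid' for a stored token."""
    if record.used_at is not None:
        return "used"
    # Compare against expires_at only, never a column that changes later.
    if now >= record.expires_at:
        return "expired"
    return "valid"
```

Keeping "used" and "expired" as distinct outcomes lets the page explain what happened and offer a fresh link instead of failing silently.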
Prompt 4: account enumeration
The security one. Most teams have shipped this bug at least once.
Enumeration is one of the easiest auth bugs to ship and one of the hardest to notice without a deliberate check. OWASP's Authentication Cheat Sheet covers the same idea under "Response Discrepancy" — the user-facing language and timing should be indistinguishable.
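As a rough sketch of what "indistinguishable" means on the server side, the handler below returns the same response whether or not the address is known. The function name, the message wording, and the `send_link` callback are hypothetical.

```python
from typing import Callable, Dict, Set

GENERIC_MESSAGE = "If an account exists for that address, we've sent a sign-in link."


def request_magic_link(
    email: str, known_emails: Set[str], send_link: Callable[[str], None]
) -> Dict[str, object]:
    """Return an identical response for known and unknown addresses."""
    normalized = email.strip().lower()
    if normalized in known_emails:
        # In a real service this would enqueue the email rather than send
        # inline, so both branches take about the same time.
        send_link(normalized)
    return {"status": 200, "message": GENERIC_MESSAGE}
```

The body and status code are only half of the check; the comment about queuing matters because a slow inline send on the known-address branch would reintroduce the oracle through timing.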
Prompt 5: cross-device + already-logged-in
The flow the analytics team will ask about three months from now.
This is the test that catches the cookie-collision bug, the "we have two session keys" bug, and the "the magic link silently fails when there's an existing session" bug. None of those rises to a critical security issue on its own. Stacked together over time, they're how support tickets get weird.
What you do with a failure
A failed agent run on a magic-link flow gives you the same artifact as any other Monito session: a screenshot timeline, the full network log (the POST /api/auth/magic-link/request and the GET /api/auth/magic-link/verify calls with payloads), the console output, and the agent's reasoning at each branch. The run docs cover how to pull session details from the CLI; for auth bugs specifically, the network log is where you'll spend most of your time. (If you're new to reading agent runs, the breakdown in why AI QA agents find bugs your scripts miss is the longer version of why the session beats a green checkmark.)
If the failure is the enumeration one (prompt 4), don't just patch the wording; check that the timing is also indistinguishable. The usual fix is to respond with the success message before the database lookup completes. The agent can't measure response times to the millisecond, but it can usually tell the two cases apart when they differ by 100 ms or more.
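If you want to sanity-check timing yourself, a small helper over response durations copied out of the network log is enough. This is a sketch under assumptions: the function names are made up, the sample durations are illustrative, and the 100 ms threshold simply mirrors the figure above.

```python
from statistics import median
from typing import Sequence


def timing_gap_ms(known_ms: Sequence[float], unknown_ms: Sequence[float]) -> float:
    """Absolute difference between median response times, in milliseconds."""
    return abs(median(known_ms) - median(unknown_ms))


def looks_distinguishable(
    known_ms: Sequence[float],
    unknown_ms: Sequence[float],
    threshold_ms: float = 100.0,
) -> bool:
    """True when the two groups differ by at least the threshold."""
    return timing_gap_ms(known_ms, unknown_ms) >= threshold_ms


# Illustrative durations for requests with a known vs. an unknown address.
known = [310.0, 295.0, 330.0]
unknown = [40.0, 55.0, 48.0]
print(looks_distinguishable(known, unknown))
```

Medians are used rather than means so that a single slow request doesn't decide the result; a handful of samples per group is still a coarse check, not a measurement.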
Run these on every release
Five prompts, ~5 minutes of agent time per full sweep, well under a dollar of run credits. Wire them into a CI/CD step that runs against every preview deploy, or against staging on a schedule. The prompts don't rot — the agent doesn't care that you renamed the "Send me a link" button to "Email me a link" — so the maintenance cost over the next year is approximately zero. That's the trade compared to a Playwright suite that would need a data-testid update every time the form moves; we wrote about the broader pattern in Playwright alternatives without the code.
Magic-link auth is the kind of flow where one bug = a locked-out user who can't even tell you they're locked out. Five prompts is cheap insurance against the silence.
Want the five prompts as a starter pack? Sign up for Monito, paste them into a new Project, point them at your staging URL, and configure your test inbox. First runs are on us.