OAuth callback testing: Google, GitHub, and Microsoft in one prompt each
The OAuth callback is where login quietly breaks — redirect_uri mismatches, dropped state, cancelled consent, expired codes. Here's how to test the Google, GitHub, and Microsoft handshake without writing Playwright.
"Sign in with Google" is four words on a button and about nine round-trips behind it. The happy path works on the first try in development and then breaks in the one place you can't see: the callback. The provider sends the user back to your app with a code and a state parameter in the URL, your server does five things with them in a few hundred milliseconds, and any one of those things can fail in a way that shows the user a blank page, a generic "something went wrong," or — worse — a successful login that shouldn't have happened.
This is the testing playbook we'd actually run against an OAuth callback, written as plain-English prompts you point at a staging URL. No Playwright, no intercepting proxy, no decoding JWTs in your test suite. If you've shipped social login before and know exactly which of these bit you, skip to the prompts. If you haven't, the next two sections are the map of where the bodies are buried.
What "testing the callback" actually means
Most teams test one thing: click the button, approve the consent screen, end up logged in. That's the part that's hardest to break and easiest to test, which is a bad combination — you're spending your test budget on the safe path. The callback is where the interesting failures live, and a real check needs to cover, at minimum:
- The happy path, per provider. Google, GitHub, and Microsoft each return a slightly different shape of response and a different set of profile fields. A flow that works with Google can break with GitHub because the email comes back null until you make a second API call.
- State parameter validation. The state value your app sent at the start must match the one that comes back. Google's own guidance is blunt about why: confirming the returned state matches the sent state is what tells you a real user (not a malicious script) is making the request, and it's your CSRF protection on the login flow. A callback that ignores state, or accepts a missing one, is a vulnerability.
- redirect_uri mismatch. The provider only sends the user back to a redirect URI you registered, matched exactly. OAuth's redirect-validation rules require an exact string match, not a prefix or substring; partial matching is how you turn your auth server into an open redirector. The failure modes here are mundane and constant: a trailing slash, a :3000 that became :3001, an http that should be https.
- Cancelled consent. The user lands on Google's screen and clicks "Cancel." The provider redirects back with error=access_denied and no code. Your app has to handle that as a normal outcome (back to the login page with a calm message), not as an unhandled exception.
- Expired or reused authorization code. The code in the callback URL is single-use and short-lived. If a user double-clicks, refreshes the callback page, or the code expires before exchange, the token request fails. The user should see a "let's try that again" path, not a stack trace.
- Account linking and email collision. A user who signed up with email/password and then "Sign in with Google" using the same email — do they get linked to the existing account, a duplicate, or a clear error? This is the one that generates support tickets six months later.
Anything short of that list is a smoke test wearing a trench coat. We'll write prompts for the ones a browser agent can actually drive end to end.
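For the expired-or-reused-code item in the list above, here is a minimal sketch of how a callback might classify the token endpoint's reply. RFC 6749 names invalid_grant as the error for an authorization code that is invalid, expired, or already used. The function name and the three outcome labels are hypothetical, and the sketch assumes the error can arrive in the body regardless of HTTP status, so it inspects the body first.

```python
import json

# Hypothetical outcome labels; substitute whatever your app uses.
RETRY_LOGIN = "retry_login"        # restart the flow with a calm message
SIGNED_IN = "signed_in"
PROVIDER_ERROR = "provider_error"  # anything else that went wrong


def classify_token_response(status: int, body: str) -> str:
    """Map a token-endpoint response to what the user should see next."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Non-JSON body (for example an HTML error page).
        return PROVIDER_ERROR
    if not isinstance(payload, dict):
        return PROVIDER_ERROR

    error = payload.get("error")
    if error == "invalid_grant":
        # Expired, reused, or otherwise invalid authorization code.
        return RETRY_LOGIN
    if error or status >= 400 or "access_token" not in payload:
        return PROVIDER_ERROR
    return SIGNED_IN
```

A double-click or a refreshed callback page would then land on the retry path rather than surfacing an exception.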
The model bug everyone ships once
Before the prompts, the bug to look for first: a callback that validates the code but not the state.
The naive callback handler does the obvious thing — it reads the code from the query string, exchanges it for a token, fetches the profile, and signs the user in. It works. It ships. And it skipped the one check that wasn't load-bearing for the happy path: comparing the returned state against the value stashed in the session at the start of the flow.
The reason that matters is structural. The whole point of state is to bind the callback to the browser session that initiated it. Without that binding, an attacker can start their own OAuth flow, obtain a valid code for their own account, and trick a victim's browser into hitting the callback with it, logging the victim into the attacker's account, or vice versa. The OWASP Web Security Testing Guide's OAuth section treats exactly this kind of parameter tampering as a first-class test: change the value at the authorization step and watch whether the server notices.
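The comparison the naive handler skips is small. A minimal sketch, assuming the app stores the state in a server-side session when the flow starts; new_state and state_is_valid are hypothetical names, not part of any framework.

```python
import hmac
import secrets
from typing import Optional


def new_state() -> str:
    """Generate an unguessable value to stash in the session at the start of the flow."""
    return secrets.token_urlsafe(32)


def state_is_valid(session_state: Optional[str], returned_state: Optional[str]) -> bool:
    """True only when both values are present and identical."""
    if not session_state or not returned_state:
        # A missing or empty state on either side is a refusal, not a pass.
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise.
    return hmac.compare_digest(session_state.encode(), returned_state.encode())
```

The check would run before the code is exchanged, and the stored value would typically be cleared once used so a replayed callback fails too.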
You can't easily forge a cross-session CSRF from a browser agent. But you can test the visible consequence — drop or mangle the state and confirm the callback refuses — and a callback that accepts a tampered state at the UI level is telling you the server-side binding is missing too. Prompt 2 does exactly that.
Setup: what the agent needs
You need a staging environment where:
- OAuth is wired to the providers' real consent screens (or their test/sandbox apps), pointed at a staging redirect URI like https://staging.yourapp.com/auth/callback.
- You have test accounts for each provider you support (a throwaway Google account, a GitHub account, a Microsoft account) whose credentials the agent can use.
- The error states are reachable. If "cancelled consent" currently 500s, that's the bug; you want the test to find it, not be blocked by it.
The agent drives the real consent screen the way a person would: it reads the Google/GitHub/Microsoft login page, types the credentials you give it, clicks "Allow," and follows the redirect back to your app. Don't bake provider credentials into the prompt text — pass them through your Test Scenario config and reference them, the same way we handle secrets in the magic-link auth playbook. Throughout below, substitute your own staging URL and test identities.
Prompt 1: the happy path, one provider at a time
Start boring. If this is broken, nothing else matters.
Run the same prompt for GitHub and Microsoft, swapping the button and the credentials. The bug this catches that you didn't think to script: GitHub returning a null primary email because the test account's email is private, so your "fetch the profile" step succeeds but the user record is missing the one field you key on. Different provider, same prompt, different failure — which is the whole point of testing them separately.
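The null-email case can be sketched as a fallback over the two GitHub responses. This assumes the app has already fetched the profile (GET /user) and the email list (GET /user/emails, which needs the user:email scope) and parsed both as JSON; pick_primary_email is a hypothetical helper name.

```python
from typing import Optional


def pick_primary_email(profile: dict, emails: list) -> Optional[str]:
    """Prefer the profile email; otherwise use the primary, verified address."""
    if profile.get("email"):
        return profile["email"]
    for entry in emails:
        # Each entry carries "email", "primary", and "verified" fields.
        if entry.get("primary") and entry.get("verified"):
            return entry.get("email")
    # No usable address: the caller should show an error, not create a keyless user.
    return None
```

Returning None instead of raising leaves the decision with the caller, which is where the "user record missing the field you key on" bug would otherwise hide.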
Prompt 2: the state parameter
The security one. Most callbacks ship without this check at least once.
What you're hunting: a callback that reads code and ignores state entirely, or one that checks state is present but never compares it to the session. Either one logs you in on the tampered URL — and that's the finding.
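If you want to reproduce a finding by hand, the tampered URLs are easy to derive from a captured callback URL. A small sketch using only the standard library; tampered_callbacks is a hypothetical helper, and the three variants (state missing, empty, altered) are one reasonable set, not an exhaustive one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tampered_callbacks(callback_url: str) -> dict:
    """Return variants of a callback URL with state dropped, emptied, or altered."""
    parts = urlsplit(callback_url)
    params = parse_qsl(parts.query, keep_blank_values=True)

    def rebuild(new_params):
        # Reassemble the URL with only the query string changed.
        return urlunsplit(parts._replace(query=urlencode(new_params)))

    return {
        "missing": rebuild([(k, v) for k, v in params if k != "state"]),
        "empty": rebuild([(k, "" if k == "state" else v) for k, v in params]),
        "altered": rebuild([(k, v + "x" if k == "state" else v) for k, v in params]),
    }
```

A callback that signs the user in for any of the three is the finding.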
Prompt 3: redirect_uri mismatch
The most common cause of "it works on my machine."
This documents, in a repeatable test, the difference between a registered redirect URI and a tampered one — and confirms the provider is doing exact-match validation rather than prefix matching. (Google, for instance, doesn't allow query parameters in registered redirect URIs at all — dynamic data goes in state, not the redirect URL, which is its own thing worth verifying if your app tries to round-trip a "return to this page" value.)
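The difference between exact and prefix matching fits in a few lines. This is an illustration of the rule, not any provider's actual implementation; the registered URI is the staging example used in this post and both function names are hypothetical.

```python
REGISTERED_REDIRECT_URIS = frozenset({"https://staging.yourapp.com/auth/callback"})


def exact_match(candidate: str, registered=REGISTERED_REDIRECT_URIS) -> bool:
    """Exact string comparison: the behavior the test should confirm."""
    return candidate in registered


def prefix_match(candidate: str, registered=REGISTERED_REDIRECT_URIS) -> bool:
    """Unsafe variant, shown only for contrast."""
    return any(candidate.startswith(uri) for uri in registered)


# The mundane failures from earlier all fail the exact check.
for uri in (
    "https://staging.yourapp.com/auth/callback/",      # trailing slash
    "https://staging.yourapp.com:3001/auth/callback",  # wrong port
    "http://staging.yourapp.com/auth/callback",        # http instead of https
):
    assert not exact_match(uri)
```

A URL such as https://staging.yourapp.com/auth/callback/../../out passes prefix_match and fails exact_match, which is the open-redirect risk in miniature.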
Prompt 4: cancelled consent
The path your users take more often than you'd think — they click the button, get cold feet on the permissions screen, and bail.
The bug to catch: the callback handler assumes a code is always present and throws when it gets error=access_denied instead. The user clicked one button and got a 500.
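A handler that treats the error parameter as a normal branch might look like the sketch below. The function name, the returned tuples, and the /login?notice=... paths are hypothetical; state validation is left out for brevity and would run before any code is exchanged.

```python
from urllib.parse import parse_qs


def route_callback(query_string: str) -> tuple:
    """Decide what to do with a callback before touching the token endpoint."""
    params = parse_qs(query_string)
    error = params.get("error", [None])[0]
    code = params.get("code", [None])[0]

    if error == "access_denied":
        # The user clicked "Cancel": a normal outcome, not an exception.
        return ("redirect", "/login?notice=cancelled")
    if error or not code:
        # Any other provider error, or a callback with no code at all.
        return ("redirect", "/login?notice=failed")
    return ("exchange", code)
```

For example, route_callback("error=access_denied&state=xyz") returns the cancelled redirect instead of raising on a missing code.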
Prompt 5: account linking on a shared email
The one the data team asks about later.
Whatever your product decided to do here, this prompt pins it down so a future refactor can't quietly change it. Silent duplicate-account creation is the failure that corrupts your user table one row at a time.
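Writing the decision down as a pure function is one way to make it testable. The policy below (link only when the provider reports the email as verified, otherwise refuse) is an assumption chosen for illustration, not a recommendation from this post; the record shape, LinkDecision, and decide_link are all hypothetical.

```python
from enum import Enum
from typing import Optional


class LinkDecision(Enum):
    CREATE = "create"    # no account with this email yet
    SIGN_IN = "sign_in"  # this provider identity is already linked
    LINK = "link"        # attach the identity to the existing account
    REJECT = "reject"    # show a clear error; never create a duplicate


def decide_link(existing: Optional[dict], provider: str, subject: str,
                email_verified: bool) -> LinkDecision:
    """existing is the user found by email, with an "identities" set of (provider, id) pairs."""
    if existing is None:
        return LinkDecision.CREATE
    if (provider, subject) in existing.get("identities", ()):
        return LinkDecision.SIGN_IN
    if email_verified:
        return LinkDecision.LINK
    return LinkDecision.REJECT
```

Whatever policy your product actually chose, the useful property is that every branch is explicit and none of them falls through to a silent duplicate.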
What you do with a failure
A failed callback run gives you the same artifact as any other Test Run: a screenshot timeline through the provider's screens and back, the full network log — including the GET /auth/callback?code=...&state=... and the token exchange POST — the console output, and the agent's reasoning at each branch. For OAuth specifically, the network log is where you live: you can see whether state came back, whether the token exchange 200'd, and what the profile endpoint actually returned. The session docs cover pulling that detail from the CLI, and if reading agent runs is new to you, why AI QA agents find bugs your scripts miss is the longer argument for why the evidence beats a green checkmark.
A scripted version of these tests would mean automating three different providers' login DOMs — the exact selectors that change whenever Google reskins its consent screen, which is the maintenance tax that makes most teams test the callback once and never again. The agent reads each provider's screen fresh, so the prompts above don't rot when Microsoft moves a button. We made the broader version of that case in how to test signup flow; OAuth is the same argument with three external dependencies instead of one.
Run these on every release
Five prompts, one per provider for the happy path plus the four edge cases, is a full OAuth sweep in well under ten minutes of agent time and a couple of dollars of run credits. Wire them into a CI/CD step against every preview deploy, or run them on staging on a schedule. The providers change their screens; your prompts don't care. Here's the one to start with — the cancelled-consent check, because it's the cheapest bug to ship and the most embarrassing to demo:
Save it as a Test Scenario, point it at your staging URL, and you've covered the login path most teams find out is broken from a user, not a test. Your first run is free.