OAuth callback testing: Google, GitHub, and Microsoft in one prompt each

The OAuth callback is where login quietly breaks — redirect_uri mismatches, dropped state, cancelled consent, expired codes. Here's how to test the Google, GitHub, and Microsoft handshake without writing Playwright.

playbook, oauth, authentication, no-code
monito

June 29, 2026

"Sign in with Google" is four words on a button and about nine round-trips behind it. The happy path works on the first try in development and then breaks in the one place you can't see: the callback. The provider sends the user back to your app with a code and a state parameter in the URL, your server does five things with them in a few hundred milliseconds, and any one of those things can fail in a way that shows the user a blank page, a generic "something went wrong," or — worse — a successful login that shouldn't have happened.

This is the testing playbook we'd actually run against an OAuth callback, written as plain-English prompts you point at a staging URL. No Playwright, no intercepting proxy, no decoding JWTs in your test suite. If you've shipped social login before and know exactly which of these bit you, skip to the prompts. If you haven't, the next two sections are the map of where the bodies are buried.

What "testing the callback" actually means

Most teams test one thing: click the button, approve the consent screen, end up logged in. That's the part that's hardest to break and easiest to test, which is a bad combination — you're spending your test budget on the safe path. The callback is where the interesting failures live, and a real check needs to cover, at minimum:

  1. The happy path, per provider. Google, GitHub, and Microsoft each return a slightly different shape of response and a different set of profile fields. A flow that works with Google can break with GitHub because the email comes back null until you make a second API call.
  2. State parameter validation. The state value your app sent at the start must match the one that comes back. Google's own guidance is blunt about why: confirming the returned state matches the sent state is what tells you a real user — not a malicious script — is making the request, and it's your CSRF protection on the login flow. A callback that ignores state, or accepts a missing one, is a vulnerability.
  3. redirect_uri mismatch. The provider only sends the user back to a redirect URI you registered, matched exactly. OAuth's redirect-validation rules require an exact string match, not a prefix or substring — partial matching is how you turn your auth server into an open redirector. The failure modes here are mundane and constant: a trailing slash, a :3000 that became :3001, an http that should be https.
  4. Cancelled consent. The user lands on Google's screen and clicks "Cancel." The provider redirects back with error=access_denied and no code. Your app has to handle that as a normal outcome — back to the login page with a calm message — not as an unhandled exception.
  5. Expired or reused authorization code. The code in the callback URL is single-use and short-lived. If a user double-clicks, refreshes the callback page, or the code expires before exchange, the token request fails. The user should see a "let's try that again" path, not a stack trace.
  6. Account linking and email collision. A user who signed up with email/password and then "Sign in with Google" using the same email — do they get linked to the existing account, a duplicate, or a clear error? This is the one that generates support tickets six months later.

Anything short of that list is a smoke test wearing a trench coat. We'll write prompts for the ones a browser agent can actually drive end to end.
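Item 5 is the one a browser agent can only observe from the outside, so here is a rough server-side sketch of it. It assumes the token endpoint follows RFC 6749 and reports an expired or already-used code as invalid_grant; some providers use their own error strings, so the RETRYABLE set, the function name, and the outcome labels are all hypothetical and would need adjusting per provider.

```python
from typing import Any, Mapping

# Assumption: error codes that mean "this code is dead, start the flow again".
# RFC 6749 uses invalid_grant; add provider-specific strings as you find them.
RETRYABLE = frozenset({"invalid_grant"})


def next_step_after_exchange(status: int, body: Mapping[str, Any]) -> str:
    """Map a token-endpoint response to what the user should see next."""
    error = body.get("error")
    if status == 200 and not error and body.get("access_token"):
        return "continue"        # exchange worked, go fetch the profile
    if error in RETRYABLE:
        return "restart_login"   # expired or reused code: offer a retry path
    return "show_error"          # anything else is a real failure worth logging
```

The check on body["error"] runs regardless of status code, on the assumption that not every provider signals a failed exchange with a 4xx.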

The model bug everyone ships once

Before the prompts, the bug to look for first: a callback that validates the code but not the state.

The naive callback handler does the obvious thing — it reads the code from the query string, exchanges it for a token, fetches the profile, and signs the user in. It works. It ships. And it skipped the one check that wasn't load-bearing for the happy path: comparing the returned state against the value stashed in the session at the start of the flow.

The reason that matters is structural. The whole point of state is to bind the callback to the browser session that initiated it. Without that binding, an attacker can start their own OAuth flow, stop before the callback is consumed, and trick a victim's browser into loading that callback URL, which signs the victim into the attacker's account (login CSRF). The OWASP Web Security Testing Guide's OAuth section treats exactly this kind of parameter tampering as a first-class test: change the value at the authorization step and watch whether the server notices.

You can't easily forge a cross-session CSRF from a browser agent. But you can test the visible consequence — drop or mangle the state and confirm the callback refuses — and a callback that accepts a tampered state at the UI level is telling you the server-side binding is missing too. Prompt 2 does exactly that.
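For reference, the missing check is small. The sketch below is one way to write it, not a description of any particular framework: new_state and state_is_valid are hypothetical names, and where the value is stored (a server-side session, a signed cookie) is left out as an assumption about your stack.

```python
import hmac
import secrets
from typing import Optional


def new_state() -> str:
    """Generate an unguessable value to stash in the session before redirecting."""
    return secrets.token_urlsafe(32)


def state_is_valid(session_state: Optional[str], returned_state: Optional[str]) -> bool:
    """Accept the callback only if both values exist and match exactly."""
    if not session_state or not returned_state:
        return False  # a missing state is a failure, never a pass
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(session_state.encode(), returned_state.encode())
```

The two bugs described above correspond to skipping this function entirely, or replacing its last line with a check that returned_state is merely non-empty.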

Setup: what the agent needs

You need a staging environment where:

  • OAuth is wired to the providers' real consent screens (or their test/sandbox apps), pointed at a staging redirect URI like https://staging.yourapp.com/auth/callback.
  • You have test accounts for each provider you support — a throwaway Google account, a GitHub account, a Microsoft account — whose credentials the agent can use.
  • The error states are reachable. If "cancelled consent" currently 500s, that's the bug; you want the test to find it, not be blocked by it.

The agent drives the real consent screen the way a person would: it reads the Google/GitHub/Microsoft login page, types the credentials you give it, clicks "Allow," and follows the redirect back to your app. Don't bake provider credentials into the prompt text; pass them through your Test Scenario config and reference them, the same way we handle secrets in the magic-link auth playbook. In the prompts below, substitute your own staging URL and test identities.

Prompt 1: the happy path, one provider at a time

Start boring. If this is broken, nothing else matters.

Go to https://staging.yourapp.com/login.
Click "Sign in with Google".

On Google's consent screen, sign in as the test account
{{GOOGLE_TEST_EMAIL}} / {{GOOGLE_TEST_PASSWORD}} and approve the
requested permissions.

Verify that:
- You are redirected back to the application (a yourapp.com URL)
- You land signed in, on the dashboard or first-login destination
- Your name and email from the Google account appear where the app
  shows the logged-in user
- No error appears on the page or in the browser console

Report the full redirect URL you landed on and anything that looks
off — a flash of an error page, a slow blank screen, a missing avatar.

Run the same prompt for GitHub and Microsoft, swapping the button and the credentials. The bug this catches that you didn't think to script: GitHub returning a null primary email because the test account's email is private, so your "fetch the profile" step succeeds but the user record is missing the one field you key on. Different provider, same prompt, different failure — which is the whole point of testing them separately.
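If the GitHub run does surface a null email, the usual fix looks something like the sketch below. It assumes your app requests the user:email scope and has two payloads in hand, shaped like GitHub's /user and /user/emails responses; pick_email and the sample addresses are hypothetical.

```python
from typing import Any, Dict, List, Optional


def pick_email(profile: Dict[str, Any], emails: List[Dict[str, Any]]) -> Optional[str]:
    """Return the profile email, falling back to the primary verified address."""
    if profile.get("email"):
        return profile["email"]
    for entry in emails:
        # Only trust an address the provider marks as both primary and verified.
        if entry.get("primary") and entry.get("verified"):
            return entry["email"]
    return None  # caller should stop sign-in here, not store an empty email


# Hypothetical payloads for a test account with a private email.
profile = {"login": "test-user", "email": None}
emails = [
    {"email": "old@example.com", "primary": False, "verified": True},
    {"email": "me@example.com", "primary": True, "verified": True},
]
chosen = pick_email(profile, emails)
```

Returning None instead of an empty string forces the caller to handle the no-usable-email case explicitly, which is the case the happy-path prompt is meant to expose.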

Prompt 2: the state parameter

The security one. Most callbacks ship without this check at least once.

Go to https://staging.yourapp.com/login.
Click "Sign in with Google" and stop on Google's consent screen
WITHOUT approving yet.

Look at the URL of the request that started the flow and note the
"state" parameter value.

Now approve the consent. When Google redirects you back to the
application's callback URL, look at the callback URL in the address
bar before the page finishes loading.

Then, in a new tab, take that same callback URL, change the "state"
parameter to a different value (alter a few characters), and visit it.

Verify that the application REJECTS the tampered request — it should
show an authentication error and must NOT sign anyone in. If the app
logs you in despite the altered state, flag it as a security bug.

What you're hunting: a callback that reads code and ignores state entirely, or one that checks state is present but never compares it to the session. Either one logs you in on the tampered URL — and that's the finding.
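If you want to reproduce the tampered URL by hand while triaging a finding, a few lines of standard-library Python will do it. This is a convenience sketch only: with_param is a hypothetical helper, and the callback URL is a made-up example in the shape used throughout this post.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_param(url: str, name: str, value: Optional[str]) -> str:
    """Return url with query parameter `name` replaced, or removed if value is None."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    if value is not None:
        query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


callback = "https://staging.yourapp.com/auth/callback?code=abc123&state=xyz789"
tampered = with_param(callback, "state", "xyz000")  # altered state
dropped = with_param(callback, "state", None)       # state removed entirely
```

Both variants should be refused. Note that the code in a captured callback URL may already be spent by the time you replay it, so confirm the refusal names the state check and is not just a side effect of the dead code.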

Prompt 3: redirect_uri mismatch

The most common cause of "it works on my machine."

Go to https://staging.yourapp.com/login and click "Sign in with GitHub".

When you reach GitHub's authorization page, look at the request URL
and find the "redirect_uri" parameter.

Stop before authorizing. In a new tab, rebuild that same authorization
URL but change the redirect_uri to a slightly different value — for
example, add a trailing slash, or change the path to /auth/callback2.
Visit it.

Verify the provider refuses: it should show an error like
"redirect_uri is not associated with this application" and must NOT
redirect back to the altered URL. Record the exact error text.

Then go back and complete the normal flow with the unmodified URL
to confirm the registered redirect_uri still works.

This documents, in a repeatable test, the difference between a registered redirect URI and a tampered one, and shows how strictly each provider matches. Don't assume exact matching everywhere: GitHub's OAuth app documentation, for example, accepts a redirect_uri whose path is a subdirectory of the registered callback URL, so record what each provider actually enforces. (Google, for its part, doesn't allow query parameters in registered redirect URIs at all; dynamic data goes in state, not the redirect URL, which is its own thing worth verifying if your app tries to round-trip a "return to this page" value.)
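Exact matching, as an authorization server would apply it, is nothing more than string equality. The sketch below contrasts it with the prefix check it is meant to replace; REGISTERED and both function names are illustrative, not any provider's actual implementation.

```python
# Assumption: the single redirect URI registered for the staging app.
REGISTERED = frozenset({"https://staging.yourapp.com/auth/callback"})


def exact_match(candidate: str) -> bool:
    """Safe: plain string equality, no normalization, no partial matching."""
    return candidate in REGISTERED


def prefix_match(candidate: str) -> bool:
    """Unsafe variant, shown only for contrast."""
    return any(candidate.startswith(uri) for uri in REGISTERED)


variants = [
    "https://staging.yourapp.com/auth/callback",   # registered value
    "https://staging.yourapp.com/auth/callback/",  # trailing slash
    "https://staging.yourapp.com/auth/callback2",  # different path
    "http://staging.yourapp.com/auth/callback",    # wrong scheme
]
accepted_exact = [v for v in variants if exact_match(v)]
accepted_prefix = [v for v in variants if prefix_match(v)]
```

Only the first variant survives the exact check, while the prefix check also lets the trailing-slash and callback2 variants through.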

Prompt 4: cancelled consent

The path your users take more often than you'd think: they click the button, get cold feet on the permissions screen, and bail.

Go to https://staging.yourapp.com/login.
Click "Sign in with Microsoft".

On the Microsoft consent screen, click "Cancel" or "No" — decline the
permission request instead of approving it.

Verify that:
- You are returned to the application cleanly (not left stranded on a
  Microsoft error page)
- The app shows a calm, human message — something like "Sign-in was
  cancelled" — and offers a way to try again
- There is NO server error, blank page, or unhandled-exception screen
- You are NOT partially signed in

Report exactly what the app showed and the URL you ended on.

The bug to catch: the callback handler assumes a code is always present and throws when it gets error=access_denied instead. The user clicked one button and got a 500.
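The fix is mostly a matter of ordering: look for error before you reach for code. A minimal sketch, assuming the callback's query string has already been parsed into a mapping; route_callback, the /auth/complete path, and the message text are hypothetical.

```python
from typing import Mapping, Tuple


def route_callback(params: Mapping[str, str]) -> Tuple[str, str]:
    """Return (destination, message) for a callback's query parameters."""
    error = params.get("error")
    if error == "access_denied":
        # The user declined consent: a normal outcome, not an exception.
        return ("/login", "Sign-in was cancelled. You can try again.")
    if error:
        return ("/login", "Sign-in failed. Please try again.")
    if not params.get("code"):
        # No error and no code: malformed callback, still not a 500.
        return ("/login", "Sign-in could not be completed. Please try again.")
    return ("/auth/complete", "")  # hand off to state check and token exchange
```

Every branch returns a destination, so there is no input that falls through to an unhandled exception.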

Prompt 5: account linking on a shared email

The one the data team asks about later.

Precondition: an account already exists with the email
{{SHARED_EMAIL}}, created via email/password signup.

Go to https://staging.yourapp.com/login.
Click "Sign in with Google" and sign in with a Google account whose
email is the SAME {{SHARED_EMAIL}}.

Verify the outcome is one sane, intentional behavior — and report
which one happens:
- The Google identity is linked to the existing account and you land
  in that account, OR
- The app clearly tells you an account with this email already exists
  and asks you to sign in with your password first

There must NOT be: a second duplicate account silently created, or an
error that leaves you locked out of both. Capture the final state and
any account-id or email shown on the dashboard.

Whatever your product decided to do here, this prompt pins it down so a future refactor can't quietly change it. Silent duplicate-account creation is the failure that corrupts your user table one row at a time.
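To make the intended behavior concrete, here is one possible policy written out as a sketch. It is an assumption, not a recommendation for your product: it links only when the provider reports the email as verified (Google's ID token carries an email_verified claim) and otherwise falls back to asking for the password. User, resolve_login, and the outcome labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    email: str
    google_sub: Optional[str] = None  # provider's stable user id, if already linked


def resolve_login(users: Dict[str, User], email: str, google_sub: str,
                  email_verified: bool) -> str:
    """Decide what a Google sign-in should do; never creates a duplicate for a known email."""
    existing = users.get(email.lower())
    if existing is None:
        return "create_account"
    if existing.google_sub == google_sub:
        return "sign_in"                 # already linked to this Google identity
    if existing.google_sub is None and email_verified:
        return "link_and_sign_in"        # first Google login for a password account
    return "ask_for_password"            # unverified email or a different identity


users = {"pat@example.com": User("pat@example.com")}
outcome = resolve_login(users, "Pat@Example.com", "g-1", True)
```

Whichever branch your product prefers, the prompt above should observe exactly one of these outcomes on every run.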

What you do with a failure

A failed callback run gives you the same artifact as any other Test Run: a screenshot timeline through the provider's screens and back, the full network log — including the GET /auth/callback?code=...&state=... and the token exchange POST — the console output, and the agent's reasoning at each branch. For OAuth specifically, the network log is where you live: you can see whether state came back, whether the token exchange 200'd, and what the profile endpoint actually returned. The session docs cover pulling that detail from the CLI, and if reading agent runs is new to you, why AI QA agents find bugs your scripts miss is the longer argument for why the evidence beats a green checkmark.

A scripted version of these tests would mean automating three different providers' login DOMs — the exact selectors that change whenever Google reskins its consent screen, which is the maintenance tax that makes most teams test the callback once and never again. The agent reads each provider's screen fresh, so the prompts above don't rot when Microsoft moves a button. We made the broader version of that case in how to test signup flow; OAuth is the same argument with three external dependencies instead of one.

Run these on every release

Five prompts, one per provider for the happy path plus the four edge cases, is a full OAuth sweep in well under ten minutes of agent time and a couple of dollars of run credits. Wire them into a CI/CD step against every preview deploy, or run them on staging on a schedule. The providers change their screens; your prompts don't care. Start with this one, the happy path followed by the cancelled-consent check, because cancelled consent is the cheapest bug to ship and the most embarrassing to demo:

Test the "Sign in with Google" flow on https://staging.yourapp.com/login.

First, complete it normally: click the Google button, sign in as
{{GOOGLE_TEST_EMAIL}} / {{GOOGLE_TEST_PASSWORD}}, approve consent,
and confirm you land signed in with your email shown on the dashboard.

Then do it again, but this time click "Cancel" on Google's consent
screen instead of approving. Confirm the app returns you to its own
login page with a clear "sign-in cancelled" message, no server error,
and no partial login.

Report console errors, failed network requests, and the exact text
and URL the app showed in each case — including anything broken I
didn't ask about.

Save it as a Test Scenario, point it at your staging URL, and you've covered the login path most teams find out is broken from a user, not a test. Your first run is free.