Why Web Push Breaks in Production (and Why Basic Tutorials Don't Warn You)
Web push notification tutorials almost always end the same way: the demo notification appears in the browser, the author declares success, and the reader is left believing the hard part is over. It isn't.
Getting a push notification to fire on localhost is a 30-minute exercise. Building a web push system that reliably delivers at 100,000 subscribers — without silently dropping messages, accumulating stale subscriptions, or breaking when you rotate credentials — is a fundamentally different problem.
This guide is for engineers who have already worked through the basics of registering a service worker, managing permissions and subscriptions, and crafting rich notifications. What follows covers the production failure surface that those guides don't: subscription lifecycle management, HTTP error handling, database design, VAPID key rotation, cross-browser quirks, retry logic, and the point at which building in-house stops making economic sense.
According to the Web Push Notification Service Market report, the market is projected to grow from $1.59 billion in 2025 to $4.6 billion by 2033. The teams winning in this space are not the ones who implemented web push first — they're the ones who made it reliable.
The Subscription Lifecycle: What Engineers Routinely Get Wrong
A push subscription is not a permanent record. It is a time-limited, revocable credential that the browser generates and that can become invalid for several distinct reasons. Treating it as permanent is the single most common source of production delivery failures.
A subscription object contains three critical fields: the endpoint (a unique URL at the browser vendor's push service), the p256dh key (the browser's public key for payload encryption), and the auth secret (a shared secret for message authentication). All three must be stored correctly and kept current.
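As a rough illustration of what "stored correctly" can mean, the sketch below flattens the object produced by PushSubscription.toJSON() into the three fields plus the optional expiration. The function name normalizeSubscription and the https check are assumptions made for this example, not part of any library.

```javascript
// Hypothetical helper: validate and flatten the output of
// PushSubscription.toJSON() before it is written to the database.
function normalizeSubscription(json) {
  const endpoint = json?.endpoint;
  const p256dh = json?.keys?.p256dh;
  const auth = json?.keys?.auth;

  // Assumption: push service endpoints are always https URLs.
  if (typeof endpoint !== 'string' || !endpoint.startsWith('https://')) {
    throw new TypeError('subscription.endpoint must be an https URL');
  }
  if (typeof p256dh !== 'string' || p256dh.length === 0) {
    throw new TypeError('subscription.keys.p256dh is missing');
  }
  if (typeof auth !== 'string' || auth.length === 0) {
    throw new TypeError('subscription.keys.auth is missing');
  }

  // expirationTime is null in many browsers; when present it is a
  // millisecond timestamp, so convert it to a Date for storage.
  const expirationTime =
    typeof json.expirationTime === 'number' ? new Date(json.expirationTime) : null;

  return { endpoint, p256dh, auth, expirationTime };
}
```

Rejecting incomplete objects at the API boundary keeps rows that can never be delivered to out of the table in the first place.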
Subscriptions become invalid when:
- The user explicitly revokes notification permission in browser settings
- The user clears site data or cookies
- The browser refreshes the subscription endpoint (a pushsubscriptionchange event fires)
- The browser vendor's push service retires the endpoint
- The user reinstalls the browser or resets the device
The critical failure most teams miss is the pushsubscriptionchange event. When a browser silently rotates a subscription endpoint — which Chrome and Firefox do under certain conditions — a pushsubscriptionchange event fires in the service worker. If you don't handle it, the new endpoint never reaches your server, the old endpoint eventually returns a 410, and you lose that subscriber permanently without ever knowing why.
The correct handler looks like this:
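// service-worker.js
// VAPID_PUBLIC_KEY and urlBase64ToUint8Array must be defined in this file
// (or loaded with importScripts); page-level globals are not visible here.
self.addEventListener('pushsubscriptionchange', (event) => {
  // Keep the worker alive until the re-subscribe and the server update finish.
  event.waitUntil(
    self.registration.pushManager.subscribe({
      userVisibleOnly: true,
      applicationServerKey: urlBase64ToUint8Array(VAPID_PUBLIC_KEY)
    }).then((newSubscription) => {
      // Ask the server to swap the old endpoint for the new subscription.
      return fetch('/api/subscriptions/update', {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          // oldSubscription may be null, hence the optional chaining.
          old_endpoint: event.oldSubscription?.endpoint,
          new_subscription: newSubscription.toJSON()
        })
      });
    })
  );
});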
Without this handler, your subscription database slowly fills with stale endpoints, your delivery rate declines, and your error logs fill with 410 responses you didn't plan for.
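The handler above calls urlBase64ToUint8Array without defining it. A commonly used version of that helper is sketched here, assuming the VAPID public key is stored as an unpadded URL-safe base64 string; it relies only on atob, which is available in service workers and in recent Node.js releases.

```javascript
// Convert a URL-safe base64 string (the usual VAPID public key encoding)
// into the Uint8Array that pushManager.subscribe() accepts.
function urlBase64ToUint8Array(base64String) {
  // Restore the "=" padding that URL-safe base64 normally omits.
  const padding = '='.repeat((4 - (base64String.length % 4)) % 4);
  // Map the URL-safe alphabet back to the standard one.
  const base64 = (base64String + padding).replace(/-/g, '+').replace(/_/g, '/');
  const raw = atob(base64);
  const bytes = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i += 1) {
    bytes[i] = raw.charCodeAt(i);
  }
  return bytes;
}
```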
Handling the 410 Gone Response and Stale Subscriptions
When your application server sends a push message to a browser vendor's push service, the response carries an HTTP status code. Each code requires a specific action. Most teams handle the happy path (201 Created) and crash or log-and-ignore everything else. That's not sufficient.
Here is the complete decision table:
| HTTP Status | Meaning | Required Action |
| --- | --- | --- |
| 201 Created | Notification accepted by push service | Log success. No further action needed. |
| 200 OK / 202 Accepted | Notification accepted (vendor-specific) | Log success. No further action needed. |
| 400 Bad Request | Malformed request (payload, headers) | Do not retry. Fix your payload structure or VAPID headers. |
| 401 Unauthorized / 403 Forbidden | VAPID authentication failure | Do not retry. Check VAPID key pair: public key in subscription must match private key signing the JWT. |
| 404 Not Found | Endpoint URL is invalid or malformed | Delete the subscription from your database. Do not retry. |
| 410 Gone | Subscription has been unsubscribed or expired | Delete the subscription immediately. Do not retry. This is the most common error in production. |
| 413 Payload Too Large | Payload exceeds the push service limit (~4KB) | Do not retry. Reduce payload size; fetch content on notification click instead. |
| 429 Too Many Requests | Rate limit exceeded | Retry with exponential backoff. Respect the Retry-After header if present. |
| 500 / 503 | Push service error or temporary outage | Retry with exponential backoff. Cap at 3–5 attempts, then move to dead letter queue. |
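One way to keep that table from drifting out of sync with the code is to encode it as a single function that maps a status code to an action. The sketch below does that; the action names and the fallback for unlisted codes are assumptions chosen for illustration.

```javascript
// Hypothetical mapping from a push service status code to one of four
// actions: 'success', 'delete_subscription', 'retry', or 'fix_request'.
function classifyPushStatus(status) {
  if (status === 200 || status === 201 || status === 202) return 'success';
  // Permanent failures: the subscription is gone, so stop sending to it.
  if (status === 404 || status === 410) return 'delete_subscription';
  // Transient failures: try again later with backoff.
  if (status === 429 || status === 500 || status === 503) return 'retry';
  // Our own request is wrong; retrying the same request cannot help.
  if (status === 400 || status === 401 || status === 403 || status === 413) {
    return 'fix_request';
  }
  // Assumption for codes not in the table: treat other 5xx responses as
  // transient and everything else as a request problem to investigate.
  return status >= 500 ? 'retry' : 'fix_request';
}
```

Keeping the decision in one pure function makes it easy to unit test and to log the chosen action next to every non-success response.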
The 410 case deserves extra attention. A 410 means either the user explicitly unsubscribed or the subscription expired. Either way, continuing to attempt delivery to that endpoint wastes resources and inflates your error rate. The correct response is to stop sending to that endpoint unconditionally, whether you hard-delete the row or mark it inactive.
What catches teams off guard is that 404 and 410 can sometimes behave similarly across different browser vendors. Chrome's push service (FCM) returns 410 for expired subscriptions; other vendors may return 404. Your code should handle both as permanent failures requiring deletion — not retries.
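For the statuses that do warrant a retry (429 and the 5xx range), the delay calculation might look like the sketch below. It honors Retry-After when the header is present and otherwise uses exponential backoff with full jitter. The function names, the default base and cap, and the injectable random source are assumptions for this example.

```javascript
// Hypothetical delay calculation for retryable push failures.
// attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
function retryDelayMs(attempt, retryAfterHeader, options = {}) {
  const { baseMs = 1000, maxMs = 60000, random = Math.random } = options;

  if (retryAfterHeader) {
    // Retry-After is either a number of seconds or an HTTP date.
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
    const dateMs = Date.parse(retryAfterHeader);
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - Date.now());
  }

  // Exponential ceiling (base * 2^attempt), capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: pick a random delay below the ceiling so that many
  // workers retrying at once do not hit the push service in lockstep.
  return Math.floor(random() * ceiling);
}

// After maxAttempts retries, hand the message to a dead letter queue.
function shouldDeadLetter(attempt, maxAttempts = 5) {
  return attempt >= maxAttempts;
}
```

The random parameter exists only so the schedule can be tested deterministically; production callers would leave it at the default.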
Storing and Managing Subscriptions at Scale
Most teams start by stuffing subscription objects into a single database column as JSON and calling it done. This works at 1,000 subscribers. It creates serious operational problems at 500,000.
A production-grade subscription schema separates concerns clearly:
-- push_subscriptions table
CREATE TABLE push_subscriptions (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL REFERENCES users(id),
    endpoint        TEXT NOT NULL UNIQUE,
    p256dh          TEXT NOT NULL,
    auth            TEXT NOT NULL,
    browser         VARCHAR(50),
    platform        VARCHAR(50),
    expiration_time TIMESTAMPTZ,
    is_active       BOOLEAN NOT NULL DEFAULT true,
    last_used_at    TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_push_subscriptions_user_id ON push_subscriptions(user_id);
CREATE INDEX idx_push_subscriptions_active ON push_subscriptions(is_active) WHERE is_active = true;
-- No separate index on endpoint: the UNIQUE constraint already creates one.
A few design decisions worth explaining:
- endpoint as UNIQUE: Prevents duplicate subscriptions for the same browser session. On re-subscription, upsert on the endpoint rather than inserting a new row.
- is_active boolean: Prefer soft-deleting on 410/404 rather than hard-deleting. This preserves your audit trail and allows you to distinguish between "never subscribed," "previously subscribed," and "currently subscribed," which is useful for analytics and re-engagement campaigns.
- last_used_at: Update this on every successful send. Use it to identify and clean up subscriptions that haven't been successfully notified in 90+ days, a strong signal of a zombie subscription.
- expiration_time: The Push API spec allows subscriptions to carry an expiration timestamp. Not all browsers populate this, but when they do, you should respect it proactively rather than waiting for a 410.
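The last_used_at and expiration_time rules can be expressed as one small predicate that a periodic cleanup job runs over each row. The sketch below assumes rows arrive with Date values in those columns; the function name and the fallback to created_at for never-used rows are illustrative choices.

```javascript
const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

// Hypothetical cleanup predicate. Returns 'expired', 'stale', or null
// (null means the row should stay active).
function deactivationReason(row, now = new Date()) {
  // Respect a browser-supplied expiration instead of waiting for a 410.
  if (row.expiration_time && row.expiration_time.getTime() <= now.getTime()) {
    return 'expired';
  }
  // Assumption: a row that has never been sent to is aged from created_at.
  const lastSignal = row.last_used_at ?? row.created_at;
  if (now.getTime() - lastSignal.getTime() > NINETY_DAYS_MS) {
    return 'stale';
  }
  return null;
}
```

The 90-day rule only makes sense if every active subscriber is sent something at least that often; otherwise a quiet but healthy subscription would be flagged as stale.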
At scale, query patterns determine performance. The two most frequent queries are "give me all active subscriptions for user X" (fan-out on trigger) and "mark this endpoint inactive" (on error). Both are served by the indexes above. For systems sending millions of notifications daily, shard by user_id on a distributed database like Cassandra or DynamoDB rather than a single-node Postgres instance.
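The two hot queries, plus the upsert on endpoint, could be written as parameterized statements along these lines. This is a sketch for the Postgres schema in this section, not a definitive data layer: each builder returns a { text, values } object, the query-config shape that node-postgres accepts, and the builder names are hypothetical.

```javascript
// Hypothetical query builders for the push_subscriptions table. Values are
// always passed as $n parameters, never concatenated into the SQL text.
const subscriptionQueries = {
  // Fan-out on trigger: every active subscription for one user.
  activeForUser(userId) {
    return {
      text: `SELECT id, endpoint, p256dh, auth
               FROM push_subscriptions
              WHERE user_id = $1 AND is_active = true`,
      values: [userId],
    };
  },

  // On 404/410: soft-delete by endpoint.
  markInactive(endpoint) {
    return {
      text: `UPDATE push_subscriptions
                SET is_active = false, updated_at = NOW()
              WHERE endpoint = $1`,
      values: [endpoint],
    };
  },

  // On (re-)subscription: insert, or refresh the existing row for that
  // endpoint and reactivate it. sub is assumed to have the shape
  // { endpoint, p256dh, auth, expirationTime }.
  upsert(userId, sub) {
    return {
      text: `INSERT INTO push_subscriptions
               (user_id, endpoint, p256dh, auth, expiration_time)
             VALUES ($1, $2, $3, $4, $5)
             ON CONFLICT (endpoint) DO UPDATE
                SET user_id = EXCLUDED.user_id,
                    p256dh = EXCLUDED.p256dh,
                    auth = EXCLUDED.auth,
                    expiration_time = EXCLUDED.expiration_time,
                    is_active = true,
                    updated_at = NOW()`,
      values: [userId, sub.endpoint, sub.p256dh, sub.auth, sub.expirationTime ?? null],
    };
  },
};
```

The ON CONFLICT (endpoint) clause depends on the UNIQUE constraint on endpoint, which is what makes the upsert a single round trip.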
VAPID Key Rotation Without Losing Your Subscriber Base
VAPID (Voluntary Application Server Identification) keys authenticate your server to the browser vendor's push service. They are generated once as a public/private ECDSA key pair on the P-256 curve, and the public key is bound to every subscription your users create: it is the applicationServerKey passed to pushManager.subscribe().
This creates a coupling problem: the public key embedded in a subscription must match the private key you use to sign push requests. If you rotate your VAPID keys without a migration strategy, every existing subscription becomes invalid overnight.
The right approach to key rotation is a phased migration:
- Generate a new key pair. Store both old and new private keys in your secrets manager.
- Deploy the new public key to your frontend. New subscribers will receive subscriptions tied to the new key.
- When sending, route by key version. Use the old private key for subscriptions created before the rotation, and the new private key for subscriptions created after. Add a vapid_key_version column to your subscriptions table to track this.
- Gradually re-subscribe existing users. On the next page load or active session, silently unsubscribe and re-subscribe users who still hold old-key subscriptions. This exchanges their old subscription for a new one tied to the current key.
- Retire the old key once its associated subscription count drops to zero (or near zero after a cleanup job).
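The silent re-subscribe step from the list above could be sketched on the client as follows. It reads the key an existing subscription is bound to from subscription.options.applicationServerKey and only swaps the subscription when that key differs from the current one. The function names and the returned object shape are assumptions; reporting the result to the server is left to the caller.

```javascript
// Byte-wise comparison of two Uint8Arrays.
function sameBytes(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i += 1) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Hypothetical page-side migration step. registration is a
// ServiceWorkerRegistration; currentKeyBytes is the new VAPID public key
// as a Uint8Array. Resolves to null when nothing needed to change.
async function resubscribeIfKeyChanged(registration, currentKeyBytes) {
  const existing = await registration.pushManager.getSubscription();
  // Never subscribed: do nothing here, so no permission prompt is triggered.
  if (!existing) return null;

  const boundKey = existing.options.applicationServerKey; // ArrayBuffer or null
  if (boundKey && sameBytes(new Uint8Array(boundKey), currentKeyBytes)) {
    return null; // already tied to the current key
  }

  // A subscription cannot be re-keyed in place, so drop it and create a new
  // one. If subscribe() fails here the user is left unsubscribed, so callers
  // should catch the error and retry on a later page load.
  await existing.unsubscribe();
  const fresh = await registration.pushManager.subscribe({
    userVisibleOnly: true,
    applicationServerKey: currentKeyBytes,
  });
  return { oldEndpoint: existing.endpoint, newSubscription: fresh.toJSON() };
}
```

A caller would typically send the returned pair to the same update endpoint used by the pushsubscriptionchange handler, so the server can replace the old row and record the new key version.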
One important operational note: load VAPID private keys from environment variables or your secrets manager, and never hardcode them in source. A compromised VAPID private key lets an attacker who also obtains your subscription data impersonate your push server and send notifications to your users. Treat it with the same care as a signing certificate.
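Putting the environment-variable advice together with the key-version routing from the rotation steps, a server might load a small keyring at startup and look up the pair for each subscription at send time. The variable naming scheme (VAPID_V1_PUBLIC_KEY, VAPID_V1_PRIVATE_KEY, and so on) and both function names are assumptions made for this sketch.

```javascript
// Hypothetical loader: builds a Map of version number -> key pair from
// variables named VAPID_V<n>_PUBLIC_KEY and VAPID_V<n>_PRIVATE_KEY.
function loadVapidKeyring(env) {
  const keyring = new Map();
  for (const [name, privateKey] of Object.entries(env)) {
    const match = /^VAPID_V(\d+)_PRIVATE_KEY$/.exec(name);
    if (!match || !privateKey) continue;
    const version = Number(match[1]);
    const publicKey = env[`VAPID_V${version}_PUBLIC_KEY`];
    if (!publicKey) {
      throw new Error(`VAPID_V${version}_PUBLIC_KEY is not set`);
    }
    keyring.set(version, { publicKey, privateKey });
  }
  if (keyring.size === 0) {
    throw new Error('no VAPID key pairs configured');
  }
  return keyring;
}

// Route by key version: pick the pair that matches the row's
// vapid_key_version column, and fail loudly if it has been retired.
function keysForSubscription(keyring, row) {
  const keys = keyring.get(row.vapid_key_version);
  if (!keys) {
    throw new Error(`no VAPID key pair for version ${row.vapid_key_version}`);
  }
  return keys;
}
```

In a Node.js process this would be called once as loadVapidKeyring(process.env). Retiring an old key then amounts to removing its two variables, after which any leftover old-version row raises an error instead of being signed with the wrong key.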



