
Web Push Notifications in Production: Architecture, Failures & Scale

Gaurav Verma
May 11, 2026

Why Web Push Breaks in Production (and Why Basic Tutorials Don't Warn You)

Web push notification tutorials almost always end the same way: the demo notification appears in the browser, the author declares success, and the reader is left believing the hard part is over. It isn't.

Getting a push notification to fire on localhost is a 30-minute exercise. Building a web push system that reliably delivers at 100,000 subscribers - without silently dropping messages, accumulating stale subscriptions, or breaking when you rotate credentials - is a fundamentally different problem.

This guide is for engineers who have already worked through the basics of registering a service worker, managing permissions and subscriptions, and crafting rich notifications. What follows covers the production failure surface that those guides don't: subscription lifecycle management, HTTP error handling, database design, VAPID key rotation, cross-browser quirks, retry logic, and the point at which building in-house stops making economic sense.

The teams winning in this space are not the ones who implemented web push first - they're the ones who made it reliable.

The Subscription Lifecycle: What Engineers Routinely Get Wrong

A push subscription is not a permanent record. It is a time-limited, revocable credential that the browser generates and that can become invalid for several distinct reasons. Treating it as permanent is the single most common source of production delivery failures.

A subscription object contains three critical fields: the endpoint (a unique URL at the browser vendor's push service), the p256dh key (the browser's public key for payload encryption), and the auth secret (a shared secret for message authentication). All three must be stored correctly and kept current.
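As a rough illustration of what "stored correctly" can mean, the sketch below validates the JSON that `PushSubscription.toJSON()` produces and flattens it into the three fields. The function name `parseSubscription` and the returned shape are assumptions made for this example, not part of any library.

```javascript
// Hypothetical helper: validates the output of PushSubscription.toJSON()
// and flattens it into the fields the server needs to persist.
function parseSubscription(json) {
  const endpoint = json?.endpoint;
  const p256dh = json?.keys?.p256dh;
  const auth = json?.keys?.auth;

  if (typeof endpoint !== 'string' || !endpoint.startsWith('https://')) {
    throw new TypeError('subscription.endpoint must be an https URL');
  }
  if (typeof p256dh !== 'string' || p256dh.length === 0) {
    throw new TypeError('subscription.keys.p256dh is missing');
  }
  if (typeof auth !== 'string' || auth.length === 0) {
    throw new TypeError('subscription.keys.auth is missing');
  }

  return {
    endpoint,
    p256dh,
    auth,
    // expirationTime is null in most browsers; keep it when it is present.
    expirationTime: json.expirationTime ?? null,
  };
}
```

Rejecting incomplete objects at the API boundary keeps rows that can never be encrypted to out of the database in the first place.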

Subscriptions become invalid when:

  • The user explicitly revokes notification permission in browser settings
  • The user clears site data or cookies
  • The browser refreshes the subscription endpoint (a pushsubscriptionchange event fires)
  • The browser vendor's push service retires the endpoint
  • The user reinstalls the browser or resets the device

The critical failure most teams miss is the pushsubscriptionchange event. When a browser silently rotates a subscription endpoint - which Chrome and Firefox do under certain conditions - a pushsubscriptionchange event fires in the service worker. If you don't handle it, the new endpoint never reaches your server, the old endpoint eventually returns a 410, and you lose that subscriber permanently without ever knowing why.

The correct handler looks like this:

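// service-worker.js
// VAPID_PUBLIC_KEY is assumed to be a base64url string available in this worker
// (for example, injected at build time).
const urlBase64ToUint8Array = (base64url) => {
  const padding = '='.repeat((4 - (base64url.length % 4)) % 4);
  const base64 = (base64url + padding).replace(/-/g, '+').replace(/_/g, '/');
  return Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
};

self.addEventListener('pushsubscriptionchange', (event) => {
  event.waitUntil((async () => {
    // Some browsers supply the replacement subscription; otherwise re-subscribe.
    const newSubscription = event.newSubscription ??
      await self.registration.pushManager.subscribe({
        userVisibleOnly: true,
        applicationServerKey: urlBase64ToUint8Array(VAPID_PUBLIC_KEY)
      });
    const response = await fetch('/api/subscriptions/update', {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        old_endpoint: event.oldSubscription?.endpoint,
        new_subscription: newSubscription.toJSON()
      })
    });
    // Reject so a failed update is visible instead of being silently dropped.
    if (!response.ok) throw new Error(`Subscription update failed: ${response.status}`);
  })());
});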

Without this handler, your subscription database slowly fills with stale endpoints, your delivery rate declines, and your error logs fill with 410 responses you didn't plan for.
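The server side of that update can be sketched as follows. This is a minimal illustration that assumes the request body shape used by the handler above (`old_endpoint`, `new_subscription`) and uses an in-memory `Map` as a stand-in for the subscriptions table; the function name and return values are hypothetical.

```javascript
// In-memory stand-in for the subscriptions table, keyed by endpoint.
// A real implementation would run the same steps inside one database transaction.
function applySubscriptionChange(store, { old_endpoint, new_subscription }) {
  const { endpoint, keys } = new_subscription;
  const previous = old_endpoint ? store.get(old_endpoint) : undefined;

  // Without a known old endpoint there is no user to attach the new row to.
  if (!previous) {
    return { status: 'unknown_old_endpoint' };
  }

  // Carry the user over to the new endpoint.
  store.set(endpoint, {
    userId: previous.userId,
    p256dh: keys.p256dh,
    auth: keys.auth,
    isActive: true,
  });
  // Retire the old endpoint unless the browser reported the same one again.
  if (old_endpoint !== endpoint) {
    store.set(old_endpoint, { ...previous, isActive: false });
  }
  return { status: 'rotated', userId: previous.userId };
}
```

Because `event.oldSubscription` can be missing in some browsers, the `unknown_old_endpoint` branch is worth logging: it is the case where the new subscription has to be tied to a user some other way, such as a session cookie.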

Handling the 410 Gone Response and Stale Subscriptions

When your application server sends a push message to a browser vendor's push service, the response carries an HTTP status code. Each code requires a specific action. Most teams handle the happy path (201 Created) and crash or log-and-ignore everything else. That's not sufficient.

Here is the complete decision table:

HTTP Status | Meaning | Required Action
--- | --- | ---
201 Created | Notification accepted by push service | Log success. No further action needed.
200 OK / 202 Accepted | Notification accepted (vendor-specific) | Log success. No further action needed.
400 Bad Request | Malformed request (payload, headers) | Do not retry. Fix your payload structure or VAPID headers.
401 Unauthorized / 403 Forbidden | VAPID authentication failure | Do not retry. Check VAPID key pair - public key in subscription must match private key signing the JWT.
404 Not Found | Endpoint URL is invalid or malformed | Delete the subscription from your database. Do not retry.
410 Gone | Subscription has been unsubscribed or expired | Delete the subscription immediately. Do not retry. This is the most common error in production.
413 Payload Too Large | Payload exceeds the push service limit (~4KB) | Do not retry. Reduce payload size; fetch content on notification click instead.
429 Too Many Requests | Rate limit exceeded | Retry with exponential backoff. Respect the Retry-After header if present.
500 / 503 | Push service error or temporary outage | Retry with exponential backoff. Cap at 3–5 attempts, then move to dead letter queue.

The 410 case deserves extra attention. A 410 means either that the user explicitly unsubscribed or that the subscription expired. Either way, continuing to attempt delivery to that endpoint wastes resources and inflates your error rate. The correct response is to stop sending to that endpoint unconditionally, whether you hard-delete the row or mark it inactive.

What catches teams off guard is that 404 and 410 can sometimes behave similarly across different browser vendors. Chrome's push service (FCM) returns 410 for expired subscriptions; other vendors may return 404. Your code should handle both as permanent failures requiring deletion - not retries.
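The decision table can be collapsed into a small classifier. The sketch below is one possible encoding; the function name and the action labels (`success`, `deactivate`, `retry`, `fail_permanent`) are made up for this example.

```javascript
// Maps a push-service HTTP status to the action the sender should take.
function classifyPushResponse(status) {
  if (status >= 200 && status < 300) return 'success';
  // Both codes mean the endpoint is gone for good, whichever vendor sent them.
  if (status === 404 || status === 410) return 'deactivate';
  // Rate limiting and server-side failures are the only transient cases.
  if (status === 429 || status >= 500) return 'retry';
  // 400, 401, 403, 413 and anything unexpected: retrying will not help.
  return 'fail_permanent';
}
```

Treating unknown codes as permanent failures is a conservative default: it avoids retry storms against a status the sender does not understand, at the cost of needing a log review when a new code shows up.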

Storing and Managing Subscriptions at Scale

Most teams start by stuffing subscription objects into a single database column as JSON and calling it done. This works at 1,000 subscribers. It creates serious operational problems at 500,000.

A production-grade subscription schema separates concerns clearly:

-- push_subscriptions table (PostgreSQL)
-- gen_random_uuid() is built in from PostgreSQL 13; earlier versions need the pgcrypto extension.
CREATE TABLE push_subscriptions (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id         UUID NOT NULL REFERENCES users(id),
  endpoint        TEXT NOT NULL UNIQUE,  -- UNIQUE already creates an index on endpoint
  p256dh          TEXT NOT NULL,
  auth            TEXT NOT NULL,
  browser         VARCHAR(50),
  platform        VARCHAR(50),
  expiration_time TIMESTAMPTZ,
  is_active       BOOLEAN NOT NULL DEFAULT true,
  last_used_at    TIMESTAMPTZ,
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()  -- set on update by the application or a trigger
);

-- Fan-out lookup: all subscriptions for a user.
CREATE INDEX idx_push_subscriptions_user_id ON push_subscriptions(user_id);
-- Partial index that covers only active rows.
CREATE INDEX idx_push_subscriptions_active ON push_subscriptions(is_active) WHERE is_active = true;
-- No separate index on endpoint is needed: the UNIQUE constraint provides one.

A few design decisions worth explaining:

  • endpoint as UNIQUE: Prevents duplicate subscriptions for the same browser session. On re-subscription, upsert on the endpoint rather than inserting a new row.
  • is_active boolean: Prefer soft-deleting on 410/404 rather than hard-deleting. This preserves your audit trail and allows you to distinguish between "never subscribed," "previously subscribed," and "currently subscribed" - useful for analytics and re-engagement campaigns.

  • last_used_at: Update this on every successful send. Use it to identify and clean up subscriptions that haven't been successfully notified in 90+ days - a strong signal of a zombie subscription.

  • expiration_time: The Push API spec allows subscriptions to carry an expiration timestamp. Not all browsers populate this, but when they do, you should respect it proactively rather than waiting for a 410.

At scale, query patterns determine performance. The two most frequent queries are "give me all active subscriptions for user X" (fan-out on trigger) and "mark this endpoint inactive" (on error). Both are served by the indexes above. For systems sending millions of notifications daily, shard by user_id on a distributed database like Cassandra or DynamoDB rather than a single-node Postgres instance.
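The upsert and the two hot queries might look like the following. This is a sketch against the schema above: the function names are hypothetical, and the `{ text, values }` shape is the parameterised-query form that node-postgres accepts, though nothing here depends on a particular driver.

```javascript
// Insert a subscription, or refresh the existing row when the endpoint is already known.
function buildUpsertQuery(userId, sub, browser, platform) {
  return {
    text: `
      INSERT INTO push_subscriptions (user_id, endpoint, p256dh, auth, browser, platform)
      VALUES ($1, $2, $3, $4, $5, $6)
      ON CONFLICT (endpoint) DO UPDATE
        SET user_id = EXCLUDED.user_id,
            p256dh = EXCLUDED.p256dh,
            auth = EXCLUDED.auth,
            is_active = true,
            updated_at = NOW()`,
    values: [userId, sub.endpoint, sub.keys.p256dh, sub.keys.auth, browser, platform],
  };
}

// Fan-out on trigger: every active subscription for one user.
function buildFanOutQuery(userId) {
  return {
    text: 'SELECT endpoint, p256dh, auth FROM push_subscriptions WHERE user_id = $1 AND is_active = true',
    values: [userId],
  };
}

// Soft delete on 404/410.
function buildDeactivateQuery(endpoint) {
  return {
    text: 'UPDATE push_subscriptions SET is_active = false, updated_at = NOW() WHERE endpoint = $1',
    values: [endpoint],
  };
}
```

Re-activating the row in the `ON CONFLICT` branch means a user who unsubscribes and later re-subscribes on the same endpoint is handled by the same statement as a first-time subscriber.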

VAPID Key Rotation Without Losing Your Subscriber Base

VAPID (Voluntary Application Server Identification) keys authenticate your server to the browser vendor's push service. They're generated once - a public/private ECDSA key pair - and the public key is embedded in every subscription object your users create.

This creates a coupling problem: the public key embedded in a subscription must match the private key you use to sign push requests. If you rotate your VAPID keys without a migration strategy, every existing subscription becomes invalid overnight.
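On the browser side, the key a subscription was created with is exposed as `subscription.options.applicationServerKey` (an `ArrayBuffer`). The sketch below compares it with a base64url-encoded public key; the helper names are assumptions for this example.

```javascript
// Decodes a base64url string (the usual VAPID public key encoding) into bytes.
function base64UrlToBytes(base64url) {
  const padding = '='.repeat((4 - (base64url.length % 4)) % 4);
  const base64 = (base64url + padding).replace(/-/g, '+').replace(/_/g, '/');
  return Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
}

// True when the subscription was created with the given public key.
function subscriptionUsesKey(subscription, publicKeyBase64Url) {
  const bound = subscription?.options?.applicationServerKey;
  if (!bound) return false;
  const actual = new Uint8Array(bound);
  const expected = base64UrlToBytes(publicKeyBase64Url);
  return actual.length === expected.length &&
    actual.every((byte, i) => byte === expected[i]);
}
```

A check like this lets the frontend tell whether an existing subscription still matches the key it currently ships, which is the signal a migration needs before re-subscribing.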

The right approach to key rotation is a phased migration:

  1. Generate a new key pair. Store both old and new private keys in your secrets manager.
  2. Deploy the new public key to your frontend. New subscribers will receive subscriptions tied to the new key.
  3. When sending, route by key version. Use the old private key for subscriptions created before the rotation, and the new private key for subscriptions created after. Add a vapid_key_version column to your subscriptions table to track this.
  4. Gradually re-subscribe existing users. On the next page load or active session, silently unsubscribe and re-subscribe users who still hold old-key subscriptions. This exchanges their old subscription for a new one bound to the current public key.
  5. Retire the old key once its associated subscription count drops to zero (or near zero after a cleanup job).
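Steps 3 and 5 can be sketched as two small functions. The `vapidKeyVersion` field mirrors the vapid_key_version column mentioned in step 3; the keyring shape (a `Map` from version to key pair) and the function names are assumptions for illustration.

```javascript
// Picks the key pair that matches the version a subscription was created under.
// The keyring would be loaded from a secrets manager in practice.
function selectVapidKeys(subscription, keyring) {
  const version = subscription.vapidKeyVersion;
  const keys = keyring.get(version);
  if (!keys) {
    throw new Error(`No VAPID key pair loaded for version ${version}`);
  }
  return keys;
}

// Counts active subscriptions per key version so retirement can be gated on zero.
function countByKeyVersion(subscriptions) {
  const counts = new Map();
  for (const sub of subscriptions) {
    if (!sub.isActive) continue;
    counts.set(sub.vapidKeyVersion, (counts.get(sub.vapidKeyVersion) ?? 0) + 1);
  }
  return counts;
}
```

Failing loudly when a version has no loaded key pair is deliberate: signing with the wrong key would surface later as 401/403 responses, which are harder to trace back to a missing secret.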

One important operational note: store VAPID private keys as environment variables, never hardcoded in source. A compromised VAPID private key allows any party to impersonate your push server and send notifications to your users. Treat it with the same care as a signing certificate.

Cross-Browser Delivery Quirks: Chrome, Firefox, and Safari on iOS

Web push is built on open standards - the Push API, the Web Push Protocol, and VAPID - which means it should behave consistently across browsers. In practice, there are meaningful differences that will catch you in production if you don't plan for them.

Browser | Push Service | Notable Quirks
--- | --- | ---
Chrome / Edge | Firebase Cloud Messaging (FCM) for Chrome; Windows Push Notification Services (WNS) for Edge | Every push message must result in a visible notification (userVisibleOnly: true is mandatory). Silent push is not permitted. Rate limits apply per subscription.
Firefox | Mozilla Autopush | Allows a limited quota of silent push messages per subscription. Quota is replenished when visible notifications are shown. Firefox returns distinct error formats - validate your error-parsing code against Mozilla's autopush documentation specifically.
Safari (macOS) | Apple Push Notification service (APNs) | Uses the standard Push API since Safari 16. VAPID-based authentication works. Payload size limit is 2KB vs. 4KB on Chrome/Firefox.
Safari (iOS 16.4+) | APNs via WebKit | Push only works for PWAs installed to the home screen. The permission prompt must be triggered by a user gesture (not on page load). For EU regions, note that Apple announced the removal of Home Screen web apps in iOS 17.4 under the DMA but reversed that decision before release; verify current behavior on EU devices rather than gating on geolocation.

The iOS Safari constraint deserves special attention in your subscription flow. When a user visits your site in mobile Safari, 'PushManager' in window returns false - push is not available in the browser context. It becomes available only after the user adds your PWA to their home screen and opens it from there. Your frontend needs to handle this state explicitly:

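// Detect iOS Safari in a browser tab (not an installed PWA).
// showAddToHomeScreenBanner and initPushSubscription are app-defined functions.
const ua = navigator.userAgent.toLowerCase();
// iPadOS 13+ reports a Mac user agent by default, so also check for touch support.
const isIOS = /iphone|ipad|ipod/.test(ua) ||
  (ua.includes('macintosh') && navigator.maxTouchPoints > 1);
// display-mode is the standard check; navigator.standalone is a non-standard iOS flag.
const isInStandaloneMode =
  window.matchMedia('(display-mode: standalone)').matches || navigator.standalone === true;

if (isIOS && !isInStandaloneMode) {
  showAddToHomeScreenBanner();
} else if ('serviceWorker' in navigator && 'PushManager' in window) {
  initPushSubscription();
}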

Store the browser and platform columns in your subscription table. This lets you diagnose delivery failures by platform and avoid sending to configurations you know won't succeed.
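When the browser column is missing or unreliable, the endpoint hostname is a usable fallback for grouping failures by push service. The sketch below assumes the hostname suffixes commonly seen in subscription endpoints; treat that list as an assumption to verify against your own data, since vendors can change it.

```javascript
// Hostname suffixes are assumptions based on commonly observed endpoints.
const PUSH_SERVICE_HOSTS = [
  { suffix: 'fcm.googleapis.com', service: 'fcm' },
  { suffix: 'push.services.mozilla.com', service: 'mozilla-autopush' },
  { suffix: 'push.apple.com', service: 'apns-web' },
  { suffix: 'notify.windows.com', service: 'wns' },
];

// Returns a label for the push service behind an endpoint, or 'unknown'.
function pushServiceFor(endpoint) {
  const host = new URL(endpoint).hostname;
  const match = PUSH_SERVICE_HOSTS.find(
    ({ suffix }) => host === suffix || host.endsWith(`.${suffix}`)
  );
  return match ? match.service : 'unknown';
}
```

Matching on a dot-separated suffix rather than a substring avoids misclassifying lookalike hosts.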

Retry Logic, Dead Letter Queues, and Delivery Guarantees

Web push delivery is inherently best-effort - the push service will attempt delivery, but there is no guarantee the user's device is online or that the notification will be shown. What you can control is your own server's retry behavior and failure handling.

The architecture for reliable push delivery at scale follows this pattern:

  1. Write to a job queue first. Never send push notifications synchronously in the request path. Publish a job to a queue (Redis, RabbitMQ, SQS) and return immediately. A worker processes the job asynchronously.
  2. Classify errors before retrying. Use the HTTP status decision table above. Only retry on 429 and 5xx. Permanent errors (400, 401, 403, 404, 410, 413) should never be retried - they won't resolve themselves.
  3. Apply exponential backoff with jitter. For 429 and 5xx, retry after 1s, 4s, 16s, 64s. Add random jitter to avoid thundering-herd problems when a push service recovers from an outage and all your workers retry simultaneously.
  4. Cap retries and move to a dead letter queue (DLQ). After 3–5 attempts, stop retrying and route to a DLQ. The DLQ is not a failure state - it's an observable, auditable record of notifications that could not be delivered. Set up alerting on DLQ depth.
  5. Process the DLQ with human oversight. For high-priority notification types (security alerts, transactional events), have an ops workflow to review DLQ items and decide whether to retry, escalate to email, or discard.
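Steps 2 to 4 can be sketched as a single worker function. This is a minimal illustration: `send`, `sleep`, and `deadLetter` are injected placeholders for your push client, timer, and queue, `send` is assumed to resolve with the HTTP status, and the 25% jitter ratio is an arbitrary choice for the example.

```javascript
// Delay before retry number `attempt` (0-based): 1s, 4s, 16s, 64s with the defaults,
// stretched by up to `jitterRatio` so workers do not retry in lockstep.
function backoffDelayMs(attempt, { baseMs = 1000, factor = 4, jitterRatio = 0.25, random = Math.random } = {}) {
  const base = baseMs * factor ** attempt;
  return Math.floor(base * (1 + jitterRatio * random()));
}

// Sends one job, retrying only transient failures, then dead-letters it.
async function deliverWithRetry(job, { send, sleep, deadLetter, maxAttempts = 4 }) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    let status = null;
    try {
      status = await send(job);
    } catch (err) {
      // Network failure before any HTTP status: treated as transient.
    }
    if (status !== null && status >= 200 && status < 300) return 'delivered';

    const transient = status === null || status === 429 || status >= 500;
    if (!transient) return 'failed_permanent';

    // No point sleeping after the final attempt.
    if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
  }
  await deadLetter(job);
  return 'dead_lettered';
}
```

Injecting `sleep` and `random` keeps the retry schedule testable without real timers, and a worker built on a queue with delayed redelivery can swap `sleep` for a re-enqueue with the same delay.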

A common question is whether to implement at-least-once or exactly-once delivery semantics. In practice, web push does not support exactly-once delivery - the push protocol itself provides no deduplication at the service layer. Design your notifications to be idempotent where possible: if a user receives the same notification twice due to a retry, it should not cause data inconsistency or a confusing experience.
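Two small pieces can help here, sketched below under assumptions: a ledger that records which (notification, endpoint) pairs have already been accepted, so a redelivered queue job does not send again, and a notification `tag`, which is the standard Notification option that makes a later notification with the same tag replace the earlier one. The in-memory `Set`, the payload shape, and the function names are illustrative only.

```javascript
// Tracks accepted sends. In production this would live in a shared store
// with a TTL rather than in process memory.
function createDeliveryLedger() {
  const delivered = new Set();
  const keyFor = (notificationId, endpoint) => JSON.stringify([notificationId, endpoint]);
  return {
    alreadyDelivered: (notificationId, endpoint) => delivered.has(keyFor(notificationId, endpoint)),
    markDelivered: (notificationId, endpoint) => { delivered.add(keyFor(notificationId, endpoint)); },
  };
}

// Builds the options object a service worker could pass to showNotification.
// Using the notification ID as the tag collapses duplicates on the device.
function toNotificationOptions(payload) {
  return {
    body: payload.body,
    tag: payload.notificationId,
    data: { notificationId: payload.notificationId },
  };
}
```

The ledger is marked only after the push service accepts the message, so a retry of a failed attempt is never blocked by its own earlier try.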

Observability: What to Monitor in a Production Push System

You cannot improve what you cannot measure. Most teams instrument their push system for the happy path only - they count sends but not failures, and they discover delivery problems through user complaints rather than dashboards.

A production push system should track these metrics at minimum:

Metric | What It Tells You | Alert Threshold
--- | --- | ---
Delivery success rate (per browser) | Overall health of your push pipeline | Alert if below 95% over a 5-minute window
410 error rate | Subscription churn / stale subscription accumulation | Alert if above 5% of sends - may indicate bulk unsubscription or unhandled endpoint rotation
401/403 error rate | VAPID authentication problems | Alert on any sustained occurrence - this typically indicates a key misconfiguration
429 rate | Push service rate limiting | Alert if above 1% - review your send rate and batching strategy
DLQ depth | Undeliverable notification backlog | Alert if growing - indicates a systemic delivery problem
Active subscription count (by browser) | Subscriber base health | Alert on sudden drops greater than 10% - may indicate a bug in your unsubscription logic
Worker queue depth | Processing backlog / throughput problems | Alert if queue is not draining within your SLA window
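The rate-based thresholds in the table can be expressed as one evaluation function over a window of response counts. The sketch below is illustrative: the input shape (counts keyed by HTTP status) and the alert names are assumptions, and "sustained" is left to the alerting layer that calls it on each window.

```javascript
// statusCounts: push-service responses in one evaluation window, keyed by HTTP status.
function evaluateAlerts(statusCounts) {
  const total = Object.values(statusCounts).reduce((sum, n) => sum + n, 0);
  if (total === 0) return [];

  // Share of all sends in the window that ended with any of the given codes.
  const rate = (...codes) =>
    codes.reduce((sum, code) => sum + (statusCounts[code] ?? 0), 0) / total;

  const alerts = [];
  if (rate(200, 201, 202) < 0.95) alerts.push('delivery_success_below_95pct');
  if (rate(410) > 0.05) alerts.push('gone_rate_above_5pct');
  if (rate(401, 403) > 0) alerts.push('vapid_auth_errors');
  if (rate(429) > 0.01) alerts.push('rate_limited_above_1pct');
  return alerts;
}
```

Running the same function per browser or per push service gives the per-browser breakdown the first row of the table calls for.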

Per-notification delivery logs are equally important. Each log record should capture: notification ID, user ID, subscription endpoint (hashed for privacy), timestamp, HTTP status received, number of attempts, and final outcome (delivered, failed-permanent, failed-queued). This enables root-cause analysis for any specific notification failure and supports compliance audit trails.

For the observability infrastructure itself, a time-series database (InfluxDB, Prometheus, or CloudWatch Metrics) paired with a dashboarding layer (Grafana or Datadog) gives you the operational visibility this system requires. Log aggregation (Elasticsearch, Loki) handles per-notification audit trails.

The Build-vs-Buy Inflection Point

Everything covered in this guide represents engineering work that must be built, maintained, and scaled by someone. The question is whether that someone should be your team.

Building a production-grade web push system in-house is viable. It is also substantial. Industry estimates put the engineering effort for a multi-channel notification system at 6–12 months for a 3-person team, with at least one engineer dedicated to ongoing maintenance. That estimate is for the full system - but even the web-push-only surface covered here is a 4–8 week project when you account for subscription lifecycle handling, error classification, retry infrastructure, cross-browser testing, observability, and VAPID key management.

The build-vs-buy calculus shifts depending on where you are:

  • Early-stage team (pre-Series A): The opportunity cost of building notification infrastructure is very high. Every week spent on push delivery plumbing is a week not spent on your core product. Buy.
  • Growth-stage team (Series A/B): If you've already built a basic push implementation and are hitting the production failures described here, rebuilding the plumbing is a distraction from adding channels, improving engagement, and scaling. Buy or migrate.
  • Mid-market / enterprise: If your team has specific compliance requirements, white-labeling needs, or extremely high volume that makes per-notification pricing prohibitive at scale, an in-house system may be justified - but evaluate total cost of ownership honestly.

For teams evaluating notification infrastructure platforms, SuprSend handles the full web push production surface out of the box: subscription lifecycle management, per-notification delivery logs with step-by-step observability, cross-browser compatibility, automatic handling of 410/404 responses, and a multi-channel workflow engine that routes across web push, email, SMS, in-app, and more from a single API. You can explore the notification service architecture guide for a broader view of how these systems fit together, or see how SuprSend compares in the push notification platform comparison.

The goal is not to avoid understanding how web push works - the engineering depth in this guide matters whether you build or buy. It's to make a clear-eyed decision about where your team's leverage is highest.

FAQ

What causes a web push subscription to become invalid?

A subscription becomes invalid when the user revokes notification permission, clears browser data, or when the browser vendor's push service retires the endpoint. Browsers also silently rotate subscription endpoints under certain conditions, firing a pushsubscriptionchange event. If you don't handle that event, your stored endpoint becomes stale and will eventually return a 410 Gone error.

What should I do when I receive a 410 Gone response from a push service?

Delete or deactivate that subscription in your database immediately. A 410 means the subscription no longer exists at the push service - the user has unsubscribed or the endpoint has expired. Retrying will not resolve the error. Do not conflate 410 with 500-series errors, which are transient and should be retried with exponential backoff.

How do I rotate VAPID keys without breaking existing push subscriptions?

VAPID key rotation requires a phased migration. Deploy the new public key to your frontend so new subscriptions use it. Tag subscriptions in your database with the key version used at creation time. Route push requests using the correct private key per subscription version. Then gradually re-subscribe existing users on their next active session to migrate them to the new key pair. Only retire the old key once its associated subscriptions are empty.

Do web push notifications work on iOS?

Yes, but with restrictions. Safari on iOS 16.4 and later supports web push - but only for PWAs installed to the home screen. Push notifications do not work inside the Safari browser tab. The permission prompt must be triggered by a user gesture, not on page load. Apple announced during the iOS 17.4 beta that it would remove Home Screen web apps in the EU in response to the Digital Markets Act, then reversed that decision before release; detect standalone mode to serve the appropriate onboarding flow, and test on EU devices rather than assuming a regional restriction.

What is the maximum payload size for a web push notification?

Chrome and Firefox support up to 4KB. Safari on macOS and iOS supports up to 2KB. If your content exceeds these limits, send a minimal payload (notification ID and type) and fetch the full content from your server when the notification is clicked or when the service worker processes the push event. This pattern is sometimes called a "thin push."
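A server-side sketch of the thin push fallback is shown below. The notification shape (`id`, `type`, plus arbitrary content fields), the `thin` flag, and the function name are assumptions for this example. The limit is a parameter because it differs by browser, and because encryption adds some overhead a value below the nominal limit may be safer.

```javascript
// Returns the JSON payload to send: the full notification when it fits,
// otherwise a minimal payload the service worker can use to fetch the rest.
function buildPushPayload(notification, maxBytes = 4096) {
  const full = JSON.stringify(notification);
  // Measure UTF-8 bytes, not string length: non-ASCII text is larger than it looks.
  if (new TextEncoder().encode(full).length <= maxBytes) {
    return full;
  }
  return JSON.stringify({ id: notification.id, type: notification.type, thin: true });
}
```

On the receiving side, the service worker would check the `thin` flag in its push handler and fetch the full content by ID before showing the notification.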

How should I handle rate limiting (429) from a push service?

Respect the Retry-After header in the response if it is present, and otherwise retry with exponential backoff starting at 1 second. Do not retry immediately - you will hit the rate limit again. If you consistently hit 429 at scale, review your send concurrency and consider batching or rate-limiting your own worker throughput before it reaches the push service.
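Retry-After can carry either a number of seconds or an HTTP date, so it needs a small parser before it can drive a delay. The sketch below is one way to do that; the function name and the choice to return `null` for a missing or unparseable header are assumptions.

```javascript
// Returns the wait in milliseconds, or null when the header is absent or unparseable.
function parseRetryAfter(headerValue, nowMs = Date.now()) {
  if (headerValue == null || headerValue.trim() === '') return null;
  const trimmed = headerValue.trim();

  // Form 1: delay in whole seconds, e.g. "120".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}
```

A caller would use the parsed value when it is not `null` and fall back to its own exponential backoff otherwise.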

Is exactly-once delivery possible with web push?

No. The web push protocol does not provide delivery deduplication at the push service layer. You can implement idempotency on your own backend - for example, by tracking notification IDs per subscription - but you cannot prevent a push service from delivering a message twice after a network retry. Design your notifications to be idempotent: displaying the same notification twice should not cause inconsistent state or a confusing user experience.

How many push subscriptions can a single user have?

Each browser profile and device creates its own subscription. A user who uses Chrome on desktop, Safari on iPhone, and Firefox on a work laptop will have up to three active subscriptions. Your database schema should model subscriptions as a one-to-many relationship from user to subscriptions, not one-to-one. Fan-out logic (sending to all active subscriptions for a user) needs to be handled explicitly in your worker layer.
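Fan-out can be sketched as a function that turns one notification into one queue job per active subscription. The job shape and field names below are hypothetical.

```javascript
// Expands one notification into one delivery job per active subscription.
function fanOut(notification, subscriptions) {
  return subscriptions
    .filter((sub) => sub.isActive)
    .map((sub) => ({
      notificationId: notification.id,
      userId: sub.userId,
      endpoint: sub.endpoint,
      payload: notification.payload,
      attempts: 0,
    }));
}
```

Keeping one job per endpoint means a failure on one device is retried or dead-lettered without affecting delivery to the user's other devices.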

TL;DR

Web push notifications break in production for five predictable reasons: stale subscriptions not being cleaned up on 410 errors, the pushsubscriptionchange event being ignored, VAPID key rotation done without a migration strategy, missing per-browser handling for Safari on iOS, and no retry/dead letter queue infrastructure for transient failures. Build your subscription schema for mutability, handle every HTTP status code from the push service with a deliberate action, add a vapid_key_version column before you ever need to rotate, treat the iOS PWA context as a distinct subscription surface, and instrument your system for delivery rate, 410 rate, and DLQ depth before you launch.

Start building for free on SuprSend, or book a demo to see how the platform handles the production complexity described here.

Written by:
Gaurav Verma
Co-Founder, SuprSend

