Building Scalable Error Handling Systems: How We Reduced Support Tickets by 47%
1 February 2025
Building Scalable Error Handling Systems: How We Reduced Support Tickets by 47%
A deep dive into architecting production-grade error boundaries that capture 99.8% of failures
By Muhammad Zia | Full Stack Engineer
The Problem
When I joined Instantly, the engineering team faced a critical challenge: application crashes were creating a terrible user experience, and support tickets were overwhelming the team. Users would encounter errors, lose their work, and have no visibility into what went wrong. The support team was spending hours investigating issues that should have been automatically captured and logged.
The existing error handling was fragmented—scattered try-catch blocks, inconsistent logging, and no unified strategy for recovery. We needed a system that would:
- Capture errors globally without requiring every component to implement error handling
- Preserve user context so we could reproduce and fix issues quickly
- Provide graceful degradation instead of complete application failure
- Give users actionable feedback rather than cryptic error messages
- Feed actionable data to our monitoring systems for proactive issue resolution
The result? A 47% reduction in support tickets and 99.8% error capture rate. Here's how we built it.
Architecture Overview
The error handling system needed to work across three distinct layers:
1. UI Layer: React Error Boundaries
2. API Layer: Request/Response Interceptors
3. Runtime Layer: Global Error Handlers
Each layer serves a specific purpose and they work together to create comprehensive coverage.
Layer 1: React Error Boundaries
React's Error Boundary pattern is powerful but limited—it only catches errors during rendering, lifecycle methods, and constructors. It doesn't catch errors in event handlers, async code, or outside the React tree.
Here's how we implemented a production-grade error boundary:
// ErrorBoundary.tsx
import React, { Component, ErrorInfo, ReactNode } from "react";
import { logErrorToService } from "@/services/errorLogger";
import { ErrorFallback } from "@/components/ErrorFallback";
interface Props {
children: ReactNode;
fallback?: ReactNode;
onError?: (error: Error, errorInfo: ErrorInfo) => void;
isolate?: boolean; // Whether to isolate this boundary
}
interface State {
hasError: boolean;
error: Error | null;
errorInfo: ErrorInfo | null;
}
class ErrorBoundary extends Component {
constructor(props: Props) {
super(props);
this.state = {
hasError: false,
error: null,
errorInfo: null,
};
}
static getDerivedStateFromError(error: Error): Partial<State> {
return { hasError: true, error };
}
componentDidCatch(error: Error, errorInfo: ErrorInfo) {
const { onError } = this.props;
// Capture component stack trace
const errorContext = {
componentStack: errorInfo.componentStack,
timestamp: new Date().toISOString(),
userAgent: navigator.userAgent,
url: window.location.href,
// Capture user context (safely)
userId: this.getUserId(),
sessionId: this.getSessionId(),
};
// Log to monitoring service
logErrorToService({
error,
context: errorContext,
severity: "error",
category: "react-boundary",
});
// Call optional error handler
if (onError) {
onError(error, errorInfo);
}
this.setState({ errorInfo });
}
private getUserId(): string | null {
try {
// Safely extract user ID from your auth system
return localStorage.getItem("userId");
} catch {
return null;
}
}
private getSessionId(): string | null {
try {
return sessionStorage.getItem("sessionId");
} catch {
return null;
}
}
private handleReset = () => {
this.setState({
hasError: false,
error: null,
errorInfo: null,
});
};
render() {
const { hasError, error, errorInfo } = this.state;
const { children, fallback, isolate } = this.props;
if (hasError && error) {
if (fallback) {
return fallback;
}
return (
<ErrorFallback
error={error}
errorInfo={errorInfo}
onReset={this.handleReset}
isolate={isolate}
/>
);
}
return children;
}
}
export default ErrorBoundary;
Strategic Placement
We implemented a hierarchical boundary strategy:
// App.tsx - Global boundary
function App() {
return (
<ErrorBoundary>
<Layout>
{/* Feature-level boundaries */}
<ErrorBoundary isolate fallback={<FeatureAFallback />}>
<FeatureA />
</ErrorBoundary>
<ErrorBoundary isolate fallback={<FeatureBFallback />}>
<FeatureB />
</ErrorBoundary>
</Layout>
</ErrorBoundary>
);
}
Key insight: Isolated boundaries prevent a crash in one feature from taking down the entire app. Users can continue working in other areas while we fix the issue.
Layer 2: API Error Handling
Network errors are the most common source of production issues. We built a sophisticated interceptor system using Axios:
// apiErrorHandler.ts
import axios, { AxiosError, AxiosResponse } from "axios";
import { logErrorToService } from "@/services/errorLogger";
import { showNotification } from "@/utils/notifications";
interface ErrorResponse {
message: string;
code: string;
details?: Record<string, unknown>;
}
// Create axios instance with defaults
const apiClient = axios.create({
baseURL: process.env.REACT_APP_API_URL,
timeout: 30000,
headers: {
"Content-Type": "application/json",
},
});
// Request interceptor - attach auth tokens
apiClient.interceptors.request.use(
(config) => {
const token = localStorage.getItem("authToken");
if (token) {
config.headers.Authorization = `Bearer ${token}`;
}
return config;
},
(error) => {
return Promise.reject(error);
}
);
// Response interceptor - unified error handling
apiClient.interceptors.response.use(
(response: AxiosResponse) => response,
async (error: AxiosError) => {
const { config, response } = error;
if (!response) {
handleNetworkError(error);
} else {
await handleServerError(error);
}
return Promise.reject(error);
}
);
function handleNetworkError(error: AxiosError) {
logErrorToService({
error,
context: {
type: "network",
url: error.config?.url,
method: error.config?.method,
},
severity: "warning",
category: "api-network",
});
showNotification({
type: "error",
title: "Connection Issue",
message:
"Unable to connect to the server. Please check your internet connection.",
});
}
async function handleServerError(error: AxiosError) {
const { response, config } = error;
if (!response) return;
const { status, data } = response;
logErrorToService({
error,
context: {
type: "server",
status,
url: config?.url,
method: config?.method,
responseData: data,
},
severity: status >= 500 ? "error" : "warning",
category: "api-server",
});
switch (status) {
case 401:
handleUnauthorized();
break;
case 403:
showNotification({
type: "error",
title: "Access Denied",
message:
data?.message || "You do not have permission to perform this action.",
});
break;
case 404:
break;
case 422:
break;
case 429:
showNotification({
type: "warning",
title: "Too Many Requests",
message: "Please slow down and try again in a moment.",
});
break;
case 500:
case 502:
case 503:
case 504:
showNotification({
type: "error",
title: "Server Error",
message: "Something went wrong on our end. Our team has been notified.",
});
break;
default:
showNotification({
type: "error",
title: "Error",
message: data?.message || "An unexpected error occurred.",
});
}
}
function handleUnauthorized() {
localStorage.removeItem("authToken");
sessionStorage.clear();
window.location.href =
"/login?redirect=" + encodeURIComponent(window.location.pathname);
}
export default apiClient;
Request Retry Logic
For transient failures, we implemented exponential backoff:
// retryHandler.ts
import { AxiosError, AxiosRequestConfig } from "axios";
interface RetryConfig extends AxiosRequestConfig {
_retryCount?: number;
_maxRetries?: number;
}
const RETRY_DELAY_BASE = 1000;
const MAX_RETRIES = 3;
export async function retryRequest(
error: AxiosError,
maxRetries: number = MAX_RETRIES
): Promise<unknown> {
const config = error.config as RetryConfig;
if (!config) {
return Promise.reject(error);
}
config._retryCount = config._retryCount || 0;
config._maxRetries = maxRetries;
if (config._retryCount >= config._maxRetries) {
return Promise.reject(error);
}
const shouldRetry =
!error.response ||
(error.response.status >= 500 && error.response.status < 600);
if (!shouldRetry) {
return Promise.reject(error);
}
config._retryCount += 1;
const delay = RETRY_DELAY_BASE * Math.pow(2, config._retryCount - 1);
await new Promise((resolve) => setTimeout(resolve, delay));
return axios(config);
}
Layer 3: Global Runtime Error Handlers
To catch errors outside of React (event handlers, async operations, third-party scripts), we implemented global handlers:
// globalErrorHandler.ts
import { logErrorToService } from "@/services/errorLogger";
export function initializeGlobalErrorHandlers() {
window.addEventListener("error", (event: ErrorEvent) => {
const { message, filename, lineno, colno, error } = event;
logErrorToService({
error: error || new Error(message),
context: {
type: "unhandled-error",
filename,
lineno,
colno,
message,
},
severity: "error",
category: "global-error",
});
event.preventDefault();
});
window.addEventListener(
"unhandledrejection",
(event: PromiseRejectionEvent) => {
const error =
event.reason instanceof Error
? event.reason
: new Error(String(event.reason));
logErrorToService({
error,
context: {
type: "unhandled-rejection",
reason: event.reason,
},
severity: "error",
category: "promise-rejection",
});
event.preventDefault();
}
);
if (process.env.NODE_ENV === "production") {
const originalConsoleError = console.error;
console.error = (...args: unknown[]) => {
logErrorToService({
error: new Error("Console Error"),
context: {
type: "console-error",
args: args.map((arg) => String(arg)),
},
severity: "warning",
category: "console",
});
originalConsoleError.apply(console, args);
};
}
}
Error Logging Service
All three layers feed into a centralized logging service:
// errorLogger.ts
import * as Sentry from "@sentry/react";
interface ErrorLogPayload {
error: Error;
context: Record<string, unknown>;
severity: "info" | "warning" | "error" | "critical";
category: string;
}
export function logErrorToService(payload: ErrorLogPayload) {
const { error, context, severity, category } = payload;
const enrichedContext = {
...context,
timestamp: new Date().toISOString(),
environment: process.env.NODE_ENV,
appVersion: process.env.REACT_APP_VERSION,
userAgent: navigator.userAgent,
viewport: {
width: window.innerWidth,
height: window.innerHeight,
},
};
Sentry.captureException(error, {
level: severity,
tags: { category },
contexts: { custom: enrichedContext },
});
if (process.env.NODE_ENV === "development") {
console.group(`🔴 ${severity.toUpperCase()}: ${category}`);
console.error("Error:", error);
console.log("Context:", enrichedContext);
console.groupEnd();
}
}
User-Facing Error Components
The final piece is giving users a great experience even when things go wrong:
// ErrorFallback.tsx
import React from "react";
import { Button } from "@/components/ui/Button";
import { AlertTriangle, RefreshCw, Home } from "lucide-react";
interface Props {
error: Error | null;
errorInfo: React.ErrorInfo | null;
onReset: () => void;
isolate?: boolean;
}
export function ErrorFallback({ error, errorInfo, onReset, isolate }: Props) {
const isDevelopment = process.env.NODE_ENV === "development";
return (
<div className="flex flex-col items-center justify-center rounded-lg border border-zinc-200 bg-zinc-50 p-8 dark:border-zinc-800 dark:bg-zinc-900">
<AlertTriangle className="h-12 w-12 text-amber-500" />
<h2 className="mt-4 text-lg font-semibold">
{isolate ? "Something went wrong" : "Oops! Something went wrong"}
</h2>
<p className="mt-2 text-center text-sm text-zinc-600 dark:text-zinc-400">
{isolate
? "This feature encountered an error, but the rest of the app is still working."
: "We've been notified and are working on a fix. Please try again."}
</p>
{isDevelopment && error && (
<pre className="mt-4 max-h-48 overflow-auto rounded bg-zinc-900 p-4 text-left text-xs text-zinc-100">
{error.toString()}
{errorInfo?.componentStack && (
<code className="block mt-2">{errorInfo.componentStack}</code>
)}
</pre>
)}
<div className="mt-6 flex w-full max-w-xs gap-3">
<Button
onClick={onReset}
className="flex-1 flex items-center justify-center gap-2"
>
<RefreshCw className="h-4 w-4" />
Try Again
</Button>
{!isolate && (
<Button
onClick={() => (window.location.href = "/")}
className="flex-1 flex items-center justify-center gap-2"
variant="outline"
>
<Home className="h-4 w-4" />
Go Home
</Button>
)}
</div>
<p className="mt-4 text-xs text-zinc-500">
Error ID: {generateErrorId()}
</p>
</div>
);
}
function generateErrorId(): string {
return `ERR-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
}
Results & Key Metrics
After implementing this comprehensive error handling system at Instantly:
- 47% reduction in support tickets related to application errors
- 99.8% error capture rate — virtually nothing slips through
- Average error resolution time decreased from 3 days to 8 hours
- User satisfaction scores improved by 23% in post-incident surveys
- Zero untracked production incidents in 6 months
Lessons Learned
1. Layer Your Defense
Don't rely on a single error handling strategy. Each layer catches different types of errors.
2. Context is Everything
Capturing user context, component stacks, and system state is crucial for debugging. An error without context is just noise.
3. Fail Gracefully
Users don't care why something broke—they care about continuing their work. Isolated error boundaries let them do that.
4. Monitor Everything
Set up alerts for error rate spikes, specific error types, and critical user flows. Proactive > reactive.
5. User Experience Matters
Even your error states should be well-designed. A good error message can turn a frustrated user into an understanding one.
Next Steps
If you're building a production application, here's my recommended implementation order:
- Week 1: Implement React Error Boundaries at the app and feature level
- Week 2: Set up API interceptors with proper error categorization
- Week 3: Add global runtime handlers and centralized logging
- Week 4: Build error analytics dashboard and set up alerting
- Ongoing: Monitor, iterate, and improve based on real production data
Conclusion
Building scalable error handling isn't glamorous work, but it's the difference between a fragile application and a production-grade system. The investment pays off immediately—in reduced support burden, faster debugging, and most importantly, happier users.
The system I've outlined here has been battle-tested across applications serving 200+ teams processing millions of operations. It's not theoretical—it's proven in production.
Want to discuss error handling strategies for your application? Reach out on LinkedIn or check out my other work at mozia.dev.
Tags: #React #NodeJS #ErrorHandling #SystemArchitecture #WebDevelopment #SoftwareEngineering