Building Scalable Error Handling Systems: How We Reduced Support Tickets by 47%

1 February 2025

A deep dive into architecting production-grade error boundaries that capture 99.8% of failures

By Muhammad Zia | Full Stack Engineer

The Problem

When I joined Instantly, the engineering team faced a critical challenge: application crashes were creating a terrible user experience, and support tickets were overwhelming the team. Users would encounter errors, lose their work, and have no visibility into what went wrong. The support team was spending hours investigating issues that should have been automatically captured and logged.

The existing error handling was fragmented—scattered try-catch blocks, inconsistent logging, and no unified strategy for recovery. We needed a system that would:

Capture errors globally without requiring every component to implement error handling
Preserve user context so we could reproduce and fix issues quickly
Provide graceful degradation instead of complete application failure
Give users actionable feedback rather than cryptic error messages
Feed actionable data to our monitoring systems for proactive issue resolution

The result? A 47% reduction in support tickets and a 99.8% error capture rate. Here's how we built it.

Architecture Overview

The error handling system needed to work across three distinct layers:

1. UI Layer: React Error Boundaries

2. API Layer: Request/Response Interceptors

3. Runtime Layer: Global Error Handlers

Each layer serves a specific purpose and they work together to create comprehensive coverage.

Layer 1: React Error Boundaries

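React's Error Boundary pattern is powerful but limited: a boundary only catches errors thrown during rendering, in lifecycle methods, and in constructors of the components below it. It does not catch errors in event handlers, in asynchronous code (such as setTimeout callbacks or promise chains), or in code running outside the React tree.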
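Because event handlers and async callbacks fall outside a boundary's reach, one option is to wrap them so their failures reach the same logger. The following is a minimal sketch, not the implementation used at Instantly: `withErrorReporting` and `Reporter` are hypothetical names, and the `report` callback stands in for whatever logging function the application uses.

```typescript
// Hypothetical helper: not part of React or of the codebase described here.
type Reporter = (error: Error, context: Record<string, unknown>) => void;

// Wraps a handler so that synchronous throws and rejected promises are
// reported, then re-thrown (or left rejected) for the caller to handle.
function withErrorReporting<Args extends unknown[], R>(
  handler: (...args: Args) => R,
  report: Reporter,
  handlerName: string
): (...args: Args) => R {
  const toError = (e: unknown): Error =>
    e instanceof Error ? e : new Error(String(e));

  return (...args: Args): R => {
    try {
      const result = handler(...args);
      if (result instanceof Promise) {
        // Async handler: report the rejection, then keep the promise rejected
        return result.catch((e: unknown) => {
          report(toError(e), { handlerName, async: true });
          throw e;
        }) as unknown as R;
      }
      return result;
    } catch (e) {
      // Sync handler: report, then re-throw so behavior is unchanged
      report(toError(e), { handlerName, async: false });
      throw e;
    }
  };
}
```

In a component this could be used as `onClick={withErrorReporting(handleSave, report, "handleSave")}`. Re-throwing keeps the wrapper transparent, so any global handlers still see the error as well.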

Here's how we implemented a production-grade error boundary:

// ErrorBoundary.tsx
import React, { Component, ErrorInfo, ReactNode } from "react";
import { logErrorToService } from "@/services/errorLogger";
import { ErrorFallback } from "@/components/ErrorFallback";

interface Props {
  children: ReactNode;
  fallback?: ReactNode;
  onError?: (error: Error, errorInfo: ErrorInfo) => void;
  isolate?: boolean; // Whether to isolate this boundary
}

interface State {
  hasError: boolean;
  error: Error | null;
  errorInfo: ErrorInfo | null;
}

class ErrorBoundary extends Component<Props, State> {
  constructor(props: Props) {
    super(props);
    this.state = {
      hasError: false,
      error: null,
      errorInfo: null,
    };
  }

  static getDerivedStateFromError(error: Error): Partial<State> {
    return { hasError: true, error };
  }

  componentDidCatch(error: Error, errorInfo: ErrorInfo) {
    const { onError } = this.props;

    // Capture component stack trace
    const errorContext = {
      componentStack: errorInfo.componentStack,
      timestamp: new Date().toISOString(),
      userAgent: navigator.userAgent,
      url: window.location.href,
      // Capture user context (safely)
      userId: this.getUserId(),
      sessionId: this.getSessionId(),
    };

    // Log to monitoring service
    logErrorToService({
      error,
      context: errorContext,
      severity: "error",
      category: "react-boundary",
    });

    // Call optional error handler
    if (onError) {
      onError(error, errorInfo);
    }

    this.setState({ errorInfo });
  }

  private getUserId(): string | null {
    try {
      // Safely extract user ID from your auth system
      return localStorage.getItem("userId");
    } catch {
      return null;
    }
  }

  private getSessionId(): string | null {
    try {
      return sessionStorage.getItem("sessionId");
    } catch {
      return null;
    }
  }

  private handleReset = () => {
    this.setState({
      hasError: false,
      error: null,
      errorInfo: null,
    });
  };

  render() {
    const { hasError, error, errorInfo } = this.state;
    const { children, fallback, isolate } = this.props;

    if (hasError && error) {
      if (fallback) {
        return fallback;
      }

      return (
        <ErrorFallback
          error={error}
          errorInfo={errorInfo}
          onReset={this.handleReset}
          isolate={isolate}
        />
      );
    }

    return children;
  }
}

export default ErrorBoundary;

Strategic Placement

We implemented a hierarchical boundary strategy:

// App.tsx - Global boundary
function App() {
  return (
    <ErrorBoundary>
      <Layout>
        {/* Feature-level boundaries */}
        <ErrorBoundary isolate fallback={<FeatureAFallback />}>
          <FeatureA />
        </ErrorBoundary>
        <ErrorBoundary isolate fallback={<FeatureBFallback />}>
          <FeatureB />
        </ErrorBoundary>
      </Layout>
    </ErrorBoundary>
  );
}

Key insight: Isolated boundaries prevent a crash in one feature from taking down the entire app. Users can continue working in other areas while we fix the issue.

Layer 2: API Error Handling

In our experience, network errors are the most common source of production issues. We handle them in one place with an interceptor layer built on Axios:

// apiErrorHandler.ts
import axios, { AxiosError, AxiosResponse } from "axios";
import { logErrorToService } from "@/services/errorLogger";
import { showNotification } from "@/utils/notifications";

interface ErrorResponse {
  message: string;
  code: string;
  details?: Record<string, unknown>;
}

// Create axios instance with defaults
const apiClient = axios.create({
  baseURL: process.env.REACT_APP_API_URL,
  timeout: 30000,
  headers: {
    "Content-Type": "application/json",
  },
});

// Request interceptor - attach auth tokens
apiClient.interceptors.request.use(
  (config) => {
    const token = localStorage.getItem("authToken");
    if (token) {
      config.headers.Authorization = `Bearer ${token}`;
    }
    return config;
  },
  (error) => {
    return Promise.reject(error);
  }
);

// Response interceptor - unified error handling
apiClient.interceptors.response.use(
  (response: AxiosResponse) => response,
  async (error: AxiosError) => {
    const { response } = error;

    if (!response) {
      handleNetworkError(error);
    } else {
      await handleServerError(error);
    }

    return Promise.reject(error);
  }
);

function handleNetworkError(error: AxiosError) {
  logErrorToService({
    error,
    context: {
      type: "network",
      url: error.config?.url,
      method: error.config?.method,
    },
    severity: "warning",
    category: "api-network",
  });

  showNotification({
    type: "error",
    title: "Connection Issue",
    message:
      "Unable to connect to the server. Please check your internet connection.",
  });
}

async function handleServerError(error: AxiosError) {
  const { response, config } = error;

  if (!response) return;

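  const { status } = response;
  // Treat the body as our ErrorResponse shape; it may be missing or partial
  const data = response.data as Partial<ErrorResponse> | undefined;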

  logErrorToService({
    error,
    context: {
      type: "server",
      status,
      url: config?.url,
      method: config?.method,
      responseData: data,
    },
    severity: status >= 500 ? "error" : "warning",
    category: "api-server",
  });

  switch (status) {
    case 401:
      handleUnauthorized();
      break;

    case 403:
      showNotification({
        type: "error",
        title: "Access Denied",
        message:
          data?.message || "You do not have permission to perform this action.",
      });
      break;

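    case 404:
      // Not found: no global notification; the rejected error reaches the caller
      break;

    case 422:
      // Validation error: no global notification; the rejected error reaches the caller
      break;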

    case 429:
      showNotification({
        type: "warning",
        title: "Too Many Requests",
        message: "Please slow down and try again in a moment.",
      });
      break;

    case 500:
    case 502:
    case 503:
    case 504:
      showNotification({
        type: "error",
        title: "Server Error",
        message: "Something went wrong on our end. Our team has been notified.",
      });
      break;

    default:
      showNotification({
        type: "error",
        title: "Error",
        message: data?.message || "An unexpected error occurred.",
      });
  }
}

function handleUnauthorized() {
  localStorage.removeItem("authToken");
  sessionStorage.clear();
  window.location.href =
    "/login?redirect=" + encodeURIComponent(window.location.pathname);
}

export default apiClient;
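The status handling above can also be expressed as a pure function, which makes the policy easy to unit test without a network. This is a sketch under assumptions: `classifyStatus`, `StatusPolicy`, and the action names are hypothetical and only mirror the switch statement shown in this article.

```typescript
// Hypothetical names; the rules mirror the switch statement in apiErrorHandler.ts.
type ErrorAction = "reauthenticate" | "notify" | "defer-to-caller";

interface StatusPolicy {
  action: ErrorAction;
  severity: "warning" | "error";
  retryable: boolean;
}

function classifyStatus(status: number): StatusPolicy {
  // Same rule as the logging call: 5xx is "error", everything else "warning"
  const severity = status >= 500 ? "error" : "warning";
  // Same rule as the retry handler in this article: only 5xx responses retry
  const retryable = status >= 500 && status < 600;

  if (status === 401) {
    return { action: "reauthenticate", severity, retryable };
  }
  // 404 and 422 show no global notification in the switch statement
  if (status === 404 || status === 422) {
    return { action: "defer-to-caller", severity, retryable };
  }
  return { action: "notify", severity, retryable };
}
```

Keeping the decision separate from side effects such as notifications and redirects means the interceptor only has to act on the returned policy.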

Request Retry Logic

For transient failures, we implemented exponential backoff:

// retryHandler.ts
import axios, { AxiosError, AxiosRequestConfig } from "axios";

interface RetryConfig extends AxiosRequestConfig {
  _retryCount?: number;
  _maxRetries?: number;
}

const RETRY_DELAY_BASE = 1000;
const MAX_RETRIES = 3;

export async function retryRequest(
  error: AxiosError,
  maxRetries: number = MAX_RETRIES
): Promise<unknown> {
  const config = error.config as RetryConfig;

  if (!config) {
    return Promise.reject(error);
  }

  config._retryCount = config._retryCount || 0;
  config._maxRetries = maxRetries;

  if (config._retryCount >= config._maxRetries) {
    return Promise.reject(error);
  }

  const shouldRetry =
    !error.response ||
    (error.response.status >= 500 && error.response.status < 600);

  if (!shouldRetry) {
    return Promise.reject(error);
  }

  config._retryCount += 1;
  const delay = RETRY_DELAY_BASE * Math.pow(2, config._retryCount - 1);
  await new Promise((resolve) => setTimeout(resolve, delay));

  return axios(config);
}
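To make the backoff schedule concrete, the delay formula from `retryRequest` can be pulled out into small pure functions. This is an illustrative sketch: `backoffDelay`, `backoffDelayWithJitter`, and `backoffSchedule` are hypothetical helpers, and the jitter variant is an assumption that the handler above does not include.

```typescript
const RETRY_DELAY_BASE = 1000; // ms, same base as retryHandler.ts
const MAX_RETRIES = 3;

// Delay before the given retry attempt (1-based): base * 2^(attempt - 1)
function backoffDelay(attempt: number, base: number = RETRY_DELAY_BASE): number {
  return base * Math.pow(2, attempt - 1);
}

// Assumption, not in the original handler: "full jitter" picks a random
// delay in [0, backoffDelay) so many clients do not retry in lockstep.
function backoffDelayWithJitter(
  attempt: number,
  random: () => number = Math.random
): number {
  return Math.floor(random() * backoffDelay(attempt));
}

// The full list of delays a request would wait through before giving up
function backoffSchedule(maxRetries: number = MAX_RETRIES): number[] {
  return Array.from({ length: maxRetries }, (_, i) => backoffDelay(i + 1));
}
```

With the defaults, `backoffSchedule()` yields delays of 1000, 2000, and 4000 ms, so a request that keeps failing waits about seven seconds in total before the error is finally rejected.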

Layer 3: Global Runtime Error Handlers

To catch errors outside of React (event handlers, async operations, third-party scripts), we implemented global handlers:

// globalErrorHandler.ts
import { logErrorToService } from "@/services/errorLogger";

export function initializeGlobalErrorHandlers() {
  window.addEventListener("error", (event: ErrorEvent) => {
    const { message, filename, lineno, colno, error } = event;

    logErrorToService({
      error: error || new Error(message),
      context: {
        type: "unhandled-error",
        filename,
        lineno,
        colno,
        message,
      },
      severity: "error",
      category: "global-error",
    });

    event.preventDefault();
  });

  window.addEventListener(
    "unhandledrejection",
    (event: PromiseRejectionEvent) => {
      const error =
        event.reason instanceof Error
          ? event.reason
          : new Error(String(event.reason));

      logErrorToService({
        error,
        context: {
          type: "unhandled-rejection",
          reason: event.reason,
        },
        severity: "error",
        category: "promise-rejection",
      });

      event.preventDefault();
    }
  );

  if (process.env.NODE_ENV === "production") {
    const originalConsoleError = console.error;
    console.error = (...args: unknown[]) => {
      logErrorToService({
        error: new Error("Console Error"),
        context: {
          type: "console-error",
          args: args.map((arg) => String(arg)),
        },
        severity: "warning",
        category: "console",
      });

      originalConsoleError.apply(console, args);
    };
  }
}
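One detail worth handling in the rejection listener: a promise can reject with any value, and `String(reason)` turns a plain object into "[object Object]". A small normalizer, sketched here under the hypothetical name `toError`, keeps more of the original information.

```typescript
// Hypothetical helper for normalizing promise rejection reasons.
function toError(reason: unknown): Error {
  if (reason instanceof Error) return reason;
  if (typeof reason === "string") return new Error(reason);

  if (typeof reason === "object" && reason !== null) {
    // Prefer a string `message` field when the object carries one
    const message = (reason as { message?: unknown }).message;
    if (typeof message === "string") return new Error(message);
    try {
      return new Error(JSON.stringify(reason));
    } catch {
      // Circular structures cannot be serialized; fall through to String()
    }
  }
  return new Error(String(reason));
}
```

In the listener above, `toError(event.reason)` could replace the inline `instanceof` check.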

Error Logging Service

All three layers feed into a centralized logging service:

// errorLogger.ts
import * as Sentry from "@sentry/react";

interface ErrorLogPayload {
  error: Error;
  context: Record<string, unknown>;
  severity: "info" | "warning" | "error" | "critical";
  category: string;
}

export function logErrorToService(payload: ErrorLogPayload) {
  const { error, context, severity, category } = payload;

  const enrichedContext = {
    ...context,
    timestamp: new Date().toISOString(),
    environment: process.env.NODE_ENV,
    appVersion: process.env.REACT_APP_VERSION,
    userAgent: navigator.userAgent,
    viewport: {
      width: window.innerWidth,
      height: window.innerHeight,
    },
  };

  Sentry.captureException(error, {
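    // Sentry has no "critical" level; its highest level is "fatal"
    level: severity === "critical" ? "fatal" : severity,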
    tags: { category },
    contexts: { custom: enrichedContext },
  });

  if (process.env.NODE_ENV === "development") {
    console.group(`🔴 ${severity.toUpperCase()}: ${category}`);
    console.error("Error:", error);
    console.log("Context:", enrichedContext);
    console.groupEnd();
  }
}
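With three layers feeding one logger, a single failure (for example, an error thrown on every render) can be reported many times in a row. One way to limit that is a small client-side deduplication step before the logging call. This is a sketch, not part of the system described above: `createDeduper` is a hypothetical helper, and keying on category, name, and message is an assumption about what counts as "the same error".

```typescript
// Hypothetical helper: suppresses repeats of the same error inside a time window.
function createDeduper(windowMs: number, now: () => number = Date.now) {
  // Maps an error key to the time it was last reported
  const lastReported = new Map<string, number>();

  return function shouldReport(error: Error, category: string): boolean {
    const key = `${category}|${error.name}|${error.message}`;
    const current = now();
    const previous = lastReported.get(key);
    if (previous !== undefined && current - previous < windowMs) {
      return false; // reported recently; skip this occurrence
    }
    lastReported.set(key, current);
    return true;
  };
}
```

A logger could call `shouldReport(error, category)` first and return early when it is false. The map grows with the number of distinct errors, so a long-lived page may want to prune old entries.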

User-Facing Error Components

The final piece is giving users a great experience even when things go wrong:

// ErrorFallback.tsx
import React from "react";
import { Button } from "@/components/ui/Button";
import { AlertTriangle, RefreshCw, Home } from "lucide-react";

interface Props {
  error: Error | null;
  errorInfo: React.ErrorInfo | null;
  onReset: () => void;
  isolate?: boolean;
}

export function ErrorFallback({ error, errorInfo, onReset, isolate }: Props) {
  const isDevelopment = process.env.NODE_ENV === "development";

  return (
    <div className="flex flex-col items-center justify-center rounded-lg border border-zinc-200 bg-zinc-50 p-8 dark:border-zinc-800 dark:bg-zinc-900">
      <AlertTriangle className="h-12 w-12 text-amber-500" />
      <h2 className="mt-4 text-lg font-semibold">
        {isolate ? "Something went wrong" : "Oops! Something went wrong"}
      </h2>
      <p className="mt-2 text-center text-sm text-zinc-600 dark:text-zinc-400">
        {isolate
          ? "This feature encountered an error, but the rest of the app is still working."
          : "We've been notified and are working on a fix. Please try again."}
      </p>
      {isDevelopment && error && (
        <pre className="mt-4 max-h-48 overflow-auto rounded bg-zinc-900 p-4 text-left text-xs text-zinc-100">
          {error.toString()}
          {errorInfo?.componentStack && (
            <code className="block mt-2">{errorInfo.componentStack}</code>
          )}
        </pre>
      )}
      <div className="mt-6 flex w-full max-w-xs gap-3">
        <Button
          onClick={onReset}
          className="flex-1 flex items-center justify-center gap-2"
        >
          <RefreshCw className="h-4 w-4" />
          Try Again
        </Button>
        {!isolate && (
          <Button
            onClick={() => (window.location.href = "/")}
            className="flex-1 flex items-center justify-center gap-2"
            variant="outline"
          >
            <Home className="h-4 w-4" />
            Go Home
          </Button>
        )}
      </div>
      <p className="mt-4 text-xs text-zinc-500">
        Error ID: {generateErrorId()}
      </p>
    </div>
  );
}

function generateErrorId(): string {
  // slice(2, 11) replaces the deprecated substr(2, 9): up to 9 base-36 characters
  return `ERR-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
}
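An error ID is most useful to support when the same value also appears in the logged event. As written, the fallback generates a new ID on each render and does not send it to the logger. A possible refinement is to create the ID once at capture time and pass it to both places. The sketch below assumes a hypothetical `captureWithId` helper, and its `log` callback stands in for `logErrorToService`.

```typescript
// Hypothetical shapes; `log` stands in for the application's logging function.
interface CapturedError {
  errorId: string;
  error: Error;
}

function generateErrorId(
  now: () => number = Date.now,
  random: () => number = Math.random
): string {
  // slice(2, 11) keeps up to 9 base-36 characters after the "0." prefix
  return `ERR-${now()}-${random().toString(36).slice(2, 11)}`;
}

// Create the ID once, send it with the log entry, and return it so the UI
// can show the same value that appears in the monitoring tool.
function captureWithId(
  error: Error,
  log: (error: Error, context: Record<string, unknown>) => void
): CapturedError {
  const errorId = generateErrorId();
  log(error, { errorId });
  return { errorId, error };
}
```

In the boundary, the returned `errorId` could be stored in state inside `componentDidCatch` and passed to the fallback as a prop, so re-renders keep showing the same ID.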

Results & Key Metrics

After implementing this comprehensive error handling system at Instantly:

47% reduction in support tickets related to application errors
99.8% error capture rate — virtually nothing slips through
Average error resolution time decreased from 3 days to 8 hours
User satisfaction scores improved by 23% in post-incident surveys
Zero untracked production incidents in 6 months

Lessons Learned

1. Layer Your Defense

Don't rely on a single error handling strategy. Each layer catches different types of errors.

2. Context is Everything

Capturing user context, component stacks, and system state is crucial for debugging. An error without context is just noise.

3. Fail Gracefully

Users don't care why something broke—they care about continuing their work. Isolated error boundaries let them do that.

4. Monitor Everything

Set up alerts for error rate spikes, specific error types, and critical user flows. Proactive > reactive.

5. User Experience Matters

Even your error states should be well-designed. A good error message can turn a frustrated user into an understanding one.
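The error rate alerting mentioned in lesson 4 is usually configured in the monitoring tool itself, but the underlying idea is a sliding-window count. Here is a minimal sketch of that idea, with the hypothetical name `createErrorRateMonitor` and thresholds chosen purely for illustration.

```typescript
// Hypothetical sketch of a sliding-window error-rate check.
function createErrorRateMonitor(windowMs: number, threshold: number) {
  const timestamps: number[] = [];

  // Records one error at time `now` (ms) and returns true when the number
  // of errors inside the window exceeds the threshold.
  return function record(now: number = Date.now()): boolean {
    timestamps.push(now);
    const cutoff = now - windowMs;
    // Drop entries that have aged out of the window
    while (timestamps.length > 0 && (timestamps[0] as number) <= cutoff) {
      timestamps.shift();
    }
    return timestamps.length > threshold;
  };
}
```

For example, `createErrorRateMonitor(60_000, 20)` would signal once more than 20 errors land within any one-minute window.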

Next Steps

If you're building a production application, here's my recommended implementation order:

Week 1: Implement React Error Boundaries at the app and feature level
Week 2: Set up API interceptors with proper error categorization
Week 3: Add global runtime handlers and centralized logging
Week 4: Build error analytics dashboard and set up alerting
Ongoing: Monitor, iterate, and improve based on real production data

Conclusion

Building scalable error handling isn't glamorous work, but it's the difference between a fragile application and a production-grade system. The investment pays off immediately—in reduced support burden, faster debugging, and most importantly, happier users.

The system I've outlined here has been battle-tested across applications serving 200+ teams processing millions of operations. It's not theoretical—it's proven in production.

Want to discuss error handling strategies for your application? Reach out on LinkedIn or check out my other work at mozia.dev.

Tags: #React #NodeJS #ErrorHandling #SystemArchitecture #WebDevelopment #SoftwareEngineering