Crafting a Language of Failure: The Art of Error Codes

Manna Mahmud
15 min read · Jan 31, 2024


Understanding the Language of Errors

Error codes are more than mere status signals; they are the subtle yet powerful language through which software speaks of its flaws and foibles. As software becomes a cornerstone of our daily lives, the craft lies not in avoiding errors altogether but in articulating them clearly. Error codes, when designed with care, turn the maze of troubleshooting into a navigable map, transforming frustration into understanding. They are not just indicators of problems but beacons guiding users and developers toward solutions, part of a continuous cycle of growth and refinement in software.

Indicator of Maturity

Implementing well-thought-out error codes is a hallmark of maturity in software products and their development teams. Error codes extend beyond mere troubleshooting; they embody the software’s ability to self-assess and report. This is a reflection of software intelligence and a step towards autonomous problem-solving.

Contract Between Software and Users

User Perspective: From a user’s standpoint, error codes provide a clear indication that something has gone wrong and, often, of exactly what went wrong. This transparency is crucial for trust and usability.

Developer Perspective: For developers, error codes are a tool for efficient debugging and maintenance. They offer a systematic way to identify, categorize, and rectify issues, enhancing the software’s reliability and performance.

Crafting Error Codes — A Practical Guide

Best Practices:

  • Uniqueness: Every error code should represent a distinct issue. This prevents ambiguity and aids in faster resolution.
  • Descriptiveness: The code should hint at the error category or type, making it easier to understand at a glance.
  • Scalability: The system should be adaptable, allowing for new codes as the software evolves and new types of errors are encountered.

Common Coding Systems:

  • Numeric System: Often used for its simplicity. For example, HTTP status codes are a well-known numeric system (404 for Not Found, 500 for Internal Server Error).
  • Alphanumeric (Symbolic) System: Offers more detail at a glance, like the symbolic names of Microsoft’s Windows error codes (e.g., ‘ERROR_FILE_NOT_FOUND’, which stands for the numeric code 2).
  • Categorized System: Grouping error codes by type or module, which can be highly effective in large and complex systems.
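The following TypeScript sketch makes the Uniqueness and Categorized System ideas concrete. It is an in-memory registry that rejects duplicate codes and can list the codes in a category. The `ErrorDefinition` shape and the `ErrorRegistry` class are hypothetical and do not come from any library.

```typescript
// Hypothetical definition of one error code.
interface ErrorDefinition {
  code: string;     // e.g. "E21301"
  category: string; // e.g. "client"
  message: string;  // short human-readable summary
}

// Minimal in-memory registry enforcing the "Uniqueness" rule.
class ErrorRegistry {
  private readonly byCode = new Map<string, ErrorDefinition>();

  register(def: ErrorDefinition): void {
    // A code may be registered only once, so it keeps a single meaning.
    if (this.byCode.has(def.code)) {
      throw new Error(`Duplicate error code: ${def.code}`);
    }
    this.byCode.set(def.code, def);
  }

  lookup(code: string): ErrorDefinition | undefined {
    return this.byCode.get(code);
  }

  // Categorized view: every registered code in one category, in registration order.
  inCategory(category: string): string[] {
    const codes: string[] = [];
    this.byCode.forEach((def, code) => {
      if (def.category === category) codes.push(code);
    });
    return codes;
  }
}

const registry = new ErrorRegistry();
registry.register({ code: "E21301", category: "client", message: "Email format is invalid" });
registry.register({ code: "E32101", category: "server", message: "Unexpected error during data processing" });
```

Failing loudly at registration time (for example, at startup or in a unit test) catches a reused code before it ever reaches a user.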

Numbering Methods and Meanings

  • Sequential: Simple and straightforward but can become unwieldy as the number of errors grows.
  • Hierarchical: Reflects the structure of the system or software, aiding in pinpointing the error’s location.
  • Modular: Assigns ranges or prefixes based on modules or components, facilitating easier identification of the error source.
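The Modular method can be sketched in TypeScript as a table of numeric ranges, one per component. The module names and the 1000-wide ranges below are illustrative assumptions. What the sketch shows is that both directions, building a code and finding the module that owns it, are simple lookups.

```typescript
// Hypothetical allocation: each module owns a block of 1000 numeric codes.
interface ModuleRange {
  module: string;
  start: number; // first code in the block
  end: number;   // last code in the block (inclusive)
}

const MODULE_RANGES: ModuleRange[] = [
  { module: "auth", start: 1000, end: 1999 },
  { module: "billing", start: 2000, end: 2999 },
  { module: "inventory", start: 3000, end: 3999 },
];

// Build a code from a module name and a per-module sequence number.
function makeCode(module: string, sequence: number): number {
  for (const range of MODULE_RANGES) {
    if (range.module === module) {
      const code = range.start + sequence;
      if (sequence < 0 || code > range.end) {
        throw new Error(`Sequence ${sequence} is outside the range of module ${module}`);
      }
      return code;
    }
  }
  throw new Error(`Unknown module: ${module}`);
}

// Find which module emitted a code; undefined means the code is unallocated.
function moduleForCode(code: number): string | undefined {
  for (const range of MODULE_RANGES) {
    if (code >= range.start && code <= range.end) return range.module;
  }
  return undefined;
}
```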

Categorizing Errors — System Errors vs. Domain Errors

Defining the Categories: System errors are related to the underlying platform or environment (like OS or hardware issues), while domain errors are specific to the application’s logic and functionality.

Technical Analysis: Different types of errors often necessitate different handling strategies. System errors might require more immediate attention or escalation, whereas domain errors might be more about logic correction or user input validation.
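The split between the two categories maps naturally onto a discriminated union, so the compiler forces every error through one of the two handling paths. The TypeScript sketch below is illustrative only. The field names, the `Action` type, and the escalate-versus-show-to-user policy are assumptions that mirror the paragraph above; they are not a prescribed design.

```typescript
// Hypothetical error shapes for the two categories.
type AppError =
  | { kind: "system"; code: string; cause: string }        // OS, disk, network, ...
  | { kind: "domain"; code: string; userMessage: string }; // business-logic failures

type Action =
  | { type: "escalate"; alert: string }
  | { type: "showToUser"; message: string };

// System errors go to operators; domain errors go back to the user for correction.
function decideAction(err: AppError): Action {
  switch (err.kind) {
    case "system":
      return { type: "escalate", alert: `${err.code}: ${err.cause}` };
    case "domain":
      return { type: "showToUser", message: err.userMessage };
    default: {
      // Compile-time exhaustiveness check: adding a new kind breaks the build here.
      const unreachable: never = err;
      throw new Error(`Unhandled error kind: ${JSON.stringify(unreachable)}`);
    }
  }
}
```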

Other Categories of Errors

Expanding the View: Beyond these two categories are user input errors (wrong data input), network errors (connectivity issues), and hardware failures (disk not found, memory errors), each with its own set of codes and handling procedures.

IBM’s Implementation

In-Depth Study: IBM has a long history with error codes, especially in their mainframe and server products. Their error codes are often detailed, providing both an error identifier and a descriptive message, which can be crucial for systems requiring high reliability.

Red Hat’s Approach

Practical Application: Red Hat, particularly in its Linux distributions, uses a combination of system-level and application-level error codes. For system errors it follows the POSIX errno conventions (symbolic names such as ENOENT or EACCES), which makes them familiar across UNIX-like systems, although the exact numeric values can differ between platforms.

Google’s Error Handling

Modern Practices: Google, with its cloud-based services, often employs RESTful API error responses. These are typically HTTP status codes coupled with a JSON body that provides more context, aligning with modern web standards and providing a user-friendly approach to error handling.
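The TypeScript below sketches that shape. It models the JSON error body Google documents for its APIs (an `error` object carrying `code`, `message`, `status` and optional `details`, as described in Google's AIP-193 guidance, to my understanding) and turns it into a one-line summary. Only those fields are modelled. `summarizeGoogleError` is a hypothetical helper and is not part of any Google client library.

```typescript
// Fields of Google's JSON error body used here; `details` is left untyped.
interface GoogleApiErrorBody {
  error: {
    code: number;        // HTTP status code, e.g. 404
    message: string;     // developer-facing explanation
    status: string;      // canonical code name, e.g. "NOT_FOUND"
    details?: unknown[]; // structured extra context
  };
}

// Type guard: does a parsed JSON value have the shape above?
function isGoogleApiErrorBody(value: unknown): value is GoogleApiErrorBody {
  if (typeof value !== "object" || value === null) return false;
  const error = (value as { error?: unknown }).error;
  if (typeof error !== "object" || error === null) return false;
  const fields = error as Record<string, unknown>;
  return (
    typeof fields["code"] === "number" &&
    typeof fields["message"] === "string" &&
    typeof fields["status"] === "string"
  );
}

// Produce a one-line summary such as "404 NOT_FOUND: Book not found".
function summarizeGoogleError(json: string): string {
  const parsed: unknown = JSON.parse(json);
  if (!isGoogleApiErrorBody(parsed)) return "Unrecognized error body";
  const { code, status, message } = parsed.error;
  return `${code} ${status}: ${message}`;
}
```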

Importance of a Comprehensive Error Code System

Enhanced Diagnostics: Detailed error codes allow for quicker and more accurate diagnosis of issues, which is crucial for maintaining high availability and performance.

Improved User Communication: Specific error codes can be mapped to user-friendly messages, improving the overall user experience.

Facilitates Monitoring and Logging: Detailed error codes make it easier to monitor application health and analyze logs for patterns and recurring issues.
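To illustrate the monitoring point, the TypeScript sketch below counts error codes across a batch of structured log records and ranks the most frequent ones. This is the kind of query that surfaces recurring issues. The `LogRecord` shape is an assumption; in practice the records would come from a log store or observability tool.

```typescript
// Hypothetical structured log record.
interface LogRecord {
  level: "INFO" | "WARN" | "ERROR";
  message: string;
  errorCode?: string;
}

// Count how often each error code appears and return the most frequent ones first
// (ties broken alphabetically so the output is stable).
function topErrorCodes(records: LogRecord[], limit: number): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const record of records) {
    if (record.errorCode === undefined) continue; // records without a code are ignored
    counts.set(record.errorCode, (counts.get(record.errorCode) ?? 0) + 1);
  }
  const ranked: Array<[string, number]> = [];
  counts.forEach((count, code) => {
    ranked.push([code, count]);
  });
  ranked.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
  return ranked.slice(0, limit);
}
```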

Designing a Comprehensive Error Code System

In designing such a system, it’s crucial to:

Define a Clear Structure: Establish a consistent format for error codes that reflect their type and severity.

Ensure Extensibility: The system should be adaptable to accommodate new types of errors as the application evolves.

Document Thoroughly: Maintain clear documentation for each error code, its meaning, and potential resolutions.

Types of Error Codes

1. Client-Side Errors

  • Input/Validation Errors: Incorrect or missing data provided by the user.
  • Authentication/Authorization Errors: Issues related to user login or permissions.
  • Resource Not Found: Attempting to access a non-existent resource.

2. Server-Side Errors

  • Service Unavailability: When a service the application depends on is not available.
  • Internal Server Errors: Generic errors for unforeseen issues on the server.
  • Integration Errors: Failures in interactions between different systems or services.

3. Network Errors

  • Timeout Errors: Occur when a request or a response takes too long.
  • Connection Errors: Issues with network connectivity.
  • Environment Errors: Issues arising from the hosting or operating environment.

4. Data Errors

  • Data Access/Retrieval Errors: Problems encountered during data operations.
  • Data Integrity and Validation Errors: Issues with the quality or integrity of data.
  • Concurrency Errors: Errors due to concurrent data access or modification.

5. Configuration Errors

  • Startup Errors: Issues encountered during the application startup.
  • Dependency Errors: Problems related to missing or incorrect software dependencies.

6. Hardware Errors

  • Device Failure: Errors due to hardware malfunctions.

7. Application-Level Domain Errors

  • Business Rule Violations: Errors due to the violation of specific business logic or rules.
  • Entity/Resource State Errors: When an operation cannot be performed due to the current state of a resource (e.g., trying to delete an already deleted item).
  • Domain-Specific Scenarios: Errors unique to the specific domain or industry the application serves (e.g., insufficient funds in a banking application).

8. Security Errors

  • Access Violations: Unauthorized attempts to access resources.
  • Data Breach/Leakage Indications: Potential or actual data security breaches.

Importance of a Diverse Error Handling System

  • Targeted Troubleshooting: More specific error types allow for quicker, targeted responses to issues.
  • User and Developer Clarity: Clear, domain-specific error codes can provide better context to both users and developers, aiding in understanding and resolution.
  • Compliance and Auditing: In certain domains, detailed error logging is essential for compliance and auditing purposes.

Design Strategy

  • Customized Error Categories: Tailor error categories to fit the specific needs and characteristics of the application and its domain.
  • Standardized Code Structure: Adopt a standardized format across all error types for consistency.
  • Comprehensive Documentation: Document every error type with its possible cause and resolution steps.
  • Continuous Review and Update: Regularly review and update the error code system to align with new features, changes in business rules, or external dependencies.
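The documentation and review points can be partly automated. The TypeScript sketch below assumes a hypothetical catalog of `ErrorDoc` entries and a list of codes actually referenced in the codebase. It reports duplicate entries, incomplete entries and undocumented codes. Running a check like this in CI is one way to keep the catalog from drifting away from the code.

```typescript
// Hypothetical documentation entry kept alongside each code.
interface ErrorDoc {
  code: string;
  cause: string;      // what typically triggers the error
  resolution: string; // what the user or operator should do
}

// Report catalog problems: duplicate entries, empty fields, undocumented codes.
function lintErrorCatalog(codesInUse: string[], docs: ErrorDoc[]): string[] {
  const problems: string[] = [];
  const documented = new Set<string>();
  for (const doc of docs) {
    if (documented.has(doc.code)) problems.push(`duplicate entry for ${doc.code}`);
    documented.add(doc.code);
    if (doc.cause.trim() === "" || doc.resolution.trim() === "") {
      problems.push(`incomplete documentation for ${doc.code}`);
    }
  }
  for (const code of codesInUse) {
    if (!documented.has(code)) problems.push(`undocumented code ${code}`);
  }
  return problems;
}
```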

Extended Types of Error Codes

9. Performance Errors:

  • Related to the degradation of application performance.
  • Examples include slow response times, resource exhaustion (like memory leaks), and inefficiencies in processing.

10. API and Integration Errors:

  • Issues related to interactions with external APIs or integration with other systems.
  • Examples include failed API calls, incompatible data formats, and protocol mismatches.

11. User Experience (UX) Errors:

  • Errors that impact the user’s interaction with the application.
  • Examples include broken UI elements, accessibility issues, and confusing navigation flows.

12. Compliance and Regulatory Errors:

  • Failures to adhere to industry-specific regulations or standards.
  • Examples include non-compliance with data protection laws, financial regulations, or accessibility standards.

13. Deployment and Versioning Errors:

  • Issues related to deploying new versions of the application or environment changes.
  • Examples include rollback failures, version incompatibilities, and issues with environment configurations.

14. Logging and Monitoring Errors:

  • Problems within the systems used to log and monitor application activity.
  • Examples include lost logs, incorrect log levels, and monitoring system failures.

15. Transactional Errors:

  • Issues related to transaction management, particularly in database operations.
  • Examples include transaction rollbacks, deadlocks, and isolation-level conflicts.

16. Session Management Errors:

  • Problems related to user session handling in the application.
  • Examples include session timeouts, session hijacking vulnerabilities, and inconsistent session states.

Exploring the error code designs of two technology giants

Google and Microsoft offer valuable insights into handling various categories of errors. Both companies have built systems to communicate and manage errors effectively across these categories. Let’s examine each category, looking at how Google and Microsoft approach error code design and the reasoning behind their strategies.

1. Application-Level Domain Errors

  • Google: Google often uses descriptive error messages, especially in its APIs, which include both an error code and a message that explains the issue in a user-friendly manner. For instance, Google’s API error might include a code like INVALID_ARGUMENT, paired with a detailed message.
  • Microsoft: Microsoft tends to use numeric error codes, especially in its older software. For example, in SQL Server, you might encounter an error code 547, indicating a foreign key constraint violation.

Wisdom Behind Designs:

  • Google’s Approach: Aims for clarity and immediate understanding, especially useful in API interactions where developers need quick insights.
  • Microsoft’s Approach: Leverages a standardized numeric system, useful for internal debugging and when the software ecosystem is vast and varied.

2. Client-Side Errors

  • Google: In web applications and APIs, Google uses standard HTTP status codes like 404 for "Not Found" or 400 for "Bad Request", which are universally understood in web development.
  • Microsoft: Microsoft’s ASP.NET framework also uses standard HTTP status codes for web applications. For client-side applications like Windows software, it uses system error codes, like ERROR_FILE_NOT_FOUND with a numeric code 0x2.

Wisdom Behind Designs:

  • The common use of HTTP status codes simplifies web development by adhering to a universal standard.
  • Detailed system error codes in client applications aid in precise troubleshooting.
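An HTTP status code already encodes who is at fault in its first digit: 4xx for client errors and 5xx for server errors. The TypeScript sketch below shows that classification. The retry list in `isRetryableStatus` reflects a common convention (429, 502, 503, 504); it is not a rule that either company prescribes.

```typescript
type HttpErrorClass = "client" | "server" | "not-an-error";

// The first digit of the status code identifies the class of response.
function classifyStatus(status: number): HttpErrorClass {
  if (status >= 400 && status <= 499) return "client";
  if (status >= 500 && status <= 599) return "server";
  return "not-an-error";
}

// Client errors usually need a corrected request; these statuses often succeed if
// retried later (429 Too Many Requests, 502/503/504 gateway or availability problems).
function isRetryableStatus(status: number): boolean {
  return [429, 502, 503, 504].indexOf(status) !== -1;
}
```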

3. Server-Side Errors

  • Google: For server-side errors, especially in cloud services like Google Cloud Platform, Google employs HTTP status codes like 500 Internal Server Error and provides additional details in the response body.
  • Microsoft: In server environments like IIS, Microsoft also uses HTTP status codes. Additionally, it provides product-specific error codes, such as SQL Server error 823, which indicates an operating-system I/O error during a disk read or write.

Wisdom Behind Designs:

  • Both approaches emphasize clarity and standardization, essential in server environments where multiple systems interact.

4. Network Errors

  • Google: Uses standard networking error codes and messages in its applications and services. For instance, its APIs report a request that runs past its deadline with the canonical code DEADLINE_EXCEEDED.
  • Microsoft: Similarly employs standard network error codes in its operating systems and applications, like ERROR_NETWORK_UNREACHABLE, code 1231.

Wisdom Behind Designs:

  • Adherence to standard network error codes aids in diagnosing issues that span across different systems and platforms.

5. Data Errors

  • Google: In its database services and APIs, Google uses specific error codes for data-related issues, like FAILED_PRECONDITION in Firestore for a query that requires an index.
  • Microsoft: SQL Server uses codes like 2627 (violation of a PRIMARY KEY or UNIQUE constraint) and 2601 (duplicate key row in a unique index), providing immediate clarity on the nature of the data error.

Wisdom Behind Designs:

  • Specificity in error codes helps quickly identify and resolve data integrity and access issues.

6. Configuration Errors

  • Google: In Google Cloud services, configuration errors are often indicated by specific error messages and codes, like INVALID_CONFIG in Google Kubernetes Engine.
  • Microsoft: Provides detailed error codes and messages for configuration issues in its services and applications, like 0x80070005 (E_ACCESSDENIED), an access-denied error that often points to misconfigured permissions.

Wisdom Behind Designs:

  • Clear, descriptive error messages help in pinpointing configuration mistakes, crucial in complex cloud and service environments.

7. Security Errors

  • Google: Utilizes standard HTTP status codes like 403 Forbidden for security-related issues, along with more descriptive API error codes where needed.
  • Microsoft: In its services and operating systems, Microsoft uses specific security error codes, like ERROR_ACCESS_DENIED (5) or ERROR_LOGON_FAILURE (1326, an incorrect user name or password).

Wisdom Behind Designs:

  • Emphasizes the importance of clear communication in security-related errors to aid in prompt and effective resolution.

8. Performance Errors

  • Google: In Google Cloud Monitoring, performance issues are indicated by specific metrics and alerts rather than traditional error codes.
  • Microsoft: Uses performance counters and event logs in Windows to track performance issues, rather than specific error codes.

Wisdom Behind Designs:

  • The focus on metrics and logging for performance issues reflects the nature of these errors, which are often about trends and patterns rather than single incidents.
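The TypeScript sketch below is a toy sliding-window check that illustrates the trend-based view. It does not raise an error for one slow request. It alerts only when the share of slow requests in a recent window crosses a threshold. The class name and all thresholds are placeholders. Real monitoring systems such as those named above use time-based windows and far richer statistics.

```typescript
// Toy sliding-window check: alert when more than `maxSlowRatio` of the last
// `windowSize` requests took longer than `slowMs`. All thresholds are placeholders.
class SlowRequestMonitor {
  private readonly samples: number[] = [];
  private readonly windowSize: number;
  private readonly slowMs: number;
  private readonly maxSlowRatio: number;

  constructor(windowSize: number, slowMs: number, maxSlowRatio: number) {
    this.windowSize = windowSize;
    this.slowMs = slowMs;
    this.maxSlowRatio = maxSlowRatio;
  }

  // Record one latency sample; returns true while the window is in an alert state.
  record(latencyMs: number): boolean {
    this.samples.push(latencyMs);
    if (this.samples.length > this.windowSize) this.samples.shift(); // drop the oldest sample
    // Wait for a full window so the first few samples cannot trigger an alert on their own.
    if (this.samples.length < this.windowSize) return false;
    const slow = this.samples.filter((ms) => ms > this.slowMs).length;
    return slow / this.samples.length > this.maxSlowRatio;
  }
}
```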

9. API and Integration Errors

  • Google: Google’s APIs provide specific error codes like PERMISSION_DENIED or API_KEY_EXPIRED, offering clarity on integration issues.
  • Microsoft: In Azure services, Microsoft uses detailed error messages and codes for API and integration issues, aiding developers in troubleshooting.

Wisdom Behind Designs:

  • Tailored error codes for API and integration issues facilitate smoother inter-service communication and faster problem resolution.

10. User Experience (UX) Errors

  • Google: Focuses more on user-friendly error messages in its applications to guide users, rather than using technical error codes.
  • Microsoft: Similar approach in user-facing applications, with emphasis on actionable error messages rather than codes.

Wisdom Behind Designs:

  • User-friendly messages in UX errors prioritize the end-user experience and provide clear guidance on what actions to take.

11. Compliance and Regulatory Errors

  • Google: In Google Cloud Compliance reports, issues are often flagged with specific compliance-related terms rather than error codes.
  • Microsoft: Uses compliance tools within its ecosystem to flag non-compliance, often without specific error codes.

Wisdom Behind Designs:

  • Reflects the nature of compliance issues which are often more about meeting standards and less about technical faults.

12. Deployment and Versioning Errors

  • Google: In Google Cloud Build or Kubernetes Engine, deployment errors are indicated with descriptive messages, sometimes including error codes.
  • Microsoft: Azure DevOps provides detailed logs and messages for deployment issues, with specific codes in some cases.

Wisdom Behind Designs:

  • Descriptive logs and messages for deployment and versioning issues reflect the complexity and multifaceted nature of these errors.

13. Logging and Monitoring Errors

  • Google: Google Cloud’s operations suite provides alerts and detailed logs for monitoring issues, focusing on descriptive analytics over error codes.
  • Microsoft: In Azure Monitor, a similar approach with detailed logs and alerts rather than specific error codes for logging and monitoring issues.

Wisdom Behind Designs:

  • Emphasizes the analytical and diagnostic nature of logging and monitoring, where detailed logs and metrics are more useful than traditional error codes.

14. Transactional Errors

  • Google: Cloud Spanner and other database services provide specific error messages and codes for transactional issues, like ABORTED for a transaction conflict.
  • Microsoft: SQL Server uses error codes for transactional issues, like 1205 for a deadlock.

Wisdom Behind Designs:

  • Specific error codes in transactional errors aid in quickly identifying and resolving complex database conflicts.
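Deadlocks (SQL Server 1205) and Spanner's ABORTED are usually handled by re-running the whole transaction. That is only practical when the code reliably identifies the error. The TypeScript sketch below assumes errors expose a `code` property, which is a hypothetical convention rather than a specific driver's API. It is synchronous and omits backoff delays to stay short; real database clients are asynchronous and should back off between attempts.

```typescript
// Codes that usually mean "the transaction lost a conflict; try again".
const TRANSIENT_DB_CODES: Array<string | number> = [1205, "ABORTED"];

// Assumes the thrown value carries a `code` property (hypothetical convention).
function isTransientDbError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return (typeof code === "string" || typeof code === "number") &&
    TRANSIENT_DB_CODES.indexOf(code) !== -1;
}

// Run `work`, re-running it on transient errors, up to `maxAttempts` in total.
// Non-transient errors, and the last transient one, are rethrown unchanged.
function withRetry<T>(work: () => T, maxAttempts: number): T {
  let attempt = 0;
  for (;;) {
    attempt++;
    try {
      return work();
    } catch (err) {
      if (attempt >= maxAttempts || !isTransientDbError(err)) throw err;
    }
  }
}
```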

Error Code Design Scheme

Code Format: E[Category][Subcategory][Sequence]

  • E: Denotes 'Error'.
  • [Category]: A single digit representing the main category of error.
  • [Subcategory]: A single digit for the subcategory.
  • [Sequence]: A three-digit sequence number for the specific error (e.g., 201).
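A minimal TypeScript sketch of parsing and formatting codes in this format follows. It assumes a three-digit, zero-padded sequence, as the examples later in this section imply. `parseErrorCode` and `formatErrorCode` are illustrative names.

```typescript
// Components of a code in the E[Category][Subcategory][Sequence] scheme.
interface ErrorCodeParts {
  category: number;    // single digit
  subcategory: number; // single digit
  sequence: number;    // 0-999, written as three digits
}

// "E", one category digit, one subcategory digit, three sequence digits (e.g. "E21301").
const ERROR_CODE_PATTERN = /^E(\d)(\d)(\d{3})$/;

function parseErrorCode(code: string): ErrorCodeParts | null {
  const match = ERROR_CODE_PATTERN.exec(code);
  if (match === null) return null;
  return {
    category: Number(match[1]),
    subcategory: Number(match[2]),
    sequence: Number(match[3]),
  };
}

// True for whole numbers in [0, max].
function isWholeInRange(n: number, max: number): boolean {
  return Math.floor(n) === n && n >= 0 && n <= max;
}

function formatErrorCode(parts: ErrorCodeParts): string {
  if (!isWholeInRange(parts.category, 9) || !isWholeInRange(parts.subcategory, 9) ||
      !isWholeInRange(parts.sequence, 999)) {
    throw new Error(`Out-of-range error code parts: ${JSON.stringify(parts)}`);
  }
  const seq = String(parts.sequence);
  const paddedSeq = "000".slice(seq.length) + seq; // left-pad to three digits
  return `E${parts.category}${parts.subcategory}${paddedSeq}`;
}
```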

Categories and Subcategories

Application-Level Domain Errors (E1)

  • Business Rule Violations (E11)
  • Entity/Resource State Errors (E12)
  • Domain-Specific Scenarios (E13)

Client-Side Errors (E2)

  • User Input Errors (E21)
  • Authentication/Authorization Errors (E22)
  • Resource Not Found (E23)

Server-Side Errors (E3)

  • Service Unavailability (E31)
  • Internal Server Errors (E32)
  • Configuration Errors (E33)

Network Errors (E4)

  • Timeout Errors (E41)
  • Connection Errors (E42)

Data Errors (E5)

  • Data Access Errors (E51)
  • Data Integrity Errors (E52)
  • Concurrency Errors (E53)
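Because each digit has a fixed meaning, a code can be decoded mechanically. The TypeScript sketch below encodes the category table above and turns a code into a readable description. The function name and output format are assumptions, and the sketch again assumes a three-digit sequence.

```typescript
interface CategoryInfo {
  name: string;
  subcategories: Record<string, string>; // subcategory digit -> name
}

// The category table from this section, keyed by the category digit.
const CATEGORIES: Record<string, CategoryInfo> = {
  "1": {
    name: "Application-Level Domain Errors",
    subcategories: { "1": "Business Rule Violations", "2": "Entity/Resource State Errors", "3": "Domain-Specific Scenarios" },
  },
  "2": {
    name: "Client-Side Errors",
    subcategories: { "1": "User Input Errors", "2": "Authentication/Authorization Errors", "3": "Resource Not Found" },
  },
  "3": {
    name: "Server-Side Errors",
    subcategories: { "1": "Service Unavailability", "2": "Internal Server Errors", "3": "Configuration Errors" },
  },
  "4": { name: "Network Errors", subcategories: { "1": "Timeout Errors", "2": "Connection Errors" } },
  "5": {
    name: "Data Errors",
    subcategories: { "1": "Data Access Errors", "2": "Data Integrity Errors", "3": "Concurrency Errors" },
  },
};

// Turn a code such as "E52301" into "Data Errors / Data Integrity Errors (#301)".
function describeErrorCode(code: string): string {
  const match = /^E(\d)(\d)(\d{3})$/.exec(code);
  if (match === null) return `Malformed error code: ${code}`;
  const category: CategoryInfo | undefined = CATEGORIES[match[1]];
  const subcategory: string | undefined = category?.subcategories[match[2]];
  if (category === undefined || subcategory === undefined) {
    return `Unknown category for ${code}`;
  }
  return `${category.name} / ${subcategory} (#${match[3]})`;
}
```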

Example Error Codes and Messages

E11201: Business Rule Violation — ‘User age must be over 18’.

  • Indicates a violation of a business rule regarding age.

E21301: User Input Error — ‘Email format is invalid’.

  • This implies the user’s input for the email address is not in a valid format.

E32101: Internal Server Error — ‘Unexpected error occurred during data processing’.

  • A generic server-side error, typically used when no more specific detail is available.

E41201: Network Timeout Error — ‘Request timed out while connecting to the service’.

  • Indicates a network timeout issue during a service request.

E52301: Data Integrity Error — ‘Duplicate entry not allowed for unique field’.

  • This error occurs when a duplicate entry is attempted for a field that requires unique values.

Considerations while designing

Consistency: Maintain consistent formatting across all error messages and codes.

Documentation: Document each error code with a detailed description and possible resolution steps.

Internationalization: Consider internationalizing error messages for a global user base.

Logging and Monitoring: Implement robust logging and monitoring to capture these errors for analysis and quick resolution.
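For internationalization, one common pattern is a per-locale message catalog keyed by error code, with a fallback chain. The TypeScript sketch below is hypothetical: the catalogs, the fallback order and the `localizedMessage` name are all assumptions. Production apps would typically use their framework's i18n tooling instead. Keeping the code stable while only the message varies by locale also lets support teams match reports whatever language the user sees.

```typescript
// Hypothetical message catalogs: locale -> (error code -> message).
const MESSAGES: Record<string, Record<string, string>> = {
  en: {
    E21301: "Email format is invalid",
    E41201: "Request timed out while connecting to the service",
  },
  de: {
    E21301: "Das E-Mail-Format ist ungültig",
  },
};

// Own-property lookup so keys such as "constructor" never hit Object.prototype.
function own<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Try the exact locale ("de-AT"), then its base language ("de"), then English;
// fall back to the bare code so the user still has something to quote to support.
function localizedMessage(code: string, locale: string): string {
  const candidates = [locale, locale.split("-")[0], "en"];
  for (const candidate of candidates) {
    const catalog = own(MESSAGES, candidate);
    const message = catalog === undefined ? undefined : own(catalog, code);
    if (message !== undefined) return message;
  }
  return `Error ${code}`;
}
```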

This error code design allows for precise categorization of errors, enhancing the ability to diagnose and resolve issues effectively. The scheme is extensible, allowing new categories and subcategories to be added as the application evolves, within the limit of ten values each for the single-digit category and subcategory fields.

Integrating Severity into Error Code Design

Severity is an important factor to consider in error code design. Including severity information in error codes or accompanying error messages helps in prioritizing issues, determining the urgency of response needed, and guiding users and developers in understanding the impact of an error. This is particularly crucial in large and complex systems where differentiating between critical failures and minor glitches is essential for efficient troubleshooting and maintenance.

Severity Levels: Commonly, severity levels are categorized as follows:

  • Critical/Error: Indicates a failure that prevents normal operation or leads to a system crash or data loss.
  • Warning: Suggests an issue that might not immediately impact performance but indicates potential future problems.
  • Info: Provides informational messages that don’t indicate an error but are useful for understanding the system’s state.

Incorporating Severity in Codes: There are several approaches to include severity in error codes:

  • Prefix/Suffix in Error Code: Add a letter or number at the beginning or end of the error code to indicate severity (e.g., E for Error, W for Warning, I for Info).
  • Separate Field: In some systems, severity might be indicated in a separate field alongside the error code.

Examples:

  • E-W21301: Warning — User Input Error — ‘Email format is invalid’.
  • E-E32101: Error — Internal Server Error — ‘Unexpected error occurred during data processing’.
  • E-I41201: Info — Network Timeout Error — ‘Request timed out while connecting to the service’.
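The prefix form used in these examples can be split back into a severity field and a base code, which also shows how the two approaches (prefix and separate field) relate. The TypeScript sketch below assumes the exact `E-<letter><five digits>` layout of the examples. The names are illustrative.

```typescript
type Severity = "Error" | "Warning" | "Info";

// Severity letters used in the "E-<letter><digits>" examples.
const SEVERITY_BY_LETTER: Record<string, Severity> = { E: "Error", W: "Warning", I: "Info" };

// Severity and base code kept as separate fields (the "Separate Field" approach).
interface SeverityCode {
  severity: Severity;
  code: string; // base code in the E[Category][Subcategory][Sequence] form
}

// Split a prefixed code such as "E-W21301" into { severity: "Warning", code: "E21301" }.
function parseSeverityCode(value: string): SeverityCode | null {
  const match = /^E-([EWI])(\d{5})$/.exec(value);
  if (match === null) return null;
  return { severity: SEVERITY_BY_LETTER[match[1]], code: `E${match[2]}` };
}

// The reverse direction: render the separate fields back into the prefixed form.
function formatSeverityCode(sc: SeverityCode): string {
  const letter = sc.severity.charAt(0); // "E", "W" or "I"
  return `E-${letter}${sc.code.slice(1)}`;
}
```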

Implementing an Error-Handling System with Spring Boot 3 and Zalando’s problem-spring-web Library

The goal is an exception-handling mechanism that converts exceptions into HTTP responses following Problem Details for HTTP APIs (RFC 7807, since superseded by RFC 9457). This approach to error handling is robust, with a clear structure and reusable components.
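For reference, the response body described by RFC 7807 is a JSON object with the optional members `type`, `title`, `status`, `detail` and `instance`, plus any extension members. The TypeScript sketch below models that body on the client side. `errorCode` is the extension member this article's backend adds in the steps that follow; the RFC does not define it. `readProblem` and `problemType` are hypothetical helpers.

```typescript
// Members defined by RFC 7807 (all optional), plus the extension this article uses.
interface ProblemDetails {
  type?: string;      // URI identifying the problem type
  title?: string;     // short, human-readable summary
  status?: number;    // HTTP status code
  detail?: string;    // explanation specific to this occurrence
  instance?: string;  // URI identifying this occurrence
  errorCode?: string; // extension member, not part of the RFC
  [extension: string]: unknown;
}

// Validate an already-parsed JSON body; returns null if it is not a problem object.
function readProblem(body: unknown): ProblemDetails | null {
  if (typeof body !== "object" || body === null || Array.isArray(body)) return null;
  const candidate = body as Record<string, unknown>;
  for (const key of ["type", "title", "detail", "instance", "errorCode"]) {
    const value = candidate[key];
    if (value !== undefined && typeof value !== "string") return null;
  }
  if (candidate["status"] !== undefined && typeof candidate["status"] !== "number") return null;
  return candidate as ProblemDetails;
}

// Per the RFC, a missing "type" member means "about:blank".
function problemType(problem: ProblemDetails): string {
  return problem.type ?? "about:blank";
}
```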

Prerequisite: Add Dependencies

First, ensure you have the necessary dependencies in your pom.xml (shown below) or build.gradle file. This includes Spring Boot 3 and problem-spring-web.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.zalando</groupId>
    <artifactId>problem-spring-web</artifactId>
    <!-- choose a release compatible with Spring Boot 3 (Jakarta EE namespaces) -->
    <version>{version}</version>
</dependency>
<!-- Add other necessary dependencies -->

Step 1: Define Error Code Enum

Create an enum to represent error categories and their details. This approach ensures that the error codes are not hardcoded throughout the application.

import org.zalando.problem.Status;

public enum ErrorCode {
    PRODUCT_NOT_FOUND("E-E1P404", "Product not found", Status.NOT_FOUND),
    INVALID_PRODUCT_DATA("E-E1P400", "Invalid product data", Status.BAD_REQUEST);
    // Add more as needed

    private final String code;    // machine-readable code returned to clients
    private final String message; // human-readable title
    private final Status status;  // HTTP status used in the Problem response

    ErrorCode(String code, String message, Status status) {
        this.code = code;
        this.message = message;
        this.status = status;
    }

    public String getCode() {
        return code;
    }

    public String getMessage() {
        return message;
    }

    public Status getStatus() {
        return status;
    }
}

Step 2: Custom Exception Class

Create a custom exception class that can take an ErrorCode as an argument.

public class CustomBusinessException extends RuntimeException {
    private final ErrorCode errorCode;

    public CustomBusinessException(ErrorCode errorCode) {
        super(errorCode.getMessage());
        this.errorCode = errorCode;
    }

    public ErrorCode getErrorCode() {
        return errorCode;
    }
}

Step 3: Exception Translator with Zalando’s Problem

Implement an exception translator to convert the custom exceptions into Problem objects.

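import java.net.URI;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.context.request.NativeWebRequest;
import org.zalando.problem.Problem;
import org.zalando.problem.spring.web.advice.ProblemHandling;

@ControllerAdvice
public class ExceptionTranslator implements ProblemHandling {

    @ExceptionHandler(CustomBusinessException.class)
    public ResponseEntity<Problem> handleCustomBusinessException(CustomBusinessException ex, NativeWebRequest request) {
        ErrorCode errorCode = ex.getErrorCode();
        Problem problem = Problem.builder()
                .withType(URI.create("https://example.org/" + errorCode.getCode()))
                .withTitle(errorCode.getMessage())
                .withStatus(errorCode.getStatus())
                .withDetail(ex.getMessage())
                // Extension member read by the client to pick an icon and color
                .with("errorCode", errorCode.getCode())
                .build();
        // create(...) is inherited from the library's AdviceTrait and builds the response
        return create(ex, problem, request);
    }

    // Override other methods from ProblemHandling as needed...
}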

Step 4: Using Custom Exceptions in Application Logic

Use the custom exceptions in your application logic by passing the appropriate ErrorCode enum.

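public class ProductService {
    // ProductRepository is a hypothetical data-access interface (e.g. a Spring Data repository)
    private final ProductRepository productRepository;

    public ProductService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    public Product getProduct(Long id) {
        // Throw the domain exception when no product exists for this id
        return productRepository.findById(id)
                .orElseThrow(() -> new CustomBusinessException(ErrorCode.PRODUCT_NOT_FOUND));
    }
}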

With this approach, you define error codes and messages in a centralized enum, avoiding hardcoding and making it easier to manage and update error information. The use of Zalando’s problem-spring-web library ensures that errors are handled consistently and returned as standardized HTTP responses, in line with the Problem Details for HTTP APIs specification. This method offers a maintainable, scalable, and clear way to handle errors in a Spring Boot application.

Implementing an interactive and user-friendly error handling design in an Angular application, based on error codes received from a backend service like one using Zalando’s problem-spring-web, is a great way to enhance user experience. By utilizing the error code, message, and other metadata provided in the HTTP response, we can present the errors in a more engaging and informative manner with icons, colors, and content.

Here’s a step-by-step approach to achieve this:

Step 1: Define Error Handling Service in Angular

Create a service in Angular that will handle the HTTP responses and extract necessary information.

import { Injectable } from '@angular/core';
import { HttpErrorResponse } from '@angular/common/http';

// Shape consumed by ErrorDisplayComponent
export interface ErrorView {
  errorMessage: string;
  icon: string;
  color: 'primary' | 'accent' | 'warn'; // Angular Material palette names
}

@Injectable({ providedIn: 'root' })
export class ErrorHandlerService {
  public handleError(response: HttpErrorResponse): ErrorView {
    let errorMessage = 'Unknown error occurred';
    let icon = 'error';
    let color: ErrorView['color'] = 'warn';
    if (response.error instanceof ErrorEvent) {
      // Client-side or network error raised in the browser
      errorMessage = response.error.message;
    } else {
      // Server-side error: the body should be a Problem Details object, but may be null
      errorMessage = response.error?.detail || response.statusText || errorMessage;
      switch (response.error?.errorCode) {
        case 'E-E1P404': // consider shared constants instead of string literals
          icon = 'search_off';
          color = 'accent';
          break;
        case 'E-E1P400':
          icon = 'report_problem';
          color = 'primary';
          break;
        // Add more cases as per different error codes
      }
    }
    return { errorMessage, icon, color };
  }
}
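One way to act on the "consider shared constants" comment in the service is to replace the switch with a lookup table. The framework-free TypeScript sketch below uses the same two codes and the Angular Material palette names ("primary", "accent", "warn"). The `ErrorCodes` constants, the table and `presentationFor` are hypothetical, and the icon names are Material icon names.

```typescript
// Palette names understood by Angular Material's `color` input.
type Palette = "primary" | "accent" | "warn";

interface ErrorPresentation {
  icon: string; // Material icon name
  color: Palette;
}

// Shared constants instead of string literals scattered through a switch.
const ErrorCodes = {
  PRODUCT_NOT_FOUND: "E-E1P404",
  INVALID_PRODUCT_DATA: "E-E1P400",
} as const;

const DEFAULT_PRESENTATION: ErrorPresentation = { icon: "error", color: "warn" };

const PRESENTATION_BY_CODE = new Map<string, ErrorPresentation>([
  [ErrorCodes.PRODUCT_NOT_FOUND, { icon: "search_off", color: "accent" }],
  [ErrorCodes.INVALID_PRODUCT_DATA, { icon: "report_problem", color: "primary" }],
]);

// Unknown or missing codes fall back to the generic presentation.
function presentationFor(errorCode: string | undefined): ErrorPresentation {
  if (errorCode === undefined) return DEFAULT_PRESENTATION;
  return PRESENTATION_BY_CODE.get(errorCode) ?? DEFAULT_PRESENTATION;
}
```

Adding a new error code then means adding one table entry rather than another `case` branch.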

Step 2: Create a Component to Display Error Messages

This component will display the error message with the appropriate icon and color.

<!-- error-display.component.html -->
<!-- color holds an Angular Material palette name, so bind it to mat-icon's color input rather than CSS -->
<div *ngIf="error" class="error-container">
  <mat-icon [color]="error.color">{{ error.icon }}</mat-icon>
  <p>{{ error.errorMessage }}</p>
</div>

// error-display.component.ts
// (MatIconModule must be imported where this component is declared)
import { Component, Input } from '@angular/core';

@Component({
  selector: 'app-error-display',
  templateUrl: './error-display.component.html',
  styleUrls: ['./error-display.component.css']
})
export class ErrorDisplayComponent {
  // Optional: nothing is rendered until an error is set
  @Input() error?: { errorMessage: string; icon: string; color: string };
}

Step 3: Handling Errors in Angular Service or Component

When calling your API, use the ErrorHandlerService to process any errors and bind the result to your error display component.

this.httpClient.get('/api/products').subscribe({
  next: (data) => { /* Handle data */ },
  error: (error: HttpErrorResponse) => {
    this.error = this.errorHandlerService.handleError(error);
  }
});

Step 4: Styling the Error Messages

Add CSS to style the error messages, icons, and colors.

/* error-display.component.css */
.error-container {
  display: flex;
  align-items: center;
  margin-top: 20px;
}

.error-container mat-icon {
  margin-right: 10px;
}

This approach allows your Angular application to interactively present backend errors, enhancing user experience. It leverages the structured error responses from the backend, displaying them with relevant icons and colors that correspond to the nature of the error. This method not only improves the aesthetics but also makes the errors more understandable to the users.

Note that the schema and code above are only samples; real-world use would require further enhancements.

Disclaimer: The views reflected in this article are the author’s views and do not necessarily reflect the views of any past or present employer of the author.
