Error Handling & Retries

A. Overview

The video upload and processing pipeline is designed to be robust and resilient. In every step—from file upload to final video analysis—the system anticipates potential failures. Error handling is integrated at multiple layers, and automatic retries are implemented where appropriate. This ensures that transient errors do not interrupt the user experience and that all failures are logged for diagnostic purposes.

B. Error Categories

1. Client-Side Errors (HTTP 4xx)

  • 400 Bad Request:

    • Occurs when the client sends malformed data (e.g., missing required fields like video or user_id, or invalid file format).

    • Validation errors on input fields (e.g., improper file type or an invalid timestamp format).

  • 401 Unauthorized:

    • Triggered when the access token is missing or invalid.

  • 403 Forbidden:

    • Returned when a user lacks the necessary permissions (e.g., a non-admin trying to create a session on behalf of another user).

  • 404 Not Found:

    • Occurs if the requested resource (such as a specific session or video file) is not available.

  • 422 Unprocessable Entity:

    • Used when the file type is correct but cannot be processed (e.g., video file corruption, encryption/decryption failures).

2. Server-Side Errors (HTTP 5xx)

  • 500 Internal Server Error:

    • Catches unexpected issues during backend processing, such as unhandled exceptions in video processing modules.

  • 503 Service Unavailable:

    • Indicates temporary unavailability of the service, possibly due to high load or maintenance.

C. Retry Strategy

1. Automatic Retry Mechanism

For transient errors—especially network issues or temporary backend overload—the system implements an exponential backoff retry strategy:

  • Retry Triggers:

    • Network timeouts, connection resets, HTTP 503 errors, or 500 errors that indicate a temporary glitch.

  • Retry Policy:

    • The client (or the processing service itself) will automatically retry the failed operation up to a maximum of 3–5 times.

    • Delays between retries will increase exponentially (e.g., 1 second, 2 seconds, 4 seconds) to prevent overwhelming the backend.

  • Circuit Breaker:

    • If a certain endpoint consistently fails beyond the maximum retry count, the system will “open” a circuit breaker to temporarily stop retrying, log the error, and alert the monitoring system.

2. Manual Intervention and Fallbacks

  • Fallbacks:

    • In cases where a video file fails to process after all retries, the system can fall back to serving a generic error message (with an option to re-initiate the upload manually) and store the failure details for support.

  • User Feedback:

    • The API returns clear error messages and error codes that can be used by the front end to inform users (e.g., “Upload failed due to network issues. Please try again later.”).

  • Logging & Alerting:

    • All errors—including those that trigger retries—are logged with details (error message, timestamp, affected resource). This information is used for debugging and can trigger alerts for critical issues.

D. Implementation Details

  • Error Response Format: Every error response includes a consistent JSON structure:

{
  "error": "Detailed error message describing what went wrong.",
  "error_code": "SPECIFIC_ERROR_CODE"
}
  • Status Code Mapping: Each endpoint in the API has been mapped to return specific HTTP status codes for expected error scenarios.

  • Monitoring & Analytics: Our backend continuously monitors error rates and retry attempts to automatically adjust thresholds and inform engineering teams of potential issues.

Last updated

Was this helpful?