Retry Strategies

amqp-contract has two retry mechanisms that work together: a queue-level retry mode declared on the queue, and handler-level error classification returned by your handler. The first decides how retries happen, the second decides whether a particular failure should retry at all.

This page explains both, and how they compose.

The mental model

When a worker processes a message, three things can happen:

  1. The handler returns ok(undefined) — the message is acked and gone.
  2. The handler returns err(NonRetryableError) — the message is sent straight to the DLQ, no retries (the queue's retry mode is bypassed).
  3. The handler returns err(RetryableError) — the queue's retry mode decides what happens next.

So RetryableError is the only path that consults the retry mode. NonRetryableError is your way of saying "this will never succeed, don't bother."

A separate path exists for parse / schema-validation failures: those go to the DLQ unconditionally, with no retries, regardless of the queue's mode. Retrying a malformed payload would burn the retry budget on a guaranteed failure. See Error Model.
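
The way the two mechanisms compose can be sketched as a single decision function. This is a minimal illustration, not the worker's actual implementation: the type names, the `decide` function, and the `retriesSoFar` argument are all hypothetical stand-ins.

```typescript
// Hypothetical stand-ins for illustration; these are not the library's types.
type HandlerOutcome =
  | "ok"
  | "retryable-error"
  | "non-retryable-error"
  | "validation-failure";
type RetryMode = "none" | "immediate-requeue" | "ttl-backoff";
type Action = "ack" | "dead-letter" | "requeue-now" | "requeue-after-delay";

interface RetryPolicy {
  mode: RetryMode;
  maxRetries: number;
}

// "dead-letter" here means nack with requeue=false: the message reaches the
// DLX if one is configured, otherwise the broker drops it.
function decide(
  outcome: HandlerOutcome,
  policy: RetryPolicy,
  retriesSoFar: number,
): Action {
  if (outcome === "ok") return "ack";
  // NonRetryableError and parse/validation failures never consult the mode.
  if (outcome !== "retryable-error") return "dead-letter";
  // Only RetryableError reaches the queue-level retry mode.
  if (policy.mode === "none" || retriesSoFar >= policy.maxRetries) {
    return "dead-letter";
  }
  return policy.mode === "immediate-requeue"
    ? "requeue-now"
    : "requeue-after-delay";
}
```

Only the last branch depends on the queue's configuration, which is why changing the retry mode never affects how non-retryable or malformed messages are handled.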

Queue-level retry modes

The retry mode is declared on the queue, not the handler:

ts
import { defineQueue, defineExchange } from "@amqp-contract/contract";

const dlx = defineExchange("orders-dlx", { type: "direct" });

const queue = defineQueue("orders-processing", {
  deadLetter: { exchange: dlx, routingKey: "orders.dead" },
  retry: {
    mode: "ttl-backoff",
    maxRetries: 5,
    initialDelayMs: 1000,
    maxDelayMs: 30_000,
    backoffMultiplier: 2,
    jitter: true,
  },
});

There are three modes.

none (default)

No retry. A RetryableError from the handler is treated like a NonRetryableError: the message is nacked with requeue=false, so it goes to the DLX (if configured) or is dropped. Useful when retries already happen at a higher level (e.g. an outbox pattern that retries from the source).

ts
defineQueue("orders", {
  deadLetter: { exchange: dlx },
  retry: { mode: "none" },
});

immediate-requeue

The simplest useful mode: failed messages are requeued immediately, up to maxRetries times. After that, the message goes to the DLQ.

ts
defineQueue("orders", {
  deadLetter: { exchange: dlx },
  retry: { mode: "immediate-requeue", maxRetries: 5 },
});

For quorum queues, retry counts come from RabbitMQ's native x-delivery-count header. For classic queues, the worker maintains a custom x-retry-count header by re-publishing the message.

Use this mode when failures are likely transient and short-lived (a flapping connection, a brief lock contention) and a tight retry loop is acceptable.
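
As a rough sketch of the counting rule described above, the retry count can be read from a different header depending on the queue type. The function names and the `MessageHeaders` type are hypothetical, and the sketch assumes the header values arrive as plain JavaScript numbers and that a missing header means no retries have happened yet.

```typescript
// Hypothetical helper types; not part of amqp-contract.
type QueueType = "quorum" | "classic";
type MessageHeaders = Record<string, unknown>;

// Quorum queues: the broker maintains x-delivery-count.
// Classic queues: the worker maintains x-retry-count when it re-publishes.
function retriesSoFar(queueType: QueueType, headers: MessageHeaders): number {
  const key = queueType === "quorum" ? "x-delivery-count" : "x-retry-count";
  const raw = headers[key];
  // Assumption: an absent or malformed header counts as zero retries.
  return typeof raw === "number" && Number.isInteger(raw) && raw >= 0 ? raw : 0;
}

function shouldRequeue(
  queueType: QueueType,
  headers: MessageHeaders,
  maxRetries: number,
): boolean {
  return retriesSoFar(queueType, headers) < maxRetries;
}
```

With maxRetries set to 5, a classic-queue message carrying x-retry-count: 5 would no longer be requeued under this rule and would go to the DLQ instead.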

ttl-backoff

Failed messages are routed through a wait queue with a per-message TTL, then back to the main queue. Each attempt's delay grows exponentially. The wait queue, retry exchange, and bindings are auto-generated by defineContract — you don't have to wire them by hand.

ts
defineQueue("orders", {
  deadLetter: { exchange: dlx },
  retry: {
    mode: "ttl-backoff",
    maxRetries: 5,
    initialDelayMs: 1000,
    maxDelayMs: 30_000,
    backoffMultiplier: 2, // delay = initial * multiplier^attempt, capped at maxDelayMs
    jitter: true, // ±50% randomisation to avoid thundering herd
  },
});

Use this mode when failures are likely to take time to resolve (a downstream service is degraded, you're hitting a rate limit, a database is recovering). The exponential growth gives the dependency time to come back; jitter spreads load across many in-flight retries.
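
To make the delay formula concrete, here is a small sketch that computes the delay for each attempt. It follows the formula in the comment above (initial * multiplier^attempt, capped at maxDelayMs) with attempts numbered from 0. How the library orders jitter and capping is not stated on this page, so the choice here (jitter the capped value, then clamp again) is an assumption, as are the function and interface names.

```typescript
// Hypothetical names; the option fields mirror the retry config shown above.
interface BackoffOptions {
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
  jitter: boolean;
}

function backoffDelayMs(
  attempt: number, // 0 for the first retry
  opts: BackoffOptions,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const exponential =
    opts.initialDelayMs * Math.pow(opts.backoffMultiplier, attempt);
  const capped = Math.min(exponential, opts.maxDelayMs);
  if (!opts.jitter) return capped;
  // ±50% jitter: scale by a factor in [0.5, 1.5).
  const factor = 0.5 + random();
  // Assumption: the jittered delay is clamped so it never exceeds maxDelayMs.
  return Math.min(Math.round(capped * factor), opts.maxDelayMs);
}

const noJitter: BackoffOptions = {
  initialDelayMs: 1000,
  maxDelayMs: 30_000,
  backoffMultiplier: 2,
  jitter: false,
};
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) =>
  backoffDelayMs(attempt, noJitter),
);
console.log(schedule); // [1000, 2000, 4000, 8000, 16000, 30000]
```

Without jitter, the sixth delay would be 32 s by the formula, so the 30 s cap is what takes effect there.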

Handler-level error classification

Inside a handler, you decide whether a failure is retryable:

ts
import { defineHandler, RetryableError, NonRetryableError } from "@amqp-contract/worker";
import { ResultAsync } from "neverthrow";

const processOrder = defineHandler(contract, "processOrder", ({ payload }) =>
  ResultAsync.fromPromise(callPaymentApi(payload), (error) => {
    // 4xx from a payment provider: retrying won't help.
    if (error instanceof PaymentValidationError) {
      return new NonRetryableError("Invalid payment details", error);
    }
    // 5xx, timeout, network blip: try again.
    return new RetryableError("Payment provider unavailable", error);
  }).map(() => undefined),
);

Rule of thumb: if the same input would produce the same failure tomorrow, it's NonRetryableError. If a transient condition could change, it's RetryableError.
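
For HTTP-backed dependencies, the rule of thumb might be encoded as a small classifier like the one below. This is only a sketch: `classifyHttpFailure` and `Classification` are hypothetical names, and treating 408 (Request Timeout) and 429 (Too Many Requests) as retryable is an assumption that suits most rate-limited or slow providers but should be checked against your own.

```typescript
// Hypothetical helper; map its result to RetryableError / NonRetryableError
// inside your handler's error callback.
type Classification = "retryable" | "non-retryable";

function classifyHttpFailure(statusCode: number | undefined): Classification {
  // No status at all: timeout or network failure, a transient condition.
  if (statusCode === undefined) return "retryable";
  // Assumption: timeouts and rate limits are worth retrying.
  if (statusCode === 408 || statusCode === 429) return "retryable";
  // Other 4xx: the same input would fail the same way tomorrow.
  if (statusCode >= 400 && statusCode < 500) return "non-retryable";
  // 5xx and anything unexpected: assume the dependency may recover.
  return "retryable";
}
```

Keeping the classification in one function makes the policy easy to review and test separately from the handler itself.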

Choosing a mode — quick reference

| Situation | Mode | maxRetries | Notes |
| --- | --- | --- | --- |
| Transient network blips, lock contention | immediate-requeue | 3–5 | Fast, simple |
| Rate-limited downstream | ttl-backoff | 5–10 | Backoff lets the limiter window roll |
| Degraded service, recovery in seconds | ttl-backoff | 5 | initialDelayMs ≈ 1s, maxDelayMs ≈ 30s |
| Outbox-driven (source retries) | none | | Don't double up retries |
| Idempotent and cheap to retry | immediate-requeue | 5–10 | |

Inspecting retry state

The worker enriches messages with diagnostic headers only on retry paths that re-publish the message: classic queues in immediate-requeue mode, and any queue in ttl-backoff mode. Direct-DLQ paths, which use nack(requeue=false), do not modify headers. This covers NonRetryableError, validation/parse failures, and quorum queues in immediate-requeue mode. A message dead-lettered this way looks exactly as the broker delivered it, so plan your DLQ tooling accordingly.

| Header | Meaning | When the worker sets it |
| --- | --- | --- |
| x-delivery-count | RabbitMQ-native attempt count | Set by the broker (quorum queues only); never re-written |
| x-retry-count | Worker-managed attempt count | Republish paths only (classic immediate-requeue, ttl-backoff) |
| x-last-error | Error message from the most recent attempt | Republish paths only |
| x-first-failure-timestamp | Epoch ms of the first failure | Republish paths only |
| x-wait-queue / x-retry-queue | Internal routing pointers used by the ttl-backoff dance | ttl-backoff republish only |

If you need the failure context to be visible on poison messages too, prefer a queue configured with ttl-backoff (which always republishes) or pass maxRetries: 1 so a single republish stamps the headers before the message goes to DLQ.
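
Because these headers may be absent, DLQ tooling has to treat every one of them as optional. The sketch below shows one way to summarise a dead-lettered message defensively. `summarizeFailure`, `FailureSummary`, and `MessageHeaders` are hypothetical names, and the code assumes your AMQP client decodes numeric headers as JavaScript numbers and string headers as strings.

```typescript
// Hypothetical DLQ-inspection helper; not part of amqp-contract.
type MessageHeaders = Record<string, unknown>;

interface FailureSummary {
  attempts: number | undefined;
  lastError: string | undefined;
  firstFailureAt: Date | undefined;
  stampedByWorker: boolean; // true only if a republish path added headers
}

function summarizeFailure(headers: MessageHeaders): FailureSummary {
  const numberHeader = (key: string): number | undefined => {
    const value = headers[key];
    return typeof value === "number" ? value : undefined;
  };
  const retryCount = numberHeader("x-retry-count");
  const deliveryCount = numberHeader("x-delivery-count");
  const firstFailure = numberHeader("x-first-failure-timestamp");
  const lastError = headers["x-last-error"];

  return {
    // Prefer the worker-managed count; fall back to the broker's count.
    attempts: retryCount ?? deliveryCount,
    lastError: typeof lastError === "string" ? lastError : undefined,
    firstFailureAt:
      firstFailure === undefined ? undefined : new Date(firstFailure),
    stampedByWorker: retryCount !== undefined,
  };
}
```

A message that reached the DLQ through a direct nack yields a summary with no error text or first-failure time, which is itself a useful signal about which path it took.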

Pitfalls

  • No DLX configured. nack(requeue=false) will drop the message. The worker logs a warning, but you'll lose the body. Always set deadLetter if you care about poison messages.
  • Throwing instead of returning. A handler that throws an exception bypasses the ResultAsync/Result framework. The worker has a defensive try/catch that nacks to DLQ, but you've lost the chance to classify the error. Always return errAsync(...).
  • Mixing modes per handler. Retry is a property of the queue, not the handler. If you want different policies for different work, give them different queues.
  • Tuning maxRetries without DLQ inspection. Pick a number that makes sense for your latency budget; the right answer almost always lives in the DLQ telemetry, not in your head.
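
When weighing maxRetries against a latency budget in ttl-backoff mode, it can help to add up the worst-case time a message spends waiting before it reaches the DLQ. The sketch below is a back-of-the-envelope calculation under stated assumptions: attempts are numbered from 0, jitter and handler processing time are ignored, and `totalBackoffMs` is a hypothetical helper rather than a library function.

```typescript
// Sum of the capped exponential delays across all retries (no jitter).
function totalBackoffMs(
  maxRetries: number,
  initialDelayMs: number,
  backoffMultiplier: number,
  maxDelayMs: number,
): number {
  let total = 0;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const delay = initialDelayMs * Math.pow(backoffMultiplier, attempt);
    total += Math.min(delay, maxDelayMs);
  }
  return total;
}

console.log(totalBackoffMs(5, 1000, 2, 30_000)); // 31000
console.log(totalBackoffMs(10, 1000, 2, 30_000)); // 181000
```

With the example configuration, doubling maxRetries from 5 to 10 raises the total wait from about 31 seconds to about 3 minutes, because every retry after the fifth waits the full maxDelayMs.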

Released under the MIT License.