How to Read the Observability Dashboard

What This Page Is For

The observability dashboard is the fastest way to answer three questions:

  1. Is Syllecta receiving webhook traffic for this tenant?
  2. Is callback delivery healthy or failing?
  3. Which specific events should I open next for triage?

This guide explains how to read the dashboard and when to jump from summary data into the raw event log.

Where to Find It

  • Tenant users: open your tenant home page. You will see Webhook health followed by Webhook observability.
  • Admins / super admins: open Tenants, select a tenant, then switch to the Observability tab.

The dashboard always scopes to one tenant at a time and uses that tenant's business-day time zone.

Start With the Summary Cards

The top cards tell you whether you have a traffic problem, a delivery problem, or both.

Total Events

This is the number of webhook events received in the selected date range.

Use it to answer:

  • Is traffic reaching Syllecta at all?
  • Did volume suddenly drop after a provider or callback change?

If this number is unexpectedly low, the problem is usually upstream:

  • provider not sending,
  • wrong endpoint,
  • signature secret mismatch,
  • wrong tenant/callback configuration.

Success Rate

This is the percentage of events that finished as processed.

Interpretation:

  • High and stable usually means the pipeline is healthy.
  • Dropping success rate means delivery failures are accumulating faster than normal.
  • Low volume + low success rate often points to a configuration problem rather than random noise.

Treat this as the fastest “health” signal, not as the final diagnosis.
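Conceptually, the card is just the share of events whose terminal status is processed. A minimal sketch, assuming each event record exposes a `status` field ("processed" or "delivery_failed"); the field names are illustrative, not Syllecta's actual schema:

```python
# Sketch of the Success Rate computation. The event shape and the
# "status" field name are assumptions for illustration only.

def success_rate(events: list[dict]) -> float:
    """Percentage of events that finished as processed."""
    if not events:
        return 0.0  # no traffic in the window; report 0 rather than divide by zero
    processed = sum(1 for e in events if e["status"] == "processed")
    return 100.0 * processed / len(events)

events = [
    {"status": "processed"},
    {"status": "processed"},
    {"status": "delivery_failed"},
    {"status": "processed"},
]
print(f"{success_rate(events):.1f}%")  # 75.0%
```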

Failed

This counts events that ended in delivery_failed.

Clicking the card opens Webhooks → Events already filtered to failed deliveries for the same date range.

Use it when you want to move from “something is wrong” to “show me the exact failed rows”.

DLQ

This is the count of events parked in the dead-letter queue (DLQ): deliveries that failed and now need operator follow-up.

Practical reading:

  • Failed > 0 means callbacks are not succeeding
  • DLQ > 0 means operators should inspect and replay where appropriate

If you only remember one rule: Failed tells you something broke; DLQ tells you it now needs operator attention.
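That rule can be sketched in code. This is a hedged illustration: the `status` and `in_dlq` field names are assumptions, not Syllecta's real schema.

```python
# Failed counts terminal delivery failures; DLQ counts the subset still
# parked for operator follow-up. "status" and "in_dlq" are assumed names.

def failed_and_dlq(events: list[dict]) -> tuple[int, int]:
    failed = sum(1 for e in events if e["status"] == "delivery_failed")
    dlq = sum(1 for e in events if e.get("in_dlq", False))
    return failed, dlq

events = [
    {"status": "processed"},
    {"status": "delivery_failed", "in_dlq": True},   # needs operator attention
    {"status": "delivery_failed", "in_dlq": False},  # e.g. already replayed
]
print(failed_and_dlq(events))  # (2, 1)
```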

Use the Range Controls First

The dashboard supports:

  • 7D
  • 30D
  • Refresh
  • Include synthetic

Recommended usage:

  • start with 7D when investigating a fresh incident
  • switch to 30D when you need trend context
  • enable Include synthetic only when you are validating synthetic pipeline monitors or test traffic

By default, synthetic test events are hidden so production traffic stays readable.
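The default filter behaves roughly like this sketch; the `synthetic` flag is an assumed field name, not necessarily what the product stores:

```python
# Unless "Include synthetic" is enabled, test traffic is dropped before
# any card or chart is computed. The "synthetic" flag is illustrative.

def visible_events(events: list[dict], include_synthetic: bool = False) -> list[dict]:
    if include_synthetic:
        return list(events)
    return [e for e in events if not e.get("synthetic", False)]

events = [{"id": 1}, {"id": 2, "synthetic": True}, {"id": 3}]
print([e["id"] for e in visible_events(events)])  # [1, 3]
```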

How to Read Each Section

Events per Day

This chart shows daily webhook volume:

  • green = processed
  • red = delivery failed

How to use it:

  • a single red spike usually points to a short incident window
  • a red band across many days usually means a persistent callback or provider issue
  • a sudden drop in total events usually means traffic stopped arriving upstream

Each day column is clickable and opens the Webhooks event log filtered to that exact day.
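Conceptually, the chart buckets events by calendar day and splits each bucket by outcome. A minimal sketch, with illustrative field names (`received_at`, `status`):

```python
# Bucket events by day: green = processed, red = delivery_failed.
# "received_at" and "status" are assumed field names.

from collections import defaultdict
from datetime import datetime

def events_per_day(events: list[dict]) -> dict:
    buckets = defaultdict(lambda: {"processed": 0, "delivery_failed": 0})
    for e in events:
        day = e["received_at"].date().isoformat()
        buckets[day][e["status"]] += 1
    return dict(buckets)

events = [
    {"received_at": datetime(2025, 3, 1, 9, 0),  "status": "processed"},
    {"received_at": datetime(2025, 3, 1, 14, 0), "status": "delivery_failed"},
    {"received_at": datetime(2025, 3, 2, 8, 0),  "status": "processed"},
]
print(events_per_day(events))
```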

Volume by Provider

This section answers:

  • which provider is sending most of the traffic
  • whether failures are isolated to one provider or spread across all providers

Interpret it like this:

  • one provider degrading while others stay healthy usually means a provider-specific issue
  • all providers degrading at once usually means a shared callback or downstream availability problem
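The per-provider split can be sketched as a simple aggregation; the `provider` and `status` field names are assumptions:

```python
# Total vs. failed counts per provider make a single degrading provider
# stand out. Field names are illustrative, not Syllecta's schema.

from collections import defaultdict

def volume_by_provider(events: list[dict]) -> dict:
    stats = defaultdict(lambda: {"total": 0, "failed": 0})
    for e in events:
        s = stats[e["provider"]]
        s["total"] += 1
        if e["status"] == "delivery_failed":
            s["failed"] += 1
    return dict(stats)

events = [
    {"provider": "provider_a", "status": "processed"},
    {"provider": "provider_a", "status": "delivery_failed"},
    {"provider": "provider_b", "status": "processed"},
]
print(volume_by_provider(events))
```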

Top Failure Reasons

This is usually the quickest path to the root cause.

Examples of what you may see:

  • callback 4xx
  • callback 5xx
  • callback not configured
  • signature/configuration-related reasons

Each row is clickable and opens the Webhooks log filtered to that failure reason.

Use this section when you already know failure exists and need to know why.
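Under the hood this is essentially a frequency count over failed events. A sketch with assumed field names (`status`, `failure_reason`):

```python
# Rank failure reasons across failed deliveries. Field names are
# illustrative, not Syllecta's actual schema.

from collections import Counter

def top_failure_reasons(events: list[dict], n: int = 5) -> list[tuple[str, int]]:
    return Counter(
        e["failure_reason"]
        for e in events
        if e["status"] == "delivery_failed"
    ).most_common(n)

events = [
    {"status": "delivery_failed", "failure_reason": "callback 5xx"},
    {"status": "delivery_failed", "failure_reason": "callback 5xx"},
    {"status": "delivery_failed", "failure_reason": "callback 4xx"},
    {"status": "processed"},
]
print(top_failure_reasons(events))  # [('callback 5xx', 2), ('callback 4xx', 1)]
```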

Top Failing Callback Endpoints

This section shows which callback URLs are causing the most failed deliveries.

Use it to separate:

  • one bad downstream endpoint
  • from a tenant-wide or provider-wide incident

If one callback endpoint dominates the list, the next step is usually to:

  1. open the filtered event log,
  2. inspect the failed rows,
  3. confirm response status and failure detail,
  4. fix the downstream service,
  5. retry events if needed.
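For step 3, the recorded HTTP response status already narrows the diagnosis. A small helper sketch using the standard HTTP status classes (the labels mirror the failure-reason examples above, but are not necessarily the product's exact strings):

```python
# Map a recorded HTTP response status to a rough failure category.
# Ranges follow standard HTTP status classes.

def classify_status(code: int) -> str:
    if 200 <= code < 300:
        return "success"
    if 400 <= code < 500:
        return "callback 4xx"  # usually a request/auth problem at the callback
    if 500 <= code < 600:
        return "callback 5xx"  # usually a downstream availability problem
    return "other"

print(classify_status(503))  # callback 5xx
```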

Last Successful Delivery / Last Failed Delivery

These cards help with recency checks.

Use them to answer:

  • when was the last known good delivery?
  • what is the latest concrete failure I can inspect right now?

They are especially useful after a deploy or secret rotation because they quickly show whether traffic recovered.

Recommended Triage Flow

When someone says “webhooks are broken”, use this sequence:

  1. Open the tenant’s observability dashboard.
  2. Check Total Events.
    • If traffic is zero or much lower than expected, investigate provider ingress first.
  3. Check Success Rate and Failed.
    • If failures exist, click the Failed card.
  4. In the event log, inspect a recent failed row.
    • Look at normalized payload, headers, response status, callback endpoint, failure reason, and detail.
  5. Return to the dashboard and use:
    • Top Failure Reasons to confirm the dominant cause
    • Top Failing Callback Endpoints to confirm scope
  6. After the downstream fix, use Retry from the event log where appropriate.

That is the intended workflow: dashboard first, event log second.
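The first decision in that sequence can be condensed into a sketch. The threshold and the messages are illustrative assumptions, not product behavior:

```python
# Turn the two headline numbers into the first triage step.
# expected_events and the 10% cutoff are illustrative assumptions.

def next_step(total_events: int, success_rate: float,
              expected_events: int = 100) -> str:
    if total_events < 0.1 * expected_events:
        return "investigate provider ingress: traffic is missing"
    if success_rate < 100.0:
        return "open the Failed card and inspect recent failed rows"
    return "pipeline looks healthy for this window"

print(next_step(0, 0.0))
print(next_step(500, 92.5))
```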

What the Numbers Usually Mean

High Total Events + High Success Rate

Normal healthy traffic.

High Total Events + Falling Success Rate

Traffic is arriving, but callback delivery is degrading.

Low Total Events + Low Success Rate

Usually a configuration or upstream issue, not just random delivery failures.

Failures Concentrated in One Provider

Likely provider-specific payload/configuration behavior.

Failures Concentrated in One Callback Endpoint

Likely downstream service problem on your side.

Success Rate Recovers but Failed Events Remain in the Window

The incident may already be over, but historical failed rows still exist in the selected range. Open the event log and focus on the newest failures first.
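These patterns form a rough two-by-two grid: volume versus success rate. As an illustration only (the 50% volume and 99% success thresholds are assumptions, not product-defined):

```python
# Classify the window into the patterns described above. Thresholds
# are illustrative, not product-defined.

def diagnose(total_events: int, success_rate: float, expected_events: int) -> str:
    high_volume = total_events >= 0.5 * expected_events
    healthy = success_rate >= 99.0
    if high_volume and healthy:
        return "normal healthy traffic"
    if high_volume:
        return "traffic arriving, but callback delivery is degrading"
    if not healthy:
        return "likely a configuration or upstream issue"
    return "deliveries succeed but traffic is low: check provider ingress"

print(diagnose(1000, 99.9, 1000))  # normal healthy traffic
```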

When to Leave the Dashboard

The dashboard is for pattern recognition. Leave it and open raw events when you need:

  • exact payload content,
  • headers,
  • response status,
  • failure detail,
  • manual retry,
  • correlation-level debugging.

Use the dashboard to narrow the search. Use Webhooks → Events to resolve the incident.

Related Guides