How to Read the Observability Dashboard
What This Page Is For
The observability dashboard is the fastest way to answer three questions:
- Is Syllecta receiving webhook traffic for this tenant?
- Is callback delivery healthy or failing?
- Which specific events should I open next for triage?
This guide explains how to read the dashboard and when to jump from summary data into the raw event log.
Where to Find It
- Tenant users: open your tenant home page. You will see Webhook health followed by Webhook observability.
- Admins / super admins: open Tenants, select a tenant, then switch to the Observability tab.
The dashboard always scopes to one tenant at a time and uses that tenant's business-day time zone.
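Because counts follow the tenant's business day, an event received shortly after midnight UTC can appear under the previous calendar day on the dashboard. The sketch below illustrates that effect only. The function name and the fixed UTC-5 offset are assumptions made for the example, not a description of how Syllecta implements bucketing.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def business_day(received_at: datetime, tenant_tz: tzinfo) -> str:
    """Return the tenant-local calendar date (ISO format) for an aware timestamp."""
    return received_at.astimezone(tenant_tz).date().isoformat()


# Hypothetical tenant zone: a fixed UTC-5 offset keeps the example self-contained.
# A real tenant zone would normally be an IANA zone, e.g. zoneinfo.ZoneInfo("America/New_York").
tenant_tz = timezone(timedelta(hours=-5))

# Received at 02:30 UTC on 2 March, which is still 1 March for this tenant.
received_at = datetime(2024, 3, 2, 2, 30, tzinfo=timezone.utc)
print(business_day(received_at, tenant_tz))  # 2024-03-01
```

When a provider's own logs are in UTC, convert before comparing daily totals with the dashboard.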
Start With the Summary Cards
The top cards tell you whether you have a traffic problem, a delivery problem, or both.
Total Events
This is the number of webhook events received in the selected date range.
Use it to answer:
- Is traffic reaching Syllecta at all?
- Did volume suddenly drop after a provider or callback change?
If this number is unexpectedly low, the problem is usually upstream:
- provider not sending,
- wrong endpoint,
- signature secret mismatch,
- wrong tenant/callback configuration.
Success Rate
This is the percentage of events that finished as processed.
Interpretation:
- High and stable usually means the pipeline is healthy.
- A dropping success rate means delivery failures are accumulating faster than normal.
- Low volume + low success rate often points to a configuration problem rather than random noise.
Treat this as the fastest “health” signal, not as the final diagnosis.
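As a rough illustration of the arithmetic behind this card, the sketch below divides processed events by all events received in the range. Whether the dashboard counts in-flight events in the denominator is not stated in this guide, so treat the formula and the function name as assumptions.

```python
from typing import Optional


def success_rate(processed: int, total: int) -> Optional[float]:
    """Percentage of events that finished as processed, or None when there is no traffic."""
    if total == 0:
        return None  # no events in range, so a rate would be meaningless
    return 100.0 * processed / total


print(success_rate(processed=940, total=1000))  # 94.0
print(success_rate(processed=0, total=0))       # None
```

Returning None for an empty range keeps "no traffic" distinct from "0% success", which are different problems.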
Failed
This counts events that ended in delivery_failed.
Clicking the card opens Webhooks → Events already filtered to failed deliveries for the same date range.
Use it when you want to move from “something is wrong” to “show me the exact failed rows”.
DLQ
This is the operational count for events that need follow-up after delivery failure.
Practical reading:
- Failed > 0 means at least one callback delivery did not succeed in the selected range.
- DLQ > 0 means operators should inspect those events and replay them where appropriate.
If you only remember one rule: Failed tells you something broke; DLQ tells you it now needs operator attention.
Use the Range Controls First
The dashboard supports:
- 7D
- 30D
- Refresh
- Include synthetic
Recommended usage:
- start with 7D when investigating a fresh incident
- switch to 30D when you need trend context
- enable Include synthetic only when you are validating synthetic pipeline monitors or test traffic
By default, synthetic test events are hidden so production traffic stays readable.
How to Read Each Section
Events per Day
This chart shows daily webhook volume:
- green = processed
- red = delivery failed
How to use it:
- a single red spike usually points to a short incident window
- a red band across many days usually means a persistent callback or provider issue
- a sudden drop in total events usually means traffic stopped arriving upstream
Each day column is clickable and opens the Webhooks event log filtered to that exact day.
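The reading rules for this chart can be sketched as two small checks over per-day counts. The DayCounts shape, the band_days and ratio thresholds, and the "intermittent" label are all assumptions chosen for illustration; the dashboard itself does not expose these names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DayCounts:
    day: str        # tenant business day, e.g. "2024-03-01"
    processed: int  # green segment
    failed: int     # red segment


def failure_pattern(days: List[DayCounts], band_days: int = 3) -> str:
    """Label the red segments across the range (band_days is an assumed threshold)."""
    failing = sum(1 for d in days if d.failed > 0)
    if failing == 0:
        return "no failures"
    if failing == 1:
        return "single spike"      # usually a short incident window
    if failing >= band_days:
        return "persistent band"   # usually a lasting callback or provider issue
    return "intermittent"          # hypothetical middle category


def volume_dropped(days: List[DayCounts], ratio: float = 0.5) -> bool:
    """True when the latest day's total is below `ratio` times the earlier daily average."""
    if len(days) < 2:
        return False
    earlier = [d.processed + d.failed for d in days[:-1]]
    latest = days[-1].processed + days[-1].failed
    return latest < ratio * (sum(earlier) / len(earlier))


week = [
    DayCounts("2024-03-01", 120, 0),
    DayCounts("2024-03-02", 95, 30),
    DayCounts("2024-03-03", 118, 0),
    DayCounts("2024-03-04", 20, 0),
]
print(failure_pattern(week))  # single spike
print(volume_dropped(week))   # True
```

In this sample the two signals are independent: the red spike on 2 March is a delivery incident, while the low total on 4 March points upstream.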
Volume by Provider
This section answers:
- which provider is sending most of the traffic
- whether failures are isolated to one provider or spread across all providers
Interpret it like this:
- one provider degrading while others stay healthy usually means a provider-specific issue
- all providers degrading at once usually means a shared callback or downstream availability problem
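The isolated-versus-shared reading above can be expressed as a short comparison of per-provider failure rates. This is a sketch under stated assumptions: the input shape, the 5% threshold, and the provider names are hypothetical.

```python
from typing import Dict, Tuple


def failure_scope(
    by_provider: Dict[str, Tuple[int, int]],  # provider -> (processed, failed)
    degraded_above: float = 0.05,             # assumed failure-rate threshold
) -> str:
    """Classify failures as provider-specific or shared across all providers."""
    degraded = []
    for provider, (processed, failed) in by_provider.items():
        total = processed + failed
        if total > 0 and failed / total > degraded_above:
            degraded.append(provider)
    if not degraded:
        return "healthy"
    if len(by_provider) == 1:
        return "inconclusive"  # a single provider cannot show isolated vs shared
    if len(degraded) == len(by_provider):
        return "shared"        # likely callback or downstream availability
    return "provider-specific: " + ", ".join(sorted(degraded))


print(failure_scope({"provider_a": (900, 10), "provider_b": (400, 200)}))
# provider-specific: provider_b
```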
Top Failure Reasons
This is usually the quickest path to the root cause.
Examples of what you may see:
- callback 4xx
- callback 5xx
- callback not configured
- signature/configuration-related reasons
Each row is clickable and opens the Webhooks log filtered to that failure reason.
Use this section when you already know failure exists and need to know why.
Top Failing Callback Endpoints
This section shows which callback URLs are causing the most failed deliveries.
Use it to separate one bad downstream endpoint from a tenant-wide or provider-wide incident.
If one callback endpoint dominates the list, the next step is usually to:
- open the filtered event log,
- inspect the failed rows,
- confirm response status and failure detail,
- fix the downstream service,
- retry events if needed.
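To make "dominates the list" concrete, the sketch below checks whether a single callback URL accounts for most failed deliveries. The 60% share and the example URLs are assumptions; pick a threshold that matches your own traffic.

```python
from collections import Counter
from typing import Iterable, Optional


def dominant_endpoint(failed_urls: Iterable[str], share: float = 0.6) -> Optional[str]:
    """Return the callback URL behind at least `share` of failures, or None if none dominates."""
    counts = Counter(failed_urls)
    if not counts:
        return None
    url, hits = counts.most_common(1)[0]
    return url if hits / sum(counts.values()) >= share else None


# Hypothetical failed-delivery URLs taken from a filtered event log export.
failed = ["https://a.example/hook"] * 8 + ["https://b.example/hook"] * 2
print(dominant_endpoint(failed))  # https://a.example/hook
```

A None result suggests the failures are spread out, which points toward a tenant-wide or provider-wide incident rather than one bad service.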
Last Successful Delivery / Last Failed Delivery
These cards help with recency checks.
Use them to answer:
- when was the last known good delivery?
- what is the latest concrete failure I can inspect right now?
They are especially useful after a deploy or secret rotation because they quickly show whether traffic recovered.
Recommended Triage Flow
When someone says “webhooks are broken”, use this sequence:
- Open the tenant’s observability dashboard.
- Check Total Events.
  - If traffic is zero or much lower than expected, investigate provider ingress first.
- Check Success Rate and Failed.
- If failures exist, click the Failed card.
- In the event log, inspect a recent failed row.
  - Look at normalized payload, headers, response status, callback endpoint, failure reason, and detail.
- Return to the dashboard and use:
  - Top Failure Reasons to confirm the dominant cause
  - Top Failing Callback Endpoints to confirm scope
- After the downstream fix, use Retry from the event log where appropriate.
That is the intended workflow: dashboard first, event log second.
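The first branches of this sequence can be written as a small decision function. It is a sketch, not product behavior: expected_events stands for whatever baseline the operator considers normal, and low_volume_ratio is an assumed cutoff for "much lower than expected".

```python
def next_triage_step(
    total_events: int,
    expected_events: int,           # operator's own baseline (assumption)
    failed: int,
    low_volume_ratio: float = 0.5,  # assumed cutoff for "much lower than expected"
) -> str:
    """Suggest the next step, checking traffic volume before delivery failures."""
    if total_events == 0 or total_events < expected_events * low_volume_ratio:
        return "investigate provider ingress"
    if failed > 0:
        return "open the Failed card and inspect a recent failed row"
    return "no delivery failures in range"


print(next_triage_step(total_events=0, expected_events=500, failed=0))
# investigate provider ingress
print(next_triage_step(total_events=480, expected_events=500, failed=12))
# open the Failed card and inspect a recent failed row
```

Checking volume first mirrors the order of the steps: a failure count means little if traffic never arrived.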
What the Numbers Usually Mean
High Total Events + High Success Rate
Normal healthy traffic.
High Total Events + Falling Success Rate
Traffic is arriving, but callback delivery is degrading.
Low Total Events + Low Success Rate
Usually a configuration or upstream issue, not just random delivery failures.
Failures Concentrated in One Provider
Likely provider-specific payload or configuration behavior.
Failures Concentrated in One Callback Endpoint
Likely a downstream service problem on your side.
Success Rate Recovers but Failed Count Exists in the Window
The incident may already be over, but historical failed rows still exist in the selected range. Open the event log and focus on the newest failures first.
When to Leave the Dashboard
The dashboard is for pattern recognition. Leave it and open raw events when you need:
- exact payload content,
- headers,
- response status,
- failure detail,
- manual retry,
- correlation-level debugging.
Use the dashboard to narrow the search. Use Webhooks → Events to resolve the incident.