Slack alerts for automation: practical setup

A first retainer typically starts with "all alerts to #automation-alerts". After 2 weeks the channel gets 200 messages a day, no one reads it, and a critical incident sits for 4 hours before someone notices. This guide explains how to avoid that trap.

An alert no one reads is not an alert; it is noise. Good observability means fewer messages, routed more precisely.

Webhook vs Bot API: which to use when

Slack has two main APIs for sending messages:

Incoming Webhook (simplest): URL → POST JSON → message. Setup: 5 min. Limitation: one URL per channel, no interaction.
Bot API (fuller): token + scope → send to any channel, threads, reactions, interactive buttons. Setup: 15-30 min. May require workspace admin approval, depending on workspace settings.

Rule of thumb: if you only send information, use a webhook. If you need interaction (Ack, Mute 4h, Escalate buttons), use the Bot API.
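
As a rough sketch of the difference, the two send paths might look like this in Python using only the standard library. The helper names (`webhook_request`, `bot_request`, `send`) are made up for illustration, and the webhook URL and token you pass in are your own. `chat.postMessage` is the Slack Web API method for posting a message.

```python
import json
import urllib.request


def webhook_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build the POST for an Incoming Webhook. The channel is fixed by the URL."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bot_request(token: str, channel: str, text: str) -> urllib.request.Request:
    """Build the POST for chat.postMessage. The channel is chosen per message."""
    body = json.dumps({"channel": channel, "text": text}).encode("utf-8")
    return urllib.request.Request(
        "https://slack.com/api/chat.postMessage",
        data=body,
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it keeps the payload easy to inspect in tests without touching the network.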

Routing per priority

Instead of 1 channel, use 3:

#ax-alerts-critical: pipeline down, data corrupted, security incident. Ping @here or @oncall. Goal: response <15 min.
#ax-alerts-high: degraded performance, partial failures, a parser starting to fail. No ping. Goal: response within 4h during business hours.
#ax-logs: normal operations, daily summary, scheduled job completed. Read-only, muted by default. Goal: reference when needed.

Critical threshold: use critical only when action is required within 1h. Everything else is high or logs. The discipline that goes with it: respond to every critical alert within 15 min.
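
One way to encode this routing is a small lookup table plus a classifier. This is a minimal sketch: the `Route` type, the `classify` inputs (`action_deadline_min`, `degraded`) and the severity names are assumptions chosen for illustration. `<!here>` is how Slack message text expresses an @here mention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    channel: str
    mention: str  # empty string means "no ping"


# Channel names follow the three-channel layout described above.
ROUTES = {
    "critical": Route("#ax-alerts-critical", "<!here>"),
    "high": Route("#ax-alerts-high", ""),
    "info": Route("#ax-logs", ""),
}


def classify(action_deadline_min: Optional[int], degraded: bool = False) -> str:
    """Critical only when action is required within an hour; else high or info."""
    if action_deadline_min is not None and action_deadline_min < 60:
        return "critical"
    return "high" if degraded else "info"


def route(severity: str) -> Route:
    """Look up the channel and mention for a severity."""
    return ROUTES[severity]
```

For example, `route(classify(30))` resolves to the critical channel with a ping, while `route(classify(240, degraded=True))` resolves to the high channel without one.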

Anatomy of a good alert

Every alert message should contain:

Severity icon — 🔴 critical, 🟡 high, 🔵 info (one glance enough)
System ID — e.g. OPS-25-K7 — immediately clear whose system
1-line summary — what broke (not "error", but "parser failed on nike.com — selector .price-now changed")
Impact — what this means operationally ("0 SKUs collected last 2h, retry queue: 47")
Direct link — to dashboard / logs / runbook
Recommended action — simplest next step
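
The six fields above map naturally onto a small data structure. The sketch below is illustrative only: the `Alert` class, its field names and the exact line layout in `render` are assumptions, not a fixed format.

```python
from dataclasses import dataclass

# Severity icons as listed above.
ICONS = {"critical": "🔴", "high": "🟡", "info": "🔵"}


@dataclass
class Alert:
    severity: str   # "critical", "high" or "info"
    system_id: str  # e.g. "OPS-25-K7"
    summary: str    # one line: what broke
    impact: str     # what it means operationally
    link: str       # dashboard, logs or runbook
    action: str     # simplest next step

    def render(self) -> str:
        """Render the alert as plain message text, severity first."""
        return "\n".join([
            f"{ICONS[self.severity]} [{self.system_id}] {self.summary}",
            f"Impact: {self.impact}",
            f"Action: {self.action}",
            f"Link: {self.link}",
        ])
```

Making every field required means an alert without impact or a recommended action cannot be constructed in the first place.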

Practical templates

Critical alert (parser failed)

🔴 [OPS-25-K7] Parser failed | nike.com
14 selectors broken, likely site redesign
Impact: 0/247 SKUs collected last 90 min
Action: fix parser within 4h
📊 Dashboard: ax.io/ops-25-k7 | 📚 Runbook: ax.io/rb/parser-fail

Daily summary (info)

🔵 [OPS-25-K7] Daily summary | 2026-03-25
✅ 247/247 SKUs collected (100%)
📈 12 prices changed (3 down, 9 up)
⏱️ Total runtime: 4m 32s
📊 Dashboard: ax.io/ops-25-k7

Anti-patterns

What not to do:

Alert per row — "Found 47 new prices" as 47 separate messages. Aggregate.
Alert without context — "Error" or "Failed task". What? Where? What are the consequences?
"All OK" alert — every minute "system OK". No one reads it, hides real alerts.
@channel for normal events — pings reserved for critical. Otherwise the team stops reacting.
Same alert every 5 min — when a parser fails for an hour, 1 alert + escalate, not 12.
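
Two of these anti-patterns (alert per row, same alert every 5 min) can be avoided with a few lines of state. The following is a minimal in-memory sketch, assuming a single process. The `AlertThrottle` class, the incident key format and the one-hour window are illustrative choices, not a prescribed design.

```python
import time
from typing import Callable, Dict, List


class AlertThrottle:
    """Send one alert per incident, then stay quiet until a re-alert window passes."""

    def __init__(self, realert_after_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.realert_after_s = realert_after_s
        self.clock = clock  # injectable so tests can control time
        self._last_sent: Dict[str, float] = {}
        self._suppressed: Dict[str, int] = {}

    def should_send(self, incident_key: str) -> bool:
        """Return True for the first alert and again once the window has elapsed."""
        now = self.clock()
        last = self._last_sent.get(incident_key)
        if last is not None and now - last < self.realert_after_s:
            self._suppressed[incident_key] = self._suppressed.get(incident_key, 0) + 1
            return False
        self._last_sent[incident_key] = now
        return True

    def suppressed_count(self, incident_key: str) -> int:
        """How many repeats were swallowed for this incident so far."""
        return self._suppressed.get(incident_key, 0)

    def resolve(self, incident_key: str) -> None:
        """Forget the incident so the next failure alerts immediately."""
        self._last_sent.pop(incident_key, None)
        self._suppressed.pop(incident_key, None)


def summarize(rows: List[str], noun: str) -> str:
    """Collapse many per-row events into a single message line."""
    return f"Found {len(rows)} new {noun}"
```

A key such as `"OPS-25-K7:parser:nike.com"` (a hypothetical format) identifies the incident. When `should_send` returns True a second time for the same key, that is a natural point to escalate rather than repeat the original message.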

Snooze + acknowledgment flow

Bot API allows interactive buttons. Practical implementation:

[Ack] — "I know about the problem, working on it". Suspends re-alerts for 1h.
[Mute 4h] — "I know, fix is coming, do not spam". Mutes for 4h.
[Resolved] — incident closed, opens post-mortem template.
[Escalate] — pings on-call + creates Linear ticket.

Without this, the team lives with the constant question "has someone seen this yet?". With buttons, ownership is clear.
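
The four buttons can be expressed as a Block Kit `actions` block attached to the alert message. The sketch below only builds the payload and computes how long re-alerts stay suspended. The `action_id` values, the helper names and the idea of carrying the incident ID in `value` are assumptions. Handling the click (including the post-mortem template and the Linear ticket) would live in a separate interaction handler that is not shown here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# How long each button suspends re-alerts, per the flow described above.
SUSPEND = {"ack": timedelta(hours=1), "mute_4h": timedelta(hours=4)}


def action_buttons(incident_id: str) -> dict:
    """Build a Block Kit actions block with the four buttons."""
    def button(label: str, action_id: str) -> dict:
        return {
            "type": "button",
            "text": {"type": "plain_text", "text": label},
            "action_id": action_id,
            "value": incident_id,  # lets the handler know which incident was clicked
        }

    return {
        "type": "actions",
        "elements": [
            button("Ack", "ack"),
            button("Mute 4h", "mute_4h"),
            button("Resolved", "resolved"),
            button("Escalate", "escalate"),
        ],
    }


def quiet_until(action_id: str, clicked_at: datetime) -> Optional[datetime]:
    """Return when re-alerts may resume, or None if the action does not mute."""
    pause = SUSPEND.get(action_id)
    if pause is None:
        return None
    return clicked_at + pause
```

The block would be passed in the `blocks` list of a `chat.postMessage` call, alongside a plain `text` fallback.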

The point

3 channels (critical / high / logs), a severity-first format, 1 alert per incident rather than per row, and acknowledge buttons. Setup of a single pipeline takes ~30 min; setup of an entire stack (5-10 pipelines) takes half a day. ROI: in the first 2 weeks of production, the team saves 10+ hours otherwise spent working out what is happening.

We ship most of these techniques to production.