Troubleshooting
Common problems when scans fail, CI rejects your key, you hit concurrency limits, or customer agents do not claim jobs.
Crawl failures
When a run shows Failed on the dashboard or in run history, open the run detail—the error message is the starting point. The run detail page also links here when a crawl fails.
| Symptom | What to check |
|---|---|
| Run failed immediately | Confirm the sitemap URL is reachable from the execution route (cloud vs customer agent). Check auth, firewall, and that the URL returns valid XML—not an HTML login page. |
| Sitemap fetch or parse error | Typical messages: Failed to fetch sitemap, HTTP 4xx/5xx on the sitemap, is not valid XML, or an empty urlset. Verify the sitemap in a browser or with curl; fix redirects, compression, and namespace issues. |
| Timeout or cancelled mid-crawl | Large sites may need a higher page limit or lower concurrency in advanced options. Agent crawls can also hit API report limits—see Agents and Customer agent setup. |
| Run stuck on Pending or Running | Pending on an agent route usually means no agent claimed the job (pool mismatch or agent offline). Running with no progress may be a stuck claim—see Agents. Cloud runs that hang may eventually fail via stale-job reconciliation. |
| Only one crawl at a time per site | A second start for the same sitemap while another run is active may be rejected or queued depending on trigger. Finish or cancel the active run first. |
To interpret completed runs (errors vs warnings vs deploy diff), see Reading your report.
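The sitemap checks in the table above can be approximated offline before starting a run. The following is a minimal sketch, not Signal Diff's own validation: `diagnose_sitemap` and its return strings are hypothetical names chosen for illustration, and it assumes you have already fetched the sitemap (for example with curl) and have the HTTP status and raw body in hand.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def diagnose_sitemap(status: int, body: bytes) -> str:
    """Return a short diagnosis for a fetched sitemap (hypothetical helper)."""
    if status >= 400:
        return f"HTTP {status} on the sitemap"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # A login page is usually HTML that does not parse as XML.
        head = body[:512].lower()
        if b"<html" in head or b"<!doctype html" in head:
            return "HTML returned instead of XML (login page?)"
        return "not valid XML"
    tag = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix, if any
    if tag == "html":
        return "HTML returned instead of XML (login page?)"
    if tag not in ("urlset", "sitemapindex"):
        return f"unexpected root element <{tag}>"
    if tag == "urlset" and not root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc"):
        return "empty urlset, or the sitemap namespace is missing"
    return "ok"
```

Anything other than `ok` points at the row to revisit: an HTTP error suggests auth or firewall problems, an HTML result suggests a redirect to a login page, and an empty urlset often means the `xmlns` attribute is missing or wrong.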
Rate limits (429 — too many concurrent crawls)
Signal Diff limits how many crawls can be Pending or Running at once per account. Manual dashboard runs, schedules, CI triggers, and agent-backed jobs all count toward the same tenant limit.
| Account state | Max concurrent crawls |
|---|---|
| Free (no active paid API key) | 1 |
| Paid (at least one active API key) | 3 |
When you exceed the limit, new starts return 429 Too Many Requests with a message like: Too many crawls are already running for your account.
- Open the dashboard and cancel or wait for Pending/Running jobs—including agent crawls you forgot about.
- CI pipelines that retry quickly can hit 429 until an earlier run finishes; stagger workflows or cancel stuck runs.
- Plan limits and upgrade path: Plans and limits — concurrent crawls and Pricing.
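For pipelines that retry on 429, spacing the attempts out gives earlier runs time to finish. The sketch below shows one way to do that with exponential backoff. Everything in it is an assumption for illustration: `start_with_backoff` is a hypothetical helper, `trigger` stands for whatever function starts your crawl and returns the HTTP status code, and the 30-second base delay is arbitrary rather than a documented Signal Diff value.

```python
import time
from typing import Callable


def start_with_backoff(
    trigger: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call trigger() until it stops returning 429 or attempts run out.

    Returns the last HTTP status seen. Delays double after each 429
    (30 s, 60 s, 120 s, ... with the default base_delay).
    """
    status = 429
    for attempt in range(max_attempts):
        status = trigger()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Wait longer each time so an earlier crawl can finish.
            sleep(base_delay * 2 ** attempt)
    return status
```

If the function still returns 429 after the last attempt, retrying harder is unlikely to help; check the dashboard for Pending or Running jobs that need to be cancelled.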
CI (401, 404, and fail_mode)
CI triggers require a paid plan, repository secrets, and a reachable sitemap (or agent routing for internal URLs). Step-by-step setup: CI and GitHub Actions setup.
| Symptom | Likely fix |
|---|---|
| 401 on POST /api/trigger/ci | Wrong, revoked, or expired SIGNALDIFF_CI_API_KEY. Create or rotate on Developers → API keys, update every repository secret, and re-run. See API keys lifecycle guide. |
| 404 on /api/trigger/ci | SIGNALDIFF_API_BASE_URL must be the site origin only, for example https://signaldiff.dev, not https://signaldiff.dev/api. The action posts to {origin}/api/trigger/ci; an extra /api produces /api/api/trigger/ci. |
| 429 on CI trigger | Tenant concurrent crawl limit—see Rate limits. Wait for or cancel active runs on your account. |
| Workflow failed but crawl “succeeded” | fail_mode gates the job after the crawl completes. error fails when errorCount > 0; errorOrWarning also fails on warnings; none never fails on finding counts alone. Crawl execution failures (timeout, unreachable sitemap) always fail regardless of mode. |
| Workflow passed but you expected failure | Check whether fail_mode is set to none (report-only). For pull requests, gating on new findings since the CI baseline is often better than raw totals; see Baselines and diffs. |
| CI crawl stays Pending (agent mode) | execution_mode: agent requires a running enrolled agent with a matching agent_pool_id; see Agents. |
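The 404 row in the table above comes down to how the trigger URL is assembled from SIGNALDIFF_API_BASE_URL. A small pre-flight check, sketched here with a hypothetical `ci_trigger_url` helper, can catch a base URL that carries an extra path before the workflow posts anything. The only facts taken from this page are the `/api/trigger/ci` path and the origin-only rule; the validation logic itself is an assumption.

```python
from urllib.parse import urlsplit

CI_TRIGGER_PATH = "/api/trigger/ci"  # path named in the table above


def ci_trigger_url(base_url: str) -> str:
    """Build the CI trigger URL, rejecting base URLs that are not a bare origin."""
    parts = urlsplit(base_url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {base_url!r}")
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        # e.g. https://signaldiff.dev/api would yield /api/api/trigger/ci
        raise ValueError(
            f"SIGNALDIFF_API_BASE_URL must be the origin only, got {base_url!r}"
        )
    return f"{parts.scheme}://{parts.netloc}{CI_TRIGGER_PATH}"
```

With this check, `https://signaldiff.dev` and `https://signaldiff.dev/` both resolve to the same trigger URL, while `https://signaldiff.dev/api` raises instead of silently producing a 404.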
Full fail_mode table and PR comment permissions: see CI setup (fail mode).
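The fail_mode rules described in the table above reduce to a small decision function. This is a sketch of the documented behavior, not the action's source code: `ci_should_fail` and the `warning_count` parameter are hypothetical names, and "fails on warnings" is assumed to mean a warning count greater than zero.

```python
def ci_should_fail(
    fail_mode: str,
    error_count: int,
    warning_count: int,
    crawl_failed: bool = False,
) -> bool:
    """Decide whether the CI job should fail, per the fail_mode rules above."""
    if crawl_failed:
        # Execution failures (timeout, unreachable sitemap) fail in every mode.
        return True
    if fail_mode == "error":
        return error_count > 0
    if fail_mode == "errorOrWarning":
        return error_count > 0 or warning_count > 0
    if fail_mode == "none":
        # Report-only: finding counts alone never fail the job.
        return False
    raise ValueError(f"unknown fail_mode: {fail_mode!r}")
```

For example, a run with zero errors and three warnings passes under `error` but fails under `errorOrWarning`, which is the usual explanation when two workflows disagree about the same crawl.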
Customer agents
Agents pull work over HTTPS; jobs stay Pending until an authenticated agent claims them. For enroll, install, pools, and rotation, use Customer agents and the full Customer agent setup guide—this section covers common failure modes.
| Symptom | What to check |
|---|---|
| Heartbeat fails (401/403) | Credential expired or rotated; re-enroll on Customer agents and update appsettings.json or container env on every host. Verify ApiBaseUrl matches your site origin. |
| Heartbeat OK, no crawls | Job executionMode must be agent. API Features:EnableAgentRouting must be true. Jobs belong to the GitHub user who started them, so the agent credential tenant must match. |
| Jobs stay Pending | Agent not running, outbound HTTPS blocked, or pool mismatch: crawl agentPoolId must match the enrolled agent (empty string = default pool on both sides). |
| UI shows Running, agent idle | Stuck run: a prior claim set Running but the agent exited before report. Agents only claim Pending jobs. Wait for stale reconciliation (hours) or clear the job when the UI allows. |
| 429 on enroll / heartbeat / claim | Per-agent protocol rate limits—avoid duplicate processes with the same agent ID and reduce aggressive polling. |
| Report or progress errors (500/403) | Large crawls use chunked upload; oversized batches or missing API endpoints can cause finalize to fail. Cap pages with crawl MaxPages, or see Report page coverage for storage limits and AgentProtocol:MaxReportPages. Late progress POSTs after finalize may log harmless 403 replay errors; upgrade the agent if they are noisy. |
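Several rows in the table above describe conditions that must all hold before an agent claims a job. The sketch below collects them into one checklist function as a debugging aid. It is an illustration only: the `Job` and `Agent` classes, their field names, and `claim_blockers` are hypothetical and do not mirror the real API payloads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    status: str          # e.g. "Pending" or "Running"
    execution_mode: str  # must be "agent" for agent routing
    agent_pool_id: str   # "" means the default pool
    tenant: str          # the GitHub user who started the job


@dataclass
class Agent:
    pool_id: str         # "" means the default pool
    tenant: str          # tenant of the agent credential
    running: bool = True


def claim_blockers(job: Job, agent: Agent, routing_enabled: bool = True) -> List[str]:
    """List the reasons, per the table above, why this agent would not claim this job."""
    blockers = []
    if not routing_enabled:
        blockers.append("Features:EnableAgentRouting is not true")
    if job.execution_mode != "agent":
        blockers.append("job executionMode is not agent")
    if job.status != "Pending":
        blockers.append("agents only claim Pending jobs")
    if not agent.running:
        blockers.append("agent is not running")
    if job.tenant != agent.tenant:
        blockers.append("agent credential tenant does not match the job owner")
    if job.agent_pool_id != agent.pool_id:
        blockers.append("agentPoolId does not match the enrolled agent's pool")
    return blockers
```

An empty list means none of the documented mismatches apply, which leaves network causes such as blocked outbound HTTPS as the next thing to check.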
Production checklist: keep heartbeats running, alert when claims stop, and align schedule/CI pool IDs with enrolled agents.
Operator detail for stale heartbeats and fleet status: repository doc docs/agent-offline-and-heartbeat.md (which links here as the canonical user-facing summary).
Report page coverage (large sites)
Signal Diff crawls every URL in your sitemap, but per-page detail in the dashboard, HTML export, and stored run payload is capped to keep Cosmos documents and API responses bounded. Site-wide counts (errors, warnings, info, total pages) always reflect the full crawl.
When stored detail covers fewer pages than were crawled, the run report shows a yellow "Not all findings are listed below" banner with stored vs total page counts.
Default cap and selection order
The API stores at most 25 pages per run by default (AgentProtocol:MaxReportPages).
When a crawl exceeds that limit, pages are ranked and the highest-priority rows are kept:
- Pages with error-level findings (including HTTP errors)
- Pages with warning-level findings
- Pages with info-level findings
- Clean pages (no findings), in crawl order
Within each page, individual findings may also be trimmed when a page has many issues (highest severity first).
Customer agents use the same selection before upload; align SignalDiffAgent:MaxReportPages with the API setting if you raise the cap.
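The selection order above can be sketched as a stable sort on each page's worst finding, followed by a cut at the cap. This is an illustrative model, not the API's implementation: `Page`, `page_rank`, and `select_pages` are hypothetical names, and keeping crawl order inside the error, warning, and info tiers is an assumption (the page only states crawl order for clean pages).

```python
from dataclasses import dataclass
from typing import List

# Lower rank = higher priority; clean pages get rank 3.
SEVERITY_RANK = {"error": 0, "warning": 1, "info": 2}


@dataclass
class Page:
    url: str
    severities: List[str]  # severity of each finding on the page


def page_rank(page: Page) -> int:
    """Rank a page by its most severe finding."""
    return min((SEVERITY_RANK[s] for s in page.severities), default=3)


def select_pages(pages: List[Page], cap: int = 25) -> List[Page]:
    """Keep at most `cap` pages, highest-priority first.

    sorted() is stable, so pages with the same rank stay in crawl order.
    """
    return sorted(pages, key=page_rank)[:cap]
```

With a cap of 25, a crawl that finds 30 pages with errors would store only error pages, which is why the site-wide counts can be much larger than what the per-page list shows.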
Deploy diff URL list
The deploy diff card compares this run to your baseline. The URLs changed count is site-wide, but the expandable path-level list is capped at 100 rows, prioritizing new errors, new warnings, finding changes, then title/description/status changes.
What you can do
| Goal | Option |
|---|---|
| Sample a large site | Set crawl MaxPages in advanced options, schedules, or CI payload so the crawl itself stops after N URLs. |
| See more per-page detail in reports | Operators can raise AgentProtocol:MaxReportPages on the API (Azure app setting AgentProtocol__MaxReportPages). Higher values increase Cosmos document size and API response time; see the operator runbook below. |
| Agent report upload failures | Very large chunked uploads can fail at the gateway. Lower MaxPages on the crawl or reduce stored pages before re-running. See Agents for chunked upload errors. |
Operator guidance for raising the API cap: repository doc docs/agent-offline-and-heartbeat.md (MaxReportPages).
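The two spellings of the cap setting above (AgentProtocol:MaxReportPages and AgentProtocol__MaxReportPages) differ only in the separator: environment-style settings cannot contain a colon, so a double underscore stands in for it. The helper below is a tiny sketch of that mapping. `to_env_name` is a hypothetical name, and applying the same rule to agent keys such as SignalDiffAgent:MaxReportPages assumes the agent reads its configuration from unprefixed environment variables, which this page does not confirm.

```python
def to_env_name(config_key: str) -> str:
    """Convert a colon-separated config key to its double-underscore env form."""
    return config_key.replace(":", "__")


# The mapping stated on this page:
api_setting = to_env_name("AgentProtocol:MaxReportPages")
# Assumed to follow the same convention on agent hosts:
agent_setting = to_env_name("SignalDiffAgent:MaxReportPages")
```

When raising the cap, setting both values to the same number keeps the agent's pre-upload selection aligned with what the API will store.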
Dashboard overview cards and top issues use site-wide totals even when per-page detail is partial; see Reading your report (export).
Other common issues
| Symptom | What to check |
|---|---|
| Cannot sign in | Use Sign in with GitHub and complete authorization. Clear site cookies or try a private window if you loop back to the home page. |
| No deploy diff or run history | The first complete run has no baseline yet. On all plans, runs older than 30 days are removed—see Baselines and diffs — retention. Anonymous try-a-scan flows do not keep history. |
| Schedule did not run | Verify the schedule is enabled and cron/time is in UTC. Last skipped often means concurrency limits or another active run—see Schedules troubleshooting. |