Picture this: It’s 2:00 PM on a Tuesday. Your team’s Slack is quiet, the CI/CD pipeline is humming along, and you’ve just poured a fresh cup of coffee. Then, the first alert hits. A webhook failed. Ten seconds later, Sentry explodes with timeout errors. Is your cloud provider down? Is AWS us-east-1 having another bad day? Is GitHub Pages offline, or is it just Stripe’s sandbox acting up again?
In the heat of an incident, we developers do the "tab-dance." We open status.aws.amazon.com, status.github.com, status.stripe.com, and maybe Downdetector just to be sure. It’s chaotic, slow, and distracting. Recently, a project called IsUpMap climbed to the top of Hacker News by solving exactly this pain point: it aggregates the status of over 100 major services onto a single, clean, real-time dashboard.
As developers, looking at IsUpMap shouldn't just make us say, "Cool tool." It should make us ask: How does this actually work under the hood without melting the client’s browser, running into rate limits, or serving stale data?
Today, we’re going to dissect the architecture of a high-performance status aggregator. We’ll look at the technical challenges of multi-service status monitoring, design an elegant system architecture, and write some highly concurrent Go code to fetch, parse, and cache status data at scale.
The Technical Challenges of Mass Status Aggregation
On paper, checking if a site is up seems simple: just send an HTTP GET request or a ping, right? Unfortunately, when you scale this to hundreds of services and thousands of concurrent users, you run into serious engineering roadblocks.
1. The "Status Page" Diversity Problem
There is no single standard for status pages. Some companies use Atlassian’s Statuspage API, others use custom HTML dashboards, and some rely on basic JSON endpoints. To aggregate them, your system must handle:
- Structured JSON APIs (e.g., https://status.github.com/api/v1/status.json)
- HTML Scraping (for legacy dashboards without APIs)
- Different schemas (e.g., one service returns "status": "good", another returns "indicator": "none")
2. Rate Limiting and IP Blocking
If your backend server pings 100 different status endpoints every 10 seconds, those providers’ security systems (like Cloudflare or Akamai) might flag your server as a DDoS vector. We need a smart caching and polling strategy to respect upstream rate limits.
3. Client-Side Latency
If you make the client’s browser fetch the status of 100 sites directly, the user's browser will choke on network requests, encounter CORS (Cross-Origin Resource Sharing) blocks, and drain their mobile battery. The aggregation must happen on the backend, serving a single, lightweight payload to the client.
Designing the Architecture
To build a robust, production-grade status aggregator, we need to decouple the fetching process from the serving process. If we fetch data synchronously when a user loads our page, the page load time will be at least as long as the slowest status page we query (which could be seconds!).
Here is the architecture we want to build:
+-------------------------------------------------------+
|                     Admin / Cron                      |
+-------------------------------------------------------+
                            |
                            v (Every 60s)
+-------------------------------------------------------+
|            Go Worker (Concurrent Fetcher)             |
+-------------------------------------------------------+
        |                   |                   |
        v                   v                   v
+--------------+    +--------------+    +--------------+
|  GitHub API  |    |  Stripe API  |    |  Vercel API  |
+--------------+    +--------------+    +--------------+
        |                   |                   |
        +-------------------+-------------------+
                            |
                            v (Parse & Normalize)
+-------------------------------------------------------+
|                 In-Memory Cache (Go)                  |
+-------------------------------------------------------+
                            |
                            v (Expose)
+-------------------------------------------------------+
|                Fast HTTP API Endpoint                 |
+-------------------------------------------------------+
                            |
                            v (Fetch JSON / SSE)
+-------------------------------------------------------+
|                 Frontend Dashboard UI                 |
+-------------------------------------------------------+
By using an in-memory cache and a background worker, our API can answer each request from memory, with lookups that take well under a millisecond even for thousands of concurrent users, while our backend politely polls upstream status pages once every minute.
Step-by-Step Implementation in Go
Go (Golang) is well suited to this task. Its built-in concurrency primitives (goroutines and channels) allow us to fetch hundreds of external status pages concurrently with minimal memory overhead.
1. Defining Our Data Structures
First, we need to normalize the statuses of different services into a unified format. Let's define our types:
package main

import (
	"sync"
	"time"
)

// Status represents the normalized health state of a service
type Status string

const (
	StatusOperational Status = "operational"
	StatusDegraded    Status = "degraded"
	StatusMajorOutage Status = "outage"
	StatusUnknown     Status = "unknown"
)

// Service defines a target platform we want to monitor
type Service struct {
	ID   string `json:"id"`
	Name string `json:"name"`
	Type string `json:"type"` // "github", "statuspage", "custom_json"
	URL  string `json:"url"`
}

// ServiceStatus represents the evaluated status of a service to be sent to the client
type ServiceStatus struct {
	ID           string    `json:"id"`
	Name         string    `json:"name"`
	Status       Status    `json:"status"`
	CheckedAt    time.Time `json:"checked_at"`
	ResponseTime int64     `json:"response_time_ms"`
}

// StatusCache is the global cache store, guarded by a read-write mutex
type StatusCache struct {
	mu      sync.RWMutex
	Results map[string]ServiceStatus
}
2. Writing the Fetcher Engine
Now, let's write the fetcher for a single service; the next step fans it out across all services in parallel with a sync.WaitGroup. To prevent slow endpoints from hanging our entire system, FetchStatus accepts a context.Context from Go's context package, which lets the caller enforce a strict 5-second timeout on each HTTP request.
package main

import (
	"context"
	"encoding/json"
	"net/http"
	"time"
)

// FetchStatus queries the target service and normalizes the response
func FetchStatus(ctx context.Context, client *http.Client, svc Service) ServiceStatus {
	start := time.Now()

	req, err := http.NewRequestWithContext(ctx, "GET", svc.URL, nil)
	if err != nil {
		return ServiceStatus{ID: svc.ID, Name: svc.Name, Status: StatusUnknown, CheckedAt: start}
	}

	resp, err := client.Do(req)
	if err != nil {
		return ServiceStatus{ID: svc.ID, Name: svc.Name, Status: StatusMajorOutage, CheckedAt: start}
	}
	defer resp.Body.Close()

	elapsed := time.Since(start).Milliseconds()

	// Parse based on service type
	var status Status = StatusUnknown
	if svc.Type == "statuspage" {
		// Atlassian Statuspage structure parsing
		var result struct {
			Status struct {
				Indicator string `json:"indicator"`
			} `json:"status"`
		}
		if err := json.NewDecoder(resp.Body).Decode(&result); err == nil {
			switch result.Status.Indicator {
			case "none":
				status = StatusOperational
			case "minor", "major":
				status = StatusDegraded
			case "critical":
				status = StatusMajorOutage
			}
		}
	} else if svc.Type == "github" {
		// GitHub specific status API
		var result struct {
			Status string `json:"status"`
		}
		if err := json.NewDecoder(resp.Body).Decode(&result); err == nil {
			if result.Status == "good" {
				status = StatusOperational
			} else {
				status = StatusMajorOutage
			}
		}
	}

	return ServiceStatus{
		ID:           svc.ID,
		Name:         svc.Name,
		Status:       status,
		CheckedAt:    time.Now(),
		ResponseTime: elapsed,
	}
}
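As the number of service types grows (the Service.Type comment already mentions "custom_json", which FetchStatus does not handle yet), the if/else chain gets unwieldy. One possible alternative is a small registry of parser functions keyed by type. This is a sketch under assumptions: the registry, the normalize function, and especially the "state" field expected by parseCustomJSON are made up for illustration.

```go
package main

import (
	"encoding/json"
	"strings"
)

type Status string

const (
	StatusOperational Status = "operational"
	StatusDegraded    Status = "degraded"
	StatusMajorOutage Status = "outage"
	StatusUnknown     Status = "unknown"
)

// parser turns a raw response body into a normalized Status.
type parser func(body []byte) Status

// parseStatuspage applies the same indicator mapping as FetchStatus.
func parseStatuspage(body []byte) Status {
	var r struct {
		Status struct {
			Indicator string `json:"indicator"`
		} `json:"status"`
	}
	if err := json.Unmarshal(body, &r); err != nil {
		return StatusUnknown
	}
	switch r.Status.Indicator {
	case "none":
		return StatusOperational
	case "minor", "major":
		return StatusDegraded
	case "critical":
		return StatusMajorOutage
	}
	return StatusUnknown
}

// parseCustomJSON assumes a hypothetical payload {"state": "ok|degraded|down"}.
func parseCustomJSON(body []byte) Status {
	var r struct {
		State string `json:"state"`
	}
	if err := json.Unmarshal(body, &r); err != nil {
		return StatusUnknown
	}
	switch strings.ToLower(r.State) {
	case "ok":
		return StatusOperational
	case "degraded":
		return StatusDegraded
	case "down":
		return StatusMajorOutage
	}
	return StatusUnknown
}

// parsers maps Service.Type values to their parser.
var parsers = map[string]parser{
	"statuspage":  parseStatuspage,
	"custom_json": parseCustomJSON,
}

// normalize picks the parser for svcType; unknown types yield StatusUnknown.
func normalize(svcType string, body []byte) Status {
	p, ok := parsers[svcType]
	if !ok {
		return StatusUnknown
	}
	return p(body)
}
```

Supporting a new provider then means adding one function and one map entry, and each parser can be unit-tested against a saved response body without any network calls.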
3. Managing Concurrency and Rate-Limiting
If we have 100+ services, launching 100 goroutines simultaneously is easy for Go. However, we want to make sure we don't saturate our own server's outbound network sockets. We can use a buffered channel as a semaphore to limit concurrent requests (e.g., maximum 20 concurrent outgoing connections at any time).
// UpdateAllStatuses refreshes every service and blocks until all fetches finish.
// It needs the context, net/http, sync, and time imports.
func UpdateAllStatuses(services []Service, cache *StatusCache) {
	var wg sync.WaitGroup

	// Semaphore to limit concurrency to 20
	sem := make(chan struct{}, 20)

	client := &http.Client{
		Timeout: 5 * time.Second,
	}

	for _, svc := range services {
		wg.Add(1)
		sem <- struct{}{} // Acquire token

		go func(s Service) {
			defer wg.Done()
			defer func() { <-sem }() // Release token

			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			defer cancel()

			result := FetchStatus(ctx, client, s)

			// Write to safe concurrent cache
			cache.mu.Lock()
			cache.Results[s.ID] = result
			cache.mu.Unlock()
		}(svc)
	}

	wg.Wait()
}
Serving the Data to the Frontend
Now that our background worker is updating the cache in memory, serving this data to our users is incredibly cheap. We can write a simple HTTP handler in Go that locks the cache read-mutex, marshals the map to JSON, and sends it instantly.
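func handleGetStatus(cache *StatusCache) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Marshal under the read lock, then release it before writing, so a
		// slow client cannot hold up the background worker's cache updates.
		cache.mu.RLock()
		payload, err := json.Marshal(cache.Results)
		cache.mu.RUnlock()
		if err != nil {
			http.Error(w, "failed to encode statuses", http.StatusInternalServerError)
			return
		}

		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Access-Control-Allow-Origin", "*") // Enable CORS
		w.Write(payload)
	}
}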
With this architecture, even if 10,000 developers hit your status page dashboard at the exact same moment, your server won't fetch the external APIs 10,000 times. It will simply read the last cached result from RAM, a lookup that takes microseconds.
Going Beyond: Adding "Push" Updates with Server-Sent Events (SSE)
Polling is great, but real-time updates are better. What if we want to stream status updates to our frontend dashboard the second our backend worker finishes a cycle?
Instead of forcing the client to reload or poll our backend API every 30 seconds, we can use Server-Sent Events (SSE). SSE is a lightweight, unidirectional protocol over standard HTTP that lets the server push updates directly to the browser. Unlike WebSockets, it doesn't require complex connection upgrades or custom protocols.
Here’s how easy it is to listen to updates in JavaScript on the frontend:
const eventSource = new EventSource('https://api.yoursite.com/status/stream');

eventSource.onmessage = (event) => {
  const updatedStatuses = JSON.parse(event.data);
  updateDashboardUI(updatedStatuses);
  console.log("Real-time status synced!");
};
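The server side of that stream is not shown above, so here is one way it might look in Go. Treat it as a sketch under assumptions: the Broadcaster type and its methods are hypothetical names, and it expects each payload to be single-line JSON (which json.Marshal produces), because an SSE data: field ends at the first newline.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// Broadcaster (hypothetical) fans out payloads to every connected SSE client.
type Broadcaster struct {
	mu      sync.Mutex
	clients map[chan []byte]struct{}
}

func NewBroadcaster() *Broadcaster {
	return &Broadcaster{clients: make(map[chan []byte]struct{})}
}

func (b *Broadcaster) subscribe() chan []byte {
	ch := make(chan []byte, 1) // buffer one pending update per client
	b.mu.Lock()
	b.clients[ch] = struct{}{}
	b.mu.Unlock()
	return ch
}

func (b *Broadcaster) unsubscribe(ch chan []byte) {
	b.mu.Lock()
	delete(b.clients, ch)
	b.mu.Unlock()
}

// Publish offers payload to every client without blocking; a client whose
// buffer is still full misses this update and catches the next one.
func (b *Broadcaster) Publish(payload []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.clients {
		select {
		case ch <- payload:
		default:
		}
	}
}

// ServeHTTP streams each published payload as one SSE "data:" frame.
func (b *Broadcaster) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Access-Control-Allow-Origin", "*")
	w.WriteHeader(http.StatusOK)
	flusher.Flush()

	ch := b.subscribe()
	defer b.unsubscribe(ch)
	for {
		select {
		case <-r.Context().Done(): // client went away
			return
		case payload := <-ch:
			fmt.Fprintf(w, "data: %s\n\n", payload)
			flusher.Flush()
		}
	}
}
```

You could mount it with http.Handle("/status/stream", broadcaster) and have the worker call broadcaster.Publish with the marshaled cache after each refresh cycle. Dropping updates for slow clients is a deliberate trade-off here: every message carries the full status map, so a missed frame is corrected by the next one.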
Wrapping Up
Systems like IsUpMap show us that developer tools don't have to be bloated to be immensely useful. By taking ownership of your status data and building a lightweight, concurrent status aggregator, you remove the chaos from incident management. No more frantic tab-switching. Just a single source of truth that keeps you, your team, and your users informed.
Are you building internal tooling or custom dashboards to track your stack's dependency health? Do you prefer parsing status pages on the client-side, or do you always centralize through a backend cache?
Let’s talk in the comments below, or drop your thoughts on Twitter/X tagging @sysseder!