Hey everyone, Alex here. Welcome back to another edition of Coding with Alex at sysseder.com.
Let's do a quick reality check. Think about the last outage you had to debug. Was it a clean, deterministic NullPointerException or a straightforward syntax error? Or was it more like: "Well, Service A timed out waiting for Service B, but Service B actually processed the write, so now our database is in a weird half-state, and the retry queue just DDoS'd our own identity provider"?
If you're building modern web apps, microservices, or cloud-native platforms, it's almost always the latter. And while we like to think of our 2025 tech stack—complete with Kubernetes, service meshes, serverless runtimes, and AI-driven autoscaling—as incredibly advanced, we are still fundamentally fighting the exact same physics we were decades ago.
This week, a classic list started trending again on Hacker News: L. Peter Deutsch’s "Eight Fallacies of Distributed Computing." It took shape at Sun Microsystems around 1994 and was solidified over the following years, which puts us more than 30 years past its inception (and 21+ years past its widespread adoption across the industry). Yet, as I read through the comments and looked at my own recent pull requests, a cold truth hit me: we are still writing code every single day that assumes these fallacies are true.
Today, let’s unpack why these eight fallacies are still the ultimate boss fight of software engineering, how they manifest in modern cloud environments, and how we can write code that actually survives the harsh reality of the network.
---
The 8 Fallacies: Then vs. Now
If you need a refresher, the eight fallacies are assertions that programmers new to distributed systems invariably assume to be true, but which always turn out to be false:
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn't change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
Let’s look at the most painful ones we encounter in modern development and see how they break our 2025-era systems.
---
Fallacy #1: The Network is Reliable (The Myth of the 100% Uptime API)
We've all written code like this at some point in our careers:
// Dangerous: Assuming the network always delivers
const response = await fetch("https://api.payment-gateway.com/v1/charges", {
  method: "POST",
  body: JSON.stringify(chargeDetails)
});
const result = await response.json();
await updateDatabaseStatus(result.id, "SUCCESS");
What happens if the network drops after the payment gateway processes the charge, but before the response reaches your server? Your code never sees a success. It sees a timeout or a connection reset, assumes the transaction failed, retries, and suddenly your customer is charged twice.
The 2025 Fix: Idempotency and Circuit Breakers
To survive an unreliable network, we must design our operations to be idempotent. An idempotent operation can be repeated multiple times without changing the result beyond the initial application.
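To make that definition concrete, here is a minimal sketch of what the receiving side might do with an idempotency key. Everything in it is an assumption for illustration: the IdempotentChargeHandler class, its in-memory Map, and the chargeFn callback are hypothetical names, and a real gateway would keep keys in a durable store and expire them.

```javascript
// Hypothetical receiver-side handler; not a real payment gateway API.
class IdempotentChargeHandler {
  constructor(chargeFn) {
    this.chargeFn = chargeFn;  // performs the real side effect (the charge)
    this.results = new Map();  // idempotencyKey -> stored result
  }

  handle(idempotencyKey, chargeDetails) {
    // A repeated key returns the stored result instead of charging again.
    if (this.results.has(idempotencyKey)) {
      return { ...this.results.get(idempotencyKey), replayed: true };
    }
    const result = this.chargeFn(chargeDetails);
    this.results.set(idempotencyKey, result);
    return { ...result, replayed: false };
  }
}

// Usage: the same key sent twice charges the customer once.
let chargeCount = 0;
const handler = new IdempotentChargeHandler((details) => {
  chargeCount += 1;
  return { id: `ch_${chargeCount}`, amount: details.amount };
});

const first = handler.handle("key-123", { amount: 4200 });
const retry = handler.handle("key-123", { amount: 4200 });
console.log(first.id === retry.id, chargeCount); // true 1
```

The retry gets back the original charge's result, so the caller can safely resend whenever it is unsure whether the first request landed.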
Here is how we should refactor that API call using an idempotency key and a basic retry mechanism with exponential backoff:
import { v4 as uuidv4 } from 'uuid';

async function processPaymentWithRetry(chargeDetails, maxRetries = 3) {
  // One key per logical transaction, reused on every retry so the receiver can deduplicate
  const idempotencyKey = chargeDetails.transactionId || uuidv4();
  let attempt = 0;
  let delay = 1000; // start with 1 second

  while (attempt < maxRetries) {
    try {
      const response = await fetch("https://api.payment-gateway.com/v1/charges", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Idempotency-Key": idempotencyKey // Tell the receiver to ignore duplicate requests
        },
        body: JSON.stringify(chargeDetails),
        signal: AbortSignal.timeout(5000) // Don't hang forever!
      });

      if (response.ok) {
        return await response.json();
      }
      if (response.status >= 500) {
        // Server error, worth retrying
        throw new Error(`Server error: ${response.status}`);
      }
      // Client error (4xx): retrying the same request will not help,
      // so flag the error to make the catch block rethrow it immediately
      const errorData = await response.json().catch(() => ({}));
      const clientError = new Error(`Client error ${response.status}: ${errorData.message ?? "no details"}`);
      clientError.retryable = false;
      throw clientError;
    } catch (error) {
      if (error.retryable === false) {
        throw error;
      }
      attempt++;
      if (attempt >= maxRetries) {
        throw new Error(`Failed after ${maxRetries} attempts: ${error.message}`);
      }
      console.warn(`Attempt ${attempt} failed. Retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
      delay *= 2; // Exponential backoff
    }
  }
}
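
Retries handle a blip. When a dependency stays down, a circuit breaker stops you from hammering it. Here is a rough sketch of the idea; the CircuitBreaker class, its option names, and the thresholds are all hypothetical, and production code would more likely use an established library than this hand-rolled version.

```javascript
// Hypothetical minimal circuit breaker; names and thresholds are illustrative.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 10000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;         // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null;   // null means the circuit is closed
  }

  get state() {
    if (this.openedAt === null) return "CLOSED";
    // Once the cool-down has passed, trial requests are let through again.
    return this.now() - this.openedAt >= this.resetTimeoutMs ? "HALF_OPEN" : "OPEN";
  }

  async call(fn) {
    if (this.state === "OPEN") {
      // Fail fast instead of piling more load on a struggling dependency.
      throw new Error("Circuit open: request rejected without calling the service");
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null; // a success closes the circuit
      return result;
    } catch (error) {
      this.failures += 1;
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.openedAt = this.now(); // open (or re-open) and restart the cool-down
      }
      throw error;
    }
  }
}

// Usage sketch: wrap any promise-returning call, for example
// breaker.call(() => processPaymentWithRetry(chargeDetails))
const breaker = new CircuitBreaker({ failureThreshold: 2, resetTimeoutMs: 5000 });
console.log(breaker.state); // CLOSED
```

The injectable clock is a design choice for testability: you can simulate the cool-down without actually sleeping.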
---
Fallacies #2 & #3: Latency is Zero & Bandwidth is Infinite (The Microservices Tax)
In the monolith days, calling another module was an in-process function call taking nanoseconds. Today, we've split our monoliths into dozens of microservices. Calling another service now involves serializing data to JSON or Protobuf, DNS resolution, TCP handshakes, TLS negotiation, routing through ingress proxies, deserialization, and then doing the reverse on the way back.
This is often referred to as the "Microservices Tax." If Service A calls Service B, which calls Service C, which calls Service D, your latency is the sum of all those network hops plus the execution time.
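To put rough numbers on that, here is a back-of-the-envelope sketch. The per-hop figures are made up for illustration, not measurements, and the function name chainLatencyMs is my own.

```javascript
// Hypothetical per-hop figures for the chain A -> B -> C -> D (not measurements).
const hops = [
  { call: "A -> B", networkRoundTripMs: 4, processingMs: 20 },
  { call: "B -> C", networkRoundTripMs: 6, processingMs: 35 },
  { call: "C -> D", networkRoundTripMs: 5, processingMs: 50 },
];

// In a synchronous chain the caller waits for every hop, so the costs add up.
function chainLatencyMs(hops) {
  return hops.reduce(
    (total, hop) => total + hop.networkRoundTripMs + hop.processingMs,
    0
  );
}

console.log(`End-to-end latency: ${chainLatencyMs(hops)}ms`); // End-to-end latency: 120ms
```

No single hop looks alarming here, yet Service A waits 120ms before it can even start building its own response.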
The Architectural Solution: Bulkheads and Asynchronous Messaging
Instead of synchronous REST chains, modern resilient architecture relies on asynchronous, event-driven communication using message brokers like Kafka, RabbitMQ, or AWS EventBridge.
Consider this transition from synchronous temporal coupling to asynchronous decoupling:
[Synchronous Monolith-in-Disguise (Fragile)]
[Client] -> HTTP -> [Order Service] -> HTTP -> [Inventory Service] -> HTTP -> [Shipping Service]
(If Shipping is slow or down, the whole order placement fails)

[Asynchronous Event-Driven (Resilient)]
[Client] -> HTTP -> [Order Service]
                          |
           (Publishes "OrderCreated" event)
                          |
                          v
                   [Message Broker]
                     /          \
                    v            v
     [Inventory Service]    [Shipping Service]
      (Processes async)      (Processes async)
By moving to an event-driven model, the Order Service can acknowledge the user's order in milliseconds. Even if the Shipping Service goes down for maintenance (exactly the kind of event that Fallacies #1 and #5 pretend never happens), the messages wait in the queue until the service recovers. The system as a whole remains functional.
---
Fallacy #5: Topology Doesn't Change (The Cloud Native Lie)
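Here is a toy sketch of that buffering behavior. The InMemoryBroker class is a hypothetical in-process stand-in for a real broker such as Kafka or RabbitMQ, and it skips everything that makes real brokers hard (durability, acknowledgements, redelivery). It only shows that publishing does not depend on the consumer being up.

```javascript
// Hypothetical in-memory stand-in for a message broker; illustration only.
class InMemoryBroker {
  constructor() {
    this.subscribers = new Map(); // topic -> array of subscriber records
  }

  subscribe(topic, handler) {
    const subscriber = { handler, queue: [], online: true };
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(subscriber);
    return {
      goOffline: () => { subscriber.online = false; },
      goOnline: () => { subscriber.online = true; this.drain(subscriber); },
    };
  }

  publish(topic, event) {
    // The publisher only enqueues; it never waits on a consumer being healthy.
    for (const subscriber of this.subscribers.get(topic) ?? []) {
      subscriber.queue.push(event);
      this.drain(subscriber);
    }
  }

  drain(subscriber) {
    // Deliver queued events in order, but only while the consumer is up.
    while (subscriber.online && subscriber.queue.length > 0) {
      subscriber.handler(subscriber.queue.shift());
    }
  }
}

const broker = new InMemoryBroker();
const reserved = [];
const shipped = [];
broker.subscribe("OrderCreated", (event) => reserved.push(event.orderId));
const shipping = broker.subscribe("OrderCreated", (event) => shipped.push(event.orderId));

shipping.goOffline(); // Shipping Service is down for maintenance
broker.publish("OrderCreated", { orderId: "o-1" });
broker.publish("OrderCreated", { orderId: "o-2" });
console.log(reserved.length, shipped.length); // 2 0

shipping.goOnline(); // the backlog is delivered on recovery
console.log(shipped.length); // 2
```

Inventory keeps working through the outage, and Shipping catches up in order once it is back.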
Remember when servers had static IP addresses, hostnames like db-prod-01, and stayed alive for years? Today, we run on Kubernetes, AWS ECS, or serverless functions. Pods are terminated and recreated constantly due to spot instance interruptions, autoscaling, deployments, or node upgrades.
If your code hardcodes IP addresses, or if your application caches DNS resolutions indefinitely, your system will break the moment the underlying infrastructure scales.
The modern antidote: Service Discovery and Smart Retries
Modern applications must rely on dynamic service discovery (like Consul, Kubernetes CoreDNS, or AWS Cloud Map) and ephemeral configurations. Additionally, we must configure our HTTP clients to respect DNS TTLs (Time To Live).
For example, Node.js does not cache DNS lookups on its own: every dns.lookup() call goes to the operating system's resolver. Stale addresses creep in when you add a cache for performance, or when long-lived keep-alive connections stay pinned to a pod that has since been replaced. If you do add a cache, for example with the dnscache library, keep its TTL short so your application stops sending traffic to dead pods soon after a rolling deployment replaces them:
import dns from 'dns';
import dnscache from 'dnscache';

// Enable DNS caching with a short TTL (e.g., 5 seconds)
dnscache({
  "enable": true,
  "ttl": 5,
  "cachesize": 1000
});

// Lookups made through the patched dns module are now cached for at most 5 seconds,
// which limits how long requests can keep hitting decommissioned container instances.
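
If you want to see why the TTL matters, here is a small sketch of a TTL cache wrapped around a resolver. To be clear about assumptions: TtlLookupCache is a hypothetical class, the resolver is a plain synchronous function standing in for a DNS lookup, and this is not how dnscache is implemented internally.

```javascript
// Hypothetical TTL cache around any resolver function; illustration only.
class TtlLookupCache {
  constructor(resolve, ttlMs, now = () => Date.now()) {
    this.resolve = resolve;   // (hostname) => address, a stand-in for DNS
    this.ttlMs = ttlMs;
    this.now = now;           // injectable clock, handy for tests
    this.entries = new Map(); // hostname -> { address, expiresAt }
  }

  lookup(hostname) {
    const entry = this.entries.get(hostname);
    // Serve from cache only while the entry is still fresh.
    if (entry && entry.expiresAt > this.now()) return entry.address;
    const address = this.resolve(hostname);
    this.entries.set(hostname, { address, expiresAt: this.now() + this.ttlMs });
    return address;
  }
}

// Simulate a rolling deployment that moves the service to a new pod IP.
let clockMs = 0;
let currentPodIp = "10.0.0.1";
const cache = new TtlLookupCache(() => currentPodIp, 5000, () => clockMs);

console.log(cache.lookup("orders.internal")); // 10.0.0.1
currentPodIp = "10.0.0.2";                    // old pod is gone
clockMs = 3000;
console.log(cache.lookup("orders.internal")); // 10.0.0.1 (stale, still within TTL)
clockMs = 5000;
console.log(cache.lookup("orders.internal")); // 10.0.0.2 (TTL expired, re-resolved)
```

The TTL is the upper bound on how long you keep talking to a pod that no longer exists, which is why an unbounded cache is so dangerous here.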
---
Fallacy #7: Transport Cost is Zero (The Egress Tax)
When you are running a local dev environment, moving gigabytes of data between your local Docker containers costs absolutely nothing. But in production, public cloud providers have made "egress fees" a major profit center.
Moving data out of your cloud provider to the public internet, or even moving data between availability zones (AZs) within the same cloud region, can result in massive monthly bills if you are not careful. If you design an architecture where a high-throughput microservice in AZ-East-1a constantly queries a database partition in AZ-East-1b, you are paying a per-gigabyte fee on all of that traffic.
The Fix: Colocation, Compression, and Efficient Protocols
- Keep it local: Use topology-aware routing in Kubernetes to ensure services prefer talking to instances in the same availability zone.
- Stop sending raw JSON: For large payloads, switch from verbose JSON to a binary serialization format like Protocol Buffers (protobuf) or Avro. These formats usually need less bandwidth and are typically faster to serialize and deserialize.
Take a look at how much smaller a Protobuf payload is compared to JSON for a standard user profile:
// Verbose JSON payload (109 bytes minified, more when pretty-printed)
{
  "userId": "usr_987654321",
  "email": "alex@sysseder.com",
  "isActive": true,
  "roles": ["admin", "developer", "editor"]
}
// Protobuf binary representation (62 bytes, assuming a schema that numbers these fields 1 to 4)
// [0a 0d 75 73 72 5f 39 38 37 36 35 34 33 32 31 12 11 61 6c 65 78 40 73 79 73 ... ]
Cutting this payload by roughly 40% doesn't just lower your cloud egress bill; it also shortens transfer time, which takes some of the sting out of Fallacy #2. Payloads dominated by numbers or long field names tend to shrink more than this string-heavy one.
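Size claims like this are easy to check yourself. The sketch below hand-encodes the profile using the Protobuf wire format and compares it with minified JSON. It assumes a hypothetical schema in which userId, email, isActive, and roles are fields 1 through 4, and the helpers only cover the simple cases this message needs (field numbers up to 15, strings shorter than 128 bytes). In a real project you would generate the encoder from a .proto file instead.

```javascript
const encoder = new TextEncoder();

// Tag byte for field numbers 1 to 15: (fieldNumber << 3) | wireType.
// Wire type 2 = length-delimited; a one-byte length works for strings < 128 bytes.
function stringField(fieldNumber, value) {
  const bytes = encoder.encode(value);
  return [(fieldNumber << 3) | 2, bytes.length, ...bytes];
}

// Wire type 0 = varint; a bool is a single byte, 0 or 1.
function boolField(fieldNumber, value) {
  return [(fieldNumber << 3) | 0, value ? 1 : 0];
}

const profile = {
  userId: "usr_987654321",
  email: "alex@sysseder.com",
  isActive: true,
  roles: ["admin", "developer", "editor"],
};

// Hypothetical schema: userId = 1, email = 2, isActive = 3, roles = 4 (repeated).
const protobufBytes = Uint8Array.from([
  ...stringField(1, profile.userId),
  ...stringField(2, profile.email),
  ...boolField(3, profile.isActive),
  ...profile.roles.flatMap((role) => stringField(4, role)),
]);

const jsonBytes = encoder.encode(JSON.stringify(profile));

console.log(`JSON: ${jsonBytes.length} bytes, Protobuf: ${protobufBytes.length} bytes`);
```

Most of the savings come from replacing quoted field names with one-byte tags, so the string values themselves cost the same in both formats.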
---
Wrapping Up: Embrace the Chaos
The "Eight Fallacies of Distributed Computing" aren't just an academic history lesson; they are the laws of physics for the web. Every time we write a line of code that communicates over a network card, we are making assumptions. The key to being a senior engineer is ensuring those assumptions are grounded in reality, not wishful thinking.
Next time you are designing a system, writing an API client, or configuring your cloud infrastructure, ask yourself:
- What happens to this code if the network drops for exactly 3 seconds?
- How does this system behave if latency suddenly spikes by 200ms?
- Am I assuming this IP address will exist an hour from now?
Design for failure, build in redundancy, keep your payloads light, and never, ever trust the wire.
What is your favorite (or most painful) distributed systems failure story? Have you ever brought down a production system because of a DNS caching issue or a missing retry limit? Let’s swap war stories in the comments below!
Until next time, happy coding!