Understanding Clos / Spine–Leaf Architecture: Why Modern Data Centers Depend on It for Scale & Reliability

Sabyasachi (SK)


Modern data centers—whether built by Google, Meta, AWS, Microsoft, or on Arrcus-based fabrics—use a Clos / Spine–Leaf architecture instead of the traditional three-tier network. The reason is simple:

👉 It scales horizontally.
👉 It delivers predictable low latency.
👉 It offers massive throughput and no single point of failure.

Let’s break this down in a way that would impress any Google, Arista, Cisco, or Arrcus interviewer.

1. What is a Clos / Spine–Leaf Architecture?

A Clos fabric (named after Charles Clos, who described it in the 1950s) is a multi-stage switching topology designed to provide non-blocking, high-bandwidth connectivity using many small switches instead of a few big ones. The modern implementation is called:

✔ Spine–Leaf Architecture

  • Leaf switches connect to servers (Top of Rack – TOR).
  • Spine switches connect to all leafs.
Every leaf is connected to every spine → creating a predictable, high-bandwidth fabric.
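To make “every leaf connects to every spine” concrete, here is a minimal Python sketch of the topology (switch names and counts are purely illustrative):

```python
# Minimal model of a spine-leaf fabric: every leaf has one uplink to every spine.
spines = [f"spine{i}" for i in range(1, 5)]   # example: 4 spines
leafs = [f"leaf{i}" for i in range(1, 9)]     # example: 8 leafs

# Adjacency map: each leaf connects to all spines (and only to spines).
fabric = {leaf: list(spines) for leaf in leafs}

# Every leaf has exactly as many uplinks as there are spines.
assert all(len(uplinks) == len(spines) for uplinks in fabric.values())
print(fabric["leaf1"])   # ['spine1', 'spine2', 'spine3', 'spine4']
```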

2. Why Traditional 3-Tier Networks Failed at Scale

Traditional networks used:

  • Core
  • Aggregation
  • Access
Problems appeared when data centers grew:

❌ Traffic bottlenecks between tiers

❌ North–south optimized, not east–west

❌ Limited bandwidth as link counts were fixed

❌ Scaling required expensive “big chassis” switches

But the modern data center needs:

  • Microservices
  • AI workloads
  • Distributed computing
  • 100G/400G server NICs
Traffic is now 80–90% east–west → services talking to each other.
The old model simply cannot sustain this.

3. How a Spine–Leaf Fabric Actually Works

Leaf Switches

  • Connect to servers.
  • Provide local switching.
  • Perform VXLAN encapsulation (as VTEPs).
  • Have uplinks to every spine.
Spine Switches
  • Act as the high-speed backbone.
  • Are L3-only switches (simple, fast).
  • Never connect to each other.
  • Provide ECMP (Equal Cost Multi-Pathing).
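Because each leaf reaches any other leaf through exactly one spine, every leaf-to-leaf path is two hops long, which is what makes ECMP possible. A small sketch of that idea (names are illustrative, continuing the model above):

```python
spines = ["spine1", "spine2", "spine3", "spine4"]

def leaf_to_leaf_paths(src_leaf, dst_leaf, spines):
    """Every leaf-to-leaf path is leaf -> some spine -> leaf: always 2 hops."""
    return [(src_leaf, spine, dst_leaf) for spine in spines]

paths = leaf_to_leaf_paths("leaf1", "leaf7", spines)
assert all(len(path) - 1 == 2 for path in paths)  # every path is exactly 2 hops
print(len(paths))  # 4 equal-cost paths, one per spine -> ECMP candidates
```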

4. How Spine–Leaf Provides Massive Scale

🎯 A. Horizontal Scalability

Want more servers?
➡ Add more leaf switches.

Want more bandwidth?
➡ Add more spine switches.

No need to replace expensive chassis. Scaling is linear.

Example:

  • 4 spines × 48 leafs = 192 fabric links
  • Add 2 more spines → now 288 links
  • Bandwidth expands instantly without downtime
This is why Google’s Jupiter fabric scales to 100,000+ servers.
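The arithmetic behind that example, as a quick sketch (the 100G uplink speed is an assumption for illustration):

```python
def fabric_links(num_spines, num_leafs):
    """Each leaf has one uplink per spine, so links = spines x leafs."""
    return num_spines * num_leafs

def leaf_uplink_gbps(num_spines, uplink_gbps=100):
    """A leaf's total uplink bandwidth grows linearly with the spine count."""
    return num_spines * uplink_gbps

print(fabric_links(4, 48))       # 192 fabric links
print(fabric_links(6, 48))       # 288 links after adding 2 spines
print(leaf_uplink_gbps(4))       # 400 Gbps of uplinks per leaf
print(leaf_uplink_gbps(6))       # 600 Gbps -- bandwidth grows without a redesign
```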

🎯 B. ECMP = More Bandwidth Automatically

Because every leaf connects to all spines, each path has the same cost (distance = 2 hops).

The fabric becomes:
➡ Predictable
➡ Low-latency
➡ Multi-path load balanced

Using ECMP hashing, the network uses all paths simultaneously. Typical fabrics run:

  • 8-way ECMP
  • 16-way ECMP
  • 32-way ECMP
  • 64-way ECMP (in hyperscale fabrics)
The more spines you add → the more bandwidth every server gets.
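Roughly how ECMP spreads flows: the switch hashes each flow's 5-tuple and uses the result modulo the number of equal-cost next hops, so one flow always follows the same path (no reordering) while different flows are distributed across all spines. A simplified Python sketch (real ASICs use hardware hash functions, not CRC32):

```python
import zlib

def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Hash the 5-tuple and pick one equal-cost next hop.
    Packets of the same flow always hash to the same spine."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

spines = ["spine1", "spine2", "spine3", "spine4"]
print(ecmp_next_hop("10.0.1.10", "10.0.2.20", "tcp", 51512, 443, spines))
print(ecmp_next_hop("10.0.1.11", "10.0.2.20", "tcp", 49822, 443, spines))
```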

🎯 C. No Oversubscription (or Predictable Oversubscription)

Oversubscription is the ratio of server-facing bandwidth to uplink bandwidth:

Server-facing bandwidth : Uplink bandwidth
In spine–leaf, you can design:

  • 1:1 (non-blocking fabric)
  • 2:1 (balanced)
  • 4:1 (cost-optimized)
The beauty:
➡ it’s controlled and predictable.
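A quick sketch of how that ratio is worked out per leaf (port counts and speeds are illustrative assumptions):

```python
def oversubscription(server_ports, server_gbps, uplinks, uplink_gbps):
    """Server-facing bandwidth divided by uplink bandwidth on one leaf."""
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)

# 48 x 25G server ports with 4 x 100G uplinks -> 1200:400 = 3:1 (cost-optimized)
print(oversubscription(48, 25, 4, 100))   # 3.0
# 32 x 100G server ports with 8 x 400G uplinks -> 3200:3200 = 1:1 (non-blocking)
print(oversubscription(32, 100, 8, 400))  # 1.0
```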

5. How Spine–Leaf Provides High Reliability

A. No Single Point of Failure

In traditional networks:

  • If an aggregation switch fails → a whole cluster goes down.
In spine–leaf:
  • If one spine fails, traffic automatically uses the remaining spines.
  • If one link fails, ECMP reroutes instantly.
  • If a leaf fails, only servers on that leaf are impacted.
Everything else stays operational.

B. Fast Failover (Subsecond)

Because the fabric is built around L3 + ECMP, failure detection is fast:

  • BFD (50ms or lower)
  • BGP/OSPF/ISIS with rapid timers
  • Reactive ECMP reconvergence
Traffic reroutes around the failure almost instantly, with minimal impact on throughput.
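From the leaf's point of view, reconvergence is just shrinking the ECMP set: once BFD or the routing protocol declares a spine down, that next hop is withdrawn and flows rehash over the survivors. A rough sketch (continuing the illustrative names above):

```python
spines = ["spine1", "spine2", "spine3", "spine4"]

def on_spine_failure(failed, next_hops):
    """Drop the failed next hop; the remaining equal-cost paths keep forwarding."""
    return [nh for nh in next_hops if nh != failed]

surviving = on_spine_failure("spine2", spines)
print(surviving)   # ['spine1', 'spine3', 'spine4']
# Flows rehash across 3 spines instead of 4: capacity drops by 25%,
# but every destination stays reachable.
```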

C. Simplified L3-only Core

Spines do not do:

  • STP
  • MLAG
  • Flooding
  • L2 broadcast
Less complexity → fewer outages.

Leafs handle L2 (via VXLAN EVPN),
Spines handle L3 (fast, stateless, scalable).
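To see the division of labor, here is a minimal sketch of what the leaf, acting as VTEP, prepends to the original frame: the 8-byte VXLAN header carrying the 24-bit VNI (the outer UDP/IP headers are omitted for brevity). The spine only ever routes on the outer IP header and never looks inside.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP destination port for VXLAN

def vxlan_header(vni):
    """Build the 8-byte VXLAN header: I flag set, 24-bit VNI, reserved bits zero."""
    flags = 0x08 << 24              # 'I' flag: the VNI field is valid
    return struct.pack("!II", flags, vni << 8)

inner_frame = b"\x00" * 64          # placeholder for the original Ethernet frame
vxlan_payload = vxlan_header(vni=10100) + inner_frame
print(len(vxlan_header(10100)))     # 8 (bytes)
```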

6. Why Hyperscalers Love Clos / Spine–Leaf

✔ Amazon → Uses multi-stage Clos called “Scorpion”

✔ Google → Jupiter, B4 SDN WAN

✔ Microsoft → Clos-based Azure fabrics

✔ Meta → FBOSS + Wedge switches in a Clos-based data center fabric

✔ Arrcus ArcOS → Ultra-high-performance ECMP Clos with ArcIQ

The reasons:

  • Massive horizontal scale
  • Commodity whitebox switches
  • Automated provisioning
  • Predictable failure behavior
  • Supports VXLAN/EVPN overlays

7. Putting It All Together — Summary Table

Feature | Traditional Network | Spine–Leaf (Clos)
Traffic Pattern | North–South | East–West
Scalability | Limited | Massive, linear
Failure Impact | High | Minimal
Upgrade | Complex, disruptive | Add spines/leafs modularly
Bandwidth | Choke points | Predictable ECMP
Fabric Type | Tree | Clos Fabric

Final Answer for an Interview (Concise Version)

A Clos / Spine–Leaf architecture provides scale by using many small switches connected in a multi-stage fabric where leaf switches connect to all spines, enabling horizontal scaling simply by adding more spines or leafs. Reliability comes from ECMP multi-pathing, where multiple equal-cost paths ensure traffic automatically reroutes during link or device failures. This eliminates bottlenecks, delivers predictable low latency, and removes single points of failure, making it the foundation of all modern data center networks.

How do you configure it? Follow this:

EVPN-VXLAN on Cisco Nexus Explained: L2VNI, BGP EVPN, OSPF Underlay & Full Config | EVPN VXLAN Nexus

Sabyasachi
Network Engineer at Google | 3x CCIE (SP | DC | ENT) | JNCIE-SP | SRA Certified | Automated Network Solutions | AI / ML (Designing AI DC)