Understanding Clos / Spine–Leaf Architecture: Why Modern Data Centers Depend on It for Scale & Reliability

Sabyasachi (SK)


Modern data centers—whether built by Google, Meta, AWS, Microsoft, or on Arrcus-based fabrics—use a Clos / Spine–Leaf architecture instead of the traditional three-tier network. The reason is simple:

👉 It scales horizontally.
👉 It delivers predictable low latency.
👉 It offers massive throughput and no single point of failure.

Let’s break this down in a way that would impress any Google, Arista, Cisco, or Arrcus interviewer.

1. What is a Clos / Spine–Leaf Architecture?

A Clos fabric (named after Charles Clos, who described it in the 1950s) is a multi-stage switching topology designed to provide non-blocking, high-bandwidth connectivity using many small switches instead of a few big ones. The modern implementation is called:

✔ Spine–Leaf Architecture

  • Leaf switches connect to servers (Top of Rack – TOR).
  • Spine switches connect to all leafs.
Every leaf is connected to every spine → creating a predictable, high-bandwidth fabric.
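To make “every leaf connects to every spine” concrete, here is a minimal Python sketch of the topology (switch names and counts are purely illustrative):

```python
# Minimal model of a spine-leaf fabric: every leaf has one uplink to every spine.
spines = [f"spine{i}" for i in range(1, 5)]   # example: 4 spines
leafs = [f"leaf{i}" for i in range(1, 9)]     # example: 8 leafs

# Adjacency map: each leaf connects to all spines (and only to spines).
fabric = {leaf: list(spines) for leaf in leafs}

# Every leaf has exactly as many uplinks as there are spines.
assert all(len(uplinks) == len(spines) for uplinks in fabric.values())
print(fabric["leaf1"])   # ['spine1', 'spine2', 'spine3', 'spine4']
```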

2. Why Traditional 3-Tier Networks Failed at Scale

Traditional networks used:

  • Core
  • Aggregation
  • Access
Problems appeared when data centers grew:

❌ Traffic bottlenecks between tiers

❌ North–south optimized, not east–west

❌ Limited bandwidth as link counts were fixed

❌ Scaling required expensive “big chassis” switches

But the modern data center needs:

  • Microservices
  • AI workloads
  • Distributed computing
  • 100G/400G server NICs
Traffic is now 80–90% east–west → services talking to each other.
The old model simply cannot sustain this.

3. How a Spine–Leaf Fabric Actually Works

Leaf Switches

  • Connect to servers.
  • Provide local switching.
  • Perform VXLAN encapsulation (as VTEPs).
  • Have uplinks to every spine.
Spine Switches
  • Act as the high-speed backbone.
  • Are L3-only switches (simple, fast).
  • Never connect to each other.
  • Provide ECMP (Equal Cost Multi-Pathing).
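Because each leaf reaches any other leaf through exactly one spine, every leaf-to-leaf path is two hops long, which is what makes ECMP possible. A small sketch of that idea (names are illustrative, continuing the model above):

```python
spines = ["spine1", "spine2", "spine3", "spine4"]

def leaf_to_leaf_paths(src_leaf, dst_leaf, spines):
    """Every leaf-to-leaf path is leaf -> some spine -> leaf: always 2 hops."""
    return [(src_leaf, spine, dst_leaf) for spine in spines]

paths = leaf_to_leaf_paths("leaf1", "leaf7", spines)
assert all(len(path) - 1 == 2 for path in paths)  # every path is exactly 2 hops
print(len(paths))  # 4 equal-cost paths, one per spine -> ECMP candidates
```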

4. How Spine–Leaf Provides Massive Scale

🎯 A. Horizontal Scalability

Want more servers?
➡ Add more leaf switches.

Want more bandwidth?
➡ Add more spine switches.

No need to replace expensive chassis. Scaling is linear.

Example:

  • 4 spines × 48 leafs = 192 fabric links
  • Add 2 more spines → now 288 links
  • Bandwidth expands instantly without downtime
This is why Google’s Jupiter fabric scales to 100,000+ servers.
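The arithmetic behind that example, as a quick sketch (the 100G uplink speed is an assumption for illustration):

```python
def fabric_links(num_spines, num_leafs):
    """Each leaf has one uplink per spine, so links = spines x leafs."""
    return num_spines * num_leafs

def leaf_uplink_gbps(num_spines, uplink_gbps=100):
    """A leaf's total uplink bandwidth grows linearly with the spine count."""
    return num_spines * uplink_gbps

print(fabric_links(4, 48))       # 192 fabric links
print(fabric_links(6, 48))       # 288 links after adding 2 spines
print(leaf_uplink_gbps(4))       # 400 Gbps of uplinks per leaf
print(leaf_uplink_gbps(6))       # 600 Gbps -- bandwidth grows without a redesign
```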

🎯 B. ECMP = More Bandwidth Automatically

Because every leaf connects to all spines, each path has the same cost (distance = 2 hops).

The fabric becomes:
➡ Predictable
➡ Low-latency
➡ Multi-path load balanced

Using ECMP hashing, the network uses all paths simultaneously. Typical fabrics run:

  • 8-way ECMP
  • 16-way ECMP
  • 32-way ECMP
  • 64-way ECMP (in hyperscale fabrics)
The more spines you add → the more bandwidth every server gets.
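Roughly how ECMP spreads flows: the switch hashes each flow's 5-tuple and uses the result modulo the number of equal-cost next hops, so one flow always follows the same path (no reordering) while different flows are distributed across all spines. A simplified Python sketch (real ASICs use hardware hash functions, not CRC32):

```python
import zlib

def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Hash the 5-tuple and pick one equal-cost next hop.
    Packets of the same flow always hash to the same spine."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

spines = ["spine1", "spine2", "spine3", "spine4"]
print(ecmp_next_hop("10.0.1.10", "10.0.2.20", "tcp", 51512, 443, spines))
print(ecmp_next_hop("10.0.1.11", "10.0.2.20", "tcp", 49822, 443, spines))
```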

🎯 C. No Oversubscription (or Predictable Oversubscription)

Oversubscription is the ratio of server-facing bandwidth to uplink bandwidth:

Server-facing bandwidth : Uplink bandwidth
In spine–leaf, you can design:

  • 1:1 (non-blocking fabric)
  • 2:1 (balanced)
  • 4:1 (cost-optimized)
The beauty:
➡ it’s controlled and predictable.
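A quick sketch of how that ratio is worked out per leaf (port counts and speeds are illustrative assumptions):

```python
def oversubscription(server_ports, server_gbps, uplinks, uplink_gbps):
    """Server-facing bandwidth divided by uplink bandwidth on one leaf."""
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)

# 48 x 25G server ports with 4 x 100G uplinks -> 1200:400 = 3:1 (cost-optimized)
print(oversubscription(48, 25, 4, 100))   # 3.0
# 32 x 100G server ports with 8 x 400G uplinks -> 3200:3200 = 1:1 (non-blocking)
print(oversubscription(32, 100, 8, 400))  # 1.0
```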

5. How Spine–Leaf Provides High Reliability

A. No Single Point of Failure

In traditional networks:

  • If an aggregation switch fails → a whole cluster goes down.
In spine–leaf:
  • If one spine fails, traffic automatically uses the remaining spines.
  • If one link fails, ECMP reroutes instantly.
  • If a leaf fails, only servers on that leaf are impacted.
Everything else stays operational.

B. Fast Failover (Subsecond)

Because the fabric is built around L3 + ECMP, failure detection is fast:

  • BFD (50ms or lower)
  • BGP/OSPF/ISIS with rapid timers
  • Reactive ECMP reconvergence
Traffic reroutes around the failure almost instantly, with minimal impact on throughput.
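From the leaf's point of view, reconvergence is just shrinking the ECMP set: once BFD or the routing protocol declares a spine down, that next hop is withdrawn and flows rehash over the survivors. A rough sketch (continuing the illustrative names above):

```python
spines = ["spine1", "spine2", "spine3", "spine4"]

def on_spine_failure(failed, next_hops):
    """Drop the failed next hop; the remaining equal-cost paths keep forwarding."""
    return [nh for nh in next_hops if nh != failed]

surviving = on_spine_failure("spine2", spines)
print(surviving)   # ['spine1', 'spine3', 'spine4']
# Flows rehash across 3 spines instead of 4: capacity drops by 25%,
# but every destination stays reachable.
```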

C. Simplified L3-only Core

Spines do not do:

  • STP
  • MLAG
  • Flooding
  • L2 broadcast
Less complexity → fewer outages.

Leafs handle L2 (via VXLAN EVPN),
Spines handle L3 (fast, stateless, scalable).
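To see the division of labor, here is a minimal sketch of what the leaf, acting as VTEP, prepends to the original frame: the 8-byte VXLAN header carrying the 24-bit VNI (the outer UDP/IP headers are omitted for brevity). The spine only ever routes on the outer IP header and never looks inside.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP destination port for VXLAN

def vxlan_header(vni):
    """Build the 8-byte VXLAN header: I flag set, 24-bit VNI, reserved bits zero."""
    flags = 0x08 << 24              # 'I' flag: the VNI field is valid
    return struct.pack("!II", flags, vni << 8)

inner_frame = b"\x00" * 64          # placeholder for the original Ethernet frame
vxlan_payload = vxlan_header(vni=10100) + inner_frame
print(len(vxlan_header(10100)))     # 8 (bytes)
```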

6. Why Hyperscalers Love Clos / Spine–Leaf

✔ Amazon → Uses multi-stage Clos called “Scorpion”

✔ Google → Jupiter, B4 SDN WAN

✔ Microsoft → Clos-based Azure fabrics

✔ Meta → FBOSS + Wedge switches in a Clos-based data center fabric

✔ Arrcus ArcOS → Ultra-high-performance ECMP Clos with ArcIQ

The reasons:

  • Massive horizontal scale
  • Commodity whitebox switches
  • Automated provisioning
  • Predictable failure behavior
  • Supports VXLAN/EVPN overlays

7. Putting It All Together — Summary Table

Feature | Traditional Network | Spine–Leaf (Clos)
Traffic Pattern | North–South | East–West
Scalability | Limited | Massive, linear
Failure Impact | High | Minimal
Upgrade | Complex, disruptive | Add spines/leafs modularly
Bandwidth | Choke points | Predictable ECMP
Fabric Type | Tree | Clos Fabric

Final Answer for an Interview (Concise Version)

A Clos / Spine–Leaf architecture provides scale by using many small switches connected in a multi-stage fabric where leaf switches connect to all spines, enabling horizontal scaling simply by adding more spines or leafs. Reliability comes from ECMP multi-pathing, where multiple equal-cost paths ensure traffic automatically reroutes during link or device failures. This eliminates bottlenecks, delivers predictable low latency, and removes single points of failure, making it the foundation of all modern data center networks.

How do you configure it? Follow this:

EVPN-VXLAN on Cisco Nexus Explained: L2VNI, BGP EVPN, OSPF Underlay & Full Config | EVPN VXLAN Nexus

Sabyasachi
Network Engineer at Google | 3x CCIE (SP | DC | ENT) | JNCIE-SP | SRA Certified | Automated Network Solutions | AI / ML (Designing AI DC)