Tag: sdn

Google Jupiter Data Center Network Fabric – A New Way of Building a Data Center Network Underlay

Google’s Datacenter Optical Circuit Switches and Jupiter network fabric

Google’s data centers are unlike any other. It seems they have windows like normal houses, because judging from the latest SIGCOMM’22 presentation, they took their spine switches and threw them right out of that window.

Google worked on Micro-Electro-Mechanical Systems (MEMS) for years in order to build an Optical Circuit Switch (OCS) that enables dynamic reconfiguration of optical connections between switches in the data center. The OCS allows the connections between data center fabric aggregation block switches to be reconfigured on the fly, without physical rewiring. Most interestingly, combined with Software-Defined Networking (SDN) traffic engineering, it lets aggregation block switches be connected directly to each other and completely removes the need for the bulky spine switches that interconnect aggregation blocks in Clos topologies.

The OCS MEMS mirror array can redirect light from any input port to any output port by slightly tilting one of its mirrors, dynamically changing the optical connections between aggregation blocks

The spine switch role is essentially taken over by OCS devices, which provide smart, dynamic, direct interconnection of the leaf (aggregation block) switches. SDN takes over BUM traffic handling (broadcast, unknown-unicast and multicast traffic), which was the other important spine role.
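
To picture what the OCS does, here is a minimal conceptual sketch in Python (not Google’s actual control software; the class, port names and topology below are made up for illustration). The switch is modelled as a mapping from input ports to output ports, and “reconfiguring” the fabric is just rewriting that mapping, with no recabling involved.

# Conceptual model only: an OCS as a reconfigurable input->output port map.
class OpticalCircuitSwitch:
    def __init__(self):
        self.cross_connects = {}  # input port -> output port

    def connect(self, in_port, out_port):
        # "Tilting a mirror": the light path changes in software, no rewiring.
        self.cross_connects[in_port] = out_port

ocs = OpticalCircuitSwitch()
# Directly interconnect aggregation blocks, with no spine layer in the path.
ocs.connect("agg_block_1_uplink_1", "agg_block_2_uplink_1")
ocs.connect("agg_block_1_uplink_2", "agg_block_3_uplink_1")

# Traffic pattern changed? The SDN controller simply rewrites the mapping.
ocs.connect("agg_block_1_uplink_2", "agg_block_4_uplink_1")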

The Google paper (Jupiter Evolving: Transforming Google’s Datacenter Network via Optical Circuit Switches and Software-Defined Networking), presented at the SIGCOMM’22 conference, describes how getting rid of spines and handling traffic intelligently with SDN enabled the Jupiter fabric to deliver up to 5x higher speed and capacity and roughly 40% lower power consumption compared to similar Clos-topology data center fabrics.

Cisco ACI – API Calls vs JSON POST

API Calls method

The fancy way of configuring a Cisco ACI fabric is to use a Python script to generate API calls, which are then pushed to the APIC controller with Postman (or a similar tool). This method suits configuration changes that you make often, and it leaves little room for mistakes.

You write a Python script that takes your configuration variables and generates the API calls that configure the system quickly and correctly every time.

The catch is that you need to take an example API call and write a Python script that recreates that call with your own configuration variables, and does so correctly. You need to know how to code in Python, and writing that script takes a certain amount of time.
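
To give an idea of what such a script looks like, here is a minimal sketch that logs in to the APIC and creates one Bridge Domain from a few variables. The APIC address, credentials, tenant, VRF and BD names are placeholders, and certificate verification is disabled only to keep the example short.

# Minimal sketch: generate and push one ACI API call with Python.
# APIC address, credentials and object names are placeholders.
import requests

APIC = "https://10.0.0.1"
USER, PASSWORD = "admin", "password"
TENANT, BD_NAME, VRF = "Tenant1", "BD_WEB", "VRF1"

session = requests.Session()
session.verify = False  # lab only; use proper certificates in production

# 1. Authenticate; the session keeps the returned APIC cookie
login = {"aaaUser": {"attributes": {"name": USER, "pwd": PASSWORD}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login).raise_for_status()

# 2. Build the payload from the variables and POST it under the tenant
bd_payload = {
    "fvBD": {
        "attributes": {"name": BD_NAME},
        "children": [{"fvRsCtx": {"attributes": {"tnFvCtxName": VRF}}}],
    }
}
resp = session.post(f"{APIC}/api/mo/uni/tn-{TENANT}.json", json=bd_payload)
resp.raise_for_status()
print(f"Created {BD_NAME} in tenant {TENANT}")

Loop over a list of BD names and you can create dozens of objects in seconds, which is exactly the point of this method.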

POST JSON file method

Cisco ACI – Configuring by POSTing JSON

If you are configuring a Cisco ACI data center fabric, sooner or later you will get to the point where you need to configure multiple objects of the same type, which, using the click-n-click method in the GUI, takes a huge amount of time.

While using Postman to create multiple objects of the same type is the preferred method that everybody talks about (because you can generate the REST API calls with Python or something similar), the quickest way is to POST a JSON configuration file directly through the GUI.

POSTing JSON config example

As described above, POSTing JSON is the way to go for simple yet repetitive configuration. Let’s see how it’s done:

Creating multiple BDs inside a tenant in Cisco ACI:
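
A minimal sketch of such a JSON file, creating two Bridge Domains under an existing tenant (the tenant, VRF and BD names here are placeholders you would adjust to your fabric):

{
  "fvTenant": {
    "attributes": {"name": "Tenant1", "status": "modified"},
    "children": [
      {"fvBD": {"attributes": {"name": "BD_WEB", "status": "created"},
                "children": [{"fvRsCtx": {"attributes": {"tnFvCtxName": "VRF1"}}}]}},
      {"fvBD": {"attributes": {"name": "BD_APP", "status": "created"},
                "children": [{"fvRsCtx": {"attributes": {"tnFvCtxName": "VRF1"}}}]}}
    ]
  }
}

Add as many fvBD children as you need; POSTing the file once creates them all in a single shot.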

How to Advertise a Route from ACI Layer2 BD Outside the Fabric?

Sometimes you will have L2 domains (Bridge Domains – BDs) in your data center that are used with hardware appliances like an F5 NLB, an additional firewall, a WAF or something similar. In that case ACI only bridges, it does not route, and the only L3 exit point from that kind of segment is an actual hardware appliance outside the ACI fabric, connected to a leaf port.

We will take an example here and use it throughout the article: a BIG-IP F5 NLB is used as the L3 termination of the L2 BD 10.10.10.0/24.

The F5 is directly connected to an ACI leaf, and routing for the 10.10.10.0/24 subnet (L2 BD) is done directly on the F5 device, which is the default gateway for that subnet’s endpoints.
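
For reference, such a segment is typically built as an L2-only BD: unicast routing is turned off and flooding is enabled, so ACI bridges the traffic and leaves all routing to the F5. A minimal JSON sketch of that kind of BD (the names are placeholders):

{
  "fvBD": {
    "attributes": {
      "name": "BD_10_10_10_0",
      "unicastRoute": "no",
      "arpFlood": "yes",
      "unkMacUcastAct": "flood"
    },
    "children": [{"fvRsCtx": {"attributes": {"tnFvCtxName": "VRF1"}}}]
  }
}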

ACI L2 BD Host Routing

In those cases, for particular implementations where you decide not to use PBR or service graphs, appliances like our F5 become the L3 termination for an ACI L2 BD like the 10.10.10.0/24 from my beautiful image above.

ACI MultiPod – Enable Standby APIC

APIC Controller Cluster

You need three APIC controller servers to get the cluster up and running in a complete and redundant ACI system. You can still work with only two APICs: you will keep cluster quorum and be able to change the ACI fabric configuration.

Losing One Site

In a MultiPod, those three controllers need to be distributed so that one of them is placed in the secondary site. The idea is that, if you completely lose the primary site with its two APICs, you still keep your configuration on the one remaining APIC. On the other hand, if you lose the secondary site, the two APICs in the first site still let you configure and manage the ACI fabric as if nothing happened.

Losing the DCI

The second type of MultiPod failure is losing the DCI (datacenter interconnect). In that case both sites keep working, but each alerts that the other side is down. The secondary site, with one APIC, will be in read-only mode, while the primary site remains fully functional with its two APICs. If changes are made on the primary site, they will be replicated to the third controller on the secondary site when the DCI recovers, and the configuration relating to the site B pod will then be pushed to the Pod 2 fabric.

Please note that temporary DCI issues are not a good time to replace an APIC. Just wait for the DCI to start working normally and keep using the APIC controllers as before the issue. During a DCI outage you can still manage the primary site, while the secondary site keeps working but cannot be configured; once the DCI comes back up, the changes are replicated to the secondary site APIC and fabric. Think about it: perhaps the best thing to do in this case is not to change the fabric configuration on either side until the DCI is back up and running. That way you are sure your changes do not affect MultiPod stability once the DCI recovers and the sites start to communicate again.