Category: Software Defined Datacenters

Google Jupiter Data Center Network Fabric – New Way of Building Data Center Network Underlay

Google’s Datacenter Optical Circuit Switches and Jupiter network fabric

Google’s data centers are unlike any other. It seems they have windows like normal houses, because judging from the SIGCOMM’22 presentation, they took their spine switches and threw them right out of that window.

Google worked on Micro-Electro-Mechanical Systems (MEMS) for years in order to build an Optical Circuit Switch (OCS) that enables dynamic reconfiguration of optical connections between switches in the data center. The Optical Circuit Switch lets the connections between the fabric’s aggregation block switches be reconfigured on the fly, without any physical rewiring. Most interestingly, combined with Software-Defined Networking (SDN) traffic engineering, it allows aggregation block switches to be connected directly to each other and completely removes the need for the bulky spine switches that interconnect aggregation blocks in Clos topologies.

The OCS MEMS mirror array can redirect light from any input port to any output port by slightly tilting one of its mirrors, dynamically changing the optical connections between aggregation blocks.

The spine switch role is essentially taken over by OCS devices, which provide smart, dynamic, direct interconnection of the leaf (aggregation block) switches, while SDN handles BUM traffic (broadcast, unknown-unicast and multicast), the other important spine role.
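To get a feel for what the OCS layer does logically, here is a minimal Python sketch of an optical circuit switch modelled as a reconfigurable one-to-one port mapping. This is my own simplified illustration of the idea, not Google’s implementation; the port numbers and block names are made up.

```python
# Minimal, simplified model of an optical circuit switch (OCS):
# a reconfigurable one-to-one mapping of input ports to output ports.
# Illustration of the concept only, not Google's implementation.

class OpticalCircuitSwitch:
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mapping = {}  # input port -> output port

    def connect(self, in_port, out_port):
        """'Tilt a mirror': steer light from in_port to out_port."""
        if out_port in self.mapping.values():
            raise ValueError(f"output port {out_port} is already in use")
        self.mapping[in_port] = out_port

    def reconfigure(self, new_mapping):
        """Apply a whole new circuit layout at once, e.g. one computed
        by an SDN traffic-engineering controller from observed demand."""
        if len(set(new_mapping.values())) != len(new_mapping):
            raise ValueError("each output port can carry only one circuit")
        self.mapping = dict(new_mapping)


# Aggregation blocks plugged into OCS ports are directly interconnected,
# and can be re-connected later without anyone touching a cable.
ocs = OpticalCircuitSwitch(num_ports=8)
ocs.connect(0, 4)   # block A uplink -> block B
ocs.connect(1, 5)   # block A uplink -> block C
print(ocs.mapping)  # {0: 4, 1: 5}

# Traffic pattern changed? The controller simply pushes a new mapping.
ocs.reconfigure({0: 5, 1: 4})
print(ocs.mapping)  # {0: 5, 1: 4}
```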

The Google paper (Jupiter Evolving: Transforming Google’s Datacenter Network via Optical Circuit Switches and Software-Defined Networking), presented at SIGCOMM’22, describes how getting rid of spines and steering traffic with SDN gave the Jupiter fabric up to 5x higher speed and capacity and a 40% reduction in power consumption compared to similar Clos topology data center fabrics.

Switch vSphere Enterprise Plus license to vSphere Standard on an NSX-T enabled cluster

This article describes the strange workaround of switching a VMware NSX-T enabled cluster from a vSphere Enterprise Plus license to a vSphere Standard license, with the vDS licensed through NSX-T. I really hope you will not need to go through this, as it is much like bringing the whole environment up from scratch, but if you have two clusters with enough resources, it can be done without downtime.

The environment on which this was tested is vSphere 7.0.2 and NSX-T 3.1.2.

As a network and security platform, NSX-T enables network functions to be virtualised on your vSphere cluster. It does this by implementing additional traffic steering and encapsulation features inside the vSphere Distributed Switch (vDS).

Before NSX-T 3.1.1, the only way to get your cluster equipped with a vDS was to have a vSphere Enterprise Plus license. From NSX-T 3.1.1 onwards, VMware gives you the option to use a vDS without a vSphere Enterprise Plus license and to license it through NSX-T instead. This lets users with a vSphere Standard license deploy NSX-T on all editions of vCenter Server and vSphere.

Once this became possible, some customers realised that in certain cases the only reason they had a vSphere Enterprise Plus license on a specific cluster was to be able to use NSX-T, since that was required in the past. So they decided to transfer those Enterprise Plus licenses to another (new) cluster that actually needs them for other features.

NSX-T Edge Transport Node Packet Capture

NSX-T v3.0.1 and v3.1.3 were used to test what is described below.

As always with network engineers, even when working with SDN/SDDC solutions, sooner or later you will be asked to troubleshoot connectivity across your hops. And when working with the VMware NSX-T platform, your next hop for north-south data center traffic will almost always be an NSX-T Edge Transport Node VM. It is really useful, then, to be able to get packet traces out of that box in order to troubleshoot traffic issues in detail.

One example would be simple routed traffic, or some sort of load-balanced traffic that does not seem to reach the backend hosts behind the NSX-T Edge.

On the NSX-T Edge VM it’s fairly simple to capture traffic directly. You can print the output to the console, or save it to a file on the Edge and then pull it out with SCP.

If you have an Edge cluster, normally built out of two VMs, you first need to see on which node the T0 or T1 router whose traffic you want to capture is active.

Let’s say we want to capture traffic on the “T0-router” shown in the image below. You can open that T0 router in the UI and check the High Availability Mode output:

NSX-T EDGE VM Active T0
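Once you know which Edge node hosts the active T0/T1 service router, a capture session on that node looks roughly like the outline below. Treat it as a sketch only: the exact capture command syntax and options differ between NSX-T versions, so check the CLI help or the official NSX-T CLI reference on your Edge before running anything.

```
# On the active Edge node CLI (admin user) - outline only:

nsx-edge-01> get logical-routers                 # find the VRF of the active Tier-0/Tier-1 SR
nsx-edge-01> vrf 1                               # enter that service router context
nsx-edge-01(tier0_sr)> get interfaces            # note the UUID of the uplink/downlink port
nsx-edge-01(tier0_sr)> exit

# Print the capture to the console...
nsx-edge-01> start capture interface <interface-uuid> direction dual

# ...or write it to a file on the Edge and pull it off later with SCP
nsx-edge-01> start capture interface <interface-uuid> file t0-uplink.pcap
```

The resulting file stays in the Edge’s local file store, from where you can scp it to your workstation and open it in Wireshark.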

VMware NSX-T Install Tips & Tricks

UPDATE on 13 Feb 2021:
There were some changes and improvements in NSX-T 3.1, so some of these tips are no longer needed. I’m in the process of verifying those notes myself, but it seems NSX Edge VMs can now be migrated and Edge VTEPs no longer need a subnet separate from host VTEPs.

Intro

This is a short list of things you should probably know when installing VMware NSX-T. Of course, installing NSX-T should be done by following the official documentation; these are just a few additional points that could help, and that will give you peace of mind afterwards.

This is an article from the VMware from Scratch series

NSX Manager is a Cluster of three VMs

You should end up with three NSX Manager VMs in a cluster when you finish the NSX-T installation. The first one is deployed via the .ovf file from vCenter; the other two are deployed directly from the first NSX Manager’s GUI as soon as you connect it to vCenter (i.e. by adding it under Fabric -> Compute Managers).

VMware NSX-T Managers cluster

NSX Manager VMs should not run on the same ESXi host

Use vCenter cluster VM/Host rules (anti-affinity rules) to automatically keep the Manager VMs running on different hosts in the VMware environment. The point is that if a host fails, you still have a majority of the Managers running.
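If you prefer to script it, below is a rough pyVmomi sketch that creates a “Separate Virtual Machines” DRS rule for the three Manager VMs. The vCenter address, credentials, cluster name and VM names are placeholders for illustration only; the same rule can be created in a couple of clicks in the vSphere Client under Cluster -> Configure -> VM/Host Rules.

```python
# Rough pyVmomi sketch: create an anti-affinity DRS rule that keeps the
# three NSX Manager VMs on different ESXi hosts. All names below are
# assumptions -- adjust them to your environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.lab.local"                           # placeholder
CLUSTER = "Compute-Cluster"                             # placeholder
NSX_MGRS = ["nsx-mgr-01", "nsx-mgr-02", "nsx-mgr-03"]   # placeholder

ctx = ssl._create_unverified_context()   # lab only; validate certs in production
si = SmartConnect(host=VCENTER, user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
content = si.RetrieveContent()

def find_objs(vimtype):
    """Return all inventory objects of the given type."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    objs = list(view.view)
    view.DestroyView()
    return objs

cluster = next(c for c in find_objs(vim.ClusterComputeResource) if c.name == CLUSTER)
vms = [v for v in find_objs(vim.VirtualMachine) if v.name in NSX_MGRS]

# "Separate Virtual Machines" rule covering the three Manager VMs.
rule = vim.cluster.AntiAffinityRuleSpec(
    name="nsx-managers-separate", enabled=True, mandatory=False, vm=vms)
spec = vim.cluster.ConfigSpecEx(
    rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")])

task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
print("Anti-affinity rule creation task:", task.info.key)
Disconnect(si)
```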

VMware TKGI – Deployment of Harbor Container Registry fails with error

This is an article from the VMware from Scratch series

While preparing to install Tanzu Kubernetes Grid Integrated Edition (TKGI v1.8) on vSphere with NSX-T Data Center (v3.0.2), one of the steps is to use Ops Manager to deploy the Harbor Container Registry (in this case v2.1.0).

The deployment ended with a Harbor error several times, so I’m sharing my solution here to make things easier for you, given that I didn’t come across any solution while googling around.

VMware NSX TKGI K8S

Image from VMware website https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid-Integrated-Edition/index.html

In the process, the Harbor Registry product tile is downloaded from the VMware Tanzu Network portal, imported into the Ops Manager Installation Dashboard, and selected to be configured and prepared for deployment into the VMware environment.