Tag: SDDC

VMware NSX-T Install Tips & Tricks

UPDATE on 13 Feb 2021:
There were some changes and improvements with version NSX-T 3.1, so some tips are no longer needed. I’m in the process of proving those notes myself, but it seems NSX EDGE VMs can be migrated now and EDGE VTEPs don’t need a separate subnet from HOST VTEPs anymore.

Intro

It’s a shortlist of things that you should probably know when installing VMware NSX-T. Of course, installing NSX-T should be done by following the official documentation. This here is just a few additional points that could help. It’s for your peace of mind afterward.

This is an article from the VMware from Scratch series

NSX Manager is a Cluster of three VMs

You should end up having three NSX-Manager VMs in a cluster when you finish NSX-T installation. The first one will be deployed via .ovf file from vCenter, the other two direct from first NSX Manager GUI as soon as you connect it to vCenter (aka. adding the Fabric -> Compute Manager)

VMware NSX-T Managers cluster

NSX Manager VMs should not run on the same ESXi host

Use vCenter datacenter configuration VM/host rules (affinity rules) to automatically keep manager VMs running on different hosts on the VMware environment. It’s about the host failing and you still having most of the managers running.

Software-defined data center and what’s the way to do it

SDDC – Software-Defined Data Centers

Times of Software Defined everything has long since arrived, the need to implement many appliances, two or more for each network function, is not so popular anymore. The possibility to manage packet forwarding, load balancing and security of network traffic inside the datacenter from one simple web console is showing finally that things can be managed in a simpler way after all. All vendors in the networking world tried to come up with their own way of centralizing data center management, as it ends up, all of them did it, some better than the others. As always, it’s not a surprise that some vendors are better in creating hardware-based forwarding solutions and some others in software solutions (in this case, software for packet forwarding).

Requirements

It seems that we have basically only a few good options when wanting to select a complete SDDC solution. The data center needs to provide a large number of server access ports in the form of networking devices that are configured and managed as simply and promptly as possible. Datacenter network needs to be configured in a way to provide robustness and stability of packet forwarding at almost line rate and all that at 10-100, even 400Gbps speeds.

Cisco ACI – API Calls vs JSON POST

API Calls method

The fancy way of configuring Cisco ACI Fabric is by using Python script for generating API calls. Those API calls are then used to configure Cisco ACI by pushing those calls to APIC controller using POSTMAN (or similar tool). Configuration changes done this way are those that you are doing often and without much chance of making mistakes.

You write a Python script and that script will take your configuration variables and generate API call that will configure the system quickly and correctly every time.

The thing is that you need to take the API call example and use Python to write a script that will recreate that API calls with your variables of configuration and do that correctly. You need to know to code in Python and you will need a certain amount of time to write that script.

POST JSON file method

How to Advertise a Route from ACI Layer2 BD Outside the Fabric?

Sometimes you will have some L2 domains (Bridge Domains – BD) in your datacenter that will be used with hardware appliances like F5 NLB or something like an additional firewall, WAF or something similar. That is the case where ACI will not route or bridge but the only L3 point of exit from that kind of segment would be on actual hardware appliance outside ACI Fabric – connected to the Leaf port.

We will take an example here and use it throughout the article where BIG IP F5 NLB is used as an L3 termination of L2 BD 10.10.10.0/24.

F5 is directly connected to ACI Leaf and routing from 10.10.10.0/24 subnet (L2 BD) is done directly on F5 device which is default gateway for that subnet endpoints.

ACI L2 BD Host Routing

In those cases for some particular implementations when you decide not to use PBR or Service graphs, it will happen that appliances like our F5 would become L3 termination for some ACI L2 BD like the 10.10.10.0/24 from my beautiful image above.

ACI MultiPod – Enable Standby APIC

APIC Controller Cluster

You actually need three APIC controller servers to get the cluster up and running in complete and redundant ACI system. You can actually work with only two APICs and you will still have a cluster quorum and will be able to change ACI Fabric configuration.

Loosing One Site

In the MultiPod, those three controllers need to be distributed so that one of them is placed in the secondary site. The idea is that you still have a chance to keep your configuration on one remaining APIC while losing completely primary site with two APICs. On the other hand, if you lose secondary site, two APICs in the first site will still enable you to do configuration and management of ACI Fabric as nothing happened.

Losing DCI Interconnect

The second type of MultiPod fail is when you lose DCI (datacenter interconnect). In that case, both sites will keep working but will alert that the other side is down. The secondary site with one APIC will be in read-only mode and the primary site will be fully functional with two remaining APICs on that site. If some changes are made on the primary site, those changes will be replicated to the third controller on the secondary site when DCI recovers and configuration relating site B POD will be then pushed to POD 2 fabric.

DCI issues are not a good time for APIC replacement, just wait for DCI to start working normally and continue to use ACI APIC controllers as before the issues. You will still have the option to manage primary site if DCI fails and after DCI starts working again changes will be replicated to secondary site APICs and Fabric.

Please note that temporary DCI issues are not a good time to replace the APIC. If you are experiencing just a DCI outage the second site still works but it cannot be configured. Think about it, perhaps the best thing to do in this case is not to change the configuration of your fabric on either side while DCI doesn’t get back up and running. That way you are sure your configuration does not affect the MultiPod stability once DCI gets back up and sites start to communicate again.