Software-defined data center and what's the way to do it

SDDC – Software-Defined Data Centers

The era of software-defined everything arrived long ago, and deploying many appliances, two or more for each network function, is no longer popular. Being able to manage packet forwarding, load balancing and security of network traffic inside the datacenter from one simple web console finally shows that things can be managed in a simpler way after all. All vendors in the networking world tried to come up with their own way of centralizing data center management and, as it turns out, all of them did, some better than others. As always, it is no surprise that some vendors are better at creating hardware-based forwarding solutions and others at software solutions (in this case, software for packet forwarding).

Requirements

It seems that we have only a few good options when selecting a complete SDDC solution. The data center needs to provide a large number of server access ports on networking devices that can be configured and managed as simply and promptly as possible. The datacenter network needs to be configured to provide robust and stable packet forwarding at close to line rate, and all that at 10 to 100 Gbps, even 400 Gbps, speeds.

Packet forwarding itself is not enough; a data center networking solution also needs to provide security, and that security comes in several forms. First, most of the time you need support for multiple tenants, which can be used for public-cloud-like services or for separating test/stage/production environments. After tenants, you will probably need some security zoning and further security in the form of traffic inspection for North-South traffic. North-South inspection becomes really important as soon as you publish anything from your data center to the outside world. Later you will probably get to micro-segmentation inside the data center itself, either to enhance security or to achieve some crazy security dream of complete isolation with only specific traffic whitelisted, even inside L2 domains.

The last of the mentioned security options, micro-segmentation, enables granular security while keeping services available, and at the same time prevents unwanted traffic between compromised components of services running on datacenter hypervisor hosts.
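To make the whitelisting idea a bit more concrete, here is a minimal sketch of default-deny micro-segmentation in Python. It is not how any particular product implements it; the group names, VM names and ports are made up for illustration. The point is only that VMs sharing the same L2 segment can talk solely over flows that are explicitly allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One whitelisted flow between two security groups (hypothetical model)."""
    src_group: str
    dst_group: str
    protocol: str
    port: int


# Hypothetical whitelist for a three-tier service; anything not listed is dropped.
ALLOW_RULES = {
    Rule("web", "app", "tcp", 8443),
    Rule("app", "db", "tcp", 5432),
}

# Hypothetical VM-to-group tags; all three VMs could sit in the same L2 segment.
VM_GROUPS = {"web-01": "web", "app-01": "app", "db-01": "db"}


def is_allowed(src_vm: str, dst_vm: str, protocol: str, port: int) -> bool:
    """Default deny: permit only flows that match an explicit whitelist rule."""
    src = VM_GROUPS.get(src_vm)
    dst = VM_GROUPS.get(dst_vm)
    if src is None or dst is None:
        return False  # untagged workloads get no access at all
    return Rule(src, dst, protocol, port) in ALLOW_RULES
```

With these made-up rules, `is_allowed("web-01", "app-01", "tcp", 8443)` is permitted, while a compromised web VM trying to reach the database directly with `is_allowed("web-01", "db-01", "tcp", 5432)` is not.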

A datacenter networking solution needs to provide more: a nice and useful monitoring tool, and other tools that ease troubleshooting of issues on multiple devices at once.

Did I mention redundancy? Okay, you surely need that too.

Selecting the right technology and solution

When you look at this from the perspective of a network solution architect, without going deep into every part of each technology mentioned below (that is what the other articles on this site are for), you can use this as a design suggestion that should be taken into account if you really want to implement a real software-defined datacenter with all its components and features.

Simply stated, the suggestion is to use both solutions together: Cisco ACI for hardware fabric provisioning and monitoring, and VMware NSX-T for advanced L3-L7 networking functions.

These two are not the same kind of technology; they are basically impossible to compare because they are complementary. You still need servers connected to a robust network, with at least a 4x10G connection from each host to the datacenter, and ACI is proven to be good at this part. On the other side, if security is of any importance to you, you need to be able to put a firewall in front of each VM in the data center and still keep the security policy fairly simple to manage and enforce. You also want good load balancing and multisite solutions whose features can be easily applied to certain groups of VMs inside the datacenter and to specific traffic flows. VMware NSX-T excels at that part.

Choose wisely

Managing many things related to packet forwarding and network security directly at the network traffic source has several important advantages.

In today’s datacenter, from the physical perspective of a networking device, traffic sources are mostly hypervisor hosts. The ability to control traffic before it actually exits the hypervisor host towards some other host shows so many advantages over doing the same thing on the networking hardware that it is becoming difficult to defend any other datacenter design.

On the other hand, simple provisioning of datacenter fabric components is a really helpful thing to have when we speak about a datacenter with 2000+ 10G server access ports.
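To get a feeling for what 2000+ access ports means for the fabric, here is a rough back-of-the-envelope sketch. The 48 access ports per leaf and the two 100G uplinks per leaf (one towards each of two spines) are assumptions made for illustration, not figures from any specific design.

```python
import math


def leaves_needed(access_ports: int, ports_per_leaf: int) -> int:
    """Number of leaf switches required to offer the given access port count."""
    return math.ceil(access_ports / ports_per_leaf)


def oversubscription(ports_per_leaf: int, access_gbps: int,
                     uplinks: int, uplink_gbps: int) -> float:
    """Ratio of server-facing bandwidth to spine-facing bandwidth on one leaf."""
    return (ports_per_leaf * access_gbps) / (uplinks * uplink_gbps)


# Assumed values: 48 x 10G access ports and 2 x 100G uplinks per leaf.
leaf_count = leaves_needed(2000, 48)
ratio = oversubscription(48, 10, 2, 100)
print(f"{leaf_count} leaf switches, {ratio:.1f}:1 oversubscription per leaf")
```

Under these assumptions that is a few dozen switches to provision and keep consistent, which is exactly where centralized fabric provisioning pays off.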

The thing is, some solutions that are good at some parts of the requirement list above are not very good at other things that we need in the datacenter. A great number of issues arise when we try to meet all requirements with one solution. Been there, done that, it’s not fun.

Cisco ACI

Cisco Application Centric Infrastructure is a hardware datacenter fabric provisioning solution for a software-defined data center that takes Nexus 9K L3 switches and turns them into a Clos fabric.

It does that by automatically provisioning fabric-compatible Cisco switches with IS-IS routing between them and enabling a VXLAN overlay that carries L2 traffic across the routed fabric, thus making 100 or more switches work as one big line-card switch chassis with 1000+ datacenter server access ports. It is a proven and good Cisco-based data center fabric solution and a really surprisingly stable Clos topology. Cisco says it also covers multitenancy, multi-site datacenter, segmentation, load-balancing, and micro-segmentation features through its APIC controller configuration. It is furthermore marketed as able to integrate multiple types of third-party appliances into its Application Centric configuration model using advanced policy-based packet redirect mechanisms.
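As a rough illustration of what the overlay adds to every L2 frame, here is a sketch of the standard 8-byte VXLAN header from RFC 7348 with its 24-bit segment identifier (VNI). The function names are made up for this example, and the encapsulation that ACI uses inside its fabric may differ in detail from the plain RFC format.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte header: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking that the I flag is set."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


# Hypothetical segment ID; the original Ethernet frame would follow the header
# inside a UDP datagram routed between leaf switches.
header = build_vxlan_header(10010)
```

Because the original frame travels inside a routed UDP packet, the fabric between the leaves only needs L3 reachability, and the 24-bit VNI leaves room for far more segments than the 12-bit VLAN ID.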

The truth is, it should not be used for anything more than a good datacenter server access networking solution, one that lets server uplinks connected to redundant switches be placed in the same L2 segment no matter which switches inside the datacenter they are physically connected to. We are speaking about the need to have the same VLANs (VXLANs to be exact, but really any L2 domain) available to all VMware hypervisor cluster host uplinks no matter where they are connected, and all that without stretching VLANs through the whole datacenter in the classic way, with its confounding mix of VLAN propagation and Spanning Tree L2 loop prevention.

This is where Cisco ACI is good. If you need more ports, you uplink another Leaf to both Spine switches, click through provisioning in the APIC controller, and the new switch is part of the fabric in less than a minute. If you want a Multi-Site solution with inbound traffic paths at multiple sites and some firewalls at the datacenter entrance, asymmetric routing will kill your design. If you want to micro-segment 1000 server VMs so that each has its own security policy within a Multi-Site ACI fabric, that is a no-go from a scale perspective.

If you still want all that, leave ACI to do what it does best, which is to be the datacenter traffic path and L3 fabric, and get VMware NSX-T for the other stuff and do that part of networking in software.

VMware NSX-T

It’s not perfect either. You still cannot have a dual-datacenter solution with inbound traffic able to enter at both locations for all destinations; no, only from one side. Old legacy stuff still prevents implementing that kind of design without global load balancers and authoritative DNS record handling for inbound traffic management. For an active-active datacenter you still need, at the very least, BGP with AS-path prepending towards the ISPs and some BGP policy tweaking. But for Disaster Recovery with a secondary backup site, it works great.

You have a data center and want to make it a private cloud, public cloud, or hybrid cloud? It can be done, fairly easily. VMware with NSX-T lets you manage all of that through one NSX-T Manager console. NSX-T Manager is basically a virtual controller cluster appliance that manages the configuration of the virtual switch implemented in the VMware ESXi host kernel, which enables packets to be handled by the hypervisor directly in the host kernel, before the packet exits the host. Remember that? That’s good, because forwarding decisions, security and everything else you want are applied at the traffic source.
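On the inbound traffic point, the AS-path prepend trick can be sketched like this. It is a deliberately simplified model that compares only AS-path length and ignores every other step of BGP best-path selection; the AS numbers come from the range reserved for documentation and the site names are hypothetical.

```python
from typing import Dict, List

OWN_ASN = 64500  # hypothetical AS of the two datacenters (documentation range)


def advertise(prepend_count: int) -> List[int]:
    """AS path as originated by a site, with the own ASN prepended N extra times."""
    return [OWN_ASN] * (1 + prepend_count)


def preferred_entry(paths: Dict[str, List[int]]) -> str:
    """Pick the site a remote network would prefer, judging by AS-path length only."""
    return min(paths, key=lambda site: len(paths[site]))


# Paths as seen by some remote network, each learned through a different ISP.
# The secondary site prepends twice so that inbound traffic prefers the primary.
paths_seen = {
    "dc-primary": [64496] + advertise(0),
    "dc-secondary": [64497] + advertise(2),
}
print(preferred_entry(paths_seen))
```

Prepending only nudges the decisions of other networks. It cannot pin individual flows to one site, which is why this design works well for a primary site with a backup site but not for true active-active ingress.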

Anything is not for everything

Working on many new datacenter designs in the last few years helped me realize that thinking in terms of a separate appliance for each networking function doesn’t always get you the best data center design. On the other side, the complicated hardware switching fabrics pushed by some vendors did not show the ability to simplify things in the data center. Rather, if you want to use them for anything other than a switching fabric, they mostly just make networking in the data center more complicated, bringing additional layers of complexity into the design.

The idea of managing the datacenter fabric from one central point, with the possibility to automatically provision and manage an L3-based Clos topology switching fabric, sounds really promising. The idea by itself was promising, but getting the Cisco ACI fabric to work, and afterward configuring and troubleshooting some simple security and forwarding paradigms on that fabric, is anything but simple. The CAM tables on the switches in use were simply not large enough to support the scale of micro-segmentation that some customers required. Policy-based redirection for third-party security appliance integration was confusing and unmanageable. Everything beyond L3 was simply too complicated.
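To see why per-VM segmentation runs into table limits so quickly, consider a worst-case count of policy entries. This is plain arithmetic under the assumption that every ordered pair of segments needs its own entry; real switches compile policy differently, so treat the numbers as an illustration of growth, not as measured hardware consumption.

```python
def worst_case_pair_entries(segments: int, filters_per_pair: int = 1) -> int:
    """Entries needed if every ordered (source, destination) pair gets its own rules."""
    return segments * (segments - 1) * filters_per_pair


# 1000 VMs, each in its own segment, versus the same VMs grouped into 10 zones.
per_vm = worst_case_pair_entries(1000)
per_zone = worst_case_pair_entries(10)
print(per_vm, per_zone)
```

The count grows roughly with the square of the number of segments, so moving from a handful of zones to one segment per VM multiplies the table demand by several orders of magnitude.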

The whole idea of using legacy networking protocols on hardware switches and simply hiding everything behind centralized provisioning of an Application Centric configuration formatted like a relational database brought so many obstacles to creating any advanced configuration on that fabric that after some time we ended up enabling only a simple switching function on the L3-based fabric, getting a fabric of 10 to 40 switches to work as one simple L3 switch.

Having NSX-T will give you a lot of fancy options for managing network functions without touching the network hardware (underlay). But to meet standard network underlay requirements for NSX-T, you still need a good Leaf-Spine fabric in the background: https://blog.ipspace.net/2020/02/do-we-need-complex-data-center-switches.html

Everything beyond that was more or less full of scale and forwarding limitations that we needed to get rid of. This is the point where you should start to think that some networking features the customer needs are better virtualized and applied before packets exit the hypervisor. That started to look like the best way to implement those complex micro-segmentation solutions in the data center. We ended up using both ways of doing the datacenter fabric mentioned above, with each solution solving half of the modern datacenter requirements.
