NSX-T Active-Active Multisite in a Single Region and Failover to a Secondary Region Part 1.

An overview of different NSX-T Multisite Topology options

Multisite Deployment of NSX-T Data Center

A client required the ability to deploy NSX-T Multisite in an Active-Active manner within a single rack, with a backup/DR rack to fail over to with minimal intervention and disruption to the dataplane. The setup also required local egress within the active rack, so that as little traffic as possible is sent across racks.

I have explored three possible ways to configure multisite:

Active-Standby
Active-Active
Active-Active in a single site or region, with failover to a secondary region

Below are quick summaries of the above topologies.

1. Active-Standby Topology

The Active-Standby topology consists of a T0 gateway in Active-Standby mode, which effectively places the Edge VMs in Active-Standby as well. Refer to the image below.

In this NSX-T multisite deployment scenario, should the primary site fail, the NSX-T Edge Cluster and dataplane will fail over to the secondary site and the standby Edge VM will become active. Workload failover is beyond the scope of this article; however, it must still be planned for.
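The HA mode is a property of the T0 gateway itself. As a minimal sketch, the Python below builds the request body for a Tier-0 PATCH against the NSX-T Policy API (/policy/api/v1/infra/tier-0s/<tier-0-id>). The ha_mode and failover_mode field names are as I understand the Policy API, so verify them against the API reference for your NSX-T version; the display name is a placeholder, and the sketch only builds the body, it does not send it.

```python
import json

# Values accepted by the Tier-0 "ha_mode" field (assumption: check your NSX-T API reference).
HA_MODES = {"ACTIVE_ACTIVE", "ACTIVE_STANDBY"}


def tier0_payload(display_name, ha_mode, failover_mode="NON_PREEMPTIVE"):
    """Build a minimal body for PATCH /policy/api/v1/infra/tier-0s/<tier-0-id>.

    failover_mode (PREEMPTIVE or NON_PREEMPTIVE) governs behaviour when the
    active node recovers in ACTIVE_STANDBY, so this sketch only sets it there.
    """
    if ha_mode not in HA_MODES:
        raise ValueError(f"unsupported ha_mode: {ha_mode}")
    body = {"display_name": display_name, "ha_mode": ha_mode}
    if ha_mode == "ACTIVE_STANDBY":
        body["failover_mode"] = failover_mode
    return body


if __name__ == "__main__":
    # "t0-multisite" is a hypothetical gateway name.
    print(json.dumps(tier0_payload("t0-multisite", "ACTIVE_STANDBY"), indent=2))
```

The Active-Active T0 gateway in the third design would be built with ha_mode set to ACTIVE_ACTIVE instead.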

2. Active-Active Topology

The NSX-T Data Center Active-Active multisite topology isn't how one would traditionally envisage an Active-Active site functioning. There are two T0s, each with its own Edge Cluster, with segments either attached directly to the T0 or plumbed into a T1 that is then linked to the T0. Each site advertises different subnets via eBGP or makes them available through static routing. On top of this, clients may choose to add some form of application-layer load balancing, such as a GSLB or any other mechanism they deem appropriate. During a site failure, the active node of the failed site's NSX-T Edge Cluster would fail over to the other site. Until the failed site is brought back online, all traffic for the segments that were in the failed site is routed through the second site. Refer to the image below.

3. Active-Active in a single rack or site with a standby site or rack for failover

In this NSX-T multisite design, we look at the configuration needed to enable an Active-Active T0 gateway while controlling where the Edge VMs are placed and where dataplane traffic ingresses and egresses. Generally, a single rack/site deployment is straightforward, as either there is a single rack for all appliances, or there is more than one rack but no need to control where traffic ingresses and egresses.

However, in this NSX-T Edge Cluster failure scenario, there were two racks in a single site, each with its own ToRs with routing enabled and uplinks to the network core. The goal was to ensure dataplane traffic ingressed and egressed via the active rack, failing over to the backup rack only if the active rack failed, while keeping manual intervention, and therefore dataplane downtime, to a minimum.

The T0 peers upstream with the ToRs. Whilst this satisfies the minimal-downtime requirement, it does not ensure that dataplane traffic egresses locally in the active rack. This is because the ToRs, and by the nature of dynamic routing the core, have learnt the routes from the peers in both 'sites'. The physical fabric sees the paths as being the same length and will therefore balance traffic across all of them. We now need to make the active rack the preferred path, which can be done by prepending the AS path and attaching it as the out filter on the interfaces pointing to the second rack's ToRs.
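To make the effect of the prepend concrete, here is a minimal Python sketch of how the upstream fabric chooses between the two racks. It is a simplified model, not NSX-T or ToR code: it assumes every other BGP attribute (weight, local preference, origin, MED) is equal so that only AS-path length decides, assumes the fabric load-balances across equal-length paths, and uses a hypothetical T0 AS number (65001), prefix and ToR names.

```python
from dataclasses import dataclass

# Hypothetical private AS number for the T0 gateway (assumption, not from the lab).
T0_ASN = 65001


@dataclass(frozen=True)
class Advertisement:
    """One BGP path for a workload prefix, as received by a ToR from the T0."""
    prefix: str
    via_tor: str    # ToR that learnt the path
    rack: str       # "A" = active rack, "B" = standby rack
    as_path: tuple  # AS path as sent by the T0


def advertise(prefix, tor, rack, prepend_count=0):
    """Model the path a T0 uplink sends; prepend_count mimics an AS-path prepend out filter."""
    return Advertisement(prefix, tor, rack, (T0_ASN,) * (1 + prepend_count))


def best_paths(paths, failed_racks=frozenset()):
    """Simplified best-path selection: shortest AS path wins, equal paths are all kept."""
    # Routes from a failed rack's Edge VMs are withdrawn, so drop them first.
    alive = [p for p in paths if p.rack not in failed_racks]
    if not alive:
        return []
    shortest = min(len(p.as_path) for p in alive)
    return sorted(p.via_tor for p in alive if len(p.as_path) == shortest)


def build_paths(prepend_standby):
    """Two ToRs per rack; optionally prepend twice towards the standby rack's ToRs."""
    prefix = "10.10.10.0/24"  # hypothetical overlay segment
    prepend = 2 if prepend_standby else 0
    return [
        advertise(prefix, "tor-a1", "A"),
        advertise(prefix, "tor-a2", "A"),
        advertise(prefix, "tor-b1", "B", prepend),
        advertise(prefix, "tor-b2", "B", prepend),
    ]


if __name__ == "__main__":
    print(best_paths(build_paths(prepend_standby=False)))  # all four ToRs are equal
    print(best_paths(build_paths(prepend_standby=True)))   # only rack A is preferred
    print(best_paths(build_paths(prepend_standby=True), failed_racks={"A"}))  # rack B takes over
```

Without the prepend all four ToRs look equal, so traffic is spread across both racks. With it, only rack A's ToRs are preferred, and once rack A's routes are withdrawn the longer rack B paths become best with no manual change.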

Below is a diagram of this topology. Keep in mind that I replicated this environment in my lab; a production environment would generally have redundancy built in at each layer.


NSX-T | NSX-ALB | Software-Defined Data Centers | VMware Cloud Foundation (vCF) | - LAB2PROD