Multisite Deployment of NSX-T Data Center | NSX-T 3.0 Installation Guide Online | LAB2PROD
NSX-T Active-Active Multisite in a Single Region and Failover to a Secondary Region Part 1.
An overview of different NSX-T Multisite Topology options
Multisite Deployment of NSX-T Data Center
A client required the ability to deploy NSX-T Multisite in an Active-Active manner within a single rack, with a backup/DR rack to fail over to with minimal intervention and minimal disruption to the dataplane. This setup requires local egress within the active rack, so that as little data as possible is sent across racks.
I have explored three ways to configure multisite:
- Active-Standby
- Active-Active
- Active-Active in a single site or region, with failover to a secondary region
Below are quick summaries of the above topologies.
1. Active-Standby Topology
The Active-Standby topology consists of a T0 gateway in Active-Standby mode, which effectively places the Edge VMs in Active-Standby as well. Refer to the image below.
In this NSX-T multisite deployment scenario, should the primary site fail, the NSX-T Edge Cluster and dataplane will fail over to the secondary site and the standby Edge VM will become active. Workload failover is beyond the scope of this article, but it must still be planned for.
2. Active-Active Topology
The NSX-T Data Center Active-Active multisite topology isn't how one would traditionally envisage an Active-Active site functioning. There would be two T0s, each with its own edge cluster, and both with segments either attached to them directly or attached to a T1 that is then linked to the T0. Each site advertises different subnets via eBGP, or the subnets are made reachable through static routing. Above this, clients may choose to have some form of application-layer load balancing with a GSLB or any other mechanism they deem appropriate. During a site failure, the active node of the NSX-T Edge Cluster in whichever site failed would fail over to the other site. Until the failed site is brought back online, all traffic for the segments that were in the failed site will pass through the second site. Refer to the image below.
3. Active-Active in a single rack or site with a standby site or rack for failover
In this NSX-T multisite design, we look at the configuration needed to enable an Active-Active T0 gateway while controlling where the Edge VMs are placed and where the dataplane traffic will ingress/egress. Generally, a single rack/site deployment is easy, as either there is a single rack for all appliances, or there is more than one rack and no need to control where traffic ingresses and egresses.
However, for this NSX-T edge cluster failure scenario, there were two racks in a single site (each with its own ToRs with routing enabled and uplinks to the network core). Dataplane traffic had to ingress and egress through the active rack and only fail over to the backup rack if the active rack failed, and that failover had to need as little manual intervention as possible to minimize dataplane downtime.
The T0 will peer upstream with the ToRs in both racks. Whilst this would satisfy the minimal downtime requirement, it does not satisfy having dataplane traffic egress locally in the active rack. This is because the ToRs, and by the nature of dynamic routing the core, have learnt the routes from the peers in both 'sites'. The physical fabric sees the paths as being the same length and will therefore balance across all of them. We now need to make the active site the preferred route, and this can be done by prepending the AS and attaching the prepend to the out filter on the interfaces that point to the second rack's ToRs. The longer AS path makes the routes learnt via the second rack less preferred, so they are only used if the active rack's paths disappear.
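To make the effect of the prepend concrete, here is a minimal Python sketch of the path selection described above. It is a simplification and not how NSX-T or any router implements BGP: it compares AS path length only and ignores the other best-path steps (weight, local preference, MED and so on). The AS number, the prefix and the ToR names are hypothetical placeholders, not values from my lab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str    # ToR the route was learnt through (hypothetical names)
    as_path: tuple   # AS numbers, nearest AS first


def prepend(route, asn, times):
    """Return a copy of the route with `asn` prepended `times` times."""
    return Route(route.prefix, route.next_hop, (asn,) * times + route.as_path)


def best_paths(routes):
    """Simplified selection: keep every route with the shortest AS path.

    Equal-length paths are all kept, which models the fabric balancing
    across them (ECMP). Real BGP evaluates more attributes than this.
    """
    shortest = min(len(r.as_path) for r in routes)
    return [r for r in routes if len(r.as_path) == shortest]


NSX_AS = 65001              # hypothetical AS number for the T0
PREFIX = "10.0.10.0/24"     # hypothetical segment subnet
STANDBY_TORS = {"rack2-tor-a", "rack2-tor-b"}

# Without prepending, the same prefix is learnt through all four ToRs
# with an identical AS path length.
advertised = [
    Route(PREFIX, "rack1-tor-a", (NSX_AS,)),
    Route(PREFIX, "rack1-tor-b", (NSX_AS,)),
    Route(PREFIX, "rack2-tor-a", (NSX_AS,)),
    Route(PREFIX, "rack2-tor-b", (NSX_AS,)),
]
before = sorted(r.next_hop for r in best_paths(advertised))

# Prepend the AS twice on the advertisements sent to the second rack's ToRs.
with_prepend = [
    prepend(r, NSX_AS, 2) if r.next_hop in STANDBY_TORS else r
    for r in advertised
]
after = sorted(r.next_hop for r in best_paths(with_prepend))

# If the active rack fails, only the prepended routes remain and they
# become the best available paths without any manual change.
surviving = [r for r in with_prepend if r.next_hop in STANDBY_TORS]
failover = sorted(r.next_hop for r in best_paths(surviving))

print("before prepend:", before)
print("after prepend: ", after)
print("rack 1 failed: ", failover)
```

The design point is that the prepend only changes preference, not reachability: the second rack's paths stay in the routing table as less preferred alternatives, which is what allows failover without intervention.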
Below is a diagram of this topology. Keep in mind that I replicated this environment in my lab; a production environment would generally have redundancy built in at each layer.