
Redundancy and Resiliency: Data Centers

Updated: Apr 19, 2023

The ability of a data center to operate continuously is a top priority for companies when designing and planning these facilities. Downtime, even for only a few hours, can damage a company's profitability and reputation.


Redundancy and resiliency are two strategies often built into the mechanical, electrical and plumbing (MEP) and telecommunications infrastructure design of data centers to maximize the time these facilities are operational. The two terms are often used interchangeably, but redundancy is one component of overall resiliency. Owners should understand the difference between the two approaches during the design of a data center.


What is Redundancy?


Redundancy is the duplication of critical components with the intention of increasing the reliability of the system. If a primary component fails, the backup component operates to keep the system functioning. When identifying which components should be redundant, it is important to recognize how the systems of different disciplines are interconnected.


Several redundancy configurations can be applied on a project:

  • N: N is the minimum equipment capacity required for operation, with no backup. An example would be a chilled water plant designed so that one chiller serves the entire cooling load of the facility. A failure of this single chiller, or routine maintenance on it, causes facility downtime while the chiller is serviced or replaced.


  • N+1: N+1 is the equipment capacity needed for operation plus one backup unit with the same capacity as the primary equipment. This setup keeps the building fully functional because the standby unit takes over when a primary unit fails or goes offline for service. A chiller plant designed in an N+1 configuration, where one chiller can carry the load, would consist of a primary chiller and a standby chiller. If the primary chiller fails, the standby chiller starts so that the facility remains operational.


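  • N+2: N+2 is similar to N+1, except that there are two backup units instead of one. Two units can be out of service at the same time, for example one failed and one under maintenance, without reducing capacity below what the facility needs. This provides a higher level of system protection and resiliency.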


  • 2N: 2N is when a second system is installed that is fully redundant and independent of the primary system. This ensures that a single point of failure in the primary system will not impact facility operations. An example of this setup would be two chilled water systems (chillers, chilled water pumps, chilled water distribution piping and HVAC cooling equipment) that operate independently of one another. If a component in the primary system fails (e.g. the chiller), the second system continues to operate because it does not share that component with the primary system.


  • 2N+1: This configuration provides a full parallel backup system, as in 2N, plus one additional backup component for the equipment in the primary and backup systems.
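To make the difference between these configurations concrete, here is a minimal Python sketch. It assumes a chilled water plant whose cooling load needs n chillers of equal capacity, and that chillers in independent systems cannot be borrowed across systems. The `Plant` class, `configurations` and `carries_full_load` are hypothetical illustrations, not an industry sizing tool. 2N+1 is left out because how its extra unit is connected varies between designs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Plant:
    """A chilled water plant made of one or more independent systems (hypothetical model)."""
    name: str
    systems: int           # independent systems (own chillers, pumps, piping)
    units_per_system: int  # chillers installed in each system

    @property
    def installed_units(self) -> int:
        return self.systems * self.units_per_system


def configurations(n: int) -> List[Plant]:
    """Build each redundancy configuration for a load that needs n chillers."""
    return [
        Plant("N", 1, n),        # no backup
        Plant("N+1", 1, n + 1),  # one standby chiller in a shared system
        Plant("N+2", 1, n + 2),  # two standby chillers in a shared system
        Plant("2N", 2, n),       # two complete, independent systems
    ]


def carries_full_load(plant: Plant, n: int, failed: List[int]) -> bool:
    """Return True if at least one system still has n working chillers.

    failed[i] is the number of chillers out of service in system i.
    Setting failed[i] equal to plant.units_per_system models losing that
    whole system, e.g. through a failure of its shared pumps or piping.
    """
    if len(failed) != plant.systems:
        raise ValueError("need one failure count per system")
    # Independent systems cannot share chillers, so the load is met only
    # if some single system can carry it on its own.
    return any(plant.units_per_system - f >= n for f in failed)


if __name__ == "__main__":
    n = 2  # hypothetical load that needs two chillers
    plants = {p.name: p for p in configurations(n)}
    for name, plant in plants.items():
        print(name, plant.installed_units)  # N 2, N+1 3, N+2 4, 2N 4

    # One chiller out for maintenance.
    print(carries_full_load(plants["N"], n, [1]))    # False
    print(carries_full_load(plants["N+1"], n, [1]))  # True

    # A shared piping failure takes out an entire system.
    print(carries_full_load(plants["N+2"], n, [4]))     # False
    print(carries_full_load(plants["2N"], n, [2, 0]))   # True
```

With n = 2, N+2 and 2N install the same number of chillers, yet only 2N keeps running when a component shared by a whole system fails. This mirrors the point above that 2N guards against single points of failure rather than only adding spare units.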

Rendering of Chiller Room by F&T

What is Resiliency?

Resiliency is the ability of a facility to recover quickly and continue operating even when a system fails or goes down. A holistic approach should be taken when designing a resilient data center. While redundant components increase the resilience of the facility, other measures can also be implemented to enhance resiliency and keep downtime to a minimum.


The following approaches should be taken into consideration to increase the resiliency of a data center:

  • Redundant equipment and systems

  • Backup generators that provide power to the facility during a power outage

  • Equipment and systems that are located above the highest expected floodwater level

  • Having one or more backup data centers located in different regions of the country

  • Routine maintenance of equipment and systems

  • Frequent training of facility staff to reduce human error

  • Security measures to prevent cyberattacks


Minimizing downtime is an essential component to designing MEP and telecommunications systems for data centers. It is important for owners to discuss their redundancy and resiliency needs with engineers of all disciplines during the design phase so that data center operation meets their desired outcome.





Written by:


Mark Rowlenson, LEED AP

Mechanical Group Leader
