High Availability and Disaster Recovery

High Availability (HA) and Disaster Recovery (DR) are two related but separate techniques that work together to give your mission-critical Patriot monitoring system the greatest possible uptime.

High Availability: The hardware, software and processes designed to eliminate single points of failure and provide redundancy, so that your mission-critical Patriot monitoring system has the greatest possible uptime.
Disaster Recovery: Redundant hardware and software situated in a remote (offsite) location so that, in the event of a catastrophic disaster, monitoring operations can be relocated with the least possible downtime and disruption.

General Recommendations

Downtime Requirements

There is a direct trade-off between cost and the level of downtime that can be tolerated. Hardware or operating system failure at the primary site is significantly more likely than a catastrophic disaster. For this reason, High Availability measures introduced at the primary site are likely to yield the greatest return in uptime. Disasters such as floods, fires, earthquakes, long-term power outages and major power surges do happen and must be planned for, but because they are far less frequent, a longer period of downtime is usually tolerable when they occur.
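To make the trade-off concrete, it can help to convert an availability target into a yearly downtime budget. The following is a minimal Python sketch; the function name and the example targets are illustrative assumptions, not Patriot requirements.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Example targets only; choose figures that match your own requirements.
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% uptime allows about {hours:.2f} hours of downtime per year")
```

A 99.9% target, for example, leaves roughly 8.76 hours of downtime per year, while 99.99% leaves under one hour. Each additional "nine" generally requires more redundancy and therefore more cost.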

Quality Equipment

The risk of hardware failure can be mitigated considerably by using quality hardware components with built-in fault tolerance and redundancy. This includes backup power generation, UPS systems, redundant power supplies, redundant network connections, redundant fans, error-correcting (ECC) memory, RAID hard drive arrays, hot-swappable hard drives, Storage Area Networks (SANs) and Network Attached Storage (NAS). However, these measures are beyond the scope of this document and should be considered alongside budget constraints and in consultation with IT professionals.

Data Backups

It is important to realise that High Availability and Disaster Recovery measures do not replace the need for regular (automated) backups. Whereas HA and DR strategies are intended to mitigate downtime resulting from hardware or software failures, regular backups are required to protect against data loss (e.g. unintended mass deletion) or unexpected data corruption.

High Availability

For a system to be truly highly available, every component must have redundancy, so that there are no single points of failure.
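The effect of a single point of failure can be illustrated with a short calculation. This is a sketch under simplifying assumptions: component failures are treated as independent, and the 99% figures are made-up example values rather than measurements of any Patriot component.

```python
from math import prod


def series_availability(components):
    """Availability of a chain in which every component must be working."""
    return prod(components)


def redundant_availability(single, copies):
    """Availability of a group that works while at least one copy is working.

    Assumes the copies fail independently of each other.
    """
    return 1 - (1 - single) ** copies


if __name__ == "__main__":
    # Three non-redundant components, each assumed to be 99% available.
    chain = series_availability([0.99, 0.99, 0.99])
    print(f"Three single components in a chain: {chain:.4%}")

    # The same three components, each duplicated.
    pair = redundant_availability(0.99, copies=2)
    chain_of_pairs = series_availability([pair, pair, pair])
    print(f"Three redundant pairs in a chain:   {chain_of_pairs:.4%}")
```

Under these assumptions the non-redundant chain is available roughly 97.03% of the time, which is worse than any one of its components. Duplicating every component raises the figure considerably, but leaving even one component without a duplicate pulls the whole chain back towards that component's availability.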

Software

The Patriot services (Data Service and Task Service) as well as SQL Server should use Windows Failover Clustering to provide redundancy.

In a cluster, multiple copies of the software are configured on separate servers, with one copy being active. If the active copy fails for any reason, another copy is automatically started on a different server. Clients (e.g. workstations, IP devices) are automatically connected to the active service as it switches between servers.

Services can also be manually moved between servers within the cluster, allowing for easy system maintenance such as Windows Updates without requiring software downtime.
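The automatic failover and manual move behaviour described above can be modelled with a toy example. This is a conceptual sketch only: the `Cluster` class, its method names and the server names are hypothetical, and it does not use or represent the Windows Failover Clustering API.

```python
class Cluster:
    """Toy active/passive cluster: one node runs the service at a time."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        # Dicts keep insertion order, so failover prefers earlier nodes.
        self.healthy = {name: True for name in nodes}
        self.active = nodes[0]

    def _check_known(self, node):
        if node not in self.healthy:
            raise ValueError(f"unknown node: {node}")

    def mark_failed(self, node):
        """Record a failure; fail over automatically if the active node failed."""
        self._check_known(node)
        self.healthy[node] = False
        if node == self.active:
            self.active = next(
                (name for name, ok in self.healthy.items() if ok), None
            )

    def mark_recovered(self, node):
        """Record a recovery; the service stays where it is unless nothing is active."""
        self._check_known(node)
        self.healthy[node] = True
        if self.active is None:
            self.active = node

    def move_to(self, node):
        """Manually move the service, e.g. before patching the current node."""
        self._check_known(node)
        if not self.healthy[node]:
            raise ValueError(f"cannot move to unhealthy node: {node}")
        self.active = node

    def connect(self):
        """Clients address the cluster, not a server, so they follow the active node."""
        if self.active is None:
            raise RuntimeError("no healthy node available")
        return self.active


if __name__ == "__main__":
    cluster = Cluster(["server-a", "server-b"])
    print(cluster.connect())          # server-a
    cluster.mark_failed("server-a")   # automatic failover
    print(cluster.connect())          # server-b
    cluster.mark_recovered("server-a")
    cluster.move_to("server-a")       # manual move for maintenance
    print(cluster.connect())          # server-a
```

The key design point is that clients call `connect()` on the cluster rather than naming a server, which is why they are unaffected when the active copy changes.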

For other mission-critical software such as IP Receiver software, consult your supplier / IT team for their recommendations.


Hardware / Network

These items are outside the scope of Patriot documentation, but must still be considered. Ensure you have a redundancy plan for any critical equipment such as physical receivers, network infrastructure and phone lines.

Disaster Recovery

High Availability at the primary site should be combined with Disaster Recovery to protect against problems that make the entire primary monitoring site unavailable. Disaster Recovery requires a completely independent system at a separate location, ready to take over monitoring in these situations.

See Disaster Recovery for more information and recommended practices.