When Traditional High Availability Is Not Good Enough - Unbound Security Blog

In this blog post, we explore how to provide a highly available key management and virtual HSM (vHSM) service for the relevant cryptographic use cases, comparing Unbound's pure-software technology with legacy HSMs.

The digital revolution has transformed the landscape of business. Traditional high availability is no longer good enough; key applications must be accessible at all times for businesses to survive and thrive in today’s highly competitive and dynamic environment. Meeting these higher availability demands requires a well-thought-out strategy that accounts for the increasing complexity of enterprise application infrastructures. Data centers and systems now span the globe, integrating disparate business processes. Designing your application infrastructure for continuous availability, therefore, begins with the architecture that must include all the underlying services, including cryptographic ones.

The goal of a traditional High Availability (HA) architecture is to mitigate or prevent application downtime or outages caused by any type of infrastructure failure. Disaster recovery primarily deals with falling back on the secondary site in case of a failure at the primary site. With globalization and the Internet driving application access from all corners of the world, keeping applications available all the time is more important than ever before.

Numerous applications rely on cryptographic functions at the back end. In fact, practically every application that we use in our daily activities, such as authenticating to a service or an app, banking transactions, secure browsing, or sending an email, critically depends on the durability and availability of a cryptographic service.

99% is Not 100%

In the past, applications safeguarded their encryption keys in Hardware Security Modules (HSMs): dedicated, rigid, and inflexible hardware cryptographic appliances.

The declared Mean Time Between Failures (MTBF) values of several legacy HSM vendors range from 5 to 40 years. The enormous spread of these values (a standard deviation of over 14 years) says much about the flawed prediction methods used by these manufacturers. It is unrealistic to believe that an HSM could last for 40 years even under ideal operating conditions, not to mention that it would no longer be technologically relevant.

The availability of the HSM can be calculated using the following formula:

[Formula image: HSM availability]

This yields an overall availability of 95%-99%.
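As a point of reference, the conventional steady-state definition of availability is MTBF / (MTBF + MTTR), where MTTR is the mean time to repair or replace a failed unit. The sketch below applies that textbook definition; the MTTR value and the function name are illustrative assumptions, not figures taken from any HSM vendor or from this article.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a unit is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical inputs: MTBF at the low end of the declared range (5 years)
# and an assumed two weeks to repair or replace a failed appliance.
mtbf = 5 * HOURS_PER_YEAR
mttr = 14 * 24
print(f"availability = {availability(mtbf, mttr):.2%}")
```

The result is very sensitive to the repair time: for a hardware appliance that has to be shipped, installed, and re-provisioned, MTTR can dominate the outcome far more than the declared MTBF does.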

Eliminating the Single Point of Failure

Unbound Security completely eliminates the single point of failure for the most sensitive assets by ensuring that keys and secrets are never kept whole (as they were when protected inside HSMs). Unbound implements multiparty computation (MPC) to create and use the fragmented secret without ever unifying it, a method mathematically proven to keep the secret safe even if any single location is breached.
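To give a feel for what "never kept whole" means, here is a minimal sketch of two-party XOR secret sharing. This is a generic textbook technique, not Unbound's protocol: it shows only how a key can be split so that neither share reveals anything on its own. The function names are hypothetical, and the `combine` step exists purely to check the demo; an MPC protocol, by contrast, computes with the shares without ever recombining them.

```python
import secrets


def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split key into two shares; each share alone is uniformly random."""
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(k ^ a for k, a in zip(key, share_a))
    return share_a, share_b


def combine(share_a: bytes, share_b: bytes) -> bytes:
    """Demo-only check: XOR the shares back together."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))


key = secrets.token_bytes(32)          # stand-in for a 256-bit key
share_a, share_b = split_key(key)      # e.g. one share per server
assert combine(share_a, share_b) == key
```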

The Unbound Key Control (UKC) system consists of one or more pairs of standard servers that are installed and managed by the user. Each pair consists of an Entry Point and a Partner. Together, they form the secure boundary of the UKC. Meeting the minimum high availability requirements takes two pairs, for a total of four servers.

Applications within the network connect to the entry point for consuming cryptographic services for the keys that are managed within the UKC.

UKC provides a solution with traditional high availability, meaning that no single server failure stops UKC functionality. An aspect of traditional high availability is the existence of a Disaster Recovery (DR) or Continuity of Business (COB) site that takes over once the main site fails. While such a site is not required to be online as long as the main site is functional, it does need to stay connected and data synced with the online system so that it can take over as needed with up-to-date key material.
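From the application's point of view, "no single server failure stops functionality" amounts to having more than one entry point to call. The following is a generic failover sketch under that assumption; it is not the UKC client API (real clients may well handle failover themselves), and the host names and the `fake_sign` operation are made up for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(entry_points: Sequence[str],
                       operation: Callable[[str], T]) -> T:
    """Try the operation against each entry point in order.

    Returns the first successful result; raises ConnectionError if
    every entry point fails.
    """
    last_error = None
    for host in entry_points:
        try:
            return operation(host)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next pair
    raise ConnectionError("no entry point reachable") from last_error


def fake_sign(host: str) -> str:
    """Hypothetical crypto call; simulates the first entry point being down."""
    if host == "ep1.example.internal":
        raise ConnectionError("ep1 is down")
    return f"signed-by-{host}"


result = call_with_failover(
    ["ep1.example.internal", "ep2.example.internal"], fake_sign
)
print(result)
```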

Measured UKC software server availability is 99.9%. Hence, the single pair UKC availability would be 99.8% (since Entry Point and Partner are operating in series). The following table demonstrates the availability of the UKC service per a certain number of pairs running in parallel:

Number of pairs    UKC cluster availability
1                  99.8%
2                  99.9996%
3                  99.9999992%

With just 2 UKC pairs (a total of 4 servers), one can reach an availability level typically feasible only for telecom-grade equipment (between five and six nines).

With 3 UKC pairs (a total of 6 servers), the same calculation gives about eight nines of availability, a level associated with IaaS/PaaS services. A fourth pair would bring the figure to 99.9999999984%, or roughly 10.8 nines (!); for comparison, AWS S3 is designed for 11 nines of durability.
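The series/parallel arithmetic behind these figures can be sketched as follows. It assumes that server failures are independent, that a pair is up only when both its Entry Point and its Partner are up, and that the cluster is up as long as at least one pair is up. The function names are illustrative.

```python
import math

SERVER_AVAILABILITY = 0.999  # measured per-server figure quoted above


def pair_availability(server: float) -> float:
    """Entry Point and Partner in series: both must be up."""
    return server * server


def cluster_unavailability(pair: float, pairs: int) -> float:
    """Pairs in parallel: the cluster is down only if every pair is down."""
    return (1.0 - pair) ** pairs


def nines(unavailability: float) -> float:
    """Number of nines, e.g. 1e-5 unavailability gives 5 nines."""
    return -math.log10(unavailability)


pair = pair_availability(SERVER_AVAILABILITY)
for n in (1, 2, 3, 4):
    down = cluster_unavailability(pair, n)
    print(f"{n} pair(s): {(1.0 - down) * 100:.10f}%  (~{nines(down):.1f} nines)")
```

Working with the unavailability rather than the availability keeps the floating-point result meaningful when the value is within a few parts per trillion of 1.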

Use Cases

A very high level of availability for key management and cryptographic keys is paramount for services with a large number of end users. Such cryptography-consuming services include:

  • Code signing for a SaaS / large enterprise
  • Protecting SSL keys for a hosted-websites provider
  • Document signing for a SaaS / large enterprise
  • Securing payment transactions for a bank
  • PGP within an organization
  • IPsec for a telecom / SP network
  • Smart metering for a water / gas / electrical utility
  • File-level encryption for endpoint devices in an enterprise

Deployment Options Improving Traditional High Availability

The location of the UKC cluster nodes is determined by the application architecture, the locations of the users consuming the services, and regulatory compliance requirements. The following figure depicts several possible topologies that allow creating an elaborate high availability scheme, such as locating the UKC nodes:

  • On-prem – in the DC and the DR sites
  • Hybrid – on-prem and at the CSP
  • Single CSP – across different regions / availability zones
  • Different CSPs – a node per each CSP (at least)

[Figure: Deployment Options Improving High Availability]

In Short…

It is essential to create a coherent availability design that is appropriate to the particular business processes and matched to how critical each process is to the organization's overall mission. Based on this information, a proper high availability arrangement should be made that prevents downtime of crucial service components, such as key management and protection.

Unbound UKC allows applications to enjoy superior availability, comparable only to that of a cloud service provider's infrastructure.