99.9% vs 99.99% vs 99.999% Uptime: What the Numbers Actually Mean
SLA uptime percentages are among the most consistently misread numbers in cloud procurement. The gap between 99.9% and 99.999% sounds small: it is 0.099 percentage points. But translated into allowed downtime per year, the difference is 8 hours and 45 minutes versus about 5 minutes. Choosing the wrong SLA tier for a revenue-generating system is a decision that will eventually become very visible.
The Numbers Translated
99.9% uptime allows 8 hours, 45 minutes of downtime per year, or about 44 minutes per month. That is one extended maintenance window per quarter, or three or four shorter incidents spread across the year. For internal tools and development environments, this is generally acceptable.
99.99% uptime allows 52 minutes and 34 seconds per year, or about 4 minutes and 23 seconds per month. This is the baseline for production systems that generate revenue. Achieving it requires N+1 redundancy on all components (at minimum two of everything in the critical path), automated failover with no manual steps required, and no single points of failure in the infrastructure stack. This is achievable without heroics: it is a standard architecture target for any properly designed cloud infrastructure.
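To see why redundancy in the critical path matters so much, the standard availability formulas for parallel and serial components can be sketched as follows. This is a minimal illustration that assumes failures are independent, which real systems only approximate. The function names and the 99.9% and 99.95% figures are hypothetical, not taken from any provider.

```python
def parallel_availability(component: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one suffices.

    Assumes independent failures (an idealization)."""
    return 1.0 - (1.0 - component) ** copies


def serial_availability(*stages: float) -> float:
    """Availability of a chain in which every stage must be up."""
    result = 1.0
    for stage in stages:
        result *= stage
    return result


# Hypothetical figure: each server is 99.9% available on its own.
single = 0.999
pair = parallel_availability(single, 2)

# Hypothetical single, non-redundant load balancer in front of the pair.
with_single_lb = serial_availability(pair, 0.9995)

print(f"one server:              {single:.4%}")
print(f"redundant pair:          {pair:.4%}")
print(f"pair behind a single LB: {with_single_lb:.4%}")
```

The last line shows the single-point-of-failure effect: the chain can never be more available than its weakest non-redundant stage.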
99.999% uptime allows 5 minutes and 15 seconds per year — 26 seconds per month. This tier requires 2N redundancy on power and cooling (two independent power paths, two independent cooling systems), active-active multi-zone architecture so that failure of an entire zone does not cause user-visible downtime, zero-downtime deployment practices for every software change, and extremely mature incident response (mean time to recovery measured in seconds, not minutes). This is expensive to build and operate, and is only appropriate for tier-1 systems where even minutes of downtime cause direct, significant financial loss.
Downtime Allowance Per Year
- 99.9%: 8 hours 45 minutes
- 99.99%: 52 minutes 34 seconds
- 99.999%: 5 minutes 15 seconds
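The conversion from an uptime percentage to a downtime allowance is simple arithmetic, and it can be reproduced for any tier or period. The sketch below assumes a 365-day year and twelve equal months; the function name is illustrative.

```python
from datetime import timedelta

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def allowed_downtime(uptime_percent: float,
                     period_minutes: float = MINUTES_PER_YEAR) -> timedelta:
    """Downtime budget implied by an uptime percentage over a period."""
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return timedelta(minutes=period_minutes * unavailable_fraction)


for tier in (99.9, 99.99, 99.999):
    per_year = allowed_downtime(tier)
    per_month = allowed_downtime(tier, MINUTES_PER_YEAR / 12)
    print(f"{tier}%: {per_year} per year, {per_month} per month")
```

Under these assumptions the yearly budgets work out to roughly 8 h 45 min, 52 min 34 s, and 5 min 15 s, in line with the figures above.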
Typical Use Cases
- 99.9%: Internal tools, dev/staging environments
- 99.99%: Production web apps, customer-facing APIs
- 99.999%: Payment processing, banking core systems
How to Choose the Right Tier for Your System
Calculate the cost of one hour of downtime for the system in question. Include: lost revenue (if the system is in the payment or order flow), support cost (ticket and call volume spikes during outages), SLA penalties you owe to your own customers if you have service agreements, and reputational cost (harder to quantify, but real for B2C products). Compare that number to the monthly cost differential between SLA tiers. If one hour of downtime costs your business $10,000 and the 99.99% tier costs $200/month more than 99.9%, the upgrade costs $2,400 a year and removes nearly 8 hours of allowed downtime, worth roughly $79,000 at that rate. The higher tier pays for itself many times over.
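The comparison above can be sketched as a small calculation. It treats the full SLA allowance as downtime that actually occurs, which is a worst-case assumption rather than a forecast; the function names are illustrative and the inputs are the example figures from the paragraph.

```python
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Yearly downtime if the provider uses exactly its SLA allowance."""
    return HOURS_PER_YEAR * (1.0 - uptime_percent / 100.0)


def annual_tier_comparison(cost_per_hour: float, lower_tier: float,
                           higher_tier: float, extra_monthly_fee: float) -> dict:
    """Compare the yearly fee for a higher tier with the downtime cost it removes."""
    hours_avoided = (allowed_downtime_hours(lower_tier)
                     - allowed_downtime_hours(higher_tier))
    return {
        "hours_avoided": hours_avoided,
        "downtime_cost_avoided": hours_avoided * cost_per_hour,
        "extra_fees": extra_monthly_fee * 12,
    }


result = annual_tier_comparison(cost_per_hour=10_000, lower_tier=99.9,
                                higher_tier=99.99, extra_monthly_fee=200)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

Swapping in your own hourly cost and price difference shows quickly whether the higher tier is worth paying for.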
Credit Terms: Read the Fine Print
SLA uptime numbers are only part of the evaluation. Credit terms matter as much as the headline percentage. A 99.99% SLA that pays 10% of your monthly bill as credit for a missed target offers meaningful financial protection only if that credit comes close to your actual downtime cost. Many large cloud providers offer credits in the 10–25% range for SLA misses, which is often far less than what the outage costs your business.
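A quick way to test whether a credit is meaningful is to set it against the cost of the outage that triggered it. The sketch below uses hypothetical figures (a $5,000 monthly bill, a 10% credit, a 2-hour outage at $10,000 per hour) and illustrative function names.

```python
def sla_credit(monthly_bill: float, credit_percent: float) -> float:
    """Credit paid for a missed SLA, as a percentage of the monthly bill."""
    return monthly_bill * credit_percent / 100.0


def uncovered_loss(outage_hours: float, cost_per_hour: float,
                   monthly_bill: float, credit_percent: float) -> float:
    """Outage cost that remains after the SLA credit is applied."""
    outage_cost = outage_hours * cost_per_hour
    return outage_cost - sla_credit(monthly_bill, credit_percent)


# Hypothetical scenario, not taken from any provider's terms.
credit = sla_credit(monthly_bill=5_000, credit_percent=10)
remaining = uncovered_loss(outage_hours=2, cost_per_hour=10_000,
                           monthly_bill=5_000, credit_percent=10)
print(f"credit received: ${credit:,.0f}")
print(f"loss not covered: ${remaining:,.0f}")
```

In this scenario the credit covers only a small fraction of the loss, which is the pattern the paragraph above describes.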
What SLA Documents Often Exclude
- Scheduled maintenance windows: Many providers exclude planned maintenance from SLA calculations. A provider that takes a 4-hour maintenance window every month is down 48 hours a year, which on its own holds real availability to about 99.45% regardless of the SLA number. Check whether maintenance is counted, and whether maintenance scheduling requires your confirmation.
- Force majeure clauses: Broad force majeure clauses can exclude events that are reasonably foreseeable (e.g., power grid issues, internet exchange outages) from SLA coverage. A provider with good redundancy should not need to invoke force majeure for foreseeable infrastructure failures.
- Upstream internet outages: If the outage is caused by an upstream internet provider and not the cloud provider's infrastructure, many SLAs do not apply. This is legitimate if the provider has multi-path connectivity, and less legitimate if they have a single upstream provider.
- Credit claim process: Some providers require you to file a claim within a specific window (e.g., 30 days of the incident) to receive credits. Credits that require manual filing are rarely collected in practice.
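The effect of exclusions on real-world availability can be estimated by adding the excluded hours to the downtime the SLA already permits. This is a rough sketch assuming a 365-day year and a provider that uses its full allowance; the function name and the 48-hour figure (twelve 4-hour windows) are illustrative.

```python
HOURS_PER_YEAR = 365 * 24


def effective_availability(sla_percent: float,
                           excluded_hours_per_year: float) -> float:
    """Availability users experience once excluded downtime is counted.

    Adds excluded hours (e.g., planned maintenance) to the downtime
    the SLA itself allows, then converts back to a percentage."""
    sla_downtime = HOURS_PER_YEAR * (1.0 - sla_percent / 100.0)
    total_downtime = sla_downtime + excluded_hours_per_year
    return 100.0 * (1.0 - total_downtime / HOURS_PER_YEAR)


# Hypothetical: a 99.99% SLA with twelve excluded 4-hour windows per year.
print(f"{effective_availability(99.99, 12 * 4):.3f}%")
```

Running this for a provider's published maintenance schedule gives a figure that is easier to compare across vendors than the headline SLA.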
Hyper App's Approach
Hyper App offers a 99.999% SLA on production infrastructure — 5 minutes 15 seconds of allowed downtime per year. Credits are applied automatically to your next invoice when an SLA miss is detected; you do not need to file a claim. Scheduled maintenance is announced 72 hours in advance and is excluded from SLA calculations only if the client has confirmed the maintenance window in advance. Unconfirmed maintenance that causes downtime counts against the SLA. That policy reflects what a genuine service commitment looks like: the infrastructure team bears the operational burden of planning maintenance around client confirmation, not the other way around.
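One way to read the maintenance rule described above is as a simple filter over downtime events. The sketch below is an interpretation for illustration only, not Hyper App's implementation; the `DowntimeEvent` type and its fields are hypothetical, and the 72-hour notice requirement is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEvent:
    minutes: float
    scheduled_maintenance: bool = False
    client_confirmed: bool = False


def counts_against_sla(event: DowntimeEvent) -> bool:
    """Downtime is excluded only for maintenance the client confirmed."""
    return not (event.scheduled_maintenance and event.client_confirmed)


def sla_downtime_minutes(events: list[DowntimeEvent]) -> float:
    """Total downtime that counts toward the SLA budget."""
    return sum(e.minutes for e in events if counts_against_sla(e))


# Hypothetical year of events.
events = [
    DowntimeEvent(minutes=2.0),                                # unplanned outage
    DowntimeEvent(minutes=30.0, scheduled_maintenance=True,
                  client_confirmed=True),                      # excluded
    DowntimeEvent(minutes=1.5, scheduled_maintenance=True),    # unconfirmed, counts
]
print(f"downtime counted against SLA: {sla_downtime_minutes(events)} minutes")
```

Applying the same filter to another provider's terms, with the confirmation condition removed, shows how much downtime an unconditional maintenance exclusion would hide.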