Degree of Availability: From Component Level to System Level

I extracted this post from one of my earlier posts on Availability, following suggestions that calculating Availability is tricky enough to deserve a separate post on calculating or predicting it.

Consider a system with three subsystems (components) A, B, and C. Component B is itself a combination of components B1 and B2. A, B, and C are in series, while B1 and B2 are in parallel.

[Figure: Sample system used to calculate the degree of availability]

Components of a subsystem are said to operate in series if the failure of any one component causes the subsystem to fail. In that case, assuming the components fail independently, multiply the availabilities (A) of the components to find the availability of the subsystem:

A_subsystem = A_component_1 x A_component_2 x … x A_component_n

In the sample system, components A, B, and C operate in series. Hence the availability of the complete system is

A_system = A_A x A_B x A_C

Components of a subsystem are said to operate in parallel if the subsystem fails only when ALL of its components fail; if one component fails, the others take over. In that case, multiply the unavailabilities (UA) of the components and subtract the product from 1 to find the availability of the subsystem:

A_subsystem = 1 - (UA_component_1 x UA_component_2 x … x UA_component_n)

where UA_component = 1 - A_component

Now let's use this formula to find the availability of component B, which is a set of two parallel components:

A_B = 1 - (1 - A_B1) x (1 - A_B2)

Hence, the availability of the sample system is calculated as follows, taking A_A = 99.00%, A_B1 = 99.00%, A_B2 = 99.99%, and A_C = 99.99%:

Availability = A_A x A_B x A_C

= A_A x {1 - (1 - A_B1) x (1 - A_B2)} x A_C

= 99.00% x {1 - (1 - 99.00%) x (1 - 99.99%)} x 99.99%

= 99.00% x 99.9999% x 99.99%

≈ 98.99%
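The calculation above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names `series` and `parallel` are my own for illustration, and both assume that components fail independently of each other.

```python
from math import prod


def series(*availabilities):
    """Availability of components in series: every component must be up."""
    return prod(availabilities)


def parallel(*availabilities):
    """Availability of redundant components: down only if all are down."""
    return 1 - prod(1 - a for a in availabilities)


# Component availabilities taken from the worked example above.
a_a, a_b1, a_b2, a_c = 0.99, 0.99, 0.9999, 0.9999

a_b = parallel(a_b1, a_b2)        # B1 and B2 are in parallel
a_system = series(a_a, a_b, a_c)  # A, B, and C are in series

print(f"A_B      = {a_b:.4%}")
print(f"A_system = {a_system:.2%}")
```

Note how the parallel pair ends up more available than either of its members, while the series chain is dragged below its weakest member (A at 99.00%).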


4 Steps to Reveal the Degree of Availability of your IT Solution

High Availability - factors causing downtime

High Availability – Planned and Unplanned Downtime

Calculating the degree of availability is tricky: you need to break the system into components, calculate the availability of each component, and then consolidate these into the availability of the entire system. Here are the four steps to calculate how available your system is.

Step 1: Decide the level of Availability you need

Downtime can be categorized into planned (or scheduled) and unplanned (or unscheduled) downtime. Maintenance tasks such as installing updates or making configuration changes usually result in planned downtime. Unplanned downtime is caused by events that are unknown until they occur, such as a hardware failure or a network outage.

Because planned downtime is announced well in advance and workarounds limit its impact on users, it is sometimes excluded when calculating availability. Whether to include planned downtime is at your discretion. Depending on what is counted, there are three availability levels:

  1. Highly Available: system available during specified operating hours, with no unplanned outage
  2. Continuous Operations: system available 24 x 7, with no planned outage
  3. Continuous Availability: system available 24 x 7, with no planned or unplanned outage

Step 2: Break the System into components

A software system is built by integrating various software and hardware subsystems (components), and the downtime or failure of any subsystem results in partial or full unavailability of the system. To determine the availability of the target system, you therefore need to break it into components and calculate the availability of each one. Each component should be a unit whose failure can bring down the system or some other component.

Step 3: Measure the Availability of each component

To measure the Availability of a component, you need to know its Mean Time Between Failures (MTBF) and Mean Time To Recover (MTTR). Once you have this information, use the formula Availability = MTBF / (MTBF + MTTR) to get the availability of the component.
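As a quick illustration of the formula, here is a small Python sketch. The function name and the MTBF/MTTR figures are hypothetical, chosen only to show the arithmetic; both values must be in the same time unit.

```python
def availability(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR), with both in the same time unit."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


# Hypothetical component: fails on average every 1,000 hours
# and takes 2 hours on average to recover.
a = availability(1000, 2)
print(f"{a:.3%}")
```

The same formula shows why shortening recovery time matters as much as preventing failures: halving MTTR improves availability just as doubling MTBF does.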

You can obtain availability data from the vendors who provide your infrastructure or software.

Step 4: Consolidate the availability of the components

Components of a subsystem are said to operate in series if the failure of any one component causes the subsystem to fail. In that case, assuming the components fail independently, multiply the availabilities (A) of the components to find the availability of the subsystem:

A_subsystem = A_component_1 x A_component_2 x … x A_component_n

Components of a subsystem are said to operate in parallel if the subsystem fails only when ALL of its components fail; if one component fails, the others take over. In that case, multiply the unavailabilities (UA) of the components and subtract the product from 1 to find the availability of the subsystem:

A_subsystem = 1 - (UA_component_1 x UA_component_2 x … x UA_component_n)

where UA_component = 1 - A_component

Consider a system with three subsystems (components) A, B, and C. Component B is itself a combination of components B1 and B2. A, B, and C are in series, while B1 and B2 are in parallel.

[Figure: Sample system used to calculate the degree of availability]

Hence, the availability of the sample system is calculated as follows, taking A_A = 99.00%, A_B1 = 99.00%, A_B2 = 99.99%, and A_C = 99.99%:

Availability = A_A x A_B x A_C

= A_A x {1 - (1 - A_B1) x (1 - A_B2)} x A_C

= 99.00% x {1 - (1 - 99.00%) x (1 - 99.99%)} x 99.99%

= 99.00% x 99.9999% x 99.99%

≈ 98.99%
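Real systems usually nest series and parallel groups more deeply than this sample. One possible way to handle that, sketched here in Python, is to describe the system as a nested structure and evaluate it recursively. The tuple-based representation and the `evaluate` function are assumptions made for illustration, and independent component failures are assumed throughout.

```python
from math import prod


def evaluate(node):
    """Return the availability of a nested system description.

    A node is either a number (the availability of a single component)
    or a tuple (arrangement, children), where arrangement is "series"
    or "parallel" and children is a list of nodes.
    """
    if isinstance(node, (int, float)):
        return float(node)
    arrangement, children = node
    values = [evaluate(child) for child in children]
    if arrangement == "series":
        return prod(values)
    if arrangement == "parallel":
        return 1 - prod(1 - v for v in values)
    raise ValueError(f"unknown arrangement: {arrangement!r}")


# The sample system: A, B, and C in series; B is B1 and B2 in parallel.
sample_system = ("series", [0.99, ("parallel", [0.99, 0.9999]), 0.9999])
print(f"{evaluate(sample_system):.2%}")
```

With this layout, trying out a design change (for example, making A redundant) only requires editing the description, not the calculation.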

Do you know a better way to calculate Availability? Leave your thoughts in the comments box.

Are you losing profits? Make your system highly available!

[Figure: Cost of downtime]

Increase Profit – Make your IT solution Highly Available

Recent studies indicate that almost 59 percent of Fortune 500 companies experience IT outages of 1.6 hours per week. A company with around 10,000 employees, paying USD 56 per hour (including salary and benefits), would lose roughly USD 46 million per year in labor cost alone if such outages idle its whole workforce (10,000 x 56 x 1.6 x 52 ≈ 46.6 million). (http://www.evolven.com)
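Taking the figures quoted above at face value, the yearly labor cost of that downtime can be checked with simple arithmetic. The sketch below assumes 52 working weeks per year and that every employee is idle for the whole outage; both are my simplifying assumptions, not figures from the study.

```python
employees = 10_000
hourly_cost_usd = 56           # salary plus benefits, per employee
outage_hours_per_week = 1.6
weeks_per_year = 52            # assumption: outages occur every week of the year

annual_loss_usd = (
    employees * hourly_cost_usd * outage_hours_per_week * weeks_per_year
)
print(f"USD {annual_loss_usd:,.0f} per year")
```

Under these assumptions the result comes to about USD 46.6 million per year, and it covers lost working time only, not lost sales or reputational damage.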

According to research by Coleman Parkes, 37,160,146 person-hours are lost across Europe due to IT downtime.

In the last decade, there has been a major shift in the way business organizations work. Most companies actively use technology at every level to become more efficient and productive and to improve their profitability. Take any industry, whether healthcare, manufacturing, social networking, media, or communication, and you will be amazed at the technical solutions (a combination of software and hardware) that companies within these verticals have deployed and the critical role these solutions play in improving their businesses.

At the same time, although these IT solutions add value, they also lead to loss of business when they are not ‘available’, that is, when they are down. Airlines, for instance, cannot afford outages that take their ticket booking systems down, even for a few hours. Retailers like Best Buy and eBay cannot afford outages of their e-commerce websites or billing systems around Thanksgiving or Christmas.

The ‘Availability’ of a system describes how long the system remains up and running to serve the purpose of its end users. A system that is up and accessible to end users is considered ‘Available’. If a system is up but, due to network issues, not accessible to end users, it is considered up but not ‘Available’!

High Availability

Downtime at 10% Unavailability

Calculating downtime is an intuitive way of calculating Availability. The claim that an IT solution is 90 percent available in a year (24x7x365) draws a ‘wow’ reaction. What an impressive figure: just 10 percent downtime in a year! Let’s examine this figure more closely. Ten percent unavailability implies that the software is down for:

  • 36.5 days in a year, or
  • 72 hours in a month, or
  • 16.8 hours a week, or
  • 2.4 hours a day

Therefore, if a system is available 90 percent of the time, its total downtime is more than one month (36.5 days) in a period of 12 months.
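The breakdown above follows directly from multiplying the unavailability by the length of each period. A minimal Python sketch, assuming a 365-day year and a 30-day month (the function and dictionary names are illustrative):

```python
def downtime_hours(availability, period_hours):
    """Hours of downtime in a period for a given availability (0.0 to 1.0)."""
    return (1 - availability) * period_hours


# Period lengths in hours; the 30-day month is an approximation.
periods = {
    "year (365 days)": 365 * 24,
    "month (30 days)": 30 * 24,
    "week": 7 * 24,
    "day": 24,
}

for name, hours in periods.items():
    print(f"{name}: {downtime_hours(0.90, hours):.1f} hours of downtime")
```

For the year, the 876 hours printed correspond to the 36.5 days listed above (876 / 24).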

It has become a trend to express Availability as a count of nines, from one nine to seven nines: 90, 99, 99.9, 99.99, 99.999, 99.9999, and 99.99999 percent. The more nines, the more reliable and available the system is. For example, if an online banking system is available 99.999 percent of the time, it is down for only 5.26 minutes in a year!

The following table reveals interesting facts about down time. Clearly, the more nines, the more the Availability.

[Table: Availability in nines]
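The relationship between the number of nines and yearly downtime is easy to compute for yourself. The following Python sketch assumes a 365-day year and is not a reproduction of the original table; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def annual_downtime_minutes(nines):
    """Yearly downtime in minutes for an availability with `nines` nines."""
    unavailability = 10 ** -nines  # one nine -> 0.1, two nines -> 0.01, ...
    return unavailability * MINUTES_PER_YEAR


for n in range(1, 8):
    availability_pct = 100 * (1 - 10 ** -n)
    minutes = annual_downtime_minutes(n)
    print(f"{n} nine(s): {availability_pct:.5f}% -> {minutes:,.2f} minutes/year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why every extra nine tends to be much harder to achieve than the previous one.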

And how should one calculate Availability?

To measure Availability, you need to know the Mean Time Between Failures (MTBF) and Mean Time To Recover (MTTR). Once you have this information then use the following formula:

Availability = MTBF/ (MTBF+MTTR)

The above formula will show just how much your system is available to end users. In the blogs that follow, I will be writing about the factors which result in lower availability as well as how you can calculate the MTBF and MTTR.

I have tried to briefly talk about Availability and am looking forward to hearing your opinions on the issue and taking the discussion forward.