Degree of Availability from Component Level to System Level

I created this post from one of my blogs on Availability, following suggestions that calculating Availability is tricky and deserves a separate blog on calculating or predicting it.

Consider a system with three subsystems/components: A, B, and C. Component B is a combination of components B1 and B2. Here, A, B, and C are in series, and components B1 and B2 are in parallel.

Figure: Sample system to calculate availability

Components of a subsystem are said to operate in series if the failure of any one component causes the subsystem to fail. In that case, multiply the availabilities (A) of the components to find the availability of the subsystem:

$$A_{subsystem} = A_{component_1} \times A_{component_2} \times \cdots \times A_{component_n}$$

In this case, components A, B, and C work in series. Hence the availability of the complete system is:

$$A_{system} = A_A \times A_B \times A_C$$
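For readers who want to script this, here is a minimal Python sketch of the series rule. It assumes availabilities are given as fractions between 0 and 1 (0.99 for 99%); the function name `series_availability` is just an illustrative choice.

```python
from math import prod
from typing import Sequence


def series_availability(availabilities: Sequence[float]) -> float:
    """Availability of components in series: every component must be up,
    so the individual availabilities multiply."""
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be between 0 and 1, got {a}")
    return prod(availabilities)


if __name__ == "__main__":
    # Two 99% components in series give roughly 98.01%.
    print(f"{series_availability([0.99, 0.99]):.4%}")
```

Because every factor is at most 1, adding a component in series can only keep the result the same or lower it.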

Components of a subsystem are said to operate in parallel if the subsystem fails only when ALL of its components fail. In such cases, if one component fails, the others take over. To find the availability of the subsystem, multiply the unavailabilities (UA) of the components and subtract the product from 1:

$$A_{subsystem} = 1 - (UA_{component_1} \times UA_{component_2} \times \cdots \times UA_{component_n})$$

where $UA_{component} = 1 - A_{component}$.
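A matching Python sketch of the parallel rule, under the same assumptions as the formula: component failures are independent, and a surviving component takes over immediately. The name `parallel_availability` is hypothetical.

```python
from math import prod
from typing import Sequence


def parallel_availability(availabilities: Sequence[float]) -> float:
    """Availability of redundant components in parallel: the subsystem fails
    only when all components are down, so the unavailabilities multiply."""
    if not availabilities:
        raise ValueError("a parallel block needs at least one component")
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be between 0 and 1, got {a}")
    unavailabilities = [1.0 - a for a in availabilities]  # UA = 1 - A
    return 1.0 - prod(unavailabilities)


if __name__ == "__main__":
    # Two redundant 99% components: 1 - (0.01 * 0.01) = 0.9999
    print(f"{parallel_availability([0.99, 0.99]):.4%}")
```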

Now let’s use the above formula to find the availability of component B, which is a set of two components:

$$A_B = 1 - (1 - A_{B1}) \times (1 - A_{B2})$$

To calculate the availability of the sample system above, with $A_A = 99.00\%$, $A_{B1} = 99.00\%$, $A_{B2} = 99.99\%$, and $A_C = 99.99\%$, follow these steps:

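$$
\begin{aligned}
A_{system} &= A_A \times A_B \times A_C \\
&= A_A \times \{1 - (1 - A_{B1}) \times (1 - A_{B2})\} \times A_C \\
&= 99.00\% \times \{1 - (1 - 99.00\%) \times (1 - 99.99\%)\} \times 99.99\% \\
&= 99.00\% \times 99.9999\% \times 99.99\% \\
&\approx 98.99\%
\end{aligned}
$$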
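To double-check the arithmetic, this Python sketch reproduces the calculation with exact decimal arithmetic, plugging in the values used in the sample system (A = 99.00%, B1 = 99.00%, B2 = 99.99%, C = 99.99%). The function name `sample_system_availability` is hypothetical.

```python
from decimal import Decimal


def sample_system_availability(a_a: Decimal, a_b1: Decimal,
                               a_b2: Decimal, a_c: Decimal) -> Decimal:
    """A, B and C in series, where B is made of B1 and B2 in parallel."""
    a_b = 1 - (1 - a_b1) * (1 - a_b2)  # parallel block B
    return a_a * a_b * a_c             # series chain A -> B -> C


if __name__ == "__main__":
    a = sample_system_availability(Decimal("0.99"), Decimal("0.99"),
                                   Decimal("0.9999"), Decimal("0.9999"))
    print(a)                  # 0.989900010099
    print(f"{a * 100:.2f}%")  # 98.99%
```

Using `Decimal` instead of floats keeps the intermediate result (99.9999% for block B) exact, so the only rounding is the final two-decimal display.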


4 Steps to Reveal the Degree of Availability of your IT Solution

Figure: High Availability, planned and unplanned downtime

Calculating the degree of availability is tricky: it requires breaking the system into components, calculating the availability of each component, and then consolidating them into the availability of the entire system. Here are four steps to calculate how available your system is.

Step 1: Decide the level of Availability you need

Downtime can be categorized into planned (or scheduled) and unplanned (or unscheduled) downtime. Maintenance tasks, such as installing updates or making configuration changes, usually result in planned downtime. Unplanned downtime is caused by events that are unknown until they occur, such as hardware failures or network outages.

Because planned downtime is announced in advance and workarounds often limit its impact on users, it is sometimes excluded when calculating availability. Whether to include planned downtime is at your discretion. Depending on these considerations, there are three levels of availability:

  1. Highly Available: System available during specified operating hours, with no unplanned outages
  2. Continuous Operations: System available 24 x 7, with no planned outages
  3. Continuous Availability: System available 24 x 7, with no planned or unplanned outages
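To see how much the choice about planned downtime matters, here is a small Python sketch. It assumes that excluding planned downtime means removing the maintenance window from the scheduled operating time, which is one common convention rather than the only one. The function name and the example figures (24 hours of planned maintenance and 6 hours of unplanned outage over a 365-day year) are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def availability(period_hours: float, planned_down: float,
                 unplanned_down: float, include_planned: bool) -> float:
    """Fraction of scheduled time the system is up.

    include_planned=True : every outage counts against availability.
    include_planned=False: planned maintenance is removed from the period
                           and only unplanned downtime counts.
    """
    if include_planned:
        scheduled = period_hours
        downtime = planned_down + unplanned_down
    else:
        scheduled = period_hours - planned_down
        downtime = unplanned_down
    if scheduled <= 0:
        raise ValueError("no scheduled operating time left in the period")
    return (scheduled - downtime) / scheduled


if __name__ == "__main__":
    # Illustrative year: 24 h of planned maintenance, 6 h of unplanned outage.
    for flag in (True, False):
        a = availability(HOURS_PER_YEAR, 24, 6, include_planned=flag)
        print(f"include_planned={flag}: {a:.4%}")
```

The same outages give roughly 99.66% when planned downtime counts and roughly 99.93% when it is excluded, so it is worth stating which convention a figure uses.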

Step 2: Break the system into components

A software system is built by integrating various software and hardware subsystems (components), and the failure of any subsystem can make the system partially or fully unavailable. To determine the availability of the target system, you therefore need to break it into components and calculate the availability of each. Each component should be a unit whose failure, on its own, can bring down the system or other components. Then identify the availability of each component.

Step 3: Measure the Availability of each component

To measure the availability of a component, you need to know its Mean Time Between Failures (MTBF) and Mean Time To Recover (MTTR). Once you have this information, use the formula Availability = MTBF / (MTBF + MTTR) to get the availability of each component.
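A minimal Python sketch of this formula. MTBF and MTTR must be in the same unit (hours here); the sample values are made up for illustration.

```python
def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Illustrative component: fails on average every 999 hours and takes
    # 1 hour to recover, so it is up 999 of every 1,000 hours.
    print(f"{availability_from_mtbf(999, 1):.2%}")  # 99.90%
```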

You can get availability data from the vendors who provide your infrastructure or software.

Step 4: Consolidate the availability of the components

Components of a subsystem are said to operate in series if the failure of any one component causes the subsystem to fail. In that case, multiply the availabilities (A) of the components to find the availability of the subsystem:

$$A_{subsystem} = A_{component_1} \times A_{component_2} \times \cdots \times A_{component_n}$$

Components of a subsystem are said to operate in parallel if the subsystem fails only when ALL of its components fail. If one component fails, the others take over. To find the availability of the subsystem, multiply the unavailabilities (UA) of the components and subtract the product from 1:

$$A_{subsystem} = 1 - (UA_{component_1} \times UA_{component_2} \times \cdots \times UA_{component_n})$$

where $UA_{component} = 1 - A_{component}$.

Consider a system with three subsystems/components: A, B, and C. Component B is a combination of components B1 and B2. Here, A, B, and C are in series, and components B1 and B2 are in parallel.

Figure: Sample system to calculate availability

To calculate the availability of the sample system above, with $A_A = 99.00\%$, $A_{B1} = 99.00\%$, $A_{B2} = 99.99\%$, and $A_C = 99.99\%$, follow these steps:

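$$
\begin{aligned}
A_{system} &= A_A \times A_B \times A_C \\
&= A_A \times \{1 - (1 - A_{B1}) \times (1 - A_{B2})\} \times A_C \\
&= 99.00\% \times \{1 - (1 - 99.00\%) \times (1 - 99.99\%)\} \times 99.99\% \\
&= 99.00\% \times 99.9999\% \times 99.99\% \\
&\approx 98.99\%
\end{aligned}
$$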
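The same consolidation extends to deeper nesting if you describe the system as a tree of series and parallel blocks and evaluate it recursively. This Python sketch is an illustration, not a standard tool: the `Component`, `Series`, and `Parallel` classes are hypothetical names, and like the formulas above it assumes independent failures and instant failover.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod
from typing import Union


@dataclass
class Component:
    name: str
    availability: float  # e.g. 0.99 for 99%


@dataclass
class Series:
    parts: list[Block]


@dataclass
class Parallel:
    parts: list[Block]


Block = Union[Component, Series, Parallel]


def evaluate(block: Block) -> float:
    """Recursively consolidate the availability of a block diagram."""
    if isinstance(block, Component):
        return block.availability
    values = [evaluate(part) for part in block.parts]
    if isinstance(block, Series):
        return prod(values)                     # all parts must be up
    return 1.0 - prod(1.0 - v for v in values)  # at least one part up


if __name__ == "__main__":
    sample = Series([
        Component("A", 0.99),
        Parallel([Component("B1", 0.99), Component("B2", 0.9999)]),
        Component("C", 0.9999),
    ])
    print(f"{evaluate(sample):.2%}")  # 98.99%
```

Describing the system as data keeps Step 2 (breaking it into components) and Step 4 (consolidating) in one place: when a component is added or made redundant, only the tree changes.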

Do you know a better way to calculate availability? Leave your thoughts in the comments box.

Are you losing profits? Make your system highly available!

Figure: Cost of downtime (Increase Profit – Make your IT solution Highly Available)

Recent studies indicate that almost 59 percent of Fortune 500 companies experience IT outages of 1.6 hours per week. A company with around 10,000 employees that pays USD 56 per hour (including salary and benefits) loses roughly USD 46 million per year due to the unavailability of its software solutions. (http://www.evolven.com)
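The per-year cost can be reproduced from the numbers quoted above. This Python sketch assumes that the 1.6 weekly outage hours idle all 10,000 employees and that a year has 52 weeks; these assumptions are mine and may not match the study's exact method.

```python
def annual_downtime_cost(employees: int, hourly_cost: float,
                         outage_hours_per_week: float,
                         weeks_per_year: int = 52) -> float:
    """Lost labour cost per year, assuming every employee is idle
    for the whole outage."""
    return employees * hourly_cost * outage_hours_per_week * weeks_per_year


if __name__ == "__main__":
    cost = annual_downtime_cost(10_000, 56, 1.6)
    print(f"USD {cost:,.0f} per year")  # USD 46,592,000 per year
```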

According to research by Coleman Parkes, 37,160,146 person-hours are lost across Europe due to IT downtime.

In the last decade, there has been a major shift in the way business organizations work. Most companies actively use technology at every level to become more efficient and productive and to improve their profitability. Take any industry, whether healthcare, manufacturing, social networking, media, or communication: you will be amazed at the technical solutions (combinations of software and hardware) that companies in these verticals have deployed and the critical role these solutions play in improving their businesses.

At the same time, though these IT solutions add value, they also lead to loss of business when they are unavailable or down. Airlines, for instance, cannot afford outages that take their ticket booking systems down, even for a few hours. Retailers like Best Buy and eBay cannot afford outages of their e-commerce websites or billing systems around Thanksgiving or Christmas.

The ‘Availability’ of a system describes how long the system remains up and running to serve its end users. A system that is up and accessible to end users is considered ‘Available’. If a system is up but not accessible to end users because of network issues, it is considered up, but not ‘Available’.

Figure: High Availability, downtime at 10% unavailability

Calculating downtime is an intuitive way to understand availability. A claim that an IT solution is 90 percent available over a year (24x7x365) draws a ‘wow’ reaction. What an impressive figure: just 10 percent downtime in a year! Let’s examine this figure more closely. Ten percent unavailability means that the software is down for:

  • 36.5 days a year, or
  • 72 hours a month (30 days), or
  • 16.8 hours a week, or
  • 2.4 hours a day
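The figures in this list follow from multiplying each period by the 10 percent unavailability. A short Python sketch, assuming a 365-day year and a 30-day month (the month length that yields 72 hours); `downtime_hours` is an illustrative name.

```python
from __future__ import annotations

# Period lengths in hours; a 30-day month gives the 72-hour figure.
PERIOD_HOURS = {
    "year (365 days)": 365 * 24,
    "month (30 days)": 30 * 24,
    "week": 7 * 24,
    "day": 24,
}


def downtime_hours(availability_pct: float) -> dict[str, float]:
    """Hours of downtime per period implied by an availability percentage."""
    unavailability = 1 - availability_pct / 100
    return {period: hours * unavailability
            for period, hours in PERIOD_HOURS.items()}


if __name__ == "__main__":
    for period, hours in downtime_hours(90).items():
        print(f"{period}: {hours:.1f} hours ({hours / 24:.2f} days)")
```

For 90 percent availability this prints 876.0 hours (36.50 days) for the year, 72.0 hours for the month, 16.8 hours for the week, and 2.4 hours for the day.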

Therefore, a system that is 90 percent available is down for roughly one month in every 12 months.

It has become common to express availability as a count of nines (9s), from one nine to seven nines (90%, 99%, 99.9%, 99.99%, 99.999%, 99.9999%, 99.99999%). The more nines, the more available the system is. For example, if an online banking system is available 99.999 percent of the time, it is down for only about 5.26 minutes a year!

The following table reveals interesting facts about downtime: the more nines, the higher the availability and the shorter the downtime.

Table: Availability in nines
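Since the table is only available as an image, here is a Python sketch that computes annual downtime for one to seven nines so the figures can be checked. It assumes a 365-day year (525,600 minutes); using 365.25 days changes the numbers slightly, though five nines still rounds to about 5.26 minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (365-day year)


def annual_downtime_minutes(nines: int) -> float:
    """Yearly downtime for an availability with the given number of nines
    (1 -> 90%, 2 -> 99%, 3 -> 99.9%, ...)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return MINUTES_PER_YEAR * 10 ** -nines


if __name__ == "__main__":
    for n in range(1, 8):
        pct = 100 * (1 - 10 ** -n)
        decimals = max(n - 2, 0)  # 90, 99, 99.9, 99.99, ...
        minutes = annual_downtime_minutes(n)
        print(f"{pct:.{decimals}f}%: {minutes:,.2f} minutes/year")
```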

And how should one calculate Availability?

To measure availability, you need to know the Mean Time Between Failures (MTBF) and the Mean Time To Recover (MTTR). Once you have this information, use the following formula:

Availability = MTBF/ (MTBF+MTTR)
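The MTBF and MTTR inputs usually come from an incident history. As a rough sketch, assuming you have a list of non-overlapping (failure time, recovery time) pairs for one component inside an observation window, MTTR is the average repair duration and MTBF is the total up time divided by the number of failures. The `Incident` type and the sample timestamps are made up for illustration, and vendors and tools may define MTBF slightly differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    failed_at: datetime
    recovered_at: datetime


def mtbf_mttr(incidents: list[Incident], window_start: datetime,
              window_end: datetime) -> tuple[timedelta, timedelta]:
    """Estimate (MTBF, MTTR) from non-overlapping incidents that fall
    entirely inside the observation window."""
    if not incidents:
        raise ValueError("no failures observed; MTBF cannot be estimated")
    downtime = sum((i.recovered_at - i.failed_at for i in incidents),
                   timedelta())
    uptime = (window_end - window_start) - downtime
    return uptime / len(incidents), downtime / len(incidents)


def to_hours(td: timedelta) -> float:
    return td.total_seconds() / 3600


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    end = start + timedelta(days=30)  # 720-hour window
    log = [
        Incident(datetime(2024, 1, 5, 10), datetime(2024, 1, 5, 12)),   # 2 h
        Incident(datetime(2024, 1, 20, 8), datetime(2024, 1, 20, 12)),  # 4 h
    ]
    mtbf, mttr = mtbf_mttr(log, start, end)
    availability = to_hours(mtbf) / (to_hours(mtbf) + to_hours(mttr))
    print(f"MTBF {to_hours(mtbf):.0f} h, MTTR {to_hours(mttr):.0f} h, "
          f"availability {availability:.2%}")
```

With this made-up log the sketch reports an MTBF of 357 hours and an MTTR of 3 hours, about 99.17% availability, which matches 714 up hours out of the 720-hour window.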

The formula above shows how available your system is to end users. In the blogs that follow, I will write about the factors that result in lower availability and about how you can calculate MTBF and MTTR.

I have tried to briefly talk about Availability and am looking forward to hearing your opinions on the issue and taking the discussion forward.

15-point tests for Browser Compatibility

This blog may be useful for you if you have found that your team:
  • is getting production bugs reproducible only on specific browser(s)
  • has missed important browser compatibility test cases
  • is new to browser compatibility
  • has to ensure testing coverage on …

Is it possible to Automate Accessibility Testing?

Figure: Accessibility, visual disability

This blog discusses the challenges of automating accessibility testing of web applications made accessible for people with vision-related impairments, such as blindness and low vision.

Starting with a brief introduction to accessibility, I will discuss the challenges we faced and then end with the solution.

As per W3C, Web accessibility means that people with disabilities can use the Web. More specifically, Web accessibility means that people with disabilities can perceive, understand, navigate, and interact with the Web, and that they can contribute to the Web.

W3C started the Web Accessibility Initiative (WAI) to lead the Web to its full potential to be accessible, enabling people with disabilities to participate equally on the Web. Web accessibility testing validates that a website is accessible to people with various levels of disability.

Businesses are making their websites accessible to avoid legal issues, expand their business (an approximately USD 1 trillion market), and remove inequality among people with various levels of ability. Tesco invested £35,000 to make its website accessible and generated £1.5 million in a year from online sales to disabled people in Europe.

Broadly, disabilities can be grouped into sensory (vision, hearing), physical (hand movement, paralysis, etc.), and cognitive (dyslexia, slow processing of information, etc.) disabilities.

For people with vision problems, such as low vision or blindness, there are assistive screen readers, such as JAWS and NVDA. These tools read out the web content; the end user listens and interacts with the website using the keyboard. The Tab, arrow, Enter, Shift, Ctrl, Alt, and Space keys are the most used for navigation.

Testing a website for vision accessibility is a two-step process. Step 1: Use free tools where you provide the URL of your website and the tool generates a report showing how accessible the website is, then take appropriate action. Step 2: Manual test engineers imitate blind users, listen to the web content for correctness, and test navigation and functionality using the keyboard.

Repeatedly listening to the content and verifying it against what you see on the screen is a monotonous, boring task for manual test engineers. Lapses in attention leave room for missing vision accessibility issues during regression testing. With this in mind, a colleague and I have spent the past few days trying to automate accessibility testing.

The foremost challenge was that we could not find anything on the web showing that somebody had tried to automate screen readers. The next biggest challenge was how to verify whether the screen readers read the content correctly. Another technical challenge was that the screen reader tools did not accept keyboard shortcut inputs sent by various paid and open-source tools, such as QTP, SilkTest, TestComplete, Selenium, AutoIt, and the Robot API. These keyboard shortcuts help disabled people navigate and use the functionality of the web page.

The solution: we created an object repository containing all the objects, their IDs, the specific attribute that JAWS reads for each object, and the expected content. We were confident that if the right content is set in the right property of an object, JAWS will read it correctly. During our R&D we also found that JAWS reads ARIA labels first. So, where development is at an early stage, I would recommend ensuring that the development team enters content in the ARIA labels associated with each object. JAWS reads this content when the user moves focus to the object.

What is ARIA? (http://en.wikipedia.org/wiki/WAI-ARIA) WAI-ARIA describes how to add semantics and other metadata to HTML content in order to make user interface controls and dynamic content more accessible.

In our case, the test website was already developed, so instead of asking the development team to add ARIA labels to all objects, we collected all the objects and the content of the specific attributes that JAWS reads. This is how we created our object repository and automated our screen reader testing. For navigation, we currently use the Tab, arrow, Enter, and Space keys. With these keys we can check all objects, their content for JAWS, and the functionality.
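The object-repository idea can be sketched roughly as follows. This is a hypothetical illustration, not the tool we built: the repository entries, element IDs, and the `page` dictionary standing in for attributes read from the rendered page (in practice you would fetch them with a browser-automation tool) are all made up. Following our observation that JAWS reads ARIA labels first, the check prefers `aria-label` when present and otherwise falls back to the attribute recorded in the repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RepoEntry:
    element_id: str  # ID of the object on the page
    attribute: str   # attribute whose content the screen reader announces
    expected: str    # content the screen reader should announce


def announced_text(attrs: dict[str, str], entry: RepoEntry) -> str | None:
    """Approximate what JAWS would announce: the ARIA label first (per our
    observation), otherwise the attribute recorded in the repository."""
    return attrs.get("aria-label") or attrs.get(entry.attribute)


def check_page(repo: list[RepoEntry],
               page: dict[str, dict[str, str]]) -> list[str]:
    """Return human-readable failures (an empty list means all passed)."""
    failures = []
    for entry in repo:
        attrs = page.get(entry.element_id)
        if attrs is None:
            failures.append(f"{entry.element_id}: element not found")
            continue
        actual = announced_text(attrs, entry)
        if actual != entry.expected:
            failures.append(f"{entry.element_id}: expected "
                            f"{entry.expected!r}, got {actual!r}")
    return failures


if __name__ == "__main__":
    repo = [
        RepoEntry("search-btn", "title", "Search"),
        RepoEntry("logo", "alt", "Company home page"),
    ]
    # Hypothetical attributes as they might be read from the rendered page.
    page = {
        "search-btn": {"title": "Go", "aria-label": "Search"},
        "logo": {"alt": "logo.png"},
    }
    for failure in check_page(repo, page) or ["all checks passed"]:
        print(failure)
```

This only checks that the right content sits in the right attribute; it does not drive JAWS itself, which is why the keyboard navigation checks described above are still needed.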

I look forward to hearing from you if you have suggestions or queries.

Empower your team: build a Responsibility Matrix

A group becomes a team when each member is sure enough of himself and his contribution to praise the skills of the others.


Have you ever faced a situation where in your absence, or that of a critical person, other team members are in a quandary regarding taking decisions, executing tasks or plans or sending reports?

Have you found yourself in a position where team members are calling you, as the key decision-making rests with you?

This is typically a problem faced by people who are managing multiple projects and are key to specific projects. Their team members normally have issues when they are unavailable.  This problem is compounded in the case of geographically distributed teams, or those working in different time zones.

If you have faced such a situation, this blog may be useful for you.

I believe the solution to this challenge lies in engaging with all your team members to create a ‘Responsibility Matrix’. The idea is to identify and list all the important tasks the team performs. The team then identifies the primary and secondary owners of these tasks. These responsibilities must be rotated among team members wherever possible. This helps create backups and reduces dependency on a few individuals.

I am using this responsibility matrix in all of my projects and a sample of it is attached to this blog.

The idea behind the matrix is bigger than simply adding tasks and the names of the engineers in charge. The aim is to develop a sense of ownership and team spirit. It is to empower the team, improve transparency and communication and lower the dependency on specific people.

Here is how we did it: we sat together, identified the various tasks, and grouped them under ‘Major tasks’. The team then picked the owners of each major task and recorded them in the matrix.

The primary owners were responsible for ensuring that the tasks were completed as planned. The secondary owners were directed to play the role of the primary owners in their absence. The individual contributors, the team members, were asked to complete the tasks. For rotations, we advised making sure that the primary and secondary owners of a major task were not moved at the same time.

I hope this gives you an idea about the ‘Responsibility Matrix’ and its benefits. I look forward to hearing your views on it.

Figure: Sample responsibility matrix

Compressed workweeks: A new strategy for workforce retention

Figure: Compressed workweek (Increase Productivity – Reduce working days)

How wonderful it would be if you had to work only four days and got three days off, from Friday to Sunday. Interesting? Keep reading.

I recently read a few articles about organizations experimenting with giving people Fridays off if they had completed their weekly quota of hours!

They referred to this as ‘Compressed Workweeks’; some other companies called it ‘Alternative Workweek Schedules’. In practice, if people’s weekly quota is 42 hours, they can work 10.5 hours a day for four days and avail themselves of the Compressed Workweek benefit.

Another option is for people to work nine hours for four days and then work three hours each on Friday and Sunday.

I recently heard that companies such as IBM, Qualcomm, PwC India, Dell and some others were experimenting with the Compressed Workweeks concept.

I believe that flexible working hours can help employees manage their hectic schedules and balance their professional and personal lives.

The concept can, for instance, work for professionals who want to enjoy long weekends. Naturally, they will have to put in extra effort on weekdays and still deliver their assignments on time.

While on the face of it, the model appears interesting, I am not sure whether it can work in the software industry. For one, it requires better visibility of the work to be done in every week/month and a clear division between the fixed working days and the optional working weekdays.

Another bottleneck is that the software industry is driven by output rather than by the hours spent in the office. Agile practices even suggest that if a person is unable to deliver the user stories assigned to him, his velocity is zero! Also, after 6-7 hours of work, people’s productivity typically declines, which leads to rework.

Though I am sure that the compressed workweek idea will help engage employees, keep their morale high, and retain them, there are several logistical issues to consider. Tracking projects, monitoring their progress, and managing them will need additional effort.

The Compressed workweeks model can be truly successful where the work volume per hour is defined, as with call centers and software maintenance projects.

This is of course my view. I am keen to know what you think about the emerging trend. Do you think it will be a hit with the software industry, especially outsourcing service providers?

Do write in and share your views.