Outage Prevention: Taking Humans Out Of The IT Equation

This article is more than 10 years old.

Guest post written by Jonathan Crane

Jonathan Crane is chief commercial officer at IPsoft.

With faster networks, higher uptime and more responsive performance management, IT organizations are turning out better results within a shorter period of time – and often at a lower cost – than just a few years ago. However, there is still one glaring problem that is plaguing IT performance at an extremely high rate: human error.

Mistakes are inevitable when humans manage important IT processes and systems, so human error is present in almost every aspect of IT. But the frequency of human error appears to be rising beyond an acceptable level. In 2010, 51 percent of network outages were caused by human error, according to the Ponemon Institute’s National Survey on Data Center Outages. More recently, the Amazon cloud outage and the BlackBerry server outage were both instigated by human error. These examples demonstrate just how devastating human error can be, resulting in poor end user experiences, revenue loss and damage to a company’s reputation. After all, IT is the backbone that supports the majority of organizations and, if it fails, so does the organization as a whole. Thus, IT should refocus on innovation as a way to improve performance and processes, remove operational risks and reduce costs.

The real problem is that the current mindset surrounding human error seems to be one of retroactive fire drills rather than proactive prevention. Rather than reacting to the disastrous results of human error after the damage is done, IT departments should be pushing for the minimization, if not complete elimination, of end user-facing issues resulting from human error.

To improve the productivity of their IT departments, many companies are implementing automation to help manage basic IT functions. IT automation tools come in two flavors: traditional automation, which relies on a tree-based logic system, and autonomic expert systems, which are based on self-learning principles. Traditional automation follows a pre-programmed formula based on set conditions and infrastructure settings. This type of automation works well when the same process is repeated often. But because automated actions must be reprogrammed each time the sequence of steps changes, it often does not make sense for IT departments to automate tasks that frequently change or are reactive. Time savings from this type of automation also vary significantly, with most IT engineers seeing improvements of approximately 11-30 percent.
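To make the tree-based approach concrete, here is a minimal Python sketch of what such pre-programmed logic might look like. The alert fields, thresholds and action names are all hypothetical, chosen only for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """A hypothetical monitoring alert; the fields are illustrative assumptions."""
    service: str
    cpu_percent: float
    disk_percent: float
    responding: bool


def choose_action(alert: Alert) -> str:
    """Walk a fixed decision tree and return the name of a remediation step.

    Every branch is hard-coded. If the runbook changes, this function
    has to be rewritten, which is the limitation described above.
    """
    if not alert.responding:
        return "restart_service"
    if alert.disk_percent >= 90:
        return "rotate_logs"
    if alert.cpu_percent >= 95:
        return "add_capacity"
    return "no_action"
```

A call such as choose_action(Alert("web", 40.0, 93.0, True)) follows the second branch and returns "rotate_logs". Any situation the tree does not anticipate simply falls through to "no_action", so a human still has to handle it.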

Autonomic expert systems are also being used to help eliminate human error. Autonomics are based on self-learning principles and mimic the work of human IT engineers. These expert systems can eliminate human involvement in up to 70 percent of level 1 IT tasks and 30-40 percent of level 2 IT tasks, leading to reduced IT costs and improved scalability, flexibility and compliance. Plus, because autonomics can perform routine IT processes much more reliably and efficiently than humans, implementing autonomics often means drastic reductions in mean time to resolution (MTTR), more consistent business-related outcomes and a huge amount of time that can now be dedicated to other more creative pursuits.

The learning process of autonomic expert systems is similar to how a child learns by observing its parents or how the human immune system learns to fight infections based on vaccines and medication. Essentially, the software is uploaded into the backend of IT infrastructure to monitor performance. When it encounters an outage, bottleneck or other performance issue, it will flag the problem to an engineer and then monitor the steps the engineer takes to remediate the issue. That way, the next time this problem occurs, the autonomic system, having been empowered by the engineer, can deploy the correct solution to “heal” the IT infrastructure, eliminating the need for human intervention. Going forward, the autonomic system will observe and track engineers’ actions to develop probabilities of what to do in certain situations and extrapolate further from those trends, expanding its knowledge base as well as the range of issues that it can resolve.
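The observe-then-act loop described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how any real autonomic product works: the class name, the symptom and action labels, and the two thresholds are all assumptions, and simple frequency counts stand in for the "probabilities" the system develops.

```python
from collections import Counter, defaultdict


class RemediationLearner:
    """Toy model that learns fixes by watching what engineers do."""

    def __init__(self, min_observations: int = 3, min_confidence: float = 0.8):
        # symptom -> count of each action engineers took for that symptom
        self.history = defaultdict(Counter)
        # Assumed thresholds standing in for an engineer "empowering" the system
        self.min_observations = min_observations
        self.min_confidence = min_confidence

    def observe(self, symptom: str, action: str) -> None:
        """Record one action an engineer took in response to a symptom."""
        self.history[symptom][action] += 1

    def suggest(self, symptom: str):
        """Return an action to apply automatically, or None to escalate to a human."""
        counts = self.history.get(symptom)
        if not counts:
            return None  # never seen this problem: flag it to an engineer
        total = sum(counts.values())
        action, seen = counts.most_common(1)[0]
        # Act only when there is enough evidence and engineers mostly agree
        if total >= self.min_observations and seen / total >= self.min_confidence:
            return action
        return None
```

In this sketch, after three engineers respond to a hypothetical "disk_full" symptom with "rotate_logs", suggest("disk_full") returns that action, while an unfamiliar symptom returns None and would be escalated. The thresholds reflect one possible design choice: the system acts on its own only once it has seen a consistent pattern, and defers to people otherwise.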

Unlike traditional automation, self-learning autonomic tools are able to understand and replicate complex decision-making processes and behave much like the human cognitive process by constantly learning new ways of handling IT problems. The significant advantage that autonomic expert systems have over humans is that they behave consistently and always comply with the process rules they have learned. With these sorts of expert systems, IT departments can decrease the prevalence of network outages due to human error.

At the same time, by removing human employees from the low-level work that most often leads to errors, companies can develop an employee base focused more on the proactive aspects of business that require human creativity and intellect. Currently, 80 percent of IT activity, time and dollars is spent on routine “keep the lights on” activities, with only 20 percent going to develop new services and applications. By flipping these numbers to focus employees on pushing the company forward, an organization could dramatically improve its offerings and gain ground in the industry.

Within the next decade, I predict that IT departments will be almost entirely managed by autonomic expert systems. Without this shift in IT management, human error will continue to plague our IT infrastructure performance. Already, companies that implement autonomic expert systems are removing humans from 40 to as much as 80 percent of IT processes. Clearly there is room for growth in these statistics, and I think we will see an increase in the prevalence of autonomic expert systems over the next few years. In the end, it would be a mistake not to take advantage of these solutions.