Cloud platforms, managed service providers, and organizations undertaking digital transformations are beginning to reap the benefits of an emerging IT trend: the use of AI-powered IT operations technology to monitor and manage the IT portfolio automatically.
This emerging practice, known as AIOps, is helping enterprises head off potential outages and performance issues before they negatively impact operations, customers, and the bottom line. But the more advanced deployments are beginning to use AI systems not just to identify issues, or to predict issues before they happen, but to react to events with intelligent, automated mitigation.
But what exactly is AIOps and how are organizations putting it to use today? Here we take a deeper look at the technologies, strategies, and challenges of AI-assisted IT operations.
What is AIOps?
AIOps is an emerging IT practice that applies artificial intelligence to IT operations to help organizations intelligently manage infrastructure, networks, and applications for performance, resilience, capacity, uptime, and, in some cases, security. By shifting traditional, threshold-based alerts and manual processes to systems that take advantage of AI and machine learning, AIOps enables organizations to better monitor IT assets and anticipate negative incidents and impacts before they take hold.
Carhartt CIO John Hill leverages AIOps at the work-apparel retailer in three main areas: service management, performance management, and IT automation. Thanks to intelligent monitoring, Carhartt can now spot problems before they impact users or customers.
“It’s the whole process of monitoring your environment and understanding what’s going on — and taking actions based on those indicators,” he says. “Previously, you would rely on an outage or some indication that something isn’t working” to know when a fix was needed — events likely to have already degraded customer experience before you knew of them.
AIOps tools
Many AIOps platforms have been built on monitoring systems with a long history; others began in AI labs and grew outward. Good AIOps tools forecast machine load and then watch for anything that deviates from those estimates. Anomalies can be turned into alerts that generate emails, Slack posts, or, if the deviation is large enough, pager messages. Sophisticated AIOps tools also offer “root cause analysis,” which creates flowcharts to track how problems can ripple through the various machines in a modern enterprise application. Organizations considering an AIOps platform will want to evaluate how well each offering integrates with their particular databases and services. The following AIOps tools are among the best available today:
- AppDynamics
- BigPanda
- Datadog
- Dynatrace
- GitHub Copilot
- IBM Watson Cloud Pak for AIOps
- LogicMonitor
- Moogsoft
- New Relic One
- Splunk
For a deeper look at these tools, see “Top 10 AIOps platforms.”
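The forecast-and-compare loop described at the start of this section can be sketched in a few lines of Python. This is a minimal illustration, not how any listed product works: the moving-average forecast, the sigma thresholds, and the function names `forecast_next` and `route_alert` are all assumptions made for the example.

```python
from statistics import mean, stdev

def forecast_next(history, window=12):
    """Naive forecast (an assumption): mean and sample spread of recent samples."""
    recent = history[-window:]
    return mean(recent), stdev(recent)

def route_alert(observed, expected, spread,
                email_sigma=2.0, slack_sigma=4.0, pager_sigma=6.0):
    """Map the size of a deviation to a notification channel.

    The thresholds are hypothetical; real tools tune them per metric.
    Returns None when the observation is within the expected range.
    """
    spread = max(spread, 1e-9)  # avoid dividing by zero on a flat history
    deviation = abs(observed - expected) / spread
    if deviation >= pager_sigma:
        return "pager"
    if deviation >= slack_sigma:
        return "slack"
    if deviation >= email_sigma:
        return "email"
    return None

# Twelve recent CPU-load samples (percent), invented for the example.
cpu_history = [41, 43, 40, 42, 44, 41, 43, 42, 40, 44, 42, 43]
expected, spread = forecast_next(cpu_history)

for observed in (43, 46, 48, 95):
    print(observed, route_alert(observed, expected, spread))
```

With this sample history, a reading of 43 raises nothing, 46 produces an email, 48 a Slack post, and 95 a page. Production tools typically replace the moving average with seasonal or learned models, but the escalation logic keeps the same shape.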
AIOps use cases
AIOps may already be at work in your IT portfolio without you even knowing it. Advanced CRM or ERP systems often have intelligent management built in. Most major cloud platforms make use of machine learning–powered monitoring and management tools as well.
But relying on built-in functionality within point solutions has its downsides. Sixty-five percent of IT organizations in an AIOps Exchange survey said they still rely on monitoring approaches — whether intelligent or not — that are either siloed, rules-based or don’t cover the needs of their entire IT environment. Moreover, according to a recent BigPanda survey, 42 percent of IT organizations use more than 10 different monitoring tools for their IT environments.
That was how Carhartt started with AIOps. “Previously, for the different environments, we’d have to monitor them independently,” Hill says. To manage this complexity, Hill opted to combine monitoring onto two platforms, settling first on AppDynamics for application performance monitoring, and later adding Turbonomic to keep tabs on Carhartt’s infrastructure.
Performance issues on the company’s website during Black Friday and Cyber Monday shopping rushes forced a change. By the time the company saw the problems, customers had already felt the service degradation, Hill says.
Since Carhartt deployed AppDynamics in the fall of 2017, spikes during Black Friday and Cyber Monday have been met with zero downtime.
“We had record growth,” he says. “We grew double the rate of the industry as a whole, without any of the outages or performance degradation that we had experienced previously.”
Carhartt added Turbonomic in early 2019 for resource management of both on-prem and cloud environments. With the new system, utilization has increased from 70 to 92 percent, he says. “It probably saved us 25 percent of infrastructure costs.”
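A quick back-of-the-envelope check shows why those two figures sit close together. The sketch below assumes, purely for illustration, that the workload stays the same and that cost scales linearly with provisioned capacity; `capacity_saving` is a hypothetical helper, not anything from Turbonomic.

```python
def capacity_saving(old_utilization, new_utilization):
    """Fraction of provisioned capacity no longer needed for the same workload.

    Capacity needed is workload / utilization, so the ratio of new to old
    capacity is old_utilization / new_utilization.
    """
    if not 0 < old_utilization <= new_utilization <= 1:
        raise ValueError("utilizations must satisfy 0 < old <= new <= 1")
    return 1 - old_utilization / new_utilization

saving = capacity_saving(0.70, 0.92)
print(f"{saving:.1%}")  # prints 23.9%
```

Under those assumptions, moving from 70 to 92 percent utilization frees roughly 24 percent of capacity, in line with Hill's estimate of about 25 percent.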
Increased utilization needs are processed automatically, without human intervention, while decreases in capacity still require human approval.
“It sees that we’ve got a capacity challenge and it puts a change request through to ServiceNow,” Hill says. “When we have too much capacity, it creates a ticket in ServiceNow, and someone looks at it first. It’s a quick review — just a click. For now, I don’t need to automate it.”
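The asymmetric policy Hill describes, automatic increases but human-reviewed decreases, might look something like the following sketch. The `CapacityChange` type, the `plan_action` function, and the ticket labels are hypothetical; no real ServiceNow or Turbonomic API is shown.

```python
from dataclasses import dataclass

@dataclass
class CapacityChange:
    resource: str
    delta_units: int  # positive adds capacity, negative removes it

def plan_action(change):
    """Decide how a proposed capacity change is handled (illustrative policy)."""
    if change.delta_units > 0:
        # Scaling up protects users, so it is applied immediately and a
        # change request is filed for the record.
        return {"apply_now": True, "ticket": "change_request", "needs_approval": False}
    if change.delta_units < 0:
        # Scaling down risks under-provisioning, so a person approves it first.
        return {"apply_now": False, "ticket": "review_ticket", "needs_approval": True}
    return {"apply_now": False, "ticket": None, "needs_approval": False}

print(plan_action(CapacityChange("web-tier", +4)))
print(plan_action(CapacityChange("web-tier", -2)))
```

Keeping the risky direction behind a one-click review is a common middle ground: the system does the analysis, and the human approval can be dropped later once the recommendations have earned trust.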
The next step for the company is automating business tasks, such as processing customer orders using text recognition and natural language processing.
AIOps adoption
By 2023, 40 percent of companies will be using AIOps for application and infrastructure monitoring, according to Gartner. But by all accounts, AIOps adoption is still in its early stages. According to a 2019 survey sponsored by Loom Systems, only 5 percent of companies have implemented AIOps so far.
One thing hurting adoption is that there are a lot of vendors in the market, says Akash Bhatia, managing director and partner at Boston Consulting Group. “Almost too many.”
And with 59 percent of organizations in the exploration phase, according to the Loom Systems report, it’s still hard for customers to figure out exactly what each vendor is offering. Plus, many vendors operate in just one segment of AIOps, Bhatia says, such as application performance monitoring, infrastructure management, or network performance monitoring and diagnostics. But the market is showing signs of consolidation as the technology matures, he adds.
IDC predicts the AIOps market, which it calls IT operations analytics, will grow from $2.9 billion in 2018 to $4.5 billion in 2023, with most of the growth coming from AIOps as a service. And while AIOps is often bundled in with enterprise software platforms or cloud services, larger enterprises are beginning to invest in AIOps as a standalone budget item, says Stephen Elliot, analyst and program vice president for AIOps at IDC.
“They’re realizing that they’re in a multicloud world,” he says. “And they have agile transformation happening, and they have DevOps teams, and they’re realizing that they’ve got to move faster and that complexity is increasing.”
AIOps value proposition
Companies that leverage AIOps are beginning to see the importance of shifting from systems that perform analysis and predictions to those that make decisions on their own. Enter automation.
“They need tools that can collect massive pools of information, apply analytics, reduce the noise, and drive faster problem identification and resolution,” Elliot says.
Automation also requires greater AIOps integration. A problem with application performance may be due to a software issue, a networking issue, or a hardware issue. In a multi-cloud environment, the root cause can be in one cloud, or in another cloud, or be the result of a combination of factors. If your AIOps infrastructure is fragmented, finding and fixing the root causes of problems can be a challenge.
“Then you’re back to hand-to-hand combat, where every group has its own tools,” says David Link, CEO at ScienceLogic, an AIOps vendor. “If you have a unique tool for every application initiative, you can’t scale the enterprise that way.”
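To see why a unified view matters, consider a toy version of root cause analysis over a dependency map that spans two clouds. Everything here is an assumption for illustration: the component names, the `depends_on` map, and the simple rule that an alerting component with no alerting dependencies is a likely root cause.

```python
def likely_root_causes(depends_on, alerting):
    """Return alerting components none of whose direct dependencies are alerting."""
    roots = []
    for component in sorted(alerting):
        upstream = depends_on.get(component, [])
        if not any(dep in alerting for dep in upstream):
            roots.append(component)
    return roots

# Hypothetical application spread across two clouds.
depends_on = {
    "checkout-app": ["payments-api", "cloud-a-network"],
    "payments-api": ["orders-db"],
    "orders-db": ["cloud-b-storage"],
}
alerting = {"checkout-app", "payments-api", "orders-db"}

print(likely_root_causes(depends_on, alerting))  # ['orders-db']
```

The rule only works because one system sees all three alerts and the full dependency map. If each team's tool held just its own slice, every group would see its own component failing and none could tell that the database is the common cause.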
Meanwhile, companies that have deployed AIOps, like Carhartt, are finding that their investments are paying off. According to a survey by Enterprise Management Associates, 81 percent of enterprises using AIOps report a positive return on investment. In fact, 42 percent said that the value of AIOps “dramatically” exceeds the costs.
According to EMA, the six most common use cases for AIOps are cross-domain application infrastructure and performance; capacity management and infrastructure optimization; DevOps and agile; customer and end-user experience management and business alignment; cost management; and change management.
AIOps as a revenue generator
Cincinnati Bell’s CBTS subsidiary provides communication services to enterprise customers. CBTS used to stand for “Cincinnati Bell Technology Solutions,” but as the company expanded to other geographies, the name came to stand for “Consult Build Transform Support,” says Joe Putnick, the company’s chief innovation officer.
Moving to AIOps was critical to helping improve reaction times, he says, but has now become a source of new business opportunities. For example, before the company turned to AIOps, it would take hours, days or “never” to get customer equipment into the CBTS monitoring, management and billing systems, Putnick says.
“Now I’ve taken provisioning from five hours down to two minutes,” Putnick says. “And when I say provisioning, I mean the full provisioning across the whole IT service management and event management systems. I know that those statistics are pretty compelling.”
The company is also using AIOps to analyze usage patterns and automate responses. “We’re applying AIOps to predict where the capacity needs to be so that we can maintain maximum uptime and maximum customer satisfaction,” he says.
AIOps has helped CBTS grow from fewer than 40 sites per month to an average of more than 500 installs per month, says Putnick, with almost the same number of people.
CBTS uses a combination of tools built into AWS, its own custom-coded applications inside of ServiceNow, custom machine learning and adaptive algorithms, as well as AIOps tools from ScienceLogic. Next up: value-added services for its customers. For example, customer service chatbots that CBTS provides its customers can be made more intelligent and responsive using the data, analytics and predictions that come out of its AIOps systems.
AIOps and managed services providers
But to see the full potential of AIOps, you should look no further than the managed services provider (MSP) industry.
“It’s probably the largest part of the market right now,” says Justin Richie, data science director at Nerdery, a digital services consultancy. “They’re definitely trying to invest in algorithmic support where they can. They know, outside the hardware, their largest expense is human capital.”
For MSPs, AIOps means higher efficiency, lower costs, and faster resolution times — all significant competitive differentiators in this sector.
“It’s one half of our value proposition for AIOps,” says Raghu Kamath, senior vice president of strategy and operations at San Jose-based MSP NetEnrich, which has rolled out AIOps across more than 1,000 clients. “We started to implement it across a few customers, then gradually extended it across our customer base over the last twelve months. Now, over 50 percent of our customers are on the AIOps platform.”
One of the most obvious and immediate benefits for NetEnrich was a reduction in noise. False alarms create unnecessary work for employees, and slow down response times for customers.
“Our response time to detect and take action has increased — our mean time to repair has become at least 30 percent faster after implementing AIOps,” Kamath says. “And it will continue to increase as AIOps becomes more mature and brings in more inference models.”
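One common way to cut alert noise, offered here as a general illustration rather than a description of NetEnrich's platform, is to collapse repeated alerts from the same source. The alert fields (`host`, `check`, `ts`) and the five-minute window below are assumptions.

```python
def deduplicate(alerts, window_seconds=300):
    """Keep only the first alert in each burst from the same (host, check).

    An alert is suppressed if the same key fired within `window_seconds`
    before it. The window slides: each repeat extends the quiet period.
    """
    last_seen = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["host"], alert["check"])
        previous = last_seen.get(key)
        if previous is None or alert["ts"] - previous > window_seconds:
            kept.append(alert)
        last_seen[key] = alert["ts"]
    return kept

# Invented sample stream: timestamps are seconds since the start of the incident.
alerts = [
    {"host": "web-1", "check": "cpu", "ts": 0},
    {"host": "db-1", "check": "disk", "ts": 30},
    {"host": "web-1", "check": "cpu", "ts": 60},
    {"host": "web-1", "check": "cpu", "ts": 120},
    {"host": "web-1", "check": "cpu", "ts": 1000},
]

kept = deduplicate(alerts)
print(len(alerts), "->", len(kept))  # 5 -> 3
```

Even this crude rule removes two of the five notifications. AIOps platforms generally go further, correlating alerts across hosts and learning which patterns tend to be false alarms, but the goal is the same: fewer, more meaningful tickets for each engineer.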
Because NetEnrich uses AIOps in so many different customer environments, Kamath has a unique perspective on the technology. First, he has found that the more homogeneous the environment, the easier it is to deploy AIOps.
“It becomes a lot more complex when you start integrating all these different environments,” he says.
Also, customers that use public cloud infrastructure have a leg up because the environments are more consistent. Still, there are occasional hurdles in getting cloud vendors to open up their systems.
“But the public cloud vendors are shifting their position,” he says. “If you look at the amount of data you had access to two years ago, to now, it has gotten a lot better.”
Leveraging AIOps for legacy applications and hardware is tricky, Kamath says. “If you don’t have enough logs, it becomes pretty difficult to infer anything. That is why we encourage our customers to accelerate their digital transformations and modernize their applications.”