
5 ways generative AI boosts cloud and IT operations

Tuesday, April 29, 2025, 11:00 AM, from InfoWorld
Anyone who thinks engineers, administrators, and analysts in IT operations have an easy job hasn’t spent enough time in their shoes.

Automation, observability, and machine learning help IT operations deploy and manage many more large-scale and mission-critical workloads. However, expected service levels, compliance requirements, multi-cloud complexities, and exponentially increasing data volumes all increase business requirements and expectations of IT operations.

How to leverage genAI in IT and cloud operations

According to the 2024 Global Workforce AI Report, 85% of IT teams say AI makes their workday more positive. These professionals said AI gives them time to learn new skills, get more work done, and take on more creative work.

Software developers use genAI to generate code, create documentation, and simplify app modernizations.

GenAI is helping data scientists spend more time learning end-user workflows and reducing data bias in training data.

CIOs invest in agentic AI to drive customer success, improve supply chain forecasting, and find manufacturing defects.

But how can genAI simplify work in IT and cloud operations? Lori Rosano, MD & SVP of North American Public Cloud at SAP, says, “Integrating genAI into cloud and IT operations empowers organizations to elevate their performance, improve agility, and become more resourceful and better equipped to respond to evolving environments.”

Here are five ways to use genAI in incident response, security, cloud infrastructure, and finops.

Improve AIops and incident response

I’ve previously written about various ways to use AIops, including for machine learning in application monitoring, helping site reliability engineers (SREs) meet service level objectives, and reducing major incident resolution times. AIops solves the problem of centralizing alert information, sequencing telemetry data, identifying likely root causes, and triggering common remediation automations.

“GenAI is significantly enhancing IT and cloud operations by automating tasks such as incident resolution and log analysis,” says Kellyn Gorman, advocate and engineer at Redgate. “It leverages predictive analytics to monitor system performance and address potential issues before they arise, provides data-driven recommendations for workload optimization, and improves user interactions through conversational tools.”

GenAI increases the scope of AIops capabilities, especially in complex IT environments where it’s hard to trace incidents to their sources. With genAI prompt capabilities, engineers can explore different scenarios around the root causes and remediations of challenging incidents.

“GenAI assists by generating insights, summarizing complex system data, and automating documentation for incident response and remediation,” says Preetpal Singh, global head of product and platform engineering at Xebia. “By reducing manual effort in interpreting system logs and operational workflows, genAI helps ops teams make data-driven decisions faster while AI-driven automation handles performance tuning and anomaly detection.”

Enable accurate root cause analysis

Most IT service management organizations separate incident management from problem management. The primary role of incident management is to find the source of an issue and restore services, while problem management performs root cause analysis, especially for recurring issues with multiple underlying symptoms.

“Coupling observability with AIops enables automated detection, diagnosis, and remediation—delivering self-healing infrastructure that strengthens application resilience,” says Steve Mayzak, global managing director of Search AI at Elastic. “Teams can also better interpret data and signals, gain visibility, and optimize operations. GenAI takes this further, providing intuitive navigation and deeper insights via simple queries. For example, if code consumes excessive processing power, genAI can analyze code profiling data, pinpoint high-load functions, and recommend optimizations to boost efficiency and cut costs.”
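The profiling example in the quote above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `ProfileSample` record, the function names, and the 20% threshold are all hypothetical, and the sketch stops at building a prompt rather than calling a particular genAI service.

```python
from dataclasses import dataclass


@dataclass
class ProfileSample:
    function: str       # function name reported by a profiler (hypothetical data)
    cpu_seconds: float  # CPU time attributed to that function in one sample


def top_hotspots(samples, share_threshold=0.2):
    """Return (function, share) pairs at or above the threshold, highest share first."""
    totals = {}
    for sample in samples:
        totals[sample.function] = totals.get(sample.function, 0.0) + sample.cpu_seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    shares = [(name, secs / grand_total) for name, secs in totals.items()]
    hot = [(name, share) for name, share in shares if share >= share_threshold]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


def build_prompt(hotspots):
    """Summarize the hotspots as text that could be sent to a genAI assistant."""
    lines = [f"- {name}: {share:.0%} of CPU time" for name, share in hotspots]
    return (
        "These functions dominate CPU usage:\n"
        + "\n".join(lines)
        + "\nSuggest optimizations for each."
    )


# Illustrative samples only; real data would come from a profiler export.
samples = [
    ProfileSample("orders.price", 5.0),
    ProfileSample("orders.load", 3.0),
    ProfileSample("orders.price", 1.0),
    ProfileSample("auth.check", 1.0),
]
print(build_prompt(top_hotspots(samples)))
```

Aggregating first and sending only the dominant functions keeps the prompt short, which matters when profiling output runs to thousands of entries.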

IT operations have long sought the opportunity to extend performance analytics into the application and networking layers where the more complex issues occur.

“Not only can genAI help to reduce time to resolution through the swift analysis of data sets and incident alerts, but it can also directly assist IT teams by answering their questions,” says Anant Adya, EVP at Infosys Cobalt. “AI chatbots can guide professionals through complex incidents by compiling resources and solutions from different networks.”

As organizations train genAI tools on observability, incident response, and asset management data, they will usher in a new era where AI agents trained for IT operations can analyze historical performance and recommend configuration changes to improve resiliency.

Enhance security audits and threat detection

Finding the root causes of security incidents and resolving them is more challenging given the growing number of threats and bad actors exposing vulnerabilities in ways that can be impossible to trace manually.

“Cloud security, even with a large team of human IT professionals, often feels like a game of whack-a-mole because there are too many entrances with too many moles to whack,” says Joe Warnimont, security and technical expert at HostingAdvice.com. “Generative AI changes the game, as it can patrol many entrances simultaneously while also making predictions on where to respond based on trends and past infiltrations.”

I expect the specialized AI agents that support IT operations to differ from those that information security professionals use. Each agent focuses on a specific function to detect, predict, and respond to issues and optimization opportunities.

“For cloud security, genAI enhances threat detection, identifies anomalies, and automates incident response,” says Bakul Banthia, co-founder of Tessell. “It strengthens access management by analyzing user behavior and device security while continuously auditing cloud configurations for compliance.”

Another opportunity for IT operations is accelerating compliance with data governance policies. Many organizations are deploying data security posture management (DSPM) platforms and defining their AI governance policies, but what about the required implementations in IT operations?

“With the vast amounts of data stored in the cloud, ensuring data security and privacy is paramount,” says Josh Ray, CEO at Blackwire. “GenAI can help enforce data governance policies, improve threat detection and response, automate compliance policy enforcement, and deliver continuous security improvements.”

Scale cloud ops in complex environments

Incident and problem management are reactive disciplines in which genAI can analyze data quickly and respond to issues autonomously or with a human in the middle. Another opportunity is using genAI for more proactive work, where it can improve the robustness and scale of implementing standard operating procedures.

“Generative AI for IT operations helps organizations struggling to keep up with the complexity and scale of modern IT environments by streamlining processes and automating routine tasks, like patching,” says Joel Carusone, SVP of data and AI at NinjaOne.

GenAI is also used in strategic IT functions, such as scaling cloud operations for complex workloads.

“GenAI is improving IT and cloud operations by automating infrastructure, predicting demand, and reducing waste, but without oversight, it can just as easily drive up costs,” says Karthik SJ, GM of AI at LogicMonitor. “Ops teams need to learn to track AI workloads in real-time, fine-tune automation to prevent unnecessary scaling, and use AI insights to optimize costs. The real value isn’t in letting AI run the show—it’s in knowing how to control it to make cloud operations leaner, faster, and more cost-effective.”

I also see agentic AI as a partner to cloud architects and engineers, especially as public clouds and infrastructure providers release new capabilities and innovations. We should expect cloud AI agents to do more than scale infrastructure. As their sophistication improves, they can be invaluable partners for scenario-planning architecture upgrades.

Shift to scalable finops and IT strategic planning

A similar shift is happening in finops, where early AI agents are reactive and provide tactical best practices to reduce cloud costs.

“GenAI is transforming finops by automating cloud cost optimization, identifying unused resources, and dynamically adjusting workloads to reduce waste,” says Tiago Miyaoka, AI and data practice lead at Andela. “Tasks that once required manual effort from finops engineers—such as tracking underutilized instances and reallocating resources—can now be streamlined with AI-driven systems. By continuously scanning cloud environments and applying intelligent cost-saving strategies, genAI helps organizations minimize expenses while maintaining performance.”
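The tracking of underutilized instances mentioned above is, at its simplest, a threshold check over utilization data. The following Python sketch assumes a hypothetical mapping of instance IDs to CPU utilization percentages and a 10% threshold chosen only for illustration. An AI-driven system would layer forecasting and recommendations on top of a baseline like this.

```python
from statistics import mean


def find_underutilized(cpu_by_instance, cpu_threshold=10.0, min_samples=3):
    """Return sorted IDs of instances whose average CPU % is below the threshold.

    cpu_by_instance maps an instance ID to a list of CPU utilization readings.
    Instances with fewer than min_samples readings are skipped, because a
    newly launched instance has too little history to judge.
    """
    flagged = []
    for instance_id, readings in cpu_by_instance.items():
        if len(readings) < min_samples:
            continue
        if mean(readings) < cpu_threshold:
            flagged.append(instance_id)
    return sorted(flagged)


# Hypothetical readings; real values would come from a cloud monitoring export.
usage = {
    "web-1": [55.0, 60.0, 48.0],
    "batch-7": [2.0, 3.0, 1.0, 4.0],
    "new-3": [1.0],
}
print(find_underutilized(usage))
```

Skipping instances with little history is a deliberate choice here: flagging a machine that started minutes ago would produce noise rather than savings.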

Integrating and normalizing all the cost and consumption data to support finops activities can be challenging for larger enterprises operating in multiple clouds, geographically dispersed data centers, and edge-computing locations. GenAI is already overhauling data integration capabilities, and finops use cases offer a significant cost and carbon savings opportunity.

“Many of the cloudops and finops tools involve analyzing tons of usage data stored in several different databases and rely on APIs and scripts to get insights into the usage and costs,” says Karthik Kannan, head of product management, strategy, and operations at Nile. “GenAI capabilities such as data summarization, data visualization, and text summarization can potentially reduce or eliminate the need for such software. Ops teams can get instant insights into the usage and costs and design their optimization strategies around those insights.”

GenAI will present new opportunities to simplify work in IT operations, but don’t expect an agentic AI silver bullet anytime soon. With every wave of infrastructure and operational simplifications comes a new generation of capabilities businesses need, creating new challenges to operational resiliency.
https://www.infoworld.com/article/3966216/5-ways-generative-ai-boosts-cloud-and-it-operations.html
