More and more organizations are digitally transforming their business by moving to the cloud. This isn’t surprising, but it does present a challenge to the IT team, which is increasingly becoming a “broker” for these cloud services. Despite not controlling the underlying infrastructure, platforms, or even applications providing the cloud service, IT is still on the line for the continuity of that service, as well as the impact to the business if the service is impaired or down. The dilemma is simple: Do I really know how anything outside my data center is operating? How do I monitor anything I put in or consume from the cloud?

First of all, “cloud monitoring” can mean many different things to different people. Mainly, though, it boils down to whether the contracted service is being delivered and whether the service provider is meeting the agreed-upon service-level agreements (SLAs). However, monitoring such SLAs can be difficult for IT. Oftentimes, IT must rely on the service provider to do the monitoring (and the reporting) for them. While this isn’t necessarily bad from a “reducing complexity” standpoint, IT must still be able to aggregate any data provided by the service provider into a holistic view of their environment.

So, if I want to do a better job at “cloud monitoring,” what should I be focused on?

Agents vs. Agent-less 

This is an ongoing debate in the monitoring community, and there are pros and cons on both sides. Deploying agents usually allows for richer, more in-depth metrics. However, deploying and configuring agents takes time and may not make sense for the type of cloud services you’re consuming. For highly elastic computing scenarios where VMs may only exist for a short duration, the strategy should be agent-less. If you are using IaaS as a backbone to run critical IT and business services, then monitoring via agents may make more sense. The agent vs. agent-less debate also needs input from your cloud service provider. Some providers are OK with deployed agents reporting monitoring data back to you, while others will prefer to tackle certain types of monitoring on their end and simply provide you with the data or, potentially worse, just a dashboard. While IT monitoring dashboards are essential, you need a way to integrate the data provided to you by your cloud service provider into your own monitoring solution.
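To make the integration point concrete, here is a minimal Python sketch of folding provider-supplied data into the same record format your own agents use. Everything named here is an assumption for illustration: the Metric record, the from_provider_export function, and the JSON keys ("instanceId", "metric", "value", "epochSeconds") are hypothetical and not taken from any real provider's export format.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Metric:
    """One reading in our own (assumed) common monitoring format."""
    source: str          # "agent" or "provider"
    resource: str        # VM or service identifier
    name: str            # e.g. "cpu_percent"
    value: float
    timestamp: datetime


def from_provider_export(payload: str) -> List[Metric]:
    """Map a hypothetical provider JSON export onto Metric records.

    The keys "instanceId", "metric", "value" and "epochSeconds" are invented
    for this example; a real provider will use its own names.
    """
    return [
        Metric(
            source="provider",
            resource=row["instanceId"],
            name=row["metric"],
            value=float(row["value"]),
            timestamp=datetime.fromtimestamp(row["epochSeconds"], tz=timezone.utc),
        )
        for row in json.loads(payload)
    ]


# A reading from our own agent and one from the provider, side by side.
agent_reading = Metric("agent", "vm-42", "cpu_percent", 71.5,
                       datetime(2017, 5, 3, 15, 0, tzinfo=timezone.utc))
provider_payload = json.dumps([
    {"instanceId": "vm-42", "metric": "disk_percent", "value": "63.0",
     "epochSeconds": 1493823600},
])
combined = [agent_reading] + from_provider_export(provider_payload)
for m in combined:
    print(m.source, m.resource, m.name, m.value, m.timestamp.isoformat())
```

Once both sources land in one format, the rest of your tooling (alerting, dashboards, reports) no longer needs to care whether a number came from an agent or from the provider.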

Monitoring IaaS

Rarely will you get any data about the CSP’s underlying physical infrastructure, so you’re stuck with getting data on the VMs themselves. For the most part, this is what you really care about anyway. IT monitoring solutions have been tackling VM monitoring for years, so as long as your monitoring solution can handle major virtualization platforms like Xen, vSphere, Hyper-V, and KVM, you should be good to go. Whether you deploy an agent or go agent-less should be determined by your own cloud strategy and discussed with your IaaS provider, so that you arrive at a solution that gives you the types of monitoring data you need.

Monitoring the Cloud Management Platform

There are tons of cloud management platforms available, but being able to cover popular ones like OpenStack, CloudStack, AWS, Azure, CloudPlatform, OpenShift, and Cloud Foundry should be key in your cloud monitoring strategy. Make sure your monitoring solution can accommodate your cloud management platform of choice. The last thing you need is to add yet another monitoring solution to the mix just to cover your cloud strategy.

Monitoring the Application Itself

Identifying key applications that are hosted in or entirely provided through the service provider is critical. Knowing how the VM and platform are operating is great, but if the application or service on top of the platform goes down, you’d better know about it. Make sure you can adequately monitor any key applications that are cloud-based, but also know how those applications tie into your entire service delivery model.

What Kinds of Data Do I Need?

Two kinds of data should be tracked when it comes to cloud services:

  • Availability and overall performance/health: The idea is to be able to quickly answer these questions: Is the cloud application available, and how is it operating? This should be captured as an up/down status plus general statistics about the application’s operation: storage usage, memory usage, CPU usage, etc.
  • Experience of the application/service: This is a bit trickier, but the idea is to answer these questions: How is the application performing? What is the consumer of the application experiencing? This is normally determined by the time it takes to contact the application or service, the time it takes to get a response back, etc. It is mainly measured through a series of pings and synthetic transactions against the service, and it is critical to determining whether a cloud application or service is meeting its SLA.
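
Both kinds of data can come from the same simple probe. The Python sketch below times a synthetic transaction, records whether it succeeded, and then compares the collected results against SLA targets. It is an illustration under stated assumptions, not a monitoring product: CheckResult, run_check, http_get, availability_pct, and meets_sla are names made up for this example, and the thresholds are placeholders for whatever your contract actually specifies.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    ok: bool            # availability: did the transaction succeed?
    latency_s: float    # experience: how long did it take?


def run_check(transaction: Callable[[], None]) -> CheckResult:
    """Run one synthetic transaction; any exception counts as a failure."""
    start = time.perf_counter()
    try:
        transaction()
        ok = True
    except Exception:
        ok = False
    return CheckResult(ok=ok, latency_s=time.perf_counter() - start)


def http_get(url: str, timeout_s: float = 5.0) -> None:
    """Simplest possible transaction: fetch a URL (raises on error or timeout)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        response.read(1)


def availability_pct(results: List[CheckResult]) -> float:
    """Percentage of checks that succeeded."""
    if not results:
        raise ValueError("no check results to evaluate")
    return 100.0 * sum(r.ok for r in results) / len(results)


def meets_sla(results: List[CheckResult],
              min_availability_pct: float,
              max_avg_latency_s: float) -> bool:
    """True if availability and average latency of successful checks are in bounds."""
    successes = [r for r in results if r.ok]
    if not successes:
        return False
    avg_latency = sum(r.latency_s for r in successes) / len(successes)
    return (availability_pct(results) >= min_availability_pct
            and avg_latency <= max_avg_latency_s)


# Simulated history: three fast successes and one failed (timed-out) check.
history = [CheckResult(True, 0.2), CheckResult(True, 0.4),
           CheckResult(True, 0.3), CheckResult(False, 5.0)]
print(availability_pct(history), meets_sla(history, 99.9, 1.0))
```

In practice you would call something like run_check(lambda: http_get(url)) on a schedule, from a location that resembles where your users actually sit, and keep the results so you have your own record to compare against the provider’s SLA report.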

Uniting the Picture

Last, but not least: no cloud application really operates in isolation. It interacts with many other components in your organization to deliver a service. Knowing how the performance of the cloud application impacts the overall delivery of any IT or business service is critical. Monitoring often occurs in silos, and this is particularly true when the cloud is involved. You need to be able to unite all the monitoring data you have (on-premises, off-premises, and cloud-based) into a clear picture of how your IT as a whole is running and how any individual service is affecting your business. Without that clear picture, it is often difficult for IT (and the CEO) to relate esoteric stats about network performance or SLA compliance to the actual operation of the business itself.
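
One simple way to picture this roll-up is a map from each business service to the components it depends on, wherever they live. The Python sketch below is a hedged illustration: the Status levels, the service_status function, and the component and service names are all hypothetical, and the "worst component wins" rule is just one reasonable assumption among several.

```python
from enum import IntEnum
from typing import Dict, List


class Status(IntEnum):
    OK = 0
    DEGRADED = 1
    DOWN = 2


def service_status(components: List[str], readings: Dict[str, Status]) -> Status:
    """A service is only as healthy as its worst component.

    Assumption for this sketch: a component with no reading is treated as
    DOWN, because missing data is itself a monitoring problem.
    """
    return max((readings.get(c, Status.DOWN) for c in components),
               default=Status.OK)


# Latest status per component, gathered from cloud and on-premises monitoring.
readings = {
    "saas-crm": Status.OK,
    "iaas-vm-web": Status.DEGRADED,
    "onprem-db": Status.OK,
}

# Which components each business service depends on (hypothetical names).
services = {
    "online-store": ["iaas-vm-web", "onprem-db"],
    "sales": ["saas-crm"],
}

report = {name: service_status(parts, readings) for name, parts in services.items()}
for name, status in report.items():
    print(name, status.name)
```

A view like this is what lets you say “the online store is degraded because of a cloud-hosted web VM” instead of handing the business a list of raw metrics.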


There’s no doubt that cloud services are increasingly important to businesses everywhere. As organizations seek to operate with greater agility, they are transforming themselves by moving to the cloud. However, having a solid cloud monitoring approach is just as important as the cloud service itself. It’s critical that you know how you’ll monitor any service you outsource to or consume from the cloud, or you may find yourself experiencing the service outages and delays you’d hoped were a thing of the past.

