APM Tools A GUIDE FOR BUYERS
Buyers Guide
APMs are more important than ever for microservice-based architectures
BY JAKUB LEWKOWICZ

Application performance management (APM) solutions need to adapt now that the age of monolithic applications has evolved into microservice-based architectures, which are innately distributed and complex and therefore harder to monitor.

Collecting vast troves of data on how apps are performing is no longer enough, and APM providers have been adding new ways to analyze that data that will drive meaningful and hyperfast solutions to expose any bottlenecks or code dependencies. Whether that’s by adding AI, ML, new plugins or methods of monitoring, reliability and speed are on everyone’s mind.

“It’s not just enough to monitor specific isolated metrics because it’s not enough to just detect that something’s wrong. You need to act fast because the environment is fast. The end user reaction to degradation is catastrophic,” said Daniella Pontes, senior product marketing manager at InfluxData. “If you are in a big event day, you are talking about hundreds of thousands of dollars per minute or billions per day. So you can’t afford a degradation that cannot be quickly identified and, most importantly, fixed.”

In 2017, The Economist reported that the world’s most valuable resource is no longer oil, but data. But data in application monitoring isn’t effective if it can’t be analyzed, which makes it all the more crucial to have easy-to-use and intuitive monitoring to transform that data into outcomes, Pontes added.

Most commonly, teams use APM tools when they find out that their app is running slow, according to Denny LeCompte, general manager of application management at SolarWinds.

“You’re then trying to find out as rapidly as you can, is it the code? Is it the infrastructure? Is it the network? Is it the database? You’re trying to figure out where in the stack it is. If you can provide an application team a way to reduce the mean time to resolution or mean time to innocence, that’s it,” LeCompte explained.

APM solutions leverage data that is collected through API gateways, service mesh, business transaction tracking, log analytics and container APIs, both to determine the performance experienced by end users of an application and to measure computational resources, so teams can see whether there is adequate capacity to support a load and find potential bottlenecks.

Service mesh is a relatively new method that aids APM in microservices.

“Instead of using an API gateway which can be challenging, service meshes are a very new modern way that we can concentrate, be a proxy, and provide a point that all microservices can report to,” said Charley Rich, senior director analyst at Gartner. “And then a monitoring tool can inquire to the service mesh to capture the collection of data. So it can act as a collection point and you can help in terms of ease of deployment and potentially performance.”

Another trend is the use of OpenTracing. OpenTracing is a CNCF project that includes a set of vendor-neutral APIs and instrumentation used for distributed tracing.

“OpenTracing, census telemetry, service mesh and others need to be explored and utilized,” Rich said. “We’re moving from an era where the monitoring solutions go out and collect the data they need to an area where the infrastructure and applications are reporting back that information.”

Another major change is in who uses APM tools within an organization: usage has moved more toward developers, according to LeCompte.

“Ten years ago the app dev guys would not have cared. That was not their problem. Whereas now, they’re definitely more involved and when there is a problem, they are more likely to go into the tool and expect the monitor tool to help them understand,” LeCompte said. “It’s getting to the
point where any sort of application team would feel naked without a tool to provide them with visibility.”

Meanwhile, Pontes said APM solutions have evolved to a point where all parts of a team are using them. The developers are using APM to understand how fragmented code performs before moving forward with it in the production environment. The CI/CD teams are using it to understand what kind of impact a change can have, and IT teams are using it to make sure everything stays as it should.

What used to be one slowly changing monolith is now all of a sudden dozens of quickly changing microservices that get changed on a weekly or even daily cadence, according to Ivo Mägi, CEO of Plumbr.

“Every change is risky by nature so you need to keep a closer eye on your microservices-based architecture because errors are just more likely to happen in situations where you have really agile release cadences,” Mägi said.

He added that APM helps users with availability metrics so that whenever those metrics drop below tolerable levels, the teams are aware of the issues emerging. Another important aspect is distributed tracing throughout all the microservices in the back end, which allows one to zoom in to the exact service that is failing and, better yet, to the single line of source code that is failing in that service. These functionalities cut down the time to resolution for every incident.

“Technical monitoring solutions like APMs are similar to sport watches in the sense that through some sensors they gather data and turn it into information. It would be like monitoring the heart rate or steps done during the day. Now if I just see that I did 3000 steps during the day, I don’t know whether I just broke the world record or I am the laziest guy in the world. I actually haven’t changed my habits nor really gained anything. It’s just a distraction after a while,” Mägi explained.

“But if I know that 10,000 steps a day keeps the doctor away and that coupling this with an actual action and doing the remaining 7,000 steps, I have gained quality in my life. And to me this is really similar to what APMs are able to do. If you understand how and why performance and availability can impact your business and know when to respond then you can actually have a significant impact on your business.”

However, despite all of its benefits, creating an effective APM solution comes with a set of challenges. According to Rich, the biggest challenge when monitoring microservices is their ephemerality, and APM vendors have to adapt to work with it.

“Usually agents for most cases are specific, so that’s problematic for a lot of vendors. To package agents in the containers, I need to know in advance what’s going to go into a container image. That’s a lot of work. And it also makes me more static when I’m trying to be agile,” Rich said. “They’re just there for moments, then gone and somewhere else, which makes monitoring challenging. That’s different from the traditional approaches to monitoring within an enterprise in a cloud.”

Another challenge, according to a Gartner report, is that many organizations don’t provide production visibility for the application development and DevOps teams that build microservice-based applications, resulting in an isolation from the IT teams that are responsible for operational deployment.

To fix these problems, Gartner recommends companies adopt a coordinated monitoring strategy between operations, developers and DevOps teams, enabling service discovery by using the API gateway layer, leveraging service mesh and maintaining up-to-date service metrics.

Rich said companies that are undergoing digital transformation are the primary candidates for using APM solutions. Mode 2 applications that emphasize agility and speed need to be monitored the most because these are the ones that change frequently. Sometimes changes occur several times a day; therefore, protecting the moneymaking applications is most critical.

“Anything that’s built now really does need some sort of APM. I don’t really think there’s an application in modern times that doesn’t do better with some level of monitoring,” SolarWinds’ LeCompte said. “Lots of customers only monitor the most mission-critical things, but if you built it and it’s running part of your business, then if you’re not monitoring it, you’re just going to be surprised.”

LeCompte said this includes things many people would not immediately regard as an application, such as websites. Yet web dev and web operations teams are constantly monitoring how different users perceive them.

He added that users expect an APM solution to work out of the box and to automate agent deployment.

“Customers don’t want to have to spend weeks rolling this thing out. We do not think that a modern product should require some third party to go spend a bunch of time and money to make it work. It should all be a sort of automatic out of the box,” LeCompte said.

Increasing automation to keep up with continuous deployment
In order to keep up with the rapid pace of monitoring, many APM solutions are adding AI and ML capabilities. Manual APMs are no longer equipped to deal with the dynamism and the scale that microservices require, said Pontes.

“You need to feed the data into artificial intelligence and machine learning frameworks to start automating certain aspects of the workflow. Because the human factor is actually the bottleneck,” Pontes said.

These machine learning additions perform correlation and analysis to reduce the volume of alerts and prevent alert storms, cut false alarms, detect anomalies and unusual values, correlate them, and predict the potential impact, Rich added.

“Machine learning has been embedded in many APM solutions, not necessarily to do anything new but to do what they did before much better,” Rich said. ❚

How these companies can help you monitor your applications

Daniella Pontes, senior product marketing manager, InfluxData
InfluxDB time series database offers a platform where today’s highly complex application environments (ephemeral containers, distributed apps in hybrid and multi-cloud, mobile applications, expanding APIs and such) can be effectively monitored. InfluxDB provides a scalable time series store that ingests data types (application metrics, logs, tracing and more) together with a real-time analytical engine that can process these data in complex ways. InfluxDB’s data enrichment and advanced operations take high-volume raw data from multiple sources and deliver information as needed to be presented to the various audiences in order to be actionable. When early signs of degradation or anomalies are detected in important KPIs monitored with the InfluxDB platform, diagnostic analysis comes into play, and integration with an existing bytecode APM solution could provide quick access to what could be a root cause in the application code. This way customers can use application performance monitoring and instrumentation in a more effective, focused and manageable manner.

Ivo Mägi, CEO of Plumbr
Most organizations today face operating in DevOps, dynamic infrastructure or microservices environments. Against these institutional changes most monitoring tools fail by just working in the background to protect brands against outages in service. An APM fit for the era of digital transformation, though, guides you in proactively helping improve app performance with a focus on increasing user engagement and transactions, as well as ensuring that engineers are adding value rather than fighting fires. Plumbr APM simplifies the act of discovering, verifying, fixing and actively preventing issues through four key features:
• Actionable alerts based on user data.
• Distributed tracing for each and every request in your stack.
• Root-cause detection that, unlike other APMs, dives to the exact line of code that needs fixing.
• Impact analysis to show where the biggest ROI lies for connected business outcomes.
This allows users to trace every user interaction done in the UI throughout distributed traces in the back end monitored by the APM, down to the bottleneck or error that’s actually hindering the user experience, and then to rank those bottlenecks and errors based on the impact they have on users.

Denny LeCompte, general manager, application management, SolarWinds
The main thing is that what we do see customers looking for is an integrated solution. So we think that the advantage that we’ve got is that we have best-of-breed products. They were all built standalone to either solve performance, logging or APM. And then we’re bringing them together so that they can work in a tightly integrated fashion so that they’re all mature enough. What we try to do is not necessarily have the most features or the lowest price. What we shoot for is for every product to be the best value in the market so we have all the features that almost everybody needs. One of the things we want to bring is an affordability element to the whole equation that lets you either save money or stretch your monitoring farther so that you can monitor more of your vertical apps. We could see that a lot of customers didn’t want to have to go and instrument an application to be monitored. They really just want to use it out of the box. That’s been important for all of our SolarWinds products. We do not think that a monitoring product should require some third party to go spend a bunch of time and money to make it work. It should all be a sort of automatic out of the box. The SolarWinds APM Suite includes Pingdom, AppOptics and Loggly, which combine user experience monitoring with custom metrics, code analysis, distributed tracing, log analytics and log management. ❚

A guide to APM tools

FEATURED PROVIDERS
■ InfluxData: APM can be performed using InfluxData’s platform InfluxDB. InfluxDB is a purpose-built time series database, real-time analytics engine and visualization pane. It is a central platform where all metrics, events, logs and tracing data can be integrated and centrally monitored. InfluxDB also comes built-in with Flux: a scripting and query language for complex operations across measurements.
■ Plumbr: Plumbr is a modern monitoring solution designed to be used in microservice-ready environments. Using Plumbr, engineering teams can govern microservice application quality by using data from web application performance monitoring. Plumbr unifies the data from infrastructure, applications, and clients to expose the experience of a user. This makes it possible to discover, verify, fix and prevent issues. Plumbr puts engineering-driven organizations firmly on the path to providing a faster and more reliable digital experience for their users.
■ SolarWinds: The SolarWinds APM Suite — Pingdom, AppOptics, and Loggly — combines user experience monitoring with custom metrics, code analysis, distributed tracing, log analytics, and log management to provide proactive visibility into modern applications. All major types of data are collected, including logs, traces, metrics, and both synthetic and real end-user experience data, enabling proactive problem avoidance and rapid root cause troubleshooting. The suite works across all major application development architectures: monolithic, n-tier SOA, and microservices.

■ AppDynamics: The AppDynamics Application Intelligence Platform provides a real-time, end-to-end view of application performance and its impact on digital customer experience, from end-user devices through the back-end ecosystem — lines of code, infrastructure, user sessions and business transactions. The platform was built to handle the most complex, heterogeneous, distributed application environments; to support rapid identification and resolution of application issues before they impact users; and to deliver real-time insights into the correlation between application and business performance.
■ Catchpoint Systems: Catchpoint offers innovative, real-time analytics across its Synthetic Monitoring and Real User Measurement (RUM) tools. Both solutions work in tandem to give a clear assessment of performance, with Synthetic allowing testing from outside of data centers with expansive global nodes, and RUM allowing a clearer view of end-user experiences.
■ Dynatrace provides software intelligence to simplify enterprise cloud complexity and accelerate digital transformation. With AI and complete automation, our all-in-one platform provides answers, not just data, about the performance of applications, the underlying infrastructure and the experience of all users. We help companies mature existing enterprise processes from CI to CD to DevOps, and bridge the gap from DevOps to hybrid-to-native AIOps.
■ Instana is a fully automatic APM solution that makes it easy to visualize and manage the performance of your business applications and services. The only APM solution built specifically for cloud-native microservice architectures, Instana leverages automation and AI to deliver immediate actionable information to DevOps. For developers, Instana’s AutoTrace technology automatically captures context, mapping all your applications and microservices without continuous additional engineering.
■ LightStep’s mission is to deliver insights that put organizations back in control of their complex software applications. Its first product, LightStep [x]PM, is reinventing application performance management. It provides an accurate, detailed snapshot of the entire software system at any point in time, enabling organizations to identify bottlenecks and resolve incidents rapidly.
■ New Relic: New Relic’s comprehensive SaaS-based New Relic Software Analytics Cloud provides a single powerful platform to get answers about application performance, customer experience, and business success for web, mobile and back-end applications. New Relic delivers code-level visibility for applications in production across six languages — Java, .NET, Ruby, Python, PHP and Node.js — and supports more than 70 frameworks. New Relic Insights is embedded in the platform, enabling customers to do detailed, ad hoc queries for real-time analytics across New Relic’s APM, Mobile, Browser and Synthetics products.
■ Oracle: Oracle provides a complete end-to-end application performance management solution for custom and Oracle applications. Oracle Enterprise Manager is designed for both cloud and on-premises deployments; it isolates and diagnoses problems fast, and reduces downtime, providing end-to-end visibility through real user monitoring; log monitoring; synthetic transaction monitoring; business transaction management and business metrics.
■ OverOps captures code-level insight about application quality in real-time to help DevOps teams deliver reliable software. Operating in any environment, OverOps employs both static and dynamic code analysis to collect unique data about every error and exception — both caught and uncaught — as well as performance slowdowns. This deep visibility into an application’s functional quality not only helps developers more effectively identify the true root cause of an issue, but also empowers ITOps to detect anomalies and improve overall reliability.
■ Pepperdata: With proven products, operational experience, and deep expertise, Pepperdata provides enterprises with predictable performance, empowered users, managed costs and managed growth for their big data investments, both on-premise and in the cloud. Pepperdata enables enterprises to manage and improve the performance of their big data infrastructures by troubleshooting problems, maximizing cluster utilization, and enforcing policies to support multi-tenancy.
■ Riverbed recognizes the need to maximize digital performance and is uniquely positioned to provide organizations with a Digital Performance Platform that delivers superior digital experiences and accelerates performance, allowing our customers to rethink what is possible. Riverbed application performance solutions provide superior levels of visibility into cloud-native applications — from end users, to microservices, to containers, to infrastructure — to help you dramatically accelerate the application lifecycle from DevOps through production.
■ SmartBear: AlertSite’s global network of more than 340 monitoring nodes helps monitor availability and performance of applications and APIs, and find issues before they hit end consumers. The Web transaction recorder DejaClick helps record complex user transactions and turn them into monitors, without requiring any coding.
■ SOASTA: The SOASTA platform enables digital business owners to gain continuous performance insights into their real-user experience on mobile and Web devices — in real time and at scale. ❚

SD Times, November 2019, www.sdtimes.com