
Are your metrics right for a remote workforce?

BY GAURANG TORVEKAR

So much of what we do at work has to be measured. There is a sense that if something cannot be measured, it might not really exist. Certainly, if a project or function cannot demonstrate how it is being measured in a clear, understandable manner, its ability to secure approval or signoff is dramatically reduced.

Metrics, key performance indicators, objectives and key results (OKRs), being able to measure progress — it all links back to a need within organisations to ultimately quantify return on investment. When we all worked in one place, most metrics were tied to outputs — achieve sales targets, ship code, maintain a positive net promoter score.

Changing environments demand new metrics

But how has measurement changed in the last year? Does it take into account the challenges and opportunities that come with remote working? As Dan Montgomery, the founder and managing director of Agile Strategies, said, the current situation “is a great opportunity to get better at managing people around outcomes rather than tasks or, worse yet, punching a virtual clock to prove they’re working. Many employees working from home genuinely have big challenges, including bored kids, sick relatives and an unending stream of bad news. They need the flexibility right now and will appreciate your trust in them.”

Having that flexibility is particularly critical in uncertain times. “Now more than ever, the goals that we’re setting are so critical for us to be able to navigate what happens next,” said Ryan Panchadsaram, co-founder and head coach of What Matters.

Defining a clear vision

A clear vision is the basis for aligning targets and objectives throughout the business. It doesn’t matter whether you’re a start-up, a scale-up or an established sector leader; without a goal at the company level, you’re lost. Chris Newton, VP of Engineering at Immersive Labs, calls this “Vision — it all needs to have a really clear, inspiring, well understood company vision that is really guiding every department in the business. Not just product and tech, but you’re talking about the whole wider business. There has to be a direction, a clear direction for the company.”

Chris was speaking as part of a recent Indorse Engineering Leaders panel discussion. Once you have that big vision, he says:

“Underpinning that is going to be the product and tech side of things. You will have your product vision: ‘what are we trying to achieve for our customers through the product?’ Then you have the engineering vision that underpins the product vision. It is complementary to the product vision, and it supports it. The engineering vision & strategy lines up to delivering the best outcomes for customers through the product vision.”

It is only once that big picture is in place that a business can start to work out how it is going to get there.

The right framework for transparency and function

Chris was particularly keen on Objectives and Key Results, or OKRs. “Objectives framework, such as OKRs, can be a really powerful tool in terms of getting that prioritization and alignment right. It’s great to make a clear and visible link between what software engineers and managers are doing on the ground and how that then ties back up to top-level objectives.”

What this brings to an organisation is transparency in goal setting. Everyone, from senior executives down to team members, is clear on how objectives are created and how what they do helps drive results.
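One way to picture that “clear and visible link” is as a tree in which every team objective points to the higher-level objective it supports, so any piece of work can be traced back to the company vision. The sketch below assumes exactly that structure; the `Objective` class, its fields and the example objectives are hypothetical illustrations, not the data model of any particular OKR tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Objective:
    """One objective in a hypothetical OKR hierarchy."""
    title: str
    owner: str
    key_results: list[str] = field(default_factory=list)
    parent: Optional[Objective] = None  # the higher-level objective this one supports

    def alignment_path(self) -> list[str]:
        """Walk up the parent links, returning 'owner: title' from this objective to the top."""
        path: list[str] = []
        node: Optional[Objective] = self
        while node is not None:
            path.append(f"{node.owner}: {node.title}")
            node = node.parent
        return path


# Illustrative cascade: company vision -> product objective -> engineering team objective.
company = Objective("Become customers' most trusted platform", "Company")
product = Objective("Make onboarding self-serve", "Product", parent=company,
                    key_results=["80% of sign-ups complete setup unaided"])
team = Objective("Cut signup API latency", "Platform team", parent=product,
                 key_results=["p95 signup latency under 300 ms"])

for step in team.alignment_path():
    print(step)
```

Printing the path for the platform team lists its own objective first, then the product objective, then the company objective, which is the transparency the panellists describe: anyone can see which top-level goal a team's key results serve.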

Having that process is essential to deciding what action to take. As another panellist, Nik Gupta, Software Development Manager at Amazon, highlighted, getting the basics right is critical. Nik and his team “spend about two months just getting our metrics right. Literally, just figuring out what are the right metrics we should track worldwide — are they instrumented, are they reliable, and how would we validate them, etc. It is absolutely essential to get that framework built before you start delving into ‘what projects are we going to do and why.’”
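Gupta’s three questions (is a metric instrumented, is it reliable, can it be validated) lend themselves to automated checks. The sketch below is one illustrative interpretation, not Amazon’s process: it assumes each metric is a time series with `None` where data failed to arrive, and that an independent reference source exists for validation. The 5% missing-data budget and 2% agreement tolerance are arbitrary assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MetricCheck:
    instrumented: bool  # does the metric emit any data at all?
    reliable: bool      # is the share of missing data points within budget?
    validated: bool     # does it agree with an independent source?


def check_metric(values: Sequence[Optional[float]],
                 reference: Sequence[Optional[float]],
                 max_missing: float = 0.05,
                 tolerance: float = 0.02) -> MetricCheck:
    """Run three illustrative checks on one metric's time series.

    values    -- the metric as emitted by the system, None where no data arrived
    reference -- the same quantity from an independent source, aligned by period
    """
    present = [v for v in values if v is not None]
    instrumented = len(present) > 0

    # Reliable: no more than max_missing (a fraction) of expected points are absent.
    missing = len(values) - len(present)
    reliable = instrumented and missing <= max_missing * len(values)

    # Validated: over periods where both sources have data, totals agree within tolerance.
    pairs = [(v, r) for v, r in zip(values, reference) if v is not None and r is not None]
    ours = sum(v for v, _ in pairs)
    theirs = sum(r for _, r in pairs)
    validated = bool(pairs) and theirs != 0 and abs(ours - theirs) / abs(theirs) <= tolerance

    return MetricCheck(instrumented, reliable, validated)


# Example: a metric with two of four data points missing fails the reliability check.
print(check_metric([100.0, None, None, 100.0], [100.0, 100.0, 100.0, 100.0]))
```

Keeping the three checks separate mirrors the order of the questions: a metric that is not instrumented cannot be reliable, and a reliable metric can still disagree with the source of truth.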

What that looks like will vary, and it can be easier for some functions than for others, as Smruti Patel, another panellist, highlighted. As Head of LEAP and Data Platform at Stripe, she has found that the former is easier to measure than the latter. For LEAP, “the metrics here are obviously more tangible. It’s easier to measure how much you’re spending on your infrastructure or how much time the customer sees when they make a request.”

However, on the data infrastructure side, “some of the inherent qualities or principles from the platform that the internal users require are security, reliability, availability, and leverage, in terms of product enablement, which then enables Stripe’s users. Here, identifying the right set of metrics for infrastructure kind of work has been a challenge.”

To solve this, Smruti and her team were looking at how learnings from LEAP could be applied to Data Platform.

Prepare for change

However, while it is important to be clear on what you should measure, being too rigid once metrics are defined is counterproductive. Panchadsaram pointed out that “OKRs were never meant to be these rigid rails, they were meant to be a tool for your teams to collectively commit to something.”

In a blog post for O’Reilly.com, former Rent the Runway CTO Camille Fournier echoed this sentiment when she said “measurement needs to be focused on the right goals for right now, and you should expect that what you measure will change frequently as the state of systems and the business changes.”

That can only be achieved when metrics are aligned throughout the organisation.

Put simply, for metrics to be relevant in the current climate, they need to be aligned with a company vision that is then cascaded down the organisation. The process needs to be rigorous enough to inform the work teams do, but it also needs to be flexible. At a time when the situation changes almost daily, it is the only way organisations operating with remote teams are going to develop metrics that are beneficial to the business.

Gaurang Torvekar is the founder of Indorse, a provider of engineering metrics solutions.

BY CHRISTINA CARDOZA AND DAVID RUBINSTEIN

It seems like the industry is leaving application performance management (APM) behind and moving towards a new observability world. But don’t be fooled. While vendors are rebranding themselves as observability tools, APM is still an important piece of the puzzle.

“Observability is becoming a bigger focus today, but APM just by design will continue to have a critical role to play in that. Think about observability holistically, but also understand that your applications, your user-facing applications and your back-end applications are driving revenue,” said Mohan Kompella, vice president of product marketing at the IT Ops event correlation and automation platform provider BigPanda.

Because of the complexity of modern applications that rely on outside services through APIs and comprise microservices running in cloud-native environments, simply monitoring applications in the traditional way doesn’t cover all the possible problems users of those applications might experience.

“What’s important,” explained Amy Feldman, head of AIOps product marketing at Broadcom, “is to be able to take a look at data from various different aspects, to be able to look at it from the traditional bytecode instrumentation, which is going to give you that deep-level transactionability back into even legacy systems like mainframe or TIBCO, or even an MQ message bus that a lot of enterprises still rely on.”

Further, as more applications are running in the cloud, Feldman said she

www.sdtimes.com July 2021 SD Times 25 Buyers Guide

ing to change the landscape” of what monitoring looks like, and they want to be able to have more control over what the output looks like. “So they’re relying more on logs and relying more on configuring it through APIs,” she said. “We want to be able to move from this [mindset of] ‘I’m just telling you what to collect from an industry and vendor perspective,’ to having the business be more in charge about what to collect. ‘This is the output, I want you to measure it, look at all the data and be able to assimilate that into that entire topological view.’”

APM, observability or AI Ops?

Kompella explained there’s a lot of confusion in the market today because, as vendors add more and more monitoring capabilities into their solutions, APM is being blended into observability suites. Vendors are now offering “all-in-one” solutions that provide everything from APM to infrastructure, logging, browser and mobile capabilities. This makes it even harder for businesses to find the solution that works best for them: although vendors claim to provide everything you need to get a deep level of visibility, each tool addresses specific concerns.

“Every vendor has certain areas within observability they do exceedingly well and you have to be really clear about the problem you’re trying to solve before making a vendor selection. You don’t want to end up with a suite that claims to do everything, but only gives you mediocre results in the one area you really care about,” Kompella said.

When looking to invest in a new observability tool, businesses and development teams need to ask themselves which specific areas or technologies they want to monitor and where those are located. Are they on-premises or are they in the cloud? “That is a good starting point because it helps you understand if you need an application monitoring tool that’s built for microservices monitoring and therefore in the cloud, or if you still have a large number of on-premise Java-based applications,” Kompella explained.

The trouble with alerts

Alarms are a critical way to inform organizations of performance breakdowns. But alarm overload, and the number of false positives these systems kick off, has been a big pain point for those responsible for monitoring their application systems.

Amy Feldman, head of AIOps product marketing at Broadcom, said this problem has existed since the beginning of monitoring. “This is a problem we’ve been trying to solve for at least 20 years, 20-plus years … we’ve always had a sea of alarms,” she said. “There have always been tickets where you’re not sure where the root cause is coming from. There’s been lengthy war rooms, where customers and IT shops spend hours trying to figure out where the problem is coming from.”

Feldman believes the industry is now at a point where sophisticated solutions that apply new algorithmic approaches to datasets give organizations the ability to understand dependencies across an infrastructure network. Then, using causal pattern analysis, they can understand the cause and effect of recurring patterns and determine where the root cause is coming from.

“I think we’re at a really exciting point now, in our industry, where those challenges that we’ve always seen for the last 20 years are something that we truly can accomplish today,” she said. “We can reduce the noise inside of the event stream to be able to show what really has the biggest impact on your business and your end users. We’re able to correlate the data to be able to recognize and understand patterns. ‘I’ve seen this before, therefore, this problem is a recurring problem, this is how you fix the problem.’”

AI and ML are key, Feldman said. “I think APM was probably one of the first industries to kind of adopt that. But now we’re seeing that evolution of where it’s taking off across multiple data sets, whether that’s cloud observability data sets, networking data sets, APM data sets, even mainframe and queuing type information; all of that now is getting normalized and then used [to improve] your experience too. So all the information now coming together is giving us a great opportunity.”

— David Rubinstein
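Feldman’s two ideas about alarm overload, cutting noise by correlating events and using dependencies to locate the root cause, can be illustrated with a deliberately simple stand-in. The sketch groups alerts that arrive close together into one incident, then uses a service dependency map to flag alerting services whose own dependencies are healthy. This is not Broadcom’s or BigPanda’s method (their causal pattern analysis and ML models are far more involved); the `Alert` type, the five-minute window and the dependency map are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str
    timestamp: float  # seconds since epoch
    message: str


def correlate(alerts: list[Alert], window: float = 300.0) -> list[list[Alert]]:
    """Group alerts into incidents: an alert joins the current incident if it
    arrives within `window` seconds of that incident's most recent alert."""
    incidents: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if incidents and alert.timestamp - incidents[-1][-1].timestamp <= window:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents


def probable_roots(incident: list[Alert], depends_on: dict[str, set[str]]) -> set[str]:
    """Return alerting services with no alerting (transitive) dependency.

    If checkout depends on payments and both are alerting, payments is the more
    likely cause, so checkout is treated as a symptom rather than a root."""
    alerting = {a.service for a in incident}

    def upstream(service: str, seen: set[str]) -> set[str]:
        # Depth-first walk of everything `service` depends on, directly or indirectly.
        for dep in depends_on.get(service, set()):
            if dep not in seen:
                seen.add(dep)
                upstream(dep, seen)
        return seen

    return {s for s in alerting if not ((upstream(s, set()) - {s}) & alerting)}


# Hypothetical topology and alert stream.
depends_on = {"checkout": {"payments"}, "payments": {"db"}}
alerts = [
    Alert("checkout", 0, "5xx rate high"),
    Alert("payments", 30, "p95 latency high"),
    Alert("db", 45, "connection pool exhausted"),
    Alert("search", 4000, "disk 90% full"),
]
incidents = correlate(alerts)
for incident in incidents:
    print(len(incident), "alerts -> probable root:", probable_roots(incident, depends_on))
```

Here four raw alerts collapse into two incidents, and the first incident points at the database rather than at the three services that happened to page. A fixed time window is the crudest form of correlation; production systems typically also use topology, alert text and history to decide what belongs together.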

Monitoring applications in the cloud relies heavily on the providers giving you the data you need. Feldman said cloud providers could give you information through an API, or deliver it through their monitoring tool. The APM solution has to be able to assimilate that information too.

While Feldman said the cloud providers haven’t always provided all the data needed for monitoring, she believes they’re getting better at it. “There’s definitely an opportunity for improvement. And in a lot of areas, you do see APM vendors also provide their own way to instrument the cloud... being able to install an agent inside of the cloud service, to be able to give you additional metrics,” she said. “But we’re seeing, I think, a little bit more transparency than we had before in the past. And that’s because they have to be able to provide that level of service. And being able to have that trend, a little bit of transparency, helps to increase communications between the service and the provider.”

BigPanda wind” and decide to measure whichever way the wind blows. You really have to understand your systems to figure out what metrics are going to matter to you. One way to do that is by analyzing what is generating revenue. Kompella went on to explain that you have to look at where you’ve had outages or incidents in the last couple of months, how they’ve impacted your revenue and rating, and then that will lead you to the right type of APM or observability tools that can help you solve those problems.
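Kompella’s suggestion to start from recent incidents and their revenue impact can be roughed out as a simple ranking. A minimal sketch, assuming you have an incident log with downtime per service and an estimated revenue-per-minute figure for each service (both hypothetical inputs). Multiplying downtime by a revenue rate is a crude proxy that ignores partial degradation and reputational cost, but it is enough to show which services deserve monitoring investment first.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Incident:
    service: str
    minutes_down: float


def rank_by_revenue_impact(incidents: list[Incident],
                           revenue_per_minute: dict[str, float]) -> list[tuple[str, float]]:
    """Estimate revenue lost per service (downtime x revenue rate), largest first.

    Services missing from revenue_per_minute are counted with a rate of 0."""
    lost = defaultdict(float)
    for inc in incidents:
        lost[inc.service] += inc.minutes_down * revenue_per_minute.get(inc.service, 0.0)
    return sorted(lost.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two months of incidents and revenue rates.
incidents = [
    Incident("checkout", 42),
    Incident("search", 120),
    Incident("checkout", 18),
    Incident("recommendations", 300),
]
rates = {"checkout": 900.0, "search": 150.0, "recommendations": 20.0}
for service, amount in rank_by_revenue_impact(incidents, rates):
    print(f"{service}: ~${amount:,.0f} lost")
```

In this made-up data, recommendations had the most downtime but checkout cost the most, which is the kind of distinction that should steer tool selection.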

Additionally, businesses need to look at their services in light of how their technology stack will evolve. For instance, a majority of their applications may be on-premises today, but the company might plan to migrate everything to the cloud over the next three years. “You want to make sure that whatever investments you make in APM tools are able to provide you the deep visibility your team needs. You don’t want to end up with a legacy tool that solves your existing problems, but then starts to break down over the next few years,” said Kompella. “Technology leaders should judiciously analyze both what’s in the bag today versus what’s going to happen in the next few years, and then make a choice.”

Have you ever said these words?

“We monitor everything, we SEE nothing.”

“Eight different APM tools and we still can't see an incident coming?”

Managing today’s digital services is a significant challenge given the scale, velocity and fragmentation of modern technology environments. Human teams simply can’t keep up with the volume of alerts and manual work required to maintain an extraordinary digital experience for customers. If you have multiple observability and monitoring tools, and still lack situational awareness, here’s how we help:

• BigPanda delivers domain-agnostic AIOps. We consolidate data regardless of which tools and vendors you work with. Read what Gartner has to say about domain-agnostic AIOps at my.bigpanda.io/GartnerMarketGuide.

• BigPanda correlates and enriches diverse data into insights. With AIOps-powered event correlation and enrichment, we eliminate alert noise and give your experts highly contextualized incidents. They pinpoint root cause and get through triage faster.

• BigPanda improves collaboration with automation. Let us automatically push details about the profiled incident to your ticketing, chat and notification tools so you can eliminate manual handoffs and work on resolving the issue.

Prevent and resolve outages. Visit www.bigpanda.io

How does your solution help teams manage monitoring?

Mohan Kompella, vice president of product marketing at BigPanda:

There are two main ways we help. For large companies that have multiple observability tools, multiple monitoring tools and multiple APM tools, which is basically the majority of the market, BigPanda comes in and unifies all of those fragmented domains and the teams using those siloed products. The number one reason companies choose us is that we are vendor agnostic and domain agnostic; we sit in the middle and unify all these APM tools and vendors.

Secondly, we help with incident management — how you prevent and resolve outages. While APM and observability tools are fantastic at providing the deep, deep visibility businesses need, that forensic data doesn’t become important until later in the process. Teams need a smart detector to connect the dots and find probable causes or culprits, and then they can get into the forensics more.

When you have an outage or a massive incident that is crippling to your users or system, BigPanda connects all the dots, connects all the signals together and says here is the problem and here is what we think is causing it. BigPanda excels at identifying that probable root cause, and then your APM experts can come in and dive deeper into the issue. BigPanda sits in front, handling detection and root cause identification, and the APM and observability tools can come in to surface the data and resolve the problem.

Amy Feldman, head of AIOps product marketing, Broadcom:

Broadcom's AIOps solution is based on open source, allowing it to be an open, agnostic platform, easily integrating various data sets such as metrics, logs, wire, performance, transactional and user experience. A differentiator is that the solution looks at time, text, topology and training in order to get to the root cause of the performance problem. Our APM plugs into our AIOps platform for increased observability.

We analyze data based on those four spectrums — time, text, topology and training. There's not one single approach that solves all problems; you have to look at it from different angles, and at all the pieces. And because the platform is open and agnostic, we can then incorporate all different kinds of data, which gives you that extra observability, because the more data that you have across the entire landscape, the better insights you can get out of it.

There is business-related data, user experience data, APM data, OpenTracing information, network data, and third-party data as well. We treat this data as if it was a first-class citizen, so it becomes part of the topology, incorporated into the data models, and incorporated into the platform itself. So that gives you that greater visibility you need to be able to deliver business outcomes.

AIOps from Broadcom includes our full-stack monitoring capabilities — APM, user experience, networking infrastructure, along with AI and ML reducing alarm noise, providing root cause analysis tied with intelligent automation to resolve issues quickly and improve customer experience.

What is to come?

Monitoring strategies are becoming so important because the pressure for digital transformation is that much greater today. A recent report from management consulting company McKinsey & Company found the COVID-19 crisis has accelerated digital transformation efforts by seven years.

“During the pandemic, consumers have moved dramatically toward online channels, and companies and industries have responded in turn. The survey results confirm the rapid shift toward interacting with customers through digital channels. They also show that rates of adoption are years ahead of where they were when previous surveys were conducted, ” the report stated.

This means that the pressure to move or migrate to the cloud quickly is that much greater, according to Mohan Kompella, vice president of product marketing at BigPanda, and as a result APM solutions have to be built for the cloud.

“Enterprises can no longer afford to look for APM tools or observability tools that just don’t work in a cloud-native environment, ” he said.

Kompella also sees more intelligent APM capabilities coming out to meet today’s needs to move to the cloud or digitally transform. He went on to explain that APM capabilities are becoming very commoditized, so the differences between vendors are getting smaller and smaller. “Getting deep visibility into your applications has been largely solved by now. Companies need something to make sense of this tsunami of APM and observability data, ” he said.

The focus is now shifting to bringing artificial intelligence and machine learning into these tools to make sense of all the data. “The better the AI or the machine learning is at generating these insights, the better it is at helping users understand how they’re generating these insights, ” said Kompella.

“Every large company has similar problems, but when you start to dive in deeper, you realize that every company’s IT stack is set up a little bit differently. You absolutely need to be able to factor in that understanding of your unique topology in your unique IT stack into these machine learning models,” said Kompella.

— Christina Cardoza

A guide to APM tools

FEATURED PROVIDERS

• BigPanda: BigPanda is an event correlation and automation platform powered by AIOps to help IT operations, network operations, DevOps and SRE teams detect, prevent and resolve outages. The platform prevents incidents from escalating into outages, enables rapid incident and outage resolution with automated root cause analysis, and automates manual tasks to speed up incident response.

• Broadcom: Broadcom DX Application Performance Management, part of the AIOps Platform from Broadcom, delivers mobile-to-mainframe observability for user behavior, performance analysis, and code-level diagnostics along with easy-to-use workflows and dashboards to understand the health of any multi-cloud app. The solution provides advanced analytics based on time, text, topology, and training, so you can pinpoint and resolve performance issues quickly and ensure that every user transaction becomes a loyalty-building interaction.

• Akamai provides application performance management as part of its Ion solution, a suite of intelligent performance optimizations and controls for delivering high-quality web, iOS and Android app experiences. The solution continuously monitors real user behavior and adapts in real time to context, user behavior and connectivity changes.

• AppDynamics by Cisco is an APM provider that gives customers information on user experience. Its Experience Journey Mapping feature tracks the application paths most common among users and evaluates their performance, enabling customers to see how users are interacting with their app. Companies can use AppDynamics to optimize customer journeys across devices and quickly identify any issues.

• Amazon CloudWatch is an application and infrastructure monitoring solution built for DevOps engineers, developers, SREs and IT managers. It provides data and actionable insights to monitor apps, respond to performance changes, optimize resource utilization, and get a unified view of operational health.

• Catchpoint is the enterprise-proven ally that empowers teams with the visibility and insight required to deliver on the digital experience demands of customers and employees. With its combined true synthetic, real user, network, and endpoint monitoring capabilities and the largest, most diverse global monitoring network in the industry, Catchpoint delivers in-depth, accurate, and full-stack performance insights.

• Datadog APM provides end-to-end distributed tracing at scale, from front-end devices to databases. Users can monitor service dependencies, reduce latency, and eliminate errors for the best possible user experience.

• Dynatrace provides software intelligence to simplify enterprise cloud complexity and accelerate digital transformation. With AI and complete automation, its all-in-one platform provides answers, not just data, about the performance of applications, the underlying infrastructure and the experience of all users.

• InfluxData: APM can be performed using InfluxData’s platform InfluxDB. InfluxDB is a purpose-built time series database, real-time analytics engine and visualization pane. It is a central platform where all metrics, events, logs and tracing data can be integrated and centrally monitored.

• Instana is a fully automatic APM solution that makes it easy to visualize and manage the performance of your business applications and services. The only APM solution built specifically for cloud-native microservice architectures, Instana leverages automation and AI to deliver immediate actionable information to DevOps.

• LaunchDarkly is a feature management platform that empowers all teams to safely deliver and control software through feature flags. By separating code deployments from feature releases, LaunchDarkly enables you to deploy faster, reduce risk, and iterate continuously. LaunchDarkly integrates with several observability and APM solutions such as AppDynamics, Datadog, Dynatrace, Honeycomb, New Relic, and SignalFX. These integrations help measure how each feature affects key service metrics such as response times and error rates.

• Lightstep’s mission is to deliver insights that put organizations back in control of their complex software applications. It provides an accurate, detailed snapshot of the entire software system at any point in time, enabling organizations to identify bottlenecks and resolve incidents rapidly.

• Microsoft Azure Monitor provides full observability into applications, infrastructure and network. Its Application Insights feature provides an APM service for developers and DevOps professionals to monitor live applications, detect performance anomalies, diagnose issues and understand what users are doing.

• New Relic One aims to go beyond traditional monitoring solutions by embracing observability. It provides users with a real-time view of operational data so they can respond faster, optimize better and build great modern software. It includes a telemetry data platform, full-stack observability, and applied intelligence.

• Oracle provides a complete end-to-end application performance management solution for custom and Oracle applications. Oracle Enterprise Manager is designed for both cloud and on-premises deployments; it isolates and diagnoses problems fast and reduces downtime, providing end-to-end visibility through real user monitoring, log monitoring, synthetic transaction monitoring, business transaction management and business metrics.

• OpsRamp is a modern IT operations management platform that allows enterprise IT teams and MSPs to “control the chaos” of digital infrastructure. OpsRamp does this through hybrid discovery and monitoring, event and incident management, remediation and automation, powered by AIOps. Users can detect and resolve incidents faster, understand resource dependencies and avoid costly performance issues that result in lost revenue and productivity.

• OverOps captures code-level insight about application quality in real time to help DevOps teams deliver reliable software. Operating in any environment, OverOps employs both static and dynamic code analysis to collect unique data about every error and exception — both caught and uncaught — as well as performance slowdowns.

• Pepperdata is a leader in the APM space with proven products, operational experience, and deep expertise. It provides enterprises with predictable performance, empowered users, managed costs and managed growth for their big data investments, both on-premise and in the cloud.

• Plumbr is a modern monitoring solution designed to be used in microservice-ready environments. Using Plumbr, engineering teams can govern microservice application quality by using data from web application performance monitoring. Plumbr unifies the data from infrastructure, applications, and clients to expose the experience of a user. This makes it possible to discover, verify, fix and prevent issues.

• Riverbed’s application performance solutions provide superior levels of visibility into cloud-native applications — from end users, to microservices, to containers, to infrastructure — to help you dramatically accelerate the application lifecycle from DevOps through production.

• SmartBear: AlertSite’s global network of more than 340 monitoring nodes helps monitor availability and performance of applications and APIs, and find issues before they hit end consumers. The Web transaction recorder DejaClick helps record complex user transactions and turn them into monitors, without requiring any coding.

• Splunk APM enables users to innovate faster in the cloud, improve user experience and future-proof applications. It features NoSample full-fidelity trace ingestion so developers never miss an anomaly, AI-driven analytics and directed troubleshooting, high cardinality exploration of traces, and an open standards approach.

• Stackify by Netreo’s APM solution Retrace gives developers straightforward insights into performance bottlenecks. It integrates code profiling, error tracking and application logs; troubleshoots problems and looks for ways to optimize code; and collects detailed snapshots of what code is doing and how long it takes.
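The LaunchDarkly entry mentions measuring how each feature affects metrics such as error rates. Conceptually, that is a cohort comparison: split requests by whether the flag was on and compare their error rates. The sketch below uses hypothetical types and made-up traffic; it does not use LaunchDarkly’s SDK or any vendor integration, and a real analysis would also check whether the difference is statistically significant before acting on it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Request:
    flag_on: bool  # was the new feature enabled for this request?
    error: bool    # did the request fail?


def error_rate(requests: list[Request]) -> float:
    """Fraction of requests that failed (0.0 for an empty cohort)."""
    return sum(r.error for r in requests) / len(requests) if requests else 0.0


def flag_impact(requests: list[Request]) -> dict[str, float]:
    """Compare error rates between the flag-on and flag-off cohorts."""
    on = [r for r in requests if r.flag_on]
    off = [r for r in requests if not r.flag_on]
    rate_on, rate_off = error_rate(on), error_rate(off)
    return {"error_rate_on": rate_on, "error_rate_off": rate_off, "delta": rate_on - rate_off}


# Hypothetical traffic: 100 requests per cohort, 5 failures with the flag on, 2 with it off.
traffic = ([Request(True, i < 5) for i in range(100)]
           + [Request(False, i < 2) for i in range(100)])
print(flag_impact(traffic))
```

A positive `delta` suggests the feature is raising the error rate, which is the signal that would justify turning the flag off without a redeploy.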


Getting the big picture

Broadcom’s Feldman explained that a monitoring solution should give you perspective and context around what is happening, so coupling the traditional inside-out view of APM with an outside-in perspective can aid in resolving issues when they arise. Synthetic monitoring of network traffic and real user monitoring of how applications are used, for example, can provide invaluable insight into an application’s performance. She also noted that if the application is running in the cloud, you could use OpenTracing techniques to get things like service mesh information to understand what the user experience is for a particular cloud service.
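At its simplest, an outside-in synthetic check is a scheduled probe that requests an endpoint the way a user would and records the status and latency it sees. A minimal sketch using Python’s standard `urllib`, assuming a 500 ms latency budget and treating 2xx/3xx responses as healthy; both thresholds are assumptions. Commercial synthetic monitoring runs probes like this from many locations and scripts full user journeys rather than a single request.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    ok: bool
    status: int        # HTTP status, or 0 if the request never completed
    latency_ms: float


def http_fetch(url: str, timeout: float) -> int:
    """GET the URL and return its HTTP status code, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def synthetic_probe(url: str,
                    latency_budget_ms: float = 500.0,
                    timeout: float = 5.0,
                    fetch: Callable[[str, float], int] = http_fetch) -> ProbeResult:
    """Request `url` from the outside, as a user would, and judge status and latency."""
    start = time.perf_counter()
    try:
        status = fetch(url, timeout)
    except OSError:  # DNS failure, refused connection, timeout, etc.
        status = 0
    latency_ms = (time.perf_counter() - start) * 1000
    ok = 200 <= status < 400 and latency_ms <= latency_budget_ms
    return ProbeResult(ok, status, latency_ms)
```

Calling `synthetic_probe("https://status.example.com/health")` (a placeholder URL) on a schedule yields a series of results to alert on. The `fetch` parameter exists so the probe logic can be exercised without a network, and so the HTTP client can be swapped out.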

Kompella added that log management and network performance monitoring (NPM) can help extend your monitoring capabilities. While APM tools are good at providing a deep dive of forensics or metrics, log traces help you go even deeper into what’s going on with your applications and services and help improve performance, he said.

Network performance monitoring is also extremely important because most large enterprises work in highly hybrid environments, where some parts of their technology stacks live on-premises and others in private or public clouds. Additionally, many enterprises follow a multi-cloud strategy, with applications distributed across multiple cloud providers.

“Your technology stack is extremely fragmented and distributed across all these on-prem and cloud environments, which also means that understanding the performance of your network becomes super critical,” said Kompella. “You might have the most resilient applications or the best APM tools, but if you’re not closely understanding network traffic trends or understanding the potential security issues impacting your network, that will end up impacting your customer experience or revenue-generating services.”
