SD Times
April 2022
www.sdtimes.com
Guest View BY VIJOY PANDEY
The case for full-stack observability
Vijoy Pandey is VP of Emerging Technologies and Incubations (ET&I) at Cisco.
The application-first digital economy and future of work slowly taking shape over the past few years got a jolt of adrenaline in March of 2020. Before the pandemic, 50 percent of companies polled by the World Economic Forum expected that software, automation and AI would lead to some significant reskilling of their workforce as well as some reductions. COVID-19 significantly accelerated and exacerbated this, profoundly impacting software developers.
The application is the new brand
The business agility and quality of digital experience provided by modern applications have led to the latest industry mantra: the application experience is the new brand. This application experience demands a faster cadence of features and functions, consistent availability, enhanced application performance, and paramount trust and security around the data the application handles. AppDynamics’ App Attention Index shows brands have one shot to deliver the ‘total application experience.’
At the heart of providing this application experience is the developer, who is now tasked with delivering these apps and features faster, with higher availability and better security than ever before. In this distributed modern application development environment, being able to observe your applications end-to-end and top-to-bottom is critical to providing better customer experience, application availability and performance. This visibility is also key to driving down mean time to resolution (MTTR) on failures and to monitoring KPIs on how the business is doing. This is known as full-stack observability.
Full-stack observability allows any persona (developer, SRE, product, customer success, or business lead) to answer the questions “What happened?” “Where did it happen?” “Why did it happen?” and “Can it happen in the future?”
Alice, and her rendezvous with full-stack observability
Alice is a developer in the mobile banking app team at New Bank, Inc. Two months into the pandemic, her product manager asked her to develop a new feature for the New Bank mobile app: Contactless Cash Withdrawal. The customer experience was quite simple, but the development experience was anything but.
Alice had to start with mobile (say, iOS) APIs, as that’s where her customers interacted with the app. Her entire back end was in AWS, so she had to select her AWS services carefully, while customer data was accessible via Salesforce SaaS APIs. Her bank’s transactional back ends existed on-premises on bare metal servers over a monolithic database whose APIs provided a global and account-level consistency picture, while her branch ATMs’ edge compute nodes had a different set of APIs to manage geo-local cash consistency. There were other SaaS APIs to manage location, identity, compliance, etc.
A month after production deployment, the customer success team started getting an increased number of calls about the contactless cash withdrawal feature taking too long to dispense cash at various ATMs. Simultaneously, using a full-stack observability solution, the business metrics team saw increased transaction delays in the Digital Endpoint Monitoring (DEM) dashboard for the mobile banking app.
Alice and her fellow developers and SREs started invoking code using the full-stack observability APIs, which uniformly query and correlate relevant events across the Data Platform. That platform includes Metrics, Logs and Traces from every API, app, service, and infrastructure (HW or SW) component outlined in the distributed development environment above. After a few quick debugging cycles, they noticed that the latency between a service in AWS US-East and their on-premises software stack had been steadily increasing over the past hour. Using any capable monitoring tool, one could easily jump to the conclusion that this was a network problem.
But using full-stack observability, they found that a few memory (RAM) banks on their on-premises database server had failed. This was causing that database server to queue up incoming requests, which in turn was driving up the service layer latency between the AWS service and their on-premises software stack.
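The kind of cross-layer correlation in Alice's story can be illustrated with a minimal Python sketch. It is not the API of any particular observability product: the `Event` record, the `events_before` helper, the component names, and the timestamps are all hypothetical and invented for illustration. The idea is simply to start from a symptom (rising service latency) and pull every event from any layer that occurred in the time window leading up to it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """A hypothetical, simplified telemetry record (not a real product schema)."""
    timestamp: datetime
    layer: str       # e.g. "network", "service", "database", "hardware"
    component: str   # made-up component name
    kind: str        # "metric", "log", or "trace"
    message: str


def events_before(events, symptom, window):
    """Return all other events inside the window leading up to the symptom, oldest first."""
    start = symptom.timestamp - window
    related = [
        e for e in events
        if e is not symptom and start <= e.timestamp <= symptom.timestamp
    ]
    return sorted(related, key=lambda e: e.timestamp)


# Made-up data loosely following the story: a hardware fault, then queueing, then latency.
t0 = datetime(2022, 1, 1, 12, 0)
events = [
    Event(t0, "hardware", "onprem-db-01", "log", "memory bank failure reported"),
    Event(t0 + timedelta(minutes=5), "database", "onprem-db-01", "metric",
          "request queue depth rising"),
    Event(t0 + timedelta(minutes=10), "service", "aws-us-east-svc", "trace",
          "latency to on-premises stack rising"),
    Event(t0 - timedelta(hours=3), "network", "wan-link", "metric",
          "routine link check passed"),
]

symptom = events[2]  # the latency increase the team noticed
timeline = events_before(events, symptom, timedelta(hours=1))
for e in timeline:
    print(e.timestamp.time(), e.layer, e.component, e.message)
```

In this toy data set, the one-hour window surfaces the hardware and database events and leaves out the older network check, which mirrors how the team moved past the network hypothesis. A real platform would correlate on far richer keys than time alone, such as trace identifiers or topology.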