microservices
Imagine an organization that has a distributed system with multiple microservices. One of the developers is trying to perform a root cause analysis of an increase in API latency.
The developer struggles to quickly identify the outlier long API calls, to understand the distribution of durations within an API call, and to zoom in on the calls with the longest duration. She also struggles to understand the full context, meaning the data on what the specific API's downstream dependencies are, in order to figure out the root cause and map the bottlenecks across the end-to-end (e2e) flow. Another important point is reproduction: exploring whether the problem is still there and doesn't represent a momentary issue.
The next challenge is identifying the API call that is responsible for the latency issue and determining whether it impacts all other traces or is just local in nature.
This is just one simple example, but developers encounter similar issues on a daily basis: from API discovery to validating APIs, to investigating and troubleshooting issues.
A good observability approach and a few best practices can solve the challenges mentioned and help troubleshoot the example above quickly, but the approach should include tooling that allows:
• API-level observability
• dependencies between APIs
• API specs and their enforcement
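To make the outlier hunt described above concrete, here is a minimal sketch in Python. It assumes span data has already been collected and exported as simple records. The `Span` type, the field names, the operation names, and the 95th-percentile cutoff are all hypothetical choices for illustration, not part of any particular observability tool.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Span:
    trace_id: str       # hypothetical: identifies one end-to-end request
    operation: str      # hypothetical: an API call or a downstream dependency
    duration_ms: float


def slowest_spans(spans, percentile=95):
    """Return spans slower than the given percentile, slowest first."""
    durations = [s.duration_ms for s in spans]
    # quantiles(n=100) yields 99 cut points; index p-1 is the p-th percentile.
    cutoff = quantiles(durations, n=100, method="inclusive")[percentile - 1]
    slow = [s for s in spans if s.duration_ms > cutoff]
    return sorted(slow, key=lambda s: s.duration_ms, reverse=True)


def trace_context(spans, trace_id):
    """Return the spans of one trace, slowest first, to expose the bottleneck."""
    same_trace = [s for s in spans if s.trace_id == trace_id]
    return sorted(same_trace, key=lambda s: s.duration_ms, reverse=True)


# Made-up sample data: 19 ordinary calls and one slow call to the same API.
api_calls = [Span(f"t{i}", "GET /orders", 100.0 + i) for i in range(19)]
api_calls.append(Span("t19", "GET /orders", 900.0))

# Made-up downstream spans recorded inside the slow trace.
downstream = [
    Span("t19", "auth.check", 30.0),
    Span("t19", "db.query", 820.0),
]

outliers = slowest_spans(api_calls, percentile=95)
for outlier in outliers:
    print(outlier.trace_id, outlier.operation, outlier.duration_ms)
    for dep in trace_context(downstream, outlier.trace_id):
        print("   ", dep.operation, dep.duration_ms)
```

In this sample, the percentile cutoff isolates the one slow call, and listing the other spans in the same trace points at the downstream dependency that accounts for most of its duration. Re-running the same check on fresh data is one way to tell whether the problem persists or was momentary.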
API observability best practices
1. Enable auto-instrumentation. One of the best practices for overcoming API observability challenges is the use of auto-instrumentation. Instrumentation refers to the process of adding code to an application or service to collect data about its behavior and performance. Instrumentation is valuable for troubleshooting microservices because, by instrumenting every service, it provides a comprehensive view that is almost impossible to obtain through other methods. The collected data can include metrics such as request latency, error rates, and resource usage. In addition, it allows real-time monitoring, enabling developers to quickly identify and respond to issues as they arise. This can help reduce downtime and improve system reliability. Another benefit is that it provides a