Measuring End-to-End Response Time Through ALL The Tiers larry.klein@hotsos.com 07 March 2005
www.hotsos.com
Copyright © 1999–2005 by Hotsos Enterprises, Ltd.
Slide 1
Agenda • A Case Study • Lessons Learned • Generalizing a Method for End to End Performance Optimization
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 2
A Case Study Client XYZ • Insurance company • Custom application – under development for two years – nighttime batch load of transaction data – daytime online edits, reviews, approvals of transaction data – in “pilot production” phase for 10% of total online user base • some users in geographically distributed corporate offices • other users work from home via dialup or broadband VPN • Online response times unacceptable – “everything is slow” – editors and reviewers are clerical and paid by “piece work”
• “Please help us fix our Oracle Database” www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 3
You should diagnose Oracle performance problems like you would diagnose any other performance problem. 1. “What’s the most important thing to be working on?” – Business priority 2. “Where did the system spend my time?” – Resource profile – Which SQL is responsible? – Which SQL to blame for time spent between db calls? 3. “What’s the most effective way to fix it?” – Eliminate waste, eliminate competition – Stop when “fixing” isn’t worth it 4. Repeat Source: Millsap, C.; Holt, J. 2003. Optimizing Oracle Performance. www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 4
I arrived on Monday. Client promised to pull together Transaction Information by Wednesday. Meanwhile…
Oracle Database
? www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 5
Analysis Start with the Oracle Database Why start with Oracle? • Single element in common across customers’ environments • Very well instrumented • All I knew about Client XYZ’s environment
www.hotsos.com
Here’s what I did: • v$session showed 200 JDBC Thin Client sessions – Connection Pool! • v$sess_io showed relatively “even” session usage • picked 5 out of 200 sessions • dbms_system.set_ev (sid,serial#,10046,12,’’) • captured and analyzed trace files
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 6
Trace File Analysis, Recommendations, Results • “SQL*Net message from client” predominated • “CPU service” for a handful of SQL statements taking one or a few seconds each • Each SQL statement preceded by a dbms_describe
With much iterative, cut/paste testing: • ran dbms_stats at 30% • one new index • one code change • one hint • individual SQL statements now subsecond • fixes implemented • made notes – to pursue reason why/ possible elimination of describe – to review init.ora CBO parm’s • •
www.hotsos.com
optimizer_index_cost_adj = 10 not 100 optimizer_index_cache = 80 not 0
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 7
Meanwhile, Client Feedback on Thursday then again on Friday after Fixes
User Action
Response Time Response Time After Fixes Before Fixes (seconds) (seconds)
Response Time Goal (seconds)
A
19
18
5
B
31
29
5
C
6
6
5
…
…
…
5
Clearly, not enough Progress!!! www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 8
Now What? What’s Happening End to End?
Oracle Database User
www.hotsos.com
?
SQL*Net msg from Client
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 9
The End to End Question is easier in the IBM Mainframe world Graphical Command Center
Mainframe
“The status light is red for the Catalog Center operation in Wichita because Carol Smith’s Order Entry transaction from there is taking 3.92 seconds which exceeds the SLA of 2.0 seconds; drilling down on the red status shows her online transaction waiting on the database, which is suffering from high I/O contention on file 23, due to large batch job XYZ that’s running right now at an inappropriate time of day.”
IBM Designed and Built Performance Considerations throughout its End to End Technology Stack www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 10
In the Open Systems World Our Strength is our Weakness… • Lower cost of computing • Robust solutions from multiple vendors • No common or dominant performance management architecture across the end to end technology stack of multiple vendors
How can I figure out the End to End Here?
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 11
Hotsos Method R Reprised 1. “What’s the most important thing to be working on?” – Business priority 2. “Where did the system spend my time?” – Resource profile – Which SQL is responsible? – Which SQL to blame for time spent between db calls? 3. “What’s the most effective way to fix it?” – Eliminate waste, eliminate competition – Stop when “fixing” isn’t worth it 4. Repeat Source: Millsap, C.; Holt, J. 2003. Optimizing Oracle Performance. www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 12
Generalizing Method R for End to End Performance Optimization • • • • • • •
Identify and Prioritize Important Transactions Determine Application Architecture Establish Instrumentation Plan Conduct Tests or Probe Production Measure Activity Compile Results Identify Opportunities to Optimize
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 13
Client XYZ’s Application Architecture After Many Conversations with Many “Silo” Teams 3rd Party Workflow Product Interface Conn Pool
Oracle
MQ
CICS/ Cobol
Workflows User
Trans
PC
Gateway
.Net
AP SO
Engine
HTTP
Client
SOAP
www.hotsos.com
MQ Interface
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 14
How to Instrument this End to End? 3rd Party Workflow Product
Key for “Logging” • Existing, custom written
Interface Conn Pool
• Existing, vendor provided
Oracle
Workflows User
Trans
PC
Gateway
AP SO
Engine .lis file
.log file
10046 traces
HTTP
.Net Client
SOAP
Stopwatch
MQ Interface
MQ
CICS/ Cobol
.out file
Sniffer
www.hotsos.com
.log file
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 15
Log File Attributes, Issues, and Demonstration Tests Attributes • Timestamps – start – end – duration (calculated) • Transaction/Step name – current process – called process
Issues • Awareness - Client’s Production Support team didn’t know that logs were available or how to enable them – Original architects no longer involved with project – “Custom” logs built by silo teams for their own purposes but not widely known to other teams – “Standard” logs supplied by vendors but not widely known by project team • Inconsistent timestamp granularity across log files
Test 1
Fail
UNIX servers not time synched
Test 2
Fail
Windows servers not time synched
Test 3
Fail
Logging filesystems filled up
Test 4
Fail
Sniffer buffer filled up
Test 5
Success
Collate activities cross platform by timestamp
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 16
The Logs were Helpful To Reveal Discrete Actions per Architecture Component .Net Client User Actions
Trans Gtwy SOAP Calls
Called Workflows
Called Oracle Procedures
A
S1
Wa
GET…
B
S2
Wb
DO…
C
S3
Wc
xyzLKUP…
…
…
…
…
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 17
The Logs when Collated by Timestamp were Helpful To Map the User Action/Transaction Flow • User Action A – SOAP Call 1
• User Action B – SOAP Call 7
• Workflow Wd
• Workflow Wz
– Oracle Proc xLKUP – Oracle Proc GET…
– SOAP Call 2
• MQ Call 23
– SOAP Call 9
• Workflow Wa
• Workflow Wz
– Oracle Proc jLKUP
• MQ Call 17
– Oracle Proc Do…
– SOAP Call 1
– SOAP Call 8
• Workflow Wd
• Workflow Wg – Oracle Proc yLKUP
www.hotsos.com
– Oracle Proc jLKUP
– Oracle Proc xLKUP – Oracle Proc GET…
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 18
The Logs were Helpful To Create a Resource Profile 3rd Party Workflow Product Interface Conn Pool
Oracle
Workflows User PC
Trans
“A”
.Net
AP SO
Engine
Gateway
.08
HTTP
seconds
8.70
Client
SOAP
18.00
9.00
seconds
seconds
www.hotsos.com
seconds
MQ Interface
MQ
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
CICS/ Cobol
Slide 19
The Logs were Helpful To Reveal “Too Much Work” and Other Issues
Count
Count % of Total Count
24
18.18%
73980.00
21.36%
3082.50
1
0.76%
3213.00
0.93%
3213.00
Lookup
81
61.36%
185327.00
53.51%
2287.99
S3
Other
14
10.61%
44598.00
12.88%
3185.57
S4
Other
12
9.09%
39222.00
11.32%
3268.50
132
100.00%
346340.00
100.00%
2623.79
SOAP Call
Type
S1
Lookup
LOGIN
Other
S2
Totals: www.hotsos.com
Sum (msec)
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Sum % of Total Sum
Avg (msec)
Slide 20
Client XYZ Status • “Thanks – this is the best End to End View we’ve ever had!” • Using logging on an ongoing basis • Collapsing multiple, ongoing static lookups – one static lookup at login – cache lookups in .Net client – realize tradeoff of one slightly longer login for many 6-8 second savings per edit or review throughout the day • Reevaluating current Transaction Gateway Usage (XML Lovefest) – currently maps .Net client inbound SOAP/XML to alternative SOAP/XML format before passing downstream to Workflow or MQ; receivers need to reparse for their own purposes – substantial latency/cpu consumption from SOAP/XML parsing and manipulation – considering simpler non-XML map for messages “once in glass house” to eliminate redundant construction/parsing overhead • Reevaluating current Workflow Engine Usage – currently used as “database driver” but each workflow = 1000+ steps around db call – considering using only when “flow” is required otherwise call db from Gateway
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 21
Lessons Learned If you are an Oracle Performance Analyst… • Current methods can quickly isolate problems to be inside or upstream of the database • If the problems are “not Oracle”, you can still add HUGE value – you already are successful within an existing method framework – you are a big picture thinker – you know what questions need answers – you have the confidence to lead – you can be an objective facilitator across many people and technologies – you can dig out the details from simple log file analysis • You should check your eyeglass prescription before reviewing logs • You need to expand your past successes with the Generalized Method R for End to End Optimization
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 22
Lessons Learned If you are Building or Managing an Application • Performance needs to be architected into an Application by Design • Performance needs to be a “mindset” throughout a Project Life Cycle, not an afterthought months or years later • When in “Design Doubt,” Performance Proofs of Concept can channel development efforts in a positive and efficient direction • Performance methods and tools need to be – well-tested – documented – implemented • You need to Employ the Generalized Method R for End to End Performance Optimization
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 23
The Generalized Method R for End to End Performance Optimization • • • • • • •
Identify and Prioritize Important Transactions Determine Application Architecture Establish Instrumentation Plan Conduct Tests or Probe Production Measure Activity Compile Results Identify Opportunities to Optimize
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 24
Identify and Prioritize Important Transactions • Critical to the business • Priority order • Use case – user – location – navigation path – data • Current and desired response time • Improvement’s value to the business
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 25
Determine Application Architecture User
Trans
• • •
Oracle Database
Middle Tier
Web Server
Service
DB
Request
Call
Identify the major technology building blocks through which the business transactions flow Identify the service calls/interfaces among the blocks Construct a “sequence diagram” template to map the blocks User Time
Unacct Time
www.hotsos.com
Web Time
Unacct Time
Middle Tier Time
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Unacct Time
Database Time
Slide 26
Establish Instrumentation Plan For each transaction, • establish how to measure end-user time • establish how to measure response time consumed by each building block in the architecture
Trans
Stopwatch? Sniffer? www.hotsos.com
Middle Tier
Web Server
User
Oracle Database
Service
DB
Request
Call
Log File?
Log File?
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Traces Slide 27
Conduct Tests or Probe Production • Define goals • Construct a capture plan – How/who will drive transactions? – How/who will support the logging? • Prepare to execute the plan – Time synch all platforms – Enough space for logging? – “Skinny down” test connection pool • Execute the plan • Measure activity • Preserve the results, outputs www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 28
Measure Activity
Phase
Steps
Before
Enable data collection for each relevant block
During
Confirm that data are being collected properly
After
• Turn off data collection • Capture the results
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 29
Compile Results • • • • • •
Use the Sequence Diagram Template Sift through the collected data Key individual event data into the template Record or calculate total “accounted for” time Derive “unaccounted for” time Determine if “unaccounted for” requires refinement and a new test
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 30
Identify Opportunities to Optimize • Review sequence diagrams • Construct resource profiles • In descending order by time consumption – identify the component consuming the most time – evaluate optimization opportunities • eliminating unnecessary service calls • combining many service calls into fewer • tuning individual service calls or actions
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 31
References Millsap, C.; Holt, J. 2003. Optimizing Oracle Performance. Sebastopol CA: O’Reilly. This book provides a full description of Method R, a detailed reference for Oracle’s extended SQL trace facility, an introduction to queueing theory for the Oracle practitioner, and a set of worked performance improvement example cases.
www.hotsos.com
Copyright © 1999–2004 by Hotsos Enterprises, Ltd.
Slide 32