Profiling Oracle: How it Works
Cary Millsap, Hotsos Enterprises, Ltd.
Revised 31 January 2005
Conventional Oracle “tuning” methods are outrageously complex, and they don’t work. For more than three decades, good software developers have known how to optimize computer-executed tasks using a simple method that works every time. This presentation describes how to use this method in an Oracle environment, and it demonstrates the method’s use with an example.
1 How Long? Why?

After you strip everything away except for what you really want, performance analysis is the study of how long something takes. In daily life, you analyze performance every day. Getting places on time, choosing the right queue at the toll plaza, minimizing errand times: each is a tiny exercise in performance analysis. In life, the study of performance is the study of how long something takes. To make something faster, you begin by considering exactly where the time went for the individual task that's most interesting to you.

In your Oracle life, though, you've been trained to view things differently. Professionals have taught you that looking directly at how long an individual task takes is either unproductive or impossible. You're taught that the only way to analyze Oracle performance is by studying a complex network of statistics that don't exactly answer the real question you're trying to answer.
You can see a great example of what I’m talking about by looking at a standard Oracle Statspack report.1 Nowhere on a Statspack report does there appear any account of how long your particular program took. Instead, Statspack gives you a completely different view. It gives you information about resources consumed on your system while your program was running—including resources your program consumed plus resources consumed by anything else running during the same observation period. Statspack gives you information from the system’s perspective, not from your perspective. Asking Statspack how long something took is rather like asking how long a flight from Dallas to Chicago should take, and then having someone give you an answer like this: “The average fuel consumption per flight for all flights conducted upon the planet in the past year is x.”
There are only a couple of things you can do with information like that:

• You could try to extract the information you really wanted from the information you got. For some, it's a fun technical challenge. For example, you might use the relationship average-flight-duration = average-fuel-per-flight ÷ average-fuel-consumption-rate to try to compute some approximation of a worldwide average flight duration. But even if you got this far, you would need to realize that this "average flight duration" you've found includes thousands of different kinds of flights. Such a number would include some 18½-hour flights from Los Angeles to Singapore, some 35-minute flights from Honolulu to Maui, and everything in between. What can an average like that possibly tell you about the duration of a flight from Dallas to Chicago? If you're lucky, the duration of a Dallas-to-Chicago flight might resemble the worldwide average. But realize: you've got to be lucky for the answer to do you any good at all.

• Or you could insist upon a direct answer to your question: How long does a flight from Dallas to Chicago take? Such information is, of course, attainable through other means.
Why doesn't a Statspack report tell you how long your program takes? One good reason is that such information cannot be derived from the kinds of operational data that Statspack uses as its input (at least not from the stuff you're taught to give it as its input).2 Perhaps the biggest problem with Statspack is that when you're trying to diagnose a performance problem, sometimes it tells you what you need to know. The big problem, you see, is that sometimes it doesn't. Statspack might give you the right answer for the same reason that a worldwide flight-time average might actually resemble the average flight time from Dallas to Chicago. But it might not. Traditional Oracle performance analysis methods just don't work for several classes of commonly occurring performance problems.3

1. Statspack is "a set of SQL, PL/SQL, and SQL*Plus scripts that allow the collection, automation, storage, and viewing of performance data. This feature has been replaced by the Automatic Workload Repository." From Oracle® Database Performance Tuning Guide 10g Release 1 (10.1), available at otn.oracle.com.
2. Chapter 8 of Optimizing Oracle Performance, by Cary Millsap and Jeff Holt, contains more than 40 pages of detail describing why this is so.

The problem is that there's no direct correspondence between the metric values you're "supposed to" study and the goodness or badness of performance in your Oracle system. Example: most analysts are taught that a database buffer cache hit ratio value near 100% is an indicator of good system health. However, such an assumption leads to false-positive conclusions, because several types of program inefficiencies drive nice, high database buffer cache hit ratio values. One such problem is application code that fails to use Oracle's array processing features. Imagine fetching 100 rows from a single Oracle block with 100 fetch calls. This operation would produce a 99% buffer cache hit ratio (a cache miss on the first read, and then 99 hits on subsequent re-reads of the same block). Now imagine fetching the same 100 rows from the same Oracle block with just one fetch call. This single-fetch version executes more quickly than the 100-fetch version (much more quickly if you conduct the two tests through Oracle Net on a WAN), but its buffer cache hit ratio would fall to 0%.4

The traditional performance analysis methods that the Oracle culture teaches its newcomers are medieval in comparison to the tools and methods that professional software developers use.5 Why not just look at what you're trying to see (what took so long)? The answer "because it's impossible" is simply not acceptable, because developers do it all the time. The way we do it is by using software profilers.
3. One might argue that they're so "commonly occurring" specifically because traditional methods leave them undiagnosed.
4. For details and more examples, see "Why a 99%+ database buffer cache hit ratio is not ok" and "Why you should focus on LIOs instead of PIOs," available at www.hotsos.com.
5. That's not to say that tools that Oracle performance analysts use don't look nice enough. Most traditional Oracle performance analysis tools are tremendously attractive. They just don't tell you where your code is spending its time.
2 The Role of Profilers

A profiler is a tool that reports a useful decomposition of a computer program's response time, so that a performance analyst can accurately diagnose the root cause of a performance problem. Profilers are nothing new. The principle is at least three decades old.6 Countless software developers even today use the GNU profiler called gprof, which has been around since 1988.7 Exhibit 1 shows an example of a flat profile. A flat profile shows the total amount of time that a program has spent executing each of its named internal subroutines.

Exhibit 1. Sample output of gprof.8

$ gprof
Flat profile:

Each sample counts as 0.01 seconds.
  %     cumul.     self                self      total
 time   seconds    seconds     calls   us/call   us/call   name
68.59      2.14       2.14  62135400      0.03      0.03   step
31.09      3.11       0.97    499999      1.94      6.22   nseq
 0.32      3.12       0.01                                 main
This particular program's response time was 3.12 seconds (you can see this in the bottom row of the cumul. seconds column). From the output, you can see where the time went: 68.59% of the program's response time was consumed by 62,135,400 executions of the subroutine called step.

GNU gprof output doesn't show exactly where a program's time went, but it's a statistically valid enough indication of it for gprof to serve plenty well as a reliable performance diagnostic tool. The GNU gprof manual page explains the impact of the statistical errors and how to work around them. Oracle timing data, which I'll get to in a moment, produces more accurate timing information that is not susceptible to the same statistical errors that gprof endures.

Whenever more than one thing is happening on a computer system at a given time, a profiling tool like gprof is utterly indispensable for determining why a particular program is running longer than you want it to.9 Especially under conditions of heterogeneous load, the kinds of hit rates and utilization metrics that you've been trained to consider are stunningly unreliable indications of what you might do to make a program run faster.

But you, the Oracle performance analyst, can't use gprof to debug and optimize the code in the Oracle kernel (I'll show you why in a minute). You have to rely upon the operational data that the Oracle kernel developers have given you. In the 1980s, you were literally condemned to the state of not being able to find out where your response time is going. But in late 1992, Oracle Corporation made a significant step toward solving the problem. If you're using any version of Oracle at least as new as 7.0.12, then you automatically possess an essential key to being able to profile your Oracle application programs.

6. Knuth, D. E. 1971. "Empirical study of FORTRAN programs" in Software—Practice and Experience, Apr/Jun 1971, Vol. 1, No. 2, pp. 105–133. Mainframers in the audience might remember the famous mainframe profiling tool called STROBE, or the Candle Omegamon feature called INSPECT.
7. Fenlason, J.; Stallman, R. 1988. "The GNU profiler," available at http://www.gnu.org/software/binutils/manual/gprof-2.9.1/html_mono/gprof.html.
8. Gough, B. J. 2004. An Introduction to GCC—for the GNU compilers gcc and g++. Bristol, UK: Network Theory Ltd.
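To see the flat-profile idea in action outside of Oracle, you can reproduce the gist of Exhibit 1 with a modern language's built-in profiler. The sketch below is mine, not the paper's; it uses Python's standard-library cProfile module purely as an analogue of the gprof output, with toy functions named step and nseq standing in for the subroutines in Exhibit 1.

    import cProfile
    import pstats

    def step(x):
        return x + 1              # stand-in for a cheap, frequently called subroutine

    def nseq(n):
        total = 0
        for _ in range(n):
            total = step(total)   # step is called n times
        return total

    pr = cProfile.Profile()
    pr.enable()
    nseq(500_000)                 # the task whose response time we want decomposed
    pr.disable()

    # Print a flat profile: time per function, call counts, cumulative time.
    pstats.Stats(pr).sort_stats("cumulative").print_stats(5)

The output answers the same question a flat profile always answers: of the task's total run time, which subroutines consumed it, and how many times were they called?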
3 Instrumenting Your Code

To use gprof, you have to instruct the compiler to build the appropriate profiling library functions into your executable—when you use the GNU C compiler called gcc, you have to use the -pg option. Thus, you can't use gprof upon the Oracle kernel code, because you're not permitted to compile it. However, you can create profile data for an Oracle session by activating a special Oracle mechanism that causes the Oracle kernel to store its own operational timing data. Oracle Corporation calls this mechanism extended SQL trace.

An Oracle session's extended SQL trace attribute is easy to manipulate. Before you get started, make sure that the Oracle instance parameter called TIMED_STATISTICS is set to TRUE. This is safe and easy to do. (Beginning with version 7.3, a database administrator can do it online with an ALTER SYSTEM command.) Beginning with Oracle version 9, Oracle Corporation began shipping Oracle database software with TRUE as the parameter's default value. Once you have your TIMED_STATISTICS setting in proper shape, you can activate and deactivate tracing using standard Oracle packages, as shown in Exhibits 2 and 3.
9. Software profiling is a momentous rediscovery of the twenty-first century. Newer profiling tools like the DTrace tool that is built into Sun Microsystems's Solaris 10 kernel provide spectacular diagnostic capabilities. See http://www.sun.com/2004-0518/feature/ for more information.
Exhibit 2. Activating and deactivating Oracle extended SQL trace in Oracle version 10.

/* If you can modify your Oracle application's source code, then do this. */
dbms_monitor.session_trace_enable(null, null, true, true)
/* Your code to be profiled goes here. */
dbms_monitor.session_trace_disable()

/* Otherwise, if your application tasks identify themselves by service/module/action
   name with either dbms_application_info or with the new version 10 OCI attributes
   (OCI_ATTR_*), then do this. */
dbms_monitor.serv_mod_act_trace_enable(:service, :module, :action, true, true)
/* Code to be profiled runs during this time. */
dbms_monitor.serv_mod_act_trace_disable(:service, :module, :action)  /* …or just disconnect */

/* Otherwise, do this. */
dbms_monitor.session_trace_enable(:sid, :serial, true, true)
/* Code to be profiled runs during this time. */
dbms_monitor.session_trace_disable(:sid, :serial)  /* …or just disconnect */
Exhibit 3. Activating and deactivating Oracle extended SQL trace in Oracle versions 8 and 9.

/* If you can modify your Oracle application's source code, then do this. */
dbms_support.start_trace(true, true)
/* Your code to be profiled goes here. */
dbms_support.stop_trace()  /* …or just disconnect */

/* Otherwise, do this. */
dbms_support.start_trace_in_session(:sid, :serial, true, true)
/* Code to be profiled runs during this time. */
dbms_support.stop_trace_in_session(:sid, :serial)  /* …or just disconnect */
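If your client code can be modified, the simplest approach is to wrap the task you care about with the calls from Exhibit 2. The following is a minimal sketch only, not part of the original paper: it assumes the python-oracledb driver and hypothetical credentials and connect string, and it simply issues the same dbms_monitor calls from an application session.

    # Sketch: wrap the code to be profiled with extended SQL trace calls.
    # Assumes the python-oracledb driver; user, password, and DSN are hypothetical.
    import oracledb

    conn = oracledb.connect(user="app", password="secret", dsn="dbhost/v92.hotsos")
    cur = conn.cursor()

    # Make sure timing data is being collected, then switch extended SQL trace on.
    cur.execute("alter session set timed_statistics = true")
    cur.execute("begin dbms_monitor.session_trace_enable(null, null, true, true); end;")

    # ... the application code you want profiled goes here ...

    cur.execute("begin dbms_monitor.session_trace_disable; end;")
    conn.close()

The trace file appears in the database server's user dump destination, ready for a profiler to consume.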
4 Profiling Oracle

The problem with profiling Oracle prior to 1992 was that Oracle customers couldn't do it. The Oracle kernel simply couldn't emit the operational data required to produce a profile. This is why traditional methods of looking at system-wide resource consumption statistics were invented in the first place (like the ones implemented in Statspack). When Oracle Corporation fixed that problem in version 7.0.12, we graduated to a new problem: raw extended SQL trace data is messy and difficult to interpret. For almost a decade, most Oracle customers either didn't know that extended SQL trace data even existed, or they knew about it but were unaware of how much diagnostic power lay in the data, because the files Oracle emits are so complicated.

Since the year 2000, a few colleagues and I have devoted our careers to determining the minimal necessary sequence of steps required to solve an arbitrary Oracle performance problem. What we've learned is that Oracle extended SQL trace files contain so much diagnostic value that we almost never need to look at any other operational performance data. This way of looking at a system from the user's experience inward is an upside-down way to look at a system compared to the traditional approach of looking from the guts of the system outward. But with experience it becomes clear that looking at a system from the inside out was the upside-down way all along.

After you learn how to extricate meaning from Oracle extended SQL trace data, there remains only one problem. To manage the information contained within the very large extended SQL trace files that the Oracle kernel emits, you need software assistance. This used to be a difficult technical problem. Now it's merely a question of economics, because today you can buy textbooks and education that teach you how to extract what you need from your trace data, and you can buy prepackaged software or services if you prefer to have someone else do the work for you.10
4.1 Flat Profiles

The GNU gprof tool serves as a good motivational metaphor for Oracle profiling, because it produces two kinds of profiles. The first kind of profile, the flat profile (which you've seen already), is a decomposition of total response time by subroutine call. One type of flat profile for an Oracle session decomposes response time by named subroutines inside the Oracle kernel that Oracle Corporation calls timed events. Creating a flat profile by timed event for a selected Oracle session is straightforward. Each year I teach a few hundred students how to do it. Exhibit 4 shows a flat profile for a real performance problem that my company helped a customer solve a few years ago.
10. Further information about one means of obtaining such courses, books, software, and services appears in the About Hotsos section at the end of this paper.
Exhibit 4. A flat profile for a problem program on an Oracle system.

Oracle subroutine                  Duration (secs)          # Calls    Dur/call
--------------------------------  ------------------      --------   ----------
SQL*Net message from client          984.010   49.6%         95,161    0.010340
SQL*Net more data from client        418.820   21.1%          3,345    0.125208
db file sequential read              279.340   14.1%         45,084    0.006196
CPU service, EXEC                    136.880    6.9%         67,888    0.002016
CPU service, PARSE                    74.490    3.8%         10,098    0.007377
CPU service, FETCH                    37.320    1.9%         57,217    0.000652
unaccounted-for                       27.720    1.4%              1   27.720000
latch free                            23.690    1.2%         34,695    0.000683
log file sync                          1.090    0.1%            506    0.002154
SQL*Net more data to client            0.830    0.0%         15,982    0.000052
log file switch completion             0.280    0.0%              3    0.093333
enqueue                                0.250    0.0%            106    0.002358
SQL*Net message to client              0.240    0.0%         95,161    0.000003
buffer busy waits                      0.220    0.0%             67    0.003284
db file scattered read                 0.010    0.0%              2    0.005000
SQL*Net break/reset to client          0.000    0.0%              2    0.000000
--------------------------------  ------------------      --------   ----------
Total response time                1,985.190  100.0%
We were actually able to solve the customer's performance problem by looking at only the data shown in Exhibit 4. The highlighted number gave us everything we needed to determine that the company had implemented an Oracle Net configuration error. Diagnosis required roughly ten minutes. Testing the solution required another half hour. Production implementation took place the following day. The repair eliminated roughly half of the total response time consumed by this program. However, prior to profiling the program, the company had spent several months trying to solve the problem with traditional performance methods—with no success whatsoever.11

11. The problem and its solution are described in their entirety in Optimizing Oracle Performance, pages 327–332.

Another type of flat profile that is valuable in an Oracle context is the profile by database call. For example, it is interesting to know how much time was consumed by parsing for a given SQL statement. How much time by executing? And how much time by fetching? For such information to constitute a true profile, the total response time rendered in the table needs to be the total response time contributed by the statement. To do this requires the addition of a row denoting time spent between database calls. Sometimes the time spent between calls dominates a statement's total response time.

Exhibit 5 shows an example of a database call profile. In this example, you can see that the dominant response time consumer for the statement was the time spent between database calls, followed distantly by the time spent executing and, finally, parsing the statement. If you're familiar with the behavior that causes most between-call event time that is attributable to SQL statements, the greatest benefit of the database call profile in Exhibit 5 is the information about how many calls there were to each type of database call (highlighted). There's more information about this in the case study later in this paper.

Exhibit 5. A flat profile that decomposes Oracle response time by database call.

                        ---------Duration (seconds)---------
Database call            Elapsed            CPU      Other      Calls    Rows    LIOs    PIOs
--------------------    ----------------   ------   --------   ------   -----   ------   -----
Between-call events       23.850   93.2%    0.000     23.850        0       0        0       0
EXEC                       0.880    3.4%    0.890     -0.010      348     348    3,859     351
PARSE                      0.870    3.4%    0.820      0.050      696       0        0       0
--------------------    ----------------   ------   --------   ------   -----   ------   -----
Total                     25.600  100.0%    1.710     23.890    1,044     348    3,859     351
--------------------    ----------------   ------   --------   ------   -----   ------   -----
Total per EXEC             0.074    0.3%    0.005      0.069        3       1       11       1
Total per row              0.074    0.3%    0.005      0.069        3       1       11      10
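The arithmetic behind the "Between-call events" row is simple: it is the statement's total contribution to response time minus the time accounted for by the statement's database calls. A rough sketch, using the figures from Exhibit 5 (my own illustration, not output from any particular profiler):

    # Statement-level figures from Exhibit 5, in seconds.
    statement_total = 25.600          # total response time contributed by the statement
    db_call_elapsed = {"EXEC": 0.880, "PARSE": 0.870}

    # Whatever the database calls don't explain must have elapsed between calls.
    between_calls = statement_total - sum(db_call_elapsed.values())
    print(f"Between-call events: {between_calls:.3f} s")   # -> 23.850 s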
4.2 Call Graphs

The second kind of profile in the GNU gprof output is the call graph. A call graph shows response time as it pertains to the relationships among subroutines being called and subroutines doing the calling. A call graph in the GNU gprof sense is not a picture; it's a table that represents parent-child relationships using indentation. A call graph is interesting in the Oracle context because of the way that Oracle database calls can nest (Oracle Corporation uses the term recursive SQL to refer to nested database calls). For example, the following statement produces such nested database calls:

select object_name from dba_objects where object_id=:id
Specifically, the Oracle parse call for this statement motivates three database call "children": a parse call, an exec call, and a fetch call upon a new SQL statement that is never mentioned in the top-level application code. The SQL statement that the Oracle kernel must parse, execute, and fetch before it can finish parsing our application's select statement is this one:

select text from view$ where rowid=:1
A call graph for the relationship between our application SQL and its recursive SQL might look like this:
                                           ------Duration (seconds)------
                                                             Including
Statement                                  Self              children
----------------------------------------   ---------------   -------------
select object_name from dba_objects...      0.200s   40.0%    0.500   100%
│ select text from view$ where ro...        0.300s   60.0%    0.300    60%
----------------------------------------   ---------------   -------------
Total                                       0.500s  100.0%
A true call graph for any task that had an n-second duration would show a total response time of n seconds, even if it had to add a line item labeled "time spent between statements." Oracle sessions often consume a considerable amount of response time between statements, as I'll reinforce shortly in the case study.

The great benefit of having an Oracle call graph is that it enables you to affix the blame for poor performance accurately upon the specific SQL statement that is causing your pain. For example, if a recursive SQL statement that reloads your dictionary cache (part of your shared pool) is a dominant contributor to your response time, a call graph shows you which SQL statements the recursive statement is a child of. Perhaps it's possible to parse those particular statements less often. With the call graph, you can target those specific statements that will produce the greatest result for the least effort. Without the call graph, you might mistakenly believe that you had to reduce parse calls system-wide, or that your best recourse is to increase the size of your shared pool.
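Again as an analogue only (not an Oracle tool), the same Python cProfile data used earlier can be viewed in call-graph form: for each subroutine, which callers were responsible for its time and what it spent its own time calling. This is the same parent-child idea that recursive SQL creates inside the Oracle kernel; the function names below are invented for illustration.

    import cProfile
    import pstats

    def child():                    # analogue of a recursive SQL statement
        return sum(range(1000))

    def parent():                   # analogue of the application-level statement
        return [child() for _ in range(2000)]

    pr = cProfile.Profile()
    pr.enable()
    parent()
    pr.disable()

    stats = pstats.Stats(pr)
    stats.print_callers("child")    # who called child(), and how much it cost each caller
    stats.print_callees("parent")   # what parent() spent its time calling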
5 Case Study

When you use a profiler, your work follows a repeatable plan. This section shows a case that illustrates the process. The root cause of the performance problem in this case is like dozens we see each year in the field.
5.1 Baseline

The case begins with a batch job that takes too long to run. The database server is a single-CPU Windows XP machine running Oracle version 9.2.0.4. There are two concurrent processes on the system, each running the same kind of batch job, but manipulating completely different tables in a single database. Each client process inserts 5,000 rows into its own table. The application client process runs on the same host as the Oracle kernel. The client connects via the following alias defined in the system's tnsnames.ora file:

v92 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = CVM-LAP02)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = v92.hotsos)
    )
  )
Exhibit 6 shows the baseline flat profile for the slow job. The job ran for 11.6 seconds, and the flat profile shows why.
Exhibit 6. The baseline flat profile for the problem batch job.12

Oracle subroutine                  Duration (secs)         # Calls    Dur/call
--------------------------------  -----------------      --------   ----------
instrumentation gap                   8.182   70.5%         10,071    0.000812
SQL*Net message from client           2.371   20.4%         20,007    0.000119
CPU service, PARSE calls              1.843   15.9%          5,001    0.000368
CPU service, EXEC calls               0.411    3.5%          5,002    0.000082
SQL*Net message to client             0.066    0.6%         20,007    0.000003
db file sequential read               0.055    0.5%              2    0.027650
log file sync                         0.017    0.1%              3    0.005714
latch free                            0.006    0.1%             19    0.000315
instrumentation overlap              -1.350  -11.6%            140    0.009495
--------------------------------  -----------------      --------   ----------
Total                                11.601  100.0%
The profile is clear: 70.5% of the slow program's response time is due to a response time component called instrumentation gap. Whether you know what that means or not, the profile says undeniably where you need to invest your effort.

But what does instrumentation gap mean? There's no "official" Oracle timed event named this (if you review the content of Oracle's V$EVENT_NAME fixed view, you'll notice that there's no Oracle timed event named instrumentation gap). The instrumentation gap timed event is an event that a profiler must synthesize to recognize otherwise unexplained jumps in the timer values rendered within an Oracle trace file. It's all part of the effort to account for exactly the response time that the user has experienced. In the Oracle case, when instrumentation gap time dominates a profile, it usually indicates that a process has spent a lot of its time in the operating system's preempted state.13 The most effective way to reduce the amount of time that a program spends in this state is to reduce competition for the system's CPU. (An operating system scheduler will not preempt a process if there's no other process waiting in the run queue.)

The second timed event listed in the profile is SQL*Net message from client. Most Oracle instruction teaches you that you're supposed to ignore this particular event, because it represents time that the Oracle kernel spends "idle," awaiting some command. However, this particular "idle" event accounts for 20.4% of our program's response time. To ignore the time attributed to this event would be an exceptionally bad policy, as you'll see in a moment.
12. All the profiles for this case study were computed using a pre-beta version of a software product called Hotsos Profiler P5. The profiles for this particular case may vary slightly in later, more thoroughly tested versions of the software. However, the gist of the case—the flow of the diagnosis—is exactly correct.
13. For details about why it works this way, see pages 170–175 of Optimizing Oracle Performance.
The third row in the profile, labeled CPU service, PARSE calls, is your first clue about what kind of CPU service it is that this program has been competing to use. A program can be preempted and tally instrumentation gap time only if it is trying to consume CPU service. This row tells us that this program has consumed nearly two seconds of CPU time making Oracle parse calls: 5,001 of them, in fact.

The fourth row in the profile (CPU service, EXEC calls) shows that the program consumed 0.411 seconds of its total response time actually inserting the rows into its table. Contemplation of this row puts things into perspective: of the 11.601 seconds it took to run this thing, the program spent less than half a second (only 3.5% of the job's run time) doing the work I had asked it to do. An efficient program would spend far less of its time doing anything other than inserting the rows we need inserted.

Experienced profile users will recognize a common poor-performance pattern in this profile. Notice that its number of parse calls nearly equals the number of exec calls: this program parses too much.
5.2 Drill-down

So, which SQL is it that's being parsed too much? The answer to this question lies in the job's profile by SQL statement (the job's call graph), shown in Exhibit 7. From this profile, it's easy to tell that the offending SQL statement is an insert statement, which single-handedly accounts for 97.7% of the job's 11.601-second total response time.

Exhibit 7. The SQL statement profile for the problem case.

                                             Duration of       Duration incl.
Cursor statement                             self (sec)        children (sec)
------------------------------------------   ---------------   ---------------
insert into parse2 values (1, lp…             11.333   97.7%    11.336   97.7%
│ select file# from file$ where ts…            0.002    0.0%     0.002    0.0%
│ update tsq$ set blocks=:3,maxblo…            0.001    0.0%     0.001    0.0%
│ select u.name,o.name, t.update$,…            0.000    0.0%     0.000    0.0%
[[no text for statement]]                      0.000    0.0%     0.094    0.8%
│ select order#,columns,types from…            0.039    0.3%     0.039    0.3%
│ │ select i.obj#,i.ts#,i.file#,i.bl…          0.000    0.0%     0.000    0.0%
│ │ select name,intcol#,segcol#,type…          0.000    0.0%     0.000    0.0%
│ │ select ts#,file#,block#,nvl(bobj…          0.000    0.0%     0.000    0.0%
│ │ select type#,blocks,extents,mine…          0.000    0.0%     0.000    0.0%
… (35 other statements have been elided)
------------------------------------------   ---------------   ---------------
Total                                         11.601  100.0%
The database call profile for this SQL statement tells the story of where the statement spent all this time. This profile is shown in Exhibit 8.
Exhibit 8. The database call profile for the problem case.

Oracle hash value:            2581399381
Statement re-use:             5,000 similar but distinct texts
Response time contribution:   11.333 seconds (97.7% of task total 11.601 seconds)

insert into parse2 values (1, lpad('1',20))

                        ---------Duration (seconds)---------
Database call            Elapsed            CPU      Other      Calls    Rows    LIOs     PIOs
--------------------    ----------------   ------   --------   ------   -----   -------   ----
Between-call events        6.541   57.7%    0.000      6.541        0       0         0      0
PARSE                      3.801   33.5%    1.843      1.958    5,000       0         0      0
EXEC                       0.991    8.7%    0.411      0.580    5,000   5,000    30,851      0
--------------------    ----------------   ------   --------   ------   -----   -------   ----
Total                     11.333  100.0%    2.253      9.079   10,000   5,000    30,851      0
--------------------    ----------------   ------   --------   ------   -----   -------   ----
Total per EXEC             0.002    0.0%    0.000      0.002        2       1         6      0
Total per row              0.002    0.0%    0.000      0.002        2       1         6      0
Most of the job's time was consumed between database calls. What's the best way to reduce the time consumed between database calls? The best way is to reduce the number of database calls to begin with. Exhibit 8 shows that this job's insert "statement" executed 10,000 database calls altogether, with 5,000 parse calls and 5,000 exec calls. Parsing consumed far more time than executing.

So, why does the job execute 5,000 Oracle parse calls? The highlighted text in the profile heading tells the answer: there are 5,000 similar but distinct versions of this SQL text. A quick look either at the raw trace data or the application code itself shows the following sequence of SQL statements that were processed during the problem job:

insert into parse2 values (1, lpad('1',20))
insert into parse2 values (2, lpad('2',20))
insert into parse2 values (3, lpad('3',20))
…
insert into parse2 values (5000, lpad('5000',20))
Each distinct SQL statement is, in turn, parsed and then executed. That's a big waste of time and energy, because with Oracle, an application doesn't need to do so much work to get the job of inserting 5,000 rows into a table done. A look at the application source code reveals the problem. The source code says, essentially:14

for each row, varying $n over some range of values
    $result = do("insert into t values ($n, …)")
14. Each year, we see this problem manifested in many languages including Java, C, Visual Basic, and others. In this paper, I use a pseudocode that's similar in structure to Unix shell or Perl programming, using the convention that scalar variables are denoted with a $ prefix. For example, the statement $x=1 assigns the value 1 to the variable named $x, and f($x) executes the function called f upon the value of $x.
Most database interfaces offer a function similar to the do() function shown here. A call to do($statement) behaves effectively like this:

$handle = parse($statement)
$result = exec($handle)
So the application is making database calls like this:

for each row, varying $n over some range of values
    $handle = parse("insert into t values ($n, …)")
    $result = exec($handle)
One rule of thumb that every Oracle application developer ought to know is this one: It’s almost never appropriate to execute Oracle parse calls inside a loop.
5.3 Proposed Solution

The original application code can be rewritten to produce exactly the same functional result more efficiently. The new code is this:

$handle = parse("insert into t values (?, …)")
for each row, varying $n over some range of values
    $result = exec($handle, $n, …)
Here are the changes:

1. The new code doesn't use do() at all. Instead, it separates the parse and exec calls into two distinct function calls, so that the parse may be executed once and only once, outside the loop.

2. In the parse call, it uses a placeholder (in this case, the ? symbol) instead of some actual value to be inserted. This allows for one parse call to create a reusable cursor handle that can accommodate many different exec calls with different values.

3. In the exec call, the argument list includes the actual values that are to be bound into the original SQL statement in the position denoted by the placeholder.
Writing the code this way should eliminate 4,999 parse calls and of course some amount of time that is attributable to those eliminated parse calls.
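In a real client language the rewrite looks much the same. The sketch below is an illustration under assumptions (python-oracledb driver, hypothetical connection details), not the code from the case study, which was written in Perl with DBI; the point is a single prepare of the insert with bind variables, followed by many execute calls that reuse the prepared handle.

    # Sketch: parse once, execute many times with bind variables.
    import oracledb

    conn = oracledb.connect(user="app", password="secret", dsn="dbhost/v92.hotsos")
    cur = conn.cursor()

    # One parse (prepare), outside the loop.
    cur.prepare("insert into parse2 values (:n, lpad(:s, 20))")

    # Many executes, each binding new values into the same prepared statement.
    for n in range(1, 5001):
        cur.execute(None, n=n, s=str(n))

    conn.commit()
    conn.close()

Many drivers also cache statements, which softens the cost of careless code, but preparing once outside the loop is the pattern that removes the parse calls themselves.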
5.4 Estimated Impact

In a little test case executed on my laptop PC, it's good enough to come up with an idea and then run with it. How much time will we save if we eliminate 4,999 parse calls? Easy: just change the code, re-run it, re-profile it, and see. But in the business world, sometimes it's not so easy. Sometimes we can't try something in a production system without constructing a compelling cost-benefit case to a manager who holds the authority to approve or reject the implementation of an idea. A profile provides excellent information for making that case.

Exhibit 9 is the best place to begin. It is the timed event profile for just the problem insert statement.

Exhibit 9. The timed event profile for the problem SQL statement.

Oracle subroutine                  Duration (secs)         # Calls    Dur/call
--------------------------------  -----------------      --------   ----------
instrumentation gap                   7.956   70.2%          9,861    0.000813
SQL*Net message from client           2.370   20.9%         20,000    0.000118
CPU service, PARSE calls              1.843   16.3%          5,000    0.000369
CPU service, EXEC calls               0.411    3.6%          5,000    0.000082
SQL*Net message to client             0.066    0.6%         20,000    0.000003
log file sync                         0.004    0.0%              2    0.002096
latch free                            0.003    0.0%             18    0.000192
instrumentation overlap              -1.320  -11.6%            139    0.009497
--------------------------------  -----------------      --------   ----------
Total                                11.333  100.0%
It is reasonable to expect that the proposed elimination of parse calls will result in the improvements depicted in Exhibit 10. The highlighted figures show the estimated result of the call reduction caused by rewriting the code. I expect the code change to impact more than just the CPU service, PARSE calls component. In fact, there are four timed events for which I expect an elimination of parse calls to create a response time improvement.

Exhibit 10. Estimated savings of the proposed source code modification to eliminate Oracle parse calls.

                               Time before   # Calls    Est. # calls   Est. time after
Response time component        (seconds)     before     after          (seconds)
-----------------------------  -----------   --------   ------------   ---------------
instrumentation gap                  7.956      9,861          4,862             3.923
SQL*Net message from client          2.370     20,000         15,001             1.778
CPU service, PARSE calls             1.843      5,000              1             0.000
CPU service, EXEC calls              0.411      5,000          5,000             0.411
SQL*Net message to client            0.066     20,000         15,001             0.050
log file sync                        0.004          2              2             0.004
latch free                           0.003         18             18             0.003
instrumentation overlap             –1.320        139            139            –1.320
-----------------------------  -----------   --------   ------------   ---------------
Total                               11.333                                       4.848
I have derived each "estimated time after" figure using the following relation:

    time_after = time_before × (calls_after / calls_before)
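As a quick check of that arithmetic (my own illustration, using the figures from Exhibits 9 and 10):

    # time_after = time_before * (calls_after / calls_before)
    def estimate(time_before, calls_before, calls_after):
        return time_before * calls_after / calls_before

    print(estimate(7.956, 9861, 4862))     # instrumentation gap         -> ~3.923 s
    print(estimate(2.370, 20000, 15001))   # SQL*Net message from client -> ~1.778 s
    print(estimate(1.843, 5000, 1))        # CPU service, PARSE calls    -> ~0.000 s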
Note that forecasting response time this way works only if two assumptions hold true. First, individual call durations must be uniformly distributed. If call durations are not uniformly distributed, then you need skew data to allow you to distinguish between best-case and worst-case predictions. It is therefore an important feature of profiling software to inform you about the skew inherent in the statistic that you're analyzing. Second, a timed event's response time must be directly proportional to the number of calls your program makes to the timed event. I'll show you shortly why sometimes this assumption fails to match reality.

It is therefore reasonable to suppose that response time for the statement will drop from 11.333 seconds (before) to 4.848 seconds (after). Plugging this estimated new response time value back into the SQL statement profile for the job (Exhibit 7) shows that we should expect the job that presently takes 11.601 seconds to consume only 5.116 seconds after the proposed change. That's not a bad day's work: it's a 56% reduction in response time. As I'll reveal shortly, a forecast computed like I've shown here is almost always a conservative estimate, which is additional good news.
5.5 Impact Analysis

Running the program again after pulling the parse call out of the loop produces spectacular results, as shown in Exhibit 11. The response time is now only 1.530 seconds—a whopping 87% reduction in response time.

Exhibit 11. The profile for the job formerly known as the "problem job."

Oracle subroutine                  Duration (secs)         # Calls    Dur/call
--------------------------------  -----------------      --------   ----------
instrumentation gap                   0.974   63.7%          4,993    0.000195
SQL*Net message from client           0.393   25.7%          5,010    0.000078
CPU service, EXEC calls               0.381   24.9%          5,011    0.000076
CPU service, PARSE calls              0.090    5.9%             11    0.008194
log file sync                         0.027    1.8%              1    0.027396
SQL*Net message to client             0.008    0.5%          5,010    0.000002
latch free                            0.000    0.0%              9    0.000000
instrumentation overlap              -0.343  -22.4%             38    0.009035
--------------------------------  -----------------      --------   ----------
Total                                 1.530  100.0%
I hope that two thoughts are swirling through your head at this point:

• Wow. This program runs even faster than I thought it was going to!

• Wow. You sure stink at forecasting!
The first thought is a pleasant one because it represents an expectation nicely exceeded. However, the second thought is not so nice. Missing a forecast by this much is treacherous, even when the error is “in your favor.”
Remember, my forecast estimated a 56% reduction in response time (from 11.601 seconds to 5.116 seconds), but what we actually achieved was an 87% reduction (to 1.530 seconds). What if your manager responsible for making the go/no-go decision on implementing this solution had had a 75% cut-off point in his head? Then he would have rejected our proposal on the grounds that an expected 56% improvement wasn’t good enough. You would never have had the chance to show off your good work. If the forecast had been more accurate, your manager would have approved the project. So, what happened!? Why were the results so much better than the forecast?
5.6 Forecast Error Analysis

To become a better forecaster, it is critical to practice using feedback. I presented a forecast for improvement in four separate lines of the statement's timed event profile. Let's analyze each individual line item of the forecast in turn. Exhibit 12 shows the analysis.

Exhibit 12. Analysis of my performance improvement forecasting error.

                               Before actual   Prediction   After actual   Prediction error
Response time component        (seconds)       (seconds)    (seconds)      (seconds)
-----------------------------  -------------   ----------   ------------   ----------------
instrumentation gap                    7.956        3.923          0.974              2.949
SQL*Net message from client            2.370        1.778          0.393              1.385
CPU service, PARSE calls               1.843        0.000          0.090             –0.090
SQL*Net message to client              0.066        0.050          0.008              0.042
-----------------------------  -------------   ----------   ------------   ----------------
Total                                 11.333        4.848          1.530              3.318
My overall estimate was wrong by 3.318 seconds. Here's why:

CPU service, PARSE calls
I actually did a pretty good job on this one. I missed by only –0.090 seconds.

SQL*Net message to client
I didn't do too badly here either, at least on the timing prediction. However, you might notice that I missed the change to the number of calls pretty badly. I forecast a reduction from 20,000 calls to 15,001 calls. My timing estimate was close because the total time consumed was small to begin with.

SQL*Net message from client
Here, my forecasting error is more significant. I missed by nearly a second and a half. The reason is that I predicted a reduction in calls from 20,000 to 15,001, but the actual call count dropped to about 5,000. Had I accurately predicted this call count drop, my timing prediction would have been 0.593 seconds for this response time component, which would have been off by only 0.2 seconds. The reason that the call count fell more than I expected is apparently a driver-specific issue that I haven't researched fully. It's an area in which I've satisfied myself that, for now, a pleasant surprise is good enough.15

instrumentation gap
Here's where I really messed up, missing by nearly three full seconds. The reason for this part of the forecasting error is, happily, predictable using concepts described in another of my works.16 In a nutshell, the problem with the forecast is that I predicted a linear improvement in response time (remember the formula I showed for computing time_after?). However, the delays induced by queueing for a CPU don't work in a linear way. Exhibit 13 shows how response time (R) for a single-CPU system degrades (goes up) non-linearly as utilization on the system (ρ) increases (moves to the right). Moving left-to-right on such a curve is a miserable experience: response time degrades ever more steeply as load increases, especially when you get rightward of the little dashed line—called the knee in the curve—shown here at ρ = 0.5. However, eliminating parse calls—which we've done with the application code change—represents a right-to-left movement along this curve, which is a very happy thing: response time improves better than linearly. I predicted a linear improvement in response time, but the actual improvement was better than linear, so my forecast was pessimistic.
Exhibit 13. Response time (R) as a function of utilization (ρ).
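Exhibit 13's curve is the familiar shape that simple queueing models produce. As an illustration only (the paper's own treatment of the subject is chapter 9 of Optimizing Oracle Performance), the single-server M/M/1 relation R = S / (1 − ρ) shows why reducing CPU demand buys a better-than-linear response time improvement; the service time S below is hypothetical.

    # R = S / (1 - rho): response time for a single-server queue with service time S.
    # Illustration only; assumes the classic M/M/1 model, not a calibrated Oracle model.
    def response_time(service_time, utilization):
        return service_time / (1.0 - utilization)

    S = 0.010  # hypothetical service time, seconds
    for rho in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"rho={rho:.1f}  R={response_time(S, rho) * 1000:.1f} ms")

    # Moving from rho=0.9 back to rho=0.5 cuts R from 100 ms to 20 ms: a 5x improvement
    # for a 0.4 drop in utilization, which is why a linear forecast comes out pessimistic.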
15. In my test, I used Perl with the DBI module version 1.37 and the DBD::Oracle module version 1.12.
16. See chapter 9, "Queueing Theory for the Oracle Practitioner," in Optimizing Oracle Performance, by Cary Millsap and Jeff Holt.
5.7 Next Steps

A good thing about software profilers is that they make it much easier to determine when you're finished "tuning." In this case study, I've eliminated 87% of the program's original response time. The job now spends 24.9% of its time actually inserting (CPU service, EXEC calls) and 25.7% of its time in the client application and communicating between the database and client processes (SQL*Net message from client). But it still spends 63.7% of its time doing other things (presumably managing process preemptions).

Is this good enough? That is a question for your business to decide. The answer comes from answering the following questions: How much better could we make it? How much would it cost to make it that much better?
One strategy for eliminating even more database calls plus undoubtedly some of the SQL*Net message from client duration is to use Oracle’s array insertion feature. So the question becomes: Is the performance you might gain by using Oracle’s array insertion feature within the application worth the cost of learning how to use it?
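As a sketch of what array insertion might look like (again assuming the python-oracledb driver; the case study itself was written in Perl with DBI), the idea is to send many rows per execute call, which eliminates database calls and client-server round trips at the same time:

    # Sketch: array insertion — one parse, and one exec per batch of rows.
    import oracledb

    conn = oracledb.connect(user="app", password="secret", dsn="dbhost/v92.hotsos")
    cur = conn.cursor()

    rows = [(n, str(n)) for n in range(1, 5001)]
    cur.executemany("insert into parse2 values (:1, lpad(:2, 20))", rows)

    conn.commit()
    conn.close()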
I leave that decision to you as an exercise.
6 Conclusion

A profiler should be the backbone of any performance measurement strategy. A good Oracle profiler gives you the first information you need in any performance problem diagnosis project: exactly where all of your program's time has gone. A profiler gives you two capabilities that you can't find anywhere else:

Measuring
The core benefit of using a profiler is that you can see exactly what impact a given problem creates for exactly the program you care about. This capability is indispensable in problem diagnosis situations, and even in preemptive performance management tasks.

Forecasting
A profiler aids forecasting in two distinct ways. First, it allows you to play what if? for each component of your program's response time. It shows you the details of what impact a proposed change will have upon your program, with a minimum of mathematical effort. Second, it allows you to see the exact result of a change after you make it. When you use a profiler in a test environment, you get a much more detailed idea of how the change you are testing will behave when you implement it in production.
One of the most important practical features of a profiler is that it shows you where investment of time and energy is futile. Imagine banking your hopes that a USD 250,000 I/O subsystem upgrade will cut your response times in half. Now, before the upgrade, imagine your profiler output showing that every one of your Top Twenty Slow Programs has less than 5% of its total time consumed by I/O processing. Introduction to profiling commonly changes planning in a profound way. By selecting the particular case study that I’ve shown in this paper, I’ve meant to illustrate several lessons, including the following.
• Some common performance problems, like the one described here, are much easier to diagnose with a good profiler than with any other tool. Other metrics (low system-wide parse-to-exec ratios, for example) don't provide the measurement and forecasting capabilities that a profiler provides.

• Sometimes, it's difficult to create an accurate response time forecast—I chose this case study specifically to illustrate this point. However, a profiler-based forecast always provides excellent value. When a profiler-based forecast is wrong, it will almost always be wrong in the pessimistic direction.

• Eliminating work is a highly effective optimization strategy, for two reasons. First, eliminating work of course reduces response time for every instruction you can eliminate from your program's code path. But what many people fail to consider (I think because their profilers don't illuminate the issue) is that, second, eliminating work reduces response time both for your program and for every other program on the system that was queueing behind the repaired job for some resource. The impact is particularly prominent when you're running multiple concurrent copies of the same program.
Acknowledgments

Thank you to the people who have made Oracle profiling possible: Anjo Kolk, the father of response-based Oracle performance analysis; Mogens Nørgaard, the man who introduced me to the Oracle features that make profiling possible; Virag Saksena, who showed me the true potential of extended SQL trace data; Jeff Holt, the father of our Hotsos Profiler product line; Jon Bentley, the author of More Programming Pearls, which planted the profiling seed in my mind; Jonathan Gennick, my O'Reilly editor who helped me learn; Gary Goodman and the staff at Hotsos Enterprises, who make it possible for me to feed my family while having fun at work. Thank you to my contributors and proofreaders: Larry Klein, Karen Morton, and James Steel. And thank you to my beautiful and loving family—Mindy, Alex, and Nik, and my parents Van and Shirle—who support me every day.
About the Author

Cary Millsap is a software developer and instructor for Hotsos Enterprises, Ltd., a company that he co-founded in 1999. He is the author, with Jeff Holt, of Optimizing Oracle Performance, a work for which the authors received the Oracle Magazine Authors of the Year award for 2004.
About Hotsos

Hotsos Enterprises, Ltd. is a Dallas-based company with more than 600 customers and clients worldwide that use the company's software products, classroom and on-site education, and consulting services to improve the performance of their Oracle systems. Hotsos's revolutionary methods and tools help database administrators and application developers solve performance problems faster and more permanently than ever before. For more information on Hotsos products and solutions, visit www.hotsos.com or call +1.817.488.6200.

You can find more information about Hotsos products and services related to this paper by visiting the following resources:

http://www.oreilly.com/catalog/optoraclep/index.html
This is the home page for the book Optimizing Oracle Performance. It includes a free download of the first chapter.

http://www.hotsos.com/courses/PD101.php
This page describes the Hotsos educational course that covers collection and interpretation of Oracle performance diagnostic data.

http://www.hotsos.com/products/profiler.html
This page describes the Hotsos Profiler, a software profiler for Oracle extended SQL trace data. The Hotsos Profiler P5 (scheduled for release in early 2005) is the only profiler for Oracle that possesses all of the profiling features discussed in this paper.

http://www.hotsos.com/services/index.html
This page describes Hotsos on-site and remote performance services for helping you optimize your Oracle systems.
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.