Software Test & Performance
A BZ Media Publication
VOLUME 5 • ISSUE 5 • MAY 2008

BEST PRACTICES: Requirements Management
To Attract a Good Team, Pitch The Job and Fork Over the Dough
Beat the Guillotine By Stress Testing: Know How Much Stoning Your Systems Can Withstand
Ignite Your Deployed Apps
Contents

COVER STORY
14
Guard the Castle Keep and Foil Angry Mobs With Stress Testing
Systems often fail under pressure before getting any serious attention. Soothe irate users and dodge the pitchforks—with stress testing. By Alan Berg
20
How to Attract The A-Team
Strong, well-rounded test teams boast a healthy number of software engineers. But how do you lure these valuable players to your group—and when you’ve got them, how do you keep them there?
By Jeff Feldstein
26
How Much Punishment Can Your App Take?
To get a clear picture of the kind of abuse your application can take from the public, use multi-scenario load and performance tests. By Prakash Sodhani

31
Deploy And Drag?
Got a killer app that inexplicably bogs down in deployment? We'll show you how to use free client-side-only tools to troubleshoot those bothersome performance bottlenecks and get your apps up to speed. By Michele Kennedy

Departments

7 • Editorial
Is your Web site ready for prime time?

8 • Contributors
Get to know this month's experts and the best practices they preach.

9 • Feedback
It's your chance to tell us where to go.

11 • Out of the Box
New products for testers.

36 • Best Practices
Does natural language mean you can dump documentation? Not exactly. By Geoff Koch

38 • Future Test
Choose your weapons! The blossoming security market offers an array of appliances to protect your systems. By Ryan Sherstobitoff
VOLUME 5 • ISSUE 5 • MAY 2008

EDITORIAL
Editor: Edward J. Correia, +1-631-421-4158 x100, ecorreia@bzmedia.com
Editorial Director: Alan Zeichick, +1-650-359-4763, alan@bzmedia.com
Copy Editor: Laurie O’Connell, loconnell@bzmedia.com
Contributing Editor: Geoff Koch, koch.geoff@gmail.com

ART & PRODUCTION
Art Director: LuAnn T. Palazzo, lpalazzo@bzmedia.com

SALES & MARKETING
Publisher: Ted Bahr, +1-631-421-4158 x101, ted@bzmedia.com
Associate Publisher: David Karp, +1-631-421-4158 x102, dkarp@bzmedia.com
Advertising Traffic: Phyllis Oakes, +1-631-421-4158 x115, poakes@bzmedia.com
Director of Marketing: Marilyn Daly, +1-631-421-4158 x118, mdaly@bzmedia.com
List Services: Lisa Fiske, +1-631-479-2977, lfiske@bzmedia.com
Reprints: Lisa Abelson, +1-516-379-7097, labelson@bzmedia.com
Accounting: Viena Ludewig, +1-631-421-4158 x110, vludewig@bzmedia.com

READER SERVICE
Director of Circulation: Agnes Vanek, +1-631-443-4158, avanek@bzmedia.com
Customer Service/Subscriptions: +1-847-763-9692, stpmag@halldata.com
Cover Art by The Design Diva, NY
President Ted Bahr Executive Vice President Alan Zeichick
BZ Media LLC 7 High Street, Suite 407 Huntington, NY 11743 +1-631-421-4158 fax +1-631-421-4130 www.bzmedia.com info@bzmedia.com
Software Test & Performance (ISSN #1548-3460) is published monthly by BZ Media LLC, 7 High Street, Suite 407, Huntington, NY 11743. Periodicals postage paid at Huntington, NY, and additional offices. Software Test & Performance is a registered trademark of BZ Media LLC. All contents copyright 2008 BZ Media LLC. All rights reserved. The price of a one-year subscription is US $49.95, $69.95 in Canada, $99.95 elsewhere. POSTMASTER: Send changes of address to Software Test & Performance, PO Box 2169, Skokie, IL 60076. Software Test & Performance Subscriber Services may be reached at stpmag@halldata.com or by calling +1-847-763-9692.
Ed Notes

The Stress Of Going National
Edward J. Correia

Watching the Today Show one morning in late January, I saw a segment about Rottenneighbor.com, a Web site relatively new at the time that lets people identify and rate their neighbors—good or bad—and post comments about them. The idea is to give people about to relocate a semblance of who lives nearby.

As someone who has lived through a nightmare neighbor scenario, this seemed like a great idea to me. You never really know what kind of people are living next door, even if you’re lucky enough to meet them beforehand, as I did. But the trials (literally) and tribulations of my family are a story for another time. Today’s point is about your Web site’s performance under load.

When I got to the office that morning, I couldn’t resist looking up my own neighborhood to see if I or any of my neighbors had been written up. Judging from the site’s performance, my guess was that lots of other people had the same idea. To call it sluggish would be kind. A more accurate term would be glacial.

A simple interface starts out well enough. There’s a one-line form that defaults to a quick search—just type in a zip code and hit enter. A mashup with Google Maps appears, with red or green cartoon houses (a la Hasbro’s Monopoly game) corresponding to homes that have been written up by one or more neighbors.

When I tried it for the first time, a map of my neighborhood came up relatively quickly. Not knowing what the site was supposed to look like, I just figured that none of the houses in my area had comments attached.

Unimpressed, I clicked away from the browser window, but didn’t close it. When I happened upon it about 30 minutes later, the map contained a few red Monopoly houses, but nothing happened when I clicked on one. “Wow, there are some red houses there.” Again I was unimpressed.

The problem? The Web site failed to provide feedback. If the cursor changed to an hourglass or stopwatch, or if a word like “Loading…” popped up, I would have understood immediately (or maybe after half a minute) that the site was having performance issues.

Even a 404 error page would have been preferable to this application’s behavior. But because the app appeared to be working properly to the first-time visitor, I drew an incorrect conclusion about the site’s value to me. If I had been in the market to buy something, this folly would surely have resulted in a lost sale, and perhaps a permanently lost customer.

As it turns out, I returned to the site a few days later and found better performance. I figured it must have been overwhelmed with hits after national exposure on the Today Show. Calls to the company for comment on my theory and its load testing practices, if any, were not returned.

The developers at Rottenneighbor.com have since added user feedback during long load times. A message in a translucent gray box states “Neighborhood Ratings Loading…” whenever a map is adjusted: a handsome improvement. Along with red and green, the site also now displays gold houses, which indicate multiple comments on a single house or multiple houses grouped together.

They’re obviously working hard to keep people coming back. And they’ll need to; there are far too few comments in the areas I looked at for this tool to be effective. And the site’s performance under load might still be an unknown variable.
Contributors

ALAN M. BERG again graces our pages with his enjoyable style, this time in the area of last-minute stress testing. Beginning on page 14, Berg shares his techniques for avoiding the angry villagers your application’s users become when they don’t get the performance they need and expect. Berg is the lead developer of Central Computer Services at the University of Amsterdam, a post he has held for more than seven years. He holds a bachelor’s degree, two master’s degrees and a teaching certification.
JEFF FELDSTEIN manages a team of 40 software engineers across the U.S., Israel and India. He’s manager of software development at Cisco Systems, and is always among the most highly rated speakers at Software Test & Performance Conferences. Why? Because he’s extremely adept at recounting wryly the experiences of his 25-year career as a software developer, tester, development manager and computer consultant. Starting on page 20, Feldstein instructs you on the art of attracting and hiring high-quality test teams, and the skills needed to retain them.
Load testing comes in many shapes and sizes. In his years as a quality professional with top IT organizations, PRAKASH SODHANI has seen many of those scenarios, and no two are exactly alike. Turn to page 26 to learn about some of those scenarios and some techniques for best tackling each. Prakash is a quality control specialist with a global IT services organization. He holds a master’s degree in computer science, is a Certified Software Test Engineer, Certified Quality Improvement Associate and Sun Certified Java Programmer, and holds MCP (SQL Server 2000, C#) and Brainbench's Software Testing certifications.
First-time feature writer MICHELE KENNEDY has been a programmer since 1988 when she developed salesforce automation systems with PickBASIC for tech publisher CMP Media. She left CMP to join TSI, a mapping and spatial technology consultancy, where she remains as a senior analyst. Michele shares the lessons she learned while troubleshooting a Web-based database application that performed well in the lab but slowed to a crawl once deployed. To learn how she used free tools to determine where the application was spending time—without packet sniffers or server-side software—turn to page 31. TO CONTACT AN AUTHOR, please send e-mail to feedback@bzmedia.com.
Feedback

TESTING REVOLUTION
Regarding Edward J. Correia’s article “10 Things I Hate About Testing” (Test & QA Report, March 18, 2008), you may refer to an article from Cem Kaner, “The Ongoing Revolution in Software Testing,” at www.kaner.com/pdfs/TheOngoingRevolution.pdf. He is one of the most influential and vocal figures in software testing, and I am sure you have heard of him.
Alex Almeda
Manila, Philippines
KNOW IT WHEN YOU SEE IT
“10 Things I Hate About Testing” is a great article! I work on the other side of the fence selling automated integration testing software. In early discovery phases, I quickly learn about my prospect’s desire to automate, even though they cannot articulate formal requirements. It is like going Christmas shopping on December 24. Upon entering the store, the rep asks, “What are you looking for?” The response: “I am buying a present for my wife.” The rep inquires, “What are you looking for?” The answer: “I don’t know, but when I see it, I will know.”
Jim Bernstein
Claymont, DE
WORK, NOT PLAY
I have been working as a test engineer for almost two years here in Brazil. I’ve just read “10 Things I Hate About Testing” and I agree with the first five things you pointed out. I have myself faced all of them. When I joined the company, testers’ salaries were lower than developers’. This has changed in theory, but is still not a reality for most testers here. The topic I most agree with you on is the phrase “Testing is easy and anyone can do it.” After these two years working with other testers and developers, the perception that some developers have of testing and tests is clear. In the last four months I’ve been working on a mobile game project. At the beginning I could see that some of the developers thought my work was to have fun playing the game, but during the project I’ve been able to prove otherwise, showing that testing is a complex activity that needs concentration, focus and certain abilities.
FROM THE BLOGOSPHERE
In the February 2008 issue of Software Test & Performance, [I] read Karen Johnson’s article, “Multi-User Testing Should Be in Every Field of Dandy Test Designs.” Karen writes about a very rare subject: functional multi-user testing. [I] should admit that I started to read with a thought “one more article about performance testing” but soon realized that it is about quite [a] different subject. And yes, indeed, without functional multi-user testing, most of [the] errors mentioned in [the] article will slip through… Some may be found during performance testing (probably the most severe, like deadlocks, if they are in the typical scenarios included in performance testing), and you will need to trace them down to the source, and probably it will be much later down the cycle. I am afraid that very few companies really do functional multi-user testing (as described in the article), although this is just a speculation based on my personal limited experience. By the way, it is a good illustration that multi-user testing and performance testing are not synonyms. We can have single-user performance testing and multi-user functional testing.
Blog post by Alexander Podelko, www.testingreflections.com/node/view/6665
I’m feeling pleased by receiving positive feedback from team members congratulating my work on the project. Two other things I personally think and hate about testing:
• Developers and managers thinking automation is the solution for all their problems! I do not have much experience in test automation, although I’ve read some papers and articles about it. As a tester I understand the complexity that a successful test automation initiative involves, which certainly is not achieved by automating everything you have.
• Developers and other team members saying they do exploratory testing when they do not have any knowledge about the complexity of effective exploratory testing.
I’m waiting for your other five topics.
Josè Carrèra
Recife, Pernambuco, Brazil
JUST A WARM BODY?
The main thing I hate about testing is not having management back me up. I have to defend my job and position over and over to developers, managers and support staff because quality is so low on their list of priorities. They didn’t have QA for years, so they don’t know what to do with me. Then when I do point out an error or errors (which is pretty much all the time), not only the developers but management as well immediately go on the defense. I frequently ask myself “Why am I here? They don’t even want me here. Why did they even hire me? They just want a warm body to be able to say in meetings with customers, ‘Yes, we have QA.’” If management would believe in QA, I could take this product to much higher levels!
L.H.
Dallas, TX

FEEDBACK: Letters should include the writer’s name, city, state, company affiliation, e-mail address and daytime phone number. Send your thoughts to feedback@bzmedia.com. Letters become the property of BZ Media and may be edited for space and style.
Out of the Box
It’s Good to Have A Clarifire When The Job Is Process Management
Clarity is critical for any successful team—each member acting as clarifier. To that end, automation tool maker eMason has released Clarifire 3.0, adding collaboration and automated document assembly capabilities to its flagship process management tool. Clarifire combines a process and business rules engine and customizable data collection system with a role-based environment.
According to the company (www.emason.biz), a portal new in Clarifire 3.0 gives teams a “secure meeting place for the exchange of information, documents, ideas and training.” The portal also provides secure access to user-automated processes and dashboards, the latter offering a single place to manage integration. “Old wrapper applications, macros and mini apps can be eliminated with Clarifire providing central, secure, automated integration.”
Enhancements to the rule engine add a level of “automation based on user actions.” Documents are created through the selection of collected data and conform to provided templates.

THE PROJECT MANAGEMENT DASHBOARD in Clarifire 3.0 displays broad status at a glance.
Borland Smoothes Lifecycle QA With Silk
Borland in early April began shipping Silk 2008, with enhancements that the company says fit the QA tool to every stage of the application development lifecycle. All three of Silk’s main modules have been updated in the release.
SilkTest 2008, the functional and regression-testing module, introduces Open Agent, part of a framework that delivers “the latest support for Web 2.0 applications built on Adobe Flex” and Windows, the company said. An Eclipse interface and Java-language scripting capabilities are scheduled to be added “later this quarter.”
SilkPerformer 2008, the suite’s load and performance testing module, also adds support for so-called Web 2.0 technologies, including Flex via AMF3. The update also enhances AJAX savvy with recognition of XML and JSON requests. There’s also “comprehensive support,” said the company, for Remedy Action Request System, BMC Software’s service process management platform.
“Enhancements to the Silk products reflect… better manual testing, expanded versatility around automation and improved enterprise support of the quality life cycle,” said Brad Johnson, Borland’s senior director of product marketing.
Holding the reins to the suite is SilkCentral Test Manager 2008, which the company claims can now better ensure “reuse and management of large sets of test assets, greater flexibility for manual testers, security improvements and significant performance and scalability updates.”

SILK 2008 enhancements include testing apps made with Adobe’s FlexStore (shown).
Original Software Offers Analgesic For Testers’ Pain
Original Software (not original slogan) offers “an aspirin for the testing community” with an update to its TestDrive-Assist semi-automation and reporting tool and ManualTesting.com, a resource devoted to manual testing practices and practitioners.
TestDrive-Assist, which gives manual testers screen, input, database and status information from their tests, now includes a link validation panel and can import test plans and use them as a checklist. Spell checking also has been improved, the company said.
“It is a fact that manual testing is here to stay, but the process continues to be a major bottleneck of application development,” said company CEO Colin Armitage of the hoped-for benefits of the new Web site. “With the launch… we hope to assist the community with their manual testing processes.”
The Web site provides free resources, tutorials, screencasts, whitepapers, an “ask the experts” zone and discussion forums.
VMware in The Development Lifecycle
On March 31, VMware Lifecycle Manager became generally available. The tool provides management information and capabilities to simplify the task of measurement and chargeback of virtual machine creation and usage to individuals and departments.
Information collected and reported by Lifecycle Manager includes virtual machine approvers and owners, dates of request and acquisition, deployment locations, time in operation and scheduled date of decommission. Automation features help avoid manual, repetitive and error-prone tasks, and allow enforcement of corporate policies.
The new tool also allows IT to “create a catalog of standard IT services,” enabling users requesting a virtual machine to select from a predefined menu of configurations that vary by processor, memory and other settings. APIs are provided for integrating Lifecycle Manager with existing enterprise tools.

Sigma Tracks Virtual And Physical Tests
What VMware does for virtual environments, Sigma does for tests. Sigma Resources and Technologies on April 15 began shipping a version of its SigmationTF testing framework that “manages the virtual test environment while VMware Lab Manager automatically sets up and manages virtual machines and infrastructure.” The company claims to save significant time and resources for test departments using VMware and to accelerate the process of implementing tests on virtual machines, as well as managing and supporting those machines.
Panorama 5 Now Tracks .NET
OPNET Technologies now includes .NET capabilities in Panorama 5.0, the latest version of its real-time analytics and application performance-management system. The tool now supports both Java and .NET; it began shipping in late March.
Panorama reportedly can span multiple servers and application tiers to provide “real-time performance data of complex applications by monitoring system and application metrics within each server and across all tiers.” The data can be helpful for spotting performance anomalies, identifying trends and making business decisions.
CodeSonar: This Is Not Your Gramma’s Analysis Tool
Declaring an industry first, GrammaTech announced last month that CodeSonar Enterprise, its flagship static analysis tool for C/C++ code, is “compatible with all aspects of the Common Weakness Enumeration standard.”
CWE is a dictionary of software weakness types agreed upon by dozens of software companies, mostly security-related, and developed by MITRE Corp. for the National Cyber Security Division of the Department of Homeland Security.
“CWE is an important and valuable initiative that will help CodeSonar users understand the state of their code more effectively,” explained Paul Anderson, GrammaTech’s vice president of engineering.
CWE was created to serve as a “unifying language of discourse and a measuring stick for tools and services,” according to a GrammaTech document accompanying the release. IBM, Microsoft and Oracle are among the companies in cooperation with the effort.
Content Manager 4.5 Gets More Savvy With Drag-and-Drop
Savvy Software last month unveiled Savvy CM 4.5, the latest version of its Web site content management system, which now includes the ability to move content and navigate a site using drag-and-drop. And for shops using ColdFusion, CM 4.5 now includes administrative tools for the Adobe environment with broad site-management capabilities.
According to the company, Savvy CM offers one level of interaction for Web developers and another for a Web site’s users, allowing the latter group to create, edit and manage Web site content without a developer’s coding know-how. Content editing is done by “clicking on an area of a Web site in Savvy’s browser-based interface, updating the information and then publishing it to the Web with another click,” said the company.
Other features introduced in version 4.5 include the ability to password-protect content for multiple user classes, groups and other management roles; a “publish all pending content” command; the ability to move a page and all its relations and to designate a page as active, disabled or deleted; and a log-in object for embedding security into a template.

Send product announcements to stpnews@bzmedia.com
Guard the Castle Keep and Foil Angry Mobs With Stress Testing
How Stress Testing Can Help You Avoid The Angry Users, Their Pitchforks and Torches

By Alan Berg
Alan Berg is the lead developer at the University of Amsterdam.

It should never happen, but it does. A sad truism is that the professional stress tester is routinely asked to come in and perform some form of capacity testing on a mission-critical enterprise Web application at the last of last minutes. Not every time, but it happens just enough to make you wonder where and whether the collective common sense in any particular organization exists.

For a significant number of large organizations I’ve seen, a system first has to fail under pressure or start looking to users like it’s about to self-destruct in order to draw any serious attention. In fact, the system administrator gets scared only when systems start creaking and bending into extreme yoga positions. Bottlenecks hide behind other bottlenecks; minor bugs grow to legendary proportions. Suddenly there’s a flash of lightning, and anonymous wailing begins from the throngs of angry villagers.

Often just as unexpectedly, the clouds part and funds become available for the department in pain to find vital maintenance, performance measurement, results dissection and planning for a course of action. Higher-level management sits by to urge the workers on. They return to the path toward stability, but with little time and under intense pressure.

With stress testing tools, some things can be done quickly, and others take more time, effort and experience. Adding extra servers to the front of your load-balanced topology may sound like a good idea. But in reality, if the bottlenecks are due to capacity limits of your third-tier database server, adding more horsepower won’t eliminate the bottleneck, but will increase the risk to your infrastructure.
Put Time in a Box
Doing a thorough job in precious little time requires time boxing: dividing the project into small segments, each with its own objectives, deadlines and budget, if necessary. It’s also vital to know
where to find quick wins—those requiring just a small amount of analysis to steer you in the right direction and move you toward your objective. Here are a few tactics that have worked for me over the years, proving useful on a number of large-scale, public-facing systems.

Know your enemy—and quickly. Powerful tools such as JMeter and LoadRunner are the nuclear bombs of the performance-testing arsenal. They can hurt your systems fast and with relatively little hardware, so they’re best used with caution. It’s easy to mistake a thread in JMeter (or a similar tool) for the equivalent of a user. However, that’s simply not true. JMeter can potentially react between 10 and 20 times faster
than an average human can. A typical desktop with JMeter can run 300-500 threads—depending on the complexity of the task—a number equivalent to approximately 3,000 concurrent users. Running JMeter over five desktops in master-slave configuration raises that level to 15,000 concurrent users, the rough equivalent to a community of about 150,000 Web users. In other words, you can thoroughly stress out even the biggest university infrastructure in the world or almost any popular Web site with just a few desktop computers. With that kind of power just clicks away, you can do a lot of damage. Fun though it is, knocking out a system is not the purpose of stress testing. It’s mostly to gain confidence that the
infrastructure can survive the worst (and most heavily trafficked) day of the year, to tune the system and help identify bottlenecks, and to wring that last 25 percent of capacity out of every single component. Therefore, it’s important to base your expectation of stability on realistic tests that stress all of the subsystems.
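If you go the master-slave route, the slaves each run JMeter’s server process and the master drives them all from a single test plan. Here’s a minimal command-line sketch, assuming JMeter’s standard remote-testing support; the plan file and host names are hypothetical placeholders for your own lab machines:

# on each load-generating desktop, start the JMeter slave:
jmeter-server

# on the master, run the plan in non-GUI mode against all slaves
# and collect the results in one file:
jmeter -n -t stress-plan.jmx -R slave1,slave2,slave3,slave4,slave5 -l results.jtl

Run this way, each desktop executes the full thread group defined in the plan, so five slaves multiply whatever thread count the plan specifies.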
Hits per Hour
The first important number to measure is hits per hour. Apache logging is adequate for most numbers, and interpreting the data is straightforward. To count the number of hits on February 12 between 1300 and 1400 hours may be as simple as entering the following command-line instruction:

grep 12/Feb/2008:13 access_log | wc -l
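To find the peaks worth testing against, you’ll eventually want that count for every hour of the day, not just one. A minimal shell sketch, assuming the standard Apache timestamp format used above:

# hits per hour for February 12, 2008
for h in $(seq -w 0 23); do
  echo -n "$h:00  "
  grep "12/Feb/2008:$h" access_log | wc -l
done

The -w flag zero-pads the hour so the pattern matches the two-digit hours in the log.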
Let’s say that hour’s count comes back as 1,000,011. One hit includes the loading of each gif on the page, plus the JavaScript and all the other things the Web browser downloads to render a page. On the systems I test, the ratio of HTML pages to the rest is around 1:5 or 1:10. For stress testing at a constant rate, you’ll therefore need to hit the Web pages around 200,000 times an hour. And if you want to test for certain stability on the worst day of the year, you should probably double this to 400,000 hits per hour. I wouldn’t worry about distributions quite yet. Remember our scenario; we have time only for quick wins.

If you’re lucky and have a Debian Linux box hanging around, I suggest using Bradford Barrett’s Webalizer. It’s a fast and efficient log analyzer that installs with one simple command:
sudo apt-get install webalizer

Generating a detailed report is just as simple:

sudo webalizer -Q access_log

Note: -Q stands for really quiet, which is the way I like my log crunchers. On completion of the analysis, Webalizer updates the report under the directory /var/www/webalizer.

FIG. 1: BRADFORD BARRETT’S WEBALIZER

Figure 1 shows a day’s usage pattern for one of the farms of Web servers at the University of Amsterdam’s LMS.
From the report, you get an understanding of the relative importance of pages, particular IP addresses, kilobytes of transfer and time of day. When you’re planning for the running of scheduled tasks such as backups, virus checkers, database updates and the like, it’s particularly important to be aware of your time-of-day stats. Zooming in, you can clearly see two main peaks—one at 1300 and one at 2100 hours. If you’re looking for sophisticated testing, you must take into account each population’s usage patterns by narrowing the period. Again, I normally use grep to split into time-of-day-bounded files.

You have a range of Web analyzers to choose from; however, even the best tool is only as good as the data that is given to it. Make sure that your log files record the response time per request; sometimes this value isn’t set by default during installation. For example, for Apache 1.3, you may need to change the common format in a similar way to:

LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%h %l %u %t \"%r\" %>s %b %T" common

Once enabled, working out average session times is viable. Tools such as Awstats are worth the extra effort of configuring to gain such figures.
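Even before configuring a full analyzer, a one-liner can sanity-check the numbers. A minimal sketch, assuming %T was appended as the last field of each line, as in the second LogFormat above:

# average request duration in seconds (%T is the last field)
awk '{ sum += $NF; n++ } END { if (n) printf "%.2f seconds average\n", sum / n }' access_log

Watch that average during your test runs; a sharp rise is usually the first visible symptom of a bottleneck.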
A Trial Run
Let’s say we learned that for the normal user, the average session time is 350 seconds, with 175 hits per user. That equals 0.5 hits (not HTML pages) per second. Assuming a constant hit rate for the hour of testing:
Total hits / (time of test × user hits per second) = number of users on the system

1,000,000 / (3,600 × 0.5) ≈ 555 users

To make things more complex, we’re taking only a rough estimation of the capacity of one machine in a small park of three. Therefore, for semi-realistic testing, we need about 1,700 threads hitting 600,000 HTML pages an hour, which means a delay-per-page-hit of around eight seconds per thread. You now have to choose either to hit the system harder from one desktop, or softer and more realistically from three.
Caching improves performance significantly for most modern deployments. If not done correctly, load testing hits only a few pages hard. Caches fill up, and then you’re only measuring a near-ideal situation. Therefore, even if you have a very short time box to achieve a realistic test, you may still need a range that’s as wide as possible.

To help with traffic analysis, I use WebDruid, a tool similar to Webalizer. Out of the box, WebDruid produces a nice graphic that clearly defines user flow and simplifies detailed planning. Figure 2 is a screen grab of a small part of such a flow showing ratios between different events and volumes. I use a combination of grep and a log analyzer to see emerging patterns at different times of the day. Ratios are important for stress tools that have widgets that can divide the flow of testing. Fine-tuning the ratios and listening to the answers from the volume calculations is important, especially when fiddling with the delays in tests. Having a large random dataset to help the threads fill the system caches makes for quite reasonable testing that predicts expected capacity.

FIG. 2: A WEBDRUID FLOW DIAGRAM (user-flow ratios among the home page, /webapps/login and /sso_login.html)

FROM THE TRENCHES
With a good team, some sticky tape and decent instincts for estimation, stress testing can deliver significant improvements in already stressed environments. I caught up with performance engineer Chris Kretler at a recent conference for the open-source learning management system Sakai. Chris works in a university environment dealing with extremely large-scale deployments. We talked about his time in the trenches and how he stops the villagers from becoming angry mobs with pitchforks and torches. —Alan Berg

Alan Berg: What’s your current function?
Chris Kretler: I’m a QA performance engineer at the University of Michigan. Michigan has a dedicated environment to test performance on new configurations of hardware and software. We currently use LoadRunner to generate load at or above production levels.

Which types of systems are you testing [or] have tested before?
Prior to Michigan, I worked for Compuware as a consultant in Silicon Valley for 10 years. Most of the work there was with customer-designed Web applications, ERP packages like SAP and Oracle, and Citrix applications. The focus at Michigan is Sakai, an open-source Java education package.

Have you any hints or tips for the readers?
Three things come to mind:
1. Study your production environment. This is what we want to replicate in our test environment. What mix of transactions occurs at peak periods? For Web applications, how many hits per second occur at peak? How large is our database in production?
2. Set up a baseline data environment; back up and restore to this prior to large tests. The initial data setup will take time, so make sure to include this in your test plan. Having the same data environment prior to key tests means that performance differences are likely due to the software or hardware changes you’re testing for. Limiting the number of changing variables between tests is good practice.
3. Get the experts involved. The response times and throughput generated by most performance testing tools are a good place to start. Just as important is the data from application and database servers. Interpreting production-scale test results requires the expertise of those who maintain the systems in production. Their time is often short, so it’s important to schedule tests when they can be available.

Can you think of any generic examples of bad practice?
I can think of two things. A tester can often control the first, but not the second:
1. Testers often work with inadequate representations of production. This is particularly true of databases. There may be no contention when 10 processes are concurrently querying a table with 100 rows in it. Nevertheless, this test doesn’t prove anything if that same table may have 100,000 rows in production. This may give users a false sense of security prior to a rollout. It may also give testing a bad name if there are eventual performance problems.
2. Testing too late in the cycle, where testing is viewed as a rubber stamp before deployment. In this circumstance, testers often feel pressure to certify a release they are not comfortable with. Test as early as possible. We want to give performance feedback to developers when they have enough time to make modifications before release. This may mean asking for and testing the working portions of an application prior to the full implementation.

Quick Wins
The systems are failing, you’re down to 10% on the shields, and the structural integrity involves chewing gum and a script to restart your servers at 5 a.m. Under this oppressive situation, you need quick wins—and you need them yesterday.

First, get all the relevant people
together: the developer who knows the application, the database administrator who understands the importance of query optimization and index optimization, the network guru, and the all-around team member who knows a little about everything.

Boundary testing is always a good start point. Fire up your stress tests to a level where response times are turbulent and long. Iterate the level down until your system is screaming, but hanging by its fingernails at the edge of the cliff. Get the team to find the most significant bottleneck, check the logs for errors, and generally brainstorm on the big wins. You might be surprised how little details—such as an operating system setting for file descriptors, indexing in a database or throttle values in a SAP dispatcher—can have enormous impact.

Smoke tests also can lead to quick results. Just leave your system under significant stress overnight and see what happens. All kinds of unusual events creep out from under the floorboard: database backups, network down time, changes in temperature at your data center. If your tests are wide enough, even misbehaving memory can come alive. Smoke tests are also good to burn in new hardware; better to fail early than when under fire from critical end users.

Failover tests are also a good idea and should be performed regularly. Place
your system under load and pull a network cable out to see what happens. This will put your assumptions to the most valid of tests. Results I’ve seen in the past range from perfect performance to a comedy of errors. Failovers usually work, though sometimes they take a little too long, and one server gets most of the hits or the server coming into service gets well and truly hammered. It’s best to know what to expect—assumption is a dangerous sport under high load. Load balancers can usually be configured to compensate for these types of behavior.

An Esoteric Art
These problems have nothing to do with any given organization. After all, stress testing is esoteric, and to some extent remains a secretive art. The value of applying this hard-learned methodology crystallizes only under those extreme situations. Put more precisely, the testing itself is the relatively easy part; it’s the interpretation of the results that’s a bigger challenge. As stress testers, you’re looking at a black box while the rest of the team looks out from the inside. Hopefully, there’s a meeting of minds somewhere in the middle, where optimal performance meets peak user demand.

REFERENCES
• Awstats - http://awstats.sourceforge.net
• JMeter - http://jakarta.apache.org/jmeter
• JMeter recording test tutorial - http://jakarta.apache.org/jmeter/usermanual/jmeter_proxy_step_by_step.pdf
• Sakai - www.sakaiproject.org
• Webalizer - www.mrunix.net/webalizer
• WebDruid - www.projet-webdruid.org
How to Attract The A-Team
How to Sell The Job, Pay And Treat The Tester, So Your Department Won’t Need A Revolving Door

By Jeff Feldstein

Jeff Feldstein leads a team of 40 testers for Cisco Systems.

Applications and developers get all the attention. But testing the large, mission-critical software applications of today has become as complex as developing the applications themselves. Users want everything, and they want it now: new features, more elegance, faster performance, increased ease-of-use and scale. Mix all these needs with upper management asking us to improve productivity while cutting costs and development leaders suggesting new methodologies, and we really have our hands full.

The Test Team
In our roles as test engineers, test managers and quality assurance departments, we attempt to balance the needs of the many. But for our companies to stay ahead, we need to improve in all areas simultaneously. We must constantly examine our approach to testing and look for areas to improve; some over time—others drastically. Without strong, well-rounded test teams, we have little hope of doing this. So the question becomes: How best do we recruit, hire, retain and grow strong, well-rounded test teams?

It’s my belief that software testing should be approached primarily as a software development problem. Keep in mind that the principles of software engineering and test engineering are the same. A technique or structure for the product that increases testability or quality can always be traced to a software engineering principle. Remember, too, that all software engineering principles that apply to the product are just as important for the
automation used to test the product. Software engineering skills are important to accurately measure the effectiveness of testing, as well. For example, only an experienced software engineer can design, develop and interpret code coverage of the product under test. This means that a large majority of your test team should be skilled, experienced and capable software engineers. This doesn’t mean that the entire team has to possess in-depth software engineering knowledge, but you’re in a far stronger position if a good percentage of the team does have these skills. The exact percentage can be adjusted based on your project needs and the realities of the current organization.

The Test Cycle
Typical software quality problems include bugs found too late, features that used to work but now don’t (regression), performance that’s too slow, scaling that’s insufficient, response to stress in the environment that’s not proper, and instability over the long term. How is mitigation of these quality issues aided by software engineers acting as testers?

Bugs found too late. If you’ve read this far, you know that the later the bug is found, the more expensive it is to fix. With the typical style of testing (wait until features are handed off to test), most or all of the bugs are found late in the development cycle. If, however, we code the test automation at the same time (or before, as in test-first development
approaches) the code itself is written, we’d find bugs almost as they’re introduced in the code. There would be no need to wait for the hand-off to test, and often we’d avoid complicated troubleshooting and debugging sessions. Coding this type of automation usually requires a fairly high level of programming skill, but allows bugs to be found sooner (or avoided completely). Usually it’s a worthwhile exchange.

Regression. After the test case is developed and debugged, it can be reused whenever necessary (on later builds, newer code releases or environmental compatibility checks) relatively inexpensively. The sooner the test case is developed, the more it will prove useful and the easier it will be to fix regressions. Regression tests run manually, by contrast, cost almost as much each time they’re executed, and automation through the GUI often costs more to maintain than to rewrite.

Performance. Well-engineered automation, written in a fully capable programming language, can easily be equipped to measure timing and other resource utilization checks. These checks can be recorded with actual test results and tracked over time. Automation that is derived from scripts may have a side effect of slowing down the application under test so that timing results aren’t accurate. The idea that whatever is measured is changed by the measurement can’t be completely avoided, but the unwanted side effects are easier to detect and work around when the people doing the measuring have the right level of software skill.

Scale. While there are many excellent commercial and open source tools to aid with scale testing, we often find that some augmentation to this tooling is required to correctly duplicate our specific environment. Sometimes the available tooling doesn’t help at all, and we need to develop our own scale environment. In either case, developing, refining and supporting this tooling requires skilled software development engineers. Even in the case where tooling might fit our specific needs, it’s
important to understand exactly what the tool is doing and how it’s implemented in order to grasp what it does well and what it is not testing.

Stress testing. I define stress testing as the verification of a system’s behavior when input levels:
a. exceed expectations, or
b. are at abnormally high levels, or
c. are hostile to the system (such as with unexpected environmental conditions or invalid parameters).
This stress can include too many users logging in, losing power or too many orders requested at once. While many of these test cases can be designed by a domain expert and others who might have a knack for breaking software, designing the tooling and measuring the efficacy of this testing can require in-depth software engineering knowledge.

Stability testing. This means ensuring that a released product will behave correctly over its entire life span, which normally ranges from several months to two years. Stability testing is used primarily for released applications rather than Web servers or applications that you might have control over and can be upgraded relatively easily. This kind of testing often requires simulating the life span of the product in just a few days or weeks; in other words, speeding up the cycle to see how the system behaves over time. Compressing real time by causing events to happen at a far greater speed than they might in the real world is a complicated problem. The more powerful the programming tools and the more skilled the engineer, the better chance you have in pulling off this complex type of testing.

Categories of Testers
Now that you’re totally convinced of the benefits of software engineers to the testing organization, let’s spend some time on the test engineers themselves. For the purposes of this discussion, test engineers can be broadly defined as falling into three categories: classic testers, scripters and software engineers.

Classic testers are the application testers who comprised most of the test groups I encountered when I first switched over from development. These testers often came from the application’s domain, and may have a strong background in the business that the application serves. They might be power users, experts in the field under test (rather than technology experts, software engineers or computer scientists), and they may have an in-depth understanding of the application under test. These testers can be very good at designing both positive and negative test cases, and can be very creative in designing ways of killing the product.

Scripters often started out as classic testers but picked up some scripting knowledge on the job. They might have learned a bit of Perl or shell scripting, or perhaps learned scripting while using a commercial test tool. Perhaps they also spent time in some of the GUI record-and-playback technologies. I distinguish here between a scripting language and a programming language. A programming language (such as Java or C) has the capability to not only procedurally execute steps, but has complex logic and looping structures and the capability to create and manipulate advanced data structures (see “The Script on Programming” sidebar).

Software engineers are software developers functioning in the role of testers. These people have both formal education and professional experience in software engineering. This is the same education, background and skill-set that the software developer possesses; it’s just the application of those skills that is different. In other words, the software engineer as developer builds the application under test. The software engineer as tester develops a system that validates its functionality and tries to break it. These skills allow the test engineer to work with the development team to ensure that the product is designed for testability and is properly unit-tested. Since he possesses the same skill-set and has a similar vocabulary as the developer, the test engineer is better able to assess the product and point out areas where the testability of the product can be improved. This tester also has the ability to design and build test automation that doesn’t just mimic a user sitting at a keyboard. Instead, he can build automation that is well designed and is as sophisticated as the system under test. Additionally, the software engineer is better equipped to build an environment that more closely models or simulates the customer or real-world environment and is itself error-resilient.
Recruiting Software Engineers
How do we go about building this team, given the fact that most developers want only to develop and will avoid testing at any cost? Perhaps the biggest obstacle to recruiting software engineers into a testing organization is the perception of test engineering. While attending test conferences, I’ve asked many test engineers and their managers to relate some of the things developers have said about them. Here are a few of the most biting: “Testers are dumb,” “Testing is boring, manual and repetitive,” “Testing isn’t creative and lacks innovation opportunities,” “Testing is not a career,” “Test engineers blindly follow process whether it contributes to quality or not,” and, of course, the attitude I had when I was developing: “Test is a necessary evil standing between my brilliant creation and the user.” How many of these have you heard? Do you have even worse stories? One obstacle we face in building high-performance test teams is overcoming these perceptions.

By now, you’re probably wondering how to explain to that candidate looking for a development position what a software engineer will do in the testing department. Here’s one statement to get you started: “A software engineer on my test team designs sophisticated software to validate the system’s functionality, ensure performance and scale requirements, and prove its reliability and resiliency to errors.” If the candidate hasn’t hung up the phone by the time you finish that sentence, it’s time for a more detailed discussion. In fact, the position that I take during recruiting is not that the position is only equal to development, but that it’s better. Here’s why:

Sophisticated software. You’ll be designing and developing highly engineered, sophisticated, error-resilient software. It’s only the purpose of the software that’s different. Instead of creating a small slice of the application, you’re developing a complete system to verify its functionality, scalability, reliability, etc. Additionally, your software needs to contain creative ways to break the system under test using methods and techniques that are either not practical manually or too time-consuming to reproduce for each product build.

You decide what to build. Most “software developers” are building systems to meet requirements published by a marketing or product management team. New features and capabilities must first be sold to the product managers before they can be put into a product. The marketing team then needs to weigh the cost of the developer-suggested feature against the feature they might have had in mind and decide what to drop. In testing, you show your boss how the software will increase productivity, help ensure quality or find more bugs, and he gives you the go-ahead to start building your feature or system. No marketing meetings, survey groups or cost analyses.

Small teams. Because test teams are smaller than their development counterparts, you tend to work on a bigger chunk of a project (or maybe a whole project) instead of just a small slice. In addition, you tend to have a better perception of the “big picture,” understanding more about the behavior of the product as a whole rather than an in-depth knowledge of how your specific slice is implemented.

Creativity. Because developers are typically building what a product manager specifies, their creativity must usually be limited to the implementation—in other words, how best to implement these features to maximize performance and quality. Test engineers, on the other hand, can be creative not only in implementation, but in the features they build, as well. Two common areas for creativity are in the tooling required for testing and new, creative ways of testing. This might also include new ways to stress the system, model-based testing and perhaps various ways to automatically measure the quality of the application and the automation itself.

You get to break your buddy’s code. No explanation required.

Innovation. Test automation—especially highly engineered automation—tends to be a newer, less explored field than pure software development. This allows more room for innovation. While this overlaps with creativity, the emphasis here is on a field that is still developing and emerging. With the industry focusing on ways of developing higher-quality software, new testing techniques and methodologies are developed, published and refined constantly. Bringing these innovations to the team and applying them to a particular product can be rewarding for the engineer while helping to increase product quality and/or team productivity.

Customer interaction/business knowledge. Most effective test engineers—be they classic, scripter or software engineer—need to develop a good understanding of the product as a whole. This understanding is more common in testers because we’re less concerned with implementation details of a specific feature and more concerned with software validation. We often switch our testing from one area of the product to another based on whether there are new features implemented, existing features enhanced or bugs fixed. In addition, because customer-found bugs are our escapes, we find ourselves analyzing how the customer is using a product and feeding that information back into the testing for the follow-on releases. Also, test engineers are often well suited for customer interaction during external testing because the customer probably needs the support of a person who has a wide understanding of the product. Many engineers enjoy the customer interaction and business travel that often goes along with this. If that’s the case for you, it’s a good recruiting point.

It’s helpful to keep in mind that not every engineer wants to stay an engineer forever. I’ve seen many cases where both developers and testers
want eventually to move into more of a business role. In many ways, it’s easier to make this leap from the testing role: Good understanding of the product, business issues solved and experience with customer interaction all help with this transition.
Recruiting Disappointments
Even when you follow all of the guidelines given above and bring up other (probably better) points that you come up with on your own, you won’t win over every candidate you talk to. You may not even win over the majority. Don’t be discouraged. We’re all fighting an uphill battle by asking developers to consider testing. Remember, once good engineers are open to the possibility of testing, it grows on them. I’ve spoken to many engineers who started out in testing for various reasons and have grown to enjoy it.

Remember too that development doesn’t win over every good candidate. Development engineering candidates may think the team they’re interviewing for is too big, too small, too junior or too senior; or the project, department or company doesn’t fit into their career goals. Like those of us in testing, development managers need some salesmanship in recruiting.
Alternative Approaches There are a couple of alternative approaches to attracting candidates: interns, internal candidates and the “test now develop later” strategy. Interns. My own experience has shown that hiring interns (student software engineers near to completing their degree) is an extremely effective recruiting technique. Interns working fulltime for the summer or part-time throughout the school year (assuming you have a college or university nearby), can be thought of in the dual role of working on a project while on an extended two-way interview. Most times, interns work diligently because they’re often ecstatic to have their first “real” job. They care far less
than either the experienced programmer or recent graduate that they aren’t yet in their dream role of software developer. I still remember my happiness the day I quit my job pumping gas for a programming internship that started the next day, paying double, in a real office. Although still in college, I felt that all of my hard work was finally yielding a return, even though the actual work I was doing in my new position wasn’t important. I can’t imagine it’s too different for interns today. These engineers-in-the-making can bring fresh, new, unorthodox ideas to an organization. Once they’ve worked in your test group for a few months or a year, they’ll understand implicitly the advantages of testing and can be converted to full-time employees upon graduation.
Internal candidates. Occasionally, you’ll come across engineers in the development organization who are ready for a change, or possibly looking for a leadership role or other opportunity for advancement. A relatively senior software developer can make an excellent leader for a test department. This offers the dual benefit of ramping up test while providing growth for an engineer with leadership potential.
Test first, then develop. Many companies (and some of my colleagues in testing) have found the recruiting pitch of “Join my company now in a test role, stick with it for 18-24 months, and we’ll move you to development” quite successful. I have, too. Often, fewer engineers than expected wish to switch to development; instead, they grow into solid senior test engineers and leaders.
Other Resources
Another recruiting resource is the recent graduates (and perhaps former interns) you’ve hired. Look for a recent graduate who has a few months to a couple of years of experience on your test team, is articulate, really enjoys his job and is enthusiastic about his prospects. Engaging these engineers in the recruiting process—having them meet and speak with candidates close to their age—can help greatly with your recruiting efforts. The candidate might hear the “testing is better” message a little more clearly when it comes from somebody who was recruited through this method and came to
enjoy the benefits of working in test. On occasion, the testing department will experience some attrition to development. This is rare in my experience, but it’s not necessarily a bad thing. It can be advantageous to have between 10 percent and 20 percent of your test engineers switching to development after a couple of years because these engineers can be the ones most concerned about testable architectures and quality. They typically remain friends to test because they have seen the process from both points of view and can be effective ambassadors between test and development. In addition, they can often be counted on to be early adopters within development of newer test tools or techniques.
Keep ’Em on the Farm
Now that you’ve hired your team and they’re working well, you need to focus on retaining your top talent. As your test organization matures, it’s important to pick leaders who enjoy testing and are happy with their work. The example they implicitly set for the junior engineers on your team will be noticeable. It’s also important to keep your engineers interested and challenged. Be sure to make time for training in your business or problem domain, software engineering skills, general business skills, and management or leadership training.
Another important retention technique also makes good sense from a quality standpoint: Work to have test engineers involved early in the product life cycle. Early-definition phases aren’t too soon to involve test engineers. Not only will they have important insights to give to the marketing and architecture teams, they’ll also enjoy the experience.
Part of your job as a test leader or manager is to keep your finger on the pulse of your colleagues in development. If they’re researching or teaching new technologies or architectures, make sure that you have participation from test as well. This helps build a close relationship between development and test, and provides interesting insights for your test engineers.
Organization Support
A test manager is usually part of a larger organization, and can’t build his dream team without support from that organization.
THE SCRIPT ON PROGRAMMING
While test departments might gain some productivity with scripting, there are several drawbacks to approaching automation solely with scripting.
Scripting often mimics a human sitting at a computer keyboard. Depending on the situation, this mimicking can be faster than a human, but at times it’s slower. And while scripting will reproduce scenarios exactly, it does so with less imagination and intuition than a human, who can catch errors the test wasn’t looking for. In addition, mimicking what a human can do may not fully exercise the application.
It’s often difficult or awkward to scale scripts. While scripting may work well at mimicking one user, it’s not well suited to generating the activity of hundreds or thousands of simultaneous users or transactions.
Another important capability of a programming language over scripting is the ability to manipulate complex data structures. Without it, the tester’s ability to perform data-driven testing is limited. In other words, in cases where the logic of executing a script is the same but data varies the test case (both in input and output), it’s advantageous to reuse the same logic for all the test cases. While some scripting languages can read and compare files, manipulating and analyzing data that doesn’t easily fit into a flat-file format is usually quite awkward.
Programmatically detecting errors through an interface designed for humans is difficult (and error-prone by itself). Detecting errors by employing technology that tries to turn human-consumable output back into something that’s programmatically readable adds a level of complexity to the error detection to compensate for less-skilled programmers rather than enhancing the capability of the test system. Yes, it can be done. But by having tests written to interact directly with a programmatic interface such as an API, the engineer can concentrate on deep analysis and error-checking of the application rather than the distractions that occur when trying to figure out whether human-presented data was translated correctly back into programmatic form.
While one script can call another, scripting reuse is not nearly as elegant or powerful as what’s available in a full programming language. Software reuse, executed correctly, is an area that greatly improves code productivity and reliability.
Once it’s understood that testing is just as important as development and requires engineers of similar skill sets, it’s important that the organization treat the engineers as peers. This includes equal pay scales and provisions for parallel career tracks with similar opportunities for advancement. Culturally, it’s important that test engineering has an equal say with other organizations regarding document reviews, project decisions and other team discussions. Additionally, test shouldn’t be looked upon as a service organization to development, but should instead have an independent voice to the senior managers making business decisions based on data from multiple points of view. While many software companies already have this kind of structure, just as many probably do not. If your organization is among the latter group, it’s
not likely to change overnight. But with persistence, demonstration of solid results and measurable improvements, these types of parity should be achievable over time.
Although in many environments it’s viewed as subservient to other members of the development team, test engineering is an important component of the software development process. The strongest test departments are composed substantially of software engineers. Given the right leadership, it’s possible to recruit, retain and grow software engineers in test roles that contribute greatly to product development and quality while they’re respected and valued by their development peers, customers and senior management. ý
A Look At Typical Usage Scenarios Can Keep You Out of A Performance Quagmire
By Prakash Sodhani

Most companies perform some sort of testing or quality work before rolling out a product.
For some, testing is just a formality; others stake their entire reputations on it. If Apple’s iPod and iPhone had been released without testing, I can’t imagine they’d have become quite the phenomenon we’re seeing today. It was a big surprise to me to see how loosely load testing was practiced at the various companies I’ve worked in as a consultant. I’ve seen projects where any business process run with x number of users, with no heed to the results, constituted a successful load test. On the other hand, some companies follow a formal process and timeline for load testing, and pay great attention to load test results. In either case, it’s the preparation and planning for load and performance tests that can prevent headaches for a company whose products are exposed to public usage. In my experience, load and performance testing assume more importance when more significant usage is expected from end users. Load testing scenarios also depend on peak times of usage and the type of application. There are, however, a few load-testing scenarios that apply to most applications and return adequate results for determining whether your application is up to the challenge. This article discusses a few such scenarios. You might also note that since I’ve used LoadRunner quite a lot during the past few years, some of my techniques may
be related specifically to it. However, I believe the practices covered here also are applicable to any load testing tool, with perhaps a few adjustments. First, let’s cover a few prerequisites for performing load and performance testing.
Prerequisite 1: Load Testing Criteria
Most companies will have an expert use automated test tools to simulate end users and business processes. One problem with this is that load testing experts may not understand the business processes associated with the application under test, or be able to determine which business processes should be chosen for the load test. The selection of business processes can be tougher in organizations that don’t have service level agreements (SLAs) or any documentation. One approach is to rely on the expertise of the subject-matter experts, business analysts and anyone else who might have a good idea of how the system under test is used. Selection criteria for business processes could include CPU utilization, number of users (simultaneous and concurrent), business criticality (end-user and stakeholder expectations) and hardware performance. Once the initial business processes are selected, the test engineer can focus on those scenarios, as these business processes are expected to generate the majority of the traffic.
Prerequisite 2: Load Testing Environment
Before starting a load test, it’s essential to document the specific conditions under which the test will be conducted. Conditions may include:
• Test machines to be used and their hardware configurations
• Software installations and versions
• Time of day when the test will be conducted (to make sure results aren’t skewed by unrelated traffic)
• Number of end users simulated
• Database parameters, such as the number of connections
• Server (Web and application) configurations
The objective is to precisely delineate the conditions under which the tests are conducted to make sure they simulate the production environment as closely as possible.
Prerequisite 3: Monitoring and Analysis
Every component of the system needs monitoring: client, Web servers, application servers, database, network, etc. Before beginning load testing, you must understand what will be monitored and how the results will be analyzed. It’s important to know what constitutes a successful load test and what results will require changes to the system.
Monitoring involves on-the-fly observation of the results and tweaking the parameters as more tests are conducted. Almost all load test tools allow testers to look at the performance of the system while the test is running. You may see various graphs and choose which parameters you want to watch.
Let’s look at an example. You’re running a test, and expect a response time of 10 seconds when running a business process with 50 users. While running the test, you observe that when the number of running users is 30, the response time is already 15 seconds. You’ve exceeded the expected response time with fewer users. This information requires that you change your scenario; it doesn’t make sense to continue a test when you know performance won’t be up to spec.
FIG. 1: SIMULTANEOUS USERS (Y axis: number of virtual users; X axis: duration of test run)
In this graph, the number of users is increased per the ramp-up time set and allowed to run for some time. At the end, users are stopped per the ramp-down criteria.
An analysis is performed when the tests are completed. This in-depth study helps us derive conclusions, but it’s important to decide on the metrics that will determine whether load tests were successful or whether changes need to be made to the system before deployment. With expected results in place, you can use different graphs to derive appropriate conclusions. For example: If, as before, the expected response time is 10 seconds with 50 users, you can merge a response-time graph with the number of users and analyze the response time at each user load.
Simultaneous Users Test
This test is intended to simulate the most realistic scenario of application performance during peak usage times. Let’s first understand the difference between simultaneous users and concurrent users. Simultaneous users are the virtual users who are “ramped up” during testing over a pre-decided time interval. For example, let’s say that the number of users during peak usage for the AUT is 100. In a realistic scenario, all 100 users wouldn’t start at exactly the same time, but would start at different times over the course of a number of minutes. So it’s more realistic to add, say, five users every 30 seconds. This will have different users performing different business processes at different times, which is also more realistic. We define concurrent users, on the other hand, as virtual users performing exactly the same business process at any point in time. This scenario is intended to generate heavy load with the goal of breaking the application
application in production. This may require discussion with developers, business analysts and other subject-matter experts (SMEs). • Based on the number of users at peak time, determine how you’ll ramp up the virtual users during the test. For example: If peak usage is 60 users, you may have a scenario where you ramp up one user per second so that all users are using the system after one minute. This also ensures that users are using different parts of application at some point in time. • Set the duration you want users to run this scenario. You may set the duration depending on realistic scenarios. As you’re simulating the peak scenario, it’s possible that all the 60 users won’t be using the system for the complete duration of the test. The recommended duration to run simultaneous users tests is around 45 minutes to one hour. • Set the ramp-down criteria for the virtual users. As you specified the ramp-up, you may want to specify how you want users to stop the test. You may want to stop them gradually or at once (as might happen at quitting time). This test ensures that we have simulated a realistic scenario where the focus is less on finding the breaking point of application and more on the performance of the application based on metrics such as response time, CPU utilization, server performance and other tasks under realistic usage.
Concurrent Users Test
Concurrent users are the ones doing exactly the same business process at any point in time, with the goal of breaking the application. Let’s look at an example to understand it a little better. Let’s say you have a business process, X1, that you want to include in the test. This business process has two distinct transactions, or steps. For the concurrent users test, we’ll script our business process in the following way:

Process X1_Start
   <rendezvous1_start>
   Transaction1_X1
   <rendezvous1_end>
   <rendezvous2_start>
   Transaction2_X1
   <rendezvous2_end>
Process X1_End
This pseudo code shows process X1 and two distinct transactions. Each transaction is enclosed within <rendezvous> tags. When a virtual user reaches one of these tags, it waits for the other virtual users to reach the same point before continuing to the next step. If your scenario specifies 100 users, all users will first wait at <rendezvous1_start> until all 100 have reached that point, and only then execute Transaction1_X1. The same holds for every transaction that has <rendezvous> tags. These tags allow us to simulate a load far greater than that of the simultaneous users test: In the concurrent users test, every user hits the system at the same point, generating a large number of hits.
Here are some of the steps to perform for a concurrent users test:
• Identify the peak usage of the application in production. This may require discussion with developers, business analysts and other subject-matter experts (SMEs).
• There is no ramp-up in the concurrent users test. All users execute any step in a business process at exactly the same time. Even with a ramp-up, the users would simply wait at the <rendezvous> tag for the others to arrive before continuing the process.
FIG. 2: CONCURRENT USERS (Y axis: number of virtual users; X axis: duration of test run)
Here, all users start at exactly the same time (the line coincides with the Y axis) and are then allowed to run for some time. At the end, all users are stopped at once (no ramp-down criteria).
• Set the duration for which you want the users to run this scenario. The recommended duration for a concurrent users test is around 15 to 30 minutes. The duration is shorter than for other tests because this test generates significant load and is expected to reach the breaking point fairly quickly.
• There are no ramp-down criteria for the virtual users. Users run the business processes and stop at the same time.
Reliability Test
This test is an extended version of the simultaneous users test. Reliability refers to the consistency of a measure—a test is considered reliable if we get the same result repeatedly. The purpose of this test is to make sure the application continues to behave as expected when subjected to a realistic load for an extended period of time. It isn’t rare to see applications behaving perfectly well when subjected to load for a short time, then degrading in performance when a consistent load is applied over a longer period. For example, an application may work well when a load of 50 users is applied for an hour, but show unexpected results when the same load of 50 users persists for four hours.
The following are some of the steps to perform for the reliability test:
• Identify the usage of the application in production over a period of time, such as five hours during peak usage. This may require discussion with developers, business analysts and other subject-matter experts (SMEs).
• Based on the range of user numbers estimated above, construct different scenarios and determine how you’ll ramp up the virtual users for each scenario during the test. For example: If peak usage is 60 to 150 users over five hours, you may choose one scenario in which you ramp up one user per second, so that all users are on the system after one minute, and another in which you ramp up two users per second. You can also schedule when each scenario executes; for example, scenario 1 before scenario 2, and vice versa.
• Set the duration for which you want the users to run each scenario. For example, you may schedule scenario 1 for one hour and scenario 2 for 45 minutes. The recommended duration for each individual scenario is around 45 minutes to one hour.
• Set the ramp-down criteria for the virtual users for each individual scenario. Just as you specified the ramp-up, you may want to specify whether users stop gradually or all at once.

FIG. 3: RELIABILITY TEST (Y axis: number of virtual users; X axis: duration of test run)
In this graph, we see two scenarios. For scenario 1, the number of users is increased per the ramp-up time set and allowed to run for some time; at the end, users are stopped per the ramp-down criteria. Once scenario 1 is done, scenario 2 is started, and the process continues for as many scenarios as are to be run.

Capacity Test
This test is another version of the simultaneous users test. Its objective is to find the breaking point of the application under the conditions of simultaneous users. The following are some of the steps to perform for a capacity test:
• Identify the peak usage of the application in production. This may require discussion with developers, business analysts and other subject-matter experts (SMEs).
• Based on the number of users at peak time, determine how you’ll ramp up the virtual users during the test. For example: If peak usage is 60 users, you may ramp up one user per second so that all users are on the system after one minute. This also ensures that users are exercising different parts of the application at some point in time.
• Set the duration for which you want the users to run this scenario, depending on realistic usage. As you’re simulating the peak scenario, it’s possible that all 60 users won’t be using the system for the complete duration of the test. The recommended duration to run a capacity test is around 45 minutes to one hour.
• Once peak usage is reached and application performance is within expected limits, increase the number of users on the fly during the test. Keep doing this until you start observing performance degradation or application behavior that doesn’t conform to the expectations set beforehand. This will give you the maximum number of users your application can support under realistic scenarios.
• Set the ramp-down criteria for the virtual users. Just as you specified the ramp-up, you may want to specify whether users stop gradually or all at once.
This test simulates the realistic scenario in which the focus is on finding the breaking point of your application based on the simultaneous-user behavior described above.

FIG. 4: CAPACITY TEST (Y axis: number of virtual users; X axis: duration of test run)
In this graph, the number of users is increased per the ramp-up time set and allowed to run for some time. Once it reaches peak usage and application metrics are within the accepted range, more users are added manually (the spike in the middle of the test run) and allowed to run for some time. This process continues until degradation in application performance is seen. At the end, users are stopped per the ramp-down criteria.

While any type of test with random numbers of users can be called a load test, this type of testing is best left to a specialist to design, perform and analyze. The true objective of load testing is realized when different combinations of scenarios are tried in controlled conditions and the results are carefully analyzed. There are infinite combinations of each of these scenarios. What’s important is to understand the constraints of the load tests in the context of your application parameters and stakeholder expectations. Only then will you achieve the goals of confidence in stability and performance optimization. ý
Prakash Sodhani is a quality control specialist at a global IT services company based in Texas.
Troubleshoot Performance Bottlenecks With Simple (And Free) Client-Side-Only Tools
By Michele Kennedy

You’ve designed a killer Web application. You’ve completed development. You’ve executed your testing plan. You’ve finally given your stamp of approval. You deploy this gem to production only to see the application come to a screeching halt. Yes, it runs. But... it... runs... so... slowly. You become frustrated and start blaming the network or the server. After all, the same code runs lightning fast in your development environment. The differences are the network, the server and the database. But how do you figure out which is the culprit?
Working for a consulting company, I’m usually prohibited from installing any kind of monitoring or troubleshooting software in the client’s production environment. Development is done offsite with a sampling of data from the production database. Tracking down performance bottlenecks can be frustrating, and the troubleshooting tools I have are limited. Making matters worse, the application is already finished and deployed. There’s no time or budget to improve performance by rewriting code. Besides, I’m confident that my development team has already done their best to optimize code during the development phase through efficient page design, caching and database interaction. My goal is to see what’s happening in the production environment and get the application running at peak performance.
I was working with a GIS application built with SQL Server and Visual Studio along with a product called MapXtreme by Pitney Bowes MapInfo. The application was installed on the client’s production server and was running on a WAN for internal employee use. We used simple tools for our diagnosis while onsite and identified some common errors. We also tweaked a few other items before we left for the day, with the application running like a top. Here’s how we did it.
To Your Debug Flags Be True
Our first move was to look at the web.config file that ASP.NET uses to define application settings. One thing jumped out: The debug flag found in the compilation statement in the web.config file, <compilation debug="true">, was set to true. This is not something you want in production. This flag is set to true by default to aid in development, but often finds its way into production while still set to true. It may not seem like a big deal—the application looks the same and runs the same. In fact, it can be difficult to tell the flag is set to true unless you actually look at the setting in the configuration file. But this one line can cause your application to run slower through reduced caching, higher memory overhead and compilation inefficiencies.
Reduced caching. When the debug flag is set to true, scripts and images that are loaded from the WebResources.axd handler aren’t cached. The WebResources.axd handler loads resources such as GIFs and JavaScript files used by the application. Without caching, all client JavaScript code and images will be reloaded on every page, every time. When you run in debug mode during development, caching of JavaScript and images is intentionally disabled to save developers from having to continually clear their cache. This saves lots of development time, but
you need to remember to switch that flag to false for deployment. Caching becomes all the more important for page-loading speed if your application is heavy with third-party tools, menus, tree-type controls and JavaScript that are deployed through the WebResources.axd handler. This was the case with our application, which contained not only lots of JavaScript, but also menu images and tool icons deployed by WebResources.axd. The ideal scenario is to avoid trips to the server whenever possible, and to bring back as few bytes as necessary. Caching is a great way to cut down on bytes and round trips to the server.
Another crucial difference when running an application in debug mode is the timeout behavior: ASP.NET requests do not time out. This is intended to allow the developer ample time to debug without losing a request. But in a production environment, timeouts are necessary to avoid runaway processes that could create bottlenecks.
When the debug flag is set to true, the compilation of ASP.NET pages also takes longer because batch optimization is disabled. When an .aspx, .asax or .ascx page is requested for the first time, the page is compiled into an assembly, or DLL. If debug is set to false, multiple files are batch-compiled into larger assemblies, rather than a single assembly for each file, as in debug mode. When applications contain a large number of .aspx and .ascx pages, the resulting profusion of single-file assemblies causes memory fragmentation that can eventually result in Out of Memory exceptions.
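For deployment, the fix is a one-line change in web.config. Here’s a minimal sketch; the surrounding elements are abbreviated, and a real file will contain many other settings:

<configuration>
  <system.web>
    <!-- Always set debug to false for production deployments -->
    <compilation debug="false" />
  </system.web>
</configuration>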
Settings for ASP.NET 2.0
If you’re running ASP.NET 2.0, there’s a setting that will eliminate inefficiencies arising from debug="true" flags in production. The setting is in the machine.config file.
These machine.config statements specify the deployment retail value:

<configuration>
  <system.web>
    <deployment retail="true"/>
  </system.web>
</configuration>

The <deployment retail="true"> switch will disable the <compilation debug="true"> switch. This setting also will disable the ability to render trace output in a Web page for users to see, and disable the ability to show detailed error messages remotely. The last two items are security best practices: They prevent hackers from gaining knowledge of your application or server from the error messages being displayed.
Release vs. Debug Build
Visual Studio offers the option to create builds in Release Mode or Debug Mode. Before deploying to production, you should always create your final build in Release Mode. This will optimize your code for the best performance. A debug build generates a file containing debug symbols (the .pdb file), incorporates extra instructions to accommodate the debugger and allow you to set breakpoints, and will allocate extra memory during execution. When you create your build in Release Mode, it’s optimized for efficiency. The executable will be smaller, faster and more efficient than a debug build.
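If you build from the command line rather than from the IDE, the same choice appears as the build configuration. A minimal sketch using MSBuild, with a hypothetical solution file name standing in for your own:

msbuild MyWebApp.sln /p:Configuration=Release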
A Peek at the Database
The next step in our troubleshooting day was to take a look at process response times and their impact on returning page control to the browser. When we started, some of the response times could be measured in minutes. No user is that patient. We needed to analyze the queries that were taking place and figure out how they could be optimized.
As often occurs in development, we built the app using a small subset of the production database, which is actually quite sizeable. This can mask some SQL inefficiencies by providing quicker response times than could be expected in production.
Microsoft provides SQL Server Profiler, a utility that makes it easy to examine your queries, step through problem queries to find the cause and
locate slow-running queries. When run on the machine hosting the production database, SQL Server Profiler captures all SQL events—including queries and stored procedures—that are executed. The events are saved in a trace file that can be analyzed to diagnose problems.
One problem when using Profiler in production is that the trace window will be flooded with query statements not just from the AUT, but from every other application using the server. You’ll need to set up a filter based on database ID to narrow the trace to only your database. The following SQL statement will return the ID of the current database:

SELECT DB_ID() AS [Database ID];
To set up the Profiler filter to trace only your database, go to New trace on the menu bar and click on the Filters tab. Click on Database ID and enter the value of your database in the Equals box. After setting up our filter, we executed a sequence of steps in the application that we knew would have slow response times. Stepping through the application code could have told us which queries were being executed, but Profiler makes this job a snap—it displays a list of all the executed queries and stored procedures. The Duration column lists the length of time each query took to execute, making the slow-running queries easy to spot. Once we identified the dogs, we worked in Microsoft’s Enterprise Manager and SQL Query Analyzer to get to the root of each problem. Executing the stored procedure in Query Analyzer took just as long as we saw it take in the application. At this point, our problem was reduced to a SQL issue. How do you speed up a SQL query? In some cases it’s as easy as adding an index. We identified several indexes that could speed up our queries. After adding an index, execution time for one of the queries went from minutes to seconds.
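In many cases a single CREATE INDEX statement is all it takes. This is only a sketch; the table and column names are hypothetical stand-ins rather than the actual schema:

CREATE NONCLUSTERED INDEX IX_Parcel_OwnerName
ON dbo.Parcel (OwnerName);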
Fiddling With Logs
Who would have thought there could be files missing from an application? In development and even after deployment, the pages contained no broken image symbols, and everything looked and behaved fine (aside from the performance issue). It was time to check the logs. You can learn a lot about what’s happening in an application behind the scenes by looking at its logs. Helpful for this task is a free tool from Microsoft called Fiddler (see the “Fiddler on the Web” sidebar). It captures and displays HTTP traffic with an easy-to-use interface.

FIDDLER ON THE WEB
Fiddler is a free tool from Microsoft that acts as a transparent proxy, inspecting and logging all HTTP traffic. For each Web page a browser requests, Fiddler displays a list of each file called to generate the page and captures object names, bytes, load time and other details associated with each file. There is sufficient help on the Fiddlertool.com Web site to get you started with the tool, including some great demo videos that really show its power. It’s a great tool for both debugging and performance tuning.
Configuration. To install Fiddler, just download the executable from the Fiddler Web site (www.fiddlertool.com/fiddler) and step through the install process. It runs on all versions of Windows (including Vista) and requires 10MB of disk space, an 800MHz processor and 128MB RAM. After installation, a Fiddler icon will appear on the IE toolbar and in your Programs menu. When you start up Fiddler, it immediately starts capturing traffic. To make it stop, toggle the Capture Traffic command in the File menu. The first thing you’ll see in the Web sessions window is a call back to the Fiddler Web site to make sure you’re running the latest version. If you’d prefer to check for updates manually, you’ll find that under Fiddler Options in the Tools menu. You should know about two other useful options. In the Rules menu, select Performance. Fiddler can show the duration of each HTTP request going over the network if you enable “show time to last byte.” Add a timestamp to each logged item by selecting “show response timestamp.” For the remaining options, the default settings were good enough for me.
Operation. Once you have Fiddler capturing your traffic, go ahead and run your Web application. When you switch back to the Fiddler interface, you’ll see every HTTP request being logged in the Fiddler Web sessions window, with details including the number of bytes for each item, the length of time it took to retrieve the item, cookies used and caching information. This may expose errors in logic that you were unaware of, such as routines that are called unnecessarily or multiple times. As in my case, you might also see 404 errors popping up if your application is trying to access a file that isn’t there.
Analysis. Using the statistics tab, you can highlight one or more line items to get quick sums of total bytes for the selected items as well as the total response time. The Session Inspector tab lets you look at header and response information. You can add or remove compression to see how it affects your application. You can view images, HTML or raw header information. The timeline tab gives you a Gantt-style chart of your application so you can see which processes are taking up the most time. Use the filters tab to filter out responses that you don’t need to see, eliminating some of the clutter in the Web sessions window. Happy Fiddling!
Fiddler eliminates the need to hunt and sift through large log files for the lines that pertain to an AUT. The display includes object names and load times, and allows line-item highlighting for file size and other information useful for debugging. Using Fiddler, we were able to see the list of components being called each time the application was loaded. And there were plenty of 404 errors. We even noticed a routine that was
unnecessarily being called twice. Missing objects are not always obvious just by looking at pages through a browser. By scrutinizing the Web logs and capturing traffic in this way, you might be surprised by all the components that make up every one of your Web pages. 404 errors might appear as your page tries to access pieces of HTML or images that don’t exist or have moved. These 404 errors almost certainly are slowing you down. A missing file causes the application to go look for it every time, making an unnecessary trip to the server and coming back empty-handed. In our case, the application was taking only about a second to generate all the 404 errors. But we also found that our 404 errors were resulting from a placeholder piece of HTML used in an IFRAME that didn’t exist on the server in the path where it was looking—yet the application was trying to retrieve it over and over. By placing the file in the proper place on the server, the application
was able to find and cache it so it wouldn’t have to keep returning to the server to look for it again and again. Does this mean you want to see only return codes of 200 in your logs
(meaning the file was returned to the browser)? Not necessarily. You want to see a lot of caching. A return code of 304 indicates that a file hadn’t changed and was served from the browser’s cache; you want to see a lot of those. Remember, the ideal is to reduce the number of bytes sent to the browser and round trips to the server.
With the use of just a couple of tools—none of which required installation on the production server—we were able to bring the response time of our application back to the level we had during development. Fiddler revealed 404 errors we never knew we had, allowing us to shave a second off page load time. We increased our page compilation efficiency and improved our caching by switching our debug flag to false. We used SQL Server Profiler to identify database columns badly in need of an index.
Numerous variables impact application performance. Network traffic patterns, competition from other applications, lost or missing objects, redundancies in the code—all can play a role in slowing your app to a crawl. A few simple tools can point you to trouble spots. And a few small changes can help you pick up a lot more speed. A little persistence and some systematic logic will be the keys to your success. ý
Michele Kennedy is a senior analyst at TSI, a mapping and spatial technology consultancy.
Best Practices
Requirements Engineering Or Natural Language?
By Geoff Koch

Will the emergence of natural language processing and semantics mean a reprieve for rigorous documentation of requirements?
If the subject of requirements management makes you cringe, it’s likely because you remember the pre-PC era when computing resources were generally expensive and scarce, and when programmers were taught not to start coding anything until they got the requirements right. The music might have been free-spirited—a little Jefferson Airplane, anyone?—but the process of creating code certainly was not. As a result, in many development projects, huge chunks of time were spent engaged in the mind-numbing task of writing and editing requirements documents.
Things are different today. Iterative, build-a-little, test-a-little processes are all the rage. And at the extreme end of the spectrum, some agilists gleefully urge developers to just start coding and forget even a cursory attempt to gather requirements up-front.
Looking at this nearly half-century sweep of computing, it’s tempting to declare that any schmuck can see the inexorable evolution from process-heavy to process-lite requirements management. Call me a schmuck, but I think this is an overly simple and overtly incorrect view of software history. My best-practice takeaways after reporting this month’s column? Requirements management matters more than ever, context is king and, perhaps most important, technology tends to progress in circles rather than evolve in a straight line.

More Nuanced Agile Awareness
Agile programming practices may not have permanently slain the document-it-first dragon, but agility undeniably is winning the PR war. Its chief foe, various iterations of the waterfall process, is generally set up as a stodgy and hopelessly outdated straw man in much writing about trends in coding. Too much writing about software today is deadly dull and full of overly circumscribed prose, which is what makes the Manifesto for Agile Software Development so bracing.
Pertinent to the topic at hand, two of the Manifesto’s four gospel truths relate directly to managing requirements. “We have come to value… working software over comprehensive documentation,” declares Kent Beck and his collaborators at agilemanifesto.org, and “responding to change over following a plan.”
The Manifesto does come with a reasonable disclaimer of sorts, assuring readers that there is in fact some value in documentation and plans. But as is true of most actual-mileage-may-vary statements, this one gets lost in the way that agile methodologies are discussed and vigorously debated online.
One meme worth questioning in the agile community, and indeed in technology generally, is that the geezers just don’t get it. Consider the case of Donald Reifer, a self-proclaimed old-timer with more than 30 years of experience in IT. Reifer, a consultant and visiting computer science faculty member at the University of Southern California, is working on a federal research contract to make it easy for government agencies to handle COTS, commercial off-the-shelf, software. (In the mid-1990s, the Defense Department began insisting on use of
COTS rather than custom software, a procurement trend that’s long since spread to other federal agencies.) Reifer and his collaborators are building a system that will identify dead and potentially malicious code in the commercial software used by government IT architects.
More interesting than the project itself is the development process being used. The project “is being developed in five spirals, and each spiral is being developed as a sprint,” says the soft-spoken, gravel-voiced Reifer. “Three months into the project, the customer had a prototype of the infrastructure and GUI for the tool system; that’s powerful.”
Reifer is no newcomer to agility. In a 2000 paper, “Requirements Management: The Search for Nirvana,” in IEEE Software magazine, he argued that too many developers spend too much time specifying requirements. Sometimes, particularly when doing research and development, requirements are more likely to be discovered than delineated. And the best way to engage in this discovery process, says Reifer, is to use agile methods, including iterative development and rapid prototyping.
However, Reifer is hardly one to declare that agility is the death knell of requirements management. He points out that many agilists sing the praises of modeling languages such as UML or SysML for the ability to quickly and easily patch together abstract models of software systems. But when building such models, aren’t agile developers practicing requirements management? In effect, Reifer says, this modeling is a way of “discovering what the user really wants.”
A more concrete example of the importance of requirements
management comes by way of Reifer’s collaboration with USC professor Barry Boehm, most famous for developing COCOMO, an algorithmic software cost estimation model. “We see lots of data, and what we find is that 50 to 75 percent of the work and cost associated with software maintenance is related to integration and testing,” says Reifer. “If that’s the case, then efforts to manage requirements, understand change and have traceability all must be streamlined.”
Cluing in to Context
What’s most refreshing about Reifer is that unlike so many agile adherents, he doesn’t defend the methodology with anything approaching religious zeal. Our conversation is peppered with avuncular aphorisms (“If you know what you want, specify it. If you don’t know what you want, explore”) and commonsensical observations about the overarching importance of context. For example, while the act of writing requirements documents doesn’t belong in every software project, such documentation is undoubtedly crucial when you’re parsing out pieces of your project to subcontractors or trying to coordinate activities among developers dispersed across countries and time zones.
For confirmation that software development today is indeed a far-flung, world-is-flat endeavor, look no further than Google Scholar. There is a wealth of papers, many of them authored by European academics, that hint at an intriguing future for requirements engineering, one in which all those reams of requirements documentation turn out to be useful all over again. The idea is that the algorithm-driven ability to parse and understand natural language context might someday soon be applied to vast databases containing both external customer feedback and internal requirements documents maintained by
product engineering teams. Deciding what subset of these requirements to include in the next release might be as simple as querying this database and generating a prioritized list of related customer- and engineering-driven requests.
Several open-source prototypes are available for those who possess moderately deep wellsprings of both curiosity and time. One is ReqSimile (http://reqsimile.sourceforge.net), developed by researchers at Lund University in Sweden. “We are currently running experiments on students to assess ReqSimile’s effectiveness and efficiency compared to commercially available tools,” says Lund professor Bjorn Regnell, coauthor of the 2005 paper “A Linguistic-Engineering Approach to Large-Scale Requirements Management” in IEEE Software that describes some of the preliminary performance of the tool. “Quite a few individual requirements engineers around the world have probably had a look at it.” Other tools include WMatrix (http://ucrel.lancs.ac.uk/wmatrix) and OntoLancs (www.lancs.ac.uk/postgrad/gacitua/start.htm).
Fair warning here: In terms of maturity and slick GUIs, all of these tools are a far cry from those offered by the big vendors like Borland, Telelogic and IBM. But that’s not to say that these prototype tools and the underlying approach are altogether useless to developers today. “I think my advice to a developer would be to carefully assess their problem to see if [language engineering] techniques can help them,” says Pete Sawyer, a professor at Lancaster University in the U.K. Sawyer’s 2005 paper “Shallow Knowledge as an Aid to Deep Understanding in Early Phase Requirements Engineering” describes how raw text files can be analyzed to help guide the process of formulating requirements.
“Essentially, if they have a lot of documents to mine for domain information, [some of these prototype] tools may be able to help,” Sawyer says. “If documents are not the principal source or if the information is in non-textual form, then the potential benefits of the techniques diminish sharply.”
Old-School Sense
So it turns out old-school documentation still makes sense in certain contexts and may be even more useful in the future. This isn’t a surprise, given the way patterns repeat themselves through history. After all, computing is moving away from the desktop and toward the mainframe-like computing cloud, at least if you believe the hype from the likes of Google and IBM. And the free-spirited gamers among you know from the recent commercials for the Xbox 360 game Lost Odyssey that even 40 years after “White Rabbit” was released, Grace Slick and the gang still are somewhat cool, at least in the coding world. ý
Geoff Koch writes about science and technology from Lansing, Mich. Write to him with your favorite acronym, tech or otherwise, at koch.geoff@gmail.com.
Index to Advertisers

Advertiser                                                      Page
AutomatedQA (www.testcomplete.com/stp)                          10
Checkpoint Technologies (www.checkpointech.com/BuildIT)         34
Empirix (www.empirix.com/freedom)                               13
Gomez (www.gomez.com)                                           19
Hewlett-Packard (www.hp.com/go/quality)                         40
IBM (www.ibm.com/takebackcontrol/systems)                       2-3
ITKO (www.itko.com)                                             8
Parasoft (www.parasoft.com/qualitysolution)                     6
Seapine (www.seapine.com/qra)                                   4
Software Test & Performance (www.stpmag.com)                    39
Software Test & Performance Conference (www.stpcon.com)         35
Software Test & Performance White Papers (www.stpmag.com/tqa)   18
Future Test
The Showdown On Port 80
By Ryan Sherstobitoff

The Internet in some ways is like living out on the frontier: Lawbreakers are everywhere, looking to plunder and pillage with no regard for human decency, or for right and wrong. But while the “Wild West” settled down after a time, computer security threats continue to rise, and show no signs of abating.
As security threats continue to grow, so does the number of companies turning to security appliances. Like old-West pioneers who always packed a six-gun, IT departments are forced to take a vigilante approach rather than waiting for the sheriff to arrive. To protect the homestead, companies have the option of installing a number of security appliances to filter e-mail and Web content before it hits their servers.
“These days it is standard practice to install many things on an appliance,” says Paul Stamp, senior analyst for Forrester Research in Cambridge, Mass. “An appliance can use custom hardware to accelerate functions and efficiency more quickly than just using multipurpose hardware with some software installed on it.”
Consequently, we’re witnessing a steady rise in the use of security appliances. According to IDC, the market for threat management hardware will pass $5 billion by 2009. One factor contributing to the growing security market is the distribution of security duties throughout the IT organization.
“We are starting to find a lot of functions that were being done by security are being offloaded to different teams, such as the networking staff,” Stamp explains. “Network teams are not used to dealing with software, but they are accustomed to appliances.”

Choosing Your Weapons
The blossoming security appliance market breaks down into several categories. The largest is the group of firewall and VPN appliances that come from a wide array of networking and security vendors. Cisco, for example, builds security features into its routers and switches, and offers specialized VPN, firewall and IPS appliances, so-called “Anti-X” editions covering viruses, worms, spyware and spam, as well as combo boxes.
Then there are specialized security appliances to address specific activities. Decru, Inc., now part of Network Appliance, has something called the DataFort, a dedicated appliance for encrypting and decrypting network traffic. Bluesocket, Inc. makes appliances for controlling the interface between wired and wireless segments of a network. ConSentry Networks makes appliances that sit between the access and distribution layers on a network fabric and use algorithms to detect network anomalies.
Further, there are perimeter security devices from vendors with roots in the antivirus space. Although Symantec last year laid off its security appliance staff to concentrate on managed services, both McAfee and Panda Software have unified threat management appliances that address a range of security issues.
Perimeter security appliances incorporate a number of features that one would typically find in a security
software suite. The advantage of the dedicated box is ease of setup—just plug the device into the network. While a corporate data center may have the support staff to manage separate applications, appliances are a good match for smaller companies that don’t have specialized IT security personnel. They’re also a good option for certain types of large enterprises. “We have found that it has more to do with how distributed an environment is, rather than the size of the company,” says Stamp. “Large retail chains like using appliances because they have many locations that have relatively small processing requirements.”
All’s Quiet
Stamp notes that appliances are ideally suited to filtering typical e-mail and Web systems. But when traffic loads are unknown or less predictable, he advises going with software on a server. “If the server is overloaded, you can generally stick another CPU or additional memory in there, but you are a bit more constrained with an appliance,” Stamp says. “Appliances tend to be better when you can predict the load, and an organization generally knows how much e-mail it gets in a day.”
He also says that a general-purpose security appliance might not be the best choice when the different types of traffic it filters are managed by different groups within the organization. “If you have an organization where e-mail filtering is supported by a different team than the Web content, you have a lot of finger pointing when something goes wrong,” Stamp explains. “You need to determine ahead of time who owns the box and what procedures to follow for dealing with issues that arise.”
Given those caveats, installing a security appliance still allows an organization to achieve a much higher level of security without an excessive management headache. ý
Ryan Sherstobitoff is product technology officer at security appliance maker Panda US.