A BZ Media Publication
Best Practices: Post-Deployment Performance Tuning
VOLUME 4 • ISSUE 6 • JUNE 2007 • $8.95
Stretch Your Budget: How To Pick The Best Free Software
Thou Shalt Test Thy Software Better
.NET Testing With Intelligent Random Data
www.stpmag.com
Flexible Testing Advice That Won’t Cramp Your Style
Trying to be agile when your Java code is fragile? Feeling the pressure to release software faster? Are you bringing new features to market as quickly as your business demands? If enhancing or extending your Java application feels risky – if you need to be agile, but instead find yourself hanging on by a thread – AgitarOne can help.
With AgitarOne’s powerful, automated unit testing features, you can create a safety net of tests that detect changes, so you know instantly when a new feature breaks something. Now you can enhance and extend your Java applications – fast, and without fear. And with AgitarOne’s interactive capabilities for exploratory testing, it’s easy to test your code as you write it. Whether you’re building a new application, adding new features to existing software, or simply chasing a bug, AgitarOne can help you stay a jump ahead.
©2007 Agitar Software, Inc.
TAKE THE HANDCUFFS OFF QUALITY ASSURANCE
Empirix gives you the freedom to test your way. Tired of being held captive by proprietary scripting? Empirix offers a suite of testing solutions that allow you to take your QA initiatives wherever you like. Visit us at the Better Software Conference & EXPO 2007, booth #29, to receive your Get Out of Jail Free Kit including our whitepaper, Lowering Switching Costs for Load Testing Software.
www.empirix.com/freedom
VOLUME 4 • ISSUE 6 • JUNE 2007
Contents
14
COVER STORY Implementing Agile Testing Doesn’t Have to Be a Stretch
Embracing change is one of agility’s toughest and richest challenges. But your agile adventure doesn’t have to be all or nothing. These tips will help you stretch to integrate agility into your process today. By Bob Galen
22
How to Pick Good Software Want a sure-footed way to stretch your budget? Gauge project maturity and business readiness with any number of models and frameworks. By Alan Berg
Departments
7 • Editorial In which the editor waxes rhapsodic about the virtues of open source.
8 • Contributors Get to know this month’s experts and the best practices they preach.
28
Thou Shalt Experiment With Thy Software To tame testing chaos, try Taguchi! This take on R.A. Fisher’s Design of Experiments will put you in touch with your creative side—and make you a far better tester. By Yogananda Jeppu
33
Practice .NET Testing With IR Data
When you need to determine what test data to use, try intelligent random test data to increase the breadth of your data coverage and your chance of exposing defects. By Bj Rollison
9 • Feedback Now it’s your turn to tell us where to go.
10 • Out of the Box New products for developers and testers.
40 • Best Practices Post-deployment tuning can drive you wild. Care for a sanity check? By Geoff Koch
42 • Future Test Third parties are everywhere! Keep an eye on your composite Web apps. By Imad Mouline
VOLUME 4 • ISSUE 6 • JUNE 2007

EDITORIAL
Editor: Edward J. Correia, +1-631-421-4158 x100, ecorreia@bzmedia.com
Editorial Director: Alan Zeichick, +1-650-359-4763, alan@bzmedia.com
Copy Editor: Laurie O’Connell, loconnell@bzmedia.com
Contributing Editor: Geoff Koch, koch.geoff@gmail.com

ART & PRODUCTION
Art Director: LuAnn T. Palazzo, lpalazzo@bzmedia.com
Art/Production Assistant: Erin Broadhurst, ebroadhurst@bzmedia.com

SALES & MARKETING
Publisher: Ted Bahr, +1-631-421-4158 x101, ted@bzmedia.com
Associate Publisher: David Karp, +1-631-421-4158 x102, dkarp@bzmedia.com
List Services: Agnes Vanek, +1-631-421-4158 x111, avanek@bzmedia.com
Advertising Traffic: Phyllis Oakes, +1-631-421-4158 x115, poakes@bzmedia.com
Reprints: Lisa Abelson, +1-516-379-7097, labelson@bzmedia.com
Director of Marketing: Marilyn Daly, +1-631-421-4158 x118, mdaly@bzmedia.com
Accounting: Viena Isaray, +1-631-421-4158 x110, visaray@bzmedia.com

READER SERVICE
Director of Circulation: Agnes Vanek, +1-631-421-4158 x111, avanek@bzmedia.com
Customer Service/Subscriptions: +1-847-763-9692, stpmag@halldata.com
Cover Photograph by Josef Vital
President: Ted Bahr • Executive Vice President: Alan Zeichick
BZ Media LLC 7 High Street, Suite 407 Huntington, NY 11743 +1-631-421-4158 fax +1-631-421-4130 www.bzmedia.com info@bzmedia.com
Software Test & Performance (ISSN- #1548-3460) is published monthly by BZ Media LLC, 7 High St. Suite 407, Huntington, NY, 11743. Periodicals postage paid at Huntington, NY and additional offices. Software Test & Performance is a registered trademark of BZ Media LLC. All contents copyrighted 2007 BZ Media LLC. All rights reserved. The price of a one year subscription is US $49.95, $69.95 in Canada, $99.95 elsewhere. POSTMASTER: Send changes of address to Software Test & Performance, PO Box 2169, Skokie, IL 60076. Software Test & Performance Subscribers Services may be reached at stpmag@halldata.com or by calling 1-847-763-9692.
Ed Notes
It’s Open Source Open Season
By Edward J. Correia

A number of major strides in tool development have been made recently—for the betterment of software testers everywhere. Most notable is the release in mid-May of Bugzilla 3.0, the open-source bug catcher born in 1998 with little fanfare.
I’m sure I’m not the first person to notice this, but I think it bears repeating. There’s a certain honesty about open source development and the processes they use. And I’m referring not to the obvious aspect of its honesty in that code is exposed for all to see. Good or bad, there’s no hiding your mistakes, and often critics are brutally honest and spare no hard feelings.
What stands out in my mind is the honesty that open source developers exhibit in their version numbering. It’s surely no surprise that commercial developers for years have engaged in the practice of boosting their version numbers to market or promote their product, to keep up with a competitor or just to get fresh coverage. But just as often, the open community does the reverse—erring on the side of humility, and inserting decimals in places that make marketing people cringe and that commercial developers have never even heard of.
I came to this realization after reading Justdave’s blog entry of 05.10.07, the day after Bugzilla was released. In it, Bugzilla project lead Dave Miller asks, “Why did it take Bugzilla nine years to get from version 2.0 to 3.0?” I thought his answer was telling.
Some of you may recall a plan by Ian Hickson (a.k.a. “Hixie”) about six years ago to rewrite Bugzilla from scratch (I’ll admit that I don’t, but I certainly recall the prior effort that resulted in Netscape 7). “Hixie’s design for Bugzilla 3… was really pretty good,” said Justdave’s blog entry. “But very few people joined him to work on it,” and the effort eventually died. His posting described how Bugzilla’s numbering during the ensuing six years went from 2.2 to 2.22, but never to 3.0.
“Since Hixie had already taken the version 3 number, we didn’t want to confuse anyone, so the existing code base continued to evolve adding numbers to the minor version component of 2.x, even though several…releases had fairly major new features.” It wasn’t until last summer that “some of us realized that Bugzilla…was deserving of a major version bump,” and after a team meeting and consensus on a roadmap, “we decided there was no need to worry about confusion (otherwise we would have called this Bugzilla 4.0 instead.)”
The deference paid to a product’s capabilities and the loyalty to those who came before were the factors that led the team to finally take the leap. “The vast majority of the design goals from Hixie’s Bugzilla 3 proposal were met along the way by iterative development of the existing code base,” including a complete rewrite of the back end and an output templating system. With more people like these, the world would be a much better place. For more details on version 3, please see Out of the Box on page 10.

Open Nominations
Nominations are now open for the 2007 Testers Choice Awards, which we bestow upon the top test, QA and performance tools as voted by you, the readers of this magazine.
Nominations are open now through July 13, 2007. Once the nominations are tabulated, we’ll open our Web site for your voting. Ballots will be accepted from Aug. 1 through Sept. 4, 2007.
For more information, visit stpmag.com/testerschoice.
You’re Reading A Winner!
Software Test & Performance magazine is a 3-time winner in the 2007 American Inhouse Design Awards, from the editors of Graphic Design USA.
Its three winning designs were selected from more than 4000 submissions, the most ever according to organizers.
Contributors
If your team has been grappling with the transition from traditional testing to agile methods, or is about to, you’ll want to read our cover article, written by BOB GALEN—a regular contributor to these pages. Galen explains which types of organizations lend themselves best to adoption of Scrum and other methods, and gives real-world advice on how to implement them without too much heavy lifting. The article begins on page 14. In his 25 years in software development, Galen has designed everything from Web applications to real-time systems. He is principal of RGalen Consulting Group, a software development consultancy.
Judging open source quality is something that ALAN BERG does on a regular basis. As lead developer at the Central Computer Services at the University of Amsterdam for the past seven years, Berg has sought out and contributed to dozens of projects for the advancement of the systems he oversees. In the article that begins on page 22, Berg turns his experiences into insights that can help as you search for the perfect open-source code base for your project. Berg is the author of numerous articles and papers on software development and testing. He holds a bachelor’s and two master’s degrees and a teaching certification.
YOGANANDA JEPPU is a scientist at IFCS Aeronautical Development Agency in Bangalore, India. Beginning on page 28, Jeppu demonstrates a creative flair by taking what some might consider a dry subject—orthogonal arrays—and peppering it with interesting and relevant quotes from industry and historical notables, wrapped neatly within a cloak of divinity. Jeppu’s published works are many, and relate mostly to test methodologies of real-time systems for aeronautics-industry control systems for performance and quality assurance. In his present post since 1995, Jeppu has also served as a scientist for seven years at the Defense Research and Development Laboratory in Hyderabad.
BJ ROLLISON is a test architect in the Engineering Excellence group at Microsoft. Since 1994, Rollison has also served as test lead of the setup team for international versions of Windows 95, international test manager for IE and other Web client projects, and the director of test training. Beginning on page 33, Rollison explains how to increase test coverage of .NET applications using stochastic (random) character generation vs. static data in automated tests. He also outlines a strategy to avoid common errors associated with using random data through the use of seed values. Using C# code examples, he shows how to equivalence-class data into subsets.
TO CONTACT AN AUTHOR, please send e-mail to feedback@bzmedia.com.
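Rollison’s examples are in C#. Purely as a generic illustration of the two ideas mentioned above, drawing test input from a single equivalence class and seeding the random generator so a failing run can be reproduced, here is a rough Java sketch. It is not code from the article; the class name, the chosen character set and the seed value are all made up for illustration.

import java.util.Random;

// Illustration only: one equivalence class (lowercase Latin letters) and a
// fixed seed so the same "random" strings can be regenerated when a test fails.
public class RandomDataSketch {
    private static final char[] LATIN_LOWER = "abcdefghijklmnopqrstuvwxyz".toCharArray();

    public static String randomLatinString(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(LATIN_LOWER[rng.nextInt(LATIN_LOWER.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long seed = 20070601L;          // log the seed with the test results
        Random rng = new Random(seed);  // same seed yields the same data later
        for (int i = 0; i < 3; i++) {
            System.out.println(randomLatinString(rng, 8));
        }
    }
}

Recording the seed alongside the results is what makes randomly generated data debuggable; rerunning with the logged seed reproduces the exact inputs that exposed a defect.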
Feedback
AVOIDING THE QUESTION In Edward J. Correia’s “What’s the Best Way of Testing Flash Apps?” (T&QA Report, Mar. 27, 2007), Bill Perry did not answer your question. Maybe he’s been working mostly on mobile app development lately and has nothing to offer testers? So, like a politician, he answered the question he wished you had asked, not the one you did ask. Tom Tulinsky Ladera Heights, Calif.
USING FLASH ELSEWHERE The article “What’s the Best Way of Testing Flash Apps?” only seems to talk about Flash on mobile devices. In our organization, we use Flash in a simple way to present dynamic information and reports—information like graphs that can be changed to the style of graph the user likes: bar, pie, line, etc. As most of the UI is normal HTML with JavaScript etc., our UI tests can drive it. I can’t, however, validate the graphs and a couple of other small bits. This is my pain point as a tester. This may mean that reporting is a candidate for not automating, but due to the size and complexity of the reporting, it’s a hugely time-consuming manual task, and none of us are particularly confident that it has been done justice. So I would like to be able to validate Flash in this environment. Personally, I think Flash is overkill for this, as the UI is an administration UI mainly for IT engineers. So flashy graphics may look nice but don’t fool anyone. But then I’m only a tester, and what does my opinion count for? Name withheld
FLASH GETS FLEX-IBLE Flash is a compelling platform to build Rich Internet Apps; however, an increasing majority of these apps are today created using Adobe Flex, which still targets the Flash platform, but does so almost completely from code as opposed to the Flash IDE. Flex has a number of tools available for automated testing, from Flex Unit, a unit test framework, to full functional testing integrated with top test tools such as Mercury QuickTest Pro. There is a lot of misinformation about the Flash platform, and as a developer who uses it continually, I am interested in people knowing that testing tools do exist and the modern tools are beginning to rival and exceed those from comparable platforms. Michael Labriola Park Ridge, Ill.
TECHNICALLY CORRECT, BUT NOT FUNCTIONALLY CORRECT Regarding Edward J. Correia’s article, “How to Avoid a Self-Fulfilling Failure Scenario” (Test & QA Report, April 3, 2007), Robin Goldsmith says that while testability and clarity are important, they mainly address form, not content. “A requirement can be testable and wrong; and testability is irrelevant with respect to an overlooked requirement.” I think that in addition to an overlooked requirement, the statement should also include requirements that are technically correct, but not functionally correct: The requirement gathered appears correct from a technical perspective and is correctly recorded in a use case, but fails to provide the required business functionality. Often a user/customer thinks the requirement did state the functional need correctly and finds that when executing a test script, the technical aspect of the function is correct, but the business functionality is wrong. When they question the developer, the developer states, “You signed off on the use case and the identified requirement, and the test does not fail. Therefore, any change is a change of scope that must be paid for.” This is a real problem, especially when the gathering of the requirements fails the business need. Jim Watson New Albany, Ohio
GENERALLY VALUABLE, BUT… I got a little lost and confused in “What’s the Best Way of Testing Flash Apps?” It started out as an interesting article on testing Flash apps, as the title reads. It turned into a marketing pitch for all the things that Flash can do. I try to read as many of the testing QA articles that I receive because they tend to be interesting and valuable. This one, unfortunately, does not qualify, mainly because of the change of topics in the middle. Thank you for providing what is generally a valuable resource to the software development community. Gary Klesczewski Berlin, Conn.
FLASH IS A WASTE! So you asked the question “How would you suggest that one tests a Flash-based application?” and Bill Perry answers “You should be testing mobile applications.” What sort of misdirection was that? The only thing worse than that answer was the fact that you published it and all that followed in your article. You should have concluded your article with “You shouldn't be wasting your time trying to test Flash-based applications. In fact, you should strongly advise your company to avoid using Flash for development,” as it would have been the only honest thing to do. Walter Gorlitz Surrey, B.C., Canada
CORRECTION: The CTO of build-tools maker OpenMake Software is Steve Taylor. He was misidentified in the May issue.
FEEDBACK: Letters should include the writer’s name, city, state, company affiliation, e-mail address and daytime phone number. Send your thoughts to feedback@bzmedia.com. Letters become the property of BZ Media and may be edited for space and style.
Out of the Box
Bugzilla 3.0: Faster, More Giving to Customs
Bugzilla 3.0 was made generally available in early May, and gives users of the open-source defect-tracking system faster performance, a Web services interface, and the ability to control and customize far more of the environment than in prior versions, according to project developers.
Faster performance comes by way of support for Apache’s mod_perl, thanks to back-end code that’s been refactored into Perl modules that interact with the database to deliver “extremely enhanced page-loading performance,” according to Bugzilla release notes. The recommended minimum server memory is 1.5GB and as much as 8GB for sites with heavy traffic, if the option is enabled. Bugzilla also will still run as a CGI application if performance isn’t critical, system memory is an issue or for servers running something other than Apache.
From the accommodation department, Bugzilla now permits custom fields, custom bug resolutions and defaults (though fixed, duplicate and moved remain untouchable), and the ability to assign permissions on a per-product basis. Also new, developers and testers can now officially file and modify bugs via e-mail (available previously as unsupported) and set up default cc: lists to force certain addresses to always be added to lists for specific components. Administrators are now notified of Bugzilla updates when they log in. A new globalwatchers parameter allows lists of addresses to receive all bug notifications generated by the system. All outbound e-mails are now controlled through templates, allowing them to be easily customized and localized as part of a language pack. A new mailfrom parameter offers control over which addresses show up in the from field on outbound messages.
Interface improvements include unchangeable fields that appear unchangeable, warnings when a duplicate bug is about to be accidentally submitted (such as by going back in a browser or refreshing a page), and a navigation and search box at the top of each page in addition to the one that was always at the bottom. Also new are customizable skins (using CSS) and saved searches, which allow group members to subscribe to searches saved by others in that group. There are now QuickSearch plug-ins for Firefox 2.0 and IE7. Dozens of additional enhancements for Bugzilla users and administrators are listed at www.bugzilla.org/releases/3.0/new-features.html.

Gomez Puts QA on SaaS Map
If you’re in the market for an in-house QA software solution, you might consider holding off for a week or two. Web application interface management tools maker Gomez this month is set to release a set of software-as-a-service solutions that it claims are the first quality assurance solutions to be offered in this way.
The trio of services, dubbed OnDemand QA, is available separately to provide load testing, functional testing for AJAX-enabled applications, and visual performance testing across a variety of browsers. According to the company, the so-called Reality Load XF applies real-world loads to applications using Web-connected devices in scenarios both inside and outside the firewall. More than 14,000 measurement locations are available to simulate load characteristics of the actual user base of applications under test, says the company.
Reality Check XF, the functional tester, applies use-case scripts recorded with the Selenium open-source Web-testing framework to AJAX apps. Tests against multiple browsers and operating systems can be scheduled or performed manually, and can provide screen captures and full playback.
Third in the trio is Reality View XF, which tests for visual performance of Web pages across Firefox, IE6 and 7, Opera, Safari and other browsers on multiple operating systems and at multiple resolutions. Page-rendering performance results are reported to enable optimization. The services will be available at www.realityqa.com.
Software services from Gomez perform load, QA and usability tests on Web apps in a variety of screen resolutions and browser types.
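For readers who haven’t seen Selenium, the open-source framework Reality Check XF records against, the sketch below shows what a hand-written Selenium RC style check looks like in Java. This is not Gomez code; the site URL, element locators and expected text are placeholders, and it assumes a Selenium server is listening locally on the default port 4444.

import com.thoughtworks.selenium.DefaultSelenium;
import com.thoughtworks.selenium.Selenium;

// A minimal, hand-rolled functional check; locators and URL are hypothetical.
public class SearchSmokeTest {
    public static void main(String[] args) {
        Selenium selenium = new DefaultSelenium(
                "localhost", 4444, "*firefox", "http://www.example.com/");
        selenium.start();
        try {
            selenium.open("/");                        // load the start page
            selenium.type("id=query", "test plan");    // fill the search box
            selenium.click("id=search");               // submit the form
            selenium.waitForPageToLoad("30000");       // wait up to 30 seconds
            if (!selenium.isTextPresent("results")) {  // crude assertion
                throw new AssertionError("Expected results text not found");
            }
        } finally {
            selenium.stop();                           // always release the browser
        }
    }
}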
COVERITY FINDS THE DEADLY DEFECTS THAT OTHERWISE GO UNDETECTED. Your source code is one of your organization’s most valuable assets. How can you be sure there are no hidden bugs? Coverity offers advanced source code analysis products for the detection of hazardous defects and security vulnerabilities, which help remove the obstacles to writing and deploying complex software. With Coverity, catastrophic errors are identified immediately as you write code, assuring the highest possible code quality— no matter how complex your code base. FREE TRIAL: Let us show you what evil lurks in your code. Go to www8.coverity.com to request a free trial that will scan your code and identify defects hidden in it.
© 2007 Coverity, Inc. All rights reserved.
Your code is either coverity clean—or it’s not.
Reticulitermes Hesperus, or Subterranean Termite—unchecked, property damage estimated at $3 billion per year. Electron Micrograph, 140X
GUI-er PushToTest 5
PushToTest 5, the open-source SOA automation framework now in alpha, is scheduled for general availability this month. According to Frank Cohen, founder of the like-named consultancy, the new version will have the ability to create, edit and debug GUI-based test scenarios, to pause, modify and resume running tests, and to execute tests on distributed nodes. Also supported will be dynamic JAR loading, testing of REST and AJAX applications, multidimensional load testing, HTTPS support in the test recorder, resource monitoring and email notification. Cohen said PushToTest stands out for its focus on SOA governance and automation.
Your App Go Splat?
Ever wonder what was going on inside a user’s machine just before your application crashed? Of course you have. A service from Bugsplat Software can put your name on the “Send Report” dialog instead of Microsoft’s, costs less than Redmond’s “free” WinQual service, supports non-native apps, and provides more information about the cause of the crash, according to the company. The Bugsplat service uses the Windows Minidump technology to deliver data about application stability, including function names, source-code line numbers, automatically calculated call stacks, statistics about the most critical bugs and the ability to include log files, system info and other custom data to reports. The service for Windows, .NET and Java apps also optionally delivers any name on the send report dialog and contact info of the crash victim. Versions for Linux and Mac OS X are planned. Bugsplat (www.bugsplatsoftware.com) integrates with defect tracking systems.
ANTS-y .NET Profiler
Fast application performance is not one of the hallmarks of the managed environment. But Visual Studio 2003 or 2005 developers now have one more performance tool to consider.
ANTS Profiler 3.0 from Red Gate Software is now fully integrated with Visual Studio, giving developers using .NET languages a means to profile and optimize application performance without leaving either Microsoft IDE. The latest version identifies performance bottlenecks within applications and provides fast or detailed feedback. Detailed mode delivers a line-level report of code execution time. Available in one-tenth the time is a fast-mode report of the execution time of methods. In addition, version 3.0 now supports 64-bit systems, IIS7 in Vista, ASP.NET Web servers and .NET 3.0 apps based on WCF, WPF and Windows Workflow.
Testers Asking ‘What’s SAMATE?’
Klocwork K7.7 now permits users of Visual Studio .NET 2005 and IntelliJ IDEA to analyze code from within those environments. It also has expanded stack trace capabilities for all supported IDEs, including prior versions of Visual Studio as well as Eclipse, IBM RAD 6, Wind River Workbench and QNX Momentics 6.3, on Linux, Solaris and Windows. K7.7 also improves Java code checking, now with new coding warning practices and the ability to “tag certain Java methods as unsafe,” according to company documents. Accuracy rates for C/C++ and Java also have been improved, as has reporting. The company also reported a 90-percent pass rate when testing nearly 1,400 known security vulnerabilities provided by the Software Assurance Metrics and Tool Evaluation (SAMATE) program, part of the NIST and the Department of Homeland Security. According to the SAMATE Web site (samate.nist.gov), the term is actually pronounced suh-MATE, but hey… tomato, tomato.
Fanfare for The Common Test Action
For people testing software for devices, it’s not unusual to have to repeat many of the most common tasks—setting up the devices and configuring the test beds—over and over again; these processes can often only be done manually. Claiming to automate many of these test actions is Fanfare Group, maker of FanfareSVT, an integrated development environment for testing complex equipment and systems. In mid-May the company unveiled iTest Personal, a recording solution that according to the company can capture a tester’s every action, command and test executed through any number of interfaces, including command line, SNMP, Web or CMD shells. Devices and test actions can be recalled and reproduced; documentation is generated automatically, and chronicles each command, action and device response. Test scenarios—which also function as test scripts—can be sent to remote testers, developers or automation teams for reference, editing and reuse through a drag-and-drop interface. The tool is available now for Linux, Solaris and Windows.
A Requirements LifeCycle Blueprint
Requirements life-cycle management isn’t a term that’s on too many lips, a fact that test tools company Blueprint hopes to change. Formerly known as Sofea, the company in May, along with a name change, introduced Requirements Center 2007, a solution it says provides developers and testers with application simulation, requirements-driven testing and collaboration for geographically distributed teams. Available in three editions, an analyst version includes tools to define “application functionality, simulate end-to-end processes and package business context” into a reference platform usable by others. The developer version lets programmers and system architects create or further enhance the simulations and export development artifacts from requirements models to their existing development environments. A version for testers links requirements definition with QA and can generate functional tests that provide complete coverage of those requirements. All are available now as a free upgrade for current customers of Blueprint’s Profesy visual requirements tool.
Send product announcements to stpnews@bzmedia.com
Memory Loss Affecting Your Multi-Threaded, Multi-Process Application? Announcing MemoryScape 2.0 Now even smarter!
Download your free 15-day trial version of MemoryScape 2.0, the newest memory debugger from TotalView Technologies — now with MPI support. Provided by TotalView Technologies, the leader in multi-core debugging software, MemoryScape is specifically designed to address the unique memory debugging challenges that exist in complex applications. MemoryScape supports C, C++ and Fortran on Linux, UNIX and Mac OS X, and is a focused, efficient and intuitive memory debugger that helps you quickly understand how your program is using memory as well as identify and resolve memory problems such as memory leaks and memory allocation errors. MemoryScape utilizes wizard-based tools, provides collaboration facilities and does not require instrumentation, relinking, or rebuilding. Its graphical and intuitive user interface simplifies memory debugging throughout the memory debugging process and has been proven to find critical memory errors up to 500% quicker. Before you forget, go to our web site at www.totalviewtech.com/memoryscape or call 1-800-856-3766 for more information. © 2007 TotalView Technologies, LLC TotalView is a registered trademark of TotalView Technologies, LLC. All other names are trademarks of their respective holders.
Implementing Agile Testing Doesn’t Have To Be a Stretch
You Can Take Advantage Of Its Benefits Without Bending Over Backward
By Bob Galen
In the training ground of agile discussion groups—where newbies learn the ins and outs of agile development—members generally try to be helpful. But the nature of that help is sometimes skewed depending on the purist or pragmatist tendencies of the helper. For example, a recent dialogue on an agile-testing discussion list went something like this:
New Scrum user: We are a waterfall shop transitioning to Scrum. Our QA team has some reasonable automation experience. However, we’re concerned about our ability to run our regression test suites within the context of the agile iterations. Of particular concern is how we perform integration testing across multiple Scrum teams developing interdependent product components. How do we do internal sprint testing and still provide sufficient integration testing without being disruptive? Any approaches out there?
Almost immediately, an agile purist replied with the following advice: Do test-first development. Developers write unit tests. Testers write acceptance tests. Automate everything and run it within the iteration. You’re asking if you can do it differently or partially. No! Do test-first development, and everything works out.
Then, almost as if on cue, an agile pragmatist offered what I view as much more useful and sage recommendations: I’m a tester in a team that’s been doing Scrum for about two years now. Our approach is that developers write and run unit tests while testers try hard to write system tests in the same sprint as new features are developed; we also try to automate them where it makes sense. We try to reserve the final sprint of the project for a complete rerun of the manual regression tests and occasionally plan a sprint so that time is reserved to do a partial or full run. It’s not ideal, but like any real-world activity, the project constraints are also far from a textbook model. We struggle with having the time to keep the automated regression tests running clean while we’re developing new tests for new features. We also have less test coverage and less test automation than we’d like.
In other words, business as usual…
What’s the Point?
The second, more useful reply contains practical advice based upon personal experience—balanced across what’s working and the associated challenges. Agile methods work best in specific contexts: new projects, small teams, engaged customers, situations with ambiguous and unknown requirements and with little or no legacy-testing baggage holding them back. But in the real world, these project conditions are difficult to come by—particularly the legacy-testing baggage part. And it’s not as if those manual and automated regression tests are useless. Often, they contain domain experience and coverage far beyond their slowly growing agile counterparts—
value that needs to be leveraged until the team gains sufficient experience and coverage. I’m not saying the agile purists are wrong. In fact, every new methodology needs evangelists who are passionate about the entirety of their approaches and who firmly push for adoption. They’re defining the nirvana state—the goal to which we all must aspire in order to be truly agile. But for those of us in testing who have adopted (or been adopted into) agility and who aren’t blessed with a “greenfield” project space, I’m offering this article as a means to limber you up to agile methods. So come along with me to discover how testers can bend, stretch and contribute to Scrum project scenarios and contexts that might not be in the best shape, www.stpmag.com •
nor perfectly aligned with pure agile contexts.

Adapting Legacy Testing Practices Toward Agility Within Scrum

FIG. 1: AGILE VS. TRADITIONAL
Agile: Developer-Driven
• Unit tests
• Continuous integration
• API & low-level integration
• Data integrity & object interaction
Agile: QA-Driven
• Customer acceptance testing
• Feature testing
• Exploratory testing
• Limited integration & regression
Traditional: QA-Driven
• Functional (feature-driven) testing
• System testing, end-to-end testing
• Regression testing
• User acceptance testing
• Load and performance testing
• Nonfunctional requirement testing
• Security testing
• Usage scenario testing
Core Agile Tenets
A quick review of the key principles that drive all of the agile methods, including Scrum:
Time-boxed iterations. Usually two to four weeks in length. Iterations have an initial planning session, daily stand-up meetings and a closing demonstration of delivered results.
Emergent requirements, architecture and design. Instead of trying to anticipate every need, the methods try to evolve products on an as-needed and just-enough basis. Products emerge from the team’s efforts in collaboration with the customer.
Customer inclusion. Customers are a critical part of the team, working to provide insight on what to build, prioritization and business value, as well as serving as a liaison to the broader business. Often, testers partner with customers to demonstrate and confirm feature acceptance.
Parallel execution. Teams are composed of all functions required to meet each iteration’s planned goal. The teams also employ as little multitasking as possible.
Self-directed teams. Teams plan and strategize on how best to meet customer expectations for features. They work toward a common goal and help one another achieve it, regardless of particular expertise.
Transparency. Agile results, for good or for bad, are ultra-visible. You’ll know
exactly where you stand each day, and will fully understand your teams’ capacities. Problems will be surfaced immediately for action. By and large, stakeholders welcome this aspect. Common agile practices push testers to the ends of the iteration. Early on, they’re involved in the detailed description and clarification of features for inclusion in the product backlog for prioritized implementation. During the later stages of the iteration, they focus on automating acceptance testing and ensuring that the iteration results demonstrably work in the sprint review. In both cases, they collaborate heavily with customers in elaborating and testing requirements.
With that bit of context, let’s plunge into our agile workout regimen. Much of it has to do with changing our traditional testing mindset and challenging many of our most cherished testing practices. I’ll start with a personal example. I’ve been known as a fervent test planner. I’d literally write hundreds of pages of planning documents that tried to capture the essence of every step of large-scale testing efforts. I found that, while they received sign-off, rarely were they read or truly understood. Even worse, they didn’t serve to guide testing execution with my teams. Under the banner of process or plan-driven action, I’d fallen into a trap—one that many of us fall into. Instead of focusing on process and planning, testing teams need to focus more on delivering business value, improving our cycle time and improving our own personal competency and capacity. Plans and process don’t do that for you. Agile thinking around people adapting to discovery and making good decisions does. So, like any good survivors in a new terrain, we need to stretch our conventional thinking. In this case, let’s explore how to implement and integrate testing support within Scrum sprints and agile iterations in general.
Testing Focus: Inside and Outside Of the Iteration
An agile team wants to completely test each iteration’s stories or features while focusing on automated unit and acceptance testing to provide adequate coverage. This is driven by the team’s holistic view of quality and delivering working code. However, every professional tester realizes that there are countless contexts where this level of testing is insufficient. In fact, it could even be dangerous in mission- or safety-critical projects. Professional testers need to “stretch” their agile teams toward performing the sorts of complete and extended testing that is typically not considered necessary within the agile iteration—for example, running interoperability, performance, regression or integration testing that is broad in scope and potentially based on legacy manual and automated test cases.
But iterations can be “broken” by too much testing. It can simply overwhelm the focus of the team and drive their attention away from delivering feature value. That’s why I look at our job as testers as one of balancing testing focus and coverage in two directions. Within the iteration, our prime directive is to keep the quality proposition of the deliverables on point. Ensuring the customer gets what they need—measured by acceptance testing contracts. At the same time, we need to broaden our team’s view to the variety of testing needs and work with our Scrum Master, Product Owner and team to plan for all requisite testing coverage at appropriate points.
Achieving the proper balance can be a great challenge depending on your domain context, business needs and legacy testing culture. Nonetheless, we need to guide our teams through it toward production release. Figure 1 illustrates the testing focus points between agile and legacy contexts, depicting the tension and the gap that exists between the purist and pragmatist views. Scrum testers need to navigate these endpoints and effectively test across all as required. There are two iterative models that can help accommodate this balancing act.

FIG. 2: SPRINTING TOWARD THE FINISH
Stabilization Sprint(s)—focused on integrating the development release forward toward a production product release. Testing activities: full regression, overall integration, performance, usability, bug fix and production-promotion steps. The Product Owner defines the product backlogs and coordinates between development sprints.
Scrum: Stabilization Sprints
In the stabilization sprint model, Scrum teams iterate on building a product for several development-focused iterations. However, there are aspects of testing that simply can’t be completed within each iteration, such as integration or regression testing coverage. Over time, this builds up testing technical debt that must be mitigated. In this model, the team transitions from development-focused sprints toward stabilization or testing-focused sprints. Usually, stabilization sprints are planned to lead toward a production release of the product. The number of stabilization sprints can vary from team to team and product to product; I’ve seen as few as one two-week sprint for straightforward integration efforts to three or four 30-day sprints for larger, more complex projects. Some of the factors that influence the number of stabilization sprints include:
• The number of parallel development sprints coalescing into each stabilization sprint
• The amount of time since the last stabilization sprint: the amount of accrued testing technical debt
• The amount of integration (environment setup, execution and rework) effort required
• Rework levels (defect repairs and clean-up) and release-quality criteria levels
All contribute to the number and length of the stabilization sprints. Figure 2 depicts a Scrum stabilization sprint model’s transition and focus.
One of this model’s greatest challenges is resource sharing. Once the product increments move toward stabilization, the development-focused sprint teams are largely idle. Of course there will be defects to repair, and other work driven from the stabilization sprint, but rarely will it consume all of the team’s resources. Usually, teams are pressured to move on to the next segment of development work (enhancements, maintenance and new products), and the test and development resources oscillate their focus across the two projects. This can put pressure on the testing team because they often have to be in two places at once. Their prime directive is to execute and drive the testing within the stabilization sprints. However, if the development sprints proceed to their next sequence of work, they need to be members of those teams as well. This oscillation or multi-tasking between development and stabilization sprints can put tremendous pressure on testing team staffing ratios. The answer is often to hire more testers; but rarely does management want to deal with that reality—particularly in an agile space. Another approach is to tightly couple the development resources to the first round of stabilization sprints. In reality, you’ll find enough integration issues and bugs to keep everyone occupied. Rarely is it necessary to continue this heavy development-sprint resource focus beyond the initial stabilization sprint. Each product team, or set of teams, must find its resource-sharing balance between development- and stabilization-focused sprints.

UN-AGILE
I have a client who was operating within a large-scale Scrum instantiation, with 30+ simultaneous Sprint iterations. They worked in a regulated environment and had a large test infrastructure built up to cover their applications. About 20 percent of the test cases were automated. They found themselves operating effectively within their Scrum development sprints, but chose to focus most of their testing resources outside the agile teams operating in that traditional system-test mindset. In essence, they bolted traditional waterfall testing onto agile/Scrum development. This actually slowed their overall deployment by two to three times that of their previous waterfall approach. They believed—and I agree—that the primary driver for this was the lack of tester involvement and engagement in the development sprints. Testers lost context on what was being delivered, leading to an extreme effect on their test efficiency. This misadventure amplifies the need for testers to internalize these processes, engage their agile teams in a balanced fashion and not to get stuck in their traditional methods.
Scrum: Skewed Testing Sprints
An alternative approach for testing beyond the development sprint is what some would call the waterfall-esque variant. In it, you set up two “connected” sprints—one focused on developing product increments and the other on subsequent testing. In this model, the idea is to mitigate the testing technical debt on a release-by-release basis. Part of the strategy is to avoid the number of iterations and complexity associated with stabilizing a large set of development sprints. This model supports a small number of development sprints feeding the testing sprint, with the capacity of the receiving testing-sprint team being the primary throttle. By far, the most typical arrangement I see is one to three development sprints feeding a testing sprint. Usually, testers in the development and testing sprints have a liaison or shared-resource relationship. The liaison works to share knowledge for planning, hand-off transition and execution purposes. As in the case of the stabilization sprint, developers can be assigned to the skewed testing sprint. However, I’ve seen this occur much less often. What usually happens is that issues are reported backward to the development sprint for investigation work and repair. This can either happen by adding items to the product backlog and/or reserving some team bandwidth in the current sprint backlog for this sort of work. Figure 3 provides an illustration of the model.
It’s important to note that this model is no excuse to fall back into traditional testing patterns. All team members should focus on getting as much testing done (unit, acceptance, regression and integration) within the development sprints as possible—as well as on automating those tests. Skewed testing sprints are often used as a transitional model for larger teams moving into agility. Early on, more testing is focused within the testing sprints than ought to be—primarily because the development teams are new to agile testing practices. But over time, the focus and breadth of the skewed testing sprints should be narrowed in duration and resources.

EXAMPLE CONTEXT WHERE AGILE TESTING ADAPTATION IS REQUIRED
Suppose the following defines a project context for your testing effort moving to Scrum: A large enterprise application with 10 Sprint teams developing features in parallel to support a major new release of your flagship product. The domain is health care, and the testing will be driven by FDA coverage and compliance regulations. The company itself is operating at CMMI Level 3, which brings with it process and artifact requirements. The application has a significant baseline of test cases (6,500) that are run in regression, with only 10 percent of those being automated. This is the second overall product/project that has used Scrum. Organizational experience is nine months, and roughly 20 percent of the team has deep experience.
In this context, you’ll need to be adaptive in your agile methods. Clearly, you won’t initially be able to run your testing suites within each sprint or stop to automate everything. You’ll also need traceability and coverage reporting that will align much better with your traditional testing approaches than within agility. Finally, all of your management and reporting mechanisms will need to gradually be changed so that your stakeholders understand the quality levels of each product release. This example typifies some of the transitional testing challenges within agile teams.
FIG. 3: SKEWED RESULTS
Skewed Testing Sprint(s)—focused on providing more formal testing feedback by virtually running development and testing in parallel, skewed sprints, yielding interim or integration product releases. Testing activities: partial regression, limited integration, early performance, bug fixing and QA-promotion steps. The Product Owner and test team members coordinate bug feedback into the current development sprint backlog and product backlog.
From an agile purist perspective, this is probably the least attractive of the models because it separates team roles across development and testing activity and introduces hand-offs and staging delays that are truly wasteful. Nonetheless, in some situations, it makes sense to stage your iterations in this way; for example, when you have a heavy legacy-testing burden but want to make production releases in minimal time.
Melding Agile and Traditional Automation?
One of the more common challenges facing agile testers with a large-scale investment in traditional automation is that it doesn’t merge or fit well with agile automation tools and techniques.
Tools Integration. The first challenge is a philosophical one. Traditional testing tools are expensive, cumbersome, often dictate proprietary languages, and are targeted toward testers who typically have limited programming experience. Simply put, they’re not that attractive to developers familiar with the simpler and more powerful tools often used within an agile context. The open-source test automation tools that agilists favor also exacerbate this issue because they don’t integrate easily with traditional testing tools. Because of this, traditional tools
can be effectively used only outside of the core agile teams as a supplement to QA-centered testing (integration, regression and so on). However, as you move forward into agility, you should shift your investment in them to toolsets that can be better shared and operated collaboratively across the entire agile team.
At the Wrong Level. The second problem is that traditional or legacy automation investments are usually too brittle to be of use in an agile life cycle. They’re simply too susceptible to maintenance during application evolution and change. This is usually because the automation has been developed at the GUI or interface level, which increases its brittleness and maintenance burden. This actually leads into the Agile Testing Food Pyramid depicted in Figure 4. Traditional testing is usually conducted at the levels that the testers have more exposure to and control of—normally above APIs and at the GUI or external interface level. While this does enable them to write a high degree of automation, it also can be quite sensitive to change. Agile teams approach the testing problem from the opposite direction. The majority of the investment is at the unit-test level, with the smallest effort directed at traditional interfaces. This has (at least) two advantages. First, these tests are less sensitive to change—and when they do change, the changes are tightly coupled to the code that generated the change. That places the change burden squarely in the hands of each developer during their code and unit-testing development. Second, since the development team is writing the highest proportion of test cases, and these usually outnumber the testers, they generate more coverage, much faster. What a wonderful side effect!
Can We Merge the Two Approaches? I think not. While they certainly overlap and share some integration points, the approaches are fundamentally different. I think the agilists are going in the right direction and basically get it right: A breadth of solidly running automation coverage is key for high-quality, iterative software development. It needs to be nimble and fast. Too many traditional automation efforts struggle under their own weight when it comes to changes.
They’re expensive to create and disruptive to maintain. However, don’t throw away existing automation when bringing it into an agile team context. Instead, start changing your investment ratio to better align with the agile side of the test pyramid and begin to migrate or reuse your existing automation as appropriate.
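To make the base of the agile pyramid concrete, automated developer tests written under the APIs, here is a minimal, generic JUnit 3 style sketch. The ShoppingCart class is hypothetical and stands in for any domain object; the point is that the test exercises behavior directly, below the GUI, so it changes only when the behavior it asserts changes.

import junit.framework.TestCase;

// Hypothetical domain class, included only to make the example self-contained.
class ShoppingCart {
    private final java.util.List<Double> prices = new java.util.ArrayList<Double>();
    public void add(double price) { prices.add(price); }
    public double total() {
        double sum = 0.0;
        for (double p : prices) { sum += p; }
        return sum;
    }
}

// A developer-owned test that runs below the GUI and stays stable as screens change.
public class ShoppingCartTest extends TestCase {
    public void testTotalSumsAllItems() {
        ShoppingCart cart = new ShoppingCart();
        cart.add(19.99);
        cart.add(5.01);
        assertEquals(25.00, cart.total(), 0.001);
    }

    public void testEmptyCartTotalsZero() {
        assertEquals(0.0, new ShoppingCart().total(), 0.001);
    }
}

Tests of this shape are cheap for developers to write in volume, which is how the broad base of the agile pyramid gets built.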
FIG. 4: INVERSE PYRAMIDS
Traditional Pyramid (top to bottom): unit; functional & integration; features, regression & scenario. Agile Pyramid (top to bottom): regression & scenario; functional, integration & acceptance; automated developer tests—under the APIs, unit tests.

Defects, Metrics and Process
Agility also places change pressure on measurement and reporting practices. Often, testing teams are responsible for measuring application maturation by setting and mapping progress toward release criteria. Usually this takes the form of test-case coverage and defect trending. From an agile perspective, logging, discussing, prioritizing, tracking and verifying defects is much less interesting than simply fixing them as early as possible—or better yet, helping to prevent them in the first place. In fact, some agile teams don’t even use a defect-tracking system—preferring to manage the few intra-iteration defects that do surface by writing them on a story card. This illustrates the agile focus on working software and customer collaboration over planning and reporting.
But don’t view this as license to throw out all your standard metrics and techniques. Rather than internally managing the quality proposition, agile testers collaborate with their Product Owners and Scrum Masters to craft progress charts that illustrate how the entire team is meeting their overall goals. This quality-progress role is extremely important in agile teams, offering a place where our expertise can shed light on our true progress toward our sprint goals. Typically, these take the form of burndown and trending charts. There’s still a place for multifaceted release criteria within the agile methods; for example, setting unit-test coverage targets as a means of measuring sprint success and feature completeness. But these should be set and highly visible within the context of the team—and the cost supported by all, including the Product Owner. Again, this is a place where our expertise and skill can help direct our teams.
Agile Test Resource (People) Implications
Some of the largest challenges of a move toward agility relate to your team. Testers in agile teams must change at a variety of fundamental levels. They need to be more comfortable with the products’ technology set while working hand-in-hand with the developers as partners instead of adversaries. This surfaces in not only technical, but soft skills. Testers need to become comfortable working with developers, customers and stakeholders, while focusing their efforts more on requirement definition and acceptance-test development and automation. They need to adopt more of a service-oriented view within the agile team. In many ways they become facilitators for quality products rather than serving in a “quality cop” or “gatekeeper” role. This transition can be difficult for testers that are entrenched in this view of their role, but it’s a fundamental shift that needs to be made.
Tester-to-Developer Ratios
Since agile methods assume that testing and quality are a team responsibility, often the notion of developer-to-tester ratios gets lost in staffing Scrum teams. However, as stated earlier, there can be much more to testing agile deployed products than the effort imposed by unit and acceptance testing. And this also presupposes that the development team is passionately applying these techniques. Early, non-pristine agile contexts will actually need more testing resources than their traditional equivalents because testers are really being pulled in two directions. Within agile teams, testers need to be placed within each sprint to facilitate and execute as much testing as possible. But this usually doesn’t immediately lighten the burden for traditional test preparation: Writing test scripts; maintaining automation; preparing production lab and data environments, tools and process support; and running iterative testing cycles (stabilization and/or skewed testing sprints) can usually engage the entire pre-agile deployment testing team.
Testing teams are often forced to make a resource-direction commit decision. Frequently, teams focus on their traditional practices and tendencies, short-shrifting the Scrum development sprints. This is exactly the wrong decision! Though it may seem counterintuitive, they need to focus much more heavily on embracing agility and partnering more solidly as part of the Scrum sprints.
The Quality Champion What I consider the single most important aspect of agility for testers is not to lose sight of their role as quality champions. Changing practices takes time, and it’s easy to get caught in the “methodology muck” as you struggle to adapt your testing practices. Remember, it can be a great challenge to converge toward the agile principles in many realworld project and team contexts. On some projects, business pressure to release products in agile contexts corrals the team into inappropriate quality trade-offs, potentially compromising the quality practices that agile approaches are meant to engender. And in some contexts, simply focusing on unit and acceptance testing won’t meet the business needs, requiring far
20
• Software Test & Performance
M
INIMIZING TESTING TECHNICAL DEBT
In many agile contexts, there is the notion of technical debt from a code-base perspective. As legacy systems age and need maintenance, they can accrue more and more brittleness and technical debt, making repair work challenging due to architectural fragility and the impact of change side effects. The agile methods, particularly Extreme Programming, subscribe to a set of practices intended to mitigate this technical debt risk. The practices include implementing high-coverage unit testing, continuous integration, pair programming and aggressive refactoring. All of these are done on a finely grained, daily schedule so that this risk is continuously beaten down and mitigated. The same sort of debt creeps into systems from an agile testing perspective. However, it’s often impossible to perform sufficient testing within aggressive timeboxed iterations and still deliver significant content value. Therefore, testing debt will accrue in many contexts as development iterations progress. The only way to mitigate it is to: • Drive up as much testing as possible within each iteration • Heavily automate tests • Occasionally “pause” for integration and broader-scale testing activity—essentially beating down the debt
In these cases, testers can serve as champions of product quality in the same manner that developers champion architectural integrity:
• Craft solid acceptance tests for each sprint feature with the Product Owner while helping to craft overall sprint release criteria.
• With the Scrum Master and Product Owner, help to plan testing-focused iterations: length, breadth, requirements, focus, integration, regression and so on.
• Work with developers on their choices of what, when and how to test their deliverables.
• Provide data and metrics that help the entire team understand overall product quality and make good release-readiness decisions.
Agile Survival Skills
Agile survival is about challenging adaptation without abandoning your testing knowledge, skill and experience to start anew. Early on, many testers thought that the agile methods would put them out of work, fearing that developers would be doing "all of" the testing. In my experience, nothing could be further from the truth. However, we do need to adapt:
• Instead of being a quality cop... become a valued team member and partner, providing data for team-based insight and decisions. Serve as a testing subject-matter expert across the broader testing and quality landscape.
• Instead of being technically disconnected... become a partner, pairing with developers to focus and improve their testing via tools and techniques—showing the way toward properly designed tests.
• Instead of becoming requirement documentation–bound... become accepting of change and partner with the customer, narrowing the views of their needs and guiding the team toward realization and providing value. Facilitate the notion of acceptance for delivered features.
• Instead of having software thrown over the wall to you... become a part of the agile, iterative model and influence product quality throughout each project, leveraging your quality expertise and generating conversation, teamwork and testing.
I hope this survival guide will help you stretch your capacities so you'll be toned and ready to embrace agility.
ACKNOWLEDGMENTS
The concept of the "Testing Food Pyramid" is most often credited to agile veteran Mike Cohn, founder of Mountain Goat Software. I've included a quantified version of it here. I also want to thank Shaun Bradshaw of Questcon for his insightful review and comments.
Sure-Footed Steps To Gauge Open Source Projects
By Alan Berg
The quality of an open source product should be a key determining point in an organization's decision to deploy it. But companies
often end up with systems that begin as proprietary, but evolve as time goes on, as patches and add-ons are put in place to provide features or solve problems. And sometimes the most effective and available solution comes from the open source community, giving rise to hybrid systems that seem to be sprouting up everywhere. So determining the quality of open source products is important for smooth planning of change. A number of models and frameworks exist for measuring the maturity and business readiness of open source software (see Table 1). Common to all is the aim to break the products into component parts and provide objective measurements for things like support, training, documentation, road map, reputation and software quality. Quality can be measured by several means, including the number of unit tests in the code base. With the ever-expanding popularity of open source software in the marketplace, these types of highly focused sites can only become more popular. And using their criteria can be more effective than making up your own determining factors.
Alan Berg is the lead developer at the University of Amsterdam.
But whenever humans are involved, measurements tend to be subjective. So I suggest supplementing your own selection criteria with the models used by others. In this article, I'll give you some practical, bottom-up advice designed to dovetail neatly with any given set of criteria and provide a realistic starting point. While the examples in this article relate only to Java projects, the principles can apply to any programming language and will enhance your understanding of the quality of any given project.
Check the Reviews
Before diving into code reviews and installing software in the lab, I suggest looking at what review sites think of your potential candidate. I particularly like ohloh.net, which delivers clear warnings of potential issues such as licensing models and low popularity. It also measures lines of code and CVS/Subversion activity when possible to provide a rough estimate of project effort. For ohloh.net, note the number of contributors mentioned for uploading to a given project. For large projects, branch managers generally coordinate and place the changes into the revision control system on behalf of a group of developers. Therefore, the number of
real contributors can easily be underestimated. Specialized sites for given topics also exist. For example, I’ve recently worked on consolidating candidates for an enterprise-wide content management system into a short list. During the research phase, I extensively visited the CMSMatrix (www.cmsmatrix.org) site as a baseline sanity check. One word of warning: You may be tempted to take the information from the above-mentioned sites without filtering. Don’t do it! Instead, verify version numbers against reality; you may be looking at an archeological find rather than a snazzy new AJAX-enabled Web 2.X application.
Don't Trust Version Numbers
Some have meaning, some don't. Many open source projects use odd numbers for development branches and even numbers for stable versions; the Linux kernel and Moodle (moodle.org) are prime examples of such projects. If you see a killer feature in the development branch, don't automatically assume that it reaches production ripeness in one cycle. Rather, use the version number as a hint and verify the situation by reading the road map carefully and ghost-observing the forums for hints of trouble to come. Confusion also can arise from generic misuse of version numbers by commercial parties. Version 9 may simply be a number to match a move made by a competing company. Just because an open source product such as the Davenport WebDAV gateway project (davenport.sourceforge.net) hovers beneath the 1.0 version doesn't mean that the code quality is less than what a commercial company would dub version 5.x. Version numbers can sometimes be a statement of the company's professional modesty.
Know the Cost
Knowing the rough price of building a similar product can be helpful in deciding whether your company wants to build from scratch, use and improve, or buy in by placing resources such as person-hours within the community's reach. The decades-old Constructive Cost Model (COCOMO) enables cost estimations, in development time and dollars, of a given project based on the number of physical lines of code that exist (or
need to be developed). This mature, oft-cited model is straightforward, and permits calculation using clearly understandable statistics. For an implementation of the COCOMO method that can read source code in approximately 30 languages and generate a straightforward report within seconds, try David A. Wheeler's open source SLOCCount (www.dwheeler.com/sloccount). This tool is particularly useful for examining differences between different versions of the same project. For a baseline example, let's generate two reports of different versions of the Apache 2 server, a stable, mature and highly successful project, and see what happens. We'll walk through measuring the approximate differences between Apache 2 server versions 2.0.59 and 2.2.4. Under Linux, assuming that you've downloaded and installed both SLOCCount for report generation and wget (www.gnu.org/software/wget) for downloading the relevant source code, the following recipe works:

mkdir sloc_test
cd sloc_test
wget http://apache.dsmirror.nl/httpd/httpd-2.2.4.tar.gz
wget http://apache.dsmirror.nl/httpd/httpd-2.0.59.tar.gz
tar xvf httpd-2.0.59.tar.gz
tar xvf httpd-2.2.4.tar.gz
sloccount httpd-2.0.59 > 2.0.59.txt
sloccount httpd-2.2.4 > 2.2.4.txt
The report generated for 2.0.59 should look similar to Listing 1. Note that nearly 51 person-years are involved—roughly the productivity of one hardworking and consistent person over an entire working lifetime. The cost of development would approach US$7 million. For the newer version
2.2.4 and its 225,065 lines of code, the cost was close to $8 million (not shown).
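If you want to sanity-check these figures, the arithmetic behind SLOCCount's estimate is simple enough to reproduce. The sketch below is mine, not part of any tool; it applies the basic COCOMO formulas and default salary and overhead constants shown in Listing 1, and the class and variable names are assumptions for illustration.

using System;

// A minimal sketch of the basic COCOMO arithmetic that SLOCCount reports.
// Constants 2.4, 1.05, 2.5 and 0.38 are the basic COCOMO coefficients;
// salary ($56,286/year) and overhead (2.4) match the defaults in Listing 1.
class CocomoSketch
{
    static void Main()
    {
        double sloc = 195706;                  // physical SLOC for Apache 2.0.59
        double ksloc = sloc / 1000.0;

        double personMonths = 2.4 * Math.Pow(ksloc, 1.05);    // effort
        double scheduleMonths = 2.5 * Math.Pow(personMonths, 0.38);
        double developers = personMonths / scheduleMonths;
        double cost = (personMonths / 12.0) * 56286 * 2.4;    // salary * overhead

        Console.WriteLine($"Effort: {personMonths:F1} person-months ({personMonths / 12.0:F2} person-years)");
        Console.WriteLine($"Schedule: {scheduleMonths:F1} months");
        Console.WriteLine($"Average developers: {developers:F2}");
        Console.WriteLine($"Estimated cost: ${cost:N0}");
    }
}

Running this for 195,706 SLOC reproduces the roughly 611 person-months, 28.6-month schedule and $6.9 million figure in Listing 1, so the model is easy to re-run for any candidate project's line count.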
Speak the Language
Of course, the COCOMO model provides only a rough estimate of the cost. Perhaps more useful to your decision about whether to attempt development with a product is the information about its languages. The top section of the report in Listing 1 shows a language breakdown of all Apache 2 modules, and the middle section shows language totals. So if your shop was proficient only in ANSI C, for example, your skills would be adequate for 88.91 percent of the Apache project. Beware of projects that spread their code across many languages: this could be a sign of maintainability issues. The more languages, the more effort required to maintain them.
Kick the Tires
Let's say you've narrowed your selection matrix down to two enterprise-scale applications, both Java-based. It's time to kick the tires on the project and check under the hood for signs of a tangled wire harness. One free and easy way to check the code for errors is to do it through Eclipse. Each new version sports more automatic checks for bug patterns or poor style use than the one before. Download the code and view it in Eclipse. If the code appears full of yellow lines, as with the "poor" variable shown in Figure 1, you might take points away on that code's report card. Eclipse static code-analysis extensions such as PMD (pmd.sourceforge.net) and FindBugs (findbugs.sourceforge.net) can provide further insight into negative paths through the code base.
Look for Road Maps
If well-documented promises were broken in the past, it's safe to expect the same behavior in the future. Whether you're using open source or proprietary software, never trust a road map unless history has proven it reliable. Regularly broken road maps tend to be caused by chaotic processes such as underestimation of workload, unexpected issues in the quality of the code, or arguments within the project's development community. Like poor-quality code, these factors fly a red flag that cries out, "Verify that my project is fit for its purpose."
FIG. 1: ECLIPSE FINDS BUGS
Check Repository Activity
An enormous advantage of open source is the availability of the code base. This normally translates into public, openly viewable repositories such as those found in CVS and Subversion containers. You can use this fact to your advantage before you get involved in a particular project by analyzing the development activity around the project. At one extreme, there may be no activity at all for years, and the product may be gently decaying. At the other, there could be frantic activity as a highly committed development community pushes toward a release date. Project activity also can provide evidence that code is converging toward stability, and comments logged with code may offer clues about project weaknesses. If you're lucky, the project itself will point you to its activity measurements through its Web site. If not, there's a way to generate the statistics yourself. An excellent product for activity discovery is StatSVN (www.statsvn.org), a Java-based statistics tool for Subversion repositories. A typical report-generation cycle requires only a few commands and a couple of minutes. The generation sequence should be similar to:
# Get the code from the repository URL
svn co url
# Enter the local source directory
cd trunk
# Generate a local svn log file
svn log url -v --xml > info.xml
cd ..
# Generate index.html one level above trunk
java -jar /path/to/statsvn.jar info.xml trunk
The result is a thorough and helpful report, including a Jtreemap visual representation of change (see Figure 2). Note that the larger the square, the more lines of code for a given file or directory. The brighter colors describe greater velocity of change for the given file. Later on, as your project moves into the full QA cycle, the Jtreemap graph can suggest which areas to test. Red indicates the most likely area to have the highest density of newly introduced bugs and where you’ll need to focus most of your testing and QA efforts.
Static Code Review
Tools that generate static code reports search the source code for known bug patterns. These tools can quickly search vast numbers of lines of code for hundreds of patterns. Tools such as FindBugs even delve into compiled classes. The risk with static code reviews is the introduction of noise: warnings for even minor infringements, and inaccuracies. Fortunately, as time goes on, these tools are becoming more sophisticated.
You can find excellent and easy-to-use free static analysis tools that are actively developed and improving. My personal favorites are FindBugs, PMD and QJ Pro (qjpro.sourceforge.net). I'll scan the Davenport project using FindBugs to demonstrate how easy it is to generate a report. With Davenport's davenport.jar and jcifs-1.2.13.jar (or equivalent) in the same directory as FindBugs, run the following command:

FindBugs -textui davenport.jar jcifs-1.2.13.jar
If all goes well, you should see a large amount of text sprawling across your screen, with errors or warnings against line numbers. In my test, the davenport.jar generated few messages, indicative of a rock-solid code base. However, the number of defects— great or small—alone is not an indicator of product quality and not as relevant as comparisons of multiple projects. A good way to use static code analysis as a selection filter is to compare the defect density found against known good and bad projects that use the same programming language.
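To make that comparison concrete, defect density is simply the warning count divided by code size, so the two candidates can be put side by side with a few lines of code. The sketch below is mine, not from the article, and the warning counts and SLOC figures are hypothetical placeholders.

using System;

// A sketch of comparing static-analysis defect density across candidate projects.
// The counts below are invented for illustration; plug in your own tool output and SLOC totals.
class DefectDensity
{
    static void Main()
    {
        ReportDensity("Candidate A", warnings: 42, sloc: 80500);
        ReportDensity("Candidate B", warnings: 310, sloc: 95200);
    }

    static void ReportDensity(string project, int warnings, int sloc)
    {
        double perKsloc = warnings / (sloc / 1000.0);
        Console.WriteLine($"{project}: {perKsloc:F1} warnings per KSLOC");
    }
}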
Take It Out For A Spin
FIG. 2: TINTS OF CHANGE
As radical as it sounds, I suggest running the code from your first two or three choices whenever it is safe and practical. Some conflicts are not documented and can only be found through trial and error. Perhaps the promised functionality works only for specific versions with some arbitrary set of browsers, and your installed base is using an older version. Or maybe the database driver doesn’t work with your in-house standard database. Get your hands dirty through installation and use. Try to capture potential failures early so that the blame mechanism is light. Otherwise, you may end up with a product that you
LISTING 1: DOWN TO COCOMO

SLOC    Directory   SLOC-by-Language (Sorted)
80982   srclib      ansic=69453,sh=9837,perl=1070,awk=397,pascal=225
65746   modules     ansic=65461,lex=190,yacc=95
27419   server      ansic=27419
9511    build       sh=8346,awk=488,perl=433,pascal=244
6103    support     ansic=5934,perl=92,sh=77
2331    include     ansic=2331
2193    os          ansic=2188,sh=5
1240    test        ansic=1219,perl=21
123     top_dir     sh=122,ansic=1
58      docs        lisp=27,sh=24,perl=7

Totals grouped by language (dominant language first):
ansic:  174006 (88.91%)
sh:      18411 (9.41%)
perl:     1623 (0.83%)
awk:       885 (0.45%)
pascal:    469 (0.24%)
lex:       190 (0.10%)
yacc:       95 (0.05%)
lisp:       27 (0.01%)

Total Physical Source Lines of Code (SLOC)                = 195,706
Development Effort Estimate, Person-Years (Person-Months) = 50.96 (611.50)
  (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 2.39 (28.63)
  (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 21.36
Total Estimated Cost to Develop                           = $ 6,883,764
  (average salary = $56,286/year, overhead = 2.40)

SLOCCount, Copyright (C) 2001-2004 David A. Wheeler
SLOCCount is Open Source Software/Free Software, licensed under the GNU GPL.
SLOCCount comes with ABSOLUTELY NO WARRANTY, and you are welcome to redistribute it under certain conditions as specified by the GNU GPL license; see the license for details.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
can push through the organization via a well-managed project, but that requires unnecessary energy—for example, forcing a change in the definition of the company desktop, with the implied patching from system administrators. A further hidden advantage of downloading and installing software is that the report writer tends not to be the techie who's charged with maintaining the product through its life cycle. Installation enhances the opportunity for the writer to learn from the practitioner. Keep your primitive senses alive and listen for the rhythmic bang of the practitioner's head against the nearest and hardest wall. If you don't have the time to download the software and motivate the organization to install it, at least look at the online demo versions and form an opinion on the workflow and the relative quality of the GUI that the end user first buys into. If the demo on the supplier's Web site isn't working, take that as a serious hint to investigate further or begin looking elsewhere.
Verify Standards Compatibility
In the competitive universe of dynamic market change, you don't want to be locked into a proprietary-standards corner where you're held hostage to monopolistic lethargy, lack of quality, pricing levels and other acidic processes. Open standards, in theory, enable software that follows the right protocols to replace other, worse-written software that follows the same protocols and methodologies. If you
BUG CHECKING WITH MAVEN
Ant and Maven (maven.apache.org) are two popular build managers. In numerous well-known Java-based projects, such as Sakai, Maven is applied as part of a consistent infrastructure. These tools support the developer through the whole life cycle: compiling, testing, reporting and deploying.
I like Maven, especially version 2, because it makes my life easier for day-to-day development. For example, to create the beginning of a project that deploys as a war file, you type a command similar to:

mvn archetype:create -DgroupId=nl.berg.alan -DartifactId=test-webapp -DarchetypeArtifactId=maven-archetype-webapp

Here's an example based on a Maven archetype (read: template). Download and install Maven as described on the Maven Web site, maven.apache.org. Create a temporary directory on your system, and then type:

mvn archetype:create -DgroupId=nl.berg.alan -DartifactId=test-app

You should now have a basic Java project. Add the following method to the example Java code test-app/src/main/java/nl/berg/alan/App.java:

public void tracer(){
    int poor=5;
}

As you saw previously in Eclipse, the integer is never used. This fact alone is a good hint that the code is only partially finished or that the value was overlooked in the rush to deadline. To generate a jar file, type:

mvn package

"All very interesting," you may be thinking, "but how does this help with quality measurements?" First, to create a Web site with a report, type:

mvn site

Maven can generate a site with project-related information that you can use to help determine software quality. You'll find that most of the links point to zero information because the template project hasn't been configured to point to any given repository, mentions no distribution lists or developers, and is generally missing relevant configuration that you'd normally find in the top-level pom.xml file. To check for bug patterns in the code via the PMD plug-in, type:

mvn pmd:pmd

This action will generate the originally named pmd.html within your site, and should show a message similar to:

nl/berg/alan/App.java
Violation                                     Line
Avoid unused local variables such as 'poor'   16

To check for duplicate code, type mvn pmd:cpd. A cpd.html file is also generated with a relevant report.

In conclusion, when you find a Maven-enabled project, try to generate a site and additional reports—you never know what extra context information you'll find.
buy into open source, you should also buy into standards whenever possible. The difficulties arise when product X says it's standard-Y compatible when it's only partially compatible. Reading the documentation alone won't highlight this hidden untruth. To avoid the unsavory side effects of partial compatibility, you might want to seek out and apply a protocol test suite. One such tool is litmus (www.webdav.org/neon/litmus), which is part of the WebDAV project and ensures that potential servers offer the necessary protocols. Here's how to install litmus on Debian:
sudo apt-get install litmus

If compiled correctly, litmus even works over SSL. The tool expects the URL of the target WebDAV area, plus the username and password for authentication purposes; for example:

litmus -k http://test_server/dav/~alan_dav_test alan_dav_test password

TABLE 1: ARBITERS
Open Source Initiative (OSI): www.opensource.org/
Open Business Readiness Rating (OpenBRR): www.openbrr.org/wiki/index.php/Home
Navicasoft Open Source Maturity Model: www.navicasoft.com/pages/osmm.htm
Cap Gemini Open Source Maturity Model: www.seriouslyopen.org/nuke/html/index.php
Ohloh Open Source Reviews: www.ohloh.net
Qualification and Selection of Open Source Software (QSOS): www.qsos.org
A long list of tests is mentioned, and if you see issues like the one below, it's time to scan the developer forums again for the now-obvious warning signs:

7. delete_null.....FAIL (DELETE nonexistent resource succeeded)
You'd be amazed by the number of projects that start without some form of initial sanity check. For large projects, I recommend a small filtering process before diving headlong into a significant team effort. It's easier to change direction early—when you're still kicking the application's tires—than later, when you realize that those tires are actually made of plastic. Lack of diligence at a project's onset can ultimately cost an organization more than a small upfront effort. Happy hunting!
Trusting Instincts Is Not a Sin— Experimental Design Can Help
By Yogananda Jeppu
Children are born true scientists. They spontaneously experiment and experience and experience again. They select,
combine and test, seeking to find order in their experiences: "Which is the mostest? Which is the leastest?" They smell, taste, bite and touch-test for hardness, softness, springiness, roughness, smoothness, coldness, warmness: they heft, shake, punch, squeeze, push, crush, rub and try to pull things apart. –R. Buckminster Fuller
Through his experiments in the natural world, Italian physicist Galileo Galilei came to realize that nature didn't operate as defined in Scripture: "Thou hast fixed the earth immovable and firm." He raised this problem report and was pulled up by the Management for his views. Under pressure, he had to bow down—and sadly, was not around to say "I told you so" when the system failed. How many times as a tester have you faced this problem? You're sure, with a strong "gut feeling," that there is a problem. But you don't have concrete proof due to lack of time and inadequate testing—and you didn't have the heart to say "I told you so" after the catastrophe.
Yogananda Jeppu is a scientist at IFCS Aeronautical Development Agency in Bangalore, India.
Scientists have unravelled the mysteries of nature through experimentation. They may have had an original theory, but they would still have to prove it correct through experiments. In science, the saying goes, "Theory guides. Experiment decides." Like nature, software today is a chaotic, mysterious object. One doesn't know what the Creator has put into it. It's for the Tester to dig deep into the mysteries to find out what's hidden in its myriad loops. Experimenting with software is surely the way forward for the test team, but the lack of time makes this experimentation a costly endeavor. Numerous techniques exist to help tame the chaos, including the orthogonal array, a systematic approach that incorporates statistics à la R.A. Fisher and his Design of Experiments (DOE) (see "Design of Experiments and Taguchi"). I've tried orthogonal arrays and found that while they do reduce test sizes, they're not the end of the affair. Experimental design also can be used to further optimize the test cases to increase coverage.
Taguchi Test Flight
For a sample problem, let's test a safety-critical component of flight control software. The Taguchi approach (see "Design of Experiments and Taguchi") can be used for other, less-critical software as well, since the underlying concept of software testing is universal. So, let's get off the ground. We're testing the computation of the scheduled gain of an embedded controller. In fighter aircraft, the feedback controller is designed by tuning the feedback gains so that the aircraft performs as required. This has to be done at various operating altitudes and speeds to ensure uniform performance over the complete operating range. The control designers ensure this by scheduling the gains using altitude and speed as the input parameters. Altitude and speed are provided by a complex system known as the Airdata system. The gains, around 30 or more, are provided as lookup tables that are a function of speed and altitude. Intermediate values are computed using an interpolation algorithm. The tester has to ensure that approximately 5,000-odd gain values have been encoded correctly, and that the interpolation works
satisfactorily for the complete range of speed and altitude. In an end-to-end test scenario, the Airdata system also is connected in place, and the tester must design several static and dynamic tests to check the gain values and the interpolation. Let's tackle this problem using DOE and design a single test case that
will ensure near-complete coverage. In the schematic of the test setup in Figure 1, the hardware component injects the air data sensor outputs into the certified Airdata system. The software under test uses the outputs of this component to generate gain data for the control law.
Step 1: Brainstorming
The experimenter who does not know what he is looking for will not understand what he finds. –Claude Bernard
The Taguchi approach presents a system for the application of DOE to a problem. The first step is brainstorming—perhaps the most important element of the design process. The team gets together to discuss the problem at hand. DOE considers a process to be affected by factors and their respective levels. In software, this would mean the various inputs to the software and their values or amplitudes, as the case may be. The process is also affected by noise. This is the uncertainty. In a manufacturing process, uncertainties would include the skill of the worker, weather conditions and so on. In software, they would include hardware, operating system and other environmental variables, the randomness of known inputs and so on. The critical objectives of brainstorming are:
• What are we trying to achieve?
• How do we measure it?
These issues are debated and decided during the brainstorming sessions. In the problem at hand, we must test the table lookup. Each table has 22 speed points and eight altitude points, totalling 176 possible data points in each table. Our objective is to test all 176 data points in all the data tables. As the speed and altitude points for each table are the same, we can test all the tables at once as long as we can ensure that we cover any one table completely. This would be possible if we could by some means excite the inputs to the Airdata system such that we get at least one point for every adjacent four points.
FIG. 1: AIRDATA INJECTOR (test setup: Hardware Components → Airdata System → Gain Scheduler → Control Law, with the gain scheduler and control law forming the software under test)
This would test the four data points and the interpolation algorithm, as shown in Figure 2. This 2D lookup table is given for speed and altitude values. The table can be adequately covered if there is at least one test point in each cell. We can now test the data by comparison with another table lookup algorithm computed offline, perhaps using a spreadsheet, MATLAB or other software. The inputs to the Airdata system are the sensor parameters, which change depending on altitude and speed. To test the algorithm, all we have to do is change altitude and speed in some fashion and generate the signals required by the Airdata system. We can then inject the signals into the Airdata system, and hopefully it will generate the test points required to test the table. In the example, the altitude varies from 0 to 15 Km and speed from 0 to 2 Mach. We've decided to change the altitude in specific steps defined by a bias variable Bz and hold this for a specific time duration defined by variable Td. We'll also add a sinusoidal signal to the altitude with frequency Fz and amplitude Az. We hope this will generate a dynamic signal. Similarly, we'll generate a sinusoidal signal for speed defined by frequency Fm, amplitude Am and bias Bm.
TABLE 1: SEVEN DOE FACTORS

S No  Factor  Description                                                 Unit  L1    L2    L3
1     Td      The time that the aircraft remains at a specific altitude  sec   5     10    20
2     Fm      The frequency of the sinusoidal waveform for mach number   Hz    0.01  0.05  0.1
3     Bm      The bias of the sinusoidal waveform for mach number              0.5   1.0   1.5
4     Am      The amplitude of the sinusoidal waveform for mach number         1     1.5   2
5     Fz      The frequency of the sinusoidal waveform for altitude      Hz    0.01  0.05  0.1
6     Bz      The bias of the sinusoidal waveform for altitude           Km    1     2     5
7     Az      The amplitude of the sinusoidal waveform for altitude      Km    0     1     2
We've now identified seven factors for DOE that will affect our experiment. We'll consider three levels for each of these factors, as shown in Table 1. Although we have no knowledge of the Airdata system's workings, we're trying to test a component that is beyond it in the signal flow. We're attempting to generate a test case by passing signals into a black box, a common problem faced by testers carrying out an end-to-end shakedown test. Now it's time for us to decide what we get from our experiments.
FIG. 2: INTERPOLATION ALGORITHM (a data point falling within a cell of the table data; axes: Speed and Altitude)
We must get at least one point in the grid—or, for a 22 x 8 array, 176 data points, each in one cell of the table. Let's define an experiment output parameter, coverage, as the number of cells having a data point. Let's also define a parameter, wastage, as the number of points greater than three in a cell. The Taguchi approach defines a quality characteristic (QC) for each output parameter. This isn't the value of the output, but in a broader sense a measure of the desirability of the output parameter. The QC for coverage is "the bigger the better" and for wastage "the smaller the better."
Another QC is “nominal is best,” which is not being considered here.
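The coverage and wastage bookkeeping is easy to automate alongside the simulation. The sketch below is mine, not the author's tooling; the breakpoint arrays, method names, and the reading of wastage as "points beyond three in a cell" are assumptions for illustration.

using System;

// A sketch of scoring simulated (speed, altitude) samples against the 22 x 8 table grid.
static class GridScore
{
    // Breakpoint arrays would hold the actual speed and altitude table breakpoints.
    public static void ScoreRun(double[] speeds, double[] altitudes,
                                double[] speedBreakpoints, double[] altBreakpoints)
    {
        int[,] hits = new int[speedBreakpoints.Length - 1, altBreakpoints.Length - 1];

        for (int i = 0; i < speeds.Length; i++)
        {
            int col = FindCell(speedBreakpoints, speeds[i]);
            int row = FindCell(altBreakpoints, altitudes[i]);
            if (col >= 0 && row >= 0)
                hits[col, row]++;
        }

        int coverage = 0, wastage = 0;
        foreach (int count in hits)
        {
            if (count > 0) coverage++;            // cell exercised at least once
            if (count > 3) wastage += count - 3;  // points beyond three per cell are "wasted"
        }
        Console.WriteLine($"Coverage: {coverage} cells, Wastage: {wastage}");
    }

    // Returns the index of the interval [breakpoints[i], breakpoints[i+1]) containing value, or -1.
    static int FindCell(double[] breakpoints, double value)
    {
        for (int i = 0; i < breakpoints.Length - 1; i++)
            if (value >= breakpoints[i] && value < breakpoints[i + 1])
                return i;
        return -1;
    }
}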
Step 2: Design
When you're experimenting you have to try so many things before you choose what you want, and you may go days getting nothing but exhaustion. –Fred Astaire
If we were to try the full factorial test, testing every single combination of seven factors and three levels, we'd have to do 3^7 (2,187) experiments. Instead, let's use an orthogonal array to reduce our test cases (see Table 2). Taguchi has defined standard sets of orthogonal arrays for various factor and level combinations. Let's choose an L18 orthogonal array. L stands for Latin squares, and 18 is the number of tests we'll have to run. An L18 array caters to eight factors: one factor with two levels and seven factors with three levels. By conducting 18 experiments, we can hope for some interesting results. As we have only seven factors, we'll treat the factor with two levels as a dummy. We'll use the orthogonal array and assign factors and levels to the columns and rows.
TABLE 2: ORTHOGONAL ARRAY (L18)

Test Case / Factor   1  2  3  4  5  6  7  8
 1                   1  1  1  1  1  1  1  1
 2                   1  1  2  2  2  2  2  2
 3                   1  1  3  3  3  3  3  3
 4                   1  2  1  1  2  2  3  3
 5                   1  2  2  2  3  3  1  1
 6                   1  2  3  3  1  1  2  2
 7                   1  3  1  2  1  3  2  3
 8                   1  3  2  3  2  1  3  1
 9                   1  3  3  1  3  2  1  2
10                   2  1  1  3  3  2  2  1
11                   2  1  2  1  1  3  3  2
12                   2  1  3  2  2  1  1  3
13                   2  2  1  2  3  1  3  2
14                   2  2  2  3  1  2  1  3
15                   2  2  3  1  2  3  2  1
16                   2  3  1  3  2  3  1  2
17                   2  3  2  1  3  1  2  3
18                   2  3  3  2  1  2  3  1
Step 3: Experiment
There are three principal means of acquiring knowledge... observation of nature, reflection and experimentation. Observation collects facts; reflection combines them; experimentation verifies the result of that combination. –Denis Diderot
We can conduct the experiments in any order. In a manufacturing process, we'd probably have run repeat tests and averaged the results. In this case, we don't have to do all that work, as we haven't considered variations or noise. Let's run 18 simulations and monitor the coverage and wastage criteria for each test. We get a maximum coverage of 125—a long way from what we wanted. A check of the cumulative coverage of our 18 experiments shows seven cells that we still haven't covered; proof that the orthogonal array isn't the end of the matter in software testing. The Taguchi approach provides a means to analyze the effect of individual factors and their levels on the experiment. Let's carry out an analysis of means to study the effect of the factors, DOE's fourth step.
Step 4: Factors
Experiment alone crowns the efforts of medicine, experiment limited only by the natural range of the powers of the human mind. Observation discloses in the animal organism numerous phenomena existing side by side, and interconnected now profoundly, now indirectly, or accidentally. Confronted with a multitude of different assumptions, the mind must guess the real nature of this connection. –Ivan Petrovich Pavlov
A simple visual technique used in DOE is to plot the mean outcome of individual factors and their levels. The grand mean is computed as an average of the outcomes of all the experiments. In this case, the average coverage is 59 cells and wastage is 73.8583. We've divided wastage by 100 to keep the two in the same range for ease of plotting. This doesn't affect the analysis at all. The mean effect of factor 1 at level 1 (a Td of 5 seconds) is computed as the average of test cases 1, 2, 3, 10, 11 and 12, because these are the test cases where factor 1 is at level 1 (see the "2" column in Table 2 and the corresponding rows). Similar analysis is carried out for all factors and all levels, as plotted in Figure 3, which depicts the effect of the various factors at levels 1, 2 and 3 with respect to the grand mean, for both the coverage and wastage metrics. The plot shows the effect of the factors very clearly. If we increase Td to level 3, we get a large coverage, but also a higher level of wastage. Increasing the frequency of speed, Fm, to level 3 reduces coverage and increases wastage. We're better off with level 2 here. Using simple mathematics, we can get a lot of interesting results and insights into the functioning of the black box. We may not yell "Eureka!" but the results are interesting. Now that we know how the software is behaving, we can move along to DOE's next step of optimization.
FIG. 3: DOE FACTOR PLOTS (analysis of means: mean coverage and wastage for factors Td, Fm, Bm, Am, Fz, Bz and Az at levels 1, 2 and 3, plotted against the grand mean)
Step 5: Optimization
Don't be too timid and squeamish about your actions. All life is an experiment. The more experiments you make the better. –Ralph Waldo Emerson
Optimizing in DOE involves a simple visual inspection of the plot: for a QC of "bigger is better," select the level of each factor that gives a higher value than the grand mean; for a QC of "smaller is better," select the level that gives a lower value than the mean. Thus, levels 3, 2, 2, 1, 3, 1 and 3 for the seven factors Td, Fm, Bm, Am, Fz, Bz and Az, respectively, would provide an optimal value for coverage. (We haven't considered wastage here.) This specific combination isn't among the 18 tests we've carried out, so we must design one more test—test case 19—and run the simulation. We get coverage of 168—much higher than the maximum of 125 obtained from the 18 tests before. The cumulative coverage of all 19 test cases is 176 cells. We've done our task. But can we do better?
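The analysis of means and the level selection described in Steps 4 and 5 reduce to a few averages and comparisons, so they are easy to script. The sketch below is mine, not the author's code; it assumes the L18 matrix of Table 2 (with the two-level dummy in the first column) and an array holding the 18 measured coverage values.

using System;
using System.Linq;

class AnalysisOfMeans
{
    // l18[test, column] holds the levels (1..3) from Table 2; columns 1..7 carry Td..Az.
    // coverage[test] holds the measured coverage for each of the 18 runs.
    static int[] OptimalLevels(int[,] l18, double[] coverage)
    {
        const int factors = 7;
        int[] best = new int[factors];

        for (int f = 0; f < factors; f++)
        {
            double bestMean = double.MinValue;
            for (int level = 1; level <= 3; level++)
            {
                // Average the outcome of every test case where this factor sat at this level.
                double mean = Enumerable.Range(0, 18)
                    .Where(t => l18[t, f + 1] == level)   // column 0 is the two-level dummy
                    .Select(t => coverage[t])
                    .Average();

                if (mean > bestMean)   // "bigger is better" for coverage
                {
                    bestMean = mean;
                    best[f] = level;
                }
            }
        }
        return best;   // chosen levels for Td, Fm, Bm, Am, Fz, Bz and Az
    }
}

The same loop with "smaller is better" (choosing the minimum mean) would drive the wastage metric instead.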
Let's assign new values for the various levels in the plot with a simple logic. In cases where level 3 showed a higher coverage, we'll increase the values for all three levels. In cases where level 2 showed a better result, we'll reduce the values, and in cases where level 1 showed a better result, we'll reduce the values. When we carry out the 18 experiments again with these new values, we get a higher value of coverage for all 18 test cases compared with the earlier set of values (see Figure 4). This shows that we're on the right track. When we optimize and generate the 19th case, we get coverage of 175 cells—just one cell left uncovered. We can add just one more test case to cover this cell.
The true worth of an experimenter consists in his pursuing not only what he seeks in his experiment, but also what he did not seek. –Claude Bernard
Out of curiosity, I decided to determine whether this test case could be further optimized using a genetic algorithm, a technique that mimics Darwin's "survival of the fittest," using the selection, reproduction and mutation concepts to arrive at an optimized solution. I used a standard GA package and set a population size of 20 to start the experiment. After 30 generations and 600 simulations (30 x 20), I arrived at a single test case that could cover all 176 cells. We also tried pure random-number injection into our simulation, and after 300 iterations received a cumulative coverage of 174 cells. The DOE method, though manual, offers the benefit of insight into the process.
DESIGN OF EXPERIMENTS AND TAGUCHI
Our view... is that it is an essential characteristic of experimentation that it is carried out with limited resources, and an essential part of the subject of experimental design to ascertain how these should be best applied; or, in particular, to which causes of disturbance care should be given, and which ought to be deliberately ignored. —Sir Ronald A. Fisher
Sir Ronald Fisher advocated Design of Experiments (DOE) in the late 1920s to optimize the yield of crops, considering various factors like water, soil and weather. He used an orthogonal array to reduce the experiments by considering a small part, or fractional factorial, of the complete experimental matrix. However, he didn't stop there, but optimized crop yields by selecting the best combination of factors. This technique enjoyed a long lifespan in the agricultural domain.
As a researcher at the Electronic Control Laboratory in Japan, Dr. Genichi Taguchi [1] carried out significant research with DOE techniques in the late 1940s. He spent considerable effort to make this experimental technique easy to apply and employed it to improve the quality of manufactured products. Dr. Taguchi's standardized version of DOE, popularly known as the Taguchi method or approach, was introduced in the U.S. in the early 1980s. Today it's one of the most effective quality-building tools used by engineers in all types of manufacturing activities. The approach has also been used to optimize process control, biochemical experiments, advertisements and marketing—and even making an omelet. So why not software testing?
The Taguchi approach has also been used successfully for testing software [2]. The earliest known use of DOE for software testing was the employment of an orthogonal array for compiler testing [3]. It has also been used for testing the AT&T PMX/StarMAIL [4]. In those cases, the test cases were designed using only a part of the Taguchi approach; i.e., the use of orthogonal arrays.
Time-Tested Robustness
FIG. 4: NEAR-TOTAL COVERAGE (coverage of the 18 test cases for the original factor values, Set I, and the revised values, Set II)
If the only tool you have is a hammer, you tend to see every problem as a nail. –Abraham Maslow
Design of Experiments is a time-tested engineering optimization tool. It's been used for decades to build efficiencies into manufacturing, to generate higher yields and improve quality. You've now seen how software engineers can use a part of that process—the orthogonal array—to design more efficient test cases. DOE can be used effectively to optimize test cases based on certain criteria. The criterion is immaterial—all that matters is that it be measurable in terms of desirability. The mathematics used are simple, the concepts are easy
to understand, and there are software packages available that can design and optimize the test cases using Taguchi DOE. Experimenting with applications in this way also gives the tester insight into the software's inner workings. In the words of the English philosopher Roger Bacon, "Argument is conclusive, but... it does not remove doubt, so that the mind may rest in the sure knowledge of the truth, unless it finds it by the method of experiment. For if any man who never saw fire proved by satisfactory arguments that fire burns, his hearer's mind would never be satisfied, nor would he avoid the fire until he put his hand in it that he might learn by experiment what argument taught."
REFERENCES
1. en.wikipedia.org/wiki/Genichi_Taguchi
2. Genichi Taguchi, Subir Chowdhury and Yuin Wu, "Taguchi's Quality Engineering Handbook," John Wiley and Sons, 2004
3. Robert Mandl, "Orthogonal Latin Squares: An Application of Experiment Design to Compiler Testing," Communications of the ACM, Vol. 28, No. 10, October 1985
4. Robert Brownlie, James Prowse and Madhav S. Phadke, "Robust Testing of AT&T PMX/StarMAIL Using OATS," AT&T Technical Journal, Vol. 71, No. 3, May/June 1992
Practice Safe .NET Test With IR Data
By BJ Rollison
Testing is the most challenging job in software development. The software tester must select an extremely small number of tests from the countless possibilities
and perform them within a limited period of time, often with too few resources. Additionally, employing only a fraction of the possible tests, the tester must evaluate, collect, analyze and provide qualified information to the decision makers for accurate risk analysis and a clear understanding of the customer value proposition. For perspective, let's say we want to test possible filenames on a FAT file system that imposes the 8.3 character filename format, using only the 26 letters from A to Z.
BJ Rollison has more than 16 years' experience in the computer industry and is currently a Test Architect with Microsoft's Engineering Excellence group.
The total number of possible filenames with an eight-letter filename and a three-letter extension is 26^8 + 26^3, or 208,827,082,152. If we were assigned to test long filenames on a Windows platform using only ASCII characters (see Table 1), the number of possibilities increases because there are 86 possible characters we can use in a valid filename or extension, and a maximum filename length of 251 characters plus a three-character extension. That's 86^251 + 86^3. Trust me, that is one big number. And the possibilities expand to virtual infinity if we increase the test data set to include all Unicode 5.0 characters, which includes about 101,063 code points. I think you get the point. One of the professional tester's most important skills is to determine which test data to use in a test case so as to provide a high degree of confidence that the feature or functionality is adequately tested.
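If you'd rather verify these counts than take them on faith, the arithmetic is a one-liner with an arbitrary-precision integer type. The sketch below is mine, not from the article, and it uses System.Numerics.BigInteger, which was added to .NET after this article's time.

using System;
using System.Numerics;

class FilenameCounts
{
    static void Main()
    {
        // 8.3 names built from the 26 letters A-Z: 26^8 eight-letter names plus 26^3 extensions
        BigInteger fat = BigInteger.Pow(26, 8) + BigInteger.Pow(26, 3);

        // Long filenames from 86 legal ASCII characters: 86^251 base names plus 86^3 extensions
        BigInteger longNames = BigInteger.Pow(86, 251) + BigInteger.Pow(86, 3);

        Console.WriteLine($"8.3 combinations: {fat:N0}");
        Console.WriteLine($"Long-filename combinations have {longNames.ToString().Length} digits");
    }
}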
TABLE 1: ASCII RESTRICTIONS

The ASCII character set, U+0000 through U+007F:

0x00–0x0F: NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
0x10–0x1F: DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
0x20–0x2F: SP ! " # $ % & ' ( ) * + , - . /
0x30–0x3F: 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0x40–0x4F: @ A B C D E F G H I J K L M N O
0x50–0x5F: P Q R S T U V W X Y Z [ \ ] ^ _
0x60–0x6F: ` a b c d e f g h i j k l m n o
0x70–0x7F: p q r s t u v w x y z { | } ~ DEL
Generate Random Static
For that experimental data, testers have two choices: static data in the form of hard-coded strings in a table or other document, and randomly generated test data. Random test data is formed using one of two methods: obtuse random (OR) generation or intelligent random (IR) generation. Each of these forms has its benefits and drawbacks, but here we'll focus on the IR form of test data generation. When applied appropriately, intelligent test data can quickly and easily increase the breadth of test coverage with little effort, offering a greater probability of exposing errors as compared to "best guess" approaches. Data- and keyword-driven test automation approaches tend to rely on static test data passed to the application under test (AUT) from the data or keyword table. Scripted and procedural test automation occasionally uses hard-coded strings (a horrible practice), or reads streams of data from various file formats into memory for use in propa-
gating test data into the AUT. Sometimes static test data provides value in terms of increased confidence or higher probabilities of defect exposure—especially if that data has exposed errors in the past in similar functional areas. However, the repeated use of static data has an extremely low probability of proving or disprov-
ing anything new in subsequent iterations of automated test cases. Creating large sets of files with static data is time-consuming and may also require periodic maintenance that further increases the overall cost of the automation effort. Relying only on static data often leads to the pesticide paradox commonly associated with test automation, in which code over time becomes resistant to static tests while still remaining flawed. The random test data—sometimes frowned on in software testing because of its unpredictability—is automatically generated at runtime. Another drawback of the inordinate use of unconstrained random data is that it can render testers or developers unable to reproduce defects exposed using the randomly generated test data. But random test data can also be used quite effectively in automated functional tests. Valid and invalid randomly generated test data can better evaluate the robustness of an AUT than static data because the application of random test data causes unexpected or exceptional conditions. Often, when unanticipated states occur in software, the result is typically unpredictable behavior such as data corruption or the AUT’s inability to properly handle the data stream, leading to a program crash or hang. And as testers, we’re all pretty darn satisfied when we cause a catastrophic application failure.
It All Starts With a Seed
The trick to using random test data effectively for functional testing is to overcome the obstacles of nonrepeatability and unpredictability.
TABLE 2: EQUIVALENCE CLASS(ROOM)

Input/Output Parameter: Base filename (without 3-letter extension)

Valid Class Subsets:
• Punctuation 0x21, 0x23–0x29, 0x2B–0x2D, 0x3B, 0x3D, 0x40, 0x5B, 0x5D, 0x60, 0x7B, 0x7D, 0x7E
• Numbers 0x30–0x39
• Uppercase ASCII 0x41–0x5A
• Lowercase ASCII 0x61–0x7A
• String length 1 through 251, inclusive

Invalid Class Subsets:
• Control characters U+0000–U+001F and U+007F
• Reserved characters (0x22, 0x2A, 0x2F, 0x3A, 0x3C, 0x3E, 0x3F, 0x5C, 0x7C)
• Space character (0x20) as the only character in a string
• Space character (0x20) as the first character in a string
• Space character (0x20) as the last character in a string
For the first problem, a seed value will ensure repeatability of randomly generated test data. For the second, the common technique of equivalence-class partitioning can constrain randomness to produce predictable outcomes. Such intelligent random test-data generation increases both the breadth of data coverage and the probability of exposing unexpected defects. Modern programming languages include the ability to generate pseudo-random numbers. If no seed value is specified, the random number generator typically uses a time-dependent seed value to produce a random value within a finite range of values. But if the random number generator starts from a specific seed value, the same number or series of numbers is generated each time. Of course, a hard-coded seed value isn't beneficial, because we'd generate the same random test data each time the test is executed. So instead of using a hard-coded seed value or passing a seed value as an argument to a random data generator, we'll create a random seed value to seed a new random object that will generate our random test data. The method below is a simple example of how to generate a random seed value in C#.

//**************************************
// Description: C# method to generate a random seed value
// Param: None
// Return: non-negative random integer value in the range of 0 to 2^31 - 1
//**************************************
private static int GetSeedValue()
{
    // Create a new instance of the pseudo random number generator object
    Random randomObject = new Random();

    // Return a non-negative random integer value in the range of 0 to 2,147,483,647
    int seedValue = randomObject.Next();
    return seedValue;
}
Now we simply have to call the GetSeedValue() method from within our automated test case. In the example below, we declare an int data type and assign it the return value of the GetSeedValue() method. The seed value is passed as an argument value to the Random class object to create a new instance of the pseudo-random number generator assigned to the variable named randomTestDataObject. In this example, the seed value is displayed in the console window, but it's a best practice to write the seed value to a permanent test-case log because the console window is closed once the test-case execution completes.

static void Main()
{
    // Get a seed value
    int seed = GetSeedValue();

    // Ideally the seed value is recorded in a test case log file
    Console.WriteLine(seed);

    // Create a new instance of the pseudo random number generator object based on the seed value
    Random randomTestDataObject = new Random(seed);
}
With a randomly generated seed value as a base number, you can generate a new pseudo–random number generator object that will produce repeatable outputs of randomized test data. This approach not only provides great variability in test data each time the test is executed, but also offers the ability to exactly reproduce specific test data if the randomly generated data exposes an error.
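To make that reproducibility concrete, the seed recorded in the test log can simply be fed back in on a later run. This is a sketch of the idea rather than code from the article; the command-line handling is an assumption, and it reuses the GetSeedValue() method shown above.

static void Main(string[] args)
{
    // Replay a failing run by passing the logged seed on the command line;
    // otherwise generate a fresh seed for new random test data.
    int seed = (args.Length > 0) ? int.Parse(args[0]) : GetSeedValue();
    Console.WriteLine("Seed: " + seed);

    Random randomTestDataObject = new Random(seed);

    // Any data drawn from this object now matches the original run exactly,
    // so a defect exposed by "random" data can be reproduced on demand.
    int sampleLength = randomTestDataObject.Next(1, 252);
    Console.WriteLine("First random length: " + sampleLength);
}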
Growing Intelligent Test Data
Predictable test data allows testers to anticipate results when applying that data. The key to predictable outcomes with intelligent random test data is a common technique known as equivalence-class partitioning. The ability to accurately design equivalence-class partitions of the data set is essential to projecting an expected outcome. The first step is to define the data set required for the specific purpose of the test case. Start by generating a random string of ASCII characters for use as a valid filename. For this example, we'll use a restricted data set of ASCII-only characters, as illustrated in Table 1. But we know that not all ASCII character code points are valid in a filename. We'll need to decompose the data in Table 1, using the equivalence-class partitioning heuristics of ranges of values, unique values and special values, into valid and invalid equivalence-class subsets. The entire range of ASCII character code points comprises the Unicode values from U+0000 through U+007F. However, the range of character code points between U+0000 and U+001F, as well as the U+007F code point, are control characters and therefore not
RANDOM, SOMETIMES RECKLESS
Software developer and consultant Jonathan Kohl, on his blog (www.kohl.ca/blog/archives/000160.html), discussed a situation in which a client's excessive use of random data made it virtually impossible to track down failures. Kohl appropriately referred to this as "reckless test automation." Anecdotal exceptions aside, several interesting studies have shown that the use of randomly generated data in software testing produced overwhelmingly positive results. A seminal study was conducted in 1990 by a group at the University of Wisconsin that developed Fuzz, a tool and technique to generate random test data. The study subjected software programs to these random input streams from a black-box test approach. The randomly generated test data initially exposed failure rates as high as 33 percent in Unix software. By 1995, the researchers still found failure rates as high as 23 percent using random test data. This study led to the birth of "fuzz testing," which is now commonly used as a penetration attack method in security testing. Their work can be found at www.cs.wisc.edu/~bart/fuzz/.
legal character-code points in a filename. These character code-point values are the first invalid equivalence-class subset. Now our range of valid characters for a filename is limited to the character code points between U+0020 and U+007E. However, there are nine reserved characters within that range that also are invalid characters in a filename (the ", *, /, :, <, >, ?, \ and | characters). The respective Unicode code points for those invalid filename characters are U+0022, U+002A, U+002F, U+003A, U+003C, U+003E, U+003F, U+005C and U+007C. These nine characters would be another invalid equivalence-class subset. Also, although the space character (U+0020) is a valid character in a file-
name, it can't be the only character, or the first or last character in a filename on a Windows platform. The Windows file system will normally not accept a filename that contains only spaces, and will automatically truncate space characters from the leading and trailing positions of a filename string. Therefore, our intelligent random-string generator must prevent the first or last randomly generated character
from being a space character. Because we want to use the randomly generated string as an oracle to verify whether the file is actually saved to the system, if we pass a random string that contains a leading or trailing space as an argument to the File.Exists() method to verify the existence of the file on the system, the method would return false because Windows file I/O functions truncate that character from the
LISTING 1: ANY RANDOM ASCII //************************************************************* // Description: C# method to generate a valid filename of ASCII characters // Param: Random object // Return: Variable length string of ASCII only characters <= maxLength //************************************************************** private static string GetRandomAsciiFilename(Random randomTestDataObject) { // maximum length of base filename component assuming standard 3 letter extension int maxLength = 251; // minimum range of valid characters converted to an integer value int minChar = Convert.ToInt32(‘\u0020’); // maximum range of valid characters converted to an integer value int maxChar = Convert.ToInt32(‘\u007E’); // new instance of a Stringbuild object to store random characters StringBuilder result = new StringBuilder(); // Generate a random string length between 1 and 251 based on the seeded random object int length = randomTestDataObject.Next(1, maxLength +1); while (result.Length < length) { // Generate a random number between the minimum character value // the maximum character value and convert it to a character type char c = Convert.ToChar(randomTestDataObject.Next(minChar, maxChar + 1)); // Verify the random character code point value is not a reserved character if (!IsReservedChar(c)) { // Verify the first and last character is not a space character if (!(result.Length == 0 && c == ‘\u0020’) && !(result.Length == length - 1 && c == ‘\u0020’)) { // If the character is not a reserved character, only spaces, or a lead or trailing // space character it is appended to the Stringbuilder object result.Append(c); } } } return result.ToString(); } //************************************************************* // Description: C# method to test for reserved characters // Param: Randomly generated character // Return: True if random character is a reserved character; otherwise false. //************************************************************** private static bool IsReservedChar(char c) { // Character array of all reserved characters char[] reservedChar = new char[9] { ‘\u0022’, ‘\u005C’, ‘\u003C’, ‘\u003E’, ‘\u007C’, ‘\u002F’, ‘\u003F’, ‘\u002A’, ‘\u003A’ }; foreach (char rc in reservedChar) { // Compare randomly generated character against each reserved character if (rc == c) { return true; } } return false; }
36
• Software Test & Performance
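Used this way, the generated filename itself doubles as the test oracle. The following sketch shows one way that check might look. It isn't part of the article's listings; the temporary directory and the direct File.WriteAllText() call are stand-ins for wherever and however the application under test actually saves the file.

private static void VerifyRandomFilenameRoundTrip(Random randomTestDataObject)
{
    // Requires: using System; using System.IO;
    // Build a candidate filename from the intelligent random generator
    string fileName = GetRandomAsciiFilename(randomTestDataObject) + ".txt";
    string fullPath = Path.Combine(Path.GetTempPath(), fileName);

    // Names near the 251-character maximum can exceed the classic 260-character
    // Windows path limit, so skip those iterations rather than fail spuriously
    if (fullPath.Length >= 260)
    {
        Console.WriteLine("SKIP: generated path too long for this file system");
        return;
    }

    // In a real test, the application under test performs the save;
    // a direct write stands in for that step here
    File.WriteAllText(fullPath, "intelligent random test data");

    // Because reserved characters and leading/trailing spaces were excluded,
    // the exact generated name is what should now exist on disk
    Console.WriteLine(File.Exists(fullPath)
        ? "PASS: " + fileName
        : "FAIL: " + fileName);
}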
Within the range of characters between U+0020 and U+007E, we must define the valid class subsets. The valid character code points include punctuation symbols, numbers, and upper- and lowercase characters A through Z, as outlined in Table 2. Also, the string length for a valid base filename with a standard three-letter extension runs from 1 to a maximum of 251 characters. Table 2 illustrates a simplified equivalence-class partition table for this particular example. (Note: Additional unique and special valid and invalid class subsets that aren't included in this simplified example must be considered for complete testing of this simple feature.)

The equivalence-class table provides a design template for developing the algorithms used to generate random data and constrain that data within specific guidelines. We can easily exclude the invalid control-character subset by setting our minimum (minChar) and maximum (maxChar) ranges of random characters to generate in the GetRandomAsciiFilename method (see Listing 1). By default, this also includes all the valid character subsets. The method generates a random string length of at least one character and at most the maximum of 251 characters. Each random character is then checked to make sure it isn't reserved, by calling the IsReservedChar() method, and that it isn't a space character if it's the first or last character in the string.

LISTING 1: ANY RANDOM ASCII

//*************************************************************
// Description: C# method to generate a valid filename of ASCII characters
// Param: Random object
// Return: Variable-length string of ASCII-only characters <= maxLength
//*************************************************************
private static string GetRandomAsciiFilename(Random randomTestDataObject)
{
    // Maximum length of the base filename component, assuming a standard 3-letter extension
    int maxLength = 251;
    // Minimum of the range of valid characters, converted to an integer value
    int minChar = Convert.ToInt32('\u0020');
    // Maximum of the range of valid characters, converted to an integer value
    int maxChar = Convert.ToInt32('\u007E');
    // New instance of a StringBuilder object to store the random characters
    StringBuilder result = new StringBuilder();
    // Generate a random string length between 1 and 251 based on the seeded Random object
    int length = randomTestDataObject.Next(1, maxLength + 1);
    while (result.Length < length)
    {
        // Generate a random number between the minimum and maximum character
        // values and convert it to a character type
        char c = Convert.ToChar(randomTestDataObject.Next(minChar, maxChar + 1));
        // Verify the random character code-point value is not a reserved character
        if (!IsReservedChar(c))
        {
            // Verify the first and last characters are not space characters
            if (!(result.Length == 0 && c == '\u0020') &&
                !(result.Length == length - 1 && c == '\u0020'))
            {
                // If the character is not reserved and not a leading or trailing
                // space, it is appended to the StringBuilder object
                result.Append(c);
            }
        }
    }
    return result.ToString();
}

//*************************************************************
// Description: C# method to test for reserved characters
// Param: Randomly generated character
// Return: True if the random character is a reserved character; otherwise false
//*************************************************************
private static bool IsReservedChar(char c)
{
    // Character array of all reserved filename characters
    char[] reservedChar = new char[9] { '\u0022', '\u005C', '\u003C', '\u003E',
        '\u007C', '\u002F', '\u003F', '\u002A', '\u003A' };
    foreach (char rc in reservedChar)
    {
        // Compare the randomly generated character against each reserved character
        if (rc == c)
        {
            return true;
        }
    }
    return false;
}

Calling the GetRandomAsciiFilename() method and assigning its result to a string variable is all that's required to build a valid random-length base filename component on the Windows operating system:

static void Main()
{
    int seed = GetSeedValue();
    Console.WriteLine(seed);
    Random randomTestDataObject = new Random(seed);

    // Get a random string and assign it to the string type variable
    string myRandomFileName = GetRandomAsciiFilename(randomTestDataObject);
    …
    // Open the file save dialog and set the text of the string variable
    // to the filename control
}

Of course, the range of allowable characters in a filename is much greater than illustrated in this example.

The methods described above illustrate the foundation of intelligent random test data. This data is repeatable and predictable using probabilistic algorithms. Randomly generated seed values are used both to generate random data and to replicate previous test data. The predictable nature of random test data is defined by the valid and invalid equivalence-class partition subsets. Of course, the accuracy of the data depends on the tester's knowledge and ability to precisely decompose the data set, and random test data doesn't emulate average user inputs.
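One practical consequence is worth spelling out: as long as the seed is logged with each run, any failure found with random data can be replayed exactly. The fragment below is a minimal sketch of that idea rather than code from the article; deriving the seed from Environment.TickCount is an assumption, and the article's GetSeedValue() helper could be substituted.

private static void RunRandomFilenameTests(int iterations, int seed)
{
    // Record the seed with the test results so any failing run can be replayed exactly
    Console.WriteLine("Random test data seed: " + seed);

    Random randomTestDataObject = new Random(seed);
    for (int i = 0; i < iterations; i++)
    {
        string fileName = GetRandomAsciiFilename(randomTestDataObject);
        // ... exercise the file save/open functionality under test with fileName ...
    }
}

// A fresh run:
//   RunRandomFilenameTests(100, Environment.TickCount);
// Replaying a failure that was logged with seed 123456789:
//   RunRandomFilenameTests(100, 123456789);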
Putting IR Data to Work

Intelligent random test data does provide increased variability and breadth of coverage of the possible test data without exhaustive testing. And as the number of iterations of the automated test case increases, the randomly generated intelligent test data has a greater probability of exposing potential defects with greater efficiency than manual ad hoc testing or methods that use static test data.

For example, most applications and Web apps today support Unicode characters, yet most testers limit the majority of their test data to the set of characters on the keyboard (that is, ASCII characters). One way to easily increase testing effectiveness is to employ strings of randomly generated Unicode characters beyond the standard characters defined by ASCII. This not only increases the variability of character code points in varying positions in the string, but also increases the breadth of coverage to include multiple language groups.

Sure, it's unlikely that an average user would use a Hindi character, a Japanese character and a Russian character in a single string. However, a string-parsing algorithm that supports Unicode doesn't necessarily care which language group or family a character belongs to; it cares only whether the character is a valid or invalid Unicode character code point, and it must handle it accordingly.

To illustrate this point, we used a simple random-string generator to construct a string of 1,000 randomly selected Unicode character code points between U+0020 and U+FFFF. This includes Unicode code points that aren't assigned characters, which are represented as rectangles. It also contains many ideographic characters, because these characters dominate the Unicode repertoire.
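The generator used for that experiment isn't reproduced here, but a minimal sketch of the idea, assuming nothing beyond the standard Random and StringBuilder classes, looks like the following. It skips the UTF-16 surrogate range so the resulting string stays well formed, a choice the original generator may or may not have made.

// Sketch: build a string of random Unicode code points between U+0020 and U+FFFF.
// Requires: using System; using System.Text;
private static string GetRandomUnicodeString(Random randomTestDataObject, int length)
{
    StringBuilder result = new StringBuilder(length);
    while (result.Length < length)
    {
        int codePoint = randomTestDataObject.Next(0x0020, 0xFFFF + 1);

        // Skip UTF-16 surrogate code units (U+D800 through U+DFFF);
        // everything else, assigned or not, is allowed
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
        {
            continue;
        }
        result.Append(Convert.ToChar(codePoint));
    }
    return result.ToString();
}

// Example: a 1,000-character test string saved as a Unicode-encoded text file
//   string data = GetRandomUnicodeString(new Random(seed), 1000);
//   File.WriteAllText("unicode-test.txt", data, Encoding.Unicode);  // requires using System.IO;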
This simple, obtusely generated random string was saved as a Unicode-encoded text file and then opened with several simple word-processing applications to test their ability to handle random Unicode test data. The file opened as expected, the characters displayed correctly and the string remained intact when the file was opened in Notepad (see Figure 1).

FIG. 1: NOTEPAD—THE STALWART

When the same file was opened with a similar freeware application named Win32Pad, the application displayed corrupted characters, indicating that it was unable to properly support Unicode characters (see Figure 2). However, subsequent I/O tests revealed that Win32Pad did not alter the file structure, and the file contents remained intact even when the file was saved with a different filename and reopened with Notepad.

FIG. 2: WIN32—LOOKED BAD, FILE GOOD

The original test file was also opened with a third freeware application, called CopyWriter. In this test, the application not only displayed corrupted characters, it also truncated the string after 135 characters (see Figure 3). Further I/O tests revealed that this application not only truncated the string, but also corrupted the file contents, resulting in data loss.

FIG. 3: COPYWRITER—LOSS INVITER

This critical defect was discovered in fewer than 100 consecutive iterations of the automated test case comparing simple file I/O functionality across these three applications. The total automated test time was less than five minutes from the initial test to the test that identified the first data loss/data corruption defect by comparing file contents.

The code examples here illustrate the fundamental concepts of intelligent random test-data generators, but effective application of IR test data also can benefit from more resilient data generators. I've written a functional intelligent random (IR) test-data generator, called GString, that can automatically generate strings of random Unicode code-point values. Download it free at www.testingmentor.com/tools.html. GString also includes a dynamic link library for generating intelligent random strings in automated test cases.

Using intelligent random test data, along with data that is representative of the customer and historically problematic, is a smart approach to improving your automated (and manual) tests. ý
NOMINATIONS ARE NOW OPEN FOR THE 2007 TESTERS CHOICE AWARDS

The Testers Choice Awards, sponsored by BZ Media's Software Test & Performance Magazine, are bestowed each year on the top test/QA and performance management products. The judges are the most important people in our industry: YOU, the 25,000 test/QA professionals who read Software Test & Performance Magazine. Awards will be given for winners and finalists in each of the categories. The top vote-getter will also be awarded a grand prize. Winners will be announced at the Software Test & Performance Conference October 2–4, 2007, in Boston. The results will be published in the December issue of the magazine and posted to STPmag.com on December 1, 2007.
Visit www.stpmag.com/testerschoice for further details about what’s becoming one of the industry’s most important and prestigious awards. Remember, NOMINATIONS CLOSE JULY 13.
Best Practices
Post-Deployment Tuning: Widgets Won't Save You

Geoff Koch

Post-deployment tuning has always been one of tech's more maddening exercises. Tracing the root cause of even minor performance issues often involves major investments of time and energy. Invariably, attention turns from patient troubleshooting to vigorous finger-pointing. Surely, the blame game goes, the architects could have been more diligent in gathering requirements. Developers could have tried harder to avoid sloppy coding. And it's hard to believe testers didn't pay more attention during quality assurance.

Such criticisms may be well-deserved. But that doesn't change the fact that many smart software types have a slew of tips for taking ownership of after-the-fact tuning. Among these: embrace the basics, including properly benchmarking and instrumenting your system; always think carefully about whether tuning is worth it; and never accept conventional wisdom about the declining importance of post-deployment tuning without carefully thinking about it.

This is sound advice, though perhaps tough to remember. So for those who want the summary version, these three words best sum up what to do when faced with the distressing sub-par performance of those upstream in the development cycle: Deal with it.

"One of the most important things you can do, whether you are an administrator or developer, is assume that the performance problem is yours until you prove otherwise," says Steve Souza, a consultant in Washington, D.C. "I have seen everyone sit around and point fingers and not take the 10 minutes it would take to prove that it was or was not their problem."

Souza recalls a recent all-hands-on-deck effort to troubleshoot a slow query that threatened to bring down his employer's entire Web site. Tracking down the culprit, a single missing index file, turned into a Keystone Kops affair. Beyond arguing and blaming each other as the site kept grinding to a halt, Souza and his colleagues desperately tried to track down the problem using little more than a scattershot, trial-and-error approach. Recent deployments, including operating system and database patches, were reviewed; locking schemes on tables were changed; database, application and Web server logs were scoured; and of course, numerous meetings and conference calls were held.

"This is typical of what happens when you don't have diagnostic tools and performance metrics for the black boxes we call applications and servers," says Souza, adding that "one of the most important things is that you need to have some knowledge about what normal is."

Baseline performance shifts over time, so keeping track of "normal" requires near-constant vigilance. Just how much monitoring should be done? Enough, says Souza, that if one piece of the infrastructure or application degrades, you can immediately identify the location of the problem. And enough so that if you tune your application, you're aware whether it improves and by how much.

"I have seen people tune and tune for weeks and not make any performance improvements as they randomly tune anything within flailing distance," he says.
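What does knowing "normal" look like in practice? The fragment below is a generic sketch, not anything Souza prescribes: time the operation you care about, log the number, and compare it against a recorded baseline so a degradation announces itself instead of waiting to be discovered on a conference call. The query method and the 120 ms baseline are illustrative placeholders.

using System;
using System.Diagnostics;
using System.Threading;

static class BaselineCheck
{
    static void Main()
    {
        const double baselineMs = 120.0;      // "normal," captured from earlier healthy runs

        Stopwatch stopwatch = Stopwatch.StartNew();
        RunCustomerQuery();                   // the operation being watched
        stopwatch.Stop();

        double elapsedMs = stopwatch.Elapsed.TotalMilliseconds;
        Console.WriteLine("CustomerQuery: {0:F1} ms (baseline {1:F1} ms)", elapsedMs, baselineMs);

        // Flag the run when it drifts, say, 50 percent past its baseline
        if (elapsedMs > baselineMs * 1.5)
        {
            Console.WriteLine("WARNING: CustomerQuery has degraded beyond its baseline.");
        }
    }

    static void RunCustomerQuery()
    {
        // Hypothetical stand-in for the real query or service call being measured
        Thread.Sleep(100);
    }
}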
First, Do No Harm

The flailing comes about because, too often, post-deployment tuning is a reaction to shrill and sometimes unreasonable complaints made by management or customers. Let's face it: those charged with making IT gear work better are rarely in the upper echelons of the corporate hierarchy, and so are invariably swamped with requests for bug fixes and enhancements issued from on high.

Yet this frantic, hamster-wheel approach of never-ending optimization isn't always the right tack. Until recently, Dassault Systèmes' subsidiary ABAQUS, Inc. pursued a strategy of tuning its core engineering-modeling application for specific industry segments. The company's underlying algorithms were robust. But given that the program is used to simulate everything from car crashes to diaper stretching to air moving over a plane's wing, it's easy to see why some fine-tuning might be necessary.

However, Nick Monyatovsky, an ABAQUS platform specialist based at the division's Providence, R.I., headquarters, explains that for his firm, the best post-deployment performance-tuning strategy was to stop doing it altogether. One reason is that cranking up the speed of an application that can take days to crunch through its modeling calculations results in an inevitable and sometimes unacceptable hit to accuracy, specifically in floating-point calculations. Next, work done to optimize performance for a specific customer doesn't necessarily translate well to other customers or industry segments. Diaper makers aren't keen on aerodynamics, after all. Finally, doing post-deployment, customer-specific tuning is an unacceptable drain on resources.
"It was a logistical nightmare," says Monyatovsky, explaining why ABAQUS has stopped providing versions of software specifically tuned for a customer or industry segment. "We were maintaining different builds and different quality assurance processes. The overhead was just far too big."

Tuning Still Matters in a Widget World

Big changes are afoot in the world of consumer software. Reading those who hail the rise of application-in-a-browser computing, it's fair to wonder whether Monyatovsky's concerns about rigorous in-house tuning will simply evaporate because of new models of writing and distributing software. In a Newsweek article published in late December 2006, reporter Brian Braiker suggested that 2007 might be the year of the widget. And in an April 2007 column in Business 2.0, tech pundit Om Malik suggested the widget could make the jump to corporate respectability "in record time."

As more and more code is deployed in bite-sized bits of functionality that work within standard browsers, post-deployment tuning starts to look like a much simpler problem for vendors and users alike. Tweak a proxy setting here, download a plug-in there—and the Web and various back-end systems will take care of the rest, right?

Maybe not, says James Reinders, director of business development and marketing for Intel's Developer products division near Portland, Ore. "All the systems the widgets deploy on—Java, Flash, .NET, SAP—sit on top of very carefully engineered bases which are under intense pressure to be high performance," he says.

Widget writers, Reinders continues, do indeed help write code that pushes the envelope. But the work of those software engineers who tune core systems is increasingly important, in large measure due to the rise of abstraction—what Reinders calls "removed-from-the-hardware programming environments."

Frustrating as it may be, it's flat-out unlikely that the need to be able to coax extra performance from deployed applications will vanish anytime soon. So even as others dream of a tuning-free world built on ubiquitous bandwidth and Web-based everything, perhaps the best advice is to listen to the old-timers who tend to be skeptical of such claims.

The idea that new ways of distributing and using code will obviate the need to do tuning "is wrong on many fronts," says consultant Souza, a 21-year industry veteran. "This is like saying that the days of well-thought-out software, analysis design and requirements are dead because we now have browsers. In many ways, software development process hasn't changed much through the years."

Which is why cold pizza, ultra-caffeinated beverages, conference calls and meetings, emergency messages on pagers at 2 a.m.—and, yes, even finger-pointing—will be with us well into the future. ý

Geoff Koch writes about science and technology from Lansing, Mich. Write to him with your favorite acronym, tech or otherwise, at koch.geoff@gmail.com.
Future Test
Composite Web Applications: Our Blind Spot

Imad Mouline

Imagine your company launching a new mission-critical software application that's gone through a less-than-thorough QA cycle. Unit, integration and load testing never got done. Functional testing was too little, too late. Sounds like certain failure waiting to happen? Absolutely, but most companies actually operate this way. Look at virtually any Web application released today. Even when they do perform a thorough QA cycle on their Web applications, organizations often skip huge swaths of the code. This oversight can lead to big business problems.

What Web application testers routinely overlook is the increasing share of the presentation, business logic and data delivered by third parties. (Complicating this problem, any testing that is done is often performed in artificially ideal conditions.)

The third-party contribution to Web applications can't be overstated. For example, one global company I work with—and this is typical—has 50 separate third parties involved in providing content, logic, data and infrastructure services (content offloading or Web acceleration) to their Web properties. As many as 20 of those third-party providers can simultaneously contribute to a single page view, with services aggregated just in time at the end user's browser. If this company tested only its own code prior to deployment, it'd be in big trouble.

With Google Maps, for example, you can "speed-drag" satellite images of your neighborhood, city, state or country at will because AJAX is anticipating your movements and making server calls behind the scenes. Mash up Google Maps with, say, a real estate–brokerage home finder, and you suddenly have a lot going on at the browser that used to happen where you could control it, on your server.

We're seeing layers upon layers of third parties involved in Web applications. For example, you may think you're just paying bills on your bank Web site, but really be interacting with a third-party bill-paying Web service. That service may be enlisting someone else's hidden Web analytics Web service, and so on.

In providing sophisticated Web applications like these, organizations are paradoxically relinquishing control of them. Layers of third-party Web services are wonderful when they work, but in most cases, organizations have no reliable data on what's working when and how well. We now live in a composite world—or a state of anarchy, depending on your point of view. Either way, many organizations are still managing their Web experience as if this were the year 2000, when IT had complete control.

Some Imperatives

Engineer the customer perspective into your product. Don't just pile on functionality—switch applications on and see how they affect the business. Test from the beginning of the development process and from the outside in. As you build out the application, make sure all the parts still work—and work well together.
Know (and manage) what feeds into the composite application. Many organizations understand the concept of third-party Web services, but still test and monitor the delivery of only the content they themselves serve up. Keep track of every factor affecting your customers' experience, including third-party data and services. Make sure that all application components, including third-party contributions, meet release criteria that guarantee a high-performing Web application. Just because your third parties deliver well in one geography doesn't mean they will in all geographies. Don't assume your third parties are as consistent as you are.

Test everything. Today's Web applications now span many browser types, servers, connection speeds and third-party content/infrastructure services. Shouldn't your development, functional testing, load testing and monitoring do the same? Organizations need to test from outside the firewall (better yet, around the world) from real-world connections at real-world speeds using the same browsers their customers use.

Know your customers, their profiles and their usage patterns. What kind of browsers do they use? What kind of machines? How do they connect to the Internet? Where in the world are they located? What are their usage patterns: days, nights, weekends, certain times of day, or certain paths through the application?

Keep on testing. You should test continually, even after production, because, unlike in the old days, you're not in control of updates to the overall code. Although you may not be releasing new code yourself, new code will inevitably be getting to your end users. And just because their experience is satisfactory today doesn't mean it will be tomorrow.

Bottom line: Even in this composite world, you must take control. Although Web apps are increasingly determined by others, you're not at their mercy. By testing from the start of the development process from the outside in and from the customer's perspective, you turn a sharp eye on the blind spot of performance. Now watch it improve. ý

Imad Mouline is chief technology officer at Gomez Corp., a maker of tools for Web application interface management.