A BZ Media Publication
BEST PRACTICES: Post-Deployment Performance Tuning
VOLUME 5 • ISSUE 6 • JUNE 2008 • $8.95
The Right Stuff For Java Test Coverage
Agile Development Calls For Clearing the Skies for Agile Testing
Scrum Test Pilots Share War Stories
Closing In Fast On the Top Guns of Unit Testing
TAKE THE HANDCUFFS OFF QUALITY ASSURANCE
Empirix gives you the freedom to test your way. Tired of being held captive by proprietary scripting? Empirix offers a suite of testing solutions that allow you to take your QA initiatives wherever you like. Download our white paper, “Lowering Switching Costs for Load Testing Software,” and let Empirix set you free.
www.empirix.com/freedom
VOLUME 5 • ISSUE 6 • JUNE 2008
Contents

COVER STORY
14 • Agile Development Flies High—Only With Agile Testing
Adopting an agile development methodology? Don’t forget to incorporate an agile testing methodology, as well, right from takeoff. By Glenn Jones

20 • Slipping Into Scrum
A transition to Scrum can be tough, tossing traditional system testers far beyond their comfort zone. Learn how to avoid getting burned, with these three case studies. By Robert Sabourin

26 • Covered In Java
What does 100% coverage really mean? To effectively assess your code and your test suite, dig into coverage criteria and the different levels of trust they should inspire. By Mirko Raner

34 • Unit Test Tool Showdown
In unit testing, choosing the right tool is everything. But how do you find out the details? Watch JUnit and JTiger face off in this five-step showdown. By Marcus Borch

DEPARTMENTS
7 • Editorial Presenting the Televator: yet another UI freeze-up; this time for more than 48 hours!
8 • Contributors Get to know this month’s experts and the best practices they preach.
9 • Feedback It’s your chance to tell us where to go.
11 • Out of the Box New products for testers.
13 • The Show Report STPCon lights up the San Mateo spring.
36 • Best Practices Post-deployment performance problems are often all too human. By Geoff Koch
38 • Future Test Obscurity is old news: Testing won’t work unless it’s visible. By David Kapfhammer
VOLUME 5 • ISSUE 6 • JUNE 2008
Editor Edward J. Correia +1-631-421-4158 x100 ecorreia@bzmedia.com
EDITORIAL Editorial Director Alan Zeichick +1-650-359-4763 alan@bzmedia.com
Copy Editor Laurie O’Connell loconnell@bzmedia.com
Contributing Editor Geoff Koch koch.geoff@gmail.com
ART & PRODUCTION
Art Director LuAnn T. Palazzo lpalazzo@bzmedia.com

SALES & MARKETING
Publisher Ted Bahr +1-631-421-4158 x101 ted@bzmedia.com
Associate Publisher David Karp +1-631-421-4158 x102 dkarp@bzmedia.com
Advertising Traffic Phyllis Oakes +1-631-421-4158 x115 poakes@bzmedia.com
List Services Lisa Fiske +1-631-479-2977 lfiske@bzmedia.com
Reprints Lisa Abelson +1-516-379-7097 labelson@bzmedia.com
Accounting Viena Ludewig +1-631-421-4158 x110 vludewig@bzmedia.com
Director of Marketing Marilyn Daly +1-631-421-4158 x118 mdaly@bzmedia.com
READER SERVICE Director of Circulation
Agnes Vanek +1-631-443-4158 avanek@bzmedia.com
Customer Service/ Subscriptions
+1-847-763-9692 stpmag@halldata.com
Cover Photograph by Laurin Rinder
President Ted Bahr Executive Vice President Alan Zeichick
BZ Media LLC 7 High Street, Suite 407 Huntington, NY 11743 +1-631-421-4158 fax +1-631-421-4130 www.bzmedia.com info@bzmedia.com
Software Test & Performance (ISSN- #1548-3460) is published monthly by BZ Media LLC, 7 High Street, Suite 407, Huntington, NY, 11743. Periodicals postage paid at Huntington, NY and additional offices. Software Test & Performance is a registered trademark of BZ Media LLC. All contents copyrighted 2008 BZ Media LLC. All rights reserved. The price of a one year subscription is US $49.95, $69.95 in Canada, $99.95 elsewhere. POSTMASTER: Send changes of address to Software Test & Performance, PO Box 2169, Skokie, IL 60076. Software Test & Performance Subscribers Services may be reached at stpmag@halldata.com or by calling 1-847-763-9692.
JUNE 2008
Ed Notes

Of Elevators And Heart Failure
Edward J. Correia

In this space in April, I told you about how frequently I seem to find kiosk-style applications that have crashed and are unable to recover gracefully (or at all). In other words, they were “caught with their UI pants down,” my crude way of describing an application that has blue-screened or a system that is stuck at a flashing cursor.

It happened again. This time I was riding in the elevator at a software testing conference in Orlando, Fla. We’ve all seen them—the small LCD monitors above the checkout line at the supermarket (flashing specials when it’s too late for you to do anything about it) and on just about every flat surface available in the public square.

This one was a product of Televator Corp. Its name was emblazoned on the bezel and was part of a logo that resembles the Empire State Building. A cleverly named company, I thought at the time, displaying messages that were attractive but not particularly memorable. I also recall that the display was static—it changed not a bit during my ride to the 20th floor. Mentally, I wrote it off as a backlit sign and thought nothing more of it until the next day.

When I entered the elevator the following morning, the Televator’s display had changed. Now there were several cascading DOS windows piled one atop the other. A message inside the top-most window read “copying Heartbeat.txt…1 file copied.” From the DOS prompt, I was able to see the drive letter from which the copy operation was executed. Was there any other information displayed in the window? Oh yeah, how about the names of the file server and share directory.

And here’s the kicker. This window was on display for at least another 48 hours. Many of the people I spoke with about the malfunction had already been aware of it, some laughing along with me at the glaring breach of functionality and security.

According to the company’s Flashy Web site (www.televator.net), the Televator system has the company handling updates of the screens on its network, though I’d say not very well. “Due to upload constraints, no jobs can be added to the network play list on the same day of delivery,” read the advertising specifications on the site. How very un-Web 2.0 of them.

Sometimes the suppliers of such solutions offer the option to “self-host” their systems. So I decided to call Televator, just to see if the San Diego-based company was indeed maintaining this hotel’s screens and if they knew about the problem. Unfortunately, the phone number listed on the company’s Web site was disconnected or out of service. No further information was available.

On the Conference
I won’t reveal the name of the conference I was attending; suffice to say it was that of a competitor. I’ll admit that I was there, at least in part, to see how the other guys do things and to gather ideas for our own event—the Software Test & Performance Conference—the next of which is coming this September in Boston.

But I’m not here to hawk our conference; rather to appeal to testers and test managers to seek out places of learning like this and other fine events throughout the year. Look for events that offer a balanced mix of notable industry figures and experienced testing practitioners presenting their knowledge in a variety of formats and settings, from full-day tutorials on specific topics to sessions open to the will of the participants. ý
Contributors GLENN JONES brings nearly a quarter century of software development and leadership experience to his role as vice president of customer solutions at Mitrix, which offers on-demand supply chain execution services. This month, Glenn explains why running an agile development process without automated regression testing is like flying at mach speed in the dark. Beginning on page 14, Glenn describes how to build a testing process with constant feedback and validation that will keep your testing projects on target. He’ll teach your team to deliver automated tests for the build environment before features are considered complete. We’re pleased to once again have test-industry veteran ROBERT SABOURIN to share his wisdom on our pages. Turn to page 20 for three real-world stories of the plight of teams trying to transition to the Scrum method of agile development, as told by the Scrum Master himself. A well-respected member of the software engineering community, Rob is founder and president of Amibug.com, a software testing consultancy in Montreal. He has more than 25 years of management experience leading teams of software development professionals, and has managed, trained, mentored and coached hundreds of top professionals in the field. He frequently speaks at conferences and writes on software engineering, SQA and testing. Unit tests ensure the quality of your code, but how do you ensure the quality of unit tests? By reading the article on page 26 by MIRKO RANER, that’s how. Mirko is a systems engineer with Parasoft, where he’s part of the development team responsible for JTest. His areas of interest include enforcement of coding standards, automated test generation, JVM implementation and Unicode. Mirko is a graduate of Friedrich-Alexander University in Nuremberg; he’s also a prolific writer and blogger at raner.ws. He has presented at industry conferences. Part of the agile process is to deliver working code verified by unit tests. Starting on page 33, MARCUS BORCH, co-author (with Elfriede Dustin) of the upcoming “Implementing Automated Software Testing” (AddisonWesley, TBP 2009), describes his approach to choosing a unit testing framework with all the capabilities needed for effective testing. Marcus has worked in the defense industry as a software engineer for more than eight years. He’s currently a technical lead at Innovative Defense Technologies (IDT), a Virginia-based software testing and automation consulting company, where he designs and implements test modules to automate functional and system-level testing. TO CONTACT AN AUTHOR, please send e-mail to feedback@bzmedia.com.
Feedback ‘TEN THINGS I HATE’ FORUM
TESTERS OUT OF DEVELOPMENT Regarding I.B. Phoolen’s April 2008 article, “A Home for Testers,” the proper home for a testing organization is exogenous to the development staff. It should fall under a technically astute product management group that is not an extension of a marketing group. John McGrath
Via e-mail P.S. Excel is not acceptable if there is more than one tester unless it is a shared file with version control.
IT TAKES STRENGTH “10 Things I Hate About Testing”—great article! It really takes a strong tester’s mentality to overcome these hates and continue to uphold the product’s quality. Fred Chan Singapore
SANS DOCUMENTATION, DEVS DICTATE I read your article “Testers Are Idiots” (Test & QA Report, March 4, 2008), and it made me smile because it is a good summary of what I perceive in my work (two years working in testing). Reading “10 Things I Hate About Testing,” I agree completely with point 1 and think that points 2 and 3 are related, because if there is no documentation, the “how to start testing” is given by the developers. All points mentioned in both articles feel totally valid to me. Sandra De León Via e-mail
MORE SMS & GPS Is it possible to see some articles or reviews on test tools in the areas of SMS (MT-SMS, MO-SMS) and GPS? It is a struggle to find test tools, best practices and testing processes around these topics. William Barghorn West Deptford, NJ
BEYOND COOL—CORRECT! Regarding Edward J. Correia’s article, “The Klingon Voice-Engine Debugger,” (Test & QA Report, April 8, 2008), this is JUNE 2008
going to kill any thoughts I might have ever had about being cool, but I have to point out that the Klingon phrase is actually from “Star Trek III: The Search for Spock,” not “Star Trek II.” Perry A. Reed Tampa, FL
NO BUTTER, PLEASE I have serious concern about the following comments in the article “More Things You Hate About Testing” (T&QA Report, March 25, 2008): “Number 8. Differences in Criteria for Judging a ‘Good Tester.’ But no one should have to “butter up” the boss; advancement should be based on merit alone. Unfortunately, that is seldom the case. If you are good at buttering, you might be the best tester in your organization, says Sodhani.” I lead and manage testing in my organization, and for me, how a tester is judged is very important. I have never been the “buttering” kind, and I realize how things can change even if you are not the most competent person around. Reason for not agreeing: Normally, people who are semi-competent make these statements. When people have technical or leadership competence, they surface without “buttering.” But I agree with the statement “You don’t have to be technically smart to succeed.” Recommendation: While posting articles, please use appropriate verbiage, not to impact the value of testing or tester. Srinivas Uragonda Pune, Maharashtra, India
Regarding “10 Things I Hate About Testing,” (Test & QA Report, March 25, 2008), I provided responses to Mr. Sodhani’s first five complaints on the Software Quality Assurance Forums at www.sqaforums.com/showflat.php?Cat=0 &Number=476825&page=0&vc=1&PHPS ESSID=#Post476825. Mr. Sodhani’s statements are in italics. “I am surprised to look at what testing teams have been reduced to nowadays. And in so many teams, most of the people I see don’t have any career objectives and are only there for the paycheck.” Well, if people in QA are only there for a paycheck, I think it's to be expected one doesn’t get the respect of others. In this example, the root cause is issues with the leadership team and their ability to understand the importance of each team involved in the development lifecycle. It would not surprise me to also find those in the UI and project management teams have the same sentiments of the QA team in such an environment. One should not hate QA but rather despise poor leadership. “Running around to gather requirements” Why would one rely on the developers to communicate product functionality? Software can’t be built without knowledge of what needs to be coded; thus, the developers, Tech support, professional services, project management and possibly a UI team are in the same situation as a QA team. Gathering and understanding requirements is something that QA and R&D generally do throughout the project unless it’s a very trivial app. Take a look at the change history for the requirements document. Given the PM team has a goal to write good requirement documents, they will be updating reqs and adding entries to the change history. Trivial components within one’s app will remain quite static throughout a project, unless, of course, one is in the situation where knowledge of customer needs is totally unknown and thus every part of continued on page 12 > FEEDBACK: Letters should include the writer’s name, city, state, company affiliation, e-mail address and daytime phone number. Send your thoughts to feedback@bzmedia.com. Letters become the property of BZ Media and may be edited for space and style. www.stpmag.com •
Out of the Box
With Fortify 360, Security Can Come Full ‘Cycle’ Security tools maker Fortify Software is taking aim at the software development life cycle. With its recent release of Fortify 360, the company now claims to offer security solutions to identify, prioritize and repair vulnerabilities during every phase of development. The Fortify 360 suite offers five core functions, all aimed at the broader group of development, testing, operations and management. Analysis includes static and dynamic analysis of running applications during QA testing and monitoring for post-deployment. Audit Workbench provides a visualization of defects and enables teams to set priorities. An Instant Remediation module provides a real-time patch delivery system for rapid response to urgent vulnerabilities. A Collaboration module offers a shared workspace for security and development teams to work out and resolve security defects. And a Security Governance center displays a central security dashboard and control panel for monitoring, reporting and
Dashboards in Fortify 360 provide quick views of security vulnerabilities and can spot trends before they become serious.
tracking trends. “[Security is] not just about technology, but also about bridging the gap between those in the enterprise responsible for development and security,” said Fortify CTO Roger Thornton of the need for a collaborative solution. Security is a low priority in many organizations, he added, and takes a back seat to functionality, performance and even quality. “[A]nd most business managers are often unaware of the inherent business and security risks of deploying dangerously exposed software,” he said. Fortify 360 also gathers and reports threat intelligence as reported by the
company’s Security Research Group, which is dedicated to this function. Delivered as so-called rulepacks, the updates keep the users current on causes of security-related system failures, realworld remedies and ways to strengthen their systems from future attacks. “Fortify 360 challenges the premise of other point solutions in the industry by addressing the root cause of software vulnerabilities from the get-go,” said Barmak Meftah, Fortify’s senior vice president of products and services. “Our approach really allows customers to change how they view their software and achieve their security goals much faster.”
Tap Into the Cloud With Virtual Lab Virtualization has been a great help to testers, but who needs all those PCs piling up? What’s on that Compaq? How old is that HP in the corner? Does the Dell still need a new power supply? Someone fried the motherboard. A company called Skytap has emerged that claims it can take care of all those problems. Its flagship product, Virtual Lab, offers a browser-based means of creating, executing and managing instances of virtual operating systems and applications. The Web-based infrastructure provisions virtually the hardware, software, networking and storage for your test applications to run. JUNE 2008
Instances appear as thumbnails in a browser window along with owner and application name, creation date, number of CPUs and memory, status and notes. Clicking on a thumbnail drills into that configuration for further analysis and editing. All team collaboration takes place online. To invite someone to collaborate, simply send a snapshot URL via e-mail or IM, and the person or persons can bring up the same screen you’re seeing, with changes appearing to everyone in real time. Testers can use this space to re-create and discuss bugs, interface implementation or other test-
ing activities. A separate page is used for administration of user rights and permissions, viewing audit trails and monitoring projects. Files are delivered to Virtual Lab configurations by dragging them to a specific browser screen. Virtual machines can be quickly created based on those included in the pre-developed Skytap library. Virtual Lab is now in beta. Pricing is estimated to start at US$100 per month plus volume-based usage fees. General availability is set for late summer or early fall. Skytap was founded in 2006, and was formerly called Illumita. www.stpmag.com •
An Agile Java Server Is this the beginning of the end for container-based Java application servers? SpringSource has begun beta-testing a Java application server that it says is compliant with Equinox, the OSGi-based dynamic technology adapted for servers by the Eclipse community. It’s a core part of the SpringSource Application Platform, which enables plugging and unplugging of apps and modules without the need to stop and start the server. The platform is built around a socalled Dynamic Module Kernel (dmKernel), which SpringSource said works with Spring, Apache’s Tomcat and the OSGi model to enable side-by-side app version deployments and zero-downtime resource library upgrades. The big advantage for testers is the ability to incrementally deploy patched apps without the need for restarts, shortening iterative test cycles and benefiting especially those doing agile development. General availability is set for June; licens-
ing will be under the GPL. Download the beta at www.springsource.com/products/suite/applicationplatform.

When Real Networks Fail, Do the Apposite
Network emulation appliance maker Apposite in late April unveiled Linktropy 4.0, the latest version of its WAN emulator that can now capture live network conditions and play them back for application testing and performance analysis. Linktropy appliances simulate link bandwidth, delay, packet loss, congestion and other network impairments, according to the company, and can also be configured to emulate terrestrial, Internet, wireless, satellite and other wide area network types. The new capabilities permit equipped appliances to record actual network conditions and then play them back through the appliance. The firmware update will work on Linktropy 4500, 7500 and 7500 Pro models and is set to be available in June.

Integrity Gets Virtual Automation Module
April also saw the release of Virtual Automation Module, an extension to mValent’s Integrity application configuration-management solution intended to ease the management of virtual environments. Integrity works by capturing the state of virtual environments, including code and configuration settings, monitoring them for changes. Provisioning new systems can then be automated by replicating those changes. The automation module allows Integrity users to “customize and instantiate virtual environments” with multiple configurations and code versions. It also eases reuse of virtualized systems through snapshots, which can be edited with automatic auditing and version history.
Send product announcements to stpnews@bzmedia.com

Feedback < continued from page 9
the app is fair game to significant change in some cases. It would not surprise me in this environment [if] the QA team didn’t request money in their budget to visit customers and understand their needs rather than assuming all that needed to happen to understand functionality is go talk to R&D. The first mistake is to rely on a development team to communicate all product functionality. I wonder what happens when engineers take vacation? Does testing and learning about the product halt until they return? Is gathering requirements throughout the project an issue with lack of knowledge about the customer, inability to know how to communicate a requirement or simply a result of poor evaluations of implicit requirements prior to functionality being implemented? If one is in an agile environment, the test plan for <x>, <y>, <z> functionality should be signed off and of high quality the minute the functionality is ready to test. Thus, if one asked the development team the things they hate most about development, [will] the statement “We can code functionality w/o requirements yet QA waits on us to explain the functionality to them” end up in their list?
“Developers dictating what to test”
In the example he provides, the QA engineer is really at fault for considering the R&D engineer as his or her boss and thus must comply with their request. What is preventing the QA engineer from taking the “free” SQL statements provided by R&D and then modifying/adding/deleting in order to create a complete test suite? I don’t think any of us have been in a situation where the development team told us we can’t add anything to the test strategy the development team defined. Such a conversation starts and ends in a matter of seconds: “Development team... thanks for your input in regards to our test strategy... We’re adding in your comments to our strategy and should have a test plan ready for review in a few days.” I don’t think anyone has ever been in a situation where a development team escalated an issue to senior management/project management that QA refused to execute the entire test strategy they defined. However, a development team and other team members putting pressure on the QA team to test quick-
er... well, that’s just life in the QA world. Salary disparity... I personally have no issue with developers making a bit more money than QA. If salary disparity makes it into a top ten list of things this person hates about QA, nothing is stopping them from moving to the development team. I personally think that development does take more intellect than testing does in some, but not all cases. “Too much focus on manual testing” This points to an issue in the leadership team within QA to run a fiscally responsible department. Rather than spending some additional up-front money to automate, they instead spend all their money on manual testing. I would not make a broad statement that leadership within QA departments is generally so poor across companies they are unable to make fiscally responsible decisions and is thus why this EFFECT of poor leadership (i.e., no automation) has made it into a top 10 list of what things this person hates about testing. About the only thing to be upset about is the ability of Mr. Sodhani to get this nonsense published. Zach Frison (a.k.a. “thekid”) Via e-mail JUNE 2008
Show Report
STPCon’s Mother Was Lightning Talks, Says Ygor

Attendees of the Software Test & Performance Conference, Spring 2008, were treated to rubber baseballs, and speakers were subjected to them.

If you’ve seen the 1939 film Son of Frankenstein with Bela Lugosi, you might recall when Lugosi’s Ygor told the good doctor Frankenstein—father of the Monster—that “His mother was lightning.” Maybe it’s a stretch, but to me, the Lightning Talks session at the end of day one of April’s conference really electrified the remaining two days.
The entertaining and informative Lightning Talks, introduced at the April Software Test & Performance Conference, went off almost without a hitch. The session energized me personally, put a charge into all the speakers involved, and seemed to light up the crowd as they fired rubber baseballs at the speakers right on cue.
Lightning Talks featured top-notch speakers, including Michael Bolton, Hans Buwalda, Doug Hoffman and Google’s Jason Huggins. There was even a light-hearted presentation by yours truly, which, if I may say so, drew quite a few laughs. It told the story of a fictional test team with faces you’d recognize from just about every television newscast of the last 18 months. E-mail me if you’d like a copy of the slides.
Lightning Talks, for the uninitiated, take place in a large, theater-style room during a one-hour session. As the audience files in, they’re provided with their ammunition. One at a time, speakers were given five minutes to present their topic—the time was kept on a large animated online stopwatch. If a speaker went over the allotted time, the audience had the authority to pummel that speaker with their baseballs. And believe me, they did. This session took place after the conclusion of full-day tutorials, and people apparently need to unload. It was so well received that many of the participants suggested we schedule a Lightning Talk every day (we’re thinking about it). The talks were useful, people said, not just because they gave them a chance to “audition” speakers or in some cases get a preview of an upcoming class, but also because they got to take out some of their aggressions, relieve some stress and have some good-time fun. When was the last time you could say that after a conference?
As we always do, the San Mateo STPCon offered tracks on security testing, performance testing, databases, Java, metrics, QA, management and automation. And feedback from students was terrific. This year, we added GUI testing and analysis classes taught by none other than Jeff Johnson, author of the hilarious and popular “GUI Bloopers” book series, which lampoons hundreds of real-world interface design goofs, gaffes and blunders from all over the world.
Then there’s HOTS, the Hands-On Tools Showcase. This is an event we
debuted at last year’s San Mateo conference, and it too was a smashing success. In this after-hours event, HOTS vendors ply conference-goers with food and liquor while showing off their latest software. This year’s HOTS drew more people than last year, was more casual, and featured “sliders,” a popular hot sandwich that disappeared faster than the beer. If you couldn’t make it to the West Coast show, we’ll be doing it all again this September 24–27 at the Marriott Copley Place in Boston. Why will it be great? Because we’re Crazy About Testing! But don’t just take my word for it. Listen to (and see) what attendees had to say at www.youtube.com/v/soR-jhU3ic0. ý
STPCon is a unique conference for software developers, development managers, test/QA managers and senior test professionals. More information may be found at stpcon.com.
By Glenn Jones
Testing And Development Teams Must Follow The Same Flight Plan For Projects To Straighten Out And Fly Right

Running an agile development process without concurrent and automated regression testing is like flying through the clouds at mach 2 without a navigation system. Without the constant feedback and validation, there’s a high probability that you won’t get to where you want to go and that you’ll crash.
Many organizations are moving or have already moved to an agile software development process, such as Extreme Programming or Feature-Driven Development. They make this switch to achieve the benefits of faster revision cycles, faster time-to-market for key features and reduced risk. I’m sold on agility. I’ve been impressed by the quality of the software that teams produce using agile methodologies to deal with ever-changing requirements. But I’ve also learned that if you’re adopting an agile development methodology, you must incorporate an agile testing methodology.

Glenn Jones is vice president of customer solutions at Mitrix, which offers on-demand supply chain execution services.
Photograph by Laurin Rinder

First, the Infrastructure
Whenever we kick off the development phase of a new product or new piece of software, we first focus on getting the infrastructure in place. This is a highly interactive stage with a small core team. Examples of infrastructure include the logging system, exception handling system, UI layout system, URL processing module, database abstraction system, document handling system, unit testing system and automated testing system. During this stage, most of the functionality we’re developing won’t be visible to the end users—it’s like the plumbing of the project. Once this infrastructure is in place, we use unit tests created by developers to test the infrastructure. We shoot for very high code coverage—more than 80 percent—in this area.

Strategy, Automation and Feature Iteration
In tandem with this process, the test team develops the testing strategy, selects the appropriate automation tools, and begins developing test infrastructure components. The development team then creates and automates most of the testing at the unit and module level.
Then we start to iterate on a feature basis. At this stage, we form feature teams to implement features leveraging the infrastructure. The feature team typically consists of a functional expert, one or two developers, a tester and a technical writer for user documentation. If the feature has strong UI content, the team will also include a UI designer—sometimes referred to as a user experience engineer or an information architect—and a graphic designer. The feature team uses a single wiki page to capture all aspects of a feature, including the requirements, design, impact on other areas of the system, test plan and a checklist to help ensure the team has covered the key items in the process. The team must deliver automated tests to the build environment before the feature can be considered complete.

How Agile Testing Differs From Waterfall
In more traditional development processes, functional experts develop specifications; developers use the specifications to design, implement and unit-test their code; and testers use the functional specifications along with the developer’s designs to develop and execute a test plan. Developers typically complete an entire application before testing begins. Many times tests aren’t automated until after the release, if they’re automated at all.
In an agile development process, functionality, designs and code are in a constant state of change, but the system must always be in an executable state. Most agile methods develop software in brief iterations that last approximately a month. The agile development team is constantly enhancing the software, writing new code and refactoring existing code. Each iteration is set up as a small development project and concludes with the release of new, incremental functionality—not a major application.
The agile methodology requires immediate feedback on whether the latest incremental piece of code works, so testing happens in tandem with development, instead of being left to a separate integration and testing team at the end of the process. If testing shows a component, interface or feature isn’t working, developers must reconcile it immediately, for their own progress and for the progress of the other teams working in parallel on the code base. This early and continual focus on testing also helps the team to design with testability in mind.
Immediate feedback comes in the form of build results. We run four different types of test builds: continuous, nightly, comprehensive and penalty box. This offers the comprehensive regression testing coverage we need while providing the development team quick feedback when a code check-in
breaks functionality. Each test build consists of several steps, including retrieving the latest changes from source control, compiling, packaging for distribution and testing. Each type of build includes a different mix of automated regression tests. Automated tests include unit, API, command-line and UI tests. Since Fanfare’s business is producing an automated testing product that is a scripting language, we also develop automated “dog food” tests in which we use our product to test our product.
Continuous Builds We run a continuous build every time someone checks in code, no matter how small the piece. If someone submits a change while a build is running, we’ll include that change in the next continuous build. The continuous build includes a subset of unit tests, functional tests and enough UI tests to ensure that the primary navigation paths and key screens are functioning properly. Our goal is to run the continuous build in less than 20 minutes, so we’re selective as to which automated tests to include. We try to include a few tests for each module. We include more tests for modules that are more complex or that tend to break due to their dependence on other modules. Developers are constantly updating their source code to capture changes made by other developers. If we ran tests less frequently—say, once a day—and someone made a change that broke some functionality, other developers might not be able to test their new code due to the broken functionality. Multiple developers could waste time figuring out which piece of code broke the functionality. Instead, the continuous build helps save time by quickly identifying problems, allowing developers to fix errors in their own code, and minimizing any impact on the productivity of other developers.
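Fanfare runs its continuous builds with its own tooling, so the following is only a minimal JUnit 4 sketch of the idea described above: a hand-picked suite of fast, representative checks that a build server can run on every check-in. Every class, test and assertion here is hypothetical, standing in for whatever modules a real project would sample.

import static org.junit.Assert.assertEquals;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

// A curated "smoke" suite for the continuous build: one or two fast
// checks per module, chosen so the whole run stays well under 20 minutes.
// Slower UI and comprehensive regression tests belong in other suites.
@RunWith(Suite.class)
@Suite.SuiteClasses({
        ContinuousBuildSmokeSuite.UrlModuleSmokeTest.class,
        ContinuousBuildSmokeSuite.LoggingModuleSmokeTest.class
})
public class ContinuousBuildSmokeSuite {

    public static class UrlModuleSmokeTest {
        @Test
        public void stripsQueryStringFromPath() {
            // Placeholder assertion standing in for a real module check.
            String path = "/report/view?id=7".split("\\?")[0];
            assertEquals("/report/view", path);
        }
    }

    public static class LoggingModuleSmokeTest {
        @Test
        public void formatsAnInfoMessage() {
            String line = String.format("[%s] %s", "INFO", "build started");
            assertEquals("[INFO] build started", line);
        }
    }
}

The design choice is the same one the article describes: the suite is deliberately a subset, weighted toward modules that are complex or that tend to break because of their dependencies, so a broken check-in is flagged within minutes rather than overnight.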
Nightly Builds In addition to the continuous builds, we run nightly builds that perform a much more extensive set of automated tests, aiming to get more code and functional coverage. We try to run all unit, API, command-line and dog-food tests along with as many automated UI tests as we can fit in the allotted time. Because of the time required to run all UI tests, we focus on testing all navigation paths along with a subset of functionality on
each screen, including dialogs, wizards and pop-ups. Because we have offshore development centers and developers check in new code around the clock, we aim to run the nightly build in the three-hour window when the sourcecode control system is the least active. The source-code control system is least active on Saturday nights, so that’s when we run all automated tests in a comprehensive build. This can take up to 12 hours, with automated UI tests taking most of this time. When an automated test breaks, our testing tool places it in the penalty box and notifies the person responsible for the test. The tool runs penalty box tests after the continuous build. Once the developer fixes either the code or the test, he or she places it back into the appropriate build.
To Incorporate User Feedback, Test Before the UI Is Completed Part of the agile advantage is that it captures and incorporates user feedback throughout the development process, resulting in a more usable application. This means the UI is in flux throughout the project. One of the first questions I
get from development teams who are used to waterfall is, “How can you test before the UI is completed?” It’s a challenge, but a manageable one if you develop with this kind of testing in mind. We try to reduce our testing of the UI as much as possible, for two reasons: First, the UI tends to change until late in the development cycle, requiring heavy maintenance of any automated UI tests. Second, even automated UI tests tend to run significantly more slowly than other types of tests. We can run approximately 50 API tests in one second, but to run 50 UI tests would take about 30 minutes. We don’t recommend automating significant portions of the UI tests early in the development cycle, because you’ll end up spending a lot of energy just maintaining your UI test. Instead, our strategy is to postpone automating UI tests, and meanwhile to get as much coverage as possible using automated unit, API and command-line tests. To support this strategy, we maintain a clear separation of the UI from our models and business logic. The models and business logic can and JUNE 2008
should have a clean and stable API, but the UI can and should be dynamic up until the end. Our developers working on the business logic and models develop APIs and write tests for them as they go. We mostly use our own automated testing product, but you can achieve much the same result (although more slowly) by writing your own test suite to make calls against APIs and command lines to validate correct responses. The UI is one of the last things we stabilize. We try to get as much coverage as possible using non-UI testing, allowing us to reduce the automated UI testing to functions available only on the user interface. As we complete the initial cut at the primary UI screens, we create a quick sanity test that navigates through the UI screens to validate that the navigation and key UI components function properly. We run the sanity test as a part of the continuous build, but it doesn’t give a lot of coverage. As soon as the UI elements start stabilizing, our test team starts developed automated UI tests.
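The author’s team mostly uses its own product for this, so the snippet below is just an illustrative JUnit 4 sketch of the hand-rolled alternative the article mentions: calling the business-logic API directly and validating responses, with no UI in the loop. OrderCalculator and its discount rule are invented for the example.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Exercises a business-logic API directly. Because the rules live behind
// a clean, stable interface separate from the UI, dozens of checks like
// these run in milliseconds while the screens keep changing.
public class OrderCalculatorApiTest {

    static class OrderCalculator {
        // Hypothetical rule: a 10% discount applies at 100.00 or more.
        double total(double subtotal) {
            return subtotal >= 100.0 ? subtotal * 0.9 : subtotal;
        }
    }

    @Test
    public void discountAppliesAtTheBoundary() {
        assertEquals(90.0, new OrderCalculator().total(100.0), 0.001);
    }

    @Test
    public void noDiscountJustBelowTheBoundary() {
        assertEquals(99.99, new OrderCalculator().total(99.99), 0.001);
    }
}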
The Need for Speed: Automated Testing Because agile development teams need frequent feedback from tests at all levels (unit, functional, API, command line and user interface), teams must automate the testing process as they develop the system. Of course, any time we create something new, we have to test manually, but manual testing is more resourceintensive and takes longer than automated testing during regression tests. As soon as a manual test has cleared a piece of code, we build an automated test and add it to one or more of our regression testing suites. As we find and fix bugs, we add an automated regression test covering the fixed functionality to ensure the bug does not reoccur. We’re constantly adding to the same code base, building and running the product as we add to it. We use each test again and again to ensure that a certain piece of code is continuing to perform as expected. Automation of testing is key: we simply couldn’t meet the aggressive timelines of an agile project if we had to perform all those tests manually.
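As a concrete sketch of “every fix leaves a test behind,” here is a hedged JUnit 4 example of the kind of regression test that might be added alongside a bug fix. The defect, the bug number and the helper are all hypothetical; the habit of pinning the fixed behavior into a regression suite is the point.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Added together with the fix so the defect cannot quietly return in a
// later refactoring; it then runs forever as part of the nightly or
// comprehensive regression build.
public class DateRangeRegressionTest {

    static int daysInFebruary(int year) {
        boolean leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
        return leap ? 29 : 28;
    }

    @Test
    public void bug1234_centuryYearIsNotTreatedAsLeapYear() {
        // The (hypothetical) original defect treated 1900 as a leap year.
        assertEquals(28, daysInFebruary(1900));
    }

    @Test
    public void ordinaryLeapYearStillHasTwentyNineDays() {
        assertEquals(29, daysInFebruary(2008));
    }
}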
Developers Think About Testing— Testers Think About Development With agility, since tests are developed JUNE 2008
at the same time as the code itself, testing becomes integral to development, blurring the traditional boundary between QA and development. As soon as we assign a developer to a feature, we assign a tester to that feature, too. Sometimes they sit next to each other and sometimes they don’t, but each one knows the other is watching his or her work. The test team formulates the test strategy and test plans throughout the agile process from the initial systemlevel requirements analysis through each iteration cycle. The team develops a comprehensive understanding of how to thoroughly test features and functionality. For the initial iterations, the test team focuses on understanding the ins and outs of the system architecture and develops infrastructure and reusable components to support automated testing. A good automated testing architecture is just as important to the success of an agile project as is a good software architecture. The test team complements the development team by developing and executing extensive tests, including positive test cases covering a greater degree of functionality, negative test cases, testing for boundary conditions, edge cases, corner cases and testing for performance, scalability and reliability. While developers usually think only of how to test that something works, testers think of what conditions will cause the code to fail. When developers and testers work in tandem, they can be even more efficient in covering all the bases. In an agile environment, testers need to be highly analytical and creative because agility requires them to handle uncertainty. Unlike in traditional environments, they don’t get a complete application with a UI and a narrow brief to do manual testing. I find it most effective to hire developers who can think a bit like testers, and testers who can write code and scripts, but think like testers. Each team member can then anticipate the issues that might arise. Whenever possible, I’ll seat testers next to developers. They develop feature and system-level test cases while the developer comes up with unit and positive functional test cases.
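The split in mindset can be shown in a few lines. In this hedged JUnit 4 sketch, the first case is the kind of positive check a developer typically writes, and the remaining cases are the boundary and error conditions a tester tends to add; the quantity parser itself is a stand-in invented for the example.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class QuantityParserTest {

    // Hypothetical business rule: quantities are whole numbers from 1 to 999.
    static int parseQuantity(String raw) {
        int value = Integer.parseInt(raw.trim());
        if (value < 1 || value > 999) {
            throw new IllegalArgumentException("quantity out of range: " + value);
        }
        return value;
    }

    @Test
    public void acceptsATypicalQuantity() {            // positive path
        assertEquals(3, parseQuantity(" 3 "));
    }

    @Test
    public void acceptsTheUpperBoundary() {            // boundary case
        assertEquals(999, parseQuantity("999"));
    }

    @Test(expected = IllegalArgumentException.class)   // error path
    public void rejectsZero() {
        parseQuantity("0");
    }

    @Test(expected = NumberFormatException.class)      // malformed input
    public void rejectsNonNumericInput() {
        parseQuantity("ten");
    }
}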
Functional and API Tests The agile development and testing process works best when developers
take the responsibility not only for developing extensive unit tests, but also for developing and automating functional and API tests. At Fanfare, developers create automated tests covering a subset of the positive test cases, and we run most of these tests in our continuous builds. By having developers develop a subset of the automated tests, the developers must design for testability. For example, a developer may be designing a game that takes several hours to reach the final stage. To test what happens in the final stage, the tester could have to play the game for several hours before reaching the final stage, or the developer could design in a mechanism via an API that would allow the tester to go straight to the final stage. Once the development team begins to stabilize the code for a feature, usu-
TYPES OF TESTS
Automated regression test: Runs without human intervention to validate that changes to the system have not broken existing functionality.
Unit test: Validates that individual units of source code are working properly.
Public API test: Validates that APIs documented and accessible by third-party developers are functioning properly.
Private API test: Validates that APIs exposed for internal use are functioning properly.
Command-line test: Validates that commands entered at a command line are functioning properly.
User interface test: Validates that UIs are functioning properly.
Dog-food test: Using your own product to test another of your products.
ally after several iterations, the test team executes and automates its test cases. Because the testers are involved early and extensively, they proactively improve our testing process. As we’re having design meetings, testers offer their advice about how to make a section of code easier to test. Testers then www.stpmag.com •
integrate most automated test cases into the continuous, nightly or comprehensive build, as appropriate. Scalability and performance tests typically run in a separate testing environment.
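The game example earlier in this section can be made concrete with a small, hedged sketch: the developer exposes a test-only entry point so a tester or an automated suite can reach the interesting state in seconds instead of hours. The class and method names below are illustrative only, not taken from any real product.

import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class FinalStageTest {

    static class Game {
        private static final int FINAL_STAGE = 40;
        private int stage = 1;

        // Testability hook designed in by the developer: jump straight
        // to a given stage instead of playing through all of them.
        void jumpToStage(int stage) { this.stage = stage; }

        boolean bossIsPresent() { return stage == FINAL_STAGE; }
    }

    @Test
    public void finalStageSpawnsTheBoss() {
        Game game = new Game();
        game.jumpToStage(40);   // no need to play for hours first
        assertTrue(game.bossIsPresent());
    }
}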
Communication: The Key To Quality High-bandwidth communication between developers and QA teams is essential. I prefer the developer and tester of a feature to be in the same room together, but that’s not always possible, as we have offshore resources working on some projects. We take a Scrum approach, with team members all meeting for at least 15-30 minutes a day, whether they’re in the same room or on a conference call. Scrums involve developers, testers and the domain experts. Also, we make sure that each feature has a single document that captures the requirement for that feature and its impact, as well as information on the APIs, the user interface and how it should all be tested. We maintain these documents in wikis, so
everyone can access and update them with their own information: Developers can update the design information, and testers can update how to do the testing. Some organizations have all their
developers working on the same platform (such as Windows XP), but we have at least one developer developing and testing on each of our target platforms. We currently support multiple versions of Windows and Solaris and various Linux editions. Due to differences
in the UI and threading technologies, each platform performs and behaves a little differently, typically requiring some platform-specific tuning. By having developers developing and testing on each of our target platforms, we catch these nuances early and develop and implement strategies to address the issues. Our hourly build develops installers and runs automated tests on all supported platforms. This works very well for us. If a feature fails a test, the developer for that feature immediately checks with other developers, to notify them and get advice on fixing the problem. With the speed and constant change of agile development, the team must have immediate feedback whenever they break existing functionality. It may appear to be more painful to develop and maintain automated regression tests throughout the development cycle, but with the right strategy and implementation of automated testing, I’ve found that the gain far exceeds the pain. ý
Without oversight, software projects can creep out of control and cause teams to freak. But with Software Planner, projects stay on course. Track project plans, requirements, test cases, and d efects via the web. Share documents, hold discussions, and sync with MS Outlook®. Visit SoftwarePlanner.com for a 2-week trial.
Real-World Stories Of Scrum Migration—And No One Got Badly Burned
By Robert Sabourin
Illustration by The Design Diva, NY

I love Scrum. Scrum is a software development framework that helps teams deliver working, shippable code
in short iterations. I’ve worked in many life-cycle models, from traditional waterfall methods through complex spirals to modern evolutionary approaches. Scrum shines in turbulent project contexts in which business, organization and technical factors are constantly in flux. The transition to Scrum takes many traditional independent system testers out of their comfort zone because Scrum challenges many common notions about development. If you’ve been testing as part of an independent team, you may be in for quite a surprise moving to Scrum. In traditional projects, test teams find defects that escaped from the previous phases of development. In Scrum, testers are part of a self-organized team charged with getting things done. Testers work hand in hand with developers and other team members, and are involved in all aspects of development: planning, elaborating stories, testing, debugging and delivering working code. This article will describe three recent adventures I’ve had helping independent testers make a successful transition to Scrum. In each case, the transition to Scrum met with different problems as system testers tried to improve chaotic turbulent projects. The testers took the dive from the frying pan and into the fire. All survived with different scars and some important lessons learned.
Rob Sabourin is founder and president of Amibug.com, a software testing consultancy in Montreal.

A Bit About Scrum
I use Scrum frameworks to enable delivery of working, shippable code in reasonably short periods. Some projects have one-week iterations, but most clients choose a Sprint of between two and four weeks. In Scrum, the project backlog is a live, prioritized heap of potential
requirements, including User Stories, Capabilities, Constraints and Product Characteristics or Quality Factors. The product owner is responsible for actively managing the backlog. Potential entries come from customers, marketing ideas, architects and team members. The backlog evolves as development progresses. Backlog entries contain just enough information to prioritize them. Entries require further elaboration to implement. They are detailed as late as possible, often after an implementation decision is made. The Scrum team and product owner select high-priority project backlog entries to implement in the Sprint. At the planning meeting, developers and testers estimate and scope out the work to do. The result is a selection of items moved from the project backlog into the Sprint backlog. During the iteration the team focuses on fulfilling the Sprint backlog. During planning, testers and developers determine how to code and test each story. The team begins the Sprint with a strong idea of how it will end. To paraphrase Steve Covey, “Begin with the End in Mind—First things First.” The dynamic of development is facilitated by the Scrum Master. The Scrum Master ensures the team is in sync and represents the team to external stakeholders. Daily stand-up meetings facilitate communication and keep team members aware of progress. The Scrum Master will block distractions and help keep the team on track, facilitating and supporting the team’s self-organization. Testers work hand in hand with developers throughout the Sprint, preparing test scenarios and data, and working with builds prepared on-the-fly during the iteration. Developers are often challenged to test, and testers may also be asked to develop code as part of the Sprint. Traditional roles are set aside as required to adapt to the work at hand. The Sprint ends with a demonstration of the working code that has passed all tests set out during the planning meeting. Any changes, clarifications or refocusing have been done with full involvement of the product owner to ensure no surprises at the
end. A Sprint retrospective is held by all team members to identify changes to improve subsequent iterations.
The Role of Testing in Scrum
I see the traditional role of testing as falling into two broad camps: gatekeepers and information providers.
• Gatekeepers act as the guardians of quality and block the release of weak, ineffective or dangerous products.
• Information providers offer massive objective information about the state of the project under development. What works? What doesn’t? What bugs must be resolved?
In Scrum, some team members may be primarily testers, some may be primarily developers, and some may take other roles. But all team members may be called upon to test or to support testing during the Sprint. The tester works with the developer. The tester is both a consultant and active participant. Before the code is written, the tester works with the developer to decide what it means to confirm the code has been implemented correctly. Elaboration of story tests before the coding starts is a very important blend of design and requirement analysis. The tester is helping the developer decide what it means to fulfill the story.

The Basis of Testing in Scrum
In system testing, testers often base testware on many sources of information. Typical sources include requirements, designs, usage scenarios, code, support information and fault models. In a Scrum project, the basis for testing is very different. Strategies to assess correctness must be determined during planning. The amount of documentation available is minimal. The tester must use creativity and judgment to come up with effective ways to assess correctness. Side-by-side testing may be used to identify inconsistencies and differences. Subject-matter experts and customers can help assess correctness, and testers can also use a series of heuristics to guide validation. I’ve seen many cases in which the tester points out an inconsistency to the developer and then works to understand and resolve the concern. The developer can inspect and modify code while the tester varies conditions to build a good understanding of the problem at hand.

Results of Testing
Traditional testing focuses a lot on documentation. Test documentation is a rich deliverable that can be used to demonstrate in detail what parts of the application were exercised, what problems were identified and what other findings were observed and reported. In Scrum projects, I see a minimum of test documentation. When a tester finds a bug, he works directly with the developer to resolve the problem on the spot. No bug list is used. The only bugs documented are those that aren’t corrected on purpose. Care is taken to ensure that test documentation required to conform to regulatory requirements is kept. For example, in a medical project, I worked on a detailed log of how drug data was validated, which was required to conform to relevant domain regulations.
The Work of Testing I encourage Scrum testers to hone their skills in exploratory testing. Testers need to quickly assess the stability of the system and then explore changes or emergent behaviors. Exploration involves navigating though the new features, concurrently designing and executing tests as the tester learns about the implementation. Testers can explore the typical, alternate and error paths associated with each story being implemented. The typical path of buying a book on Amazon.com involves buying the book based on the title. An alternate path would be buying the book based on the author name or the ISBN number. An error path could include attempting to buy a book that doesn’t exist, is out of inventory or with an invalid credit card. Testers should also focus on exploring “what if” questions about the implementation. What if there isn’t enough disk space? What if a process runs out of system resources? What if the input is
invalid? Exploring failure modes can expose weakness in design. I encourage testers to use automation, generally controlling the application via a well-defined API, using scripted languages such as Perl or Ruby. I avoid traditional GUI-based test automation tools since updating scripts when GUIs change takes a lot of time.
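The article’s own suggestion is a scripted language such as Perl or Ruby driving a well-defined API; since this issue’s examples lean on Java and JUnit, here is the same typical-path and error-path idea expressed as a hedged JUnit 4 sketch. The BookstoreApi class, its data and the ISBN strings are all invented for illustration.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

import java.util.HashMap;
import java.util.Map;

// Drives the bookstore example through a plain API rather than the GUI,
// so the scripts do not break every time the screens are redesigned.
public class BookPurchasePathsTest {

    static class BookstoreApi {
        private final Map<String, String> titlesByIsbn = new HashMap<String, String>();

        BookstoreApi() {
            titlesByIsbn.put("000-TEST-0001", "Example Title");
        }

        String buyByIsbn(String isbn) {
            String title = titlesByIsbn.get(isbn);
            if (title == null) {
                throw new IllegalArgumentException("no such book: " + isbn);
            }
            return title;
        }
    }

    @Test
    public void typicalPath_buyAKnownBook() {
        assertEquals("Example Title", new BookstoreApi().buyByIsbn("000-TEST-0001"));
    }

    @Test(expected = IllegalArgumentException.class)
    public void errorPath_buyABookThatDoesNotExist() {
        new BookstoreApi().buyByIsbn("000-TEST-9999");
    }
}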
Three Case Studies I help fix broken development projects. Several companies have consulted me recently regarding problems in their implementation of Scrum. Task analysis of these stories highlights insights on avoiding problems when migrating to Scrum from traditional life-cycle models. The following three stories are true. Only the names have been changed to protect the innocent. These are projects in which system testers stepped out of the frying pan and into the fire.
The Case of the Weak Pilot TBU, a major data processing corporation, decided to implement Scrum in an effort to improve its failing life-cycle model. Previous projects were implemented with a waterfall approach. TBU has been experiencing turbulence in product requirements and new market pressures from small niche competitors who were able to come up with competitive solutions quickly. Rework due to changing requirements was getting out of hand. The heavy delivery cycle, although rigorous, wasn’t fast enough to compete. TBU has a small software engineering group dedicated to refining the lifecycle model and very excited about possibilities of implementing Scrum. Scrum indeed seems ideal for TBU. Short incremental delivery cycles with working, shippable code would be an important part of meeting the new market pressures. TBU runs several dozens of concurrent development projects. The organization has a rich history of delivering high-quality solutions and has come to depend on its independent testing team. Indeed, the TBU system-testing team has saved the day many times by finding important showstopper bugs before release. The team has considerable political power and a tendency to block initiatives that they feel risk product delivery quality. After piloting Scrum for over six months with a series of three-week JUNE 2008
iterations, several problems occurred. TBU called me in to perform a task analysis and recommend changes to help the pilot project get back on track. I looked under the hood, and here's what I found.
TBU selects a pilot. Inherently conservative, TBU was very careful in selecting a pilot project to experiment with Scrum. They chose a project that would have minimal impact on the business if it failed. A failed pilot project would, in the worst case, involve absolutely no lost business.
TBU kicks off Scrum. The developers were given basic training in Scrum. One of the developers was assigned the role of Scrum Master over and above his normal daily development activities.
Testing discovers it's doing Scrum. After the first Scrum iteration started, the TBU testing team was informed of how Scrum would work. Testers weren't aware of the process. There was a lot of confusion as to what was expected of testers in the iteration.
Testing team location. The testing team continued to work in a testing lab far away from the work area of the developers. Testers couldn't easily meet or interact with developers.
Double entry of bugs. In the experimental pilot implementation of Scrum, TBU's director of testing wanted to ensure that product bug and test data wasn't lost at the end of the project. The testers were instructed to enter all bugs found in both TBU's traditional bug-tracking system and the new Scrum tools. Testers managed the double entry of all bug and test data, as well as redundant documentation of all test cases run.
Product owner absent. This was a very low-priority business project for TBU. The product owner was involved with many other responsibilities. At the onset of the Scrum implementation, the product owner actively participated in the Scrum planning meetings, but wasn't available to the team when prioritization or clarification was needed.
Team interrupted by support issues. The
development team was constantly interrupted by support issues often related to basic system operation. Developers provided second-line support, but in practice calls were transferred to them directly from the help desk continuously. The Scrum Master was seemingly unaware of this problem, which chewed through a lot of the team's development capacity. It was treated by developers as a necessary evil and a part of life.
Scrum Master was also a developer. The Scrum Master was a developer on the team. He took on important development activities as well as coordinating the team, representing the team to stakeholders and escalating issues to management. Often the work of the Scrum Master was conflicted: Should he spend the time writing critical code or helping unblock another team member? He made sure the mechanics of Scrum worked fine, but he neglected to ensure that the dynamics of the team were fluid and effective. Escalating a problem to management is not the same as solving it.
Testers got builds too late. Testers didn't get builds from developers until very late into the Sprint. Developers delayed giving code to testers, effectively implementing a microscopic waterfall model within the Sprint. Developers didn't coordinate work with testers.
Test automation. Very little effort was spent by testers automating tests. Very little effort was spent by developers building a framework to support automation. No application programming interface (API) was established to support automation, whether by developers for their unit testing or by testers to help exercise the user stories. Testability issues weren't found in the project backlog and were never prioritized as part of an iteration.
Overreliance on hardening iterations. In some Scrum projects, an iteration will be dedicated to hardening the codebase. A hardening iteration focuses on bug repair. Hardening iterations can be used to explore how applications work on different platforms, in different
development contexts or when different co-resident third-party applications are running at the same time. TBU used hardening iterations as a placebo for traditional system testing.
The remedy. As a result of my detailed task analysis, I suggested that TBU change a few things to implement Scrum effectively:
1. Make sure the Scrum Master is not on the project's critical path
2. Eliminate the redundant bug-tracking system
3. Have testers work directly with developers and in close proximity
4. Do everything possible to ensure active product ownership; have the role seconded if the product owner is distracted by other responsibilities
5. Choose a more important project for the next pilot
6. Allocate time for support issues before planning
7. Have testers engaged in the planning meeting
8. Ensure testability issues are in the project backlog—they'll quickly bubble up in priority
What is TBU doing now? TBU has wisely scrapped the initial Scrum pilot. It was deemed better to try again fresh. A group of three pilot projects is underway in a much more important product area with very active product ownership. There is less redundant work, but the corporate inertia of the traditional testing role is hovering like a vulture waiting for the next weakness to be exposed.
The Case of the Distracting Documentation
BoxedIn is a major player in the domain of information management, with a long track record of leadership in its niche. The company has a rich tradition of corporate responsiveness. When a customer has an urgent problem or highly specialized request, BoxedIn takes pride in the timely delivery of a high-quality solution. BoxedIn is the "Ghostbuster" of information! BoxedIn had trouble resuming projects interrupted by urgent opportunities. The company often shifts all developers and testers to an emergency, leaving "normal" ongoing projects in a lifeless limbo. Scrum is a development framework that could be quite effective for BoxedIn. During the iteration, a team
would be dedicated to completing the Sprint without interruption. New opportunities could become high-priority requirements for the next iteration, thus eliminating the need to constantly cancel, refocus and reprioritize corporate projects. Before Scrum, BoxedIn had an independent test team that relied on detailed manual scripted testing. The team was composed of subject-matter experts, with a spattering of “professional” testers. The testing team dedicated a lot of effort to ensuring that several notions of coverage were maintained and documented. Any time a support person asked “Have you tested this feature?” the system test team could respond with an answer quoting the build, tester, date, test description and even the actual test data used. The system-testing team kept track of a lot of detailed data. The development team was torn apart and reassembled in response to corporate emergencies. Developers were heroic knights in shining armor. When a problem came up, they mounted beautiful white horses to ride into battle, making whatever trade-offs and compromises they could to save the day, and then proudly rode home victorious, into the sunset, with yet another new configuration to maintain and support for years to come. Development had a rich tradition of documenting and validating requirements, functional specifications and designs before diving into a solution. Each developer had his own special style of unit testing, and the developers did extensive integration testing before builds were released to the independent system testing team. Product managers are the customer’s advocates. They work in the same team as the system testers. Product managers gained consensus among many project stakeholders to prioritize requirements. BoxedIn kicks off Scrum. An urgent critical business project was selected as a Scrum pilot implementation. Team members weren’t trained. Roles weren’t clearly defined. Storyboards replaced traditional requirements for the new project. Product managers were dumped into the role of product owners. Development
managers were made into Scrum Masters, and developers and testers attempted to deal with the new reality.
Testing team uncomfortable. At first, the testing team was completely lost. Where was the test documentation based on formal requirements or design documentation? Testing was to be based on running code and a storyboard. The team was able to use storyboards and running code to identify many important problems, but they were never sure of how much testing was actually done. They didn't have a notion of test coverage.
Discovering exploratory testing. Testers were able to discover unexpected emergent behaviors and identify many critical bugs as they learned about the application. The team was able to find dozens of critical bugs without having to document line-by-line in advance each test case or procedure. They felt something was wrong and were afraid that they wouldn't be able to answer customer support if queried about whether a certain feature had been tested. The testing team was out of its comfort zone. In order to compensate, members spent many hours working overtime trying to create documentation to match the testing done.
Product management passive, not active. The role of product management didn't evolve into the role of product ownership. BoxedIn product managers wanted to gain consensus from stakeholders before prioritizing requirements or making product decisions. Traditional product management documents weren't needed in the new process. The system architect, development lead, product manager and Scrum Master maintained separate requirement lists. All were conflicting.
The remedy. As a result of my detailed task analysis, I suggested that BoxedIn change a few things to implement Scrum effectively:
1. Train the team in Scrum roles, responsibilities and framework
2. Implement active product management
3. Consolidate backlogs: one for the project and one for the current Sprint
4. Involve testers in Sprint planning sessions
5. Coach testers and developers on exploratory testing
6. Have developers consistently use a unit test framework
7. Use stories for requirements and testing
8. Ensure that the team is committed during the iteration
What is BoxedIn doing now? BoxedIn has effectively implemented Scrum. Testers are no longer focused on documenting the testing effort, but rather on implementing exploratory testing based on storyboards and failure modes. They use automation tools supported by the development team. It
took three iterations to get the team in sync (one month per iteration). The testers had access to expert coaching and some on-site training. All key team members received Scrum training. BoxedIn management is now using Scrum on more projects. They are delighted with the fact that projects can progress and deal with urgent changes without having to scrap disrupted project work.
The Case of the Graceful Goose
TimeSoft is a world leader in human resource scheduling and management software. TimeSoft management has established an important mission to dramatically improve time-to-market without risking product quality. TimeSoft was often caught chasing "tyranny of the urgent" projects and opportunities, leaving incomplete or inconsistently documented software in its wake. Projects were constantly interrupted, and the company was at risk of losing market share to smaller niche players. TimeSoft was forcefully merged with another related software company. This merger raised management awareness of the software projects' problems and led to some dramatic changes. Some members of the newly merged TimeSoft management team had previous successes with Scrum frameworks in other companies. Scrum was a natural fit for TimeSoft.
TimeSoft kicks off Scrum. TimeSoft management first identified internal champions to support the move to Scrum. They trained the internal organization leaders, including all product owners, development leads and test leads. Once this was done, developers and testers were offered a blend of in-house, custom-developed training and external public training in Scrum. The in-house training was designed to focus developers and testers on the differences between their traditional roles and what would be expected of them in the Scrum implementation. In one Machiavellian fell swoop, TimeSoft implemented Scrum in all business units concurrently. There were no pilot projects. It was a complete commitment to change that was supported by all levels of management. Business units had no choice in the matter. TimeSoft adopted a philosophy of creating "barely adequate" documentation for all requirements, designs and
project tests. After each Scrum iteration, the team decided if they required more, less or different levels of detail in project documentation.
TimeSoft pain. The transition to Scrum was painful and abrupt. It was very difficult for teams to deliver working, shippable code for the first few iterations. They did a lot of learning and self-organizing to find the right mix of development, testing and documentation required. Sprint durations were varied, and teams were established to implement "integration stories" so that multiple small Scrum teams could work in parallel from a common project backlog. During the first few Sprints, testers relied too much on manual testing. They also started testing too late into the iterations.
The remedy: test automation.
1. Eliminate the bug report
2. Learn from other teams
3. Institutionalize learning from mistakes
Test automation was made the highest-priority backlog item. The ability to control and observe applications using automated tools enabled automated regression testing. The effect was to have a few iterations that didn't add new external features, but did add testing hooks and frameworks ready for action. This led to a dramatic acceleration in development and increased confidence in code delivered at the end of each iteration.
What is TimeSoft doing now? TimeSoft continues to learn and adapt. The testing team is learning new ways to apply visual models and systematically implement exploratory testing in each iteration. The team has a wonderful dynamic. Developers and testers collaborate in all aspects of planning, coding and testing in each Sprint. At TimeSoft, learning and adapting are critical success factors.
Some Common Threads
Testing should have an active role throughout each Sprint. Testing priorities should be in the project backlog and can include frameworks, hooks and test data. It's also a good idea to include stress testing experiments in the backlog. Iteration planning should decide and prioritize the focus of testing. We should estimate testing effort based on the work to do. Planning should help us
decide what we'll test and what we won't test as part of the iteration. During the Sprint, testers work directly with developers. They can coach developers in building effective unit and regression tests, and collaborate with developers as code is developed. They can also communicate with developers to describe, isolate and fix bugs immediately. Testers should learn to practice exploratory testing. Concurrently design and implement tests as you learn about the application. Make sure you learn and adapt as a result of every iteration. Scrum teams are self-organizing and very effective at dealing with turbulence and reacting to change. To avoid corporate inertia, it's a good idea to continuously learn from your mistakes. Moving from traditional models of testing to Scrum can be exciting. If we can overcome our fears and hesitation, Scrum can be a fun and productive way to enable the development of solid, shippable "test-inspired" code. Remember, it's all about people… and the occasional bug! ý
Unit Tests and Code Coverage Tell Only Part of the Quality Story—The True Picture Appears When You Read Between the Beans
By Mirko Raner

Mirko Raner is a systems engineer with test tools maker Parasoft.
Unit tests ensure the quality of your code. But how do you determine the quality of your unit tests? Nowadays, a comprehensive suite of unit tests is a standard component of every professional Java project. Developers rely on unit tests as indicators of code quality. A test suite that passes without any failures is generally seen as an indicator of high-quality code and properly functioning software. But how much of your code is really tested by your test suite? And how can you be sure that there are no significant gaps?
A popular measure for test suite quality is the test coverage achieved by the suite. In simplified terms, this is the overall percentage of code that is exercised by unit tests. Many project teams don't know the actual coverage that their test suites achieve. Often, a development team is surprised when they're confronted with the actual coverage numbers that were measured for their tests. Also, many developers are unaware that even when a coverage tool reports 100 percent coverage and all tests are passing (and supposedly correct), undiscovered bugs can remain in the code.
To effectively assess the quality of your test suite and the code being tested, it's essential to understand the various coverage criteria and the different levels of trust they should inspire. This article explains the basic notions of test coverage and discusses some of the common pitfalls and misconceptions regarding coverage granularity, coverage criteria, coverage density and test overlap. Java code examples will be used to illustrate these concepts, but most of the presented ideas apply just as well to other programming languages.
Coverage Granularity
Before delving into the details of coverage criteria, let's explore the concept of coverage granularity, an important aspect of automated tools that measure test coverage. To determine the supported level of granularity, ask yourself, "What is the smallest unit of code whose coverage status I can determine unambiguously?" For many available tools, granularity is limited to individual lines. This can be a bigger limitation than it seems at first glance. The key problem is that Java, like most popular programming languages, is a free-form language. A developer may write a complete Java class as one very long line of code. If a coverage tool that offers line-based coverage granularity reports that this one-liner class is "covered," what exactly does that mean? Does it mean that every single expression in that line was covered, or is 100 percent coverage reported if at least one expression in the line was covered? Admittedly, this example is somewhat contrived, and such a programming style would already be a cause for concern. However, line-based granularity can reach its limits for some pretty common idioms. Consider the method in Listing 1, a class that demonstrates the limits of line-based coverage granularity.

LISTING 1
public class Listing1 {
    static int minOrMax(boolean minimum, int a, int b) {
        return minimum? Math.min(a, b) : Math.max(a, b);
    }
}
A single test case will partially cover the line containing the return statement, but at least two test cases are needed to cover both alternatives of the conditional expression. Even without using the conditional operator, it isn't difficult to create situations in which a line of code is only partially covered. Exceptions during program execution can always leave parts of a line without coverage. Line-based coverage granularity is sufficient for most cases—as long as the coverage tool doesn't report complete coverage for lines that are, in reality, only partially covered. Also, it may be tricky to determine why a particular line isn't fully covered and what tests need to be added to achieve full coverage. Coverage tools that report coverage for individual expressions make it easier to identify missing test cases. Visualization of expression-based coverage granularity is a little more intrusive. Line-based coverage can easily be displayed in a side ruler of the source editor, whereas expression-based coverage requires markers (like coloring or underlining) in the source code itself. In rare cases, a seemingly atomic expression translates to multiple bytecode instructions—some of which may not get covered under certain circumstances. Ideally, coverage granularity reaches down to the level of individual bytecode instructions. In fact, many coverage tools collect information about whether or not individual bytecode instructions are executed. However, the granularity of the collected data may be reduced to expression-based or line-based coverage when reported to the user.
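To make the Listing 1 point concrete, here is a minimal sketch (the test class name is hypothetical and it assumes a test placed in the same package as Listing1) of the two test cases needed to exercise both alternatives of the conditional expression; with only one of them, a line-based tool may still report the return line as covered:

public class Listing1Test extends junit.framework.TestCase {
    // Covers the Math.min(a, b) alternative of the conditional expression.
    public void testMin() {
        assertEquals(2, Listing1.minOrMax(true, 2, 7));
    }
    // Covers the Math.max(a, b) alternative.
    public void testMax() {
        assertEquals(7, Listing1.minOrMax(false, 2, 7));
    }
}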
Coverage Criteria
Independent of coverage granularity, there are a number of different coverage criteria that take different aspects of coverage into account. The scope of the various coverage criteria will be illustrated using the Java method shown in Listing 2. The example method doesn't perform any practical operation, but is fairly simple and quite helpful for illustrating all coverage criteria discussed in this article. For reasons of simplicity, the tested method has only a single input parameter. However, since a method with two parameters can also be viewed as a method with one parameter that happens to be a pair or tuple, the concepts discussed in this article apply just as well to methods with multiple parameters.

LISTING 2
public class Listing2 {
    public static int[] method(byte input) {
        boolean condition1 = (input & 1) == 0;
        boolean condition2 = (input & 2) == 0;
        int output = -1;
        if (condition2) {
            output++;
        }
        if (condition1) {
            return null;
        } else {
            return new int[output];
        }
    }
}

Statement Coverage
The most basic coverage criterion is statement coverage (also sometimes referred to as block coverage). Complete statement coverage is achieved when each statement of a tested method is executed by at least one test case. For the example method from Listing 2, two test cases, using the input values 0 and 1, are sufficient to achieve complete statement coverage. A corresponding JUnit test, which provides 100 percent statement coverage for Listing 2, is shown in Listing 3. The body of the first if statement will be executed for both inputs. Statement coverage doesn't take into account what would happen if the condition of the first if statement evaluated to false, in which case the body of the first if statement wouldn't get executed. In practice, it's just as important to consider what happens when a certain piece of code isn't executed. Testing of such situations needs to go beyond merely checking that every statement was executed.

LISTING 3
public class Listing3 extends junit.framework.TestCase {
    public void testMethod0() {
        int[] result = Listing2.method((byte)0);
        assertNull(result);
    }
    public void testMethod1() {
        int[] result = Listing2.method((byte)1);
        assertEquals(0, result.length);
    }
}

FIG. 1: CONTROL FLOW FROM LISTING 2 (four UML activity diagrams, one per code path; each path first branches on condition2, either executing output++ or skipping it, and then branches on condition1 to either return null or return new int[output]; a different path is highlighted in each diagram)

Branch Coverage
The coverage criterion that takes these situations into account is called branch coverage (also known as condition coverage or decision coverage). Complete branch coverage requires that all possible outcomes of a conditional expression are executed by a test case. For the example method, this means that there needs to be a test input where condition2 (in the first if statement) evaluates to false and where the body of the first if statement is therefore skipped. To achieve 100 percent branch coverage for the example method, you could add a third test case that uses an input value of 2. The test case that uses 0 as an input would be redundant in this scenario. If the test inputs are chosen carefully, you may be able to achieve complete branch coverage with the same number of inputs. However, this isn't a general rule, and you might need more test cases to
achieve branch coverage than you need to achieve statement coverage.
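As a concrete illustration (a sketch only, with a hypothetical class name, reusing the Listing2 class and the JUnit 3 style of Listing 3), the inputs 1 and 2 exercise both outcomes of each if statement and therefore reach complete branch coverage:

public class Listing2BranchTest extends junit.framework.TestCase {
    // Input 1: condition2 is true (output++ executes), condition1 is false.
    public void testMethod1() {
        assertEquals(0, Listing2.method((byte)1).length);
    }
    // Input 2: condition2 is false (the first if body is skipped), condition1 is true.
    public void testMethod2() {
        assertNull(Listing2.method((byte)2));
    }
}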
Path Coverage
Will 100 percent branch coverage guarantee that a piece of code is bug-free? Though it presents a significant improvement over mere statement coverage, complete branch coverage still provides no guarantee that a piece of code always behaves correctly. The example method in Listing 2 can actually throw a NegativeArraySizeException under certain circumstances. This problem occurs for all inputs where the body of the first if statement is skipped because condition2 evaluates to false, and at the same time condition1 evaluates to false as well. In these cases, the code will attempt to allocate an array of size -1. Branch coverage does cover both alternatives of the second if statement, but it doesn't guarantee that the statement's else branch is covered in combination with the body of the first if statement being skipped. To provide coverage for such situations, we need to look at combinations of branches, also known as code paths. This coverage criterion is called path coverage. Path coverage requires not only that every possible branch is executed at least once, but also that every possible combination of branches is executed at least once. Figure 1 shows the control flow of the example method from Listing 2 as UML activity diagrams. In each diagram, a different code path is highlighted in bold red. The example method has a total of four different code paths. To achieve 100 percent path coverage, you need a minimum of four different test cases (for example, using the input values 0, 1, 2 and 3).
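A corresponding suite might look like the following minimal sketch (hypothetical class name, same assumptions as above). Note that the fourth path can only be reached by provoking the exception that path coverage, unlike branch coverage, exposes:

public class Listing2PathTest extends junit.framework.TestCase {
    public void testPath0() { assertNull(Listing2.method((byte)0)); }             // condition2 true, condition1 true
    public void testPath1() { assertEquals(0, Listing2.method((byte)1).length); } // condition2 true, condition1 false
    public void testPath2() { assertNull(Listing2.method((byte)2)); }             // condition2 false, condition1 true
    public void testPath3() {                                                     // condition2 false, condition1 false
        try {
            Listing2.method((byte)3);
            fail("expected NegativeArraySizeException");
        } catch (NegativeArraySizeException expected) {
            // the attempted allocation of an array of size -1 is the bug on this path
        }
    }
}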
The Problem With Path Coverage
Currently, few available coverage tools support path coverage. There are a number of explanations for this:
• High number of paths. The number of possible code paths typically increases exponentially with the cyclomatic complexity of a method. Achieving a high percentage of path coverage by manually writing test cases is effectively impossible for any method that contains more than just a handful of lines. Even automated test generation tools may have difficulties generating test cases for complete path coverage, especially when nested loops are involved. For example, a method with a sequence of 10 non-nested if statements already has 1,024 (or 2¹⁰) possible paths. The code path for a loop that terminates after 499 iterations is different from the code path of the same loop terminating after 500 iterations. Similarly, an array store operation throwing an exception because of a null reference, and that same operation throwing an ArrayStoreException, need to be treated as different code paths.
• Difficulty identifying and covering paths. Automated test generation tools for path coverage first need to determine which code paths are possible and which ones aren't, and then to generate test inputs that cover all possible paths. Both steps are very time-intensive, and the accuracy of the results can't really be guaranteed because the required analysis involves problems that are known to be NP-hard or even undecidable (like the infamous halting problem).
• Representation challenges. Unlike statement and branch coverage, path coverage is difficult to visualize. Marking lines or expressions with different colors isn't enough to convey this type of coverage information. Due to this lack of an easily understandable visualization, path coverage remains a difficult concept for many developers.
The practical approaches for achieving path coverage are similar for automated and manual test generation. Instead of trying to find all possible code paths, it's helpful to focus on "interesting" code paths. It makes little sense to write a test case that would execute a loop 499 times and then add another test case that executes the loop 500 times if nothing really different happens. Rather than using a top-down approach, it's often more useful to use a bottom-up method that starts at possible points of failure and then finds code paths that would lead to the failures. The potential troublemakers include not only possible null references (NullPointerException), type incompatibilities (ClassCastException, ArrayStoreException) and array boundary violations (ArrayIndexOutOfBoundsException), but also divisions by zero (ArithmeticException) and potential synchronization issues (IllegalMonitorStateException). The tricky part is that these exceptions occur as a side effect of low-level bytecode instructions and aren't declared anywhere. Testing tools that are capable of performing a flow analysis of the tested code can be very helpful in identifying code paths that need further testing.

LISTING 4
class Listing4 {
    public static int add(int a, int b) {
        return a + b;
    }
}

public class Listing4Test extends junit.framework.TestCase {
    public void testAdd0() {
        int result = Listing4.add(0, 0);
        assertEquals(0, result);
    }
}

LISTING 5
class Listing5 {
    public static int add(int a, int b) {
        return 0;
    }
}
Toward Full Regression Coverage
As surprising as it may sound, even complete path coverage doesn't mean that your code always behaves correctly. The simple add method in Listing 4 has only a single code path. Technically, you need only a single test case to achieve 100 percent path coverage. Listing 4 also contains a sample JUnit test for this scenario. If you're using a test-driven development (TDD) approach, you'll be familiar with the idea of using unit tests as a kind of specification for the code under test. You first write a test that asserts what a new method is supposed to do, watch it fail, and add just enough implementation code to make it pass. If you show the test case from Listing 4 to a developer and ask for an implementation, the result might be a method that always returns 0, as shown in Listing 5. Interestingly, the developer wouldn't even violate the spirit of TDD by doing so. There would be 100 percent path coverage and no test failure, but obviously the implementation wouldn't work properly.
In real life, these kinds of issues rarely appear in the form of simple methods that add two values, but usually take a more complicated shape. The fundamental problem is that a set of unit tests might provide full path coverage for a method, but at the same time might not provide a complete specification of that method. One typical scenario is that a developer optimizes the implementation of an existing method in such a way that all test cases still pass, but previously unasserted functionality is inadvertently removed or changed. This can happen even with 100 percent path coverage. The illustrative code example from Listing 2 would require 256 different test cases (for all possible values of the type byte) to achieve full regression coverage. Without the additional test cases, a developer could replace the original implementation with simplified code that returns correct results only for the test inputs 0, 1, 2 and 3, but simply throws an UnsupportedOperationException for all other inputs. All existing test cases pass, yet a part of the original functionality is lost. Even a set of unit tests that provides full path coverage isn't guaranteed to detect such a regression.
If path coverage is already difficult to achieve, full regression coverage can only be called a practical impossibility. The add method from Listing 4 takes two integer parameters, each of which can assume 2³² different values. Full regression coverage for this method would require 2⁶⁴ (i.e., 2³² × 2³²) test cases. With a clever parameterized testing approach, it would be possible to represent all these test cases in a compact fashion, but impossible to execute all tests within a reasonable time frame. Full regression coverage would detect even subtle problems like the infamous Pentium FDIV bug; unfortunately, it's completely impractical for all intents and purposes. The effective impossibility of guaranteeing the proper functioning of something as simple as an adding method isn't exactly trust-inspiring. So, short of creating test cases that exercise every possible combination of inputs, what can you do in practice to achieve a reasonable level of protection against regression failures?

Corner Cases
Clearly, the more different inputs you keep firing at a tested method, the better your regression coverage will be. The question is which inputs to choose for your test cases. Extreme values (or so-called corner cases) are a good choice. For an int parameter, you could, for example, use Integer.MAX_VALUE, Integer.MAX_VALUE-1, 1, 0, -1, Integer.MIN_VALUE+1 and Integer.MIN_VALUE as test inputs. Similarly, a null reference, an empty string and various strings containing special characters make good test inputs for a String parameter. Test cases that aim at improved regression coverage often have a common test structure, but use different input values. In these situations, it may be helpful to use tools that support test case parameterization, which extracts the common test structure and separates the test data into an Excel spreadsheet or some other external data store.
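To make the corner-case idea concrete, here's a minimal sketch (a hypothetical test class, not one of the original listings, assuming the add method from Listing 4). The first test alone would already expose the fake implementation in Listing 5, and the remaining assertions pin down behavior at the extremes of the int range:

public class Listing4CornerCaseTest extends junit.framework.TestCase {
    // A second, non-trivial input pair exposes an implementation that always returns 0.
    public void testAddNonZero() {
        assertEquals(5, Listing4.add(2, 3));
    }
    // Corner-case inputs guard against regressions near the overflow boundaries.
    public void testAddCornerCases() {
        assertEquals(Integer.MAX_VALUE, Listing4.add(Integer.MAX_VALUE - 1, 1));
        assertEquals(-1, Listing4.add(Integer.MAX_VALUE, Integer.MIN_VALUE));
        assertEquals(Integer.MIN_VALUE, Listing4.add(Integer.MIN_VALUE + 1, -1));
    }
}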
Perturbation Testing
Another option for improving the regression coverage of a test suite is so-called perturbation testing, which applies minor modifications (such as adding or subtracting a small number) to input values from existing tests with the goal of exposing new interesting code paths and asserting previously unasserted behavior. Corner case values and perturbation testing are heuristic rather than systematic, but both can still increase regression coverage and can be applied manually or by automated tools such as Parasoft Jtest. Table 1 provides a summary of the test cases that are necessary for achieving 100 percent statement, branch, path and full regression coverage for the tested method from Listing 2.

TABLE 1: FULL COVERAGE FOR LISTING 2
Coverage criterion       | Minimum number of different inputs required for 100 percent coverage | Example of a minimum set of test inputs
Statement coverage       | 2   | 0, 1
Branch coverage          | 2   | 1, 2
Path coverage            | 4   | 0, 1, 2, 3
Full regression coverage | 256 | -128, ..., 127
Coverage Density, Equivalence Classes and Test Overlap
Besides coverage granularity and the various coverage criteria, there is also the concept of coverage density, which is somewhat related to test overlap.
Generally, you want to avoid having multiple test cases test the same functionality (i.e., you want to minimize the test overlap). The idea behind this principle is that test cases break not only because of bugs in the code, but more frequently because of changes in the specification. Code behavior that was previously assumed to be correct may become incorrect or insufficient due to new requirements. If code is modified to reflect the new requirement, the test cases that assert the old behavior may break and need to be updated. If you’re following a TDD approach, the updating of the test cases should actually happen before the tested code is changed. In a test suite with a high level of test overlap, one small change in the specification (and implementation) may require a large number of test cases to be updated. In such a situation, the test cases were obviously overlapping in some code detail that was subject to change. Such occurrences are highly undesirable because they significantly increase the test suite’s maintenance cost.
Equivalence Classes
Minimizing test overlap is closely related to the mathematical concept of a so-called equivalence class. In simplified mathematical terms, an equivalence class is a set of elements that are all equivalent to each other according to a certain relation. This idea is also relevant for test coverage. We previously identified four different code paths for the tested method shown in Listing 2. If you take a closer look at the code logic, you'll notice that only the two least significant bits of the input are evaluated. The remaining bits of the input have no influence on the code path that is taken. So, effectively, testing the method with an input value of 4 will use the same code path that is taken for an input value of 0. In the same fashion, the input values 1, 5, 9, 13... will all cause the same code path to be taken. For the purposes of path coverage, the method from Listing 2 has four equivalence classes, which are summarized in Table 2.

TABLE 2: EQUIVALENCE CLASSES FROM LISTING 2
Equivalence class | Input values
I   | 0, 4, 8, 12, 16, ...
II  | 1, 5, 9, 13, 17, ...
III | 2, 6, 10, 14, 18, ...
IV  | 3, 7, 11, 15, 19, ...

To achieve complete path coverage with minimal test overlap, it's sufficient to pick one input value from each equivalence class. There's nothing to be gained in terms of path coverage if multiple test cases use input values from the same equivalence class. Equivalence classes vary according to the coverage criterion. For example, in terms of statement coverage, the input values 0 and 2 are both in the equivalence class for covering the return null statement of the example method, but they'd be in different equivalence classes when looking at path coverage. Identifying equivalence classes for test inputs is a useful tool for minimizing test overlap, but again, trouble looms when we move toward full regression coverage. If a test suite achieves full regression coverage for a particular method, this implies that the suite forms a complete specification of that method. Any change in the method's behavior—no matter how minor—would result in a test failure.
For the example method from Listing 2, we already determined that test cases with all 256 possible input values would be necessary. How many equivalence classes for test inputs would there be in terms of full regression coverage? Unfortunately, the answer to this question is 256. For full regression coverage, no input value is equivalent to any other input value. Full regression coverage means that the behavior of a method is completely locked in. For example, even though the input values 0 and 4 are in the same equivalence class for the purpose of path coverage, they are in equivalence classes of their own for full regression coverage. If they were in the same equivalence class, this would imply that just picking one of the values (for instance, 4) would still satisfy the criterion of full regression coverage. However, in that case, it would be possible to implement the tested method in such a way that it works properly for the input of 4 but not for the input of 0 (for example, by adding a check that deliberately returns a wrong result if the input was 0). Therefore, 0 and 4 can't be in the same equivalence class for full regression coverage.
Coverage Density
When aiming for full regression coverage, the trick of picking only one input from each equivalence class can no longer be used for minimizing test overlap. Full regression coverage will always cause additional overlap. What other principles can be used to mitigate the negative effects of test overlap? A frequent problem is common code that is directly or indirectly executed by a large number of test cases. Specification changes affecting that common code are likely to cause a large number of failures. The goal is to avoid such concentrations and rewrite the tests in a way that limits the amount of commonly executed code. Coverage density is a helpful metric that can be used to create test suites that execute the tested code in a more evenly distributed fashion. Coverage density extends the dichotomy of "covered" versus "not covered" to a numeric metric that also counts how often a branch or path is executed. For example, instead of simply getting a yes or no answer as to whether a particular line was covered, coverage density would also tell you that the line was executed exactly 500 times. Coverage density can be applied to any coverage criterion, but most commonly it's offered in conjunction with statement or branch coverage. Again, visualization of path coverage densities is just as problematic as visualizing simple "yes/no" path coverage. A common way of visualizing
coverage density is to add colored markers with different brightness in the source code editor. For example, a light shade of green might indicate that a piece of code is covered by a few test cases, but an extremely dark shade of green would be a warning of a large concentration of test cases that all execute the same particular piece of code. Such warning indicators should ideally prompt a refactoring that moves the common code out of the code path.
How to Cover Yourself
Understanding the various aspects of test coverage is the first step toward creating effective test suites. Coverage awareness is the next step. If you haven't already done so, obtain a free or commercial tool for measuring the coverage that your unit tests actually achieve. Make sure that you understand exactly what coverage granularity and coverage criteria the selected tool supports. You should immediately add test cases for covering source code that isn't
yet covered—possibly by means of additional tools that can automatically generate tests for existing source code. Today, a coverage goal of 70 to 80 percent of statement or, preferably, branch coverage is fairly standard (and pretty
realistic for most types of applications). However, keep in mind that statement or branch coverage, as well as merely line-based coverage granularity, can be misleading. Tools that support automated flow analysis can assist in finding additional interesting code paths. Expressions that can potentially throw exceptions
can be used as starting points for manually identifying new, interesting code paths in a bottom-up fashion. Additional tests that use corner-case inputs and perturbations of existing inputs can be used to safeguard against future regressions. As an additional measure, parameterized testing can help to separate input data from test structure. If possible, test overlap should be minimized by selecting one representative input from each equivalence class of test inputs. Coverage density analysis may be used to achieve evenly distributed execution frequencies for tested code. All coverage metrics should be monitored over time to ensure continuous improvement of code quality. Test coverage is just a single aspect of software quality. It's important to note that a test suite might fully cover the tested code but inadvertently assert incorrect behavior. Although it's usually wrong to assume—and we've all heard the consequences—in this case it's a good idea to factor that possibility into your coverage testing. ý
A Head-To-Head Look At JUnit And JTiger Open-Source Tool Projects
By Marcus Borch

Marcus Borch is a technical lead at Innovative Defense Technologies (IDT), a Virginia-based software testing consulting company specializing in automated testing.
When your only tool is a hammer, every problem looks like a nail. That bit of wisdom—
generally credited to Abraham Maslow, a noted psychologist in the first half of the twentieth century—if applied to software testing, could have disastrous results. Part of the agile process is to deliver working code by making use of unit tests. But selecting the wrong tool or clinging to processes because “We’ve always done it that way” will doom your efforts from the start. So what we’ve provided here is an approach to choosing a unit testing framework that’s sure to offer all the capabilities you need.
Side by Side
For the purposes of this example, we chose Java as the target language. Of the available unit-testing frameworks, we'll examine two: JUnit and JTiger.
Evaluating a framework in five steps:
1. Identify the tool requirements and criteria (the framework criteria)
2. Identify a list of tools that meet the criteria (JUnit and JTiger)
3. Assign a weight to each tool criterion based on importance or priority
4. Evaluate each tool candidate and assign a score
5. Multiply the weight by each tool candidate's score to get the tool's overall score for comparison
Table 1 contains an example list of unit-testing framework evaluation criteria. Based on these criteria, we derive weights for the importance of each feature to the overall framework criteria. For example, the more important the criteria are to the project, the higher the assigned weights from 1 to 5 (with 5 being the highest). Then we rank the target frameworks. This rank also ranges from 1 to 5 and is based on how closely each of the criteria is met by the framework. Weight and rank are then multiplied to produce a final score. Features and capabilities of candidate frameworks are then compared based on the resulting score to determine best fit for the project.
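As a quick worked example of the scoring arithmetic, take the Common IDE integration row from Table 1: the criterion carries a weight of 4, so JUnit's rank of 4 yields 4 × 4 = 16, while JTiger's rank of 3 yields 4 × 3 = 12. Summing the weighted values across all criteria produces each tool's overall score.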
For the purposes of this example, we've chosen a project that will introduce unit testing to its software development process for the first time. At the time of this writing, three major open-source Java unit testing frameworks were available: JUnit, JTiger and TestNG. All three offer a substantial set of functionality for unit-testing Java code. The groundbreaking JUnit by Kent Beck and Erich Gamma has been a force for more than 10 years, and is an obvious choice for providing baseline functionality for any team—and for the purposes of this comparison. JTiger was developed separately by Tony Morris with the intention of overcoming what he perceived as the shortcomings of JUnit. We therefore deemed it a valid competitor to JUnit for the sake of this example. TestNG—developed by Cedric Beust for similar reasons—wasn't evaluated because it's based on JUnit and is therefore quite similar.

TABLE 1: SCORE CARD
Unit Test Framework Evaluation Criteria | Tool Weight (1-5) | JUnit Score | JUnit Value | JTiger Score | JTiger Value
Price                                   | 5 | 5 | 25 | 5 | 25
Documentation (availability, amount,…)  | 5 | 5 | 25 | 5 | 25
Support (forums, mailing lists,…)       | 5 | 5 | 25 | 2 | 10
Multi-platform support                  | 1 | 5 | 5  | 5 | 5
Licensing                               | 4 | 5 | 20 | 5 | 20
Common IDE integration                  | 4 | 4 | 16 | 3 | 12
No external dependencies                | 4 | 5 | 20 | 5 | 20
Test results reporting functions        | 5 | 3 | 15 | 5 | 25
xUnit                                   | 5 | 5 | 25 | 5 | 25
Total                                   |   |   | 176 |  | 167

Price: Since JUnit and JTiger are open source and free, both offer the advantages of no licensing fees and free (community) support, and aren't tied to a single vendor.
Documentation: Both frameworks have adequate documentation available online to facilitate integration, which is particularly necessary when working with a new tool and a team that's unfamiliar with unit testing.
Support: As important as documentation is to launching the integration effort, so is support to understanding the tool itself. JUnit was found to have an edge over JTiger in this respect. JTiger has an IRC channel dedicated to users of the framework, but that's the extent of its online support. JUnit has
Web forums hosted by JUnit.org as well as an active Yahoo group dedicated to users of the framework. There's also a JUnit section in Eclipse's help files; the IDE includes JUnit by default.
Multi-platform support: Since the client's target language in this example is Java, cross-platform support is implicit. A lower weight was assigned to this criterion, based on the assertion that all Java unit test frameworks will inherit it from the language.
Licensing: Both JUnit and JTiger are licensed under the Common Public License (www.opensource.org/licenses/cpl1.0.php). This license is similar to the GNU General Public License, but is intended to prevent contributors from charging fees for content versions to which they've contributed.
Common IDE integration: The ease of IDE integration with the framework is important to the client's development community. Software developers are more apt to become accustomed to using a tool that plugs into their IDE than one that doesn't. In this example, the client has adopted Eclipse as its development environment. JTiger currently has no direct method for integrating with Eclipse.
External dependencies: Neither JUnit nor JTiger requires external libraries to be installed. Such minimization or elimination of dependencies helps reduce the tool's footprint.
xUnit: JUnit publicizes its membership in the xUnit unit testing framework group. Although there's no direct mention of it on the JTiger Web site, it appears to follow this pattern.
The xUnit idea was originally developed by Kent Beck for Smalltalk (www.xprogramming.com/testfram.htm). Frameworks that are a part of the xUnit family are expected to provide the following components:
1. Test fixtures – Used to create a state in which multiple test cases are executed.
2. Test suites – An aggregate of tests associated with the same test fixture.
3. Test execution – Execute a unit test with initialization and clean-up capabilities, and produce test results.
4. Assertion mechanism – A means to determine the success or failure of the unit under test.
Selecting a unit test framework that sticks with this pattern gives some confidence that a minimal set of functionality will be provided.
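To show how those four components fit together, here's a minimal JUnit 3 sketch (the class and test names are hypothetical, and java.util.Stack simply stands in for the unit under test):

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

public class StackTest extends TestCase {
    private java.util.Stack stack;

    protected void setUp() {          // test fixture: shared starting state for each test
        stack = new java.util.Stack();
    }

    protected void tearDown() {       // clean-up after each test
        stack = null;
    }

    public void testPushThenPop() {   // test execution plus the assertion mechanism
        stack.push("element");
        assertEquals("element", stack.pop());
        assertTrue(stack.isEmpty());
    }

    public static Test suite() {      // test suite: aggregates tests sharing the fixture
        return new TestSuite(StackTest.class);
    }
}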
Community: JUnit is hosted at sourceforge.net, a predominant site for numerous open-source projects. Sourceforge gives direct access to several project details and statistics, such as activity percentile, number of bugs (open/total) and repository activity. JUnit is currently maintained by a group of just over a half-dozen developers.
JTiger releases are self-hosted. Its Web site doesn't provide the specifics about the projects that are presented for JUnit.
Features: Advantage goes to JUnit for its readily available add-ons. These extensions can allow for targeting a specific scope of unit testing, such as of database applications (DbUnit, www.dbunit.org/) or Java Enterprise Edition applications (JUnitEE, www.junitee.org/). A plus for JTiger is its reporting functionality. Output formats can be HTML, XML or plain text. This could prove beneficial when a unit-test-result artifact is a requirement for one of the stages in the client software development process model.
And the Winner Is…
Both tools scored well, with JUnit just a few points ahead of JTiger. Given its history, wide adoption throughout the Java community and numerous recognitions of excellence, JUnit was selected for this project. Its longevity and success rate give it a rich repository of experience and knowledge, all conducive to the introduction of a unit testing framework for a new team. For teams that are unfamiliar with unit testing and related tools, such resources are key to a successful implementation. ý
NOMINATIONS NOW OPEN FOR
THE 2008 TESTERS CHOICE AWARDS "The Testers Choice Awards recognize excellence in software test and performance tools. The awards encompass the full range of tools designed to improve software quality." NOMINATIONS CLOSE JUNE 13 There is no limit on the number of products that may be nominated by a company, or on the number of categories for which a product may be nominated. There will be a processing fee for each nomination. All nominations must be received by June 13, 2008.
VOTING STARTS JULY 1 Voting enters you for a chance to win an iPod Touch!* WATCH YOUR E-MAIL BEGINNING JULY 1 FOR YOUR INVITATION TO VOTE. Online voting opens on July 1 and closes on July 30. Only qualified subscribers to Software Test & Performance may vote. Winners will be announced at the Software Test & Performance Conference Fall 2008, Sept. 24-26, 2008, in Boston, MA. The awards will appear in the November 2008 issue of Software Test& Performance. Questions? Contact editor Edward J. Correia, at ecorreia@bzmedia.com.
stpmag.com/testerschoice *Winner may choose between the iPod Touch and a $300 Amazon.com Gift Certificate
Best Practices
Get on the Couch With Users to Grok Security
Geoff Koch

Here's the thing about writing this column: Most of the topics are incredibly interesting to an incredibly small number of people. This isn't meant to be a slight. It's just that—let's be honest here—most non-techies rarely stay up nights fretting about how to improve the state of unit testing or post-deployment performance tuning. The masses, though, may worry about gaps in software security, something that affects everyone from grandmas to senior government officials. This is one reason I was so excited to dig into this month's assignment. At long last, here was a chance to talk about how the good work of the test and QA community can help solve—and I don't think I'm overstating this—one of this century's most pressing tech-related problems.
However, by the time I filed this column, my enthusiasm had evaporated. The reason is that software security seems all but preordained to remain hopelessly paradoxical. Sure, compared to other thorny software issues, it's arguably grasped as a problem by the greatest number of mere mortals out there who don't dream in code. But security is also least susceptible to fixes and solutions proffered by the coding and testing crowds. To those of you who so earnestly want to talk about things like memory allocation and input validation, I can only ask: Why bother?
No, I'm not suffering from a bout of summertime nihilism. Realism is more like it. Let's start with the reality that tools available to help software organizations turn out locked-down, secure code are sadly lacking. Consider the case of a presentation by the Motorola Software Group at the Static Analysis Summit II, held Nov. 8–9 in Fairfax, Virginia and sponsored by the Software Assurance Metrics and Tool Evaluation project of the U.S. Department of Homeland Security (http://samate.nist.gov/).
The paper, which will be published in the journal Ada Letters, describes Motorola's experience using a handful of static analysis tools to automate the process of adhering to secure coding standards. The trio of Motorola authors, including Margaret Nadworny, spent most of their time discussing their work with Inforce, a quality- and security-oriented tool offered by Burlington, Mass.-based Klocwork.
"The static analysis tools on their own leave much to be desired—and this surprised us," says Nadworny, summarizing the findings of the paper in an e-mail. Given that automating any part of the coding process is difficult, it's not surprising that Nadworny listed several problem areas for future development (though I had to grin when I read the recommendation about reducing cases of both false positives and false negatives. Does that mean that every possible result returned by the tool is somewhat suspect?)
What is surprising is that I was pointed to the paper by Klocwork CTO Gwyn Fisher and the company's PR apparatchik. If the best proof point a tool vendor can offer is an oblique conference paper offering only the faintest praise, it's safe to say we're a long way from automating even the most basic processes that contribute to secure coding.
This is too bad, since tech security is fast becoming a multi-headed hydra. Yes, we have to worry about the possibility of sloppy coding by large-install-base vendors. Who here believes that Microsoft will never again do something stupid, like allowing for the IDQ.DLL overflow in the Code Red Worm? But we also have to worry about the various breakwaters that
protect the growing number of networked devices, which, as you probably know, include pacemakers. That wireless radio embedded into Medtronic pacemakers to allow doctors to monitor and adjust the devices sans scalpel? Hacked by researchers at the University of Washington and the University of Massachusetts who, according to a March 12 article in The New York Times, warned that “too little attention was being paid to security in the growing number of medical implants being equipped with communications capabilities.”
Beginning to Haul in Big Phish
Chest pains aside, the exploits of white-hat academics are far from the greatest cause for concern. Recently, real damage appears to have been done by a phishing scam that targeted thousands of high-ranking executives across the United States, as described by one of the Times' senior tech reporters, John Markoff, in an April 16 article. Up to 2,000 of these leviathans—e-targeting of the rich and powerful is known as whaling—may have been affected, possibly compromising passwords or other personal or corporate information. More troubling still is the April 21 cover story in BusinessWeek on the increasing attacks on computers with national security data at U.S. government agencies and defense contractors. The article describes several cases of increasingly sophisticated digital espionage, often involving highly customized malware-embedded phishing e-mails. These targeted attacks have at least two things in common. First, the e-mails generally sail through antivirus programs untouched. Second, the phishing
Best Practices tions rely on all-too-predictable lapses in judgment, an inviolable human trait that fuels my pessimism about coding and testing solutions to software security. Yes, there is such a thing as secure coding commandments—see, as just one example, information at the PHP Security Consortium (http://phpsec .org/)—addressing everything from use of library functions to handling of filenames to buffer overflows. You should attempt to follow and enforce these guidelines, probably with some combination of a disciplined use of a tool and manual inspection of your code base. But you should also get beyond your comfort zone and read up on the growing body of literature about addressing the end-user chink in software security’s armor. One theme in this literature is that users are baffled by even the most basic security settings in everyday applications. A 2006 paper in the journal Computers & Security by researchers at the U.K.’s University of Plymouth found that among 340 survey respondents, the
majority of whom were in the tech-savvy 17–29 age range, nearly half couldn’t understand terminology such as “unsigned ActiveX controls will not be downloaded” associated with security settings in Internet Explorer.
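Returning for a moment to those secure coding guidelines: here is a minimal sketch, in Java, of the kind of filename-handling check they call for. It is not drawn from the Motorola paper or the PHP Security Consortium material, and the class, directory and file names are invented purely for illustration.

import java.io.File;
import java.io.IOException;

// Hypothetical example: the class name, report directory and file names
// are invented for illustration.
public class ReportFiles {

    private static final File REPORT_ROOT = new File("/var/app/reports");

    // Risky pattern: pasting user input straight into a path lets a request
    // for "../../etc/passwd" escape the report directory. This is the sort
    // of unvalidated flow a static analysis tool would be expected to flag.
    static File resolveUnchecked(String requestedName) {
        return new File("/var/app/reports/" + requestedName);
    }

    // Safer pattern: canonicalize the combined path, then confirm it still
    // sits inside the intended directory before any file I/O happens.
    static File resolveChecked(String requestedName) throws IOException {
        File candidate = new File(REPORT_ROOT, requestedName).getCanonicalFile();
        String rootPath = REPORT_ROOT.getCanonicalPath() + File.separator;
        if (!candidate.getPath().startsWith(rootPath)) {
            throw new IOException("Rejected filename outside report root: " + requestedName);
        }
        return candidate;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resolveChecked("june-2008.pdf"));   // accepted
        try {
            resolveChecked("../../etc/passwd");                 // rejected
        } catch (IOException rejected) {
            System.out.println(rejected.getMessage());
        }
    }
}

The unchecked version is exactly the pattern a tool should complain about; the canonicalize-then-compare version is the fix a manual reviewer should insist on. None of it helps, of course, when a well-crafted phishing message persuades the user to hand over credentials outright, which brings us back to the human problem.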
‘It Won’t Happen to Me’

A second theme, soundly covered by Ryan West’s cover article “The Psychology of Security” in the April 2008 issue of Communications of the ACM, is that all users suffer from congenitally fuzzy thinking about their susceptibility to security threats. More specifically, “People tend to believe they are less vulnerable to risks than others,” writes West, a Dell design researcher. It’s this trait that ensures that phishing, spear phishing, whaling and other targeted attacks will continue well into the foreseeable future.

Some assert that software prompts, dialog boxes and other coding measures can help remedy matters, both by providing users with more plain-language descriptions of security features and clearer
ideas about the potential consequences of risky behavior online. But in light of an unending string of examples of our seemingly hard-wired hubris, I’m betting that software solutions will collectively remain a toothless tiger to the bad guys.

My advice to coders and testers is to read up on the foibles of human decision-making, just so you get some idea of what you’re up against. (That’s right. Your users are as big a problem as the criminals.) You might start with “Catastrophe: Risk and Response,” a 2004 book by federal appellate judge and all-around polymath Richard Posner. Time magazine’s Nov. 26, 2006, cover story “How Americans Are Living Dangerously” is another good overview.

As for me, I’m happy enough to be headed back to the relative safety of next month’s topic: dynamic and static code analysis. I’ve decided I’m perfectly content to trade away end-user distractions and national security implications for a little bit of obscurity any day of the week. ý

Best Practices columnist Geoff Koch thinks he understands the security settings in the Windows XP Security Center running on his Toshiba laptop. Contact him at gkoch at stanfordalumni.org.
Index to Advertisers

Automated QA (www.testcomplete.com/stp)
Black Hat (www.blackhat.com)
Checkpoint Technologies (www.checkpointech.com/BuildIT)
Empirix (www.empirix.com/freedom)
Hewlett-Packard (hp.com/go/securitysoftware)
iTKO (www.itko.com/lisa)
Pragmatic (www.softwareplanner.com)
Parasoft (www.parasoft.com/adp, www.parasoft.com/qualitysolution)
Software Test & Performance (www.stpmag.com)
Software Test & Performance Conference (www.stpcon.com)
Testers Choice Awards (www.stpmag.com/testerschoice)
Future Test
The Importance Of Being Visible

One of the worst things a testing organization can do is to operate in general obscurity from the rest of IT. If a testing organization doesn’t treat the development organization, business analysts and users like “customers,” it will lose credibility as the “team that operates behind the curtain,” and ultimately become ineffective. Testing should be conducted visibly.

The importance of testing visibly can’t be overstated. Providing line of sight into the operations of a testing organization is crucial for the effective management of not only the testing team, but the entire IT project.

Quite often, testing teams do just the opposite, intentionally conducting their test management and execution in obscurity. The project team doesn’t know, nor does it understand, what’s going on in the testing phases of the SDLC. Testing teams feel compelled, for some reason, to keep their activities secret, only to surface when there are bugs to be reported.

Why is this? Perhaps they feel that to provide an unbiased perspective, they must be completely disconnected from the rest of the project team. This condition is an unfortunate reality that often ends in frustration.

Creating visibility into the testing team and all of its operations can provide comfort to the project team, develop expectations and cultivate predictability. Allowing the project team insight into test operations doesn’t discount the credibility of the tests, nor does it diminish the importance of being unbiased.

How to Add Visibility

The testing team should create a series of “defining documents” that outline specific ways to create visibility for the team. Guides for defect management, test project management and user engagement (the user guide) are the primary mechanisms for communication.

The user guide outlines for all consumers of testing services exactly how to engage the testing team, defines expectations and gives an understanding of the type of testing services available to the project team.

The user guide should outline a set of procedures for the project team to employ in its interaction with the testing team, and provide detail on several categories of activity, which include:
• The systems that testing will support
• The organization of the testing team
• How users of testing activities request and receive services
• Principles that drive governance, decision making and validation
• An orchestration document that connects all of the testing team’s defining documents for the user
• Service-level agreements (SLAs)

Project teams often don’t know what’s happening during the testing phases of a project. As a result, when budgets need to be trimmed, the testing program is often on the chopping block. However, clearly codified procedures and deliverables help a testing organization illustrate its worth to the overall effort and provide a set of expectations for the project team.

The testing organization can establish the user guide as the vehicle for assigning expectations to the rest of the project team. For example, turnaround time for bug fixes can be defined there, and the subsequent impact to the project can be conveyed if those expectations aren’t met. These are often referred to as service-level agreements (SLAs). Furthermore, the user
guide can make a reference to the defect management guide at this stage, inviting all readers to familiarize themselves with the test team’s processes.

Business demands for shorter project schedules, combined with a greater degree of impact to an organization when faulty code is deployed, have generated more demand for repeatable, predictable testing. This surge in the marketplace has manifested itself in a variety of ways. One of the more visible means has been a push by organizations to have a clearly defined blueprint for their testing and quality assurance needs. Testing and QA are starting to make an appearance at the executive levels via enterprise-wide testing strategies, enabled through a clearly defined future-state road map. Road map and strategy creation requires foundational principles; testing visibly is one of them.

In addition to the positioning of testing and QA at the executive levels of organizations, the testing and QA industry has also been responding with its own mechanisms for the future. The Quality Assurance Institute (QAI), the world’s leading institute for providing effective solutions for testing in the information services profession, is finalizing its development of a quantitative rating system. These ratings will allow organizations to quantify their ability to perform testing and quality assurance. This type of measurement of organizational capability is an indicator of the future expectations for testing and quality assurance.

At the onset of a project, the testing team should issue a user manual that details how to engage testing services, including a specific operational workflow breakdown, to maintain the testing organization’s open-door policy. All members of an IT project team should know exactly what the test team is doing and where they stand with respect to schedule, cost and scope. By testing visibly and providing clear insight into testing operations, project teams can more effectively communicate. More often than not, this results in more efficient operations. ý

David Kapfhammer is practice director of quality assurance and testing solutions at Keane, an IT services and outsourcing consultancy.