BEST PRACTICES: Generating Better Requirements

VOLUME 4 • ISSUE 5 • MAY 2007 • $8.95
www.stpmag.com

Dressing Down SOA Layers for Effective End-to-End Testing
Find Java Leaks With a Peek at The Permanent Generation
What Your Mama Never Taught You About Securing The SDLC
Zipping Through Bugs Before They Zap Your App
Contents
VOLUME 4 • ISSUE 5 • MAY 2007

COVER STORY
14  Get Continuous Integration Into Your Head and Your Workflow
    When it comes to battling bugs, the best defense is a fast defense. With CI, you can tackle problems as they arise, while they’re still fresh in your mind—in the time it takes for a coffee break! By Jeffrey Fredrick

20  A Pattern For SOA Testing
    Testing a service-oriented architecture requires a clear eye and a serious plan. By analyzing your components at three levels, you can dress your SOAs for success. By David S. Linthicum

26  Java Leaks Are Not Just For the Young
    Your Java app is showing its age! We conclude the series by taking on leaks in the permanent generation. By Gregg Sporar, A. Sundararajan and Frank Kieviet

35  Have to Go Back To School to Learn To Secure Your SDLC?
    It’s now common knowledge: Security should be built in, early in the life cycle. Here’s how to take the guesswork out of implementation. By Ryan Berg

Departments
7 • Editorial — Calling on the podmeisters for requirements inspiration.
8 • Contributors — Get to know this month’s experts and the best practices they preach.
9 • Feedback — Now it’s your turn to tell us where to go.
10 • Out of the Box — New products for developers and testers.
40 • Best Practices — Attention, testers! It’s time to go back to basic training. By Geoff Koch
42 • Future Test — Achieve strategic benefits with an office of test management. By Mark Sloan
VOLUME 4 • ISSUE 5 • MAY 2007

EDITORIAL
Editor: Edward J. Correia, +1-631-421-4158 x100, ecorreia@bzmedia.com
Editorial Director: Alan Zeichick, +1-650-359-4763, alan@bzmedia.com
Copy Editor: Laurie O’Connell, loconnell@bzmedia.com
Contributing Editor: Geoff Koch, koch.geoff@gmail.com

ART & PRODUCTION
Art Director: LuAnn T. Palazzo, lpalazzo@bzmedia.com
Art/Production Assistant: Erin Broadhurst, ebroadhurst@bzmedia.com

SALES & MARKETING
Publisher: Ted Bahr, +1-631-421-4158 x101, ted@bzmedia.com
Associate Publisher: David Karp, +1-631-421-4158 x102, dkarp@bzmedia.com
List Services: Agnes Vanek, +1-631-421-4158 x111, avanek@bzmedia.com
Advertising Traffic: Phyllis Oakes, +1-631-421-4158 x115, poakes@bzmedia.com
Reprints: Lisa Abelson, +1-516-379-7097, labelson@bzmedia.com
Marketing Manager: Marilyn Daly, +1-631-421-4158 x118, mdaly@bzmedia.com
Accounting: Viena Isaray, +1-631-421-4158 x110, visaray@bzmedia.com

READER SERVICE
Director of Circulation: Agnes Vanek, +1-631-421-4158 x111, avanek@bzmedia.com
Customer Service/Subscriptions: +1-847-763-9692, stpmag@halldata.com

Cover Photo Illustration by The Design Diva, NY

President: Ted Bahr
Executive Vice President: Alan Zeichick

BZ Media LLC, 7 High Street, Suite 407, Huntington, NY 11743
+1-631-421-4158, fax +1-631-421-4130, www.bzmedia.com, info@bzmedia.com

Software Test & Performance (ISSN #1548-3460) is published monthly by BZ Media LLC, 7 High St. Suite 407, Huntington, NY 11743. Periodicals postage paid at Huntington, NY and additional offices. Software Test & Performance is a registered trademark of BZ Media LLC. All contents copyrighted 2007 BZ Media LLC. All rights reserved. The price of a one-year subscription is US $49.95, $69.95 in Canada, $99.95 elsewhere. POSTMASTER: Send changes of address to Software Test & Performance, PO Box 2169, Skokie, IL 60076. Software Test & Performance Subscriber Services may be reached at stpmag@halldata.com or by calling +1-847-763-9692.
Ed Notes

Where Art Thou, Podcasts?
By Edward J. Correia

If you’re a subscriber to my weekly newsletter and don’t always have time to read it, you may soon have the option to listen; the Test & QA Report will one day have an audio counterpart. But I’m not sure exactly when that day will be. It truly amazes me that despite all the technology and talented people to run it, we still can’t predict all the hurdles we’ll stumble across when trying to implement something as simple as a podcast.

I realize that simple is a relative term, and one that I use quite deliberately and for your benefit, dear reader. Because compared with the complexities faced every day by you and your development and testing team, a few words spoken into a microphone surrounded by music seem little more than a glorified greeting on someone’s answering machine.

With that disclaimer out of the way—and props to a Web team that works tirelessly to produce several terrific programs, including mine—I’ll describe some of the trials we encountered during our humble podcasting project.

The project’s requirements were laid out several months ago, when the decision was made by management that we should embark on this journey—along with the blog, one of the hallmarks of our age’s new media. As management communicated the requirements to senior staff, we were all on board and immediately set out to develop a project plan, including some rudimentary processes.

A training company was engaged. There were learning curves as editors learned to be script writers and on-air talent, Web designers learned to be radio producers, and Web coders studied RSS. As the infrastructure was slowly put into place, machines wouldn’t talk to each other; files created by unfamiliar programs were misplaced; cables had to be run in the ceiling. More processes were created, others fine-tuned; requirements were adjusted.

After much effort, schedule shifting and compression, prototype programs were produced. This led to deployment decisions, presentation decisions, decisions about program description and myriad others involving tone, style and content. A feedback loop was created, followed by new prototypes based on that feedback. Requirements and processes were adjusted.

Today, the project is close to launching. We’re now creating landing pages for downloading and streaming the programs. Soon we’ll be deciding where to add links to our newsletters and elsewhere. We’re also thinking about publishing our podcasts through Apple’s iTunes client.

Before I became a podcast subscriber and producer, I had little use for iTunes but to download songs for my kids. But since then, I’ve developed a new respect for the tool, which I can only characterize as a brilliant utility—a simple distribution vehicle for the producer and an acquisition vehicle for the consumer.

Like so many things Apple, its brilliance is in its simple utility. As a synchronization tool, it automatically keeps hard drive content in perfect harmony with one or more iPods (and non-Apple devices with modification). And for podcasts, it synchronizes your hard drive with any you subscribe to, regardless of the publisher. Simply paste the publisher’s podcast URL in the appropriate iTunes dialog, and presto—fresh content automatically downloads every time you launch the program.

If only the “Voice of Test and QA” programs were as easy to create.
Contributors

JEFFREY FREDRICK is a top committer for CruiseControl, an open source Java framework that facilitates a continuous build process with capabilities such as source control and e-mail notification plug-ins. His two-part article on continuous integration begins on page 14 as our lead story. In part one, he introduces his proven techniques for discovering defects early with the practice of continuous testing, and how to work it into your process. Jeffrey is also head of product management for Agitar Software, which develops unit-testing tools.
With the prevalence of service-oriented architectures showing no sign of abating, we’re pleased to present the first of a three-part series on SOA testing by DAVID LINTHICUM, an internationally renowned expert in the areas of enterprise application integration and SOA. In part one, beginning on page 20, David defines the components commonly found in SOAs, and how to test each for reusability, security, abstraction and orchestration. He draws on his experiences in numerous posts at AT&T, Bridgewerx, EDS, Ernst and Young, Mercator Software and Mobil Oil. David is currently CEO of Linthicum Group, a corporate SOA consultancy.
GREGG SPORAR (left) and A. SUNDARARAJAN conclude their fine series on Java memory leak detection with an indepth look at leaks in the permanent generation. For this article, which begins on page 26, they’re joined by their colleague at Sun Microsystems, senior staff engineer FRANK KIEVIET (right). Frank has been working on server back-end systems and infrastructure components for the last 10+ years, most of them using Java. His current duties at Sun include work in the SOA/Business Integration group.
RYAN BERG is a co-founder and chief scientist at Ounce Labs, a maker of security tools. For our Security Zone special section, he describes techniques for implementing security early in the development and testing life cycle, rather than as a bolt-on after or immediately prior to deployment. In the late 1990s, Ryan designed and developed the infrastructure for the managed firewall and security services at GTE Internetworking/Genuity. Prior to founding Ounce, Ryan co-founded Qiave Technologies—a pioneer in kernel-level security—which was acquired by WatchGuard Technologies in October 2000. Security Zone begins on page 35.

TO CONTACT AN AUTHOR, please send e-mail to feedback@bzmedia.com.
Feedback

EARLIER = CHEAPER
Edward J. Correia’s “Eliminate Errors Before They’re Set in Code” (Test & QA Report, Feb. 27, 2007) is based on a very sound development principle well-known in manufacturing: The earlier mistakes are discovered, the less expensive it is to fix them. So obviously, requirements reviews and design reviews are good practices in systems and software engineering for detecting and removing defects before progressing to the next phase of development.
But reviews remain an inspection-based approach for quality control. Inspecting products—whether documents, databases or manufactured parts—means that some work has already been invested in the products. Defects detected in inspections still mean waste. More modern approaches to quality, such as TQM, Poka Yoke and Taguchi, take a quality assurance approach: Review the processes by which products are produced, identify the factors that produce errors, and eliminate those factors to make the results of the processes more tightly predictable. That is an approach of defect prevention rather than defect detection.
One of the most revolutionary improvements in manufacturing has been the advent of three-dimensional modeling that detects conflicting parts so part designs can be changed before they are manufactured. The success of the Boeing 777 program was partly due to tremendous cost savings from Boeing’s application of 3D computer-aided design: Aircraft fuselage, aileron, stabilizer and wing parts produced all over the world came together on assembly floors and fit together within a few thousandths of an inch.
In systems and software engineering, such technology would take the form of formal specifications with capabilities to detect incompleteness and mismatched interfaces. Such technologies exist, such as T-VEC’s RAVE and the techniques employed by Praxis High Integrity Systems. However, such technologies and methods are widely regarded by most of the industry as excessively rigorous and a tedium to be avoided. Forget that blueprint stuff—they know how to build houses. So they wind up fixing mistakes instead.
Phil Boettge
Moorestown, NJ

RECIPROCITY ROCKS!
Excellent article: “No Shortage of Bad Leaders Out There” (T&QA, Mar. 6, 2007). I like the reflection at the end for employees to evaluate ourselves, too, along with the quote “If you always give, you will always have.”
Rich Guard
Indianapolis, IN

SEEKING SYNCHRONIZATION
I enjoyed reading the column “No Shortage of Bad Leaders Out There.” In my opinion, sometimes the employee’s behavior also triggers negative behavior from the manager. To avoid such situations, effort is needed from both sides. The employee should look into himself and analyze which of his actions could lead the manager to behave badly. It can really be worth spending time to analyze yourself—you may find a few small things that can be corrected or improved with insignificant effort.
On the manager’s part, this situation is a real management challenge. If the manager finds some actions of an employee that are not in line with the organization’s needs, then he or she must be capable enough to communicate it effectively to the employee. In the end, a good synchronization of thoughts is necessary for sustainable development and growth of the organization.
Piyush Jain
Noida, India

DIY JOB DEFINITION
Here’s one for you. I work in an IT shop, and over the past five years, it’s naturally evolved as technology progresses. I spoke with my “Leader” and asked her to define my duties in this shop so that I may know what I am responsible for. Her response was that I need to figure that out for myself. I stared at her and asked again, explaining that with all the additions and removals of certain applications and functions, my co-workers and I seem to be stepping on each others’ toes, sometimes performing the same actions or maintenance twice on systems recently added. She again looked at me and said that I need to figure that out for myself. Frustration has set in.
Name withheld

HUMILIATED & DEMORALIZED
I personally feel that being capable of getting work done from an employee is not enough. It requires mentoring from time to time. I was in the same situation described in “Team Leaders Behaving Badly”: humiliated in front of my coworkers, my manager blasting away with whatever would come out of his mouth. This led to my emotional and professional demoralization. Finally, I got an opportunity I was waiting for, and jumped from the company to a new organization. I have found mentors and not just managers. It gives me great pride to work in such an organization. It’s not just that you keep your employees—it’s that you treat them well for what they are. And, last but not least, if I were a manager, I would [strive to] be a guide and support system for my team. Great article!
Name withheld

FEEDBACK: Letters should include the writer’s name, city, state, company affiliation, e-mail address and daytime phone number. Send your thoughts to feedback@bzmedia.com. Letters become the property of BZ Media and may be edited for space and style.
Out of the Box
Framework Takes Total Approach to Multi-Core

The 21st century clearly is shaping up to be the age of processor cores. Showing a grasp of this fact is TotalView Technologies, which in early April unveiled the TotalView Multi-Core Debugging Framework, part of a suite it says is designed to “simplify the complexities of multi-core debugging.”

The company (formerly known as Etnus) identifies five areas it calls essential to creating and debugging the multi-threaded, multi-process applications needed for emerging platforms. They are source code, memory, performance, data-centricity and active Web. Later this year, the company will introduce products for the latter three. For the first two, the company has enhanced its TotalView Debugger and introduced MemoryScape, a new interactive memory profiling tool that it claims can identify and help resolve problems on running applications.

MemoryScape works with the TotalView Debugger to find memory leaks, heap allocation overwrites and other such problems that can only be detected during execution. Now at version 8.1, TotalView Debugger gives Linux, Mac OS X and Unix developers and testers a “single view of the complete application, parallel process acquisition and control; advanced breakpoints and watchpoints,” and offers the ability to test fixes to code in real time, according to a company document.

[Photo caption: TotalView Multi-Core Debugging Framework now includes MemoryScape, an interactive memory profiling tool that operates on running applications.]
The latest version enhances breakpoint setting capabilities, letting testers set C++ breakpoints on all methods of a class or all functions that have the same name but different arguments. Also new is the ability to show class static variable data and to set rules for source code searches. Licensing restrictions also have been relaxed, making the tool more accessible to smaller teams and individuals, the company said, though pricing was not disclosed.
Devices Become Central to Adobe’s CS3

If you’ve ever developed content for mobile devices, you know that constrained resources, deployment snafus and maintenance are just the first few hurdles. “The main challenge for content publishers is fragmentation,” said Bill Perry, manager of global developer programs at Adobe. Nokia, for example, offers 28 different models with the Flash player installed, each potentially requiring its own port, he said. “Unlike a desktop or Web page, the mobile space has different runtime engines for Java, Symbian, etc. Some devices have eight APIs; others have seven. Developers spend about 60 percent of their time testing and porting.”

Adobe began to address the problem
in March with the release of Device Central, a component of its Creative Suite 3. The tool allows developers and designers to build application mockups that adhere to device specs stored within the tool. “Flash content providers said it has helped,” said Perry of the tool, which was released for beta testing last December. “They are able to create apps in about a third the time. The only modification is for screen size.” The tool also will be included with new versions of Flash, Photoshop and Premier stand-alone products. “There are thousands of devices out there; one carrier might have 40 to 50 devices,” said Perry. “So if you’re creating mobile content, being able to physically
acquire those devices for testing and tweaking takes time.” Device Central will solve that chaos through device profiles and emulators. “As a user,” Perry explained, “I can look up a profile and see video codecs, graphics supported, screen resolution, languages, APIs, HTML support, browser” and countless other device-specific specifications. Device Central will permit profiles to be grouped by screen size, orientation or other physical characteristics, and will help facilitate reuse of elements such as bitmaps, and to optimize them to reduce file size. “Larger file size means more waiting for downloads and higher transfer costs of applications sent over the air,” Perry said.
GUI Testing Now Automatic

Instantiations in March released WindowTester Pro 2.0, a tool based on its RCP Developer that quickens GUI testing by recording user interactions for later playback and editing. What’s unique about WindowTester Pro, according to the company, is its ability to store use cases in Java, offering greater flexibility than competitive tools using XML or proprietary scripts.

“Developers resist GUI testing because it’s complex, time-consuming and diffuses their focus on product development,” said Instantiations CTO Dan Rubel. “WindowTester reduces both the testing complexity and time needed,” he added, by automating the recording, test generation, code coverage and playback of GUI interactions without a high degree of hand coding and maintenance. Code output is based on the JUnit framework.

Pricing starts at US$319 per developer per year. WindowTester Pro 2.0 is available now at www.instantiations.com/windowtester.
Bug Detective Takes Up Residence

There’s a Bug Detective in C++test 7.0, Parasoft’s latest code analysis, review and automated unit testing suite for Linux, Unix and Windows. According to the company, the new sleuth finds runtime defects by tracing and simulating execution paths that would otherwise elude authorities and manual tests or inspections.

Also new is a code review module, which the company said “automates preparation, notification and tracking of peer code reviews,” doing the job that many team members won’t. By using the tool, “teams can establish a bulletproof review process where all new code gets reviewed and all identified issues get resolved,” according to a company news release.

C++test 7 also now integrates with Visual Studio 2003 and 2005, Eclipse 3.1 and 3.2, and Workbench versions 2.5 and higher (Wind River’s Eclipse-based device development environment for Linux and Windows).

The company also released Insure++
7.1, which it says “now verifies proper use of the Standard Template Library (STL) during runtime” and dynamically verifies containers, iterators, pointers and references invoked by the STL, which is among the libraries in the C++ Standard Library.
They’ll Be Calling You Build Meister

Further expanding the law-enforcement metaphor, OpenMake Software on May 1 was set to begin shipping Meister 7.0, a version of its build management tool that it says lets testers “expose the build forensics needed to link production binaries” back to their original DNA, or source code.

According to OpenMake, Meister 7.0 links with a central knowledge base that contains build-to-release information, connects developers with production results, and gives test teams better traceability of failed builds.

OpenMake CTO Mike Taylor in a statement claimed that Meister enhances agile and continuous development methodologies. “By minimizing redundant scripting tasks and supporting a self-documented build-to-release process that is community developed… agile developers will find that Meister’s Build Methods will enable them to develop builds that are as adaptable as their development processes,” he said, referring to Meister’s set of extensible build best practices.

Pricing starts at $875 per named seat.
NullSafe Code Scan Tool Bears Attention

Are software errors becoming a bear? Is your task list too much to bear? Do you code all night until your knuckles are bare? Is this humor becoming unbearable? If you answered yes to any of these questions, then news from Smart Bear Software might lessen your despair.

The company, a maker of peer code review and other tester tools, has released NullSafe, an automated defect-detection tool that bears attention. According to company claims, the tool scans Java code and detects null pointer exception errors, which it says account for as much as 10 percent of all programming errors.
“This type of error can be challenging to track down,” said Smart Bear founder and CEO Jason Cohen. “As programmers ourselves, we are excited to have this tool available. NullSafe will save folks a lot of time and will help them produce better code overall.”

What differentiates NullSafe from competitive static code analyzers, said Cohen, is its accuracy. “Other tools deliver so many false positives that they are unusable. NullSafe’s analysis algorithm ensures fast and accurate location of errors,” Cohen claimed. The tool also permits strictness configuration and control of enforcement rules. Target source code is not required.

Pricing starts at $99.95 per seat; barely a blip on most budgets.
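To make the class of bug concrete: the null-dereference pattern such analyzers flag often hides behind a method that can legitimately return null. The following is a contrived sketch for illustration, not NullSafe output:

```java
import java.util.HashMap;
import java.util.Map;

// The kind of latent NullPointerException a static analyzer aims to flag:
// get() may return null, and the null flows into a dereference far from
// its origin. A contrived example; all names here are made up.
public class NpeExample {
    private static final Map SETTINGS = new HashMap();

    static String timeoutSetting() {
        return (String) SETTINGS.get("timeout"); // may be null
    }

    public static void main(String[] args) {
        String timeout = timeoutSetting();
        // No null check before the dereference: this line throws
        // NullPointerException whenever the setting is absent.
        System.out.println(timeout.trim());
    }
}
```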
Eclipse, Vista Smooth as Silk

Borland Software has brought manual testing capabilities to its Silk line of automated testing tools, and integrated the tools with Eclipse and Vista.

SilkCentral Test Manager brings a centralized test results console to Eclipse and helps to simplify manual testing from within the environment. According to a company news release, a new “test-to-code” impact analysis feature allows testers to “visualize the direct relationships between lines of code and software tests,” to set test priorities, confirm test coverage and control maintenance. Test Manager’s manual testing capabilities are implemented via an Eclipse-based app, which “guides testers through the manual testing process,” capturing test results along the way and sending them to the console for further processing, analysis and management.

SilkPerformer, Borland’s JMX-based Java server monitoring tool, also now works with Eclipse, and enables developers and testers to test and manage performance of their Java apps from within the environment. Users of Borland’s SilkTest automated functional and regression testing tool can now use it to test their UIs on Vista, Windows XP 64, IE 7 and .NET. All are available now.

Send product announcements to stpnews@bzmedia.com
Put Continuous Integration Into Your Head
Zip Up All of Those Bugs Early, While They’re Still at The Top of Your Mind

By Jeffrey Fredrick

“How long will it take you to find the bug?” As a conference speaker, I’ve used that question for the past few years to challenge audience members to change the way they think about their development process. You can try the home version of this test now. If just before you started this article, you’d gone to your desk and—your mind wandering as you anticipated reading the ideas presented here—you accidentally checked in a bug… when is the first time someone is likely to discover the problem?

I’ve asked this question of audiences all over the world. Their answers have varied based on the particulars of the project and where they happened to be in the development cycle. But the most common answer is that any bug committed today is likely to go unnoticed until the project enters system test after a period of weeks. This answer is common, but it indicates a serious problem that haunts many development teams.

The problem is that the longer a bug lies latent in the system, the greater the cost to fix it—and the cost isn’t just limited to the engineering effort, but rather is distributed across the team. Each bug found by a quality assurance engineer means time spent investigating and reporting the bug, time taken away from executing against the test plan. Once the bug is in the system, a “bug dance” ensues as ownership of the bug bounces from person to person, each spending time looking into the cause until they realize that the bug isn’t in their code. Finally, the bug reaches the ultimate owner, who, before he can even start to fix the bug, must load all the context of the code into his head, and at some point will wonder, “What was I thinking when I wrote this?”

I’ve experienced this strange and disturbing phenomenon firsthand. Code that seemed clear and straightforward is mystically transformed into a cipher in only a few short weeks. What was I thinking when I wrote that? I now have no idea.

These costs at the level of individuals mount up with the scale of the project. Test plans take longer to execute than scheduled. Developers who were supposed to start working against a new milestone are pulled into bug repair. And to deal with all the bug fixes, we face an infinite series of rerunning test plans, only to find new bugs requiring new fixes requiring a new round of testing, and so on. It’s no wonder that project managers tell me that system test is the first time they start to understand where they really are in the project; where they have a real understanding of how much work remains.

But it doesn’t have to be like this.
What Is Continuous Integration?

About six years ago, I first learned about continuous integration, a practice that comes out of the world of agile software development—Extreme Programming in particular. The idea was born from the observation that the traditional integration phase was a point of extreme pain for most projects. Those of us who have lived through such a project understand the phrase integration hell. The source of your misery is the fact that the integration phase might be the first time the entire team brings their code together all at once. For months, any given developer might be evolving a view of the system in a way that was incompatible with his teammates’. Ironing out these differences might take days before the system could even compile, and weeks before the system was ready to move into system test.

The XP people proposed an opposite model. What if, instead of integrating our changes only at the end of the project, we did it every day or even several times a day, so that we know that we’re all working in the same code base and we never allow our views of the system to diverge? The answer is a seeming paradox: By integrating all the time, you catch conflicts while they’re small and spend much less time overall on integration issues.

Robert Martin, agile evangelist and founder of development consultancy Object Mentor, has a nice morality tale about keeping dishes clean. What’s the fastest way to be done with the dishes? If we think only of today, the answer is to just leave them on the counter unwashed. Eventually, though, we run out of dishes and need to address that mess we’ve allowed to accumulate. And not only is the pile itself daunting, but age has made the job harder and the dried goop makes us weep for our folly. In the same way, allowing code problems to accumulate makes our problems harder. If we want to go faster today, we leave merge problems till later, but if we want to go faster overall, we should fix the integration problems as we go along.

But changing the behavior of developers to merge and commit regularly is only the first half of the story. Any discussion of continuous integration is going to involve the word test, because CI aims to ensure not only that the code always compiles, but that it always works at least as well today as it did yesterday. Applied this way, CI acts as a ratchet, ensuring that progress is not lost—and it’s the tests that give the ratchet its teeth. In CI’s original formulation, running a complete set of unit tests is an integral part of the process. The pair of developers committing their code would be responsible for running all the tests on the merged code base and reverting their commit if any of the tests failed. But as continuous integration has passed beyond XP and agile teams enter the mainstream, the patterns of use have changed to match the local conditions.

Build Automation

Most mainstream teams tend to have fewer pure unit tests that run quickly. What automated tests that do exist tend to be system tests, which are much slower to run. As a result, instead of running the tests as part of the commit task, they’re often pushed off to an automated CI tool such as CruiseControl.

CI tools typically monitor the source control system for changes and once detected, start a build. But the build on a CI system is likely to be different than the production build done for building the product, because the goals are different. The mission of the production build is to produce software. This is the software that will be tested by QA and eventually used by end users. However, the CI build is designed to produce feedback—to let developers know as quickly as possible if something has gone wrong. Because of this urgency, a number of steps that are part of a production build—generating documentation,
obfuscation or creating installers—are usually omitted from the CI build. This process of removing elements from the build highlights the purpose of the CI tool, which is to provide quick feedback. How quick? About the time it takes to get a cup of coffee.

The “Cup of Coffee” Test

When a developer fixes a bug, most of the effort comes from trying to reload the context, to remember what the code was supposed to do, and to relearn how all the pieces fit together. To avoid that cost, it’s best to learn about it while I’ve still got the information in my head. This is where the “cup of coffee” test comes in. After I commit a change, I’m likely to need a bit of a break. Maybe I’ll surf the Web a bit, get up for a walk, or maybe grab a cup of coffee. If I find a message waiting for me when I get back to my desk that my change just broke something, I’m mentally still on the same task. So a CI system that can give me feedback in less time than it takes for my coffee break saves the costs of context switching—not to mention the other costs of detecting bugs late in the game.

Not all problems can be detected that quickly. Compile failures might be quick to detect, but some system tests take hours to run. Before beginning down the CI path, take stock of your current situation and decide how to get the most out of your journey. Many factors play into a successful CI implementation, and you’ll have to consider them carefully in the several decisions you’ll be making.

Interactions in Source Control

First off, you are using source control, right? There are so many options, including free and low-cost hosted solutions, that there’s really no excuse for any development team—or even a single developer—to not use a version control system.

Once source control is in place, determine how often you want your developers to commit their changes to the system. Some teams set rules, such as everyone committing every hour, or all code must be checked in by 3:00 pm, leaving time to resolve problems before leaving for the day. I favor the guideline that at least each completed task should be committed, but I expect that most tasks should take no more than a couple of days to complete. In practice, I tend to commit work several times a day, and even as frequently as multiple times per hour when working on a series of very short tasks.

In the bad old days, developers would frequently hold their changes on their machine, putting weeks of effort at the mercy of the health of the hard drive. Today, this practice manifests as private branches. Neither practice encourages the team to integrate their changes.

TABLE 1: FREE AND CONTINUOUS

Product/project     Implementation language    Execution                       Web site
Anthill             Java                       JVM                             anthillpro.com
BuildBot            Python/Twisted             Linux/Unix                      sourceforge.net/projects/buildbot
CABIE               Perl                       Win service/Linux-Unix daemon   cabie.tigris.org
Continuum           Java/Maven/Ant             JVM                             maven.apache.org/continuum
CruiseControl       Java                       JVM                             cruisecontrol.sourceforge.net
CruiseControl.NET   .NET                       .NET, Mono                      confluence.public.thoughtworks.org
CruiseControl.rb    Ruby                       Ruby                            cruisecontrolrb.thoughtworks.com
Draco.NET           C#                         Win Service                     draconet.sourceforge.net
Drumbeat CI**       C#                         .NET, Mono                      timpanisoftware.com
Gump                Java                       JVM                             gump.apache.org
Tinderbox           Perl                       Perl                            mozilla.org/tinderbox.html
TeamCity*           Java                       JVM                             jetbrains.com/teamcity

*Free for open source projects  **Free two-user edition available
Sources: http://en.wikipedia.org/wiki/Continuous_integration; damagecontrol.codehaus.org/Continuous+Integration+Server+Feature+Matrix
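Whichever server you pick, it can only ratchet quality if it has tests to feed on. As a rough illustration (mine, not the author's), here is the style of small, dependency-free JUnit test a quick build runs by the hundreds; the Money class is inlined purely to keep the sketch self-contained:

```java
import junit.framework.TestCase;

// The kind of fast, isolated unit test a "quick" CI build thrives on:
// no database, no network, milliseconds to execute. Money is an inline
// stand-in for real production code.
public class MoneyTest extends TestCase {

    static class Money {
        private final long cents;
        Money(long cents) { this.cents = cents; }
        Money add(Money other) { return new Money(cents + other.cents); }
        long cents() { return cents; }
    }

    public void testAddIsCommutative() {
        Money a = new Money(250);
        Money b = new Money(75);
        assertEquals(a.add(b).cents(), b.add(a).cents());
    }

    public void testAddingZeroChangesNothing() {
        assertEquals(100, new Money(100).add(new Money(0)).cents());
    }
}
```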
Know What to Build

Deciding how to deal with failing builds is related to the number of builds required and when they should be run. When adopting a CI mindset, it’s useful to have a number of different feedback builds, each providing feedback on different information. I use the term feedback deliberately. You can use the results of a CI build to answer any question you like about the state of the code base, and then use that information to guide what happens next.

The most common question people look to answer is “Does the code base compile?” If not, that’s probably a situation that you’ll want to address immediately. After that, teams generally look to run automated unit tests, system tests, or both. And since system tests typically take more time to execute, they also serve as a good reason to have multiple builds.

The first time I used CI, our team had a set of unit tests that took about five minutes to execute, and a suite of system tests that took about an hour. If we had run all of these tests as part of a single build, there would be only about eight builds a day, and the delay from the time we check in until the time we received feedback would vary from at least one hour to more than two. Compared to waiting a few weeks for feedback, a couple of hours isn’t bad, but we could do better—two hours is a long time to wait to learn the code isn’t compiling. So we got a second machine and divided the work into two builds:
the quick build that was to give us feedback in less than 10 minutes (thus passing the “cup of coffee” test), and the longer BVT (Build Verification Test) build. By simply adding another machine running a different CI build, we increased our opportunities for feedback almost tenfold.

After establishing these two build cycles, I soon noticed two unexpected effects. The first was on the developers, and it was something I noticed at lunchtime: You might ask someone if they wanted to go to lunch and they’d say they wanted to wait until they got the build e-mail. I saw this as powerful proof of Alistair Cockburn’s observation that “people want to be good citizens.” The developers wanted the project to succeed, and when we gave them this opportunity of rapid feedback, they grasped it as a tool to take responsibility for the code, to be sure they were leaving it in a good state. The rapid feedback was empowering, in a way that the same feedback delivered later would not have been.

The second unexpected effect I noticed was on our nightly production builds. Prior to our adopting CI, these nightly builds were a hit-or-miss affair. A certain percentage would simply fail to compile, and another goodly number would fail to function in the most trivial way. Either way, these problems in reliably delivering a production build had a significant impact on QA and their ability to keep up with testing the new features. When we started running our CI builds, suddenly QA started getting good builds, reliably, every single night. In retrospect, I should have expected this, but the change was sudden and complete, and it led me to understand the value of another element of the feedback cycles: The fast cycles help ensure that the longer build cycles work. Our quick builds would catch all the compile problems, so the BVTs and nightly production builds never encountered them. Likewise, for the BVTs to pass, the product had to be functioning, so there were no more “brain-dead” builds delivered to QA.

Spreading builds across multiple systems is one way to shorten the time to feedback, but it isn’t the only way to throw hardware at the problem. One team I know had been successful in adding more and more system tests into the continuous integration build.
Unfortunately, the time to execute the tests also increased, and the build time eventually exceeded 24 hours. Here, the feedback cycle was so long that each build could contain changes from virtually the entire team, and it wasn’t at all obvious which changes were the cause of test failures. This breakdown in the connection between cause and effect meant that test failures were no longer being addressed promptly, and the entire team became desensitized to failing tests. To continue getting value out of the tests, something had to be done. The majority of the tests were database- and memory-intensive, and there seemed no easy way to rewrite them all to be faster. After an investment of $50,000 in hardware, the team reduced its build time to about two hours. While this may seem an extreme measure, the value provided by a shorter feedback cycle and earlier bug detection paid back in spades.
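Throwing hardware at a slow suite presupposes a way to split it. As one hedged illustration (my sketch, not the team's actual solution), test classes can be partitioned deterministically across build machines by hashing class names, so every agent computes the same slices:

```java
import java.util.ArrayList;
import java.util.List;

// A deterministic way to split one big suite across N build machines:
// hash each test class name to pick an owner. Every agent computes the
// same partition, so together they cover the whole suite exactly once.
// A sketch of the idea only; wiring it into a real build is up to you.
public class SuitePartitioner {

    static List shardFor(List allTestClassNames, int agentIndex, int agentCount) {
        List mine = new ArrayList();
        for (int i = 0; i < allTestClassNames.size(); i++) {
            String name = (String) allTestClassNames.get(i);
            // Mask off the sign bit: hashCode() can be negative.
            int bucket = (name.hashCode() & 0x7fffffff) % agentCount;
            if (bucket == agentIndex) {
                mine.add(name);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List all = new ArrayList();
        all.add("com.example.FooTest");
        all.add("com.example.BarTest");
        all.add("com.example.BazTest");
        System.out.println("Agent 0 runs: " + shardFor(all, 0, 2));
        System.out.println("Agent 1 runs: " + shardFor(all, 1, 2));
    }
}
```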
Know When to Build

All of these feedback cycles are worthless if nobody’s paying attention. And with the ability to start running multiple CI builds comes the possibility of overload. It’s important therefore to consider not only what questions you
want your CI build to answer, but when you’re likely to care about those answers. For example, consider a report showing the code coverage achieved by a set of unit or system tests. A code coverage report is a great tool for identifying areas that require more testing, but how frequently can your team act on this data? Certainly not every five minutes.

The data from a quick build should always be actionable. If the test or compile failed, you want to fix it now. But the code coverage report is less likely to change significantly from commit to commit, so getting the slightly updated version several times an hour doesn’t help—it only adds to the background noise. A code coverage report is a good candidate to be moved to a longer feedback cycle. Other good candidates include static analysis results that could be generated by checkstyle (style violations), findbugs (likely code errors) or simian (code duplication). By moving these sorts of reports to a nightly or even weekly cycle, you can pair them with a process such as a scheduled review so that any problems identified will be acted upon.
Creating the Tests

When laying out a CI strategy, consider
which tests will be created and where they’ll come from. Perhaps your team will be using Test-Driven Development (TDD), so that all new code will be created with a set of unit tests. But what about code that was created without tests? One option is to create tests for existing code only as that code is changed. Another option is to use a tool that can generate a set of tests for existing code, so that any changes in behavior can be detected.

While much of CI’s focus is on developers and the tests they create, CI is an excellent tool for QA teams as well. QA engineers shouldn’t be shy about getting their automated tests running under a CI build, even if it requires a bit of ingenuity to make it work. In the BVT build I mentioned earlier, one part of the build had WinRunner invoked from Ant. Plain text results from WinRunner were then parsed into XML so they could be included in the results of the CI system.

Somewhat surprisingly, some of the QA teams I’ve spoken with have actually been inhibited from getting their tests running under a CI build precisely because they have so many tests they
can run. I’ve had people protest that their test suites aren’t appropriate to run as a CI build because they take 24, 36, even 48 hours or more. But this shouldn’t be a problem. It’s better to get test results in two days than two weeks—not only because of the possibility of catching bugs earlier, but to ensure that the tests are up-to-date with all the latest changes coming out of development.
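The WinRunner-to-XML trick mentioned above generalizes well. As an illustrative sketch (the pipe-delimited input format is invented for the example, and real code would escape XML metacharacters), a small converter can turn plain-text results into the JUnit-style XML most CI dashboards already know how to display:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;

// Converts plain-text results (hypothetical "PASS|testName" and
// "FAIL|testName|message" lines) into JUnit-style XML that a CI server
// such as CruiseControl can merge into its build report. Illustrative
// sketch only, not an actual WinRunner integration.
public class ResultsToXml {

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        PrintWriter out = new PrintWriter(new FileWriter(args[1]));

        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<testsuite name=\"bvt\">");
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split("\\|");
            if (parts.length < 2) continue; // skip malformed lines
            out.println("  <testcase name=\"" + parts[1] + "\">");
            if ("FAIL".equals(parts[0])) {
                String msg = parts.length > 2 ? parts[2] : "unknown";
                out.println("    <failure message=\"" + msg + "\"/>");
            }
            out.println("  </testcase>");
        }
        out.println("</testsuite>");
        out.close();
        in.close();
    }
}
```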
Plan for Gaps

Another factor that plays a role in CI planning is geography. With CI, as with virtually every other development practice, the needs of a small team in an open floor plan are often quite different than those of a geographically dispersed team spanning several time zones. In the smaller, more cohesive team, a build failure (and, critically, what’s done about it) is easy to communicate in an ad hoc fashion. As the distance increases, a more formal process needs to be established to account for the gaps in work schedules. Some say a gap of more than 30 feet is enough to disrupt such ad hoc interaction. Any team that isn’t able to work
face-to-face to resolve build issues needs clear direction on what should happen any time a build fails so that problems aren’t allowed to fester. I’ve seen teams that don’t communicate these expectations find their builds staying broken longer and longer. Eventually, such CI efforts fail because the broken build draws no reaction.
Realizing CI’s Benefits

Continuous integration attempts to remove the standard delay between when a bug is created and when that bug is detected. Removing that delay reduces the effort required for individuals to fix their bugs. And, by skipping all the costs associated with the “bug dance,” this provides enormous downstream benefits for the organization.

To realize benefits of this magnitude you need to do more than just install a CI tool and call it done. You must plan wisely and make clear decisions about how the team should behave. You’ve got to create tests and be diligent about responding to what the test failures have to say. If you’re willing to put in this effort, you’ll be able to find most bugs in the time it takes to finish a cup of coffee.

Jeffrey Fredrick is a top committer for the CruiseControl project and head of product management at Agitar Software.
Get A Clear Picture Of Testing SOA Architecture
To Sharpen Things Up, First Break It All Down, Then Apply Your Favorite Test Pattern

By David S. Linthicum

Does testing change with SOA? You bet it does. Unless you’re willing to act now, you may find yourself behind the curve as SOA becomes systemic to all that is enterprise architecture, and we add more complexity to get to an agile and reusable state. If you’re willing to face the SOA challenge, the return on your SOA investment will come back threefold—that is, if it’s a well-tested SOA. An untested SOA could cost you millions.

Testing SOAs is a complex computing problem. You need to learn how to isolate, check and integrate, ensuring that things work at the service, persistence and process layers. The foundation of SOA testing is to select the right tool for the job, have a well-conceived plan, and spare no expense in testing cycles—or else your SOA may lay an egg.

Organizations are beginning to roll out their first instances of SOA, typically as smaller projects. While many work well, some don’t live up to expectations due to quality issues that could have been prevented with adequate testing. If you’re diving into SOA, you need to take these lessons, hard learned by others, and make sure that testing is on your priority list.

How Do You Test Architecture?

The answer? You don’t. Instead, you learn how to break down the architecture to its component parts, working from the most primitive to the most sophisticated, testing each component,
then the integration of the holistic architecture. In other words, you have to divide the architecture into domains, such as services, security, governance, etc., and test each domain using whatever approach and tools are indicated. If this sounds complex, it is. Indeed, the notion of SOA is loosely coupled complex interdependence, and your SOA testing approach must follow the same patterns.

Before you can properly approach SOA testing, it’s best to first understand the concept of SOA and its component parts. There are many other references about the notion of SOA, but for our purposes, it’s best defined thus: SOA is a strategic framework of technology that allows all interested systems, inside and outside of an organization, to expose and access well-defined services, and information bound to those services, that may be furthermore abstracted to orchestration layers and composite applications for solution development.
The primary benefits of an SOA, and thus the objectives of a test plan, include:

• Reuse of services, or the ability to leverage application behavior from application to application without a significant amount of re-coding or integration.
• Agility, or the ability to change business processes on top of existing services and information flows quickly and as needed to support a changing business.
• Monitoring, or the ability to monitor points of information and points of service, in real time, to determine the well-being of an enterprise or trading community—and the ability to change or adjust processes for the benefit of the organization in real time.
• Extend reach, or the ability to expose certain enterprise processes to other external entities for the purpose of inter-enterprise collaboration or shared processes.

What is unique about an SOA is that it’s as much a strategy as a set of technologies, and it’s more of a journey than a destination. Moreover, it’s a notion that is not dependent on any one technology or standard, such as Web services, but really requires many different types of technologies and standards for a complete SOA. All of these must be tested.

Figure 1 represents a model of the SOA components and how they’re interrelated. What’s key here is that those creating the test plan have a macro understanding of how all of the components work together, as well as how each component exists unto itself and the best approach to testing that component.

You can group the testing domains for SOA into these major categories:

• Service-level testing
• Security-level testing
• Orchestration-level testing
• Governance-level testing
• Integration-level testing

I’ll focus on service-level testing, since it’s more critical to SOA. In addition, the categories or domains that you choose to test within your architecture may differ due to the specific requirements for your project. And other areas need attention as well, including quality assurance for the code, performance testing and auditing.
Service-level Testing

Within the world of SOA, services are the building blocks, found at the lowest level of the stack. Services become the base of an SOA, and while some are abstractions of existing “legacy services,” others are new and built for specific purposes. Moving up the stack, we then find composite services, or services made up of other services, and all services abstracted into the business process or orchestration layer, which provides the agile nature of an SOA, since you can create and change solutions using a configuration metaphor.
[FIG. 1: INDEPENDENT CELLS — a layered model of SOA components. Monitoring/Event Management, Security and Governance span the stack; a Process/Orchestration layer sits above Services, Data Services/Messaging and Data Abstraction, which in turn front data, legacy systems, Internet-based services and new services.]
Also, it’s noteworthy that, while most of the services tested within SOAs will be Web service–based, it’s still acceptable to build SOAs using services that leverage other enabling technologies, such as CORBA, J2EE and even proprietary approaches.

When testing services, you need to keep the following in mind:

Services are not complete applications or systems, and must be tested as such. They’re only a small part of an application; nor are they subsystems—they’re small parts of subsystems as well. Thus, you need to test them with a high degree of independence, so that they can function by themselves, as well as part of a cohesive system. Indeed, services are more analogous to traditional application functions in terms of design and the way they’re leveraged to form solutions, fine- or coarse-grained. The best approach to testing services is to list the use cases for those services. At that point, you can design testing approaches for that service, including testing harnesses or the use of SOA testing tools (discussed later). You also need to consider any services that the service may employ, so the whole can be tested holistically as a single logical service. In some cases, you may be testing a service that calls a service, where some of the services are developed and managed in-house and some exist on remote systems that you don’t control. All use cases and configurations must be considered.

Services should be tested with a high degree of autonomy. They should execute without dependencies, if at all possible, and be tested as independent units of code using a single design pattern that fits within other systems that use many design patterns. While all services can’t be all things to all containers, it’s important to spend time understanding their foreseeable use and ensure that those uses are built into the test cases.

Services should have the appropriate granularity. Don’t focus on too-fine-grained or too-coarse-grained.
Focus on the correct granularity for the purpose and use within the SOA. Here, the issues related to testing are mostly about performance. Too-fine-grained services have a tendency to bog down due to the communications overhead required when dealing with so many services; too-coarse-grained services don’t provide the proper autonomic values to support their reuse. You need to work with the service designer on this one.

So, what do you test for within services? First, it’s important to follow a few basic principles.

First and foremost, services should be tested for reuse (reusability). Services become a part of any number of other applications, and thus must be tested so they properly provide behavior and information without being application- or technology-specific. This is a difficult paradigm for many developers, since custom one-off software that digs deeply into native features is what they’ve been doing for most of their careers. Thus, the patterns must be applicable to more than a single problem domain, application or standard—you must have use for your reusable service, and it must be in good working order. To test for reusability, you must create a list of candidate uses for the service; for instance, a shipping service that plugs into accounting, inventory and sales systems (see Figure 2). Then the service should be consumed by the client, either through a real application (in a testing domain) or a simulator, and the results noted.

In addition, the service should be tested for heterogeneity. Web services should be built and tested so that there are no calls to native interfaces or platforms. This is due to the fact that a service, say, one built on Linux, may be leveraged by applications on Windows, Macs and even mainframes. Those that leverage your service should do so without regard for how it was created, and the service should be completely platform-independent.
to testing this is rather obvious: Simply consume the service on several different platforms and note any calls to the native subsystems.
You should also test for abstraction. Abstraction allows access to services from multiple, simultaneous consumers, hiding technology details from the service developer. The use of abstraction is required to get around the many protocols, data access layers and even security mechanisms that may be in place, hiding these very different technologies behind a single layer of abstraction. Abstraction is tested empirically: by implementing instances and then testing the results. Regression and integration testing is the best approach, from the highest to the lowest layers of abstraction.
When we build or design services, we also need to test for aggregation. Many services will become parts of other services, and thus of composite services leveraged by an application, and you must consider that in their design. For instance, a customer validation service may be part of a customer processing service, which is part of the inventory control system. Aggregations are clusters of services bound together to create a solution, and should be tested holistically through integration testing procedures.
Service testing means different things to different organizations, due to the fact that SOA is so new. Most who are testing services attempt to figure it out as they go along, typically rolling their own tools for testing, including service-consumption test harnesses and service-producer test harnesses, for the particular use cases they're testing. Others are learning to leverage off-the-shelf testing tools such as Parasoft SOAtest or Mindreef SOAPscope.
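As a rough illustration of what a roll-your-own consumption harness can look like, the sketch below drives a single use case against a hypothetical SOAP shipping service over plain HTTP. The endpoint, payload and expected <quote> element are assumptions for illustration only, not part of any product mentioned above; in practice, each use case on your list would get a harness entry like this one.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ShippingServiceHarness {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://test-host/services/shipping"); // assumed test endpoint
        String soapRequest =
            "<?xml version=\"1.0\"?>" +
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">" +
            "<soap:Body><getQuote><weightKg>12</weightKg></getQuote></soap:Body>" +
            "</soap:Envelope>";

        // Drive one use case: POST the request and read the response.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        conn.setDoOutput(true);
        conn.getOutputStream().write(soapRequest.getBytes("UTF-8"));

        if (conn.getResponseCode() != 200) {
            throw new AssertionError("Service returned HTTP " + conn.getResponseCode());
        }
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        StringBuilder response = new StringBuilder();
        for (String line; (line = in.readLine()) != null;) {
            response.append(line);
        }
        // Fail loudly if the service's contract breaks.
        if (response.indexOf("<quote>") < 0) {
            throw new AssertionError("Expected <quote> element missing from response");
        }
        System.out.println("Use case passed: " + response);
    }
}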
Security-level Testing
Security strategy, technology and implementation should be systemic to a SOA, and may even bring along new concepts such as identity management. When testing your SOA for security issues, you must first understand the security requirements and then design a test plan around those requirements, pointing at specific vulnerabilities. Most testers find that black-box testing is the best way to test for security issues in the world of SOA, including penetration testing, vulnerability testing and so on, using existing techniques and tools.
A further concern of SOA security is the fact that many SOAs allow services to be consumed outside the enterprise, and thus create a new set of vulnerabilities, including information security issues and denial-of-service attacks. Moreover, many SOAs also make the reverse trip, allowing for the consumption of services from outside the firewall into the SOA. This opens the door for other types of attacks, including malicious services, and security needs to be tested in this case as well.
Again, most who test their SOA for security issues have a tendency to roll their own approaches and build their own tools. However, a few new tools are appearing on the market, such as Vordel SOAPbox. SOAPbox is for testing the security of XML applications, such as XML Web services. It's used during development and deployment phases to test an XML application's compliance with security standards. SOAPbox highlights security tokens, signatures and encrypted content in XML documents.
Orchestration-level Testing
For our purposes, we can define orchestration as a standards-based mechanism that defines how Web services work together, including business logic, sequencing, exception handling and process decomposition, as well as service and process reuse. Orchestrations may span a few internal systems, systems between organizations or both. Moreover, orchestrations are long-running, multi-step transactions, almost always controlled by one business party, and are loosely coupled and asynchronous in nature.
We can consider orchestration as another complete layer over and above more traditional application integration approaches, including information- and service-oriented integration. Orchestration encapsulates these integration points, binding them together to form higher-level processes and composite services. Indeed, orchestrations themselves should become services.
Thus, you test them as you would other services, including abstraction, reuse, granularity and so on. However, you should note that these services sit above existing services, and the testing should regress from the top level down, from the orchestration layer to the primitive services. Tools such as Mindreef SOAPscope: Solutions for Web Services and SOA may be effective here as well.
FIG. 2: INTELLIGENCE SHARING. A single shipping service, shared through SOA producer and consumer interfaces, plugs into the sales, inventory and accounting systems.
Integration-level Testing
As with traditional integration testing, this step aims to determine if all of the interfaces, including behavior and information sharing between the services, are working correctly. The type of integration testing you carry out should work through the layers of communications, up through the network to the protocols and inter-process communications, including testing the REST or SOAP interfaces to the services, or whatever communication mechanism is employed by the services you're deploying. Things to look for here include:
• Can communications be established with late binding—dynamically, as needed?
• Is the integration stable under an increasing load (see the sketch after this list)?
• Is the transmitted information correct for the service or applications?
• Are the security mechanisms working properly?
• How does the SOA recover from application, database and network failures?
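One crude way to probe the load-stability question is to ramp concurrent consumers against a service and watch for failures. The sketch below does this with bare threads against a hypothetical endpoint; a real test would use a load-testing tool, but the shape of the check is the same.

import java.net.HttpURLConnection;
import java.net.URL;

public class LoadRamp {
    public static void main(String[] args) throws Exception {
        for (int threads = 1; threads <= 32; threads *= 2) { // double the load each pass
            Thread[] pool = new Thread[threads];
            final int[] failures = {0};
            for (int i = 0; i < threads; i++) {
                pool[i] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            // Hypothetical service endpoint under test.
                            HttpURLConnection c = (HttpURLConnection)
                                new URL("http://test-host/services/shipping").openConnection();
                            if (c.getResponseCode() != 200) {
                                synchronized (failures) { failures[0]++; }
                            }
                        } catch (Exception e) {
                            synchronized (failures) { failures[0]++; }
                        }
                    }
                });
                pool[i].start();
            }
            for (Thread t : pool) {
                t.join();
            }
            System.out.println(threads + " concurrent calls, " + failures[0] + " failures");
        }
    }
}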
FIG. 3: PLANNED EXECUTION. An iterative process: create a test plan, design the testing (black-box and white-box), test and evaluate, and feed the test results back into the plan.
Governance-level Testing
Although there are many SOA governance vendors, and thus many SOA governance definitions, the best is from Wikipedia: SOA governance is an emerging discipline that enables organizations to provide guidance and control of their service-oriented architecture (SOA) initiatives and programs. Many organizations are attempting to transition from silo-oriented applications to agile, composite clients and services. This transition requires that the "service" become the new unit of work. The IT organization must now manage these services across the entire life cycle, from inception through analysis, design, construction, testing, deployment and production execution. At each stage, certain rules or policies must be carried out to ensure that the services provide value to the consumers. SOA governance is the discipline of creating policies, communicating and enforcing them.
So, in essence, you have life-cycle and policy management layers, including testing, that themselves need testing. No problem. You test a governance system by matching the policies the governance system is looking to manage and control with the actual way it manages and controls them. Thus, it's a simple matter of listing the policies and establishing test cases for each, such as:
Policy: XYZ Service can leverage services only within the firewall.
Test: Does the governance system disallow that service from leveraging services outside the firewall?
It's as easy as that, and there are no tools for testing SOA governance systems other than those provided by the SOA governance solution vendors.
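To make that policy-to-test-case mapping concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: the endpoints, the query parameter and the assumption that the governed service simply returns an HTTP status. Real governance products expose their own test interfaces.

import java.net.HttpURLConnection;
import java.net.URL;

public class FirewallPolicyTest {
    public static void main(String[] args) throws Exception {
        // Ask XYZ Service to call an internal dependency: should succeed.
        int internal = invoke("http://xyz.internal/service?target=internal");
        // Ask it to call a service outside the firewall: governance should block it.
        int external = invoke("http://xyz.internal/service?target=http://partner.example.com/svc");
        if (internal != 200) {
            throw new AssertionError("Allowed internal call was blocked");
        }
        if (external == 200) {
            throw new AssertionError("Policy violation: external call succeeded");
        }
        System.out.println("Policy enforced");
    }

    private static int invoke(String endpoint) throws Exception {
        HttpURLConnection c = (HttpURLConnection) new URL(endpoint).openConnection();
        return c.getResponseCode();
    }
}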
Creating a Test Plan
The test plan you create for your SOA should reflect the requirements of your project. Unfortunately, one size does not fit all. Figure 3 depicts the high-level process you can employ to drive SOA testing for your project. However, you may have special needs, such as more emphasis on performance and security. In the end, you’ll find that SOA testing incorporates all of the testing technology and approaches we’ve developed for other distributed systems, adding new dimensions such as services, orchestration and governance testing. Adapting to SOA means expanding your skills and understanding to include those new dimensions. Many failed SOA projects are directly attributable to lack of testing, with the testers assuming that the new technology would work flawlessly. Unfortunately, that’s just not yet a reality. ý
By Gregg Sporar, A. Sundararajan and Frank Kieviet
Like the rest of us, you probably don't think much about aging parts under the hood—until a hose starts to leak. Then it
suddenly becomes that trip’s urgent issue, sometimes forcing you off the road with disastrous results. So what happens when your Java app starts to show its age and springs a memory leak? This is an issue you can’t ignore, either, because memory leaks in Java applications decrease the CPU time available for your application, slowing its ability to respond—or stopping it altogether. Like a crashed engine-cooling system, memory leaks in your Java application demand a variety of fix-it tools and techniques. In Part 1 of this series, we explored what a memory leak is and described techniques for solving heap memory leaks. To conclude, we’ll take a look at a particularly tricky problem: permanent-generation memory leaks.
What Is the Permanent Generation?
The Java Virtual Machine divides memory into three parts, or generations. The young and tenured generations are used to hold the objects directly allocated in your application with the new operator; the term heap is commonly used to refer to the combination of the young and tenured generations. The permanent generation, however, is very different. The JVM employs it to hold the classes that your application uses. Your application's classes are loaded by a class loader, which the JVM provides so that you don't have to be concerned with it. While the JVM stores class data in the permanent generation, a class loader object is just stored in the heap along with your application's objects.
The permanent generation is also used to hold interned strings. An interned string has been passed to the intern() method of the String class, which places it in a pool. An interned string can be compared to another interned string using the == operator, which is faster than the equals() method.
For more information about the JVM memory generations and garbage collection options, refer to the online article "Tuning Garbage Collection" (java.sun.com/docs/hotspot/gc1.4.2/). Most modern JVMs have command-line parameters for specifying heap and permanent generation size (see Table 1). Unless otherwise noted, all references to a JVM or Java Development Kit (JDK) refer to Sun's reference implementations, version 1.4.2 or higher.
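A small sketch makes the interned-string == comparison concrete (the string values here are arbitrary):

public class InternDemo {
    public static void main(String[] args) {
        String a = new String("permgen");  // a distinct object on the heap
        String b = a.intern();             // the pooled, canonical copy
        String c = "permgen";              // string literals are interned automatically
        System.out.println(b == c);        // true: both refer to the pooled instance
        System.out.println(a == c);        // false: a is a separate heap object
        System.out.println(a.equals(c));   // true: equals() compares contents
    }
}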
What Causes PermGen Leaks?
If your application encounters an OutOfMemoryError, it could be the permanent generation that ran out of space. Starting with JDK 5, the OutOfMemoryError message includes the phrase "PermGen full" if the problem is with the permanent generation. The source of most permanent-generation memory problems is usually either too many classes, too many interned strings or a memory leak caused by a class loader reference. The first two are easy to fix if your system has additional RAM—simply increase the size of the permanent generation (see Table 1).

TABLE 1: GET HOLD OF THE HEAP

Option            Usage
-Xmx              Maximum size for the heap (young + tenured generations).
                  Example: -Xmx128m sets the maximum size to 128 megabytes.
-Xms              Initial size for the heap (young + tenured generations).
                  Example: -Xms64m sets the initial size to 64 megabytes.
-XX:MaxPermSize   Maximum size for the permanent generation.
                  Example: -XX:MaxPermSize=96m sets the maximum size to 96 megabytes.
-XX:PermSize      Initial size for the permanent generation.
                  Example: -XX:PermSize=32m sets the initial size to 32 megabytes.

For two reasons, memory leaks caused by class loader references can be difficult to track down, let alone to fix. First, most developers aren't familiar with class loaders and how they work. Second, only a limited number of tools provide helpful information about the contents of the permanent generation.
Problems with class loader references frequently happen in Web and enterprise Java applications because Web and application servers typically use
multiple custom class loaders. This allows the server to load and unload independent applications without a restart. Since Web and application servers can contain more than one application, they're frequently given the generic term container. So if the container you're using reports OutOfMemoryError: PermGen full, you might be tempted to conclude that the container's code has a bug. While that might be the case, often the problem is more subtle and lies elsewhere.
As an example, the servlet source code shown below looks innocent enough. It defines a one-line doGet() method. Note that the class created for CUSTOMLEVEL is an anonymous class. That is necessary because the constructor of Level is protected.

package com.stc.test;

import java.io.*;
import java.net.*;
import java.util.logging.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class Leak extends HttpServlet {
    private static final String STATICNAME = "This leaks!";
    private static final Level CUSTOMLEVEL = new Level("test", 550) {}; // anon class!

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Logger.getLogger("test").log(CUSTOMLEVEL, "doGet called");
    }
}
When this servlet is loaded as part of an application, the container will use a class loader solely for that application. Figure 1 shows the objects in memory
after the servlet gets loaded; the objects in yellow were loaded by the application class loader; all other objects are green. The Container has a reference to the application class loader that it used and, of course, also has a reference to the servlet. The servlet object has a reference to its class, which in turn has a reference to the application class loader that loaded it. The link shown in red is a bit surprising, though.

FIG. 1: LEAK 'N' LOAD. The Container references the AppClassloader and the Leak servlet; Leak references Leak.class, which references the AppClassloader. The JDK's Level class data (Level.class, with its static Level.INFO, Level.SEVERE and other members) holds the surprising reference, shown in red, to the CUSTOMLEVEL instance of Leak$1.class.

The code below is for the constructor of the JDK's Level class.

protected Level(String name, int value) {
    this.name = name;
    this.value = value;
    synchronized (Level.class) {
        known.add(this);
    }
}
The last line in that constructor is the cause of the link shown in red. The variable called known is the reason for the link. It’s an ArrayList in the Level class, and since it is a static variable, it’s part of the data for the Level class. You might expect that when the application is unloaded, all of its objects and classes would be candidates for removal by the JVM’s garbage collector. But that’s not the case. Figure 2 shows memory after the application is unloaded. The only object that can be garbage-collected is the Leak object. The object created for CUSTOMLEVEL can’t be garbage-collected because it’s still being referenced by that static ArrayList. Since the CUSTOMLEVEL object has a link to its class, which in turn has a link to the application class loader, the application class loader can’t be garbage-collected—nor can any of the classes it loaded into the permanent generation. Since the Level class is part of Java’s runtime library, it gets loaded by a special class loader called the bootstrap or null class loader. Classes that are loaded by the bootstrap class loader are never garbage-collected. The Level class remains in the permanent generation until the JVM exits. In this example, the classes loaded by the application class loader also will remain in the permanent generation until the JVM exits. This is a typical class loader–memory leak. In this simple example, the class loader–memory leak is fairly small; only two classes have leaked in the permanent generation. In a typical application, the problem would be much worse—the MAY 2007
larger the application, the bigger the problem. Additional factors can prevent the removal of class information from the permanent generation. Each object on the heap has a reference to its class. Since every class keeps a reference to its superclass, a single object on the heap can lead to multiple class entries in the permanent generation. Class loaders also can form parent-child relationships, and each has a reference to the other. For further information on class loading, refer to blogs.sun.com/sundararajan/entry/understanding_java_class_loading_part.
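You can see those parent-child chains for yourself with a small loop; the chain ends at null, which represents the bootstrap class loader:

public class LoaderChain {
    public static void main(String[] args) {
        // Walk from this class's loader up through its parents.
        for (ClassLoader cl = LoaderChain.class.getClassLoader();
                cl != null; cl = cl.getParent()) {
            System.out.println(cl);
        }
        System.out.println("null (bootstrap class loader)");
    }
}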
FIG. 2: LEAK UNLOAD. Memory after the application is unloaded: only the Leak servlet object can be garbage-collected. The CUSTOMLEVEL object is still reachable from the Level class's static list, which keeps Leak$1.class and, through it, the AppClassloader and every class it loaded alive.
Finding PermGen Leaks
If the permanent generation is running out of space, and increasing its size doesn't fix the problem, additional tools and techniques will be required. The jmap utility has been around since JDK 5 (for more information, see the online documentation at java.sun.com/javase/6/docs/technotes/tools/share/jmap.html). It has a -permstat option that displays the classes loaded in the permanent generation and the class loaders used to load them. The -permstat option also displays statistics on interned strings. Unfortunately, jmap isn't available for Windows prior to JDK 6, and even then, the Windows version doesn't support the -permstat option. So while jmap is useful where it's available, for applications on JDK 1.4.2 or those deployed on Windows systems, other options must be used.
The JConsole utility displays permanent-generation statistics in its memory tab. If there is a class loader memory leak, however, the classes tab is usually more helpful. It displays the number of classes that have been loaded and unloaded. The Verbose Output option is especially useful. Checking that option causes the name of each class to be displayed on your application's stdout as it's loaded and unloaded. Use that output to look for patterns: If the same classes are loaded repeatedly without being unloaded, you've probably encountered a leak. As an alternative to JConsole's Verbose Output option, you can specify these JVM options when you start your application:
-verbose:class -XX:+TraceClassUnloading
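With those options set, each load and unload is echoed to stdout. The exact wording varies by JVM version, and the class and path names below are illustrative, but the pattern to watch for looks like this: the same class loaded again and again with no matching unload lines.

[Loaded com.stc.test.Leak from file:/apps/leak/WEB-INF/classes/]
[Loaded com.stc.test.Leak from file:/apps/leak/WEB-INF/classes/]
[Loaded com.stc.test.Leak from file:/apps/leak/WEB-INF/classes/]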
It may take quite a while before the garbage collector will try to collect in the permanent generation—the exact timing is difficult to predict. A sure way of forcing a garbage collection is to create an out-of-memory exception in the permanent generation. For an example, Listing 1 contains a servlet that instantiates class loaders and loads a large class in them until it runs out of memory in the permanent generation.
If the cause of an inadvertent reference to a class loader isn't apparent, you must dig a bit deeper into the permanent generation to find the problem. To track down this sort of leak, you must usually locate the root set object that is preventing the garbage collector from removing the classes from the permanent generation. Root set objects are the starting point for the garbage collector—any object reachable from a root set object is not a candidate for garbage collection. For a full definition of root set, refer to the Memory Management Glossary (www.memorymanagement.org/glossary/r.html).
This is where many profiling tools fall short: most profilers do not display the references between class objects and class loaders. Using Figure 2 as an example, most profilers will not show the connections between AppClassLoader and Leak.class or Leak$1.class. Most profilers incorrectly display all classes as root set objects, and don't try to trace references to those class objects from the actual root set objects, one or more of which could be referencing a class via its class loader.

FIG. 3: LINK-A-LEAK. jhat's "Reference Chains from Rootset" display for the leaked class loader; the chain runs from the root set through the static variable known.
TABLE 2: OQL IN THE PERM GEN

Query: select count(heap.classes())
Returns: Number of classes loaded into the permanent generation

Query: select cl from instanceof java.lang.ClassLoader cl
Returns: Lists all class loaders
Notes: The sun.reflect.DelegatingClassLoader is used to optimize reflection performance.

Query: select map(sort(map(heap.objects('java.lang.ClassLoader'), '{ loader: it, count: it.classes.elementCount }'), 'lhs.count < rhs.count'), 'toHtml(it) + "<br>"')
Returns: Lists all class loaders and the number of classes they have loaded
Notes: Creates a histogram so you can see which class loaders are responsible for loading the most classes.

Query: select map(heap.objects('java.lang.ClassLoader'), function (it) { var res = ''; while (it != null) { res += toHtml(it) + "->"; it = it.parent; } res += "null"; return res + "<br>"; })
Returns: Lists all class loaders and their chain of parent class loaders
Notes: Shows the parent-child relationships between class loaders. The class loader for an individual class is listed on the jhat page for that class.

Query: select { loader: cl, classes: filter(map(cl.classes.elementData, 'it'), 'it != null') } from instanceof java.lang.ClassLoader cl
Returns: Lists all class loaders and the classes they loaded
Notes: All references to an individual class loader are also available from the jhat object instance page for that class loader.

Query: select { loader: cl, liveness: heap.livepaths(cl) } from instanceof java.lang.ClassLoader cl
Returns: Lists all class loaders and all references to each class loader
To detect the problem shown in Figure 2, it would be helpful to know why the Leak.class and all other classes loaded by AppClassLoader are still in the permanent generation. But if your profiling tool doesn't display references from root sets that go through a class loader, it will be difficult to track them down. Fortunately, the jhat utility included with JDK 6 does, and includes an Object Query Language (OQL) that is able to report on the permanent generation.
To use jhat, you must obtain a snapshot file (or dump) of the memory used by your application. If the JVM is reporting an OutOfMemoryError, add -XX:+HeapDumpOnOutOfMemoryError
to the command line that starts your application. This option is supported by the most recent updates of JDK 1.4.2, JDK 5 and JDK 6, and will create a file that ends with .hprof. If you don't want to wait for an OutOfMemoryError to occur, you must force the creation of the memory snapshot. With JDK 6, simply use the new -dump option of the jmap utility to create the snapshot file.
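For example, if the application's process ID were 1234, a command along these lines would write the snapshot (the file name is arbitrary):

jmap -dump:format=b,file=snapshot.hprof 1234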
With earlier versions of the JDK, specify -Xrunhprof:heap=dump,format=b on the command line that starts your application. A Ctrl-\ (or Ctrl-Break on Windows) at the console of your application will create the snapshot file. If no console is available, use the kill command (on Solaris or Linux systems) or a tool such as StackTrace (www.adaptj.com/root/main/stacktrace).
Regardless of how the snapshot file is created, you should try to unload any classes that you suspect are the source of the problem before you create the snapshot. For example, with Web or enterprise Java applications, undeploy the application component(s) from the server before creating the snapshot file. This will reduce the number of references to the classes and their class loaders, which will hopefully make it easier to spot the inadvertent references that are causing the problem.
The jhat utility has only one required parameter: the name of the snapshot file. Note that the jhat included with JDK 6 is able to read snapshot files created by older versions of the JVM. The jhat utility contains a Web server; a browser is used to access its user interface. It reads a snapshot file and allows queries on the data in the file, in a manner similar to a database server. It runs on port 7000 by default, so after it starts, specify localhost:7000 in your browser. Several default queries are available as links from the main page.
The Object Query Language is implemented on a JavaScript engine, so JavaScript can be used in the queries, and you can write nearly procedural queries. OQL has a built-in object called heap that is used to access information about the permanent generation. Several queries that are helpful for tracking down permanent generation problems are shown in Table 2. A complete discussion of OQL is beyond the scope of this article—refer to the OQL help page at the jhat user interface for full details. For more information on jhat, refer to its online help page: java.sun.com/javase/6/docs/technotes/tools/share/jhat.html.
Another handy feature for finding memory leaks is jhat's ability to compute all reference routes from root set objects to a particular object. A reference route is a series of objects in which each object references the next. When you locate a class that shouldn't be in the permanent generation, use jhat's ClassLoader link to examine its class loader's reference routes. For the example servlet shown earlier, the Leak class would be listed on the initial jhat page. Clicking its link would display a page that includes a link for its class loader. Clicking that class loader
link displays a page with information about the class loader, including links labeled "Reference Chains from Rootset." Clicking the "Exclude weak refs" link displays a list of references from root set objects, as shown in Figure 3. It reveals that the static variable called known from the shorter listing above is the root set object that is ultimately preventing the garbage collection of this class loader.
For class loaders that have multiple root set references, instead of examining each reference route individually, look for a repeating pattern in the reference routes. The Unix tools sort and uniq can help you to quickly find the common denominator in the reference routes. This can also be done with an OQL script, as shown in the listing below.

(function() {
    var histo = new Object();
    // replace parameter to findObject() with your object's address
    map(heap.livepaths(heap.findObject('0x0f054b90')), function(path) {
        for (var i = 0; i < path.length; i++) {
            var desc = toHtml(path[i]);
            if (desc in histo) {
                histo[desc]++;
            } else {
                histo[desc] = 1;
            }
        }
    });
    var str = "<table>";
    for (i in histo) {
        str += "<tr>";
        str += "<td>" + i + "</td><td>" + histo[i] + "</td></tr>";
    }
    str += "</table>";
    return str;
})();
The OQL object heap has two methods that are useful in this situation: livepaths() and findObject(). The livepaths() method returns an array of reference routes for the specified Java object. Each path is an array of objects. The findObject() method returns a specific object given its object identifier. So passing the result of findObject() to livepaths() creates a list of all reference routes from root set objects to the specified object. A bit of post-processing is done to create a histogram, which makes it easier to see the references with the highest counts—the higher the count, the more suspicious the reference. Note that the parameter passed to findObject() is the hexadecimal object identifier displayed by jhat for the
object for which you want to see the references.
Applications—like automobiles—will run longer and require less maintenance if built well from the beginning. Preventing leaks is one of the most important things you can do to ensure a long life for your car—and for your Java code. ý

The authors work at Sun Microsystems.

ACKNOWLEDGMENTS
Many thanks to Alan Bateman for his suggestions and advice on tracking down all sorts of memory leaks.
CORRECTIONS
Part 1 of this article in the April issue contained three errors. Output from the -verbose:gc command line option was described incorrectly. It should have read: "The number in parentheses is the total amount of heap space (not counting the permanent generation), minus one of the survivor spaces." In the section on generating a heap memory snapshot file, a Ctrl-\ (or Ctrl-Break in Windows) at the console of your application will NOT end the application, as was indicated. The leading hyphen was incorrectly omitted from the command-line flag: -Xrunhprof:heap=dump,format=b.
LISTING 1: LEAKY SERVLET

package com.stc.test;

import java.io.*;
import java.util.ArrayList;
import javax.servlet.*;
import javax.servlet.http.*;

public class RunGC extends HttpServlet {
    private static class XClassloader extends ClassLoader {
        private byte[] data;
        private int len;

        public XClassloader(byte[] data, int len) {
            super(RunGC.class.getClassLoader());
            this.data = data;
            this.len = len;
        }

        public Class findClass(String name) {
            return defineClass(name, data, 0, len);
        }
    }

    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html;charset=UTF-8");
        PrintWriter out = response.getWriter();
        out.println("<html><body><pre>");
        try {
            // Load class data
            byte[] buf = new byte[1000000];
            InputStream inp = this.getClass().getClassLoader()
                .getResourceAsStream("com/stc/test/BigFatClass.class");
            int n = inp.read(buf);
            inp.close();
            out.println(n + " bytes read of class data");

            // Exhaust permgen
            ArrayList keep = new ArrayList();
            int nLoadedAtError = 0;
            try {
                for (int i = 0; i < Integer.MAX_VALUE; i++) {
                    XClassloader loader = new XClassloader(buf, n);
                    Class c = loader.findClass("com.stc.test.BigFatClass");
                    keep.add(c);
                }
            } catch (Error e) {
                nLoadedAtError = keep.size();
            }

            // Release memory
            keep = null;
            out.println("Error at " + nLoadedAtError);

            // Load one more; this should trigger GC
            XClassloader loader = new XClassloader(buf, n);
            Class c = loader.findClass("com.stc.test.BigFatClass");
            out.println("Loaded one more");
        } catch (Exception e) {
            e.printStackTrace(out);
        }
        out.println("</pre></body></html>");
        out.close();
    }
}
The Security Zone
Secure Software From the Ground Up
Will Securing Your SDLC Have You Going Back To School?
By Ryan Berg

"Organizations should implement source-code security scanning tools as part of the software development life cycle to find and fix the highest number of security issues early in the project," states Amrit Williams of Gartner Research. In his April 2006 paper titled "Implement Source Code Security Scanning Tools to Improve Application Security," he asserts that the practice "will result in a higher-quality product and lower overall application life cycle costs."
Countless studies and analyst recommendations suggest the value of improving security during the SDLC rather than trying to address vulnerabilities in software discovered after widespread adoption and deployment. Automated source code analysis is generally accepted as the most effective method of security testing early in the life cycle, because it allows assessments of any piece of code without requiring a completed application. Although penetration testing is also an important element of software security, its value often comes later, when it can be used on a completed application with a functional interface.
The justifications for implementing these technologies, including improved software quality, cost savings and customer loyalty, are clear. What's not always clear is exactly how to implement them.
Roadblocks to Building In Security
Among the hurdles that impede security testing in the SDLC, the largest is typically the skills gap between developers and security experts. These two skill-sets are often not present in the same individual or even the same group, and organizationally, there is little inherent synergy. While development goals focus on product functionality and on-schedule delivery, security staff is often tasked with eliminating vulnerabilities and implementing security controls only after the applications are completed and deployed. To effectively reduce vulnerabilities created during
the development process, cooperation must be achieved between these two groups. In all cases, higher-level management support for improving security during development is essential. In addition to organizational impediments, a general hesitancy to change or revise an existing SDLC process may delay implementation of security testing. Similar to this conceptual roadblock, there are other misconceptions about integrating security analysis during development that must be overcome before an initiative can move forward. A common fiction is that the development schedule cannot afford to be stretched any further, not even to address security issues. And while there may be initial lapses in the development cycle, the fact is that the most time-efficient method for reducing software risk is at the outset of development. The process eventually reduces development time by instilling good, secure coding practices. Another misconception is the belief that organizations already doing peer review don’t need additional security code review. However, peer review is not a substitute for security review, and is typically used to find functional bugs. So unless the review is targeted specifically at finding security defects and the reviewers have a deep understanding of application security, many of the more critical security vulnerabilities and design flaws will be missed. In many cases the best-intentioned user requirement implemented without functional error can lead to the greatest security risk.
Core Responsibilities
Once past the initial hurdles, many enterprises still find it challenging to identify the most appropriate method and resources
to implement source code analysis in their development life cycle. Three models are common scenarios currently being used to successfully reduce vulnerabilities during development. These models help establish criteria for assessing goals, resources, obstacles and ultimately, the most favorable approach for individual organizations. Although development organizations and processes have their own distinct characteristics, these models address the common elements that should be leveraged to achieve effective security testing.
The primary functions that must be served by existing staff or experts brought in during implementation are:
Set security requirements. A manager or central source of security expertise defines what are considered vulnerabilities and how to judge criticality based on business needs.
Configure analysis. Internal definitions are used to customize the source-code analysis tool to match policies.
Scan source code. The source-code analysis product is run against the target application or parts of the application to pinpoint vulnerabilities.
Triage results. Staff members with knowledge of security and of the application study results to prioritize remediation workflow.
Remediate flaws. Vulnerabilities are eliminated by rewriting code, removing flawed components or adding security-related functions.
Verify fixes. The code is rescanned and studied to ensure the code changes have eliminated the vulnerability while maintaining application functionality.
Independent Model
When you don't have time to read the manual and just want to get started, this model is the one to use. Think of the Independent Model as the Swiss army knife for source code analysis; while often not the most effective tool for the job, it will almost always work.
This is also the model most often envisioned by organizations when the concept of implementing security analysis in the SDLC is introduced. In the Independent Model (Figure 1), each developer is responsible for analyzing code for security vulnerabilities, identifying those that are most critical, providing necessary fixes, and verifying that the flaws have been eliminated. Depending on the organization and the application requirements, these security efforts are typically practiced at regular intervals during the development cycle, where software engineers scan their own code after making changes or additions, then remediate any vulnerabilities before checking files back into the source code control system. Ideally, individuals scan the entire application so they can also track data flow and potential flaws across the complex interactions of various files and functions. This may be extremely time-consuming, however, unless the code base is small.

FIG. 1: SMALL APPS, SMALL TEAMS. A manager sets security requirements; each developer independently configures, scans, triages, remediates and verifies against the shared source code control system.

A typical workflow might be similar to the following:
The managers:
• Define security requirements
The developers then:
• Check out the latest version of the source code actively under development
• Add/modify existing application functionality
• Configure scanner for analysis
• Run scan on entire application
• Triage analysis results
• Fix any reported vulnerabilities
• Run scan on entire application to verify vulnerability fix
• Check in latest code changes
The managers then:
• Review development progress reports
How does it work? This model is assumed to be implemented as part of an existing SDLC during the development phase. It requires the developers involved to have sufficient security knowledge not only for identifying vulnerabilities and performing appropriate triage, but more importantly for understanding how to best fix them. Some management input is usually required to provide basic security requirements as well as guidance for decision-making (for example, how to judge criticality for Web interface vulnerabilities as opposed to those found in a database application). When security policies aren't defined by a central source, developers are left to make individual decisions about what constitutes a vulnerability and/or a fix.
Managers should be able to track development progress to measure delays caused by these security efforts. But generally, this model lacks higher-level reporting on vulnerabilities found and fixed because logistically, the vulnerabilities identified are eliminated before entering into a centralized reporting structure.
When does it work? Organizations with small development teams and small applications generally get the most value from the Independent Model. In these cases, it's much easier to provide sufficient training and guidance so the developers' security efforts will result in effective remediation. Developers are expected to learn from their security efforts and avoid common mistakes as they write future code, although this is often a difficult thing to measure in this model.
Because of the immediate benefits and the relatively little investment required to initiate the Independent Model, it proves to be a feasible method for introducing security into small development organizations. It's also a possible option for simple system integrator projects, where analysis and remediation is performed by only a few developers.
When doesn't it work? The Independent Model isn't scalable beyond small applications or small teams that are specialized in source code review. Most organizations can't rely on developers to have sufficient expertise to make difficult security decisions, and training to achieve that level of understanding often requires an impractical commitment of resources. Even if they do reach this point, the lack of centralized controls makes this an inefficient model, often leading to redundant work among developers who scan the same code bases simultaneously and may potentially work on fixing the same vulnerabilities. This model also fails because of the difficulty of creating and enforcing secure coding policies across more than a few developers. Without quantifiable standards defined company-wide (for example, what level of encryption to use, how to validate input and so on), the standard practice becomes whatever methods are favored by individual reviewers, which leads to inconsistency and, in many cases, poor security.
Another shortcoming is the difficulty of tracking project data from a business perspective, such as the number and type of vulnerabilities discovered, improvements in application security over time, and return on investment for the security resources used. If developers have the right knowledge and tools, it's likely that vulnerabilities will be eliminated during development. However, without reports and artifacts demonstrating these improvements, there is no viable way to communicate the value to executives, auditors, customers, partners and other stakeholders.
Best practices. For best results in the Independent Model:
• Establish a set of quantifiable, enforceable security requirements to guide remediation efforts.
• Conduct security peer reviews among developers to verify that all security fixes effectively eliminate vulnerabilities without negatively affecting functionality.
• Identify and/or train a security-capable member of the development team to act as a mentor, guiding other developers on analysis, triage and security review to verify fixes.

Distributed Model
Similar to the Independent Model, the Distributed Model relies on developers to perform the majority of the security functions, although vulnerability scanning is conducted centrally on the complete application. Typically, the quality assurance or release engineering team assumes this responsibility, which enables this model to integrate easily with existing development structures. The timing and frequency of assessments can be controlled to meet a flexible range of testing requirements, from an agile development process to a more structured process such as waterfall development.

FIG. 2: BIGGER TEAMS, OUTSIDE EXPERTS. A manager sets security requirements; a QA/release engineering team configures and runs the scans, passing raw analysis data to the developers, who triage, remediate and verify against the source code control system.

The typical workflow of the Distributed Model looks something like this:
The managers:
• Define security requirements
Then QA/release engineers:
• Configure the scanner for build integration
• Sync the latest code at each milestone
• Run scan on the components released for the current milestone
• Provide raw analysis results to development
The developers then:
• Triage analysis results
• Make necessary code changes for remediation
• Verify remediation before checking code back into the control server
Managers:
• Track development progress and review vulnerability data from assessments
How does it work? The security analysis can take place either as a requirement before entering the quality assurance phase or as part of an acceptance test after QA has begun. The QA or release engineering team configures the analysis according to centralized security requirements, scans the entire application, and distributes the results as raw data to the development team. Individual developers are then responsible for sifting through the data to identify the most critical vulnerabilities and perform the appropriate remediation. With this distribution of responsibilities, the development team still requires security knowledge and experience, but the work associated with configuration and scheduling can be distributed to the QA/release engineering team.
Depending on the size of application and number of vulnerabilities found in the initial scan, several iterations of security testing may be required in the Distributed Model to verify that the fixes are effective while maintaining application functionality.
Repetitive testing is critical as the application is being developed, because new dependencies and interactions in the code may expose vulnerabilities that had previously eluded discovery. Increasing the frequency of assessments in this model significantly improves the team's ability to discover vulnerabilities early, though a balance must be found in order to adhere to the development schedule. A primary advantage of the Distributed Model is the ability to find this balance, because adjustments can be made more easily to a centralized scanning process than to individual scanning as in the Independent Model. This model also eliminates the workflow redundancies that occur when developers are individually responsible for configuring and running their own assessments.
When does it work? Medium-sized development teams using a formal software development process are best suited for the Distributed Model. It functions effectively within an agile development process because the frequency of testing increases the chances of finding and correcting vulnerabilities early in the life cycle. A waterfall development process offers fewer opportunities, but also benefits from the Distributed Model because the larger scope of product milestones maximizes the value of each assessment.
For cases in which developers lack sufficient security experience, specialists may be brought into this model to receive results of the analysis and perform the necessary triage. They might be at a small disadvantage because they are unfamiliar with the application architecture, but this approach offers developers additional freedom to concentrate on their coding responsibilities. This approach works only if the security audit team is integrated as part of the software engineering team, however, since this model is assumed to be carried out in line with the development life cycle, and not as part of a parallel or out-of-band process.
When doesn't it work? The Distributed Model doesn't scale well for complex applications, large development teams or development life cycles that aren't driven by functional- or component-based milestones. It begins to break down especially in the triage responsibilities for the developers, who have the same problems as their counterparts in the Independent Model, including duplication of work and lack of detailed knowledge of security and overall application architecture. If the triage process doesn't accurately identify the most critical vulnerabilities or individual developers are working with overlapping components of the code, the value of remediation efforts can't keep up with the losses in productivity.
Best practices. For best results in the Distributed Model:
• Begin scanning early in the life cycle and maintain a regular, frequent schedule of assessments.
• Assign developers responsibility for securing distinct components of the application to reduce redundant work as much as possible.
• Identify a security-minded member of the team to act as a mentor, guiding other developers on analysis, triage and peer review to verify fixes.
Centralized Model
The Centralized Model is the most flexible approach, able to be adapted to any size team, independent of the software development process and application complexity. It's generally not recommended for smaller teams to begin with, however, because of the initial investment in resources required. In most cases, source code analysis programs that start with one of the two other models will eventually evolve into the centralized approach because of the gains in efficiency and measurability of results. This is especially true as application development requirements become more complex and the size of the team increases.

FIG. 3: FLEXIBLE BUT COSTLY. A manager sets security requirements; a central security analysis team configures, scans and triages, passing vulnerability and remediation data to the developers, who remediate and verify against the source code control system.

The Centralized Model workflow typically works this way:
Managers:
• Define security requirements
Security Analysis Team members then:
• Configure the scanner for build integration
• Retrieve the latest code for analysis
• Scan application source code
• Triage the results
• Assign vulnerability remediation directly to development
Followed by developers, who:
• Make necessary code changes for remediation
• Check in code
As managers:
• Track development progress, review vulnerability data, and monitor results of remediation efforts
How does it work? Unlike the other two models, which require developers to become security experts, the Centralized Model allows the security functions to be carried out by the group with the greatest experience and knowledge of software vulnerabilities. The security analysis team scans the entire application, leveraging a centralized source of expertise and technology, regardless of whether they exist internally or externally of the development life cycle. Raw results are triaged by this security team as well, so the information they generate provides developers with a prioritized remediation workflow based on the criticality of vulnerabilities.
The security team is also more likely to interpret security requirements properly when they configure the analysis, because they understand the business-level issues of risk management. Ideally, vulnerability remediation assignments are categorized and assigned to individual developers through a defect tracking system (DTS) that allows the entire team to monitor progress of flaws being fixed. Developers, meanwhile, are allowed to focus much more of their time on advancements and improvements to the source code, whether adding functionality to the software or recoding elements that were considered vulnerable.
The security audit team in this model provides developers with remediation advice that is not only context-specific, but also incorporates remediation guidelines according to corporate policy. For example, the audit team may find an SQL injection vulnerability in the credit-card number processing of an online billing application. The recommended fix could be to modify the SQL to use a stored procedure or parameterized SQL statement (see the sketch following this section's best practices), or it could be to use the corporate standard credit-card validation routine. The central security team will be able to correctly report this vulnerability and assign remediation based on company-specific policies, instead of requiring developers to become security policy experts.
In this model, developers may or may not run their own scans to verify their fixes are effective, but because the scope of a vulnerability may manifest itself only when the entire application is analyzed, it's usually better for the security team to own responsibility for verification.
When does it work? The Centralized Model's primary advantage is the flexibility to integrate efficiently either inside the SDLC, as a complement to internal software audit teams, or externally as a tool for security integrators or code review services. It can be used independent of a formalized development process, though its effectiveness may be diminished without a defined structure. Centralizing the configuration, analysis and triage makes this the easiest model to manage, and it has the capability to scale quickly as the development team or project scope increases.
Because of the various deployment scenarios possible with the Centralized Model (development, internal audit, external code review, and so on), it often requires different delivery methods for the source code. Connecting to the source-code control system may work well if the analysis team is in line with the SDLC, but external review teams will most likely request remote access or code delivered on portable storage devices, such as CD or USB drive.
When doesn't it work? The Centralized Model can be applied to all project scenarios, though it may be more complicated than necessary for small teams or applications. As well, it requires support and commitment from management to ensure that sufficient resources are allocated and that common goals are defined for both the participating development and security organizations. Without this cross-functional cooperation and dedication of sufficient resources, organizations should not expect a measurable return on investment.
Best practices. For best results in the Centralized Model:
• Include both security and development in planning and information sharing prior to implementation to achieve agreement on key elements such as application design and architecture, security requirements, security training, and prioritization of remediation workflow.
• Provide guidance for the analysis team with respect to policies and objectives to assure that their results, triage and vulnerability assignments achieve the greatest impact.
• Integrate the process with existing technologies, such as defect tracking systems, to make the transition as smooth as possible for staff and to maximize long-term efficiency.
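As a hedged illustration of the parameterized-SQL remediation mentioned above (the table, column and class names are hypothetical, not drawn from any particular billing application):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class BillingLookup {
    // Hypothetical lookup showing the fix: the value is bound as data,
    // so input like "' OR '1'='1" can never alter the query text itself.
    ResultSet findByCard(Connection conn, String cardNumber) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
            "SELECT * FROM billing WHERE card_number = ?");
        ps.setString(1, cardNumber);
        return ps.executeQuery();
    }
}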
TABLE 1: HOW MODELS MEASURE UP. A checklist rating the Independent, Distributed and Centralized Models against typical requirements: effectively reduces security vulnerabilities; centralizes management and remediation strategy; reports on cross-enterprise results and progress; supports easy implementation for small development projects; centralizes analysis configuration; supports distributed development teams; prioritizes remediation to assign to developers; requires low level of management commitment; scales for large applications and development teams.
Choosing the Right Model
The Independent, Distributed and Centralized Models all represent current deployment scenarios being used successfully in various development organizations. Each approach offers a degree of flexibility to accommodate specific requirements of existing processes, so the models may ultimately take on different shapes. However, the fundamental stages and functions serve as guidance for organizations looking to begin implementing security testing during development.
When choosing one of the three models as a template, it's important to first catalog existing resources (security expertise, technologies, service partners and so on) as well as project objectives (fewer security patches, competitive advantage, compliance and so on). Table 1 provides a rough checklist of how each model fits typical requirements that may come into play during the decision-making process.
With growing awareness of software vulnerabilities as a critical problem in information security, and with the availability of accurate, efficient source-code vulnerability analysis technologies, implementing security testing into software development is occurring more frequently, and with greater success. Companies that can create an organized process to reduce vulnerabilities in applications before they're shipped or deployed are recognizing tremendous cost savings, both internally and for the customers, partners and other stakeholders that rely on their software. ý

Ryan Berg is a co-founder and chief scientist at security tools maker Ounce Labs.
Best Practices
Some Basic Training In Requirements

This month, I'd like to share a few unfortunate realities about gathering and managing requirements for the creation of software.
First, coders can work like crazy to keep requirements in sync with business strategy, only to see their projects fail. Why? Because too often, the business strategy itself is somewhere between slightly off-kilter and hopelessly flawed.
Second, as any list of requirements grows, some proposed functions inevitably start to conflict with others. Organizations are often at a loss as to how to recognize such conflicts before coding and testing, condemning themselves to expensive rework down the line.
Then there's the problem of soliciting end-user feedback. There's no polite way to put this. The more you ask the great unwashed masses for their opinion about your code, the more likely it is you'll struggle with their vague, redundant, inconsistent or quasi-useless answers.
In short, dealing with requirements is difficult—a truism that you hardly need a Ph.D. to appreciate. But this doesn't mean that the academic crowd isn't paying attention. If you can stomach dense prose and an occasional tendency to belabor the obvious, much recent research suggests that pointy-headed professorial types may know a thing or two about best practices, too.

Don't Take Strategy on Faith
Take the work of Atsushi Kokune. Along with several colleagues from the research and development organization of NTT Data Corp. in Japan, Kokune advances the somewhat radical notion that developers should start doing a bit of fact checking of all the MBA-forged strategies that underpin new software projects.
Most coders are typically kept at arm's length from those who plot business direction, so Kokune's suggestion is sure to raise eyebrows. But the NTT team's rationale, spelled out in a March 2007 article in the Journal of Systems and Software, is sound.
The researchers take aim at business scorecard methodology, a common approach to modeling strategy. Starting with an overarching business goal, companies fill out these scorecards as they work backward, fleshing out the necessary business processes and technical functions to achieve the desired result.
Scorecards often dictate the requirements for new software projects. But researchers say developers should apply more of a critical eye to these documents, which too often are built with paltry input from the individual contributors throughout the company who will be most affected by new strategies.
Kokune suggests that developers should "extend strategic goals by facts." As an example, the researchers describe their work with an unnamed Japanese auto company that was tweaking its technology systems to achieve goals such as reducing logistics costs.
Vetting the preexisting business strategy required interviewing 15 workers from five of the company's units involved in logistics. Talking to the sales organization, for instance, revealed frustration with sales quotas that were unreasonably high for new products. This presented an opportunity that wasn't necessarily reflected in the original scorecard; namely, if new software could make it easier for product line managers to listen to and incorporate feedback and data from the sales force, overall logistics-related costs might actually go down.
“When examining the validity, especially completeness, of software requirements, it is necessary to check if software functional requirements are consistent with business goals and business processes,” the authors write.
Catch Conflicts Early
It's also necessary, says Mohamed Shehata in a February 2007 article in Computer Networks magazine, to make sure that requirements don't conflict. In his research, Shehata, a professor in the Department of Electrical and Computer Engineering at the University of Calgary, focuses mostly on the nitty-gritty functional requirements that are gathered before building any complex system.

More requirements mean more potential conflicts, a fact Shehata and his collaborators illustrate with several examples from a fictitious digital home system. One requirement, R6, might be that the system automatically opens the windows in the living room at 11 pm. Another, R7, might have the system adjust the temperature of the house to 68 degrees Fahrenheit at 11 pm. If it's too hot or cold outside, the act of opening the windows may work against efforts to regulate temperature.

This conflict would surely be spotted before coding began if the system were made up of just 10 requirements. But what if the list had 1,000 or even 10,000 requirements? To handle this kind of complexity, Shehata urges developers to act the way biologists do when confronted with lots of potentially interrelated flora and fauna; that is, build a branching taxonomy to show all connections and interactions between requirements. The goal, the authors say, should be to build a taxonomy that can answer questions such as "When and how do two requirements interact? How to detect this interaction? And how do we resolve it?"
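Shehata's taxonomy runs much deeper than anything that fits in this column, but the basic mechanics are easy to sketch. In the toy Java below, the field names and the conflict rule are assumptions invented for illustration, not Shehata's actual model: each requirement is tagged with the resource it touches, the hour it fires and the direction it pushes, and a brute-force scan flags pairs that pull the same resource in different directions at the same time.

    import java.util.ArrayList;
    import java.util.List;

    // Toy pairwise conflict scan over tagged requirements (illustrative only).
    public class ConflictScan {

        static class Requirement {
            final String id;        // e.g., "R6"
            final String resource;  // shared resource the rule touches
            final int hour;         // hour of day (0-23) when the rule fires
            final String effect;    // direction the rule pushes the resource

            Requirement(String id, String resource, int hour, String effect) {
                this.id = id;
                this.resource = resource;
                this.hour = hour;
                this.effect = effect;
            }
        }

        // Two rules may conflict if they fire at the same hour, touch the
        // same resource and push it in different directions.
        static boolean mayConflict(Requirement a, Requirement b) {
            return a.hour == b.hour
                    && a.resource.equals(b.resource)
                    && !a.effect.equals(b.effect);
        }

        public static void main(String[] args) {
            List<Requirement> reqs = new ArrayList<Requirement>();
            // R6: open the living-room windows at 11 pm (temperature drifts).
            reqs.add(new Requirement("R6", "houseTemperature", 23, "drift"));
            // R7: hold the house at 68 degrees Fahrenheit at 11 pm.
            reqs.add(new Requirement("R7", "houseTemperature", 23, "hold"));

            // Brute-force O(n^2) scan; still workable at 10,000 requirements.
            for (int i = 0; i < reqs.size(); i++) {
                for (int j = i + 1; j < reqs.size(); j++) {
                    if (mayConflict(reqs.get(i), reqs.get(j))) {
                        System.out.println("Review: " + reqs.get(i).id
                                + " vs " + reqs.get(j).id);
                    }
                }
            }
        }
    }

The payoff isn't the data structure itself; it's that once requirements carry machine-readable tags, spotting candidate conflicts in a 10,000-item list no longer depends on a human rereading the whole thing.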
Garbage In, Requirements Out?
Conflicting requirements can come from many sources, including the peskiest bunch of all: end users. From people who call customer support asking for help with their cup holder (sorry, it's the DVD tray) to those who happily put down the phone to close windows in their house (um, I think the support guy meant the windows on your computer screen), tech lore is flat-out loaded with groan-inducing tales of ineptitude. The problem is that best practices from across the world of hardware and software engineering invariably suggest getting healthy servings of feedback from the customer base. So it's no surprise that when writing requirements based on such feedback, it can be hard to separate the wheat from the chaff.
Leave it to mathematicians to come up with an innocuous, amorphous term such as "non-canonical requirements" to describe the collective train wreck of ideas from common sense–challenged end users and other stakeholders. In research published in the January 2007 issue of Knowledge and Information Systems magazine, Peking University professor Kedian Mu and collaborators suggest using the ominous-sounding annotated predicate calculus to make sense of the mess.

Fair warning here: The paper involves heavy slogging through many equations. After several pages, it becomes clear how unlikely it is that anyone besides applied mathematics professors like Mu would be able to understand, let alone actually use, the model he suggests. The point, however, is that pushed through some sort of systematic filter, even a hopeless jumble of customer suggestions can be turned into useful requirements.

OK, so you might not lean heavily on lattice theory the way Mu and his co-authors do. But it is possible you'd take other cues from his approach. For example, Mu suggests assigning concrete measures of vagueness and redundancy to individual requirements, which makes it easy to sort the list and ultimately reason toward a sane overall requirements document.
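As a rough sketch of that sorting idea, and nothing more: the scores and the penalty formula below are invented for illustration, and Mu's actual machinery rests on annotated predicate calculus and lattice theory rather than two doubles and a comparator.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Toy ranking of raw requirements by vagueness and redundancy scores
    // (scores would come from analysts or a heuristic; illustrative only).
    public class RequirementRanker {

        static class Scored {
            final String text;
            final double vagueness;   // 0 = precise, 1 = hopelessly vague
            final double redundancy;  // 0 = unique, 1 = duplicates another item

            Scored(String text, double vagueness, double redundancy) {
                this.text = text;
                this.vagueness = vagueness;
                this.redundancy = redundancy;
            }

            // Lower penalty = stronger candidate for the final document.
            double penalty() {
                return vagueness + redundancy;
            }
        }

        public static void main(String[] args) {
            List<Scored> raw = new ArrayList<Scored>();
            raw.add(new Scored("Lock all exterior doors at midnight", 0.1, 0.0));
            raw.add(new Scored("The system should feel fast", 0.9, 0.0));
            raw.add(new Scored("Secure every door at 12 am", 0.2, 0.8));

            // Sort best-first: clean requirements float up, vague or
            // duplicate ones sink toward the cut line.
            raw.sort(Comparator.comparingDouble(Scored::penalty));

            for (Scored s : raw) {
                System.out.printf("%.2f  %s%n", s.penalty(), s.text);
            }
        }
    }

Even this crude version buys you something: a ranked list makes it obvious which requirements to rewrite, merge or challenge first.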
Then again, sanity may be too ambitious a goal when it comes to hitting the moving target of requirements management. On days when the business strategy seems especially stupid, your systems architect is putting in for stress leave and the data from the latest customer focus group has reached a new level of incoherence, it's tempting to throw up your hands and consider another line of employment. Why not doctoral work in computer science or applied mathematics? After all, for someone who earns a living doing scholarship, there's no such thing as a bad experience. Rather, it's all fodder for the next research paper. ý

Geoff Koch writes about science and technology from Lansing, Mich. Write to him with your favorite acronym, tech or otherwise, at koch.geoff@gmail.com.
Future Test

The Next Step: Office of Test Management
By Mark Sloan

Today's mature IT industry has created a double-edged sword for businesses. On one side, organizations are completely reliant on their software applications—they simply can't compete without them. On the other side, users and customers believe software should be foolproof, and they're increasingly intolerant of IT-related snafus and breakdowns. Customers patronize businesses that deliver satisfying and trouble-free experiences. Prioritizing the issue of quality across the enterprise is the key to maintaining balance so that neither blade cuts into the bottom line.

The days of narrowly focused, compartmentalized testing that had little-to-no visibility within the organization are over. Given the stakes in today's marketplace, the time has come to elevate testing to a strategic, enterprise-wide perspective—anchored by a high-profile test management office (TMO).

What Is the TMO?
The TMO is a centralized organization responsible for everything related to testing. A great deal of effort has been poured into IT optimization, including process reengineering and the establishment of program management offices. These PMOs centralize and standardize methodology. The TMO brings these optimization principles and benefits to software testing.

Ideally, the TMO reports directly to the CIO or CTO, and has authority over the definition, execution and reporting of all application and systems quality testing and metrics for the enterprise. Specifically, the TMO defines the testing methodology, selects and procures testing tools, identifies acceptable standards of software performance, and oversees the successful execution of testing.

Why Is It Necessary?
The number of customer touchpoints affected by software grows every day. Software bugs that cause a negative customer experience can seriously damage a company's brand and bottom line. The imperative for properly functioning software means that testing merits a direct line to senior management.

The increased complexity of software applications also is driving the need to think about testing strategically—vertically as well as horizontally. And integration is fast becoming table stakes; today's applications rarely operate in a stand-alone mode. Application interaction and interdependency create ripple effects that can be felt far downstream. Tactical, narrowly focused testing sometimes makes it impossible to find the root cause of problems. A quality review from an end-to-end process perspective is vital. This is simplified by the TMO.

With an ever-widening scope of processes and applications, a global distribution of developers and testers, and a desire to standardize technologies and platforms, the time is right for an empowered, centralized testing organization to bring it all together.

Beyond the Legacy Approach
Moving to a strategic testing approach led by the TMO requires changes throughout the enterprise and may generate challenges and opposition. Three items are critical to the TMO's success.

First, establish the division of responsibilities. Instituting the TMO means that some decisions concerning testing and quality formerly made in middle management will now be made at higher levels. This issue must be addressed upfront, with the full backing and enthusiastic support of senior management.

Second, create a transition plan. Incorporate the viewpoints of knowledgeable testing resources to make the plan as effective as possible. Should you transition by application, platform or process? What's the timeline? Remember to test the plan—the transition phase shouldn't be an excuse for a lapse in quality.

Finally, take a long-term view. In the short run, implementing a TMO may seem frustrating for some and add layers to the process. Stay focused on the goal: optimizing organizational processes.
Benefits of Strategic Solutions
Companies that adopt this strategic model report improvements in process, cost savings and quality. The TMO establishes testing consistency across all software applications, leverages and consolidates resources, and encourages best practices enterprise-wide. As importantly, the TMO promotes communication and teamwork, limiting the finger-pointing common in distributed testing environments.

Cost savings accrue from reduced head-count and from better hardware and software purchasing decisions based on centralized planning and volume discounts. Speed-to-market improves as well, because higher-quality products mean problems are eliminated before applications go live.

I believe that a strategic solutions approach to software testing anchored by the TMO is a competitive necessity today. Businesses that move quickly to embrace strategic testing will win customer loyalty and improve the bottom line—while avoiding the embarrassment and pain of that unyielding double-edged sword. ý

Mark Sloan is vice president of consulting and professional services at Convergys, which provides CRM, billing and HR outsourcing.