A BZ Media Publication
BEST PRACTICES: Testing SOA and Web Services
VOLUME 4 • ISSUE 11 • NOVEMBER 2007 • $8.95 • www.stpmag.com
JMeter and Selenium — Functional Testing’s Mammoth Matchup
A Process So Easy To Use, Anyone Can Do It
The Great Test Tool Hunt
Don't Etch Those Test Metrics Decisions in Stone
VOLUME 4 • ISSUE 11 • NOVEMBER 2007
Contents
12
COVER STORY
Take Automated Testing Out Of The Cave, Into the Future
In the 21st century, no tester can go it alone. With this framework approach, non-technical staff can lend a hand. By Jun Zhuang and Robert Carlson
16
Fresh Duo For Function Testing
JMeter and Selenium: For the best and quickest test-plan coverage, apples and oranges do mix. These two open source options, each with its own strengths, work even better together. By Alan Berg
28
Test Tool Shootout
When security’s at stake, choose your weapons with care. By Elfriede Dustin
23
Winning The SOA Shell Game
Testing SOA-based apps can be like taking aim at a moving target. Even if the services you’re testing were developed inside company walls, tiny, unseen changes can wreak havoc on your automated tests. Here’s how to keep sight of the ball. By Jim Murphy
Departments
7 • Editorial Cough if you're against the spread of disease.
8 • Contributors Get to know this month’s experts and the best practices they preach.
9 • Out of the Box New products for testers.
32
Meaningful Metrics
It takes a lot more than numbers to evaluate software quality. You also need to analyze statistics for what they don’t reveal. By L.R.V. Ramana
36 • Best Practices Testing with SOA and Web services requires knocking down some walls. By Geoff Koch
38 • Future Test To mitigate the risks of reuse, an established quality process can enforce policy and promote trust. By Wayne Ariola
VOLUME 4 • ISSUE 11 • NOVEMBER 2007
EDITORIAL
Editor Edward J. Correia +1-631-421-4158 x100 ecorreia@bzmedia.com
Editorial Director Alan Zeichick +1-650-359-4763 alan@bzmedia.com
Copy Editor Laurie O’Connell loconnell@bzmedia.com
Contributing Editor Geoff Koch koch.geoff@gmail.com
ART & PRODUCTION Art Director LuAnn T. Palazzo lpalazzo@bzmedia.com
Art /Production Assistant Erin Broadhurst ebroadhurst@bzmedia.com
SALES & MARKETING
Publisher Ted Bahr +1-631-421-4158 x101 ted@bzmedia.com
Associate Publisher David Karp +1-631-421-4158 x102 dkarp@bzmedia.com
Advertising Traffic Phyllis Oakes +1-631-421-4158 x115 poakes@bzmedia.com
Director of Marketing Marilyn Daly +1-631-421-4158 x118 mdaly@bzmedia.com
List Services Lisa Fiske +1-631-479-2977 lfiske@bzmedia.com
Reprints Lisa Abelson +1-516-379-7097 labelson@bzmedia.com
Accounting Viena Isaray +1-631-421-4158 x110 visaray@bzmedia.com
READER SERVICE Director of Circulation
Agnes Vanek +1-631-443-4158 avanek@bzmedia.com
Customer Service/ Subscriptions
+1-847-763-9692 stpmag@halldata.com
Cover photo illustration by LuAnn T. Palazzo
President Ted Bahr Executive Vice President Alan Zeichick
BZ Media LLC 7 High Street, Suite 407 Huntington, NY 11743 +1-631-421-4158 fax +1-631-421-4130 www.bzmedia.com info@bzmedia.com
Software Test & Performance (ISSN- #1548-3460) is published monthly by BZ Media LLC, 7 High St. Suite 407, Huntington, NY, 11743. Periodicals postage paid at Huntington, NY and additional offices. Software Test & Performance is a registered trademark of BZ Media LLC. All contents copyrighted 2007 BZ Media LLC. All rights reserved. The price of a one year subscription is US $49.95, $69.95 in Canada, $99.95 elsewhere. POSTMASTER: Send changes of address to Software Test & Performance, PO Box 2169, Skokie, IL 60076. Software Test & Performance Subscribers Services may be reached at stpmag@halldata.com or by calling 1-847-763-9692.
Ed Notes
Fighting a Spam Insurgency
By Edward J. Correia

My company recently installed a spam-filtering system on top of the one that came with our e-mail system. The IT folks apparently had some trouble during the installation. At least, that’s what they said when we complained about strange network behavior that day. There were service disruptions to our local servers and our Internet connection, and I saw no reduction in the amount of incoming spam I receive, which is easily 100 pieces an hour.
I am told the problems have been fixed, but I’m not so sure. The network is still pretty flaky, and I haven’t seen any of the spam-filtering functionality I was told to expect from the new solution. The IT department is still pretty slow about returning my phone calls and e-mail inquiries.
As much as I dislike service disruptions, I resent more the reason that we have to put up with this particular one. Much as I’ve tried, I can’t seem to play Pollyanna’s Glad Game with spam—there just isn’t any sensible reason for this evil disease. Don’t these people realize that no one falls for these lame-brain offers with half the terms misspelled or with dollar signs instead of Ses? Would you apply for a loan using an address you received via e-mail? I sure wouldn’t.
Which brings me to my next point. If these spammers are out there, which they most assuredly are, then there must also be people testing their spam to maximize its penetration of the various filters on the market. So it should follow, then, that some of those testers might even be reading this column right now.
If that’s the case, I’d like to offer a suggestion: Learn a new trade. Attend a software testing conference and learn how to test legitimate applications, because your days are numbered. Someday soon, I won’t say when, the stars will align and someone will figure out a way to stop your employers from doing their dirty business.
David Berlind last month offered a suggestion on his blog (blogs.zdnet.com/Berlind) that I think would get us closer. His idea was for AOL, Google, Microsoft and Yahoo, the world’s four largest e-mail providers, to standardize on a single anti-spam strategy for their e-mail systems—a de facto standard—that other e-mail developers and filter makers would have to follow to remain compatible.
Because in addition to the false-positive problem we users face for incoming mail, a problem that is at least somewhat within our control, Berlind decries the issue of deliverability of outbound mail, over which we have absolutely no power. “It is probably true that if everyone in the world ran just one solution, we’d be able to tweak that solution in such a way that we’d finally get a handle on the inbound and outbound problems associated with spam,” wrote Berlind in his Oct. 15 post.
So for the testers of spam, I have a few questions. Why do you think this work is worthy of your time? Are you aware that your employer causes millions of innocent people to lose hundreds of millions of dollars and/or hours of productivity every year, multiplied when employers lose that productivity?
And just one more question, Mr. or Mrs. spam-tester person: If you know all that, how do you sleep at night?
Contributors
JUN ZHUANG is senior QA engineer at MedPlus, a Quest Diagnostics company that designs medical devices and systems. He has been working in software testing and automation since 2000 and has experience with numerous automation tools and approaches.
ROBERT CARLSON is manager of test automation and load/performance testing at MedPlus. He has been involved with software quality since 1995, and was involved in the design, implementation, support and build-out of various automation approaches across multiple technologies and industries.
The two teamed up to explain a framework they built for MedPlus that simplifies automated testing for non-technical staff. Our cover article begins on page 12.
The author of numerous articles and papers on software development and testing, ALAN BERG once again employs his interesting style in this tutorial on how to automate functional testing using the popular open source tools JMeter and Selenium. Turn to page 16 for his article. Berg is the lead developer at the Central Computer Services at the University of Amsterdam, a post he has held for the last seven years. He holds a bachelor’s degree, two master’s degrees and a teaching certification.
JIM MURPHY is lead architect and co-founder of SOA solutions provider Mindreef, and brings more than 10 years’ experience designing, implementing, testing and debugging distributed software architectures using Java, .NET, C++ and XML. Since SOA apps often include services created and controlled by companies other than your own, Murphy teaches you how to automate your SOA tests for monitoring and validating in-house and third-party services to stay on top of changes that could affect your SOAs. His article begins on page 23.
Myriad testing tools exist to help automate the process of securing applications. But which ones are right for your situation? Beginning on page 28, QA consultant ELFRIEDE DUSTIN summarizes the different types of tools available and the requirements met by each, and maps those tools to specific vendors. Author of “Effective Software Testing” (Symantec Press, 2006) and a number of other books on software security, Dustin is an independent software testing and QA consultant in the Washington, D.C. area.
Failure to understand and analyze metrics can lead to wrong conclusions about project health. At minimum, such conclusions can result in confusion; at worst, a loss of project control, delayed deliveries, poor quality, rework and cost overruns. Beginning on page 32, learn how to dive below the surface of your test metrics with L.R.V. RAMANA, senior test manager at Virtusa, a development and testing consultancy in Hyderabad, India. He has nine years of experience in software testing, test management and fine-tuning of test processes.
TO CONTACT AN AUTHOR, please send e-mail to feedback@bzmedia.com.
Out of the Box
Sound the Fanfare: iTest Team Can Automate Analysis Results of Any Test
With iTest Team, test outputs (in the response panes) are mapped to rules that determine whether tests pass or fail (in the execution pane).
If you’re using record-and-playback tools to help automate device function and regression testing, you’ve probably wished the tools also could analyze your test results. According to the Fanfare Group, your wish has come true. The company in late October began shipping iTest Team, a version of its test setup, capture and documentation tool that it claims can now process test outputs and indicate where tests pass and fail.
The new release builds on iTest Personal, the capture tool first shipped in May that permits the creation of test scenarios, a type of test script that can be sent to remote testers, developers or automation teams for reference, editing and reuse through a drag-and-drop interface. Fanfare founder and CTO Kingston Duffie said that iTest Personal is useful mainly for getting a tester to the starting point of testing. “iTest Team’s primary objective is to take you to the next step—to define what the test is and to define the criteria that determines pass and fail.” In short, he said, the goal is total automation. “This will perform lights-out testing on its own for weeks,” he said.
The newest module, and the one with Fanfare’s secret sauce, is the automatic mapping tool that Duffie said works by normalizing data from disparate systems—such as SNMP, HTML and CMD shell sessions—into XML, which is processed and displayed in a point-and-click environment. Once these outputs
are mapped, there is never a need to revisit them for test execution. “You define test data one time,” said Fanfare Vice President of Marketing David Gehringer. “The alternative is to have highly skilled Tcl or Perl scripters writing 10 lines of
code to find one value.” Such scripts require maintenance and are easily broken. “Our solution requires no scripting whatsoever,” he said, but added that custom scripts can be used, if desired. “If you have people that live and breathe Tcl, they can still use them, but they’ll be less brittle and require less maintenance.” Gehringer described a simple process for automating analysis of test pass/failure using the iTest Team GUI environment. “You simply go to a failure, right-click and add a rule. There’s no manual parsing of test outputs,” he said. “The easiest part of automation is sending commands,” added Duffie. “The hardest part is figuring out what determines a pass or a fail.” Duffie contrasted iTest Team with static code analyzers and unit test tools, which he said play an important role, but are just parts of the total quality picture. “It’s all well and good to say that code is correct, but the biggest impact comes in the field, when software doesn’t meet feature requirements.” iTest Team also permits testers to kick off tests with multiple threads. “So if someone wants to kick off a thread that puts a load on a system and then test another part of the application to see how it performs, they can do that easily. They just click a check box next to the processes they want running,” explained Duffie. Available now for Linux, Solaris and Windows, iTest Team subscription pricing starts at US$6,500 per year plus the cost of optional add-ons for Web testing, SNMP, test equipment, Tcl rendering and response mapping. An Eclipse plugin also is available. www.stpmag.com •
My Tool’s Faster Than Your Tool If you’re using Oracle’s tool to unload database tables for testing on Windows, there’s a company claiming to offer an alternative that’s twice as fast without gobbling up huge chunks of system resources. Innovative Routines International (IRI) in September unveiled a Windows version of Fast Extract, claiming the ability to unload “large Oracle tables in parallel to portable flat files in fixed- or variable-length formats,” while using all supported combinations of SQL SELECT features. The company claims a data rate of twice that of Oracle’s spool command. Perhaps more of interest to developers building ETL streams, the tool creates layout metadata of the extracted files using Oracle’s SQL*Loader utility for bulk loads and CoSort’s SortCL language for transformation, protection, reporting and preload sorting. The metadata also helps increase data warehouse performance and database availability, the company said.
SOAtest 5.5 in Line With Team System With the October release of SOAtest 5.5, Parasoft adds support for the Windows Communication Foundation and other protocols to the services testing suite, allowing testers of .NET-based applications to flex the AUT’s messaging in a multitude of open and proprietary protocols. SOAtest 5.5 also automates the creation of intelligent stubs, letting testers “emulate the behavior of a running system” to test services in the context of their actual behavior rather than on a live system. According to the company, the suite for Linux, Solaris and Windows also now is fully integrated with Visual Studio Team System for Software Testers, and able to receive test results directly within Microsoft’s environment.
Original Software, Unoriginal Features
A version of TestDrive-Assist released in September by Original Software adds link- and spell-checking capabilities. The tool also includes a markup function, allowing testers and developers to annotate any area of the program under test, the company said.
The link checker reportedly “checks and flags potential problems [and] creates an annotated list of many different link errors,” including 404 (server not found) errors, time-outs and redirection messages. The spell-checker currently supports English, Spanish, Dutch and custom terms.
The markup function “records and creates an annotated list of comments and details the corrected actions.” Changes to text, images and formatting appear as markup or highlighting, and can be accompanied by comments. TestDrive-Assist can stand alone or work with Original’s regression testing and code-free scripting tools.
Coverity Gets Boolean on Code
Claiming to break barriers in the realm of static code analysis was Coverity, which in September released a software analysis engine based on Boolean satisfiability that’s now part of its Prevent SQS software quality system for C/C++ and Java. Satisfiability is generally defined as the task of figuring out whether the variables of a particular Boolean formula can be assigned values that make the formula resolve as true; if no assignment can make the formula true, it is unsatisfiable.
According to the company, the new False Path Pruning Solver is unlike static analyzers that use dataflow analysis and multiple checkers. The so-called SAT engine uses a new technique that “creates a bit-accurate representation of a software system where every relevant software operation is translated into Boolean values (true and false) and Boolean operators,” such as and, not and or. This enables solvers to analyze complex defects with fewer false positives by determining if paths to potential defects are feasible and “pruning” infeasible results.
“Bringing SAT’s proven capabilities to static code analysis will provide developers with an arsenal of new solvers that uncover the toughest code defects,” said Coverity CTO Ben Chelf in a statement. “By leveraging technology that automates the accurate detection of defects, developers can
stop wasting time tracking down bugs.” Next year the company says it will release additional solvers for checking code assertions statically and for detecting critical bug categories, such as integer overflows. Pricing is based on project size.
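To make the satisfiability definition above concrete, here is a small stand-alone illustration — it is not Coverity code, and the formula is invented for the example. It simply brute-forces every assignment of three Boolean variables and reports whether any assignment makes the formula true.

// Stand-alone illustration of Boolean satisfiability (unrelated to Coverity's engine).
// Brute-forces all assignments of a, b, c against a sample formula.
public class SatDemo {
    // Example formula: (a OR NOT b) AND (b OR c) AND (NOT a OR NOT c)
    static boolean formula(boolean a, boolean b, boolean c) {
        return (a || !b) && (b || c) && (!a || !c);
    }

    public static void main(String[] args) {
        boolean satisfiable = false;
        for (int mask = 0; mask < 8; mask++) {          // 2^3 possible assignments
            boolean a = (mask & 1) != 0, b = (mask & 2) != 0, c = (mask & 4) != 0;
            if (formula(a, b, c)) {
                System.out.printf("Satisfiable, e.g. a=%b b=%b c=%b%n", a, b, c);
                satisfiable = true;
                break;
            }
        }
        if (!satisfiable) System.out.println("Unsatisfiable: no assignment makes the formula true.");
    }
}

A SAT-based analyzer works on the same principle, but at industrial scale: if no assignment can make the path condition leading to a suspected defect true, that path is infeasible and can be pruned.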
GammaTech Has The Sonar Also in the source-code analysis business is GammaTech, which in September unveiled CodeSonar Enterprise, claiming it permits managers and test teams to collaborate on C/C++ code analysis results across the company. Users can limit their views by relevance, module or warning class. Warnings can be annotated and tagged with state information, giving managers quick access to fix status. The tool also stores historical information, enabling trend reports and detailed sorting. According to the company, the tool for Linux, Solaris and Windows also now includes a more accurate static-analysis engine, better performance of examination times through parallelization of some of its analysis phases and better support for embedded compilers. CodeSonar is set for release by the end of 2007; pricing for small projects will start at $4,000.
QNX Opens SMP RTOS Source Code If you’re working on applications running on the QNX Neutrino embedded real-time operating system, testing recently has been getting easier through greater access to kernel source code. QNX Software Systems has now added the source code of symmetric multiprocessing capabilities of the core RTOS to its list of published intellectual property. Under its recently introduced hybrid license, Neutrino developers and testers can see, extend and modify the source code, and aren’t required to return those changes to the community. Royalties still apply to designs that incorporate the RTOS, and commercial developers must still pay for Momentics development seats. The company also recently launched Foundry27 (www.foundry27.com), a NOVEMBER 2007
community portal and support site where developers and testers can find downloads, projects, ideas, discussion boards and other helpful resources. In its third decade in business, QNX in 2004 was acquired by car audio and telematics systems maker Harman International Industries, which kept it as an independent unit.
Update to Micro .NET Framework
Also on the sonar for embedded testers should be Microsoft, which published .NET Micro Framework 2.0 Service Pack 1, with enhancements it says help streamline deployment of applications and updates to devices in the field. SP1 also now better protects devices from unauthorized installations by preventing the deployment of unsigned firmware and application code. Other new tools include a bitmap font generator that “gives developers increased flexibility in interface design and provides increased localization support,” the company said in a statement. Several device makers announced support last week for the .NET Micro Framework, including Atmel, Embedded Fusion and NXP Semiconductor.

Get Requirements From Word, Excel
Application requirements are often distributed across hundreds or even thousands of documents stored on employee hard drives. Aware of this reality is Compuware, which in October released Optimal Trace 5, a version of its requirements management solution that it says can import text from Microsoft Word and Excel, or any document or file structure, into its repository and make the data useful for determining business intent. Optimal Trace 5 also reportedly allows fields to be defined at any level, including by project, package and requirement. A customizable requirements-capture system has a simplified list-style concept.

Altosoft Sings the Praises Of Monitoring in Insight 2.1
Improvements to Insight’s drag-and-drop dashboard environment simplify creation of application monitors and offer more control over user roles and privileges, the company claims.
Business intelligence-solutions developer Altosoft on Oct. 29 began shipping Insight 2.1, an update to its BI platform that simplifies the creation, security and maintenance of performance dashboards, and improves performance of historical data analysis, the company said. Insight is built around an analytics engine for Windows XP, 2003 and Vista that combines real-time event processing, historical data analysis and application process monitoring. The solution is designed to deliver information about business activities for operational forecasts, alerting and incident management.
According to the company, Insight 2.1 improves the drag-and-drop environment for creating dashboards, including the addition of new charting options and navigation, formatting and other predeveloped graphical components. The company also reports better control over user roles and privileges, including the ability to control drill-down and dashboard appearance and layout.
Performing historical analysis using Insight has never required a data warehouse. The company now says that retrieval speeds have been improved by as much as 120 percent, thanks to an optimized analytical-calculation engine capable of analysis across multiple databases of multiple terabytes in size. This gives rise, according to company claims, to higher throughput and faster calculation of key performance indicators, central factors for its business intelligence-delivery capability. The incident management module now offers more options for tracking, prioritizing and resolving incidents and exceptions, and can trigger alerts and create incidents based on real-time forecasts generated by a capability Altosoft calls predictive process analytics. Pricing was not disclosed.

Send product announcements to stpnews@bzmedia.com
By Jun Zhuang and Robert Carlson
Test automation typically relies on a select group of technical people within the QA organization for development and
maintenance. This dependence can be magnified when an automation framework is used. Although beneficial, frameworks can make it difficult to involve non-technical staff effectively in the automation process. What’s needed—and what we’ve developed—is an automation framework approach that is suited to bridging this gap. Our approach captures both test data and test logic in a format that’s easily managed and consumed by both technical and nontechnical personnel. This increases the QA staff’s ability to actively participate in the automation process, decreasing the time required for automation projects. At the same time, fundamental elements of modularity and data reuse required to support a robust, low-maintenance automation solution are fully realized.
Jun Zhuang and Robert Carlson work for MedPlus Inc., where they develop and test software for medical devices.

What’s Required
A key challenge for software test automation is determining how to automate hundreds or even thousands of existing manual test cases in a relatively short time frame without introducing
too much maintenance overhead. Simple record and playback attempts often fail because of their inherent disorganization and ever-increasing maintenance demands over time. Using a function-based approach requires less maintenance, but introduces new complexities and severely reduces the ability of non-technical personnel to participate in the process. An action/keyword-driven approach opens the automation to staff outside the technical circle, but requires significant overhead in documentation, management and coordination to keep the growing system from falling into disarray.
This article presents a framework-based approach to automation that is superior to these common strategies. This intuitive methodology results in a well-organized, scalable framework that serves as a strong foundation for automation efforts. Maintenance is minimized as a result of high code and data reuse. In this data-driven approach, both the test data and logic are captured in a logical, organized manner that enables non-technical staff to make significant contributions, increasing the practical capacity available to build up the automation.
Modular and Data-Driven
To accomplish a task in an application, a user typically follows a specific set of steps in a specific sequence with specific input—commonly referred to as a workflow. For this article, we’ll focus on a simple workflow of deficiency analysis in the context of an electronic patient record–management system. In a medical setting, a complete chart without deficiencies is a requirement for billing. A deficiency analyst is responsible for checking patients’ medical records for completeness and for adding markers to patients’ charts to indicate deficiencies. To test this workflow, we incorporated the creation of a chart with documents on which deficiency markers are placed. Basic workflow is:
Create Patient Record → Add Documents to Patient Record → Add Deficiencies
For each of these three unique activities, a step-level driver function is created. In this case, the following functions are created:
CreatePatientRecord()
AddDocument()
AddDeficiency()
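The article does not say which automation tool or scripting language the framework is built on, so the following is only a rough Java-flavored sketch of what one such step-level driver might look like; the AppDriver interface, the field names and the “Expected” column are invented stand-ins for whatever GUI-automation API and child-data-table layout a real implementation would use. It does, however, show the two contracts the approach relies on: each module consumes one row of its child data table, and every module returns the application to the common landing page.

// Hypothetical sketch only -- the tool, method names and column names are assumptions.
import java.util.LinkedHashMap;
import java.util.Map;

interface AppDriver {                        // stand-in for the GUI-automation tool in use
    void navigateTo(String page);
    void setField(String field, String value);
    void click(String control);
    boolean isTextPresent(String text);
    void returnToLandingPage();
}

class PatientModules {
    private final AppDriver app;
    PatientModules(AppDriver app) { this.app = app; }

    /** Step-level driver: consumes one row of the Add Patient child data table. */
    boolean createPatientRecord(Map<String, String> row) {
        try {
            app.navigateTo("Add Patient");
            app.setField("Last",  row.getOrDefault("Last", ""));
            app.setField("First", row.getOrDefault("First", ""));
            app.setField("Acct#", row.getOrDefault("Acct#", ""));
            app.click("Save");
            // Positive and negative cases alike: the row states which outcome to expect.
            return app.isTextPresent(row.getOrDefault("Expected", "Patient created"));
        } finally {
            app.returnToLandingPage();       // every module ends at the common landing page
        }
    }
}

class ConsoleDriver implements AppDriver {   // trivial stub so the sketch can be exercised
    public void navigateTo(String p)         { System.out.println("goto " + p); }
    public void setField(String f, String v) { System.out.println(f + " = " + v); }
    public void click(String c)              { System.out.println("click " + c); }
    public boolean isTextPresent(String t)   { return true; }
    public void returnToLandingPage()        { System.out.println("back to landing page"); }
}

public class ModuleSketch {
    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Last", "Doc"); row.put("First", "John"); row.put("Acct#", "ABC-123");
        System.out.println("passed = " + new PatientModules(new ConsoleDriver()).createPatientRecord(row));
    }
}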
We refer to these drivers as modules. They’ll become the key building blocks for our test cases in this approach. Similarly, other modules can be created for other functionalities of the medical NOVEMBER 2007
FRAMEWORKS FOR AUTOMATED TESTING
bine these workflows to meet general test objectives. Consistent with this framework’s data-driven approach, a top-level master data table is created to pull together all areas of test. This table is referred to as the test case manager. A generic top-level driver script is also created to process this table, directing testcase execution to execute desired processing modules, according to the data table. Next, let’s examine the relationship among data tables and the rules governing their format.
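A minimal sketch of that top-level processing rule, again in illustrative Java rather than any specific tool: it assumes the master-table row has already been read out of the Excel workbook into an ordered map of column header to cell value, and it walks the columns left to right, dispatching every non-empty cell to the driver registered for that column. The class and driver names are invented for the example.

// Illustrative only: the real framework's tool and APIs aren't specified in the article.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

public class MasterTableProcessor {
    // One driver per master-table column; each consumes the cell value (a child-table row reference).
    private final Map<String, Consumer<String>> drivers = new LinkedHashMap<>();

    public void register(String columnHeader, Consumer<String> driver) {
        drivers.put(columnHeader, driver);
    }

    /** Processes one master-table row: left to right, skip empty cells, run the matching driver. */
    public void runTestCase(Map<String, String> row) {
        for (Map.Entry<String, String> cell : row.entrySet()) {
            String ref = cell.getValue();
            if (ref == null || ref.trim().isEmpty()) continue;     // empty cell: task not executed
            Consumer<String> driver = drivers.get(cell.getKey());
            if (driver == null) continue;                          // unknown column: ignore
            try {
                driver.accept(ref.trim());                         // e.g. "2" = row 2 of the child table
            } catch (RuntimeException e) {
                System.out.println("Task '" + cell.getKey() + "' failed: " + e.getMessage());
                // recovery hook: reset the application to the common landing page, then continue
            }
        }
    }

    public static void main(String[] args) {
        MasterTableProcessor p = new MasterTableProcessor();
        p.register("Add patient",    ref -> System.out.println("Add patient, child row " + ref));
        p.register("Add document",   ref -> System.out.println("Add document, child row " + ref));
        p.register("Add deficiency", ref -> System.out.println("Add deficiency, child row " + ref));

        Map<String, String> testCase = new LinkedHashMap<>();      // one row of the master data table
        testCase.put("Add patient", "2");
        testCase.put("Add document", "3");
        testCase.put("Edit document", "");                         // empty: skipped
        testCase.put("Add deficiency", "2");
        p.runTestCase(testCase);
    }
}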
FIG. 1: MASTER TABLE PROCESSOR Test case starts
Retrieve test case data from master data table
Column 1 empty?
Y
...
Y
N
N
Column n empty?
Y
Direct execution to driver function 2
Direct execution to driver function n
Driver function 1 retrieves data from child data table 1 and processes the data (drives the app)
Driver function 2 retrieves data from child data table 2 and processes the data (drives the app)
Driver function n retrieves data from child data table n and processes the data (drives the app)
Data Separate From Code Each module is responsible for processing data required to accomplish its designed purpose within the application • Software Test & Performance
Test case ends
N
Direct execution to driver function 1
record system to complete deficiencies, edit documents, send messages and perform other tasks. Each module should be designed to handle both positive and negative cases, with appropriate exception handling, since the collection of these modules will provide the foundation for building automated test coverage across the entire area of deficiency analysis. To test the workflow identified, the developed modules must be executed in sequence. This sequence of execution requires a reliable transition from the final state of one module to the beginning state of the next. Either transition states must be stored and transferred across modules or all modules must return to a common state. We chose the latter approach. For simple applications, we developed all modules to return to a single common landing page. Generally, the log-in page or the first page after logging into the application is a good choice. For more complex applications, a landing page can be selected for each major area of functionality and used in similar fashion. Once processing is completed for each area, execution returns to the general application landing page—again, most likely a log-in or home page in the application. While the landing-page approach may introduce some extra motion compared to managing transition states, we’ve found it to be reliable, easy to maintain and easy to apply across all modules.
14
Y
Column 2 empty?
Data Table Structure and Rules
under test (AUT). For example, the CreatePatientRecord() function will: • Read in data from an Excel spreadsheet in a data table • Navigate to the Add Patient page • Fill in corresponding fields on the Add Patient page • Click the Save button to create a new patient record • Verify patient created or not created, depending on the test being performed Data and code are separated in this design. Test data is always contained in data tables, never directly coded in
Data consumed by this automation framework is provided in Excel spreadsheets referred to as data tables. They obey the following rules: Overall data table structure is two (or three at most) tiers deep. • The two-tier structure has a master data table and a single layer of referenced child data tables. This is the recommended approach unless a specific need is identified to introduce the three-tier structure. • For a three-tier structure, a child data table may reference another, deeper-level child table. This is generally done to reduce overall complexity of the tables, or to
FIG. 2: TEST CASE EXECUTION
Test case starts Log into the app
Add Patient Go to ‘Add Patient’ page Fill in patient info Save patient Return to common landing page
Add Document Go to ‘Chart Editor’
Add Deficiency Go to
‘Deficiency Analysis’
Retrieve patient chart Retrieve patient chart
Add document Add deficiencies Return to common landing page
module functions. Changes to the underlying code can be made transparently, so long as the data is still processed according to test needs. Similarly, the data content can be varied and even extended to meet new test requirements. After creating the modules to support the automation of various workflows, a means must be supplied to com-
Return to common landing page
Test case ends
Log out
group together common data sets (for example, detailed patient information) that are reused across multiple tasks within an area. The master data table (or test case manager) defines tasks to be performed for test cases (see the master data table in Table 1). • Generally, each row in the master data table is a test case. A test case NOVEMBER 2007
FRAMEWORKS FOR AUTOMATED TESTING
is defined by a series of tasks. These tasks must be performed in order from left to right as they appear on the spreadsheet. If a test case needs to complete a task from a right column first, it only has to be split into two or more rows. • Each column of the master data table maps to a child data table through the column header. In general, all child data tables are contained in same workbook as the master. Each child data table contains detailed data needed to complete a specific task. Master data table processing occurs according to the following: • A single row is parsed from left to right by underlying driver function. • When data is encountered in a cell, the associated task (identified by column header) is executed. • Generally, a single number serves as a reference to identify a specific row in the child data table to be processed. However, a range reference or metadata can be included, so long as the underlying module supports parsing of the data type. • If a cell is empty, the associated task isn’t executed, and processing moves to the cell to the immediate right. Each child data table contains data used to perform a specific task; for example, adding a patient (see the Add Patient Child data table in Table 1). • Usually, each column on a child table maps to a field on a specific page within the application under test, or to a practical action to be taken on a page. For example, the Add Patient data table has an Action column to indicate whether to add a new patient or edit an existing one, since the same window is being used for both purposes. • Each row contains the required data to accomplish a task in the application. This may include validation data, notes, etc. Additionally, a child data table may drive processing across multiple pages in the AUT—this can help to reduce the number of data tables when data from multiple pages is needed for a task and the amount on each page isn’t very large. • A driver function must be created for each child data table to process the data. This function is also responsible for bringing up the NOVEMBER 2007
TABLE 1: SAMPLE DATA TABLES MASTER DATA TABLE (TEST CASE MANAGER) Test case description
Add patient
Add document
Edit document
Add deficiency
...
2
3
150
2
14
Add Deficiency
ADD PATIENT CHILD DATA TABLE Row # Action 2 Add
Last Doc
First John
Middle
Sex M
DOB
Acct# Street ABC-123
State
Zip
...
ADD DOCUMENT CHILD DATA TABLE Patient acct#
Row#
...
Document type Blood Test
2 ABC-123
3 4
ADD DEFICIENCY CHILD DATA TABLE Row#
Patient acct#
Document type
Deficiency type
Assigned user
2
ABC-123
Blood Test
Signature
Smith, John
page to which the child data table is mapped and setting the application to the common landing page after processing the data.
A Practical Example Using our automation tool, we created a master driver function to process the data in the master data table. This highlevel driver function requires practically no business intelligence and has a robust recovery system to set the application to the ready state in case of test case failure. It operates as shown in Figure 1. Figure 2 illustrates tracing through the driver processing for the data tables in Table 1.
Benefits of the Approach This approach yields a framework model that is intuitive to design, build and maintain. Well-designed data tables not only organize the system, but provide an element of self-documentation. Separation of code and data offers a layer of abstraction that increases the underlying code’s flexibility and resilience. High code reuse at the module level further minimizes maintenance. Many places in the system are appropriate for generic or templatebased approaches, further extending
Due date
...
the ability to reuse code and data. With this approach, domain engineers (manual testers, business analysts, etc.) can play a substantial role in the design and build-out of test automation assets. With top-level processing logic and test data contained in easily accessible Excel spreadsheets, we can now tap the knowledge of these engineers practically. This provides a tangible boost to resource capacity available to build the automation and further ensures that automated tests accurately reflect test intentions. Our experience indicates that such personnel were able to independently create and execute tests after only a four-hour training session. This approach can be scaled to meet a variety of test needs. A minimal framework can be created quickly to cover a small application or specific area of a larger system. In this case, all data sheets can be designed and contained in a single compact workbook; or a specific test need, such as boundary testing, can be met with minimal adaptations. Best of all, this approach has proven practical for large-scale automation—across all areas of complex applications, and even across system boundaries. ! www.stpmag.com •
15
Mixing Apples And Oranges Is Good for Fruit Salad— And for Automated Function Testing
By Alan Berg
Every programmer I know has a favorite program editor. Some programmers like vi; others are partial to Emacs. Personally, I’m addicted to Eclipse.
Could the same loyalty factor be at hand in automatic assertion testing for application functionality? Functional tests involve sending requests to systems and checking the responses for correctness. Selenium and JMeter are two tools that can perform these automatic functional tests against Web applications. They are open source and free. JMeter (http://jakarta.apache.org/jmeter) can generate complex test plans, monitor and report from a large range of options, and validate results via a range of assertion types. Therefore, this well-designed tool is a natural for load testing and performance analysis of Web services and applications, as well as live assertion testing of various types of data connections. Selenium is a set of tools that enable automated functional testing via control of Alan Berg is the lead developer at the University of Amsterdam.
16
• Software Test & Performance
Web browsers. The two main tools are Selenium Remote Control and the betterknown Selenium IDE (www.openqa.org /selenium-ide/), an IDE and extension set for running tests within a Firefox 1.5 or 2 browser. Selenium Remote Control works with a wide range of browsers and enables programmatic testing. However, here I’ll focus on the Firefox embedded IDE version. Because of its popularity, elegant simplicity and intuitive clarity, the IDE is the simplest to describe and offers the easiest approach to generating tests by hand rapidly. Now, let’s compare and contrast these two well-known tools and explore how to use them together for the best and quickest coverage of relevant test plans.
JMeter vs. Selenium JMeter, which is built in Java, has expanded over time to include databases, LDAP, FTP, NOVEMBER 2007
SOAP and JMS. I’ve used JMeter often to stress-test infrastructure and watch for performance bottlenecks. It can enact automatic assertion tests while simultaneously stressing applications. JMeter offers a broad range of assertion possibilities. It measures information, such as whether specific strings or regex expressions exist in the returned HTML, or if the size of the response is within predefined limits. Also testable are response-time ranges and the correctness of a returned XML or HTML packet. For an in-depth look at using JMeter as a stress-testing tool, including the mechanics of its installation, see my article “Stress Testing: Take an Open Source Approach” in the Jan. 2006 issue (http://stpmag.com/backissues2006.htm). While JMeter is a stand-alone application, Selenium sits inside the Firefox Web browser as a plugin. This enables Selenium to record functional tests as the user browses the application. JMeter has a proxy server that can intercept and record requests, but it sits outside the browser and doesn’t fully understand the nuances of client-side scripting languages. NOVEMBER 2007
Selenium understands JavaScript and can send the correct input to pop-up windows and assert against confirmation dialogs. JMeter has little direct understanding of JavaScript applications, limiting its usefulness for script-rich applications. JMeter has broader capabilities than Selenium. It combines stress- and assertion testing with an expandable framework. Its wellestablished, active development community adds new assertion tests in each new version. Further, JMeter can work with a number of threads at a time, and can hit the infrastructure hard while performing basic functional tests under load. Does the response take too much time? Is the size of the response incorrect? Is the returned information badly formed? JMeter can answer these questions and, if finely tuned, can place boundaries around the acceptable behavior of complex infrastructures under realistic production loads. Selenium, on the other hand, understands JavaScript better than JMeter does. Assertion tests against confirmation boxes are more www.stpmag.com •
17
FRESH FUNCTION TEST COMBO
viable. Selenium is also intuitive and obvious, the easier of the tools to learn. JMeter caters to session management via the cookie manager and certificate management. It allows large sets of data such as username and passwords to be stored as files in the bin directory and used to pump the parameters into threads. This allows for a large number of simultaneous and highly personalized thread requests to any given Web application. Selenium scores well with its easy-tomanipulate plans. JMeter JMX plans are
address http://localhost so that no information escapes onto the Internet. The application is a simple JavaScript affair. When you press the button found on the index page, a confirmation dialog should pop up asking if it’s OK to redirect. Agreeing with the confirmation redirects you via JavaScript to the message found in good.html. Canceling the confirmation gets you redirected to bad.html. Since bad.html doesn’t exist, the server sends a 404 error message back. good.html also has a link back to the index page for easy navigation.
FIG. 1: JMETER TEST PLAN RECIPE
in XML format and require an initial cost in time and energy to understand before you can successfully manipulate them. JMeter also has issues when working with session management of certain software applications (which shall remain nameless). The next sections will introduce you to the basic operation of both JMeter and Selenium and enable you to perform basic assertion testing from the start.
The Web Application A prerequisite to testing a Web application is to have a Web application to test. Listing 1 and 2 (available for download at http://stpmag.com/downloads /stp-0711_berg.zip) make up a simple Web application. Place the files in the main directory of your test Web server; otherwise, you may need to modify the URLs mentioned in the tests later. I recommend testing under the safe loopback
18
• Software Test & Performance
Client-side scripting languages such as JavaScript are used by Web applications that wish for immediate responses from servers. However, to maintain session information, Java application servers store state and program-specific attributes in cookies. Therefore, fastresponding client-side applications come at the cost of simplicity.
JMeter Basics To run JMeter, you need to have Java 1.4.x installed and the current version of JMeter, which at press time was 2.2. Once downloaded, expand the archive file, enter the bin directory and run the JMeter script. The JMeter client will appear. If you’re new to JMeter, follow the instructions to build your first assertion test against the application mentioned in the listings. Note the simplicity and strength of the tests and the speed with which you can create them. As a conven-
ience, the test plan is downloadable at AssertionResults.jmx (http://stpmag .com/downloads/stp-0711_berg.zip). JMeter has a plan—a test plan that tells the client application what to do. The test plan is composed of elements, some of which are allowed to reside in other elements and modify the behavior of their parents. For our recipe, the test plan contains these four elements: • The top-level element of all test plans is normally the thread group that tells JMeter how many concurrent threads to run. The practical limit of threads depends on how the plan’s weight and the strength of the client machine(s). From practical experience, 150 threads is sufficient to run against a static HTML site, and 50 threads for dynamic situations where processing of results is required. • As a child of the thread group, the HTTP request element makes an HTTP request to the Web server. • A potential child to the HTTP request is the request assertion. This assertion element is responsible for testing the HTTP responses against a set of test-defined patterns. Applying Perl 5–like regex expressions can be powerful because they make complex pattern-matching possible. • Assertion results, as a child of the thread group, listens for the responses and reports failed assertions. Our first simplistic recipe is as follows: 1. Right-click the Test Plan element seen just below the top-left corner of the JMeter main screen and then add a thread group. 2. Right-click the thread group and choose Sampler/add HTTP request 3. For the properties of the HTTP request, fill in the following two details: a. Server name or IP: local host b. Path: Index.html 4. Right-click the HTTP request and choose Assertions/Add Response Assertion. 5. In the response assertion dialog, press the Add button and add the text “Hello World” to the blank box that will appear. 6. Repeat, adding the text “Test NOVEMBER 2007
FRESH FUNCTION TEST COMBO
Form.” Figure 1 shows the results of your actions to this point. 7. On the left side of the main JMeter page, right-click the thread group and add Listener/Assertion results. 8. Press Ctrl–R to run the test. At this point, JMeter will ask your permission to save the plan. After running, view the assertion results. The results should be similar those in to Figure 2. Notice that JMeter didn’t mention the assertion that succeeded—only the failure: text expected to contain /Hello World/
To erase all the generated results, press Ctrl-E. Let’s revisit the assertion patterns and try the following new patterns. Note: JMeter will report the first assertion that fails. Therefore, to see the newer assertions fail, you’ll need to remove the older ones. In the case above, the “Hello World” assertion needs first to be removed. Add the following pattern: <title>.*</title>
This will make sure that there’s a title in the page. The full stop contained in the pattern tells JMeter that any character is acceptable. The * is a regex and implies that JMeter should search for one or more characters. Basically, the whole line implies search for all the characters contained within the title tag. Pattern recognition is case sensitive, so “test form” will fail while “Test Form” will succeed. To remove this case sensitivity (always useful for long sentences), place an (?i) before the string, as follows: (?i)test form
JMeter doesn’t stop at text assertions; it can handle a number of other potential assertion elements. For example, you can test for size of response or delay in time. These are good metrics for NOVEMBER 2007
FIG. 2: JMETER ASSERTION RESULTS
warning about weak points of the infrastructure for a given stress test. You also can apply MD5 checksums to determine if an evildoer has changed your content since you last checked the site. JMeter even addresses complex enterprise issues such as XML validation and can identify the number of invalid HTML entries in a given Web page. Testers can use JMeter with XPath to navigate XML documents and look for the correct structure and data values. But as versatile as JMeter is, it does have issues with handling JavaScript. That’s where Selenium comes in.
Selenium IDE Basics The Selenium IDE is among the most elementary of tools to set up (see
FIG. 3: WELCOME TO SELENIUM
“Selenium Setup” sidebar). Also simple is the application of its browser-driven functional test–plan generator. Selenium resides within Firefox and records events as you surf. The tester can select parts of any given Web page, such as a given piece of text, to assert against. The resulting plans can be saved as human-readable HTML documents that are straightforward to manipulate. But my favorite feature is Selenium’s ability to list all available and currently valid options during the process of typing. List completion is a simple yet elegant way to enable instantaneous selftraining. The main thing to remember before creating your own test plans is to make sure you’re at the right Web location in your browser before running any previously recorded test plans. To achieve this, browse to the right location by hand or if necessary, use extra Selenium open commands in the test plan itself. If you have any inconsistent errors in your test plan, this would be the first issue to verify. The HTML-based test plans are easy to read. To run a series of these plans is simply to create a second HTML document that links to all the test plans. The following recipe for a test plan checks the functionality of an index page and a page redirected to after agreeing with a confirmation pop-up. 1. To activate the IDE, click the Tools menu option at the top of the browser and select the Selenium IDE option. You should see a new window, as shown in Figure 3. 2. Enter the base URL for the example application www.stpmag.com •
19
The Future of Software Testing...
February 26â&#x20AC;&#x201C;27, 2008 New York Hilton New York City, NY Re gis ter N ow
A BZ Media Event
Bri a n Be h l e n d o r f
R e x B la c k
Jef f Feldstein
Rober t M ar t i n
G a r y McGraw
Al a n Pa ge
Stretch your mind at FutureTest 2008 â&#x20AC;&#x201D;
Rober t Sabourin
an intense two-day conference for executive and senior-level managers involved
Joel Spolsky
with software testing and quality assurance. Our 10 visionary
Tony Wasser m a n
keynotes and two hard-hitting panel discussions will inform
A la n Z e i c h ic k
you, challenge you and inspire you.
www.futuretest.net
FRESH FUNCTION TEST COMBO
in the main Firefox window. For example: http://localhost
3. Select and right-click the Test Form text. Firefox now presents an option to “verifyTextPresent Test Form.” After selecting that option, you should see: open / verifyTextPresent Test Form
4. Next, click the Press Me button and accept the confirmation dialog. 5. Right-click the new page and highlight the text “The Good.” Then choose verifyTextPresent The Good. 6. Save the test plan as simple.html using the File>Save Test As… option. Well done! You have now saved your first test. To run the saved plan, press the green Forward button at the top of the Selenium IDE. Test results should be similar to those shown in Figure 4. The number of passing tests is shown in green; failing tests in red. To view the HTML version of the plan, select the IDE’s Source tab. Notice that all commands reside in a single HTML table and that each command occupies a row in that table. If you feel brave enough to extend the
FIG. 4: SELENIUM TEST RESULTS
22
• Software Test & Performance
plan by hand (for instance, to add extra rows), you can manipulate the HTML directly. This is not a particularly interesting exercise for a human being, however, but the approach is valid for automatically generating test plans via scripts. If you have time, I recommend trying out this option. For more control over the plans, you may also manipulate the source via the Table tab in the Selenium IDE. For example, right-click assertConfirmation Redirect? to add another assertion to test the confirmation dialog, and then choose Insert New Command. In the Command Select box, you have a choice of all the commands available. Randomly select a few to see what they can achieve. Up until now, we’ve focused on running one plan, but you can also bundle test plans by having a second HTML page with links to all the test plans you wish to run. As you’ve previously seen for the structure of the test plans, you need to place the links to each plan in a table in the test suite HTML file; something like this: <table> <tr><td>All tests</td></tr> <tr><td><a href=”simple.html” >A simple test</a></td></tr> </table>
Firefox can run parts of itself via so-called chrome URLs (http://kb .mozillazine.org/Chrome_URLs). These URLs activate different Firefox functions and any included browser
S
ELENIUM SETUP The Selenium IDE, which plugs into Firefox, takes only a couple of clicks to install. In the most recent versions of Firefox, all you have to do is visit the link www.openqa.org /selenium-ide/download.action. After you click the link at the top of the Firefox browser, a yellow warning pane with the following comment will appear: “To protect your computer, Firefox prevented this site (www.openqa.org) from installing software on your computer.” Click the now-available Edit Options button at the top right; then click Allow; then close. Click the link again and Install. For certainty, I advise you to restart Firefox.
extensions. For example, the following URL runs the bookmark manager: chrome://browser/content/bookmarks/bookmarksMan ager.xul
If the test suite testSuite.html is in the c:\tests directory, you can activate the test runner using this URL: chrome://seleniumide/content/selenium/TestRunner .html?test=file:///c:/tests/testSuite.html
Running the test suite is simply a matter of clicking the All button (see Figure 4). You’ve now explored all the core functionality of the Selenium IDE and learned how to activate the TestRunner. With practice and patience, you’ll start to discern regular patterns of tests. Because the plans are stored in predictable HTML, it’s possible to automatically create relatively complex tests via simple scripts that generate the plans directly. You can now see that both tools have their strengths. JMeter is multithreaded and able to perform complex analysis. Selenium understands JavaScript and is easy to learn and build on. Both tools have a place in my lab, and I’ve often used both simultaneously to quickly generate reproducible failures. ! NOVEMBER 2007
Winning The SOA Shell Game How To Keep Sight of the Ball When Someone Else Is Controlling the Shells
Testing SOA-based applications can be like taking aim at a moving target. Even if the services you’re testing were developed inside
company walls, tiny unseen changes can wreak havoc on automated tests. Multiply that by 1,000 and you approach the shell game that is automated testing of third-party services.
Jim Murphy is lead architect and co-founder of Mindreef, which makes SOA testing solutions.
This article describes automation
techniques and methods I’ve developed and used effectively over many years in the field of software development and testing. I’d venture to say that no organization, at least in my experience, has nailed SOA yet, especially in the area of quality management, with all its implications and ramifications. And it
looks like most companies will venture down that new road at some point. There’s a profound difference between testing a distributed system where you control all the pieces and testing a system composed of services not under your control. External services represent components with potentially different design philosophies, business contexts, technical attributes, design goals and life cycles. Incorporating this variability www.stpmag.com •
23
Photograph by Greg Nicholas
By Jim Murphy
AUTOMATED SOA TESTING
So for the purpose of this discussion, let’s define external services as services that are owned by people not on your immediate project team. Figure 1 describes examples of server ownership domains that appear naturally as your SOA projects evolve. The services closest to you are services built within the scope your current project. Quality is handled using a traditional approach.
FIG. 1: EXAMPLES OF SOA SERVER OWNERSHIP DOMAINS
Project
Department
Company
Internal
Partner
Vendor
Internet
External
Building an Order System a` la UPS into a cohesive picture of quality is daunting. The task is particularly challenging if you have a traditional enterprise testing approach where the status quo is 100 percent determinism. Parallel test environments are the norm, and you as a tester can take hold of the requirements and track them through development and change management systems. As SOAs become broader and more prevalent, a complete test-lab approach grows more difficult to implement because you lack direct control over all services in your dependency chain, and external services have other owners. What’s more, SOA quality and our traditional view of quality are difficult to balance. Traditionally, we say that a system’s quality is determined by how well the system meets its requirements. But since a fundamental driver for SOA is agility, or resilience in the face of changing requirements, this quality measure is more dynamic than the traditional measures we’re accustomed to. There are strong business drivers motivating the use of services outside a project scope. These include the cost savings of reuse and the ability to gain value from the efforts of others. For example, services such as those offered by salesforce.com allow companies to avoid building their own CRM, which might be outside their core competence.
Whose Services Are Those, Anyway? As you move across the ownership domain continuum, you have less visibility and control of services. Department and company ownership represent things you could theoretically change, but their life cycles are probably different than yours, so any enhancements, modifications or fixes
24
• Software Test & Performance
will likely need to be performed by someone else based on their own priorities. Depending on the size of the company and its political climate, you may not have to get too far removed before internal corporate services feel as far removed as external ones. Partner and vendor services come in a variety of forms. Some are external services available via the increasingly popular software-as-a-service (Saas) model. Amazon.com, FedEx, UPS, Salesforce .com and many others have services that can greatly enhance your business processes. Software vendors offering packaged application suites have service interfaces for integration that you’ll undoubtedly use in your SOA projects. The behavior of these systems, their versioning life cycle and quality will be dictated by those vendors and your corporate update schedule—likely quite separate from your project schedule. Finally, at the end of the spectrum are services available on the Internet without a service-level agreement. These are typically low-value services that providers may change, stop supporting or abandon without notice. One would think that critical enterprise applications wouldn’t depend on a relationship with random third-party service providers, but due to the ease of connectivity of Web services and the ubiquity of HTTP, there isn’t much stopping a developer from using them. Knowing these dependencies exist is the first step in assessing the quality picture.
•
Imagine that you’re building a new order-processing system for your business. A service inventory perspective for the project might look similar to that in Figure 2. Order and invoice services are built within the scope of the example project but depend on services in a vendorprovided ERP system, a tax calculation service in a finance system owned by a separate department and the reuse of shipping services built and maintained by your department that internally use external vendor services to initiate and track shipped orders. From Figure 2, you can see that the quality of your company’s telesales application depends on the order management services you’re building, which in turn depend significantly on the quality of its dependencies. For different reasons, those dependencies aren’t under the direct control of the project. Updates to internal configuration of the ERP system or a major version upgrade could put the order management system at risk. To prevent such a disaster scenario, you need to take stock of your service inventory. Testing a service-oriented architecture implies a service-oriented approach. Your primary abstraction of interest is the service. Start by taking stock of the services involved in the project, classifying and layering them into domains. You’ll have two primary classifications: consumers and providers. These will exist across various architectural, technical
As SOAs become more prevalent,
a complete test-lab approach grows more difficult.
•
NOVEMBER 2007
AUTOMATED SOA TESTING
and ownership domains. Figure 2 shows an architectural domain model for the dependent services of the order management application, with ownership levels from the continuum overlaid in color. Assign ownership to this domain model so you know who’s responsible for service support and life-cycle decisions. This is a key aspect of SOA governance, and one that can be addressed by a service registry system. Such a registry might be as simple as a Excel spreadsheet or a sophisticated UDDI registry and repository system. Additional attributes at the service level help to prioritize and plan testing. Across the services you’re building, identify service risk, complexity, effort and dependencies. Not all services in your project require the same level of testing. Some will have simple interfaces with no side effects, while others could represent long-running transactions that change the world around them. When testing services on your project, also remember that the level of testing required may be more than you’d expect if the services will be reused in the future in a different context.
FIG. 2: DOMAIN MODEL FOR DEPENDENT SERVICES (the Telesales App consumes the Order Management System's Order and Invoice Services, which depend on an ERP Inventory Service, a Finance Tax Service and Logistics Shipping and Tracking Services that in turn call Ups.com and FedEx.com tracking services)

The Varying Aspects Of SOA Testing
Your SOA testing efforts will focus on a few key areas, as described in Table 1. The service interface is a critically important point of focus for high-quality services. The WSDL and associated XML schema documents (XSDs) define how consumers of your services will interact with it. WSDL interfaces that leak implementation details or that use non-interoperable constructs will limit the degree of autonomy of future consumers. This limits who will use the service in the future. You can increase service interface quality by establishing design policies for WSDL and XSD, as you'd do with coding standards.

Service behavior testing looks at how an individual service unit responds to various types of input, both positive and negative. This is the unit test applied to a service. The depth and style of testing depends on the nature of the service. Strive for coverage of behavior tests against all operations described in the WSDL and all combinations of XML structural options. Complex type structures in a WSDL are often marked as optional but aren't actually coded as options in the service, or an element in a request message is described as a choice of many possible options. Test everything that the contract advertises to get full WSDL contract coverage of the service. Service behavior may be heavily data oriented. Use white-box techniques (see the September 2007 issue of Software Test & Performance) for uncovering code paths that are exercised with simple request-message structures but complex data combinations. In my experience, you'll find a mix for most services that represents two dimensions of request-message variation: data elements and data structure.

Performance testing is similar to testing other systems, particularly Web applications, but there are several critical differences. First is data management. Testing a user visiting a Web site typically has much simpler test data–management requirements than a typical SOA performance test. A typical HTTP GET request requires a URL with a handful of encoded URL parameters. A typical service invocation requires dozens of structural and data elements. Also, the nature of Web site load and service load can be quite different. Web site load is generated by many humans with a browser downloading server resources every 1–10 seconds or so. Service load is generated from other systems—not humans. The level of concurrency is often much lower, the transaction throughput requirements are often higher, and the typical traffic includes many more POSTs compared to GETs.
TABLE 1: KEY SOA TESTING AREAS
Interface: Validate the service interface WSDL and all associated schema documents against technical and design policies.
Behavior: Verify that a service behaves as designed by exercising full coverage of service operations and input messages as described by the service interface contract.
Performance: Measure the response time of request processing and transaction throughput of services against performance goals.

Do You Really Have A Parallel Universe?
When performing behavior and performance testing, the most significant difference is managing dependencies. In our example, every invocation of the order services will result in calls to its dependent services, possibly creating undesirable side effects that need to be understood. The traditional approach to this problem is to stand up a completely disconnected parallel test or staging system. This will continue to be possible for your project services and applications, but will vary with dependencies. Some services will have test instances, even external third-party services.

To use test instances, you must have at minimum the ability to configure the endpoints used by service-consuming code in your systems. The often-described "Registry Lookup Pattern," in which consuming code looks up the service endpoint at runtime, is almost never used. Instead, endpoint URLs are configured in system metadata and configuration files in arcane ways. Knowing where these are, and how to point them to test instances, is critical for preventing 1,000 delivery trucks from showing up at your building. Emerging technologies like Service Component Architecture (SCA) (www.ibm.com/developerworks/library/specification/ws-sca/) look like promising ways to explicitly manage service endpoint URLs as well as other critical service component metadata. It's very likely that if this is your first SOA-style project, you don't have a consistent approach to endpoint management. Endpoint URLs that are hard-coded or buried in configuration files are problems waiting to be found. (A minimal configuration sketch follows at the end of this section.)

One all-too-common problem is that of rogue clients. A rogue client starts with a developer referencing a Web service, generating a Web services proxy and wrapping the client in a handy API. The code is then bundled as a library and distributed to unwitting development teams. Those teams end up using that service without even knowing it.

Test or staging instances are valuable for isolating side effects during testing, but it's likely that you won't have complete coverage. And even when you have staging instances, if they're provided by external parties in a shared environment, they won't take kindly to you performing load tests against them. So when you don't have a test instance and you can't load-test against the ones you do have, you need a new strategy to isolate your systems under test from your dependencies. Table 2 describes several approaches you can use to isolate services and applications under test from their dependencies.
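One low-tech way to keep endpoint URLs out of application code, as the endpoint-management discussion above suggests, is to resolve them from a single configuration source that can be switched per environment. The sketch below is a generic illustration, not any particular framework's mechanism; the file name, section names and environment variable are hypothetical.

# endpoints.py -- a minimal sketch of centralized endpoint resolution so that
# tests can point consumers at stub or staging instances without code changes.
# The file name, section names and environment variable are hypothetical.
import configparser
import os

CONFIG_FILE = "endpoints.ini"   # e.g. [production] and [staging] sections

def resolve(service_name, environment=None):
    """Return the endpoint URL for service_name in the chosen environment."""
    env = environment or os.environ.get("SERVICE_ENV", "production")
    config = configparser.ConfigParser()
    config.read(CONFIG_FILE)
    # Fail loudly: a missing entry usually means a rogue, hard-coded client.
    return config[env][service_name]

# Usage: SERVICE_ENV=staging python order_tests.py
# tax_url = resolve("tax_service")   # -> the staging tax-service URL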
Plan for Change
The most significant alteration brought about by SOA is that of unanticipated change. This prospect might seem scary to traditionalists, but such change is often essential for reaping SOA's benefits. Change happens for a variety of reasons and at different levels of severity. It can range from positive performance improvements and defect fixes that have a positive impact on the overall system to interface changes that break deployed systems. Look to your governance process to establish policies related to change management for internal services, and encapsulate external services outside of your governance envelope. Use a runtime SOA management system to monitor your actual system performance and behavior. This will be the first thing to alert you when your systems are going down. Knowing that change in your deployed system is inevitable is the first step in managing and testing for it. There are several kinds of changes you should be watching for, testing against and prepared to handle.

Service behavior changes. Changes in service behavior are often internal to the implementation of a service and are meant to increase functionality, fix a defect or improve performance. Imagine a tax service that was initially deployed to calculate sales tax for the U.S. but was enhanced to also calculate sales tax for Canada. The interface for the service doesn't change, but the service implementation does. This should have minimal effect on your consumers. Services that are successfully adopted by growing consumers may encounter performance limits that were not a problem initially. In this case, changes to service implementation—including caching and clustering—can introduce improved performance, likely without impacting your systems. However, when system timing changes significantly, you may uncover previously undetected race conditions in your system that you don't expect. This is one of the many sources of gremlins in a deployed system.

Service interface changes. When the WSDL changes, the service owner often creates a new endpoint for the new version of the service so as not to impact existing consumers. But it's easy for this seemingly obvious step to be missed when service providers take a code-first approach instead of a contract-first approach. In code-first, the service developer thinks in terms of the implementation. The WSDL is generated from that implementation at runtime. When the developer changes data structures or methods in the implementation, the WSDL changes automatically. With contract-first, the WSDL is considered the primary interface of the service and its changes are managed explicitly. It takes more discipline to do, but it's essential for managing change and helping consumers of your service to do so. Monitoring a service's WSDL for changes at runtime is one way to keep track of changes (a small sketch follows at the end of this section). This is especially important when the WSDL is made up of many nested XSD files contributed and owned across domains.

Semantic changes at the process level. In highly cohesive systems, data from one service is used, transformed and fed into another. When this underlying data changes, you may encounter problems that didn't exist at design time. These won't appear at the service level, only at the interaction level.

Changes in the cloud. Increasingly, enterprises are deploying sophisticated SOA infrastructure such as XML appliances, Enterprise Service Buses (ESBs), registry/repositories, and runtime management and governance systems. These systems exist in the network cloud between service consumers and providers. When changes are made to policies and configurations, they can have an unanticipated effect on existing systems. Changes may be made to message routing rules, security policies or message transformation formats.

Performance changes. If your project makes use of a great new service, chances are good that others will want to do the same. What starts as a well-behaved dependency may change unexpectedly as others begin using the service, putting more load on the system. Service-level agreements (SLAs) that guarantee response time and availability levels are used to manage these issues. But, as with other consumer protections, it's useful to "trust but verify" that the SLA you're expecting (and perhaps paying for) is what you're getting.
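Monitoring a WSDL for unannounced changes, as suggested under "Service interface changes" above, can be as simple as periodically fetching the contract and comparing a hash against a stored baseline. This is a rough sketch, not a substitute for a registry or SOA management product; the URL and baseline path are hypothetical, and a real check should also hash any imported XSDs.

# wsdl_watch.py -- a rough sketch of detecting WSDL drift by hashing the
# contract and comparing it to a stored baseline. The URL and file name are
# hypothetical; a thorough check would also hash imported XSD files.
import hashlib
import urllib.request

WSDL_URL = "http://erp.example.com/InventoryService?wsdl"   # hypothetical
BASELINE = "inventory_service.wsdl.sha256"

def current_digest(url):
    with urllib.request.urlopen(url, timeout=30) as response:
        return hashlib.sha256(response.read()).hexdigest()

def check(url, baseline_path):
    digest = current_digest(url)
    try:
        with open(baseline_path) as f:
            baseline = f.read().strip()
    except FileNotFoundError:
        with open(baseline_path, "w") as f:
            f.write(digest)          # first run: record the baseline
        return True
    return digest == baseline        # False means the contract changed

if __name__ == "__main__":
    print("WSDL unchanged" if check(WSDL_URL, BASELINE)
          else "WSDL CHANGED -- review consumers")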
TABLE 2: ISOLATING DEPENDENCIES

Simulations. Method: Create a mock service that will stand in for the real thing at an endpoint you control; start with the WSDL and define stubs. Advantage: You manage it; Web services with WSDL are ideal candidates. Disadvantage: Building a rich simulation can be time-consuming.

Test account. Method: A distinguished identity serves as a test account. These accounts are often defined by service providers instead of using a separate test system. Advantage: You likely already use identity in your systems; low cost and no added complexity. Disadvantage: Only useful for services that respect it; account data can vary by ownership domain.

Message header. Method: Define a test SOAP header indicating that a test message should not have side effects and must be propagated; use mustUnderstand="true" to ensure systems support it. Advantage: Transparent to the system; hidden in the plumbing. Disadvantage: Services have to be written to respect the header and process it accordingly.

Domain data partitioning. Method: Use application data to indicate a test; use a specific, designated ISBN when ordering tests, or name a test warehouse when doing inventory picks, etc. Advantage: Simple to perform. Disadvantage: Might not isolate side effects; easy to let a test get away from you by using the wrong data.

Compensating transactions. Method: Clean up after live messages have been sent; for example, order a book, then cancel it quickly. Advantage: Simplest to perform—just use the system. Disadvantage: Might not compensate properly. Watch out!
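The simplest form of the first approach in Table 2 is a stub that answers at an endpoint you control with a canned response. The sketch below returns one fixed SOAP envelope for every request; the namespace, element names and tax amount are hypothetical placeholders rather than a real service contract.

# tax_stub.py -- a minimal simulation that stands in for a dependent service
# during testing. It answers every POST with one canned SOAP response.
# The namespace, element names and amount below are hypothetical.
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED_RESPONSE = b"""<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <CalculateTaxResponse xmlns="urn:example:tax">
      <Amount>4.25</Amount>
    </CalculateTaxResponse>
  </soap:Body>
</soap:Envelope>"""

class StubHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Consume the request body so the client connection stays healthy.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(CANNED_RESPONSE)))
        self.end_headers()
        self.wfile.write(CANNED_RESPONSE)

if __name__ == "__main__":
    # Point the consumer's tax-service endpoint at http://localhost:8081/
    HTTPServer(("localhost", 8081), StubHandler).serve_forever()

A richer simulation would vary the response based on the request, but even a stub this small is enough to keep load tests from hammering a partner's shared staging environment.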
Testing at Runtime
External dependencies require awareness and vigilance at design time and runtime. Continuous integration (CI) is a widely adopted technique to identify breaking system changes as early as possible. Testers can extend that philosophy by bringing continuous integration to the SOA layer. Incorporating service side-effect isolation techniques, runtime testing of an SOA can help increase the visibility of quality across the system. Runtime tests, called probes or synthetic transactions, are common in operations centers and add a deeper indicator of application health. Often, technical operations monitoring entails watching systems at a hardware level. Although the status lights are green, the applications and services might not be functioning correctly. Use runtime tests for a clearer picture of service and application health.

Plan for Runtime Problems
Problems will always exist. To solve them, you must locate the source of the trouble, gather relevant artifacts for support and escalate to development for investigation. Extending the test environment into production will give QA and test teams a powerful tool to assist support and development in these highly visible, expensive crises. With a focus on dependencies—the most likely sources of change—as well as techniques for isolating and identifying problem causes, you can provide an increasingly valuable service in runtime issue mediation. You'll better understand the entire system and diagnose problems, because the runtime environment isn't much different from the test environment.
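A synthetic transaction need not be elaborate: a scheduled script that exercises one read-only operation and checks both the response and its latency already says more than a green status light. Below is a bare-bones sketch; the endpoint URL, request payload, expected element and threshold are all hypothetical.

# probe.py -- a bare-bones synthetic transaction: call one service operation,
# verify the response contains what we expect, and flag slow replies.
# URL, payload, expected text and threshold are hypothetical.
import time
import urllib.request

ENDPOINT = "http://orders.example.com/OrderService"   # hypothetical
PAYLOAD = (b"<soap:Envelope xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'>"
           b"<soap:Body><GetOrderStatus xmlns='urn:example:orders'>"
           b"<OrderId>PROBE-001</OrderId></GetOrderStatus></soap:Body></soap:Envelope>")
MAX_SECONDS = 2.0

def run_probe():
    request = urllib.request.Request(
        ENDPOINT, data=PAYLOAD,
        headers={"Content-Type": "text/xml; charset=utf-8"})
    start = time.time()
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8", "replace")
    elapsed = time.time() - start
    healthy = "<Status>" in body and elapsed <= MAX_SECONDS
    return healthy, elapsed

if __name__ == "__main__":
    ok, seconds = run_probe()
    print("probe %s in %.2fs" % ("passed" if ok else "FAILED", seconds))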
A Wild West Security Test Tool Shootout
By Elfriede Dustin

Author of "Effective Software Testing" (Symantec Press, 2006) and a number of other books on software security, Elfriede Dustin is an independent software testing and QA consultant in the Washington, D.C. area.

It's the American Wild West, version 2.0. Data security threats come in many shapes and sizes, and just as many tools exist to help you protect your assets from attack. If you've been hunting for the Wyatt Earp of tools to help defend your Tombstone of a company, look no further. Antivirus, intrusion detection, firewall and network sniffing solutions are now considered standard equipment and are assumed to be in place. Covered here are security testing tools to complement those.

This article presents a summary of the types of security testing tools available with the requirements met by each, and provides a map of vendors to those tools. Example security-testing tool requirements are summarized in Table 1. Each of these requirements can be broken down in further detail. In addition to the tool requirements in Table 1, related tool quality attributes should be considered during a tool evaluation, such as:
• Installation. Ease of product installation
• Uninstall. Cleanliness of uninstall
• Vendor experience and reputation
• Training and support. Adequacy and responsiveness of training and support
• Documentation. Completeness and comprehensibility of documentation
• Configurability. Ability to adapt the product to each evaluation activity; ease of setup of each new evaluation project
• Tuneability. Ability to guide and focus the analysis toward specific desired features, flaw types or metrics; ease of "fine tuning" or tailoring different assessments of the project
• Integratability/Interoperability. Level of integration supported by the tool into a larger framework or process
• Balance of effort. Ratio of tool analysis to human analysis in finding actual flaws; how much information and/or effort the tool needs from the operator to complete its analysis
• Strength of analysis criteria
• Input/output parameters
• Expandability. Whether the tool suite works on applications and infrastructure

A comparison of the available features in commercial and open source security-testing tools with the example requirements is described in Table 2. Various static analysis and dynamic analysis tools, and fuzzers or penetration testing tools, meet the requirements in Table 1 and can be used at the source code and/or binary and executable level. Let's look at each of those types of tests.
TABLE 1: EXAMPLE SECURITY TEST TOOL REQUIREMENTS
1. Tests source code for any type of vulnerability
2. Tests binaries (i.e., executables) for any type of vulnerability
3. Detects issues specifically related to real-time systems, such as deadlock detection, asynchronous behavior issues, etc.
4. Creates a baseline and regression-tests any type of patch for newly introduced vulnerabilities
5. Provides assurance that already verified source code hasn't changed once it's built into an executable
6. Helps a tester find the trigger and/or the payload of malicious code
7. Provides information about the binary, such as which local system objects are created
8. Can be applied during the software development life cycle to check software for vulnerabilities at various stages
9. Provides minimal false positives or false negatives; i.e., effectiveness of the tool has been statistically proven
10. Is able to handle foreign source code; i.e., foreign-language comments, etc.
11. Is compatible with required platforms (Unix, Linux, Windows, etc.)
12. Is compatible with required development languages; i.e., C, C++, Ada, Java, etc.
13. Is able to scale to test source code or executables of various sizes; i.e., up to millions of lines of code
14. Leaves no footprint and makes no changes to the software under test; does not adversely affect code
15. Generates useful diagnostic, prognostic and metric statistics
Static Analysis vs. Dynamic Analysis [1]
Program analysis can be categorized into two groups according to when the analysis occurs. Static analysis involves analyzing a program's source code or machine code without running it. Many tools perform static analysis, in particular compilers. Examples of static analysis used by compilers include analyses for correctness, such as type-checking, and analysis for optimization, which identifies valid performance-improving transformations. Also, some stand-alone static analysis tools can identify bugs or help visualize code. Tools performing static analysis need only to read a program in order to analyze it.

Dynamic analysis involves analyzing a client program as it executes. Many tools perform dynamic analysis; for example, profilers, checkers and execution visualizers. Tools performing dynamic analysis must instrument the client program with analysis code. The analysis code may be inserted entirely inline; it may also include external routines called from the inline analysis code. The analysis code runs as part of the program's normal execution, not disturbing the execution (other than probably slowing it down), but doing extra work on the side, such as measuring performance or identifying bugs.

Static analysis can be sound, as it can consider all execution paths in a program, whereas dynamic analysis is unsound in general, as it considers only a single execution path. However, dynamic analysis is typically more precise than static analysis because it works with real values in the perfect light of runtime. For the same reason, dynamic analyses are often much simpler than static analyses.
Source Analysis vs. Binary Analysis
Program analysis can be categorized into another two groups, according to the type of code being analyzed. Source analysis involves analyzing programs at the level of source code. Many tools perform source analysis; compilers are again a good example. This category includes analyses performed on program representations that are derived directly from source code, such as control-flow graphs. Source analyses are generally done in terms of programming language constructs, such as functions, statements, expressions and variables.

Binary analysis involves analyzing programs at the level of machine code, stored either as object code (pre-linking) or executable code (post-linking). This category includes analyses performed at the level of executable intermediate representations, such as bytecodes, which run on a virtual machine. Binary analyses are generally done in terms of machine entities, such as procedures, instructions, registers and memory locations.
Source analysis is platform-independent (architecture and operating system), but language-specific. Binary analysis is language-independent but platform-specific. Source code analysis has access to high-level information, which can make it more powerful; dually, binary analysis has access to low-level information (such as the results of register allocation) that is required for some tasks. One advantage of binary analysis is that the original source code is not needed, which can be particularly important for dealing with library code, for which the source code is often not available on systems.
Application Footprinting
In addition to static and dynamic analysis, a useful way of detecting vulnerabilities on binaries is application footprinting. This is the process of discovering what system objects and system calls an application uses. Application footprinting uses similar inspection techniques as network footprinting, but focuses on just one application. It helps determine how that application receives inputs from its environment via operating system calls and what OS objects that application is using, such as network ports, files and registry keys.
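Where the cross-platform psutil package is available, a few lines of Python give a quick footprint of a running process. This is only a convenience wrapper around the same kind of information that lsof, strace or Process Explorer expose, and psutil is an assumed third-party dependency, not part of the standard library.

# footprint.py -- a quick application footprint of a running process:
# open files and network endpoints. Assumes the third-party psutil package
# is installed; deeper footprinting (system calls, registry keys) still
# needs tools such as strace or Process Explorer.
import sys
import psutil

def footprint(pid):
    process = psutil.Process(pid)
    print("process:", process.name())
    print("open files:")
    for f in process.open_files():
        print("  ", f.path)
    print("network connections:")
    for conn in process.connections():
        print("  ", conn.laddr, "->", conn.raddr or "(listening)")

if __name__ == "__main__":
    footprint(int(sys.argv[1]))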
Fuzzing, or Penetration Testing [2]
Penetration testing or fuzzing is a software testing technique often used to discover security weaknesses in applications and protocols. The basic idea is to attach the inputs of a program to a source of random or unexpected data. If the program fails (for example, by crashing, or by failing built-in code assertions), there are defects to correct. The majority of security vulnerabilities, from buffer overflows to cross-site scripting attacks, are the result of insufficient validation of user-supplied input data. Bugs found using fuzz testing are frequently severe, exploitable bugs that could be used by a real attacker. This has become even more true as fuzz testing has become more widely known, as the same techniques and tools are now used by attackers to exploit deployed software. This is a major advantage over binary or source auditing, or even fuzzing's close cousin, fault injection, which often relies on artificial fault conditions that are difficult or impossible to exploit.
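The principle is easy to demonstrate: mutate a valid input at random, feed it to the code under test, and record anything that blows up. The toy below fuzzes a hypothetical parse_record() function that merely stands in for a real target; production fuzzers such as the Peach framework add protocol awareness, instrumentation and crash triage on top of this idea.

# toy_fuzz.py -- the bare idea behind fuzzing: random mutations of a valid
# input, fed to the code under test, with failures recorded for triage.
# parse_record() is a hypothetical stand-in for the real target.
import random

SEED_INPUT = b"NAME=alice;QTY=3;SHIP=ground"

def parse_record(data):
    # Hypothetical target: a real run would import the parser under test.
    fields = dict(item.split(b"=") for item in data.split(b";"))
    return int(fields[b"QTY"])

def mutate(data):
    buf = bytearray(data)
    for _ in range(random.randint(1, 8)):
        buf[random.randrange(len(buf))] = random.randint(0, 255)
    return bytes(buf)

if __name__ == "__main__":
    random.seed(1)
    failures = []
    for _ in range(10000):
        case = mutate(SEED_INPUT)
        try:
            parse_record(case)
        except Exception as exc:       # each exception is a potential defect
            failures.append((case, repr(exc)))
    print("%d of 10000 mutated inputs caused unhandled exceptions" % len(failures))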
Threat Modeling
After you've performed your dynamic and static analyses, footprinting and penetration testing, the results need to be evaluated. Threats must be listed, ranked and mitigated according to the ease of attack and the seriousness of the attack's potential impact. Table 1 describes at a high level the requirements that an example testing tool must meet, and lists requirement #6 as "Helps a tester find the trigger and/or the payload of malicious code." Using threat modeling, potential security threats are hypothesized and evaluated based on an understanding of the application's design. Threat modeling is further described in "The Art of Software Security Testing" (Addison-Wesley, 2006), which I cowrote with Chris Wysopal, Lucas Nelson and Dino Dai Zovi.
Automated Regression Testing
Automation has proven effective for regression testing. Automated regression testing can help verify that no new vulnerabilities have been introduced to previously verified functionality. In addition to those listed in Table 2, numerous other automated software testing tools can aid in verifying that a patch or update didn't break previously working functionality or that previously secure functionality remains trustworthy.
Wireless Security Assessment Tools
An increasing number of testers need to verify the security of wireless clients. KARMA (www.theta44.org/karma/index.html) is a set of free tools for assessing the security of wireless clients at multiple layers. Wireless sniffing tools discover clients and their preferred/trusted networks by passively listening for 802.11 Probe Request frames.
From there, individual clients can be targeted by creating a Rogue AP for one of their probed networks (which they may join automatically) or using a custom driver that responds to probes and association requests for any SSID. Higher-level fake services can then capture credentials or exploit client-side vulnerabilities on the host. Numerous security testing tools are
available to meet your requirements. It's important to evaluate them based on your specific system engineering environment, so that you can arm yourself and your company with the best defenses for your needs.
TABLE 2: OPEN SOURCE VS. COMMERCIAL SECURITY TEST TOOLS

Test tools for binaries

Profilers, checkers, memory-leak detection tools, binary code scanners. Type: Runs on the binary/executable; tests code during runtime; code is instrumented to allow for runtime evaluation. Pros: Doesn't require source code; can be run on binaries; typically excels at detecting dynamic memory corruption, resource leak and threading defects; pinpoints precisely what went wrong in a given execution trace of the code; produces relevant results. One advantage that binary code scanners have over source code scanners is the ability to look at the compiled results and factor in any vulnerabilities created by the compiler itself. Furthermore, library function code or other code delivered only as a binary can be examined. Cons: Requires test cases and is tied to the test suite; the test is only as good as the tests that were designed and executed, which can result in false negatives; can be slow when trying to cover all paths; potential performance implications that mask bugs (non-deterministic); defects are discovered later in the software development life cycle. Example vendors: Open source: Valgrind (www.valgrind.org); vendor-provided: Rational/IBM Purify (www.ibm.com); see also http://samate.nist.gov/index.php/Binary_Code_Scanners. Satisfies Table 1 requirements: 2, 8, 9, 10, 12, 14, 15.

Application footprinting. Type: Runs on the binary/executable; the process of discovering what system calls and system objects an application uses. Pros: Provides information about the binary, such as which information the application accesses over a network or which local system objects are created. Cons: Bugs are discovered late in the software development life cycle. Example tools: lsof or strace/ktrace/truss on UNIX; Process Explorer on Windows. Satisfies Table 1 requirement: 7.

Fuzz testing tools and techniques (also known as penetration testing). Type: Fuzz testing or fuzzing is a software testing technique often used to discover security weaknesses in applications and protocols. Pros: Useful for performing a black-box assessment of a network, server or application. Cons: Bugs are discovered late in the software development life cycle. Example tools: Peach Fuzzer Framework (http://peachfuzz.sourceforge.net/). Satisfies Table 1 requirement: 2.

Test tools for source code

Static code analyzers. Type: Tests code at compile time; analyzes source code for possible vulnerabilities; can also find unused code. Pros: Can detect a wide range of coding flaws; no test cases required to find problems; much better coverage because it analyzes the entire code base without test cases, which can result in more test coverage; faster, with many more possible outcomes and results; bugs are uncovered very early in the software development life cycle. Cons: Source code needs to be accessible; false-positive problems; output can be very superficial if the tool doesn't understand how your software is built from the source code; often difficult to provide the entire picture of all the code (sometimes looking at only one code file at a time); based on heuristics, which don't necessarily apply. Example vendors: Open source: Splint, an open source version of Lint (http://splint.org); vendor-provided: PRQA (Programming Research, www.programmingresearch.com) or Coverity. Satisfies Table 1 requirements: 1, 3, 4, 6, 10, 11, 12, 14, 15.

Note: Security test–tool requirements are highly dependent on your system engineering environment.
REFERENCES
[1] http://valgrind.org/docs/phd2004.pdf
[2] www.hacksafe.com.au/blog/2006/08/21/fuzz-testing-tools-and-techniques/
Things Aren't Always How They Appear—What The Reports Say And What They Don't
By L.R.V. Ramana

L.R.V. Ramana is a senior test manager at Virtusa, a development consultancy in Hyderabad, India.

Your team has completed its testing and logged all known defects. You've generated your test metrics and published a test report certifying application quality. Defect-to-remark ratio is 1:1, the defect severity index is sloping down week-to-week, and time-to-find defects is on the rise. All the metrics suggest a good quality application. However, most of what you published in the test report turned out to be wrong, the production release failed to perform in the market, and many defects were reported by customers. Customers are upset, and managers want explanations.

Situations like this are certainly not rare. Poor quality of test cases or test practices and an inefficient test process could be among the multiple reasons for such problems. But one factor often overlooked is the test manager's ability to analyze the different test metrics for what they do not reveal. This lack of understanding (as opposed to lack of interest) often leads testers to rely on a few test metrics to measure the application's quality. Relying on test metrics by itself isn't a cause for concern, but test managers and test leads should be aware of some caveats that metrics trends can suggest:
• Test metrics alone don't provide genuine insight into the application's real quality.
• Test metrics shouldn't be examined in isolation. Different metrics should be compared and analyzed together for a more reliable test summary.
• Conducting a root cause analysis as part of the metrics analysis yields reliable results.
• Thorough and systematic analysis of test metrics is important to make metrics a reliable tool to measure quality.
• While some test metrics should be analyzed for trends over time (for example, multiple test cycles), other metrics should be analyzed only for a specific test cycle.

Table 1 shows some metrics that are typically collected by a team as part of an organization's test metrics program. There may be many other metrics used for measuring. To keep it simple and highlight the importance of proper metrics analysis, we'll stick to the four metrics depicted in the table.

TABLE 1: WHAT TO TEST? TYPICAL METRICS
1. Defect-to-remark ratio: Used to measure the number of remarks logged by the test team that get converted to defects. Ideally, all remarks should be converted into defects, resulting in a 1:1 ratio, or 100 percent.
2. Defect severity index: A weighted average index of the severity of defects. A higher-severity defect gets a higher weight. S1 is a show-stopper, S2 is high severity, S3 is medium, S4 is low. Ideally, this should slope down as test cycles progress.
3. Mean time to find a defect: Calculates the time gap between two successive defects being detected. As test cycles progress, these times should increase. A variant of this is MTTF (mean time to failure).
4. Test coverage: Includes requirements and code covered by test cases. Coverage could also represent the number of developed test cases vs. the number of test cases planned for test execution, test cases planned vs. test cases executed, and developed cases vs. cases executed. Higher coverage percentages are better.

Some organizations categorize all issues identified and logged by test teams as remarks. Once a remark is logged, a defect triage process occurs in which valid remarks get converted into defects and are assigned to developers for repair. Invalid remarks include those classified as duplicates, invalid, unable to reproduce, as designed, as per requirement and so on. As for weights of severity, different organizations have different severity classifications, and weights are assigned based on the organization's priorities. Also, your organization might assign a weight of one (1) to a low-severity defect, while others assign a weight of zero (0), as in the example to follow.

Let's assume that the test team has been testing a product and has generated the metrics in Figure 1. After an analysis of the graphs in the figure, it's safe to deduce the following:
FIG. 1: SAMPLE PRODUCT METRIC ANALYSIS: FOUR FIELDS (charts across 10 test cycles of the defect-to-remark ratio, rising from about 40 percent to 95 percent; mean time to find a defect, in hours; the defect severity index; and test coverage, actual vs. desired, for requirements, code, documented vs. executed and planned vs. executed test cases)

Defect-to-Remark Ratio
This shows a favorable trend. Except for a single drop, the graph has been constantly rising during the last 10 test cycles. The test team has been logging remarks, and most get converted into defects. This also indicates that the number of log entries marked as invalid, duplicates, etc. falls as test cycles progress.

What is not shown are the factors that could alter this seemingly favorable trend:

Test coverage. Relying simply on the defect-to-remark ratio will result in a poor analysis if test coverage is low. For example, if the requirements coverage is 70 percent, this trend is true only as it relates to 70 percent of the requirements. The remaining 30 percent of requirements (not covered) might include crucial parts of the functionality. Assuming that structured testing was done, analyze the quality of the test cases to ensure that they're designed to identify critical defects. Analyze the results by considering code coverage ratios as well.

Defect severity. The team might be logging simple cosmetic remarks; for example, spelling mistakes and text alignment, while not logging any critical or high-severity remarks. While the defect-to-remark ratio might be favorable, the quality of the application in this case might still be questionable.

Number of defects. The graph fails to indicate the total number of defects logged. If there were 1,000 defects in the application but the team was able to identify and log only 100, this would be cause for grave concern.

Defect classification. The graph does not comment on the defect classification. Of the 100 remarks that were logged, it's possible that 90 were technical defects; for example, wrong DB schema, wrong tables being updated or coding errors that cause system crash. It's possible that the test team had done a good job in identifying technical defects but a poor job of identifying functionality defects (of the business logic). The graph doesn't present this point of view.
Defect Severity Index
The defect severity index shown in Figure 1 consistently slopes downward, indicating an increasingly favorable trend as severe defects are repaired. As the test cycles progress (from cycle 1 to cycle 10), the downward slope suggests an increase in application quality as fewer critical and high-severity defects are being reported.

What isn't shown is that while a fall in the defect severity index is definitely a good trend, looking at this index in isolation could be misleading. The following factors must also be considered for a meaningful analysis:

Number of defects logged. Consider an example in which the test team executed two cycles of testing, with the number of defects logged against each of these cycles and their calculated severity index shown in Table 2.

TABLE 2: NUMBER OF DEFECTS
Severity          Cycle 1    Cycle 2
S1                5          5
S2                10         15
S3                50         30
S4                100        100
Severity Index    1.52       1.43

Assuming that all else remains constant, when we compare the severity index of cycle 1 with that of cycle 2, the latter appears to be favorable (as the severity index is lower). But when we add the number of defects logged against their severity, the result is the opposite. While the total number of severity 1 and 2 defects for cycle 1 is 15, the number of severity 1 and 2 defects for cycle 2 is 20. In terms of quality, cycle 1 is better than cycle 2 because cycle 1 has fewer high-severity defects. This, despite the fact that the total number of defects logged in cycle 1 is more than in cycle 2, and cycle 1's severity index is greater than that of cycle 2. Combining this metric with test coverage would show a similar result. Low test coverage, even when coupled with a downward severity index, would not be a healthy trend.

Defect severity. Let's consider another example where the test team again executed two cycles of testing. The severity of defects logged against each of these cycles, along with the calculated severity index, is shown in Table 3.

TABLE 3: SEVERITY OF DEFECTS
Severity          Cycle 1    Cycle 2
S1                4          0
S2                4          0
S3                42         75
S4                27         2
Severity Index    2.42       2.92

Looking at this severity index, it would appear that cycle 1 is better than cycle 2 because of its lower severity index. However, cycle 2 is actually better than cycle 1 because the total number of severity 1 and 2 defects is zero, compared to a total of 8 severity 1 and severity 2 defects in cycle 1. Just because the severity index is lower in cycle 1, it's not safe to assume that the quality of the application is better.
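For the record, the index values in Tables 2 and 3 are consistent with a weighted average that assigns weights of 10, 5, 3 and 0 to S1 through S4; those weights are inferred from the published numbers rather than stated in the article, and your organization's weights will differ. A small sketch of the calculation:

# severity_index.py -- the defect severity index as a weighted average of
# defect counts. The weights below (S1=10, S2=5, S3=3, S4=0) are inferred
# from the values published in Tables 2 and 3; they are illustrative only.
WEIGHTS = {"S1": 10, "S2": 5, "S3": 3, "S4": 0}

def severity_index(counts):
    total = sum(counts.values())
    weighted = sum(WEIGHTS[sev] * n for sev, n in counts.items())
    return weighted / total if total else 0.0

CYCLES = {
    "Table 2, cycle 1": {"S1": 5, "S2": 10, "S3": 50, "S4": 100},   # 1.52
    "Table 2, cycle 2": {"S1": 5, "S2": 15, "S3": 30, "S4": 100},   # 1.43
    "Table 3, cycle 1": {"S1": 4, "S2": 4, "S3": 42, "S4": 27},     # 2.42
    "Table 3, cycle 2": {"S1": 0, "S2": 0, "S3": 75, "S4": 2},      # 2.92
}

if __name__ == "__main__":
    for label, counts in CYCLES.items():
        print("%s: %.2f" % (label, severity_index(counts)))

Running the numbers yourself is a quick way to confirm that a published index is being computed the way you think it is.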
Mean Time to Find a Defect (MTFD)
From Figure 1, we see that the mean time to find a defect has been increasing over time. While this can indicate a positive, that the team is finding it more and more difficult to identify defects, it doesn't necessarily mean that the quality of the application is getting better over time. There can be hidden meaning here, too, and taking this metric in isolation can be misleading. Here are the other factors to consider:

Defect severity. In the mean-time graph in Figure 1, testers took five minutes to identify a defect in cycle 1, and 55 minutes in cycle 10. While an increase in duration is positive, it's important to look at defect severity before drawing conclusions on quality. It's possible that the team logged only severity 4 defects during cycle 1 and severity 1 defects during cycle 10. In this case, however, the graph shows a favorable increase in time to find a defect. But the reality is that the quality is not up to the mark because higher-severity defects are being detected, even during cycle 10.

Test coverage. Testing during cycle 1 could have been on the user interface layer, resulting in a greater number of (UI-related) defects detected in a shorter period of time. On the other hand, cycle 10 tests might have been focused on database transactions, hence resulting in fewer defects identified in a given period of time. However, it's generally believed that a single database transaction defect (detected in a fixed period of time) would have a higher priority than 10 UI defects (detected in the same amount of time). Deceptively low defect rates are also possible if cycle 1 covered 90 percent of requirements and cycle 10 covered just 10 percent. A greater coverage often leads to the detection of more defects.

Type of tests. Other factors for consideration are the types of tests conducted during each cycle. For instance, it's possible that a simple regression test was conducted during cycle 1, while a reliability test was conducted during cycle 10. While a large number of defects during cycle 1 would be expected, it would not be a good sign to identify a defect every 55 minutes when running reliability tests.
Test Coverage
Figure 1 shows requirement coverage at 70 percent, code coverage in need of improvement, and 90 percent of the documented test cases executed, leading to a high degree of application test coverage. One hundred percent of planned test cases have been executed. This is what it could mean:

Requirement coverage. While requirement coverage stands at 70 percent, there's no indication of whether that 70 percent covers the most critical functionality or the most elementary. The graph is silent about functional vs. nonfunctional requirements and implicit vs. explicit requirements. In the event that the 30 percent balance forms the most important part of the functionality, 70 percent requirement coverage is inadequate.

Code coverage. While the 50 percent code coverage itself is inadequate, it's also important to be explicit about what the 50 percent refers to. Does it refer to HTML and JavaScript code or to the application's core Java code? How much of each? All of one and none of the other? Sufficiency of code coverage often depends on the type of application under test. In the case of applications with life-critical functionality such as medical equipment or aircraft control systems, 50 percent coverage is clearly lacking; perhaps even illegal. But if the AUT is for the HR team to track employees using the corporate gym, 50 percent coverage might be adequate. Another point is to clarify how the 50 percent is achieved. Was the testing done through white-box test cases, black-box test cases or a combination of both?

Documented vs. executed test cases. While 90 percent is an adequate portion and presents a good view of this category, it might not be if the documented test cases are of inferior quality or miss a critical functional requirement.

Planned vs. executed test cases. One hundred percent is the ultimate goal of any coverage measure, but the plan itself must be adequate and timed efficiently. If a team plans to execute user interface (UI) test cases in the system integration phase, for example, coverage could still be shown as 100 percent, but later prove incorrect.

While the test metrics covered in this article are but a small subset of the large number of possible metrics to generate, the techniques explained here can help you understand an important aspect of test reporting. While the use of test metrics is a good practice, teams that depend totally on them must also understand the various components and factors that affect their conclusions about the quality of their applications. For any test report that presents metrics along with detailed root cause analysis, it's important to consider different points of view to help prevent surprises in the future.
Best Practices

Consequence of SOA: Increased Test Complexity
By Geoff Koch

Look up the phrase walled garden on Wikipedia. As of the evening in early fall when I pecked out this column, the definition was "a closed set or exclusive set of information services provided for users."

The collective Wikipedian wisdom then lists several companies known for their walled-garden approach to content and applications. It should come as a surprise to exactly no one who's hovered about the Web in the last decade or so that AOL tops this list.

Assigned this month to explore testing with service-oriented architecture and Web services, both of which are ostensibly all about knocking down walls and exposing services, I couldn't resist checking in with star-crossed AOL, which in recent days has slipped to #3 on the list of largest ISPs in the land.

It's been well reported that part of AOL's strategy for survival is to give away its services and content for free to try to compete with Google and Yahoo! for advertising dollars. For a company that lost nearly 1.8 million subscribers in the first quarter of 2007, this approach had better mean more than offering free e-mail at aol.com and plastering the site with cheesy banner ads.

Hemadri Dasari, a principal software engineer who has been at AOL for eight years, points me to the company's Open Business Infrastructure (OBI) project as a more developer-focused example of the new AOL. The description Dasari e-mailed me is depressingly full of tech marketing jargon: "OBI provides a monetization engine… [and] generic capabilities for business integration." He's better on the phone.

ESB Links Stand-Alone Applications
"OBI addresses basic things about having a product sold online via AOL, getting paid and so on," says Dasari, who previously worked at NCR, a division of AT&T, which coincidentally is now the nation's top Internet service provider. "We're trying to unify several different projects and services at AOL."

Dasari describes OBI as a mostly-from-scratch application that uses Web services technology to stitch together siloed functionality with the enterprise service bus in the BEA WebLogic product. Now, all the legacy AOL code that handles things like subscription and user information, order management and billing is on the way to behaving as a semi-integrated whole. Faced with a request from a user, the service bus exposes several different APIs, communicates with the appropriate Web services and finally, if all works according to plan, sends back a unified response.

As Dasari tells it, AOL, like so many companies experimenting with SOA, is trying to make the most of existing IT infrastructure and reuse discrete software components. Of course, this is all in the name of helping out customers and users, even if the new approach causes complications for upstream developers and testers.

Prior to Web services and SOA, Dasari says, most applications were written by one development team, subjected to QA testing by another team and then certified as "basically done." The real headaches associated with integration and end-to-end testing would usually come much later and would often be dealt with by other system architects and integrators.

"Today, the core functionality of a given component likely depends on a call sent to a service bus that in turn needs to coordinate between multiple different Web services using SOAP calls," Dasari says. "As a result, we're seeing those end-to-end integration issues right up front during the development process."

One key to surviving this newfound complexity is automated testing, including regression testing. Dasari says that up to 90 percent of AOL's Web services-related testing is automated. Another survival tip, he continues, is to keep your scripting skills sharp.

"We're constantly writing different scripts to address various services dependent on Java and JavaScript," Dasari says. "Scripting skills, along with knowledge of the specific business domain and context, will become increasingly important in the future."

Web services and the newly open ethic of sharing and interoperating affect more than companies like AOL. Indeed, entire job classifications are being buffeted. For an example, look no further than your poor IT operations staff.

From Paper Jams to Programming In Java?
Today as always, IT operations teams are required, first and foremost, to keep the production environment humming along smoothly. The only problem is that stuff that's thrown over the transom either from in-house developers or external suppliers is often less rigorously tested than ever before.
"The nature of SOA environments is going to force more and more testing to move to IT operations," says Dennis Powell, a senior product manager at StackSafe, a software start-up in Tysons Corner, Va. "There are so many more pieces that are now involved. If you work in IT operations, you are going to have to become more of an expert, particularly when talking about composite applications."

Does this mean that the guys who deal with paper jams and software upgrade installations must be at least moderately proficient in technologies and specifications such as SOAP, WSDL and UDDI? Powell thinks so, though it's more important, he says, for IT operations teams to form tighter links with application development and support groups.

Powell began his career more than 20 years ago writing COBOL code for monolithic CICS mainframes. He remembers handing over COBOL programs compiled with object code to IT staffers who "knew exactly what they were dealing with. There was always just one interface, and though the program was usually fairly rigid, it was almost always fairly straightforward."

The rise of object-oriented programming in the early 1990s was the first sign of a software world that would eventually be comprised of much smaller components, many of which would have to interact with one another. And if object-oriented programming was about modest servings of small clusters of functionality, Web services and SOA often mean managing bite-sized crumbs of single-purpose code.

Powell says that smart use of virtualization may help beleaguered IT managers to test and arrange these crumbs into real sustenance—that is, into working services. Thanks to cheap hardware and new virtualization tools, such as those offered by Powell's current employer, it's possible to copy entire disk images either onto virtualized containers within a production server or a separate stand-alone server devoted strictly to testing and QA. These copies can then be networked together to test, or at least better understand, all the SOA-type interactions among the various components.
All-Services World Ahead
However, just as AOL's future remains in doubt—Forrester Research analyst Shar Van Boskirk told The New York Times that the Big Apple move may be "too little, too late"—so too is Powell skeptical that things will get any easier for the IT operations crowd in the open-architecture, all-services world to come.

"IT is always going to be saddled with the age-old problem: supporting the entire production environment," Powell says. "They simply are not going to have the capacity to test every permutation that's thrown their way."

The only solution, he continues, is to aggressively pursue deep subject-matter expertise and to insist that upstream test and QA teams are given proper resources and, whenever possible, staffed by engineers and not reinvented English majors. Long term, the emerging Information Technology Infrastructure Library, ITIL, may offer some relief in the form of best practices in managing change. But for now, it's best to anticipate steadily rising complexity and start looking for stress-relieving hobbies outside the office.

There's always gardening. To keep filching neighbors and hungry wildlife at bay, you might even consider building a wall, a structure that's still useful if you're talking fresh produce and not software programs.

Write to Best Practices columnist Geoff Koch at koch.geoff@gmail.com.
Future Test

SOA's Been A Matter of Trust
By Wayne Ariola

Wayne Ariola is vice president of strategies and corporate development at test tools maker Parasoft.

One of the benefits of service-oriented architecture or Web services is the opportunity to reuse business components. But this and other benefits of service orientation are largely constrained by the management of the various SOA domains: security, management, registry, development, orchestration, composite services and enablement and integration. The lack of a solid SOA governance strategy throughout the entire service life cycle can result in an inconsistent and uncontrollable IT infrastructure that compromises SOA's benefits. But without adequate enforcement, any governance strategy is destined for failure.

To achieve the forecasted benefits of SOA, including reuse, you must achieve trust: trust that your services can support defined business objectives, that they're scalable to meet the demands of business partners, and that they're interoperable. And to build this trust, you must begin at the design phase.

A Quality Process for SOA
Similar to the certification programs for used cars, inspection points of business components must be established, executed and measured before designing or exploring a SOA project. Such a process must be clearly defined, objective, reliable and repeatable to promote trust in the business components that are to be reused either internally or externally.

When exposing business information and operations either within the enterprise or by sharing data with partners externally, all parts of its system must interact flawlessly and securely. To promote trust with external partners, visible quality metrics are necessary to promote internal adoption and reuse as well as business continuity. Ultimately, the organization must trust that the business components they use are secure, reliable and compliant. Externally, the business must enable its partners to use these business assets. With external reuse, the issue of risk compounds. Downtime of an hour can cost not only substantial losses in revenue but, more important, a perceived lack of quality and reliability in the company in general.

At the minimum, companies should employ interoperability, unit, scenario, regression, performance and security penetration testing to provide the necessary inspection points for the business components. Like the certification processes the automotive industry uses, a visible and capable quality process is critical for the success of SOA. Without this, the organization and its partners are simply not likely to reuse the business components. This lack of trust, which is reasonable when dealing with a mission-critical business process, will deteriorate the efficiencies that the SOA model promises.

Enforcing Policies
SOA governance, as an overarching management activity, is what determines the policies relevant to how services are handled in their different life-cycle stages, from inception to discovery, to invocation and consumption. Developers must have some sort of control or process to determine what services are published and become a part of the SOA being built. To control how Web services are defined, developed and deployed, companies implement internal standards and policies or
industry guidelines and best practices. These policies describe which standards are adopted within a particular organization and which version of these standards is endorsed. For example, companies may choose WSDL 1.1, SOAP 1.2 and WS-Security 1.0.

Once these adopted standards and versions are selected, it's critical that the governance policy is enforceable. Therefore, to achieve trust in their Web services, organizations must have a governance solution that ensures the enforcement of the various standards and policies. Automated policy-enforcement tools provide a governance solution that allows the control of policies and enforces them explicitly as part of the overall quality process of a service-oriented architecture. Policies can be enforced with the following practices (the first of which is sketched in code at the end of this column):
• Requiring schema validation tests against the respective standard schemas
• Executing XML static analysis to ensure that the correct standards are used and implemented correctly
• Performing semantic analysis to ensure that XML artifacts aren't only schema valid, but can be interpreted correctly by the consumers

Because compliance with policies is preventative rather than reactive, these practices prevent policy violations from creeping into the SOA, thereby ensuring interoperable and functional Web services.

Companies have invested heavily in building and stabilizing the legacy systems that support vital business components. Service-oriented architectures offer these companies an efficient way to decrease integration expense, increase business agility and consolidate applications by reusing valuable business components. To mitigate the inherent risk of reuse, a visible and measurable quality process must be applied to enforce policies, promote trust and increase application quality.
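As a concrete illustration of the schema-validation practice listed above, a validation gate can be wired into an automated build with a few lines. This sketch assumes the third-party lxml package and hypothetical file names; it is not tied to any particular governance product.

# validate_artifacts.py -- a small policy-enforcement gate: every XML artifact
# must validate against its governing schema before it can be published.
# Assumes the third-party lxml package; file names are hypothetical.
from lxml import etree

CHECKS = [
    ("order_service.wsdl", "wsdl_1_1.xsd"),
    ("invoice_message.xml", "invoice_types.xsd"),
]

def validate(document_path, schema_path):
    schema = etree.XMLSchema(etree.parse(schema_path))
    document = etree.parse(document_path)
    if schema.validate(document):
        return []
    return [str(err) for err in schema.error_log]

if __name__ == "__main__":
    failed = False
    for doc, xsd in CHECKS:
        errors = validate(doc, xsd)
        for message in errors:
            print("%s: %s" % (doc, message))
        failed = failed or bool(errors)
    raise SystemExit(1 if failed else 0)   # non-zero exit fails the build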