Business Intelligence — Vol. 3, No. 2
Integrating Enterprise Data Architecture with Enterprise Data Warehousing
by Ken Orr, Fellow, Cutter Business Technology Council
Businesses worldwide are moving toward becoming real-time enterprises. In the interests of reaching that goal, they're showing widening interest in the intersection of high-level enterprise data architecture activities with those of data warehousing and strategic IT planning. This Executive
Report examines how the major efforts of strategic IT planning, enterprise architecture, and data integration and access relate to one another and the critical role that the enterprise data architecture plays in this.
Integrating Enterprise Data Architecture and Enterprise Data Warehousing
BUSINESS INTELLIGENCE ADVISORY SERVICE — Executive Report, Vol. 3, No. 2
by Ken Orr, Fellow, Cutter Business Technology Council

INTRODUCTION

Large organizations are becoming increasingly interested in the intersection of high-level enterprise data architecture (EDA) activities with those of data warehousing and strategic IT planning. Why? Organizations worldwide are moving to become real-time enterprises. In a recent Enterprise Architecture Executive Report, "Building a Real-Time Enterprise: Why It's Worth the Effort" (Vol. 5, No. 10), I sought to explain why this idea is so important and why organizations that can make the transition to a real-time enterprise are apt to be more competitive as the world comes out of the current economic downturn. In the real-time enterprise, data warehousing becomes much more important. Rather than
being something that's nice to have, near real-time data warehouses are essential to running the real-time enterprise. In this context, having the right kind of data warehousing becomes key to making the real-time enterprise really real-time. To support real-time decisionmaking, organizations must create straight-through processing of their critical data, where operational databases, data warehouses, and associated data marts become part of an integrated set of distributed enterprise data that's updated consistently.

But enterprise architecture (EA) is even more critical to building a truly real-time enterprise. A few years ago, most organizations believed that their high-level architecture activities were pretty
much separate; but today, they're seeing that these high-level activities (strategic IT planning, data management, portfolio management, asset management, and technology management) increasingly overlap (see Figure 1), and all depend on a consistent, up-to-date enterprise architecture. This report examines how these major efforts (strategic IT planning, EA, and data integration and access) relate to one another and the critical role that the EDA plays in this.

EDA: THE HOLE IN THE ZACHMAN FRAMEWORK

The notion of an IT enterprise architecture has been around since the late 1980s, when John Zachman published his famous article on what's now called the
Zachman Framework [7]. In its current form, the framework contains six columns and six rows (see Figure 2).
Figure 1 — The merger of high-level IT activities.
Figure 2 — Zachman's Framework (columns: data, function, network, people, time, motivation; rows: scope, enterprise model, systems model, technology model, detailed representations, functional system).
Enterprise architecture is in. More organizations are using this framework as a way to capture and maintain high-level information linking IT assets, business concerns, and strategies. In a recent Cutter Consortium survey, for example, Cutter Consortium Senior Consultant Paul Harmon found that nearly two-thirds of responding companies are developing EAs (see Figure 3) [1]. It’s rare to find two-thirds of any large group of enterprises doing the same thing, yet it clearly demonstrates that there’s significant interest in EA.
Figure 3 — Does your organization have an enterprise architecture (EA)? (Yes: 64%; No: 36%)
But the next finding that Harmon reported was quite interesting and, from my standpoint, even startling. He asked what parts of Zachman’s model the respondent companies have information on. Figure 4 shows the answer to this question. The reason this chart is so startling is the very small number of organizations that say their EA contains definitions of the enterprise data. Now it may be that the respondents didn’t fully
understand the question, but it may relate to a deeper, more significant problem. The problem, I suspect, is that people think that with so much effort devoted to application data models and database design, there's no need for a specific EDA. But that's not the case. Data is the most important IT asset that an enterprise has; it represents the crown jewels of IT. Without it, nothing else works. So it would seem that if the "data" column in Zachman's framework is largely missing in most enterprise architecture activities, then there's a major hole in most EA efforts. And since I believe that the data column is perhaps the most important in the entire framework, I hope to provide guidance on how someone would go about developing an EDA and show how critical it would be in developing a data warehousing strategy.

I want to stress here that EA and all its pieces and connections are not an academic or abstract exercise, at least to my clients. The enterprise architecture is a core component of IT strategic planning and IT management. EA is somewhat new and certainly difficult, and there are few proven tools and methods for building one. But an enterprise architecture is nonetheless very important to building a successful real-time enterprise in the 21st century.
Building a Real-World EA
Over the years, much criticism in EA articles has held that many organizations took the Zachman Framework too literally. Indeed, for most of its history, the framework has lacked substance. In part, this was the result of too many people (including, to a degree, Zachman himself) trying to figure out what should go into each cell, as if by simply filling in this matrix, we would suddenly have a model that — like the DNA double helix — would allow us to understand the structure of the entire IT world. I believe most folks doing serious enterprise architecture understand that the Zachman Framework is a useful intellectual device for talking about high-level issues in IT planning and design, but it's not a road map, certainly not one you can pull out when you're lost. What we have found in "doing enterprise architecture" is that the Zachman Framework
is a very rough map that's only partially filled in. Our experience is that most large organizations have only pieces of the information they need to develop a complete EA. Figure 5, for example, illustrates the information most organizations have. What we see here is that when we start developing an EA, we have lots of detailed information at the bottom layers (rows) of the Zachman Framework and fairly spotty information everywhere else.[1] In addition, much of the information that we do have is either very dated or of poor quality. The job of developing an accurate EA, then, involves coming up with a logical framework that contains all the needed information, related in such a way that it makes sense and can be used to fill in the rest of the boxes.
[1] In some respects, Figure 5 suggests that we have more detailed information than we actually do. What it really represents is the fact that in order for the enterprise to operate at all, the bottom rows of Zachman's matrix must be filled in.
Figure 4 — If you have an EA, which of the following does it define? (Infrastructure: 29%; applications: 25%; systems: 24%; components: 13%; network: 7%; data: 2%)
How Do the Various Columns in the Zachman Framework Relate to One Another?
In his original paper, Zachman only included three columns:
data (what), process (how), and network (where). Though there have been additions, the original three columns have continued to be the most important and, in many ways, the most accessible. Clearly, the addition of people (who), time (when), and motivation (why) provides a more complete framework with respect to classical analysis; still, the first three are the aspects of EA for which the most data exists in most organizations.
Figure 5 — Information at the beginning of an EA.
Figure 6 — The input/output process plus database model: an application (system) takes inputs (data), produces outputs (data), stores data in a database, and runs on a computer (network).
Classically, systems (process) have inputs (data), produce outputs (data), store information in databases (data), and run on computers (network) (see Figure 6). This model corresponds very closely to the way Zachman originally portrayed his architecture, and it's easiest to relate to these particular IT components. In developing an enterprise architecture, we have found it easier to relate to all the various pieces, but it's always the data that represents the foundation of IT within the enterprise, since it has to do with the ultimate products that IT develops (e.g., outputs and screens). Computers are important because they allow employees to process, store, and retrieve information; there's no better way of handling the millions of pieces of data the organization has.

Every large organization has an enterprise architecture, whether it knows it or not. It may not be explicit, but it's there. And to a high degree, that EA revolves around the data that's stored in the databases. Not only do all large organizations have some form of enterprise architecture, they also have an EDA. For the most part, this data architecture is implicit rather than explicit. In most organizations, the EDA is
fragmented over the thousands of applications within the organization and shows up in the thousands of tables and files that are maintained on regular schedules (daily, weekly, monthly). Over the years, there have been numerous attempts to rationalize the implicit data architecture to help us make some sense of our data, remove redundancy, and improve data quality. We have done enterprise data models and application data models repeatedly, but after all is said and done, most organizations don't have coherent EDAs. We have discovered that one thing that's missing is a way to relate the data that we keep in our technical systems with the important things in the real world. We need a new way of thinking about data at a high level. For this, we need to have common "business semantics."

AN INTRODUCTION TO BUSINESS SEMANTICS

As with most areas of architecture and design, there's no one best way to do things. "Doing enterprise architecture," for example, takes analytical and communication skills and requires research. My own feeling is that the best data architects are like the best building architects: they're constantly learning about their users and looking for good models in books and articles.
Simple data models are hard to come up with; it's only the complex ones that are easy. My experience is that the harder you work, the simpler and more elegant the model becomes. But it helps if you've done a lot of them and have worked in other industries or companies. All the good models for the same industry tend to look alike; it's only the bad models that have a lot of variety.

In our work, we're constantly amazed at how a basic understanding of business semantics aids in both enterprise data architecture and data warehouse design. On a base level, data models are abstract representations of the real world. The better they model that real world, the better they work. Over time, most databases come to model the same classes of objects in much the same fashion. As a consequence, it's not stretching the point too much to say that all of the good data models look alike. The principal reason this is true is that the real world is pretty much the same for all enterprises in the same market. As a consequence, we store information about "customers," "salespeople," "products," "orders," and "shipments." This is not an accident. But even though database designers and architects have the best tools and the most experience at modeling large things, they have not had, by and large, a way of classifying the different classes of entities that they're modeling.
In an attempt to be general and business-independent, most of the people doing this kind of modeling have what I refer to as a "flatland view" of the things they're modeling. In IT, everything is an "entity" or an "object." All entities are the same; it's just the name of the entity and its connection with other entities that are important. There's no fundamental difference between a "customer" and an "order." This kind of thinking has kept us from seeing that there are real differences between the different subclasses of entities and that the differences ought to guide us in developing a very high-level, logical view of the enterprise's data.

In this section, we're going to discuss the semantics that underlie all our data modeling. We're going to attempt to show that these semantics show up everywhere in the organization and that, if we develop the right kinds of models that reflect these semantics, a whole range of valuable things are possible.

The Four Major Classes of Entities
In our work, we have discovered four basic classes of entities that are found in nearly every kind of business system. These entities are:

1. Actors
2. Messages
3. Objects (subjects)
4. Events
Recently, we have seen increased interest in business semantics. People are becoming more interested in semantics as they come to understand how closely semantics is related to the data that's modeled and stored.

Actors
As their name suggests, actors are the entities that cause things to happen in a systems sense. In general, we find three major subclasses of actor: individuals, organizations, and systems. From a systems standpoint, the most important characteristic of an actor is the ability to autonomously send and receive messages. In a systems context, actors send and receive messages from other actors. We model this phenomenon using something we call context diagrams (see Figure 7). These diagrams focus specifically
on identifying what the actors within a system are and the basic messages communicated between them. We use these context diagrams to help us understand how all the pieces fit together and to help us construct low- and high-level models that are internally consistent. Unlike many modeling techniques, we usually start in the middle and work up and down. This approach makes sure that we don't miss too many important things and also helps to keep us from getting into too much detail too early. For example, we often develop a diagram like the one in Figure 7, identify the boundary of the enterprise, and then redraw the diagram at the very highest level to determine how the system looks from outside the enterprise (see Figure 8). These context diagrams also help us identify the major external actors and external messages that the system supports, something that we know we'll need when we begin to develop the data architecture.
Figure 7 — A context diagram.
Messages (Transactions)
Actors communicate with one another via "messages." Messages always carry information, but they may carry other things as well. For example, it's possible to think of a package or shipment as a physical message. In a systems context, a series of messages represents a conversation or, in a business environment, what has been classically called a business exchange. A business exchange is also called a barter. Indeed, as far as economic historians can deduce, the most ancient form of business transaction was the barter ("A gives B a chicken, and B gives A some salt."). This underlies all of business and forms the essence of the legal concept of a contract. As we discuss business processes here, the idea of a business exchange will be central. Another term for message is "transaction"; however, the original meaning of a transaction is not a single message but rather a business exchange.

Messages in manual information systems often take the form of a document or conveyance. Students of written language suggest that the very earliest recorded writing was some kind of business
transaction. In most applications, messages contain all the key relational data. Probably the prototypical message is a package such as the one shown in Figure 9. A package such as this typically has three characteristics that are found in business messages:
1. A sender (from actor)
2. A recipient (to actor)
3. A set of contents (objects)
In a data system, most information messages have — at the least — this same general format: a sender, a receiver, and something that's being referenced. For example, take a typical invoice (see Figure 10). Most real-world systems depend on messages for communication. In a sales order system, for example, we receive "orders," send "shipments" and "invoices (bills)," and finally, receive "payments." These are all messages, describing the multiple parties (such as customers, salespeople, and the enterprise) and the stuff (objects or services) that we provide or receive. It doesn't matter whether these messages are electronic or written on the back of an envelope, typed, or handwritten — under the appropriate circumstances, they are legally binding on the two parties.

Messages are especially important in developing EDAs; indeed, messages are the glue (relations) that ties everything together. Figure 11 shows an initial data model for the actors and message we've been talking about. This diagram illustrates clearly how our invoice message relates the customer to the products he or she buys. All sales/revenue BI systems (such as data warehouses and data marts) start from invoice information. If you want to know which customers bought which products, you need to go to the invoices; if you want to know which products were bought by which customers, you need to go to the invoices. (In general, invoices may not be the whole story, since they don't tell you what products were returned and what invoices were never paid, but they do represent the basic, fundamental transactional data in the sales process.)
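To make the role of the invoice as relational glue concrete, here is a minimal SQL sketch of the query path in Figure 11. The table and column names are hypothetical illustrations, not taken from the report:

    -- Which customers bought which products? The invoice (message)
    -- carries the foreign keys that tie the actor to the object.
    SELECT c.customer_name,
           p.product_name
    FROM   invoice  i
    JOIN   customer c ON c.customer_id = i.customer_id
    JOIN   product  p ON p.product_id  = i.product_id;

Either question — customers by product or products by customer — resolves to the same join through the message table.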
Figure 8 — Abstracting a high-level context diagram.
Figure 9 — A package as a prototypical business message.
Figure 10 — A typical invoice (message).
Figure 11 — An entity-relationship diagram of the invoice message.
In our work, we're conscious from the beginning to look for the key actors in any enterprise modeling activity, along with the key messages and key objects with which that enterprise itself is involved. Most good architects do this naturally. All the good ones
have their own business semantics, but at a high level, most architects recognize the major classes of things that they are modeling. Indeed, many have basic templates that they apply, mostly unconsciously. It's important here to talk just a little about the differences between external and internal actors, messages, and objects. As we will see, in any enterprise-level modeling, there are internal and external actors and messages. In some systems, most of the entities being modeled are internal, but for the most part, it's the external actors, external messages, and externally referenced objects (products and services) that are most critical to get right. The reason for this is that it's normally the external actors and messages that determine the business' success or failure. Without customers and externally referenced products, there's no need for an order-entry system or any of the internal documents (messages) that track the order around the plant.

Objects (Subjects)
It’s a shame that there aren’t better words to describe what we mean by objects. Unfortunately, most of the really good words for things have already been used in other contexts, so we’ll use “object” in the way most people do, to represent some “thing,” usually a passive thing. In modeling the dynamics of a business system, there are at least two major parties (e.g., “us and the customer” or “us and the vendor” or “us and the employee”) involved. Typically, there are also many other secondary actors involved. We spend a lot of time modeling the business context so that we don’t miss any of them. And, in any real business context, there are also many messages that go back and forth between these parties. These messages count for a lot. If you really
understand these messages and the sequence in which they flow around the system from actor to actor, it's pretty straightforward to understand the business process that's going on. Messages are key because they provide clues to what activities go on in the system. And these activities lead to reports and screens that people need to do their work. In turn, these reports and screens point to data attributes and data structures needed in the enterprise data architecture. By focusing on the actors and messages, we can systematically extract the data structures and data attribute definitions that we need out of the systems definition.

But actors and messages aren't everything. Systems (business processes) are always "about something" besides the actors and messages. If we go back to the idea that every business process is some kind of barter (business exchange), then the object of a system is the subject of the business transaction. If I'm trying to model a "sales order system," for example, as Figure 12 illustrates, the whole exercise is about two objects: products (for the customer) and money (for us). Here, the object we're interested in is the product, which is the same thing we saw with customer, invoice, and product. "Products" are passive things that we sell, so it's important that we keep information about them.
Systems always have some key object or objects. Sometimes it's a "parcel of land" (a real-estate system), sometimes it's a "job" or "position" (an employment system), or "stocks and bonds" (a brokerage system), or a "policy" (an insurance system). But there is almost always some key object that's central to the system. Objects come in all sizes and shapes, and there is often a complex data structure associated with the object; working out that structure is part of the process of developing a good data model.
Figure 12 — The business exchange underlying the sales order system.
In many applications, one of the most complicated parts of the system has to do with product structure. A complicated tangible product such as an automobile or computer has a large number of individual pieces that must fit together precisely. A Bill of Materials (the list of components included in a product) and parts-explosion information are often included in the product database. A similar issue occurs in dealing with complex insurance policies as well. This is all object/subject information.
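As a sketch of how such recursive product structure might be stored and "exploded," assuming a hypothetical two-table design and standard SQL recursion (none of this comes from the report itself):

    CREATE TABLE part (
        part_id   INTEGER PRIMARY KEY,
        part_name VARCHAR(100)
    );

    -- Each row says: this assembly includes that component, n times.
    CREATE TABLE bill_of_materials (
        assembly_id  INTEGER REFERENCES part (part_id),
        component_id INTEGER REFERENCES part (part_id),
        quantity     INTEGER,
        PRIMARY KEY (assembly_id, component_id)
    );

    -- Parts explosion: walk down the hierarchy from one product.
    WITH RECURSIVE explosion (part_id, depth) AS (
        SELECT part_id, 0 FROM part WHERE part_id = 1001
        UNION ALL
        SELECT b.component_id, e.depth + 1
        FROM   bill_of_materials b
        JOIN   explosion e ON b.assembly_id = e.part_id
    )
    SELECT part_id, depth FROM explosion;

The structure is recursive because a component can itself be an assembly; the solution is short but, as noted later in this section, not obvious to everyone.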
Events

Events are important because they signal the beginning or end of something important, either within or outside the enterprise — something our business must monitor. Like objects, events are implicit rather than explicit in our context diagrams (see Figure 13).
Figure 13 — Identifying events on a context diagram.
Here, there's an event that may be important to the organization for every place that a message leaves or enters an actor. Many disciplines have been developed for modeling events in computer science. Most of this interest, however, has been in areas of real-time and control systems, where very complex decisions must be made in real time, and understanding the state of the object is very important. Historically, event information has often been ignored or relegated to control information or standard time functions. But events are important. And it's important to keep track of event
information in a wide variety of applications.
Subclasses of the Four Major Entity Classes

In addition to the major categories, there are some important subclasses of the four major classes. I try to represent this information in Figure 14. As you can see, there is no necessary symmetry between the subclasses of the four major classes. Actors tend to be individuals, organizations, and systems; objects break down into various kinds of tangible and intangible assets; messages are either internal or external; and events are either periodic or on demand.
Understanding the distinction between actors, messages, objects, and events makes it possible to talk much more intelligently about our enterprise architecture. Over time, this semantic way of looking at things grows on you, because making the distinctions provides all kinds of guidelines for how to deal with each kind of information.
This semantic breakdown helps me because, from a database standpoint, each example of the various classes tends to behave in much the same way. Actors are independent and therefore tend to have unique identifiers. The same is true with objects, but they often have very complex internal structures, which, in relational databases, mean multiple related tables. Actors (especially organizations) and a lot of product structures are recursive (e.g., Bill of Materials), which gives some people problems, not because the solution is complex, but because it may not be obvious. In all systems, messages link things (like actors and objects); as a result, you find a lot of foreign keys on message tables. Messages also tend to have simple (one- or two-level) hierarchies. But in many systems, messages represent the bulk of the data. In many systems, you have hundreds of thousands of customers, but you may have millions or billions of transactions. In data warehousing, message information almost always dominates other data. Events tend to be associated with time. Events are either periodic (e.g., weekly, monthly, yearly) or on demand (such as real time). They tend to contain snapshots of the real world at some point in time and are most closely related to messages.
Figure 14 — Examples of the four major entity classes and associated subclasses (actors: customer, salesperson, employee, vendor; objects: products, services, position/job, policy [insurance], parcel; messages: order, shipment, payment, shipping notice; events: send/receive order, send/receive shipment, send monthly/quarterly report).
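These behavioral differences can be expressed as a skeletal SQL schema. This is only a sketch with invented names — actors and objects carry their own identifiers, while the message table is mostly foreign keys plus a date:

    CREATE TABLE customer (        -- actor: independent, unique id
        customer_id   INTEGER PRIMARY KEY,
        customer_name VARCHAR(100)
    );

    CREATE TABLE product (         -- object: own id, often with complex
        product_id    INTEGER PRIMARY KEY,  -- related substructure
        product_name  VARCHAR(100)
    );

    CREATE TABLE order_message (   -- message: links actor and object
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id),
        product_id  INTEGER REFERENCES product (product_id),
        order_date  DATE           -- the event side: when it happened
    );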
Comments on Business Semantic Classes
We've used these business semantics for a long time. The more we use them, the more we incorporate them into everything that we do. As you'll see in the next section, we use our classification of actors, messages, and objects to design core data warehouses.

The fundamental idea that there are general classes of entities that reflect the real world is becoming more common. There has been a strong movement within the object-oriented community toward the use of patterns. In his wonderful book, Data Model Patterns [2], David Hay illustrates more than 100 excellent data models in which he follows business semantics very similar to the ones we use here. There are, of course, differences; for example, Hay uses "party" instead of "actor," but the concepts are, as far as I can tell, identical.

Armed with better business semantics and sets of such business semantic models, data architects are in a much better position to understand business areas that they may not have had experience with — and quickly come to understand how to model them. Equally important, they may be able to recommend models that are better in the long run. For too long, high-level modeling has relied principally on user input and architect experience. We are at a point in the history of software engineering where
we can really begin to develop high-level models that map to a common real world.

UNDERSTANDING ENTERPRISE DATA FLOW

Over the years, we have discovered that there are, in fact, two separate components that go into the enterprise data architecture:

1. The enterprise data flow architecture (EDFA) (data warehouse architecture)
2. The enterprise logical data architecture

We will devote this section to our EDFA.

EDFA (Data Warehouse Architecture)
For a couple of decades, I have been working steadily on how to understand the data that organizations have at a high level and how that data gets transformed, moved, and used. One reason is that, at various times in my career, I have been involved in designing systems that were primarily intended to support management (end-user) information access and retrieval. As a consequence, I have been involved in building systems that pulled data from one or more systems and assembled a database that would allow the end user easy access to that information. In the 1980s, the term “data warehousing” came into being when a number of companies began to explore new ways to deliver
information that would answer end-user queries from data on existing (i.e., legacy) systems. I was doing consulting with IBM at the time, and the problem we faced remains one of the major problems in IT — managers and staff need access to information that requires the integration of data from multiple, independent systems.

Modern enterprises must operate at much higher speeds than in the past. To do that, management needs answers and alternatives, and managers want to look at multiple scenarios. But most legacy systems were not good at providing management information. Since the databases involved were complex, it often took considerable programming to come up with the reports that management needed. The time and management frustration were even greater when the needed information had to be drawn from multiple systems.

The EDFA is a way of representing the overall structure of data, communication, processing, and presentation that exists for end-user computing within the enterprise. The architecture is made up of a number of well-defined, interconnected layers, or critical components (see Figure 15):

- The presentation/desktop access layer (1)
- The data source layer (operational data layer [2a], external data layer [2b], non-operational data layer [2c])
- The core data warehouse layer (3)
- The data mart layer (4)
- The data staging and quality layer (5)
- The data feed/data mining/indexing layer (6)
- The data access layer (7)
- The metadata repository layer (8)
- The warehouse management layer (9)
- The application messaging (transport) layer (10)
- The Internet/intranet layer (11)

I have chosen to describe this architecture in terms of the layers in which the data resides first (i.e., presentation, data source, core data warehouse, and data marts) and will then move on to the other supporting layers.
Figure 15 — Enterprise data flow architecture.
The Presentation/Desktop Access Layer
The information access layer is the one end users deal with directly. In particular, it represents the tools that the end user normally uses, such as Excel, Access, Business Objects, and SAS. This layer also includes the hardware and software involved in displaying and printing reports, spreadsheets, graphs, and charts for analysis and presentation. Over the past two decades, this layer has expanded enormously, especially as end users have moved much of their work to the Internet. An increasingly major component of this layer is a high-level search (represented by the book-like object at the top of the layer) that helps users find the information they want to manipulate and display.
The Data Source Layer

At the opposite side of this diagram are the sources of enterprise information. They are:

- Operational data — the data that resides in operational (legacy) databases.
- External data — the data that's imported into the enterprise from external data sources. A great deal of the marketing and competitive data that users need exists outside the enterprise; with the Internet, even more data becomes widely available.

- Non-operational data — in many systems, end users need information that's not currently available in any computer-readable form. This data must be created and maintained over time but doesn't reside in any traditional operational database.

The Core Data Warehouse Layer
The core data warehouse stages the actual data used primarily for informational uses. In some cases, one can think of the data warehouse simply as a logical or virtual view of data. But increasingly, the core data warehouse represents detailed and summarized data that makes generating data marts and answering ad hoc queries easier. In a physical data warehouse, copies of operational or external data are actually stored in a form that’s easy to access and highly flexible. (The
design of the core data warehouse is a critical issue in the EDFA, and business semantics can help in this activity.)

The Data Mart Layer
The data mart layer is the layer where the various so-called "data cubes," or multidimensional database tools, reside. It can also contain small departmental or project subset databases. The data mart layer typically includes what are often referred to as BI tools, such as:

- Online analytical processing (OLAP) tools
- Relational OLAP (ROLAP), multidimensional OLAP (MOLAP), and hybrid OLAP (HOLAP) tools
- Relational applications

There's a great deal of confusion about data warehouses versus data marts. To clarify, the core data warehouse is the central staging area that's intended to be the data source for a broad set of internal and external data, whereas the data mart represents a highly optimized data structure that allows a subset of end users to slice and dice a predefined set of data.
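To illustrate the kind of predefined slicing and dicing a data mart is optimized for, here is a sketch of a query against a hypothetical star-style mart — one fact table surrounded by dimension tables, with all names invented for the example:

    -- Monthly sales by product class for one sales region.
    SELECT d.year_month,
           p.product_class,
           SUM(f.sales_amount) AS total_sales
    FROM   sales_fact  f
    JOIN   date_dim    d ON d.date_key    = f.date_key
    JOIN   product_dim p ON p.product_key = f.product_key
    JOIN   region_dim  r ON r.region_key  = f.region_key
    WHERE  r.region_name = 'Midwest'
    GROUP  BY d.year_month, p.product_class;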
Data Staging and Quality Layer

The data staging and quality layer is perhaps the most underemphasized part of the data warehousing infrastructure. Data staging is also called copy management or
replication management, but it includes all the processes that are necessary to select, edit, summarize, combine, and load data warehouse and information access data from operational or external databases. The most critical part of this layer involves data quality. Much of the data that exists in our existing databases is of questionable value. Data quality tests and validation make sure that only high-quality data gets through to the core data warehouse. Functions typically included in this layer are (a minimal sketch of one such step follows the list):

- Copy management
- Simple transformations
- Data cleansing
- Metadata mining
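A minimal sketch of one such staging step — a copy with a simple data quality gate — using hypothetical staging and warehouse tables:

    -- Load only rows that pass basic quality tests from staging into
    -- the core warehouse; everything else stays behind for review.
    INSERT INTO warehouse_invoice (invoice_id, customer_id, invoice_date)
    SELECT s.invoice_id, s.customer_id, s.invoice_date
    FROM   staging_invoice s
    WHERE  s.invoice_id   IS NOT NULL
      AND  s.invoice_date IS NOT NULL
      AND  EXISTS (SELECT 1 FROM warehouse_customer c
                   WHERE  c.customer_id = s.customer_id);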
The Data Feed/Data Mining/Indexing Layer

The next component of the data warehouse architecture is the data feed/data mining/indexing layer. This layer takes data from the core data warehouse and performs a number of operations so that the data in the data marts can be accessed more easily and more rapidly. Proprietary multidimensional databases (MDDBs) normally require extensive preprocessing to precompute values used for slicing and dicing the data. Similarly, bit-mapped indexed databases require an extensive indexing pass to create
the needed indexes. These are the functions typically included in this layer:

- Data subsetting/summarizing
- Data mining
- Indexing
- Sparse matrix preparation
- Pre-aggregation of totals

The Data Access Layer
The data access layer is involved with allowing the data staging and quality layer to talk to databases in the data source layers without having to understand exactly how these data sources are organized. In today’s network world, SQL has emerged as the common data language. It was originally developed as a query language, but over the past few decades has become the de facto standard for data interchange. One of the key breakthroughs of the past few years has been the development of a series of data access “filters” that make it possible for SQL to access nearly all database management systems (DBMSs) and data file systems, whether relational or non-relational. These filters make it possible for state-of-the-art data access tools to access data stored on DBMSs that are pre-relational in nature. The data access layer not only spans different DBMSs and file systems on the same hardware; it also spans manufacturers and
network protocols. One of the keys to a data warehousing strategy is to provide end users with "universal data access." In theory at least, that means that end users, regardless of location or data access tool, should be able to access any or all of the data in the enterprise that's necessary for them to do their jobs. The data access layer, then, is responsible for interfacing between data access tools and operational databases. In some cases, this is all that certain end users need. However, in general, organizations are developing a much more sophisticated scheme to support data warehousing.
Functions in this layer include:

- Conversion between SQL and native database access
- Native database retrieval
- Conversion of native database format to SQL tables
- Sending SQL responses

The Metadata Repository Layer

To provide for universal data access, it's absolutely necessary to maintain some form of data directory, or repository of metadata (data about data within the enterprise). For instance, record descriptions in a COBOL program are metadata; so are DIMENSION statements in a Fortran program and SQL CREATE statements. The information in an entity-relationship diagram is also metadata. To have a fully functional warehouse, you must have a variety of metadata available, along with data about the end-user views of data and data about the operational databases. Ideally, end users should be able to access data from the data warehouse (or from the operational databases) without having to know where that data resides or the form in which it's stored. Information included in the metadata repository includes definitions of (a toy sketch of such a repository follows the list):

- Data source files/tables
- Transformation from data source to core data warehouse
- Core data warehouse
- Transformation from core data warehouse to data marts
- Data marts
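In relational terms, the repository itself can be a pair of tables. This toy sketch — entirely hypothetical and far simpler than any real repository — records data elements by layer and the transformations between them:

    CREATE TABLE meta_element (
        element_id  INTEGER PRIMARY KEY,
        table_name  VARCHAR(100),
        column_name VARCHAR(100),
        layer       VARCHAR(30)  -- 'source', 'core warehouse', 'data mart'
    );

    CREATE TABLE meta_transformation (
        from_element INTEGER REFERENCES meta_element (element_id),
        to_element   INTEGER REFERENCES meta_element (element_id),
        rule_text    VARCHAR(400), -- e.g., a cleansing or summarization rule
        PRIMARY KEY (from_element, to_element)
    );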
Warehouse Management Layer
The warehouse management layer is involved in scheduling the various tasks that must be accomplished to build and maintain the data warehouse and data directory information. This layer can be thought of as the scheduler, or the high-level job control, for the many processes that must occur to keep the data warehouse up to date. These functions include:

- Scheduling
- Performance
- Security
Application Messaging Layer
The application messaging layer involves transporting information around the enterprise computing network. Application messaging is referred to as "middleware," but it can involve more than just networking protocols. For example, it can be used to isolate applications, operational or informational, from the exact data format on either end. It can also be used to collect transactions or messages and deliver them to a certain location at a given time. Application messaging is the transport system underlying the data warehouse. This layer typically includes:

- Logging
- Connection between applications
- Bulk loading

Internet/Intranet Layer

The Internet/intranet layer provides the logical messaging format for communication between the various architectural elements. This layer includes:

- Browser interface (HTML/XML)
- TCP/IP

Some Comments About EDFA
Large organizations have widely accepted data warehousing. But there's a lot of confusion about exactly what that means. Not surprisingly, many terms — such as data marts, OLAP, ROLAP,
MOLAP, and business intelligence — have sprouted around the core concept. This has allowed many people to simply rename their standalone end-user applications (data marts) as data warehouse activities. Data warehousing involves implementing a core data warehouse that acts as an enterprise data asset; this makes supporting end-user requests for core information easier and more timely than the traditional piecemeal approaches.

The reason we have laid out our EDFA is that it's vitally important to understand the distinctions that underlie the rest of this report. Readers must not only understand that "data marts" and "core data warehouses" are different, but also where they fit in the overall enterprise data flow. If we create a real-time enterprise, we must know all of the places data exists in the EDFA so it can be updated and accessed correctly. So, with a clear understanding of our business semantics and the core data warehouse, we can discuss how we go about designing the core data warehouse itself.
Kimball Versus Inmon: Who’s on First?
One of the ongoing controversies in data warehousing stems from the idea of what data warehouses are and how to design them. The controversy boils down to what you think a "data warehouse" and a "data mart" are and how they fit together. In addition, it comes down to whether one sees data warehouse design as a subset (extension) of normal database design or as a wholly separate activity. Historically, the debate has been largely characterized by two of the most influential writers on data warehousing: Ralph Kimball [4] and Bill Inmon [3]. Kimball was the father of what has come to be known as the "star schema" approach to designing what he calls data warehouses. But little of Kimball's work addresses data warehouse architecture; rather, his technique is good for designing data marts.
Inmon, on the other hand, has taken a somewhat more traditional approach in which the data warehouse plays a more important role. He has focused on all the steps of moving data (right to left in Figure 16) from the data source layer to the core data warehouse layer to the data mart layer. In particular, Inmon is perhaps most famous for his introduction of the idea of the operational data store, a staging area in which data is pulled together from various data sources, cleansed, integrated, and then uploaded to various data marts. Now all of this seems very innocuous here, but the controversy over whose approach to use, Kimball’s or Inmon’s, has been an ongoing debate for nearly a decade. I’m not sure why, but it’s a heated debate. (Disclosure: I know both men slightly.) Each approach has its good and bad features. But the right approach, it seems to me, falls closer to Inmon’s work than
Kimball's, which lacks both rigor and understanding.

The most important thing to recognize is that, ultimately, the Kimball-Inmon debate centers on summary versus atomic data as the foundation for data warehouses. Kimball, unfortunately, has approached the problem of building a data resource for an enterprise from a traditional end-user/data processing standpoint. This view sees the development of data warehouses as data repositories that are simple enough for end users to access directly. But it focuses on repositories that are far too limited to support large-scale data access needs. In addition, Kimball's approach has enormous problems scaling.

The approach that we use to design a core data warehouse is based on our business semantics. As you will see, we believe in utilizing the notion of actors, messages, objects, and events to design and populate our data warehouses.

DESIGNING THE CORE DATA WAREHOUSE

Now that we have a framework for an EDFA, let's examine its bare bones (see Figure 16) and concentrate on what is perhaps the central issue for developing real data warehouses: the design of the core data warehouse.

Figure 16 — Bare bones enterprise data flow architecture (data source layer; core DW layer, with detailed DW; data mart layer).
The controversy over the respective roles of data marts and data warehouses is a direct result of misunderstanding the architectural role of the data warehouse. Why is the core data warehouse there at all? Why not simply load data directly from source data systems directly into data marts? The answers to these questions are at the heart of building the real-time enterprise.
The goal of developing a data warehouse is to have it be a modular set of data tables that can be used to support the broadest range of end-user reporting needs. In general, the core data warehouse is not intended for direct access by the end users themselves, but as a mechanism to allow the staging of data to data marts and other data sets that are intended for end-user access.
Figure 17 — Ease of use versus scope (flat file, data cube, star schema, snowflake, snowball, avalanche, third normal form [3NF]).
The answer to both questions is the same: end users must be able to easily find, access, manipulate, and display data from a number of sources. But there's a fundamental conflict between ease of use and scope. In other words, we can present data to the end user that's easy to manipulate but limited in scope (i.e., it contains only a limited amount of data about one subject), or we can give the end user access to data that will answer very complex queries requiring data from multiple tables, but we can't do both without data marts (or end-user databases) and core data warehouses. Figure 17 illustrates this key problem involved with data marts and core data warehouses. On one axis we have ease of use, and on the other, scope. Over decades of work in this field, it has become clear to me that the easiest data format for end users to deal with is an old-fashioned "flat file,"
in which all the data is stored on a file with one record that contains all the data fields. Although these files are highly redundant, most users find it easy to manipulate data found on these files, and ease of use is perhaps the most important factor when it comes to providing immediate access to a wide class of end users.

Next in ease of use would probably be data cubes. Data cubes arguably make slicing and dicing data easiest and most direct for managers and end users. Much of the early push to data warehousing came from vendors of MDDBs, which is another term for data cubes. Data cubes represented the majority of OLAP tools. The star schema enables most of the same capabilities of the data cube within the world of relational databases (see Appendix). Star schema designs are not quite as easy to use as flat files or data cubes, but they provide most of the same capabilities and run on most commercial relational DBMSs. However, data cubes and star schemas are somewhat limited when it comes to scope. Star schemas, like flat files, work best where there is only one (or a few) "fact" tables. As designers attempt to add more tables to their data marts, data cubes and star schemas become more complex and difficult to work with.

The next step in terms of scope is what some call "snowflake" schemas, which are hierarchies of dimensional tables and multiple fact tables. Over time, these structures, like traditional (non-normalized) database designs, become clumsier to work with. My experience is that snowflakes have a way of turning into snowballs, and snowballs into avalanches (see Figure 18). Then, we're back to a state of uncontrollable database complexity without any rhyme or reason for the design. The only stopping place as the scope of your data warehouse increases is a third normal form (3NF) relational database; anything else is much like patching patches.

Figure 18 — Crossing the complexity barrier (data marts sit to the left of the line: flat file, data cube, star schema, snowflake; true data warehouses, at 3NF, sit to the right).

My experience leads me to draw a line on Figure 17 that represents complexity (or scope), somewhere after the snowflake schema (see Figure 18). I consider everything to the left of that line a data mart, everything to the right a true data warehouse, or at least in need of data warehousing analysis and design. One of the great problems with star schema as a design philosophy is that, although it tends to work well for small problems, it doesn't work well, or at all, as the scope of the problem grows past that line. My experience has been that it's better to divide the problem of delivering data to end users into two parts: the data mart and the core data warehouse. Moreover, I recommend thinking of the base of the core data warehouse as a
normalized database copied primarily from the key source systems in which the base-level messages, actors, and objects are entered and processed. Normalization of data in large databases has a really bad rap in the data warehousing world. Why? Largely because really normalized databases need a great deal of manipulation (selecting, projecting, and joining) to yield what often seems to be a rather simple answer. A client recently commented that it was not unusual for a fairly straightforward query to involve as many as 20 different tables in one of his major systems. But end users have trouble writing SQL for three tables, much less 20. So if you read much of the BI literature, there is this common assumption that
however you organize your data warehouse, you want to make sure to denormalize it. It seems like a good suggestion, but normalization, it turns out, is not all bad, especially if you're trying to build a core data warehouse that can support multiple business functions with the same data resource. Normalized databases have two important capabilities: they're the most flexible way to organize large sets of data, and they're also the least redundant way to store large amounts of data. Because of these advantages, we develop our core data warehouse around mostly normalized atomic data, or base-level business messages, such as transactions.
Figure 19 — An extended customer/invoice/product data model.
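In SQL, the Figure 19 model might be sketched as below. The report shows only the entity diagram, so the column names — and the customer and product tables, assumed to exist already — are illustrative assumptions:

    CREATE TABLE invoice_header (
        invoice_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER REFERENCES customer (customer_id),
        invoice_date DATE
    );

    -- One row per line item: the atomic, normalized message data.
    CREATE TABLE invoice_line (
        invoice_id  INTEGER REFERENCES invoice_header (invoice_id),
        line_number INTEGER,
        product_id  INTEGER REFERENCES product (product_id),
        quantity    INTEGER,
        unit_price  DECIMAL(10, 2),
        PRIMARY KEY (invoice_id, line_number)
    );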
Now before you get concerned about the fact that normalized data is difficult for end users to manipulate directly, note that end users interact mostly with data marts or specially designed mini-marts that contain highly structured data in ways that are easy for the end user to manipulate. The core data warehouse is just what its name implies: a warehouse of information that's used to stage and store the information that's ultimately reformatted and used to load these user-friendly data marts.

Designing the Core Data Warehouse
Even though I don't try to build an entire enterprise-level data warehouse all at once (I highly recommend you implement your EDFA one application at a time), the core data warehouse will ultimately need to provide a large portion of the data the organization needs to manage the business. No highly denormalized star schema approach will solve this problem. We need an approach with significant theory and experience behind it: a (nearly) normalized base of data. Taking clues from the business semantic model, we can develop a core data warehouse design without many of the problems data architects have faced in building enterprise data models.

Let's return to our customer/invoice/product data model. In Figure 19, we've taken the model in Figure 11 and added some detail so that we can make
it directly into a normalized set of tables. We've done that by introducing an "invoice header" and an "invoice line" within the invoice entity. Now let's see how we can make a direct link from the important business semantic entities that we captured in business context and data modeling to core data warehouse design (see Figure 20). In our design process, four major quadrants make up the core data warehouse. They are:
- Quadrant 1 — atomic messages (transaction data)
- Quadrant 2 — atomic actors/objects (dimensional data)
- Quadrant 3 — summary messages (fact tables)
- Quadrant 4 — hierarchies of actors/objects

You can think of the bottom half (quadrants 1 and 2) of this model as being normalized or nearly normalized. The top half (quadrants 3 and 4) contains either summarized (denormalized) data or hierarchies of data typically related to the actor or object data. Of all the data that needs to go into a well-designed core data warehouse, the atomic messages are by far the most numerous and most important. In the example that we've been using, quadrant 1 might consist of invoice data. This invoice is, in turn, connected to at least one actor (the customer) and one object (the product).

Figure 20 — The four quadrants of the core data warehouse.
Customer
Data Sourcing or Replication
Product
Invoice Header Invoice Line
Data Sourcing or Replication Data Sources Data Sourcing or Replication
Figure 21 — Connecting the core data warehouse to the data sources.
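As a minimal sketch of how the four quadrants might be cataloged in practice, the following uses invented table names (carrying forward the customer/invoice/product example; none of these names come from the report):

    # A sketch of the four-quadrant layout of the core data warehouse.
    core_warehouse = {
        1: {"role": "atomic messages (transaction data)",
            "tables": ["invoice_header", "invoice_line"]},
        2: {"role": "atomic actors/objects (dimensional data)",
            "tables": ["customer", "product"]},
        3: {"role": "summary messages (fact tables)",
            "tables": ["monthly_sales_current_year"]},
        4: {"role": "hierarchies of actors/objects",
            "tables": ["sales_territory_hierarchy", "product_class_hierarchy"]},
    }

    for quadrant in sorted(core_warehouse):
        info = core_warehouse[quadrant]
        print(f"Quadrant {quadrant}: {info['role']}: {', '.join(info['tables'])}")

The design choice this encodes is the one argued above: the bottom two quadrants stay (nearly) normalized, while the top two carry the summarized and hierarchical data that end users actually touch.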
This may look simple, but in practice, connecting the core data warehouse with the data sources is the most complex part of most data warehousing projects. It's made somewhat easier if one pays attention to capturing the basic source (atomic) data with which to populate the data warehouse. This means that the data normally comes from a smaller number of ultimate sources.

As we build this approach, it's useful to relate it to other nomenclature within the data warehousing world. (Note that in Figure 22, much of the data warehousing world calls actors and objects "dimensions.")

Figure 22 — Dimensional and transactional data.

Many, if not most, "informational" systems organize data into hierarchies. Most often, these structures map directly onto the business's organizational structure. Assume two things here: we're building this core data warehouse to support a number of management information needs, and we want to support the sales management and product management functions. Sales is organized by territory, region, and company; product management, by product class, product family, and company (see Figure 23).

Figure 23 — Adding the management hierarchies to the core data warehouse framework.

The final step in coming up with our core data warehouse design is creating summary tables from the message data (see Figure 24). These tables are roughly equivalent to Kimball's "fact tables" but are much more modest in scope and much easier to understand. Such tables make sense when some accounting time period, such as a week or a month, is used as the basis for analysis. Creating such records also makes sense where there are huge amounts of detail (atomic) transactions and users can work with data that's already summarized. But although summary records are useful and necessary, remember that every time you summarize, you lose resolution.

Most BI data-flow thinking is a direct result of decades of work with systems that were designed when data storage and processing were very expensive and computers were slow. In the early days of computing, it made sense to summarize vast amounts of data, then use that summary data for future reporting. But that's not nearly so important in a world of super servers, high-speed data storage, and smart software. Data marts may be constructed from summary tables, but at some point, a really well-designed EDFA should make it possible for the user who has reached the bottom level of his or her data mart to drill back to the detail transactions that made up that fact table. There are other things that may be done to make the core data warehouse more effective. For example, it may be useful to create basic history records in the summary message quadrant (see Figure 25). This makes comparative reports (e.g., this year versus last year) possible. We now have an initial core data warehouse. The information in it makes it easy to create a data cube to support both the sales and product management functions, pulling data from the dimensional tables and the monthly sales table.
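The summarize-and-drill-back idea can be shown in a few lines. This is a sketch under assumed names (a flattened invoice_line table standing in for quadrant 1), not the report's schema:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    -- Quadrant 1: atomic messages (kept forever, never discarded).
    CREATE TABLE invoice_line (
        invoice_id INTEGER, customer_id INTEGER, product_id INTEGER,
        invoice_month TEXT, amount REAL);
    INSERT INTO invoice_line VALUES
        (1, 100, 7, '2003-01', 250.0),
        (2, 100, 7, '2003-01', 100.0),
        (3, 200, 9, '2003-02',  75.0);

    -- Quadrant 3: a monthly summary (fact) table built from the atomic rows.
    CREATE TABLE monthly_sales AS
        SELECT invoice_month, customer_id, product_id,
               SUM(amount) AS sales_amt, COUNT(*) AS invoice_count
        FROM invoice_line
        GROUP BY invoice_month, customer_id, product_id;
    """)

    # Summarizing loses resolution: the two January invoices collapse into one
    # row. Because quadrant 1 keeps the atomic data, a user can still drill back:
    detail = con.execute(
        "SELECT * FROM invoice_line WHERE invoice_month = ? AND customer_id = ?",
        ("2003-01", 100)).fetchall()
    print(detail)  # both underlying invoices are recoverable

Adding a second table such as monthly_sales_last_year, as Figure 25 suggests, is then just one more GROUP BY over the same atomic base.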
Figure 24 — Adding summary (fact) tables to the core data warehouse.

Figure 25 — An extended core data warehouse.
Figure 26 — Loading a data mart (data cube) from the core data warehouse.

Figure 26 represents the primary relationship between the core data warehouse and end-user data marts.
The warehouse represents the base information that's used to load the data marts. The base, or atomic, data in quadrant 1 is stable. It's created just once, then used to load a variety of data marts. In this initial core data warehouse, we can easily load data cube 1. Now suppose another end-user department, one responsible for marketing to specific industries, wants to build its own data mart based on a "customer hierarchy" that's different from the core data warehouse's (e.g., standard industrial classification [SIC], market, and enterprise). We can do that by adding hierarchical information to quadrant 4 (see Figure 27). The principal benefit here is that the core data warehouse is designed to expand incrementally, eliminating the need for each new data mart to build its own data staging process. As we add information about the business process, we need to add business transactions to quadrant 1. For example, we might extend our warehouse to support the entire sales order process by adding messages for orders, shipments, returns, credit memos, payments, and refunds. With this information, we can construct any view of the process for those who oversee it.

Figure 27 — Loading a second data mart from the core data warehouse.
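Here is a sketch of that incremental expansion, with hypothetical names and data: a second customer hierarchy is added to quadrant 4 and rolls up the same facts, without touching the atomic data or the existing staging process:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE monthly_sales (month TEXT, customer_id INTEGER, sales_amt REAL);
    INSERT INTO monthly_sales VALUES ('2003-01', 100, 350.0), ('2003-01', 200, 75.0);

    -- Quadrant 4: a second, industry-oriented customer hierarchy, added later
    -- for the new data mart (SIC -> market -> enterprise).
    CREATE TABLE customer_industry (
        customer_id INTEGER, sic_code TEXT, market TEXT, enterprise TEXT);
    INSERT INTO customer_industry VALUES
        (100, '2711', 'Publishing', 'Acme Media'),
        (200, '2731', 'Publishing', 'Globex Books');
    """)

    # The second data mart rolls the same facts up the new hierarchy.
    for market, total in con.execute("""
        SELECT i.market, SUM(s.sales_amt)
        FROM monthly_sales s
        JOIN customer_industry i ON i.customer_id = s.customer_id
        GROUP BY i.market"""):
        print(market, total)  # Publishing 425.0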
The Importance of the Core Data Warehouse Design Framework

It's not a stretch to think of data warehousing as being the first, and perhaps most important, enterprise system.
Data warehousing came about because end users' demands for information were exceeding the ability of traditional one-of-a-kind data marts to solve the problem. Whenever a department or project needed management information, it extracted data from whatever sources were available, then massaged that data to get the results. Some of these systems outgrew their original purposes and took on lives of their own. Frequently, the results from these one-off systems differed from the results the production operational systems (e.g., month-end accounting) would generate. But even though people could envision a corporate or business process data warehouse, such warehouses were simply too big to create all at once. With an overall data flow architecture and a core data warehouse design template, it becomes possible to build a core data warehouse that can serve dozens, or perhaps hundreds, of different functions. Each new function doesn't have to redo the connection to the source data; all you need to do is add the new information to the warehouse and to the data staging process. We have spent a lot of time defining the EDFA and the core data warehouse because, in many respects, they represent major targets for our enterprise data architecture. The architecture is important because it makes the logical connection
between the business process (workflow) data and the management (analysis, control, and planning) data. The core data warehouse is intended to be the key vehicle for mapping one into the other.

DISCOVERING THE EDA

Suppose for a moment that the core data warehouse represents one stake in the ground, a primary goal of information integration within the enterprise. This makes sense, because providing management with data that is easier to get at and manipulate, and of higher quality, is critical to a real-time enterprise. With this in mind, we can view the EDA as a way to model the data within the enterprise so that we can see which parts of that data are most important. This architecture should help us identify the actors, messages, objects, and events that matter most to the business. With that information in hand, it becomes possible to identify the different classes of users (e.g., top management, departmental management, brand management) and to begin to ask and answer the classic questions (who? what? when? where? how? why?) to determine which sets of information they will most likely need.

Getting at the Business Actors, Messages, Objects, and Events
Though it looks easy, identifying the key business actors, messages, objects, and events is not. The reason lies in the subtleties of natural language. For the most part, IT folks are not particularly well versed in semantics. Programmers simply want unique names for things that the computer will understand and process correctly. They don't particularly care whether the terms they use within the computer make sense to people; they just want to make sure those terms make sense to the compilers they work with. Moreover, they prefer short names because they're easier to type. Database analysts have a more sophisticated sense of the subtleties of natural language, but they're mostly concerned with ensuring that attributes and tables have unique names that will work with the DBMS they're using and the programming languages that have to access it. Database (and object) modelers are often the most sophisticated people in the organization when it comes to the semantics of data, but they're sometimes too abstract when it comes to language. And most of all, they just want everybody to agree on one meaning for one concept.

Here are a couple of examples drawn from personal experience. Several years ago, I was about two weeks into a new assignment and was in a modeling meeting when an experienced analyst threw up her hands and remarked, "This company is so screwed up that management doesn't even know what it means by the term 'customer'!" Since then, I've heard this same thing many times. And it isn't just "customer"; I've heard the same remarks about "employee," "vendor," and "product." It caused me to start wondering: what's the real problem here? I came to see that terms such as customer, employee, and vendor are not simple concepts. The reason people have so many problems with common terms is that they use the same words to represent different things. In a major diversified company, for example, there are lots of different classes of customers because the company has different classes of products. General Electric (GE), for example, sells dishwashers and microwaves, as well as locomotives and jet engines. Each group refers to customers and products, but the customers and products are very different. In the consumer world, there are two major classes of customers: the retail outlets that buy GE appliances directly, and the consumers who buy them from the retail outlets. Meanwhile, a locomotive or jet engine, being a different kind of product, requires different product, marketing, and pricing information and is sold in ones and twos, not thousands.
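One way to resolve the "customer" ambiguity is to model each class of customer explicitly while keeping the shared word as an umbrella type. The following is a minimal sketch; the class names and fields are hypothetical illustrations, not anything from the GE example:

    from dataclasses import dataclass

    # "Customer" is not one concept; give each class of customer its own type.
    @dataclass
    class Customer:
        customer_id: int
        name: str

    @dataclass
    class RetailOutlet(Customer):   # buys appliances directly from the maker
        channel: str = "retail"

    @dataclass
    class Consumer(Customer):       # buys appliances from the retail outlets
        channel: str = "consumer"

    @dataclass
    class FleetBuyer(Customer):     # buys locomotives or jet engines, in ones and twos
        channel: str = "capital-goods"

    buyers = [RetailOutlet(1, "Big-Box Stores"), Consumer(2, "J. Smith"),
              FleetBuyer(3, "TransRail")]
    for b in buyers:
        print(type(b).__name__, b.name, b.channel)

The design point is that the word stays the same while the model records which meaning is intended, which is exactly the distinction the modeling meetings kept stumbling over.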
Understanding the Business Context
Normally, the best place to begin developing an enterprise data architecture is to define the business context. As we said in the business semantics section, we're looking for the business-critical actors, messages, objects, and events. What we capture provides the context in which things exist, such as who starts things off and what we send to whom. Figure 28 shows the context of an enterprise, in this case a printing company. The framework diagram in Figure 28 is really the sum of a number of context diagrams of individual systems. At this level, the diagram provides a basis for thinking about the business's major processes, such as sales order, supply chain, payroll, and management reporting. From a data architecture standpoint, a number of things stand out here. First, the diagram begins to point out some of the actors that we'll have to track (customers, vendors, employees). In addition, we can begin to spot the objects that belong in the same part of the model as the key external actors.

Figure 28 — An enterprise-level context diagram.

Reviewing Existing Data Models
Our recommendation to people building an EDA is to do it by business domain, since it's always easier to put things together than to pull them apart.
The great thing about an enterprise data architecture is that it doesn't change very much over time. If an enterprise stays in the same business and operates through the same channels, the EDA is likely to stay the same. For this reason, it's important to go back through previous requirements or modeling activities. As an example, I have been working with one of my best clients for a couple of decades; in that time, its basic data models haven't changed much. Moreover, it often doesn't matter that a particular model wasn't implemented or was only partially implemented. What matters is that it's a good high-level, logical model of the business. There are times when the basic business changes and the historical data models don't reflect that. For example, a few years back, I worked with an apparel company that shifted its fundamental business model from manufacturing private-label clothing for large retail chains in its own plants (mostly in the US) to acquiring small companies with their own labels that made their clothing in Third World countries. It makes sense to try to locate whatever models exist. If you have worked in similar industries or have friends who do, it can't hurt to look at other models. Finally, there are some books with which you should be familiar:

- In Enterprise-Wide Data Modelling: Information Systems in Industry, A.W. Scheer explains how enterprise data architectures go together [5].
- In Data Model Patterns: Conventions of Thought, David C. Hay offers excellent models based on a set of semantics quite close to the one we explained in Section 2 [2].

Understanding the Enterprise's Business Processes
Context diagrams provide a basis for understanding the overall business. Business process, or "swimlane," diagrams help you understand how the business handles business exchanges. Documenting the business processes helps the student of the business understand how the principal business messages flow through the organization. Figure 29, for example, shows the sales order process for the printing company whose business context appears in Figure 28. By understanding the business, it becomes easier to understand what other actors, messages, objects, and events need to be reflected in our EDA. Not only do we come to know the internal actors, we gain an understanding of the "states" of the sales order as it works through the system. We also see that we need to be able to model "job specs," "estimates," and "proposals" as essential elements of our data
model. (When we get into discussions of data warehouse planning, we also realize that in order to support management information needs, we will need to capture job spec, estimating, and proposal data if we’re going to track sales order fulfillment more closely.) At each stage in this EDA process, we fill in more information about the critical business semantic entities. But throughout the data architecture process, the need to be exhaustive must be balanced against the need to present a clear, simple, high-level view of the enterprise. There’s no simple shortcut for this process.
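Those "states" of the sales order can be made explicit. The following sketch infers a transition map from the labels visible in Figure 29 (job request, job spec, estimate, proposal, order); the exact flow is an assumption based on the figure, not a specification from the report:

    # A sketch of the sales order states implied by Figure 29.
    # The transition map is inferred from the figure's labels.
    TRANSITIONS = {
        "job request": ["job spec"],   # Sales reviews the job
        "job spec":    ["estimate"],   # Estimating prices the work
        "estimate":    ["proposal"],   # Sales prepares the proposal
        "proposal":    ["order"],      # the customer submits the order
        "order":       [],             # hand-off to production scheduling
    }

    def advance(state: str, next_state: str) -> str:
        """Move a sales order to its next state, enforcing the process flow."""
        if next_state not in TRANSITIONS.get(state, []):
            raise ValueError(f"illegal transition: {state} -> {next_state}")
        return next_state

    state = "job request"
    for nxt in ["job spec", "estimate", "proposal", "order"]:
        state = advance(state, nxt)
        print("sales order is now:", state)

Capturing each state as data is what later lets the data warehouse track sales order fulfillment, as noted above.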
Figure 29 — Sales order business process.

Producing this view is challenging under the best of circumstances. The most important thing to remember is that the EDA's role is different from that of an application's data model. It's a way to help people conceptualize enterprise data so that we can do a better job of positioning various systems projects and activities.

Modeling Major Objects
Most organizations produce something. In manufacturing or construction businesses, there is broad consensus about the importance of the enterprise's products. In other kinds of businesses, such as service organizations and public agencies, the product or service delivered is more abstract and, therefore, not quite so well thought out. In our work, we have found that modeling an enterprise's major products or services is very important and useful in understanding how it should structure its enterprise data architecture. How information is or will be used to support management needs is equally important. In a recent project with a construction organization, it became clear that, at the highest level, the organization delivered a "completed project"; in this case, the completed project was some segment of roadway with such elements as bridges and overpasses (see Figure 30).
Figure 30 — Basic object structure.
The "project" structure shows that projects are split into two basic components: work phases and construction line items. In turn, information is maintained about estimates and actual costs for various work activities, and for "structure" (e.g., bridges and intersections) and "non-structure" (e.g., highway segments) items. I had been working in this area for some time and had given a number of presentations on the overall data model, which contained so many entities that it was referred to lovingly as the eye chart. This was the first time I felt I had some understanding of how the various tables fit together. That understanding became even clearer when we placed the major tables in the application over this project data structure (see Figure 31). The white boxes belong to the project management application, while the gray ones belong to applications that are used to support it. By representing the information about projects this way, it became much easier to understand and talk about how this information had been used in various end-user activities and how it might be used in the future. One of the great insights that came out of this analysis was the application of the basic six questions of journalism to organizing this information (see Figure 32).
Figure 31 — Project data structure overlaid with current relational tables.

Figure 32 — Attaching five of the six classic questions to the project data structure.
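As a sketch of what Figure 32 expresses, the classic questions can be attached to the application's tables directly. The table codes below are the application's own abbreviations as they appear in Figures 31 and 32, and the assignments are my reading of the figure rather than a documented mapping:

    # Hypothetical mapping of the classic journalism questions onto the
    # project data structure of Figures 31 and 32.
    QUESTION_MAP = {
        "Who?":      ["AEMP", "ACAP"],          # employees, contractors/consultants
        "What?":     ["PSTR", "PRWY"],          # structure and roadway items
        "When?":     ["PWPH"],                  # work phases
        "Where?":    ["Project"],               # the project itself
        "How much?": ["PWBS", "CCFB", "PWPF"],  # financing and cost tables
    }

    for question, tables in QUESTION_MAP.items():
        print(f"{question:10s} -> {', '.join(tables)}")

A map like this is the seed of a data warehouse dimension plan: each question tends to become a dimension or a fact.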
This activity was only possible because we had knowledgeable people who had been involved with developing the basic system and many of the end-user data marts over a long period of time. But out of this discussion, it became evident that developing such a chart and overlaying these questions could go a long way toward developing our long-term data warehouse structures and analyzing other parts of the business with which we were not so familiar.

Some Observations About Understanding Information in a Real-World Context
The more you know about business semantics, business processes, and data architecture, the more you understand how the pieces fit together. No matter how complex the organization, or how complex the systems structure, it's possible to see the outlines of the business process. As a result, you learn to look for clues that tie things together. You learn to focus on the places in the organization (or systems structure) where the original business messages enter and leave the organization. And you learn, as I pointed out in the section on the EDFA, to focus on detail transactional data. I have learned, for example, to try to bring in the lowest level of detail as the basic structural information on which to build data warehouses. But even here, semantics and reality enter the conversation.

In our discussion of the data structure in Figures 30 and 31, it became clear that much of the data that appeared in gray was actually tied to even lower-level information, in this case, time sheets and equipment usage records. But, my subject matter expert cautioned me, we didn't actually sum up the time sheets and equipment records, because the information was just not good enough. Indeed, much of the cost data was actually added to higher-level records rather than computed from the lowest-level information. As I thought about it, it occurred to me that many, if not most, systems have similar complex data entry problems — data is entered not just at the lowest level of some organizational or business process hierarchy, but at the level where the information is deemed appropriate.

Some Observations About Stable Systems and a Stable EDA
I'm a firm believer that there are always good reasons why certain patterns reappear in all sorts of circumstances. For example, I have always been intrigued by how universal and stable accounting systems such as accounts payable, accounts receivable, and payroll are. What is it about these systems that has let them last through the years? We're just beginning to get a glimmer of why these systems seem to be so universal, and that's helping us see how we might structure our EDA so that
people could understand it better and use it more.

The main observation about stable applications is that they have something fundamental to do with the underlying business semantics. The most stable systems have to do with one external actor, such as a customer, vendor, or employee, and a consistent set of business messages that cover a business exchange or business process (see Figure 33).

Figure 33 — The relationship of stable applications within the enterprise.

Not only do stable accounting systems relate to one of the principal outside actors (stakeholders) that the enterprise deals with, but each also runs parallel to an operational system. So, for example, the purchasing and accounts payable systems run in parallel, as do the sales order and accounts receivable systems. This holds for the human resources and payroll systems as well. In each instance, an operational system deals with the object of the exchange, and an accounting system handles the exchange of money.² Although this way of thinking may not work in all instances, it works in enough systems to apply it as a way to help us organize our enterprise data. Figure 34 gives us a simple framework for structuring the highest level of our EDA.

² I owe this insight into the high-level organization of data to my friend J.D. Warnier [6].

Figure 34 — Enterprise data architecture framework.
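The pairing of operational and accounting systems around each external actor can be written down directly. This sketch abbreviates the message lists and takes its labels from Figures 33 and 34:

    # The operational/accounting pairing per external actor, per Figures 33 and 34.
    EXCHANGES = {
        "customer": {"operational": "sales order",
                     "accounting":  "accounts receivable",
                     "messages": ["customer order", "customer shipment",
                                  "customer invoice", "customer payment"]},
        "vendor":   {"operational": "purchasing",
                     "accounting":  "accounts payable",
                     "messages": ["P.O.", "vendor shipment",
                                  "vendor invoice", "vendor payment"]},
        "employee": {"operational": "human resources",
                     "accounting":  "payroll",
                     "messages": ["assignment", "time sheet", "paycheck"]},
    }

    for actor, systems in EXCHANGES.items():
        print(f"{actor}: {systems['operational']} <-> {systems['accounting']}")

Each row is one business exchange: the operational system handles the object of the exchange, the accounting system the money.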
There are obviously lots of other ways to structure an EDA. This one has a kind of elegant simplicity and symmetry, and I certainly recommend it. Architects love symmetry, but don't lean too heavily on it. Real businesses, especially large ones, are complex. Even in the most Byzantine organization and systems structure, there are often underlying reasons why the organization and its systems are what they are. Indeed, an enterprise's success often flows more or less directly from the unique way it sees the world.

CONCLUSION

Of all the components of an enterprise architecture, the EDA is the most important. For a couple of decades now, we have been talking about reusing objects, components, programs, and systems. But what we really reuse is data. Data is the currency of IT — the one thing IT provides that the enterprise really can't do without. So if there's anything in the Zachman Framework that's particularly important, it's the data column. Over the history of computing, data tools have led the way
toward standardization. In the 1950s and 1960s, there was a push for standard access methods to eliminate the need to know exactly where data was stored on our systems. In the 1960s and 1970s, we saw first-generation DBMSs that allowed different applications to share the same data. In the 1980s and 1990s, we saw the emergence of relational DBMSs that gave us simple but elegant ways of sharing data and answering queries against it. In the late 1980s and early 1990s, we also saw cross-database architectures, such as data warehouses, that helped us bridge the gap between incompatible databases and data naming. That process continues today. EDA represents the next generation: managing all major data components across the whole enterprise, or at least large parts of it. Without an overall road map, it's amazingly difficult to know where you are. I spend a fair amount of my time working with systems analysts, database
administrators, and programmers down in the bowels of the IT organization. It's hard to understand the big picture when all you can see is a wall-sized data model where the boxes have only six- or eight-character abbreviations. So many of the problems I see every day stem from not having a common vision about data entities, data attributes, and data names — in other words, from a lack of understanding of the business and technology semantics that exist in the real world. It's hard for most of IT's users to understand how easy it is to get lost in the jungle of different representations and different models. It's hard for them to understand that even though we talk about systems and data engineering, there isn't as much of it as we'd like to pretend. Most of our systems are not very smart. It would be nice if this were not the case, but it's true. A great deal of our data architecture is determined more by our database technology (or at least our view of it) than by the business's long-term needs. And many of our business trends don't help this problem. Buying large-scale packages may save millions in helping us move from legacy to modern multitier applications, but in general, it doesn't make getting at data any easier. A few years ago, I was working with a client that had just installed a major enterprise resource planning (ERP) package. Initially, the vendor pitched the package on
the basis that "we have thousands of reports; any information you need, we already have!" Unfortunately, the client didn't need any of the thousands of reports, and getting the data out of the package and into a form that could be integrated with the information the client already had became a nightmare. At one point, the company could get one of its data marts to come only within $30 million of its operating statement (more than just a rounding error for this particular division). For those of you challenged to justify the cost of developing a good EDA, just point to the cost of the hundreds or thousands of redundant data files in operations. As a test, see whether the operations folks can tell you what's on any specific database or how redundant the data on that database is with that of an adjacent application, such as order entry and shipping, or shipping and billing. In IT, we have watched for decades as we have promised more than one generation of management that, with just one more technological upgrade, we would at last be able to get them the information they need in the appropriate time frame. And we can deliver the prepackaged stuff, such as canned queries and standard reports, most of the time. But if anything, we're further away from being able to provide the kind of instantaneous information that the real-time enterprise needs.
Cisco has gotten a lot of mileage with IT executives for its internal systems and its claim that it can close its books in less than a day. Most of my clients are not in that boat. Cisco got there through lots of work and commitment to good systems architecture. Cisco is perhaps as close to being a real-time enterprise as there is in the business world. But other companies are making the same commitment, intent on becoming real-time enterprises. If you read my Enterprise Architecture Executive Report last October, you know that I believe the top row of the Zachman Framework more nearly represents city and regional planning than it does architecture. In much the same way, the data architect should be thought of more as a data planner than an architect. The city planning function is not so much about laying out exactly how various parts of the city are to be organized as about trying to gain a political consensus on the general topology of the city and its environs. In the same way, the top-level data planning function involves laying out the general flow (the enterprise data flow architecture) and the general topology of the major data assets. As more organizations move toward becoming real-time enterprises, they're going to need EDAs to help them make this move, and architects who understand the map of the whole enterprise, not just one piece. Today,
this is especially important work in the evolution of advanced computer systems.

ABOUT THE AUTHOR

Ken Orr is a Fellow of the Cutter Business Technology Council and a Cutter Consortium Senior Consultant and contributor to Cutter's Business-IT Strategies and Enterprise Architecture Practices. He is also a regular speaker at Cutter Summits and symposia. Mr. Orr is a Principal Researcher with the Ken Orr Institute, a business technology research organization. Previously, he was an Affiliate Professor and Director of the Center for the Innovative Application of Technology with the School of Technology and Information Management at Washington University. He is an internationally recognized expert on technology transfer, software engineering, information architecture, and data warehousing. Mr. Orr has more than 30 years' experience in analysis, design, project management, technology planning, and management consulting. He is the author of Structured Systems Development, Structured Requirements Definition, and The One Minute Methodology. He can be reached at korr@cutter.com.

REFERENCES

1. Harmon, P. "Enterprise Architectures." Cutter Consortium Enterprise Architecture Executive Update, Vol. 5, No. 16, September 2002.
2. Hay, D. Data Model Patterns: Conventions of Thought. Dorset House, 1996.

3. Inmon, W.H. Building the Data Warehouse. 3rd edition. John Wiley & Sons, 2002.

4. Kimball, R. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. 2nd edition. John Wiley & Sons, 2002.

5. Scheer, A.W. Enterprise-Wide Data Modelling: Information Systems in Industry. Springer-Verlag, 1990.

6. Warnier, J.D. Logical Construction of Systems. Van Nostrand Reinhold, 1981.

7. Zachman, J. "A Framework for Information Systems Architecture." IBM Systems Journal, Vol. 26, No. 3, 1987.
APPENDIX: STAR SCHEMA DATA MART DESIGN

Ralph Kimball is credited with creating the "star schema" design approach [4]. A star schema is perhaps best thought of as a way to implement a multidimensional database (MDDB) in a relational framework. A data cube of information about the sales and costs of products sold in a given month might include hierarchical dimensions for customers, sales regions, products, and time. A manager could then quickly look at which customers bought which products, which products were bought in which regions, or in which time period the products were purchased. Most MDDBs make this very easy and intuitive. Kimball's design strategy makes this kind of multidimensional analysis straightforward within the framework of a more traditional relational database.
In this strategy, the design of a data mart or data warehouse is organized around two major elements: fact tables and dimension tables. Fact tables represent some central fact or concept that users are interested in, for example, purchases. A fact table resembles a flat file with the redundant information pulled out and stored in what Kimball calls dimension tables. These dimensions (see Figure 1) are what make the hierarchical analysis possible. The term "star schema" comes from the star-like appearance of the resulting entity-relationship diagram.
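Here is a minimal sketch of the schema in Figure 1, again using Python's sqlite3; the column names follow the figure's labels (Customer #, Product #, Region #, Year, Sales Amt, Cost Amt), while the sample data and query are invented for illustration:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    -- Dimension tables (from the appendix's Figure 1).
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE region   (region_id   INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: one row per customer/product/region/year combination.
    CREATE TABLE sales_fact (
        customer_id INTEGER REFERENCES customer(customer_id),
        product_id  INTEGER REFERENCES product(product_id),
        region_id   INTEGER REFERENCES region(region_id),
        year        INTEGER,
        sales_amt   REAL,
        cost_amt    REAL);

    INSERT INTO region VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales_fact VALUES
        (1, 7, 1, 2002, 500.0, 300.0),
        (2, 7, 2, 2002, 250.0, 150.0);
    """)

    # A typical dimensional query: sales by region for one year.
    for name, total in con.execute("""
        SELECT r.name, SUM(f.sales_amt)
        FROM sales_fact f JOIN region r ON r.region_id = f.region_id
        WHERE f.year = 2002
        GROUP BY r.name"""):
        print(name, total)

Note how this differs from the core data warehouse design in the body of the report: the fact table is already summarized, so a star schema is best used for the data marts, not for the atomic base.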
Figure 1 — A "star schema" design. (The fact table carries Customer #, Product #, Region #, Year, Sales Amt, and Cost Amt; Customer, Product, Region, and Year are the dimension tables.)