Solution Manual for Database Systems Design, Implementation, & Management, 14th Edition Carlos Coron by welldoneassistant

CHAPTER 1: DATABASE SYSTEMS

TABLE OF CONTENTS Answers to Review Questions…….................................…………………………………………….1 Answers to Problems ...............................................................................................................9

ANSWERS TO REVIEW QUESTIONS 1.

Define each of the following terms: Answer: a. data Raw facts from which the required information is derived. Data have little meaning unless they are grouped in a logical manner. b. field A character or a group of characters (numeric or alphanumeric) that describes a specific characteristic. A field may define a telephone number, a date, or other specific characteristics that the end user wants to keep track of. c. record A logically connected set of one or more fields that describes a person, place, event, or thing. For example, a CUSTOMER record may be composed of the fields CUST_NUMBER, CUST_LNAME, CUST_FNAME, CUST_INITIAL, CUST_ADDRESS, CUST_CITY, CUST_STATE, CUST_ZIPCODE, CUST_AREACODE, and CUST_PHONE. d. file Historically, a collection of file folders, properly tagged and kept in a filing cabinet. Although such manual files still exist, we more commonly think of a (computer) file as a collection of related records that contain information of interest to the end user. For example, a sales organization is likely to keep a file containing customer data. Keep in mind that the phrase related records reflects a relationship based on function. For example, customer data are kept in a file named CUSTOMER. The records in this customer file are related by the fact that they all pertain to customers. Similarly, a file named PRODUCT would contain records that describe products—the records in this file are all related by the fact that they all pertain to products. You would not expect to find customer data in a product file, or vice versa.

NOTE Field, record, and file are computer terms, created to help describe how data are stored in secondary memory. Emphasize that computer file data storage does not match the human perception of such data storage. 2.

What is data redundancy, and which characteristics of the file system can lead to it? Answer: Data redundancy exists when unnecessarily duplicated data are found in the database. For example, a customer’s telephone number may be found in the customer file, in the sales agent file, and in the invoice file. Data redundancy is symptomatic of a (computer) file system, given its inability to represent and manage data relationships. Data redundancy may also be the result of poorly designed databases that allow the same data to be kept in different locations. (Here's another opportunity to emphasize the need for good database design!)

What is data independence, and why is it lacking in file systems? Answer: Data independence is a condition in which the programs that access data are not dependent on the data storage characteristics of the data. Systems that lack data independence are said to exhibit data dependence. File systems exhibit data dependence because file access is dependent on a file’s data characteristics. Therefore, any time the file data characteristics are changed, the programs that access the data within those files must be modified. Data independence exists when changes in the data characteristics don’t require changes in the programs that access those data. File systems lack data independence because all data access programs are subject to change when any of the file system’s data storage characteristics—such as changing a data type—change.

What is a DBMS, and what are its functions? Answer: A DBMS is best described as a collection of programs that manage the database structure and that control shared access to the data in the database. Current DBMSs also store the relationships between the database components; they also take care of defining the required access paths to those components. The functions of a current-generation DBMS may be summarized as follows: 

The DBMS stores the definitions of data and their relationships (metadata) in a data dictionary; any changes made are automatically recorded in the data dictionary.



The DBMS creates the complex structures required for data storage.



The DBMS transforms entered data to conform to the data structures in the previous item.



The DBMS creates a security system and enforces security within that system.



The DBMS creates complex structures that allow multiple-user access to the data.



The DBMS performs backup and data recovery procedures to ensure data safety.



The DBMS promotes and enforces integrity rules to minimize data integrity problems.



The DBMS provides access to the data via utility programs and from programming languages interfaces.



The DBMS provides end-user access to data within a computer network environment.

What is structural independence, and why is it important? Answer: Structural independence exists when data access programs are not subject to change when the file’s structural characteristics, such as the number or order of the columns in a table, change. Structural independence is important because it substantially decreases programming effort and program maintenance costs.

Explain the differences among data, information, and a database. Answer: Data are raw facts. Information is processed data to reveal the meaning behind the facts. Let’s summarize some key points: 

Data constitute the building blocks of information.



Information is produced by processing data.



Information is used to reveal the meaning of data.



Good, relevant, and timely information is the key to good decision making.



Good decision making is the key to organizational survival in a global environment.

A database is a computer structure for storing data in a shared, integrated fashion so that the data can be transformed into information as needed. 7.

What is the role of a DBMS, and what are its advantages? What are its disadvantages? Answer: A database management system (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database. Figure 1.4 (shown in the text) illustrates that the DBMS serves as the intermediary between the user and the database. The DBMS receives all application requests and translates them into the complex operations required to fulfill those requests. The DBMS hides much of the database’s internal complexity from the application programs and users. The application program might be written by a programmer using a programming language such as COBOL, Visual Basic, or C++, or it might be created through a DBMS utility program. Having a DBMS between the end user’s applications and the database offers some important advantages. First, the DBMS enables the data in the database to be shared among multiple applications or users. Second, the DBMS integrates the many different users’ views of the data into a single all-encompassing data repository. Because data are the crucial raw material from which information is derived, you must have a good way of managing such data. As you will discover in this book, the DBMS helps make data management more efficient and effective. In particular, a DBMS provides advantages such as: 

Improved data sharing. The DBMS helps create an environment in which end users have better access to more and better-managed data. Such access makes it possible for end users to respond quickly to changes in their environment.



Improved data security. A DBMS provides a framework for better enforcement of data privacy and security policies.



Better data integration. Wider access to well-managed data promotes an integrated view of the organization’s operations and a clearer view of the big picture. It becomes much easier to see how actions in one segment of the company affect other segments.



Minimized data inconsistency. Data inconsistency exists when different versions of the same data appear in different places. For example, data inconsistency exists when a company’s sales department stores a sales representative’s name as “Bill Brown” and the company’s personnel department stores that same person’s name as “William G. Brown” or when the company’s regional sales office shows the price of product “X” as $45.95 and its national sales office shows the same product’s price as $43.95. The probability of data inconsistency is greatly reduced in a properly designed database.



Improved data access. The DBMS makes it possible to produce quick answers to ad hoc queries. From a database perspective, a query is a specific request for data manipulation (e.g., to read or update the data) issued to the DBMS. Simply put, a query is a question and an ad hoc query is a spur-of-the-moment question. The DBMS sends back an answer (called the query result set) to the application. For example, end users, when dealing with large amounts of sales data, might want quick answers to questions (ad hoc queries) such as:  What was the dollar volume of sales by product during the past six months?  What is the sales bonus figure for each of our salespeople during the past three months?  How many of our customers have credit balances of $3,000 or more?



Improved decision making. Better-managed data and improved data access make it possible to generate better-quality information, on which better decisions are based.



Increased end-user productivity. The availability of data, combined with the tools that transform data into usable information, empowers end users to make quick, informed decisions that can make the difference between success and failure in the global economy.

The advantages of using a DBMS are not limited to the few just listed. In fact, you will discover many more advantages as you learn more about the technical details of databases and their proper design. Although the database system yields considerable advantages over previous data management approaches, database systems do carry significant disadvantages. For example: 

Increased costs. Database systems require sophisticated hardware and software and highly skilled personnel. The cost of maintaining the hardware, software, and personnel required to operate and manage a database system can be substantial. Training, licensing, and regulation compliance costs are often overlooked when database systems are implemented.



Management complexity. Database systems interface with many different technologies and have a significant impact on a company’s resources and culture. The changes

introduced by the adoption of a database system must be properly managed to ensure that they help advance the company’s objectives. Given the fact that database systems hold crucial company data that are accessed from multiple sources, security issues must be assessed constantly.



Maintaining currency. To maximize the efficiency of the database system, you must keep your system current. Therefore, you must perform frequent updates and apply the latest patches and security measures to all components. Because database technology advances rapidly, personnel training costs tend to be significant.



Vendor dependence. Given the heavy investment in technology and personnel training, companies might be reluctant to change database vendors. As a consequence, vendors are less likely to offer pricing point advantages to existing customers, and those customers might be limited in their choice of database system components.



Frequent upgrade/replacement cycles. DBMS vendors frequently upgrade their products by adding new functionality. Such new features often come bundled in new upgrade versions of the software. Some of these versions require hardware upgrades. Not only do the upgrades themselves cost money, but it also costs money to train database users and administrators to properly use and manage the new features.

List and describe the different types of databases. Answer: The focus is on Section 1-3b, Types of Databases. Organize the discussion around the number of users, database site location, and data use: 



Number of users 

Single-user



Multiuser



Workgroup



Enterprise

Database site location 

Centralized



Distributed



Cloud-based

Type of data 

General-purpose



Discipline-specific

Database use 

Transactional (production) database (OLTP)



Data warehouse database (OLAP)

Degree of data structure



Unstructured data



Structured data

For a description of each type of database, please see Section 1-3b. 9.

What are the main components of a database system? Answer: The basis of this discussion is Section 1-7a, The Database System Environment. Figure 1.10 provides a good bird’s-eye view of the components. Note that the system’s components are hardware, software, people, procedures, and data.

10. What is metadata? Answer: Metadata is data about data. That is, metadata defines the data characteristics such as the data type (such as character or numeric) and the relationships that link the data. Relationships are an important component of database design. What makes relationships especially interesting is that they are often defined by their environment. For instance, the relationship between EMPLOYEE and JOB is likely to depend on the organization’s definition of the work environment. For example, in some organizations, an employee can have multiple job assignments, while in other organizations—or even in other divisions within the same organization—an employee can have only one job assignment. The details of relationship types and the roles played by those relationships in data models are defined and described in Chapter 2, Data Models. Relationships will play a key role in subsequent chapters. You cannot effectively deal with database design issues unless you address relationships. 11. Explain why database design is important. Answer: The focus is on Section 1-4, Why Database Design Is Important. Explain that modern database and applications development software is so easy to use that many people can quickly learn to implement a simple database and develop simple applications within a week or so, without giving design much thought. As data and reporting requirements become more complex, those same people will simply (and quickly!) produce the required add-ons. That’s how data redundancies and all their attendant anomalies develop, thus reducing the “database” and its applications to a status worse than useless. Stress these points: 

Good applications can’t overcome bad database designs.



The existence of a DBMS does not guarantee good data management, nor does it ensure that the database will be able to generate correct and timely information.



Ultimately, the end user and the designer decide what data will be stored in the database.

A database created without the benefit of a detailed blueprint is unlikely to be satisfactory. Pose this question: would you think it is smart to build a house without the benefit of a blueprint? So why would you want to create a database without a blueprint? (Perhaps it would be OK to build a chicken coop without a blueprint, but would you want your house to be built the same way?) 12. What are the potential costs of implementing a database system? Answer: Although the database system yields considerable advantages over previous data management approaches, database systems do impose significant costs. For example:



Increased acquisition and operating costs. Database systems require sophisticated hardware and software and highly skilled personnel. The cost of maintaining the hardware, software, and personnel required to operate and manage a database system can be substantial.



Management complexity. Database systems interface with many different technologies and have a significant impact on a company’s resources and culture. The changes introduced by the adoption of a database system must be properly managed to ensure that they help advance the company’s objectives. Given the fact that database systems hold crucial company data that are accessed from multiple sources, security issues must be assessed constantly.



Vendor dependence. Given the heavy investment in technology and personnel training, companies may be reluctant to change database vendors. As a consequence, vendors are less likely to offer pricing point advantages to existing customers and those customers may be limited in their choice of database system components.



Frequent upgrade/replacement cycles. Vendors come up with new features, often included in new versions. Such versions frequently require processing and hardware upgrades. Such upgrades come with additional costs on dollars, personnel time, and downtime.

13. Use examples to compare and contrast unstructured and structured data. Which type is more prevalent in a typical business environment? Answer: Unstructured data are data that exist in their original (raw) state, that is, in the format in which they were collected. Therefore, unstructured data exist in a format that does not lend itself to the processing that yields information. Structured data are the result of taking unstructured data and formatting (structuring) such data to facilitate storage, use, and the generation of information. You apply structure (format) based on the type of processing that you intend to perform on the data. Some data might be not ready (unstructured) for some types of processing, but they might be ready (structured) for other types of processing. For example, the data value 37890 might refer to a zip code, a sales value, or a product code. If this value represents a zip code or a product code and is stored as text, you cannot perform mathematical computations with it. On the other hand, if this value represents a sales transaction, it is necessary to format it as numeric. If invoices are stored as images for future retrieval and display, you can scan them and save them in a graphic format. On the other hand, if you want to derive information such as monthly totals and average sales, such graphic storage would not be useful. Instead, you could store the invoice data in a (structured) spreadsheet format so that you can perform the requisite computations. Based on sheer volume, most data are unstructured or semistructured. Data for conducting actual business transactions are usually structured. 14. What are some basic database functions that a spreadsheet cannot perform?

Answer: Spreadsheets do not support self-documentation through metadata, enforcement of data types or domains to ensure consistency of data within a column, defined relationships among tables, or constraints to ensure consistency of data across related tables. It is important to note that newer versions of MS Office Excel come with new features such as PowerQuery and PowerBI that add more database-like data management functionality to the Excel spreadsheet. 15. What common problems do a collection of spreadsheets created by end users share with the typical file system? Answer: A collection of spreadsheets shares several problems with the typical file system. The first problem is that end users create their own, private copies of the data, which creates issues of data ownership. This situation also creates islands of information where changes to one set of data are not reflected in all of the copies of the data. This leads to the second problem—lack of data consistency. Because the data in various spreadsheets may be intended to represent a view of the business environment, a lack of consistency in the data may lead to faulty decision making based on inaccurate data. 16. Explain the significance of the loss of direct, hands-on access to business data that end users experienced with the advent of computerized data repositories. Answer: Users lost direct, hands-on access to the business data when computerized data repositories were developed because the IT skills necessary to directly access and manipulate the data were beyond the average user’s abilities, and because security precautions restricted access to the shared data. This was significant because it removed users from direct data manipulation and introduced significant time delays for data access. The trade-off of data access versus data security often pays off due to the increasing emphasis in security imposed by the likes of data breaches and data hacks. When users need answers to business questions from the data, necessity often does not give them the luxury of time to wait days, weeks, or even months for the required reports. The desire to return hands-on access to the data to the users, among other drivers, helped to propel the development of database systems. While database systems have greatly improved the ability of users to directly access data, the need to quickly manipulate data for themselves has led to the problem of spreadsheets being used when databases are needed. 17. Explain why the cost of ownership may be lower with a cloud database than with a traditional, company database. Answer: Cloud databases reside on the Internet instead of within the organization’s own network infrastructure. This can reduce costs because the organization is not required to purchase and maintain the hardware and software necessary to house the database and support the necessary levels of system performance. Companies typically experience savings in hardware, people, and management while investing those savings in increasing data analytics and business intelligence. However, companies must ensure that cloud providers comply with all required data insurance, security, and privacy regulations.

ANSWERS TO PROBLEMS ONLINE CONTENT The file structures you see in this problem set are simulated in a Microsoft Access

database named Ch01_Problems, available at www.cengage.com.

Given the file structure shown in Figure P1.1, answer Problems 1–4. FIGURE P1.1 The File Structure for Problems 1-4

How many records does the file contain? How many fields are there per record? Answer: The file contains seven records (21-5Z through 31-7P) and each of the record is composed of five fields (PROJECT_CODE through PROJECT_BID_PRICE).

What problem would you encounter if you wanted to produce a listing by city? How would you solve this problem by altering the file structure? Answer: The city names are contained within the MANAGER_ADDRESS attribute and decomposing this character (string) field at the application level is cumbersome at best. (Queries become much more difficult to write and take longer to execute when internal string searches must be conducted.) If the ability to produce city listings is important, it is best to store the city name as a separate attribute.

If you wanted to produce a listing of the file contents by last name, area code, city, state, or zip code, how would you alter the file structure? Answer: The more we divide the address into its component parts, the greater its information capabilities. For example, by dividing MANAGER_ADDRESS into its component parts (MGR_STREET, MGR_CITY, MGR_STATE, and MGR_ZIP), we gain the ability to easily select records on the basis of zip codes, city names, and states. Similarly, by subdividing the MANAGER name into its components MGR_LASTNAME, MGR_FIRSTNAME, and MGR_INITIAL, we gain the ability to produce more efficient searches and listings. For example, creating a phone directory is easy when you can sort by last name, first name, and initial. Finally, separating the area code and the phone number will yield the ability to efficiently group data by area codes. Thus MGR_PHONE might be decomposed into MGR_AREA_CODE and MGR_PHONE. The more you decompose the data into their component parts, the greater the search flexibility. Data that are decomposed into their most basic components are said to be atomic.

What data redundancies do you detect? How could those redundancies lead to anomalies? Answer: Note that the manager named Holly B. Parker occurs three times, indicating that she manages three projects coded 21-5Z, 25-9T, and 29-2D, respectively. (The occurrences indicate that there is a 1:M relationship between PROJECT and MANAGER: each project is managed by only one manager but, apparently, a manager may manage more than one project.) Ms. Parker’s phone number and address also occur three times. If Ms. Parker moves and/or changes her phone number, these changes must be made more than once and they must all be made

correctly … without missing a single occurrence. If any occurrence is missed during the change, the data are “different” for the same person. After some time, it may become difficult to determine what the correct data are. In addition, multiple occurrences invite misspellings and digit transpositions, thus producing the same anomalies. The same problems exist for the multiple occurrences of George F. Dorts. 5.

Identify and discuss the serious data redundancy problems exhibited by the file structure shown in Figure P1.5. Answer:

FIGURE P1.5 The File Structure for Problems 5–8

NOTE It is not too early to begin discussing proper structure. For example, you may focus student attention on the fact that, ideally, each row should represent a single entity. Therefore, each row’s fields should define the characteristics of one entity, rather than include characteristics of several entities. The file structure shown here includes characteristics of multiple entities. For example, the JOB_CODE is likely to be a characteristic of a JOB entity. PROJ_NUM and PROJ_NAME are clearly characteristics of a PROJECT entity. Also, since (apparently) each project has more than one employee assigned to it, the file structure shown here shows multiple occurrences for each of the projects. (Hurricane occurs three times, Coast occurs twice, and Satellite occurs four times.) At first glance, the file structure in Figure P1.5 seems appropriate from the reporting point of view. After all, the columns contain a single value, there are no multi-value cells, all data in the columns are the same data type, and each row conveys the needed information (who works in each project, their role, the charge per hour, and the hours worked). However, we need to approach this from the designer and the data processing point of view. Is the file structure providing info about one or multiple entities? It clearly shows information for multiple entities: project, employees, job roles, hours worked. Therefore, from the processing point of view, this is the ground for data duplication and anomalies. The file's poor structure sets the stage for multiple anomalies. For example, if the charge for JOB_CODE = EE changes from $85.00 to $90.00, that change must be made twice. Also, if employee June H. Sattlemeier is deleted from the file, you also lose information about the existence of her JOB_CODE = EE, its hourly charge of $85.00, and the PROJ_HOURS = 17.5. The loss of the PROJ_HOURS value will ultimately mean that the Coast project costs are not being

charged properly, thus causing a loss of PROJ_HOURS*JOB_CHG_HOUR = 17.5 × $85.00 = $1,487.50 to the company. Incidentally, note that the file contains different JOB_CHG_HOUR values for the same CT job code, thus illustrating the effect of changes in the hourly charge rate over time. The file structure appears to represent transactions that charge project hours to each project. However, the structure of this file makes it difficult to avoid update anomalies and it is not possible to determine whether a charge change is accurately reflected in each record. Ideally, a change in the hourly charge rate would be made in only one place and this change would then be passed on to the transaction based on the hourly charge. Such a structural change would ensure the historical accuracy of the transactions. You might want to emphasize that the recommended changes require a lot of work in a file system. 6.

Looking at the EMP_NAME and EMP_PHONE contents in Figure P1.5, what change(s) would you recommend? Answer: A good recommendation would be to make the data more atomic. That is, break up the data components whenever possible. For example, separate the EMP_NAME into its components EMP_FNAME, EMP_INITIAL, and EMP_LNAME. This change will make it much easier to organize employee data through the employee name component. Similarly, the EMP_PHONE data should be decomposed into EMP_AREACODE and EMP_PHONE. For example, breaking up the phone number 653-234-3245 into the area code 653 and the phone number 234-3245 will make it much easier to organize the phone numbers by area code. (If you want to print an employee phone directory, the more atomic employee name data will make the job much easier.)

Identify the various data sources in the file you examined in Problem 5. Answer: Given their answers to Problem 5 and some additional scrutiny of Figure P1.5, your students should be able to identify these data sources:    

Employee data such as names and phone numbers. Project data such as project names. If you start with an EMPLOYEE file, the project names clearly do not belong in that file. (Project names are clearly not employee characteristics.) Job data such as the job charge per hour. If you start with an EMPLOYEE file, the job charge per hour clearly does not belong in that file. (Hourly charges are clearly not employee characteristics.) The project hours, which are most likely the hours worked by the employee for that project. (Such hours are associated with a work product, not the employee per se.)

Given your answer to Problem 7, what new files should you create to help eliminate the data redundancies found in the file shown in Figure P1.5? Answer: The data sources are probably the PROJECT, EMPLOYEE, JOB, and CHARGE. The PROJECT file should contain project characteristics such as the project name, the project manager/coordinator, the project budget, and so on. The EMPLOYEE file might contain the employee names, phone number, address, and so on. The JOB file would contain the billing charge per hour for each of the job types—a database designer, an applications developer, and an accountant would generate different billing charges per hour. The CHARGE file would be used to keep track of the number of hours by job type that will be billed for each employee who worked on the project.

Identify and discuss the serious data redundancy problems exhibited by the file structure shown in Figure P1.9. (The file is meant to be used as a teacher class assignment schedule. One of the many problems with data redundancy is the likely occurrence of data inconsistencies—two different initials have been entered for the teacher named Maria Cordoza.) Answer:

FIGURE P1.9 The File Structure for Problems 9 and 10

Note that the teacher characteristics occur multiple times in this file. For example, the teacher named Maria Cordoza’s first name, last name, and initial occur three times. If changes must be made for any given teacher, those changes must be made multiple times. All it takes is one incorrect entry or one forgotten change to create data inconsistencies. Redundant data are not a luxury you can afford in a data environment. 10. Given the file structure shown in Figure P1.9, what problem(s) might you encounter if building KOM were deleted? Answer: You would lose all the time assignment data about teachers Williston, Cordoza, and Hawkins, as well as the KOM rooms 204E, 123, and 34. Furthermore, you will lose all references to Anne Hawkins and Maria Cordoza. Here is yet another good reason for keeping data about specific entities in their own tables! This kind of an anomaly is known as a deletion anomaly. 11. Using your school’s student information system, print your class schedule. The schedule probably would contain the student identification number, student name, class code, class name, class credit hours, class instructor name, the class meeting days and times, and the class room number. Use Figure P1.11 as a template to complete the following actions.

Answer:

FIGURE P1.11 Student Schedule Data Format

a. Create a spreadsheet using the template shown in Figure P1.11 and enter your current class schedule. b. Enter the class schedule of two of your classmates into the same spreadsheet. c. Discuss the redundancies and anomalies caused by this design. This could be a good “mini-group” problem—groups of three students maximum. Ask them to create their individual class schedules in separate spreadsheets and then, a single spreadsheet containing all their class schedules. This exercise should incentivize “group discussion” and discover data anomalies and brainstorm better ways to store the class schedule data. Students are likely to use MS Excel or Google Sheets to create a simple tabular spreadsheet containing the data outlined in Figure P1.11. The rows of the spreadsheet(s) will represent each one of the classes they are taking. Ask the students to generate a roster for students taking the database class. Students are likely to identify the redundancies around the class information since all three schedules (the student’s own schedule plus the schedules of the two classmates) will have at least the database class in common. This easily leads to discussions of separating the data into at least two tables in a database. However, that still leaves the redundancies of student data with each class that they are taking. Astute students might realize that this is analogous to the Employee Skills Certification shown in Figures 1.5 and 1.6, such that a table for student data, a table for class data, and a table to relate the students and classes are appropriate. For more challenging work, ask them to create a report of the schedule of classes per room. What fields do they need to add to this report, and in what order? Do they have what they need to sort this so the report shows the schedule by day and, within each day, by the time of day?

TABLE OF CONTENTS Answers to Review Questions…………………… ....... ……………………………………………….1 Answers to Problems .............................................................................................................10

ANSWERS TO REVIEW QUESTIONS 12. Discuss the importance of data models. Answer: A data model is a relatively simple representation, usually graphical, of a more complex real-world object event. The data model’s main function is to help us understand the complexities of the real-world environment. The database designer uses data models to facilitate the interaction among designers, application programmers, and end users. In short, a good data model is a communications device that helps eliminate (or at least substantially reduce) discrepancies between the database design’s components and the real-world data environment. The development of data models, bolstered by powerful database design tools, has made it possible to substantially diminish the database design error potential. (Review Sections 2-1 and 2-2 in detail.) 13. What is a business rule, and what is its purpose in data modeling? Answer: A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization’s environment. In a sense, business rules are misnamed: they apply to any organization—a business, a government unit, a religious group, or a research laboratory; large or small—that stores and uses data to generate information. Business rules are derived from a description of operations. As its name implies, a description of operations is a detailed narrative that describes the operational environment of an organization. Such a description requires great precision and detail. If the description of operations is incorrect or incomplete, the business rules derived from it will not reflect the real-world data environment accurately, thus leading to poorly defined data models, which lead to poor database designs. In turn, poor database designs lead to poor applications, thus setting the stage for poor decision making—which may ultimately lead to the demise of the organization. Note especially that business rules help to create and enforce actions within that organization’s environment. Business rules must be rendered in writing and updated to reflect any change in the organization’s operational environment. Properly written business rules are used to define entities, attributes, relationships, and constraints. Because these components form the basis for a database design, the careful derivation and definition of business rules is crucial to good database design.

14. How do you translate business rules into data model components? Answer: As a general rule, a noun in a business rule will translate into an entity in the model, and a verb (active or passive) associating nouns will translate into a relationship among the entities. For example, the business rule “a customer may generate many invoices” contains two nouns (customer and invoice) and a verb (“generate”) that associates them. 15. Describe the basic features of the relational data model and discuss their importance to the end user and the designer. Answer: A relational database is a single data repository that provides both structural and data independence while maintaining conceptual simplicity. The relational database model is perceived by the user to be a collection of tables in which data are stored. Each table resembles a matrix composed of row and columns. Tables are related to each other by sharing a common value in one of their columns. The relational model represents a breakthrough for users and designers because it lets them operate in a simpler conceptual environment. End users find it easier to visualize their data as a collection of data organized as a matrix. Designers find it easier to deal with conceptual data representation, freeing them from the complexities associated with physical data representation. 16. Explain how the entity relationship (ER) model helped produce a more structured relational database design environment. Answer: An entity relationship model, also known as an ERM, helps identify the database’s main entities and their relationships. Because the ERM components are graphically represented, their role is more easily understood. Using the ER diagram, it’s easy to map the ERM to the relational database model’s tables and attributes. This mapping process uses a series of well-defined steps to generate all the required database structures. (This structures mapping approach is augmented by a process known as normalization, which is covered in detail in Chapter 6 “Normalization of Database Tables.”) 17. Consider the scenario described by the statement “A customer can make many payments, but each payment is made by only one customer.” Use this scenario as the basis for an entity relationship diagram (ERD) representation. Answer: This scenario yields two entities: CUSTOMER and PAYMENT. The ERDs shown in Figure Q2.6 uses the Chen and Crow’s Foot notation as shown in Figure 2.3 in the book.

Figure Q2.6 The Chen and Crow’s Foot ERDs for Question 6 Chen model 1 CUSTOMER

M makes

PAYMENT

Crow’s Foot model

CUSTOMER

makes

PAYMENT

NOTE Remind your students again that we have not (yet) illustrated other constructs like cardinality and participation on the ERD’s presentation. Their treatment are covered in detail in Chapter 4, “Entity Relationship (ER) Modeling.” 18. Why is an object said to have greater semantic content than an entity? Answer: An object has greater semantic content because it embodies both data and behavior. That is, the object contains, in addition to data, also the description of the operations that may be performed by the object. 19. What is the difference between an object and a class in the object-oriented data model (OODM)? Answer: An object is an instance of a specific class. It is useful to point out that the object is a run-time concept, while the class is a more static description. Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods). Therefore, a class resembles an entity set. However, a class also includes a set of procedures known as methods. 20. How would you model Question 6 with an OODM? (Use Figure 2.4 as your guide.) Answer: The OODM that corresponds to question 6’s ERD is shown in Figure Q2.9:

Figure Q2.9 The OODM Model for Question 9

CUSTOMER M PAYMENT

21. What is an ERDM, and what role does it play in the modern (production) database environment? Answer: The extended relational data model (ERDM) is the relational data model’s response to the object-oriented data model (OODM), which adds object extensions to DBMSs based on the relational model. Most current RDBMSes support at least a few of the ERDM’s extensions. For example, support for complex data types such as large binary objects (BLOBs) is now common. In modern production database environments, most DBMSs are deeply rooted in the relational model; however, they tend to offer, and organizations occasionally use, some object extensions. 22. What is a relationship, and what three types of relationships exist? Answer: A relationship is an association among (two or more) entities. Three types of relationships exist: one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N or M:M.) Note: We will learn in Chapter 4 that a relationship can also exist among rows of one (the same) entity. 23. Give an example of each of the three types of relationships. Answer: 1:1 An academic department is chaired by one professor; a professor may chair only one academic department. 1:M A customer may generate many invoices; each invoice is generated by one customer. M:N An employee may have earned many degrees; a degree may have been earned by many employees. 24. What is a table, and what role does it play in the relational model? Answer: Strictly speaking, the relational data model bases data storage on relations. These relations are based on algebraic set theory. However, the user perceives the relations to be tables. In the relational database environment, designers and users perceive a table to be a matrix consisting of a series of row/column intersections.Tables, also called relations, are related to each other by sharing a common entity characteristic. For example, an INVOICE table would contain a customer number that points to that same number in the CUSTOMER table. This feature enables the RDBMS to link invoices to the customers who generated them.

Tables are especially useful from the modeling and implementation perspectives. Because tables are used to describe the entities they represent, they provide an easy way to summarize entity characteristics and relationships among entities. And, because they are purely conceptual constructs, the designer does not need to be concerned about the physical implementation aspects of the database design. 25. What is a relational diagram? Give an example. Answer: A relational diagram is a visual representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities. Therefore, it is easy to see what the entities represent and to see what types of relationships (1:1, 1:M, M:N) exist among the entities and how those relationships are implemented. An example of a relational diagram is found in the text’s Figure 2.2. MS Access, Database Tools, “Relationships” option on the main Access menu could be used to illustrate simple relational diagrams. 26. What is connectivity? (Use a Crow’s Foot ERD to illustrate connectivity.) Answer: Connectivity is the relational term to describe the types of relationships (1:1, 1:M, M:N).

In the figure, the business rule that an advisor can advise many students and a student has only one assigned advisor is shown within a relationship with a connectivity of 1:M. The business rule that a student can register only one vehicle to park on campus and a vehicle can be registered by only one student is shown with a relationship with a connectivity of 1:1. Finally, the rule that a student can register for many classes, and a class can be registered for by many students, is shown by the relationship with a connectivity of M:N. 27. Describe the Big Data phenomenon. Answer: Over the last few years, a new wave of data has “emerged” to the limelight. Such data have always existed but did not receive the attention that it is receiving today. These data are characterized for being high volume (petabyte size and beyond), high frequency (data are generated almost constantly), and mostly semi-structured. These data come from multiple and varied sources such as website logs, website posts in social sites, and machinegenerated information (GPS, sensors, etc.). Such data have been accumulated over the years and companies are now awakening to the fact that it contains a lot of hidden

information that could help the day-to-day business (such as browsing patterns, purchasing preferences, and behavior patterns). The need to manage and leverage this data has triggered a phenomenon labeled “Big Data.” Big Data refers to a movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while, at the same time, providing high performance and scalability at a reasonable cost. 28. What does the term 3 Vs refer to? Answer: The term 3 Vs refers to the 3 basic characteristics of Big Data databases, they are: 

Volume: Refers to the amounts of data being stored. With the adoption and growth of the Internet and social media, companies have multiplied the ways to reach customers. Over the years, and with the benefit of technological advances, data for millions of e-transactions were being stored daily on company databases. Furthermore, organizations are using multiple technologies to interact with end users and those technologies are generating mountains of data. This ever-growing volume of data quickly reached petabytes in size and it’s still growing.



Velocity: Refers not only to the speed with which data grows but also to the need to process these data quickly in order to generate information and insight. With the advent of the Internet and social media, business responses times have shrunk considerably. Organizations not only need to store large volumes of quickly accumulating data, but also need to process such data quickly. The velocity of data growth is also due to the increase in the number of different data streams from which data is being piped to the organization (via the web, e-commerce, Tweets, Facebook posts, emails, sensors, GPS, and so on).



Variety: Refers to the fact that the data being collected comes in multiple different data formats. A great portion of these data comes in formats not suitable to be handled by the typical operational databases based on the relational model.

The 3 Vs framework illustrates what companies now know, that the amount of data being collected in their databases has been growing exponentially in size and complexity. Traditional relational databases are good at managing structured data but are not well suited to managing and processing the amounts and types of data being collected in today’s business environment. 29. What is Hadoop, and what are its basic components? Answer: In order to create value from their previously unused Big Data stores, companies are using new Big Data technologies. These emerging technologies allow organizations to process massive data stores of multiple formats in cost-effective ways. Some of the most frequently used Big Data technologies are Hadoop and MapReduce. 

Hadoop is a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. Hadoop originated from Google’s work on distributed file systems and parallel processing and is currently supported by the Apache Software Foundation.1 Hadoop has several modules, but the two main components are Hadoop Distributed File System (HDFS) and MapReduce.

For more information about Hadoop, visit hadoop.apache.org. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.



Hadoop Distributed File System (HDFS) is a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. In order to achieve high throughput, HDFS uses the write-once, read many model. This means that once the data is written, it cannot be modified. HDFS uses three types of nodes: a name node that stores all the metadata about the file system; a data node that stores fixed-size data blocks (that could be replicated to other data nodes), and a client node that acts as the interface between the user application and the HDFS.



MapReduce is an open-source application programming interface (API) that provides fast data analytics services. MapReduce distributes the processing of the data among thousands of nodes in parallel. MapReduce works with structured and nonstructured data. The MapReduce framework provides two main functions, Map and Reduce. In general terms, the Map function takes a job and divides it into smaller units of work; the Reduce function collects all the output results generated from the nodes and integrates them into a single result set.

30. What are the basic characteristics of a NoSQL database? Answer: Every time you search for a product on Amazon, send messages to friends in Facebook, watch a video in YouTube, or search for directions in Google Maps, you are using a NoSQL database. NoSQL refers to a new generation of databases that address the very specific challenges of the “big data” era and have the following general characteristics: 

Not based on the relational model.



Support distributed database architectures.



Provide high scalability, high availability, and fault tolerance.



Support very large amounts of sparse data.



Geared toward performance rather than transaction consistency.

31. Using the example of a medical clinic with patients and tests, provide a simple representation of how to model this example using the relational model. Answer: As you can see in Figure Q2.20, the relational model stores data in a tabular format in which each row represents a “record” for a given patient. In this case, each patient can have many tests and each test refers to only one patient. As you can see the TestData table contains the PAT_NUM foreign key to point to the PatientData table.

32. What is logical independence? Answer: Logical independence exists when you can change the internal model without affecting the conceptual model. When you discuss logical and other types of independence, it’s worthwhile to discuss and review some basic modeling concepts and terminology: 

In general terms, a model is an abstraction of a more complex real-world object or event. A model’s main function is to help you understand the complexities of the realworld environment. Within the database environment, a data model represents data structures and their characteristics, relations, constraints, and transformations. As its name implies, a purely conceptual model stands at the highest level of abstraction and focuses on the basic ideas (concepts) that are explored in the model, without specifying the details that will enable the designer to implement the model. For example, a conceptual model would include entities and their relationships and it may even include at least some of the attributes that define the entities, but it would not include attribute details such as the nature of the attributes (text, numeric, etc.) or the physical storage requirements of those attributes.



The terms data model and database model are often used interchangeably. In the text, the term database model is used to refer to the implementation of a data model in a specific database system.



Data models (relatively simple representations, usually graphical, of more complex real-world data structures), bolstered by powerful database design tools, have made it possible to substantially diminish the potential for errors in database design.



An internal schema depicts a specific representation of an internal model, using the database constructs supported by the chosen database.



The external model is the end users’ view of the data environment.

33. What is physical independence? Answer: You have physical independence when you can change the physical model without affecting the internal model. Therefore, a change in storage devices or methods and even a change in operating system will not affect the internal model. The terms physical model and internal model may require a bit of additional discussion: 

The physical model operates at the lowest level of abstraction, describing the way data are saved on storage media such as disks or tapes. The physical model requires the definition of both the physical storage devices and the (physical) access methods required to reach the data within those storage devices, making it both software- and hardware-dependent. The storage structures used are dependent on the software (DBMS, operating system) and on the type of storage devices that the computer can handle. The precision required in the physical model’s definition demands that database designers who work at this level have a detailed knowledge of the hardware and software used to implement the database design.



The internal model is the representation of the database as “seen” by the DBMS. In other words, the internal model requires the designer to match the conceptual model’s characteristics and constraints to those of the selected implementation model. An internal schema depicts a specific representation of an internal model, using the database constructs supported by the chosen database.

ANSWERS TO PROBLEMS Use the contents of Figure 2.1 to work Problems 1–3. Write the business rule(s) that govern the relationship between AGENT and CUSTOMER. Answer: Given the data in the two tables, you can see that an AGENT—through AGENT_CODE—can occur many times in the CUSTOMER table. But each customer has only one agent. Therefore, the business rules may be written as follows: One agent can have many customers. Each customer has only one agent. Given these business rules, you can conclude that there is a 1:M relationship between AGENT and CUSTOMER. 34. Given the business rule(s) you wrote in Problem 1, create the basic Crow’s Foot ERD. Answer: The Crow’s Foot ERD is shown in Figure P2.2a.

Figure P2.2a The Crow’s Foot ERD for Problem 3 serves

AGENT

CUSTOMER

For discussion purposes, you might use the Chen model shown in Figure P2.2b. Compare the two representations of the business rules by noting the different ways in which connectivities (1, M) are represented. The Chen ERD is shown in Figure P2.2b.

Figure P2.2b The Chen ERD for Problem 2 Chen model 1 AGENT

M serves

CUSTOMER

35. Using the ERD you drew in Problem 2, create the equivalent object representation and UML class diagram. (Use Figure 2.4 as your guide.) Answer: The OO model is shown in Figure P2.3a., and the UML class diagram is shown in Figure P2.3b.

Figure P2.3a The OO Model for Problem 3 AGENT M CUSTOMER

Figure P2.3b The UML Model for Problem 3

Using Figure P2.4 as your guide, work Problems 4 and 5. The DealCo relational diagram shows the initial entities and attributes for the DealCo stores, which are located in two regions of the country.

Figure P2.4 The DealCo relational diagram 36. Identify each relationship type and write all of the business rules. Answer: One region can be the location for many stores. Each store is located in only one region. Therefore, the relationship between REGION and STORE is 1:M. Each store employs one or more employees. Each employee is employed by one store. (In this case, we are assuming that the business rule specifies that an employee cannot work in more than one store at a time.) Therefore, the relationship between STORE and EMPLOYEE is 1:M. A job—such as accountant or sales representative—can be assigned to many employees. (For example, one would reasonably assume that a store can have more than one sales

representative. Therefore, the job title “Sales Representative” can be assigned to more than one employee at a time.) Each employee can have only one job assignment. (In this case, we are assuming that the business rule specifies that an employee cannot have more than one job assignment at a time.) Therefore, the relationship between JOB and EMPLOYEE is 1:M. 37. Create the basic Crow’s Foot ERD for DealCo. Answer: The Crow’s Foot ERD is shown in Figure P2.5a.

Figure P2.5a The Crow’s Foot ERD for DealCo is location for

REGION

STORE

employs

is assigned to

JOB

EMPLOYEE

The Chen model is shown in Figure P2.5b. (Note that you always read the relationship from the “1” to the “M” side.)

Figure P2.5b The Chen ERD for DealCo M

1 is location for

REGION

STORE 1

employs

1 JOB

is assigned to

M EMPLOYEE

Using Figure P2.6 as your guide, work Problems 6−8. The Tiny College relational diagram shows the initial entities and attributes for the college.

Figure P2.6 The Tiny College relational diagram 38. Identify each relationship type and write all of the business rules. Answer: The simplest way to illustrate the relationship among ENROLL, CLASS, and STUDENT is to discuss the data shown in Table P2.6. As you examine the Table P2.6 contents and compare the attributes to relational schema shown in Figure P2.6, note these features: 

We have added an attribute, ENROLL_SEMESTER, to identify the enrollment period.



Naturally, no grade has yet been assigned when the student is first enrolled, so we have entered a default value “NA” for “Not Applicable.” The letter grade—A, B, C, D, F, I (Incomplete), or W (Withdrawal)—will be entered at the conclusion of the enrollment period, the SPRING-12 semester.



Student 11324 is enrolled in two classes; student 11892 is enrolled in three classes, and student 10345 is enrolled in one class.

Table P2.6 Sample Contents of an ENROLL Table STU_NUM

CLASS_CODE

ENROLL_SEMESTER

ENROLL_GRADE

11324

MATH345-04

SPRING-14

11324

ENG322-11

SPRING-14

11892

CHEM218-05

SPRING-14

11892

ENG322-11

SPRING-14

11892

CIS431-01

SPRING-14

10345

ENG322-07

SPRING-14

All of the relationships are 1:M. The relationships may be written as follows: COURSE generates CLASS. One course can generate many classes. Each class is generated by one course.

CLASS is referenced in ENROLL. One class can be referenced in enrollment many times. Each individual enrollment references one class. Note that the ENROLL entity is also related to STUDENT. Each entry in the ENROLL entity references one student and the class for which that student has enrolled. A student cannot enroll in the same class more than once. If a student enrolls in four classes, that student will appear in the ENROLL entity four times, each time for a different class. STUDENT is shown in ENROLL. One student can be shown in enrollment many times. (In database design terms, “many” simply means “more than once.”) Each individual enrollment entry shows one student. 39. Create the basic Crow’s Foot ERD for Tiny College. Answer: The Crow’s Foot model is shown in Figure P2.7a.

Figure P2.7a The Crow’s Foot Model for Tiny College generates

COURSE

CLASS

is referenced in

is shown in

STUDENT

ENROLL

The Chen model is shown in Figure P2.7b.

Figure P2.7b The Chen Model for Tiny College M

1 generates

COURSE

CLASS 1

is referenced in

1 STUDENT

is shown in

M ENROLL

40. Create the UML class diagram that reflects the entities and relationships you identified in the relational diagram.

Answer: The OO model is shown in Figure P2.8a, and the UML class diagram is shown in Figure P2.8b.

Figure P2.8a The OO Model for Tiny College COURSE

STUDENT

ENROLL

CRS_CODE

CRS_DESCRIPTION C CRS_CREDIT

ENROLL_SEMESTER C

STU_NUM

ENROLL_GRADE

CLASSES: M

CLASSES: M CLASS

CLASS

CLASS C

CLASS_CODE

STU_LNAME

CLASS_DAYS

STU_FNAME

CLASS_TIME

STU_INITIAL

CLASS_ROOM

STU_DOB

COURSES:

COURSE

ENROLLMENT:

STUDENTS: M STUDENT

ENROLL ENROLLMENT:

Note: C = Character D = Date N = Numeric

ENROLL

Figure P2.8b The UML Model for Tiny College

Typically, a hospital patient receives medications that have been ordered by a particular doctor. Because the patient often receives several medications per day, there is a 1:M relationship between PATIENT and ORDER. Similarly, each order can include several medications, creating a 1:M relationship between ORDER and MEDICATION. Answer: a. Identify the business rules for PATIENT, ORDER, and MEDICATION. The business rules reflected in the PATIENT description are: A patient can have many (medical) orders written for him or her. Each (medical) order is written for a single patient. The business rules reflected in the ORDER description are: Each (medical) order can prescribe many medications. Each medication can be prescribed in many orders.

The business rules reflected in the MEDICATION description are: Each medication can be prescribed in many orders. Each (medical) order can prescribe many medications. b. Create a Crow’s Foot ERD that depicts a relational database model to capture these business rules.

Figure P2.9 Crow’s foot ERD for Problem 9

United Broke Artists (UBA) is a broker for not-so-famous artists. UBA maintains a small database to track painters, paintings, and galleries. A painting is created by a particular artist and then exhibited in a particular gallery. A gallery can exhibit many paintings, but each painting can be exhibited in only one gallery. Similarly, a painting is created by a single painter, but each painter can create many paintings. Using PAINTER, PAINTING, and GALLERY, in terms of a relational database: Answer: What tables would you create, and what would the table components be? We would create the three tables shown in Figure P2.10a. (Use the teacher’s Ch02_UBA database in your instructor’s resources to illustrate the table contents.)

FIGURE P2.10a The UBA Database Tables

As you discuss the UBA database contents, note in particular the following business rules that are reflected in the tables and their contents: 

A painter can paint may paintings.



Each painting is painted by only one painter.



A gallery can exhibit many paintings.



A painter can exhibit paintings at more than one gallery at a time. (For example, if a painter has painted six paintings, two may be exhibited in one gallery, one at another, and three at the third gallery. Naturally, if galleries specify exclusive contracts, the database must be changed to reflect that business rule.)



Each painting is exhibited in only one gallery.

The last business rule reflects the fact that a painting can be physically located in only one gallery at a time. If the painter decides to move a painting to a different gallery, the database must be updated to remove the painting from one gallery and add it to the different gallery.

b. How might the (independent) tables be related to one another? Figure P2.10b shows the relationships.

FIGURE P2.10b The UBA Relational Model

41. Using the ERD from Problem 10, create the relational schema. (Create an appropriate collection of attributes for each of the entities. Make sure you use the appropriate naming conventions to name the attributes.) Answer: The relational diagram is shown in Figure P2.11.

FIGURE P2.11 The Relational Diagram for Problem 11

42. Convert the ERD from Problem 10 into a corresponding UML class diagram. Answer: The basic UML solution is shown in Figure P2.12.

FIGURE P2.12 The UML for Problem 12

43. Describe the relationships (identify the business rules) depicted in the Crow’s Foot ERD shown in Figure P2.13.

Figure P2.13 The Crow’s Foot ERD for Problem 13 Answer: The business rules may be written as follows: 

A professor can teach many classes.



Each class is taught by one professor.



A professor can advise many students.



Each student is advised by one professor.

44. Create a Crow’s Foot ERD to include the following business rules for the ProdCo company: Answer: a. Each sales representative writes many invoices. b. Each invoice is written by one sales representative. c. Each sales representative is assigned to one department. d. Each department has many sales representatives. e. Each customer can generate many invoices. f.

Each invoice is generated by one customer.

The Crow’s Foot ERD is shown in Figure P2.14. Note that a 1:M relationship is always read from the one (1) to the many (M) side. Therefore, the customer–invoice relationship is read as “one customer generates many invoices.”

Figure P2.14 Crow’s Foot ERD for the ProdCo Company

45. Write the business rules that are reflected in the ERD shown in Figure P2.15. (Note that the ERD reflects some simplifying assumptions. For example, each book is written by only one author. Also, remember that the ERD is always read from the “1” to the “M” side, regardless of the orientation of the ERD components.)

FIGURE P2.15 The Crow’s Foot ERD for Problem 15

Answer: The relationships are best described through a set of business rules: 

One publisher can publish many books.



Each book is published by one publisher.



A publisher can submit many (book) contracts.



Each (book) contract is submitted by one publisher.



One author can sign many contracts.



Each contract is signed by one author.



One author can write many books.



Each book is written by one author.

This ERD will be a good basis for a discussion about what happens when more realistic assumptions are made. For example, a book—such as this one—may be written by more than one author. Therefore, a contract may be signed by more than one author. Your students will learn how to model such relationships after they have become familiar with the material in Chapter 3. 46. Create a Crow’s Foot ERD for each of the following descriptions. (Note that the word many merely means more than one in the database modeling environment.) Answer: a. Each of the MegaCo Corporation’s divisions is composed of many departments. Each department has many employees assigned to it, but each employee works for only one department. Each department is managed by one employee, and each of those managers can manage only one department at a time. The Crow’s Foot ERD is shown in Figure P2.16a.

FIGURE P2.16a The MegaCo Crow’s Foot ERD

As you discuss the contents of Figure P2.16a, note the 1:1 relationship between the EMPLOYEE and the DEPARTMENT in the “manages” relationship and the 1:M relationship between the DEPARTMENT and the EMPLOYEE in the “is assigned to” relationship. b. During some period of time, a customer can download many ebooks from BooksOnline. Each of the ebooks can be downloaded by many customers during that period of time. The solution is presented in Figure P2.16b. Note the M:N relationship between CUSTOMER and EBOOK. Such a relationship is not implementable in a relational model.

If you want to let the students convert Figure P2.16b’s ERD into an implementable ERD, add a third DOWNLOAD entity to create a 1:M relationship between CUSTOMER and DOWNLOAD and a 1:M relationship between EBOOK and DOWNLOAD. (Note that such a conversion has been shown in the next problem solution.) c. An airliner can be assigned to fly many flights, but each flight is flown by only one airliner. Originally, the student may think that there is a 1:M relationship between AIRCRAFT and FLIGHT. And probably based on the business rule, this would be correct. The teacher could use this opportunity to expand into “real-world” situations and discuss how business rules should be properly defined with a time dimension in mind. Make the students think of a FLIGHT as having a flight number, a date, and an aircraft (among other attributes such as from and to destinations) It is common practice in the airline industry to replace an AIRCRAFT for various reasons (schedule maintenance, engine checkups, engine problems, etc.). So, in practice the same FLIGHT can be performed by a different AIRCRAFT. In this case, you can say that “over a period of time,” a flight can be flown by many aircraft (but only one at a time) and an aircraft can fly many flights. See the ERDs, sample tables, and relational diagram below.

FIGURE P2.16c The Airline Crow’s Foot ERD

Initial M:N Solution AIRCRAFT

flies

FLIGHT

Implementable Solution AIRCRAFT

is assigned to

ASSIGNMENT

shows in

FLIGHT

We have created a small Ch02_Airline database to let you explore the implementation of the model. (Check the data files available for Instructors at www.cengage.com.) The tables and the relational diagram are shown in the following two figures. Discuss how a FLIGHT occurs several times on different dates. (See tables below.) For example, flight AA2376, from Nashville (BNA) to Los Angeles (LAX) can occur on several dates; leaving BNA at 9:00 am CST and arriving at LAX at 1:00 pm PST. And, it could use different aircraft (over a period of time).

FIGURE P2.16c The Airline Database Tables

FIGURE P2.16c The Airline Relational Diagram

d. The KwikTite Corporation operates many factories. Each factory is located in a region, and each region can be “home” to many of KwikTite’s factories. Each factory has many employees, but each employee is employed by only one factory. The solution is shown in Figure P2.16d.

FIGURE P2.16d The KwikTite Crow’s Foot ERD EMPLOYEE Remember that a 1:M relationship is always read from the “1” side to the “M” side. Therefore, the relationship between FACTORY and REGION is properly read as “factory employs employee.”

contains

REGION

employs

FACTORY

e. An employee may have earned many degrees, and each degree may have been earned by many employees. The solution is shown in Figure P2.16e.

FIGURE P2.16e The Earned Degree Crow’s Foot ERD

EMPLOYEE

earns

DEGREE

Note that this M:N relationship must be broken up into two 1:M relationships before it can be implemented in a relational database. Use the airline ERD’s decomposition in Figure P2.16c as the focal point in your discussion. 47. Write the business rules that are reflected in the ERD shown in Figure P2.17. Answer: A theater shows many movies. A movie can be shown in many theaters. A movie can receive many reviews. Each review is for a single movie. A reviewer can write many reviews. Each review is written by a single reviewer. Note that the M:N relationship between theater and movie must be broken into two 1:M relationships using a bridge table before it can be implemented in a relational database.

FIGURE P2.17 The Crow’s Foot ERD for Problem 17

TABLE OF CONTENTS Answers to Review Questions……… .............. ……………………………………………………….1 Answers to Problems .............................................................................................................14

ANSWERS TO REVIEW QUESTIONS ONLINE CONTENT The website (www.cengage.com) includes MS Access databases and SQL script files (Oracle, SQL Server, and MySQL) for all of the datasets used throughout the book. 48. What is the difference between a database and a table? Answer: A table, a logical structure that represents an entity set, is only one of the components of a database. A table stores the end-user data. The database is a structure that houses one or more tables and metadata. The metadata are data about data. Metadata include the data (attribute) characteristics and the relationships between the entity sets. 49. What does it mean to say that a database displays both entity integrity and referential integrity?

Answer: Entity integrity describes a condition in which all tuples within a table are uniquely identified by their primary key. The unique value requirement prohibits a null primary key value because nulls are not unique. Referential integrity describes a condition in which a foreign key value has a match in the related table or in which the foreign key value is null. The null foreign key “value” makes it possible not to have a corresponding value, but the matching requirement on values that are not null makes it impossible to have an invalid value.

50. Why are entity integrity and referential integrity important in a database? Answer: Entity integrity and referential integrity are important because they are the basis for expressing and implementing relationships in the entity-relationship model. Entity integrity ensures that each row is uniquely identified by the primary key. Therefore, entity integrity means that a proper search for an existing tuple (row) will always be successful. (And the failure to find a match on a row search will always mean that the row for which the search is conducted does not exist in that table.) Referential integrity means that, if the foreign key contains a value, that value refers to an existing valid tuple (row) in another relation. Therefore, referential integrity ensures that it will be impossible to assign a non-existing foreign key value to a table. 51. What are the requirements that two relations must satisfy to be considered union-compatible? Answer: In order for two relations to be union-compatible, both must have the same number of attributes (columns) and corresponding attributes (columns) must have the same domain. The first requirement is easily identified by a cursory glance at the relations’ structures. If the first relation has 3 attributes then the second relation must also have 3 attributes. If the first table has 10 attributes, then the second relation must also have 10 attributes. The second requirement is more difficult to assess and requires understanding the meanings of the attributes in the business environment. Recall that an attribute’s domain is the set of allowable values for that attribute. To satisfy the second requirement for union-compatibility, the first attribute of the first relation must have the same domain as the first attribute of the second relation. The second attribute of the first relation must have the same domain as the second attribute of the second relation. The third attribute of the first relation must have the same domain as the third attribute of the second relation, and so on. NOTE: the professor may further explain that you could apply the UNION operator to two relations with different number of attributes by using the PROJECT operator to project only the common attributes, assuming those attributes share common domains. Remember that for the relational model, the result of a relational set operation is another relation (table). 52. Which relational algebra operators can be applied to a pair of tables that are not unioncompatible? Answer: The Product, Join, and Divide operators can be applied to a pair of tables that are not union-compatible. Divide does place specific requirements on the tables to be operated on; however, those requirements do not include union-compatibility. Select (or Restrict) and Project are performed on individual tables, not pairs of tables. (Note that if two tables are joined, then the result is a single table and the Select or Project operator is performed on that single table.) 53. Explain why the data dictionary is sometimes called “the database designer’s database.” Answer: Just as the database stores data that is of interest to the users regarding the objects in their environment that are important to them, the data dictionary stores data that is of interest to the database designer about the important decisions that were made in regard to the database structure. The data dictionary contains the number of tables that were created, the names of all of those tables, the attributes in each table, the relationships between the tables, the data type of each attribute, the enforced domains of the attributes, and so on. All of these data represent decisions that the database designer had to make and data that the database designer needs to record about the database.

54. A database user manually notes that “The file contains two hundred records, each record containing nine fields.” Use appropriate relational database terminology to “translate” that statement. Answer: Using the proper relational terminology, the statement may be translated to “the table—or relation—contains two hundred rows—or, if you like, two hundred tuples, or entities. Each of these rows contains nine attributes.” Use Figure Q3.8 to answer Questions 8–12. 55. Using the STUDENT and PROFESSOR tables, illustrate the difference between a natural join, an equijoin, and an outer join. Answer:

FIGURE Q3.8 The Ch03_CollegeQue Database Tables

The natural JOIN process begins with the PRODUCT of the two tables. Next, a SELECT (or RESTRICT) is performed on the PRODUCT generated in the first step to yield only the rows for which the PROF_CODE values in the STUDENT table are matched in the PROFESSOR table. Finally, a PROJECT is performed to produce the natural JOIN output by listing only a single copy of each attribute. The order in which the query output rows are shown is not relevant.

STU_CODE

PROF_CODE

DEPT_CODE

128569

512272

531235

553427

The equiJOIN’s results depend on the specified condition. At this stage of the students’ understanding, it may be best to focus on equijoins that retrieve all matching values in the common attribute. In such a case, the output will be:

STU_CODE

STUDENT. PROF_CODE

PROFESSOR. PROF_CODE

DEPT_CODE

128569

512272

531235

553427

Notice that in equijoins, the common attribute appears from both tables. It is normal to prefix the attribute name with the table name when an attribute appears more than once in a table. This maintains the requirement that attribute names be unique within a relational table. In the outer JOIN, the unmatched pairs would be retained and the values that do not have a match in the other table would be left null. It should be made clear to the students that outer joins are not the opposite of inner joins (like natural joins and equijoins). Rather, they are “inner join plus”—they include all of the matched records found by the inner join plus the unmatched records. Outer JOINs are normally performed as either a left outer join or a right outer join so that the operator specifies which table’s unmatched rows should be included in the output. Full outer joins depict the matched records plus the unmatched records from both tables. Also, like equijoins, outer joins do not drop a copy of the common attribute. Therefore, a full outer join will yield these results:

STU_CODE

STUDENT. PROF_CODE

PROFESSOR.P ROF_CODE

DEPT_CODE

128569

512272

531235

553427

100278 531268

A left outer join of STUDENT to PROFESSOR would include the matched rows plus the unmatched STUDENT rows:

STU_CODE

STUDENT. PROF_CODE

PROFESSOR. PROF_CODE

DEPT_CODE

128569

512272

531235

553427

100278 531268 A right outer join of STUDENT to PROFESSOR would include the matched rows plus the unmatched PROFESSOR row.

STU_CODE

STUDENT. PROF_CODE

PROFESSOR.P ROF_CODE

DEPT_CODE

128569

512272

531235

553427

56. Create the table that would result from πstu_code(student). Answer:

STU_CODE 128569 512272 531235 553427 100278 531268

57. Create the table that would result from πstu_code, dept_code(student ⋈ professor). Answer:

STU_CODE

DEPT_CODE

128569

512272

531235

553427

58. Create the basic ERD for the database shown in Figure Q3.8. Answer: Both the Chen and Crow’s Foot solutions are shown in Figure Q3.11.

FIGURE Q3.11 The Chen and Crow’s Foot ERD Solutions for Question 11 Chen ERD (generated with PowerPoint) 1 PROFESSOR

M advises

STUDENT

Crow’s Foot ERD (generated with PowerPoint)

PROFESSOR

advises

STUDENT

Chen ERD (generated with Visio Professional)

NOTE From this point forward, we will show the ERDs in Crow’s Foot format unless the problem specifies a different format.

59. Create the relational diagram for the database shown in Figure Q3.8. Answer: The relational diagram, generated in the Microsoft Access Ch03_CollegeQue database, is shown in Figure Q3.11.

FIGURE Q3.11 The Relational Diagram

Use Figure Q3.13 to answer Questions 13–17.

FIGURE Q3.13 The Ch03_VendingCo Database Tables

60. Write the relational algebra formula to apply a UNION relational operator to the tables shown in Figure Q3.13. Answer: The question does not specify the order in which the table should be used in the operation. Therefore, both of the following are correct. BOOTH ⋃ MACHINE MACHINE ⋃ BOOTH You can use this as an opportunity to emphasize that the order of the tables in a UNION command do not change the contents of the data returned.

61. Create the table that results from applying a UNION relational operator to the tables shown in Figure Q3.13 Answer:

BOOTH_PRODUCT

BOOTH_PRICE

Chips

1.5

Cola

1.25

Energy Drink Chips Chocolate Bar

2 1.25 1

Note that when the attribute names are different, the result will take the attribute names from the first relation. In this case, the solution assumes the operation was BOOTH UNION MACHINE. If the operation had been MACHINE UNION BOOTH then the attribute names from the MACHINE table would have appeared as the attribute names in the result. Also, notice that the “Chips” from both tables appears in the result, but the “Energy Drink” from both does not. A UNION operator will eliminate duplicate rows from the result; however, the entire row must match for two rows to be considered duplicates. In the case of “Chips”, the product names were the same but the prices were different. In the case of “Energy Drink”, both the product names and the prices matched so the second Energy Drink row was dropped from the result. 62. Write the relational algebra formula to apply an INTERSECT relational operator to the tables shown in Figure Q3.13. Answer: The question does not specify the order in which the table should be used in the operation. Therefore, both of the following are correct. BOOTH ⋂ MACHINE MACHINE ⋂ BOOTH 63. Create the table that results from applying an INTERSECT relational operator to the tables shown in Figure Q3.13. Answer:

BOOTH_PRODUCT Energy Drink

BOOTH_PRICE 2

Note that when the attribute names are different, the result will take the attribute names from the first relation. In this case, the solution assumes the operation was BOOTH INTERSECT MACHINE. If the operation had been MACHINE INTERSECT BOOTH then the attribute names from the MACHINE table would have appeared as the attribute names in the result.

64. Using the tables in Figure Q3.13, create the table that results from MACHINE DIFFERENCE BOOTH. Answer:

MACHINE_PRODUCT

MACHINE_PRICE

Chips

1.25

Chocolate Bar

Note that the order in which the relations are specified is significant in the results returned. The DIFFERENCE operator returns the rows from the first relation that are not duplicated in the second relation. Just as with the INTERSECT operator, the entire row must match an existing row to be considered a duplicate. Use Figure Q3.18 to answer Question 18. 65. Suppose you have the ERD shown in Figure Q3.18. How would you convert this model into an ERM that displays only 1:M relationships? (Make sure you create the revised ERD.) Answer:

FIGURE Q3.18 The Crow’s Foot ERD for DRIVER and TRUCK

The Crow’s Foot solution is shown in Figure Q3.18sol. Note that the original M:N relationship has been decomposed into two 1:M relationships based on these business rules: 

A driver may receive many (driving) assignments.



Each (driving) assignment is made for a single driver.



A truck may be driven in many (driving) assignments.



Each (driving) assignment is made for a single truck.

Note that a driver can drive only one truck at a time, but during some period of time, a driver may be assigned to drive many trucks. The same argument holds true for trucks—a truck can only be driven during one trip (assignment) at a time, but during some period of time, a truck may be assigned to be driven in many trips. Also, remind students that they will be introduced to optional (and additional) relationships as they study Chapter 4, “Entity Relationship (ER) Modeling.” Finally, remind your students that you always read the relationship from the “1” side to the “M” side. Therefore, you read “DRIVER receives ASSIGNMENT” and “TRUCK is driven in ASSIGNMENT.”

FIGURE Q3.18sol The Crow’s Foot ERM Solution for Question 18

66. What are homonyms and synonyms, and why should they be avoided in database design? Answer: Homonyms appear when more than one attribute has the same name. Synonyms exist when the same attribute has more than one name. Avoid both to avoid inconsistencies. For example, suppose we check the database for a specific attribute such as NAME. If NAME refers to customer names as well as to sales rep names, a clear case of a homonym, we have created an ambiguity, because it is no longer clear which entity the NAME belongs to. Synonyms make it difficult to keep track of foreign keys if they are named differently from the primary keys they point to. Using REP_NUM as the foreign key in the CUSTOMER table to reference the primary key REP_NUM in the SALESREP table is much clearer than naming the CUSTOMER table’s foreign key SLSREP. The proliferation of different attribute names to describe the same attributes will also make the data dictionary more cumbersome to use. Some data RDBMSs let the data dictionary check for homonyms and synonyms to alert the user to their existence, thus making their use less likely. For example, if a CUSTOMER table contains the (foreign) key REP_NUM, the entry of the attribute REP_NUM in the SALESREP table will either cause it to inherit all the characteristics of the original REP_NUM, or it will reject the use of this attribute name when different characteristics are declared by the user. 67. How would you implement a l:M relationship in a database composed of two tables? Give an example. Answer: Let’s suppose that an auto repair business wants to track its operations by customer. At the most basic level, it’s reasonable to assume that any database design you produce will include at least a car entity and a customer entity. Further suppose that it is reasonable to assume that: 

A car is owned just by one customer.



A customer can own more than one car.

The CAR and CUSTOMER entities and their relationships are represented by the Crow’s Foot ERD shown in Figure Q3.20. (Discussion: Explain to your students that the ERDs are very basic at this point. Your students will learn how to incorporate much more detail into their ERDs in Chapter 4. For example, no thought has—yet—been given to optional relationships or to the strength of those relationships. At this stage of learning the business of database design, simple is good! To borrow an old Chinese proverb, a journey of a thousand miles begins with a single step.)

FIGURE Q3.20 The CUSTOMER owns CAR ERM

Use Figure Q3.21 to answer Question 21. 68. Identify and describe the components of the table shown in Figure Q3.21, using correct terminology. Use your knowledge of naming conventions to identify the table’s probable foreign key(s). Answer:

FIGURE Q3.21 The Ch03_NoComp Database EMPLOYEE Table

Figure Q3.21’s database table contains: 

One entity set: EMPLOYEE.



Six attributes: EMP_NUM, EMP_LNAME, EMP_INIT, EMP_FNAME, DEPT_CODE, and JOB_CODE.



Ten entities: The 10 workers shown in rows 1–10.



One primary key: The attribute EMP_NUM because it identifies each row uniquely.



Two foreign keys: The attribute DEPT_CODE, which probably references a department to which the employee is assigned, and the attribute JOB_CODE, which probably references another table in which you would find the description of the job and perhaps additional information pertaining to the job.

Use the database shown in Figure Q3.22 to answer Questions 22–27.

FIGURE Q3.22 The Ch03_Theater Database Tables

69. Identify the primary keys. Answer: DIR_NUM is the DIRECTOR table’s primary key. PLAY_CODE is the PLAY table’s primary key. 70. Identify the foreign keys. Answer: The foreign key is DIR_NUM, located in the PLAY table. Note that the foreign key is located on the “many” side of the relationship between director and play. (Each director can direct many plays ... but each play is directed by only one director.) 71. Create the ERM. Answer: The entity relationship model is shown in Figure Q3.24.

FIGURE Q3.24 The Theater Database ERD

72. Create the relational diagram to show the relationship between DIRECTOR and PLAY. Answer: The relational diagram, shown in Figure 3.21, was generated with the help of Microsoft Access. (Check the Ch03_Theater database.)

FIGURE Q3.25 The Relational Diagram

73. Suppose you wanted quick lookup capability to get a listing of all plays directed by a given director. Which table would be the basis for the INDEX table, and what would be the index key? Answer: The PLAY table would be the basis for the appropriate index table. The index key would be the attribute DIR_NUM. 74. What would be the conceptual view of the INDEX table described in Question 26? Depict the contents of the conceptual INDEX table. Answer: The conceptual index table is shown in Figure Q3.27.

FIGURE Q3.27 The Conceptual Index Table Index Key

Pointers to the PLAY Table

100

101

2, 5, 7

102

1, 3, 6

ANSWERS TO PROBLEMS Use the database shown in Figure P3.1 to answer Problems 1–9. FIGURE P3.1 The Ch03_StoreCo Database Tables

For each table, identify the primary key and the foreign key(s). If a table does not have a foreign key, write None. Answer:

TABLE

PRIMARY KEY

FOREIGN KEY(S)

EMPLOYEE

EMP_CODE

STORE_CODE

STORE

STORE_CODE

REGION_CODE, EMP_CODE

REGION

REGION_CODE

NONE

NOTE: the STORE_CODE foreign key in the EMPLOYEE table represents where the employee works. The EMP_CODE in the STORE table represents who is the store manager. 75. Do the tables exhibit entity integrity? Answer yes or no and then explain your answer. Answer:

TABLE

ENTITY INTEGRITY

EXPLANATION

EMPLOYEE

Yes

Each EMP_CODE value is unique and there are no nulls.

STORE

Yes

Each STORE_CODE value is unique and there are no nulls.

REGION

Yes

Each REGION_CODE value is unique and there are no nulls.

76. Do the tables exhibit referential integrity? Answer yes or no and then explain your answer. Write NA (Not Applicable) if the table does not have a foreign key. Answer:

TABLE

REFERENTIAL INTEGRITY

EXPLANATION

EMPLOYEE

Yes

Each STORE_CODE value in EMPLOYEE points to an existing STORE_CODE value in STORE.

STORE

Yes

Each REGION_CODE value in STORE points to an existing REGION_CODE value in REGION and each EMP_CODE value in STORE points to an existing EMP_CODE value in EMPLOYEE.

REGION

77. Describe the type(s) of relationship(s) between STORE and REGION. Answer: Because REGION_CODE values occur more than once in STORE, we may conclude that each REGION can contain many stores. But since each STORE is located in only one REGION, the relationship between STORE and REGION is M:1. (It is, of course, equally true that the relationship between REGION and STORE is 1:M.) 78. Create the ERD to show the relationship between STORE and REGION. Answer: The Crow’s Foot ERD is shown in Figure P3.5. Note that each store is located in a single region, but that each region can have many stores located in it. (It’s always a good time to focus a discussion on the role of business rules in the creation of a database design.)

FIGURE P3.5 ERD for the STORE and REGION Relationship

79. Create the relational diagram to show the relationship between STORE and REGION. Answer: The relational diagram is shown in Figure P3.6. Note (again) that the location of the entities is immaterial … the relationships are carried along with the entity. Therefore, it does not matter whether you locate the REGION on the left side or on the right side of the display. But you always read from the “1” side to the “M” side, regardless of the entity location.

FIGURE P3.6 The Relational Diagram for the STORE and REGION Relationship

80. Describe the type(s) of relationship(s) between EMPLOYEE and STORE. (Hint: Each store employs many employees, one of whom manages the store.) Answer: There are TWO relationships between STORE and EMPLOYEE. The first relationship, expressed by STORE employs EMPLOYEE, is a 1:M relationship, because one store can employ many employees and each employee is employed by one store. The second relationship, expressed by EMPLOYEE manages STORE, is a 1:1 relationship, because each store is managed by one employee and an employee manages only one store.

NOTE It is useful to introduce several ways in which the manages relationship may be implemented. For example, rather than creating the manages relationship between EMPLOYEE and STORE, it is possible to simply list the manager’s name as an attribute in the STORE table. This approach creates a redundancy that may not do much damage if the information requirements are limited. However, if it is necessary to keep track of each manager’s sales and personnel management performance by store, the manages relationship we have shown here will do a much better job in terms of information generation. Also, you may want to introduce the notion of an optional relationship. After all, not all employees participate in the manages relationship. We will cover optional relationships in detail in Chapter 4, “Entity relationship (ER) Modeling.” 81. Create the ERD to show the relationships among EMPLOYEE, STORE, and REGION. Answer: The Crow’s Foot ERD is shown in Figure P3.8. Remind students that you always read from the “1” side to the “M” side in any 1:M relationship, that is, a STORE employs many EMPLOYEEs and a REGION contains many STORES. In a 1:1 relationship, you always read from the “parent” entity to the related entity. In this case, only one EMPLOYEE manages each STORE … and each STORE is managed by only one EMPLOYEE. Figure P3.8’s ERD includes the properties of the manages relationship. Note that there is no mandatory 1:1 relationship available at this point. That’s why there is an optional relationship—the O symbol—next to the STORE entity to indicate that an employee is not necessarily a manager. Let your students know that such optional relationships will be explored in detail in Chapter 4. (Explain that you can create mandatory 1:1 relationships when you add attributes to the entity boxes and specify a mandatory data entry for those attributes that are involved in the 1:1 relationship.)

FIGURE P3.8 StoreCo Crow’s Foot ERD

82. Create the relational diagram to show the relationships among EMPLOYEE, STORE, and REGION. Answer: The relational diagram is shown in Figure P3.9.

FIGURE P3.9 The Relational Diagram

NOTE The relational diagram in Figure P3.9 was generated in Microsoft Access. If a relationship already exists between two entities, Access generates a virtual table (in this case, EMPLOYEE_1) to generate the additional relationship. The virtual table cannot be queried; its only function is to store the manages relationship between EMPLOYEE and STORE. Just how multiple relationships are stored and managed is a function of the software you use. Use the database shown in Figure P3.10 to work Problems 10−16. Note that the database is composed of four tables that reflect these relationships: 

An EMPLOYEE has only one JOB_CODE, but a JOB_CODE can be held by many EMPLOYEEs.



An EMPLOYEE can participate in many PLANs, and any PLAN can be assigned to many EMPLOYEEs.

Note also that the M:N relationship has been broken down into two 1:M relationships for which the BENEFIT table serves as the composite or bridge entity.

FIGURE P3.10 The Ch03_BeneCo Database Tables

83. For each table in the database, identify the primary key and the foreign key(s). If a table does not have a foreign key, write None. Answer:

TABLE

PRIMARY KEY

FOREIGN KEY(S)

EMPLOYEE

EMP_CODE

JOB_CODE

BENEFIT

EMP_CODE + PLAN_CODE

EMP_CODE, PLAN_CODE

JOB

JOB-CODE

None

PLAN

PLAN_CODE

None

84. Create the ERD to show the relationship between EMPLOYEE and JOB. Answer: The ERD is shown in Figure P3.11. Note that the JOB_CODE = 1 occurs twice in the EMPLOYEE table, as does the JOB_CODE = 2, thus providing evidence that a JOB can be assigned to many EMPLOYEEs. But each EMPLOYEE has only one JOB_CODE, so there exists a 1:M relationship between JOB and EMPLOYEE.

FIGURE P3.11 The ERD for the EMPLOYEE–JOB Relationship

85. Create the relational diagram to show the relationship between EMPLOYEE and JOB. Answer: The relational schema is shown in Figure P3.12.

FIGURE P3.12 The Relational Diagram

86. Do the tables exhibit entity integrity? Answer yes or no and then explain your answer.

TABLE

ENTITY INTEGRITY

EXPLANATION

EMPLOYEE

Yes

Each EMP_CODE value is unique and there are no nulls.

BENEFIT

Yes

Each combination of EMP_CODE and PLAN_CODE values is unique and there are no nulls.

JOB

Yes

Each JOB_CODE value is unique and there are no nulls.

PLAN

Yes

Each PLAN_CODE value is unique and there are no nulls.

87. Do the tables exhibit referential integrity? Answer yes or no and then explain your answer. Write NA (Not Applicable) if the table does not have a foreign key. Answer:

TABLE

REFERENTIAL INTEGRITY

EXPLANATION

EMPLOYEE

Yes

Each JOB_CODE value in EMPLOYEE points to an existing JOB_CODE value in JOB.

BENEFIT

Yes

Each EMP_CODE value in BENEFIT points to an existing EMP_CODE value in EMPLOYEE and each PLAN_CODE value in BENEFIT points to an existing PLAN_CODE value in PLAN.

JOB

PLAN

88. Create the ERD to show the relationships among EMPLOYEE, BENEFIT, JOB, and PLAN. Answer: The Crow’s Foot ERD is shown in Figure P3.15.

FIGURE P3.15 BeneCo Crow’s Foot ERD

89. Create the relational diagram to show the relationships among EMPLOYEE, BENEFIT, JOB, and PLAN. Answer: The relational diagram is shown in Figure P3.16. Note that the location of the entities is immaterial—the relationships move with the entities.

FIGURE P3.16 The Relational Diagram

Use the database shown in Figure P3.17 to answer Problems 17–23.

FIGURE P3.17 The Ch03_TransCo Database Tables

90. For each table, identify the primary key and the foreign key(s). If a table does not have a foreign key, write None. Answer:

TABLE

PRIMARY KEY

FOREIGN KEY(S)

TRUCK

TRUCK_NUM

BASE_CODE, TYPE_CODE

BASE

BASE_CODE

None

TYPE

TYPE_CODE

None

NOTE The TRUCK_SERIAL_NUM could also be designated as the primary key. Because the TRUCK_NUM was designated to be the primary key, TRUCK_SERIAL_NUM is an example of a candidate key. 91. Do the tables exhibit entity integrity? Answer yes or no and then explain your answer. Answer:

TABLE

ENTITY INTEGRITY

EXPLANATION

TRUCK

Yes

The TRUCK_NUM values in the TRUCK table are all unique and there are no nulls.

BASE

Yes

The BASE_CODE values in the BASE table are all unique and there are no nulls.

TYPE

Yes

The TYPE_CODE values in the TYPE table are all unique and there are no nulls.

92. Do the tables exhibit referential integrity? Answer yes or no and then explain your answer. Write NA (Not Applicable) if the table does not have a foreign key. Answer:

TABLE

REFERENTIAL INTEGRITY

TRUCK

Yes

BASE

TYPE

EXPLANATION

The BASE_CODE values in the TRUCK table reference existing BASE_CODE values in the BASE table or they are null. (The TRUCK table’s BASE_CODE is null for TRUCK_NUM = 1004.) Also, the TYPE_CODE values in the TRUCK table reference existing TYPE_CODE values in the TYPE table.

93. Identify the TRUCK table’s candidate key(s). Answer: A candidate key is any key that could have been used as a primary key, but that was, for some reason, not chosen to be the primary key. For example, the TRUCK_SERIAL_NUM could have been selected as the PK, but the TRUCK_NUM was actually designated to be the PK. Therefore, the TRUCK_SERIAL_NUM is a candidate key. Also, any combination of attributes that would uniquely identify any truck would be a candidate key. For example, the combination of BASE_CODE, TYPE_CODE, TRUCK_MILES, and TRUCK_BUY_DATE is not likely to be duplicated and this combination would, therefore, be a candidate key. However, while the latter combination might constitute a candidate key, such a combination would not be practical. (An extreme—and impractical—example of a candidate key would be the combination of all of a table’s attributes.) Furthermore, this assumes that the TRUCK_MILES attribute represents the number of miles in the truck when we purchased and not the actual miles driven. The actual miles driven value changes over time. This will not be a good prime attribute choice as the attribute will always be different as time goes by.

NOTE Some of the answers to the following problem 21 define only a few of the available correct choices. For example, a superkey is, in effect, a candidate key containing redundant attributes. Therefore, any primary key plus any other attribute(s) is a superkey. Because a secondary key does not necessarily yield unique outcomes, the number of attributes that constitute a secondary key is somewhat arbitrary. The adequacy of a secondary key depends on the extent of the end-user’s willingness to accept multiple matches. Remember the secondary key is used to create indexes for data retrieval (search) purposes.

94. For each table, identify a superkey and a secondary key. Answer:

TABLE

SUPERKEY

SECONDARY KEY

TRUCK

TRUCK_NUM + TRUCK_MILES

BASE_CODE + TYPE_CODE

TRUCK_NUM + TRUCK_MILES + TRUCK_BUY_DATE

(This secondary key is likely to produce multiple matches, but it is not likely that end users will know attribute values such as TRUCK_MILES or TRUCK_BUY_DATE. Therefore, the selected attributes create a reasonable secondary key.)

TRUCK_NUM + TRUCK_MILES + TRUCK_BUY_DATE + TYPE_CODE BASE

TYPE

BASE_CODE + BASE_CITY

BASE_CITY + BASE_STATE

BASE_CODE + BASE_CITY + BASE_CITY

(This a very effective secondary key, since it is not likely that a state contains two cities with the same name.)

TYPE_CODE TYPE_DESCRIPTION

TYPE_DESCRIPTION

95. Create the ERD for this database. Answer: The Crow’s Foot ERD is shown in Figure P3.22.

FIGURE P3.22 TransCo Crow’s Foot ERD

96. Create the relational diagram for this database. Answer: The relational diagram is shown in Figure P3.23.

FIGURE P3.23 The Ch03_TransCo Relational Diagram

Use the database shown in Figure P3.24 to answer Problems 24−31. AviaCo is an aircraft charter company that supplies on-demand charter flight services using a fleet of four aircraft. Aircraft are identified by a unique registration number. Therefore, the aircraft registration number is an appropriate primary key for the AIRCRAFT table.

FIGURE P3.24 The Ch03_AviaCo Database Tables (Part 1) The nulls in the CHARTER table’s CHAR_COPILOT column indicate that a copilot is not required for some charter trips or for some aircraft. Federal Aviation Administration (FAA) rules require a copilot on jet aircraft and on aircraft that have a gross take-off weight over 12,500 pounds. None of the aircraft in the AIRCRAFT table are governed by this requirement; however, some customers may require the presence of a copilot for insurance reasons. All charter trips are recorded in the CHARTER table.

FIGURE P3.24 The Ch03_AviaCo Database Tables (Part 2)

NOTE Earlier in the chapter, you were instructed to avoid homonyms and synonyms. In this problem, both the pilot and the copilot are listed in the PILOT table, but EMP_NUM cannot be used for both in the CHARTER table. Therefore, the synonyms CHAR_PILOT and CHAR_COPILOT are used in the CHARTER table. Although the solution works in this case, it is very restrictive and it generates nulls when a copilot is not required. Worse, such nulls proliferate as crew requirements change. For example, if the AviaCo charter company grows and starts using larger aircraft, crew requirements may increase to include flight engineers and load masters. The CHARTER table would then have to be modified to include the additional crew assignments; such attributes as CHAR_FLT_ENGINEER and CHAR_LOADMASTER would have to be added to the CHARTER table. Given this change, each time a smaller aircraft flew a charter trip without the number of crew members required in larger aircraft, the missing crew members would yield additional nulls in the CHARTER table. You will have a chance to correct those design shortcomings in Problem 27. The problem illustrates two important points: 1. Don’t use synonyms. If your design requires the use of synonyms, revise the design! 2. To the greatest possible extent, design the database to accommodate growth without requiring structural changes in the database tables. Plan ahead and try to anticipate the effects of change on the database. 97. For each table, identify each of the following when possible: Answer: a. The primary key

TABLE

PRIMARY KEY

CHARTER

CHAR_TRIP

AIRCRAFT

AC_NUMBER

MODEL

MOD_CODE

PILOT

EMP_NUM

EMPLOYEE

EMP_NUM

CUSTOMER

CUS_CODE

b. A superkey

TABLE

SUPER KEY

CHARTER

CHAR_TRIP + CHAR_DATE

AIRCRAFT

AC_NUM + MOD-CODE

MODEL

MOD_CODE + MOD_NAME

PILOT

EMP_NUM + PIL_LICENSE

EMPLOYEE

EMP_NUM + EMP_DOB

CUSTOMER

CUS_CODE + CUS_LNAME

c. A candidate key

TABLE

CANDIDATE KEY

CHARTER

No practical candidate key is available. For example, CHAR_DATE + CHAR_DESTINATION + AC_NUMBER + CHAR_PILOT + CHAR_COPILOT will still not necessarily yield unique matches, because it is possible to fly an aircraft to the same destination twice on one date with the same pilot and copilot. You could, of course, present the argument that the combination of all the attributes would yield a unique outcome.

AIRCRAFT

See the previous discussion.

MODEL

See the previous discussion.

PILOT

See the previous discussion.

EMPLOYEE

See the previous discussion. But perhaps the combination of EMP_LNAME + EMP_FNAME + EMP_INITIAL + EMP_DOB will yield an acceptable candidate key.

CUSTOMER

See the previous discussion.

d. The foreign key(s)

TABLE

FOREIGN KEY

CHARTER

CHAR_PILOT (references PILOT) CHAR_COPILOT (references PILOT) AC_NUMBER (references AIRCRAFT) CUS_CODE (references CUSTOMER)

AIRCRAFT

MOD_CODE

MODEL

None

PILOT

EMP_NUM (references EMPLOYEE)

EMPLOYEE

None

CUSTOMER

None

e. A secondary key

TABLE

SECONDARY KEY

CHARTER

CHAR_DATE + AC_NUMBER + CHAR_DESTINATION

AIRCRAFT

MOD_CODE

MODEL

MOD_MANUFACTURER + MOD_NAME

PILOT

PIL_LICENSE + PIL_MED_DATE

EMPLOYEE

EMP_LNAME + EMP_FNAME + EMP_DOB

CUSTOMER

CUS_LNAME + CUS_FNAME + CUS_PHONE

98. Create the ERD. (Hint: Look at the table contents. You will discover that an AIRCRAFT can fly many CHARTER trips but that each CHARTER trip is flown by one AIRCRAFT, that a MODEL references many AIRCRAFT but that each AIRCRAFT references a single MODEL, and so on.) Answer: The Crow’s Foot ERD is shown in Figure P3.25. The optional (default) 1:1 relationship crops up in this ERD, just as it did in the Problem 8 solution. Use the same discussion that accompanied Problem 8. Also, note that EMPLOYEE is the “parent” of PILOT. Note that all pilots are employees, but not all employees are pilots—some are mechanics, accountants, and so on. (This discussion previews some of the Chapter 4 coverage … coming attractions, so to speak.) The relationship between PILOT and EMPLOYEE is read from the

“parent” entity to the related entity. In this case, the relationship is read as “an EMPLOYEE is a PILOT.”

FIGURE P3.25 The Ch03_AviaCo Database ERD

99. Create the relational diagram. Answer: The relational diagram is shown in Figure P3.26.

FIGURE P3.26 The Ch03_AviaCo Database Relational Diagram

100. Modify the ERD you created in Problem 25 to eliminate the problems created by the use of synonyms. (Hint: Modify the CHARTER table structure by eliminating the CHAR_PILOT and CHAR_COPILOT attributes; then create a composite table named CREW to link the CHARTER and EMPLOYEE tables. Some crew members, such as flight attendants, may not be pilots. That’s why the EMPLOYEE table enters into this relationship.) Answer: The Crow’s Foot ERD is shown in Figure P3.27.

FIGURE P3.27 The Ch03_AviaCo_2 Database ERD

101. Create the relational diagram for the design you revised in Problem 27. Answer: (After you have had a chance to revise the design, your instructor will show you the results of the design change, using a copy of the revised database named Ch03_AviaCo_2.) The relational diagram for the Ch03_AviaCo_2 database is shown in Figure P3.28. Note that there are a few additional entities that you will encounter again in Chapter 4. (You can safely ignore the extra entities, RATING and EARNEDRATING at this point … but you can let the students “read” the relationship between these two entities.) Note that you can easily derive the M:N relationship between PILOT and RATING. (A PILOT can earn many RATINGs. A RATING can be earned by many PILOTs.) Even though your students may not know what a rating is, they can still draw up conclusions about its relationship to other entities by looking at relational diagrams and ERDs. And that’s one of the many strengths of design tools. Also, you can let your students break the M:N relationship down into two 1:M relationships—note that this is done through the EARNEDRATING entity. The issues encountered in the design and implementation of the Ch3_AviaCo_2 database will be revisited many times in the book.

FIGURE P3.28 The Ch03_AviaCo_2 Relational Diagram

You want to see data on charters flown by either Robert Williams (employee number 105) or Elizabeth Travis (employee number 109) as pilot or copilot, but not charters flown by both of them. Complete Problems 29–31 to find this information. 102. Create the table that would result from applying the SELECT and PROJECT relational operators to the CHARTER table to return only the CHAR_TRIP, CHAR_PILOT, and CHAR_COPILOT attributes for charters flown by either employee 105 or employee 109. Answer:

CHAR_TRIP

CHAR_PILOT

CHAR_COPILOT

10003

105

109

10006

109

10009

105

10010

109

10013

105

10016

109

105

10018

105

104

103. Create the table that would result from applying the SELECT and PROJECT relational operators to the CHARTER table to return only the CHAR_TRIP, CHAR_PILOT, and CHAR_COPILOT attributes for charters flown by both employee 105 and employee 109. Answer:

CHAR_TRIP

CHAR_PILOT

CHAR_COPILOT

10003

105

109

10016

109

105

104. Create the table that would result from applying a DIFFERENCE relational operator of your result from Problem 29 to your result from Problem 30. Answer:

CHAR_TRIP

CHAR_PILOT

10006

109

10009

105

10010

109

10013

105

10018

105

CHAR_COPILOT

104

TABLE OF CONTENTS Answers to Review Questions………………………… ...........……………………………………….1 Answers to Problems……………………………………………………………………………..…. 22 Case Solutions ........................................................................................................................52

ANSWERS TO REVIEW QUESTIONS 105. What two conditions must be met before an entity can be classified as a weak entity? Give an example of a weak entity.

Answer: To be classified as a weak entity, two conditions must be met: 1. The entity must be existence-dependent on its parent entity. 2. The entity must inherit at least part of its primary key from its parent entity. For example, the (strong) relationship depicted in the text’s Figure 4.9 shows a weak CLASS entity: 1. CLASS is clearly existence-dependent on COURSE. (You can’t have a database class unless a database course exists.) 2. The CLASS entity’s PK is defined through the combination of CLASS_SECTION and CRS_CODE. The CRS_CODE attribute is also the PK of COURSE. The conditions that define a weak entity are the same as those for a strong relationship between an entity and its parent. In short, the existence of a weak entity produces a strong relationship. And if the entity is strong, its relationship to the other entity is weak. (Note the dotted relationship line in the text’s Figure 4.9 when the relationship is weak.) Keep in mind that whether or not an entity is weak usually depends on the database designer’s decisions. For instance, if the database designer had decided to use a single attribute as shown in the text’s Figure 4.9, the CLASS entity would be strong. (The CLASS entity’s PK is CLASS_CODE, which is not derived from the COURSE entity.) In this case, the relationship between COURSE and CLASS is weak. (Note the dashed relationship line in the text’s Figure 4.9.) If the designer chose the composite key, as shown in Figure 4.10, the relationship is strong, as denoted by the solid line. However, regardless of how the designer classifies the relationship—weak or strong—CLASS is always existencedependent on COURSE.

106. What is a strong (or identifying) relationship, and how is it depicted in a Crow’s Foot ERD? Answer: A strong relationship exists when an entity is existence-dependent on another entity and inherits at least part of its primary key from that entity. A strong relationship is shown as a solid line. In other words, a strong relationship exists when a weak entity is related to its parent entity. (Note the discussion in Question 1.) 107. Given the business rule “an employee may have many degrees,” discuss its effect on attributes, entities, and relationships. (Hint: Remember what a multivalued attribute is and how it might be implemented.) Answer: Suppose that an employee has the following degrees: BA, BS, and MBA. These degrees could be stored in a single string as a multivalued attribute named EMP_DEGREE in an EMPLOYEE table such as the one shown next:

EMP_NUM

EMP_LNAME

EMP_DEGREE

123

Carter

AA, BBA

124

O’Shanski

BBA, MBA, Ph.D.

125

Jones

126

Ortez

BS, MS

Although the preceding solution has no obvious design flaws, it is likely to yield reporting problems. For example, suppose you want to get a count for all employees who have BBA degrees. You could, of course, do an “in-string” search to find all of the BBA values within the EMP_DEGREE strings. But such a solution is cumbersome from a reporting point of view. Query simplicity is a valuable thing to application developers—and to end users who like maximum query execution speeds. Database designers ought to pay some attention to the competing database interests that exist in the data environment. One—very poor—solution is to create a field for each expected value. This “solution” is shown next:

EMP_NUM

EMP_LNAME

EMP_DEGREE1 EMP_DEGREE2

123

Carter

BBA

124

O’Shanski

BBA

MBA

125

Jones

126

Ortez

EMP_DEGREE3

Ph.D.

This “solution” yields nulls for all employees who have fewer than three degrees. And if even one employee earns a fourth degree, the table structure must be altered to accommodate the new data value. (One piece of evidence of poor design is the need to alter table structures in response to the need to add data of an existing type.) In addition, the query simplicity is not enhanced by the fact that any degree can be listed in any column. For example, a BA degree might be listed in the second column, after an associate of arts (AA) degree has been entered in EMP_DEGREE1. One might simplify the query environment by creating a set of attributes that define the data entry, thus producing the following results: EMP_NUM

EMP_LNAME

123

Carter

124

O’Shanski

125

Jones

126

Ortez

EMP_AA

EMP_AS

EMP_BA

EMP_BS

EMP_BB A

EMP_MS

EMP_MBA

EMP_PhD

X X X X

This “solution” clearly proliferates the nulls at an ever-increasing pace. The only reasonable solution is to create a new DEGREE entity that stores each degree in a separate record, thus producing the following tables. (There is a 1:M relationship between EMPLOYEE and DEGREE. Note that the EMP_NUM can occur more than once in the DEGREE table. The DEGREE table’s PK is EMP_NUM + DEGREE_CODE. This solution also makes it possible to record the date on which the degree was earned, the institution from which it was earned, and so on.

Table name: EMPLOYEE EMP_NUM

EMP_LNAME

123

Carter

124

O’Shanski

125

Jones

126

Ortez

Table name: DEGREE EMP_NUM

DEGREE_CODE

DEGREE_DATE

DEGREE_PLACE

123

May-1999

Lake Sumter CC

123

BBA

Aug-2004

U. of Georgia

124

BBA

Dec-1990

U. of Toledo

124

MBA

May-2001

U. of Michigan

124

Ph.D.

Dec-2005

U. of Tennessee

125

Aug-2002

Valdosta State

126

Dec-1989

U. of Missouri

126

May-2002

U. of Florida

Note that this solution leaves no nulls, produces a simple query environment, and makes it unnecessary to alter the table structure when employees earn additional degrees. (You can make the environment even more flexible by naming the new entity QUALIFICATION, thus making it possible to store degrees, certifications, and other useful data that define an employee’s qualifications.) 108. What is a composite entity, and when is it used? Answer: A composite entity, also known as a bridge entity, is generally used to transform complex relationships that cannot be implemented in the relational model. For example, it is used to implement M:N relationships or higher order relationships by decomposing the relationship into 1:M relationships. This allows for properly implementable placement of FKs. 109. Suppose you are working within the framework of the conceptual model in Figure Q4.5. Answer:

FIGURE Q4.5 The Conceptual Model for Question 5

Given the conceptual model in Figure Q4.5: a. Write the business rules that are reflected in it. Even a simple ERD such as the one shown in Figure Q4.5 is based on many business rules. Make sure that each business rule is written on a separate line and that all of its details are spelled out. In this case, the business rules are derived from the ERD in a “reverseengineering” procedure designed to document the database design. In a real-world database design situation, the ERD is generated on the basis of business rules that are written before the first entity box is drawn. (Remember that the business rules are derived from a carefully and precisely written description of operations.)

Given the ERD shown in Figure Q4.5, you can identify the following business rules: 1. A customer can own many cars. 2. Some customers do not own cars. 3. A car is owned by one and only one customer. 4. A car may generate one or more maintenance records. 5. Each maintenance record is generated by one and only one car. 6. Some cars have not (yet) generated a maintenance procedure. 7. Each maintenance procedure can use many parts. (Comment: A maintenance procedure may include multiple maintenance actions, each one of which may or may not use parts. For example, 10,000-mile check may include the installation of a new oil filter and a new air filter. But tightening an alternator belt does not require a part.) 8. A part may be used in many maintenance records. (Comment: Each time an oil change is made, an oil filter is used. Therefore, many oil filters may be used during some period of time. Naturally, you are not using the same oil filter each time—but the part classified as “oil filter” shows up in many maintenance records as time passes.) Note that the apparent M:N relationship between MAINTENANCE and PART has been resolved through the use of the composite entity named MAINT_LINE. The MAINT_LINE entity ensures that the M:N relationship between MAINTENANCE and PART has been broken up to produce the two 1:M relationships shown in business rules 9 and 10. 9. Each maintenance procedure generates one or more maintenance lines. 10. Each part may appear in many maintenance lines. (Review the comment in business rule 8.) As you review the business rules 9 and 10, use the following two tables to show some sample data entries. For example, take a look at the (simplified) contents of the following MAINTENANCE and LINE tables and note that the MAINT_NUM 10001 occurs three times in the LINE table:

Sample MAINTENANCE Table Data MAINT_NUM

MAINT_DATE

10001

15-Mar-2022

10002

15-Mar-2022

10003

16-Mar-2022

Sample LINE Table Data MAINT_NUM

LINE_NUM

LINE_DESCRIPTION LINE_PART

LINE_UNITS

10001

Replace fuel filter

FF-015

10001

Replace air filter

AF-1187

10001

Tighten belt

alternator

10002

Replace bulbs

taillight

BU-2145

10003

Replace oil filter

OF-2113

10003

Replace air filter

AF-1187

b. Identify all of the cardinalities. The Crow’s Foot ERD, shown in Figure Q4.5, does not show cardinalities directly. Instead, the cardinalities are implied through the Crow’s Foot symbols for connectivity. You might write the cardinality (0,N) next to the MAINT_LINE entity in its relationship with the PART entity to indicate that a part might occur “N” times in the maintenance line entity or that it might never show up in the maintenance line entity. The latter case would occur if a given part has never been used in maintenance. 110. What is a recursive relationship? Give an example. Answer: A recursive relationship exists when an entity is related to itself, that is, some instances (rows) in the entity (table) are related to other instances (rows) in that same entity (table). For example, a COURSE may be a prerequisite to a COURSE. (See Section 4.1j, “Recursive Relationships,” for additional examples.) 111. How would you (graphically) identify each of the following ERM components in a Crow’s Foot notation? Answer: The answers to Questions (a) through (d) are illustrated with the help of Figure Q4.7.

FIGURE Q4.7 Crow’s Foot ERM Components

a. An entity An entity is represented by a rectangle containing the entity name. (Remember that, in ER modeling, the word “entity” actually refers to the entity set.) b. The cardinality (0,N) Cardinalities are implied through the use of Crow’s Foot symbols for connectivity. For example, note the implied (0,N) cardinality in Figure Q4.7. c. A weak relationship A weak relationship exists when the PK of the related entity does not contain at least one of the PK attributes of the parent entity. For example, if the PK of a COURSE entity is CRS_CODE and the PK of the related CLASS entity is CLASS_CODE, the relationship between COURSE and CLASS is weak. (Note that the CLASS PK does not include the CRS_CODE attribute.) A weak relationship can be indicated by a dashed line in the ERD. d. A strong relationship A strong relationship exists when the PK of the related entity contains at least one of the PK attributes of the parent entity. For example, if the PK of a COURSE entity is CRS_CODE and the PK of the related CLASS entity is CRS_CODE + CLASS_SECTION, the relationship between COURSE and CLASS is strong. (Note that the CLASS PK includes the CRS_CODE attribute.) A strong relationship can be indicated by a solid line in the ERD.

112. Discuss the difference between a composite key and a composite attribute. How would each be indicated in an ERD? Answer: A composite key is one that consists of more than one attribute. If the ER diagram contains the attribute names for each of its entities, a composite key is indicated in the ER diagram by the fact that more than one attribute name is underlined to indicate its participation in the primary key. A composite attribute is one that can be subdivided to yield meaningful attributes for each of its components. For example, the composite attribute CUS_NAME can be subdivided to yield the CUS_FNAME, CUS_INITIAL, and CUS_LNAME attributes. There is no ER convention that enables us to indicate that an attribute is a composite attribute. 113. What two courses of action are available to a designer who encounters a multivalued attribute? Answer: The discussion that accompanies the answer to Question 3 is valid as an answer to this question. Briefly, the multivalued attribute can be separated into multiple columns, or it can be placed in a separate table. The first option is only appropriate if the designer knows, absolutely knows, the maximum number of possible values any row could have for the attribute. This can yield a workable solution but is fraught with numerous issues for performance and querying. The second option of using a separate table will always yield a workable solution, and generally has the best performance and querying capabilities. For additional insight, see discussion in Section 4-1b, in particular Figures 4.3, 4.4, and 4.5 and Table 4.1. 114. What is a derived attribute? Give an example. What are the advantages or disadvantages of storing or not storing a derived attribute? Answer: A derived attribute is an attribute whose value is calculated (derived) from other attributes. The derived attribute need not be physically stored within the database; instead, it can be derived by using an algorithm. For example, an employee’s age, EMP_AGE, may be found by computing the integer value of the difference between the current date and the EMP_DOB. If you use MS Access, you would use INT((DATE() − EMP_DOB)/365). Similarly, a sales clerk’s total gross pay may be computed by adding a computed sales commission to base pay. For instance, if the sales clerk’s commission is 1%, the gross pay may be computed by EMP_GROSSPAY = INV_SALES*1.01 + EMP_BASEPAY Or the invoice line item amount may be calculated by LINE_TOTAL = LINE_UNITS*PROD_PRICE Advantages of storing a derived attribute include reduced complexity of the query to retrieve the computed values and less processing overhead at the time of retrieval. Disadvantages of storing a derived attribute include increased possibility of data inconsistency and increased processing overhead at the time of storage.

Advantages of not storing a derived attribute include reduced risk of data inconsistency or stale values. Disadvantages of not storing a derived attribute include increased query complexity and performance penalties for calculating the value when it is needed. 115. How is a relationship between entities indicated in an ERD? Give an example using the Crow’s Foot notation. Answer: Use Figure Q4.7 as the basis for your answer. Briefly, a relationship is indicated by a line connecting the related entities. Note the distinction between the dashed and solid relationship lines, then tie this distinction to the answers to Questions 7c and 7d. 116. Discuss two ways in which the 1:M relationship between COURSE and CLASS can be implemented. (Hint: Think about relationship strength.) Answer: Note the discussion about weak and strong entities in Questions 7c and 7d. Then follow up with this discussion: The relationship is implemented as strong when the CLASS entity’s PK contains the COURSE entity’s PK. For example, COURSE(CRS_CODE, CRS_TITLE, CRS_DESCRIPTION, CRS_CREDITS) CLASS(CRS_CODE, CLASS_SECTION, CLASS_TIME, CLASS_PLACE) Note that the CLASS entity’s PK is CRS_CODE + CLASS_SECTION—and that the CRS_CODE component of this PK has been “borrowed” from the COURSE entity. Because CLASS is existence-dependent on COURSE and uses a PK component from its parent (COURSE) entity, the CLASS entity is weak in this strong relationship between COURSE and CLASS. The Visio Crow’s Foot ERD shows a strong relationship as a solid line. (See Figure Q4.12a.) Visio refers to a strong relationship as an identifying relationship.

FIGURE Q4.12a Strong COURSE and CLASS Relationship

Sample data are shown next:

Table name: COURSE CRS_CODE

CRS_TITLE

CRS_DESCRIPTION

CRS_CREDITS

ACCT-211

Basic Accounting

An introduction to accounting. Required of all business majors.

CIS-380

Database Techniques I

Database design and implementation issues. Uses CASE tools to generate designs that are then implemented in a major database management system.

CIS-490

Database Techniques II

The second half of CIS-380. Basic Web database application development and management issues.

Table name: CLASS CRS_CODE

CLASS_SECTION

CLASS_TIME

CLASS_PLACE

ACCT-211

8:00 a.m. – 9:30 a.m. T-Th.

Business 325

ACCT-211

8:00 a.m. – 8:50 a.m. MWF

Business 325

ACCT-211

8:00 a.m. – 8:50 a.m. MWF

Business 402

CIS-380

11:00 a.m. – 11:50 a.m. MWF

Business 415

CIS-380

3:00 p.m. – 3:50 a.m. MWF

Business 398

CIS-490

1:00 p.m. – 3:00 p.m. MW

Business 398

CIS-490

6:00 p.m. – 10:00 p.m. Th.

Business 398

The relationship is implemented as weak when the CLASS entity’s PK does not contain the COURSE entity’s PK. For example, COURSE(CRS_CODE, CRS_TITLE, CRS_DESCRIPTION, CRS_CREDITS) CLASS(CLASS_CODE, CRS_CODE, CLASS_SECTION, CLASS_TIME, CLASS_PLACE) (Note that CRS_CODE is no longer part of the CLASS PK, but that it continues to serve as the FK to COURSE.) The Crow’s Foot ERD shows a weak relationship as a dashed line. (See Figure Q4.12b.) A weak relationship is also known as a non identifying relationship.

FIGURE Q4.12b Weak COURSE and CLASS Relationship

Given the weak relationship depicted in Figure Q4.12b, the CLASS table contents would look like this:

Table name: CLASS CLASS_CODE

CRS_CODE

CLASS_SECTION

CLASS_TIME

CLASS_PLACE

21151

ACCT-211

8:00 a.m. – 9:30 a.m. T-Th.

Business 325

21152

ACCT-211

8:00 a.m. – 8:50 a.m. MWF

Business 325

21153

ACCT-211

8:00 a.m. – 8:50 a.m. MWF

Business 402

38041

CIS-380

11:00 a.m. – 11:50 a.m. MWF

Business 415

38042

CIS-380

3:00 p.m. – 3:50 a.m. MWF

Business 398

49041

CIS-490

1:00 p.m. – 3:00 p.m. MW

Business 398

49042

CIS-490

6:00 p.m. – 10:00 p.m. Th.

Business 398

The advantage of the second CLASS entity version is that its PK can be referenced easily as an FK in another related entity such as ENROLL. Using a single-attribute PK makes implementation easier. This is especially true when the entity represents the “1” side in one or more relationships. In general, it is advisable to avoid composite PKs whenever it is practical to do so. 117. How is a composite entity represented in an ERD, and what is its function? Illustrate the Crow’s Foot notation. Answer: The label “composite” is based on the fact that the composite entity contains at least the primary key attributes of each of the entities that are connected by it. The composite entity is an important component of the ER model because relational database models should not contain M:N relationships—and the composite entity can be used to break up such relationships into 1:M relationships. Suppose, for example, that you want to design a class enrollment entity to serve as the “bridge” between STUDENT and CLASS in the M:N relationship defined by these two business rules: 

A student can take many classes.



Each class can be taken by many students.

In this case, you could create a (composite) entity named ENROLL to link CLASS and STUDENT, using these structures: STUDENT(STU_NUM, STU_LNAME …) ENROLL(STU_NUM, CLASS_NUM, ENROLL_GRADE …) CLASS(CLASS_CODE, CRS_CODE, CLASS_SECTION, CLASS_TIME, CLASS_PLACE) 118. What three (often conflicting) database requirements must be addressed in database design? Answer: Database design must reconcile the following requirements: a. Design elegance requires that the design must adhere to design rules concerning nulls, derived attributes, redundancies, relationship types, and so on. b. Information requirements are dictated by the end users. c. Operational (transaction) speed requirements are also dictated by the end users. Clearly, an elegant database design that fails to address end-user information requirements or one that forms the basis for an implementation whose use progresses at a snail’s pace has little practical use. 119. Briefly, but precisely, explain the difference between single-valued attributes and simple attributes. Give an example of each. Answer: A single-valued attribute is one that can have only one value. For example, a person has only one first name and only one social security number. A simple attribute is one that cannot be decomposed into its component pieces. For example, a person’s sex is classified as either M or F and there is no reasonable way to decompose M or F. Similarly, a person’s first name cannot be decomposed into meaningful components. (In contrast, if a phone number includes the area code, it can be decomposed into the area code and the phone number. And a person’s name may be decomposed into a first name, an initial, and a last name.) Single-valued attributes are not necessarily simple. For example, an inventory code HWPRIJ23145 may refer to a classification scheme in which HW indicates Hardware, PR indicates Printer, IJ indicates Inkjet, and 23145 indicates an inventory control number. Therefore, HWPRIJ23145 may be decomposed into its component parts, even though it is single-valued. To facilitate product tracking, manufacturing serial codes must be singlevalued, but they may not be simple. For instance, the product serial number TNP5S2M231109154321 might be decomposed this way: TN = state = Tennessee P5 = plant number 5 S2 = shift 2 M23 = machine 23 11 = month, i.e., November 09 = day

154321 = time on a 24-hour clock, i.e., 15:43:21, or 3:43 p.m. plus 21 seconds. 120. What are multivalued attributes, and how can they be handled within the database design? Answer: The answer to Question 3 is just as valid as an answer to this question. You can augment that discussion with the following discussion: As the name implies, multivalued attributes may have many values. For example, a person’s education may include a high school diploma, a two-year college associate degree, a fouryear college degree, a Master’s degree, a Doctoral degree, and various professional certifications such as a Certified Public Accounting certificate or a Certified Data Processing Certificate. There are basically two ways to handle multivalued attributes—three if you count ignoring the fact that it is multivalued, and two of those three ways are bad: 1. If we ignore that the attribute is multivalued, then the educational attainments may be kept as a single, variable-length string or character field. This solution is undesirable because it becomes difficult to query the table. For example, even a simple question such as “how many employees have four-year college degrees?” requires string partitioning that is time-consuming at best. Of course, if there is no need to ever group employees by education, the variable-length string might be acceptable from a design point of view. However, as database designers we know that, sooner or later, information requirements are likely to grow, so the string storage is probably a bad idea from that perspective, too. 2. Each of the possible outcomes is kept as a separate attribute within the table. This solution is undesirable for several reasons. First, the table would generate many nulls for those who had minimal educational attainments. Using the preceding example, a person with only a high school diploma would generate nulls for the two-year college associate degree, the four-year college degree, the Master’s degree, the Doctoral degree, and for each of the professional certifications. In addition, how many professional certification attributes should be maintained? If you store two professional certification attributes, you will generate a null for someone with only one professional certification and you’d generate two nulls for all persons without professional certifications. And suppose you have a person with five professional certifications? Would you create additional attributes, thus creating many more nulls in the table, or would you simply ignore the additional professional certifications, thereby losing information? 3. Finally, the most flexible way to deal with multivalued attributes is to create a composite entity that links employees to education. By using the composite entity, there will never be a situation in which additional attributes must be created within the EMPLOYEE table to accommodate people with multiple certifications. In short, we eliminate the generation of nulls. In addition, we gain information flexibility because we can also store the details (date earned, place earned, etc.) for each of the educational attainments. The (simplified) structures might look like those in Figures Q4.16a and Q4.16b.

FIGURE Q4.16a The Ch04_Questions Database Tables

FIGURE Q4.16b The Ch04_Questions Relational Diagram

By looking at the structures shown in Figures Q4.16a and Q4.16b, we can tell that the employee named Romero earned a Bachelor’s degree in 1989, a Certified Network Professional certification in 2002, and a Certified Data Processing certification in 2004. If Randall were to earn a Master’s degree and a Certified Public Accountant certification later, we merely add another two records in the EMP_EDUC table. If additional educational attainments beyond those listed in the EDUCATION table are earned by any employee, all we need to do is add the appropriate record(s) to the EDUCATION table, then enter the employee’s attainments in the EMP_EDUC table. There are no nulls, we have superb query capability, and we have flexibility. Not a bad set of design goals! The database design on which Figures Q4.16a and Q4.16b are based is shown in Figure Q4.16c.

Figure Q4.16c The Crow’s Foot ERD for the Ch04_Questions Database

NOTE Discuss with the students that the design in Figure Q4.16c shows that an employee must meet at least one educational requirement, because EMP_EDUC is not optional to EMPLOYEE. Thus each employee must appear at least once in the EMP_EDUC table. And, given this design, some of the educational attainments may not yet be earned by employees, because the design shows EMP_EDUC to be optional to EDUCATION. In other words, some of the EDUCATION records are not necessarily referenced by any employee. (In the original M:N relationship between EMPLOYEE and EDUCATION, EMPLOYEE must have been optional to EDUCATION.) Questions 17–20 are based on the ERD in Figure Q4.17.

Figure Q4.17 The ERD for Questions 17−20

121. Write the 10 cardinalities that are appropriate for this ERD. Answer: The cardinalities are indicated in Figure Q4.17sol.

FIGURE Q4.17sol The Cardinalities

122. Write the business rules reflected in this ERD. Answer: The following business rules are reflected in the ERD: 

A store may place many orders. (Note the use of “may”—which is reflected in the ORDER optionality.)



An order must be placed by a store. (Note that STORE is mandatory to ORDER. In this ERD, the order environment apparently reflects a wholesale environment.)



An order contains at least one order line. (Note that ORDER_LINE is mandatory to ORDER, and vice versa.)



Each order line is contained in one and only one order. (Discussion: Although a given item—such as a hammer—may be found in many orders, a specific hammer sold to a specific store is found in only one order.)



Each order line has a specific product written in it.



A product may be written in many orders. (Discussion: Many stores can order one or more specific products, but a product that is not in demand may never be sold to a store and will, therefore, not show up in any order line—note that ORDER_LINE is optional to PRODUCT. Also, note that each order line may indicate more than one of a specific item. For example, the item may be “hammer” and the number sold may be 1 or 2, or 500. The ORDER_LINE entity would have at least the following attributes: ORDER_NUM, ORDLINE_NUM, PROD_CODE, ORDLINE_PRICE, ORDLINE_QUANTITY. The ORDER_LINE composite PK would be ORDER_NUM + ORDLINE_NUM. You might add the derived attribute ORDLINE_AMOUNT, which would be the result of multiplying ORDLINE_PRICE and ORDLINE_QUANTITY.)



A store may employ many employees. (Discussion: A new store may not yet have any employees, yet the database may already include the new store information … location, type, and so on. If you made the EMPLOYEE entity mandatory to STORE, you would have to create an employee for that store before you had even hired one.)



Each employee is employed by one (and only one) store.



An employee may have one or more dependents. (Discussion: You cannot require an employee to have dependents, so DEPENDENT is optional to EMPLOYEE. Note the use of the word “may” in the relationship.)



A dependent must be related to an employee. (Discussion: It makes no sense to keep track of dependents of people who are not even employees. Therefore, EMPLOYEE is mandatory to DEPENDENT.)

123. What two attributes must be contained in the composite entity between STORE and PRODUCT? Use proper terminology in your answer. Answer: As modeled in the figure, ORDER_LINE is the only composite entity between STORE and PRODUCT. The composite entity must at least include the primary keys of the entities it references. The combination of these attributes may be designated to be the composite entity’s (composite) primary key. Each of the (composite) primary key’s attributes is a foreign key that references the entities for which the composite entity serves as a bridge. As you discuss the model in Figure Q4.17sol, note that an order is represented by two entities, ORDER and ORDER_LINE. Note also that the STORE’s 1:M relationship with ORDER and the ORDER’s 1:M relationship with ORDER_LINE reflect the conceptual M:N relationship between STORE and PRODUCT. The original business rules probably read: 

A store can order many products.



A product can be ordered by many stores.

124. Describe precisely the composition of the DEPENDENT weak entity’s primary key. Use proper terminology in your answer. Answer: If DEPENDENT is considered a weak entity, as the question states, then it will have a composite PK that includes the EMPLOYEE entity’s PK and one of its attributes. For example, if the EMPLOYEE entity’s PK is EMP_NUM, the DEPENDENT entity’s PK might be EMP_NUM + DEP_NUM. Note that modeling DEPENDENT as a weak entity is not required, as is shown by the use of a strong relationship between EMPLOYEE and DEPENDENT. In such a case, the PK of DEPENDENT could be a single attribute such as DEP_NUM alone, depending on the domain of the attribute. 125. The local city youth league needs a database system to help track children who sign up to play soccer. Data need to be kept on each team, the children who will play on each team, and their parents. Also, data need to be kept on the coaches for each team. Answer: Draw a data model with the entities and attributes described here. Entities required: Team, Player, Coach, and Parent Attributes required: Team: Team ID number, Team name, and Team colors Player: Player ID number, Player first name, Player last name, and Player age Coach: Coach ID number, Coach first name, Coach last name, and Coach home phone number Parent: Parent ID number, Parent last name, Parent first name, Home phone number, and Home address (Street, City, State, and Zip code) The following relationships must be defined: 

Team is related to Player.



Team is related to Coach.



Player is related to Parent.

Connectivities and participations are defined as follows: 

A Team may or may not have a Player.



A Player must have a Team.



A Team may have many Players.



A Player has only one Team.



A Team may or may not have a Coach.



A Coach must have a Team.



A Team may have many Coaches.



A Coach has only one Team.



A Player must have a Parent.



A Parent must have a Player.



A Player may have many Parents.



A Parent may have many Players.

This is a great exercise in that it opens up possibilities for several discussion points. The conceptual ERD prior to placement of foreign keys and the resolution of the M:N relationship is shown in Figure Q4.21a.

FIGURE Q4.21a Conceptual ERD for Question 21

The most apparent issue that must be resolved is the M:N relationship. This is necessary so that foreign keys can be appropriately placed throughout the data model. The revised ERD with properly placed foreign keys is shown in Figure Q4.21b.

FIGURE Q4.21b ERD with Foreign Keys for Question 21

This solution, however, still leaves an interesting question about the Team_Colors attribute. What if teams have more than one color as is implied by the plural “colors” being used by the business users? Let’s consider three options: (1) leave it as is (as if Team_Colors is a singlevalued attribute), (2) create multiple attributes within the TEAM entity, or (3) create a new COLOR table. Team_Colors may be left as a single attribute if it is determined through discussion with the business users that they are not concerned with dealing with the different colors individually. For example, they will never be interested to know how many teams have the color Blue as one of their team colors, then we may choose to implement the design as given above. However, if the users are interested, or foresee the possibility that at some time in the future they may become interested, in addressing the different colors for a given team individually, then we must modify the above design to accommodate this need. If we determine that all teams have the same number of colors, and no team now or in the future will ever have more than that number of colors, then we may modify the design by adding additional attributes in the TEAM entity. For example, if all teams, now and forever, will always have exactly two team colors then we may produce the design shown in Figure Q4.21c.

FIGURE Q4.21c ERD with Two Team Colors for Question 21

This is a reasonable solution given the assurance that all teams now and forever will have exactly two team colors. A problem arises, however, if we cannot rely on that assurance. If some teams have fewer colors, then our design will lead to an increased number of nulls. If a team ever has more than two colors, we will have to modify the structure of the database after it has been built to add another team color attribute. This change in structure may require changes in the front-end applications so that they can properly address this new attribute. To avoid these potentially serious modifications in the future, we can redesign the database with a more robust structure that can handle any number of team colors without future modifications to the database or the front-end applications. The design with a separate table to handle the multivalued Team_Colors attribute is shown in Figure Q4.21d.

FIGURE Q4.21d ERD with Color Table for Question 21

ANSWERS TO PROBLEMS Use the following business rules to create a Crow’s Foot ERD. Write all appropriate connectivities and cardinalities in the ERD. Answer: 

A department employs many employees, but each employee is employed by only one department.



Some employees, known as “rovers,” are not assigned to any department.



A division operates many departments, but each department is operated by only one division.



An employee may be assigned many projects, and a project may have many employees assigned to it.



A project must have at least one employee assigned to it.



One of the employees manages each department, and each department is managed by only one employee.



One of the employees runs each division, and each division is run by only one employee.

The answers to Problem 1 (all parts) are included in Figure P4.1.

FIGURE P4.1 Problem 1 ERD Solution

As you discuss the ERD shown in Figure P4.1, note that this design reflects several useful features that become especially important when the design is implemented. For example:



The ASSIGN entity is shown to be optional to the PROJECT. This decision makes sense from a practical perspective, because it lets you create a new project record without having to create a new assignment record. (If a new project is started, there will not yet be any assignments.)



The relationship expressed by “DEPARTMENT employs EMPLOYEE” is shown as mandatory on the EMPLOYEE side. This means that a DEPARTMENT must have at least one EMPLOYEE in order to have departmental status. However, DEPARTMENT is optional to EMPLOYEE, so an employee can be entered without entering a departmental FK value. If the existence of nulls is not acceptable, you can create a “No assignment” record in the DEPARTMENT table, to be referenced in the EMPLOYEE table if an employee is not assigned to a department.



Note also the implications of the 1:1 “EMPLOYEE manages DEPARTMENT” relationship. The flip side of this relationship is that “each DEPARTMENT is managed by one EMPLOYEE.” (This latter relationship is shown as mandatory in the ERD. That is, each department must be managed by an employee!) Therefore, one of the EMPLOYEE table’s PK values must appear as the FK value in the DEPARTMENT table. (Because this is a 1:1 relationship, the index property of the EMP_NUM FK in the DEPARTMENT table must be set to “unique.”)



Although you ought to approach a 1:1 relationship with caution—most 1:1 relationships are the result of a misidentification of attributes as entities—the 1:1 relationships reflected in the “EMPLOYEE manages DEPARTMENT” and “EMPLOYEE runs DIVISION” are appropriate. These 1:1 relationships avoid the data redundancies you would encounter if you duplicated employee data—such as names, phones, and e-mail addresses—in the DIVISION and DEPARTMENT entities.

Also, if you have multiple relationships between two entities—such as the “EMPLOYEE manages DEPARTMENT” and “DEPARTMENT employs EMPLOYEE” relationships—you must make sure that each relationship has a designated primary entity. For example, the 1:1 relationship expressed by “EMPLOYEE manages DEPARTMENT” requires that the EMPOYEE entity be designated as the primary (or “first”) entity. If you use Visio to create your Crow’s Foot ERDs, Figure P4.3 shows how the 1:1 relationship is specified. If you use some other CASE tool, you will discover that it, too, is likely to require similar relationship specifications. 126. Create a complete ERD in Crow’s Foot notation that can be implemented in the relational model using the following description of operations. Hot Water (HW) is a small start-up company that sells spas. HW does not carry any stock. A few spas are set up in a simple warehouse so customers can see some of the models available, but any products sold must be ordered at the time of the sale. Answer: 

HW can get spas from several different manufacturers.



Each manufacturer produces one or more different brands of spas.



Each and every brand is produced by only one manufacturer.



Every brand has one or more models.



Every model is produced as part of a brand. For example, Iguana Bay Spas is a manufacturer that produces Big Blue Iguana spas, a premium-level brand, and Lazy Lizard spas, an entry-level brand. The Big Blue Iguana brand offers several models, including the BBI-6, an 81-jet spa with two 6-hp motors, and the BBI-10, a 102-jet spa with three 6-hp motors.



Every manufacturer is identified by a manufacturer code. The company name, address, area code, phone number, and account number are kept in the system for every manufacturer.



For each brand, the brand name and brand level (premium, mid-level, or entry-level) are kept in the system.



For each model, the model number, number of jets, number of motors, number of horsepower per motor, suggested retail price, HW retail price, dry weight, water capacity, and seating capacity must be kept in the system.

FIGURE P4.2 Problem 2 ERD Solution

127. The Jonesburgh County Basketball Conference (JCBC) is an amateur basketball association. Each city in the county has one team as its representative. Each team has a maximum of 12 players and a minimum of 9 players. Each team also has up to 3 coaches (offensive, defensive, and physical training coaches). During the season, each team plays 2 games (home and visitor) against each of the other teams. Given those conditions, do the following: 

Identify the connectivity of each relationship.



Identify the type of dependency that exists between CITY and TEAM.



Identify the cardinality between teams and players and between teams and city.



Identify the dependency between COACH and TEAM and between TEAM and PLAYER.



Draw the Chen and Crow’s Foot ERDs to represent the JCBC database.



Draw the UML class diagram to depict the JCBC database.

The Chen ERD solution is shown in Figure P4.3Chen. (The Crow’s Foot solution is shown after the discussion.)

FIGURE P4.3 Chen The JCBC Chen ERD M

M GAME

(1,1)

sponsors

CITY (1,1)

(2,N)

(1,1)

1 has

TEAM (1,1)

(1,3)

(9,12)

PLAYER (1,1)

is coached by

(1,1)

COACH

To help the students understand the ER diagram’s components better, note the following relationships: 

The main components are TEAM and GAME.



Each team plays each other team at least twice.



To play a game, two teams are necessary: the home team and the visiting team.



Each team plays at least twice: once as the home team and once as the visiting team.

Given these relationships, it becomes clear that TEAM participates in a recursive M:N relationship with GAME. The relationship between TEAM and GAME becomes clearer if we list some attributes for each of these entities. Note that the TEAM_NUM appears twice in a GAME record: once as a GAME_HOME_TEAM and once as a GAME_VISIT_TEAM.

Implementation of this solution yields the relational diagram shown in Figure P4.3RD. (If you implement this design in Microsoft Access, note that Access will generate a virtual table named TEAM_1 to indicate that two relationships exist between GAME and TEAM. We created a database named Ch04_JCBC_V1 to illustrate this design implementation.

100

FIGURE P4.3RD The JCBC Relational Diagram, Version 1

The solution shown in Figure P4.3Chen yields a database that enables its users to track all games. For example, a simple query—based on the two relationships between TEAM and GAME yields the output shown in Figure P4.3SO. (We have created only a few records to show the results for games 1 and 2 played by teams named Bears, Rattlers, Sharks, and Tigers, respectively.)

FIGURE P4.3SO The JCBC Database Game Summary Output, Version 1

As you examine the design and its implementation—check the relational diagram in Figure P4.3RD—note that this solution uses synonyms, because the TEAM_NUM shows up in GAME twice: once as the GAME_HOME_TEAM and once as the GAME_VISIT_TEAM. Given the use of these synonyms, the GAME entity also becomes very cumbersome structurally as you decide to track more game data. For example, if you wanted to keep track of runs, hits, and errors, you would have to have one set of each for each of the two teams—all in the same record. Clearly, such a structure is undesirable: the use of synonyms requires the addition of two new attributes— one for the home team and one for the visiting team—for each additional characteristic you want to track. To eliminate the structural problem discussed in the previous paragraph, you can let each game be represented by two entities: GAME and GAME_LINE. Figure P4.3RD2 shows the structures of these two entities in a segment of the revised relational diagram. We have added a LOCATION

101

entity to specify the actual location of the game—knowing that a game is played in Nashville is not sufficiently specific. Players, coaches, and spectators ought to know where in Nashville the game is played.

FIGURE P4.3 RD2 The Revised JCBC Database Relational Diagram

102

NOTE Quite aside from the fact that we ought to know where in each city any given game is played, the LOC_ID attribute in GAME refers to a LOCATION entity that was created to make the database more flexible by permitting the use of multiple locations in each city. Although this capability was not required by the problem description—each city only fields one team at this point—it is very likely that additional teams will be organized in the future. Good design first ensures that current requirements are met. This design does that. But good design also anticipates the reasonably expected changing dynamics of the database environment. This revised design does that, too. Additional flexibility is gained by the use of the GAME entity. For example, if you want to track the assignment of referees in each of the games, you can easily create a REFEREE entity in a M:N relationship with the GAME entity. (A referee may referee many games and many referees referee each game.) This M:N relationship may then be transformed into two 1:M relationships through the use of a composite entity, perhaps named REF_GAME. Finally, point out to the students that the relationship between the newly created GAME and GAME_LINE entities is structurally similar to the by now familiar relationship between INVOICE and INV_LINE entities. The completed database design is implemented as shown in the Crow’s Foot ERD in Figure P4.3CF and in a UML class diagram in Figure P4.UML.

103

FIGURE P4.3CF The JCBC Crow’s Foot ERD

104

FIGURE P4.3UML The JCBC UML Class Diagram

105

NOTE You may wonder why we examined this solution in such detail. (The sample implementation is shown in the database named Ch04_JCBC_Version2.) After all, mere games hardly seem to merit this level of database design attention. Actually, there is the proverbial method in the madness. The basketball—or any other game—environment is likely to be familiar to your students. Therefore, it becomes easier for you to show the design and implementation of recursive relationships—which are actually rather complex things. Fortunately, even complex design issues become manageable in a familiar data environment. Recursive relationships are common enough—or should be—to merit attention and the development of expertise in their implementation. In many manufacturing industries, incredibly detailed part tracking is mandatory. For example, the implementation of the recursive relationship “PART contains PART” is especially desirable in the aviation manufacturing businesses. Such businesses are required by federal law to maintain absolute parts tracing records. If a complex part fails, it must be possible to follow all the trails to all the component parts that may have been involved in the part’s failure. 128. Create an ERD based on the Crow’s Foot notation using the following requirements: Answer: 

An INVOICE is written by a SALESREP. Each sales representative can write many invoices, but each invoice is written by a single sales representative.



The INVOICE is written for a single CUSTOMER. However, each customer can have many invoices.



An INVOICE can include many detail lines (LINE), each of which describes one product bought by the customer.



The product information is stored in a PRODUCT entity.



The product’s vendor information is found in a VENDOR entity.

NOTE The ERD must reflect business rules that you are free to define (within reason). Make sure that your ERD reflects the conditions you require. Finally, make sure that you include the attributes that would permit the model to be successfully implemented. A Crow’s Foot ERD solution is shown in Figure P4.4a, however, this problem allows students the creativity to determine potential attributes for themselves, which can lead to an infinite variety of solutions based on the attributes they speculated.

106

FIGURE P4.4a The Crow’s Foot ERD Solution for Problem 4

107

NOTE Keep in mind that the preceding ER diagram reflects a set of business rules that may easily be modified. For example, if customers are supplied via a commercial customer list, many of the customers on that list will not (yet!) have bought anything, so INVOICE would be optional to CUSTOMER. We are assuming here that many vendors can supply a product and that each vendor can supply many products. The PRODUCT may be optional to VENDOR if the vendor list includes potential vendors from which you have not (yet) ordered anything. Some products may never sell, so LINE is optional to PRODUCT... because an unsold product will never appear in an invoice line. You may also want to show the students how the composite entities may be represented at the final implementation level. For example, LINE is shown as weak to INVOICE, because it borrows the invoice number as part of its primary key and it is existencedependent on INVOICE. The modified ER diagram is shown next in Figure P4.4b. The point of this exercise is that the design’s final iteration depends on the exact nature of the business rules and the desired level of implementation detail.

108

FIGURE P4.4b The Modified Crow’s Foot ERD Solution for Problem 4

129. The Hudson Engineering Group (HEG) has contacted you to create a conceptual model whose application will meet the expected database requirements for the company’s training program. The HEG administrator gives you the following description of the training group’s operating environment. (Hint: Some of the following sentences identify the volume of data rather than cardinalities. Can you tell which ones?) Answer: The HEG has 12 instructors and can handle up to 30 trainees per class. HEG offers five Advanced Technology courses, each of which may generate several classes. If a class has fewer than 10 trainees, it will be canceled. Therefore, it is possible for a course not to generate any classes. Each class is taught by one instructor. Each instructor may teach up to two classes or may be assigned to do research only. Each trainee may take up to two classes per year.

109

Given that information, do the following: a. Define all of the entities and relationships. (Use Table 4.4 as your guide.) The HEG entities and relationships are shown in Table P4.5a.

Table P4.5a The Components of the HEG ERD ENTITY

RELATIONSHIP

CONNECTIVITY

ENTITY

INSTRUCTOR

teaches

1:M

CLASS

COURSE

generates

1:M

CLASS

is listed in

1:M

ENROLL

TRAINEE

is written in

1:M

ENROLL

As you examine the summary in Table P4.5a, it is reasonable to assume that many of the relationships are optional and that some are mandatory. (Remember a point we made earlier: when in doubt, assume an optional relationship.) 

A COURSE does not necessarily generate a class during each training period. (Some courses may be taught every other period or during some other specified time frames. Therefore, it is reasonable to assume that CLASS is optional to COURSE.



Each CLASS must be related to a COURSE. (The class must cover designated course material!) Therefore, COURSE is mandatory to CLASS.



Some instructors may teach a class every other period or even rarely. Therefore, it is reasonable to assume that CLASS is optional to INSTRUCTOR during any enrollment period. This optionality makes sense from an implementation point of view, too. For example, if you appoint a new instructor, that instructor will not—yet— have taught a class.



Not all trainees are likely to be enrolled in classes during some time period. In fact, in a real-world setting, many trainees are likely to get informal “on the job” training without going to formal classes. Therefore, it is reasonable to assume that ENROLL is optional to TRAINEE.

You cannot create an enrollment record without having a trainee. Therefore, TRAINEE is mandatory to ENROLL. (Discussion point: What about making TRAINEE optional to ENROLL? In any case, optional relationships may be used for operational reasons, whether or not they are directly derived from a business rule.) Note that a real-world database design requires the explicit recognition of each relationship’s characteristics. When in doubt, ask the end users! b. Describe the relationship between instructor and class in terms of connectivity, cardinality, and existence dependence. Both Questions (a) and (b) have been addressed in the ER diagram shown in Figure P4.5b.

110

FIGURE P4.5b The HEG ERD

As you discuss Figure P4.5b, keep the discussion in part (a) in mind. Also, note the following points: 

A trainee can take more than one class, and each class contains many (10 or more) trainees, so there is a M:N relationship between TRAINEE and CLASS. (Therefore, a composite entity is used to serve as the bridge between TRAINEE and CLASS.)



A class is taught by only one instructor, but an instructor can teach up to two classes. Therefore, there is a 1:M relationship between INSTRUCTOR and CLASS.



Finally, a COURSE may generate more than one CLASS, while each CLASS is based on one COURSE, so there is a 1:M relationship between COURSE and CLASS.

These relationships are all reflected in the ER diagram shown in Figure P4.5b. Note the optional and mandatory relationships: 

To exist, a CLASS must have TRAINEEs enrolled in it, but TRAINEEs do not necessarily take CLASSes. (Some may take “on the job training.”)



An INSTRUCTOR may not be teaching any CLASSes during some enrollment periods. For example, an instructor may be assigned to duties other than training. However, each CLASS must have an INSTRUCTOR.



If an insufficient number of people sign up for a CLASS, a COURSE may not generate any CLASSes, but each CLASS must represent a COURSE.

111

NOTE The sentences “HEG has twelve instructors.” and “HEG offers five advanced technology courses.” are not reflected in the ER diagram. Instead, they represent additional information concerning the volume of data (number of entities in an entity set), rather than information concerning entity relationships. Because the HEG description leaves room for different interpretations of optional vs. mandatory relationships, we like to give the student the benefit of the doubt. Therefore, unless the question or problem description is sufficiently precise to leave no doubt about the existence of optional/mandatory relationships, we base the student grade on two criteria: 1. Was the basic nature of the relationship—1:1, 1:M, or M:N—selected and displayed properly? 2. Given the student’s rendering of such a relationship, are the cardinalities appropriate? You can add substantial detail to the ERD by including sample attributes for each of the entities. Using a data modeling tool, you can also let your student declare the nature—weak or strong—of the relationships among the entities. Finally, remind your students that the order in which the attributes appear in each entity is immaterial. Therefore, the (composite) PK of the ENROLL entity can be written as either CLASS_CODE + TRN_NUM or as TRN_NUM + CLASS_CODE. That’s why it is also immaterial which one of the foreign key attributes is FK1 or FK2. As you discuss the ERD shown in Figure P4.5b, note that the basic components of this problem are found in the text’s Figure 4.32. Note also that the ENROLL entity in Figure P4.5b uses a composite PK (TRN_NUM + CLASS_CODE) and that, therefore the relationships between ENROLL and CLASS and TRAINEE are strong. Finally, discuss the reason for the weak relationship between COURSE and CLASS—the CLASS entity’s PK (CLASS_CODE) does not “borrow” the PK of the parent COURSE entity. If the CLASS entity’s PK had been composed of CRS_CODE + CLASS_SECTION, the relationship between COURSE and CLASS would have been strong. Discussion: Review the text to show the two possible relationship strengths between COURSE and CLASS. Emphasize that the choice of the PK component(s) is usually a designer option, but that single-attribute PKs tend to yield more design options than composite PKs. Even the composite ENROLL entity can be modified to have a single-attribute PK such as ENROLL_NUM. Given that choice, CLASS_CODE + TRN_NUM constitute a candidate key—CLASS_CODE and TRN_NUM continue to serve as foreign keys to CLASS and TRAINEE, respectively. Given the latter scenario, you can create a (unique) composite index to prevent duplicate enrollments.

112

130. Automata, Inc., produces specialty vehicles by contract. The company operates several departments, each of which builds a particular vehicle, such as a limousine, truck, van, or RV. Answer: 

Before a new vehicle is built, the department places an order with the purchasing department to request specific components. Automata’s purchasing department is interested in creating a database to keep track of orders and to accelerate the process of delivering materials.



The order received by the purchasing department may contain several different items. An inventory is maintained so the most frequently requested items are delivered almost immediately. When an order comes in, it is checked to determine whether the requested item is in inventory. If an item is not in inventory, it must be ordered from a supplier. Each item may have several suppliers.

Given that functional description of the processes at Automata’s purchasing department, do the following: a. Identify all of the main entities. b. Identify all of the relations and connectivities among entities. c. Identify the type of existence dependence in all the relationships. d. Give at least two examples of the types of reports that can be obtained from the database. The initial Crow’s Foot ERD is shown in Figure P4.6init. The discussion preceding Figure P4.6rev explains why the revision was made.

113

FIGURE P4.6init Initial Automata Crow’s Foot ERD

As you explain the development of the Crow’s Foot ERD shown in Figure P4.6init, several points are worth stressing: 

The ORDER and ORD_LINE entities are perfect reflections of the INVOICE and INV_LINE entities the students have encountered before. This kind of 1:M relationship is quite common in a business environment and you will see it recur throughout the book and in its many problems. Note that the ORD_LINE entity is weak, because it inherits part of its PK from its ORDER “parent” entity. Therefore, the “contains” relationship between ORDER and ORD_LINE is properly shown as an identifying (strong) relationship. (The relationship line is solid, rather than dashed.) Finally, note that ORD_LINE is mandatory to ORDER; it is not possible to have an ORDER that does not contain at least one order line. And, of course, ORDER is mandatory to ORD_LINE, because an ORD_LINE occurrence cannot exist without referencing an ORDER.



The ORDER entity is shown as optional to DEPARTMENT, indicating that it is quite possible that a department has not (yet) placed an order. Aside from the fact that such an optionality makes common sense, it also makes operational sense from a database point of view. For example, if the ORDER entity were mandatory to the DEPARTMENT entity, the creation of a new department would require the creation of an order, so you might have to create a “dummy” order when you create a new department. Also, keep in mind that an order cannot be written by a department that does not (yet) exist.

114



Note also that the VENDOR may not (yet) have received an order, so ORDER is optional to VENDOR. The VENDOR entity may contain vendors who are simply potential suppliers of items and you may want to have such potential vendors available just in case your “usual” vendor(s) run(s) out of items that you need.

The other optionalities should be discussed, too—using the same basic scenarios that were described in bullets 2 and 3.

NOTE In this presentation, the relationship between VENDOR and ITEM is shown as 1:M. Therefore, each vendor can supply many items, but only one vendor can supply each item. If it is possible for items to be supplied by more than one vendor, there is a M:N relationship between VENDOR and ITEM and this relationship would have to be implemented through a composite (bridge) entity. Actually, such an M:N relationship is specified in the brief description of the Automata company’s data environment. Therefore, the following Figure P4.6rev more accurately reflects the problem description.

FIGURE P4.6rev Revised Automata Crow’s Foot ERD

131. United Helpers is a nonprofit organization that provides aid to people after natural disasters. Based on the following brief description of operations, create the appropriate fully labeled Crow’s Foot ERD. Answer:

115



Volunteers carry out the tasks of the organization. The name, address, and telephone number are tracked for each volunteer. Each volunteer may be assigned to several tasks, and some tasks require many volunteers. A volunteer might be in the system without having been assigned a task yet. It is possible to have tasks that no one has been assigned. When a volunteer is assigned to a task, the system should track the start time and end time of that assignment.



Each task has a task code, task description, task type, and task status. For example, there may be a task with task code “101,” a description of “answer the telephone,” a type of “recurring,” and a status of “ongoing.” Another task might have a code of “102,” a description of “prepare 5,000 packages of basic medical supplies,” a type of “packing,” and a status of “open.”



For all tasks of type “packing,” there is a packing list that specifies the contents of the packages. There are many packing lists to produce different packages, such as basic medical packages, child-care packages, and food packages. Each packing list has an ID number, a packing list name, and a packing list description, which describes the items that should make up the package. Every packing task is associated with only one packing list. A packing list may not be associated with any tasks, or it may be associated with many tasks. Tasks that are not packing tasks are not associated with any packing list.



Packing tasks result in the creation of packages. Each individual package of supplies produced by the organization is tracked, and each package is assigned an ID number. The date the package was created and its total weight are recorded. A given package is associated with only one task. Some tasks (such as “answer the phones”) will not produce any packages, while other tasks (such as “prepare 5,000 packages of basic medical supplies”) will be associated with many packages.



The packing list describes the ideal contents of each package, but it is not always possible to include the ideal number of each item. Therefore, the actual items included in each package should be tracked. A package can contain many different items, and a given item can be used in many different packages.



Each item that the organization provides has an item ID number, item description, item value, and item quantity on hand stored in the system. Along with tracking the actual items that are placed in each package, the quantity of each item placed in the package must be tracked as well. For example, a packing list may state that basic medical packages should include 100 bandages, 4 bottles of iodine, and 4 bottles of hydrogen peroxide. However, because of the limited supply of items, a given package may include only 10 bandages, 1 bottle of iodine, and no hydrogen peroxide. The fact that the package includes bandages and iodine needs to be recorded along with the quantity of each item included. It is possible for the organization to have items that have not been included in any package yet, but every package will contain at least one item.

The ERD for United Helpers is shown in Figure P4.7a.

116

FIGURE P4.7a United Helpers ERD

This problem, however, does leave room for interesting discussion with the students regarding the need to verify requirements with the business users. In fact, getting unambiguous business rules can be one of the most difficult parts of the design process. In this problem, the potential for a relationship between the packing list (LIST) and the items (ITEM) stocked by the organization can be a source for discussion. Students may envision that a LIST can specify many ITEMs and an ITEM can be specified in many LISTs. This would imply the need for a M:N relationship between ITEM and LIST. However, the business users may not intend for the packing list to be that specific. For example, the packing list may specify that “2 liter of iodine” should be included in a given type of package without specifying whether it should be two 1-liter bottles of iodine or four 500-ml bottles of iodine. Note that “1-liter bottle of iodine” and “500-ml bottle of iodine” would have to be separate entity instances in ITEM because they have different values. If it is the case that the packing list is intentionally generic in its description of the ideal contents, then a relationship between LIST and ITEM would not be appropriate. 132. Using the Crow’s Foot notation, create an ERD that can be implemented for a medical clinic using the following business rules:

117

Answer: 

A patient can make many appointments with one or more doctors in the clinic, and a doctor can accept appointments with many patients. However, each appointment is made with only one doctor and one patient.



Emergency cases do not require an appointment. However, for appointment management purposes, an emergency is entered in the appointment book as “unscheduled.”



If kept, an appointment yields a visit with the doctor specified in the appointment. The visit yields a diagnosis and, when appropriate, treatment.



With each visit, the patient’s records are updated to provide a medical history.



Each patient visit creates a bill. Each patient visit is billed by one doctor, and each doctor can bill many patients.



Each bill must be paid. However, a bill may be paid in many installments, and a payment may cover more than one bill.



A patient may pay the bill directly, or the bill may be the basis for a claim submitted to an insurance company.



If the bill is paid by an insurance company, the deductible is submitted to the patient for payment.

The ERD solution is shown in Figure P4.8.

118

FIGURE P4.8 The Medical Clinic’s Crow’s Foot ERD

133. Create a Crow’s Foot notation ERD to support the following business operations: Answer: 

A friend of yours has opened Professional Electronics and Repairs (PEAR) to repair smartphones, laptops, tablets, and MP3 players. She wants you to create a database to help her run her business.



When a customer brings a device to PEAR for repair, data must be recorded about the customer, the device, and the repair. The customer’s name, address, and a contact phone number must be recorded (if the customer has used the shop before, the information already in the system for the customer is verified as being current). For the device to be repaired, the type of device, model, and serial number are recorded (or verified if the device is already in the system). Only customers who have brought devices into PEAR for repair will be included in this system.

119



Because a customer might sell an older device to someone else who then brings the device to PEAR for repair, it is possible for a device to be brought in for repair by more than one customer. However, each repair is associated with only one customer. When a customer brings in a device to be fixed, it is referred to as a repair request, or just “repair,” for short. Each repair request is given a reference number, which is recorded in the system along with the date of the request, and a description of the problem(s) that the customer wants fixed. It is possible for a device to be brought to the shop for repair many different times, and only devices that are brought in for repair are recorded in the system. Each repair request is for the repair of one and only one device. If a customer needs multiple devices fixed, then each device will require its own repair request.



There are a limited number of repair services that PEAR can perform. For each repair service, there is a service ID number, description, and charge. “Charge” is how much the customer is charged for the shop to perform the service, including any parts used. The actual repair of a device is the performance of the services necessary to address the problems described by the customer. Completing a repair request may require the performance of many services. Each service can be performed many different times during the repair of different devices, but each service will be performed only once during a given repair request.



All repairs eventually require the performance of at least one service, but which services will be required may not be known at the time the repair request is made. It is possible for services to be available at PEAR but that have never been required in performing any repair.



Some services involve only labor activities and no parts are required, but most services require the replacement of one or more parts. The quantity of each part required in the performance of each service should also be recorded. For each part, the part number, part description, quantity in stock, and cost is recorded in the system. The cost indicated is the amount that PEAR pays for the part. Some parts may be used in more than one service, but each part is required for at least one service.

120

FIGURE P4.9 The PEAR ERD

134. Luxury-Oriented Scenic Tours (LOST) provides guided tours to groups of visitors to the Washington D.C. area. In recent years, LOST has grown quickly and is having difficulty keeping up with all of the various information needs of the company. The company’s operations are as follows: Answer: 

LOST offers many different tours. For each tour, the tour name, approximate length (in hours), and fee charged is needed. Guides are identified by an employee ID, but the system should also record a guide’s name, home address, and date of hire. Guides take a test to be qualified to lead specific tours. It is important to know which guides are qualified to lead which tours and the date that they completed the qualification test for each tour. A guide may be qualified to lead many different tours. A tour can have many different qualified guides. New guides may or may not be qualified to lead any tours, just as a new tour may or may not have any qualified guides.

121



Every tour must be designed to visit at least three locations. For each location, a name, type, and official description are kept. Some locations (such as the White House) are visited by more than one tour, while others (such as Arlington Cemetery) are visited by a single tour. All locations are visited by at least one tour. The order in which the tour visits each location should be tracked as well.



When a tour is actually given, that is referred to as an “outing.” LOST schedules outings well in advance so they can be advertised and so employees can understand their upcoming work schedules. A tour can have many scheduled outings, although newly designed tours may not have any outings scheduled. Each outing is for a single tour and is scheduled for a particular date and time. All outings must be associated with a tour. All tours at LOST are guided tours, so a guide must be assigned to each outing. Each outing has one and only one guide. Guides are occasionally asked to lead an outing of a tour even if they are not officially qualified to lead that tour. Newly hired guides may not have ever been scheduled to lead any outings. Tourists, called “clients” by LOST, pay to join a scheduled outing. For each client, the name and telephone number are recorded. Clients may sign up to join many different outings, and each outing can have many clients. Information is kept only on clients who have signed up for at least one outing, although newly scheduled outings may not have any clients signed up yet.

a. Create a Crow’s Foot notation ERD to support LOST operations.

122

FIGURE P4.10a The first LOST ERD

b. The operations provided state that it is possible for a guide to lead an outing of a tour even if the guide is not officially qualified to lead outings of that tour. Imagine that the business rules instead specified that a guide (a) is never, under any circumstance, allowed to lead an outing unless he or she is qualified to lead outings of that tour. How could the data model in Part a. be modified to enforce this new constraint?

123

FIGURE P4.10b The second LOST ERD 135. Beverage Buddy (BB) is a diabetes-friendly mobile app to track and share beverage information with friends. BB tracks data about teas, coffees, and other drinks to help individuals with diabetes manage their blood sugar levels. Create a Crow’s Foot notation ERD to support the core operations of the BB app as follows:

124

Answer: 

The app will track beverages by many different brewers. For each beverage, the name of the beverage and the type of beverage are stored. The type of beverage can be “Tea,” “Coffee,” “Cider,” or “Other” at this time, but new types may be added later. Each beverage is provided by a single brewer. A “brewer” is a company that provides beverages. Brewers must be added to the system by BB staff. (It is not part of the app that you are helping with, but brewers must sign a contract with the BB parent company; therefore, users cannot add brewers or beverages.) Each brewer is assigned a number by the system that is stored along with the company name, address, and date that they were first added to the BB system. If a brewer provides alcoholic beverages, then the brewer’s license number is also kept in the system.



Most brewers provide a large number of beverages to the system that users can see. Brewers do not typically provide their menu of beverages to be added to BB until after the contract issues are settled, so it is possible for a brewer to appear in the system before any of their beverages have been added. It is not possible to enter a beverage without specifying which brewer provides that beverage.



BB also tracks data on the venues that sell the beverages. Most beverages are available from a wide range of venues. A venue may be any type of bar or restaurant. (Just like brewers, venues also have to contract with the BB parent company to appear in the system, but this is outside the app that you are helping with.) Each venue has a name and address. Venues can also specify a “preference,” which is a means of identifying themselves as primarily a coffee shop, tea house, or bar. The preference does not limit which beverages are sold at that venue but allows users to easily specify that they are searching for coffee preference venues or tea preference venues. A venue will normally provide many different beverages. Again, due to delays in the entering of data related to venues and beverages outside the BB app, it is possible for a venue to be entered in the system before specifying which beverages it carries. It is also possible to enter beverages in the system before specifying which venues carry that beverage.



Users of Beverage Buddy must register before using the app. Registration requires providing the user’s name (first name and last name), an email address, and date of birth. Users can change or update any of this information later without having to reregister. Users can view all the beverages in the system as well as search for beverages from individual brewers. Beverages can be searched by name, type, color, grams of sugar, total carbohydrates, and sweetener (if any) used in the beverage. The system you are helping with does not keep a record of which beverages are viewed or the searches performed.



If a user tries a beverage that is listed in the BB app, they can add it to their “drink list.” A drink list is simply the list of all the tracked beverages that the user has ever tried. When a user adds a beverage to their drink list, the date the beverage is added is also recorded. Users can mark beverages on their drink list as a “favorite” if they want.



Users can connect with each other through the BB app by adding each other as friends. When a user requests to friend another user, the friendship is marked as “requested” in the system. When the other user accepts the request, the friendship is marked as “confirmed” in the system. When users become friends in the app, the date of the friendship is recorded. Friends in the app can see each other’s drink lists and favorites. Users can “friend” as many other users of the BB app as they wish, but users are not required to friend anyone.

125



Venues can occasionally sponsor events. Venues are not required to sponsor any events, but some venues sponsor many events each year. The events are tracked in BB. Each event has a name, start date, and end date. Some events have an admission fee associated with them, but some do not. Only events sponsored by venues appear in the BB app. Each event is sponsored by a single venue. Users can see upcoming events within the app. If the user plans to attend the event, they can sign up for the event through the app. The BB app does not handle payments so if the event has an admission fee, payments for the admission fee are not done or tracked within the app. If a user signs up for an event, the date that they sign up is recorded. Users do not always attend the events that they sign up for. If a user attends the event, then they can “check in” at the event when they get there. Checking in at the event is simply indicating in the app that they actually attended the event. A user can, and hopefully will, sign up and attend many different events. An event will hopefully be attended by dozens of users. The event needs to be able to be entered in the system before the users can sign up for it. Some users have never signed up for, nor attended, any events.



For example, Aziz installs the Beverage Buddy app on his phone and registers as a user. He goes to a tea house named “Tropical Teas” after work one day. While there, he looks on BB for a black tea sold at this tea house that has fewer than 3 grams of sugar and fewer than 5 total carbohydrates. Looking through the results, he decides to try a beverage named “Cabo Crisp” that is brewed by “World Tea Market” (not to be confused with the “Cabo Crisp” that is a cider brewed by “Greenhouse Brewers”). After he orders with the waiter and the tea is brought to him, Aziz adds Cabo Crisp to his drink list in BB, and marks it as a favorite. While he is drinking his tea, he looks for his friend Kayla on the system by her email address and sends a friend request. Almost immediately, Kayla accepts his request and they are now friends in BB. Looking at Kayla’s drink list, he sees that she has also tried Cabo Crisp and marked it as one of her favorites. He finds that Kayla has tried over 50 different drinks and notes that Kayla also marked the coffee drink named, “Butter Blend,” as one of her favorites. Aziz finds that Butter Blend is not available at “Tropical Teas,” but it is available at “GrindHows” near his work. He can see that GrindHows is sponsoring a free book reading event next Tuesday from 4 pm until 7 pm with 50% off all coffees. Aziz signs up to attend the book reading event.



To help protect user privacy, BB does not store data about any searches that users make.

126

FIGURE P4.11 The Beverage Buddy ERD

127

CASE SOLUTIONS 136. The administrators of Tiny College are so pleased with your design and implementation of their student registration and tracking system that they want you to expand the design to include the database for their motor vehicle pool. A brief description of operations follows: Answer: 

Faculty members may use the vehicles owned by Tiny College for officially sanctioned travel. For example, the vehicles may be used by faculty members to travel to off-campus learning centers, to travel to locations at which research papers are presented, to transport students to officially sanctioned locations, and to travel for public service purposes. The vehicles used for such purposes are managed by Tiny College’s Travel Far But Slowly (TFBS) Center.



Using reservation forms, each department can reserve vehicles for its faculty, who are responsible for filling out the appropriate trip completion form at the end of a trip. The reservation form includes the expected departure date, vehicle type required, destination, and name of the authorized faculty member. The faculty member who picks up a vehicle must sign a checkout form to log out the vehicle and pick up a trip completion form. (The TFBS employee who releases the vehicle for use also signs the checkout form.) The faculty member’s trip completion form includes the faculty member’s identification code, the vehicle’s identification, the odometer readings at the start and end of the trip, maintenance complaints (if any), gallons of fuel purchased (if any), and the Tiny College credit card number used to pay for the fuel. If fuel is purchased, the credit card receipt must be stapled to the trip completion form. Upon receipt of the trip completion form, the faculty member’s department is billed at a mileage rate based on the vehicle type used: sedan, station wagon, panel truck, minivan, or minibus. (Hint: Do not use more entities than are necessary. Remember the difference between attributes and entities!)



All vehicle maintenance is performed by TFBS. Each time a vehicle requires maintenance, a maintenance log entry is completed on a prenumbered maintenance log form. The maintenance log form includes the vehicle identification, brief description of the type of maintenance required, initial log entry date, date the maintenance was completed, and name of the mechanic who released the vehicle back into service. (Only mechanics who have an inspection authorization may release a vehicle back into service.)



As soon as the log form has been initiated, the log form’s number is transferred to a maintenance detail form; the log form’s number is also forwarded to the parts department manager, who fills out a parts usage form on which the maintenance log number is recorded. The maintenance detail form contains separate lines for each maintenance item performed, for the parts used, and for identification of the mechanic who performed the maintenance item. When all maintenance items have been completed, the maintenance detail form is stapled to the maintenance log form, the maintenance log form’s completion date is filled out, and the mechanic who releases the vehicle back into service signs the form. The stapled forms are then filed, to be used later as the source for various maintenance reports.



TFBS maintains a parts inventory, including oil, oil filters, air filters, and belts of various types. The parts inventory is checked daily to monitor parts usage and to reorder parts that reach the “minimum quantity on hand” level. To track parts usage, the parts manager requires each mechanic to sign out the parts that are used to perform each vehicle’s maintenance; the parts manager records the maintenance log number under which the part is used.

128



Each month TFBS issues a set of reports. The reports include the mileage driven by vehicle, by department, and by faculty members within a department. In addition, various revenue reports are generated by vehicle and department. A detailed parts usage report is also filed each month. Finally, a vehicle maintenance summary is created each month.

Given that brief summary of operations, draw the appropriate (and fully labeled) ERD. Use the Crow’s foot methodology to indicate entities, relationships, connectivities, and cardinalities. The solution is shown in Figure P4.12.

FIGURE P4.12 The Tiny College TFBS Maintenance ERD 137. During peak periods, Temporary Employment Corporation (TEC) places temporary workers in companies. TEC’s manager gives you the following description of the business:

129

Answer: 

TEC has a file of candidates who are willing to work.



Any candidate who has worked before has a specific job history. (Naturally, no job history exists if the candidate has never worked.) Each time the candidate works, one additional job history record is created.



Each candidate has earned several qualifications. Each qualification may be earned by more than one candidate. (For example, more than one candidate may have earned a Bachelor of Business Administration degree or a Microsoft Network Certification, and clearly a candidate may have earned both a BBA and a Microsoft Network Certification.)



TEC offers courses to help candidates improve their qualifications.



Every course develops one specific qualification; however, TEC does not offer a course for every qualification. Some qualifications are developed through multiple courses.



Some courses cover advanced topics that require specific qualifications as prerequisites. Some courses cover basic topics that do not require any prerequisite qualifications. A course can have several prerequisites. A qualification can be a prerequisite for more than one course.



Courses are taught during training sessions. A training session is the presentation of a single course. Over time, TEC will offer many training sessions for each course; however, new courses may not have any training sessions scheduled right away.



Candidates can pay a fee to attend a training session. A training session can accommodate several candidates, although new training sessions will not have any candidates registered at first.



TEC also has a list of companies that request temporaries.



Each time a company requests a temporary employee, TEC makes an entry in the Openings folder. That folder contains an opening number, a company name, required qualifications, a starting date, an anticipated ending date, and hourly pay.



Each opening requires only one specific or main qualification.



When a candidate matches the qualification, the job is assigned, and an entry is made in the Placement Record folder. The folder contains such information as an opening number, candidate number, and total hours worked. In addition, an entry is made in the job history for the candidate.



An opening can be filled by many candidates, and a candidate can fill many openings.



TEC uses special codes to describe a candidate’s qualifications for an opening. The list of codes is shown in Table P4.13.

130

Table P4.13 Codes for Problem 13 CODE

DESCRIPTION

SEC-45

Secretarial work; candidate must type at least 45 words per minute

SEC-60

Secretarial work; candidate must type at least 60 words per minute

CLERK

General clerking work

PRG-PY

Programmer, Python

PRG-C++

Programmer, C++

DBA-ORA

Database Administrator, Oracle

DBA-DB2

Database Administrator, IBM DB2

DBA-SQLSERV

Database Administrator, MS SQL Server

SYS-1

Systems Analyst, level 1

SYS-2

Systems Analyst, level 2

NW-CIS

Network Administrator, Cisco experience

WD-CF

Web Developer, ColdFusion

TEC’s management wants to keep track of the following entities: COMPANY, OPENING, QUALIFICATION, CANDIDATE, JOB_HISTORY, PLACEMENT, COURSE, and SESSION. Given that information, do the following: a. Draw the Crow’s Foot ERDs for this enterprise. b. Identify all necessary relationships. c. Identify the connectivity for each relationship. d. Identify the mandatory and optional dependencies for the relationships. e. Resolve all M:N relationships. The solutions for Problems 13a–13e are shown in Figure P4.13.

131

FIGURE P4.13 TEC Solution ERD

132

To help the students understand Figure P4.13’s ER diagram’s components better, the following discussion is likely to be useful: 

Each COMPANY may list one or more OPENINGs. Because we will maintain COMPANY data even if a company has not (yet!) hired any of TEC’s candidates, OPENING is an optional entity in the COMPANY-lists-OPENING relationship.



OPENING is existence-dependent on COMPANY, because there cannot be an opening unless a company lists it. If you decide to use the COMPANY primary key as a component of the OPENING’s primary key, you have satisfied the conditions that will allow you to classify OPENING as a weak entity and the relationship between COMPANY and OPENING will be strong or identifying. In other words, the OPENING entity is weak if its PK is the combination of OPENING_NUM and COMP_CODE. (The COMP_CODE remains the FK in the OPENING entity.)

Note that there is a 1:M relationship between COMPANY and OPENING, because a company can list multiple job openings. The next table segment shows that the WEST Company has two available job openings and the EAST Company has one available job opening. Naturally, the actual table would have additional attributes in it—but we’re merely illustrating the behavior of the PK components here.

COMP_CODE

OPENING_NUM

West

East

However, if the OPENING’s PK is defined to be a single OPENING attribute such as a unique OPENING_NUM, OPENING is no longer a weak entity. We have decided to use the latter approach in Figure P4.13. Note that this decision causes the relationship between COMPANY and OPENING to be weak. (The relationship line is dashed.) In this case, the COMP_CODE attribute would continue to be the FK pointing to the COMPANY table, but it would no longer be a part of the OPENING entity PK. The next table segment shows what such an arrangement would look like:

OPENING_NUM

COMP_CODE

10025

West

10026

West

10027

East

133



Similarly, the relationship between PLACEMENT and OPENING may be defined as strong or weak. We have used a weak relationship between OPENING and PLACEMENT.



A job candidate may have had many jobs—remember that TEC is a temp employer. Therefore, a candidate may have many entries in HISTORY. But keep in mind that a candidate may just have completed job training and, therefore, may not have had job experience (i.e., no job history) yet. In short, HISTORY is optional to CANDIDATE.



To enable TEC or its clients to trace the entire employment record of any candidate, it is reasonable to expect that the HISTORY entity also records the job(s) held by the candidate before that candidate was placed by TEC. Only the portion of the job history created through TEC placement is reflected in the PLACEMENT entity. Therefore, PLACEMENT is optional to HISTORY.



The semantics of the problem seem to suggest that the HISTORY is an entity that exists in a 1:1 relationship with PLACEMENT. After all, each placement generates one (and only one) entry in the candidate’s history.



Because each placement must generate an entry in the HISTORY entity, one would reasonably conclude that HISTORY is mandatory to PLACEMENT. Note that PLACEMENT is redundant because a job placement obviously creates a job history entry. However, such a redundancy can be justified on the basis that PLACEMENT may be used to track job placement details that are of interest to TEC management.



HISTORY is clearly existence-dependent on CANDIDATE; it is not possible to make an entry in HISTORY without having a CANDIDATE to generate that history. Given this scenario, the CANDIDATE entity’s primary key may be used as one of the components of the HISTORY entity’s primary key, thus making HISTORY a weak entity.



Each CANDIDATE may have earned one or more QUALIFICATIONs. Although a company may list a qualification, there may not be a matching candidate because it is possible that none of the candidates have this qualification. For instance, it is possible that none of the available candidates is a Pascal programmer. Therefore, CANDIDATE is optional to QUALIFICATION. However, many candidates may have a given qualification. For example, many candidates may be C++ programmers. Keep in mind that each qualification may be matched to many job candidates, so the relationship between CANDIDATE and QUALIFICATION is M:N. This relationship must be decomposed into two 1:M relationships with the help of a composite entity we will name EDUCATION. The EDUCATION entity will contain the qualification code, the candidate identification, the date on which the candidate earned the qualification, and so on. A few sample data entries might look like this:

QUAL_CODE

CAND_NUM

EDUC_DATE

PRG-PY

4358

12-Dec-00

PRG-C++

4358

05-Mar-03

DBA-ORA

4358

23-Nov-01

DBA-DB2

2113

02-Jun-85

DBA-ORA

2113

26-Jan-02

134

Note that the preceding table contents illustrate that candidate 4358 has three listed qualifications, while candidate 2113 has two listed qualifications. Note also that the qualification code DBA-ORA occurred more than once. Clearly, the PK must be a combination of QUAL_CODE and CAND_NUM, thus making the relationships between QUALIFICATION and EDUCATION and between EDUCATION and CANDIDATE strong. In this example, the EDUCATION entity is both weak and composite. 

Each job OPENING requires one QUALIFICATION, and any given qualification may fit many openings, thus producing a 1:M relationship between QUALIFICATION and OPENING. For example, a job opening for a C++ programmer requires an applicant to have the C++ programming qualification, but there may be many job openings for C++ programmers! However, a qualification does not require an opening. (After all, if there is no listing with a C++ requirement, a candidate who has the C++ qualification does not match the listing!) Therefore, OPENING is optional to QUALIFICATION.



In the ERD shown in Figure P4.10a, we decided to define the OPENING entity’s PK to be OPENING_NUM. This decision produces a non identifying (weak) relationship between OPENING and QUALIFICATION. However, if you want to ensure that there cannot be a listed opening unless it also lists the required qualification for that opening, the OPENING is existence-dependent on QUALIFICATION. If you then decide to let the OPENING entity inherit QUAL_CODE from QUALIFICATION as part of its PK, OPENING is properly classified as a weak entity to QUALIFICATION.



One or more candidates may fill a listed job opening. Also, keep in mind that, during some period of time, a candidate may fill many openings. (TEC supplies temporaries, remember?) Therefore, the relationship between OPENING and CANDIDATE is M:N. We will decompose this M:N relationship into two 1:M relationships, using the composite entity named PLACEMENT as the bridge between CANDIDATE and OPENING.



Because a candidate is not necessarily placed, PLACEMENT is optional to CANDIDATE. Similarly, since an opening may be listed even when there is no available candidate, PLACEMENT is optional to OPENING.

138. Use the following description of the operations of the RC_Charter2 Company to complete this exercise: Answer: 

The RC_Charter2 Company operates a fleet of aircraft under the Federal Air Regulations (FAR) Part 135 (air taxi or charter) certificate, enforced by the FAA. The aircraft are available for air taxi (charter) operations within the United States and Canada.



Charter companies provide so-called unscheduled operations—that is, charter flights take place only after a customer reserves the use of an aircraft at a designated date and time to fly to one or more designated destinations; the aircraft transports passengers, cargo, or some combination of passengers and cargo. Of course, a customer can reserve many different charter trips during any time frame. However, for billing purposes, each charter trip is reserved by one and only one customer. Some of RC_Charter2’s customers do not use the company’s charter operations; instead, they purchase fuel, use maintenance services, or use other RC_Charter2 services. However, this database design will focus on the charter operations only.

135



Each charter trip yields revenue for the RC_Charter2 Company. This revenue is generated by the charges a customer pays upon the completion of a flight. The charter flight charges are a function of aircraft model used, distance flown, waiting time, special customer requirements, and crew expenses. The distance flown charges are computed by multiplying the round-trip miles by the model’s charge per mile. Round-trip miles are based on the actual navigational path flown. The sample route traced in Figure P4.14 illustrates the procedure. Note that the number of roundtrip miles is calculated to be 130 + 200 + 180 + 390 = 900.

Destination 180 miles

Intermediate Stop

200 miles 390 miles

Pax Pickup 130 miles

Home Base FIGURE P4.14 Round-Trip Mile Determination 

Depending on whether a customer has RC_Charter2 credit authorization, the customer may do the following:

a. Pay the entire charter bill upon the completion of the charter flight. b. Pay a part of the charter bill and charge the remainder to the account. The charge amount may not exceed the available credit. c. Charge the entire charter bill to the account. The charge amount may not exceed the available credit. d. Customers may pay all or part of the existing balance for previous charter trips. Such payments may be made at any time and are not necessarily tied to a specific charter trip. The charter mileage charge includes the expense of the pilot(s) and other crew required by FAR 135. However, if customers request additional crew not required by FAR 135, those customers are charged for the crew members on an hourly basis. The hourly crewmember charge is based on each crew member’s qualifications. e. The database must be able to handle crew assignments. Each charter trip requires the use of an aircraft, and a crew flies each aircraft. The smaller, piston-engine charter aircraft require a crew consisting of only a single pilot. All jets and other aircraft that have a gross takeoff weight of at least 12,500 pounds require a pilot and a copilot, while some of the larger aircraft used to transport passengers may require flight attendants as part of the crew. Some of the older aircraft require the assignment of a flight engineer, and larger

136

cargo-carrying aircraft require the assignment of a loadmaster. In short, a crew can consist of more than one person, and not all crew members are pilots. f.

The charter flight’s aircraft waiting charges are computed by multiplying the hours waited by the model’s hourly waiting charge. Crew expenses are limited to meals, lodging, and ground transportation.

The RC_Charter2 database must be designed to generate a monthly summary of all charter trips, expenses, and revenues derived from the charter records. Such records are based on the data that each pilot in command is required to record for each charter trip: trip date(s) and time(s), destination(s), aircraft number, pilot data and other crew data, distance flown, fuel usage, and other data pertinent to the charter flight. Such charter data are then used to generate monthly reports that detail revenue and operating cost information for customers, aircraft, and pilots. All pilots and other crew members are RC_Charter2 Company employees; that is, the company does not use contract pilots and crew. FAR Part 135 operations are conducted under a strict set of requirements that govern the licensing and training of crew members. For example, pilots must have earned either a Commercial license or an Airline Transport Pilot (ATP) license. Both licenses require appropriate ratings. Ratings are specific competency requirements. For example, consider the following: 

To operate a multiengine aircraft designed for takeoffs and landings on land only, the appropriate rating is MEL, or Multiengine Landplane. When a multiengine aircraft can take off and land on water, the appropriate rating is MES, or Multiengine Seaplane.



The instrument rating is based on a demonstrated ability to conduct all flight operations with sole reference to cockpit instrumentation. The instrument rating is required to operate an aircraft under Instrument Meteorological Conditions (IMC), and all such operations are governed under FAR-specified Instrument Flight Rules (IFR). In contrast, operations conducted under “good weather” or visual flight conditions are based on the FAR Visual Flight Rules (VFR).



The type rating is required for all aircraft with a takeoff weight of more than 12,500 pounds or for aircraft that are purely jet-powered. If an aircraft uses jet engines to drive propellers, that aircraft is said to be turboprop-powered. A turboprop—that is, a turbo propeller-powered aircraft—does not require a type rating unless it meets the 12,500-pound weight limitation.



Although pilot licenses and ratings are not time limited, exercising the privilege of the license and ratings under Part 135 requires both a current medical certificate and a current Part 135 checkride. The following distinctions are important: a. The medical certificate may be Class I or Class II. The Class I medical is more stringent than the Class II, and it must be renewed every six months. The Class II medical must be renewed yearly. If the Class I medical is not renewed during the six-month period, it automatically reverts to a Class II certificate. If the Class II medical is not renewed within the specified period, it automatically reverts to a Class III medical, which is not valid for commercial flight operations. b. A Part 135 checkride is a practical flight examination that must be successfully completed every six months. The checkride includes all flight maneuvers and procedures specified in Part 135.

Nonpilot crew members must also have the proper certificates to meet specific job requirements. For example, loadmasters need an appropriate certificate, as do flight attendants. Crew members such as loadmasters and flight attendants may be required in operations that involve large aircraft

137

with a takeoff weight of more than 12,500 pounds and more than 19 passengers; these crew members are also required to pass a written and practical exam periodically. The RC_Charter2 Company is required to keep a complete record of all test types, dates, and results for each crew member, as well as examination dates for pilot medical certificates. In addition, all flight crew members are required to submit to periodic drug testing; the results must be tracked as well. Note that nonpilot crew members are not required to take pilot-specific tests such as Part 135 checkrides, nor are pilots required to take crew tests such as loadmaster and flight attendant practical exams. However, many crew members have licenses and certifications in several areas. For example, a pilot may have an ATP and a loadmaster certificate. If that pilot is assigned to be a loadmaster on a given charter flight, the loadmaster certificate is required. Similarly, a flight attendant may have earned a commercial pilot’s license. Sample data formats are shown in Table P4.14.

Table P4.14 PART A TESTS

Test Code

Test Description

Test Frequency

Part 135 Flight Check

6 months

Medical, Class I

6 months

Medical, Class II

12 months

Loadmaster Practical

12 months

Flight Attendant Practical

12 months

Drug test

Random

Operations, written exam

6 months

138

PART B RESULTS

Employee

Test Code

Test Date

Test Result

101

12-Nov-21

Pass-1

103

23-Dec-21

Pass-1

112

23-Dec-21

Pass-2

103

11-Jan-22

Pass-1

112

16-Jan-22

Pass-1

101

16-Jan-22

Pass-1

101

11-Feb-22

Pass-2

125

15-Feb-22

Pass-1

PART C LICENSES AND CERTIFICATIONS

License or Certificate

License or Certificate Description

ATP

Airline Transport Pilot

Comm

Commercial license

Med-1

Medical certificate, Class I

Med-2

Medical certificate, Class II

Instr

Instrument rating

MEL

Multiengine Land aircraft rating

Loadmaster

Flight Attendant

139

Employee

License or Certificate

Date Earned

101

Comm

12-Nov-1997

101

Instr

28-Jun-1998

101

MEL

9-Aug-1998

103

Comm

21-Dec-1999

112

23-Jun-2006

103

Instr

18-Jan-2000

112

27-Nov-2009

Pilots and other crew members must receive recurrency training appropriate to their work assignments. Recurrency training is based on an FAA-approved curriculum that is job specific. For example, pilot recurrency training includes a review of all applicable Part 135 flight rules and regulations, weather data interpretation, company flight operations requirements, and specified flight procedures. The RC_Charter2 Company is required to keep a complete record of all recurrency training for each crew member subject to the training. The RC_Charter2 Company is required to maintain a detailed record of all crew credentials and all training mandated by Part 135. The company must keep a complete record of each requirement and of all compliance data. To conduct a charter flight, the company must have a properly maintained aircraft available. A pilot who meets all of the FAA’s licensing and currency requirements must fly the aircraft as Pilot in Command (PIC). For aircraft that are powered by piston engines or turboprops and have a gross takeoff weight under 12,500 pounds, single-pilot operations are permitted under Part 135 as long as a properly maintained autopilot is available. However, even if FAR Part 135 permits single-pilot operations, many customers require the presence of a copilot who is capable of conducting the flight operations under Part 135. The RC_Charter2 operations manager anticipates the lease of turbojet-powered aircraft, which are required to have a crew consisting of a pilot and copilot. Both the pilot and copilot must meet the same Part 135 licensing, ratings, and training requirements. The company also leases larger aircraft that exceed the 12,500-pound gross takeoff weight. Those aircraft might carry enough passengers to require the presence of one or more flight attendants. If those aircraft carry cargo that weighs more than 12,500 pounds, a loadmaster must be assigned as a crew member to supervise the loading and securing of the cargo. The database must be designed to meet the anticipated capability for additional charter crew assignments. a. Given this incomplete description of operations, write all applicable business rules to establish entities, relationships, optionalities, connectivities, and cardinalities. (Hint: Use the

140

following four business rules as examples, and write the remaining business rules in the same format.) 

Each charter trip is requested by only one customer.



Some customers have not yet requested a charter trip.



An employee may be assigned to serve as a crew member on many charter trips.



Each charter trip may have many employees assigned to serve as crew members.

b. Draw the fully labeled and implementable Crow’s Foot ERD based on the business rules you wrote in Part a of this problem. Include all entities, relationships, optionalities, connectivities, and cardinalities. The following business rules can be derived from the description of operations: 

A customer may request many charter trips.



Each charter trip is requested by only one customer.



Some customers have not (yet) requested a charter trip.



Every charter trip is requested by at least one customer.



An employee may be assigned to serve as a crew member on many charter trips.



Each charter trip may have many employees assigned to it to serve as crew members.



An employee may not yet have been assigned to serve as a crew member on any charter trip.



A charter trip may not yet have any employee assigned to serve as a crew member.



Each customer may make many payments.



Some customers have not made any payments yet.



Every payment is made by only one customer.



Every payment must have been made by a customer.



A payment may be toward many charter trips.



A payment may not be in reference to any charter trip.



Every charter trip must have a payment made.



Each charter trip has only one payment.



Every charter trip involves the use of a single aircraft.



Every charter trip requires at least one aircraft.



An aircraft may be used for many charter trips.



An aircraft may not yet have been used for any charter trip.



Each aircraft is only one model airplane.



Every aircraft has a model designation.



An airplane model is not required to be associated with any aircraft that the company owns.



The company may own many aircraft of a given model.

141



A given flight assignment may be given to many crew members.



Some flight assignments may not have ever been given to any crew member.



Every crew member assignment is associated with a flight assignment.



Every crew member assignment is associated with only one flight assignment.



An employee may have taken many tests.



Some employees may have taken no tests yet.



A test may be taken by many employees.



A test may not have been taken by any employee yet.



Each employee has one job with the company.



Every employee has only one job with the company.



A job may be done by many employees.



A job may be currently unfilled and not be associated with any employee.



An employee may be a pilot, and every pilot is an employee.



A pilot may have earned many ratings.



Some pilots have not earned any rating yet.



A rating may be earned by many pilots.



Some ratings are not held by any pilots.



A pilot may have many licenses.



A pilot may not have any license yet.



A license may be held by many pilots.



A license may not be held by any pilot yet.



Every employee can have many qualifications.



Some employees do not have any qualifications.



Each qualification can be held by many employees.



Some qualifications are not held by any employee.

The completed ERD is shown in Figure P4.14b.

142

FIGURE P4.14b The RC_Charter2 Flight Department Crow’s Foot ERD

143

TABLE OF CONTENTS Answers to Review Questions………….... …………………………………………………………….1 Answers to Problems ...............................................................................................................9

ANSWERS TO REVIEW QUESTIONS 139. What is an entity supertype, and why is it used? Answer: An entity supertype is a generic entity type that is related to one or more entity subtypes, where the entity supertype contains the common characteristics and the entity subtypes contain the unique characteristics of each entity subtype. The reason for using supertypes is to minimize the number of nulls and to minimize the likelihood of redundant relationships. 140. What kinds of data would you store in an entity subtype? Answer: An entity subtype is a more specific entity type that is related to an entity supertype, where the entity supertype contains the common characteristics and the entity subtypes contain the unique characteristics of each entity subtype. The entity subtype will store the data that is specific to the entity; that is, attributes that are unique to the subtype. 141. What is a specialization hierarchy? Answer: A specialization hierarchy depicts the arrangement of higher-level entity supertypes (parent entities) and lower-level entity subtypes (child entities). To answer the question precisely, we have used the text’s Figure 5.2. (We have reproduced the figure here for your convenience.) Figure 5.2 shows the specialization hierarchy formed by an EMPLOYEE supertype and three entity subtypes—PILOT, MECHANIC, and ACCOUNTANT.

144

(Text) FIGURE 5.2 A Specialization Hierarchy

The specialization hierarchy shown in Figure 5.2 reflects the 1:1 relationship between EMPLOYEE and its subtypes. For example, a PILOT subtype occurrence is related to one instance of the EMPLOYEE supertype, and a MECHANIC subtype occurrence is related to one instance of the EMPLOYEE supertype. See Question 5 for the discussion of overlapping and disjoint subtypes. 142. What is a subtype discriminator? Give an example of its use. Answer: A subtype discriminator is the attribute in the supertype entity that is used to determine to which entity subtype the supertype occurrence is related. For any given supertype occurrence, the value of the subtype discriminator will determine which subtype the supertype occurrence is related to. For example, an EMPLOYEE supertype may include the EMP_TYPE value “P” to indicate the PROFESSOR subtype. Using Figure 5.2, the EMP_TYPE subtype discriminator attribute would have a value to represent a pilot (“P”), a mechanic (“M”), or an accountant (“A”). Notice that this is a disjoint constraint on the subtype discriminator.

145

143. What is an overlapping subtype? Give an example. Answer: Overlapping subtypes are subtypes that contain non unique subsets of the supertype entity set; that is, each entity instance of the supertype may appear in more than one subtype. For example, in a university environment, a person may be an employee or a student or both. In turn, an employee may be a professor as well as an administrator. Because an employee also may be a student, STUDENT and EMPLOYEE are overlapping subtypes of the supertype PERSON, just as PROFESSOR and ADMINISTRATOR are overlapping subtypes of the supertype EMPLOYEE. The text’s Figure 5.4 (reproduced next for your convenience) illustrates overlapping subtypes with the use of the letter O inside the category shape.

(Text) FIGURE 5.4 Specialization Hierarchy with Overlapping Subtypes

144. What is a disjoint subtype? Give an example. Answer: Disjoint subtypes, also known as nonoverlapping subtypes, are subtypes that contain a unique subset of the supertype entity set; in other words, each entity instance of the supertype can appear in only one of the subtypes. For example, in Figure 5.2, shown in Question 3, an employee (supertype) who is a pilot (subtype) can appear only in the PILOT subtype, not in any of the other subtypes. In an ERD, such disjoint subtypes are indicated by the letter d inside the category shape. See Figure 5.2 in textbook or in Question 3. Also, see Figure 5.5 Disjoint and Overlapping Subtypes in the textbook.

146

NOTE There are multiple ER notations to represent supertypes/subtypes. Please consult the documentation of the ER diagramming tool you are using. 145. What is the difference between partial completeness and total completeness? Answer: Partial completeness means that not every supertype occurrence is a member of a subtype; that is, there may be some supertype occurrences that are not members of any subtype. Total completeness means that every supertype occurrence must be a member of at least one subtype. For Questions 8–10, refer to Figure Q5.8

FIGURE Q5.8 The PRODUCT Data Model

146. List all of the attributes of a movie. Answer: Recall that the subtype inherits all of the attributes and relationships of the supertype. Therefore, all of the attributes of a subtype include the common attributes from the supertype plus the unique (unique to that subtype) attributes from the subtype. All of the attributes of a movie would be: 

Prod_Num



Prod_Title



Prod_ReleaseDate



Prod_Price



Prod_Type



Movie_Rating



Movie_Director

147

147. According to the data model, is it required that every entity instance in the PRODUCT table be associated with an entity instance in the CD table? Why or why not? Answer: No. The completeness constraint for the data model shows a total completeness constraint from PRODUCT to the subtypes. However, the total completeness constraint indicates that every instance in the supertype (PRODUCT) must be associated with one row in some subtype, not all subtypes. Since the subtypes are designated as disjoint, or exclusive, then every row in the supertype is associated a row in only one subtype. For some products that subtype will be CD, but for other products the subtype will be either Movie or Book. 148. Is it possible for a book to appear in the BOOK table without appearing in the PRODUCT table? Why or why not? Answer: No. Subtypes can only exist within the context of a supertype. 149. What is an entity cluster, and what advantages are derived from its use? Answer: An entity cluster is a “virtual” entity type used to represent multiple entities and relationships in the ERD. An entity cluster is formed by combining multiple interrelated entities into a single abstract entity object. An entity cluster is considered “virtual” or “abstract” in the sense that it is not actually an entity in the final ERD, but rather a temporary entity used to represent multiple entities and relationships with the purpose of simplifying the ERD and thus enhancing its readability. 150. What primary key characteristics are considered desirable? Explain why each characteristic is considered desirable. Answer: Desirable PK characteristics are summarized in the text’s Table 5.3, reproduced below for your convenience. The table also includes the reason why each characteristic is desirable. (See the Rationale column.)

148

(Text) TABLE 5.3 Desirable Primary Key Characteristics PK Characteristic

Rationale

Unique values

The PK must uniquely identify each entity instance. A primary key must be able to guarantee unique values. It cannot contain nulls.

Nonintelligent

The PK should not have embedded semantic meaning. An attribute with embedded semantic meaning is probably better used as a descriptive characteristic of the entity rather than as an identifier. In other words, a student ID of “650973” would be preferred over “Smith, Martha L.” as a primary key identifier.

No change over time

If an attribute has semantic meaning, it may be subject to updates. This is why names do not make good primary keys. If you have “Vickie Smith” as the primary key, what happens when she gets married? If a primary key is subject to change, the foreign key values must be updated, thus adding to the database work load. Furthermore, changing a primary key value means that you are basically changing the identity of an entity.

Preferably single-attribute

A primary key should have the minimum number of attributes possible. Single-attribute primary keys are desirable but not required. Single-attribute primary keys simplify the implementation of foreign keys. Having multiple-attribute primary keys can cause primary keys of related entities to grow through the possible addition of many attributes, thus adding to the database work load and making (application) coding more cumbersome.

Preferably numeric

Unique values can be better managed when they are numeric because the database can use internal routines to implement a “counter-style” attribute that automatically increments values with the addition of each new row. In fact, most database systems include the ability to use special constructs, such as Autonumber in MS Access, to support self-incrementing primary key attributes.

Security complaint

The selected primary key must not be composed of any attribute(s) that might be considered a security risk or violation. For example, using a Social Security number as a PK in an EMPLOYEE table is not a good idea.

149

151. Under what circumstances are composite primary keys appropriate? Answer: Composite primary keys are particularly useful in two cases: 

As identifiers of composite entities, where each primary key combination is allowed only once in the M:N relationship.



As identifiers of weak entities, where the weak entity has a strong identifying relationship with the parent entity.

To illustrate the first case, assume that you have a STUDENT entity set and a CLASS entity set. In addition, assume that those two sets are related in a M:N relationship via an ENROLL entity set in which each student/class combination may appear only once in the composite entity. The text’s Figure 5.7 (reproduced here for your convenience) shows the ERD to represent such a relationship.

(Text) FIGURE 5.7 The M:N Relationship Between Student and Class

As shown in the text’s Figure 5.7, the composite primary key automatically provides the benefit of ensuring that there cannot be duplicate values—that is, it ensures that the same student cannot enroll more than once in the same class. In the second case, a weak entity in a strong identifying relationship with a parent entity is normally used to represent one of two cases: 1. A real-world object that is existent dependent on another real-world object. Those types of objects are distinguishable in the real world. A dependent and an employee are two separate people who exist independent of each other. However, such objects can exist in the model only when they relate to each other in a strong identifying relationship. For example, the relationship between EMPLOYEE and DEPENDENT is one of existence dependency in which the primary key of the dependent entity is a composite key that contains the key of the parent entity.

150

2. A real-world object that is represented in the data model as two separate entities in a strong identifying relationship. For example, the real-world invoice object is represented by two entities in a data model: INVOICE and LINE. Clearly, the LINE entity does not exist in the real world as an independent object, but rather as part of an INVOICE. In both cases, having a strong identifying relationship ensures that the dependent entity can exist only when it is related to the parent entity. In summary, the selection of a composite primary key for composite and weak entity types provides benefits that enhance the integrity and consistency of the model. 152. What is a surrogate primary key, and when would you use one? Answer: A surrogate primary key is an “artificial” PK that is used to uniquely identify each entity occurrence when there is no good natural key available or when the “natural” PK would include multiple attributes. A surrogate PK is also used if the natural PK would be a long text variable. The reason for using a surrogate PK is to ensure entity integrity, to simplify application development by making queries simpler, to ensure query efficiency—for example, a query based on a simple numeric attribute is much faster than one based on a 200-bit character string—and to ensure that relationships between entities can be created more easily than would be the case with a composite PK that may have to be used as an FK in a related entity. 153. When implementing a 1:1 relationship, where should you place the foreign key if one side is mandatory and one side is optional? Should the foreign key be mandatory or optional? Answer: Section 5.4.1 provides a detailed discussion. The text’s Table 5.5, reproduced here for your convenience, shows the rationale for selecting the foreign key in a 1:1 relationship based on the relationship properties in the ERD.

(Text) TABLE 5.5 Selection of Foreign Key in a 1:1 Relationship Case

ER Relationship Constraints

Action

One side is mandatory and Place the PK of the entity on the mandatory the other side is optional. side in the entity on the optional side as an FK and make the FK mandatory.

Both sides are optional.

Select the FK that causes the fewest nulls, or place the FK in the entity in which the (relationship) role is played.

III

Both sides are mandatory.

See Case II or consider revising your model to ensure that the two entities do not belong together in a single entity.

151

154. What is time-variant data, and how would you deal with such data from a database design point of view? Answer: As the label implies, time-variant data are time-sensitive. For example, if a university wants to keep track of the history of all administrative appointments by date of appointment and date of termination, you see time-variant data at work. Other examples of time-variant data are stock prices; they vary multiple times during the day. Generally, an accepted design practice is to record the opening and closing stock price. In other cases, such as product prices (they change over a period of time), it is a common practice to store current prices in the product table and past prices in a related table with a date field to represent the date the price changed. Also, the teacher could use Figure 5.11 to illustrate a history of the various jobs and salaries a person had over time. 155. What is the most common design trap, and how does it occur? Answer: A design trap occurs when a relationship is improperly or incompletely identified and therefore, it is represented in a way that is not consistent with the real world. The most common design trap is known as a fan trap. A fan trap occurs when you have one entity in two 1:M relationships to other entities, thus producing an association among the other entities that is not expressed in the model.

152

ANSWERS TO PROBLEMS Given the following business scenario, create a Crow’s Foot ERD using a specialization hierarchy if appropriate. Two-Bit Drilling Company keeps information on employees and their insurance dependents. Each employee has an employee number, name, date of hire, and title. If an employee is an inspector, then the date of certification and the certification renewal date should also be recorded in the system. For all employees, the Social Security number and dependent names should be kept. All dependents must be associated with one and only one employee. Some employees will not have dependents, while others will have many dependents. Answer: The data model for this solution is shown in Figure P5.1 below.

FIGURE P5.1 Two-Bit Drilling Company ERD

In this scenario, a specialization hierarchy is appropriate because there is an identifiable type or kind of employee (Inspectors), and additional attributes are recorded that are specific to just that kind or type. It is worth noting that if there is only a single subtype, the disjoint/overlapping designation may be omitted—if there is only one subtype then there is no other subtype to overlap or be disjoint from. Also, when there is only a single subtype, the completeness constraint is always partial completeness. If the completeness constraint were identified as total completeness, that would mean that every employee must be an inspector, in which inspector would be a synonym for employee not a kind of employee. Another test for determining if a specialization hierarchy is appropriate is if the entity subtype will be involved in a relationship with other entities. It is safe to assume that there will be cases in which the INSPECTOR entity will be related to other entities, rather than to the EMPLOYEE entity.

153

156. Given the following business scenario, create a Crow’s Foot ERD using a specialization hierarchy if appropriate. Tiny Hospital keeps information on patients and hospital rooms. The system assigns each patient a patient ID number. In addition, the patient’s name and date of birth are recorded. Some patients are resident patients who spend at least one night in the hospital and others are outpatients who are treated and released. Resident patients are assigned to a room. Each room is identified by a room number. The system also stores the room type (private or semiprivate) and room fee. Over time, each room will have many patients. Each resident patient will stay in only one room. Every room must have had a patient, and every resident patient must have a room. Answer: The data model for this scenario is given in Figure P5.2 below.

FIGURE P5.2 Tiny Hospital ERD

Note that in this scenario, a specialization hierarchy is not appropriate. While resident patients are an identifiable kind or type of patient instance, there are not additional attributes that are unique to only that kind or type of patient. Participation in a relationship that is unique to a particular kind or type of instance is not sufficient justification for a specialization hierarchy. Indicating that only some instances will participate in a relationship is addressed by the optional participation designation. In this scenario, all resident patients must have a room; however, not all patients are resident patients so ROOM is optional to patient. If students ask about the need for an attribute to distinguish between outpatients and resident patients, remind them that in this limited scenario the only distinction between outpatients and resident patients is whether or not they are associated with a room. Therefore, the Room_Num foreign key in the PATIENT table can serve in that capacity. 157. Given the following business scenario, create a Crow’s Foot ERD using a specialization hierarchy if appropriate. Granite Sales Company keeps information on employees and the departments in which they work. For each department, the department name, internal mailbox number, and office phone extension are kept. A department can have many assigned employees, and each employee is assigned to only one department. Employees can be salaried, hourly, or work on contract. All employees are assigned an employee number, which is kept along with the employee’s name and address. For hourly employees, hourly wages and target weekly work hours are stored; for example, the company may target 40 hours/week for some employees, 32 for others, and 20 for others. Some salaried employees are salespeople who can earn a commission in addition to their base salary. For all salaried employees, the yearly salary amount is recorded in the system. For salespeople, their commission percentage on sales and commission percentage on profit are stored in the system. For example, John is a salesperson with a base salary of $50,000 per year plus a 2 percent commission on the sales price for all sales he makes, plus another 5 percent of the profit on each of those sales. For contract employees, the beginning date and end date of their contracts are stored along with the billing rate for their hours. Answer: The data model for this scenario is given in Figure P5.3 below.

154

Notice in the ERD that not all salaried employees earn commissions. Therefore, only salespersons earn commissions but also have a base salary. Remember the subtype inherit the parent entity’s attributes.

155

4. In Chapter 4, you saw the creation of the Tiny College database design, which reflected such business rules as “a professor may advise many students” and “a professor may chair one department.” Modify the design shown in Figure 4.35 to include these business rules: 

An employee could be staff, a professor, or an administrator.



A professor may also be an administrator.



Staff employees have a work-level classification, such as Level I or Level II.



Only professors can chair a department. A department is chaired by only one professor.



Only professors can serve as the dean of a college. Each of the university’s colleges is served by one dean.



A professor can teach many classes.



Administrators have a position title.

Given that information, create the complete ERD that contains all primary keys, foreign keys, and main attributes. Answer: The solution is shown in Figure P5.4 below.

156

FIGURE P5.4 Updated Tiny College ERD

157

Note that the business rules require that the subtypes be overlapping for some subtypes but disjoint for others. Specifically, the STAFF subtype is disjoint from ADMIN and PROFESSOR, but ADMIN and PROFESSOR are overlapping. Such complex requirements may be implemented in the database through the use of database constraints as described in Chapter 7, Introduction to Structured Query Language (SQL). Tiny College wants to keep track of the history of all its administrative appointments, including dates of appointment and dates of termination. (Hint: Time-variant data are at work.) The Tiny College chancellor may want to know how many deans worked in the College of Business between January 1, 1960, and January 1, 2022, or who the dean of the College of Education was in 1990. Given that information, create the complete ERD that contains all primary keys, foreign keys, and main attributes. Answer: The solution is shown in the following figure:

FIGURE P5.5 Tiny College Job History ERD Segment

158

6. Some Tiny College staff employees are information technology (IT) personnel. Some IT personnel provide technology support for academic programs, some provide technology infrastructure support, and some provide support for both. IT personnel are not professors; they are required to take periodic training to retain their technical expertise. Tiny College tracks all IT personnel training by date, type, and results (completed versus not completed). Given that information, create the complete ERD that contains all primary keys, foreign keys, and main attributes. Answer: This problem provides an opportunity to reinforce the idea that to qualify as a subtype, the identifiable kind or type of instance must include additional attributes—being an identifiable kind or type of entity instance is necessary but not sufficient to justify the creation of subtypes. Given the minimal attributes specified in the problem, the solution would be as shown in Figure 5.6A.

FIGURE 5.6A Minimal Tiny College IT Staffing Solution

If, as is often the case in the problems included in textbook, we assume that the attributes specified are just a subset of the complete attribute requirements for each entity, we can consider what the data model would be given that additional attributes that are unique to the described kinds of entity instances will exist. In that case, the expanded solution including subtypes for the described kinds of staff members is shown in Figure 5.6B.

159

FIGURE 5.6B Expanded Tiny College IT Staffing Solution

Note that in the specification of ITSTAFF as a subtype of STAFF, there is no disjoint/overlapping designation for the subtype. When there is only one subtype, there is nothing to be disjointed from or to overlap with; therefore, the designation may be safely omitted.

160

7. The FlyRight Aircraft Maintenance (FRAM) division of the FlyRight Company (FRC) performs all maintenance for FRC’s aircraft. Produce a data model segment that reflects the following business rules: 

All mechanics are FRC employees. Not all employees are mechanics.



Some mechanics are specialized in engine (EN) maintenance. Others are specialized in airframe (AF) maintenance or avionics (AV) maintenance. (Avionics are the electronic components of an aircraft that are used in communication and navigation.) All mechanics take periodic refresher courses to stay current in their areas of expertise. FRC tracks all courses taken by each mechanic—date, course type, certification (Y/N), and performance.



FRC keeps an employment history of all mechanics. The history includes the date hired, date promoted, and date terminated.

Given those requirements, create the Crow’s Foot ERD segment. Answer: The solution is shown in the following figure:

Note that this is a very simplified version of the aircraft problem domain. The purpose is to help students with the modeling notation for specialization hierarchies and to illustrate how this notation is different from the original entity relationship models. To truly justify the existence of the mechanic subtypes, each subtype MUST have attributes that are unique to that particular subtype. A good class exercise is to have students suggest attributes that may be unique to each subtype.

161

8. “Martial Arts R Us” (MARU) needs a database. MARU is a martial arts school with hundreds of students. The database must keep track of all the classes that are offered, who is assigned to teach each class, and which students attend each class. Also, it is important to track the progress of each student as they advance. Create a complete Crow’s Foot ERD for these requirements: 

Students are given a student number when they join the school. The number is stored along with their name, date of birth, and the date they joined the school.



All instructors are also students, but clearly not all students are instructors. In addition to the normal student information, for all instructors, the date that they start working as an instructor must be recorded along with their instructor status (compensated or volunteer).



An instructor may be assigned to teach any number of classes, but each class has one and only one assigned instructor. Some instructors, especially volunteer instructors, may not be assigned to any class.



A class is offered for a specific level at a specific time, day of the week, and location. For example, one class taught on Mondays at 5:00 p.m. in Room 1 is an intermediate-level class. Another class taught on Mondays at 6:00 p.m. in Room 1 is a beginner-level class. A third class taught on Tuesdays at 5:00 p.m. in Room 2 is an advanced-level class.



Students may attend any class of the appropriate level during each week, so there is no expectation that any particular student will attend any particular class session. Therefore, the attendance of students at each individual class meeting must be tracked.



A student will attend many different class meetings, and each class meeting is normally attended by many students. Some class meetings may not be attended by any students. New students may not have attended any class meetings yet.



At any given meeting of a class, instructors other than the assigned instructor may show up to help. Therefore, a given class meeting may have a head instructor and many assistant instructors, but it will always have at least the one instructor who is assigned to that class. For each class meeting, the date of the class and the instructors’ roles (head instructor or assistant instructor) need to be recorded. For example, Mr. Jones is assigned to teach the Monday, 5:00 p.m., intermediate class in Room 1. During a particular meeting of that class, Mr. Jones was the head instructor and Ms. Chen served as an assistant instructor.



Each student holds a rank in the martial arts. The rank name, belt color, and rank requirements are stored. Most ranks have numerous rank requirements, but each requirement is associated with only one particular rank. All ranks except white belt have at least one requirement.



A given rank may be held by many students. While it is customary to think of a student as having a single rank, it is necessary to track each student’s progress through the ranks. Therefore, every rank that a student attains is kept in the system. New students joining the school are automatically given the rank of white belt. The date that a student is awarded each rank should be kept in the system. All ranks have at least one student who has achieved that rank at some time.

Answer: The solution for this case is shown in Figure P5.8 below.

162

FIGURE P5.8 MARU ERD Solution

Notice that the figure includes surrogate keys for RANK, REQUIREMENT, and MEETING because the natural keys did not meet the requirements for a good primary key. The most common areas for confusion among students on this particular case surround attendance in the class meetings. Students tend to think of relationship between CLASS and STUDENT similar to the M:N enroll relationship that they have seen throughout the textbook. In this case, however, the relationship is not an enrollment relationship—instead it is an attendance relationship. As described in the case, students do not enroll in any particular class. What must be tracked is the attendance for each individual class meeting. Therefore, the M:N relationship in this scenario is actually between the STUDENT and the individual class MEETING.

163

The case also provides an opportunity to reinforce the fact that subtypes inherit not only the attributes of the supertype but also the relationships. One requirement of the case is that the system must be able to track which instructors actually taught each class meeting. There is already a M:N relationship between STUDENT and MEETING that can be implemented with the ATTENDANCE bridge entity using only the Stu_Num and Meet_Num attributes. Students should consider that because INSTRUCTOR is a subtype of STUDENT, instructors are already associated in a M:N relationship with MEETING through that same bridge. By adding the Attend_Role attribute to ATTENDANCE, the bridge entity can properly track all students in a given class meeting and record what role they played in that meeting (e.g., student, assistant instructor, or head instructor). Finally, it is worth pointing out to the students that requirements are described as being an attribute of a rank. Some students will immediately consider requirements to be an entity, while others will model requirement as an attribute of the RANK entity. Considering rank requirements to be an attribute of RANK is perfectly acceptable—however, it must be noted that as such rank requirements would be a multivalued attribute. Therefore, the preferred implementation of a multivalued attribute (creating a new entity for the multivalued attribute) would result in the creation of the REQUIREMENT table anyway. So either way the student approaches the problem, it will eventually lead to the solution shown above. 9. The Journal of E-commerce Research Knowledge is a prestigious information systems research journal. It uses a peer-review process to select manuscripts for publication. Only about 10 percent of the manuscripts submitted to the journal are accepted for publication. A new issue of the journal is published each quarter. Create a complete ERD to support the business needs described below. 

Unsolicited manuscripts are submitted by authors. When a manuscript is received, the editor assigns it a number and records some basic information about it in the system, including the title of the manuscript, the date it was received, and a manuscript status of “received.” Information about the author(s) is also recorded, including each author’s name, mailing address, email address, and affiliation (the author’s school or company). Every manuscript must have an author. Only authors who have submitted manuscripts are kept in the system. It is typical for a manuscript to have several authors. A single author may have submitted many different manuscripts to the journal. Additionally, when a manuscript has multiple authors, it is important to record the order in which the authors are listed in the manuscript credits.



At his or her earliest convenience, the editor will briefly review the topic of the manuscript to ensure that its contents fall within the scope of the journal. If the content is not appropriate for the journal, the manuscript’s status is changed to “rejected,” and the author is notified via email. If the content is within the scope of the journal, then the editor selects three or more reviewers to review the manuscript. Reviewers work for other companies or universities and read manuscripts to ensure their scientific validity. For each reviewer, the system records a reviewer number, name, email address, affiliation, and areas of interest. Areas of interest are predefined areas of expertise that the reviewer has specified. An area of interest is identified by an IS code and includes a description (e.g., IS2003 is the code for “database modeling”). A reviewer can have many areas of interest, and an area of interest can be associated with many reviewers. All reviewers must specify at least one area of interest. It is unusual, but possible, to have an area of interest for which the journal has no reviewers. The editor will change the status of the manuscript to “under review” and record which reviewers received the manuscript and the date it was sent to each reviewer. A reviewer will typically receive several manuscripts to review each year, although new reviewers may not have received any manuscripts yet.

164



The reviewers will read the manuscript at their earliest convenience and provide feedback to the editor. The feedback from each reviewer includes rating the manuscript on a 10-point scale for appropriateness, clarity, methodology, and contribution to the field, as well as a recommendation for publication (accept or reject). The editor will record all of this information in the system for each review received, along with the date the feedback was received. After all of the reviewers have provided their evaluations, the editor will decide whether to publish the manuscript and change its status to “accepted” or “rejected.” If the manuscript will be published, the date of acceptance is recorded.



After a manuscript has been accepted for publication, it must be scheduled. For each issue of the journal, the publication period (fall, winter, spring, or summer), publication year, volume, and number are recorded. An issue will contain many manuscripts, although the issue may be created in the system before it is known which manuscripts will be published in that issue. An accepted manuscript appears in only one issue of the journal. Each manuscript goes through a typesetting process that formats the content, including fonts, font size, line spacing, justification, and so on. After the manuscript has been typeset, its number of pages is recorded in the system. The editor will then decide which issue each accepted manuscript will appear in and the order of manuscripts within each issue. The order and the beginning page number for each manuscript must be stored in the system. After the manuscript has been scheduled for an issue, the status of the manuscript is changed to “scheduled.” After an issue is published, the print date for the issue is recorded, and the status of each manuscript in that issue is changed to “published.”

Answer: The solution for this case is shown in Figure P5.9 below.

165

FIGURE P5.9 Journal of E-Commerce Research Knowledge ERD Solution

166

Again, this is another opportunity to stress to students that the creation of subtypes requires that there exist identifiable kinds or types of entity instances and that kind or type must have additional attributes that are unique to that kind or type. In this case, AUTHOR is a subtype because it is an identifiable kind or type of PERSON and it includes additional attributes that are unique to authors (i.e., the address attributes). There is no subtype for reviewers because there are no attributes that are unique to just that kind or type of PERSON. Reviewers do have relationships that are unique to them, but that is not a sufficient reason to create a subtype. It is not uncommon for students to want to make a separate subtype for each value that the manuscript status attribute can have. Students will often, rightly, point out that there are new attributes that come into play with different manuscript statuses. What the students are missing is that there is no described mechanism by which a manuscript that has been accepted can fail to be published. Therefore, once a manuscript is accepted, it does have all of the attributes in the ACCEPTED subtype—the user just doesn’t have a value for all of them yet. Global Unified Technology Sales (GUTS) is moving toward a “bring your own device” (BYOD) model for employee computing. Employees can use traditional desktop computers in their offices. They can also use a variety of personal mobile computing devices such as tablets, smartphones, and laptops. The new computing model introduces some security risks that GUTS is attempting to address. The company wants to ensure that any devices connecting to their servers are properly registered and approved by the Information Technology department. Create a complete ERD to support the following business needs: 

Every employee works for a department that has a department code, name, mailbox number, and phone number. The smallest department currently has five employees, and the largest department has 40 employees. This system will only track in which department an employee is currently employed. Very rarely, a new department can be created within the company. At such times, the department may exist temporarily without any employees. For every employee, an employee number and name (first, last, and middle initial) are recorded in the system. It is also necessary to keep each employee’s title.



An employee can have many devices registered in the system. Each device is assigned an identification number when it is registered. Most employees have at least one device, but newly hired employees might not have any devices registered initially. For each device, the brand and model need to be recorded. Only devices that are registered to an employee will be in the system. While unlikely, it is possible that a device could transfer from one employee to another. However, if that happens, only the employee who currently owns the device is tracked in the system. When a device is registered in the system, the date of that registration needs to be recorded.



Devices can be either desktop systems that reside in a company office or mobile devices. Desktop devices are typically provided by the company and are intended to be a permanent part of the company network. As such, each desktop device is assigned a static IP address, and the MAC address for the computer hardware is kept in the system. A desktop device is kept in a static location (building name and office number). This location should also be kept in the system so that if the device becomes compromised, the IT department can dispatch someone to remediate the problem.



For mobile devices, it is important to also capture the device’s serial number, which operating system (OS) it is using, and the version of the OS. The IT department is also verifying that each mobile device has a screen lock enabled and has encryption enabled for data. The system should support storing information on whether or not each mobile device has these capabilities enabled.

167



After a device is registered in the system, and the appropriate capabilities are enabled if it is a mobile device, the device may be approved for connections to one or more servers. Not all devices meet the requirements to be approved at first so the device might be in the system for a period of time before it is approved to connect to any server. GUTS has a number of servers, and a device must be approved for each server individually. Therefore, it is possible for a single device to be approved for several servers but not for all servers.



Each server has a name, brand, and IP address. Within the IT department’s facilities are a number of climate-controlled server rooms where the physical servers can be located. Which room each server is in should also be recorded. Further, it is necessary to track which operating system is being used on each server. Some servers are virtual servers and some are physical servers. If a server is a virtual server, then the system should track which physical server it is running on. A single physical server can host many virtual servers, but each virtual server is hosted on only one physical server. Only physical servers can host a virtual server. In other words, one virtual server cannot host another virtual server. Not all physical servers host a virtual server.



A server will normally have many devices that are approved to access the server, but it is possible for new servers to be created that do not yet have any approved devices. When a device is approved for connection to a server, the date of that approval should be recorded. It is also possible for a device that was approved for a server to lose its approval. If that happens, the date that the approval was removed should be recorded. If a device loses its approval, it may regain that approval at a later date if whatever circumstance that lead to the removal is resolved.



A server can provide many user services, such as email, chat, homework managers, and others. Each service on a server has a unique identification number and name. The date that GUTS began offering that service should be recorded. Each service runs on only one server although new servers might not offer any services initially. Client-side services are not tracked in this system so every service must be associated with a server.



Employees must get permission to access a service before they can use it. Most employees have permissions to use a wide array of services, but new employees might not have permission on any service. Each service can support multiple approved employees as users, but new services might not have any approved users at first. The date on which the employee is approved to use a service is tracked by the system. The first time an employee is approved to access a service, the employee must create a username and password. This will be the same username and password that the employee will use for every service for which the employee is eventually approved.

Answer: The solution for this case is shown in Figure P5.10 below.

168

FIGURE P5.10 Global Unified Technology Sales ERD Solution

158. Global Computer Solutions (GCS) is an information technology consulting company with many offices throughout the United States. The company’s success is based on its ability to maximize its resources—that is, its ability to match highly skilled employees with projects according to region. To better manage its projects, GCS has contacted you to design a database so GCS managers can keep track of their customers, employees, projects, project schedules, assignments, and invoices.

169

The GCS database must support all of GCS’s operations and information requirements. A basic description of the main entities follows: 

The employees of GCS must have an employee ID, a last name, a middle initial, a first name, a region, and a date of hire recorded in the system.



Valid regions are as follows: Northwest (NW), Southwest (SW), Midwest North (MN), Midwest South (MS), Northeast (NE), and Southeast (SE).



Each employee has many skills, and many employees have the same skill.



Each skill has a skill ID, description, and rate of pay. Valid skills are as follows: Data Entry I, Data Entry II, Systems Analyst I, Systems Analyst II, Database Designer I, Database Designer II, Java I, Java II, C++ I, C++ II, Python I, Python II, ColdFusion I, ColdFusion II, ASP I, ASP II, Oracle DBA, MS SQL Server DBA, Network Engineer I, Network Engineer II, Web Administrator, Technical Writer, and Project Manager. Table P5.11A shows an example of the Skills Inventory.

TABLE P5.11A Skill

Employee

Data Entry I

Seaton Amy; Williams Josh; Underwood Trish

Data Entry II

Williams Josh; Seaton Amy

Systems Analyst I

Craig Brett; Sewell Beth; Robbins Erin; Bush Emily; Zebras Steve

Systems Analyst II

Chandler Joseph; Burklow Shane; Robbins Erin

DB Designer I

Yarbrough Peter; Smith Mary

DB Designer II

Yarbrough Peter; Pascoe Jonathan

Java I

Kattan Chris; Epahnor Victor; Summers Anna; Ellis Maria

Java II

Kattan Chris; Epahnor Victor, Batts Melissa

C++ I

Smith Jose; Rogers Adam; Cope Leslie

C++ II

Rogers Adam; Bible Hanah

Python I

Zebras Steve; Ellis Maria

Python II

Zebras Steve; Newton Christopher

ColdFusion I

Duarte Miriam; Bush Emily

ColdFusion II

Bush Emily; Newton Christopher

ASP I

Duarte Miriam; Bush Emily

170

Skill

Employee

ASP II

Duarte Miriam; Newton Christopher

Oracle DBA

Smith Jose; Pascoe Jonathan

SQL Server DBA

Yarbrough Peter; Smith Jose

Network Engineer I

Bush Emily; Smith Mary

Network Engineer II

Bush Emily; Smith Mary

Web Administrator

Bush Emily; Smith Mary; Newton Christopher

Technical Writer

Kilby Surgena; Bender Larry

Project Manager

Paine Brad; Mudd Roger; Kenyon Tiffany; Connor Sean



GCS has many customers. Each customer has a customer ID, name, phone number, and region.



GCS works by projects. A project is based on a contract between the customer and GCS to design, develop, and implement a computerized solution. Each project has specific characteristics such as the project ID, the customer to which the project belongs, a brief description, a project date (the date the contract was signed), an estimated project start date and end date, an estimated project budget, an actual start date, an actual end date, an actual cost, and one employee assigned as the manager of the project.



The actual cost of the project is updated each Friday by adding that week’s cost to the actual cost. The week’s cost is computed by multiplying the hours each employee worked by the rate of pay for that skill.



The employee who is the manager of the project must complete a project schedule, which effectively is a design and development plan. In the project schedule (or plan), the manager must determine the tasks that will be performed to take the project from beginning to end. Each task has a task ID, a brief task description, starting and ending dates, the types of skills needed, and the number of employees (with the required skills) needed to complete the task. General tasks are the initial interview, database and system design, implementation, coding, testing, and final evaluation and sign-off. For example, GCS might have the project schedule shown in Table P5.11B.

171

TABLE P5.11B Project ID:

Description: Sales Management System

Company :

See Rocks

Contract Date: 2/12/2022

Region: NW

Start Date:

3/1/2022

End Date:

Budget: $15,500

Start Date

End Date

Task Description

Skill(s) Required

3/1/18

3/6/22

Initial Interview

Project Manager

Systems Analyst II

DB Designer I

7/1/2022

Quantity Required

3/11/18

3/15/22

Database Design

DB Designer I

3/11/18

4/12/22

System Design

Systems Analyst II

Systems Analyst I

3/18/18

3/22/22

Database Implementation

Oracle DBA

3/25/18

5/20/22

System Coding & Testing

Java I

Java II

Oracle DBA

3/25/18

6/7/22

System Documentation

Technical Writer

6/10/18

6/14/22

Final Evaluation

Project Manager

Systems Analyst II

DB Designer I

Java II

Project Manager

Systems Analyst II

DB Designer I

Java II

Project Manager

6/17/18

7/1/18

6/21/22

7/1/22

On site System Online and Data Loading

Sign-Off

172



GCS pools all of its employees by region; from this pool, employees are assigned to a specific task scheduled by the project manager. For example, in the first project’s schedule, you know that a Systems Analyst II, Database Designer I, and Project Manager are needed for the period from 3/1/22 to 3/6/22. The project manager is assigned when the project is created and remains for the duration of the project. Using that information, GCS searches the employees who are located in the same region as the customer, matches the skills required, and assigns the employees to the project task.



Each project schedule task can have many employees assigned to it, and a given employee can work on multiple project tasks. However, an employee can work on only one project task at a time. For example, if an employee is already assigned to work on a project task from 2/20/22 to 3/3/22, the employee cannot work on another task until the current assignment is closed (ends). The date that an assignment is closed does not necessarily match the ending date of the project schedule task because a task can be completed ahead of or behind schedule.



Given all of the preceding information, you can see that the assignment associates an employee with a project task, using the project schedule. Therefore, to keep track of the assignment, you require at least the following information: assignment ID, employee, project schedule task, assignment start date, and assignment end date. The end date could be any date, as some projects run ahead of or behind schedule. Table P5.11C shows a sample assignment form.

TABLE P5.11C Project ID: 1

Description: Sales Management System

Company: See Rocks

Contract Date: 2/12/2022

SCHEDULED

As of: 03/29/22 ACTUAL ASSIGNMENTS

Project Task

Start Date

End Date

Skill

Employee

Start Date

End Date

Initial Interview

3/1/22

3/6/22

Project Mgr.

101—Connor S.

3/1/22

3/6/22

Sys. Analyst II

102—Burklow S.

3/1/22

3/6/22

DB Designer I

103—Smith M.

3/1/22

3/6/22 3/14/22

Database Design

3/11/22

3/15/22

DB Designer I

104—Smith M.

3/11/22

System Design

3/11/22

4/12/22

Sys. Analyst II

105—Burklow S.

3/11/22

Sys. Analyst I

106—Bush E.

3/11/22

Sys. Analyst I

107—Zebras S.

3/11/22

Database Implementation

3/18/22

3/22/22

Oracle DBA

108—Smith J.

3/15/22

System Coding & Testing

3/25/22

5/20/22

Java I

109—Summers A.

3/21/22

3/19/22

173

Java I

110—Ellis M.

3/21/22

Java II

111—Ephanor V.

3/21/22

Oracle DBA

112—Smith J.

3/21/22

113—Kilby S.

3/25/22

System Documentation

3/25/22

6/7/22

Tech. Writer

Final Evaluation

6/10/22

6/14/22

Project Mgr. Sys. Analyst II DB Designer I Java II

On site System Online and Data Loading

6/17/22

6/21/22

Project Mgr. Sys. Analyst II DB Designer I Java II

Sign-Off

7/1/22

Project Mgr.

(Note: The assignment number is shown as a prefix of the employee name—e.g., 101 or 102.) Assume that the assignments shown previously are the only ones as of the date of this design. The assignment number can be any number that matches your database design. 

Employee work hours are kept in a work log, which contains a record of the actual hours worked by employees on a given assignment. The work log is a form that the employee fills out at the end of each week (Friday) or at the end of each month. The form contains the date, which is either the current Friday of the month or the last workday of the month if it does not fall on a Friday. The form also contains the assignment ID, the total hours worked either that week or up to the end of the month, and the bill number to which the work-log entry is charged. Obviously, each work-log entry can be related to only one bill. A sample list of the current work-log entries for the first sample project is shown in Table P5.11D.

174

TABLE P5.11D Employee Name

Week Ending

Assignment Number

Burklow S.

3/1/22

1-102

xxx

Connor S.

3/1/22

1-101

xxx

Smith M.

3/1/22

1-103

xxx

Burklow S.

3/8/22

1-102

xxx

Connor S.

3/8/22

1-101

xxx

Smith M.

3/8/22

1-103

xxx

Burklow S.

3/15/22

1-105

xxx

Bush E.

3/15/22

1-106

xxx

Smith J.

3/15/22

1-108

xxx

Smith M.

3/15/22

1-104

xxx

Zebras S.

3/15/22

1-107

xxx

Burklow S.

3/22/22

1-105

Bush E.

3/22/22

1-106

Ellis M.

3/22/22

1-110

Ephanor V.

3/22/22

1-111

Smith J.

3/22/22

1-108

Smith J.

3/22/22

1-112

Summers A.

3/22/22

1-109

Zebras S.

3/22/22

1-107

Burklow S.

3/29/22

1-105

Bush E.

3/29/22

1-106

Ellis M.

3/29/22

1-110

Hours Worked

Bill Number

175

Employee Name

Week Ending

Assignment Number

Ephanor V.

3/29/22

1-111

Kilby S.

3/29/22

1-113

Smith J.

3/29/22

1-112

Summers A.

3/29/22

1-109

Zebras S.

3/29/22

1-107

Hours Worked

Bill Number

Note: xxx represents the bill ID. Use the one that matches the bill number in your database. 

Finally, every 15 days, a bill is written and sent to the customer for the total hours worked on the project during that period. When GCS generates a bill, it uses the bill number to update the work-log entries that are part of the bill. In summary, a bill can refer to many work-log entries, and each work-log entry can be related to only one bill. GCS sent one bill on 3/15/22 for the first project (SEE ROCKS), totaling the hours worked between 3/1/22 and 3/15/22. Therefore, you can safely assume that there is only one bill in this table and that the bill covers the work-log entries shown in the preceding form.

Your assignment is to create a database that fulfills the operations described in this problem. The minimum required entities are employee, skill, customer, region, project, project schedule, assignment, work log, and bill. (There are additional required entities that are not listed.) 

Create all of the required tables and required relationships.



Create the required indexes to maintain entity integrity when using surrogate primary keys.



Populate the tables as needed, as indicated in the sample data and forms.

This is a complex database design case that requires the identification of many business rules, the organization of those business rules, and the development of a complete database model. Note that this database design case has three primary objectives: 

Evaluation of primary keys and surrogate keys. (When should each one be used?)



Evaluation of the use of indexes on candidate keys to avoid duplicate entries when using surrogate keys.



Evaluation of the use of redundant relationships. In some cases, it is better to have the foreign key attribute added to an entity, instead of using multiple join operations.

We recommend that you use this problem as the basis for a two-part case project. One way to work with this database case is to form small groups of two or three students and then let each group work the problem independently. The following bullet list provides a sample scenario: 

Divide the class in groups of three students per group.



Distribute the GCS database case to all students.



Assign a deadline for the groups to submit an initial design ERD with written explanations of the ERD components and features. This deadline should be two

176

weeks from the assignment date. (While the groups are working on the design phase, students will be learning to use SQL to generate information.) 

The initial ERD must include:  All the main entities with all primary/foreign keys clearly labeled.  The identification of all relevant dependent attributes.  For each table, the identification of all possible required indexes.



Meet with each group and evaluate each design, paying close attention to:  The propagation of primary/foreign keys and how surrogate keys would be useful to simplify the design.  The use of indexes to minimize the occurrence of duplicate entries.  By this time, students should be familiar with SQL. Ask questions about how a query would be written to generate information. You can use the sample queries provided in the GCSdata-sol.mdb teacher solution file. (This database is located on your Instructor’s CD.)

Please note that there are two database files available: 

The GCSdata.mdb database is located in the Student subfolder on the Instructor’s CD. This MS Access database contains the sample CUSTOMER, EMPLOYEE, REGION, and SKILL tables. You can either distribute this file to your students by copying it to a common drive in your lab or you can ask your students to download this file from the Course Technology website for this book.



The GCSdata-sol.mdb database is located in the Teacher subfolder on the Instructor’s CD. This MS Access database contains the complete set of populated tables. In addition, the solution database contains some sample queries. You can use the sample queries as the basis for second part of this case, which may be used to complement the SQL coverage in Chapters 7 and 8.

Figure P5-11A shows the sample tables in the GCSdata.mdb student database.

177

FIGURE P5-11A GCS Student Sample Database Tables

The GCSdata-sol.mdb file contains the solution for this design case. Figure P5-11B shows the relational diagram for the solution.

178

FIGURE P5.11B Relational Diagram for the GCS Database

To help your students understand the ERD, use Table P5.11 to describe the main tables and the main indexes that are appropriate for this design implementation.

TABLE P5.11 ERD Documentation Table Name

Primary key

Unique, Not Null Index (on candidate key)

Explanation

Customer

cus_id (surrogate)

unique(cus_name)

The unique index on cus_name is used to ensure no duplicate customers exist.

Region

region_id (surrogate)

unique(region_name)

The unique index on region_name is used to ensure that no duplicate regions are entered.

Employee

emp_id (surrogate)

unique(emp_lname, emp_fname, emp_mi)

The unique index on emp_lname, emp_fname, and emp_mi is used to ensure that no duplicate employees are entered.

Skill

skill_id (surrogate)

unique(skill_description)

The unique index on skill_description is used to ensure that no duplicate skills are entered.

179

Table Name

Primary key

Unique, Not Null Index (on candidate key)

skill_id

Explanation

EmpSkill

emp_id, (composite)

The composite primary key ensures that no duplicate skills are entered for each employee.

Project

prj_id (surrogate)

unique(cus_id, prj_description)

The unique index on cus_id and prj_description is used to ensure that no duplicate project entries exist for a given customer.

Task (project schedule)

task_id (surrogate)

unique(prj_id, task_descript)

The unique index on prj_id and task_descript is used to ensure that no duplicate task is given for the same project.

TS (task ts_id (surrogate) schedule)

unique(task_id, skill_id)

The unique index on task_id and skill_id is to prevent duplicate listings for a single skill within a single task for a single project.

Assign

asn_id (surrogate)

unique (ps_id, emp_id, ts_id)

The unique index on ps_id, emp_id, and ts_id is used to ensure that an employee cannot be assigned twice to perform the same skill on the same task for a given project.

Worklog

wl_id (surrogate)

unique(asn_id, wl_date)

The unique indexes on asn_id and wl_date are used to ensure that no duplicate work log entries exist (for an employee) on a given date.

Bill

bill_id (surrogate) It is important to point out to your students that the surrogate primary keys are usually not shown in the graphical user interfaces that are available to the end users. The only function of the surrogate primary key is to provide a single-attribute identifier for each row in the table. Answer: The completed ERD for the GCS database is shown in Figure P5-11C.

180

FIGURE P5.11C ERD for the GCS Database

181

TABLE OF CONTENTS Answers to Review Questions .................................................................................................1 Answers to Problems .............................................................................................................10

ANSWERS TO REVIEW QUESTIONS 159. What is normalization? Answer: Normalization is the process for assigning attributes to entities. Properly executed, the normalization process eliminates uncontrolled data redundancies, thus eliminating the data anomalies and the data integrity problems that are produced by such redundancies. Normalization does not eliminate data redundancy; instead, it produces the carefully controlled redundancy that lets us properly link database tables. 160. When is a table in 1NF? Answer: A table is in 1NF when all the key attributes are defined (no repeating groups in the table) and when all remaining attributes are dependent on the primary key. However, a table in 1NF still may contain partial dependencies, that is, dependencies based on only part of the primary key and/or transitive dependencies that are based on a nonkey attribute. 161. When is a table in 2NF? Answer: A table is in 2NF when it is in 1NF and it includes no partial dependencies. However, a table in 2NF may still have transitive dependencies, that is, dependencies based on attributes that are not part of the primary key. 162. When is a table in 3NF? Answer: A table is in 3NF when it is in 2NF and it contains no transitive dependencies. 163. When is a table in BCNF? Answer: A table is in Boyce-Codd Normal Form (BCNF) when it is in 3NF and every determinant in the table is a candidate key. For example, if the table is in 3NF and it contains a nonprime attribute that determines a prime attribute, the BCNF requirements are not met. (Reference the text’s Figure 6.8 to support this discussion.) This description clearly yields the following conclusions: 

If a table is in 3NF and it contains only one candidate key, 3NF and BCNF are equivalent.



BCNF can be violated only if the table contains more than one candidate key. Putting it another way, there is no way that the BCNF requirement can be violated if there is only one candidate key.

164. Given the dependency diagram shown in Figure Q6.6, Answer Items 6a–6c. Answer:

FIGURE Q6.6 Dependency Diagram for Question 6

182

a. Identify and discuss each of the indicated dependencies. C1  C2 represents a partial dependency, because C2 depends only on C1, rather than on the entire primary key composed of C1 and C3. C4  C5 represents a transitive dependency, because C5 depends on an attribute (C4) that is not part of a primary key. C1, C3  C2, C4, C5 represents a set of proper functional dependencies, because C2, C4, and C5 depend on the primary key composed of C1 and C3. b. Create a database whose tables are at least in 2NF, showing the dependency diagrams for each table. The normalization results are shown in Figure Q6.6b.

183

FIGURE Q6.6b The Dependency Diagram for Question 6b Table 1

Primary key: C1 Foreign key: None Normal form: 3NF

Table 2 C1

Primary key: C1 + C3 Foreign key: C1 (to Table 1) Normal form: 2NF, because the table exhibits the transitive dependencies C4 C5

c. Create a database whose tables are at least in 3NF, showing the dependency diagrams for each table. The normalization results are shown in Figure Q6.6c.

FIGURE Q6.6c The Dependency Diagram for Question 6c

Table 1 Primary key: C1 Foreign key: None Normal form: 3NF

Table 2 Primary key: C1 + C3 Foreign key: C1 (to Table 1) C4 (to Table 3) Normal form: 3NF

Table 3 Primary key: C4 Foreign key: None Normal form: 3NF

184

165. The dependency diagram in Figure Q6.7 indicates that authors are paid royalties for each book they write for a publisher. The amount of the royalty can vary by author, by book, and by edition of the book. Answer:

FIGURE Q6.7 Book Royalty Dependency Diagram

Students may have questions about the last sentence in the problem statement. Illustrate to the student the following facts to clarify this problem: If a book can only have one author, then you can imply that knowing the ISBN, you also know the author and the royalty rate. However, it is very common for a book to have multiple authors, and in that case, the authors may have the same or different royalty rates. The main point of this design is the flexibility of it. Flexibility is important because it “future-proofs” the data model by allowing the model to support changes in the business rules in the future. a. Based on the dependency diagram, create a database whose tables are at least in 2NF, showing the dependency diagram for each table. The normalization results are shown in Figure Q6.7a.

185

FIGURE Q6.7a The 2NF Normalization Results for Question 7a

b. Create a database whose tables are at least in 3NF, showing the dependency diagram for each table. The normalization results are shown in Figure Q6.7b.

FIGURE Q6.7b The 3NF Normalization Results for Question 7b

186

166. The dependency diagram in Figure Q6.8 indicates that a patient can receive many prescriptions for one or more medicines over time. Based on the dependency diagram, create a database whose tables are in at least 2NF, showing the dependency diagram for each table.

Answer: FIGURE Q6.8 Prescription Dependency Diagram

The normalization results are shown in Figure Q6.8a.

FIGURE Q6.8a The 2NF Normalization Results for Question 8

167. What is a partial dependency? With what normal form is it associated? Answer: A partial dependency exists when an attribute is dependent on only a portion of the primary key. This type of dependency is associated with 1NF. The second normal form (2NF) eliminates partial dependencies.

187

168. What three data anomalies are likely to be the result of data redundancy? How can such anomalies be eliminated? Answer: The most common anomalies considered when data redundancy exists are: update anomalies, addition anomalies, and deletion anomalies. All these can easily be avoided through data normalization. Data redundancy produces data integrity problems, caused by the fact that data entry failed to conform to the rule that all copies of redundant data must be identical. 169. Define and discuss the concept of transitive dependency. Answer: Transitive dependency is a condition in which an attribute is dependent on another attribute that is not part of the primary key. This kind of dependency usually requires the decomposition of the table containing the transitive dependency. To remove a transitive dependency, the designer must perform the following actions: 

Place the attributes that create the transitive dependency in a separate table.



Make sure that the new table’s primary key attribute is the foreign key in the original table.

Figure Q6.11 shows an example of a transitive dependency removal.

FIGURE Q6.11 Transitive Dependency Removal

188

170. What is a surrogate key, and when should you use one? Answer: A surrogate key is an artificial PK introduced by the designer with the purpose of simplifying the assignment of primary keys to tables. Surrogate keys are usually numeric, they are often automatically generated by the DBMS, they are free of semantic content (they have no special meaning), and they are usually hidden from the end users. 171. Why is a table whose primary key consists of a single attribute automatically in 2NF when it is in 1NF? Answer: A dependency based on only a part of a composite primary key is called a partial dependency. Therefore, if the PK is a single attribute, there can be no partial dependencies. 172. How would you describe a condition in which one attribute is dependent on another attribute when neither attribute is part of the primary key? Answer: This condition is known as a transitive dependency. A transitive dependency is a dependency of one nonprime attribute on another nonprime attribute. (The problem with transitive dependencies is that they still yield data anomalies.) 173. Suppose someone tells you that an attribute that is part of a composite primary key is also a candidate key. How would you respond to that statement? Answer: This argument is incorrect if the composite PK contains no redundant attributes. If the composite primary key is properly defined, all of the attributes that compose it are required to identify the remaining attribute values. By definition, a candidate key is one that can be used to identify all of the remaining attributes, but it was not chosen to be a PK for some reason. In other words, a candidate key can serve as a primary key, but it was not chosen for that task for one reason or another. Clearly, a part of a proper (“minimal”) composite PK cannot be used as a PK by itself. More formally, you learned in Chapter 3, “The Relational Database Model,” Section 3-2, that a candidate key can be described as a superkey without redundancies, that is, a minimal superkey. Using this distinction, note that a STUDENT table might contain the composite key STU_NUM, STU_LNAME This composite key is a superkey, but it is not a candidate key because STU_NUM by itself is a candidate key! The combination STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE might also be a candidate key, as long as you discount the possibility that two students share the same last name, first name, initial, and phone number. If the student’s Social Security number had been included as one of the attributes in the STUDENT table—perhaps named STU_SOCSECNUM—both it and STU_NUM would have been candidate keys because either one would uniquely identify each student. In that case, the selection of STU_NUM as the primary key would be driven by the designer’s choice or by end-user requirements. Note, incidentally, that a primary key is a superkey as well as a candidate key. 174. A table is in ______ normal form when it is in ______ and there are no transitive dependencies. Answer: See the discussion in Section 6-3c, “Conversion to Third Normal Form (3NF).”

189

ANSWERS TO PROBLEMS Using the descriptions of the attributes given in the figure, convert the ERD shown in Figure P6.1 into a dependency diagram that is in at least 3NF. Answer: An initial dependency diagram depicting only the primary key dependencies is shown in Figure P6.1a below.

FIGURE P6.1a Initial Dependency Diagram for Problem 1

There are no composite keys being used, therefore, by definition, there is not an issue with partial dependencies and the entities are already in 2NF. Based on the descriptions of the attributes, it appears that the patient name, phone number, and address can be determined by the patient id number. Therefore, the following transitive dependency can be determined. App_PatientID  (App_Name, App_Phone, App_Street, App_City, App_State, App_Zip) As discussed in the chapter, ZIP_Codes can be used to determine a city and state; therefore, we also have the transitive dependency: App_Zip  App_City, App_State Figure P6.1b depicts the dependency diagram with these transitive dependencies included.

190

FIGURE P6.1b Revised Dependency Diagram for Problem 1

Since the first transitive dependency completely encloses the second transitive dependency, it is appropriate to resolve the first transitive dependency before resolving the second. Figure P6.1c shows the results of resolving the first transitive dependency.

FIGURE P6.1c Resolving the First Transitive Dependency

Finally, the second and final transitive dependency can now be resolved as shown in the final dependency diagram in Figure P6.1d.

191

FIGURE P6.1d Final Dependency Diagram for Problem 1

Note that at this time, we have resolved all of the transitive dependencies. Decisions on whether or not to denormalize, and perhaps not remove the final transitive dependency, have yet to be made. Also, the structures have not yet had the benefit of additional design modifications such as achieving proper naming conventions for the attributes in the new tables. However, creating the fully normalized structures is an important set toward making informed decisions about the compromises in the design that we may choose to make. NOTE: Please note that we are making the assumption that a zip code only determines one city and state. Unfortunately, this is not true, there are a handful of zip codes that traverse states. In these cases, it would be appropriate not to use the [App_zip, App_City, App_State] relation and instead add these attributes to the previous relation. Hence, the relation would be: [App_PatiendID, App_Name, App_Phone, App_Street, App_City, App_Zip, App_State]. 175. Using the descriptions of the attributes given in the figure, convert the ERD shown in Figure P6.2 into a dependency diagram that is in at least 3NF. Answer: An initial dependency diagram depicting only the primary key dependencies is shown in Figure P6.2a below.

192

FIGURE P6.2a Initial Dependency Diagram for Problem 2

Based on the descriptions of the attributes given, the following partial dependency can be determined: Pres_SessionNum  (Pres_Date, Pres_Room) Also, the following transitive dependencies can be determined: Pres_AuthorID  (Pres_FName, Pres_LName) Figure P6.2b shows the revised dependency diagram including the partial and transitive dependencies.

FIGURE P6.2b Revised Dependency Diagram for Problem 2

Resolving the partial dependency to achieve 2NF yields the dependency diagram shown in Figure P6.2c.

193

FIGURE P6.2c 2NF Dependency Diagram for Problem 2

Finally, the transitive dependency is resolved to achieve the 3NF solution shown in the final dependency diagram in Figure P6.2d.

194

FIGURE P6.2d Final Dependency Diagram for Problem 2

195

176. Using the INVOICE table structure shown in Table P6.3, do the following: Answer:

TABLE P6.3 Attribute Name

Sample Value

INV_NUM

211347

211348

211349

PROD_NUM

AA-E3422QW

QD-300932X

RU-995748G

AA-E3422QW

GH-778345P

SALE_DATE

15-Jan-2022

16-Jan-2022

PROD_LABEL

Rotary sander

0.25-in. bit

Rotary sander

Power drill

VEND_CODE

211

309

211

157

VEND_NAME

NeverFail, Inc.

BeGood, Inc.

NeverFail, Inc.

ToughGo, Inc.

QUANT_SOLD 1

PROD_PRICE

$3.45

$39.99

$49.95

$87.75

$49.95

drill Band saw

a. Write the relational schema, draw its dependency diagram and identify all dependencies, including all partial and transitive dependencies. You can assume that the table does not contain repeating groups and that an invoice number references more than one product. (Hint: This table uses a composite primary key.) Answer: The solutions to both problems (3a and 3b) are shown in Figure P6.3a.

NOTE We have combined the solutions to Problems 3a and 3b to let you illustrate the start of the normalization process within a single PowerPoint slide. Students generally seem to have an easier time understanding the normalization process if they can compare the normal forms directly. We will continue to use this technique for several of the initial normalization decompositions if the available PowerPoint slide space permits it. b. Remove all partial dependencies, write the relational schema, and draw the new dependency diagrams. Identify the normal forms for each table structure you created.

196

NOTE You can assume that any given product is supplied by a single vendor but a vendor can supply many products. Therefore, it is proper to conclude that the following dependency exists: PROD_NUM VEND_NAME

→

PROD_DESCRIPTION,

PROD_PRICE,

VEND_CODE,

(Hint: Your actions should produce three dependency diagrams.)

FIGURE P6.3a The Dependency Diagrams for Problems 3a and 3b

c. Remove all transitive dependencies, write the relational schema, and draw the new dependency diagrams. Also identify the normal forms for each table structure you created. Answer: To illustrate the effect of Problem 3’s complete decomposition, we have shown Problem 3a’s dependency diagram again in Figure P6.3c.

197

FIGURE P6.3c The Dependency Diagram for Problem 3c

d. Draw the Crow’s Foot ERD.

NOTE Emphasize that, because the dependency diagrams cannot show the nature (1:1, 1:M, M:N) of the relationships, the ER Diagrams remain crucial to the design effort. Complex design is impossible to produce successfully without some form of modeling, be it ER, Semantic Object Modeling, or some other modeling methodology. Yet, as the preceding decompositions demonstrate, the dependency diagrams are a valuable addition to the designer’s toolbox. (Normalization is likely to suggest the existence of entities that may not have been considered during the modeling process.) And, if information or transaction management issues require the existence of attributes that create other than 3NF or BCNF conditions, the proper dependency diagrams will at least force awareness of these conditions.

198

Answer: The invoicing ERD, accompanied by its relational diagram, is shown in Figure P6.3d. (The relational diagram only includes the critical PK and FK components, plus a few sample attributes, for space considerations.)

FIGURE P6.3d The Invoicing ERD and Its (Partial) Relational Diagram

Crow’s Foot Invoicing ERD

Invoicing Relational Diagram, Sample Attributes INVOICE INV_NUM INV_DATE

LINE 1

INV_NUM PROD_NUM NUM_SOLD

PRODUCT 1 M

PROD_NUM

VEND_CODE

PROD_DESCRIPTION PROD_PRICE

VENDOR

VEND_NAME M

VEND_CODE

199

177. Using the STUDENT table structure shown in Table P6.4, do the following: Answer:

TABLE P6.4 Attribute Name

Sample Value

STU_NUM

211343

200128

199876

198648

223456

STU_LNAME

Stephanos

Smith

Jones

Ortiz

McKulski

STU_MAJOR

Accounting

Marketing Marketing

Statistics

DEPT_CODE

ACCT

MKTG

MATH

DEPT_NAME

Accounting

Marketing Marketing

Mathematics

DEPT_PHONE

4356

4378

3420

COLLEGE_NAME

Business Admin

Arts & Sciences

ADVISOR_LNAME

Grastrand

Gentry

Tillery

Chen

ADVISOR_OFFICE

T201

T228

T356

J331

ADVISOR_BLDG

Torre Building

Jones Building

ADVISOR_PHONE

2115

2123

2159

3209

STU_GPA

3.87

2.78

2.31

3.45

3.58

STU_HOURS

117

113

STU_CLASS

Junior

Sophomore

Senior

Junior

MKTG

a. Write the relational schema and draw its dependency diagram. Identify all dependencies, including all transitive dependencies. Answer: The dependency diagram for Problem 4a is shown in Figure P6.4a.

200

FIGURE P6.4a The Dependency Diagram for Problem 4a

STU_NUM STU_LNAME STU_MAJOR DEPT_CODE DEPT_NAME DEPT_PHONE COLLEGE_NAME

Transitive Dependencies

ADV_LASTNAME ADV_OFFICE ADV_BUILDING ADV_PHONE STU_CLASS STU_GPA STU_HOURS

Transitive Dependency

Note 1: The ADV_LASTNAME is not a determinant of ADV_OFFICE or ADV_PHONE, because there are (potentially) many advisors who have the same last name. Note 2: If a department has only one phone, DEPT_CODE is a determinant of DEPT_PHONE. If a department has several phones, the DEPT_CODE is no longer a determinant of DEPT_PHONE. In any case, if you know the DEPT_PHONE value, you know the DEPT_CODE value. Therefore, DEPT_PHONE is a determinant of DEPT_CODE. This latter dependency, indicated in orange, sets the stage for a BCNF violation when the initial structure is normalized. Note 3: ADV_OFFICE is a determinant of ADV_BUILDING if the ADV_OFFICE is , in effect, a code. For example, if offices such as HE-201 and HE-324 use the prefix HE to indicate their location in the Heinz building, the office locators determine the building.

As you discuss Figure 6.4a, note that the single attribute PK (STU_NUM) automatically places this table in 2NF, because it is not possible to have partial dependencies when the PK consists of a single attribute. The relational schema for the dependency diagram shown in Figure P6.4a is written as: STUDENT(STU_NUM, STU_LNAME, STU_MAJOR, DEPT_CODE, DEPT_NAME, DEPT_PHONE, ADVISOR_LNAME, ADVISOR_OFFICE, ADVISOR_BLDG, ADVISOR_PHONE, STU_GPA, STU_HOURS, STU_CLASS) Notice the ADVISOR_OFFICE values in Figure P6.4a show a literal prefix that we can interpret represents the building. Furthermore, notice how the first letter matches the building name. Based on this we say that there is a transitive dependency in which ADVISOR_OFFICE determines ADVISOR_BLDG. b. Write the relational schema and draw the dependency diagram to meet the 3NF requirements to the greatest practical extent possible. If you believe that practical considerations dictate using a 2NF structure, explain why your decision to retain 2NF is appropriate. If necessary, add or modify attributes to create appropriate determinants and to adhere to the naming conventions.

201

NOTE Although the completed student hours (STU_HOURS) do determine the student classification (STU_CLASS), this dependency is not as obvious as you might initially assume it to be. For example, a student is considered a junior if the student has completed between 61 and 90 credit hours. Answer: The normalized structure is shown in Figure P6.4b. The relational schemas are written as: STUDENT(STU_NUM, STU_LNAME, STU_MAJOR, DEPT_CODE, ADVISOR_NUM, STU_GPA, STU_HOURS, STU_CLASS) (Note that we have added the ADVISOR_NUM to serve as an FK to the advisor attributes.) MAJOR(MAJOR_CODE, DEPT_CODE, MAJOR_DESCRIPTION) BUILDING(BLDG_CODE, BLDG_NAME, BLDG_MANAGER) DEPARTMENT(DEPT_CODE, DEPT_NAME, DEPT_PHONE, COLLEGE_CODE) COLLEGE(COLL_CODE, COLL_NAME) (After studying Chapter 4, “Entity Relationship (ER) Modeling,” your students should know enough about database design to suggest many improvements in the design before it can be implemented.)

202

FIGURE P6.4b The Normalized Dependency Diagrams for Problem 4b

STU_NUM STU_LNAME STU_MAJOR DEPT_CODE ADV_NUM STU_CLASS STU_GPA STU_HRS

Transitive Dependency

MAJOR_CODE DEPT_CODE MAJOR_DESCRIPTION BLDG_CODE BLDG_NAME BLDG_MANAGER DEPT_CODE DEPT_NAME DEPT_PHONE COLL_CODE COLL_CODE COLL_NAME Note: If a department has only one phone, DEPT_CODE is a determinant of DEPT_PHONE. If a department has several phones, the DEPT_CODE is not a determinant of the DEPT_PHONE. However, if you know a department phone number, you also know the DEPT_CODE ... thus creating a condition in which the BCNF requirement is not met.

Note: If several advisors share a phone, the ADV_PHONE is not a determinant of the other advisor attributes.

ADV_NUM ADV_LASTNAME ADV_OFFICE ADV_BUILDING ADV_PHONE Transitive Dependency Note: The ADV_NUM attribute was created to produce a proper primary key. The dotted transitive dependency line indicates that this dependency is subject to interpretation. (See the discussion in the IM text.)

As you discuss Figure P6.4b, explain that, in this case, the STUDENT table structure indicates a 2NF condition because two transitive dependencies exist. If there is an information requirement to track the components of each major, we can break out a major code, store it in STUDENT, create a new entity named MAJOR, and relate it to its department in a 1:M relationship. (Each department offers many majors, but only one department offers each major.) Creating a new entity to eliminate the student classification-induced transitive dependency increases implementation complexity needlessly; student hours are updated each semester by application software and other application software can then use a look-up table to update the classification when necessary. Structure simplicity is a virtue. In any case, the normalization diagram may be modified as shown next. (We have added a few attributes, such as BLDG_MANAGER, to improve the database’s ability to provide information.) Note that the assumptions inherent in the business rules also make an impact on normalization practices!

203

If the room is numbered to reflect the building it is in—for example, HE105 indicates room 105 in the Heinz building—one might argue that the ADV_OFFICE value is the determinant of the ADV_BUILDING. (You will learn in Chapter 7 that you can create a query to find a building by looking at room prefixes.) However, if you define dependencies in strictly relational algebra terms, you might argue that partitioning the attribute value to “create” a dependency indicates that the partitioned attribute is not (in that strict sense) a determinant. Although we have indicated a transitive dependency from ADV_OFFICE to ADV_BUILDING, we have used a dotted line to indicate that there is room for argument in this set of transitive dependencies. In any case, the (arguable) dependency ADV_OFFICE  ADV_BUILDING does not create any problems in a practical sense, so it is acceptable to ignore this (arguable) transitive dependency. Keep in mind that the decomposition shown in Figure P6.4b is subject to many modifications, depending on information requirements and business rules. For example, both the department and the college may be tied to the building in which they are located. Additional modifications are discussed in the answer to Problem 9. c. Using the results of Problem 4, draw the Crow’s Foot ERD.

NOTE This ERD constitutes a small segment of a university’s full-blown design. For example, this segment might be combined with the Tiny College presentation in Chapter 4. Answer: The Crow’s Foot ERD is shown in Figure P6.4c.

FIGURE P6.4c The College ERD

As you examine the ER diagrams in Figure P6.4c, note that we have made several assumptions that cannot be inferred directly from the dependency diagram in Problem 4b. For example: 

Apparently, some buildings do not house advisors. Some buildings may be used for

204

storage, others for classrooms, and so on. 

When a student is assigned to a department, that department must assign an advisor to that student. That is, a student must have an advisor. Therefore, ADVISOR is mandatory to STUDENT.



Evidently, some advisors do not (yet?) have students assigned to them. From an operational point of view, this optionality is desirable, because it enables us to create a new advisor without having to assign a student advisee to that new advisor. (The new advisor may have to receive some training before having students assigned to him or her.)



Some departments do not offer majors. For example, a department may offer service courses only.



Some colleges do not have departments. This condition is subject to a business rule that is not specified, nor can it be inferred from the dependency diagram. However, this characteristic is not unusual in a college environment. For example, some professional curricula are certified by special boards. Such boards may make certification conditional on the professional curriculum’s independence. (We have created the optionality for discussion purposes. This discussion should stress the importance of the business rules. You generate the business rules by asking detailed questions!)



All departments must be affiliated with a college.



Notice also the there is a new relationship, a DEPARTMENT employs zero or many ADVISORs.



STUDENT is optional to MAJOR. This optionality, too, is desirable from an operational point of view. For example, new majors may not (yet) have attracted students.

Business rules may change the nature of the structures shown here. For example, an advisor is likely to be a professor ... who is an employee of the university. Therefore, you might introduce a superset/subset relationship between EMPLOYEE and PROFESSOR, while the need to distinguish between professors and advisors disappears. Similarly, EMPLOYEE may be the source of information concerning the BUILDING manager, thus creating a relationship between BUILDING and EMPLOYEE. Note also that the nature of the relationships (1:1, 1:M, M:N) is not revealed by the dependency diagrams. For example, the 1:M relationship between MAJOR and DEPARTMENT (a department can offer many majors, but each major is offered by only one department) cannot be inferred from the dependency diagram. Normalization and ER modeling are part of the same design process! Finally, note that we have also included several new entities, MAJOR and BUILDING, to reflect the preceding discussion.

NOTE Remind your students that the order of the attribute listing in each entity is immaterial. Although it is customary to list the PK attribute first, there is no requirement to do so. Similarly, whether the STU_LNAME is listed before or after the STU_GPA has no effect on the STUDENT entity’s functionality.

205

178. To keep track of office furniture, computers, printers, and other office equipment, the FOUNDIT company uses the table structure shown in Table P6.5. Answer:

Table P6.5 Attribute Name

Sample Value

ITEM_ID

231134-678

342245-225

254668-449

ITEM_LABEL

HP DeskJet 895Cse

HP Toner

DT Scanner

ROOM_NUMBER

325

123

BLDG_CODE

NTC

CSF

BLDG_NAME

Nottooclear

Canseefar

BLDG_MANAGER

I. B. Rightonit

May B. Next

a. Given that information, write the relational schema and draw the dependency diagram. Make sure that you label the transitive and/or partial dependencies. Answer: The answers to this problem are shown in Figure P6.5a and the relational schema definition below the figure. Notice also the change in naming convention in some of the attributes. It is important to understand that sometimes the designer makes deductions based on the data as presented. However, such deductions may change as he/she learns more about the data and processes. For example, does the room number determine the building name? In this case, the answer depends on many factors. Can you determine the correct location of the item based on just the room number? Or do you also need the building code? These are the type of questions the designer must ask and adapt the model to the answer. The purpose is to build a model that is flexible enough to represent real-world data interactions. For this example, we will initially assume that the room number determines the building code and name.

FIGURE P6.5a The FOUNDIT Co. Initial Dependency Diagram

The dotted transitive dependency lines indicate that these transitive dependencies are subject to interpretation. We will address these dependencies in the discussion that accompanies Problem 5b’s solution. The relational schema may be written as follows: ITEM(ITEM_ID, ITEM_DESCRIPTION, ROOM_NUMBER, BLDG_CODE, BLDG_NAME, BLDG_MANAGER)

206

b. Write the relational schema and create a set of dependency diagrams that meet 3NF requirements. Rename attributes to meet the naming conventions and create new entities and attributes as necessary. Answer: The dependency diagrams are shown in Figure P6.5b. We have added a sample relational diagram to illustrate the relationships at this point. The relational schemas are written below in Figure 6.5b.

FIGURE P6.5b FOUNDIT Co. 3NF and Its Relational Diagram

The relational schemas are written as follows: ITEM (ITEM_ID, ITEM_DESCRIPTION, ROOM_NUMBER, BLDG_CODE) BUILDING (BLDG_CODE, BLDG_NAME, EMP_NUM) EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL)

207

Notice that in the diagram in Figure P6.5b, we now have determined that the location of an item is determined by the building code and room number. For reporting purposes, this change allows to easily generate a report of all items in each building. A building can have many rooms, so knowing the building code will not tell you what the room in that building is. If the room is numbered to reflect the building it is in—for example, HE105 indicates room 105 in the Heinz building—one might argue that the ROOM_NUMBER value is the determinant of the BLDG_CODE and the BLDG_NAME values. You will learn in Chapter 7, “Introduction to Structured Query Language (SQL),” that you can create a query to find a building by looking at room prefixes. However, if you define dependencies in strictly relational algebra terms, you might argue that partitioning the attribute value to “create” a dependency indicates that the partitioned attribute is not (in that strict sense) a determinant. Although initially, we had indicated a transitive dependency from ROOM_NUMBER to BLDG_CODE and BLDG_NAME, we used a dotted line to indicate that there is room for argument in this set of transitive dependencies. In any case, this (arguable) dependency does not create any problems in a practical sense, so we have not identified it in the solution. Clearly, BLDG_CODE is a determinant of BLDG_NAME. Therefore, the transitive dependency is marked properly in the Problem 5b solution. The dependency diagrams in Figure P6.5b reflect the notion that one employee manages each building. Thus, we have also renamed BLDG_MANAGER as EMP_NUM to reflect the fact that an employee is the manager of the building and we have added the EMPLOYEE entity. c. Draw the Crow’s Foot ERD. Answer: Use Figure P6.5c to show that, in this case, the ER diagram reflects the business rule that one employee can manage many (or at least more than one) buildings. Because all employees are not required to manage buildings, BUILDING is optional to EMPLOYEE in the manages relationship. Once again, the nature of this relationship is not and cannot be reflected in the dependency diagram.

NOTE We also assume here that each item has a unique item code and that, therefore, an item can be located in only one place at a time. However, we demonstrate in Appendixes B and C that inventory control requirements usually cover both durable and consumable items. Although durables such as tables, desks, lamps, computers, printers, and so on would be uniquely identified by an assigned inventory code, consumables such as individual reams of paper would clearly not be so identified. Therefore, a given inventory description such as “8.5 inch × 11 inch laser printer paper” could describe reams of paper located in many different buildings and in rooms within those buildings. We demonstrate in Appendixes B and C how such a condition may be properly handled.

208

FIGURE P6.5c The FOUNDIT Co. ERD

As you examine Figure P6.5c, note that an EMPLOYEE can manage zero to many BUILDINGs. A BUILDING contains many ROOMs. Each ROOM is located in a single building. Therefore, you can expand the design shown in Figure P6.5b to the one shown in Figure P6.5c. This solution assumes that a room is directly traceable to a building. For example, room SC-508 would be located in the Science (SC) Building and room BA-305 would be located in the Business Administration (BA) building. A ROOM can store zero or many ITEMs. Although optional participations make excellent default conditions, it is always wise to establish the optionality based on a business rule. In any case, the designer must ask about the nature of the room/building relationship. 179. The table structure shown in Table P6.6 contains many unsatisfactory components and characteristics. For example, there are several multivalued attributes, naming conventions are violated, and some attributes are not atomic. Answer:

209

Table P6.6 Attribute Name

Sample Value

EMP_NUM

1003

1018

1019

1023

EMP_LNAME

Willaker

Smith

McGuire

EMP_EDUCATION

BBA, MBA

BBA

JOB_CLASS

SLS

EMP_DEPENDENTS

Gerald (spouse), Mary (daughter), John (son)

DEPT_CODE

MKTG

DEPT_NAME

BS, MS, Ph.D. JNT

DBA

JoAnne (spouse)

George (spouse) Jill (daughter)

MKTG

SVC

INFS

Marketing

General Service

Info. Systems

DEPT_MANAGER

Jill H. Martin

Hank B. Jones

Carlos Ortez

EMP_TITLE

Sales Agent

Janitor

DB Admin

EMP_DOB

23-Dec-1968

28-Mar-1979

18-May-1982

20-Jul-1959

EMP_HIRE_DATE

14-Oct-1997

15-Jan-2006

21-Apr-2003

15-Jul-1999

EMP_TRAINING

L1, L2

L1, L3, L8, L15

EMP_BASE_SALARY

$38,255.00

$30,500.00

$19,750.00

$127,900.00

EMP_COMMISSION_RATE

0.015

0.010

a. Given the structure shown in Table P6.6, write the relational schema and draw its dependency diagram. Label all transitive and/or partial dependencies. Answer: The dependency diagram is shown in Figure P6.6a. Note that the order of the attributes has been changed to make the transitive dependencies easier to mark. (In any case, the order in which the attributes are written into a relational database table is immaterial.) The relational schema is written below in Figure P6.6a. Please note the change of name of the attribute EMP_NUM to EMP_CODE to illustrate that the employee identification could include character and numeric values. In addition, EMP_TITLE was changed to JOB_TITLE to indicate that the position determines the title.

210

FIGURE P6.6a The Dependency Diagram for Problem 6a

The relational schema is written as: EMPLOYEE (EMP_CODE, EMP_LNAME, EMP_EDUCATION, DEPT_CODE, DEPT_NAME, DEPT_MANAGER, EMP_DEPENDENTS, EMP_DOB, EMP_HIRE_DATE, EMP_TRAINING, JOB_TITLE, JOB_CLASS, EMP_BASE_SALARY, EMP_COMMISSION_RATE) b. Draw the dependency diagrams that are in 3NF. (Hint: You might have to create a few new attributes. Also make sure that the new dependency diagrams contain attributes that meet proper design criteria; that is, make sure there are no multivalued attributes, that the naming conventions are met, and so on.) Answer: Dependency diagrams have no way to indicate multivalued attributes, nor do they provide the means through which such attributes can be handled. Therefore, the solution to this problem requires a basic knowledge of modeling concepts, once again indicating that normalization and design are part of the same process. Given the sample data shown in Problem 6, EDUCATION, DEPENDENT, and TRAINING are multivalued attributes whose values are stored as comma-separated string values. We have created the appropriate entities to avoid the use of multivalued attributes. (See Figure P6.6b.)

211

FIGURE P6.6b The Dependency Diagrams for Problem 6b

At this time, it will be good to preface the discussion of Figure P6.6b reminding students that a real-world design would have to include additional entities or additional attributes. For example, the EMPLOYEE entity may include attributes such as employee experience—perhaps measured by time, longevity, and so on. And, of course, you might include year-to-date (YTD) earnings and taxes in each employee’s records, too. This problem is a great source of discussion material! Notice that the multivalued attributes for dependents, education, and training have been resolved using new entities (DEPENDENT, EMPEDU, and EMPTRN) in one-to-many relationships with EMPLOYEE. The new design also resolves the transitive dependencies for DEP_CODE and JOB_CLASS.

212

The relational schemas are written as: EMPLOYEE(EMP_CODE, EMP_LNAME, DEPT_CODE, JOB_CLASS, EMP_DOB, EMP_HIRE_DATE, EMP_BASE_SALARY, EMP_COMMISSION_RATE) DEPENDENT(EMP_CODE, DEP_NUM, DEP_FNAME, DEP_TYPE) EDUCATION(EDU_CODE, EDU_DESCRIPTION) EMPEDU(EMP_CODE, EDU_CODE, DATE_EARNED) TRAINING(TRN_CODE, TRN_DESCRIPTION) EMPTRN(EMP_CODE, TRN_CODE, DATE_EARNED) DEPARTMENT(DEPT_CODE, DEPT_NAME, EMP_CODE) JOB(JOB_CLASS, JOB_TITLE) c. Draw the relational diagram. Answer: The relational diagram is shown in Figure P6.6c.

FIGURE P6.6c The Relational Diagram for Problem 6c

d. Draw the Crow’s Foot ERD. Answer: The Crow’s Foot solution is shown in Figure P6.6d.

213

FIGURE P6.6d The Crow’s Foot ERD for Problem 6d

180. Suppose you are given the following business rules to form the basis for a database design. The database must enable the manager of a company dinner club to mail invitations to the club’s members, to plan the meals, to keep track of who attends the dinners, and so on. 

Each dinner serves many members, and each member may attend many dinners.



A member receives many invitations, and each invitation is mailed to many members.



A dinner is based on a single entree, but an entree may be used as the basis for many dinners. For example, a dinner may be composed of a fish entree, rice, and corn, or the dinner may be composed of a fish entree, a baked potato, and string beans.

Answer: Because the manager is not a database expert, the first attempt at creating the database uses the structure shown in Table P6.7.

Table P6.7 Attribute Name

Sample Value

MEMBER_NUM

214

235

214

MEMBER_NAME

Alice VanderVoort

MEMBER_ADDRESS

325 Meadow Park

123 Rose Court

325 Meadow Park

MEMBER_CITY

Murkywater

Highlight

Murkywater

MEMBER_ZIPCODE

12345

12349

12345

INVITE_NUM

B. Gerald M. Gallega

Alice VanderVoort

214

Attribute Name

Sample Value

INVITE_DATE

23-Feb-2022

12-Mar-2022

23-Feb-2022

ACCEPT_DATE

27-Feb-2022

15-Mar-2022

27-Feb-2022

DINNER_DATE

15-Mar-2022

17-Mar-2022

15-Mar-2022

DINNER_ATTENDED

Yes

DINNER_CODE

DI5

DI2

DINNER_DESCRIPTION

Glowing sea delight

Glowing delight

ENTREE_CODE

EN3

EN5

ENTREE_DESCRIPTION

Stuffed crab

Marinated steak

DESSERT_CODE

DE8

DE5

DE2

sea Ranch Superb

DESSERT_DESCRIPTION Chocolate mousse Cherries jubilee with raspberry sauce

Apple pie honey crust

with

a. Given the table structure illustrated in Table P6.7, write the relational schema and draw its dependency diagram. Label all transitive and/or partial dependencies. (Hint: This structure uses a composite primary key.) Answer: The last sentence of the problem indicates that the manager, who knows nothing about database design, attempted to use a composite key with the structure that was created. As such, the relational schema may be written as follows: MEMBER(MEMBER_NUM, INVITE_NUM, MEMBER_NAME, MEMBER_ADDRESS, MEMBER_CITY, MEMBER_ZIP_CODE, INVITE_DATE, ACCEPT_DATE, DINNER_DATE, DINNER_ATTENDED, DINNER_CODE, ENTRÉE_CODE, ENTRÉE_DESCRIPTION, DESSERT_CODE, DESSERT_DESCRIPTION) However, based on the data shown, we can see that each invitation has a unique identifier. As discussed in Chapter 3, the manager’s selected composite primary key is not acceptable because it is not a candidate key. If an attribute in a composite superkey can be removed and the remaining attributes still form a superkey, then the original composite key was not a candidate key. In this case, the manager’s choice of (MEMBER_NUM + INVITE_NUM) was not a candidate key because if we remove MEMBER_NUM from the composite, the remaining attribute INVITE_NUM is still a superkey. Improving the primary key selection based on this analysis of the keys, we can improve the design to 1NF. We can see that each invitation is to a specific user and for a specific dinner. So, the new relational schema will be:

215

INVITE(INVITE_NUM, MEMBER_NUM, MEMBER_NAME, MEMBER_ADDRESS, MEMBER_CITY, MEMBER_ZIP_CODE, INVITE_DATE, ACCEPT_DATE, DINNER_DATE, DINNER_ATTENDED, DINNER_CODE, ENTRÉE_CODE, ENTRÉE_DESCRIPTION, DESSERT_CODE, DESSERT_DESCRIPTION) The dependency diagram is shown in Figure P6.7a. Note that DIN_CODE in Figure P6.7a does not determine DIN_ATTEND; just because a dinner is offered does not mean that it is attended. Note also that we have shortened the prefixes—for example, MEMBER_ADDRESS has been shortened to MEM_ADDRESS—to provide sufficient space to include all the attributes.

FIGURE P6.7a The Dependency Diagram for Problem 7a

b. Break up the dependency diagram you drew in Problem 7a to produce dependency diagrams that are in 3NF and write the relational schema. (Hint: You might have to create a few new attributes. Also, make sure that the new dependency diagrams contain attributes that meet proper design criteria; that is, make sure there are no multivalued attributes, that the naming conventions are met, and so on.) Answer: Actually, there is no way to prevent the existence of multivalued attributes by merely following normalization rules. Instead, knowledge of ER modeling concepts will help define the environment in which the multivalued attributes are dealt with. Although we keep repeating the message, it is worth repeating: normalization and modeling fit within the same design spectrum and they take place concurrently as the definition of entities and their attributes take place.

216

The design process can be described thus: 

Define entities, attributes, and relationships and model them.



Normalize.



Redesign based on the normalization outcomes and the evaluation of the design’s ability to meet transaction and information requirements.



Normalize the results and evaluate the normal forms until the process has yielded a stable design, implementation, and applications development environment.

Such a process will yield the dependency diagrams shown in Figure P6.7b. In this case, it hardly seems practical to eliminate the 2NF condition displayed by MEMBER. After all, zip codes tend to be thought of as part of the address. Worse, the elimination of the MEMBER’s 2NF condition would require the creation of a ZIPCODE table, with ZIP_CODE as the foreign key in the MEMBER table. Such a solution would merely add complexity without adding functionality.

FIGURE P6.7b The Dependency Diagram for Problem 7b

217

As you examine Figure P6.7b, note how easy it is to see the functionality of the decomposition. For example, the INVITATION, DINNER and MEMBER entities make it possible to track who was sent an invitation on what date (INVITE_DATE) to a dinner to be held at some specified date (DIN_DATE), what dinner (DIN_CODE) would be served on that date, who (MEM_NUM) accepted the invitation (INVITE_ACCEPT), and who actually attended (INVITE_ATTEND). The INVITE_ACCEPT attribute would be a simple Y/N, as would be the INVITE_ATTEND. To avoid nulls, the default values for INVITE_ACCEPT and INVITE_ATTEND could be set to N. Getting the number of acceptances for a given dinner by a given date would be simple, thus enabling the catering service to plan the dinner better. The relational schemas follow: INVITATION(INVITE_NUM, INVITE_DATE, DIN_CODE, MEM_NUM, INVITE_ACCEPT, INVITE_ATTEND) MEMBER(MEM_NUM, MEM_NAME, MEM_ADDRESS, MEM_CITY, MEM_STATE, MEM_ZIP) DINNER(DIN_CODE, DIN_DATE, DIN_DESCRIPTION, ENT_CODE, DES_CODE) ENTRÉE(ENT_CODE, ENT_DESCRIPTION) DESSERT(DES_CODE, DES_DESCRIPTION) Naturally, to tracks costs and revenues, the manager would ask you to add appropriate attributes in DESSERT and ENTRÉE. For example, the DESSERT table might include DES_COST and DES_PRICE to enable the manager to track net returns on each dessert served. One would also expect that the manager would want to track YTD expenditures of the members and, of course, there would have to be an invoicing module for billing purposes. And what about keeping track of member balances as the members charge meals and make payments on account? c. Using the results of Problem 7b, draw the Crow’s Foot ERD. Answer: The Crow’s Foot ERD is shown in Figure P6.7c.

218

FIGURE P6.7c The Crow’s Foot ERD for Problem 7c

181. Use the dependency diagram shown in Figure P6.8 to work the following problems. Answer:

FIGURE P6.8 Initial Dependency Diagram for Problem 8

B C

F G

a. Break up the dependency diagram shown in Figure P6.8 to create two new dependency diagrams: one in 3NF and one in 2NF. Answer: The dependency diagrams are shown in Figure P6.8a.

219

FIGURE P6.8a The Dependency Diagram for Problem 8a

b. Modify the dependency diagrams you created in Problem 8a to produce a set of dependency diagrams that are in 3NF. (Hint: One of your dependency diagrams should be in 3NF but not in BCNF.) Answer: The solution is shown in Figure P6.8b.

FIGURE P6.8b The Dependency Diagram for Problem 8b

c. Modify the dependency diagrams you created in Problem 8b to produce a collection of dependency diagrams that are in 3NF and BCNF. Answer: The solution is shown in Figure P6.8c. Note that the A, C, and E attributes in the first three structures can be used as foreign keys in the fourth structure.

FIGURE P6.8c The Dependency Diagrams for Problem 8c

220

182. Suppose you have been given the table structure and data shown in Table P6.9, which was imported from an Excel spreadsheet. The data reflect that a professor can have multiple advisees, can serve on multiple committees, and can edit more than one journal. Answer:

Table P6.9 Attribute Name

Sample Value

EMP_NUM

123

104

118

120

PROF_RANK

Professor

Asst. Professor

Assoc. Professor

EMP_NAME

Ghee

Rankin

Ortega

Smith

DEPT_CODE

CIS

CHEM

CIS

ENG

DEPT_NAME

Computer Systems

Chemistry

Computer Systems

PROF_OFFICE

KDD-567

BLF-119

KDD-562

PRT-345

ADVISEE

1215, 2312, 3233, 2218, 2098

3102, 2782, 3311, 2008, 2876, 2222, 3745, 1783, 2378

2134, 2789, 3456, 2002, 2046, 2018, 2764

2873, 2765, 2238, 2901, 2308

COMMITTEE_CODE

PROMO, TRAF APPL, DEV

DEV

SPR, TRAF

PROMO, DEV

JOURNAL_CODE

JMIS, JMGT

Info.

QED,

Info.

English

SPR

JCIS, JMGT

Given the information in Table 6.9: a. Draw the dependency diagram. Answer: The dependency diagram is shown in Figure P6.9a.

221

FIGURE P6.9a The Dependency Diagram for Problem 9a

Note that Figure P6.9a reflects several ambiguities. For example, although each PROF_OFFICE value shown in Table P6.9 is unique, does that limited information indicate that each professor has a private office? If so, the office number identifies the professor who uses that office. This condition yields a dependency. However, this dependency is not a transitive one, because a nonkey attribute, PROF_OFFICE, determines the value of a key attribute, EMP_NUM. (We have indicated this potential transitive dependency through a dashed dependency line.)

NOTE The assumption that PROF_OFFICE  EMP_CODE is a rather restrictive one, because it would mean that professors cannot share an office. One could safely assume that administrators at all levels would not care to be tied by such a restrictive office assignment requirement. Therefore, we will remove this restriction in the remaining problem solutions. Also, note that there is no reliable way to identify the effect of multivalued attributes on the dependencies. For example, EMP_NUM = 123 could identify any one of five advisees. Therefore, knowing the EMP_NUM does not identify a specific ADVISEE value. The same is true for the COMMITTEE_CODE and JOURNAL_CODE attributes. Therefore, these attributes are not marked with a solid arrow line. However, if you know that EMP_NUM = 123, you will also know all five advisees, all four committee codes, and all three journal codes for that employee number value. But you do not have a unique identification for each of those attribute values. Therefore, you cannot conclude that EMP_NUM  ADVISEE, nor can you conclude that EMP_NUM  COMMITTEE_CODE or that EMP_NUM  JOURNAL_CODE. b. Identify the multivalued dependencies. Answer: Table P6.9 shows several professor attributes—ADVISEE, COMMITTEE_CODE, and JOURNAL_CODE—that represent multivalued dependencies.

222

c. Create the dependency diagrams to yield a set of table structures in 3NF. Answer: The dependency diagrams are shown in Figure P6.9c. Note that we have assumed that it is possible that professors can share an office.

FIGURE P6.9c The Dependency Diagram for Problem 9c

d. Eliminate the multivalued dependencies by converting the affected table structures to 4NF. Answer: The structures shown in Figure P6.9d conform to the 4NF requirement. Yet this normalization does not yield a viable database design. Here is another opportunity to stress that normalization without data modeling is a poor way to generate useful databases. (Note that we have assumed that an advisee can have only one advisor, but that an advisor can have many advisees.)

223

FIGURE P6.9d The Initial Dependency Diagrams for Problem 9d

The dependency diagrams shown in Figure P6.9d constitute an attempt to eliminate the shortcomings of the “system” shown in Figure P6.9c. Unfortunately, while this solution meets the normalization requirements, it lacks the ability to properly link the professors to committees and journals. (That’s because the relationships between professors and journals and between professors and committees are M:N.) This solution would yield Tables P6.9d1 and P6.9d2. (One would expect a professor to be an employee, so it’s reasonable to assume that—at some point— we’ll have to create a supertype/subtype relationship between employee and professor.)

224

Table P6.9d1 Implementation of the M:N Relationship between EMP_NUM and COMMITTEE_CODE EMP_NUM

COMMITTEE_CODE

123

PROMO

123

TRAF

123

APPL

123

DEV

104

DEV

118

SPR

118

TRAF

120

PROMO

120

SPR

120

DEV

The PK of Table P6.9d1 is EMP_NUM + COMMITTEE_CODE.

Table P6.9d2 Implementation of the M:N Relationship between EMP_NUM and JOURNAL_CODE EMP_NUM

JOURNAL_CODE

123

JMIS

123

QED

123

JMGT

118

JCIS

118

JMGT

The PK of Table P6.9d2 is EMP_NUM + JOURNAL_CODE. Because EMP_CODE = 104 does not show any entries in the JOURNAL_CODE, the employee code does not occur in Table P6.9d2. The preceding examples illustrate that multivalued attributes and M:N relationships are better modeled using the ERD. After the ERD has done its work, you should, of course, use

225

dependency diagrams to check for data redundancies! Figure P6.9e shows a more practical solution to the problem and its structures all conform to the normalization requirements. e. Draw the Crow’s Foot ERD to reflect the dependency diagrams you drew in Problem 9c. (Note: You might have to create additional attributes to define the proper PKs and FKs. Make sure that all of your attributes conform to the naming conventions.) Answer: Given the discussion in the previous problem segment d, we have incorporated additional features in the Crow’s Foot ERD shown in Figure P6.9e. Note that we have eliminated the M:N relationships in this design by creating composite entities as well as renaming JOURNAL_CODE to JOURNAL_ID. This design is implementable and it meets design standards. Normalization was part of the process that led to this solution, but it was only a part of that solution. Normalization does not replace design!

FIGURE P6.9e The Crow’s Foot ERD for Problem 9e

226

183. The manager of a consulting firm has asked you to evaluate a database that contains the table structure shown in Table P6.10. Answer:

Table P6.10 Attribute Name

Sample Value

Sample value

Sample Value

CLIENT_NUM

298

289

CLIENT_NAME

Marianne R. Brown

James D. Smith

CLIENT_REGION

Midwest

Southeast

CONTRACT_DATE

10-Feb-2022

15-Feb-2022

12-Mar-2022

CONTRACT_NUMBER

5841

5842

5843

CONTRACT_AMOUNT

$2,985,000.00

$670,300.00

$1,250,000.00

CONSULT_CLASS_1

Database Administration

Internet Services

Database Design

CONSULT_CLASS_2

Web Applications

Database Administration

CONSULT_CLASS_3

Network Installation

CONSULT_CLASS_4 CONSULTANT_NUM_1

CONSULTANT_NAME_1

Rachel G. Carson

Gerald Ricardo

CONSULTANT_REGION_1

Midwest

Southeast

CONSULTANT_NUM_2

CONSULTANT_NAME_2

Karl M. Spenser

Anne T. Dimarco

Gerald K. Ricardo

CONSULTANT_REGION_2

Midwest

Southeast

CONSULTANT_NUM_3

CONSULTANT_NAME_3

Julian H. Donatello

Geraldo J. Rivera

CONSULTANT_REGION_3

Midwest

Southeast

CONSULTANT_NUM_4

CONSULTANT_NAME_4

Donald Chen

CONSULTANT_REGION_4

West

25 K.

Angela M. Jamison

227

Table P6.10 was created to enable the manager to match clients with consultants. The objective is to match a client within a given region with a consultant in that region and to make sure that the client’s need for specific consulting services is properly matched to the consultant’s expertise. For example, if the client needs help with database design and is located in the Southeast, the objective is to make a match with a consultant who is located in the Southeast and whose expertise is in database design. (Although the consulting company manager tries to match consultant and client locations to minimize travel expense, it is not always possible to do so.) The following basic business rules are maintained: 

Each client is located in one region



A region can contain many clients.



Each consultant can work on many contracts



Each contract might require the services of many consultants.



A client can sign more than one contract, but each contract is signed by only one client.



Each contract might cover multiple consulting classifications. For example, a contract may list consulting services in database design and networking.



Each consultant is located in one region.



A region can contain many consultants.



Each consultant has one or more areas of expertise (class). For example, a consultant might be classified as an expert in both database design and networking.



Each area of expertise (class) can have many consultants. For ex ample, the consulting company might employ many consultants who are networking experts.

a. Given this brief description of the requirements and the business rules, write the relational schema and draw the dependency diagram for the preceding (and very poor) table structure. Label all transitive and/or partial dependencies. Answer: One of the first steps when working with data is to determine the scope of what you are modeling, and what exactly you are trying to model. When the problem is large and contains many pieces of data, you may want to break it down into smaller parts that are easier to model. In our example, looking at the data you have in this problem, you have clients, consultants, regions, and contracts. The main entity here is the contract that binds clients with consultants providing an expertise. Here is a perfect illustration of the value of business rules. If the business rules had not been available, the sample record would produce ambiguities. For example, if you only look at the sample data in the one available record, defining the relationships between client, contract, consultant, region, and expertise would have been difficult, at best. The business rules augment the original data and their use removes the ambiguities. The business rules help establish that a client can sign more than one contract, so you need more than the client number to identify the remaining attributes. Also, the same client can sign multiple contracts on the same date or on different dates, using the same set of consultants for each contract or a different set of consultants for each contract. Remember also that the consultants have more than one area of expertise, so the same consultant may work on different contracts for the same client or for different clients.

228

Based on the business rules and the sample data, we can conclude that the contract number (CONTRACT) is the identifier for the entire row of data. All other attributes can be identified if we know the contract number. So, we make that our primary key in our dependency diagram. Notice also the presence of repeating groups that are not identified in the diagram (consultants and expertise class). Given the combination of the business rules and the sample data, the dependencies show up in Figure P6.10a.

FIGURE P6.10a The ConsultCo Dependency Diagram

The relational schema is written as follows: Note that the PK is the first listed attribute; you can write the relational schema this way: CONTRACT(CONTRACT, CLIENT_NUM, CLIENT_NAME, DATE, CLASS_1, CLASS_2, CLASS_3, CLASS_4, REGION, CONS_NUM_1, CONS_NAME_1, REGION1 CONS_NUM_2, CONS_NAME_2, REGION2, CONS_NUM_3, CONS_NAME_3, REGION3, CONS_NUM_4, CONS_NAME_4, REGION4) In any case, remind your students that the order in which the attributes are listed is immaterial in a relational database environment. b. Break up the dependency diagram you drew in Problem 10a to produce dependency diagrams that are in 3NF and write the relational schema. (Hint: You might have to create a few new attributes. Also make sure that the new dependency diagrams contain attributes that meet proper design criteria; that is, make sure there are no multivalued attributes, that the naming conventions are met, and so on.)

229

Answer: Remind the student, one more time, to use the business rules to discover the true nature of the data relationships. For example, based on the business rules, we can conclude: 

A contract has one customer, but a customer can have many contracts.



A contract requires many expertise classes, and a expertise class can appear in many contracts.



A contract requires many consultants, and a consultant can be assigned to many contracts.



A client has one region, but a region has many clients.



A consultant has one region, but a region has many clients.



A consultant can have many expertise and an expertise can belong to many consultants.

This clearly shows the following relationships: 

CLIENT (1) – (M) CONTRACT



CONTRACT (M) – (M) CLASS (expertise classes)



CONTRACT (M) – (M) CONSULTANT



REGION (1) – (M) COSULTANT



REGION (1) – (M) CLIENT



CONSULTANT (M) – (M) CLASS (expertise classes)

To complete the structures, we have renamed some entities and attributes. Although the normalization procedure has left us with the 3NF system shown in Figure P6.10b, it is not possible to see that some of the relationships between the entities are of the M:N variety. (It would be appropriate to point out that the multivalued attributes encountered in Problem 10’s sample values are probably best handled through the use of composite entities. Similarly, the M:N relationship between contract and consultant would have to be handled through a composite entity, perhaps named ASSIGNMENT, to indicate the assignment of consultants to contracts. (We will resolve those issues in the answers to subsequent problems.) Here is yet another indication that normalization, while very useful as a tool to help eliminate data redundancies, is incapable of serving as the sole source of good database design.

230

FIGURE P6.10b The ConsultCo Dependency Diagrams in 3NF

The relational schemas are written as follows: CLIENT(CLIENT_NUM, CLIENT_NAME, REGION_CODE) CLASS(CLASS_CODE, CLASS_DESCRIPTION) CONTRACT(CONTR_NUM, CLIENT_NUM, CONTR_DATE) CONSULTANT(CONS_NUM, CONS_NAME, REGION_CODE) REGION(REGION_CODE, REGION_NAME) CONTRCLASS(CONTR_NUM, CLASS_CODE) CONTRCONS(CONTR_NUM, CONS_NUM)) SKILL(CLASS_CODE, CONS_NUM)

231

Keep in mind that the preceding dependency diagrams and relational schemas do not (yet) define a practical design. For example, processing requirements usually dictate that the attributes be made more atomic. (Printing mailing labels and creating mailing lists and phone directories would mandate the decomposition of CLIENT_NAME into CLIENT_FNAME, CLIENT_LNAME, and CLIENT_INITIAL. The CONS_NAME must be similarly decomposed.) Furthermore, notice the naming of the entities resolving the multivalue attributes and M:M relationships—using the composite names of the parent entities. These names will be further refined in the next step. Also, remember that this simple system lacks many important entities and attributes. For instance, at this point there’s no way to contact the clients, nor can clients contact the consultants. Clearly, we ought to add addresses and phone numbers. In the next step we will refine the design further to enable us to track billing charges by class, by consultant, and contract. c. Using the results of Problem 10b, draw the Crow’s Foot ERD. Answer: The final ERD is shown in Figure P6.10c. Notice that we have added some additional attributes to some of the entities. It is important to explain the role of the ASSIGNMENT entity as a multipurpose entity. 

First, this is a composite entity representing the relationship between CONTRACT and CLASS (CONTRCLASS relation in the previous schema). Remember a contract can have one or more expertise classes required. Therefore, the CONTR_NUM and CLASS_CODE attributes must be required.



The CONS_NUM is optional. Remember, a consultant is assigned to a contract based on his/her expertise class. Therefore, the consultant must match a required expertise class. Thus, we can say that the ASSIGNMENT entity also fulfills the role of the CONTRCONS relation—depicting who is assigned to a contract. However, this attribute can be made optional because we don’t know who fulfills the expertise at contract creation. The consultant can be assigned later.



The ASSIGNMENT surrogate primary key is important in this design. First, it allows to assign a consultant, not at record creation but at a later time. Second, it allows to have multiple consultants with the same expertise class in a contract. However, notice that the contract number and class code attributes are mandatory; you can’t have an assignment without a contract number or a required expertise (or role).



Note also that the ASSIGN_CHG_HOUR is written into the ASSIGNMENT table by the applications software from the CLASS table to ensure the historical accuracy of the charges. If the CLASS_CHG_HOUR changes, we must preserve the original charge per hour that was in effect when the assignment charge was made.

You can let your students use database software such as Microsoft Access to implement this system. Naturally, you can add tables and attributes to enable the system to handle invoicing and reporting of consulting activities by consultant, by class, by client, and so on. We have added a few of the appropriate entities and attributes in the answer to Problem 6.10c. The Crow’s Foot ERD is shown in Figure P6.10c.

232

FIGURE P6.10c The ConsultCo ERD for Problem 10c

The addition of the ASSIGNMENT entity addresses the problem of keeping track of billable hours and charges by consultant and the addition of the SKILL entity enables the end user to track all consultant qualifications. Whether or not optionalities are included in the ERD depends on the business rules and on the operational requirements. It is again worth emphasizing that many optionalities exist for operational reasons. That’s why the optionality is often used as the default condition. In any case, the database designer is obligated to develop precise business rules to make sure that the data environment is properly reflected in the design.

233

184. Given the sample records in the CHARTER table shown in Table P6.11, do the following: Answer:

Table P6.11 Attribute Name

Sample Value

CHAR_TRIP

10232

10233

10234

10235

CHAR_DATE

15-Jan-2022

16-Jan-2022

17-Jan-2022

CHAR_CITY

STL

MIA

TYS

ATL

CHAR_MILES

580

1,290

524

768

CUST_NUM

784

231

544

784

CUST_LNAME

Brown

Hanson

Bryana

Brown

CHAR_PAX

CHAR_CARGO

235 lbs.

18,940 lbs.

348 lbs.

155 lbs.

PILOT

Melton

Chen

Henderson

Melton

COPILOT

Henderson

Melton

FLT_ENGINEER

O’Shaski

LOAD_MASTER

Benkasi

AC_NUMBER

1234Q

3456Y

1234Q

2256W

MODEL_CODE

PA31-350

CV-580

PA31-350

MODEL_SEATS

MODEL_CHG_MILE

$2.79

$23.36

$2.79

a. Write the relational schema and draw the dependency diagram for the table structure. Make sure that you label all dependencies. CHAR_PAX indicates the number of passengers carried. The CHAR_MILES entry is based on round-trip miles, including pickup points. (Hint: Look at the data values to determine the nature of the relationships. For example, note that employee Melton has flown two charter trips as pilot and one trip as copilot.) Answer: The dependency diagram is shown in Figure P6.11a. Please note we abbreviated the last three attribute names.

234

FIGURE P6.11a The Dependency Diagram for Problem 11a

The relational schema is written as follows: CHARTER(CHAR_TRIP, CHAR_DATE, CHAR_CITY, CHAR_MILES, CUST_NUM, CUST_LNAME, CHAR_PAX, CHAR_CARGO, PILOT, COPILOT, FLT_ENGINEER, LOAD_MASTER, AC_NUMBER, MOD_CODE, MOD_SEATS, MOD_CHG_MILE) b. Decompose the dependency diagram you drew to solve Problem 11a to create table structures that are in 3NF and write the relational schema. Answer: The normalized dependency diagram is shown in Figure P6.11b. (Note the addition of MOD_CODE in the AIRCRAFT table to serve as the AIRCRAFT table’s FK to MODEL.)

235

FIGURE P6.11b The Normalized Dependency Diagram for Problem 11b

c. Draw the Crow’s Foot ERD to reflect the properly decomposed dependency diagrams you created in Problem 11b. Make sure the ERD yields a database that can track all of the data shown in Problem 11. Show all entities, relationships, connectivities, optionalities, and cardinalities. Answer: The initial Crow’s Foot ERD is shown in Figure P6.11c.

236

FIGURE P6.11c The Initial Crow’s Foot ERD for Problem 11c

While the ERD shown in Figure P6.11c faithfully reflects the results generated by the normalization process, it has a major design flaw. This flaw has the following consequences: 

If additional crew members such as copilots, loadmasters, and flight engineers are not assigned to the flight, the CHARTER table will include many nulls. (Many of the smaller aircraft that are used in charter flying require only that a pilot and a functioning autopilot be used. In fact, the Federal Air Regulations (FARs) that govern charter aviation permit single pilot operations for aircraft that have less than a 12,500lbs. gross take-off weight and that are not turbine-powered.)



The inclusion of COPILOT, FLT_ENGINEER, and LOAD_MASTER also produce synonyms in the CHARTER table.



As the aircraft used in the charter flights become larger and more complex, crews become larger, thus producing more synonyms and more potential nulls. (Not to mention that the CHARTER table will have to be modified to accept additional crew members such as flight attendants.)

The problems associated with the ERD shown in Figure P6.11c are eliminated through the composite entity named CREW in Figure P6.11d. Note that this modification makes it possible to assign any number of crew members. To ensure that the crew members are properly qualified, a job attribute can be added to the EMPLOYEE entity and the applications software can then assign crew members based on job classifications such as pilot, loadmaster, flight attendant, and so on. Because only some employees are qualified as crew members, CREW is optional to EMPLOYEE. But each crew member must be an employee, so EMPLOYEE is mandatory to CREW. Also, note that for simplification purposes not all attributes of the CHARTER table are shown—attributes such as CHAR_PAX and CHAR_CARGO that are not shown will be part of the final design.

237

FIGURE P6.11d The Final Crow’s Foot ERD for Problem 11c

Note that the application shown in Figure P6.11e—based on the design shown in Figure P6.11d—enables the end user to input only those crew members that are required for the charter flight. (In this case, only two crew members are required, but the design permits the addition of many more crew members without making structural changes in the database tables. Such flexibility is the essence of good design.)

238

FIGURE P6.11e Sample Charter Record

ANSWERS TO REVIEW QUESTIONS 185. Explain why it would be preferable to use a DATE data type to store date data instead of a character data type. Answer: The DATE data type uses numeric values based on the Julian calendar to store dates. This makes date arithmetic such as adding and subtracting days or fractions of days possible (as well as numerous special date-oriented functions discussed in the next chapter). 186. Explain why the following command would create an error and what changes could be made to fix the error.

239

Answer: SELECT V_CODE, SUM(P_QOH) FROM PRODUCT; The command would generate an error because an aggregate function is applied to the P_QOH attribute but V_CODE is neither in an aggregate function nor in a GROUP BY clause. This can be fixed by either (1) placing V_CODE in an appropriate aggregate function based on the data that is being requested by the user, (2) adding a GROUP BY clause to group by values of V_CODE (i.e., GROUP BY V_CODE), (3) removing the V_CODE attribute from the SELECT clause, or (4) removing the Sum aggregate function from P_QOH. Which of these solutions is most appropriate depends on the question that the query was intended to answer? 187. What is a cross join? Give an example of its syntax. Answer: A CROSS JOIN is identical to the PRODUCT relational operator. The CROSS JOIN is also known as the Cartesian product of two tables. For example, if you have two tables, AGENT, with 10 rows, and CUSTOMER, with 21 rows, the CROSS JOIN resulting set will have 210 rows and will include all of the columns from both tables. Syntax examples are: SELECT * FROM CUSTOMER CROSS JOIN AGENT; or SELECT * FROM CUSTOMER, AGENT; If you do not specify a join condition when joining tables, the result will be a CROSS JOIN or PRODUCT operation. 188. What three join types are included in the outer join classification? Answer: An OUTER JOIN is a type of JOIN operation that yields all rows with matching values in the join columns as well as unmatched rows. (Unmatched rows are those without matching values in the join columns.) The SQL standard prescribes three different types of join operations: LEFT [OUTER] JOIN RIGHT [OUTER] JOIN FULL [OUTER] JOIN The LEFT [OUTER] JOIN will yield all rows with matching values in the join columns, plus all of the unmatched rows from the left table. (The left table is the first table named in the FROM clause.) The RIGHT [OUTER] JOIN will yield all rows with matching values in the join columns, plus all of the unmatched rows from the right table. (The right table is the second table named in the FROM clause.) The FULL [OUTER] JOIN will yield all rows with matching values in the join columns, plus all the unmatched rows from both tables named in the FROM clause. 189. Using tables named T1 and T2, write a query example for each of the three join types you described in Question 4. Assume that T1 and T2 share a common column named C1. Answer: LEFT OUTER JOIN example: SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.C1 = T2.C1; RIGHT OUTER JOIN example: © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

240

SELECT * FROM T1 RIGHT OUTER JOIN T2 ON T1.C1 = T2.C1; FULL OUTER JOIN example: SELECT * FROM T1 FULL OUTER JOIN T2 ON T1.C1 = T2.C1; 190. What is a recursive join? Answer: A recursive join is a join in which a table is joined to itself. 191. Rewrite the following WHERE clause without the use of the IN special operator: WHERE V_STATE IN (‘TN’, ‘FL’, ‘GA’) Answer: WHERE V_STATE = ‘TN’ OR V_STATE = ‘FL’ OR V_STATE = ‘GA’ Notice that each criterion must be complete (i.e., attribute-operator-value). 192. Explain the difference between an ORDER BY clause and a GROUP BY clause. Answer: An ORDER BY clause has no impact on which rows are returned by the query; it simply sorts those rows into the specified order. A GROUP BY clause does impact the rows that are returned by the query. A GROUP BY clause gathers rows into collections that can be acted on by aggregate functions.

241

193. Explain why the following two commands produce different results: SELECT DISTINCT COUNT (V_CODE) FROM PRODUCT; SELECT COUNT (DISTINCT V_CODE) FROM PRODUCT; Answer: The difference is in the order of operations. The first command executes the Count function to count the number of values in V_CODE (say the count returns “14” for example) including duplicate values, and then the Distinct keyword only allows one count of that value to be displayed (only one row with the value “14” appears as the result). The second command applies the Distinct keyword to the V_CODEs before the count is taken, so only unique values are counted. 194. What is the difference between the COUNT aggregate function and the SUM aggregate function? Answer: COUNT returns the number of values without regard to what the values are. SUM adds the values together and can only be applied to numeric values. 195. In a SELECT query, what is the difference between a WHERE clause and a HAVING clause? Answer: Both a WHERE clause and a HAVING clause can be used to eliminate rows from the results of a query. The differences are (1) the WHERE clause eliminates rows before any grouping for aggregate functions occurs, while the HAVING clause eliminates groups after the grouping has been done, and (2) the WHERE clause cannot contain an aggregate function, but the HAVING clause can. 196. What is a subquery, and what are its basic characteristics? Answer: A subquery is a query (expressed as a SELECT statement) that is located inside another query. The first SQL statement is known as the outer query, and the second is known as the inner query or subquery. The inner query or subquery is normally executed first. The output of the inner query is used as the input for the outer query. A subquery is normally expressed inside parentheses and can return zero, one, or more rows and each row can have one or more columns. A subquery can appear in many places in a SQL statement: 

as part of a FROM clause,



to the right of a WHERE conditional expression,



to the right of the IN clause,



in an EXISTS operator,



to the right of a HAVING clause conditional operator,



in the attribute list of a SELECT clause.

Examples of subqueries are: INSERT INTO PRODUCT SELECT * FROM P; DELETE FROM PRODUCT WHERE V_CODE IN (SELECT V_CODE FROM VENDOR WHERE V_AREACODE = ‘615’);

242

SELECT

V_CODE, V_NAME

FROM

VENDOR

WHERE

V_CODE NOT IN (SELECT V_CODE FROM PRODUCT);

197. What are the three types of results that a subquery can return? Answer: A subquery can return (1) a single value (one row, one column), (2) a list of values (many rows, one column), or (3) a virtual table (many rows, many columns). 198. What is a correlated subquery? Give an example. Answer: A correlated subquery is subquery that executes once for each row in the outer query. This process is similar to the typical nested loop in a programming language. Contrast this type of subquery to the typical subquery that will execute the innermost subquery first, and then the next outer query … until the last outer query is executed. That is, the typical subquery will execute in serial order, one after another, starting with the innermost subquery. In contrast, a correlated subquery will run the outer query first, and then it will run the inner subquery once for each row returned in the outer subquery. For example, the following subquery will list all the product line sales in which the “units sold” value is greater than the “average units sold” value for that product (as opposed to the average for all products.) SELECT

INV_NUMBER, P_CODE, LINE_UNITS

FROM

LINE LS

WHERE

LS.LINE_UNITS > (SELECT WHERE

AVG(LINE_UNITS) FROM LINE LA LA.P_CODE = LS.P_CODE);

The previous nested query will execute the inner subquery once to compute the average sold units for each product code returned by the outer query. 199. Explain the difference between a regular subquery and a correlated subquery. Answer: A regular, or uncorrelated, subquery executes before the outer query. It executes only once and the result is held for use by the outer query. A correlated subquery relies in part on the outer query, usually through a WHERE criterion in the subquery that references an attribute in the outer query. Therefore, a correlated subquery will execute once for each row evaluated by the outer query; and the correlated subquery can potentially produce a different result for each row in the outer query. 200. What does it mean to say that SQL operators are set-oriented? Answer: The description of SQL operators as set-oriented means that the commands work over entire tables at a time, not row-by-row.

243

201. The relational set operators UNION, INTERSECT, and EXCEPT (MINUS) work properly only when the relations are union-compatible. What does union-compatible mean, and how would you check for this condition? Answer: Union-compatible means that the relations yield attributes with identical names and compatible data types. That is, the relation A(c1,c2,c3) and the relation B(c1,c2,c3) have union-compatibility if both relations have the same number of attributes, and corresponding attributes in the relations have “compatible” data types. Compatible data types do not require that the attributes be exactly identical—only that they are comparable. For example, VARCHAR(15) and CHAR(15) are comparable, as are NUMBER (3,0) and INTEGER, and so on. Note that this is a practical definition of union-compatibility, which is different than the theoretical definition discussed in Chapter 3. From a theoretical perspective, corresponding attributes must have the same domain. However, the DBMS does not understand the meaning of the business domain, so it must work with a more concrete understanding of the data in the corresponding columns. Thus, it only considers the data types. 202. What is the difference between UNION and UNION ALL? Write the syntax for each. Answer: UNION yields unique rows. In other words, UNION eliminates duplicates rows. On the other hand, a UNION ALL operator will yield all rows of both relations, including duplicates. Notice that for two rows to be duplicated, they must have the same values in all columns. To illustrate the difference between UNION and UNION ALL, let’s assume two relations: A (ID, Name) with rows (1, Lake, 2, River, and 3, Ocean) and B (ID, Name) with rows (1, River, 2, Lake, and 3, Ocean). Given this description, SELECT * FROM A UNION SELECT * FROM B will yield: ID Name 1. Lake 2. River 3. Ocean 1. River 2. Lake while SELECT * FROM A UNION ALL

244

SELECT * FROM B will yield: ID Name 1. Lake 2. River 3. Ocean 1. River 2. Lake 3. Ocean 203. Suppose you have two tables: EMPLOYEE and EMPLOYEE_1. The EMPLOYEE table contains the records for three employees: Alice Cordoza, John Cretchakov, and Anne McDonald. The EMPLOYEE_1 table contains the records for employees John Cretchakov and Mary Chen. Given that information, list the query output for the UNION query. Answer: The query output will be: Alice Cordoza John Cretchakov Anne McDonald Mary Chen 204. Given the employee information in Question 19, list the query output for the UNION ALL query. Answer: The query output will be: Alice Cordoza John Cretchakov Anne McDonald John Cretchakov Mary Chen 205. Given the employee information in Question 19, list the query output for the INTERSECT query. Answer: The query output will be: John Cretchakov

245

206. Given the employee information in Question 19, list the query output for the EXCEPT (MINUS) query of EMPLOYEE to EMPLOYEE_1. Answer: This question can yield two different answers. If you use SELECT * FROM EMPLOYEE MINUS SELECT * FROM EMPLOYEE_1 the answer is Alice Cordoza Anne McDonald If you use SELECT * FROM EMPLOYEE_1 MINUS SELECT * FROM EMPLOYEE the answer is Mary Chen 207. Suppose a PRODUCT table contains two attributes, PROD_CODE and VEND_CODE. Those two attributes have values of ABC, 125, DEF, 124, GHI, 124, and JKL, 123, respectively. The VENDOR table contains a single attribute, VEND_CODE, with values 123, 124, 125, and 126, respectively. (The VEND_CODE attribute in the PRODUCT table is a foreign key to the VEND_CODE in the VENDOR table.) Given that information, what would be the query output for: Because the common attribute is V_CODE, the output will only show the V_CODE values generated by each query. Answer: a. A UNION query based on the two tables? 125,124,123,126 b. A UNION ALL query based on the two tables? 125,124,124,123,123,124,125,126 c. An INTERSECT query based on the two tables? 123,124,125 d. An EXCEPT (MINUS) query based on the two tables? If you use PRODUCT MINUS VENDOR, the output will be NULL. If you use VENDOR MINUS PRODUCT, the output will be 126.

246

208. Why does the order of the operands (tables) matter in an EXCEPT (MINUS) query but not in a UNION query? Answer: MINUS queries are analogous to algebraic subtraction—it results in the value that existed in the first operand that is not in the second operand. UNION queries are analogous to algebraic addition—it results in a combination of the two operands. (These analogies are not perfect, obviously, but they are helpful when learning the basics.) Addition and UNION have the commutative property (a + b = b + a), while subtraction and MINUS do not (a – b ≠ b – a). 209. What MS Access and SQL Server function should you use to calculate the number of days between your birth date and the current date? Answer: In MS Access, the DATE() function would be used. In MS SQL Server, the GETDATE() function would be used. 210. What Oracle function should you use to calculate the number of days between your birth date and the current date? Answer: The SYSDATE keyword can be used to retrieve the current date from the server. By subtracting your birthdate from the current date, using date arithmetic, the number of dates will be returned. Note that in Oracle, the SQL statement requires the use of the FROM clause. In this case, you may use the DUAL table. (The DUAL table is a dummy “virtual” table provided by Oracle for this type of query. The table contains only one row and one column so queries against it can return just one value.) 211. What string function should you use to list the first three characters of a company’s EMP_LNAME values? Give an example using a table named EMPLOYEE. Provide examples for Oracle and SQL Server. Answer: In Oracle, you use the SUBSTR function as illustrated next: SELECT SUBSTR(EMP_LNAME, 1, 3) FROM EMPLOYEE; In SQL Server, you use the SUBSTRING function as shown: SELECT SUBSTRING(EMP_LNAME, 1, 3) FROM EMPLOYEE; 212. What two things must a SQL programmer understand before beginning to craft a SELECT query? Answer: Before crafting a SELECT query, the SQL programmer must (1) understand the data model in which the query will operate, and (2) the problem being solved. Data models are often complex to the point that knowing what data is available, the meaning of that data, and how to transform the data to produce the desired results will require the programmer to become very familiar with the data model before the query can be created. Problem statements that seem clear to users can often be interpreted in many ways, so it is important for the programmer to understand exactly what the user is requesting.

247

ANSWERS TO PROBLEMS All of the problems in the Problems section require writing SQL code. Since there are minor differences in the code based on the DBMS used, solutions for all of the problems are provided in separate files for Oracle, MySQL, and Microsoft SQL Server. Solutions for Microsoft Access are provided in .mdb files for each data model used in the Problems section. The files are located in the “Teacher” data files that accompany the book, and are named as follows: Oracle:

Ch07_ProblemSolutions_ORA.txt

MySQL:

Ch07_ProblemSolutions_MySQL.txt

SQL Server:

Ch07_ProblemSolutions_SQL.txt

MS Access:

Ch07_ConstructCo.mdb Ch07_Fact.mdb Ch07_LargeCo.mdb Ch07_SaleCo.mdb

ANSWERS TO REVIEW QUESTIONS 213. What type of integrity is enforced when a primary key is declared? Answer: Creating a primary key constraint enforces entity integrity (i.e., no part of the primary key can contain a null and the primary key values must be unique). 214. Explain why it might be more appropriate to declare an attribute that contains only digits as a character data type instead of a numeric data type. Answer: An attribute that contains only digits may be properly defined as character data when the values are nominal; that is, the values do not have numerical significance but serve only as labels such as ZIP codes and telephone numbers. One easy test is to consider

248

whether or not a leading zero should be retained. For the ZIP code 03133, the leading zero should be retained; therefore, it is appropriate to define it as character data. For the quantity on hand of 120, we would not expect to retain a leading zero such as 0120; therefore, it is appropriate to define the quantity on hand as a numeric data type. 215. What is the difference between a column constraint and a table constraint? Answer: A column constraint can refer to only the attribute with which it is specified. A table constraint can refer to any attributes in the table. 216. What are “referential constraint actions”? Answer: Referential constraint actions, such as ON DELETE CASCADE, are default actions that the DBMS should take when a DML command would result in a referential integrity constraint violation. Without referential constraint actions, DML commands that would result in a violation of referential integrity will fail with an error indicating that the referential integrity constraint cannot be violated. Referential constraint actions can allow the DML command to successfully complete while making the designated changes to the related records to maintain referential integrity. 217. What is the purpose of a CHECK constraint? Answer: A CHECK constraint is used to limit the values that can appear in an attribute. It performs the function of enforcing a domain.

249

218. Explain when an ALTER TABLE command might be needed. Answer: ALTER TABLE is used to modify the structure of an existing table by adding, removing, or modifying column definitions and, in some cases, constraints. Many database structures have long, useful lives in an organization. It is not uncommon for a database to exist in organizational systems for decades. If the existing database structure needs to be modified to accommodate changes in business requirements or the integration of new systems, the existing structure will be modified with ALTER TABLE commands. This preserves the existing data in the table, as opposed to dropping the table and then recreating it. 219. What is the difference between an INSERT command and an UPDATE command? Answer: The INSERT command is used to add a new row to a table. The UPDATE command changes the values in attributes of an existing row. UPDATE will not increase the number of rows in a table, but INSERT will. 220. What is the difference between using a subquery with a CREATE TABLE command and using a subquery with an INSERT command? Answer: Using a subquery with a CREATE TABLE command is a DDL command and will create a new database table. The table will be structured to match the structure of the data returned by the subquery, and the data from the subquery will be placed in the table. Therefore, using a subquery with CREATE TABLE will both create the structure and place data inside that structure. Using a subquery with an INSERT command is a DML command and will add data to an existing table. This operation requires that the target table where the data should be stored must already exist. The programmer must ensure that the structure of the data being returned by the subquery is appropriate in terms of data types and constraints for the structure of the table where the results are to be stored. 221. What is a sequence? Write its syntax. Answer: A sequence is a special type of object that generates unique numeric values in ascending or descending order. You can use a sequence to assign values to a primary key field in a table. A sequence provides functionality similar to the AutoNumber data type in MS Access. For example, both sequences and AutoNumber data types provide unique ascending or descending values. However, there are some subtle differences between the two: 

In MS Access, an AutoNumber is a data type; in Oracle and SQL Server, a sequence is a completely independent object, rather than a data type.



In MS Access, you can only have one AutoNumber per table; in Oracle and SQL Server, you can have as many sequences as you want, and they are not tied to any particular table.



In MS Access, the AutoNumber data type is tied to a field in a table; in Oracle and SQL Server, the sequence-generated value is not tied to any field in any table and can, therefore, be used on any attribute in any table.

250

The syntax used to create a sequence is: CREATE SEQUENCE CUS_NUM_SEQ START WITH 100 INCREMENT BY 10 NOCACHE; MySQL does not currently support sequences. 222. What is a trigger, and what is its purpose? Give an example. Answer: A trigger is a block of procedural SQL code that is automatically invoked by the DBMS upon the occurrence of a data manipulation event (INSERT, UPDATE, or DELETE). Triggers are always associated with a table and are invoked before or after a data row is inserted, updated, or deleted. A table can have zero, one, or more triggers. Triggers provide a method of enforcing business rules such as: 

A customer making a credit purchase must have an active account.



A student taking a class with a prerequisite must have completed that prerequisite with a B grade.



To be scheduled for a flight, a pilot must have a valid medical certificate and a valid training completion record.

Triggers are also excellent for enforcing data constraints that cannot be directly enforced by the data model. For example, suppose that you must enforce the following business rule: If the quantity on hand of a product falls below the minimum quantity, the P_REORDER attribute must the automatically set to 1. To enforce this business rule, you can create the following TRG_PRODUCT_REORDER trigger in Oracle: CREATE OR REPLACE TRIGGER TRG_PRODUCT_REORDER BEFORE INSERT OR UPDATE OF P_ONHAND, P_MIN ON PRODUCT FOR EACH ROW BEGIN IF :NEW.P_ONHAND <= :NEW.P_MIN THEN NEW.P_REORDER := 1; ELSE :NEW.P_REORDER := 0; END IF; END; 223. What is a stored procedure, and why is it particularly useful? Give an example. Answer: A stored procedure is a named block of procedural SQL and standard SQL statements. One of the major advantages of stored procedures is that they can be used to encapsulate and represent business transactions. For example, you can create a stored procedure to represent a product sale, a credit update, or the addition of a new customer. You can encapsulate SQL statements within a single stored procedure and execute them as a single transaction.

251

There are two clear advantages to the use of stored procedures: 1. Stored procedures substantially reduce network traffic and increase performance. Because the stored procedure is stored at the server, there is no transmission of individual SQL statements over the network. 2. Stored procedures help reduce code duplication through code isolation and code sharing (creating unique procedural modules that are called by application programs), thereby minimizing the chance of errors and the cost of application development and maintenance. For example, the following PRC_LINE_ADD stored procedure will add a new invoice line to the LINE table and it will automatically retrieve the correct price from the PRODUCT table. CREATE OR REPLACE PROCEDURE PRC_LINE_ADD (W_LN IN NUMBER, W_P_CODE IN VARCHAR2, W_LU NUMBER) AS W_LP NUMBER := 0.00; BEGIN -- GET THE PRODUCT PRICE SELECT P_PRICE INTO W_LP FROM PRODUCT WHERE P_CODE = W_P_CODE; -- ADDS THE NEW LINE ROW INSERT INTO LINE VALUES(INV_NUMBER_SEQ.CURRVAL, W_LN, W_P_CODE, W_LU, W_LP); DBMS_OUTPUT.PUT_LINE('Invoice line ' || W_LN || ' added'); END;

252

ANSWERS TO PROBLEMS All of the problems in the Problems section require writing SQL or procedural SQL code. Since there are minor differences in the code based on the DBMS used, solutions for problems are provided in separate files for Oracle, MySQL, and Microsoft SQL Server. Solutions for Microsoft Access are provided in .mdb files for each data model used in the Problems section. A very few of the problems do not apply to all DBMS products. For example, MySQL is installed in “autocommit” mode by default, therefore, issuing COMMIT commands are not necessary. On the other hand, Oracle does not use autocommit by default and does require COMMIT commands to make DML command results permanent in the database. Therefore, instructions about issuing commands to make DML changes permanent do not apply to MySQL but are necessary for Oracle. The files are in the “Teacher” data files that accompany the book and are named as follows: Oracle:

Ch08_ProblemSolutions_ORA.sql

MySQL:

Ch08_ProblemSolutions_MySQL.sql

SQL Server:

Ch08_ProblemSolutions_SQL.sql

MS Access:

Ch08_AviaCo.mdb Ch08_ConstructCo.mdb Ch08_MovieCo.mdb Ch08_SaleCo.mdb Ch08_SimpleCo.mdb

ANSWERS TO REVIEW QUESTIONS 224. What is an information system? What is its purpose? Answer: An information system is a system that

253



Provides the conditions for data collection, storage, and retrieval



Facilitates the transformation of data into information



Provides management of both data and information

An information system is composed of hardware, software (DBMS and applications), database(s), procedures, and people. Good decisions are generally based on good information. Ultimately, the purpose of an information system is to facilitate good decision making by making relevant and timely information available to the decision makers. 225. How do systems analysis and systems development fit into a discussion about information systems? Answer: Both systems analysis and systems development constitute part of the Systems Development Life Cycle (SDLC). Systems analysis, phase II of the SDLC, establishes the need for and the extent of an information system by 

Establishing end-user requirements



Evaluating the existing system



Developing a logical systems design

Systems development, based on the detailed systems design found in phase III of the SDLC, yields the information system. The detailed system specifications are established during the systems design phase, in which the designer completes the design of all required system processes.

254

226. What does the acronym SDLC mean, and what does an SDLC portray? Answer: The acronym SDLC is used to label the System Development Life Cycle. The SDLC traces the history of an information system from its inception to its obsolescence. The SDLC is composed of six phases: planning, analysis, detailed system, design, implementation, and maintenance. 227. What does the acronym DBLC mean, and what does a DBLC portray? Answer: The acronym DBLC is used to label the Database Life Cycle. The DBLC traces the history of a database system from its inception to its obsolescence. Since the database constitutes the core of an information system, the DBLC is concurrent to the SDLC. The DBLC is composed of six phases: initial study, design, implementation and loading, testing and evaluation, operation, and maintenance and evolution. 228. Discuss the distinction between centralized and decentralized conceptual database designs. Answer: Centralized and decentralized designs constitute variations on the bottom-up and top-down approaches we discussed in the third question presented in the discussion focus. Basically, the centralized approach is best suited to relatively small and simple databases that lend themselves well to a bird’s-eye view of the entire database. Such databases may be designed by a single person or by a small and informally constituted design team. The company operations and the scope of its problems are sufficiently limited to enable the designer(s) to perform all of the necessary database design tasks: 1. Define the problem(s). 2. Create the conceptual design. 3. Verify the conceptual design with all user views. 4. Define all system processes and data constraints. 5. Assure that the database design will comply with all achievable end-user requirements. The centralized design procedure thus yields the design summary shown in Figure Q9.5A.

255

FIGURE Q9.5A The Centralized Design Procedure

Conceptual Model

Conceptual Model Verification

User Views

System Processes

Data Constraints

D A T A D I C T I O N A R Y

Note that the centralized design approach requires the completion and validation of a single conceptual design.

NOTE Use the text’s Figures 9.15 and 9.16 to contrast the two design approaches, then use Figure 9.6 to show the procedure flows; demonstrate that such procedure flows are independent of the degree of centralization. In contrast, when company operations are spread across multiple operational sites or when the database has multiple entities that are subject to complex relations, the best approach is often based on a decentralized design. Typically, a decentralized design requires that the design task be divided into multiple modules, each one of which is assigned to a design team. The design team activities are coordinated by the lead designer, who must aggregate the design teams’ efforts. Since each team focuses on modeling a subset of the system, the definition of boundaries and the interrelation between data subsets must be very precise. Each team creates a conceptual data model corresponding to the subset being modeled. Each conceptual model is then verified individually against the user views, processes, and constraints for each of the modules. After the verification process has been completed, all modules are integrated in one conceptual model. Since the data dictionary describes the characteristics of all the objects within the conceptual data model, it plays a vital role in the integration process. Naturally, after the subsets have been aggregated into a larger conceptual model, the lead designer must verify that the

256

combined conceptual model is still able to support all the required transactions. Thus, the decentralized design activities may be summarized as shown in Figure Q9.6B.

FIGURE Q9.6B The Decentralized Design Procedure

DATA COMPONENT

Conceptual Models

Verification

Subset A

Subset B

Subset C

Views, Processes, Constraints

Aggregation

FINAL CONCEPTUAL MODEL

D A T A D I C T I O N A R Y

Keep in mind that the aggregation process requires the lead designer to assemble a single model in which various aggregation problems must be addressed: 

Synonyms and homonyms. Different departments may know the same object by different names (synonyms), or they may use the same name to address different objects (homonyms). The object may be an entity, an attribute, or a relationship.



Entity and entity subclasses. An entity subset may be viewed as a separate entity by one or more departments. The designer must integrate such subclasses into a higher-level entity.



Conflicting object definitions. Attributes may be recorded as different types (character, numeric), or different domains may be defined for the same attribute. Constraint definitions, too, may vary. The designer must remove such conflicts from the model.

257

229. What is the minimal data rule in conceptual design? Why is it important? Answer: The minimal data rule specifies that all the data defined in the data model are actually required to fit present and expected future data requirements. This rule may be phrased as All that is needed is there, and all that is there is needed. 230. Discuss the distinction between top-down and bottom-up approaches in database design. Answer: There are two basic approaches to database design: top-down and bottom-up. Top-down design begins by identifying the different entity types and the definition of each entity’s attributes. In other words, top-down design: 

starts by defining the required data sets and then



defines the data elements for each of those data sets.

Bottom-up design: 

first defines the required attributes and then



groups the attributes to form entities.

Although the two methodologies tend to be complementary, database designers who deal with small databases with relatively few entities, attributes, and transactions tend to emphasize the bottom-up approach. Database designers who deal with large, complex databases usually find that a primarily top-down design approach is more appropriate. 231. What are business rules? Why are they important to a database designer? Answer: Business rules are narrative descriptions of the business policies, procedures, or principles that are derived from a detailed description of operations. Business rules are particularly valuable to database designers because they help define: 

Entities



Attributes



Relationships (1:1, 1:M, M:N, expressed through connectivities and cardinalities)



Constraints

To develop an accurate data model, the database designer must have a thorough and complete understanding of the organization’s data requirements. The business rules are very important to the designer because they enable the designer to fully understand how the business works and what role is played by data within company operations.

258

NOTE Do keep in mind that an ERD cannot always include all the applicable business rules. For example, although constraints are often crucial, it is often not possible to model them. For instance, there is no way to model a constraint such as “no pilot may be assigned to flight duties more than ten hours during any 24-hour period.” It is also worth emphasizing that the description of (company) operations must be done in almost excruciating detail and it must be verified and reverified. An inaccurate description of operations yields inaccurate business rules that lead to database designs that are destined to fail. 232. What is the data dictionary’s function in database design? Answer: A good data dictionary provides a precise description of the characteristics of all the entities and attributes found within the database. The data dictionary thus makes it easier to check for the existence of synonyms and homonyms, to check whether all attributes exist to support required reports, to verify appropriate relationship representations, and so on. The data dictionary’s contents are both developed and used during the six DBLC phases: DATABASE INITIAL STUDY The basic data dictionary components are developed as the entities and attributes are defined during this phase. DATABASE DESIGN The data dictionary contents are used to verify the database design components: entities, attributes, and their relationships. The designer also uses the data dictionary to check the database design for homonyms and synonyms and verifies that the entities and attributes will support all required query and report requirements. IMPLEMENTATION AND LOADING The DBMS’s data dictionary helps to resolve any remaining attribute definition inconsistencies. TESTING AND EVALUATION If problems develop during this phase, the data dictionary contents may be used to help restructure the basic design components to make sure that they support all required operations. OPERATION If the database design still yields (the almost inevitable) operational glitches, the data dictionary may be used as a quality control device to ensure that operational modifications to the database do not conflict with existing components.

259

MAINTENANCE AND EVOLUTION As users face inevitable changes in information needs, the database may be modified to support those needs. Perhaps entities, attributes, and relationships must be added, or relationships must be changed. If new database components fit into the design, their introduction may produce conflict with existing components. The data dictionary turns out to be a very useful tool to check whether a suggested change invites conflicts within the database design and, if so, how such conflicts may be resolved. 233. What steps are required in the development of an ER diagram? (Hint: See Table 9.3.) Answer: Table 9.3 is reproduced for your convenience.

Table 9.3 Developing the Conceptual Model Using ER Diagrams STEP

ACTIVITY

Identify, analyze, and refine the business rules.

Identify the main entities, using the results of Step 1.

Define the relationships among the entities, using the results of Steps 1 and 2.

Define the attributes, primary keys, and foreign keys for each of the entities.

Normalize the entities. (Remember that entities are implemented as tables in an RDBMS.)

Complete the initial ER diagram.

Validate the ER model against the end users’ information and processing requirements.

Modify the ER model, using the results of Step 7. Point out that some of the steps listed in Table 9.3 take place concurrently. And some, such as the normalization process, can generate demand for additional entities and/or attributes, thereby causing the designer to revise the ER model. For example, while identifying two main entities, the designer might also identify the composite bridge entity that represents the many-to-many relationship between those two main entities.

234. List and briefly explain the activities involved in the verification of an ER model. Answer: Section 9-4c, “Data Model Verification,” includes a discussion on verification. In addition, Appendix C, “The University Lab: Conceptual Design Verification, Logical Design, and Implementation,” covers the verification process in detail. The verification process is detailed in the text’s Table 9.5, reproduced here for your convenience.

260

Table 9.5 The ER Model Verification Process STEP

ACTIVITY

Identify the ER model’s central entity.

Identify each module and its components.

Identify each module’s transaction requirements: Internal: Updates/Inserts/Deletes/Queries/Reports External: Module interfaces

Verify all processes against the module’s processing and reporting requirements.

Make all necessary changes suggested in Step 4.

Repeat Steps 2−5 for all modules. Keep in mind that the verification process requires the continuous verification of business transactions as well as system and user requirements. The verification sequence must be repeated for each of the system’s modules.

235. What factors are important in a DBMS software selection? Answer: The selection of DBMS software is critical to the information system’s smooth operation. Consequently, the advantages and disadvantages of the proposed DBMS software should be carefully studied. To avoid false expectations, the end user must be made aware of the limitations of both the DBMS and the database. Although the factors affecting the purchasing decision vary from company to company, some of the most common are: 

Cost. Purchase, maintenance, operational, license, installation, training, and conversion costs.



DBMS features and tools. Some database software includes a variety of tools that facilitate the application development task. For example, the availability of query by example (QBE), screen painters, report generators, application generators, data dictionaries, and so on, helps to create a more pleasant work environment for both the end user and the application programmer. Database administrator facilities, query facilities, ease of use, performance, security, concurrency control, transaction processing, and third-party support also influence DBMS software selection.



Underlying model. Hierarchical, network, relational, object/relational, or object.



Portability. Across platforms, systems, and languages.



DBMS hardware requirements. Processor(s), RAM, disk space, and so on.

261

236. List and briefly explain the four steps performed during the logical design stage. Answer: 1. Map conceptual model to logical model components. In this step, the conceptual model is converted into a set of table definitions including table names, column names, primary keys, and foreign keys for implementing the entities and relationships specified in the conceptual design. 2. Validate the logical model using normalization. It is possible for normalization issues to be discovered during the process of mapping the conceptual model to logical model components. Therefore, it is appropriate at this stage to validate that all of the table definitions from the previous step conform to the appropriate normalization rules. 3. Validate logical model integrity constraints. This step involves the conversion of attribute domains and constraints into constraint definitions that can be implemented within the DBMS to enforce those domains. Also, entity and referential integrity constraints are validated. Views may be defined to enforce security constraints. 4. Validate the logical model against the user requirements. The final step of this stage is to ensure that all definitions created throughout the logical model are validated against the users’ data, transaction, and security requirements. Every component (table, view, constraint, etc.) of the logical model must be associated with satisfying the user requirements, and every user requirement should be addressed by the model components. 237. List and briefly explain the three steps performed during the physical design stage. Answer: 1. Define data storage organization. Based on estimates of the data volume and growth, this step involves the determination of the physical location and physical organization of each table. Also, which columns will be indexed and the type of indexes to be used are determined. Finally, the type of implementation to be used for each view is decided. 2. Define integrity and security measures. This step involves creating users and security groups and then assigning privileges and controls to those users and groups. 3. Determine performance measurements. The actual performance of the physical database implementation must be measured and assessed for compliance with user performance requirements.

262

238. What three levels of backup may be used in database recovery management? Briefly describe what each backup level does. Answer: A full backup of the database creates a backup copy of all database objects in their entirety. A differential backup of the database creates a backup of only those database objects that have changed since the last full backup. A transaction log backup does not create a backup of database objects but makes a backup of the log of changes that have been applied to the database objects since the last backup.

263

ANSWERS TO PROBLEMS The ABC Car Service & Repair Centers are owned by the Silent Car Dealership; ABC services and repairs only silent cars. Three ABC centers provide service and repair for the entire state. Each of the three centers is independently managed and operated by a shop manager, a receptionist, and at least eight mechanics. Each center maintains a fully stocked parts inventory. Each center also maintains a manual file system in which each car’s maintenance history is kept; repairs made, parts used, costs, service dates, owner, and so on. Files are also kept to track inventory, purchasing, billing, employees’ hours, and payroll. You have been contacted by one of the center’s managers to design and implement a computerized database system. Given the preceding information, do the following: a. Indicate the most appropriate sequence of activities by labeling each of the following steps in the correct order. (e.g., if you think that “Load the database” is the appropriate first step, label it “1.”) ____

Normalize the conceptual model.

____

Obtain a general description of company operations.

____

Load the database.

____

Create a description of each system process.

____

Test the system.

____

Draw a data flow diagram and system flowcharts.

____

Create a conceptual model using ER diagrams.

____

Create the application programs.

____

Interview the mechanics.

____

Create the file (table) structures.

____

Interview the shop manager.

Answer: The answer to this question may vary slightly from one designer to the next, depending on the selected design methodology and even on personal designer preferences. Yet, in spite of such differences, it is possible to develop a common design methodology to permit the development of a basic decision-making process and the analysis required in designing an information system. Whatever the design philosophy, a good designer uses a specific and ordered set of steps through which the database design problem is approached. The steps are generally based on three phases: analysis, design, and implementation. These phases yield the following activities:

264

ANALYSIS 1. Interview the shop manager 2. Interview the mechanics 3. Obtain a general description of company operations 4. Create a description of each system process DESIGN 5. Create a conceptual model, using ER diagrams 6. Draw a data flow diagram and system flow charts 7. Normalize the conceptual model IMPLEMENTATION 8. Create the table structures 9. Load the database 10. Create the application programs 11. Test the system This listing implies that, within each of the three phases, the steps are completed in a specific order. For example, it would seem reasonable to argue that we must first complete the interviews if we are to obtain a proper description of the company’s operations. Similarly, we may argue that a data flow diagram precedes the creation of the ER diagram. Nevertheless, the specific tasks and the order in which they are addressed may vary. Such variations do not matter, as long as the designer bases the selected procedures on appropriate design philosophy, such as top-down versus bottom-up. Given this discussion, we may present Problem 1’s solution this way: 7

Normalize the conceptual model.

Obtain a general description of company operations.

Load the database.

Create a description of each system process.

Test the system.

Draw a data flow diagram and system flow charts.

Create a conceptual model using ER diagrams.

Create the application programs.

265

Interview the mechanics.

Create the file (table) structures.

Interview the shop manager.

b. Describe the various modules that you believe the system should include. Answer: This question may be addressed in several ways. We suggest the following approach to develop a system composed of four main modules: Inventory, Payroll, Work Order, and Customer. We have illustrated the Information System’s main modules in Figure P9.1B.

FIGURE P9.1B The ABC Company’s IS System Modules

The Inventory module will include the Parts and Purchasing submodules. The Payroll Module will handle all employee and payroll information. The Work Order module keeps track of the car maintenance history and all work orders for maintenance done on a car. The Customer module keeps track of the billing of the work orders to the customers and of the payments received from those customers. c. How will a data dictionary help you develop the system? Give examples. Answer: We have addressed the role of the data dictionary within the DBLC in detail in the answer to Review Question 10. Remember that the data dictionary makes it easier to check for the existence of synonyms and homonyms, to check whether all attributes exist to support required reports, to verify appropriate relationship representations, and so on. Therefore, the data dictionary’s contents will help us to provide consistency across modules and to evaluate the system’s ability to generate the required reports. In addition, the use of the data dictionary facilitates the creation of system documentation. d. What general (system) recommendations might you make to the shop manager? For example, if the system will be integrated, what modules will be integrated? What

266

benefits would be derived from such an integrated system? Include several general recommendations. Answer: The designer’s job is to provide solutions to the main problems found during the initial study. Clearly, any system is subject to both internal and external constraints. For example, we can safely assume that the owner of the ABC Car Service & Repair Center has a time frame in mind, not to mention a spending limitation. As is true in all design work, the designer and the business owner must prioritize the modules and develop those that yield the greatest benefit within the stated time and development budget constraints. Keep in mind that it is always useful to develop a modular system that provides for future enhancement and expansion. Suppose, for example, that the ABC Car Service & Repair company management decides to integrate all of its service stations in the state in order to provide better statewide service. Such integration is likely to yield many benefits: The car history of each car will be available to any station for cars that have been serviced in more than one location; the inventory of parts will be online, thus allowing parts orders to be placed between service stations; mechanics can better share tips concerning the solution to car maintenance problems, and so on. e. What is the best approach to conceptual database design? Why? Answer: Given the nature of this business, the best way to produce this conceptual database design would be to use a centralized and top-down approach. Keep in mind that the designer must keep the design sufficiently flexible to make sure that it can accommodate any future integration of this system with the other service stations in the state. f.

Name and describe at least four reports the system should have. Explain their use. Who will use the reports?

Answer: REPORT 1 Monthly Activity contains a summary of service categories by branch and by month. Such reports may become the basis for forecasting personnel and stock requirements for each branch and for each period. REPORT 2 Mechanic Summary Sheet contains a summary of work hours clocked by each mechanic. This report would be generated weekly and would be useful for payroll and maintenance personnel scheduling purposes. REPORT 3 Monthly Inventory contains a summary of parts in inventory, inventory draw-down, parts reorder points, and information about the vendors who will provide the parts to be reordered. This report will be especially useful for inventory management purposes.

267

REPORT 4 Customer Activity contains a breakdown of customers by location, maintenance activity, current balances, available credit, and so on. This report would be useful to forecast various service demand factors, to mail promotional materials, to send maintenance reminders, to keep track of special customer requirements, and so on. 239. Suppose that you have been asked to create an information system for a manufacturing plant that produces nuts and bolts of many shapes, sizes, and functions. What questions would you ask, and how would the answers affect the database design? Answer: Basically, all answers to all (relevant) questions help shape the database design. In fact, all information collected during the initial study and all subsequent phases will have an impact on the database design. Keep in mind that the information is collected to establish the entities, attributes, and the relationships among the entities. Specifically, the relationships, connectivities, and cardinalities are shaped by the business rules that are derived from the information collected by the designer. Sample questions and their likely impact on the design might be: 

Do you want to develop the database for all departments at once, or do you want to design and implement the database for one department at a time?



How will the design approach affect the design process? (In other words, assess topdown versus bottom-up, centralized or decentralized, system scope and boundaries.)



Do you want to develop one module at a time, or do you want an integrated system? (Inventory, production, shipping, billing, etc.)



Do you want to keep track of the nuts and bolts by lot number, production shift, type, and department? Impact: conceptual and logical database design.



Do you want to keep track of the suppliers of each batch of raw material used in the production of the nuts and bolts? Impact: conceptual and logical database design. ER model.



Do you want to keep track of the customers who received the batches of nuts and bolts? Impact: conceptual and logical database design. ER model.



What reports will you require, what will be the specific reporting requirements, and to whom will these reports be distributed?

The answers to such questions affect the conceptual and logical database design, the database’s implementation, its testing, and its subsequent operation. a. What do you envision the SDLC to be? Answer: The SDLC is not a function of the information collected. Regardless of the extent of the design or its specific implementation, the SDLC phases remain:

PLANNING Initial assessment Feasibility study

268

User requirements Study of existing systems Logical system design

DETAILED SYSTEMS DESIGN Detailed system specifications

IMPLEMENTATION Coding, testing, debugging Installation, fine-tuning

MAINTENANCE Evaluation Maintenance Enhancements b. What do you envision the DBLC to be? Answer: As is true for the SDLC, the DBLC is not a function of the kind and extent of the collected information. Thus, the DBLC phases and their activities remain as shown:

DATABASE INITIAL STUDY Analyze the company situation Define problems and constraints Define objectives Define scope and boundaries

DATABASE DESIGN Create the conceptual design Create the logical design Create the physical design

IMPLEMENTATION AND LOADING Install the DBMS

269

Create the database(s) Load or convert the data

TESTING AND EVALUATION Test the database Fine-tune the database Evaluate the database and its application programs

OPERATION Produce the required information flow

MAINTENANCE AND EVOLUTION Introduce changes Make enhancements 240. Suppose that you perform the same functions noted in Problem 2 for a larger warehousing operation. How are the two sets of procedures similar? How and why are they different? Answer: The development of an information system will differ in the approach and philosophy used. More precisely, the designer team will probably be formed by a group of system analysts and may decide to use a decentralized approach to database design. Also, as is true for any organization, the system scope and constraints may be very different for different systems. Therefore, designers may opt to use different techniques at different stages. For example, the database initial study phase may include separate studies carried out by separate design teams at several geographically distant locations. Each of the findings of the design teams will later be integrated to identify the main problems, solutions, and opportunities that will guide the design and development of the system. 241. Using the same procedures and concepts employed in Problem 1, how would you create an information system for the Tiny College example in Chapter 4? Answer: Tiny College is a medium-sized educational institution that uses many database-intensive operations, such as student registration, academic administration, inventory management, and payroll. To create an information system, first perform an initial database study to determine the information system’s objectives.

270

Next, study Tiny College’s operations and processes (flow of data) to identify the main problems, constraints, and opportunities. A precise definition of the main problems and constraints will enable the designer to make sure that the design improves Tiny College’s operational efficiency. An improvement in operational efficiency is likely to create opportunities to provide new services that will enhance Tiny College’s competitive position. After the initial database study is done and the alternative solutions are presented, the end users ultimately decide which one of the probable solutions is most appropriate for Tiny College. Keep in mind that the development of a system this size will probably involve people who have quite different backgrounds. For example, it is likely that the designer must work with people who play a managerial role in communications and local area networks, as well as with the “troops in the trenches” such as programmers and system operators. The designer should, therefore, expect that there will be a wide range of opinions concerning the proposed system’s features. It is the designer’s job to reconcile the many (and often conflicting) views of the “ideal” system. Once a proposed solution has been agreed upon, the designer(s) may determine the proposed system’s scope and boundaries. We are then able to begin the design phase. As the design phase begins, keep in mind that Tiny College’s information system is likely to be used by many users (20 to 40 minimum) who are located on distant sites across campus. Therefore, the designer must consider a range of communication issues involving the use of such technologies as local area networks. These technologies must be considered as the database designer(s) begin to develop the structure of the database to be implemented. The remaining development work conforms to the SDLC and the DBLC phases. Special attention must be given to the system design’s implementation and testing to ensure that all the system modules interface properly. Finally, the designer(s) must provide all the appropriate system documentation and ensure that all appropriate system maintenance procedures (periodic backups, security checks, etc.) are in place to ensure the system’s proper operation. Keep in mind that two very important issues in a university-wide system are end-user training and support. Therefore, the system designer(s) must make sure that all end users know the system and know how it is to be used to enjoy its benefits. In other words, make sure that end-user support programs are in place when the system becomes operational. 242. Write the proper sequence of activities for the design of a video rental database. (The initial ERD was shown in Figure 9.9.) The design must support all rental activities, customer payment tracking, and employee work schedules, as well as track which employees checked out the videos to the customers. After you finish writing the design activity sequence, complete the ERD to ensure that the database design can be successfully implemented. (Make sure that the design is normalized properly and that it can support the required transactions. Answer: Given its level of detail and (relative) complexity, this problem would make an excellent class project. Use the chapter’s coverage of the database life cycle (DBLC) as the procedural template. The text’s Figure 9.3 is particularly useful as a procedural map for this problem’s solution and Figure 9.6 provides a more detailed view of the database design’s procedural flow. Make sure that the students review Section 9-3b, “Database Design,” before they attempt to produce the problem solution.

271

Appendix B “The University Lab: Conceptual Design” and Appendix C “The University Lab: Conceptual Design Verification, Logical Design, and Implementation” show a very detailed example of the procedures required to deliver a completed database. You will find a more detailed video rental database problem description in Appendix B, problem 4. This problem requires the completion of the initial database design. The solution is shown in this manual’s Appendix B coverage. This design is verified in Appendix C, Problem 2. The Visio Professional files for the initial and verified designs are located on your instructor’s resources; the FigD-P04a-The-Initial-Crows-Foot-ERD-for-the-Video-Rental-Store.vsd file has the initial design. Select the FigE-P02a-The-Revised-Video-Rental-Crows-Foot-ERD.vsd file to see the verified design. 243. In a construction company, a new system has been in place for a few months and now there is a list of possible changes/updates that need to be done. For each of the changes/updates, specify what type of maintenance needs to be done: (a) corrective, (b) adaptive, or (c) perfective. a. An error in the size of one of the fields has been identified and it needs to be updated status field needs to be changed. Answer: This is a change in response to a system error – corrective maintenance. b. The company is expanding into a new type of service, which will require enhancing the system with a new set of tables to support this new service and integrate it with the existing data. Answer: This is a change to enhance the system—perfective maintenance. c. The company has to comply with some government regulations. To do this, it will require adding a couple of fields to the existing system tables. Answer: This is a change in response to changes in the business environment—adaptive maintenance. 244. You have been assigned to design the database for a new soccer club. Indicate the most appropriate sequence of activities by labeling each of the following steps in the correct order. (e.g., if you think that “Load the database” is the appropriate first step, label it “1.”) Answer: 10

Create the application programs.

Create a description of each system process.

Test the system.

Load the database.

Normalize the conceptual model.

Interview the soccer club president.

Create a conceptual model using ER diagrams.

272

Interview the soccer club director of coaching.

Create the file (table) structures.

Obtain a general description of the soccer club operations.

Draw a data flow diagram and system flowcharts.

ANSWERS TO REVIEW QUESTIONS 245. Explain the following statement: A transaction is a logical unit of work. Answer: A transaction is a logical unit of work that must be entirely completed or aborted; no intermediate states are accepted. In other words, a transaction, composed of several database requests, is treated by the DBMS as a unit of work in which all transaction steps must be fully completed if the transaction is to be accepted by the DBMS. Acceptance of an incomplete transaction will yield an inconsistent database state. To avoid such a state, the DBMS ensures that all of a transaction’s database operations are completed before they are committed to the database. For example, a credit sale requires a minimum of three database operations: 1. An invoice is created for the sold product. 2. The product’s inventory quantity on hand is reduced. 3. The customer accounts payable balance is increased by the amount listed on the invoice. If only parts 1 and 2 are completed, the database will be left in an inconsistent state. Unless all three parts (1, 2, and 3) are completed, the entire sales transaction is canceled. 246. What is a consistent database state, and how is it achieved? Answer: A consistent database state is one in which all data integrity constraints are satisfied. To achieve a consistent database state, a transaction must take the database from one consistent state to another. (See the answer to Question 1.) 247. The DBMS does not guarantee that the semantic meaning of the transaction truly represents the real-world event. What are the possible consequences of that limitation? Give an example.

273

Answer: The database is designed to verify the syntactic accuracy of the database commands given by the user to be executed by the DBMS. The DBMS will check that the database exists, that the referenced attributes exist in the selected tables, that the attribute data types are correct, and so on. Unfortunately, the DBMS is not designed to guarantee that the syntactically correct transaction accurately represents the real-world event. For example, if the end user sells 10 units of product 100179 (Crystal Vases), the DBMS cannot detect errors such as the operator entering 10 units of product 100197 (Crystal Glasses). The DBMS will execute the transaction, and the database will end up in a technically consistent state but in a real-world inconsistent state because the wrong product was updated. 248. List and discuss the four individual transaction properties. Answer: The four transaction properties are: Atomicity

requires that all parts of a transaction must be completed or the transaction is aborted. This property ensures that the database will remain in a consistent state.

Consistency

indicates the permanence of the database consistent state.

Isolation

means that the data required by an executing transaction cannot be accessed by any other transaction until the first transaction finishes. This property ensures data consistency for concurrently executing transactions.

Durability

indicates that the database will be in a permanent consistent state after the execution of a transaction. In other words, once a consistent state is reached, it cannot be lost.

All four transaction properties work together to make sure that a database maintains data integrity and consistency for either a single-user or a multiuser DBMS. 249. What does serializability of transactions mean? Answer: Serializability of transactions means that a series of concurrent transactions will yield the same result as if they were executed one after another. 250. What is a transaction log, and what is its function? Answer: The transaction log is a special DBMS table that contains a description of all the database transactions executed by the DBMS. The database transaction log plays a crucial role in maintaining database concurrency control and integrity. The information stored in the log is used by the DBMS to recover the database after a transaction is aborted or after a system failure. The transaction log is usually stored in a different hard disk or in a different media (tape) to prevent the failure caused by a media error. 251. What is a scheduler, what does it do, and why is its activity important to concurrency control? Answer: The scheduler is the DBMS component that establishes the order in which concurrent database operations are executed. The scheduler interleaves the execution of the database operations (belonging to several concurrent transactions) to ensure the serializability of transactions. In other words, the scheduler guarantees that the execution of concurrent transactions will yield the same result as though the transactions were executed one after another. The scheduler is important because it is the DBMS component that will ensure transaction serializability. In other words, the scheduler allows the concurrent

274

execution of transactions, giving end users the impression that they are the DBMS’s only users. 252. What is a lock, and how does it work in general? Answer: A lock is a mechanism used in concurrency control to guarantee the exclusive use of a data element to the transaction that owns the lock. For example, if the data element X is currently locked by transaction T1, transaction T2 will not have access to the data element X until T1 releases its lock. Generally speaking, a data item can be in only two states: locked (being used by some transaction) or unlocked (not in use by any transaction). To access a data element X, a transaction T1 first must request a lock to the DBMS. If the data element is not in use, the DBMS will lock X to be used by T1 exclusively. No other transaction will have access to X while T1 is executed. 253. What are the different levels of lock granularity? Answer: Lock granularity refers to the size of the database object that a single lock is placed upon. Lock granularity can be: Database-level, meaning the entire database is locked by one lock. Table-level, meaning a table is locked by one lock. Page-level, meaning a diskpage is locked by one lock. Row-level, meaning one row is locked by one lock. Field-level, meaning one field in one row is locked by one lock. 254. Why might a page-level lock be preferred over a field-level lock? Answer: Smaller lock granularity improves the concurrency of the database by reducing contention to lock database objects. However, smaller lock granularity also means that more locks must be maintained and managed by the DBMS, requiring more processing overhead and system resources for lock management. Concurrency demands and system resource usage must be balanced to ensure the best overall transaction performance. In some circumstances, page-level locks, which require fewer system resources, may produce better overall performance than field-level locks, which require more system resources. 255. What is concurrency control, and what is its objective? Answer: Concurrency control is the activity of coordinating the simultaneous execution of transactions in a multiprocessing or multiuser database management system. The objective of concurrency control is to ensure the serializability of transactions in a multiuser database management system. (The DBMS’s scheduler is in charge of maintaining concurrency control.) Because it helps to guarantee data integrity and consistency in a database system, concurrency control is one of the most critical activities performed by a DBMS. If concurrency control is not maintained, three serious problems may be caused by concurrent transaction execution: lost updates, uncommitted data, and inconsistent retrievals. 256. What is an exclusive lock, and under what circumstances is it granted?

275

Answer: An exclusive lock is one of two lock types used to enforce concurrency control. (A lock can have three states: unlocked, shared (read) lock, and exclusive (write) lock. The “shared” and “exclusive” labels indicate the nature of the lock.) An exclusive lock exists when access to a data item is specifically reserved for the transaction that locked the object. The exclusive lock must be used when a potential for conflict exists, for example, when one or more transactions must update (WRITE) a data item. Therefore, an exclusive lock is issued only when a transaction must WRITE (update) a data item and no locks are currently held on that data item by any other transaction. To understand the reasons for having an exclusive lock, look at its counterpart, the shared lock. Shared locks are appropriate when concurrent transactions are granted READ access on the basis of a common lock, because concurrent transactions based on a READ cannot produce a conflict. A shared lock is issued when a transaction must read data from the database and no exclusive locks are held on the data to be read. 257. What is a deadlock, and how can it be avoided? Discuss several strategies for dealing with deadlocks. Answer: Base your discussion on Section 10-3d, Deadlocks. Start by pointing out that, although locks prevent serious data inconsistencies, their use may lead to two major problems: 1. The transaction schedule dictated by the locking requirements may not be serializable, thus causing data integrity and consistency problems. 2. The schedule may create deadlocks. Database deadlocks are the equivalent of a traffic gridlock in a big city and are caused by two transactions waiting for each other to unlock data. Use Table 10.13 in the text to illustrate the scenario that leads to a deadlock. The table has been reproduced below for your convenience.

Table 10.13 How a Deadlock Condition Is Created

TIME 0 1 2 3 4 5 6 7 8 9 … … … …

TRANSACTION T1:LOCK(X) T2:LOCK(Y) T1:LOCK(Y) T2:LOCK(X) T1:LOCK(Y) T2:LOCK(X) T1:LOCK(Y) T2:LOCK(X) T1:LOCK(Y) ………….. ………….. ………….. …………..

REPLY OK OK WAIT WAIT WAIT WAIT WAIT WAIT WAIT …….. …….. …….. ……..

Data X Unlocked Locked Locked Locked Locked Locked Locked Locked Locked … … … …

LOCK STATUS Data Y Unlocked Unlocked Locked Deadlock Locked Locked Locked Locked Locked Locked … … … …

276

In a real-world DBMS, many more transactions can be executed simultaneously, thereby increasing the probability of generating deadlocks. Note that deadlocks are possible only if one of the transactions wants to obtain an exclusive lock on a data item; no deadlock condition can exist among shared locks. Three basic techniques exist to control deadlocks:

DEADLOCK PREVENTION A transaction requesting a new lock is aborted if there is a possibility that a deadlock may occur. If the transaction is aborted, all the changes made by this transaction are rolled back and all locks are released. The transaction is then re-scheduled for execution. Deadlock prevention works because it avoids the conditions that lead to deadlocking.

DEADLOCK DETECTION The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of the transactions (the “victim”) is aborted (rolled back and rescheduled) and the other transaction continues. Note particularly the discussion in Section 10-4a, Wait/Die and Wound/Wait Schemes.

DEADLOCK AVOIDANCE The transaction must obtain all the locks it needs before it can be executed. This technique avoids rollback of conflicting transactions by requiring that locks be obtained in succession. However, the serial lock assignment required in deadlock avoidance increases the response times. The best deadlock control method depends on the database environment. For example, if the probability of deadlocks is low, deadlock detection is recommended. However, if the probability of deadlocks is high, deadlock prevention is recommended. If response time is not high on the system priority list, deadlock avoidance may be employed. 258. What are some disadvantages of time stamping methods for concurrency control? Answer: The disadvantages are: (1) each value stored in the database requires two additional time stamp fields—one for the last time the field was read and one for the last time it was updated, (2) increased memory and processing overhead requirements, and (3) many transactions may have to be stopped, rescheduled, and restamped. 259. Why might it take a long time to complete transactions when using an optimistic approach to concurrency control? Answer: Because the optimistic approach makes the assumption that conflict from concurrent transactions is unlikely, it does nothing to avoid conflicts or control the conflicts. The only test for conflict occurs during the validation phase. If a conflict is detected, then the entire transaction restarts. In an environment with few conflicts from concurrency, this type of single checking scheme works well. In an environment where conflicts are common, a transaction may have to be restarted numerous times before it can be written to the database. 260. What are the three types of database critical events that can trigger the database recovery process? Give some examples for each one.

277

Answer: Backup and recovery functions constitute a very important component of today’s DBMSs. Some DBMSs provide functions that allow the database administrator to perform and schedule automatic database backups to permanent secondary storage devices, such as disks or tapes. Critical events include: 

Hardware/software failures. hard disk media failure, a bad capacitor on a motherboard, or a failing memory bank. Other causes of errors under this category include application program or operating system errors that cause data to be overwritten, deleted, or lost.



Human-caused incidents. This type of event can be categorized as unintentional or intentional.





An unintentional failure is caused by carelessness by end users. Such errors include deleting the wrong rows from a table, pressing the wrong key on the keyboard, or shutting down the main database server by accident.



Intentional events are of a more severe nature and normally indicate that the company data are at serious risk. Under this category are security threats caused by hackers trying to gain unauthorized access to data resources and virus attacks caused by disgruntled employees trying to compromise the database operation and damage the company.

Natural disasters. This category includes fires, earthquakes, floods, and power failures.

261. What are the four ANSI transaction isolation levels? What type of reads does each level allow? Answer: The four ANSI transaction isolation levels are (1) read uncommitted, (2) read committed, (3) repeatable read, and (4) serializable. These levels allow different “questionable” reads. A read is questionable if it can produce inconsistent results. Read uncommitted isolation will allow dirty reads, nonrepeatable reads, and phantom reads. Read committed isolation will allow nonrepeatable reads and phantom reads. Repeatable read isolation will allow phantom reads. Serializable does not allow any questionable reads.

278

ANSWERS TO PROBLEMS Suppose that you are a manufacturer of product ABC, which is composed of parts A, B, and C. Each time a new product ABC is created, it must be added to the product inventory using the PROD_QOH in a table named PRODUCT. Also, each time the product is created, the parts inventory, using PART_QOH in a table named PART, must be reduced by one each of parts A, B, and C. The sample database contents are shown in Table P10.1.

Table P10.1

Table name: PRODUCT

Table name: PART

PROD_CODE

PROD_QOH

PART_CODE

PART_QOH

ABC

1,205

567

549

Given the preceding information, complete Problems 1a through 1e. a. How many database requests can you identify for an inventory update for both PRODUCT and PART? Answer: Depending in how the SQL statements are written, there are two correct answers: 4 or 2. b. Using SQL, write each database request you identified in Problem 1a. Answer: The database requests are shown in the following table.

Four SQL statements UPDATE PRODUCT SET PROD_QOH = PROD_OQH + 1 WHERE PROD_CODE = ‘ABC’ UPDATE PART SET PART_QOH = PART_OQH - 1 WHERE PART_CODE = ‘A’ UPDATE PART SET PART_QOH = PART_OQH - 1 WHERE PART_CODE = ‘B’ UPDATE PART

Two SQL statements UPDATE PRODUCT SET PROD_QOH = PROD_OQH + 1 WHERE PROD_CODE = ‘ABC’ UPDATE PART SET PART_QOH = PART_OQH 1 WHERE PART_CODE = ‘A’ OR PART_CODE = ‘B’ OR PART_CODE = ‘C’

SET PART_QOH = PART_OQH - 1

279

Four SQL statements

Two SQL statements

WHERE PART_CODE = ‘C’

c. Write the complete transaction(s). Answer: The transactions are shown in the following table.

Four SQL statements

Two SQL statements

BEGIN TRANSACTION

UPDATE PRODUCT

SET PROD_QOH = PROD_OQH + 1

WHERE PROD_CODE = ‘ABC’

UPDATE PART

SET PART_QOH = PART_OQH - 1

WHERE PART_CODE = ‘A’

WHERE PART_CODE = ‘A’ OR PART_CODE = ‘B’ OR

UPDATE PART

PART_CODE = ‘C’

SET PART_QOH = PART_OQH - 1 WHERE PART_CODE = ‘B’

COMMIT;

UPDATE PART SET PART_QOH = PART_OQH - 1 WHERE PART_CODE = ‘C’ COMMIT; d. Write the transaction log, using Table 10.1 as your template. Answer: We assume that product “ABC” has a PROD_QOH = 23 at the start of the transaction and that the transaction is representing the addition of 1 new product. We also assume that PART components “A”, “B”, and “C” have a PROD_QOH equal to 56, 12, and 45, respectively. TRL ID

TRX NUM

PREV PTR

NEXT PTR

OPERATION

1A3

NULL

START

**START TRANSACTION

1A3

UPDATE

PRODUCT

TABLE

ROW ID

ATTRIBUTE

BEFORE VALUE

AFTER VALUE

‘ABC’

PROD_QOH

280

TRL ID

TRX NUM

PREV PTR

NEXT PTR

ROW ID

ATTRIBUTE

BEFORE VALUE

AFTER VALUE

OPERATION

1A3

UPDATE

PART

‘A’

PART_QOH

1A3

UPDATE

PART

‘B’

PART_QOH

1A3

UPDATE

PART

‘C’

PART_QOH

1A3

NULL

COMMIT

** END TRANSACTION

TABLE

e. Using the transaction log you created in Problem 1d, trace its use in database recovery. Answer: Begin with the last trl_id (trl_id 6) for the transaction (trx_num 1A3) and work backward using the prev_ptr to identify the next step to undo moving from the end of the transaction back to the beginning. Trl_ID 6: Nothing to change because it is an end of transaction marker. Trl_ID 5: Change PART_QOH from 44 to 45 for ROW_ID ‘C’ in PART table. Trl_ID 4: Change PART_QOH from 11 to 12 for ROW_ID ‘B’ in PART table. Trl_ID 3: Change PART_QOH from 55 to 56 for ROW_ID ‘A’ in PART table. Trl_ID 2: Change PROD_QOH from 24 to 23 for ROW_ID ‘ABC’ in PRODUCT table. Trl_ID 1: Nothing to change because it is a beginning of transaction marker. 262. Describe the three most common problems with concurrent transaction execution. Explain how concurrency control can be used to avoid those problems. Answer: The three main concurrency control problems are triggered by lost updates, uncommitted data, and inconsistent retrievals. These control problems are discussed in detail in Section 10-2. Note particularly Section 10-2a, Lost Updates, Section 10-2b, Uncommitted Data, and Section 10-2c, Inconsistent Retrievals. 263. What DBMS component is responsible for concurrency control? How is this feature used to resolve conflicts? Answer: Severe database integrity and consistency problems can arise when two or more concurrent transactions are executed. In order to avoid such problems, the DBMS must exercise concurrency control. The DBMS’s component in charge of concurrency control is the scheduler. The scheduler is discussed in Section 10-2d. Note particularly the Read/Write conflict scenarios illustrated with the help of Table 10.11, Read/Write Conflict Scenarios: Conflicting Database Operations Matrix. 264. Using a simple example, explain the use of binary and shared/exclusive locks in a DBMS. Answer: Binary locks have two states, locked and unlocked. Shared/exclusive locks have three states, shared lock, exclusive lock, and unlocked. For example, given a row-level lock granularity and three transactions that all want access to the same customer row with the following requests:

281

T1: read customer name T2: read customer address T3: update customer balance If binary locks are used, then while T1 has the customer row locked, both other transactions must wait. Once T1 releases the lock, T2 gets a lock, and T3 continues to wait. When T2 is finished, then T3 can finally get a lock. T3’s total wait time is the combination of T1’s time plus T2’s time.

282

If shared/exclusive locks are used, then T1 gets a shared lock on the customer since it is only reading the data. T2 is then allowed to join the shared lock with T1 since it also only wants to read the data. T2 did not have to wait for T1 to finish, both transactions shared the locked data simultaneously. T3 needs an exclusive lock to update the data, so it must wait until both T1 and T2 release the shared lock. The shared/exclusive locks provided overall better performance since T2 did not have to wait, and T3’s total wait time is less since T2 did not have to wait for T1 to finish before it could begin. 265. Suppose that your database system has failed. Describe the database recovery process and the use of deferred-write and write-through techniques. Answer: Recovery restores a database from a given state, usually inconsistent, to a previously consistent state. Depending on the type and the extent of the failure, the recovery process ranges from a minor short-term inconvenience to a major long-term rebuild action. Regardless of the extent of the required recovery process, recovery is not possible without backup. The database recovery process generally follows a predictable scenario: 1. Determine the type and the extent of the required recovery. 2. If the entire database needs to be recovered to a consistent state, the recovery uses the most recent backup copy of the database in a known consistent state. 3. The backup copy is then rolled forward to restore all subsequent transactions by using the transaction log information. 4. If the database needs to be recovered, but the committed portion of the database is usable, the recovery process uses the transaction log to “undo” all the transactions that were not committed. Recovery procedures generally make use of deferred-write and write-thru techniques. In the case of the deferred-write or deferred-update, the transaction operations do not immediately update the database. Instead: 

All changes (previous and new data values) are first written to the transaction log.



The database is updated only after the transaction reaches its commit point.



If the transaction fails before it reaches its commit point, no changes (no roll-back or undo) need to be made to the database because the database was never updated.

In contrast, if the write-thru or immediate-update technique is used: 

The database is immediately updated by transaction operations during the transaction’s execution, even before the transaction reaches its commit point.



The transaction log is also updated; so if a transaction fails, the database uses the log information to roll back (“undo”) the database to its previous state.

ONLINE CONTENT The Ch10_ABC_Markets database is available at www.cengage.com. This database is stored in Microsoft Access format. 266. ABC Markets sell products to customers. The relational diagram shown in Figure P10.6 represents the main entities for ABC’s database. Note the following important characteristics: © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

283



A customer may make many purchases, each one represented by an invoice.  The CUS_BALANCE is updated with each credit purchase or payment and represents the amount the customer owes.  The CUS_BALANCE is increased (+) with every credit purchase and decreased (−) with every customer payment.  The date of last purchase is updated with each new purchase made by the customer.  The date of last payment is updated with each new payment made by the customer.



An invoice represents a product purchase by a customer.  An INVOICE can have many invoice LINEs, one for each product purchased.  The INV_TOTAL represents the total cost of the invoice, including taxes.  The INV_TERMS can be “30,” “60,” or “90” (representing the number of days of credit) or “CASH,” “CHECK,” or “CC.”  The invoice status can be “OPEN,” “PAID,” or “CANCEL.”



A product’s quantity on hand (P_QTYOH) is updated (decreased) with each product sale.



A customer may make many payments. The payment type (PMT_TYPE) can be one of the following:  “CASH” for cash payments.  “CHECK” for check payments.  “CC” for credit card payments.



The payment details (PMT_DETAILS) are used to record data about check or credit card payments:  The bank, account number, and check number for check payments.  The issuer, credit card number, and expiration date for credit card payments.

Note: Not all entities and attributes are represented in this example. Use only the attributes indicated.

284

FIGURE P10.6 The ABC Markets Relational Diagram

Using this database, write the SQL code to represent each of the following transactions. Use BEGIN TRANSACTION and COMMIT to group the SQL statements in logical transactions. a. On May 11, 2022, customer 10010 makes a credit purchase (30 days) of one unit of product 11QER/31 with a unit price of $110.00; the tax rate is 8 percent. The invoice number is 10983, and this invoice has only one product line. Answer: a. BEGIN TRANSACTION b. INSERT INTO INVOICE i. c.

VALUES (10983, ‘10010’, ‘2022-05-11’, 118.80, ‘30’, ‘OPEN’);

INSERT INTO LINE i.

VALUES (10983, 1, ‘11QER/31’, 1, 110.00);

d. UPDATE PRODUCT i.

SET P_QTYOH = P_QTYOH – 1

ii.

WHERE P_CODE = ‘11QER/31’;

e. UPDATE CUSTOMER f.

SET CUS_DATELSTPUR = ‘2022-05-11’, CUS_BALANCE = CUS_BALANCE +118.80

g. WHERE CUS_CODE = ‘10010’;

285

h. COMMIT; b. On June 3, 2022, customer 10010 makes a payment of $100 in cash. The payment ID is 3428. Answer: a. BEGIN TRANSACTION b. INSERT INTO PAYMENTS VALUES (3428, ‘2022-06-03’, ‘10010’, 100.00, ‘CASH’, ‘None’); UPDATE CUSTOMER; SET CUS_DATELSTPMT = ‘2022-06-03’, CUS_BALANCE = CUS_BALANCE -100.00 WHERE CUS_CODE = ‘10010’; COMMIT 267. Create a simple transaction log (using the format shown in Table 10.14) to represent the actions of the transactions in Problems 6a and 6b. Answer: The transaction log is shown in Table P10.7.

Table P10.7 The ABC Markets Transaction Log TRL ID

TRX NUM

PREV PTR

NEXT PTR

OPERATION

TABLE

ROW ID

ATTRIBUTE

BEFORE VALUE

987

101

Null

1023

START

* Start Trx.

1023

101

987

1026

INSERT

INVOICE

10983

10983, 10010, 2022-05-11, 118.80, 30, OPEN

1026

101

1023

1029

INSERT

LINE

10983, 1

10983, 1, 11QER/31, 1, 110.00

1029

101

1026

1031

UPDATE

PRODUCT

11QER/31

P_QTYOH

1031

101

1029

1032

UPDATE

CUSTOMER

10010

CUS_BALANCE

345.67

464.47

1032

101

1031

1034

UPDATE

CUSTOMER

10010

CUS_DATELSTPUR

2022-0505

2022-05-11

1034

101

1032

Null

COMMIT

* End Trx. *

1089

102

Null

1091

START

* Start Trx.

1091

102

1089

1095

INSERT

PAYMENT

3428

AFTER VALUE

3428, 202206-03, 10010, 100.00, CASH, None

286

TRL ID

TRX NUM

PREV PTR

NEXT PTR

OPERATION

TABLE

ROW ID

ATTRIBUTE

BEFORE VALUE

AFTER VALUE

1095

102

1091

1096

UPDATE

CUSTOMER

10010

CUS_BALANCE

464.47

364.47

1096

102

1095

1097

UPDATE

CUSTOMER

10010

CUS_DATELSTPMT

2022-0502

2022-06-03

1097

102

1096

Null

COMMIT

* End Trx.

Note: Because we have not shown the table contents, the “before” values in the transaction can be assumed. The “after” value must be computed using the assumed “before” value, plus or minus the transaction value. Also, in order to save some space, we have combined the “after” values for the INSERT statements into a single cell. Actually, each value could be entered in individual rows. 268. Assuming that pessimistic locking is being used but the two-phase locking protocol is not, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6a. Answer:

Time

Action

Lock INVOICE

Insert row 10983 into INVOICE

Unlock INVOICE

Lock LINE

Insert row 10983, 1 into LINE

Unlock LINE

Lock PRODUCT

Update PRODUCT 11QER/31, P_QTYOH from 47 to 46

Unlock PRODUCT

Lock CUSTOMER

Update CUSTOMER 10010, CUS_BALANCE from 345.67 to 464.47

Update CUSTOMER 10010, CUS_DATELSTPUR from 2022-05-05 to 2022-05-11

Unlock CUSTOMER 269. Assuming that pessimistic locking is being used with the two-phase locking protocol, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6a. Answer:

287

Time

Action

Lock INVOICE

Lock LINE

Lock PRODUCT

Lock CUSTOMER

Insert row 10983 into INVOICE

Insert row 10983, 1 into LINE

Update PRODUCT 11QER/31, P_QTYOH from 47 to 46

Update CUSTOMER 10010, CUS_BALANCE from 345.67 to 464.47

Update CUSTOMER 10010, CUS_DATELSTPUR from 2022-05-05 to 2022-05-11

Unlock INVOICE

Unlock LINE

Unlock PRODUCT

Unlock CUSTOMER 270. Assuming that pessimistic locking is being used but the two-phase locking protocol is not, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6b. Answer:

Time

Action

Lock PAYMENT

Insert row 3428 into PAYMENT

Unlock PAYMENT

Lock CUSTOMER

Update CUSTOMER 10010, CUS_BALANCE from 464.47 to 364.47

Update CUSTOMER 10010, CUS_DATELSTPMT from 2022-05-02 to 2022-06-03

Unlock CUSTOMER 271. Assuming that pessimistic locking with the two-phase locking protocol is being used with rowlevel lock granularity, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6b. Answer:

288

Time

Action

Lock PAYMENT

Lock CUSTOMER

Insert row 3428 into PAYMENT

Update CUSTOMER 10010, CUS_BALANCE from 464.47 to 364.47

Update CUSTOMER 10010, CUS_DATELSTPMT from 2022-05-02 to 2022-06-03

Unlock PAYMENT

Unlock CUSTOMER

ANSWERS TO REVIEW QUESTIONS 272. What is SQL performance tuning? Answer: SQL performance tuning describes a process—on the client side—that will generate an SQL query to return the correct answer in the least amount of time, using the minimum amount of resources at the server end. 273. What is database performance tuning? Answer: DBMS performance tuning describes a process—on the server side—that will properly configure the DBMS environment to respond to clients’ requests in the fastest way possible while making optimum use of existing resources. 274. What is the focus of most performance-tuning activities, and why does that focus exist? Answer: Most performance-tuning activities focus on minimizing the number of I/O operations because the I/O operations are much slower than reading data from the data cache. At this point in the discussion, it will be good to point out the technological advances in hardware, such as solid-state drives (SSD) and in-memory databases. Although such advances improve I/O performance at the physical level, performance tuning is still important at the query formulation level because inefficient joins can still cause a query to use © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

289

unnecessary resources and increase processing times at parsing, execution, and fetching phases. 275. What are database statistics, and why are they important? Answer: The term database statistics refers to a number of measurements gathered by the DBMS to describe a snapshot of the database objects’ characteristics. The DBMS gathers statistics about objects such as tables, indexes, and available resources—such as the number of processors used, processor speed, and temporary space available. Such statistics are used to make critical decisions about improving query processing efficiency.

290

276. How are database statistics obtained? Answer: Database statistics can be gathered manually by the DBA or automatically by the DBMS. For example, many DBMS vendors support SQL’s ANALYZE command to gather statistics. In addition, many vendors have their own routines to gather statistics. For example, IBM’s DB2 uses the RUNSTATS procedure, while Microsoft’s SQL Server uses the UPDATE STATISTICS procedure and provides the Auto-Update and Auto-Create Statistics options in its initialization parameters. 277. What database statistics measurements are typical of tables, indexes, and resources? Answer: For tables, typical measurements include the number of rows, the number of disk blocks used, row length, the number of columns in each row, the number of distinct values in each column, the maximum value in each column, the minimum value in each column, and what columns have indexes. For indexes, typical measurements include the number and name of columns in the index key, the number of key values in the index, the number of distinct key values in the index key, and histogram of key values in an index. For resources, typical measurements include the logical and physical disk block size, the location and size of data files, and the number of extends per data file. 278. How is the processing of SQL DDL statements (such as CREATE TABLE) different from the processing required by DML statements? Answer: A DDL statement actually updates the data dictionary tables or system catalog, while DML statements (SELECT, INSERT, UPDATE, and DELETE) mostly manipulate enduser data. 279. In simple terms, the DBMS processes a query in three phases. What are the phases, and what is accomplished in each phase? Answer: The three phases are: 1. Parsing. The DBMS parses the SQL query and chooses the most efficient access/execution plan. 2. Execution. The DBMS executes the SQL query using the chosen execution plan. 3. Fetching. The DBMS fetches the data and sends the result set back to the client. Parsing involves breaking the query into smaller units and transforming the original SQL query into a slightly different version of the original SQL code—but one that is “fully equivalent” and more efficient. Fully equivalent means that the optimized query results are always the same as the original query. More efficient means that the optimized query will, almost always, execute faster than the original query. (Note that we say almost always because many factors affect the performance of a database. These factors include the network, the client’s computer resources, and even other queries running concurrently in the same database.) After the parsing and execution phases are completed, all rows that match the specified condition(s) have been retrieved, sorted, grouped, and/or—if required—aggregated. During the fetching phase, the rows of the resulting query result set are returned to the client. During this phase, the DBMS may use temporary table space to store temporary data.

291

280. If indexes are so important, why not index every column in every table? (Include a brief discussion of the role played by data sparsity.) Answer: Indexing every column in every table will tax the DBMS too much in terms of indexmaintenance processing, especially if the table has many attributes, many rows, and/or requires many inserts, updates, and/or deletes. One measure to determine the need for an index is the data sparsity of the column you want to index. Data sparsity refers to the number of different values a column could possibly have. For example, a STU_SEX column in a STUDENT table can have only two possible values, “M” or “F”; therefore, this column is said to have low sparsity. In contrast, the STU_DOB column that stores the student date of birth can have many different date values; therefore, this column is said to have high sparsity. Knowing the sparsity helps you decide whether or not the use of an index is appropriate. For example, when you perform a search in a column with low sparsity, you are very likely to read a high percentage of the table rows anyway; therefore, index processing may be unnecessary work. 281. What is the difference between a rule-based optimizer and a cost-based optimizer? Answer: A rule-based optimizer uses a set of preset rules and points to determine the best approach to execute a query. The rules assign a “cost” to each SQL operation; the costs are then added to yield the cost of the execution plan. A cost-based optimizer uses sophisticated algorithms based on the statistics about the objects being accessed to determine the best approach to execute a query. In this case, the optimizer process adds up the processing cost, the I/O costs, and the resource costs (RAM and temporary space) to come up with the total cost of a given execution plan. 282. What are optimizer hints and how are they used? Answer: Hints are special instructions for the optimizer that are embedded inside the SQL command text. Although the optimizer generally performs very well under most circumstances, there are some circumstances in which the optimizer may not choose the best execution plan. Remember, the optimizer makes decisions based on the existing statistics. If the statistics are old, the optimizer may not do a good job in selecting the best execution plan. Even with the current statistics, the optimizer choice may not be the most efficient one. There are some occasions when the end-user would like to change the optimizer mode for the current SQL statement. In order to accomplish this task, you have to use hints. 283. What are some general guidelines for creating and using indexes? Answer: Create indexes for each single attribute used in a WHERE, HAVING, ORDER BY, or GROUP BY clause. If you create indexes in all single attributes used in search conditions, the DBMS will access the table using an index scan, instead of a full table scan. For example, if you have an index for P_PRICE, the condition P_PRICE > 10.00 can be solved by accessing the index, instead of sequentially scanning all table rows and evaluating P_PRICE for each row. Indexes are also used in join expressions, such as in CUSTOMER.CUS_CODE = INVOICE.CUS_CODE.

292

Do not use indexes in small tables or tables with low sparsity. Remember, small tables and low sparsity tables are not the same thing. A search condition in a table with low sparsity may return a high percentage of table rows anyway, making the index operation too costly and making the full table scan a viable option. Using the same logic, do not create indexes for tables with few rows and few attributes—unless you must ensure the existence of unique values in a column. Declare primary and foreign keys so the optimizer can use the indexes in join operations. All natural joins and old-style joins will benefit if you declare primary keys and foreign keys because the optimizer will use the available indexes at join time. The declaration of a PK or an FK will automatically create an index for the declared column. Also, for the same reason, it is better to write joins using the SQL JOIN syntax. (See Chapter 8, “Advanced SQL.”) Declare indexes in join columns other than PK/FK. If you do join operations on columns other than the primary and foreign key, you may be better off declaring indexes in such columns. 284. Most query optimization techniques are designed to make the optimizer’s work easier. What factors should you keep in mind if you intend to write conditional expressions in SQL code? Answer: Use simple columns or literals as operands in a conditional expression—avoid the use of conditional expressions with functions whenever possible. Comparing the contents of a single column to a literal is faster than comparing to expressions. Numeric field comparisons are faster than character, date, and NULL comparisons. In search conditions, comparing a numeric attribute to a numeric literal is faster than comparing a character attribute to a character literal. In general, numeric comparisons (integer, decimal) are handled faster by the CPU than character and date comparisons. Because indexes do not store references to null values, NULL conditions involve additional processing and therefore tend to be the slowest of all conditional operands. Equality comparisons are faster than inequality comparisons. As a general rule, equality comparisons are processed faster than inequality comparisons. For example, P_PRICE = 10.00 is processed faster because the DBMS can do a direct search using the index in the column. If there are no exact matches, the condition is evaluated as false. However, if you use an inequality symbol (>, >=, <, <=), the DBMS must perform additional processing to complete the request. This is because there would almost always be more “greater than” or “less than” values and perhaps only a few exactly “equal” values in the index. The slowest (with the exception of NULL) of all comparison operators is LIKE with wildcard symbols, such as in V_CONTACT LIKE “%glo%”. Also, using the “not equal” symbol (<>) yields slower searches, especially if the sparsity of the data is high; that is, if there are many more different values than there are equal values. Whenever possible, transform conditional expressions to use literals. For example, if your condition is P_PRICE -10 = 7, change it to read P_PRICE = 17. Also, if you have a composite condition such as: P_QOH < P_MIN AND P_MIN = P_REORDER AND P_QOH = 10 change it to read: P_QOH = 10 AND P_MIN = P_REORDER AND P_MIN > 10

293

When using multiple conditional expressions, write the equality conditions first. (Note that we did this in the previous example.) Remember, equality conditions are faster to process than inequality conditions. Although most RDBMSs will automatically do this for you, paying attention to this detail lightens the load for the query optimizer. (The optimizer won’t have to do what you have already done.) If you use multiple AND conditions, write the condition most likely to be false first. If you use this technique, the DBMS will stop evaluating the rest of the conditions as soon as it finds a conditional expression that is evaluated to be false. Remember, for multiple AND conditions to be found true, all conditions must be evaluated as true. If one of the conditions evaluates to false, everything else is evaluated as false. Therefore, if you use this technique, the DBMS won’t waste time unnecessarily evaluating additional conditions. Naturally, the use of this technique implies an implicit knowledge of the sparsity of the data set. Whenever possible, try to avoid the use of the NOT logical operator. It is best to transform a SQL expression containing a NOT logical operator into an equivalent expression. For example: NOT (P_PRICE > 10.00) can be written as P_PRICE <= 10.00. Also, NOT (EMP_SEX = 'M') can be written as EMP_SEX = 'F'. 285. What recommendations would you make for managing the data files in a DBMS with many tables and indexes? Answer: First, create independent data files for the system, indexes, and user data table spaces. Put the data files on separate disks or RAID volumes. This ensures that index operations will not conflict with end-user data or data dictionary table access operations. Second, put high-usage end-user tables in their own table spaces. By doing this, the database minimizes conflicts with other tables and maximizes storage utilization. Third, evaluate the creation of indexes based on the access patterns. Identify common search criteria and isolate the most frequently used columns in search conditions. Create indexes on high usage columns with high sparsity. Fourth, evaluate the usage of aggregate queries in your database. Identify columns used in aggregate functions and determine if the creation of indexes on such columns will improve response time. Finally, identify columns used in ORDER BY statements and make sure there are indexes on such columns. 286. What does RAID stand for, and what are some commonly used RAID levels? Answer: RAID is the acronym for Redundant Array of Independent Disks. RAID is used to provide balance between performance and fault tolerance. RAID systems use multiple disks to create virtual disks (storage volumes) formed by several individual disks. RAID systems provide performance improvement and fault tolerance. Table 11.7 in the text shows the commonly used RAID levels. (We have reproduced the table for your convenience.)

294

Table 11.7 Common RAID Levels RAID Level

Description

The data blocks are spread over separate drives. Also known as striped array. Provides increased performance but no fault tolerance. Fault tolerance means that in case of failure, data could be reconstructed and retrieved. Requires a minimum of two drives.

The same data blocks are written (duplicated) to separate drives. Also referred to as mirroring or duplexing. Provides increased read performance and fault tolerance via data redundancy. Requires a minimum of two drives.

The data are striped across separate drives, and parity data are computed and stored in a dedicated drive. Parity data are specially generated data that permit the reconstruction of corrupted or missing data. Provides good read performance and fault tolerance via parity data. Requires a minimum of three drives.

The data and the parity are striped across separate drives. Provides good read performance and fault tolerance via parity data. Requires a minimum of three drives.

295

ANSWERS TO PROBLEMS Problems 1 and 2 are based on the following query: SELECT

EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX

FROM

EMPLOYEE

WHERE

EMP_SEX = ‘F’ AND EMP_AREACODE = ‘615’

ORDER BY EMP_LNAME, EMP_FNAME; What is the likely data sparsity of the EMP_SEX column? Answer: Because this column has only two possible values (“M” and “F”), the EMP_SEX column has low sparsity. 287. What indexes should you create? Write the required SQL commands. Answer: You should create an index in EMP_AREACODE and a composite index on EMP_LNAME, EMP_FNAME. In the following solution, we have named the two indexes EMP_NDX1 and EMP_NDX2, respectively. The required SQL commands are: CREATE INDEX EMP_NDX1 ON EMPLOYEE(EMP_AREACODE); CREATE INDEX EMP_NDX2 ON EMPLOYEE(EMP_LNAME, EMP_FNAME); 288. Using Table 11.4 as an example, create two alternative access plans. Use the following assumptions: a. There are 8,000 employees. b. There are 4,150 female employees. c. There are 370 employees in area code 615. d. There are 190 female employees in area code 615. Answer: The solution is shown in Table P11.3.

296

Table P11.3 Comparing Access Plans and I/O Costs Plan

Step

Operation

I/O Operations

I/O Cost

Resulting Set Rows

8,000

190

8,000

Full table scan EMPLOYEE

Total I/O Cost

Select only rows with EMP_SEX=‘F’ and EMP_AREACODE=‘615’ A

SORT Operation

190

8,190

Index Scan Range of EMP_NDX1

370

Table Access by RowID

370

740

EMPLOYEE B

Select only rows with EMP_SEX=‘F’

370

190

930

SORT Operation

190

1,120

As you examine Table P11.3, note that in Plan A the DBMS uses a full table scan of EMPLOYEE. The SORT operation is done to order the output by employee last name and first name. In Plan B, the DBMS uses an Index Scan Range of the EMP_NDX1 index to get the EMPLOYEE RowIDs. After the EMPLOYEE RowIDs have been retrieved, the DBMS uses those RowIDs to get the EMPLOYEE rows. Next, the DBMS selects only those rows with SEX = ‘F’. Finally, the DBMS sorts the result set by employee last name and first name. Problems 4–6 are based on the following query: SELECT

EMP_LNAME, EMP_FNAME, EMP_DOB, YEAR(EMP_DOB) AS YEAR

FROM

EMPLOYEE

WHERE

YEAR(EMP_DOB) = 1976;

289. What is the likely data sparsity of the EMP_DOB column? Answer: Because the EMP_DOB column stores employee’s birthdays, this column is very likely to have high data sparsity. 290. Should you create an index on EMP_DOB? Why or why not? Answer: Creating an index in the EMP_DOB column would not help this query, because the query uses the YEAR function. However, if the same column is used for other queries, you may want to re-evaluate the decision not to create the index.

297

291. What type of database I/O operations will likely be used by the query? (See Table 11.3.) Answer: This query more than likely uses a full table scan to read all rows of the EMPLOYEE table and generate the required output. We have reproduced the table here to facilitate your discussion:

Table 11.3 Sample DBMS Access Plan I/O Operations Operation

Description

Table scan (full)

Reads the entire table sequentially, from the first row to the last, one row at a time (slowest)

Table access (row id)

Reads a table row directly, using the row ID value (fastest)

Index scan (range)

Reads the index first to obtain the row IDs and then accesses the table rows directly (faster than a full table scan)

Index access (unique)

Used when a table has a unique index in a column

Nested loop

Reads and compares a set of values to another set of values, using a nested loop style (slow)

Merge

Merges two data sets (slow)

Sort

Sorts a data set (slow) Problems 7–29 are based on the ER model shown in Figure P11.7.

298

FIGURE P11.7 The Ch11_SaleCo ER Model for Problems 7–29

Problems 7–10 are based on the following query: SELECT

P_CODE, P_PRICE

FROM

PRODUCT

WHERE

P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT);

292. Assuming there are no table statistics, what type of optimization will the DBMS use? Answer: The DBMS will use the rule-based optimization. 293. What type of database I/O operations will likely be used by the query? (See Table 11.3.) Answer: The DBMS will likely use a full table scan to compute the average price in the inner subquery. The DBMS is also very likely to use another full table scan of PRODUCT to execute the outer query. (We have reproduced the table for your convenience.)

299

Table 11.3 Sample DBMS Access Plan I/O Operations Operation

Description

Table scan (full)

Reads the entire table sequentially, from the first row to the last, one row at a time (slowest)

Table access (row id)

Reads a table row directly, using the row ID value (fastest)

Index scan (range)

Reads the index first to obtain the row IDs and then accesses the table rows directly (faster than a full table scan)

Index access (unique)

Used when a table has a unique index in a column

Nested loop

Reads and compares a set of values to another set of values, using a nested loop style (slow)

Merge

Merges two data sets (slow)

Sort

Sorts a data set (slow)

294. What is the likely data sparsity of the P_PRICE column? Answer: Because each product is likely to have a different price, the P_PRICE column is likely to have high sparsity. 295. Should you create an index? Why or why not? Answer: Yes, you should create an index because the column P_PRICE has high sparsity and the column is very likely to be used in many different SQL queries as part of a conditional expression. Problems 11–14 are based on the following query: SELECT

P_CODE, SUM(LINE_UNITS)

FROM

LINE

GROUP BY

P_CODE

HAVING

SUM(LINE_UNITS) > (SELECT MAX(LINE_UNITS) FROM LINE);

296. What is the likely data sparsity of the LINE_UNITS column? Answer: The LINE_UNITS column in the LINE table represents the quantity purchased of a given product in a given invoice. This column is likely to have many different values and therefore, the column is very likely to have high sparsity.

300

297. Should you create an index? If so, what would the index column(s) be, and why would you create the index? If not, explain your reasoning. Answer: Yes, you should create an index on LINE_UNITS. This index is likely to help in the execution of the inner query that computes the maximum value of LINE_UNITS. 298. Should you create an index on P_CODE? If so, write the SQL command to create the index. If not, explain your reasoning. Answer: Yes, creating an index on P_CODE will help in query execution. However, most DBMSs automatically index foreign key columns. If this is not the case in your DBMS, you can manually create an index using the CREATE INDEX LINE_NDX1 ON LINE(P_CODE) command. (Note that we have named the index LINE_NDX1.) 299. Write the command to create statistics for this table. Answer: ANALYZE TABLE LINE COMPUTE STATISTICS; Problems 15 and 16 are based on the following query: SELECT

P_CODE, P_QOH * P_PRICE

FROM

PRODUCT

WHERE

P_QOH * P_PRICE > (SELECT AVG(P_QOH * P_PRICE) FROM PRODUCT)

300. What is the likely data sparsity of the P_QOH and P_PRICE columns? Answer: The P_QOH and P_PRICE are likely to have high data sparsity. 301. Should you create an index? If so, what would the index column(s) be, and why should you create the index? Answer: In this case, creating an index on P_QOH or on P_PRICE will not help the query execute faster for two reasons: first, the WHERE condition on the outer query uses an expression and second, the aggregate function also uses an expression. When using expressions in the operands of a conditional expression, the DBMS will not use indexes available on the columns that are used in the expression. Problems 17–20 are based on the following query: SELECT

V_CODE, V_NAME, V_CONTACT, V_STATE

FROM

VENDOR

WHERE

V_STATE = ‘TN’

ORDER BY

V_NAME;

301

302. What indexes should you create and why? Write the SQL command to create the indexes. Answer: You should create an index on the V_STATE column in the VENDOR table. This new index will help in the execution of this query because the conditional operation uses the V_STATE column in the conditional criteria. In addition, you should create an index on V_NAME, because it is used in the ORDER BY clause. The commands to create the indexes are: CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE); CREATE INDEX VEND_NDX2 ON VENDOR(V_NAME); Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively. 303. Assume that 10,000 vendors are distributed as shown in Table P11.18. What percentage of rows will be returned by the query? Answer:

Table P11.18 State

Number of Vendors

State

Number of Vendors

358

100

3244

645

345

995

821

425

113

589

208

745

375

258

302

Given the distribution of values in Table P11.18, the query will return 113 of the 10,000 rows, or 1.13% of the total table rows. 304. What type of I/O database operations would most likely be used to execute the query? Answer: Assuming that you create the index on V_STATE and that you generate the statistics on the VENDOR table, the DBMS is very likely to use the index scan range to access the index data and then use the table access by row ID to get the VENDOR rows. 305. Using Table 11.4 as an example, create two alternative access plans. Answer: The two access plans are shown in Table P11.20.

Table P11.20 Comparing Access Plans and I/O Costs Plan

Step

Operation

Full table scan VENDOR

I/O Operations

I/O Cost

Resulting Set Rows

Total I/O Cost

10,000

113

10,000

Select only rows with V_STATE=‘TN’ A

SORT Operation

113

10,113

Index Scan Range of VEND_NDX1

113

Table Access by RowID

113

226

113

339

VENDOR B

SORT Operation

In Plan A, the DBMS uses a full table scan of VENDOR. The SORT operation is done to order the output by vendor name. In Plan B, the DBMS uses an Index Scan Range of the VEND_NDX1 index to get the VENDOR RowIDs. Next, the DBMS uses the RowIDs to get the EMPLOYEE rows. Finally, the DBMS sorts the result set by V_NAME. Problems 21–23 are based on the following query SELECT

P_CODE, P_DESCRIPT, P_PRICE, P.V_CODE, V_STATE

FROM

PRODUCT P, VENDOR V

WHERE

P.V_CODE = V.V_CODE

ORDER BY

AND

V_STATE = ‘NY’

AND

V_AREACODE = ‘212’

P_PRICE;

303

306. What indexes would you recommend? Answer: In this case, there are three possible indices to be created. First, you can create an index on VENDOR.V_STATE. Second, you can create an index in VENDOR.V_AREACODE. It is very likely that there will be many queries that will use these fields to generate reports and filters. Next, you can create an index in PRODUCT.P_PRICE to help with the ORBER BY statement. It is important to note that these three columns are high sparsity. There should not be a need to create an index on VENDOR.V_CODE as this is the primary key of the VENDOR table. Depending on the number of vendors providing products, it may be recommended to create an index on PRODUCT.V_CODE; if the sparsity is high and we have a large number of products. 307. Write the commands required to create the indexes you recommended in Problem 21. Answer: CREATE INDEX VEND_NDX22A ON VENDOR(V_STATE); CREATE INDEX VEND_NDX22B ON VENDOR(V_AREACODE); CREATE INDEX PROD_NDX22A ON PRODUCT(P_PRICE); CREATE INDEX PROD_NDX22B ON PRODUCT(V_CODE); 308. Write the command(s) used to generate the statistics for the PRODUCT and VENDOR tables. Answer: ANALYZE TABLE PRODUCT COMPUTE STATISTICS; ANALYZE TABLE VENDOR COMPUTE STATISTICS; 309. What index would you recommend based on the following query, and what command would you use to create it? SELECT

P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE

FROM

PRODUCT

WHERE

V_CODE = ‘21344’

ORDER BY

P_CODE;

Answer: This query uses one WHERE condition and one ORDER BY clause. The conditional expression uses the V_CODE column in an equality comparison. In this case, creating an index on the V_CODE attribute is recommended. If V_CODE is declared to be a foreign key, the DBMS may already have created such an index automatically. If the DBMS does not generate the index automatically, create one manually. The ORDER BY clause uses the P_CODE column. Create an index on the columns used in an ORDER BY is recommended. However, because the P_CODE column is the primary key of the PRODUCT table, a unique index already exists for this column and therefore, it is not necessary to create another index on this column.

304

Problems 25 and 26 are based on the following query: SELECT

P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE

FROM

PRODUCT

WHERE

P_QOH < P_MIN

AND

P_MIN = ‘P_REORDER’

AND

P_REORDER = 50;

ORDER BY

P_QOH;

310. Use the recommendations given in Section 11-5b to rewrite the query and produce the required results more efficiently. Answer: SELECT

P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE

FROM

PRODUCT

WHERE

P_REORDER = 50

AND

P_MIN = 50

AND

P_QOH < 50

ORDER BY

P_QOH;

This new query rewrites some conditions as follows: 

Because P_REORDER must be equal to 50, it replaces P_MIN = P_REORDER with P_MIN = 50.



Because P_MIN must be 50, it replaces P_QOH<P_MIN with P_QOH<50.

Having literals in the query conditions make queries more efficient. Note that you still need all three conditions in the query conditions. 311. What indexes would you recommend? Write the commands to create those indexes. Answer: Because the query uses equality comparison on P_REORDER, P_MIN, and P_QOH, you should have indexes in such columns. The commands to create such indexes are: CREATE INDEX PROD_NDX1 ON PRODUCT(P_REORDER); CREATE INDEX PROD_NDX2 ON PRODUCT(P_MIN); CREATE INDEX PROD_NDX3 ON PRODUCT(P_QOH);

305

Problems 27–29 are based on the following query: SELECT

CUS_CODE, MAX(LINE_UNITS * LINE_PRICE)

FROM

CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE

WHERE

CUS_AREACODE = ‘615’

GROUP BY

CUS_CODE;

312. Assuming that you generate 15,000 invoices per month, what recommendation would you give the designer about the use of derived attributes? Answer: This query uses the MAX aggregate function to compute the maximum invoice line value by customer. Because this table increases at a rate of 15,000 rows per month, the query would take considerable amount of time to run as the number of invoice rows increases. Furthermore, because the MAX aggregate function uses an expression (LINE_UNITS*LINE_PRICE) instead of a simple table column, the query optimizer is very likely to perform a full table scan in order to compute the maximum invoice line value. One way to speed up the query would be to store the derived attribute LINE_TOTAL in the LINE_TABLE and create an index on LINE_TOTAL. This way, the query would benefit by using the index to execute the query. 313. Assuming that you follow the recommendations you gave in Problem 27, how would you rewrite the query? Answer: SELECT

CUS_CODE, MAX(LINE_TOTAL)

FROM

CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE

WHERE

CUS_AREACODE = ‘615’

GROUP BY

CUS_CODE;

314. What indexes would you recommend for the query you wrote in Problem 28, and what SQL commands would you use? Answer: The query will benefit from having an index on CUS_AREACODE and an index on CUS_CODE. Because CUS_CODE is a foreign key on invoice, it’s very likely that an index already exists. In any case, the query uses the CUS_AREACODE in an equality comparison and therefore, an index on this column is highly recommended. The command to create this index would be: CREATE INDEX CUS_NDX1 ON CUSTOMER(CUS_AREACODE);

306

ANSWERS TO REVIEW QUESTIONS 315. Describe the evolution from centralized DBMSs to distributed DBMSs. Answer: Briefly, early database systems were centralized DBMSs with a single, central site typically housing a mainframe system to serve the needs of all users. Over time, the competitive, societal, and technological environments changed. Business operations became more global, business units became more integrated, and the manner in which internal and external constituents use data changed. The explosion of mobile device and acceptance of the Internet as a platform for data access and distribution greatly increased the need to transact data in highly dispersed environments. A single, centralized site could not meet the exponential growth in data processing and communication demands. As a result, organizations began to distribute the database environment across multiple sites to distribute processing loads and reduce network congestion. 316. List and discuss some of the factors that influenced the evolution of the DDBMS. Answer: 

Global business operations



On-demand transactions using web-based services



Mobile computing



Convergence of data realms

Business competition grew beyond local markets to national, international, and eventually global markets, leading to the need for integrated business operations globally. Additionally, customers began to use, and demand, web-based transactions, which suddenly shifted transaction locations from fixed locations to being able to conduct business from virtually anywhere. This shift was even more pronounced due to the proliferation of mobile devices. Finally, data became more complex by integrating conventional structured data with voice, video, audio, and other data realms that had previously been distinct. All of these factors made it infeasible to perform all processing centrally with a single system.

307

317. What are the advantages of the DDBMS? Answer: 

Data is located near the site of the greatest demand



Faster data access



Faster data processing



Improved communications



Reduced operating costs



User-friendly interface



Less danger of a single-point failure



Processor independence

318. What are the disadvantages of the DDBMS? Answer: 

Complexity of management and control



Increased technological difficulty



Security is more difficult to maintain with more points of failure



Lack of standards for DDBMS environments



Increased storage and infrastructure requirements since multiple sites and often multiple copies of data must be maintained.



Increased costs in training IT personnel.



Higher costs for not only infrastructure duplication, but more personnel, licenses, and software.

319. Explain the difference between a distributed database and distributed processing. Answer: Distributed processing is the sharing of data manipulation across multiple processing units. This can include data access, data selection, calculations and manipulations, and data validation. Distributed database is the sharing of the storage of the data while it is at rest. Distributed databases use database fragments, which are defined subsets of the database data. Distributed processing may or may not use a distributed database, but a distributed database always requires distributed processing. 320. What is a fully distributed database management system? Answer: A fully distributed database management system is a DBMS that can perform all of the functions of a centralized DBMS on a distributed database. Further, it must be able to perform all of those functions in a manner that is transparent to the user, such that the user cannot tell whether the underlying database is centralized or distributed.

308

321. What are the components of a DDBMS? Answer: Computer workstations (nodes or sites) that form the network components. Network hardware and software sufficient to allow the nodes to communication effectively with each other. Communications media such as wired or Wi-Fi communications over which the network hardware and software to exchange communications among the nodes. Transaction processor software, or application, that requests and consumes data. Data processor software that coordinates the data access to the data that resides on that node. 322. List and explain the transparency features of a DDBMS. Answer: 

Distribution transparency—the user need not know how, or if, the data is physically distributed across the network. All data appears local to the user.



Transaction transparency—allows data to be updated at multiple sites while maintaining data consistency and integrity.



Failure transparency—fault tolerance such that failure of any one site on the network does not impact the ability of the system to operate. Any functions on the lost site are picked up by other sites.



Performance transparency—the system will not suffer from performance degradation due to its distribution.



Heterogeneity transparency—the system can integrate data from multiple, different, local DBMSs under a common global schema.

323. Define and explain the different types of distribution transparency. Answer: Distribution transparency refers to transparency in the management of the distributed database as if it were a centralized database. Local mapping transparency requires end users and programmers to know both the names and the locations of the fragments that contain the data to be manipulated. Location transparency means that the end users and programmers need to know the names of the fragments that contain the data to be manipulated, but do not need to know the locations of those fragments. Fragmentation transparency means that end users and programmers do not need to know the names or locations of the fragments that contain the data to be manipulated. In fact, with fragmentation transparency, they do not even need to know that the database is distributed. 324. Describe the different types of database requests and transactions. Answer: A database transaction is formed by one or more database requests. Each database request is the equivalent of a single SQL statement. The basic difference between a local transaction and a distributed transaction is that the latter can update or request data from several remote sites on a network. In a DDBMS, a database request and a database transaction can be of two types: remote or distributed.

309

NOTE The figure references in the discussions refer to the figures found in the text. The figures are not reproduced in this manual. A remote request accesses data located at a single remote database processor (or DP site). In other words, an SQL statement (or request) can reference data at only one remote DP site. Use Figure 12.9 to illustrate the remote request. A remote transaction, composed of several requests, accesses data at only a single remote DP site. Use Figure 12.10 to illustrate the remote transaction. As you discuss Figure 12.10, note that both tables are located at a remote DP (site B) and that the complete transaction can reference only one remote DP. Each SQL statement (or request) can reference only one (the same) remote DP at a time; the entire transaction can reference only one remote DP; and it is executed at only one remote DP. A distributed transaction allows a transaction to reference several different local or remote DP sites. Although each single request can reference only one local or remote DP site, the complete transaction can reference multiple DP sites because each request can reference a different site. Use Figure 12.11 to illustrate the distributed transaction. A distributed request lets us reference data from several different DP sites. Since each request can access data from more than one DP site, a transaction can access several DP sites. The ability to execute a distributed request requires fully distributed database processing because we must be able to: 1. Partition a database table into several fragments. 2. Reference one or more of those fragments with only one request. In other words, we must have fragmentation transparency. The location and partition of the data should be transparent to the end user. Use Figure 12.12 to illustrate the distributed request. As you discuss Figure 12.12, note that the transaction uses a single SELECT statement to reference two tables, CUSTOMER and INVOICE. The two tables are located at two different remote DP sites, B and C. The distributed request feature also allows a single request to reference a physically partitioned table. For example, suppose that a CUSTOMER table is divided into two fragments C1 and C2, located at sites B and C, respectively. The end user wants to obtain a list of all customers whose balance exceeds $250.00. Use Figure 12.13 to illustrate this distributed request. Note that full fragmentation support is provided only by a DDBMS that supports distributed requests.

310

325. Explain the need for the two-phase commit protocol. Then describe the two phases. Answer: Just as a centralized DBMS does, a DDBMS must support the atomicity of transactions and change the database from one consistent state to another. This is done using a two-phase commit protocol (2PC). Throughout a transaction, data manipulation instructions have been sent to various data processors throughout the network. Each data processor has been maintaining local transaction log files for those operations. When the COMMIT command is issued by the user or application, the 2PC ensures that the transaction is committed at all sites involved in the transaction. The 2PC’s first phase is the “Preparation” phase. A coordinator node will send a PREPARE TO COMMIT message to the subordinate sites. The subordinate sites will write their respective transaction logs to permanent storage (not the actual database) and send a PREPARED TO COMMIT reply. If any site replies that it is NOT PREPARED, the coordinator sends an ABORT message to all subordinates. If all sites send the PREPARED TO COMMIT reply, then phase two is activated. Phase two is the “Final Commit” phase. The coordinator will send a COMMIT message to all subordinates. Each subordinate will then write the transaction to the actual database. If the transaction is successfully written to the database, the subordinate sends a COMMITTED reply. If the coordinator receives a COMMITTED reply from every subordinate, the end user or application is notified that the transaction was committed. If any subordinate replies NOT COMMITTED, then the coordinator sends an ABORT message to every subordinate to roll back the entire transaction. 326. What is the objective of query optimization functions? Answer: The objective of query optimization functions is to minimize the total costs associated with the execution of a database request. The costs associated with a request are a function of: 

the access time (I/O) cost involved in accessing the physical data stored on disk



the communication cost associated with the transmission of data among nodes in distributed database systems



the CPU time cost

It is difficult to separate communication and processing costs. Query optimization algorithms use different parameters, and the algorithms assign different weight to each parameter. For example, some algorithms minimize total time, others minimize the communication time, and still others do not factor in the CPU time, considering it insignificant relative to the other costs. Query optimization must provide distribution and replica transparency in distributed database systems. 327. To which transparency feature are the query optimization functions related? Answer: Query optimization functions are associated with the performance transparency features of a DDBMS. In a DDBMS, the query-optimization routines are more complicated because the DDBMS must decide where and which fragment of the database to access. Data fragments are stored at several sites, and the data fragments are replicated at several sites. 328. What issues should be considered when resolving data requests in a distributed data environment? Answer: A data request could be either a read or a write request. However, most requests tend to be read requests. In both cases, resolving data requests in a distributed data environment mostly consider the following issues:

311



Data distribution



Data replication



Network and node availability

A more detailed discussion of these factors can be found in Section 12-10. 329. Describe the three data fragmentation strategies. Give some examples of each. Answer: Horizontal fragmentation fragments rows for a table across multiple sites. The entire row remains on the same fragment, but different rows are on different fragments. An example would be to put customer rows in different fragments based on their state of residence such that all customers in the United States are in one fragment, while customers from Europe are in a different fragment, and customers from Japan are in a different fragment. Another example would be putting different product rows in different fragments based on their manufacturing location. Vertical fragmentation fragments columns for a table across multiple sites. With vertical fragmentation, some attributes of a row are in one site, while other attributes of that same row are in another site. For example, directory information for employees (name, email address, phone number, etc.) may be kept on one server, while payroll-related attributes (wage rate, hours worked, withholdings, etc.) are kept on a different server. Mixed fragmentation is a combination of horizontal and vertical fragmentation such that fragments contain some columns of some rows, other fragments contain other columns of those same rows, and still other fragments contain those same columns but for different rows. 330. What is data replication, and what are the three replication strategies? Answer: Data replication is storing the same data in more than one location. This is different than fragmentation that decomposed a table of data into multiple pieces and put each piece in a different location; however, each piece was only stored once. Replication may or may not involve fragmenting the data—that is, it may not be fragmented or may be fragmented using any of the three fragmentation strategies—but one or more pieces of the data are stored in more than one location. The three strategies are fully replicated, partially replicated, and unreplicated databases. Unreplicated databases do not use replication—each portion of the database is stored only once. A fully replicated database stores multiple copies of every piece of the database. A partially replicated database stores multiple copies of some parts of the database but not all parts. 331. What are the two basic styles of data replication? Answer: There are basically two styles of replication: 

Push replication. In this case, the originating DP node sends the changes to the replica nodes to ensure that all data are mutually consistent.



Pull replication. The originating DP node notifies the replica nodes so they can pull the updates one their own time.

See Section 12-11b for more information. 332. What trade-offs are involved in building highly distributed data environments?

312

Answer: In the year 2000, Dr. Eric Brewer stated in a presentation that: “in any highly distributed data system there are three common desirable properties: consistency, availability and partition tolerance. However, it is impossible for a system to provide all three properties at the same time.” Therefore, the system designers have to balance the trade-offs of these properties in order to provide a workable system. This is what is known as the CAP theorem. For more information on this, see Section 12-12. 333. How does a BASE system differ from a traditional distributed database system? Answer: A traditional database system enforces the ACID properties as to ensure that all database transactions yield a database in a consistent state. In a centralized database system, all data resides in a centralized node. However, in a distributed database system, data are located in multiple geographically disperse sites connected via a network. In such cases, network latency and network partitioning impose a new level of complexity. In most highly distributed systems, designers tend to emphasize availability over data consistency and partition tolerance. This trade-off has given way to a new type of database system in which data are basically available, soft state, and eventually consistent (BASE). For more information about BASE systems see Section 12-12. 334. How do NewSQL databases compare to NoSQL databases in terms of consistency, availability, and partition tolerance? Answer: NewSQL databases attempt to merge ACID transactions of centralized databases with highly distributed models of NoSQL databases. NoSQL databases tend to use BASE, basically available, soft state, eventually consistency to achieve high levels of partitioning tolerance. NewSQL databases tend toward more rigorous consistency and availability at the expense of partitioning tolerance.

313

ANSWERS TO PROBLEMS Problem 1 is based on the DDBMS scenario in Figure P12.1.

FIGURE P12.1 The DDBMS Scenario for Problem 1 TABLES

FRAGMENTS

LOCATION

CUSTOMER PRODUCT

N/A PROD_A PROD_B N/A N/A

A A B B B

INVOICE INV_LINE

Specify the minimum types of operations the database must support to perform the following operations. These operations include remote requests, remote transactions, distributed transactions, and distributed requests.

NOTE To answer the following questions, remind the students that the key to each answer is in the number of different data processors that are accessed by each request/transaction. Ask the students to first identify how many different DP sites are to be accessed by the transaction/request. Next, remind the students that a distributed request is necessary if a single SQL statement is to access more than one DP site. Use the following summary:

Number of DPs Operation

Request

Remote

Distributed

Transaction

Remote

Distributed

314

Based on this summary, the questions are answered easily. Answer: At Site C a. SELECT

FROM

CUSTOMER;

This SQL sequence represents a remote request. b. SELECT

FROM

INVOICE

WHERE

INV_TOTAL < 1000;

This SQL sequence represents a remote request. c. SELECT

FROM

PRODUCT

WHERE

PROD_QOH < 10;

This SQL sequence represents a distributed request. Note that the distributed request is required when a single request must access two DP sites. The PRODUCT table is composed of two fragments, PRO_A and PROD_B, which are located in sites A and B, respectively. d. BEGIN WORK; UPDATE CUSTOMER SET CUS_BALANCE = CUS_BALANCE + 100 WHERE CUS_NUM=‘10936’; INSERT INTO INVOICE(INV_NUM, CUS_NUM, INV_DATE, INV_TOTAL) VALUES (‘986391’, ‘10936’, ‘2022-02-15’, 100); INSERT INTO INVLINE(INV_NUM, PROD_CODE, LINE_PRICE) VALUES (‘986391’, ‘1023’, 100); UPDATE PRODUCT SET PROD_QOH = PROD_QOH - 1 WHERE PROD_CODE = ‘1023’; COMMIT WORK;

315

This SQL sequence represents a distributed request. Note that UPDATE CUSTOMER and the two INSERT statements only require remote request capabilities. However, the entire transaction must access more than one remote DP site, so we also need distributed transaction capability. The last UPDATE PRODUCT statement accesses two remote sites because the PRODUCT table is divided into two fragments located at two remote DP sites. Therefore, the transaction as a whole requires distributed request capability. e. BEGIN WORK; INSERT CUSTOMER(CUS_NUM, CUS_NAME, CUS_ADDRESS, CUS_BAL) VALUES (‘34210’, ‘Victor Ephanor’, ‘123 Main St’, 0.00); INSERT INTO INVOICE(INV_NUM, CUS_NUM, INV_DATE, INV_TOTAL) VALUES (‘986434’, ‘34210’, ‘2022-08-10’, 2.00); COMMIT WORK; This SQL sequence represents a distributed transaction. Note that, in this transaction, each individual request requires only remote request capabilities. However, the transaction as a whole accesses two remote sites. Therefore, distributed request capability is required. At Site A f.

SELECT

CUS_NUM, CUS_NAME, INV_TOTAL

FROM

CUSTOMER, INVOICE

WHERE

CUSTOMER.CUS_NUM = INVOICE.CUS_NUM;

This SQL sequence represents a distributed request. Note that the request accesses two DP sites, one local and one remote. Therefore distributed capability is needed. g. SELECT

FROM

INVOICE

WHERE

INV_TOTAL > 1000;

This SQL sequence represents a remote request, because it accesses only one remote DP site. h. SELECT

FROM

PRODUCT

WHERE

PROD_QOH < 10;

This SQL sequence represents a distributed request. In this case, the PRODUCT table is partitioned between two DP sites, A and B. Although the request accesses only one remote DP

316

site, it accesses a table that is partitioned into two fragments: PROD-A and PROD-B. A single request can access a partitioned table only if the DBMS supports distributed requests. At Site B i.

SELECT

FROM

CUSTOMER;

This SQL sequence represents a remote request. j.

SELECT

CUS_NAME, INV_TOTAL

FROM

CUSTOMER, INVOICE

WHERE

INV_TOTAL > 1000 AND CUSTOMER. CUS_NUM = INVOICE.CUS_NUM;

This SQL sequence represents a distributed request. k. SELECT

FROM

PRODUCT

WHERE

PROD_QOH < 10;

This SQL sequence represents a distributed request. (See explanation for part h.) 335. The following data structure and constraints exist for a magazine publishing company: Answer: a. The company publishes one regional magazine in each of four states: Florida (FL), South Carolina (SC), Georgia (GA), and Tennessee (TN). b. The company has 300,000 customers (subscribers) distributed throughout the four states listed in Problem 2a. c. On the first day of each month, an annual subscription INVOICE is printed and sent to each customer whose subscription is due for renewal. The INVOICE entity contains a REGION attribute to indicate the customer’s state of residence (FL, SC, GA, TN): CUSTOMER (CUS_NUM, CUS_NAME,CUS_ADDRESS, CUS_CITY, CUS_STATE, CUS_ZIP,CUS_SUBSDATE) INVOICE (INV_NUM, INV_REGION, CUS_NUM, INV_DATE, INV_TOTAL)

317

The company is aware of the problems associated with centralized management and has decided to decentralize management of the subscriptions into the company’s four regional subsidiaries. Each subscription site will handle its own customer and invoice data. The management at company headquarters, however, will have access to customer and invoice data to generate annual reports and to issue ad hoc queries such as: 

Listing all current customers by region



Listing all new customers by region



Reporting all invoices by customer and by region

Given these requirements, how must you partition the database? The CUSTOMER table must be partitioned horizontally by state. (We show the partitions in the answer to Question 3c.) 336. Given the scenario and requirements in Problem 2, answer the following questions: Answer: a. What recommendations will you make regarding the type and characteristics of the required database system? The Magazine Publishing Company requires a distributed system with distributed database capabilities. The distributed system will be distributed among the company locations in South Carolina, Georgia, Florida, and Tennessee. The DDBMS must be able to support distributed transparency features, such as fragmentation transparency, replica transparency, transaction transparency, and performance transparency. Heterogeneous capability is not a mandatory feature since we assume there is no existing DBMS in place and that the company wants to standardize on a single DBMS. b. What type of data fragmentation is needed for each table? The database must be horizontally partitioned, using the STATE attribute for the CUSTOMER table and the REGION attribute for the INVOICE table. c. What criteria must be used to partition each database? The following fragmentation segments reflect the criteria used to partition each database: Horizontal Fragmentation of the CUSTOMER Table by State

Fragment Name

Location

Condition

Node name

Tennessee

CUS_STATE = ‘TN’

NAS

Georgia

CUS_STATE = ‘GA’

ATL

Florida

CUS_STATE = ‘FL’

TAM

South Carolina

CUS_STATE = ‘SC’

CHA

318

Horizontal Fragmentation of the INVOICE Table by Region

Fragment Name

Location

Condition

Node name

Tennessee

REGION_CODE = ‘TN’

NAS

Georgia

REGION_CODE = ‘GA’

ATL

Florida

REGION_CODE = ‘FL’

TAM

South Carolina

REGION_CODE = ‘SC’

CHA

d. Design the database fragments. Show an example with node names, location, fragment names, attribute names, and demonstration data. Note the following fragments: Fragment C1

Location: Tennessee

Node: NAS

CUS_NUM

CUS_NAME

CUS_ADDRESS

CUS_CITY

CUS_STATE

10884

James D. Burger

123 Court Avenue

Memphis

2020-12-08

10993

Lisa B. Barnette

910 Eagle Street

Nashville

2021-03-12

Fragment C2 CUS_NUM

Location: Georgia

CUS_SUB_DATE

Node: ATL

CUS_NAME

CUS_ADDRESS

CUS_CITY

CUS_STATE

CUS_SUB_DATE

11887

Ginny E. Stratton

335 Main Street

Atlanta

2020-08-11

13558

Anna H. Ariona

657 Mason Ave.

Dalton

2021-06-23

319

Fragment C3

Location: Florida

Node: TAM

CUS_NUM

CUS_NAME

CUS_ADDRESS

CUS_CITY

10014

John T. Chi

456 Brent Avenue

Miami

2020-11-18

15998

Lisa B. Barnette

234 Ramala Street

Tampa

2021-03-23

Fragment C4

CUS_STATE

Location: South Carolina

CUS_SUB_DATE

Node: CHA

CUS_NUM

CUS_NAME

CUS_ADDRESS

CUS_CITY

CUS_STATE

21562

Thomas F. Matto

45 N. Pratt Circle

Charleston

2020-12-02

18776

Mary B. Smith

526 Boone Pike

Charleston

2021-10-28

Fragment I1

Location: Tennessee

Node: NAS

INV_NUM

REGION_CODE

CUS_NUM

INV_DATE

INV_TOTAL

213342

10884

2021-11-01

45.95

209987

10993

2022-02-15

45.95

Fragment I2

Location: Georgia

Node: ATL

INV_NUM

REGION_CODE

CUS_NUM

INV_DATE

INV_TOTAL

198893

11887

2021-08-15

70.45

224345

13558

2022-06-01

45.95

Fragment I3

CUS_SUB_DATE

Location: Florida

Node: TAM

INV_NUM

REGION_CODE

CUS_NUM

INV_DATE

INV_TOTAL

200915

10014

2021-11-01

45.95

231148

15998

2022-03-01

24.95

320

Fragment I4

Location: South Carolina

Node: CHA

INV_NUM

REGION_CODE

CUS_NUM

INV_DATE

INV_TOTAL

243312

21562

2021-11-15

45.95

231156

18776

2022-10-01

45.95

e. What type of distributed database operations must be supported at each remote site? To answer this question, you must first draw a map of the locations, the fragments at each location, and the type of transaction or request support required to access the data in the distributed database.

Node Fragment

NAS

ATL

TAM

CHA

CUSTOMER

INVOICE

none

none none distributed request

Distributed Operations Required none

Headquarters

Given the problem’s specifications, you conclude that no interstate access of CUSTOMER or INVOICE data is required. Therefore, no distributed database access is required in the four nodes. For the headquarters, the manager wants to be able to access the data in all four nodes through a single SQL request. Therefore, the DDBMS must support distributed requests. f.

What type of distributed database operations must be supported at the headquarters site? See the answer for Part e.

TABLE OF CONTENTS Answers to Review Questions .............................................................................................322 Answers to Problems ...........................................................................................................337

321

ANSWERS TO REVIEW QUESTIONS 337. What is business intelligence? Give some recent examples of BI usage, using the Internet for assistance. What BI benefits have companies found? Answer: Business intelligence (BI) is a term used to describe a comprehensive, cohesive, and integrated set of applications used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information used to support business decision making. As the names implies, BI is about creating intelligence about a business. This intelligence is based on learning and understanding the facts about a business environment. BI is a framework that allows a business to transform data into information, information into knowledge, and knowledge into wisdom. BI has the potential to positively affect a company’s culture by creating “business wisdom” and distributing it to all users in an organization. This business wisdom empowers users to make sound business decisions based on the accumulated knowledge of the business as reflected on recorded facts (historic operational data). Table 13.1 in the text gives some real-world examples of companies that have implemented BI tools (data warehouse, data mart, OLAP, and/or data mining tools) and shows how the use of such tools benefited the companies. Emphasize that the main focus of BI is to gather, integrate, and store business data for the purpose of creating information. BI integrates people and processes using technology in order to add value to the business. Such value is derived from how end users use such information in their daily activities, and in particular, their daily business decision making. Also note that the BI technology components are varied. Examples of BI usage found in web sources: 1. The Dallas Teachers Credit Union (DTCU) used geographical data analysis to increase its customer base from 250,000 professional educators to 3.5 million potential customers virtually overnight. The increase gave the credit union the ability to compete with larger banks that had a strong presence in Dallas. (http://www.computerworld.com/s/article/47371/Business_Intelligence?taxono myId=120) 2. Researchers from the Rand Corporation recently applied business intelligence and analytics technology to determine the dangerous side effects of prescription drugs. (http://www.panorama.com/industry-news/article-view.html?name=Analyticsspots-prescription-problems-508338) 3. Microsoft Case Study website for hundreds of cases about Business Intelligence usage. (http://www.microsoft.com/casestudies/) 338. Describe the BI framework. Illustrate the evolution of BI. Answer: BI is not a product by itself, but a framework of concepts, practices, tools, and technologies that help a business better understand its core capabilities, provide snapshots of the company situation, and identify key opportunities to create competitive advantage. In practice, BI provides a well-orchestrated framework for the management of data that works across all levels of the organization. BI involves the following general steps: 1. Collecting and storing operational data 2. Aggregating the operational data into decision support data 3. Analyzing decision support data to generate information

322

4. Presenting such information to the end user to support business decisions 5. Making business decisions, which in turn generate more data that is collected, stored, and so on (restarting the process). 6. Monitoring results to evaluate outcomes of the business decisions (providing more data to be collected, stored, etc.) To implement all these steps, BI uses varied components and technologies. Section 13-2 is where you’ll find a discussion of these components and technologies—see Table 13.2. Figure 13.2 illustrates the evolution of BI formats. 339. What are decision support systems, and what role do they play in the business environment? Answer: Decision support systems (DSSs) are based on computerized tools that are used to enhance managerial decision making. Because complex data and the proper analysis of such data are crucial to strategic and tactical decision making, DSS are essential to the well-being and even survival of businesses that must compete in a global marketplace. 340. Explain how the main components of the BI architecture interact to form a system. Describe the evolution of BI information dissemination formats. Answer: Refer the students to Section 13-3 in the chapter. Emphasize that, actually, there is no single BI architecture; instead, it ranges from highly integrated applications from a single vendor to a loosely integrated, multivendor environment. However, there are some general types of functionality that all BI implementations share. Like any critical business IT infrastructure, the BI architecture is composed of data, people, processes, technology, and the management of such components. Figure 13.1 (in the text) depicts how all those components fit together within the BI framework. Figure 13.2, in Section 13-2c “Business Intelligence Evolution,” tracks the changes of business intelligence reporting and information dissemination over time. In summary: 1. 1970s: centralized reports running on mainframes, minicomputers, or even central server environments. Such reports were predefined and took considerable time to process. 2. 1980s: desktop computers, downloaded spreadsheet data from central locations. 3. 1990s: first generation DSS, centralized reporting, and OLAP. 4. 2000s: BI web-based dashboards and mobile BI. 5. 2010s: Present: Big Data, NoSQL, Data Visualization. 341. What are the most relevant differences between operational data and decision support data? Answer: Operational data and decision support data serve different purposes. Therefore, it is not surprising to learn that their formats and structures differ. Most operational data are stored in a relational database in which the structures (tables) tend to be highly normalized. Operational data storage is optimized to support transactions that represent daily operations. For example, each time an item is sold, it must be accounted for. Customer data, inventory data, and so on are in a frequent update mode. To provide effective update performance, operational systems store data in many tables, each with a minimum number of fields. Thus, a simple sales transaction might be represented by five or more different tables (for example, invoice, invoice line, discount, store, and department). Although

323

such an arrangement is excellent in an operational database, it is not efficient for query processing. For example, to extract a simple invoice, you would have to join several tables. Whereas operational data are useful for capturing daily business transactions, decision support data give tactical and strategic business meaning to the operational data. From the data analyst’s point of view, decision support data differ from operational data in three main areas: time span, granularity, and dimensionality. 1. Time span. Operational data cover a short time frame. In contrast, decision support data tend to cover a longer time frame. Managers are seldom interested in a specific sales invoice to customer X; rather, they tend to focus on sales generated during the last month, the last year, or the last five years. 2. Granularity (level of aggregation). Decision support data must be presented at different levels of aggregation, from highly summarized to near-atomic. For example, if managers must analyze sales by region, they must be able to access data showing the sales by region, by city within the region, by store within the city within the region, and so on. In that case, summarized data to compare the regions is required, but also data in a structure that enables a manager to drill down, or decompose, the data into more atomic components (that is, finer-grained data at lower levels of aggregation). In contrast, when you roll up the data, you are aggregating the data to a higher level. 3. Dimensionality. Operational data focus on representing individual transactions rather than on the effects of the transactions over time. In contrast, data analysts tend to include many data dimensions and are interested in how the data relate over those dimensions. For example, an analyst might want to know how product X fared relative to product Z during the past six months by region, state, city, store, and customer. In that case, both place and time are part of the picture. Figure 13.3 (in the text) shows how decision support data can be examined from multiple dimensions (such as product, region, and year), using a variety of filters to produce each dimension. The ability to analyze, extract, and present information in meaningful ways is one of the differences between decision support data and transaction-at-a-time operational data. The DSS components that form a system are shown in the text’s Figure 13.1. Note that: 

The data store component is basically a DSS database that contains business data and business-model data. These data represent a snapshot of the company situation.



The data extraction and filtering component is used to extract, consolidate, and validate the data store.



The end-user query tool is used by the data analyst to create the queries used to access the database.



The end-user presentation tool is used by the data analyst to organize and present the data.

342. What is a data warehouse, and what are its main characteristics? How does it differ from a data mart? Answer: A data warehouse is an integrated, subject-oriented, time-variant, and nonvolatile database that provides support for decision making. (See Section 13-4 for an in-depth discussion about the main characteristics.)

324

The data warehouse is usually a read-only database optimized for data analysis and query processing. Typically, data are extracted from various sources and are then transformed and integrated—in other words, passed through a data filter—before being loaded into the data warehouse. Users access the data warehouse via front-end tools and/or end-user application software to extract the data in usable form. Figure 13.4 in the text illustrates how a data warehouse is created from the data contained in an operational database. You might be tempted to think that the data warehouse is just a big, summarized database. But a good data warehouse is much more than that. A complete data warehouse architecture includes support for a decision support data store, a data extraction and integration filter, and a specialized presentation interface. To be useful, the data warehouse must conform to uniform structures and formats to avoid data conflicts and to support decision making. In fact, before a decision support database can be considered a true data warehouse, it must conform to the 12 rules described in Section 13-4b and illustrated in Table 13.9. 343. Give three examples of likely problems when operational data is integrated into the data warehouse. Answer: Within different departments of a company, operational data may vary in terms of how they are recorded or in terms of data type and structure. For instance, the status of an order may be indicated with text labels such as “open”, “received”, “cancel”, or “closed” in one department while another department has it as “1”, “2”, “3”, or “4”. The student status can be defined as “Freshman”, “Sophomore”, “Junior”, or “Senior” in the Accounting department and as “FR”, “SO”, “JR”, or “SR” in the Computer Information Systems department. A social security number field may be stored in one database as a string of numbers and dashes (‘XXX-XX-XXXX’), in another as a string of numbers without the dashes (‘XXXXXXXXX’), and in yet a third as a numeric field (#########). Most of the data transformation problems are related to incompatible data formats, the use of synonyms and homonyms, and the use of different coding schemes. Use the following scenario to answer Questions 8–14. While working as a database analyst for a national sales organization, you are asked to be part of its data warehouse project team. 344. Prepare a high-level summary of the main requirements for evaluating DBMS products for data warehousing. Answer: There are four primary ways to evaluate a DBMS that is tailored to provide fast answers to complex queries: 

The database schema supported by the DBMS



The availability and sophistication of data extraction and loading tools



The end-user analytical interface



The database size requirements

Establish the requirements based on the size of the database, the data sources, the necessary data transformations, and the end-user query requirements. Determine what type of database is needed, that is, a multidimensional or a relational database using the star schema. Other valid evaluation criteria include the cost of acquisition and available upgrades (if any), training, technical and development support, performance, ease of use, and maintenance.

325

345. Your data warehousing project group is debating whether to create a prototype of a data warehouse before its implementation. The project group members are especially concerned about the need to acquire some data warehousing skills before implementing the enterprise-wide data warehouse. What would you recommend? Explain your recommendations. Answer: Knowing that data warehousing requires time, money, and considerable managerial effort, many companies create data marts, instead. Data marts use smaller, more manageable data sets that are targeted to fit the special needs of small groups within the organization. In other words, data marts are small, single-subject data warehouse subsets. Data mart development and use costs are lower and the implementation time is shorter. Once the data marts have demonstrated their ability to serve the DSS, they can be expanded to become data warehouses or they can be migrated into larger existing data warehouses. 346. Suppose that you are selling the data warehouse idea to your users. How would you define multidimensional data analysis for them? How would you explain its advantages to them? Answer: Multidimensional data analysis refers to the processing of data in which data are viewed as part of a multidimensional structure, one in which data are related in many different ways. Business decision makers usually view data from a business perspective. That is, they tend to view business data as they relate to other business data. For example, a business data analyst might investigate the relationship between sales and other business variables such as customers, time, product line, and location. The multidimensional view is much more representative of a business perspective. A good way to visualize the data is to use tools such as pivot tables in MS Excel or data visualization products such as MS Power BI, Tableau Software’s Tableau, or QlikView. 347. The data warehousing project group has invited you to provide an OLAP overview. The group’s members are particularly concerned about the OLAP client/server architecture requirements and how OLAP will fit the existing environment. Your job is to explain the main OLAP client/server components and architectures. Answer: OLAP systems are based on client/server technology and they consist of these main modules: 

OLAP Graphical User Interface (GUI)



OLAP Analytical Processing Logic



OLAP Data Processing Logic

The location of each of these modules is a function of different client/server architectures. How and where the modules are placed depends on hardware, software, and professional judgment. Any placement decision has its own advantages or disadvantages. However, the following constraints must be met: 

The OLAP GUI is always placed in the end user’s computer. The reason it is placed at the client side is simple: this is the main point of contact between the end user and the system. Specifically, it provides the interface through which the end user queries the data warehouse’s contents.



The OLAP Analytical Processing Logic (APL) module can be placed in the client (for speed) or in the server (for better administration and better throughput). The APL performs the complex transformations required for business data analysis, such as multiple dimensions, aggregation, and period comparison.

326



The OLAP Data Processing Logic (DPL) maps the data analysis requests to the proper data objects in the Data Warehouse and is, therefore, generally placed at the server level.

348. One of your vendors recommends using an MDBMS. How would you explain this recommendation to your project leader? Answer: Multidimensional On-Line Analytical Processing (MOLAP) provides OLAP functionality using multidimensional databases systems (MDBMSs) to store and analyze multidimensional data. MDBMSs use special proprietary techniques to store data in matrixlike arrays of n dimensions. 349. The project group is ready to make a final decision, choosing between ROLAP and MOLAP. What should be the basis for this decision? Why? Answer: The basis for the decision should be the system and end-user requirements. Both ROLAP and MOLAP will provide advanced data analysis tools to enable organizations to generate required information. The selection of one or the other depends on which set of tools will fit best within the company’s existing expertise base, its technology and end-user requirements, and its ability to perform the job at a given cost. The proper OLAP/MOLAP selection criteria must include: 

purchase and installation price



supported hardware and software



compatibility with existing hardware, software, and DBMS



available programming interfaces



performance



availability, extent, and type of administrative tools



support for the database schema(s)



ability to handle current and projected database size



database architecture



available resources



flexibility



scalability



total cost of ownership.

350. The data warehouse project is in the design phase. Explain to your fellow designers how you would use a star schema in the design. Answer: The star schema is a data modeling technique that is used to map multidimensional decision support data into a relational database. The reason for the star schema’s development is that existing relational modeling techniques, ER and normalization, did not yield a database structure that served the advanced data analysis requirements well. Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structures on which the operational database is built.

327

The basic star schema has four components: facts, dimensions, attributes, and attribute hierarchies. The star schemas represent aggregated data for specific business activities. Using the schemas, we will create multiple aggregated data sources that will represent different aspects of business operations. For example, the aggregation may involve total sales by selected time periods, by products, by stores, and so on. Aggregated totals can be total product units, total sales values by products, and so on. 351. Briefly discuss the OLAP architectural styles with and without data marts. Answer: Section 13-6d, “OLAP Architecture,” details the basic architectural components of an OLAP environment: 

The graphical user interface (GUI front-end)—located always at the end-user end.



The analytical processing logic—this component could be located in the back end (OLAP server) or could be split between the back-end and front-end components.



Data processing logic—logic used to extract data from data; typically located in the back end.

The term OLAP “engine” is sometimes used to refer to the arrangement of the OLAP components as a whole. However, the architecture allows for the split of the some of the components in a client/server arrangement as depicted in Figures 13.16 and 13.17. Figure 13.16 shows a typical OLAP architecture without data marts. In this architecture, the OLAP tool will extract data from the data warehouse and process the data to be presented by the end-user GUI. The processing of the data takes place mostly on the OLAP engine. The OLAP engine location could be located in each client computer or it could be shared from an OLAP “server.” Figure 13.17 shows a typical OLAP architecture with local data marts (end-user located). The local data marts are “miniature” data warehouses that focus on a subset of the data in the data warehouse. Normally these data marts are subject oriented, such as customers, products, and sales. The local data marts provide faster processing but require that the data be periodically “synchronized” with the main data warehouse. 352. What is OLAP, and what are its main characteristics? Answer: OLAP stands for On-Line Analytical Processing and uses multidimensional data analysis techniques. OLAP yields an advanced data analysis environment that provides the framework for decision making, business modeling, and operations research activities. Its four main characteristics are: 1. Multidimensional data analysis techniques 2. Advanced database support 3. Easy-to-use end-user interfaces 4. Support for client/server architecture 353. Explain ROLAP and list the reasons you would recommend its use in the relational database environment. Answer: Relational On-Line Analytical Processing (ROLAP) provides OLAP functionality for relational databases. ROLAP’s popularity is based on the fact that it uses familiar relational query tools to store and analyze multidimensional data. Because ROLAP is based on familiar

328

relational technologies, it represents a natural extension to organizations that already use relational database management systems within their organizations. 354. Explain the use of facts, dimensions, and attributes in the star schema. Answer: Facts are numeric measurements (values) that represent a specific business aspect or activity. For example, sales figures are numeric measurements that represent product and/or service sales. Facts commonly used in business data analysis are units, costs, prices, and revenues. Facts are normally stored in a fact table, which is the center of the star schema. The fact table contains facts that are linked through their dimensions. Dimensions are qualifying characteristics that provide additional perspectives to a given fact. Dimensions are of interest to us, because business data are almost always viewed in relation to other data. For instance, sales may be compared by product from region to region, and from one time period to the next. The kind of problem typically addressed by DSS might be “make a comparison of the sales of product units of X by region for the first quarter from 2013 through 2022.” In this example, sales have product, location, and time dimensions.

329

Dimensions are normally stored in dimension tables. Each dimension table contains attributes. The attributes are often used to search, filter, or classify facts. Dimensions provide descriptive characteristics about the facts through their attributes. Therefore, the data warehouse designer must define common business attributes that will be used by the data analyst to narrow down a search, group information, or describe dimensions. For example, we can identify some possible attributes for the product, location, and time dimensions: 

Product dimension: product id, description, product type, and manufacturer.



Location dimension: region, state, city, and store number.



Time dimension: year, quarter, month, week, and date.

These product, location, and time dimensions add a business perspective to the sales facts. The data analyst can now associate the sales figures for a given product, in a given region, and at a given time. The star schema, through its facts and dimensions, can provide the data when they are needed and in the required format, without imposing the burden of additional and unnecessary data (such as order #, po #, and status) that commonly exist in operational databases. In essence, dimensions are the magnifying glass through which we study the facts. 355. Explain multidimensional cubes and describe how the slice and dice technique fits into this model. Answer: To explain the multidimensional cube concept, let’s assume a sales fact table with three dimensions: product, location, and time. In this case, the multidimensional data model for the sales example is (conceptually) best represented by a three-dimensional cube. This cube represents the view of sales dimensioned by product, location, and time. (We have chosen a three-dimensional cube because such a cube makes it easier for humans to visualize the problem. There is, of course, no limit to the number of dimensions we can use.) The power of multidimensional analysis resides in its ability to focus on specific slices of the cube. For example, the product manager may be interested in examining the sales of a product, thus producing a slice of the product dimension. The store manager may be interested in examining the sales of a store, thus producing a slice of the location dimension. The intersection of the slices yields smaller cubes, thereby producing the “dicing” of the multidimensional cube. By examining these smaller cubes within the multidimensional cube, we can produce very precise analyses of the variable components and interactions. In short, slice and dice refers to the process that allows us to subdivide a multidimensional cube. Such subdivisions permit a far more detailed analysis than would be possible with the conventional two-dimensional data view. The text’s Section 13-5 and Figures 13.5 through 13.9 illustrate the slice and dice concept. To gain the benefits of slice and dice, we must be able to identify each slice of the cube. Slice identification requires the use of the values of each attribute within a given dimension. For example, to slice the location dimension, we can use a STORE_ID attribute in order to focus on a given store.

330

356. In the star schema context, what are attribute hierarchies and aggregation levels and what is their purpose? Answer: Attributes within dimensions can be ordered in an attribute hierarchy. The attribute hierarchy yields a top-down data organization that permits both aggregation and drilldown/roll-up data analysis. Use Figure 13.8 to show how the attributes of the location dimension can be organized into a hierarchy that orders that location dimension by region, state, city, and store. The attribute hierarchy gives the data warehouse the ability to perform drill-down and roll-up data searches. For example, suppose a data analyst wants an answer to the query “How does the 2022 total monthly sales performance compare to the 2021 monthly sales performance?” Having performed the query, suppose that the data analyst spots a sharp total sales decline in March 2022. Given this discovery, the data analyst may then decide to perform a drill-down procedure for the month of March to see how this year’s March sales by region stack up against last year’s. The drill-down results are then used to find out whether the low overall March sales were reflected in all regions or only in a particular region. This type of drill-down operation may even be extended until the data analyst is able to identify the individual store(s) that is (are) performing below the norm. The attribute hierarchy allows the data warehouse and OLAP systems to use a carefully defined path that will govern how data are to be decomposed and aggregated for drill-down and roll-up operations. Of course, keep in mind that it is not necessary for all attributes to be part of an attribute hierarchy; some attributes exist just to provide narrative descriptions of the dimensions. 357. Discuss the most common performance improvement techniques used in star schemas. Answer: The following four techniques are commonly used to optimize data warehouse design: 

Normalization of dimensional tables is done to achieve semantic simplicity and to facilitate end-user navigation through the dimensions. For example, if the location dimension table contains transitive dependencies between region, state, and city, we can revise these relationships to the third normal form (3NF). By normalizing the dimension tables, we simplify the data filtering operations related to the dimensions.



We can also speed up query operations by creating and maintaining multiple fact tables related to each level of aggregation. For example, we may use region, state, and city in the location dimension. These aggregate tables are pre-computed at the data loading phase, rather than at run-time. The purpose of this technique is to save processor cycles at run-time, thereby speeding up data analysis. An end-user query tool optimized for decision analysis will then properly access the summarized fact tables, instead of computing the values by accessing a “lower level of detail” fact table.



Denormalizing fact tables is done to improve data access performance and to save data storage space. The latter objective, storage space savings, is becoming less of a factor: Data storage costs are on a steeply declining path, decreasing almost daily. DBMS limitations that restrict database and table size limits, record size limits, and the maximum number of records in a single table are far more critical than raw storage space costs.

Denormalization improves performance by storing in one single record what normally would take many records in different tables. For example, to compute the total sales for all products

331

in all regions, we may have to access the regional sales aggregates and summarize all the records in this table. If we have 300,000 product sales records, we wind up summarizing at least 300,000 rows. Although such summaries may not be a very taxing operation for a DBMS initially, a comparison of ten or twenty years’ worth of sales is likely to start bogging the system down. In such cases, it will be useful to have special aggregate tables, which are denormalized. For example, a YEAR_TOTAL table may contain the following fields: YEAR_ID, MONTH_1, MONTH_2,....MONTH12, YEAR_TOTAL Such a denormalized YEAR_TOTAL table structure works well to become the basis for yearto-year comparisons at the month level, the quarter level, or the year level. But keep in mind that design criteria such as frequency of use and performance requirements are evaluated against the possible overload placed on the DBMS to manage these denormalized relations. 

Table partitioning and replication are particularly important when a DSS is implemented in widely dispersed geographic areas. Partitioning will split a table into subsets of rows or columns. These subsets can then be placed in or near the client computer to improve data access times. Replication makes a copy of a table and places it in a different location for the same reasons.

358. What is data analytics? Briefly define explanatory and predictive analytics. Answer: Data analytics is a subset of BI functionality that encompasses a wide range of mathematical, statistical, and modeling techniques with the purpose of extracting knowledge from data. Data analytics is used at all levels within the BI framework, including queries and reporting, monitoring and alerting, and data visualization. Hence, data analytics is a “shared” service that is crucial to what BI adds to an organization. Data analytics represents what business managers really want from BI: the ability to extract actionable business insight from current events and foresee future problems or opportunities. Data analytics discovers characteristics, relationships, dependencies, or trends in the organization’s data, and then explains the discoveries and predicts future events based on the discoveries. Data analytics tools can be grouped into two separate (but closely related and often overlapping) areas: 

Explanatory analytics focuses on discovering and explaining data characteristics and relationships based on existing data. Explanatory analytics uses statistical tools to formulate hypotheses, test them, and answer the how and why of such relationships—for example, how do past sales relate to previous customer promotions?



Predictive analytics focuses on predicting future data outcomes with a high degree of accuracy. Predictive analytics uses sophisticated statistical tools to help the end user create advanced models that answer questions about future data occurrences—for example, what would next month’s sales be based on a given customer promotion?

332

359. Describe and contrast the focus of data mining and predictive analytics. Give some examples. Answer: In practice, data analytics is better understood as a continuous spectrum of knowledge acquisition that goes from discovery to explanation to prediction. The outcomes of data analytics then become part of the information framework on which decisions are built. You can think of data mining (explanatory analytics) as explaining the past and present, while predictive analytics forecasts the future. However, you need to understand that both sciences work together; predictive analytics uses explanatory analytics as a stepping stone to create predictive models. Data mining refers to analyzing massive amounts of data to uncover hidden trends, patterns, and relationships; to form computer models to simulate and explain the findings; and then to use such models to support business decision making. In other words, data mining focuses on the discovery and explanation stages of knowledge acquisition. However, data mining can also be used as the basis to create advanced predictive data models. For example, a predictive model could be used to predict future customer behavior, such as a customer response to a target marketing campaign. So, what is the difference between data mining and predictive analytics? In fact, data mining and predictive analytics use similar and overlapping sets of tools, but with a slightly different focus. Data mining focuses on answering the “how” and “what” of past data, while predictive analytics focuses on creating actionable models to predict future behaviors and events. In some ways, you can think of predictive analytics as the next logical step after data mining; once you understand your data, you can use the data to predict future behaviors. In fact, most BI vendors are dropping the term data mining and replacing it with the more alluring term predictive analytics. Predictive analytics can be traced back to the banking and credit card industries. The need to profile customers and predict customer buying patterns in these industries was a critical driving force for the evolution of many modeling methodologies used in BI data analytics today. For example, based on your demographic information and purchasing history, a credit card company can use data-mining models to determine what credit limit to offer, what offers you are more likely to accept, and when to send those offers. Another example, a data mining tool could be used to analyze customer purchase history data. The data mining tool will find many interesting purchasing patterns, and correlations about customer demographics, timing of purchases, and the type of items they purchase together. The predictive analytics tool will use those findings to build a model that will predict with high degree of accuracy when a certain type of customer will purchase certain items and what items are likely to be purchased on certain times. 360. How does data mining work? Discuss the different phases in the data mining process. Answer: Data mining is subject to four phases: 

In the data preparation phase, the main data sets to be used by the data mining operation are identified and cleansed from any data impurities. Because the data in the data warehouse are already integrated and filtered, the Data Warehouse usually is the target set for data mining operations.



The data analysis and classification phase objective is to study the data to identify common data characteristics or patterns. During this phase the data mining tool applies specific algorithms to find: 

data groupings, classifications, clusters, or sequences.

333



data dependencies, links, or relationships.



data patterns, trends, and deviations.



The knowledge acquisition phase uses the results of the data analysis and classification phase. During this phase, the data mining tool (with possible intervention by the end user) selects the appropriate modeling or knowledge acquisition algorithms. The most typical algorithms used in data mining are based on neural networks, decision trees, rules induction, genetic algorithms, classification and regression trees, memory-based reasoning, or nearest neighbor and data visualization. A data mining tool may use many of these algorithms in any combination to generate a computer model that reflects the behavior of the target data set.



Although some data mining tools stop at the knowledge acquisition phase, others continue to the prognosis phase. In this phase, the data mining findings are used to predict future behavior and forecast business outcomes. Examples of data mining findings can be:

65% of customers who did not use the credit card in six months are 88% likely to cancel their account 82% of customers who bought a new TV 42” or bigger are 90% likely to buy an entertainment center within the next 4 weeks. If age < 30 and income <= 25,000 and credit rating < 3 and credit amount > 25,000, the minimum term is 10 years. The complete set of findings can be represented in a decision tree, a neural net, a forecasting model, or a visual presentation interface, which is then used to project future events or results. For example, the prognosis phase may project the likely outcome of a new product roll-out or a new marketing promotion. 361. Describe the characteristics of predictive analytics. What is the impact of Big Data in predictive analytics? Answer: Predictive analytics employs mathematical and statistical algorithms, neural networks, artificial intelligence, and other advanced modeling tools to create actionable predictive models based on available data. The algorithms used to build the predictive model are specific to certain types of problems and work with certain types of data. Therefore, it is important that the end user, who typically is trained in statistics and understands business, applies the proper algorithms to the problem in hand. However, thanks to constant technology advances, modern BI tools automatically apply multiple algorithms to find the optimum model. Most predictive analytics models are used in areas such as customer relationships, customer service, customer retention, fraud detection, targeted marketing, and optimized pricing. Predictive analytics can add value to an organization in many different ways; for example, it can help optimize existing processes, identify hidden problems, and anticipate future problems or opportunities. However, predictive analytics is not the “secret sauce” to fix all business problems. Managers should carefully monitor and evaluate the value of predictive analytics models to determine their return on investment.

334

Predictive analytics received a big stimulus with the advent of social media. Companies turned to data mining and predictive analytics as a way to harvest the mountains of data stored on social media sites. Google was one of the first companies that offered targeted ads as a way to increase and personalize search experiences. Similar initiatives were used by all types of organizations to increase customer loyalty and drive up sales. Take the example of the airline and credit card industries and their frequent flyer and affinity card programs. Nowadays, many organizations use predictive analytics to profile customers in an attempt to get and keep the right ones, which in turn will increase loyalty and sales. 362. Describe data visualization. What is the goal of data visualization? Answer: Data visualization is the process of abstracting data to provide a visual data representation that enhances the user’s ability to comprehend the meaning of the data. The goal of data visualization is to allow the user to quickly and efficiently see the data’s big picture by identifying trends, patterns, and relationships. 363. Is data visualization only useful when used with Big Data? Explain and expand. Answer: It is a mistake to think that data visualization is useful only when dealing with Big Data. Any organization (regardless of size) that collects and uses data in its daily activities can benefit from the use of data analytics and visualization techniques. We all have heard the saying “a picture is worth a thousand words,” and this has never been more accurate than in data visualization. Tables with hundreds, thousands, or millions of rows of data cannot be processed by the human mind in a meaningful way. Providing summarized tabular data to managers does not give them enough insight into the meaning of the data to make informed decisions. Data visualization encodes the data into visually rich formats (mostly graphical) that provide at-a-glance insight into overall trends, patterns, and possible relationships. Data visualization techniques range from simple to very complex, and many are familiar. Such techniques include pie charts, line graphs, bar charts, bubble charts, bubble maps, donut charts, scatter plots, Gantt charts, heat maps, histograms, time series plots, steps charts, waterfall charts, and many more. The tools used in data visualization range from a simple spreadsheet (such as MS Excel) to advanced data visualization software such as Tableau, Microsoft PowerBI, Domo, and Qlik. Common productivity tools such as Microsoft Excel can often provide surprisingly powerful data visualizations. Excel has long included basic charting and PivotTable and PivotChart capabilities for visualizing spreadsheet data. More recently, the introduction of the PowerPivot add-in has eliminated row and column data limitations and allows for the integration of data from multiple sources. This puts powerful data visualization capabilities within reach of most business users. 364. As a discipline, data visualization can be studied as a group of visual communication techniques used to explore and discover data insights by applying: pattern recognition, spatial awareness, and aesthetics. 365. Describe the different types of data and how they map to star schemas and data analysis. Give some examples of the different data types. Answer: In general, there are two types of data: 

Qualitative: describes qualities of the data. This type of data can be subdivided in two subtypes: — Nominal: This is data that can be counted but not ordered or aggregated. Examples: sex (male or female); student class (graduate or undergraduate).

335

— Ordinal: This is data that can be counted and ordered but not aggregated. Examples: rate your teacher (excellent, good, fair, poor), what is your family income (under 20,000, 20,001 to 40,000, 40,001 to 60,000, 60,001 or more). 

Quantitative: describes numeric facts or measures of the data. This type of data can be counted, ordered, and aggregated. Statisticians refer to this data as “interval and ratio” data. Examples of quantitative data include age, GPA, and number of accidents.

You can think of qualitative data as being the dimensions on a star schema and the quantitative data as being the facts of a star schema. This is important because it means that you must use the correct type of functions and operations with each data type, including the proper way to visually represent it. 366. What five graphical data characteristics does data visualization use to highlight and contrast data findings and convey a story? Answer: Data visualization uses shape, color, size, position, and group/order to represent and highlight data in certain ways. The way you visualize the data tells a story and has an impact on the end users. Some data visualizations can provide unknown insights and others can be a way to draw attention to an issue. When used correctly, data visualization can tell the story behind the data. 367. Contrast a data lake with a data warehouse. Answer: The primary difference between a data lake and a data warehouse is the state of the data. The data lake stores data in its raw, natural format. The data is raw in that it has not been processed yet. The data warehouse stores data that has been processed so that it conforms to the defined data warehouse structure. The processing may involve many manipulations of the data to decompose, aggregate, clean, and categorize the data.

NOTE The Teacher data files for this chapter has a Dashboards folder that contains two complete data sets: H1B Visa data and Vehicle Crash data. In addition, there are sample dashboards in Excel, MS Power BI, and Tableau. Read the included documentation explaining some of the data transformations applied to the raw data.

336

ANSWERS TO PROBLEMS The university computer lab’s director keeps track of lab usage, as measured by the number of students using the lab. This function is important for budgeting purposes. The computer lab director assigns you the task of developing a data warehouse to keep track of the lab usage statistics. The main requirements for this database are to: 

Show the total number of users by different time periods.



Show usage numbers by time period, by major, and by student classification.



Compare usage for different majors and different semesters.

Use the Ch13_P1.mdb database, which includes the following tables: 

USELOG contains the student lab access data.



STUDENT is a dimension table that contains student data.

Given the three preceding requirements, and using the Ch13_P1.mdb data, complete the following problems: a. Define the main facts to be analyzed. (Hint: These facts become the source for the design of the fact table.) b. Define and describe the appropriate dimensions. (Hint: These dimensions become the source for the design of the dimension tables.) c. Draw the lab usage star schema, using the fact and dimension structures you defined in Problems 1a and 1b. d. Define the attributes for each of the dimensions in Problem 1b. e. Recommend the appropriate attribute hierarchies. f.

Implement your data warehouse design, using the star schema you created in Problem 1c and the attributes you defined in Problem 1d.

g. Create the reports that will meet the requirements listed in this problem’s introduction. Answer: Before Problems 1a–g can be answered, the students must create the time and semester dimensions. Looking at the data in the USELOG table, the students should be able to figure out that the data belong to the Fall 2017 and Spring 2018 semesters; so the semester dimension must contain entries for at least these two semesters. The time dimension can be defined in several different ways. It will be very useful to provide class time during which students can explore the different benefits derived from various ways to represent the time dimension. Regardless of what time dimension representation is selected, it is clear that the date and time entries in the USELOG must be transformed to meet the TIME and SEMESTER codes. For data analysis purposes, we suggest using the TIME and SEMESTER dimension table configurations shown in Tables P13.1A and P13.1B, respectively. (We have used these configurations in the DW-P1sol.MDB database that is located on the CD.)

337

Table P13.1A The TIME Dimension Table Structure TIME_ID

TIME_DESCRIPTION

BEGIN_TIME

END_TIME

Morning

6:01AM

12:00PM

Afternoon

12:01PM

6:00PM

Night

6:01PM

6:00AM

Table P13.1B The SEMESTER Dimension Table Structure SEMESTER_ID

SEMESTER_DESCRIPTION

BEGIN_DATE

END_DATE

FA17

Fall 2017

15-Aug-2017

18-Dec-2017

SP18

Spring 2018

08-Jan-2018

15-May-2018

The USELOG table contains only the date and time of the access, rather than the semester or time IDs. The student must create the TIME and SEMESTER dimension tables and assign the proper TIME_ID and SEMESTER_ID keys to match the USELOG’s time and date. The students should also create the MAJOR dimension table, using the data already stored in the STUDENT table. Using Microsoft Access, we used the Make New Table query type to produce the MAJOR table. The Make New Table query lets you create a new table, MAJOR, using query output. In this case, the query must select all unique major codes and descriptions. The same technique can be used to create the student classification dimension table. (In our solution, we have named the student classification dimension table CLASS.) Naturally, you can use some front-end tool other than Access, but we have found Access to be particularly effective in this environment. To produce the solution we have stored in the PW-P1sol.MBD database, we have used the queries listed in Table P13.1C.

338

Table P13.1C The Queries in the DW_P1sol.MDB Database Query Name

Query Description

Update DATE format in USELOG

The DATE field in USELOG was originally given to us as a character field. This query converted the date text to a date field we can use for date comparisons.

Update STUDENT_ID format in STUDENT

This query changes the STUDENT_ID format to make it compatible with the format used in USELOG.

Update STUDENT_ID format in USELOG

This query changes the STUDENT_ID format to make it compatible with the format used in STUDENT.

Append TEST records from USELOG & STUDENT

Creates a temporary storage table (TEST) used to make some data transformations before the creation of the fact table. The TEST table contains the fields that will be used in the USEFACT table, plus other fields used for data transformation purposes.

Update TIME_ID and SEMESTER_ID in TEST

Before we create the USEFACT table, we must transform the dates and time to match the SEMESTER_ID and TIME_ID keys used in our SEMESTER and TIME dimension tables. This query does that.

Count STUDENTS sort by Fact Keys: SEM, MAJOR, CLASS, TIME.

This query does data aggregation over the data in TEST table. This query table will be used to create the new USEFACT table.

Populate USEFACT

This query uses the results of the previous query to populate our USEFACT table.

Compares usage by Semesters by Times

Used to generate Report1

Usage by Time, Major and Classification

Used to generate Report2

Usage by Major and Semester

Used to generate Report3

Having completed the preliminary work, we can now present the solutions to the seven problems: a. Define the main facts to be analyzed. (Hint: These facts become the source for the design of the fact table.) Answer: The main facts are the total number of students by time, the major, the semester, and the student classification. b. Define and describe the appropriate dimensions. (Hint: These dimensions become the source for the design of the dimension tables.)

339

Answer: The possible dimensions are semester, major, classification, and time. Each of these dimensions provides an additional perspective to the total number of students fact table. The dimension table names and attributes are shown in the screenshot that illustrates the answer to Problem 3. c. Draw the lab usage star schema, using the fact and dimension structures you defined in Problems 1a and 1b. Answer: Figure P13.1c shows the MS Access relational diagram—see the Ch13P1sol.mdb database in the Student Online Companion—to illustrate the star schema, the relationships, the table names, and the field names used in our solution. The students are given only the USELOG and STUDENT tables and they must produce the fact table and dimension tables.

FIGURE P13.1c The Microsoft Access Relational Diagram

d. Define the attributes for each of the dimensions in Problem 1b. Answer: Given Problem 1c’s star schema snapshot, the dimension attributes are easily defined: Semester dimension: semester_id, semester_description, begin_date, and end_date. Major dimension: major_code and major_name. Class dimension: class_id and class_description. Time dimension: time_id, time_description, begin_time, and end_time.

340

e. Recommend the appropriate attribute hierarchies. Answer: See the answer to Question 18 and the dimensions shown in Problems 1c and 1d to develop the appropriate attribute hierarchies.

Implement your data warehouse design, using the star schema you created in Problem 1c and the attributes you defined in Problem 1d. Answer: The solution is included in the Ch13_P1sol.mdb database on the Instructor’s CD.

g. Create the reports that will meet the requirements listed in this problem’s introduction. Answer: Use the Ch13_P1sol.mdb database on the Instructor’s CD as the basis for the reports. Keep in mind that the Microsoft Access export function can be used to put the Access tables into a different database such as Oracle or DB2. 368. Victoria Ephanor manages a small product distribution company. Because the business is growing fast, she recognizes that it is time to manage the vast information pool to help guide the accelerating growth. Ephanor, who is familiar with spreadsheet software, currently employs a sales force of four people. She asks you to develop a data warehouse application prototype that will enable her to study sales figures by year, region, salesperson, and product. (This prototype will be used as the basis for a future data warehouse database.) Using the data supplied in the Ch13_P2.xlsx file, complete the following seven problems: a. Identify the appropriate fact table components. Answer: The dimensions for this star schema are: Year, Region, Agent, and Product. (These are shown in Figure P13.2c.) b. Identify the appropriate dimension tables. Answer: (These are shown in Figure P13.2c.) c. Draw a star schema diagram for this data warehouse. Answer: See Figure P13.2c.

341

FIGURE P13.2C The Star Schema for the Ephanor Distribution Company

d. Identify the attributes for the dimension tables that will be required to solve this problem. Answer: The solution to this problem is presented in the Ch13_P2sol.xls file in the Student Online Companion. e. Using Microsoft Excel or any other spreadsheet program that can produce pivot tables, generate a pivot table to show the sales by product and by region. The end user must be able to specify the display of sales for any given year. The sample output is shown in the first pivot table in Figure P13.2E.

FIGURE P13.2E Using A Pivot Table

Answer: The solution to this problem is presented in the Ch13_P2sol.xlsx file in the Teacher Data Files. f.

Using Problem 2e as your base, add a second pivot table (see Figure P13.2E) to show the sales by salesperson and by region. The end user must be able to specify sales for a given year or for all years, and for a given product or for all products.

342

FIGURE P13.2F Second Pivot Table

Answer: The solution to this problem is presented in the Ch13_P2sol.xlsx file in the Teacher Data Files. g. Create a 3D bar graph to show sales by salesperson, by product, and by region. (See the sample output in Figure P13.2G.)

FIGURE P13.2G 3D Bar Graph Showing the Relationships among Agent, Product, and Region

Answer: The solution to this problem is presented in the Ch13_P2sol.xlsx file in the Teacher Data Files.

343

369. David Suker, the inventory manager for a marketing research company, wants to study the use of supplies within the different company departments. Suker has heard that his friend, Victoria Ephanor, has developed a spreadsheet-based data warehouse model that she uses to analyze sales data (see Problem 2). Suker is interested in developing a data warehouse model like Ephanor’s so he can analyze orders by department and by product. He will use Microsoft Access as the data warehouse DBMS and Microsoft Excel as the analysis tool. a. Develop the order star schema. Answer: Figure P13.3a’s MS Access relational diagram reflects the star schema and its relationships. Note that the students are given only the ORDERS table. The student must study the data set and make the queries necessary to create the dimension tables (TIME, DEPT, VENDOR, and PRODUCT) and the ORDFACT fact table.

FIGURE P13.3A The Marketing Research Company Relational Diagram

b. Identify the appropriate dimension attributes. Answer: The dimensions are TIME, DEPT, VENDOR, and PRODUCT. (See Figure P13.3A.) c. Identify the attribute hierarchies required to support the model. Answer: The main hierarchy used for data drilling purposes is represented by TIME-DEPTVENDOR-PRODUCT sequence. (See Figure P13.3a.) Within this hierarchy, the user can analyze data at different aggregation levels. Additional hierarchies can be constructed in the TIME dimension to account for quarters or, if necessary, by daily aggregates. The VENDOR dimension could also be expanded to include geographic information that could be used for drill-down purposes. d. Using the Ch13_P3 database, develop a crosstab report in Microsoft Access, using a 3D bar graph to show orders by product and by department. (The sample output is shown in Figure P13.3.)

FIGURE P13.3 Crosstab Report: Orders by Product and Department

344

Answer: The solution to this problem is included in the Ch13_P3sol.mdb database in the Teacher Data Files. 370. ROBCOR, whose sample data is contained in the database named Ch13_P4.mdb, provides “ondemand” aviation charters using a mix of different aircraft and aircraft types. Because ROBCOR has grown rapidly, its owner has hired you as its first database manager. The company’s database, developed by an outside consulting team, is already in place to help manage all company operations. Your first critical assignment is to develop a decision support system to analyze the charter data. (Review the company’s operations in Problems 24–31 of Chapter 3, The Relational Database Model.) The charter operations manager wants to be able to analyze charter data such as cost, hours flown, fuel used, and revenue. She also wants to be able to drill down by pilot, type of airplane, and time periods.

345

Given those requirements, complete the following: a. Create a star schema for the charter data.

NOTE The students must first create the queries required to filter, integrate, and consolidate the data prior to their inclusion in the Data Warehouse. The Ch13_P4.mdb database in the Student Data Files contains the data to be used by the students. The Ch13_P4sol.mdb database in the Teacher Data Files contains the data and solution to the problems. Answer: The problem requires the creation of the time dimension. Looking at the data in the CHARTER table, the students should figure out that the two attributes in the time dimension should be year and month. Another possible attribute could be day, but since no one pilot or airplane was used more than once a day, including it as an attribute would only reduce the database’s efficiency. The analysis to be done on the time dimension can be done on a monthly or yearly basis. The CHARTER table contains the date of the charter. No time IDs exist and the date is contained within a single field. The student must create the TIME dimension table and assign the proper TIME_ID keys and its attributes. A temporary table is created to aid in the creation of the CHARTER_FACT table. The queries in Table P13.4a are used in the transformation process:

Table P13.4-1A The ROBCOR Data Warehouse Queries Query Name

Query Description

Make a TEMP table from CHARTER, PILOT, and MODEL

Creates a temporary storage table used to make the necessary data transformations before the creation of the fact table.

Update TIME_ID in TEMP

Used to create the TIME_ID key used in the TIME dimension table.

Update YEAR and MONTH in TEMP

In order to get the year and month attributes in the TIME dimension it is necessary to separate that data in the temporary table first. The date is in the TEMP table but will not be in the fact table.

Make TIME table from TEMP

This query is used to create the time table using the appropriate data from the TEMP table.

Aggregate TEMP table by fact keys

This query does data aggregation over the data in the TEMP table. This query table will be used to create the new CHARTER_FACT table.

Populate CHARTER_FACT table

This query uses the results of the previous query to populate our CHARTER_FACT table.

346

The MS Access relational diagram in Figure P13.4a reflects the star schema, the relationships, the table names, and field names used in our solution. The student is given only the CHARTER, AIRCRAFT, MODEL, EMPLOYEE, PILOT, and CUSTOMER tables, and they must produce the fact table and the dimension table.

FIGURE P13.4A The ROBCOR Relational Diagram

b. Define the dimensions and attributes for the charter operation’s star schema. Answer: The dimensions are TIME, MODEL, and PILOT. Each of these dimensions is depicted in Figure P13.4a’s star schema figure. The attributes are: Time dimension: time id, year, and month. Model dimension: model code, manufacturer, name, number of seats, and so on. Pilot dimension: employee number, pilot license, pilot ratings, and so on. c. Define the necessary attribute hierarchies. Answer: The main attribute hierarchy is based on the sequence year-month-model-pilot. The aggregate analysis is based on this hierarchy. We can produce a query to generate revenue, hours flown, and fuel used on a yearly basis. We can then drill down to a monthly time period to generate the aggregate information for each model of airplane. We can also drill down to get that information about each pilot. d. Implement the data warehouse design using the design components you developed in Problems 4a–4c. Answer: The Ch13_P4sol.mdb database contains the data and solutions for Problems 4a– 4c.

347

e. Generate the reports to illustrate that your data warehouse meets the specified information requirements. Answer: The Ch13-P4sol.mdb database contains the solution for Problem 4e. Using the data provided in the Ch13-SaleCo-DW database, solve the following problems. (Hint: In Problems 5–11, use the ROLLUP command.)

The script files used to populate the database are available at cengage.com. The script files are available in Oracle, MySQL and SQL Server formats. MS Access does not have SQL support for the complex grouping required. 371. What is the SQL command to list the total sales by customer and by product, with subtotals by customer and a grand total for all product sales? Figure P13.5 shows the abbreviated results of the query. Answer: Oracle: SELECT

CUS_CODE, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT

GROUP BY

ROLLUP (CUS_CODE, P_CODE);

SQL Server and MySQL: SELECT

CUS_CODE, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT

GROUP BY

CUS_CODE, P_CODE WITH ROLLUP;

What is the SQL command to list the total sales by customer, month and product, with subtotals by customer and by month and a grand total for all product sales? Figure P13.6 shows the abbreviated results of the query. Answer: Oracle: SELECT

CUS_CODE, TM_MONTH, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

ROLLUP (CUS_CODE, TM_MONTH, P_CODE);

348

SQL Server and MySQL: SELECT

CUS_CODE, TM_MONTH, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

CUS_CODE, TM_MONTH, P_CODE WITH ROLLUP;

372. What is the SQL command to list the total sales by region and customer, with subtotals by region and a grand total for all sales? Figure P13.7 shows the result of the query. Answer: Oracle: SELECT

REG_ID, CUS_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWCUSTOMER C ON S.CUS_CODE = C.CUS_CODE

GROUP BY

ROLLUP (REG_ID, CUS_CODE);

SQL Server and MySQL: SELECT

REG_ID, CUS_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWCUSTOMER C ON S.CUS_CODE = C.CUS_CODE

GROUP BY

REG_ID, CUS_CODE WITH ROLLUP;

373. What is the SQL command to list the total sales by month and product category, with subtotals by month and a grand total for all sales? Figure P13.8 shows the result of the query. Answer: Oracle: SELECT

TM_MONTH, P_CATEGORY, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

ROLLUP (TM_MONTH, P_CATEGORY);

SQL Server and MySQL: SELECT

TM_MONTH, P_CATEGORY, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

TM_MONTH, P_CATEGORY WITH ROLLUP;

349

374. What is the SQL command to list the number of product sales (number of rows) and total sales by month, with subtotals by month and a grand total for all sales? Figure P13.9 shows the result of the query. Answer: Oracle: SELECT

TM_MONTH, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

ROLLUP (TM_MONTH);

SQL Server and MySQL: SELECT

TM_MONTH, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

TM_MONTH WITH ROLLUP;

375. What is the SQL command to list the number of product sales (number of rows) and total sales by month and product category with subtotals by month and product category and a grand total for all sales? Figure P13.10 shows the result of the query. Answer: Oracle: SELECT

TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

ROLLUP (TM_MONTH, P_CATEGORY);

SQL Server and MySQL: SELECT

TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

TM_MONTH, P_CATEGORY WITH ROLLUP;

376. What is the SQL command to list the number of product sales (number of rows) and total sales by month, product category and product, with subtotals by month and product category and a grand total for all sales? Figure P13.11 shows the result of the query.

350

Answer: Oracle: SELECT

TM_MONTH, P_CATEGORY, P_CODE, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE

GROUP BY

ROLLUP (TM_MONTH, P_CATEGORY, P_CODE);

SQL Server and MySQL: SELECT

TM_MONTH, P_CATEGORY, P_CODE, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE

GROUP BY

TM_MONTH, P_CATEGORY, P_CODE WITH ROLLUP;

377. Using the answer to Problem 10 as your base, what command would you need to generate the same output but with subtotals in all columns? (Hint: Use the CUBE command.) Figure P13.12 shows the result of the query. Answer: Oracle: SELECT

TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

CUBE (TM_MONTH, P_CATEGORY);

SQL Server: SELECT

TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

TM_MONTH, P_CATEGORY WITH CUBE;

MySQL does not currently have the ability to do this type of grouping without third-party add-on products.

351

378. Create your own data analysis and visualization presentation. The purpose of this project is for you to search for a publicly available data set using the Internet and create your own presentation using what you have learned in this chapter. a. Search for a data set that may interest you and download it. Some examples of public data sets sources are (see also Note on page 625): 

http://www.data.gov



http://data.worldbank.org



http://aws.amazon.com/datasets



http://usgovxml.com/



https://data.medicare.gov/



http://www.faa.gov/data_research/

b. Use any tool available to you to analyze the data. You can use tools such as MS Excel PivotTables, PivotCharts, or other free tools, such as Google Fusion tables, Tableau free trial, and IBM Many Eyes. c. Create a short presentation to explain some of your findings (such as what the data sources are, where the data comes from, and what the data represents.) Answer: There are an incredible number of possible visualizations that students can create for an exercise like this. Most students enjoy the opportunity to express their creativity in producing visually interesting solutions. Attempt to keep the focus on how the visualization might make the data actionable. What can we learn from the visualization, and how might a decision maker be influenced by it? Data Sources available: There are several public sources of large data sets that could be used by students to practice visualizations. Some of the most common sources are: http://catalog.data.gov

http://data.worldbank.org

http://aws.amazon.com/datasets

http://usgovxml.com

https://data.medicare.gov

http://www.faa.gov/data_research/

https://www.cdc.gov/nchs/data_access/ https://data.world/ For some good examples of data visualizations, see the Centers for Disease Control and Prevention, Data Visualization Gallery at https://www.cdc.gov/nchs/data-visualization/

352

NOTE The data files for this chapter has a Dashboards folder that contains two complete data visualization examples, including two data sets: H1B Visa data and Vehicle Crash data. In addition, there are sample dashboards built in Excel, MS Power BI, and Tableau. Read the included documentation explaining some of the data transformations applied to the raw data. These should serve as a good starting point to the students on how to create some simple dashboards. See sample figures below.

FIGURE P13.13A H1b Visa Applications Dashboard (Excel)

353

FIGURE P13.13B H1B Visa Applications Dashboard (PowerBI)

354

FIGURE P13.13C H1B Visa Applications Dashboard (Tableau)

TABLE OF CONTENTS Answers to Review Questions .................................................................................................1

ANSWERS TO REVIEW QUESTIONS 379. What is Big Data? Give a brief definition. Answer: Big Data is data of such volume, velocity, and/or variety that it is difficult for traditional relational database technologies to store and process it.

355

380. What are the traditional 3 Vs of Big Data? Briefly define each. Answer: Volume, velocity, and variety are the traditional 3 Vs of Big Data. Volume refers to the quantity of the data that must be stored. Velocity refers to the speed with which new data is being generated and entering the system. Variety refers to the variations in the structure, or the lack of structure, in the data being captured. 381. Explain why companies like Google and Amazon were among the first to address the Big Data problem. Answer: In the 1990s, the use of the Internet exploded and commercial websites helped attract millions of new consumers to online transactions. When the dot-com bubble burst at the end of the 1990s, the millions of new consumers remained but the number of companies providing them services reduced dramatically. As a result, the surviving companies, like Google and Amazon, experienced exponential growth in a very short time. This led to these companies being among the first to experience the volume, velocity, and variety of data that is associated with Big Data. 382. Explain the difference between scaling up and scaling out. Answer: Scaling up involves improving storage and processing capabilities through the use of improved hardware, software, and techniques without changing the quantity of servers. Scaling out involves improving storage and processing capabilities through the use of more servers. 383. What is stream processing, and why is it sometimes necessary? Answer: Stream processing is the processing of data inputs to make decisions on which data should be stored and which data should be discarded. In some situations, large volumes of data can enter the system at such a rapid pace that it is not feasible to try to actually store all of the data. The data must be processed and filtered as it enters the system to determine which data to keep and which data to discard.

356

384. How is stream processing different from feedback loop processing? Answer: Stream processing focuses on inputs, while feedback loop processing focuses on outputs. Stream processing is performed on the data as it enters the system to decide which data should be stored and which should be discarded. Feedback loop processing uses data after it has been stored to conduct analysis for the purpose of making the data actionable by decision makers. 385. Explain why veracity, value, and visualization can also be said to apply to relational databases as well as Big Data. Answer: Veracity of data is an issue with even the smallest of data stores, which is why data management is so important in relational databases. Value of data also applies to traditional, structured data in a relational database. One of the keys to data modeling is that only the data that is of interest to the users should be included in the data model. Data that is not of value should not be recorded in any data store—Big Data or not. Visualization was discussed and illustrated at length in Chapter 13 as an important tool in working with data warehouses, which are often maintained as structured data stores in relational DBMS products. 386. What is polyglot persistence, and why is it considered a new approach? Answer: Polyglot persistence is the idea that an organization’s data storage solutions will consist of a range of data storage technologies. This is a new approach because the relational database has previously dominated the data management landscape to the point that the use of a relational DBMS for data storage was taken for granted in most cases. With Big Data problems, the reliance on only relational databases is no longer valid. 387. What are the key assumptions made by the Hadoop Distributed File System approach? Answer: HDFS is designed around the following assumptions: High volume Write-once, read-many Streaming access Fault tolerance HDFS assumes that the massive volumes of data will need to be stored and retrieved. HDFS assumes that data will be written once, that is, there will very rarely be a need to update the data once it has been written to disk. However, the data will need to be retrieved many times. HDFS assumes that when a file is retrieved, the entire contents of the file will need to be streamed in a sequential fashion. HDFS does not work well when only small parts of a file are needed. Finally, HDFS assumes that failures in the servers will be frequent. As the number of servers increases, the probability of a failure increases significantly. HDFS assumes that servers will fail so the data must be redundant to avoid loss of data when servers fail.

357

388. What is the difference between a name node and a data node in HDFS? Answer: The name node stores the metadata that tracks where all of the actual data blocks reside in the system. The name node is responsible for coordinating tasks across multiple data nodes to ensure sufficient redundancy of the data. The name node does not store any of the actual user data. The data nodes store the actual user data. A data node does not store metadata about the contents of any data node other than itself. 389. Explain the basic steps in MapReduce processing. Answer: 

A client node submits a job to the Job Tracker.



Job Tracker determines where the data to be processed resides.



Job Tracker contacts the Task Tracker on the nodes as close as possible to the data.



Each Task Tracker creates mappers and reducers as needed to complete the processing of each block of data and consolidate that data into a result.



Task Trackers report results back to the Job Tracker when the mappers and reducers are finished.



The Job Tracker updates the status of the job to indicate when it is complete.

390. Briefly explain how HDFS and MapReduce are complementary to each other. Answer: Both HDFS and MapReduce rely on the concept of massive, relatively independent, distributions. HDFS decomposes data into large, independent chunks of data that are then distributed across a number of independent servers. MapReduce decomposes processing into independent tasks that are distributed across a number of independent servers. The distribution of data in HDFS is coordinated by a name node server that collects data from each server about the state of the data that it holds. The distribution of processing in MapReduce is coordinated by a job tracker that collects data from each server about the state of the processing it is performing. 391. What are the four basic categories of NoSQL databases? Answer: Key-value database, document databases, column family databases, and graph databases. 392. How are the value components of a key-value database and a document database different? Answer: In a key-value database, the value component is nonintelligible for the database. In other words, the DBMS is unaware of the meaning of any of the data in the value component—it is treated as an indecipherable mass of data. All processing of the data in the value component must be accomplished by the application logic. In a document database, the value component is partially interpretable by the DBMS. The DBMS can identify and search for specific tags, or subdivisions, within the value component.

358

393. Briefly explain the difference between row-centric and column-centric data storage. Answer: Row-centric storage treats a row as the smallest data storage unit. All of the column values associated with a particular row of data are stored together in physical storage. This is the optimal storage approach for operations that manipulate and retrieve all columns in a row, but only a small number of rows in a table. Column-centric storage treats a row as a divisible collection of values that are stored separately with the values of a single column across many rows being physically stored together. This is optimal when operations manipulate and retrieve a small number of columns in a row for all rows in the table. 394. What is the difference between a column and a super column in a column family database? Answer: Columns in a column family database are relatively independent of each other. A super column is a group of columns that are logically related. This relationship can be based on the nature of the data in the columns, such as a group of columns that comprise an address, or it can be based on application processing requirements. 395. Explain why graph databases tend to struggle with scaling out? Answer: Graph databases are designed to address problems with highly related data. The data that appears in a graph database are tightly integrated and queries that traverse a graph focus on the relationships among the data. Scaling out requires moving data to number of different servers. As a general rule, scaling out is recommended when the data on each server is relatively independent of the data on other servers. Due to the dependencies among the data on different servers in a graph database, the inter-server communication overhead is very high with a graph database. This has a significant negative impact on the performance of graph databases in a scaled out environment. 396. Explain what it means for a database to be aggregate aware. Answer: Aggregate aware means that the designer of the database has to be aware of the way the data in the database will be used, and then design the database around whichever component would be central to that usage. Instead of decomposing the data structures to eliminate redundancy, an aggregate aware database collects, or aggregates, all of the data around a central component to minimize the structures required during processing.

ANSWERS TO REVIEW QUESTIONS 397. Give some examples of database connectivity options and what they are used for.

359

Answer: Database connectivity refers to the mechanisms through which application programs connect and communicate with data repositories. The database connectivity software is also known as database middleware, because it represents a piece of software that interfaces between the application program and the database. The data repository is also known as the data source, because it represents the data management application (i.e., an Oracle RDBMS, SQL Server DBMS, or IBM DBMS) that will be used to store the data generated by the application program. Ideally, a data source or data repository could be located anywhere and hold any type of data. For example, the data source could be a relational database, a hierarchical database, and a spreadsheet, a text data file. There are many different technologies for database connectivity, for example: 

Native SQL connectivity. Provided by the database vendors to connect to their databases.



Microsoft’s Open Database Connectivity (ODBC), Data Access Objects (DAO) and Remote Data Objects (RDO), Microsoft’s Object Linking and Embedding for Database (OLE-DB) and Microsoft’s ActiveX Data Objects (ADO.NET). These technologies allow Windows-based applications to access multiple types of data sources: text files, spreadsheets, databases, etc.



Java Database Connectivity (JDBC)—used to connect Java-based applications to multiple different databases.

398. What are ODBC, DAO, and RDO? How are they related?

360

Answer: Open Database Connectivity (ODBC) is Microsoft’s implementation of a superset of the SQL Access Group Call Level Interface (CLI) standard for database access. ODBC allows any Windows application to access relational data sources using SQL via a standard application programming interface (API). ODBC was the first widely adopted database middleware standard and enjoyed rapid adoption in Windows applications. As programming languages evolved, ODBC did not provide significant functionality beyond the ability to execute SQL to manipulate relational style data. Therefore, programmers needed a better way to access data. To answer this need, Microsoft developed two other data access interfaces: 

Data Access Objects (DAO) is an object-oriented API used to access MS Access, MS FoxPro, and dBase databases (using the Jet data engine) from Visual Basic programs. DAO provided an optimized interface that exposed the functionality of the Jet data engine (on which MS Access database is based on) to programmers. The DAO interface can also be used to access other relational style data sources.



Remote Data Objects (RDO) is a higher-level object-oriented application interface used to access remote database servers. RDO uses the lower-level DAO and ODBC for direct access to databases. RDO was optimized to deal with server-based databases, such as MS SQL Server, Oracle, and DB2.

399. What is the difference between DAO and RDO? Answer: DAO uses the MS Jet engine to access file-based relational databases such as MS Access, MS FoxPro, and dBase. In contrast, RDO allows access to relational database servers such as SQL Server, DB2, and Oracle. RDO uses DAO and ODBC to access remote database server data. 400. What are the three basic components of the ODBC architecture? Answer: The basic ODBC architecture is composed of three main components: 

A high-level ODBC API through which application programs access ODBC functionality.



A Driver Manager component that is in charge of managing all database connections.



An ODBC Driver component that talks directly to the DBMS (data source).

401. What steps are required to create an ODBC data source name? Answer: To define a data source you must create a data source name (DSN) for the data source. To create a DSN you have to provide: 

An ODBC driver. You must identify the driver to use to connect to the data source. The ODBC driver is normally provided by the database vendor; although Microsoft provides several drives to connect to the most common databases. For example, if you are using an Oracle DBMS you will select the Oracle ODBC drive provided by Oracle or if desired, the Microsoft-provided ODBC Driver for Oracle.



A DSN name. This is a unique name by which the data source will be known to ODBC and, therefore, to the applications. ODBC offers two types of data sources: User and System. User data sources are only available to the user. System data sources are available to all users, including operating system services.

361



ODBC driver parameters. Most ODBC drivers require some specific parameters in order to establish a connection to the database. For example, if you are using a MS Access database, you must point to the location of the MS Access (.mdb) file and, if necessary, provide the user name and password. If you are using a DBMS server, you must provide the server name, the database name, and the username and password used to connect to the database. Figure 15.3—borrowed from the text and reproduced here for your convenience—shows the ODBC screens required to create a system ODBC data source for an Oracle DBMS. Note that some ODBC drivers use the native driver provided by the DBMS vendor.

FIGURE 15.3 Configuring an Oracle ODBC Data Source

402. What is OLE-DB used for, and how does it differ from ODBC? Answer: Although ODBC, DAO, and RDO were widely used, they did not provide support for nonrelational data. To answer the need for nonrelational data access and to simplify data connectivity, Microsoft developed Object Linking and Embedding for Database (OLE-DB). Based on Microsoft’s Component Object Model (COM), OLE-DB is a database middleware that was developed to add object-oriented functionality for access to relational and nonrelational data. OLE-DB was the first piece of Microsoft’s strategy to provide a unified object-oriented framework for the development of next-generation applications. 403. Explain the OLE-DB model based on its two types of objects.

362

Answer: OLE-DB is composed of a series of COM objects that provide low-level database connectivity for applications. Because OLE-DB is based on the COM object model, the objects contain data and methods (also known as the interface.) The OLE-DB model is better understood when you divide its functionality in two types of objects: 

Consumers are all those objects (applications or processes) that request and use data. The data consumers request data by invoking the methods exposed by the data provider objects (public interface) and passing the required parameters.



Providers are the objects that manage the connection with a data source and provide data to the consumers. Providers are divided in two categories: data providers and service providers.  Data providers provide data to other processes. Database vendors create data provider objects that expose the functionality of the underlining data source (relational, object-oriented, text, and so on.)  Service providers provide additional functionality to consumers. The service provider is located between the data provider and the consumer: The service provider requests data from the data provider; transforms the data and provides the transformed data to the data consumer. In other words, the service provider acts like a data consumer of the data provider and as a data provider for the data consumer (end-user application). For example, a service provider could offer cursor management services, transaction management services, query processing services, and indexing services.

404. How does ADO complement OLE-DB? Answer: OLE-DB provided additional capabilities for the applications accessing the data. However, it did not provide support for scripting languages, especially the ones used for web development, such as Active Server Pages (ASP) and ActiveX. To provide such support, Microsoft developed a new object framework called ActiveX Data Objects (ADO). ADO provides a high-level application-oriented interface to interact with OLE-DB, DAO, and RDO. ADO provided a unified interface to access data from any programming language that uses the underlying OLE-DB objects. Figure 15.5—borrowed from the text and reproduced here for your convenience—illustrates the ADO/OLE-DB architecture and how it interacts with ODBC and native connectivity options.

363

FIGURE 15.5 OLE-DB Architecture Client Applications

OLE-DB Consumers

Access

C++

Excel

ActiveX Data Objects (ADO)

OLE-DB Services Providers Email Processing

Indexing Processing

Cursor Processing

Query Processing

OLE-DB Data Providers OLE-DB Provider for Oracle

OLE-DB Provider for Exchange

OLE-DB Provider for SQL Server

OLE-DB Provider for ODBC

SQL*NET

DATABASE

ODBC

SQL-Server

DATABASE

405. What is ADO.NET, and what two new features make it important for application development? Answer: ADO.NET is the data access component of Microsoft’s .NET application development framework. Microsoft’s .NET framework is a component-based platform for the development of distributed, heterogeneous, interoperable applications aimed to manipulate any type of data, over any network, and under any operating system and programming language. The .NET framework is beyond the reach of this book. Therefore, this section will only introduce the basic data access component of the .NET architecture, ADO.NET. ADO.Net introduced two new features critical for the development of distributed applications: datasets and XML support. 

A DataSet is a disconnected memory-resident representation of the database.



ADO.NET stores all its internal data in XML format.

364

406. What is a DataSet, and why is it considered to be disconnected? Answer: A DataSet is a disconnected memory-resident representation of the database. That is, the DataSet contains tables, columns, rows, relationships, and constraints. Once the data are read from a data provider, the data are placed on a memory-resident DataSet. The DataSet is then disconnected from the data provider. The data consumer application interacts with the data in the DataSet object to make changes (inserts, updates, and deletes) in the dataset. Once the processing is done, the DataSet data are synchronized with the data source, and the changes are made permanent. A DataSet is in fact a simple database with tables, rows, and constraints. Even more important, the DataSet doesn’t require keeping a permanent connection to the data source. The DataAdapter uses the SelectCommand to populate the DataSet from a data source. However, once the DataSet is populated, it is completely independent of the data source— that’s why it’s called “disconnected.” 407. What are web server interfaces used for? Give some examples. Answer: Web server interfaces are used to extend the functionality of the web server to provide more services. If a web server is to communicate with other external programs to provide a service successfully, both programs must use a standard way to exchange messages and respond to requests. A web server interface defines how a web server communicates with external programs. Currently, there are two well-defined web server interfaces: 

Common Gateway Interface (CGI)



Application Programming Interface (API)

Web server interfaces can be used to extend the services of a web server and provide support for access to external databases, fax services, telephony services, and directory services. 408. Search the Internet for web application servers. Choose one and prepare a short presentation for your class. Answer: You are encouraged to use any web search engine to list multiple vendors. Examples of such vendors are: Oracle Application Server, IBM WebSphere, Sun Java, Microsoft, and JBOSS. We encourage the student to visit the webpages of the products and compare the features of at least two products. Some of the many other web application servers, as of this writing, include Oracle Application Server by Oracle Corp., WebLogic by BEA Systems, NetDynamics by Sun Microsystems, NetObjects’ Fusion, Microsoft’s Visual Studio.NET, and WebObjects by Apple. 409. What does this statement mean: “The web is a stateless system.” What implications does a stateless system have for database application developers?

365

Answer: Simply put, the label stateless system indicates that, at any given time, a web server does not know the status of any of the clients communicating with it. That is, there is no open communications line between the server and each client accessing it—that, of course, is impractical on a worldwide web! Instead, client and server computers interact in very short “conversations” that follow the request-reply model. For example, the browser is only concerned with the current page, so there is no way for the second page to know what was done on the first page. The only time the client and server computers communicate is when the client requests a page—when the user clicks a link—and the server sends the requested page to the client. Once the client receives the page and its components, the client/server communication is ended. Therefore, although you may be browsing a page and think that the communication is open, you are actually just browsing the HTML document stored in the local cache (temporary directory) of the client browser. The server does not have any idea what the end user is doing with the document, what data is entered in a form, and what option is selected. On the web, if we want to act on a client’s selection, we need to jump to a new page (go back to the web server), therefore losing track of whatever was done before! Not knowing what was done before or what a client selected before it got to this page makes adding business logic to the web cumbersome. For example, suppose that you need to write a program that performs the following steps: display a data entry screen, capture data, validate data, and save data. This entire sequence can be completed in a single COBOL program because COBOL uses a working storage section that holds in memory all variables used in the program. Now imagine the same COBOL program—but each section (PERFORM statement) is now a separate program! That is precisely how the web works. In short, the web’s stateless nature means that extensive processing required by a program’s execution cannot be done directly on a single webpage; the client browser’s processing ability is limited by the lack of processing ability and the lack of a working storage area to hold variables used by all pages in a website. The browser does not have computational abilities beyond formatting output text and accepting form field inputs. Even when the browser accepts form field data, there is no way to perform immediate data entry validation. Therefore, to perform such crucial processing in the client, the web defers to other web programming languages such as Java, JavaScript, and VBScript. 410. What is a web application server, and how does it work from a database perspective? Answer: A web application server extends the functionality of a web server and provides features such as: 

An integrated development environment with session management and support for persistent application variables.



Security and authentication of users through user IDs and passwords.



Computational languages to represent and store business logic in the application server.



Automatic generation of HTML pages integrated with Java, JavaScript, VBScript, and ASP.



Performance and fault-tolerant features.



Database access with transaction management capabilities.

366



Access to multiple services, such as file transfers (FTP), database connectivity, electronic mail, and directory services.

The web application server interfaces with the database connectivity standards to access databases using any of the supported APIs. So, a web page will be processed by the web application server; the application server will connect to the database using the ADO, OLEDB, or ODBC standard (or any other standard supported by the application server). 411. What are scripts, and what is their function? (Think in terms of database application development.) Answer: A script is a series of instructions executed in interpreter mode. The script is a plain text file that is not compiled like COBOL, C++, or Java. Scripts are normally used in web application development environments. For instance, ColdFusion scripts contain the code that is required to connect, query, and update a database from a web front end. 412. What is XML, and why is it important? Answer: Extensible Markup Language (XML) is a meta-language used to represent and manipulate data elements. XML is designed to facilitate the exchange of structured documents such as orders or invoices over the Internet. The World Wide Web Consortium (W3C) published the first XML 1.0 standard definition in 1998. This standard sets the stage for giving XML the real-world appeal of being a true vendor-independent platform. Therefore, it is not surprising that XML is rapidly becoming the data exchange standard for e-commerce applications. XML is important because it provides the semantics that facilitates the sharing, exchange, and manipulation of structured documents over organizational boundaries. 413. What are document type definition (DTD) documents, and what do they do? Answer: Companies that exchange data using XML must have a way to understand and validate each other’s tags. One way to accomplish that task is through the use of Document Type Definitions. A Document Type Definition (DTD) is a file with a .dtd extension that describes XML elements—in effect, a DTD file provides the composition of the database’s logical model and defines the syntax rules or valid tags for each type of XML document. (The DTD component is very similar to having a public data dictionary for business data.) 414. What are XML schema definition (XSD) documents and what do they do? Answer: An XML Schema Definition (XSD) document is an advanced data definition language that is used to describe the structure (elements, data types, relationship types, ranges, and default values) of XML data documents. Unlike a DTD document, which uses a unique syntax, an XML Schema Definition (XSD) file uses a syntax that resembles an XML document. One of the main advantages of an XML schema is that it more closely maps to database terminology and features. For example, an XML schema will be able to define common database types, such as date, integer or decimal, minimum and maximum values, list of valid values, and required elements. Using the XML schema, a company would be able to validate the data for values that may be out of range, incorrect dates, and valid values. For example, a university application must be able to specify that a GPA value must be between zero and 4.0, and it must be able to detect an invalid birth date such as “14/13/1987.” (There is no 14th month.) Many vendors are rapidly adopting this new standard and are supplying tools to translate DTD documents into XML Schema Definition (XSD) documents. It is widely expected that XML schemas will replace DTD as the method to describe XML data.

367

415. What is JDBC, and what is it used for? Answer: JDBC stands for Java Database Connectivity. Before we talk about JDBC, let’s talk about Java. Java is an object-oriented programming language developed by Sun Microsystems (now owned by Oracle). Java is one of the most common programming languages for web development. Sun Microsystems created Java as a “write once, run anywhere” environment. That means that a programmer can write a Java application once and then, without any modification, run the application in multiple environments (Microsoft Windows, Apple OS X, IBM AIX, etc.). The cross-platform capabilities of Java are based on its portable architecture. Java code is normally stored in pre-processed chunks known as applets that run on a virtual machine environment in the host operating system. This environment has well-defined boundaries, and all interactivity with the host operating system is closely monitored. Java provides runtime environments for most operating systems (from computers to hand-held devices to TV set-top boxes). Another advantage of using Java is its “on-demand” architecture. When a Java application loads, it can dynamically download all its modules or required components via the Internet. When Java applications want to access data outside the Java runtime environment, they use pre-defined application programming interfaces. Java Database Connectivity (JDBC) is an application programming interface that allows a Java program to interact with a wide range of data sources (relational databases, tabular data sources, spreadsheets, and text files). JDBC allows a Java program to establish a connection with a data source, prepare and send the SQL code to the database server, and process the result set. One of the main advantages of JDBC is that it allows a company to leverage its existing investment in technology and personnel training. JDBC allows programmers to use their SQL skills to manipulate the data in the company’s databases. As a matter of fact, JDBC allows direct access to a database server or access via database middleware. Furthermore, JDBC provides a way to connect to databases through an ODBC driver. (Figure 15.7 in the text illustrates the basic JDBC architecture and the various database access styles.) The database access architecture in JDBC is very similar to the ODBC/OLE/ADO.NET architecture. All database access middleware shares similar components and functionality. One advantage of JDBC over other middleware is that it requires no configuration on the client side. The JDBC driver is automatically downloaded and installed as part of the Java applet download. Because Java is a web-enabled technology, applications can connect to a database directly using a simple URL. Once the URL is invoked, the Java architecture comes into place, the necessary applets are downloaded to the client (including the JDBC database driver and all configuration information), and then the applets are executed securely in the client’s runtime environment. Every day, more and more companies are investing resources in developing and expanding their web presence and finding ways to do more business on the Internet. Such businesses will generate increasing amounts of data that will be stored in databases. Java and the .NET framework are part of the trend toward increasing reliance on the Internet as a critical business resource. In fact, it has been said that the Internet will become the development platform of the future.

368

416. What is cloud computing, and why is it a “game changer”? Answer: According to the National Institute of Standards and Technology (NIST), cloud computing is “a computing model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computer resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” The term “cloud services” is used to refer to the services provided by cloud computing. Cloud services allow any organization to quickly and economically add information technology services such as applications, storage, servers, processing power, databases, and infrastructure to its IT portfolio. Cloud computing is important for database technologies because it has the potential to become a “game changer.” Cloud computing eliminates financial and technological barriers so organizations can leverage database technologies in their business processes with minimal effort and cost. In fact, cloud services have the potential to turn basic IT services into “commodity” services, such as electricity, gas, and water, and to enable a revolution that could change not only the way that companies do business but the IT business itself. As Nicholas Carr put it so vividly: “Cloud computing is for IT what the invention of the power grid was for electricity.” For example, imagine that the chief technology officer of a nonprofit organization wants to add e-mail services to the IT portfolio. A few years ago, this proposition would have implied building the e-mail system’s infrastructure from the ground up, including hardware, software, setup, configuration, operation, and maintenance. However, in today’s cloud computing era, you can use Google Apps for Business or Microsoft Exchange Online and get a scalable, flexible, and more reliable e-mail solution for a fraction of the cost. Most of the cloud services come bundled with extras, for example, Google and Microsoft offer additional services like terabyte-size storage spaces for their users. The best part is that you do not have to worry about the daily chores of managing and maintaining the IT infrastructure, such as OS updates, patches, security, fault tolerance, and recovery. What used to take months or years to implement can now be done in a matter of minutes. 417. Name and contrast the types of cloud computing implementation. Answer: There are basically three cloud computing implementation types (based on who the target customers are): 

Public cloud. This type of cloud infrastructure is built by a third-party organization to sell cloud services to the general public. The public cloud is the most common type of cloud implementation; examples include Amazon Web Services (AWS), Google Application Engine, and Microsoft Azure. In this model, cloud consumers share resources with other consumers transparently. The public cloud infrastructure is managed exclusively by the third-party provider.



Private cloud. This type of internal cloud is built by an organization for the sole purpose of servicing its own needs. Private clouds are often used by large, geographically dispersed organizations to add agility and flexibility to internal IT services. The cloud infrastructure could be managed by internal IT staff or an external third party.

369



Community cloud. This type of cloud is built by and for a specific group of organizations that share a common trade, such as agencies of the federal government, the military, or higher education. The cloud infrastructure could be managed by internal IT staff or an external third party.

418. Name and describe the most prevalent characteristics of cloud computing services. Answer: The basic characteristics of cloud computing services are: 

Ubiquitous access via Internet.



Shared infrastructure.



Lower costs and variable pricing.



Flexible and scalable services.



Dynamic provisioning.



Service orientation.



Managed operations.

419. Using the Internet, search for providers of cloud services. Then, classify the types of services they provide (SaaS, PaaS, and IaaS). Answer: A starting point will be the examples shown in Figure 15.23 in the textbook. Further examples are: 

Google Workspace and Microsoft 365 (SaaS)



Amazon Cloud and Microsoft Azure (PaaS and IaaS)



DropBox.com—a cloud service storage provider (IaaS)



OneDrive.com—a cloud service storage provider (IaaS)



Carbonite.com—provides online backup of data (SaaS)



iCloud.com (Apple)—provides storage and synchronization of Apple device data (contacts, music, apps, photos, documents, backups—IaaS and SaaS)



GoodData.com—business intelligence platform (PaaS)



Heroku.com—Ruby web programming environment service (PaaS)

420. Summarize the main advantages and disadvantages of cloud computing services. Answer: Table 15.4 summarizes the main advantages and disadvantages of cloud computing services.

370

Table 15.4 Advantages and Disadvantages of Cloud Computing Advantage

Disadvantage

Low initial cost of entry. Cloud computing Issues of security, privacy, and compliance. Trusting has lower costs of entry when compared sensitive company data to external entities is difficult with the alternative of building in house. for most data-cautious organizations. Scalability/elasticity. It is easy to add and Hidden costs of implementation and operation. It is remove resources on demand. hard to estimate bandwidth and data migration costs. Support for mobile computing. Cloud Data migration is a difficult and lengthy process. computing providers support multiple types Migrating large amounts of data to and from the of mobile computing devices. cloud infrastructure can be difficult and timeconsuming. Ubiquitous access. Consumers can access Complex licensing schemes. Organizations that the cloud resources from anywhere at any implement cloud services are faced with complex time, as long as they have Internet access. licensing schemes and complicated service-level agreements. High reliability and performance. Cloud providers build solid infrastructures that otherwise are difficult for the average organization to leverage.

Loss of ownership and control. Companies that use cloud services are no longer in complete control of their data. What is the responsibility of the cloud provider if data are breached? Can the vendor use your data without your consent?

Fast provisioning. Resources can be Organization culture. End users tend to be resistant provisioned on demand in a matter of to change. Do the savings justify being dependent minutes with minimal effort. on a single provider? Will the cloud provider be around in 10 years? Managed infrastructure. Most cloud implementations are managed by dedicated internal or external staff. This allows the organization’s IT staff to focus on other areas.

Difficult integration with internal IT system. Configuring the cloud services to integrate transparently with internal authentication and other internal services could be a daunting task.

421. Define SQL data services and list their advantages. Answer: SQL data services refer to Internet-based data management services that provide access to hosted relational data management using standard protocols and common programming interfaces. The advantages of SQL data services include: 

High reliability and scalability of relational database capabilities at a low cost



High level of failure tolerance



Dynamic and automatic load balancing

371



Automated data backup and recovery



Dynamic creation and allocation of database processes and storage

372

ANSWERS TO PROBLEMS ONLINE CONTENT The databases used in the Problems for this chapter can be found at www.cengage.com.

PROBLEMS In the following exercises, you will set up database connectivity using MS Excel.

NOTE Although the precise steps to setup data connectivity vary slightly according to the version of Excel and operating system platform you are using, in general terms, the steps are outlined as indicated in the sections below. Use MS Excel to connect to the Ch02_InsureCo MS Access database using ODBC and retrieve all of the AGENTs. Answer: To perform this task, complete the following steps: 

From Excel, select Data, Get Data, From Other Sources, From Microsoft Query options to retrieve data from an ODBC data source.



Select the MS Access Database* option and click OK.



Select the Database file location and click OK.



Select the table, click on > (arrow) to use all columns in the query, and click Next.



On the Query Wizard—Filter Data click Next.



On the Query Wizard—Sort Order click Next.



Select Return Data to Microsoft Office Excel and click on Finish.



Position the cursor where you want the data to be placed on your spreadsheet and click OK.

The solution is shown in Figure P15.1.

FIGURE P15.1 Solution to Problem 1—Retrieve all AGENTs

422. Use MS Excel to connect to the Ch02_InsureCo MS Access database using ODBC and retrieve all of the CUSTOMERs. Answer: To perform this task, complete the following steps: 

From Excel, select Data, Get Data, From Other Sources, From Microsoft Query options to

373

retrieve data from an ODBC data source. 

Select the MS Access Database* option and click OK.



Select the Database file location and click OK.



Select the table, click on > (arrow) to use all columns in the query, and click Next.



On the Query Wizard—Filter Data click Next.



On the Query Wizard—Sort Order click Next.



Select Return Data to Microsoft Office Excel.



Position the cursor where you want the data to be placed on your spreadsheet and click OK.

The solution is shown in Figure P15.2.

FIGURE P15.2 Solution to Problem 2—Retrieve all CUSTOMERs

423. Use MS Excel to connect to the Ch02_InsureCo MS Access database using ODBC and retrieve the customers whose AGENT_CODE is equal to 503. Answer: To perform this task, complete the following steps: 

From Excel, select Data, Get Data, From Other Sources, From Microsoft Query options to retrieve data from an ODBC data source.



Select the MS Access Database* option and click OK.



Select the Database file location and click OK.



Select the CUSTOMER table, click on the > (arrow) to use all columns in the query and click Next.



On the Query Wizard—Filter Data, select the AGENT_CODE column, select “equals” from the left drop-down box, then select “503” from the right drop-down box, and then click Next.



On the Query Wizard—Sort Order click Next.



Select Return Data to Microsoft Office Excel and click on Finish.



Position the cursor where you want the data to be placed on your spreadsheet and click OK.

The results are shown in Figure P15.3.

374

FIGURE P15.3 Solution to Problem 3—Retrieve all CUSTOMERs with AGENT_CODE=503

424. Create a System DSN ODBC connection called Ch02_SaleCo using the Administrative Tools section of the Windows Control Panel. Answer: To create the DSN, complete the following steps: 

Using Windows, open the Control Panel, open Administrative Tools, open ODBC Data Sources.



Click on the System DSN tab, click on Add, select the Microsoft Access Drive (*.mdb) driver and click on Finish.



On the ODBC Microsoft Access Setup window, enter Ch02_SaleCo on the Data Source Name field.



Under Database, click on the Select button, browse to the location of the MS Access file and click OK. And click OK one more time.



The new system DSN now appears in the list of system data sources.

The results are shown in Figure P15.4.

375

FIGURE P15.4 Solution to Problem 4—Create Ch02_SaleCo System DSN

425. Use MS Excel to list all of the invoice lines for Invoice 103 using the Ch02_SaleCo System DSN. Answer: To perform this task, complete the following steps: 

From Excel, select Data, Get Data, From Other sources, From ODBC.



Select the Ch02_SaleCo data source and click OK.



In the ODBC driver window, select Windows, Use my current credentials and click Connect.



In the Navigator window, select Ch02_SaleCo.



Select the LINE table, then select Transform Data.



On the Power Query Editor, click on the filter on the INV_NUMBER column, deselect all and select 103, and click OK.



In the Ribbon, under File, click Close and Load.



A new LINE worksheet will appear with the LINE data.

The results are shown in Figure P15.5.

376

FIGURE P15.5

Solution to Problem 5—Retrieve all Invoice LINEs with

INV_NUMBER=103

426. Create a System DSN ODBC connection called Ch02_Tinycollege using the Administrative Tools section of the Windows Control Panel. Answer: To perform this task, complete the following steps: 

Using Windows XP, open the Control Panel, open Administrative Tools, ODBC Data Sources.



Click on the System DSN tab, click on Add, select the Microsoft Access Drive (*.mdb) driver and click on Finish.



On the ODBC Microsoft Access Setup window, enter Ch02_TinyCollege on the Data Source Name field.



Under Database, click on the Select button, browse to the location of the MS Access file and click OK twice.



The new system DSN now appears in the list of system data sources.

427. Use MS Excel to list all classes taught in room KLR200 using the Ch02_TinyCollege System DSN. Answer: To perform this task, complete the following steps: 

From Excel, select Data, Get Data, From Other sources, From ODBC.



Select the Ch02_TinyCollege data source and click OK.



In the ODBC driver window, select Windows, Use my current credentials and click Connect.



In the Navigator window, select Ch02_TinyCollege



Select the CLASS table, then select Transform Data



On the Power Query Editor, click on the filter on the CLASS_ROOM column, deselect all and select KLR200, and click OK.



In the Ribbon, under File, click Close and Load.



A new CLASS worksheet will appear with the LINE data.

The results of these actions are shown in Figure P15.7.

377

FIGURE P15.7

Solution to Problem 7—Retrieve all Classes Taught in Room

KLR200

To answer Problems 8−11, use Section 15-3a as your guide. 428. Create a sample XML document and DTD for the exchange of customer data. Answer: The solutions are shown in Figures P15.8a and P15.8b.

FIGURE P15.8a Customer DTD Solution

378

FIGURE P15.8b Customer XML Solution

429. Create a sample XML document and DTD for the exchange of product and pricing data. Answer: The solutions are shown in Figures P15.9a and P15.9b.

379

FIGURE P15.9a Product DTD Solution

380

FIGURE P15.9b Product XML Solution

381

430. Create a sample XML document and DTD for the exchange of order data. Answer: The solutions are shown in Figures P15.10a and P15.10b.

FIGURE P15.10a Order DTD Solution

382

FIGURE P15.10b Order XML Solution

383

431. Create a sample XML document and DTD for the exchange of student transcript data. Use your college transcript as a sample. Answer: The solution to Problem 11 will follow the same format as the previous solutions. However, because Problem 11 requires the students to do some research regarding the information that goes in the transcript data, we have not included a specific solution here. Encourage the student to use his/her creativity and analytical skills to research and create a simple XML file containing the data that is customary on your university. Not all fields in the Student transcript must be included in this exercise. Allow the students to represent just the most important fields. Figure P15.11 shows a sample transcript. Notice the various sections of the transcript. We will focus in those sections independently to help break down the exercise.

FIGURE P15.11 Sample Transcript

We have omitted some student identifying information from this transcript. The XML document will focus on the data elements, please note that the transcript has some “header labels” for reporting purposes that will not be in the XML data file. Use the labels to match the corresponding sections.

384

<?xml version ="1.0"?> A

<StudentTranscript> <StudentInformation> <FirstName>Jane</FirstName> <LastName>Dow</LastName> <UniversityID>M12453456</UniversityID> <StudentType>Continuing</StudentType> </StudentInformation> <CurrentCurriculum>

<CurrentProgram>Master of Science<CurrentProgram> <College>Business</College> <Major>Information Systems </Major> <Department>Information Systems and Analytics< Department> <MajorConcentration>Bus Intelligence & Analytics</MajorConcentration> </CurrentCurriculum> <DegreeAwarded>

<Award>Bachelor of Science</Award> <DegreeDate>05/29/2020</DegreeDate> <DegreeHonors>Magna Cum Laude</DegreeHonors> D

<PrimaryDegree> <College>Behavioral and Health Sciences</College> <Major>Textiles Merchandising Design</Major> <MajorConcentration>Fashion Merchandising</MajorConcentration>

385

<Minor>Business Administration</Minor> </PrimaryDegree </DegreeAwarded> <TransferAccepted>

<Semester>Fall 2015</Semester> F

<College>Broward College</Broward College> <CreditAccepted> <Subject>ELEC</Subject> <Course>ELLD</Course> <Title>LD Start/Success</Title> <Grade>TA</Grade> <CreditHours>3.00</CreditHours> <QualityPoints>0.000</QualityPoints> </CreditAccepted> <CreditAccepted> <Subject>ENGL</Subject> <Course>1010</Course> <Title>Expository Writing</Title> <Grade>TB</Grade> <CreditHours>3.00</CreditHours> <QualityPoints>0.000</QualityPoints> </CreditAccepted> <CreditAccepted>

386

<Subject>PS</Subject> <Course>3210</Course> <Title>International Rel</Title> <Grade>TB</Grade> <CreditHours>3.00</CreditHours> <QualityPoints>0.000</QualityPoints> </CreditAccepted>

<CurrentTermHours> <AttemptHours>9.000</AttemptHours> <PassedHours>9.000</PassedHours> <EarnedHours>9.000</EarnedHours> <GPAHours>0.000</GPAHours> <QualityPoints>0.000</QualityPoints> <GPA>0.000</GPA> </CurrentTermHours> </TransferAccepted> <TransferAccepted> ...

(repeat)

</TransferAccepted> </StudentTranscript> The DTD for the student transcript is shown below: <!ELEMENT StudentTranscript (StudentInformation,CurrentCurriculum,DegreeAwarded,TransferAccepted+)>

387

<!ELEMENT StudentInformation (FirstName, LastName, UniversityID, StudentType)> <!ELEMENT FirstName (#PCDATA )> <!ELEMENT LastName (#PCDATA )> <!ELEMENT UniversityID (#PCDATA )> <!ELEMENT StudentType (#PCDATA )> <!ELEMENT CurrentCurriculum (CurrentProgram, College, Major, Department, MajorConcentration)> <!ELEMENT CurrentProgram (#PCDATA )> <!ELEMENT College (#PCDATA )> <!ELEMENT Major (#PCDATA )> <!ELEMENT Department (#PCDATA )> <!ELEMENT MajorConcentration (#PCDATA )> <!ELEMENT DegreeAwarded (Award, DegreeDate, DegreeHonors, PrimaryDegree)> <!ELEMENT Award (#PCDATA )> <!ELEMENT DegreeDate (#PCDATA )> <!ELEMENT DegreeHonors (#PCDATA )> <!ELEMENT PrimaryDegree (College, Major, MajorConcentration, Minor)> <!ELEMENT College (#PCDATA )> <!ELEMENT Major (#PCDATA )> <!ELEMENT MajorConcentration (#PCDATA )> <!ELEMENT Minor (#PCDATA )>

388

<!ELEMENT TransferAccepted (Semester, College, CreditAccepted+,CurrentTermHours) +> <!ELEMENT Semester (#PCDATA>) <!ELEMENT College (#PCDATA)> <!ELEMENT CreditAccepted (Subject, Course, Title, Grade, CreditHours, QualityPoints) +> <!ELEMENT Subject (#PCDATA )> <!ELEMENT Course (#PCDATA )> <!ELEMENT Title (#PCDATA )> <!ELEMENT Grade (#PCDATA )> <!ELEMENT CreditHours (#PCDATA )> <!ELEMENT QualityPoints (#PCDATA )> <!ELEMENT CurentTermHours (AttemptHours, PassedHours, EarnedHours, GPAHours, QualityPoints, GPA)> <!ELEMENT AttemptHours (#PCDATA )> <!ELEMENT PassedHours (#PCDATA )> <!ELEMENT EarnedHours (#PCDATA )> <!ELEMENT GPAHours (#PCDATA )> <!ELEMENT QualityPoints (#PCDATA )> <!ELEMENT GPA (#PCDATA )>

389

TABLE OF CONTENTS Answers to Review Questions .............................................................................................390

ANSWERS TO REVIEW QUESTIONS 432. Explain the difference between data and information. Give some examples of raw data and information. Answer: Data are raw facts of interest to an end user. Examples of data include a person’s date of birth, an employee name, the number of pencils in stock, and so on. Data represent a static aspect of a real world object, event, or thing. Information is processed data. That is, information is the product of applying some analytical process to data. For example, invoice data may include the invoice number, customer, items purchased, invoice total, and so on. The end user can generate information by tabulating such data and computing totals by customer, cash purchase summaries, credit purchase summaries, a list of most-frequently purchased items, and so on. 433. Define dirty data and identify some of its sources. Answer: Dirty data is data that contains inaccuracies or inconsistencies (i.e., data that lacks integrity). Dirty data may result from a lack of enforcement of integrity constraints, typographical errors, the use of synonyms and homonyms across systems, the use of nonstandard abbreviations, or differences in the decomposition of composite attributes. 434. What is data quality, and why is it important? Answer: Data quality is a comprehensive approach to ensuring the accuracy, validity, and timeliness of the data. Data quality is important because without quality data, accurate and timely information cannot be produced. Without accurate and timely information, it is difficult (impossible?) to make good decisions; and without good decisions, organizations will fail in their competitive environments. 435. Explain the interactions among end user, data, information, and decision-making. Draw a diagram and explain the interactions. Answer: End users apply intelligence to data to produce information. This information is combined with existing knowledge to create new knowledge that is used to make decisions. The interactions are illustrated in Figure IM16.1.

390

FIGURE IM16.1 End User, Data, Information, and Decision-Making Interaction

436. Suppose that you are a DBA. What data dimensions would you describe to top-level managers to obtain their support for endorsing the data administration function? Answer: The first step will be to emphasize the importance of data as a company asset, to be managed as any other asset. Top-level managers must understand this crucial notion and must be willing to commit company resources to manage data as an organizational asset. The next step is to identify and define the need for and role of the DBMS in the organization. Top-level managers are supported through the DBMS’s ability to provide necessary information for strategic planning, provide access to internal and external data to identify growth opportunities, provide a framework for enforcing organizational policies, improve the likelihood of positive return on investment by searching for new ways to cut costs and boost productivity, and provide feedback to monitor goal achievement. 437. How and why did database management systems become the organizational data management standard in organizations? Discuss some of the advantages of the database approach over the file-system approach. Answer: Prior to database approaches, organizations relied on file systems. The data files in the file system were “owned” by individual functional areas within the organization, often for their exclusive use. This led to high levels of redundancy and a lack of data consistency across the organization. As the need increased for more accurate data to produce more accurate information to support increasingly integrated functions, the deficiencies of file systems became unacceptable. Databases provided an organizational ownership of data that was shared across applications and functional areas. As a result, the importance of data resources grew, and organizations found ever more applications of their data. Technology departments shifted focus from data processing to information processing to support organizational decision making. 438. Using a single sentence, explain the role of databases in organizations. Then explain your answer. Answer: The single sentence will be: The database’s predominant role is to support managerial decision making at all levels in the organization. Databases support top, middle, and operational management. At the top level of management, the database supports strategic decisions for growth. At the middle level of management, the database supports monitoring and feedback on tactical decisions; while at the operational level, database supports feedback and control of operations. 439. Define security and privacy. How are these two concepts related?

391

Answer: Security means protecting the data against accidental or intentional use by unauthorized users. Privacy deals with the rights of people and organizations to determine who accesses the data and when, where, and how the data are to be used. The two concepts are closely related. In a shared system, individual users must ensure that the data are protected from unauthorized use by other individuals. Also, the individual user must have the right to determine who, when, where, and how other users use the data. The DBMS must provide the tools to allow such flexible management of the data security and access rights in a company database. 440. Describe and contrast the information needs at the strategic, tactical, and operational levels in an organization. Use examples to explain your answer. Answer: Strategic levels of the organization need support of decisions that involve organization-wide issues, such as growth into new markets or responses to environmental threats and opportunities. Strategic decisions typically involve setting goals. Tactical decisions involve the high-level actions to implement the goals set at the strategic level, such as monitoring and controlling the use of company resources. Operational decisions involve the internal and external transactions to conduct the business of the organization within the actions defined in the tactical decisions. Operational-level support can involve activities such as querying the shipping status of a customer’s order. 441. What special considerations must you take into account when introducing a DBMS into an organization? Answer: Managerial, technical, and cultural issues must be taken into account when a new DBMS is to be introduced in an organization. For example, focus the discussion on such questions as: 

What about retraining requirements for the new system?  Who needs to be retrained?  What must be the type and extent of the retraining?



Is it reasonable to expect some resistance to change  from the computer services department administrator(s)?  from secretaries?  from technical support personnel?  from other departmental end users?



How will the resistance in the preceding question be manifested?



How will you deal with such resistance?

442. Describe the DBA’s responsibilities. Answer: The database administrator (DBA) is the person responsible for the control and management of the shared database within an organization. The DBA controls the database administration function within the organization. The DBA is responsible for managing the overall corporate data resource, both computerized and noncomputerized. Therefore, the DA is given a higher degree of responsibility and authority than

392

the DBA. Depending on organizational style, the DBA and DA roles may overlap and may even be combined in a single position or person. The DBA position requires both managerial and technical skills. Refer to Section 16-5 and Table 16.1 to explain and illustrate the general responsibilities of the DA and DBA functions. 443. How can the DBA function be placed within the organization chart? What effect(s) will such placement have on the DBA function? Answer: The DBA function placement varies from company to company and may be either a staff or line position. In a staff position, the DBA function creates a consulting environment in which the DBA is able to devise the overall data-administration strategy but does not have the authority to enforce it. In a line position, the DBA function has both the responsibility and the authority to plan, define, implement, and enforce the policies, standards, and procedures. 444. Why and how are new technological advances in computers and databases changing the DBA’s role? Answer: The DBA function is probably one of the most dynamic functions of any organization. New technological developments constantly change the DBA function. For example, note how each of the following influences the DBA function: 

the development of the DDBMS



the development of the OODBMS



the increasing use of Cloud solutions



the rapid integration of Intranet and Extranet applications and their effects on the database design, implementation, and management. (Security issues become especially important!)

445. Explain the DBA department’s internal organization, based on the DBLC approach. Answer: The DBA department may be organized based on the DBLC by allocating personnel and resources based on the DBLC phases. In this approach, the department is organized into the following units: 

Planning



Design



Implementation



Operation



Training

393

446. Explain and contrast the differences and similarities between the DBA and DA. Answer: Both the data administrator (DA) and database administrator (DBA) positions require both managerial skills and technical skills. The DA position puts more emphasis on managerial skills, while the DBA position emphasizes the technical skills more. The DA role performs strategic planning of long-term goals, and sets policies and standards based on those goals. The DA job is broad in scope and views data as a corporate asset across the organization. The DBA role controls and supervises the execution of plans to achieve goals within the standards set. 447. Explain how the DBA plays an arbitration role for an organization’s two main assets. Draw a diagram to facilitate your explanation. Answer: The DBA plays a role in the arbitration of interactions between people and data. The DBA sets and enforces standards for the interactions of users and programmers for interacting with the data. The people of the organization, end users, and application programmers interact with the data through application programs and DBMS interfaces. The DBA sets the standards for how the application programs interact with the database and may be involved in verifying that applications conform to those standards. The DBA will define the uses of the DBMS interfaces presented to the users and limit the actions that can be taken by the users with those interfaces.

FIGURE IM16.16 DBA Arbitrates Interactions Between People and Data

448. Describe and characterize the skills desired for a DBA. Answer: The skills for a DBA can be characterized as either managerial or technical. Managerial skills include a broad business understanding, coordination skills, analytical skills, conflict resolution skills, communication skills, and negotiation skills. Technical skills include broad data processing background with up-to-date knowledge of database technologies, understanding of the SDLC, structured development methodologies, DBLC, database modeling skills, and operational database skills.

394

449. What are the DBA’s managerial roles? Describe the managerial activities and services provided by the DBA. Answer: The DBA is a manager responsible for controlling and planning database administration. Activities of the DBA in this regard include planning, organizing, testing, monitoring, and delivering services such as end-user support, standards for data access, security and privacy, backup and recovery, and data distribution. 450. What DBA activities are used to support end users? Answer: DBA activities to support end users include, gathering user requirements, building end-user confidence, resolving conflicts and problems, finding solutions to information needs, ensuring quality and integrity of data, and managing the training and support of DBMS users. 451. Explain the DBA’s managerial role in the definition and enforcement of policies, procedures, and standards. Answer: A successful data administration strategy requires the continuous enforcement of policies as statements of direction or action of DBA goals. The data administration strategy must also provide and enforce standards to implement those policies, such as defining structures for applications and naming conventions that programmers must use. Finally, the strategy must include procedures, that is, the precise step-by-step instructions for how to perform a task so that it is in compliance with the standards and policies. 452. Protecting data security, privacy, and integrity are important database functions. What activities are required in the DBA’s managerial role of enforcing these functions? Answer: The DBA is responsible for defining, documenting, and communicating policies, standards, and procedures for these functions. 453. Discuss the importance and characteristics of database data backup and recovery procedures. Then describe the actions that must be detailed in backup and recovery plans. Answer: Data loss can be ruinous for companies. DBAs must ensure that data can be fully recovered in the case of data loss or loss of database integrity. The backup and recovery plan must include periodic data and application backups, proper identification of backups, safe backup storage, physical protection of the hardware and software, and typically insurance coverage for the data. 454. Assume that your company assigned you the responsibility of selecting the corporate DBMS. Develop a checklist for the technical and other aspects involved in the selection process. Answer: The checklist should address selection criteria based on the following: 

DBMS model



DBMS storage capacity



Application development support



Security and integrity



Backup and recovery



Concurrency control



Performance

395



Database administration tools



Interoperability and data distribution



Portability and standardization



Hardware requirements



Data dictionary accessibility



Vendor training and support availability



Available third-party tools



Cost

455. Describe the activities that are typically associated with the design and implementation services of the DBA technical function. What technical skills are desirable in the DBA’s personnel? Answer: The DBA performs many design activities. The technical function area during design includes helping with the creation of conceptual, logical, and physical database design, and evaluation of transactions within application programs to ensure the transactions are correct, efficient, and compliant with integrity standards. During implementation, the DBA technical functions include implementation of the physical design, creation, and evaluation of the application access plan, and development and testing of operational procedures such as training, security, and backup plans. 456. Why are testing and evaluation of the database and applications not done by the same people who are responsible for the design and implementation? What minimum standards must be met during the testing and evaluation process? Answer: Testing and evaluation of the database and applications are done by different people than the designers and implementers because the designers and implementers are often too close to the problem to recognize any omissions. The testing must include backup and recovery; security; integrity; use of SQL; application performance; evaluation of written documentation and procedures; observance of standards for naming, documenting, and coding; checking for data duplication conflicts with existing data; and the enforcement of data validation rules. 457. Identify some bottlenecks in DBMS performance, and then propose some solutions used in DBMS performance tuning. Answer: The most common bottlenecks for DBMS performance tuning deal with the use of indexes, query optimization algorithms, and management of storage resources. The DBA should create and ensure adherence to an index creation and usage plan. This can include training application programmers on the proper use of SQL statements to take advantage of the indexes. Most query optimization routines are built into the DBMS. However, part of these routines deal with concurrent transactions, and the DBA may be able to configure concurrency options to improve performance for each database individually. Finally, the DBA must configure appropriate storage resources of both primary memory for buffer pools and secondary memory for proper log file size and location. 458. What are the typical activities involved in the maintenance of the DBMS and its utilities and applications? Would you consider application performance tuning to be part of the maintenance activities? Explain your answer.

396

Answer: Database maintenance activities are extensions of the operational activities to ensure the preservation of the database environment. Common activities include reorganization of the database on physical storage devices to maintain performance. Additional database performance tuning is part of the maintenance activities. As the database system enters operation, the database starts to grow. Resources initially assigned to the application are sufficient for the initial loading of the database. As the system grows, the database becomes bigger, and the DBMS requires additional resources to satisfy the demands on the larger database. Database performance will decrease as the database grows and more users access it. The need to monitor and address issues with application performance as the database grows and its use evolves is a part of the process. 459. How do you normally define security? How is your definition of security similar to or different from the definition of database security in this chapter? Answer: The chapter defines security as all activities and measures to ensure the confidentiality, integrity, and availability of data. It is a comprehensive, company-wide approach. 460. What are the levels of data confidentiality? Answer: The levels of data confidentiality are highly restricted, confidential, and unrestricted. 461. What are security vulnerabilities? What is a security threat? Give some examples of security vulnerabilities that exist in different IS components. Answer: A security vulnerability is a weakness in a system component that could be exploited to allow unauthorized access or cause service disruptions. A security vulnerability that is left unfixed is a security threat. Examples include poor user passwords, the copying of data to unauthorized devices, and SQL injection attacks. 462. Define the concept of a data dictionary and discuss the different types of data dictionaries. If you were to manage an organization’s entire data set, what characteristics would you look for in the data dictionary? Answer: A data dictionary is the DBMS component that stores data about the definition of data characteristics and their relationships. It is the location in which metadata is stored. Data dictionaries can be integrated or standalone. Integrated data dictionaries are stored in the database, while standalone data dictionaries are stored outside the database. Relational databases use integrated data dictionaries. Data dictionaries can also be passive or active. An active data dictionary is updated automatically by the DBMS as the metadata changes. Passive data dictionaries must be manually updated, typically through a batch process. 463. Using SQL statements, give some examples of how you would use the data dictionary to monitor the security of the database.

NOTE If you use IBM’s DB2, the names of the main tables are SYSTABLES, SYSCOLUMNS, and SYSTABAUTH. Answer: List the names of all users who have some type of authority over the INVENTORY table: SELECT DISTINCT GRANTEE

397

FROM SYSTABAUTH WHERE TTNAME = ‘INVENTORY’; List the user and table names for all users who can alter the database structure for any table in the database: SELECT GRANTEE, TTNAME FROM SYSTABAUTH WHERE ALTERAUTH = ‘Y’ ORDER BY GRANTEE, TTNAME; 464. What characteristics do a CASE tool and a DBMS have in common? How can these characteristics be used to enhance the data administration? Answer: CASE tools and DBMS products both make extensive use of data repositories. CASE tools maintain a data dictionary of the objects created by the system designer. Many CASE tools can integrate with a database to maintain this repository in the database itself. The CASE tool can be used to design the database structure, making it easy for the DBA and application designers to collaborate on naming conventions, duplication of data elements, and validation rules. 465. Briefly explain the concepts of information engineering (IE) and information systems architecture (ISA). How do these concepts affect the data administration strategy? Answer: Information engineering (IE) is a top-down approach that translates the company’s strategic goals into data and applications to achieve those goals. IE takes the perspective that the data used by the organization rarely changes, even if the processes to use it change frequently. By taking a data-centric approach, the impact of changes in systems is minimized. An information systems architecture (ISA) is the resulting blueprint for data and applications that results from applying an IE approach. These concepts provide a consistent basis for decision making that is rooted in achievement of organizational strategies. 466. Identify and explain some of the critical success factors in the development and implementation of a good data administration strategy. Answer: Critical success factors for the development and implementation of a good data administration strategy include management commitment, thorough analysis of the company’s current situation, end-user involvement, defined standards, training, and the implementation of a small pilot project. Top-level management must set an example and be champions to drive the strategy. The current data situation of the company must be analyzed and a clear vision for the use of data in the organization must be articulated. End-user buy-in is critical and can only be achieved if the end-users are involved in the process.

398

467. How have cloud-based data services affected the DBA’s role.? Answer: The use of cloud-based data services reduces the DBA’s role in infrastructure management. However, the managerial aspects of the DBA role are either largely unchanged or augmented with the coordination, valuation, and evaluation of cloud services. The technical aspects of the DBA’s role may shift to an even greater emphasis on monitoring and controlling the database to ensure security and data integrity. 468. What is the tool used by Oracle to create users? Answer: The Oracle Enterprise Manager simplifies the creation of users. Users can be created using SQL commands, but the OEM helps to automate that task. 469. In Oracle, what is a tablespace? Answer: A tablespace is a logical storage space. Tablespaces are primarily used to logically group related data. Tablespace data are physically stored in one or more datafiles. 470. In Oracle, what is a database role? Answer: A database role is a named collection of database access privileges that authorize a user to perform specified actions on the database. Examples of roles are CONNECT, RESOURCE, and DBA. 471. In Oracle, what is a datafile? How does it differ from a file systems file? Answer: A database is composed of one or more tablespaces. Therefore, there is a 1:M relationship between the database and its tablespaces. Tablespace data are physically stored in one or more datafiles. Therefore, there is a 1:M relationship between tablespaces and datafiles. A datafile physically stores the database data. Each datafile is associated with one and only one tablespace. (But each datafile can reside in a different directory on the same hard disk—or even on different disks.) In contrast to the datafile, a file system’s file is created to store data about a single entity, and the programmer can directly access the file. But file access requires the end user to know the structure of the data that are stored in the file. While a database is stored as a file, this file is created by the DBMS, rather than by the end user. Because the DBMS handles all file operations, the end user does not know—nor does that end user need to know—the database’s file structure. When the DBA creates a database—or, more accurately, uses the Oracle Storage Manager to let Oracle create a database—Oracle automatically creates the necessary tablespaces and datafiles. 472. In Oracle, what is a database profile? Answer: A profile is a named collection of database settings that control how much of the database resource can be used by a given user.

399

ANSWERS TO REVIEW QUESTIONS 473. Discuss the importance of data models. Answer: A data model is a relatively simple representation, usually graphical, of a more complex real-world object event. The data model’s main function is to help us understand the complexities of the real-world environment. The database designer uses data models to facilitate the interaction among designers, application programmers, and end users. In short, a good data model is a communications device that helps eliminate (or at least substantially reduce) discrepancies between the database design’s components and the real-world data environment. The development of data models, bolstered by powerful database design tools, has made it possible to substantially diminish the database design error potential. (Review Sections 2-1 and 2-2 in detail.) 474. What is a business rule, and what is its purpose in data modeling? Answer: A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization’s environment. In a sense, business rules are misnamed: they apply to any organization—a business, a government unit, a religious group, or a research laboratory; large or small—that stores and uses data to generate information. Business rules are derived from a description of operations. As its name implies, a description of operations is a detailed narrative that describes the operational environment of an organization. Such a description requires great precision and detail. If the description of operations is incorrect or incomplete, the business rules derived from it will not reflect the real-world data environment accurately, thus leading to poorly defined data models, which lead to poor database designs. In turn, poor database designs lead to poor applications, thus setting the stage for poor decision making—which may ultimately lead to the demise of the organization. Note especially that business rules help to create and enforce actions within that organization’s environment. Business rules must be rendered in writing and updated to reflect any change in the organization’s operational environment.

400

Properly written business rules are used to define entities, attributes, relationships, and constraints. Because these components form the basis for a database design, the careful derivation and definition of business rules is crucial to good database design. 475. How do you translate business rules into data model components? Answer: As a general rule, a noun in a business rule will translate into an entity in the model, and a verb (active or passive) associating nouns will translate into a relationship among the entities. For example, the business rule “a customer may generate many invoices” contains two nouns (customer and invoice) and a verb (“generate”) that associates them. 476. Describe the basic features of the relational data model and discuss their importance to the end user and the designer. Answer: A relational database is a single data repository that provides both structural and data independence while maintaining conceptual simplicity. The relational database model is perceived by the user to be a collection of tables in which data are stored. Each table resembles a matrix composed of row and columns. Tables are related to each other by sharing a common value in one of their columns. The relational model represents a breakthrough for users and designers because it lets them operate in a simpler conceptual environment. End users find it easier to visualize their data as a collection of data organized as a matrix. Designers find it easier to deal with conceptual data representation, freeing them from the complexities associated with physical data representation. 477. Explain how the entity relationship (ER) model helped produce a more structured relational database design environment. Answer: An entity relationship model, also known as an ERM, helps identify the database’s main entities and their relationships. Because the ERM components are graphically represented, their role is more easily understood. Using the ER diagram, it’s easy to map the ERM to the relational database model’s tables and attributes. This mapping process uses a series of well-defined steps to generate all the required database structures. (This structures mapping approach is augmented by a process known as normalization, which is covered in detail in Chapter 6 “Normalization of Database Tables.”) 478. Consider the scenario described by the statement “A customer can make many payments, but each payment is made by only one customer.” Use this scenario as the basis for an entity relationship diagram (ERD) representation. Answer: This scenario yields two entities: CUSTOMER and PAYMENT. The ERDs shown in Figure Q2.6 uses the Chen and Crow’s Foot notation as shown in Figure 2.3 in the book.

401

Figure Q2.6 The Chen and Crow’s Foot ERDs for Question 6 Chen model 1 CUSTOMER

M makes

PAYMENT

Crow’s Foot model

CUSTOMER

makes

PAYMENT

NOTE Remind your students again that we have not (yet) illustrated other constructs like cardinality and participation on the ERD’s presentation. Their treatment are covered in detail in Chapter 4, “Entity Relationship (ER) Modeling.” 479. Why is an object said to have greater semantic content than an entity? Answer: An object has greater semantic content because it embodies both data and behavior. That is, the object contains, in addition to data, also the description of the operations that may be performed by the object. 480. What is the difference between an object and a class in the object-oriented data model (OODM)? Answer: An object is an instance of a specific class. It is useful to point out that the object is a run-time concept, while the class is a more static description. Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods). Therefore, a class resembles an entity set. However, a class also includes a set of procedures known as methods. 481. How would you model Question 6 with an OODM? (Use Figure 2.4 as your guide.) Answer: The OODM that corresponds to question 6’s ERD is shown in Figure Q2.9:

402

Figure Q2.9 The OODM Model for Question 9

CUSTOMER M PAYMENT

482. What is an ERDM, and what role does it play in the modern (production) database environment? Answer: The extended relational data model (ERDM) is the relational data model’s response to the object-oriented data model (OODM), which adds object extensions to DBMSs based on the relational model. Most current RDBMSes support at least a few of the ERDM’s extensions. For example, support for complex data types such as large binary objects (BLOBs) is now common. In modern production database environments, most DBMSs are deeply rooted in the relational model; however, they tend to offer, and organizations occasionally use, some object extensions. 483. What is a relationship, and what three types of relationships exist? Answer: A relationship is an association among (two or more) entities. Three types of relationships exist: one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N or M:M.) Note: We will learn in Chapter 4 that a relationship can also exist among rows of one (the same) entity. 484. Give an example of each of the three types of relationships. Answer: 1:1 An academic department is chaired by one professor; a professor may chair only one academic department. 1:M A customer may generate many invoices; each invoice is generated by one customer. M:N An employee may have earned many degrees; a degree may have been earned by many employees. 485. What is a table, and what role does it play in the relational model? Answer: Strictly speaking, the relational data model bases data storage on relations. These relations are based on algebraic set theory. However, the user perceives the relations to be tables. In the relational database environment, designers and users perceive a table to be a matrix consisting of a series of row/column intersections.Tables, also called relations, are related to each other by sharing a common entity characteristic. For example, an INVOICE table would contain a customer number that points to that same number in the CUSTOMER table. This feature enables the RDBMS to link invoices to the customers who generated them.

403

Tables are especially useful from the modeling and implementation perspectives. Because tables are used to describe the entities they represent, they provide an easy way to summarize entity characteristics and relationships among entities. And, because they are purely conceptual constructs, the designer does not need to be concerned about the physical implementation aspects of the database design. 486. What is a relational diagram? Give an example. Answer: A relational diagram is a visual representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities. Therefore, it is easy to see what the entities represent and to see what types of relationships (1:1, 1:M, M:N) exist among the entities and how those relationships are implemented. An example of a relational diagram is found in the text’s Figure 2.2. MS Access, Database Tools, “Relationships” option on the main Access menu could be used to illustrate simple relational diagrams. 487. What is connectivity? (Use a Crow’s Foot ERD to illustrate connectivity.) Answer: Connectivity is the relational term to describe the types of relationships (1:1, 1:M, M:N).

In the figure, the business rule that an advisor can advise many students and a student has only one assigned advisor is shown within a relationship with a connectivity of 1:M. The business rule that a student can register only one vehicle to park on campus and a vehicle can be registered by only one student is shown with a relationship with a connectivity of 1:1. Finally, the rule that a student can register for many classes, and a class can be registered for by many students, is shown by the relationship with a connectivity of M:N. 488. Describe the Big Data phenomenon. Answer: Over the last few years, a new wave of data has “emerged” to the limelight. Such data have always existed but did not receive the attention that it is receiving today. These data are characterized for being high volume (petabyte size and beyond), high frequency (data are generated almost constantly), and mostly semi-structured. These data come from multiple and varied sources such as website logs, website posts in social sites, and machinegenerated information (GPS, sensors, etc.). Such data have been accumulated over the years and companies are now awakening to the fact that it contains a lot of hidden

404

information that could help the day-to-day business (such as browsing patterns, purchasing preferences, and behavior patterns). The need to manage and leverage this data has triggered a phenomenon labeled “Big Data.” Big Data refers to a movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while, at the same time, providing high performance and scalability at a reasonable cost. 489. What does the term 3 Vs refer to? Answer: The term 3 Vs refers to the 3 basic characteristics of Big Data databases, they are: 



The 3 Vs framework illustrates what companies now know, that the amount of data being collected in their databases has been growing exponentially in size and complexity. Traditional relational databases are good at managing structured data but are not well suited to managing and processing the amounts and types of data being collected in today’s business environment. 490. What is Hadoop, and what are its basic components? Answer: In order to create value from their previously unused Big Data stores, companies are using new Big Data technologies. These emerging technologies allow organizations to process massive data stores of multiple formats in cost-effective ways. Some of the most frequently used Big Data technologies are Hadoop and MapReduce. 

Hadoop is a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. Hadoop originated from Google’s work on distributed file systems and parallel processing and is currently supported by the Apache Software Foundation.2 Hadoop has several modules, but the two main components are Hadoop Distributed File System (HDFS) and MapReduce.

405



491. What are the basic characteristics of a NoSQL database? Answer: Every time you search for a product on Amazon, send messages to friends in Facebook, watch a video in YouTube, or search for directions in Google Maps, you are using a NoSQL database. NoSQL refers to a new generation of databases that address the very specific challenges of the “big data” era and have the following general characteristics: 

Not based on the relational model.



Support distributed database architectures.



Provide high scalability, high availability, and fault tolerance.



Support very large amounts of sparse data.



Geared toward performance rather than transaction consistency.

492. Using the example of a medical clinic with patients and tests, provide a simple representation of how to model this example using the relational model. Answer: As you can see in Figure Q2.20, the relational model stores data in a tabular format in which each row represents a “record” for a given patient. In this case, each patient can have many tests and each test refers to only one patient. As you can see the TestData table contains the PAT_NUM foreign key to point to the PatientData table.

406

493. What is logical independence? Answer: Logical independence exists when you can change the internal model without affecting the conceptual model. When you discuss logical and other types of independence, it’s worthwhile to discuss and review some basic modeling concepts and terminology: 



The terms data model and database model are often used interchangeably. In the text, the term database model is used to refer to the implementation of a data model in a specific database system.



An internal schema depicts a specific representation of an internal model, using the database constructs supported by the chosen database.

407



The external model is the end users’ view of the data environment.

494. What is physical independence? Answer: You have physical independence when you can change the physical model without affecting the internal model. Therefore, a change in storage devices or methods and even a change in operating system will not affect the internal model. The terms physical model and internal model may require a bit of additional discussion: 



408

ANSWERS TO PROBLEMS Use the contents of Figure 2.1 to work Problems 1–3. Write the business rule(s) that govern the relationship between AGENT and CUSTOMER. Answer: Given the data in the two tables, you can see that an AGENT—through AGENT_CODE—can occur many times in the CUSTOMER table. But each customer has only one agent. Therefore, the business rules may be written as follows: One agent can have many customers. Each customer has only one agent. Given these business rules, you can conclude that there is a 1:M relationship between AGENT and CUSTOMER. 495. Given the business rule(s) you wrote in Problem 1, create the basic Crow’s Foot ERD. Answer: The Crow’s Foot ERD is shown in Figure P2.2a.

Figure P2.2a The Crow’s Foot ERD for Problem 3 serves

AGENT

CUSTOMER

Figure P2.2b The Chen ERD for Problem 2 Chen model 1 AGENT

M serves

CUSTOMER

496. Using the ERD you drew in Problem 2, create the equivalent object representation and UML class diagram. (Use Figure 2.4 as your guide.) Answer: The OO model is shown in Figure P2.3a., and the UML class diagram is shown in Figure P2.3b.

409

Figure P2.3a The OO Model for Problem 3 AGENT M CUSTOMER

Figure P2.3b The UML Model for Problem 3

Using Figure P2.4 as your guide, work Problems 4 and 5. The DealCo relational diagram shows the initial entities and attributes for the DealCo stores, which are located in two regions of the country.

Figure P2.4 The DealCo relational diagram 497. Identify each relationship type and write all of the business rules. Answer: One region can be the location for many stores. Each store is located in only one region. Therefore, the relationship between REGION and STORE is 1:M. Each store employs one or more employees. Each employee is employed by one store. (In this case, we are assuming that the business rule specifies that an employee cannot work in more than one store at a time.) Therefore, the relationship between STORE and EMPLOYEE is 1:M. A job—such as accountant or sales representative—can be assigned to many employees. (For example, one would reasonably assume that a store can have more than one sales

410

representative. Therefore, the job title “Sales Representative” can be assigned to more than one employee at a time.) Each employee can have only one job assignment. (In this case, we are assuming that the business rule specifies that an employee cannot have more than one job assignment at a time.) Therefore, the relationship between JOB and EMPLOYEE is 1:M. 498. Create the basic Crow’s Foot ERD for DealCo. Answer: The Crow’s Foot ERD is shown in Figure P2.5a.

Figure P2.5a The Crow’s Foot ERD for DealCo is location for

REGION

STORE

employs

is assigned to

JOB

EMPLOYEE

The Chen model is shown in Figure P2.5b. (Note that you always read the relationship from the “1” to the “M” side.)

Figure P2.5b The Chen ERD for DealCo M

1 is location for

REGION

STORE 1

employs

1 JOB

is assigned to

M EMPLOYEE

Using Figure P2.6 as your guide, work Problems 6−8. The Tiny College relational diagram shows the initial entities and attributes for the college.

411

Figure P2.6 The Tiny College relational diagram 499. Identify each relationship type and write all of the business rules. Answer: The simplest way to illustrate the relationship among ENROLL, CLASS, and STUDENT is to discuss the data shown in Table P2.6. As you examine the Table P2.6 contents and compare the attributes to relational schema shown in Figure P2.6, note these features: 

We have added an attribute, ENROLL_SEMESTER, to identify the enrollment period.



Student 11324 is enrolled in two classes; student 11892 is enrolled in three classes, and student 10345 is enrolled in one class.

Table P2.6 Sample Contents of an ENROLL Table STU_NUM

CLASS_CODE

ENROLL_SEMESTER

ENROLL_GRADE

11324

MATH345-04

SPRING-14

11324

ENG322-11

SPRING-14

11892

CHEM218-05

SPRING-14

11892

ENG322-11

SPRING-14

11892

CIS431-01

SPRING-14

10345

ENG322-07

SPRING-14

All of the relationships are 1:M. The relationships may be written as follows: COURSE generates CLASS. One course can generate many classes. Each class is generated by one course.

412

CLASS is referenced in ENROLL. One class can be referenced in enrollment many times. Each individual enrollment references one class. Note that the ENROLL entity is also related to STUDENT. Each entry in the ENROLL entity references one student and the class for which that student has enrolled. A student cannot enroll in the same class more than once. If a student enrolls in four classes, that student will appear in the ENROLL entity four times, each time for a different class. STUDENT is shown in ENROLL. One student can be shown in enrollment many times. (In database design terms, “many” simply means “more than once.”) Each individual enrollment entry shows one student. 500. Create the basic Crow’s Foot ERD for Tiny College. Answer: The Crow’s Foot model is shown in Figure P2.7a.

Figure P2.7a The Crow’s Foot Model for Tiny College generates

COURSE

CLASS

is referenced in

is shown in

STUDENT

ENROLL

The Chen model is shown in Figure P2.7b.

Figure P2.7b The Chen Model for Tiny College M

1 generates

COURSE

CLASS 1

is referenced in

1 STUDENT

is shown in

M ENROLL

501. Create the UML class diagram that reflects the entities and relationships you identified in the relational diagram.

413

Answer: The OO model is shown in Figure P2.8a, and the UML class diagram is shown in Figure P2.8b.

Figure P2.8a The OO Model for Tiny College COURSE

STUDENT

ENROLL

CRS_CODE

CRS_DESCRIPTION C CRS_CREDIT

ENROLL_SEMESTER C

STU_NUM

ENROLL_GRADE

CLASSES: M

CLASSES: M CLASS

CLASS

CLASS C

CLASS_CODE

STU_LNAME

CLASS_DAYS

STU_FNAME

CLASS_TIME

STU_INITIAL

CLASS_ROOM

STU_DOB

COURSES:

COURSE

ENROLLMENT:

STUDENTS: M STUDENT

ENROLL ENROLLMENT:

Note: C = Character D = Date N = Numeric

ENROLL

Figure P2.8b The UML Model for Tiny College

10. Typically, a hospital patient receives medications that have been ordered by a particular doctor. Because the patient often receives several medications per day, there is a 1:M relationship between PATIENT and ORDER. Similarly, each order can include several medications, creating a 1:M relationship between ORDER and MEDICATION. Answer: c. Identify the business rules for PATIENT, ORDER, and MEDICATION. The business rules reflected in the PATIENT description are: A patient can have many (medical) orders written for him or her. Each (medical) order is written for a single patient. The business rules reflected in the ORDER description are: Each (medical) order can prescribe many medications. Each medication can be prescribed in many orders.

414

The business rules reflected in the MEDICATION description are: Each medication can be prescribed in many orders. Each (medical) order can prescribe many medications. d. Create a Crow’s Foot ERD that depicts a relational database model to capture these business rules.

Figure P2.9 Crow’s foot ERD for Problem 9

415

FIGURE P2.10a The UBA Database Tables

As you discuss the UBA database contents, note in particular the following business rules that are reflected in the tables and their contents: 

A painter can paint may paintings.



Each painting is painted by only one painter.



A gallery can exhibit many paintings.



Each painting is exhibited in only one gallery.

416

c. How might the (independent) tables be related to one another? Figure P2.10b shows the relationships.

FIGURE P2.10b The UBA Relational Model

502. Using the ERD from Problem 10, create the relational schema. (Create an appropriate collection of attributes for each of the entities. Make sure you use the appropriate naming conventions to name the attributes.) Answer: The relational diagram is shown in Figure P2.11.

FIGURE P2.11 The Relational Diagram for Problem 11

503. Convert the ERD from Problem 10 into a corresponding UML class diagram. Answer: The basic UML solution is shown in Figure P2.12.

FIGURE P2.12 The UML for Problem 12

504. Describe the relationships (identify the business rules) depicted in the Crow’s Foot ERD shown in Figure P2.13.

417

Figure P2.13 The Crow’s Foot ERD for Problem 13 Answer: The business rules may be written as follows: 

A professor can teach many classes.



Each class is taught by one professor.



A professor can advise many students.



Each student is advised by one professor.

505. Create a Crow’s Foot ERD to include the following business rules for the ProdCo company: Answer: g. Each sales representative writes many invoices. h. Each invoice is written by one sales representative. i.

Each sales representative is assigned to one department.

Each department has many sales representatives.

k. Each customer can generate many invoices. l.

Each invoice is generated by one customer.

418

Figure P2.14 Crow’s Foot ERD for the ProdCo Company

506. Write the business rules that are reflected in the ERD shown in Figure P2.15. (Note that the ERD reflects some simplifying assumptions. For example, each book is written by only one author. Also, remember that the ERD is always read from the “1” to the “M” side, regardless of the orientation of the ERD components.)

FIGURE P2.15 The Crow’s Foot ERD for Problem 15

Answer: The relationships are best described through a set of business rules: 

One publisher can publish many books.



Each book is published by one publisher.



A publisher can submit many (book) contracts.



Each (book) contract is submitted by one publisher.



One author can sign many contracts.



Each contract is signed by one author.



One author can write many books.

419



Each book is written by one author.

This ERD will be a good basis for a discussion about what happens when more realistic assumptions are made. For example, a book—such as this one—may be written by more than one author. Therefore, a contract may be signed by more than one author. Your students will learn how to model such relationships after they have become familiar with the material in Chapter 3. 507. Create a Crow’s Foot ERD for each of the following descriptions. (Note that the word many merely means more than one in the database modeling environment.) Answer: b. Each of the MegaCo Corporation’s divisions is composed of many departments. Each department has many employees assigned to it, but each employee works for only one department. Each department is managed by one employee, and each of those managers can manage only one department at a time. The Crow’s Foot ERD is shown in Figure P2.16a.

FIGURE P2.16a The MegaCo Crow’s Foot ERD

As you discuss the contents of Figure P2.16a, note the 1:1 relationship between the EMPLOYEE and the DEPARTMENT in the “manages” relationship and the 1:M relationship between the DEPARTMENT and the EMPLOYEE in the “is assigned to” relationship. c. During some period of time, a customer can download many ebooks from BooksOnline. Each of the ebooks can be downloaded by many customers during that period of time. The solution is presented in Figure P2.16b. Note the M:N relationship between CUSTOMER and EBOOK. Such a relationship is not implementable in a relational model.

420

If you want to let the students convert Figure P2.16b’s ERD into an implementable ERD, add a third DOWNLOAD entity to create a 1:M relationship between CUSTOMER and DOWNLOAD and a 1:M relationship between EBOOK and DOWNLOAD. (Note that such a conversion has been shown in the next problem solution.) d. An airliner can be assigned to fly many flights, but each flight is flown by only one airliner. Originally, the student may think that there is a 1:M relationship between AIRCRAFT and FLIGHT. And probably based on the business rule, this would be correct. The teacher could use this opportunity to expand into “real-world” situations and discuss how business rules should be properly defined with a time dimension in mind. Make the students think of a FLIGHT as having a flight number, a date, and an aircraft (among other attributes such as from and to destinations) It is common practice in the airline industry to replace an AIRCRAFT for various reasons (schedule maintenance, engine checkups, engine problems, etc.). So, in practice the same FLIGHT can be performed by a different AIRCRAFT. In this case, you can say that “over a period of time,” a flight can be flown by many aircraft (but only one at a time) and an aircraft can fly many flights. See the ERDs, sample tables, and relational diagram below.

FIGURE P2.16c The Airline Crow’s Foot ERD

Initial M:N Solution AIRCRAFT

flies

FLIGHT

Implementable Solution AIRCRAFT

is assigned to

ASSIGNMENT

shows in

FLIGHT

FIGURE P2.16c The Airline Database Tables

421

FIGURE P2.16c The Airline Relational Diagram

e. The KwikTite Corporation operates many factories. Each factory is located in a region, and each region can be “home” to many of KwikTite’s factories. Each factory has many employees, but each employee is employed by only one factory. The solution is shown in Figure P2.16d.

422

contains

REGION

employs

FACTORY

An employee may have earned many degrees, and each degree may have been earned by many employees.

The solution is shown in Figure P2.16e.

FIGURE P2.16e The Earned Degree Crow’s Foot ERD

EMPLOYEE

earns

DEGREE

Note that this M:N relationship must be broken up into two 1:M relationships before it can be implemented in a relational database. Use the airline ERD’s decomposition in Figure P2.16c as the focal point in your discussion. 508. Write the business rules that are reflected in the ERD shown in Figure P2.17. Answer: A theater shows many movies. A movie can be shown in many theaters. A movie can receive many reviews. Each review is for a single movie. A reviewer can write many reviews. Each review is written by a single reviewer. Note that the M:N relationship between theater and movie must be broken into two 1:M relationships using a bridge table before it can be implemented in a relational database.

423

FIGURE P2.17 The Crow’s Foot ERD for Problem 17

ANSWERS TO REVIEW QUESTIONS ONLINE CONTENT The website (www.cengage.com) includes MS Access databases and SQL script files (Oracle, SQL Server, and MySQL) for all of the datasets used throughout the book. 509. What is the difference between a database and a table? Answer: A table, a logical structure that represents an entity set, is only one of the components of a database. A table stores the end-user data. The database is a structure that houses one or more tables and metadata. The metadata are data about data. Metadata include the data (attribute) characteristics and the relationships between the entity sets. 510. What does it mean to say that a database displays both entity integrity and referential integrity?

424

425

511. Why are entity integrity and referential integrity important in a database? Answer: Entity integrity and referential integrity are important because they are the basis for expressing and implementing relationships in the entity-relationship model. Entity integrity ensures that each row is uniquely identified by the primary key. Therefore, entity integrity means that a proper search for an existing tuple (row) will always be successful. (And the failure to find a match on a row search will always mean that the row for which the search is conducted does not exist in that table.) Referential integrity means that, if the foreign key contains a value, that value refers to an existing valid tuple (row) in another relation. Therefore, referential integrity ensures that it will be impossible to assign a non-existing foreign key value to a table. 512. What are the requirements that two relations must satisfy to be considered union-compatible? Answer: In order for two relations to be union-compatible, both must have the same number of attributes (columns) and corresponding attributes (columns) must have the same domain. The first requirement is easily identified by a cursory glance at the relations’ structures. If the first relation has 3 attributes then the second relation must also have 3 attributes. If the first table has 10 attributes, then the second relation must also have 10 attributes. The second requirement is more difficult to assess and requires understanding the meanings of the attributes in the business environment. Recall that an attribute’s domain is the set of allowable values for that attribute. To satisfy the second requirement for union-compatibility, the first attribute of the first relation must have the same domain as the first attribute of the second relation. The second attribute of the first relation must have the same domain as the second attribute of the second relation. The third attribute of the first relation must have the same domain as the third attribute of the second relation, and so on. NOTE: the professor may further explain that you could apply the UNION operator to two relations with different number of attributes by using the PROJECT operator to project only the common attributes, assuming those attributes share common domains. Remember that for the relational model, the result of a relational set operation is another relation (table). 513. Which relational algebra operators can be applied to a pair of tables that are not unioncompatible? Answer: The Product, Join, and Divide operators can be applied to a pair of tables that are not union-compatible. Divide does place specific requirements on the tables to be operated on; however, those requirements do not include union-compatibility. Select (or Restrict) and Project are performed on individual tables, not pairs of tables. (Note that if two tables are joined, then the result is a single table and the Select or Project operator is performed on that single table.) 514. Explain why the data dictionary is sometimes called “the database designer’s database.” Answer: Just as the database stores data that is of interest to the users regarding the objects in their environment that are important to them, the data dictionary stores data that is of interest to the database designer about the important decisions that were made in regard to the database structure. The data dictionary contains the number of tables that were created, the names of all of those tables, the attributes in each table, the relationships between the tables, the data type of each attribute, the enforced domains of the attributes, and so on. All of these data represent decisions that the database designer had to make and data that the database designer needs to record about the database.

426

515. A database user manually notes that “The file contains two hundred records, each record containing nine fields.” Use appropriate relational database terminology to “translate” that statement. Answer: Using the proper relational terminology, the statement may be translated to “the table—or relation—contains two hundred rows—or, if you like, two hundred tuples, or entities. Each of these rows contains nine attributes.” Use Figure Q3.8 to answer Questions 8–12. 516. Using the STUDENT and PROFESSOR tables, illustrate the difference between a natural join, an equijoin, and an outer join. Answer:

FIGURE Q3.8 The Ch03_CollegeQue Database Tables

STU_CODE

PROF_CODE

DEPT_CODE

128569

512272

531235

553427

427

STU_CODE

STUDENT. PROF_CODE

PROFESSOR. PROF_CODE

DEPT_CODE

128569

512272

531235

553427

STU_CODE

STUDENT. PROF_CODE

PROFESSOR.P ROF_CODE

DEPT_CODE

128569

512272

531235

553427

100278 531268

A left outer join of STUDENT to PROFESSOR would include the matched rows plus the unmatched STUDENT rows:

428

STU_CODE

STUDENT. PROF_CODE

PROFESSOR. PROF_CODE

DEPT_CODE

128569

512272

531235

553427

100278 531268 A right outer join of STUDENT to PROFESSOR would include the matched rows plus the unmatched PROFESSOR row.

STU_CODE

STUDENT. PROF_CODE

PROFESSOR.P ROF_CODE

DEPT_CODE

128569

512272

531235

553427

517. Create the table that would result from πstu_code(student). Answer:

STU_CODE 128569 512272 531235 553427 100278 531268

429

518. Create the table that would result from πstu_code, dept_code(student ⋈ professor). Answer:

STU_CODE

DEPT_CODE

128569

512272

531235

553427

519. Create the basic ERD for the database shown in Figure Q3.8. Answer: Both the Chen and Crow’s Foot solutions are shown in Figure Q3.11.

FIGURE Q3.11 The Chen and Crow’s Foot ERD Solutions for Question 11 Chen ERD (generated with PowerPoint) 1 PROFESSOR

M advises

STUDENT

Crow’s Foot ERD (generated with PowerPoint)

PROFESSOR

advises

STUDENT

Chen ERD (generated with Visio Professional)

NOTE From this point forward, we will show the ERDs in Crow’s Foot format unless the problem specifies a different format.

430

520. Create the relational diagram for the database shown in Figure Q3.8. Answer: The relational diagram, generated in the Microsoft Access Ch03_CollegeQue database, is shown in Figure Q3.11.

FIGURE Q3.11 The Relational Diagram

Use Figure Q3.13 to answer Questions 13–17.

FIGURE Q3.13 The Ch03_VendingCo Database Tables

521. Write the relational algebra formula to apply a UNION relational operator to the tables shown in Figure Q3.13. Answer: The question does not specify the order in which the table should be used in the operation. Therefore, both of the following are correct. BOOTH ⋃ MACHINE MACHINE ⋃ BOOTH You can use this as an opportunity to emphasize that the order of the tables in a UNION command do not change the contents of the data returned.

431

522. Create the table that results from applying a UNION relational operator to the tables shown in Figure Q3.13 Answer:

BOOTH_PRODUCT

BOOTH_PRICE

Chips

1.5

Cola

1.25

Energy Drink Chips Chocolate Bar

2 1.25 1

Note that when the attribute names are different, the result will take the attribute names from the first relation. In this case, the solution assumes the operation was BOOTH UNION MACHINE. If the operation had been MACHINE UNION BOOTH then the attribute names from the MACHINE table would have appeared as the attribute names in the result. Also, notice that the “Chips” from both tables appears in the result, but the “Energy Drink” from both does not. A UNION operator will eliminate duplicate rows from the result; however, the entire row must match for two rows to be considered duplicates. In the case of “Chips”, the product names were the same but the prices were different. In the case of “Energy Drink”, both the product names and the prices matched so the second Energy Drink row was dropped from the result. 523. Write the relational algebra formula to apply an INTERSECT relational operator to the tables shown in Figure Q3.13. Answer: The question does not specify the order in which the table should be used in the operation. Therefore, both of the following are correct. BOOTH ⋂ MACHINE MACHINE ⋂ BOOTH 524. Create the table that results from applying an INTERSECT relational operator to the tables shown in Figure Q3.13. Answer:

BOOTH_PRODUCT Energy Drink

BOOTH_PRICE 2

432

525. Using the tables in Figure Q3.13, create the table that results from MACHINE DIFFERENCE BOOTH. Answer:

MACHINE_PRODUCT

MACHINE_PRICE

Chips

1.25

Chocolate Bar

Note that the order in which the relations are specified is significant in the results returned. The DIFFERENCE operator returns the rows from the first relation that are not duplicated in the second relation. Just as with the INTERSECT operator, the entire row must match an existing row to be considered a duplicate. Use Figure Q3.18 to answer Question 18. 526. Suppose you have the ERD shown in Figure Q3.18. How would you convert this model into an ERM that displays only 1:M relationships? (Make sure you create the revised ERD.) Answer:

FIGURE Q3.18 The Crow’s Foot ERD for DRIVER and TRUCK

The Crow’s Foot solution is shown in Figure Q3.18sol. Note that the original M:N relationship has been decomposed into two 1:M relationships based on these business rules: 

A driver may receive many (driving) assignments.



Each (driving) assignment is made for a single driver.



A truck may be driven in many (driving) assignments.



Each (driving) assignment is made for a single truck.

433

FIGURE Q3.18sol The Crow’s Foot ERM Solution for Question 18

527. What are homonyms and synonyms, and why should they be avoided in database design? Answer: Homonyms appear when more than one attribute has the same name. Synonyms exist when the same attribute has more than one name. Avoid both to avoid inconsistencies. For example, suppose we check the database for a specific attribute such as NAME. If NAME refers to customer names as well as to sales rep names, a clear case of a homonym, we have created an ambiguity, because it is no longer clear which entity the NAME belongs to. Synonyms make it difficult to keep track of foreign keys if they are named differently from the primary keys they point to. Using REP_NUM as the foreign key in the CUSTOMER table to reference the primary key REP_NUM in the SALESREP table is much clearer than naming the CUSTOMER table’s foreign key SLSREP. The proliferation of different attribute names to describe the same attributes will also make the data dictionary more cumbersome to use. Some data RDBMSs let the data dictionary check for homonyms and synonyms to alert the user to their existence, thus making their use less likely. For example, if a CUSTOMER table contains the (foreign) key REP_NUM, the entry of the attribute REP_NUM in the SALESREP table will either cause it to inherit all the characteristics of the original REP_NUM, or it will reject the use of this attribute name when different characteristics are declared by the user. 528. How would you implement a l:M relationship in a database composed of two tables? Give an example. Answer: Let’s suppose that an auto repair business wants to track its operations by customer. At the most basic level, it’s reasonable to assume that any database design you produce will include at least a car entity and a customer entity. Further suppose that it is reasonable to assume that: 

A car is owned just by one customer.



A customer can own more than one car.

434

FIGURE Q3.20 The CUSTOMER owns CAR ERM

Use Figure Q3.21 to answer Question 21. 529. Identify and describe the components of the table shown in Figure Q3.21, using correct terminology. Use your knowledge of naming conventions to identify the table’s probable foreign key(s). Answer:

FIGURE Q3.21 The Ch03_NoComp Database EMPLOYEE Table

Figure Q3.21’s database table contains: 

One entity set: EMPLOYEE.



Six attributes: EMP_NUM, EMP_LNAME, EMP_INIT, EMP_FNAME, DEPT_CODE, and JOB_CODE.



Ten entities: The 10 workers shown in rows 1–10.



One primary key: The attribute EMP_NUM because it identifies each row uniquely.



Use the database shown in Figure Q3.22 to answer Questions 22–27.

435

FIGURE Q3.22 The Ch03_Theater Database Tables

530. Identify the primary keys. Answer: DIR_NUM is the DIRECTOR table’s primary key. PLAY_CODE is the PLAY table’s primary key. 531. Identify the foreign keys. Answer: The foreign key is DIR_NUM, located in the PLAY table. Note that the foreign key is located on the “many” side of the relationship between director and play. (Each director can direct many plays ... but each play is directed by only one director.) 532. Create the ERM. Answer: The entity relationship model is shown in Figure Q3.24.

FIGURE Q3.24 The Theater Database ERD

533. Create the relational diagram to show the relationship between DIRECTOR and PLAY. Answer: The relational diagram, shown in Figure 3.21, was generated with the help of Microsoft Access. (Check the Ch03_Theater database.)

436

FIGURE Q3.25 The Relational Diagram

534. Suppose you wanted quick lookup capability to get a listing of all plays directed by a given director. Which table would be the basis for the INDEX table, and what would be the index key? Answer: The PLAY table would be the basis for the appropriate index table. The index key would be the attribute DIR_NUM. 535. What would be the conceptual view of the INDEX table described in Question 26? Depict the contents of the conceptual INDEX table. Answer: The conceptual index table is shown in Figure Q3.27.

FIGURE Q3.27 The Conceptual Index Table Index Key

Pointers to the PLAY Table

100

101

2, 5, 7

102

1, 3, 6

437

ANSWERS TO PROBLEMS Use the database shown in Figure P3.1 to answer Problems 1–9. FIGURE P3.1 The Ch03_StoreCo Database Tables

438

For each table, identify the primary key and the foreign key(s). If a table does not have a foreign key, write None. Answer:

TABLE

PRIMARY KEY

FOREIGN KEY(S)

EMPLOYEE

EMP_CODE

STORE_CODE

STORE

STORE_CODE

REGION_CODE, EMP_CODE

REGION

REGION_CODE

NONE

NOTE: the STORE_CODE foreign key in the EMPLOYEE table represents where the employee works. The EMP_CODE in the STORE table represents who is the store manager. 536. Do the tables exhibit entity integrity? Answer yes or no and then explain your answer. Answer:

TABLE

ENTITY INTEGRITY

EXPLANATION

EMPLOYEE

Yes

Each EMP_CODE value is unique and there are no nulls.

STORE

Yes

Each STORE_CODE value is unique and there are no nulls.

REGION

Yes

Each REGION_CODE value is unique and there are no nulls.

537. Do the tables exhibit referential integrity? Answer yes or no and then explain your answer. Write NA (Not Applicable) if the table does not have a foreign key. Answer:

TABLE

REFERENTIAL INTEGRITY

EXPLANATION

EMPLOYEE

Yes

Each STORE_CODE value in EMPLOYEE points to an existing STORE_CODE value in STORE.

STORE

Yes

Each REGION_CODE value in STORE points to an existing REGION_CODE value in REGION and each EMP_CODE value in STORE points to an existing EMP_CODE value in EMPLOYEE.

439

REGION

440

538.Describe the type(s) of relationship(s) between STORE and REGION. Answer: Because REGION_CODE values occur more than once in STORE, we may conclude that each REGION can contain many stores. But since each STORE is located in only one REGION, the relationship between STORE and REGION is M:1. (It is, of course, equally true that the relationship between REGION and STORE is 1:M.) 539.Create the ERD to show the relationship between STORE and REGION. Answer: The Crow’s Foot ERD is shown in Figure P3.5. Note that each store is located in a single region, but that each region can have many stores located in it. (It’s always a good time to focus a discussion on the role of business rules in the creation of a database design.)

FIGURE P3.5 ERD for the STORE and REGION Relationship

540. Create the relational diagram to show the relationship between STORE and REGION. Answer: The relational diagram is shown in Figure P3.6. Note (again) that the location of the entities is immaterial … the relationships are carried along with the entity. Therefore, it does not matter whether you locate the REGION on the left side or on the right side of the display. But you always read from the “1” side to the “M” side, regardless of the entity location.

FIGURE P3.6 The Relational Diagram for the STORE and REGION Relationship

541. Describe the type(s) of relationship(s) between EMPLOYEE and STORE. (Hint: Each store employs many employees, one of whom manages the store.) Answer: There are TWO relationships between STORE and EMPLOYEE. The first relationship, expressed by STORE employs EMPLOYEE, is a 1:M relationship, because one store can employ many employees and each employee is employed by one store. The second relationship, expressed by EMPLOYEE manages STORE, is a 1:1 relationship, because each store is managed by one employee and an employee manages only one store.

441

NOTE It is useful to introduce several ways in which the manages relationship may be implemented. For example, rather than creating the manages relationship between EMPLOYEE and STORE, it is possible to simply list the manager’s name as an attribute in the STORE table. This approach creates a redundancy that may not do much damage if the information requirements are limited. However, if it is necessary to keep track of each manager’s sales and personnel management performance by store, the manages relationship we have shown here will do a much better job in terms of information generation. Also, you may want to introduce the notion of an optional relationship. After all, not all employees participate in the manages relationship. We will cover optional relationships in detail in Chapter 4, “Entity relationship (ER) Modeling.” 542. Create the ERD to show the relationships among EMPLOYEE, STORE, and REGION. Answer: The Crow’s Foot ERD is shown in Figure P3.8. Remind students that you always read from the “1” side to the “M” side in any 1:M relationship, that is, a STORE employs many EMPLOYEEs and a REGION contains many STORES. In a 1:1 relationship, you always read from the “parent” entity to the related entity. In this case, only one EMPLOYEE manages each STORE … and each STORE is managed by only one EMPLOYEE. Figure P3.8’s ERD includes the properties of the manages relationship. Note that there is no mandatory 1:1 relationship available at this point. That’s why there is an optional relationship—the O symbol—next to the STORE entity to indicate that an employee is not necessarily a manager. Let your students know that such optional relationships will be explored in detail in Chapter 4. (Explain that you can create mandatory 1:1 relationships when you add attributes to the entity boxes and specify a mandatory data entry for those attributes that are involved in the 1:1 relationship.)

FIGURE P3.8 StoreCo Crow’s Foot ERD

442

543. Create the relational diagram to show the relationships among EMPLOYEE, STORE, and REGION. Answer: The relational diagram is shown in Figure P3.9.

FIGURE P3.9 The Relational Diagram

An EMPLOYEE has only one JOB_CODE, but a JOB_CODE can be held by many EMPLOYEEs.



An EMPLOYEE can participate in many PLANs, and any PLAN can be assigned to many EMPLOYEEs.

Note also that the M:N relationship has been broken down into two 1:M relationships for which the BENEFIT table serves as the composite or bridge entity.

443

FIGURE P3.10 The Ch03_BeneCo Database Tables

544. For each table in the database, identify the primary key and the foreign key(s). If a table does not have a foreign key, write None. Answer:

TABLE

PRIMARY KEY

FOREIGN KEY(S)

EMPLOYEE

EMP_CODE

JOB_CODE

BENEFIT

EMP_CODE + PLAN_CODE

EMP_CODE, PLAN_CODE

JOB

JOB-CODE

None

PLAN

PLAN_CODE

None

545. Create the ERD to show the relationship between EMPLOYEE and JOB. Answer: The ERD is shown in Figure P3.11. Note that the JOB_CODE = 1 occurs twice in the EMPLOYEE table, as does the JOB_CODE = 2, thus providing evidence that a JOB can be assigned to many EMPLOYEEs. But each EMPLOYEE has only one JOB_CODE, so there exists a 1:M relationship between JOB and EMPLOYEE.

444

FIGURE P3.11 The ERD for the EMPLOYEE–JOB Relationship

546. Create the relational diagram to show the relationship between EMPLOYEE and JOB. Answer: The relational schema is shown in Figure P3.12.

FIGURE P3.12 The Relational Diagram

547. Do the tables exhibit entity integrity? Answer yes or no and then explain your answer.

TABLE

ENTITY INTEGRITY

EXPLANATION

EMPLOYEE

Yes

Each EMP_CODE value is unique and there are no nulls.

BENEFIT

Yes

Each combination of EMP_CODE and PLAN_CODE values is unique and there are no nulls.

JOB

Yes

Each JOB_CODE value is unique and there are no nulls.

PLAN

Yes

Each PLAN_CODE value is unique and there are no nulls.

445

548. Do the tables exhibit referential integrity? Answer yes or no and then explain your answer. Write NA (Not Applicable) if the table does not have a foreign key. Answer:

TABLE

REFERENTIAL INTEGRITY

EXPLANATION

EMPLOYEE

Yes

Each JOB_CODE value in EMPLOYEE points to an existing JOB_CODE value in JOB.

BENEFIT

Yes

Each EMP_CODE value in BENEFIT points to an existing EMP_CODE value in EMPLOYEE and each PLAN_CODE value in BENEFIT points to an existing PLAN_CODE value in PLAN.

JOB

PLAN

549. Create the ERD to show the relationships among EMPLOYEE, BENEFIT, JOB, and PLAN. Answer: The Crow’s Foot ERD is shown in Figure P3.15.

FIGURE P3.15 BeneCo Crow’s Foot ERD

550. Create the relational diagram to show the relationships among EMPLOYEE, BENEFIT, JOB, and PLAN. Answer: The relational diagram is shown in Figure P3.16. Note that the location of the entities is immaterial—the relationships move with the entities.

446

FIGURE P3.16 The Relational Diagram

Use the database shown in Figure P3.17 to answer Problems 17–23.

FIGURE P3.17 The Ch03_TransCo Database Tables

447

551. For each table, identify the primary key and the foreign key(s). If a table does not have a foreign key, write None. Answer:

TABLE

PRIMARY KEY

FOREIGN KEY(S)

TRUCK

TRUCK_NUM

BASE_CODE, TYPE_CODE

BASE

BASE_CODE

None

TYPE

TYPE_CODE

None

NOTE The TRUCK_SERIAL_NUM could also be designated as the primary key. Because the TRUCK_NUM was designated to be the primary key, TRUCK_SERIAL_NUM is an example of a candidate key. 552. Do the tables exhibit entity integrity? Answer yes or no and then explain your answer. Answer:

TABLE

ENTITY INTEGRITY

EXPLANATION

TRUCK

Yes

The TRUCK_NUM values in the TRUCK table are all unique and there are no nulls.

BASE

Yes

The BASE_CODE values in the BASE table are all unique and there are no nulls.

TYPE

Yes

The TYPE_CODE values in the TYPE table are all unique and there are no nulls.

448

553. Do the tables exhibit referential integrity? Answer yes or no and then explain your answer. Write NA (Not Applicable) if the table does not have a foreign key. Answer:

TABLE

REFERENTIAL INTEGRITY

TRUCK

Yes

BASE

TYPE

EXPLANATION

554. Identify the TRUCK table’s candidate key(s). Answer: A candidate key is any key that could have been used as a primary key, but that was, for some reason, not chosen to be the primary key. For example, the TRUCK_SERIAL_NUM could have been selected as the PK, but the TRUCK_NUM was actually designated to be the PK. Therefore, the TRUCK_SERIAL_NUM is a candidate key. Also, any combination of attributes that would uniquely identify any truck would be a candidate key. For example, the combination of BASE_CODE, TYPE_CODE, TRUCK_MILES, and TRUCK_BUY_DATE is not likely to be duplicated and this combination would, therefore, be a candidate key. However, while the latter combination might constitute a candidate key, such a combination would not be practical. (An extreme—and impractical—example of a candidate key would be the combination of all of a table’s attributes.) Furthermore, this assumes that the TRUCK_MILES attribute represents the number of miles in the truck when we purchased and not the actual miles driven. The actual miles driven value changes over time. This will not be a good prime attribute choice as the attribute will always be different as time goes by.

449

555. For each table, identify a superkey and a secondary key. Answer:

TABLE

SUPERKEY

SECONDARY KEY

TRUCK

TRUCK_NUM + TRUCK_MILES

BASE_CODE + TYPE_CODE

TRUCK_NUM + TRUCK_MILES + TRUCK_BUY_DATE

TRUCK_NUM + TRUCK_MILES + TRUCK_BUY_DATE + TYPE_CODE BASE

TYPE

BASE_CODE + BASE_CITY

BASE_CITY + BASE_STATE

BASE_CODE + BASE_CITY + BASE_CITY

(This a very effective secondary key, since it is not likely that a state contains two cities with the same name.)

TYPE_CODE TYPE_DESCRIPTION

TYPE_DESCRIPTION

556. Create the ERD for this database. Answer: The Crow’s Foot ERD is shown in Figure P3.22.

FIGURE P3.22 TransCo Crow’s Foot ERD

450

557. Create the relational diagram for this database. Answer: The relational diagram is shown in Figure P3.23.

FIGURE P3.23 The Ch03_TransCo Relational Diagram

451

FIGURE P3.24 The Ch03_AviaCo Database Tables (Part 2)

452

NOTE Earlier in the chapter, you were instructed to avoid homonyms and synonyms. In this problem, both the pilot and the copilot are listed in the PILOT table, but EMP_NUM cannot be used for both in the CHARTER table. Therefore, the synonyms CHAR_PILOT and CHAR_COPILOT are used in the CHARTER table. Although the solution works in this case, it is very restrictive and it generates nulls when a copilot is not required. Worse, such nulls proliferate as crew requirements change. For example, if the AviaCo charter company grows and starts using larger aircraft, crew requirements may increase to include flight engineers and load masters. The CHARTER table would then have to be modified to include the additional crew assignments; such attributes as CHAR_FLT_ENGINEER and CHAR_LOADMASTER would have to be added to the CHARTER table. Given this change, each time a smaller aircraft flew a charter trip without the number of crew members required in larger aircraft, the missing crew members would yield additional nulls in the CHARTER table. You will have a chance to correct those design shortcomings in Problem 27. The problem illustrates two important points: 3. Don’t use synonyms. If your design requires the use of synonyms, revise the design! 4. To the greatest possible extent, design the database to accommodate growth without requiring structural changes in the database tables. Plan ahead and try to anticipate the effects of change on the database. 558. For each table, identify each of the following when possible: Answer: f.

The primary key

TABLE

PRIMARY KEY

CHARTER

CHAR_TRIP

AIRCRAFT

AC_NUMBER

MODEL

MOD_CODE

PILOT

EMP_NUM

EMPLOYEE

EMP_NUM

CUSTOMER

CUS_CODE

453

g. A superkey

TABLE

SUPER KEY

CHARTER

CHAR_TRIP + CHAR_DATE

AIRCRAFT

AC_NUM + MOD-CODE

MODEL

MOD_CODE + MOD_NAME

PILOT

EMP_NUM + PIL_LICENSE

EMPLOYEE

EMP_NUM + EMP_DOB

CUSTOMER

CUS_CODE + CUS_LNAME

h. A candidate key

TABLE

CANDIDATE KEY

CHARTER

AIRCRAFT

See the previous discussion.

MODEL

See the previous discussion.

PILOT

See the previous discussion.

EMPLOYEE

See the previous discussion. But perhaps the combination of EMP_LNAME + EMP_FNAME + EMP_INITIAL + EMP_DOB will yield an acceptable candidate key.

CUSTOMER

See the previous discussion.

454

The foreign key(s)

TABLE

FOREIGN KEY

CHARTER

CHAR_PILOT (references PILOT) CHAR_COPILOT (references PILOT) AC_NUMBER (references AIRCRAFT) CUS_CODE (references CUSTOMER)

AIRCRAFT

MOD_CODE

MODEL

None

PILOT

EMP_NUM (references EMPLOYEE)

EMPLOYEE

None

CUSTOMER

None

A secondary key

TABLE

SECONDARY KEY

CHARTER

CHAR_DATE + AC_NUMBER + CHAR_DESTINATION

AIRCRAFT

MOD_CODE

MODEL

MOD_MANUFACTURER + MOD_NAME

PILOT

PIL_LICENSE + PIL_MED_DATE

EMPLOYEE

EMP_LNAME + EMP_FNAME + EMP_DOB

CUSTOMER

CUS_LNAME + CUS_FNAME + CUS_PHONE

559. Create the ERD. (Hint: Look at the table contents. You will discover that an AIRCRAFT can fly many CHARTER trips but that each CHARTER trip is flown by one AIRCRAFT, that a MODEL references many AIRCRAFT but that each AIRCRAFT references a single MODEL, and so on.) Answer: The Crow’s Foot ERD is shown in Figure P3.25. The optional (default) 1:1 relationship crops up in this ERD, just as it did in the Problem 8 solution. Use the same discussion that accompanied Problem 8. Also, note that EMPLOYEE is the “parent” of PILOT. Note that all pilots are employees, but not all employees are pilots—some are mechanics, accountants, and so on. (This discussion previews some of the Chapter 4 coverage … coming attractions, so to speak.) The relationship between PILOT and EMPLOYEE is read from the

455

“parent” entity to the related entity. In this case, the relationship is read as “an EMPLOYEE is a PILOT.”

456

FIGURE P3.25 The Ch03_AviaCo Database ERD

560. Create the relational diagram. Answer: The relational diagram is shown in Figure P3.26.

FIGURE P3.26 The Ch03_AviaCo Database Relational Diagram

457

561. Modify the ERD you created in Problem 25 to eliminate the problems created by the use of synonyms. (Hint: Modify the CHARTER table structure by eliminating the CHAR_PILOT and CHAR_COPILOT attributes; then create a composite table named CREW to link the CHARTER and EMPLOYEE tables. Some crew members, such as flight attendants, may not be pilots. That’s why the EMPLOYEE table enters into this relationship.) Answer: The Crow’s Foot ERD is shown in Figure P3.27.

FIGURE P3.27 The Ch03_AviaCo_2 Database ERD

562. Create the relational diagram for the design you revised in Problem 27. Answer: (After you have had a chance to revise the design, your instructor will show you the results of the design change, using a copy of the revised database named Ch03_AviaCo_2.) The relational diagram for the Ch03_AviaCo_2 database is shown in Figure P3.28. Note that there are a few additional entities that you will encounter again in Chapter 4. (You can safely ignore the extra entities, RATING and EARNEDRATING at this point … but you can let the students “read” the relationship between these two entities.) Note that you can easily derive the M:N relationship between PILOT and RATING. (A PILOT can earn many RATINGs. A RATING can be earned by many PILOTs.) Even though your students may not know what a rating is, they can still draw up conclusions about its relationship to other entities by looking at relational diagrams and ERDs. And that’s one of the many strengths of design tools. Also, you can let your students break the M:N relationship down into two 1:M relationships—note that this is done through the EARNEDRATING entity. The issues encountered in the design and implementation of the Ch3_AviaCo_2 database will be revisited many times in the book.

458

FIGURE P3.28 The Ch03_AviaCo_2 Relational Diagram

You want to see data on charters flown by either Robert Williams (employee number 105) or Elizabeth Travis (employee number 109) as pilot or copilot, but not charters flown by both of them. Complete Problems 29–31 to find this information. 563. Create the table that would result from applying the SELECT and PROJECT relational operators to the CHARTER table to return only the CHAR_TRIP, CHAR_PILOT, and CHAR_COPILOT attributes for charters flown by either employee 105 or employee 109. Answer:

CHAR_TRIP

CHAR_PILOT

CHAR_COPILOT

10003

105

109

10006

109

10009

105

10010

109

10013

105

10016

109

105

10018

105

104

459

564. Create the table that would result from applying the SELECT and PROJECT relational operators to the CHARTER table to return only the CHAR_TRIP, CHAR_PILOT, and CHAR_COPILOT attributes for charters flown by both employee 105 and employee 109. Answer:

CHAR_TRIP

CHAR_PILOT

CHAR_COPILOT

10003

105

109

10016

109

105

565. Create the table that would result from applying a DIFFERENCE relational operator of your result from Problem 29 to your result from Problem 30. Answer:

CHAR_TRIP

CHAR_PILOT

10006

109

10009

105

10010

109

10013

105

10018

105

CHAR_COPILOT

104

ANSWERS TO REVIEW QUESTIONS 566. What two conditions must be met before an entity can be classified as a weak entity? Give an example of a weak entity.

460

Answer: To be classified as a weak entity, two conditions must be met: 3. The entity must be existence-dependent on its parent entity. 4. The entity must inherit at least part of its primary key from its parent entity. For example, the (strong) relationship depicted in the text’s Figure 4.9 shows a weak CLASS entity: 3. CLASS is clearly existence-dependent on COURSE. (You can’t have a database class unless a database course exists.) 4. The CLASS entity’s PK is defined through the combination of CLASS_SECTION and CRS_CODE. The CRS_CODE attribute is also the PK of COURSE. The conditions that define a weak entity are the same as those for a strong relationship between an entity and its parent. In short, the existence of a weak entity produces a strong relationship. And if the entity is strong, its relationship to the other entity is weak. (Note the dotted relationship line in the text’s Figure 4.9 when the relationship is weak.) Keep in mind that whether or not an entity is weak usually depends on the database designer’s decisions. For instance, if the database designer had decided to use a single attribute as shown in the text’s Figure 4.9, the CLASS entity would be strong. (The CLASS entity’s PK is CLASS_CODE, which is not derived from the COURSE entity.) In this case, the relationship between COURSE and CLASS is weak. (Note the dashed relationship line in the text’s Figure 4.9.) If the designer chose the composite key, as shown in Figure 4.10, the relationship is strong, as denoted by the solid line. However, regardless of how the designer classifies the relationship—weak or strong—CLASS is always existencedependent on COURSE.

461

567. What is a strong (or identifying) relationship, and how is it depicted in a Crow’s Foot ERD? Answer: A strong relationship exists when an entity is existence-dependent on another entity and inherits at least part of its primary key from that entity. A strong relationship is shown as a solid line. In other words, a strong relationship exists when a weak entity is related to its parent entity. (Note the discussion in Question 1.) 568. Given the business rule “an employee may have many degrees,” discuss its effect on attributes, entities, and relationships. (Hint: Remember what a multivalued attribute is and how it might be implemented.) Answer: Suppose that an employee has the following degrees: BA, BS, and MBA. These degrees could be stored in a single string as a multivalued attribute named EMP_DEGREE in an EMPLOYEE table such as the one shown next:

EMP_NUM

EMP_LNAME

EMP_DEGREE

123

Carter

AA, BBA

124

O’Shanski

BBA, MBA, Ph.D.

125

Jones

126

Ortez

BS, MS

EMP_NUM

EMP_LNAME

EMP_DEGREE1 EMP_DEGREE2

123

Carter

BBA

124

O’Shanski

BBA

MBA

125

Jones

126

Ortez

EMP_DEGREE3

Ph.D.

462

EMP_LNAME

123

Carter

124

O’Shanski

125

Jones

126

Ortez

EMP_AA

EMP_AS

EMP_BA

EMP_BS

EMP_BB A

EMP_MS

EMP_MBA

EMP_PhD

X X X X

Table name: EMPLOYEE EMP_NUM

EMP_LNAME

123

Carter

124

O’Shanski

125

Jones

126

Ortez

Table name: DEGREE EMP_NUM

DEGREE_CODE

DEGREE_DATE

DEGREE_PLACE

123

May-1999

Lake Sumter CC

123

BBA

Aug-2004

U. of Georgia

124

BBA

Dec-1990

U. of Toledo

124

MBA

May-2001

U. of Michigan

463

124

Ph.D.

Dec-2005

U. of Tennessee

125

Aug-2002

Valdosta State

126

Dec-1989

U. of Missouri

126

May-2002

U. of Florida

Note that this solution leaves no nulls, produces a simple query environment, and makes it unnecessary to alter the table structure when employees earn additional degrees. (You can make the environment even more flexible by naming the new entity QUALIFICATION, thus making it possible to store degrees, certifications, and other useful data that define an employee’s qualifications.) 569. What is a composite entity, and when is it used? Answer: A composite entity, also known as a bridge entity, is generally used to transform complex relationships that cannot be implemented in the relational model. For example, it is used to implement M:N relationships or higher order relationships by decomposing the relationship into 1:M relationships. This allows for properly implementable placement of FKs. 570. Suppose you are working within the framework of the conceptual model in Figure Q4.5. Answer:

FIGURE Q4.5 The Conceptual Model for Question 5

Given the conceptual model in Figure Q4.5: c. Write the business rules that are reflected in it. Even a simple ERD such as the one shown in Figure Q4.5 is based on many business rules. Make sure that each business rule is written on a separate line and that all of its details are spelled out. In this case, the business rules are derived from the ERD in a “reverseengineering” procedure designed to document the database design. In a real-world database design situation, the ERD is generated on the basis of business rules that are written before the first entity box is drawn. (Remember that the business rules are derived from a carefully and precisely written description of operations.)

464

Given the ERD shown in Figure Q4.5, you can identify the following business rules: 11. A customer can own many cars. 12. Some customers do not own cars. 13. A car is owned by one and only one customer. 14. A car may generate one or more maintenance records. 15. Each maintenance record is generated by one and only one car. 16. Some cars have not (yet) generated a maintenance procedure. 17. Each maintenance procedure can use many parts. (Comment: A maintenance procedure may include multiple maintenance actions, each one of which may or may not use parts. For example, 10,000-mile check may include the installation of a new oil filter and a new air filter. But tightening an alternator belt does not require a part.) 18. A part may be used in many maintenance records. (Comment: Each time an oil change is made, an oil filter is used. Therefore, many oil filters may be used during some period of time. Naturally, you are not using the same oil filter each time—but the part classified as “oil filter” shows up in many maintenance records as time passes.) Note that the apparent M:N relationship between MAINTENANCE and PART has been resolved through the use of the composite entity named MAINT_LINE. The MAINT_LINE entity ensures that the M:N relationship between MAINTENANCE and PART has been broken up to produce the two 1:M relationships shown in business rules 9 and 10. 19. Each maintenance procedure generates one or more maintenance lines. 20. Each part may appear in many maintenance lines. (Review the comment in business rule 8.) As you review the business rules 9 and 10, use the following two tables to show some sample data entries. For example, take a look at the (simplified) contents of the following MAINTENANCE and LINE tables and note that the MAINT_NUM 10001 occurs three times in the LINE table:

Sample MAINTENANCE Table Data MAINT_NUM

MAINT_DATE

10001

15-Mar-2022

10002

15-Mar-2022

10003

16-Mar-2022

465

Sample LINE Table Data MAINT_NUM

LINE_NUM

LINE_DESCRIPTION LINE_PART

LINE_UNITS

10001

Replace fuel filter

FF-015

10001

Replace air filter

AF-1187

10001

Tighten belt

alternator

10002

Replace bulbs

taillight

BU-2145

10003

Replace oil filter

OF-2113

10003

Replace air filter

AF-1187

d. Identify all of the cardinalities. The Crow’s Foot ERD, shown in Figure Q4.5, does not show cardinalities directly. Instead, the cardinalities are implied through the Crow’s Foot symbols for connectivity. You might write the cardinality (0,N) next to the MAINT_LINE entity in its relationship with the PART entity to indicate that a part might occur “N” times in the maintenance line entity or that it might never show up in the maintenance line entity. The latter case would occur if a given part has never been used in maintenance. 571. What is a recursive relationship? Give an example. Answer: A recursive relationship exists when an entity is related to itself, that is, some instances (rows) in the entity (table) are related to other instances (rows) in that same entity (table). For example, a COURSE may be a prerequisite to a COURSE. (See Section 4.1j, “Recursive Relationships,” for additional examples.) 572. How would you (graphically) identify each of the following ERM components in a Crow’s Foot notation? Answer: The answers to Questions (a) through (d) are illustrated with the help of Figure Q4.7.

466

FIGURE Q4.7 Crow’s Foot ERM Components

e. An entity An entity is represented by a rectangle containing the entity name. (Remember that, in ER modeling, the word “entity” actually refers to the entity set.) f.

The cardinality (0,N) Cardinalities are implied through the use of Crow’s Foot symbols for connectivity. For example, note the implied (0,N) cardinality in Figure Q4.7.

g. A weak relationship A weak relationship exists when the PK of the related entity does not contain at least one of the PK attributes of the parent entity. For example, if the PK of a COURSE entity is CRS_CODE and the PK of the related CLASS entity is CLASS_CODE, the relationship between COURSE and CLASS is weak. (Note that the CLASS PK does not include the CRS_CODE attribute.) A weak relationship can be indicated by a dashed line in the ERD. h. A strong relationship A strong relationship exists when the PK of the related entity contains at least one of the PK attributes of the parent entity. For example, if the PK of a COURSE entity is CRS_CODE and the PK of the related CLASS entity is CRS_CODE + CLASS_SECTION, the relationship between COURSE and CLASS is strong. (Note that the CLASS PK includes the CRS_CODE attribute.) A strong relationship can be indicated by a solid line in the ERD.

467

573. Discuss the difference between a composite key and a composite attribute. How would each be indicated in an ERD? Answer: A composite key is one that consists of more than one attribute. If the ER diagram contains the attribute names for each of its entities, a composite key is indicated in the ER diagram by the fact that more than one attribute name is underlined to indicate its participation in the primary key. A composite attribute is one that can be subdivided to yield meaningful attributes for each of its components. For example, the composite attribute CUS_NAME can be subdivided to yield the CUS_FNAME, CUS_INITIAL, and CUS_LNAME attributes. There is no ER convention that enables us to indicate that an attribute is a composite attribute. 574. What two courses of action are available to a designer who encounters a multivalued attribute? Answer: The discussion that accompanies the answer to Question 3 is valid as an answer to this question. Briefly, the multivalued attribute can be separated into multiple columns, or it can be placed in a separate table. The first option is only appropriate if the designer knows, absolutely knows, the maximum number of possible values any row could have for the attribute. This can yield a workable solution but is fraught with numerous issues for performance and querying. The second option of using a separate table will always yield a workable solution, and generally has the best performance and querying capabilities. For additional insight, see discussion in Section 4-1b, in particular Figures 4.3, 4.4, and 4.5 and Table 4.1. 575. What is a derived attribute? Give an example. What are the advantages or disadvantages of storing or not storing a derived attribute? Answer: A derived attribute is an attribute whose value is calculated (derived) from other attributes. The derived attribute need not be physically stored within the database; instead, it can be derived by using an algorithm. For example, an employee’s age, EMP_AGE, may be found by computing the integer value of the difference between the current date and the EMP_DOB. If you use MS Access, you would use INT((DATE() − EMP_DOB)/365). Similarly, a sales clerk’s total gross pay may be computed by adding a computed sales commission to base pay. For instance, if the sales clerk’s commission is 1%, the gross pay may be computed by EMP_GROSSPAY = INV_SALES*1.01 + EMP_BASEPAY Or the invoice line item amount may be calculated by LINE_TOTAL = LINE_UNITS*PROD_PRICE Advantages of storing a derived attribute include reduced complexity of the query to retrieve the computed values and less processing overhead at the time of retrieval. Disadvantages of storing a derived attribute include increased possibility of data inconsistency and increased processing overhead at the time of storage.

468

Advantages of not storing a derived attribute include reduced risk of data inconsistency or stale values. Disadvantages of not storing a derived attribute include increased query complexity and performance penalties for calculating the value when it is needed. 576. How is a relationship between entities indicated in an ERD? Give an example using the Crow’s Foot notation. Answer: Use Figure Q4.7 as the basis for your answer. Briefly, a relationship is indicated by a line connecting the related entities. Note the distinction between the dashed and solid relationship lines, then tie this distinction to the answers to Questions 7c and 7d. 577. Discuss two ways in which the 1:M relationship between COURSE and CLASS can be implemented. (Hint: Think about relationship strength.) Answer: Note the discussion about weak and strong entities in Questions 7c and 7d. Then follow up with this discussion: The relationship is implemented as strong when the CLASS entity’s PK contains the COURSE entity’s PK. For example, COURSE(CRS_CODE, CRS_TITLE, CRS_DESCRIPTION, CRS_CREDITS) CLASS(CRS_CODE, CLASS_SECTION, CLASS_TIME, CLASS_PLACE) Note that the CLASS entity’s PK is CRS_CODE + CLASS_SECTION—and that the CRS_CODE component of this PK has been “borrowed” from the COURSE entity. Because CLASS is existence-dependent on COURSE and uses a PK component from its parent (COURSE) entity, the CLASS entity is weak in this strong relationship between COURSE and CLASS. The Visio Crow’s Foot ERD shows a strong relationship as a solid line. (See Figure Q4.12a.) Visio refers to a strong relationship as an identifying relationship.

FIGURE Q4.12a Strong COURSE and CLASS Relationship

469

Sample data are shown next:

Table name: COURSE CRS_CODE

CRS_TITLE

CRS_DESCRIPTION

CRS_CREDITS

ACCT-211

Basic Accounting

An introduction to accounting. Required of all business majors.

CIS-380

Database Techniques I

Database design and implementation issues. Uses CASE tools to generate designs that are then implemented in a major database management system.

CIS-490

Database Techniques II

The second half of CIS-380. Basic Web database application development and management issues.

Table name: CLASS CRS_CODE

CLASS_SECTION

CLASS_TIME

CLASS_PLACE

ACCT-211

8:00 a.m. – 9:30 a.m. T-Th.

Business 325

ACCT-211

8:00 a.m. – 8:50 a.m. MWF

Business 325

ACCT-211

8:00 a.m. – 8:50 a.m. MWF

Business 402

CIS-380

11:00 a.m. – 11:50 a.m. MWF

Business 415

CIS-380

3:00 p.m. – 3:50 a.m. MWF

Business 398

CIS-490

1:00 p.m. – 3:00 p.m. MW

Business 398

CIS-490

6:00 p.m. – 10:00 p.m. Th.

Business 398

470

FIGURE Q4.12b Weak COURSE and CLASS Relationship

Given the weak relationship depicted in Figure Q4.12b, the CLASS table contents would look like this:

Table name: CLASS CLASS_CODE

CRS_CODE

CLASS_SECTION

CLASS_TIME

CLASS_PLACE

21151

ACCT-211

8:00 a.m. – 9:30 a.m. T-Th.

Business 325

21152

ACCT-211

8:00 a.m. – 8:50 a.m. MWF

Business 325

21153

ACCT-211

8:00 a.m. – 8:50 a.m. MWF

Business 402

38041

CIS-380

11:00 a.m. – 11:50 a.m. MWF

Business 415

38042

CIS-380

3:00 p.m. – 3:50 a.m. MWF

Business 398

49041

CIS-490

1:00 p.m. – 3:00 p.m. MW

Business 398

49042

CIS-490

6:00 p.m. – 10:00 p.m. Th.

Business 398

The advantage of the second CLASS entity version is that its PK can be referenced easily as an FK in another related entity such as ENROLL. Using a single-attribute PK makes implementation easier. This is especially true when the entity represents the “1” side in one or more relationships. In general, it is advisable to avoid composite PKs whenever it is practical to do so. 578. How is a composite entity represented in an ERD, and what is its function? Illustrate the Crow’s Foot notation. Answer: The label “composite” is based on the fact that the composite entity contains at least the primary key attributes of each of the entities that are connected by it. The composite entity is an important component of the ER model because relational database models should not contain M:N relationships—and the composite entity can be used to break up such relationships into 1:M relationships. Suppose, for example, that you want to design a class enrollment entity to serve as the “bridge” between STUDENT and CLASS in the M:N relationship defined by these two business rules: 

A student can take many classes.



Each class can be taken by many students.

471

Operational (transaction) speed requirements are also dictated by the end users.

Clearly, an elegant database design that fails to address end-user information requirements or one that forms the basis for an implementation whose use progresses at a snail’s pace has little practical use. 580. Briefly, but precisely, explain the difference between single-valued attributes and simple attributes. Give an example of each. Answer: A single-valued attribute is one that can have only one value. For example, a person has only one first name and only one social security number. A simple attribute is one that cannot be decomposed into its component pieces. For example, a person’s sex is classified as either M or F and there is no reasonable way to decompose M or F. Similarly, a person’s first name cannot be decomposed into meaningful components. (In contrast, if a phone number includes the area code, it can be decomposed into the area code and the phone number. And a person’s name may be decomposed into a first name, an initial, and a last name.) Single-valued attributes are not necessarily simple. For example, an inventory code HWPRIJ23145 may refer to a classification scheme in which HW indicates Hardware, PR indicates Printer, IJ indicates Inkjet, and 23145 indicates an inventory control number. Therefore, HWPRIJ23145 may be decomposed into its component parts, even though it is single-valued. To facilitate product tracking, manufacturing serial codes must be singlevalued, but they may not be simple. For instance, the product serial number TNP5S2M231109154321 might be decomposed this way: TN = state = Tennessee P5 = plant number 5 S2 = shift 2 M23 = machine 23 11 = month, i.e., November 09 = day

472

154321 = time on a 24-hour clock, i.e., 15:43:21, or 3:43 p.m. plus 21 seconds. 581. What are multivalued attributes, and how can they be handled within the database design? Answer: The answer to Question 3 is just as valid as an answer to this question. You can augment that discussion with the following discussion: As the name implies, multivalued attributes may have many values. For example, a person’s education may include a high school diploma, a two-year college associate degree, a fouryear college degree, a Master’s degree, a Doctoral degree, and various professional certifications such as a Certified Public Accounting certificate or a Certified Data Processing Certificate. There are basically two ways to handle multivalued attributes—three if you count ignoring the fact that it is multivalued, and two of those three ways are bad: 4. If we ignore that the attribute is multivalued, then the educational attainments may be kept as a single, variable-length string or character field. This solution is undesirable because it becomes difficult to query the table. For example, even a simple question such as “how many employees have four-year college degrees?” requires string partitioning that is time-consuming at best. Of course, if there is no need to ever group employees by education, the variable-length string might be acceptable from a design point of view. However, as database designers we know that, sooner or later, information requirements are likely to grow, so the string storage is probably a bad idea from that perspective, too. 5. Each of the possible outcomes is kept as a separate attribute within the table. This solution is undesirable for several reasons. First, the table would generate many nulls for those who had minimal educational attainments. Using the preceding example, a person with only a high school diploma would generate nulls for the two-year college associate degree, the four-year college degree, the Master’s degree, the Doctoral degree, and for each of the professional certifications. In addition, how many professional certification attributes should be maintained? If you store two professional certification attributes, you will generate a null for someone with only one professional certification and you’d generate two nulls for all persons without professional certifications. And suppose you have a person with five professional certifications? Would you create additional attributes, thus creating many more nulls in the table, or would you simply ignore the additional professional certifications, thereby losing information? 6. Finally, the most flexible way to deal with multivalued attributes is to create a composite entity that links employees to education. By using the composite entity, there will never be a situation in which additional attributes must be created within the EMPLOYEE table to accommodate people with multiple certifications. In short, we eliminate the generation of nulls. In addition, we gain information flexibility because we can also store the details (date earned, place earned, etc.) for each of the educational attainments. The (simplified) structures might look like those in Figures Q4.16a and Q4.16b.

473

FIGURE Q4.16a The Ch04_Questions Database Tables

FIGURE Q4.16b The Ch04_Questions Relational Diagram

474

Figure Q4.16c The Crow’s Foot ERD for the Ch04_Questions Database

Figure Q4.17 The ERD for Questions 17−20

582. Write the 10 cardinalities that are appropriate for this ERD. Answer: The cardinalities are indicated in Figure Q4.17sol.

475

FIGURE Q4.17sol The Cardinalities

583. Write the business rules reflected in this ERD. Answer: The following business rules are reflected in the ERD: 

A store may place many orders. (Note the use of “may”—which is reflected in the ORDER optionality.)



An order must be placed by a store. (Note that STORE is mandatory to ORDER. In this ERD, the order environment apparently reflects a wholesale environment.)



An order contains at least one order line. (Note that ORDER_LINE is mandatory to ORDER, and vice versa.)



Each order line has a specific product written in it.



Each employee is employed by one (and only one) store.

476



A dependent must be related to an employee. (Discussion: It makes no sense to keep track of dependents of people who are not even employees. Therefore, EMPLOYEE is mandatory to DEPENDENT.)

584. What two attributes must be contained in the composite entity between STORE and PRODUCT? Use proper terminology in your answer. Answer: As modeled in the figure, ORDER_LINE is the only composite entity between STORE and PRODUCT. The composite entity must at least include the primary keys of the entities it references. The combination of these attributes may be designated to be the composite entity’s (composite) primary key. Each of the (composite) primary key’s attributes is a foreign key that references the entities for which the composite entity serves as a bridge. As you discuss the model in Figure Q4.17sol, note that an order is represented by two entities, ORDER and ORDER_LINE. Note also that the STORE’s 1:M relationship with ORDER and the ORDER’s 1:M relationship with ORDER_LINE reflect the conceptual M:N relationship between STORE and PRODUCT. The original business rules probably read: 

A store can order many products.



A product can be ordered by many stores.

585. Describe precisely the composition of the DEPENDENT weak entity’s primary key. Use proper terminology in your answer. Answer: If DEPENDENT is considered a weak entity, as the question states, then it will have a composite PK that includes the EMPLOYEE entity’s PK and one of its attributes. For example, if the EMPLOYEE entity’s PK is EMP_NUM, the DEPENDENT entity’s PK might be EMP_NUM + DEP_NUM. Note that modeling DEPENDENT as a weak entity is not required, as is shown by the use of a strong relationship between EMPLOYEE and DEPENDENT. In such a case, the PK of DEPENDENT could be a single attribute such as DEP_NUM alone, depending on the domain of the attribute. 586. The local city youth league needs a database system to help track children who sign up to play soccer. Data need to be kept on each team, the children who will play on each team, and their parents. Also, data need to be kept on the coaches for each team. Answer: Draw a data model with the entities and attributes described here. Entities required: Team, Player, Coach, and Parent Attributes required: Team: Team ID number, Team name, and Team colors Player: Player ID number, Player first name, Player last name, and Player age Coach: Coach ID number, Coach first name, Coach last name, and Coach home phone number Parent: Parent ID number, Parent last name, Parent first name, Home phone number, and Home address (Street, City, State, and Zip code) The following relationships must be defined: 

Team is related to Player.

477



Team is related to Coach.



Player is related to Parent.

Connectivities and participations are defined as follows: 

A Team may or may not have a Player.



A Player must have a Team.



A Team may have many Players.



A Player has only one Team.



A Team may or may not have a Coach.



A Coach must have a Team.



A Team may have many Coaches.



A Coach has only one Team.



A Player must have a Parent.



A Parent must have a Player.



A Player may have many Parents.



A Parent may have many Players.

478

FIGURE Q4.21a Conceptual ERD for Question 21

479

FIGURE Q4.21b ERD with Foreign Keys for Question 21

480

FIGURE Q4.21c ERD with Two Team Colors for Question 21

481

FIGURE Q4.21d ERD with Color Table for Question 21

482

ANSWERS TO PROBLEMS Use the following business rules to create a Crow’s Foot ERD. Write all appropriate connectivities and cardinalities in the ERD. Answer: 

A department employs many employees, but each employee is employed by only one department.



Some employees, known as “rovers,” are not assigned to any department.



A division operates many departments, but each department is operated by only one division.



An employee may be assigned many projects, and a project may have many employees assigned to it.



A project must have at least one employee assigned to it.



One of the employees manages each department, and each department is managed by only one employee.



One of the employees runs each division, and each division is run by only one employee.

The answers to Problem 1 (all parts) are included in Figure P4.1.

FIGURE P4.1 Problem 1 ERD Solution

As you discuss the ERD shown in Figure P4.1, note that this design reflects several useful features that become especially important when the design is implemented. For example:

483



Also, if you have multiple relationships between two entities—such as the “EMPLOYEE manages DEPARTMENT” and “DEPARTMENT employs EMPLOYEE” relationships—you must make sure that each relationship has a designated primary entity. For example, the 1:1 relationship expressed by “EMPLOYEE manages DEPARTMENT” requires that the EMPOYEE entity be designated as the primary (or “first”) entity. If you use Visio to create your Crow’s Foot ERDs, Figure P4.3 shows how the 1:1 relationship is specified. If you use some other CASE tool, you will discover that it, too, is likely to require similar relationship specifications. 587. Create a complete ERD in Crow’s Foot notation that can be implemented in the relational model using the following description of operations. Hot Water (HW) is a small start-up company that sells spas. HW does not carry any stock. A few spas are set up in a simple warehouse so customers can see some of the models available, but any products sold must be ordered at the time of the sale. Answer: 

HW can get spas from several different manufacturers.



Each manufacturer produces one or more different brands of spas.



Each and every brand is produced by only one manufacturer.



Every brand has one or more models.



484



Every manufacturer is identified by a manufacturer code. The company name, address, area code, phone number, and account number are kept in the system for every manufacturer.



For each brand, the brand name and brand level (premium, mid-level, or entry-level) are kept in the system.



FIGURE P4.2 Problem 2 ERD Solution

588. The Jonesburgh County Basketball Conference (JCBC) is an amateur basketball association. Each city in the county has one team as its representative. Each team has a maximum of 12 players and a minimum of 9 players. Each team also has up to 3 coaches (offensive, defensive, and physical training coaches). During the season, each team plays 2 games (home and visitor) against each of the other teams. Given those conditions, do the following: 

Identify the connectivity of each relationship.



Identify the type of dependency that exists between CITY and TEAM.



Identify the cardinality between teams and players and between teams and city.



Identify the dependency between COACH and TEAM and between TEAM and PLAYER.



Draw the Chen and Crow’s Foot ERDs to represent the JCBC database.



Draw the UML class diagram to depict the JCBC database.

The Chen ERD solution is shown in Figure P4.3Chen. (The Crow’s Foot solution is shown after the discussion.)

485

FIGURE P4.3 Chen The JCBC Chen ERD M

M GAME

(1,1)

sponsors

CITY (1,1)

(2,N)

(1,1)

1 has

TEAM (1,1)

(1,3)

(9,12)

PLAYER (1,1)

is coached by

(1,1)

COACH

To help the students understand the ER diagram’s components better, note the following relationships: 

The main components are TEAM and GAME.



Each team plays each other team at least twice.



To play a game, two teams are necessary: the home team and the visiting team.



Each team plays at least twice: once as the home team and once as the visiting team.

486

FIGURE P4.3RD The JCBC Relational Diagram, Version 1

FIGURE P4.3SO The JCBC Database Game Summary Output, Version 1

487

FIGURE P4.3 RD2 The Revised JCBC Database Relational Diagram

488

489

FIGURE P4.3CF The JCBC Crow’s Foot ERD

490

FIGURE P4.3UML The JCBC UML Class Diagram

491

NOTE You may wonder why we examined this solution in such detail. (The sample implementation is shown in the database named Ch04_JCBC_Version2.) After all, mere games hardly seem to merit this level of database design attention. Actually, there is the proverbial method in the madness. The basketball—or any other game—environment is likely to be familiar to your students. Therefore, it becomes easier for you to show the design and implementation of recursive relationships—which are actually rather complex things. Fortunately, even complex design issues become manageable in a familiar data environment. Recursive relationships are common enough—or should be—to merit attention and the development of expertise in their implementation. In many manufacturing industries, incredibly detailed part tracking is mandatory. For example, the implementation of the recursive relationship “PART contains PART” is especially desirable in the aviation manufacturing businesses. Such businesses are required by federal law to maintain absolute parts tracing records. If a complex part fails, it must be possible to follow all the trails to all the component parts that may have been involved in the part’s failure. 589. Create an ERD based on the Crow’s Foot notation using the following requirements: Answer: 

An INVOICE is written by a SALESREP. Each sales representative can write many invoices, but each invoice is written by a single sales representative.



The INVOICE is written for a single CUSTOMER. However, each customer can have many invoices.



An INVOICE can include many detail lines (LINE), each of which describes one product bought by the customer.



The product information is stored in a PRODUCT entity.



The product’s vendor information is found in a VENDOR entity.

492

FIGURE P4.4a The Crow’s Foot ERD Solution for Problem 4

493

494

FIGURE P4.4b The Modified Crow’s Foot ERD Solution for Problem 4

590. The Hudson Engineering Group (HEG) has contacted you to create a conceptual model whose application will meet the expected database requirements for the company’s training program. The HEG administrator gives you the following description of the training group’s operating environment. (Hint: Some of the following sentences identify the volume of data rather than cardinalities. Can you tell which ones?) Answer: The HEG has 12 instructors and can handle up to 30 trainees per class. HEG offers five Advanced Technology courses, each of which may generate several classes. If a class has fewer than 10 trainees, it will be canceled. Therefore, it is possible for a course not to generate any classes. Each class is taught by one instructor. Each instructor may teach up to two classes or may be assigned to do research only. Each trainee may take up to two classes per year.

495

Given that information, do the following: c. Define all of the entities and relationships. (Use Table 4.4 as your guide.) The HEG entities and relationships are shown in Table P4.5a.

Table P4.5a The Components of the HEG ERD ENTITY

RELATIONSHIP

CONNECTIVITY

ENTITY

INSTRUCTOR

teaches

1:M

CLASS

COURSE

generates

1:M

CLASS

is listed in

1:M

ENROLL

TRAINEE

is written in

1:M

ENROLL



Each CLASS must be related to a COURSE. (The class must cover designated course material!) Therefore, COURSE is mandatory to CLASS.



You cannot create an enrollment record without having a trainee. Therefore, TRAINEE is mandatory to ENROLL. (Discussion point: What about making TRAINEE optional to ENROLL? In any case, optional relationships may be used for operational reasons, whether or not they are directly derived from a business rule.) Note that a real-world database design requires the explicit recognition of each relationship’s characteristics. When in doubt, ask the end users! d. Describe the relationship between instructor and class in terms of connectivity, cardinality, and existence dependence. Both Questions (a) and (b) have been addressed in the ER diagram shown in Figure P4.5b.

496

FIGURE P4.5b The HEG ERD

As you discuss Figure P4.5b, keep the discussion in part (a) in mind. Also, note the following points: 



A class is taught by only one instructor, but an instructor can teach up to two classes. Therefore, there is a 1:M relationship between INSTRUCTOR and CLASS.



Finally, a COURSE may generate more than one CLASS, while each CLASS is based on one COURSE, so there is a 1:M relationship between COURSE and CLASS.

These relationships are all reflected in the ER diagram shown in Figure P4.5b. Note the optional and mandatory relationships: 

To exist, a CLASS must have TRAINEEs enrolled in it, but TRAINEEs do not necessarily take CLASSes. (Some may take “on the job training.”)



An INSTRUCTOR may not be teaching any CLASSes during some enrollment periods. For example, an instructor may be assigned to duties other than training. However, each CLASS must have an INSTRUCTOR.



If an insufficient number of people sign up for a CLASS, a COURSE may not generate any CLASSes, but each CLASS must represent a COURSE.

497

NOTE The sentences “HEG has twelve instructors.” and “HEG offers five advanced technology courses.” are not reflected in the ER diagram. Instead, they represent additional information concerning the volume of data (number of entities in an entity set), rather than information concerning entity relationships. Because the HEG description leaves room for different interpretations of optional vs. mandatory relationships, we like to give the student the benefit of the doubt. Therefore, unless the question or problem description is sufficiently precise to leave no doubt about the existence of optional/mandatory relationships, we base the student grade on two criteria: 3. Was the basic nature of the relationship—1:1, 1:M, or M:N—selected and displayed properly? 4. Given the student’s rendering of such a relationship, are the cardinalities appropriate? You can add substantial detail to the ERD by including sample attributes for each of the entities. Using a data modeling tool, you can also let your student declare the nature—weak or strong—of the relationships among the entities. Finally, remind your students that the order in which the attributes appear in each entity is immaterial. Therefore, the (composite) PK of the ENROLL entity can be written as either CLASS_CODE + TRN_NUM or as TRN_NUM + CLASS_CODE. That’s why it is also immaterial which one of the foreign key attributes is FK1 or FK2. As you discuss the ERD shown in Figure P4.5b, note that the basic components of this problem are found in the text’s Figure 4.32. Note also that the ENROLL entity in Figure P4.5b uses a composite PK (TRN_NUM + CLASS_CODE) and that, therefore the relationships between ENROLL and CLASS and TRAINEE are strong. Finally, discuss the reason for the weak relationship between COURSE and CLASS—the CLASS entity’s PK (CLASS_CODE) does not “borrow” the PK of the parent COURSE entity. If the CLASS entity’s PK had been composed of CRS_CODE + CLASS_SECTION, the relationship between COURSE and CLASS would have been strong. Discussion: Review the text to show the two possible relationship strengths between COURSE and CLASS. Emphasize that the choice of the PK component(s) is usually a designer option, but that single-attribute PKs tend to yield more design options than composite PKs. Even the composite ENROLL entity can be modified to have a single-attribute PK such as ENROLL_NUM. Given that choice, CLASS_CODE + TRN_NUM constitute a candidate key—CLASS_CODE and TRN_NUM continue to serve as foreign keys to CLASS and TRAINEE, respectively. Given the latter scenario, you can create a (unique) composite index to prevent duplicate enrollments.

498

591. Automata, Inc., produces specialty vehicles by contract. The company operates several departments, each of which builds a particular vehicle, such as a limousine, truck, van, or RV. Answer: 



Given that functional description of the processes at Automata’s purchasing department, do the following: e. Identify all of the main entities. f.

Identify all of the relations and connectivities among entities.

g. Identify the type of existence dependence in all the relationships. h. Give at least two examples of the types of reports that can be obtained from the database. The initial Crow’s Foot ERD is shown in Figure P4.6init. The discussion preceding Figure P4.6rev explains why the revision was made.

499

FIGURE P4.6init Initial Automata Crow’s Foot ERD

As you explain the development of the Crow’s Foot ERD shown in Figure P4.6init, several points are worth stressing: 



500



The other optionalities should be discussed, too—using the same basic scenarios that were described in bullets 2 and 3.

FIGURE P4.6rev Revised Automata Crow’s Foot ERD

592. United Helpers is a nonprofit organization that provides aid to people after natural disasters. Based on the following brief description of operations, create the appropriate fully labeled Crow’s Foot ERD. Answer:

501



For all tasks of type “packing,” there is a packing list that specifies the contents of the packages. There are many packing lists to produce different packages, such as basic medical packages, child-care packages, and food packages. Each packing list has an ID number, a packing list name, and a packing li st description, which describes the items that should make up the package. Every packing task is associated with only one packing list. A packing list may not be associated with any tasks, or it may be associated with many tasks. Tasks that are not packing tasks are not associated with any packing list.



The ERD for United Helpers is shown in Figure P4.7a.

502

FIGURE P4.7a United Helpers ERD

This problem, however, does leave room for interesting discussion with the students regarding the need to verify requirements with the business users. In fact, getting unambiguous business rules can be one of the most difficult parts of the design process. In this problem, the potential for a relationship between the packing list (LIST) and the items (ITEM) stocked by the organization can be a source for discussion. Students may envision that a LIST can specify many ITEMs and an ITEM can be specified in many LISTs. This would imply the need for a M:N relationship between ITEM and LIST. However, the business users may not intend for the packing list to be that specific. For example, the packing list may specify that “2 liter of iodine” should be included in a given type of package without specifying whether it should be two 1-liter bottles of iodine or four 500-ml bottles of iodine. Note that “1-liter bottle of iodine” and “500-ml bottle of iodine” would have to be separate entity instances in ITEM because they have different values. If it is the case that the packing list is intentionally generic in its description of the ideal contents, then a relationship between LIST and ITEM would not be appropriate. 593. Using the Crow’s Foot notation, create an ERD that can be implemented for a medical clinic using the following business rules:

503

Answer: 



Emergency cases do not require an appointment. However, for appointment management purposes, an emergency is entered in the appointment book as “unscheduled.”



If kept, an appointment yields a visit with the doctor specified in the appointment. The visit yields a diagnosis and, when appropriate, treatment.



With each visit, the patient’s records are updated to provide a medical history.



Each patient visit creates a bill. Each patient visit is billed by one doctor, and each doctor can bill many patients.



Each bill must be paid. However, a bill may be paid in many installments, and a payment may cover more than one bill.



A patient may pay the bill directly, or the bill may be the basis for a claim submitted to an insurance company.



If the bill is paid by an insurance company, the deductible is submitted to the patient for payment.

The ERD solution is shown in Figure P4.8.

504

FIGURE P4.8 The Medical Clinic’s Crow’s Foot ERD

594. Create a Crow’s Foot notation ERD to support the following business operations: Answer: 

A friend of yours has opened Professional Electronics and Repairs (PEAR) to repair smartphones, laptops, tablets, and MP3 players. She wants you to create a database to help her run her business.



505



506

FIGURE P4.9 The PEAR ERD

595. Luxury-Oriented Scenic Tours (LOST) provides guided tours to groups of visitors to the Washington D.C. area. In recent years, LOST has grown quickly and is having difficulty keeping up with all of the various information needs of the company. The company’s operations are as follows: Answer: 

507



c. Create a Crow’s Foot notation ERD to support LOST operations.

508

FIGURE P4.10a The first LOST ERD

d. The operations provided state that it is possible for a guide to lead an outing of a tour even if the guide is not officially qualified to lead outings of that tour. Imagine that the business rules instead specified that a guide (a) is never, under any circumstance, allowed to lead an outing unless he or she is qualified to lead outings of that tour. How could the data model in Part a. be modified to enforce this new constraint?

509

FIGURE P4.10b The second LOST ERD 596. Beverage Buddy (BB) is a diabetes-friendly mobile app to track and share beverage information with friends. BB tracks data about teas, coffees, and other drinks to help individuals with diabetes manage their blood sugar levels. Create a Crow’s Foot notation ERD to support the core operations of the BB app as follows:

510

Answer: 



511



To help protect user privacy, BB does not store data about any searches that users make.

512

FIGURE P4.11 The Beverage Buddy ERD

513

CASE SOLUTIONS 597. The administrators of Tiny College are so pleased with your design and implementation of their student registration and tracking system that they want you to expand the design to include the database for their motor vehicle pool. A brief description of operations follows: Answer: 



514



FIGURE P4.12 The Tiny College TFBS Maintenance ERD 598. During peak periods, Temporary Employment Corporation (TEC) places temporary workers in companies. TEC’s manager gives you the following description of the business:

515

Answer: 

TEC has a file of candidates who are willing to work.



TEC offers courses to help candidates improve their qualifications.



Every course develops one specific qualification; however, TEC does not offer a course for every qualification. Some qualifications are developed through multiple courses.



Candidates can pay a fee to attend a training session. A training session can accommodate several candidates, although new training sessions will not have any candidates registered at first.



TEC also has a list of companies that request temporaries.



Each opening requires only one specific or main qualification.



An opening can be filled by many candidates, and a candidate can fill many openings.



TEC uses special codes to describe a candidate’s qualifications for an opening. The list of codes is shown in Table P4.13.

516

Table P4.13 Codes for Problem 13 CODE

DESCRIPTION

SEC-45

Secretarial work; candidate must type at least 45 words per minute

SEC-60

Secretarial work; candidate must type at least 60 words per minute

CLERK

General clerking work

PRG-PY

Programmer, Python

PRG-C++

Programmer, C++

DBA-ORA

Database Administrator, Oracle

DBA-DB2

Database Administrator, IBM DB2

DBA-SQLSERV

Database Administrator, MS SQL Server

SYS-1

Systems Analyst, level 1

SYS-2

Systems Analyst, level 2

NW-CIS

Network Administrator, Cisco experience

WD-CF

Web Developer, ColdFusion

TEC’s management wants to keep track of the following entities: COMPANY, OPENING, QUALIFICATION, CANDIDATE, JOB_HISTORY, PLACEMENT, COURSE, and SESSION. Given that information, do the following: f.

Draw the Crow’s Foot ERDs for this enterprise.

g. Identify all necessary relationships. h. Identify the connectivity for each relationship. i.

Identify the mandatory and optional dependencies for the relationships.

Resolve all M:N relationships.

The solutions for Problems 13a–13e are shown in Figure P4.13.

517

FIGURE P4.13 TEC Solution ERD

518

To help the students understand Figure P4.13’s ER diagram’s components better, the following discussion is likely to be useful: 



COMP_CODE

OPENING_NUM

West

East

OPENING_NUM

COMP_CODE

10025

West

10026

West

10027

East

519



Similarly, the relationship between PLACEMENT and OPENING may be defined as strong or weak. We have used a weak relationship between OPENING and PLACEMENT.



QUAL_CODE

CAND_NUM

EDUC_DATE

PRG-PY

4358

12-Dec-00

PRG-C++

4358

05-Mar-03

DBA-ORA

4358

23-Nov-01

DBA-DB2

2113

02-Jun-85

DBA-ORA

2113

26-Jan-02

520



599. Use the following description of the operations of the RC_Charter2 Company to complete this exercise: Answer: 



521



Destination 180 miles

Intermediate Stop

200 miles 390 miles

Pax Pickup 130 miles

Home Base FIGURE P4.14 Round-Trip Mile Determination 

Depending on whether a customer has RC_Charter2 credit authorization, the customer may do the following:

g. Pay the entire charter bill upon the completion of the charter flight. h. Pay a part of the charter bill and charge the remainder to the account. The charge amount may not exceed the available credit. i.

Charge the entire charter bill to the account. The charge amount may not exceed the available credit.

Customers may pay all or part of the existing balance for previous charter trips. Such payments may be made at any time and are not necessarily tied to a specific charter trip. The charter mileage charge includes the expense of the pilot(s) and other crew required by FAR 135. However, if customers request additional crew not required by FAR 135, those customers are charged for the crew members on an hourly basis. The hourly crewmember charge is based on each crew member’s qualifications.

k. The database must be able to handle crew assignments. Each charter trip requires the use of an aircraft, and a crew flies each aircraft. The smaller, piston-engine charter aircraft require a crew consisting of only a single pilot. All jets and other aircraft that have a gross takeoff weight of at least 12,500 pounds require a pilot and a copilot, while some of the larger aircraft used to transport passengers may require flight attendants as part of the crew. Some of the older aircraft require the assignment of a flight engineer, and larger

522

cargo-carrying aircraft require the assignment of a loadmaster. In short, a crew can consist of more than one person, and not all crew members are pilots. l.



Although pilot licenses and ratings are not time limited, exercising the privilege of the license and ratings under Part 135 requires both a current medical certificate and a current Part 135 checkride. The following distinctions are important: c. The medical certificate may be Class I or Class II. The Class I medical is more stringent than the Class II, and it must be renewed every six months. The Class II medical must be renewed yearly. If the Class I medical is not renewed during the six-month period, it automatically reverts to a Class II certificate. If the Class II medical is not renewed within the specified period, it automatically reverts to a Class III medical, which is not valid for commercial flight operations. d. A Part 135 checkride is a practical flight examination that must be successfully completed every six months. The checkride includes all flight maneuvers and procedures specified in Part 135.

523

Table P4.14 PART A TESTS

Test Code

Test Description

Test Frequency

Part 135 Flight Check

6 months

Medical, Class I

6 months

Medical, Class II

12 months

Loadmaster Practical

12 months

Flight Attendant Practical

12 months

Drug test

Random

Operations, written exam

6 months

524

PART B RESULTS

Employee

Test Code

Test Date

Test Result

101

12-Nov-21

Pass-1

103

23-Dec-21

Pass-1

112

23-Dec-21

Pass-2

103

11-Jan-22

Pass-1

112

16-Jan-22

Pass-1

101

16-Jan-22

Pass-1

101

11-Feb-22

Pass-2

125

15-Feb-22

Pass-1

PART C LICENSES AND CERTIFICATIONS

License or Certificate

License or Certificate Description

ATP

Airline Transport Pilot

Comm

Commercial license

Med-1

Medical certificate, Class I

Med-2

Medical certificate, Class II

Instr

Instrument rating

MEL

Multiengine Land aircraft rating

Loadmaster

Flight Attendant

525

Employee

License or Certificate

Date Earned

101

Comm

12-Nov-1997

101

Instr

28-Jun-1998

101

MEL

9-Aug-1998

103

Comm

21-Dec-1999

112

23-Jun-2006

103

Instr

18-Jan-2000

112

27-Nov-2009

Pilots and other crew members must receive recurrency training appropriate to their work assignments. Recurrency training is based on an FAA-approved curriculum that is job specific. For example, pilot recurrency training includes a review of all applicable Part 135 flight rules and regulations, weather data interpretation, company flight operations requirements, and specified flight procedures. The RC_Charter2 Company is required to keep a complete record of all recurrency training for each crew member subject to the training. The RC_Charter2 Company is required to maintain a detailed record of all crew credentials and all training mandated by Part 135. The company must keep a complete record of each requirement and of all compliance data. To conduct a charter flight, the company must have a properly maintained aircraft available. A pilot who meets all of the FAA’s licensing and currency requirements must fly the aircraft as Pilot in Command (PIC). For aircraft that are powered by piston engines or turboprops and have a gross takeoff weight under 12,500 pounds, single-pilot operations are permitted under Part 135 as long as a properly maintained autopilot is available. However, even if FAR Part 135 permits single-pilot operations, many customers require the presence of a copilot who is capable of conducting the flight operations under Part 135. The RC_Charter2 operations manager anticipates the lease of turbojet-powered aircraft, which are required to have a crew consisting of a pilot and copilot. Both the pilot and copilot must meet the same Part 135 licensing, ratings, and training requirements. The company also leases larger aircraft that exceed the 12,500-pound gross takeoff weight. Those aircraft might carry enough passengers to require the presence of one or more flight attendants. If those aircraft carry cargo that weighs more than 12,500 pounds, a loadmaster must be assigned as a crew member to supervise the loading and securing of the cargo. The database must be designed to meet the anticipated capability for additional charter crew assignments. c. Given this incomplete description of operations, write all applicable business rules to establish entities, relationships, optionalities, connectivities, and cardinalities. (Hint: Use the

526

following four business rules as examples, and write the remaining business rules in the same format.) 

Each charter trip is requested by only one customer.



Some customers have not yet requested a charter trip.



An employee may be assigned to serve as a crew member on many charter trips.



Each charter trip may have many employees assigned to serve as crew members.

d. Draw the fully labeled and implementable Crow’s Foot ERD based on the business rules you wrote in Part a of this problem. Include all entities, relationships, optionalities, connectivities, and cardinalities. The following business rules can be derived from the description of operations: 

A customer may request many charter trips.



Each charter trip is requested by only one customer.



Some customers have not (yet) requested a charter trip.



Every charter trip is requested by at least one customer.



An employee may be assigned to serve as a crew member on many charter trips.



Each charter trip may have many employees assigned to it to serve as crew members.



An employee may not yet have been assigned to serve as a crew member on any charter trip.



A charter trip may not yet have any employee assigned to serve as a crew member.



Each customer may make many payments.



Some customers have not made any payments yet.



Every payment is made by only one customer.



Every payment must have been made by a customer.



A payment may be toward many charter trips.



A payment may not be in reference to any charter trip.



Every charter trip must have a payment made.



Each charter trip has only one payment.



Every charter trip involves the use of a single aircraft.



Every charter trip requires at least one aircraft.



An aircraft may be used for many charter trips.



An aircraft may not yet have been used for any charter trip.



Each aircraft is only one model airplane.



Every aircraft has a model designation.



An airplane model is not required to be associated with any aircraft that the company owns.



The company may own many aircraft of a given model.

527



A given flight assignment may be given to many crew members.



Some flight assignments may not have ever been given to any crew member.



Every crew member assignment is associated with a flight assignment.



Every crew member assignment is associated with only one flight assignment.



An employee may have taken many tests.



Some employees may have taken no tests yet.



A test may be taken by many employees.



A test may not have been taken by any employee yet.



Each employee has one job with the company.



Every employee has only one job with the company.



A job may be done by many employees.



A job may be currently unfilled and not be associated with any employee.



An employee may be a pilot, and every pilot is an employee.



A pilot may have earned many ratings.



Some pilots have not earned any rating yet.



A rating may be earned by many pilots.



Some ratings are not held by any pilots.



A pilot may have many licenses.



A pilot may not have any license yet.



A license may be held by many pilots.



A license may not be held by any pilot yet.



Every employee can have many qualifications.



Some employees do not have any qualifications.



Each qualification can be held by many employees.



Some qualifications are not held by any employee.

The completed ERD is shown in Figure P4.14b.

528

FIGURE P4.14b The RC_Charter2 Flight Department Crow’s Foot ERD

529

ANSWERS TO REVIEW QUESTIONS 600. What is an entity supertype, and why is it used? Answer: An entity supertype is a generic entity type that is related to one or more entity subtypes, where the entity supertype contains the common characteristics and the entity subtypes contain the unique characteristics of each entity subtype. The reason for using supertypes is to minimize the number of nulls and to minimize the likelihood of redundant relationships. 601. What kinds of data would you store in an entity subtype? Answer: An entity subtype is a more specific entity type that is related to an entity supertype, where the entity supertype contains the common characteristics and the entity subtypes contain the unique characteristics of each entity subtype. The entity subtype will store the data that is specific to the entity; that is, attributes that are unique to the subtype. 602. What is a specialization hierarchy? Answer: A specialization hierarchy depicts the arrangement of higher-level entity supertypes (parent entities) and lower-level entity subtypes (child entities). To answer the question precisely, we have used the text’s Figure 5.2. (We have reproduced the figure here for your convenience.) Figure 5.2 shows the specialization hierarchy formed by an EMPLOYEE supertype and three entity subtypes—PILOT, MECHANIC, and ACCOUNTANT.

530

(Text) FIGURE 5.2 A Specialization Hierarchy

The specialization hierarchy shown in Figure 5.2 reflects the 1:1 relationship between EMPLOYEE and its subtypes. For example, a PILOT subtype occurrence is related to one instance of the EMPLOYEE supertype, and a MECHANIC subtype occurrence is related to one instance of the EMPLOYEE supertype. See Question 5 for the discussion of overlapping and disjoint subtypes. 603. What is a subtype discriminator? Give an example of its use. Answer: A subtype discriminator is the attribute in the supertype entity that is used to determine to which entity subtype the supertype occurrence is related. For any given supertype occurrence, the value of the subtype discriminator will determine which subtype the supertype occurrence is related to. For example, an EMPLOYEE supertype may include the EMP_TYPE value “P” to indicate the PROFESSOR subtype. Using Figure 5.2, the EMP_TYPE subtype discriminator attribute would have a value to represent a pilot (“P”), a mechanic (“M”), or an accountant (“A”). Notice that this is a disjoint constraint on the subtype discriminator.

531

604. What is an overlapping subtype? Give an example. Answer: Overlapping subtypes are subtypes that contain non unique subsets of the supertype entity set; that is, each entity instance of the supertype may appear in more than one subtype. For example, in a university environment, a person may be an employee or a student or both. In turn, an employee may be a professor as well as an administrator. Because an employee also may be a student, STUDENT and EMPLOYEE are overlapping subtypes of the supertype PERSON, just as PROFESSOR and ADMINISTRATOR are overlapping subtypes of the supertype EMPLOYEE. The text’s Figure 5.4 (reproduced next for your convenience) illustrates overlapping subtypes with the use of the letter O inside the category shape.

(Text) FIGURE 5.4 Specialization Hierarchy with Overlapping Subtypes

605. What is a disjoint subtype? Give an example. Answer: Disjoint subtypes, also known as nonoverlapping subtypes, are subtypes that contain a unique subset of the supertype entity set; in other words, each entity instance of the supertype can appear in only one of the subtypes. For example, in Figure 5.2, shown in Question 3, an employee (supertype) who is a pilot (subtype) can appear only in the PILOT subtype, not in any of the other subtypes. In an ERD, such disjoint subtypes are indicated by the letter d inside the category shape. See Figure 5.2 in textbook or in Question 3. Also, see Figure 5.5 Disjoint and Overlapping Subtypes in the textbook.

532

NOTE There are multiple ER notations to represent supertypes/subtypes. Please consult the documentation of the ER diagramming tool you are using. 606. What is the difference between partial completeness and total completeness? Answer: Partial completeness means that not every supertype occurrence is a member of a subtype; that is, there may be some supertype occurrences that are not members of any subtype. Total completeness means that every supertype occurrence must be a member of at least one subtype. For Questions 8–10, refer to Figure Q5.8

FIGURE Q5.8 The PRODUCT Data Model

607. List all of the attributes of a movie. Answer: Recall that the subtype inherits all of the attributes and relationships of the supertype. Therefore, all of the attributes of a subtype include the common attributes from the supertype plus the unique (unique to that subtype) attributes from the subtype. All of the attributes of a movie would be: 

Prod_Num



Prod_Title



Prod_ReleaseDate



Prod_Price



Prod_Type



Movie_Rating



Movie_Director

533

608. According to the data model, is it required that every entity instance in the PRODUCT table be associated with an entity instance in the CD table? Why or why not? Answer: No. The completeness constraint for the data model shows a total completeness constraint from PRODUCT to the subtypes. However, the total completeness constraint indicates that every instance in the supertype (PRODUCT) must be associated with one row in some subtype, not all subtypes. Since the subtypes are designated as disjoint, or exclusive, then every row in the supertype is associated a row in only one subtype. For some products that subtype will be CD, but for other products the subtype will be either Movie or Book. 609. Is it possible for a book to appear in the BOOK table without appearing in the PRODUCT table? Why or why not? Answer: No. Subtypes can only exist within the context of a supertype. 610. What is an entity cluster, and what advantages are derived from its use? Answer: An entity cluster is a “virtual” entity type used to represent multiple entities and relationships in the ERD. An entity cluster is formed by combining multiple interrelated entities into a single abstract entity object. An entity cluster is considered “virtual” or “abstract” in the sense that it is not actually an entity in the final ERD, but rather a temporary entity used to represent multiple entities and relationships with the purpose of simplifying the ERD and thus enhancing its readability. 611. What primary key characteristics are considered desirable? Explain why each characteristic is considered desirable. Answer: Desirable PK characteristics are summarized in the text’s Table 5.3, reproduced below for your convenience. The table also includes the reason why each characteristic is desirable. (See the Rationale column.)

534

(Text) TABLE 5.3 Desirable Primary Key Characteristics PK Characteristic

Rationale

Unique values

The PK must uniquely identify each entity instance. A primary key must be able to guarantee unique values. It cannot contain nulls.

Nonintelligent

No change over time

Preferably single-attribute

Preferably numeric

Security complaint

535

612. Under what circumstances are composite primary keys appropriate? Answer: Composite primary keys are particularly useful in two cases: 

As identifiers of composite entities, where each primary key combination is allowed only once in the M:N relationship.



As identifiers of weak entities, where the weak entity has a strong identifying relationship with the parent entity.

(Text) FIGURE 5.7 The M:N Relationship Between Student and Class

As shown in the text’s Figure 5.7, the composite primary key automatically provides the benefit of ensuring that there cannot be duplicate values—that is, it ensures that the same student cannot enroll more than once in the same class. In the second case, a weak entity in a strong identifying relationship with a parent entity is normally used to represent one of two cases: 3. A real-world object that is existent dependent on another real-world object. Those types of objects are distinguishable in the real world. A dependent and an employee are two separate people who exist independent of each other. However, such objects can exist in the model only when they relate to each other in a strong identifying relationship. For example, the relationship between EMPLOYEE and DEPENDENT is one of existence dependency in which the primary key of the dependent entity is a composite key that contains the key of the parent entity.

536

4. A real-world object that is represented in the data model as two separate entities in a strong identifying relationship. For example, the real-world invoice object is represented by two entities in a data model: INVOICE and LINE. Clearly, the LINE entity does not exist in the real world as an independent object, but rather as part of an INVOICE. In both cases, having a strong identifying relationship ensures that the dependent entity can exist only when it is related to the parent entity. In summary, the selection of a composite primary key for composite and weak entity types provides benefits that enhance the integrity and consistency of the model. 613. What is a surrogate primary key, and when would you use one? Answer: A surrogate primary key is an “artificial” PK that is used to uniquely identify each entity occurrence when there is no good natural key available or when the “natural” PK would include multiple attributes. A surrogate PK is also used if the natural PK would be a long text variable. The reason for using a surrogate PK is to ensure entity integrity, to simplify application development by making queries simpler, to ensure query efficiency—for example, a query based on a simple numeric attribute is much faster than one based on a 200-bit character string—and to ensure that relationships between entities can be created more easily than would be the case with a composite PK that may have to be used as an FK in a related entity. 614. When implementing a 1:1 relationship, where should you place the foreign key if one side is mandatory and one side is optional? Should the foreign key be mandatory or optional? Answer: Section 5.4.1 provides a detailed discussion. The text’s Table 5.5, reproduced here for your convenience, shows the rationale for selecting the foreign key in a 1:1 relationship based on the relationship properties in the ERD.

(Text) TABLE 5.5 Selection of Foreign Key in a 1:1 Relationship Case

ER Relationship Constraints

Action

One side is mandatory and Place the PK of the entity on the mandatory the other side is optional. side in the entity on the optional side as an FK and make the FK mandatory.

Both sides are optional.

Select the FK that causes the fewest nulls, or place the FK in the entity in which the (relationship) role is played.

III

Both sides are mandatory.

See Case II or consider revising your model to ensure that the two entities do not belong together in a single entity.

537

615. What is time-variant data, and how would you deal with such data from a database design point of view? Answer: As the label implies, time-variant data are time-sensitive. For example, if a university wants to keep track of the history of all administrative appointments by date of appointment and date of termination, you see time-variant data at work. Other examples of time-variant data are stock prices; they vary multiple times during the day. Generally, an accepted design practice is to record the opening and closing stock price. In other cases, such as product prices (they change over a period of time), it is a common practice to store current prices in the product table and past prices in a related table with a date field to represent the date the price changed. Also, the teacher could use Figure 5.11 to illustrate a history of the various jobs and salaries a person had over time. 616. What is the most common design trap, and how does it occur? Answer: A design trap occurs when a relationship is improperly or incompletely identified and therefore, it is represented in a way that is not consistent with the real world. The most common design trap is known as a fan trap. A fan trap occurs when you have one entity in two 1:M relationships to other entities, thus producing an association among the other entities that is not expressed in the model.

538

FIGURE P5.1 Two-Bit Drilling Company ERD

539

617. Given the following business scenario, create a Crow’s Foot ERD using a specialization hierarchy if appropriate. Tiny Hospital keeps information on patients and hospital rooms. The system assigns each patient a patient ID number. In addition, the patient’s name and date of birth are recorded. Some patients are resident patients who spend at least one night in the hospital and others are outpatients who are treated and released. Resident patients are assigned to a room. Each room is identified by a room number. The system also stores the room type (private or semiprivate) and room fee. Over time, each room will have many patients. Each resident patient will stay in only one room. Every room must have had a patient, and every resident patient must have a room. Answer: The data model for this scenario is given in Figure P5.2 below.

FIGURE P5.2 Tiny Hospital ERD

Note that in this scenario, a specialization hierarchy is not appropriate. While resident patients are an identifiable kind or type of patient instance, there are not additional attributes that are unique to only that kind or type of patient. Participation in a relationship that is unique to a particular kind or type of instance is not sufficient justification for a specialization hierarchy. Indicating that only some instances will participate in a relationship is addressed by the optional participation designation. In this scenario, all resident patients must have a room; however, not all patients are resident patients so ROOM is optional to patient. If students ask about the need for an attribute to distinguish between outpatients and resident patients, remind them that in this limited scenario the only distinction between outpatients and resident patients is whether or not they are associated with a room. Therefore, the Room_Num foreign key in the PATIENT table can serve in that capacity. 618. Given the following business scenario, create a Crow’s Foot ERD using a specialization hierarchy if appropriate. Granite Sales Company keeps information on employees and the departments in which they work. For each department, the department name, internal mailbox number, and office phone extension are kept. A department can have many assigned employees, and each employee is assigned to only one department. Employees can be salaried, hourly, or work on contract. All employees are assigned an employee number, which is kept along with the employee’s name and address. For hourly employees, hourly wages and target weekly work hours are stored; for example, the company may target 40 hours/week for some employees, 32 for others, and 20 for others. Some salaried employees are salespeople who can earn a commission in addition to their base salary. For all salaried employees, the yearly salary amount is recorded in the system. For salespeople, their commission percentage on sales and commission percentage on profit are stored in the system. For example, John is a salesperson with a base salary of $50,000 per year plus a 2 percent commission on the sales price for all sales he makes, plus another 5 percent of the profit on each of those sales. For contract employees, the beginning date and end date of their contracts are stored along with the billing rate for their hours. Answer: The data model for this scenario is given in Figure P5.3 below.

540

541

5. In Chapter 4, you saw the creation of the Tiny College database design, which reflected such business rules as “a professor may advise many students” and “a professor may chair one department.” Modify the design shown in Figure 4.35 to include these business rules: 

An employee could be staff, a professor, or an administrator.



A professor may also be an administrator.



Staff employees have a work-level classification, such as Level I or Level II.



Only professors can chair a department. A department is chaired by only one professor.



Only professors can serve as the dean of a college. Each of the university’s colleges is served by one dean.



A professor can teach many classes.



Administrators have a position title.

Given that information, create the complete ERD that contains all primary keys, foreign keys, and main attributes. Answer: The solution is shown in Figure P5.4 below.

542

FIGURE P5.4 Updated Tiny College ERD

543

FIGURE P5.5 Tiny College Job History ERD Segment

544

9. Some Tiny College staff employees are information technology (IT) personnel. Some IT personnel provide technology support for academic programs, some provide technology infrastructure support, and some provide support for both. IT personnel are not professors; they are required to take periodic training to retain their technical expertise. Tiny College tracks all IT personnel training by date, type, and results (completed versus not completed). Given that information, create the complete ERD that contains all primary keys, foreign keys, and main attributes. Answer: This problem provides an opportunity to reinforce the idea that to qualify as a subtype, the identifiable kind or type of instance must include additional attributes—being an identifiable kind or type of entity instance is necessary but not sufficient to justify the creation of subtypes. Given the minimal attributes specified in the problem, the solution would be as shown in Figure 5.6A.

FIGURE 5.6A Minimal Tiny College IT Staffing Solution

545

FIGURE 5.6B Expanded Tiny College IT Staffing Solution

546

10. The FlyRight Aircraft Maintenance (FRAM) division of the FlyRight Company (FRC) performs all maintenance for FRC’s aircraft. Produce a data model segment that reflects the following business rules: 

All mechanics are FRC employees. Not all employees are mechanics.



FRC keeps an employment history of all mechanics. The history includes the date hired, date promoted, and date terminated.

Given those requirements, create the Crow’s Foot ERD segment. Answer: The solution is shown in the following figure:

547

11. “Martial Arts R Us” (MARU) needs a database. MARU is a martial arts school with hundreds of students. The database must keep track of all the classes that are offered, who is assigned to teach each class, and which students attend each class. Also, it is important to track the progress of each student as they advance. Create a complete Crow’s Foot ERD for these requirements: 

Students are given a student number when they join the school. The number is stored along with their name, date of birth, and the date they joined the school.



Answer: The solution for this case is shown in Figure P5.8 below.

548

FIGURE P5.8 MARU ERD Solution

549

The case also provides an opportunity to reinforce the fact that subtypes inherit not only the attributes of the supertype but also the relationships. One requirement of the case is that the system must be able to track which instructors actually taught each class meeting. There is already a M:N relationship between STUDENT and MEETING that can be implemented with the ATTENDANCE bridge entity using only the Stu_Num and Meet_Num attributes. Students should consider that because INSTRUCTOR is a subtype of STUDENT, instructors are already associated in a M:N relationship with MEETING through that same bridge. By adding the Attend_Role attribute to ATTENDANCE, the bridge entity can properly track all students in a given class meeting and record what role they played in that meeting (e.g., student, assistant instructor, or head instructor). Finally, it is worth pointing out to the students that requirements are described as being an attribute of a rank. Some students will immediately consider requirements to be an entity, while others will model requirement as an attribute of the RANK entity. Considering rank requirements to be an attribute of RANK is perfectly acceptable—however, it must be noted that as such rank requirements would be a multivalued attribute. Therefore, the preferred implementation of a multivalued attribute (creating a new entity for the multivalued attribute) would result in the creation of the REQUIREMENT table anyway. So either way the student approaches the problem, it will eventually lead to the solution shown above. 10. The Journal of E-commerce Research Knowledge is a prestigious information systems research journal. It uses a peer-review process to select manuscripts for publication. Only about 10 percent of the manuscripts submitted to the journal are accepted for publication. A new issue of the journal is published each quarter. Create a complete ERD to support the business needs described below. 



550



Answer: The solution for this case is shown in Figure P5.9 below.

551

FIGURE P5.9 Journal of E-Commerce Research Knowledge ERD Solution

552



553



Answer: The solution for this case is shown in Figure P5.10 below.

554

FIGURE P5.10 Global Unified Technology Sales ERD Solution

619. Global Computer Solutions (GCS) is an information technology consulting company with many offices throughout the United States. The company’s success is based on its ability to maximize its resources—that is, its ability to match highly skilled employees with projects according to region. To better manage its projects, GCS has contacted you to design a database so GCS managers can keep track of their customers, employees, projects, project schedules, assignments, and invoices.

555

The GCS database must support all of GCS’s operations and information requirements. A basic description of the main entities follows: 

The employees of GCS must have an employee ID, a last name, a middle initial, a first name, a region, and a date of hire recorded in the system.



Valid regions are as follows: Northwest (NW), Southwest (SW), Midwest North (MN), Midwest South (MS), Northeast (NE), and Southeast (SE).



Each employee has many skills, and many employees have the same skill.



TABLE P5.11A Skill

Employee

Data Entry I

Seaton Amy; Williams Josh; Underwood Trish

Data Entry II

Williams Josh; Seaton Amy

Systems Analyst I

Craig Brett; Sewell Beth; Robbins Erin; Bush Emily; Zebras Steve

Systems Analyst II

Chandler Joseph; Burklow Shane; Robbins Erin

DB Designer I

Yarbrough Peter; Smith Mary

DB Designer II

Yarbrough Peter; Pascoe Jonathan

Java I

Kattan Chris; Epahnor Victor; Summers Anna; Ellis Maria

Java II

Kattan Chris; Epahnor Victor, Batts Melissa

C++ I

Smith Jose; Rogers Adam; Cope Leslie

C++ II

Rogers Adam; Bible Hanah

Python I

Zebras Steve; Ellis Maria

Python II

Zebras Steve; Newton Christopher

ColdFusion I

Duarte Miriam; Bush Emily

ColdFusion II

Bush Emily; Newton Christopher

ASP I

Duarte Miriam; Bush Emily

556

Skill

Employee

ASP II

Duarte Miriam; Newton Christopher

Oracle DBA

Smith Jose; Pascoe Jonathan

SQL Server DBA

Yarbrough Peter; Smith Jose

Network Engineer I

Bush Emily; Smith Mary

Network Engineer II

Bush Emily; Smith Mary

Web Administrator

Bush Emily; Smith Mary; Newton Christopher

Technical Writer

Kilby Surgena; Bender Larry

Project Manager

Paine Brad; Mudd Roger; Kenyon Tiffany; Connor Sean



GCS has many customers. Each customer has a customer ID, name, phone number, and region.



557

TABLE P5.11B Project ID:

Description: Sales Management System

Company :

See Rocks

Contract Date: 2/12/2022

Region: NW

Start Date:

3/1/2022

End Date:

Budget: $15,500

Start Date

End Date

Task Description

Skill(s) Required

3/1/18

3/6/22

Initial Interview

Project Manager

Systems Analyst II

DB Designer I

7/1/2022

Quantity Required

3/11/18

3/15/22

Database Design

DB Designer I

3/11/18

4/12/22

System Design

Systems Analyst II

Systems Analyst I

3/18/18

3/22/22

Database Implementation

Oracle DBA

3/25/18

5/20/22

System Coding & Testing

Java I

Java II

Oracle DBA

3/25/18

6/7/22

System Documentation

Technical Writer

6/10/18

6/14/22

Final Evaluation

Project Manager

Systems Analyst II

DB Designer I

Java II

Project Manager

Systems Analyst II

DB Designer I

Java II

Project Manager

6/17/18

7/1/18

6/21/22

7/1/22

On site System Online and Data Loading

Sign-Off

558



TABLE P5.11C Project ID: 1

Description: Sales Management System

Company: See Rocks

Contract Date: 2/12/2022

SCHEDULED

As of: 03/29/22 ACTUAL ASSIGNMENTS

Project Task

Start Date

End Date

Skill

Employee

Start Date

End Date

Initial Interview

3/1/22

3/6/22

Project Mgr.

101—Connor S.

3/1/22

3/6/22

Sys. Analyst II

102—Burklow S.

3/1/22

3/6/22

DB Designer I

103—Smith M.

3/1/22

3/6/22 3/14/22

Database Design

3/11/22

3/15/22

DB Designer I

104—Smith M.

3/11/22

System Design

3/11/22

4/12/22

Sys. Analyst II

105—Burklow S.

3/11/22

Sys. Analyst I

106—Bush E.

3/11/22

Sys. Analyst I

107—Zebras S.

3/11/22

Database Implementation

3/18/22

3/22/22

Oracle DBA

108—Smith J.

3/15/22

System Coding & Testing

3/25/22

5/20/22

Java I

109—Summers A.

3/21/22

3/19/22

559

Java I

110—Ellis M.

3/21/22

Java II

111—Ephanor V.

3/21/22

Oracle DBA

112—Smith J.

3/21/22

113—Kilby S.

3/25/22

System Documentation

3/25/22

6/7/22

Tech. Writer

Final Evaluation

6/10/22

6/14/22

Project Mgr. Sys. Analyst II DB Designer I Java II

On site System Online and Data Loading

6/17/22

6/21/22

Project Mgr. Sys. Analyst II DB Designer I Java II

Sign-Off

7/1/22

Project Mgr.

560

TABLE P5.11D Employee Name

Week Ending

Assignment Number

Burklow S.

3/1/22

1-102

xxx

Connor S.

3/1/22

1-101

xxx

Smith M.

3/1/22

1-103

xxx

Burklow S.

3/8/22

1-102

xxx

Connor S.

3/8/22

1-101

xxx

Smith M.

3/8/22

1-103

xxx

Burklow S.

3/15/22

1-105

xxx

Bush E.

3/15/22

1-106

xxx

Smith J.

3/15/22

1-108

xxx

Smith M.

3/15/22

1-104

xxx

Zebras S.

3/15/22

1-107

xxx

Burklow S.

3/22/22

1-105

Bush E.

3/22/22

1-106

Ellis M.

3/22/22

1-110

Ephanor V.

3/22/22

1-111

Smith J.

3/22/22

1-108

Smith J.

3/22/22

1-112

Summers A.

3/22/22

1-109

Zebras S.

3/22/22

1-107

Burklow S.

3/29/22

1-105

Bush E.

3/29/22

1-106

Ellis M.

3/29/22

1-110

Hours Worked

Bill Number

561

Employee Name

Week Ending

Assignment Number

Ephanor V.

3/29/22

1-111

Kilby S.

3/29/22

1-113

Smith J.

3/29/22

1-112

Summers A.

3/29/22

1-109

Zebras S.

3/29/22

1-107

Hours Worked

Bill Number

Note: xxx represents the bill ID. Use the one that matches the bill number in your database. 

Create all of the required tables and required relationships.



Create the required indexes to maintain entity integrity when using surrogate primary keys.



Populate the tables as needed, as indicated in the sample data and forms.

Evaluation of primary keys and surrogate keys. (When should each one be used?)



Evaluation of the use of indexes on candidate keys to avoid duplicate entries when using surrogate keys.



Evaluation of the use of redundant relationships. In some cases, it is better to have the foreign key attribute added to an entity, instead of using multiple join operations.

Divide the class in groups of three students per group.



Distribute the GCS database case to all students.



Assign a deadline for the groups to submit an initial design ERD with written explanations of the ERD components and features. This deadline should be two

562

weeks from the assignment date. (While the groups are working on the design phase, students will be learning to use SQL to generate information.) 



Please note that there are two database files available: 



Figure P5-11A shows the sample tables in the GCSdata.mdb student database.

563

FIGURE P5-11A GCS Student Sample Database Tables

The GCSdata-sol.mdb file contains the solution for this design case. Figure P5-11B shows the relational diagram for the solution.

564

FIGURE P5.11B Relational Diagram for the GCS Database

To help your students understand the ERD, use Table P5.11 to describe the main tables and the main indexes that are appropriate for this design implementation.

TABLE P5.11 ERD Documentation Table Name

Primary key

Unique, Not Null Index (on candidate key)

Explanation

Customer

cus_id (surrogate)

unique(cus_name)

The unique index on cus_name is used to ensure no duplicate customers exist.

Region

region_id (surrogate)

unique(region_name)

The unique index on region_name is used to ensure that no duplicate regions are entered.

Employee

emp_id (surrogate)

unique(emp_lname, emp_fname, emp_mi)

The unique index on emp_lname, emp_fname, and emp_mi is used to ensure that no duplicate employees are entered.

Skill

skill_id (surrogate)

unique(skill_description)

The unique index on skill_description is used to ensure that no duplicate skills are entered.

565

Table Name

Primary key

Unique, Not Null Index (on candidate key)

skill_id

Explanation

EmpSkill

emp_id, (composite)

The composite primary key ensures that no duplicate skills are entered for each employee.

Project

prj_id (surrogate)

unique(cus_id, prj_description)

The unique index on cus_id and prj_description is used to ensure that no duplicate project entries exist for a given customer.

Task (project schedule)

task_id (surrogate)

unique(prj_id, task_descript)

The unique index on prj_id and task_descript is used to ensure that no duplicate task is given for the same project.

TS (task ts_id (surrogate) schedule)

unique(task_id, skill_id)

The unique index on task_id and skill_id is to prevent duplicate listings for a single skill within a single task for a single project.

Assign

asn_id (surrogate)

unique (ps_id, emp_id, ts_id)

The unique index on ps_id, emp_id, and ts_id is used to ensure that an employee cannot be assigned twice to perform the same skill on the same task for a given project.

Worklog

wl_id (surrogate)

unique(asn_id, wl_date)

The unique indexes on asn_id and wl_date are used to ensure that no duplicate work log entries exist (for an employee) on a given date.

Bill

566

FIGURE P5.11C ERD for the GCS Database

567

ANSWERS TO REVIEW QUESTIONS 620. What is normalization? Answer: Normalization is the process for assigning attributes to entities. Properly executed, the normalization process eliminates uncontrolled data redundancies, thus eliminating the data anomalies and the data integrity problems that are produced by such redundancies. Normalization does not eliminate data redundancy; instead, it produces the carefully controlled redundancy that lets us properly link database tables. 621. When is a table in 1NF? Answer: A table is in 1NF when all the key attributes are defined (no repeating groups in the table) and when all remaining attributes are dependent on the primary key. However, a table in 1NF still may contain partial dependencies, that is, dependencies based on only part of the primary key and/or transitive dependencies that are based on a nonkey attribute. 622. When is a table in 2NF? Answer: A table is in 2NF when it is in 1NF and it includes no partial dependencies. However, a table in 2NF may still have transitive dependencies, that is, dependencies based on attributes that are not part of the primary key. 623. When is a table in 3NF? Answer: A table is in 3NF when it is in 2NF and it contains no transitive dependencies. 624. When is a table in BCNF? Answer: A table is in Boyce-Codd Normal Form (BCNF) when it is in 3NF and every determinant in the table is a candidate key. For example, if the table is in 3NF and it contains a nonprime attribute that determines a prime attribute, the BCNF requirements are not met. (Reference the text’s Figure 6.8 to support this discussion.) This description clearly yields the following conclusions: 

If a table is in 3NF and it contains only one candidate key, 3NF and BCNF are equivalent.



BCNF can be violated only if the table contains more than one candidate key. Putting it another way, there is no way that the BCNF requirement can be violated if there is only one candidate key.

625. Given the dependency diagram shown in Figure Q6.6, Answer Items 6a–6c. Answer:

FIGURE Q6.6 Dependency Diagram for Question 6

568

e. Identify and discuss each of the indicated dependencies. C1  C2 represents a partial dependency, because C2 depends only on C1, rather than on the entire primary key composed of C1 and C3. C4  C5 represents a transitive dependency, because C5 depends on an attribute (C4) that is not part of a primary key. C1, C3  C2, C4, C5 represents a set of proper functional dependencies, because C2, C4, and C5 depend on the primary key composed of C1 and C3. f.

Create a database whose tables are at least in 2NF, showing the dependency diagrams for each table. The normalization results are shown in Figure Q6.6b.

569

FIGURE Q6.6b The Dependency Diagram for Question 6b Table 1

Primary key: C1 Foreign key: None Normal form: 3NF

Table 2 C1

Primary key: C1 + C3 Foreign key: C1 (to Table 1) Normal form: 2NF, because the table exhibits the transitive dependencies C4 C5

g. Create a database whose tables are at least in 3NF, showing the dependency diagrams for each table. The normalization results are shown in Figure Q6.6c.

FIGURE Q6.6c The Dependency Diagram for Question 6c

Table 1 Primary key: C1 Foreign key: None Normal form: 3NF

Table 2 Primary key: C1 + C3 Foreign key: C1 (to Table 1) C4 (to Table 3) Normal form: 3NF

Table 3 Primary key: C4 Foreign key: None Normal form: 3NF

570

626. The dependency diagram in Figure Q6.7 indicates that authors are paid royalties for each book they write for a publisher. The amount of the royalty can vary by author, by book, and by edition of the book. Answer:

FIGURE Q6.7 Book Royalty Dependency Diagram

Students may have questions about the last sentence in the problem statement. Illustrate to the student the following facts to clarify this problem: If a book can only have one author, then you can imply that knowing the ISBN, you also know the author and the royalty rate. However, it is very common for a book to have multiple authors, and in that case, the authors may have the same or different royalty rates. The main point of this design is the flexibility of it. Flexibility is important because it “future-proofs” the data model by allowing the model to support changes in the business rules in the future. b. Based on the dependency diagram, create a database whose tables are at least in 2NF, showing the dependency diagram for each table. The normalization results are shown in Figure Q6.7a.

571

FIGURE Q6.7a The 2NF Normalization Results for Question 7a

b. Create a database whose tables are at least in 3NF, showing the dependency diagram for each table. The normalization results are shown in Figure Q6.7b.

FIGURE Q6.7b The 3NF Normalization Results for Question 7b

572

627. The dependency diagram in Figure Q6.8 indicates that a patient can receive many prescriptions for one or more medicines over time. Based on the dependency diagram, create a database whose tables are in at least 2NF, showing the dependency diagram for each table.

Answer: FIGURE Q6.8 Prescription Dependency Diagram

The normalization results are shown in Figure Q6.8a.

FIGURE Q6.8a The 2NF Normalization Results for Question 8

628. What is a partial dependency? With what normal form is it associated? Answer: A partial dependency exists when an attribute is dependent on only a portion of the primary key. This type of dependency is associated with 1NF. The second normal form (2NF) eliminates partial dependencies.

573

629. What three data anomalies are likely to be the result of data redundancy? How can such anomalies be eliminated? Answer: The most common anomalies considered when data redundancy exists are: update anomalies, addition anomalies, and deletion anomalies. All these can easily be avoided through data normalization. Data redundancy produces data integrity problems, caused by the fact that data entry failed to conform to the rule that all copies of redundant data must be identical. 630. Define and discuss the concept of transitive dependency. Answer: Transitive dependency is a condition in which an attribute is dependent on another attribute that is not part of the primary key. This kind of dependency usually requires the decomposition of the table containing the transitive dependency. To remove a transitive dependency, the designer must perform the following actions: 

Place the attributes that create the transitive dependency in a separate table.



Make sure that the new table’s primary key attribute is the foreign key in the original table.

Figure Q6.11 shows an example of a transitive dependency removal.

FIGURE Q6.11 Transitive Dependency Removal

574

631. What is a surrogate key, and when should you use one? Answer: A surrogate key is an artificial PK introduced by the designer with the purpose of simplifying the assignment of primary keys to tables. Surrogate keys are usually numeric, they are often automatically generated by the DBMS, they are free of semantic content (they have no special meaning), and they are usually hidden from the end users. 632. Why is a table whose primary key consists of a single attribute automatically in 2NF when it is in 1NF? Answer: A dependency based on only a part of a composite primary key is called a partial dependency. Therefore, if the PK is a single attribute, there can be no partial dependencies. 633. How would you describe a condition in which one attribute is dependent on another attribute when neither attribute is part of the primary key? Answer: This condition is known as a transitive dependency. A transitive dependency is a dependency of one nonprime attribute on another nonprime attribute. (The problem with transitive dependencies is that they still yield data anomalies.) 634. Suppose someone tells you that an attribute that is part of a composite primary key is also a candidate key. How would you respond to that statement? Answer: This argument is incorrect if the composite PK contains no redundant attributes. If the composite primary key is properly defined, all of the attributes that compose it are required to identify the remaining attribute values. By definition, a candidate key is one that can be used to identify all of the remaining attributes, but it was not chosen to be a PK for some reason. In other words, a candidate key can serve as a primary key, but it was not chosen for that task for one reason or another. Clearly, a part of a proper (“minimal”) composite PK cannot be used as a PK by itself. More formally, you learned in Chapter 3, “The Relational Database Model,” Section 3-2, that a candidate key can be described as a superkey without redundancies, that is, a minimal superkey. Using this distinction, note that a STUDENT table might contain the composite key STU_NUM, STU_LNAME This composite key is a superkey, but it is not a candidate key because STU_NUM by itself is a candidate key! The combination STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE might also be a candidate key, as long as you discount the possibility that two students share the same last name, first name, initial, and phone number. If the student’s Social Security number had been included as one of the attributes in the STUDENT table—perhaps named STU_SOCSECNUM—both it and STU_NUM would have been candidate keys because either one would uniquely identify each student. In that case, the selection of STU_NUM as the primary key would be driven by the designer’s choice or by end-user requirements. Note, incidentally, that a primary key is a superkey as well as a candidate key. 635. A table is in ______ normal form when it is in ______ and there are no transitive dependencies. Answer: See the discussion in Section 6-3c, “Conversion to Third Normal Form (3NF).”

575

FIGURE P6.1a Initial Dependency Diagram for Problem 1

576

FIGURE P6.1b Revised Dependency Diagram for Problem 1

FIGURE P6.1c Resolving the First Transitive Dependency

Finally, the second and final transitive dependency can now be resolved as shown in the final dependency diagram in Figure P6.1d.

577

FIGURE P6.1d Final Dependency Diagram for Problem 1

Note that at this time, we have resolved all of the transitive dependencies. Decisions on whether or not to denormalize, and perhaps not remove the final transitive dependency, have yet to be made. Also, the structures have not yet had the benefit of additional design modifications such as achieving proper naming conventions for the attributes in the new tables. However, creating the fully normalized structures is an important set toward making informed decisions about the compromises in the design that we may choose to make. NOTE: Please note that we are making the assumption that a zip code only determines one city and state. Unfortunately, this is not true, there are a handful of zip codes that traverse states. In these cases, it would be appropriate not to use the [App_zip, App_City, App_State] relation and instead add these attributes to the previous relation. Hence, the relation would be: [App_PatiendID, App_Name, App_Phone, App_Street, App_City, App_Zip, App_State]. 636. Using the descriptions of the attributes given in the figure, convert the ERD shown in Figure P6.2 into a dependency diagram that is in at least 3NF. Answer: An initial dependency diagram depicting only the primary key dependencies is shown in Figure P6.2a below.

578

FIGURE P6.2a Initial Dependency Diagram for Problem 2

FIGURE P6.2b Revised Dependency Diagram for Problem 2

Resolving the partial dependency to achieve 2NF yields the dependency diagram shown in Figure P6.2c.

579

FIGURE P6.2c 2NF Dependency Diagram for Problem 2

Finally, the transitive dependency is resolved to achieve the 3NF solution shown in the final dependency diagram in Figure P6.2d.

580

FIGURE P6.2d Final Dependency Diagram for Problem 2

581

637. Using the INVOICE table structure shown in Table P6.3, do the following: Answer:

TABLE P6.3 Attribute Name

Sample Value

INV_NUM

211347

211348

211349

PROD_NUM

AA-E3422QW

QD-300932X

RU-995748G

AA-E3422QW

GH-778345P

SALE_DATE

15-Jan-2022

16-Jan-2022

PROD_LABEL

Rotary sander

0.25-in. bit

Rotary sander

Power drill

VEND_CODE

211

309

211

157

VEND_NAME

NeverFail, Inc.

BeGood, Inc.

NeverFail, Inc.

ToughGo, Inc.

QUANT_SOLD 1

PROD_PRICE

$3.45

$39.99

$49.95

$87.75

$49.95

drill Band saw

d. Write the relational schema, draw its dependency diagram and identify all dependencies, including all partial and transitive dependencies. You can assume that the table does not contain repeating groups and that an invoice number references more than one product. (Hint: This table uses a composite primary key.) Answer: The solutions to both problems (3a and 3b) are shown in Figure P6.3a.

NOTE We have combined the solutions to Problems 3a and 3b to let you illustrate the start of the normalization process within a single PowerPoint slide. Students generally seem to have an easier time understanding the normalization process if they can compare the normal forms directly. We will continue to use this technique for several of the initial normalization decompositions if the available PowerPoint slide space permits it. e. Remove all partial dependencies, write the relational schema, and draw the new dependency diagrams. Identify the normal forms for each table structure you created.

582

→

PROD_DESCRIPTION,

PROD_PRICE,

VEND_CODE,

(Hint: Your actions should produce three dependency diagrams.)

FIGURE P6.3a The Dependency Diagrams for Problems 3a and 3b

Remove all transitive dependencies, write the relational schema, and draw the new dependency diagrams. Also identify the normal forms for each table structure you created. Answer: To illustrate the effect of Problem 3’s complete decomposition, we have shown Problem 3a’s dependency diagram again in Figure P6.3c.

583

FIGURE P6.3c The Dependency Diagram for Problem 3c

h. Draw the Crow’s Foot ERD.

584

FIGURE P6.3d The Invoicing ERD and Its (Partial) Relational Diagram

Crow’s Foot Invoicing ERD

Invoicing Relational Diagram, Sample Attributes INVOICE INV_NUM INV_DATE

LINE 1

INV_NUM PROD_NUM NUM_SOLD

PRODUCT 1 M

PROD_NUM

VEND_CODE

PROD_DESCRIPTION PROD_PRICE

VENDOR

VEND_NAME M

VEND_CODE

585

638. Using the STUDENT table structure shown in Table P6.4, do the following: Answer:

TABLE P6.4 Attribute Name

Sample Value

STU_NUM

211343

200128

199876

198648

223456

STU_LNAME

Stephanos

Smith

Jones

Ortiz

McKulski

STU_MAJOR

Accounting

Marketing Marketing

Statistics

DEPT_CODE

ACCT

MKTG

MATH

DEPT_NAME

Accounting

Marketing Marketing

Mathematics

DEPT_PHONE

4356

4378

3420

COLLEGE_NAME

Business Admin

Arts & Sciences

ADVISOR_LNAME

Grastrand

Gentry

Tillery

Chen

ADVISOR_OFFICE

T201

T228

T356

J331

ADVISOR_BLDG

Torre Building

Jones Building

ADVISOR_PHONE

2115

2123

2159

3209

STU_GPA

3.87

2.78

2.31

3.45

3.58

STU_HOURS

117

113

STU_CLASS

Junior

Sophomore

Senior

Junior

MKTG

d. Write the relational schema and draw its dependency diagram. Identify all dependencies, including all transitive dependencies. Answer: The dependency diagram for Problem 4a is shown in Figure P6.4a.

586

FIGURE P6.4a The Dependency Diagram for Problem 4a

STU_NUM STU_LNAME STU_MAJOR DEPT_CODE DEPT_NAME DEPT_PHONE COLLEGE_NAME

Transitive Dependencies

ADV_LASTNAME ADV_OFFICE ADV_BUILDING ADV_PHONE STU_CLASS STU_GPA STU_HOURS

Transitive Dependency

As you discuss Figure 6.4a, note that the single attribute PK (STU_NUM) automatically places this table in 2NF, because it is not possible to have partial dependencies when the PK consists of a single attribute. The relational schema for the dependency diagram shown in Figure P6.4a is written as: STUDENT(STU_NUM, STU_LNAME, STU_MAJOR, DEPT_CODE, DEPT_NAME, DEPT_PHONE, ADVISOR_LNAME, ADVISOR_OFFICE, ADVISOR_BLDG, ADVISOR_PHONE, STU_GPA, STU_HOURS, STU_CLASS) Notice the ADVISOR_OFFICE values in Figure P6.4a show a literal prefix that we can interpret represents the building. Furthermore, notice how the first letter matches the building name. Based on this we say that there is a transitive dependency in which ADVISOR_OFFICE determines ADVISOR_BLDG. e. Write the relational schema and draw the dependency diagram to meet the 3NF requirements to the greatest practical extent possible. If you believe that practical considerations dictate using a 2NF structure, explain why your decision to retain 2NF is appropriate. If necessary, add or modify attributes to create appropriate determinants and to adhere to the naming conventions.

587

588

FIGURE P6.4b The Normalized Dependency Diagrams for Problem 4b

STU_NUM STU_LNAME STU_MAJOR DEPT_CODE ADV_NUM STU_CLASS STU_GPA STU_HRS

Transitive Dependency

Note: If several advisors share a phone, the ADV_PHONE is not a determinant of the other advisor attributes.

589

Using the results of Problem 4, draw the Crow’s Foot ERD.

FIGURE P6.4c The College ERD

As you examine the ER diagrams in Figure P6.4c, note that we have made several assumptions that cannot be inferred directly from the dependency diagram in Problem 4b. For example: 

Apparently, some buildings do not house advisors. Some buildings may be used for

590

storage, others for classrooms, and so on. 

When a student is assigned to a department, that department must assign an advisor to that student. That is, a student must have an advisor. Therefore, ADVISOR is mandatory to STUDENT.



Some departments do not offer majors. For example, a department may offer service courses only.



All departments must be affiliated with a college.



Notice also the there is a new relationship, a DEPARTMENT employs zero or many ADVISORs.



STUDENT is optional to MAJOR. This optionality, too, is desirable from an operational point of view. For example, new majors may not (yet) have attracted students.

591

639. To keep track of office furniture, computers, printers, and other office equipment, the FOUNDIT company uses the table structure shown in Table P6.5. Answer:

Table P6.5 Attribute Name

Sample Value

ITEM_ID

231134-678

342245-225

254668-449

ITEM_LABEL

HP DeskJet 895Cse

HP Toner

DT Scanner

ROOM_NUMBER

325

123

BLDG_CODE

NTC

CSF

BLDG_NAME

Nottooclear

Canseefar

BLDG_MANAGER

I. B. Rightonit

May B. Next

d. Given that information, write the relational schema and draw the dependency diagram. Make sure that you label the transitive and/or partial dependencies. Answer: The answers to this problem are shown in Figure P6.5a and the relational schema definition below the figure. Notice also the change in naming convention in some of the attributes. It is important to understand that sometimes the designer makes deductions based on the data as presented. However, such deductions may change as he/she learns more about the data and processes. For example, does the room number determine the building name? In this case, the answer depends on many factors. Can you determine the correct location of the item based on just the room number? Or do you also need the building code? These are the type of questions the designer must ask and adapt the model to the answer. The purpose is to build a model that is flexible enough to represent real-world data interactions. For this example, we will initially assume that the room number determines the building code and name.

FIGURE P6.5a The FOUNDIT Co. Initial Dependency Diagram

592

e. Write the relational schema and create a set of dependency diagrams that meet 3NF requirements. Rename attributes to meet the naming conventions and create new entities and attributes as necessary. Answer: The dependency diagrams are shown in Figure P6.5b. We have added a sample relational diagram to illustrate the relationships at this point. The relational schemas are written below in Figure 6.5b.

FIGURE P6.5b FOUNDIT Co. 3NF and Its Relational Diagram

The relational schemas are written as follows: ITEM (ITEM_ID, ITEM_DESCRIPTION, ROOM_NUMBER, BLDG_CODE) BUILDING (BLDG_CODE, BLDG_NAME, EMP_NUM) EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL)

593

Draw the Crow’s Foot ERD. Answer: Use Figure P6.5c to show that, in this case, the ER diagram reflects the business rule that one employee can manage many (or at least more than one) buildings. Because all employees are not required to manage buildings, BUILDING is optional to EMPLOYEE in the manages relationship. Once again, the nature of this relationship is not and cannot be reflected in the dependency diagram.

594

FIGURE P6.5c The FOUNDIT Co. ERD

As you examine Figure P6.5c, note that an EMPLOYEE can manage zero to many BUILDINGs. A BUILDING contains many ROOMs. Each ROOM is located in a single building. Therefore, you can expand the design shown in Figure P6.5b to the one shown in Figure P6.5c. This solution assumes that a room is directly traceable to a building. For example, room SC-508 would be located in the Science (SC) Building and room BA-305 would be located in the Business Administration (BA) building. A ROOM can store zero or many ITEMs. Although optional participations make excellent default conditions, it is always wise to establish the optionality based on a business rule. In any case, the designer must ask about the nature of the room/building relationship. 640. The table structure shown in Table P6.6 contains many unsatisfactory components and characteristics. For example, there are several multivalued attributes, naming conventions are violated, and some attributes are not atomic. Answer:

595

Table P6.6 Attribute Name

Sample Value

EMP_NUM

1003

1018

1019

1023

EMP_LNAME

Willaker

Smith

McGuire

EMP_EDUCATION

BBA, MBA

BBA

JOB_CLASS

SLS

EMP_DEPENDENTS

Gerald (spouse), Mary (daughter), John (son)

DEPT_CODE

MKTG

DEPT_NAME

BS, MS, Ph.D. JNT

DBA

JoAnne (spouse)

George (spouse) Jill (daughter)

MKTG

SVC

INFS

Marketing

General Service

Info. Systems

DEPT_MANAGER

Jill H. Martin

Hank B. Jones

Carlos Ortez

EMP_TITLE

Sales Agent

Janitor

DB Admin

EMP_DOB

23-Dec-1968

28-Mar-1979

18-May-1982

20-Jul-1959

EMP_HIRE_DATE

14-Oct-1997

15-Jan-2006

21-Apr-2003

15-Jul-1999

EMP_TRAINING

L1, L2

L1, L3, L8, L15

EMP_BASE_SALARY

$38,255.00

$30,500.00

$19,750.00

$127,900.00

EMP_COMMISSION_RATE

0.015

0.010

e. Given the structure shown in Table P6.6, write the relational schema and draw its dependency diagram. Label all transitive and/or partial dependencies. Answer: The dependency diagram is shown in Figure P6.6a. Note that the order of the attributes has been changed to make the transitive dependencies easier to mark. (In any case, the order in which the attributes are written into a relational database table is immaterial.) The relational schema is written below in Figure P6.6a. Please note the change of name of the attribute EMP_NUM to EMP_CODE to illustrate that the employee identification could include character and numeric values. In addition, EMP_TITLE was changed to JOB_TITLE to indicate that the position determines the title.

596

FIGURE P6.6a The Dependency Diagram for Problem 6a

Draw the dependency diagrams that are in 3NF. (Hint: You might have to create a few new attributes. Also make sure that the new dependency diagrams contain attributes that meet proper design criteria; that is, make sure there are no multivalued attributes, that the naming conventions are met, and so on.) Answer: Dependency diagrams have no way to indicate multivalued attributes, nor do they provide the means through which such attributes can be handled. Therefore, the solution to this problem requires a basic knowledge of modeling concepts, once again indicating that normalization and design are part of the same process. Given the sample data shown in Problem 6, EDUCATION, DEPENDENT, and TRAINING are multivalued attributes whose values are stored as comma-separated string values. We have created the appropriate entities to avoid the use of multivalued attributes. (See Figure P6.6b.)

597

FIGURE P6.6b The Dependency Diagrams for Problem 6b

598

The relational schemas are written as: EMPLOYEE(EMP_CODE, EMP_LNAME, DEPT_CODE, JOB_CLASS, EMP_DOB, EMP_HIRE_DATE, EMP_BASE_SALARY, EMP_COMMISSION_RATE) DEPENDENT(EMP_CODE, DEP_NUM, DEP_FNAME, DEP_TYPE) EDUCATION(EDU_CODE, EDU_DESCRIPTION) EMPEDU(EMP_CODE, EDU_CODE, DATE_EARNED) TRAINING(TRN_CODE, TRN_DESCRIPTION) EMPTRN(EMP_CODE, TRN_CODE, DATE_EARNED) DEPARTMENT(DEPT_CODE, DEPT_NAME, EMP_CODE) JOB(JOB_CLASS, JOB_TITLE) g. Draw the relational diagram. Answer: The relational diagram is shown in Figure P6.6c.

FIGURE P6.6c The Relational Diagram for Problem 6c

h. Draw the Crow’s Foot ERD. Answer: The Crow’s Foot solution is shown in Figure P6.6d.

599

FIGURE P6.6d The Crow’s Foot ERD for Problem 6d

641. Suppose you are given the following business rules to form the basis for a database design. The database must enable the manager of a company dinner club to mail invitations to the club’s members, to plan the meals, to keep track of who attends the dinners, and so on. 

Each dinner serves many members, and each member may attend many dinners.



A member receives many invitations, and each invitation is mailed to many members.



Answer: Because the manager is not a database expert, the first attempt at creating the database uses the structure shown in Table P6.7.

Table P6.7 Attribute Name

Sample Value

MEMBER_NUM

214

235

214

MEMBER_NAME

Alice VanderVoort

MEMBER_ADDRESS

325 Meadow Park

123 Rose Court

325 Meadow Park

MEMBER_CITY

Murkywater

Highlight

Murkywater

MEMBER_ZIPCODE

12345

12349

12345

INVITE_NUM

B. Gerald M. Gallega

Alice VanderVoort

600

Attribute Name

Sample Value

INVITE_DATE

23-Feb-2022

12-Mar-2022

23-Feb-2022

ACCEPT_DATE

27-Feb-2022

15-Mar-2022

27-Feb-2022

DINNER_DATE

15-Mar-2022

17-Mar-2022

15-Mar-2022

DINNER_ATTENDED

Yes

DINNER_CODE

DI5

DI2

DINNER_DESCRIPTION

Glowing sea delight

Glowing delight

ENTREE_CODE

EN3

EN5

ENTREE_DESCRIPTION

Stuffed crab

Marinated steak

DESSERT_CODE

DE8

DE5

DE2

sea Ranch Superb

DESSERT_DESCRIPTION Chocolate mousse Cherries jubilee with raspberry sauce

Apple pie honey crust

with

c. Given the table structure illustrated in Table P6.7, write the relational schema and draw its dependency diagram. Label all transitive and/or partial dependencies. (Hint: This structure uses a composite primary key.) Answer: The last sentence of the problem indicates that the manager, who knows nothing about database design, attempted to use a composite key with the structure that was created. As such, the relational schema may be written as follows: MEMBER(MEMBER_NUM, INVITE_NUM, MEMBER_NAME, MEMBER_ADDRESS, MEMBER_CITY, MEMBER_ZIP_CODE, INVITE_DATE, ACCEPT_DATE, DINNER_DATE, DINNER_ATTENDED, DINNER_CODE, ENTRÉE_CODE, ENTRÉE_DESCRIPTION, DESSERT_CODE, DESSERT_DESCRIPTION) However, based on the data shown, we can see that each invitation has a unique identifier. As discussed in Chapter 3, the manager’s selected composite primary key is not acceptable because it is not a candidate key. If an attribute in a composite superkey can be removed and the remaining attributes still form a superkey, then the original composite key was not a candidate key. In this case, the manager’s choice of (MEMBER_NUM + INVITE_NUM) was not a candidate key because if we remove MEMBER_NUM from the composite, the remaining attribute INVITE_NUM is still a superkey. Improving the primary key selection based on this analysis of the keys, we can improve the design to 1NF. We can see that each invitation is to a specific user and for a specific dinner. So, the new relational schema will be:

601

FIGURE P6.7a The Dependency Diagram for Problem 7a

d. Break up the dependency diagram you drew in Problem 7a to produce dependency diagrams that are in 3NF and write the relational schema. (Hint: You might have to create a few new attributes. Also, make sure that the new dependency diagrams contain attributes that meet proper design criteria; that is, make sure there are no multivalued attributes, that the naming conventions are met, and so on.) Answer: Actually, there is no way to prevent the existence of multivalued attributes by merely following normalization rules. Instead, knowledge of ER modeling concepts will help define the environment in which the multivalued attributes are dealt with. Although we keep repeating the message, it is worth repeating: normalization and modeling fit within the same design spectrum and they take place concurrently as the definition of entities and their attributes take place.

602

The design process can be described thus: 

Define entities, attributes, and relationships and model them.



Normalize.



Redesign based on the normalization outcomes and the evaluation of the design’s ability to meet transaction and information requirements.



Normalize the results and evaluate the normal forms until the process has yielded a stable design, implementation, and applications development environment.

FIGURE P6.7b The Dependency Diagram for Problem 7b

603

604

FIGURE P6.7c The Crow’s Foot ERD for Problem 7c

642. Use the dependency diagram shown in Figure P6.8 to work the following problems. Answer:

FIGURE P6.8 Initial Dependency Diagram for Problem 8

d. Break up the dependency diagram shown in Figure P6.8 to create two new dependency diagrams: one in 3NF and one in 2NF. Answer: The dependency diagrams are shown in Figure P6.8a.

605

FIGURE P6.8a The Dependency Diagram for Problem 8a

e. Modify the dependency diagrams you created in Problem 8a to produce a set of dependency diagrams that are in 3NF. (Hint: One of your dependency diagrams should be in 3NF but not in BCNF.) Answer: The solution is shown in Figure P6.8b.

FIGURE P6.8b The Dependency Diagram for Problem 8b

Modify the dependency diagrams you created in Problem 8b to produce a collection of dependency diagrams that are in 3NF and BCNF. Answer: The solution is shown in Figure P6.8c. Note that the A, C, and E attributes in the first three structures can be used as foreign keys in the fourth structure.

FIGURE P6.8c The Dependency Diagrams for Problem 8c

606

643. Suppose you have been given the table structure and data shown in Table P6.9, which was imported from an Excel spreadsheet. The data reflect that a professor can have multiple advisees, can serve on multiple committees, and can edit more than one journal. Answer:

Table P6.9 Attribute Name

Sample Value

EMP_NUM

123

104

118

120

PROF_RANK

Professor

Asst. Professor

Assoc. Professor

EMP_NAME

Ghee

Rankin

Ortega

Smith

DEPT_CODE

CIS

CHEM

CIS

ENG

DEPT_NAME

Computer Systems

Chemistry

Computer Systems

PROF_OFFICE

KDD-567

BLF-119

KDD-562

PRT-345

ADVISEE

1215, 2312, 3233, 2218, 2098

3102, 2782, 3311, 2008, 2876, 2222, 3745, 1783, 2378

2134, 2789, 3456, 2002, 2046, 2018, 2764

2873, 2765, 2238, 2901, 2308

COMMITTEE_CODE

PROMO, TRAF APPL, DEV

DEV

SPR, TRAF

PROMO, DEV

JOURNAL_CODE

JMIS, JMGT

Info.

QED,

Info.

English

SPR

JCIS, JMGT

Given the information in Table 6.9: f.

Draw the dependency diagram. Answer: The dependency diagram is shown in Figure P6.9a.

607

FIGURE P6.9a The Dependency Diagram for Problem 9a

NOTE The assumption that PROF_OFFICE  EMP_CODE is a rather restrictive one, because it would mean that professors cannot share an office. One could safely assume that administrators at all levels would not care to be tied by such a restrictive office assignment requirement. Therefore, we will remove this restriction in the remaining problem solutions. Also, note that there is no reliable way to identify the effect of multivalued attributes on the dependencies. For example, EMP_NUM = 123 could identify any one of five advisees. Therefore, knowing the EMP_NUM does not identify a specific ADVISEE value. The same is true for the COMMITTEE_CODE and JOURNAL_CODE attributes. Therefore, these attributes are not marked with a solid arrow line. However, if you know that EMP_NUM = 123, you will also know all five advisees, all four committee codes, and all three journal codes for that employee number value. But you do not have a unique identification for each of those attribute values. Therefore, you cannot conclude that EMP_NUM  ADVISEE, nor can you conclude that EMP_NUM  COMMITTEE_CODE or that EMP_NUM  JOURNAL_CODE. g. Identify the multivalued dependencies. Answer: Table P6.9 shows several professor attributes—ADVISEE, COMMITTEE_CODE, and JOURNAL_CODE—that represent multivalued dependencies.

608

h. Create the dependency diagrams to yield a set of table structures in 3NF. Answer: The dependency diagrams are shown in Figure P6.9c. Note that we have assumed that it is possible that professors can share an office.

FIGURE P6.9c The Dependency Diagram for Problem 9c

Eliminate the multivalued dependencies by converting the affected table structures to 4NF. Answer: The structures shown in Figure P6.9d conform to the 4NF requirement. Yet this normalization does not yield a viable database design. Here is another opportunity to stress that normalization without data modeling is a poor way to generate useful databases. (Note that we have assumed that an advisee can have only one advisor, but that an advisor can have many advisees.)

609

FIGURE P6.9d The Initial Dependency Diagrams for Problem 9d

610

Table P6.9d1 Implementation of the M:N Relationship between EMP_NUM and COMMITTEE_CODE EMP_NUM

COMMITTEE_CODE

123

PROMO

123

TRAF

123

APPL

123

DEV

104

DEV

118

SPR

118

TRAF

120

PROMO

120

SPR

120

DEV

The PK of Table P6.9d1 is EMP_NUM + COMMITTEE_CODE.

Table P6.9d2 Implementation of the M:N Relationship between EMP_NUM and JOURNAL_CODE EMP_NUM

JOURNAL_CODE

123

JMIS

123

QED

123

JMGT

118

JCIS

118

JMGT

611

dependency diagrams to check for data redundancies! Figure P6.9e shows a more practical solution to the problem and its structures all conform to the normalization requirements. j.

Draw the Crow’s Foot ERD to reflect the dependency diagrams you drew in Problem 9c. (Note: You might have to create additional attributes to define the proper PKs and FKs. Make sure that all of your attributes conform to the naming conventions.) Answer: Given the discussion in the previous problem segment d, we have incorporated additional features in the Crow’s Foot ERD shown in Figure P6.9e. Note that we have eliminated the M:N relationships in this design by creating composite entities as well as renaming JOURNAL_CODE to JOURNAL_ID. This design is implementable and it meets design standards. Normalization was part of the process that led to this solution, but it was only a part of that solution. Normalization does not replace design!

FIGURE P6.9e The Crow’s Foot ERD for Problem 9e

612

644. The manager of a consulting firm has asked you to evaluate a database that contains the table structure shown in Table P6.10. Answer:

Table P6.10 Attribute Name

Sample Value

Sample value

Sample Value

CLIENT_NUM

298

289

CLIENT_NAME

Marianne R. Brown

James D. Smith

CLIENT_REGION

Midwest

Southeast

CONTRACT_DATE

10-Feb-2022

15-Feb-2022

12-Mar-2022

CONTRACT_NUMBER

5841

5842

5843

CONTRACT_AMOUNT

$2,985,000.00

$670,300.00

$1,250,000.00

CONSULT_CLASS_1

Database Administration

Internet Services

Database Design

CONSULT_CLASS_2

Web Applications

Database Administration

CONSULT_CLASS_3

Network Installation

CONSULT_CLASS_4 CONSULTANT_NUM_1

CONSULTANT_NAME_1

Rachel G. Carson

Gerald Ricardo

CONSULTANT_REGION_1

Midwest

Southeast

CONSULTANT_NUM_2

CONSULTANT_NAME_2

Karl M. Spenser

Anne T. Dimarco

Gerald K. Ricardo

CONSULTANT_REGION_2

Midwest

Southeast

CONSULTANT_NUM_3

CONSULTANT_NAME_3

Julian H. Donatello

Geraldo J. Rivera

CONSULTANT_REGION_3

Midwest

Southeast

CONSULTANT_NUM_4

CONSULTANT_NAME_4

Donald Chen

CONSULTANT_REGION_4

West

25 K.

Angela M. Jamison

613

Each client is located in one region



A region can contain many clients.



Each consultant can work on many contracts



Each contract might require the services of many consultants.



A client can sign more than one contract, but each contract is signed by only one client.



Each contract might cover multiple consulting classifications. For example, a contract may list consulting services in database design and networking.



Each consultant is located in one region.



A region can contain many consultants.



Each consultant has one or more areas of expertise (class). For example, a consultant might be classified as an expert in both database design and networking.



Each area of expertise (class) can have many consultants. For example, the consulting company might employ many consultants who are networking experts.

d. Given this brief description of the requirements and the business rules, write the relational schema and draw the dependency diagram for the preceding (and very poor) table structure. Label all transitive and/or partial dependencies. Answer: One of the first steps when working with data is to determine the scope of what you are modeling, and what exactly you are trying to model. When the problem is large and contains many pieces of data, you may want to break it down into smaller parts that are easier to model. In our example, looking at the data you have in this problem, you have clients, consultants, regions, and contracts. The main entity here is the contract that binds clients with consultants providing an expertise. Here is a perfect illustration of the value of business rules. If the business rules had not been available, the sample record would produce ambiguities. For example, if you only look at the sample data in the one available record, defining the relationships between client, contract, consultant, region, and expertise would have been difficult, at best. The business rules augment the original data and their use removes the ambiguities. The business rules help establish that a client can sign more than one contract, so you need more than the client number to identify the remaining attributes. Also, the same client can sign multiple contracts on the same date or on different dates, using the same set of consultants for each contract or a different set of consultants for each contract. Remember also that the consultants have more than one area of expertise, so the same consultant may work on different contracts for the same client or for different clients.

614

FIGURE P6.10a The ConsultCo Dependency Diagram

The relational schema is written as follows: Note that the PK is the first listed attribute; you can write the relational schema this way: CONTRACT(CONTRACT, CLIENT_NUM, CLIENT_NAME, DATE, CLASS_1, CLASS_2, CLASS_3, CLASS_4, REGION, CONS_NUM_1, CONS_NAME_1, REGION1 CONS_NUM_2, CONS_NAME_2, REGION2, CONS_NUM_3, CONS_NAME_3, REGION3, CONS_NUM_4, CONS_NAME_4, REGION4) In any case, remind your students that the order in which the attributes are listed is immaterial in a relational database environment. e. Break up the dependency diagram you drew in Problem 10a to produce dependency diagrams that are in 3NF and write the relational schema. (Hint: You might have to create a few new attributes. Also make sure that the new dependency diagrams contain attributes that meet proper design criteria; that is, make sure there are no multivalued attributes, that the naming conventions are met, and so on.)

615

Answer: Remind the student, one more time, to use the business rules to discover the true nature of the data relationships. For example, based on the business rules, we can conclude: 

A contract has one customer, but a customer can have many contracts.



A contract requires many expertise classes, and a expertise class can appear in many contracts.



A contract requires many consultants, and a consultant can be assigned to many contracts.



A client has one region, but a region has many clients.



A consultant has one region, but a region has many clients.



A consultant can have many expertise and an expertise can belong to many consultants.

This clearly shows the following relationships: 

CLIENT (1) – (M) CONTRACT



CONTRACT (M) – (M) CLASS (expertise classes)



CONTRACT (M) – (M) CONSULTANT



REGION (1) – (M) COSULTANT



REGION (1) – (M) CLIENT



CONSULTANT (M) – (M) CLASS (expertise classes)

616

FIGURE P6.10b The ConsultCo Dependency Diagrams in 3NF

617

Using the results of Problem 10b, draw the Crow’s Foot ERD. Answer: The final ERD is shown in Figure P6.10c. Notice that we have added some additional attributes to some of the entities. It is important to explain the role of the ASSIGNMENT entity as a multipurpose entity. 



618

FIGURE P6.10c The ConsultCo ERD for Problem 10c

619

645. Given the sample records in the CHARTER table shown in Table P6.11, do the following: Answer:

Table P6.11 Attribute Name

Sample Value

CHAR_TRIP

10232

10233

10234

10235

CHAR_DATE

15-Jan-2022

16-Jan-2022

17-Jan-2022

CHAR_CITY

STL

MIA

TYS

ATL

CHAR_MILES

580

1,290

524

768

CUST_NUM

784

231

544

784

CUST_LNAME

Brown

Hanson

Bryana

Brown

CHAR_PAX

CHAR_CARGO

235 lbs.

18,940 lbs.

348 lbs.

155 lbs.

PILOT

Melton

Chen

Henderson

Melton

COPILOT

Henderson

Melton

FLT_ENGINEER

O’Shaski

LOAD_MASTER

Benkasi

AC_NUMBER

1234Q

3456Y

1234Q

2256W

MODEL_CODE

PA31-350

CV-580

PA31-350

MODEL_SEATS

MODEL_CHG_MILE

$2.79

$23.36

$2.79

b. Write the relational schema and draw the dependency diagram for the table structure. Make sure that you label all dependencies. CHAR_PAX indicates the number of passengers carried. The CHAR_MILES entry is based on round-trip miles, including pickup points. (Hint: Look at the data values to determine the nature of the relationships. For example, note that employee Melton has flown two charter trips as pilot and one trip as copilot.) Answer: The dependency diagram is shown in Figure P6.11a. Please note we abbreviated the last three attribute names.

620

FIGURE P6.11a The Dependency Diagram for Problem 11a

The relational schema is written as follows: CHARTER(CHAR_TRIP, CHAR_DATE, CHAR_CITY, CHAR_MILES, CUST_NUM, CUST_LNAME, CHAR_PAX, CHAR_CARGO, PILOT, COPILOT, FLT_ENGINEER, LOAD_MASTER, AC_NUMBER, MOD_CODE, MOD_SEATS, MOD_CHG_MILE) d. Decompose the dependency diagram you drew to solve Problem 11a to create table structures that are in 3NF and write the relational schema. Answer: The normalized dependency diagram is shown in Figure P6.11b. (Note the addition of MOD_CODE in the AIRCRAFT table to serve as the AIRCRAFT table’s FK to MODEL.)

621

FIGURE P6.11b The Normalized Dependency Diagram for Problem 11b

e. Draw the Crow’s Foot ERD to reflect the properly decomposed dependency diagrams you created in Problem 11b. Make sure the ERD yields a database that can track all of the data shown in Problem 11. Show all entities, relationships, connectivities, optionalities, and cardinalities. Answer: The initial Crow’s Foot ERD is shown in Figure P6.11c.

622

FIGURE P6.11c The Initial Crow’s Foot ERD for Problem 11c

While the ERD shown in Figure P6.11c faithfully reflects the results generated by the normalization process, it has a major design flaw. This flaw has the following consequences: 



The inclusion of COPILOT, FLT_ENGINEER, and LOAD_MASTER also produce synonyms in the CHARTER table.



623

FIGURE P6.11d The Final Crow’s Foot ERD for Problem 11c

624

FIGURE P6.11e Sample Charter Record

ANSWERS TO REVIEW QUESTIONS 646. Explain why it would be preferable to use a DATE data type to store date data instead of a character data type. Answer: The DATE data type uses numeric values based on the Julian calendar to store dates. This makes date arithmetic such as adding and subtracting days or fractions of days possible (as well as numerous special date-oriented functions discussed in the next chapter). 647. Explain why the following command would create an error and what changes could be made to fix the error.

625

Answer: SELECT V_CODE, SUM(P_QOH) FROM PRODUCT; The command would generate an error because an aggregate function is applied to the P_QOH attribute but V_CODE is neither in an aggregate function nor in a GROUP BY clause. This can be fixed by either (1) placing V_CODE in an appropriate aggregate function based on the data that is being requested by the user, (2) adding a GROUP BY clause to group by values of V_CODE (i.e., GROUP BY V_CODE), (3) removing the V_CODE attribute from the SELECT clause, or (4) removing the Sum aggregate function from P_QOH. Which of these solutions is most appropriate depends on the question that the query was intended to answer? 648. What is a cross join? Give an example of its syntax. Answer: A CROSS JOIN is identical to the PRODUCT relational operator. The CROSS JOIN is also known as the Cartesian product of two tables. For example, if you have two tables, AGENT, with 10 rows, and CUSTOMER, with 21 rows, the CROSS JOIN resulting set will have 210 rows and will include all of the columns from both tables. Syntax examples are: SELECT * FROM CUSTOMER CROSS JOIN AGENT; or SELECT * FROM CUSTOMER, AGENT; If you do not specify a join condition when joining tables, the result will be a CROSS JOIN or PRODUCT operation. 649. What three join types are included in the outer join classification? Answer: An OUTER JOIN is a type of JOIN operation that yields all rows with matching values in the join columns as well as unmatched rows. (Unmatched rows are those without matching values in the join columns.) The SQL standard prescribes three different types of join operations: LEFT [OUTER] JOIN RIGHT [OUTER] JOIN FULL [OUTER] JOIN The LEFT [OUTER] JOIN will yield all rows with matching values in the join columns, plus all of the unmatched rows from the left table. (The left table is the first table named in the FROM clause.) The RIGHT [OUTER] JOIN will yield all rows with matching values in the join columns, plus all of the unmatched rows from the right table. (The right table is the second table named in the FROM clause.) The FULL [OUTER] JOIN will yield all rows with matching values in the join columns, plus all the unmatched rows from both tables named in the FROM clause. 650. Using tables named T1 and T2, write a query example for each of the three join types you described in Question 4. Assume that T1 and T2 share a common column named C1. Answer: LEFT OUTER JOIN example: SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.C1 = T2.C1; RIGHT OUTER JOIN example: © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

626

SELECT * FROM T1 RIGHT OUTER JOIN T2 ON T1.C1 = T2.C1; FULL OUTER JOIN example: SELECT * FROM T1 FULL OUTER JOIN T2 ON T1.C1 = T2.C1; 651. What is a recursive join? Answer: A recursive join is a join in which a table is joined to itself. 652. Rewrite the following WHERE clause without the use of the IN special operator: WHERE V_STATE IN (‘TN’, ‘FL’, ‘GA’) Answer: WHERE V_STATE = ‘TN’ OR V_STATE = ‘FL’ OR V_STATE = ‘GA’ Notice that each criterion must be complete (i.e., attribute-operator-value). 653. Explain the difference between an ORDER BY clause and a GROUP BY clause. Answer: An ORDER BY clause has no impact on which rows are returned by the query; it simply sorts those rows into the specified order. A GROUP BY clause does impact the rows that are returned by the query. A GROUP BY clause gathers rows into collections that can be acted on by aggregate functions.

627

654. Explain why the following two commands produce different results: SELECT DISTINCT COUNT (V_CODE) FROM PRODUCT; SELECT COUNT (DISTINCT V_CODE) FROM PRODUCT; Answer: The difference is in the order of operations. The first command executes the Count function to count the number of values in V_CODE (say the count returns “14” for example) including duplicate values, and then the Distinct keyword only allows one count of that value to be displayed (only one row with the value “14” appears as the result). The second command applies the Distinct keyword to the V_CODEs before the count is taken, so only unique values are counted. 655. What is the difference between the COUNT aggregate function and the SUM aggregate function? Answer: COUNT returns the number of values without regard to what the values are. SUM adds the values together and can only be applied to numeric values. 656. In a SELECT query, what is the difference between a WHERE clause and a HAVING clause? Answer: Both a WHERE clause and a HAVING clause can be used to eliminate rows from the results of a query. The differences are (1) the WHERE clause eliminates rows before any grouping for aggregate functions occurs, while the HAVING clause eliminates groups after the grouping has been done, and (2) the WHERE clause cannot contain an aggregate function, but the HAVING clause can. 657. What is a subquery, and what are its basic characteristics? Answer: A subquery is a query (expressed as a SELECT statement) that is located inside another query. The first SQL statement is known as the outer query, and the second is known as the inner query or subquery. The inner query or subquery is normally executed first. The output of the inner query is used as the input for the outer query. A subquery is normally expressed inside parentheses and can return zero, one, or more rows and each row can have one or more columns. A subquery can appear in many places in a SQL statement: 

as part of a FROM clause,



to the right of a WHERE conditional expression,



to the right of the IN clause,



in an EXISTS operator,



to the right of a HAVING clause conditional operator,



in the attribute list of a SELECT clause.

Examples of subqueries are: INSERT INTO PRODUCT SELECT * FROM P; DELETE FROM PRODUCT WHERE V_CODE IN (SELECT V_CODE FROM VENDOR WHERE V_AREACODE = ‘615’);

628

SELECT

V_CODE, V_NAME

FROM

VENDOR

WHERE

V_CODE NOT IN (SELECT V_CODE FROM PRODUCT);

658. What are the three types of results that a subquery can return? Answer: A subquery can return (1) a single value (one row, one column), (2) a list of values (many rows, one column), or (3) a virtual table (many rows, many columns). 659. What is a correlated subquery? Give an example. Answer: A correlated subquery is subquery that executes once for each row in the outer query. This process is similar to the typical nested loop in a programming language. Contrast this type of subquery to the typical subquery that will execute the innermost subquery first, and then the next outer query … until the last outer query is executed. That is, the typical subquery will execute in serial order, one after another, starting with the innermost subquery. In contrast, a correlated subquery will run the outer query first, and then it will run the inner subquery once for each row returned in the outer subquery. For example, the following subquery will list all the product line sales in which the “units sold” value is greater than the “average units sold” value for that product (as opposed to the average for all products.) SELECT

INV_NUMBER, P_CODE, LINE_UNITS

FROM

LINE LS

WHERE

LS.LINE_UNITS > (SELECT WHERE

AVG(LINE_UNITS) FROM LINE LA LA.P_CODE = LS.P_CODE);

The previous nested query will execute the inner subquery once to compute the average sold units for each product code returned by the outer query. 660. Explain the difference between a regular subquery and a correlated subquery. Answer: A regular, or uncorrelated, subquery executes before the outer query. It executes only once and the result is held for use by the outer query. A correlated subquery relies in part on the outer query, usually through a WHERE criterion in the subquery that references an attribute in the outer query. Therefore, a correlated subquery will execute once for each row evaluated by the outer query; and the correlated subquery can potentially produce a different result for each row in the outer query. 661. What does it mean to say that SQL operators are set-oriented? Answer: The description of SQL operators as set-oriented means that the commands work over entire tables at a time, not row-by-row.

629

662. The relational set operators UNION, INTERSECT, and EXCEPT (MINUS) work properly only when the relations are union-compatible. What does union-compatible mean, and how would you check for this condition? Answer: Union-compatible means that the relations yield attributes with identical names and compatible data types. That is, the relation A(c1,c2,c3) and the relation B(c1,c2,c3) have union-compatibility if both relations have the same number of attributes, and corresponding attributes in the relations have “compatible” data types. Compatible data types do not require that the attributes be exactly identical—only that they are comparable. For example, VARCHAR(15) and CHAR(15) are comparable, as are NUMBER (3,0) and INTEGER, and so on. Note that this is a practical definition of union-compatibility, which is different than the theoretical definition discussed in Chapter 3. From a theoretical perspective, corresponding attributes must have the same domain. However, the DBMS does not understand the meaning of the business domain, so it must work with a more concrete understanding of the data in the corresponding columns. Thus, it only considers the data types. 663. What is the difference between UNION and UNION ALL? Write the syntax for each. Answer: UNION yields unique rows. In other words, UNION eliminates duplicates rows. On the other hand, a UNION ALL operator will yield all rows of both relations, including duplicates. Notice that for two rows to be duplicated, they must have the same values in all columns. To illustrate the difference between UNION and UNION ALL, let’s assume two relations: A (ID, Name) with rows (1, Lake, 2, River, and 3, Ocean) and B (ID, Name) with rows (1, River, 2, Lake, and 3, Ocean). Given this description, SELECT * FROM A UNION SELECT * FROM B will yield: ID Name 4. Lake 5. River 6. Ocean 3. River 4. Lake while SELECT * FROM A UNION ALL

630

SELECT * FROM B will yield: ID Name 4. Lake 5. River 6. Ocean 4. River 5. Lake 6. Ocean 664. Suppose you have two tables: EMPLOYEE and EMPLOYEE_1. The EMPLOYEE table contains the records for three employees: Alice Cordoza, John Cretchakov, and Anne McDonald. The EMPLOYEE_1 table contains the records for employees John Cretchakov and Mary Chen. Given that information, list the query output for the UNION query. Answer: The query output will be: Alice Cordoza John Cretchakov Anne McDonald Mary Chen 665. Given the employee information in Question 19, list the query output for the UNION ALL query. Answer: The query output will be: Alice Cordoza John Cretchakov Anne McDonald John Cretchakov Mary Chen 666. Given the employee information in Question 19, list the query output for the INTERSECT query. Answer: The query output will be: John Cretchakov

631

667. Given the employee information in Question 19, list the query output for the EXCEPT (MINUS) query of EMPLOYEE to EMPLOYEE_1. Answer: This question can yield two different answers. If you use SELECT * FROM EMPLOYEE MINUS SELECT * FROM EMPLOYEE_1 the answer is Alice Cordoza Anne McDonald If you use SELECT * FROM EMPLOYEE_1 MINUS SELECT * FROM EMPLOYEE the answer is Mary Chen 668. Suppose a PRODUCT table contains two attributes, PROD_CODE and VEND_CODE. Those two attributes have values of ABC, 125, DEF, 124, GHI, 124, and JKL, 123, respectively. The VENDOR table contains a single attribute, VEND_CODE, with values 123, 124, 125, and 126, respectively. (The VEND_CODE attribute in the PRODUCT table is a foreign key to the VEND_CODE in the VENDOR table.) Given that information, what would be the query output for: Because the common attribute is V_CODE, the output will only show the V_CODE values generated by each query. Answer: e. A UNION query based on the two tables? 125,124,123,126 f.

A UNION ALL query based on the two tables? 125,124,124,123,123,124,125,126

g. An INTERSECT query based on the two tables? 123,124,125 h. An EXCEPT (MINUS) query based on the two tables? If you use PRODUCT MINUS VENDOR, the output will be NULL. If you use VENDOR MINUS PRODUCT, the output will be 126.

632

669. Why does the order of the operands (tables) matter in an EXCEPT (MINUS) query but not in a UNION query? Answer: MINUS queries are analogous to algebraic subtraction—it results in the value that existed in the first operand that is not in the second operand. UNION queries are analogous to algebraic addition—it results in a combination of the two operands. (These analogies are not perfect, obviously, but they are helpful when learning the basics.) Addition and UNION have the commutative property (a + b = b + a), while subtraction and MINUS do not (a – b ≠ b – a). 670. What MS Access and SQL Server function should you use to calculate the number of days between your birth date and the current date? Answer: In MS Access, the DATE() function would be used. In MS SQL Server, the GETDATE() function would be used. 671. What Oracle function should you use to calculate the number of days between your birth date and the current date? Answer: The SYSDATE keyword can be used to retrieve the current date from the server. By subtracting your birthdate from the current date, using date arithmetic, the number of dates will be returned. Note that in Oracle, the SQL statement requires the use of the FROM clause. In this case, you may use the DUAL table. (The DUAL table is a dummy “virtual” table provided by Oracle for this type of query. The table contains only one row and one column so queries against it can return just one value.) 672. What string function should you use to list the first three characters of a company’s EMP_LNAME values? Give an example using a table named EMPLOYEE. Provide examples for Oracle and SQL Server. Answer: In Oracle, you use the SUBSTR function as illustrated next: SELECT SUBSTR(EMP_LNAME, 1, 3) FROM EMPLOYEE; In SQL Server, you use the SUBSTRING function as shown: SELECT SUBSTRING(EMP_LNAME, 1, 3) FROM EMPLOYEE; 673. What two things must a SQL programmer understand before beginning to craft a SELECT query? Answer: Before crafting a SELECT query, the SQL programmer must (1) understand the data model in which the query will operate, and (2) the problem being solved. Data models are often complex to the point that knowing what data is available, the meaning of that data, and how to transform the data to produce the desired results will require the programmer to become very familiar with the data model before the query can be created. Problem statements that seem clear to users can often be interpreted in many ways, so it is important for the programmer to understand exactly what the user is requesting.

633

Ch07_ProblemSolutions_ORA.txt

MySQL:

Ch07_ProblemSolutions_MySQL.txt

SQL Server:

Ch07_ProblemSolutions_SQL.txt

MS Access:

Ch07_ConstructCo.mdb Ch07_Fact.mdb Ch07_LargeCo.mdb Ch07_SaleCo.mdb

ANSWERS TO REVIEW QUESTIONS 674. What type of integrity is enforced when a primary key is declared? Answer: Creating a primary key constraint enforces entity integrity (i.e., no part of the primary key can contain a null and the primary key values must be unique). 675. Explain why it might be more appropriate to declare an attribute that contains only digits as a character data type instead of a numeric data type. Answer: An attribute that contains only digits may be properly defined as character data when the values are nominal; that is, the values do not have numerical significance but serve only as labels such as ZIP codes and telephone numbers. One easy test is to consider

634

whether or not a leading zero should be retained. For the ZIP code 03133, the leading zero should be retained; therefore, it is appropriate to define it as character data. For the quantity on hand of 120, we would not expect to retain a leading zero such as 0120; therefore, it is appropriate to define the quantity on hand as a numeric data type. 676. What is the difference between a column constraint and a table constraint? Answer: A column constraint can refer to only the attribute with which it is specified. A table constraint can refer to any attributes in the table. 677. What are “referential constraint actions”? Answer: Referential constraint actions, such as ON DELETE CASCADE, are default actions that the DBMS should take when a DML command would result in a referential integrity constraint violation. Without referential constraint actions, DML commands that would result in a violation of referential integrity will fail with an error indicating that the referential integrity constraint cannot be violated. Referential constraint actions can allow the DML command to successfully complete while making the designated changes to the related records to maintain referential integrity. 678. What is the purpose of a CHECK constraint? Answer: A CHECK constraint is used to limit the values that can appear in an attribute. It performs the function of enforcing a domain.

635

679. Explain when an ALTER TABLE command might be needed. Answer: ALTER TABLE is used to modify the structure of an existing table by adding, removing, or modifying column definitions and, in some cases, constraints. Many database structures have long, useful lives in an organization. It is not uncommon for a database to exist in organizational systems for decades. If the existing database structure needs to be modified to accommodate changes in business requirements or the integration of new systems, the existing structure will be modified with ALTER TABLE commands. This preserves the existing data in the table, as opposed to dropping the table and then recreating it. 680. What is the difference between an INSERT command and an UPDATE command? Answer: The INSERT command is used to add a new row to a table. The UPDATE command changes the values in attributes of an existing row. UPDATE will not increase the number of rows in a table, but INSERT will. 681. What is the difference between using a subquery with a CREATE TABLE command and using a subquery with an INSERT command? Answer: Using a subquery with a CREATE TABLE command is a DDL command and will create a new database table. The table will be structured to match the structure of the data returned by the subquery, and the data from the subquery will be placed in the table. Therefore, using a subquery with CREATE TABLE will both create the structure and place data inside that structure. Using a subquery with an INSERT command is a DML command and will add data to an existing table. This operation requires that the target table where the data should be stored must already exist. The programmer must ensure that the structure of the data being returned by the subquery is appropriate in terms of data types and constraints for the structure of the table where the results are to be stored. 682. What is a sequence? Write its syntax. Answer: A sequence is a special type of object that generates unique numeric values in ascending or descending order. You can use a sequence to assign values to a primary key field in a table. A sequence provides functionality similar to the AutoNumber data type in MS Access. For example, both sequences and AutoNumber data types provide unique ascending or descending values. However, there are some subtle differences between the two: 

In MS Access, an AutoNumber is a data type; in Oracle and SQL Server, a sequence is a completely independent object, rather than a data type.



In MS Access, you can only have one AutoNumber per table; in Oracle and SQL Server, you can have as many sequences as you want, and they are not tied to any particular table.



636

The syntax used to create a sequence is: CREATE SEQUENCE CUS_NUM_SEQ START WITH 100 INCREMENT BY 10 NOCACHE; MySQL does not currently support sequences. 683. What is a trigger, and what is its purpose? Give an example. Answer: A trigger is a block of procedural SQL code that is automatically invoked by the DBMS upon the occurrence of a data manipulation event (INSERT, UPDATE, or DELETE). Triggers are always associated with a table and are invoked before or after a data row is inserted, updated, or deleted. A table can have zero, one, or more triggers. Triggers provide a method of enforcing business rules such as: 

A customer making a credit purchase must have an active account.



A student taking a class with a prerequisite must have completed that prerequisite with a B grade.



To be scheduled for a flight, a pilot must have a valid medical certificate and a valid training completion record.

Triggers are also excellent for enforcing data constraints that cannot be directly enforced by the data model. For example, suppose that you must enforce the following business rule: If the quantity on hand of a product falls below the minimum quantity, the P_REORDER attribute must the automatically set to 1. To enforce this business rule, you can create the following TRG_PRODUCT_REORDER trigger in Oracle: CREATE OR REPLACE TRIGGER TRG_PRODUCT_REORDER BEFORE INSERT OR UPDATE OF P_ONHAND, P_MIN ON PRODUCT FOR EACH ROW BEGIN IF :NEW.P_ONHAND <= :NEW.P_MIN THEN NEW.P_REORDER := 1; ELSE :NEW.P_REORDER := 0; END IF; END; 684. What is a stored procedure, and why is it particularly useful? Give an example. Answer: A stored procedure is a named block of procedural SQL and standard SQL statements. One of the major advantages of stored procedures is that they can be used to encapsulate and represent business transactions. For example, you can create a stored procedure to represent a product sale, a credit update, or the addition of a new customer. You can encapsulate SQL statements within a single stored procedure and execute them as a single transaction.

637

There are two clear advantages to the use of stored procedures: 3. Stored procedures substantially reduce network traffic and increase performance. Because the stored procedure is stored at the server, there is no transmission of individual SQL statements over the network. 4. Stored procedures help reduce code duplication through code isolation and code sharing (creating unique procedural modules that are called by application programs), thereby minimizing the chance of errors and the cost of application development and maintenance. For example, the following PRC_LINE_ADD stored procedure will add a new invoice line to the LINE table and it will automatically retrieve the correct price from the PRODUCT table. CREATE OR REPLACE PROCEDURE PRC_LINE_ADD (W_LN IN NUMBER, W_P_CODE IN VARCHAR2, W_LU NUMBER) AS W_LP NUMBER := 0.00; BEGIN -- GET THE PRODUCT PRICE SELECT P_PRICE INTO W_LP FROM PRODUCT WHERE P_CODE = W_P_CODE; -- ADDS THE NEW LINE ROW INSERT INTO LINE VALUES(INV_NUMBER_SEQ.CURRVAL, W_LN, W_P_CODE, W_LU, W_LP); DBMS_OUTPUT.PUT_LINE('Invoice line ' || W_LN || ' added'); END;

638

Ch08_ProblemSolutions_ORA.sql

MySQL:

Ch08_ProblemSolutions_MySQL.sql

SQL Server:

Ch08_ProblemSolutions_SQL.sql

MS Access:

Ch08_AviaCo.mdb Ch08_ConstructCo.mdb Ch08_MovieCo.mdb Ch08_SaleCo.mdb Ch08_SimpleCo.mdb

ANSWERS TO REVIEW QUESTIONS 685. What is an information system? What is its purpose? Answer: An information system is a system that

639



Provides the conditions for data collection, storage, and retrieval



Facilitates the transformation of data into information



Provides management of both data and information

An information system is composed of hardware, software (DBMS and applications), database(s), procedures, and people. Good decisions are generally based on good information. Ultimately, the purpose of an information system is to facilitate good decision making by making relevant and timely information available to the decision makers. 686. How do systems analysis and systems development fit into a discussion about information systems? Answer: Both systems analysis and systems development constitute part of the Systems Development Life Cycle (SDLC). Systems analysis, phase II of the SDLC, establishes the need for and the extent of an information system by 

Establishing end-user requirements



Evaluating the existing system



Developing a logical systems design

640

687. What does the acronym SDLC mean, and what does an SDLC portray? Answer: The acronym SDLC is used to label the System Development Life Cycle. The SDLC traces the history of an information system from its inception to its obsolescence. The SDLC is composed of six phases: planning, analysis, detailed system, design, implementation, and maintenance. 688. What does the acronym DBLC mean, and what does a DBLC portray? Answer: The acronym DBLC is used to label the Database Life Cycle. The DBLC traces the history of a database system from its inception to its obsolescence. Since the database constitutes the core of an information system, the DBLC is concurrent to the SDLC. The DBLC is composed of six phases: initial study, design, implementation and loading, testing and evaluation, operation, and maintenance and evolution. 689. Discuss the distinction between centralized and decentralized conceptual database designs. Answer: Centralized and decentralized designs constitute variations on the bottom-up and top-down approaches we discussed in the third question presented in the discussion focus. Basically, the centralized approach is best suited to relatively small and simple databases that lend themselves well to a bird’s-eye view of the entire database. Such databases may be designed by a single person or by a small and informally constituted design team. The company operations and the scope of its problems are sufficiently limited to enable the designer(s) to perform all of the necessary database design tasks: 6. Define the problem(s). 7. Create the conceptual design. 8. Verify the conceptual design with all user views. 9. Define all system processes and data constraints. 10. Assure that the database design will comply with all achievable end-user requirements. The centralized design procedure thus yields the design summary shown in Figure Q9.5A.

641

FIGURE Q9.5A The Centralized Design Procedure

Conceptual Model

Conceptual Model Verification

User Views

System Processes

Data Constraints

D A T A D I C T I O N A R Y

Note that the centralized design approach requires the completion and validation of a single conceptual design.

642

combined conceptual model is still able to support all the required transactions. Thus, the decentralized design activities may be summarized as shown in Figure Q9.6B.

FIGURE Q9.6B The Decentralized Design Procedure

DATA COMPONENT

Conceptual Models

Verification

Subset A

Subset B

Subset C

Views, Processes, Constraints

Aggregation

FINAL CONCEPTUAL MODEL

D A T A D I C T I O N A R Y

Keep in mind that the aggregation process requires the lead designer to assemble a single model in which various aggregation problems must be addressed: 



Entity and entity subclasses. An entity subset may be viewed as a separate entity by one or more departments. The designer must integrate such subclasses into a higher-level entity.



643

690. What is the minimal data rule in conceptual design? Why is it important? Answer: The minimal data rule specifies that all the data defined in the data model are actually required to fit present and expected future data requirements. This rule may be phrased as All that is needed is there, and all that is there is needed. 691. Discuss the distinction between top-down and bottom-up approaches in database design. Answer: There are two basic approaches to database design: top-down and bottom-up. Top-down design begins by identifying the different entity types and the definition of each entity’s attributes. In other words, top-down design: 

starts by defining the required data sets and then



defines the data elements for each of those data sets.

Bottom-up design: 

first defines the required attributes and then



groups the attributes to form entities.

Although the two methodologies tend to be complementary, database designers who deal with small databases with relatively few entities, attributes, and transactions tend to emphasize the bottom-up approach. Database designers who deal with large, complex databases usually find that a primarily top-down design approach is more appropriate. 692. What are business rules? Why are they important to a database designer? Answer: Business rules are narrative descriptions of the business policies, procedures, or principles that are derived from a detailed description of operations. Business rules are particularly valuable to database designers because they help define: 

Entities



Attributes



Relationships (1:1, 1:M, M:N, expressed through connectivities and cardinalities)



Constraints

644

NOTE Do keep in mind that an ERD cannot always include all the applicable business rules. For example, although constraints are often crucial, it is often not possible to model them. For instance, there is no way to model a constraint such as “no pilot may be assigned to flight duties more than ten hours during any 24-hour period.” It is also worth emphasizing that the description of (company) operations must be done in almost excruciating detail and it must be verified and reverified. An inaccurate description of operations yields inaccurate business rules that lead to database designs that are destined to fail. 693. What is the data dictionary’s function in database design? Answer: A good data dictionary provides a precise description of the characteristics of all the entities and attributes found within the database. The data dictionary thus makes it easier to check for the existence of synonyms and homonyms, to check whether all attributes exist to support required reports, to verify appropriate relationship representations, and so on. The data dictionary’s contents are both developed and used during the six DBLC phases: DATABASE INITIAL STUDY The basic data dictionary components are developed as the entities and attributes are defined during this phase. DATABASE DESIGN The data dictionary contents are used to verify the database design components: entities, attributes, and their relationships. The designer also uses the data dictionary to check the database design for homonyms and synonyms and verifies that the entities and attributes will support all required query and report requirements. IMPLEMENTATION AND LOADING The DBMS’s data dictionary helps to resolve any remaining attribute definition inconsistencies. TESTING AND EVALUATION If problems develop during this phase, the data dictionary contents may be used to help restructure the basic design components to make sure that they support all required operations. OPERATION If the database design still yields (the almost inevitable) operational glitches, the data dictionary may be used as a quality control device to ensure that operational modifications to the database do not conflict with existing components.

645

MAINTENANCE AND EVOLUTION As users face inevitable changes in information needs, the database may be modified to support those needs. Perhaps entities, attributes, and relationships must be added, or relationships must be changed. If new database components fit into the design, their introduction may produce conflict with existing components. The data dictionary turns out to be a very useful tool to check whether a suggested change invites conflicts within the database design and, if so, how such conflicts may be resolved. 694. What steps are required in the development of an ER diagram? (Hint: See Table 9.3.) Answer: Table 9.3 is reproduced for your convenience.

Table 9.3 Developing the Conceptual Model Using ER Diagrams STEP

ACTIVITY

Identify, analyze, and refine the business rules.

Identify the main entities, using the results of Step 1.

Define the relationships among the entities, using the results of Steps 1 and 2.

Define the attributes, primary keys, and foreign keys for each of the entities.

Normalize the entities. (Remember that entities are implemented as tables in an RDBMS.)

Complete the initial ER diagram.

Validate the ER model against the end users’ information and processing requirements.

695. List and briefly explain the activities involved in the verification of an ER model. Answer: Section 9-4c, “Data Model Verification,” includes a discussion on verification. In addition, Appendix C, “The University Lab: Conceptual Design Verification, Logical Design, and Implementation,” covers the verification process in detail. The verification process is detailed in the text’s Table 9.5, reproduced here for your convenience.

646

Table 9.5 The ER Model Verification Process STEP

ACTIVITY

Identify the ER model’s central entity.

Identify each module and its components.

Identify each module’s transaction requirements: Internal: Updates/Inserts/Deletes/Queries/Reports External: Module interfaces

Verify all processes against the module’s processing and reporting requirements.

Make all necessary changes suggested in Step 4.

696. What factors are important in a DBMS software selection? Answer: The selection of DBMS software is critical to the information system’s smooth operation. Consequently, the advantages and disadvantages of the proposed DBMS software should be carefully studied. To avoid false expectations, the end user must be made aware of the limitations of both the DBMS and the database. Although the factors affecting the purchasing decision vary from company to company, some of the most common are: 

Cost. Purchase, maintenance, operational, license, installation, training, and conversion costs.



Underlying model. Hierarchical, network, relational, object/relational, or object.



Portability. Across platforms, systems, and languages.



DBMS hardware requirements. Processor(s), RAM, disk space, and so on.

647

697. List and briefly explain the four steps performed during the logical design stage. Answer: 5. Map conceptual model to logical model components. In this step, the conceptual model is converted into a set of table definitions including table names, column names, primary keys, and foreign keys for implementing the entities and relationships specified in the conceptual design. 6. Validate the logical model using normalization. It is possible for normalization issues to be discovered during the process of mapping the conceptual model to logical model components. Therefore, it is appropriate at this stage to validate that all of the table definitions from the previous step conform to the appropriate normalization rules. 7. Validate logical model integrity constraints. This step involves the conversion of attribute domains and constraints into constraint definitions that can be implemented within the DBMS to enforce those domains. Also, entity and referential integrity constraints are validated. Views may be defined to enforce security constraints. 8. Validate the logical model against the user requirements. The final step of this stage is to ensure that all definitions created throughout the logical model are validated against the users’ data, transaction, and security requirements. Every component (table, view, constraint, etc.) of the logical model must be associated with satisfying the user requirements, and every user requirement should be addressed by the model components. 698. List and briefly explain the three steps performed during the physical design stage. Answer: 4. Define data storage organization. Based on estimates of the data volume and growth, this step involves the determination of the physical location and physical organization of each table. Also, which columns will be indexed and the type of indexes to be used are determined. Finally, the type of implementation to be used for each view is decided. 5. Define integrity and security measures. This step involves creating users and security groups and then assigning privileges and controls to those users and groups. 6. Determine performance measurements. The actual performance of the physical database implementation must be measured and assessed for compliance with user performance requirements.

648

699. What three levels of backup may be used in database recovery management? Briefly describe what each backup level does. Answer: A full backup of the database creates a backup copy of all database objects in their entirety. A differential backup of the database creates a backup of only those database objects that have changed since the last full backup. A transaction log backup does not create a backup of database objects but makes a backup of the log of changes that have been applied to the database objects since the last backup.

649

ANSWERS TO PROBLEMS The ABC Car Service & Repair Centers are owned by the Silent Car Dealership; ABC services and repairs only silent cars. Three ABC centers provide service and repair for the entire state. Each of the three centers is independently managed and operated by a shop manager, a receptionist, and at least eight mechanics. Each center maintains a fully stocked parts inventory. Each center also maintains a manual file system in which each car’s maintenance history is kept; repairs made, parts used, costs, service dates, owner, and so on. Files are also kept to track inventory, purchasing, billing, employees’ hours, and payroll. You have been contacted by one of the center’s managers to design and implement a computerized database system. Given the preceding information, do the following: b. Indicate the most appropriate sequence of activities by labeling each of the following steps in the correct order. (e.g., if you think that “Load the database” is the appropriate first step, label it “1.”) ____

Normalize the conceptual model.

____

Obtain a general description of company operations.

____

Load the database.

____

Create a description of each system process.

____

Test the system.

____

Draw a data flow diagram and system flowcharts.

____

Create a conceptual model using ER diagrams.

____

Create the application programs.

____

Interview the mechanics.

____

Create the file (table) structures.

____

Interview the shop manager.

650

ANALYSIS 12. Interview the shop manager 13. Interview the mechanics 14. Obtain a general description of company operations 15. Create a description of each system process DESIGN 16. Create a conceptual model, using ER diagrams 17. Draw a data flow diagram and system flow charts 18. Normalize the conceptual model IMPLEMENTATION 19. Create the table structures 20. Load the database 21. Create the application programs 22. Test the system This listing implies that, within each of the three phases, the steps are completed in a specific order. For example, it would seem reasonable to argue that we must first complete the interviews if we are to obtain a proper description of the company’s operations. Similarly, we may argue that a data flow diagram precedes the creation of the ER diagram. Nevertheless, the specific tasks and the order in which they are addressed may vary. Such variations do not matter, as long as the designer bases the selected procedures on appropriate design philosophy, such as top-down versus bottom-up. Given this discussion, we may present Problem 1’s solution this way: 7

Normalize the conceptual model.

Obtain a general description of company operations.

Load the database.

Create a description of each system process.

Test the system.

Draw a data flow diagram and system flow charts.

Create a conceptual model using ER diagrams.

Create the application programs.

651

Interview the mechanics.

Create the file (table) structures.

Interview the shop manager.

c. Describe the various modules that you believe the system should include. Answer: This question may be addressed in several ways. We suggest the following approach to develop a system composed of four main modules: Inventory, Payroll, Work Order, and Customer. We have illustrated the Information System’s main modules in Figure P9.1B.

FIGURE P9.1B The ABC Company’s IS System Modules

The Inventory module will include the Parts and Purchasing submodules. The Payroll Module will handle all employee and payroll information. The Work Order module keeps track of the car maintenance history and all work orders for maintenance done on a car. The Customer module keeps track of the billing of the work orders to the customers and of the payments received from those customers. g. How will a data dictionary help you develop the system? Give examples. Answer: We have addressed the role of the data dictionary within the DBLC in detail in the answer to Review Question 10. Remember that the data dictionary makes it easier to check for the existence of synonyms and homonyms, to check whether all attributes exist to support required reports, to verify appropriate relationship representations, and so on. Therefore, the data dictionary’s contents will help us to provide consistency across modules and to evaluate the system’s ability to generate the required reports. In addition, the use of the data dictionary facilitates the creation of system documentation. h. What general (system) recommendations might you make to the shop manager? For example, if the system will be integrated, what modules will be integrated? What

652

What is the best approach to conceptual database design? Why? Answer: Given the nature of this business, the best way to produce this conceptual database design would be to use a centralized and top-down approach. Keep in mind that the designer must keep the design sufficiently flexible to make sure that it can accommodate any future integration of this system with the other service stations in the state.

Name and describe at least four reports the system should have. Explain their use. Who will use the reports?

653

REPORT 4 Customer Activity contains a breakdown of customers by location, maintenance activity, current balances, available credit, and so on. This report would be useful to forecast various service demand factors, to mail promotional materials, to send maintenance reminders, to keep track of special customer requirements, and so on. 700. Suppose that you have been asked to create an information system for a manufacturing plant that produces nuts and bolts of many shapes, sizes, and functions. What questions would you ask, and how would the answers affect the database design? Answer: Basically, all answers to all (relevant) questions help shape the database design. In fact, all information collected during the initial study and all subsequent phases will have an impact on the database design. Keep in mind that the information is collected to establish the entities, attributes, and the relationships among the entities. Specifically, the relationships, connectivities, and cardinalities are shaped by the business rules that are derived from the information collected by the designer. Sample questions and their likely impact on the design might be: 

Do you want to develop the database for all departments at once, or do you want to design and implement the database for one department at a time?



How will the design approach affect the design process? (In other words, assess topdown versus bottom-up, centralized or decentralized, system scope and boundaries.)



Do you want to develop one module at a time, or do you want an integrated system? (Inventory, production, shipping, billing, etc.)



Do you want to keep track of the nuts and bolts by lot number, production shift, type, and department? Impact: conceptual and logical database design.



Do you want to keep track of the suppliers of each batch of raw material used in the production of the nuts and bolts? Impact: conceptual and logical database design. ER model.



Do you want to keep track of the customers who received the batches of nuts and bolts? Impact: conceptual and logical database design. ER model.



What reports will you require, what will be the specific reporting requirements, and to whom will these reports be distributed?

The answers to such questions affect the conceptual and logical database design, the database’s implementation, its testing, and its subsequent operation. c. What do you envision the SDLC to be? Answer: The SDLC is not a function of the information collected. Regardless of the extent of the design or its specific implementation, the SDLC phases remain:

PLANNING Initial assessment Feasibility study

654

User requirements Study of existing systems Logical system design

DETAILED SYSTEMS DESIGN Detailed system specifications

IMPLEMENTATION Coding, testing, debugging Installation, fine-tuning

MAINTENANCE Evaluation Maintenance Enhancements d. What do you envision the DBLC to be? Answer: As is true for the SDLC, the DBLC is not a function of the kind and extent of the collected information. Thus, the DBLC phases and their activities remain as shown:

DATABASE INITIAL STUDY Analyze the company situation Define problems and constraints Define objectives Define scope and boundaries

DATABASE DESIGN Create the conceptual design Create the logical design Create the physical design

IMPLEMENTATION AND LOADING Install the DBMS

655

Create the database(s) Load or convert the data

TESTING AND EVALUATION Test the database Fine-tune the database Evaluate the database and its application programs

OPERATION Produce the required information flow

MAINTENANCE AND EVOLUTION Introduce changes Make enhancements 701. Suppose that you perform the same functions noted in Problem 2 for a larger warehousing operation. How are the two sets of procedures similar? How and why are they different? Answer: The development of an information system will differ in the approach and philosophy used. More precisely, the designer team will probably be formed by a group of system analysts and may decide to use a decentralized approach to database design. Also, as is true for any organization, the system scope and constraints may be very different for different systems. Therefore, designers may opt to use different techniques at different stages. For example, the database initial study phase may include separate studies carried out by separate design teams at several geographically distant locations. Each of the findings of the design teams will later be integrated to identify the main problems, solutions, and opportunities that will guide the design and development of the system. 702. Using the same procedures and concepts employed in Problem 1, how would you create an information system for the Tiny College example in Chapter 4? Answer: Tiny College is a medium-sized educational institution that uses many database-intensive operations, such as student registration, academic administration, inventory management, and payroll. To create an information system, first perform an initial database study to determine the information system’s objectives.

656

Next, study Tiny College’s operations and processes (flow of data) to identify the main problems, constraints, and opportunities. A precise definition of the main problems and constraints will enable the designer to make sure that the design improves Tiny College’s operational efficiency. An improvement in operational efficiency is likely to create opportunities to provide new services that will enhance Tiny College’s competitive position. After the initial database study is done and the alternative solutions are presented, the end users ultimately decide which one of the probable solutions is most appropriate for Tiny College. Keep in mind that the development of a system this size will probably involve people who have quite different backgrounds. For example, it is likely that the designer must work with people who play a managerial role in communications and local area networks, as well as with the “troops in the trenches” such as programmers and system operators. The designer should, therefore, expect that there will be a wide range of opinions concerning the proposed system’s features. It is the designer’s job to reconcile the many (and often conflicting) views of the “ideal” system. Once a proposed solution has been agreed upon, the designer(s) may determine the proposed system’s scope and boundaries. We are then able to begin the design phase. As the design phase begins, keep in mind that Tiny College’s information system is likely to be used by many users (20 to 40 minimum) who are located on distant sites across campus. Therefore, the designer must consider a range of communication issues involving the use of such technologies as local area networks. These technologies must be considered as the database designer(s) begin to develop the structure of the database to be implemented. The remaining development work conforms to the SDLC and the DBLC phases. Special attention must be given to the system design’s implementation and testing to ensure that all the system modules interface properly. Finally, the designer(s) must provide all the appropriate system documentation and ensure that all appropriate system maintenance procedures (periodic backups, security checks, etc.) are in place to ensure the system’s proper operation. Keep in mind that two very important issues in a university-wide system are end-user training and support. Therefore, the system designer(s) must make sure that all end users know the system and know how it is to be used to enjoy its benefits. In other words, make sure that end-user support programs are in place when the system becomes operational. 703. Write the proper sequence of activities for the design of a video rental database. (The initial ERD was shown in Figure 9.9.) The design must support all rental activities, customer payment tracking, and employee work schedules, as well as track which employees checked out the videos to the customers. After you finish writing the design activity sequence, complete the ERD to ensure that the database design can be successfully implemented. (Make sure that the design is normalized properly and that it can support the required transactions. Answer: Given its level of detail and (relative) complexity, this problem would make an excellent class project. Use the chapter’s coverage of the database life cycle (DBLC) as the procedural template. The text’s Figure 9.3 is particularly useful as a procedural map for this problem’s solution and Figure 9.6 provides a more detailed view of the database design’s procedural flow. Make sure that the students review Section 9-3b, “Database Design,” before they attempt to produce the problem solution.

657

Appendix B “The University Lab: Conceptual Design” and Appendix C “The University Lab: Conceptual Design Verification, Logical Design, and Implementation” show a very detailed example of the procedures required to deliver a completed database. You will find a more detailed video rental database problem description in Appendix B, problem 4. This problem requires the completion of the initial database design. The solution is shown in this manual’s Appendix B coverage. This design is verified in Appendix C, Problem 2. The Visio Professional files for the initial and verified designs are located on your instructor’s resources; the FigD-P04a-The-Initial-Crows-Foot-ERD-for-the-Video-Rental-Store.vsd file has the initial design. Select the FigE-P02a-The-Revised-Video-Rental-Crows-Foot-ERD.vsd file to see the verified design. 704. In a construction company, a new system has been in place for a few months and now there is a list of possible changes/updates that need to be done. For each of the changes/updates, specify what type of maintenance needs to be done: (a) corrective, (b) adaptive, or (c) perfective. d. An error in the size of one of the fields has been identified and it needs to be updated status field needs to be changed. Answer: This is a change in response to a system error – corrective maintenance. e. The company is expanding into a new type of service, which will require enhancing the system with a new set of tables to support this new service and integrate it with the existing data. Answer: This is a change to enhance the system—perfective maintenance. f.

The company has to comply with some government regulations. To do this, it will require adding a couple of fields to the existing system tables. Answer: This is a change in response to changes in the business environment—adaptive maintenance.

705. You have been assigned to design the database for a new soccer club. Indicate the most appropriate sequence of activities by labeling each of the following steps in the correct order. (e.g., if you think that “Load the database” is the appropriate first step, label it “1.”) Answer: 10

Create the application programs.

Create a description of each system process.

Test the system.

Load the database.

Normalize the conceptual model.

Interview the soccer club president.

Create a conceptual model using ER diagrams.

658

Interview the soccer club director of coaching.

Create the file (table) structures.

Obtain a general description of the soccer club operations.

Draw a data flow diagram and system flowcharts.

ANSWERS TO REVIEW QUESTIONS 706. Explain the following statement: A transaction is a logical unit of work. Answer: A transaction is a logical unit of work that must be entirely completed or aborted; no intermediate states are accepted. In other words, a transaction, composed of several database requests, is treated by the DBMS as a unit of work in which all transaction steps must be fully completed if the transaction is to be accepted by the DBMS. Acceptance of an incomplete transaction will yield an inconsistent database state. To avoid such a state, the DBMS ensures that all of a transaction’s database operations are completed before they are committed to the database. For example, a credit sale requires a minimum of three database operations: 4. An invoice is created for the sold product. 5. The product’s inventory quantity on hand is reduced. 6. The customer accounts payable balance is increased by the amount listed on the invoice. If only parts 1 and 2 are completed, the database will be left in an inconsistent state. Unless all three parts (1, 2, and 3) are completed, the entire sales transaction is canceled. 707. What is a consistent database state, and how is it achieved? Answer: A consistent database state is one in which all data integrity constraints are satisfied. To achieve a consistent database state, a transaction must take the database from one consistent state to another. (See the answer to Question 1.) 708. The DBMS does not guarantee that the semantic meaning of the transaction truly represents the real-world event. What are the possible consequences of that limitation? Give an example.

659

Answer: The database is designed to verify the syntactic accuracy of the database commands given by the user to be executed by the DBMS. The DBMS will check that the database exists, that the referenced attributes exist in the selected tables, that the attribute data types are correct, and so on. Unfortunately, the DBMS is not designed to guarantee that the syntactically correct transaction accurately represents the real-world event. For example, if the end user sells 10 units of product 100179 (Crystal Vases), the DBMS cannot detect errors such as the operator entering 10 units of product 100197 (Crystal Glasses). The DBMS will execute the transaction, and the database will end up in a technically consistent state but in a real-world inconsistent state because the wrong product was updated. 709. List and discuss the four individual transaction properties. Answer: The four transaction properties are: Atomicity

requires that all parts of a transaction must be completed or the transaction is aborted. This property ensures that the database will remain in a consistent state.

Consistency

indicates the permanence of the database consistent state.

Isolation

Durability

indicates that the database will be in a permanent consistent state after the execution of a transaction. In other words, once a consistent state is reached, it cannot be lost.

All four transaction properties work together to make sure that a database maintains data integrity and consistency for either a single-user or a multiuser DBMS. 710. What does serializability of transactions mean? Answer: Serializability of transactions means that a series of concurrent transactions will yield the same result as if they were executed one after another. 711. What is a transaction log, and what is its function? Answer: The transaction log is a special DBMS table that contains a description of all the database transactions executed by the DBMS. The database transaction log plays a crucial role in maintaining database concurrency control and integrity. The information stored in the log is used by the DBMS to recover the database after a transaction is aborted or after a system failure. The transaction log is usually stored in a different hard disk or in a different media (tape) to prevent the failure caused by a media error. 712. What is a scheduler, what does it do, and why is its activity important to concurrency control? Answer: The scheduler is the DBMS component that establishes the order in which concurrent database operations are executed. The scheduler interleaves the execution of the database operations (belonging to several concurrent transactions) to ensure the serializability of transactions. In other words, the scheduler guarantees that the execution of concurrent transactions will yield the same result as though the transactions were executed one after another. The scheduler is important because it is the DBMS component that will ensure transaction serializability. In other words, the scheduler allows the concurrent

660

execution of transactions, giving end users the impression that they are the DBMS’s only users. 713. What is a lock, and how does it work in general? Answer: A lock is a mechanism used in concurrency control to guarantee the exclusive use of a data element to the transaction that owns the lock. For example, if the data element X is currently locked by transaction T1, transaction T2 will not have access to the data element X until T1 releases its lock. Generally speaking, a data item can be in only two states: locked (being used by some transaction) or unlocked (not in use by any transaction). To access a data element X, a transaction T1 first must request a lock to the DBMS. If the data element is not in use, the DBMS will lock X to be used by T1 exclusively. No other transaction will have access to X while T1 is executed. 714. What are the different levels of lock granularity? Answer: Lock granularity refers to the size of the database object that a single lock is placed upon. Lock granularity can be: Database-level, meaning the entire database is locked by one lock. Table-level, meaning a table is locked by one lock. Page-level, meaning a diskpage is locked by one lock. Row-level, meaning one row is locked by one lock. Field-level, meaning one field in one row is locked by one lock. 715. Why might a page-level lock be preferred over a field-level lock? Answer: Smaller lock granularity improves the concurrency of the database by reducing contention to lock database objects. However, smaller lock granularity also means that more locks must be maintained and managed by the DBMS, requiring more processing overhead and system resources for lock management. Concurrency demands and system resource usage must be balanced to ensure the best overall transaction performance. In some circumstances, page-level locks, which require fewer system resources, may produce better overall performance than field-level locks, which require more system resources. 716. What is concurrency control, and what is its objective? Answer: Concurrency control is the activity of coordinating the simultaneous execution of transactions in a multiprocessing or multiuser database management system. The objective of concurrency control is to ensure the serializability of transactions in a multiuser database management system. (The DBMS’s scheduler is in charge of maintaining concurrency control.) Because it helps to guarantee data integrity and consistency in a database system, concurrency control is one of the most critical activities performed by a DBMS. If concurrency control is not maintained, three serious problems may be caused by concurrent transaction execution: lost updates, uncommitted data, and inconsistent retrievals. 717. What is an exclusive lock, and under what circumstances is it granted?

661

Answer: An exclusive lock is one of two lock types used to enforce concurrency control. (A lock can have three states: unlocked, shared (read) lock, and exclusive (write) lock. The “shared” and “exclusive” labels indicate the nature of the lock.) An exclusive lock exists when access to a data item is specifically reserved for the transaction that locked the object. The exclusive lock must be used when a potential for conflict exists, for example, when one or more transactions must update (WRITE) a data item. Therefore, an exclusive lock is issued only when a transaction must WRITE (update) a data item and no locks are currently held on that data item by any other transaction. To understand the reasons for having an exclusive lock, look at its counterpart, the shared lock. Shared locks are appropriate when concurrent transactions are granted READ access on the basis of a common lock, because concurrent transactions based on a READ cannot produce a conflict. A shared lock is issued when a transaction must read data from the database and no exclusive locks are held on the data to be read. 718. What is a deadlock, and how can it be avoided? Discuss several strategies for dealing with deadlocks. Answer: Base your discussion on Section 10-3d, Deadlocks. Start by pointing out that, although locks prevent serious data inconsistencies, their use may lead to two major problems: 3. The transaction schedule dictated by the locking requirements may not be serializable, thus causing data integrity and consistency problems. 4. The schedule may create deadlocks. Database deadlocks are the equivalent of a traffic gridlock in a big city and are caused by two transactions waiting for each other to unlock data. Use Table 10.13 in the text to illustrate the scenario that leads to a deadlock. The table has been reproduced below for your convenience.

Table 10.13 How a Deadlock Condition Is Created

TIME 0 1 2 3 4 5 6 7 8 9 … … … …

TRANSACTION T1:LOCK(X) T2:LOCK(Y) T1:LOCK(Y) T2:LOCK(X) T1:LOCK(Y) T2:LOCK(X) T1:LOCK(Y) T2:LOCK(X) T1:LOCK(Y) ………….. ………….. ………….. …………..

REPLY OK OK WAIT WAIT WAIT WAIT WAIT WAIT WAIT …….. …….. …….. ……..

Data X Unlocked Locked Locked Locked Locked Locked Locked Locked Locked … … … …

LOCK STATUS Data Y Unlocked Unlocked Locked Deadlock Locked Locked Locked Locked Locked Locked … … … …

662

DEADLOCK AVOIDANCE The transaction must obtain all the locks it needs before it can be executed. This technique avoids rollback of conflicting transactions by requiring that locks be obtained in succession. However, the serial lock assignment required in deadlock avoidance increases the response times. The best deadlock control method depends on the database environment. For example, if the probability of deadlocks is low, deadlock detection is recommended. However, if the probability of deadlocks is high, deadlock prevention is recommended. If response time is not high on the system priority list, deadlock avoidance may be employed. 719. What are some disadvantages of time stamping methods for concurrency control? Answer: The disadvantages are: (1) each value stored in the database requires two additional time stamp fields—one for the last time the field was read and one for the last time it was updated, (2) increased memory and processing overhead requirements, and (3) many transactions may have to be stopped, rescheduled, and restamped. 720. Why might it take a long time to complete transactions when using an optimistic approach to concurrency control? Answer: Because the optimistic approach makes the assumption that conflict from concurrent transactions is unlikely, it does nothing to avoid conflicts or control the conflicts. The only test for conflict occurs during the validation phase. If a conflict is detected, then the entire transaction restarts. In an environment with few conflicts from concurrency, this type of single checking scheme works well. In an environment where conflicts are common, a transaction may have to be restarted numerous times before it can be written to the database. 721. What are the three types of database critical events that can trigger the database recovery process? Give some examples for each one.

663



Human-caused incidents. This type of event can be categorized as unintentional or intentional.





Natural disasters. This category includes fires, earthquakes, floods, and power failures.

722. What are the four ANSI transaction isolation levels? What type of reads does each level allow? Answer: The four ANSI transaction isolation levels are (1) read uncommitted, (2) read committed, (3) repeatable read, and (4) serializable. These levels allow different “questionable” reads. A read is questionable if it can produce inconsistent results. Read uncommitted isolation will allow dirty reads, nonrepeatable reads, and phantom reads. Read committed isolation will allow nonrepeatable reads and phantom reads. Repeatable read isolation will allow phantom reads. Serializable does not allow any questionable reads.

664

Table P10.1

Table name: PRODUCT

Table name: PART

PROD_CODE

PROD_QOH

PART_CODE

PART_QOH

ABC

1,205

567

549

Given the preceding information, complete Problems 1a through 1e. f.

How many database requests can you identify for an inventory update for both PRODUCT and PART? Answer: Depending in how the SQL statements are written, there are two correct answers: 4 or 2.

g. Using SQL, write each database request you identified in Problem 1a. Answer: The database requests are shown in the following table.

Two SQL statements UPDATE PRODUCT SET PROD_QOH = PROD_OQH + 1 WHERE PROD_CODE = ‘ABC’ UPDATE PART SET PART_QOH = PART_OQH 1 WHERE PART_CODE = ‘A’ OR PART_CODE = ‘B’ OR PART_CODE = ‘C’

SET PART_QOH = PART_OQH - 1

665

Four SQL statements

Two SQL statements

WHERE PART_CODE = ‘C’

h. Write the complete transaction(s). Answer: The transactions are shown in the following table.

Four SQL statements

Two SQL statements

BEGIN TRANSACTION

UPDATE PRODUCT

SET PROD_QOH = PROD_OQH + 1

WHERE PROD_CODE = ‘ABC’

UPDATE PART

SET PART_QOH = PART_OQH - 1

WHERE PART_CODE = ‘A’

WHERE PART_CODE = ‘A’ OR PART_CODE = ‘B’ OR

UPDATE PART

PART_CODE = ‘C’

SET PART_QOH = PART_OQH - 1 WHERE PART_CODE = ‘B’

COMMIT;

UPDATE PART SET PART_QOH = PART_OQH - 1 WHERE PART_CODE = ‘C’ COMMIT; i.

Write the transaction log, using Table 10.1 as your template. Answer: We assume that product “ABC” has a PROD_QOH = 23 at the start of the transaction and that the transaction is representing the addition of 1 new product. We also assume that PART components “A”, “B”, and “C” have a PROD_QOH equal to 56, 12, and 45, respectively.

TRL ID

TRX NUM

PREV PTR

NEXT PTR

OPERATION

1A3

NULL

START

**START TRANSACTION

1A3

UPDATE

PRODUCT

TABLE

ROW ID

ATTRIBUTE

BEFORE VALUE

AFTER VALUE

‘ABC’

PROD_QOH

666

TRL ID

TRX NUM

PREV PTR

NEXT PTR

OPERATION

1A3

UPDATE

1A3

ROW ID

ATTRIBUTE

BEFORE VALUE

AFTER VALUE

PART

‘A’

PART_QOH

UPDATE

PART

‘B’

PART_QOH

UPDATE

PART

‘C’

PART_QOH

NULL

COMMIT

** END TRANSACTION

TABLE

Using the transaction log you created in Problem 1d, trace its use in database recovery. Answer: Begin with the last trl_id (trl_id 6) for the transaction (trx_num 1A3) and work backward using the prev_ptr to identify the next step to undo moving from the end of the transaction back to the beginning. Trl_ID 6: Nothing to change because it is an end of transaction marker. Trl_ID 5: Change PART_QOH from 44 to 45 for ROW_ID ‘C’ in PART table. Trl_ID 4: Change PART_QOH from 11 to 12 for ROW_ID ‘B’ in PART table. Trl_ID 3: Change PART_QOH from 55 to 56 for ROW_ID ‘A’ in PART table. Trl_ID 2: Change PROD_QOH from 24 to 23 for ROW_ID ‘ABC’ in PRODUCT table. Trl_ID 1: Nothing to change because it is a beginning of transaction marker.

723. Describe the three most common problems with concurrent transaction execution. Explain how concurrency control can be used to avoid those problems. Answer: The three main concurrency control problems are triggered by lost updates, uncommitted data, and inconsistent retrievals. These control problems are discussed in detail in Section 10-2. Note particularly Section 10-2a, Lost Updates, Section 10-2b, Uncommitted Data, and Section 10-2c, Inconsistent Retrievals. 724. What DBMS component is responsible for concurrency control? How is this feature used to resolve conflicts? Answer: Severe database integrity and consistency problems can arise when two or more concurrent transactions are executed. In order to avoid such problems, the DBMS must exercise concurrency control. The DBMS’s component in charge of concurrency control is the scheduler. The scheduler is discussed in Section 10-2d. Note particularly the Read/Write conflict scenarios illustrated with the help of Table 10.11, Read/Write Conflict Scenarios: Conflicting Database Operations Matrix. 725. Using a simple example, explain the use of binary and shared/exclusive locks in a DBMS. Answer: Binary locks have two states, locked and unlocked. Shared/exclusive locks have three states, shared lock, exclusive lock, and unlocked. For example, given a row-level lock granularity and three transactions that all want access to the same customer row with the following requests:

667

668

If shared/exclusive locks are used, then T1 gets a shared lock on the customer since it is only reading the data. T2 is then allowed to join the shared lock with T1 since it also only wants to read the data. T2 did not have to wait for T1 to finish, both transactions shared the locked data simultaneously. T3 needs an exclusive lock to update the data, so it must wait until both T1 and T2 release the shared lock. The shared/exclusive locks provided overall better performance since T2 did not have to wait, and T3’s total wait time is less since T2 did not have to wait for T1 to finish before it could begin. 726. Suppose that your database system has failed. Describe the database recovery process and the use of deferred-write and write-through techniques. Answer: Recovery restores a database from a given state, usually inconsistent, to a previously consistent state. Depending on the type and the extent of the failure, the recovery process ranges from a minor short-term inconvenience to a major long-term rebuild action. Regardless of the extent of the required recovery process, recovery is not possible without backup. The database recovery process generally follows a predictable scenario: 5. Determine the type and the extent of the required recovery. 6. If the entire database needs to be recovered to a consistent state, the recovery uses the most recent backup copy of the database in a known consistent state. 7. The backup copy is then rolled forward to restore all subsequent transactions by using the transaction log information. 8. If the database needs to be recovered, but the committed portion of the database is usable, the recovery process uses the transaction log to “undo” all the transactions that were not committed. Recovery procedures generally make use of deferred-write and write-thru techniques. In the case of the deferred-write or deferred-update, the transaction operations do not immediately update the database. Instead: 

All changes (previous and new data values) are first written to the transaction log.



The database is updated only after the transaction reaches its commit point.



If the transaction fails before it reaches its commit point, no changes (no roll-back or undo) need to be made to the database because the database was never updated.

In contrast, if the write-thru or immediate-update technique is used: 

The database is immediately updated by transaction operations during the transaction’s execution, even before the transaction reaches its commit point.



The transaction log is also updated; so if a transaction fails, the database uses the log information to roll back (“undo”) the database to its previous state.

ONLINE CONTENT The Ch10_ABC_Markets database is available at www.cengage.com. This database is stored in Microsoft Access format. 727. ABC Markets sell products to customers. The relational diagram shown in Figure P10.6 represents the main entities for ABC’s database. Note the following important characteristics: © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

669



A product’s quantity on hand (P_QTYOH) is updated (decreased) with each product sale.



Note: Not all entities and attributes are represented in this example. Use only the attributes indicated.

670

FIGURE P10.6 The ABC Markets Relational Diagram

Using this database, write the SQL code to represent each of the following transactions. Use BEGIN TRANSACTION and COMMIT to group the SQL statements in logical transactions. c. On May 11, 2022, customer 10010 makes a credit purchase (30 days) of one unit of product 11QER/31 with a unit price of $110.00; the tax rate is 8 percent. The invoice number is 10983, and this invoice has only one product line. Answer: i.

BEGIN TRANSACTION

INSERT INTO INVOICE ii.

INSERT INTO LINE ii.

VALUES (10983, ‘10010’, ‘2022-05-11’, 118.80, ‘30’, ‘OPEN’);

VALUES (10983, 1, ‘11QER/31’, 1, 110.00);

UPDATE PRODUCT iii.

SET P_QTYOH = P_QTYOH – 1

iv.

WHERE P_CODE = ‘11QER/31’;

m. UPDATE CUSTOMER n. SET CUS_DATELSTPUR = ‘2022-05-11’, CUS_BALANCE = CUS_BALANCE +118.80 o. WHERE CUS_CODE = ‘10010’;

671

p. COMMIT; d. On June 3, 2022, customer 10010 makes a payment of $100 in cash. The payment ID is 3428. Answer: c.

BEGIN TRANSACTION

d. INSERT INTO PAYMENTS VALUES (3428, ‘2022-06-03’, ‘10010’, 100.00, ‘CASH’, ‘None’); UPDATE CUSTOMER; SET CUS_DATELSTPMT = ‘2022-06-03’, CUS_BALANCE = CUS_BALANCE -100.00 WHERE CUS_CODE = ‘10010’; COMMIT 728. Create a simple transaction log (using the format shown in Table 10.14) to represent the actions of the transactions in Problems 6a and 6b. Answer: The transaction log is shown in Table P10.7.

Table P10.7 The ABC Markets Transaction Log TRL ID

TRX NUM

PREV PTR

NEXT PTR

OPERATION

TABLE

ROW ID

ATTRIBUTE

BEFORE VALUE

987

101

Null

1023

START

* Start Trx.

1023

101

987

1026

INSERT

INVOICE

10983

10983, 10010, 2022-05-11, 118.80, 30, OPEN

1026

101

1023

1029

INSERT

LINE

10983, 1

10983, 1, 11QER/31, 1, 110.00

1029

101

1026

1031

UPDATE

PRODUCT

11QER/31

P_QTYOH

1031

101

1029

1032

UPDATE

CUSTOMER

10010

CUS_BALANCE

345.67

464.47

1032

101

1031

1034

UPDATE

CUSTOMER

10010

CUS_DATELSTPUR

2022-0505

2022-05-11

1034

101

1032

Null

COMMIT

* End Trx. *

1089

102

Null

1091

START

* Start Trx.

1091

102

1089

1095

INSERT

PAYMENT

3428

AFTER VALUE

3428, 202206-03, 10010, 100.00, CASH, None

672

TRL ID

TRX NUM

PREV PTR

NEXT PTR

OPERATION

TABLE

ROW ID

ATTRIBUTE

BEFORE VALUE

AFTER VALUE

1095

102

1091

1096

UPDATE

CUSTOMER

10010

CUS_BALANCE

464.47

364.47

1096

102

1095

1097

UPDATE

CUSTOMER

10010

CUS_DATELSTPMT

2022-0502

2022-06-03

1097

102

1096

Null

COMMIT

* End Trx.

Note: Because we have not shown the table contents, the “before” values in the transaction can be assumed. The “after” value must be computed using the assumed “before” value, plus or minus the transaction value. Also, in order to save some space, we have combined the “after” values for the INSERT statements into a single cell. Actually, each value could be entered in individual rows. 729. Assuming that pessimistic locking is being used but the two-phase locking protocol is not, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6a. Answer:

Time

Action

Lock INVOICE

Insert row 10983 into INVOICE

Unlock INVOICE

Lock LINE

Insert row 10983, 1 into LINE

Unlock LINE

Lock PRODUCT

Update PRODUCT 11QER/31, P_QTYOH from 47 to 46

Unlock PRODUCT

Lock CUSTOMER

Update CUSTOMER 10010, CUS_BALANCE from 345.67 to 464.47

Update CUSTOMER 10010, CUS_DATELSTPUR from 2022-05-05 to 2022-05-11

Unlock CUSTOMER 730. Assuming that pessimistic locking is being used with the two-phase locking protocol, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6a. Answer:

673

Time

Action

Lock INVOICE

Lock LINE

Lock PRODUCT

Lock CUSTOMER

Insert row 10983 into INVOICE

Insert row 10983, 1 into LINE

Update PRODUCT 11QER/31, P_QTYOH from 47 to 46

Update CUSTOMER 10010, CUS_BALANCE from 345.67 to 464.47

Update CUSTOMER 10010, CUS_DATELSTPUR from 2022-05-05 to 2022-05-11

Unlock INVOICE

Unlock LINE

Unlock PRODUCT

Unlock CUSTOMER 731. Assuming that pessimistic locking is being used but the two-phase locking protocol is not, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6b. Answer:

Time

Action

Lock PAYMENT

Insert row 3428 into PAYMENT

Unlock PAYMENT

Lock CUSTOMER

Update CUSTOMER 10010, CUS_BALANCE from 464.47 to 364.47

Update CUSTOMER 10010, CUS_DATELSTPMT from 2022-05-02 to 2022-06-03

Unlock CUSTOMER 732. Assuming that pessimistic locking with the two-phase locking protocol is being used with rowlevel lock granularity, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6b. Answer:

674

Time

Action

Lock PAYMENT

Lock CUSTOMER

Insert row 3428 into PAYMENT

Update CUSTOMER 10010, CUS_BALANCE from 464.47 to 364.47

Update CUSTOMER 10010, CUS_DATELSTPMT from 2022-05-02 to 2022-06-03

Unlock PAYMENT

Unlock CUSTOMER

ANSWERS TO REVIEW QUESTIONS 733. What is SQL performance tuning? Answer: SQL performance tuning describes a process—on the client side—that will generate an SQL query to return the correct answer in the least amount of time, using the minimum amount of resources at the server end. 734. What is database performance tuning? Answer: DBMS performance tuning describes a process—on the server side—that will properly configure the DBMS environment to respond to clients’ requests in the fastest way possible while making optimum use of existing resources. 735. What is the focus of most performance-tuning activities, and why does that focus exist? Answer: Most performance-tuning activities focus on minimizing the number of I/O operations because the I/O operations are much slower than reading data from the data cache. At this point in the discussion, it will be good to point out the technological advances in hardware, such as solid-state drives (SSD) and in-memory databases. Although such advances improve I/O performance at the physical level, performance tuning is still important at the query formulation level because inefficient joins can still cause a query to use © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

675

unnecessary resources and increase processing times at parsing, execution, and fetching phases. 736. What are database statistics, and why are they important? Answer: The term database statistics refers to a number of measurements gathered by the DBMS to describe a snapshot of the database objects’ characteristics. The DBMS gathers statistics about objects such as tables, indexes, and available resources—such as the number of processors used, processor speed, and temporary space available. Such statistics are used to make critical decisions about improving query processing efficiency.

676

737. How are database statistics obtained? Answer: Database statistics can be gathered manually by the DBA or automatically by the DBMS. For example, many DBMS vendors support SQL’s ANALYZE command to gather statistics. In addition, many vendors have their own routines to gather statistics. For example, IBM’s DB2 uses the RUNSTATS procedure, while Microsoft’s SQL Server uses the UPDATE STATISTICS procedure and provides the Auto-Update and Auto-Create Statistics options in its initialization parameters. 738. What database statistics measurements are typical of tables, indexes, and resources? Answer: For tables, typical measurements include the number of rows, the number of disk blocks used, row length, the number of columns in each row, the number of distinct values in each column, the maximum value in each column, the minimum value in each column, and what columns have indexes. For indexes, typical measurements include the number and name of columns in the index key, the number of key values in the index, the number of distinct key values in the index key, and histogram of key values in an index. For resources, typical measurements include the logical and physical disk block size, the location and size of data files, and the number of extends per data file. 739. How is the processing of SQL DDL statements (such as CREATE TABLE) different from the processing required by DML statements? Answer: A DDL statement actually updates the data dictionary tables or system catalog, while DML statements (SELECT, INSERT, UPDATE, and DELETE) mostly manipulate enduser data. 740. In simple terms, the DBMS processes a query in three phases. What are the phases, and what is accomplished in each phase? Answer: The three phases are: 4. Parsing. The DBMS parses the SQL query and chooses the most efficient access/execution plan. 5. Execution. The DBMS executes the SQL query using the chosen execution plan. 6. Fetching. The DBMS fetches the data and sends the result set back to the client. Parsing involves breaking the query into smaller units and transforming the original SQL query into a slightly different version of the original SQL code—but one that is “fully equivalent” and more efficient. Fully equivalent means that the optimized query results are always the same as the original query. More efficient means that the optimized query will, almost always, execute faster than the original query. (Note that we say almost always because many factors affect the performance of a database. These factors include the network, the client’s computer resources, and even other queries running concurrently in the same database.) After the parsing and execution phases are completed, all rows that match the specified condition(s) have been retrieved, sorted, grouped, and/or—if required—aggregated. During the fetching phase, the rows of the resulting query result set are returned to the client. During this phase, the DBMS may use temporary table space to store temporary data.

677

741. If indexes are so important, why not index every column in every table? (Include a brief discussion of the role played by data sparsity.) Answer: Indexing every column in every table will tax the DBMS too much in terms of indexmaintenance processing, especially if the table has many attributes, many rows, and/or requires many inserts, updates, and/or deletes. One measure to determine the need for an index is the data sparsity of the column you want to index. Data sparsity refers to the number of different values a column could possibly have. For example, a STU_SEX column in a STUDENT table can have only two possible values, “M” or “F”; therefore, this column is said to have low sparsity. In contrast, the STU_DOB column that stores the student date of birth can have many different date values; therefore, this column is said to have high sparsity. Knowing the sparsity helps you decide whether or not the use of an index is appropriate. For example, when you perform a search in a column with low sparsity, you are very likely to read a high percentage of the table rows anyway; therefore, index processing may be unnecessary work. 742. What is the difference between a rule-based optimizer and a cost-based optimizer? Answer: A rule-based optimizer uses a set of preset rules and points to determine the best approach to execute a query. The rules assign a “cost” to each SQL operation; the costs are then added to yield the cost of the execution plan. A cost-based optimizer uses sophisticated algorithms based on the statistics about the objects being accessed to determine the best approach to execute a query. In this case, the optimizer process adds up the processing cost, the I/O costs, and the resource costs (RAM and temporary space) to come up with the total cost of a given execution plan. 743. What are optimizer hints and how are they used? Answer: Hints are special instructions for the optimizer that are embedded inside the SQL command text. Although the optimizer generally performs very well under most circumstances, there are some circumstances in which the optimizer may not choose the best execution plan. Remember, the optimizer makes decisions based on the existing statistics. If the statistics are old, the optimizer may not do a good job in selecting the best execution plan. Even with the current statistics, the optimizer choice may not be the most efficient one. There are some occasions when the end-user would like to change the optimizer mode for the current SQL statement. In order to accomplish this task, you have to use hints. 744. What are some general guidelines for creating and using indexes? Answer: Create indexes for each single attribute used in a WHERE, HAVING, ORDER BY, or GROUP BY clause. If you create indexes in all single attributes used in search conditions, the DBMS will access the table using an index scan, instead of a full table scan. For example, if you have an index for P_PRICE, the condition P_PRICE > 10.00 can be solved by accessing the index, instead of sequentially scanning all table rows and evaluating P_PRICE for each row. Indexes are also used in join expressions, such as in CUSTOMER.CUS_CODE = INVOICE.CUS_CODE.

678

Do not use indexes in small tables or tables with low sparsity. Remember, small tables and low sparsity tables are not the same thing. A search condition in a table with low sparsity may return a high percentage of table rows anyway, making the index operation too costly and making the full table scan a viable option. Using the same logic, do not create indexes for tables with few rows and few attributes—unless you must ensure the existence of unique values in a column. Declare primary and foreign keys so the optimizer can use the indexes in join operations. All natural joins and old-style joins will benefit if you declare primary keys and foreign keys because the optimizer will use the available indexes at join time. The declaration of a PK or an FK will automatically create an index for the declared column. Also, for the same reason, it is better to write joins using the SQL JOIN syntax. (See Chapter 8, “Advanced SQL.”) Declare indexes in join columns other than PK/FK. If you do join operations on columns other than the primary and foreign key, you may be better off declaring indexes in such columns. 745. Most query optimization techniques are designed to make the optimizer’s work easier. What factors should you keep in mind if you intend to write conditional expressions in SQL code? Answer: Use simple columns or literals as operands in a conditional expression—avoid the use of conditional expressions with functions whenever possible. Comparing the contents of a single column to a literal is faster than comparing to expressions. Numeric field comparisons are faster than character, date, and NULL comparisons. In search conditions, comparing a numeric attribute to a numeric literal is faster than comparing a character attribute to a character literal. In general, numeric comparisons (integer, decimal) are handled faster by the CPU than character and date comparisons. Because indexes do not store references to null values, NULL conditions involve additional processing and therefore tend to be the slowest of all conditional operands. Equality comparisons are faster than inequality comparisons. As a general rule, equality comparisons are processed faster than inequality comparisons. For example, P_PRICE = 10.00 is processed faster because the DBMS can do a direct search using the index in the column. If there are no exact matches, the condition is evaluated as false. However, if you use an inequality symbol (>, >=, <, <=), the DBMS must perform additional processing to complete the request. This is because there would almost always be more “greater than” or “less than” values and perhaps only a few exactly “equal” values in the index. The slowest (with the exception of NULL) of all comparison operators is LIKE with wildcard symbols, such as in V_CONTACT LIKE “%glo%”. Also, using the “not equal” symbol (<>) yields slower searches, especially if the sparsity of the data is high; that is, if there are many more different values than there are equal values. Whenever possible, transform conditional expressions to use literals. For example, if your condition is P_PRICE -10 = 7, change it to read P_PRICE = 17. Also, if you have a composite condition such as: P_QOH < P_MIN AND P_MIN = P_REORDER AND P_QOH = 10 change it to read: P_QOH = 10 AND P_MIN = P_REORDER AND P_MIN > 10

679

When using multiple conditional expressions, write the equality conditions first. (Note that we did this in the previous example.) Remember, equality conditions are faster to process than inequality conditions. Although most RDBMSs will automatically do this for you, paying attention to this detail lightens the load for the query optimizer. (The optimizer won’t have to do what you have already done.) If you use multiple AND conditions, write the condition most likely to be false first. If you use this technique, the DBMS will stop evaluating the rest of the conditions as soon as it finds a conditional expression that is evaluated to be false. Remember, for multiple AND conditions to be found true, all conditions must be evaluated as true. If one of the conditions evaluates to false, everything else is evaluated as false. Therefore, if you use this technique, the DBMS won’t waste time unnecessarily evaluating additional conditions. Naturally, the use of this technique implies an implicit knowledge of the sparsity of the data set. Whenever possible, try to avoid the use of the NOT logical operator. It is best to transform a SQL expression containing a NOT logical operator into an equivalent expression. For example: NOT (P_PRICE > 10.00) can be written as P_PRICE <= 10.00. Also, NOT (EMP_SEX = 'M') can be written as EMP_SEX = 'F'. 746. What recommendations would you make for managing the data files in a DBMS with many tables and indexes? Answer: First, create independent data files for the system, indexes, and user data table spaces. Put the data files on separate disks or RAID volumes. This ensures that index operations will not conflict with end-user data or data dictionary table access operations. Second, put high-usage end-user tables in their own table spaces. By doing this, the database minimizes conflicts with other tables and maximizes storage utilization. Third, evaluate the creation of indexes based on the access patterns. Identify common search criteria and isolate the most frequently used columns in search conditions. Create indexes on high usage columns with high sparsity. Fourth, evaluate the usage of aggregate queries in your database. Identify columns used in aggregate functions and determine if the creation of indexes on such columns will improve response time. Finally, identify columns used in ORDER BY statements and make sure there are indexes on such columns. 747. What does RAID stand for, and what are some commonly used RAID levels? Answer: RAID is the acronym for Redundant Array of Independent Disks. RAID is used to provide balance between performance and fault tolerance. RAID systems use multiple disks to create virtual disks (storage volumes) formed by several individual disks. RAID systems provide performance improvement and fault tolerance. Table 11.7 in the text shows the commonly used RAID levels. (We have reproduced the table for your convenience.)

680

Table 11.7 Common RAID Levels RAID Level

Description

The data and the parity are striped across separate drives. Provides good read performance and fault tolerance via parity data. Requires a minimum of three drives.

681

ANSWERS TO PROBLEMS Problems 1 and 2 are based on the following query: SELECT

EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX

FROM

EMPLOYEE

WHERE

EMP_SEX = ‘F’ AND EMP_AREACODE = ‘615’

ORDER BY EMP_LNAME, EMP_FNAME; What is the likely data sparsity of the EMP_SEX column? Answer: Because this column has only two possible values (“M” and “F”), the EMP_SEX column has low sparsity. 748. What indexes should you create? Write the required SQL commands. Answer: You should create an index in EMP_AREACODE and a composite index on EMP_LNAME, EMP_FNAME. In the following solution, we have named the two indexes EMP_NDX1 and EMP_NDX2, respectively. The required SQL commands are: CREATE INDEX EMP_NDX1 ON EMPLOYEE(EMP_AREACODE); CREATE INDEX EMP_NDX2 ON EMPLOYEE(EMP_LNAME, EMP_FNAME); 749. Using Table 11.4 as an example, create two alternative access plans. Use the following assumptions: a. There are 8,000 employees. b. There are 4,150 female employees. c. There are 370 employees in area code 615. d. There are 190 female employees in area code 615. Answer: The solution is shown in Table P11.3.

682

Table P11.3 Comparing Access Plans and I/O Costs Plan

Step

Operation

I/O Operations

I/O Cost

Resulting Set Rows

8,000

190

8,000

Full table scan EMPLOYEE

Total I/O Cost

Select only rows with EMP_SEX=‘F’ and EMP_AREACODE=‘615’ A

SORT Operation

190

8,190

Index Scan Range of EMP_NDX1

370

Table Access by RowID

370

740

EMPLOYEE B

Select only rows with EMP_SEX=‘F’

370

190

930

SORT Operation

190

1,120

EMP_LNAME, EMP_FNAME, EMP_DOB, YEAR(EMP_DOB) AS YEAR

FROM

EMPLOYEE

WHERE

YEAR(EMP_DOB) = 1976;

750. What is the likely data sparsity of the EMP_DOB column? Answer: Because the EMP_DOB column stores employee’s birthdays, this column is very likely to have high data sparsity. 751. Should you create an index on EMP_DOB? Why or why not? Answer: Creating an index in the EMP_DOB column would not help this query, because the query uses the YEAR function. However, if the same column is used for other queries, you may want to re-evaluate the decision not to create the index.

683

752. What type of database I/O operations will likely be used by the query? (See Table 11.3.) Answer: This query more than likely uses a full table scan to read all rows of the EMPLOYEE table and generate the required output. We have reproduced the table here to facilitate your discussion:

Table 11.3 Sample DBMS Access Plan I/O Operations Operation

Description

Table scan (full)

Reads the entire table sequentially, from the first row to the last, one row at a time (slowest)

Table access (row id)

Reads a table row directly, using the row ID value (fastest)

Index scan (range)

Reads the index first to obtain the row IDs and then accesses the table rows directly (faster than a full table scan)

Index access (unique)

Used when a table has a unique index in a column

Nested loop

Reads and compares a set of values to another set of values, using a nested loop style (slow)

Merge

Merges two data sets (slow)

Sort

Sorts a data set (slow) Problems 7–29 are based on the ER model shown in Figure P11.7.

684

FIGURE P11.7 The Ch11_SaleCo ER Model for Problems 7–29

Problems 7–10 are based on the following query: SELECT

P_CODE, P_PRICE

FROM

PRODUCT

WHERE

P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT);

753. Assuming there are no table statistics, what type of optimization will the DBMS use? Answer: The DBMS will use the rule-based optimization. 754. What type of database I/O operations will likely be used by the query? (See Table 11.3.) Answer: The DBMS will likely use a full table scan to compute the average price in the inner subquery. The DBMS is also very likely to use another full table scan of PRODUCT to execute the outer query. (We have reproduced the table for your convenience.)

685

Table 11.3 Sample DBMS Access Plan I/O Operations Operation

Description

Table scan (full)

Reads the entire table sequentially, from the first row to the last, one row at a time (slowest)

Table access (row id)

Reads a table row directly, using the row ID value (fastest)

Index scan (range)

Reads the index first to obtain the row IDs and then accesses the table rows directly (faster than a full table scan)

Index access (unique)

Used when a table has a unique index in a column

Nested loop

Reads and compares a set of values to another set of values, using a nested loop style (slow)

Merge

Merges two data sets (slow)

Sort

Sorts a data set (slow)

755. What is the likely data sparsity of the P_PRICE column? Answer: Because each product is likely to have a different price, the P_PRICE column is likely to have high sparsity. 756. Should you create an index? Why or why not? Answer: Yes, you should create an index because the column P_PRICE has high sparsity and the column is very likely to be used in many different SQL queries as part of a conditional expression. Problems 11–14 are based on the following query: SELECT

P_CODE, SUM(LINE_UNITS)

FROM

LINE

GROUP BY

P_CODE

HAVING

SUM(LINE_UNITS) > (SELECT MAX(LINE_UNITS) FROM LINE);

757. What is the likely data sparsity of the LINE_UNITS column? Answer: The LINE_UNITS column in the LINE table represents the quantity purchased of a given product in a given invoice. This column is likely to have many different values and therefore, the column is very likely to have high sparsity.

686

758. Should you create an index? If so, what would the index column(s) be, and why would you create the index? If not, explain your reasoning. Answer: Yes, you should create an index on LINE_UNITS. This index is likely to help in the execution of the inner query that computes the maximum value of LINE_UNITS. 759. Should you create an index on P_CODE? If so, write the SQL command to create the index. If not, explain your reasoning. Answer: Yes, creating an index on P_CODE will help in query execution. However, most DBMSs automatically index foreign key columns. If this is not the case in your DBMS, you can manually create an index using the CREATE INDEX LINE_NDX1 ON LINE(P_CODE) command. (Note that we have named the index LINE_NDX1.) 760. Write the command to create statistics for this table. Answer: ANALYZE TABLE LINE COMPUTE STATISTICS; Problems 15 and 16 are based on the following query: SELECT

P_CODE, P_QOH * P_PRICE

FROM

PRODUCT

WHERE

P_QOH * P_PRICE > (SELECT AVG(P_QOH * P_PRICE) FROM PRODUCT)

761. What is the likely data sparsity of the P_QOH and P_PRICE columns? Answer: The P_QOH and P_PRICE are likely to have high data sparsity. 762. Should you create an index? If so, what would the index column(s) be, and why should you create the index? Answer: In this case, creating an index on P_QOH or on P_PRICE will not help the query execute faster for two reasons: first, the WHERE condition on the outer query uses an expression and second, the aggregate function also uses an expression. When using expressions in the operands of a conditional expression, the DBMS will not use indexes available on the columns that are used in the expression. Problems 17–20 are based on the following query: SELECT

V_CODE, V_NAME, V_CONTACT, V_STATE

FROM

VENDOR

WHERE

V_STATE = ‘TN’

ORDER BY

V_NAME;

687

763. What indexes should you create and why? Write the SQL command to create the indexes. Answer: You should create an index on the V_STATE column in the VENDOR table. This new index will help in the execution of this query because the conditional operation uses the V_STATE column in the conditional criteria. In addition, you should create an index on V_NAME, because it is used in the ORDER BY clause. The commands to create the indexes are: CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE); CREATE INDEX VEND_NDX2 ON VENDOR(V_NAME); Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively. 764. Assume that 10,000 vendors are distributed as shown in Table P11.18. What percentage of rows will be returned by the query? Answer:

Table P11.18 State

Number of Vendors

State

Number of Vendors

358

100

3244

645

345

995

821

425

113

589

208

745

375

258

688

Given the distribution of values in Table P11.18, the query will return 113 of the 10,000 rows, or 1.13% of the total table rows. 765. What type of I/O database operations would most likely be used to execute the query? Answer: Assuming that you create the index on V_STATE and that you generate the statistics on the VENDOR table, the DBMS is very likely to use the index scan range to access the index data and then use the table access by row ID to get the VENDOR rows. 766. Using Table 11.4 as an example, create two alternative access plans. Answer: The two access plans are shown in Table P11.20.

Table P11.20 Comparing Access Plans and I/O Costs Plan

Step

Operation

Full table scan VENDOR

I/O Operations

I/O Cost

Resulting Set Rows

Total I/O Cost

10,000

113

10,000

Select only rows with V_STATE=‘TN’ A

SORT Operation

113

10,113

Index Scan Range of VEND_NDX1

113

Table Access by RowID

113

226

113

339

VENDOR B

SORT Operation

P_CODE, P_DESCRIPT, P_PRICE, P.V_CODE, V_STATE

FROM

PRODUCT P, VENDOR V

WHERE

P.V_CODE = V.V_CODE

ORDER BY

AND

V_STATE = ‘NY’

AND

V_AREACODE = ‘212’

P_PRICE;

689

767. What indexes would you recommend? Answer: In this case, there are three possible indices to be created. First, you can create an index on VENDOR.V_STATE. Second, you can create an index in VENDOR.V_AREACODE. It is very likely that there will be many queries that will use these fields to generate reports and filters. Next, you can create an index in PRODUCT.P_PRICE to help with the ORBER BY statement. It is important to note that these three columns are high sparsity. There should not be a need to create an index on VENDOR.V_CODE as this is the primary key of the VENDOR table. Depending on the number of vendors providing products, it may be recommended to create an index on PRODUCT.V_CODE; if the sparsity is high and we have a large number of products. 768. Write the commands required to create the indexes you recommended in Problem 21. Answer: CREATE INDEX VEND_NDX22A ON VENDOR(V_STATE); CREATE INDEX VEND_NDX22B ON VENDOR(V_AREACODE); CREATE INDEX PROD_NDX22A ON PRODUCT(P_PRICE); CREATE INDEX PROD_NDX22B ON PRODUCT(V_CODE); 769. Write the command(s) used to generate the statistics for the PRODUCT and VENDOR tables. Answer: ANALYZE TABLE PRODUCT COMPUTE STATISTICS; ANALYZE TABLE VENDOR COMPUTE STATISTICS; 770. What index would you recommend based on the following query, and what command would you use to create it? SELECT

P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE

FROM

PRODUCT

WHERE

V_CODE = ‘21344’

ORDER BY

P_CODE;

690

Problems 25 and 26 are based on the following query: SELECT

P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE

FROM

PRODUCT

WHERE

P_QOH < P_MIN

AND

P_MIN = ‘P_REORDER’

AND

P_REORDER = 50;

ORDER BY

P_QOH;

771. Use the recommendations given in Section 11-5b to rewrite the query and produce the required results more efficiently. Answer: SELECT

P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE

FROM

PRODUCT

WHERE

P_REORDER = 50

AND

P_MIN = 50

AND

P_QOH < 50

ORDER BY

P_QOH;

This new query rewrites some conditions as follows: 

Because P_REORDER must be equal to 50, it replaces P_MIN = P_REORDER with P_MIN = 50.



Because P_MIN must be 50, it replaces P_QOH<P_MIN with P_QOH<50.

Having literals in the query conditions make queries more efficient. Note that you still need all three conditions in the query conditions. 772. What indexes would you recommend? Write the commands to create those indexes. Answer: Because the query uses equality comparison on P_REORDER, P_MIN, and P_QOH, you should have indexes in such columns. The commands to create such indexes are: CREATE INDEX PROD_NDX1 ON PRODUCT(P_REORDER); CREATE INDEX PROD_NDX2 ON PRODUCT(P_MIN); CREATE INDEX PROD_NDX3 ON PRODUCT(P_QOH);

691

Problems 27–29 are based on the following query: SELECT

CUS_CODE, MAX(LINE_UNITS * LINE_PRICE)

FROM

CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE

WHERE

CUS_AREACODE = ‘615’

GROUP BY

CUS_CODE;

773. Assuming that you generate 15,000 invoices per month, what recommendation would you give the designer about the use of derived attributes? Answer: This query uses the MAX aggregate function to compute the maximum invoice line value by customer. Because this table increases at a rate of 15,000 rows per month, the query would take considerable amount of time to run as the number of invoice rows increases. Furthermore, because the MAX aggregate function uses an expression (LINE_UNITS*LINE_PRICE) instead of a simple table column, the query optimizer is very likely to perform a full table scan in order to compute the maximum invoice line value. One way to speed up the query would be to store the derived attribute LINE_TOTAL in the LINE_TABLE and create an index on LINE_TOTAL. This way, the query would benefit by using the index to execute the query. 774. Assuming that you follow the recommendations you gave in Problem 27, how would you rewrite the query? Answer: SELECT

CUS_CODE, MAX(LINE_TOTAL)

FROM

CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE

WHERE

CUS_AREACODE = ‘615’

GROUP BY

CUS_CODE;

775. What indexes would you recommend for the query you wrote in Problem 28, and what SQL commands would you use? Answer: The query will benefit from having an index on CUS_AREACODE and an index on CUS_CODE. Because CUS_CODE is a foreign key on invoice, it’s very likely that an index already exists. In any case, the query uses the CUS_AREACODE in an equality comparison and therefore, an index on this column is highly recommended. The command to create this index would be: CREATE INDEX CUS_NDX1 ON CUSTOMER(CUS_AREACODE);

692

ANSWERS TO REVIEW QUESTIONS 776. Describe the evolution from centralized DBMSs to distributed DBMSs. Answer: Briefly, early database systems were centralized DBMSs with a single, central site typically housing a mainframe system to serve the needs of all users. Over time, the competitive, societal, and technological environments changed. Business operations became more global, business units became more integrated, and the manner in which internal and external constituents use data changed. The explosion of mobile device and acceptance of the Internet as a platform for data access and distribution greatly increased the need to transact data in highly dispersed environments. A single, centralized site could not meet the exponential growth in data processing and communication demands. As a result, organizations began to distribute the database environment across multiple sites to distribute processing loads and reduce network congestion. 777. List and discuss some of the factors that influenced the evolution of the DDBMS. Answer: 

Global business operations



On-demand transactions using web-based services



Mobile computing



Convergence of data realms

693

778. What are the advantages of the DDBMS? Answer: 

Data is located near the site of the greatest demand



Faster data access



Faster data processing



Improved communications



Reduced operating costs



User-friendly interface



Less danger of a single-point failure



Processor independence

779. What are the disadvantages of the DDBMS? Answer: 

Complexity of management and control



Increased technological difficulty



Security is more difficult to maintain with more points of failure



Lack of standards for DDBMS environments



Increased storage and infrastructure requirements since multiple sites and often multiple copies of data must be maintained.



Increased costs in training IT personnel.



Higher costs for not only infrastructure duplication, but more personnel, licenses, and software.

780. Explain the difference between a distributed database and distributed processing. Answer: Distributed processing is the sharing of data manipulation across multiple processing units. This can include data access, data selection, calculations and manipulations, and data validation. Distributed database is the sharing of the storage of the data while it is at rest. Distributed databases use database fragments, which are defined subsets of the database data. Distributed processing may or may not use a distributed database, but a distributed database always requires distributed processing. 781. What is a fully distributed database management system? Answer: A fully distributed database management system is a DBMS that can perform all of the functions of a centralized DBMS on a distributed database. Further, it must be able to perform all of those functions in a manner that is transparent to the user, such that the user cannot tell whether the underlying database is centralized or distributed.

694

782. What are the components of a DDBMS? Answer: Computer workstations (nodes or sites) that form the network components. Network hardware and software sufficient to allow the nodes to communication effectively with each other. Communications media such as wired or Wi-Fi communications over which the network hardware and software to exchange communications among the nodes. Transaction processor software, or application, that requests and consumes data. Data processor software that coordinates the data access to the data that resides on that node. 783. List and explain the transparency features of a DDBMS. Answer: 

Distribution transparency—the user need not know how, or if, the data is physically distributed across the network. All data appears local to the user.



Transaction transparency—allows data to be updated at multiple sites while maintaining data consistency and integrity.



Performance transparency—the system will not suffer from performance degradation due to its distribution.



Heterogeneity transparency—the system can integrate data from multiple, different, local DBMSs under a common global schema.

784. Define and explain the different types of distribution transparency. Answer: Distribution transparency refers to transparency in the management of the distributed database as if it were a centralized database. Local mapping transparency requires end users and programmers to know both the names and the locations of the fragments that contain the data to be manipulated. Location transparency means that the end users and programmers need to know the names of the fragments that contain the data to be manipulated, but do not need to know the locations of those fragments. Fragmentation transparency means that end users and programmers do not need to know the names or locations of the fragments that contain the data to be manipulated. In fact, with fragmentation transparency, they do not even need to know that the database is distributed. 785. Describe the different types of database requests and transactions. Answer: A database transaction is formed by one or more database requests. Each database request is the equivalent of a single SQL statement. The basic difference between a local transaction and a distributed transaction is that the latter can update or request data from several remote sites on a network. In a DDBMS, a database request and a database transaction can be of two types: remote or distributed.

695

NOTE The figure references in the discussions refer to the figures found in the text. The figures are not reproduced in this manual. A remote request accesses data located at a single remote database processor (or DP site). In other words, an SQL statement (or request) can reference data at only one remote DP site. Use Figure 12.9 to illustrate the remote request. A remote transaction, composed of several requests, accesses data at only a single remote DP site. Use Figure 12.10 to illustrate the remote transaction. As you discuss Figure 12.10, note that both tables are located at a remote DP (site B) and that the complete transaction can reference only one remote DP. Each SQL statement (or request) can reference only one (the same) remote DP at a time; the entire transaction can reference only one remote DP; and it is executed at only one remote DP. A distributed transaction allows a transaction to reference several different local or remote DP sites. Although each single request can reference only one local or remote DP site, the complete transaction can reference multiple DP sites because each request can reference a different site. Use Figure 12.11 to illustrate the distributed transaction. A distributed request lets us reference data from several different DP sites. Since each request can access data from more than one DP site, a transaction can access several DP sites. The ability to execute a distributed request requires fully distributed database processing because we must be able to: 3. Partition a database table into several fragments. 4. Reference one or more of those fragments with only one request. In other words, we must have fragmentation transparency. The location and partition of the data should be transparent to the end user. Use Figure 12.12 to illustrate the distributed request. As you discuss Figure 12.12, note that the transaction uses a single SELECT statement to reference two tables, CUSTOMER and INVOICE. The two tables are located at two different remote DP sites, B and C. The distributed request feature also allows a single request to reference a physically partitioned table. For example, suppose that a CUSTOMER table is divided into two fragments C1 and C2, located at sites B and C, respectively. The end user wants to obtain a list of all customers whose balance exceeds $250.00. Use Figure 12.13 to illustrate this distributed request. Note that full fragmentation support is provided only by a DDBMS that supports distributed requests.

696

786. Explain the need for the two-phase commit protocol. Then describe the two phases. Answer: Just as a centralized DBMS does, a DDBMS must support the atomicity of transactions and change the database from one consistent state to another. This is done using a two-phase commit protocol (2PC). Throughout a transaction, data manipulation instructions have been sent to various data processors throughout the network. Each data processor has been maintaining local transaction log files for those operations. When the COMMIT command is issued by the user or application, the 2PC ensures that the transaction is committed at all sites involved in the transaction. The 2PC’s first phase is the “Preparation” phase. A coordinator node will send a PREPARE TO COMMIT message to the subordinate sites. The subordinate sites will write their respective transaction logs to permanent storage (not the actual database) and send a PREPARED TO COMMIT reply. If any site replies that it is NOT PREPARED, the coordinator sends an ABORT message to all subordinates. If all sites send the PREPARED TO COMMIT reply, then phase two is activated. Phase two is the “Final Commit” phase. The coordinator will send a COMMIT message to all subordinates. Each subordinate will then write the transaction to the actual database. If the transaction is successfully written to the database, the subordinate sends a COMMITTED reply. If the coordinator receives a COMMITTED reply from every subordinate, the end user or application is notified that the transaction was committed. If any subordinate replies NOT COMMITTED, then the coordinator sends an ABORT message to every subordinate to roll back the entire transaction. 787. What is the objective of query optimization functions? Answer: The objective of query optimization functions is to minimize the total costs associated with the execution of a database request. The costs associated with a request are a function of: 

the access time (I/O) cost involved in accessing the physical data stored on disk



the communication cost associated with the transmission of data among nodes in distributed database systems



the CPU time cost

It is difficult to separate communication and processing costs. Query optimization algorithms use different parameters, and the algorithms assign different weight to each parameter. For example, some algorithms minimize total time, others minimize the communication time, and still others do not factor in the CPU time, considering it insignificant relative to the other costs. Query optimization must provide distribution and replica transparency in distributed database systems. 788. To which transparency feature are the query optimization functions related? Answer: Query optimization functions are associated with the performance transparency features of a DDBMS. In a DDBMS, the query-optimization routines are more complicated because the DDBMS must decide where and which fragment of the database to access. Data fragments are stored at several sites, and the data fragments are replicated at several sites. 789. What issues should be considered when resolving data requests in a distributed data environment? Answer: A data request could be either a read or a write request. However, most requests tend to be read requests. In both cases, resolving data requests in a distributed data environment mostly consider the following issues:

697



Data distribution



Data replication



Network and node availability

A more detailed discussion of these factors can be found in Section 12-10. 790. Describe the three data fragmentation strategies. Give some examples of each. Answer: Horizontal fragmentation fragments rows for a table across multiple sites. The entire row remains on the same fragment, but different rows are on different fragments. An example would be to put customer rows in different fragments based on their state of residence such that all customers in the United States are in one fragment, while customers from Europe are in a different fragment, and customers from Japan are in a different fragment. Another example would be putting different product rows in different fragments based on their manufacturing location. Vertical fragmentation fragments columns for a table across multiple sites. With vertical fragmentation, some attributes of a row are in one site, while other attributes of that same row are in another site. For example, directory information for employees (name, email address, phone number, etc.) may be kept on one server, while payroll-related attributes (wage rate, hours worked, withholdings, etc.) are kept on a different server. Mixed fragmentation is a combination of horizontal and vertical fragmentation such that fragments contain some columns of some rows, other fragments contain other columns of those same rows, and still other fragments contain those same columns but for different rows. 791. What is data replication, and what are the three replication strategies? Answer: Data replication is storing the same data in more than one location. This is different than fragmentation that decomposed a table of data into multiple pieces and put each piece in a different location; however, each piece was only stored once. Replication may or may not involve fragmenting the data—that is, it may not be fragmented or may be fragmented using any of the three fragmentation strategies—but one or more pieces of the data are stored in more than one location. The three strategies are fully replicated, partially replicated, and unreplicated databases. Unreplicated databases do not use replication—each portion of the database is stored only once. A fully replicated database stores multiple copies of every piece of the database. A partially replicated database stores multiple copies of some parts of the database but not all parts. 792. What are the two basic styles of data replication? Answer: There are basically two styles of replication: 

Push replication. In this case, the originating DP node sends the changes to the replica nodes to ensure that all data are mutually consistent.



Pull replication. The originating DP node notifies the replica nodes so they can pull the updates one their own time.

See Section 12-11b for more information. 793. What trade-offs are involved in building highly distributed data environments?

698

Answer: In the year 2000, Dr. Eric Brewer stated in a presentation that: “in any highly distributed data system there are three common desirable properties: consistency, availability and partition tolerance. However, it is impossible for a system to provide all three properties at the same time.” Therefore, the system designers have to balance the trade-offs of these properties in order to provide a workable system. This is what is known as the CAP theorem. For more information on this, see Section 12-12. 794. How does a BASE system differ from a traditional distributed database system? Answer: A traditional database system enforces the ACID properties as to ensure that all database transactions yield a database in a consistent state. In a centralized database system, all data resides in a centralized node. However, in a distributed database system, data are located in multiple geographically disperse sites connected via a network. In such cases, network latency and network partitioning impose a new level of complexity. In most highly distributed systems, designers tend to emphasize availability over data consistency and partition tolerance. This trade-off has given way to a new type of database system in which data are basically available, soft state, and eventually consistent (BASE). For more information about BASE systems see Section 12-12. 795. How do NewSQL databases compare to NoSQL databases in terms of consistency, availability, and partition tolerance? Answer: NewSQL databases attempt to merge ACID transactions of centralized databases with highly distributed models of NoSQL databases. NoSQL databases tend to use BASE, basically available, soft state, eventually consistency to achieve high levels of partitioning tolerance. NewSQL databases tend toward more rigorous consistency and availability at the expense of partitioning tolerance.

699

ANSWERS TO PROBLEMS Problem 1 is based on the DDBMS scenario in Figure P12.1.

FIGURE P12.1 The DDBMS Scenario for Problem 1 TABLES

FRAGMENTS

LOCATION

CUSTOMER PRODUCT

N/A PROD_A PROD_B N/A N/A

A A B B B

INVOICE INV_LINE

Number of DPs Operation

Request

Remote

Distributed

Transaction

Remote

Distributed

700

Based on this summary, the questions are answered easily. Answer: At Site C l.

SELECT

FROM

CUSTOMER;

This SQL sequence represents a remote request. m. SELECT

FROM

INVOICE

WHERE

INV_TOTAL < 1000;

This SQL sequence represents a remote request. n. SELECT

FROM

PRODUCT

WHERE

PROD_QOH < 10;

This SQL sequence represents a distributed request. Note that the distributed request is required when a single request must access two DP sites. The PRODUCT table is composed of two fragments, PRO_A and PROD_B, which are located in sites A and B, respectively. o. BEGIN WORK; UPDATE CUSTOMER SET CUS_BALANCE = CUS_BALANCE + 100 WHERE CUS_NUM=‘10936’; INSERT INTO INVOICE(INV_NUM, CUS_NUM, INV_DATE, INV_TOTAL) VALUES (‘986391’, ‘10936’, ‘2022-02-15’, 100); INSERT INTO INVLINE(INV_NUM, PROD_CODE, LINE_PRICE) VALUES (‘986391’, ‘1023’, 100); UPDATE PRODUCT SET PROD_QOH = PROD_QOH - 1 WHERE PROD_CODE = ‘1023’; COMMIT WORK;

701

This SQL sequence represents a distributed request. Note that UPDATE CUSTOMER and the two INSERT statements only require remote request capabilities. However, the entire transaction must access more than one remote DP site, so we also need distributed transaction capability. The last UPDATE PRODUCT statement accesses two remote sites because the PRODUCT table is divided into two fragments located at two remote DP sites. Therefore, the transaction as a whole requires distributed request capability. p. BEGIN WORK; INSERT CUSTOMER(CUS_NUM, CUS_NAME, CUS_ADDRESS, CUS_BAL) VALUES (‘34210’, ‘Victor Ephanor’, ‘123 Main St’, 0.00); INSERT INTO INVOICE(INV_NUM, CUS_NUM, INV_DATE, INV_TOTAL) VALUES (‘986434’, ‘34210’, ‘2022-08-10’, 2.00); COMMIT WORK; This SQL sequence represents a distributed transaction. Note that, in this transaction, each individual request requires only remote request capabilities. However, the transaction as a whole accesses two remote sites. Therefore, distributed request capability is required. At Site A q. SELECT

CUS_NUM, CUS_NAME, INV_TOTAL

FROM

CUSTOMER, INVOICE

WHERE

CUSTOMER.CUS_NUM = INVOICE.CUS_NUM;

This SQL sequence represents a distributed request. Note that the request accesses two DP sites, one local and one remote. Therefore distributed capability is needed. r.

SELECT

FROM

INVOICE

WHERE

INV_TOTAL > 1000;

This SQL sequence represents a remote request, because it accesses only one remote DP site. s. SELECT

FROM

PRODUCT

WHERE

PROD_QOH < 10;

This SQL sequence represents a distributed request. In this case, the PRODUCT table is partitioned between two DP sites, A and B. Although the request accesses only one remote DP

702

site, it accesses a table that is partitioned into two fragments: PROD-A and PROD-B. A single request can access a partitioned table only if the DBMS supports distributed requests. At Site B t.

SELECT

FROM

CUSTOMER;

This SQL sequence represents a remote request. u. SELECT

CUS_NAME, INV_TOTAL

FROM

CUSTOMER, INVOICE

WHERE

INV_TOTAL > 1000 AND CUSTOMER. CUS_NUM = INVOICE.CUS_NUM;

This SQL sequence represents a distributed request. v. SELECT

FROM

PRODUCT

WHERE

PROD_QOH < 10;

This SQL sequence represents a distributed request. (See explanation for part h.) 796. The following data structure and constraints exist for a magazine publishing company: Answer: d. The company publishes one regional magazine in each of four states: Florida (FL), South Carolina (SC), Georgia (GA), and Tennessee (TN). e. The company has 300,000 customers (subscribers) distributed throughout the four states listed in Problem 2a. f.

On the first day of each month, an annual subscription INVOICE is printed and sent to each customer whose subscription is due for renewal. The INVOICE entity contains a REGION attribute to indicate the customer’s state of residence (FL, SC, GA, TN): CUSTOMER (CUS_NUM, CUS_NAME,CUS_ADDRESS, CUS_CITY, CUS_STATE, CUS_ZIP,CUS_SUBSDATE) INVOICE (INV_NUM, INV_REGION, CUS_NUM, INV_DATE, INV_TOTAL)

703

Listing all current customers by region



Listing all new customers by region



Reporting all invoices by customer and by region

Given these requirements, how must you partition the database? The CUSTOMER table must be partitioned horizontally by state. (We show the partitions in the answer to Question 3c.) 797. Given the scenario and requirements in Problem 2, answer the following questions: Answer: e. What recommendations will you make regarding the type and characteristics of the required database system? The Magazine Publishing Company requires a distributed system with distributed database capabilities. The distributed system will be distributed among the company locations in South Carolina, Georgia, Florida, and Tennessee. The DDBMS must be able to support distributed transparency features, such as fragmentation transparency, replica transparency, transaction transparency, and performance transparency. Heterogeneous capability is not a mandatory feature since we assume there is no existing DBMS in place and that the company wants to standardize on a single DBMS. f.

What type of data fragmentation is needed for each table? The database must be horizontally partitioned, using the STATE attribute for the CUSTOMER table and the REGION attribute for the INVOICE table.

g. What criteria must be used to partition each database? The following fragmentation segments reflect the criteria used to partition each database: Horizontal Fragmentation of the CUSTOMER Table by State

Fragment Name

Location

Condition

Node name

Tennessee

CUS_STATE = ‘TN’

NAS

Georgia

CUS_STATE = ‘GA’

ATL

Florida

CUS_STATE = ‘FL’

TAM

South Carolina

CUS_STATE = ‘SC’

CHA

704

Horizontal Fragmentation of the INVOICE Table by Region

Fragment Name

Location

Condition

Node name

Tennessee

REGION_CODE = ‘TN’

NAS

Georgia

REGION_CODE = ‘GA’

ATL

Florida

REGION_CODE = ‘FL’

TAM

South Carolina

REGION_CODE = ‘SC’

CHA

h. Design the database fragments. Show an example with node names, location, fragment names, attribute names, and demonstration data. Note the following fragments: Fragment C1

Location: Tennessee

Node: NAS

CUS_NUM

CUS_NAME

CUS_ADDRESS

CUS_CITY

CUS_STATE

10884

James D. Burger

123 Court Avenue

Memphis

2020-12-08

10993

Lisa B. Barnette

910 Eagle Street

Nashville

2021-03-12

Fragment C2 CUS_NUM

Location: Georgia

CUS_SUB_DATE

Node: ATL

CUS_NAME

CUS_ADDRESS

CUS_CITY

CUS_STATE

CUS_SUB_DATE

11887

Ginny E. Stratton

335 Main Street

Atlanta

2020-08-11

13558

Anna H. Ariona

657 Mason Ave.

Dalton

2021-06-23

705

Fragment C3

Location: Florida

Node: TAM

CUS_NUM

CUS_NAME

CUS_ADDRESS

CUS_CITY

10014

John T. Chi

456 Brent Avenue

Miami

2020-11-18

15998

Lisa B. Barnette

234 Ramala Street

Tampa

2021-03-23

Fragment C4

CUS_STATE

Location: South Carolina

CUS_SUB_DATE

Node: CHA

CUS_NUM

CUS_NAME

CUS_ADDRESS

CUS_CITY

CUS_STATE

21562

Thomas F. Matto

45 N. Pratt Circle

Charleston

2020-12-02

18776

Mary B. Smith

526 Boone Pike

Charleston

2021-10-28

Fragment I1

Location: Tennessee

Node: NAS

INV_NUM

REGION_CODE

CUS_NUM

INV_DATE

INV_TOTAL

213342

10884

2021-11-01

45.95

209987

10993

2022-02-15

45.95

Fragment I2

Location: Georgia

Node: ATL

INV_NUM

REGION_CODE

CUS_NUM

INV_DATE

INV_TOTAL

198893

11887

2021-08-15

70.45

224345

13558

2022-06-01

45.95

Fragment I3

CUS_SUB_DATE

Location: Florida

Node: TAM

INV_NUM

REGION_CODE

CUS_NUM

INV_DATE

INV_TOTAL

200915

10014

2021-11-01

45.95

231148

15998

2022-03-01

24.95

706

Fragment I4

Location: South Carolina

Node: CHA

INV_NUM

REGION_CODE

CUS_NUM

INV_DATE

INV_TOTAL

243312

21562

2021-11-15

45.95

231156

18776

2022-10-01

45.95

g. What type of distributed database operations must be supported at each remote site? To answer this question, you must first draw a map of the locations, the fragments at each location, and the type of transaction or request support required to access the data in the distributed database.

Node Fragment

NAS

ATL

TAM

CHA

CUSTOMER

INVOICE

none

none none distributed request

Distributed Operations Required none

Headquarters

707

ANSWERS TO REVIEW QUESTIONS 798. What is business intelligence? Give some recent examples of BI usage, using the Internet for assistance. What BI benefits have companies found? Answer: Business intelligence (BI) is a term used to describe a comprehensive, cohesive, and integrated set of applications used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information used to support business decision making. As the names implies, BI is about creating intelligence about a business. This intelligence is based on learning and understanding the facts about a business environment. BI is a framework that allows a business to transform data into information, information into knowledge, and knowledge into wisdom. BI has the potential to positively affect a company’s culture by creating “business wisdom” and distributing it to all users in an organization. This business wisdom empowers users to make sound business decisions based on the accumulated knowledge of the business as reflected on recorded facts (historic operational data). Table 13.1 in the text gives some real-world examples of companies that have implemented BI tools (data warehouse, data mart, OLAP, and/or data mining tools) and shows how the use of such tools benefited the companies. Emphasize that the main focus of BI is to gather, integrate, and store business data for the purpose of creating information. BI integrates people and processes using technology in order to add value to the business. Such value is derived from how end users use such information in their daily activities, and in particular, their daily business decision making. Also note that the BI technology components are varied. Examples of BI usage found in web sources: 4. The Dallas Teachers Credit Union (DTCU) used geographical data analysis to increase its customer base from 250,000 professional educators to 3.5 million potential customers virtually overnight. The increase gave the credit union the ability to compete with larger banks that had a strong presence in Dallas. (http://www.computerworld.com/s/article/47371/Business_Intelligence?taxono myId=120) 5. Researchers from the Rand Corporation recently applied business intelligence and analytics technology to determine the dangerous side effects of prescription drugs. (http://www.panorama.com/industry-news/article-view.html?name=Analyticsspots-prescription-problems-508338) 6. Microsoft Case Study website for hundreds of cases about Business Intelligence usage. (http://www.microsoft.com/casestudies/) 799. Describe the BI framework. Illustrate the evolution of BI. Answer: BI is not a product by itself, but a framework of concepts, practices, tools, and technologies that help a business better understand its core capabilities, provide snapshots of the company situation, and identify key opportunities to create competitive advantage. In practice, BI provides a well-orchestrated framework for the management of data that works across all levels of the organization. BI involves the following general steps: 7. Collecting and storing operational data 8. Aggregating the operational data into decision support data 9. Analyzing decision support data to generate information

708

10. Presenting such information to the end user to support business decisions 11. Making business decisions, which in turn generate more data that is collected, stored, and so on (restarting the process). 12. Monitoring results to evaluate outcomes of the business decisions (providing more data to be collected, stored, etc.) To implement all these steps, BI uses varied components and technologies. Section 13-2 is where you’ll find a discussion of these components and technologies—see Table 13.2. Figure 13.2 illustrates the evolution of BI formats. 800. What are decision support systems, and what role do they play in the business environment? Answer: Decision support systems (DSSs) are based on computerized tools that are used to enhance managerial decision making. Because complex data and the proper analysis of such data are crucial to strategic and tactical decision making, DSS are essential to the well-being and even survival of businesses that must compete in a global marketplace. 801. Explain how the main components of the BI architecture interact to form a system. Describe the evolution of BI information dissemination formats. Answer: Refer the students to Section 13-3 in the chapter. Emphasize that, actually, there is no single BI architecture; instead, it ranges from highly integrated applications from a single vendor to a loosely integrated, multivendor environment. However, there are some general types of functionality that all BI implementations share. Like any critical business IT infrastructure, the BI architecture is composed of data, people, processes, technology, and the management of such components. Figure 13.1 (in the text) depicts how all those components fit together within the BI framework. Figure 13.2, in Section 13-2c “Business Intelligence Evolution,” tracks the changes of business intelligence reporting and information dissemination over time. In summary: 6. 1970s: centralized reports running on mainframes, minicomputers, or even central server environments. Such reports were predefined and took considerable time to process. 7. 1980s: desktop computers, downloaded spreadsheet data from central locations. 8. 1990s: first generation DSS, centralized reporting, and OLAP. 9. 2000s: BI web-based dashboards and mobile BI. 10. 2010s: Present: Big Data, NoSQL, Data Visualization. 802. What are the most relevant differences between operational data and decision support data? Answer: Operational data and decision support data serve different purposes. Therefore, it is not surprising to learn that their formats and structures differ. Most operational data are stored in a relational database in which the structures (tables) tend to be highly normalized. Operational data storage is optimized to support transactions that represent daily operations. For example, each time an item is sold, it must be accounted for. Customer data, inventory data, and so on are in a frequent update mode. To provide effective update performance, operational systems store data in many tables, each with a minimum number of fields. Thus, a simple sales transaction might be represented by five or more different tables (for example, invoice, invoice line, discount, store, and department). Although

709

such an arrangement is excellent in an operational database, it is not efficient for query processing. For example, to extract a simple invoice, you would have to join several tables. Whereas operational data are useful for capturing daily business transactions, decision support data give tactical and strategic business meaning to the operational data. From the data analyst’s point of view, decision support data differ from operational data in three main areas: time span, granularity, and dimensionality. 4. Time span. Operational data cover a short time frame. In contrast, decision support data tend to cover a longer time frame. Managers are seldom interested in a specific sales invoice to customer X; rather, they tend to focus on sales generated during the last month, the last year, or the last five years. 5. Granularity (level of aggregation). Decision support data must be presented at different levels of aggregation, from highly summarized to near-atomic. For example, if managers must analyze sales by region, they must be able to access data showing the sales by region, by city within the region, by store within the city within the region, and so on. In that case, summarized data to compare the regions is required, but also data in a structure that enables a manager to drill down, or decompose, the data into more atomic components (that is, finer-grained data at lower levels of aggregation). In contrast, when you roll up the data, you are aggregating the data to a higher level. 6. Dimensionality. Operational data focus on representing individual transactions rather than on the effects of the transactions over time. In contrast, data analysts tend to include many data dimensions and are interested in how the data relate over those dimensions. For example, an analyst might want to know how product X fared relative to product Z during the past six months by region, state, city, store, and customer. In that case, both place and time are part of the picture. Figure 13.3 (in the text) shows how decision support data can be examined from multiple dimensions (such as product, region, and year), using a variety of filters to produce each dimension. The ability to analyze, extract, and present information in meaningful ways is one of the differences between decision support data and transaction-at-a-time operational data. The DSS components that form a system are shown in the text’s Figure 13.1. Note that: 

The data store component is basically a DSS database that contains business data and business-model data. These data represent a snapshot of the company situation.



The data extraction and filtering component is used to extract, consolidate, and validate the data store.



The end-user query tool is used by the data analyst to create the queries used to access the database.



The end-user presentation tool is used by the data analyst to organize and present the data.

803. What is a data warehouse, and what are its main characteristics? How does it differ from a data mart? Answer: A data warehouse is an integrated, subject-oriented, time-variant, and nonvolatile database that provides support for decision making. (See Section 13-4 for an in-depth discussion about the main characteristics.)

710

The data warehouse is usually a read-only database optimized for data analysis and query processing. Typically, data are extracted from various sources and are then transformed and integrated—in other words, passed through a data filter—before being loaded into the data warehouse. Users access the data warehouse via front-end tools and/or end-user application software to extract the data in usable form. Figure 13.4 in the text illustrates how a data warehouse is created from the data contained in an operational database. You might be tempted to think that the data warehouse is just a big, summarized database. But a good data warehouse is much more than that. A complete data warehouse architecture includes support for a decision support data store, a data extraction and integration filter, and a specialized presentation interface. To be useful, the data warehouse must conform to uniform structures and formats to avoid data conflicts and to support decision making. In fact, before a decision support database can be considered a true data warehouse, it must conform to the 12 rules described in Section 13-4b and illustrated in Table 13.9. 804. Give three examples of likely problems when operational data is integrated into the data warehouse. Answer: Within different departments of a company, operational data may vary in terms of how they are recorded or in terms of data type and structure. For instance, the status of an order may be indicated with text labels such as “open”, “received”, “cancel”, or “closed” in one department while another department has it as “1”, “2”, “3”, or “4”. The student status can be defined as “Freshman”, “Sophomore”, “Junior”, or “Senior” in the Accounting department and as “FR”, “SO”, “JR”, or “SR” in the Computer Information Systems department. A social security number field may be stored in one database as a string of numbers and dashes (‘XXX-XX-XXXX’), in another as a string of numbers without the dashes (‘XXXXXXXXX’), and in yet a third as a numeric field (#########). Most of the data transformation problems are related to incompatible data formats, the use of synonyms and homonyms, and the use of different coding schemes. Use the following scenario to answer Questions 8–14. While working as a database analyst for a national sales organization, you are asked to be part of its data warehouse project team. 805. Prepare a high-level summary of the main requirements for evaluating DBMS products for data warehousing. Answer: There are four primary ways to evaluate a DBMS that is tailored to provide fast answers to complex queries: 

The database schema supported by the DBMS



The availability and sophistication of data extraction and loading tools



The end-user analytical interface



The database size requirements

711

806. Your data warehousing project group is debating whether to create a prototype of a data warehouse before its implementation. The project group members are especially concerned about the need to acquire some data warehousing skills before implementing the enterprise-wide data warehouse. What would you recommend? Explain your recommendations. Answer: Knowing that data warehousing requires time, money, and considerable managerial effort, many companies create data marts, instead. Data marts use smaller, more manageable data sets that are targeted to fit the special needs of small groups within the organization. In other words, data marts are small, single-subject data warehouse subsets. Data mart development and use costs are lower and the implementation time is shorter. Once the data marts have demonstrated their ability to serve the DSS, they can be expanded to become data warehouses or they can be migrated into larger existing data warehouses. 807. Suppose that you are selling the data warehouse idea to your users. How would you define multidimensional data analysis for them? How would you explain its advantages to them? Answer: Multidimensional data analysis refers to the processing of data in which data are viewed as part of a multidimensional structure, one in which data are related in many different ways. Business decision makers usually view data from a business perspective. That is, they tend to view business data as they relate to other business data. For example, a business data analyst might investigate the relationship between sales and other business variables such as customers, time, product line, and location. The multidimensional view is much more representative of a business perspective. A good way to visualize the data is to use tools such as pivot tables in MS Excel or data visualization products such as MS Power BI, Tableau Software’s Tableau, or QlikView. 808. The data warehousing project group has invited you to provide an OLAP overview. The group’s members are particularly concerned about the OLAP client/server architecture requirements and how OLAP will fit the existing environment. Your job is to explain the main OLAP client/server components and architectures. Answer: OLAP systems are based on client/server technology and they consist of these main modules: 

OLAP Graphical User Interface (GUI)



OLAP Analytical Processing Logic



OLAP Data Processing Logic



712



The OLAP Data Processing Logic (DPL) maps the data analysis requests to the proper data objects in the Data Warehouse and is, therefore, generally placed at the server level.

809. One of your vendors recommends using an MDBMS. How would you explain this recommendation to your project leader? Answer: Multidimensional On-Line Analytical Processing (MOLAP) provides OLAP functionality using multidimensional databases systems (MDBMSs) to store and analyze multidimensional data. MDBMSs use special proprietary techniques to store data in matrixlike arrays of n dimensions. 810. The project group is ready to make a final decision, choosing between ROLAP and MOLAP. What should be the basis for this decision? Why? Answer: The basis for the decision should be the system and end-user requirements. Both ROLAP and MOLAP will provide advanced data analysis tools to enable organizations to generate required information. The selection of one or the other depends on which set of tools will fit best within the company’s existing expertise base, its technology and end-user requirements, and its ability to perform the job at a given cost. The proper OLAP/MOLAP selection criteria must include: 

purchase and installation price



supported hardware and software



compatibility with existing hardware, software, and DBMS



available programming interfaces



performance



availability, extent, and type of administrative tools



support for the database schema(s)



ability to handle current and projected database size



database architecture



available resources



flexibility



scalability



total cost of ownership.

811. The data warehouse project is in the design phase. Explain to your fellow designers how you would use a star schema in the design. Answer: The star schema is a data modeling technique that is used to map multidimensional decision support data into a relational database. The reason for the star schema’s development is that existing relational modeling techniques, ER and normalization, did not yield a database structure that served the advanced data analysis requirements well. Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structures on which the operational database is built.

713

The basic star schema has four components: facts, dimensions, attributes, and attribute hierarchies. The star schemas represent aggregated data for specific business activities. Using the schemas, we will create multiple aggregated data sources that will represent different aspects of business operations. For example, the aggregation may involve total sales by selected time periods, by products, by stores, and so on. Aggregated totals can be total product units, total sales values by products, and so on. 812. Briefly discuss the OLAP architectural styles with and without data marts. Answer: Section 13-6d, “OLAP Architecture,” details the basic architectural components of an OLAP environment: 

The graphical user interface (GUI front-end)—located always at the end-user end.



The analytical processing logic—this component could be located in the back end (OLAP server) or could be split between the back-end and front-end components.



Data processing logic—logic used to extract data from data; typically located in the back end.

The term OLAP “engine” is sometimes used to refer to the arrangement of the OLAP components as a whole. However, the architecture allows for the split of the some of the components in a client/server arrangement as depicted in Figures 13.16 and 13.17. Figure 13.16 shows a typical OLAP architecture without data marts. In this architecture, the OLAP tool will extract data from the data warehouse and process the data to be presented by the end-user GUI. The processing of the data takes place mostly on the OLAP engine. The OLAP engine location could be located in each client computer or it could be shared from an OLAP “server.” Figure 13.17 shows a typical OLAP architecture with local data marts (end-user located). The local data marts are “miniature” data warehouses that focus on a subset of the data in the data warehouse. Normally these data marts are subject oriented, such as customers, products, and sales. The local data marts provide faster processing but require that the data be periodically “synchronized” with the main data warehouse. 813. What is OLAP, and what are its main characteristics? Answer: OLAP stands for On-Line Analytical Processing and uses multidimensional data analysis techniques. OLAP yields an advanced data analysis environment that provides the framework for decision making, business modeling, and operations research activities. Its four main characteristics are: 5. Multidimensional data analysis techniques 6. Advanced database support 7. Easy-to-use end-user interfaces 8. Support for client/server architecture 814. Explain ROLAP and list the reasons you would recommend its use in the relational database environment. Answer: Relational On-Line Analytical Processing (ROLAP) provides OLAP functionality for relational databases. ROLAP’s popularity is based on the fact that it uses familiar relational query tools to store and analyze multidimensional data. Because ROLAP is based on familiar

714

relational technologies, it represents a natural extension to organizations that already use relational database management systems within their organizations. 815. Explain the use of facts, dimensions, and attributes in the star schema. Answer: Facts are numeric measurements (values) that represent a specific business aspect or activity. For example, sales figures are numeric measurements that represent product and/or service sales. Facts commonly used in business data analysis are units, costs, prices, and revenues. Facts are normally stored in a fact table, which is the center of the star schema. The fact table contains facts that are linked through their dimensions. Dimensions are qualifying characteristics that provide additional perspectives to a given fact. Dimensions are of interest to us, because business data are almost always viewed in relation to other data. For instance, sales may be compared by product from region to region, and from one time period to the next. The kind of problem typically addressed by DSS might be “make a comparison of the sales of product units of X by region for the first quarter from 2013 through 2022.” In this example, sales have product, location, and time dimensions.

715

Product dimension: product id, description, product type, and manufacturer.



Location dimension: region, state, city, and store number.



Time dimension: year, quarter, month, week, and date.

These product, location, and time dimensions add a business perspective to the sales facts. The data analyst can now associate the sales figures for a given product, in a given region, and at a given time. The star schema, through its facts and dimensions, can provide the data when they are needed and in the required format, without imposing the burden of additional and unnecessary data (such as order #, po #, and status) that commonly exist in operational databases. In essence, dimensions are the magnifying glass through which we study the facts. 816. Explain multidimensional cubes and describe how the slice and dice technique fits into this model. Answer: To explain the multidimensional cube concept, let’s assume a sales fact table with three dimensions: product, location, and time. In this case, the multidimensional data model for the sales example is (conceptually) best represented by a three-dimensional cube. This cube represents the view of sales dimensioned by product, location, and time. (We have chosen a three-dimensional cube because such a cube makes it easier for humans to visualize the problem. There is, of course, no limit to the number of dimensions we can use.) The power of multidimensional analysis resides in its ability to focus on specific slices of the cube. For example, the product manager may be interested in examining the sales of a product, thus producing a slice of the product dimension. The store manager may be interested in examining the sales of a store, thus producing a slice of the location dimension. The intersection of the slices yields smaller cubes, thereby producing the “dicing” of the multidimensional cube. By examining these smaller cubes within the multidimensional cube, we can produce very precise analyses of the variable components and interactions. In short, slice and dice refers to the process that allows us to subdivide a multidimensional cube. Such subdivisions permit a far more detailed analysis than would be possible with the conventional two-dimensional data view. The text’s Section 13-5 and Figures 13.5 through 13.9 illustrate the slice and dice concept. To gain the benefits of slice and dice, we must be able to identify each slice of the cube. Slice identification requires the use of the values of each attribute within a given dimension. For example, to slice the location dimension, we can use a STORE_ID attribute in order to focus on a given store.

716

817. In the star schema context, what are attribute hierarchies and aggregation levels and what is their purpose? Answer: Attributes within dimensions can be ordered in an attribute hierarchy. The attribute hierarchy yields a top-down data organization that permits both aggregation and drilldown/roll-up data analysis. Use Figure 13.8 to show how the attributes of the location dimension can be organized into a hierarchy that orders that location dimension by region, state, city, and store. The attribute hierarchy gives the data warehouse the ability to perform drill-down and roll-up data searches. For example, suppose a data analyst wants an answer to the query “How does the 2022 total monthly sales performance compare to the 2021 monthly sales performance?” Having performed the query, suppose that the data analyst spots a sharp total sales decline in March 2022. Given this discovery, the data analyst may then decide to perform a drill-down procedure for the month of March to see how this year’s March sales by region stack up against last year’s. The drill-down results are then used to find out whether the low overall March sales were reflected in all regions or only in a particular region. This type of drill-down operation may even be extended until the data analyst is able to identify the individual store(s) that is (are) performing below the norm. The attribute hierarchy allows the data warehouse and OLAP systems to use a carefully defined path that will govern how data are to be decomposed and aggregated for drill-down and roll-up operations. Of course, keep in mind that it is not necessary for all attributes to be part of an attribute hierarchy; some attributes exist just to provide narrative descriptions of the dimensions. 818. Discuss the most common performance improvement techniques used in star schemas. Answer: The following four techniques are commonly used to optimize data warehouse design: 



Denormalization improves performance by storing in one single record what normally would take many records in different tables. For example, to compute the total sales for all products

717

819. What is data analytics? Briefly define explanatory and predictive analytics. Answer: Data analytics is a subset of BI functionality that encompasses a wide range of mathematical, statistical, and modeling techniques with the purpose of extracting knowledge from data. Data analytics is used at all levels within the BI framework, including queries and reporting, monitoring and alerting, and data visualization. Hence, data analytics is a “shared” service that is crucial to what BI adds to an organization. Data analytics represents what business managers really want from BI: the ability to extract actionable business insight from current events and foresee future problems or opportunities. Data analytics discovers characteristics, relationships, dependencies, or trends in the organization’s data, and then explains the discoveries and predicts future events based on the discoveries. Data analytics tools can be grouped into two separate (but closely related and often overlapping) areas: 



718

820. Describe and contrast the focus of data mining and predictive analytics. Give some examples. Answer: In practice, data analytics is better understood as a continuous spectrum of knowledge acquisition that goes from discovery to explanation to prediction. The outcomes of data analytics then become part of the information framework on which decisions are built. You can think of data mining (explanatory analytics) as explaining the past and present, while predictive analytics forecasts the future. However, you need to understand that both sciences work together; predictive analytics uses explanatory analytics as a stepping stone to create predictive models. Data mining refers to analyzing massive amounts of data to uncover hidden trends, patterns, and relationships; to form computer models to simulate and explain the findings; and then to use such models to support business decision making. In other words, data mining focuses on the discovery and explanation stages of knowledge acquisition. However, data mining can also be used as the basis to create advanced predictive data models. For example, a predictive model could be used to predict future customer behavior, such as a customer response to a target marketing campaign. So, what is the difference between data mining and predictive analytics? In fact, data mining and predictive analytics use similar and overlapping sets of tools, but with a slightly different focus. Data mining focuses on answering the “how” and “what” of past data, while predictive analytics focuses on creating actionable models to predict future behaviors and events. In some ways, you can think of predictive analytics as the next logical step after data mining; once you understand your data, you can use the data to predict future behaviors. In fact, most BI vendors are dropping the term data mining and replacing it with the more alluring term predictive analytics. Predictive analytics can be traced back to the banking and credit card industries. The need to profile customers and predict customer buying patterns in these industries was a critical driving force for the evolution of many modeling methodologies used in BI data analytics today. For example, based on your demographic information and purchasing history, a credit card company can use data-mining models to determine what credit limit to offer, what offers you are more likely to accept, and when to send those offers. Another example, a data mining tool could be used to analyze customer purchase history data. The data mining tool will find many interesting purchasing patterns, and correlations about customer demographics, timing of purchases, and the type of items they purchase together. The predictive analytics tool will use those findings to build a model that will predict with high degree of accuracy when a certain type of customer will purchase certain items and what items are likely to be purchased on certain times. 821. How does data mining work? Discuss the different phases in the data mining process. Answer: Data mining is subject to four phases: 



data groupings, classifications, clusters, or sequences.

719



data dependencies, links, or relationships.



data patterns, trends, and deviations.



65% of customers who did not use the credit card in six months are 88% likely to cancel their account 82% of customers who bought a new TV 42” or bigger are 90% likely to buy an entertainment center within the next 4 weeks. If age < 30 and income <= 25,000 and credit rating < 3 and credit amount > 25,000, the minimum term is 10 years. The complete set of findings can be represented in a decision tree, a neural net, a forecasting model, or a visual presentation interface, which is then used to project future events or results. For example, the prognosis phase may project the likely outcome of a new product roll-out or a new marketing promotion. 822. Describe the characteristics of predictive analytics. What is the impact of Big Data in predictive analytics? Answer: Predictive analytics employs mathematical and statistical algorithms, neural networks, artificial intelligence, and other advanced modeling tools to create actionable predictive models based on available data. The algorithms used to build the predictive model are specific to certain types of problems and work with certain types of data. Therefore, it is important that the end user, who typically is trained in statistics and understands business, applies the proper algorithms to the problem in hand. However, thanks to constant technology advances, modern BI tools automatically apply multiple algorithms to find the optimum model. Most predictive analytics models are used in areas such as customer relationships, customer service, customer retention, fraud detection, targeted marketing, and optimized pricing. Predictive analytics can add value to an organization in many different ways; for example, it can help optimize existing processes, identify hidden problems, and anticipate future problems or opportunities. However, predictive analytics is not the “secret sauce” to fix all business problems. Managers should carefully monitor and evaluate the value of predictive analytics models to determine their return on investment.

720

Predictive analytics received a big stimulus with the advent of social media. Companies turned to data mining and predictive analytics as a way to harvest the mountains of data stored on social media sites. Google was one of the first companies that offered targeted ads as a way to increase and personalize search experiences. Similar initiatives were used by all types of organizations to increase customer loyalty and drive up sales. Take the example of the airline and credit card industries and their frequent flyer and affinity card programs. Nowadays, many organizations use predictive analytics to profile customers in an attempt to get and keep the right ones, which in turn will increase loyalty and sales. 823. Describe data visualization. What is the goal of data visualization? Answer: Data visualization is the process of abstracting data to provide a visual data representation that enhances the user’s ability to comprehend the meaning of the data. The goal of data visualization is to allow the user to quickly and efficiently see the data’s big picture by identifying trends, patterns, and relationships. 824. Is data visualization only useful when used with Big Data? Explain and expand. Answer: It is a mistake to think that data visualization is useful only when dealing with Big Data. Any organization (regardless of size) that collects and uses data in its daily activities can benefit from the use of data analytics and visualization techniques. We all have heard the saying “a picture is worth a thousand words,” and this has never been more accurate than in data visualization. Tables with hundreds, thousands, or millions of rows of data cannot be processed by the human mind in a meaningful way. Providing summarized tabular data to managers does not give them enough insight into the meaning of the data to make informed decisions. Data visualization encodes the data into visually rich formats (mostly graphical) that provide at-a-glance insight into overall trends, patterns, and possible relationships. Data visualization techniques range from simple to very complex, and many are familiar. Such techniques include pie charts, line graphs, bar charts, bubble charts, bubble maps, donut charts, scatter plots, Gantt charts, heat maps, histograms, time series plots, steps charts, waterfall charts, and many more. The tools used in data visualization range from a simple spreadsheet (such as MS Excel) to advanced data visualization software such as Tableau, Microsoft PowerBI, Domo, and Qlik. Common productivity tools such as Microsoft Excel can often provide surprisingly powerful data visualizations. Excel has long included basic charting and PivotTable and PivotChart capabilities for visualizing spreadsheet data. More recently, the introduction of the PowerPivot add-in has eliminated row and column data limitations and allows for the integration of data from multiple sources. This puts powerful data visualization capabilities within reach of most business users. 825. As a discipline, data visualization can be studied as a group of visual communication techniques used to explore and discover data insights by applying: pattern recognition, spatial awareness, and aesthetics. 826. Describe the different types of data and how they map to star schemas and data analysis. Give some examples of the different data types. Answer: In general, there are two types of data: 

721

You can think of qualitative data as being the dimensions on a star schema and the quantitative data as being the facts of a star schema. This is important because it means that you must use the correct type of functions and operations with each data type, including the proper way to visually represent it. 827. What five graphical data characteristics does data visualization use to highlight and contrast data findings and convey a story? Answer: Data visualization uses shape, color, size, position, and group/order to represent and highlight data in certain ways. The way you visualize the data tells a story and has an impact on the end users. Some data visualizations can provide unknown insights and others can be a way to draw attention to an issue. When used correctly, data visualization can tell the story behind the data. 828. Contrast a data lake with a data warehouse. Answer: The primary difference between a data lake and a data warehouse is the state of the data. The data lake stores data in its raw, natural format. The data is raw in that it has not been processed yet. The data warehouse stores data that has been processed so that it conforms to the defined data warehouse structure. The processing may involve many manipulations of the data to decompose, aggregate, clean, and categorize the data.

722

Show the total number of users by different time periods.



Show usage numbers by time period, by major, and by student classification.



Compare usage for different majors and different semesters.

Use the Ch13_P1.mdb database, which includes the following tables: 

USELOG contains the student lab access data.



STUDENT is a dimension table that contains student data.

Given the three preceding requirements, and using the Ch13_P1.mdb data, complete the following problems: h. Define the main facts to be analyzed. (Hint: These facts become the source for the design of the fact table.) i.

Define and describe the appropriate dimensions. (Hint: These dimensions become the source for the design of the dimension tables.)

Draw the lab usage star schema, using the fact and dimension structures you defined in Problems 1a and 1b.

k. Define the attributes for each of the dimensions in Problem 1b. l.

Recommend the appropriate attribute hierarchies.

m. Implement your data warehouse design, using the star schema you created in Problem 1c and the attributes you defined in Problem 1d. n. Create the reports that will meet the requirements listed in this problem’s introduction. Answer: Before Problems 1a–g can be answered, the students must create the time and semester dimensions. Looking at the data in the USELOG table, the students should be able to figure out that the data belong to the Fall 2017 and Spring 2018 semesters; so the semester dimension must contain entries for at least these two semesters. The time dimension can be defined in several different ways. It will be very useful to provide class time during which students can explore the different benefits derived from various ways to represent the time dimension. Regardless of what time dimension representation is selected, it is clear that the date and time entries in the USELOG must be transformed to meet the TIME and SEMESTER codes. For data analysis purposes, we suggest using the TIME and SEMESTER dimension table configurations shown in Tables P13.1A and P13.1B, respectively. (We have used these configurations in the DW-P1sol.MDB database that is located on the CD.)

723

Table P13.1A The TIME Dimension Table Structure TIME_ID

TIME_DESCRIPTION

BEGIN_TIME

END_TIME

Morning

6:01AM

12:00PM

Afternoon

12:01PM

6:00PM

Night

6:01PM

6:00AM

Table P13.1B The SEMESTER Dimension Table Structure SEMESTER_ID

SEMESTER_DESCRIPTION

BEGIN_DATE

END_DATE

FA17

Fall 2017

15-Aug-2017

18-Dec-2017

SP18

Spring 2018

08-Jan-2018

15-May-2018

724

Table P13.1C The Queries in the DW_P1sol.MDB Database Query Name

Query Description

Update DATE format in USELOG

The DATE field in USELOG was originally given to us as a character field. This query converted the date text to a date field we can use for date comparisons.

Update STUDENT_ID format in STUDENT

This query changes the STUDENT_ID format to make it compatible with the format used in USELOG.

Update STUDENT_ID format in USELOG

This query changes the STUDENT_ID format to make it compatible with the format used in STUDENT.

Append TEST records from USELOG & STUDENT

Update TIME_ID and SEMESTER_ID in TEST

Before we create the USEFACT table, we must transform the dates and time to match the SEMESTER_ID and TIME_ID keys used in our SEMESTER and TIME dimension tables. This query does that.

Count STUDENTS sort by Fact Keys: SEM, MAJOR, CLASS, TIME.

This query does data aggregation over the data in TEST table. This query table will be used to create the new USEFACT table.

Populate USEFACT

This query uses the results of the previous query to populate our USEFACT table.

Compares usage by Semesters by Times

Used to generate Report1

Usage by Time, Major and Classification

Used to generate Report2

Usage by Major and Semester

Used to generate Report3

Having completed the preliminary work, we can now present the solutions to the seven problems: h. Define the main facts to be analyzed. (Hint: These facts become the source for the design of the fact table.) Answer: The main facts are the total number of students by time, the major, the semester, and the student classification. i.

Define and describe the appropriate dimensions. (Hint: These dimensions become the source for the design of the dimension tables.)

725

Draw the lab usage star schema, using the fact and dimension structures you defined in Problems 1a and 1b. Answer: Figure P13.1c shows the MS Access relational diagram—see the Ch13P1sol.mdb database in the Student Online Companion—to illustrate the star schema, the relationships, the table names, and the field names used in our solution. The students are given only the USELOG and STUDENT tables and they must produce the fact table and dimension tables.

FIGURE P13.1c The Microsoft Access Relational Diagram

k. Define the attributes for each of the dimensions in Problem 1b. Answer: Given Problem 1c’s star schema snapshot, the dimension attributes are easily defined: Semester dimension: semester_id, semester_description, begin_date, and end_date. Major dimension: major_code and major_name. Class dimension: class_id and class_description. Time dimension: time_id, time_description, begin_time, and end_time.

726

Recommend the appropriate attribute hierarchies. Answer: See the answer to Question 18 and the dimensions shown in Problems 1c and 1d to develop the appropriate attribute hierarchies.

NOTE To create the dimension tables in MS Access, we had to modify the data. These modifications can be examined in the update queries stored in the Ch13_P1sol.mdb database. We used the switch function in MS Access to assign the proper SEMESTER_ID and the TIME_ID values to the USEFACT table. m. Implement your data warehouse design, using the star schema you created in Problem 1c and the attributes you defined in Problem 1d. Answer: The solution is included in the Ch13_P1sol.mdb database on the Instructor’s CD. n. Create the reports that will meet the requirements listed in this problem’s introduction. Answer: Use the Ch13_P1sol.mdb database on the Instructor’s CD as the basis for the reports. Keep in mind that the Microsoft Access export function can be used to put the Access tables into a different database such as Oracle or DB2. 829. Victoria Ephanor manages a small product distribution company. Because the business is growing fast, she recognizes that it is time to manage the vast information pool to help guide the accelerating growth. Ephanor, who is familiar with spreadsheet software, currently employs a sales force of four people. She asks you to develop a data warehouse application prototype that will enable her to study sales figures by year, region, salesperson, and product. (This prototype will be used as the basis for a future data warehouse database.) Using the data supplied in the Ch13_P2.xlsx file, complete the following seven problems: h. Identify the appropriate fact table components. Answer: The dimensions for this star schema are: Year, Region, Agent, and Product. (These are shown in Figure P13.2c.) i.

Identify the appropriate dimension tables. Answer: (These are shown in Figure P13.2c.)

Draw a star schema diagram for this data warehouse. Answer: See Figure P13.2c.

727

FIGURE P13.2C The Star Schema for the Ephanor Distribution Company

k. Identify the attributes for the dimension tables that will be required to solve this problem. Answer: The solution to this problem is presented in the Ch13_P2sol.xls file in the Student Online Companion. l.

Using Microsoft Excel or any other spreadsheet program that can produce pivot tables, generate a pivot table to show the sales by product and by region. The end user must be able to specify the display of sales for any given year. The sample output is shown in the first pivot table in Figure P13.2E.

FIGURE P13.2E Using A Pivot Table

Answer: The solution to this problem is presented in the Ch13_P2sol.xlsx file in the Teacher Data Files. m. Using Problem 2e as your base, add a second pivot table (see Figure P13.2E) to show the sales by salesperson and by region. The end user must be able to specify sales for a given year or for all years, and for a given product or for all products.

728

FIGURE P13.2F Second Pivot Table

Answer: The solution to this problem is presented in the Ch13_P2sol.xlsx file in the Teacher Data Files. n. Create a 3D bar graph to show sales by salesperson, by product, and by region. (See the sample output in Figure P13.2G.)

FIGURE P13.2G 3D Bar Graph Showing the Relationships among Agent, Product, and Region

Answer: The solution to this problem is presented in the Ch13_P2sol.xlsx file in the Teacher Data Files.

729

830. David Suker, the inventory manager for a marketing research company, wants to study the use of supplies within the different company departments. Suker has heard that his friend, Victoria Ephanor, has developed a spreadsheet-based data warehouse model that she uses to analyze sales data (see Problem 2). Suker is interested in developing a data warehouse model like Ephanor’s so he can analyze orders by department and by product. He will use Microsoft Access as the data warehouse DBMS and Microsoft Excel as the analysis tool. e. Develop the order star schema. Answer: Figure P13.3a’s MS Access relational diagram reflects the star schema and its relationships. Note that the students are given only the ORDERS table. The student must study the data set and make the queries necessary to create the dimension tables (TIME, DEPT, VENDOR, and PRODUCT) and the ORDFACT fact table.

FIGURE P13.3A The Marketing Research Company Relational Diagram

Identify the appropriate dimension attributes. Answer: The dimensions are TIME, DEPT, VENDOR, and PRODUCT. (See Figure P13.3A.)

g. Identify the attribute hierarchies required to support the model. Answer: The main hierarchy used for data drilling purposes is represented by TIME-DEPTVENDOR-PRODUCT sequence. (See Figure P13.3a.) Within this hierarchy, the user can analyze data at different aggregation levels. Additional hierarchies can be constructed in the TIME dimension to account for quarters or, if necessary, by daily aggregates. The VENDOR dimension could also be expanded to include geographic information that could be used for drill-down purposes. h. Using the Ch13_P3 database, develop a crosstab report in Microsoft Access, using a 3D bar graph to show orders by product and by department. (The sample output is shown in Figure P13.3.)

FIGURE P13.3 Crosstab Report: Orders by Product and Department

730

Answer: The solution to this problem is included in the Ch13_P3sol.mdb database in the Teacher Data Files. 831. ROBCOR, whose sample data is contained in the database named Ch13_P4.mdb, provides “ondemand” aviation charters using a mix of different aircraft and aircraft types. Because ROBCOR has grown rapidly, its owner has hired you as its first database manager. The company’s database, developed by an outside consulting team, is already in place to help manage all company operations. Your first critical assignment is to develop a decision support system to analyze the charter data. (Review the company’s operations in Problems 24–31 of Chapter 3, The Relational Database Model.) The charter operations manager wants to be able to analyze charter data such as cost, hours flown, fuel used, and revenue. She also wants to be able to drill down by pilot, type of airplane, and time periods.

731

Given those requirements, complete the following: f.

Create a star schema for the charter data.

Table P13.4-1A The ROBCOR Data Warehouse Queries Query Name

Query Description

Make a TEMP table from CHARTER, PILOT, and MODEL

Creates a temporary storage table used to make the necessary data transformations before the creation of the fact table.

Update TIME_ID in TEMP

Used to create the TIME_ID key used in the TIME dimension table.

Update YEAR and MONTH in TEMP

Make TIME table from TEMP

This query is used to create the time table using the appropriate data from the TEMP table.

Aggregate TEMP table by fact keys

This query does data aggregation over the data in the TEMP table. This query table will be used to create the new CHARTER_FACT table.

Populate CHARTER_FACT table

This query uses the results of the previous query to populate our CHARTER_FACT table.

732

FIGURE P13.4A The ROBCOR Relational Diagram

g. Define the dimensions and attributes for the charter operation’s star schema. Answer: The dimensions are TIME, MODEL, and PILOT. Each of these dimensions is depicted in Figure P13.4a’s star schema figure. The attributes are: Time dimension: time id, year, and month. Model dimension: model code, manufacturer, name, number of seats, and so on. Pilot dimension: employee number, pilot license, pilot ratings, and so on. h. Define the necessary attribute hierarchies. Answer: The main attribute hierarchy is based on the sequence year-month-model-pilot. The aggregate analysis is based on this hierarchy. We can produce a query to generate revenue, hours flown, and fuel used on a yearly basis. We can then drill down to a monthly time period to generate the aggregate information for each model of airplane. We can also drill down to get that information about each pilot. i.

Implement the data warehouse design using the design components you developed in Problems 4a–4c. Answer: The Ch13_P4sol.mdb database contains the data and solutions for Problems 4a– 4c.

733

Generate the reports to illustrate that your data warehouse meets the specified information requirements. Answer: The Ch13-P4sol.mdb database contains the solution for Problem 4e.

Using the data provided in the Ch13-SaleCo-DW database, solve the following problems. (Hint: In Problems 5–11, use the ROLLUP command.)

The script files used to populate the database are available at cengage.com. The script files are available in Oracle, MySQL and SQL Server formats. MS Access does not have SQL support for the complex grouping required. 832. What is the SQL command to list the total sales by customer and by product, with subtotals by customer and a grand total for all product sales? Figure P13.5 shows the abbreviated results of the query. Answer: Oracle: SELECT

CUS_CODE, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT

GROUP BY

ROLLUP (CUS_CODE, P_CODE);

SQL Server and MySQL: SELECT

CUS_CODE, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT

GROUP BY

CUS_CODE, P_CODE WITH ROLLUP;

CUS_CODE, TM_MONTH, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

ROLLUP (CUS_CODE, TM_MONTH, P_CODE);

734

SQL Server and MySQL: SELECT

CUS_CODE, TM_MONTH, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

CUS_CODE, TM_MONTH, P_CODE WITH ROLLUP;

833. What is the SQL command to list the total sales by region and customer, with subtotals by region and a grand total for all sales? Figure P13.7 shows the result of the query. Answer: Oracle: SELECT

REG_ID, CUS_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWCUSTOMER C ON S.CUS_CODE = C.CUS_CODE

GROUP BY

ROLLUP (REG_ID, CUS_CODE);

SQL Server and MySQL: SELECT

REG_ID, CUS_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWCUSTOMER C ON S.CUS_CODE = C.CUS_CODE

GROUP BY

REG_ID, CUS_CODE WITH ROLLUP;

834. What is the SQL command to list the total sales by month and product category, with subtotals by month and a grand total for all sales? Figure P13.8 shows the result of the query. Answer: Oracle: SELECT

TM_MONTH, P_CATEGORY, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

ROLLUP (TM_MONTH, P_CATEGORY);

SQL Server and MySQL: SELECT

TM_MONTH, P_CATEGORY, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

TM_MONTH, P_CATEGORY WITH ROLLUP;

735

835. What is the SQL command to list the number of product sales (number of rows) and total sales by month, with subtotals by month and a grand total for all sales? Figure P13.9 shows the result of the query. Answer: Oracle: SELECT

TM_MONTH, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

ROLLUP (TM_MONTH);

SQL Server and MySQL: SELECT

TM_MONTH, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

TM_MONTH WITH ROLLUP;

836. What is the SQL command to list the number of product sales (number of rows) and total sales by month and product category with subtotals by month and product category and a grand total for all sales? Figure P13.10 shows the result of the query. Answer: Oracle: SELECT

TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

ROLLUP (TM_MONTH, P_CATEGORY);

SQL Server and MySQL: SELECT

TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

TM_MONTH, P_CATEGORY WITH ROLLUP;

837. What is the SQL command to list the number of product sales (number of rows) and total sales by month, product category and product, with subtotals by month and product category and a grand total for all sales? Figure P13.11 shows the result of the query.

736

Answer: Oracle: SELECT

TM_MONTH, P_CATEGORY, P_CODE, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE

GROUP BY

ROLLUP (TM_MONTH, P_CATEGORY, P_CODE);

SQL Server and MySQL: SELECT

TM_MONTH, P_CATEGORY, P_CODE, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE

GROUP BY

TM_MONTH, P_CATEGORY, P_CODE WITH ROLLUP;

838. Using the answer to Problem 10 as your base, what command would you need to generate the same output but with subtotals in all columns? (Hint: Use the CUBE command.) Figure P13.12 shows the result of the query. Answer: Oracle: SELECT

TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

CUBE (TM_MONTH, P_CATEGORY);

SQL Server: SELECT

TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES

FROM

DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE JOIN DWTIME T ON S.TM_ID = T.TM_ID

GROUP BY

TM_MONTH, P_CATEGORY WITH CUBE;

MySQL does not currently have the ability to do this type of grouping without third-party add-on products.

737

839. Create your own data analysis and visualization presentation. The purpose of this project is for you to search for a publicly available data set using the Internet and create your own presentation using what you have learned in this chapter. d. Search for a data set that may interest you and download it. Some examples of public data sets sources are (see also Note on page 625): 

http://www.data.gov



http://data.worldbank.org



http://aws.amazon.com/datasets



http://usgovxml.com/



https://data.medicare.gov/



http://www.faa.gov/data_research/

e. Use any tool available to you to analyze the data. You can use tools such as MS Excel PivotTables, PivotCharts, or other free tools, such as Google Fusion tables, Tableau free trial, and IBM Many Eyes. f.

Create a short presentation to explain some of your findings (such as what the data sources are, where the data comes from, and what the data represents.) Answer: There are an incredible number of possible visualizations that students can create for an exercise like this. Most students enjoy the opportunity to express their creativity in producing visually interesting solutions. Attempt to keep the focus on how the visualization might make the data actionable. What can we learn from the visualization, and how might a decision maker be influenced by it? Data Sources available: There are several public sources of large data sets that could be used by students to practice visualizations. Some of the most common sources are: http://catalog.data.gov

http://data.worldbank.org

http://aws.amazon.com/datasets

http://usgovxml.com

https://data.medicare.gov

http://www.faa.gov/data_research/

738

FIGURE P13.13A H1b Visa Applications Dashboard (Excel)

739

FIGURE P13.13B H1B Visa Applications Dashboard (PowerBI)

740

FIGURE P13.13C H1B Visa Applications Dashboard (Tableau)

TABLE OF CONTENTS Answers to Review Questions .................................................................................................1

ANSWERS TO REVIEW QUESTIONS 840. What is Big Data? Give a brief definition. Answer: Big Data is data of such volume, velocity, and/or variety that it is difficult for traditional relational database technologies to store and process it.

741

841. What are the traditional 3 Vs of Big Data? Briefly define each. Answer: Volume, velocity, and variety are the traditional 3 Vs of Big Data. Volume refers to the quantity of the data that must be stored. Velocity refers to the speed with which new data is being generated and entering the system. Variety refers to the variations in the structure, or the lack of structure, in the data being captured. 842. Explain why companies like Google and Amazon were among the first to address the Big Data problem. Answer: In the 1990s, the use of the Internet exploded and commercial websites helped attract millions of new consumers to online transactions. When the dot-com bubble burst at the end of the 1990s, the millions of new consumers remained but the number of companies providing them services reduced dramatically. As a result, the surviving companies, like Google and Amazon, experienced exponential growth in a very short time. This led to these companies being among the first to experience the volume, velocity, and variety of data that is associated with Big Data. 843. Explain the difference between scaling up and scaling out. Answer: Scaling up involves improving storage and processing capabilities through the use of improved hardware, software, and techniques without changing the quantity of servers. Scaling out involves improving storage and processing capabilities through the use of more servers. 844. What is stream processing, and why is it sometimes necessary? Answer: Stream processing is the processing of data inputs to make decisions on which data should be stored and which data should be discarded. In some situations, large volumes of data can enter the system at such a rapid pace that it is not feasible to try to actually store all of the data. The data must be processed and filtered as it enters the system to determine which data to keep and which data to discard.

742

845. How is stream processing different from feedback loop processing? Answer: Stream processing focuses on inputs, while feedback loop processing focuses on outputs. Stream processing is performed on the data as it enters the system to decide which data should be stored and which should be discarded. Feedback loop processing uses data after it has been stored to conduct analysis for the purpose of making the data actionable by decision makers. 846. Explain why veracity, value, and visualization can also be said to apply to relational databases as well as Big Data. Answer: Veracity of data is an issue with even the smallest of data stores, which is why data management is so important in relational databases. Value of data also applies to traditional, structured data in a relational database. One of the keys to data modeling is that only the data that is of interest to the users should be included in the data model. Data that is not of value should not be recorded in any data store—Big Data or not. Visualization was discussed and illustrated at length in Chapter 13 as an important tool in working with data warehouses, which are often maintained as structured data stores in relational DBMS products. 847. What is polyglot persistence, and why is it considered a new approach? Answer: Polyglot persistence is the idea that an organization’s data storage solutions will consist of a range of data storage technologies. This is a new approach because the relational database has previously dominated the data management landscape to the point that the use of a relational DBMS for data storage was taken for granted in most cases. With Big Data problems, the reliance on only relational databases is no longer valid. 848. What are the key assumptions made by the Hadoop Distributed File System approach? Answer: HDFS is designed around the following assumptions: High volume Write-once, read-many Streaming access Fault tolerance HDFS assumes that the massive volumes of data will need to be stored and retrieved. HDFS assumes that data will be written once, that is, there will very rarely be a need to update the data once it has been written to disk. However, the data will need to be retrieved many times. HDFS assumes that when a file is retrieved, the entire contents of the file will need to be streamed in a sequential fashion. HDFS does not work well when only small parts of a file are needed. Finally, HDFS assumes that failures in the servers will be frequent. As the number of servers increases, the probability of a failure increases significantly. HDFS assumes that servers will fail so the data must be redundant to avoid loss of data when servers fail.

743

849. What is the difference between a name node and a data node in HDFS? Answer: The name node stores the metadata that tracks where all of the actual data blocks reside in the system. The name node is responsible for coordinating tasks across multiple data nodes to ensure sufficient redundancy of the data. The name node does not store any of the actual user data. The data nodes store the actual user data. A data node does not store metadata about the contents of any data node other than itself. 850. Explain the basic steps in MapReduce processing. Answer: 

A client node submits a job to the Job Tracker.



Job Tracker determines where the data to be processed resides.



Job Tracker contacts the Task Tracker on the nodes as close as possible to the data.



Each Task Tracker creates mappers and reducers as needed to complete the processing of each block of data and consolidate that data into a result.



Task Trackers report results back to the Job Tracker when the mappers and reducers are finished.



The Job Tracker updates the status of the job to indicate when it is complete.

851. Briefly explain how HDFS and MapReduce are complementary to each other. Answer: Both HDFS and MapReduce rely on the concept of massive, relatively independent, distributions. HDFS decomposes data into large, independent chunks of data that are then distributed across a number of independent servers. MapReduce decomposes processing into independent tasks that are distributed across a number of independent servers. The distribution of data in HDFS is coordinated by a name node server that collects data from each server about the state of the data that it holds. The distribution of processing in MapReduce is coordinated by a job tracker that collects data from each server about the state of the processing it is performing. 852. What are the four basic categories of NoSQL databases? Answer: Key-value database, document databases, column family databases, and graph databases. 853. How are the value components of a key-value database and a document database different? Answer: In a key-value database, the value component is nonintelligible for the database. In other words, the DBMS is unaware of the meaning of any of the data in the value component—it is treated as an indecipherable mass of data. All processing of the data in the value component must be accomplished by the application logic. In a document database, the value component is partially interpretable by the DBMS. The DBMS can identify and search for specific tags, or subdivisions, within the value component.

744

854. Briefly explain the difference between row-centric and column-centric data storage. Answer: Row-centric storage treats a row as the smallest data storage unit. All of the column values associated with a particular row of data are stored together in physical storage. This is the optimal storage approach for operations that manipulate and retrieve all columns in a row, but only a small number of rows in a table. Column-centric storage treats a row as a divisible collection of values that are stored separately with the values of a single column across many rows being physically stored together. This is optimal when operations manipulate and retrieve a small number of columns in a row for all rows in the table. 855. What is the difference between a column and a super column in a column family database? Answer: Columns in a column family database are relatively independent of each other. A super column is a group of columns that are logically related. This relationship can be based on the nature of the data in the columns, such as a group of columns that comprise an address, or it can be based on application processing requirements. 856. Explain why graph databases tend to struggle with scaling out? Answer: Graph databases are designed to address problems with highly related data. The data that appears in a graph database are tightly integrated and queries that traverse a graph focus on the relationships among the data. Scaling out requires moving data to number of different servers. As a general rule, scaling out is recommended when the data on each server is relatively independent of the data on other servers. Due to the dependencies among the data on different servers in a graph database, the inter-server communication overhead is very high with a graph database. This has a significant negative impact on the performance of graph databases in a scaled out environment. 857. Explain what it means for a database to be aggregate aware. Answer: Aggregate aware means that the designer of the database has to be aware of the way the data in the database will be used, and then design the database around whichever component would be central to that usage. Instead of decomposing the data structures to eliminate redundancy, an aggregate aware database collects, or aggregates, all of the data around a central component to minimize the structures required during processing.

ANSWERS TO REVIEW QUESTIONS 858. Give some examples of database connectivity options and what they are used for.

745

Native SQL connectivity. Provided by the database vendors to connect to their databases.



Java Database Connectivity (JDBC)—used to connect Java-based applications to multiple different databases.

859. What are ODBC, DAO, and RDO? How are they related?

746



860. What is the difference between DAO and RDO? Answer: DAO uses the MS Jet engine to access file-based relational databases such as MS Access, MS FoxPro, and dBase. In contrast, RDO allows access to relational database servers such as SQL Server, DB2, and Oracle. RDO uses DAO and ODBC to access remote database server data. 861. What are the three basic components of the ODBC architecture? Answer: The basic ODBC architecture is composed of three main components: 

A high-level ODBC API through which application programs access ODBC functionality.



A Driver Manager component that is in charge of managing all database connections.



An ODBC Driver component that talks directly to the DBMS (data source).

862. What steps are required to create an ODBC data source name? Answer: To define a data source you must create a data source name (DSN) for the data source. To create a DSN you have to provide: 



747



FIGURE 15.3 Configuring an Oracle ODBC Data Source

863. What is OLE-DB used for, and how does it differ from ODBC? Answer: Although ODBC, DAO, and RDO were widely used, they did not provide support for nonrelational data. To answer the need for nonrelational data access and to simplify data connectivity, Microsoft developed Object Linking and Embedding for Database (OLE-DB). Based on Microsoft’s Component Object Model (COM), OLE-DB is a database middleware that was developed to add object-oriented functionality for access to relational and nonrelational data. OLE-DB was the first piece of Microsoft’s strategy to provide a unified object-oriented framework for the development of next-generation applications. 864. Explain the OLE-DB model based on its two types of objects.

748



865. How does ADO complement OLE-DB? Answer: OLE-DB provided additional capabilities for the applications accessing the data. However, it did not provide support for scripting languages, especially the ones used for web development, such as Active Server Pages (ASP) and ActiveX. To provide such support, Microsoft developed a new object framework called ActiveX Data Objects (ADO). ADO provides a high-level application-oriented interface to interact with OLE-DB, DAO, and RDO. ADO provided a unified interface to access data from any programming language that uses the underlying OLE-DB objects. Figure 15.5—borrowed from the text and reproduced here for your convenience—illustrates the ADO/OLE-DB architecture and how it interacts with ODBC and native connectivity options.

749

FIGURE 15.5 OLE-DB Architecture Client Applications

OLE-DB Consumers

Access

C++

Excel

ActiveX Data Objects (ADO)

OLE-DB Services Providers Email Processing

Indexing Processing

Cursor Processing

Query Processing

OLE-DB Data Providers OLE-DB Provider for Oracle

OLE-DB Provider for Exchange

OLE-DB Provider for SQL Server

OLE-DB Provider for ODBC

SQL*NET

DATABASE

ODBC

SQL-Server

DATABASE

866. What is ADO.NET, and what two new features make it important for application development? Answer: ADO.NET is the data access component of Microsoft’s .NET application development framework. Microsoft’s .NET framework is a component-based platform for the development of distributed, heterogeneous, interoperable applications aimed to manipulate any type of data, over any network, and under any operating system and programming language. The .NET framework is beyond the reach of this book. Therefore, this section will only introduce the basic data access component of the .NET architecture, ADO.NET. ADO.Net introduced two new features critical for the development of distributed applications: datasets and XML support. 

A DataSet is a disconnected memory-resident representation of the database.



ADO.NET stores all its internal data in XML format.

750

867. What is a DataSet, and why is it considered to be disconnected? Answer: A DataSet is a disconnected memory-resident representation of the database. That is, the DataSet contains tables, columns, rows, relationships, and constraints. Once the data are read from a data provider, the data are placed on a memory-resident DataSet. The DataSet is then disconnected from the data provider. The data consumer application interacts with the data in the DataSet object to make changes (inserts, updates, and deletes) in the dataset. Once the processing is done, the DataSet data are synchronized with the data source, and the changes are made permanent. A DataSet is in fact a simple database with tables, rows, and constraints. Even more important, the DataSet doesn’t require keeping a permanent connection to the data source. The DataAdapter uses the SelectCommand to populate the DataSet from a data source. However, once the DataSet is populated, it is completely independent of the data source— that’s why it’s called “disconnected.” 868. What are web server interfaces used for? Give some examples. Answer: Web server interfaces are used to extend the functionality of the web server to provide more services. If a web server is to communicate with other external programs to provide a service successfully, both programs must use a standard way to exchange messages and respond to requests. A web server interface defines how a web server communicates with external programs. Currently, there are two well-defined web server interfaces: 

Common Gateway Interface (CGI)



Application Programming Interface (API)

Web server interfaces can be used to extend the services of a web server and provide support for access to external databases, fax services, telephony services, and directory services. 869. Search the Internet for web application servers. Choose one and prepare a short presentation for your class. Answer: You are encouraged to use any web search engine to list multiple vendors. Examples of such vendors are: Oracle Application Server, IBM WebSphere, Sun Java, Microsoft, and JBOSS. We encourage the student to visit the webpages of the products and compare the features of at least two products. Some of the many other web application servers, as of this writing, include Oracle Application Server by Oracle Corp., WebLogic by BEA Systems, NetDynamics by Sun Microsystems, NetObjects’ Fusion, Microsoft’s Visual Studio.NET, and WebObjects by Apple. 870. What does this statement mean: “The web is a stateless system.” What implications does a stateless system have for database application developers?

751

Answer: Simply put, the label stateless system indicates that, at any given time, a web server does not know the status of any of the clients communicating with it. That is, there is no open communications line between the server and each client accessing it—that, of course, is impractical on a worldwide web! Instead, client and server computers interact in very short “conversations” that follow the request-reply model. For example, the browser is only concerned with the current page, so there is no way for the second page to know what was done on the first page. The only time the client and server computers communicate is when the client requests a page—when the user clicks a link—and the server sends the requested page to the client. Once the client receives the page and its components, the client/server communication is ended. Therefore, although you may be browsing a page and think that the communication is open, you are actually just browsing the HTML document stored in the local cache (temporary directory) of the client browser. The server does not have any idea what the end user is doing with the document, what data is entered in a form, and what option is selected. On the web, if we want to act on a client’s selection, we need to jump to a new page (go back to the web server), therefore losing track of whatever was done before! Not knowing what was done before or what a client selected before it got to this page makes adding business logic to the web cumbersome. For example, suppose that you need to write a program that performs the following steps: display a data entry screen, capture data, validate data, and save data. This entire sequence can be completed in a single COBOL program because COBOL uses a working storage section that holds in memory all variables used in the program. Now imagine the same COBOL program—but each section (PERFORM statement) is now a separate program! That is precisely how the web works. In short, the web’s stateless nature means that extensive processing required by a program’s execution cannot be done directly on a single webpage; the client browser’s processing ability is limited by the lack of processing ability and the lack of a working storage area to hold variables used by all pages in a website. The browser does not have computational abilities beyond formatting output text and accepting form field inputs. Even when the browser accepts form field data, there is no way to perform immediate data entry validation. Therefore, to perform such crucial processing in the client, the web defers to other web programming languages such as Java, JavaScript, and VBScript. 871. What is a web application server, and how does it work from a database perspective? Answer: A web application server extends the functionality of a web server and provides features such as: 

An integrated development environment with session management and support for persistent application variables.



Security and authentication of users through user IDs and passwords.



Computational languages to represent and store business logic in the application server.



Automatic generation of HTML pages integrated with Java, JavaScript, VBScript, and ASP.



Performance and fault-tolerant features.



Database access with transaction management capabilities.

752



Access to multiple services, such as file transfers (FTP), database connectivity, electronic mail, and directory services.

The web application server interfaces with the database connectivity standards to access databases using any of the supported APIs. So, a web page will be processed by the web application server; the application server will connect to the database using the ADO, OLEDB, or ODBC standard (or any other standard supported by the application server). 872. What are scripts, and what is their function? (Think in terms of database application development.) Answer: A script is a series of instructions executed in interpreter mode. The script is a plain text file that is not compiled like COBOL, C++, or Java. Scripts are normally used in web application development environments. For instance, ColdFusion scripts contain the code that is required to connect, query, and update a database from a web front end. 873. What is XML, and why is it important? Answer: Extensible Markup Language (XML) is a meta-language used to represent and manipulate data elements. XML is designed to facilitate the exchange of structured documents such as orders or invoices over the Internet. The World Wide Web Consortium (W3C) published the first XML 1.0 standard definition in 1998. This standard sets the stage for giving XML the real-world appeal of being a true vendor-independent platform. Therefore, it is not surprising that XML is rapidly becoming the data exchange standard for e-commerce applications. XML is important because it provides the semantics that facilitates the sharing, exchange, and manipulation of structured documents over organizational boundaries. 874. What are document type definition (DTD) documents, and what do they do? Answer: Companies that exchange data using XML must have a way to understand and validate each other’s tags. One way to accomplish that task is through the use of Document Type Definitions. A Document Type Definition (DTD) is a file with a .dtd extension that describes XML elements—in effect, a DTD file provides the composition of the database’s logical model and defines the syntax rules or valid tags for each type of XML document. (The DTD component is very similar to having a public data dictionary for business data.) 875. What are XML schema definition (XSD) documents and what do they do? Answer: An XML Schema Definition (XSD) document is an advanced data definition language that is used to describe the structure (elements, data types, relationship types, ranges, and default values) of XML data documents. Unlike a DTD document, which uses a unique syntax, an XML Schema Definition (XSD) file uses a syntax that resembles an XML document. One of the main advantages of an XML schema is that it more closely maps to database terminology and features. For example, an XML schema will be able to define common database types, such as date, integer or decimal, minimum and maximum values, list of valid values, and required elements. Using the XML schema, a company would be able to validate the data for values that may be out of range, incorrect dates, and valid values. For example, a university application must be able to specify that a GPA value must be between zero and 4.0, and it must be able to detect an invalid birth date such as “14/13/1987.” (There is no 14th month.) Many vendors are rapidly adopting this new standard and are supplying tools to translate DTD documents into XML Schema Definition (XSD) documents. It is widely expected that XML schemas will replace DTD as the method to describe XML data.

753

876. What is JDBC, and what is it used for? Answer: JDBC stands for Java Database Connectivity. Before we talk about JDBC, let’s talk about Java. Java is an object-oriented programming language developed by Sun Microsystems (now owned by Oracle). Java is one of the most common programming languages for web development. Sun Microsystems created Java as a “write once, run anywhere” environment. That means that a programmer can write a Java application once and then, without any modification, run the application in multiple environments (Microsoft Windows, Apple OS X, IBM AIX, etc.). The cross-platform capabilities of Java are based on its portable architecture. Java code is normally stored in pre-processed chunks known as applets that run on a virtual machine environment in the host operating system. This environment has well-defined boundaries, and all interactivity with the host operating system is closely monitored. Java provides runtime environments for most operating systems (from computers to hand-held devices to TV set-top boxes). Another advantage of using Java is its “on-demand” architecture. When a Java application loads, it can dynamically download all its modules or required components via the Internet. When Java applications want to access data outside the Java runtime environment, they use pre-defined application programming interfaces. Java Database Connectivity (JDBC) is an application programming interface that allows a Java program to interact with a wide range of data sources (relational databases, tabular data sources, spreadsheets, and text files). JDBC allows a Java program to establish a connection with a data source, prepare and send the SQL code to the database server, and process the result set. One of the main advantages of JDBC is that it allows a company to leverage its existing investment in technology and personnel training. JDBC allows programmers to use their SQL skills to manipulate the data in the company’s databases. As a matter of fact, JDBC allows direct access to a database server or access via database middleware. Furthermore, JDBC provides a way to connect to databases through an ODBC driver. (Figure 15.7 in the text illustrates the basic JDBC architecture and the various database access styles.) The database access architecture in JDBC is very similar to the ODBC/OLE/ADO.NET architecture. All database access middleware shares similar components and functionality. One advantage of JDBC over other middleware is that it requires no configuration on the client side. The JDBC driver is automatically downloaded and installed as part of the Java applet download. Because Java is a web-enabled technology, applications can connect to a database directly using a simple URL. Once the URL is invoked, the Java architecture comes into place, the necessary applets are downloaded to the client (including the JDBC database driver and all configuration information), and then the applets are executed securely in the client’s runtime environment. Every day, more and more companies are investing resources in developing and expanding their web presence and finding ways to do more business on the Internet. Such businesses will generate increasing amounts of data that will be stored in databases. Java and the .NET framework are part of the trend toward increasing reliance on the Internet as a critical business resource. In fact, it has been said that the Internet will become the development platform of the future.

754

877. What is cloud computing, and why is it a “game changer”? Answer: According to the National Institute of Standards and Technology (NIST), cloud computing is “a computing model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computer resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” The term “cloud services” is used to refer to the services provided by cloud computing. Cloud services allow any organization to quickly and economically add information technology services such as applications, storage, servers, processing power, databases, and infrastructure to its IT portfolio. Cloud computing is important for database technologies because it has the potential to become a “game changer.” Cloud computing eliminates financial and technological barriers so organizations can leverage database technologies in their business processes with minimal effort and cost. In fact, cloud services have the potential to turn basic IT services into “commodity” services, such as electricity, gas, and water, and to enable a revolution that could change not only the way that companies do business but the IT business itself. As Nicholas Carr put it so vividly: “Cloud computing is for IT what the invention of the power grid was for electricity.” For example, imagine that the chief technology officer of a nonprofit organization wants to add e-mail services to the IT portfolio. A few years ago, this proposition would have implied building the e-mail system’s infrastructure from the ground up, including hardware, software, setup, configuration, operation, and maintenance. However, in today’s cloud computing era, you can use Google Apps for Business or Microsoft Exchange Online and get a scalable, flexible, and more reliable e-mail solution for a fraction of the cost. Most of the cloud services come bundled with extras, for example, Google and Microsoft offer additional services like terabyte-size storage spaces for their users. The best part is that you do not have to worry about the daily chores of managing and maintaining the IT infrastructure, such as OS updates, patches, security, fault tolerance, and recovery. What used to take months or years to implement can now be done in a matter of minutes. 878. Name and contrast the types of cloud computing implementation. Answer: There are basically three cloud computing implementation types (based on who the target customers are): 



755



879. Name and describe the most prevalent characteristics of cloud computing services. Answer: The basic characteristics of cloud computing services are: 

Ubiquitous access via Internet.



Shared infrastructure.



Lower costs and variable pricing.



Flexible and scalable services.



Dynamic provisioning.



Service orientation.



Managed operations.

880. Using the Internet, search for providers of cloud services. Then, classify the types of services they provide (SaaS, PaaS, and IaaS). Answer: A starting point will be the examples shown in Figure 15.23 in the textbook. Further examples are: 

Google Workspace and Microsoft 365 (SaaS)



Amazon Cloud and Microsoft Azure (PaaS and IaaS)



DropBox.com—a cloud service storage provider (IaaS)



OneDrive.com—a cloud service storage provider (IaaS)



Carbonite.com—provides online backup of data (SaaS)



iCloud.com (Apple)—provides storage and synchronization of Apple device data (contacts, music, apps, photos, documents, backups—IaaS and SaaS)



GoodData.com—business intelligence platform (PaaS)



Heroku.com—Ruby web programming environment service (PaaS)

881. Summarize the main advantages and disadvantages of cloud computing services. Answer: Table 15.4 summarizes the main advantages and disadvantages of cloud computing services.

756

Table 15.4 Advantages and Disadvantages of Cloud Computing Advantage

Disadvantage

Difficult integration with internal IT system. Configuring the cloud services to integrate transparently with internal authentication and other internal services could be a daunting task.

882. Define SQL data services and list their advantages. Answer: SQL data services refer to Internet-based data management services that provide access to hosted relational data management using standard protocols and common programming interfaces. The advantages of SQL data services include: 

High reliability and scalability of relational database capabilities at a low cost



High level of failure tolerance



Dynamic and automatic load balancing

757



Automated data backup and recovery



Dynamic creation and allocation of database processes and storage

758

ANSWERS TO PROBLEMS ONLINE CONTENT The databases used in the Problems for this chapter can be found at www.cengage.com.

PROBLEMS In the following exercises, you will set up database connectivity using MS Excel.

From Excel, select Data, Get Data, From Other Sources, From Microsoft Query options to retrieve data from an ODBC data source.



Select the MS Access Database* option and click OK.



Select the Database file location and click OK.



Select the table, click on > (arrow) to use all columns in the query, and click Next.



On the Query Wizard—Filter Data click Next.



On the Query Wizard—Sort Order click Next.



Select Return Data to Microsoft Office Excel and click on Finish.



Position the cursor where you want the data to be placed on your spreadsheet and click OK.

The solution is shown in Figure P15.1.

FIGURE P15.1 Solution to Problem 1—Retrieve all AGENTs

883. Use MS Excel to connect to the Ch02_InsureCo MS Access database using ODBC and retrieve all of the CUSTOMERs. Answer: To perform this task, complete the following steps: 

From Excel, select Data, Get Data, From Other Sources, From Microsoft Query options to

759

retrieve data from an ODBC data source. 

Select the MS Access Database* option and click OK.



Select the Database file location and click OK.



Select the table, click on > (arrow) to use all columns in the query, and click Next.



On the Query Wizard—Filter Data click Next.



On the Query Wizard—Sort Order click Next.



Select Return Data to Microsoft Office Excel.



Position the cursor where you want the data to be placed on your spreadsheet and click OK.

The solution is shown in Figure P15.2.

FIGURE P15.2 Solution to Problem 2—Retrieve all CUSTOMERs

884. Use MS Excel to connect to the Ch02_InsureCo MS Access database using ODBC and retrieve the customers whose AGENT_CODE is equal to 503. Answer: To perform this task, complete the following steps: 

From Excel, select Data, Get Data, From Other Sources, From Microsoft Query options to retrieve data from an ODBC data source.



Select the MS Access Database* option and click OK.



Select the Database file location and click OK.



Select the CUSTOMER table, click on the > (arrow) to use all columns in the query and click Next.



On the Query Wizard—Filter Data, select the AGENT_CODE column, select “equals” from the left drop-down box, then select “503” from the right drop-down box, and then click Next.



On the Query Wizard—Sort Order click Next.



Select Return Data to Microsoft Office Excel and click on Finish.



Position the cursor where you want the data to be placed on your spreadsheet and click OK.

The results are shown in Figure P15.3.

760

FIGURE P15.3 Solution to Problem 3—Retrieve all CUSTOMERs with AGENT_CODE=503

885. Create a System DSN ODBC connection called Ch02_SaleCo using the Administrative Tools section of the Windows Control Panel. Answer: To create the DSN, complete the following steps: 

Using Windows, open the Control Panel, open Administrative Tools, open ODBC Data Sources.



Click on the System DSN tab, click on Add, select the Microsoft Access Drive (*.mdb) driver and click on Finish.



On the ODBC Microsoft Access Setup window, enter Ch02_SaleCo on the Data Source Name field.



Under Database, click on the Select button, browse to the location of the MS Access file and click OK. And click OK one more time.



The new system DSN now appears in the list of system data sources.

The results are shown in Figure P15.4.

761

FIGURE P15.4 Solution to Problem 4—Create Ch02_SaleCo System DSN

886. Use MS Excel to list all of the invoice lines for Invoice 103 using the Ch02_SaleCo System DSN. Answer: To perform this task, complete the following steps: 

From Excel, select Data, Get Data, From Other sources, From ODBC.



Select the Ch02_SaleCo data source and click OK.



In the ODBC driver window, select Windows, Use my current credentials and click Connect.



In the Navigator window, select Ch02_SaleCo.



Select the LINE table, then select Transform Data.



On the Power Query Editor, click on the filter on the INV_NUMBER column, deselect all and select 103, and click OK.



In the Ribbon, under File, click Close and Load.



A new LINE worksheet will appear with the LINE data.

The results are shown in Figure P15.5.

762

FIGURE P15.5

Solution to Problem 5—Retrieve all Invoice LINEs with

INV_NUMBER=103

887. Create a System DSN ODBC connection called Ch02_Tinycollege using the Administrative Tools section of the Windows Control Panel. Answer: To perform this task, complete the following steps: 

Using Windows XP, open the Control Panel, open Administrative Tools, ODBC Data Sources.



Click on the System DSN tab, click on Add, select the Microsoft Access Drive (*.mdb) driver and click on Finish.



On the ODBC Microsoft Access Setup window, enter Ch02_TinyCollege on the Data Source Name field.



Under Database, click on the Select button, browse to the location of the MS Access file and click OK twice.



The new system DSN now appears in the list of system data sources.

888. Use MS Excel to list all classes taught in room KLR200 using the Ch02_TinyCollege System DSN. Answer: To perform this task, complete the following steps: 

From Excel, select Data, Get Data, From Other sources, From ODBC.



Select the Ch02_TinyCollege data source and click OK.



In the ODBC driver window, select Windows, Use my current credentials and click Connect.



In the Navigator window, select Ch02_TinyCollege



Select the CLASS table, then select Transform Data



On the Power Query Editor, click on the filter on the CLASS_ROOM column, deselect all and select KLR200, and click OK.



In the Ribbon, under File, click Close and Load.



A new CLASS worksheet will appear with the LINE data.

The results of these actions are shown in Figure P15.7.

763

FIGURE P15.7

Solution to Problem 7—Retrieve all Classes Taught in Room

KLR200

To answer Problems 8−11, use Section 15-3a as your guide. 889. Create a sample XML document and DTD for the exchange of customer data. Answer: The solutions are shown in Figures P15.8a and P15.8b.

FIGURE P15.8a Customer DTD Solution

764

FIGURE P15.8b Customer XML Solution

890. Create a sample XML document and DTD for the exchange of product and pricing data. Answer: The solutions are shown in Figures P15.9a and P15.9b.

765

FIGURE P15.9a Product DTD Solution

766

FIGURE P15.9b Product XML Solution

767

891. Create a sample XML document and DTD for the exchange of order data. Answer: The solutions are shown in Figures P15.10a and P15.10b.

FIGURE P15.10a Order DTD Solution

768

FIGURE P15.10b Order XML Solution

769

892. Create a sample XML document and DTD for the exchange of student transcript data. Use your college transcript as a sample. Answer: The solution to Problem 11 will follow the same format as the previous solutions. However, because Problem 11 requires the students to do some research regarding the information that goes in the transcript data, we have not included a specific solution here. Encourage the student to use his/her creativity and analytical skills to research and create a simple XML file containing the data that is customary on your university. Not all fields in the Student transcript must be included in this exercise. Allow the students to represent just the most important fields. Figure P15.11 shows a sample transcript. Notice the various sections of the transcript. We will focus in those sections independently to help break down the exercise.

FIGURE P15.11 Sample Transcript

770

<?xml version ="1.0"?> A

<Award>Bachelor of Science</Award> <DegreeDate>05/29/2020</DegreeDate> <DegreeHonors>Magna Cum Laude</DegreeHonors> D

<PrimaryDegree> <College>Behavioral and Health Sciences</College> <Major>Textiles Merchandising Design</Major> <MajorConcentration>Fashion Merchandising</MajorConcentration>

771

<Minor>Business Administration</Minor> </PrimaryDegree </DegreeAwarded> <TransferAccepted>

<Semester>Fall 2015</Semester> F

772

<Subject>PS</Subject> <Course>3210</Course> <Title>International Rel</Title> <Grade>TB</Grade> <CreditHours>3.00</CreditHours> <QualityPoints>0.000</QualityPoints> </CreditAccepted>

(repeat)

</TransferAccepted> </StudentTranscript> The DTD for the student transcript is shown below: <!ELEMENT StudentTranscript (StudentInformation,CurrentCurriculum,DegreeAwarded,TransferAccepted+)>

773

774

775

TABLE OF CONTENTS Answers to Review Questions .............................................................................................390

ANSWERS TO REVIEW QUESTIONS 893. Explain the difference between data and information. Give some examples of raw data and information. Answer: Data are raw facts of interest to an end user. Examples of data include a person’s date of birth, an employee name, the number of pencils in stock, and so on. Data represent a static aspect of a real world object, event, or thing. Information is processed data. That is, information is the product of applying some analytical process to data. For example, invoice data may include the invoice number, customer, items purchased, invoice total, and so on. The end user can generate information by tabulating such data and computing totals by customer, cash purchase summaries, credit purchase summaries, a list of most-frequently purchased items, and so on. 894. Define dirty data and identify some of its sources. Answer: Dirty data is data that contains inaccuracies or inconsistencies (i.e., data that lacks integrity). Dirty data may result from a lack of enforcement of integrity constraints, typographical errors, the use of synonyms and homonyms across systems, the use of nonstandard abbreviations, or differences in the decomposition of composite attributes. 895. What is data quality, and why is it important? Answer: Data quality is a comprehensive approach to ensuring the accuracy, validity, and timeliness of the data. Data quality is important because without quality data, accurate and timely information cannot be produced. Without accurate and timely information, it is difficult (impossible?) to make good decisions; and without good decisions, organizations will fail in their competitive environments. 896. Explain the interactions among end user, data, information, and decision-making. Draw a diagram and explain the interactions. Answer: End users apply intelligence to data to produce information. This information is combined with existing knowledge to create new knowledge that is used to make decisions. The interactions are illustrated in Figure IM16.1.

776

FIGURE IM16.1 End User, Data, Information, and Decision-Making Interaction

897. Suppose that you are a DBA. What data dimensions would you describe to top-level managers to obtain their support for endorsing the data administration function? Answer: The first step will be to emphasize the importance of data as a company asset, to be managed as any other asset. Top-level managers must understand this crucial notion and must be willing to commit company resources to manage data as an organizational asset. The next step is to identify and define the need for and role of the DBMS in the organization. Top-level managers are supported through the DBMS’s ability to provide necessary information for strategic planning, provide access to internal and external data to identify growth opportunities, provide a framework for enforcing organizational policies, improve the likelihood of positive return on investment by searching for new ways to cut costs and boost productivity, and provide feedback to monitor goal achievement. 898. How and why did database management systems become the organizational data management standard in organizations? Discuss some of the advantages of the database approach over the file-system approach. Answer: Prior to database approaches, organizations relied on file systems. The data files in the file system were “owned” by individual functional areas within the organization, often for their exclusive use. This led to high levels of redundancy and a lack of data consistency across the organization. As the need increased for more accurate data to produce more accurate information to support increasingly integrated functions, the deficiencies of file systems became unacceptable. Databases provided an organizational ownership of data that was shared across applications and functional areas. As a result, the importance of data resources grew, and organizations found ever more applications of their data. Technology departments shifted focus from data processing to information processing to support organizational decision making. 899. Using a single sentence, explain the role of databases in organizations. Then explain your answer. Answer: The single sentence will be: The database’s predominant role is to support managerial decision making at all levels in the organization. Databases support top, middle, and operational management. At the top level of management, the database supports strategic decisions for growth. At the middle level of management, the database supports monitoring and feedback on tactical decisions; while at the operational level, database supports feedback and control of operations. 900. Define security and privacy. How are these two concepts related?

777

Answer: Security means protecting the data against accidental or intentional use by unauthorized users. Privacy deals with the rights of people and organizations to determine who accesses the data and when, where, and how the data are to be used. The two concepts are closely related. In a shared system, individual users must ensure that the data are protected from unauthorized use by other individuals. Also, the individual user must have the right to determine who, when, where, and how other users use the data. The DBMS must provide the tools to allow such flexible management of the data security and access rights in a company database. 901. Describe and contrast the information needs at the strategic, tactical, and operational levels in an organization. Use examples to explain your answer. Answer: Strategic levels of the organization need support of decisions that involve organization-wide issues, such as growth into new markets or responses to environmental threats and opportunities. Strategic decisions typically involve setting goals. Tactical decisions involve the high-level actions to implement the goals set at the strategic level, such as monitoring and controlling the use of company resources. Operational decisions involve the internal and external transactions to conduct the business of the organization within the actions defined in the tactical decisions. Operational-level support can involve activities such as querying the shipping status of a customer’s order. 902. What special considerations must you take into account when introducing a DBMS into an organization? Answer: Managerial, technical, and cultural issues must be taken into account when a new DBMS is to be introduced in an organization. For example, focus the discussion on such questions as: 

What about retraining requirements for the new system?  Who needs to be retrained?  What must be the type and extent of the retraining?



How will the resistance in the preceding question be manifested?



How will you deal with such resistance?

903. Describe the DBA’s responsibilities. Answer: The database administrator (DBA) is the person responsible for the control and management of the shared database within an organization. The DBA controls the database administration function within the organization. The DBA is responsible for managing the overall corporate data resource, both computerized and noncomputerized. Therefore, the DA is given a higher degree of responsibility and authority than

778

the DBA. Depending on organizational style, the DBA and DA roles may overlap and may even be combined in a single position or person. The DBA position requires both managerial and technical skills. Refer to Section 16-5 and Table 16.1 to explain and illustrate the general responsibilities of the DA and DBA functions. 904. How can the DBA function be placed within the organization chart? What effect(s) will such placement have on the DBA function? Answer: The DBA function placement varies from company to company and may be either a staff or line position. In a staff position, the DBA function creates a consulting environment in which the DBA is able to devise the overall data-administration strategy but does not have the authority to enforce it. In a line position, the DBA function has both the responsibility and the authority to plan, define, implement, and enforce the policies, standards, and procedures. 905. Why and how are new technological advances in computers and databases changing the DBA’s role? Answer: The DBA function is probably one of the most dynamic functions of any organization. New technological developments constantly change the DBA function. For example, note how each of the following influences the DBA function: 

the development of the DDBMS



the development of the OODBMS



the increasing use of Cloud solutions



the rapid integration of Intranet and Extranet applications and their effects on the database design, implementation, and management. (Security issues become especially important!)

906. Explain the DBA department’s internal organization, based on the DBLC approach. Answer: The DBA department may be organized based on the DBLC by allocating personnel and resources based on the DBLC phases. In this approach, the department is organized into the following units: 

Planning



Design



Implementation



Operation



Training

779

907. Explain and contrast the differences and similarities between the DBA and DA. Answer: Both the data administrator (DA) and database administrator (DBA) positions require both managerial skills and technical skills. The DA position puts more emphasis on managerial skills, while the DBA position emphasizes the technical skills more. The DA role performs strategic planning of long-term goals, and sets policies and standards based on those goals. The DA job is broad in scope and views data as a corporate asset across the organization. The DBA role controls and supervises the execution of plans to achieve goals within the standards set. 908. Explain how the DBA plays an arbitration role for an organization’s two main assets. Draw a diagram to facilitate your explanation. Answer: The DBA plays a role in the arbitration of interactions between people and data. The DBA sets and enforces standards for the interactions of users and programmers for interacting with the data. The people of the organization, end users, and application programmers interact with the data through application programs and DBMS interfaces. The DBA sets the standards for how the application programs interact with the database and may be involved in verifying that applications conform to those standards. The DBA will define the uses of the DBMS interfaces presented to the users and limit the actions that can be taken by the users with those interfaces.

FIGURE IM16.16 DBA Arbitrates Interactions Between People and Data

909. Describe and characterize the skills desired for a DBA. Answer: The skills for a DBA can be characterized as either managerial or technical. Managerial skills include a broad business understanding, coordination skills, analytical skills, conflict resolution skills, communication skills, and negotiation skills. Technical skills include broad data processing background with up-to-date knowledge of database technologies, understanding of the SDLC, structured development methodologies, DBLC, database modeling skills, and operational database skills.

780

910. What are the DBA’s managerial roles? Describe the managerial activities and services provided by the DBA. Answer: The DBA is a manager responsible for controlling and planning database administration. Activities of the DBA in this regard include planning, organizing, testing, monitoring, and delivering services such as end-user support, standards for data access, security and privacy, backup and recovery, and data distribution. 911. What DBA activities are used to support end users? Answer: DBA activities to support end users include, gathering user requirements, building end-user confidence, resolving conflicts and problems, finding solutions to information needs, ensuring quality and integrity of data, and managing the training and support of DBMS users. 912. Explain the DBA’s managerial role in the definition and enforcement of policies, procedures, and standards. Answer: A successful data administration strategy requires the continuous enforcement of policies as statements of direction or action of DBA goals. The data administration strategy must also provide and enforce standards to implement those policies, such as defining structures for applications and naming conventions that programmers must use. Finally, the strategy must include procedures, that is, the precise step-by-step instructions for how to perform a task so that it is in compliance with the standards and policies. 913. Protecting data security, privacy, and integrity are important database functions. What activities are required in the DBA’s managerial role of enforcing these functions? Answer: The DBA is responsible for defining, documenting, and communicating policies, standards, and procedures for these functions. 914. Discuss the importance and characteristics of database data backup and recovery procedures. Then describe the actions that must be detailed in backup and recovery plans. Answer: Data loss can be ruinous for companies. DBAs must ensure that data can be fully recovered in the case of data loss or loss of database integrity. The backup and recovery plan must include periodic data and application backups, proper identification of backups, safe backup storage, physical protection of the hardware and software, and typically insurance coverage for the data. 915. Assume that your company assigned you the responsibility of selecting the corporate DBMS. Develop a checklist for the technical and other aspects involved in the selection process. Answer: The checklist should address selection criteria based on the following: 

DBMS model



DBMS storage capacity



Application development support



Security and integrity



Backup and recovery



Concurrency control



Performance

781



Database administration tools



Interoperability and data distribution



Portability and standardization



Hardware requirements



Data dictionary accessibility



Vendor training and support availability



Available third-party tools



Cost

916. Describe the activities that are typically associated with the design and implementation services of the DBA technical function. What technical skills are desirable in the DBA’s personnel? Answer: The DBA performs many design activities. The technical function area during design includes helping with the creation of conceptual, logical, and physical database design, and evaluation of transactions within application programs to ensure the transactions are correct, efficient, and compliant with integrity standards. During implementation, the DBA technical functions include implementation of the physical design, creation, and evaluation of the application access plan, and development and testing of operational procedures such as training, security, and backup plans. 917. Why are testing and evaluation of the database and applications not done by the same people who are responsible for the design and implementation? What minimum standards must be met during the testing and evaluation process? Answer: Testing and evaluation of the database and applications are done by different people than the designers and implementers because the designers and implementers are often too close to the problem to recognize any omissions. The testing must include backup and recovery; security; integrity; use of SQL; application performance; evaluation of written documentation and procedures; observance of standards for naming, documenting, and coding; checking for data duplication conflicts with existing data; and the enforcement of data validation rules. 918. Identify some bottlenecks in DBMS performance, and then propose some solutions used in DBMS performance tuning. Answer: The most common bottlenecks for DBMS performance tuning deal with the use of indexes, query optimization algorithms, and management of storage resources. The DBA should create and ensure adherence to an index creation and usage plan. This can include training application programmers on the proper use of SQL statements to take advantage of the indexes. Most query optimization routines are built into the DBMS. However, part of these routines deal with concurrent transactions, and the DBA may be able to configure concurrency options to improve performance for each database individually. Finally, the DBA must configure appropriate storage resources of both primary memory for buffer pools and secondary memory for proper log file size and location. 919. What are the typical activities involved in the maintenance of the DBMS and its utilities and applications? Would you consider application performance tuning to be part of the maintenance activities? Explain your answer.

782

Answer: Database maintenance activities are extensions of the operational activities to ensure the preservation of the database environment. Common activities include reorganization of the database on physical storage devices to maintain performance. Additional database performance tuning is part of the maintenance activities. As the database system enters operation, the database starts to grow. Resources initially assigned to the application are sufficient for the initial loading of the database. As the system grows, the database becomes bigger, and the DBMS requires additional resources to satisfy the demands on the larger database. Database performance will decrease as the database grows and more users access it. The need to monitor and address issues with application performance as the database grows and its use evolves is a part of the process. 920. How do you normally define security? How is your definition of security similar to or different from the definition of database security in this chapter? Answer: The chapter defines security as all activities and measures to ensure the confidentiality, integrity, and availability of data. It is a comprehensive, company-wide approach. 921. What are the levels of data confidentiality? Answer: The levels of data confidentiality are highly restricted, confidential, and unrestricted. 922. What are security vulnerabilities? What is a security threat? Give some examples of security vulnerabilities that exist in different IS components. Answer: A security vulnerability is a weakness in a system component that could be exploited to allow unauthorized access or cause service disruptions. A security vulnerability that is left unfixed is a security threat. Examples include poor user passwords, the copying of data to unauthorized devices, and SQL injection attacks. 923. Define the concept of a data dictionary and discuss the different types of data dictionaries. If you were to manage an organization’s entire data set, what characteristics would you look for in the data dictionary? Answer: A data dictionary is the DBMS component that stores data about the definition of data characteristics and their relationships. It is the location in which metadata is stored. Data dictionaries can be integrated or standalone. Integrated data dictionaries are stored in the database, while standalone data dictionaries are stored outside the database. Relational databases use integrated data dictionaries. Data dictionaries can also be passive or active. An active data dictionary is updated automatically by the DBMS as the metadata changes. Passive data dictionaries must be manually updated, typically through a batch process. 924. Using SQL statements, give some examples of how you would use the data dictionary to monitor the security of the database.

783

FROM SYSTABAUTH WHERE TTNAME = ‘INVENTORY’; List the user and table names for all users who can alter the database structure for any table in the database: SELECT GRANTEE, TTNAME FROM SYSTABAUTH WHERE ALTERAUTH = ‘Y’ ORDER BY GRANTEE, TTNAME; 925. What characteristics do a CASE tool and a DBMS have in common? How can these characteristics be used to enhance the data administration? Answer: CASE tools and DBMS products both make extensive use of data repositories. CASE tools maintain a data dictionary of the objects created by the system designer. Many CASE tools can integrate with a database to maintain this repository in the database itself. The CASE tool can be used to design the database structure, making it easy for the DBA and application designers to collaborate on naming conventions, duplication of data elements, and validation rules. 926. Briefly explain the concepts of information engineering (IE) and information systems architecture (ISA). How do these concepts affect the data administration strategy? Answer: Information engineering (IE) is a top-down approach that translates the company’s strategic goals into data and applications to achieve those goals. IE takes the perspective that the data used by the organization rarely changes, even if the processes to use it change frequently. By taking a data-centric approach, the impact of changes in systems is minimized. An information systems architecture (ISA) is the resulting blueprint for data and applications that results from applying an IE approach. These concepts provide a consistent basis for decision making that is rooted in achievement of organizational strategies. 927. Identify and explain some of the critical success factors in the development and implementation of a good data administration strategy. Answer: Critical success factors for the development and implementation of a good data administration strategy include management commitment, thorough analysis of the company’s current situation, end-user involvement, defined standards, training, and the implementation of a small pilot project. Top-level management must set an example and be champions to drive the strategy. The current data situation of the company must be analyzed and a clear vision for the use of data in the organization must be articulated. End-user buy-in is critical and can only be achieved if the end-users are involved in the process.

784

928. How have cloud-based data services affected the DBA’s role.? Answer: The use of cloud-based data services reduces the DBA’s role in infrastructure management. However, the managerial aspects of the DBA role are either largely unchanged or augmented with the coordination, valuation, and evaluation of cloud services. The technical aspects of the DBA’s role may shift to an even greater emphasis on monitoring and controlling the database to ensure security and data integrity. 929. What is the tool used by Oracle to create users? Answer: The Oracle Enterprise Manager simplifies the creation of users. Users can be created using SQL commands, but the OEM helps to automate that task. 930. In Oracle, what is a tablespace? Answer: A tablespace is a logical storage space. Tablespaces are primarily used to logically group related data. Tablespace data are physically stored in one or more datafiles. 931. In Oracle, what is a database role? Answer: A database role is a named collection of database access privileges that authorize a user to perform specified actions on the database. Examples of roles are CONNECT, RESOURCE, and DBA. 932. In Oracle, what is a datafile? How does it differ from a file systems file? Answer: A database is composed of one or more tablespaces. Therefore, there is a 1:M relationship between the database and its tablespaces. Tablespace data are physically stored in one or more datafiles. Therefore, there is a 1:M relationship between tablespaces and datafiles. A datafile physically stores the database data. Each datafile is associated with one and only one tablespace. (But each datafile can reside in a different directory on the same hard disk—or even on different disks.) In contrast to the datafile, a file system’s file is created to store data about a single entity, and the programmer can directly access the file. But file access requires the end user to know the structure of the data that are stored in the file. While a database is stored as a file, this file is created by the DBMS, rather than by the end user. Because the DBMS handles all file operations, the end user does not know—nor does that end user need to know—the database’s file structure. When the DBA creates a database—or, more accurately, uses the Oracle Storage Manager to let Oracle create a database—Oracle automatically creates the necessary tablespaces and datafiles. 933. In Oracle, what is a database profile? Answer: A profile is a named collection of database settings that control how much of the database resource can be used by a given user.

785

TABLE OF CONTENTS Answers to Review Questions .............................................................................................786 Answers to Problems ...........................................................................................................791

ANSWERS TO REVIEW QUESTIONS 934. What factors relevant to database design are revealed during the initial study phase? Answer: The database initial study phase yields the information required to determine an organization’s needs, as well as the factors that influence data generation, collection, and processing. Students must understand that this phase is generally concurrent with the planning phase of the SDLC and that, therefore, several of the initial study activities are common to both. The most important discovery of the initial study phase is the set of the company’s objectives. Once the designer has a clear understanding of the company’s main goals and its mission, (s)he can use this as the guide to making all subsequent decisions concerning the analysis, design, and implementation of the database and the information system. The initial study phase also establishes the company’s organizational structure; the description of operations, problems and constraints, and alternate solutions; system objectives; and the proposed system scope and boundaries. The organizational structure and the description of operations are interdependent because operations are usually a function of the company’s organizational structure. The determination of structure and operations allows the designer to analyze the existing system and to describe a set of problems, constraints, and possible solutions. Naturally, the designer must find a feasible solution within the existing constraints. In most cases, the best solution is not necessarily the most feasible one. The constraints also force the designer to narrow the focus on very specific problems that must be solved. In short, the combination of all the factors we have just discussed help the designer to put together a set of realistic, achievable, and measurable system objectives within the system’s required scope and boundaries. 935. Why is the organizational structure relevant to the database designer? Answer: The delivery of information must be timely, it must reach the right people, and the delivered information must be accurate. Since the proper use of timely and accurate information is the key factor in the success of any system, the reports and queries drawn from the database must reach the key decision makers within the organization. Clearly, understanding the organization structure helps the designer to define the organization’s lines of communication and to establish reporting requirements. 936. What is the difference between the database design scope and its boundaries? Why is the scope and boundary statement so important to the database designer? Answer: The system’s boundaries are the limits imposed on the database design by external constraints such as available budget and time, the current level of technology, and end-user resistance to change. The scope of a database defines the extent of the database design

786

coverage and reflects a conscious decision to include some things and exclude others. Note that the existence of boundaries usually has an effect on the system’s scope. For legal and practical design reasons, the designer cannot afford to work on an unbounded system. If the system’s limits have not been adequately defined, the designer may be legally required to expand the system indefinitely. Moreover, an unbounded system will not contain the built-in constraints that make its use practical in a real-world environment. For example, a completely unbounded system will never be completed, nor may it ever be ready for reasonable use. Even a system with an “optimistic” set of bounds may drag the design out over many years and may cost too much. Keep in mind that company managers almost invariably want least-cost solutions to specific problems. 937. What business rule(s) and relationships can be described for the ERD shown in Figure QB.4?

FIGURE QB.4 The ERD for Question 4

Answer: The business rules and relationships are summarized in Table QB.4.

787

Table QB.4 Business Rules and Relationships Summary

Business rules

Relationships

A supplier supplies many parts.

many to many

Each part is supplied by many suppliers.

PART - SUPPLIER

A part is used in many products.

many to many

Each product is composed of many parts.

PRODUCT - PART

A product is bought by many customers.

many to many

Each customer buys many products.

PRODUCT - CUSTOMER

Note that the ERD in Figure QB.4 uses the PART_PROD, PROD_VEND, and PROD_CUST entities to convert the M:N relationships to a series of 1:M relationships. Also, note the use of two composite entities: 

The PART_VEND entity’s composite PK is VEND_ID + PART_CODE.



The PART_PROD entity’s composite PK is PART_CODE + PROD_CODE.

The use of these composite PKs means that the relationship between PART and PART_VEND is strong, as is the relationship between VENDOR and PART_VEND. These strong relationships are indicated through the use of a solid relationship line. No PK has been indicated for the PROD_CUST entity, but the existence of weak relationships—note the dashed relationship lines—lets you assume that the PROD_CUST entity’s PK is not a composite one. In this case, a revision of the ERD might include the establishment of a composite PK (PROD_CODE + CUST_NUM) for the PROD_CUST entity. (If you are using Microsoft Visio Professional, declaring the relationships between CUSTOMER and PROD_CUST and between PRODUCT and PROD_CUST to be strong will automatically generate the composite PK, PROD_CODE + CUST_NUM.) 938. Write the connectivity and cardinality for each of the entities shown in Question 4. Answer: We have indicated the connectivities and cardinalities in Figure QB.5. (The Crow’s Foot ERD combines the connectivity and cardinality depiction through the use of the relationship symbols. Therefore, the use of text boxes—we have created those with the Visio text tool—to indicate connectivities and cardinalities is technically redundant.)

788

FIGURE QB.5 Connectivities and Cardinalities

Figure QB.5’s connectivities and cardinalities are reflected in the business rules: 

One part can be supplied by one or more suppliers, and a supplier can supply many parts.



A product is made up of several parts, and a part can be a component of different products.



A product can be bought by several customers, and a customer can purchase several products.

939. What is a module, and what role does a module play within the system? Answer: A module is a separate and independent collection of application programs that covers a given operational area within an information system. A module accomplishes a specific system function, and it is, therefore, a component of the overall system. For example, a system designed for a retail company may be composed of the modules shown in Figure QB.6.

789

FIGURE QB.6 The Retail Company System Modules

Retail System

Inventory

Purchasing

Sales

Accounting

Within Figure QB.6’s Retail System, each module addresses specific functions. For example: 

The inventory module registers any new item, monitors quantity on hand, reorder quantity, and location.



The purchasing module registers the orders sent to the suppliers, any supplier information and order status.



The sales module covers the sales of items to customers, generates the sales slips (invoices), and credit sales checks.



The accounting module covers accounts payable, accounts receivable, and generates appropriate financial status reports.

The example demonstrates that each module has a specific purpose and operates on a database subset (external view). Each external view represents the entities of interest for the specific module. However, an entity set may be shared by several modules. 940. What is a module interface, and what does it accomplish? Answer: A module interface is the method through which modules are connected and by which they interact with one another to exchange data and status information. The definition of proper module interfaces is critical for systems development, because such interfaces establish an ordered way through which system components (modules) interchange information. If the module interfaces are not properly defined, even a collection of properly working modules will not yield a useful working system.

790

ANSWERS TO PROBLEMS Modify the initial ER diagram presented in Figure B.19 to include the following entity supertype and subtypes: The University Computer Lab USER may be a student or a faculty member. Answer: The answer to Problem 1 is included in the answer to Problem 2. 941. Using an ER diagram, illustrate how the change you made in Problem 1 affects the relationship of the USER entity to the following entities: a. LOG b. RESERVATION c. CHECK_OUT d. WITHDRAW Answer: The new ER diagram segment will contain the supertype USER and the subtypes FACULTY and STUDENT. How the use of this supertype/subtype relationships affect the entities shown here is illustrated in the ER diagram shown in Figure PB.2a.

791

FIGURE PB.2a The Crow’s Foot ERD with Supertypes and Subtypes

The ER segment shown in Figure PB.2a reflects the following conditions: 

Not all users are faculty members, so FACULTY is optional to USER.



Not all users are students, so STUDENT is optional to USER.



The conditions in the first two bullets are typical of the supertype/subtype implementation.



Not all faculty members withdraw items, so a faculty member may not ever show up in the WITHDRAW table. Therefore, WITHDRAW is optional to FACULTY.



Not all items are necessarily withdrawn; some are never used. Therefore WITHDRAW is optional to ITEM. (An item that is never withdrawn will never show up in the WITHDRAW table.)



Not all items are checked out, so an ITEM may never show up in the CHECK_OUT table. Therefore, CHECK_OUT is optional to ITEM.



Not all users check out items, so it is possible that a USER—a faculty member or a student—never shows up in the CHECK_OUT table. Therefore, CHECK_OUT is optional to USER.



Not all faculty members place reservations, so RESERVATION is optional to FACULTY.



Not all students use the lab, that is, some students will never sign the log to check in. Therefore, LOG is optional to STUDENT.

792

Given the text’s initial development of the UCL Management System’s ERD, the USER entity was related to both the WITHDRAW and CHECK_OUT entities. Therefore, there was no way of knowing whether a STUDENT or a FACULTY member was related to WITHDRAW or CHECK_OUT. Although the business rules were quite specific about the relationships, the ER diagram did not reflect them. By adding a new USER supertype and two STUDENT and FACULTY subtypes, the ERD more closely represents the business rules. The supertype/subtype relationship in Figure PB.2a lets us see that STUDENT is related to LOG, and that only FACULTY members can make a RESERVATION and WITHDRAW items. However, both STUDENT and FACULTY can CHECK_OUT items. While this supertype/subtype solution conforms to the problem solution requirements, the design is far from complete. For example, one would suppose that FACULTY is already a subtype to EMPLOYEE. Also, can a faculty member also be a student? In other words, are the supertypes/subtypes overlapping or disjoint? In this initial ERD, we have assumed overlapping subtypes; that is, a user can be a faculty member and a student at the same time. Another solution—which would eliminate the USER/FACULTY and USER/STUDENT supertype/subtypes in the ERD—is to add an attribute, such as USER_TYPE, to the USER entity to identify the user as faculty or student. The application software can then be used to enforce the restrictions on various user types. Actually, that approach was used in the final (verified) Computerlab.mdb database on your CD. (The verified database is provided for Appendix C.) 942. Create the initial ER diagram for a car dealership. The dealership sells both new and used cars, and it operates a service facility. Base your design on the following business rules: a. A salesperson can sell many cars but each car is sold by only one salesperson. b. A customer can buy many cars but each car is sold to only one customer. c.

A salesperson writes a single invoice for each car sold.

d. A customer gets an invoice for each car (s)he buys. e. A customer might come in only to have a car serviced; that is, one need not buy a car to be classified as a customer. f.

When a customer takes in one or more cars for repair or service, one service ticket is written for each car.

g. The car dealership maintains a service history for each car serviced. The service records are referenced by the car’s serial number. h. A car brought in for service can be worked on by many mechanics, and each mechanic can work on many cars. i.

A car that is serviced may or may not need parts. (For example, parts are not necessary to adjust a carburetor or to clean a fuel injector nozzle.)

Answer: As you examine the initial ERD in Figure PB.3a, note that business rules (a) through (d) refer to the relationships of four main entities in the database: SALESPERSON, INVOICE, CUSTOMER, and CAR. Note also that an INVOICE requires a SALESPERSON, a CUSTOMER, © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

793

and a CAR. Business rule (e) indicates that INVOICE is optional to CUSTOMER and CAR because a CAR is not necessarily sold to a CUSTOMER. (Some customers only have their cars serviced.) The position of the CAR entity and its relationships to the CUSTOMER and INV_LINE entities is subject to discussion. If the dealer sells the CAR, the CAR entity is clearly related to the INVOICE. (If the car is sold, it generates one invoice line on the invoice. However, the invoice is likely to contain additional invoice charges, such as a dealer preparation charge, and destination charge.) At this point, the discussion can proceed in different directions: 

At this time in the design, the sold car can be linked to the customer through the invoice. Therefore, the relationship between CUSTOMER and CAR shown in Figure PB.3a is not necessary.



If the customer brings a car in for service—whether or not that car was bought at the dealer—the relationship between CUSTOMER and CAR is desirable. After all, when a service ticket is written in the SERVICE_LOG, it would be nice to be able to link the customer to the subsequent transaction. More important, it is the customer who gets the invoice for the service charge. However, if the CUSTOMER-CAR relationship is to be retained, it will be appropriate to make a distinction between the cars in the dealership’s inventory—which are not related to a customer at that point—and the cars that are owned by customers. If no distinction is made between customer-owned cars and cars still in the dealership inventory, Figure PB.3a’s CAR entity will either have a null CUST_NUM or the customer entity must contain a dummy record to indicate a “no customer—dealer-owned” condition.

794

FIGURE PB.3a The Car Dealership Initial Crow’s Foot ERD

Regardless of which argument “wins” in the presentation of the various scenarios, remind the students that the ERD to be developed in this exercise is to reflect the initial design. More important, such discussions clearly indicate the need for very detailed descriptions of operations and the development of precisely written business rules. (It may be useful to review that business rules, which are derived from the description of operations, are written to help define entities, relationships, constraints, connectivities, and cardinalities.) The dealer’s service function is likely to be crucial to the dealer—good service helps generate future sales and the service function is very likely an important cash flow generator. Therefore, the CAR entity plays an important role. If a customer brings in a car for service and the car was not bought at the dealership, it must be added to the CAR table in order to enable the system to keep a record of the service. This is why we have depicted the CUSTOMER–owns–CAR relationship in Figures PB.3a. Also, note that the optionality next to CAR reflects the fact that not all cars are owned by a customer: Some cars belong to the dealership.

795

Because Figure PB.3a shows the initial ERD, that ERD will be subject to revision as the description of operations becomes more detailed and accurate, thus modifying some of the existing business rules and creating additional business rules. Therefore, additional entities and relationships are likely to be developed and some optional relationships may become mandatory, while some mandatory relationships may become optional. Additional changes are likely to be generated by normalization procedures. Finally, the initial design includes some features that require fine-tuning. For example, a SALESPERSON is just another kind of EMPLOYEE—perhaps the main difference between “general” employee and a sales person is that the latter requires tracking of sales performance for commission and/or bonus purposes. Therefore, EMPLOYEE would be the supertype and SALESPERSON the subtype. All these issues must be addressed in the verification and logical design phases addressed in Appendix E. Incidentally, your students may ask why the design does not show a HISTORY entity. The reason for its absence is that the car’s history can be traced through the SERVICE entity.

NOTE Although we are generally reluctant to make forward references, you may find it very useful to look ahead to the ERD shown in Appendix C’s Figure PC.1a. The discussion that precedes the presentation of the modified ERD is especially valuable—students often find such sample data to be the key to understanding a complex design. In any case, the modified ERD in Figure PC.1a provides ample evidence that the initial ERD is only a starting point for the design process. As you discuss the design shown in Figure PB.3a, note that it is far from implementation-ready. For example: 

The INVOICE is likely to contain multiple charges, yet it is only capable of handling one charge at a time at this point. The addition of an INV_LINE entity is clearly an excellent idea.



The SERVICE entity has some severe limitations caused by the lack of a SERVICE_LINE entity. (Note the previous point.) Given this design, it is impossible to store and track all the individual service (maintenance) procedures that are generated by a single service request. For example, a 50,000 mile check may involve multiple procedures such as belt replacements, tire rotation, tire balancing, and brake service. Therefore, the SERVICE entity, like the INVOICE entity, must be related to service lines, each one of which details a specific maintenance procedure.



The PART_USAGE entity’s function is rather limited. For example, its depiction as a composite entity does properly translate the notion that a part can be used in many service procedures and a service procedure can use many parts. Unfortunately, the lack of a SERVICE_LINE entity means that we cannot track the parts use to a particular maintenance procedure.



According to business rule (d), the relationship between CAR and INVOICE would be 1:1. However, if it is possible for the dealer to take the car in trade at a later date and subsequently sell it again, the same CAR_VIN value may appear in INVOICE more than once. We have depicted the latter scenario.

796

The initial design does have one very nice feature at this point: The existence of the WORK_LOG entity’s WORKLOG_ACTION attribute makes it possible to record which mechanic started the service procedure and which one ended the procedure. (The WORKLOG_ACTION attribute has only two values, open and close.) Note that this feature eliminates the need for a null ending date in the SERVICE entity while the car is being serviced. Better yet, if we need to be able to track which mechanics opened and closed the service procedure, the WORK_LOG entity’s presence eliminates the need for synonyms in the SERVICE entity. Note, for example, that the following few sample entries in the WORK_LOG table lets us conclude that service number 12345 was opened by mechanic 104 on 10-Mar-2014 and closed by the same mechanic on 11-Mar-2014.

Table PB.3 Sample Data Entries in the WORK_LOG Entity

EMP_NUM

SERVICE_NUM

WORKLOG_ACTION

WORKLOG_DATE

104

12345

OPEN

10-Mar-2014

107

12346

OPEN

10-Mar-2014

104

12345

11-Mar-2014

104

12346

11-Mar-2014

112

12347

OPEN

11-Mar-2014

The format you see in Table PB.3 is based on a standard we developed for aviation maintenance databases. Because almost all aspects of aviation are tightly regulated, accountability is always close to the top of the list of design requirements. (In this case, we must be able to find out who opened the maintenance procedure and who closed it.) You will discover in Chapter 9, “Database Design,” that we will apply the accountability standard to other aspects of the design, too. (Who performed each maintenance procedure? Who signed out the part(s) used in each maintenance procedure? And so on.) It is worth repeating that a discussion of the shortcomings of the initial design will set an excellent stage for the introduction of Appendix C’s verification process. Strict accountability standards are becoming the rule in many areas outside aviation. Such standards may be triggered by legislation or by company operations in an increasingly litigious environment. 943. Create the initial ER diagram for a video rental shop. Use (at least) the following description of operations on which to base your business rules. The video rental shop classifies movie titles according to their type: comedy, western, classical, science fiction, cartoon, action, musical, or new release. Each type contains many possible titles, and most titles within a type are available in multiple copies. For example, note the summary in the following table of the relationship between video rental type and title.

797

TYPE Musical

Cartoon

Action

TITLE

COPY

My Fair Lady

Oklahoma!

Dilly Dally & Chit Chat Cat

Amazon Journey

Answer: Keep the following conditions in mind as you design the video rental database: 

The movie type classification is standard; not all types are necessarily in stock.



The movie list is updated as necessary; however, a movie on that list might not be ordered if the video shop owner decides that the movie is not desirable for some reason.



The video rental shop does not necessarily order movies from all vendors on the vendor list; some vendors on the vendor list are merely potential vendors from whom movies may be ordered in the future.



Movies classified as new releases are reclassified to an appropriate type after they have been in stock for more than 30 days. The video shop manager wants to have an end-of-period (week, month, year) report for the number of rentals by type.



If a customer requests a title, the clerk must be able to find it quickly. When a customer selects one or more titles, an invoice is written. Each invoice can contain charges for one or more titles. All customers pay in cash.



When a customer checks out a title, a record is kept of the check-out date and time and the expected return date and time. When rented titles are returned, the clerk must be able to check quickly whether the return is late and to assess the appropriate late return fee.



The video store owner wants to generate periodic revenue reports by title and by type. The owner also wants to generate periodic inventory reports and track titles on order.



The video store owner, who employs two (salaried) full-time and three (hourly) part-time employees, wants to keep track of all employee work time and payroll data. Part-time employees must arrange entries in a work schedule, while all employees sign in and out on a work log.

798

NOTE The description of operations not only establishes the operational aspects of the business; it also establishes some specific system objectives we have listed next. As you design this database, remember that transaction and information requirements help drive the design by defining required entities, relationships, and attributes. Also, keep in mind that the description provided by the problem leaves many possibilities for design differences. For example, consider the EMPLOYEE classification as full-time or part-time. If there are few distinguishing characteristics between the two, the situation may be handled by using an attribute EMP_CLASS (whose values might be F or P) in the EMPLOYEE table. If full-time employees earn a base salary and part-time employees earn only an hourly wage, that problem can be handled by having two attributes, EMP_HOURPAY and EMP_BASE_PAY, in EMPLOYEE. Using this approach, the HOUR_PAY would be $0.00 for the salaried full-time employees, while the EMP_BASE_PAY would be $0.00 for the part-time employees. (To ensure correct pay computations, the application software would select either F or P, depending on the employee classification.) On the other hand, if part-time employees are handled quite differently from fulltime employees in terms of work scheduling, benefits, and so on, it would be better to use a supertype/subtype classification for FULL_TIME and PART_TIME employees. (The more unique variables exist, the more sense a supertype/subtype relationship makes.) For discussion purposes, examine the following requirements: 

The clerk must be able to find customer’s requests quickly.



This requirement is met by creating an easy way to query the MOVIE data (by name, type, etc.) while entering the RENTAL data.



The clerk must be able to check quickly whether or not the return is late and to assess the appropriate “late return” fee. This requirement is met by adding attributes such as expected return date, actual return date, and late fees to the RENTAL entity. Note that there is no need to add a new entity, nor do we need to create an additional relationship. Keep in mind that some requirements are easily met by including the appropriate attributes in the tables and by combining those attributes through an application program that enforces the business rule. Remember that not all business rules can be represented in the database conceptual diagram.



The (store owner) wants to be able to keep track of all employee work time and payroll data.



Here we must create two new entities: WORK_SCHEDULE and WORK_LOG, which will show the employee’s work schedule and the actual times worked, respectively. These entities will also help us generate the payroll report.

The description also specifies some of the expected reports: 

End-of-period report for the number of rentals by type. This report will use the RENTAL, MOVIE, and TYPE entities to generate all rental data for some specified period of time.



Revenue report by title and by type. This report will use the RENTAL, MOVIE, and TYPE entities to generate all the rental data.



Periodic inventory reports. This report will use the MOVIE and TYPE entities.



Titles on order. This report will use the ORDER, MOVIE, and TYPE entities.



Employee work times and payroll data. This report will use the EMPLOYEE,

799

WORK_SCHEDULE, and WORK_LOG entities. This summary sets the stage for the ERD shown in Figure PB.4a. Note that the WORK_SCHEDULE and WORK_LOG entities are optional to EMPLOYEE. The optionalities reflect the following conditions: 

Only part-time employees have corresponding records in the work log table.



Only full-time employees have corresponding records in the work schedule table.

Although there is a temptation to create FULL_TIME and PART_TIME entities, which are then related to WORK_LOG and WORK_SCHEDULE, respectively, such a decision reflects a substitution of an entity for an attribute. It is far better to simply create an attribute, perhaps named EMP_TYPE, in the EMPLOYEE entity. The EMP_TYPE attribute values would then be P = part-time or F = full-time. The applications software can then be used to force an entry into the WORK_LOG and WORK_SCHEDULE entities, depending on the EMP_TYPE attribute value. Student question: Using the argument just presented, what other entity might be replaced by an attribute? Answer: The TYPE entity can be represented by a TITLE_TYPE attribute in the TITLE entity. The TITLE_TYPE values would then be “Western”, “Adventure”, and so on. This approach works fine, as long as the type values don’t require additional descriptive material. In the latter case, the TYPE would be better represented by an entity in order to avoid data redundancy problems.

FIGURE PB.4a The Initial Crow’s Foot ERD for the Video Rental Store

Additional discussion: At this point, the ERD has not yet been verified against the transaction requirements. For example, there is no way to check which specific video has been rented by a

800

customer. (If five customers rent copies of the same video, you don’t know which customer has which copy.) Therefore, the design requires additional work triggered by the verification process. In addition, the work log entity’s LOG_DATE is incapable of tracking when the part-time employees logged in or out. Therefore, two dates must be used, perhaps named LOG_DATE_IN and LOG_DATE_OUT. In addition, if you want to determine the hours worked by each part-time employee, it will be necessary to record the time in and time out. Similarly, the work schedule cannot yet be used to track the full-time employees’ schedules. Who has worked and when? Clearly, the verification process discussed in Appendix C is not a luxury! 944. Suppose a manufacturer produces three high-cost, low-volume products: P1, P2, and P3. Product P1 is assembled with components C1 and C2; product P2 is assembled with components Cl, C3, and C4; and product P3 is assembled with components C2 and C3. Components may be purchased from several vendors, as shown in the following table.

VENDOR

COMPONENT SUPPLIED

C1, C2

C1, C2, C3, C4

C1, C2, C4

Each product has a unique serial number, as does each component. To track product performance, careful records are kept to ensure that each product’s components can be traced to the component supplier. Products are sold directly to final customers; that is, no wholesale operations are permitted. The sales records include the customer identification and the product serial number. Using the preceding information, do the following: a. Write the business rules governing the production and sale of the products. Answer: The business rules are summarized in Figure PB.5a.

801

FIGURE PB.5a The Business Rule Summary PRODUCT P1 P2 P3

C1 C1

VENDOR V1 V2 V3

COMPONENTS C2 C3 C2 C3

COMPONENTS SUPPLIED C1 C1 C1

C2 C2 C3 C2

C4 C3

Business Rule 1. A component can be part of several products, and a product is made up of several components.

Business Rule 2. A component can be supplied by several vendors, and a vendor supplies several components.

b. Create an ER diagram capable of supporting the manufacturer’s product/component tracking requirements. Answer: The two business rules shown in Figure PB.5a allow the designer to generate the ERD shown in Figure PB.5b. (Note the M:N relationships between PRODUCT and COMPONENT and between COMPONENT and VENDOR that have been converted through the composite entities PROD_COMP and COMP_VENB.)

FIGURE PB.5b The Initial Crow’s Foot ERD for Problem 5b

802

As you examine Figure PB.5b, note that we have used default optionalities in the composite entities named PROD_COMP and COMP_VENB. Naturally, these optionalities must be verified against the business rules before the design is implemented. However, at this point the optionalities make sense—after all, various versions of a PRODUCT do not necessarily contain all available COMPONENTs, nor do all VENDORs supply all COMPONENTs. Quite aside from the likely existence of the relationships we just pointed out, optionalities are generally desirable from an operational point of view—at least from the database management angle. Yet, no matter how “obvious” a relationship may appear to be, it is worth repeating that the existence of the optionalities must be verified. Designs that do not reflect the actual data environment are not likely to be useful at the end-user level. Given the ERDs in Figure PB.5b and the sample data in Figure PB.5c, you can see that each PRODUCT entry actually represents a product line, that is, a collection of products belonging to the same product type or line, rather than a specific product occurrence with a unique serial number. Therefore, this model will not enable us to identify the serial number for each component used in, for example, a product with serial number 348765. Therefore, this solution does not allow us to track the provider of a part that was used in a specific PRODUCT occurrence. (Note the example in Figure PB.5c.) In other words, the model does not answer the question, who is the vendor of the component C1 used in product P1?

FIGURE PB.5c An Initial Implementation PRODUCT

PROD_COMP

COMPONENT

COMP_VEND

VENDOR

P1 P1 P2 P2 P2 P3 P3

C1 C2 C3 C4

C1 C1 C1 C2 C2 C2 C3 C4 C4

V1 V2 V3

P2 P3

C1 C2 C1 C3 C4 C2 C3

V1 V2 V3 V1 V2 V3 V2 V2 V3

As you examine Figure PB.5c, note that there are no serial numbers for the components, nor are there any for the products produced. In other words, we do not meet the requirements imposed by: BUSINESS RULE 3 Each product has a unique serial number. For example, there will be several products P1, each with a unique serial number. Each unique product will be composed of several components, and each of those components has a unique serial number. The implementation of business rule 3 will allow us to keep track of the supplier of each component. One way to produce the tracking capability required by business rule 3 is to use a ternary relationship between PRODUCT, COMPONENT, and VENDOR, shown in Figure PB.5d:

803

FIGURE PB.5d The Crow’s Foot Ternary Relationship between PRODUCT, COMPONENT, and VENDOR

The ER diagram we have just shown represents a many-to-many-to-many TERNARY relationship, expressed by M:N:P. This ternary relationship indicates that: 

A product is composed of many components and a component appears in many products.



A component is provided by many vendors and a vendor provides many products.



A product contains components of many vendors and a vendor’s components appear in many products.

Assigning attributes to the SERIALS entity, we may draw the dependency diagram shown in Figure PB.5e.

FIGURE PB.5e The Initial Dependency Diagram

P_SERIAL

C_SERIAL

PROD_TYPE

COMP_TYPE VEND_CODE

partial dependency transitive dependencies

804

We may safely assume that all serial numbers are unique. If we make this assumption, we can conclude that the product serial number will identify the product type and that the component serial number will identify the component type and the vendor. Using the standard normalization procedures, we may thus decompose the entity as shown in the dependency diagrams in Figure PB.5f.

FIGURE PB.5f The Normalized Structure The Original Dependency Diagram

P_SERIAL C_SERIAL

PROD_TYPE COMP_TYPE VEND_CODE

partial dependency transitive dependencies

The Normalized Dependency Diagrams P_SERIAL

Table name: P_SERIAL

PROD_TYPE

C_SERIAL COMP_TYPE VEND_CODE

P_SERIAL

C_SERIAL

Table name: C_SERIAL

Table name: SERIAL

As you examine the dependency diagrams in Figure PB.5f, note the following: 

P_SERIAL has a 1:M relationship with PRODUCT, because one product has many product serial numbers.



C_SERIAL has a 1:M relationship with COMPONENT, because one component has many component serial numbers.



SERIAL is the composite entity that connects P_SERIAL and C_SERIAL, thus reflecting the fact that one product has many components and a component can be found in many products.

To illustrate the relationships we have just described, let’s take a look at some data in Figure PB.5g:

805

FIGURE PB.5g Sample Data P_SERIAL

PROD_TYPE

C_SERIAL

COMP_TYPE

VENDOR

X0D101 X0C102 200201 200202 200203 200204 300301 300302

P1 P1 P2 P2 P2 P2 P3 P3

C90001 C90002 C90003 C80003 C80002 C80909 C80976 C80908 C80965 C76894 C40097 C45096 C67673 C45679

C1 C1 C1 C2 C2 C2 C2 C2 C3 C3 C4 C4 C4 C4

V1 V2 V3 V1 V1 V2 V3 V2 V2 V2 V2 V2 V3 V3

P_SERIAL

C_SERIAL

X0D101 X0C101 X0C102 X0C102 200201 200201 200201 …….. etc.

C90001 C80976 C90002 C80002 C90002 C76894 C45678 ….. etc.

The new ER diagram will enable us to identify the product by a unique serial number, and each of the product’s components will have a unique serial number, too. Therefore, the new ER diagram will look like Figure PB.5h.

806

FIGURE PB.5h The Revised (Final) Crow’s Foot ERD

As you examine Figure PB.5h’s ERD, note that the COMP_VEND composite entity seems redundant, because the CSERIAL entity already depicts the many-to-many relationship between VENDOR and COMPONENT. However, COMP_VEND represents a more general relationship that enables us to determine who the likely providers of the general component are (What vendors supply component C1?), rather than letting us determine a specific component’s vendor (Which vendor supplied the component C1 with a serial number C90003?). The designer must confer with the end user to decide whether such a general relationship is necessary or if it can be removed from the database without affecting its semantic contents. 945. Create an ER diagram for a hardware store. Make sure you cover (at least) store transactions, inventory, and personnel. Base your ER diagram on an appropriate set of business rules that you develop. (Note: It would be useful to visit a hardware store and conduct interviews to discover the type and extent of the store’s operations.) Answer: Since the problem does not specify a set of business rules, we will create some that will enable us to develop an initial ER diagram.

807

NOTE Please take into consideration that, depending on the assumptions made and on the selection of business rules, students are likely to create quite different solutions to this problem. You may find it quite useful to study each student solution and to incorporate the most interesting parts of each solution into a common ER diagram. We know that this is not an easy job, but your students will benefit because you will thus enable them to develop very important analytical skills. You should stress that:  

A problem may be examined from many different angles. Similar organizations, using different business rules, will generate design problems that may be solved through the use of quite different solutions.

To get the class discussion started, we will assume these business rules: 1. A product is provided by many suppliers, and a supplier can provide several products. 2. An employee has many dependents, but a dependent can be claimed by only one employee. 3. An employee can write many invoices, but each invoice is written by only one employee. 4. Each invoice belongs to only one customer, and each customer owns many invoices. 5. A customer makes several payments, and each payment belongs to only one customer. 6. Each payment may be applied partially or totally to one or more invoices, and each invoice can be paid off in one or more payments. Using these business rules, we may generate the ERD shown in Figure PB.6A.

808

FIGURE PB.6A The Crow’s Foot ERD for Problem 6 (The Hardware Store)

The ERD shown in Figure PB.6A requires less tweaking than the previous ERDs to get it ready for implementation. For example, given the presence of the INV_LINE entity, the customer can buy more than one product per invoice. Similarly, the ORD_LINE entity makes it possible for more than one product to be ordered per order. However, as you examine the PAYMENT entity in Figure PB.6A, note that the current PK definition limits the payments for a given customer and invoice number to one per day. (Two payments by the same customer for the same invoice number on the same date would violate the

809

entity integrity rules, because the two composite PK values would be identical in that scenario.) Therefore, the design shown in Figure PB.6A still requires additional work to be completed during the verification process. 946. Use the following brief description of operations as the source for the next database design. All aircraft owned by ROBCOR require periodic maintenance. When maintenance is required, a maintenance log form is used to enter the aircraft’s identification number, the general nature of the maintenance, and the maintenance starting date. A sample maintenance log form is shown in Figure PB.7A.

FIGURE PB.7A The Maintenance Log Form

Answer: Note that the maintenance log form contains a space used to enter the aircraft release date and a signature space for the supervising mechanic who releases the aircraft into service. Each maintenance log form is numbered sequentially. Note: A supervising mechanic is one who holds a special Federal Aviation Administration (FAA) Inspection Authorization (IA).

810

Three of ROBCOR’s ten mechanics hold such an IA. Once the maintenance log form is initiated, the maintenance log form’s number is written on a maintenance specification sheet, also known as a maintenance line form. When completed, the specification sheet contains the details of each maintenance action, the time required to complete the maintenance, parts (if any) used in the maintenance action, and the identification of the mechanic who performed the maintenance action. The maintenance specification sheet is the billing source (time and parts for each of the maintenance actions), and it is one of the sources through which parts use may be audited. A sample maintenance specification sheet (line form) is shown in Figure PB.7B.

FIGURE PB.7B The Maintenance Line Form

Parts used in any maintenance action must be signed out by the mechanic who used them, thus allowing ROBCOR to track its parts inventory. Each sign-out form contains a listing of all parts associated with a given maintenance log entry. Therefore, a parts sign-out form contains the maintenance log number against which the parts are charged. In addition, the parts sign-out procedure is used to update the ROBCOR parts inventory. A sample parts sign-out form is shown in Figure PB.7C. Mechanics are highly specialized ROBCOR employees, and their qualifications are quite different from those of an accountant or a secretary, for example.

811

Given this brief description of operations and using the Chen ER methodology, draw the fully labeled ER diagram. Make sure you include all appropriate relationships, connectivities, and cardinalities.

FIGURE PB.7C The Parts Sign-Out Form

Before drawing the ER diagram, note the following relationships: 

Not all employees are mechanics, but all mechanics are employees. Therefore, the MECHANIC entity is optional to EMPLOYEE. The EMPLOYEE is the supertype to MECHANIC.



All mechanics must sign off work on the MAINTENANCE they performed and they must sign out for the PART(s) used.



Only some mechanics (the IAs) may sign off the LOG. Therefore, LOG is optional to MECHANIC.



Because not all MAINTENANCE entries are associated with a PART—some maintenance doesn’t require parts—PART is optional to MAINTENANCE.

These relationships are all reflected in the ER diagrams shown in Figure PB.7.

812

FIGURE PB.7 The Initial Crow’s Foot ERD for Problem 7 (ROBCOR Aircraft Service)

As you discuss the ERD shown in Figure PB.7, note its similarity to the car dealership’s maintenance section of the ERD presented in Figure PB.3a. However, the ROBCOR Aircraft

813

Service ERD has been developed at a much higher detail level, thus requiring fewer modifications during the verification process. Figure PB.7 shows that: 

Each LOG entity occurrence will yield one or more maintenance procedures.



Each of the individual maintenance procedures will be listed in the LOG_LINE entity.



A mechanic must sign off on each of the LOG_LINE entity occurrences.



The possible parts use in each LOG_LINE entity occurrence is now traceable.



A part can be accounted for from the moment it is signed out by the mechanic to the point at which it is installed during the maintenance procedure.

The “references” relationship between LOG and PART is subject to discussion. After all, you can always trace each part’s use to the LOG through the LOG_LINE entity. Therefore, the relationship is redundant. Such redundancies are—or should be—picked up during the verification process. We have shown the MECHANIC to be a subtype of the EMPLOYEE supertype. Whether the supertype/subtype relationship makes sense depends on the type and extent of the attributes that are to be associated with the MECHANIC entity. There may be externally imposed requirements—often imposed through the government’s regulatory process—that can best be met through a supertype/subtype relationship. However, in the absence of such externally imposed requirements, it is usually better to use an attribute in EMPLOYEE—such as the employee’s primary job code—and link the employees to their various qualifications through a composite entity. The applications software will then be used to enforce the requirement that the person doing maintenance work is, in fact, a mechanic. 947. You have just been employed by the ROBCOR Trucking Company to develop a database. To gain a sense of the database’s intended functions, you spent some time talking to ROBCOR’s employees and you examined some of the forms used to track driver assignments and truck maintenance. Your notes include the following observations: 

Some drivers are qualified to drive more than one type of truck operated by ROBCOR. A driver may, therefore, be assigned to drive more than one truck type during some period of time. ROBCOR operates several trucks of a given type. For example, ROBCOR operates two panel trucks, four half-ton pick-up trucks, two single-axle dump trucks, one double-axle truck, and one 16-wheel truck. A driver with a chauffeur’s license is qualified to drive only a panel truck and a half-ton pick-up truck and, thus, may be assigned to drive any one of six trucks. A driver with a commercial license with an appropriate heavy equipment endorsement may be assigned to drive any of the 10 trucks in the ROBCOR fleet. Each time a driver is assigned to drive a truck, an entry is made in a log containing the employee number, the truck identification, and the sign-out (departure) date. Upon the driver’s return, the log is updated to include the sign-in (return) date and the number of driver duty hours.



If trucks require maintenance, a maintenance log is filled out. The maintenance log includes the date the truck was received by the maintenance crew. The truck cannot be released for service until the maintenance log release date has been entered and the log has been signed off by an inspector.



All inspectors are qualified mechanics, but not all mechanics are qualified inspectors.

814



Once the maintenance log entry has been made, the maintenance log number is transferred to a service log in which all service log transactions are entered. A single maintenance log entry can give rise to multiple service log entries. For example, a truck might need an oil change as well as a fuel injector replacement, a brake adjustment, and a fender repair.



Each service log entry is signed off by the mechanic who performed the work. To track the maintenance costs for each truck, the service log entries include the parts used and the time spent to install the part or to perform the service. (Not all service transactions involve parts. For example, adjusting a throttle linkage does not require the use of a part.)



All employees are automatically covered by a standard health insurance policy. However, ROBCOR’s benefits include optional copaid term life insurance and disability insurance. Employees may select both options, one option, or no options.

Given those brief notes, create the ER diagram. Make sure you include all appropriate entities and relationships and define all connectivities and cardinalities. Answer: The ERD in Figure PB.8 contains a maintenance portion that has become our standard, given that it enables the end user to track all activities and parts for all vehicles. In fact, given its ability to support high accountability standards, we first developed the “basics” of this design for aviation maintenance tracking.

815

FIGURE PB.8 The Initial Crow’s Foot ERD for the ROBCOR Trucking Service

816

As you examine the ERD in Figure PB.8, note that the driver assignment to drive trucks is a M:N relationship: Given the passage of time, a driver can be assigned to drive a truck many times and a truck can be assigned to a driver many times. We have implemented this relationship through the use of a composite entity named ASSIGN. The M:N relationship between EMPLOYEE and BENEFIT—that is, the insurance package mentioned in Problem 8’s last bullet—has been implemented through the composite entity named EMP_BEN. (An employee can select many benefit packages and each insurance package may be selected by many employees.) The reason for the optionality is based on the fact that not all of the insurance packages are necessarily selected by the employee. For example, using the BENEFIT table contents shown in Table PB.8A, an employee may decide to select option 2 or options 2 and 3, or neither option. (The standard health insurance package is assigned automatically.)

Table PB.8A Table name: BENEFIT BEN_CODE

BEN_DESCRIPTION

BEN_CHARGE

Standard health

$0.00

Co-paid term life insurance, $100,000

$35.00

Co-paid disability insurance

$42.50

Incidentally, we have used a BENEFIT entity, rather than an INSURANCE entity, to anticipate the likelihood that benefits may include items other than insurance. For example, employees might be given a benefit such as an investment plan, a flextime option, and child care. The decomposition of M:N relationships continues to be a good subject for discussion. For example, we have shown many of the decompositions as composite entities. However, while such an approach is perfectly acceptable at the initial design stage, caution your students that composite PKs cannot be referenced easily by subsequent additions of entities that must reference those PKs. Therefore, we would note that the composite PK used in the LOG_ACTION entity—EMP_ID + LOG_NUM + LOGACT_TYPE—should be replaced by an “artificial” singleattribute PK named LOGACT_NUM. The EMP_ID and LOG_NUM attributes would continue to be used as FKs to the MECHANIC and LOG entities. (Naturally, the EMP_ID and LOG_NUM attributes should be indexed to avoid duplication of records and to speed up queries.) A few sample entries are shown in Table PB.8B.

817

Table PB.8B Table name: LOG_ACTION LOGACT_NUM

LOG_NUM

EMP_ID

LOGACT_TYPE

LOGACT_DATE

1000

5023

409

Open

14-May-2014

1001

5024

409

Open

15-May-2014

1002

5023

411

15-May-2014

1003

5025

378

Open

15-May-2014

1004

5024

411

15-May-2014

1005

5026

409

Open

16-May-2014

Finally, we have used supertype/subtype relationships between EMPLOYEE and DRIVER and MECHANIC. If drivers and mechanics are assumed to have many characteristics (such as special certifications at different levels) that are not common to EMPLOYEE, this approach eliminates nulls. However, keep in mind the discussion about the use of supertypes/subtypes in Problem 2. (The use of the supertype/subtype approach may be dictated by external factors … but the use of supertypes and subtypes must be approached with some caution. For example, if drivers have multiple license types, it would be far better to create a LICENSE entity and relate it to DRIVER through a composite entity, perhaps named DRIVER_LICENSE. The composite entity may then be designed to include the date on which the license was earned and other pertinent facts pertaining to licenses. Such flexibility is not available in a subtype, unless you are willing to tolerate the possible occurrence of nulls as more pertinent data about the (multiple) licenses are kept—if some of the drivers do not have all of those licenses.)

818

Solution and Answer Guide CORONEL AND MORRIS, DATABASE SYSTEMS: DESIGN, IMPLEMENTATION AND MANAGEMENT, ©2023, 9780357673034; APPENDIX C: THE UNIVERSITY L AB: CONCEPTUAL DESIGN VERIFICATION, LOGICAL DESIGN, AND IMPLEMENTATION

TABLE OF CONTENTS Answers to Review Questions .............................................................................................819 Answers to Problems ...........................................................................................................831

ANSWERS TO REVIEW QUESTIONS 948. Why must a conceptual model be verified? What steps are involved in the verification process? Answer: The verification of a conceptual model is crucial to a successful database design. The verification process allows the designer to check the accuracy of the database design by: 

Re-examining data and data transformations.



Enabling the designer to evaluate the design efficiency relative to the end user’s and system’s design goals.

Keep in mind that, to a large extent, the best design is the one that serves the end-user requirements best. For example, a design that works well for a manufacturing firm may not fit the needs of a marketing research firm, and vice versa. The verification process helps the designer to avoid implementation problems later by: 

Validating the model’s entities. (Remember the minimal data rule.)



Confirming entity relationships and eliminating duplicate, unnecessary, or improperly defined relationships.



Eliminating data redundancies.



Improving the model’s semantic precision to better represent real-world operations.



Confirming that all user requirements (processing, performance, or security) are met.

Verification is a continuous activity in any database design. The database design process is evolutionary in nature: It requires the continuous evaluation of the developing model by examining the effect of adding new entities and by confirming that any design changes enhance the model’s accuracy. The verification process requires the following steps:

819

1. Identify the database’s central entity. The central entity is the most important entity in our database, and most of the other entities depend on it. 2. Identify and define each module and its components. The designer divides the database model into smaller sets that reflect the data needs of particular systems modules such as inventory, orders, payroll, and so on. 3. Identify and define each of the module’s processes. Specifically, this step requires the identification and definition of the database transactions that represent the module’s real-world operations. 4. Verify each of the transactions against the database. 949. What steps must be completed before the database design is fully implemented? (Make sure that you list the steps in the correct sequence and discuss each step briefly.) Answer: The DBLC, discussed in detail in Chapter 9, “Database Design,” constitutes a database’s history, tracing it from its conceptual design to its implementation and operation. We highly recommend that the database designer follow the DBLC’s steps carefully in order to ensure that the database will properly meet all user and system requirements. Before a database can be successfully implemented, the following steps must be completed: 1. Define the conceptual model’s components: entities, attributes, domains, and relationships. 2. Normalize the database to ensure that all transitive dependencies are eliminated and that each entity’s attributes are solely dependent on its key attribute(s). 3. Verify the conceptual model to ensure that the proposed database will meet the system’s transaction requirements and that the end-user and systems requirements will be met. The verification process will probably delete and/or create entities, attributes, and relationships. It may also refine existing entities, attributes, and relationships. 4. Create the logical design that requires the definition of the table structures, using a specific DBMS (relational, network, or hierarchical). Logical design also includes, if necessary, appropriate indexes and views. 5. Create the physical design to define access paths, including space allocation, storage group creation, table spaces, and any other physical storage characteristic that is dependent on the hardware and software to be used in the system’s implementation. 6. Implement the design. Somehow, this last step seems to suffer from planning neglect, to the detriment of the system’s operation. Implementation, operation, and maintenance plans must (at least) include careful definition and description of the activities required to implement the database design: 

Loading and conversion



Definition of database standards



System and procedures documentation: security, backup, and recovery

820



Operational procedures to be followed by users



A detailed training plan



Identification of responsibilities for operation and maintenance

950. What major factors should be addressed when database system performance is evaluated? Discuss each factor briefly. Answer: Database systems performance refers to the system’s ability to retrieve information within a reasonable amount of time and at a reasonable cost. Keeping in mind that “reasonable” means different things to different people, we must address at least these important performance factors: 

Concurrent users For any given system, the more users connected to the system, the longer the data retrieval time.



Resource limits The fewer resources that are available to the user, the longer the access queues will be.



Communication speeds Lower communication speeds mean longer response times.



Query response time Queries must be tuned to provide optimum query response time. (See Chapter 11, “Database Performance Tuning.”) Lack of query response tuning means slow response times. Depending on how good the design and the program code are, the query response time can vary from minutes to hours for the same query.

Although the preceding discussion is focused on the speed aspect of performance, there are other equally important issues that must be considered. A successful database implementation requires a balanced approach to all database issues, including concurrency control, query response time, database integrity, security, backup and recovery, data replication, and data distribution. 951. How would you verify the ER diagram shown in Figure QC.4? Make specific recommendations. Answer:

821

Figure QC.4 The ERD for Question 4

The verification process must include the following steps: 1. Identify and define the main entities, attributes, and domains. In this case, the main entities are PARTS, SUPPLIER, PRODUCT, and CUSTOMER. Identify proper primary keys and composite and multivalued attributes. 2. Identify and define the relationships among the entities. By examining the diagram, we may conclude that several M:N relationships exist: PARTS and SUPPLIER PARTS and PRODUCTS PRODUCT and CUSTOMER 3. Identify the composite entities and their primary and foreign keys. Each composite (bridge) entity creates the connection to maintain a 1:M relationship with each of the original entities. 4. Normalize the model. 5. Verify the model, starting with the identification of the central entity. Given the ER diagram’s layout, we conclude that the central entity is PRODUCT. 6. Identify each module and its components. Three modules can be identified: 

Inventory, containing PARTS and SUPPLIER



Production, containing PARTS and PRODUCT

822



Sales, containing PRODUCT and CUSTOMER

7. Identify each module’s processes or transaction requirements. Start by listing known transaction descriptions by module. For brevity’s sake, we will use the inventory module as an example. The inventory module supports the following transactions: 

Add a new product to inventory



Modify an existing product in inventory



Delete a product from inventory



Generate a list of products by product type



Generate a price list with product by product type



Query the product database by product description

Check the database model against these transaction requirements, verify the model’s efficiency and effectiveness, and make the necessary changes. 952. Describe and discuss the ER model’s treatment of the UCL’s inventory/order hierarchy: a. Category b. Class c. Type d. Subtype Answer: The objective here is to focus student attention on the details of the UCL’s approach to inventory management. Note that the UCL’s ER model uses two closely related entities to manage items in inventory: ITEM and INVENTORY_TYPE. These two entities maintain a 1:M relationship: One item belongs to only one inventory type, but an inventory type can contain many items. Inventory types are classified through the use of a hierarchy composed of CATEGORY, CLASS, and TYPC. (We may even identify SUBTYPE for each TYPE!) Basically, the hierarchy may be described this way: A category has many classes, and a class has many types. For example, the category hardware includes the classes computer and printer. The class computer has many types that are defined by their CPU: 486 and Pentium computers. Similarly, the category supplies can have several classes: diskette, paper, and so on. Each class can have many types: 3.5 DD diskette, 3.5 HD diskette, 8.5 × 11 paper, 8.5 × 14 paper, and so on. We may even identify subtypes: Each type can have many subtypes. For example, the class “paper” includes the types “single-sheet” and “continuous-feed”; the single-sheet type may be classified by subtype 8 × 11 inches or 11 × 14 inches. The following table summarizes some of the inventory types identified in the system. Note that the hierarchy may be illustrated as shown in Table QC.5A.

823

Table QC.5A The Classification Hierarchy

CATEGORY

CLASS

TYPE

SUBTYPE

HWPCDTP5

Hardware (HW)

Personal Computer (PC)

Desktop (DT)

Pentium (P5)

HWPCLP48

Hardware (HW)

Personal Computer (PC)

Laptop (LT)

Pentium IV

HWPRLS

Hardware (HW)

Printer (PR)

Laser (LS)

Standard

HWPRDM80 Hardware (HW)

Printer (PR)

Inkjet (IJ)

80-column

SUPPSS11

Supply (SU)

Paper (PP)

Single Sheet (SS)

8.5″ × 11″

HWEXHDID

Hardware (HW)

Expansion Board (EX)

Video (VI)

SWDBXXXX

Software (SW)

Database (DB)

The classification hierarchy may also be illustrated with the help of the tree diagram shown in Figure QC.5:

Figure QC.5 The INV_TYPE Classification Hierarchy as a Tree Diagram

CATEGORY

Hardware

CLASS

Personal Computer (PC)

Printer (PR)

TYPE

Desktop (DT)

Inkjet(IJ)

SUBTYPE

Intel P4 (300)

Intel P5 (600)

Black (BL)

Color (CO)

825

953. Modern businesses tend to provide continuous training to keep their employees productive in a fast-changing and competitive world. In addition, government regulations often require certain types of training and periodic retraining. (For example, pilots must take semiannual courses involving weather, air regulations, and so on.) To make sure that an organization can track all training received by each of its employees, trace the development of the ERD segment in Figure QC.6 from the initial business rule that states: Answer: An employee can take many courses, and each course can be taken by many employees. Once you have traced the development of the ERD segment, verify it and then provide sample data for each of the three tables to illustrate how the design would be implemented.

Figure QC.6 The ERD for Question 6

Follow the verification steps described in the answer to Question 4. Note that the composite TRAINING entity shown in Figure QC.6 reflects part of the verification process that began with the M:N relationship between EMPLOYEE and COURSE. (An employee can take many courses and many employees can take each course.) Part of the verification process involves the elimination of multivalued attributes. For example, an EMPLOYEE table that contains an attribute EMP_TRAINING containing strings such as “fire safety, weather, air regulations” have already been eliminated by the composite TRAINING entity. The structure shown in Figure QC.6 allows us to add attributes to ensure that training details—such as dates, grades, training locations, and so on—can be traced, too. One additional—and very important—point is worth mentioning: at this point, Figure QC.6’s ERD cannot handle recurrent training requirements. That is, if some courses must be retaken periodically, as is common in many transportation businesses, the TRAINING entity’s PK—at this point composed of the EMP_NUM + COURSE_CODE—will not yield a unique value if the course is retaken from time to time. The solution to this problem can be found in either one of two ways: 1. Add the training date to the TRAINING entity’s composite PK to become EMP_NUM + COURSE_CODE + TRAIN_DATE. This approach is illustrated in the examples shown in Tables QC.6A through QC.6C. Note that employee 105 took the FAR-135-P course on 26-Sep-2013 and on 11-Feb-2014. Employee 101 took the WEA-01 course on 26-Sep-2013 and on 26-Mar-2014. Note that the addition of the TRAIN_DATE to the composite PK prevents the duplication of training records. For example, if you tried to enter the first TRAINING record twice, the combination of EMP_NUM+COURSE_CODE+TRAIN_DATE would not be unique and the DBMS would diagnose an entity integrity violation.

826

Table QC.6A The EMPLOYEE Table Contents EMP_NUM

EMP_LNAME

105

Ortega

101

Williams

Table QC.6B The TRAINING Table Contents EMP_NUM

COURSE_CODE

TRAIN_DATE

TRAIN_GRADE

105

FAR-135-P

26-Sep-2013

105

HM-01

18-Dec-2013

101

FAR-135-P

23-Nov-2013

105

WEA-01

10-Mar-2014

101

HM-01

15-Sep-2013

101

WEA-01

26-Sep-2013

105

FAR-135-P

11-Feb-2014

101

WEA-01

26-Mar-2014

Table QC.6C The COURSE Table Contents COURSE_CODE

COURSE_DESCRIPTION

FAR-135-P

Aircraft charter regulations for pilots

FAR-135-M

Aircraft maintenance for charter operations

HM-01

Hazardous materials handling

WEA-01

Aviation weather – basic operations

WEA-02

Aviation weather – instrument operations

827

2. Create a new PK attribute named TRAIN_NUM to uniquely identify each entity occurrence in the TRAINING entity, and then create a composite index composed of EMP_NUM + COURSE_CODE + TRAIN_DATE. This action will remove the weak/composite designation from the TRAINING, because the TRAINING entity’s PK is no longer composed of the PK attributes of the EMPLOYEE and COURSE entities. (And the “receives” and “is used in” relationships will no longer be classified as “identifying”—thus changing the relationship descriptions from “identifying” or “strong” to “non-identifying” or weak”). The composite index will prevent the duplication of records. Note the change in the structure and contents of the TRAINING table shown in Table QC.6D.

Table QC.6D The Modified TRAINING Table Structure and Contents TRAIN_NUM

EMP_NUM

COURSE_CODE

TRAIN_DATE

TRAIN_GRADE

1203

105

FAR-135-P

26-Sep-2013

1204

105

HM-01

18-Dec-2013

1205

101

FAR-135-P

23-Nov-2013

1206

105

WEA-01

10-Mar-2014

1207

101

HM-01

15-Sep-2014

1208

101

WEA-01

26-Sep-2013

1209

105

FAR-135-P

11-Feb-2014

1210

101

WEA-01

26-Mar-2014

We would recommend the second approach. Generally speaking, single-attribute PKs are preferred over composite PKs. Single-attribute PKs are more easily handled if the table is to be linked to a related table later. (The linking is done through a FK—which is the PK in the “parent” table. But if the parent table uses a composite PK, how can you then create the appropriate FK?) In any case, the declaration of a composite PK automatically generates a matching composite index, so you would not decrease the index library if you used approach 1. 954. You read in this appendix that an examination of the UCL’s Inventory Management module reporting requirements uncovered the following problems: Answer: 

The Inventory module generates three reports, one of which is an inventory movement report. But the inventory movements are spread across two different entities (CHECK_OUT and WITHDRAW). That spread makes it difficult to generate the output and reduces the system’s performance.



An item’s quantity on hand is updated with an inventory movement that can represent a purchase, a withdrawal, a check-out, a check-in, or an inventory adjustment. Yet only the withdrawals and check-outs are represented in the system.

828

What solution was proposed for that set of problems? How would such a solution be useful in other types of inventory environments? The proposed solution was to create a common entry point for all inventory movements. This common entry point is represented by a new entity named INV_TRANS. The INV_TRANS entity is used to record an entry for each inventory transaction. In other words, the system keeps track of all inputs to and withdrawals from inventory by using this INV_TRANS entity. It is important to realize that the INV_TRANS entity is a crucial entity in the system, because it reflects all item transactions. Such a solution is not unique to the UCL’s inventory system: Most inventory systems must be able to keep track of such transactions. Having a central point of reference facilitates the processing, updating, querying, and reporting capabilities of the inventory system. The UCL’s data model keeps track of several types of inventory transaction purposes or motives: checkouts, withdrawals, adjustments, and purchases. Note the system’s flexibility: The user is able to classify all inventory transactions by type and/or motive. In addition to being flexible, the UCL system is easily expandable: If necessary, the system can support additional types of inventory transaction motives. For example, the system may be expanded to include inter-warehouse inventory transfers, items retired from inventory because they are date-limited, and so on. (Date-limited inventory is typical for such things as pharmaceuticals, food, and so on.) Given its flexibility and expandability, we may conclude that the UCL system’s inventory data model represents a very viable solution to modeling real-world inventory transactions. Therefore, it may be used to fit into just about any inventory environment.

NOTE Optimum vs. Implemented Solutions The final UCL ERD makes use of the INV_TRANS entity to replace the WITHDRAW entity. Perhaps some of your students wonder about the similarity of the CHECK_OUT and CO_ITEM entities when compared to the INV_TRANS and INTR_ITEM entities. For instance, it is quite appropriate to argue that CHECK_OUT is a type of inventory transaction and that, therefore, CHECK_OUT is a subtype of an INV_TRANS supertype. Why did the designer create such apparent system redundancy? Why wasn’t the type/subtype hierarchy used more efficiently? (Classification hierarchies and supertypes/subtypes are covered in Chapter 5, “Advanced Data Modeling.”) To answer this question, return to the discussion about fine-tuning the database for performance, integrity, and security. Based on the estimation of the number of transactions, the number of items, and the number of the possible concurrent accesses to the INV_TRANS entity, it was clear that this entity will be one of the most active in the system. The large number of check-outs reports and the even larger expected number of inventory transactions prompted both the designer and the end user to choose either controlled redundancy or having a performance bottleneck. Perhaps some students will argue that the use of the CHECK_OUT and CO_ITEM entities represents a major burden to the system and that, therefore, the system should be

829

implemented without these entities. This argument clearly has some merit: The only immediate advantage of having the CHECK_OUT and CO_ITEM entities is that the inventory check-outs report uses these entities, rather than the INV_TRANS and INTR_ITEM entities. Therefore, the elimination of CHECK_OUT and CO_ITEM reduces the concurrent access conflicts for the INV_TRANS and INTR_ITEM entities. Finally, we note that both the designer and the end user are aware of the consequences of the selected solution. Remember, this is a real solution to a real problem, and it helps to illustrate the point that we made earlier: The best solution is not always the one that is implemented. Each system is subject to constraints, and the designer must inform the end user of the consequences of the data modeling design selections. An important note in primary key selection for multiuser systems: The LOG is an entity that keeps a record of all the students that use the UCL. Note that the primary key is formed by LOG_DATE, LOG_TIME, and USER_ID. Ask the students why USER_ID has been made a part of the primary key. Since each user can be in only one place at one time, it seems safe to assume that USER_ID does not need to be part of the primary key. So why not just use a primary key composed of LOG_DATE and LOG_TIME? For example, suppose that the student Christobal Colombus enters the UCL on 02-Mar-2014 at 02:10:11 pm. To use the UCL’s facilities and services, Mr. Columbus must give his student identification card to the lab assistant. Clearly, Mr. Colombus can only be at that one location at that time. When the lab assistant enters Mr. Columbus’s USER_ID, that entry is made at a specific and unique date and time. When the lab assistant registers the next student, that student’s USER_ID is entered at a different time in the computer’s clock. Since every USER_ID entry is made at a different time, there seems to be no need for the USER_ID to be part of the primary key. However, this scenario is correct only in a single-user, stand-alone system. Remember that the UCL system runs in a LAN and that the ACCESS module is accessed by two lab assistants through two different terminals. Therefore, it is possible that, at a given time, both data entries are made at the same computer clock time. When the data are to be saved to the database, one of the two entries will be executed first; and, to preserve entity integrity, the second entry will be aborted because the date and time already exist in the database. Since it is possible to have two users register in the LOG during the same day and at the same time, only their USER_IDs will be different. Therefore, to ensure uniqueness of the primary keys, the inclusion of USER_ID as part of the primary key is quite appropriate. Of course, you might use the LOG_READER, instead of the USER_ID, to define the primary key. After all, the same LOG_READER cannot be swiped twice at the precise same time. In either case, the uniqueness of the entry is preserved, thus preserving entity integrity. Which attribute (USER_ID or LOG_READER) is used as a part of the primary key is the designer’s decision. The only requirement is that entity integrity is maintained.

830

ANSWERS TO PROBLEMS Verify the conceptual model you created in Appendix B, Problem 3. Create a data dictionary for the verified model. Answer: The verification of the car dealership’s database design conforms to the verification process described in Appendix C. (We have also illustrated the verification process in this appendix’s review question 4.) Since the verification process has already been explored in depth in several places, we will focus on the ERDs that were modified during the course of the verification process. Use the data dictionary format shown in Chapter 8, “Advanced SQL”, Table 8.2 as your data dictionary template. The basic verified database design is shown in Figure PC.1. As you discuss Figure PC.1, note that the verification process substantially modified the service component of the initial ERD. (See the discussion that accompanies Figure PD.3a in this manual’s Appendix D, problem 3.) These changes reflect the increasingly important accountability requirements. As you examine the ERD in Figure PC.1, focus on the following features: 

SALESPERSON and MECHANIC are subtypes of the supertype EMPLOYEE. This feature is based on the likelihood that the subtypes contain data that are unique to those subtypes. For example, a salesperson is likely to have at least part of his/her pay determined by sales commissions. Similarly, mechanics are likely to have special certification and training requirements that “general” employees are not likely to have. The use of these subtypes eliminates nulls in the EMPLOYEE table, thus making them desirable in this case.



Although some employee job-related data are stored in their subtypes—see, for example, our discussion of the SALESPERSON and MECHANIC subsets—we still need to know what the employee job assignments are. Although we have not included pay and benefit options in this design, both options are likely to be job related. Some jobs are paid on an hourly basis, some on a weekly basis, and some jobs are salaried. Base pay schedules are usually determined by job qualifications. Therefore, the JOB entity stores a JOB_PERIOD attribute (hour, week, or year) and a JOB_PAY attribute. If the JOB_PERIOD = “hour”, the JOB_PAY = $18.90 is clearly an hourly rate. If the JOB_PERIOD = “year”, a JOB_PAY = $45,275 is clearly a yearly salary. In larger companies, job assignments are useful in tracking the distribution of job “densities” to see if some job classification distributions are appropriate to meet the business objectives. (Do we have too many employees who are classified as “support” personnel? Too many accountants?) Also, note that the relationship between JOB and EMPLOYEE reflects the business rules that each employee has only one job assignment at a time. Naturally, any given job can be held by many employees. For example, many employees may be mechanics, support personnel, accountants, and so on. Additional discussion points follow Figure PC.1.

831

Figure PC.1 The Verified Car Dealership Crow’s Foot ERD

832

Continued discussion of Figure PC.1’s ERD: 

To track all maintenance procedures and parts precisely, only qualified mechanics may open and close service logs, check-out parts, and sign-off service work. Note that the PART_LOG tracks all parts that have been logged out. The relationship between SERVICE_LOG and PART_LOG lets us trace all checked-out parts to a specific service log entry. The use of the PART_CODE in both the SVC_LOG_LINE and the PART_LOG entities makes it possible to write a query to let us check whether or not a logged out part was actually used. All the maintenance actions can be tracked at this point. We know who opened and closed the service log through the SCV_LOG_ACTION. We know which mechanic performed each maintenance procedure (in the SVC_LOG_LINE), and we know which mechanic checked out which part(s)—in PART_LOG.



If a car was sold by the dealer, that fact is recorded in an invoice. However, the CAR entity may be expanded to include a “bought here” Y/N attribute, in addition to mileage and other pertinent data. Also, cars owned by the dealership may simply show the dealer as the “customer.” (Naturally, you can add a DLR_CAR entity if the dealer car attributes and data tracking requirements are different from the customer CAR data.)



Before any car is sold to a customer, that car must be inspected and, if necessary, repaired. Therefore, even a new car will show up in the SERVICE_LOG. Therefore, the SVC_LOG_NUM will never be null in the INVOICE … even if the invoice records the sale of a car, rather than a specific service charge.



The PAYMENT entity has a rather limited set of options at this point. However, it does enable the manager to track multiple payments on a given invoice and to keep track of specific invoice balances. Further verification procedures would (most likely!) add functionality to the PAYMENT entity. For example, you might change the PAYMENT entity to an account transaction entity, perhaps named ACCT_TRANSACTION. This change would reflect the need to identify the transaction type—debit or credit—and, if a payment is made, the payment mode—cash, credit card, check.



The employee qualifications can now be tracked without limit. If an employee gains an additional qualification, all that is needed is an entry in the EDUCATION table.



The customer car data are stored in CAR, so we can keep the service records on all the customer cars, thus producing the required car histories. (To save space, we have not included all the appropriate attributes—but you can add such attributes as CAR_BOUGHT (Y/N) to indicate whether or not the car was bought at this dealership, CAR_LAST_MILES to indicate the mileage recorded during the most recent service, and so on.)



At this point, we assume that the SERVICE_LOG and SVC_LOG_LINE records yield the information required to bill the customer.

To set the stage for further discussion of Figure PC.1’s ERD, a few sample data entries in the added SERVICE_LOG, LOG_ACTION, MECHANIC, LOG_LINE, PART, and PART_LOG tables are useful. (The PART_CODE entry “000000” is a dummy PART entry that signifies “no part used.”) Also note that the order of the attributes is immaterial. (In other words, whether CAR_VIN is shown in the last column of the SERVICE_LOG table or as the first, second, or third column has no bearing on the discussion or on the results obtained from the use of the table.)

833

Sample SERVICE_LOG Data SVC_LOG_NUM

LOG_COMPLAINT

SVC_LOG_CHARGE

CAR_VIN

10012

Hard to start. Accelerates poorly.

$89.75

2AA-W-123456

10013

Oil change. Rotate and balance tires.

$19.95

5DR-T-8765432

10014

Temp gauge shows high temps.

$135.70

4UY-D-6543210

Sample SVC_LOG_ACTION Data SVC_LOG_NUM

SVC_LOGACT_TYPE

SVC_LOGACT_DATE

EMP_NUM

10012

Open

03-Mar-2014

104

10013

Open

03-Mar-2014

112

10012

04-Mar-2014

112

10014

Open

04-Mar-2014

104

10013

04-Mar-2014

104

Sample SVC_LOG_LINE Data (Several attributes left out to save space) SVC_LINE_NUM

SVC_LOG_NUM

SVC_LINE_WORK

EMP_NUM

10012

Cleaned injection nozzles

106

000000

10013

Drained oil

112

000000

10013

Installed filter

112

FLTR-0156

10013

Replaced oil

112

Oil-PZ30/40

10013

Rotated tires

114

000000

10013

Balanced tires, using four weights (LF0.5oz, RF1.1oz, RR1.2oz, LR0.8 oz)

106

WT-LD10012

10014

Drained coolant

104

000000

10014

Replaced thermostat

112

THERM007B

10014

Replaced coolant

104

COOL-289XZ

PART_CODE

834

Sample PART_LOG Data PARTLOG_N UM

EMP_NU M

PART_CO DE

SVC_LOG_N UM

PARTLOG_DA TE

PARTLOG_UNI TS

10185

112

FLTR-0156

10013

03-Mar-2014

10186

112

Oil-PZ30/40

10013

03-Mar-2014

10187

114

WTLD10012

10013

03-Mar-2014

10188

112

THERM007B

10014

04-Mar-2014

10189

114

COOL289XZ

10014

04-Mar-2014

The main processes that can be identified in this system include: 

The generation of an invoice (INSERT).



The car sales generation and reports (SELECT).



The registration of a service for a customer’s car (INSERT, UPDATE).



The registration of the work log or of the employees (mechanics) who worked on a car (INSERT, UPDATE).



The registration of parts inventory (INSERT, UPDATE).



The registration of parts used in a service (INSERT, UPDATE).



The registration of the car history (INSERT, UPDATE).



Queries and reports such as:  Parts List  Car Price List  Sales Reports  Service Report  Car History Report  Parts Used Report  Work Log Report

The designer must check that the database model supports all these processes and that the model is flexible enough to support future modifications.

835

If problems are encountered during the model’s verification against the required database transactions that are designed to support the identified processes, the designer must make the necessary changes to the data model. These changes are reflected in Figure PC.1.

NOTE The verification process for Problems 2–5 conforms to the process discussed at length in Problem 1. Therefore, we will only show the verified ERDs. The data dictionary format example shown in Problem 1 can also be used as the template in Problems 2–5. Therefore, we do not show additional data dictionaries for Problems 2–5. The ERDs supply the necessary entities, the attribute names, and the relationships. However, it will be very useful to compare the ERDs in the following problems to the original ERDs—in the previous chapter—from which they were derived. 955. Verify the conceptual model you created in Appendix B, Problem 4. Create a data dictionary for the verified model. Answer: Compare the ERD shown in Figure PC.2A to the ERD shown in Figure PB.4A (see Appendix B) to see the impact of the verification process. Use the data dictionary format shown in Chapter 8, Table 8.2 as your data dictionary template.

836

Figure PC.2A The Crow’s Foot Verified Conceptual Model for the Video Rental Store

As you discuss the ERD components in Figure PC.2A, note particularly the following points: 

Remind your students that relationships are read from the parent to the related entity. Therefore, ORDER contains ORD_LINE. (The natural tendency to read from top to bottom or from left to right is not the governing factor in an ERD!)



We can now track individual copies of each movie. If there are 12 copies of a given movie, each copy can be rented out separately.



ORD_LINE and RENT_LINE are composite entities. So why is COPY not a composite entity? Here is an excellent example of why single-attribute PKs are a requirement

837

when the entity is referenced by another entity. In this case, the COPY entity’s PK is referenced by the RENT_LINE. Therefore, COPY must have a single-attribute PK. (Note that the PK of the COPY entity is the single-attribute COPY_CODE, rather than the combination of MOVIE_CODE and COPY_CODE.) 

It is reasonable to assume that each order goes to a particular vendor. Therefore, the VEND_CODE is the FK in ORDER, rather than in ORD_LINE. However, if the order goes to a clearing house and you still want to keep track of the individual vendors that supplied the movies to the clearing house, VEND_CODE will be the FK in ORD_LINE.

NOTE The design shown in Figure PC.2A is implemented in a small sample database named RC_Video.mdb. This database, stored in MS Access format, is located on the Instructor’s CD. If you want your student to write the applications for this segment of the database, you will find that the appropriate tables are available. Because our discussion focus is on the database’s rental transaction segment, the database does not contain all of the tables that are shown in Figure PC.2A’s ERD. We have also added a number of attributes—especially in the RENTAL table—to make it easier to see how the actual applications might be developed. The partial implementation of the ERD shown in Figure PC.2A is reflected in the RC_Video database’s relational diagram segment depicted in Figure PC.2B.

Figure PC.2B The Relational Diagram for the RC_Video Database

Once the database design has been implemented, you can easily use MS Access to illustrate a variety of implementation issues. For example, in a real-world application the RENTLINE table’s RENTLINE_DATE_OUT can simply be generated by specifying the default date to be the current date, Date(). The RENTLINE_DATE_DUE would then be Date()+2, assuming that the checkedout videos are due two days later. (Or substitute whatever criteria you want to use in the queries.) 956. Verify the conceptual model you created in Appendix B, Problem 5. Create a data dictionary for the verified model. Answer: Compare the ERD shown in Figure PC.3 to the ERD shown in Figure PB.5a. Note that the original ERD survived the verification process intact. In this case, the verification process merely confirmed that the model met all the database requirements. Use the data dictionary format shown in Chapter 8, Table 8.2 as your data dictionary template.

Figure PC.3 The Revised (Final) Crow’s Foot ERD for the Manufacturer

838

957. Verify the conceptual model you created in Appendix B, Problem 6. Create a data dictionary for the verified model. Answer: Compare the ERD shown in Figure PC.4A to the ERD shown in Figure PD.6A (in Appendix D) to see the impact of the verification process. Use the data dictionary format shown in Chapter 7, Table 7.3 as your data dictionary template.

839

Figure PC.4A The Crow’s Foot ERD for Problem 4 (The Hardware Store)

840

NOTE The following screen images are based on the database named RC_Hardware. This database is stored in MS Access format on your instructor’s CD. If you want your students to write the applications for this segment of the database, you will find that the appropriate tables are available. (We have added attributes in various tables to enhance their information content.) However, because our discussion focus is on the database’s sales transaction segment, the database does not contain the DEPENDENT table, nor does it contain the VENDOR, ORDER, and ORD_LINE tables that are shown as entities in Figure PC.4A. As you discuss the ERD shown in Figure PC.4B, note that the transactions are tracked in the ACCT_TRANSACTION table. The sample table contents are captured in the screen shown in Figure PC.4B. You can easily demonstrate with a set of queries applied to this database that the inclusion of this table structure in the database design yields very desirable results.

Figure PC.4B The RC_Hardware ACCT_TRANSACTION Table Contents

As you discuss the sample table contents in Figure PC.4B with your students, note that the invoice balances are stored in this table, rather than in the INVOICE table. The reason for this arrangement is simple: the end user must be able to track the remaining balances after each transaction. If the balances for each of the invoices were kept in the INVOICE table, you would be limited to seeing only the most recent balance. Better yet, you can now track all payment transactions by customer or by invoice. For example, a simple query can be written to show the CUSTOMER, INVOICE, and ACCT_TRANSACTION results grouped by customer. Therefore, you can track the entire payment history for each customer. (See Figure PC.4C—note that the query name is shown in the header.)

Figure PC.4C The RC_Hardware Transaction Query Results

Naturally, the CUST_BALANCE value in the CUSTOMER table and the remaining TRANS_INV_BALANCE value in the ACCT_TRANSACTION table must be updated according to © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

841

the TRANS_AMOUNT value entered by the end user in the ACCT_TRANSACTION table. The applications software must be written to automatically make such updates. For example, if you Microsoft Access, you can use macros or you can use VB to accomplish this task. As you examine the query output in Figure PC.4C, note that you can easily trace the transactions for each of the customers. For example, 

Customer 10012 (Smith) made a purchase (invoice #1, transaction #1) on February 3, 2014. The transaction amount was $239.21, but customer 10012 made only a partial payment of $100.00, thus leaving a balance of $139.21 on invoice #1.



Customer 10012 made another purchase (transaction #6, $27.98) on February 15, 2014. This time, customer 10012 paid the entire invoice amount. (Note that the remaining balance for invoice #20 is $0.00 for that transaction.)



Customer 10012 made a $50.00 payment on account—see transaction #7 on February 15, 2014, leaving the balance at $139.21 − 50 = $89.21 for the original invoice #1. (Note that the remaining balance for invoice #1 after transaction #1 was $139.21 on 3-Feb2014.) The original invoice amount of $239.21 was retrieved from the INVOICE table used in this query and this value is not—and must not be—updated. (However, the applications software must update the CUSTOMER table’s customer balance to show the total of all outstanding balances for that customer.)



Customer 10012 made a $20.00 payment on account (see transaction #10) on February 17, 2014, leaving the balance for invoice #1 at $89.21 − 20 = $69.21. Again, the original invoice amount of $239.21 was retrieved from the INVOICE table used in this query and this value is not—and must not be—updated.



Customer 10020 (Rieber) received a $10 refund (see transaction #9 on February 17, 2014). This transaction was applied to the outstanding balance of $92.19 (see transaction #8) for invoice #12, thus reducing the remaining balance for invoice #12 from $92.19 to $82.19.

If the customer comes in to make a payment on account, the system’s end user must be able to query the INVOICE table to find the invoices with outstanding invoice balances. The customer then makes a payment to a specific invoice in the ACCT_TRANSACTION table and the applications software will update both the remaining balance in the ACCT_TRANSACTION table and the customer balance in the CUSTOMER table. If you want to know the entire payment history for each invoice, you can write the query to group the results by invoice number. Figure PC.4D shows the results. It is—again—worth noting that such capability is provided at the database design level.

842

Figure PC.4D The RC_Hardware Invoice Payment History

As you discuss Figure PC.4D, note that all the payment transactions for each invoice are easily traced. For example: 

The total charge placed on invoice #1 is $239.21. The initial payment on February 3, 2014, was $100.00, leaving a balance of $139.21.



The next payment on invoice 1 was made on February 15. This payment of $50.00 leaves a balance of $89.20. Note that the invoice amount, $239.21, is stored in the INVOICE table and this amount must not be changed.



The next payment on invoice 1 was made on February 17. This payment of $20.00 leaves a balance of $69.20. Again note that the invoice amount, $239.21, is stored in the INVOICE table and this amount must not be changed.

958. Verify the conceptual model you created in Appendix B, Problem 7. Create a data dictionary for the verified model. Answer: Compare the ERD shown in Figure PC.5A to the ERD shown in Figure PD.7A (see Appendix B) to see the impact of the verification process. Note the ternary relationship between SIGN_OUT, LOG_LINE, PART, and EMPLOYEE. This relationship enables the end user to track all parts used in each of the log lines for each of the logs and to verify that the parts that were signed out for the log line were, in fact, used in that log line’s maintenance procedure. Use the data dictionary format shown in Chapter 8, Table 8.2 as your data dictionary template.

843

Figure PC.5A The Verified Crow’s Foot ERD for ROBCOR Aircraft Service

The basic ERD shown in Figure PC.5A can easily be modified to incorporate additional tracking capability for a host of other requirements. For example, note that Figure PC.5B includes all the educational, training, and testing options for ROBCOR Aircraft Service’s employees. Given the growing regulatory environment and increasingly restrictive insurance requirements, such detailed tracking requirements are becoming more common in a wide range of different types of business operations. Therefore, discussions about the tracking requirements in the production database design are very productive.

844

Figure PC.5B The Modified Crow’s Foot ERD for ROBCOR Aircraft Service

845

To help you demonstrate the use of the composite PKs in the EMP_TEST, EDUCATION, and TRAINING, we have implemented a segment of the FlyFar design in the FlyFar database. The relational diagram for the FlyFar database is shown in Figure PC.5C.

NOTE The following screen images are based on the database named FlyFar. This database, stored in MS Access format, may be found on your instructor’s CD. If you want your students to write the applications for this segment of the database, you will find that the appropriate tables are available. (We have added attributes in various tables to enhance their information content.)

Figure PC.5C The Relational Diagram for the FlyFar Database Segment

Note the effect of the composite PKs in the composite entities shown in Figure PC.5C. If you examine the last record in the EMP_TEST table shown in Figure PC.5D, you will see that this attempted record entry duplicates a previously entered record. However, note that the DBMS— in this case, Microsoft Access—has caught the attempted duplication. (The use of the composite PK—EMP_NUM + TEST_CODE + EMPTEST_DATE—requires the PK entries to be unique in order to avoid an entity integrity violation. Therefore, the system catches the duplicate record before you have a chance to save it. In fact, to avoid the entity integrity violation, the DBMS will not permit you to save the duplicate record.)

846

Figure PC.5D A Duplicate Record Warning

If you change the test date to indicate that the test result to be entered is different from an earlier test result, the DBMS will accept the data entry. (Note that the FAR135-w test was taken twice by employee 105: once on 22-Jan-2013 and once on 01-Mar-2014.) Remind your students that you can also create a single-attribute, system-generated PK named EMPTEST_NUM for the tblEMPTEST table in Figure PC.5C. This action will convert the composite (weak) EMP_TEST entity to a strong entity. (The EMP_NUM and TEST_CODE remain as foreign keys.) However, if you still want to avoid the duplication of records—a very desirable feature—you must maintain a candidate key composed of EMP_NUM, TEST_CODE, and EMPTEST_DATE—and you must set the index properties to “required” and “unique” for each of the attributes in that candidate key. (The same features may be used in the tblEDUCATION and tblTRAINING tables.) Whether or not you use a single-attribute PK or a composite PK may depend on specified system transaction and/or tracking requirements. The single-attribute PK/composite PK decision is often a function of professional judgment—clearly, the composite PKs work well in the original design shown in Figure PC.5B. However, if a PK is to be referenced by the FK(s) in one or more related tables, the creation of a single-attribute PK is appropriate. In fact, trying to create a relationship between an FK in one table and a composite PK in a related table will quickly illustrate the need for a single-attribute PK. In any case, query design becomes a more complex task when relationships based on composite PKs are traced through several levels.****** 959. Design (through the logical phase) a student-advising system that will enable an advisor to access a student’s complete performance record at the university. A sample output screen should look like the one shown in Table PC.6.

847

Answer: Table PC.6 The Student Transcript for Problem 6 Name: Xxxxxxxxxxxxxxxxx X. Xxxxxxxxxxxxxxxxxxxxxxx

Page # of ##

Department: xxxxxxxxxxxxxxxxxxxxxxx

Major: xxxxxxxxxxxxxx

Social Security Number: ###-##-####

Report Date: ##/Xxx/####

Spring, 20XX Course ENG 111 (Freshman English)

Hours

Grade

Grade points

Xxxxxxxxxxxxxxxxxxxxxxxxxxx

Total this semester

GPA: #.##

Total to date

###

Cumulative GPA: #.##

Summer, 20XX Course CIS 300 (Computers in Society)

Hours

Grade

Grade points

Xxxxxxxxxxxxxxxxxxxxxxxxxxx

Total this semester

GPA: #.##

Total to date

###

Cumulative GPA: #.##

Fall, 20XX Course CIS 400 (Systems Analysis) Xxxxxxxxxxxxxxxxxxxxxxxxxxx

Hours

Grade

Grade points

848

Xxxxxxxxxxxxxxxxxxxxxxxxxxx

Total this semester

GPA: #.##

Total to date

###

Cumulative GPA: #.##

Note that this problem is, basically, an extension of the database design developed in Chapter 4’s discussion of Tiny College. We merely need to expand the presentation to enable us to develop the required outputs.

The Development of the ERD To satisfy the requirements, the ERD must be based on (at least) the following business rules: 1. A department has many students, and each student “belongs” to only one department. 2. A student takes many classes, and each class is taken by many students. 3. A student may enroll in a class one or more times. Naturally, if a class is taken more than once, that “repeat” class is taken in a different semester. 4. A class is a section of a course, that is, a course can yield many classes, but each class references only one course. For example, two sections of the course described by CIS483, Database Systems, 3 credit hours, Prerequisites: 9 hours of CIS courses, including CIS370 (Systems Analysis) may be taught in the Fall and Spring semesters, while the course may not be offered in the Summer session. (Since a course is not necessarily offered each semester, CLASS is optional to COURSE.) 5. Each course belongs to a department. For example, the English department would not offer a Database course. The database should include at least the following components: DEPARTMENT (DEPT_CODE, DEPT_NAME) STUDENT (STU_NUM, STU_LNAME, STU_FNAME, STU_INITIAL, DEPT_CODE) DEPT_CODE references DEPT COURSE (CRS_CODE, CRS_DESC, CRS_CREDIT_HOURS) CLASS (CLASS_ID, CRS_CODE, CLASS_PLACE, CLASS_TIME)

849

CRS_CODE references COURSE ENROLL (STU_NUM, CLASS_CODE, ENROLL_SEMESTER, ENROLL_CRS_CREDITS, ENROLL_CRS_NAME, ENROLL_GRADE) STU_NUM references STUDENT, CLASS_CODE references CLASS Note 1: The participation of SEMESTER allows a student to register for a class one or more times, but only one time per semester. Note 2: The ENROLL entity includes the course description and course credits, because the course name and its credits may change over time. Therefore, you cannot count on the current course name and credit value to reconstruct previous course names and credit hours used on a transcript, which is a historical record. Naturally, to avoid data anomalies, the applications software should be written to make sure that the system transfers current course data to the current transcript record.

NOTE To keep the model simple, we have not included such “obvious” entities as MAJOR, connected to STUDENT and DEPARTMENT, the PROFESSOR who teaches CLASSES and who may chair the DEPARTMENT, the COLLEGE to which the DEPARTMENT belongs, and so on. These details may be discussed in connection with the Tiny College database discussed in Chapter 4, “Entity Relationship (ER) Modeling.” Given this simplification, the DEPARTMENT used in this example does not have any foreign keys. Verification of the ER model Required Output: Selected Student Record The required report can be easily generated through the use of the tables depicted in our database model. The SQL code that will generate the required information will look like this: SELECT

STUDENT.STU_NUM, S_LNAME, DEPARTMENT.DEPT_CODE, DEPT_NAME, ENROLL_SEMESTER, ENROLL_CRS_ CREDIT, ENROLL_CRS_NAME, ENROLL_GRADE

FROM

STUDENT, DEPARTMENT, ENROLL, CLASS, COURSE

WHERE

STUDENT.STU_NUM

= ENROLL.STU_NUM AND

CLASS.CLASS_ID

= ENROLL.CLASS_ID AND

DEPARTMENT.DEPT_CODE

= STUDENT.DEPT_CODE AND

CLASS.CRS_CODE

= COURSE.CRS_CODE

850

ORDER BY

ENROLL.STU_NUM, ENROLL_SEMESTER, ENROLL_CRS_NAME;

The previous SQL query generates the data needed for the report. Specific output format may be created by using the DBMS’s report generator or by using a 3GL programming language such as COBOL or C. Also, note that the “Grade points” column in the Student Record is a computed column that is produced by multiplying the CRS_CREDIT in the COURSE table by the numeric value equivalent to the letter ENROLL_GRADE in the ENROLL table. To compute the value for such a column, the programmer uses a conversion table such as the one shown in Table P6.1.

Table P6.1 A Grade Point Conversion Table Letter Grade

Numeric Value

When the verification process is completed, the ERD looks like the one shown in Figure PC.6.

851

Figure PC.6 The Crow’s Foot ERD for the (Transcript-based) Student Advising System

NOTE As you discuss the ERD shown in Figure PC.6, note that optionalities are often used for operational reasons. For example, keeping CLASS optional to COURSE means that you don’t have to generate a class when a new course is put into a catalog. (In this case, the optionality also reflects the business rule “not all courses generate classes each semester.”) Keeping ENROLL optional to both CLASS and STUDENT means that you won’t have to generate a dummy record in the ENROLL table when you sign up a new student or when you generate a new class entry in the registration schedule.

852

960. Design and verify a database application for one of your local not-for-profit organizations (for example, the Red Cross, the Salvation Army, your church or synagogue). Create a data dictionary for the verified design. Answer: Since this problem’s solution depends on the selected organization, no solution can be presented here. However, the steps required in the solution are shown in Question 4. An abbreviated version is presented in Problem 1. 961. Using the information given in the physical design section (C-5), estimate the space requirements for the following entities: 

RESERVATION



INV_TRANS



TR_ITEM



LOG



ITEM



INV_TYPE

Hint: You may want to check Appendix B, Table B.3, A Sample Volume of Information Log. Answer: You must generate the data storage requirement for each of the tables. Therefore, begin by identifying the attribute characteristics and storage requirements. The supported data types depend on the database software. For example, some software support the Julian date format, while other software require dates to be identified as strings. Even date strings vary in length, depending on the default format (18-Mar-2014 or 3/18/14, for example.) Therefore, the correct answer depends on the DBMS you use. In short, the following data storage requirements are meant to be used for discussion purposes only. (Only a few sample tables are shown, but they are sufficient to illustrate the process and to serve as the basis for a discussion about required table spaces.

Table: RESERVATION (4 per week, 14 weeks per semester, 56 reservations per semester)

Attribute

Data Type

Storage (bytes)

RES_ID

INT

RES_DATE

DATE

USER_ID

CHAR(11)

LA_ID

CHAR(11)

Row Length (bytes)

Number of Rows

Total Bytes

1,904

853

Table: INV_TRANS (80 per week, 14 weeks per semester, 1,120 transactions per semester) Data Type

Attribute

Storage (bytes)

TRANS_ID

INT

TRANS_TYPE

CHAR(1)

TRANS_PURPOSE CHAR(2)

TRANS_DATE

DATE

LA_ID

CHAR(11)

USER_ID

CHAR(11)

ORDER_ID

INT

TRANS_COMEMT

CHAR(50)

Row Length (bytes)

Number of Rows

1,120

Total Bytes

101,920

Table: TR_ITEM (240 per week, 14 weeks per semester, 3,360 per semester) Attribute

Data Type

Storage (bytes)

TRANS_ID

INT

ITEM_ID

NUMBER(8,0)

LOC_ID

CHAR(10)

TRANS_QTY

INT

Row Length (bytes)

Number of Rows

Total Bytes

3,360

87,360

854

Table: LOG (5,000 per week, 14 weeks per semester, 70,000 reservations per semester)

Attribute

Data Type

Storage (bytes)

LOG_DATE

DATE

LOG_TIME

CHAR(12)

LOG_READER

CHAR(1)

USER_ID

CHAR(11)

Row Length (bytes)

Number of Rows

Total Bytes

70,000

2,240,000

Table: ITEM (890 identified)

Attribute

Data Type

Storage (bytes)

ITEM_ID

NUM(8,0)

TY_GROUP

CHAR(8)

ITEM_INIV_ID

CHAR(7)

ITEM_DESCRIPTION

CHAR(10)

ITEM_QTY

INT

VEND_ID

CHAR(5)

ITEM_STATUS

CHAR(1)

ITERM_BUY_DATE

DATE

Row Length (bytes)

Number of Rows

Total Bytes

890

76,540

855

Table: INV_TYPE (15 categories)

Attribute

Data Type

Storage (bytes)

TY_GROUP

CHAR(8)

TY_CATEGORY

CHAR(2)

TY_CLASS

CHAR(2)

TY_TYPE

CHAR(2)

TY_SUBTYPE

CHAR(2)

TY_DESCRIPTION

CHAR(35)

TY_UNIT

CHAR(4)

Row Length (bytes)

Number of Rows

Total Bytes

825

TABLE OF CONTENTS Answers to Review Questions .............................................................................................856 Answers to Problems ...........................................................................................................861

ANSWERS TO REVIEW QUESTIONS NOTE Since the answers to many of these questions are covered in detail in Appendix F, we have elected to give you section references to avoid needless duplication. 962. Mainframe computing used to be the only way to manage enterprise data. Then personal computers changed the data management scene. How do those two computing styles differ, and how did the shift to PC-based computing evolve? Answer: The evolution toward client/server information systems is explained in Section F-2. The main differences between mainframe-based information systems and PC-based

856

client/server information systems are illustrated in Table F.1. The answer to this question may also include a discussion, based on Section F-2, of the forces that drive client/server systems. 963. What is client/server computing, and what benefits can be expected from client/server systems? Answer: Client/server is a term used to describe a computing model for the development of computerized systems. This model is based on the distribution of application’s functions among two types of independent and autonomous entities: servers and clients. A client is any process that requests specific services from server processes. A server is a process that provides requested services for clients. Therefore, the application is divided into client and server processes. Note also that the client and server processes can reside on the same or on different computers connected by a network. The final result is an application in which part of the processing is done at the client side and part of the processing is done at the server side. The advantages of separating and distributing the application’s processing are efficient resource utilization and maximization of resource effectiveness. Perhaps the greatest single advantage is found in the utilization of the existing personal computer power for local data access and processing. Thus, the end user is able to use local PCs to access mainframe and minicomputer legacy data and to process such data locally by using userfriendly PC software. Used correctly, this computing approach yields greater information autonomy, lower costs, improved access to information, and, therefore, a greater potential for better decision making. These benefits may yield better service to customers, thus generating more business. 964. Explain how client/server system components interact. Answer: The main client/server components are the client, the server, and the communications channel. Some experts include middleware as a separate component. The client provides an interface for interacting with the user and performing some tasks such as local data validation. When data or processing is needed from the server, the client sends a request over the communications channel, which may include processing by middleware components. The request is sent to the server process, which provides the data or processing requested, and sends the results back to the client, possibly going through middleware in the process. 965. Describe and explain the client/server architectural principles. Answer: Client/server components must conform to some basic architectural principles if they are to interact properly. The client and server distribute an application’s processing across two types of independent entities. While the entities may run on the same physical computer, this is not typical. The client architecture is usually some type of personal computing device with sufficient memory, processing power, and storage to manage a user interface and other local tasks. The server component also includes memory, processing power, and storage, but typically is more powerful since it must be able to handle multiple concurrent requests from many different clients. The separation of the client and server components across communication media allows many benefits such as location independency, improved resource efficiency, and scalability. 966. Describe the client and the server components of the client/server computing model. Give examples of server services. Answer: Desirable hardware and software for the client component includes powerful hardware, a multitasking operating system, a graphical user interface, and communications

857

capabilities. Desirable characteristics of the server component include a fast CPU, faulttoleration capabilities, expandability for memory, storage, and peripherals, and multiple communication options. Server services can include file services, print and fax services, database services, and miscellaneous services such as CD-ROM, video, and back-up. 967. Using the OSI network reference model, explain the function of the communications middleware component. Answer: The communications channel provides the means through which clients and servers communicate. The communications channel connects clients and servers and its main function is the delivery of messages between clients and servers. Using the OSI network reference model, Section F-3e provides a detailed explanation of the communication channel. Note that we use the OSI network reference model because most of the client/server applications are based on a scenario in which clients and servers are tied together through a network. 968. What major network communications protocols are currently in use? Answer: The network protocols determine how messages between computers are sent, interpreted, and processed. The main network protocols in use today are Transmission Control Protocol/Internet Protocol (TCP/IP), Internetwork Packet Exchange/Sequenced Packet Exchange (IPX/SPX), and Network Basic Input/Output System (NetBIOS). Section F4 provides a more detailed explanation of these and other network protocols. 969. Explain what middleware is and what it does. Why would MIS managers be particularly interested in such software? Answer: Middleware is software that is used to manage client/server interactions. Most important to the end user and MIS manager is the fact that middleware provides services to insulate the client from the details of network protocols and server processes. MIS managers are usually concerned with finding ways to improve end-user data access and to improve programmer productivity. By using middleware software, end users can access legacy data and programmers can write better applications faster. The applications are network independent and database server independent. Such an environment yields improved productivity, thereby generating development costs savings. Sections F-3f and F-3g provide additional database middleware software details. 970. Suppose you are currently considering the purchase of a client/server DBMS. What characteristics should you look for? Why? Answer: A client/server DBMS is just one of the components in an information system. The DBMS should be able to support all applications, business rules, and procedures necessary to implement the system. Therefore, the DBMS must match the system’s technical characteristics, it must have good management capabilities, and it must provide the desired level of support from vendors and third parties. Specifically: 

On the technical side the database should include data distribution, location transparency, transaction transparency, data dictionary, good performance, support for access via a variety of front-ends and programming languages, support several client types (DOS, UNIX, Windows, etc.), third-party support for CASE tools, Application Development Environments, and so on.



On the managerial side the database must provide a wide variety of managerial tools, database backup and recovery, GUI-based tools, remote management, interface to

858

other management systems, performance monitoring tools, database utilities, and so on. 

On the support side the DBMS must have good third-party vendor support, technical support, training, and consulting.

971. Describe and contrast the client/server computing architectural styles that were introduced in this appendix. Answer: This question deals with identifying the application processing logic components and deciding where to locate them. Section F-11 covers this very important topic in great detail. (Note particularly the summary in Figure F.19, “Functional Logic Splitting in Four Client/Server Architectural Styles.”)

859

Client/server computing styles include several layers of hardware and software in which processing takes place. The file server style places data manipulation logic on the server, with presentation logic, I/O processing logic, application business logic, and data management logic all on the client computer. The database server style places the data manipulation logic on the server, splits data management logic between the client and the server, and places presentation logic, I/O processing logic, and application business logic completely on the client system. The transaction server style puts the data manipulation and data management only on the server, puts presentation logic and I/O processing logic only on the client, and shares application business logic across both the server and client computers. Finally, the application server style puts the presentation logic on the client computer, and all other components on the server. These styles differ primarily in shifting more and more of the functional logic from the client to the server as one progresses from file server through application server styles. 972. Contrast client/server data processing and traditional data processing. Answer: From a managerial point of view, client/server data processing tends to be more complex than traditional data processing. In fact, client/server computing changes the way in which we look at the most fundamental computing chores and expands the reach of information systems. These changes create a managerial paradox. On the one hand, MIS frees end users to do their individual data processing and, on the other hand, end users are more dependent on the client/server infrastructure and on the expanded services provided by the MIS department. Client/server computing changes the way in which systems are designed, developed, and managed by forcing a change from: 

proprietary to open systems



maintenance-oriented coding to analysis, design, and service



data collection to data deployment



a centralized to a distributed style of data management



a vertical, inflexible organizational style to a more horizontal, flexible style

973. Discuss and evaluate the following statement: There are no unusual managerial issues related to the introduction of client/server systems. Answer: The managerial issues in client/server systems management arise from the changes in the data processing style, the management of multiple hardware and software vendors, the maintenance and support of the client/server infrastructure, such as communications, applications, and the management and control of associated costs. The heterogenous nature of the client/server environment presents unique challenges along all of those dimensions. Therefore, one can evaluate the given statement as false because there are many unusual managerial issues related to introducing client/server systems.

860

ANSWERS TO PROBLEMS ROBCOR, a medium-sized company, has decided to update its computing environment. ROBCOR has been a minicomputer-based shop for several years, and all of its managerial and clerical personnel have personal computers on their desks. ROBCOR has offered you a contract to help the company move to a client/server system. Write a proposal that shows how you would implement such an environment. Answer: Because Problem 1 cannot be answered properly without addressing the computing style issue in Problem 2, the answers to both questions are supplied after Problem 2. 974. Identify the main computing style of your university computing infrastructure. Then recommend improvements based on a client/server strategy. (You might want to talk with your department’s secretary or your advisor to find out how well the current system meets their information needs.) Answer: Problems 1 and 2 are research questions that yield extensive class projects. The questions are designed with two ideas in mind: 1. To have the student assume the consultant’s “proactive” role. 2. To entice the students to use the knowledge acquired in this appendix to develop an integrated approach to client/server systems implementation. The expected output for these projects is a business quality paper and a professional-level class presentation of the findings, recommended solutions, and the suggested implementation. The material presented in Section F-12d yields an outline appropriate for such a paper. It will be beneficial if students have taken at least an introductory course in Systems Analysis and Design. Keep in mind that you can either use the two scenarios presented in these questions or you can assign students a real-world case to accomplish the same goals. In the first case, the professor assumes the role of the end user. In the second case, an external third party is the end user. The problem with real-world cases is that the professor must procure commitment from the third party. Unfortunately, it is sometimes difficult for company managers to provide possibly sensitive internal information to students and to devote scarce time resources to student projects. Even if the project is kept within the university’s bounds, you are likely to discover that the university administrators may not be able or willing to provide critical information. Students should be encouraged to use the presentations as a basis for further analysis of the more nettlesome issues that must be confronted in the development of client/server systems. We suggest several class discussion sessions in which different student groups present alternative solutions. Such presentations will force students not only to design a solution but also to sell the solution to management.

861

ANSWERS TO REVIEW QUESTIONS NOTE To ensure in-depth chapter coverage, most of the following questions cover the same material that we covered in detail in the text. Therefore, in most cases, we merely cite the specific section, rather than duplicate the text material. 975. Discuss the evolution of object-oriented concepts. Explain how those concepts have affected computer-related activities. Answer: Object orientation is the combining of data and the processes to manipulate that data into a single, modular unit. These concepts first appeared in object-oriented programming languages. Object orientation is intuitive and predictably gained in popularity as the rise in popularity of personal computers increased the computing resources available to end users. 976. How would you define object orientation? What are some of its benefits? How are OO programming languages related to object orientation? Answer: Object orientation is a set of design and development principles based on conceptually anonymous computer structures known as objects, which encapsulate data and the procedures to manipulate that data. Among the benefits of object orientation are a reduction in the number of lines of code necessary to create applications, decreased development time, code reusability, support for abstract data types and complex data objects, and support for complex data manipulations in specialized applications. Table G.1 summarizes more benefits in addition to these. OO concepts have created a powerful programming environment that has radically changed both programming and systems development. Although traditional programmers tended to agree that modularity is one of the primary goals of structured programming and good design, modularity was often difficult to achieve. Even a cursory examination of OO concepts leads to the conclusion that the conceptually autonomous structure (in which an object contains both data and methods) makes the much sought-after modularity almost inevitable. 977. Define and describe the following: a. Object b. Attributes c.

Object state

d. Object ID (OID) Answer: Object is an abstract representation of a real-world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself.

862

Attributes are also called instance variables in an object-oriented environment. They are the data characteristics of the object. Object state is the set of values that the object’s attributes have at any given time. Object ID is a system-generated object identifier that is independent of the object state and any physical address in memory. Like a primary key it provides a unique identity for an instance, but it is system-generated and cannot be changed under any circumstances. 978. Define and contrast the concepts of method and message. What OO concept provides the differentiation between a method and a message? Give examples. Answer: A method is the code that performs a specific operation on the object’s data. Messages are requests sent by one object (sender) to other objects (receivers) requesting the receivers to use one of the receiver’s methods to change the receiver’s data or state. Encapsulation is the concept that hides an object’s internal details. It prevents one object from directly manipulating the contents of another. For example, a Payment class may be used to record a new payment being made by a customer. The class uses an internal method to generate a new Payment object. The Payment object sends a message to a Customer object instructing it to change the customer’s balance using the customer.updateBalance() method. The Customer object uses the updateBalance method and other internal methods to validate the results as it completes the request. 979. Explain how encapsulation provides a contrast to traditional programming constructs such as record definition. What benefits are obtained through encapsulation? Give an example. Answer: Encapsulation hides the object’s internal data representation and method implementation, thus ensuring the object’s data integrity and consistency. The programmer needs only ask an object to perform an action, without having to specify how the action is to be performed. Since the implementation details need not be specified, the programmer can concentrate on the overall process. Clearly, an object is an independent entity. Therefore, object independence assures system modularity. For example, an object-oriented system is formed by possibly thousands of independent objects (or even more) that interact to perform specific actions. In short, what we have just described is a perfectly modular system. In contrast, the programmer who uses traditional programming languages has direct access to the internal components of a record type. Therefore, the programmer can directly manipulate the data elements at will. This ability is not necessarily valuable; programmers can (and do) make mistakes, thus causing problems in critical systems. For example, when you create a record type “customer” in your program, you have direct access to all the data elements of such a record, so there is no protection of the data. 980. Using an example, illustrate the concepts of class and class instances. Answer: A class instance is an object. A class is composed of a collection of objects or class instances with shared structure (attributes) and behavior (methods). A class named STUDENT may be used to contain the collection of individual student objects. 981. What is a class protocol, and how is it related to the concepts of methods and classes? Draw a diagram to show the relationships among these OO concepts: object, class, instance variables, methods, object state, object ID, behavior, protocol, and messages.

863

Answer: A class protocol is the collection of messages, each identified by a message name that are made available for other objects to see. It represents the public aspect of the class and of the objects in that class. 982. Define the concepts of class hierarchy, superclasses, and subclasses. Explain the concept of inheritance and the different types of inheritance. Use examples in your explanations. Answer: Class hierarchy is the organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass. In a class hierarchy, the superclass is the more general classification from which the subclasses inherit data structures and behaviors. In a class hierarchy, a subclass is a class derived from a superclass. Inheritance is the ability of an object within the hierarchy to inherit the data structure and behavior of the classes above it. For example, Stringed Instrument may be a subclass of Musical Instrument, and Guitar is a subclass of Stringed Instrument. If a class has only one immediate superclass above it, then single inheritance occurs. If a class has multiple immediate superclasses above it, then multiple inheritance occurs. In the above example, a Guitar, having only Stringed Instrument as a superclass, would exhibit single inheritance. A Piano object would have both Stringed Instrument and Percussion Instrument as superclasses so would exhibit multiple inheritance. 983. Define and explain the concepts of method overriding and polymorphism. Use examples in your explanations. Answer: Ordinarily, if a method is in a superclass, then it does not need to be created in the subclasses. Calls for the method will look in the subclass to find the method. If the method isn’t found, then the system will look for the method in the superclass. Therefore, if the subclass and the superclass would define the method in the same way, then the method only needs to be defined once, in the superclass, and inheritance will allow all the subclasses to respond to that method. Method overriding is when a subclass has a different definition of a method than its superclass. In that case, even though the subclass would inherit the definition of the method from the superclass, this definition is “overridden” by the definition in the subclass so messages for the method will call the method as defined in the subclass instead of the definition from the superclass. Polymorphism allows different objects to respond to the same message in different ways. For example, Pilot and Mechanic objects may use different calculations for determining monthly pay. This allows an object, such as Payroll, to send the same message using the monthPay method to both Pilot and Mechanic requesting the monthly pay amount, and the Payroll object is unaware that Pilot and Mechanic define monthPay differently. 984. Explain the concept of abstract data types. How do they differ from traditional or base data types? What is the relationship between a type and a class in OO systems? Answer: An abstract data type is a set of similar objects with shared and encapsulated data representation and methods. It is generally used to describe complex objects. Traditional data types are predefined and have a set of predefined operations that can be performed on them. With an abstract data type, the programmer defines the operations that can be performed on them using methods.

864

A type is a pattern or definition of the structure and methods that will be used by the objects created based on that type. A class is the collection of actual objects that is created based on the type definition when the system is used. 985. What are the five minimum attributes of an OO data model? Answer: The system must be able to remember data locations. The system must be able to manage very large databases. The system must accept concurrent users. The system must be able to recover from hardware and software failures. Data query must be simple. 986. Describe the difference between early and late binding. How does each of these affect the objectoriented data model? Give examples. Answer: Early binding is the property by which the data type of an object’s attribute must be known at definition time, bonding the data type to the object’s attribute. Late bind is the characteristic in which the data type of an attribute is not known until execution time or runtime. Late binding allows instance variables to be defined as abstract data types that are defined by methods during execution. 987. What is an object space? Using a graphic representation of objects, depict the relationship(s) that exist between a student taking several courses and a course taken by several students. What type of object is needed to depict that relationship? Answer: The object space or object schema is the equivalent of a database schema. The object space is used to represent the composition of the state of an object at a given time. For example, you can use the schema shown in Figure QG.13 to represent the M:N relationship between CLASS and STUDENT:

865

Figure QG.13 The Object Schema for the Relationship between Student and Class STUDENT

ENROLL

STU_SOC_SEC_NUM

CLASS:

CLASS 1

STU_LNAME

TAKEN BY:

STU_ADDRESS

STUDENT:

STU_CITY

ENROLL

STUDENT

STU_STATE STU_ZIPCODE CLASS_TAKEN:

CLASS_DESCRIPTION

CLASS

STU_FNAME

CLASS_CODE

GRADE M

ENROLL STU_CUM_GPA STU_SEM_GPA

988. Compare and contrast the OODM with the ER and relational models. How is a weak entity represented in the OODM? Give examples. Answer: Although the OODM has much in common with relational and ER data models, the OODM introduces some fundamental differences. Table QG.14 provides a summary of the OODM characteristics

Table QG.14 A Comparison of OODM, ERM, and Relational Model Features

OODM

ER Model (ERM)

Relational Model

Type

Entity definition (limited)

Table definition (limited)

Object

Entity

Table row or tuple

Class

Entity set

Table

Instance variable

Attribute

Column (attribute)

OID

N/A

Primary key

Object schema

ER diagram

Relational schema

Class hierarchy

N/A*

N/A

Inheritance

N/A*

N/A

866

OODM

ER Model (ERM)

Relational Model

Encapsulation

N/A

Method

N/A

*There are similarities between entity type and subtypes, and class hierarchy and inheritance. Entity types and subtypes are design constructs that provide data modeling with data abstraction, but these constructs do not automatically imply the existence of inheritance. In fact, no RDBMS supports these constructs directly; instead, the programmer has to link the tables at run time to ensure that the attributes will be “inherited.” 989. Name and describe the 13 mandatory features of an OODBMS. Answer: 

The system must support complex objects.



Object identity must be supported.



Objects must be encapsulated.



The system must support types or classes.



The system must support inheritance.



The system must avoid premature binding.



The system must be computationally complete.



The system must be extensible.



The system must be able to remember data locations.



The system must be able to manage very large databases.



The system must accept concurrent users.



The system must be able to recover from hardware and software failures.



Data query must be simple.

990. What are the advantages and disadvantages of an OODBMS? Answer: The OODBMS advantages include the following: more semantic information in the database, better support for complex objects, extensible data types, versioning, faster development, and easier maintenance with reusable classes. OODBMS disadvantages include the following: incorporation of many OO features in RDBMS provides strong opposition to the implementation complexities of OODBMS, lack of theoretical foundation, complexity of OODBMS pointer systems, no standard ad hoc query language like SQL, very steep initial learning curve, few qualified data professionals, and lack of compatibility between different OODBMSs.

867

991. Explain how OO concepts affect database design. How does the OO environment affect the DBA’s role? Answer: Relational database design requires a separation between data and process. Traditionally, identification of data elements is the primary consideration that drives database design. Consideration of the procedures by which that data is manipulated occurs much later in the design process. OO concepts require consideration of the data elements and their manipulation to be considered at the same time since they are encapsulated into a single object. Working toward the OO goal of code reusability is difficult and will require DBAs to become much more proficient programmers as they assume more responsibility for the defining and implementing operations that affect the data. 992. What are the essential differences between the relational database model and the object database model? Answer: 

An object extends beyond the static concept of an entity or tuple in the other data models.



Like the entity set and table, a class includes the data structure. However, unlike the entity set and table, the class also includes methods.



Unlike its relational and ER counterparts, encapsulation allows an object’s “internals” to be hidden from the outside.



Unlike its relational and ER counterparts, inheritance allows an object to inherit attributes and methods from a parent class.



The object ID is a concept associated with the primary key concept in the relational and ER model, but it is not quite the same thing. An object ID is an attribute that is not directly exposed, user definable, or directly accessible as the PK is in the relational model.



The relational and ER model relationships are based on the primary key/foreign key relationships. Such relationships are “value” based; that is, they are based on having two attributes in different tables sharing equal values. The relationships in the object model are not based in the specific value of any attributes.



Data access in the relational model is based on a query language known as SQL. SQL is a set-oriented language that uses associative access methods to retrieve related rows from tables. In contrast with the relational model, the object data model suffers from the lack of a standard query language. Because of its identity-based access style, the object model resembles the record-at-a-time access of older hierarchical and network models.

993. Using a simple invoicing system as your point of departure, explain how its representation in an entity relationship model (ERM) differs from its representation in an object data model (ODM). (Hint: See Figure G.34.) Answer: As shown in Figure G.34, the object model represents the INVOICE as an object containing other objects (CUSTOMER and LINE). In contrast, the ER model uses three different and separate entities related to each other through their primary key/foreign key attributes. Note that the object model automatically includes the CUSTOMER and LINE object instances when each INVOICE line instance is made current.

868

994. What are the essential differences between an RDBMS and an OODBMS? Answer: OODBMS characteristics show that the OODBMS shares features such as data accessibility, persistence, backup and recovery, transaction management, concurrency control, and security and integrity with the RDBMS. In addition, the OODBMS has unique characteristics such as support for complex objects, encapsulation and inheritance, abstract data types, and object identity. 995. Discuss the object/relational model’s characteristics. Answer: The basic features of the O/RM include extensibility of new user-defined data types, support for complex objects, inheritance between supertypes and subtypes within a specialization hierarchy, procedure calls using triggers, and system-generated identifiers similar to object IDs. While these features do not perfectly mimic the structures of OODBMS, they respond to some of the most commonly cited OODBMS capabilities.

869

ANSWERS TO PROBLEMS Convert the following relational database tables to the equivalent OO conceptual representation. Explain each of your conversions with the help of a diagram. (Note: The RRE Trucking Company database includes the three tables shown in Figure PG.1.)

Figure PG.1 The RRE Trucking Company Database

870

Answer: As you examine Figure PG.1A, note that, for simplicity’s sake, we have chosen not to represent BASE_MANAGER as an abstract data type belonging to the class PERSON.

Figure PG.1A The OO Conceptual Representation BASE

TRUCK TRUCK_NUM

BASE: 1 BASE TYPE:

TYPE

BASE_CODE

TYPE_CODE

BASE_CITY

TYP_DESCRIPTION

BASE_STATE

BASE_AREA_CODE

BASE_PHONE

BASE_MANAGER

TRUCKS:

TRUCK_MILES

TRUCK_BUY_DATE

TRUCK_SERIAL_NUM

TYPE

TRUCKS: MM CTRUCK

M M

CTRUCK

Note: c = character data d = date data n = numeric data

Figure PG.1A also illustrates that the CTRUCK class represents a collection of TRUCK objects. In other words, one instance of the CTRUCK class will contain several instances of the class TRUCK. 996. Using the tables in Figure PG.1 as a source of information: Answer: a. Define the implied business rules for the relationships. Given the tables in Figure PG.1, you may develop the following relationships: 

A BASE can have many TRUCKs.



Each TRUCK belongs to only one BASE.



A TRUCK has only one truck TYPE.



Each truck TYPE may have several TRUCKs belonging to it.

b. Using your best judgment, choose the type of participation of the entities in the relationship (mandatory or optional). Explain your choices. From the data shown in Figure PG.1 you can conclude that: 

BASE and TYPE are mandatory for TRUCK.



A TRUCK must have a BASE.



A truck is of a given TYPE.



TRUCK is mandatory for BASE.

871



A BASE must have at least one TRUCK to be considered a BASE.



TRUCK is optional for TYPE. There can be zero, one, or more TRUCKs belonging to a TYPE.

c. Develop the conceptual object schema. Using the results of Problems (a) and (b), the conceptual object schema is represented by Figure PG.2C.

Figure PG.2C The Conceptual Object Schema TRUCK OLD: TX34 TRUCK_NUM: 5001 BASE: [BD39] TYPE: [DF56] TRUCK_MILES: 167123.5 TRUCK_BUY_DATE: 11/8/07 TRUCK_SERIAL_NUM: AA-322-12212-W11 BASE OLD: BD39 BASE_CITY: Nashville BASE_STATE: TN BASE_AREA_CODE: 615 BASE_PHONE: 123-4567

TYPE OLD: DF56 TYPE_CODE: 1 TYPE_DESCRIPTION: Single box, double-axle TRUCKS: [Y54F]

BASE_MANAGER: Andrea D. Gallager TRUCKS: [Y678] CTRUCK OLD: Y678 [TX34], [TX37], [TX65]

CTRUCK OLD: Y54F [TX34]

872

997. Using the data presented in Problem 1, develop an object space diagram representing the object’s state for the instances of Truck listed below. Label each component clearly with proper OIDs and attribute names. Answer: a. The instance of the class Truck with TRUCK_NUM = 5001. The instance of this class is shown in Problem 2C’s conceptual object schema (Figure PG.2C). b. The instances of the class Truck with TRUCK_NUM = 5003 and 5004. As you examine the conceptual object schema shown in Problem 2C, note the following features: 

OIDs are used to reference the object instances of the classes BASE and TYPE.



The BASE and TYPE object instances reference two different CTRUCK object instances.



Using the OIDs, each CTRUCK object instance contains the reference to several object instances of the class TRUCK.

Using these features, the conceptual object schema looks like Figure PG.3B.

873

Figure PG.3B The Conceptual Object Schema TRUCK

TRUCK

OLD: TX37

OLD: TX65

TRUCK_NUM: 5003

TRUCK_NUM: 5004

BASE: [BD39]

TYPE: [DF48]

TYPE: [DF56]

TRUCK_MILES: 221346.6

TRUCK_MILES: 99894.3

TRUCK_BUY_DATE: 12/27/07

TRUCK_BUY_DATE: 2/21/08

TRUCK_SERIAL_NUM: AC-445-78656-Z99

TRUCK_SERIAL_NUM: WG-11223144-T34 TYPE

BASE

OLD: DF56

OLD: BD39 BASE_CITY: Nashville BASE_STATE: TN BASE_AREA_CODE: 615

TYPE_CODE: 2 TYPE_DESCRIPTION: Single box, single-axle TRUCKS: [Y54F] CTRUCK OLD: Y54F

BASE_PHONE: 123-4567 BASE_MANAGER: Andrea D. Gallager

[TX37], [TX65], ……... TYPE

TRUCKS: [Y678] OLD: DF48 CTRUCK OLD: Y678 [TX34], [TX37], [TX65]

TYPE_CODE: 1 TYPE_DESCRIPTION: Single box, double-axle TRUCKS: [Y54F]

As you examine Figure PG.3B’s conceptual object schema, note the following features: 

OIDs are used to reference the object instances of the classes BASE and TYPE.



Both object instances reference the same BASE and TYPE object instances. This property is also called referential object sharing.

998. Given the information in Problem 1, define a superclass Vehicle for the Truck class. Redraw the object space you developed in Problem 3, taking into consideration the new superclass that you just added to the class hierarchy. Answer: To add a superclass VEHICLE to the TRUCK class, first define the superclass VEHICLE, after which you can create the subclass TRUCK. After this task has been completed, the end user will see only the attributes and methods that were inherited from VEHICLE. (The user does not perceive the difference!) To illustrate this point, the object space must also show the new VEHICLE instance. (See Figure PG.4.)

874

Figure PG.4 The Conceptual Object Schema VEHICLE OLD: VF345 MAKER: Ford

Class/Subclass Relationship

YEAR: 1992 TRUCK Attributes Inherited From the VEHICLE Superclass

OLD: TX34 MAKER: Ford YEAR: 1992 TRUCK_NUM: 5001 BASE: [BD39]

Interclass Relationships

TYPE: [DF56] BASE OLD: BD39 BASE_CITY: Nashville

TRUCK_MILES: 162123.5 TRUCK_BUY_DATE: 11/08/07 TRUCK_SERIAL_NUM: AA-322-12212-W11 TYPE

BASE_STATE: TN BASE_AREA_CODE: 615

OLD: DF56

BASE_PHONE: 123-4567

TYPE_CODE: 1

BASE_MANAGER: Andrea D. Gallager

TYPE_DESCRIPTION: Single box, double-axle

TRUCKS: [Y678]

TRUCKS: [Y54F]

CTRUCK OLD: Y678 [TX34], [TX37], [TX65]

CTRUCK OLD: Y54F [TX34]

999. Assume the following business rules: Answer: 

A course contains many sections, but each section has only one course.



A section is taught by one professor, but each professor may teach one or more different sections of one or more courses.



A section may contain many students, and each student is enrolled in many sections, but each section belongs to a different course. (Students may take many courses, but they cannot take many sections of the same course.)



Each section is taught in one room, but each room may be used to teach several different sections of one or more courses.



A professor advises many students, but a student has only one advisor.

Based on those business rules: Identify and describe the main classes of objects.

875

Using the business rules 1 through 6, we may identify the objects: COURSE

STUDENT

CLASS

ROOM

PROFESSOR

NOTE We commonly use CLASS to identify a Section of a COURSE. (In fact, all of the examples in Chapters 2 and 3 were based on this convention.) Therefore, we use CLASS to identify a Section of a COURSE. We use this convention for the simple reason that it properly reflects commonly used language. For example, students invariably will tell you that they have enrolled in your class; they’ll tell you they’re going to your class, rather than going to your Section. However, do keep in mind that “class” has a specific (and different!) meaning in the OO environment. Fortunately, the context in which “class” is used easily identifies which “class” you’re talking about. The classes corresponding to these objects are shown in Figure PG.5A.

Figure PG.5A The Conceptual Object Schema STUDENT

CLASS

STU_NUM

COURSE:

STU_LNAME

COURSE

STU_FNAME

PROFESSOR:

STU_ADDRESS

PROFESSOR

STU_CITY

ROOM:

STU_STATE

ROOM

STU_ZIPCODE

GRADE SCHEDULE:

PROF_NUM

CRS_DESCRIPTION C

PROF_NAME

CRS_CREDIT

PROF_DOB

DEPT_CODE

OFFERING:

N M

TEACH_LOAD: M CLASS

CLASS ROOM M

STUDENT

PROFESSOR

CRS_CODE

ENROLL:

ADVISOR:

PROFESSOR

COURSE

BLDG_CODE

ROOM_NUM

ADVISEES:

STUDENT

RESERVATION: CLASS

CLASS GRADE

STU_CUM_GPA

STU_SEM_GPA

Note: C = Character D = Date N = Numeric

Use the following descriptions to characterize the model’s components:

876

COURSE OFFERING INCLUDES CLASS A COURSE CAN GENERATE MANY CLASSES CLASS IS OPTIONAL TO COURSE (a course may not be offered) PROFESSOR TEACH_LOAD INCLUDES CLASS A PROFESSOR CAN TEACH MANY CLASSES CLASS IS OPTIONAL TO PROFESSOR (a professor may not teach a class) ADVISEES INCLUDES STUDENT A PROFESSOR MAY ADVISE MANY STUDENTS STUDENT IS OPTIONAL TO PROFESSOR (in the advises relationship) ROOM RESERVATION INCLUDES CLASS ONE ROOM CAN HAVE MANY CLASSES SCHEDULED IN IT CLASS IS OPTIONAL TO ROOM (a room may not have classes scheduled in it) STUDENT ADVISOR INCLUDES PROFESSOR A STUDENT HAS ONE PROFESSOR (who advises that student) PROFESSOR IS MANDATORY (a student must have an advisor) SCHEDULE INCLUDES CLASS A STUDENT MAY TAKE MANY CLASSES (i.e., SECTIONS OF A COURSE) CLASS IS MANDATORY TO STUDENT (a student must take at least one Section of a course) CLASS REQUIRES A COURSE COURSE IS MANDATORY (a class can’t exist without a course) PROFESSOR IS MANDATORY (a class must have a professor) ROOM IS MANDATORY (a class must be taught in a room) A CLASS MAY HAVE MANY STUDENTS ENROLLED IN IT STUDENT IS OPTIONAL ( a class may not have any students enrolled in it) AN ENROLLED STUDENT RECEIVES A GRADE

877

Modify your description in (a) to include the use of abstract data types such as Name, DOB, and Address. An abstract data type allows us to create user-defined operations for that new type. To create a new data type, first define the abstract data types or classes: NAME, DOB (date of birth), and ADDRESS, as shown in Figure PG.5B-1.

Figure PG.5B-1 The Abstract Data Types (Classes)

NAME

DOB

ADDRESS

FIRST_NAME

MONTH

STREET

INITIAL

DAY

APT_NUM

LAST_NAME

YEAR

CITY

STATE

ZIPCODE

Having created the new abstract data types or classes, we must redefine PROFESSOR and STUDENT classes so they can reference these newly created classes. For example, the object instance representation for a PROFESSOR will look like Figure PG.5B-2.

878

Figure PG.5B-2 The Object Instance Representation for PROFESSOR NAME OID:

M45

FIRST_NAME:

June

INITIAL:

LAST_NAME:

Hasselblatt

PROFESSOR

DOB

230843

OID: 456

OID:

PROF_NAME:

[M45]

MONTH:

PROF_DOB:

[456]

DAY:

PROF_ADDRESS:

[401]

YEAR:

1961

EPT_CODE:

CIS

TEACH_LOAD:

[D40]

ADVISEES:

[X34]

ADDRESS OID:

[401]

STREET:

North Side

Blvd. APT_NUM :

1093B

CITY:

Paris

STATE:

ZIPCODE:

37892

Within the new object space illustrated in Figure PG.5B-2, the PROFESSOR object instance now contains references to the NAME, DOB, and ADDRESS object instances. Use object representation diagrams to show the relationships between: 

Course and Section.



Section and Professor.



Professor and Student.

To answer this question, we must remember how 1:M relationships are interpreted in the OODM. We must also remember that the OODM interpretation of such 1:M relationships yields some important implications. Keep in mind that all pairs of objects exist in a 1:M relationship: A course has many Sections (classes), a professor teaches many classes, and a professor advises many students.

879

To save space in this manual, we will illustrate only one case of 1:M relationships; the same concepts apply to all cases. We will focus our attention on the relationships of the objects in the class PROFESSOR. The object representation for an object of the (OO) class PROFESSOR will look like Figure PG.5C.

Figure PG.5C The Object Representation for an Object of the Class PROFESSOR PROFESSOR

Collection of SECTION classes D40

OID:

230843

OID:

PROF_NAME:

[M45]

PROF_DOB:

[456]

A34332 OID: ……………. 349 OID: ……………. ……………. 369 OID: ……………. ……………. 380 OID: ……………. ……………. …………….

PROF_ADDRESS: [401] DEPT_CODE:

CIS

TEACH_LOAD:

[D40]

ADVISEES:

[X34]

CLASS objects

Collection of STUDENT classes OID:

STUDENT objects

X34

346 OID: ……………. 345 OID: ……………. ……………. 556 OID: ……………. ……………. 580 OID: ……………. ……………. …………….

Note that we have omitted the object instances for the classes NAME, DOB, and ADDRESS. (These classes are shown in the answer to Problem 5g.) Note also that we have used the “collection of” classes to represent the collection of 

CLASSes taught by the PROFESSOR.



STUDENTs advised by the PROFESSOR.

Collection objects are used to implement 1:M relationships.

880

Use object representation diagrams to show the relationships between: 

Section and Students.



Room and Section.

What type of object is necessary to represent those relationships? The relationship between CLASS (Section) and STUDENTS is M:N; that is, each class has many students, and each student has many classes. The relationship between CLASS and ROOM is 1:M, because each class is taught in only one room and each room is used to teach several classes. We covered the use and representation of 1:M relationships in our answer to Question 5c, so please refer to that material. Depending on the level of abstraction used, representing a M:N relationship in an object representation diagram is fairly simple. For example, at the conceptual level, we can show the relationship between CLASS and STUDENT in Figure PG.5D-1:

Figure PG.5D-1 The Relationship Between CLASS and STUDENT

STUDENT

CLASS

STU_NUM

STU_LNAME

STU_FNAME

STU_ADDRESS

STU_CITY

STU_STATE

STU_ZIPCODE

required

COURSE PROFESSOR:

required

PROFESSOR

ROOM:

ADVISOR:

required

PROFESSOR SCHEDULE:

required

ROOM

ENROLL:

required

optional

N N

STU_SEM_GPA

STUDENT GRADE

STU_CUM_GPA

CLASS GRADE

COURSE:

Note: C = Character D = Date

As you examine Figure PG.5D-1, note that:

881



A student must be registered in one or more CLASSes, and the student earns a GRADE in each CLASS. (Reminder: We’ve used CLASS to represent a Section of a course.)



The CLASS requires a COURSE, a PROFESSOR, and a ROOM.



The CLASS may have one or more STUDENTS, each of whom earns a GRADE in that CLASS. In other words, STUDENT is optional to CLASS.

From a conceptual point of view, the preceding diagram captures both the nature and characteristics of the relationship between CLASS and STUDENT. At the implementation level, the object-oriented data model uses an intersection class to manage M:N relationships only when additional information (attributes) about the M:N relationship between the objects is required. In this case, GRADE is our additional information, so a GRADE is associated with a CLASS and a STUDENT. The intersection class is automatically included within the STUDENT and CLASS object space and represents the individual characteristics of the M:N relationship among them. For clarity’s sake we have labeled this new object as STU-REC. In this case the STU-REC object illustrates what students are in which Section, and in which Sections is the student registered? And what is the grade of each student registered in a Section? The object diagram in Figure PG.5D-2 shows such relationships:

Figure PG.5D-2 The Object Diagram for Problem PG.5d

STUDENT STU_NUM

STU_LNAME

STU_FNAME

STU_ADDRESS

STU_CITY

STU_STATE

STU_ZIPCODE

ADVISOR:

CLASS

STU_REC STUDENT:

PROFESSOR:

PROFESSOR

CLASS GRADE

COURSE

STUDENT CLASS:

COURSE:

ROOM:

ROOM

PROFESSOR SCHEDULE: M

Note: C = Character D = Date N = Numeric

ENROLL:

STU_REC

STU_REC STU_GPA

As you discuss Figure PG.5D-2, note that STU_REC (the student record) is the intersection class that represents the M:N relationship between STUDENT and CLASS.

882

Using an OO generalization, define a superclass Person for Student and Professor. Describe this new superclass and its relationship to its subclasses. A superclass PERSON can be defined for STUDENT and PROFESSOR. PERSON will contain the following attributes:

Attribute Name

Data Type

NAME

DOB

ADDRESS

STUDENT and PROFESSOR will inherit the above attributes from their superclass PERSON. The class hierarchy will look like Figure PG.5E.

Figure PG.5E The Class Hierarchy PERSON

Superclass

NAME DOB ADDRESS

Subclass

STUDENT

PROFESSOR

Inherited from PERSON

NAME DOB 1

ADDRESS ADVISOR:

NAME DOB ADDRESS DEPT_CODE

TEACH_LOAD: M

PROFESSOR CLASS SCHEDULE:

STU_REC STU_GPA

Note: C = Character D = Date N = Numeric

ADVISEES:

STUDENT

As you discuss Figure PG.5E, note the differences between inheritance and interclass relationships. Explain that: 

Inheritance is automatic.



Inheritance moves from top to bottom within the class hierarchy.



Inheritance represents a 1:1 relationship between the superclass and its subclass(es).



Inheritance need not be explicitly defined through the attribute data type.

883

In contrast, interclass relationships must be defined explicitly through the attribute’s data type. In addition, interclass relationships may represent a 1:1, a 1:M, or a M:N relationship. 1000. Convert the following relational database tables to the equivalent OO conceptual representation. Explain each of your conversions with the help of a diagram. (Note: The R&C Stores database includes the three tables shown in Figure PG.6.) Answer:

Figure PG.6 The R&C Stores Database

The conversion is shown in Figure PG.6-1.

884

Figure PG.6-1 The Completed OO Conceptual Representation for the R&C Stores Database STORE

REGION REGION_CODE

REGION_LOCATION C STORES:

STORE

EMPLOYEE

STORE_CODE

STORE_NAME

STORE_YTD_SALES REGION: 1 REGION

MANAGER: 1

EMPLOYEE Note: C = Character D = Date N = Numeric

WORKERS:

EMP_CODE

EMP_TITLE

EMP_LNAME

EMP_FNAME

EMP_INITIAL WORKS_AT:

STORE MANAGER_OF:

EMPLOYEE

STORE

Note that Figure PG.6-1 reflects the following conditions: 

Each REGION can have many STOREs.



The STORE object includes references to the REGION and EMPLOYEE objects. The EMPLOYEE object references reflect that an employee is a manager of a store and that each store employs many employees.



The EMPLOYEE object has reciprocal relationships with the STORE object. These relationships reflect that each employee works at one store and that each store is managed by one employee. The latter relationship makes STORE optional to EMPLOYEE, because not all employees manage a store.

1001. Convert the following relational database tables to the equivalent OO conceptual representation. Explain each of your conversions with the help of a diagram. (Note: The Avion Sales database includes the tables shown in Figure PG.7.)

885

Answer:

Figure PG.7 The Avion Sales Database

The OO representation is shown in Figure PG.7-1.

886

Figure PG.7-1 The Completed OO Conceptual Representation for the Avion Sales Database

CUSTOMER

INVOICE

CUS_NUM

CUS_LNAME

CUS_FNAME

CUS_INITIAL

CUS_CREDIT

CUS_BALANCE

INVOICES:

INVOICE

Note: C = Character D = Date N = Numeric

INV_ NUM CUSTOMER:

PROD_CODE

EMP_NUM

PROD_COST

EMP_TITLE

PROD_PRICE

EMP_LNAME

PROD_QOH

EMP_FNAME

PROD_MIN_QOH

EMP_INITIAL

CUSTOMER SALESREP: EMPLOYEE

EMPLOYEE

PRODUCT

INV_ DATE

INV_SUB

INV_TAX

INV_TOTAL

INV_PYMT

M LINES: INVLINE_NUM N 1 PRODUCT

PROD_LAST_ORDER D

EMP_HIRE_DATE D

LINES: INVLINE_NUM N 1 INVOICE

INVOICES:

INVOICE

INVLINE_UNITS N INVLINE_PRICE N INVLINE_TOTAL N

1002. Using the ERD shown in Appendix C, The University Lab Conceptual Design Verification, Logical Design, and Implementation, Figure C.22 (the Check_Out component), create the equivalent OO representation. Answer: Figure C.22 in Appendix C shows how the M:N relationship of USER and ITEM can be implemented by modeling this relationship through the Check_Out (bridge) entity. The OO representation of the M:N USER and ITEM relationship uses a CHECKOUT object. This object will have its own attributes and it will reference the USER and ITEM objects as shown in Figure PG.8. Note that the CHECKOUT object is a complex object that contains a group of repeating attributes: item, location, quantity, and date in.

887

Figure PG.8 The Completed OO Conceptual Representation for Figure C.22’s Check-Out Component

USER

CHECKOUT

USER_ID

ITEM

CO_ID

ITEM_ID

CO_DATE

ITEM_UNIV_ID

USER_CLASS

USER:

USER_SEX

USER_TYPE

DEPARTMENT:

DEPARTMENT CHECKOUT:

ITEM_SERIAL_NUM N

ITEM_DESCRIPTION C

USER CO_ITEMS: 1

ITEM 1 M

LOCATION

ITEM_QTY

ITEM_SATUS

ITEM_BUY_DATE

INV_TYPE:

INV_TYPE

CHECKOUT COI_QTY

COI_DATE_IN

VENDOR: 1 VENDOR

Note: C = Character D = Date N = Numeric

CHECKOUT:

CHECKOUT

1003. Using the contracting company’s ERD in Chapter 6, Normalization of Database Tables, Figure 6.16, create the equivalent OO representation. Answer: Figure 6.16 depicts the M:N relationship between EMPLOYEE and PROJECT. The object representation of this relationship is shown in Figure PG.9.

888

Figure PG.9 The Completed OO Conceptual Representation for Figure 6.16’s Contracting Company ERD

EMPLOYEE

ASSIGN

PROJECT

EMP_NUM

ASSIGN_NUM

PROJ_NUM

EMP_LNAME

ASSIGN_DATE

D 1

PROJ_NAME

PROJ_DATE ASSIGN:

EMP_LNAME

EMP_FNAME

EMP_INITIAL

EMP_HIREDATE JOB:

PROJECT 1

EMPLOYEE

ASSIGN

ASSIGN_HOURS N

JOB ASSIGN:

ASSIGN

JOB JOB_CODE

JOB_DESCRIPTION C JOB_CHG_HOUR

Note: C = Character D = Date N = Numeric

TABLE OF CONTENTS Answers to Review Questions .............................................................................................889 Answers to Problems ...........................................................................................................896

ANSWERS TO REVIEW QUESTIONS 1004.

What does e-commerce mean and how did it evolve? Answer: Electronic commerce (e-commerce) is the use of electronic computer-based technology to: 

Bring new products, services, or ideas to market.



Support and enhance business operations, including the sales of products and/or services over the web.

The Internet started in the early 1960s as a military project to ensure the survival of computer communications in case of nuclear attack. However, the Internet soon became the prime vehicle for sharing academic research, thus making higher education institutions the Internet’s primary users.

889

1005.



During the early 1960s, banks created a private telephone network to do electronic funds transfers (EFT). This service allowed two banks to exchange funds electronically in a fast, efficient, and secure manner.



During the early 1970s, banks also created services such as the Automated Teller Machine (ATM) to provide “after-hours” services to their customers. ATMs were initially installed by only a few banks nationwide and these banks permitted only a limited number of account transactions.



During the late 1970s and early 80s, there was a boom in the use of Electronic Data Interchange (EDI). EDI enables two companies to exchange business documents over private phone networks. The use of EDI facilitated the coordination of business operations between business partners.



The early 1980s and all 1990s brought us the personal computer, which triggered the Internet’s accelerated growth. The wide acceptance and use of the Internet led to the current dominance of the World Wide Web. The web made the transfer of information among multiple organizations as simple as a click away. The web also became a fertile ground for the exploration and exploitation of new Internet-based technologies for the enhancement of business processes within and between corporations.

Identify and briefly explain five advantages and five disadvantages of e-commerce. Answer: A summary of advantages of e-commerce: 

Comparison shopping



Reduced costs and increased competition



Convenience for online shoppers



24 × 7 × 365 operations



Global access



Lower barriers of entry



Increased market (customer) knowledge

A summary of the disadvantages:

1006.



Hidden costs of operation



Network unreliability



Higher costs of staying on business



Lack of security



Loss of privacy



Low service levels



Legal issues (fraud, copyright problems)

Define and contrast B2B and B2C e-commerce styles. Answer: E-commerce styles can be classified as:

890



Business to business (B2B): electronic commerce between businesses.



Business to consumer (B2C): electronic commerce between businesses and consumers.



Intrabusiness: electronic commerce activities between employers and employees.

Business-to-business (B2B) is a type of e-commerce in which a business sells products and/or services to other business. B2B refers to all types of electronic commerce transactions that take place between businesses. The seller is any company that sells a product or service, using electronic exchanges (such as over the Internet or using EDI). The buyer may be a not-for-profit company such as the Red Cross, a for-profit company such as Dell Computers, or a government organization such as a local municipality. A business-to-consumer (B2C) website sells products or services directly to consumers or end users. In B2C e-commerce, the main focus is on using the Internet—in particular, the web—as a marketing, sales, and post-sales support channel.

891

1007.

Describe and give an example of each of the two principal B2B forms. Answer: B2B Integration. In this scenario, companies establish partnerships to reduce costs and time, to improve business opportunities, and to enhance competitiveness. For example, a company that manufactures computers might partner with its suppliers for hard disks, memory, and other components. Such a partnership will help automate its purchasing system by integrating it with its suppliers’ ordering systems—which, in turn, will tie into their respective inventory systems. In this case, when a component in company “A” gets below the minimum, it will automatically generate an order to supplier “S.” Both systems would be integrated and would exchange business data, probably using XML. Using the same technique, Company “A” may also integrate its distribution system with its distributors. Finally, the distributors may integrate their operations with those of their retailers, which in turn may integrate their activities with those of their customers. As a consequence of such integration, companies (sellers) learn to operate with other companies (buyers) and the integration of their operations makes it possible to achieve a level of efficiency that makes it difficult to switch to another provider. B2B Marketplace. In this scenario, the objective is to provide a way in which businesses can easily search, compare, and purchase products and services from other businesses. The Web-based system will basically work as an online broker to service both buyers and sellers. Given such an environment, many of the activities are focused on attracting new members— either providers or buyers. The “broker” offers sellers a way to market their products and/or services to other businesses, while buyers are attracted by the fact that they can compare products from different buyers and get access to special deals that are offered only to the members. Given this B2B marketplace scenario, the broker obtains revenue through membership and transaction fees. An example of a B2B web market place for the automotive industry is the http://www.covisint.com website.

1008.

Describe e-commerce architecture, then briefly describe each of its components. Answer: The e-commerce architecture is composed of three layers: basic Internet services, business-enabling services, and e-commerce business services. The basic Internet services layer provides the technical foundation on which computers are connected and communicate with each other. The business-enabling services build on the basic Internet services to provide capabilities necessary for conducting transactions on the Internet, such as security, search, and content management. Finally, e-commerce business services provide the capabilities to implement business logic and operations, such as providing a virtual storefront, inventory management, shipping, and customer service systems.

1009.

What types of services are provided by the bottom layer of the e-commerce architecture? Answer: The bottom layer of the e-commerce architecture, known as the Internet Basic Services, describes the basic building blocks and services that are provided by the Internet and the World Wide Web. The Internet provides the basic services that facilitate the transmission of data and information between computers.

1010. Name and explain the operation of the main building blocks of the Internet and its basic services. Answer: The main building blocks of the Internet are: 

TCP/IP, the network protocol that determines the rules used to create and route packets of data between computers.

892



Router, the special hardware/software equipment that connects multiple and diverse networks. The basic services of the Internet are:



World Wide Web, a worldwide network collection of specially formatted and interconnected documents known as webpages.



Webpage, a document containing text and special commands (tags) written in Hypertext Markup Language (HTML).



Hypertext Markup Language (HTML), the standard document-formatting language for webpages.



Hyperlink, a link used by a webpage to call other webpages creating the effect of a web.



Uniform Resource Locator (URL), the address of a resource on the Internet.



Hypertext Transfer Protocol (HTTP), the standard protocol used by web browsers and web servers to communicate.



Domain Name Service, a service to translate English-like domain names into the appropriate TCP/IP addresses.



Web browser, an end-user application used to navigate through the Internet.



Web server, a specialized application whose function is to send requested webpages to the client browser.



Website, a web server and the collection of webpages stored on the server.



Static webpage, a webpage whose contents remain the same unless changed manually.



Dynamic webpage, a webpage whose contents are automatically created and tailored to the user’s request.



File Transfer Protocol (FTP), a protocol used to provide file transfer capabilities across the Internet.



Electronic mail (email), messages transmitted electronically among computers on the Internet.



News and discussion group services, specialized services that allow the creation of virtual communities in which users exchange messages regarding specific topics.

1011. What does business enabling do? What services layer does it provide? Give six examples of business-enabling services. Answer: The Internet Basic Services (IBS) only form a foundation on which to run a basic website. However, IBS does not provide the services required for even elementary business transactions. The business-enabling layer provides the additional services to better support business transactions.

893

Business transactions require accountability, reliability, authentication, trust, fidelity, and performance. These requirements are supported through hardware and software components that work together to provide the additional functionality not provided by the other layers. Table I.3 describes services that are used to enhance websites by providing their users the ability to perform searches, authenticate and secure business data, manage website contents, and so on. The list in Table I.3 is not exhaustive—technological advances enable new services, which in turn are used to enable additional business services. The business-enabling services are search services, security, site monitoring and analysis, load balancing, personalization, web development, database integration, transaction management, messaging, and support for multiple devices. The services provided by this layer are built on top of the Internet Basic Services to provide the additional services that are required to support business transactions. 1012. What is the definition of security? Explain why security is so important for e-commerce transactions. Answer: In an e-commerce context, security refers to all the activities that are associated with the protection of the data and other components against accidental or intentional (probably illegal) use by unauthorized users. Privacy deals with the rights of individuals and organizations to determine the “who, what, when, where and how” authorization to use his/her data. Providing security is a major concern of e-commerce. Companies spend millions of dollars annually on hardware and software equipment to protect their own data (including personal customer data) and property against criminal activities. For e-commerce to be successful, it must ensure the security and privacy of all business transactions and the data associated with those transactions. 1013. Give an example of an e-commerce transaction scenario. What three things should security be concerned with in this e-commerce transaction? Answer: E-commerce data must be secured from a transaction’s beginning to its conclusion. Note, for example, the following transaction sequence: 

A customer buying products online from home, enters order and credit card information in a merchant’s web page.



The information travels from the customer’s computer over the Internet to the merchant’s web server.



The merchant’s web server receives the order and credit card data and stores these data in a database.



The web server sends the order confirmation and shipping information back to the client.



The seller uses a third-party shipping company to deliver the products to the customer.



The seller uses a third-party payment processing company to settle payment.



The shipping company delivers the product to the customer.



At the end of the month, the customer receives his/her credit card statement— possibly electronically.

894



The customer pays the credit card bill, either by writing and mailing a check or through the use of electronic funds transfer.

These transaction components are easily illustrated with the help of Figure I.7, “A Sample E-Commerce Transaction.” Given the transaction scenario in Figure I.7, security (procedures and technology) deal with all activities required to: 

Warrantee the identity of the transaction’s participants by ensuring that both the buyer and the seller are who they say they are. In other words, it needs to exist a secure way to properly identify transaction participants and the authenticity of the messages.



Protect the transaction data from unauthorized modifications while it travels on the Internet. Because it is not feasible to have private lines to connect every two computers, we use the Internet. Unlike private lines that directly connect the sender with the receiver, the Internet is formed by millions of interconnected networks. Ecommerce data has to pass through several different networks in order to travel from the client to the server; this increases the risks of data being stolen, modified, or forged.



Protect the resources (data and computers). This includes protecting the end user and the business data stored on the web server and databases from unauthorized access. It also includes securing the web server against attacks from hackers wanting to break into the system with intentions of modifying or stealing data or of impairing normal operations by limiting the resource availability.

1014. You are hired as a resource security officer for an e-commerce company. Briefly discuss what technical issues you must address in your security plan. Answer: The security plan should include issues such as physical security of the computing environment and protection of the data in the databases. Online transaction security must also cover issues such as authentication, the use of digital certificates to ensure the identity of the parties involved in business transactions, and the use of public-key encryption with digital signatures to guarantee that the data traveling on the Internet cannot be tampered with or read by unauthorized parties. The security plan should include issues such as resource security and transaction security. Transaction security includes encryption methods at the transport level, such as S-HTTP and SSL. Resource security deals with protecting the resources (hardware and software) that enable the conduct of e-commerce—servers, routers, operating systems, and applications— against threats posed by hackers, viruses, theft, and so on.

895

ANSWERS TO PROBLEMS Use the Internet at your university computer lab or home to research the scenarios described in Problems 1–9. Then work through the following problems: Answer: a. What websites did you visit? b. Classify each site (B2B, B2C, and so on). c. What information did you collect? Was the information useful? Why or why not? d. What decision(s) did you make based on the information you collected? The format is provided in the answers to Problems 2–9. Naturally, the websites shown here change periodically, so use the examples as a general guide. Also, keep in mind that there are many sites beyond the sites we have shown in the answers to Problems 2–9.

896

1015. Research—and document—the purchase of a new car. Based on your research, explain why you plan to buy this car. Answer:

A CarGurus.com

B2C

Car models, features, comparisons, ratings, evaluations, dealer prices on new and used models.

Made an informed decision. Found the car most affordable, with best ratings and features. Capability of comparing car models and features.

Information was very useful. CarMax.com

B2C

AutoTrader.com

B2C

897

1016.

Research—and document—the purchase of a new house.

Answer:

A Century21.com

Zillow.com

ColdwellBanker.com

C2B Searched B2C multiple homes based on my search criteria. Websites provide information such as school systems, nearby attractions, city guides, comparable B2C house prices, and financing options.

D I was able to determine which home I could afford, and found a home within price range, locations, and features desired.

The sites also provided ways to place a home for sale. In addition, there were tips for buyers and sellers and price comparisons for C2B new and older B2C homes. Mortgage Calculators were available to help determine what the buyer can qualify for.

898

1017. You are in the market for a new job. Search the web for your ideal job. Document your job search and your job selection. Answer:

A Monster.com

C2B Search job B2C openings by location, industry, and salary range. Research employers, salary comparison.

Indeed.com

Option to post resume C2B and obtain resume B2B advice. B2C Resume writing service. Interview tips.

D Was able to fine-tune a job search. Applied for jobs that matched my qualifications and experience. Was able to research companies and to compare salaries in different geographic regions.

Salary calculators, relocation information and services. Moving information.

899

1018. You need to do your taxes. Download IRS form 1040 and look for online tax processing help, documenting your search. Answer:

A IRS.gov

Hrblock.com

G2C

Obtained information about latest tax laws, downloaded tax return forms. Learned how to file a tax return electronically.

Searched for tax advisors within my area.

B2C

Found information about IRS red flags for auditing. Found tax advice in many different areas, determined how much to save in multiple retirement instruments, etc.

Learned how retirement instruments can be used to save for retirement and to reduce the tax burden. Used many different tax calculators to estimate how much I will pay in taxes.

900

1019. Research the purchase of a 20-year level term life insurance policy and report your findings. Answer:

A Insurance.com

B2B

Searched for policies by state, age, smoking status, and amount.

Could find best possible deals in no time at all.

B2C

Obtained policy details, latest prices, and providers. Intelliquote.com

B2B B2C

TheZebra.com

Compared insurance company premiums and ratings companies.

B2C

901

1020.

Research—and document—the purchase of a new computer.

Answer:

Dell.com

B2C

Searched for computers, compared prices, options, and warranty information.

PCPartPicker.com

B2C

I was able to find my computers at the right price, with the right Could configure features, and with my computer the best according to warranty. my specifications. Obtained leasing and credit term information. Compare prices in new and refurbished computers.

IBuyPower.com

B2C

902

1021. Vacation time is almost here! Research—and document—the destination(s) and activities of next summer’s vacation. Answer:

A Travelocity.com

PriceLine.com

B2C Found information in vacation packages with all-inclusive features, such as air fare, hotel accommodations, and guided-tour details.

I was able to search for (and find) multiple tour vacation packages that fit my criteria.

I was able to B2C perform searches by destination, travel dates, tour operators, etc.

The information provided was also useful to completely plan the vacation and do all booking online.

I could compare prices for tours and hotels. I was also able to find special deals and offers to various destinations. Expedia.com

B2C I could do all the trip planning online and get additional information such as currency conversions, city guides, comments from past-users, weather information, etc.

903

1022. You have some money to invest. Research—and document—mutual funds information for investment purposes. Report your investment decision(s) based on the research you conduct. Answer:

A Vanguard.com

Fidelity.com

Morningstar.com

B2C

Obtained information about investing in the stock market, mutual funds markets, and markets for other instruments.

Was able to determine the best investment strategy to fit my risk tolerance.

B2C

Search yielded comparative fund information such as returns, expense ratios, ratings, market capitalization, family type, price history, etc.

Obtained all critical information appropriate to my investment needs. Enabled me to manage all of my investments online.

Obtained list of best-rated funds according to search criteria.

904

TABLE OF CONTENTS Answers to Review Questions .............................................................................................908 Answers to Problems ...........................................................................................................910 Discussion Focus Here is a good opportunity to take a look at the “big picture” of Internet database development. Review the main points in Chapter 15, “Database Connectivity and Web Technologies,” and Appendix J, “Web Database Development with ColdFusion.” Specifically, focus on: 

Different database connectivity technologies.



Multitier architecture for database development.



How web-to-database middleware is used to integrate databases with the Internet.

Rather than showing you long code listings in this manual, we guide you through the solution steps as they are found in the script files located on the Instructor Resources website. Using this technique, the student can step through the solutions and see the code simultaneously. Figure DJ.1 shows the RobCor ColdFusion application’s main menu.

Figure DJ.1 RobCor Teacher Menu

The Solutions to ColdFusion Problems section is a menu-driven system that guides you through all of the solutions for this appendix. For example, if you click on the Solutions to ColdFusion Problems link (see Figure DJ.1) you will open the page shown in Figure DJ.2.

Figure DJ.2 RobCor Problem Solutions Menu

905

If you click on the View link for the first row shown in Figure DJ.2, you will see the code for the rc_u0.cfm script in Figure DJ.3.

906

Figure DJ.3 rc-u0.cfm Script Code Sample

907

ANSWERS TO REVIEW QUESTIONS 1023.

What are scripts, and how are they created in ColdFusion? Answer: Scripts are a series of instructions interpreted and executed at run time. Scripts are used in web-database application development to instruct the application server components on what actions to do, such as connect, query, and update a database from a web front end. Scripts are, for the most part, transparent to the clients. The application developer must create scripts to access the database and to create the web pages dynamically. The application server executes the scripts and passes the results (output) to the web server in HTML format.

1024.

Describe the basic services provided by the ColdFusion web application server. Answer: The ColdFusion Web Application Server provides the following services (among others): 

Integrated Development Environment.



Session management with support for persistent application variables.



Security and authentication.



A computationally complete programming language (commands and functions) to represent and store business logic.



Access to other services: FTP, SMTP, IMAP, POP, and so on.

1025. Discuss the following assertion: The web is not capable of performing transaction management. Answer: Note the discussion in Section J.2c, Transaction Management. The concept of database transactions is foreign to the web. Remember that the web’s request-reply model means that the web client and the web server interact by using very short messages. Those messages are limited to the request for and delivery of pages and their components. (Page components may include pictures, multimedia files, etc.) The dilemma created by the web’s request-reply model is that: 

The web cannot maintain an open line between the client and the database server.



The mechanics of recovery from incomplete or corrupted database transactions require the client to maintain an open communications line with the database server.

1026. Transaction management is critical to the e-commerce environment. Given the assertion made in Question 3, how is transaction management supported? Answer: Clearly, creating mission-critical web applications mandates support for database transaction management capabilities. Given the just-described dilemma, designers must ensure proper transaction management support at the database server level. Many web-to-middleware products provide transaction management support. For example, ColdFusion provides this support through the use of its CFTRANSACTION tag. If the transaction load is very high, this function can be assigned to an independent computer. By using that approach, the web application and database servers are free to perform other tasks, and the overall transaction load is distributed among multiple processors.

908

1027. Describe the webpage development problems related to database parent/child relationships. Answer: When the web is used to interact with databases, the application design must take into account the fact that the HTML web forms cannot use the multiple data entry lines that are typical of parent/child (1:M) relationships. Yet those 1:M relationships are crucial in ecommerce. For example, think of order and order line, or invoice and invoice line. Most end users are familiar with the conventional GUI entry forms that support multitable (parent/child) data entry through a multiple-component structure composed of the main form and a subform. Using such main form/subform forms, the end user can enter multiple purchases associated with a single invoice. All data entry is done on a single screen. Unfortunately, the HTML-only web environment does not support this very common type of data entry screen. As illustrated in the ColdFusion script examples, the web can easily handle single-table data entry. However, when multitable data entries or updates are needed—such as order with order lines, invoice with invoice lines, and reservation with reservation lines— the web falls short. Although implementing the parent/child data entry is not impossible in a web environment, its final outcome is less than optimum, usually counterintuitive, less userfriendly, and prone to errors. To see how the web developer might deal with the parent/child data entry, let’s briefly examine how you might deal with the ORDER and ORDER_LINE relationship used to store customer orders. Using an applications middleware server such as ColdFusion to create a web front end to update orders, one or more of the following techniques might be used: 

Design HTML frames to separate the screen into order header and detail lines. An additional frame would be used to provide status information or menu navigation.



Use recursive calls to pages to refresh and display the latest items added to an order.



Create temporary tables or server-side arrays to hold the child table data while in the data entry mode. This technique is usually based on the bottom-up approach in which the end user first selects the products to order. When the ordering sequence is completed, the order-specific data, such as customer ID, shipping information, and credit card details, are entered. Using this technique, the order detail data are stored in the temporary tables or arrays.



Use stored procedures or triggers to move the data from the temporary table or array to the master tables.

Although the web itself does not support the parent/child data entry directly, it is possible to resort to web programming languages such as Java, JavaScript, or VBScript to create the required web interfaces. The price of that approach is a steeper application development learning curve and a need to hone programming skills. And while that augmentation works, it also means that complete programs are stored outside the HTML code that is used in a website.

909

ANSWERS TO PROBLEMS In the following exercises, you are required to create ColdFusion scripts. When you create these scripts, include one main script to show the records and the main options, for a total of five scripts for each table (show, search, add, edit, and delete). Consider and document foreign key and business rules when creating your scripts. Create ColdFusion scripts to search, add, edit, and delete records for the USER table in the RobCor data

NOTE The following pages show sample ColdFusion scripts that are required by the problem set. To avoid repetition and to save space, we have illustrated only one example of each script type (select, insert, update, and delete). Use the Instructor’s Resources website to access the complete list of ColdFusion scripts. To install ColdFusion and the scripts, follow the instructions in the ColdFusion_Setup.doc. source. Answer: This exercise series requires the student to create the data manipulation scripts for the USER table. The logic used in these scripts is the same as the one shown in Appendix J, “Web Database Development with ColdFusion.” Please refer to the solution scripts found on the Instructor’s Resource website. To produce a user-friendly environment, we have created a menu to access all table database operations. The menu lets the user add, edit, delete, and search records in the table.

NOTE The focus of this chapter is on the database operations logic rather than on HTML. This is not a web development textbook. Therefore, we will show the code (or parts of it) but focus mostly on the ColdFusion database commands. The menu script named rc-u0.cfm produces the output shown in Figure PJ.1.

Figure PJ.1 The User Management Menu

910

Figure DJ.3 shows the code for the User Management Menu shown in Figure PJ.1. Notice the CFQUERY statement in lines 2 to 4. This query retrieves the user records and allows users to select the user record to Edit or Delete. Please note that the same process is used in all scripts. In short, HTML commands are used to collect or present the data, then the form data is passed to another script that uses ColdFusion commands to insert, update, or delete rows in the respective tables. In summary: 



To add records: 

A blank form is shown using form field names that match the table field names.



The data is collected and sent to the next script that uses CFINSERT to add the record.

To edit records: 

A script uses a CFQUERY command to read the existing records, then the user selects the record to edit.



The script calls another script and passes the primary key of the record to edit. This new script reads the data using CFQUERY, presents the data in a form, and the end user updates the data in the form.



Finally, this script calls another script that uses the CFUPDATE command to update the data using the form fields with names matching the table’s fields.

To delete records: 

A script uses a CFQUERY command to read the existing records, then the user selects the record to delete and passes the primary key of the record to delete to the next script.



The next script reads the data, presents the data in a form, and verifies the data can be deleted.



Finally, this script calls another script that uses the CFDELETE command to delete the record using the table’s primary key passed by the previous script.

The following pages list scripts that are required to add, edit, delete, and search records.

INSERTING RECORDS IN THE USER TABLE The rc-ua1.cfm script is shown in Figure PJ.2a and produces the data entry screen shown in Figure PJ.2c. This data entry screen consists of an HTML form that contains several input boxes to enter the data. Because the ColdFusion script for this webpage is more than 100 lines long, we will show you some extracts of the code to illustrate the main functions related to the database.

911

Figure PJ.2a rc-ua1 Script Code Sample

Lines 2 to 4 in the rc-ua1.cfm script generates a list of departments to be used in the HTML select statement shown in lines 45 to 49 in Figure PJ.2b.

912

Figure PJ.2b rc-ua1—The Add User code (continued)

The output of the rc-ua1.cfm script is shown in Figure PJ.2c below.

Figure PJ.2c The Add User Web Form

913

When the user clicks the Add Record button, the script rc-ua2.cfm is invoked. Figure PJ.3a shows the rc-ua2.cfm script code used to insert a record. The rc-ua2.cfm script uses the CFINSERT tag (Line 9) to add a row to the table. The rc-ua1.cfm page passes all the form variable values to rc-ua2.cfm page. Then, the CFINSERT command takes all the form variables with names matching the table’s field names and saves the new record using those values.

Figure PJ.3a The rc-ua2.cfm Add User code

The output of the rc-ua2.cfm script is shown in Figure PJ.3b.

Figure PJ.3b The Add User Results

914

UPDATING RECORDS IN THE USER TABLE The rc-ue1.cfm script will show the data for the USER_ID selected by the end user in the rc-u0.cfm script. In addition, the script lets the end user modify the selected data. Figure PJ.4a shows the script’s output.

Figure PJ.4a The Edit User Form

Lines 18 to 31 in the rc-u0.cfm script (see Figure DJ.3) is an HTML form that requires the user to select a user from the list. This is required for the Edit and Delete options only. Figure PJ.4b shows an extract of rc-ue1.cfm script code.

915

Figure PJ.4b rc-ue1.cfm Edit User Script (partial)

Let’s study this script to understand it better: 

Lines 3 to 5 determine if the end user clicked on the Edit or Delete buttons in the previous script. If the end user clicked on the Delete button, the rc-ud1.cfm script is executed. Otherwise, it continues in the rc-ue1.cfm script.



Lines 6 to 8 use the CFQUERY Cold Fusion command to execute the SELECT SQL statement that reads the user data for the selected user.



Starting in line 18 an HTML form shows the user data for the selected user and allows the end user to edit it. Notice the CFOUTPUT command in line 20. This command tells ColdFusion to use the data read by the “UserEdit” CFQUERY command in the form fields.

Clicking on the Edit button will trigger the rc-ue2.cfm script, which runs the CFUPDATE tag to update the database using the form’s USR_ID value. The rc-ue2.cfm script produces the output shown in Figure PJ.5a. A sample of the rc-ue2.cfm script is shown in Figure PJ.5b.

916

Figure PJ.5a The Edit User Results Screen

Figure PJ.5b rc-ue2.cfm Edit User Code (partial)

917

Deleting Records from the USER Table The rc-ud1.cfm script produces the results shown in Figure PJ.6a. Notice the message that starts with “We cannot delete this record…”. If the record can be deleted, a Delete button will appear instead.

Figure PJ.6a The Delete User Form

The rc-ud1.cfm script is shown in Figures PJ.6b and PJ.6c. Lines 3 to 5 read the user data. Lines 6 to 9 check if the user is Department’s manager. Lines 10 to 13 check if the user has placed an order. Lines 167 to 172 ensure that the Delete button is shown only if this user is not a department manager and has no orders in the ORDERS table. Otherwise, the script will not allow you to delete the USER record. After clicking on the Delete button, the rc-ud2.cfm script is invoked.

918

Figure PJ.6b rc-ud1.cfm Script (partial)

Figure PJ.6c rc-ud1.cfm Script (partial)

The rc-ud2.cfm script is shown in Figure PJ.7a. Notice the use of the CFQUERY Cold Fusion command in lines 7 to 8 to execute the DELETE SQL statement to delete the selected user record.

919

Figure PJ.7a rc-ud2.cfm Script Code

The rc-ud2.cfm script produces the output shown in Figure PJ.7b.

Figure PJ.7b The Delete User Results Screen

Searching for Records in the USER Table The rc-us1.cfm script produces the output shown in Figure PJ.8a. This script uses a CFQUERY ColdFusion command to execute a SELECT statement to retrieve the existing User IDs from the USER table. The result of this command is shown in the drop-down box shown in the form.

Figure PJ.8a The Search User Form

Figure PJ.8a’s very simple screen lets the end user enter a last name to conduct a search in the USER table—or the end user may select a user ID as the search key. An extract of the rcus1.cfm script code is shown in Figure PJ.8b.

Figure PJ.8b rc-us1.cfm Script Code (partial)

920

Clicking on the Search button invokes the rc-us2.cfm script. The rc-us2.cfm script produces the output shown in Figure PJ.9a.

921

Figure PJ.9a The Search User Results Screen

The rc-us2.cfm script (partial code) is shown in Figure PJ.9b.

922

Figure PJ.9b rc-us2.cfm Script Code

Notice the use of the CFQUERY command and SELECT statement in lines 5 to 11. This technique is commonly used to concatenate conditional statements to a query. The query will execute and return zero, one, or many rows. 1028. Create ColdFusion scripts to search, add, edit, and delete records for the INVTYPE table in the RobCor data source. Answer: This exercise series requires the student to create the data manipulation scripts for the INVTYPE table. The logic used in these scripts is the same as the one shown in Problem 1 above. Please refer to the solution scripts found on the Instructor’s Resource website. The main difference is that the scripts use the table INVTYPE and their respective fields in the forms. The scripts use Cold Fusion statements to insert, update, and delete rows in the INVTYPE table. The main data manipulation commands are: <CFINSERT DATASOURCE=”RobCor” TABLENAME=”INVTYPE”> <CFUPDATE DATASOURCE=”RobCor” TABLENAME=”INVTYPE”> <CFQUERY DATASOURCE=”RobCor”> DELETE FROM INVTYPE WHERE …… </CFQUERY> A sample data entry screen for the INVTYPE record is shown in Figure PJ.10.

Figure PJ.10 INVTYPE Add Record Screen

923

1029. Create ColdFusion scripts to search, add, edit, and delete records for the VENDOR table in the RobCor data source. Answer: This exercise series requires the student to create the data manipulation scripts for the VENDOR table. The logic used in these scripts is the same as the one shown in Problem 1 above. Please refer to the solution scripts found on the Instructor’s Resource website. The main difference is that the scripts use the table VENDOR and their respective fields in the forms. The scripts use Cold Fusion statements to insert, update, and delete rows in the VENDOR table. The main data manipulation commands are: <CFINSERT DATASOURCE=”RobCor” TABLENAME=”VENDOR”> <CFUPDATE DATASOURCE=”RobCor” TABLENAME=”VENDOR> <CFQUERY DATASOURCE=”RobCor”> DELETE FROM VENDOR WHERE …… </CFQUERY> A sample data entry screen for the VENDOR record is shown in Figure PJ.11.

Figure PJ.11 VENDOR Add Record Screen

924

1030. Modify the insert scripts (rc-5a.cfm and rc-5b.cfm) for the DEPARTMENT table so that the users who can be manager of a department are only those who belong to that department. Answer: This exercise series requires the student to create the data manipulation scripts for the DEPARTMENT table. The logic used in these scripts is the same as the one shown in Problem 1 above. Please refer to the solution scripts found on the Instructor’s Resource website. The main difference is that the scripts use the table DEPARTMENT and their respective fields in the forms. The scripts use Cold Fusion statements to insert, update, and delete rows in the DEPARTMENT table. The main data manipulation commands are: <CFINSERT DATASOURCE=”RobCor” TABLENAME=”DEPARTMENT”> <CFUPDATE DATASOURCE=”RobCor” TABLENAME=”DEPARTMENT”> <CFQUERY DATASOURCE=”RobCor”> DELETE FROM DEPARTMENT WHERE …… </CFQUERY> Please note that the data management commands for this exercise are included in the ColdFusion Examples scripts. Note that the insert script (rc-5a.cfm and rc-5b.cfm) for the DEPARTMENT table only lists the users that can be manager of a department. The key to this script is in the SELECT SQL statement. (Note the condition used in the WHERE clause. This condition lists only those users who are not already managers of a department.) <CFQUERY NAME="USRLIST" DATASOURCE="RobCor"> SELECT USR_ID, USR_LNAME, USR_FNAME, USR_MNAME FROM USER WHERE USR_ID NOT IN (SELECT USR_ID FROM DEPARTMENT WHERE USR_ID > 0) ORDER BY USR_LNAME, USR_FNAME, USR_MNAME </CFQUERY> The rc-5a.cfm script produces the output shown in Figure PJ.12.

925

Figure PJ.12 The Department Data Entry Screen

Script rc-5b.cfm script uses a CFINSERT tag to add the data to the database. The script rc5b.cfm output is shown in Figure PJ.13.

Figure PJ.13 rc-5b.cfm—the Department Insert Query

1031. Create an Order data-entry screen, using the ORDERS and ORDER_LINE tables in the RobCor data source. To do this, you can use frames and other advanced ColdFusion tags. Consult the online manual and review the demo applications.

926

NOTE This is an advanced project that requires proficient knowledge of HMTL and other technologies. This could be used as a project for graduate students or as a group project that several students can work on. In fact, there are several ways to accomplish this task, depending on the student’s background. Some could use HTML, CSS, JavaScript, Visual Studio.Net, Java, and so on. The main purpose of this project is to get the student to understand the implications of working on the web as an application development platform and its implicit limitations. Answer: Although Chapter 15 provides the basis for your students to develop web to database interfaces, it does not cover all the components required to complete this problem. Therefore, you might pitch this problem at students who have some prior web development experience. (Or perhaps you used supplemental material to examine our database design and implementation material from an applications development point of view!) Even if your students do not (yet) have the appropriate web application development skills, they will find the following discussion interesting and useful for several reasons. First, they will have a chance to revisit some important database design and implementation issues in a web environment. Second, they will be exposed to some of the details of web database applications development. Finally, they may even store this problem into their minds, to be dusted off when they take web-based classes! From a design standpoint, the developer can approach this problem from several different angles: 

Create a multiple frame page that will have one frame for the ORDER header information and another for the ORDER_LINE data entry. In this case, the ORDER data will have to be entered, validated, and saved first, before the ORDER_LINE frame is shown or accessed. Once the main ORDER data are saved, the second frame can be used to enter the ORDER_LINE rows. Both frames use buttons that will enable the system to accept data entry and to perform validation checks. This solution is not particularly well suited to a commercial e-commerce environment for two good reasons: 1. The end-user navigation among frames is awkward and is likely to be rejected by end users. 2. Keeping both frames synchronized is difficult and, unless the coding is particularly robust, is prone to failure.

927



A second way to tackle this problem is to borrow the typical “shopping cart” style used by most online stores. Students can go to Amazon, eBay, or to any other online store to step through the process of purchasing a product online. This process usually starts with the selection of all of the desired items and then progresses to the payment component. In other words, the process first collects the order line data and then, at its conclusion, collects the (invoice) payment data. Given this scenario, the browser must use temporary tables to store the data for the orders in progress. Later, such data are used to update the production database. In between, business logic is used to validate the data and to save such data in the proper format. The business logic is implemented using CGI programs such as PERL. In addition, the business logic component usually employs stored procedures in the database environment. This solution is generally preferred by online stores because it is based on well-established and proven technology. (In fact, you can even buy applications that provide the entire shopping cart feature straight out of the box!)



Using web programming frameworks (such as Visual Studio.NET or Java) may also solve the problem. This approach requires that the complete application be downloaded to the client’s computer to be run locally. Given this scenario, the end user develops an interface that can handle the one-to-many simultaneous data entry format. The entire application logic is then sent to and executed by the client side. The client application is connected to the back-end database through the web. This solution is similar to those offered by high-end web-to-database middleware products such as NetObjects, IBM’s WebSphere, or Oracle 9i Web database. In most cases, the application will be in the form of a Java applet that works in tandem with a server applet.

The rc-oa1.cfm script (found in the Instructor’s Resource website) produces the output shown in Figure PJ.14.

Figure PJ.14 The ORDER Data Entry Screen

There are a few things to mention on this web page. For example, the end-user could enter data for most fields. However, fields such as Invoice Number and Total Cost will be display-only and won’t allow end users to update these fields. The Invoice Number field will be automatically updated when the invoice is generated. The Total Cost field is automatically updated as the end user enters ORDER_LINE rows. The exact mechanics of how that would work depends on the development framework used and they are beyond the scope of this textbook.

928

TABLE OF CONTENTS Answers to Review Questions .............................................................................................929 Answers to Problems ...............................................................................................................3

NOTE Most of this appendix, and all of the end-of-appendix problems, require the use of the Ch14_FACT.json file. The Appendix provides instructions on how to import this file into a MongoDB database as a collection. The documents in this file is a reduced version of the data from the Ch07_FACT database used in Chapter 7. It can be helpful to draw to students’ attention that this is a reduced data set compared to Chapter 7. The reason this is a reduced set is because it is limited to a specific intended application. Refer students back to Chapter 14, Big Data and NoSQL, to remind them that document databases like MongoDB are aggregate aware. Therefore, the data are organized into documents with a great deal of redundancy across documents, but in a manner that reduces the number of documents that need to be accessed during the processing of a transaction.

ANSWERS TO REVIEW QUESTIONS 1032. What is the difference between a replacement update and an operator update in MongoDB? Answer: In MongoDB a replacement update will replace the entire document being updated. If the existing document has key:value pairs that are not included in the update command, then those pairs are lost. Only the pairs specified in the update command will exist in the replaced version of the document. With an operator update, the existing document is unchanged except for the changes specified in the update command. Pairs not included in the update command are not affected. 1033.

Explain what an upsert does. Answer: Upsert is a combination insert / update. If an existing document is found that matches the criteria given, then an update is performed on that document using the key:value pairs specified in the command. If an existing document is not found that matches the criteria given, then an insert is performed to create a document with the key:value pairs specified in the command.

929

1034.

Describe the difference between using $push and $addToSet in MongoDB. Answer: Both commands are used to add a value to an array. The $push command will always add the value to the array, even if it results in duplicate values in the array. The $addToSet command will only add the value to the array if adding it does not result in duplicate values in the array.

1035.

Explain the functions used to enable pagination of results in MongoDB. Answer: Results can be provided in pages of information by using limit() and skip() functions. The limit() function specifies how many results to return. The skip() function allows the programmer to provide an offset of documents before the limit is applied.

1036. Explain the difference in processing when using an explicit and and an implicit and with MongoDB. Answer: With both forms of logical and the DBMS must apply criteria to a document to determine if the document should be included in the results. An explicit and, using the $and operator, will determine that a document should not be included and stop applying criteria to that document as soon as one of the criteria evaluates to FALSE for that document. An implicit and will apply all criteria to the document before determining if the document should be included or not in the results. As a result, explicit and tends to perform better in most cases.

930

ANSWERS TO PROBLEMS For the following set of problems, use the fact database and patron collection created in the text for use with MongoDB. Create a new document in the patron collection. The document should satisfy the following requirements: Answer: First name is “Rachel” Last name is “Cunningham” Display name is “Rachel Cunningham” Patron type is student Rachel is 24 years old Rachel has never checked out a book Be certain to use the same keys as already exist in the collection. Be certain capitalization is consistent with the documents already in the collection. Do not store any keys that do not have a value (in other words, no NULLs). db.patron.insertOne( {

fname: "Rachel", lname: "Cunningham", display: "Rachel Cunningham", type: "student", age: 24

} ); 1037. Modify the document entered in the previous question with the following data. Do not replace the current document. Answer: Rachel has checked out two books on January 25, 2018. The id of the first checkout is “95000” The first book checked out was book number 5237

931

Book 5237 is titled “Mastering the database environment” Book 5237 was published in 2015 and is in the “database” subject The id of the second checkout is “95001” The second book checked out was book number 5240 Book 5240 is titled “iOS Programming” Book 5240 was published in 2015 and is in the “programming” subject Use the same keys as already exist within the collection. Conform to the existing documents in terms of capitalization. db.patron.updateOne( {

"_id": ObjectId("5a45c23f395ff183e78d9c17")},

{

$set:

{checkouts:

[ {

“id”: "95000", “year”: "2018", “month”: "1", “day”: "25", “book": “5237", "title": "Mastering the database environment", “pubyear”: "2015", “subject”: "database"

}, {

"id": "95001", "year": "2018", "month": "1", "day": "25", “book”: "5240", “title”: "iOS Programming", “pubyear”: "2015",

932

“subject”: "Programming" } ] } } ) 1038. Write a query to retrieve the _id, display name and age of students that have checked out a book in the cloud subject. Answer: db.patron.find({"checkouts.subject":"cloud"}, {display:1, age:1}) 1039. Write a query to retrieve only the first name, last name, and type of faculty patrons that have checked out at least one book with the subject “programming”. Answer: db.patron.find({type: "faculty", "checkouts.subject":"programming"}, {fname:1, lname:1, type:1, _id:0}) 1040. Write a query to retrieve the documents of patrons that are faculty and checked out book 5235, or that are students under the age of 30 that have checked out book 5240. Display the documents in a readable format. Answer: db.patron.find({$or: [ {type: "faculty", "checkouts.book":"5235"}, {type: "student", "checkouts.book":5240, age: {$lt:30}} ] } ).pretty()

933

1041. Write a query to display only the first name, last name, and age of students that are between the ages of 22 and 26. Answer: db.patron.find({ type:"student", $and: [{age: {$gte:22}}, {age: {$lte:26}} ] }, {fname:1, lname:1, age:1, _id:0} )

TABLE OF CONTENTS Answers to Review Questions .............................................................................................934 Answers to Problems ...............................................................................................................2

NOTE Most of this appendix and all of the end-of-appendix problems require the use of the Ch14_FCC.txt file. The Appendix provides instructions on how to import this file into a Neo4j graph.

ANSWERS TO REVIEW QUESTIONS 1042. Explain the difference between using the same variable name and different variable names when matching multiple patterns in Neo4j. Answer: Within a given command, all references to a variable are treated as references to the same object (node, edge, or path). Therefore, if the same variable is used in multiple patterns in the same command, then the same node or edge will be required to match both patterns. If different variable names are used, then the node or edge does not have to be the same node or edge in both patterns. 1043. What is the difference between using WHERE and embedding properties in a node when creating a pattern in Neo4j?

934

Answer: Embedded properties are much more limited. Embedded property specifications are treated as using an equality operator and combined using a logical AND. With a WHERE clause, other operators in addition to equality can be used such as less than, greater than, substrings, and so on. Also, criteria in a WHERE clause can be combined with OR connectors as well as AND.

935

ANSWERS TO PROBLEMS For the following problems, use the Food Critics Club (FCC) graph database that was created and used earlier in the text for use with Neo4j. Create a node that meets the following requirements. Use existing labels and property names as appropriate. Answer: The node will be a member, and should be labeled as such, with member id 5000. The member’s name is “Abraham Greenberg”. Abraham was born in 1978, and lives in the state of “OH”. Abraham’s email address is agreen@nomail.com, and his username is agberg. Create (:Member {mid:5000, fname: “Abraham”, lname: “Greenberg”, birth: 1978, state: “OH”, email: “agreen@nomail.com”, username: “agberg” } ) 1044. Create a restaurant node with restaurant id is 10000, the name “Hungry Much”, and located in Cobb Place, KY. Answer: Create (:Restaurant {rid: 10000, name: “Hungry Much”, state: “KY”, city: “Cobb Place”}) 1045. Update the “Hungry Much” restaurant created above to add the phone number “(931) 555-8888”, and a price rating of 2. Answer: Match(r :Restaurant {name: “Hungry Much”}) Set r.phone = “(931) 555-8888”, r.price = 2 1046. Create a REVIEWED relationship between the member created above and the restaurant created above. The review should rate the restaurant as a 5 on taste, service, atmosphere, and value. Answer: © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

936

Match (abe :Member {fname: “Abraham”, lname: “Greenberg”}), (hungry :Restaurant {name: “Hungry Much”}) Create (abe) -[rev :REVIEWED {taste: 5, service: 5, atmosphere: 5, value: 5}]-> (hungry) 1047. Create a REVIEWED relationship between member Frank Norwood and the restaurant created above. The review should rate the restaurant as a 4 on taste, service, and value, and rate the restaurant as a 2 on atmosphere. Answer: Match(frank :Member {fname: “Frank”, lname: "Norwood”}), (hungry :Restaurant {name: “Hungry Much”}) Create(frank) -[rev :REVIEWED {taste:4, service: 4, atmosphere: 2, value: 4}]-> (hungry) 1048. Write a query to display member Frank Norwood and every restaurant that he has rated as a 4 or above on value. Answer: Match (frank :Member {fname: “Frank”, lname: “Norwood”}) -[rev :REVIEWED]-> (rest :Restaurant) Where rev.value >= 4 Return frank, rest 1049. Write a query to display cuisine, restaurant, and owner for every “American” or “Steakhouse” cuisine restaurant. Answer: Match (c :Cuisine) <-- (rest :Restaurant) <-[:OWNS]- (o :Owner) Where c.name = “American” OR c.name = “Steakhouse” Return c, rest, o 1050. Write a query to return the shortest path based only on reviews between members Abraham Greenberg and Herb Christopher. Answer: Match p = shortestPath((abe :Member {fname: “Abraham”, lname: “Greenberg”}) -[:REVIEWED *](herb :Member {fname: “Herb”, lname: “Christopher”})) Return p

937

ANSWERS TO REVIEW QUESTIONS 1051. Why must a conceptual model be verified? What steps are involved in the verification process? Answer: The verification of a conceptual model is crucial to a successful database design. The verification process allows the designer to check the accuracy of the database design by: 

Re-examining data and data transformations.



Enabling the designer to evaluate the design efficiency relative to the end user’s and system’s design goals.

Validating the model’s entities. (Remember the minimal data rule.)



Confirming entity relationships and eliminating duplicate, unnecessary, or improperly defined relationships.



Eliminating data redundancies.



Improving the model’s semantic precision to better represent real-world operations.



Confirming that all user requirements (processing, performance, or security) are met.

938

5. Identify the database’s central entity. The central entity is the most important entity in our database, and most of the other entities depend on it. 6. Identify and define each module and its components. The designer divides the database model into smaller sets that reflect the data needs of particular systems modules such as inventory, orders, payroll, and so on. 7. Identify and define each of the module’s processes. Specifically, this step requires the identification and definition of the database transactions that represent the module’s real-world operations. 8. Verify each of the transactions against the database. 1052. What steps must be completed before the database design is fully implemented? (Make sure that you list the steps in the correct sequence and discuss each step briefly.) Answer: The DBLC, discussed in detail in Chapter 9, “Database Design,” constitutes a database’s history, tracing it from its conceptual design to its implementation and operation. We highly recommend that the database designer follow the DBLC’s steps carefully in order to ensure that the database will properly meet all user and system requirements. Before a database can be successfully implemented, the following steps must be completed: 7. Define the conceptual model’s components: entities, attributes, domains, and relationships. 8. Normalize the database to ensure that all transitive dependencies are eliminated and that each entity’s attributes are solely dependent on its key attribute(s). 9. Verify the conceptual model to ensure that the proposed database will meet the system’s transaction requirements and that the end-user and systems requirements will be met. The verification process will probably delete and/or create entities, attributes, and relationships. It may also refine existing entities, attributes, and relationships. 10. Create the logical design that requires the definition of the table structures, using a specific DBMS (relational, network, or hierarchical). Logical design also includes, if necessary, appropriate indexes and views. 11. Create the physical design to define access paths, including space allocation, storage group creation, table spaces, and any other physical storage characteristic that is dependent on the hardware and software to be used in the system’s implementation. 12. Implement the design. Somehow, this last step seems to suffer from planning neglect, to the detriment of the system’s operation. Implementation, operation, and maintenance plans must (at least) include careful definition and description of the activities required to implement the database design: 

Loading and conversion



Definition of database standards



System and procedures documentation: security, backup, and recovery

939



Operational procedures to be followed by users



A detailed training plan



Identification of responsibilities for operation and maintenance

1053. What major factors should be addressed when database system performance is evaluated? Discuss each factor briefly. Answer: Database systems performance refers to the system’s ability to retrieve information within a reasonable amount of time and at a reasonable cost. Keeping in mind that “reasonable” means different things to different people, we must address at least these important performance factors: 

Concurrent users For any given system, the more users connected to the system, the longer the data retrieval time.



Resource limits The fewer resources that are available to the user, the longer the access queues will be.



Communication speeds Lower communication speeds mean longer response times.



Although the preceding discussion is focused on the speed aspect of performance, there are other equally important issues that must be considered. A successful database implementation requires a balanced approach to all database issues, including concurrency control, query response time, database integrity, security, backup and recovery, data replication, and data distribution. 1054. How would you verify the ER diagram shown in Figure QC.4? Make specific recommendations. Answer:

940

Figure QC.4 The ERD for Question 4

The verification process must include the following steps: 8. Identify and define the main entities, attributes, and domains. In this case, the main entities are PARTS, SUPPLIER, PRODUCT, and CUSTOMER. Identify proper primary keys and composite and multivalued attributes. 9. Identify and define the relationships among the entities. By examining the diagram, we may conclude that several M:N relationships exist: PARTS and SUPPLIER PARTS and PRODUCTS PRODUCT and CUSTOMER 10. Identify the composite entities and their primary and foreign keys. Each composite (bridge) entity creates the connection to maintain a 1:M relationship with each of the original entities. 11. Normalize the model. 12. Verify the model, starting with the identification of the central entity. Given the ER diagram’s layout, we conclude that the central entity is PRODUCT. 13. Identify each module and its components. Three modules can be identified: 

Inventory, containing PARTS and SUPPLIER



Production, containing PARTS and PRODUCT

941



Sales, containing PRODUCT and CUSTOMER

14. Identify each module’s processes or transaction requirements. Start by listing known transaction descriptions by module. For brevity’s sake, we will use the inventory module as an example. The inventory module supports the following transactions: 

Add a new product to inventory



Modify an existing product in inventory



Delete a product from inventory



Generate a list of products by product type



Generate a price list with product by product type



Query the product database by product description

Check the database model against these transaction requirements, verify the model’s efficiency and effectiveness, and make the necessary changes. 1055.

Describe and discuss the ER model’s treatment of the UCL’s inventory/order hierarchy: e. Category f.

Class

g. Type h. Subtype Answer: The objective here is to focus student attention on the details of the UCL’s approach to inventory management. Note that the UCL’s ER model uses two closely related entities to manage items in inventory: ITEM and INVENTORY_TYPE. These two entities maintain a 1:M relationship: One item belongs to only one inventory type, but an inventory type can contain many items. Inventory types are classified through the use of a hierarchy composed of CATEGORY, CLASS, and TYPC. (We may even identify SUBTYPE for each TYPE!) Basically, the hierarchy may be described this way: A category has many classes, and a class has many types. For example, the category hardware includes the classes computer and printer. The class computer has many types that are defined by their CPU: 486 and Pentium computers. Similarly, the category supplies can have several classes: diskette, paper, and so on. Each class can have many types: 3.5 DD diskette, 3.5 HD diskette, 8.5 × 11 paper, 8.5 × 14 paper, and so on. We may even identify subtypes: Each type can have many subtypes. For example, the class “paper” includes the types “single-sheet” and “continuous-feed”; the single-sheet type may be classified by subtype 8 × 11 inches or 11 × 14 inches. The following table summarizes some of the inventory types identified in the system. Note that the hierarchy may be illustrated as shown in Table QC.5A.

942

Table QC.5A The Classification Hierarchy

CATEGORY

CLASS

TYPE

SUBTYPE

HWPCDTP5

Hardware (HW)

Personal Computer (PC)

Desktop (DT)

Pentium (P5)

HWPCLP48

Hardware (HW)

Personal Computer (PC)

Laptop (LT)

Pentium IV

HWPRLS

Hardware (HW)

Printer (PR)

Laser (LS)

Standard

HWPRDM80 Hardware (HW)

Printer (PR)

Inkjet (IJ)

80-column

SUPPSS11

Supply (SU)

Paper (PP)

Single Sheet (SS)

8.5″ × 11″

HWEXHDID

Hardware (HW)

Expansion Board (EX)

Video (VI)

SWDBXXXX

Software (SW)

Database (DB)

The classification hierarchy may also be illustrated with the help of the tree diagram shown in Figure QC.5:

Figure QC.5 The INV_TYPE Classification Hierarchy as a Tree Diagram

CATEGORY

Hardware

CLASS

Personal Computer (PC)

Printer (PR)

TYPE

Desktop (DT)

Inkjet(IJ)

SUBTYPE

Intel P4 (300)

Intel P5 (600)

Black (BL)

Color (CO)

944

1056. Modern businesses tend to provide continuous training to keep their employees productive in a fast-changing and competitive world. In addition, government regulations often require certain types of training and periodic retraining. (For example, pilots must take semiannual courses involving weather, air regulations, and so on.) To make sure that an organization can track all training received by each of its employees, trace the development of the ERD segment in Figure QC.6 from the initial business rule that states: Answer: An employee can take many courses, and each course can be taken by many employees. Once you have traced the development of the ERD segment, verify it and then provide sample data for each of the three tables to illustrate how the design would be implemented.

Figure QC.6 The ERD for Question 6

Follow the verification steps described in the answer to Question 4. Note that the composite TRAINING entity shown in Figure QC.6 reflects part of the verification process that began with the M:N relationship between EMPLOYEE and COURSE. (An employee can take many courses and many employees can take each course.) Part of the verification process involves the elimination of multivalued attributes. For example, an EMPLOYEE table that contains an attribute EMP_TRAINING containing strings such as “fire safety, weather, air regulations” have already been eliminated by the composite TRAINING entity. The structure shown in Figure QC.6 allows us to add attributes to ensure that training details—such as dates, grades, training locations, and so on—can be traced, too. One additional—and very important—point is worth mentioning: at this point, Figure QC.6’s ERD cannot handle recurrent training requirements. That is, if some courses must be retaken periodically, as is common in many transportation businesses, the TRAINING entity’s PK—at this point composed of the EMP_NUM + COURSE_CODE—will not yield a unique value if the course is retaken from time to time. The solution to this problem can be found in either one of two ways: 3. Add the training date to the TRAINING entity’s composite PK to become EMP_NUM + COURSE_CODE + TRAIN_DATE. This approach is illustrated in the examples shown in Tables QC.6A through QC.6C. Note that employee 105 took the FAR-135-P course on 26-Sep-2013 and on 11-Feb-2014. Employee 101 took the WEA-01 course on 26-Sep-2013 and on 26-Mar-2014. Note that the addition of the TRAIN_DATE to the composite PK prevents the duplication of training records. For example, if you tried to enter the first TRAINING record twice, the combination of EMP_NUM+COURSE_CODE+TRAIN_DATE would not be unique and the DBMS would diagnose an entity integrity violation.

945

Table QC.6A The EMPLOYEE Table Contents EMP_NUM

EMP_LNAME

105

Ortega

101

Williams

Table QC.6B The TRAINING Table Contents EMP_NUM

COURSE_CODE

TRAIN_DATE

TRAIN_GRADE

105

FAR-135-P

26-Sep-2013

105

HM-01

18-Dec-2013

101

FAR-135-P

23-Nov-2013

105

WEA-01

10-Mar-2014

101

HM-01

15-Sep-2013

101

WEA-01

26-Sep-2013

105

FAR-135-P

11-Feb-2014

101

WEA-01

26-Mar-2014

Table QC.6C The COURSE Table Contents COURSE_CODE

COURSE_DESCRIPTION

FAR-135-P

Aircraft charter regulations for pilots

FAR-135-M

Aircraft maintenance for charter operations

HM-01

Hazardous materials handling

WEA-01

Aviation weather – basic operations

WEA-02

Aviation weather – instrument operations

946

4. Create a new PK attribute named TRAIN_NUM to uniquely identify each entity occurrence in the TRAINING entity, and then create a composite index composed of EMP_NUM + COURSE_CODE + TRAIN_DATE. This action will remove the weak/composite designation from the TRAINING, because the TRAINING entity’s PK is no longer composed of the PK attributes of the EMPLOYEE and COURSE entities. (And the “receives” and “is used in” relationships will no longer be classified as “identifying”—thus changing the relationship descriptions from “identifying” or “strong” to “non-identifying” or weak”). The composite index will prevent the duplication of records. Note the change in the structure and contents of the TRAINING table shown in Table QC.6D.

Table QC.6D The Modified TRAINING Table Structure and Contents TRAIN_NUM

EMP_NUM

COURSE_CODE

TRAIN_DATE

TRAIN_GRADE

1203

105

FAR-135-P

26-Sep-2013

1204

105

HM-01

18-Dec-2013

1205

101

FAR-135-P

23-Nov-2013

1206

105

WEA-01

10-Mar-2014

1207

101

HM-01

15-Sep-2014

1208

101

WEA-01

26-Sep-2013

1209

105

FAR-135-P

11-Feb-2014

1210

101

WEA-01

26-Mar-2014

We would recommend the second approach. Generally speaking, single-attribute PKs are preferred over composite PKs. Single-attribute PKs are more easily handled if the table is to be linked to a related table later. (The linking is done through a FK—which is the PK in the “parent” table. But if the parent table uses a composite PK, how can you then create the appropriate FK?) In any case, the declaration of a composite PK automatically generates a matching composite index, so you would not decrease the index library if you used approach 1. 1057. You read in this appendix that an examination of the UCL’s Inventory Management module reporting requirements uncovered the following problems: Answer: 



947

948

949



950

Figure PC.1 The Verified Car Dealership Crow’s Foot ERD

951

Continued discussion of Figure PC.1’s ERD: 



The employee qualifications can now be tracked without limit. If an employee gains an additional qualification, all that is needed is an entry in the EDUCATION table.



At this point, we assume that the SERVICE_LOG and SVC_LOG_LINE records yield the information required to bill the customer.

952

Sample SERVICE_LOG Data SVC_LOG_NUM

LOG_COMPLAINT

SVC_LOG_CHARGE

CAR_VIN

10012

Hard to start. Accelerates poorly.

$89.75

2AA-W-123456

10013

Oil change. Rotate and balance tires.

$19.95

5DR-T-8765432

10014

Temp gauge shows high temps.

$135.70

4UY-D-6543210

Sample SVC_LOG_ACTION Data SVC_LOG_NUM

SVC_LOGACT_TYPE

SVC_LOGACT_DATE

EMP_NUM

10012

Open

03-Mar-2014

104

10013

Open

03-Mar-2014

112

10012

04-Mar-2014

112

10014

Open

04-Mar-2014

104

10013

04-Mar-2014

104

Sample SVC_LOG_LINE Data (Several attributes left out to save space) SVC_LINE_NUM

SVC_LOG_NUM

SVC_LINE_WORK

EMP_NUM

10012

Cleaned injection nozzles

106

000000

10013

Drained oil

112

000000

10013

Installed filter

112

FLTR-0156

10013

Replaced oil

112

Oil-PZ30/40

10013

Rotated tires

114

000000

10013

Balanced tires, using four weights (LF0.5oz, RF1.1oz, RR1.2oz, LR0.8 oz)

106

WT-LD10012

10014

Drained coolant

104

000000

10014

Replaced thermostat

112

THERM007B

10014

Replaced coolant

104

COOL-289XZ

PART_CODE

953

Sample PART_LOG Data PARTLOG_N UM

EMP_NU M

PART_CO DE

SVC_LOG_N UM

PARTLOG_DA TE

PARTLOG_UNI TS

10185

112

FLTR-0156

10013

03-Mar-2014

10186

112

Oil-PZ30/40

10013

03-Mar-2014

10187

114

WTLD10012

10013

03-Mar-2014

10188

112

THERM007B

10014

04-Mar-2014

10189

114

COOL289XZ

10014

04-Mar-2014

The main processes that can be identified in this system include: 

The generation of an invoice (INSERT).



The car sales generation and reports (SELECT).



The registration of a service for a customer’s car (INSERT, UPDATE).



The registration of the work log or of the employees (mechanics) who worked on a car (INSERT, UPDATE).



The registration of parts inventory (INSERT, UPDATE).



The registration of parts used in a service (INSERT, UPDATE).



The registration of the car history (INSERT, UPDATE).



Queries and reports such as:  Parts List  Car Price List  Sales Reports  Service Report  Car History Report  Parts Used Report  Work Log Report

The designer must check that the database model supports all these processes and that the model is flexible enough to support future modifications.

954

NOTE The verification process for Problems 2–5 conforms to the process discussed at length in Problem 1. Therefore, we will only show the verified ERDs. The data dictionary format example shown in Problem 1 can also be used as the template in Problems 2–5. Therefore, we do not show additional data dictionaries for Problems 2–5. The ERDs supply the necessary entities, the attribute names, and the relationships. However, it will be very useful to compare the ERDs in the following problems to the original ERDs—in the previous chapter—from which they were derived. 1058. Verify the conceptual model you created in Appendix B, Problem 4. Create a data dictionary for the verified model. Answer: Compare the ERD shown in Figure PC.2A to the ERD shown in Figure PB.4A (see Appendix B) to see the impact of the verification process. Use the data dictionary format shown in Chapter 8, Table 8.2 as your data dictionary template.

955

Figure PC.2A The Crow’s Foot Verified Conceptual Model for the Video Rental Store

As you discuss the ERD components in Figure PC.2A, note particularly the following points: 



We can now track individual copies of each movie. If there are 12 copies of a given movie, each copy can be rented out separately.



ORD_LINE and RENT_LINE are composite entities. So why is COPY not a composite

956

entity? Here is an excellent example of why single-attribute PKs are a requirement when the entity is referenced by another entity. In this case, the COPY entity’s PK is referenced by the RENT_LINE. Therefore, COPY must have a single-attribute PK. (Note that the PK of the COPY entity is the single-attribute COPY_CODE, rather than the combination of MOVIE_CODE and COPY_CODE.) 

Figure PC.2B The Relational Diagram for the RC_Video Database

Once the database design has been implemented, you can easily use MS Access to illustrate a variety of implementation issues. For example, in a real-world application the RENTLINE table’s RENTLINE_DATE_OUT can simply be generated by specifying the default date to be the current date, Date(). The RENTLINE_DATE_DUE would then be Date()+2, assuming that the checkedout videos are due two days later. (Or substitute whatever criteria you want to use in the queries.) 1059. Verify the conceptual model you created in Appendix B, Problem 5. Create a data dictionary for the verified model. Answer: Compare the ERD shown in Figure PC.3 to the ERD shown in Figure PB.5a. Note that the original ERD survived the verification process intact. In this case, the verification process merely confirmed that the model met all the database requirements. Use the data dictionary format shown in Chapter 8, Table 8.2 as your data dictionary template.

Figure PC.3 The Revised (Final) Crow’s Foot ERD for the Manufacturer

957

1060. Verify the conceptual model you created in Appendix B, Problem 6. Create a data dictionary for the verified model. Answer: Compare the ERD shown in Figure PC.4A to the ERD shown in Figure PD.6A (in Appendix D) to see the impact of the verification process. Use the data dictionary format shown in Chapter 7, Table 7.3 as your data dictionary template.

958

Figure PC.4A The Crow’s Foot ERD for Problem 4 (The Hardware Store)

959

Figure PC.4B The RC_Hardware ACCT_TRANSACTION Table Contents

Figure PC.4C The RC_Hardware Transaction Query Results

960

Naturally, the CUST_BALANCE value in the CUSTOMER table and the remaining TRANS_INV_BALANCE value in the ACCT_TRANSACTION table must be updated according to the TRANS_AMOUNT value entered by the end user in the ACCT_TRANSACTION table. The applications software must be written to automatically make such updates. For example, if you Microsoft Access, you can use macros or you can use VB to accomplish this task. As you examine the query output in Figure PC.4C, note that you can easily trace the transactions for each of the customers. For example, 



961

Figure PC.4D The RC_Hardware Invoice Payment History

As you discuss Figure PC.4D, note that all the payment transactions for each invoice are easily traced. For example: 

The total charge placed on invoice #1 is $239.21. The initial payment on February 3, 2014, was $100.00, leaving a balance of $139.21.



1061. Verify the conceptual model you created in Appendix B, Problem 7. Create a data dictionary for the verified model. Answer: Compare the ERD shown in Figure PC.5A to the ERD shown in Figure PD.7A (see Appendix B) to see the impact of the verification process. Note the ternary relationship between SIGN_OUT, LOG_LINE, PART, and EMPLOYEE. This relationship enables the end user to track all parts used in each of the log lines for each of the logs and to verify that the parts that were signed out for the log line were, in fact, used in that log line’s maintenance procedure. Use the data dictionary format shown in Chapter 8, Table 8.2 as your data dictionary template.

962

Figure PC.5A The Verified Crow’s Foot ERD for ROBCOR Aircraft Service

963

Figure PC.5B The Modified Crow’s Foot ERD for ROBCOR Aircraft Service

964

Figure PC.5C The Relational Diagram for the FlyFar Database Segment

965

Figure PC.5D A Duplicate Record Warning

If you change the test date to indicate that the test result to be entered is different from an earlier test result, the DBMS will accept the data entry. (Note that the FAR135-w test was taken twice by employee 105: once on 22-Jan-2013 and once on 01-Mar-2014.) Remind your students that you can also create a single-attribute, system-generated PK named EMPTEST_NUM for the tblEMPTEST table in Figure PC.5C. This action will convert the composite (weak) EMP_TEST entity to a strong entity. (The EMP_NUM and TEST_CODE remain as foreign keys.) However, if you still want to avoid the duplication of records—a very desirable feature—you must maintain a candidate key composed of EMP_NUM, TEST_CODE, and EMPTEST_DATE—and you must set the index properties to “required” and “unique” for each of the attributes in that candidate key. (The same features may be used in the tblEDUCATION and tblTRAINING tables.) Whether or not you use a single-attribute PK or a composite PK may depend on specified system transaction and/or tracking requirements. The single-attribute PK/composite PK decision is often a function of professional judgment—clearly, the composite PKs work well in the original design shown in Figure PC.5B. However, if a PK is to be referenced by the FK(s) in one or more related tables, the creation of a single-attribute PK is appropriate. In fact, trying to create a relationship between an FK in one table and a composite PK in a related table will quickly illustrate the need for a single-attribute PK. In any case, query design becomes a more complex task when relationships based on composite PKs are traced through several levels.****** 1062. Design (through the logical phase) a student-advising system that will enable an advisor to access a student’s complete performance record at the university. A sample output screen should look like the one shown in Table PC.6.

966

Answer: Table PC.6 The Student Transcript for Problem 6 Name: Xxxxxxxxxxxxxxxxx X. Xxxxxxxxxxxxxxxxxxxxxxx

Page # of ##

Department: xxxxxxxxxxxxxxxxxxxxxxx

Major: xxxxxxxxxxxxxx

Social Security Number: ###-##-####

Report Date: ##/Xxx/####

Spring, 20XX Course ENG 111 (Freshman English)

Hours

Grade

Grade points

Xxxxxxxxxxxxxxxxxxxxxxxxxxx

Total this semester

GPA: #.##

Total to date

###

Cumulative GPA: #.##

Summer, 20XX Course CIS 300 (Computers in Society)

Hours

Grade

Grade points

Xxxxxxxxxxxxxxxxxxxxxxxxxxx

Total this semester

GPA: #.##

Total to date

###

Cumulative GPA: #.##

Fall, 20XX Course CIS 400 (Systems Analysis) Xxxxxxxxxxxxxxxxxxxxxxxxxxx

Hours

Grade

Grade points

967

Xxxxxxxxxxxxxxxxxxxxxxxxxxx

Total this semester

GPA: #.##

Total to date

###

Cumulative GPA: #.##

The Development of the ERD To satisfy the requirements, the ERD must be based on (at least) the following business rules: 6. A department has many students, and each student “belongs” to only one department. 7. A student takes many classes, and each class is taken by many students. 8. A student may enroll in a class one or more times. Naturally, if a class is taken more than once, that “repeat” class is taken in a different semester. 9. A class is a section of a course, that is, a course can yield many classes, but each class references only one course. For example, two sections of the course described by CIS483, Database Systems, 3 credit hours, Prerequisites: 9 hours of CIS courses, including CIS370 (Systems Analysis) may be taught in the Fall and Spring semesters, while the course may not be offered in the Summer session. (Since a course is not necessarily offered each semester, CLASS is optional to COURSE.) 10. Each course belongs to a department. For example, the English department would not offer a Database course. The database should include at least the following components: DEPARTMENT (DEPT_CODE, DEPT_NAME) STUDENT (STU_NUM, STU_LNAME, STU_FNAME, STU_INITIAL, DEPT_CODE) DEPT_CODE references DEPT COURSE (CRS_CODE, CRS_DESC, CRS_CREDIT_HOURS) CLASS (CLASS_ID, CRS_CODE, CLASS_PLACE, CLASS_TIME)

968

STUDENT.STU_NUM, S_LNAME, DEPARTMENT.DEPT_CODE, DEPT_NAME, ENROLL_SEMESTER, ENROLL_CRS_ CREDIT, ENROLL_CRS_NAME, ENROLL_GRADE

FROM

STUDENT, DEPARTMENT, ENROLL, CLASS, COURSE

WHERE

STUDENT.STU_NUM

= ENROLL.STU_NUM AND

CLASS.CLASS_ID

= ENROLL.CLASS_ID AND

DEPARTMENT.DEPT_CODE

= STUDENT.DEPT_CODE AND

CLASS.CRS_CODE

= COURSE.CRS_CODE

969

ORDER BY

ENROLL.STU_NUM, ENROLL_SEMESTER, ENROLL_CRS_NAME;

Table P6.1 A Grade Point Conversion Table Letter Grade

Numeric Value

When the verification process is completed, the ERD looks like the one shown in Figure PC.6.

970

Figure PC.6 The Crow’s Foot ERD for the (Transcript-based) Student Advising System

971

1063. Design and verify a database application for one of your local not-for-profit organizations (for example, the Red Cross, the Salvation Army, your church or synagogue). Create a data dictionary for the verified design. Answer: Since this problem’s solution depends on the selected organization, no solution can be presented here. However, the steps required in the solution are shown in Question 4. An abbreviated version is presented in Problem 1. 1064. Using the information given in the physical design section (C-5), estimate the space requirements for the following entities: 

RESERVATION



INV_TRANS



TR_ITEM



LOG



ITEM



INV_TYPE

Table: RESERVATION (4 per week, 14 weeks per semester, 56 reservations per semester)

Attribute

Data Type

Storage (bytes)

RES_ID

INT

RES_DATE

DATE

USER_ID

CHAR(11)

LA_ID

CHAR(11)

Row Length (bytes)

Number of Rows

Total Bytes

1,904

972

Table: INV_TRANS (80 per week, 14 weeks per semester, 1,120 transactions per semester) Data Type

Attribute

Storage (bytes)

TRANS_ID

INT

TRANS_TYPE

CHAR(1)

TRANS_PURPOSE CHAR(2)

TRANS_DATE

DATE

LA_ID

CHAR(11)

USER_ID

CHAR(11)

ORDER_ID

INT

TRANS_COMEMT

CHAR(50)

Row Length (bytes)

Number of Rows

1,120

Total Bytes

101,920

Table: TR_ITEM (240 per week, 14 weeks per semester, 3,360 per semester) Attribute

Data Type

Storage (bytes)

TRANS_ID

INT

ITEM_ID

NUMBER(8,0)

LOC_ID

CHAR(10)

TRANS_QTY

INT

Row Length (bytes)

Number of Rows

Total Bytes

3,360

87,360

973

Table: LOG (5,000 per week, 14 weeks per semester, 70,000 reservations per semester)

Attribute

Data Type

Storage (bytes)

LOG_DATE

DATE

LOG_TIME

CHAR(12)

LOG_READER

CHAR(1)

USER_ID

CHAR(11)

Row Length (bytes)

Number of Rows

Total Bytes

70,000

2,240,000

Table: ITEM (890 identified)

Attribute

Data Type

Storage (bytes)

ITEM_ID

NUM(8,0)

TY_GROUP

CHAR(8)

ITEM_INIV_ID

CHAR(7)

ITEM_DESCRIPTION

CHAR(10)

ITEM_QTY

INT

VEND_ID

CHAR(5)

ITEM_STATUS

CHAR(1)

ITERM_BUY_DATE

DATE

Row Length (bytes)

Number of Rows

Total Bytes

890

76,540

974

Table: INV_TYPE (15 categories)

Attribute

Data Type

Storage (bytes)

TY_GROUP

CHAR(8)

TY_CATEGORY

CHAR(2)

TY_CLASS

CHAR(2)

TY_TYPE

CHAR(2)

TY_SUBTYPE

CHAR(2)

TY_DESCRIPTION

CHAR(35)

TY_UNIT

CHAR(4)

Row Length (bytes)

Number of Rows

Total Bytes

825

ANSWERS TO REVIEW QUESTIONS NOTE Since the answers to many of these questions are covered in detail in Appendix F, we have elected to give you section references to avoid needless duplication. 1065. Mainframe computing used to be the only way to manage enterprise data. Then personal computers changed the data management scene. How do those two computing styles differ, and how did the shift to PC-based computing evolve? Answer: The evolution toward client/server information systems is explained in Section F-2. The main differences between mainframe-based information systems and PC-based

975

Explain how client/server system components interact. Answer: The main client/server components are the client, the server, and the communications channel. Some experts include middleware as a separate component. The client provides an interface for interacting with the user and performing some tasks such as local data validation. When data or processing is needed from the server, the client sends a request over the communications channel, which may include processing by middleware components. The request is sent to the server process, which provides the data or processing requested, and sends the results back to the client, possibly going through middleware in the process.

1068.

Describe and explain the client/server architectural principles. Answer: Client/server components must conform to some basic architectural principles if they are to interact properly. The client and server distribute an application’s processing across two types of independent entities. While the entities may run on the same physical computer, this is not typical. The client architecture is usually some type of personal computing device with sufficient memory, processing power, and storage to manage a user interface and other local tasks. The server component also includes memory, processing power, and storage, but typically is more powerful since it must be able to handle multiple concurrent requests from many different clients. The separation of the client and server components across communication media allows many benefits such as location independency, improved resource efficiency, and scalability.

1069. Describe the client and the server components of the client/server computing model. Give examples of server services.

976

Answer: Desirable hardware and software for the client component includes powerful hardware, a multitasking operating system, a graphical user interface, and communications capabilities. Desirable characteristics of the server component include a fast CPU, faulttoleration capabilities, expandability for memory, storage, and peripherals, and multiple communication options. Server services can include file services, print and fax services, database services, and miscellaneous services such as CD-ROM, video, and back-up. 1070. Using the OSI network reference model, explain the function of the communications middleware component. Answer: The communications channel provides the means through which clients and servers communicate. The communications channel connects clients and servers and its main function is the delivery of messages between clients and servers. Using the OSI network reference model, Section F-3e provides a detailed explanation of the communication channel. Note that we use the OSI network reference model because most of the client/server applications are based on a scenario in which clients and servers are tied together through a network. 1071.

What major network communications protocols are currently in use? Answer: The network protocols determine how messages between computers are sent, interpreted, and processed. The main network protocols in use today are Transmission Control Protocol/Internet Protocol (TCP/IP), Internetwork Packet Exchange/Sequenced Packet Exchange (IPX/SPX), and Network Basic Input/Output System (NetBIOS). Section F4 provides a more detailed explanation of these and other network protocols.

1072. Explain what middleware is and what it does. Why would MIS managers be particularly interested in such software? Answer: Middleware is software that is used to manage client/server interactions. Most important to the end user and MIS manager is the fact that middleware provides services to insulate the client from the details of network protocols and server processes. MIS managers are usually concerned with finding ways to improve end-user data access and to improve programmer productivity. By using middleware software, end users can access legacy data and programmers can write better applications faster. The applications are network independent and database server independent. Such an environment yields improved productivity, thereby generating development costs savings. Sections F-3f and F-3g provide additional database middleware software details. 1073. Suppose you are currently considering the purchase of a client/server DBMS. What characteristics should you look for? Why? Answer: A client/server DBMS is just one of the components in an information system. The DBMS should be able to support all applications, business rules, and procedures necessary to implement the system. Therefore, the DBMS must match the system’s technical characteristics, it must have good management capabilities, and it must provide the desired level of support from vendors and third parties. Specifically: 

977



On the managerial side the database must provide a wide variety of managerial tools, database backup and recovery, GUI-based tools, remote management, interface to other management systems, performance monitoring tools, database utilities, and so on.



On the support side the DBMS must have good third-party vendor support, technical support, training, and consulting.

1074. Describe and contrast the client/server computing architectural styles that were introduced in this appendix. Answer: This question deals with identifying the application processing logic components and deciding where to locate them. Section F-11 covers this very important topic in great detail. (Note particularly the summary in Figure F.19, “Functional Logic Splitting in Four Client/Server Architectural Styles.”)

978

Contrast client/server data processing and traditional data processing. Answer: From a managerial point of view, client/server data processing tends to be more complex than traditional data processing. In fact, client/server computing changes the way in which we look at the most fundamental computing chores and expands the reach of information systems. These changes create a managerial paradox. On the one hand, MIS frees end users to do their individual data processing and, on the other hand, end users are more dependent on the client/server infrastructure and on the expanded services provided by the MIS department. Client/server computing changes the way in which systems are designed, developed, and managed by forcing a change from: 

proprietary to open systems



maintenance-oriented coding to analysis, design, and service



data collection to data deployment



a centralized to a distributed style of data management



a vertical, inflexible organizational style to a more horizontal, flexible style

1076. Discuss and evaluate the following statement: There are no unusual managerial issues related to the introduction of client/server systems. Answer: The managerial issues in client/server systems management arise from the changes in the data processing style, the management of multiple hardware and software vendors, the maintenance and support of the client/server infrastructure, such as communications, applications, and the management and control of associated costs. The heterogenous nature of the client/server environment presents unique challenges along all of those dimensions. Therefore, one can evaluate the given statement as false because there are many unusual managerial issues related to introducing client/server systems.

979

ANSWERS TO PROBLEMS ROBCOR, a medium-sized company, has decided to update its computing environment. ROBCOR has been a minicomputer-based shop for several years, and all of its managerial and clerical personnel have personal computers on their desks. ROBCOR has offered you a contract to help the company move to a client/server system. Write a proposal that shows how you would implement such an environment. Answer: Because Problem 1 cannot be answered properly without addressing the computing style issue in Problem 2, the answers to both questions are supplied after Problem 2. 1077. Identify the main computing style of your university computing infrastructure. Then recommend improvements based on a client/server strategy. (You might want to talk with your department’s secretary or your advisor to find out how well the current system meets their information needs.) Answer: Problems 1 and 2 are research questions that yield extensive class projects. The questions are designed with two ideas in mind: 3. To have the student assume the consultant’s “proactive” role. 4. To entice the students to use the knowledge acquired in this appendix to develop an integrated approach to client/server systems implementation. The expected output for these projects is a business quality paper and a professional-level class presentation of the findings, recommended solutions, and the suggested implementation. The material presented in Section F-12d yields an outline appropriate for such a paper. It will be beneficial if students have taken at least an introductory course in Systems Analysis and Design. Keep in mind that you can either use the two scenarios presented in these questions or you can assign students a real-world case to accomplish the same goals. In the first case, the professor assumes the role of the end user. In the second case, an external third party is the end user. The problem with real-world cases is that the professor must procure commitment from the third party. Unfortunately, it is sometimes difficult for company managers to provide possibly sensitive internal information to students and to devote scarce time resources to student projects. Even if the project is kept within the university’s bounds, you are likely to discover that the university administrators may not be able or willing to provide critical information. Students should be encouraged to use the presentations as a basis for further analysis of the more nettlesome issues that must be confronted in the development of client/server systems. We suggest several class discussion sessions in which different student groups present alternative solutions. Such presentations will force students not only to design a solution but also to sell the solution to management.

980

ANSWERS TO REVIEW QUESTIONS NOTE To ensure in-depth chapter coverage, most of the following questions cover the same material that we covered in detail in the text. Therefore, in most cases, we merely cite the specific section, rather than duplicate the text material. 1078. Discuss the evolution of object-oriented concepts. Explain how those concepts have affected computer-related activities. Answer: Object orientation is the combining of data and the processes to manipulate that data into a single, modular unit. These concepts first appeared in object-oriented programming languages. Object orientation is intuitive and predictably gained in popularity as the rise in popularity of personal computers increased the computing resources available to end users. 1079. How would you define object orientation? What are some of its benefits? How are OO programming languages related to object orientation? Answer: Object orientation is a set of design and development principles based on conceptually anonymous computer structures known as objects, which encapsulate data and the procedures to manipulate that data. Among the benefits of object orientation are a reduction in the number of lines of code necessary to create applications, decreased development time, code reusability, support for abstract data types and complex data objects, and support for complex data manipulations in specialized applications. Table G.1 summarizes more benefits in addition to these. OO concepts have created a powerful programming environment that has radically changed both programming and systems development. Although traditional programmers tended to agree that modularity is one of the primary goals of structured programming and good design, modularity was often difficult to achieve. Even a cursory examination of OO concepts leads to the conclusion that the conceptually autonomous structure (in which an object contains both data and methods) makes the much sought-after modularity almost inevitable. 1080.

Define and describe the following: e. Object f.

Attributes

g. Object state h. Object ID (OID) Answer: Object is an abstract representation of a real-world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself.

981

Attributes are also called instance variables in an object-oriented environment. They are the data characteristics of the object. Object state is the set of values that the object’s attributes have at any given time. Object ID is a system-generated object identifier that is independent of the object state and any physical address in memory. Like a primary key it provides a unique identity for an instance, but it is system-generated and cannot be changed under any circumstances. 1081. Define and contrast the concepts of method and message. What OO concept provides the differentiation between a method and a message? Give examples. Answer: A method is the code that performs a specific operation on the object’s data. Messages are requests sent by one object (sender) to other objects (receivers) requesting the receivers to use one of the receiver’s methods to change the receiver’s data or state. Encapsulation is the concept that hides an object’s internal details. It prevents one object from directly manipulating the contents of another. For example, a Payment class may be used to record a new payment being made by a customer. The class uses an internal method to generate a new Payment object. The Payment object sends a message to a Customer object instructing it to change the customer’s balance using the customer.updateBalance() method. The Customer object uses the updateBalance method and other internal methods to validate the results as it completes the request. 1082. Explain how encapsulation provides a contrast to traditional programming constructs such as record definition. What benefits are obtained through encapsulation? Give an example. Answer: Encapsulation hides the object’s internal data representation and method implementation, thus ensuring the object’s data integrity and consistency. The programmer needs only ask an object to perform an action, without having to specify how the action is to be performed. Since the implementation details need not be specified, the programmer can concentrate on the overall process. Clearly, an object is an independent entity. Therefore, object independence assures system modularity. For example, an object-oriented system is formed by possibly thousands of independent objects (or even more) that interact to perform specific actions. In short, what we have just described is a perfectly modular system. In contrast, the programmer who uses traditional programming languages has direct access to the internal components of a record type. Therefore, the programmer can directly manipulate the data elements at will. This ability is not necessarily valuable; programmers can (and do) make mistakes, thus causing problems in critical systems. For example, when you create a record type “customer” in your program, you have direct access to all the data elements of such a record, so there is no protection of the data. 1083.

Using an example, illustrate the concepts of class and class instances. Answer: A class instance is an object. A class is composed of a collection of objects or class instances with shared structure (attributes) and behavior (methods). A class named STUDENT may be used to contain the collection of individual student objects.

1084. What is a class protocol, and how is it related to the concepts of methods and classes? Draw a diagram to show the relationships among these OO concepts: object, class, instance variables, methods, object state, object ID, behavior, protocol, and messages.

982

Answer: A class protocol is the collection of messages, each identified by a message name that are made available for other objects to see. It represents the public aspect of the class and of the objects in that class. 1085. Define the concepts of class hierarchy, superclasses, and subclasses. Explain the concept of inheritance and the different types of inheritance. Use examples in your explanations. Answer: Class hierarchy is the organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass. In a class hierarchy, the superclass is the more general classification from which the subclasses inherit data structures and behaviors. In a class hierarchy, a subclass is a class derived from a superclass. Inheritance is the ability of an object within the hierarchy to inherit the data structure and behavior of the classes above it. For example, Stringed Instrument may be a subclass of Musical Instrument, and Guitar is a subclass of Stringed Instrument. If a class has only one immediate superclass above it, then single inheritance occurs. If a class has multiple immediate superclasses above it, then multiple inheritance occurs. In the above example, a Guitar, having only Stringed Instrument as a superclass, would exhibit single inheritance. A Piano object would have both Stringed Instrument and Percussion Instrument as superclasses so would exhibit multiple inheritance. 1086. Define and explain the concepts of method overriding and polymorphism. Use examples in your explanations. Answer: Ordinarily, if a method is in a superclass, then it does not need to be created in the subclasses. Calls for the method will look in the subclass to find the method. If the method isn’t found, then the system will look for the method in the superclass. Therefore, if the subclass and the superclass would define the method in the same way, then the method only needs to be defined once, in the superclass, and inheritance will allow all the subclasses to respond to that method. Method overriding is when a subclass has a different definition of a method than its superclass. In that case, even though the subclass would inherit the definition of the method from the superclass, this definition is “overridden” by the definition in the subclass so messages for the method will call the method as defined in the subclass instead of the definition from the superclass. Polymorphism allows different objects to respond to the same message in different ways. For example, Pilot and Mechanic objects may use different calculations for determining monthly pay. This allows an object, such as Payroll, to send the same message using the monthPay method to both Pilot and Mechanic requesting the monthly pay amount, and the Payroll object is unaware that Pilot and Mechanic define monthPay differently. 1087. Explain the concept of abstract data types. How do they differ from traditional or base data types? What is the relationship between a type and a class in OO systems? Answer: An abstract data type is a set of similar objects with shared and encapsulated data representation and methods. It is generally used to describe complex objects. Traditional data types are predefined and have a set of predefined operations that can be performed on them. With an abstract data type, the programmer defines the operations that can be performed on them using methods.

983

What are the five minimum attributes of an OO data model? Answer: The system must be able to remember data locations. The system must be able to manage very large databases. The system must accept concurrent users. The system must be able to recover from hardware and software failures. Data query must be simple.

1089. Describe the difference between early and late binding. How does each of these affect the object-oriented data model? Give examples. Answer: Early binding is the property by which the data type of an object’s attribute must be known at definition time, bonding the data type to the object’s attribute. Late bind is the characteristic in which the data type of an attribute is not known until execution time or runtime. Late binding allows instance variables to be defined as abstract data types that are defined by methods during execution. 1090. What is an object space? Using a graphic representation of objects, depict the relationship(s) that exist between a student taking several courses and a course taken by several students. What type of object is needed to depict that relationship? Answer: The object space or object schema is the equivalent of a database schema. The object space is used to represent the composition of the state of an object at a given time. For example, you can use the schema shown in Figure QG.13 to represent the M:N relationship between CLASS and STUDENT:

984

Figure QG.13 The Object Schema for the Relationship between Student and Class STUDENT

ENROLL

STU_SOC_SEC_NUM

CLASS:

CLASS 1

STU_LNAME

TAKEN BY:

STU_ADDRESS

STUDENT:

STU_CITY

ENROLL

STUDENT

STU_STATE STU_ZIPCODE CLASS_TAKEN:

CLASS_DESCRIPTION

CLASS

STU_FNAME

CLASS_CODE

GRADE M

ENROLL STU_CUM_GPA STU_SEM_GPA

1091. Compare and contrast the OODM with the ER and relational models. How is a weak entity represented in the OODM? Give examples. Answer: Although the OODM has much in common with relational and ER data models, the OODM introduces some fundamental differences. Table QG.14 provides a summary of the OODM characteristics

Table QG.14 A Comparison of OODM, ERM, and Relational Model Features

OODM

ER Model (ERM)

Relational Model

Type

Entity definition (limited)

Table definition (limited)

Object

Entity

Table row or tuple

Class

Entity set

Table

Instance variable

Attribute

Column (attribute)

OID

N/A

Primary key

Object schema

ER diagram

Relational schema

Class hierarchy

N/A*

N/A

Inheritance

N/A*

N/A

985

OODM

ER Model (ERM)

Relational Model

Encapsulation

N/A

Method

N/A

Name and describe the 13 mandatory features of an OODBMS.

Answer:

1093.



The system must support complex objects.



Object identity must be supported.



Objects must be encapsulated.



The system must support types or classes.



The system must support inheritance.



The system must avoid premature binding.



The system must be computationally complete.



The system must be extensible.



The system must be able to remember data locations.



The system must be able to manage very large databases.



The system must accept concurrent users.



The system must be able to recover from hardware and software failures.



Data query must be simple. What are the advantages and disadvantages of an OODBMS?

Answer: The OODBMS advantages include the following: more semantic information in the database, better support for complex objects, extensible data types, versioning, faster development, and easier maintenance with reusable classes. OODBMS disadvantages include the following: incorporation of many OO features in RDBMS provides strong opposition to the implementation complexities of OODBMS, lack of theoretical foundation, complexity of OODBMS pointer systems, no standard ad hoc query language like SQL, very steep initial learning curve, few qualified data professionals, and lack of compatibility between different OODBMSs.

986

1094. Explain how OO concepts affect database design. How does the OO environment affect the DBA’s role? Answer: Relational database design requires a separation between data and process. Traditionally, identification of data elements is the primary consideration that drives database design. Consideration of the procedures by which that data is manipulated occurs much later in the design process. OO concepts require consideration of the data elements and their manipulation to be considered at the same time since they are encapsulated into a single object. Working toward the OO goal of code reusability is difficult and will require DBAs to become much more proficient programmers as they assume more responsibility for the defining and implementing operations that affect the data. 1095. What are the essential differences between the relational database model and the object database model? Answer: 

An object extends beyond the static concept of an entity or tuple in the other data models.



Like the entity set and table, a class includes the data structure. However, unlike the entity set and table, the class also includes methods.



Unlike its relational and ER counterparts, encapsulation allows an object’s “internals” to be hidden from the outside.



Unlike its relational and ER counterparts, inheritance allows an object to inherit attributes and methods from a parent class.



1096. Using a simple invoicing system as your point of departure, explain how its representation in an entity relationship model (ERM) differs from its representation in an object data model (ODM). (Hint: See Figure G.34.) Answer: As shown in Figure G.34, the object model represents the INVOICE as an object containing other objects (CUSTOMER and LINE). In contrast, the ER model uses three different and separate entities related to each other through their primary key/foreign key attributes. Note that the object model automatically includes the CUSTOMER and LINE object instances when each INVOICE line instance is made current.

987

1097.

What are the essential differences between an RDBMS and an OODBMS? Answer: OODBMS characteristics show that the OODBMS shares features such as data accessibility, persistence, backup and recovery, transaction management, concurrency control, and security and integrity with the RDBMS. In addition, the OODBMS has unique characteristics such as support for complex objects, encapsulation and inheritance, abstract data types, and object identity.

1098.

Discuss the object/relational model’s characteristics. Answer: The basic features of the O/RM include extensibility of new user-defined data types, support for complex objects, inheritance between supertypes and subtypes within a specialization hierarchy, procedure calls using triggers, and system-generated identifiers similar to object IDs. While these features do not perfectly mimic the structures of OODBMS, they respond to some of the most commonly cited OODBMS capabilities.

988

Figure PG.1 The RRE Trucking Company Database

989

Answer: As you examine Figure PG.1A, note that, for simplicity’s sake, we have chosen not to represent BASE_MANAGER as an abstract data type belonging to the class PERSON.

Figure PG.1A The OO Conceptual Representation BASE

TRUCK TRUCK_NUM

BASE: 1 BASE TYPE:

TYPE

BASE_CODE

TYPE_CODE

BASE_CITY

TYP_DESCRIPTION

BASE_STATE

BASE_AREA_CODE

BASE_PHONE

BASE_MANAGER

TRUCKS:

TRUCK_MILES

TRUCK_BUY_DATE

TRUCK_SERIAL_NUM

TYPE

TRUCKS: MM CTRUCK

M M

CTRUCK

Note: c = character data d = date data n = numeric data

Figure PG.1A also illustrates that the CTRUCK class represents a collection of TRUCK objects. In other words, one instance of the CTRUCK class will contain several instances of the class TRUCK. 1099.

Using the tables in Figure PG.1 as a source of information:

Answer: c. Define the implied business rules for the relationships. Given the tables in Figure PG.1, you may develop the following relationships: 

A BASE can have many TRUCKs.



Each TRUCK belongs to only one BASE.



A TRUCK has only one truck TYPE.



Each truck TYPE may have several TRUCKs belonging to it.

d. Using your best judgment, choose the type of participation of the entities in the relationship (mandatory or optional). Explain your choices. From the data shown in Figure PG.1 you can conclude that: 

BASE and TYPE are mandatory for TRUCK.



A TRUCK must have a BASE.



A truck is of a given TYPE.



TRUCK is mandatory for BASE.

990



A BASE must have at least one TRUCK to be considered a BASE.



TRUCK is optional for TYPE. There can be zero, one, or more TRUCKs belonging to a TYPE.

d. Develop the conceptual object schema. Using the results of Problems (a) and (b), the conceptual object schema is represented by Figure PG.2C.

TYPE OLD: DF56 TYPE_CODE: 1 TYPE_DESCRIPTION: Single box, double-axle TRUCKS: [Y54F]

BASE_MANAGER: Andrea D. Gallager TRUCKS: [Y678] CTRUCK OLD: Y678 [TX34], [TX37], [TX65]

CTRUCK OLD: Y54F [TX34]

991

1100. Using the data presented in Problem 1, develop an object space diagram representing the object’s state for the instances of Truck listed below. Label each component clearly with proper OIDs and attribute names. Answer: a. The instance of the class Truck with TRUCK_NUM = 5001. The instance of this class is shown in Problem 2C’s conceptual object schema (Figure PG.2C). c. The instances of the class Truck with TRUCK_NUM = 5003 and 5004. As you examine the conceptual object schema shown in Problem 2C, note the following features: 

OIDs are used to reference the object instances of the classes BASE and TYPE.



The BASE and TYPE object instances reference two different CTRUCK object instances.



Using the OIDs, each CTRUCK object instance contains the reference to several object instances of the class TRUCK.

Using these features, the conceptual object schema looks like Figure PG.3B.

992

Figure PG.3B The Conceptual Object Schema TRUCK

TRUCK

OLD: TX37

OLD: TX65

TRUCK_NUM: 5003

TRUCK_NUM: 5004

BASE: [BD39]

TYPE: [DF48]

TYPE: [DF56]

TRUCK_MILES: 221346.6

TRUCK_MILES: 99894.3

TRUCK_BUY_DATE: 12/27/07

TRUCK_BUY_DATE: 2/21/08

TRUCK_SERIAL_NUM: AC-445-78656-Z99

TRUCK_SERIAL_NUM: WG-11223144-T34 TYPE

BASE

OLD: DF56

OLD: BD39 BASE_CITY: Nashville BASE_STATE: TN BASE_AREA_CODE: 615

TYPE_CODE: 2 TYPE_DESCRIPTION: Single box, single-axle TRUCKS: [Y54F] CTRUCK OLD: Y54F

BASE_PHONE: 123-4567 BASE_MANAGER: Andrea D. Gallager

[TX37], [TX65], ……... TYPE

TRUCKS: [Y678] OLD: DF48 CTRUCK OLD: Y678 [TX34], [TX37], [TX65]

TYPE_CODE: 1 TYPE_DESCRIPTION: Single box, double-axle TRUCKS: [Y54F]

As you examine Figure PG.3B’s conceptual object schema, note the following features: 

OIDs are used to reference the object instances of the classes BASE and TYPE.



Both object instances reference the same BASE and TYPE object instances. This property is also called referential object sharing.

1101. Given the information in Problem 1, define a superclass Vehicle for the Truck class. Redraw the object space you developed in Problem 3, taking into consideration the new superclass that you just added to the class hierarchy. Answer: To add a superclass VEHICLE to the TRUCK class, first define the superclass VEHICLE, after which you can create the subclass TRUCK. After this task has been completed, the end user will see only the attributes and methods that were inherited from VEHICLE. (The user does not perceive the difference!) To illustrate this point, the object space must also show the new VEHICLE instance. (See Figure PG.4.)

993

Figure PG.4 The Conceptual Object Schema VEHICLE OLD: VF345 MAKER: Ford

Class/Subclass Relationship

YEAR: 1992 TRUCK Attributes Inherited From the VEHICLE Superclass

OLD: TX34 MAKER: Ford YEAR: 1992 TRUCK_NUM: 5001 BASE: [BD39]

Interclass Relationships

TYPE: [DF56] BASE

TRUCK_MILES: 162123.5 TRUCK_BUY_DATE: 11/08/07

OLD: BD39 BASE_CITY: Nashville

TRUCK_SERIAL_NUM: AA-322-12212-W11 TYPE

BASE_STATE: TN BASE_AREA_CODE: 615

OLD: DF56

BASE_PHONE: 123-4567

TYPE_CODE: 1

BASE_MANAGER: Andrea D. Gallager

TYPE_DESCRIPTION: Single box, double-axle

TRUCKS: [Y678]

TRUCKS: [Y54F]

CTRUCK OLD: Y678 [TX34], [TX37], [TX65]

1102.

CTRUCK OLD: Y54F [TX34]

Assume the following business rules:

Answer: 

A course contains many sections, but each section has only one course.



A section is taught by one professor, but each professor may teach one or more different sections of one or more courses.



Each section is taught in one room, but each room may be used to teach several different sections of one or more courses.



A professor advises many students, but a student has only one advisor.

Based on those business rules: Identify and describe the main classes of objects.

994

Using the business rules 1 through 6, we may identify the objects: COURSE

STUDENT

CLASS

ROOM

PROFESSOR

Figure PG.5A The Conceptual Object Schema STUDENT

CLASS

STU_NUM

COURSE:

STU_LNAME

COURSE

STU_FNAME

PROFESSOR:

STU_ADDRESS

PROFESSOR

STU_CITY

ROOM:

STU_STATE

ROOM

STU_ZIPCODE

GRADE SCHEDULE:

PROF_NUM

CRS_DESCRIPTION C

PROF_NAME

CRS_CREDIT

PROF_DOB

DEPT_CODE

OFFERING:

N M

TEACH_LOAD: M CLASS

CLASS ROOM M

STUDENT

PROFESSOR

CRS_CODE

ENROLL:

ADVISOR:

PROFESSOR

COURSE

BLDG_CODE

ROOM_NUM

ADVISEES:

STUDENT

RESERVATION: CLASS

CLASS GRADE

STU_CUM_GPA

STU_SEM_GPA

Note: C = Character D = Date N = Numeric

Use the following descriptions to characterize the model’s components:

995

996

Figure PG.5B-1 The Abstract Data Types (Classes)

NAME

DOB

ADDRESS

FIRST_NAME

MONTH

STREET

INITIAL

DAY

APT_NUM

LAST_NAME

YEAR

CITY

STATE

ZIPCODE

997

Figure PG.5B-2 The Object Instance Representation for PROFESSOR NAME OID:

M45

FIRST_NAME:

June

INITIAL:

LAST_NAME:

Hasselblatt

PROFESSOR

DOB

230843

OID: 456

OID:

PROF_NAME:

[M45]

MONTH:

PROF_DOB:

[456]

DAY:

PROF_ADDRESS:

[401]

YEAR:

1961

EPT_CODE:

CIS

TEACH_LOAD:

[D40]

ADVISEES:

[X34]

ADDRESS OID:

[401]

STREET:

North Side

Blvd. APT_NUM :

1093B

CITY:

Paris

STATE:

ZIPCODE:

37892

Course and Section.



Section and Professor.



Professor and Student.

998

Figure PG.5C The Object Representation for an Object of the Class PROFESSOR PROFESSOR

Collection of SECTION classes D40

OID:

230843

OID:

PROF_NAME:

[M45]

PROF_DOB:

[456]

A34332 OID: ……………. 349 OID: ……………. ……………. 369 OID: ……………. ……………. 380 OID: ……………. ……………. …………….

PROF_ADDRESS: [401] DEPT_CODE:

CIS

TEACH_LOAD:

[D40]

ADVISEES:

[X34]

CLASS objects

Collection of STUDENT classes OID:

STUDENT objects

X34

346 OID: ……………. 345 OID: ……………. ……………. 556 OID: ……………. ……………. 580 OID: ……………. ……………. …………….

CLASSes taught by the PROFESSOR.



STUDENTs advised by the PROFESSOR.

Collection objects are used to implement 1:M relationships.

999

Use object representation diagrams to show the relationships between: 

Section and Students.



Room and Section.

Figure PG.5D-1 The Relationship Between CLASS and STUDENT

STUDENT

CLASS

STU_NUM

STU_LNAME

STU_FNAME

STU_ADDRESS

STU_CITY

STU_STATE

STU_ZIPCODE

required

COURSE PROFESSOR:

required

PROFESSOR

ROOM:

ADVISOR:

required

PROFESSOR SCHEDULE:

required

ROOM

ENROLL:

required

optional

N N

STU_SEM_GPA

STUDENT GRADE

STU_CUM_GPA

CLASS GRADE

COURSE:

Note: C = Character D = Date

As you examine Figure PG.5D-1, note that:

1000



A student must be registered in one or more CLASSes, and the student earns a GRADE in each CLASS. (Reminder: We’ve used CLASS to represent a Section of a course.)



The CLASS requires a COURSE, a PROFESSOR, and a ROOM.



The CLASS may have one or more STUDENTS, each of whom earns a GRADE in that CLASS. In other words, STUDENT is optional to CLASS.

Figure PG.5D-2 The Object Diagram for Problem PG.5d

STUDENT STU_NUM

STU_LNAME

STU_FNAME

STU_ADDRESS

STU_CITY

STU_STATE

STU_ZIPCODE

ADVISOR:

CLASS

STU_REC STUDENT:

PROFESSOR:

PROFESSOR

CLASS GRADE

COURSE

STUDENT CLASS:

COURSE:

ROOM:

ROOM

PROFESSOR SCHEDULE: M

Note: C = Character D = Date N = Numeric

ENROLL:

STU_REC

STU_REC STU_GPA

As you discuss Figure PG.5D-2, note that STU_REC (the student record) is the intersection class that represents the M:N relationship between STUDENT and CLASS.

1001

Attribute Name

Data Type

NAME

DOB

ADDRESS

STUDENT and PROFESSOR will inherit the above attributes from their superclass PERSON. The class hierarchy will look like Figure PG.5E.

Figure PG.5E The Class Hierarchy PERSON

Superclass

NAME DOB ADDRESS

Subclass

STUDENT

PROFESSOR

Inherited from PERSON

NAME DOB 1

ADDRESS ADVISOR:

NAME DOB ADDRESS DEPT_CODE

TEACH_LOAD: M

PROFESSOR CLASS SCHEDULE:

STU_REC STU_GPA

Note: C = Character D = Date N = Numeric

ADVISEES:

STUDENT

As you discuss Figure PG.5E, note the differences between inheritance and interclass relationships. Explain that: 

Inheritance is automatic.



Inheritance moves from top to bottom within the class hierarchy.



Inheritance represents a 1:1 relationship between the superclass and its subclass(es).



Inheritance need not be explicitly defined through the attribute data type.

1002

In contrast, interclass relationships must be defined explicitly through the attribute’s data type. In addition, interclass relationships may represent a 1:1, a 1:M, or a M:N relationship. 1103. Convert the following relational database tables to the equivalent OO conceptual representation. Explain each of your conversions with the help of a diagram. (Note: The R&C Stores database includes the three tables shown in Figure PG.6.) Answer:

Figure PG.6 The R&C Stores Database

The conversion is shown in Figure PG.6-1.

1003

Figure PG.6-1 The Completed OO Conceptual Representation for the R&C Stores Database STORE

REGION REGION_CODE

REGION_LOCATION C STORES:

STORE

EMPLOYEE

STORE_CODE

STORE_NAME

STORE_YTD_SALES REGION: 1 REGION

MANAGER: 1

EMPLOYEE Note: C = Character D = Date N = Numeric

WORKERS:

EMP_CODE

EMP_TITLE

EMP_LNAME

EMP_FNAME

EMP_INITIAL WORKS_AT:

STORE MANAGER_OF:

EMPLOYEE

STORE

Note that Figure PG.6-1 reflects the following conditions: 

Each REGION can have many STOREs.



The STORE object includes references to the REGION and EMPLOYEE objects. The EMPLOYEE object references reflect that an employee is a manager of a store and that each store employs many employees.



1104. Convert the following relational database tables to the equivalent OO conceptual representation. Explain each of your conversions with the help of a diagram. (Note: The Avion Sales database includes the tables shown in Figure PG.7.)

1004

Answer:

Figure PG.7 The Avion Sales Database

The OO representation is shown in Figure PG.7-1.

1005

Figure PG.7-1 The Completed OO Conceptual Representation for the Avion Sales Database

CUSTOMER

INVOICE

CUS_NUM

CUS_LNAME

CUS_FNAME

CUS_INITIAL

CUS_CREDIT

CUS_BALANCE

INVOICES:

INVOICE

Note: C = Character D = Date N = Numeric

INV_ NUM CUSTOMER:

PROD_CODE

EMP_NUM

PROD_COST

EMP_TITLE

PROD_PRICE

EMP_LNAME

PROD_QOH

EMP_FNAME

PROD_MIN_QOH

EMP_INITIAL

CUSTOMER SALESREP: EMPLOYEE

EMPLOYEE

PRODUCT

INV_ DATE

INV_SUB

INV_TAX

INV_TOTAL

INV_PYMT

M LINES: INVLINE_NUM N 1 PRODUCT

PROD_LAST_ORDER D

EMP_HIRE_DATE D

LINES: INVLINE_NUM N 1 INVOICE

INVOICES:

INVOICE

INVLINE_UNITS N INVLINE_PRICE N INVLINE_TOTAL N

1105. Using the ERD shown in Appendix C, The University Lab Conceptual Design Verification, Logical Design, and Implementation, Figure C.22 (the Check_Out component), create the equivalent OO representation. Answer: Figure C.22 in Appendix C shows how the M:N relationship of USER and ITEM can be implemented by modeling this relationship through the Check_Out (bridge) entity. The OO representation of the M:N USER and ITEM relationship uses a CHECKOUT object. This object will have its own attributes and it will reference the USER and ITEM objects as shown in Figure PG.8. Note that the CHECKOUT object is a complex object that contains a group of repeating attributes: item, location, quantity, and date in.

1006

Figure PG.8 The Completed OO Conceptual Representation for Figure C.22’s Check-Out Component

USER

CHECKOUT

USER_ID

ITEM

CO_ID

ITEM_ID

CO_DATE

ITEM_UNIV_ID

USER_CLASS

USER:

USER_SEX

USER_TYPE

DEPARTMENT:

DEPARTMENT CHECKOUT:

ITEM_SERIAL_NUM N

ITEM_DESCRIPTION C

USER CO_ITEMS: 1

ITEM 1 M

LOCATION

ITEM_QTY

ITEM_SATUS

ITEM_BUY_DATE

INV_TYPE:

INV_TYPE

CHECKOUT COI_QTY

COI_DATE_IN

VENDOR: 1 VENDOR

Note: C = Character D = Date N = Numeric

CHECKOUT:

CHECKOUT

1106. Using the contracting company’s ERD in Chapter 6, Normalization of Database Tables, Figure 6.16, create the equivalent OO representation. Answer: Figure 6.16 depicts the M:N relationship between EMPLOYEE and PROJECT. The object representation of this relationship is shown in Figure PG.9.

1007

Figure PG.9 The Completed OO Conceptual Representation for Figure 6.16’s Contracting Company ERD

EMPLOYEE

ASSIGN

PROJECT

EMP_NUM

ASSIGN_NUM

PROJ_NUM

EMP_LNAME

ASSIGN_DATE

D 1

PROJ_NAME

PROJ_DATE ASSIGN:

EMP_LNAME

EMP_FNAME

EMP_INITIAL

EMP_HIREDATE JOB:

PROJECT 1

EMPLOYEE

ASSIGN

ASSIGN_HOURS N

JOB ASSIGN:

ASSIGN

JOB JOB_CODE

JOB_DESCRIPTION C JOB_CHG_HOUR

Note: C = Character D = Date N = Numeric

ANSWERS TO REVIEW QUESTIONS 1107.

What does e-commerce mean and how did it evolve? Answer: Electronic commerce (e-commerce) is the use of electronic computer-based technology to: 

Bring new products, services, or ideas to market.



Support and enhance business operations, including the sales of products and/or services over the web.

1008

1108.



Identify and briefly explain five advantages and five disadvantages of e-commerce. Answer: A summary of advantages of e-commerce: 

Comparison shopping



Reduced costs and increased competition



Convenience for online shoppers



24 × 7 × 365 operations



Global access



Lower barriers of entry



Increased market (customer) knowledge

A summary of the disadvantages:

1109.



Hidden costs of operation



Network unreliability



Higher costs of staying on business



Lack of security



Loss of privacy



Low service levels



Legal issues (fraud, copyright problems)

Define and contrast B2B and B2C e-commerce styles. Answer: E-commerce styles can be classified as:

1009



Business to business (B2B): electronic commerce between businesses.



Business to consumer (B2C): electronic commerce between businesses and consumers.



Intrabusiness: electronic commerce activities between employers and employees.

1010

1110.

1111.

1112.

1113. Name and explain the operation of the main building blocks of the Internet and its basic services. Answer: The main building blocks of the Internet are: 

TCP/IP, the network protocol that determines the rules used to create and route packets of data between computers.

1011



Router, the special hardware/software equipment that connects multiple and diverse networks. The basic services of the Internet are:



World Wide Web, a worldwide network collection of specially formatted and interconnected documents known as webpages.



Webpage, a document containing text and special commands (tags) written in Hypertext Markup Language (HTML).



Hypertext Markup Language (HTML), the standard document-formatting language for webpages.



Hyperlink, a link used by a webpage to call other webpages creating the effect of a web.



Uniform Resource Locator (URL), the address of a resource on the Internet.



Hypertext Transfer Protocol (HTTP), the standard protocol used by web browsers and web servers to communicate.



Domain Name Service, a service to translate English-like domain names into the appropriate TCP/IP addresses.



Web browser, an end-user application used to navigate through the Internet.



Web server, a specialized application whose function is to send requested webpages to the client browser.



Website, a web server and the collection of webpages stored on the server.



Static webpage, a webpage whose contents remain the same unless changed manually.



Dynamic webpage, a webpage whose contents are automatically created and tailored to the user’s request.



File Transfer Protocol (FTP), a protocol used to provide file transfer capabilities across the Internet.



Electronic mail (email), messages transmitted electronically among computers on the Internet.



News and discussion group services, specialized services that allow the creation of virtual communities in which users exchange messages regarding specific topics.

1114. What does business enabling do? What services layer does it provide? Give six examples of business-enabling services. Answer: The Internet Basic Services (IBS) only form a foundation on which to run a basic website. However, IBS does not provide the services required for even elementary business transactions. The business-enabling layer provides the additional services to better support business transactions.

1012

Business transactions require accountability, reliability, authentication, trust, fidelity, and performance. These requirements are supported through hardware and software components that work together to provide the additional functionality not provided by the other layers. Table I.3 describes services that are used to enhance websites by providing their users the ability to perform searches, authenticate and secure business data, manage website contents, and so on. The list in Table I.3 is not exhaustive—technological advances enable new services, which in turn are used to enable additional business services. The business-enabling services are search services, security, site monitoring and analysis, load balancing, personalization, web development, database integration, transaction management, messaging, and support for multiple devices. The services provided by this layer are built on top of the Internet Basic Services to provide the additional services that are required to support business transactions. 1115. What is the definition of security? Explain why security is so important for e-commerce transactions. Answer: In an e-commerce context, security refers to all the activities that are associated with the protection of the data and other components against accidental or intentional (probably illegal) use by unauthorized users. Privacy deals with the rights of individuals and organizations to determine the “who, what, when, where and how” authorization to use his/her data. Providing security is a major concern of e-commerce. Companies spend millions of dollars annually on hardware and software equipment to protect their own data (including personal customer data) and property against criminal activities. For e-commerce to be successful, it must ensure the security and privacy of all business transactions and the data associated with those transactions. 1116. Give an example of an e-commerce transaction scenario. What three things should security be concerned with in this e-commerce transaction? Answer: E-commerce data must be secured from a transaction’s beginning to its conclusion. Note, for example, the following transaction sequence: 

A customer buying products online from home, enters order and credit card information in a merchant’s web page.



The information travels from the customer’s computer over the Internet to the merchant’s web server.



The merchant’s web server receives the order and credit card data and stores these data in a database.



The web server sends the order confirmation and shipping information back to the client.



The seller uses a third-party shipping company to deliver the products to the customer.



The seller uses a third-party payment processing company to settle payment.



The shipping company delivers the product to the customer.



At the end of the month, the customer receives his/her credit card statement— possibly electronically.

1013



The customer pays the credit card bill, either by writing and mailing a check or through the use of electronic funds transfer.



1117. You are hired as a resource security officer for an e-commerce company. Briefly discuss what technical issues you must address in your security plan. Answer: The security plan should include issues such as physical security of the computing environment and protection of the data in the databases. Online transaction security must also cover issues such as authentication, the use of digital certificates to ensure the identity of the parties involved in business transactions, and the use of public-key encryption with digital signatures to guarantee that the data traveling on the Internet cannot be tampered with or read by unauthorized parties. The security plan should include issues such as resource security and transaction security. Transaction security includes encryption methods at the transport level, such as S-HTTP and SSL. Resource security deals with protecting the resources (hardware and software) that enable the conduct of e-commerce—servers, routers, operating systems, and applications— against threats posed by hackers, viruses, theft, and so on.

1014

Classify each site (B2B, B2C, and so on).

g. What information did you collect? Was the information useful? Why or why not? h. What decision(s) did you make based on the information you collected? The format is provided in the answers to Problems 2–9. Naturally, the websites shown here change periodically, so use the examples as a general guide. Also, keep in mind that there are many sites beyond the sites we have shown in the answers to Problems 2–9.

1015

1118. Research—and document—the purchase of a new car. Based on your research, explain why you plan to buy this car. Answer:

A CarGurus.com

B2C

Car models, features, comparisons, ratings, evaluations, dealer prices on new and used models.

Made an informed decision. Found the car most affordable, with best ratings and features. Capability of comparing car models and features.

Information was very useful. CarMax.com

B2C

AutoTrader.com

B2C

1016

1119.

Research—and document—the purchase of a new house.

Answer:

A Century21.com

Zillow.com

ColdwellBanker.com

C2B Searched B2C multiple homes based on my search criteria. Websites provide information such as school systems, nearby attractions, city guides, comparable B2C house prices, and financing options.

D I was able to determine which home I could afford, and found a home within price range, locations, and features desired.

1017

1120. You are in the market for a new job. Search the web for your ideal job. Document your job search and your job selection. Answer:

A Monster.com

C2B Search job B2C openings by location, industry, and salary range. Research employers, salary comparison.

Indeed.com

Option to post resume C2B and obtain resume B2B advice. B2C Resume writing service. Interview tips.

D Was able to fine-tune a job search. Applied for jobs that matched my qualifications and experience. Was able to research companies and to compare salaries in different geographic regions.

Salary calculators, relocation information and services. Moving information.

1018

1121. You need to do your taxes. Download IRS form 1040 and look for online tax processing help, documenting your search. Answer:

A IRS.gov

Hrblock.com

G2C

Obtained information about latest tax laws, downloaded tax return forms. Learned how to file a tax return electronically.

Searched for tax advisors within my area.

B2C

Found information about IRS red flags for auditing. Found tax advice in many different areas, determined how much to save in multiple retirement instruments, etc.

Learned how retirement instruments can be used to save for retirement and to reduce the tax burden. Used many different tax calculators to estimate how much I will pay in taxes.

1019

1122. Research the purchase of a 20-year level term life insurance policy and report your findings. Answer:

A Insurance.com

B2B

Searched for policies by state, age, smoking status, and amount.

Could find best possible deals in no time at all.

B2C

Obtained policy details, latest prices, and providers. Intelliquote.com

B2B B2C

TheZebra.com

Compared insurance company premiums and ratings companies.

B2C

1020

1123.

Research—and document—the purchase of a new computer.

Answer:

Dell.com

B2C

Searched for computers, compared prices, options, and warranty information.

PCPartPicker.com

B2C

IBuyPower.com

B2C

1021

1124. Vacation time is almost here! Research—and document—the destination(s) and activities of next summer’s vacation. Answer:

A Travelocity.com

PriceLine.com

B2C Found information in vacation packages with all-inclusive features, such as air fare, hotel accommodations, and guided-tour details.

I was able to search for (and find) multiple tour vacation packages that fit my criteria.

I was able to B2C perform searches by destination, travel dates, tour operators, etc.

The information provided was also useful to completely plan the vacation and do all booking online.

I could compare prices for tours and hotels. I was also able to find special deals and offers to various destinations. Expedia.com

B2C I could do all the trip planning online and get additional information such as currency conversions, city guides, comments from past-users, weather information, etc.

1022

1125. You have some money to invest. Research—and document—mutual funds information for investment purposes. Report your investment decision(s) based on the research you conduct. Answer:

A Vanguard.com

Fidelity.com

Morningstar.com

B2C

Obtained information about investing in the stock market, mutual funds markets, and markets for other instruments.

Was able to determine the best investment strategy to fit my risk tolerance.

B2C

Search yielded comparative fund information such as returns, expense ratios, ratings, market capitalization, family type, price history, etc.

Obtained all critical information appropriate to my investment needs. Enabled me to manage all of my investments online.

Obtained list of best-rated funds according to search criteria.

1023

Different database connectivity technologies.



Multitier architecture for database development.



How web-to-database middleware is used to integrate databases with the Internet.

Figure DJ.1 RobCor Teacher Menu

Figure DJ.2 RobCor Problem Solutions Menu

1024

If you click on the View link for the first row shown in Figure DJ.2, you will see the code for the rc_u0.cfm script in Figure DJ.3.

1025

Figure DJ.3 rc-u0.cfm Script Code Sample

1026

ANSWERS TO REVIEW QUESTIONS 1126.

1127.

Describe the basic services provided by the ColdFusion web application server. Answer: The ColdFusion Web Application Server provides the following services (among others): 

Integrated Development Environment.



Session management with support for persistent application variables.



Security and authentication.



A computationally complete programming language (commands and functions) to represent and store business logic.



Access to other services: FTP, SMTP, IMAP, POP, and so on.

1128. Discuss the following assertion: The web is not capable of performing transaction management. Answer: Note the discussion in Section J.2c, Transaction Management. The concept of database transactions is foreign to the web. Remember that the web’s request-reply model means that the web client and the web server interact by using very short messages. Those messages are limited to the request for and delivery of pages and their components. (Page components may include pictures, multimedia files, etc.) The dilemma created by the web’s request-reply model is that: 

The web cannot maintain an open line between the client and the database server.



The mechanics of recovery from incomplete or corrupted database transactions require the client to maintain an open communications line with the database server.

1129. Transaction management is critical to the e-commerce environment. Given the assertion made in Question 3, how is transaction management supported? Answer: Clearly, creating mission-critical web applications mandates support for database transaction management capabilities. Given the just-described dilemma, designers must ensure proper transaction management support at the database server level. Many web-to-middleware products provide transaction management support. For example, ColdFusion provides this support through the use of its CFTRANSACTION tag. If the transaction load is very high, this function can be assigned to an independent computer. By using that approach, the web application and database servers are free to perform other tasks, and the overall transaction load is distributed among multiple processors.

1027

1130. Describe the webpage development problems related to database parent/child relationships. Answer: When the web is used to interact with databases, the application design must take into account the fact that the HTML web forms cannot use the multiple data entry lines that are typical of parent/child (1:M) relationships. Yet those 1:M relationships are crucial in ecommerce. For example, think of order and order line, or invoice and invoice line. Most end users are familiar with the conventional GUI entry forms that support multitable (parent/child) data entry through a multiple-component structure composed of the main form and a subform. Using such main form/subform forms, the end user can enter multiple purchases associated with a single invoice. All data entry is done on a single screen. Unfortunately, the HTML-only web environment does not support this very common type of data entry screen. As illustrated in the ColdFusion script examples, the web can easily handle single-table data entry. However, when multitable data entries or updates are needed—such as order with order lines, invoice with invoice lines, and reservation with reservation lines— the web falls short. Although implementing the parent/child data entry is not impossible in a web environment, its final outcome is less than optimum, usually counterintuitive, less userfriendly, and prone to errors. To see how the web developer might deal with the parent/child data entry, let’s briefly examine how you might deal with the ORDER and ORDER_LINE relationship used to store customer orders. Using an applications middleware server such as ColdFusion to create a web front end to update orders, one or more of the following techniques might be used: 

Design HTML frames to separate the screen into order header and detail lines. An additional frame would be used to provide status information or menu navigation.



Use recursive calls to pages to refresh and display the latest items added to an order.



Use stored procedures or triggers to move the data from the temporary table or array to the master tables.

1028

Figure PJ.1 The User Management Menu

1029



To add records: 

A blank form is shown using form field names that match the table field names.



The data is collected and sent to the next script that uses CFINSERT to add the record.

To edit records: 

A script uses a CFQUERY command to read the existing records, then the user selects the record to edit.



Finally, this script calls another script that uses the CFUPDATE command to update the data using the form fields with names matching the table’s fields.

To delete records: 

A script uses a CFQUERY command to read the existing records, then the user selects the record to delete and passes the primary key of the record to delete to the next script.



The next script reads the data, presents the data in a form, and verifies the data can be deleted.



Finally, this script calls another script that uses the CFDELETE command to delete the record using the table’s primary key passed by the previous script.

The following pages list scripts that are required to add, edit, delete, and search records.

1030

Figure PJ.2a rc-ua1 Script Code Sample

Lines 2 to 4 in the rc-ua1.cfm script generates a list of departments to be used in the HTML select statement shown in lines 45 to 49 in Figure PJ.2b.

1031

Figure PJ.2b rc-ua1—The Add User code (continued)

The output of the rc-ua1.cfm script is shown in Figure PJ.2c below.

Figure PJ.2c The Add User Web Form

1032

Figure PJ.3a The rc-ua2.cfm Add User code

The output of the rc-ua2.cfm script is shown in Figure PJ.3b.

Figure PJ.3b The Add User Results

1033

Figure PJ.4a The Edit User Form

1034

Figure PJ.4b rc-ue1.cfm Edit User Script (partial)

Let’s study this script to understand it better: 



Lines 6 to 8 use the CFQUERY Cold Fusion command to execute the SELECT SQL statement that reads the user data for the selected user.



1035

Figure PJ.5a The Edit User Results Screen

Figure PJ.5b rc-ue2.cfm Edit User Code (partial)

1036

Figure PJ.6a The Delete User Form

1037

Figure PJ.6b rc-ud1.cfm Script (partial)

Figure PJ.6c rc-ud1.cfm Script (partial)

The rc-ud2.cfm script is shown in Figure PJ.7a. Notice the use of the CFQUERY Cold Fusion command in lines 7 to 8 to execute the DELETE SQL statement to delete the selected user record.

1038

Figure PJ.7a rc-ud2.cfm Script Code

The rc-ud2.cfm script produces the output shown in Figure PJ.7b.

Figure PJ.7b The Delete User Results Screen

Figure PJ.8a The Search User Form

Figure PJ.8b rc-us1.cfm Script Code (partial)

1039

Clicking on the Search button invokes the rc-us2.cfm script. The rc-us2.cfm script produces the output shown in Figure PJ.9a.

1040

Figure PJ.9a The Search User Results Screen

The rc-us2.cfm script (partial code) is shown in Figure PJ.9b.

1041

Figure PJ.9b rc-us2.cfm Script Code

Notice the use of the CFQUERY command and SELECT statement in lines 5 to 11. This technique is commonly used to concatenate conditional statements to a query. The query will execute and return zero, one, or many rows. 1131. Create ColdFusion scripts to search, add, edit, and delete records for the INVTYPE table in the RobCor data source. Answer: This exercise series requires the student to create the data manipulation scripts for the INVTYPE table. The logic used in these scripts is the same as the one shown in Problem 1 above. Please refer to the solution scripts found on the Instructor’s Resource website. The main difference is that the scripts use the table INVTYPE and their respective fields in the forms. The scripts use Cold Fusion statements to insert, update, and delete rows in the INVTYPE table. The main data manipulation commands are: <CFINSERT DATASOURCE=”RobCor” TABLENAME=”INVTYPE”> <CFUPDATE DATASOURCE=”RobCor” TABLENAME=”INVTYPE”> <CFQUERY DATASOURCE=”RobCor”> DELETE FROM INVTYPE WHERE …… </CFQUERY> A sample data entry screen for the INVTYPE record is shown in Figure PJ.10.

Figure PJ.10 INVTYPE Add Record Screen

1042

1132. Create ColdFusion scripts to search, add, edit, and delete records for the VENDOR table in the RobCor data source. Answer: This exercise series requires the student to create the data manipulation scripts for the VENDOR table. The logic used in these scripts is the same as the one shown in Problem 1 above. Please refer to the solution scripts found on the Instructor’s Resource website. The main difference is that the scripts use the table VENDOR and their respective fields in the forms. The scripts use Cold Fusion statements to insert, update, and delete rows in the VENDOR table. The main data manipulation commands are: <CFINSERT DATASOURCE=”RobCor” TABLENAME=”VENDOR”> <CFUPDATE DATASOURCE=”RobCor” TABLENAME=”VENDOR> <CFQUERY DATASOURCE=”RobCor”> DELETE FROM VENDOR WHERE …… </CFQUERY> A sample data entry screen for the VENDOR record is shown in Figure PJ.11.

Figure PJ.11 VENDOR Add Record Screen

1043

1133. Modify the insert scripts (rc-5a.cfm and rc-5b.cfm) for the DEPARTMENT table so that the users who can be manager of a department are only those who belong to that department. Answer: This exercise series requires the student to create the data manipulation scripts for the DEPARTMENT table. The logic used in these scripts is the same as the one shown in Problem 1 above. Please refer to the solution scripts found on the Instructor’s Resource website. The main difference is that the scripts use the table DEPARTMENT and their respective fields in the forms. The scripts use Cold Fusion statements to insert, update, and delete rows in the DEPARTMENT table. The main data manipulation commands are: <CFINSERT DATASOURCE=”RobCor” TABLENAME=”DEPARTMENT”> <CFUPDATE DATASOURCE=”RobCor” TABLENAME=”DEPARTMENT”> <CFQUERY DATASOURCE=”RobCor”> DELETE FROM DEPARTMENT WHERE …… </CFQUERY> Please note that the data management commands for this exercise are included in the ColdFusion Examples scripts. Note that the insert script (rc-5a.cfm and rc-5b.cfm) for the DEPARTMENT table only lists the users that can be manager of a department. The key to this script is in the SELECT SQL statement. (Note the condition used in the WHERE clause. This condition lists only those users who are not already managers of a department.) <CFQUERY NAME="USRLIST" DATASOURCE="RobCor"> SELECT USR_ID, USR_LNAME, USR_FNAME, USR_MNAME FROM USER WHERE USR_ID NOT IN (SELECT USR_ID FROM DEPARTMENT WHERE USR_ID > 0) ORDER BY USR_LNAME, USR_FNAME, USR_MNAME </CFQUERY> The rc-5a.cfm script produces the output shown in Figure PJ.12.

1044

Figure PJ.12 The Department Data Entry Screen

Script rc-5b.cfm script uses a CFINSERT tag to add the data to the database. The script rc5b.cfm output is shown in Figure PJ.13.

Figure PJ.13 rc-5b.cfm—the Department Insert Query

1134. Create an Order data-entry screen, using the ORDERS and ORDER_LINE tables in the RobCor data source. To do this, you can use frames and other advanced ColdFusion tags. Consult the online manual and review the demo applications.

1045

Create a multiple frame page that will have one frame for the ORDER header information and another for the ORDER_LINE data entry. In this case, the ORDER data will have to be entered, validated, and saved first, before the ORDER_LINE frame is shown or accessed. Once the main ORDER data are saved, the second frame can be used to enter the ORDER_LINE rows. Both frames use buttons that will enable the system to accept data entry and to perform validation checks. This solution is not particularly well suited to a commercial e-commerce environment for two good reasons: 3. The end-user navigation among frames is awkward and is likely to be rejected by end users. 4. Keeping both frames synchronized is difficult and, unless the coding is particularly robust, is prone to failure.

1046



The rc-oa1.cfm script (found in the Instructor’s Resource website) produces the output shown in Figure PJ.14.

Figure PJ.14 The ORDER Data Entry Screen

1047

ANSWERS TO REVIEW QUESTIONS 1135. What is the difference between a replacement update and an operator update in MongoDB? Answer: In MongoDB a replacement update will replace the entire document being updated. If the existing document has key:value pairs that are not included in the update command, then those pairs are lost. Only the pairs specified in the update command will exist in the replaced version of the document. With an operator update, the existing document is unchanged except for the changes specified in the update command. Pairs not included in the update command are not affected. 1136.

1048

1137.

1138.

1139. Explain the difference in processing when using an explicit and and an implicit and with MongoDB. Answer: With both forms of logical and the DBMS must apply criteria to a document to determine if the document should be included in the results. An explicit and, using the $and operator, will determine that a document should not be included and stop applying criteria to that document as soon as one of the criteria evaluates to FALSE for that document. An implicit and will apply all criteria to the document before determining if the document should be included or not in the results. As a result, explicit and tends to perform better in most cases.

1049

fname: "Rachel", lname: "Cunningham", display: "Rachel Cunningham", type: "student", age: 24

} ); 1140. Modify the document entered in the previous question with the following data. Do not replace the current document. Answer: Rachel has checked out two books on January 25, 2018. The id of the first checkout is “95000” The first book checked out was book number 5237

1050

"_id": ObjectId("5a45c23f395ff183e78d9c17")},

{

$set:

{checkouts:

[ {

“id”: "95000", “year”: "2018", “month”: "1", “day”: "25", “book": “5237", "title": "Mastering the database environment", “pubyear”: "2015", “subject”: "database"

}, {

"id": "95001", "year": "2018", "month": "1", "day": "25", “book”: "5240", “title”: "iOS Programming", “pubyear”: "2015",

1051

“subject”: "Programming" } ] } } ) 1141. Write a query to retrieve the _id, display name and age of students that have checked out a book in the cloud subject. Answer: db.patron.find({"checkouts.subject":"cloud"}, {display:1, age:1}) 1142. Write a query to retrieve only the first name, last name, and type of faculty patrons that have checked out at least one book with the subject “programming”. Answer: db.patron.find({type: "faculty", "checkouts.subject":"programming"}, {fname:1, lname:1, type:1, _id:0}) 1143. Write a query to retrieve the documents of patrons that are faculty and checked out book 5235, or that are students under the age of 30 that have checked out book 5240. Display the documents in a readable format. Answer: db.patron.find({$or: [ {type: "faculty", "checkouts.book":"5235"}, {type: "student", "checkouts.book":5240, age: {$lt:30}} ] } ).pretty()

1052

1144. Write a query to display only the first name, last name, and age of students that are between the ages of 22 and 26. Answer: db.patron.find({ type:"student", $and: [{age: {$gte:22}}, {age: {$lte:26}} ] }, {fname:1, lname:1, age:1, _id:0} )

NOTE Most of this appendix and all of the end-of-appendix problems require the use of the Ch14_FCC.txt file. The Appendix provides instructions on how to import this file into a Neo4j graph.

ANSWERS TO REVIEW QUESTIONS 1145. Explain the difference between using the same variable name and different variable names when matching multiple patterns in Neo4j. Answer: Within a given command, all references to a variable are treated as references to the same object (node, edge, or path). Therefore, if the same variable is used in multiple patterns in the same command, then the same node or edge will be required to match both patterns. If different variable names are used, then the node or edge does not have to be the same node or edge in both patterns. 1146. What is the difference between using WHERE and embedding properties in a node when creating a pattern in Neo4j?

1053

1054

ANSWERS TO PROBLEMS For the following problems, use the Food Critics Club (FCC) graph database that was created and used earlier in the text for use with Neo4j. Create a node that meets the following requirements. Use existing labels and property names as appropriate. Answer: The node will be a member, and should be labeled as such, with member id 5000. The member’s name is “Abraham Greenberg”. Abraham was born in 1978, and lives in the state of “OH”. Abraham’s email address is agreen@nomail.com, and his username is agberg. Create (:Member {mid:5000, fname: “Abraham”, lname: “Greenberg”, birth: 1978, state: “OH”, email: “agreen@nomail.com”, username: “agberg” } ) 1147. Create a restaurant node with restaurant id is 10000, the name “Hungry Much”, and located in Cobb Place, KY. Answer: Create (:Restaurant {rid: 10000, name: “Hungry Much”, state: “KY”, city: “Cobb Place”}) 1148. Update the “Hungry Much” restaurant created above to add the phone number “(931) 555-8888”, and a price rating of 2. Answer: Match(r :Restaurant {name: “Hungry Much”}) Set r.phone = “(931) 555-8888”, r.price = 2 1149. Create a REVIEWED relationship between the member created above and the restaurant created above. The review should rate the restaurant as a 5 on taste, service, atmosphere, and value. Answer: © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

1055

Match (abe :Member {fname: “Abraham”, lname: “Greenberg”}), (hungry :Restaurant {name: “Hungry Much”}) Create (abe) -[rev :REVIEWED {taste: 5, service: 5, atmosphere: 5, value: 5}]-> (hungry) 1150. Create a REVIEWED relationship between member Frank Norwood and the restaurant created above. The review should rate the restaurant as a 4 on taste, service, and value, and rate the restaurant as a 2 on atmosphere. Answer: Match(frank :Member {fname: “Frank”, lname: "Norwood”}), (hungry :Restaurant {name: “Hungry Much”}) Create(frank) -[rev :REVIEWED {taste:4, service: 4, atmosphere: 2, value: 4}]-> (hungry) 1151. Write a query to display member Frank Norwood and every restaurant that he has rated as a 4 or above on value. Answer: Match (frank :Member {fname: “Frank”, lname: “Norwood”}) -[rev :REVIEWED]-> (rest :Restaurant) Where rev.value >= 4 Return frank, rest 1152. Write a query to display cuisine, restaurant, and owner for every “American” or “Steakhouse” cuisine restaurant. Answer: Match (c :Cuisine) <-- (rest :Restaurant) <-[:OWNS]- (o :Owner) Where c.name = “American” OR c.name = “Steakhouse” Return c, rest, o 1153. Write a query to return the shortest path based only on reviews between members Abraham Greenberg and Herb Christopher. Answer: Match p = shortestPath((abe :Member {fname: “Abraham”, lname: “Greenberg”}) -[:REVIEWED *](herb :Member {fname: “Herb”, lname: “Christopher”})) Return p

1056

1057