Relational Database And Entity-Relationship Modeling: A Primer by Prabhudev Konana, Ph.D

Page 1

Relational Database And Entity-Relationship Modeling: A Primer

ď›™ Konana, 2000

Prabhudev Konana, Ph.D. Assistant Professor The Graduate School of Business The University of Texas at Austin Austin, TX 78712


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

TABLE of CONTENTS 2.1 Relational Model....................................................................................................................... 2 2.1.1 Relational Keys.................................................................................................................... 2 2.2 Relational Integrity ..................................................................................................................... 5 Example 2: ................................................................................................................................... 8 2.3 Normalization ............................................................................................................................ 9 2.3.1 Implications of Normalization........................................................................................... 12 2.4 Questions to Ponder (Relational Model and Normalization)!.................................................... 14 2.4.1 What to focus on?.............................................................................................................. 14 2.4.2 True/False questions ......................................................................................................... 15 2.4.3 Practice problem for normalization.................................................................................... 15 2.5 Database design steps............................................................................................................... 17 2.5.1 Entity-Relationship Model (E-R model)............................................................................. 17 2.5.2 Example ............................................................................................................................ 21 Other E-R constructs: Weak entity......................................................................................... 23 Recursive Relationships.......................................................................................................... 23 2.5.2 Extended or Enhanced E -R model........................................................................................ 24 2.6 Questions to Ponder – Entity-Relationship modeling! .............................................................. 26 2.6.1 Examples........................................................................................................................... 26 2.6.2 True/False questions ......................................................................................................... 26 2.7 Example of E-R modeling using ERWin (OPTIONAL) .......................................................... 27

1


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

2.1 Relational Model The relational model is based on (relational) set theory in mathematics. Most commercial database products introduced in the last two decades are relational model (e.g., Oracle, Informix, DB2, Sybase, Ingres, MS SQL Server). A relation is nothing but a “table” (as perceived by us) that consists of tuples (rows) and attributes (columns). A relation or a table has certain properties: (1) A relation consists of a set of attributes with unique names (i.e., attribute names cannot repeat in the SAME table). It doesn’t matter in which order the attributes are listed. (2) The values of attributes fall into a known domain or legal values (For example, SSN consists of values from 000000000 to 999999999, or Age is between 0 and 125). These domains enforce some degree of data integrity (i.e., users don’t enter meaningless values. For example, the number of students in a class cannot be -20 or 3.26; it has to be a positive integer). Note that for a given attribute, the values should come from the same domain. (3) Each tuple (row) in the database is unique. That is, records do not repeat. To avoid records from repeating, we have what is called "Primary Key." The primary key uniquely determines each record. That is, given a value for a primary key we can uniquely determine a record. For example, SSN is the primary key for STUDENT table and there is exactly one record that matches a given student’s SSN. The primary key cannot take NULL value (means a record cannot exist without a unique identifier. For example, SSN is the primary key for EMPLOYEE table. One cannot enter an employee’s information without his/her SSN into the EMPLOYEE table). 2.1.1 Relational Keys There are many types of keys that can be defined in the relational model. These have significant importance in maintaining data consistency and correctness in the database. We will discuss only the candidate key, primary key, alternate key, and foreign key. (We will ignore the concept of super key. Anyone interested can refer any traditional textbook on databases for more details.) a. Candidate Keys: A minimal set of attributes in a table that uniquely identifies a record. When there is more than one attribute in the candidate key, it is called composite key. b. Primary Key: A candidate key that is chosen to represent a record uniquely. That is, a table may consist of many candidate keys, but ONLY ONE can be selected as a primary key. Note that a primary key can have more than one attribute. When there is more than one

2


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

attribute we say that the primary key is a composite primary key. One of the common mistakes students have made is to refer to one of the attributes in the composite key as primary key. Please note that when there is more than one attribute in the primary key ALL ATTRIBUTES TOGETHER are called the primary key since they together define the record uniquely. c. Alternate Keys: Any candidate key that is not selected to be a primary key can be an alternate key. Most databases allow us to define alternate keys by way of UNIQUE constraint. d. Foreign Keys: An attribute or a set of attributes of one table that is matched to candidate keys of another table (or even the same table). That is, a “child” table may reference the “parent” table for appropriate attribute values. Please note that the corresponding attributes in two different tables must have the same data type and domain (see example below). The above relational keys are explained with a example below: Example: 1 Consider a commercial bank. They have many tables in their main database. Consider two tables related to bank customers (table name: CUSTOMER) and their accounts (table name: ACCOUNT). For simplicity, we will not worry about different types of accounts or joint ownership of accounts. The example is just for demonstration purposes. We could potentially have a completely different table structure than what is shown here. CUSTOMER table: The attributes are Cust_SSN, Cust_Name, Cust_Address, Cust_State and Cust_Phone. (The attribute list is not meant to be exhaustive). ACCOUNT table: The attributes are Acct_Number, Acct_Balance, and Acct_Holder. In the CUSTOMER table, Cust_SSN can be a candidate key since it uniquely identifies a record. That is, there can be no two customers with the same SSN. We can have another attribute or a collection of attributes to be candidate keys. The figure below shows one another candidate key which is a composite candidate key of Cust_Name and Cust_Phone. That is, when the values of these attributes are considered together they uniquely determine the record. However, each attribute

3


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

by itself cannot determine a record uniquely. For instance, we can have two customers with the same phone number. We assume that there can be no two customers with the same phone number and name. We choose Cust_SSN to be the primary key since it is also a candidate key. We can now define the composite candidate key (Cust_Name and Cust_Phone) to be an alternate key. There is no need for a foreign key since this table acts as a parent table for any other table that uses Cust_SSN. Attributes

Primary key

Candidate key

Cust_SSN

Cust_Name

Cust_Address

111-11-1111

Konana

CBA 3.246, UT, Austin

Cust_State Cust_Phone

TX

512-471-5219

Row (tuple)

Consider the ACCOUNT table with the three attributes: Acct_Number, Acct_Balance and Acct_Holder. In this table Acct_Number uniquely determines each record and, therefore, it is a candidate key. Please note that we made an assumption earlier that there is no joint ownership of any account. Hence, Acct_Holder, which is nothing but the customer SSN, could also be the candidate key. Let’s choose Acct_Number to be the primary key. Assume there are two records in this table. The account numbers correspond to customers in the CUSTOMER table. In order to make sure that there are no account numbers for account holders who are not in the CUSTOMER table we need to enforce some consistency. Otherwise, something unwanted can happen. If you notice the ACCOUNT table, the second record belongs to a customer with SSN 222-22-2222. However, in the table CUSTOMER there is no customer with SSN 111-111111. Therefore, we have an account owned by customer with SSN 222-22-2222, but we do not know who the customer is! Acct_Number

123456 124578

Acct_Balance

50.32 529000.9

4

Acct_Holder

111-11-1111 222-22-2222


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

In order to prevent unanticipated inconsistency in your database we can define Acct_Holder in the ACCOUNT table to be a foreign key referencing the attribute Cust_SSN in the CUSTOMER table. The reason is any entry in Acct_Holder must be listed (or referenced) in the CUSTOMER table first. Therefore, CUSTOMER table is the “parent” table and ACCOUNT table is the “child” table. Since Acct_Holder refers to a key in another table it is called the foreign key. Primary key

CUSTOMER table Cust_SSN

Cust_Name

Cust_Address

111-11-1111

Konana

CBA 3.246, UT, Austin

Primary key

ACCOUNT Table

Acct_Number

123456 124578

Acct_Balance

50.32 529000.9

Cust_State Cust_Phone

TX

512-471-5219

Foreign key Acct_Holder

111-11-1111 222-22-2222

Not a valid entry since there is no 222-22-2222 in the CUSTOMER table

2.2 Relational Integrity Data integrity is critical for businesses. Anyone who has worked with databases will know that a countless number of hours are spent to make data consistent. Let’s take the simple example of “date.” Date can be stored in any number of ways: mmyydd, ddmmyy, mmddyyyy, mm-dd-yy, yyyydd-mm and so on. If the date stored is not consistent then the queries executed on the database may compute incorrect results or may not retrieve the correct set of records. In 1991, an ambitious project to computerize faculty research database in a large university was shelved after it was realized that data input was highly inconsistent despite the large number of built-in checks. For instance, the journal Communications of the ACM was input in at least 10 different ways (e.g., CACM, The Communications of the ACM, Communication of the ACM, Communications of ACM, comm. of ACM, etc. The effort to provide a pull-down menu was also rejected as users had little patience to scan the list of over 200 journal names). The above is just a simple example of what can happen within the database that compromises data integrity.

5


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

Data integrity constraints refer to the accuracy a nd correctness of data in the database. They provide a mechanism to maintain data consistency when we perform an UPDATE, INSERT or DELETE operation. There are different types of data integrity constraints: 1. Entity integrity 2. Null integrity 3. Domain integrity 4. Referential integrity 5. Business constraints ENTITY integrity implies that a primary key cannot accept nulls (If the primary key is a composite key, that is, more than one attribute forms a primary key, then no component of a primary key can accept nulls). In the CUSTOMER table example, Cust_SSN can never be null. If it is null then that record should not exist. This is similar to saying some nameless person is teaching the database class. Every person must have a name (soon after birth J)! Consider the following set of tables for a university database (FACULTY, SECTION and COURSE): SECTION Table

COURSE Table

Course_ID MIS380N.1 MIS374

Course_name ITM Systems Analysis

FACULTY Table

Faculty_SSN 111-11-1111 222-22-2222 333-33-3333 444-44-4444 555-55-5555

Course_id * MIS380N.1 MIS380N.1 MIS380N.1 MIS380N.1 MIS380N.1 MIS374

Section * 1 2 3 4 5 2

Faculty_SSN 111-11-1111 222-22-2222 222-22-2222 333-33-3333 444-44-4444 555-55-5555

Faculty_Name Konana Barua Ruefli Leibrock Jordan

The attributes Course_ID and Section in SECTION table together make a composite primary key. In this SECTION table, no part of the primary key should be null. However, in many cases we may not know the value of an attribute temporarily. If that attribute is part of the primary key then we cannot insert that record into the table. However, we may have

6


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

situations where some non-primary key attribute may be left blank. This gives raise to NULL integrity. Null implies that the data value is not known temporarily. It does not represent “blank spaces” or “zero.” Remember than zero or spaces are characters and may not be a valid value for an attribute (e.g., Age cannot be zero). This brings us to the next integrity that is DOMAIN integrity. We have discussed domain integrity as part of the relational model definition. Each attribute must have values derived from a valid range. For instance, SSN cannot have any letter from the alphabet. You could define a hyphen after three digits and next two digits. Inventory of parts in a manufacturing firm database can never be negative. REFERENTIAL integrity is related to the foreign key discussed earlier. Referential integrity is best explained with an example. Consider the SECTION table above. The attributes Course_ID and Faculty_SSN are foreign keys referring COURSE table and FACULTY table, respectively. Assume Course_ID MIS380N.1 is entered in SECTION table but has no corresponding entry in the COURSE table. Similarly, Faculty_SSN 111-11-1111 is entered in SECTION table but has no corresponding entry in the FACULTY table. This creates many problems. For example, you will never know the name of the course with Course_ID MIS380N.1 or the name of the instructor who teaches MIS380N.1 and MIS382. To avoid such problems, we earlier defined foreign keys. Through foreign keys we can enforce referential integrity. There are many types of referential integrity constraints: Cascade, Restrict and Nullify. If any changes are made in the COURSE or FACULTY table then the corresponding entries in the SECTION table are also changed. This is called Cascade on Update. If entries are deleted in the COURSE or FACULTY table then corresponding entries in the SECTION table are also deleted. This is called Cascade on Delete. If you don’t want to delete entries in COURSE or FACULTY table if there are any corresponding entries in SECTION table then we can define Restrict on Delete. Similarly we can enforce Restrict on Update. Note that if there are no corresponding entries in the child table then the parent record can be deleted/updated. There is another type of referential integrity called Nullify on update or delete. That is, if the records in COURSE or FACULTY table are deleted then corresponding entries in the SECTION table are set to null. For instance, in FACULTY table if we delete Konana (SSN = 111-11-1111) then the Faculty_SSN for the corresponding entries in the SECTION table (1st , 5 th and 6th record) are set to null. In the above example, we cannot set Course_ID in the SECTION table null since it is part of the primary key.

7


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

Recently new types of constraints have been introduced to enforce and change complex business rules that typically required substantial application code in the past. These are called business constraints expressed as rules that can be enforced fairly easily with triggers. These features have dramatically reduced cost of maintenance, and improved responsiveness and database management. Some examples are: •

If # of items of Disk drives < 100 then update Status = “Low” (and send an invoice to ABC vendor for additional shipment)

If MIS380N.1 is not taken then deny registration for MIS382

Example 2: Consider a university database with three tables - STUDENT, COURSE and GRADE Table 1: STUDENT (SSN, Name, address) Table 2: COURSE (Course_ID, C-Name,..) Table 3: GRADE (Course_ID, SSN, Grade). SSN in the table GRADE is a foreign key that “references” (note the word “REFERENCES”) SSN in STUDENT table. The idea behind this is that a grade cannot exist for a student when that student is not in the STUDENT table. NOTE - FOREIGN KEY implies that the attribute is a CANDIDATE (PRIMARY or ALTERNATE) KEY in another table. This has a number of implications. For example: 1. What happens if a Student record in deleted from STUDENT table? 2. What happens if the SSN of a student is changed (for some reason)? 3. What happens if the Course_ID is changed for a particular course? If we are not careful, we can end up with data inconsistency. For example, a student with SSN = “111-11-1111” is deleted from STUDENT table. What happens to his/her records in the GRADE table? If we don’t take any actions, those records will remain in the GRADE table, but we can never know who the student is. Therefore, we define explicitly a set of rules while Updating or Deleting RESTRICT, CASCADE or NULLIFY (an update or delete). As the terms suggest, when we delete (update) a student’s record (or SSN) from the STUDENT table and if one or more records

8


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

exist in the GRADE table then delete (or update) is RESTRICTED (don’t allow) or CASCADED (delete or update all records) or NULLIFIED (make those entries blank).

2.3 Normalization Normalization theory deals with reducing undesirable properties, such as redundancy, and minimizing INSERT, DELETE and UPDATE anomalies (anomalies are abnormalities) of the database. Some of the anomalies are a result of redundancy. Depending on the number of (or instances) of anomalies, we have various levels (degrees) of normalization. These anomalies occur because the databases designed may have certain inherent problems. The degree of normalization is defined by normal forms. The normal forms, in an increasing level of normalization, are first normal form (1NF), second normal form (2NF), 3NF, Boyce Codd Normal form (BCNF), 4NF and 5NF. In general, 3NF is considered good enough. In certain instances, a lower level (i.e., 1NF or 2NF) of normalization is desirable (e.g., where queries take enormous time to execute). In order to understand normalization, one needs to understand how attributes are related (or what is called functional dependencies). For example, SSN determines a person’s Name. We say that Name is functionally dependent on SSN; typically represented by SSN → Name. Example: Consider a database that is intended to keep track of faculty and courses offered. An extremely simple way of maintaining data is given below: Course_ID*

Section*

Course_Name

Faculty_SSN

Faculty_Name

MIS380N.1

1

ITM

111-11-1111

P. Konana

MIS380N.1

2

ITM

222-22-2222

A. Barua

MIS380N.1

3

ITM

222-22-2222

A. Barua

MIS380N.1

4

ITM

333-33-3333

T. Ruefli

MIS380N.1

5

ITM

444-44-4444

L. Leibrock

MIS374

1

Systems Analysis

555-55-5555

E. Jordan

9


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

In the above example, Course_ID and Section form the (composite) primary key. That is, the primary key uniquely determines each record. The functional dependencies in the above table are as follows: 1. Course_ID → Course_Name (i.e., given Course_ID we can precisely determine the course name) 2. Course_ID, Section → Faculty_SSN 3. Course_ID, Section → Faculty_Name 4. Faculty_SSN → Faculty_Name We can represent the above functional dependencies in the form of a figure called dependency graph. The primary key of the above table is a composite key of Course_ID and Section. We represent that as ONE box (in brown). We draw a line from this box to Faculty_SSN and Faculty_name (red lines) because of functional dependencies 2 and 3. We draw a line from Course_ID to Course_name (yellow line) because of dependency 1. Similarly, we have a green line from Faculty_SSN and Faculty_Name based on the functional dependency 4. (In general, it is the yellow line that is used when an attribute is not fully dependent on the primary key, the red line when attributes are fully dependent on the primary key, and the green line when an attribute is dependent on a non-key attribute.)

Course_Name

Course_ID

Instr_SSN

Section

Instr_Name

Composite Key

The above table is in 1NF since the attribute set contains atomic values (values cannot be broken down into smaller elements). For all practical purposes we can assume that any table given to us is in 1 NF. There are many problems with the above table: 1. If a faculty is not teaching any course, he/she will not be represented in the database (INSERT anomaly). 10


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

2. If MIS374 is deleted, we lose all the information of faculty "Jordan” (DELETE anomaly). 3. If we have to rename the Course_name for MIS380N.1, we need to update every record that has MIS380N.1 (UPDATE anomaly). We can now normalize (reduce the anomalies) the above table. If you watch the dependency graph carefully, you’ll notice one of the attributes is partially dependent on the primary key (Course_ID → Course_Name). (The yellow line is drawn from Course_ID to Course_Name). In other words, Course_Name is not completely dependent on Course_ID and Section (the primary key). This can potentially create problems such as that in update anomaly discussed above. We can now make two tables from the above:

Table 1

Course_ID

Course_Name

Course_ID

Faculty_SSN

Table 2 Section

Faculty_Name

Composite Key

The above figure is the same as the tables below.

Course_ID MIS380N.1 MIS374

Course_name ITM Systems Analysis

Course_id * MIS380N.1 MIS380N.1 MIS380N.1 MIS380N.1 MIS380N.1 MIS374

Section * 1 2 3 4 5 1

Faculty_SSN 111-11-1111 222-22-2222 222-22-2222 333-33-3333 444-44-4444 555-55-5555

Faculty_Name

P. Konana A. Barua A. Barua

T. Ruefli L. Leibrock E. Jordan

When every (non-key) attribute is fully dependent on the primary key and the table is already in 1 NF, we call the set of tables to be in 2NF. This minimizes some of the problems discussed earlier (UPDATE anomaly). However, we can normalize even further. If we observe the dependency graph for Table 2 above, we can see that Faculty_Name is transitively dependent on the primary key (that is, even if there is no direct link between the primary key and Faculty_Name, it can 11


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

be derived transitively from (1) Course_ID, Section → Faculty_SSN; and (2) Faculty_SSN → Faculty_Name). Therefore, we can split the above tables into three tables as below: Table 1

Course_ID

The above set of tables is said to be in

Course_Name

3NF. That is, every non-key attribute is non-transitively dependent on the

Course_ID

Faculty_SSN

primary key and the table is already

Table 2

in 2NF. This further eliminates some of

Section

the problems such as INSERT and

Composite Key

DELETE anomalies (and also the redundancy). Other normal forms such Table 3

Faculty_SSN

as BCNF, 4NF and 5NF are not

Faculty_Name

discussed and readers can refer any database textbook for details. The final set of tables is as shown below: SECTION Table

COURSE Table

Course_ID MIS380N.1 MIS374

Course_name ITM Systems Analysis

FACULTY Table

Faculty_SSN 111-11-1111 222-22-2222 333-33-3333 444-44-4444 555-55-5555

Course_id * MIS380N.1 MIS380N.1 MIS380N.1 MIS380N.1 MIS380N.1 MIS374

Section * 1 2 3 4 5 2

Faculty_SSN 111-11-1111 222-22-2222 222-22-2222 333-33-3333 444-44-4444 555-55-5555

Faculty_Name Konana Barua Ruefli Leibrock Jordan

2.3.1 Implications of Normalization As discussed earlier, normalization allows us to minimize insert, update and delete anomalies and help maintain data consistency in the database. Ideally, organizations would like to keep data in higher normal forms for these reasons. However, there is a side effect. For large databases, queries

12


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

that require join of two or more tables will take a significant amount of time. That is, there is a tradeoff between higher normal form and response time. Typical databases in large corporations would have millions of records in each table. For simplicity let’s assume a bank example where there are 100,000 customers in the customer table and 1 million transactions (records) in the transaction table. If you need to run a query that requires a join of these two tables, in the worst case, it must first create 100,000 multiplied by 1 million records and then select (selects the appropriate records) and project (projects the appropriate attributes). If there is any grouping to compute sum, average, count, maximum and minimum then additional processing is required. When we join two tables, the database creates all permutations and combinations of these records and then processes further (there are efficient techniques to minimize this problem). Therefore, higher the normal form larger will be the processing time since more joins are required for querying. Of course, if there is no need to join tables then normalized tables are efficient since there is no data redundancy to deal with. From an organizational point of view this trade-off is not acceptable. They need both highly consistent data and good response time. There are many techniques to improve response time while keeping tables at least in 3rd NF. The database can have

Business Users

3 rd NF, but the database administrator can pre-compute frequently used queries and provide a simple view of precomputed queries. Views are virtual tables (means, they

Views

do not exist as tables but gives you an illusion that they do exist) formed by queries on actual set of tables (actual

Base Relations

set of tables are called “base” tables). Therefore, users Normalized Tables

query the view of the pre-computed query rather than the

original tables. This raises the question “what if the data in the base tables change?” To address this issue, there are built-in techniques to update the pre-computed table incrementally whenever the base tables change. Therefore, changes are reflected in the view immediately. Views also have another function: they provide data security. They hide parts of the database from users. The same data can be customized for each user so that the same data can be viewed differently. There are other ways to improve response time. All or most of the database can be stored in the main memory or cached. There are simple techniques available to database administrators cache frequently accessed data. This can improve response time of queries by many magnitudes. The

13


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

reason is that one of most time-consuming aspects of querying is retrieving data from the secondary or tertiary storage such as disk drive or optical drives. If we can avoid data access by placing all the data in the main-memory, the querying time can be minimized dramatically. Of course, mainmemory databases add another level of complexity of backing up since any accidental system failure or outage can destroy your data. Many databases (e.g., Oracle) provide easy mechanisms to cache (highest form of memory) parts or all of data while creating tables. Response time can be improved by having efficient indexing techniques. Efficient indexing techniques are being used in data warehouse applications where the size of the database is in the order of many terabytes (1000 Giga bytes) to improve response time. There are other advanced techniques to improve response time using Symmetric Multiprocessing (SMP) and Massively Parallel Processing (MPP). Queries can be broken down into smaller chunks of smaller queries and executed in different processors simultaneously.

2.4 Questions to Ponder (Relational Model and Normalization)! 2.4.1 What to focus on? 1. What is a relational model and what are the characteristics? 2. What are candidate keys, primary keys, alternative keys and foreign keys? Give an example how alternative keys can be useful in a database application? 3. Various types of relational integrity, and their implications. 4. Understand the role of referential integrity using Cascade on delete/update, and Restrict on delete/update. 5. What is normalization? Understand the various levels of normalization (1st , 2nd and 3rd) and the implications of normalization. How to provide good response time for user queries while still maintaining highly normalized table? 6. What are functional dependencies (FD)? Should be able to figure out FDs from a given table. You should be able to figure out primary key of a table given a set of FDs. Review the normalization example with particular emphasis on how to identify partial dependencies and transitive dependencies.

14


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

2.4.2 True/False questions State whether the statement is true of false. You must justify why this statement is true or false. You may use hypothetical examples to articulate your thoughts. 1. When you define RESTRICT on delete for referential integrity, if you delete a parent record in one table and if there are no related records in another (child) table then the delete is permitted. 2. When we normalize a set of relations, we reduce redundancy and may never be able to recreate the original set of relations. 3. A table is in 3rd normal form only when every non-key attribute is non-transitively dependent on the primary key. 4. A foreign key can take NULL value. 5. A set of relations that is in third normal form need not necessarily be in second normal form. 6. When we define a primary key we can enforce that when a record is deleted all related records in other tables are also deleted. 7. In a relation, all the attributes together can be a candidate key. 8. If there are two tables with 100,000 records in each, then a join of these tables without any join condition or any attribute conditions will result in 200,000 records. 9. An attribute in a composite primary key can also be a foreign key 10. IDMS is a relational model. 11. A primary key can take NULL value. 12. A table is in 2nd Normal form and not in 3rd normal form when there is partial dependency on the primary key 13. A table is in 3rd normal form when every non-key attribute is completely dependent on the primary key 14. Normalizing tables does not mean we increase the response time. 15. Companies can store data in normalized tables while still providing high response times by caching. 2.4.3 Practice problem for normalization Consider the table BOOK (the attributes are in parenthesis) and its functional dependencies (Note: DO NOT ADD ANY NEW ATTRIBUTES. YOU MUST WORK BASED ON THE

15


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

FUNCTIONAL DEPENDENCIES GIVEN BELOW. DO NOT MAKE YOUR OWN ASSUMPTIONS): BOOK (Title, Author, Price, Publisher-name, Number-of-copies, subject-area) Author à Subject-area Title, Author à publisher-name Title, Publisher-name à Price Title, Publisher-name à Number-of-copies

1. Determine the primary key of the BOOK table and justify. Your primary key should be based on the functional dependencies provided.

2. Is this table in 2NF? Why or why not? Justify.

3. Compose the table into 3NF. Justify why the set of tables are in 3NF.

16


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

2.5 Database design steps The database design consists of four major steps: Analysis à (Conceptual modeling à Logical modeling) à Physical design à Implementation. In the analysis phase, the planning, requirements gathering and feasibility study are conducted. The conceptual modeling deals with the identification of the data requirement, the data semantics and the relationship among the data. This is modeled as entity-relationship (E-R) diagram most often using a modeling tool (e.g., ERWin, ProcessModeler, Rational Rose). Once E-R modeling is done, it is translated into a set of tables in the logical modeling. Most often conceptual modeling and logical modeling are combined since modeling tools automatically convert the E-R diagram into database schema (table structure). The tables are normalized based on our requirements. The physical design involves how data will be stored and accessed in a computer system. Some of the decisions that have to be made include where data will be stored, how data will be distributed within an organization or disks, and type of indexes to be used (for efficient retrieval and manipulation). Implementation phase is the actual implementation of the database and associated programming. Below the conceptual modeling using the EntityRelationship is discussed. 2.5.1 Entity-Relationship Model (E-R model) Peter Chen first proposed modeling databases using a graphical technique that humans can relate to easily. Humans can easily perceive entities (see the definition below) and their characteristics in the real world and represent any relationship to each other. The objective of modeling graphically is even more profound than simply representing these entities and relationship; database designer can use tools to model these entities and their relationships and then generate database vendor specific database schema (database tables, primary keys, foreign keys and any other constraints) automatically. These tools can normalize the tables as well, if the functional dependencies are also stated (unfortunately, there are no commercial tools used in real-life that use functional dependencies, although Microsoft uses the concept of functional dependencies to convert arbitrary tables or Excel spreadsheet into third normal form Access tables.). Potentially these tools can improve productivity of the designers by automating routine tasks. Most importantly, these tools can create the required documentation for future maintenance (note: approximately 70% of the system lifecycle cost is spent on maintenance alone!).

17


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

An ENTITY is a "thing" or an "object" in the real world that is distinguishable from all other objects. For example, in our (now infamous) Faculty-Course-Student example, Faculty, Students, and Courses are entities. An entity has a set of properties or attributes. For example, Student has several attributes like, SSN, Name, Address, State, Zip, etc. One or more attributes may uniquely identify an entity. For Faculty and Students, SSN uniquely identifies one particular person. In the real world, entities have some relationships. For example, "Faculty" teaches "Courses" or "Students" take "Courses." Therefore, we can model the real world in terms of Entity-Relationship. In order to do that, we represent entities as rectangles, relationships as diamonds, and attributes as ovals.

Entities

Relationship

Attributes

For every relationship you must consider the degree of relationship (one-to-one, one-to-many, many-to-many), called the cardinality. The cardinality describes how an instance of an entity (i.e., if STUDENT is an entity, a particular student John Doe is an instance of the entity STUDENT. Think of this as a record of STUDENT table.) is linked to the instances (records) of another entity (or, may be the same entity). The table structure depends on the relationship degree. The cardinality concept is discussed below with examples along with rules for converting these entities in a relationship into tables. 1. One-to-One relationship: Example - A faculty can teach at most one section each semester and a section can be taught by at most one faculty. In the above example, one faculty can teach no classes or maximum 1 section. We say that the cardinality is (0,1) (shown in red in the figure below). Similarly, one section is taught by no faculty or maximum one faculty member; that is, (0,1) shown in green in the table below. The relationship is described by a verb "teaches." Assume for this example, the unique key for entity FACULTY is SSN and that of SECTION is Unique#.

18


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

NAME (0,1)

FACULTY

(0,1)

teaches

SECTION

SSN

Unique#

How to convert the above one-to-one relationship into tables? Ideally, entities in one-to-one relationship can be combined into one large table with attributes from both the entities. The primary key of this table can be any one of the unique keys of FACULTY and SECTION entities. In some situation, it is better to create two tables. In which case, each entity becomes a table with a primary key (underlined attribute). Key of either entity can appear in the other entity’s table as a FOREIGN key. In the above example, we have two tables - FACULTY and SECTION FACULTY Table SSN

NAME

……

…….

NAME

….

SSN

SECTION Table Unique#

(Foreign

key

references

FACULTY Table) Even though we can argue that Unique# can be a foreign key in Table FACULTY, it is not a good idea. What if tomorrow you have to make changes to the structure so that you want to answer “ List all the courses taught by Faculty with SSN = ‘111-11-1111’ in the past 4 semesters.” That is, what if tomorrow one changes the requirements that a faculty MUST teach more than one section? è YOU MUST CHANGE the structure of both the tables that will be very expensive. Imagine, changing the structure of your building after it has been built versus changing wall color in a room. Similarly, your design goal must be to AVOID structural changes in future! Bottom line è Think about possible queries, future potential changes, whether you need to maintain history, which record remains relatively static (for example, a faculty will continue to teach

19


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

different courses while a record of teaching seldom is updated - that is, once entered in SECTION table it remains as history and is not changed in future). 2. One-to-many or Many-to-one A faculty can teach at most three sections each semester, while a section can be taught by at most one faculty. The E-R model is as shown below:

NAME (0,3)

FACULTY

(0,1)

SECTION

teaches

SSN

Unique#

Note - A faculty can teach up to three sections, so the relationship is ONE - TO - MANY. That is ONE faculty can teach MANY sections. The same tables as in one-to-one relationship hold good. But, here there is no ambiguity in terms of which primary key of a table should go into the other table as a foreign key. Put SSN as a foreign key in the SECTION table. To understand the guideline, you must first understand that ONE faculty can teach up to THREE (Many) sections. Therefore, put the primary key of the table corresponding to ONE (i.e., faculty table) in the table that corresponds to MANY (i.e., Section table). 3. Many - to - Many A faculty can teach at most three sections while each section can be taught by at most four faculty members.

NAME (0,3) or (1,3)

FACULTY SSN

(0,4)

teaches

SECTION Unique#

20


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

Here, a faculty can teach no classes (or 1) to maximum 3 classes, while a section can be taught by one or two faculty members. So this is a MANY - to - MANY relationship. The tables here are: FACULTY Table SSN

NAME

……

…….

C-NAME

….

…….

SECTION Table Unique#

FAC-SECT Table (A RELATIONSHIP Table) Unique#

SSN

….

…….

Here you will have three tables – first table corresponds to the first entity, second table corresponds to the second entity, and the third table corresponds to the relationship table with a composite primary key (that is, primary keys of table 1 and table 2 together form one key). Note - anytime you have a many-to-many relationship, you will have a third table that is a relationship table. 2.5.2 Example A large computer manufacturer uses hundreds of parts. Each part has a unique ID, a name, a re-order level and other attributes. The company maintains a list of suppliers who supply one or more of the required parts. Each supplier is evaluated on the price/unit, quality, and reliability for EACH part they supply (implies, a supplier can supply more than one part but the supplier is evaluated for each part separately). Each supplier is then ranked (that is, if there are 5 suppliers then each supplier will be ranked on a scale of five. One implies highest preference). Each part supplied by a supplier has minimum quantity, maximum quantity and a lead-time. Below the E-R model is provided for the two entities discussed in the case: SUPPLIERS and PARTS. (We will ignore all the other entities such as products, orders, payment etc.)

21


Chapter 2: Relational Model and E-R Modeling

Supplier_ID

Copyright Konana, 2000

Name

Rank

Suppliers

Part_ID

Lead_Time

Supplies

Name

Parts

(1,N)

(1,M) Min_Qty

Max_Qty Re-Order

In the above E-R model, one supplier can supply one part or N parts (1,N), and one part can be supplied by one supplier or many suppliers (1,M). This is MANY-to-MANY relationship. For every one part and one supplier combination we have many additional attributes that cannot be included in either supplier or parts entity (e.g., Rank, lead_time, Max_qty, Min_Qty etc.) These attributes are also called relationship attributes that are linked to the relationship diamond. Some textbooks and modeling tools use a diamond within a rectangle to indicate relationship entity that is similar to the above figure. That is, replace the diamond with the figure shown below.

Many text books and tools require you to recreate another entity (similar to any other entity) to represent this relationship entity (means, you have to think every entity in terms of tables which is antithesis to the very notion of conceptual modeling. Nevertheless, each tool or company has its own standards for modeling relationship entity. Just be aware of the different approaches). The tables for the above E-R model is: Table 1: Supplier Table Supplier_Id

Name

…..

….

Table 2: Parts tables Part_Id

Name

In_stock

Re-order level

22

….


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

Table 3: Part-Supplier Table Part_Id

Supplier_ID

Rank

Lead_time

Min_Qty

Max_Qty

‌‌.

Please note, all the attributes are not shown here. The first table corresponds to the entity Supplier with the primary key Supplier_ID. The second table corresponds with Parts table with the primary key Part_ID. The third table is called relationship table with a composite primary key that includes both the primary keys of tables Supplier and Parts. All the attributes of the relationship (attached to the diamond shape) are included in table 3.

Other E-R constructs: Weak entity Some tools use the concept of Weak entity. Weak entities depend on other entities for existence. For example, a student enrolls in a section and receives a grade. The grade can be an entity that depends on the existence of the student and section entities. A weak entity is represented by a rectangle with double lines as shown below.

Student

enroll

Section

Grade

Here we have three tables - STUDENT, SECTION and GRADE that inherits the primary keys of STUDENT and SECTION just as in many-to-many relationships.

Recursive Relationships In recursive relationships, an entity has a relationship to itself. A common example is that an employee of an organization has a supervisor who is also an employee of that organization.

23


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

Therefore, EMPLOYEE is an entity, which is related to itself. Another standard example is that an employee who is married to another employee within the organization. (0,1) EMPLOYEE

Supervise

(0,1)

2.5.2 Extended or Enhanced E-R model Since P. C. Chen introduced E-R modeling in 1976, there has been no dearth of extended E-R models. There are numerous new constructs that most commercial packages completely ignore. However, many tools (referred to as CASE -- Computer Aided Software Engineering -- tools) support the notion of super class (super entity) and subclasses (sub-entities). An entity may be a collection of many subclasses (entities). For example, STUDENT is an entity. A student may be an undergraduate (UG) student or a graduate student, which are two subclasses. These subclasses share many common attributes that belong to the STUDENT entity and may have some unique attributes specific to that subgroup. For instance, a graduate student is required to have a thesis advisor. Similarly, an UG may belong to a fraternity/sorority/dorm. Furthermore, a graduate student can be a Master's student or a Ph.D. student, again with some unique attributes. In such cases, it is recommended to model an entity with associated subclasses. The notation used for identifying subclasses varies from one book to another, one tool to another. Let’s assume subset is represented as subset symbol ⊂. The subclasses of a super class can be disjoint (i.e., an instance belongs to ONLY in one subclass) or overlapping (i.e., an instance may belong to one or more subclasses), and mutually exhaustive (i.e., any given instance belongs to some subgroup. That is, all possible subclasses are modeled). A subclass is an entity and inherits all attributes of its parent. For example, UG subclass inherits all attributes of the entity STUDENT while UG student entity may have its own attributes (or none at all). The representation of super entity and subclasses is shown below. The entity STUDENT has several attributes that are common to both undergraduate and graduate students (or entities). The two subclasses Undergraduate and Graduate are disjoint; that is, a student cannot be both UG and graduate student at the same time. This is represented as “D” in the figure. If they overlap then “D” is replaced by “O”.

24


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

STUDENT

Attributes: SSN Name Address Gender ……

D

Undergraduate

Graduate

Attributes: Student-athlete Club-name …… ……

Attributes: Thesis-advisor Defense-date ……

Many books discuss concepts such as specialization, generalization and IS-A hierarchy. These can be easily understood by reading the above super entity and subclasses appropriately. Let’s read the above example top-down and then bottom-up. In the top-down approach, we say STUDENT is a generalization of Undergraduate and Graduate subclasses (entities). In the bottom-up approach, we say that Undergraduate ad Graduate entities (subclasses) are specialization of STUDENT entity. IS-A hierarchy just describes that there exists a generalization or specialization hierarchy. You could replace “D” in the circle by IS-A and read the hierarchy top-down. That is, STUDENT IS-A generalization of Undergraduate and Graduate entities. There are other numerous extensions that can be pretty much termed as academic despite their importance. We will ignore these extended constructs. J CAUTION: Very often when you create an E-R model, you encounter several confusing issues (there lies the paradox of E-R model which is supposed to be human-friendly).

25


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

2.6 Questions to Ponder – Entity-Relationship modeling! 2.6.1 Examples 1. Construct an E-R diagram for a project database where an employee can work on only one project at any given time, while many employees can participate in a project at the same time. Convert the E-R model into tables. 2. Construct an E-R diagram for a vehicle-insurance company that has a set of customers, each of whom owns one or more vehicles. Each vehicle is associated with zero to any number of recorded accidents. From the E-R diagram develop the tables and identify the primary keys and foreign keys. 3. ABC company markets various products to thousands of regular (repeat) customers. Each product is identified by a product ID and has product description, quantity on hand and unit price. A unique account number identifies each customer. All relevant information of the customer is maintained in the database (e.g., contact name, address, phone number, etc.). Customers place orders for various products. Each customer order may consist of one or more products. The quantity ordered for various products can vary. However, the order delivery date will be the same. 1. Model the above scenario by way of Entity-Relationship diagram. Identify the keys, attributes and relationship cardinalities for each entity. 2. Based on your E-R model develop the set of tables. 2.6.2 True/False questions 1. When translating an E-R model into a set of tables, two entities having one-to-one relationship will result in two tables with the primary key of one table as a foreign key of the other table. 2. In translating two entities, A and B, with MANY-to-MANY relationship into tables both tables A and B will have each other's primary key as an attribute. 3. In a one-to-one relationship it is highly desirable to create three tables for two entities. E-R modeling

26


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

2.7 Example of E-R modeling using ERWin (OPTIONAL) Below the COURSE-SECTION-FACULTY E-R model is created using ERWin (ERWin is from Logic Works, now acquired by Platinum Technologies and part of Computer Associates. It is an award winning, leading modeling tool. Certainly one the easiest to learn!). Although you may not have access to ERWin or other modeling tool the description below gives you an idea of the benefits of using tools.

E-R model Tool Box

When you start ERWin, you will see an interface as shown above. You can see an ERWin Toolbox. This toolbox gives you all the required constructs to create an E-R model. The details are given below.

27


Identifying relationship (i.e., primary oneModeling Chapter 2: Relational Modelkey andofE-R entity becomes part of the primary key of another table)

Independent Entity (Super entity or Major entity)

Non-identifying relationship (i.e., primary key of one entity becomes foreign key in another table)

Dependent Entity (Subclass or

Complete Subcategory (Exhaustive

Describes many-toCopyright Konana, 2000 many relationships

Incomplete Subcategory

Example A university teaches a large number of courses. Each course number is unique and has a name and certain credit hours. Each course is offered in one or more sections. While a faculty member teaches one or more sections, a section can be taught only by one faculty member. Each section has certain enrollment. Model this as an E-R diagram and generate the tables (please ignore where and when a section is taught). Here there are three independent (major) entities: FACULTY, COURSE and SECTION. To create the first entity FACULTY, we need to click on Independent Entity icon and click on the workspace. You will see an entity that look like:

Double clicking on this entity will allow us to define name of the entity, attributes and primary key.

28


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

Clicking on OK, it creates an entity FACULTY. Similarly, we can create another entity COURSE. Primary Key

The last entity, SECTION can be modeled as a dependent entity or independent entity. Ignoring the details, assume this as a dependent entity with attributes section_number and enrollment. We ignore for time being the attributes course_id and faculty_SSN (these will be inherited by defining appropriate relationships with other entities). The three entities are shown above. Now we define the relationships between entities. First, FACULTY is related to SECTION. This relationship is one-to-many (i.e., one faculty member can teach many sections but each section is taught exactly by one faculty member). This is nonidentifying relationship. That is, the record in SECTION does not depend on FACULTY (it is only a foreign key). Now clicking on the non-identifying relations icon in the ERWin toolbar and then

29


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

clicking one FACULTY and SECTION tables it creates a relationship by adding Faculty_SSN as a foreign key (FK) into SECTION entity.

However, the relationship between COURSE and SECTION is identifying relationship because Course_ID is required for a section to exist; that is, Course_ID is part of the primary key in SECTION. By clicking on Identifying relationship icon and then on COURSE and SECTION, SECTION automatically inherits the primary key of Course_ID in the COURSE entity into the composite primary key of SECTION as shown below.

We can edit the relationships that we defined. If we double-click on the relationship line between FACULTY and SECTION, it opens up a window with necessary details about the relationship.

30


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

Here, we can define the cardinality whether NULLS are allowed for the foreign key etc. One may wonder, why use a separate tool for this! Why not just use a Microsoft graphics package or Powerpoint? The benefit comes from generating database specific schema. From the menu bar, we can choose Server Ă Target Server and we will be given a option to choose the right database.

31


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

Let’s select ORACLE database and Version 7. This will allow us to generate Oracle Version 7 specific SQL code and triggers (using PL/SQL – native language). We can view the options for generating the code (we will ignore the details. But this should give you an idea about what a modeling tool can do for you). We can click on the Preview and see what this tool creates. It generates all the CREATE statements for the tables, and triggers to enforce data consistency. A sample of that report is shown below.

REPORT (partial code): CREATE TABLE COURSE (Course_ID CHAR(18) NOT NULL, Course_Name CHAR(18) NULL, Credit_Hours CHAR(18) NULL ); CREATE UNIQUE INDEX XPKCOURSE ON COURSE ( Course_ID ASC ); ALTER TABLE COURSE ADD ( PRIMARY KEY (Course_ID) ) ; CREATE TABLE FACULTY (Faculty_SSN CHAR(18) NOT NULL, Faculty_Name CHAR(18) NULL,

32


Chapter 2: Relational Model and E-R Modeling

Copyright Konana, 2000

Faculty_Title CHAR(18) NULL ); CREATE UNIQUE INDEX XPKFACULTY ON FACULTY ( Faculty_SSN ASC ); ALTER TABLE FACULTY ADD ( PRIMARY KEY (Faculty_SSN) ) ; CREATE TABLE SECTION (Section_Number CHAR(18) NOT NULL, Course_ID CHAR(18) NOT NULL, Faculty_SSN CHAR(18) NULL, Enrollment CHAR(18) NULL ); CREATE UNIQUE INDEX XPKSECTION ON SECTION ( Section_Number ASC, Course_ID ASC ); ALTER TABLE SECTION ADD ( FOREIGN KEY (Course_ID) REFERENCES COURSE ) ; ALTER TABLE SECTION ADD ( FOREIGN KEY (Faculty_SSN) REFERENCES FACULTY ) ; CREATE TRIGGER tD_COURSE after DELETE on COURSE for each row -- ERwin Builtin Tue Sep 07 17:30:24 1999 -- DELETE trigger on COURSE declare numrows INTEGER; begin /* ERwin Builtin Tue Sep 07 17:30:24 1999 */ /* COURSE R/4 SECTION ON PARENT DELETE RESTRICT */ select count(*) into numrows from SECTION where /* %JoinFKPK(SECTION,:%Old," = "," and") */ SECTION.Course_ID = :old.Course_ID; if (numrows > 0) then raise_application_error( -20001, 'Cannot DELETE "COURSE" because "SECTION" exists.' ); end if; -- ERwin Builtin Tue Sep 07 17:30:24 1999 end; / ----rest of the code deleted ----

33


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.