IMTS (ISO 9001-2008 Internationally Certified) DATABASE MANAGEMENT SYSTEM
DATABASE MANAGEMENT SYSTEMS

CONTENTS

UNIT I
File Systems and Databases: Introducing the Database – The Historical Roots of the Database: Files and File Systems – A File System Critique – Database Systems – Database Models – Wrap-Up: The Evolution of Data Models. The Relational Database Model: A Logical View of Data – Keys – Integrity Rules Revisited – Relational Database Operations – The Data Dictionary and the System Catalog – Relationships within the Relational Database – Data Redundancy Revisited.
UNIT II
Entity Relationship (E-R) Modeling: Basic Modeling Concepts – Data Models: Degrees of Data Abstraction – The Entity Relationship (E-R) Model – Normalization of Database Tables: Database Tables and Normalization – Higher-Level Normal Forms – Denormalization.
UNIT III
Relational Databases: Structured Query Language – Other Relational Languages – Transaction Management and Concurrency Control: What is a Transaction – Concurrency Control – Concurrency Control with Locking Methods – Concurrency Control with Time Stamping Methods – Concurrency Control with Optimistic Methods – Database Recovery Management. Distributed Database Management Systems: The Evolution of Distributed Database Management Systems – Distributed Processing and Distributed Databases – Functions of a Distributed Database Management System – DDBMS Components – Levels of Data and Process Distribution – Distributed Database Transparency Features – Distribution Transparency – Transaction Transparency – Performance Transparency and Query Optimization – Distributed Database Design – Data Fragmentation – Data Replication – Data Allocation – Client/Server vs. DDBMS – C.J. Date's Twelve Commandments for Distributed Databases.
UNIT IV
Query Processing: Overview – Measures of Query Cost – Selection Operation – Sorting – Join Operation – Other Operations – Evaluation of Expressions. Query Optimization: Overview – Estimating Statistics of Expression Results – Transformation of Relational Expressions – Choice of Evaluation Plans – Materialized Views. Distributed Databases: Homogeneous and Heterogeneous Databases – Distributed Data Storage – Distributed Transactions – Commit Protocols – Concurrency Control in Distributed Databases – Availability – Distributed Query Processing – Heterogeneous Distributed Databases – Directory Systems.
UNIT V
Object Oriented Databases: Object-Oriented Concepts – Characteristics of an Object-Oriented Data Model – OODM and Previous Data Models: Similarities and Differences – Object-Oriented Database Management Systems – How Object Orientation Affects Database Design – OODBMS Advantages and Disadvantages – OO Concepts in Relational Models – The Next Generation of Database Management Systems. Advanced Data Types and New Applications: Motivation – Time in Databases – Spatial and Geographic Data – Multimedia Databases – Mobility and Personal Databases.
UNIT 1
FILE SYSTEMS AND DATABASES

1.1 INTRODUCING THE DATABASE

Data
Data are raw facts. The word "raw" is used to indicate that the facts have not yet been processed to reveal their meaning.

Information
Information is processed (i.e., refined) data. Information is the backbone of any organization. Information is data that have been put into a meaningful and useful context and communicated to a recipient who uses it to make decisions.

Key points:
Data constitute the building blocks of information.
Information is produced by processing data
Information is used to reveal the meaning of data
Good, relevant and timely information is the key to good decision making.
Good decision making is the key to organizational survival in a global environment.
Database
A database is a collection of data, typically describing the activities of one or more related organizations. For example, a university database might contain information about the following:

Entities such as students, faculty, courses, and classrooms.
Relationships between entities, such as students' enrollment in courses, faculty teaching courses, and the use of rooms for courses.

Data Management
Data management is a discipline that focuses on the proper generation, storage, and retrieval of data. Typically, efficient data management requires the use of a computer database. A database is a shared, integrated computer structure that houses a collection of:
End user data, that is, raw facts of interest to the end users.
Metadata (data about data) through which the data are integrated.
Metadata provide a description of the data characteristics and the set of relationships that link the data found within the database.
DBMS
A database management system (DBMS) is software designed to assist in maintaining and utilizing large collections of data, and the need for such systems, as well as their use, is growing rapidly.
Reasons why DBMSs are important in our information-based society:
Since data are so important, we must have a good way of managing such data.
The DBMS contains a query language that makes it possible to produce quick answers to ad hoc queries.
The DBMS helps create an environment in which end users have better access to more and better–managed data than they did before the DBMS became the data management standard.
Wider access to well-managed data promotes an integrated view of the Organization’s operations.
Data inconsistency is greatly reduced in a properly designed database that is managed through a DBMS. Better data make it possible to generate better information on which better decisions are based.
The DBMS makes it possible to share the data in the database among multiple applications or users. DBMS stands between the users and the database.
Figure 1.1 The DBMS Manages the Interaction Between the End User and the Database
(The figure shows end users sending application requests to the DBMS; the DBMS manages the database structure, which houses the end user data, such as customer, inventory, invoice, and product data, together with the metadata.)

Figure 1.1 illustrates that the DBMS stands between the database and the user(s). In effect, the DBMS serves as the intermediary between the user and the database by translating user
requests into the complex code required to fulfill those requests. The application program might be written by a programmer, using a programming language such as COBOL, or it might be created through a DBMS utility program. Database design here means the design of the database structure that will be used to store and manage data, rather than the design of the DBMS software.
1.2 WHY DATABASE DESIGN IS IMPORTANT
A good database does not just happen; the structure of its contents must be designed carefully.
Even a good DBMS performs poorly with a badly designed database.
A well-designed database facilitates data management and becomes a valuable information generator.
A poorly designed database is likely to become a breeding ground for redundant data (duplicate data). Redundant data are often the source of difficult-to-trace information errors.
A database contains redundant data when the same data about the same entity are kept in different locations. Therefore, the existence of redundant data can produce uncorrected data entries.
A poorly designed database tends to generate errors that are likely to lead to bad decisions.
1.3 THE HISTORICAL ROOTS OF THE DATABASE: FILES AND FILE SYSTEM
In the recent past, a manager of almost any small organization was able to keep track of necessary data by using a manual file system.
A file system was traditionally composed of a collection of file folders, each properly tagged and kept in a filing cabinet.
As long as the data collection was relatively small and an organization's managers had few reporting requirements, the manual system served well.
As the organization grew, the reporting requirements became more complex, and keeping track of data in a manual file system became difficult.
Conversion from a manual file system to a matching computer file system could be technically complex.
A new specialist, called a data processing (DP) specialist, had to be hired or "grown" from the current staff. The DP specialist:
Created the computer file structure,
Wrote software to manage the data.
Designed the application programs that produced reports based on the file data.
BASIC FILE VOCABULARY
DATA: Raw facts that have little meaning unless they have been organized in some logical manner.
FIELD: A character or group of characters that has a specific meaning.
RECORD: A logically connected set of one or more fields.
FILE: A collection of related records.

As time went on, additional programs were written, and using the computer saved much time and effort.
Reports were impressive.
Complex data searches yielded the information needed to make sound decisions.
As the number of files increased, a small file system, like the one shown in Figure 1.2, evolved. Each of the files in the system used its own application programs to store, retrieve, and modify data. And each file was owned by the individual or the department that commissioned its creation.
1.4 A FILE SYSTEM CRITIQUE
1.4.1 File System Data Management
Even the simplest data-retrieval task required extensive programming in a third-generation language (3GL), for example COBOL, BASIC, or FORTRAN. Programming in a 3GL can be a time-consuming, high-skill activity. As the file system becomes more complex, the access paths become difficult to manage. Each file must have its own file management system, composed of programs that allow the user to:
Create the file structure
Add data to the file.
Delete data from the file.
Modify the data contained in the file.
List the file content.
Even a simple file system of only 20 files requires 20 x 5 = 100 file management programs. If each of the files is accessed by 10 different reporting programs, an additional 20 x 10 = 200 programs must be written. Thus the file system's structure and lack of security make it difficult to pool data; it is simply not well suited to modern information requirements.
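To see why the program count grows so quickly, consider what the management software for even one file must provide. The sketch below is a minimal, hypothetical illustration in Python (the CUSTOMER file name and its field names are invented for the example); a real 3GL shop would have to repeat this work, plus the reporting programs, for every file it owns.

    # A minimal file-management module for ONE hypothetical CUSTOMER file.
    # A file system needs a variant of each routine for every file.

    FIELDS = ["cus_name", "cus_phone", "agent_name"]  # the file structure

    def create_file(path):
        """Create the file structure (here, simply an empty text file)."""
        open(path, "w").close()

    def add_record(path, record):
        """Append one record as comma-separated field values."""
        with open(path, "a") as f:
            f.write(",".join(record[name] for name in FIELDS) + "\n")

    def list_records(path):
        """List the file contents as dictionaries keyed by field name."""
        with open(path) as f:
            return [dict(zip(FIELDS, line.rstrip("\n").split(","))) for line in f]

    def delete_records(path, key, value):
        """Delete every record whose field `key` equals `value`."""
        kept = [r for r in list_records(path) if r[key] != value]
        create_file(path)
        for r in kept:
            add_record(path, r)

    create_file("customer.txt")
    add_record("customer.txt", {"cus_name": "Smith", "cus_phone": "555-1234",
                                "agent_name": "Jones"})
    print(list_records("customer.txt"))
    delete_records("customer.txt", "cus_name", "Smith")

Multiply the handful of routines above by 20 files, and then by every report program that reads them, and the 100 + 200 program estimate in the text follows directly.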
Figure 1.2 A Simple File System
(The figure shows a sales department and a personnel department, each owning its own files, such as a CUSTOMER file, an AGENT file, and a SALES file, with separate file management programs and file report programs for each file.)

1.4.2 Structural and Data Dependence
A change in any file's structure, such as the addition or deletion of a field, requires the modification of all programs using that file. Such modifications are required because the file system exhibits structural dependence; that is, access to a file is dependent on its structure. Even changes in file data characteristics, such as changing a field from integer to decimal, require changes in all programs that access the file. Because all data access programs are subject to change when any of the file's data characteristics change, the system is said to exhibit data dependence.
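The sketch below illustrates structural dependence with a hypothetical fixed-width AGENT file (the layout and field names are invented for the example): because the access program hard-codes the file's layout, adding a field or changing a field's width or type forces a change in every program that reads the file.

    # Hypothetical fixed-width AGENT record: name (10 chars) + phone (8 chars).

    def read_agents(path):
        agents = []
        with open(path) as f:
            for line in f:
                # The slice positions bake the file structure into the program.
                # Adding a field, or widening the phone field, breaks this
                # program and every other program that accesses the file.
                agents.append({"name": line[0:10].strip(),
                               "phone": line[10:18].strip()})
        return agents

    with open("agent.txt", "w") as f:
        f.write("Jones     555-1234\n")

    print(read_agents("agent.txt"))   # [{'name': 'Jones', 'phone': '555-1234'}]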
1.4.3 File Definitions and Naming Conventions
Each file should have a unique record identifier, and selecting proper field names is very important. For example, make sure that the field names are reasonably descriptive; that is, by simply looking at the field names, we should be able to determine which file the fields belong to and what information the fields are likely to contain.
1.4.4 Data Redundancy
If the file system environment makes it difficult to pool data, it is likely that the same data are stored in many different locations. Having the same data occur in more than one place produces data redundancy. Uncontrolled data redundancy sets the stage for:
1. Data inconsistency. Data inconsistency is the lack of data integrity. Suppose we change an agent's phone number or address in the AGENT file. If we forget to make corresponding changes in the CUSTOMER file, the files contain different data for the same agent. Reports yield inconsistent results, depending on which version of the data is used. A data entry error, such as an incorrectly spelled name, also yields the same kind of data integrity problems.
2. Data anomalies. The dictionary defines "anomaly" as an abnormality. Data anomalies exist because any change in any field value must be correctly made in many places to maintain data consistency. The data anomalies are commonly labeled:
Modification anomalies: If an agent has a new phone number, that new number must be entered in each of the CUSTOMER file records in which the agent's phone number is shown. In a large file system such changes might occur in hundreds or even thousands of records. Clearly, the potential for data inconsistencies is great.
Insertion anomalies: To add each new customer to the CUSTOMER file, we must also add the corresponding agent data. If we add several hundred new customers, we must also enter several hundred agent names and telephone numbers. Again, the potential for creating data inconsistencies is great.
Deletion anomalies: If an agent quits and is deleted from the payroll, all of that agent's customers in the CUSTOMER file become linked to a nonexistent agent. To resolve this problem, we must modify all records in which that agent's name and phone number appear.

1.5 DATABASE SYSTEMS
The problems inherent in file systems make using the database system very desirable. Unlike the file system, with its many separate and unrelated files, the database consists of logically related data stored in a single data repository. Therefore, the database represents a change in the way end user data are stored, accessed, and managed. The database's DBMS provides numerous advantages over file system management and eliminates most of the file system's:
Data inconsistency
Data anomalies
Data dependency and
Structural dependency problems.
The current generation of DBMS software stores not only the data structure, but also the relationships between the database components, all in a central location. The current generation of DBMS software also takes care of defining, storing, and managing all the required access paths to those components. Perhaps it is appropriate to refer to the DBMS as the database system’s heart.
1.5.1 The Database System Environment
The term database system refers to an organization of components that defines and regulates the collection, storage, management, and use of data within a database environment. The database system is composed of the five major parts shown in Figure 1.3: hardware, software, people, procedures, and data.
Figure 1.3 The Database Environment
(The figure shows the database administrator and system administrator managing the DBMS and the hardware; database designers designing the database; analysts and programmers writing the application programs and DBMS utilities; end users using the applications to access the data; and the procedures and standards that the database administrator writes and enforces.)
1. Hardware: Hardware identifies all the system's physical devices. The database system's main and most easily identified hardware component is the computer. Peripherals include keyboards, mice, modems, printers, and so on; they also include any electronic devices that are used to connect two or more computers, thereby producing a computer network.
2. Software: Software refers to the collection of programs used by the computers within the database system. Although the most readily identified software is the DBMS itself, it takes three types of software to make the database system function fully:
Operating systems software,
DBMS software and
Application programs and utilities.
1. Operating system software manages all hardware components and makes it possible for all other software to run on the computers. Examples include DOS, OS/2, and Windows, used by microcomputers; UNIX and VMS, used by minicomputers; and MVS, used by IBM mainframe computers.
2. DBMS software manages the database within the database system. Examples include Microsoft's Access and SQL Server, Oracle Corporation's Oracle, and IBM's DB2.
3. Application programs and utility software are used to access and manipulate the data in the DBMS and to manage the computer environment in which data access and manipulation take place. Application programs are most commonly used to access the data found within the database to generate reports, tabulations, and other information to facilitate decision making. Utilities are the software tools used to help manage the database system's computer components.
3. People: This component includes all users of the database system. On the basis of primary job functions, we can identify five types of users in a database system:
System administrators oversee the database system’s general operations.
Database administrators, also known as DBAs, manage the DBMS’s use and ensure that the database is functioning properly.
Database designers design the database structure. They are, in effect, the database architects. If the database design is poor, even the best application programmers and the most dedicated DBAs will fail to produce a useful database environment.
Systems analysts and programmers design and implement the application programs. They design and create the data entry screens, reports, and procedures through which end users access and manipulate the database's data.
End users are the people who use the application programs to run the organization's daily operations.
4. Procedures: Procedures are the instructions and rules that govern the design and use of the database system. Procedures are a critical, although occasionally forgotten, component of the system. They play an important role in a company because they enforce the standards by which business is conducted within the organization and with customers.
5. Data: The word "data" covers the collection of facts stored in the database. Because data are the raw material from which information is generated, determining which data are to be entered into the database and how those data are to be organized is a vital part of the database designer's job.

1.5.2 Types of Database Systems
The DBMS on which the database system is based can be classified according to the number of users, the database site location(s), and the expected type and extent of use. The number of users determines whether the DBMS is classified as single-user or multiuser.

Single-User Database System
A single-user DBMS supports only one user at a time. If user A is using the database, users B and C must wait until user A has completed the work. If a single-user database runs on a personal computer, it is also called a desktop database.

Multiuser Database System
In contrast, a multiuser DBMS supports multiple users at the same time. If the multiuser database supports a relatively small number of users or a specific department within an organization, it is called a workgroup database. If the database is used by the entire organization and supports many users across many departments, the database is known as an enterprise database.

The database site location might also be used to classify the DBMS.

Centralized DBMS
A DBMS that supports a database located at a single site is called a centralized DBMS.
Distributed DBMS
A DBMS that supports a database distributed across several different sites is called a distributed DBMS.

Based on the type of use and the extent of such use, the DBMS can be classified into:

Transactional DBMS
A DBMS that powers a database primarily designed to support "immediate response" transactions is classified as a transactional DBMS or a production DBMS.

Decision Support Database
In contrast, a decision support database focuses primarily on the production of information required to make tactical or strategic decisions at middle and high management levels. Decision support, provided by a decision support system (DSS), typically requires extensive "data massaging" to extract information from historical data to formulate pricing decisions, sales forecasts, market positioning, and so on. DSS information tends to be based on complex data derived from many sources. To make retrieval easier, the term data warehouse describes the database design favored by DSSs.
1.5.3 DBMS Functions
A DBMS performs several important functions that guarantee the integrity and consistency of the data in the database.
1. Data dictionary management. The DBMS requires that definitions of the data elements and their relationships (metadata) be stored in a data dictionary. The DBMS uses the data dictionary to look up the required data component structures and relationships, thus relieving us from having to code such complex relationships in each program. Additionally, any changes made in a database structure are automatically recorded in the data dictionary. In other words, the DBMS provides data abstraction, and it removes structural and data dependency from the system.
2. Data storage management. The DBMS creates the complex structures required for data storage, thus relieving us from the difficult task of defining and programming the physical data characteristics. A modern DBMS provides storage not only for the data, but also for related data entry forms or screen definitions, report definitions, data validation rules, procedural code, structures to handle video and picture formats, and so on.
3. Data transformation and presentation. The DBMS transforms entered data to conform to the data structures that are required to store the data. Therefore, the DBMS relieves us of the chore of making a distinction between the
data logical format and the data physical format. By maintaining data independence, the DBMS translates logical requests into commands that physically locate and retrieve the requested data. That is, the DBMS formats the physically retrieved data to make it conform to the user's logical expectations. In other words, a DBMS provides application programs with software independence and data abstraction.
4. Security management. The DBMS creates a security system that enforces user security and data privacy within the database. Security rules determine which users can access the database, which data items each user may access, and which data operations the user may perform.
5. Multiuser access control. The DBMS creates the complex structures that allow multiuser access to the data. In order to provide data integrity and data consistency, the DBMS uses sophisticated algorithms to ensure that multiple users can access the database concurrently without compromising the integrity of the database.
6. Backup and recovery management. The DBMS provides backup and data recovery procedures to ensure data safety and integrity. Current DBMSs provide special utilities that allow the DBA to perform routine and special backup and restore procedures. Recovery management deals with the recovery of the database after a failure, such as a bad sector in the disk or a power failure. Such capability is critical to the preservation of the database's integrity.
7. Data integrity management. The DBMS promotes and enforces integrity rules to eliminate data integrity problems, thus minimizing data redundancy and maximizing data consistency. Ensuring data integrity is especially important in transaction-oriented database systems.
8. Database access languages and application programming interfaces. The DBMS provides data access via a query language. A query language is a nonprocedural language; that is, one that lets the user specify what must be done without having to specify how it is to be done. The DBMS's query language contains two components: a data definition language (DDL) and a data manipulation language (DML). The DDL defines the structures in which the data are housed, and the DML allows end users to extract the data from the database (a short sketch of DDL and DML in action follows this list).
9. Database communication interfaces. Current-generation DBMSs provide special communications routines designed to allow the database to accept end user requests within a computer network environment. Communications can be accomplished in several ways:
End users can generate answers to queries by filling in screen forms through their preferred World Wide Web browser.
The DBMS can automatically publish predefined reports on the Internet, using a Web format that enables any Web user to browse it.
The DBMS can connect to third-party systems to distribute information via e-mail or other productivity applications such as Lotus Notes.
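As a concrete sketch of function 8, the example below uses Python's standard sqlite3 module with an invented CUSTOMER table: CREATE TABLE is DDL (it defines the structure in which the data are housed), while INSERT and SELECT are DML (they store and extract the data). As a side effect it also shows function 1 at work, since SQLite records the definition in its own catalog, sqlite_master.

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # DDL: define the structure in which the data are housed.
    conn.execute("CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, "
                 "cus_name TEXT, agent_name TEXT)")

    # DML: store and extract the data.
    conn.execute("INSERT INTO customer VALUES (1, 'Smith', 'Jones')")
    for row in conn.execute("SELECT cus_name, agent_name FROM customer"):
        print(row)                                   # ('Smith', 'Jones')

    # Function 1: the DBMS records the definition in its data dictionary.
    print(conn.execute("SELECT name, sql FROM sqlite_master").fetchone())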
1.5.4 Database Design and Modeling
Database design is a crucial activity in the database environment. Database design is made much simpler when we use models. Models are simplified abstractions of real-world events or conditions. If the models are not logically sound, the databases derived from them will not deliver the database system's promise of effective information drawn from an efficient database. Thus good models yield good database designs that are the basis for good applications.
1.6 DATABASE MODELS
A database model is a collection of logical constructs used to represent the data structure and the data relationships found within the database. Database models can be grouped into two categories:
Conceptual models
Implementation models
Conceptual Model
A conceptual model focuses on the logical nature of the data representation. Therefore, the conceptual model is concerned with what is represented in the database rather than with how it is represented. Conceptual models include the entity relationship (E-R) model and the object-oriented model.

Implementation Model
In contrast to the conceptual model, an implementation model places the emphasis on how the data are represented in the database or on how the data structures are implemented to represent what is modeled. Implementation models include the hierarchical database model, the network database model, the relational database model, and the object-oriented database model.
Conceptual models use three types of relationships to describe associations among data: one-to-many, many-to-many, and one-to-one. Database designers use the shorthand notations 1:M, M:N, and 1:1 for them, respectively.
1. One-to-many relationship. A painter paints many different paintings, but each one of them is painted by only that painter. Thus the painter (the "one") is related to the paintings (the "many"). Therefore, database designers label the relationship "PAINTER paints PAINTING" as 1:M. Similarly, the "CUSTOMER generates INVOICE" relationship would also be labeled 1:M.
2. Many-to-many relationship. An employee might learn many job skills, and each job skill might be learned by many employees. Database designers label the relationship "EMPLOYEE learns SKILL" as M:N. Similarly, the relationship "STUDENT takes COURSE" can be labeled M:N.
3. One-to-one relationship. A retail company's management structure may require that each one of its stores be managed by a single employee. In turn, each store manager, who is an employee, manages only a single store. Therefore, the relationship "EMPLOYEE manages STORE" is labeled 1:1.
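These three connectivities map directly onto relational table designs. The sketch below, using Python's sqlite3 module and hypothetical table definitions, implements 1:M with a plain foreign key, M:N with a bridge table whose composite primary key pairs the two entities, and 1:1 with a UNIQUE foreign key.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- 1:M  PAINTER paints PAINTING: the foreign key sits on the "many" side.
    CREATE TABLE painter  (painter_id INTEGER PRIMARY KEY, painter_name TEXT);
    CREATE TABLE painting (painting_id INTEGER PRIMARY KEY, title TEXT,
                           painter_id INTEGER REFERENCES painter);

    -- M:N  EMPLOYEE learns SKILL: decomposed into two 1:M links via a bridge.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT);
    CREATE TABLE skill    (skill_id INTEGER PRIMARY KEY, skill_name TEXT);
    CREATE TABLE learns   (emp_id   INTEGER REFERENCES employee,
                           skill_id INTEGER REFERENCES skill,
                           PRIMARY KEY (emp_id, skill_id));

    -- 1:1  EMPLOYEE manages STORE: the UNIQUE foreign key limits each
    -- employee to managing at most one store.
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, store_name TEXT,
                        manager_id INTEGER UNIQUE REFERENCES employee);
    """)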
1.6.1 The Hierarchical Database Model
One of the oldest database models.
The hierarchical model assumes that a tree structure is the most frequently occurring data relationship.
This model organizes data elements as segments, one for each instance of an entity.

Basic structure
A hierarchical database is considered a hierarchy of segments.
A segment is the equivalent of a file system's record type.
A hierarchical database is a collection of records that is logically organized to conform to the upside-down tree structure.
Within the hierarchy, the top layer is the root, and it is perceived as the parent of the segments below it.
Segments below other segments are the children of the segment above them. In short:
Each parent can have many children.
Each child has only one parent.

Hierarchical structure
A hierarchical structure is shown in Figure 1.4.
Figure 1.4 A Hierarchical Structure
(The tree: the Final assembly (filing cabinet) is the root, with children Component A, Component B, and Component C; Component A contains Assembly A, which contains Part A and Part B; Component C contains Assembly B, which contains Part C and Part D, and Assembly C, which contains Part E.)

Given this hierarchical structure, it is easy to trace both the database's components and the 1:M relationships among them. The ordered sequence of segments tracing the hierarchical structure is called the hierarchical path. For example, the hierarchical path to the segment labeled "Part D" in Figure 1.4 can be traced this way: Final assembly, Component A, Assembly A, Part A, Part B, Component B, Component C, Assembly B, Part C, Part D. Note that the path traces all segments from the root, starting at the leftmost segment. This "left-list" path is known as the preorder traversal or the hierarchic sequence. Given such a path, database designers must make sure that the most frequently accessed segments and their components are located closest to the left side of the tree, to ensure the most efficient data access. In Figure 1.4, for example, if Part D is the most frequently accessed and updated component, it would be wise to change the database structure to place Component C on the left side of the level 1 segments. Then, within this relocated "branch", switch Part D to the position currently held by Part C. These changes make the hierarchic sequence to Part D much shorter: Final assembly, Component C, Assembly B, Part D.
Thus the hierarchical model is effective whenever we have many transactions that involve 1:M relationships that remain fixed over time.
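A minimal sketch of the hierarchic sequence, assuming the tree of Figure 1.4 is held as nested Python tuples: the preorder traversal visits each parent segment before its children, left to right, producing exactly the path described above.

    # Each segment is (name, [children]); the tuples mirror Figure 1.4.
    tree = ("Final assembly", [
        ("Component A", [("Assembly A", [("Part A", []), ("Part B", [])])]),
        ("Component B", []),
        ("Component C", [("Assembly B", [("Part C", []), ("Part D", [])]),
                         ("Assembly C", [("Part E", [])])]),
    ])

    def preorder(segment):
        """Yield segment names in hierarchic sequence: parent, then children."""
        name, children = segment
        yield name
        for child in children:
            yield from preorder(child)

    print(" -> ".join(preorder(tree)))
    # Final assembly -> Component A -> Assembly A -> Part A -> Part B ->
    # Component B -> Component C -> Assembly B -> Part C -> Part D -> ...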
Advantages
The hierarchical model yields many advantages over the file system model, and it acts as the foundation for current database models.
1. Conceptual simplicity. The relationship between the various layers is logically simple in the hierarchical model. Therefore, it becomes easier to view the database conceptually, thus making the design process simpler.
2. Database security. Database security is provided and enforced by the DBMS. Therefore, security is enforced uniformly throughout the system, without having to rely on the efforts of individual applications programmers, who may have very different ideas about the extent and type of required security.
3. Data independence. The DBMS creates an environment in which data independence can be maintained, thereby substantially decreasing programming effort and program maintenance.
4. Database integrity. Given the parent/child relationship, there is always a link between the parent segment and its child segment(s). Because the child is always automatically referenced to its parent, the hierarchical model promotes database integrity.
5. Efficiency. The hierarchical database model is very efficient when a database contains a large volume of data in 1:M relationships and when users require large numbers of transactions, using data whose relationships are fixed over time.

Disadvantages
1. Complex implementation. Although the hierarchical model relieves the designers and programmers of data-dependence problems, they must still have detailed knowledge of physical data storage characteristics. Thus database design becomes complicated.
2. Difficult to manage. Any change in the database structure, such as the relocation of segments, requires changes in all application programs that access the database. Therefore, database management can become a difficult chore. Deletion of one segment leads to the involuntary deletion of all the segments under it, an error that can be extremely costly.
3. Lacks structural independence. The benefits of data independence are limited by structural dependence.
4. Applications programming and use complexity. Given the navigational database system's structure, applications programmers and end users must know precisely how the data are distributed physically within the database in order to access the data. Therefore, hierarchical databases are said to be written by programmers for programmers.
5. Implementation limitations. Only 1:M relationships can be maintained. The common M:N relationship is difficult to implement; that is, common two-parent conditions cannot be implemented easily in a hierarchical environment.
6. Lack of standards. There was no specific standard, such as a DDL, nor was there a data manipulation language (DML) to manipulate the database contents. Therefore, moving from one hierarchical DBMS to another was difficult; that is, portability was limited.
1.6.2 The Network Database Model
The network database model was created to represent complex data relationships more effectively than the hierarchical model could, and to improve database performance standards. The lack of database standards was troublesome to programmers and application designers because it made database designs and applications less portable.

Database Task Group (DBTG)
The DBTG was charged to define standard specifications for an environment that would facilitate database creation and data manipulation. The final DBTG report contained specifications for three crucial database components:
1. The network schema, the conceptual organization of the entire database as viewed by the database administrator. The schema includes a definition of the database name, the record type for each record, and the components that make up those records.
2. The subschema, which defines the portion of the database "seen" by the application programs that actually produce the desired information from the data contained within the database. The existence of subschema definitions allows all application programs to simply invoke the subschema required to access the appropriate database file(s).
3. A data management language to define the data characteristics and the data structure and to manipulate the data.

To produce the desired standardization for each of the three components, the DBTG specified three distinct data management language components:
1. A schema data definition language (DDL), which enables the database administrator to define the schema components.
2. A subschema DDL, which allows the application programs to define the database components that will be used.
3. A data manipulation language (DML) to manipulate the database contents.
BASIC STRUCTURE
In many respects the network database model resembles the hierarchical database model. For example, as in the hierarchical model, the user perceives the network database as a collection of records in 1:M relationships. However, quite unlike the hierarchical model, the network model allows a record to have more than one parent. Using network database terminology, a relationship is called a set. Each set is composed of at least two record types: an owner record, which is equivalent to the hierarchical model's parent, and a member record, which is equivalent to the hierarchical model's child. The difference between the hierarchical model and the network model is that the latter might include a condition in which a member may have several owners. A set represents a 1:M relationship between the owner and the member. An example of such relationships is depicted in Figure 1.5.
Figure 1.5 A Network Database Model
(The figure shows five 1:M sets linking the record types SALESREP, CUSTOMER, INVOICE, PAYMENT, PRODUCT, and INV_LINE: the Commission set, the Sales set, the Payment set, the Inventory set, and the Line set. INVOICE, for example, is a member with more than one owner.)

ADVANTAGES
The network database model retains many of the hierarchical database model's advantages, while it corrects or improves many of the hierarchical model's shortcomings.
1. Conceptual simplicity. Like the hierarchical database model, the conceptual view of the database is simple, thus promoting design simplicity.
2. Handles more relationship types. M:N relationships are easier to implement in the network database model than in the hierarchical model.
DISADVANTAGES
Although the network database model yielded significant advantages over the hierarchical database model, it was still subject to significant disadvantages.
1. System complexity. Database integrity control and the efficiency with which the network model manages relationships are sometimes short-circuited by the system's complexity. Like the hierarchical model, the network model provides a navigational data access environment, in which data are accessed one record at a time. In short, like the hierarchical database, the network database was not designed to produce a user-friendly system.
2. Lack of structural independence. Given its navigational data access environment, it remains difficult to make structural changes in a network database, and some structural changes are impossible to make. If changes are made to the database structure, all application programs must be revalidated before they can access the database. In short, although the network model achieves data independence, it still does not produce structural independence.
Because of the network database model's disadvantages, it was largely replaced by the relational database model in the 1980s.
1.6.3 THE RELATIONAL DATABASE MODEL
The relational model, first developed by E.F. Codd (of IBM) in 1970, represented a major breakthrough for both users and designers. The relational model stores data in the form of tables.
It is powerful because it requires few assumptions about how data are related or how data will be extracted from the database.
Another feature of relational systems is that a database can be spread across several tables.
It uses tables to organize data elements.
Each table corresponds to an application entity.
Each row represents one instance of that entity.
Relationships link rows from two tables by embedding row identifiers (keys).

BASIC STRUCTURE
The relational database model is implemented through a very sophisticated relational database management system (RDBMS). The RDBMS performs the same basic functions provided by the hierarchical and network DBMSs, plus a host of other functions that make the relational database model easier to understand and to implement.
Thus, the relational database is perceived by the user to be a collection of tables in which data are stored. Each table is a matrix consisting of a series of row/column intersections. Tables, also called relations, are related to each other by sharing a common entity characteristic. The relationship type (1:1, 1:M, or M:N) is often shown in a relational schema. Structured Query Language (SQL) serves as a uniform interface for users, providing a collection of standard expressions for storing and retrieving data.
ADVANTAGES
Like the hierarchical and network databases, the relational database is a single data repository in which data independence is maintained. However, the relational database model adds significant advantages.
1. Structural independence. Because the relational database model does not use a navigational data access system, data access paths are irrelevant to relational database designers, programmers, and end users. Changes in the relational database structure do not affect the DBMS's data access in any way. Therefore, the relational database model achieves the structural independence not found in the hierarchical and network database models.
2. Improved conceptual simplicity. Although the hierarchical and network database models were conceptually much simpler than the file management systems they replaced, the relational database model is even simpler at the conceptual level.
3. Easier database design, implementation, management, and use. Because the relational model achieves both data independence and structural independence, it becomes much easier to design the database and to manage its contents.
4. Ad hoc query capability. One of the reasons for the relational database model's rise to dominance in the database market is its very powerful and flexible query capability. For most relational database software, the query language is Structured Query Language (SQL), a so-called fourth-generation language (4GL). A 4GL allows the user to specify what must be done without specifying how it must be done. Any SQL-based relational database application involves three parts: a user interface, a set of tables within the database, and the SQL "engine". The interface might include menus, query operations, report generators, and so on. Basically, the interface allows the end user to interact with the data. The database tables simply store the data. Largely hidden from the end user, the SQL engine does the tough database jobs. 
5. A powerful database management system. A good RDBMS is a much more complex piece of software than the DBMS found in the hierarchical and network databases. Its complexity stems from the fact that it performs far more tasks for both
the system’s designers and users. Consequently, a good RDBMS makes it possible to hide a system’s complexity from both the database designer and the end user.
DISADVANTAGES
The relational database's substantial advantages over the hierarchical and network databases are purchased at the cost of some disadvantages.
1. Hardware and software overhead. It takes a more powerful computer to perform all the RDBMS-assigned tasks. However, because modern computers' processing power is increasing at an exponential rate, this is no longer a very big issue.
2. Ease of design can lead to bad design. Ease of design and use can lead to poor design and implementation, which creates problems when the database grows in size, and results in performance degradation and data corruption.
3. May promote "islands of information" problems. Because relational databases are easy to implement and use, situations arise in which too many people or departments create their own databases and applications, preventing the smooth and efficient functioning of the organization. These individual databases create data inconsistency, data duplication, data redundancy, and so on.
1.6.4 THE ENTITY RELATIONSHIP DATA MODEL
The relational database model's conceptual simplicity made it possible to expand the database's scope. In turn, the rapidly increasing transaction and information requirements created the need for more complex database implementation structures, thus creating the need for more effective database design tools. Although the relational database model was a vast conceptual improvement over the hierarchical and network database models, it still lacked the features that would make it an effective database design tool. Because it is easier to examine structures graphically than to describe them in text form, database designers find it desirable to use a graphical tool in which entities and their relationships can be pictured. Therefore, the search was on for a conceptual data model that was basically graphical in character. The E-R model is by far the most widely accepted and adapted graphical tool for data modeling in the relational database's production environment. The entity relationship model (ERM) yields a graphical representation of entities and their relationships in a database structure. It is precisely this graphic data representation that popularized the use of E-R diagrams as a tool for conceptual-level data modeling.
BASIC STRUCTURE
E-R models are normally represented in an entity relationship diagram (ERD). The ERD uses graphic representations to model the database components. The basic E-R data model is based on the following components:
An entity is “a person, place, or thing about which data are to be collected and stored.” An entity is represented by a rectangle in the E-R model. Each row in the relational table is known as an entity instance or entity occurrence in the E-R model.
Each entity is described by a set of attributes. An attribute describes a particular characteristic of the entity. For example, the entity EMPLOYEE will have attributes such as a Social Security number, a last name, a first name, and so on.
A relationship describes an association among data. Most relationships describe an association between two entities. Three types of relationships are possible among data: one-to-many (1:M), many-to-many (M:N), and one-to-one (1:1). ERD modelers use the term connectivity to label the relationship classification. Relationships are represented by a diamond connected to the related entities.
Examples

Figure 1.6 Relationship Depictions: The ERD
(The figure shows three ERDs. A one-to-many (1:M) relationship: a PAINTER can paint many PAINTINGs; each PAINTING is painted by one PAINTER. A many-to-many (M:N) relationship: an EMPLOYEE can learn many SKILLs; each SKILL can be learned by many EMPLOYEEs. A one-to-one (1:1) relationship: an EMPLOYEE manages one STORE; each STORE is managed by one EMPLOYEE.)
ADVANTAGES
1. Easy to learn, use, and understand.
2. Exceptional conceptual simplicity. It gives an easy conceptual view of a database's main entities and their relationships. Therefore, complex database designs are more easily created and managed than they otherwise would be.
3. Visual representation. The E-R model gives database designers, programmers, and end users an easily understood visual representation of data and relationships.
4. Effective communication tool. The E-R model is well integrated with the relational database model. Such integration helps to make relational database design a very structured process.

DISADVANTAGES
1. Limited constraint representation.
2. Limited relationship representation. Relationships are represented as occurring between entities. Therefore, relationships between attributes within entities cannot be represented.
3. No data manipulation language.
4. Loss of information content. The model tends to become crowded when attributes are represented.

1.6.5 OBJECT-ORIENTED DATABASE MODEL
Increasingly complex real-world problems demonstrated a need for a data model that more closely represented the real world. In the object-oriented data model (OODM), both data and their relationships are contained in a single structure known as an object. In turn, the OODM is the basis for the object-oriented database management system (OODBMS). An OODM reflects a very different way to define and use entities. Like the relational model's entity, an object is described by its factual content. But quite unlike an entity, an object includes information about relationships between the facts within the object, as well as information about its relationships with other objects. Therefore, the facts within the object are given greater meaning. The OODM is said to be a semantic data model because "semantic" indicates meaning. Subsequent OODM development has allowed an object also to contain all operations that can be performed on it, such as changing its data values, finding a specific data value, and printing data values. Because objects include data, various types of relationships, and operational procedures, the object becomes self-contained, thus making the object, at least potentially, a basic building block for autonomous structures. The OO data model is based on the following components:
An object is an abstraction of a real-world entity. In general terms, an object may be considered equivalent to an ER model’s entity. More precisely, an object represents only one individual occurrence of an entity.
Attributes describe the properties of an object. For example, a PERSON object includes the attributes Name, Social Security Number, and Date of Birth.
Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure and behavior. However, a class is different from an entity set in that it contains a set of procedures known as methods. A class's method represents a real-world action such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address.
Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which each class has only one parent.
Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it.
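These components have direct counterparts in any object-oriented language. A minimal sketch in Python, with hypothetical class and attribute names: Person is a class whose attributes and method are shared by its objects, and Employee, one level below it in the class hierarchy, inherits both.

    class Person:
        """A class: a collection of similar objects with shared structure
        (attributes) and behavior (methods)."""

        def __init__(self, name, ssn, birth_date):
            self.name = name                 # attributes describe the object
            self.ssn = ssn
            self.birth_date = birth_date

        def print_name(self):                # a method: a real-world action
            print(self.name)

    class Employee(Person):
        """One level below Person in the class hierarchy; inherits its
        attributes and methods."""

        def __init__(self, name, ssn, birth_date, salary):
            super().__init__(name, ssn, birth_date)
            self.salary = salary             # plus an attribute of its own

    emp = Employee("Ann Smith", "123-45-6789", "1990-04-01", 52000)
    emp.print_name()                         # inherited from Person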
Figure 1.7 A Comparison of the OO Model and the ER Model
(The figure contrasts the two models for an invoice transaction. The OO side shows a single INVOICE object box containing the attributes INV_NUMBER, INV_DATE, INV_SHIP_DATE, and INV_TOTAL, with the related CUSTOMER object (connectivity 1) and LINE object (connectivity M) inside the box. The ER side shows CUSTOMER, INVOICE, and LINE as separate entities joined by the relationships "CUSTOMER generates INVOICE" and "INVOICE has LINE".)
The OO data model represents an object as a box; all of the object's attributes and relationships to other objects are included within the object box. The object representation of the INVOICE includes all related objects within the same object box. Note that the connectivities (1 and M) indicate the relationship of the related objects to the
INVOICE. For example, the 1 next to the CUSTOMER object indicates that each INVOICE is related to only one CUSTOMER. The M next to the LINE object indicates that each INVOICE contains many LINEs.
The ER model uses three separate entities and two relationships to represent an invoice transaction. Because customers can buy more than one item at a time, each invoice references one or more lines, one item per line. And because invoices are generated by customers, the data-modeling requirements include a customer entity and a relationship between the customer and the invoice.

THE RELATIONAL DATABASE MODEL

1.7 A LOGICAL VIEW OF DATA
The relational model enables us to view data logically rather than physically. The practical significance of taking the logical view is that it serves as a reminder of the simple file concept of data storage. Although the use of a table, quite unlike that of a file, has the advantages of structural and data independence, a table does resemble a file from a conceptual point of view. Greater logical simplicity tends to yield simpler and more effective database design methodologies. Because the table plays such a prominent role in the relational model, it deserves a closer look.
1.7.1 Tables and Their Characteristics
The logical view of the relational database is facilitated by the creation of data relationships based on a logical construct known as a table. A table is perceived as a two-dimensional structure composed of rows and columns.
Characteristics of a Relational Table
A table is perceived as a two-dimensional structure composed of rows and columns.
Each table row (tuple) represents a single entity occurrence within the entity set.
Each table column represents an attribute, and each column has a distinct name.
Each row/column intersection represents a single data value.
All values in a column must conform to the same data format. For example, if the attribute is assigned an integer data format, all values in the column representing that attribute must be integer.
Each column has a specific range of values known as the attribute domain.
The order of the rows and columns is immaterial to the DBMS.
Each table must have an attribute or a combination of attributes that uniquely identifies each row.
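A minimal sketch of several of these characteristics, using Python's sqlite3 module and a hypothetical AGENT table: each column has a distinct name and a declared data format, each row/column intersection holds a single value, and the primary key supplies the attribute that uniquely identifies each row.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE agent (
        agent_id    INTEGER PRIMARY KEY,   -- uniquely identifies each row
        agent_name  TEXT,                  -- distinct column names, one format
        agent_phone TEXT)""")
    conn.execute("INSERT INTO agent VALUES (501, 'Alex Alby', '228-1249')")

    # Each row (tuple) is a single entity occurrence within the entity set.
    print(conn.execute("SELECT * FROM agent").fetchall())

    # A duplicate primary key value would destroy row uniqueness; rejected.
    try:
        conn.execute("INSERT INTO agent VALUES (501, 'Ann Burke', '228-1410')")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)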
1.8 KEYS
A key consists of one or more attributes that determine other attributes. It uniquely identifies a row in a table. There are two types of keys: intelligent keys and non-intelligent keys. An intelligent key is based upon data values, such as a date, a last name, or a combination of values. A non-intelligent key is completely arbitrary, having no function or meaning other than identification of the row. Intelligent keys are problematic because the key value must be changed to reflect changing data values. An updateable key presents its own problems, because all related tables must be updated every time the key is updated. As a general rule, intelligent keys should never be used, but it is not always possible to avoid them.

A primary key is a column in the table whose purpose is to uniquely identify records from the same table. A foreign key is a column in the table that identifies records from a different table. Primary and foreign keys therefore have a symbiotic relationship, in which the foreign key is used to store values of another table's primary key to describe the relationship between data from different tables.

The key's role is based on a concept known as determination. "A determines B" indicates that knowing the value of attribute A means we can look up the value of attribute B. The shorthand notation is:

A → B

If A determines B, C, and D, we can write A → B, C, D.

FUNCTIONAL DEPENDENCE
The attribute B is functionally dependent on the attribute A if A determines B, that is, if each value in column A determines one and only one value in column B.

Composite key
A key composed of more than one attribute (i.e., a multi-attribute key) is known as a composite key.
Fully functional dependence
If the attribute B is functionally dependent on a composite key A but not on any subset of that composite key, the attribute B is fully functionally dependent on A.
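Determination can be checked mechanically: A → B holds in a relation if no two rows agree on A while disagreeing on B. A minimal sketch, assuming the relation is held as a list of Python dictionaries with invented attribute names; the last call also illustrates full functional dependence on a composite key.

    def determines(rows, a, b):
        """Return True if attribute(s) a functionally determine attribute b,
        i.e., each value of a maps to one and only one value of b."""
        seen = {}
        for row in rows:
            key = tuple(row[attr] for attr in a)   # a may be composite
            if seen.setdefault(key, row[b]) != row[b]:
                return False
        return True

    students = [
        {"stu_num": 1, "stu_name": "Ann",  "class": "BIO101", "grade": "A"},
        {"stu_num": 2, "stu_name": "Bill", "class": "BIO101", "grade": "B"},
        {"stu_num": 1, "stu_name": "Ann",  "class": "CHE201", "grade": "B"},
    ]

    print(determines(students, ["stu_num"], "stu_name"))        # True
    print(determines(students, ["stu_num"], "grade"))           # False
    print(determines(students, ["stu_num", "class"], "grade"))  # True
    # grade is FULLY functionally dependent on (stu_num, class), since
    # neither stu_num alone nor class alone determines it.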
KEY TYPE: DEFINITION
Superkey: An attribute that uniquely identifies each row in a table.
Candidate key: A minimal superkey; a superkey that does not contain a subset of attributes that is itself a superkey.
Primary key: A candidate key selected to uniquely identify all other attribute values in any given row.
Secondary key: An attribute used strictly for data retrieval purposes.
Foreign key: An attribute in one table whose values must either match the primary key in another table or be null.
1.9 INTEGRITY RULES Relational database integrity rules are very important to good database design. Many RDBMs enforce integrity rules automatically. However, it is much safer to make sure that the application design conforms to the entity and referential integrity rules. ENTITY INTEGRITY
Requirement
DISCRIPTION
All primary key entries are unique, and no part of a primary key may be null.
Purpose
Each row will have a unique identity, and foreign key values can properly reference primary key values.
Example
No invoice can have a duplicate number, nor can it be null. In short, all invoices are uniquely identified by their invoice number.
REFERENTIAL INTEGRITY

Requirement: A foreign key may have either a null entry (as long as it is not a part of its table's primary key) or an entry that matches the primary key value in a table to which it is related. (Every non-null foreign key value must reference an existing primary key value.)
Purpose: It is possible for an attribute NOT to have a corresponding value, but it will be impossible to have an invalid entry. The enforcement of the referential integrity rule makes it impossible to delete a row in one table whose primary key has mandatory matching foreign key values in another table.
Example: A customer might not yet have an assigned sales representative (number), but it will be impossible to have an invalid sales representative (number).
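Continuing the hypothetical CUSTOMER/INVOICE sketch above, the statements below suggest how an RDBMS that enforces both rules would react; the data values are invented for illustration:

INSERT INTO INVOICE VALUES (NULL, 1001);  -- rejected: a null primary key violates entity integrity
INSERT INTO INVOICE VALUES (8001, 9999);  -- rejected if no CUSTOMER row has CUS_CODE 9999 (referential integrity)
INSERT INTO INVOICE VALUES (8002, NULL);  -- accepted: a null foreign key is allowed by the referential integrity rule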
1.10 RELATIONAL ALGEBRAIC OPERATIONS

The relational algebra is a collection of operators that take relations as their operands and return a relation as their result. The original algebra had eight operators, in two groups of four each:
1. The traditional set operators: union, intersection, difference, and Cartesian product.
2. The special relational operators: select, project, join, and divide.
1.10.1 Set operations
UNION
INTERSECTION
SET DIFFERENCE
CARTESIAN PRODUCT
Pictorial representation (Venn diagrams): R ∪ S (Union), R ∩ S (Intersection), and R - S (Set Difference), each shown as two overlapping circles R and S with the relevant region shaded.
Three of these operations (UNION, INTERSECTION, and DIFFERENCE) require that the tables (relations) involved be union compatible. The CARTESIAN PRODUCT can be defined on any two relations. Two relations are said to be union compatible if the following conditions are satisfied:
The two relations/tables must contain the same number of columns.
Each column of the first relation/table must be either the same data type as the corresponding column of the second relation/table or convertible to the same data type as the corresponding column of the second.
Let us consider the relations/tables CATALOG (C) and ORDER (O), as shown in Table 1.8.
Table 1.8 Relations C and O

O:
Tom Clancy        Special Forces
Robert Ludlum     Sigma Protocol
Jack Higgins      Edge of Danger
Steve Martini     The Jury

C:
Robin Cook        Shock
Matthew Reilly    Area 7
Tom Clancy        Special Forces
David Baldacci    Last Man Standing
Ken Follet        Jackdaws
Robert Ludlum     Sigma Protocol
Nicholas Sparks   The Rescue
UNION
The result of this operation, denoted by C ∪ O, is the relation that includes all tuples that are either in C or in O or in both C and O. Duplicates are eliminated.
INTERSECTION
The result of the intersection operation is a relation that includes all tuples that are in both C and O. The intersection operation is denoted by C ∩ O. The intersection operation can be expressed using the set difference operation as follows: C ∩ O = C - (C - O).

DIFFERENCE
The difference operation is denoted by C - O. The result of the difference operation is the relation that contains all tuples in C but not in O. The results of the UNION, INTERSECTION, and DIFFERENCE operations on relations C and O are given in Table 1.9.
C ∪ O:
Robin Cook        Shock
Matthew Reilly    Area 7
Tom Clancy        Special Forces
David Baldacci    Last Man Standing
Ken Follet        Jackdaws
Robert Ludlum     Sigma Protocol
Nicholas Sparks   The Rescue
Jack Higgins      Edge of Danger
Steve Martini     The Jury

C ∩ O:
Tom Clancy        Special Forces
Robert Ludlum     Sigma Protocol

C - O:
Robin Cook        Shock
Matthew Reilly    Area 7
David Baldacci    Last Man Standing
Ken Follet        Jackdaws
Nicholas Sparks   The Rescue

Table 1.9 Results of the UNION, INTERSECTION and DIFFERENCE Operations
The UNION and INTERSECTION operations are commutative and associative. This means that the following hold:
A ∪ B = B ∪ A and A ∩ B = B ∩ A
A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C.
The DIFFERENCE operation is not commutative. This means that A - B is not the same as B - A; in other words, A - B ≠ B - A.
CARTESIAN PRODUCT

The CARTESIAN PRODUCT is also known as the CROSS PRODUCT or CROSS JOIN. It is denoted by an "X". The CARTESIAN PRODUCT between relations C and O is denoted by C X O. The relations need not be union compatible. This operation is used to combine tuples from two relations in a combinatorial fashion. If the relations have n and m tuples respectively, then the CARTESIAN PRODUCT will have n*m tuples. Consider the two tables C and T, shown in Table 1.10, one containing the book details and the other the book types:
Table 1.10 Relations C and T

C:
Robin Cook       Shock
Matthew Reilly   Area 7
Tom Clancy       Special Forces

T:
Hardcover
Paperback
The CARTESIAN PRODUCT of the relations C and T is given in Table 1.11.
Table 1.11 CARTESIAN PRODUCT of Relations C and T

C X T:
Robin Cook       Shock            Hardcover
Matthew Reilly   Area 7           Hardcover
Tom Clancy       Special Forces   Hardcover
Robin Cook       Shock            Paperback
Matthew Reilly   Area 7           Paperback
Tom Clancy       Special Forces   Paperback
1.10.2 Relational algebraic operators

The second group of relational algebraic operations was developed specifically for the relational database.
SELECT

This operation is used to select a subset of the tuples from a relation that satisfies a selection condition or search criteria. The SELECT operation acts like a filter that allows only the tuples that match the specified criteria into the result set. The SELECT operation is illustrated below using the PRODUCT table:
P_CODE   P_DESCRIPTION   PRICE
12345    Lamp              100
12346    Powerdrill        250
12347    Flashlight        150
12348    Boxfan            120
12349    Bulb               50
SELECT * FROM PRODUCT WHERE PRICE < 200 yields:
P_CODE   P_DESCRIPTION   PRICE
12345    Lamp              100
12347    Flashlight        150
12348    Boxfan            120
12349    Bulb               50
PROJECT

The SELECT operation selects the rows (tuples) that match the selection condition from a table (relation); it discards the rows that do not match. The PROJECT operation, by contrast, selects certain columns (attributes) from a table while discarding others. So if we are interested in only certain attributes or columns of a table, we can use the PROJECT operation to project the relation over those attributes only. Thus PROJECT yields all values for selected attributes; in other words, PROJECT yields a vertical subset of a table. The effect of PROJECT is shown below.

PROJECT PRICE yields:

PRICE
100
250
150
120
50

PROJECT P_DESCRIPTION and PRICE yields:
P_DESCRIPTION   PRICE
Lamp              100
Flashlight        150
Boxfan            120
Bulb               50
The PROJECT operation removes any duplicate tuples from the result relation.
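In SQL, projection corresponds to naming columns in the select list; the duplicate elimination that the relational-algebra PROJECT performs automatically must be requested explicitly with distinct. A minimal sketch against the PRODUCT table above:

SQL> select distinct p_description, price from product;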
ASSIGNMENT

Sometimes it is more convenient to write a relational expression by assigning parts of it to temporary relation variables. The assignment operation, denoted by ←, works like assignment in a programming language.
The evaluation of an assignment does not result in any relation being displayed to the user. Rather, the result of the expression on the right side of the ← is assigned to the relation variable on the left of the ←. This relation variable may be used in subsequent expressions.
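SQL has no direct counterpart to the ← assignment operator, but a view serves a similar purpose: it names an intermediate result that later expressions can use. A minimal sketch (the view name TEMP is hypothetical):

SQL> create view temp as select * from product where price < 200;
SQL> select p_description from temp;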
JOIN

Join allows information to be combined from two or more tables. JOIN is the real power behind the relational database, allowing the use of independent tables linked by common attributes. The EMPLOYEE and PAY tables shown below illustrate the join operations.

EMPLOYEE TABLE

ENUMBER   NAME       ADDRESS        DESIGNATION    SALARY   AGE   JOIN_DATE
1001      RAMAN      K.K.NAGAR      CLERK            5000    26   23-JAN-99
1006      SUJATHA    NELLORE        MANAGER          5500    27   05-JUN-01
1002      DEVIKA     ANNANAGAR      ACCOUNTANT       4700    27   22-APR-99
1672      YASHIKA    KAMARAJNAGAR   RECEPTIONIST     9000    29   28-MAR-05
1023      PRASANNA   SALEM          DEVELOPER        7000    26   15-AUG-00
1234      BALA       NATRAJ NAGAR   CLERK            5000    28   07-SEP-03
PAY TABLE

ENUMBER   BP     HRA    CCA    DA     IT     SALARY
1006      3500   1000    500   1000    500     5500
1002      2500    900    500   1000    200     4700
1023      4000   1300   1000   1600    900     7000
1672      4500   1600   1500   2600   1200     9000
1234      3000   1000    500   1000    500     5000
1022      3000   1000    500   1000    500     5000
1. EQUAL JOIN:

SQL> select name, address, age, bp from employee, pay where employee.enumber = pay.enumber;

NAME       ADDRESS         AGE   BP
DEVIKA     ANNANAGAR       27    2500
SUJATHA    NELLORE         27    3000
SUJATHA    NELLORE         27    3500
PRASANNA   SALEM           26    4000
BALA       NATRAJ NAGAR    28    3000
YASHIKA    KAMARAJ NAGAR   29    4500
2. NON EQUAL JOIN:

SQL> select name, address, bp, it from employee e1, pay p1 where e1.enumber = p1.enumber and p1.salary > 5000;

NAME       ADDRESS         BP     IT
SUJATHA    NELLORE         3500    500
PRASANNA   SALEM           4000    900
YASHIKA    KAMARAJ NAGAR   4500   1200
3. SELF JOIN:

SQL> select e1.name, e1.designation from employee e1, employee e2 where e1.designation = e2.designation and e1.enumber <> e2.enumber order by designation;

NAME    DESIGNATION
BALA    CLERK
RAMAN   CLERK

4. NATURAL JOIN:

SQL> select e1.name, e1.age, p1.it, p1.bp from employee e1, pay p1 where e1.enumber = p1.enumber;

NAME       AGE   IT     BP
DEVIKA     27     200   2500
SUJATHA    27     500   3000
SUJATHA    26     500   3500
PRASANNA   26     900   4000
BALA       28     500   3000
YASHIKA    29    1200   4500
5. LEFT OUTER JOIN:

SQL> select employee.enumber, employee.name, pay.bp, pay.salary from employee, pay where employee.enumber(+) = pay.enumber;

ENUMBER   NAME       BP     SALARY
1002      DEVIKA     2500     4700
1006      SUJATHA    3500     5500
                     3000     5000
1023      PRASANNA   4000     7000
1234      BALA       3000     5000
1672      YASHIKA    4500     9000
6 rows selected.

6. RIGHT OUTER JOIN:

SQL> select employee.enumber, employee.name, pay.bp, pay.salary from employee, pay where employee.enumber = pay.enumber(+);

ENUMBER   NAME       BP     SALARY
1001      RAMAN
1002      DEVIKA     2500     4700
1006      SUJATHA    3500     5500
1023      PRASANNA   4000     7000
1234      BALA       3000     5000
1672      YASHIKA    4500     9000
1006      BALA       4500     1200
Thus a natural join links tables by selecting only the rows with common values in their common attribute(s).
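The queries above use Oracle's older comma-and-(+) notation. For reference, the same kinds of joins can be expressed in the ANSI SQL join syntax accepted by most modern RDBMSs; these are sketches of equivalents, not additional sample output:

SQL> select e1.name, e1.age, p1.it, p1.bp from employee e1 join pay p1 on e1.enumber = p1.enumber;
SQL> select employee.enumber, employee.name, pay.bp, pay.salary from employee left outer join pay on employee.enumber = pay.enumber;

The second query preserves every EMPLOYEE row, supplying nulls where no PAY row matches.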
DIVISION

The DIVISION operation (represented by ÷) can be very useful in some data manipulation operations. DIVIDE requires one two-column table (the dividend) and one single-column table (the divisor). Using the example shown in Table 1.12, note that:
a. Table 1 is "divided" by Table 2. Tables 1 and 2 both contain the column CODE but do not share LOC.
b. To be included in the resulting Table 3, a value in the unshared column (LOC) must be associated (in Table 1) with every value in the dividing Table 2.
c. The only value associated with both A and B is 5.

Table 1.12 DIVIDE

Table 1:
CODE   LOC
A      5
B      5
C      6
D      7
D      8
E      8
A      9
A      4
B      3

Table 2:
CODE
A
B

Table 1 DIVIDE Table 2 yields:
LOC
5
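Relational division has no dedicated SQL keyword; it is commonly written with a double NOT EXISTS. A minimal sketch, assuming tables TABLE1 (CODE, LOC) and TABLE2 (CODE) holding the data of Table 1.12:

select distinct t1.loc
from table1 t1
where not exists
      (select * from table2 t2
       where not exists
             (select * from table1 t1a
              where t1a.loc = t1.loc
                and t1a.code = t2.code));

The query returns each LOC value that is paired in TABLE1 with every CODE value in TABLE2; here, only LOC 5.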
1.11 THE DATA DICTIONARY AND THE SYSTEM CATALOG

The data dictionary provides a detailed accounting of all tables found within the user/designer-created database. Thus, the data dictionary contains all of the attribute names and characteristics for each table in the system. In short, the data dictionary contains metadata – data about data. The data dictionary is sometimes described as "the database designer's database" because it records the design decisions about tables and their structures.

Like the data dictionary, the system catalog contains metadata. The system catalog can be described as a detailed system data dictionary that describes all objects within the database, including data about table names, the table's creator and creation date, the number of columns in each table, the data type corresponding to each column, index filenames, index creators, authorized users, and access privileges. Because the system catalog contains all required data dictionary information, the terms system catalog and data dictionary are often used interchangeably. In fact, current relational database software generally provides only a system catalog, from which the designer's data dictionary information may be derived. The system catalog is actually a system-created database whose tables store the user/designer-created database characteristics and contents. Therefore, the system catalog tables can be queried just like any user/designer-created table. In effect, the system catalog automatically produces database documentation. As new tables are added to the database, that documentation also allows the RDBMS to check for and eliminate homonyms and synonyms.

In general terms, homonyms are similar-sounding words with different meanings, such as boar and bore, or identically spelled words with different meanings, such as fair (meaning "just") and fair (meaning "festival"). In a database context, the word homonym indicates the use of the same attribute name to label different attributes. In a database context, a synonym is the opposite of a homonym and indicates the use of different names to describe the same attribute. For example, car and auto refer to the same object. Synonyms must be avoided.
1.12 RELATIONSHIPS WITHIN THE RELATIONAL DATABASE

There are three possible relationships between related entities or tables: one-to-one, one-to-many, and many-to-many. Relational databases are designed to work most effectively with one-to-many relationships between tables, expressed using primary and foreign keys. Defining a column as a primary key in the database will ensure that no duplicate values are contained in that column; in other words, a primary key will be unique. In the business world, one-to-one relationships are few and far between. One-to-many and many-to-many relationships, on the other hand, are common. However, many-to-many relationships are not permitted in a relational database and must be converted into one-to-many relationships. Relational databases are comprised almost entirely of tables in one-to-many relationships. One-to-one (1:1) relationships are rare, because data elements related in this way are normally placed in the same table. When a one-to-one relationship is desired, it can be expressed simply as a one-to-many relationship using primary and foreign keys. Relationship types are determined by interpreting the business rules and common sense that characterize entities in relationships. Consider the business rule 'Employees in the company are each assigned their own office.' This rule means that for each employee there exists a unique office, and for each office there exists a unique employee. The entities in the example are EMPLOYEE and OFFICE.
Both EMPLOYEE:OFFICE and OFFICE:EMPLOYEE are one-to-one (1:1) relationships.
One-to-many (1:N) relationships are the most common. Because one-to-one relationships are rare, and because all many-to-many relationships must be converted into one-to-many relationships, most databases are comprised almost entirely of one-to-many relationships. A typical business rule for a one-to-many relationship might be stated as 'each employee is assigned to one department; each department has many employees.' The two entities represented in this example are EMPLOYEE and DEPARTMENT. The EMPLOYEE:DEPARTMENT relationship is 1:1, while the DEPARTMENT:EMPLOYEE relationship is 1:N. A many-to-one (N:1) relationship is a mirror image of a one-to-many (1:N) relationship; they are treated identically. Once a one-to-many relationship has been identified, it is easy to create a link between the two entities: the entity (table) on the 'many' side of the relationship receives the key attribute (primary key) from the entity (table) on the 'one' side of the relationship. The primary key belongs to the 'one' side of the relationship, and the foreign key to the 'many' side.

Many-to-many (M:N or N:M) relationships are problematic and cannot be adequately expressed directly in a relational database. A classic example of a many-to-many relationship is this: a student may enroll in many classes (1:N); a class may have many enrolled students (1:N). Entities in a many-to-many relationship must be linked in a special way, through a third entity called a composite entity. A composite entity has only one function: to provide an indirect link between two entities in an M:N relationship. In the language of tables, a composite entity is termed a linking table. A composite entity has no key attribute of its own; rather, it receives the key attributes from each of the two entities it links and combines them to form a composite key attribute. In other words, many-to-many relationships are usually expressed using intersection tables. An intersection table contains two or more foreign keys, relating the primary key values of two or more tables to each other. The role of an intersection table is to convert the many-to-many relationship into two one-to-many relationships that can be easily handled by the database. Figure 1.13 illustrates a composite entity that indirectly links the STUDENT and CLASS entities. The M:N relationship between STUDENT and CLASS has been decomposed into two one-to-many relationships:
The 1:N relationship between STUDENT and STUDENT CLASSES reads this way: for one instance of STUDENT, there exist zero, one, or many instances of STUDENT CLASSES; but for one instance of STUDENT CLASSES, there exists zero or one instance of STUDENT.
The 1:N relationship between CLASS and STUDENT CLASSES reads this way: for one instance of CLASS, there exist zero, one, or many instances of STUDENT CLASSES; but for one instance of STUDENT CLASSES, there exists zero or one instance of CLASS.
STUDENT                    CLASS
* Student_ID               * Class_ID
Last_Name                  Class_Name
First_Name                 Instructor

        STUDENT CLASSES
        * Student_ID
        * Class_ID

Figure 1.13 Composite Entity and Intersection Table
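A DDL sketch of the intersection table in Figure 1.13; the table and column names follow the figure, while the data types are assumptions:

CREATE TABLE STUDENT_CLASSES
  (STUDENT_ID INTEGER NOT NULL,
   CLASS_ID   INTEGER NOT NULL,
   PRIMARY KEY (STUDENT_ID, CLASS_ID),
   FOREIGN KEY (STUDENT_ID) REFERENCES STUDENT (STUDENT_ID),
   FOREIGN KEY (CLASS_ID) REFERENCES CLASS (CLASS_ID));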
UNIT 2 ENTITY RELATIONSHIP (E-R) MODELING

2.1 BASIC MODELING CONCEPTS
A model is an abstraction of a more complex real-world object or event.
A data model is a relatively simple representation, usually graphical, of complex real-world data structures.
A model's main function is to help us understand the complexities of the real-world environment.
A data model represents data structures and their characteristics, relations, constraints, and transformations.
2.2 DATA MODELS: DEGREES OF DATA ABSTRACTION

The commonly used view-of-data approach is the three-level architecture suggested by ANSI/SPARC (American National Standards Institute / Standards Planning and Requirements Committee). The ANSI/SPARC architecture is divided into three levels, usually referred to as the internal level, the external level, and the conceptual level (see Figure 2.1).
Degree of Abstraction   Model(s)                      Characteristics
High                    ER                            Hardware-independent, software-independent
Medium                  Relational, Object-Oriented   Hardware-independent, software-dependent
Low                     Network, Hierarchical         Hardware-dependent, software-dependent
Figure 2.1 ANSI/SPARC Database Architecture

External Level:     External Schema | External Schema | External Schema
                    (External-Conceptual Mapping)
Conceptual Level:   Conceptual Schema
                    (Conceptual-Internal Mapping)
Internal Level:     Physical Schema / Physical Data Organization / Database

The Internal Level (also known as the storage level) is the one closest to physical storage; that is, it is the one concerned with the way the data is stored inside the system.
The External Level (also known as the user logical level) is the one closest to the users; that is, it is the one concerned with the way the data is seen by individual users.
The Conceptual Level (also known as the community logical level, or sometimes just the logical level, unqualified) is a level of indirection between the other two.
Observe that the external level is concerned with individual user perceptions, while the conceptual level is concerned with a community user perception. There will be many distinct "external views", precisely one "conceptual view" consisting of a similarly abstract representation of the database in its entirety, and precisely one "internal view" representing the database as stored internally. The external and conceptual levels are model levels, while the internal level is an implementation level; in other words, the external and conceptual levels are defined in terms of user-oriented constructs such as records and fields, while the internal level is defined in terms of machine-oriented constructs such as bits and bytes.
THE EXTERNAL LEVEL

The external level is the individual user level. Each user has a language at his or her disposal. For the application programmer, the language will be either a conventional programming language (e.g., Java, C++, or C) or perhaps a proprietary language that is specific to the system. For the end user, the language will be either a query language (probably SQL) or some special-purpose language, perhaps forms- or menu-driven, tailored to that user's requirements.

To return to the architecture: the ANSI/SPARC term for an individual user's view is an external view. In general, an external view consists of many occurrences of many types of external record. Each external view is defined by means of an external schema, which consists basically of definitions of each of the various external record types. The external schema is written using the DDL (Data Definition Language) portion of the user's data sublanguage.
THE CONCEPTUAL LEVEL
The conceptual view is a representation of the entire information content of the database, again (as with an external view) in a form that is somewhat abstract in comparison with the way in which the data is physically stored.
Broadly speaking, the conceptual view is intended to be a view of the data "as it really is", rather than as users are forced to see it by the limitations of the particular language or hardware they might be using.
The conceptual view consists of many occurrences of many types of conceptual records.
The conceptual view is defined by means of the conceptual schema which includes definitions of each of the various conceptual record types.
The conceptual schema is written using another data definition language, the conceptual DDL.
THE INTERNAL LEVEL
The third level of the architecture is the internal level.
The internal view is a low level representation of the entire database.
Internal view consists of many occurrences of many types of internal record.
The internal view is described by means of the internal schema, which not only defines the various stored record types but also specifies what indexes exist, how stored fields are represented, what physical sequence the stored records are in and so on.
The internal schema is written using yet another data definition language, the internal DDL.
2.3 THE ENTITY RELATIONSHIP (E-R) MODEL

The E-R model is a tool that is commonly used to:
Translate different views of data
Define data processing and constraint requirements
Help implement the database

E-R Model Components
The E-R model forms the basis of an E-R diagram (ERD). The ERD represents the conceptual database as viewed by the end users. The three main components of the E-R model are entities, attributes, and relationships.
ENTITIES
An entity in the E-R model corresponds to a table, not to a row, in the relational environment.
In the E-R model, a specific table row is called an entity instance or entity occurrence.
An entity is represented by a rectangle.
The entity name is specified in capital letters.
ATTRIBUTES
Attributes are represented by ovals connected to the entity with a line.
Each attribute also has a domain. A domain is the attribute's set of possible values. E.g., Marks: 0-100; Sex: Female or Male.
Attributes may share a domain.
Newly declared attributes inherit the characteristics of an existing attribute if the same attribute name is used.
Primary key attributes are underlined in E-R diagrams.
The attributes of the STUDENT entity: STU_FNAME, STU_LNAME, STU_INITIAL, STU_EMAIL, and STU_PHONE, each shown as an oval connected to the STUDENT rectangle.
DOMAINS
Attributes have a domain. A domain is the attribute's set of possible values. The domain for the (character) attribute SEX consists of only two possibilities: M or F. Attributes may share a domain. The data dictionary may let a newly declared attribute inherit the characteristics of an existing attribute if the same attribute name is used.
IDENTIFIERS (Primary Keys)
The ERM uses identifiers to uniquely identify each entity instance. In the relational model, such identifiers are mapped to primary keys in tables. Identifiers are underlined in the ERD. Key attributes are also underlined in a frequently used table structure.
SHORTHAND FORMAT
TABLE NAME (KEY_ATTRIBUTE 1, ATTRIBUTE 2, ..., ATTRIBUTE K)
STUDENT (STU_INITIAL, STU_FNAME, STU_LNAME, STU_EMAIL, STU_PHONE)
CAR (CAR_ID _NUM, MOD_CODE, CAR_YEAR, CAR _COLOR)
Composite Primary Keys
Ideally, a primary key is composed of only a single attribute. However, it is possible to use a composite key, that is, a primary key composed of more than one attribute.
COMPOSITE AND SIMPLE ATTRIBUTES
Attributes are classified as simple or composite. A composite attribute is an attribute that can be further subdivided to yield additional attributes. E.g., ADDRESS can be subdivided into street, city, state, and zip code; PHONE_NUMBER can be subdivided into area code and exchange number. A simple attribute is an attribute that cannot be subdivided. E.g., age, sex, and marital status.
Single-Valued Attributes
A single-valued attribute can have only a single value. For example, a person can have only one Social Security number. A single-valued attribute is not necessarily a simple attribute.
Multivalued Attributes
Multivalued attributes are attributes that can have many values. E.g., a household may have several different phones, each with its own number. In the E-R model, a multivalued attribute is shown by a double line connecting the attribute to the entity.
A multivalued attribute in an entity: the CAR entity with attributes MOD_CODE, CAR_YEAR, CAR_VIN, and the multivalued attribute CAR_COLOR.
Resolving multivalued attribute problems
Although the conceptual model can handle M:N relationships and multivalued attributes, we should not implement them in the RDBMS. So if multivalued attributes exist, the designer must decide on one of two possible courses of action.
1. Within the original entity, create several new attributes, one for each of the original multivalued attribute's components. For example, the CAR entity's attribute CAR_COLOR can be split to create the new attributes CAR_TOPCOLOR, CAR_BODYCOLOR, and CAR_TRIMCOLOR.
The CAR entity with the split attributes: MOD_CODE, CAR_VIN, CAR_YEAR, CAR_TOPCOLOR, CAR_BODYCOLOR, and CAR_TRIMCOLOR.
2. Create a new entity composed of the original multivalued attribute's components. The new (independent) CAR_COLOR entity is then related to the original CAR entity in a 1:M relationship. Note that such a change allows the designer to define color for different sections of the car, as shown below.
Components of the Multivalued Attribute

SECTION    COLOR
Top        White
Body       Blue
Trim       Gold
Interior   Blue
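A DDL sketch of the second course of action; the column types are assumptions, not definitions from this text:

CREATE TABLE CAR
  (CAR_VIN  VARCHAR (17) NOT NULL,
   MOD_CODE INTEGER,
   CAR_YEAR INTEGER,
   PRIMARY KEY (CAR_VIN));

CREATE TABLE CAR_COLOR
  (CAR_VIN VARCHAR (17) NOT NULL,
   SECTION VARCHAR (10) NOT NULL,
   COLOR   VARCHAR (10),
   PRIMARY KEY (CAR_VIN, SECTION),
   FOREIGN KEY (CAR_VIN) REFERENCES CAR (CAR_VIN));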
Derived Attributes
Finally, an attribute may be classified as a derived attribute. A derived attribute is an attribute whose value is calculated (derived) from other attributes. The derived attribute need not be physically stored within the database; instead, it can be derived by using an algorithm. For example, an employee's age, EMP_AGE, may be found by computing the integer value of the difference between the current date and EMP_DOB. A derived attribute is indicated by a dashed line connecting the attribute and the entity. Derived attributes are sometimes referred to as computed attributes.
Depiction of a Derived Attribute: the EMPLOYEE entity with attributes EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_DOB, and the derived attribute EMP_AGE (connected by a dashed line).
Advantages and Disadvantages of Storing Derived Attributes

DERIVED ATTRIBUTE STORED
Advantage: Saves CPU processing cycles; the data value is readily available; can be used to keep track of historical data.
Disadvantage: Requires constant maintenance to ensure the derived value is current, especially if any values used in the calculation change.

DERIVED ATTRIBUTE NOT STORED
Advantage: Saves storage space; computation always yields the current value.
Disadvantage: Uses CPU processing cycles; adds coding complexity to queries.
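A minimal sketch of the "not stored" choice, computing EMP_AGE on the fly. Date arithmetic varies by product; Oracle-style functions are assumed here:

SQL> select emp_num, trunc(months_between(sysdate, emp_dob) / 12) as emp_age from employee;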
RELATIONSHIPS
A relationship is an association between entities. The entities that participate in a relationship are also known as participants. Relationships between entities always operate in both directions. That is, to define the relationship between the entities named CUSTOMER and INVOICE, you would specify that:
A CUSTOMER may generate many INVOICEs.
Each INVOICE is generated by one CUSTOMER.
Because you know both directions of the relationship between CUSTOMER and INVOICE, it is easy to see that this relationship can be classified as 1:M. Relationships are represented by diamond-shaped symbols.
Example: the PROFESSOR rectangle connected to the CLASS rectangle through the relationship diamond "teaches".
CONNECTIVITY AND CARDINALITY
The term connectivity is used to describe the relationship classification.

One-to-Many:  PROFESSOR (1) teaches (M) CLASS
Many-to-Many: STUDENT (M) enrolls in (N) CLASS
CARDINALITY
Cardinality expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity. In the ERD, cardinality is indicated by placing the appropriate numbers beside the entities, using the format (x, y). The first value represents the minimum number of associated entities, while the second value represents the maximum number of associated entities.

One-to-Many:  PROFESSOR (1) teaches (M) CLASS, with cardinalities (0, 3) and (1, 1) shown beside the entities.
Many-to-Many: STUDENT (M) enrolls in (N) CLASS, with cardinality (1, 6) beside STUDENT and (0, 35) beside CLASS.
EXISTENCE DEPENDENCY
If an entity's existence depends on the existence of one or more other entities, it is said to be existence-dependent. E.g., COURSE generates CLASS: the COURSE primary key appears in the CLASS table as a foreign key, so the COURSE table must be created before the CLASS table. If an entity can exist apart from one or more related entities, it is said to be existence-independent.

RELATIONSHIP PARTICIPATION
An entity's participation in a relationship is either optional or mandatory. The participation is optional if one entity occurrence does not require a corresponding entity occurrence in a particular relationship.
Example: in 'PROFESSOR teaches CLASS', a professor who teaches no class at all is quite possible. CLASS is optional to PROFESSOR, but PROFESSOR is mandatory to CLASS. Optional entities are represented by a small circle on the side of the optional entity:

PROFESSOR (1) teaches (M) CLASS (optional)
WEAK ENTITIES
A weak entity is one that:
1. Is existence-dependent; that is, it cannot exist without the entity with which it has a relationship.
2. Has a primary key that is partially or totally derived from the parent entity in the relationship.
RECURSIVE ENTITIES
A recursive entity is one in which a relationship can exist between occurrences of the same entity set. Examples:

1:1  EMPLOYEE is married to EMPLOYEE
1:M  EMPLOYEE manages EMPLOYEE
M:N  COURSE requires COURSE
COMPOSITE ENTITIES
The bridge entity is known as a composite entity. The bridge is an entity composed of the primary keys of each of the entities to be connected. It is represented by a diamond shape within a rectangle.
ENTITY SUPERTYPES AND SUBTYPES
The supertype contains the shared attributes, while the subtype contains the unique attributes.
Figure: The EMPLOYEE supertype with subtypes PILOT, MECHANIC, and ACCOUNTANT, joined through the symbol G. G represents a disjoint (non-overlapping) relationship.

Figure: The PERSON supertype with subtypes EMPLOYEE and STUDENT; EMPLOYEE in turn has subtypes PROFESSOR and ADMIN, and STUDENT has subtypes UG and PG, all joined through the symbol Gs. Gs represents an overlapping relationship.

2.4 DATABASE TABLES AND NORMALIZATION

The table is a basic building block in the database design process. Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies. The normalization process involves assigning attributes to tables based on the concept of determination.
Normalization works through a series of stages called normal forms. The first three stages are described as first normal form (1NF), second normal form (2NF), and third normal form (3NF). From a structural point of view, 2NF is better than 1NF and 3NF is better than 2NF. Occasionally we expect to denormalize some portions of a database design in order to meet performance requirements. (Denormalization produces a lower normal form; that is, a 3NF will be converted to a 2NF through denormalization.)
2.5 THE NEED FOR NORMALIZATION

Normalization is needed to eliminate redundancy and promote integrity. Normalization is a formal process of developing data structures in a manner that eliminates redundancy and promotes integrity. Data normalization is a cornerstone of relational theory. Many people do not consider a set of data structures to be relational unless it is fully normalized. There are many steps to data normalization. Each normal form builds upon the last and functions as both a process and a criterion. For example, a database structure that has been developed to the third normal form can be said to satisfy the second normal form. Normalization, as we have seen, is accomplished in steps or stages, each of which corresponds to a normal form. A normal form is a state of a table that results from applying simple rules regarding functional dependencies to that table.
2.6 NORMAL FORMS

The process of normalization was first developed by E.F. Codd. Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form. Three normal forms were initially proposed, namely:
FIRST NORMAL FORM
SECOND NORMAL FORM
THIRD NORMAL FORM
Subsequently, a stronger definition of the third normal form was introduced by R. Boyce and E.F. Codd, referred to as Boyce-Codd Normal Form (BCNF). Normalization is the process of building database structures to store data; thus normalization is a process of assigning attributes to entities. Normalization reduces data redundancies and, by extension, helps eliminate the data anomalies that result from those redundancies. Normalization works through a series of stages called normal forms.
First Normal Form (1NF) – The multivalued attributes (called repeating groups) are removed, i.e., elimination of repeating groups.
Second Normal Form (2NF) – The partial functional dependencies are removed, i.e., elimination of redundant data.
Third Normal Form (3NF) – The transitive dependencies are removed, i.e., elimination of columns not dependent on the key.
Boyce-Codd Normal Form (BCNF) – The remaining anomalies that result from functional dependencies are removed.
Fourth Normal Form (4NF) – Multivalued dependencies are removed, i.e., isolation of independent multiple relationships.
Fifth Normal Form (5NF) – Any remaining anomalies are removed. In this normal form we isolate semantically related multiple relationships.
2.6.1 FIRST NORMAL FORM (1NF)

First normal form (1NF) is a relation in which the intersection of each row and column contains one and only one value. To transform an unnormalized table (a table that contains one or more repeating groups) to first normal form, we identify and remove the repeating groups within the table. A repeating group is a set of columns that store similar information that repeats in the same table. Consider the following table, which contains the contact tracking information:
CREATE TABLE CONTACTS
  (CONTACT_ID     INTEGER      NOT NULL,
   L_NAME         VARCHAR (20) NOT NULL,
   F_NAME         VARCHAR (20),
   CONTACT_DATE1  DATE,
   CONTACT_DESC1  VARCHAR (50),
   CONTACT_DATE2  DATE,
   CONTACT_DESC2  VARCHAR (50));
Fig 2.2 CONTACTS-CONVERSATION Relationship
The above data structure contains repeating groups of the date and description of two conversations. The only advantage of designing the table like this is that it avoids the need for a relationship. But the disadvantages are many: the structure limits the number of conversations to two, which will create problems when more than two conversations need to be stored. The structure also makes it difficult to do any kind of meaningful searching using the columns, for example, to locate a conversation on a specific date.
To eliminate the repeating group, the group is moved to another table, which is then related to the parent table. The primary key of the parent table (CONTACT_ID) is stored in the second table. Moving the repeating group into another table allows any number of conversations to be recorded and searched easily. The primary and foreign key relationships are defined to ensure that conversations that do not relate to a contact are not recorded. The new arrangement is shown in Figure 2.2. The DDL statements for creating the tables are also given.

CREATE TABLE CONTACTS
  (CONTACT_ID  INTEGER      NOT NULL,
   L_NAME      VARCHAR (20) NOT NULL,
   F_NAME      VARCHAR (20));

CREATE TABLE CONVERSATION
  (CONTACT_ID    INTEGER  NOT NULL,
   CONTACT_DATE  DATE,
   CONTACT_DESC  VARCHAR (50));
ALTER TABLE CONTACTS ADD CONSTRAINT CONTACT_ID_PK PRIMARY KEY (CONTACT_ID);

ALTER TABLE CONVERSATION ADD CONSTRAINT CONTACT_ID_FK FOREIGN KEY (CONTACT_ID) REFERENCES CONTACTS (CONTACT_ID);
In summary, every table should have a primary key, and each set of repeating groups should appear in its own table. When these criteria are satisfied, we say that first normal form is achieved.
2.6.2 SECOND NORMAL FORM (2NF)

Second normal form (2NF) is based on the concept of full functional dependency. Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A. A functional dependency A → B is a full functional dependency if removal of any attribute from A results in the dependency no longer being sustained. A functional dependency A → B is a partial dependency if there is some attribute that can be removed from A and the dependency still holds. A relation is in second normal form if it is in first normal form and every non-primary-key attribute is fully functionally dependent on the primary key. Thus no non-key attribute is functionally dependent on only part of the primary key. A relation in first normal form will be in second normal form if one of the following conditions is satisfied:
The primary key consists of only one attribute.
No non-key attributes exist in the relation.
Every non-key attribute is functionally dependent on the full set of primary key attributes.
Consider a relation called EMPLOYEE with the attributes Emp_ID, Department, F_Name, L_Name, Salary, and Birth_Date. Here assume that the primary key for this relation is the composite key Emp_ID + Department. In this case the non-key attributes such as F_Name, L_Name, Salary, and Birth_Date are functionally dependent on part of the primary key (Emp_ID) but not on Department. A partial functional dependency is a functional dependency in which one or more non-key attributes are functionally dependent on part of the primary key. The partial functional dependency in the EMPLOYEE table creates a redundancy in that relation, which results in anomalies when the table is updated. Redundant data is data that is unnecessarily expressed multiple times or that depends on only part of a multi-value key. For example, consider the following table:

CREATE TABLE EMPLOYEE
  (EMP_NO       INTEGER      NOT NULL,
   L_NAME       VARCHAR (20) NOT NULL,
   F_NAME       VARCHAR (20),
   DEPT_CODE    INTEGER,
   DESCRIPTION  VARCHAR (50));
This table contains redundant data, namely the department description, which depends only on DEPT_CODE and does not vary based on the value of EMP_NO, the primary key of the table. By storing the department code and description in a different table, the redundancy is eliminated.
CREATE TABLE EMPLOYEE
  (EMP_NO     INTEGER      NOT NULL,
   L_NAME     VARCHAR (20) NOT NULL,
   F_NAME     VARCHAR (20),
   DEPT_CODE  INTEGER);

CREATE TABLE DEPARTMENT
  (DEPT_CODE    INTEGER      NOT NULL,
   DESCRIPTION  VARCHAR (50) NOT NULL);

This relationship is shown in Figure 2.3.

Figure 2.3 EMPLOYEE-DEPARTMENT Relationship
2.6.3 THIRD NORMAL FORM (3NF)

A relation is in third normal form (3NF) if it is in second normal form and no transitive dependencies exist. A transitive dependency is a condition where A, B, and C are attributes such that if A → B and B → C, then C is transitively dependent on A via B. Transitive dependency is a type of functional dependency. Consider the following functional dependencies:

emp_no → department
department → dept_head

The transitive dependency emp_no → dept_head exists via the department attribute. This condition holds, as emp_no is not functionally dependent on department or dept_head. The normalization of 2NF relations to 3NF involves the removal of transitive dependencies. We remove the transitively dependent attribute(s) from the relation by placing the attribute(s) in a new relation along with a copy of the determinant. A transitive dependency in a relation is a functional dependency between two or more non-key attributes. For example, in the following table, the columns F_NAME and L_NAME contribute to describing a specific contact using the primary key CONTACT_ID. But COMPANY_NAME and COMPANY_LOCATION do not contribute to describing the record with the given CONTACT_ID, since it identifies an individual and not a company.
CREATE TABLE CONTACTS
  (CONTACT_ID        INTEGER      NOT NULL,
   L_NAME            VARCHAR (20) NOT NULL,
   F_NAME            VARCHAR (20),
   COMPANY_NAME      VARCHAR (20),
   COMPANY_LOCATION  VARCHAR (50));
In the above relation CONTACT_ID is the primary key, so all the remaining attributes are functionally dependent on this attribute. However, there is a transitive dependency: COMPANY_LOCATION is dependent on COMPANY_NAME, and COMPANY_NAME is functionally dependent on CONTACT_ID. As a result of the transitive dependency, there are update anomalies in the CONTACTS table as follows:
Insertion anomaly – A new company cannot be inserted into the CONTACTS table until a contact person has been assigned to the company.
Deletion anomaly – If a company that has only one contact person is deleted from the table, we will lose the information about that company, as the company information is associated with that person.
Modification anomaly – If a company changes its location, we will have to make the change in all the records wherever the company name appears. If the company has five contact persons, we will have to make the change in five places.
The insertion, deletion, and modification anomalies arise as a result of the transitive dependency. The transitive dependency can be removed by decomposing the above table into two, as shown in Figure 2.4.
CREATE TABLE CONTACTS
  (CONTACT_ID  INTEGER      NOT NULL,
   L_NAME      VARCHAR (20) NOT NULL,
   F_NAME      VARCHAR (20),
   COMPANY_ID  INTEGER);

CREATE TABLE COMPANY
  (COMPANY_ID        INTEGER      NOT NULL,
   COMPANY_NAME      VARCHAR (20) NOT NULL,
   COMPANY_LOCATION  VARCHAR (50));
Figure 2.4 CONTACTS-COMPANY Relationship

When all the columns in a table describe and depend upon the primary key, the table is said to satisfy the third normal form.

2.6.4 BOYCE-CODD NORMAL FORM (BCNF)

Database relations are designed so that they have neither partial dependencies nor transitive dependencies, because these types of dependencies result in update anomalies. A functional dependency describes the relationship between attributes in a relation. For example, if 'A' and 'B'
are attributes in relation R, 'B' is functionally dependent on 'A' if each value of 'A' is associated with exactly one value of 'B'. For example, in the CONTACTS table, we can say that L_NAME, F_NAME, and COMPANY_ID are functionally dependent on CONTACT_ID. These dependencies are expressed as follows:
CONTACT_ID → L_NAME
CONTACT_ID → F_NAME
CONTACT_ID → COMPANY_ID
CONTACT_ID → {L_NAME, F_NAME, COMPANY_ID}
{L_NAME, F_NAME, COMPANY_ID} → CONTACT_ID
The left-hand side and the right-hand side of a functional dependency are sometimes called the determinant and the dependent, respectively. As the definition states, the determinant and the dependent are both sets of attributes. When a set contains more than one attribute, we use braces to enclose the attributes, as shown above. A functional dependency A → B is a full functional dependency if removal of any attribute from 'A' results in the dependency no longer being sustained. A functional dependency A → B is a partial dependency if there is some attribute that can be removed from 'A' and the dependency still holds. Transitive dependency, as we have seen, is a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B, i.e., A → C exists via attribute B. One of the major aims of relational database design is to group attributes into relations so as to minimize data redundancy and thereby reduce the file storage space required by the implemented base relations. One of the aims of normalization is to remove the update anomalies. Boyce-Codd normal form (BCNF) is based on functional dependencies that take into account all candidate keys in a relation. For a relation (table) with only one candidate key, 3NF and BCNF are equivalent. A relation is in BCNF if and only if every determinant is a candidate key. To test whether a relation is in BCNF, we identify all determinants and make sure that they are candidate keys. A determinant is an attribute or a group of attributes on which some other attribute is fully functionally dependent.
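As a brief illustration (this relation is invented for the example, not drawn from the CONTACTS schema): consider CLASS_SCHEDULE (STU_ID, SUBJECT, TUTOR), where each tutor teaches exactly one subject and each student has one tutor per subject, giving:

{STU_ID, SUBJECT} → TUTOR
TUTOR → SUBJECT

The determinant TUTOR is not a candidate key, so the relation is not in BCNF even though it satisfies 3NF. Decomposing it into (STU_ID, TUTOR) and (TUTOR, SUBJECT) leaves every determinant a candidate key.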
2.7 DENORMALIZATION

A normalized relation is an important database design goal, but it is only one of many such goals.
Good database design also considers processing requirements.
Tables are decomposed to conform to normalization requirements, i.e., the number of tables expands.
Joining a large number of tables requires additional disk input/output operations and processing logic, thereby reducing system speed.
Some occasional circumstances allow some degree of denormalization to increase processing speed.
The advantage of higher processing speed must be carefully weighed against the disadvantage of data anomalies.
In other words, use denormalization cautiously.
UNIT 3 RELATIONAL DATABASE

3.1 SQL

Although SQL is known as a "query language," it can do much more than just query a database. It can define the structure of the data, modify data in the database, and specify security constraints. SQL uses a combination of relational-algebra and relational-calculus constructs.
The SQL language has several parts:
Data-definition language (DDL). The SQL DDL provides commands for defining relational schemas, deleting relations, and modifying relation schemas.
Interactive data-manipulation language (DML). The SQL DML includes a query language based on both the relational algebra and the tuple relational calculus. It includes also commands to insert tuples into, delete tuples from, and modify tuples in the database.
View definition. The SQL DDL includes commands for defining views.
Transaction control. SQL includes commands for specifying the beginning and ending of transactions.
Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages, such as C, C++, Java, PL/I, COBOL, PASCAL, and FORTRAN.
Integrity. The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed.
Authorization. The SQL DDL includes commands for specifying access rights to relations and views.
3.2 BASIC STRUCTURE A relational database consists of a collection of relations, each of which is assigned a unique name. SQL allows the use of null values to indicate that the value either is unknown or does not exist. It allows a user to specify which attributes cannot be assigned null values. The basic structure of an SQL expression consists of three clauses: select, from, and where.
The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query.
The from clause corresponds to the Cartesian-product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression.
The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause.
A typical SQL query has the form

select A1, A2, ..., An
from r1, r2, ..., rm
where P

where each Ai represents an attribute, each ri a relation, and P is a predicate.

3.2.1 The Select Clause

The result of an SQL query is, of course, a relation. Let us consider a simple query, "Find the names of all branches in the loan relation":

select branch-name
from loan

The result is a relation consisting of a single attribute with the heading branch-name. To eliminate duplicates, we insert the keyword distinct after select. We can rewrite the preceding query as
select distinct branch-name
from loan

SQL allows us to use the keyword all to specify explicitly that duplicates are not removed:

select all branch-name
from loan

The asterisk symbol "*" can be used to denote "all attributes." A select clause of the form select * indicates that all attributes of the relations appearing in the from clause are selected. The select clause may also contain arithmetic expressions involving the operators +, -, *, and / operating on constants or attributes of tuples. For example, the query

select loan-number, branch-name, amount * 100
from loan

will return a relation that is the same as the loan relation, except that the attribute amount is multiplied by 100.
3.2.2 The Where Clause

Consider the query "Find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200." This query can be written in SQL as:

select loan-number
from loan
where branch-name = 'Perryridge' and amount > 1200

SQL uses the logical connectives and, or, and not, rather than the mathematical symbols ∧, ∨, and ¬, in the where clause. SQL allows us to use the comparison operators <, <=, >, >=, =, and <> to compare strings and arithmetic expressions, as well as special types, such as date types. SQL includes a between comparison operator to simplify where clauses that specify that a value be less than or equal to some value and greater than or equal to some other value:

select loan-number
from loan
where amount between 90000 and 100000

instead of

select loan-number
from loan
where amount <= 100000 and amount >= 90000

Similarly, we can use the not between comparison operator.
3.2.3 The From Clause

The from clause by itself defines a Cartesian product of the relations in the clause. Since the natural join is defined in terms of a Cartesian product, a selection, and a projection, it is a relatively simple matter to write an SQL expression for the natural join. Consider the query "For all customers who have a loan from the bank, find their names, loan numbers, and loan amounts." In SQL, this query can be written as

select customer-name, borrower.loan-number, amount
from borrower, loan
where borrower.loan-number = loan.loan-number
3.2.4 The Rename Operation

SQL provides a mechanism for renaming both relations and attributes. It uses the as clause, taking the form:

old-name as new-name

3.2.5 Tuple Variables

A tuple variable in SQL must be associated with a particular relation. Tuple variables are defined in the from clause by way of the as clause. We can rewrite the query "For all customers who have a loan from the bank, find their names, loan numbers, and loan amounts" as

select customer-name, T.loan-number, S.amount
from borrower as T, loan as S
where T.loan-number = S.loan-number

Tuple variables are most useful for comparing two tuples in the same relation.
3.2.6 String Operations SQL specifies strings by enclosing them in single quotes. A single quote character that is part of a string can be specified by using two single quote characters. The most commonly used operation on strings is pattern matching using the operator like. We describe patterns by using two special characters:
Percent (%): The % character matches any substring.
Underscore ( _ ): The _ character matches any character.
Patterns are case sensitive; that is, uppercase characters do not match lowercase character, or vice-versa.
3.2.7 Ordering the Display of Tuples

The order by clause causes the tuples in the result of a query to appear in sorted order. To list in alphabetic order all customers who have a loan at the Perryridge branch, we write

select distinct customer-name
from borrower, loan
where borrower.loan-number = loan.loan-number and branch-name = 'Perryridge'
order by customer-name

By default, the order by clause lists items in ascending order. To specify the sort order, we may specify desc for descending order or asc for ascending order:

select *
from loan
order by amount desc, loan-number asc

3.3 SET OPERATIONS

The SQL operations union, intersect, and except operate on relations and correspond to the relational-algebra operations ∪, ∩, and −. The set of all customers who have an account at the bank can be derived by

select customer-name
from depositor

and the set of customers who have a loan at the bank can be derived by

select customer-name
from borrower
3.3.1 The Union Operation

To find all customers having a loan, an account, or both at the bank, we write

(select customer-name from depositor)
union
(select customer-name from borrower)

The union operation automatically eliminates duplicates, unlike the select clause. If we want to retain all duplicates, we must write union all in place of union:

(select customer-name from depositor)
union all
(select customer-name from borrower)
3.3.2 The Intersect Operation

To find all customers who have both a loan and an account at the bank, we write

(select distinct customer-name from depositor)
intersect
(select distinct customer-name from borrower)

The intersect operation automatically eliminates duplicates, unlike the select clause. If we want to retain all duplicates, we must write intersect all in place of intersect:

(select distinct customer-name from depositor)
intersect all
(select distinct customer-name from borrower)
3.3.3 The Except Operation To find all customers who have account but no loan at the bank, we write
(select distinct customer-name from depositor) except
(select customer-name from borrower)
The except operation automatically eliminates duplicates, unlike the select clause. If we want to retain all duplicates, we must write except all in place of except:
(select distinct customer-name from depositor)
except all
(select customer-name from borrower)
3.4 Aggregate Functions
Aggregate functions are functions that take a collection of values as input and return a single value. SQL offers five built-in aggregate functions:
Average: avg
Minimum: min
Maximum: max
Total: sum
Count: count
The input to sum and avg must be a collection of numbers, but the other operators can operate on collections of nonnumeric data types, such as strings, as well. "Find the average account balance at the Perryridge branch." We write this query as:
select avg (balance)
from account
where branch-name = 'Perryridge'
3.5 NULL VALUES
SQL allows the use of null values to indicate absence of information about the value of an attribute. We can use the special keyword null in a predicate to test for a null value. Thus, to find all loan numbers that appear in the loan relation with null values for amount, we write
select loan-number
from loan
where amount is null
The predicate is not null tests for the absence of a null value. Since the predicate in a where clause can involve Boolean operations such as and, or, and not on the results of comparisons, the definitions of the Boolean operations are extended to deal with the value unknown.
and: The result of true and unknown is unknown, false and unknown is false, while unknown and unknown is unknown.
or: The result of true or unknown is true, false or unknown is unknown, while unknown or unknown is unknown.
not: The result of not unknown is unknown.
3.6 NESTED SUBQUERIES
A subquery is a select-from-where expression that is nested within another query.
3.6.1 Set Membership
The in connective tests for set membership, where the set is a collection of values produced by a select clause. The not in connective tests for the absence of set membership.
Example 1: To find all account holders, we write the subquery as
(select customer-name from depositor)
We then need to find those customers who are borrowers from the bank and who appear in the list of account holders obtained in the subquery. We do so by nesting the subquery in an outer select. The resulting query is
select distinct customer-name
from borrower
where customer-name in (select customer-name from depositor)
Example 2: To find all customers who do have a loan at the bank, but do not have an account at the bank, we can write
select distinct customer-name
from borrower
where customer-name not in (select customer-name from depositor)
The in and not in connectives can also be used on enumerated sets. The following query selects the names of customers who have a loan at the bank and whose names are neither Smith nor Jones:
select distinct customer-name
from borrower
where customer-name not in ('Smith', 'Jones')
3.6.2 Set Comparison
SQL offers an alternative style for writing certain queries: the phrase "greater than at least one" is represented in SQL by > some.
Example 1: To find the names of all branches that have assets greater than those of at least one branch located in Brooklyn, we write
select branch-name
from branch
where assets > some (select assets
from branch
where branch-city = 'Brooklyn')
SQL also allows < all, <= all, >= all, = all, and <> all comparisons.
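By analogy, the phrase "greater than all" is represented by > all. A sketch of the query "Find the branches whose assets are greater than those of every branch located in Brooklyn":
select branch-name
from branch
where assets > all (select assets
from branch
where branch-city = 'Brooklyn')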
3.6.3 Test for Empty Relations
SQL includes a feature for testing whether a subquery has any tuples in its result. The exists construct returns the value true if the argument subquery is nonempty. We can test for the nonexistence of tuples in a subquery by using the not exists construct.
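As an illustrative sketch using the schema above, the query "Find all customers who have both an account and a loan at the bank" can be written with exists as
select customer-name
from borrower
where exists (select *
from depositor
where depositor.customer-name = borrower.customer-name)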
3.6.4 Test for the Absence of Duplicate Tuples
SQL includes a feature for testing whether a subquery has any duplicate tuples in its result. The unique construct returns the value true if the argument subquery contains no duplicate tuples. We can also test for the existence of duplicate tuples in a subquery by using the not unique construct.
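As a sketch (note that unique is part of the SQL standard but is not supported by every implementation), the query "Find all customers who have at most one account at the bank" might be written as
select distinct T.customer-name
from depositor as T
where unique (select R.customer-name
from depositor as R
where T.customer-name = R.customer-name)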
3.7 Views
We define a view in SQL by using the create view command. To define a view, we must give the view a name and must state the query that computes the view. The form of the create view command is
create view v as <query expression>
where <query expression> is any legal query expression and v is the view name.
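For example, a view over the loan relation used earlier can be defined as follows; the view name perryridge-loan is an illustrative choice:
create view perryridge-loan as
(select loan-number, amount
from loan
where branch-name = 'Perryridge')
Once defined, the view can be queried like any relation, for example: select loan-number from perryridge-loan where amount > 1000.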
3.8 MODIFICATION OF THE DATABASE
We can also add, remove, or change information with SQL.
3.8.1 Deletion A delete request is expressed in much the same way as a query. We can delete only whole tuples; we cannot delete values on only particular attributes. SQL expresses a deletion by delete from r where P where P represents a predicate and r represents a relation.
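For example, to delete all of the Perryridge branch's loan tuples from the loan relation used earlier, we might write
delete from loan
where branch-name = 'Perryridge'
With no where clause, delete from loan would remove every tuple in the relation.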
3.8.2 Insertion
To insert data into a relation, we either specify a tuple to be inserted or write a query whose result is a set of tuples to be inserted. The simplest statement is a request to insert one tuple. Example:
insert into account
values ('A-9732', 'Perryridge', 1200)
3.8.3 Update
We may wish to change a value in a tuple without changing all values in the tuple. For this purpose, the update statement can be used.
Example 1: Suppose that annual interest payments are being made, and all balances are to be increased by 5 percent. We write
update account
set balance = balance * 1.05
Example 2: If the interest is to be paid only to accounts with a balance of $1000 or more, we can write
update account
set balance = balance * 1.05
where balance >= 1000
3.8.4 Transaction
A transaction consists of a sequence of query and/or update statements. The SQL standard specifies that a transaction begins implicitly when an SQL statement is executed.
Commit work commits the current transaction; that is, it makes the updates performed by the transaction become permanent in the database. After the transaction is committed, a new transaction is automatically started.
Rollback work causes the current transaction to be rolled back; that is, it undoes all the updates performed by the SQL statements in the transaction. Thus, the database state is restored to what it was before the first statement of the transaction was executed.
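As a minimal sketch of a transaction (assuming the account relation's key attribute is named account-number; the account numbers A-101 and A-215 are hypothetical), a transfer of $50 between two accounts should become permanent only if both updates succeed:
update account set balance = balance - 50
where account-number = 'A-101';
update account set balance = balance + 50
where account-number = 'A-215';
commit work
If either update fails, issuing rollback work instead leaves both balances unchanged.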
OTHER RELATIONAL LANGUAGES
3.9 Query-by-Example
Query-by-example (QBE) is the name of both a data-manipulation language and an early database system that included this language. It has two distinctive features:
1. Unlike most query languages and programming languages, QBE has a two-dimensional syntax: Queries look like tables. A query in a one-dimensional language can be written in one line; a two-dimensional language requires two dimensions for its expression.
2. QBE queries are expressed "by example." Instead of giving a procedure for obtaining the desired answer, the user gives an example of what is desired. Queries in QBE are expressed by skeleton tables. Example:
Branch | branch-name | branch-city | assets
QBE queries can be expressed on a single relation or on several relations, and QBE also includes a condition box feature.
TRANSACTION MANAGEMENT AND CONCURRENCY CONTROL
3.10 WHAT IS A TRANSACTION?
A transaction is a logical unit of work that must be either entirely completed or aborted; no intermediate states are acceptable. A transaction that changes the contents of the database must alter the database from one consistent state to another. A consistent database state is one in which all data integrity constraints are satisfied.
[Figure: Example of a transaction. The database begins in a consistent initial state (amount in stock X = 40); transaction A reads X, modifies it (X = X - 10), and writes it back, leaving the database in a consistent final state (X = 30).]
To ensure consistency of the database, every transaction must begin with the database in a known consistent state. If the database is not in a consistent state, the transaction will yield an inconsistent database that violates its integrity and business rules.
Once a transaction has been guaranteed access to the database, it may manipulate the database contents. As this transaction is taking place, the DBMS must ensure that no other transaction accesses X.
3.10.1 Transaction Properties
All transactions must display atomicity, durability, serializability, and isolation. We will briefly examine each of these properties.
ATOMICITY Requires that all operations (parts) of a transaction be completed; if not, the transaction is aborted. In other words, a transaction is treated as a single, indivisible, logical unit of work.
DURABILITY Indicates the permanence of the database’s consistent state. When a transaction is completed, the database reaches a consistent state, and that state cannot be lost, even in the event of the system’s failure.
SERIALIZABILITY Describes the result of the concurrent execution of several transactions. More specifically, the concurrent transactions are treated as though they were executed in serial order (one after another). This property is important in multiuser and distributed databases, where several transactions are likely to be executed concurrently.
ISOLATION Means that the data used during the execution of a transaction cannot be used by a second transaction until the first one is completed. In other words, if a transaction T1 is being executed and is using a data item, that data item cannot be accessed by any other transaction (T2, ..., Tn) until T1 ends. This property is particularly useful in multiuser database environments because several different users can access and update the database at the same time. By its nature, a single-user database system automatically ensures serializability and isolation of the database, because only one transaction is executed at a time. The atomicity and the durability of transactions must still be guaranteed by single-user DBMSs.
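SQL lets the programmer ask for these guarantees explicitly through isolation levels. A minimal sketch using the standard SQL statement (supported, with minor variations, by most implementations):
set transaction isolation level serializable
Weaker levels, such as read committed, trade strict isolation for greater concurrency.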
3.10.2 Transaction Management with SQL
The American National Standards Institute (ANSI) has defined standards that govern SQL database transactions. Transaction support is provided by two SQL statements: COMMIT and
ROLLBACK. When a transaction sequence is initiated by a user or an application program, it must continue through all succeeding SQL statements until one of the following four events occurs:
1. A COMMIT statement is reached, in which case all changes are permanently recorded within the database. The COMMIT statement automatically ends the SQL transaction.
2. A ROLLBACK statement is reached, in which case all changes are aborted and the database is rolled back to its previous consistent state.
3. The end of the program is successfully reached, in which case all changes are permanently recorded within the database. This action is equivalent to COMMIT.
4. The program is abnormally terminated, in which case the changes made in the database are aborted and the database is rolled back to its previous consistent state. This action is equivalent to ROLLBACK.
3.10.3 The Transaction Log
A DBMS uses a transaction log to keep track of all transactions that update the database. The information stored in this log is used by the DBMS for a recovery requirement triggered by a ROLLBACK statement, a program's abnormal termination, or a system failure such as a network discrepancy or a disk crash. Some RDBMSs use the transaction log to recover a database forward to a currently consistent state. While the DBMS executes transactions that modify the database, it also automatically updates the transaction log. The transaction log stores before-and-after data about the database and any of the tables, rows, and attribute values that participated in the transaction; the beginning and ending (COMMIT) of the transaction are also recorded. Although using a transaction log increases the processing overhead of a DBMS, the ability to restore a corrupted database is worth the price.
The transaction log is itself a database, and it is managed by the DBMS like any other database. The transaction log is subject to such common database dangers as disk-full conditions and disk crashes. Because the transaction log contains some of the most critical data in a DBMS, some implementations support logs on several different disks or on tapes to reduce the risk of a system failure.
3.11 CONCURRENCY CONTROL
The coordination of the simultaneous execution of transactions in a multiprocessing database system is known as concurrency control. The objective of concurrency control is to ensure the serializability of transactions in a multiuser database environment. Concurrency control is important because the simultaneous execution of transactions over a shared database can create several data integrity and consistency problems. The three main problems are lost updates, uncommitted data, and inconsistent retrievals.
3.11.1 LOST UPDATES
To illustrate the lost update problem, let us examine a simple PRODUCT table. One of the PRODUCT table's attributes is a product's quantity on hand (PROD_QOH). Assume a product whose current PROD_QOH value is 35. Now assume two concurrent transactions, T1 and T2, that update the PROD_QOH value for some item in the PRODUCT table. The transactions are:
TRANSACTION               COMPUTATION
T1: Purchase 100 units    PROD_QOH = PROD_QOH + 100
T2: Sell 30 units         PROD_QOH = PROD_QOH - 30
Table 3.1 shows the serial execution of these transactions under normal circumstances, yielding the correct answer PROD_QOH = 105.
TABLE 3.1 NORMAL EXECUTION OF TWO TRANSACTIONS
TIME  TRANSACTION  STEP                  STORED VALUE
1     T1           Read PROD_QOH         35
2     T1           PROD_QOH = 35 + 100
3     T1           Write PROD_QOH        135
4     T2           Read PROD_QOH         135
5     T2           PROD_QOH = 135 - 30
6     T2           Write PROD_QOH        105
But suppose that a transaction is able to read a product's PROD_QOH value from the table before a previous transaction (using the same product) has been committed. The sequence depicted in Table 3.2 shows how the lost update problem can arise. Note that the first transaction (T1) has not yet been committed when the second transaction (T2) is executed. Therefore, T2 still operates on the value 35, and its subtraction yields 5 in memory. In the meantime, T1 writes the value 135 to disk, which is promptly overwritten by T2. In short, the addition of 100 units is "lost" during the process.
TABLE 3.2 LOST UPDATES
TIME  TRANSACTION  STEP                          STORED VALUE
1     T1           Read PROD_QOH                 35
2     T2           Read PROD_QOH                 35
3     T1           PROD_QOH = 35 + 100
4     T2           PROD_QOH = 35 - 30
5     T1           Write PROD_QOH (Lost update)  135
6     T2           Write PROD_QOH                5
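Many SQL implementations let a transaction avoid the lost update either by locking the row when it is read or by folding the read-modify-write into a single atomic update. A sketch (the product code '123XYZ' is hypothetical, and select ... for update is a widely available but dialect-specific construct):
select PROD_QOH from PRODUCT
where PROD_CODE = '123XYZ'
for update;
update PRODUCT set PROD_QOH = PROD_QOH + 100
where PROD_CODE = '123XYZ';
commit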
3.11.2 UNCOMMITTED DATA
The uncommitted data problem occurs when two transactions, T1 and T2, are executed concurrently and the first transaction (T1) is rolled back after the second transaction (T2) has already accessed the uncommitted data, thus violating the isolation property of transactions. To illustrate this possibility, let us use the same transactions described during our lost updates discussion. However, this time the T1 transaction is rolled back to eliminate the addition of the 100 units. Because T2 subtracts 30 from the original 35 units, the correct answer should be 5.
TRANSACTION               COMPUTATION
T1: Purchase 100 units    PROD_QOH = PROD_QOH + 100 (Rolled back)
T2: Sell 30 units         PROD_QOH = PROD_QOH - 30
Table 3.3 shows how, under normal circumstances, the serial execution of these transactions yields the correct answer.
TABLE 3.3 CORRECT EXECUTION OF TWO TRANSACTIONS
TIME  TRANSACTION  STEP                  STORED VALUE
1     T1           Read PROD_QOH         35
2     T1           PROD_QOH = 35 + 100
3     T1           Write PROD_QOH        135
4     T1           *****ROLLBACK*****    35
5     T2           Read PROD_QOH         35
6     T2           PROD_QOH = 35 - 30
7     T2           Write PROD_QOH        5
Table 3.4 shows how the uncommitted data problem can arise when the ROLLBACK is completed after T2 has begun its execution.
TABLE 3.4 AN UNCOMMITTED DATA PROBLEM
TIME  TRANSACTION  STEP                                    STORED VALUE
1     T1           Read PROD_QOH                           35
2     T1           PROD_QOH = 35 + 100
3     T1           Write PROD_QOH                          135
4     T2           Read PROD_QOH (Read uncommitted data)   135
5     T2           PROD_QOH = 135 - 30
6     T1           ****ROLLBACK****                        35
7     T2           Write PROD_QOH                          105
3.11.3 INCONSISTENT RETRIEVALS
Inconsistent retrievals occur when a transaction calculates some summary (aggregate) functions over a set of data while other transactions are updating the data. The problem is that the transaction might read some data before they are changed and other data after they are changed, thereby yielding inconsistent results. To illustrate this problem, let us assume the following conditions:
1. T1 calculates the total quantity on hand of the products stored in the PRODUCT table.
2. At the same time, T2 updates the quantity on hand (PROD_QOH) for two of the PRODUCT table's products.
The two transactions are shown in Table 3.5.
TABLE 3.5 RETRIEVAL DURING UPDATE
T1                        T2
SELECT SUM (PROD_QOH)     UPDATE PRODUCT
FROM PRODUCT              SET PROD_QOH = PROD_QOH + 30
                          WHERE PROD_CODE = '125TYZ';
                          UPDATE PRODUCT
                          SET PROD_QOH = PROD_QOH - 30
                          WHERE PROD_CODE = '345TYX';
                          COMMIT;
While T1 calculates the total quantity on hand (PROD_QOH) for all items, T2 represents the correction of a typing error: the user added 30 units to product 345TYX's PROD_QOH but meant to add the 30 units to product 125TYZ's PROD_QOH. To correct the problem, the user subtracts 30 from product 345TYX's PROD_QOH and adds 30 to product 125TYZ's PROD_QOH. The initial and final PROD_QOH values are reflected in Table 3.6.
TABLE 3.6 TRANSACTION RESULTS: DATA ENTRY CORRECTION
PROD_CODE   BEFORE PROD_QOH   AFTER PROD_QOH
104XCV      100               100
110YGH      120               120
125TYZ      70                (70 + 30) 100
345TYX      35                (35 - 30) 5
350TYX      100               100
355TYX      30                30
Total       455               455
Although the final results are correct after the adjustment, Table 3.7 demonstrates that inconsistent retrievals are possible during the correction process.
TABLE 3.7 INCONSISTENT RETRIEVALS
TIME  TRANSACTION  ACTION                                    VALUE  TOTAL
1     T1           Read PROD_QOH for PROD_CODE = '104XCV'    100    100
2     T1           Read PROD_QOH for PROD_CODE = '110YGH'    120    220
3     T2           Read PROD_QOH for PROD_CODE = '125TYZ'    70
4     T2           PROD_QOH = 70 + 30
5     T2           Write PROD_QOH for PROD_CODE = '125TYZ'   100
6     T1           Read PROD_QOH for PROD_CODE = '125TYZ'    100    320 (After)
7     T1           Read PROD_QOH for PROD_CODE = '345TYX'    35     355 (Before)
8     T2           Read PROD_QOH for PROD_CODE = '345TYX'    35
9     T2           PROD_QOH = 35 - 30
10    T2           Write PROD_QOH for PROD_CODE = '345TYX'   5
11    T2           *********COMMIT*********
12    T1           Read PROD_QOH for PROD_CODE = '350TYX'    100    455
13    T1           Read PROD_QOH for PROD_CODE = '355TYX'    30     485
The computed answer of 485 is obviously wrong, because we know the correct answer to be 455. Unless the DBMS exercises concurrency control, a multiuser database environment can create havoc within the information system.
3.12 THE SCHEDULER
Severe problems can arise when two or more concurrent transactions are executed.
A database transaction involves a series of database I/O operations that take the database from one consistent state to another.
As long as two transactions, T1 and T2, access unrelated data, there is no conflict, and the order of execution is irrelevant to the final outcome.
But if the transactions operate on related data, conflict is possible among the transaction components.
To overcome such conflicts, the DBMS handles this tricky assignment for us by using a built-in scheduler.
The scheduler establishes the order in which the operations within concurrent transaction are executed.
The scheduler interleaves the execution of database operations to ensure serializability.
The scheduler uses appropriate algorithms, such as locking or time stamping methods, to determine the order.
READ/WRITE CONFLICT SCENARIOS
T1       T2       Result
Read     Read     No conflict
Read     Write    Conflicting operations
Write    Read     Conflicting operations
Write    Write    Conflicting operations
Two operations are in conflict when they access the same data and at least one of them is a write operation.
3.13 LOCKING
Locking is a procedure used to control concurrent access to data. When one transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect
results. The working principle of locking is as follows: a transaction must obtain a read or write lock on a data item before it can perform a read or write operation. The read lock is also called a shared lock; the write lock is also known as an exclusive lock.
Rules of locking
If a transaction has a read lock on a data item, it can read the item but not update it.
If a transaction has a read lock on a data item, other transactions can obtain a read lock on the data item, but no write locks.
If a transaction has a write lock on a data item, it can both read and update the data item.
If a transaction has a write lock on a data item, then other transactions cannot obtain either a read lock or a write lock on the data item. In other words, when a transaction acquires a write lock on a data item, it gains exclusive control over the data item.
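Some SQL dialects (Oracle and PostgreSQL, for example) expose these locks directly through a lock table statement; the lines below are a sketch in that style:
lock table PRODUCT in share mode;      -- read (shared) lock: others may read, none may write
lock table PRODUCT in exclusive mode;  -- write (exclusive) lock: no other reads or writes
In most systems, however, locks are acquired implicitly: a select takes shared locks and an update takes exclusive locks as needed.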
3.13.1 TWO-PHASE LOCKING TO ENSURE SERIALIZABILITY
The two-phase locking protocol defines how transactions acquire and relinquish locks. Two-phase locking guarantees serializability, but it does not prevent deadlocks. The two phases are:
1. A growing phase, in which a transaction acquires all the required locks without unlocking any data. Once all locks have been acquired, the transaction is at its locked point.
2. A shrinking phase, in which a transaction releases all locks and cannot obtain any new locks.
The two-phase locking protocol is governed by the following rules:
1. Two transactions cannot have conflicting locks.
2. No unlock operation can precede a lock operation in the same transaction.
3. No data are affected until all locks are obtained; that is, until the transaction is at its locked point.
[Figure: Two-phase locking protocol. The transaction acquires locks during the growing phase until it reaches its locked point, holds all of them through the locked phase, and releases them during the shrinking phase, acquiring no new locks after the first release.]
3.14 DEADLOCKS
Deadlocks exist when two transactions, T1 and T2, exist in the following modes:
T1 = access data items X and Y
T2 = access data items Y and X
If T1 has not unlocked data item Y, T2 cannot begin; if T2 has not unlocked data item X, T1 cannot continue. Consequently, T1 and T2 wait indefinitely, each waiting for the other to unlock the required data item. Such a deadlock is also known as a deadly embrace.
Deadlocks are possible only if one of the transactions wants to obtain an exclusive lock on a data item: no deadlock condition can exist among shared locks. Three basic techniques exist to control deadlocks:
3.14.1 Deadlock prevention
A transaction requesting a new lock is aborted if there is the possibility that a deadlock can occur. If the transaction is aborted, all the changes made by this transaction are rolled back, and all locks obtained by the transaction are released. The transaction is then rescheduled for execution. Deadlock prevention works because it avoids the conditions that lead to deadlocking.
3.14.2 Deadlock detection The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of the transactions is aborted and the other transaction continues.
3.14.3 Deadlock avoidance
The transaction must obtain all the locks it needs before it can be executed. This technique avoids rollbacks of conflicting transactions by requiring that locks be obtained in succession. However, the serial lock assignment required in deadlock avoidance increases transaction response times. The best deadlock control method depends on the database environment. For example, if the probability of deadlock is low, deadlock detection is recommended. However, if the probability of deadlock is high, deadlock prevention is recommended. If response time is not high on the system priority list, deadlock avoidance might be employed.
3.15 SERIALIZABILITY
Serializability describes the result of the concurrent execution of several transactions: the result must be the same as if the transactions had been executed in serial order. (This property is important in multiuser and distributed databases.)
A serial execution is one in which the transactions are run one at a time in some sequence.
Guaranteed serializability means that the given execution and the serial one always produce the same result as each other, no matter what the initial state of the database might be.
3.16 CONCURRENCY CONTROL WITH TIME STAMPING METHODS
The time stamping approach to scheduling concurrent transactions assigns a globally unique time stamp to each transaction. The time stamp value produces an explicit order in which transactions are submitted to the DBMS. Time stamps must have two properties: uniqueness and monotonicity. Uniqueness ensures that no equal time stamp values can exist, and monotonicity ensures that time stamp values always increase. All database operations (READ and WRITE) within the same transaction must have the same time stamp. The DBMS executes conflicting operations in time stamp order, thereby ensuring serializability of the transactions. If two transactions conflict, one of them is stopped, rescheduled, and assigned a new time stamp value. The disadvantage of the time stamping approach is that each value stored in the database requires two additional time stamp fields, which increases memory needs and the database's processing overhead.
3.17 CONCURRENCY CONTROL WITH OPTIMISTIC METHODS
The optimistic approach is based on the assumption that the majority of database operations do not conflict. The optimistic approach requires neither locking nor time stamping techniques. Instead, a transaction is executed without restrictions until it is committed. Using an optimistic approach, each transaction moves through two or three phases. The phases are read, validation, and write.
Read Phase The transaction reads the database, executes the needed computations, and makes the updates to a private copy of the database values. All update operations of the transaction are recorded in a temporary update file, which is not accessed by the remaining transactions.
Validation Phase The transaction is validated to ensure that the changes made will not affect the integrity and consistency of the database. If the validation test is positive, the transaction goes to the write phase. If the validation test is negative, the transaction is restarted, and the changes are discarded.
Write Phase The changes are permanently applied to the database. The optimistic approach is acceptable for mostly read or query database systems that require very few update transactions.
3.18 DATABASE RECOVERY MANAGEMENT Recovery techniques are based on the atomic transaction property: all portions of the transaction must be treated as a single logical unit of work, in which all operations must be applied and completed to produce a consistent database. If, for some reason, any transaction operation cannot be completed, the transaction must be aborted, and any changes to the database must be rolled back. In short, transaction recovery reverses all the changes that the transaction has made to the database before it was aborted. Some DBMSs provide functions that allow the database administrator to schedule automatic database backups to permanent secondary storage devices, such as disks or tapes.
The level of backup varies:
A full backup of the database, or dump of the database.
A differential backup of the database, in which only the last modifications made to the database (when compared with a previous backup copy) are copied.
A backup of the transaction log only, which backs up all the transaction log operations that are not reflected in a previous backup copy of the database.
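As an illustrative sketch in SQL Server's dialect (the database name bankdb and the file paths are hypothetical), the three levels of backup map to three statements:
backup database bankdb to disk = 'D:\backup\bankdb_full.bak';                    -- full backup (dump)
backup database bankdb to disk = 'D:\backup\bankdb_diff.bak' with differential;  -- differential backup
backup log bankdb to disk = 'D:\backup\bankdb_log.trn';                          -- transaction log backup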
The database backup is stored in a secure place, usually in a different building, and protected against dangers such as fire, theft, flood, and other potential calamities. The backup's existence guarantees database recovery following system (hardware/software) failures. Failures that plague databases and systems are generally induced by software, hardware, programming exemptions, transactions, or external factors.
SOFTWARE
Software-induced failures may be traceable to the operating system, the DBMS software, application programs, or viruses.
HARDWARE
Hardware-induced failures may include memory chip errors, disk crashes, bad disk sectors, disk full errors, and so on.
PROGRAMMING EXEMPTION
Application programs or end users may roll back transactions when certain conditions occur. For example, a recovery procedure may be initiated if a withdrawal of funds is made when the customer balance is zero, or if an end user has made an unintended keyboard error, such as pressing [Ctrl]+[C].
TRANSACTION
The system detects deadlocks and aborts one of the transactions.
EXTERNAL
Backups are especially important when a system suffers complete destruction due to fire, earthquake, flood, and so forth.
Depending on the type and the extent of the failure, the recovery process ranges from a minor short-term inconvenience to a major long-term rebuild action. Regardless of the extent of the required recovery process, recovery is not possible without a backup.
DISTRIBUTED DATABASE MANAGEMENT SYSTEMS
3.19 THE EVOLUTION OF DISTRIBUTED DATABASE MANAGEMENT SYSTEMS
A distributed database management system (DDBMS) governs the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites. To understand how and why the DDBMS is different from the DBMS, it is useful to briefly examine the changes in the database environment that set the stage for the development of the DDBMS.
1970s
Companies used centralized database management systems to meet their structured information needs. Data access was provided through dumb terminals.
1980s
Social and technological changes affected database development and design:
1. Business operations became more decentralized geographically.
2. Competition increased at the global level.
In a dynamic business environment, companies had to respond quickly to competitive and technological pressures. Two database requirements became obvious:
1. Quick ad hoc data access, i.e., a quick-response decision-making environment.
2. Decentralization.
1990s
1. Growing acceptance of the Internet, particularly the WWW.
2. Data analysis, data mining, and data warehousing.
All of these factors made decentralization of the database desirable, because a centralized database is subject to problems such as:
1. Performance degradation due to a growing number of remote locations over greater distances.
2. High cost associated with maintaining and operating large central database systems.
3. Reliability problems created by dependence on a central site.
A multiple-source/multiple-location database environment, known as a distributed database, is managed by a distributed database management system (DDBMS).
3.20 DDBMS ADVANTAGES AND DISADVANTAGES
Distributed database management systems deliver several advantages over traditional systems. At the same time, they are subject to some problems. The table below summarizes the advantages and disadvantages associated with a DDBMS.
Distributed DBMS Advantages and Disadvantages
ADVANTAGES
Data are located near the greatest demand site. The data in a distributed database system are dispersed to match business requirements.
Faster data access. End users often work with only a locally stored subset of the company's data.
Faster data processing. A distributed database system spreads out the system's workload by processing data at several sites.
Growth facilitation. New sites can be added to the network without affecting the operations of other sites.
Improved communications. Because local sites are smaller and located closer to customers, local sites foster better communication among departments and between customers and company staff.
Reduced operating costs. It is more cost-effective to add workstations to a network than to update a mainframe system. Development work is done more cheaply and more quickly on low-cost PCs than on mainframes.
User-friendly interface. PCs and workstations are usually equipped with an easy-to-use graphical user interface (GUI). The GUI simplifies use and training for end users.
Less danger of a single-point failure. When one of the computers fails, the workload is picked up by other workstations. Data are also distributed at multiple sites.
Processor independence. The end user is able to access any available copy of the data, and an end user's request is processed by any processor at the data location.
DISADVANTAGES
Complexity of management and control. Applications must recognize data location, and they must be able to stitch together data from different sites. Database administrators must have the ability to coordinate database activities to prevent degradation due to data anomalies. Transaction management, concurrency control, security, backup, recovery, query optimization, access path selection, and so on, must all be addressed and resolved.
Security. The probability of security lapses increases when data are located at multiple sites. The responsibility of data management will be shared by different people at several sites.
Lack of standards. There are no standard communication protocols at the database level. (Although TCP/IP is the de facto standard at the network level, there is no standard at the application level.) For example, different database vendors employ different, and often incompatible, techniques to manage the distribution of data and processing in a DDBMS environment.
Increased storage requirements. Multiple copies of data are required at different sites, thus requiring additional disk storage space.
Increased training cost. Training costs are generally higher in a distributed model than they would be in a centralized model, sometimes even to the extent of offsetting operational and hardware savings.
3.21 DISTRIBUTED PROCESSING AND DISTRIBUTED DATABASES
In distributed processing, a database's logical processing is shared among two or more physically independent sites that are connected through a network. For example, the data input/output (I/O), data selection, and data validation might be performed on one computer, and a report based on that data might be created on another computer. A basic distributed processing environment is illustrated in Figure 3.8, which shows a distributed processing system sharing the database processing chores among three sites connected through a communication network. Although the database resides at only one site (Miami), each site can access the data and update the database. The database is located on Computer A, a network computer known as the database server.
[Figure 3.8: Distributed processing environment. The employee database and the DBMS reside on Computer A at Site 1 (Miami); users at Site 2 (New York, Computer B) and Site 3 (Atlanta, Computer C) update payroll data and generate payroll reports through a communication network.]
A distributed database, on the other hand, stores a logically related database over two or more physically independent sites. The sites are connected via a computer network. In contrast, a distributed processing system uses only a single-site database but shares the processing chores among several sites. In a distributed database system, a database is composed of several parts known as database fragments. The database fragments are located at different sites and can be replicated among various sites. An example of a distributed database environment is shown in Figure 3.9.
[Figure 3.9: Distributed database environment. The employee database is split into fragments stored at Site 1 (Miami, user Alan), Site 2 (New York), and Site 3 (Atlanta); each site runs its own DBMS, and the sites are connected through a communication network.]
The database in Figure 3.9 is divided into three database fragments (E1, E2, and E3) located at different sites. The computers are connected through a network system. In a fully distributed database, the users Alan, Betty, and Hernando do not need to know the name or location of each database fragment in order to access the database. Also, the users may be located at sites other than Miami, New York, or Atlanta and still be able to access the database as a single logical unit. As we examine and contrast Figures 3.8 and 3.9, we should keep the following points in mind:
Distributed processing does not require a distributed database, but a distributed database requires distributed processing.
Distributed processing may be based on a single database located on a single computer. For the management of distributed data to occur, copies or parts of the database processing function must be distributed to all data storage sites.
Both distributed processing and distributed databases require a network to connect all components.
3.22 CHARACTERISTICS OF DISTRIBUTED DATABASE MANAGEMENT SYSTEMS
A distributed database management system (DDBMS) governs the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites. A DBMS must have at least the following functions to be classified as distributed:
Application interface to interact with the end user or application programs and with other DBMSs within the distributed database.
Validation to analyze data requests.
Transformation to determine which data request components are distributed and which are local.
Query optimization to find the best access strategy. (Which database fragment must be accessed by the query, and how must data updates, if any, be synchronized?)
Mapping to determine the data location of local and remote fragments.
I/O interface to read or write data from or to permanent local storage.
Formatting to prepare the data for presentation to the end user or to an application program.
Security to provide data privacy at both local and remote databases.
Backup and recovery to ensure the availability and recoverability of the database in case of a failure.
DB administration features for the database administrator.
Concurrency control to manage simultaneous data access and to ensure data consistency across database fragments in the DDBMS.
Transaction management to ensure that the data move from one consistent state to another. This activity includes the synchronization of local and remote transactions as well as transactions across multiple distributed segments.
A fully distributed database management system must perform all of the functions of a centralized DBMS, as follows:
1. Receive an application's (or an end user's) request.
2. Validate, analyze, and decompose the request. The request may include mathematical and/or logical operations such as the following: select all customers with a balance greater than $1,000. The request may require data from only a single table, or it may require access to several tables.
3. Map the request's logical-to-physical data components.
4. Decompose the request into several disk I/O operations.
5. Search for, locate, read, and validate the data.
6. Ensure database consistency, security, and integrity.
7. Validate the data for the conditions, if any, specified by the request.
8. Present the selected data in the required format.
In addition, a distributed DBMS must handle all necessary functions imposed by the distribution of data and processing, and it must perform those additional functions transparently to the end user.
3.23 DDBMS COMPONENTS The DDBMS must include at least the following components.
Computer workstations (sites or nodes) that form the network system. The distributed system must be independent of the computer system hardware.
Network hardware and software components that reside in each workstation. The network components allow all sites to interact and exchange data. Because the components (computers, operating systems, network hardware, and so on) are likely to be supplied by different vendors, it is best to ensure that distributed database functions can be run on multiple platforms.
Communication media that carry the data from one workstation to another. The DDBMS must be communications media-independent; that is, it must be able to support several types of communications media.
The transaction processor (TP), which is the software component found in each computer that requests data. The transaction processor receives and processes the
application’s data requests (remote and local). The TP is also known as the application processor (AP) or the transaction manager (TM).
The data processor (DP), which is the software component residing on each computer that stores and retrieves data located at the site. The DP is also known as the data manager (DM). A data processor may even be a centralized DBMS.
Figure 3.10 illustrates the placement of and interaction among the components. The communication among TPs and DPs shown in Figure 3.10 is made possible through a specific set of rules, or protocols, used by the DDBMS.
The protocols determine how the distributed database system will:
Interface with the network to transport data and commands between data processors (DPs) and transaction processors (TPs).
Synchronize all data received from DPs (TP side) and route retrieved data to the appropriate TPs (DP side).
Ensure common database functions in a distributed system. Such functions include security, concurrency control, backup, and recovery.
DPs and TPs can be added to the system without affecting the operation of the other components. A TP and a DP can reside on the same computer, allowing the end user to access local as well as remote data transparently. In theory, a DP can be an independent centralized DBMS with proper interfaces to support remote access from other independent DBMSs in the network.
[Figure 3.10: Distributed database system management components. Several sites running only a TP and several sites running both a TP and a DP communicate through a network.]
3.24 LEVELS OF DATA AND PROCESS DISTRIBUTION
Current database systems can be classified on the basis of how process distribution and data distribution are supported. For example, a DBMS may store data at a single site (centralized DB) or at multiple sites (distributed DB), and it may support data processing at a single site or at multiple sites.
Database Systems: Levels of Data and Process Distribution
                         SINGLE-SITE DATA                        MULTIPLE-SITE DATA
Single-site process      Host DBMS (mainframes)                  Not applicable (requires multiple processes)
Multiple-site process    File server; client/server DBMS         Fully distributed client/server DDBMS
                         (LAN DBMS)
3.24.1 Single-Site Processing, Single-Site Data (SPSD)
In the single-site processing, single-site data (SPSD) scenario, all processing is done on a single CPU or host computer (mainframe, midrange, or PC), and all data are stored on the host computer's local disk. Processing cannot be done on the end user's side of the system. The DBMS is located on the host computer, which is accessed by dumb terminals connected to it. Using Figure 3.11 as an example, the functions of the TP and the DP are embedded within the DBMS located on a single computer. The DBMS usually runs under a time-sharing, multitasking operating system, which allows several processes to run concurrently on a host CPU accessing a single DP. All data storage and data processing are handled by a single CPU.
[Figure 3.11: Single-site processing, single-site data (centralized). Local dumb terminals, and a remote dumb terminal communicating through a telephone line, access the DBMS and its database on a single host computer via a front-end processor.]
3.24.2 MULTIPLE-SITE PROCESSING, SINGLE-SITE DATA (MPSD)
Under the multiple-site processing, single-site data (MPSD) scenario, multiple processes run on different computers sharing a single data repository. Typically, the MPSD scenario requires a network file server running conventional applications that are accessed through a LAN. Many multiuser accounting applications running under a personal computer network fit such a description (see Figure 3.12). As you examine Figure 3.12, note that:
The TP on each workstation acts only as a redirector to route all network data requests to the file server.
The end user sees the file server as just another hard disk. Because only the data storage input/output (I/O) is handled by the file server's computer, the MPSD offers limited distributed processing capabilities.
The end user must make a direct reference to the file server in order to access remote data. All record and file-locking activity is done at the end-user location.
All data selection, search, and update functions take place at the workstation, thus requiring that entire files travel through the network for processing at the workstation. Such a requirement increases network traffic, slows response time, and increases communication costs.
[Figure 3.12: Multiple-site processing, single-site data. Workstations at Sites A, B, and C, each running its own TP, access a single file server (the DP) through a communication network.]
A variation of the multiple-site processing, single-site data approach is known as client/server architecture. Client/server architecture is similar to that of the network file server except that all database processing is done at the server site, thus reducing network traffic.
Although both the network file server and the client/server systems perform multiple-site processing, the latter's processing is distributed. Note that the network file server approach requires the database to be located at a single site. In contrast, the client/server architecture is capable of supporting data at multiple sites.
3.24.3 MULTIPLE-SITE PROCESSING, MULTIPLE-SITE DATA (MPMD)
The multiple-site processing, multiple-site data (MPMD) scenario describes a fully distributed DBMS with support for multiple data processors and transaction processors at multiple sites. Depending on the level of support for different types of centralized DBMSs, DDBMSs are classified as either homogeneous or heterogeneous. Homogeneous DDBMSs integrate only one type of centralized DBMS over a network. Thus, the same DBMS will be running on different mainframes, minicomputers, and microcomputers. In contrast, heterogeneous DDBMSs integrate different types of centralized DBMSs over a network. A fully heterogeneous DDBMS will support different DBMSs that may even support different data models (relational, hierarchical, or network) running under different computer systems, such as mainframes and microcomputers. However, such DDBMSs are still subject to certain restrictions. For example:
Remote access is provided on a read-only basis and does not support write privileges.
Restrictions are placed on the number of remote tables that may be accessed in a single transaction
Restrictions are placed on the number of distinct databases that may be accessed.
Restrictions are placed on the database model that may be accessed. Thus, access may be provided to relational databases but not to network or hierarchical database.
3.25 DISTRIBUTED DATABASE TRANSPARENCY FEATURES
A distributed database system requires functional characteristics that can be grouped and described as transparency features. DDBMS transparency features have the common property of allowing the end user to feel like the database's only user. In other words, the user believes that (s)he is working with a centralized DBMS; all complexities of a distributed database are hidden, or transparent, to the user.
The DDBMS transparency features are:
Distribution transparency, which allows a distributed database to be treated as a single logical database. If a DDBMS exhibits distribution transparency, the user does not need to know:
- That the data are partitioned.
- That the data can be replicated at several sites.
- The data location.
Transaction transparency, which allows a transaction to update data at several network sites. Transaction transparency ensures that the transaction will be either entirely completed or aborted, thus maintaining database integrity.
Failure transparency, which ensures that the system will continue to operate in the event of a node failure. Functions that were lost because of the failure will be picked up by another network node.
Performance transparency, which allows the system to perform as if it were a centralized DBMS. The system will not suffer any performance degradation due to its use on a network or due to the network's platform differences. Performance transparency also ensures that the system will find the most cost-effective path to access remote data.
Heterogeneity transparency, which allows the integration of several different local DBMSs (relational, network, and hierarchical) under a common, or global, schema. The DDBMS is responsible for translating the data requests from the global schema to the local DBMS schema.
3.26 DISTRIBUTION TRANSPARENCY
Distribution transparency allows a physically dispersed database to be managed as though it were a centralized database. The level of transparency supported by the DDBMS varies from system to system. Three levels of distribution transparency are recognized.
Fragmentation transparency is the highest level of transparency. The end user or programmer does not need to know that a database is partitioned. Therefore, neither fragment names nor fragment locations are specified prior to data access.
Location transparency exists when the end user or programmer must specify the database fragment names but does not need to specify where those fragments are located.
Local mapping transparency exists when the end user or programmer must specify both the fragment names and their locations.
Transparency features are summarized in Table 3.13.
A SUMMARY OF TRANSPARENCY FEATURES
IF THE SQL STATEMENT REQUIRES:
FRAGMENT NAME   LOCATION NAME   THEN THE DDBMS SUPPORTS        LEVEL OF DISTRIBUTION TRANSPARENCY
Yes             Yes             Local mapping transparency     Low
Yes             No              Location transparency          Medium
No              No              Fragmentation transparency     High
Distribution transparency is supported by a distributed data dictionary (DDD), or a distributed data catalog (DDC). The DDC contains the description of the entire database as seen by the database administrator. The database description, known as the distributed global schema, is the common database schema used by local TPs to translate user requests into subqueries (remote requests) that will be processed by different DPs. The DDC is itself distributed, and it is replicated at the network nodes. Therefore, the DDC must maintain consistency through updating at all sites.
3.27 TRANSACTION TRANSPARENCY Transaction transparency is a DDBMS property that ensures that database transactions will maintain the distributed database’s integrity and consistency. Remember that a DDBMS database transaction can update data stored in many different computers connected in a network. Transaction transparency ensures that the transaction will be completed only when all database sites involved in the transaction complete their part of the transaction.
3.27.1 Distributed Requests and Distributed Transactions
Whether or not a transaction is distributed, it is formed by one or more database requests. The basic difference between a nondistributed transaction and a distributed transaction is that the latter can update or request data from several different remote sites on a network. A remote request accesses data located at a single remote DP site; a remote transaction, composed of several requests, likewise accesses data at a single remote site. A distributed transaction allows a transaction to reference several different local or remote DP sites. Although each single request can reference only one local or remote DP site, the transaction as a whole can reference multiple DP sites because each request can reference a different site.
3.27.2 Distributed Concurrency Control
Concurrency control becomes especially important in the distributed database environment because multisite, multiple-process operations are more likely to create data inconsistencies and deadlocked transactions than single-site systems are. For example, the TP component of a DDBMS must ensure that all parts of the transaction are completed at all sites before a final COMMIT is issued to record the transaction.
3.27.3 Two-Phase Commit Protocol
Centralized databases require only one DP. All database operations take place at only one site, and the consequences of database operations are immediately known to the DBMS. In contrast, distributed databases make it possible for a transaction to access data at several sites. A final COMMIT must not be issued until all sites have committed their parts of the transaction. The two-phase commit protocol guarantees that if a portion of a transaction operation cannot be committed, all changes made at the other sites participating in the transaction will be undone to maintain a consistent database state. Each DP maintains its own transaction log. The two-phase commit protocol requires that the transaction entry log for each DP be written before the database fragment is actually updated. Therefore, the two-phase commit protocol requires a DO-UNDO-REDO protocol and a write-ahead protocol. The DO-UNDO-REDO protocol is used by the DP to roll back and/or roll forward transactions with the help of the system's transaction log entries. The DO-UNDO-REDO protocol defines three types of operations:
DO performs the operation and records the "before" and "after" values in the transaction log.
UNDO reverses an operation, using the log entries written by the DO portion of the sequence.
REDO redoes an operation, using the log entries written by the DO portion of the sequence.
To ensure that the DO, UNDO, and REDO operations can survive a system crash while they are being executed, a write-ahead protocol is used. The write-ahead protocol forces the log entry to be written to permanent storage before the actual operation takes place. The two-phase commit protocol defines the operations between two types of nodes: the coordinator and one or more subordinates, or cohorts. The participating nodes agree on a coordinator. Generally, the coordinator role is assigned to the node that initiates the transaction. However, different systems implement various, more sophisticated election methods. The protocol is implemented in two phases:
Phase 1: Preparation
1. The coordinator sends a PREPARE TO COMMIT message to all subordinates.
2. The subordinates receive the message, write the transaction log using the write-ahead protocol, and send an acknowledgment (YES/PREPARED TO COMMIT or NO/NOT PREPARED) message to the coordinator.
3. The coordinator makes sure that all nodes are ready to commit, or it aborts the action.
If all nodes are PREPARED TO COMMIT, the transaction goes to phase 2. If one or more nodes reply NO or NOT PREPARED, the coordinator broadcasts an ABORT message to all subordinates.
Phase 2: The Final Commit
1. The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies.
2. Each subordinate receives the COMMIT message, then updates the database using the DO protocol.
3. The subordinates reply with a COMMITTED or NOT COMMITTED message to the coordinator.
If one or more subordinates did not commit, the coordinator sends an ABORT message, thereby forcing them to UNDO all changes.
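The message exchange can be pictured in code. The following is a minimal Python sketch of the two phases, assuming invented Coordinator and Subordinate classes and an in-memory list standing in for each DP's transaction log; it illustrates the control flow only, not any particular DDBMS's implementation.

class Subordinate:
    def __init__(self, name):
        self.name = name
        self.log = []                    # stands in for the DP's transaction log

    def prepare(self, txn):
        self.log.append(("PREPARED", txn))   # write-ahead: log before updating data
        return True                          # YES / PREPARED TO COMMIT

    def commit(self, txn):
        self.log.append(("COMMITTED", txn))  # DO: apply the update
        return True

    def abort(self, txn):
        self.log.append(("ABORTED", txn))    # UNDO via the log entries

class Coordinator:
    def __init__(self, subordinates):
        self.subordinates = subordinates

    def run(self, txn):
        # Phase 1: preparation
        if not all(s.prepare(txn) for s in self.subordinates):
            for s in self.subordinates:
                s.abort(txn)
            return "ABORTED"
        # Phase 2: final commit
        if all(s.commit(txn) for s in self.subordinates):
            return "COMMITTED"
        for s in self.subordinates:
            s.abort(txn)
        return "ABORTED"

coordinator = Coordinator([Subordinate("site1"), Subordinate("site2")])
print(coordinator.run("T1"))   # COMMITTED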
The objective of the two-phase commit is to ensure that all nodes commit their part of the transaction; otherwise, the transaction is aborted. If one of the nodes fails to commit, the information necessary to recover the database is in the transaction log, and the database can be recovered with the DO-UNDO-REDO protocol.
3.28 PERFORMANCE TRANSPARENCY AND QUERY OPTIMIZATION
The DDBMS makes it possible to partition a database into several fragments, thereby rendering the query translation more complicated because the DDBMS must decide which fragment of the database to access. In addition, the data may also be replicated at several different sites. Data replication makes the access problem even more complex because the DDBMS must decide which copy of the data to access. The DDBMS uses query optimization techniques to deal with such problems and to ensure acceptable database performance.
The objective of a query optimization routine is to minimize the total cost associated with the execution of a request. The cost associated with a request is a function of the following (see the sketch after this list):
Access time (I/O) cost involved in accessing the physical data stored on disk.
Communication cost associated with the transmission of data among nodes in distributed database systems.
CPU time cost associated with the processing overhead of managing distributed transactions.
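As a rough illustration, the three components can be folded into a single cost figure. The sketch below is hypothetical; the function name request_cost and all timing constants are invented for illustration, not taken from the text.

def request_cost(io_blocks, bytes_sent, cpu_ops,
                 t_io=0.01, t_net=0.0001, t_cpu=1e-7):
    access = io_blocks * t_io     # access time (I/O) cost
    comm = bytes_sent * t_net     # communication cost between nodes
    cpu = cpu_ops * t_cpu         # CPU processing overhead
    return access + comm + cpu

print(request_cost(io_blocks=500, bytes_sent=200_000, cpu_ops=1_000_000))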
One of the most important characteristics of query optimization in distributed database systems is that it must provide distribution transparency as well as replica transparency. Replica transparency refers to the DDBMS's ability to hide the existence of multiple copies of data from the user.
Most of the algorithms proposed for query optimization are based on two principles:
The selection of the optimum execution order
The selection of sites to be accessed to minimize communication costs.
Within those two principles, a query optimization algorithm can be evaluated on the basis of its operation mode or the timing of its optimization.
Operation modes can be classified as manual or automatic. Automatic query optimization means that the DDBMS finds the most cost-effective access path without user intervention. Manual query optimization requires that the optimization be selected and scheduled by the end user or programmer. Automatic query optimization is clearly more desirable
from the end user's point of view, but the cost of such convenience is the increased overhead that it imposes on the DDBMS.
Query optimization algorithms can also be classified according to when the optimization is done. Within this timing classification, query optimization algorithms can be classified as static or dynamic.
Static query optimization takes place at compilation time.
Dynamic query optimization takes place at execution time.
Finally, query optimization techniques can be classified according to the type of information that is used to optimize the query.
A statistically based query optimization algorithm uses statistical information about the database. The statistics provide information about database characteristics such as size, number of records, average access time, number of requests serviced, and number of users with access rights. These statistics are then used by the DDBMS to determine the best access strategy.
The statistical information is managed by the DDBMS and is generated in one of two different modes: dynamic or manual. In the dynamic statistical generation mode, the DDBMS automatically evaluates and updates the statistics after each access. In the manual statistical generation mode, the statistics must be updated periodically through a user-selected utility.
A rule-based query optimization algorithm is based on a set of user-defined rules to determine the best query access strategy. The rules are entered by the end user or database administrator, and they typically are very general in nature.
3.29 DISTRIBUTED DATABASE DESIGN
The design of a distributed database introduces three new issues:
How to partition the database into fragments.
Which fragments to replicate.
Where to locate those fragments and replicas.
Data fragmentation and data replication deal with the first two issues, and data allocation deals with the third issue.
3.29.1 Data Fragmentation
Data fragmentation allows you to break a single object into two or more segments, or fragments. The object might be a user's database, a system database, or a table. Each fragment can be stored at any site over a computer network. Information about data fragmentation is stored
in the distributed data catalog (DDC) from which it is accessed by the TP to process user requests.
Three types of data fragmentation strategies are horizontal, vertical, and mixed.
Horizontal fragmentation refers to the division of a relation into subsets of tuples. Each fragment is stored at a different node, and each fragment has unique rows. However, the unique rows all have the same attributes. This is the equivalent of a SQL SELECT statement with a WHERE clause on a single attribute.
Vertical fragmentation refers to the division of a relation into attribute subsets. Each subset is stored at a different node, and each fragment has unique columns, with the exception of the key column, which is common to all fragments. This is the equivalent of the PROJECT operation in relational algebra.
Mixed fragmentation refers to a combination of horizontal and vertical strategies. In other words, a table may be divided into several horizontal subsets (rows), each one having a subset of the attributes (columns).
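The three strategies can be illustrated on a small in-memory table. In the following Python sketch, the customers table, its columns, and the fragment names are all invented; each comprehension stands in for the SELECT- or PROJECT-style operation described above.

customers = [
    {"cus_num": 10, "cus_name": "Ramas", "cus_state": "TN"},
    {"cus_num": 11, "cus_name": "Dunne", "cus_state": "KL"},
]

# Horizontal: a subset of rows, like SELECT ... WHERE cus_state = 'TN'
fragment_tn = [row for row in customers if row["cus_state"] == "TN"]

# Vertical: a subset of columns, with the key column kept in every fragment
fragment_names = [{"cus_num": r["cus_num"], "cus_name": r["cus_name"]}
                  for r in customers]

# Mixed: a row subset, then a column subset of it
fragment_mixed = [{"cus_num": r["cus_num"], "cus_name": r["cus_name"]}
                  for r in customers if r["cus_state"] == "TN"]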
3.29.2 Data Replication
Data replication refers to the storage of data copies at multiple sites served by a computer network. Fragment copies can be stored at several sites to serve specific information requirements. Because the existence of fragment copies can enhance data availability and response time, data copies can help to reduce communication and local query costs.
Replicated data are subject to the mutual consistency rule. The mutual consistency rule requires that all copies of data fragments be identical. Therefore, to maintain data consistency among data replicas, the DDBMS must ensure that a database update is performed at all sites where replicas exist.
Three replication scenarios exist: a database can be fully replicated, partially replicated, or unreplicated.
A fully replicated database stores multiple copies of each database fragment at multiple sites. In this case, all database fragments are replicated. A fully replicated database can be impractical due to the amount of overhead it imposes on the system.
A partially replicated database stores multiple copies of some database fragments at multiple sites. Most DDBMSs are able to handle the partially replicated database well.
An unreplicated database stores each database fragment at a single site. Therefore, there are no duplicate database fragments.
Several factors influence the decision to use data replication.
Database size.
Usage frequency.
Costs (performance, software overhead, and management) associated with synchronizing transactions and their components vs. the fault-tolerance benefits that are associated with replicated data.
3.29.3 Data Allocation
Data allocation describes the process of deciding where to locate data. Data allocation strategies are as follows:
With centralized data allocation, the entire database is stored at one site.
With partitioned data allocation, the database is divided into several disjoint parts (fragments) and stored at several sites.
With replicated data allocation, copies of one or more database fragments are stored at several sites.
Data allocation algorithms take into consideration a variety of factors, including
Performance and data availability goals.
Size, number of rows, and number of relations that an entity maintains with other entities.
Types of transactions to be applied to the database and the attributes accessed by each of those transactions.
3.30 CLIENT/SERVER VS. DDBMS
Client/server architecture refers to the way in which computers interact to form a system. The client/server architecture features a user of resources, or a client, and a provider of resources, or a server. The client/server architecture can be used to implement a DDBMS in which the client is the TP and the server is the DP. Client/server interactions in a DDBMS are carefully scripted. The client (TP) interacts with the end user and sends a request to the server (DP). The server receives, schedules, and executes the request, selecting only those records that are needed by the client. The server then sends the data to the client only when the client requests the data.
Client/server applications offer several advantages:
Client/server solutions tend to be less expensive than alternate minicomputer or mainframe solutions.
Client/server solutions allow the end user to use the microcomputer’s GUI, thereby improving functionality and simplicity.
More people in the job market have PC skills than mainframe skills.
The PC is well established in the workplace.
Numerous data analysis and query tools exist to facilitate interaction with many of the DBMSs that are available in the PC market.
There is a considerable cost advantage to offloading application development from the mainframe to powerful PCs.
Client/server applications are also subject to some disadvantages:
The client/server architecture creates a more complex environment in which different platforms (LANs, operating systems, and so on) are often difficult to manage.
An increase in the number of users and processing sites often paves the way for security problems.
The client/server environment makes it possible to spread data access to a much wider circle of users. Such an environment increases the demand for people with a broad knowledge of computers and software applications. The burden of training increases the cost of maintaining the environment.
3.31 C.J. DATE'S TWELVE COMMANDMENTS FOR DISTRIBUTED DATABASES
Date's commandments describe a fully distributed database, and although no current DDBMS conforms to all of them, the rules do constitute a useful distributed database target. The twelve rules are as follows:
1. Local site independence. Each local site can act as an independent, autonomous, centralized DBMS. Each site is responsible for security, concurrency control, backup, and recovery.
2. Central site independence. No site in the network relies on a central site or any other site. All sites have the same capabilities.
3. Failure independence. The system is not affected by node failures. The system is in continuous operation even in the case of a node failure or an expansion of the network.
4. Location transparency. The user does not need to know the location of the data in order to retrieve those data.
5. Fragmentation transparency. The user sees only one logical database. Data fragmentation is transparent to the user. The user does not need to know the names of the database fragments in order to retrieve them.
6. Replication transparency. The user sees only one logical database. The DDBMS transparently selects the database fragment to access; to the user, the DDBMS manages all fragments transparently.
7. Distributed query processing. A distributed query may be executed at several different DP sites. Query optimization is performed transparently by the DDBMS.
8. Distributed transaction processing. A transaction may update data at several different sites. The transaction is transparently executed at several different sites.
9. Hardware independence. The system must run on any hardware platform.
10. Operating system independence. The system must run on any operating system platform.
11. Network independence. The system must run on any network platform.
12. Database independence. The system must support any vendor's database product.
UNIT 4 QUERY PROCESSING
4.1 OVERVIEW
Query processing refers to the range of activities involved in extracting data from a database. The activities include translation of queries in high-level database languages into expressions that can be used at the physical level of the file system, a variety of query-optimizing transformations, and actual evaluation of queries. The steps involved in processing a query appear in Figure 4.1. The steps are:
Parsing and translation
Optimization
Evaluation
Before query processing can begin, the system must translate the query into a usable form. A language such as SQL is suitable for human use, but is ill suited to be the system's internal representation of a query. A more useful internal representation is one based on the extended relational algebra. Thus, the first action the system must take in query processing is to translate a given query into its internal form. In generating the internal form of the query, the parser checks the syntax of the user's query, verifies that the relation names appearing in the query are names of relations in the database, and so on. The system constructs a parse-tree representation of the query, which it then translates into a relational-algebra expression.
As an illustration, consider the query

select balance from account where balance < 2500

This query can be translated into either of the following relational-algebra expressions:

σ_balance<2500 (Π_balance (account))

Π_balance (σ_balance<2500 (account))

[Query-evaluation plan: Π_balance above σ_balance<2500 (annotated "use index 1") applied to account.]
A relational-algebra operation annotated with instructions on how to evaluate it is called an evaluation primitive. A sequence of primitive operations that can be used to evaluate a query is a query-execution plan or query-evaluation plan. The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query. Once the query plan is chosen, the query is evaluated with that plan, and the result of the query is output. In order to optimize a query, the optimizer must know the cost of each operation. Although the exact cost is hard to compute, since it depends on many parameters such as the actual memory available to the operation, it is possible to get a rough estimate of execution cost for each operation.
[Figure 4.1: Steps in query processing. A query passes through a parser and translator to produce a relational-algebra expression; the optimizer, consulting statistics about the data, produces an execution plan; the evaluation engine executes the plan and produces the query output.]
4.2 MEASURES OF QUERY COST
The cost of query evaluation can be measured in terms of a number of different resources, including disk accesses, CPU time to execute a query, and, in a distributed or parallel
database system, the cost of communication. The response time for a query-evaluation plan (that is, the clock time required to execute the plan), assuming no other activity is going on on the
computer, would account for all these costs, and could be used as a good measure of the cost of the plan. In large database systems, however, disk accesses (which we measure as the number of transfers of blocks from disk) are usually the most important cost, since disk accesses are slow compared to in-memory operations. Finally, estimating the CPU time is relatively hard compared to estimating the disk-access cost. Therefore, most people consider the disk-access cost a reasonable measure of the cost of a query-evaluation plan.
We use the number of block transfers from disk as a measure of the actual cost. To get more precise numbers, we need to distinguish between sequential I/O, where the blocks read are contiguous on disk, and random I/O, where the blocks are noncontiguous and an extra seek cost must be paid for each disk I/O operation. We also need to distinguish between reads and writes of blocks, since it takes more time to write a block to disk than to read a block from disk. A more accurate measure would therefore estimate:
1. The number of seek operations performed.
2. The number of blocks read.
3. The number of blocks written.
It would then add up these numbers after multiplying them by the average seek time, the average transfer time for reading a block, and the average transfer time for writing a block, respectively. The cost estimates we give ignore the cost of writing the final result of an operation back to disk; these are taken into account separately where required.
The costs of all the algorithms that we consider depend on the size of the buffer in main memory. In the best case, all data can be read into the buffers, and the disk does not need to be accessed again. In the worst case, we assume that the buffer can hold only a few blocks of data, approximately one block per relation. When presenting cost estimates, we generally assume the worst case.
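The more accurate measure just described can be written as a one-line formula. In the hypothetical sketch below, the function disk_cost and the timing constants are invented for illustration.

def disk_cost(seeks, blocks_read, blocks_written,
              avg_seek=0.004, t_read=0.0002, t_write=0.0004):
    # total = seeks * avg seek time
    #       + blocks read * avg read-transfer time
    #       + blocks written * avg write-transfer time
    return (seeks * avg_seek
            + blocks_read * t_read
            + blocks_written * t_write)

print(disk_cost(seeks=10, blocks_read=1000, blocks_written=0))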
4.3 SELECTION OPERATION
In query processing, the file scan is the lowest-level operator to access data. File scans are search algorithms that locate and retrieve records that fulfill a selection condition. In relational systems, a file scan allows an entire relation to be read in those cases where the relation is stored in a single, dedicated file.
4.3.1 Basic Algorithms
A1 (linear search). In a linear search, the system scans each file block and tests all records to see whether they satisfy the selection condition. For a selection on a key attribute, the system can terminate the scan if the required record is found, without looking at the other records of the relation. The cost of linear search, in terms of number of I/O operations, is br, where br denotes the number of blocks in the file. Although it may
be slower than other algorithms for implementing selection, the linear search algorithm can be applied to any file, regardless of the ordering of the file, the availability of indices, or the nature of the selection operation.
A2 (binary search). If the file is ordered on an attribute, and the selection condition is an equality comparison on the attribute, we can use a binary search to locate records that satisfy the selection. The system performs the binary search on the blocks of the file.
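A1 can be sketched directly. In the following Python illustration, a file is modeled as a list of blocks, each block a list of records (dicts); all names are invented, and the early return models the key-attribute termination described above.

def linear_search(blocks, attr, value, is_key=False):
    result = []
    for block in blocks:               # one I/O per block: br blocks in total
        for record in block:
            if record[attr] == value:
                result.append(record)
                if is_key:             # key attribute: at most one match
                    return result
    return result

blocks = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
print(linear_search(blocks, "id", 3, is_key=True))   # [{'id': 3}]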
4.3.2 Selections Using Indices
Index structures are referred to as access paths, since they provide a path through which data can be located and accessed. A primary index is an index that allows the records of a file to be read in an order that corresponds to the physical order in the file. An index that is not a primary index is called a secondary index. Search algorithms that use an index are referred to as index scans. They are:
A3 (Primary index, equality on key)
A4 (Primary index, equality on nonkey).
A5 (secondary index, equality).
4.3.3 Selections Involving Comparisons
Selections involving comparisons are of two types:
A6 (Primary index, comparison)
A7 (Secondary index, comparison). The secondary index provides pointers to the records, but to get the actual records we have to fetch them by using the pointers. Therefore, the secondary index should be used only if very few records are selected.
4.3.4 Implementation of Complex Selections
So far, we have considered only simple selection conditions of the form A op B, where op is an equality or comparison operation. We now consider more complex selection predicates.
Conjunction: A conjunctive selection is a selection of the form σ_θ1∧θ2∧…∧θn(r).
Disjunction: A disjunctive selection is a selection of the form σ_θ1∨θ2∨…∨θn(r). A disjunctive condition is satisfied by the union of all records satisfying the individual, simple conditions θi.
FOR MORE DETAILS VISIT US ON WWW.IMTSINSTITUTE.COM OR CALL ON +91-9999554621
DATABASE MANAGEMENT SYSTEM
105
Negation: The result of a selection σ_¬θ(r) is the set of tuples of r for which the condition θ evaluates to false. In the absence of nulls, this set is simply the set of tuples that are not in σ_θ(r).
We can implement a selection operation involving either a conjunction or a disjunction of simple conditions by using one of the following algorithms.
A8 (Conjunctive selection using one index).
A9 (Conjunctive selection using composite index).
A10 (Conjunctive selection by intersection of identifiers).
A11 (Disjunctive selection by union of identifiers).
4.4 SORTING
Sorting of data plays an important role in database systems for two reasons. First, SQL queries can specify that the output be sorted. Second, and equally important for query processing, several of the relational operations, such as joins, can be implemented efficiently if the input relations are first sorted. We can sort a relation by building an index on the sort key and then using that index to read the relation in sorted order. However, such a process orders the relation only logically, through an index, rather than physically. Hence, the reading of tuples in the sorted order may lead to a disk access for each record, which can be very expensive, since the number of records can be much larger than the number of blocks. For this reason, it may be desirable to order the records physically.
The problem of sorting has been studied extensively, both for relations that fit entirely in main memory and for relations that are bigger than memory. In the first case, standard sorting techniques such as quicksort can be used. Sorting of relations that do not fit in memory is called external sorting. The most commonly used technique for external sorting is the external sort-merge algorithm:
1. In the first stage, a number of sorted runs are created; each run is sorted, but contains only some of the records of the relation.
2. In the second stage, the runs are merged.
The output of the merge stage is the sorted relation. The output file is buffered to reduce the number of disk write operations. The preceding merge operation is a generalization of the two-way merge used by the standard in-memory sort-merge algorithm; it merges N runs, so it is called an N-way merge.
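A compact way to picture the external sort-merge is the following Python sketch, which creates runs that fit in an artificially tiny "memory" and merges them with an N-way merge; the memory limit and data are invented for illustration.

import heapq

def external_sort(records, key, memory=3):
    runs = []
    for i in range(0, len(records), memory):          # stage 1: sorted runs
        runs.append(sorted(records[i:i + memory], key=key))
    return list(heapq.merge(*runs, key=key))          # stage 2: N-way merge

print(external_sort([7, 2, 9, 4, 1, 8, 3], key=lambda x: x))
# [1, 2, 3, 4, 7, 8, 9]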
4.5 JOIN OPERATION
We use the term equi-join to refer to a join of the form r ⋈_r.A=s.B s, where A and B are attributes or sets of attributes of relations r and s, respectively.
4.5.1 Nested-Loop Join
This algorithm is called the nested-loop join algorithm, since it basically consists of a pair of nested for loops. Relation r is called the outer relation and relation s the inner relation of the join, since the loop for r encloses the loop for s. Like the linear file-scan algorithm for selection, the nested-loop join algorithm requires no indices, and it can be used regardless of what the join condition is. The nested-loop join algorithm is expensive, since it examines every pair of tuples in the two relations.
4.5.2 Block Nested-Loop Join
If the buffer is too small to hold either relation entirely in memory, we can still obtain a major saving in block accesses if we process the relations on a per-block basis, rather than on a per-tuple basis. Block nested-loop join is a variant of the nested-loop join in which every block of the inner relation is paired with every block of the outer relation. Within each pair of blocks, every tuple in one block is paired with every tuple in the other block, to generate all pairs of tuples. The primary difference in cost between the block nested-loop join and the basic nested-loop join is that, in the worst case, each block in the inner relation s is read only once for each block in the outer relation, instead of once for each tuple in the outer relation.
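The tuple-at-a-time variant can be sketched as the pair of nested loops the name suggests. In this illustration, relations are lists of dicts and the join is an equi-join on the named attributes; all names are invented.

def nested_loop_join(r, s, attr_r, attr_s):
    result = []
    for tr in r:                 # outer relation
        for ts in s:             # inner relation, rescanned for each tuple of r
            if tr[attr_r] == ts[attr_s]:
                result.append({**tr, **ts})   # concatenate matching tuples
    return result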
4.5.3 Indexed Nested-Loop Join
In a nested-loop join, if an index is available on the inner relation's join attribute, index lookups can replace file scans. For each tuple tr in the outer relation r, the index is used to look up tuples in s that will satisfy the join condition with tuple tr. This join method is called an indexed nested-loop join; it can be used with existing indices, as well as with temporary indices created for the sole purpose of evaluating the join. The cost is lower than for a block nested-loop join.
4.5.4 Merge Join
The merge join algorithm (also called the sort-merge join algorithm) can be used to compute natural joins and equi-joins. Let r(R) and s(S) be the relations whose natural join is to be computed, and let R ∩ S denote their common attributes. Suppose that both relations are sorted on the attributes R ∩ S. Then, their join can be computed by a process much like the merge stage in the merge-sort algorithm.
4.5.5 Hash Join
Like the merge join algorithm, the hash join algorithm can be used to implement natural joins and equi-joins. In the hash join algorithm, a hash function h is used to partition tuples of both relations. The basic idea is to partition the tuples of each of the relations into sets that have the same hash value on the join attributes.
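A minimal in-memory version of this idea (one build phase, one probe phase, with no partitioning to disk) might look as follows; the relation layout and names are invented for illustration.

from collections import defaultdict

def hash_join(r, s, attr_r, attr_s):
    buckets = defaultdict(list)
    for ts in s:                       # build phase: hash s on its join attribute
        buckets[ts[attr_s]].append(ts)
    result = []
    for tr in r:                       # probe phase: look up each tuple of r
        for ts in buckets.get(tr[attr_r], []):
            result.append({**tr, **ts})
    return result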
4.6 OTHER OPERATIONS
Other relational operations and extended relational operations, such as duplicate elimination, projection, set operations, outer join, and aggregation, can be implemented as outlined below.
4.6.1 Duplicate Elimination
We can implement duplicate elimination easily by sorting. Identical tuples will appear adjacent to each other during sorting, and all but one copy can be removed. We can also implement duplicate elimination by hashing, as in the hash join algorithm. Because of the relatively high cost of duplicate elimination, SQL requires an explicit request by the user to remove duplicates; otherwise, the duplicates are retained.
4.6.2 Projection
We can implement projection easily by performing projection on each tuple, which gives a relation that could have duplicate records, and then removing duplicate records. Duplicates can be eliminated by the methods described above. If the attributes in the projection list include a key of the relation, no duplicates will exist; hence, duplicate elimination is not required. Generalized projection can be implemented in the same way as projection.
4.6.3 Set Operations
We can implement the union, intersection, and set-difference operations by first sorting both relations, and then scanning once through each of the sorted relations to produce the result.
4.7 EVALUATION OF EXPRESSIONS
The obvious way to evaluate an expression is simply to evaluate one operation at a time, in an appropriate order. The result of each evaluation is materialized in a temporary relation for subsequent use. A disadvantage of this approach is the need to construct the temporary relations, which (unless they are small) must be written to disk. An alternative approach is to evaluate several operations simultaneously in a pipeline, with the results of one operation passed on to the next, without the need to store a temporary relation.
4.7.1 Materialization
If we apply the materialization approach, we start from the lowest-level operations in the expression (at the bottom of the tree). By repeating the process, we eventually evaluate the operation at the root of the tree, giving the final result of the expression. Evaluation as just described is called materialized evaluation, since the results of each intermediate operation are created (materialized) and then used for evaluation of the next-level operations. The cost of a materialized evaluation is not simply the sum of the costs of the operations involved. When we computed the cost estimates of algorithms, we ignored the cost of writing the result of the operation to disk. To compute the cost of evaluating an expression as done here, we have to add the costs of all the operations, as well as the cost of writing the intermediate results to disk.
4.7.2 Pipelining
We can improve query-evaluation efficiency by reducing the number of temporary files that are produced. We achieve this reduction by combining several relational operations into a pipeline of operations, in which the results of one operation are passed along to the next operation in the pipeline. Evaluation as just described is called pipelined evaluation. Combining operations into a pipeline eliminates the cost of reading and writing temporary relations.
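In Python terms, materialization corresponds to building full intermediate lists, while pipelining corresponds to a generator that passes one tuple at a time to the next operation. The sketch below, with an invented accounts relation, shows both forms of the selection-then-projection query used earlier.

accounts = [{"acct": i, "balance": b} for i, b in enumerate([900, 3000, 1500])]

# Materialized: each intermediate result is a full temporary list
selected = [t for t in accounts if t["balance"] < 2500]
projected = [t["balance"] for t in selected]

# Pipelined: each tuple flows through both operations on demand,
# with no temporary relation in between
pipeline = (t["balance"] for t in accounts if t["balance"] < 2500)

print(projected, list(pipeline))   # same result either way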
QUERY OPTIMIZATION
4.8 OVERVIEW
Query optimization is the process of selecting the most efficient query-evaluation plan from among the many strategies usually possible for processing a given query, especially if the query is complex. Consider the relational-algebra expression for the query "Find the names of all customers who have an account at any branch located in Brooklyn":
Π_customer_name (σ_branch_city = "Brooklyn" (branch ⋈ (account ⋈ depositor)))
This expression constructs a large intermediate relation, branch ⋈ account ⋈ depositor. By reducing the number of tuples of the branch relation that we need to access, we reduce the size of the intermediate result. Our query is now represented by the relational-algebra expression.
Π_customer_name ((σ_branch_city = "Brooklyn" (branch)) ⋈ (account ⋈ depositor))
which is equivalent to our original algebra expression, but which generates smaller intermediate relations. Given a relational-algebra expression, it is the job of the query optimizer to come up with a query-evaluation plan that computes the same result as the given expression, and is the least costly way of generating the result. Generation of query-evaluation plans involves three steps: (1) generating expressions that are logically equivalent to the given expression, (2) estimating the cost of each evaluation plan, and (3) annotating the resultant expressions in alternative ways to generate alternative query-evaluation plans. Steps (1) and (3) are interleaved in the query optimizer: some expressions are generated and annotated, then further expressions are generated and annotated, and so on. Step (2) is done in the background by collecting statistical information about the relations, such as relation sizes and index depths, to make a good estimate of the cost of a plan. To implement the first step, the query optimizer must generate expressions equivalent to a given expression. It does so by means of equivalence rules that specify how to transform an expression into a logically equivalent one.
4.9 TRANSFORMATION OF RELATIONAL EXPRESSIONS
A query can be expressed in several different ways, with different costs of evaluation. Two relational-algebra expressions are said to be equivalent if, on every legal database instance, the two expressions generate the same set of tuples. (Recall that a legal database instance is one that satisfies all the integrity constraints specified in the database schema.) Note that the order of the tuples is irrelevant; the two expressions may generate the tuples in different orders, but would be considered equivalent as long as the set of tuples is the same.
4.9.1 Equivalence Rules
An equivalence rule says that expressions of two forms are equivalent. We can replace an expression of the first form by an expression of the second form, or vice versa, since the two expressions would generate the same result on any valid database. The optimizer uses equivalence rules to transform expressions into other logically equivalent expressions.
We now list a number of general equivalence rules on relational-algebra expressions. We use θ, θ1, θ2, and so on to denote predicates, L1, L2, L3, and so on to denote lists of attributes, and E1, E2, E3, and so on to denote relational-algebra expressions. A relation name r is simply a special case of a relational-algebra expression, and can be used wherever E appears.
1. Conjunctive selection operations can be deconstructed into a sequence of individual selections. This transformation is referred to as a cascade of σ.
σ_θ1∧θ2(E) = σ_θ1(σ_θ2(E))
2. Selection operations are commutative.
σ_θ1(σ_θ2(E)) = σ_θ2(σ_θ1(E))
3. Only the final operation in a sequence of projection operations is needed; the others can be omitted. This transformation can also be referred to as a cascade of Π.
Π_L1(Π_L2(…(Π_Ln(E))…)) = Π_L1(E)
4. Selections can be combined with Cartesian products and theta joins.
a. σ_θ(E1 × E2) = E1 ⋈_θ E2
This expression is just the definition of the theta join.
b. σ_θ1(E1 ⋈_θ2 E2) = E1 ⋈_θ1∧θ2 E2
5. Theta-join operations are commutative.
E1 ⋈_θ E2 = E2 ⋈_θ E1
Actually, the order of attributes differs between the left-hand side and the right-hand side, so the equivalence does not hold if the order of attributes is taken into account. A projection operation can be added to one of the sides of the equivalence to appropriately reorder attributes, but for simplicity we omit the projection and ignore the attribute order in most of our examples. Recall that the natural-join operator is simply a special case of the theta-join operator; hence, natural joins are also commutative.
6. a. Natural-join operations are associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
b. Theta joins are associative in the following manner:
(E1 ⋈_θ1 E2) ⋈_θ2∧θ3 E3 = E1 ⋈_θ1∧θ3 (E2 ⋈_θ2 E3)
where θ2 involves attributes from only E2 and E3. Any of these conditions may be empty; hence, it follows that the Cartesian product (×) operation is also associative. The commutativity and associativity of join operations are important for join reordering in query optimization.
7. The selection operation distributes over the theta-join operation under the following two conditions.
a. It distributes when all the attributes in selection condition θ0 involve only the attributes of one of the expressions (say, E1) being joined.
σ_θ0(E1 ⋈_θ E2) = (σ_θ0(E1)) ⋈_θ E2
b. It distributes when selection condition θ1 involves only the attributes of E1 and θ2 involves only the attributes of E2.
σ_θ1∧θ2(E1 ⋈_θ E2) = (σ_θ1(E1)) ⋈_θ (σ_θ2(E2))
8. The projection operation distributes over the theta-join operation under the following conditions.
a. Let L1 and L2 be attributes of E1 and E2, respectively. Suppose that the join condition θ involves only attributes in L1 ∪ L2. Then
Π_L1∪L2 (E1 ⋈_θ E2) = (Π_L1(E1)) ⋈_θ (Π_L2(E2))
b. Consider a join E1 ⋈_θ E2. Let L1 and L2 be sets of attributes from E1 and E2, respectively. Let L3 be attributes of E1 that are involved in join condition θ but are not in L1 ∪ L2, and let L4 be attributes of E2 that are involved in join condition θ but are not in L1 ∪ L2. Then
Π_L1∪L2 (E1 ⋈_θ E2) = Π_L1∪L2 ((Π_L1∪L3(E1)) ⋈_θ (Π_L2∪L4(E2)))
9. The set operations union and intersection are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Set difference is not commutative.
10. Set union and intersection are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
11. The selection operation distributes over the union, intersection, and set-difference operations.
σ_P(E1 − E2) = σ_P(E1) − σ_P(E2)
Similarly, the preceding equivalence, with − replaced by either ∪ or ∩, also holds. Further,
σ_P(E1 − E2) = σ_P(E1) − E2
The preceding equivalence, with − replaced by ∩, also holds, but does not hold if − is replaced by ∪.
12. The projection operation distributes over the union operation.
Π_L(E1 ∪ E2) = (Π_L(E1)) ∪ (Π_L(E2))
4.10 ESTIMATING STATISTICS OF EXPRESSION RESULTS
The cost of an operation depends on the size and other statistics of its inputs. Given an expression such as a ⋈ (b ⋈ c), to estimate the cost of joining a with (b ⋈ c), we need to have estimates of statistics such as the size of b ⋈ c.
4.10.1 Catalog Information
The DBMS catalog stores the following statistical information about database relations:
nr, the number of tuples in the relation r.
br, the number of blocks containing tuples of relation r.
lr, the size of a tuple of relation r in bytes.
fr, the blocking factor of relation r, that is, the number of tuples of relation r that fit into one block.
V(A, r), the number of distinct values that appear in the relation r for attribute A. This value is the same as the size of Π_A(r). If A is a key for relation r, V(A, r) is nr.
The last statistic, V(A, r), can also be maintained for sets of attributes, if desired, instead of just for individual attributes. Thus, given a set of attributes A, V(A, r) is the size of Π_A(r). If we assume that the tuples of relation r are stored together physically in a file, the following equation holds:
br = ⌈nr / fr⌉
Statistics about indices, such as the heights of B+-tree indices and the number of leaf pages in the indices, are also maintained in the catalog.
4.10.2 Selection Size Estimation
The size estimate of the result of a selection operation depends on the selection predicate. We first consider a single equality predicate, then a single comparison predicate, and finally combinations of predicates.
σ_A=a(r): a single equality predicate.
σ_A≤v(r): a single comparison predicate.
Complex selections:
1. Conjunction: A conjunctive selection is a selection of the form σ_θ1∧θ2∧…∧θn(r).
2. Disjunction: A disjunctive selection is a selection of the form σ_θ1∨θ2∨…∨θn(r).
3. Negation: In the absence of nulls, the result of a selection σ_¬θ(r) is the set of tuples of r that are not in σ_θ(r).
4.10.3 Join Size Estimation
In this section, we see how to estimate the size of the result of a join. The Cartesian product r × s contains nr * ns tuples. Each tuple of r × s occupies lr + ls bytes, from which we can calculate the size of the Cartesian product. Estimating the size of a natural join is somewhat more complicated than estimating the size of a selection or of a Cartesian product. Let r(R) and s(S) be relations.
If R ∩ S = ∅, that is, the relations have no attribute in common, then r ⋈ s is the same as r × s, and we can use our estimation technique for Cartesian products.
If R ∩ S is a key for R, then we know that a tuple of s will join with at most one tuple from r. Therefore, the number of tuples in r ⋈ s is no greater than the number of tuples in s. The case where R ∩ S is a key for S is symmetric to the case just described. If R ∩ S forms a foreign key of S, referencing R, the number of tuples in r ⋈ s is exactly the same as the number of tuples in s.
The most difficult case is when R ∩ S is a key for neither R nor S. In this case, we assume, as we did for selections, that each value appears with equal probability. Consider a tuple t of r, and assume R ∩ S = {A}. We estimate that tuple t produces
ns / V(A, s)
tuples in r ⋈ s, since this number is the average number of tuples in s with a given value for the attribute A. Considering all the tuples in r, we estimate that there are
nr * ns / V(A, s)
tuples in r ⋈ s. Observe that, if we reverse the roles of r and s in the preceding estimate, we obtain an estimate of
nr * ns / V(A, r)
tuples in r ⋈ s. These two estimates differ if V(A, r) ≠ V(A, s). If this situation occurs, there are likely to be dangling tuples that do not participate in the join. Thus, the lower of the two estimates is probably the more accurate one.
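A small worked example of these estimates, with invented catalog numbers:

n_r, n_s = 10_000, 5_000       # tuples in r and s
v_a_r, v_a_s = 50, 200         # V(A, r) and V(A, s): distinct A-values

est1 = n_r * n_s / v_a_s       # estimate considering the tuples of r: 250,000
est2 = n_r * n_s / v_a_r       # estimate with the roles reversed: 1,000,000
print(min(est1, est2))         # the lower estimate is usually more accurate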
4.11 CHOICE OF EVALUATION PLANS
Generation of expressions is only part of the query-optimization process, since each operation in the expression can be implemented with different algorithms. An evaluation plan is therefore needed to define exactly what algorithm should be used for each operation, and how the execution of the operations should be coordinated.
4.11.1 Interaction of Evaluation Techniques
One way to choose an evaluation plan for a query expression is simply to choose for each operation the cheapest algorithm for evaluating it. We can choose any ordering of the operations that ensures that operations lower in the tree are executed before operations higher in the tree. However, choosing the cheapest algorithm for each operation independently is not necessarily a good idea. Given an evaluation plan, we can estimate its cost using statistics estimated by the techniques just described, coupled with cost estimates for various algorithms and evaluation methods. Depending on the indices available, certain selection operations can be evaluated using only an index, without accessing the relation itself. That still leaves the problem of choosing the best evaluation plan for a query. There are two broad approaches: the first searches all the plans and chooses the best plan in a cost-based fashion; the second uses heuristics to choose a plan. Practical query optimizers incorporate elements of both approaches.
DISTRIBUTED DATABASES
4.12 INTRODUCTION
A distributed database is a database physically stored in two or more computer systems. Although geographically dispersed, a distributed database system manages and controls the entire database as a single collection of data. If redundant data are stored in separate databases due to performance requirements, updates to one set of data will automatically update the additional sets in a timely manner. A distributed system is one in which both data and transaction processing are divided among two or more computers connected by a network, each computer playing a specific role in the system. Distributed databases bring together the advantages of distributed computing and database management. A distributed database is a collection of multiple logically interrelated databases distributed over a computer network, and a distributed database management system is the software that manages a distributed database while making the distribution transparent to the user. Distributed databases use client/server architecture to process information requests.
4.13 HOMOGENEOUS AND HETEROGENEOUS DATABASES
In a homogeneous distributed database system, all sites have identical database-management system software, are aware of one another, and agree to cooperate in processing user requests. In such a system, local sites surrender a portion of their autonomy in terms of their right to change schemas or database-management system software. That software must also cooperate with other sites in exchanging information about transactions, to make transaction processing possible across multiple sites. In contrast, in a heterogeneous distributed database, different sites may use different schemas and different database-management system software. The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing. The difference in schemas is often a major problem for query processing, while the divergence in software becomes a hindrance for processing transactions that access multiple sites.
4.14 DISTRIBUTED DATA STORAGE
Consider a relation r that is to be stored in the database. There are two approaches to storing this relation in the distributed database.
Replication. The system maintains several identical replicas (copies) of the relation, and stores each replica at a different site. The alternative to replication is to store only one copy of relation r.
Fragmentation. The system partitions the relation into several fragments, and stores each fragment at a different site.
Fragmentation and replication can be combined. A relation can be partitioned into several fragments and there may be several replicas of each fragment.
4.14.1 Data Replication
If relation r is replicated, a copy of relation r is stored in two or more sites. In the most extreme case, we have full replication, in which a copy is stored in every site in the system. There are a number of advantages and disadvantages to replication.
Availability. If one of the sites containing relation r fails, then the relation r can be found in another site. Thus, the system can continue to process queries involving r, despite the failure of one site.
Increased parallelism. In the case where the majority of accesses to the relation r result in only the reading of the relation, then several sites can process queries involving r in parallel. The more replicas of r there are, the greater the chance that the needed data will be found in the site where the transaction is executing. Hence, data replication minimizes movement of data between sites.
Increased overhead on update. The system must ensure that all replicas of a relation r are consistent; otherwise, erroneous computations may result. Thus, whenever r is
updated, the update must be propagated to all sites containing replicas. The result is increased overhead.
4.14.2 Data Fragmentation
If relation r is fragmented, r is divided into a number of fragments r1, r2, …, rn. These fragments contain sufficient information to allow reconstruction of the original relation r. There are two different schemes for fragmenting a relation: horizontal fragmentation and vertical fragmentation. Horizontal fragmentation splits the relation by assigning each tuple of r to one or more fragments. Vertical fragmentation splits the relation by decomposing the scheme R of relation r.
In horizontal fragmentation, a relation r is partitioned into a number of subsets r1, r2, …, rn. Each tuple of relation r must belong to at least one of the fragments, so that the original relation can be reconstructed, if needed. Horizontal fragmentation is usually used to keep tuples at the sites where they are used the most, to minimize data transfer.
Vertical fragmentation of r(R) involves the definition of several subsets of attributes R1, R2, …, Rn of the schema R so that
R = R1 ∪ R2 ∪ … ∪ Rn
Each fragment ri of r is defined by
ri = Π_Ri(r)
The fragmentation should be done in such a way that we can reconstruct relation r from the fragments by taking the natural join:
r = r1 ⋈ r2 ⋈ r3 ⋈ … ⋈ rn
The two types of fragmentation can be applied to a single schema; for instance, the fragments obtained by horizontally fragmenting a relation can be further partitioned vertically. Fragments can also be replicated. In general, a fragment can be replicated, replicas of fragments can be fragmented further, and so on.
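A toy illustration of vertical fragmentation and reconstruction by natural join, with an invented relation r and attribute subsets; dict merging plays the role of the join on the common key.

r = [{"id": 1, "name": "Anu", "city": "Chennai"},
     {"id": 2, "name": "Bala", "city": "Madurai"}]

r1 = [{"id": t["id"], "name": t["name"]} for t in r]   # Π_R1(r)
r2 = [{"id": t["id"], "city": t["city"]} for t in r]   # Π_R2(r)

# r = r1 ⋈ r2: rejoin the fragments on the common key attribute "id"
rebuilt = [{**a, **b} for a in r1 for b in r2 if a["id"] == b["id"]]
assert rebuilt == r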
4.14.3 Transparency
The user of a distributed database system should not be required to know either where the data are physically located or how the data can be accessed at the specific local site. This characteristic, called data transparency, can take several forms:
Fragmentation transparency. Users are not required to know how a relation has been fragmented.
Replication transparency. Users view each data object as logically unique. The distributed system may replicate an object to increase either system performance or data
availability. Users do not have to be concerned with what data objects have been replicated, or where replicas have been placed.
Location transparency. Users are not required to know the physical location of the data. The distributed database system should be able to find any data as long as the data identifier is supplied by the user transaction.
Data items, such as relations, fragments, and replicas, must have unique names. This property is easy to ensure in a centralized database. In a distributed database, however, we must take care to ensure that two sites do not use the same name for distinct data items. One solution to this problem is to require all names to be registered in a central name server. The name server helps to ensure that the same name does not get used for different data items.
4.15 DISTRIBUTED TRANSACTIONS
Access to the various data items in a distributed system is usually accomplished through transactions, which must preserve the ACID properties. There are two types of transactions that we need to consider. The local transactions are those that access and update data in only one local database; the global transactions are those that access and update data in several local databases. Ensuring the ACID properties of local transactions can be done easily. However, for global transactions, this task is much more complicated, since several sites may be participating in execution. The failure of one of these sites, or the failure of a communication link connecting these sites, may result in erroneous computations.
4.15.1 System Structure
Each site has its own local transaction manager, whose function is to ensure the ACID properties of those transactions that execute at that site. The various transaction managers cooperate to execute global transactions.
The transaction manager manages the execution of those transactions that access data stored in a local site. Note that each such transaction may be either a local transaction or part of a global transaction.
The transaction coordinator coordinates the execution of the various transactions initiated at that site.
The structure of a transaction manager is similar in many respects to the structure of a centralized system. Each transaction manager is responsible for:
Maintaining a log for recovery purposes.
Participating in an appropriate concurrency-control scheme to coordinate the concurrent execution of the transactions executing at that site. The transaction coordinator subsystem is not needed in the centralized environment,
since a transaction accesses data at only a single site. A transaction coordinator, as its name implies, is responsible for coordinating the execution of all the transactions initiated at that site. For each such transaction, the coordinator is responsible for:
Starting the execution of the transaction.
Breaking the transaction into a number of subtransactions and distributing these subtransactions to the appropriate sites for execution.
Coordinating the termination of the transaction, which may result in the transaction being committed at all sites or aborted at all sites.
4.15.2 System Failure Modes
A distributed system may suffer from the same types of failure that a centralized system does (for example, software errors, hardware errors, or disk crashes). There are, however, additional types of failure with which we need to deal in a distributed environment. The basic failure types are:
Failure of a site
Loss of messages
Failure of a communication link
Network partition
The loss or corruption of messages is always a possibility in distributed systems. The system uses transmission-control protocols, such as TCP/IP, to handle such errors. However, if two sites A and B are not directly connected, messages from one to the other must be routed through a sequence of communication links. If a communication link fails, messages that would have been transmitted across the link must be rerouted. In some cases, it is possible to find another route through the network, so that the messages are able to reach their destination. In other cases, a failure may result in there being no connection between some pairs of sites. A system is partitioned if it has been split into two (or more) subsystems called partitions that lack any connection between them. Note that, under this definition, a subsystem may consist of a single node.
4.16 COMMIT PROTOCOLS
If we are to ensure atomicity, all the sites at which a transaction T executed must agree on the final outcome of the execution. T must either commit at all sites, or it must abort at all sites. To ensure this property, the transaction coordinator of T must execute a commit protocol.
Among the simplest and most widely used commit protocols is the two-phase commit protocol (2PC). An alternative is the three-phase commit protocol (3PC), which avoids certain disadvantages of the 2PC protocol but adds to complexity and overhead.
4.16.1 Two-Phase Commit Mechanism
A database must guarantee that all statements in a transaction, distributed or nondistributed, either commit or roll back as a unit. The effects of an ongoing transaction should be invisible to all other transactions at all nodes; this transparency should be true for transactions that include any type of operation, including queries, updates, or remote procedure calls. In a distributed database, the database management system must coordinate transaction control with the same characteristics over a network and maintain data consistency, even if a network or system failure occurs.
The two-phase commit mechanism guarantees that all database servers participating in a distributed transaction either all commit or all roll back the statements in the transaction. A two-phase commit mechanism also protects implicit DML operations performed by integrity constraints, remote procedure calls, and triggers. Two-phase commit allows groups of transactions across several nodes to be treated as a unit; either all the transactions commit or they all get rolled back. The use of two-phase commit is transparent to the users. The two-phase commit consists of two phases: the prepare phase and the execute phase.
Prepare Phase
An initiating node, called the global coordinator, notifies all sites involved in the transaction to be ready to either commit or roll back the transaction. The coordinator sends a message "prepare for commit" to each node to get ready for committing the transaction. Each participating database receiving the message will force-write (that is, immediately write to disk) all the transaction details and then send a "ready to commit" or "OK" signal to the coordinator. If the force-writing to the disk fails or if the local transaction cannot commit for some reason, the participating database sends a "cannot commit" or "not OK" signal to the coordinator. If the coordinator does not receive a reply from a database within a certain timeout interval, it assumes a "not OK" response. This is illustrated in Figure 4.2.
Execute Phase
If all participating databases reply "OK", the coordinator signals "OK". This means that the transaction is successful. The coordinator sends a "commit" signal for the transaction to all the participating databases.
Each participating database completes the transaction by permanently updating the database. On the other hand, if one or more of the databases has given a "not OK" signal, then the coordinator also signals "not OK". The coordinator will then send a message to "roll back", or undo, the local effect of the transaction to each participating database. Thus, if there is no problem with the prepare phase, then all sites commit their transactions; if a network or node failure occurs, then all sites roll back their transactions. The execute phase is illustrated in Figure 4.3.
Thus, two-phase commit (also known as 2PC) is a feature of transaction processing systems that enables distributed or multidatabase systems to be returned to the pre-transaction state if some error condition occurs. A single transaction can update many different databases. The two-phase commit strategy is designed to ensure that either all the databases are updated or none of them are, so that the databases remain synchronized. Database changes required by a transaction are initially stored temporarily by each database. The transaction monitor then issues a "precommit" command to each database, which requires an acknowledgement. If the monitor receives the appropriate response from each database, the monitor issues the "commit" command, which causes all databases to simultaneously make the transaction changes permanent.
Figure 4.2 Prepare Phase of the Two-Phase Commit Protocol (the coordinator sends "get ready" to Database 1, Database 2, and Database 3; each replies "I am ready")
Figure 4.3 Execute Phase of the Two-Phase Commit Protocol (the coordinator sends "commit" to Database 1, Database 2, and Database 3; each replies "OK")
Three-Phase Commit
The three-phase commit (3PC) protocol is an extension of the two-phase commit protocol that avoids the blocking problem under certain assumptions. In particular, it is assumed that no network partition occurs and that no more than k sites fail, where k is some predetermined number. Under these assumptions, the protocol avoids blocking by introducing an extra third phase in which multiple sites are involved in the decision to commit. Instead of directly noting the commit decision in its persistent storage, the coordinator first ensures that at least k other sites know that it intends to commit the transaction. If the coordinator fails, the remaining sites first select a new coordinator. This new coordinator checks the status of the protocol from the remaining sites; if the old coordinator had decided to commit, at least one of the k sites that it informed will be up and will ensure that the commit decision is respected. The new coordinator restarts the third phase of the protocol if some site knew that the old coordinator intended to commit the transaction; otherwise the new coordinator aborts the transaction. While the 3PC protocol has the desirable property of not blocking unless more than k sites fail, it has the drawback that a partitioning of the network will appear to be the same as more than k sites failing, which would lead to blocking. The protocol also has to be carefully implemented to ensure that network partitioning (or more than k sites failing) does not result in inconsistencies, where a transaction is committed in one partition and aborted in another. Because of its overhead, the 3PC protocol is not widely used.
4.17 Concurrency Control in Distributed Databases
We assume that each site participates in the execution of a commit protocol to ensure global transaction atomicity.
4.17.1 Locking Protocols The various locking protocols can be used in a distributed environment. The only change that needs to be incorporated is in the way the lock manager deals with replicated data.
Single Lock-Manager Approach
In the single lock-manager approach, the system maintains a single lock manager that resides in a single chosen site, say Si. All lock and unlock requests are made at site Si. When a transaction needs to lock a data item, it sends a lock request to Si. The lock manager determines whether the lock can be granted immediately. If the lock can be granted, the lock manager sends a message to that effect to the site at which the lock request was initiated. Otherwise, the request is delayed until it can be granted, at which time a message is sent to the site at which the lock request was initiated. The transaction can read the data item from any one of the sites at which a replica of the data item resides. In the case of a write, all the sites where a replica of the data item resides must be involved in the writing. The scheme has these advantages:
Simple implementation. This scheme requires two messages for handling lock requests and one message for handling unlock requests.
Simple deadlock handling. Since all lock and unlock requests are made at one site, the deadlock-handling algorithms can be applied directly to this environment.
The disadvantages of the scheme are:
Bottleneck. The site Si becomes a bottleneck, since all requests must be processed there.
Vulnerability. If the site Si fails, the concurrency controller is lost. Either processing must stop, or a recovery scheme must be used so that a backup site can take over lock management from Si.
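A minimal sketch of the single lock-manager idea follows, assuming a simple shared/exclusive compatibility rule. The class and method names are invented for illustration, and requests that cannot be granted are simply queued here rather than delivered by network messages.

from collections import defaultdict, deque

class SingleLockManager:
    """All lock/unlock requests from every site arrive at this one object (site Si)."""
    def __init__(self):
        self.holders = defaultdict(list)   # item -> [(txn, mode)]
        self.waiting = defaultdict(deque)  # item -> queue of delayed requests

    def _compatible(self, item, mode):
        held = [m for _, m in self.holders[item]]
        if mode == "S":
            return "X" not in held          # shared is compatible with shared
        return not held                     # exclusive needs no other holder

    def lock(self, txn, item, mode):
        if self._compatible(item, mode):
            self.holders[item].append((txn, mode))
            return True                     # "granted" message sent back to the requesting site
        self.waiting[item].append((txn, mode))
        return False                        # request delayed until it can be granted

    def unlock(self, txn, item):
        self.holders[item] = [(t, m) for t, m in self.holders[item] if t != txn]
        # Try to grant delayed requests in FIFO order.
        while self.waiting[item] and self._compatible(item, self.waiting[item][0][1]):
            t, m = self.waiting[item].popleft()
            self.holders[item].append((t, m))

mgr = SingleLockManager()
print(mgr.lock("T1", "Q", "S"))  # True: shared lock granted
print(mgr.lock("T2", "Q", "S"))  # True: shared locks are compatible
print(mgr.lock("T3", "Q", "X"))  # False: delayed until T1 and T2 unlock

Because every request funnels through this one object, the bottleneck and single-point-of-failure disadvantages described above are visible directly in the structure of the code.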
Distributed Lock Manager
A compromise between the advantages and disadvantages can be achieved through the distributed-lock-manager approach, in which the lock-manager function is distributed over several sites. Each site maintains a local lock manager whose function is to administer the lock and unlock requests for those data items that are stored at that site. When a transaction wishes to lock data item Q, which is not replicated and resides at site Si, a message is sent to the lock manager at site Si requesting a lock (in a particular lock mode). If data item Q is locked in an
incompatible mode, the request is delayed until it can be granted; once the request can be granted, the lock manager sends a message back to the initiator indicating that it has granted the lock request. The distributed-lock-manager scheme has the advantage of simple implementation, and it reduces the degree to which the coordinator is a bottleneck. It has a reasonably low overhead, requiring two message transfers for handling lock requests and one message transfer for handling unlock requests. However, deadlock handling is more complex, since the lock and unlock requests are no longer made at a single site.
Primary Copy When a system uses data replication, we can choose one of the replicas as the primary copy. Thus, for each data item Q the primary copy of Q must reside in precisely one site, which we call the primary site of Q.
Biased Protocol
The biased protocol is another approach to handling replication. The difference from the majority protocol is that requests for shared locks are given more favorable treatment than requests for exclusive locks.
Shared locks. When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site that contains a replica of Q.
Exclusive locks. When a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites that contain a replica of Q.
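The asymmetry is easy to state in code. In this sketch the replica map and the per-site lock managers are hypothetical stubs; the point is only that a shared lock touches one replica site while an exclusive lock must touch them all.

class SiteLockManager:
    """Trivial per-site lock manager stub (hypothetical; grants every request)."""
    def __init__(self, site):
        self.site = site
    def lock(self, item, mode):
        print(f"site {self.site}: {mode} lock on {item} granted")
        return True

replicas = {"Q": ["S1", "S2", "S3"]}                 # sites holding a replica of Q
managers = {s: SiteLockManager(s) for s in ["S1", "S2", "S3"]}

def lock_shared(item):
    # Biased protocol: a shared lock needs only ONE site holding a replica.
    return managers[replicas[item][0]].lock(item, "S")

def lock_exclusive(item):
    # Biased protocol: an exclusive lock must be granted at ALL replica sites.
    return all(managers[s].lock(item, "X") for s in replicas[item])

lock_shared("Q")      # one message
lock_exclusive("Q")   # three messages, one per replica site

This is why the protocol favors read-heavy workloads: reads pay one message, while writes pay one per replica.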
4.17.2 Timestamping
The principal idea behind the timestamping scheme is that each transaction is given a unique timestamp that the system uses in deciding the serialization order. Our first task, then, in generalizing the centralized scheme to a distributed scheme is to develop a scheme for generating unique timestamps. Then, the various protocols can be applied directly in the nonreplicated environment.
Figure 4.4 Generation of unique timestamps (a globally unique identifier is formed by concatenating a local unique timestamp with the site identifier)
There are two primary methods for generating unique timestamps, one centralized and one distributed. In the centralized scheme, a single site distributes the timestamps. The site can use a logical counter or its own local clock for this purpose. In the distributed scheme, each site generates a unique local timestamp by using either a logical counter or the local clock. We obtain the unique global timestamp by concatenating the unique local timestamp with the site identifier, which also must be unique (Figure 4.4).
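A sketch of the distributed scheme follows, assuming each site keeps a logical counter. Concatenation is modeled as a (counter, site_id) pair, which orders correctly under Python's tuple comparison: counter first, site identifier as the tie-breaker.

import itertools

class SiteTimestampGenerator:
    """Generates globally unique timestamps at one site (distributed scheme)."""
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = itertools.count(1)   # local logical counter

    def next_timestamp(self):
        # Global timestamp = local unique timestamp concatenated with the site id.
        return (next(self.counter), self.site_id)

s1, s2 = SiteTimestampGenerator("site1"), SiteTimestampGenerator("site2")
ts = [s1.next_timestamp(), s2.next_timestamp(), s1.next_timestamp()]
print(sorted(ts))   # tuple ordering yields a total serialization order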
4.18 Directory Systems
Consider an organization that wishes to make data about its employees available to a variety of people in the organization; examples of the kinds of data include name, designation, employee-id, address, email address, phone number, fax number, and so on. In general, a directory is a listing of information about some class of objects, such as persons. Directories can be used to find information about a specific object, or in the reverse direction to find objects that meet a certain requirement. In the world of physical telephone directories, directories that satisfy lookups in the forward direction are called white pages, while directories that satisfy lookups in the reverse direction are called yellow pages.
4.18.1 Directory Access Protocols
Several directory access protocols have been developed to provide a standardized way of accessing data in a directory. The most widely used among them today is the Lightweight Directory Access Protocol (LDAP).
Advantages
First, directory access protocols are simplified protocols that cater to a limited type of access to data. They evolved in parallel with database access protocols.
Second, and more important, directory systems provide a simple mechanism to name objects in a hierarchical fashion, similar to file-system directory names, which can be used in a distributed system to specify what information is stored in each of the directory servers.
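As a concrete illustration, a white-pages style lookup against an LDAP directory might look like the following sketch, which uses the third-party Python ldap3 package; the server address, base DN, filter, and attribute names are invented for this example.

# Hypothetical white-pages lookup via LDAP (requires the third-party ldap3 package;
# the server address, base DN, and attributes below are invented).
from ldap3 import Server, Connection, ALL

server = Server("ldap://directory.example.com", get_info=ALL)
conn = Connection(server, auto_bind=True)       # anonymous bind

# Forward ("white pages") lookup: find a specific person's entry.
conn.search(search_base="ou=people,dc=example,dc=com",
            search_filter="(cn=John Doe)",
            attributes=["mail", "telephoneNumber"])
for entry in conn.entries:
    print(entry)

conn.unbind()

A reverse ("yellow pages") lookup would use the same search call with a filter on a property, for example all entries whose department attribute has a given value.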
UNIT 5 OBJECT-ORIENTED CONCEPTS
5.1 OBJECTS: COMPONENTS AND CHARACTERISTICS
Object-oriented concepts have their roots in programming. In OO systems, everything we deal with is an object, whether it is a student, an invoice, an airplane, an employee, a service, a menu panel, a report, and so forth. Some objects are tangible and some are not. We can define an object within the OO environment as an abstract representation of a real-world entity that has a unique identity, embedded properties, and the ability to interact with other objects and with itself. A defining characteristic of an object is its unique identity.
5.1.1 OBJECT IDENTITY
The object's identity is represented by an object ID (OID), which is unique to that object. The OID is assigned by the system at the moment of the object's creation, does not depend on the object's attribute values, and cannot be changed under any circumstances. The OID can be deleted only if the object is deleted, and that OID can never be reused.
5.1.2 ATTRIBUTES (INSTANCE VARIABLES)
Objects are described by their attributes, known as instance variables in an OO environment. For example, a student may have attributes (instance variables) such as SOCIAL_SECURITY_NUMBER, FIRST_NAME, MIDDLE_INITIAL, LAST_NAME, and so on. Each attribute has a unique name and a data type associated with it. Traditional data types, also known as base data types or conventional data types, are used in most programming languages and include real, integer, string, and so on.
Attributes also have a domain. The domain logically groups and describes the set of all possible values that an attribute can have. Domains can also be defined as a list of possible values separated by commas. For example, the GENDER domain can be defined as "Male, Female" or as "M, F". Just as in the E-R model, an object's attribute can be single-valued or multivalued. Therefore, the object's attribute can draw a single value or multiple values from its domain. Object attributes may also reference one or more other objects.
5.1.3 OBJECT STATE
The object state is the set of values that the object's attributes have at a given time. Although the object's state can vary, its OID remains the same. If we want to change the object's state, we must change the values of the object's attributes. To change the object's attribute values, we must send a message to the object. This message will invoke a method.
5.1.4 MESSAGES AND METHODS
A method is the code that performs a specific operation on the object's data. Methods protect data from direct and unauthorized access by other objects. Every operation performed on an object must be implemented by a method. Methods are used to change the object's attribute values or to return the values of selected object attributes. Methods represent real-world actions. In effect, methods are the equivalent of procedures in traditional programming languages. In OO terms, methods represent the object's behavior. Every method is identified by a name and has a body. The body is composed of computer instructions written in some programming language to represent a real-world action. To invoke a method, you send a message to the object. A message is sent by specifying a receiver object, the name of the method, and any required parameters. Denial of access to the object's structure ensures the integrity of the object's state and hides the object's internal details. The ability to hide the object's internal details (attributes and methods) is known as encapsulation. An object may also send messages to change or interrogate another object's state. (To interrogate means to ask for the interrogated object's instance variable value or values.) To perform such object-change and interrogation tasks, the method's body can contain references to other objects' methods, i.e., it can send messages to other objects.
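These ideas map naturally onto a small Python class. The example below is a generic illustration, not taken from the text: the _balance attribute stands in for encapsulated state, the methods are the only way to change or interrogate it, and calling a method plays the role of sending a message.

import itertools

_oid_counter = itertools.count(1)   # system-assigned identifiers, never reused

class Account:
    def __init__(self, owner, balance=0.0):
        self._oid = next(_oid_counter)   # unique identity, independent of attribute values
        self._owner = owner              # attributes (instance variables)
        self._balance = balance          # encapsulated state

    # Methods: the only way to change or interrogate the object's state.
    def deposit(self, amount):
        self._balance += amount

    def balance(self):
        return self._balance

acct = Account("Smith")
acct.deposit(100.0)        # "sending a message" invokes the deposit method
print(acct.balance())      # the state changed; the OID stays the same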
5.1.5 CLASSES
OO systems classify objects according to their similarities and differences. Objects that share common characteristics are grouped into classes. In other words, a class is a collection of similar objects with a shared structure (attributes) and behavior (methods). A class contains the description of the data structure and the method implementation details for the objects in that class. Therefore, all objects in a class share the same structure and respond to the same messages. In addition, a class acts as a "storage bin" for similar objects. Each object in a class is known as a class instance or object instance (see Figure 5.1).
Figure 5.1 Class Illustration (object instances 1 through 6 share the structure and methods of the class)
5.1.6 PROTOCOL
The class's collection of messages, each identified by a message name, constitutes the object or class protocol. The protocol represents an object's public aspect; that is, it is known by other objects as well as by end users. In contrast, the implementation of the object's structure and methods constitutes the object's private aspect.
5.1.7 SUPERCLASSES, SUBCLASSES, AND INHERITANCE
Classes are organized into a class hierarchy. A class hierarchy resembles an upside-down tree in which each class has only one parent class. The class hierarchy is known as a class lattice if its classes can have multiple parent classes. A class is used to categorize objects into groups of objects that share common characteristics. The class hierarchy introduces a powerful OO concept known as inheritance. Inheritance is the ability of an object within the hierarchy to inherit the data structure and behavior (methods) of the classes above it. For example, the Piano class in Figure 5.2 inherits its data structure and behaviors from the superclasses Stringed Instruments and Musical Instruments: the Piano inherits its strings and sounding-board characteristics from the Stringed Instruments superclass and the musical scale from the Musical Instruments superclass. It is through inheritance that OO systems can deliver code reusability.
Figure 5.2 Musical Instruments Class Hierarchy (Musical Instruments is the superclass; Stringed Instruments and Wind Instruments are its subclasses; Piano, Violin, and Guitar are subclasses of Stringed Instruments; Trumpet and Flute are subclasses of Wind Instruments)
In OO systems, all objects are derived from the superclass Object, also known as the root class. Therefore, all classes share the characteristics and methods of the superclass Object. The inheritance of data and methods goes from top to bottom in the class hierarchy. Two variants of inheritance exist: single and multiple inheritance.
SINGLE INHERITANCE
Single inheritance exists when a class has only one immediate (parent) superclass above it. Most current OO systems support single inheritance. When the system sends a message to an object instance, the entire hierarchy is searched for the matching method, using the following sequence:
1. Scan the class to which the object belongs.
2. If the method is not found, scan the superclass.
The scanning process is repeated until one of the following occurs:
1. The method is found.
2. The top of the class hierarchy is reached without finding the method. The system then generates a message to indicate that the method was not found.
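Python's attribute lookup follows essentially the same bottom-up scan, so the process can be demonstrated directly. This generic example (class and method names invented) shows a method being found one level up, two levels up, and not at all.

class MusicalInstrument:
    def play_scale(self):
        return "playing a musical scale"

class StringedInstrument(MusicalInstrument):
    def vibrate_strings(self):
        return "strings vibrating over a sounding board"

class Piano(StringedInstrument):
    pass    # defines no methods of its own

p = Piano()
# Lookup scans Piano first, then StringedInstrument, then MusicalInstrument:
print(p.vibrate_strings())   # found in the immediate superclass
print(p.play_scale())        # found two levels up the hierarchy
try:
    p.tune()                 # not found anywhere in the hierarchy
except AttributeError as e:
    print("method not found:", e)   # the system reports the failed lookup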
Single Inheritance example (Employee is the superclass; Pilot and Mechanic are its subclasses)
MULTIPLE INHERITANCE
Multiple inheritance exists when a class can have more than one immediate (parent) superclass above it. For example, a Motorcycle subclass inherits characteristics from both the Motor Vehicle and Bicycle superclasses, as the following example illustrates.
Multiple Inheritance example (Motor Vehicle and Bicycle are the superclasses; Motorcycle is the subclass)
5.1.8 METHOD OVERRIDING AND POLYMORPHISM
We may override a superclass's method definition by redefining the method at the subclass level. In contrast to method overriding, polymorphism allows different objects to respond to the same message in different ways. Polymorphism is a very important feature of OO systems because its existence allows objects to behave according to their specific characteristics. In OO terms, polymorphism means that:
1. We may use the same name for a method defined in different classes in the class hierarchy.
2. The user may send the same message to different objects that belong to different classes and yet always generate the correct response.
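Both ideas fit in one short sketch (class and method names invented): each subclass overrides play, and the same message sent to objects of different classes produces the behavior appropriate to each class.

class Instrument:
    def play(self):
        return "some generic sound"

class Piano(Instrument):
    def play(self):                  # method overriding: redefined at the subclass level
        return "piano chord"

class Trumpet(Instrument):
    def play(self):                  # same method name, different class
        return "trumpet fanfare"

# Polymorphism: the same message, a different (correct) response each time.
for obj in [Piano(), Trumpet(), Instrument()]:
    print(obj.play())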
5.1.9 ABSTRACT DATA TYPES
A data type describes a set of objects with similar characteristics. Conventional programming languages use a set of predefined base data types (real, integer, and string or character). Base data types are subject to a predefined set of operations. For example, the integer base data type allows operations such as addition, subtraction, multiplication, and division. Like conventional data types, abstract data types (ADTs) describe a set of similar objects. However, an abstract data type differs from a conventional data type in that:
The ADT's operations are user-defined.
The ADT does not allow direct access to its internal data representation or method implementation. In other words, the ADT encapsulates its definition, thereby hiding its characteristics.
Some OO systems, such as Smalltalk, implement base data types as ADTs. To create an abstract data type, we must define:
Its name.
The data representation or instance variables of the objects belonging to the abstract data type; each instance variable has a data type that may be a base data type or another ADT.
The abstract data type operations and constraints, both of which are implemented through methods.
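A sketch of the three ingredients in Python follows (the type, its representation, and its constraint are all invented for illustration): user-defined operations form the only interface, and the internal representation stays hidden behind them.

class Temperature:
    """ADT sketch: a name, a hidden representation, and user-defined operations."""
    def __init__(self, celsius):
        if celsius < -273.15:                # constraint enforced by a method
            raise ValueError("below absolute zero")
        self._celsius = celsius              # internal representation is hidden

    # User-defined operations (the ADT's only interface):
    def warmer(self, delta):
        return Temperature(self._celsius + delta)

    def as_fahrenheit(self):
        return self._celsius * 9 / 5 + 32

t = Temperature(20).warmer(5)
print(t.as_fahrenheit())    # 77.0; callers never touch _celsius directly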
5.1.10 OBJECT CLASSIFICATION
An object can be classified according to the characteristics (simple, composite, compound, hybrid, and associative) of its attributes. A simple object is an object that contains only single-valued attributes and has no attributes that refer to another object. A composite object is an object that contains at least one multivalued attribute and has no attributes that refer to another object. A compound object is an object that contains at least one attribute that references another object.
A hybrid object is an object that contains a repeating group of attributes, and at least one of the repeating attributes refers to another object. Finally, an associative object is an object used to represent a relationship between two or more objects. The associative object can contain its own attributes to represent specific characteristics of the relationship between those objects.
5.2 CHARACTERISTICS OF AN OBJECT-ORIENTED DATA MODEL
The object-oriented data model (OODM) is also known as an object data model (ODM). At the very least, an object-oriented data model must:
Support the representation of complex objects.
Be extensible; that is, it must be capable of defining new data types as well as the operations to be performed on them.
Support encapsulation; that is, the data representation and the method's implementation must be hidden from external entities.
Exhibit inheritance; an object must be able to inherit the properties (data and methods) of other objects.
Support the notion of object identity (OID).
SUMMARY
1. The OODM models real-world entities as objects.
2. Each object is composed of attributes and a set of methods.
3. Each attribute can reference another object or a set of objects.
4. The attributes and the methods' implementation are hidden, i.e., encapsulated, from other objects.
5. Each object is identified by a unique object ID (OID), which is independent of the values of its attributes.
6. Similar objects are described and grouped in a class that contains the description of the data (attributes or instance variables) and the methods' implementation.
7. The class describes a type of object.
8. Classes are organized in a class hierarchy.
9. Each object of a class inherits all properties of its superclasses in the class hierarchy.

TABLE COMPARING THE OO AND E-R MODEL COMPONENTS
OO DATA MODEL        E-R MODEL
Type                 Entity definition
Object               Entity
Class                Entity set
Instance Variable    Attribute
N/A                  Primary key
OID                  N/A
Method               N/A
Class hierarchy      E-R diagram (data schema)
5.2.1 OBJECT SCHEMAS: THE GRAPHICAL REPRESENTATION OF OBJECTS
A graphical representation of an object resembles a box, with the instance variable names inside the box. Generally speaking, the object representation is shared by all objects in the class.
5.2.2 CLASS-SUBCLASS RELATIONSHIPS
Classes inherit the properties of their superclasses in the class hierarchy. This property leads to the use of the term "is a" to describe the relationship between the classes within the hierarchy. That is, an employee is a person, and a student is a person.
Class Hierarchy example (Person is the superclass; Employee and Student are its subclasses)
5.2.3 INTEROBJECT RELATIONSHIPS: ATTRIBUTE-CLASS LINKS
In addition to supporting the class-subclass relationship, the OODM supports the attribute-class relationship. An attribute-class or interobject relationship is created when an object's attribute references another object of the same or a different class. The interobject relationship is different from the class-subclass relationship.
5.2.4 LATE AND EARLY BINDING: USE AND IMPORTANCE
A very desirable OODM characteristic is its ability to let an object's attribute contain objects of different data types (or classes) at different times. With this feature, one object can contain a numeric value for a given instance variable, and the next object (of the same class) can contain a character value for the same instance variable. This characteristic is achieved through late binding. With late binding, the data type of an attribute is not known until execution time, or run-time. Therefore, two different object instances of the same class can contain values of different data types for the same attribute. In contrast to the OODM's ability to use late binding, a conventional DBMS requires that a base data type be defined for each attribute at the time of its creation. For example, suppose that you want to define an INVENTORY table to contain the attributes ITEM_TYPE, DESCRIPTION, VENDOR, WEIGHT, and PRICE. In a conventional DBMS, you create a table named INVENTORY and assign a base data type to each attribute, as shown in Figure 5.3.
Table: INVENTORY
Attribute       Conventional (Base) Data Type
ITEM_TYPE       numeric
DESCRIPTION     character
VENDOR          numeric
WEIGHT          numeric
PRICE           numeric
Figure 5.3 Table with Predetermined (Base) Data Types
Thus, when working with conventional database systems, the designer must define the data type for each attribute when the table structure is defined. This approach to data type definition is called early binding. Early binding allows the database to check the data type for each of the attribute's values at compilation or definition time.
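Python itself is late-bound, so the contrast is easy to demonstrate; the class and its attribute below are invented for illustration. Two instances of the same class hold values of different types for the same attribute, something an early-bound table definition like the one in Figure 5.3 would reject at definition time.

class InventoryItem:
    """Late binding: the attribute's type is not fixed at class-definition time."""
    def __init__(self, item_type):
        self.item_type = item_type   # no declared base data type

a = InventoryItem(42)         # a numeric value for item_type
b = InventoryItem("gadget")   # a character value for the SAME attribute
print(type(a.item_type).__name__, type(b.item_type).__name__)  # int str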
5.2.5 SUPPORT FOR VERSIONING
Versioning is an OODM feature that allows us to track the history of change in the state of an object. Versioning is thus a very powerful modeling feature, especially in computer-aided design (CAD) environments.
5.3 OODM AND PREVIOUS DATA MODELS: SIMILARITIES AND DIFFERENCES
5.3.1 OBJECT, ENTITY, AND TUPLE
The OODM concept of an object moves well beyond the concept of an entity or a tuple in other data models. Although an OODM object resembles the entity and the tuple in the E-R and relational models, an OODM object has additional characteristics, such as behavior, inheritance, and encapsulation. These OODM characteristics make OO modeling much more natural than E-R and relational modeling.
5.3.2 CLASS, ENTITY SET, AND TABLE
The concept of a class can be associated with the E-R and relational models' concepts of entity set and table, respectively. However, a class is a more powerful concept that allows not only the description of the data structure but also the description of the behavior of the class objects. A class also allows both the concept and the implementation of abstract data types in the OODM.
5.3.3 ENCAPSULATION AND INHERITANCE
The ADT brings us two other OO features that are not supported in previous models: encapsulation and inheritance. Classes are organized in class hierarchies. An object belonging to a class inherits all the properties of its superclasses. Encapsulation means that the data representation and the methods' implementation are hidden from other objects and from the end user. Conventional models do not incorporate the methods found in the OODM. The closest thing to methods is the use of triggers and stored procedures in SQL databases.
5.3.4 OBJECT ID
The object ID (OID) is not supported in either the E-R or the relational model. Although database users may argue that Oracle Sequences and MS Access AutoNumber provide the same functionality as an OID, this argument is true only to the extent that they can be used to uniquely identify data elements. However, unlike the object model, in which relationships are implicit, the relational model still uses value-based relationships.
5.3.5 RELATIONSHIPS
The main property of any data model is found in its representation of relationships among the data components. The relationships in an OODM can be of two types: interclass references or class hierarchy inheritance. The E-R and relational models use a value-based relationship approach.
5.3.6 ACCESS
The E-R and relational data models depend on the use of SQL to retrieve data from the database. SQL is an ad hoc, set-oriented query language that is based on a formally defined mathematical model.
5.4 OBJECT-ORIENTED DATABASE MANAGEMENT SYSTEMS
During the past few years, the data management and application environment has become far more complex than the one envisioned by the creators of the hierarchical, network, or relational DBMSs. Such complex application environments may best be served by an object-oriented database management system (OODBMS). The OODBMS is a database
management system that integrates the benefits of typical database systems with the more powerful modeling and computational (programming) characteristics of the object-oriented data model. (See Figure 5.4)
Figure 5.4 Object-Oriented Database Management Systems (object-oriented features + conventional DBMS features = OODBMS)
5.4.1 FEATURES OF AN OBJECT-ORIENTED DBMS
The 13 rules that characterize an OODBMS are divided into two sets: the first eight characterize an OO system, and the last five characterize a DBMS. The 13 rules are listed in Table 5.5, and we will briefly discuss each of them.

TABLE 5.5 THE THIRTEEN OODBMS RULES
RULES THAT MAKE IT AN OO SYSTEM
Rule 1   The system must support complex objects.
Rule 2   Object identity must be supported.
Rule 3   Objects must be encapsulated.
Rule 4   The system must support types or classes.
Rule 5   The system must support inheritance.
Rule 6   The system must avoid premature binding.
Rule 7   The system must be computationally complete.
Rule 8   The system must be extensible.
RULES THAT MAKE IT A DBMS
Rule 9   The system must be able to remember data locations.
Rule 10  The system must be able to manage very large databases.
Rule 11  The system must accept concurrent users.
Rule 12  The system must be able to recover from hardware and software failures.
Rule 13  Data query must be simple.
Optional OODBMS features include:
Support for multiple inheritance.
Support for distributed OODBMSs.
Support for versioning.
5.5 OODBMS: ADVANTAGES AND DISADVANTAGES
ADVANTAGES
OODBMSs allow the inclusion of more semantic information in the database, thus providing a more natural and realistic representation of real-world objects.
OODBMSs provide an edge in the support for complex objects, which makes them especially desirable in specialized application areas. Conventional databases simply lack the ability to support efficient applications in CAD, CAM, medical imaging, spatial imaging, and specialized multimedia environments.
OODBMSs permit the extensibility of base data types, thereby increasing both the database functionality and its modeling capabilities.
If the platform allows efficient caching, OODBMSs provide dramatic performance improvements compared to relational database systems when managing complex objects.
Versioning is a useful feature for specialized applications such as CAD, CAM, medical imaging, spatial imaging, engineering, text management, and desktop publishing.
The reusability of classes allows for faster development and easier maintenance of the database and its applications.
Faster application development time is obtained through inheritance and reusability. The benefit is obtained only after mastering the use of OO development features such as:
The proper use of the class hierarchy: for example, how to use existing classes to create new classes.
OO design methodology
The OODBMS provides a possible solution to the problem of integrating existing and future DBMSs into a single environment. This solution is based on its strong data abstraction capabilities and its promise of portability.
DISADVANTAGES
OODBMSs face strong and effective competition from the firmly established RDBMSs, especially when those RDBMSs (such as IBM's DB2 Universal Database and Oracle 8.0) incorporate many OO features that would otherwise have given the OODBMS a clear competitive edge in a complex data environment. Therefore, the OODBMS's design and implementation complexities become more difficult to justify.
The OODBMS is based on the object model, which lacks the solid theoretical foundation of the relational model on which the RDBMS is built.
In some sense, OODBMSs are considered a throwback to the traditional pointer systems used by hierarchical and network models. This criticism is not quite on the mark when one associates the pointer system with the navigational data manipulation style and fixed access paths that led to the relational system's dominance. Nevertheless, the complexity of the OODBMS pointer systems cannot be denied.
OODBMSs do not provide a standard ad hoc query language, as relational systems do. At this point, the development of the object query language (OQL) is far from complete. Some OODBMS implementations are beginning to provide extensions to relational SQL to make the integration of the OODBMS and the RDBMS possible.
The relational DBMS provides a comprehensive solution to business database design and management needs, supplying both a data model and a set of fairly straightforward
normalization rules for designing and evaluating relational databases. OODBMSs do not yet provide a similar set of tools.
The initial learning curve for the OODBMS is steep. If you consider the direct training costs and the time it takes to fully master the uses and advantages of object orientation, you will appreciate why OODBMSs seldom are rated as the first option when solutions are sought for noncomplex business-oriented problems. (Programmers are especially likely to resist change.)
The OODBMS's low market presence, combined with its steep learning curve, means that there are few people who are qualified to make use of the presumed power of OO technology. Most of the technology is currently focused on engineering application areas of software development. Therefore, only companies with the right mix of resources (money, time, and qualified personnel) can afford to invest in OO technology.
5.6 HOW OO CONCEPTS HAVE INFLUENCED THE RELATIONAL MODEL
Most relational databases are designed to serve general business applications that require ad hoc queries and easy interaction. The data types encountered in such applications are well defined and are easily represented in common tabular formats, with equally common short and well-defined transactions. However, RDBMSs are not quite as well suited as OODBMSs to the complex application requirements found in CAM, medical and spatial imaging, engineering design, simulation modeling, architectural design, or pure scientific modeling. This fast-changing data environment has forced relational model advocates to respond to the OO challenge by extending the relational model's conceptual reach. The result of their efforts is usually referred to as the Extended Relational Model (ERM) or, more appropriately, the Object/Relational Model (O/RM). Although this O/RM effort is still a work in progress, its basic features provide support for:
Extensibility of new user-defined (abstract) data types
Complex objects
Inheritance
Procedure calls (rules or triggers)
System-generated identifiers (OID surrogates)
ADVANCED DATA TYPES AND NEW APPLICATIONS
The growth of mobile computers, starting with laptop computers and pocket organizers and extending in more recent years to mobile phones with built-in computers, as well as a variety of wearable computers that are increasingly used in commercial applications.
Motivation
Temporal data Temporal databases store time-dependent (temporal) data.
Spatial data Spatial databases provide concepts for databases that keep track of objects in a multidimensional space.
Multimedia data Multimedia databases provide features that allow users to store and query different types of multimedia information like images, video clips, audio clips and documents.
5.7 TEMPORAL DATABASES
Temporal databases store temporal data, i.e., data that is time-dependent (time-varying). Typical temporal database scenarios and applications are the following. Economic data is frequently time-dependent: share prices, exchange rates, interest rates, company profits, etc., vary over time. This means that we need to store not only the respective value but also an associated date or time period for which the value is valid. A more sophisticated analysis might want to correlate interest rate and exchange rate or share price trends. This means that an interest rate value has to be related to an exchange rate value using the date or period for which the values are valid; they have to be valid during the same period of time in this example. Many companies offer products whose prices vary over time (daytime telephone calls, for example). Our everyday life is very often influenced by timetables for buses, trains, flights, university lectures, laboratory access, and even cinema, theatre, or TV programs. As one consequence, many people plan their daily activities by using diaries, which are themselves a kind of timetable. And again, timetables or diaries can be regarded as temporal relations in terms of a relational temporal data model. Medical diagnosis often draws conclusions from a patient's history, i.e., from the evolution of his or her illness. The latter is described by a series of values, such as body
temperature, cholesterol concentration in the blood, blood pressure, etc. As in the first example, each of these values is only valid during a certain period of time (e.g., a certain day). Typically, a doctor would retrieve a patient's values' history, analyze trends, and base his or her diagnosis on those observations.
Similar examples can be found in many areas that rely on the observation of evolutionary processes, such as environmental studies, economics and many natural sciences.
5.7.1 Temporal Database Management Systems
Temporal database management systems (TDBMSs) support the maintenance and manipulation of temporal data in many possible ways. Temporal support can affect many, but not necessarily all, of the following issues:
It can provide an entire temporal data model, which consists of a temporal data definition language (DDL) and a temporal data manipulation language (DML). This means that temporal objects can be defined via the DDL and can be created, updated, and retrieved via the DML.
User-defined time is already an integral part of the relational data model. Thus, there might be a temporal query language that simply offers a set of temporal operators and predicates to enhance the search facilities.
Finally, there are various performance related issues such as temporal storage structures or the implementation of temporal operators.
5.7.2 Temporal and Conventional Databases
A temporal database can be regarded as an enhancement of a conventional database: it 'only' sets the data into a context of time, i.e., it adds a dimension. In a relational database environment, a relation can be considered as a table with rows representing individual tuples and columns holding the respective attribute values. Over time, such a table is updated: new rows are inserted, some rows are deleted, and some rows might be modified. This means that the data in the table changes over time. If a copy of the table were taken each time before it is updated, and if the date of the copy were added to all rows, we could actually follow the evolution of the table. And in fact, this is what many users require; just recall the share-prices example. In contrast, in many databases, one would keep only the current copy of a table or, as it is frequently called, the current snapshot. They are therefore referred to as snapshot databases. Whenever an update occurs, a conventional database physically updates, i.e., throws old values away and stores the new ones, whereas a temporal database is updated logically, i.e., it marks the old and new values with timestamps that indicate to which snapshot they actually
belong: the current one or a previous, historic one. This is also called the concept of physical vs. logical deletion. Transaction time is the time dimension from the DBMS's point of view, i.e., the time when (update) transactions in the database take place. A transaction timestamp indicates the time when the associated tuple is current in the database. Naturally, the most recent value of transaction time is always the current moment. In terms of the time cube, this means that the current snapshot is the front one, if time runs from the background to the foreground. There is, however, a second notion of time, which is called valid time. Whereas transaction time is restricted to the present and the past, valid time extends to the future as well. Imagine a hotel reservation system that stores room bookings in a table. For the staff that run the hotel, it is not really important to know when a room was booked, i.e., when the booking transaction took place (transaction time), but rather for which time a room is booked (valid time), which naturally must cover the future.
Additionally, past bookings might be stored as well, as the management might want to analyze the information in order to study customer characteristics. In the case of valid time, since valid time extends to the future, the current snapshot is not necessarily the front one. Whereas a snapshot relation can be considered as a slice of a transaction-time cube, a similar connection cannot be drawn for a valid-time cube: the contents of a valid-time relation that is currently held in the database cannot be regarded as a slice; it might be the entire data represented by the cube or parts of it. Remember the example of the hotel reservation system: it suggests that many conventional databases already store a significant amount of the data represented by the valid-time cube. This means that the latter are actually valid-time databases which treat valid time just as any other attribute. This underlines again that temporal DBMSs emphasize and efficiently support the time dimension(s) but do not extend the expressive power of conventional databases. This is the fact on which many critics build their argument.
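The physical-versus-logical distinction can be sketched with a tiny valid-time relation; the row structure and function names below are invented for illustration. An update never overwrites a row: it closes the old row's validity interval and inserts a new one, so the history stays queryable.

import datetime as dt

FOREVER = dt.date.max    # stands for "valid until changed" in this sketch
history = []             # rows: (share, price, valid_from, valid_to)

def update_price(share, price, on):
    # Logical deletion: close the current row instead of throwing it away.
    for i, (s, p, vfrom, vto) in enumerate(history):
        if s == share and vto == FOREVER:
            history[i] = (s, p, vfrom, on)
    history.append((share, price, on, FOREVER))

def price_on(share, day):
    # Retrieve the value that was valid on a given day.
    for s, p, vfrom, vto in history:
        if s == share and vfrom <= day < vto:
            return p

update_price("ACME", 10.0, dt.date(2023, 1, 1))
update_price("ACME", 12.5, dt.date(2023, 6, 1))
print(price_on("ACME", dt.date(2023, 3, 15)))   # 10.0: a historic snapshot
print(price_on("ACME", dt.date(2023, 7, 1)))    # 12.5: the current snapshot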
There is, however, no doubt about the existence of, widespread usage of, and demand for temporal databases.
5.8 SPATIAL AND GEOGRAPHIC DATA
Spatial data support in databases is important for efficiently storing, indexing, and querying data based on spatial locations. Two types of spatial data are particularly important:
Computer-aided design (CAD) data, which includes spatial information about how objects (such as buildings, cars, or aircraft) are constructed.
Geographic data such as road maps, land-usage maps, topographic elevation maps, political maps showing boundaries, land ownership maps, and so on. Geographic
information systems are special-purpose databases tailored for storing geographic data.
5.8.1 Representation of Geometric Information
A line segment can be represented by the coordinates of its endpoints. For example, in a map database, the two coordinates of a point would be its latitude and longitude. A polyline consists of a connected sequence of line segments and can be represented by a list containing the coordinates of the endpoints of the segments, in sequence. Curves can be represented as a sequence of arcs.
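These representations transcribe directly into data structures; the structures below are an illustrative choice, with points as (latitude, longitude) pairs.

# Points as (latitude, longitude) pairs.
point_a = (40.7128, -74.0060)
point_b = (42.3601, -71.0589)

# A line segment is represented by the coordinates of its endpoints.
segment = (point_a, point_b)

# A polyline: the coordinates of the endpoints of its connected
# segments, stored in sequence.
polyline = [point_a, point_b, (43.6532, -79.3832)]

def segments(poly):
    """Recover the individual line segments of a polyline."""
    return list(zip(poly, poly[1:]))

print(segments(polyline))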
5.8.2 Geographic Data
Geographic data are spatial in nature but differ from design data in certain ways. Maps and satellite images are typical examples of geographic data. Maps may provide not only location information (about boundaries, rivers, and roads) but also much more detailed information associated with locations, such as elevation, soil type, land usage, and annual rainfall. Geographic data can be categorized into two types:
Raster data. Such data consist of bit maps or pixel maps, in two or more dimensions.
Vector data. Vector data are constructed from basic geometric objects, such as points, line segments, triangles, and polygons in two dimensions, and cylinders, spheres, cuboids, and other polyhedrons in three dimensions.
5.8.3 Spatial Queries
There are a number of types of queries that involve spatial locations.
Nearness queries request objects that lie near a specified location. A query to find all restaurants that lie within a given distance of a given point is an example of a nearness query (see the sketch after this list).
Region queries deal with spatial regions. Such a query can ask for objects that lie partially or fully inside a specified region.
Queries may also request intersection and unions of regions. For example, given region information, such as annual rainfall and population density, a query may request all regions with a low annual rainfall as well as a high population density.
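A nearness query reduces to a distance filter. This sketch uses plain Euclidean distance on planar coordinates, with invented restaurant data; a real spatial database would instead use a spatial index and geodesic distance.

import math

restaurants = {"Alpha": (2.0, 3.0), "Beta": (8.0, 1.0), "Gamma": (2.5, 3.5)}

def nearness_query(point, max_distance):
    """Return all restaurants within max_distance of the given point."""
    px, py = point
    return [name for name, (x, y) in restaurants.items()
            if math.hypot(x - px, y - py) <= max_distance]

print(nearness_query((2.0, 3.0), 1.0))   # ['Alpha', 'Gamma']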
5.9 MULTIMEDIA DATABASES
Multimedia data, such as images, audio, and video, are an increasingly popular form of data. Multimedia objects often have descriptive attributes, such as those indicating when they were created, who created them, and to what category they belong. One approach to building a
database for such multimedia objects is to use databases for storing the descriptive attributes and for keeping track of the files in which the multimedia objects are stored. Several issues have to be addressed if multimedia data are to be stored in a database.
The database must support large objects, since multimedia data such as videos can occupy up to a few gigabytes of storage. Many database systems do not support objects larger than a few gigabytes.
The retrieval of some types of data, such as audio and video, has the requirement that data delivery must proceed at a guaranteed steady rate. Such data are sometimes called isochronous data, or continuous-media data.
Similarity-based retrieval is needed in many multimedia database applications.
5.9.1 Multimedia Data Formats
Because of the large number of bytes required to represent multimedia data, it is essential that multimedia data be stored and transmitted in compressed form. For image data, the most widely used format is JPEG, named after the standards body that created it, the Joint Photographic Experts Group. We can store video data by encoding each frame of video in JPEG format, but such an encoding is wasteful, since successive frames of video are often nearly the same. The Moving Picture Experts Group has developed the MPEG series of standards for encoding video and audio data.
5.9.2 Continuous-Media Data
The most important types of continuous-media data are video and audio data. Continuous-media systems are characterized by their real-time information-delivery requirements:
Data must be delivered sufficiently fast that no gaps in the audio or video result.
Data must be delivered at a rate that does not cause overflow of system buffers.
Synchronization among distinct data streams must be maintained. This need arises, for example, when the video of a person speaking must show lips moving synchronously with the audio of the person speaking.
Several vendors offer video-on-demand servers. Current systems are based on file systems, because existing database systems do not provide the real-time response that these applications need. The basic architecture of video-on-demand systems comprises:
Video server. Multimedia data are stored on several disks. Systems containing a large volume of data use tertiary storage for less frequently accessed data.
Terminals. People view multimedia data through various devices, collectively referred to as terminals. Examples are personal computers and televisions attached to a small, inexpensive computer called a set-top box.
Network. Transmission of multimedia data from a server to multiple terminals requires a high-capacity network.
5.9.3 Similarity-Based Retrieval
In many multimedia applications, data are described only approximately in the database. Examples are:
Pictorial data. Two pictures or images that are slightly different as represented in the database may be considered the same by a user. For instance, a database may store trademark designs. When a new trademark is to be registered, the system may need first to identify all similar trademarks that were registered previously.
Audio data. Speech-based user interfaces are being developed that allow the user to give a command or identify a data item by speaking. The input from the user must then be tested for similarity to those commands or data items stored in the system.
Handwritten data. Handwritten input can be used to identify a handwritten data item or command stored in the database.
5.10 MOBILITY AND PERSONAL DATABASES
Large-scale, commercial databases have traditionally been stored in central computing facilities. In distributed database applications, there has usually been strong central database and network administration. Two technology trends have combined to create applications in which this assumption of central control and administration is not entirely correct:
1. The increasingly widespread use of personal computers and, more important, of laptop or notebook computers.
2. The development of a relatively low-cost wireless digital communication infrastructure, based on wireless local-area networks, cellular digital packet networks, and other technologies.
Mobile computing has proved useful in many applications. Wireless computing creates a situation where machines no longer have fixed locations and network addresses. Location-dependent queries are an interesting class of queries that are motivated by mobile computers.
5.10.1 A Model of Mobile Computing
The mobile-computing environment consists of mobile computers, referred to as mobile hosts, and a wired network of computers. Mobile hosts communicate with the wired network via computers referred to as mobile support stations. Each mobile support station manages those mobile hosts within its cell, that is, the geographical area that it covers. Mobile hosts may move between cells, thus necessitating a handoff of control from one mobile support station to another. Since mobile hosts may, at times, be powered down, a host may leave one cell and rematerialize later at some distant cell. Therefore, moves between cells are not necessarily between adjacent cells. Within a small area, such as a building, mobile hosts may be connected by a wireless local-area network (LAN) that provides lower-cost connectivity than would a wide-area cellular network, and that reduces the overhead of handoffs.
5.10.2 Routing and Query Processing
The route between a pair of hosts may change over time if one of the two hosts is mobile. This simple fact has a dramatic effect at the network level, since location-based network addresses are no longer constants within the system. Mobility results in dynamically changing communication costs, thus complicating the optimization process. Furthermore, there are competing notions of cost to consider:
User time is a highly valuable commodity in many business applications.
Connection time is the unit by which monetary charges are assigned in some cellular systems.
Number of bytes, or packets, transferred is the unit by which charges are computed in some digital cellular systems.
Time-of-day-based charges vary, depending on whether communication occurs during peak or off-peak periods.
Energy is limited. Often, battery power is a scarce resource whose use must be optimized. A basic principle of radio communication is that it requires less energy to receive than to transmit radio signals. Thus, transmission and reception of data impose different power demands on the mobile host.
5.10.3 Broadcast Data
It is often desirable for frequently requested data to be broadcast in a continuous cycle by mobile support stations, rather than transmitted to mobile hosts on demand. A typical application of such broadcast data is stock-market price information.
5.10.4 Disconnectivity and Consistency
Since wireless communication may be paid for on the basis of connection time, there is an incentive for certain mobile hosts to be disconnected for substantial periods. Mobile computers without wireless connectivity are disconnected most of the time when they are being used, except periodically when they are connected to their host computers, either physically or through a computer network. During these periods of disconnection, the mobile host may remain in operation. The user of the mobile host may issue queries and updates on data that reside or are cached locally. This situation creates several problems, in particular:
Recoverability: Updates entered on a disconnected machine may be lost if the mobile host experiences a catastrophic failure. Since the mobile host represents a single point of failure, stable storage cannot be simulated well.
Consistency: Locally cached data may become out of date, but the mobile host cannot discover this situation until it is reconnected. Likewise, updates occurring in the mobile host cannot be propagated until reconnection occurs.
------------------------------------------------------------THE END----------------------------------------------------------