and excelling in the competitive market of today. For example, "59375598702" by itself is data, whereas "Bob's phone number is 59375598702" is information.
3. How is information used intelligently to make business decisions? Explain. Ans: Organizations are capable of processing information and using it to make decisions for surviving and excelling in the competitive market of today. The usage of information extends across the company. Although the applications vary by industry, one constant is the utilisation of accurate, timely information that can improve company operations and, as a result, the customer experience. When the customer experience improves, sales and earnings grow and businesses thrive. Information is fast becoming the lifeblood of business, and its significance in an organization's long-term success cannot be overstated.
4. Describe the information cycle with its advantages and disadvantages. Ans: Data, pictures, text, documents, and speech are all forms of information. To obtain information, data is processed using models. The produced information reaches the receiver in good time and is then used to make judgments and decisions. These decisions in turn trigger additional events and activities, producing enormous quantities of dispersed data that is captured and fed back as input, and the cycle continues.
5. What do you mean by quality of information? Ans: Information must be accurate, timely, and relevant. Accuracy means the information is neither manipulated nor distorted; it clearly conveys the meaning of the data it is based on. Timeliness means the information reaches the recipient within the needed period; information may be of no use if it arrives late. Relevancy means that information relevant to one user may not be relevant to another, even though it is both accurate and timely.
6. Explain the various stages of databases and discuss why we use databases. Ans: Databases have a finite lifespan; they are eventually replaced by more successful databases that are more flexible and have an up-to-date structure. There are generally six stages in the database lifecycle. Analysis: The stakeholders (organizations, universities, etc.) who require database management services are interviewed in order to identify the problems, possibilities, and constraints of the existing system. In this phase the objectives and scope of the new system to be developed are determined.
Design: The conceptual design is created in this phase from the requirements gathered during the analysis phase. A logical and a physical design are also created for easy implementation of the database in the next step. Implementation: The database management system (DBMS) is installed, the databases are created, and the data is imported into the newly created databases. Testing: The database is tested and fine-tuned with respect to the applications that are going to use it. Operation: The database operates normally and provides the needed information to its users. Maintenance: Changes are made to the database in response to new requirements.
A database can be defined as a collection of data designed to be used by different people. Data stored directly in files on disks and tapes tends to be dispersed and hence difficult to control. Other important problems of file-based storage that move organizations towards the adoption of databases are as follows:
• Duplication of data and effort on the users' end
• Redundancy of data
• Data hardcoded in applications
• Inconsistent data
• Data integrity issues
• Data security issues
• Concurrency issues
• Non-uniform backup and recovery methods
• Interdependence between data and programs
• Ad hoc data management techniques
• Ad hoc representation of relationships
7. Define and discuss the components of a database and its uses.
Ans: A database system comprises a database and its management software, so its components can be categorized as follows: hardware, software, data, procedures, database access language (generally a query language), and users.
Hardware: Hard drives for storing data, magnetic tape for backing up the stored data, I/O channels for moving data to memory (RAM or ROM), a keyboard for input, a monitor for output, and so forth.
Software: The set or collection of programmes used to access, alter, or update the data contained on the storage media. The software understands the database access language and translates it into actual database commands that are executed on the hardware-stored database.
Data: The primary resource for which every database management system is built.
Procedures: The processes needed to configure and install a DBMS, the login and logout functions, database administration, backups of the data store, and report generation.
Database access language: The language used to access, delete, insert, and update data in a database. Users write commands in this language, which are submitted to the DBMS for translation and execution on the hardware.
Database administrators (DBAs): They are responsible for ensuring authorized access to the database, coordinating its use, and monitoring its usage. They also purchase the hardware and software necessary for the proper functioning of the database, which in turn affects database response time.
Database designers: They are responsible for interacting with the users and understanding their requirements before the data is actually stored in a database.
Users: The users require access to the database for querying, updating, and generating reports. They are generally categorized by their frequency of access to the database.
The people associated with a database generally work on the scene, while those behind the scenes include DBMS system designers and implementers, tool developers, and operators or maintenance personnel.
8. Discuss the main characteristics of the database approach and how it differs from traditional file systems. Ans:
• Self-describing: Database systems are self-describing, since they contain not just data but also metadata that defines the contents of, and the connections between, the data in the database. This separation of the data from its description distinguishes the approach from typical file systems, in which the data definition is merely part of the application programmes.
• Independence: Unlike in a file-based system, the structure of the data saved in a database is specified in the system catalogue, kept independently of the applications. If the database structure has to be modified, only the catalogue entry needs to be changed while the programmes stay unchanged; this property is called program-data independence. With file-based storage, in contrast, the applications that access the file must also be modified.
• Multiple views: The database supports multiple views. A view is a subset of the database defined according to the needs of a particular user, since it contains just the data that the user is interested in.
• Sharing: A database's data is shared by several users and applications. Concurrency control techniques ensure that data accessed by many users and applications is always correct and maintains its integrity.
• Non-redundancy: No two data items in a database should reflect the same real-world entity; each data item should be saved in the database just once. Some redundancy may still exist to increase system speed, but it is restricted to application programming and kept as small as feasible.
• Integrity: Integrity refers to the correctness of the data stored in a database. It is ensured by defining and enforcing constraints on the data items so that users can enter only valid information. Constraints are rules that define restrictions on the data to be entered in a particular field.
• Security: All users of a database do not have the same privileges, as they have different roles and responsibilities. One user may require read-only access, i.e., the ability to read the content but not modify or update it, while another may require read and write privileges, the ability to read, modify, and update the data. A database system must therefore be able to grant different access rights to different user accounts and thereby control unauthorized access.
• Persistence: Persistence refers to the data's presence in the database beyond the scope of the operation that produced it; it exists in the database until it is deliberately removed. Even if a system failure occurs in the middle of a transaction, the transaction is either rolled back or completed fully, and the data is never compromised.
• Consistency: When software updates data in a database, the true meaning of the data should not be lost; the database values should remain consistent with the defined relationships and constraints. This is accomplished through the ACID properties.
• Backup and recovery: A database may fail, resulting in the loss of all its information and a big loss to the owning organization; to avoid such catastrophes it is backed up at regular intervals. Backup and recovery make a database far more dependable.
• Any kind of data: A database should be capable of storing any kind of data, not only employee ids, names, salaries, etc., but anything that can represent a real-world entity.
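The integrity characteristic above can be sketched with Python's built-in sqlite3 module. This is only an illustration (the employee table and its columns are invented here, not taken from the text): a CHECK constraint lets the DBMS reject an invalid value while valid rows persist.

```python
import sqlite3

# In-memory database; the CHECK constraint enforces that salary is non-negative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary INTEGER CHECK (salary >= 0)
    )
""")

conn.execute("INSERT INTO employee VALUES (1, 'Asha', 50000)")   # valid row accepted

rejected = False
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', -100)")  # violates CHECK
except sqlite3.IntegrityError:
    rejected = True   # the DBMS refuses the invalid value

rows = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the valid row is stored; the constraint violation never reaches the table, which is exactly the "users enter only valid information" guarantee described above.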
9. Explain the responsibilities of the DBAs and the database designers. Ans: DBAs are generally responsible for managing resources. In a database system, the main resource is the database itself, while the DBMS, as the secondary resource, manages the data on the storage devices. The DBA is responsible for authorizing access to the database, coordinating its use, and monitoring its usage. The DBA also purchases the hardware and software necessary for the proper functioning of the database, which in turn affects database response time, and is accountable for security breaches and threats to the database. Database designers are responsible for interacting with the users and understanding their requirements before the data is actually stored in a database. They build views that meet the data and processing requirements of each potential user group; these views are then integrated to form a single design. Database designers are generally on the same staff as the DBAs, and once the design is completed they share the DBAs' responsibilities.
10. Discuss the roles and responsibilities of different types of database end users. Ans: End users are those who use the database less regularly, or who view the data for different reasons each time; for example, managers who visit the database occasionally use an advanced database query language to acquire the information they need. End users fall into three categories: casual, parametric (naive), and sophisticated users. Casual users: They learn a few facilities, which they use repeatedly. Parametric or naive users: They use canned transactions to continually access, modify, and update the database; bank tellers who continually update accounts after debits, credits, and the creation of various kinds of accounts are examples of parametric users. Naive users need to grasp only a few DBMS features, but they must know how to use the interfaces that run pre-packaged transactions. Sophisticated users: This category includes scientists, engineers, and business analysts. They become extremely familiar with the DBMS and use it to satisfy their complicated requirements, learning the database's most comprehensive features.
1. Discuss the hierarchical database model.
Ans: In the hierarchical data model, data is arranged in a tree-like structure, with records connected to each other through links. There is a single root node, and every child node has exactly one parent; multiple parent nodes are not allowed. To retrieve data from a hierarchical database we traverse the tree starting from the root node. This model is also described as a one-to-many relationship model, which can be explained with two tables A and B: if each record in table A can be linked to several records of table B, but each record in table B has only one linked record in table A, then A and B are in a one-to-many relationship. The hierarchical database model was used in the era of mainframe computers. Today it can still be used in applications that require high performance and availability, such as banking and telecommunications; it can also be used for storing file systems and geographic information, and for the Windows Registry in the Microsoft Windows operating system. 2. Differentiate between hierarchical and network database models with the help of an example. Ans: The hierarchical model features a single root node to which all child nodes ultimately connect, whereas the network data model supports many-to-many relationships. In the hierarchical database model each child has exactly one parent, while in the network database model a child may have more than one parent. 3. Relational database model is the most widely used database. Justify this statement. Ans: The relational database model can be accessed and managed by special software called a relational database management system (RDBMS).
This model makes use of tables representing real-world entities, in which data is arranged in the form of rows and columns. Structured Query Language (SQL) is used for maintaining and querying such databases. Rows are also known as tuples or records, while columns are known as attributes. Each table corresponds to a single entity, which is why the model is known as a relational database model. The relational database model can be used for storing financial records, manufacturing and product data, personal information, and so on. 4. List the advantages of using the object-oriented database model over the relational model. Ans: The object-oriented database model increases productivity: inheritance aids the reuse of code, lowering the cost and effort of managing the same data several times. The relational data model cannot handle many different kinds of data, whereas the object-oriented data model can store any type of data, such as audio, video, and pictures. The object-oriented database model supports navigational as well as associative access to information, made possible by representing relationships explicitly, which also results in better data access performance. 5. Object relational database model is more secure than relational model. Explain.
Ans: An object-relational database (ORD), sometimes known as an object-relational database management system (ORDBMS), is a type of database management system (DBMS) comparable to a relational database; it sits between relational databases and object-oriented databases. The functionality of the system can be extended in the object relational data model using complex data types as well as advanced object-oriented concepts such as inheritance. 6. Compare and contrast different database models. Ans: The hierarchical database model is the oldest database model, in which data is arranged in a tree-like structure. The network data model resembles an upside-down tree, with the branches representing member data linked to the parent node; because the data segments are connected to one another, navigation is simple. To overcome the complexity of the network and hierarchical database models, the relational database model came into the picture; it is accessed and managed by special software called a relational database management system (RDBMS). The relational database model, though the simplest to use and the most popular, cannot manage complicated information systems, since it requires the application developer to force the information model into tables whose connections are specified by values. In an object-oriented data model, data is kept in the form of objects; these objects can be thought of as class instances, and it is only through them that we gain access to the class. Together, these classes and objects form an object-oriented model. An object relational database combines the relational database and the object-oriented database; this model is also known as an Object Relational Database Management System (ORDBMS). It supports the properties of object-oriented programming, such as objects, classes, and inheritance, while also supporting the data types and tabular structures of a relational database.
In a conventional relational database model, a developer has only a limited number of data types to choose from when creating a database; in a modern object relational data model, a developer may also define new data types. As a result, the object relational data model enables developers to raise the level of abstraction at which they approach the problem domain. 7. Differentiate between a database schema and a database state. Ans: A schema specifies the structure of the data and the relationships between data items. In other words, the schema is the description of the database; it is typically given at the time of database creation and does not change regularly. The database state, also called the current state, is initially empty, i.e., it holds no data. When the database is first loaded with data it enters its initial state; as the data is updated, the state keeps changing over time. 8. Describe the three-schema architecture. Why do we need mappings between schema levels? How do different schema definition languages support this architecture? Ans:
• Internal level: The internal level, also known as the physical level, describes how data is stored on the storage devices and provides an inside look at the physically stored data. It works at the lowest level of data abstraction, dealing in depth with complicated low-level data structures, file structures, and access methods, and it also supports data compression and encryption techniques.
• Conceptual level: The next level above the internal level, also known as the logical level. It describes the type of data stored in the database and how the different elements are linked with each other. This level defines the stored data structures in terms of the data model used and hides the low-level complexity of the actual storage. Database administrators and designers work at the conceptual level to determine what data should be maintained in the database.
• External level: Also known as the view level, it refers only to the part of the database in which a particular end user is interested. It is the highest level of data abstraction. This level enables users to access only the sections of the database that they require rather than the entire database, saving time. Because different users have different database requirements, the database may support a number of view-level abstractions.
Mappings are essentially the conversion of requests and results from one level to another. The user, for example, may have various views at the external schema level; these view requests must be translated to conceptual representations and subsequently to internal (physical) representations. Similarly, data retrieved from the database must be presented to the user, which requires reformatting the information back up through the levels to the external level. Most schema definition languages and tools focus on the physical level and the view level, with the conceptual level primarily utilised in putting together the schema design itself. 9. Discuss and analyse logical data independence and physical data independence. Which one is harder to achieve? Why? Ans: Logical data independence refers to the ability to change the conceptual schema without affecting the external schemas, for example by deleting a record type, adding new record types, or changing constraints. Physical data independence denotes the ability to modify the physical schema without affecting the conceptual schema; as a result, the external schemas also remain unchanged. The internal schema may be altered when files on storage devices are reorganized or when quicker access methods are implemented by providing new access paths. Logical data independence is harder to achieve, because application programs must remain unchanged even when structural or constraint changes occur in the conceptual schema on which they directly depend.
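External views and logical data independence can be sketched in a few lines with Python's built-in sqlite3 module. The student table and student_view names below are invented for illustration: the external-level view keeps working even after the base table's conceptual structure changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Bob", 72), (2, "Alice", 65)])

# External-level view: this user sees only names and marks.
conn.execute("CREATE VIEW student_view AS SELECT name, marks FROM student")

# Conceptual schema changes: a new column is added to the base table.
conn.execute("ALTER TABLE student ADD COLUMN city TEXT")

# The external view is unaffected by the change: logical data independence.
view_rows = conn.execute("SELECT * FROM student_view ORDER BY name").fetchall()
```

Here the view acts as a fixed external schema, so the application querying it needs no modification when the base table grows a column.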
1. Discuss the various features of an RDBMS. Solution:
i) We can store and fetch data in tables with an RDBMS.
ii) The data is stored in the form of tables, where a row represents a record and a column represents an attribute.
iii) Records of a table can be uniquely identified with the help of a primary key.
iv) Records can be fetched at higher speeds with the help of indexes.
v) Multiple views of a table are possible.
vi) Two or more tables can be joined by an attribute common to the tables.
vii) Multi-user access is supported and can be controlled for individual users.
viii) Virtual tables (views) can be created to protect sensitive data and simplify queries.
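A couple of these features, joining two tables by a common attribute and fetching records through SQL, can be sketched with Python's built-in sqlite3 module; the dept and emp tables below are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, ename TEXT, dept_id INTEGER)")
conn.execute("INSERT INTO dept VALUES (10, 'Sales')")
conn.execute("INSERT INTO emp VALUES (1, 'Mia', 10)")

# Join the two tables on the common attribute dept_id.
result = conn.execute("""
    SELECT e.ename, d.dname
    FROM emp e JOIN dept d ON e.dept_id = d.dept_id
""").fetchall()
```

The join matches each employee row with the department row sharing the same dept_id value, which is exactly the "common attribute" mechanism listed above.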
These features are the reasons why the RDBMS is so widely used across the world. In simple words, an RDBMS is a software system that stores data in the form of tables; as discussed, the rows and columns of these tables are often referred to as tuples and attributes. 2. Discuss and define Codd's 12 rules for a relational database. Solution: E. F. Codd was the scientist who developed the relational model of database management. Codd's twelve rules describe the characteristics that a database management system must have in order to be classified as relational, i.e., as a relational database management system.
Rule 0: Foundation rule. Any system that claims to be a relational database management system must manage its data entirely through its relational capabilities.
Rule 1: Information rule. All data (including metadata) is represented as values in tables with rows and columns; no other representation is permitted.
Rule 2: Guaranteed access. Every value in a database should be accessible through a combination of table name, primary key value, and column name. (By this rule, accessing data directly through a pointer is invalid.)
Rule 3: Systematic treatment of NULL values. NULL values may be used to represent missing or inapplicable data in a systematic way, independent of data type, and any operation on a NULL value must yield NULL.
Rule 4: Dynamic online catalogue. A database's catalogue is its complete description. It is accessible online and provides additional information about the database; the same query language used on the data is also used on the catalogue.
Rule 5: Comprehensive data sublanguage. A relational database may support multiple languages, but at least one language should be able to perform all forms of data definition, access, manipulation, and analysis. SQL is one such language.
Rule 6: View updating rule. The system should be able to update all views that are theoretically updatable.
Rule 7: High-level insert, update, and delete. Insertion, deletion, and updating should be possible on whole sets of rows with a single operation; this facility must be available at all levels of relations.
Rule 8: Physical data independence. Physical changes to the system, such as changes in storage structures or access methods, must not affect the application programmes and other operations.
Rule 9: Logical data independence. Even if the logical structure of the relational database changes, the user view should remain consistent. This rule is extremely difficult to satisfy.
Rule 10: Integrity independence. Integrity constraints specific to a relational database should be definable in that database's own language and stored in the catalogue, not in the application programmes; the database is thus not dependent on the front end.
Rule 11: Distribution independence. Even if the database's data is spread over many sites, the end user perceives it as a single entity, and the system should function flawlessly although distributed over a network.
Rule 12: Nonsubversion rule. If the system provides a low-level (record-at-a-time) interface, that interface must not be able to alter data by circumventing the integrity constraints and bypassing protection.
3.
Explain and differentiate among primary key, foreign key, candidate key, and super key.
Solution:
• Super key: An attribute or set of attributes used to uniquely identify the records of a relation; any such set qualifies. A relation can form multiple super keys, and from these the candidate keys are selected. A super key's attributes can contain NULL values.
• Candidate key: A key that can qualify as the primary key; it is a minimal subset of a super key that still uniquely identifies the records. A relation can have more than one candidate key, and a candidate key's attributes can contain NULL values.
• Primary key: The key that ensures a unique value for each and every record in the relation; it is the minimal set of attributes required to uniquely identify the records. A relation has at most one primary key, and it does not allow NULL values.
• Foreign key: An attribute or group of attributes in a table by which we can link multiple tables. An attribute referring to another relation is called a foreign key, and the referred attribute should be the primary key of that relation. A relation can have more than one foreign key, and a foreign key can contain NULL values.
4. Referential integrity ensures that no record can be entered without the agreement of the parent relation. Justify this statement with the help of an example. Solution: According to referential integrity, the values of a set of attributes in one relation must also appear for the same set of attributes in another relation; the attribute or set of attributes of one relation refers to the same attribute or attributes of the other relation. Consider the following relations:
Account (Acc_no, Branch_name, Balance)
Branch (Branch_name, Branchcity)
In this example, the Account relation has an attribute Branch_name that records the branch at which each account is held, and in the Branch relation Branch_name is also an attribute, namely the primary key. While performing any kind of manipulation on the Account relation we have to make sure that a branch of the same name exists in the Branch relation. Here the Branch_name values of the Account relation are validated against the Branch relation; this concept is known as referential integrity. It is achieved with the foreign key keyword in the database declaration:
create table Branch (
    Branch_name varchar(20),
    Branchcity varchar(20) not null,
    primary key (Branch_name)
);
create table Account (
    Acc_no varchar(5),
    Branch_name varchar(20) not null,
    Balance int,
    primary key (Acc_no),
    foreign key (Branch_name) references Branch
);
The foreign key attribute should be the primary key in the parent relation, i.e., Branch_name should be the primary key of Branch. The value entered in the child relation must be present in the parent relation; otherwise referential integrity is violated and the record is rejected. So if we insert a tuple into the Account relation with a Branch_name value that does not appear in the Branch_name attribute of any tuple of Branch, the insertion fails. 5. Explain the unary operations in relational algebra with the help of an example. Solution: Relational algebra is a widely used procedural query language. It takes relation instances as input and returns relation instances as output, carrying this out through a variety of operations. Relational algebra operations are applied recursively on a relation; each operation produces a new relation, which may be built from one or more input relations.
Relational algebra operations are divided into two categories: unary and binary.
Unary operators:
• Select
• Project
• Rename
Binary operators:
• Union
• Intersection
• Set-difference
• Cross product
• Join
• Division
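As a rough illustration (not part of any DBMS), the unary operators select (σ) and project (Π) can be sketched in Python over an in-memory relation represented as a list of dictionaries; the student data below is made up.

```python
# Tiny in-memory "relation": a list of dicts, one dict per tuple.
students = [
    {"roll": 1, "name": "Amit", "city": "Delhi", "marks": 62},
    {"roll": 2, "name": "Neha", "city": "Pune",  "marks": 48},
    {"roll": 3, "name": "Ravi", "city": "Delhi", "marks": 75},
]

def select(relation, predicate):
    """sigma: keep only the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """pi: keep only the named attributes, eliminating duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        reduced = tuple((a, t[a]) for a in attributes)
        if reduced not in seen:
            seen.add(reduced)
            out.append(dict(reduced))
    return out

passed = select(students, lambda t: t["marks"] > 50)   # sigma marks>50 (students)
cities = project(students, ["city"])                   # pi city (students)
```

select returns a horizontal subset (rows), while project returns a vertical subset (columns) with duplicates removed, mirroring the σ and Π definitions that follow.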
SELECT operator: The SELECT operator finds the subset of records satisfying a given predicate. The symbol sigma (σ) is used to represent the select operator. Example: find the names of all students whose marks are greater than 50: σ marks>50 (student). PROJECT operator: It displays only the attributes named in the projection and eliminates the rest; the result is a vertically fragmented subset of the relation containing the projected attributes, with duplicates eliminated so that only unique values are displayed. For example, Π Name, city (student) displays the values of the Name and city attributes of the student relation. 6. Consider the following relational schema: Employee (empno, name, office, age) Books (isbn, title, authors, publisher)
Loan (empno, isbn, date)
Write the following queries in relational algebra.
a) Find the names of employees who have borrowed a book published by McGraw-Hill.
b) Find the names of employees who have borrowed all books published by McGraw-Hill.
c) Find the names of employees who have borrowed more than five different books published by McGraw-Hill.
d) For each publisher, find the names of employees who have borrowed more than five books of that publisher.
Solution (the queries are expressed here in SQL):
i) select name
   from employee e, books b, loan l
   where e.empno = l.empno and l.isbn = b.isbn and b.publisher = 'McGraw-Hill';
ii) select name
    from employee e
    join loan l on e.empno = l.empno
    join (select isbn from books where publisher = 'McGraw-Hill') x on l.isbn = x.isbn
    group by e.empno, name
    having count(*) = (select count(*) from books where publisher = 'McGraw-Hill');
iii) select name
     from employee, loan
     where employee.empno = loan.empno
       and isbn in (select distinct isbn from books where publisher = 'McGraw-Hill')
     group by employee.empno, name
     having count(distinct isbn) > 5;
iv) select name, books.publisher
    from employee, loan, books
    where employee.empno = loan.empno and books.isbn = loan.isbn
    group by employee.empno, name, books.publisher
    having count(loan.isbn) > 5;
7. List down the differences between MySQL and NoSQL in terms of implementation issues. Solution:
MySQL MySQL is the most widely used open source database on the planet. MySQL is the database backend for many of the web applications you can come across online, and it is used extensively by tech luminaries such as Facebook, Twitter, and Youtube. Users must scale relational databases on powerful servers, which are costly and difficult to manage. The database should be stored in the multiple servers for scaling the data but its difficult to manage these number of tables in different sites. NoSQL
NoSQL stands for "not only SQL". In NoSQL we can store and manage unstructured, schema-less data across multiple nodes without requiring a fixed table structure, and because join queries have only limited support, such databases can scale to high volumes. a. Scalability RDBMS (relational database management system) databases scale vertically. As the load on an RDBMS database grows, we scale it by increasing server hardware capacity, which necessitates the purchase of more costly and larger servers, while NoSQL databases are built to scale horizontally and almost indefinitely. Horizontal scaling refers to expanding the pool of resources by adding more machines. b. Less expensive Maintaining high-end RDBMS systems is costly because database management requires qualified personnel, while NoSQL databases need less management. Many features, such as automatic repair, easier data distribution, and simpler data models, reduce the amount of administration and tuning required in NoSQL. c. Cost effective and open source NoSQL databases are low-cost and free to use. RDBMS databases are costly and require large servers and storage systems, while NoSQL databases are simple to implement and usually run on inexpensive servers to handle exploding data and transaction volumes. As a result, the cost of storing and processing data per gigabyte in NoSQL can be several times lower than in RDBMS. d. No schema definition Data may be loaded into a NoSQL database without any predefined schema since it is schemaless. As a result, the format or data model may be modified at any time without affecting the application. In SQL, such change management is a big headache. Being open source is NoSQL's greatest advantage, but the tooling around it is still limited: GUI database access tools are not widely available in the market, and finding NoSQL experts is difficult.
8. Explain the join operation? Differentiate between natural join and self-join with example. Solution: The different types of joins are as per the following: NATURAL JOIN (⋈) It can be performed only when the joining relations have common attributes. The attributes' names and types must be the same.

Stu1:
RollNo  Student Name
1       Jagdish
2       Ravi
3       Tanupriya

Stu2:
RollNo  F_Name
1       Prayag
2       Rajkumar
4       Ramesh

Stu1 ⋈ Stu2 will result in the following:

RollNo  Student Name  F_Name
1       Jagdish       Prayag
2       Ravi          Rajkumar
A self-join is a join in which a table is joined with itself (also called a unary relationship), especially when the table has a FOREIGN KEY that references its own PRIMARY KEY. Joining a table to itself means that each row of the table is combined with itself and with every other row of the table. The self-join can be viewed as a join of two copies of the same table. The table is not actually copied, but SQL performs the command as though it were. The syntax of the command for joining a table to itself is almost the same as that for joining two different tables. To distinguish the column names from one another, aliases for the actual table name are used, since both copies of the table have the same name. Table name aliases are defined in the FROM clause of the SELECT statement.
CREATE TABLE employee(
  emp_id     varchar(5) NOT NULL,
  emp_name   varchar(20) NULL,
  dt_of_join date NULL,
  emp_supv   varchar(5) NULL,
  CONSTRAINT emp_id PRIMARY KEY(emp_id),
  CONSTRAINT emp_supv FOREIGN KEY(emp_supv) REFERENCES employee(emp_id));
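A quick way to see the self-join in action: the sketch below builds the employee table defined above in SQLite (via Python), with a few invented sample rows, and lists each employee alongside their supervisor. The aliases w (worker) and s (supervisor) play the role described in the text.

```python
import sqlite3

# Build the self-referencing employee table; the rows are invented samples.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee(
  emp_id     TEXT PRIMARY KEY,
  emp_name   TEXT,
  dt_of_join TEXT,
  emp_supv   TEXT REFERENCES employee(emp_id)
);
INSERT INTO employee VALUES
  ('E1','Asha','2015-01-01',NULL),   -- top-level supervisor
  ('E2','Ravi','2017-03-12','E1'),
  ('E3','Meena','2019-07-30','E1');
""")

# Self-join: two aliases of the same table, matched on the foreign key.
rows = con.execute("""
    SELECT w.emp_name, s.emp_name AS supervisor
    FROM employee w JOIN employee s ON w.emp_supv = s.emp_id
    ORDER BY w.emp_name
""").fetchall()
print(rows)  # [('Meena', 'Asha'), ('Ravi', 'Asha')]
```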
1. Imagine that you have been assigned to a team that will be developing an inventory tracking system. As part of the project startup, your manager has asked each team leader to bring a basic work plan to the next meeting. At that meeting, these work plans will be analyzed to determine the overall project timeframe, costs, personnel requirements and software requirements. For now, as the team leader for the data design team, you have been asked to bring a work plan that identifies the phases of data design and includes the following information for each phase: a) A description of the data design phase, b) The inputs of the phase, c) The outputs of the phase, d) A key issue addressed in the phase, e) A challenge that you can anticipate would occur in the phase. Prepare the response you will bring to the meeting. Solution:

Conceptual Design
Description: Create a model that captures the major entities, the relationships among entities, and the attributes of entities required for the system.
Input: General understanding of the problem; functional specs; data.
Output: ER diagram.
Key issues: Capturing all relationships; capturing integrity.

Logical Design
Description: Transform the major entity/attribute/relationship requirements into a high-level specification for the database.
Input: ER diagram.
Output: Relational database schema.
Key issues: Providing a location for all data; data integrity.

Improving the Logical Design
Description: Improve the high-level database specification.
Input: Relational database schema; meaning of data.
Output: Relational database schema.
Key issues: Minimizing redundancy; minimizing ambiguity.

Physical Design
Description: Transform the high-level specifications for the database into detailed specifications for how to construct the actual database in a specific relational database software.
Input: Relational database schema.
Output: Technical specifications for construction of the database.
Key issues: Performance; data integrity.
2. UPS prides itself on having up-to-date information on the processing and current location of each shipped item. To do this, UPS relies on a company-wide information system. Shipped items are the heart of the UPS product tracking information system. Shipped items can be characterized by item number (unique), weight, dimensions, insurance amount, destination, and final delivery date. Shipped items are received into the UPS system at a single retail center. Retail centers are characterized by their type, uniqueID, and address. Shipped items make their way to their destination via one or more standard UPS transportation events (i.e., flights, truck deliveries). These transportation events are characterized by a unique scheduleNumber, a type (e.g., flight, truck), and a deliveryRoute. Create an Entity Relationship diagram that captures this information about the UPS system. Be certain to indicate identifiers and cardinality constraints. Solution:
3. Production tracking is important in many manufacturing environments (e.g., the pharmaceuticals industry, children’s toys, etc.). The following ER diagram captures important information in the tracking of production. Specifically, the ER diagram captures relationships between production lots (or batches), individual production units, and raw materials.
Fig. 4.20: ER Diagram for a manufacturing site. Convert the ER diagram into a relational database schema. Be certain to indicate primary keys and referential integrity constraints. Solution:
4. Assume we have the following application that models soccer teams, the games they play, and the players in each team. In the design, we want to capture the following:
• We have a set of teams; each team has an ID (unique identifier), name, main stadium, and the city to which it belongs.
• Each team has many players, and each player belongs to one team. Each player has a number (unique identifier), name, DoB, start year, and the shirt number that he uses.
• Teams play matches; in each match there is a host team and a guest team. The match takes place in the stadium of the host team.
• For each match we need to keep track of the following:
o The date on which the game is played
o The final result of the match
o The players who participated in the match. For each player, how many goals he scored, whether or not he received a yellow card, and whether or not he received a red card.
o During the match, one player may substitute another player. We want to capture this substitution and the time at which it took place.
• Each match has exactly three referees. For each referee we have an ID (unique identifier), name, DoB, and years of experience. One referee is the main referee and the other two are assistant referees.
Design an ER diagram to capture the above requirements. State any assumptions you have that affect your design (use the back of the page if needed). Make sure cardinalities and primary keys are clear. Solution:
5. What Are the Most Common Errors You Can Potentially Face in Data Modelling? Solution:
i) Inadequate planning
ii) Poor description of the database or poor schema definition
iii) Missing naming conventions
iv) Normalization anomalies
v) Improper indexing
vi) Redundancy
6. In the Context of Data Modelling, What is the Importance of Metadata? Solution: Metadata characterizes data. It is used to provide documentation so that data can be understood and more readily consumed by your organization. Metadata answers the who, what, when, where, why, and how questions for users of the data. 7. Have you ever come across the scenario of recursive relationships? If yes, how did you handle it? Solution: If the same entity participates more than once in a relationship, it is known as a recursive relationship. In the example below, an employee can be a supervisor and be supervised, so there is a recursive relationship.
(Figure: a recursive "Supervision" relationship on the Employee entity.)
8. What is the number of child tables that can be created out from a single parent table? Solution: There is no fixed limit: any number of child tables can reference a single parent table. A child table is simply one that has one or more foreign keys pointing at some other (parent) table(s). 9. If a unique constraint is applied to a column then will it throw an error if you try to insert two nulls into it?
Solution: In most databases (e.g., Oracle, MySQL, PostgreSQL, SQLite) you can insert multiple NULL values into a column with a UNIQUE constraint, because NULL is the absence of a value: it is never equal to another NULL and is therefore not considered a duplicate. This means it is possible to insert rows that appear to be duplicates if one of the values is NULL. (SQL Server is a notable exception: it allows only one NULL per unique index.) 10. How is the Entity, Entity Type, and Entity Set different from each other in DBMS? Solution: Entity: a definable thing, such as a person, object, concept or event, that can have data stored about it. Think of entities as nouns. Examples: a customer, student, car or product. Typically shown as a rectangle. Entity type: a group of definable things, such as students or athletes, whereas the entity would be the specific student or athlete. Other examples: customers, cars or products. Entity set: same as an entity type, but defined at a particular point in time, such as students enrolled in a class on the first day. Other examples: customers who purchased last month, cars currently registered in Florida. A related term is instance, in which the specific person or car would be an instance of the entity set.
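The NULL-and-UNIQUE behaviour described in question 9 can be checked directly; the sketch below uses SQLite, which follows the common multiple-NULLs-allowed behaviour.

```python
import sqlite3

# Two NULLs go into a UNIQUE column without error: NULL is never equal
# to NULL, so it is not treated as a duplicate.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(email TEXT UNIQUE)")
con.execute("INSERT INTO t VALUES (NULL)")
con.execute("INSERT INTO t VALUES (NULL)")   # accepted
count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 2

# A duplicate non-NULL value, however, is rejected.
con.execute("INSERT INTO t VALUES ('a@x.com')")
try:
    con.execute("INSERT INTO t VALUES ('a@x.com')")
except sqlite3.IntegrityError:
    print("duplicate rejected")
```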
Exercise with Solution Q1. What is Functional Dependency? Answer: A functional dependency (FD) in a relation represents the relationship between two or more attributes of that relation. It may represent the relationship between a key attribute and a non-key attribute, or between two non-key attributes. The notation used for a functional dependency between two attributes X and Y of a relation is given below: X → Y Here the left-side attribute, X in this case, is called the determinant, and the right-side attribute, Y, is called the dependent.
Q2. What is Armstrong's Six Axioms related to functional dependency? Answer: Six inference rules are commonly listed under Armstrong's axioms. Strictly speaking, the first three (reflexivity, augmentation, transitivity) are Armstrong's axioms; the remaining three can be derived from them. Using these rules, more functional dependencies can be identified from a given initial set of dependencies.
1. Reflexivity rule: If 𝛼 is a set of attributes and 𝛽 ⊆ 𝛼, then 𝛼 → 𝛽 holds.
2. Augmentation rule: If 𝛼 → 𝛽 holds and 𝛾 is a set of attributes, then 𝛾𝛼 → 𝛾𝛽 holds.
3. Transitivity rule: If 𝛼 → 𝛽 holds and 𝛽 → 𝛾 holds, then 𝛼 → 𝛾 holds.
4. Union rule: If 𝛼 → 𝛽 holds and 𝛼 → 𝛾 holds, then 𝛼 → 𝛽𝛾 holds.
5. Decomposition rule: If 𝛼 → 𝛽𝛾 holds, then 𝛼 → 𝛽 holds and 𝛼 → 𝛾 holds.
6. Pseudotransitivity rule: If 𝛼 → 𝛽 holds and 𝛾𝛽 → 𝛿 holds, then 𝛼𝛾 → 𝛿 holds.
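These inference rules can be mechanized: a standard use of them is computing the closure of an attribute set, i.e. every attribute that the set functionally determines. The sketch below (the helper name attribute_closure is illustrative, not from the text) applies the rules until a fixed point is reached.

```python
def attribute_closure(attrs, fds):
    """Closure of an attribute set under a list of FDs.

    fds is a list of (lhs, rhs) pairs of attribute sets. Whenever the
    whole left side is already in the closure, the right side is added
    (this repeatedly applies the transitivity/union rules).
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# Example: from A -> B and B -> C, transitivity gives A -> C.
fds = [({'A'}, {'B'}), ({'B'}, {'C'})]
print(sorted(attribute_closure({'A'}, fds)))  # ['A', 'B', 'C']
```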
Q3. What is Normalization? How is it useful for data integrity? Answer: Normalization is a process of designing the database schema in a way that reduces redundancy or duplication of records and eliminates various anomalies from a database, such as insert, update and delete anomalies. The normalization process takes a relation schema through a sequence of tests to formally declare whether it satisfies the requirements of a certain normal form. This process, which involves evaluating each relation against the conditions for the normal forms, is known as relational design by analysis. Normalization decomposes a large relation into two or more relations to remove the anomalies above, and the decomposition is performed on the basis of predefined rules. The division into multiple relations should be done in such a way that two or more of the decomposed relations share a common key, so that using that common key or attribute the complete record can be retrieved. In short, the decomposition should be lossless and the join should be a lossless join. The major advantages of normalization are given below:
1. It reduces the complexity of a big relation.
2. It helps to reduce redundancy of records in a relation.
3. It helps to eliminate various anomalies in the database.
4. It also helps to maintain atomicity in a relation.
5. The various normal forms address different kinds of issues in a relation.
Q4. Write short note on following with example. a) First Normal Form (1NF) b) Second Normal Form (2NF) c) Third Normal Form (3NF) Answer: a) First Normal Form (1NF) A relation is in First Normal Form (1NF) if it holds the following properties: 1. Each cell contains an atomic value. 2. No attribute has multiple values. 3. The relation has no multi-valued attributes, composite attributes, or combinations of them. b) Second Normal Form (2NF) A relation is in Second Normal Form (2NF) if it holds the following properties: 1. It is in First Normal Form. 2. Every non-prime attribute is fully functionally dependent on the prime attributes. c) Third Normal Form (3NF) A relation is in Third Normal Form (3NF) if it holds the following properties: 1. It is in 2NF. 2. There is no transitive dependency. This normal form is used to reduce data redundancy and provide data integrity. Q5. What is BCNF? Explain with an Example. Answer: BCNF is a more restricted form of 3NF. A relation is in Boyce-Codd Normal Form (BCNF) if it holds the following properties: 1. The relation is in 3NF. 2. For every dependency, the determinant attributes form a candidate key of the table. Example: Consider the following EMPLOYEE table, where employees work in different departments.
EMPLOYEE table:

EID  CITY    DNAME        D_TYPE  DNO
101  Delhi   Accounting   D10     201
101  Delhi   Development  D10     301
102  Mumbai  Admin        D20     202
102  Mumbai  Testing      D20     304
The functional dependencies in this relation are given below.
EID → CITY
DNAME → {D_TYPE, DNO}
Candidate key: {EID, DNAME}
The given EMPLOYEE table is not in BCNF because neither EID nor DNAME is individually a key. To convert this table into BCNF we decompose it into three tables.

E_CITY table:

EID  CITY
101  Delhi
102  Mumbai
E_DEPT table:

DNAME        D_TYPE  DNO
Accounting   D10     201
Development  D10     301
Admin        D20     202
Testing      D20     304
E_DEPT_MAPPING table:

EID  DNO
101  201
101  301
102  202
102  304

FDs of the tables are:
EID → CITY
DNAME → {D_TYPE, DNO}
Candidate keys of the tables are:
For E_CITY table: EID
For E_DEPT table: DNAME
For E_DEPT_MAPPING table: {EID, DNO}
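The decomposition above can be checked to be lossless: joining the three decomposed tables on their common attributes recovers the original EMPLOYEE relation exactly. A quick sketch in SQLite:

```python
import sqlite3

# Load the three BCNF tables with the rows from the example above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE e_city(eid INTEGER, city TEXT);
CREATE TABLE e_dept(dname TEXT, d_type TEXT, dno INTEGER);
CREATE TABLE e_dept_mapping(eid INTEGER, dno INTEGER);
INSERT INTO e_city VALUES (101,'Delhi'),(102,'Mumbai');
INSERT INTO e_dept VALUES ('Accounting','D10',201),('Development','D10',301),
                          ('Admin','D20',202),('Testing','D20',304);
INSERT INTO e_dept_mapping VALUES (101,201),(101,301),(102,202),(102,304);
""")

# Join on the common attributes (EID, DNO) to rebuild the original relation.
rows = con.execute("""
    SELECT c.eid, c.city, d.dname, d.d_type, d.dno
    FROM e_dept_mapping m
    JOIN e_city c ON m.eid = c.eid
    JOIN e_dept d ON m.dno = d.dno
    ORDER BY c.eid, d.dno
""").fetchall()
for r in rows:
    print(r)
# Four rows, matching the original EMPLOYEE table: the join is lossless.
```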
Exercise with Solution Q1. What is SQL (Structured Query Language)? What are the advantages of SQL? Answer: SQL (Structured Query Language) is a high-level computer language used by relational database management systems for database interaction. It is used to define, create and manage databases[1][2]. RDBMSs are gaining popularity because of the ease of access this high-level language provides. SQL is further divided into various types based on their functionality with databases. Advantages of SQL: SQL is a structured query language, not a simple procedural programming language where logic is interpreted step by step, yet like any programming language it follows a standard syntax. ANSI and ISO are the standardization bodies that provide and maintain the standard syntax for all RDBMS providers[3][4][5]. All RDBMSs need to follow at least 80% of the ANSI SQL standard syntax. The major advantages of SQL are given below: 1. SQL has well-defined standards provided by standards organizations like ANSI. 2. SQL is a portable language; it can be executed on small and large-scale computing devices. 3. It is easy to integrate with multiple platforms, and data can be transferred from one platform to another. 4. SQL can also be used to define data structures, control access to the data, and delete, insert and modify occurrences of the data. 5. SQL divides its command sets by functionality; DDL, DML, DQL, DCL and TCL are such sub-divisions. 6. PL/SQL is an additional module in SQL which makes it possible to implement business logic in the database.
Q2. Write short note on following a) Data Definition Language (DDL) b) Data Manipulation Language (DML) Answer: a) Data Definition Language (DDL) These commands are used for defining and manipulating the structure of databases and tables. They are also used to create and manipulate other database objects such as users, views and triggers. Example: CREATE, ALTER, DROP etc. b) Data Manipulation Language (DML)
Data Manipulation Language (DML) is used for manipulating the records in a table: inserting, updating and deleting records. The commands in this category are not auto-committed, which means the changes they make can be rolled back if not committed explicitly. Example: INSERT, UPDATE, DELETE etc. Q3. What is SQL constraint? Explain different types of SQL constraints. Answer: SQL constraints are sets of rules applied to the columns of a table. They restrict the values that can be inserted into those columns and ensure the reliability and integrity of the data. There are six major types of constraints:
• PRIMARY KEY
• UNIQUE
• NOT NULL
• FOREIGN KEY
• DEFAULT
• CHECK
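Several of these constraints can be seen rejecting bad data in a small sketch. The example below uses SQLite via Python; the dept/emp tables and their rows are invented for illustration (note SQLite enforces foreign keys only when the pragma is enabled).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite needs this for FK checks
con.executescript("""
CREATE TABLE dept(
  dno   INTEGER PRIMARY KEY,
  dname TEXT NOT NULL UNIQUE
);
CREATE TABLE emp(
  eid INTEGER PRIMARY KEY,
  sal INTEGER DEFAULT 1000 CHECK (sal >= 0),
  dno INTEGER REFERENCES dept(dno)        -- FOREIGN KEY
);
INSERT INTO dept VALUES (10, 'Sales');
INSERT INTO emp(eid, dno) VALUES (1, 10); -- sal falls back to DEFAULT 1000
""")
sal = con.execute("SELECT sal FROM emp WHERE eid = 1").fetchone()[0]
print(sal)  # 1000

try:
    con.execute("INSERT INTO emp VALUES (2, -5, 10)")    # violates CHECK
except sqlite3.IntegrityError:
    print("CHECK violation rejected")
try:
    con.execute("INSERT INTO emp VALUES (3, 2000, 99)")  # no dept 99: FK fails
except sqlite3.IntegrityError:
    print("FOREIGN KEY violation rejected")
```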
Q4. What is the significance of mathematical operators in SQL? Explain with example. Answer: Mathematical functions are used to perform calculations on retrieved values. The most commonly used mathematical functions, with their use and syntax, are given below.
ABS(N): Returns the absolute value of N. SELECT ABS(-15) FROM DUAL; Output: 15
MOD(M,N): Divides M by N and returns the remainder. SELECT MOD(8,5) FROM DUAL; Output: 3
FLOOR(N): Returns the largest integer less than or equal to N. SELECT FLOOR(7.6) FROM DUAL; Output: 7
CEIL(N): Returns the smallest integer greater than or equal to N. SELECT CEIL(7.6) FROM DUAL; Output: 8
POWER(X,Y): Returns the value of X raised to the power Y. SELECT POWER(2,5) FROM DUAL; Output: 32
ROUND(X): Returns the integer nearest to X. SELECT ROUND(7.4) FROM DUAL; Output: 7
SQRT(X): Returns the square root of X. SELECT SQRT(9) FROM DUAL; Output: 3
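For reference, the same calculations can be reproduced with Python's math module, using the same sample inputs as the DUAL queries above:

```python
import math

print(abs(-15))          # 15  (ABS)
print(8 % 5)             # 3   (MOD)
print(math.floor(7.6))   # 7   (FLOOR: largest integer <= x)
print(math.ceil(7.6))    # 8   (CEIL: smallest integer >= x)
print(2 ** 5)            # 32  (POWER)
print(round(7.4))        # 7   (ROUND)
print(math.isqrt(9))     # 3   (SQRT, integer form)
```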
Q5. What is Subquery? What is the significance of Subquery in SQL? Answer: A subquery, also called an inner query, is a nested query in which one SQL query is embedded inside another. Whenever a SQL query is placed in the WHERE clause of another SQL query, this is known as a subquery. It is used to return data to the outer query based on the output returned by the inner query. A subquery can be used with the SELECT, UPDATE, DELETE and INSERT commands together with SQL operators including <, >, =, <=, BETWEEN, IN, ANY, ALL, EXISTS etc.
Q6. What is the difference between Primary Key and Foreign key constraint? Answer: PRIMARY KEY: This constraint enforces unique and not-null values in each row of the applied column. FOREIGN KEY: This constraint is used to map between two tables; the column may contain NULL values (unless NOT NULL is also applied) and uniqueness need not be maintained, but every non-null value must match a value in the primary key column of the referenced table.
Q7. Explain following operators with example. a) ANY b) ALL c) IN d) BETWEEN Answer: There are some special operators in SQL used for specific functionalities. Some special operators are given below with their syntax and examples.
• ALL
• ANY
• BETWEEN
• LIKE
• IN
The following table, named Employee, will be used for performing the special operations.

Employee
Empno  Ename   Age  Sal   H_Date      Dept
101    Amit    38   2000  11-01-2009  10
102    Kabir   30   4500  13-04-1994  20
103    Monika  25   3000  02-04-2013  10
104    Kavita  35   3000  03-08-2020  30
105    Karan   34   5000  01-02-1998  30
ALL: This operator compares a value against all values returned by the subquery and returns only those rows for which the condition holds for every returned value.
Select * from Employee Where sal > ALL (select sal from Employee where Dept=10);
This query returns the records of all employees whose salary is greater than that of every employee working in department no. 10.
Output:

Empno  Ename  Age  Sal   H_Date      Dept
102    Kabir  30   4500  13-04-1994  20
105    Karan  34   5000  01-02-1998  30
ANY: This operator compares a value against the values returned by the subquery and returns those rows for which the condition holds for at least one of the returned values[11][12].
Select * from Employee Where sal > ANY (select sal from Employee where Dept=10);
This query returns the records of all employees whose salary is greater than that of at least one employee working in department no. 10.
Output:

Empno  Ename   Age  Sal   H_Date      Dept
102    Kabir   30   4500  13-04-1994  20
103    Monika  25   3000  02-04-2013  10
104    Kavita  35   3000  03-08-2020  30
105    Karan   34   5000  01-02-1998  30

BETWEEN: This operator returns the values in a given range. The minimum and maximum are given in the query, and the records satisfying the range are displayed.
Select * from Employee Where sal BETWEEN 4000 AND 5000;
Output:

Empno  Ename  Age  Sal   H_Date      Dept
102    Kabir  30   4500  13-04-1994  20
105    Karan  34   5000  01-02-1998  30

IN: This operator returns rows whose value matches any value in the given list.
Select * from Employee Where dept in (10,30);
Output:

Empno  Ename   Age  Sal   H_Date      Dept
101    Amit    38   2000  11-01-2009  10
103    Monika  25   3000  02-04-2013  10
104    Kavita  35   3000  03-08-2020  30
105    Karan   34   5000  01-02-1998  30
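The BETWEEN and IN queries above can be run as-is against the sample Employee table; the sketch below does so in SQLite. (Note that SQLite does not support the quantified ANY/ALL comparisons; in that engine they would be rewritten with MIN/MAX subqueries.)

```python
import sqlite3

# Load the sample Employee table from the text.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee(empno INTEGER, ename TEXT, age INTEGER,"
            " sal INTEGER, h_date TEXT, dept INTEGER)")
con.executemany("INSERT INTO employee VALUES (?,?,?,?,?,?)", [
    (101, 'Amit',   38, 2000, '11-01-2009', 10),
    (102, 'Kabir',  30, 4500, '13-04-1994', 20),
    (103, 'Monika', 25, 3000, '02-04-2013', 10),
    (104, 'Kavita', 35, 3000, '03-08-2020', 30),
    (105, 'Karan',  34, 5000, '01-02-1998', 30),
])

between = con.execute(
    "SELECT ename FROM employee WHERE sal BETWEEN 4000 AND 5000"
    " ORDER BY empno").fetchall()
print(between)  # [('Kabir',), ('Karan',)]

in_list = con.execute(
    "SELECT ename FROM employee WHERE dept IN (10, 30)"
    " ORDER BY empno").fetchall()
print(in_list)  # [('Amit',), ('Monika',), ('Kavita',), ('Karan',)]
```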
Q8. Using a subquery on the following table, write the query to retrieve all employee records whose salary is less than the maximum salary.

Employee
Empno  Ename   Age  Sal   H_Date      Dept
101    Amit    38   2000  11-01-2009  10
102    Kabir   30   4500  13-04-1994  20
103    Monika  25   3000  02-04-2013  10
104    Kavita  35   3000  03-08-2020  30
105    Karan   34   5000  01-02-1998  30
Answer: Select * from Employee where Sal < (Select max(Sal) from Employee); Q9. Write a query on the above Employee table to retrieve records of all employees whose names start with the letter "K". Answer: Select * from Employee where Ename like 'K%'; Q10. Write a query on the above table to print the Hire Date column in 'DD-MONTH-YYYY' format. Answer: Select TO_CHAR(H_Date, 'DD-MONTH-YYYY') from Employee;
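The Q8 subquery and the Q9 LIKE pattern can be verified against the sample table; a sketch in SQLite (the date-formatting query of Q10 is Oracle-specific and is omitted here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee(empno INTEGER, ename TEXT, sal INTEGER)")
con.executemany("INSERT INTO employee VALUES (?,?,?)", [
    (101, 'Amit', 2000), (102, 'Kabir', 4500), (103, 'Monika', 3000),
    (104, 'Kavita', 3000), (105, 'Karan', 5000)])

# Q8: everyone earning less than the maximum salary (only Karan is excluded).
below_max = con.execute(
    "SELECT ename FROM employee"
    " WHERE sal < (SELECT MAX(sal) FROM employee)").fetchall()
print(len(below_max))  # 4

# Q9: names starting with 'K'.
k_names = con.execute(
    "SELECT ename FROM employee WHERE ename LIKE 'K%'"
    " ORDER BY empno").fetchall()
print(k_names)  # [('Kabir',), ('Kavita',), ('Karan',)]
```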
1. How would you reference column values BEFORE and AFTER you have inserted and deleted triggers? Solution: The BEFORE or AFTER option in the CREATE TRIGGER statement specifies exactly when to fire the trigger body in relation to the triggering statement being executed. In a CREATE TRIGGER statement, the BEFORE or AFTER option is specified just before the triggering statement. For example, the PRINT_SALARY_CHANGES trigger below is a BEFORE trigger.
CREATE TRIGGER print_salary_changes
BEFORE DELETE OR INSERT OR UPDATE ON emp
FOR EACH ROW
WHEN (new.empno > 0)
DECLARE
  sal_diff number;
BEGIN
  sal_diff := :new.sal - :old.sal;
  dbms_output.put('Old salary: ' || :old.sal);
  dbms_output.put(' New salary: ' || :new.sal);
  dbms_output.put_line(' Difference ' || sal_diff);
END;
/
2. How does ROWID help in running a query faster? Solution: ROWIDs are the fastest way to access a row of data, but if you can do an operation in a single DML statement, that is faster than selecting the data first and then supplying the ROWID to the DML statement. If rows are moved, the ROWID will change; rows can move due to maintenance operations like shrinks and table moves. 3. Write a unique difference between a function and a stored procedure? Solution: A function must return a value, but in a stored procedure it is optional: a procedure can return zero or n values. Functions can have only input parameters, whereas procedures can have input or output parameters. Functions can be called from a procedure, whereas procedures cannot be called from a function. 4. What are some predefined exceptions in PL/SQL? Solution:
ACCESS_INTO_NULL
Your program attempts to assign values to the attributes of an uninitialized (atomically null) object.
CASE_NOT_FOUND
None of the choices in the WHEN clauses of a CASE statement is selected, and there is no ELSE clause.
COLLECTION_IS_NULL
Your program attempts to apply collection methods other than EXISTS to an uninitialized (atomically null) nested table or varray, or the program attempts to assign values to the elements of an uninitialized nested table or varray.
CURSOR_ALREADY_OPEN
Your program attempts to open an already open cursor. A cursor must be closed before it can be reopened. A cursor FOR loop automatically opens the cursor to which it refers. So, your program cannot open that cursor inside the loop.
DUP_VAL_ON_INDEX
Your program attempts to store duplicate values in a database column that is constrained by a unique index.
INVALID_CURSOR
Your program attempts an illegal cursor operation such as closing an unopened cursor.
INVALID_NUMBER
In a SQL statement, the conversion of a character string into a number fails because the string does not represent a valid number. (In procedural statements, VALUE_ERROR is raised.) This exception is also raised when the LIMIT-clause expression in a bulk FETCH statement does not evaluate to a positive number.
LOGIN_DENIED
Your program attempts to log on to Oracle with an invalid username and/or password.
NO_DATA_FOUND
A SELECT INTO statement returns no rows, or your program references a deleted element in a nested table or an uninitialized element in an index-by table. SQL aggregate functions such as AVG and SUM always return a value or a null. So, a SELECT INTO statement that calls an aggregate function never raises NO_DATA_FOUND. The FETCH statement is expected to return no rows eventually, so when that happens, no exception is raised.
NOT_LOGGED_ON
Your program issues a database call without being connected to Oracle.
PROGRAM_ERROR
PL/SQL has an internal problem.
ROWTYPE_MISMATCH
The host cursor variable and PL/SQL cursor variable involved in an assignment have incompatible return types. For example, when an open host cursor variable is passed to a stored subprogram, the return types of the actual and formal parameters must be compatible.
SELF_IS_NULL
Your program attempts to call a MEMBER method on a null instance. That is, the built-in parameter SELF (which is always the first parameter passed to a MEMBER method) is null.
STORAGE_ERROR
PL/SQL runs out of memory or memory has been corrupted.
SUBSCRIPT_BEYOND_COUNT
Your program references a nested table or varray element using an index number larger than the number of elements in the collection.
SUBSCRIPT_OUTSIDE_LIMIT
Your program references a nested table or varray element using an index number (-1 for example) that is outside the legal range.
SYS_INVALID_ROWID
The conversion of a character string into a universal rowid fails because the character string does not represent a valid rowid.
TIMEOUT_ON_RESOURCE
A time-out occurs while Oracle is waiting for a resource.
TOO_MANY_ROWS
A SELECT INTO statement returns more than one row.
VALUE_ERROR
An arithmetic, conversion, truncation, or size-constraint error occurs. For example, when your program selects a column value into a character variable, if the value is longer than the declared length of the variable, PL/SQL aborts the assignment and raises VALUE_ERROR. In procedural statements, VALUE_ERROR is raised if the conversion of a character string into a number fails. (In SQL statements, INVALID_NUMBER is raised.)
ZERO_DIVIDE
Your program attempts to divide a number by zero.
5. What are the different schema objects that can be created using PL/SQL? Solution: A schema is a collection of logical structures of data, or schema objects. A schema is owned by a database user and has the same name as that user. Each user owns a single schema. Schema objects can be created and manipulated with SQL and include the following types of objects:
• Clusters
• Database links
• Database triggers
• Dimensions
• External procedure libraries
• Indexes and index types
• Java classes, Java resources, and Java sources
• Materialized views and materialized view logs
• Object tables, object types, and object views
• Operators
• Sequences
• Stored functions, procedures, and packages
• Synonyms
• Tables and index-organized tables
• Views
Other types of objects are also stored in the database and can be created and manipulated with SQL but are not contained in a schema:
• Contexts
• Directories
• Profiles
• Roles
• Tablespaces
• Users
• Rollback segments
6. What is the difference between syntax error and runtime error? Solution: A runtime error is a program error that occurs while the program is running, whereas a syntax error is an error in the syntax of a sequence of characters or tokens that is intended to be written in a particular programming language. 7. What is a Trigger? Name some instances when "Triggers" can be used. Solution: A trigger can be understood as a stored procedure that executes automatically upon some event. Triggers are configured once and are then executed by the DBMS engine without any human intervention. They serve many purposes, such as deriving column values, enforcing referential integrity, event logging, auditing, replication of tables, imposing security authorizations, and preventing invalid transactions. Triggers execute on any of these events and can be defined on a table, view, schema or database. 8. What is the importance of %TYPE and %ROWTYPE data types in PL/SQL? Solution:
%TYPE: %TYPE is used to declare a constant, variable, collection element, etc. with the same data type as a previously declared variable or column. %ROWTYPE: %ROWTYPE is used to declare a record that represents a row in a table; the fields of the record have the same names and data types as the columns of the table or view. 9. Differentiate between the cursors declared in procedures and the cursors declared in the package specifications? Solution: A cursor declared in a package specification is global and can be accessed by other procedures or by procedures in the package. A cursor declared in a procedure is local to that procedure and cannot be accessed by other procedures.
10. What are COMMIT, ROLLBACK and SAVEPOINT statements in PL/SQL? Solution:
COMMIT: Makes the changes permanent in the database.
ROLLBACK: Restores the database state to the last commit.
SAVEPOINT: Saves a point to which you can later roll back.
11. Write a PL/SQL program using WHILE loop for calculating the average of the numbers entered by user. Stop the entry of numbers whenever the user enters the number 0. Solution:
DECLARE
  n NUMBER;
  average NUMBER := 0;
  sum NUMBER := 0;
  count NUMBER := 0;
BEGIN
  n := &input_number;
  WHILE (n <> 0) LOOP
    count := count + 1;
    sum := sum + n;
    n := &input_number;
  END LOOP;
  average := sum / count;  -- assumes at least one non-zero number was entered
  DBMS_OUTPUT.PUT_LINE('Average of entered numbers is ' || average);
END;
12. Write a PL/SQL procedure for selecting some records from the database using some parameters as filters. Solution:
CREATE OR REPLACE PROCEDURE get_employee_details(
  p_salary IN NUMBER,
  p_result OUT SYS_REFCURSOR)
AS
BEGIN
  OPEN p_result FOR
    SELECT * FROM ib_employee WHERE salary = p_salary;
END;
13. Write a PL/SQL code to find whether a given string is palindrome or not. Solution:

DECLARE
   n    NUMBER;
   m    NUMBER;
   temp NUMBER := 0;
   rem  NUMBER;
BEGIN
   n := 12321;
   m := n;
   WHILE n > 0 LOOP
      rem  := MOD(n, 10);
      temp := (temp * 10) + rem;
      n    := TRUNC(n / 10);
   END LOOP;
   IF m = temp THEN
      DBMS_OUTPUT.PUT_LINE('Palindrome');
   ELSE
      DBMS_OUTPUT.PUT_LINE('Not Palindrome');
   END IF;
END;
14. Write PL/SQL program to convert each digit of a given number into its corresponding word format. Solution:

DECLARE
   num            INTEGER;
   number_to_word VARCHAR2(100);
   digit_str      VARCHAR2(100);
   len            INTEGER;
   digit          INTEGER;
BEGIN
   num := 123456;
   len := LENGTH(num);
   DBMS_OUTPUT.PUT_LINE('Input: ' || num);
   FOR i IN 1 .. len LOOP
      digit := SUBSTR(num, i, 1);
      SELECT DECODE(digit, 0, 'Zero ', 1, 'One ', 2, 'Two ', 3, 'Three ',
                           4, 'Four ', 5, 'Five ', 6, 'Six ', 7, 'Seven ',
                           8, 'Eight ', 9, 'Nine ')
        INTO digit_str
        FROM dual;
      number_to_word := number_to_word || digit_str;
   END LOOP;
   DBMS_OUTPUT.PUT_LINE('Output: ' || number_to_word);
END;
1. Explain what is meant by a transaction. Discuss the meaning of transaction states and operations. Solution: Transactions can be considered as the means to access, modify and manipulate a database, using either the read operation or the write operation. Let us understand the concept of transaction management using the standard example of transferring money from user A to user B. If user A wants to transfer 2000 rupees to user B, then there must be a deduction from A’s account and an addition to B’s account. The series of steps which must be followed to perform this transaction are:
• Read the account balance of A, to check whether sufficient balance is there or not.
• Execute the given operation, i.e. A = A - 2000.
• Write the newly calculated balance to A’s account by overwriting the previous balance.
• Read the account balance of B.
• Execute the given operation, i.e. B = B + 2000.
• Write the newly calculated balance to B’s account by overwriting the previous balance.
Transactions are the bread and butter of database management. Any transaction on a database can be in any one of the following states at a given point in time [4]. These states define how a given transaction will proceed further. The transitions of a transaction between states are represented in Figure 1, in which Start and End represent the beginning and end of a transaction. All of the states and their interactions, when combined, represent the life cycle of a transaction in a database.
[Figure 1. State diagram of a transaction: Start, Active, Partially committed, Committed, Failed, Aborted, Terminated, End]
States of a transaction
• Active state: This is the first state in the life cycle of a transaction. When a transaction starts execution, it enters the active state and stays there as long as the sequential steps within the transaction are being executed. While executing, the transaction makes certain changes to the state of the database; these changes are stored temporarily in buffer memory.
• Partially committed state: Once the transaction has completed execution in the active state, it enters the partially committed state. The state is so named because the transaction is not yet complete: it is only partially executed, since its changes have not yet been reflected in the database. A transaction whose changes are written only in buffer memory is called a partially committed transaction.
• Committed state: Once the transaction has executed properly and is ready to be committed to the database, it enters the committed state. This state deals with writing and reflecting the changes of the transaction within the database. Once a transaction reaches this state, it cannot be rolled back; to revoke the changes, a new transaction with the reverted changes must be executed.
• Failed state: If any abrupt event happens while a transaction is executing in the active state or is present in the partially committed state, the transaction moves to the failed state and its changes are not reflected in the database.
• Aborted state: Once the transaction enters the failed state, the changes made by it must be undone. This is done by rolling back the transaction, which then enters the aborted state.
• Terminated state: This is the last state in the life cycle of a transaction. Whether or not a transaction completes execution, it always reaches the terminated state, which represents the closure of the transaction.
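The legal transitions among these states can be sketched as a small lookup table; the Python encoding below (state names and the `TRANSITIONS` map are my own rendering of the diagram) checks whether a sequence of states follows the life cycle:

```python
# Hypothetical transition table for the transaction states described above.
TRANSITIONS = {
    "start":               {"active"},
    "active":              {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "committed":           {"terminated"},
    "failed":              {"aborted"},
    "aborted":             {"terminated"},
    "terminated":          set(),
}

def is_legal(path):
    """True if each consecutive pair of states is a permitted transition."""
    return all(b in TRANSITIONS[a] for a, b in zip(path, path[1:]))

print(is_legal(["start", "active", "partially_committed",
                "committed", "terminated"]))   # True: the success route
print(is_legal(["active", "committed"]))       # False: skips partially_committed
```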
Every new transaction goes through these states while executing. Every transaction begins at Start and arrives at End, no matter which route is taken. The goal of transaction execution is the same in all cases: transactions must adhere to the ACID properties and preserve the integrity as well as the consistency of the database. 2. Discuss the actions taken by the read_item and write_item operations on a database. Solution:
read_item(X) reads a database item X into a program variable: the disk block containing X is located, copied into a main-memory buffer (if it is not already there), and the value of X is copied from the buffer into the program variable. write_item(X) does the reverse: the block containing X is located and brought into the buffer, the value of the program variable is copied into item X in the buffer, and the updated block is eventually written back to disk.
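These two operations can be mimicked in Python with a dict-of-dicts "disk" and a buffer (all names here are illustrative; a real DBMS works on fixed-size disk blocks and usually defers the final disk write):

```python
# Sketch of read_item/write_item over a block-structured "disk" and a buffer.
disk = {"block1": {"X": 10, "Y": 20}}
buffer = {}

def read_item(name, block="block1"):
    if block not in buffer:               # 1. locate block, copy into buffer
        buffer[block] = dict(disk[block])
    return buffer[block][name]            # 2. copy item into program variable

def write_item(name, value, block="block1"):
    if block not in buffer:               # bring the block into the buffer
        buffer[block] = dict(disk[block])
    buffer[block][name] = value           # update the item in the buffer...
    disk[block] = dict(buffer[block])     # ...then write the block back (eagerly here)

x = read_item("X")
write_item("X", x + 5)
print(read_item("X"))  # 15
```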
3. What is meant by interleaved concurrent execution of database transactions in a multi-user system? Discuss why concurrency control is needed, and give informal examples. Solution: Many computer systems, including DBMSs, are used simultaneously by more than one user. This means the computer runs multiple transactions (programs) at the same time. For example, an airline reservation system is used by hundreds of travel agents and reservation clerks concurrently. Systems in banks, insurance agencies, stock exchanges and the like are also operated by many users who submit transactions concurrently to the system. If, as is often the case, there is only one CPU, then only one program can be processed at a time. To avoid excessive delays, concurrent systems execute some commands from one program (transaction), then suspend that program and execute some commands from the next program, and so on; a program is resumed at the point where it was suspended when it gets its turn to use the CPU again. This is known as interleaving. Concurrency control is needed because interleaving the operations of transactions that access the same data can leave the database incorrect, for example through the lost update problem (one transaction's update is overwritten by another's) or the dirty read problem (a transaction reads a value written by a transaction that later aborts).
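A minimal Python sketch of how interleaving can corrupt data: two generator-based "transactions" increment the same balance, and suspending each between its read and its write loses one update (the generator `yield` merely simulates the CPU switching between programs):

```python
# Two "transactions" incrementing the same balance; interleaving their
# read and write steps loses one of the updates.
balance = 100

def steps(amount):
    global balance
    local = balance           # read_item
    yield                     # suspension point: CPU switches transactions
    balance = local + amount  # write_item

t1, t2 = steps(50), steps(50)
next(t1)   # T1 reads 100, then is suspended
next(t2)   # T2 reads 100, then is suspended
for t in (t1, t2):           # resume each transaction to completion
    try:
        next(t)
    except StopIteration:
        pass
print(balance)  # 150, not 200 -- T2's write overwrote T1's update
```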
4. Discuss different types of possible transaction failures, with some examples.
Solution: A transaction has to abort when it fails to execute or when it reaches a point from where it cannot go any further. This is called transaction failure, where only a few transactions or processes are affected. Reasons for a transaction failure include:
• Logical errors − where a transaction cannot complete because of a code error or some internal error condition.
• System errors − where the database system itself terminates an active transaction because the DBMS is unable to execute it, or has to stop it because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.
5. Transactions cannot be nested inside one another. Why? Support your answer with an example. Solution: A nested transaction is a transaction that is created inside another transaction. A nested transaction’s purpose is to support transactions in stored procedures that can be called from a process already in a transaction or from a process that has no active transaction. However, such nesting is only syntactic: committing an inner transaction does not make its changes permanent, because only the outermost COMMIT does, so the inner transaction is not truly independent. The following example illustrates this:

BEGIN TRAN Tran1
GO
BEGIN TRAN NestedTran
GO
INSERT INTO Table1 DEFAULT VALUES
GO 10
COMMIT TRAN NestedTran
GO
SELECT * FROM Table1
GO
ROLLBACK TRAN Tran1

Since the outer transaction is rolled back, the entire transaction, including the work of the "committed" inner transaction, is rolled back.
6. Compare binary locks with exclusive/shared locks. Why is the latter type of locks preferable?
Solution:
Binary locks: a binary lock has only two states, locked and unlocked. It is simple but too restrictive, and it is not used in practice. Exclusive/shared locks: these provide more general locking capabilities and are used in practical database locking schemes; a read lock is a shared lock, and a write lock is an exclusive lock. The latter type is preferable because a read-locked item still allows other transactions to read the item (the lock is shared), whereas a write-locked item is held exclusively by a single transaction. Three locking operations are used: read_lock(X), write_lock(X) and unlock(X).
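A toy Python lock table illustrating the shared/exclusive rules above (requests simply fail rather than block, and class and method names are illustrative):

```python
# Toy shared/exclusive lock on a single item (no blocking or queueing).
class Lock:
    def __init__(self):
        self.readers = set()   # transactions holding the shared (read) lock
        self.writer = None     # transaction holding the exclusive (write) lock

    def read_lock(self, tx):
        if self.writer is not None and self.writer != tx:
            return False       # conflicts with an exclusive holder
        self.readers.add(tx)
        return True

    def write_lock(self, tx):
        if self.writer not in (None, tx) or self.readers - {tx}:
            return False       # any other holder, shared or exclusive, conflicts
        self.writer = tx
        return True

    def unlock(self, tx):
        self.readers.discard(tx)
        if self.writer == tx:
            self.writer = None

x = Lock()
print(x.read_lock("T1"), x.read_lock("T2"))  # True True -- read locks are shared
print(x.write_lock("T3"))                    # False -- blocked by the readers
```

A binary lock, by contrast, would already have refused T2's read, which is exactly why the shared/exclusive scheme permits more concurrency.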
7. Discuss why the schedule below is a non-serializable schedule. What went wrong with the multiple-mode locking scheme used in the example schedule?
Solution:
The reason this non-serializable schedule occurs is that the items Y in T1 and X in T2 were unlocked too early. To guarantee serialisability, we must follow an additional protocol concerning the positioning of locking and unlocking operations in every transaction. The best-known such protocol is two-phase locking, under which all locking operations precede the first unlock operation in the transaction.
8. Given the graph below, identify the deadlock situations.
[Figure: four wait-for graphs over transactions T1-T6 — (a) without deadlock, (b) without deadlock, (c) with deadlock, (d) with deadlock]
Solution: In figure (a) no cycle is formed, so there is no deadlock. In figure (b) no cycle is formed either, so again there is no deadlock. In figure (c) the cycle T2→T6→T5→T2 is formed, so there is a deadlock. In figure (d) the cycle T2→T3→T6→T2 is formed, so there is a deadlock.
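Detecting such deadlocks amounts to finding a cycle in the wait-for graph; a small Python sketch (the `wait_for` dict encoding of the graph is an assumption about how it is stored):

```python
# Deadlock detection: a cycle in the wait-for graph means deadlock.
def has_deadlock(wait_for):
    """wait_for maps each transaction to the set of transactions it waits on."""
    def dfs(node, stack):
        if node in stack:        # revisited a node on the current path: cycle
            return True
        stack.add(node)
        found = any(dfs(n, stack) for n in wait_for.get(node, ()))
        stack.discard(node)
        return found
    return any(dfs(t, set()) for t in wait_for)

# Figure (c): T2 -> T6 -> T5 -> T2 forms a cycle.
print(has_deadlock({"T2": {"T6"}, "T6": {"T5"}, "T5": {"T2"}}))  # True
print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}}))                # False: a chain
```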
9. What is two-phase locking protocol? How does it guarantee serialisability? Solution:
Two-Phase Locking Protocol, also known as the 2PL protocol, is a method of concurrency control in DBMS that ensures serializability by applying locks to the transaction data, blocking other transactions from accessing the same data simultaneously, and thereby helps to eliminate concurrency problems. The execution of a transaction under this protocol proceeds in three parts:
• In the first part, when the transaction begins to execute, it requests permission for the locks it needs.
• In the second part, the transaction obtains all the locks. The third part starts as soon as the transaction releases its first lock.
• In the third part, the transaction cannot demand any new locks; it only releases the acquired locks.
The Two-Phase Locking protocol thus allows each transaction to make lock and unlock requests in two phases:
• Growing phase: the transaction may obtain locks but may not release any lock.
• Shrinking phase: the transaction may release locks but may not obtain any new lock.
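The growing/shrinking discipline can be sketched as a small Python class that refuses new locks once the first unlock has happened (a simulation of the rule only, not a real lock manager):

```python
# A transaction tracking its 2PL phase: once it unlocks anything,
# it may not acquire new locks (growing phase -> shrinking phase).
class TwoPLTransaction:
    def __init__(self):
        self.shrinking = False
        self.locks = set()

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True   # the first unlock ends the growing phase
        self.locks.discard(item)

t = TwoPLTransaction()
t.lock("X"); t.lock("Y")        # growing phase
t.unlock("X")                   # shrinking phase begins
try:
    t.lock("Z")                 # illegal under 2PL
except RuntimeError as e:
    print(e)
```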
It is true that the 2PL protocol offers serializability. However, it does not ensure that deadlocks cannot happen, so deadlock detection or prevention mechanisms are still needed alongside it. 10. Analyse the relationships among the following terminology: problems of concurrent access to a database (lost update, uncommitted dependency); serializable schedule; basic 2PL; deadlock; conservative 2PL; wait-die and wound-wait Solution:
The lost update problem occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect.
That is, interleaved use of the same data item causes a problem when an update operation from one transaction overwrites an update from a second transaction. An example explains the problem clearly. Suppose the two transactions T1 and T2 introduced previously are submitted at approximately the same time, for instance when two travel agency staff book flights for customers at more or less the same time from the same or different offices, and suppose that their operations are interleaved by the operating system.
Uncommitted dependency occurs when a transaction is allowed to retrieve or (worse) update a record that has been updated by another transaction, but which has not yet been committed by that other transaction. Because it has not yet been committed, there is always a possibility that it will never be committed but rather rolled back, in which case, the first transaction will have used some data that is now incorrect - a dirty read for the first transaction.
A schedule S of n transactions is a serialisable schedule if it is equivalent to some serial schedule of the same n transactions. Notice that for n transactions, there are n! possible serial schedules, and many more possible non-serial schedules. We can form two disjoint groups of the non-serial schedules: those that are equivalent to one (or more) of the serial schedules, and hence are serialisable; and those that are not equivalent to any serial schedule, and hence are not serialisable.
A transaction is said to follow the two-phase locking protocol (basic 2PL protocol) if all locking operations (read_lock, write_lock) precede the first unlock operation in the transaction. Such a transaction can be divided into two phases: an expanding (or growing) phase, during which new locks on items can be acquired but none can be released; and a shrinking phase, during which existing locks can be released but no new locks can be acquired.
Conservative 2PL: a variation of the basic 2PL is conservative 2PL, also known as static 2PL, which is a way of avoiding deadlock. Conservative 2PL requires a transaction to lock all the data items it needs in advance. If at least one of the required data items cannot be obtained, then none of the items are locked; rather, the transaction waits and then tries again to lock all the items it needs. Although conservative 2PL is a deadlock-free protocol, this solution further limits concurrency.
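The all-or-nothing pre-declaration step of conservative 2PL can be sketched in Python (the flat `lock_table` dict mapping items to their holder is an assumed representation):

```python
# Conservative (static) 2PL: acquire every needed lock up front, or none at all.
def try_lock_all(lock_table, tx, items):
    if any(lock_table.get(i) not in (None, tx) for i in items):
        return False                 # some item unavailable: lock nothing, wait
    for i in items:
        lock_table[i] = tx           # all available: lock everything at once
    return True

locks = {"X": "T9"}                  # X is already held by T9
print(try_lock_all(locks, "T1", ["X", "Y"]))  # False -- T1 waits, holding nothing
print(try_lock_all(locks, "T1", ["Y", "Z"]))  # True  -- both acquired together
```

Because a waiting transaction never holds any lock, it can never participate in a hold-and-wait cycle, which is why the protocol is deadlock-free.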
Two schemes that use transaction timestamps to prevent deadlock are wait-die and wound-wait. Suppose that transaction Ti tries to lock an item X, but is not able to because X is locked by some other transaction Tj with a conflicting lock. The rules followed by these schemes are as follows:
• Wait-die: if TS(Ti) < TS(Tj) (Ti is older than Tj), then Ti is allowed to wait; otherwise Ti is aborted (Ti dies) and restarted later with the same timestamp.
• Wound-wait: if TS(Ti) < TS(Tj) (Ti is older than Tj), then Tj is aborted (Ti wounds Tj) and restarted later with the same timestamp; otherwise Ti is allowed to wait.
In wait-die, an older transaction is allowed to wait for a younger transaction, whereas a younger transaction requesting an item held by an older transaction is aborted and restarted. The wound-wait approach does the opposite: a younger transaction is allowed to wait for an older one, whereas an older transaction requesting an item held by a younger transaction preempts the younger transaction by aborting it. Both schemes end up aborting the younger of the two transactions that may be involved in a deadlock, and it can be shown that both techniques are deadlock-free.
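The two rules reduce to a simple timestamp comparison; a minimal Python sketch (a smaller timestamp means an older transaction, and the returned strings are just labels for the outcomes):

```python
# Outcome when Ti requests an item held by Tj, under each prevention scheme.
def wait_die(ts_i, ts_j):
    return "Ti waits" if ts_i < ts_j else "Ti dies (abort, restart later)"

def wound_wait(ts_i, ts_j):
    return "Tj wounded (abort, restart later)" if ts_i < ts_j else "Ti waits"

print(wait_die(5, 9))    # older requester waits
print(wait_die(9, 5))    # younger requester dies
print(wound_wait(5, 9))  # older requester wounds the younger holder
print(wound_wait(9, 5))  # younger requester waits
```

Either way, only the younger of the two transactions is ever aborted, so no cycle of mutual waiting can form.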
11. How does the granularity of data items affect the performance of concurrency control? What factors affect selection of granularity size for data items? Solution: A data item can be chosen at any of several levels of granularity: a field value of a record, a database record, a disk block, a complete file, or the whole database. The granularity affects the performance of concurrency control and recovery: the coarser the granularity, the lower the degree of concurrency permitted (locking a whole file blocks every transaction that needs any part of it) but the smaller the number of locks to be managed; the finer the granularity, the higher the degree of concurrency but the greater the locking overhead. The best granularity size therefore depends on the types of transactions involved: record-level granularity suits transactions that each access only a few records, while block- or file-level granularity suits transactions that typically access many records of the same file.
12. Discuss multi-version two-phase locking for concurrency control. Solution:
Multiversion concurrency control techniques keep the old values of a data item when the item is updated, so several versions (values) of an item are maintained. When a transaction requires access to an item, an appropriate version is chosen to maintain the serialisability of the concurrently executing schedule, if possible. The idea is that some read operations that would be rejected under other techniques can still be accepted, by reading an older version of the item. An obvious drawback of multiversion techniques is that more storage is needed to maintain multiple versions of the database items. However, older versions may have to be maintained anyway, for example for recovery purposes. In addition, some database applications require older versions to be kept to maintain a history of the evolution of data item values. The extreme case is a temporal database, which keeps track of all changes and the times at which they occurred. In such cases there is no additional penalty for multiversion techniques, since older versions are already maintained.
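The "choose an appropriate version" step can be sketched in Python as picking the newest version whose write timestamp does not exceed the reader's timestamp (the per-item version-list layout is an assumption for illustration):

```python
# Multiversion read: keep (write_timestamp, value) versions per item and
# serve each reader the newest version written at or before its timestamp.
import bisect

versions = {"X": [(1, "a"), (5, "b"), (9, "c")]}  # sorted by write timestamp

def mv_read(item, read_ts):
    ts_list = [ts for ts, _ in versions[item]]
    i = bisect.bisect_right(ts_list, read_ts) - 1  # newest ts <= read_ts
    if i < 0:
        raise LookupError("no version visible at this timestamp")
    return versions[item][i][1]

print(mv_read("X", 7))   # 'b' -- the version written at timestamp 5
print(mv_read("X", 20))  # 'c' -- the newest version
```

A reader with timestamp 7 is thus never rejected by the write at timestamp 9; it simply reads the older version, which is exactly the gain the text describes.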