Db handout full

Page 1

Database Management [436304]

AUGUST 14, 2014

(Image credit) Hauman, G. (2009, March 11). The Manga Guide To Databases. Retrieved from Comicmix: http://www.comicmix.com/news/2009/03/11/the-manga-guide-to-databases/

DATABASE MANAGEMENT 436 304 JIRAPON TANASANTI SILPAKORN UNIVERSITY


Database Management [436304]

Forward Dear arts students Database is one of the most crucial part of any information system. For small scale, database can tell you how much did you pay for food in this month. A company knows its sales. Government knows its people. Have you ever wonder, how investigators in CSI series can find one individual in a hundred of suspects? These things are made possible by database technology. Anything that involve storing data and find information out of them are, most of the time, relate to database technology. Database management course teaches database technology. Students who pass this course should be able to create and use their own database. They would have good knowledge in database design and experienced a basic database system implementation. They can also troubleshoot basic database system and, if things are out of hand, communicate with database specialist for solution to the problems. Be warn, this course is not as carefree as the cover of this handout. This document is a handout for 436 304 Database management course, aimed for registered students. Information written in this handout is intended to be studied in combination with class lecture. So, if you have questions, attend the class and ask away!


Database Management [436304]

Table of contents INTRODUCTION ..................................................................................................................................................... 1 Why should I learn about database management ............................................................................................................ 1 What is database .............................................................................................................................................................. 1 Where to learn more outside the class ............................................................................................................................. 1 Course and teaching policy ............................................................................................................................................... 2

RELATIONAL DATABASE ......................................................................................................................................... 4 History lesson .................................................................................................................................................................... 4 Relational Database .......................................................................................................................................................... 6

RELATIONAL DATABASE BASIC PRINCIPLES ............................................................................................................ 9 Keys ................................................................................................................................................................................... 9 Relationship (cardinality) ................................................................................................................................................ 11 Dependency .................................................................................................................................................................... 12

NORMALIZATION ................................................................................................................................................. 14 What is so “normal” about it?......................................................................................................................................... 14 Normal forms .................................................................................................................................................................. 15 Why should I do it? Or why I shouldn’t? ......................................................................................................................... 22

ENTITY RELATIONSHIP (ER) DIAGRAM ................................................................................................................. 25 Chen’s VS crow’s foot...................................................................................................................................................... 25 Crow’s Foot Diagram ....................................................................................................................................................... 26

JOB SPECIFIC DATABASE ...................................................................................................................................... 29 Operational database...................................................................................................................................................... 29 Issues about Operational database ................................................................................................................................. 30 Analytical database ......................................................................................................................................................... 30 Compare Operational and Analytical Database .............................................................................................................. 32

CASE STUDY ......................................................................................................................................................... 35 How to design a database (for dummy) .......................................................................................................................... 35 How not to design a database ........................................................................................................................................ 37

MYSQL & SQL COMMANDS.................................................................................................................................. 41 DBMS ......................................................................................................................................................................41 mySQL ............................................................................................................................................................................. 41

SQL COMMANDS .........................................................................................................................................................42 Getting start .................................................................................................................................................................... 42 Use & Show ..................................................................................................................................................................... 45 CREATE ............................................................................................................................................................................ 46 ALTER .............................................................................................................................................................................. 48 DROP ............................................................................................................................................................................... 50 Where can I learn more? ................................................................................................................................................ 51

SQL COMMANDS II .............................................................................................................................................. 52 Intro to select .................................................................................................................................................................. 52 INSERT ............................................................................................................................................................................. 52 UPDATE ........................................................................................................................................................................... 54


Database Management [436304] DELETE ............................................................................................................................................................................ 55

SQL COMMANDS III ............................................................................................................................................. 56 SELECT ............................................................................................................................................................................. 56 JOIN ................................................................................................................................................................................. 58 WHERE ............................................................................................................................................................................ 62 NESTED SELECT ............................................................................................................................................................... 67

SQL FUNCTIONS ................................................................................................................................................... 69 Aggregate Functions ....................................................................................................................................................... 69 Scalar Functions .............................................................................................................................................................. 71 Conclusion ....................................................................................................................................................................... 74

DATABASE ADMINISTRATION AND TOOLS........................................................................................................... 75 DATABASE ADMINISTRATION..........................................................................................................................................75 Indexing........................................................................................................................................................................... 75 Back up............................................................................................................................................................................ 76 Optimization ................................................................................................................................................................... 77

TOOLS .......................................................................................................................................................................79 MySQL workbench .......................................................................................................................................................... 79 PHPmyadmin................................................................................................................................................................... 87 Conclusion ....................................................................................................................................................................... 90

BIBLIOGRAPHY..................................................................................................................................................... 91


Database Management [436304]

INTRODUCTION This is a handout for database management course. In this course, students will learn about database technology. First, students should realize importance of database technology in everyday life. They should know what a database is and where to learn more on their own. Lastly, course and teaching policy is explained.

WHY SHOULD I LEARN ABOUT DATABASE MANAGEMENT Database is a technology that make the world as you know it. Any information that can be stored and accessed quickly are most likely stored in a database. Database allows department stores to use barcode to speed up selling process. It allows university to keep track of students’ progress. It allows cooperation to track their inventories. It allows your progress in online game to be saved, so you don’t have to restart every time you login. By the way, the login process is also possible with database technology. Knowing database technology allows students to understand how various system works. With enough knowledge, students can design and implement database system to fit their needs in future occupations. Students can troubleshoot problems with database and can communicate with database specialist. In short, knowledge of database technology allows students to store and retrieve data effectively.

WHAT IS DATABASE Database is an organized collection of data. By organized, it means that data is stored in relevant fashion. Data can be quickly stored and quickly retrieved. Some database allows more advance analytical technique. Lastly, by collection, it means a group of data with shared characteristics. Technically, database includes any organized collection of data such as card index and classic document folder. However, this course focuses on computer database.

WHERE TO LEARN MORE OUTSIDE THE CLASS Knowledge about database is a widely available online. However, most of them are technical information such as SQL syntax for manipulating data. W3C (http://www.w3schools.com/sql/) is a trustworthy website for standard SQL references. More specific technical information can be found from reference pages by database management system or DMBS provider such as MySQL reference manual (http://dev.mysql.com/doc/). Make sure you picked the right version.

1


Database Management [436304]

Reading reference webpages is an important skill. If you can read error message, you can fix the problem. Otherwise, tough luck!

For database theory, a lot of good books on database technologies are in libraries. Online sources such as Wikipedia are worth reading. However, a large portion of these knowledge is in English. Therefore, it is highly recommended that students improve their English skills. It is also important to read technical documents. The more students read technical documents, the better their skill to understand them.

COURSE AND TEACHING POLICY As you can see, handouts are in English. This policy aims to prepare students for coming Asean Economic Community or AEC. Students need English skills to survive in upcoming competition. The largest portion of database knowledge is also recorded in English. DBMS reference manuals are in English. Therefore, English is the crucial for this course.

All handouts provided in this course will be in English‌ ‌If students cannot read the handout, it is recommended that they attend every lecture which will provided in Thai.

However, the instructor understand students’ situation. Lectures are to be provided in Thai to ensure correct understanding of database knowledge. Students must read the handout, which will be distributed by the printing shop on every Monday, before class. Any questions from self-learning are to be asked during class. PowerPoint presentations shown in class might not be comprehensible without lecture, do not depend on it. If a student can identify mistakes, such as grammatical mistake, left uncorrected in the handouts, the student can identify those mistakes to the instructor. A reward is promised at the end of the course. To ensure students understanding on every lesson, each session begins with pre-test. The test is scored. This is designed to stimulate self-learning and also used as attendant check. Then, a post-test is taken before a session ends. This test is also scored. This is designed to reflect students understanding on themselves and encourage concentration in the session. A term paper must be submitted for this course. Students are to assemble a team of their own. No solo work allowed. Teams decide their research topic on database. At least, the project should involve extensive learning on a topic related to database

2


Database Management [436304] technology. The best project should be improvement of new database management technique. A proposal must be submitted before midterm exam. After that, students must submit and present a term report at the last session before the final exam. In case that presentation exceed class time, additional session might be issued as circumstances demand.

Peer review Students have to evaluate their peers after working together. Students who take no part in the work will earn less or no score at all.

3


Database Management [436304]

RELATIONAL DATABASE Database technology has been developed for a long time. This handout briefly explore its development before focusing on relational database.

HISTORY LESSON History of database system shows development of the technology. Learning this development allows students to understand what and why things have come to be. With this understanding, they can get the core aspect of each development step. Database technology is developed along other computer technologies in order to handle larger and more complex data. Database technology can be divided into three eras based on data modelling. These eras are navigational, relational and post relational. N AVIGATIONAL

DATABASE

Navigational database is the first type of database. Still based on file-like structure, navigational database handle file hierarchically, like file folder. The system is implemented in a tree like structure which a file is connected to another file. This form a long chain of relation. ROOT Navigational database provide a number of advantages such as flexibility because the data is linked to its parent only and not limited by any 1 2 complex structure. It is easy to understand from user’s point of view. It can be used on almost any type of data, although not very good in the long 1.1 1.2 2.1 run. Despite the advantages, it has some FIGURE 1: HIERACHICAL DATABASE MODEL drawbacks. It is difficult to implement and extremely hard to manage. This is due to its lack of ROOT standard. Think about a long list of necklace which each single lock of chain is a data node. Then, as more nodes are added, the chain becomes more strangled. After all, the person who make the chain resigns. Who will untangle the mess? This is exactly what happen to a non-standard database system. Despite its complications, navigational database does not die. The technology might not suitable for standard data like student profiles but it is good for non-standard data like webpage components. An

1

2 2.1

1.1 1.2

FIGURE 2: NETWORK DATABASE MODEL

4


Database Management [436304] example for navigational database engine currently in usage is Document Object Model or DOM. Navigational database also developed into network database which is like hierarchical database but nodes can be linked upwards. Nevertheless, navigational database system is still a mess after a while of using. The, a new type of database system which is developed for standard data model is born. The database that will not become entangled mess of chain after a large data is added. R ELATIONAL

DATABASE

Edgar Codd start developing a new database system from his hard disk development office. A hard disk cannot use a long chain of data because it must be searched quickly. So, Edgar proposed a new data model based on tables. Let’s compare this new data model to navigational model. Navigational model keep data in a form of a node, like a lock of chain. This structure is also called, linked list. A list which each one of its member linked to another. The new structure treat a data as a record in a table. This table is also called scheme or entity (not a dining table). Like a spreadsheet with fixed column and fixed data type, data can be added to the table without creating a long chain of complex parent and child relationship. Each column of a table can be selected for an index, which speed-up searching. Data redundancy is also controlled using keys. This is a new type of database system known as relational database. This type of database will be introduced further in this handout. It is the most prevalent database system by the time this handout is made. However, a new trend of database technology which allows vast variety of data to be stored and retrieved is being promoted. This is a post-relational database. It is like navigational database remade and rearmed with powerful computer of the modern age. P OST - RELATIONAL

DATABASE

After globalization takes the full flight, companies began to expand over countries border. Requirement of database system changes. First, it is the requirement of anything that can bring the user to the data, which results in navigational database. Then, it is the system that can manage data efficiently, which results in relational database. This time, it is the system that can expand and manage an extremely large data from globalization. A new type of database system is proposed. It is the database that can handle data size of world population. Used by Twitter and Facebook, NoSQL gains recognition as an emerging database system.

5


Database Management [436304] The name NoSQL is not “No SQL”. It is “Not Only SQL”. SQL is a language for relational database manipulation. The name of NoSQL is to imply that this new type of database system allows SQL usage and then some more features. NoSQL database system is designed for expansion. For relational database to handle a large data is not uncommon. However, data from globalization is increasingly larger and larger. Database maintenance tasks like backup takes an extremely long time and a collapse of one data center means the death of the business. Most importantly, there is no single hard disk to store a data size of world population. The answer to this question is to expand horizontally, add more computers to the existing ones. This is where NoSQL excels. There are many NoSQL database engine such as MongoDB which treats data like a document. Their keys are stored separately for quick search which the data itself can be manipulated as a file. This results in loosely formatted database system which is not limited by table structure like relational database. However, it is very new form of technology and this handout is not focusing on it. Students are free to dig deeper into this technology if they choose NoSQL for the topic for their term report.

RELATIONAL DATABASE Relational database is the most prevalent database system to the time this handout is being made. It is developed after navigational database. With it, Structured Query Language or SQL, the language for manipulating relational database, is born and database system is standardized for everyone. R ELATIONAL

DATABASE STRUCTURE AND TERMINOLOGY

A database contain many data. Data is organized into entity, also known as table and relation but not to be confused with relationship between tables. In each table, there are many columns which represent an attribute of the data being stored. Therefore, a column is also known as attribute. Some books call column a field. Each set of data stored into a table creates a record, also known as a row or a tuple. Every intersection of a row and a column makes a cell. This structure is shown in figure 3.

6


Database Management [436304] Table DATABASE

Column Row ->> Row ->> Row ->>

Column Cell Cell Cell

Column Cell Cell Cell

FIGURE 3: DATABASE STRUCTURE USING SPREADSHEET TERMS

There are more than just data storage in relational database. Sometimes, data is retrieved by a set of instruction. This data viewing slot is called a query or a view. Use can view information from the view but not insert or update anything. For other features, some fields can be configured as an index to facilitate quick search. Other terms should be noted as the lessons progress. This handout series do not summarize all the terms for you, so you must make a summary of terms yourself. This is aimed to stimulate selflearning. Are you sure? Check your understanding by writing down synonym of these words. Table Column Row

Table: Entity, Relation Column: Attribute, Field Row: Record, Tuple

You may notice that database terms are sometime used interchangeably with spreadsheet terms. For example, a record can also be called a row. This is because the database is visualized using table structure. However, note that these terms have their places and are not exactly the same. For example, an entity which means a table cannot have duplicate tuple, which means a row, while a table could contain duplicated rows. This might be confusing but it is important to know that these words has unique meaning. If you read any books about database, make sure you read definition of each word and try not to assume its definition otherwise. For this handout, table, field and row are used. In special occasion, some different terms would be used such as when discussing about object-oriented design, an attribute might be used as a field.

Relation database is named after the relational data model by Edgar Codd. In this database system, users configure the data they need to keep before actually storing those data. For each data stored into a table, users link it to other tables to make information. The link is called relationship. For example, a log of userID A accessing computerID 436 does not provide any information. With linked relationship, the database shows that userID A is student

7


Database Management [436304] Norman and computer 436 is a teacher’s computer in exam storage room. This create information, that student Norman do something to teacher’s computer, from data, of a userID do something with a computer.

So, relational database is best, huh? Relational database is good and widely used but it is not perfect. What are their disadvantages? Are there any more advantages? Why should you use relational database for your data storage, and why should not? Discuss‌this might be in an exam. Who knows?

Relational database is designed to improve data integrity. Before its development, data is stored in files. To access a data, users must open the file and read from/write to it. However, some changes in a file does not affect the other. For example, if a customer changes his name on customer profile, delivery order does not change with it. Thus, the goods send to a no longer existing person. With relational database, customer name in delivery order is linked with customer name in customer profile. Changes in customer profile also affect delivery order. In addition, customer name is stored in one place which reduce redundancy while saving storage space, too.

Relational database has a set of design rules. This rules are aimed to eliminate redundancy in the data and to keep the data in the most atomic level. Following these rules ensure data integrity and prevent mistakes when manipulating data. These rules are known as normalization, which is discussed in the next handout.

8


Database Management [436304]

RELATIONAL DATABASE BASIC PRINCIPLES Relational database is based on relational data model. In relational data model, there are some basic principles necessary for understanding relational database. This handout explains those topics.

KEYS Keys are unique attribute which can be used to identify the record. Relational database uses key to find data when queried. There are many keys in relational database. Understanding these keys concepts is crucial for database designer. S UPER

KEY

Some attributes are unique. These attributes have potential to represent the entire record. For example, student code can represent a student. These attributes are treated as keys. All attributes or groups of attributes which can represent a record are super keys. In other words, super key is an attribute or a group of attributes which can be used as a key. Note that candidate key can consist of many attributes. For example, a key for “calendar event� can be [Year, Month, Date]. This means one cannot use only [Year] to identify an event. [Month] and [Date] cannot identify the event too. Only combination of all [Year, Month, Date] is unique and is super key. S IMPLE

KEY

A simple key is one attribute which can uniquely identify a record. This attribute cannot be reduced into many attributes. For example, a studentID is a simple key. It can identify student. However, StudentFullName, which includes first name and last name, is not a simple key because it can be reduced into first name attribute and last name attribute. C OMPOSITE

KEY

Composite key is a group of attributes that is a key, can uniquely identify a record. Do not confuse this with a compound key. C OMPOUND

KEY

This is similar to a composite key but a compound key contains more than one attribute that can uniquely identify a record.

9


Database Management [436304] C ANDIDATE

KEY

Candidate key is the smallest super key which can be used to identify a record. There must not be a super key inside a candidate key. In other word, candidate key cannot have a super key in its subset. Note that a candidate key, like super key, may be a group of attributes. It can be a composite key but it cannot be a compound key. P RIMARY

KEY

Primary key is the key which is chosen from candidate keys to represent the record. Data may have more than one fields which are unique and capable of representing the record. However, not all those fields are set as keys. The primary key is selected from those keys. Primary is one of the most important part of a table. To be chosen as a primary key, the key must meet these requirements: 1. Must be unique. 2. Cannot be Null. Lastly, the primary key becomes index used for searching. S ECONDARY

KEY

/

ALTERNATIVE KEY

After primary key is chosen, the rest of the candidate keys become secondary key or alternative key. In this sense, these are keys that are not used. They can be set as indexes to facilitate searching. F OREIGN

KEY

In order to establish relation between tables, one of the keys in one table must present in another table as foreign key. It is a key that put into other table in relation. The purpose of foreign key is to allow tables to join data and create information. Foreign key is demonstrated in figure 1. A primary key from table [country] is country_id. That primary key is put into table [city] to indicate relationship between city and country. In this case, country_id is a foreign key in table [city]. Note the differences in relationship line which indicate 1-M relationship.

FIGURE 4: FOREIGN KEY

10


Database Management [436304]

RELATIONSHIP (CARDINALITY) Relationship between tables indicate occurrence in a table that come from another table. Relationship is critical in designing a database because it allows DBMS to manipulate data effectively. There are three types of relationship in relational database. Note that for every type of relationship, data in one of the tables might be forced to have at least one record or allowed to be none (zero record). 1-1 One-to-one relationship means one record in the left table can be linked to only one record in the right table. This type of relationship is sometime not necessary for the design since the designer can put 1-1 tables into one table. For example, a person can have only one death record. Table [person] has one-to-one relation with table [death record]. As you can see, designer could put death record as an attribute in table [person]. 1- M One-to-many relationship is the most crucial relationship in database design. It means one record in the left table can be linked to many records in the right table. This relationship also eliminates repeated values inside a table. It shorten the data using code. Therefore, designer should identify 1-M relationship in data and design tables for it. For example, one person may have many phones. A person is represented by citizenID which is a primary key in table [person]. A phone is represented by PhoneNumber which is a primary key in table [phone]. [person] is in 1-M relationship with [phone]. M-M

Many-to-many indicates that many record in the left table can be linked to many record in the right table. This relation is a mess in database design because database cannot operate effectively with this kind of relation. It is critical to note, though, that this type of relationship present in many natural data. Database designer must identify it, try to reduce the relation into 1-M in order to design the tables and allow DBMS to operate effectively. For example, husbands may have many wives and their wives may have many husbands. This situation is not just life-threatening for both parties but for database designer as well. Try to think how a designer could put a foreign key into these tables without repeating part of the records. Most of the time, it is not possible. Therefore, M-M

11


Database Management [436304] relationship induces repeated values inside that tables. Database designer could eliminate repeated data by reducing M-M relationship to 1-M relationship with a middle table.

DEPENDENCY Dependency is a situation when a data is affected by another data, or depend on another data. This creates many problems for relational database designer to solve as eliminate most of these dependencies allow data to be reduced to its atomic form. An example of dependency is the key and other attributes. All attributes in the table is dependent to the primary key. This is a good dependency which allows database to operate without anomalies. P ARTIAL

DEPENDENCY

As mentioned above, attributes depends on the primary key. However, sometimes primary key is a composite key, meaning it contains more than one attributes. In this case, a partial dependency may occur. Partial dependency is a situation when some attributes can be identified using only part of the primary key. Partial dependency is demonstrated in table 1. TABLE 1: PARTIAL DEPENDENCY

ID 01 02

NAME Basic computer Basic programming

BOOK Basic computer Basic computer

TEACHER Mod Ant

In table 1, [ID] and [Name] is chosen as primary key. You can see that either [ID] or [name] can identify [teacher]. This means [teacher] is partially dependent on the primary key. T RANSITIVE

DEPENDENCY

Transitive dependency is a situation which some attributes can be identified by a nonkey attribute in a transitive way. In other word, consider this situation. Attribute A can identify attribute B. Attribute B cannot uniquely identify attribute A. However, B can identify attribute C. This create a situation which A -> B -> C, therefore, A -> C. This is a transitive dependency. Transitive dependency is demonstrated in table 2.

12


Database Management [436304] TABLE 2: TRANSITIVE DEPENDENCY

NAME Som One Som Tin Som We

GENDER Male Male Female

PANTS/SKIRT Pants Pants Skirt

In table 2, [name] is chosen as a primary key. As a primary key, [name] can identify [gender] and [pants/skirt]. However, [gender] can identify [pants/skirt] too. This is called transitive dependency. F UNCTIONAL

DEPENDENCY

Functional dependency is a situation which an attribute or a group of attributes can be identified by non-key attribute(s). In other word, there is an attribute (or a group) which cannot identify a record but can identify other attributes. M ULTIVALUED

DEPENDENCY

Multivalued dependency is rather special, compare to previous dependency. Multivalued dependency is a situation which some records with certain data is bound to occur given added records. For example, a table containing sales promotion data. There are attributes [salesperson], [promotion], [result] in the table. Attribute [promotion] is identified by [salesperson]. Attribute [result] is also identified by [promotion]. If a new [promotion] is added to the some additional [salesperson], [result] is also bound to be added as well. The added data is going to be the same as other records with the same [promotion]. J OIN

DEPENDENCY

Join dependency is not about only a table. This type of dependency spread across the database. It is a situation which a large table is reduced into smaller tables. When these smaller tables are joined, it must resemble the original large table without anomalies.

13


Database Management [436304]

NORMALIZATION Relational database depends on relational data model. It is great at reducing data redundancy and improving data integrity. However, data do not relate themselves. Database designer must design relation for them. To do that, database designer must (or might not) do normalization.

WHAT IS SO “NORMAL” ABOUT IT? Normalization is a process to reduce data into its atomic form. Atomic means the smallest possible state. This process reduces data into a smaller data that it cannot be reduced any further. But, why should we reduce data into such a small state?

Do you know? “Data” is a plural form. It is not countable. Its singular form is “datum”.

When data is reduced into its atomic form, it possesses great integrity and there is no redundancy. This is the data in its normal form. However, an atomic data is not useful by itself. It needs relation to help making sense of the data. This is the core of relational database. As in his own word, Edgar Codd, who introduced relational database to the world, stated the objectives of performing normalization as followed1:

1. To free the collection of relations from undesirable insertion, update and deletion dependencies; 2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs; 3. To make the relational model more informative to users; 4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

In other words, normalization eliminates anomalies and logical inconsistencies. So, How to make data become normal? To do that, data must be put into its normal form.

1

Codd, E.F. "Further Normalization of the Data Base Relational Model", p. 34

14


Database Management [436304]

NORMAL FORMS There are many levels of normalization, each called normal form. The higher level of the normal form is, the more atomic, aka. Normal, the data is. To put in a practical view, normalization involves chopping down a large data into smaller records and put into multiple tables with relation set between them. Students can see this by examples given for each normal form. Before one can normalize the data, a key must be established. Key is the data that identify the entire record in the table. Keys are investigated more closely in the next chapter. In this chapter, keys are briefly explained. 0NF Zero-normal form is similar to not normalized at all. Data is simply put into a table such as in table 1. Redundancy can be clearly seen. Data manipulation is extremely challenging at this level. TABLE 3: 0NF TABLE (WITH NO PRIMARY KEY)

ID 1 2

FIRSTNAME Oppa Gangnam Steve Joint

3

Ball Gate

TEL 081- WHATDAT 081- ICANTTH, 083- INKOFBE 088-TTERNAM

Difficult to update (table 1) If you have to tell write a code to update Steve Joint’s 2nd, what would you write?

1NF First normal form can be attained when a table meet these criteria. 1. Identify by one unique primary key. This means all the records must not share the same primary key. Primary key is the data that can identify the record and is used primarily with no other keys involved. 2. Same type of value in a column. Data in the same column must be the same throughout all records. For example, the name column should not hold anything else but a name. 3. Only one value in a cell. Data in a cell must be organized so that there is only one data in a cell. This is shown in table 1 as the value in TEL of Steve Joint is 2 different values.

15


Database Management [436304]

16


Database Management [436304] 4. No repeating data. Data must not be repeated in order to make a table. For example, consider table 1, one can eliminate the TEL values which violate the 3. Rule of 1NF by adding more row to it. Make 2 Steve Joints rows, each with its own TEL value. This creates a repeating data and does not pass 1NF qualification. To complete 1NF for table 1, values must be reorganized into two tables. The first table contains User data as shown in table 2. The second table contains contact data as shown in table 3 (shaded columns are primary keys). TABLE 4: 1NF - USER DATA TABLE

userID 1 2 3

FIRSTNAME Oppa Gangnam Steve Joint Ball Gate

TABLE 5: USER CONTACT TABLE

recordID 1 2 3 4

userID 1 2 2 3

TEL 081- WHATDAT 081- ICANTTH 083- INKOFBE 088-TTERNAM

Relation between these tables can be clearly seen. UserID link between table 2 and table 3 and provide that Steve Joint has 2 telephone number. Both tables have a unique primary key for each record. There is no repeated value. Each cell contains one value. Each column contains one type of data. This is the first normal form. 2NF No non-prime attribute in the table is functionally dependent on a proper subset of any candidate keys. 1. Must be in 1NF This is a must for all normal form. To be a higher level of normal form, the table must meet all criteria of the lower level normal form. 2. No partial dependency This situation is created when a primary key column is not enough to identify the record. Two or more columns are needed to identify the record. These multiple column primary key is called composite key. However, some column does not require the entire composite key to identify itself. This is called partial dependency. This implies that if the primary key is contained within only one column, the table is in 2NF. However, all 1NF criteria must be met.

17


Database Management [436304] TABLE 6: 2NF – ORIGINAL TABLE (WITH PARTIAL DEPENDENCY)

Student Donald Duke Scoopy Does Pink Panzer

Subject Code 436304 436304 436302

Subject Database Database Network

Score 49 56 68

From table 4, one can see that shaded columns can identify the score and are set as a composite key. However, assuming no adding data, using only [student] can also identify a score. This is a partial dependency. An attribute is depending on only a part of any keys. This dependency results in a number of anomalies. For example, an insert anomaly occurs when one try to add a new subject. At least one student must have a score from that subject for the subject to be inserted into this table because [student] is a key and cannot be null. An update anomaly also occurs when one tries to update subject name from database using student as a criteria. This results in two different subject name with the same subject code and the data loses integrity. To fix this dependency, one could reduce table 4 into two separated tables as shown in table 5 – table 6. TABLE 7: SUBJECT

Subject Code 436302 436304

TABLE 8: SCORE

Subject Network Database

Student Donald Duke Scoopy Does Pink Panzer

Subject Code 436304 436304 436302

Score 49 56 68

3NF Every non-prime attribute is non-transitively dependent on every candidate key in the table. The attributes that do not contribute to the description of the primary key are removed from the table. In other words, no transitive dependency is allowed. 1. Must be in 2NF (which in turn required to be in 1NF, too) 2. No transitive dependency This situation is called transitive dependency. It happens when a non-key attribute can be identified by other non-key attribute. These attributes must be moved into a separated table in order for the tables to be in 3NF.

18


Database Management [436304] Basically, 3NF is adequate for normalization process of most database. However, there are more normal forms as relational database is developed. Tips: Easy way to remember normal form criteria There is a tradition of “sworn testimony” when a person is called for a court of law. A person is asked to put one of his hand on a holy text, put another up toward the sky (or ceiling, if you must) and speak out loud.

“I swear by the [the holy bible or the god of choice] that the evidence I shall give shall be the truth, the whole truth and nothing but the truth” This can be adopted for normalization. For 1NF to 3NF in order,

“I shall make normal form by a record that can be identify by a key, the whole key and nothing but the key.” TABLE 9: 3NF – ORIGINAL WITH TRANSITIVE DEPENDENCY

C HARACTER S OAP M C T RAVISH M ARKUS F ENIX L EON F. K ENNEDY J ILL V ALENTINE

G AME C ALL OF DUTY G EAR OF WAR R ESIDENT E VIL R ESIDENT E VIL

G AME TYPE 1 ST PERSON SHOOTER 3 RD PERSON SHOOTER HORROR HORROR

Table 7 shows transitive dependency. It is clear that a character can identify which game he appears. However, a game cannot uniquely identify a character because there are many characters appear in a game. In turn, a game can identify its type. Given character as a key, transitive dependency occurs. In this case, [character] can identify (->) [game] and [game] can identify [game type]. This can be written as [character] -> [game] -> [game type]. Transitive dependency can cause problems, anomalies, to database operation. For example, if Soap McTravish is killed and have to be removed from this database, the database does not only loss Soap McTravish but also the fact that Call of Duty is a 1st person shooter. This loss of information is a deletion anomaly. To eliminate transitive dependency, one could reduce the table into two tables such as shown in table 8 and table 9.

19


Database Management [436304] TABLE 10: 3NF - CHARACTER

C HARACTER S OAP M C T RAVISH M ARKUS F ENIX L EON F. K ENNEDY J ILL V ALENTINE

TABLE 11: GAME TYPE

G AME C ALL OF DUTY G EAR OF WAR R ESIDENT E VIL R ESIDENT E VIL

G AME G AME TYPE C ALL OF DUTY 1 ST PERSON SHOOTER G EAR OF WAR 3 RD PERSON SHOOTER R ESIDENT E VIL HORROR

BCNF Boyce-Codd normal form is like an upgraded + shortened version of 3NF. It is also known as 3.5NF. Usually, tables which meet 3NF requirements are in BCNF but in some cases they are not. To be in BCNF, a table must meet these criteria: 1. Must be in 3NF 2. No functional dependency In other word, BCNF requires all attributes which are not already be part of a key to be candidate key. Usually, 3NF would be BCNF. However, in case that there are candidate keys in the table and those keys overlapped, that 3NF table is not a BCNF compatible. It is important to note that BCNF is, in theory, explained by functional dependency theory. Students are encouraged to take a glimpse of it, although it is not necessary for arts student to understand all theory in it. TABLE 12: BCNF - ORIGINAL TABLE

ID 1 2 3 4 5

Victor Persian USSR USA Viking Viking

Battlefield Greece Russia Normandy Normandy England

Defeated Sparta Germany Germany French Celt

Table 10 is in 3NF. ID is the primary key. All attributes can be uniquely identify by the key. However, you can see that there are some cells share the same value. Update anomaly occurs when you are realized that Normandy should not be in the table. It should be France. You would have to update every single record in the table which has Normandy in battlefield column. To fix those problems, the table could be reduced into smaller tables as shown in table 11 and table 12.

20


Database Management [436304] TABLE 13: BCNF - BATTLEFIELD

ID 1 2 3 4

B ATTLEFIELD Thermopylae Stalingrad Normandy England

TABLE 14: BCNF – BATTLE RESULT

V ICTOR B ATTLEFIELD Persian 1 USSR 2 USA 3 Viking 3 Viking 4

D EFEATED Sparta Germany Germany French Celt

Now, if Normandy is to be changed to France, all data would be updated. 4NF Forth normal form is based on multivalued dependency (MVD). If a part of a record must be repeated in order to address the rest of the record, it is not in a 4NF. To ensure MVD is to make a 4NF. Therefore, 4NF is completed when these criteria are met. 1. Must be in BCNF 2. No multivalued dependency In other word, 4NF requires all attributes which are repeated in order to address the rest of the record to be separated into another table. 5NF Fifth normal form is a higher level normal form and hardly be seen in normal applications. It requires join dependency. Join dependency is a condition which guarantees the original table when a large table is reduced into smaller ones and these smaller tables are joined back. All records must stay intact, no additional record appear and no record lost. For a relation to be in 5NF, these criteria must be met: 1. Must be in 4NF 2. Ensure join dependency Originally, there is a large table. The large table is reduced into 4NF tables. When these 4NF tables are joined by the superkey, the result table must be identical to the original large table. DKNF Domain/Key or DKNF is one of the most complicated normal form. It depends on domain of the data as some data which can be constrained by some logical rules. It is rather difficult to create and sometimes involve broken down data in an attribute.

21


Database Management [436304] 1. Must be in 5NF 2. Use constraints from domain constraints or key constraints This means digging into an attribute which can be considered part of a domain and break down the attribute into another attribute. Then, put that attribute, along with a key, into another table. Use join in combination with functions to get the information back. It is important to note that some books uses the term 6NF for DKNF. This can create a lot of confusion. Students must read the definition of the term used in a book to make sure they understand things correctly. 6NF Sixth normal form is the highest normal form by the time this handout is created. These are criteria for a table to achieve 6NF. 1. Must be in 5NF 2. Table has only a key and only one another attribute. Now, students can see that this normal form is the most atomic level of a data. A table has only two attributes. One attribute is a key, important for relation across tables. Another attribute is the data itself. In application purpose, 6NF eliminate problem with Null values in tables which a key is required by other tables to exist but no other information.

WHY SHOULD I DO IT? OR WHY I SHOULDN’T? Normalization is done to ensure data integrity and eliminate anomalies. Data integrity is ensured when any changes on one set of data affect the rest in a correct way. Anomalies, on the other hand, is to keep the changes on data to affect only relevant attributes while not mess up the rest of the data. Some anomalies affect insert process, prevent users from inserting data. However, despite these benefits, sometimes normalization should be avoided.

22


Database Management [436304] W HY

SHOULD

TABLE 15: ANOMALIES

ID 01 02 02 03

First name Judge Rambo Rambo Casper

Last name Dread Babo Babo Ghost

Weapon (required) Pistol M16 Army Knife

Considering table 4, there is a problem with data integrity. The problem occurs on 2 nd and 3rd record. If in one cell the lastname of Rambo Babo is misspelled or somehow changed from Babo to Barbo, data integrity would be loss. Also, if a new record is being added to table 13, one would have encounter an insert anomaly. Because a character must have at least one weapon. This makes a character without weapon cannot be in the database. Normalization could get rid of those anomalies. Such as shown in table 5 and table 6. TABLE 16: FIXED ANOMALIES - CHARACTER

TABLE 17: FIXED ANOMALIES - WEAPON

ID 01 02 03

charID 01 02 02

First name Judge Rambo Casper

Last name Dread Babo Ghost

Weapon (required) Pistol M16 Army Knife

Normalization has these benefits, however, it also has some limitations and inconveniences. W HY

SHOULD NOT

There are a number of reasons that one should not do normalization under certain circumstances. 1. Join is costly DBMS software support normalized design. However, when it comes to joining tables, it is costly for the system. Joining unindexed column is slow. Joining a long text column is slow, and can be slower if that column is not indexed. Programmers also affected by a lot of joining because it could become confusing with all the tables and keys and conditions involved in querying the information. Try to think about a student biography. If a designer decided to normalize the

23


Database Management [436304] database to 6NF, which one table has only two attributes, how many join statement a programmer must perform to get a complete student biography. It is going to involve half-a-hundred tables with the same number of keys and the same number of attributes in SELECT statement. After that programmer resigned, the next person would have a big problem maintaining the program. 2. Difficult design Normalization reduce one big table into many smaller ones. This can put a lot of stress to the designer to keep track of all those tables, especially in a very large database system. Such difficult design is also accompanied by a big pile of design and implementation documents. Resources must be poured into production of these stuffs, and can be seen as an overhead. The designer does not get the worst of this problem because it is him who designs the system. He knows what he thought at that point, given he does not forget about it. The real sufferer is the people who work on his position after he resigns. In this case, the new employee must make sure him, too, understand all the design and the reasons that make the system get a design that way. This is not an easy task. 3. Time constraint Normalization is generally perceived as a good design because all its benefits. However, a good design might not be suitable in a situation that demand quick results such as prototyping. Prototyping is a process which a development team is required to make a demo program which is presented to the customers. This process usually limited by an extremely tight deadline. For example, a project is announced today and a prototype must be presented in 3 days. In webpage cases, the deadline could be tomorrow. Therefore, in such a short time, normalization is not really something that can be achieved and still be able to finish the job in time. Developers and designers should keep in mind that it is not necessary to do normalization.

24


Database Management [436304]

ENTITY RELATIONSHIP (ER) DIAGRAM Entity relationship diagram or ER diagram is a data model to describe relational database structure. It is a standard tool for communication between database designer and programmer who implements a database. It is critical for people who need to communicate with database specialist to understand ER diagram.

CHEN’S VS CROW’S FOOT There are two types of ER diagram. First is called Chen’s diagram. Second is Crow’s Foot diagram. These two types of ER diagram are used for different purposes. Both has its advantages and disadvantages. Their appearances are being compared in figure 1. StudentID

SubjectID Student

Name

SubjectID

Student PK StudentID Name FK SubjectID

study

Subject

SubjectName

Chen’s ER diagram

Subject study

PK SubjectID SubjectName

Crow’s Feet ER diagram FIGURE 5: COMPARE CHEN'S DIAGRAM (UPPER) & CROW'S FOOT DIAGRAM (LOWER)

Chen’s diagram is proposed by Peter Chen (1976)2 in an attempt to establish a unified view of data. It is a great tool for conceptualize data to see how each set of data interact with others, which in turn making information. This type of ER diagram could be found in design documentation. However, it does not provide a good communication with programmer who implements a database. Crow’s Foot ER diagram is more implementation oriented.

2

http://www.csc.lsu.edu/~chen/pdf/erd-5-pages.pdf

25


Database Management [436304] Crow’s Foot ER diagram is widely used in both design and implement documents. It got its name from the line which describes relationship between entities. The end of a line shows cardinality of instances. It tells how many times a record in one entity shows up in the linked entity. That line ends with either single straight line or expands into three lines which look like bird’s foot. How it comes to be crow’s foot is not known. Crow’s foot diagram is the main focus of this course. It is important to note that Chen’s diagram is a great tool for conceptualize relational data model. However, in an aspect of this course which involve some implementation, Crow’s Foot diagram suits better.

CROW’S FOOT DIAGRAM Crow’s Foot diagram can be drawn using a set of standard notation. In relational database, designer group data into a group of entity, also known as table. Then, link those entities together using relationships. After that, cardinality and modality is marked to the relationship. In this section, each notation is explained. E NTITY Entity is drawn as a table. This is a reason why an entity is also called a table by programmers. In an entity, there must be these information. 1. 2. 3. 4.

Name of an entity Attributes of the entity Primary key of the entity (PK) Foreign Key (FK)

Entity name Student PK StudentID Name FK SubjectID Attribute name Key indicator PK = Primary key FK = Foreign Key Blank = non-key FIGURE 6: NOTATION FOR ENTITY

Sometimes in a design process, entities are left with only Name of an entity because listing all attribute would have consume too much time. A complete entity is shown in figure 2. Note that naming attribute could be controlled by naming convention which is agreed upon by design and development team. This naming convention is extremely useful in implementation process because it helps distinguish different elements. For example, a student entity may be named “tblStudent” to show that this object is a table. A view could be made by joining tblStudent with other table and named “viewStudent”. With two different name (tblStudent and viewStudent), programmers do not confuse when try to insert data into the database. It is also important to note that some DBMS is case sensitive which means lower case “a” and upper case “A” are different. Finally, do not

26


Database Management [436304] name anything in a database using any language except English. Naming and entity, for example, using Thai language introduce the system to countless technical problems. R ELATIONSHIP

perform Entity Entity Relationship is drawn in a line linking two 1:1 entities. The line could be solid or dotted line, each express different relationship. perform Entity Entity The important part of the line is the head. 1:M A single line express “one” relationship on perform that particular entity. “Crow’s foot” line Entity Entity express “many” relationship on the M:M particular entity. Therefore, a line with one FIGURE 7: NOTATION FOR RELATIONSHIP end ends with straight line while another end ends with crow’s foot express one-to-many relationship. Relationships are shown in figure 3. Most of the time, conceptual relationship is noted on top of the line to explain the relationship during a design phrase. However, in implementation phrase, those notes might present. C ARDINALITY

AND

M ODALITY

Cardinality and modality is a part of 1 and only 1 Entity Entity relationship line. Cardinality is the 1 and only 1 relationship between entities. It can be 11, 1-M or M-M. This is cardinality. It 1 or many Entity Entity expresses how many times one record Zero or 1 from one entity shows up in another Zero or many entity. Modality, on the other hand, Entity Entity express how many records must be in an Zero or many entity for a record in other entity can be FIGURE 8: NOTATION FOR CARDINALITY AND MODALITY formed. For example, a citizen can marry someone. Therefore, we got a citizen entity with name, surname and other attributes. Then, we got marriage entity with both parties name, surname, date of marriage and other data. A citizen can only hold only one marriage status at a time. Therefore, citizen and marriage has cardinality of 1-1. This is also called 1-1 relationship because cardinality is a degree of relationship. However, a marriage cannot happen if there is no citizen. This leads to modality of 1 in citizen entity. Citizen can live without getting married. This leads to modality of 0 in marriage entity. Therefore, a full relationship between these two entities is expressed as 1-1 relationship with “one and only one” citizen to “Zero or one” on marriage.

27


Database Management [436304] Figure 4 shows notation of cardinality and modality. You may notice that “zero or many� is also marked by dashed line relationship. This is called unidentifying relation. Identifying and non-identifying relation is different by the requirement of relationship between entities involved. If a relationship requires both side to have at least one particular attribute from another table, that relationship is identifying. In implementation, this usually results in including a foreign key from that particular table to the primary key of its pair. On the other hand, unidentifying relationship does not require an entity to have the attribute. In implementation, that foreign key is not set as required and not included into the set of primary key. Note that these identifying relationships are constrains to the database. A constrained database is more false tolerant but less flexible. P UT

ALL INTO ACTION

Try to write some basic ER diagram using notation you have learned from this session. This part of the handout give you a few example data for practice (all plural nouns are made singular for the sake of example). 1. 2. 3. 4. 5.

Computer and part Employee and employer Teacher and student Boyfriend, girlfriend and the Gig (More than friend, less than boy/girlfriend) Student and course in university

It is important to note that ER diagram is a set of standard communication protocol. Right or wrong in this setting is the use of notations. The actual design can be different by perspectives, but the design must correctly answers business logics of the case. Therefore, exercises given in this handout do not come with solutions. However, if students want to know how the instructor would have written ER diagram for these topics, check out instructor’s website3.

3

https://sites.google.com/site/jirapontanasanti/ Go to page for Database management course and find solution of handout 5.

28


Database Management [436304]

JOB SPECIFIC DATABASE Database is essential for many business operations. From purchasing and selling goods to strategic planning, database is required by all parts of an information-driven organization. However, a single database is not built for all tasks. One can only be specialized for one or a few tasks. This handout explores these database systems. Databases which are designed for specific jobs.

OPERATIONAL DATABASE Operational database is optimized for concurrent access from multiple users. It must process many requests in a short period of time. It can perform data manipulation quickly. Reliability is a must for operational database because business hardly stops. Examples of operational database are point-of-sale and e-commerce website. P OINT

OF

S ALE

Point-of-sale or POS (not to be confused with Part-of-speech) is a machine which retail stores have at their cashier counter. When a customer got the goods they wanted, they put all the goods on cashier counter and the store person check-out those goods on a POS machine. The old model of these machine prints an invoice for a customer. It also hold a copy of the printed invoice for store manager to check for daily sales. However, those long list of sold items are similar to 0NF or 1NF (at best). These data is used only at operation level and not very useful at strategic level because it is difficult to use. POS system is developed to provide users with relational database capabilities. Now, every transaction made by the POS is inserted into database and can be used later. Now, POS systems are designed for Online Transaction Processing or OLTP. Think about a Tesco-Lotus store. All their POS insert and update data all the times. They cannot use a standalone version of POS because of such continuous and simultaneous task. This is but one of examples of operational database. E-C OMMERCE W EBSITE From POS system at cashier counter, developers put those system to customer’s computer screen. Customers now scan their own barcode, by clicking desired products, and check out their goods, by paying via online secured channels using credit cards. This is basically POS system at customers’ homes. This is early E-Commerce website.

29


Database Management [436304] However, E-commerce is a wider concept. E-Commerce website is the “front-end” of the business. E-commerce also involves back-end of business. For example, user buying products is front-end while the shop buying supplies from suppliers is back-end. ECommerce system track store’s inventory and alarm store manager when goods are depleted. It also warns the store persons if any goods seems to be running out of shelf life. Therefore, a full E-Commerce system is one of the extremely operation oriented database system.

ISSUES ABOUT OPERATIONAL DATABASE Operational database has a few issues. These issues are to be considered when designing the database system. The most obvious one is concurrency management, which is the focus in this handout. C ONCURRENCY

MANAGEMENT

Concurrency means simultaneous connection from multiple user. This is one of the biggest issues of operational database. Operational database which involves a lot of people trying to update the same data at the same time should design the database accordingly. Think about a retail store’s database system. A store person sells a can of cola. The system check-out that cola and allow the customer to take that particular cola out of the store. Store’s cola inventory is reduced by one. This is normal business. However, two store persons sell 2 cans of cola at the same time. Each request the inventory to reduce inventory by 1. The database must be able to perform both requests otherwise the inventory would be reduced by only 1. This case is also true with E-Commerce system which a lot of customers order the same goods at the same time. They also post their comment to the same goods at the same time. Their comment number must be placed correctly. By technical aspect, data manipulation are classified as create, read, update and delete or CRUD. Whenever a CUD (no R) request reaches the database, the table which data manipulation takes place would be locked. So, only one request can be performed and avoid any errors. Other requests are queued or otherwise handled.

ANALYTICAL DATABASE If an operational database is a database for working ants, an analytical database is a database for elephant. It is enormous in size, use a lot of energy to maintain and slow to move around. However, its steps have a lot of impact. This database is used for

30


Database Management [436304] strategic planning and decision support. One of them is OLAP which is used to make business decisions. Another famous system is for a technology called datamining. The two systems are introduced in this handout. Note that there are variety of analytical database out there that students should learn too. O N L INE A NALYTICAL P ROCESSING

OR

OLAP

Online analytical processing or OLAP is a database system which allow user to navigate through data to get the information. It might be visualized as OLAP cube as shown in Figure 9. Each dimension represent a specific attribute of the data. In other word, it is 3D table with many pages bundled into a cube which user can roll around and switch pages to see how each data interact with others. Historically, OLAP was developed during the relational database emergence. At that time, relational database can handle operational data quite well. However, when all those operational data are to be queried for strategic decision making, the system run unbearably slow. To fight that problem, many technology firms, like Oracle or Microsoft, offered commercial solutions. These solutions were the birth of OLAP.

FIGURE 9: OLAP CUBE

One of the reason that cause navigating through large relational database is the processing needed to join all the data. Users tend to need a lot of views on the data. For example, they might want to see how well milk is selling. Then, who buy those milk. Then, what else those customers buy with the milk. And then, how much does the suppliers cost us for the milk. You can see that during the first request, there are operations which must be done to get the information. For every operations, the input from earlier operations is needed. This create a long chain of operation which cost the system dearly and cause it to perform slowly. The solution to that problem is to preprocess those data before the user wants to see it. This way, the user does not have to wait for a long time for the system to process the data but it does require the system to be prepared beforehand. And preparing OLAP system is performed by IT specialists who are busy most of the time.

31


Database Management [436304] The second problem is that OLAP relies on specialists to make changes to OLAP structure. IT specialist must determine which data are going to be put in and how that data going to be grouped with other data. IT specialist must prepare for every combination of data that user might roll the cube into, otherwise it is going to be slow. This is like an oracle predicting if an army going to win a war. Therefore, for a company to use OLAP, IT specialist is an essential resource. Dimension of the OLAP is also an issue. Too many dimension creates confusion. Most people can manage 3D data as visualized in OLAP cube but 7D data seems to be the most complicated set of data that human can comprehend. Just try to think how anyone could put 7 lines of columns into one table. This is how confusing OLAP can be. With such limitations, OLAP is great for structured environment. Structured data such as student grading system is good for OLAP because these data can be put into cube rather easily. However, data which are not very organized might not be suitable for OLAP. Sales data which can be vary from sales person to sales person might not be suitable for OLAP. OLAP might has a number of limitations but it is a good and proven tool for data analysis. There are a number of improved versions of OLAP out in the market and students should learn about them. Remember, OLAP is a tool for data analysis. It might look complicated but it works. D ATA

WAREHOUSE

Data from operational database can be stored and used for strategic planning. However, such as large data must be kept somewhere. Data warehouse is a place to do so. However, it is not just a collective lump of data. Data warehouse systematically collect history of data. This allow users to see changes on data over time. Ultimately, leads to better decisions. Therefore, data warehouse is classified as business intelligent tool. There is a special process which can “discover� information from data warehouse. This process is called Data mining. It is also classified as business intelligent tool. Data mining is a very useful tool to make value out of a lot of seemingly useless data. On seemingly useful data, it is even more powerful. The instructor suggests students to try research deeper into data mining.

COMPARE OPERATIONAL AND ANALYTICAL DATABASE As you can see, database can be configured for specific task. Good database does not perform well at everything. In that case, it is rather called good-for-nothing. Database should be optimized for specific tasks, tasks which it should be excel at, and let other database system worry about other tasks.

32


Database Management [436304]

33


Database Management [436304] S IMILARITIES It is important to note that, despite differences, operational database and analytical database are often similar in theory. They are structured the same. They need DBMS to perform. They contain tables and the data inside. They must have keys, indexes and queries. In short, they are, well, database system. D IFFERENCES However, these two different kind of databases are different in design and configuration. These differences are listed as follow. 1. Analytical database is usually require a dedicated resources. This means analytical database should be set up in its own machine because it requires much processing power. 2. Operational database is optimized for real time operations. It is meant to be used all the time. It is designed to process many short data manipulation requests in shortest given time. Analytical database, on the other hand, is designed to be used from time to time because strategic planning is not done every day. It is designed for complicated queries. 3. Analytical database usually support only a few concurrent user. This is because analytical database is used by the top head of an organization. Designer does not want too many people to see all those sensitive data. Operational database, on the other hand, is used by thousands of users at a time. It should be configured and designed for that. 4. Operational database must be maintain all the time. System administrator should perform backup regularly on operational database. Operational database is the heart of a business. Analytical database might be maintain only a short time around usage.

34


Database Management [436304]

CASE STUDY Database design and implementation is a complex task. There are many successes and mistakes which should be studied and, if possible, avoid.

HOW TO DESIGN A DATABASE (FOR DUMMY) Designing a relational database is a matter of identifying relation between entities of data. This could be difficult for inexperience designers. In the case study handout, we start with the basic steps of designing database. 1. Know what you are going to design First thing first, you must know what you are going to design. Otherwise, the best you can do is a good, anomaly-free, database which cannot be used. To know your system, you must look into the business in question. You must understand what data you must keep and what not. 2. Find those data Sometimes you must see the actual data to know exactly what you are working with. To do that, simply gather relevant documents. Draw a simple table and put those data into your tables. You might want to try Chen’s ER diagram. Put the name of the document in a rectangle. Then, link with it all bits of data in ovals. That cloud is one table. 3. Put the same thing together as an entity After you know exactly what you are working with, you can try to design the tables. A table should hold data relevant to a single type of object. First, you look for objects in your design. By objects, it means anything that share the same attributes. For example, “Mr.Zucker” and “Mrs.Zucher” are both customers. They are human. They should be treated as the same object, which is customer, and put into the same entity or table. 4. Classify data into columns After you get your entities in place, you can start working with their columns. You should be able to see what kind of information associate with an entity. For example, a customer should have “name” as an attribute because it is an associated data with a person. This is also true with employee, so employee table should have a “name” column in it too. You analyze the entity and make columns based on your understanding of the business which you should have been in step 1 +2.

35


Database Management [436304] 5. Give them primary keys After you get those columns in your tables, you should make them primary keys. Remember that primary key can uniquely identify a record. Primary key cannot be duplicated in a table. Primary key cannot be left blank or null. In this step, you should imagine inserting data into your tables. See if the primary key might be blank or duplicated. If it does blank/duplicated, you must fix it. One way to fix it is to make a composite primary key. Another way is to give an ID attribute. Note that giving composite primary key might lead to partial dependency while giving ID attribute might lead to transitive dependency. So, design carefully. 6. Give them relationships When all primary keys are in place, you can setup relationships between tables. In this step, you find relation between entities. You might start with your entities, which from step 3 their names should be noun. If your “noun entities” have M:M relationship, you might need a linking entity because it is easy to have duplicated data in tables with M:M relationship. Linking entities are usually named with verb. This is because you have 2 noun entities in relationship. For example, “customer” “buy” “products”. You have “customer” and “products” entities from step 3. You had given them IDs as primary keys. But to put product’s ID in customer entity, you will have many customer’s ID for a person. So, you put customer’s ID and products’ ID into “buy” entity. Those keys are both PK and FK in “Buy” entity. This eliminate the problem with multiple ID for a person. In short, you try to set up relationships. You try to link entities together according to business rules, such as customer buys products, without violating the rules of primary key (no duplicate, no null)

Some key points As shown from pre – post session quiz, some students are having problems with database design and ER diagram. So, this handout reminds you of some mistakes many students committed. 1. Every table must have a primary key which…  Cannot be empty  No duplicate  Uniquely identify all attributes 2. When tables are linked, foreign key(s) must be specified. These keys are:  Primary key of other tables  May also be part of primary key in the connecting table 3. Do not include sample data in ER diagram Try making your sample database in MS access. Set your keys and link tables. See relation panel for ER diagram. Try insert, update and select data to see problems with your design.

36


Database Management [436304] 7. Do normalization After you finish your design, you should look back and identify any functional dependencies, such as partial or transitive. Do normalization, so anomaly does not occur. This should be done according to time constrain. 8. Improve your design It is extremely rare for a perfect design to happen. You should keep that in mind and rethink your design according to business rule, which can be changed over time. It is important to improve your design when time and resource permit. Database design should be simple. There is a principle called KISS, Keep It Simple – Stupid. Student could choose to research various design principle for their term report.

HOW NOT TO DESIGN A DATABASE OK. We learned how to design a database. However, there are some mistakes which should not be repeated. This is how “not” to design a database system. U NDERESTIMATE

THE SIZE OF DATA

Database designer must consider all possibilities about data recurrence. Failure to do so may result in a number of problems. For example, the database schema or the database design itself might require periodical manipulation. An example of how recurrence data are poorly handled is this. Say, there is a company which sells electrical devices. This company issues a policy that they allow their customer to pay partially every month for 2 months until they pay a full price. The database designer takes that policy seriously and designs payment table to fit only 2 occurrences. The designer assigns a key of the table to be [Bill_number] which is the bill of the purchase. Then, the amount of 1st payment is stored in [intPay_1] and the time of the payment is stored in [datPay_1]. Similarly, [intPay_2] and [datPay_2] store payment information about 2nd payment. The design is shown in figure 1. Now, the real world does not work as written in script. The debt collecting team cannot collect the full price in 2 months. They negotiate with the customers for 3 times payment instead. Now the tables must be modified. The designer adds [intPay_3] and

FIGURE 10

FIGURE 11

37


Database Management [436304] [datPay_3] to the existing table. The design is shown in figure 2 with the added columns marked with red highlight. However, the executives seem to see that the interest rate allow partial payment to be more profitable to the company. So, they cancel their last policy, which the database system is strongly based upon. Their new policy is for customers to pay as many time as they like, as long as the full price is covered. The designer can only add as many columns as possible to the table. Thing is, not all customers pay 24 times or 2 years. They might pay in cash, leaving any payment after the 1st time blank. However, DBMS is required to reserve storage for every column. This is a bad utilization of resources. However, for the designer to convert this bad design to a better one, it is impossible because this table is linked to many other operations. The design is shown in figure 3. The better solution is to design for expansion in the first place. Instead of creating fixed columns, designer should use rows for each FIGURE 12 payment. Bill and payment is in 1:M relationship for 1 bill has many payments. The designer may separate payment tables into 2 entities, bill and payment. Then, link them with a 1:M relationship. The design is shown in figure 4.

FIGURE 13: BETTER SOLUTION FOR DATA EXPANSION PROBLEM

N AMING

AMBIGUOUS

Naming tables and columns might seem to be a little thing and sometimes overlooked. Then, it leads to terrible disaster in development process. Bad naming makes developers confuse. It may make a query so long that people cannot understand it. Usually, naming convention is set by design/development team as they work together. Name should be short but clear. It should not be ambiguous. It must follow DBMS rules and avoid conflicting with system reserved words.

38


Database Management [436304] An example of bad naming is to make an acronym which is difficult to understand. For example, a motor repair company might design a database for car’s parts. This database has a table called [Car] and has a primary key [cID]. Each car has 4 tires, so there is a [Tire] table which has [tID] as primary key. Position of each tire is recorded in car table’s [ltfID], [rtfID], [ltrID] and [rtrID] which are Left-Tire-Front, Right-Tire-Front, Left-TireRear and lastly Right-Tire-Rear. After a few years of service, this system is required an optimization. How could the person who perform improvement over this system understand what [ltfID] and other columns mean? Solution to this problem is to hold a meeting with the team. Set up a naming convention which all personnel must adhere. This naming convention must be documented for further references. It is important to note that acronym is not bad as a name by itself. It is bad when the acronym is not making sense. D OES

NOT SET AN INDEX

It is important to set index column for speedy query. Without an index, query is time consuming and slow the entire system down. This issue is especially important for operational database which works in real-time. R EAL

WORLD CASE STUDY

This section refer to some real world case study which might not entirely relate to the design phase of database system but worth reading. It is a good supplement reading for students to understand how database design could turn out badly. 1

Gorman, T. (2006, October 16). Bad CaRMa. Retrieved from simple talk: https://www.simple-talk.com/opinion/opinion-pieces/bad-carma/

2

Vinson, J. (2007, April 2). Really Unique Passwords. Retrieved from The Daily WTF: http://thedailywtf.com/Articles/Really_Unique_Passwords.aspx

3

Whitemarsh Information Systems Corporation. (2008). Modeling Data and Designing Databases. Retrieved from Whitemarsh Short Papers: http://wiscorp.com/sp/sp06.pdf

4

Preez, R. d. (2012). Bad Database Design ERD. Retrieved from Creately: http://creately.com/diagram/example/gz2j13w82/Bad+Database+Design+ERD

The last item is shown in the figure 5. Note the design flows on making each table for one record such as Witnesses 1, Witnesses 2 and so on.

39


Database Management [436304]

FIGURE 14: BAD DATABASE DESIGN ERD (THE 4TH CASE STUDY)

40


Database Management [436304]

MYSQL & SQL COMMANDS Here comes the fun part. Students have acquired all knowledge about database design from the first half of this course. Now is the time to put them into practice. In the second half of the course, students will learn how to build, or implement, a database system of their own. Students must learn technical skills for database management. This could be a rough ride, so you better hang tight!

DBMS First, a database system needs Database Management System or DBMS. This program is the core of database system. It is similar to Windows OS to database. It abstracts every technical task to the user, so the user does not have to worry about any sophisticate techniques in hard disks. User is only needed to tell DBMS the command and it takes care of the rest.

MYSQL MySQL was an opensource DBMS widely used throughout the world. If you see some popular website hosted on linux machines, chances are they run MySQL for their data storage. MySQL is easy to use, easy to configure and free. Until, Sun microsystem brought MySQL and itself been brought by Oracle, the biggest competitor in database business. MySQL is still free, although it is not improved as quickly as it used to be. W HY M Y SQL It is free. This is the most important feature. It is easy to use. It runs standard SQL language. It is easy to setup. Easy to maintain. It can also manage large data. If a user is a person or a small business, MySQL fits the needs just fine. On the other hand, if the users are enterprise business, MySQL also has enterprise version to offer. H OW

TO INSTALL IT ON MY MACHINE ?

In this course, MySQL is already installed in student’s machine. If students want to run it at home on their notebooks, please download

FIGURE 15: RUNNING MYSQL CONSOLE VIA WAMP SERVER

41


Database Management [436304] WAMP server. It is an opensource virtual server bundle. Its name stands for Windows Apache MySQL PHP. One bundle does all. After WAMP server is installed, student can run MySQL by left clicking the WAMP logo. Select MySQL, then click MySQL console. Student can also use it via PHPmyAdmin or other GUI products. Now, let’s get to work.

SQL COMMANDS Structured Query Language or SQL is a standard language for manipulating relational databases. Most RDBMS runs on this language. There are variation between each implementation of DBMS though most of them run standard command without problem. In this course, students must be able to use SQL commands to manipulate relational database. Students use SQL in MySQL console to ensure no software aid, thus encourage skill building.

GETTING START First, after starting MySQL console. Student is most likely running as root user, which has all privileges and can do anything with the database, include breaking it. This is not good for the DBMS. So, students should start by protecting DBMS from themselves by changing user. The first thing students should keep in mind is SQL statement, or command, must end with semi-colon (;). If you cannot make your code works, chances are you forget a semicolon. C REATE

USER

Use command: CREATE USER ‘username’@’host address’ IDENTIFIED BY ‘password’; This command make a new username which will use the database from the host address. The password for that account is set by IDENTIFY BY. For example CREATE USER ‘myname’@’localhost’ IDENTIFIED BY ‘1234’;

42


Database Management [436304] Next time this user wants to use the database, the user must enter username “myname”. The user must connect from “localhost” and must identify himself by “1234” password. Successful command results is shown in figure 2.

FIGURE 16: USER CREATED SUCCESSFULLY

G RANT

PERMISSION

Before user can perform anything with the database, the user must be granted permission to do so. To grant permission, use this command: GRANT [privilege type (column)] ON [database].[table] TO ‘username’@’host address’; Privilege type can be command name such as SELECT, INSERT and so on. With each command type, administrator can choose to grant that privileges over a specific area such as the column name. Note that column name must be in parentheses. The privilege is granted for specific databases and tables specified after ON. The privilege is given to the specified user after TO. For example: GRANT ALL ON *.* TO ‘myname’@’localhost’; This command grants all permission to user ‘myname’ to use all databases (marked by the first *) and tables (marked by the * after a period [.]) within MySQL. After running GRANT command, you need to flush the privilege list currently in use by MySQL for the change to take effect. This is similar to pressing “Apply” button in various Windows dialog box. To flush privilege, enter this command. FLUSH privileges; After privilege is flushed, the change is applied and the new username is ready to be used. Successful execution of these user creation command is shown in figure 3.

43


Database Management [436304]

FIGURE 17: COMPLETE CREATING A NEW USER ACCOUNT FOR MYSQL

One last thing about creating new user account is changing password. As shown in figure 4, MySQL does not store password in its raw form. MySQL applied encryption to it and this allow more secured data storage. However, you cannot just change the password like any other data because MySQL treats all password as encrypted. The raw password will never be recognized.

FIGURE 18: PASSWORD STORAGE IS ENCRYPTED

To change password, SET PASSWORD command is used. Note that you can use UPDATE statement to change password too. However, it must be done with encryption on the password as well. SET PASSWORD command is shown in figure 5 with comparison of old ‘1111’ password and ‘1234’ password. Note that password is a string, a series of letters. String in any programming languages, SQL included, must be enclosed by “” or ‘’. Those quotation signs cannot be mixed (like “’ is not allowed but “ ‘ ’ ” can be used because they match with their pairs).

44


Database Management [436304]

FIGURE 19: SET PASSWORD SUCCESSFULLY. OLD PASSWORD = '1111', NEW PASSWORD = '1234'

USE & SHOW Use statement is used to focus on the object. For example, if a user access username from user table inside mysql database, he must type: SELECT user FROM mysql.user; You can see that user must type the name of the database before table name. This can be overcome by use statement. Then, user can type shorter name. For example: USE mysql; SELECT user FROM user; To know the name of the database, SHOW statement is used. SHOW statement is also used to show the list of tables. For example: SHOW databases; FIGURE 20: SHOW AND USE STATEMENT

SHOW tables;

45


Database Management [436304]

CREATE To start working with a database, we must have a database. To create databases and other elements inside it, CREATE statement is used. C REATE

DATABASE

CREATE is used to make new things in the database system. After a user account is created, one may create a database. A database is a collection of tables. A single DBMS like MySQL can support multiple database at once. This means people can run many applications with only single database server. Before we make a new database, we might want to see how many databases are in the system. To show the list of databases, use SHOW command. SHOW databases; Before you create a new database or anything, you might want to read through ALTER statement and see the warning in the first paragraph. To create a database, simply enter command: CREATE database DBname [optional database specification]; A new database is empty, no table. You can use SHOW databases; to see the result of the command. Successfully created new database is shown in figure 7. If database specification must be set, it can be specified after the database name. For example, if the database is going to store Thai language data, it should be set to store UTF8 data. To do this, run this command: CREATE DATABASE thaidb CHARACTER SET utf8 COLLATE utf8_unicode_ci; The CHARACTER SET part specify the database to use UTF8 encoding for data storage. COLLATE specify which character mapping scheme is to be used. For more on character mapping scheme, see examples on http://collation-charts.org/.

FIGURE 21: CREATE DATABASE

46


Database Management [436304] C REATE

TABLE

CREATE command is also used for creating tables. However, creating tables is trickier than creating database because tables require data type to be set. You can read more on data type in MySQL reference library4. This course focus on only a few basic data types. CREATE TABLE has many variation of use. It is recommended for students to finish reading the entire command before using it. To create a table, use this command: CREATE TABLE [IF NOT EXISTS] tbl_name (create_definition,...); For example (new line is not necessary): CREATE TABLE IF NOT EXIST student (student_id char(8) NOT NULL, student_name varchar(40)); Note that if you specify [IF NOT EXISTS] MySQL will first check if that table is exist. If it does, MySQL terminate the statement. Otherwise, the table is created. In create definition user must specify names of the fields which going to be in the table. All fields must also be specified with appropriate data type. If case we need primary keys to be set, PRIMARY KEY must be specified in table definition area. CREATE TABLE student (student_id char(8) NOT NULL, student_name varchar(40), PRIMARY KEY (student_id)); If foreign key should be created, foreign key then be specified in table definition area as well. Note that foreign key can be with or without constraint (as known as referential integrity) but if a constraint is used, cascade manipulation can be done. To specify foreign key, run the command with foreign key option.

4

Oracle Corporation. (2013, July 15). 11 Data Types. Retrieved from MySQL 5.7 Reference Manual: http://dev.mysql.com/doc/refman/5.7/en/data-types.html

47


Database Management [436304] CREATE TABLE student ( student_id char(8) NOT NULL, student_name varchar(40), student_highschool INT, PRIMARY KEY (student_id), CONSTRAINT fkStudentSchool FOREIGN KEY (student_highschool) REFERENCES tblschool(school_id) ON UPDATE CASCADE ); Foreign key option consisted of CONSTRAINT followed by the name of that constraint. Then, FOREIGN KEY is specified in a parentheses. REFERENCE points to another table and the field in question is put inside a parentheses. Lastly, ON statement specify cascade condition on conditions such as UPDATE and DELETE.

ALTER Business changes. Database, too, have to change to meet business needs. Sometimes, the design is not perfect and require reconfigurations. All alteration on database schema is perform by ALTER statements. A LTER

DATABASE

Database can be altered by: ALTER DATABASE database_name CHARACTER SET charset COLLATE collation_name; However, database name cannot be changed easily. To change database name, UPDATE statement is used. UPDATE statement is explained in other session. It is not recommend to change database name. Therefore, it is important to name the database well in the first place. It is also important to note that changing CHARACTER SET of a database is to change default character set. It is not conversion of existing data inside the database. A LTER

TABLE

Table can be altered by ALTER statement. For example: ALTER TABLE tblStudent CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

48


Database Management [436304] The example modify the table to use character set UTF8. Like altering a database, it affect newly created columns but not the existing ones. ALTER TABLE can also rename/add/remove columns. If the existing structure is not clear, DESCRIBE statement is used. For example: DESCRIBE tblStudent; The result is shown in figure 8.

FIGURE 22: DESCRIBE STATEMENT

ALTER table statement goes like this: ALTER TABLE table_name CHANGE column_old_name column_new_name datatype(size), ADD COLUMN column_name datatype(size), DROP COLUMN column_name, MODIFY column_name datatype(size); An example of ALTER statement is as followed: ALTER TABLE student CHANGE student_highschool student_school INT, ADD COLUMN firstname varchar(25), ADD COLUMN lastname varchar(25), DROP COLUMN student_name, MODIFY student_id INT AUTO_INCREMENT; This ALTER statement change the name of student_highschool column to student_school which store integer data. The statement also add two columns while drop student_name column. Lastly, it change student_id to integer which is auto incremented. The result is shown in figure 9.

49


Database Management [436304]

FIGURE 23: ALTER TABLE RESULT

Note that if an error number 150 occur in MySQL after running ALTER statement, it is because foreign key constraint. To allow alteration of the table, the constraint must be removed by drop statement inside alter statement as follow: ALTER table_name DROP FOREIGN KEY constraint_name;

DROP Drop equals delete. Drop is one of the easiest statement in database system. However, its consequence can be hazardous. Before students drop anything, make sure it can be dropped. For what was dropped cannot be recovered. Make sure you backup important things before performing drop operation. Even with backup, restore procedure is time consuming. So, do not drop a thing unless you are very sure that it can be drop. D ROP D ATABASE When a database is no longer used, it can be dropped by a short statement. DROP DATABASE database_name; For example: DROP DATABASE test; The statement delete a database name test.

50


Database Management [436304] D ROP

TABLE

To drop a table, use DROP TABLE command. DROP TABLE table_name; For example: DROP TABLE bckStudent;

WHERE CAN I LEARN MORE? This handout introduce a few statements and commands to manipulate database and tables. However, it is important to know that this is but a small fraction of what MySQL has to offer. Each statement holds a large set of options to choose from. For example, instead of dropping the entire table, one can remove only the data by using TRUNCATE instead of DROP. Command such as this can be studied in MySQL manual5. Make sure you choose the right version (the latest by the time this handout is being write is 5.7). The manual is written in English. If you can read this handout, you should give it a try. However, the manual is a technical documentation. It is more complicated to read. The instructor is highly recommend students to read such manual because it is important for communication with database experts. This is one of the goal of this course.

5

Oracle Corporation. (2013, July 15). MySQL 5.7 Reference Manual. Retrieved from MySQL 5.7 Reference Manual: http://dev.mysql.com/doc/refman/5.7/en/index.html

51


Database Management [436304]

SQL COMMANDS II INTRO TO SELECT Do not panic just because this is not in the teaching plan. Ok, SELECT statement is not the main focus in this session but you cannot drive a car without knowing how to refill the oil, can we? Select is basically a command that shows data in a table. Let’s try to figure out how it works by studying figure 1. The next session will deal with SELECT statement in detail.

FIGURE 24: SELECT IN MYSQL

INSERT W HAT

DOES IT DO ?

Insert statement is used to put a new record into a relational database. Data is created from a program which communicate with the database. For example, an E-commerce web program developed using PHP receives commands from customers. Once the customer click purchase, the code commands database server. The programmer write that code to translate user interface interaction to SQL commands for the database. S YNTAX

INSERT INTO table_name (column name, column name 2, … ) VALUES (value for column 1, value for column 2, …)

52


Database Management [436304] Where the bold text is something user must define. The underlined text is optional information which can be specified or not. The example part below shows usage of insert statement.

E XAMPLE

INSERT INTO student VALUES (1,1,”firstname”,”lastname”);

1

INSERT INTO student (student_id,firstname,lastname) VALUES (1,”firstname”,”lastname”);

2

The first statement add a new record into table student. The value for the first column is 1. The value of the second, third and fourth column is 1, firstname and lastname, in particular order. The second statement add a new record into table student. However, it specify which columns data would be added. So, only three values are added. The second column, as inserted in the first statement, is left with null value. A live example is shown in figure 2.

FIGURE 25: INSERT STATEMENT IN MYSQL

The first command shown in figure 2 display data in a table called student. The respond is “Empty set” which means the table contains nothing.

53


Database Management [436304] The second command create a new record. Then, the command similar to the first command is once again called. The result shows that, at this time, table student contains a new data, created by the second command.

UPDATE W HAT

DOES IT DO ?

Update command modify a record. Similar to INSERT command which is initiated by user interaction with user interface of a program. An example is the “apply” button in Windows OS. Before user pressing apply, changes are made but the file is the same file. S YNTAX

UPDATE table_name SET column_name=value, column_name2=value,… WHERE condition Condition will be focus later in this course. Basically, it is a logical statement which focus the changes to one or some specific records. For example, user might specify a value of certain column. A live example is shown in figure 3.

E XAMPLE

FIGURE 26: UPDATE COMMAND IN MYSQL

54


Database Management [436304] The first statement shows the data inside student table. The second command the database to modify the data in student table by changing firstname to Djinn. But which record should be modified? The WHERE statement specify condition that only the records which contain 1 in student_id column are to be changed. The last statement show the change of data.

DELETE W HAT

DOES IT DO ?

Delete removes a record from a table. Note that it is unlike Windows OS which you have a recycle bin to undo the accidental deletion. MySQL does not have that. The best way to recover from such mistake is to back-up the database. If you accidentally delete records afterward, you can restore the database to the latest back-up point. S YNTAX DELETE FROM table_name WHERE condition; E XAMPLE

FIGURE 27: DELETE IN MYSQL

The first command of figure 4 shows data student table. The second command remove a record from student table if that record contains 2 in student_id column. The last command shows the change.

55


Database Management [436304]

SQL COMMANDS III SELECT W HAT

DOES IT DO ?

Select command is used to query data from a relational database. There are three parts in a select statement. The first part is the “select”. In this part, user specifies columns to be shown. The second part is “from” which specifies the table. The last part is “Where” which specifies the conditions. There are many options for select command. DISTINCT option can be added to select command to filter any repeating results. ORDER BY and GROUP BY can manipulate the result to fit user’s needs. SQL functions such as SUM() can be used to calculate the result from select command too. SQL Functions will be explained in volume 13th. Note that any <new line> in the example and syntax is purely optional. You do not have to press Enter when typing the command. SYNTAX

SELECT [DISTINCT] [list of column separated by comma] FROM [table] [optional - join statements] WHERE [condition] E XAMPLE Suppose the database is holding a data shown in table 1. Bill NO. A201308001 A201308002 A201308002 A201308003 A201308003 B201308004

Customer Wellington Napoleon Napoleon Khan Khan Wellington

Item Bread Cake Cream Milk Meat Meat

Price 50 120 30 20 150 150

TABLE 18: tblPURCHASE

Command 1 :

SELECT * FROM tblPurchase;

Result

The original table with all columns (specified in the command as *) results in the original table 1.

:

56


Database Management [436304]

Command 2 :

SELECT * FROM tblPurchase WHERE Customer=”Khan”;

Result

Show only the records that contain “Khan” in customer field as shown in table 2.

:

Bill NO. Customer A201308003 Khan A201308003 Khan

Item Milk Meat

Price 20 150

TABLE 19: RESULT FROM COMMAND 2

Command 3 :

SELECT Customer, Item FROM tblPurchase WHERE Customer=”Khan”;

Result

Show only 2 columns, customer and item, from the records that contain “Khan” in customer field as shown in table 3.

:

Customer Khan Khan

Item Milk Meat

TABLE 20: RESULT FROM COMMAND 3

Command 4 :

SELECT DISTINCT Customer FROM tblPurchase;

Result

Show only customer column with repeat results omitted as shown in table 4.

:

Customer Wellington Napoleon Khan TABLE 21: RESULT FROM COMMAND 4

Command 5 :

SELECT SUM(Price) AS SumNapoleon FROM tblPurchase WHERE Customer=”Napoleon”;

Result

Show sum price of all goods the customer named Napoleon has brought as shown in table 5.

:

SumNapoleon 150 TABLE 22: RESULT FROM COMMAND 5

57


Database Management [436304]

JOIN When a database is normalized, many data are represented with foreign keys. To get the original information, not the IDs, tables must be joined. To do that, JOIN statement is used in SELECT command. T YPE

OF JOIN

There are many type of join statements. Each join is explained in table. It is important to note that the first table mentioned in FROM is treated as LEFT table. The latters are the rights. A very good illustration of each type of join can be accessed via a link6 embedded in figure 1. JOIN LEFT JOIN RIGHT JOIN INNER JOIN

FIGURE 28: LINK TO SQL JOIN ILLUSTRATION

FUNCTION Return all records in the left table regardless if any of those records have a match in the right table. It will return matching records from the right table too. Return all records in the right table regardless if any of those records have a match in the left table. It will return matching records from the left table too. Return only records that exist in both tables. TABLE 23: TYPES OF JOIN AND THEIR FUNCTION

More than the joins shown in table 2, user can exclude results of INNER JOIN from the result by adding condition that key IS NULL in WHERE statement. Some RDBMS offer FULL OUTER JOIN which return all records from both tables. However, MySQL does not support FULL OUTER JOIN. To get a similar result, user must use UNION option to merge results from LEFT JOIN and RIGHT JOIN. SYNTAX

Join statement is used in FROM with SELECT command. [JOIN] [Reference table] ON [reference column]

6

Moffatt, C. (2009, Febuary 3). Visual Representation of SQL Joins. Retrieved from CodeProject: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

58


Database Management [436304] E XAMPLE Suppose there are two table, table 7 and table 8. Relationship is depicted in figure 2.

TABLE 25: tblCUSTOMER

TABLE 24: tblBILL

tblBIll PK billNO FK Customer Payment

tblCustomer PK customerID firstname lastname

Figure 29: relationship between table 7 and table 8

Type of join :

LEFT JOIN

Command :

SELECT * FROM tblBill LEFT JOIN tblCustomer ON Customer=customerID;

Result

:

TABLE 26: RESULT FROM LEFT JOIN

Type of join :

LEFT JOIN excluding INNER JOIN

Command :

SELECT * FROM tblBill LEFT JOIN tblCustomer ON Customer=customerID WHERE customerID IS NULL;

Result

:

TABLE 27: RESULT FROM LEFT JOIN EXCLUDING INNER JOIN

59


Database Management [436304] Type of join :

RIGHT JOIN

Command :

SELECT * FROM tblBill RIGHT JOIN tblCustomer ON Customer=customerID;

Result

:

TABLE 28: RESULT FROM RIGHT JOIN

Type of join :

RIGHT JOIN excluding INNER JOIN

Command :

SELECT * FROM tblBill RIGHT JOIN tblCustomer ON Customer=customerID WHERE Customer IS NULL;

Result

:

TABLE 29: RESULT FROM RIGHT JOIN EXCLUDING INNER JOIN

Type of join :

INNER JOIN

Command :

SELECT * FROM tblBill INNER JOIN tblCustomer ON Customer=customerID;

Result

:

TABLE 30: RESULT FROM INNER JOIN

60


Database Management [436304] Type of join :

FULL OUTER JOIN

Command :

SELECT * FROM tblBill LEFT JOIN tblCustomer ON Customer=customerID UNION SELECT * FROM tblCustomer RIGHT JOIN tblBill ON customerID =Customer;

Result

:

TABLE 31: RESULT FROM FULL OUTER JOIN

Type of join :

FULL OUTER JOIN excluding INNER JOIN

Command :

SELECT * FROM tblBill LEFT JOIN tblCustomer ON Customer=customerID UNION SELECT * FROM tblCustomer RIGHT JOIN tblBill ON customerID =Customer WHERE Customer IS NULL OR customerID IS NULL;

Result

:

TABLE 32: RESULT FROM FULL OUTER JOIN EXCLUDING INNER JOIN

JOIN statement works like set theory in mathematic. If you feels like you do not understand them by looking at the examples, make sure you check out the illustration linked by figure 1. It is highly encouraged that students practice joining tables.

61


Database Management [436304]

WHERE To specific conditions for SQL command such as SELECT, WHERE statement is used. WHERE statement check for logical conditions which result in either TRUE or FALSE status of a record. The conditionally TRUE records are affected by the query while the FALSE is not. Table Operators = <> != > < >= <= BETWEEN…AND… LIKE IN HAVING IS NULL IS NOT NULL AND OR

Description Equal to Not equal to. (Depend on RDBMS. MySQL support this) Not equal to. (Depend on RDBMS. MySQL support this) Greater than Less than Greater than or equal to Less than or equal to Between given range Equal a pattern (use % as a wild card) Equal any of specified values Compare results from aggregate functions such as SUM() Check for NULL data Check for any data except NULL Link two logical conditions. Perform AND operation on conditions’ results. Link two logical conditions. Perform OR operation on conditions’ results TABLE 33: LOGICAL OPERATORS

Moreover, result from SELECT statement can be manipulated by some options which follow WHERE statement. These options are shown in table 17. Option Description Order the result by values in specified columns. By default, it order in Order by…ASC/DESC ascending order (less to more). If DESC option is specified at the end, it order in descending order (more to less). 7 Perform aggregate function after grouping values in specified columns. Group by It is highly recommended that students train on these conditions and explore outside class for additional conditions. There are many conditional arguments which are not explained in this handout because it is hardly used but it can be extremely useful in certain situations such as INSERT INTO SELECT, which insert select results into a table.

7

Example of group by is in volume 13 (functions)

62


Database Management [436304]

E XAMPLE WHERE examples use data from table 7 (tblBill). Operator

:

Command : Result

= SELECT * FROM tblBill WHERE billNO=”A001”;

:

TABLE 34: RESULT FROM EQUAL (=) OPERATOR IN WHERE STATEMENT

Operator

:

Command : Result

<> or != SELECT * FROM tblBill WHERE billNO!=”A001”;

:

TABLE 35: RESULT FROM NOT EQUAL (!=) OPERATOR IN WHERE STATEMENT

Operator

:

Command : Result

> SELECT * FROM tblBill WHERE Payment>4000;

:

TABLE 36: RESULT FROM "GREATER THAN" OPERATOR IN WHERE STATEMENT

63


Database Management [436304] Operator

:

Command : Result

< SELECT * FROM tblBill WHERE Payment<4000;

:

TABLE 37: RESULT FROM "LESS THAN" OPERATOR IN WHERE STATEMENT

Operator

:

Command : Result

>= SELECT * FROM tblBill WHERE Payment>=4000;

:

TABLE 38: RESULT FROM "GREATER THAN OR EQUAL" IN WHERE STATEMENT

Operator

:

Command : Result

<= SELECT * FROM tblBill WHERE Payment<=4000;

:

TABLE 39: RESULT FROM "LESS THAN OR EQUAL" OPERATOR IN WHERE STATEMENT

64


Database Management [436304] Operator

:

Command : Result

Between…AND… SELECT * FROM tblBill WHERE Payment BETWEEN 3000 AND 5000;

:

TABLE 40: RESULT FROM "BETWEEN" OPERATOR IN WHERE STATEMENT

Operator

:

Command : Result

LIKE SELECT * FROM tblBill WHERE billNO LIKE “%3”;

:

TABLE 41: RESULT FROM "LIKE" OPERATOR IN WHERE STATEMENT

Operator

:

Command : Result

IN SELECT * FROM tblBill WHERE billNO IN (“A001”,”A002”);

:

TABLE 42: RESULT FROM "IN" OPERATOR IN WHERE STATEMENT

65


Database Management [436304] Operator

:

HAVING

Command :

SELECT * FROM tblBill WHERE Customer HAVING payment%2000=0;

Result

show records which payment can be divided by 2000 without remain.

:

TABLE 43: RESULT FROM "HAVING" OPERATOR IN WHERE STATEMENT

Operator

:

Command : Result

AND SELECT * FROM tblBill WHERE payment >=2000 AND payment <=3000;

:

TABLE 44: RESULT FROM "AND" OPERATOR IN WHERE STATEMENT

Operator

:

Command : Result

OR SELECT * FROM tblBill WHERE payment >=2000 OR payment <=3000;

:

TABLE 45: RESULT FROM "OR" OPERATOR IN WHERE STATEMENT

66


Database Management [436304]

Operator

:

Command : Result

ORDER BY SELECT * FROM tblBill ORDER BY Payment DESC;

:

TABLE 46: RESULT FROM OPERATOR "ORDER BY"

NESTED SELECT SELECT command can perform select into other select result. This is called nested select, also known as Nested Query. It is rather costly for DBMS to perform nested select but it is extremely useful in some cases. It is important that in nested select, user must specify alias or nickname for the inner select. The name must be unique in the query. To create an alias, simply capture the inner select with parentheses () and follow by the name of the select. For example: SELECT firstname, lastname, SUM(payment) AS pay FROM tblCustomer LEFT JOIN tblBill ON customerID=customer GROUP BY customerID; Will produce result shown in table 30.

TABLE 47: 1ST SELECT RESULT FOR NESTED SELECT EXAMPLE (COMPARE TO TABLE 31)

67


Database Management [436304] Then, if we need to query, select, only the customer who never pay a thing, a nested select can be used. The command is as followed. SELECT * FROM ( SELECT firstname, lastname, SUM(payment) AS pay FROM tblCustomer LEFT JOIN tblBill ON customerID=customer GROUP BY customerID) q1 WHERE pay IS NOT NULL; The result is shown in table 31. Note the bold word in the command above. That is an alias given to the inner select, enable it to act like a table which can be selected from.

TABLE 48: RESULT FROM NESTED SELECT

Nested select is a very powerful tool to work with relational database. It can be extremely complex but the instructor recommends students to practice, a lot, with it.

68


Database Management [436304]

SQL FUNCTIONS Database management system, like MySQL, can do more than putting data in and out of a table. There are some arithmetic functions that user can use. These functions provide more functionalities and take some load off the web server. There are two major types of functions provided by standard SQL language. Aggregate functions which are used for calculation over the queried data. Scalar functions deal with string and appearance of the result.

AGGREGATE FUNCTIONS Aggregate functions which are used for calculation over the queried data. For example, a user may query for prices of movies. An AVG() or average function can calculate the average price of all movies. Aggregate functions take all values into a single arithmetic functions. This means the result is always one line. If user wants the function to calculate in group, GROUP BY statement is required. If user want to SELECT some values which aggregate function return specific result, HAVING statement must be used. For GROUP BY and HAVING statement, see handout 11. Aggregate functions are used in SELECT command during column addressing area, right after SELECT keyword. It is treated as a column. The function is called by using function name and an open parentheses. In the parentheses, specify the column to be calculated. The following part use data contained in table 1 as an example.

TABLE 49: DATA FOR AGGREGATE FUNCTION

SUM() – S UMMATION Return the summation of the selected column. Takes all values and sum them together. Usage:

69


Database Management [436304] AVG() – A VERAGE Return the average of the selected column. Takes all values, sum them together and divide them by the number of instances. Usage:

COUNT() – C OUNTING

THE NUMBER OF RESULT

Returns the number of rows from the query. Usage:

MAX() – M AXIMUM Returns the maximum value of the specified column. Usage:

70


Database Management [436304] MIN() – M INIMUM Returns the minimum value of the specified column. Usage:

SCALAR FUNCTIONS Scalar function affect the queried results format. These functions do not group the data into one row of result. They turn all data into a specific format. Some scalar functions are essential for programming with MySQL. This is because MySQL keep date/time data type in “Year/Month/Date” format. Users may want to see the data in “Date/Month/Year” format, or other specific format. Therefore, scalar functions are extremely important. UCASE() – U PPER

CASE

Display data in upper case character. This function, along with LCASE(), is useful when trying to compare two data without considering character case using case-sensitive programming language. Usage:

71


Database Management [436304] LCASE() – L OWER

CASE

Display data in lower case character. This function, along with UCASE(), is useful when trying to compare two data without considering character case using case-sensitive programming language. Usage:

MID() – C UT

OUT PART OF A WORD

This function allows user to cut a portion of a string. The syntax contain the name of the column, the character position which the cut is to be started and the length of the cut. Similar function is provided by most programming language such as substr(). This function is useful when trying to show an acronym of a name such as “Jan” from January. Usage:

72


Database Management [436304] L ENGTH () – C OUNT

CHARACTER

Display character count of a data. A word with 3 characters like “Cat” results in 3. However, it is important to note that a blank data “” has zero length while a null data has no length and will results in null value. Usage:

ROUND() – R OUND

UP NUMBERS

Rounds a numeric field to the number of decimals specified. If the number is from 1.5 or more, the result is the next decimal value which is 2. Anything less than that is rounded down. Usage:

73


Database Management [436304] NOW() – C URRENT

DATE / TIME

Returns the current system date and time in default SQL format. Unlike other functions, now() does not require any parameters. Usage:

FORMAT() –

DECIMAL FORMAT

Return the data in specified format with a comma (,) at thousands and a dot (.) as specified. It is important for displaying monetary data. Usage:

CONCLUSION This handout is the last handout of SQL part of the course. Database system is operated by DBMS. RDBMS is commanded by SQL language. This course introduce SQL from data definition to data manipulation plus some SQL functions. Tools for database administration is also introduced to make the work easier. It is important to know that database system evolved every second. Students should learn about advances in database system on regular basis.

74


Database Management [436304]

DATABASE ADMINISTRATION AND TOOLS Relational database requires configuration and regular maintenance. This handout shows a few common tasks that must be performed to allow optimal operation of relational database system. Also, in this handout, database tools are introduced. These tools allow quicker and easier operation on database system. The first one is a GUI for database design called MySQL Workbench. The second is an opensource database management software called HeidiSQL.

DATABASE ADMINISTRATION Database administration is a task that allow database system to perform at its optimal performance. It also involves managing users that was covered in SQL command CREATE USER. In this handout, three important tasks, indexing, backing up and optimizing, are explained.

INDEXING Index is an important part of a database system. Index allows query to perform faster, although not visible in a small dataset. Index can be added after a table is created. However, if a design is good enough, index should be added on table creation. An example of different performance between indexed and non-indexed table is shown in figure 1, being the upper image is an indexed table while the lower is not.

Indexed

Non - indexed

FIGURE 30: PERFORMANCE COMPARISION BETWEEN INDEXED AND NON-INDEXED TABLE

75


Database Management [436304] Adding index is very easy. Simply input SQL command ALTER TABLE with “ADD INDEX” followed by the name of column to be indexed in parentheses. For example, making index for “first_name” column in “actor” table is done by this command: ALTER TABLE `actor` ADD INDEX (`first_name`);

Indexing columns can be done by GUI tools too. This will be demonstrated in the later part of this handout.

BACK UP Back up is an extremely important task. It is so important that sometimes it becomes the only task for database administrator. A whole day may be dedicated to backing up data and system. We can never know when a failure may happen, or when a fire may be started inside a server room. All we can do is prepare for the worst, this is why we must backup regularly. If you are working with programmers whose database backgrounds are different, it is a wise choice to back up the database often. Those who are not familiar with DBMS may “accidentally” drop the entire database or a few tables. Even dropping a few records might cause serious injury to the system, say it were the records about administrator username. There are many ways to back up a database. In this part, the task is performed via command line. Unlike other operation which we perform through MySQL console, back up is performed by a specific program called mysqldump.exe inside MySQL bin folder. To use the program, simply access the /bin/ folder inside your MySQL folder. In there, run mysqldump.exe with these options: mysqldump.exe [database name] > [file name] –u [username] –p [password]

A screenshot showing an example is shown in figure 2.

FIGURE 31: SUCCESSFULLY CREATED BACKUP FILE SAKILA_BCK.SQL

76


Database Management [436304] You might notice that mysqldump.exe help can be accessed with a command “mysqldump.exe –help”. This help allows you to use more options and be able to backup with certain configuration. Note that successful back up does not produce any feedback via command line. No feedback means backup is successful. Any feedback means error occurs. Ok, so you have backed up your database and then CRASH! A dinosaur dropped from the sky, ricochet with a meteor from the Mar, lands right on your database server. After a long time handling the repair, your database server is up again and you have to restore the backup file. To restore, simply run mysql.exe from command line with these option: mysql.exe –u [username] [database name] < [backup file name]

Similar to mysqldump.exe, successful command does not produce any feedback as shown in figure 3.

FIGURE 32: SUCCESSFULLY RESTORED DATABASE

OPTIMIZATION Database optimization is a very large topic. It is also vary from DBMS to DBMS. MySQL optimization and MSSQL optimization is different. Most of the time, optimizer must be familiar with that particular DBMS and the database system running in it. Optimization can save a lot of money because the database becomes more efficient. It uses less resources to work. The machine does not have to be upgraded and no transition from old machine to the new machine. Some tasks also require optimization or the system cannot perform at all.

FIGURE 33: WAMP MY.INI (MYSQL CONFIG FILE)

77


Database Management [436304] In this part, optimization is performed only with a configuration file my.ini. More optimization is covered on the later part where tools are used. Some more information for optimization can be studied from DBMS support website. MySQL configuration file is my.ini. This file can be found in MySQL directory. For WAMP server, this file can be accessed via mysql service panel as shown in figure 4. W HAT

TO LOOK AT ?

Settings in my.ini will not appear after #. The # lines are comments which are information about the related settings. Some settings that must be changed can be temporarily “Commented out” by adding # sign before the setting itself. This technique is used in the password part of WAMP’s my.ini. To optimize your database, changes can be made to buffer related settings. Buffer is a part of a program that store temporary data before it is processed. Too small buffer can slow down the system and might cause buffer overflowing error. However, to have a large buffer, more physical and logical memories are required. In other word, you might need a better, more expensive, machine to run the server. The second important setting is cache. Cache is a small temporary data storage that allows program to run faster. Unlike buffer which is required by the system, cache does not have to exist. Without cache, database system performs slower, but it can perform. To have more cache, the machine need more RAM. Other settings that can be modified are flags. These flags tell the system which options and processes are to be allowed on the server. For example, “quick” setting in [mysqldump] tells mysqldump.exe to perform in “quick mode”. Lastly, some settings post limitation to server’s functionalities. For example, max_allowed_packet setting specify the largest file allowed to be uploaded to the server. A larger file is to be refused. This limit users from uploading some files that may stop them on their track. This is good for the system as users cannot use the database for not-designed-for purposes. However, administrator must make sure to set this value to match business needs of the system.

78


Database Management [436304]

TOOLS MYSQL WORKBENCH MySQL workbench, as the name suggests, is a GUI tool for MySQL. The program is extremely useful for various task. However, this handout cover two of the main functionalities of MySQL workbench. They are design tools and capable of reverse engineering. Students can download MySQL workbench from http://dev.mysql.com/downloads/tools/workbench/. More information about the tool can be found on the same website, also linked via QR code in figure 5. D ESIGN

FIGURE 34: LINK TO MYSQL WORKBENCH

MySQL workbench provide visual workspace for database designer. This visual workspace allows designer to work with database schema in entity relationship diagram (ER diagram). This workspace can be accessed via data modeling menu. The workspace is shown in figure 6. Tool bar

Design Area

Mini map Design tools

Object tree

Object description

FIGURE 35: SCREENSHOT OF MYSQL WORKBENCH WITH EXPLAINATION IN BUBBLES

Tool bar is a place where user can use various tool to perform all MySQL workbench tasks. This is not limited to the work being opened by the program. For example, the

79


Database Management [436304] file menu on the tool bar allows user to perform file related tasks like opening a new file or saving the current workspace to a file. The design area is where use can draw their ER diagram and make relationship. However, before a user can draw a diagram, user must create tables those going to be it the diagram. This is covered in the following example, after all parts of the workspace is explained. Mini map allows user to navigate through a large diagram with ease. User can use scroll bar to get across large diagram or simply click on mini map to instantly move to the area. The map can zoom-in and zoom-out to allow better view of the overall design. Design tools are placed in a vertical toolbar just the left of the design area. These tools are used to create notions and relationships on the design area. Object tree contains all objects inside the selected database. Tables can be selected or dragged from here to the design area. This menu is organized in tree structure with a minus (-) or plus (+) sign on the left of every icon. The + sign allows user to open the object to see its children data such as fields in a table. Lastly, the object description panel on the lower left side of the screen. This panel allows user to specify object’s attributes such as adding notes for the other designer or change the color of a table on the design screen. These changes affect the design outlook but not the technical attribute of the schema. D ESIGN :

EXAMPLE

STEP BY STEP

1. After open MySQL workbench, click “Create New EER Model” under data modeling column as shown in figure 7. If you want to continue your work with an existing ERD, you can click on “Open existing EER model” Note that the name of the model is not ER but EER. EER stands for EnhancedER. It is an improved version of ERD. Basically, it contains more information of conceptual design to the details of the design of a database schema. However, most of it is similar to ERD. So, if you are proficient in ERD, you will not have any problem using it.

80


Database Management [436304]

Click here to open existing ERD Click here to create new ERD

FIGURE 36: CREATING NEW DATA MODEL

2. A new screen should appear as shown in figure 8. In this screen, you specify your tables and queries. Some objects, such as routines and groups, can be created here too.

Change DB name

Create table Create view Manage user’s privilege Write SQL script

FIGURE 37: CREATE DATABASE WORKSPACE

81


Database Management [436304] Note that there are sub menu such as privilege tab is being minimized. These tabs can be opened and user can perform related task. 3. First, we should rename our database. Double click the database name on the top to open configuration tab as shown in figure 9.

FIGURE 38: RENAME THE DATABASE

Change the name of the database as needed. Collation can also be configured. As for keeping Thai alphabets, we leave it UTF8 with utf8_general_ci. After you finish setting, click on the X icon on the right of the panel (next to EXAMPLE – Schema” shown in figure 9) to close the panel. 4. Double click “Add table” to add a new table and open table configuration tab as shown in figure 10. To add a column or a field, simply click, to make active and, type into the column area. By clicking on the column area, a new column appears. Double clicking it allows user to modify the setting. User must configure an appropriate data type of the column. Column characteristic must be configured by the following checkboxes. The meaning of all acronyms are explained in table 1. SYMBOL PK NN MEANING Primary key Not null

UQ Unique

BIN Binary

UN Unsigned data type

TABLE 50: COLUMN CHARACTERISTIC, ACRONYM AND MEANING

ZF AI Zero if Auto numeric increment

82


Database Management [436304]

FIGURE 39: ADD TABLE

If a table has more configuration to be done, click on the related menu at the bottom of the table configuration panel such as Indexes for indexing or Foreign Keys to specify foreign keys. 5. After you add more tables, you can put them on ERD by opening the ERD drawing area. Double clicking “Add diagram” on the top of the workspace to open drawing area. After that, table can be dragged, from object tree, to the drawing area. Relationships can also be set by the tool from design toolbar. If a new table is required, user can go back to create table on the previous panel or use “new table” from the design toolbar to instantly create a new table. To configure object, simply double click on it. An example is shown in figure 11. User can double click on an object to set related specification such as referencing column in relation.

Exercise: Try to create something Try to create a database to contain book information which: - A book has many authors - A book has one publisher - A book has one title - Publisher has one address - Author has one address Add more information to the tables as needed. If you want more, try to create your database from midterm exam using MySQL workbench.

83


Database Management [436304]

New table These tools create relationships with new columns

Use this tool to create relationship from existing column

FIGURE 40: USING TOOLS TO CREATE RELATIONSHIP ON DRAWING SPACE

6. After a design is finished, user can save the workspace by going to File > Save Model. This will preserve the workspace. However, if a user wants to have a database exactly as it has been designed on the workspace, export menu is used. To export the design to a SQL file which can be imported to a DBMS to create a database identical to the design, click on File > Export > and choose the appropriate option. If this design is a new database, choose “Forward engineer SQL CREATE script”. A new dialog shown in figure 12 should popped up. In the dialog, user specify export option such as file location or whether the script should perform DROP command before CREATE command. The second dialog allows user to specify what to export. For example, user may export only tables or only queries or all objects. The last dialog shows the resulting script that the program generates. The script is also saved to the specified file location. To use this script, simply perform restoration using the exported script. User may not export the script but use MySQL connection to forward engineer the design directly into the database. This is done by using Database > Forward Engineer… and follow the instruction appeared in the dialog.

84


Database Management [436304]

FIGURE 41: EXPORT DESIGN TO SQL SCRIPT

This is the basic of designing a database schema using MySQL workbench. There are many more functionalities which MySQL workbench provides. It is highly recommended for the students to try dive-in to learn the program because self-learning allow students to use newly available programs in their future works. R EVERSE

ENGINEERING

MySQL workbench allows user to reverse engineer a database. Reverse engineering is a process that create ERD from an existing database. To do this, open MySQL workbench and click “Create EER model from existing database� as shown in figure 13. If a backup script is provided, user may perform reverse engineering on the SQL script as well.

FIGURE 42: MENU FOR REVERSE ENGINEERING

85


Database Management [436304] After choosing “Create EER model from existing database”, user is asked for database connection parameters. Specify them toward the database that needed reverse engineering and click “Next”. If the connection parameters are correct, list of available databases should appear as shown in figure 14.

FIGURE 43: LIST OF AVAILABLE DATABASE

User then choose which database is needed to be reverse engineered. User then perform “Next algorithm”8 to finish the process. The result should be a drawing space with all objects in the database on it without being aligned as shown in figure 15. Aligning those objects is for the user to do.

8

Next Algorithm is to press Next button until the whole process finishes.

86


Database Management [436304]

FIGURE 44: FINISHED REVERSE ENGINEERED DIAGRAM (SEMI-ALIGNED)

PHPMYADMIN PHPmyadmin is a web program that allows user to use database more easily. It creates shortcut for various tasks such as creating tables and adding indexes. There are many programs similar to PHPmyadmin. However, PHPmyadmin is a free opensource software and it comes with WAMP server so we do not have to install it separately. More information, includes a demo and download links, can be found at http://www.phpmyadmin.net/. B ASIC

OPERATIONS

FIGURE 45: LINK TO HEIDISQL To run PHPmyadmin by default setup parameters, open a web browser and enter “localhost/phpmyadmin” into the address bar. If WAMP server is being used, there would be no login screen. If you are using PHPmyadmin for production, however, please do create username and password accordingly. Do not let everyone use root user, people make mistakes.

Once PHPmyadmin is opened, the first screen is a homepage as shown in figure 17. In this screen, the left column shows information of available database in the DBMS. User can click on the name of the database. Doing so results in a similar result of running “use database”. Menu bar on the top center to top right allows user to perform various

87


Database Management [436304] database management tasks and access database information. For example, if user wants to execute a SQL command, simply click SQL tab and code as shown in figure 18.

FIGURE 46: PHPMYADMIN HOMEPAGE

FIGURE 47: PHPMYADMIN PAGE FOR SQL CODING

SQL code in textbox shown in Figure 47 is executed when the user press “Go� button on the lower right of the screen. Several commands can be specified at once, separated by chosen delimiter (by default, ;). This is more convenient than using mysql console because it allows user to fix misspelling, plus color code in textbox helps identify syntax errors too.

88


Database Management [436304] Command result + time spend

Query result

Available actions

FIGURE 48: RESULT OF SQL SCRIPT

Figure 48 shows a result page from running an SQL script. The top part of the page shows command result and time it takes to perform that command. If there is an error, error message is shown in a red box. On the right side, there are several buttons. The most used is edit button, which allows user to edit the previous command. Lastly, query result, if any, is shown in the lower part of the page. The buttons shown in the left side of query result allows user to modify the data of that specific record. The next important tab is structure tab. This tab allows user to view and modify structures of database objects such as tables. For example, an index can be added from structure tab by clicking at “Index� button of the indexing column. This performs ALTER command automatically. The result of is shown in the result area as in Figure 49.

FIGURE 49: ADDING INDEX

89


Database Management [436304] B ACKUP & R ESTORE To back up a database, select the desired database and click on “Export tab�. This tab allows user to backup the active object, database or table. User can choose the backup format to be in SQL code or others such as CSV file or even PDF. Several options can be specified by choosing custom backup instead of quick backup. To restore, simply click import tab and select the backup file using browse button. After the file is selected, click go button at the bottom of the page. However, user cannot restore file larger than limited file size which is shown next to browse button. To fix problem with too large file, user must modify PHP.ini which is a configuration file of PHP system. Look for this part: ; Maximum allowed size for uploaded files. ; http://php.net/upload-max-filesize upload_max_filesize = 8M

User must modify the number of upload_max_filesize to cover that file. This will allows PHP to take more resources from the server machine. After modifying the value, web server may need to be restarted once, for the change to take effect. O PTIMIZATION Optimization can be perform from operations tab. In operation tab, user can change table structure and options. The lower part of the page allows user to check table for problems, defragment the table for more space, optimize table for better performance and flush table to reload table caches. The option is shown in Figure 50.

FIGURE 50: PHPMYADMIN AVAILABLE TABLE OPERATION

CONCLUSION This handout explains some important tasks of database administration. These tasks are indexing, backing up and optimizing the database. Tools are MySQL workbench and PHPmyadmin. Both are free to use and compatible with MySQL database. They provide users with GUI and make working with database easier than using command line.

90


Database Management [436304]

BIBLIOGRAPHY Hauman, G. (2009, March 11). The Manga Guide To Databases. Retrieved from Comicmix: http://www.comicmix.com/news/2009/03/11/the-manga-guide-todatabases/ Kifer, M., Bernstein, A., & Lewis, P. M. (2006). Database systems : an applicationoriented approach. Boston: Pearson/Addison-Wesley. Oracle Corporation. (2013, July 15). 11 Data Types. Retrieved from MySQL 5.7 Reference Manual: http://dev.mysql.com/doc/refman/5.7/en/data-types.html Oracle Corporation. (2013, July 15). MySQL 5.7 Reference Manual. Retrieved from MySQL 5.7 Reference Manual: http://dev.mysql.com/doc/refman/5.7/en/index.html Preez, R. d. (2012). Bad Database Design ERD. Retrieved from Creately: http://creately.com/diagram/example/gz2j13w82/Bad+Database+Design+ERD Rob, P., & Coronel, C. (2007). Database systems : design, implementation, and management. Boston: Thomson Course Technology. Takahashi, M., & Azuma, S. (2009). Manga Guide to Databases. San Francisco: No Starch Press. Whitemarsh Information Systems Corporation. (2008). Modeling Data and Designing Databases. Retrieved from Whitemarsh Short Papers: http://wiscorp.com/sp/sp06.pdf

จิวะรังสิ นี, ศ., & สังข์เจริ ญ, ข. (2551). คู่มือระบบฐานข้ อมูล Oracle 11g ขัน้ พืน้ ฐาน. กรุ งเทพฯ: บริ ษทั เทรนลิสต์ จำกัด.

91


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.