SpagoBI 4.0 Baby Steps. A step by step guide.
Stephen Ogutu
SpagoBI 4.0 Baby Steps. The Ogutu Foundation, www.ogutu.org
Copyright © 2012 by Stephen Ogutu.
All rights reserved, including the right to reproduce this book or portions thereof in any form whatsoever. For information, address: Stephen Ogutu, P.O. Box 8031-00200, Nairobi, Kenya.
Trademarks: All other trademarks are the property of their respective owners. Stephen Ogutu is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Dedication
This book is dedicated to all fathers who go back home to their children every evening. God bless you for bringing up a responsible generation and being there for your wives.
Acknowledgments
Special thanks to the SpagoBI community and the OW2 Consortium. Thank you all for creating a great product and documenting it effectively.
Table of Contents
Introduction.
Business Intelligence with SpagoBI 4.0
  Business Intelligence.
  Introduction to SpagoBI 4.0
  Configuring SpagoBI 4.0
  SAMPLE 1 - CHARTS
  SAMPLE 2 - OLAP Documents
  SAMPLE 4 - COCKPITS
  SAMPLE 5 – LOCATION INTELLIGENCE DOCUMENT
  SAMPLE 6 – MONITORING CONSOLE
  SAMPLE 7 – NETWORK ANALYSIS REPORT
OLAP with JPIVOT
  Online Analytical Processing
  OLAP Cube
  Your first OLAP document in SpagoBI 4.0
Location Intelligence
  What is Location Intelligence?
  The Problem.
  Solution.
  STEPS TO CREATE AN SVG MAP WITH INKSCAPE.
  Geo Template
  Uploading the Geo Template.
Highchart Dashboards
  Business Intelligence dashboards
  Method 2: Use the SpagoBI Studio.
Introduction to BIRT Reports
  BIRT
Creating a dashboard with BIRT and SpagoBI
  Creating a dashboard with BIRT and SpagoBI
Cross Navigation with SpagoBI
  Introduction.
  The Master Document.
  CHILD REPORT.
  Uploading Child Report to SpagoBI server
JfreeChart, Highcharts and Cockpits
  Introduction.
  JfreeChart - Speedometer dial chart.
Datamining with SpagoBI and Weka
  Introduction.
  Loading the data.
  SQL*Loader
  WEKA
  Cluster analysis
  Downloading Weka.
  JDBC Driver
  A simple analysis with weka
  Understanding the output.
  K-Means Algorithm
  Enter SpagoBI
  SpagoBI datamining document.
  Qbe Document.
Introduction.
How much of the information you learnt in school do you actually use in your daily life? Not much, I guess. Most books contain hundreds upon hundreds of pages, and the reader gets so lost in the detail that they just give up. In writing this book, I have decided to cover only the important subjects in SpagoBI so that by the end of it you will be productive with this technology. You can then use the online documentation available on the SpagoBI website to continue your journey with this beautiful piece of software.
This book has been written for the non-technical user who just wants to download SpagoBI and start using it immediately for a Business Intelligence assignment or to provide better reports for their company.
A journey of a thousand miles begins with a single step – Laozi (604 BC – 531 BC).
We have decided to call this book Baby Steps because it really is a step by step guide for someone who is just beginning his or her thousand-mile journey in SpagoBI and Business Intelligence. This book is not meant for the expert. Enough talk already; let us begin by taking our first step on the next page.
LESSON ONE
Business Intelligence with SpagoBI 4.0
OBJECTIVES
After completing this chapter, you should be able to:
1. Describe the importance of Business Intelligence to an organization.
2. Download, install and configure SpagoBI 4.0.
3. Log in to SpagoBI 4.0 and describe the various components.
CONTENTS
Business Intelligence with SpagoBI
• Introduction to Business Intelligence.
• Introduction to SpagoBI.
• Configuring SpagoBI.
• SpagoBI components.
Business Intelligence.
The world today generates terabytes of data from many sources. Millions of tweets, Facebook updates and emails are sent each day. Millions of transactions occur in the financial sector every hour. How do we make use of this vast amount of data to benefit the business? How do you present the information to management in an easy to use format?
For decades, many IT managers have relied on the big software corporations to provide them with business intelligence software that can help with the task of generating and presenting the required reports. Fast forward to the last decade, and it is no longer only big businesses that generate vast amounts of data. A web startup might require business intelligence software to make sense of the customers that visit its web site. Traditional business intelligence software does not come cheap, and before the advent of open source software it was almost impossible for businesses with small budgets to afford it.
Thankfully, the open source community has produced wonderful business intelligence software which is easy to use and ready for production. SpagoBI is a business intelligence suite from Italy which is licensed under the GNU GPL and supports all the fields of business intelligence such as OLAP, dashboards, reports and charts.
According to the online encyclopedia Wikipedia, Business Intelligence refers to computer-based techniques used in identifying, extracting and analyzing business data, such as sales revenue by products and/or departments, or by associated costs and incomes. Business Intelligence technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining and predictive analytics.
Business Intelligence systems are very important for decision making and are sometimes referred to as decision support systems. To put this in perspective, assume you are the CEO of a large bank. What kind of information would you like to know about your bank at any given time? You might want to know how many account holders are in a certain age group, for example, so as to decide which products best suit them, or you may want to know how much you spend to acquire a customer. Naturally, you always want to know what your competitor is doing. This belongs to a branch of Business Intelligence called Competitive Intelligence.
Introduction to SpagoBI 4.0
SpagoBI is an open source business intelligence suite. It consists of several engines and analytical areas. The engines include SpagoBIBirtReportEngine, SpagoBIJPivotEngine and SpagoBIHighChartsEngine. In total there are around 21 engines, and the complete list can be found at http://www.spagoworld.org/xwiki/bin/view/SpagoBI/AnalyticalEngines. At the time of writing this book, the latest SpagoBI is version 4.0 and most of the steps outlined in this book are for this version. Some examples are based on the previous version, 3.6, and you will be notified when we give such examples.
Downloading and installing SpagoBI 4.0
Download SpagoBI 4.0 at the URL http://forge.ow2.org/project/showfiles.php?group_id=204. There are two components that we will need for this course:
1. SpagoBIServer - This is the actual business intelligence platform that offers all the core and analytical functionalities. It is also where we will be hosting all reports created using BIRT. Click on All-In-One-SpagoBI-4.0-09072013.zip to download the SpagoBI Server as illustrated below.
2. SpagoBI Studio - We will need the SpagoBI Studio to create BIRT reports, OLAP documents, charts, location intelligence documents and cockpits. BIRT is an Eclipse-based reporting tool; the acronym stands for Business Intelligence and Reporting Tools. Download SpagoBI Studio by clicking on SpagoBIStudio_4.0.0_win_32_09072013.zip as illustrated below. If you are using Linux or a 64-bit version of Windows, select the appropriate file.
In addition to the two pieces of software above, you will need to install the Java Development Kit and ensure that the Java bin directory is in your computer's PATH variable. Also ensure you have set up the JAVA_HOME environment variable. I recommend you use Java 1.6 for SpagoBI 4.0, as the most up-to-date version of Java, 1.7, has an issue with SpagoBI 4.0, particularly when uploading documents to the server from the studio. Test that your Java is set up correctly by running the command “java -version”.
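The environment checks described above can also be scripted. The following is only a minimal sketch in Python (Python is not part of the SpagoBI toolset; it is used here purely for illustration), and the function name check_java_setup is hypothetical. It looks for JAVA_HOME and for a java executable on the PATH, and reports anything that is missing.

```python
import os
import shutil


def check_java_setup(env, find_executable=shutil.which):
    """Return a list of human-readable problems with the Java setup."""
    problems = []
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append("JAVA_HOME does not point to a directory: " + java_home)
    # Search for the java executable on the PATH taken from the same environment.
    if find_executable("java", path=env.get("PATH", "")) is None:
        problems.append("java was not found on PATH")
    return problems


if __name__ == "__main__":
    # Check the environment of the current user and print the findings.
    for problem in check_java_setup(os.environ) or ["Java setup looks fine"]:
        print(problem)
```

This does not check which Java version is installed; running “java -version” by hand remains the simplest way to confirm that.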
Configuring SpagoBI 4.0
I downloaded and kept all my software in the folder F:\BI, so the full path to my SpagoBI server is F:\BI\All-In-One-SpagoBI-4.0. As you can see from the path, I am using version 4.0, but you should download the latest version if one is available. Set up the environment variable CATALINA_HOME to point to F:\BI\All-In-One-SpagoBI-4.0 as shown below.
Navigate to F:\BI\All-In-One-SpagoBI-4.0\bin and double click on the file startup.bat to start SpagoBI server.
From the startup output, we can see that SpagoBI uses the Tomcat server by default (see arrow A), and therefore you can easily change the IP address and the port of the server. By default it starts on port 8080 (see arrow B). If port 8080 is already being used by another program, you can change the port number in the Tomcat configuration file F:\BI\All-In-One-SpagoBI-4.0\conf\server.xml. You will know that the startup is finished when you see a line similar to the one at arrow C, which gives the amount of time the server has taken to start.
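If you are not sure which port Tomcat is configured to use, or whether that port is already taken, a small script can tell you. The sketch below is an illustration in Python and rests on assumptions: the XML fragment is a heavily trimmed, hypothetical stand-in for a real server.xml, and the helper names connector_ports and port_in_use are made up for this example.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed server.xml; a real Tomcat file has more elements.
SAMPLE_SERVER_XML = """
<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000"/>
  </Service>
</Server>
"""


def connector_ports(server_xml):
    """Return the port of every <Connector> element as an integer."""
    root = ET.fromstring(server_xml)
    return [int(connector.get("port")) for connector in root.iter("Connector")]


def port_in_use(port, host="127.0.0.1"):
    """Return True if something already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in connector_ports(SAMPLE_SERVER_XML):
        state = "already in use" if port_in_use(port) else "free"
        print("Connector port", port, "is", state)
```

To check your own installation, read the contents of conf\server.xml into a string and pass it to connector_ports instead of the sample text.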
Note: You might get the error “SEVERE: Catalina.start LifecycleException: Protocol handler initialization failed: java.net.BindException: Address already in use: JVM_Bind <null>:8080” if the port is already used by another server.
Once the server is up, navigate to the URL http://localhost:8080/SpagoBI and log in using the user biadmin and password biadmin.
Note: By default, there are various other users, e.g. bitest, bimodel, bidev and biuser, each with a password identical to the username, but we will ignore these other users at this point.
Now that we have logged in to SpagoBI, we can test a few documents that come packaged with SpagoBI before we start creating our own.
SAMPLE 1 - CHARTS
Follow the steps below to open a sample chart document:
1. Log in to SpagoBI as user biadmin with password biadmin.
2. Navigate to Documents development.
Select Analytical documents, then chart.
Click on the first chart “Double Pie”. You will get the error below.
This is because, although the SpagoBI server is up, the database from which the sample reports get their data is down, so you have to start it. To start the database, use the script F:\BI\All-In-One-SpagoBI-4.0\database\start.bat
Once the script starts, repeat the procedure of opening the “Double Pie” report.
This should now open the chart. SpagoBI has various chart engines. This chart has been created by an engine called HighChartEngine.
SAMPLE 2 - OLAP Documents
SpagoBI 4.0 also ships with sample OLAP documents. OLAP is an acronym for Online Analytical Processing and is used to view data using different dimensions or views. To open a sample OLAP document, under the Olap folder, select “Inventory Cube”.
OLAP allows you to view data in various dimensions. In the example above, you can view information on drinks in many ways just by collapsing Product and Region as shown below.
For example, we can see that there were 4,008 units of drinks ordered in Mexico City. This allows you to view a very large amount of information easily by slicing and dicing! We will learn how to create our own OLAP objects using a step by step example in this book.
SAMPLE 3 - BIRT DOCUMENT
Under the Reports folder, select the DOC_RPT_002 report. This is an example of a BIRT report. BIRT (Business Intelligence and Reporting Tools) is an open source tool used to create reports and charts and is similar to the Crystal Reports designer tool. For Product department, select Beverages and click on Confirm. For age, select 50-60 and then click on the Execute document button.
The BIRT document will be opened.
This will display a BIRT document that has an embedded chart. BIRT is a very powerful tool which we will be making use of extensively in this course.
SAMPLE 4 - COCKPITS
Next we will look at Cockpits. Under the Cockpits folder, select the “Sales and Revenue” document.
This will open the cockpit document “Sales and Revenue”. A cockpit document is made up of several other documents; in this case, five individual documents are combined to create a sales cockpit.
SAMPLE 5 – LOCATION INTELLIGENCE DOCUMENT
Under the Maps folder, select “Sales Cost by Region”.
This opens a report created by the Geo engine that allows you to select a location you want to analyze. Such documents are sometimes called Location Intelligence documents.
We shall be creating our own location intelligence documents in this book.
SAMPLE 6 – MONITORING CONSOLE
Under “Monitoring Console”, select “Engine Monitor”.
This report shows you what is happening in your environment, such as how many users you have and the load on your system.
SAMPLE 7 – NETWORK ANALYSIS REPORT
Next we will look at a sample Network Analysis report. Click on the “Network Analysis” folder and select DOC_NTW_001.
This will open a report that shows the relationship between product brands and customer profiles (yearly income based).
Lastly, let us look at a sample KPI report. Key performance indicator (KPI) reports are used to measure performance. Under the KPI Model folder, select “KPI Model”.
This will open a KPI report.
We should now have a good overview of the kind of documents available in SpagoBI 4.0. Note that SpagoBI contains many other types of documents which we have not covered here. Now we will proceed to create our own documents.
LESSON TWO
OLAP with JPIVOT
OBJECTIVES
After completing this chapter, you should be able to:
1. Describe the importance of OLAP to your business.
2. Use JPIVOT.
3. Use Mondrian cubes.
CONTENTS
OLAP with JPIVOT
• Introduction to Online Analytical Processing (OLAP).
• OLAP cubes.
• Star and Mondrian Schema.
• Step by step example.
Online Analytical Processing
Prepare the database.
The data for this assignment is in the table server_uptime. If you have not already done so, load the MySQL dump that came with the CD when you purchased this book. The dump is under the folder SapgoBI4/database/bidb.sql. You can write to the author at xogutu@gmail.com or info@ogutu.org for a copy in case you have misplaced your book CD or if you did not purchase this book. (Community Edition)
Online Analytical Processing (OLAP) enables one to analyze multidimensional data along its different dimensions, that is, from different perspectives. Consider sales data as an example. One might be interested in analyzing sales data in terms of the date when the sale occurred, the region where it occurred and the store where it occurred. The sales amount we are analyzing is called a measure. The way we analyze the measure (sales amount) is called a dimension. Therefore the sales date is one dimension of looking at the sales data; the store where the sales occurred is another dimension. We can therefore look at the sales data by date, by store, etc.
We would like to demonstrate this in SpagoBI using simple data with one dimension and several measures. Below is the problem description:
Shemma Global is a Business Intelligence company that specializes in data mining and analysis. They would like to view the memory usage of one of their servers by event time.
Olap 1: Server uptime data.
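To make the idea of a measure and a dimension concrete before we touch SpagoBI, here is a minimal sketch in Python. The column names event_date and used_memory come from the server uptime data; the values themselves are invented for illustration, and the function name average_by_dimension is hypothetical. It averages the measure (used memory) for each value of the dimension (event date).

```python
from collections import defaultdict

# Hypothetical rows of (event_date, used_memory); the values are invented.
rows = [
    ("2013-07-01", 512),
    ("2013-07-01", 768),
    ("2013-07-02", 1024),
]


def average_by_dimension(records):
    """Average the measure (second field) for each dimension value (first field)."""
    totals = defaultdict(lambda: [0, 0])  # dimension value -> [sum, count]
    for dimension_value, measure in records:
        totals[dimension_value][0] += measure
        totals[dimension_value][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


print(average_by_dimension(rows))
# prints {'2013-07-01': 640.0, '2013-07-02': 1024.0}
```

An OLAP engine does essentially the same grouping and aggregation, only over far more data and with several dimensions at once.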
From the diagram Olap 1 above, which you can get by running the query “select * from v_dbuptime”, we are only interested in two columns, event_date and used_memory (highlighted in blue). The event_date is our dimension (how we would like to view the data) and the used_memory is the data we would like to view (the measure). This kind of table is called a fact table. Normally, dimension data like date, sales region, etc. are not stored in a fact table but in a dimension table. A foreign key is then included in the fact table to link the two. Consider a telecommunication company as another example. We would make the fact table the central table in our schema, surrounded by dimension tables as shown below.
Table 1: The fact table: Activations, Deletions, Churn, Consumed Airtime.
Table 2: Dimension Table (Time): Year, Quarter, Month.
Table 3: Dimension Table (Customer): Region, Age.
Table 4: Dimension Table (Product): Prepaid, Post Paid.
Olap 2: Star Schema
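The star schema above can be tried out without any BI tool. The following is a rough sketch in Python using the built-in sqlite3 module and an in-memory database. All table names, column names and figures are assumptions invented for this illustration, and only the Time and Product dimensions are included to keep it short. The fact table holds the measures plus one foreign key per dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, plan TEXT);
-- The fact table stores the measures and a foreign key to each dimension table.
CREATE TABLE fact_subscribers (
    time_id INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    activations INTEGER,
    consumed_airtime REAL
);
INSERT INTO dim_time VALUES (1, 2013, 1, 3), (2, 2013, 2, 4), (3, 2013, 2, 5);
INSERT INTO dim_product VALUES (1, 'Prepaid'), (2, 'Post Paid');
INSERT INTO fact_subscribers VALUES
    (1, 1, 100, 250.0), (2, 1, 150, 300.0), (3, 1, 50, 120.0), (3, 2, 30, 400.0);
""")


def activations_by_quarter(connection):
    """Sum the activations measure along the Time dimension."""
    return connection.execute("""
        SELECT t.year, t.quarter, SUM(f.activations)
        FROM fact_subscribers AS f JOIN dim_time AS t ON f.time_id = t.time_id
        GROUP BY t.year, t.quarter ORDER BY t.year, t.quarter
    """).fetchall()


def activations_by_plan(connection):
    """Sum the activations measure along the Product dimension."""
    return connection.execute("""
        SELECT p.plan, SUM(f.activations)
        FROM fact_subscribers AS f JOIN dim_product AS p ON f.product_id = p.product_id
        GROUP BY p.plan ORDER BY p.plan
    """).fetchall()


print(activations_by_quarter(conn))  # prints [(2013, 1, 100), (2013, 2, 230)]
print(activations_by_plan(conn))     # prints [('Post Paid', 30), ('Prepaid', 300)]
```

The same fact rows answer both questions; only the dimension table we join to changes. That is the main attraction of keeping dimensions in their own tables.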
In the above example, it is easy to answer questions like how many mobile phone subscribers were activated in the last quarter, or how many subscribers are post paid or prepaid. The above schema is called a star schema. If we were to build an OLAP document for this assignment, we would build a single fact table (Table 1) and link it to the dimension tables using foreign keys. For the purpose of demonstrating the server problem, we will keep the measure and dimension in a single fact table.
OLAP Cube
An OLAP cube is a collection of measures (facts) and dimensions. In the telecommunication example above, we can create a cube which can answer questions like how many subscribers were activated in a certain year, quarter or month, how much airtime was consumed by customers from the Nairobi region, or how many subscribers are on pre-paid.
Your first OLAP document in SpagoBI 4.0
To create a cube in SpagoBI, we will be using the SpagoBIJPivotEngine, which comes embedded with your SpagoBI server. The cube will be created using SpagoBI Studio, which will automatically generate the XML schema file for us. This will be a simple cube, based on the data from the diagram Olap 1, which shows the average amount of used memory by day. The average used memory is the measure or fact, while the day is the dimension. So our cube has only one measure, the average amount of memory used on any given day. Here are the steps.
1. Under F:\BI\SpagoBIStudio_4.0_win32 double click on SpagoBI.exe. This will start SpagoBI Studio.
2. Click File -> New -> Project.
3. Select “SpagoBI Project”
4. For Project name, enter “OLAP”
5. Click on Finish. It will ask you if you want to associate the project with “SpagoBI perspective”, select “Yes”.
6. Under "Data Source Explorer", right click on “Database Connections" and select “New”.
7. For the “Connection Profile”, select “MySQL” and for the name use “BIDB”.
8. Under “New Connection Profile”, select “New driver definition”.
9. Under the Name/Type Tab, Select system version 5.0 and for the Driver name enter “MySQL JDBC Driver 5.0”
10. Under JAR List tab, click on remove JAR/ZIP. This will remove any previously defined jar file.
11. Next click on Add JAR/Zip, select the mysql-connector-java-5.0.8-bin.jar file under F:\BI\All-In-One-SpagoBI-4.0\lib and click on OK. This will add the MySQL JDBC driver needed to access the MySQL database. Without this JDBC driver, SpagoBI Studio will not be able to communicate with the MySQL database and therefore we will not be able to create any report.
12. For database enter BIDB. Remember this is the name of the MySQL database that comes with the CD included in this book. For the URL, use jdbc:mysql://localhost:3306/BIDB and enter the username and password for your database. Click on Save password.
13. Click on Test Connection then Finish.
14. Right click on Business Models and select New Model.
15. For the Model name use UptimeModel and the file name use Uptime.sbimodel. Make sure business Models folder is selected.
16. Select the BIDB connection we had created previously.
17. For the tables to import to your physical model, select server_uptime.
18. For tables to import to your business model, select server_uptime.
19. Click on Finish. Right Click on the business model and click on show properties view.
20. Change the name to Server Uptime.
21. Change the type to cube.
22. Change Swap usage to be a measure.
23. Do the same for processes, available memory and used memory. Notice how the icon changes.
24. Now that we have measures, let us add a dimension. Remember a dimension helps us look at the data in various ways. For example, we would like to look at the data by week day. To create a Dimension, we will add a new business class. Right click on business model, select Edit then Add business class.
25. For physical table, select server_uptime.
26. For the attributes, select weekday and id. We will use the column id to link with the cube table.
27. For name, enter Analysis Period and click on Finish.
28. Change the type of the Analysis Period to dimension.
29. Right click on Analysis Period, select Edit and select Edit Hierarchies.
30. Drag Weekday to Hierarchy.
31. Lastly let us link the cube and the dimension.
32. The relationship name can be anything, but for uniformity use Uptime_Period. The source business class should be Server Uptime and the target business class Analysis Period. For both the source attribute and the target attribute, select ID. Then click on Add relationship and Finish.
33. Now it is time to create the Mondrian template, an XML file that captures everything we have defined so far. It is this XML file that will be uploaded to the SpagoBI server.
Right click on Business Model, select Create and click on Mondrian Template. Select OLAP Templates folder and for the file name enter Uptime. Click on Finish.
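For orientation, a Mondrian schema for this model might look roughly like the sketch below. It is written by hand for illustration and is not the file the Studio generates: the schema name, the column names swap_usage, processes, available_memory and used_memory, and the sum aggregator are all assumptions based on the measures and the Analysis Period dimension defined in the steps above. Compare it against the Uptime file in your OLAP Templates folder rather than relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; the template generated by the Studio is authoritative. -->
<Schema name="ServerUptime">
  <Cube name="Server Uptime">
    <!-- Table behind the cube (steps 17 and 18) -->
    <Table name="server_uptime"/>

    <!-- Analysis Period dimension, linked to the cube through the id column (step 32) -->
    <Dimension name="Analysis Period" foreignKey="id">
      <Hierarchy hasAll="true" allMemberName="All Periods" primaryKey="id">
        <Table name="server_uptime"/>
        <!-- Weekday is the only level dragged into the hierarchy (step 30) -->
        <Level name="Weekday" column="weekday" uniqueMembers="true"/>
      </Hierarchy>
    </Dimension>

    <!-- Measures (steps 22 and 23); column names and aggregator are assumed -->
    <Measure name="Swap usage" column="swap_usage" aggregator="sum"/>
    <Measure name="Processes" column="processes" aggregator="sum"/>
    <Measure name="Available memory" column="available_memory" aggregator="sum"/>
    <Measure name="Used memory" column="used_memory" aggregator="sum"/>
  </Cube>
</Schema>
```

Reading the generated file with this outline in mind makes it easier to see how each click in the Studio ends up as a Cube, Dimension, Level or Measure element.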
34. Now we need to upload the Mondrian Template to SpagoBI server so this would be a good time to start your SpagoBI server.
35. Log in to SpagoBI as user biadmin with password biadmin and create a data source that our OLAP document will use to connect to the database. Under Resources select Data source.
36. Click on the Add button and for the Label use BIDB. For the dialect select MySQL, for the Type select JDBC, for the URL enter jdbc:mysql://localhost:3306/BIDB and for the driver enter com.mysql.jdbc.Driver. Then click on the Test button.
37. You should get a success message.
38. Now that we have our data source created and tested, let us go back to SpagoBI Studio and create a link to the SpagoBI server so that we can upload the Mondrian template we created in the Studio to the server. Under Resources, right click on Server and select New Server.
39. For server name enter SPAGO4. For URL enter http://localhost:8080/SpagoBI and for username enter biadmin. By default the password is the same as the username. Click on active, then click on Test. It should be a success. Click on Finish.
40. Now, to upload the template we created to the SpagoBI server, right click on it and select deploy OLAP template.
41. For the name, enter ServerUptime. For datasource select BIDB and click on Finish.
42. You should get the screen below if all went well.
43. Now under document browser, select biadmin folder and click on the server uptime document.
44. Then select Open OLAP Navigator, the small leftmost button.
45. Click on Measures and make sure all measures are selected. Click on OK. Then Ok again.
46. Now let us use the OLAP cube to answer some questions.
a. How much memory in KB was used on Sunday?
b. When is memory mostly used? Can we show this in a graph?
And there you have it, your very first OLAP document.
OLAP ASSIGNMENT: Store Sales example.
Prepare the database. The data for this assignment is in the table storesales. The view which the cube will be based on is v_storesales.
Problem definition: Shemma Global has offices in Nairobi, Kisumu, Mombasa and Kitale. The sales department would like to view the total sales for any store by year, quarter, month and day. Our measure is sales and we have a time dimension with multiple levels: year, quarter, month and day. Create a JPivot OLAP cube with SpagoBI to achieve this.
NOTE: If you purchased this book, the steps required to complete the assignment are in the video OLAP_Assignment1, which is a step by step guide on how to complete the assignment. If you got this book free and would like to have the companion videos and database, contact us at info@ogutu.org or xogutu@gmail.com to find out how to get copies of the videos and database files.
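As a hint for the time dimension, the fragment below sketches how a year, quarter, month and day hierarchy can be expressed in a Mondrian schema. It is only an outline under assumptions: the column names sales_year, sales_quarter, sales_month and sales_day are hypothetical and must be replaced by whatever columns v_storesales actually exposes, and the companion video remains the reference solution.

```xml
<!-- Illustrative fragment; it belongs inside a <Cube> element. Column names are assumed. -->
<Dimension name="Time" type="TimeDimension">
  <Hierarchy hasAll="true" allMemberName="All Periods">
    <!-- Levels run from the coarsest to the finest grain -->
    <Level name="Year" column="sales_year" type="Numeric" uniqueMembers="true" levelType="TimeYears"/>
    <Level name="Quarter" column="sales_quarter" uniqueMembers="false" levelType="TimeQuarters"/>
    <Level name="Month" column="sales_month" uniqueMembers="false" levelType="TimeMonths"/>
    <Level name="Day" column="sales_day" type="Numeric" uniqueMembers="false" levelType="TimeDays"/>
  </Hierarchy>
</Dimension>
```

Because the levels are read straight from the view the cube is based on, the fragment needs no separate dimension table. If your time attributes live in their own table, the dimension would need a foreignKey and a Table element instead.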
Assignment Questions and Answers:
1. Compare Sales for Quarter 4 for Nairobi in 2011 against Sales for Quarter 4 for 2010.
2. Compare sales between quarter 3 of 2010 and quarter 3 of 2011 for all stores.
3. Include a column with Total Sales.
References and further reading:
1. The OLAP_Assignment1.avi companion video.
2. http://jpivot.sourceforge.net/
3. http://wiki.spagobi.org/xwiki/bin/view/spagobi_server/JPivot
4. http://mondrian.pentaho.com/documentation/schema.php
LESSON TWO Location Intelligence
OBJECTIVES
After completing this chapter, you should be able to:
1. Describe what Location Intelligence is.
2. Describe the importance of Location Intelligence to your business.
3. Use the SpagoBI Geo Engine.
CONTENTS
Location Intelligence.
• What is Location Intelligence?
• How is Location Intelligence useful to an organization?
• Creating a Location Intelligence document, step by step.
What is Location Intelligence?
Location Intelligence is the integration of Business Intelligence and geographical data. This gives a visual representation of the data on a map.
Consider the SpagoBI GEO document above. It shows the number of livestock at risk due to famine, for use by a disaster preparedness organization. Policy makers will find it easy to understand because they are mainly interested in the regions colored red. You can see that we have different tones of red on the map. The regions with a deeper tone, like Upper Eastern, have more livestock at risk than the other regions.
In this chapter, we will learn how to create such documents. The document shown above is called a Geo document in SpagoBI. It is created using an SVG map and an XML template called a geo template. Apart from the map and the template, you will also need a source of data. So we need to understand how to create these 3 components: an SVG map, a geo template and a data source.
The Problem.
Assume you are a consultant for the Kenya Red Cross and you are tracking an upcoming famine in the month of August. You have data from various provinces and regions in Kenya and you would like to put it on a map so that you can project it on the big screen in the conference room. The map will be updated automatically as data comes in from the provinces.
Solution.
To solve this problem, several things need to be done first:
(i) We will need the data. For this exercise the data is in the table livestock_by_county in the database bidb.
(ii) We will need an SVG map. SVG, or Scalable Vector Graphics, is an XML-based standard. We will be creating the map of Kenya with its various regions. We can do this with a free tool called Inkscape, or you can use Corel Draw if you have a license for it. Instead of starting from scratch, follow these steps to create a map based on an already existing map. If you are good with Inkscape or Corel Draw, you can skip this and draw your own SVG map. Note that if you are not interested in creating the SVG map, there is already an existing one under the folder “Location Intelligence” on the CD that came with this book. It is called Kenya_Map.svg. If you choose not to create the map, jump to step (iii); otherwise follow the steps below to create it.
STEPS TO CREATE AN SVG MAP WITH INKSCAPE.
a. Navigate to http://upload.wikimedia.org/wikipedia/commons/2/2c/Kenya_location_map.svg and, when the map opens on your screen, take a screenshot of the page and crop the image using Microsoft Paint or any tool of your choice. Save the image as F:\Location Intelligence\Kenya.png. If you open your file, it should look like this. We will use this image as the base of our SVG map since we do not want to waste time drawing the map of Kenya with all its borders.
b. Download and install Inkscape.
c. Open Inkscape.
d. Go to File->Open then select the PNG image we saved above.
e. Select embed.
f. You can see that the regions are already separated by lines so we will just use a fill tool with white color to segment and name each region we need. We will start with Upper Eastern, (III) on the map. Click on the fill tool (I), then select the green color (II) and then click on the area III. The map will change as shown below.
g. To see what this region looks like in XML, use the selection tool to highlight it (I) then click on Edit -> XML Editor (II).
h. Click on id (I) and, in box (III), type the name of the region you selected, “Upper North Eastern”, then click on “Set”.
Note that the fill color is 00FF00, i.e. green. We need to change it back to white, so click on fill and change it to fill:#ffffff, i.e. white. Why are we doing this? Once we are done naming all the regions, they should all be white; each will only take its color from the data in the database. Click on Set to make the changes permanent.
i. Next we need to give the region a label, so using the text tool, label it “Upper North Eastern” as shown below.
j. We are finished with the region “Upper North Eastern”. Now do the same for “Lower North Eastern”.
k. Finish the other areas. Your map should look like this.
l. Now we need to select all regions and group them. Name this new group “county”. How do you do this? Click on a region, say “Western”, then hold down the Shift key and click on the other regions. Once all the regions are selected, go to Object and select Group to group them.
m. When you look at the xml, it should be as shown.
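The original screenshot is not reproduced here, so the fragment below is a hand-written sketch of the shape to look for in the XML editor: one group with id county containing one path per region. The path outlines, the stroke value and the canvas size are placeholders invented for illustration; only the group id and the region ids come from the steps above.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="600" height="700">
  <!-- The group id is the name given to the group in step l -->
  <g id="county">
    <!-- One path per region; the d outlines are placeholder shapes, not real borders -->
    <path id="Upper North Eastern" style="fill:#ffffff;stroke:#000000"
          d="M 300,50 L 450,60 L 440,200 L 310,190 Z"/>
    <path id="Lower North Eastern" style="fill:#ffffff;stroke:#000000"
          d="M 310,190 L 440,200 L 430,340 L 320,330 Z"/>
    <path id="Western" style="fill:#ffffff;stroke:#000000"
          d="M 50,300 L 120,310 L 110,380 L 45,370 Z"/>
  </g>
</svg>
```

If a region sits outside the group, or its id differs from the name you typed in step h, it is worth fixing it in Inkscape before moving on.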
Note that all regions are under the county group. So how will SpagoBI know how to color the various regions based on the data from the database? It will compare the names of the regions against a column called county in the database. This column holds names that match the region names in the SVG map. So always ensure your group name is the same as the name of the column where the region names are stored. See below for a sample table we will be using for this assignment.
Now save the file as Save.svg. Open it with Notepad or gedit, search for style="fill:#ffffff", replace it with fill="#ffffff" and save. Inkscape writes style="fill:#ffffff", which SpagoBI 4.0 does not understand. Now your map is ready!
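The effect of that search and replace on a single region can be sketched as follows. The path outline is a placeholder invented for illustration. Note that the literal search string only matches when fill is the sole property inside the style attribute; if Inkscape wrote other properties next to it, the attribute has to be edited by hand.

```xml
<!-- Before: fill carried inside the style attribute, as saved by Inkscape -->
<path id="Western" style="fill:#ffffff" d="M 50,300 L 120,310 L 110,380 L 45,370 Z"/>

<!-- After: fill as a plain attribute, which is what the search and replace produces -->
<path id="Western" fill="#ffffff" d="M 50,300 L 120,310 L 110,380 L 45,370 Z"/>
```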
(iii) Now that we have the map, let us load it into SpagoBI. Log in to SpagoBI as user biadmin and under Resources select Maps.
(iv) Click on Insert. Under name use KENYA_PROVINCES. For Description use “Kenyan Provinces”. Under Template, use Kenya.svg. For format, select SVG. Remember, if your map is giving you a problem or if you did not create it, you can use the map “Location Intelligence\Kenya_Map.svg” on the CD that came with the book. Click on Save.
(v) Now we have finished with the map. Let us create the location intelligence document template (Geo Template).
Geo Template.
Most documents in SpagoBI need a template that defines the structure and source of the data, and the location intelligence document that we are going to create is no different. Remember that up to this point, we have created the SVG map to be used in the document and saved it in SpagoBI with the name KENYA_PROVINCES. The template we are going to create is called a geo template and has the structure outlined below. The template will be rendered by the SpagoBI Geo Engine.
NOTE: If you do not want to type in your own template, you can use the geo template that came with this book under “Location Intelligence\Kenya_Livestock.xml”.
We need to understand every section in the template file indicated by the arrows.
1. MAP_PROVIDER - This is where you specify, under map_name, the map that will be used by the SpagoBI Geo Engine. Note that the name should be the same as the one you specified when saving the SVG map in SpagoBI, in our case “KENYA_PROVINCES”.
2. METADATA allows us to specify the columns to be used as measures and as the geographical reference. From the example above, <column type="geoid" column_id="county" hierarchy="Kenya" level="county"> tells us that the column county in the table will be used as the geographical reference, so our SVG map also needs a group called county. Remember we grouped all the regions into a group called county in our SVG map?
3. The line <column type="measure" column_id="amount" agg_func="sum"> tells us that the column amount in our table will be used as a measure and that the aggregation function will be sum. Here is a sample of the data we are using.
4. HIERARCHIES - These are the hierarchies used to aggregate the data and are divided into levels. In our template above, we only have one hierarchy called Kenya and one level called county. Consider the following:
<LEVEL name="county" column_id="county" column_desc="Kenyan Provinces" feature_name="county"/>
This means that:
a. name="county" - The level is called county.
b. column_id="county" - This is the column that contains the ids of the regions in the map. From our select statement in 3 above, you can see that this is the column where we have the region ids, and the SVG map should carry the same ids, e.g. our map has a region with id Upper Eastern as shown.
Note that the path ids on the SVG map match the contents of the column county in the table.
c. feature_name="county" - This is the grouping that contains the regions referred to by the column_id. In our case the group is called county.
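Putting the pieces described above together, the relevant part of the template can be sketched as below. This is a partial outline, not the shipped file: the wrapper elements MAP, DATAMART_PROVIDER and HIERARCHY and their nesting are assumptions, and a real template carries further attributes and sections (for example, how the measure is turned into colors) that are left out here. Use Kenya_Livestock.xml from the CD as the authoritative version.

```xml
<!-- Partial sketch: only the parts explained above. Wrapper element names are assumed. -->
<MAP>
  <!-- 1. The map saved in SpagoBI under Resources -> Maps -->
  <MAP_PROVIDER map_name="KENYA_PROVINCES"/>

  <DATAMART_PROVIDER hierarchy="Kenya" level="county">
    <!-- 2 and 3. Which dataset columns are the geographical reference and the measure -->
    <METADATA>
      <column type="geoid" column_id="county" hierarchy="Kenya" level="county"/>
      <column type="measure" column_id="amount" agg_func="sum"/>
    </METADATA>

    <!-- 4. One hierarchy (Kenya) with one level (county) -->
    <HIERARCHIES>
      <HIERARCHY name="Kenya">
        <LEVEL name="county" column_id="county" column_desc="Kenyan Provinces" feature_name="county"/>
      </HIERARCHY>
    </HIERARCHIES>
  </DATAMART_PROVIDER>
</MAP>
```

The word county appears in three roles here: as the dataset column (column_id), as the level name, and as the SVG group (feature_name). Keeping all three identical, as this guide does, is the simplest way to avoid a map that renders without colors.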
Now that you understand the template, let us load it in SpagoBI. Before you do that, you need to create a dataset with the query “select * from livestock_by_county”. Follow these steps to create the dataset.
Creating the dataset.
1. Log in to SpagoBI as user biadmin with password biadmin.
2. Under Resources, select dataset.
3. Under dataset list click add.
4. For name and label use DS_KENYA_PROVINCES.
5. Click on the Type tab. For the DataSet Type, use query.
6. For Data Source, use BIDB, i.e. the data source we created in the previous chapter.
7. For the query, use “select * from livestock_by_county”.
8. Click on the preview tab and select the preview button.
9. Save your dataset.
Uploading the Geo Template.
You can find the template in the folder Location Intelligence on the CD that came with this book. It is called Kenya_Livestock.xml. We need to create a Geo document using this template. Proceed as follows.
1. Log in to SpagoBI as user biadmin with password biadmin.
2. Under analytical model, select document development.
3. Click on custom documents.
4. Click on add.
5. For the label use Kenya Livestock.
6. For the name use Kenya Livestock.
7. Under type select Location Intelligence.
8. For the datasource, select BIDB.
9. For the dataset, select DS_KENYA_PROVINCES.
10. For the template, select Kenya_Livestock.xml.
11. Click on Save.
We are done with creating the document; now let us look at it. Click on your document to open it.
From the map, if you put your cursor on Upper North Eastern, you can see that this is where we have the most livestock at risk, which is why the color is red.
References
1. Book companion video SpagoBI_4_Location_Intelligence.avi
2. http://upload.wikimedia.org/wikipedia/commons/2/2c/Kenya_location_map.svg
3. http://wiki.spagobi.org/xwiki/bin/view/spagobi_server/geo_template
LESSON THREE
Highchart Dashboards
OBJECTIVES
After completing this chapter, you should be able to:
1. Describe the importance of dashboards to your business.
2. Use Highcharts.
CONTENTS
Dashboards.
• Business Intelligence dashboards.
• Building dashboards using Highcharts.
Business Intelligence dashboards
Every car comes with a dashboard which has several gauges that alert the driver when an important event has occurred. It might be that the car is running out of fuel or that the engine oil is getting low. It might also be that the battery is not charging or that you are driving with the handbrake on! An automobile dashboard needs to be easy to understand and should not take time to read; remember you are driving and cannot stare at the dashboard for long! A good dashboard should be easy to understand and should portray relevant information only.
The business community copied this dashboard idea from the automobile industry. Business dashboards show at a glance the state of the business at any given time. As an example, a chart might compare sales for the current quarter with sales for the last quarter. If sales for all previous quarters are greater than sales for the current quarter, then something is definitely wrong. Since a dashboard should be easy to read, normally only summaries are shown in dashboards. Dashboards also show trends and comparisons.
We will create our first dashboard to compare sales for the current year against sales for the previous year. The dashboard will be built using the Highcharts library. First, we will write the SQL that will get us the sales comparison between the current year and the previous year for the Nairobi store. Here is the result of the query:
And here is the query:
select *
from (
  select curr.curr_month_fig, prev.prev_month_fig, prev.previous_year, curr.current_year,
         prev.sales_previous_year, curr.sales_current_year, prev.previous_month, curr.current_month
  from (
    select month(salesdate) prev_month_fig,
           sum(nairobi) sales_previous_year,
           DATE_FORMAT(salesdate, '%Y') previous_year,
           DATE_FORMAT(salesdate, '%M') previous_month
    from bidb.storesales
    where DATE_FORMAT(salesdate, '%Y') = DATE_FORMAT((select max(salesdate) from bidb.storesales), '%Y') - 1
    group by DATE_FORMAT(salesdate, '%M'), month(salesdate)
  ) prev,
  (
    select month(salesdate) as curr_month_fig,
           sum(nairobi) sales_current_year,
           DATE_FORMAT(salesdate, '%Y') current_year,
           DATE_FORMAT(salesdate, '%M') current_month
    from bidb.storesales
    where DATE_FORMAT(salesdate, '%Y') = DATE_FORMAT((select max(salesdate) from bidb.storesales), '%Y')
    group by DATE_FORMAT(salesdate, '%M'), month(salesdate)
  ) curr
  where prev.previous_month = curr.current_month
) sales_comparison
order by curr_month_fig asc
You can find the query under the dashboard folder on the CD that came with this book. It is called dash1.txt. In case you got a free soft copy of the book, write to the author at xogutu@gmail.com or info@ogutu.org to have the database emailed to you.
Now that we have the query, we will create the XML template that will be used by Highcharts. There are two methods we can use to achieve this: we can use SpagoBI Studio, which is a graphical tool, or we can use a manual method and specify the entries in an XML file. We will start with the manual method so that we understand the contents of the XML file. We have included the XML below; it can also be found under the dashboard folder on the CD that came with this book. The XML file is called sales_comparison.xml.
<HIGHCHART width='80%' height='80%'>
  <CHART zoomType='xy' />
  <TITLE text='Nairobi sales comparison, current year vs previous year.' />
  <SUBTITLE text='Detail for 2011, 2010' />
  <X_AXIS alias='current_month' />
  <Y_AXIS_LIST>
    <Y_AXIS alias='sales_previous_year' opposite='true'>
      <LABELS>
        <STYLE color='#89A54E' />
      </LABELS>
      <TITLE text='Sales Previous Year (2010)'>
        <STYLE color='#89A54E' />
      </TITLE>
    </Y_AXIS>
    <Y_AXIS alias='sales_current_year' gridLineWidth='0'>
      <LABELS>
        <STYLE color='#4572A7' />
      </LABELS>
      <TITLE text='Sales Current Year (2011)'>
        <STYLE color='#4572A7' />
      </TITLE>
    </Y_AXIS>
  </Y_AXIS_LIST>
  <LEGEND layout='vertical' align='left' verticalAlign='top' x='120' y='40' floating='true' borderWidth='1' backgroundColor='#FFFFFF' shadow='true'/>
  <SERIES_LIST allowPointSelect='true'>
    <SERIES name='Sales Previous Year (2010)' color='#89A54E' type='spline' alias='sales_previous_year' />
    <SERIES name='Sales Current Year (2011)' color='#4572A7' type='spline' alias='sales_current_year' dashStyle='shortdot'>
    </SERIES>
  </SERIES_LIST>
</HIGHCHART>
We will first use the XML template to create a highcharts document, then, once you have seen the results, we will go over every line and explain what it does. Now log in to SpagoBI as the biadmin user and follow these steps to create the highchart document. The first thing we need to do is create a dataset from which our highcharts document will get its data. To do this, click on Resources -> dataset.
1. Click on the add button.
2. On the Label, write “SalesComparisonNairobi”.
3. On Name, write “SalesComparisonNairobi”.
4. On Description, write “Sales comparison for Nairobi store between current year and previous year.”
5. You should have the following once you are done.
6. Click on the Type tab.
7. Under DataSet Type, select Query.
8. Under data source select “BIDB”. Remember we had created this data source previously.
9. Under Query, paste the query in the file dash.txt in the folder dashboard.
10. You should have this once you are done.
11. Click on the preview button. You should have the output shown below.
12. Save the data set.
13. Click on Analytical Model -> Documents Development.
14. Click on Insert and create the document as shown below.
15. Click on browse and select the sales_comparison.xml file we created previously. The file can be found on the CD that came with this book under the dashboard folder.
16. Under documents template, select “Custom documents” and click Save.
17. Navigate to the “Custom Documents” folder. Click on the “Sales Comparison” document to run it. You should have your first chart!
Now that you have seen the results, let us look at sales_comparison.xml file to better understand it.
A. Chart Size. The first element must be HIGHCHART. You can also specify the size of the chart in this element.
<HIGHCHART width='80%' height='80%'>
B. TITLE and SUBTITLE. The TITLE and SUBTITLE elements are the title and subtitle of the chart.
<TITLE text='Nairobi sales comparison, current year vs previous year.' />
<SUBTITLE text='Detail for 2011, 2010' />
C. The X_AXIS element. The element <X_AXIS alias='current_month' /> is the label of the X axis. In SpagoBI, this can come from a column in your data set query.
In our chart, the labels come from the database column current_month from the dataset SalesComparisonNairobi we created previously.
The contents of the current_month column in the data set SalesComparisonNairobi are used, through the XML element <X_AXIS alias='current_month' />, to label the x axis with January up to December on our chart.
D. Y_AXIS_LIST Element. The <Y_AXIS_LIST> element defines the items that appear on the Y axis. Normally they come from database tables or views through the SpagoBI data set. For example, in the block of XML code below:
<Y_AXIS alias='sales_previous_year' opposite='true'>
  <LABELS>
    <STYLE color='#89A54E' />
  </LABELS>
  <TITLE text='Sales Previous Year (2010)'>
    <STYLE color='#89A54E' />
  </TITLE>
</Y_AXIS>
1. Alias element.
alias='sales_previous_year' - This defines which column in the dataset query the data comes from.
opposite='true' - When opposite is true, the label will appear on the right. When false it appears on the left. As an example, if we set opposite to false as shown below, the Y axis elements will be on the left.
Change it to opposite='true' and the Y axis items will appear on the right.
gridLineWidth='1' - Use 0 if you do not need a grid line.
2. The <TITLE text='Sales Current Year (2011)'> element. This element is used to set the labels of the items on the Y axis.
E. The LEGEND element determines the position, color and other properties of the legend.
<LEGEND layout='vertical' align='left' verticalAlign='top' x='120' y='40' floating='true' borderWidth='1' backgroundColor='#FFFFFF' shadow='true'/>
The above block of code produces the legend below.
F. The SERIES_LIST element is the main content of the chart. It can be line or column.
<SERIES_LIST allowPointSelect='true'>
  <SERIES name='Sales Previous Year (2010)' color='#89A54E' type='spline' alias='sales_previous_year' />
  <SERIES name='Sales Current Year (2011)' color='#4572A7' type='spline' alias='sales_current_year' dashStyle='shortdot'>
  </SERIES>
</SERIES_LIST>
Since our chart has two lines, it will have two series. The contents of the series come from a table column through the SpagoBI dataset. For example, the dotted series (Sales for 2011) comes from the column sales_current_year and the attribute dashStyle='shortdot' makes it dotted. The attribute type='spline' means a line chart; if we change it to type='column', we will end up with a bar graph as shown:
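As a minimal sketch of that change, the series list below is the same as in sales_comparison.xml except that both series use type='column'. The dashStyle attribute is dropped from the second series here on the assumption that it only matters for line charts; everything else is copied from the template above.

```xml
<SERIES_LIST allowPointSelect='true'>
  <!-- Same aliases and colors as before; only the type attribute differs -->
  <SERIES name='Sales Previous Year (2010)' color='#89A54E' type='column' alias='sales_previous_year' />
  <SERIES name='Sales Current Year (2011)' color='#4572A7' type='column' alias='sales_current_year' />
</SERIES_LIST>
```

Since each series carries its own type, it should also be possible to leave one series as a spline and turn only the other into columns, which is a quick way to experiment with the template.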
Method 2: Use the SpagoBI Studio.
Instead of editing XML files manually, we can use SpagoBI Studio to generate the XML file to be used as a template for the highchart document. Follow these steps to create a document similar to the one above, but using SpagoBI Studio.
1. Start SpagoBI Studio.
2. Select File->New Project and select SpagoBI project.
3. For project name enter Highcharts.
4. Right click on Business Analysis, select chart then click on Chart with HighChart.
5. Let us call it sales_comparison_studio and for type select spline.
6. Enter the chart title, subtitle and size (Width and Height) as shown below.
7. Under X AXIS, the alias should be current_month. Remember this is the column in the database where the names of the months are stored, e.g. December. So the X axis will consist of month names from January to December.
8. Under Series list, create the two series as shown.
a. For the first series, enter the name “Sales Previous Year (2010)”. For the alias, enter “sales_previous_year”. Remember this is the column in the database that has the values for last year’s sales. Choose a color of your choice. This will be the color of the line chart. For the type, use spline then click on save.
b. Do the same for series two but the name should be “Sales Current Year (2011)” and the alias “sales_current_year”.
c. Save the document. Now let us look at the XML that has been generated and compare it to the one we used previously. To do this, right click on the document and open it with the text editor.
9. Open the document with the text editor.
10. The contents should be familiar to you from the previous exercise.
11. Now you can right click on the document and deploy it to SpagoBI directly, or you can copy the XML into another file and use it as a template. Let us try that. Right click on the document and select properties. This will show you where the document is located.
12. Then in SpagoBI, modify the Sales Comparison document we created for highcharts and select this document as a template.
13.
14. Test the document.
You can see that you have achieved the same result as the manual process. The method you use is up to you. Now that you know how to create Highcharts documents with SpagoBI Studio, try out the various parameters that are available for Highcharts. You can then combine several Highcharts documents on one screen to create a dashboard using the SpagoBI composite document.
LESSON FOUR Introduction to BIRT Reports
OBJECTIVES After completing this chapter, you should be able to: 1. Describe BIRT reports. 2. Configure BIRT reports. 3. Create BIRT reports. 4. Host BIRT reports on SpagoBI server.
CONTENTS BIRT Reports. • Introduction to BIRT reports. • Creating BIRT reports. • Hosting BIRT reports on SpagoBI server.
BIRT BIRT is the acronym for Business Intelligence and Reporting Tools and is an open source initiative to create a fully functional reporting tool using open source tools. BIRT supports various types of reports such as lists, charts, crosstabs and compound reports. In this chapter we will learn how to create BIRT reports and how to publish them in SpagoBI server. The first report we will create is a simple list report similar to the one shown below. It has a logo, a heading and data from the database.
BIRT 1 - We will be creating a report similar to the one above.
Let us create a new report.
Download Software. In this chapter, we will be using the Eclipse-based BIRT reporting software, SpagoBI Studio. Download SpagoBI Studio if you have not already done so, as it comes with everything we need to create our BIRT reports. 1. To create a report similar to the one shown above, start SpagoBI Studio, then follow these steps. 2. Create a new project. a. Click on File -> New -> Project. b. Under Business Intelligence and Reporting Tools select Report Project.
c. Click Next. d. Under project name use “Business Intelligence”. For the storage location, use default. e. Your new project will now be visible on the navigator on the bottom left of the page.
3. Define a new report. Click on File -> New -> Report.
a. For the report name, enter “Administrative Costs”.
b. For the report template, select blank report and click on finish.
c. Your new report should be visible on the navigator.
4. Reports can get their data from various sources such as flat files or relational databases. Prepare Database. We will be using the table admincost to generate this report. This table and others are included in the mysql dump that came with this book.
We need to create a connection to the database from the SpagoBI Studio. Proceed as shown below. a. Click on Data -> New Data source.
b. Select JDBC Data Source, and for Data Source Name put “Mysql Local”.
c. For the Data source details, enter the following.
Change the URL, username and password appropriately. Click on Test Connection. This should be successful before you proceed.
Click on Finish.
d. When you click on Data Explorer, your new connection should be visible.
e. With the Data Explorer still opened, right click on Data Sets, and select New Data Set.
f. For the data source location, select Mysql Local (the data source you just created above).
g. For data set name, put AdminCost. h. For the query text, use “select * from admincost”. This will select all the contents of the admincost table.
i. Click on Finish.
j. On your data set, click on Preview Results. This will output the contents of the table admincost.
k. Click on Palette and under report items, select grid. The grid allows you to organize the items in your reports like images, charts, text etc.
l. Create a grid with 2 columns and one row.
m. Drag an image icon to the first cell.
n. Click on embedded image then select the shemma.jpg image from the BIRT folder on the CD that came with this book.
o. Drag the image to make it smaller.
p. Drag the text item to the second cell.
q. For the type of text, select HTML. Write a header as shown.
r. Once you are done, your report should look like the one shown below.
s. Next we will include the actual data on the report.
5. To include data from the data set we created above on the report, click on the Data Explorer tab. Expand Data Sets and drag AdminCost onto an empty area of your report.
6. Using the property editor, change the heading for id, item and cost. a. Before
b. After.
c. To test the report we just created, click on Run -> View Report -> In Web Viewer.
d. And there you have it, your very first BIRT report!
7. Next we need to publish our report to the SpagoBI server. Proceed as follows. a. In the data explorer, right click on Report Parameters then click on New Parameter. For the name enter driver.
b. Create other parameters url, user and pwd.
c. Right click on the data source “Mysql Local”, click on Edit and select Property Binding. Attach the parameters as shown below.
This can easily be done as follows: i. Click on the fx (JavaScript syntax) button next to JDBC Driver Class.
ii. Under Category select Report Parameters. iii. Under Sub Category select All. d. Under Double Click to Insert, double click on driver.
e. Repeat the procedure for JDBC Driver URL, User Name and Password.
f. If you have not been saving your work, this is a good time to do so!
g. Right Click on your Business Intelligence project and click on properties. Under Resource, check the location where your report is stored.
h. Take note of this location as we will use it when creating documents in SpagoBI.
8. Start your SpagoBI server and navigate to http://localhost:8080/SpagoBI. 9. Login with username biadmin and password biadmin. 10. Click on Analytical model -> Documents Development. 11. Click on Insert.
12. Under Show document templates, select the folder where you want your document to be stored. 13. Click on Browse and navigate to where your AdministrativeCosts.rptdesign report file is stored. 14. Click on Save. 15. Click on Administrative Tasks under the folder where you saved your report. You should have the report displayed on the SpagoBI server.
NOTE: This was done in SpagoBI 3.6 but should be similar to SpagoBI 4.0.
LESSON FIVE Creating a dashboard with BIRT and SpagoBI
OBJECTIVES After completing this chapter, you should be able to: 1. Describe BIRT reports. 2. Configure BIRT reports. 3. Create a dashboard with BIRT reports. 4. Host a BIRT dashboard on SpagoBI server.
CONTENTS Creating a dashboard with BIRT and SpagoBI • Preparing the database. • Creating BIRT reports. • Using grid, charts and tables in one report.
Creating a dashboard with BIRT and SpagoBI In this chapter, we will be creating a dashboard using the BIRT reporting engine and then we will host it in the SpagoBI server. The final dashboard we will be creating will be similar to the one below.
Prepare the database. The data for this chapter is in the table loan_payment. You can find this in the MySQL export file that came with this book. You can use the query below to look at the contents of this table.
select * from loan_payment order by loan_year, month_no asc
Assume that the bank Shemma Global Limited gave out loans to 10 individuals with the account numbers 100 to 109. The loans were to be repaid over a duration of 48 months and were taken in December 2010. The first repayment was therefore due on 31st January 2011. We will follow the payments for these individuals for a duration of one year. 1. January 2011 – In the month of January 2011, everybody made their loan repayment, which for illustration purposes we will assume is 0.19% of the total loan amount per month.
From the figure above, we can see that the column not_paid is zero for everyone for the month of January. If this column has a figure greater than zero, then the loan is in arrears. 2. In the month of February, some of our clients had difficulty paying their loans, as illustrated below.
For subsequent months, we had several defaults. We need to create a dashboard using the BIRT reporting engine and SpagoBI that will show the following details.
1. The monthly loan arrears trends in a line chart. 2. The top 5 defaulters. 3. The total loan defaults per month. 4. Listing of all defaulters.
Follow these steps to create the BIRT dashboard outlined above. 1. Start SpagoBI Studio and create a new report project. Call it Dashboard. a. Click on File -> New -> Project. b. Select Report Project under Business Intelligence and Reporting Tools.
c. Under project name, enter Dashboard and click on use default location.
d. Click on Finish. 2. Next, create a new report document. a. Click on File -> New -> Report.
b. For the parent folder, select Dashboard and enter non_performing_loans.rptdesign as the file name.
c. Click Next. d. Under report template, select blank report.
e. Click Finish. 3. Now let us add a grid with four rows and two columns to our report. A grid is a row/column layout that helps you organize items in your report. Under the Palette tab, click on Report Items, click on the grid icon and drag it to your report.
4. For number of rows select four and for number of columns select two.
5. We will insert the company logo in the first cell. Click on the image icon under Palette and drag it to the first cell. Under “Select image from”, check the radio button next to “Embedded Image” and click on “Add Image”. On the CD that came with this book, you will find the folder images. Inside it you will find shemmalogo.jpg. Select this image and click on Insert.
6. Hold down the Shift key as you drag the image to resize it proportionately. Now we will add a heading to the report. Click on the second cell on the right and drag the Text icon to it from the Palette.
7. Click on OK. 8. Click on the first cell of the second row, then hold down the Shift key and click on the second cell of the second row. Both cells should be highlighted. With the cells in this state, right click and select Merge Cells.
9. Click on background and change the background color of the merged cells and reduce the size to five pixels.
10. Once we are done with the heading, it should look like this.
Now we will add the first item that pulls data from our table onto the report. This will be the total monthly loan repayments that have not been paid, expressed as a line chart to show trends for the entire year. 11. First though we need to create a connection to the database. a. Under the Data Explorer tab, right click on Data Sources and select New Data Source. b. Under Data Source Type, select JDBC Data Source. c. Under Data Source Name insert “LocalMysql”.
Click Next. d. Under Driver Class, insert “com.mysql.jdbc.Driver (v3.1)”. e. Under Database URL, insert “jdbc:mysql://localhost:3306/bank”.
f. Put a username and password and test your connection.
g. Next we need to create a data set. Right click on Data Sets then select New Dataset.
h. Under Data Source Connection, select the data source we created above.
i. For the data set name, enter “RepaymentTrend”.
j. Click Next.
k. Under Query Text, enter the following SQL and click on Finish.
select sum(not_paid) total_monthly_unpaid, loan_month, month_no from loan_payment group by loan_year, loan_month order by month_no asc
l. A dialog box should come up. Click on Preview Results.
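To see what the RepaymentTrend query returns before wiring it into a chart, the sketch below runs an equivalent query from Python against an in-memory SQLite database standing in for MySQL. The sample rows and the month abbreviations are made up for illustration; the real data is in the MySQL export file that came with this book. The sketch also groups by month_no, as the Monthly Defaulters query later in this chapter does, so that every selected column is a grouping column.

```python
import sqlite3

# Hypothetical sample rows; the real table comes from the book's MySQL export.
rows = [
    # acctno, loan_year, loan_month, month_no, not_paid
    (100, 2011, "JAN", 1, 0),
    (101, 2011, "JAN", 1, 0),
    (100, 2011, "FEB", 2, 150),
    (101, 2011, "FEB", 2, 0),
    (100, 2011, "MAR", 3, 150),
    (101, 2011, "MAR", 3, 200),
    (100, 2011, "APR", 4, 0),
    (101, 2011, "APR", 4, 75),
]

conn = sqlite3.connect(":memory:")  # SQLite is an assumption; the book uses MySQL
conn.execute(
    "CREATE TABLE loan_payment ("
    "acctno INTEGER, loan_year INTEGER, loan_month TEXT, "
    "month_no INTEGER, not_paid INTEGER)"
)
conn.executemany("INSERT INTO loan_payment VALUES (?, ?, ?, ?, ?)", rows)

# Total unpaid amount per month, ordered by the numeric month.
trend = conn.execute(
    "SELECT SUM(not_paid) AS total_monthly_unpaid, loan_month, month_no "
    "FROM loan_payment "
    "GROUP BY loan_year, loan_month, month_no "
    "ORDER BY month_no ASC"
).fetchall()

for total, month, month_no in trend:
    print(month_no, month, total)

# Month labels in calendar order versus alphabetical order.
by_number = [month for _, month, _ in trend]
by_name = sorted(by_number)
print(by_number)
print(by_name)
```

Sorting the labels alphabetically puts APR before JAN, which is why month_no is carried along in the query: it gives the chart a numeric column to sort on.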
12. We will create a chart item using this data set. 13. Merge the cells on row three and drag the chart icon from the palette to the merged cells. 14. Select line chart and click Next.
15. Under Select Data, click on the radio button next to “Use Data From” and select “RepaymentTrend” data set.
16. Click on loan_month, drag and drop it on category (X) series. 17. Click on total_monthly_unpaid, drag and drop it on value (Y) series.
18. If you look closely at the chart preview, you will notice that the chart is not ordered properly by month. To fix the order, click on the “Edit Group and Sorting” button next to “Category (X) series”.
19. Under “Data sorting”, select Ascending. 20. Under “Sort On”, select row["month_no"] and click on OK.
21. Click on Next. Under the Format Chart tab, click on Series, remove the text “Series 1” and type “Loan Arrears”. 22. Click on Title and replace the text “Line Chart Title” with “Monthly loan arrears”. 23. Click on Finish and resize the chart accordingly.
24. Now click on Run -> View Report -> In Web Viewer, so that we can see how our dashboard looks so far.
We can see from this that the month of SEP had the greatest amount of loan arrears. Now let us add other components to our dashboard. 25. Next we will create a chart to show the top five defaulters as at the current month, which is December. a. We will use the query below to calculate the top five defaulters.
select count(not_paid) months_defaulted, acctno from loan_payment where not_paid>0 group by acctno order by 1 desc limit 0,5
b. Create a new data set using the query above and call it “Top5Defaulters”.
Expand the row below the one that contains the monthly loan arrears chart and insert a bar chart in the cell labeled top 5 below.
c. Under Use Data From, select Top5Defaulters. Drag months_defaulted to Value (Y) Series and acctno to Category (X) Series as shown below. Click on Next.
d. On the Format Chart tab under Legend, uncheck the Visible check box. e. Under Title, replace the text with “Top 5 Defaulters”.
f. Under X-Axis, click on the icon below to invoke the font editor.
g. Change the rotation to -42 degrees.
h. Click on Y-Axis and select the check box under title. For the title insert the text “No of defaults last 12 months”. For the title of the X axis, insert “Account No.”
i. Next we will add the total loan defaults per month to our dashboard using the query below.
select loan_year, loan_month, sum(not_paid) total_defaulted from loan_payment group by loan_year, loan_month, month_no order by month_no asc
Create a new data set and call it “Monthly Defaulters” using the query above.
j. Drag the data set “Monthly Defaulters” to the cell on the right of the one with the Top 5 defaulters chart. Modify it to look like the one shown below.
k. Create a new data set and call it “ArrearsList” using the SQL below.
select loan_month "Loan Month", loan_amount "Disbursed Amount", months_left "Months Left", not_paid "Arrears", acctno "Account No" from loan_payment where not_paid>0 order by month_no
l. Drag the data set “ArrearsList” to just below the grid we used above.
m. Modify the List to look like the one below.
n. Our dashboard should now be complete. Click on Run -> View Report -> As PDF to export your report to PDF.
Final non performing loans dashboard.
Follow the steps in the previous chapters to add the BIRT report to SpagoBI.
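The Top5Defaulters query used in this chapter is worth trying on its own, because it counts the months in which an account was in arrears rather than summing the unpaid amounts. The sketch below runs it from Python against an in-memory SQLite database standing in for MySQL. The table holds only the columns the query needs, and all the rows are invented for illustration.

```python
import sqlite3

# Hypothetical number of months in arrears per account, used to build sample rows.
months_in_arrears = {100: 3, 101: 7, 102: 0, 103: 5, 104: 1, 105: 6, 106: 2, 107: 4}

conn = sqlite3.connect(":memory:")  # SQLite is an assumption; the book uses MySQL
conn.execute(
    "CREATE TABLE loan_payment (acctno INTEGER, month_no INTEGER, not_paid INTEGER)"
)
for acctno, missed in months_in_arrears.items():
    for month_no in range(1, 13):
        # The first `missed` months carry an unpaid amount; the rest are fully paid.
        not_paid = 100 if month_no <= missed else 0
        conn.execute(
            "INSERT INTO loan_payment VALUES (?, ?, ?)", (acctno, month_no, not_paid)
        )

# Same shape as the chapter's query: count defaulted months per account,
# sort by that count (column 1) and keep the first five rows.
top5 = conn.execute(
    "SELECT COUNT(not_paid) AS months_defaulted, acctno "
    "FROM loan_payment WHERE not_paid > 0 "
    "GROUP BY acctno ORDER BY 1 DESC LIMIT 0, 5"
).fetchall()

for months_defaulted, acctno in top5:
    print(acctno, months_defaulted)
```

Account 102 never appears because the where clause drops its rows before counting, and accounts 104 and 106 are cut off by the limit.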
LESSON SIX Cross Navigation with SpagoBI
OBJECTIVES After completing this chapter, you should be able to: 1. Explain what Cross Navigation is. 2. Explain the advantages of using Cross Navigation. 3. Create a document that has cross navigation.
CONTENTS BIRT Reports. • Introduction to cross navigation. • Sample document with cross navigation. • Creating a cross navigation document.
Introduction. In this chapter we are going to see how to build a cross navigation report using BIRT. This means that we will have a master report and we will navigate to a child report by clicking on some data in the master report. For example, when we click on January in Figure 1 below, it opens a child report with data for January as shown in Figure 2.
Figure 1 – Master Document.
Figure 2 – Child Document.
The Master Document. The first step in creating a cross navigation document with BIRT is to create a master document. It is from the master document that we will navigate to the child document hence the name cross navigation. To create a master document follow these steps: 1. Create a new Project. Click on File then new project. Under “Business Intelligence and Reporting Tools” select “Report Project”.
2. For the Project name, enter “SampleDrillDown”
3. Next, we need to create a new report document in the project we just created. Select the “SampleDrillDown” project then select File->New -> Report. In the Dialog box that comes up, select “SampleDrillDown” as the parent folder. For the File name, enter “master.rptdesign”.
4. For the type of report enter “Blank Report” then click on Finish.
5. Next we need to create a data source where our report will get its data. Right click on “Data Sources” and select “New Data source”.
6. Select “JDBC Database Connection for Query Builder”. 7. For Datasource Name, use “bidb”. 8. Select MySQL.
9. For database use “bidb”. For URL enter jdbc:mysql://localhost:3306/bidb. Enter your username and password and select save password. Click Test Connection to ensure everything is working properly then save.
10. Now that we have a data source, we need to create a data set. Right click on Data Sets then select New Data set. Make sure bidb is selected as the data source and for the data set name enter ds_summary.
11. Enter the query “select * from bidb.v_sales_summary_year” and save.
Now that we have set up the source of our data, we need to build our report layout. Follow these steps. 1. We will drag a grid from the palette to our report so that we can use it to separate the various items in our report. This will help us place objects such as images and labels in the location we desire, without them flowing into areas where we do not want them. A grid is like a table and has rows and columns. From the palette, click on the grid and drag it to the report.
2. Our grid has 3 columns and 7 rows.
3. The first item we need to add to the grid is the logo. From the palette, select the image item and drag it to the first cell in the grid. Browse for the image ogutu.jpg under the folder images on the book CD. Select Embedded Image.
4. From the palette, drag a label to the second cell.
5. In the label, enter “Countrywide Sales”.
6. Add another label and add the box number.
7. Add a third label and type “Nairobi Kenya”.
8. You need to add a dynamic text that will be used to add date. Drag it as shown below and use the birt function now().
9. Let us look at how the report looks at this point. Under run, select view Report.
10. It is coming up but we need to add a line to separate the header from the rest of the report.
11. Highlight the first cell just under the image, then press down the shift key and select the other two. See (I) below. Next, in the properties window, change the background color to silver. See (II) below.
12. Do the same for the row on top of the image. Once you are done, look at the preview again. With the lines now, it looks much better.
13. Now that we have finished with the heading of our document, we need to prepare a location where we will put the data from our data set. To do this, just under the row you gave a silver background, highlight the first cell (I in the image below), then with the Shift key held down, select the other two cells (II and III).
14. Right click and then merge the cells.
15. Now drag the data set ds_summary to the cell you just merged.
16. We need to add formatting. Select Month (I) below.
17. Make it bold.
18. We do not need the salesmonthnum column so delete it.
19. Now we need to format the sales figures for Nairobi, Kisumu, Kitale and Mombasa. Click on the value for Nairobi, then under the property editor select Format Number. Then check the 1000s separator box. If you have a sales figure of, say, 145680 dollars, it will be formatted as 145,680, which is easier to read. We will go a step further and add the currency symbol to the amount.
20. Under symbol select $, so now our figure above will be formatted as $145,680 in SpagoBI report.
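The effect of these two format settings can be reproduced in a few lines of Python. This is only an illustration of the formatting rule, not what BIRT runs internally, and format_sales is a hypothetical helper name.

```python
def format_sales(amount: int, symbol: str = "$") -> str:
    """Add a thousands separator and prefix the currency symbol."""
    return f"{symbol}{amount:,}"


formatted = format_sales(145680)
print(formatted)  # $145,680
```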
21. This would be a good time to preview our work again. It looks nice. All you need to do is make sure that you finish formatting Kitale and Mombasa.
22. And now for the most important part. We need to create a link between the master report and the child report. We will put this link on the month such that when someone clicks on the month, it will open a child report with sales transactions for that particular month. So click on month (Red arrow) then under properties editor window select Hyperlink (Blue arrow).
23. In the expression builder, paste the code below.
24. “SALESALL” refers to the name of the document on the SpagoBI server that will be opened when somebody clicks on this link in the master document.
25. The pdate is the parameter that will be transferred to the child document and it will hold the value that will be returned by the database row month. You can select this in the column bindings.
26. Now transfer your report to SpagoBI and run it.
You can see that the link has been created, but when you click on any month the child report does not open. Instead, you get an error saying that the document SALESALL does not exist. This is because, although we have created the link and supplied the parameter that will transfer the value to the child report, we have not yet created the child report itself. Congratulations if you have come this far. In the next section, we will create the child report.
CHILD REPORT. The child report will contain the transactions for the month selected in the master report. For example, if you click on the month of August in the master report, you get the child report shown below which is a transaction that was done in the month of August.
Since you already know how to create a BIRT document, we will not go over all the gory details again but will instead highlight what you need to do to make your report work. In case you need a step-by-step guide to creating the child report, see the companion video “BIRT Drill Down_ Cross Navigation - Lesson 2 of 2”. Here are the steps: 1. Create a child report using BIRT as shown below.
Call it allsales.rptdesign. The data should come from the view v_daily_sales.
Note that in the SQL, we have included the where clause WHERE MONTH = ?. The question mark means that the month will come from a parameter. Remember that the value used to populate this parameter will come from the parent report we created previously, when someone clicks on the month link. We have two types of parameters in BIRT: report parameters and data set parameters. This is a data set parameter, which needs to be linked to the report parameter. So create a data set parameter as shown below.
Note that the data set parameter is linked to the report parameter called month. The report parameter is created as shown with the yellow arrow.
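The ? placeholder works the same way in most database APIs. As a rough sketch, the Python snippet below uses an in-memory SQLite table as a stand-in for the v_daily_sales view; the column names other than month, and all the rows, are assumptions made for illustration. The report_params dictionary plays the role of the BIRT report parameter, and its value is bound to the ? in the query, much as the data set parameter is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite is an assumption; the book uses MySQL
# Stand-in for the v_daily_sales view; columns besides "month" are hypothetical.
conn.execute(
    "CREATE TABLE v_daily_sales (month TEXT, sale_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO v_daily_sales VALUES (?, ?, ?)",
    [
        ("2012 November", "2012-11-03", 1200),
        ("2012 December", "2012-12-01", 900),
        ("2012 December", "2012-12-15", 450),
    ],
)

# Plays the role of the report parameter called month.
report_params = {"month": "2012 December"}

# The value is bound to the ? placeholder, like the data set parameter.
child_rows = conn.execute(
    "SELECT sale_date, amount FROM v_daily_sales WHERE month = ? ORDER BY sale_date",
    (report_params["month"],),
).fetchall()

for sale_date, amount in child_rows:
    print(sale_date, amount)
```

Only the rows whose month matches the bound value come back, which is the behaviour the child report relies on.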
Below are the properties of the month parameter.
2. Now we need to deploy the report to the SpagoBI server with the name SALESALL. Why do we need to deploy it with that name? Because that is what we had specified in the master document under the expression builder. See (I) below.
Uploading Child Report to SpagoBI server. Login to SpagoBI server as user biadmin and create a new BIRT document with the following: Label – SALESALL, Name – SALESALL, Type – Report, Engine – SpagoBIBirtReportEngine, Data Source – bidb, State – Development.
Next click on Browse and select the location where you had saved the child report we just created above (allsales.rptdesign).
Remember, when you click on any link in the master report, it will send to the child report the value you have clicked, i.e. the month you have clicked. This value is sent to the child report as a parameter. To use parameters in SpagoBI server, you need to create what is called an analytical driver. That is what we have just below the Browse button. For the title of the analytical document, enter AN_MONTH. For the Url Name, enter month. Why are we entering month? Because, if you remember, the child report has a parameter called month. Now click on the search icon next to Analytical driver.
On the page that opens, create a new analytical driver by clicking on the icon shown with the red arrow.
For the label put P_MONTH. For the Name put P_MONTH. The type should be string. Click on Temporal.
For the analytical driver use mode details, enter the data as shown below.
Lastly you need to select the roles that are supposed to use this analytical driver. For now, select all roles.
Save your analytical driver and select it for your current document. Now it should look like this.
Notice that we have deselected visible because we do not want to be asked to enter a value for this parameter when the child report is being launched. We need the parameter to be passed directly from the master document to the child document. Save your child report. Navigate back to the master document and click on any link, say 2012 December.
Does it open the child report? Yes it does. But there is a problem! The child report is empty. Why is this? The reason is simple: when you click on 2012 December, notice that only 2012 is sent to the child report. This is because there is a space between 2012 and December, and since the parameters are sent using the normal browser URL, the value is cut off at the space. We need a way of masking the space with something else. Once the value reaches the child report, we can then compensate for the mask so that the query in the child report still finds the matching rows. We can do this easily using JavaScript by going back to the master report and modifying the expression we had written so that it replaces the space with an underscore, as shown below.
Now once you reload the master document in the SpagoBI server and you highlight any of the links, you will notice that the space is replaced with an underscore.
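The masking idea can be tried outside BIRT. The lines below are a minimal Python sketch of the substitution, not the JavaScript expression used in the master report. The function names mask_month and unmask_month are hypothetical, and the sketch assumes that month labels never contain an underscore of their own.

```python
def mask_month(label: str) -> str:
    """Replace every space with an underscore so the value survives in a URL."""
    return label.replace(" ", "_")


def unmask_month(masked: str) -> str:
    """Undo the masking (assumes the original label contained no underscore)."""
    return masked.replace("_", " ")


if __name__ == "__main__":
    # The link text from the master report and the value that is actually sent.
    print(mask_month("2012 December"))
```

Calling mask_month("2012 December") gives 2012_December, which is the value you should see when you hover over the link.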
But there is one more thing to do. Remember that the value sent to the child report is used in the SQL “WHERE” clause to extract data from the database for the child report. In the database, the values in the month column still contain a space, e.g. 2012 October, but the master document now sends the value with an underscore. So we also need to change the query of the child document so that it replaces any space in the month column with an underscore before comparing it with the parameter. This can be done using the MySQL REPLACE function.
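To see the effect of REPLACE in the WHERE clause without touching the real database, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for MySQL (SQLite also provides a REPLACE function). The table name sales and its columns are assumptions made for this illustration only; they are not the schema used by the child report.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory table; the table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("2012 October", 120), ("2012 October", 80), ("2012 December", 200)],
    )
    return conn


def sales_for_month(conn: sqlite3.Connection, masked_month: str) -> list:
    """Return the rows whose month matches the underscore-masked parameter."""
    query = (
        "SELECT month, amount FROM sales "
        "WHERE REPLACE(month, ' ', '_') = ? "
        "ORDER BY amount"
    )
    return conn.execute(query, (masked_month,)).fetchall()


if __name__ == "__main__":
    print(sales_for_month(demo_connection(), "2012_October"))
```

The stored values keep their space; only the comparison is done on the masked form, so the parameter 2012_October finds the rows stored as 2012 October.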
Once you are done with the changes, reload the child document and then try to click on any link in the master document say 2012 October.
It now opens the child document successfully.
LESSON SEVEN JfreeChart, Highcharts and Cockpits
OBJECTIVES
After completing this chapter, you should be able to:
1. Describe the importance of cockpits.
2. Create Dial charts.
3. Create Speedometer charts.
4. Create cockpits.
CONTENTS
• Introduction.
• Dial Charts.
Introduction.
As we said in previous chapters, cockpits enable us to get a snapshot of the business at a particular point in time. They give a summary of all business components in one screen or less. Consider an aircraft or automobile cockpit, for example. Just by glancing at it, you know the status of your aircraft or car in an instant. Similarly, in the world of business you need a report that tells you the status of your business in an instant, without going through voluminous reports. This is the function of a cockpit. Consider the cockpit shown below. It shows us how the sales team is doing against its sales target. All four graphs in Cockpit 1 and Cockpit 2 show the same information but use different charts for demonstration purposes. We can see that out of a target of 100%, the sales team has only met 27.8% so far.
Cockpit 1
Cockpit 2
In this lesson, we will learn how to create the various charts and then combine them to create a cockpit.
JfreeChart - Speedometer dial chart.
The image shows the speedometer dial chart we will be creating. This chart is best used to display progress against a set target. For example, assume that the sales team in a bank has a target of opening 100,000 new accounts per year but so far they have only managed 28.7 percent of the target, which is 28,700 new accounts. It can be displayed as shown in the diagram on the left. There are several things you need to note:
1. The interval between dials is 25, so we have 0, 25, 50 and so on up to 100. We will see how to configure this.
2. Between 0 and 25 is red in color, 25 to 50 is yellow, 50 to 75 is blue and 75 to 100 is green. So if you are in the green area, it means you have either met your target or almost met it. The red area means you are way off! Keep that in mind when we start configuring colors.
STEPS: Follow these steps to create a speedometer dial chart similar to the one shown above. In your SpagoBI 4.0 studio, create a new SpagoBI project. Right click on “Business Analysis”, select “Chart” and click on “Chart with JfreeChart”.
For the name, enter “Sales Target Speedometer”. For the type select “DialChart”.
For the Title, enter “Sales Target”. For the Sub Title enter “Speedometer”. For the sub type enter “speedometer”.
The background color of the chart is silver so under color select silver. Set width and height to be 400.
Under series intervals, we specify the color ranges we talked about previously. Any value between 0 and 25 should fall under the red color since that is the danger zone: it means the sales target has been missed by a wide margin. If you have only met between 0 and 25 percent of the target then there is a problem. To configure this, select Series intervals (I) and then click on Add (II).
For the label enter “bad”. For min enter “0”. For max enter “25”. For the color select “red”.
Create the other series as follows:
1. Between 25 and 50 put orange color. The label should be “fair”.
2. Between 50 and 75 put yellow color. The label should be “good”.
3. Between 75 and 100 put green color. The label should be “excellent”.
We now need to configure the interval between dials which, as we have seen, is 25. We also need to set the lower bound to 0 and the upper bound to 100. Click on chart conf settings.
Set orientation to be vertical, increment by 25, lower bound 0 and upper bound 100.
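To check your understanding of the intervals and the dial settings, the following Python sketch reproduces them outside SpagoBI. It is only an illustration of the configuration above, not code that SpagoBI runs. The names INTERVALS, dial_ticks and band_for are hypothetical, and the rule that a value sitting exactly on a shared boundary (for example 25) belongs to the higher band is an assumption.

```python
# Series intervals as entered in the steps above: (label, min, max, color).
INTERVALS = [
    ("bad", 0, 25, "red"),
    ("fair", 25, 50, "orange"),
    ("good", 50, 75, "yellow"),
    ("excellent", 75, 100, "green"),
]


def dial_ticks(lower: int = 0, upper: int = 100, increment: int = 25) -> list:
    """Return the tick values drawn on the dial for the given bounds and increment."""
    return list(range(lower, upper + 1, increment))


def band_for(value: float) -> tuple:
    """Return (label, color) of the interval that contains the value."""
    if not INTERVALS[0][1] <= value <= INTERVALS[-1][2]:
        raise ValueError("value is outside the dial bounds")
    # Walk from the highest band down so a boundary value lands in the higher band.
    for label, low, _high, color in reversed(INTERVALS):
        if value >= low:
            return label, color
    raise ValueError("no interval found")


if __name__ == "__main__":
    print(dial_ticks())
    print(band_for(28.7))
```

With the bank example from the start of this section, 28.7 percent falls in the “fair” band.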
And that is it; we now need to deploy the generated template to the SpagoBI server. Right click on the document and select properties so that you know where the template is saved. You can also click on deploy, but in case it does not work, follow this method.
Once you know where the file is located, create a document on SpagoBI with the following settings:
The Dataset DS_GAUGE should only return one value labeled value. Click on browse and select the document. You should get your first dial chart!
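The single number that DS_GAUGE returns is simply the achieved figure expressed as a percentage of the target. As a rough sketch of that arithmetic in Python (the function names are hypothetical, and clamping the result to the dial bounds of 0 and 100 is an assumption):

```python
def percent_of_target(achieved: float, target: float) -> float:
    """Return achieved as a percentage of target, rounded to one decimal place."""
    if target <= 0:
        raise ValueError("target must be positive")
    return round(100.0 * achieved / target, 1)


def dial_value(achieved: float, target: float) -> float:
    """Clamp the percentage to the dial's lower bound (0) and upper bound (100)."""
    return max(0.0, min(100.0, percent_of_target(achieved, target)))


if __name__ == "__main__":
    # The bank example: 28,700 new accounts against a target of 100,000.
    print(dial_value(28700, 100000))
```

For the bank example this gives 28.7, the value the needle should point to.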
Play around with other types like thermometer, simple dial etc., then combine them using the SpagoBI composite document to create a cockpit. This is also outlined in the companion video for cockpits.
LESSON EIGHT Datamining with SpagoBI and Weka.
OBJECTIVES
After completing this chapter, you should be able to:
1. Describe the importance of datamining.
2. Understand Weka.
3. Create a clustering document in SpagoBI.
Introduction. In most places I have seen people use business intelligence tools purely for OLAP, creating reports and charts. However business intelligence tools are much more powerful than this. In this tutorial, we will look at a real world example of using SpagoBI to discover patterns hidden in a large data set of millions of records containing the US census data. According to Wikipedia, data mining is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
The Problem
We will assume that you are a military recruiter and your problem is to find a list of people who qualify to join the army. You want people who have a certain group of qualities. They must not be children. They should not be earning so much that they are already comfortable and not interested in joining the army. They should not have served in the army before, etc. To aid your work, you have been given a large dataset of 2.4 million records from the last census, with ID numbers so that you can get the contacts of the people. You want to mine the data using BI so that it groups potential candidates for you and reduces the time taken to recruit. You don’t want to run after people who are not interested in joining the army.
Preparing the data.
For us to use SpagoBI to perform data mining, we will need to load the data to be analyzed into a relational database. We will be using Oracle since it is the most popular enterprise database and also because we need to simulate, as far as possible, a real world scenario. So where will we get the data? Download the data from http://archive.ics.uci.edu/ml/machine-learning-databases/census1990-mld/USCensus1990.data.txt and save it to your computer. It is a large file, 352MB of data. The data is in CSV format, so the first thing we need to do is import it into the Oracle database. If you have no prior experience with Oracle, see my free book “SpagoBI, ORACLE and OLAP” available here http://www.scribd.com/doc/133975956/SpagoBI-with-ORACLE-11g
Loading the data.
We need to load the data into Oracle before we can perform data mining with SpagoBI. Start your database and log in as user sys.
Create a tablespace that will hold your census data. This tablespace should be 2GB in size but can extend if needed. (A tablespace is a logical container where table data is kept). I will place the datafile for my tablespace in drive C:\ as that is where I have space. Use the command below.
CREATE TABLESPACE CENSUS_DATA DATAFILE 'C:\oraclexe\app\oracle\oradata\XE\census_data01.dbf' SIZE 2048M AUTOEXTEND ON;
Create the user spago with password spago. This is the user that will own the census data. Use the command below. Notice that we are granting an unlimited quota on the tablespace CENSUS_DATA to spago. That is, user spago can use as much space as he/she likes on this tablespace.
CREATE USER SPAGO IDENTIFIED BY SPAGO DEFAULT TABLESPACE CENSUS_DATA QUOTA UNLIMITED ON CENSUS_DATA;
Next grant the user spago the CREATE SESSION privilege (allows the user to log in) and the CREATE TABLE privilege (allows the user to create tables).
GRANT CREATE SESSION TO SPAGO;
GRANT CREATE TABLE TO SPAGO;
Confirm that you can log in as the SPAGO user.
Next, create the table that will hold the data from the USCensus1990.data.txt file you downloaded previously. Below is the script for creating this table. Save it as C:\US\table.sql
CREATE TABLE CENSUS (
caseid INT, dAge INT, dAncstry1 INT, dAncstry2 INT, iAvail INT, iCitizen INT, iClass INT,
dDepart INT, iDisabl1 INT, iDisabl2 INT, iEnglish INT, iFeb55 INT, iFertil INT, dHispanic INT,
dHour89 INT, dHours INT, iImmigr INT, dIncome1 INT, dIncome2 INT, dIncome3 INT, dIncome4 INT,
dIncome5 INT, dIncome6 INT, dIncome7 INT, dIncome8 INT, dIndustry INT, iKorean INT, iLang1 INT,
iLooking INT, iMarital INT, iMay75880 INT, iMeans INT, iMilitary INT, iMobility INT, iMobillim INT,
dOccup INT, iOthrserv INT, iPerscare INT, dPOB INT, dPoverty INT, dPwgt1 INT, iRagechld INT,
dRearning INT, iRelat1 INT, iRelat2 INT, iRemplpar INT, iRiders INT, iRlabor INT, iRownchld INT,
dRpincome INT, iRPOB INT, iRrelchld INT, iRspouse INT, iRvetserv INT, iSchool INT, iSept80 INT,
iSex INT, iSubfam1 INT, iSubfam2 INT, iTmpabsnt INT, dTravtime INT, iVietnam INT, dWeek89 INT,
iWork89 INT, iWorklwk INT, iWWII INT, iYearsch INT, iYearwrk INT, dYrsserv INT
);
To create the table, execute the query as shown below when logged in as user spago.
SQL*Loader
The table is now prepared and all that remains is to load the data into it. To load the data, we are going to use an Oracle tool called SQL*Loader. This is a tool that loads data from a flat file into an Oracle database table. For SQL*Loader to work, it needs a control file which tells it where the data is and into which table in the database it should load the data. Below is the control file we will use.
load data
infile 'C:\US\USCensus1990.data.txt'
into table CENSUS
fields terminated by "," optionally enclosed by '"'
(caseid, dAge, dAncstry1, dAncstry2, iAvail, iCitizen, iClass, dDepart, iDisabl1, iDisabl2,
iEnglish, iFeb55, iFertil, dHispanic, dHour89, dHours, iImmigr, dIncome1, dIncome2, dIncome3,
dIncome4, dIncome5, dIncome6, dIncome7, dIncome8, dIndustry, iKorean, iLang1, iLooking, iMarital,
iMay75880, iMeans, iMilitary, iMobility, iMobillim, dOccup, iOthrserv, iPerscare, dPOB, dPoverty,
dPwgt1, iRagechld, dRearning, iRelat1, iRelat2, iRemplpar, iRiders, iRlabor, iRownchld, dRpincome,
iRPOB, iRrelchld, iRspouse, iRvetserv, iSchool, iSept80, iSex, iSubfam1, iSubfam2, iTmpabsnt,
dTravtime, iVietnam, dWeek89, iWork89, iWorklwk, iWWII, iYearsch, iYearwrk, dYrsserv)
The line infile 'C:\US\USCensus1990.data.txt' names the file that is the source of the data. The data will be loaded into the table CENSUS, and the values are separated by commas. The names in brackets are the columns of the table. Save this text as C:\US\control.ctl. We are now ready to load the data. Launch the command prompt and type the command below.
sqlldr SPAGO/SPAGO control=C:\US\control.ctl
This means that we are launching the SQL*Loader utility and it is connecting to the database as user SPAGO (which we created previously) with password SPAGO. It will use the control file in the location specified.
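Typing 69 column names twice, once in table.sql and once in control.ctl, is error prone. As an optional aid, the Python sketch below builds both texts from one comma-separated list of column names. It assumes you supply that list yourself, for example by copying the header line of the data file if your copy has one. The function names are hypothetical, and every column is declared as INT exactly as in the script above.

```python
def columns_from_header(header_line: str) -> list:
    """Split a comma-separated header line into clean column names."""
    return [name.strip() for name in header_line.strip().split(",") if name.strip()]


def build_create_table(table: str, columns: list) -> str:
    """Return a CREATE TABLE statement that declares every column as INT."""
    body = ",\n  ".join(f"{name} INT" for name in columns)
    return f"CREATE TABLE {table} (\n  {body}\n);\n"


def build_control_file(data_path: str, table: str, columns: list) -> str:
    """Return control-file text with the same clauses as the one shown above."""
    column_list = ",\n  ".join(columns)
    return (
        "load data\n"
        f"infile '{data_path}'\n"
        f"into table {table}\n"
        "fields terminated by \",\" optionally enclosed by '\"'\n"
        f"(\n  {column_list}\n)\n"
    )


if __name__ == "__main__":
    # Only the first three columns are used here to keep the output short.
    cols = columns_from_header("caseid,dAge,dAncstry1")
    print(build_create_table("CENSUS", cols))
    print(build_control_file(r"C:\US\USCensus1990.data.txt", "CENSUS", cols))
```

Because both texts come from the same list, the column order in the table and in the control file cannot drift apart.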
When you press the Enter key, it will start inserting the data into the database. This might take a while depending on the speed of your machine. On my laptop it took less than 5 minutes to load the 2.45 million records.
WEKA
SpagoBI uses software called Weka (Waikato Environment for Knowledge Analysis), which is a collection of machine learning algorithms developed at the University of Waikato, New Zealand. Though Weka supports many algorithms, only cluster analysis is supported in SpagoBI, and therefore we will limit ourselves to clustering for the remainder of this document.
Cluster analysis
Clustering is a method used to discover natural groups in data without prior knowledge of the groups. Suppose you have the database of an insurance company and you run a clustering algorithm against it. What might you discover? It might give you groups of policy holders with high claim costs whom you can blacklist from your firm, and groups with low claim costs whom you can do business with. This is an example of how data mining can be used in the real world. Marketers also use clustering algorithms to discover groups in their customer data whom they can target with specific products. In the Telco sector, you might discover that young people call mostly at a particular time of the day, or use more of a certain service, e.g. internet data as opposed to voice, and you can use this information to target them with offers for internet data bundles. Clustering has many other uses in marketing, image processing, medicine etc. Looking at the census data that is now in our database, it makes no sense at all, but once we start analyzing it we might discover interesting details. The particular algorithm we will be using is called the k-means algorithm.
Downloading Weka.
Download Weka 3.6.1 from http://sourceforge.net/projects/weka/files/?source=navbar and install it on your computer. Next, add the Oracle JDBC library to your computer's class path so that Weka will be able to find it when connecting to the Oracle database. The Oracle library is in the path C:\oraclexe\app\oracle\product\11.2.0\server\jdbc\lib\ojdbc6.jar. This may differ if you installed Express Edition in a different path.
JDBC Driver
Now Weka needs to know where the Oracle JDBC driver is. We tell it by modifying the file DatabaseUtils.props which can be found in the jar file C:\Program Files\Weka-3-6\weka.jar. You have three options.
1. Modify the file DatabaseUtils.props to include the Oracle settings. Navigate to the location where you installed Weka, e.g. C:\Program Files\Weka-3-6, right click on weka.jar and open it using WinRAR.
Navigate to experiment/DatabaseUtils.props and extract the file DatabaseUtils.props. Change the line
jdbcURL=jdbc:idb=experiments.prp
to
jdbcURL=jdbc:oracle:thin:@localhost:1521:XE
and change the line
jdbcDriver=RmiJdbc.RJDriver,jdbc.idbDriver,org.gjt.mm.mysql.Driver,com.mckoi.JDBCDriver,org.hsqldb.jdbcDriver
to
jdbcDriver=oracle.jdbc.driver.OracleDriver
then put the file back into the jar file. The file should now look like this. Notice the highlighted entry.
2. The other easy option is to delete the file DatabaseUtils.props and rename the file DatabaseUtils.props.oracle to DatabaseUtils.props in the jar file.
3. The last and recommended option, which we will be using, is to extract the file DatabaseUtils.props.oracle and copy it to your home directory with the name DatabaseUtils.props, e.g. “C:\Documents and Settings\Stephen Ogutu\DatabaseUtils.props”, then modify it as follows.
a. Change the database URL to jdbcURL=jdbc:oracle:thin:@localhost:1521:XE if you installed Oracle Express edition.
b. Change the JDBC driver to jdbcDriver=oracle.jdbc.driver.OracleDriver
Weka is now ready to connect to Oracle.
A simple analysis with Weka.
Now start Weka and click on Explorer.
To select the source of the data that we need to analyze, click on the Open DB icon and under URL, enter jdbc:oracle:thin:@localhost:1521:XE as shown below.
Click on User and enter the following details.
When you click on connect, the info box should tell you “connecting to: jdbc:oracle:thin:@localhost:1521:XE = true”. Let us start with 20,000 records since most laptops will not handle the 2.4 million records in one go. First let us see if there is any relation between age, marital status, military service, poverty level and gender in the census data. After entering the password, click on connect. Then enter the query and click on execute.
When you click OK, you might get this error if you used option one to change the DatabaseUtils.props file.
This is because WEKA does not know about the NUMBER data type returned by the JDBC driver, so we need to map it to a type that WEKA understands. Since we know the values are integers, we will map them to the Java integer type (represented by the number 5 in the file DatabaseUtils.props). Add the line below to your DatabaseUtils.props and save.
NUMBER=5
Your file should be similar to this.
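If you prefer not to edit the file by hand, the three settings discussed so far (the JDBC URL, the JDBC driver and the NUMBER mapping) can be applied with a small script. The Python sketch below works on the text of a properties file. The function name apply_settings is hypothetical, and the sketch makes simplifying assumptions: each setting sits on a single line of the form key=value, and lines continued with a trailing backslash are not handled.

```python
# The values below are the ones given in the text for Oracle Express Edition.
ORACLE_SETTINGS = {
    "jdbcDriver": "oracle.jdbc.driver.OracleDriver",
    "jdbcURL": "jdbc:oracle:thin:@localhost:1521:XE",
    "NUMBER": "5",
}


def apply_settings(props_text: str, settings: dict) -> str:
    """Replace existing key=value lines and append any keys that are missing."""
    remaining = dict(settings)
    output = []
    for line in props_text.splitlines():
        stripped = line.strip()
        # Comment lines start with '#', so their "key" never matches a setting.
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in remaining:
            output.append(f"{key}={remaining.pop(key)}")
        else:
            output.append(line)
    output.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(output) + "\n"


if __name__ == "__main__":
    original = "jdbcDriver=org.hsqldb.jdbcDriver\njdbcURL=jdbc:idb=experiments.prp\n"
    print(apply_settings(original, ORACLE_SETTINGS))
```

To use it on a real file, read your DatabaseUtils.props into a string, pass it through apply_settings and write the result back. Keep a backup copy of the original file first.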
Try to run the query in WEKA again. We should now get the screen below.
Understanding the output.
Under Attributes, click on ISEX as shown below. This attribute (column) displays the gender. Remember that the k-means algorithm we will be using only accepts numbers, so we need a way of converting the gender representation Male or Female to number values. This has been achieved by using the number 1 for females and the number 0 for males.
When you click on the ISEX attribute (yellow arrow in the above diagram) and look at the output at the green arrow (Arrow II), you see that the minimum is 0 (males) and the maximum is 1 (females). In our data sample the mean (average) is 0.517; in other words, the split between males and females is almost even, with females slightly outnumbering males, which is the norm in most populations. From the graph, we see that males, represented by the red arrow (Arrow III), total 9658 in our sample. Females total 10342, blue arrow (Arrow IV). Notice that in the graph or visual representation, there is nothing between 0 and 1 since we either have males or females. Let’s look at another attribute, which is military service. Click on the IMILITARY attribute.
We see that we have a minimum of 0 and a maximum of 4 with a mean of 2.801. The description of that column, found at http://archive.ics.uci.edu/ml/machine-learning-databases/census1990-mld/USCensus1990raw.attributes.txt, is copied below.
It therefore means that from our sample of 20,000 rows (instances), there are 4708 people who have not reached the military age so they are represented by zero. See the blue arrow on the image below.
We have 143 people on active duty (red arrow). Remember that 1 represents active duty. There are 2214 people who were on active duty in the past (green arrow), 288 who serve in the national guard (black arrow) and 12647, the majority, who never served in the armed forces (yellow arrow). You should now be comfortable reading the data. Let us now run a clustering algorithm on it.
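The means that Weka reports can be checked by hand from the counts quoted above, since the mean of a coded attribute is just the count-weighted average of its codes. The short Python sketch below does this for ISEX and IMILITARY; the function name mean_from_counts is hypothetical.

```python
# Counts per code, as read off the Weka histograms described in the text.
ISEX_COUNTS = {0: 9658, 1: 10342}
IMILITARY_COUNTS = {0: 4708, 1: 143, 2: 2214, 3: 288, 4: 12647}


def mean_from_counts(counts: dict) -> float:
    """Return the average code, weighting each code by how many rows carry it."""
    total = sum(counts.values())
    return sum(code * n for code, n in counts.items()) / total


if __name__ == "__main__":
    print(round(mean_from_counts(ISEX_COUNTS), 3))
    print(round(mean_from_counts(IMILITARY_COUNTS), 3))
```

Both sets of counts add up to the 20,000 rows in the sample, and the results, 0.517 and 2.801, agree with the means shown by Weka.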
K-Means Algorithm
Click on Cluster, click on Choose and select SimpleKMeans.
Next to the Choose button, click on the bold text SimpleKMeans and change the number of clusters to 5.
Click on ignore attributes and select CASEID. We will not be using this attribute (column) in the clustering because it is merely used to identify a row or the instance.
Under cluster mode, select “Use training set” then click on Start.
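If you are curious about what happens when you click on Start, the Python sketch below shows the basic k-means loop: pick k starting centroids, assign every row to its nearest centroid, move each centroid to the average of its rows, and repeat until nothing changes. This is a teaching sketch only and is not Weka's SimpleKMeans implementation. The function names, the random choice of starting centroids and the default seed are assumptions of this illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two rows of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=10, max_iterations=100):
    """Return (centroids, assignments) for the given rows and number of clusters."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each row joins the cluster with the nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its member rows.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    rows = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    centroids, labels = k_means(rows, k=2)
    print(centroids)
    print(labels)
```

The centroids printed at the end play the same role as the per-cluster attribute values that Weka lists in its output.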
From the results above, we can see that the data sample was 20000 (black arrow) and the number of attributes (columns) was 6 (yellow arrow), and out of these CASEID was ignored. The data have been partitioned into 5 groups with similar characteristics. Let us look at cluster 0 (green box).
1. It has a total population of 2370 people.
2. The value of the DAGE column is 1.1093. Looking at the age function found at http://archive.ics.uci.edu/ml/machine-learning-databases/census1990-mld/USCensus1990.mapping.sql and copied below, we can deduce that people in this cluster are less than 13 years old, since the value 1.1093 rounded off to the nearest integer is 1.
3. The value of the IMARITAL attribute is 3.9958, which rounded to the nearest integer is 4. From http://archive.ics.uci.edu/ml/machine-learning-databases/census1990-mld/USCensus1990raw.attributes.txt, which is copied below, this means that those in this cluster have never married, since they are less than 15 years old.
4. The value of IMILITARY is 0.0025, which rounded to the nearest integer is 0. From http://archive.ics.uci.edu/ml/machine-learning-databases/census1990-mld/USCensus1990raw.attributes.txt, copied below, this means they are under age and so have never been in military service.
5. The value of the attribute DPOVERTY is 1.8342, which rounded off to the nearest integer is 2 and means not applicable.
6. Lastly, the ISEX attribute is 1, which means they are females.
In summary, this is a cluster which consists of 2,370 female underage kids who have never been in the army. If you were employed by the army to recruit soldiers, would you consider members of this cluster?
7. As an exercise, assume you are a recruiting agent for the army and you have this data. Find a cluster of people who would make potential candidates.
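The reading of cluster 0 above follows one rule: round each centroid value to the nearest whole code, then look the code up in the attribute description. The Python sketch below applies that rule to the values quoted for cluster 0. The lookup table contains only the codes explained in this chapter, so any other code is reported as not listed. The names CODE_LABELS and describe_centroid are hypothetical.

```python
# Only the codes discussed in the text are included; the full lists are in the
# attribute description files referenced above.
CODE_LABELS = {
    "DAGE": {1: "less than 13 years old"},
    "IMARITAL": {4: "never married or under 15 years old"},
    "IMILITARY": {0: "under military age"},
    "DPOVERTY": {2: "not applicable"},
    "ISEX": {0: "male", 1: "female"},
}

# Centroid values of cluster 0 as quoted in the text.
CLUSTER_0 = {
    "DAGE": 1.1093,
    "IMARITAL": 3.9958,
    "IMILITARY": 0.0025,
    "DPOVERTY": 1.8342,
    "ISEX": 1.0,
}


def describe_centroid(centroid: dict) -> dict:
    """Map each attribute to (rounded code, label) using the lookup table."""
    description = {}
    for attribute, value in centroid.items():
        code = round(value)
        label = CODE_LABELS.get(attribute, {}).get(code, "code not listed here")
        description[attribute] = (code, label)
    return description


if __name__ == "__main__":
    for attribute, (code, label) in describe_centroid(CLUSTER_0).items():
        print(attribute, code, label)
```

You can reuse the same function for the exercise in step 7 by typing in the centroid values of the other clusters and extending the lookup table from the attribute description file.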
References:
• https://list.scms.waikato.ac.nz/pipermail/wekalist/2005-April/030088.html
• http://archive.ics.uci.edu/ml/machine-learning-databases/census1990-mld/USCensus1990raw.attributes.txt
• http://www.kdd.org/explorations/issues/11-1-2009-07/p2V11n1.pdf
• http://www.ibm.com/developerworks/opensource/library/os-weka1/index.html
Enter SpagoBI
We will be using SpagoBI to arrive at the same values we got with the Weka explorer. The advantage with SpagoBI is that we will be able to store the data so that we can analyze it with other tools available in SpagoBI like Qbe, charts and OLAP. For this to work, SpagoBI needs an XML KnowledgeFlow layout (KFML) file which defines what we have just done above in Weka. To create a KFML file, start Weka and choose KnowledgeFlow.
Knowledge flow does a similar thing to the explorer except that we put the items on a canvas and connect them so that we can visualize how the data flows. Here are the steps.
1. Under the DataSources tab, select DatabaseLoader; the arrow will change to a cross. Click on the knowledge flow layout with the cross. It will deposit the DatabaseLoader icon on the layout.
2. Click on the Evaluation tab, select the TrainingSet maker and deposit it on the Layout as shown.
3. Under Filters, select AddCluster and deposit it into the Layout.
4. Lastly under DataSinks tab, select DatabaseSaver and deposit it into the Layout.
5. Double click on Database Loader and enter the data as shown.
The query should be similar to the one below.
select CASEID,DAGE,IMARITAL,IMILITARY,DPOVERTY,ISEX from census where rownum<=20000
6. Right click on DatabaseLoader and select dataSet.
7. The icon will change into a rubber band; click on TrainingSetMaker. The two will now be linked.
8. Link TrainingSetMaker with AddCluster by right clicking on TrainingSetMaker and selecting Training Set.
9. Double click on AddCluster. Click on Choose then SimpleKMeans.
10. Next to Choose, click on SimpleKMeans and for number of clusters enter 5.
11. For IgnoredAttributeIndices, enter 1 (this is the CASEID since it is the first attribute or column).
12. Link AddCluster to DatabaseSaver.
13. Double click on DatabaseSaver and enter the details below.
This means the data will be saved into the database table RECRUITS_TRAINING_1_OF_1, which will be created automatically.
14. The final diagram should look like the one below.
This means that we will take data from the database, pass it to the TrainingSetMaker, then pass it to the clustering algorithm, which will cluster it and save the results in the database.
15. Click on the Save icon to save your layout.
a. Under Files of Type, select KFML.
b. For file name enter Recruit. Click on Save.
16. We are done with Weka. You can close it.
SpagoBI datamining document.
NOTE: This was done for earlier versions of SpagoBI but should be applicable to SpagoBI 4.0.
We will need to create a datamining document in SpagoBI to perform the clustering. Log in to SpagoBI (I am using version 3.3) as user biadmin.
Under Resources, select Data source then click on create button. Enter the details as shown.
236 | S p a g o B I 4 . 0 B a b y S t e p s . T h e O g u t u F o u n d a t i o n , w w w . o g u t u . o r g
Click on Test before you click on the Save button. It should say "Connection Test OK".
We have now created a connection to the Oracle database from SpagoBI. Next click on Analytical Document -> Document Development.
Click on Insert.
1. For Label enter Recruits.
2. For Name enter Recruits.
3. For Description enter Datamining recruits data.
4. For Type enter Datamining model.
5. For Engine enter Weka engine.
6. For Engine enter Weka engine.
7. For Data Source enter SpagoBIOracle.
8. For State enter Released.
9. For Template enter the KFML file we created in Weka (Recruit.kfml).
10. Under Show document templates, select the Data Mining folder shown below.
11. Click on Save.
12. Click on Home Page, select the Business Analysis folder, then Data Mining. Click on the Recruits document.
13. The document will run successfully as shown below.
14. If you look at the Tomcat log, you will see it inserting the cluster output rows into the database.
This logging is not available in the original Weka engine that comes with SpagoBI; I added it as a debugging aid. You can get the modified Weka engine for a small fee if you write to me.
Note that if you get an error here, it might be for one of two reasons.
a. Your SpagoBIWekaEngine has not been configured properly, in particular the file C:\Downloads\All-In-One-SpagoBI-3.3-01242012\SpagoBI-Server3.3\webapps\SpagoBIWekaEngine\WEB-INF\classes\database.properties, which should be set up for the Oracle connection as shown below.
b. Oracle may be refusing to create the table because one of the columns is named CLUSTER, which is a reserved word in Oracle. The offending code is in the file DatabaseSaver.class in the path C:\Downloads\All-In-One-SpagoBI-3.3-01242012\SpagoBI-Server3.3\webapps\SpagoBIWekaEngine\WEB-INF\classes\weka\core\converters. I had to add the following lines so that it adds an underscore to the column name when that name is CLUSTER.
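The actual Java change to DatabaseSaver is not reproduced here. Purely as an illustration of the idea, the Python sketch below renames a column when its name collides with a reserved word. The function name, the deliberately tiny reserved-word set, and the choice to append (rather than prepend) the underscore are all assumptions; appending is the safer guess because an unquoted Oracle identifier must begin with a letter.

```python
# Hypothetical, deliberately incomplete set: Oracle reserves many more words.
ORACLE_RESERVED = {"CLUSTER", "SELECT", "TABLE", "ORDER", "GROUP"}


def safe_column_name(name, reserved=ORACLE_RESERVED):
    """Return the name unchanged unless it is a reserved word, then append '_'."""
    if name.upper() in reserved:
        return name + "_"
    return name
```

With this in place, a column called CLUSTER would be created as CLUSTER_, while ordinary columns such as DAGE pass through untouched.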
How did I know this? Oracle allows you to trace a session, and I saw from the Oracle trace files that this was the problem. See the trace file below.
If hunting for errors in trace files is not your cup of coffee, or if you have no time for it and just need a functioning Weka engine to use with Oracle, email me for a copy. It will cost you a small fee to compensate for my time.
15. Assuming you had no issues, click on User menu events. You will see that it says "Execution of Weka flow successfully terminated!"
16. We can confirm from Oracle that the table RECRUITS_TRAINING_1_OF_1 was created by SpagoBI, the clustering was done, and the data was inserted.
17. On its own, that data is hard to interpret, so we will need another SpagoBI document to analyze it. That is where Qbe comes in.
Qbe Document.
We will create a Qbe document so that we can freely query the clustered data and produce reports from it. Create a datamart using SpagoBI Studio.
Steps:
1. Create a new General project.
2. Call the project Recruit. Under the Recruit project, create a new SpagoBI model.
3. Name the model RecruitModel and the file RecruitFile.
4. Under Connection, select New Oracle and for schema select SPAGO.
5. Select Physical Model Tables.
6. Select business model class.
7. You will have the following screen.
8. Right click on Business Model, click Create and select Datamart.
9. Select the location.
10. Navigate to the location C:\Downloads\SpagoBIMeta_3.3_Win_20111222\SpagoBIMeta_3.3_win_20111220\workspace\RecruitModel\dist and copy the files datamart.jar and cfields_meta.xml to C:\Downloads\All-In-One-SpagoBI-3.3-01242012\SpagoBI-Server3.3\resources\qbe\datamarts\RecruitModel
11. Now we need to tell SpagoBI where our datamart is, i.e. the RecruitModel. We do this by creating a simple XML file like the one below.
12. Save the file as C:\DataMart.xml
13. We are done creating the datamart. Next, log in to SpagoBI and create the Qbe document.
14. Click on Analytical documents -> Documents management.
15. Click on Create.
16. Enter data as follows.
a. For Label enter RecruitResults.
b. For Name enter RecruitResults.
c. For Name enter RecruitResults.
d. For Type enter Datamart model.
e. For Engine enter QbeEngine.
f. For State enter Released.
g. For Template choose DataMart.xml
h. Under Show document templates select Business Analysis and Data Mining.
i. Save the document.
j. Under the Business Analysis folder, select the Data Mining folder and select Recruit Results.
k. Select the attributes DAGE, IMARITAL, IMILITARY, DPOVERTY, ISEX and CLUSTER from Schema and drop them into the Query Editor.
l. Under Alias, rename the fields as shown: DAGE to AGE, IMARITAL to MARITAL STATUS, IMILITARY to MILITARY SERVICE, DPOVERTY to POVERTY, ISEX to GENDER, and CLUSTER to CLUSTER.
m. Now we need to get the average for all fields and group by CLUSTER.
n. Next click on Execute Query.
o. Now let us look at the results of the clusters.
p. We forgot to add a count of the people in each cluster. Go back to Query, add the attribute CASEID and, instead of the Average function, select Count.
q. Run the query again.
Let us look at the results. Cluster 1 is made up of females (Gender = 1) who have never seen military service (0 means underage for military service), have never married (Marital status = 4) and are below 13 years old (AGE = 1.11). The total number in this group is 2370, so this group is of no interest to a military recruiter. Notice that the results are exactly the same as the ones we got by using the Weka Explorer, shown below.
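The query assembled in the Qbe editor (averages of every field, a count of CASEID, grouped by CLUSTER) corresponds roughly to a single SQL aggregate. The sketch below expresses it in Python against an in-memory SQLite table that stands in for the Oracle table. The SQL that Qbe actually generates is not shown in this book, so the statement, the column types, the underscore aliases, the helper names and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def demo_connection():
    """Build an in-memory table with made-up rows, standing in for Oracle."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RECRUITS_TRAINING_1_OF_1 ("
        "CASEID INTEGER, DAGE REAL, IMARITAL REAL, IMILITARY REAL, "
        'DPOVERTY REAL, ISEX REAL, "CLUSTER" TEXT)'
    )
    rows = [  # invented values, not the book's data
        (1, 1, 4, 0, 1, 1, "cluster1"),
        (2, 2, 4, 0, 1, 1, "cluster1"),
        (3, 5, 0, 4, 2, 0, "cluster2"),
    ]
    conn.executemany(
        "INSERT INTO RECRUITS_TRAINING_1_OF_1 VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def cluster_summary(conn, table="RECRUITS_TRAINING_1_OF_1"):
    """Average every field and count the cases, one result row per cluster."""
    sql = f'''
        SELECT "CLUSTER",
               AVG(DAGE)      AS AGE,
               AVG(IMARITAL)  AS MARITAL_STATUS,
               AVG(IMILITARY) AS MILITARY_SERVICE,
               AVG(DPOVERTY)  AS POVERTY,
               AVG(ISEX)      AS GENDER,
               COUNT(CASEID)  AS PEOPLE
        FROM {table}
        GROUP BY "CLUSTER"
        ORDER BY "CLUSTER"
    '''
    return conn.execute(sql).fetchall()
```

Calling `cluster_summary(demo_connection())` returns one tuple per cluster, which is the same shape as the result grid Qbe displays: the cluster label, the five averages, and the head count.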
Now you know enough to be productive in data mining with SpagoBI.
About the author.
Business intelligence is very important for decision makers. Making business decisions without the correct information is suicidal for your business. You need to understand your products, your customers and, more importantly, your competitors. This book will help you quickly set up a business intelligence platform to better understand your data and trends with intuitive charts and OLAP, using a free and very powerful business intelligence platform.
Stephen Ogutu is an experienced IT Administrator who has a love for creativity and enjoys experimenting with various technologies. He has vast experience in business intelligence technologies, databases (particularly Oracle) and Unix operating systems. Follow his daily findings on Twitter at @xogutu. If you have a question, drop him a line at xogutu@gmail.com.