USER MANUAL Hadoop Grid 247 Workflow Designer 1.3.6
Table of Contents
1 SYSTEM OVERVIEW
  1.1 PROJECT BACKGROUND
  1.2 PURPOSE OF USER MANUAL
2 GETTING STARTED
  2.1 PREPARATION
  2.2 INSTALL HADOOP FROM CYGWIN
  2.3 SETTING ENVIRONMENT
3 STRUCTURE FILE
4 RUNNING APPLICATION
  4.1 FIRST ACCESS
  4.2 WORK AREA
    4.2.1 Menu Bar
    4.2.2 Toolbar
    4.2.3 Upper Left Pane
    4.2.4 Lower Left Pane
    4.2.5 Right Pane
  4.3 TRANSFORMATOR
  4.4 COMBINER
  4.5 AGGREGATOR
  4.6 WORKFLOW ON HADOOP
5 MAKING PROJECT
  5.1 MAKING NEW PROJECT
  5.2 MAKING NEW PACKAGE
  5.3 MAKING NEW WORKFLOW
    5.3.1 Reading Text Files and Applying Parsing Process
    5.3.2 Select the Fields
    5.3.3 Outer Join the Fields
    5.3.4 Select Data in Join Result
    5.3.5 Applying Aggregation
    5.3.6 Generate MapReduce jar from Workflow
  5.4 RUNNING WORKFLOW ON HADOOP
UserManual | HGrid247
Table of Figures
Figure 1. System Properties – Advanced Tab
Figure 2. Add System Variable
Figure 3. Java Home - Variable
Figure 4. Hadoop Home - Variable
Figure 5. Path Variable - JDK
Figure 6. Path Variable – Hadoop
Figure 7. Structure File
Figure 8. Lib Files
Figure 9. LibApp Files
Figure 10. Main Window
Figure 11. Work Area
Figure 12. File Menu
Figure 13. Operation Menu
Figure 14. Tools Menu
Figure 15. Setting HDFS Cluster
Figure 16. Convert Jar To Workflow
Figure 17. MapReduce Jar Directory
Figure 18. Toolbar
Figure 19. Upper Left Pane
Figure 20. Create New Package / Class / Workflow
Figure 21. Lower Left Pane
Figure 22. Assembly Process Palette
Figure 23. Source / Sink Palette
Figure 24. Console
Figure 25. Right Pane
Figure 26. Transformator Type
Figure 27. Combiner Type
Figure 28. Grouping Field
Figure 29. Aggregator Type
Figure 30. Dialog Box to Make New Project
Figure 31. New Project Created
Figure 32. Click on Parent Package
Figure 33. Dialog Box to Create New Package
Figure 34. New Package Created
Figure 35. Dialog Box to Create New Workflow
Figure 36. New Workflow Created
Figure 37. Data Voms
Figure 38. Data VomsUsed
Figure 39. Drag link HFS Voms-Input into Transformator
Figure 40. Dialog Box of Transformation
Figure 41. Dialog Box of Input Fields
Figure 42. Add Voms Fields
Figure 43. Change Delimiter String
Figure 44. Parsing Voms Fields
Figure 45. Drag link HFS VomsUsed-Input into Transformator
Figure 46. Add VomsUsed Fields
Figure 47. Parsing VomsUsed Fields
Figure 48. Drag link Parsing Voms into Select Voms
Figure 49. Drag the link
Figure 50. Set Transformator Substring
Figure 51. Set Transformator Reference
Figure 52. Select Voms Fields
Figure 53. Drag link Parsing Vomsused into Select Vomsused
Figure 54. Drag the links
Figure 55. Select Vomsused Fields
Figure 56. Drag link into Outer Join on No-Serie
Figure 57. Dialog Box for Join Process
Figure 58. Select Outer Join Fields
Figure 59. Make Transformator ‘Select Data’
Figure 60. Add the fields
Figure 61. Drag the links
Figure 62. Dialog Box for SwitchCase – Status (1)
Figure 63. Dialog Box for SwitchCase - Status (2)
Figure 64. Dialog Box for SwitchCase - Status (3)
Figure 65. Dialog Box for SwitchCase - Status (4)
Figure 66. Dialog Box for SwitchCase - Date (1)
Figure 67. Dialog Box for SwitchCase - Date (2)
Figure 68. Dialog Box for SwitchCase – Date (3)
Figure 69. Dialog Box for SwitchCase – Date (4)
Figure 70. Dialog Box for SwitchCase - Denom (1)
Figure 71. Dialog Box for SwitchCase - Denom (2)
Figure 72. Dialog Box for SwitchCase – Denom (3)
Figure 73. Dialog Box for SwitchCase – Denom (4)
Figure 74. Dialog Box for SwitchCase - Product (1)
Figure 75. Dialog Box for SwitchCase - Product (2)
Figure 76. Dialog Box for SwitchCase – Product (3)
Figure 77. Dialog Box for SwitchCase – Product (4)
Figure 78. Result of Select Data
Figure 79. Dialog Box for Combiner
Figure 80. Sequence for Combiner
Figure 81. Map-Aggregation tab
Figure 82. Result of Map-Aggregation
Figure 83. Drag Link into Grouping
Figure 84. Group By Parameter
Figure 85. Drag Link into Aggregate
Figure 86. Map Aggregation
Figure 87. Drag Link Map Aggregation
Figure 88. Drag Link into Join-Field
Figure 89. Join-Field Transformator
Figure 90. Drag the Link into Result
Figure 91. Drag Link into Join-Field
Figure 92. Generate Process Succeed
Figure 93. Execution in Cygwin
1. System Overview
1.1 Project Background
Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, in which the application is divided into many small fragments of work. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes. Both MapReduce and the Hadoop Distributed File System are designed so that node failures are handled automatically by the framework.
Solusi247 has developed a GUI tool named HGrid247 Workflow Designer. With HGrid247, you can design a workflow and easily generate a MapReduce application that will be executed on a Hadoop cluster.
1.2 Purpose of User Manual
This document describes the usage of HGrid247. Features, functions, and how-to instructions are provided to guide the user in using these modules.
Structure
This manual contains the following chapters :
Chapter 1 : Introduces you to the HGrid247 application
Chapter 2 : Getting started with the HGrid247 application
Chapter 3 : Describes the file structure of the HGrid247 application
Chapter 4 : Running the application
Chapter 5 : Making a new project
2. Getting Started
2.1 Preparation
Before you use HGrid247, you need to install the applications below :
1. Java 1.6.x (jdk-6u4-windows-i586-p).
2. Cygwin, if you want to install Hadoop locally (Windows environment).
2.2 Install Hadoop from Cygwin
Hadoop can be downloaded from one of the Apache download mirrors. Open the link : http://www.apache.org/dyn/closer.cgi/hadoop/
After you download Hadoop, you can install it from Cygwin. Follow these steps :
1. Copy the Hadoop file (for instance, hadoop-0.20.2.tar.gz) to the Cygwin directory : d:\cygwin\usr\src
2. Run Cygwin and move to the directory below : cd /cygdrive/d/cygwin/usr/src
3. Extract the Hadoop file (in the instance above, hadoop-0.20.2.tar.gz).
2.3 Setting Environment
1. Go to System Properties and click the Advanced tab.
Figure 1. System Properties – Advanced Tab
2. Click the [button] button, then click the [button] button to add a New System Variable.
Figure 2. Add System Variable
3. Add a new system variable with these values :
Variable Name : JAVA_HOME
Variable Value : location where the JDK is installed (for example C:\Program Files\Java\jdk1.6.0_23)
Figure 3. Java Home - Variable
4. If Hadoop is installed locally, add a new system variable with these values :
Variable Name : HADOOP_HOME
Variable Value : location where Hadoop is installed (for example C:\cygwin\usr\src\hadoop-0.20.2)
Figure 4. Hadoop Home - Variable
5. Add the bin directory of the installed JDK to the Path variable (for example C:\Program Files\Java\jdk1.6.0_23\bin).
Figure 5. Path Variable - JDK
6. If Hadoop is installed locally, add the bin directory of the installed Hadoop to the Path variable (for example C:\cygwin\usr\src\hadoop-0.20.2\bin).
Figure 6. Path Variable – Hadoop
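After steps 3 to 6, it can be useful to confirm that the variables are visible to Java programs. The following is a minimal sketch, not part of HGrid247: the class EnvCheck and its methods are hypothetical names introduced here for illustration. It only checks that JAVA_HOME (and, optionally, HADOOP_HOME) is set and that the matching bin directory appears in the Path variable.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HGrid247: checks the variables from
// steps 3 to 6 against an environment map.
final class EnvCheck {

    // Returns one message per problem; an empty list means the settings
    // match the steps above. Pass hadoopIsLocal = false to skip HADOOP_HOME.
    static List<String> problems(Map<String, String> env, String separator, boolean hadoopIsLocal) {
        List<String> found = new ArrayList<>();
        check(env, separator, "JAVA_HOME", found);
        if (hadoopIsLocal) {
            check(env, separator, "HADOOP_HOME", found);
        }
        return found;
    }

    private static void check(Map<String, String> env, String separator, String name, List<String> found) {
        String home = env.get(name);
        if (home == null || home.trim().isEmpty()) {
            found.add(name + " is not set");
            return;
        }
        // Windows usually names the variable "Path"; other shells use "PATH".
        String path = env.containsKey("Path") ? env.get("Path") : env.getOrDefault("PATH", "");
        String bin = normalize(home) + "/bin";
        for (String entry : path.split(Pattern.quote(separator))) {
            if (normalize(entry).equalsIgnoreCase(bin)) {
                return; // <home>/bin is on the path
            }
        }
        found.add(name + "/bin is not on the Path variable");
    }

    // Trim, use forward slashes, and drop trailing slashes so that
    // "C:\jdk\bin\" and "C:/jdk/bin" compare equal.
    private static String normalize(String p) {
        String s = p.trim().replace('\\', '/');
        while (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Checks the environment of the current process.
        for (String problem : problems(System.getenv(), File.pathSeparator, true)) {
            System.out.println(problem);
        }
    }
}
```

Run it from a newly opened command prompt, since a prompt that was already open before the variables were added will not see the new values.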
3. Structure File
HGrid247 has the file structure displayed in the figure below :
Figure 7. Structure File
Lib
This directory contains the libraries needed by the GUI application Hgrid247.jar. Lib contains the files below :
Figure 8. Lib Files
LibApp
This directory contains the libraries needed by the MapReduce application. Files in this directory are processed when the MapReduce jar application that runs on Hadoop is generated. LibApp contains the files below :
Figure 9. LibApp Files
Note : To run this application on Hadoop, copy all jar files in LibApp to the lib directory where Hadoop is installed (for example C:\cygwin\usr\src\hadoop-0.20.2\lib)
hgrid247.license
License file for the HGrid247 application.
Hgrid247.jar
Executable jar file of HGrid247 Workflow Designer.
4. Running Application
This section describes the features, functions, and instructions of the HGrid247 application.
4.1 First Access
To run HGrid247, double-click HGrid247.jar or, if you want to run it from the command prompt, type the command below :
java -jar Hgrid247.jar
The HGrid247 Workflow Designer application is then displayed as in the figure below :
Figure 10. Main Window
4.2 Work Area
The HGrid247 work area includes three panes, one on the right side and two on the left side (upper and lower), that help you work with HGrid247. The menu bar and the toolbar near the top of the window provide other controls that you can use.
Figure 11. Work Area
A : Menu Bar
B : Upper Left Pane
C : Lower Left Pane
D : Toolbar
E : Right Pane
4.2.1 Menu Bar
The Menu Bar consists of 3 menus, as follows :
1. File. If you click the File menu, the sub menu below appears.
Figure 12. File Menu
Sub Menu : Description
Open Project : To open a project from a local directory
New Project : To create a new project
Save File : To save the project
Generate Map Reduce Jar : To generate a MapReduce jar from the workflow
Exit : To exit from the application
2. Operation. If you click the Operation menu, the sub menu below appears.
Figure 13. Operation Menu
The sub menu that appears depends on the active selection. For instance, if you select a project, the active sub menu is Project, as in the figure below :
In this menu you can perform operations such as creating a new package, compiling a package, removing a class, etc.
3. Tools. If you click the Tools menu, the sub menu below appears.
Figure 14. Tools Menu
a. Setting HDFS Cluster
This menu is used to configure the HDFS cluster. After you choose this menu, a pop-up window for setting the HDFS cluster configuration appears, as in the figure below :
Figure 15. Setting HDFS Cluster
You can also open this pop-up window by clicking its icon on the Toolbar.
b. File Transfer Operation
This menu is used to transfer files from a local directory into the Hadoop cluster. You can also transfer files by clicking its icon on the Toolbar.
c. Run Workflow Cluster
This menu is used to run a workflow on the Hadoop cluster. You can also run a workflow by clicking its icon on the Toolbar.
d. Convert Jar to Workflow
This menu is used to convert a jar file into a workflow project. After you choose this menu, a pop-up window for converting the jar to a workflow appears, as in the figure below :
Figure 16. Convert Jar To Workflow
e. MapReduce Jar Directory
This menu is used to set the directory of the generated MapReduce jar. After you choose this menu, a pop-up window for selecting the directory appears, as in the figure below :
Figure 17. MapReduce Jar Directory
4.2.2 Toolbar
Figure 18. Toolbar
The description of each toolbar icon, in toolbar order, is given below :
Description
To open a project from a local directory
To create a new project
To close the project
To set the properties of a workflow
To select the directory of the generated MapReduce jar
To generate a MapReduce jar from the workflow
To compile a package / java source
To save a workflow / java source
To remove a workflow
To copy a workflow
To create a new package
To create a new workflow
To rename a workflow
To create a new java class
To set the configuration of the HDFS cluster
To transfer files from a local directory into the Hadoop cluster
To run a workflow on the Hadoop cluster
Zoom in the display of the workflow
Zoom out the display of the workflow
Zoom to fit the display of the workflow
To show / hide the grid
Export the workflow to a Jpeg image
Delete a link / Source / Assembly
Show the description of an Assembly / Source / Sink
Apply changed meta data to the next Assembly
4.2.3 Upper Left Pane
This pane is used to display workflow and java class files. After you create a new project, the application displays the hierarchy shown in the figure below :
Figure 19. Upper Left Pane
Project Name (in the instance above, the project name is training)
ExternalLibrary, consists of the external libraries that are needed when compiling the MapReduce application. These libraries are not included in the application jar.
SourcePackage, consists of the packages in the project. In SourcePackage, the system automatically creates a parent package (the parent package's name is the same as the project name; in the instance above the name is training). The parent package consists of the packages below :
1. Aggregator, this package contains classes of custom aggregators.
2. Aggregator > Transformation, this package contains classes of custom aggregator transformators.
3. Buffer, this package contains classes of custom buffers.
4. Filter, this package contains classes of custom filters.
5. Function, this package contains classes of custom functions.
6. Function > Transformation, this package contains classes of custom function transformators.
Other packages can be created as required. You can also create a new Java Class, Executable Class, Workflow or Chaining Workflow by selecting the package in which you want to place it and then right-clicking, as in the figure below :
Figure 20. Create New Package / Class / Workflow
If you want to add a Library class, click ExternalLibrary in the hierarchy. Then right-click and select Add, or click menu Operation > Library > Add.
If you want to add an Aggregator class, click aggregator in the hierarchy. Then right-click and select New Aggregator, or click menu Operation > Aggregator > New Aggregator.
If you want to add a Transformator Aggregator class, click aggregator > transformation in the hierarchy. Then right-click and select New Transformator Aggregator, or click menu Operation > Transformator Aggregator > New Transformator Aggregator.
If you want to add a Buffer class, click buffer in the hierarchy. Then right-click and select New Buffer, or click menu Operation > Buffer > New Buffer.
If you want to add a Function class, click function in the hierarchy. Then right-click and select New Function, or click menu Operation > Function > New Function.
If you want to add a Transformator Function class, click function > transformation in the hierarchy. Then right-click and select New Transformator Function, or click menu Operation > Transformator Function > New Transformator Function.
4.2.4 Lower Left Pane
Figure 21. Lower Left Pane
This pane consists of the 3 tabs below :
1. Assembly Process Palette, this tab contains icons for stream processing
Figure 22. Assembly Process Palette
Name : Description
Pipe : This assembly is used for branching flow
Transformator : This assembly is used for processing individual rows of streams
Transformator2 : This assembly is used for transformation with splitting of accepted and rejected streams
GroupBy : This assembly is used for grouping / sorting / merging streams
CoGroup (Join) : This assembly is used for joining streams based on specific fields
Aggregator : This assembly is used for aggregating calculations (min, max, count, sum, average) of grouped streams
Filter : This assembly is used for filtering a stream
Split Filter : This assembly is used for applying a filter with two output streams
Combiner : This assembly is used for applying aggregation in a local mapper
Buffer : This assembly is used for processing a grouped stream
Function : This assembly is used for applying an existing function to a stream
CodeFunction : This assembly is used for applying a function that will be defined to a stream
Pre Join : This assembly is used for packaging values of a common group into one tuple
Post Join : This assembly is used for extracting values of a common group into tuples
Duplicate Check : This assembly is used for marking duplicate stream data
Pmml : This assembly is used for the pmml process
2. Source / Sink Palette, this tab contains icons for stream processing.
Figure 23. Source / Sink Palette
Name : Description
Custom Tap : This assembly is used for processing files of a custom format
HFS Tap : This assembly is used for processing Hadoop File System
CAT Tap : This assembly is used for converting a binary file to a text file
JDBC Tap : This assembly is used for joining streams based on specific fields
Hbase Tap : This assembly is used for reading / writing data from / into HBase DB
HFS Tap : This assembly is used as a source for processing Hadoop File System
HFS Tap : This assembly is used as a sink for processing Hadoop File System
XMLSource : This assembly is used for reading XML-formatted data
XMLSink : This assembly is used for writing XML-formatted data
Custom : This assembly is used for reading custom-formatted data
3. Console, this tab displays compilation information from the process of generating the jar file.
Figure 24. Console
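To make the GroupBy and Aggregator assemblies of the Assembly Process Palette more concrete, the sketch below shows, in plain Java, what grouping a stream by a key and then computing min, max, count, sum and average per group produces. It is only an illustration of the concept, under the assumption that each row is a {key, value} pair; the class GroupAggregateSketch is a hypothetical name, and the code does not use or represent HGrid247's generated MapReduce code.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustration only: group rows by their first field, then aggregate the
// second field, similar in spirit to GroupBy followed by Aggregator.
final class GroupAggregateSketch {

    // Each row is assumed to be {groupKey, integerValue}.
    static Map<String, IntSummaryStatistics> aggregate(String[][] rows) {
        return Arrays.stream(rows).collect(Collectors.groupingBy(
                (String[] row) -> row[0],
                TreeMap::new,
                Collectors.summarizingInt((String[] row) -> Integer.parseInt(row[1]))));
    }

    public static void main(String[] args) {
        String[][] rows = { {"A", "10"}, {"B", "5"}, {"A", "20"} };
        aggregate(rows).forEach((key, s) -> System.out.println(
                key + " min=" + s.getMin() + " max=" + s.getMax()
                        + " count=" + s.getCount() + " sum=" + s.getSum()
                        + " average=" + s.getAverage()));
    }
}
```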
4.2.5 Right Pane
This pane is used to display workflow and java class files. The bar at the bottom displays the property name and property value of each assembly, for instance as in the figure below :
Figure 25. Right Pane
4.3 Transformator
There are several types of transformator. When you select one of the transformator icons, the application displays the types of transformator, as in the figure below :
Figure 26. Transformator Type
1. math, this transformator is used for mathematical operations.
Transformator : Description
+ : To add two or more input fields. The result depends on the output type : if the type is String, the input fields are concatenated with + as the concatenation string
- : To subtract two or more input fields. The result depends on the output type : if the type is String, the input fields are concatenated with - as the concatenation string
x : To multiply two or more input fields. The result depends on the output type : if the type is String, the input fields are concatenated with * as the concatenation string
/ : To divide two or more input fields. The result depends on the output type : if the type is String, the input fields are concatenated with / as the concatenation string
/ (double) : To divide a field value by a constant double value (stated in parameter)
/ (integer) : To divide a field value by a constant integer value (stated in parameter)
x (double) : To multiply a field value by a constant double value (stated in parameter)
x (integer) : To multiply a field value by a constant integer value (stated in parameter)
Calculation : To evaluate an arithmetic operation
Ceil : To get the smallest long value that is greater than or equal to the argument
Floor : To get the largest long value that is less than or equal to the argument
^(double) : To calculate the power of a field value with a constant double value (stated in parameter)
Random : To generate a random number from 0 to an integer value (stated in parameter). If this transformator has an input field, the output is the value of the field concatenated with the result of the random generator.
Round : To get the closest long to the argument. The result is rounded to an integer by adding 1/2 and taking the floor of the result
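The difference between Ceil, Floor and Round is easiest to see on negative and half-way values. The sketch below uses standard java.lang.Math calls that match the descriptions above; it is an illustration, not HGrid247's implementation, and the class name RoundingSketch is hypothetical.

```java
// Illustration only: standard Java calls that match the descriptions of
// Ceil, Floor and Round. HGrid247's own implementation is not shown in
// this manual.
final class RoundingSketch {

    // Smallest long that is greater than or equal to x.
    static long ceil(double x) {
        return (long) Math.ceil(x);
    }

    // Largest long that is less than or equal to x.
    static long floor(double x) {
        return (long) Math.floor(x);
    }

    // "Adding 1/2 and taking the floor", as the Round entry describes.
    static long round(double x) {
        return (long) Math.floor(x + 0.5);
    }

    public static void main(String[] args) {
        double[] samples = { 2.1, 2.5, -2.1, -2.5 };
        for (double x : samples) {
            System.out.println(x + " : ceil=" + ceil(x)
                    + " floor=" + floor(x) + " round=" + round(x));
        }
    }
}
```

Note that with this rule half-way values round towards positive infinity, so 2.5 becomes 3 while -2.5 becomes -2.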
2. string, this transformator is used for string operations.
Transformator : Description
AddPostfix : To add a constant string at the end of the input field value
AddPrefix : To add a constant string in front of the input field value
AddPrefixPostfix : To add a constant string in front of the input field value and at the end of the field value
CharsAt : To get the chars specified by indexes
Concat : To concatenate two or more input fields with the concatenation string specified in parameter
Head : To get the head of the text, from the beginning until the index-th token. If the number of tokens in the input field is less than the specified index, the whole input is transferred to the output.
Parameter string in the form of : [token],[index]
token : the regex-token character (comma, if not specified)
index : the index of the token; if negative, the index is counted from right to left
For example, if the input value stream is : http://www.solusi247.com/index.php
Parameter Value    Result
/,3                http://www.solusi247.com
/,4                http://www.solusi247.com/index.php
/,-1               http://www.solusi247.com
Note : Due to the characteristics of the Java language, some token strings must be preceded by \\ . They are :
| becomes \\|
tab becomes \\t
^ becomes \\^
space becomes \\s
[ becomes \\[
] becomes \\]
? becomes \\?
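The Head examples above can be reproduced with a few lines of Java. The sketch below is an illustration under assumptions, not HGrid247's own code : the class HeadSketch and its method head are hypothetical names, and the semantics (split the input on the regex token, keep the first index tokens, count from the right when the index is negative) are inferred from the example table. It also shows why | has to be escaped : in Java source, the literal "\\|" is the regex \| , which matches a plain pipe character, whereas an unescaped | is the regex alternation operator.

```java
import java.util.Arrays;

// Illustration only: reproduces the Head examples from this manual.
final class HeadSketch {

    // tokenRegex is the regex used for splitting; separator is the plain
    // text used to join the kept tokens again (an assumption of this sketch).
    static String head(String input, String tokenRegex, String separator, int index) {
        // limit -1 keeps trailing empty tokens, so no token is lost
        String[] tokens = input.split(tokenRegex, -1);
        // a negative index is counted from right to left
        int count = index >= 0 ? index : tokens.length + index;
        if (count >= tokens.length) {
            return input; // fewer tokens than requested: pass input through
        }
        if (count < 0) {
            count = 0;
        }
        return String.join(separator, Arrays.copyOfRange(tokens, 0, count));
    }

    public static void main(String[] args) {
        String url = "http://www.solusi247.com/index.php";
        System.out.println(head(url, "/", "/", 3));
        System.out.println(head(url, "/", "/", 4));
        System.out.println(head(url, "/", "/", -1));
        // The pipe must be escaped because it is a regex metacharacter.
        System.out.println(head("a|b|c", "\\|", "|", 2));
    }
}
```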
HeadMidTail : To get the head, middle and tail of the text.
Parameter string in the form of : [token],[head-index],[tail-index]
token : the regex-token character
head-index : the index of the token at which the head of the text ends; if negative, the index is counted from right to left
tail-index : the index of the token at which the tail of the text begins; if negative, the index is counted from right to left
For example, if the input value stream is : http://www.solusi247.com/index.php
Parameter Value    Head     Middle                          Tail
/,1,3              http:    /www.solusi247.com              index.php
/,1,-1             http:    /www.solusi247.com              index.php
/,1,4              http:    /www.solusi247.com/index.php    null
Note : Due to the characteristics of the Java language, some token strings must be preceded by \\ . They are :
| becomes \\|
tab becomes \\t
^ becomes \\^
space becomes \\s
[ becomes \\[
] becomes \\]
? becomes \\?
Length : To get the length of the text (the number of chars in the text) of the input field
Lower : To convert the text into lower case
Lpad : To add zero or more characters at the beginning of the input field.
Parameter string in the form of : [Minimum Length],[char]
Minimum Length : the length of the expected output
char : the filling character
If the length of the input field is more than the specified minimum length, the input is transferred to the output without changes.
For example, if the input field value is : 666666
if the parameter is filled with 10,0 then the output is 0000666666
if the parameter is filled with 5,0 then the output is still 666666
Mid : To get the middle of the text, from the from-index until the to-index of token.
Parameter string in the form of : [token],[from-index],[to-index]
token : the regex-token character
from-index : the first index of token; if negative, this index is counted from right to left
to-index : the second index of token; if negative, this index is counted from right to left
For example, if the input value stream is : http://www.solusi247.com/index.php
Parameter Value    Result
/,1,3              /www.solusi247.com
/,1,-1             /www.solusi247.com
/,1,4              /www.solusi247.com/index.php
Note : Due to the characteristics of the Java language, some token strings must be preceded by \\ . They are :
| becomes \\|
tab becomes \\t
^ becomes \\^
space becomes \\s
[ becomes \\[
] becomes \\]
? becomes \\?
ReplaceText
To replace a part of the input text with another text.
Parameter string in the form of : [Target Text],[Replacement]
Target Text : the text that will be replaced
Replacement : the replacement text
Rpad
To add zero or more characters at the end of the input field.
Parameter string in the form of : [Minimum Length],[char]
Minimum Length : the length of the expected output
char : the filling character
If the input field is longer than the specified minimum length, the input is transferred to the output without changes.
For example, if the input field value is 666666 :
if the parameter is filled with 10,0 then the output is 6666660000
if the parameter is filled with 5,0 then the output is still 666666
SplitCounter
To get the number of parts obtained by splitting the string on the delimiter given in the parameter.
Parameter : delimiter
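The padding rule shared by Lpad and Rpad can be sketched in Python as below. The function names `lpad` and `rpad` are illustrative only, and the sketch assumes the filling character is exactly one character long.

```python
def lpad(text, param):
    """Illustrative sketch of Lpad; param is '[Minimum Length],[char]'."""
    min_length, fill = param.split(",", 1)
    # rjust leaves the text unchanged when it is already long enough.
    return text.rjust(int(min_length), fill)

def rpad(text, param):
    """Illustrative sketch of Rpad; param is '[Minimum Length],[char]'."""
    min_length, fill = param.split(",", 1)
    return text.ljust(int(min_length), fill)

print(lpad("666666", "10,0"))
print(rpad("666666", "10,0"))
print(lpad("666666", "5,0"))
```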
StringLocation
To get the location of a substring in a field. If the string to search for is not found in the field, this transformator returns a negative number.
Parameters :
String to Search : the substring to search for.
Start Position : the position in the field string where the search starts. This argument is optional. If omitted, it defaults to 0. The first position in the string is 0. If the start position is negative, the function counts back that number of characters from the end of the field, and then searches towards the beginning of the field.
nth appearance : the nth appearance of the string to search for. This is optional. If omitted, it defaults to 1.
Substring
To get a part of the input field.
Parameter string in the form of : [Begin Index],[Length]
Begin Index : the index of the first character to take from the input field. The first index is 0. If the specified begin index is greater than the length of the text, the result is empty text.
Length : the length of the text that will be transferred to the output. If the input field is shorter than the specified length, the text is taken up to the last character of the input field.
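A minimal Python sketch of the Substring rules above, assuming a non-negative begin index (the manual does not say how a negative one is treated). The function name and the sample value are hypothetical.

```python
def substring(text, param):
    """Illustrative sketch of Substring; param is '[Begin Index],[Length]'."""
    begin, length = (int(part) for part in param.split(","))
    # Slicing past the end of the text simply stops at the last character,
    # and a begin index beyond the text yields empty text.
    return text[begin:begin + length]

print(substring("6281234567", "2,3"))   # characters at positions 2, 3 and 4
print(substring("abc", "5,2"))          # begin index beyond the text
print(substring("abcdef", "4,10"))      # length longer than what is left
```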
Tail
To get the tail of the text, from the index-th token until the end of the text. If the number of tokens in the input field is less than the specified index, the result is empty text.
Parameter string in the form of : [token],[index]
token : the regex token character
index : the index of the token; if negative, the index is counted from right to left
For example, if the input value is http://www.solusi247.com/index.php :
Parameter Value    Result
/,3                index.php
/,2                www.solusi247.com/index.php
/,-1               index.php
Note : Due to the characteristics of the Java language, some token strings must be preceded by \\ : | becomes \\|, tab becomes \\t, ^ becomes \\^, space becomes \\s, [ becomes \\[, ] becomes \\], ? becomes \\?
Trim
To omit the leading and trailing whitespace of the input field.
Upper
To convert the text to upper case.
3. field, this transformator is used for field operations.
Transformator
Description
ConstantValue
To add a constant value (specified in the parameter) to the output. No input field is needed.
FieldJoiner
To join two or more fields into one field. The join character is specified in the delimiter parameter.
FieldInput
To add the file name of the input stream to the output. No input field is needed.
Identity
To pass the input field through to the output field.
NVL
To supply a value if the input field is null or empty text.
RecordCounter
To add a record counter to the output. It is applicable only if the input stream is in binary format.
RegexSplitter
To split a field value into two or more fields. The splitting is based on the delimiter string specified in the parameter.
Due to the characteristics of the Java language, some delimiter strings must be preceded by \\ : | becomes \\|, tab becomes \\t, ^ becomes \\^, space becomes \\s, [ becomes \\[, ] becomes \\]
Note : if the number of parts produced by the split is more or less than the number of fields specified in the output record, the input value is treated as error data and removed from the stream.
For example : Double click on RegexSplitter, then change the delimiter string to ‘/’, ‘\’ or ‘\\|’ as in the figure below, then insert sample data to ensure the delimiter string is correct:
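The field-count rule in the note above can be illustrated with a short Python sketch. It is not the application's code: the function name `regex_split` is hypothetical, and returning `None` stands in for "treated as error data and removed from the stream".

```python
import re

def regex_split(value, delimiter, field_count):
    """Split value on a regex delimiter; reject it if the part count is wrong."""
    parts = re.split(delimiter, value)
    if len(parts) != field_count:
        return None          # stands in for "removed from the stream"
    return parts

# In Python the pipe is escaped as r"\|"; the manual's \\| is the form
# typed into the application.
print(regex_split("a|b|c", r"\|", 3))
print(regex_split("a|b", r"\|", 3))
```

Testing a delimiter against a few sample rows in this way mirrors the "insert sample data" check described in the example.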
4. lookup, this transformator is used for lookup operations against a reference file.
Transformator
Description
DataFromFileRef
This transformator is used to get data from a reference file.
Reference
To look up data in the reference file using one reference field. This is applicable if the reference data has fewer than 100000 records. If the reference data has more than 100000 records, it is recommended to use a join.
The param parameter string is in the form of [file-ref],[index-of-key-ref],[delimiter],[sub-range-of-input],[output-indexes]
file-ref : the name of the reference file
index-of-key-ref : the index of the part (field) of the reference file that will be joined with the input value; default is 0
delimiter : the delimiter that splits the rows of the reference file into parts (fields); default is ‘,’
sub-range-of-input : the begin and last index for the substring of the input value that will be joined with the index-of-key-ref part of the reference file. For example ‘1-8’ -> substring(1,8) is applied to the input and the result is joined with the reference file. If it is not specified, the whole input field value is joined with the reference field.
output-indexes : the indexes of the parts of the reference file that will be transferred into the output stream. For example ‘1-2-4’ -> parts 1, 2 and 4 (except the index-key) of the reference file are transferred into the output stream. If it is not specified, the output is all parts of the reference file that are not the index-key.
For example : Double click on Reference, then change the delimiter string to ‘/’, ‘\’ or ‘\\|’ as in the figure below, then insert sample data to ensure the delimiter string is correct:
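How the Reference parameters interact can be sketched in Python as follows. This is a sketch under stated assumptions, not the tool's implementation: the function name, the sample rows and the `None` result for a missing key are all hypothetical, and the reference file is assumed to be already read and split on its delimiter.

```python
def reference_lookup(value, rows, key_index=0, sub_range=None, output_indexes=None):
    """Look up value in rows (reference file rows already split into parts)."""
    if sub_range:
        # '1-8' means substring(1, 8): begin inclusive, end exclusive.
        begin, end = (int(part) for part in sub_range.split("-"))
        value = value[begin:end]
    for row in rows:
        if row[key_index] == value:
            if output_indexes:
                wanted = [int(i) for i in output_indexes.split("-")]
            else:
                wanted = range(len(row))
            # The key part itself is never copied to the output.
            return [row[i] for i in wanted if i != key_index]
    return None  # behaviour for a missing key is an assumption

# Hypothetical reference rows.
rows = [["628", "P01", "prepaid"], ["629", "P02", "postpaid"]]
print(reference_lookup("628", rows))
print(reference_lookup("00629123", rows, sub_range="2-5", output_indexes="2"))
```

The linear scan reflects the manual's advice that this kind of lookup is intended for small reference files, with a join preferred for larger ones.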
ReferenceRange
To look up data in the reference range file. This is applicable if the reference data has fewer than 100000 records. If the reference data has more than 100000 records, it is recommended to use a join.
The param parameter string consists of :
file-ref : the name of the reference file
delimiter : the delimiter that splits the rows of the reference file into parts (fields); default is ‘,’
start index : index of the start range of the input value
end index : index of the end range of the input value
value index : index of the value of the input value
For example : Double click on ReferenceRange, then change the delimiter string to ‘/’, ‘\’ or ‘\\|’ as in the figure below, then insert sample data to ensure the delimiter string is correct:
References
To look up data in the reference file using more than one reference field. This is applicable if the reference data has fewer than 100000 records. If the reference data has more than 100000 records, it is recommended to use a join.
The param parameter string is in the form of [file-ref],[indexes-of-key-ref],[delimiter-of-file-ref],[output-indexes]
file-ref : the name of the reference file (including the path)
indexes-of-key-ref : the indexes of the parts (fields) of the reference file that will be joined with the input value; default is 0. For example ‘0-1’ means that the compound of the first part and the second part is the key for joining the input(s) with the reference file.
delimiter-of-file-ref : the delimiter that splits the rows of the reference file into parts (fields); default value is ‘,’
output-indexes : the indexes of the parts of the reference file that will be transferred into the output stream. For example ‘1-2-4’ -> parts 1, 2 and 4 (except the index-key) of the reference file are transferred into the output stream. If it is not specified, the output is all parts of the reference file that are not the index-key.
For example : Double click on References, then change the delimiter string to ‘/’, ‘\’ or ‘\\|’ as in the figure below, then insert sample data to ensure the delimiter string is correct:
MultiValueRef
To process a dynamic reference if-then-else statement. This is applicable if the reference data has fewer than 100000 records. If the reference data has more than 100000 records, it is recommended to use a join.
The parameter string is in the form of [file-ref],[field-index-ref],[field-index-value],[delimiter-between-fields],[delimiter-ref-data]
file-ref : the name of the reference file
field-index-ref : the index of the part (field) of the reference file that will be joined with the input value; default is 0. For example ‘1’ means that the second part is the key for joining the input(s) with the reference file.
field-index-value : the index of the field that will be used as the return value
delimiter-between-fields : the delimiter that splits the rows of the reference file into parts (field ref and field value); default value is ‘;’
delimiter-ref-data : the token used to delimit the values of the reference; one reference value may be split into parts that are associated with the return value.
For example, the content of the reference file data.txt :
if the parameter is filled with : data.txt,1,0,\\|,;
the transformator is equivalent to the statements :
For example : Double click on MultiValueRef, then change the delimiter string to ‘/’, ‘\’ or ‘\\|’ as in the figure below, then insert sample data to ensure the delimiter string is correct:
MultiValueRef
To get a reference when the length of the reference key is not fixed.
Parameter string in the form of : [file-reference],[field-delimiter],[ref-field-index],[output-field-indexes],[lookup-order]
file-reference : the name of the reference file (including the path)
field-delimiter : the delimiter of the fields in the reference file. If the delimiter is not specified, the default delimiter is , .
ref-field-index : the index of the field in the reference file that will be used as the reference key. If it is not specified, it defaults to 0.
output-field-indexes : the index(es) of the fields in the reference file that will be used as output. If there are multiple indexes, they must be separated by
lookup-order : the order of the lookup process : 0 means lookup from the longest to the shortest reference field, 1 means lookup from the shortest to the longest reference field. Default value is 0.
For example, the contents of the reference file data.txt are as follows :
If the parameter is filled with data.txt,\\|,0,1,0
a stream with input abc123 will get output data2
a stream with input abc will also get output data2
If the parameter is filled with data.txt,\\|,0,1,1
a stream with input abc123 will get output data1
a stream with input abc will also get output data1
For example : Double click on MultiValueRef, then change the delimiter string to ‘/’, ‘\’ or ‘\\|’ as in the figure below, then insert sample data to ensure the delimiter string is correct:
ResolveRange
To get a reference when the length of the reference key is not fixed.
Parameter string in the form of : [file-reference],[field-delimiter],[ref-field-index],[output-field-indexes],[lengths-of-ref-field]
file-reference : the name of the reference file (including the path)
field-delimiter : the delimiter of the fields in the reference file. If the delimiter is not specified, the default delimiter is , .
ref-field-index : the index of the field in the reference file that will be used as the reference key. If it is not specified, it defaults to 0.
output-field-indexes : the index(es) of the fields in the reference file that will be used as output. If there are multiple indexes, they must be separated by
lengths-of-ref-field : the possible lengths of the reference field. For example 8-7-6 means the reference field length is 8, 7 or 6.
For example, the contents of the reference file data.txt are as follows :
If the parameter is filled with data.txt,\\|,0,1,4-3
input abc123 will get output data2
input abc will also get output data2
For example : Double click on MultiValueRef, then change the delimiter string to ‘/’, ‘\’ or ‘\\|’ as in the figure below, then insert sample data to ensure the delimiter string is correct:
5. conversion, this transformator is used for conversion operations
Transformator
Description
BinToDec
To convert binary number format to decimal number format. If the length of the result is less than the minimum length specified in the parameter, the result is padded on the left with 0’s.
BinToHexa
To convert binary number format to hexa number format. If the length of the result is less than the minimum length specified in the parameter, the result is padded on the left with 0’s.
DecToBin
To convert decimal number format to binary number format. If the length of the result is less than the minimum length specified in the parameter, the result is padded on the left with 0’s.
DecToHexa
To convert decimal number format to hexa number format. If the length of the result is less than the minimum length specified in the parameter, the result is padded on the left with 0’s.
DecToOcta
To convert decimal number format to octa number format. If the length of the result is less than the minimum length specified in the parameter, the result is padded on the left with 0’s.
DecToString
To convert a decimal number into a string with a particular format. Sample of format String :
HexaToBin
To convert hexa number format to binary number format. If the length of the result is less than the minimum length specified in the parameter, the result is padded on the left with 0’s.
HexaToDec
To convert hexa number format to decimal number format. If the length of the result is less than the minimum length specified in the parameter, the result is padded on the left with 0’s.
OctaToDec
To convert octa number format to decimal number format. If the length of the result is less than the minimum length specified in the parameter, the result is padded on the left with 0’s.
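The base conversions above share one pattern: parse the input in the source base, render it in the target base, then pad on the left with zeros up to the minimum length. A minimal Python sketch, assuming non-negative numbers and upper-case hexa digits (the manual does not state the letter case); the function name `convert` is hypothetical:

```python
def convert(value, from_base, to_base, min_length=0):
    """Convert a number string between bases 2, 8, 10 and 16, zero-padded."""
    format_codes = {2: "b", 8: "o", 10: "d", 16: "X"}
    number = int(value, from_base)
    result = format(number, format_codes[to_base])
    # zfill leaves the result unchanged when it is already long enough.
    return result.zfill(min_length)

print(convert("1010", 2, 10, 4))    # BinToDec with minimum length 4
print(convert("255", 10, 16))       # DecToHexa without padding
print(convert("FF", 16, 2, 10))     # HexaToBin with minimum length 10
```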
6. date, this transformator is used for timestamp operations
Transformator
Description
AddDay
To add or subtract the specified number of days to or from the timestamp input field. A negative number of days means subtraction, otherwise addition. The value can be supplied either by entering the number of days manually or by choosing the input from another field (line).
AddMonth
To add or subtract the specified number of months to or from the timestamp input field. A negative number of months means subtraction, otherwise addition. The value can be supplied either by entering the number of months manually or by choosing the input from another field (line).
Adds
To add or subtract the specified numbers of days, months and years to or from the timestamp input field. A negative number means subtraction, otherwise addition. The values can be supplied either by entering them manually or by choosing the input from another field (line).
AddYear
To add or subtract the specified number of years to or from the timestamp input field. A negative number of years means subtraction, otherwise addition. The value can be supplied either by entering the number of years manually or by choosing the input from another field (line).
CurrentTimestamp
To insert the current Timestamp into the output stream. No input field is needed.
CurrentTimestampString
To insert the current Timestamp, formatted as a String, into the output stream. No input field is needed. The following format letters are defined :
Diff
To calculate the difference between Timestamp-1 and Timestamp-2. The result is a number that represents the number of days. If Timestamp-1 is older than Timestamp-2 the result is a negative number, otherwise a positive number.
GetDay
To get the day of the month of the Timestamp. The result is a number between 1 and 31.
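The sign conventions of AddDay and Diff can be checked with a small Python sketch. The function names are illustrative, and the sketch assumes timestamps that fall on whole days; how the application rounds partial days is not stated in this manual.

```python
from datetime import datetime, timedelta

def add_day(timestamp, days):
    """Negative days subtract, positive days add."""
    return timestamp + timedelta(days=days)

def diff(timestamp_1, timestamp_2):
    """Days from timestamp_2 to timestamp_1; negative if timestamp_1 is older."""
    return (timestamp_1 - timestamp_2).days

print(add_day(datetime(2024, 1, 31), 1))
print(add_day(datetime(2024, 3, 1), -1))
print(diff(datetime(2024, 1, 1), datetime(2024, 1, 11)))
```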
GetDayofWeek
To get the day-of-week of the Timestamp input field. The result is a number that represents the day of the week. Here is the table of results and their representation
GetDayofYear
To get the day-of-year from the given Timestamp input field. The result is a number between 1 and 365.
GetMonth
To get the month of the Timestamp input field. The result is a number that represents a month of the year. Here is the table of results and their representation
GetYear
To get the year portion of the Timestamp input field. The result is a number that represents a year.
StringToTimestamp
To parse a String to a Timestamp. The string being parsed must match the Format string. For valid format strings see the CurrentTimestampString description.
TimestampToString
To format a Timestamp as a String, according to the Format string. For valid format strings see the CurrentTimestampString description.
7. special-purpose, this transformator is used for special-purpose operations
Transformator
Description
DaysInMonthRef
This transformator is used to get the number of days of a given month and year reference. The parameters needed are the reference file name and the format string.
IfElse
This transformator is used to select a value based on if-else statements. The structure of the if-else statement is as follows :
if (condition1) return value-1
else if (condition2) return value-2
....
else return value-n
The keyword if must be followed by (, a condition, and ). A condition contains relational operators, which are :
A condition may contain logical operators, which are :
Conditions and return values can contain constant values or values from fields of the input stream. A value from an input stream field is marked with ‘[FIELD-NAME]’. The value from the input stream field is read as String data.
DataFromArgument
This transformator is used to get data from a command-line argument.
MccMncSacGit
SwitchCase
To select a value based on if-else statements. The conditions and their return values are stored in a Condition-List. The Condition-List is evaluated from the first element to the last element. If all elements in the Condition-List evaluate to false, the default return value is transferred to the output. The return value may be a constant, a field operation, or a calculation operation. The structure of the if-else statement is as follows :
ParseXML
This transformator is used to parse XML-formatted data.
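The first-match evaluation of the SwitchCase Condition-List can be sketched in Python as below. This only illustrates the evaluation order and the default value; the function name, the record layout and the field names F1 and F2 are hypothetical, and the application's own condition syntax is not reproduced here.

```python
def switch_case(record, condition_list, default):
    """Return the value of the first condition that holds, else the default."""
    for condition, value in condition_list:
        if condition(record):
            # A return value may be a constant or computed from the record.
            return value(record) if callable(value) else value
    return default(record) if callable(default) else default

conditions = [
    (lambda r: r["F2"] > 10, "high"),
    (lambda r: r["F2"] > 2, lambda r: r["F1"] + "-mid"),
]
print(switch_case({"F1": "A", "F2": 50}, conditions, "low"))
print(switch_case({"F1": "A", "F2": 5}, conditions, "low"))
print(switch_case({"F1": "A", "F2": 1}, conditions, "low"))
```

Because evaluation stops at the first true condition, the order of the Condition-List matters: here a value of 50 yields "high" even though it also satisfies the second condition.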
8. transpose, this transformator is used for transpose operations.
Transformator
Description
CRTranspose
This transformator is used to transpose columns into rows.
Example: If the data are as follows :
After CRTranspose the data will be
The following is an example in the HGrid application. Sample data before CRTranspose
SplitTranspose
This transformator is used to transpose a column into rows by splitting the column value. The splitting is based on the delimiter string specified in the parameter.
Due to the characteristics of the Java language, some delimiter strings must be preceded by \\ : | becomes \\|, tab becomes \\t, ^ becomes \\^, space becomes \\s, [ becomes \\[, ] becomes \\]
Example: If the data are as follows
After SplitTranspose the data will be
The following is an example in the HGrid application. Sample data before SplitTranspose
9. custom, this transformator is used for custom definitions.
4.4 Combiner
There are several types of Combiner. By selecting the transformator icon ( ), the application will display the types of combiner, as in the figure below :
Figure 27. Combiner Type
1. Grouping Field, this Combiner is used for grouping fields.
Figure 28. Grouping Field
2. Map Aggregator, this Combiner is used for aggregation.
Transformator
Description
Count
To get the count of records in each group. For example :
From the grouped result above, the Count value is : A = 5, B = 4
Max
To get the maximum value of the grouped records.
Min
To get the minimum value of the grouped records.
Sum
To get the sum of the values of the grouped records. For example :
From the grouped result above, the Sum value is : A = 9, B = 13
ConditionCount
To get the count of records in each group that satisfy a particular condition. For example :
If ConditionCount (F2 > 2) then A = 1, B = 2
ConditionMax
To get the maximum value of the grouped records that satisfy a particular condition.
ConditionMin
To get the minimum value of the grouped records that satisfy a particular condition.
ConditionSum
To get the sum of the values of the grouped records that satisfy a particular condition. For example :
If ConditionSum (F2 > 2) then A = 3, B = 10
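The relationship between Count, Sum and their conditional variants can be illustrated with a Python sketch. The sample records below are hypothetical: they were chosen only so that the totals match the results quoted above (Count A = 5, B = 4; Sum A = 9, B = 13; with F2 > 2, count A = 1, B = 2 and sum A = 3, B = 10), and they are not the data from the original figure. The function name is also an assumption.

```python
def count_and_sum(records, condition=None):
    """Per group, return (count, sum) of the values that pass the condition."""
    result = {}
    for group, value in records:
        count, total = result.setdefault(group, (0, 0))
        if condition is None or condition(value):
            result[group] = (count + 1, total + value)
    return result

# Hypothetical (F1, F2) records, grouped on F1.
records = [("A", 1), ("A", 2), ("A", 2), ("A", 1), ("A", 3),
           ("B", 1), ("B", 2), ("B", 4), ("B", 6)]

print(count_and_sum(records))                      # Count and Sum
print(count_and_sum(records, lambda f2: f2 > 2))   # ConditionCount and ConditionSum
```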
4.5 Aggregator
By selecting the aggregator icon ( ), the application will display the types of aggregator, as in the figure below :
Figure 29. Aggregator Type
1. Basic, this transformator contains the basic types of aggregator
2. ConditionAggregator, this transformator contains the types of aggregator with a particular condition
Transformator
Description
ConditionAverage
To get the average value of the grouped records that satisfy a particular condition.
ConditionCount
To get the count of records in each group that satisfy a particular condition. For example :
If ConditionCount (F2 > 2) then A = 1, B = 2
ConditionMax
To get the maximum value of the grouped records that satisfy a particular condition.
ConditionMin
To get the minimum value of the grouped records that satisfy a particular condition.
ConditionSum
To get the sum of the values of the grouped records that satisfy a particular condition. For example :
If ConditionSum (F2 > 2) then A = 3, B = 10
3. custom, this transformator is used for custom definitions.
4.6 Workflow on Hadoop
To run an Hgrid247 workflow on Hadoop, use this command :
hadoop jar <jar-file> <class-Name> <input-list> <output-list>
where :
hadoop jar : command to run a jar file on Hadoop
<jar-file> : name of the jar file that will be executed
<class-Name> : name of the class that will be executed, including the name of its package
<input-list> : list of inputs; it can contain files or directories. The inputs are separated by spaces.
<output-list> : list of outputs; it contains directories. The outputs are separated by spaces.
In each output folder, files named hgrid247-00000 to hgrid247_(N-1) will be created, where N is the number of mappers or reducers. These files are the output of the mappers or reducers.
For example :
hadoop jar training.jar training.mypackage.reconcile voms vomsused output
hadoop jar : command to run a jar file on Hadoop
training.jar : name of the jar file that will be executed
training.mypackage.reconcile : name of the class that will be executed; the name of its package is training.mypackage
voms vomsused : list of inputs; each can be a file or a directory
output : list of outputs; it contains the directory that holds the output files from the mappers / reducers
5. Making Project
This chapter describes step by step how to make a new project in the Hgrid247 application.
5.1 Making New Project
Step by Step :
1. To make a new project, choose menu File > New Project.
2. The application will display a dialog box to save the new project in a local directory, as in the figure below :
Figure 30. Dialog Box to Make New Project
3. Select the directory, then fill in the project name.
4. Click button to save the project.
5. The application will display the new project in the hierarchy, as displayed in the figure below :
Figure 31. New Project Created
5.2 Making New Package
Step by Step :
1. Create a new package by selecting the parent package, then right-click.
Figure 32. Click on Parent Package
2. Select New Package; a dialog box will appear, as in the figure below :
Figure 33. Dialog Box to Create New Package
3. Fill in the Package Name, click button, then click button.
4. The application will display the new package in the hierarchy, as displayed in the figure below :
Figure 34. New Package Created
5.3 Making New Workflow
When adding a workflow, it is better not to place it in the standard package above. For instance, we will place the workflow in the package mypackage.
Step by Step :
1. Create a new workflow by selecting the package (mypackage), then right-click.
2. Select New Workflow; a dialog box will appear, as in the figure below :
Figure 35. Dialog Box to Create New Workflow
3. Fill in the workflow name, click button, then click button.
4. The application will display the new workflow in the hierarchy.
Figure 36. New Workflow Created
In this project, we will use the input data below :
a. Voms.txt
This file contains information :
The delimiter is the “/” character, as in the figure below :
Figure 37. Data Voms
b. VomsUsed.txt
This file contains information :
The delimiter is the “|” character, as in the figure below :
Figure 38. Data VomsUsed
c. Product.txt
This file contains information : MSISDN_CODE, PRODUCT_CODE, PRODUCT_TYPE
Next we will use Hgrid247 to reconcile these data. This workflow includes joining, combining and aggregation processes. The name of the workflow is ‘reconcile’.
5.3.1 Reading Text Files and Applying Parsing Process
Step by Step :
1. Drag and drop an HFS Tap ( ) onto the workflow ‘reconcile’.
2. Change the label of the HFS Tap by right-clicking the HFS Tap and choosing Edit Label.
3. Fill in the name Voms-Input.
4. Drag and drop a Transformator ( ) onto the workflow.
5. Change the label of the Transformator to ‘parsing voms’.
6. Drag a link from the HFS Tap to the Transformator, as in the figure below :
Figure 39. Drag link HFS Voms-Input into Transformator
7. Add a transformation to the node parsing voms by double-clicking parsing voms; the dialog box will be displayed as in the figure below :
Figure 40. Dialog Box of Transformation
8. Right-click ‘Output Record’, then select ‘Add Output Field(s)’ from the drop-down menu; the dialog box will be displayed as in the figure below :
Figure 41. Dialog Box of Input Fields
9. Fill in the fields below :
10. Click the OK button; the form will be displayed as in the figure below :
Figure 42. Add Voms Fields
11. In the transformator list, select RegexSplitter (in the ‘field’ transformator), then drag and drop RegexSplitter onto the form.
12. Double-click RegexSplitter, then change the delimiter string to ‘/’ as in the figure below, then insert sample data to ensure the delimiter string is correct:
Figure 43. Change Delimiter String
13. Drag a link from Input Record to RegexSplitter(/), then drag links from RegexSplitter(/) to all fields in Output Record.
Note !
You can immediately make links to all output fields by right-clicking the transformator and choosing ‘Link to All Output Fields’.
14. Click button; the result is displayed in the figure below :
Figure 44. Parsing Voms Fields
15. Click button.
16. Drag and drop an HFS Tap ( ) onto the workflow.
17. Change the label of the HFS Tap by right-clicking the HFS Tap and choosing Edit Label, then fill in the name VomsUsed-Input.
18. Drag and drop a Transformator ( ) onto the workflow.
19. Change the label of the Transformator to ‘parsing vomsused’.
20. Drag a link from the HFS Tap to the Transformator, as in the figure below :
Figure 45. Drag link HFS VomsUsed-Input into Transformator
21. Add a transformation to the node parsing vomsused by double-clicking parsing vomsused.
22. On the transformation form, right-click ‘Output Record’, then select ‘Add Output Field(s)’ from the drop-down menu.
23. Fill in the fields below :
24. Click the OK button; the form will be displayed as in the figure below :
Figure 46. Add VomsUsed Fields
25. In the transformator list, select RegexSplitter, then drag and drop RegexSplitter onto the form.
26. Drag a link from Input Record to RegexSplitter(\\|), then drag links from RegexSplitter(\\|) to all fields in Output Record.
27. Click button; the result is displayed in the figure below :
Figure 47. Parsing VomsUsed Fields
5.3.2 Select the Fields
1. Drag and drop a Transformator ( ), then change the label of the Transformator to ‘select voms’.
2. Drag a link from ‘parsing voms’ to ‘select voms’, as in the figure below :
Figure 48. Drag link Parsing Voms into Select Voms
3. Add a transformation to the node select voms by double-clicking select voms.
4. On the transformation form, fill the output record with the fields below :
5. Drag the links from the Input record to the output record :
NUM_SERIE --> VOMS_SERIE
DATE_USED --> VOMS_DATE
VOMS_STAT --> VOMS_STATUS
MSISDN --> VOMS_MSISDN
DENOMINASI --> VOMS_DENOMINASI
FILENAME --> VOMS_FILENAME
Figure 49. Drag the link
6. In the transformator list, select Substring (in the ‘string’ transformator), then drag and drop Substring onto the form.
7. Drag a link from the MSISDN field in the input record to the transformator ‘Substring’.
8. Double-click Substring, then fill in the values below :
- Begin Index = 2
- Length = 3
Figure 50. Set Transformator Substring
9. In the transformator list, select Reference (in the ‘lookup’ transformator), then drag and drop Reference onto the form.
10. Drag a link from the transformator ‘Substring’ to the transformator ‘Reference’.
11. Double-click Reference, then fill in the values below :
- File Reference = voms_ref/product.txt (where product.txt is the reference file and voms_ref is the folder where product.txt is stored)
- Field Delimiter = \\|
- Index of Field Ref. = 0
- Indexes of Output Fields = 2
Figure 51. Set Transformator Reference
12. Drag a link from the transformator ‘Reference’ to ‘VOMS_PRODUCT’ in the Output Record. The result is displayed in the figure below :
Figure 52. Select Voms Fields
13. Click button.
14. Drag and drop a Transformator ( ), then change the label of the Transformator to ‘select vomsused’.
Figure 53. Drag link Parsing Vomsused into Select Vomsused
15. Add a transformation to the node select vomsused by double-clicking select vomsused.
16. On the transformation form, fill the output record with the fields below :
17. Drag the links from the Input record to the output record :
MSISDN --> VU_MSISDN
DENOMINASI --> VU_DENOMINASI
DATE_USED --> VU_DATE_USED
NUM_SERIE --> VU_NUM_SERIE
FILENAME --> VU_FILENAME
Figure 54. Drag the links
18. In the transformator list, select Substring (in the ‘string’ transformator), then drag and drop Substring onto the form.
19. Drag a link from the MSISDN field in the input record to the transformator ‘Substring’.
20. Double-click Substring, then fill in the values below :
- Begin Index = 2
- Length = 3
21. In the transformator list, select Reference (in the ‘lookup’ transformator), then drag and drop Reference onto the form.
22. Drag a link from the transformator ‘Substring’ to the transformator ‘Reference’.
23. Double-click Reference, then fill in the values below :
- File Reference = voms_ref/product.txt
- Field Delimiter = \\|
- Index of Field Ref. = 0
- Indexes of Output Fields = 2
24. Drag a link from the transformator ‘Reference’ to ‘VU_PRODUCT’ in the Output Record. The result is displayed in the figure below :
Figure 55. Select Vomsused Fields
5.3.3 Outer Join the Fields 1. Drag and Drop CoGroup( ) , then change the label of Transformator with name ‘outer join on no-serie’. 2. Drag link from ‘select voms’ into ‘outer join on no-serie’.
UserManual | HGrid247
62
3. Drag link from ‘select vomsused’ into ‘outer join on no-serie’. as figure below :
Figure 56. Drag link into Outer Join on No-Serie
4. Double click on ‘outer join on no-serie’, then dialog box will be displayed as figure below :
Figure 57. Dialog Box for Join Process
5. Select ‘Outer Join’ in the Joiner field.
6. In Left Group, click and drag the VOMS_SERIE field, then drop it on the VU_NUM_SERIE field in Right Group. The result is shown in the figure below:
Figure 58. Select Outer Join Fields
7. Click button.
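The effect of joining the two groups on VOMS_SERIE and VU_NUM_SERIE can be sketched in Python as follows. The sketch assumes that ‘Outer Join’ means a full outer join, so that rows found on only one side are kept and the fields of the other side are left empty. The function name and the sample rows are made up for illustration.

```python
# Illustrative sketch only: assumes 'Outer Join' is a full outer join.
# The sample rows are invented and are not taken from the training data.

def full_outer_join(left_rows, right_rows, left_key, right_key):
    """Join two lists of dict rows, keeping unmatched rows from both sides."""
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[right_key], []).append(row)

    left_fields = sorted({name for row in left_rows for name in row})
    right_fields = sorted({name for row in right_rows for name in row})

    joined = []
    matched_keys = set()
    for left in left_rows:
        partners = right_by_key.get(left[left_key], [])
        if partners:
            matched_keys.add(left[left_key])
            for right in partners:
                joined.append({**left, **right})
        else:
            # No partner on the right: right-hand fields stay empty (None).
            joined.append({**left, **dict.fromkeys(right_fields)})

    for key, rows in right_by_key.items():
        if key not in matched_keys:
            for right in rows:
                # No partner on the left: left-hand fields stay empty (None).
                joined.append({**dict.fromkeys(left_fields), **right})
    return joined


voms = [
    {"VOMS_SERIE": "A1", "VOMS_STATUS": "ACTIVE"},
    {"VOMS_SERIE": "A2", "VOMS_STATUS": "SOLD"},
]
vomsused = [
    {"VU_NUM_SERIE": "A2", "VU_MSISDN": "628120000001"},
    {"VU_NUM_SERIE": "A3", "VU_MSISDN": "628130000002"},
]
rows = full_outer_join(voms, vomsused, "VOMS_SERIE", "VU_NUM_SERIE")
```

The empty fields produced for unmatched rows are the reason the next section applies NVL and SwitchCase to the join result.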
5.3.4 Select Data in Join Result
1. Drag and drop a Transformator ( ), then change its label to ‘select data’.
2. Drag the link from ‘outer join on no-serie’ to the ‘select data’ transformator.
Figure 59. Make Transformator ‘Select Data’
3. On the transformation form, right-click the ‘Output Record’ table, then choose ‘Add Output Field(s)’.
4. Add the output fields listed below. The result is shown in the figure below:
Figure 60. Add the fields
5. Drag the links from the input record to the output record:
    VOMS_SERIE --> VOMS_SERIE
    VOMS_DATE --> VOMS_DATE
    VOMS_STATUS --> VOMS_STATUS
    VOMS_MSISDN --> VOMS_MSISDN
    VOMS_PRODUCT --> VOMS_PRODUCT
    VU_MSISDN --> VU_MSISDN
    VU_DENOMINASI --> VU_DENOMINASI
    VU_DATE_USED --> VU_DATE_USED
    VU_NUM_SERIE --> VU_NUM_SERIE
    VU_PRODUCT --> VU_PRODUCT
6. In the transformator list, select NVL (under the ‘field’ transformators), then drop it onto the form.
7. Drag the link from VOMS_FILENAME in the input record to the NVL transformator, then drag the link from the NVL output to VOMS_FILENAME in the output record.
8. Drag the link from VU_FILENAME in the input record to the NVL transformator, then drag the link from the NVL output to VU_FILENAME in the output record.
9. In the transformator list, select SwitchCase (under the ‘special-purpose’ transformators), then drop it onto the form. Do this four times. The result is shown in the figure below:
Figure 61. Drag the links
10. Drag the link from SwitchCase to the Status field in the output record.
11. Double-click SwitchCase (for Status). A dialog box is displayed, as in the figure below:
Figure 62. Dialog Box for SwitchCase – Status (1)
12. Click button to add the condition.
13. Fill in the values below:
Figure 63. Dialog Box for SwitchCase - Status (2)
14. Fill the Constant Value field with ‘1’.
15. Click button.
16. Click button.
Figure 64. Dialog Box for SwitchCase - Status (3)
17. Fill the Constant Value field with ‘2’.
18. Click button.
Figure 65. Dialog Box for SwitchCase - Status (4)
19. Fill the Constant Value of ‘Default Return Value’ with ‘3’.
20. Click button.
21. Drag the link from SwitchCase to the Date field in the output record.
22. Double-click SwitchCase (for Date). A dialog box is displayed, as in the figure below:
Figure 66. Dialog Box for SwitchCase - Date (1)
23. Click button to add the condition.
24. Fill in the values below:
Figure 67. Dialog Box for SwitchCase - Date (2)
25. Select ‘Field’ in Return Expression Field.
26. Select ‘VOMS_DATE’ in Field.
27. Click button.
28. Click button, then fill in the values below:
Figure 68. Dialog Box for SwitchCase – Date (3)
29. Select ‘Field’ in Return Expression Field.
30. Select ‘VU_DATE_USED’ in Field.
31. Click button.
Figure 69. Dialog Box for SwitchCase – Date (4)
32. Select ‘Field’ in Return Expression Type of ‘Default Return Value’.
33. Select ‘VOMS_DATE’ in Field.
34. Click button.
35. Drag the link from SwitchCase to the Denom field in the output record.
36. Double-click SwitchCase (for Denom). A dialog box is displayed, as in the figure below:
Figure 70. Dialog Box for SwitchCase - Denom (1)
37. Click button to add the condition.
38. Fill in the values below:
Figure 71. Dialog Box for SwitchCase - Denom (2)
39. Select ‘Field’ in Return Expression Field.
40. Select ‘VOMS_DENOMINASI’ in Field.
41. Click button.
42. Click button, then fill in the values below:
Figure 72. Dialog Box for SwitchCase – Denom (3)
43. Select ‘Field’ in Return Expression Field.
44. Select ‘VU_DENOMINASI’ in Field.
45. Click button.
Figure 73. Dialog Box for SwitchCase – Denom(4)
46. Select ‘Field’ in Return Expression Type of ‘Default Return Value’.
47. Select ‘VOMS_DENOMINASI’ in Field.
48. Click button.
49. Drag the link from SwitchCase to the Product field in the output record.
50. Double-click SwitchCase (for Product). A dialog box is displayed, as in the figure below:
Figure 74. Dialog Box for SwitchCase - Product (1)
51. Click button to add the condition.
52. Fill in the values below:
Figure 75. Dialog Box for SwitchCase - Product (2)
53. Select ‘Field’ in Return Expression Field.
54. Select ‘VOMS_PRODUCT’ in Field.
55. Click button.
56. Click button, then fill in the values below:
Figure 76. Dialog Box for SwitchCase – Product (3)
57. Select ‘Field’ in Return Expression Field.
58. Select ‘VU_PRODUCT’ in Field.
59. Click button.
Figure 77. Dialog Box for SwitchCase – Product(4)
60. Select ‘Field’ in Return Expression Type of ‘Default Return Value’.
61. Select ‘VOMS_PRODUCT’ in Field.
62. Click button. The result is shown in the figure below:
Figure 78. Result of Select Data
63. Click button to save all of the settings made above.
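The NVL and SwitchCase transformators used in this section can be pictured with the Python sketch below. It is only a sketch under stated assumptions. The real conditions are entered in the dialog boxes shown in the figures and are not reproduced in the text, so the conditions used here are hypothetical placeholders. The sketch also assumes that SwitchCase tests its conditions in order, returns the expression of the first condition that holds, and otherwise returns the Default Return Value, and that NVL replaces an empty input with a replacement value (an empty string here, which is also an assumption).

```python
# Illustrative sketch only. The conditions below are HYPOTHETICAL; the real ones
# are defined in the SwitchCase dialog boxes and are not given in the text.

def nvl(value, replacement=""):
    """Return the replacement when the value is missing, else the value itself."""
    return replacement if value is None else value


def constant(value):
    """Return expression of type Constant Value."""
    return lambda row: value


def field(name):
    """Return expression of type Field."""
    return lambda row: row.get(name)


def switch_case(row, cases, default):
    """Evaluate (condition, return expression) pairs in order; first match wins."""
    for condition, result in cases:
        if condition(row):
            return result(row)
    return default(row)


# Hypothetical Status conditions returning the constants '1' and '2'.
status_cases = [
    (lambda r: r.get("VOMS_SERIE") is not None and r.get("VU_NUM_SERIE") is not None,
     constant("1")),
    (lambda r: r.get("VOMS_SERIE") is not None, constant("2")),
]
# Hypothetical Date conditions returning VOMS_DATE, then VU_DATE_USED.
date_cases = [
    (lambda r: r.get("VOMS_DATE") is not None, field("VOMS_DATE")),
    (lambda r: r.get("VU_DATE_USED") is not None, field("VU_DATE_USED")),
]

# A made-up join result row that exists only on the Vomsused side.
row = {"VOMS_SERIE": None, "VOMS_DATE": None,
       "VU_NUM_SERIE": "A3", "VU_DATE_USED": "2011-05-02"}
status = switch_case(row, status_cases, constant("3"))
date = switch_case(row, date_cases, field("VOMS_DATE"))
```

The Denom and Product SwitchCase nodes follow the same pattern as the Date node, with the fields selected in the steps above.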
5.3.5 Applying Aggregation
1. Drag and drop a Combiner ( ), then change its label to ‘Combiner’.
Figure 79. Dialog Box for Combiner
3. In the Grouping Fields tab, tick the check boxes ( ) in the following sequence:
Figure 80. Sequence for Combiner
4. Click the Map-Aggregation tab. The form is displayed as in the figure below:
Figure 81. Map-Aggregation tab
5. In the output record, add these fields:
6. In the map-aggregator list, select Count, then drop it onto the form.
7. Drag the link from VOMS_SERIE in the input record to the Count aggregator, then drag the link from the Count output to QUANTITY in the output record.
8. In the map-aggregator list, select Sum, then drop it onto the form.
9. Drag the link from DENOM in the input record to the Sum aggregator, then drag the link from the Sum output to AMOUNT in the output record.
10. Click button. The result is shown in the figure below:
Figure 82. Result of Map-Aggregation
11. In the main form, drag and drop a GroupBy ( ), then change its label to ‘Grouping’.
12. Drag the link from ‘Combiner’ to ‘Grouping’.
Figure 83. Drag Link into Grouping
13. Double-click the Grouping icon. The application displays a dialog box, as in the figure below:
Figure 84. Group By Parameter
14. Tick the check boxes ( ) in the following sequence:
15. Click button.
16. In the main form, drag and drop an Aggregator ( ), then change its label to ‘Aggregate’.
17. Drag the link from ‘Grouping’ to ‘Aggregate’.
Figure 85. Drag Link into Aggregate
18. Double-click the Aggregate icon. The application displays a dialog box, as in the figure below:
Figure 86. Map Aggregation
19. In the aggregator list, select Sum, then drop it onto the form.
20. Drag the link from QUANTITY in the input record to the Sum aggregator, then drag the link from the Sum output to QUANTITY in the output record.
21. In the aggregator list, select Sum again, then drop it onto the form.
22. Drag the link from AMOUNT in the input record to the second Sum aggregator, then drag the link from its output to AMOUNT in the output record.
Figure 87. Drag Link Map Aggregation
23. Click button.
24. In the main form, drag and drop a Transformator ( ), then change its label to ‘join-field’.
25. Drag the link from ‘Grouping’ to ‘join-field’.
Figure 88. Drag Link into Join-Field
26. Double-click the join-field icon. The application displays a dialog box, as in the figure below:
Figure 89. Join-Field Transformator
27. In the transformator list, select FieldJoiner (under the ‘field’ transformators), then drop it onto the form.
28. Delete the output fields, leaving a single output field named ‘RESULT’.
29. Drag links from all input fields to FieldJoiner, or right-click FieldJoiner and select Link From All Input Fields.
30. Drag the link from FieldJoiner to the output record. The result is shown in the figure below:
Figure 90. Drag the Link into Result
31. Click button.
32. In the main form, drag and drop a Transformator ( ), then change its label to ‘output data’.
33. Drag the link from ‘join-field’ to ‘output data’.
Figure 91. Drag Link into Join-Field
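The aggregation chain built in this section (Combiner, Grouping, Aggregate, join-field) can be pictured with the Python sketch below. It is an illustration of the general map-side combine and reduce-side sum pattern, not the code that HGrid247 generates. The grouping fields are chosen in the dialog boxes shown in the figures, so the grouping fields, the sample rows, the "|" delimiter of the joined RESULT field and the function names used here are all assumptions. The sketch also assumes that Count counts every row of a group.

```python
# Illustrative sketch only: grouping fields, delimiter and sample rows are
# ASSUMPTIONS; the real choices are made in the tool's dialog boxes.
from collections import defaultdict

# Hypothetical grouping fields.
GROUP_FIELDS = ("STATUS", "DATE", "PRODUCT")


def combine(rows, group_fields):
    """Map-side partial aggregation: Count -> QUANTITY, Sum of DENOM -> AMOUNT."""
    partial = defaultdict(lambda: {"QUANTITY": 0, "AMOUNT": 0})
    for row in rows:
        key = tuple(row[name] for name in group_fields)
        partial[key]["QUANTITY"] += 1
        partial[key]["AMOUNT"] += int(row["DENOM"])
    return dict(partial)


def aggregate(partials):
    """Reduce-side step: Sum the partial QUANTITY and AMOUNT values per group."""
    totals = defaultdict(lambda: {"QUANTITY": 0, "AMOUNT": 0})
    for partial in partials:
        for key, values in partial.items():
            totals[key]["QUANTITY"] += values["QUANTITY"]
            totals[key]["AMOUNT"] += values["AMOUNT"]
    return dict(totals)


def join_fields(key, values, delimiter="|"):
    """FieldJoiner-like step: join all fields into a single RESULT string."""
    return delimiter.join([*key, str(values["QUANTITY"]), str(values["AMOUNT"])])


# Two made-up input splits, each processed by its own combine step.
split_a = [
    {"STATUS": "1", "DATE": "2011-05-01", "PRODUCT": "PRODUCT_A", "DENOM": "5000"},
    {"STATUS": "1", "DATE": "2011-05-01", "PRODUCT": "PRODUCT_A", "DENOM": "10000"},
]
split_b = [
    {"STATUS": "1", "DATE": "2011-05-01", "PRODUCT": "PRODUCT_A", "DENOM": "5000"},
    {"STATUS": "3", "DATE": "2011-05-02", "PRODUCT": "PRODUCT_B", "DENOM": "10000"},
]
totals = aggregate([combine(split_a, GROUP_FIELDS), combine(split_b, GROUP_FIELDS)])
results = [join_fields(key, values) for key, values in sorted(totals.items())]
```

The second stage sums QUANTITY rather than counting again, because each partial record already carries the count of the rows it represents. This matches the use of Sum for both fields in the Aggregate node.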
5.3.6 Generate MapReduce jar from Workflow
1. Click on the toolbar to generate the MapReduce jar from the workflow.
2. If the generation succeeds, a dialog box appears, as in the figure below:
Figure 92. Generate Process Succeed
5.4 Running Workflow on Hadoop
The ‘Reconcile’ workflow has 2 inputs and 1 reference file. The inputs are the Voms data and the Vomsused data, and the reference file is product.txt. For example, we store the Voms data in the folder voms, the Vomsused data in the folder vomsused, and the reference file product.txt in the folder voms_ref. The output will be stored in the folder output.
The command for running the workflow on Hadoop is:
    hadoop jar training.jar training.mypackage.reconcile voms vomsused output
Execution in the development environment is shown in the figure below:
Figure 93. Execution in Cygwin
After the hadoop jar command has been executed, the result is written to hgrid-0000 in the output folder, as follows:
Powered By:
PT. DUA EMPAT TUJUH Segitiga Emas Bussiness Park Jl. Prof Dr. Satrio Kav 6 Jakarta Selatan 12940 Indonesia Phone : (021) 579 511 32 Fax : (021) 579 511 28 www.solusi247.com