GRD Journals- Global Research and Development Journal for Engineering | Volume 5 | Issue 6 | May 2020 ISSN- 2455-5703
Supervised Machine Learning Algorithms: Classification and Comparison
Shweta Chaudhary
Department of Computer Science and Engineering, Sharda University, Greater Noida
Abstract
Supervised Machine Learning (SML) is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. Supervised classification is one of the tasks most frequently carried out by intelligent systems. This paper describes various supervised machine learning (ML) classification techniques, compares several supervised learning algorithms, and determines the most efficient algorithm based on the data set, the number of instances and the number of variables (features). Seven algorithms were considered: Decision Table, Random Forest (RF), Naive Bayes (NB), Support Vector Machine (SVM), Neural Networks (Perceptron), JRip and Decision Tree (J48), using the Waikato Environment for Knowledge Analysis (WEKA) machine learning tool. To apply the algorithms, a diabetes data set of 768 instances was used for classification, with eight attributes as independent variables and one as the dependent variable. The results indicate that SVM was the algorithm with the highest precision and accuracy, followed by the Naive Bayes and Random Forest classification algorithms. The results also show that the time taken to build a model and precision (accuracy) are one factor, while the kappa statistic and Mean Absolute Error (MAE) are another. Therefore, ML algorithms require high precision and accuracy and minimal error for supervised predictive machine learning.
Keywords- Machine Learning, Classifiers, Data Mining Techniques, Data Analysis, Learning Algorithms, Supervised Machine Learning
I. INTRODUCTION
Machine learning is one of the fastest-growing areas of computer science. It refers to the automated detection of meaningful patterns in data. Machine learning tools are concerned with giving programs the ability to learn and adapt. Machine learning has become one of the mainstays of Information Technology and, with that, a central, though often hidden, part of our lives. With ever-increasing amounts of data becoming available, there is good reason to believe that smart data analysis will become even more pervasive as a necessary ingredient of technological progress. Machine Learning (ML) has many applications, the most significant of which is data mining. People are prone to making mistakes during analyses or when trying to establish relationships between multiple features. Data mining and machine learning are Siamese twins from which several insights can be derived through suitable learning algorithms. There has been great progress in data mining and machine learning as a result of the emergence of smart and nano technologies, which has raised interest in finding hidden patterns in data. The fusion of statistics, machine learning, information theory and computing has created a solid science with a firm mathematical base and very powerful tools. Supervised learning generates a function that maps inputs to desired outputs. The unprecedented generation of data has made machine learning techniques increasingly sophisticated, which has called for the use of several algorithms for both supervised and unsupervised machine learning. Supervised learning is fairly common in classification problems, because the goal is often to get the computer to learn a classification system that we have created. ML is well suited to uncovering the opportunities hidden within big data. ML helps extract value from large and disparate data sources with far less reliance on individual human direction, because it is data driven and runs at machine scale.
Machine learning is well suited to the complexity of handling disparate data sources, the wide range of variables and the large amounts of data involved; ML thrives on growing data sets. The more data that is fed into an ML framework, the more it can be trained, and the higher the quality of the resulting insights. Freed from the limitations of human-scale thinking and analysis, ML is well placed to discover and display the patterns hidden in the data. One standard formulation of the supervised learning task is the classification problem: the learner must learn (approximate the behavior of) a function that maps a vector into one of several classes by looking at several input-output examples of the function. Inductive machine learning is the process of learning a set of rules from instances (examples in a training set) or, more generally, of creating a classifier that can be used to generalize to new instances. The process of applying supervised ML to a real-world problem is described in Figure 1.
All rights reserved by www.grdjournals.com
Supervised Machine Learning Algorithms: Classification and Comparison (GRDJE/ Volume 5 / Issue 6 / 003)
Fig. 1: Supervised Machine Learning Techniques
This work focuses on the classification of ML algorithms and on determining the most efficient algorithm with the highest accuracy and precision. It also establishes the performance of the different algorithms on large and small data sets, with a view to classifying them correctly, and gives insight into how to build supervised machine learning models. The remainder of this work is organized as follows: Section 2 presents a review of the literature on the classification of supervised learning algorithms; Section 3 presents the methodology used; Section 4 discusses the results of the work; and Section 5 gives the conclusion and recommendations for further work.
II. LITERATURE REVIEW
A. Classification of Supervised Learning Algorithms
Supervised machine learning algorithms that deal mainly with classification include the following: Linear Classifiers, Logistic Regression, Naïve Bayes Classifier, Perceptron, Support Vector Machine; Quadratic Classifiers, K-Means Clustering, Boosting, Decision Tree, Random Forest (RF); Neural Networks, Bayesian Networks and more.
1) Linear Classifiers
Linear models for classification separate input vectors into classes using linear (hyperplane) decision boundaries. The goal of classification with linear classifiers is to group items that have similar feature values into classes. A linear classifier achieves this goal by making a classification decision based on the value of a linear combination of the features. A linear classifier is often used in situations where the speed of classification is an issue, since it is rated among the fastest classifiers. Linear classifiers also tend to work well when the number of dimensions is large, as in document classification, where each element is typically the count of a word in a document. The rate of convergence among data set variables, however, depends on the margin. Roughly speaking, the margin quantifies how linearly separable a data set is, and hence how easy it is to solve a given classification problem.
2) Logistic Regression
This is a classification function that builds and uses a multinomial logistic regression model with a ridge estimator. Logistic regression usually states where the boundary between the classes lies, and also states that the class probabilities depend on the distance from the boundary in a specific way. With a larger data set, the probabilities move towards the extremes (0 and 1) more rapidly. These statements about probabilities make logistic regression more than just a classifier. It makes stronger, more detailed predictions and can be fit in a different way; but those strong predictions may be wrong.
Logistic regression is an approach to prediction, like Ordinary Least Squares (OLS) regression. However, with logistic regression, prediction results in a dichotomous outcome.
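To make the link between distance from the decision boundary and class probability concrete, the following minimal sketch evaluates the logistic function for a linear score. The weights, bias and input points are hypothetical values chosen for illustration; they are not fitted to the diabetes data used later in this paper.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """P(class = 1 | x) for the linear score w.x + b."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)


# Illustrative parameters only (an assumption, not a fitted model).
weights, bias = [1.0, -2.0], 0.5

on_boundary = predict_proba(weights, bias, [1.5, 1.0])    # score = 0
far_positive = predict_proba(weights, bias, [9.5, 0.0])   # score = 10
far_negative = predict_proba(weights, bias, [0.0, 5.25])  # score = -10
```

A point on the boundary receives probability 0.5, and the probability approaches 1 or 0 as the score moves away from zero on either side.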
3) Naive Bayesian (NB) Networks
These are very simple Bayesian networks composed of directed acyclic graphs with only one parent (representing the unobserved node) and several children (corresponding to observed nodes), with a strong assumption of independence among the child nodes in the context of their parent. Thus, the independence model (Naive Bayes) is based on estimating class probabilities under that assumption. Bayes classifiers are usually less accurate than other, more sophisticated learning algorithms (such as ANNs). However, when the naive Bayes classifier was compared with state-of-the-art algorithms for decision tree induction, instance-based learning and rule induction on standard benchmark datasets, it was sometimes found to be superior to the other learning schemes, even on datasets with substantial feature dependencies. The attribute-independence problem of the Bayes classifier has been addressed with Averaged One-Dependence Estimators.
4) Multi-layer Perceptron
This is a classifier in which the weights of the network are found by solving a non-convex, unconstrained minimization problem, as in standard neural network training, rather than by solving a quadratic programming problem with linear constraints. Other well-known algorithms are based on the notion of the perceptron. The perceptron algorithm learns from a batch of training instances by running repeatedly through the training set until it finds a prediction vector that is correct on all of the training set. This prediction rule is then used to predict the labels of the test set.
5) Support Vector Machines (SVMs)
These are among the most recent supervised machine learning techniques. Support Vector Machine (SVM) models are closely related to classical multilayer perceptron neural networks. SVMs revolve around the notion of a margin on either side of a hyperplane that separates two data classes. Maximizing the margin, and thereby creating the largest possible distance between the separating hyperplane and the instances on either side of it, has been shown to reduce an upper bound on the expected generalization error.
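The perceptron training procedure described in subsection 4 above (repeated passes over the training set until a weight vector classifies every training instance correctly) can be sketched as follows. This is a minimal illustration of the classic perceptron rule under assumed settings: the function names, learning rate, epoch limit and toy data are all hypothetical, and it is not the multilayer perceptron implementation used in WEKA.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Classic perceptron rule; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: stop
            break
    return w, b


def predict(w, b, x):
    """Predict +1 or -1 from the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy, linearly separable data (logical AND), invented for illustration.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]
w, b = train_perceptron(X, y)
predictions = [predict(w, b, x) for x in X]
```

The loop only terminates early when the data are linearly separable; otherwise it stops after the assumed epoch limit with whatever weights it has reached.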
6) K-means
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed in advance. The algorithm is used when labelled data is not available. There is also a general method for converting rough rules of thumb into a highly accurate prediction rule: given a "weak" learning algorithm that can consistently find classifiers ("rules of thumb") at least slightly better than random, say with accuracy ≥ 55%, and given sufficient data, a boosting algorithm can construct a single classifier with very high accuracy, say 99%.
7) Decision Trees
Decision Trees (DT) are trees that classify instances by sorting them based on feature values. Each node in a decision tree represents a feature of an instance to be classified, and each branch represents a value that the node can assume. Instances are classified starting at the root node and sorted based on their feature values. Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model that maps observations about an item to conclusions about the item's target value. More descriptive names for such tree models are classification trees or regression trees. Decision tree classifiers usually employ post-pruning techniques that evaluate the performance of decision trees as they are pruned, using a validation set. Any node can be removed and assigned the most common class of the training instances that are sorted to it.
8) Neural Networks
Neural Networks (NNs) can perform a number of regression and/or classification tasks at once, although commonly each network performs only one. Therefore, in most cases the network has a single output variable, although in many-state classification problems this may correspond to several output units (a post-processing stage takes care of the mapping from output units to output variables). An Artificial Neural Network (ANN) depends on three fundamental aspects: the input and activation functions of the unit, the network architecture, and the weight of each input connection.
Given that the first two aspects are fixed, the behaviour of the ANN is defined by the current values of the weights. The weights of the net to be trained are initially set to random values, and then instances of the training set are repeatedly exposed to the net. The values for the input of an instance are placed on the input units, and the output of the net is compared with the desired output for this instance. Then, all the weights in the net are adjusted slightly in the direction that brings the output values of the net closer to the desired output. There are several algorithms with which a network can be trained.
9) Bayesian Network
A Bayesian Network (BN) is a graphical model for probability relationships among a set of variables. Bayesian networks are the best-known representatives of statistical learning algorithms. The problem with BN classifiers is that they are not suitable for datasets with many features. Prior expertise, or domain knowledge, about the structure of a Bayesian network can take the following forms:
– Declaring that a node is a root node, i.e., it has no parents.
– Declaring that a node is a leaf node, i.e., it has no children.
– Declaring that a node is a direct cause or direct effect of another node.
– Declaring that a node is not directly connected to another node.
– Declaring that two nodes are independent, given a condition-set.
– Providing a partial node ordering, that is, declaring that a node appears earlier than another node in the ordering.
– Providing a complete node ordering.
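Several of the structural declarations listed above (root nodes, leaf nodes, direct causes and node orderings) can be read directly off a directed acyclic graph. The sketch below is only an illustration under assumed names: the four-node structure is hypothetical, and the graph is stored as a mapping from each node to the set of its parents (its direct causes).

```python
from graphlib import TopologicalSorter

# Hypothetical structure: each node maps to the set of its parents.
parents = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},  # A and B are direct causes of C
    "D": {"C"},       # C is a direct cause of D
}

# Root nodes have no parents.
roots = {node for node, ps in parents.items() if not ps}

# Leaf nodes never appear as a parent of any other node.
nodes_with_children = set().union(*parents.values())
leaves = set(parents) - nodes_with_children

# A complete node ordering consistent with the arcs: parents come first.
order = list(TopologicalSorter(parents).static_order())
```

`graphlib.TopologicalSorter` (Python 3.9+) accepts exactly this node-to-predecessors mapping and raises `CycleError` if the declared arcs are not acyclic, which is a cheap consistency check on expert-supplied structure.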
B. Properties of Machine Learning Algorithms
Supervised machine learning techniques are applicable in numerous domains. A number of ML application-oriented papers can be found in [18], [25]. SVMs and neural networks generally perform much better when dealing with multiple dimensions and continuous features. On the other hand, logic-based systems tend to perform better when dealing with discrete/categorical features. For neural network models and SVMs, a large sample size is required to achieve maximum prediction accuracy, whereas NB may need only a relatively small dataset. There is general agreement that k-NN is very sensitive to irrelevant features: this characteristic can be explained by the way the algorithm works. Most decision tree algorithms cannot perform well on problems that require diagonal partitioning, because the division of the instance space is orthogonal to the axis of one variable and parallel to all other axes. Therefore, the regions that result from partitioning are all hyperrectangles. ANNs and SVMs perform well when multicollinearity is present and a nonlinear relationship exists between the input and output features. Naive Bayes (NB) requires little storage space during both the training and classification stages: the strict minimum is the memory needed to store the prior and conditional probabilities. The basic kNN algorithm uses a great deal of storage space for the training phase, and its execution space is at least as big as its training space. In contrast, for all non-lazy learners the execution space is usually much smaller than the training space, since the resulting classifier is usually a highly condensed summary of the data. Moreover, Naive Bayes and kNN can easily be used as incremental learners, whereas rule algorithms cannot. Naive Bayes is naturally robust to missing values, since these are simply ignored in computing probabilities and hence have no impact on the final decision. In contrast, kNN and neural networks require complete records to do their work.
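The robustness of Naive Bayes to missing values can be illustrated with a small sketch in which a missing feature simply contributes no factor, during either counting or prediction. This is a simplified categorical Naive Bayes with Laplace smoothing written for illustration; the function names, the use of `None` for a missing value, the assumption of binary features (`n_values=2`) and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_nb(rows, labels):
    """Count class and per-class feature-value frequencies, skipping None."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:  # a missing value is simply not counted
                value_counts[(j, label)][value] += 1
    return class_counts, value_counts


def predict_nb(model, row, alpha=1.0, n_values=2):
    """Return the class with the highest log-posterior score."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for j, value in enumerate(row):
            if value is None:
                continue  # missing value: contributes no factor
            counts = value_counts[(j, label)]
            seen = sum(counts.values())
            # Laplace-smoothed conditional probability of this value.
            score += math.log((counts[value] + alpha) / (seen + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data with missing entries, invented for illustration.
rows = [[1, 1], [1, None], [0, 0], [0, 1], [None, 0]]
labels = ["yes", "yes", "no", "no", "no"]
model = fit_nb(rows, labels)
pred_full = predict_nb(model, [1, 1])
pred_missing = predict_nb(model, [1, None])
pred_other = predict_nb(model, [0, None])
```

A distance-based learner such as kNN has no equally natural option: a record with a missing coordinate has no well-defined distance to its neighbours unless the value is imputed first.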
Decision Trees and Rule classifiers have a similar operational profile, as do SVM and ANN. No single learning algorithm outperforms all other algorithms on all datasets. Different data sets, with different kinds of variables and numbers of instances, determine the type of algorithm that will perform well. According to the no free lunch theorem, there is no single learning algorithm that will outperform other algorithms on all data. Table 1 presents a comparative analysis of various learning algorithms.
Table 1: Comparing learning algorithms (**** stars represent the best and * star the worst performance)
III. RESEARCH METHODOLOGY
Data for the research was obtained from the National Institute of Diabetes and Digestive and Kidney Diseases, made available online by the University of California, Irvine at: https://archive.ics.uci.edu/ml/machine-learning-database/pima-Indian. -Diabetes / (2017). This data set was chosen for its accuracy and because it has been anonymized (de-identified), so confidentiality is ensured. The number of attributes is 8 plus the class, 9 in total. All attributes are numeric-valued, as follows:
– Number of times pregnant
– Plasma glucose concentration at 2 hours in an oral glucose tolerance test
– Diastolic blood pressure (mm Hg)
– Triceps skin fold thickness (mm)
– 2-hour serum insulin (mu U/ml)
– Body mass index (weight in kg / (height in m)^2)
– Diabetes pedigree function
– Age (years)
– Class variable (0 or 1)
Table 2: Class distribution (class value 1 is interpreted as "tested positive for diabetes" and class value 0 as "tested negative for diabetes")
Class value    Number of instances    Converted value (attribute)
0              500                    NO
1              268                    YES
Table 2 shows that, of all the instances used for this research, 500 tested negative for diabetes and 268 tested positive. Comparative analysis of the different supervised machine learning algorithms was performed using WEKA 3.7.13 (WEKA: Waikato Environment for Knowledge Analysis). The data set was prepared so that the class column is represented as a nominal attribute and used as the dependent variable. Class values of 1 were converted to YES, meaning tested positive for diabetes, and class values of 0 were converted to NO, meaning tested negative for diabetes. This is necessary because most of the algorithms require at least one nominal variable column. Seven classification algorithms were used in this research: Decision Table, Random Forest, Naive Bayes, SVM, Neural Networks (Perceptron), JRip and Decision Tree (J48). The following attributes were considered for the comparative analysis: time, correctly classified instances, incorrectly classified instances, test mode, number of instances, kappa statistic, MAE, accuracy of YES, accuracy of NO and classification type. The research was carried out by tuning the parameters on two different sets of instances, in order to assess precision and accuracy for the different machine learning algorithms. The first category has 768 instances and 9 attributes (number of times pregnant; plasma glucose concentration at 2 hours in an oral glucose tolerance test; diastolic blood pressure (mm Hg); triceps skin fold thickness (mm); 2-hour serum insulin (mu U/ml); body mass index (weight in kg / (height in m)^2); diabetes pedigree function; age (years); and class variable (0 or 1)), with the class variable as the dependent variable and the other eight as independent variables.
The second category consists of 384 instances and 6 attributes (number of times pregnant; plasma glucose concentration at 2 hours in an oral glucose tolerance test; 2-hour serum insulin (mu U/ml); diabetes pedigree function; age (years); and class variable (0 or 1)), with one dependent variable and five independent variables.
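Every run reported in Section IV uses 10-fold cross-validation as its test mode. The sketch below shows one way such folds could be generated, as a hedged illustration rather than a description of WEKA's internal procedure: the function names, the round-robin stratification scheme and the random seed are assumptions, and only the class sizes (500 and 268) come from Table 2.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:  # deal indices round-robin across the folds
            folds[position % k].append(idx)
            position += 1
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Class sizes from Table 2: 500 negative (0) and 268 positive (1).
labels = [0] * 500 + [1] * 268
splits = list(cross_validation_splits(labels))
```

Each instance is used for testing exactly once and for training in the other nine folds, so the reported accuracy is an average over ten models rather than the score of a single train/test split.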
IV. RESULTS AND DISCUSSION
A. Results
WEKA was used to classify and compare the different machine learning algorithms. Table 3 shows the results for the data set with 9 attributes, along with the parameters considered.
Table 3: Comparison of various classification algorithms with the large data set and more attributes (test mode: 10-fold cross-validation; attributes: 9; instances: 768)
Algorithm            Time (sec)  Correctly classified (%)  Incorrectly classified (%)  Kappa statistic  MAE     Accuracy of YES  Accuracy of NO  Classification
Decision Table       0.23        72.3948                   27.6042                     0.3752           0.341   0.619            0.771           Rules
Random Forest        0.55        74.7396                   25.2604                     0.4313           0.3105  0.653            0.791           Trees
Naive Bayes          0.03        76.3021                   23.6979                     0.4664           0.2841  0.678            0.802           Bayes
SVM                  0.09        77.3438                   22.6563                     0.4682           0.2266  0.740            0.785           Functions
Neural Network       0.81        75.1302                   24.8698                     0.4445           0.2938  0.653            0.799           Functions
JRip                 0.19        74.4792                   25.5208                     0.4171           0.3461  0.659            0.780           Rules
Decision Tree (J48)  0.14        73.8281                   26.1719                     0.4164           0.3158  0.632            0.790           Trees
Time is the time taken to build the model. MAE (Mean Absolute Error) is a measure of how close forecasts or predictions are to the eventual outcomes.
Table 4: Comparison of various classification algorithms with the small data set and fewer attributes (test mode: 10-fold cross-validation; attributes: 6; instances: 384)
Algorithm            Time (sec)  Correctly classified (%)  Incorrectly classified (%)  Kappa statistic  MAE     Accuracy of YES  Accuracy of NO  Classification
Decision Table       0.09        67.9688                   32.0313                     0.3748           0.3101  0.581            0.734           Rules
Random Forest        0.42        71.875                    28.125                      0.3917           0.348   0.639            0.763           Trees
Naive Bayes          0.01        70.5729                   29.4271                     0.352            0.3297  0.633            0.739           Bayes
SVM                  0.04        72.9167                   27.0833                     0.3837           0.2708  0.711            0.735           Functions
Neural Network       0.17        59                        41                          0.1156           0.4035  0.444            0.672           Functions
JRip                 0.01        64                        36                          0.2278           0.4179  0.514            0.714           Rules
Decision Tree (J48)  0.03        64                        36                          0.1822           0.4165  0.56             0.685           Trees
Time is the time taken to build the model. MAE (Mean Absolute Error) is a measure of how close forecasts or predictions are to the eventual outcomes. The kappa statistic is a metric that compares the observed accuracy with the expected accuracy (random chance). YES means tested positive for diabetes; NO means tested negative for diabetes.
Table 4 shows the results of the comparison of the various machine learning algorithms on the data set with 6 attributes.
Table 5: Ranking of accuracy for positive and negative diabetes by algorithm, small data set (384 instances)
Algorithm                    YES (positive diabetes)  NO (negative diabetes)
SVM                          0.711                    0.735
Random Forest                0.639                    0.761
Naive Bayes                  0.633                    0.739
Decision Table               0.581                    0.734
Decision Tree (J48)          0.519                    0.685
JRip                         0.514                    0.714
Neural Network (Perceptron)  0.444                    0.672
Table 6: Ranking of accuracy for positive and negative diabetes by algorithm, large data set (768 instances)
Algorithm                    YES (positive diabetes)  NO (negative diabetes)
SVM                          0.74                     0.785
Naive Bayes                  0.678                    0.802
JRip                         0.659                    0.78
Random Forest                0.653                    0.791
Neural Network (Perceptron)  0.653                    0.799
Decision Tree (J48)          0.632                    0.79
Decision Table               0.619                    0.771
Table 7: Time taken to build the model and percentages of correctly and incorrectly classified instances by algorithm, small data set (384 instances)
Algorithm                    Time      Correctly classified  Incorrectly classified
SVM                          0.04 sec  72.92%                27.08%
Random Forest                0.42 sec  71.88%                28.13%
Naive Bayes                  0.01 sec  70.57%                29.43%
Decision Table               0.09 sec  67.97%                32.03%
JRip                         0.01 sec  64%                   36%
Decision Tree (J48)          0.03 sec  64%                   36%
Neural Network (Perceptron)  0.17 sec  59%                   41%
Table 8: Time taken to build the model and percentages of correctly and incorrectly classified instances by algorithm, large data set (768 instances)
Algorithm                    Time      Correctly classified  Incorrectly classified
SVM                          0.09 sec  77.34%                22.66%
Naive Bayes                  0.03 sec  76.30%                23.70%
Neural Network (Perceptron)  0.81 sec  75.13%                24.87%
Random Forest                0.55 sec  74.74%                25.26%
JRip                         0.19 sec  74.48%                25.52%
Decision Tree (J48)          0.14 sec  73.83%                26.17%
Decision Table               0.23 sec  72.40%                27.60%
Table 9: Descriptive analysis of the data set attributes
Attribute number  Mean   Standard deviation
1                 3.8    3.4
2                 120.9  32.0
3                 69.1   19.4
4                 20.5   16.0
5                 79.8   115.2
6                 32.0   7.9
7                 0.5    0.3
8                 33.2   11.8
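The two summary metrics reported in Tables 3 and 4 can be computed as in the following minimal sketch. The confusion matrix and the probability values are hypothetical and are not taken from the experiments above; the MAE example assumes, as is common for probabilistic classifiers, that the error is measured between predicted class probabilities and 0/1 outcomes.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed accuracy: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Expected accuracy under chance agreement, from the row and column totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between outcomes and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical confusion matrix, NOT taken from the paper's experiments.
confusion = [[40, 10],
             [5, 45]]
kappa = cohen_kappa(confusion)

# Hypothetical 0/1 outcomes and predicted probabilities of the positive class.
actual = [1, 0, 1, 1]
predicted = [0.9, 0.2, 0.6, 0.4]
mae = mean_absolute_error(actual, predicted)
```

In this invented example the observed accuracy is 0.85 and the chance-expected accuracy is 0.5, so kappa reports how much of the remaining headroom above chance the classifier actually captures.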
B. Discussion
Table 3 shows the comparison of results for the 768 instances and 9 attributes. It is observed that all algorithms have a higher kappa statistic than MAE (Mean Absolute Error). Furthermore, correctly classified instances outnumber incorrectly classified instances. This is an indication that, with a larger data set, the predictive analysis is more reliable. SVM and NB require a large sample size to achieve maximum prediction accuracy, as shown in Table 3, while Decision Tree and Decision Table have the lowest accuracy.
Table 4 shows the comparison of results for the 384 instances and 6 attributes. The kappa statistics for Neural Networks, JRip and J48 are lower than their MAE, which does not portray precision and accuracy. However, SVM and RF, as with the larger data set, show high accuracy and precision. Decision Table took more time to build its model than JRip and Decision Tree. Therefore, a shorter build time does not guarantee accuracy. If the kappa statistic is less than the Mean Absolute Error (MAE), the algorithm will not show precision and accuracy. It follows that an algorithm with such characteristics cannot be used for that data set, since it will not show precision and accuracy. Tables 5 and 6 show SVM as the algorithm with the highest accuracy for the positive (YES) class on both the small and the large data set. Tables 7 and 8 show a comparison of the percentages of correctly and incorrectly classified instances for the small and large data sets, together with the time taken to build the model. From Table 7, Naive Bayes and JRip are the fastest algorithms to build, although JRip has a low percentage of correctly classified instances, which shows that a short build time does not imply an accurate model. In the same vein, SVM has the highest accuracy with a build time of 0.04 seconds. Comparing this with Table 8, the Neural Network (Perceptron) is the third most accurate algorithm on the large data set. This means that the neural network works better with larger data sets than with smaller ones. Furthermore, the results indicate that Decision Table does not work well with large data sets. The SVM algorithm achieves the highest classification accuracy, and the larger the data set, the greater its accuracy. Table 9 shows the mean and standard deviation of all attributes used in this research, indicating that plasma glucose concentration (attribute 2) has the highest mean and diabetes pedigree function (attribute 7) the lowest mean, which indicates a strong effect in the small data set.
However, a low standard deviation (SD) is not desirable, which means that diabetes pedigree function (attribute 7) may not be of importance when analyzing large data sets.
V. CONCLUSION AND RECOMMENDATION FOR FURTHER WORKS
ML classification requires thorough fine-tuning of the parameters and, at the same time, a sizeable number of instances in the data set. It is not just a matter of the time taken to build the model for the algorithm, but of precision and correct classification. Therefore, the best learning algorithm for a particular data set does not guarantee precision and accuracy on another data set whose attributes are logically different. The key question when dealing with ML classification is not whether one learning algorithm is superior to others, but under which conditions a particular method can significantly outperform others on a given application problem. To this end, meta-learning uses a set of attributes, called meta-attributes, to represent the characteristics of learning tasks, and searches for the correlations between these attributes and the performance of learning algorithms. Some characteristics of learning tasks are: the number of instances, the proportion of categorical attributes, the proportion of missing values, the entropy of classes, etc. Brazdil et al. (2003) provide an extensive list of information and statistical measures for a data set. After a thorough understanding of the strengths and limitations of each method, the possibility of integrating two or more algorithms to solve a problem should be investigated. The aim is to use the strengths of one method to complement the weaknesses of another. If we are only interested in the best possible classification accuracy, it may be difficult or impossible to find a single classifier that performs as well as a good ensemble of classifiers. SVM, NB and RF machine learning algorithms can deliver high precision and accuracy regardless of the number of attributes and data instances. This research shows that the time taken to build the model is one factor, while the kappa statistic and MAE are another.
Therefore, ML algorithms require precision, accuracy and minimum error for supervised predictive machine learning. This work recommends that, for large data sets, a distributed processing environment should be considered. This will make it possible to handle a high level of correlation between variables, which ultimately makes the output of the model more efficient.
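As an illustration of the meta-attributes mentioned in the conclusion, the following sketch computes a few simple characteristics of a learning task: the number of examples, the number of features, the proportion of missing values and the entropy of the class distribution. The function name, the use of `None` for a missing value and the toy data are hypothetical, and reading the fourth characteristic as class entropy is an assumption.

```python
import math
from collections import Counter


def meta_attributes(rows, labels):
    """Compute a few simple meta-attributes of a data set."""
    n_cells = sum(len(row) for row in rows)
    n_missing = sum(value is None for row in rows for value in row)
    n = len(labels)
    class_counts = Counter(labels)
    # Shannon entropy of the class distribution, in bits.
    entropy = -sum((c / n) * math.log2(c / n) for c in class_counts.values())
    return {
        "n_examples": len(rows),
        "n_features": len(rows[0]),
        "missing_proportion": n_missing / n_cells,
        "class_entropy_bits": entropy,
    }


# Toy data with two missing entries, invented for illustration.
rows = [[1.0, None], [2.0, 3.0], [None, 1.0], [4.0, 2.0]]
labels = [0, 1, 0, 1]
meta = meta_attributes(rows, labels)
```

A meta-learner would compute such a vector for many data sets and relate it to the measured performance of each algorithm; this sketch covers only the first step.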
REFERENCES
[1] Smola, A. and Vishwanathan, S.V.N. (2008). Introduction to Machine Learning. Cambridge University Press. ISBN: 0-521-82583-0. Available on the KTH website: https://www.kth.se/social/upload/53a14887f276540ebc81aec3/online.pdf Retrieved from: http://alex.smola.org/drafts/thebook/pdf
[2] Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Clarendon Press, Oxford, England.
[3] Brazdil, P., Soares, C. and da Costa, J. (2003). Ranking Learning Algorithms: Using IBL and Meta-Learning on Accuracy and Time Results. Machine Learning, Volume 50, Issue 3, pp. 251-277. Kluwer Academic Publishers. doi: 10.1023/A:1021713901879. Available on the Springer website: https://link.springer.com/content/pdf/10.1023%2FA%3A1021713901879.pdf
[4] Cheng, J., Greiner, R., Kelly, J., Bell, D. and Liu, W. (2002). Learning Bayesian Networks from Data: An Information-Theory Based Approach. Artificial Intelligence, Volume 137, pp. 43-90. Elsevier Science. Science Direct: http://www.sciencedirect.com/science/article/pii/S00043704200191111
[5] Domingos, P. and Pazzani, M. (1997). On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning, Volume 29, pp. 103-130. Kluwer Academic Publishers. Available on the University of Trento website: http://disi.unitn.it/~p2p/RelatedWork/Matching/domingos97optimality.pdf