GRD Journals | Global Research and Development Journal for Engineering | International Conference on Innovations in Engineering and Technology (ICIET) - 2016 | July 2016
e-ISSN: 2455-5703
Hospital Recommendation using Hybrid Approach
1N. SanjeevRam, 2R. Vairamuthu, 3M. RajaPrabhu, 4C. Pandian
1,2,3Student, 4Assistant Professor
1,2,3,4Department of Information Technology
1,2,3,4K.L.N. College of Engineering, Pottapalayam, Sivagangai 630612, India
Abstract
E-commerce websites are widely used, and the data they hold grows constantly. Recommendation is a technique for suggesting items based on a customer's likes. The difficulty is the vast amount of data that must be searched. The remedy proposed here is a hybrid approach that combines Matrix Factorization (MF) and a Genetic Algorithm (GA) to select the best results. The technique is applied here to the medical domain, to recommend hospitals.
Keywords: Hybrid recommendation, MF, GA, Cold Start
I. INTRODUCTION
Web usage mining has focused on the extraction of user patterns from user logs for the purpose of marketing intelligence [1]. This works for well-known users. The growing size of the digital information base prevents effective access to knowledge because of the well-known phenomenon called information overload [3]. Information overload makes it hard to recommend the products that matter to a user based on that user's likes and dislikes. Three features contribute to a good recommender system: recommendation accuracy, user satisfaction, and provider satisfaction [2]. All three are required to build a good recommendation system.

Similar users can be identified from the resources they view on the web and from how they view them. For example, if one user tags a resource as "funny" and another user applies the same tag, the two users share the same view of that resource [4]. This idea is used in collaborative filtering (CF), which is regarded as one of the most important and useful algorithms in recent recommendation systems [5]. Matrix factorization is a form of collaborative filtering. It can be used to discover latent features underlying the interactions between two different kinds of entities. Given that each user has rated some items in the system, we would like to predict how the users would rate the items that they have not yet rated, so that we can make recommendations to the users.

The other approach is to apply a genetic algorithm to the user's preferences, since these change from time to time. Goldberg and Holland describe several types of genetics-based machine learning systems and concentrate on classifier systems and their derivatives, which are parallel production systems designed to exploit the implicit parallelism of genetic algorithms [6]. The results obtained from MF and the GA are then compared.

The term hybrid recommender system is used here to describe any recommender system that combines multiple recommendation techniques to produce its output [7]. The items are then recommended. The cold start problem is the inability to recommend to new or unknown users. The remedy is to recommend items based on the user's access location. The algorithm is applied to find the hospitals that match given criteria.
II. ARCHITECTURE
The recommendation process consists of two stages. The first stage addresses the cold start problem: the user's current access location is obtained, and the items confined to that geographical area are fetched and recommended. The second stage finds the results for the particular input. The user's preferences are matched against the characteristics of the existing items. A threshold value is fixed using the Mean Absolute Error (MAE), and the match between the two must produce a value greater than this threshold. Matrix factorization is then applied to the users' external ratings: given that each user has rated some items in the system, we would like to predict how the users would rate the items that they have not yet rated. The genetic algorithm works on the comparison of the user's interests with the items' characteristics and produces an item list that is combined with the MF ratings. The combined result must be higher than the threshold value, which is also an MAE value. The top n results are recommended.
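The first stage can be illustrated with a small location filter. This is only a sketch under stated assumptions: the paper does not say how "confined to that particular geographical area" is decided, so a great-circle (haversine) distance with a hypothetical `radius_km` cut-off is used here, and the hospital names and coordinates are illustrative.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby_hospitals(user_loc, hospitals, radius_km=10.0):
    """Cold-start stage: keep hospitals within radius_km of the user, nearest first.

    radius_km is an assumed parameter, not a value from the paper.
    """
    scored = [
        (haversine_km(user_loc[0], user_loc[1], h["lat"], h["lon"]), h)
        for h in hospitals
    ]
    scored.sort(key=lambda pair: pair[0])  # sort by distance only
    return [h for dist, h in scored if dist <= radius_km]

# Hypothetical records; names and coordinates are illustrative only.
hospitals = [
    {"name": "Hos1", "lat": 9.925, "lon": 78.120},
    {"name": "Hos2", "lat": 9.940, "lon": 78.150},
    {"name": "Hos3", "lat": 13.080, "lon": 80.270},
]
user_loc = (9.930, 78.130)
print([h["name"] for h in nearby_hospitals(user_loc, hospitals)])
```

The filtered list is what the second stage would then search, so a new user with no rating history still receives candidates.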
All rights reserved by www.grdjournals.com
Hospital Recommendation using Hybrid Approach (GRDJE / CONFERENCE / ICIET - 2016 / 068)
Fig. 1: Data Flow Diagram
III. MATRIX FACTORIZATION
Matrix factorization can be used to discover latent features underlying the interactions between two different kinds of entities. Given that each user has rated some items in the system, we would like to predict how the users would rate the items that they have not yet rated, so that we can make recommendations to the users.

Having discussed the intuition behind matrix factorization, we can now work through the mathematics. We have a set of users $U$ and a set of items $D$. Let $R$, of size $|U| \times |D|$, be the matrix that contains all the ratings that the users have assigned to the items. We assume that we would like to discover $K$ latent features. Our task, then, is to find two matrices $P$ (a $|U| \times K$ matrix) and $Q$ (a $|D| \times K$ matrix) such that their product approximates $R$:

$$R \approx P \times Q^{T} = \hat{R}$$

In this way, each row of $P$ represents the strength of the associations between a user and the features. Similarly, each row of $Q$ represents the strength of the associations between an item and the features. To get the prediction of the rating of an item $d_j$ by user $u_i$, we calculate the dot product of the two vectors corresponding to $u_i$ and $d_j$:

$$\hat{r}_{ij} = p_i^{T} q_j = \sum_{k=1}^{K} p_{ik} q_{jk}$$

Now we have to find a way to obtain $P$ and $Q$. One way to approach this problem is to first initialize the two matrices with some values, calculate how different their product is from $R$, and then try to minimize this difference iteratively. Such a method is called gradient descent, and it aims at finding a local minimum of the difference. The difference, usually called the error between the estimated rating and the real rating, can be calculated by the following equation for each user-item pair:

$$e_{ij}^{2} = (r_{ij} - \hat{r}_{ij})^{2} = \Big(r_{ij} - \sum_{k=1}^{K} p_{ik} q_{jk}\Big)^{2}$$

Here we consider the squared error because the estimated rating can be either higher or lower than the real rating. To minimize the error, we have to know in which direction to modify the values of $p_{ik}$ and $q_{jk}$. In other words, we need to know the gradient at the current values, and therefore we differentiate the above equation with respect to these two variables separately:

$$\frac{\partial}{\partial p_{ik}} e_{ij}^{2} = -2 (r_{ij} - \hat{r}_{ij})\, q_{jk} = -2 e_{ij} q_{jk}$$

$$\frac{\partial}{\partial q_{jk}} e_{ij}^{2} = -2 (r_{ij} - \hat{r}_{ij})\, p_{ik} = -2 e_{ij} p_{ik}$$
Having obtained the gradient, we can now formulate the update rules for both $p_{ik}$ and $q_{jk}$, where $e_{ij} = r_{ij} - \sum_{k=1}^{K} p_{ik} q_{jk}$ is the error on an observed rating:

$$p'_{ik} = p_{ik} - \alpha \frac{\partial}{\partial p_{ik}} e_{ij}^{2} = p_{ik} + 2 \alpha e_{ij} q_{jk}$$

$$q'_{jk} = q_{jk} - \alpha \frac{\partial}{\partial q_{jk}} e_{ij}^{2} = q_{jk} + 2 \alpha e_{ij} p_{ik}$$

Here, $\alpha$ is a constant whose value determines the rate of approaching the minimum. Usually we choose a small value for $\alpha$, say 0.0002. If we take too large a step towards the minimum, we risk missing it and oscillating around it.

A question might arise at this point: if we find two matrices $P$ and $Q$ such that $P \times Q^{T}$ approximates $R$, will our predictions of all the unseen ratings not simply be zero? In fact, we are not trying to find $P$ and $Q$ that reproduce $R$ exactly. Instead, we only try to minimize the errors of the observed user-item pairs. In other words, if we let $T$ be a set of tuples, each of the form $(u_i, d_j, r_{ij})$, such that $T$ contains all the observed user-item pairs together with the associated ratings, we are only trying to minimize every $e_{ij}$ for $(u_i, d_j, r_{ij}) \in T$. (In other words, $T$ is our set of training data.) As for the rest of the unknowns, we will be able to determine their values once the associations between the users, items and features have been learnt. Using the above update rules, we can iteratively perform the operation until the error converges to its minimum. We can check the overall error, calculated using the following equation, to determine when to stop the process:

$$E = \sum_{(u_i, d_j, r_{ij}) \in T} e_{ij}^{2} = \sum_{(u_i, d_j, r_{ij}) \in T} \Big(r_{ij} - \sum_{k=1}^{K} p_{ik} q_{jk}\Big)^{2}$$
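The update rules can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function names, the random initialisation, the seed and the step count of 5000 are assumptions, and no regularisation term is included. The rating matrix is the five-user, four-hospital example given in the proposed-system section, with 0 marking an unrated item.

```python
import random

def matrix_factorization(R, K=2, steps=5000, alpha=0.0002, seed=0):
    """Factor R (list of rows, 0 = unrated) into P (|U| x K) and Q (|D| x K)."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    P = [[rng.random() for _ in range(K)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(K)] for _ in range(n_items)]
    # T holds only the observed (user, item, rating) triples.
    T = [(i, j, R[i][j]) for i in range(n_users) for j in range(n_items) if R[i][j] > 0]
    for _ in range(steps):
        for i, j, r in T:
            e = r - sum(P[i][k] * Q[j][k] for k in range(K))  # e_ij
            for k in range(K):
                p_ik, q_jk = P[i][k], Q[j][k]
                P[i][k] = p_ik + 2 * alpha * e * q_jk  # p'_ik
                Q[j][k] = q_jk + 2 * alpha * e * p_ik  # q'_jk
    return P, Q, T

def predict(P, Q, i, j):
    """Predicted rating: dot product of user row i of P and item row j of Q."""
    return sum(p * q for p, q in zip(P[i], Q[j]))

def total_error(P, Q, T):
    """Overall squared error E over the observed ratings in T."""
    return sum((r - predict(P, Q, i, j)) ** 2 for i, j, r in T)

R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]
P, Q, T = matrix_factorization(R)
for i in range(len(R)):
    print([round(predict(P, Q, i, j), 2) for j in range(len(R[0]))])
print("E =", round(total_error(P, Q, T), 4))
```

Because only the triples in T drive the updates, the zero entries never pull predictions towards zero; their predicted values come from the learnt user and item factors.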
IV. GENETIC ALGORITHMS
Genetic algorithms (GAs) are more robust than conventional AI techniques. Unlike older AI systems, they do not break easily when the inputs change slightly or in the presence of reasonable noise. Also, in searching a large state space, a multi-modal state space, or an n-dimensional surface, a genetic algorithm may offer significant benefits over more typical search or optimization techniques. A GA simulates the survival of the fittest among individuals over consecutive generations in order to solve a problem. Each generation consists of a population of character strings that are analogous to the chromosomes in our DNA. Each individual represents a point in a search space and a possible solution. The individuals in the population are then made to go through a process of evolution. A GA is based on an analogy with the genetic structure and behavior of chromosomes within a population of individuals, using the following foundations. Individuals in a population compete for resources and mates. Those individuals most successful in each "competition" will produce more offspring than those that perform poorly. Genes from "good" individuals propagate throughout the population, so that two good parents will sometimes produce offspring that are better than either parent. Thus each successive generation becomes more suited to its environment.
V. SEARCH SPACE
A GA maintains a population of individuals within the search space, each representing a possible solution to a given problem. Each individual is coded as a finite-length vector of components, or variables, in terms of some alphabet, usually the binary alphabet {0, 1}. To continue the genetic analogy, these individuals are likened to chromosomes and the variables are analogous to genes. Thus a chromosome (solution) is composed of several genes (variables). A fitness score is assigned to each solution, representing the ability of an individual to "compete". The individual with the optimal (or, more generally, near-optimal) fitness score is sought. The GA aims to use selective "breeding" of the solutions to produce "offspring" better than the parents by combining information from the chromosomes.
Fig. 2: Search Space
The GA maintains a population of n chromosomes (solutions) with associated fitness values. Parents are selected to mate, on the basis of their fitness, producing offspring via a reproductive plan. Consequently highly fit solutions are given more opportunities to reproduce, so that offspring inherit characteristics from each parent. As parents mate and produce offspring, room must be made for the new arrivals since the population is kept at a static size. Individuals in the population die and are replaced by
the new solutions, eventually creating a new generation once all mating opportunities in the old population have been exhausted. In this way it is hoped that over successive generations better solutions will thrive while the least fit solutions die out. New generations of solutions are produced containing, on average, better genes than a typical solution in a previous generation. Each successive generation will contain more good "partial solutions" than previous generations. Eventually, once the population has converged and is not producing offspring noticeably different from those in previous generations, the algorithm itself is said to have converged to a set of solutions to the problem at hand.
VI. IMPLEMENTATION
A. Based on Natural Selection
After an initial population is randomly generated, the algorithm evolves the population through three operators: selection, which equates to survival of the fittest; crossover, which represents mating between individuals; and mutation, which introduces random modifications.
1) Selection Operator
Key idea: give preference to better individuals, allowing them to pass on their genes to the next generation. The goodness of each individual depends on its fitness. Fitness may be determined by an objective function or by a subjective judgment.
2) Crossover Operator
Crossover is the prime factor distinguishing GAs from other optimization techniques. Two individuals are chosen from the population using the selection operator. A crossover site along the bit strings is randomly chosen, and the values of the two strings are exchanged up to this point. If S1 = 000000 and S2 = 111111 and the crossover point is 2, then S1' = 110000 and S2' = 001111. The two new offspring created from this mating are put into the next generation of the population. By recombining portions of good individuals, this process is likely to create even better individuals.
Fig. 3: Cross Over
3) Mutation Operator
With some low probability, a portion of the new individuals will have some of their bits flipped. The purpose of mutation is to maintain diversity within the population and inhibit premature convergence. Mutation alone induces a random walk through the search space. Mutation and selection (without crossover) create a parallel, noise-tolerant, hill-climbing algorithm.
Fig. 4: Mutation
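The crossover and mutation operators can be sketched on bit strings as follows. The function names and the per-bit mutation rate are illustrative assumptions; the crossover call reproduces the S1/S2 example given for the crossover operator.

```python
import random

def single_point_crossover(s1, s2, point):
    """Exchange the first `point` bits of two equal-length bit strings."""
    return s2[:point] + s1[point:], s1[:point] + s2[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in bits
    )

child1, child2 = single_point_crossover("000000", "111111", 2)
print(child1, child2)  # 110000 001111

rng = random.Random(42)  # fixed seed only so that runs are repeatable
print(mutate(child1, 0.1, rng))
```

A low rate keeps most offspring intact while still injecting the occasional new bit pattern, which is what prevents the population from converging prematurely.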
B. Effects of Genetic Operators
Using selection alone will tend to fill the population with copies of the best individual from the population. Using selection and crossover operators will tend to cause the algorithm to converge on a good but sub-optimal solution. Using mutation alone induces a random walk through the search space. Using selection and mutation creates a parallel, noise-tolerant, hill-climbing algorithm.
C. Algorithm
1) Randomly initialize population(t).
2) Determine the fitness of population(t).
3) Repeat:
   a) Select parents from population(t).
   b) Perform crossover on the parents, creating population(t+1).
   c) Perform mutation on population(t+1).
   d) Determine the fitness of population(t+1).
4) Until the best individual is good enough.
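The steps above can be sketched as a compact loop. Several details are assumptions made only for illustration: fitness-proportionate (roulette-wheel) parent selection, a `target` fitness standing in for "good enough", a `max_generations` cap, and a toy fitness that counts the 1 bits in a chromosome.

```python
import random

def run_ga(fitness, n_bits, target, pop_size=20, max_generations=200,
           mutation_rate=0.02, seed=1):
    """Evolve bit-list chromosomes until fitness(best) >= target or the cap is hit."""
    rng = random.Random(seed)
    # Step 1: randomly initialize the population.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    # Step 2: determine fitness; remember the best individual seen so far.
    best = max(population, key=fitness)
    generation = 0
    # Steps 3-4: repeat until the best individual is good enough.
    while fitness(best) < target and generation < max_generations:
        # Small offset keeps the weights valid even if every fitness is zero.
        weights = [fitness(ind) + 1e-9 for ind in population]
        next_population = []
        while len(next_population) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)  # selection
            point = rng.randint(1, n_bits - 1)
            child = mother[:point] + father[point:]                         # crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # mutation
            next_population.append(child)
        population = next_population
        best = max([best] + population, key=fitness)
        generation += 1
    return best, generation

best, generations = run_ga(sum, n_bits=12, target=12)
print(best, "fitness:", sum(best), "generations:", generations)
```

Tracking `best` outside the population keeps the loop faithful to the listed steps while guaranteeing that the reported solution never gets worse from one generation to the next.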
VII. RELATED WORK
A. Existing System
The existing system extracts keywords from the profiles of the participating entities and filters the stored records to return the doctors and hospitals that match the user's chosen specialty and hospital type. NLP is used to rank the hospitals based on their reviews.
VIII. PROPOSED SYSTEM
The proposed system combines two different approaches. The first is collaborative filtering, in the form of matrix factorization, which can be used to discover latent features underlying the interactions between two different kinds of entities. Given that each user has rated some items in the system, we would like to predict how the users would rate the items that they have not yet rated, so that we can make recommendations to them. The second is the Genetic Algorithm (GA), an evolutionary algorithm based on the process of natural selection, which is used to optimise the search results until the desired result is found.

A. Addressing the Cold Start Problem
The inability to recommend products to a new user, whose history and preferences are unknown, is overcome by identifying the user's access location. Hospitals are recommended based on the user's current geographical access location.

B. Matrix Factorization
Matrix factorization works on the external ratings, which users give after visiting a hospital, based on their experience. Not every user visits every hospital, so matrix factorization is used to predict the rating a user would give to an unvisited hospital. The process finds like-minded users by grouping similar users together. Ratings range from 1 to 5, and 0 represents an unrated item. The ratings are represented as a matrix in which the rows represent users and the columns represent hospitals. For example:

        Hos1   Hos2   Hos3   Hos4
User1   5      3      0      1
User2   4      0      0      1
User3   1      1      0      5
User4   1      0      0      4
User5   0      1      5      4

After matrix factorization, the resulting matrix is:

        Hos1   Hos2   Hos3   Hos4
User1   4.97   2.98   2.18   0.98
User2   3.97   2.40   1.97   0.99
User3   1.02   0.93   5.32   4.93
User4   1.00   0.85   4.59   3.93
User5   1.36   1.07   4.89   4.12

C. Genetic Algorithm
The Genetic Algorithm is used for continuous comparison of the user's needs against the hospitals' facilities. The user is provided with a form containing search parameters such as emergency services, outpatient services and insurance claims. For each parameter the user chooses one of two options, "Mandatory" or "Optional" (Fig. 5). If the user chooses mandatory, the hospital must provide that facility; if the user chooses optional, the facility may or may not be needed. For example, if the user marks the insurance claims parameter as mandatory, the hospital must provide the insurance claim facility. Finally, the user selects the type of hospital wanted. There are two types: general hospitals and multi-specialty hospitals. Generally, issues such as cold and fever can be treated by general hospitals, while issues such as diet problems and cancer can only be treated in a specialized hospital. The choice is left entirely to the user.

There are two arrays: 1) the user's input array and 2) the hospital's facilities array. If the user wants a specialized hospital, the user's preferred specialties (stored in a separate table) are checked against the hospital's specialization services. The user's input is converted into an array of bits, with bit 1 for mandatory and bit 0 for optional. For instance, the bit array might be 10110 for 5 parameters. The hospital's facilities array has the same layout: if a facility is provided, 1 is marked against the parameter; otherwise 0 is marked. Before the procedure starts, the user's location is determined, the set of hospitals matching that location is stored in a list, and the search is carried out within that list. The user's input array and the hospital's facilities array are then compared. The mandatory fields must be checked first.
For instance, if the user has marked the "Emergency services" parameter as mandatory, the "Emergency services" field of the hospital's facilities array is checked first; if both are 1, it is a match. All the mandatory fields are compared in the same way. The optional fields are compared next. An optional field is encoded as 0. For example, if the labs parameter is optional but the hospital provides a lab facility, this is considered a hit. The hit is an added advantage, so it is considered a match.

The fitness value is calculated using the Mean Absolute Error (MAE). In the fitness formula, Ri represents the real rating of item j given by user i, Wik and ejk represent the weights of attribute k for user i and item j, and Mi represents the number of items rated by user i. A probabilistic selection is performed based on each individual's fitness, so that better individuals have an increased chance of being selected. Because the sum of the fitness values in a population is constant, an individual with lower fitness has a larger probability of being chosen. In the selection formula, Sc denotes the fitness of chromosome c, PS represents the number of individuals in the population, and Pc represents the selection probability for chromosome c.
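The selection step can be sketched as follows. The text does not reproduce the selection formula, so the form used here is an assumption chosen only because it favours lower MAE and makes the probabilities sum to one: each probability is (total - f_c) / ((PS - 1) * total), where `total` is the sum of all fitness values and f_c is the fitness of chromosome c. The function names are hypothetical.

```python
import random

def mae(actual, predicted):
    """Mean Absolute Error between two equal-length rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def selection_probabilities(fitness_values):
    """Assumed form: lower fitness (error) gets a larger share of probability."""
    ps = len(fitness_values)          # population size
    if ps == 0:
        return []
    total = sum(fitness_values)
    if ps == 1 or total == 0:
        return [1.0 / ps] * ps        # nothing to discriminate on
    return [(total - f) / ((ps - 1) * total) for f in fitness_values]

def select(population, fitness_values, rng):
    """Draw one individual according to the probabilities above."""
    probs = selection_probabilities(fitness_values)
    return rng.choices(population, weights=probs, k=1)[0]

fitness_values = [0.5, 1.0, 1.5]      # illustrative MAE values
probs = selection_probabilities(fitness_values)
print([round(p, 4) for p in probs])
print(select(["c1", "c2", "c3"], fitness_values, random.Random(0)))
```

With these three illustrative values, the chromosome with the smallest error receives the largest probability and the one with the largest error the smallest.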
The single-point mutation technique is used to introduce diversity into the recommendations. The mutation operator is also used to investigate and suggest improvements to a hospital by analysing users' input, since users' wants change often.

D. Combining Approaches
The genetic algorithm's result and the average of the hospital's external ratings are combined, and the combined value must be greater than the threshold value (which is also an MAE value). If the result is above the threshold, the hospital is recommended; otherwise it is discarded.
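The matching and combination steps can be illustrated with a short sketch. Two things are assumptions, since the text does not specify them: the match score is the fraction of parameters that count as hits, and it is combined with the average rating by scaling it to the 1 to 5 range and taking the mean. The facility bit strings and the threshold are hypothetical; the ratings are the Hos1 and Hos4 columns of the example rating matrix, and Hos2 is given a facility array that fails a mandatory check.

```python
def match_score(user_bits, hospital_bits):
    """Return None if a mandatory facility is missing, else the fraction of hits.

    user_bits: '1' = mandatory, '0' = optional.
    hospital_bits: '1' = facility provided, '0' = not provided.
    """
    hits = 0
    for want, has in zip(user_bits, hospital_bits):
        if want == "1" and has != "1":
            return None   # a mandatory facility is not offered: reject
        if has == "1":
            hits += 1     # mandatory match, or optional facility counted as a hit
    return hits / len(user_bits)

def recommend(user_bits, hospitals, threshold, top_n=3):
    """Combine the match score with the average external rating and filter."""
    ranked = []
    for h in hospitals:
        score = match_score(user_bits, h["facilities"])
        if score is None:
            continue
        rated = [r for r in h["ratings"] if r > 0]   # 0 means unrated
        avg_rating = sum(rated) / len(rated) if rated else 0.0
        combined = (5 * score + avg_rating) / 2      # assumed combination rule
        if combined > threshold:
            ranked.append((combined, h["name"]))
    ranked.sort(reverse=True)
    return [name for _, name in ranked[:top_n]]

hospitals = [
    {"name": "Hos1", "facilities": "10111", "ratings": [5, 4, 1, 1, 0]},
    {"name": "Hos2", "facilities": "00110", "ratings": [3, 0, 1, 0, 1]},
    {"name": "Hos4", "facilities": "10110", "ratings": [1, 1, 5, 4, 4]},
]
print(recommend("10110", hospitals, threshold=3.2))  # ['Hos1']
```

Hos2 is rejected because the first mandatory facility is missing, and Hos4 passes the mandatory check but its combined value of 3.0 falls below the illustrative threshold of 3.2.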
Fig. 5: UI
E. Challenges
The proposed system faces several challenges. New items cannot be recommended, and finding like-minded users remains a problem until sufficient reviews have been obtained.
IX. EXPERIMENTAL RESULTS
A. Table
Fig. 6
B. Graph
Fig. 7
To conclude, the more rigid the user requirements become, the more appropriate the recommendations are.
X. CONCLUSION
Nowadays, almost every website involved in the provision of services uses a recommendation system. Recommendation systems play a vital role in attracting customers and keeping them on the website. Hybrid filtering is one of the modern ways to design a recommendation system. Since the traditional approaches each fall short in certain respects, hybrid recommendation combines different traditional approaches to cover their shortcomings. This project combines two such approaches, Matrix Factorization and the GA. The results indicate that this approach recommends items to users with good accuracy.
XI. FUTURE ENHANCEMENTS
Implicit learning from users' actions can be used to deliver recommendations without manual input, enhancing the experience of browsing the internet. Recommendations could also be delivered through mobile SMS services by linking phone numbers with user profiles.
REFERENCES
[1] Bamshad Mobasher, Honghua Dai, Tao Luo, Yuqing Sun and Jiang Zhu, "Integrating Web Usage and Content Mining for More Effective Personalization," 2008.
[2] A. Naak, H. Hage, and E. Aïmeur, "A Multi-criteria Collaborative Filtering Approach for Research Paper Recommendation in Papyres," MCETECH 2009, LNBIP 26, pp. 25-39, 2009.
[3] Felice Ferrara, Nirmala Pudota, and Carlo Tasso, "A Keyphrase-Based Paper Recommender System," IRCDL 2011, CCIS 249, pp. 14-25, 2011.
[4] A. Nisgav and B. Patt-Shamir, "Finding similar users in social networks," Theory Comput. Syst., vol. 49, pp. 720-737, 2011.
[5] Bo X., Peng H., Fan Y., Ruimin S., "PipeCF: A scalable DHT-based collaborative filtering recommendation system," WWW 2004, May 17-22, 2004.
[6] D. E. Goldberg and J. H. Holland, "Genetic Algorithms and Machine Learning," Machine Learning, vol. 3, no. 2-3, pp. 95-99, 1988.
[7] Robin Burke, "Hybrid Web Recommender Systems".