
International Journal of Computer Trends and Technology (IJCTT) – volume 8 number 2– Feb 2014

SVM Based Ranking Model for Dynamic Adaption to Business Domains

Sandeep Reddy Daida 1, G. Manoj Kumar 2
1 Student, Department of CSE, MRCET, Hyderabad, India
2 Asst. Professor, Department of CSE, MRCET, Hyderabad, India

Abstract: Huge amounts of information can be obtained from vertical search domains. Often the information is so large that users need to browse further to get the required piece of information. In this context, ranking plays an important role. When ranking is required for every domain, it is a tedious task to develop ranking algorithms for each domain explicitly. Therefore it is the need of the hour to have a ranking model that can adapt to various domains implicitly. Recently Geng et al. proposed an algorithm with a ranking model which can adapt to various domains, avoiding the process of writing separate algorithms for different domains. In this paper we built a prototype application that implements the algorithm to rank the search results of various domains. The experimental results reveal that the algorithm is effective and can be used in real world applications.

Index Terms – Ranking, domain specific search, ranking adaptation

I.INTRODUCTION

The Internet has become an information super highway which provides a plethora of information from various domains, and information keeps being added to it from various quarters of the world. At the same time, information retrieval from the Internet has been very useful and lets people obtain the information they require. Vertical domains, of late, are helping people to obtain required information with ease. However, search engines are returning huge amounts of information that make end users spend more time picking the right information they need. This caused problems for end users. In order to overcome this problem, many ranking models came into existence. The ranking models include SVM [1], [2], RankNet [3], RankBoost [4], LambdaRank [5] and so on. These algorithms help to obtain information that serves users immediately.

There are domain specific search engines, which have moved from broad based searches to domain specific searches in order to provide vertical search information to end users. Such search engines are very useful to end users because they return the required documents. Thus many search engines came into existence that can serve images, music and video, acting on documents of different types and formats.

Ranking models are used by broad based search engines with various techniques; for instance, they use Term Frequency (TF) for ranking the results. However, broad based ranking models rank information of any kind. They are generalized models that cannot provide domain specific results. When ranking models are required for vertical domains, many models are needed in order to serve all domains, which is cumbersome and time consuming. Therefore it is very much required to have an algorithm that can adapt to the various domains found in the real world.
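To make the term-frequency idea concrete, the following is a minimal, illustrative Python sketch of a TF-style scorer of the kind broad based engines build on; the function name and the toy documents are our own examples, not any particular engine's formula.

from collections import Counter

def tf_score(query, document):
    # Count how often each query term occurs in the document and sum the counts.
    # This is only a toy term-frequency signal, not a production ranking formula.
    term_counts = Counter(document.lower().split())
    return sum(term_counts[t] for t in query.lower().split())

# Rank two toy documents for the query "svm ranking": higher TF score, higher rank.
docs = ["svm ranking model for vertical search", "music and video search engine"]
print(sorted(docs, key=lambda d: tf_score("svm ranking", d), reverse=True))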



From real world experience it is understood that broad based search engines provide useful information that cannot be consumed immediately unless the user browses further for more accurate information. To overcome this problem, ranking adaptation to new domains is required, much as adaptation is required of classification algorithms. Research in this direction can be found in the literature, as explored in [6], [7], [8], [9], and [10]. However, adapting to new domains, i.e., ranking model adaptation, is a new line of research. Concept drifting [11] and classifier adaptation exist in the literature; the former is related to predicting rankings while the latter deals with binary targets. These classifiers have some problems as they could not adapt to new vertical domains.

Of late, ranking algorithms came into existence that helped information retrieval by letting users get the most useful results at the top of the search results. Some of these algorithms convert the ranking problem into a classification problem (a small illustration of this pairwise reduction is given at the end of Section 2). Examples of such algorithms include LambdaRank [5], ListNet [15], RankNet [3], RankBoost [4], and Ranking SVM [1], [2]. These algorithms have a single objective, namely optimizing search results. Recently Geng et al. built a new ranking model adaptation algorithm that helps in ranking the search results of any domain, which reduced the need for developing a new ranking model for each domain. Some other ranking related models were developed in [7], [6], [16], and [9]. In this paper we implemented the model developed by Geng et al. [12].

The remainder of the paper is structured into sections. Section 2 focuses on review of literature. Section 3 provides information about the new ranking adaptation model. Section 4 gives details about the experiments, results and evaluation, while Section 5 provides conclusions.

II.RELATED WORK

In the literature many researchers have worked on ranking models that help to rank the results returned by search engines. However, much of this research focused on general information retrieval and was not specific to domains; when the models were specific to domains, they were not able to adapt to new domains. Language models for information retrieval [13], [12] and the classical BM25 [14] worked well. When only a few parameters are to be adjusted to obtain results, these models work fine. However, they were inadequate for adapting to new domains for the purpose of ranking. For this reason it is understood from the review of literature that there was a need for a new ranking model which could adapt to different domains spontaneously without causing problems. Before presenting our ranking model adaptation concept, we would like to review the previous models.

Recently Geng et al. [12] proposed a new ranking adaptation algorithm that helps in domain specific search. One algorithm can adapt to various domains and serve all of them with only a small amount of labeled data. The effective adaptation of ranking models lets the algorithm serve users who need diverse information from across the domains. Their algorithm addressed these problems and made domain specific search successful with a single ranking model, as it can adapt to new domains. In this paper we implemented that model and tested it practically. The empirical results are encouraging.
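Before moving on, the pairwise reduction mentioned in Section 1 can be illustrated with a short, hedged Python sketch: graded relevance judgments are turned into a binary classification problem over feature-vector differences, which is the reduction behind Ranking SVM [1], [2]. The function name and the arrays are hypothetical examples, not code from the works cited.

import numpy as np

def pairwise_transform(X, y):
    # X: feature vectors of the documents returned for one query.
    # y: graded relevance labels for the same documents.
    # For each pair where document i is labeled above document j, emit the
    # difference x_i - x_j as a positive example and its mirror as a negative one.
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                diffs.append(X[i] - X[j])
                labels.append(1)
                diffs.append(X[j] - X[i])
                labels.append(-1)
    return np.array(diffs), np.array(labels)

# Any linear binary classifier trained on (diffs, labels), e.g. a linear SVM,
# yields a weight vector w; sorting documents by w.x gives the ranking for a query.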



III.RANKING ADAPTATION

In this section we provide details regarding the ranking model that has been implemented in this paper. Before that, we describe the problem statement. Let the set of queries be Q and the set of documents be D. Human annotators label the search results. Other ranking models studied include PageRank [17] and HITS [18]; estimating a ranking is the purpose of these algorithms. The documents returned by them are few, and they depend on prior knowledge in the form of labels and training samples.

Ranking Adaptation SVM

In this paper we assume that the labeled data available in the ranking target domain is small compared to the auxiliary domain, and that it is quite possible to have a ranking model that adapts to various domains. Conventional regularization frameworks such as Neural Networks [19] and SVM [20] have problems in obtaining results across various domains: the problem is ill-posed, needs prior assumptions, and is thus not directly suitable for ranking model adaptation. Therefore we use a regularization framework in order to solve this problem elegantly with the help of an adaptive ranking function. The ranking adaptation model which has been proposed is as follows.
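The original equation is given by Geng et al. [12]; since it is not reproduced here, the following is a hedged sketch of the standard linear form of such a Ranking Adaptation SVM, in which w is the target-domain ranking model, w^a the auxiliary-domain model, delta in [0,1] balances the two regularizers, C > 0 weights the pairwise hinge losses, xi are slack variables, and P_q is the set of document pairs (i, j) for query q in which document i is labeled above document j (notation assumed for illustration):

\min_{\mathbf{w},\,\xi}\ \frac{1-\delta}{2}\,\lVert \mathbf{w}\rVert^{2} \;+\; \frac{\delta}{2}\,\lVert \mathbf{w}-\mathbf{w}^{a}\rVert^{2} \;+\; C\sum_{q\in Q}\sum_{(i,j)\in P_{q}}\xi^{q}_{ij}

\text{s.t.}\quad \mathbf{w}^{\top}\big(\mathbf{x}^{q}_{i}-\mathbf{x}^{q}_{j}\big)\ \ge\ 1-\xi^{q}_{ij},\qquad \xi^{q}_{ij}\ge 0,

for every query q and every pair (i, j) in P_q. Setting delta = 0 recovers the plain Ranking SVM trained only on target-domain data, while delta = 1 keeps the adapted model as close as possible to the auxiliary model.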

As expected in this paper, the data comes from various domains, each with features related to the corresponding domain, and the ranking model we implement adapts to those features. The use of domain specific features and the ability to adapt to new domains make this model useful and avoid the necessity of building many ranking models. We also incorporate the ranking loss concept into the framework, and document similarities are used in the framework. Margin rescaling is considered as an optimization problem for rescaling the margin violations, if any, with respect to adaptability, and slack rescaling is formulated in the same fashion. Both are presented below.
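The exact rescaled constraints are those of Geng et al. [12]; as a hedged illustration, the usual margin-rescaling and slack-rescaling patterns replace the fixed unit margin by a pair-dependent quantity Delta^q_{ij} (assumed here to be derived from the document similarities mentioned above), keeping the objective of the previous formulation:

\text{margin rescaling:}\quad \mathbf{w}^{\top}\big(\mathbf{x}^{q}_{i}-\mathbf{x}^{q}_{j}\big)\ \ge\ \Delta^{q}_{ij}-\xi^{q}_{ij},\qquad \xi^{q}_{ij}\ge 0

\text{slack rescaling:}\quad \mathbf{w}^{\top}\big(\mathbf{x}^{q}_{i}-\mathbf{x}^{q}_{j}\big)\ \ge\ 1-\frac{\xi^{q}_{ij}}{\Delta^{q}_{ij}},\qquad \xi^{q}_{ij}\ge 0

Intuitively, a larger Delta^q_{ij} either demands a wider margin for the pair or punishes a violation more heavily, while pairs of similar documents are allowed to stay closer in score.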

Adapting to Multiple Domains

The algorithm proposed in this paper can be extended to multiple domains in ranking model adaptation. New domains are learned on the fly by the same ranking model in the process of adaptation; thus the proposed system supports domain specific search with only one ranking model. Multiple domain adaptation can be formulated as follows, under the assumption that several auxiliary ranking functions help the system adapt to the new domain.
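The precise formulation is the one of Geng et al. [12]; one plausible sketch, assuming R auxiliary models w^{a_1}, ..., w^{a_R} combined with nonnegative weights theta_r that are learned together with the target model, is:

\min_{\mathbf{w},\,\theta,\,\xi}\ \frac{1-\delta}{2}\,\lVert\mathbf{w}\rVert^{2} \;+\; \frac{\delta}{2}\,\Big\lVert \mathbf{w}-\sum_{r=1}^{R}\theta_{r}\,\mathbf{w}^{a_{r}}\Big\rVert^{2} \;+\; C\sum_{q\in Q}\sum_{(i,j)\in P_{q}}\xi^{q}_{ij}

\text{s.t.}\quad \mathbf{w}^{\top}\big(\mathbf{x}^{q}_{i}-\mathbf{x}^{q}_{j}\big)\ge 1-\xi^{q}_{ij},\quad \xi^{q}_{ij}\ge 0,\quad \sum_{r=1}^{R}\theta_{r}=1,\quad \theta_{r}\ge 0,

so that the adapted model is pulled toward a convex combination of the auxiliary models, with auxiliary domains that rank target-domain pairs well receiving larger weights.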



IV.EXPERIMENTAL RESULTS

We built a prototype application with a web based interface to demonstrate the proof of concept. The application is built on the Java/JEE platform using JDBC, Servlets and JSP. The environment is a PC with 2 GB RAM and a Core 2 Duo processor. We used datasets such as TD2003 and TD2004, which were gathered from Internet sources. The performance of the model is measured using Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP). The results are compared with baseline methods such as Aux-Only, Tar-Only, and Lin-Comb.
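For reference, one common variant of NDCG at a truncation level k (the x-axis of the figures below) can be computed as in the following illustrative Python sketch; the function name and the toy relevance list are our own examples:

import numpy as np

def ndcg_at_k(relevances, k):
    # relevances: graded labels of the documents for one query, in the order
    # produced by the ranking model under evaluation.
    rels = np.asarray(relevances, dtype=float)[:k]
    if rels.size == 0:
        return 0.0
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))
    dcg = np.sum((2.0 ** rels - 1.0) * discounts)            # discounted cumulative gain
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts)          # gain of the ideal ordering
    return dcg / idcg if idcg > 0 else 0.0

# Example: NDCG@4 for a query whose returned documents carry labels 3, 2, 0, 1.
print(ndcg_at_k([3, 2, 0, 1], k=4))

The reported NDCG is then averaged over all test queries at each truncation level.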

Fig. 1 - TD2003 to TD2004 adaptation with five queries (NDCG vs. truncation level; Tar-Only, Aux-Only, Lin-Comb, RA-SVM)

As can be seen in Fig. 1, the adaptation performance of the proposed algorithm is compared with the other three algorithms. The adaptation is from the TD2003 dataset to the TD2004 dataset with five queries. Of all the methods, the proposed algorithm shows the best performance. When Aux-Only is compared with Tar-Only, Aux-Only outperforms the other model.

Fig. 2 - TD2003 to TD2004 adaptation with ten queries (NDCG vs. truncation level; Tar-Only, Aux-Only, Lin-Comb, RA-SVM)

As can be seen in Fig. 2, the adaptation performance of the proposed algorithm is compared with the other three algorithms. The adaptation is from the TD2003 dataset to the TD2004 dataset with ten queries. Of all the methods, the proposed algorithm shows the best performance. As the number of queries is increased, when Aux-Only is compared with Tar-Only, Tar-Only outperforms the other model.

Fig. 3 – NDCG results of web page search to image search adaptation with five labeled queries (NDCG vs. truncation level; Aux-Only, Tar-Only, Lin-Comb, RA-SVM, RA-SVM-GR, RA-SVM-MR, RA-SVM-SR)



As can be seen in Fig. 3, the adaptation is made from web page search to image search with five labeled queries. The performance of the proposed model is compared with the other baseline models, and the proposed model outperforms them.

Fig. 4 – NDCG results of web page search to image search adaptation with ten labeled queries (NDCG vs. truncation level; Aux-Only, Tar-Only, Lin-Comb, RA-SVM, RA-SVM-GR, RA-SVM-MR, RA-SVM-SR)

As can be seen in Fig. 4, the adaptation is made from web page search to image search with ten labeled queries. The performance of the proposed model is compared with the other baseline models, and the proposed model outperforms them.

Fig. 5 – NDCG results of web page search to image search adaptation with twenty labeled queries (NDCG vs. truncation level; Aux-Only, Tar-Only, Lin-Comb, RA-SVM, RA-SVM-GR, RA-SVM-MR, RA-SVM-SR)

As can be seen in Fig. 5, the adaptation is made from web page search to image search with twenty labeled queries. The performance of the proposed model is compared with the other baseline models, and the proposed model outperforms them.

Fig. 6 – NDCG results of web page search to image search adaptation with thirty labeled queries (NDCG vs. truncation level; Aux-Only, Tar-Only, Lin-Comb, RA-SVM, RA-SVM-GR, RA-SVM-MR, RA-SVM-SR)

As can be seen in Fig. 6, the adaptation is made from web page search to image search with thirty labeled queries. The performance of the proposed model is compared with the other baseline models, and the proposed model outperforms them.

V.CONCLUSION

In this paper we studied information retrieval systems such as search engines. We came to know that broad based search engines cannot provide domain specific information. There are certain domain specific search engines, but they have problems adapting to new domains. Such engines return huge amounts of information, which leaves the user unhappy, because the end user has to browse and spend time to find exactly the required information. This has been improved a lot by various ranking models; however, those ranking models could not adapt to new domains.



In this paper we implemented a ranking model that can adapt to new domains and thus avoids building a separate algorithm for each domain. Our model is based on the work done by Geng et al. [12]. We built a prototype application that demonstrates the proof of concept. The empirical results are encouraging.

REFERENCES

[1] R. Herbrich, T. Graepel, and K. Obermayer, "Large Margin Rank Boundaries for Ordinal Regression," Advances in Large Margin Classifiers, pp. 115-132, MIT Press, 2000.
[2] T. Joachims, "Optimizing Search Engines Using Clickthrough Data," Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '02), pp. 133-142, 2002.
[3] C.J.C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, "Learning to Rank Using Gradient Descent," Proc. 22nd Int'l Conf. Machine Learning (ICML '05), 2005.
[4] Y. Freund, R. Iyer, R.E. Schapire, Y. Singer, and G. Dietterich, "An Efficient Boosting Algorithm for Combining Preferences," J. Machine Learning Research, vol. 4, pp. 933-969, 2003.
[5] C.J.C. Burges, R. Ragno, and Q.V. Le, "Learning to Rank with Nonsmooth Cost Functions," Proc. Advances in Neural Information Processing Systems (NIPS '06), pp. 193-200, 2006.
[6] J. Blitzer, R. McDonald, and F. Pereira, "Domain Adaptation with Structural Correspondence Learning," Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120-128, July 2006.
[7] W. Dai, Q. Yang, G.-R. Xue, and Y. Yu, "Boosting for Transfer Learning," Proc. 24th Int'l Conf. Machine Learning (ICML '07), pp. 193-200, 2007.
[8] H. Shimodaira, "Improving Predictive Inference Under Covariate Shift by Weighting the Log-Likelihood Function," J. Statistical Planning and Inference, vol. 90, no. 18, pp. 227-244, 2000.
[9] J. Yang, R. Yan, and A.G. Hauptmann, "Cross-Domain Video Concept Detection Using Adaptive SVMs," Proc. 15th Int'l Conf. Multimedia, pp. 188-197, 2007.
[10] B. Zadrozny, "Learning and Evaluating Classifiers Under Sample Selection Bias," Proc. 21st Int'l Conf. Machine Learning (ICML '04), p. 114, 2004.
[11] R. Klinkenberg and T. Joachims, "Detecting Concept Drift with Support Vector Machines," Proc. 17th Int'l Conf. Machine Learning (ICML '00), pp. 487-494, 2000.
[12] J.M. Ponte and W.B. Croft, "A Language Modeling Approach to Information Retrieval," Proc. 21st Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 275-281, 1998.
[13] J. Lafferty and C. Zhai, "Document Language Models, Query Models, and Risk Minimization for Information Retrieval," Proc. 24th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '01), pp. 111-119, 2001.
[14] S. Robertson and D.A. Hull, "The TREC-9 Filtering Track Final Report," Proc. Ninth Text Retrieval Conf., pp. 25-40, 2000.
[15] Z. Cao and T.-Y. Liu, "Learning to Rank: From Pairwise Approach to Listwise Approach," Proc. 24th Int'l Conf. Machine Learning (ICML '07), pp. 129-136, 2007.
[16] H. Daume III and D. Marcu, "Domain Adaptation for Statistical Classifiers," J. Artificial Intelligence Research, vol. 26, pp. 101-126, 2006.
[17] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank Citation Ranking: Bringing Order to the Web," technical report, Stanford Univ., 1998.
[18] J.M. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, "The Web as a Graph: Measurements, Models and Methods," Proc. Int'l Conf. Combinatorics and Computing, pp. 1-18, 1999.
[19] F. Girosi, M. Jones, and T. Poggio, "Regularization Theory and Neural Networks Architectures," Neural Computation, vol. 7, pp. 219-269, 1995.
[20] V.N. Vapnik, Statistical Learning Theory. Wiley-Interscience, 1998.


