International Journal of Computer Trends and Technology (IJCTT) – volume 8 number 2 – Feb 2014

SVM Based Ranking Model for Dynamic Adaption to Business Domains

Sandeep Reddy Daida1, G. Manoj Kumar2
1 Student, Department of CSE, MRCET, Hyderabad, India
2 Asst. Professor, Department of CSE, MRCET, Hyderabad, India
Abstract: Huge amounts of information can be obtained from vertical search domains. Often the information is so large that users must browse further to get the required piece of information, and in this context ranking plays an important role. When ranking is required for every domain, developing a ranking algorithm for each domain explicitly is a tedious task. Therefore there is a pressing need for a ranking model that can adapt to various domains implicitly. Recently Geng et al. proposed an algorithm whose ranking model can adapt to various domains, which avoids writing separate algorithms for different domains. In this paper we build a prototype application that implements the algorithm to rank the search results of various domains. The experimental results reveal that the algorithm is effective and can be used in real world applications.

Index Terms – Ranking, domain specific search, ranking adaptation

I. INTRODUCTION

The Internet has become an information superhighway that provides a plethora of information from various domains. It has been around for a long time, and information is continually added to it from all quarters of the world. At the same time, information retrieval from the Internet has been very useful in letting people obtain the information they need, and vertical domains, of late, are helping people obtain that information with ease. However, search engines return huge amounts of information, which makes end users spend more time picking out exactly what they need and causes them problems. To overcome this problem, many ranking models came into existence, including Ranking SVM [1], [2], RankNet [3], RankBoost [4], LambdaRank [5] and so on. These algorithms help users obtain useful information immediately.

There are domain specific search engines that have moved from broad based search to domain specific search in order to provide vertical search information to end users. Such search engines are very useful because they return the required documents. Thus many search engines came into existence that serve images, music and video; they act on documents of various types and formats.

Ranking models are used by broad based search engines with various techniques, for example Term Frequency (TF) for ranking the results. However, broad based ranking models rank information of any kind; they are generalized models that cannot provide domain specific results. When ranking models are required for vertical domains, many models are needed in order to serve all domains, which is cumbersome and time consuming. Therefore an algorithm that can adapt to various domains is very much required in the real world.
From experiments in the real world it is understood that broad based search engines can provide useful information, but that information cannot be used immediately unless the user browses further for more accurate results. To overcome this problem, ranking adaptation to new domains is required of the underlying algorithms. Related research can be found in the literature, as explored in [6], [7], [8], [9], and [10]. However, adapting to new domains, i.e. ranking model adaptation, is a new line of research. Concept drifting [11] and classifier adaptation already exist in the literature; the former is related to predicting rankings while the latter deals with binary targets. These classifiers have problems in that they cannot adapt to new vertical domains.

Of late, ranking algorithms came into existence that help information retrieval by placing the most useful results at the top of the search results. Some of these algorithms convert the ranking problem into a classification problem. Examples of such algorithms include LambdaRank [5], ListNet [15], RankNet [3], RankBoost [4], and Ranking SVM [1], [2]. These algorithms have a single objective, namely optimizing the search results. Recently Geng et al. built a ranking model adaptation algorithm that can rank the search results of any domain, which reduces the need for developing a new ranking model for each domain. Some other ranking related models were developed in [7], [6], [16], and [9]. In this paper we implemented the model developed by Geng et al. [12].

II. RELATED WORK

In the literature many studies were found on ranking models that help to rank the results returned by search engines. However, much of this research focuses on general information retrieval and is not specific to domains, and when it is specific to domains, the resulting models cannot adapt to new domains. Language models for information retrieval [13], [12] and the classical BM25 model [14] worked well; when only a few parameters have to be adjusted to obtain results, these models work fine. However, they were inadequate for adapting to new domains for the purpose of ranking. For this reason it is understood from the review of the literature that there was a need for a new ranking model which could adapt to different domains spontaneously, without causing problems. Before presenting our ranking model adaptation concept, we review the previous models.

Recently Geng et al. [12] proposed a new ranking adaptation algorithm that helps in domain specific search. One algorithm can adapt to various domains and serve all of them with little labeled data. The effective adaptation of ranking models helps the algorithm serve users who need diverse information from across domains. Their algorithm addressed these problems and made domain specific search successful with a single ranking model, as it can adapt to new domains. In this paper we implemented that model and tested it practically; the empirical results are encouraging. The remainder of the paper is structured as follows. Section 2 focuses on the review of the literature. Section 3 provides information about the new ranking adaptation model. Section 4 gives details about the experiments, results and evaluation, while section 5 provides conclusions.

III. RANKING ADAPTATION

In this section we provide details regarding the ranking model that has been implemented in this paper. Before that, we describe the problem statement.
Let Q be the set of queries and D the set of documents returned for them; human annotators label the relevance of the returned search results. Other ranking models we studied include PageRank [17] and HITS [18]. Estimating a ranking is the purpose of these algorithms, but the sets of documents they return are small, and they depend on prior knowledge in the form of labels and training samples. As expected, the data used in this paper comes from various domains that have different, domain specific features, and the ranking model we implement adapts to those features. The use of domain specific features together with the ability to adapt to new domains is what makes this model useful and avoids the necessity of building many separate ranking models.
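As a concrete sketch of this standard pairwise learning-to-rank setting (the notation below is ours and is only an illustration of the setup just described):

\begin{align*}
& Q = \{q_1, \dots, q_M\}, \qquad
  D_i = \{x_i^1, \dots, x_i^{n_i}\} \subset \mathbb{R}^d
  \ \text{(feature vectors of the documents returned for } q_i\text{)}, \\
& y_i^j \ \text{: relevance label of } x_i^j \ \text{assigned by the human annotators}, \\
& \text{goal: learn a ranking function } f(x) = \langle w, x \rangle
  \ \text{such that } f(x_i^j) > f(x_i^k) \ \text{whenever } y_i^j > y_i^k .
\end{align*}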
Ranking Adaptation SVM
In this paper we assume that the amount of labeled data available for the target domain is small compared with the auxiliary domain, and that it is nevertheless quite possible to obtain a ranking model that adapts to the new domain. Conventional regularization frameworks such as Neural Networks [19] and SVM [20] have problems in obtaining results across various domains; learning each domain from scratch is an ill-posed problem that needs prior assumptions, and is thus not suitable for ranking model adaptation. Therefore we use a regularization framework that solves this problem elegantly with the help of an adaptive ranking function. The proposed ranking adaptation model is formulated as follows.
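For completeness, we sketch the Ranking Adaptation SVM (RA-SVM) objective in the spirit of Geng et al. [12]; the notation is ours and the original paper should be consulted for the exact formulation. The target ranking function f trades off the usual large-margin pairwise constraints on the target-domain data against staying close to the auxiliary ranking function f^a, with an adaptation parameter delta in [0, 1]:

\begin{align*}
\min_{f,\,\xi} \quad &
  \frac{1-\delta}{2}\,\lVert f \rVert^{2}
  + \frac{\delta}{2}\,\lVert f - f^{a} \rVert^{2}
  + C \sum_{i=1}^{M} \sum_{(j,k)\,:\,y_i^j > y_i^k} \xi_{ijk} \\
\text{s.t.} \quad &
  f(x_i^j) - f(x_i^k) \;\ge\; 1 - \xi_{ijk}, \qquad \xi_{ijk} \ge 0 .
\end{align*}

Roughly speaking, delta = 0 reduces to a standard Ranking SVM trained only on the target-domain data (similar in spirit to Tar-Only), while values of delta close to 1 keep f as close as possible to the auxiliary model (similar in spirit to Aux-Only).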
We also incorporate the ranking loss into this framework and make use of document similarities between the auxiliary and target domains. Margin rescaling is considered as an optimization problem that rescales the margin violations, if any, with respect to adaptability, and slack rescaling is formulated in the same fashion. Both variants are sketched below.
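As an illustration only, the two rescaling schemes can be written for the pairwise constraints as follows, where sigma_i(j,k) > 0 denotes a quantity derived from the document similarities (our notation); the exact form used in [12] may differ:

\begin{align*}
\text{margin rescaling:} \quad & f(x_i^j) - f(x_i^k) \;\ge\; \sigma_i(j,k) - \xi_{ijk}, && \xi_{ijk} \ge 0, \\
\text{slack rescaling:}  \quad & f(x_i^j) - f(x_i^k) \;\ge\; 1 - \frac{\xi_{ijk}}{\sigma_i(j,k)}, && \xi_{ijk} \ge 0 .
\end{align*}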
Adapting to Multiple Domains

The algorithm proposed in this paper can also be extended to adaptation from multiple auxiliary domains. New domains are learned on the fly by the same ranking model during the adaptation process, so the proposed system supports domain specific search with only one ranking model. Multiple domain adaptation can be formulated as follows, under the assumption that a set of auxiliary ranking functions is available to help the system adapt to the new domain.
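A sketch of one natural way to write the multi-domain objective, assuming auxiliary ranking functions f^{a_1}, ..., f^{a_R} with adaptation weights theta_r (our notation; see [12] for the exact formulation), is:

\begin{align*}
\min_{f,\,\xi} \quad &
  \frac{1-\delta}{2}\,\lVert f \rVert^{2}
  + \frac{\delta}{2} \sum_{r=1}^{R} \theta_r \,\lVert f - f^{a_r} \rVert^{2}
  + C \sum_{i=1}^{M} \sum_{(j,k)\,:\,y_i^j > y_i^k} \xi_{ijk} \\
\text{s.t.} \quad &
  f(x_i^j) - f(x_i^k) \;\ge\; 1 - \xi_{ijk}, \qquad \xi_{ijk} \ge 0, \qquad
  \theta_r \ge 0, \ \ \sum_{r=1}^{R} \theta_r = 1 .
\end{align*}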
IV. EXPERIMENTAL RESULTS

We built a prototype application with a web based interface to demonstrate the proof of concept. The application is built on the Java/JEE platform and uses JDBC, Servlets and JSP. The experiments were run on a PC with 2 GB of RAM and a Pentium Core 2 Duo processor. We used the TD2003 and TD2004 datasets, which were gathered from Internet sources. The performance of the model is measured using normalized discounted cumulative gain (NDCG) and mean average precision, and the results are compared with baseline methods such as Aux-Only, Tar-Only, and Lin-Comb.
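For reference, the listing below is a minimal sketch of how NDCG at a truncation level k (the x-axis of Figs. 1 to 6) can be computed. The class and method names are illustrative, this is not the prototype's code, and we assume the common exponential-gain, logarithmic-discount convention.

import java.util.Arrays;

/** Illustrative NDCG@k helper (not part of the prototype). */
public final class NdcgExample {

    /** DCG@k over graded relevance labels listed in the model's ranked order. */
    static double dcgAtK(int[] relevance, int k) {
        double dcg = 0.0;
        int n = Math.min(k, relevance.length);
        for (int i = 0; i < n; i++) {
            // Gain (2^rel - 1) discounted by log2(rank + 1), with ranks starting at 1.
            dcg += (Math.pow(2, relevance[i]) - 1) / (Math.log(i + 2) / Math.log(2));
        }
        return dcg;
    }

    /** NDCG@k = DCG@k of the predicted order divided by DCG@k of the ideal order. */
    static double ndcgAtK(int[] relevanceInPredictedOrder, int k) {
        int[] ideal = relevanceInPredictedOrder.clone();
        Arrays.sort(ideal);                              // ascending
        for (int i = 0; i < ideal.length / 2; i++) {     // reverse to descending
            int tmp = ideal[i];
            ideal[i] = ideal[ideal.length - 1 - i];
            ideal[ideal.length - 1 - i] = tmp;
        }
        double idealDcg = dcgAtK(ideal, k);
        return idealDcg == 0.0 ? 0.0 : dcgAtK(relevanceInPredictedOrder, k) / idealDcg;
    }

    public static void main(String[] args) {
        // Relevance labels of the top documents as ranked by some model.
        int[] rel = {2, 0, 1, 1, 0};
        System.out.printf("NDCG@5 = %.4f%n", ndcgAtK(rel, 5));
    }
}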
Fig. 1 – TD2003 to TD2004 adaptation with five labeled queries: NDCG at each truncation level for Tar-Only, Aux-Only, Lin-Comb and RA-SVM.

As can be seen in fig. 1, the adaptation performance of the proposed algorithm is compared with the other three algorithms. The adaptation is from the TD2003 dataset to the TD2004 dataset with five labeled queries. Of all the methods, the proposed algorithm shows the best performance, and when Aux-Only is compared with Tar-Only, Aux-Only outperforms Tar-Only.

Fig. 2 – TD2003 to TD2004 adaptation with ten labeled queries: NDCG at each truncation level for Tar-Only, Aux-Only, Lin-Comb and RA-SVM.

As can be seen in fig. 2, the same comparison is made for adaptation from the TD2003 dataset to the TD2004 dataset with ten labeled queries. Again the proposed algorithm shows the best performance; as the number of labeled queries increases, Tar-Only outperforms Aux-Only.

Fig. 3 – NDCG results of web page search to image search adaptation with five labeled queries (Aux-Only, Tar-Only, Lin-Comb, RA-SVM, RA-SVM-GR, RA-SVM-MR, RA-SVM-SR).
As can be seen in fig. 3, adaptation is made from web page search to image search with five labeled queries. The performance of the proposed model is compared with the other baseline models, and the proposed model outperforms them.

Fig. 4 – NDCG results of web page search to image search adaptation with ten labeled queries (Aux-Only, Tar-Only, Lin-Comb, RA-SVM, RA-SVM-GR, RA-SVM-MR, RA-SVM-SR).

As can be seen in fig. 4, adaptation is made from web page search to image search with ten labeled queries. The performance of the proposed model is compared with the other baseline models, and the proposed model outperforms them.

Fig. 5 – NDCG results of web page search to image search adaptation with twenty labeled queries (Aux-Only, Tar-Only, Lin-Comb, RA-SVM, RA-SVM-GR, RA-SVM-MR, RA-SVM-SR).

As can be seen in fig. 5, adaptation is made from web page search to image search with twenty labeled queries. The performance of the proposed model is compared with the other baseline models, and the proposed model outperforms them.

Fig. 6 – NDCG results of web page search to image search adaptation with thirty labeled queries (Aux-Only, Tar-Only, Lin-Comb, RA-SVM, RA-SVM-GR, RA-SVM-MR, RA-SVM-SR).

As can be seen in fig. 6, adaptation is made from web page search to image search with thirty labeled queries. The performance of the proposed model is compared with the other baseline models, and the proposed model outperforms them.

V. CONCLUSION

In this paper we studied information retrieval systems such as search engines. We observed that broad based search engines cannot provide domain specific information, while certain domain specific search engines have problems adapting to new domains. These search engines return huge amounts of information, which leaves users unsatisfied, because the end user has to browse and spend time to find exactly the required information. This has been improved considerably by various ranking models; however, those ranking models could not adapt to new domains.
In this paper we implemented a ranking model that can adapt to new domains and thus avoids the need to build a separate algorithm for each domain. Our model is based on the work done by Geng et al. [12]. We built a prototype application that demonstrated the proof of concept, and the empirical results are encouraging.
REFERENCES

[1] R. Herbrich, T. Graepel, and K. Obermayer, "Large Margin Rank Boundaries for Ordinal Regression," Advances in Large Margin Classifiers, pp. 115-132, MIT Press, 2000.
[2] T. Joachims, "Optimizing Search Engines Using Clickthrough Data," Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '02), pp. 133-142, 2002.
[3] C.J.C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, "Learning to Rank Using Gradient Descent," Proc. 22nd Int'l Conf. Machine Learning (ICML '05), 2005.
[4] Y. Freund, R. Iyer, R.E. Schapire, Y. Singer, and G. Dietterich, "An Efficient Boosting Algorithm for Combining Preferences," J. Machine Learning Research, vol. 4, pp. 933-969, 2003.
[5] C.J.C. Burges, R. Ragno, and Q.V. Le, "Learning to Rank with Nonsmooth Cost Functions," Proc. Advances in Neural Information Processing Systems (NIPS '06), pp. 193-200, 2006.
[6] J. Blitzer, R. McDonald, and F. Pereira, "Domain Adaptation with Structural Correspondence Learning," Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120-128, July 2006.
[7] W. Dai, Q. Yang, G.-R. Xue, and Y. Yu, "Boosting for Transfer Learning," Proc. 24th Int'l Conf. Machine Learning (ICML '07), pp. 193-200, 2007.
[8] H. Shimodaira, "Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function," J. Statistical Planning and Inference, vol. 90, no. 18, pp. 227-244, 2000.
[9] J. Yang, R. Yan, and A.G. Hauptmann, "Cross-Domain Video Concept Detection Using Adaptive SVMs," Proc. 15th Int'l Conf. Multimedia, pp. 188-197, 2007.
[10] B. Zadrozny, "Learning and Evaluating Classifiers under Sample Selection Bias," Proc. 21st Int'l Conf. Machine Learning (ICML '04), p. 114, 2004.
[11] R. Klinkenberg and T. Joachims, "Detecting Concept Drift with Support Vector Machines," Proc. 17th Int'l Conf. Machine Learning (ICML '00), pp. 487-494, 2000.
[12] J.M. Ponte and W.B. Croft, "A Language Modeling Approach to Information Retrieval," Proc. 21st Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 275-281, 1998.
[13] J. Lafferty and C. Zhai, "Document Language Models, Query Models, and Risk Minimization for Information Retrieval," Proc. 24th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '01), pp. 111-119, 2001.
[14] S. Robertson and D.A. Hull, "The TREC-9 Filtering Track Final Report," Proc. Ninth Text Retrieval Conf., pp. 25-40, 2000.
[15] Z. Cao and T.-Y. Liu, "Learning to Rank: From Pairwise Approach to Listwise Approach," Proc. 24th Int'l Conf. Machine Learning (ICML '07), pp. 129-136, 2007.
[16] H. Daume III and D. Marcu, "Domain Adaptation for Statistical Classifiers," J. Artificial Intelligence Research, vol. 26, pp. 101-126, 2006.
[17] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank Citation Ranking: Bringing Order to the Web," technical report, Stanford Univ., 1998.
[18] J.M. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, "The Web as a Graph: Measurements, Models and Methods," Proc. Int'l Conf. Combinatorics and Computing, pp. 1-18, 1999.
[19] F. Girosi, M. Jones, and T. Poggio, "Regularization Theory and Neural Networks Architectures," Neural Computation, vol. 7, pp. 219-269, 1995.
[20] V.N. Vapnik, Statistical Learning Theory. Wiley-Interscience, 1998.