Modeling and learning distributed word representation with metadata for question retrieval

Abstract: Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem poses a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with the metadata of category information within cQA pages for question retrieval, using two novel category-powered models. One is a basic category-powered model called MB-NET, and the other is an enhanced category-powered model called ME-NET, which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the Fisher kernel framework to transform them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further evaluate our approaches in large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.
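The Fisher kernel step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the aggregation, so the following rests on assumptions: a diagonal-covariance Gaussian mixture whose parameters have already been fit to word vectors, and a Fisher vector built only from the gradient with respect to the mixture means. The function name `fisher_vector` and its parameters are hypothetical and chosen for illustration; this is not presented as the authors' implementation.

```python
import numpy as np


def fisher_vector(X, weights, means, sigmas):
    """Map a variable-size set of word vectors to one fixed-length vector.

    Assumptions (not taken from the paper): a diagonal-covariance GMM with
    K components, already fit elsewhere, and only the mean-gradient part
    of the Fisher vector.

    X       : (T, D) word vectors of one question, T may vary
    weights : (K,)   mixture weights, summing to 1
    means   : (K, D) component means
    sigmas  : (K, D) component standard deviations
    returns : (K * D,) vector whose length does not depend on T
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    T, D = X.shape

    # Standardized differences, shape (T, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]

    # Log of weight * Gaussian density for every (word, component) pair.
    log_prob = (
        -0.5 * np.sum(diff ** 2, axis=2)
        - np.sum(np.log(sigmas), axis=1)[None, :]
        - 0.5 * D * np.log(2.0 * np.pi)
        + np.log(weights)[None, :]
    )

    # Posterior responsibilities, computed stably, shape (T, K).
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Normalized gradient with respect to the means, shape (K, D).
    grad = np.einsum("tk,tkd->kd", gamma, diff)
    grad /= T * np.sqrt(weights)[:, None]
    return grad.ravel()
```

Under these assumptions, a question with 4 words and a question with 9 words both map to a vector of length K * D, so the two can be compared directly, for example with a dot product or cosine similarity.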

