Guided HTM: Hierarchical Topic Model with Dirichlet Forest Priors

Abstract: Despite the proliferation of topic models, the organization of topics produced by probabilistic models needs improvement in two ways: a better structured presentation of topics and the incorporation of domain knowledge about the corpus. The structured presentation, i.e., the hierarchical topic model, helps in categorizing similar topics; the incorporation of domain knowledge enables the concentrated sampling of predefined keywords in the mixture parameter learning. This paper presents a hierarchical topic model with incorporated domain knowledge, called the Guided Hierarchical Topic Model (GHTM). Specifically, we allocated the prior information from the domain knowledge to the Dirichlet Forest prior. From this prior adjustment, we obtained a topic tree guided by the domain knowledge. This paper also contributes by enumerating four different knowledge extraction methods and applying the extracted knowledge to GHTM. We evaluated the performance of GHTM in terms of hierarchical clustering accuracy, and we found a significant improvement in hierarchical clustering as measured by F-measures. This improvement is also verified by the perplexity analyses. Additionally, we measured topic quality with KL-divergence and visualization, and these confirm the ability to better separate topic distributions. Finally, we tested the hierarchical topic quality through human experiments, which also revealed significant improvements originating from the guidance.
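As a rough illustration of how KL-divergence can indicate topic separation, the sketch below computes the mean symmetrized KL-divergence over all pairs of topic-word distributions. This is not taken from the paper: the abstract does not give the exact formulation, so the symmetrization, the smoothing constant `eps`, the function names, and the toy distributions are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    `eps` is an assumed smoothing constant that avoids log(0).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Add eps to every entry, then renormalize so each vector sums to 1.
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_sum, q_sum = sum(p_s), sum(q_s)
    return sum(
        (a / p_sum) * math.log((a / p_sum) / (b / q_sum))
        for a, b in zip(p_s, q_s)
    )


def mean_pairwise_kl(topics):
    """Average symmetrized KL-divergence over all unordered topic pairs."""
    n = len(topics)
    if n < 2:
        raise ValueError("need at least two topics")
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            forward = kl_divergence(topics[i], topics[j])
            backward = kl_divergence(topics[j], topics[i])
            total += 0.5 * (forward + backward)
            pairs += 1
    return total / pairs


# Toy topic-word distributions over a four-word vocabulary (hypothetical data).
overlapping = [[0.40, 0.30, 0.20, 0.10], [0.35, 0.30, 0.20, 0.15]]
separated = [[0.85, 0.05, 0.05, 0.05], [0.05, 0.05, 0.05, 0.85]]

print(mean_pairwise_kl(overlapping))
print(mean_pairwise_kl(separated))
```

Under these assumptions, a larger mean pairwise value means the topics place their probability mass on more distinct words, which is the sense in which topic distributions are "better separated".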

