Joint Image-Text News Topic Detection and Tracking by Multimodal Topic And-Or Graph
Abstract: This paper presents a novel method for automatically detecting and tracking news topics from multimodal TV news data. We propose a multimodal topic and-or graph (MT-AOG) to jointly represent textual and visual elements of news stories and their latent topic structures. An MT-AOG leverages a context-sensitive grammar that can describe the hierarchical composition of news topics by semantic elements about people involved, related places, and what happened, and model contextual relationships between elements in the hierarchy. We detect news topics through a cluster sampling process which groups stories about closely related events together. Swendsen-Wang cuts, an effective cluster sampling algorithm, is adopted for traversing the solution space and obtaining optimal clustering solutions by maximizing a Bayesian posterior probability. The detected topics are then continuously tracked and updated with incoming news streams. We generate topic trajectories to show how topics emerge, evolve, and disappear over time. The experimental results show that our method can explicitly describe the textual and visual data in news videos and produce meaningful topic trajectories. Our method also outperforms previous methods for the task of document clustering on the Reuters-21578 dataset and our novel dataset, the UCLA Broadcast News dataset.