Multi-Dimensional, Phrase-Based Summarization in Text Cubes.

Authors

Fangbo Tao, Honglei Zhuang, Chi Wang Yu, Qi Wang, Taylor Cassidy, Lance M Kaplan, Clare R Voss, Jiawei Han

Publication date

2016/9

Journal

IEEE Data Eng. Bull.

Pages

74-84

Description

To systematically analyze large numbers of textual documents, it is often desirable to manage documents (and their metadata) in a multi-dimensional text database (Text Cube). Such a structure provides the flexibility to understand local information at different granularities. Moreover, the contextualized analysis derived from the cube structure often yields comparative insights. To quickly digest the content of subsets of documents in the multi-dimensional context, we study the problem of phrase-based summarization of a subset of documents of interest. We propose a new phrase ranking measure that leverages the relation between document subsets induced by the multi-dimensional context and identifies phrases that truly distinguish the queried subset of documents from neighboring subsets (i.e., the background). Our quality evaluation suggests that the new measure, which involves dynamic, query-dependent background generation, is more effective than previous measures that use the whole corpus as a static background for finding representative phrases. Computing this measure is more expensive because answering one query requires access to many subsets of documents. We develop a cube-based analytical platform that implements an efficient solution by materializing a deliberately selected part of the statistics and using these statistics to perform online query processing within a constant latency constraint. Our experiments on a large news dataset demonstrate efficiency in both query processing time and storage cost.
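The idea of ranking phrases against a query-dependent background, rather than the whole corpus, can be illustrated with a small sketch. This is not the measure proposed in the paper: the scoring rule below is a plain smoothed frequency ratio chosen for illustration, and the toy data, the dimension names (topic, country), the function names, and the smoothing constant are all hypothetical. Sibling cells are assumed here to be cells that differ from the queried cell in exactly one dimension value.

```python
from collections import Counter

# Hypothetical toy text cube: each document carries dimension values
# (topic, country) and a list of already-extracted phrases.
DOCS = [
    {"topic": "economy", "country": "US",
     "phrases": ["interest rate", "federal reserve", "stock market"]},
    {"topic": "economy", "country": "US",
     "phrases": ["federal reserve", "interest rate"]},
    {"topic": "economy", "country": "China",
     "phrases": ["stock market", "trade surplus", "interest rate"]},
    {"topic": "sports", "country": "US",
     "phrases": ["super bowl", "stock market"]},
]


def cell_docs(docs, cell):
    """Return the documents whose dimension values match the cell."""
    return [d for d in docs if all(d[k] == v for k, v in cell.items())]


def doc_freq(docs):
    """Fraction of documents in which each phrase occurs at least once."""
    if not docs:
        return {}
    counts = Counter(p for d in docs for p in set(d["phrases"]))
    return {p: c / len(docs) for p, c in counts.items()}


def sibling_cells(docs, cell):
    """Cells that differ from the queried cell in exactly one dimension value."""
    siblings = []
    for dim, value in cell.items():
        for other in sorted({d[dim] for d in docs} - {value}):
            siblings.append({**cell, dim: other})
    return siblings


def rank_phrases(docs, cell, smoothing=0.1):
    """Rank phrases of the queried cell by a smoothed ratio of their
    frequency in the cell to their mean frequency in sibling cells.
    This ratio is an illustrative stand-in, not the paper's measure."""
    target = doc_freq(cell_docs(docs, cell))
    backgrounds = [doc_freq(cell_docs(docs, s)) for s in sibling_cells(docs, cell)]
    scores = {}
    for phrase, freq in target.items():
        if backgrounds:
            bg = sum(b.get(phrase, 0.0) for b in backgrounds) / len(backgrounds)
        else:
            bg = 0.0
        scores[phrase] = freq / (bg + smoothing)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for phrase, score in rank_phrases(DOCS, {"topic": "economy", "country": "US"}):
        print(f"{phrase}: {score:.2f}")
```

In this toy data, a phrase that is common in the queried cell but also common in its sibling cells is pushed down the ranking, while a phrase that appears only in the queried cell rises to the top. Note that every query touches several sibling cells, which is the cost the abstract refers to when it describes materializing selected statistics in advance.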

Total citations

Cited by 33

