Authors
Han He, Lei Wu, Xiaokun Yang, Hua Yan, Zhimin Gao, Yi Feng, George Townsend
Publication date
2018
Conference
Information Technology-New Generations: 15th International Conference on Information Technology
Pages
421-426
Publisher
Springer International Publishing
Description
Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). But many non-Latin languages have hieroglyphic writing systems, involving a large alphabet with thousands or millions of characters. Each character is composed of even smaller parts, which previous work has often ignored. In this paper, we propose a novel architecture employing two stacked Long Short-Term Memory Networks (LSTMs) to learn sub-character-level representations and capture deeper levels of semantic meaning. To build a concrete study and substantiate the efficiency of our neural architecture, we take Chinese Word Segmentation as a research case. Chinese is a typical case among such languages: every character contains several components called radicals. Our networks employ a shared radical-level embedding to solve both Simplified and …
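A minimal sketch of the dual-LSTM idea the abstract describes: a radical-level LSTM composes each character from its radicals, and a character-level LSTM tags the resulting character sequence for word segmentation. The class name, layer sizes, four-way tag set (e.g. BMES), and use of the final hidden state as the character vector are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of the two stacked LSTMs described in the abstract.
# All names, dimensions, and the tagging scheme are assumptions.
import torch
import torch.nn as nn

class SubCharSegmenter(nn.Module):
    def __init__(self, n_radicals, n_tags=4, rad_dim=50, char_dim=100, hid_dim=100):
        super().__init__()
        # Shared radical embedding (the abstract reports sharing it across
        # Simplified and Traditional Chinese).
        self.rad_embed = nn.Embedding(n_radicals, rad_dim, padding_idx=0)
        # LSTM 1: composes a character vector from its radical sequence.
        self.rad_lstm = nn.LSTM(rad_dim, char_dim, batch_first=True)
        # LSTM 2: reads the character vectors and scores segmentation tags.
        self.char_lstm = nn.LSTM(char_dim, hid_dim, batch_first=True)
        self.tag_scores = nn.Linear(hid_dim, n_tags)

    def forward(self, radicals):
        # radicals: (batch, n_chars, n_rads_per_char) radical ids, 0 = pad
        b, n_chars, n_rads = radicals.shape
        rad_vecs = self.rad_embed(radicals.view(b * n_chars, n_rads))
        # Use the final hidden state of the radical LSTM as the character vector
        # (an assumption; pooling strategies vary).
        _, (h, _) = self.rad_lstm(rad_vecs)
        char_vecs = h[-1].view(b, n_chars, -1)
        out, _ = self.char_lstm(char_vecs)
        return self.tag_scores(out)  # (batch, n_chars, n_tags)

# Toy usage: 2 sentences of 5 characters, each split into at most 3 radicals.
model = SubCharSegmenter(n_radicals=300)
tags = model(torch.randint(1, 300, (2, 5, 3)))
print(tags.shape)  # torch.Size([2, 5, 4])
```

The key design point is that the radical embedding table is shared across writing variants, so the lower LSTM can transfer sub-character knowledge between Simplified and Traditional Chinese while the upper LSTM handles segmentation.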
Total citations
[citation histogram, 2018–2023; per-year counts not recoverable]
Scholar articles
H He, L Wu, X Yang, H Yan, Z Gao, Y Feng, G Townsend - Information Technology-New Generations: 15th International Conference on Information Technology, 2018