Authors
Krzysztof Fiok, Waldemar Karwowski, Edgar Gutierrez, Mohammad Reza-Davahli
Publication date
2020/5/14
Journal
Applied Sciences
Volume
10
Issue
10
Pages
3386
Publisher
MDPI
Description
After the advent of GloVe and Word2vec, the dynamic development of language models (LMs) used to generate word embeddings has enabled the creation of better text classifier frameworks. With newer LMs, the vector representations of words are no longer static but context-aware. However, the quality of results provided by state-of-the-art LMs comes at the price of speed. Our goal was to present a benchmark to provide insight into the speed–quality trade-off of a sentence classifier framework based on word embeddings provided by selected LMs. We used a recurrent neural network with gated recurrent units to create sentence-level vector representations from the word embeddings provided by an LM, followed by a single fully connected layer for classification. Benchmarking was performed on two sentence classification data sets: the Sixth Text REtrieval Conference (TREC6) set and a 1000-sentence data set of our own design. Our Monte Carlo cross-validated results based on these two data sources demonstrated that the newest deep learning LMs provided improvements over GloVe and FastText in terms of weighted Matthews correlation coefficient (MCC) scores. We postulate that progress in LMs is more apparent when more difficult classification tasks are addressed.
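The classifier framework the abstract describes can be sketched in a few lines. The following is an illustrative PyTorch reconstruction, not the authors' code: the class name, hidden size (128), and the assumption of pre-computed 300-dimensional word embeddings are ours; only the overall structure, a GRU encoder over word embeddings followed by a single fully connected layer, comes from the description above.

import torch
import torch.nn as nn

class SentenceClassifier(nn.Module):
    """GRU sentence encoder + one fully connected classification layer (sketch)."""

    def __init__(self, emb_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # The GRU consumes a (batch, seq_len, emb_dim) sequence of word
        # embeddings produced by some LM; its final hidden state serves
        # as the sentence-level vector representation.
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        # Single fully connected layer mapping the sentence vector to class logits.
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        _, h_n = self.gru(embeddings)   # h_n: (1, batch, hidden_dim)
        return self.fc(h_n.squeeze(0))  # logits: (batch, num_classes)

# Example: a batch of 8 sentences, 20 tokens each, with 300-d embeddings
# (300 is the typical GloVe/FastText dimensionality; an assumption here).
# TREC6 has six coarse question classes, hence num_classes=6.
model = SentenceClassifier(emb_dim=300, hidden_dim=128, num_classes=6)
logits = model(torch.randn(8, 20, 300))
print(logits.shape)  # torch.Size([8, 6])

For evaluation, scikit-learn's matthews_corrcoef computes a multiclass MCC; the exact weighting scheme the authors applied across classes and Monte Carlo folds is not specified in this abstract.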
Total citations
[citation histogram by year, 2020–2024]