Authors
Yuna Jeong, Myunggwon Hwang, Wonkyung Sung
Publication date
2023/3
Journal
Multimedia Tools and Applications
Volume
82
Issue
7
Pages
9855-9870
Publisher
Springer US
Description
Recently, nonlinear machine-learning models have been applied effectively to multimedia data, contributing greatly to various downstream tasks. However, nonlinear models require large amounts of training data to properly train their many parameters and achieve reasonable performance. Using a large amount of data significantly increases time and cost, which are limited resources in model development and distribution processes. The goal of our study is to construct a core set that approximates the entire original dataset so that we can quickly observe performance changes caused by model redesign or parameter changes in machine learning deployment. The core set is mainly composed of informative samples that contribute highly to training. We measure the contribution of each sample based on dataset distillation and perform area-based sampling for generalization. The core set can be constructed in a …