Training data selection based on dataset distillation for rapid deployment in machine-learning workflows

Authors

Yuna Jeong, Myunggwon Hwang, Wonkyung Sung

Publication date

2023/3

Journal

Multimedia Tools and Applications

Pages

9855-9870

Publisher

Springer US

Description

Recently, nonlinear machine-learning models have been effectively applied to multimedia data, contributing greatly to various downstream tasks. However, large amounts of training data are required to properly train their many parameters and achieve reasonable performance. Using a large amount of data significantly increases time and cost, which are limited resources in model development and distribution processes. The goal of our study is to construct a core set that approximates the entire original dataset so that we can quickly observe performance changes caused by model redesign or parameter changes in machine-learning deployment. The core set is mainly composed of informative samples that contribute highly to training. We measure the contribution of each sample based on dataset distillation and perform area-based sampling for generalization. The core set can be constructed in a …
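
The description is truncated, so the exact scoring and sampling procedure is not available here. As a rough illustration only, the idea of combining a per-sample contribution score with region-based sampling might be sketched as follows. Everything in this sketch is an assumption: the function name `select_core_set`, the uniform grid used as the "areas", and the `scores` input are hypothetical, and the distillation-based scoring itself is not reproduced.

```python
import numpy as np


def select_core_set(features, scores, bins_per_axis=4, per_cell=2):
    """Pick the top-scoring samples from each cell of a uniform grid.

    features: (n, d) array of sample embeddings (hypothetical input).
    scores:   (n,) array of per-sample contribution scores (hypothetical;
              the article derives its scores from dataset distillation,
              which is NOT implemented here).
    Returns a sorted list of selected sample indices.
    """
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Normalize every axis to [0, 1] so one grid covers the whole feature range.
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on constant axes

    # Map each sample to an integer grid cell; clamp the maximum into the last cell.
    scaled = (features - lo) / span * bins_per_axis
    cells = np.minimum(scaled, bins_per_axis - 1).astype(int)

    # Group sample indices by the cell they fall into.
    groups = {}
    for idx, cell in enumerate(map(tuple, cells)):
        groups.setdefault(cell, []).append(idx)

    # Keep the highest-scoring samples of every occupied cell, so that
    # sparse regions of the feature space stay represented.
    selected = []
    for members in groups.values():
        ranked = sorted(members, key=lambda i: scores[i], reverse=True)
        selected.extend(ranked[:per_cell])
    return sorted(selected)


# Toy usage: two clusters, one sample kept per grid cell.
toy_features = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9]]
toy_scores = [0.1, 0.9, 0.5, 0.2]
print(select_core_set(toy_features, toy_scores, bins_per_axis=2, per_cell=1))  # [1, 2]
```

Selecting per region rather than globally by score is one plausible reading of "for generalization": a purely score-ranked subset could concentrate in a small part of the feature space.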

Total citations

Cited by 1

2024: 1
