Authors
Dung Duc NGUYEN, Maike ERDMANN, Tomoya TAKEYOSHI, Gen HATTORI, Kazunori MATSUMOTO, Chihiro ONO
Publication date
2013/11/1
Journal
IEICE TRANSACTIONS on Information and Systems
Volume
96
Issue
11
Pages
2376-2384
Publisher
The Institute of Electronics, Information and Communication Engineers
Description
The abundance of information published on the Internet makes filtering of hazardous Web pages a difficult yet important task. Supervised learning methods such as Support Vector Machines (SVMs) can be used to identify hazardous Web content. However, scalability is a big challenge, especially if we have to train multiple classifiers, since different policies exist on what kind of information is hazardous. We therefore propose two different strategies to train multiple SVMs for personalized Web content filters. The first strategy identifies common data clusters and then performs optimization on these clusters in order to obtain good initial solutions for individual problems. This initialization shortens the path to the optimal solutions and reduces the training time on individual training sets. The second approach is to train all SVMs simultaneously. We introduce an SMO-based kernel-biased heuristic that balances the …
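The warm-start idea behind the first strategy can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not specify how common clusters are found or how the SMO steps are organized, so the sketch swaps in plain dual coordinate ascent on a bias-free, L2-loss (squared hinge) SVM dual, treats the shared points as already identified, and assumes for simplicity that all users label them the same way. The names `rbf_kernel`, `train_svm_dual`, `label`, `demo`, `user_a`, and `user_b`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def train_svm_dual(K, y, C=1.0, alpha0=None, tol=1e-8, max_sweeps=5000):
    """Coordinate ascent on: max sum(a) - 0.5 * a^T Qd a, subject to a >= 0,
    with Qd = (y y^T) * K + I / (2C). This is an assumed stand-in for SMO.
    Returns (alpha, number of sweeps used)."""
    n = len(y)
    Qd = (y[:, None] * y[None, :]) * K + np.eye(n) / (2.0 * C)
    if alpha0 is None:
        alpha = np.zeros(n)                      # cold start
    else:
        alpha = np.maximum(np.asarray(alpha0, dtype=float), 0.0)  # warm start
    grad = 1.0 - Qd @ alpha                      # gradient of the dual objective
    for sweep in range(1, max_sweeps + 1):
        max_step = 0.0
        for i in range(n):
            # Exact maximization over coordinate i, projected onto a_i >= 0.
            new_ai = max(alpha[i] + grad[i] / Qd[i, i], 0.0)
            delta = new_ai - alpha[i]
            if delta != 0.0:
                alpha[i] = new_ai
                grad -= delta * Qd[:, i]         # keep the gradient up to date
                max_step = max(max_step, abs(delta))
        if max_step < tol:                       # no coordinate moved noticeably
            return alpha, sweep
    return alpha, max_sweeps


def label(X):
    """Hypothetical labelling rule, shared by all users for simplicity."""
    return np.where(X[:, 0] + X[:, 1] > 0.0, 1.0, -1.0)


def demo(seed=0):
    rng = np.random.default_rng(seed)
    # Points assumed to appear in every user's training set.
    X_common = rng.normal(size=(60, 2))
    y_common = label(X_common)
    # Solve the shared sub-problem once.
    alpha_common, _ = train_svm_dual(rbf_kernel(X_common), y_common)

    results = {}
    for user in ("user_a", "user_b"):
        X_extra = rng.normal(size=(15, 2))       # user-specific points
        X = np.vstack([X_common, X_extra])
        y = np.concatenate([y_common, label(X_extra)])
        K = rbf_kernel(X)
        alpha_cold, sweeps_cold = train_svm_dual(K, y)
        # Reuse the shared solution; new points start at zero.
        alpha0 = np.concatenate([alpha_common, np.zeros(len(X_extra))])
        alpha_warm, sweeps_warm = train_svm_dual(K, y, alpha0=alpha0)
        results[user] = (alpha_cold, alpha_warm, sweeps_cold, sweeps_warm)
        print(user, "cold sweeps:", sweeps_cold, "warm sweeps:", sweeps_warm)
    return results


results = demo()
```

Because this dual is strictly concave, the cold and warm runs should reach the same multipliers; only the number of sweeps may differ. The sweep counts are printed for inspection, and the sketch makes no claim about the size of any speedup on real Web data.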
Scholar articles
DD Nguyen, M Erdmann, T Takeyoshi, G Hattori… - IEICE TRANSACTIONS on Information and Systems, 2013