Authors
Igor Kotenko, Andrey Chechulin, Andrey Shorov, Dmitry Komashinsky
Publication date
2014
Conference
Advances in Data Mining. Applications and Theoretical Aspects: 14th Industrial Conference, ICDM 2014, St. Petersburg, Russia, July 16-20, 2014. Proceedings 14
Pages
39-54
Publisher
Springer International Publishing
Description
The paper considers the problem of automated categorization of web sites for systems that block web pages containing inappropriate content. We apply Machine Learning and Data Mining methods to analyze page text, HTML tags, URLs, and other information. We also suggest techniques for analyzing sites that provide information in different languages. The architecture and algorithms of a system for collecting, storing, and analyzing the data required for site classification are presented. Results of experiments on the correspondence of web sites to different categories are given, and the classification quality is evaluated. The classification system developed as a result of this work is implemented in F-Secure mass production systems performing analysis of web content.