Authors
Konstantin Avrachenkov, Paulo Gonçalves, Arnaud Legout, Marina Sokol
Publication date
2011/12
Journal
Proceeding of NIPS Big Learning Workshop
Description
P2P downloads still represent a large portion of today’s Internet traffic. More than 100 million users run BitTorrent and generate more than 30% of the total Internet traffic [7]. Recently, significant research effort has gone into developing tools for the automatic classification of Internet traffic by application [9, 8, 11]. The purpose of the present work is to provide a framework for subclassifying the P2P traffic generated by the BitTorrent protocol. Unlike previous works [9, 8, 11], we cannot rely on packet-level characteristics or on standard supervised machine learning methods. The application of standard supervised machine learning methods in [9, 8, 11] relies on the availability of a large set of parameters (packet size, packet interarrival time, etc.). Since all these P2P transfers use the same BitTorrent protocol, this set of parameters cannot be used to classify P2P content and users. Instead, we can make use of the bipartite user-content graph. This is a graph formed by two sets of nodes: the set of users (peers) and the set of contents (downloaded files). From this basic bipartite graph we also construct the user graph, where two users are connected if they download the same content, and the content graph, where two files are connected if they are both downloaded by at least one common user. The general intuition is that users with similar interests download similar contents. This intuition can be rigorously formalized with the help of the graph-based semi-supervised learning approach [13].
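The construction of the user graph and the content graph from the bipartite user-content graph can be sketched in a few lines of Python. This is a minimal illustration, assuming the raw data is available as a list of (user, content) download pairs; the function name build_graphs, the input format, and the choice to weight each projected edge by the number of shared neighbours are hypothetical and not taken from the original work (an edge exists exactly when its weight is at least 1, matching the definitions above).

```python
from collections import defaultdict
from itertools import combinations


def build_graphs(downloads):
    """Build the bipartite user-content graph and its two projections.

    downloads: iterable of (user, content) pairs (hypothetical input format).
    Returns (user_to_contents, user_graph, content_graph). Each projection maps
    a sorted node pair to the number of shared neighbours (assumed edge weight).
    """
    # Bipartite graph stored as adjacency sets on both sides.
    user_to_contents = defaultdict(set)
    content_to_users = defaultdict(set)
    for user, content in downloads:
        user_to_contents[user].add(content)
        content_to_users[content].add(user)

    # User graph: two users are linked if they downloaded at least one common file.
    user_graph = defaultdict(int)
    for users in content_to_users.values():
        for u, v in combinations(sorted(users), 2):
            user_graph[(u, v)] += 1

    # Content graph: two files are linked if at least one user downloaded both.
    content_graph = defaultdict(int)
    for contents in user_to_contents.values():
        for a, b in combinations(sorted(contents), 2):
            content_graph[(a, b)] += 1

    return dict(user_to_contents), dict(user_graph), dict(content_graph)


# Toy example with made-up peers and files.
downloads = [("alice", "f1"), ("alice", "f2"), ("bob", "f2"),
             ("bob", "f3"), ("carol", "f3")]
_, user_graph, content_graph = build_graphs(downloads)
print(user_graph)     # {('alice', 'bob'): 1, ('bob', 'carol'): 1}
print(content_graph)  # {('f1', 'f2'): 1, ('f2', 'f3'): 1}
```

Enumerating all pairs per file or per user is quadratic in the node degree, so for very popular torrents a sparse-matrix product (B·Bᵀ and Bᵀ·B for the user-content incidence matrix B) would likely be the more practical choice.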
The main idea of the graph-based semi-supervised learning approach is to exploit instance smoothness over the graph. Namely, if one data point has many …
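The smoothness idea can be illustrated with a basic label-propagation scheme: each unlabelled node repeatedly takes the weighted average of its neighbours' class scores while the labelled (seed) nodes stay clamped, in the spirit of the well-known harmonic-function approach. This is a swapped-in textbook technique shown only for intuition, not necessarily the method of [13] or of the authors; the function propagate_labels, its parameters, and the class names are hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, classes, iterations=100):
    """Basic iterative label propagation on an undirected weighted graph.

    edges: dict mapping (u, v) -> positive weight (e.g. a projected graph).
    seeds: dict mapping labelled node -> class label (kept fixed).
    classes: sequence of possible class labels.
    Returns dict node -> predicted class, or None if no label reached the node.
    """
    neighbours = defaultdict(list)
    for (u, v), w in edges.items():
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))
    nodes = set(neighbours) | set(seeds)

    # One-hot scores for seeds, all-zero scores for unlabelled nodes.
    scores = {n: {c: 0.0 for c in classes} for n in nodes}
    for n, c in seeds.items():
        scores[n][c] = 1.0

    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            total = sum(w for _, w in neighbours[n])
            if n in seeds or total == 0:
                # Clamp labelled nodes; leave isolated nodes unchanged.
                new_scores[n] = scores[n]
                continue
            # Smoothness: a node's scores become the weighted neighbour average.
            new_scores[n] = {
                c: sum(w * scores[m][c] for m, w in neighbours[n]) / total
                for c in classes
            }
        scores = new_scores

    return {
        n: max(classes, key=lambda c: scores[n][c])
        if any(scores[n][c] > 0 for c in classes) else None
        for n in nodes
    }


# Toy content graph: a chain f1 - f2 - f3 - f4 with two labelled files.
chain = {("f1", "f2"): 1, ("f2", "f3"): 1, ("f3", "f4"): 1}
pred = propagate_labels(chain, {"f1": "movie", "f4": "music"}, ("movie", "music"))
print(pred["f2"], pred["f3"])  # movie music
```

On this chain the scores converge to 2/3 versus 1/3, so each unlabelled file takes the label of its nearer seed, which is exactly the "similar neighbours, similar labels" intuition.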