Authors
Marius Kloft
Journal
Machine Learning with Interdependent and Non-identically Distributed Data
Pages
36
Description
A classic assumption in machine learning states that the data are realized independently from an unknown distribution. This assumption greatly simplifies theory [1] and algorithms [2]. However, in many applications the data exhibit dependencies and inherent correlations between observations. This occurs especially for time series, for instance in network security (e.g., HTTP requests) and computer vision (video streams). Under the assumption of time-structured dependencies, several algorithms and theoretical results have been proposed [3]. However, little theory and few algorithms have been developed for more complex dependencies, in particular confounding ones.
For instance, in statistical genetics, one of the central challenges is to detect, among tens of thousands of genes, the ones that are strong predictors of complex diseases or other binary outcomes [4, 5], as this is a first step toward identifying the regulatory components controlling heritability. However, for various diseases such as type 2 diabetes [6], these sparse signals remain largely undetected, which is why the missing associations have been termed the Dark Matter of Genomic Associations [7]. Central problems include that these signals are