Data-dependent Geometries and Structures: Analyses and Algorithms for Machine Learning is a PASCAL-funded research project bringing together machine learning researchers based at UCL, the Università dell'Insubria, and the University of Bristol.
A standard paradigm of supervised learning is the data-independent hypothesis space. In this model, a data set is a sample of points from some space with a given geometry, so the distance between any two points in the space is independent of the particular sample. In a data-dependent geometry, by contrast, the distance depends on the particular points sampled. For example, consider a data set of "news stories" containing a story in the Financial Times about a renewed investment in nuclear technology and a story in the St. Petersburg Gazetteer about job losses from a decline in expected tourism. Although these two stories initially appear dissimilar, the inclusion of a third story about an oil pipeline leak creates an indirect "connection" between them. In the data-independent case the "distance" between the first two stories is unchanged, while in the data-dependent case the distance reflects the new connection. This project is designed to address the challenges, both algorithmic and theoretical, posed by data-defined hypothesis spaces. The complexity of real-world data is offset by its intricate structure, be it hierarchical, long-tailed in distribution, graph-based, etc. Leveraging this structure to enable practical learning is the core aim of this project.
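The news-story example can be made concrete with a small sketch. It is an illustration only, not the project's method: the keyword sets are invented, the data-independent distance is taken to be the Jaccard distance between two stories, and the data-dependent distance is assumed to be the shortest path through a graph that links any two sampled stories sharing a keyword. The names `jaccard_distance` and `sample_distance` are hypothetical.

```python
import heapq
import math


def jaccard_distance(a, b):
    """Data-independent distance: depends only on the two stories compared."""
    return 1.0 - len(a & b) / len(a | b)


def sample_distance(stories, src, dst):
    """Data-dependent distance: shortest path through the sampled stories.

    Two stories are linked when they share at least one keyword, and the
    edge weight is their Jaccard distance (an assumption of this sketch).
    Returns math.inf when no chain of shared keywords joins src to dst.
    """
    best = {src: 0.0}
    queue = [(0.0, src)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == dst:
            return d
        if d > best.get(u, math.inf):
            continue  # stale queue entry
        for v in stories:
            if v == u or not (stories[u] & stories[v]):
                continue  # no shared keyword, so no edge
            nd = d + jaccard_distance(stories[u], stories[v])
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(queue, (nd, v))
    return math.inf


# Invented keyword sets standing in for the three stories.
two_stories = {
    "nuclear": {"nuclear", "investment", "energy"},
    "tourism": {"tourism", "jobs", "coast"},
}
three_stories = dict(two_stories, pipeline={"oil", "leak", "energy", "coast"})

# The pairwise distance ignores the rest of the sample.
print(jaccard_distance(two_stories["nuclear"], two_stories["tourism"]))
# The sample-based distance changes once the pipeline story is included.
print(sample_distance(two_stories, "nuclear", "tourism"))
print(sample_distance(three_stories, "nuclear", "tourism"))
```

In this toy setting the two original stories share no keywords, so their pairwise distance stays at its maximum whatever else is sampled, whereas the graph-based distance becomes finite only when the pipeline story supplies a path between them.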
More details can be found here.