Network flows and the link prediction problem

Abstract

Link prediction is used by many applications to recommend new products or social connections to people. Link prediction leverages information in network structure to identify missing links or predict which new one will form in the future. Recent research has provided a theoretical justification for the success of some popular link prediction heuristics, such as the number of common neighbors and the Adamic-Adar score, by showing that they estimate the distance between nodes in some latent feature space. In this paper we examine the link prediction task from the novel perspective of network flows. We show that how easily two nodes can interact with or influence each other depends not only on their position in the network, but also on the nature of the flow that mediates interactions between them. We show that different types of flows lead to different notions of network proximity, some of which are mathematically equivalent to existing link prediction heuristics. We measure the performance of different heuristics on the missing link prediction task in a variety of real-world social, technological and biological networks. We show that heuristics based on a random walk-type processes outperform the popular Adamic-Adar and the number of common neighbors heuristics in many networks.

Publication
In 7th workshop on social network mining and analysis, Knowledge Discovery and Mining.