Ed Davis

Profile Picture

PhD, Computational Statistics and Data Science

University of Bristol

Contact & Links

PhD Research Outline

My PhD thesis addresses the problem of how to understand information encoded in multi-layer graphs. Here, a graph could refer to a network or a corpus of text, and multi-layer could refer to a graph that is dynamic or observed in different contexts. We address this problem through the use of graph embedding, a process by which we assign each node (or edge, or the whole graph) to a vector. This creates a space in which nodes with similar behaviour are closer to each other. We invented a new way of embedding dynamic graphs by using a framework that extended the applicability of any established graph embedding method to multi-layer graphs. We provide a statistical guarantee that our framework produces multi-layer embeddings that are stable over space and time (the first general proof of its kind), and we show through rigorous simulated and real-data experiments that our framework is more powerful and interpretable than other state-of-the-art methods. We have also shown that this framework demonstrates empirical gains when applied to Graph Neural Networks.

Our proof of stability can be rephrased as a proof of exchangeability, that is, our embedding framework can take a multi-layer graph and turn it into an exchangeable sequence of vectors. With exchangeability, we gain access to other fields of maths, including conformal prediction and the bootstrap.

What is a graph embedding?

See the graphic below for a visual explanation of a graph embedding. On the left, we see a Harry Potter emnity network (adversarial interactions) plotted in the conventional way. On the right, we see a corresponding graph embedding where each node is represented in a 2D space (hover over each node to see the correspondence). In this case, the graph embedding is able to separate the good and the bad characters in the Harry Potter world.

My work mainly focuses on how to best compute a dynamic graph embedding. In the below example, a network is changing its connections over time. In the corresponding dynamic embedding, we are able to see how this structure changes over time - click "Play" to see a dynamic graph embedding in action! To find out more about this intersting field, see this blog post, or some of my papers.

Papers

Valid Conformal Prediction for Dynamic GNNs (Accepted, ICLR)

Graph neural networks (GNNs) are powerful black-box models which have shown impressive empirical performance. However, without any form of uncertainty quantification, it can be difficult to trust such models in high-risk scenarios. Conformal prediction aims to address this problem, however, an assumption of exchangeability is required for its validity which has limited its applicability to static graphs and transductive regimes. We propose to use unfolding, which allows any existing static GNN to output a dynamic graph embedding with exchangeability properties. Using this, we extend the validity of conformal prediction to dynamic GNNs in both transductive and semi-inductive regimes. We provide a theoretical guarantee of valid conformal prediction in these cases and demonstrate the empirical validity, as well as the performance gains, of unfolded GNNs against standard GNN architectures on both simulated and real datasets.
A dynamic embedding of i.i.d. data using UGCN and a standard GCN.
Figure: UGCN and block diagonal GCN representations of an i.i.d. stochastic block model after applying PCA. The models were trained with transductive masks. Block diagonal GCN appears to encode a significant change over time despite there being none. The embedding from UGCN is exchangeable over time, as would be expected.

A Simple and Powerful Framework for Stable Dynamic Network Embedding

In this paper, we address the problem of dynamic network embedding, that is, representing the nodes of a dynamic network as evolving vectors within a low-dimensional space. While the field of single-graph embedding is wide and established, the field of dynamic graph embedding is comparatively in its infancy. In this paper, we propose that we can take a wide class of established single-graph embedding methods and use them to produce interpretable and powerful dynamic graph embeddings by simply applying them to the dilated unfolded matrix. We provide a theoretical guarantee that, regardless of embedding dimension, these unfolded methods will produce stable embeddings over time and space, meaning that nodes with identical latent behaviour will be exchangeable, regardless of their position in time or space.

1D Embedding of Flight Network
Figure: A one-dimensional URLSE displaying the average temporal structure of airports in Italy, the UK and across Europe. The embedding encodes the two main European waves of COVID-19 as well as an annual periodic structure before and after the pandemic. We also see that the pandemic hit Italy slightly before the UK and the rest of Europe.

Valid Bootstraps for Networks with Applications to Network Visualisation

Quantifying uncertainty in networks is an important step in modelling relationships and interactions between entities. We consider the challenge of bootstrapping an inhomogeneous random graph when only a single observation of the network is made and the underlying data generating function is unknown. We utilise an exchangeable network test that can empirically validate bootstrap samples generated by any method, by testing if the observed and bootstrapped networks are statistically distinguishable. We find that existing methods fail this test. To address this, we propose a principled, novel, distribution-free network bootstrap using k-nearest neighbour smoothing, that can regularly pass this exchangeable network test in both synthetic and real-data scenarios. We demonstrate the utility of this work in combination with the popular data visualisation method t-SNE, where uncertainty estimates from bootstrapping are used to explain whether visible structures represent real statistically sound structures.

Our network bootstrapping procedure
Figure: An overview of the proposed graph bootstrap framework. Given an adjacency matrix (a), we use its network embedding (b) to estimate the probability matrix (c) using k-nearest neighbours, from which the adjacency matrix was drawn. From this, we draw many new adjacency matrices (d) and use them to estimate the uncertainty in the network embedding (e). We then use this uncertainty to explain the visible structures in the network embedding (f).

Other Work

pyemb logo pyemb - EDA of embeddings in Python

A Python package for exploring complex datasets including high-dimensional data, relational databases and networks.

For example, the below code can quickly compute a URLSE dynamic embedding and then produce an interactive plot of the embedding over time.

import pyemb as eb

# Embed the graph using unfolded regularised Laplacian spectral embedding
d = 10
URLSE_emb = eb.dyn_embed(As, d=d, method="URLSE")

# Quick interactive + animated plot of the embedding
fig = eb.quick_plot(URLSE_emb, n, T, node_labels)

Award-Winning Data Visualisation

Runner-up in the Jean Golding Institute's Beauty of Data competition. This visualisation was presented at the Bristol Data and AI Showcase 2022. This visualisation displays an embedding of a friendship network between countries on the world stage based on their alliances.

(Click image for full-size)

World Stage Data Visualization