Calculus on finite weighted graphs

From HandWiki

In mathematics, calculus on finite weighted graphs is a discrete calculus for functions whose domain is the vertex set of a graph with a finite number of vertices and weights associated to the edges. This involves formulating discrete operators on graphs which are analogous to differential operators in calculus, such as graph Laplacians (or discrete Laplace operators) as discrete versions of the Laplacian, and using these operators to formulate differential equations, difference equations, or variational models on graphs which can be interpreted as discrete versions of partial differential equations or continuum variational models. Such equations and models are important tools to mathematically model, analyze, and process discrete information in many different research fields, e.g., image processing, machine learning, and network analysis. In applications, finite weighted graphs represent a finite number of entities by the graph's vertices, any pairwise relationships between these entities by graph edges, and the significance of a relationship by an edge weight function. Differential equations or difference equations on such graphs can be employed to leverage the graph's structure for tasks such as image segmentation (where the vertices represent pixels and the weighted edges encode pixel similarity based on comparisons of Moore neighborhoods or larger windows), data clustering, data classification, or community detection in a social network (where the vertices represent users of the network, the edges represent links between users, and the weight function indicates the strength of interactions between users).

The main advantage of finite weighted graphs is that by not being restricted to highly regular structures such as discrete regular grids, lattice graphs, or meshes, they can be applied to represent abstract data with irregular interrelationships.

If a finite weighted graph is geometrically embedded in a Euclidean space, i.e., the graph vertices represent points of this space, then it can be interpreted as a discrete approximation of a related nonlocal operator in the continuum setting.

Basic definitions

A finite weighted graph [math]\displaystyle{ G }[/math] is defined as a triple [math]\displaystyle{ G = (V, E, w) }[/math] for which

  • [math]\displaystyle{ V = \{x_1, \dots, x_n\}, n\in\mathbb{N} }[/math], is a finite set of indices denoted as graph vertices or nodes,
  • [math]\displaystyle{ E \subset V \times V }[/math] is a finite set of (directed) graph edges connecting a subset of vertices,
  • [math]\displaystyle{ w \colon E \rightarrow \mathbb{R} }[/math] is an edge weight function defined on the edges of the graph.

In a directed graph, each edge [math]\displaystyle{ (x_i,x_j)\in E }[/math] has a start node [math]\displaystyle{ x_i\in V }[/math] and an end node [math]\displaystyle{ x_j\in V }[/math]. In an undirected graph for every edge [math]\displaystyle{ (x_i,x_j) }[/math] there exists an edge [math]\displaystyle{ (x_j,x_i) }[/math] and the weight function is required to be symmetric, i.e., [math]\displaystyle{ w(x_i,x_j) = w(x_j,x_i) }[/math].[1] On the remainder of this page, the graphs will be assumed to be undirected, unless specifically stated otherwise. Many of the ideas presented on this page can be generalized to directed graphs.[1]

The edge weight function [math]\displaystyle{ w }[/math] associates to every edge [math]\displaystyle{ (x_i,x_j) \in E }[/math] a real value [math]\displaystyle{ w(x_i,x_j) \gt 0 }[/math]. For both mathematical and application specific reasons, the weight function on the edges is often required to be strictly positive and on this page it will be assumed to be so unless specifically stated otherwise. Generalizations of many of the ideas presented on this page to include negatively weighted edges are possible. Sometimes an extension of the domain of the edge weight function to [math]\displaystyle{ V\times V }[/math] is considered (with the resulting function still being called the edge weight function) by setting [math]\displaystyle{ w(x_i,x_j)=0 }[/math] whenever [math]\displaystyle{ (x_i,x_j) \not\in E }[/math].

In applications each graph vertex [math]\displaystyle{ x \in V }[/math] usually represents a single entity in the given data, e.g., elements of a finite data set, pixels in an image, or users in a social network. A graph edge represents a relationship between two entities, e.g. pairwise interactions or similarity based on comparisons of geometric neighborhoods (for example of pixels in images) or of another feature, with the edge weight encoding the strength of this relationship. Most commonly used weight functions are normalized to map to values between 0 and 1, i.e., [math]\displaystyle{ w : E \rightarrow (0,1] }[/math].

In the following it is assumed that the considered graphs are connected without self-loops or multiple edges between vertices. These assumptions are mostly harmless as in many applications each connected component of a disconnected graph can be treated as a graph in its own right, each appearance of [math]\displaystyle{ w(x_i,x_i) }[/math] (which would be nonzero in the presence of self-loops) appears in the presence of another factor which disappears when [math]\displaystyle{ i=j }[/math] (see the section on differential graph operators below), and edge weights can encode similar information as multiple edges could.

Neighborhood

A node [math]\displaystyle{ x_j \in V }[/math] is a neighbor of the node [math]\displaystyle{ x_i \in V }[/math] if there exists an edge [math]\displaystyle{ (x_i,x_j) \in E }[/math]. In terms of notation this relationship can be abbreviated by [math]\displaystyle{ x_j \sim x_i }[/math], which should be read as "[math]\displaystyle{ x_j }[/math] is a neighbor of [math]\displaystyle{ x_i }[/math]". Otherwise, if [math]\displaystyle{ x_j }[/math] is not a neighbor of [math]\displaystyle{ x_i }[/math] one writes [math]\displaystyle{ x_j\not\sim x_i }[/math]. The neighborhood [math]\displaystyle{ \mathcal N(x_i) }[/math] of a vertex [math]\displaystyle{ x_i \in V }[/math] is simply the set of neighbors [math]\displaystyle{ \mathcal{N}(x_i) := \{ x_j\in V \colon x_j \sim x_i \} }[/math]. The degree of a vertex [math]\displaystyle{ x_i \in V }[/math] is the weighted size of its neighborhood:

[math]\displaystyle{ \deg(x_i) := \sum_{j\,:\,x_j \sim x_i} w(x_i,x_j). }[/math]

Note that in the special case where [math]\displaystyle{ w\equiv 1 }[/math] on [math]\displaystyle{ E }[/math] (i.e. the graph is unweighted) we have [math]\displaystyle{ \deg(x_i) := |\mathcal{N}(x_i)| }[/math].

Space of real vertex functions

Let [math]\displaystyle{ \mathcal{H}(V) := \{f : V \rightarrow \mathbb{R} \} }[/math] be the space of (real) vertex functions. Since [math]\displaystyle{ V }[/math] is a finite set, any vertex function [math]\displaystyle{ f\in \mathcal{H}(V) }[/math] can be represented as a [math]\displaystyle{ n }[/math]-dimensional vector [math]\displaystyle{ f \in \mathbb{R}^n }[/math] (where [math]\displaystyle{ n:= |V| }[/math]) and hence the space of vertex functions [math]\displaystyle{ \mathcal{H}(V) }[/math] can be identified with an [math]\displaystyle{ n }[/math]-dimensional Hilbert space: [math]\displaystyle{ \mathcal{H}(V) \cong \mathbb{R}^n }[/math]. The inner product of [math]\displaystyle{ \mathcal{H}(V) }[/math] is defined as:

[math]\displaystyle{ \langle f, g \rangle_{\mathcal{H}(V)} := \sum_{x_i \in V} f(x_i) g(x_i), \quad \forall f, g \in \mathcal{H}(V). }[/math]

Furthermore, for any vertex function [math]\displaystyle{ f \in \mathcal{H}(V) }[/math] the [math]\displaystyle{ \ell_p }[/math]-norm and [math]\displaystyle{ \ell_\infty }[/math]-norm of [math]\displaystyle{ f }[/math] are defined as:

[math]\displaystyle{ \|f\|_p = \begin{cases} \left( \sum_{x_i\in V} |f(x_i)|^p \right)^\frac{1}{p}, &\text{ for } 1 \leqslant p \lt \infty \ ,\\ \max_{x_i\in V}|f(x_i)|, &\text{ for } p = \infty \ . \end{cases} }[/math]

The [math]\displaystyle{ \ell_2 }[/math]-norm is induced by the inner product.

In applications vertex functions are useful to label the vertices of the nodes. For example, in graph-based data clustering, each node represents a data point and a vertex function is used to identify cluster membership of the nodes.

Space of real edge functions

Analogously to real vertex functions, one can introduce the space of real edge functions [math]\displaystyle{ \mathcal{H}(E) := \{F:E \rightarrow \mathbb{R} \} }[/math]. As any edge function [math]\displaystyle{ F }[/math] is defined on a finite set of edges [math]\displaystyle{ E }[/math], it can be represented as a [math]\displaystyle{ m }[/math]-dimensional vector [math]\displaystyle{ F \in \mathbb{R}^{m} }[/math], where [math]\displaystyle{ m := |E| }[/math]. Hence, the space of edge functions [math]\displaystyle{ \mathcal{H}(E) }[/math] can be identified as a [math]\displaystyle{ m }[/math]-dimensional Hilbert space, i.e., [math]\displaystyle{ \mathcal{H}(E) \cong \mathbb{R}^{m} }[/math].

One special case of an edge function is the normalized edge weight function [math]\displaystyle{ w \colon E \rightarrow (0, 1] }[/math] introduced above in the section on basic definitions. Similar to that function, any edge function [math]\displaystyle{ F }[/math] can be trivially extended to [math]\displaystyle{ V \times V }[/math] by setting [math]\displaystyle{ F(x_i, x_j) := 0 }[/math] if [math]\displaystyle{ (x_i, x_j) \not\in E }[/math]. The space of those extended edge functions is still denoted by [math]\displaystyle{ \mathcal{H}(E) }[/math] and can be identified with [math]\displaystyle{ \mathbb{R}^m }[/math], where now [math]\displaystyle{ m := |V|^2 }[/math].

The inner product of [math]\displaystyle{ \mathcal{H}(E) }[/math] is defined as:

[math]\displaystyle{ \langle F, G \rangle_{\mathcal{H}(E)} := \sum_{(x_i, x_j) \in E} F(x_i, x_j) G(x_i, x_j), \quad \forall F, G \in \mathcal{H}(E). }[/math]

Additionally, for any edge function [math]\displaystyle{ F \in \mathcal{H}(V) }[/math] the [math]\displaystyle{ \ell_p }[/math]-norm and [math]\displaystyle{ \ell_\infty }[/math]-norm of [math]\displaystyle{ F }[/math] are defined as:

[math]\displaystyle{ \|F\|_p = \begin{cases} \left(\sum_{(x_i, x_j)\in E} |F(x_i, x_j)|^p\right)^\frac{1}{p} &\text{ for } 1 \leqslant p \lt \infty \ ,\\ \max_{(x_i, x_j) \in E}|F(x_i, x_j)|, &\text{ for } p = \infty \ . \end{cases} }[/math]

The [math]\displaystyle{ \ell_2 }[/math]-norm is induced by the inner product.

If one extends the edge set [math]\displaystyle{ E }[/math] in a way such that [math]\displaystyle{ E = V \times V }[/math] than it becomes clear that [math]\displaystyle{ \mathcal{H}(E) \cong \mathbb{R}^{n \times n} }[/math] because [math]\displaystyle{ \mathcal{H}(V) \cong \mathbb{R}^n }[/math]. This means that each edge function can be identified with a linear matrix operator.

Differential graph operators

An important ingredient in the calculus on finite weighted graphs is the mimicking of standard differential operators from the continuum setting in the discrete setting of finite weighted graphs. This allows one to translate well-studied tools from mathematics, such as partial differential equations and variational methods, and make them usable in applications which can best be modeled by a graph. The fundamental concept which makes this translation possible is the graph gradient, a first-order difference operator on graphs. Based on this one can derive higher-order difference operators, e.g., the graph Laplacian.

First-order differential operators

Weighted differences

Let [math]\displaystyle{ G = (V,E,w) }[/math] be a finite weighted graph and let [math]\displaystyle{ f\in \mathcal{H}(V) }[/math] be a vertex function. Then the weighted difference (or weighted graph derivative) of [math]\displaystyle{ f }[/math] along a directed edge [math]\displaystyle{ (x_i,x_j) \in E }[/math] is

[math]\displaystyle{ \partial_{x_j}f(x_i) \ := \ \sqrt{w(x_i, x_j)}\left(f(x_j) - f(x_i)\right). }[/math]

For any weighted difference the following properties hold:

  • [math]\displaystyle{ \partial_{x_i}f(x_j) = -\partial_{x_j}f(x_i), }[/math]
  • [math]\displaystyle{ \partial_{x_i}f(x_i) = 0, }[/math]
  • [math]\displaystyle{ f(x_i) = f(x_j) \Rightarrow \partial_{x_j}f(x_i) = 0. }[/math]

Weighted gradient

Based on the notion of weighted differences one defines the weighted gradient operator on graphs [math]\displaystyle{ \nabla_w: \mathcal{H}(V) \rightarrow \mathcal{H}(E) }[/math] as

[math]\displaystyle{ (\nabla_w f)(x_i, x_j) \ = \ \partial_{x_j}f(x_i). }[/math]

This is a linear operator.

To measure the local variation of a vertex function [math]\displaystyle{ f }[/math] in a vertex [math]\displaystyle{ x_i \in V }[/math] one can restrict the gradient [math]\displaystyle{ \nabla_w f }[/math] of [math]\displaystyle{ f }[/math] to all directed edges starting in [math]\displaystyle{ x_i }[/math] and using the [math]\displaystyle{ \ell_p }[/math]-norm of this edge function, i.e.,

[math]\displaystyle{ \|(\nabla_w f)(x_i,\cdot)\|_{\ell_p} = \begin{cases} \left(\sum_{x_j \sim x_i}w(x_i, x_j)^\frac{p}{2}|f(x_j) - f(x_i)|^p\right)^\frac{1}{p} &\text{ for } 1 \leq p \lt \infty,\\ \max_{x_j \sim x_i} \sqrt{w(x_i, x_j)}|f(x_j) - f(x_i)| &\text{ for } p = \infty. \end{cases} }[/math]

Weighted divergence

The adjoint operator [math]\displaystyle{ \nabla_w^*\colon\mathcal{H}(E)\rightarrow \mathcal{H}(V) }[/math] of the weighted gradient operator is a linear operator defined by

[math]\displaystyle{ \langle \nabla_wf,G\rangle_{\mathcal{H}(E)} = \langle f,\nabla_w^*G\rangle_{\mathcal{H}(V)} \quad \text{ for all } f\in \mathcal{H}(V), G\in \mathcal{H}(E). }[/math]

For undirected graphs with a symmetric weight function [math]\displaystyle{ w \in \mathcal{H}(E) }[/math] the adjoint operator [math]\displaystyle{ \nabla_w^* }[/math] of a function [math]\displaystyle{ F\in \mathcal{H}(E) }[/math] at a vertex [math]\displaystyle{ x_i\in V }[/math] has the following form:

[math]\displaystyle{ \left(\nabla_w^*F\right)(x_i) \ = \ \frac{1}{2}\sum_{x_j\sim x_i}{\sqrt{w(x_i, x_j)}(F(x_j, x_i) - F(x_i, x_j))}. }[/math]

One can then define the weighted divergence operator on graphs via the adjoint operator as [math]\displaystyle{ \operatorname{div}_w := -\nabla_w^* }[/math]. The divergence on a graph measures the net outflow of an edge function in each vertex of the graph.

Second-order differential operators

Graph Laplace operator

The weighted graph Laplacian [math]\displaystyle{ \Delta_w : \mathcal{H}(V) \rightarrow \mathcal{H}(V) }[/math] is a well-studied operator in the graph setting. Mimicking the relationship [math]\displaystyle{ \operatorname{div}(\nabla f) = \Delta f }[/math] of the Laplace operator in the continuum setting, the weighted graph Laplacian can be derived for any vertex [math]\displaystyle{ x_i \in V }[/math] as:

[math]\displaystyle{ \begin{align} (\operatorname{div}_w(\nabla_w f))(x_i) &= \frac{1}{2}\sum_{x_j \sim x_i}\sqrt{w(x_i, x_j)}(\nabla_w f(x_j, x_i) - \nabla_w f(x_i, x_j))\\ &= \frac{1}{2}\sum_{x_j \sim x_i}\sqrt{w(x_i, x_j)}\left(\sqrt{w(x_i, x_j)}(f(x_j) - f(x_i)) - \sqrt{w(x_j, x_i)}(f(x_i) - f(x_j))\right)\\ &= \frac{1}{2}\sum_{x_j \sim x_i}w(x_i, x_j)(2f(x_j) - 2f(x_i))\\ &= \sum_{x_j \sim x_i}w(x_i, x_j)(f(x_j) - f(x_i)) \ =: \ (\Delta_w f)(x_i). \end{align} }[/math]

Note that one has to assume that the graph [math]\displaystyle{ G }[/math] is undirected and has a symmetric weight function [math]\displaystyle{ w(x_i, x_j) = w(x_j, x_i) }[/math] for this representation.

Graph p-Laplace operators

The continuous [math]\displaystyle{ p }[/math]-Laplace operator is a second-order differential operator that can be well-translated to finite weighted graphs. It allows the translation of various partial differential equations, e.g., the heat equation, to the graph setting.

Based on the first-order partial difference operators on graphs one can formally derive a family of weighted graph [math]\displaystyle{ p }[/math]-Laplace operators [math]\displaystyle{ \Delta_{w,p} \colon \mathcal{H}(V) \rightarrow \mathcal{H}(V) }[/math] for [math]\displaystyle{ 1 \leq p \lt \infty }[/math] by minimization of the discrete [math]\displaystyle{ p }[/math]-Dirichlet energy functional

[math]\displaystyle{ E(f) \ := \ \frac{1}{p} \sum_{x_i \in V} \|\nabla_w f(x_i,\cdot)\|_{\ell_p}^p. }[/math]

The necessary optimality conditions for a minimizer of the energy functional [math]\displaystyle{ E }[/math] lead to the following definition of the graph [math]\displaystyle{ p }[/math]-Laplacian:

[math]\displaystyle{ (\Delta_{w,p} f)(x_i) \ := \ \sum_{x_j\sim x_i} w(x_i, x_j)^\frac{p}{2} |f(x_j) - f(x_i)|^{p-2} (f(x_j) - f(x_i)). }[/math]

Note that the graph Laplace operator is a special case of the graph [math]\displaystyle{ p }[/math]-Laplace operator for [math]\displaystyle{ p = 2 }[/math], i.e.,

[math]\displaystyle{ (\Delta_{w,2} f)(x_i) \ = \ (\Delta_w f)(x_i) \ = \ \sum_{x_j\sim x_i} w(x_i, x_j) (f(x_j) - f(x_i)). }[/math]

Applications

Calculus on finite weighted graphs is used in a wide range of applications from different fields such as image processing, machine learning, and network analysis. A non-exhaustive list of tasks in which finite weighted graphs have been employed is:

See also

  • Phase field models on graphs

Notes

1.^ Note that a slightly different definition of undirected graph is also in use, which considers an undirected edge to be a two-set (set with two distinct elements) [math]\displaystyle{ \{x_i,x_j\} }[/math] instead of a pair of ordered pairs [math]\displaystyle{ (x_i,x_j) }[/math] and [math]\displaystyle{ (x_j,x_i) }[/math]. Here the latter description is needed, as it is required to allow edge functions in [math]\displaystyle{ \mathcal{H}(E) }[/math] (see the section about the space of edge functions) to take different values on [math]\displaystyle{ (x_i,x_j) }[/math] and [math]\displaystyle{ (x_j,x_i) }[/math].

References

  1. Luxburg, Ulrike von; Audibert, Jean-Yves; Hein, Matthias (2007). "Graph Laplacians and their Convergence on Random Neighborhood Graphs". Journal of Machine Learning Research 8 (Jun): 1325–1368. ISSN 1533-7928. http://www.jmlr.org/papers/v8/hein07a.html. 
  2. 2.0 2.1 Gilboa, Guy; Osher, Stanley (2009). "Nonlocal Operators with Applications to Image Processing" (in en). Multiscale Modeling & Simulation 7 (3): 1005–1028. doi:10.1137/070698592. ISSN 1540-3459. 
  3. 3.0 3.1 Elmoataz, A.; Lezoray, O.; Bougleux, S. (2008). "Nonlocal Discrete Regularization on Weighted Graphs: A Framework for Image and Manifold Processing". IEEE Transactions on Image Processing 17 (7): 1047–1060. doi:10.1109/TIP.2008.924284. ISSN 1057-7149. PMID 18586614. Bibcode2008ITIP...17.1047E. 
  4. Desquesnes, Xavier; Elmoataz, Abderrahim; Lézoray, Olivier (2013). "Eikonal Equation Adaptation on Weighted Graphs: Fast Geometric Diffusion Process for Local and Non-local Image and Data Processing" (in en). Journal of Mathematical Imaging and Vision 46 (2): 238–257. doi:10.1007/s10851-012-0380-9. ISSN 0924-9907. https://hal.archives-ouvertes.fr/hal-00932510/file/Desquesnes-jmiv2012.pdf. 
  5. Elmoataz, Abderrahim; Toutain, Matthieu; Tenbrinck, Daniel (2015). "On the $p$-Laplacian and $\infty$-Laplacian on Graphs with Applications in Image and Data Processing" (in en). SIAM Journal on Imaging Sciences 8 (4): 2412–2451. doi:10.1137/15M1022793. ISSN 1936-4954. 
  6. Mahmood, Faisal; Shahid, Nauman; Skoglund, Ulf; Vandergheynst, Pierre (2018). "Adaptive Graph-Based Total Variation for Tomographic Reconstructions". IEEE Signal Processing Letters 25 (5): 700–704. doi:10.1109/LSP.2018.2816582. ISSN 1070-9908. Bibcode2018ISPL...25..700M. 
  7. Peyré, Gabriel; Bougleux, Sébastien; Cohen, Laurent (2008). "Non-local Regularization of Inverse Problems". in Forsyth, David; Torr, Philip; Zisserman, Andrew. Computer Vision – ECCV 2008. Lecture Notes in Computer Science. 5304. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 57–68. doi:10.1007/978-3-540-88690-7_5. ISBN 9783540886891. 
  8. Bühler, Thomas; Hein, Matthias (2009). "Spectral clustering based on the graph p -Laplacian" (in en). Proceedings of the 26th Annual International Conference on Machine Learning. Montreal, Quebec, Canada: ACM Press. pp. 81–88. doi:10.1145/1553374.1553385. ISBN 9781605585161. http://portal.acm.org/citation.cfm?doid=1553374.1553385. 
  9. Lozes, Francois; Elmoataz, Abderrahim; Lezoray, Olivier (2014). "Partial Difference Operators on Weighted Graphs for Image Processing on Surfaces and Point Clouds". IEEE Transactions on Image Processing 23 (9): 3896–3909. doi:10.1109/TIP.2014.2336548. ISSN 1057-7149. PMID 25020095. Bibcode2014ITIP...23.3896L. https://hal.archives-ouvertes.fr/hal-01096304/file/Lozes_TIP2014.pdf.