From HandWiki

A graphoid is a set of statements of the form "X is irrelevant to Y given that we know Z", where X, Y and Z are sets of variables. The notions of "irrelevance" and "given that we know" may receive different interpretations, including probabilistic, relational and correlational, depending on the application. These interpretations share common properties that can be captured by paths in graphs (hence the name "graphoid"). The theory of graphoids characterizes these properties in a finite set of axioms that are common to informational irrelevance and its graphical representations.


Judea Pearl and Azaria Paz[1] coined the term "graphoids" after discovering that a set of axioms that govern conditional independence in probability theory is shared by undirected graphs. Variables are represented as nodes in a graph in such a way that variable sets X and Y are independent conditioned on Z in the distribution whenever node set Z separates X from Y in the graph. Axioms for conditional independence in probability were derived earlier by A. Philip Dawid[2] and Wolfgang Spohn.[3] The correspondence between dependence and graphs was later extended to directed acyclic graphs (DAGs)[4][5][6] and to other models of dependency.[1][7]


A dependency model M is a set of triplets (X,Z,Y) for which the predicate I(X,Z,Y), "X is independent of Y given Z", is true. A graphoid is a dependency model that is closed under the following five axioms:

  1. Symmetry: [math]\displaystyle{ I(X,Z,Y) \Leftrightarrow I(Y,Z,X) }[/math]
  2. Decomposition: [math]\displaystyle{ I(X,Z,Y\cup W) \Rightarrow I(X,Z,Y)~\&~I(X,Z,W) }[/math]
  3. Weak Union: [math]\displaystyle{ I(X,Z,Y\cup W) \Rightarrow I(X,Z\cup W,Y) }[/math]
  4. Contraction: [math]\displaystyle{ I(X,Z,Y)~\&~I(X,Z\cup Y,W) \Rightarrow I(X,Z,Y\cup W) }[/math]
  5. Intersection: [math]\displaystyle{ I(X,Z\cup W,Y)~\&~I(X,Z\cup Y,W) \Rightarrow I(X,Z,Y\cup W) }[/math]

A semi-graphoid is a dependency model closed under axioms 1–4. These five axioms together are known as the graphoid axioms.[8] Intuitively, the weak union and contraction properties mean that irrelevant information should not alter the relevance status of other propositions in the system; what was relevant remains relevant and what was irrelevant remains irrelevant.[8]
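To make the axioms concrete, the following Python sketch computes the closure of a finite set of independence statements under axioms 1–4, and optionally axiom 5. It assumes each statement is a triple of pairwise disjoint frozensets with nonempty X and Y. The helper names `closure`, `nonempty_splits` and `T` are illustrative and not taken from any library. The loop reapplies every axiom until no new statement appears. It terminates because only finitely many triples exist over a finite set of variables.

```python
from itertools import combinations


def nonempty_splits(s):
    """Yield every ordered pair (Y, W) of nonempty disjoint sets with Y | W == s."""
    items = sorted(s)
    for r in range(1, len(items)):
        for chosen in combinations(items, r):
            y = frozenset(chosen)
            yield y, s - y


def closure(statements, use_intersection=False):
    """Close a set of triples (X, Z, Y) under axioms 1-4, plus axiom 5 if requested.

    Assumes X, Z, Y in each triple are pairwise disjoint frozensets, X and Y nonempty.
    """
    closed = set(statements)
    while True:
        new = set()
        for x, z, y in closed:
            new.add((y, z, x))                          # 1. symmetry
            for y1, w in nonempty_splits(y):
                new.add((x, z, y1))                     # 2. decomposition
                new.add((x, z | w, y1))                 # 3. weak union
        for x, z, y in closed:
            for x2, z2, w in closed:
                if x2 != x:
                    continue
                if z2 == z | y:
                    new.add((x, z, y | w))              # 4. contraction
                # 5. intersection: z plays the role of Z | W, and z2 of Z | Y.
                if use_intersection and w <= z and y <= z2 and z - w == z2 - y:
                    new.add((x, z - w, y | w))
        if new <= closed:
            return closed
        closed |= new


def T(x, z, y):
    """Build a statement from strings of one-letter variable names, e.g. T("a", "", "bc")."""
    return frozenset(x), frozenset(z), frozenset(y)


m = closure({T("a", "", "b"), T("a", "b", "c")})
print(T("a", "", "bc") in m, T("a", "c", "b") in m)  # True True
```

Starting from I(a, ∅, {b}) and I(a, {b}, {c}), contraction yields I(a, ∅, {b, c}). Decomposition and weak union then yield I(a, ∅, {c}) and I(a, {c}, {b}). By contrast, closing I(a, {c}, {b}) and I(a, {b}, {c}) produces I(a, ∅, {b, c}) only when `use_intersection=True`.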

Types of graphoids

Probabilistic graphoids[1][7]

Conditional independence, defined as

[math]\displaystyle{ I(X,Z,Y) \Leftrightarrow P(X\mid Y, Z) = P(X\mid Z) }[/math]

is a semi-graphoid for every distribution P, and it becomes a full graphoid when P is strictly positive; without strict positivity the intersection axiom can fail.
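Here is a minimal sketch of this definition for discrete variables. It assumes the joint distribution is given as a Python dict from value tuples to probabilities; the names `marginal` and `is_ci` are illustrative. To avoid dividing by zero, it tests the equivalent product form P(x, y, z) P(z) = P(x, z) P(y, z) for all values x, y, z. This agrees with the conditional form wherever P(y, z) > 0. The example distribution makes X, Y and W identical copies of one fair coin. That distribution is not strictly positive, and it violates the intersection axiom.

```python
from collections import defaultdict


def marginal(joint, names, keep):
    """Sum the joint table {value tuple: probability} down to the variables in keep."""
    idx = [names.index(v) for v in keep]
    out = defaultdict(float)
    for values, p in joint.items():
        out[tuple(values[i] for i in idx)] += p
    return out


def is_ci(joint, names, X, Z, Y, tol=1e-12):
    """Test I(X,Z,Y) via P(x,y,z) * P(z) == P(x,z) * P(y,z) for all values."""
    pxyz = marginal(joint, names, X + Y + Z)
    pxz = marginal(joint, names, X + Z)
    pyz = marginal(joint, names, Y + Z)
    pz = marginal(joint, names, Z)
    nx, ny = len(X), len(Y)
    for xz, p_xz in pxz.items():
        x, z = xz[:nx], xz[nx:]
        for yz, p_yz in pyz.items():
            y, z2 = yz[:ny], yz[ny:]
            if z2 != z:
                continue  # only compare value combinations that agree on Z
            lhs = pxyz.get(x + y + z, 0.0) * pz[z]
            if abs(lhs - p_xz * p_yz) > tol:
                return False
    return True


# X, Y and W are three identical copies of one fair coin (not strictly positive).
names = ["x", "y", "w"]
joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(is_ci(joint, names, ["x"], ["w"], ["y"]))    # True
print(is_ci(joint, names, ["x"], ["y"], ["w"]))    # True
print(is_ci(joint, names, ["x"], [], ["y", "w"]))  # False: intersection fails
```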

Correlational graphoids[1][7]

A dependency model is a correlational graphoid if, for some probability function, we have

[math]\displaystyle{ I_c(X,Z,Y) \Leftrightarrow \rho_{xy\cdot Z}=0\text{ for every }x \in X\text{ and }y \in Y }[/math]

where [math]\displaystyle{ \rho_{xy\cdot Z} }[/math] is the partial correlation between x and y given the set Z.

In other words, the linear estimation error of the variables in X using measurements on Z would not be reduced by adding measurements of the variables in Y, thus making Y irrelevant to the estimation of X. Correlational and probabilistic dependency models coincide for normal distributions.
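The correlational criterion can be checked from a correlation matrix alone. This sketch uses the standard recursive formula for partial correlations, which removes one conditioning variable at a time; the function names are illustrative. It assumes no intermediate partial correlation equals ±1, because the formula would then divide by zero. The example correlations are chosen so that ρ_xy = ρ_xz ρ_zy, as in a standardized linear chain x → z → y.

```python
import math


def partial_corr(R, x, y, Z):
    """Partial correlation of variables x and y given the list Z.

    R is a correlation matrix (list of lists). Each step removes the last
    conditioning variable z0, with Z' = Z minus z0:
    r_xy.Z = (r_xy.Z' - r_xz0.Z' * r_yz0.Z') / sqrt((1 - r_xz0.Z'^2)(1 - r_yz0.Z'^2))
    """
    if not Z:
        return R[x][y]
    z0, rest = Z[-1], Z[:-1]
    rxy = partial_corr(R, x, y, rest)
    rxz = partial_corr(R, x, z0, rest)
    ryz = partial_corr(R, y, z0, rest)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def is_corr_independent(R, X, Z, Y, tol=1e-9):
    """I_c(X,Z,Y): every x in X and y in Y have zero partial correlation given Z."""
    return all(abs(partial_corr(R, x, y, list(Z))) < tol for x in X for y in Y)


# Variables 0, 1, 2 play the roles of x, z, y, with rho_xy = rho_xz * rho_zy.
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.6],
     [0.3, 0.6, 1.0]]
print(is_corr_independent(R, [0], [1], [2]))  # True: z screens x off from y
print(is_corr_independent(R, [0], [], [2]))   # False: marginal correlation is 0.3
```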

Relational graphoids[1][7]

A dependency model is a relational graphoid if I(X,Z,Y) holds exactly when, for all values x, y and z of X, Y and Z,

[math]\displaystyle{ P(x,z)\gt 0~\&~P(y,z)\gt 0 \implies P(x,y,z)\gt 0, }[/math]

where a positive value means that the corresponding combination of values is permitted. In words, once Z is fixed, the range of values permitted for X is not restricted by the choice of Y. Independence statements belonging to this model are similar to embedded multi-valued dependencies (EMVDs) in databases.
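For relational graphoids only the permitted value combinations matter, so a relation can be stored as a list of rows. The sketch below checks the condition above for every x, y and z. That is the same test as an embedded multi-valued dependency between X and Y given Z. The helper names `project` and `is_rel_independent` are hypothetical, and each row is assumed to be a dict keyed by attribute name. The example is a classic course/teacher/book relation.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, as a set of value tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def is_rel_independent(rows, X, Z, Y):
    """I(X,Z,Y): whenever (x, z) and (y, z) occur, (x, y, z) occurs as well."""
    xz, yz, xyz = project(rows, X + Z), project(rows, Y + Z), project(rows, X + Y + Z)
    nx, ny = len(X), len(Y)
    for t1 in xz:
        for t2 in yz:
            same_z = t1[nx:] == t2[ny:]
            if same_z and t1[:nx] + t2[:ny] + t1[nx:] not in xyz:
                return False
    return True


# Every teacher of a course uses every book of that course.
rows = [{"course": "db", "teacher": t, "book": b}
        for t in ("ann", "bob") for b in ("b1", "b2")]
print(is_rel_independent(rows, ["teacher"], ["course"], ["book"]))       # True
print(is_rel_independent(rows[:-1], ["teacher"], ["course"], ["book"]))  # False
```

Dropping the row (bob, b2) breaks the independence. Both (bob, db) and (b2, db) still occur, but their combination no longer does.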

Graph-induced graphoids

If there exists an undirected graph G such that

[math]\displaystyle{ I(X,Z,Y) \Leftrightarrow \langle X,Z,Y\rangle_G, }[/math]

where [math]\displaystyle{ \langle X,Z,Y\rangle_G }[/math] means that every path in G between a node in X and a node in Y passes through a node in Z, then the graphoid is called graph-induced. In other words, every independence statement in M is reflected as a vertex separation in G and vice versa. A necessary and sufficient condition for a dependency model to be a graph-induced graphoid is that it satisfies the following axioms: symmetry, decomposition, intersection, strong union and transitivity.
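Vertex separation can be tested with a breadth-first search that starts from X and is never allowed to enter Z. The sketch assumes the graph is a dict mapping each node to the set of its neighbors and that X, Y and Z are disjoint; `separated` is an illustrative name.

```python
from collections import deque


def separated(adj, X, Z, Y):
    """<X,Z,Y>_G: every path in the undirected graph from X to Y passes through Z."""
    X, Y, Z = set(X), set(Y), set(Z)
    frontier = deque(X - Z)
    seen = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in Y:
            return False  # reached Y without passing through Z
        for u in adj.get(v, ()):
            if u not in seen and u not in Z:
                seen.add(u)
                frontier.append(u)
    return True


# The path a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(separated(adj, {"a"}, {"b"}, {"c"}))  # True
print(separated(adj, {"a"}, set(), {"c"}))  # False
```

Adding nodes to Z can only remove paths, so a separation never breaks when Z grows. This is why graph-induced models satisfy strong union, an axiom that probabilistic independence need not satisfy.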

Strong union states that

[math]\displaystyle{ I(X,Z,Y) \implies I(X,Z\cup W,Y) }[/math]

Transitivity states that

[math]\displaystyle{ I(X,Z,Y) \implies \left(\forall~\gamma \notin X \cup Y \cup Z,~~I(X,Z,\gamma) \text{ or } I(\gamma, Z,Y)\right) }[/math]

The axioms symmetry, decomposition, intersection, strong union and transitivity constitute a complete characterization of undirected graphs.[9]

DAG-induced graphoids

A graphoid is termed DAG-induced if there exists a directed acyclic graph D such that [math]\displaystyle{ I(X,Z,Y) \Leftrightarrow \langle X,Z,Y\rangle_D }[/math] where [math]\displaystyle{ \langle X,Z,Y\rangle_D }[/math] stands for d-separation in D. d-separation (the d connotes "directional") extends the notion of vertex separation from undirected graphs to directed acyclic graphs. It permits the reading of conditional independencies from the structure of Bayesian networks. However, the conditional independencies induced by DAGs cannot be completely characterized by a finite set of axioms.[10]
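One way to test d-separation without enumerating paths uses an equivalent criterion from the graphical-models literature (see, e.g., Lauritzen's Graphical Models). First keep only X, Y, Z and their ancestors. Then "moralize" by linking every pair of parents of a common child and dropping edge directions. Finally, check ordinary vertex separation. This sketch uses that swapped-in criterion rather than the path-based definition. The DAG is assumed to be a dict from each node to its set of parents, and `d_separated` is an illustrative name.

```python
from collections import deque


def d_separated(parents, X, Z, Y):
    """<X,Z,Y>_D via the moralized ancestral graph; parents maps node -> set of parents."""
    X, Y, Z = set(X), set(Y), set(Z)
    # 1. Collect X, Y, Z and all of their ancestors.
    anc, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))
    # 2. Moralize: undirected parent-child edges plus edges between co-parents.
    adj = {v: set() for v in anc}
    for v in anc:
        ps = list(parents.get(v, ()))
        for i, p in enumerate(ps):
            adj[v].add(p)
            adj[p].add(v)
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # 3. Ordinary vertex separation of X from Y by Z in the moral graph.
    frontier = deque(X - Z)
    seen = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in Y:
            return False
        for u in adj[v]:
            if u not in seen and u not in Z:
                seen.add(u)
                frontier.append(u)
    return True


collider = {"c": {"a", "b"}}       # a -> c <- b
chain = {"b": {"a"}, "c": {"b"}}   # a -> b -> c
print(d_separated(collider, {"a"}, set(), {"b"}))  # True
print(d_separated(collider, {"a"}, {"c"}, {"b"}))  # False: conditioning on a collider
print(d_separated(chain, {"a"}, {"b"}, {"c"}))     # True
```

The collider example shows where d-separation departs from undirected separation: conditioning on c connects a and b.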

Inclusion and construction

Graph-induced and DAG-induced graphoids are both contained in probabilistic graphoids.[11] This means that for every graph G there exists a probability distribution P such that every conditional independence in P is represented in G, and vice versa. The same is true for DAGs. However, there are probability distributions whose conditional independencies do not form a graphoid (for example, distributions that are not strictly positive may violate intersection). Moreover, there is no finite axiomatization of probabilistic conditional independence.[12]

Thomas Verma showed that every semi-graphoid admits a recursive construction of a DAG in which every d-separation is valid.[13] The construction is similar to that used in Bayesian networks and goes as follows:

  1. Arrange the variables in some arbitrary order 1, 2, ..., i, ..., N and, starting with i = 1,
  2. choose for each node i a set of nodes [math]\displaystyle{ PA_i }[/math] among its predecessors 1, 2, ..., i − 1 such that i is independent of all its remaining predecessors, conditioned on [math]\displaystyle{ PA_i }[/math].
  3. Draw arrows from [math]\displaystyle{ PA_i }[/math] to i and continue.

The DAG created by this construction will represent all the conditional independencies that follow from those used in the construction. Furthermore, every d-separation shown in the DAG will be a valid conditional independence in the graphoid used in the construction.
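The construction can be sketched as follows. The sketch assumes access to a hypothetical oracle `indep(i, S, R)` that answers whether I({i}, S, R) holds in the semi-graphoid. For each node it searches subsets of the predecessors in order of increasing size and keeps the first one that works. That search strategy is an illustrative choice, not part of the construction as stated. The example oracle `path_indep` is also hypothetical: it reads independencies off vertex separation in the path 0 - 1 - 2.

```python
from itertools import combinations


def build_dag(order, indep):
    """Return {node: parent set} built from an ordering and an independence oracle."""
    parents = {}
    for k, i in enumerate(order):
        preds = order[:k]
        for size in range(len(preds) + 1):
            found = None
            for cand in combinations(preds, size):
                rest = set(preds) - set(cand)
                # I(i, PA_i, rest) must hold; with no predecessors left it holds trivially.
                if not rest or indep(i, set(cand), rest):
                    found = set(cand)
                    break
            if found is not None:
                parents[i] = found
                break
    return parents


def path_indep(i, S, R):
    """Hypothetical oracle: separation in the path graph 0 - 1 - 2 - ..."""
    return all(any(min(i, r) < s < max(i, r) for s in S) for r in R)


print(build_dag([0, 1, 2], path_indep))  # {0: set(), 1: {0}, 2: {1}}
print(build_dag([2, 0, 1], path_indep))  # {2: set(), 0: {2}, 1: {0, 2}}
```

The second call shows that the ordering matters. With node 1 placed last, it needs both 0 and 2 as parents, whereas the natural order yields a simple chain.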


References

  1. Pearl, Judea; Paz, Azaria (1985). "Graphoids: A Graph-Based Logic for Reasoning About Relevance Relations".
  2. Dawid, A. Philip (1979). "Conditional independence in statistical theory". Journal of the Royal Statistical Society, Series B: 1–31.
  3. Spohn, Wolfgang (1980). "Stochastic independence, causal independence, and shieldability". Journal of Philosophical Logic 9: 73–99. doi:10.1007/bf00258078.
  4. Pearl, Judea (1986). "Fusion, propagation and structuring in belief networks". Artificial Intelligence 29 (3): 241–288. doi:10.1016/0004-3702(86)90072-x.
  5. Verma, Thomas; Pearl, Judea (1988). "Causal networks: Semantics and expressiveness". Proceedings of the 4th Workshop on Uncertainty in Artificial Intelligence: 352–359.
  6. Lauritzen, S.L. (1996). Graphical Models. Oxford: Clarendon Press.
  7. Geiger, Dan (1990). "Graphoids: A Qualitative Framework for Probabilistic Inference". PhD dissertation, Technical Report R-142, Computer Science Department, University of California, Los Angeles.
  8. Pearl, Judea (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
  9. Paz, A.; Pearl, J.; Ur, S. (1996). "A New Characterization of Graphs Based on Interception Relations". Journal of Graph Theory 22 (2): 125–136.
  10. Geiger, D. (1987). "The non-axiomatizability of dependencies in directed acyclic graphs". UCLA Computer Science Tech Report R-83.
  11. Geiger, D.; Pearl, J. (1993). "Logical and algorithmic properties of conditional independence and graphical models". The Annals of Statistics 21 (4): 2001–2021. doi:10.1214/aos/1176349407.
  12. Studeny, M. (1992). "Conditional independence relations have no finite complete characterization". In Kubik, S.; Visek, J.A. (eds.). Information Theory, Statistical Decision Functions and Random Processes. Transactions of the 11th Prague Conference B: 377–396. Dordrecht: Kluwer.
  13. Verma, T.; Pearl, J. (1990). "Causal Networks: Semantics and Expressiveness". In Shachter, R.; Levitt, T.S.; Kanal, L.N. (eds.). Uncertainty in AI 4: 69–76. Elsevier Science Publishers.