Graph partition strategies for generalized mean field inference
Eric P. Xing, Computer Science Division, University of California, Berkeley, CA 94720
Michael I. Jordan, Computer Science and Statistics, University of California, Berkeley, CA 94720
Stuart Russell, Computer Science Division, University of California, Berkeley, CA 94720
Abstract

An autonomous variational inference algorithm for arbitrary graphical models requires the ability to optimize variational approximations over the space of model parameters as well as over the choice of tractable families used for the variational approximation. In this paper, we present a novel combination of graph partitioning algorithms with a generalized mean field (GMF) inference algorithm. This combination optimizes over disjoint clusterings of variables and performs inference using those clusters. We provide a formal analysis of the relationship between the graph cut and the GMF approximation, and explore several graph partition strategies empirically. Our empirical results provide rather clear support for a weighted version of MinCut as a useful clustering algorithm for GMF inference, which is consistent with the implications of the formal analysis.

1 Introduction

What are the prospects for fully autonomous algorithms for variational inference in graphical models? Recent years have seen an increasingly systematic treatment of an increasingly flexible range of algorithms for variational inference. In particular, the cluster variational framework has provided a range of algorithms that extend the basic "belief propagation" framework (Yedidia et al., 2000). Similarly, general clusters of variables are also allowed in recent treatments of structured mean field algorithms (Wiegerinck, 2000). Empirical results have shown that both kinds of generalization can yield more effective algorithms.

While these developments provide much-needed flexibility for the design of effective algorithms, they also raise a new question: how are the clusters to be chosen? Until now, this issue has generally been left in the hands of the algorithm designer; moreover, the designer has been provided with little beyond intuition for making these choices. For some graphical model architectures, there are only a few natural choices, and these can be explored manually. In general, however, we wish to envisage a general piece of software for variational inference which can be asked to perform inference for an arbitrary graph. In this setting, it is essential to begin to explore automatic methods for choosing clusters.

In the setting of structured mean field algorithms (Jordan et al., 1999; Ghahramani and Jordan, 1997), it is meaningful to consider disjoint clusters, and in Xing et al. (2003) we proposed a generalized mean field (GMF) algorithm for inference based on this assumption, noting that the assumption of disjoint clusters leads to a simple set of inference equations that can be easily implemented. Disjoint clusters have another virtue as well, which is the subject of the current paper: they open the door to a role for graph partitioning algorithms in choosing clusters for inference.

There are several intuitions that support a possible role for graph partitioning algorithms in the autonomous choice of clusters for graphical model inference. The first is that minimum cuts are to be preferred, so that as much as possible of the probabilistic dependence is captured within clusters. It also seems likely that the values of parameters should matter, because they often reflect the "coupling strength" of the probabilistic dependences among random variables. Another intuition is that maximum cuts should be preferred, because in this case the mean field acting across a large cut may have an expectation that is highly concentrated, a situation which corresponds to the basic assumption underlying mean field methods. Again, specific values of parameters should matter.

In this paper we provide a preliminary formal analysis and a thoroughgoing empirical exploration of these issues. We present a theorem that relates the weight of the graph cut to the quality of the bound of the GMF approximation, and study random graphs and a variety of settings of parameter values. We compare several different kinds of partitioning algorithms empirically. As we will show, our results turn out to provide rather clear support for a clustering algorithm based on minimal cut, which is consistent with implications drawn from the formal analysis. A more general version of the GMF algorithm that allows non-factorizable potentials is also provided in this paper, and possible extensions motivated by our formal analysis are discussed.

2 The generalized mean field (GMF) algorithm

tion function). Given a disjoint variable partitioning, C, the true cluster conditional of each variable cluster C_i given its Markov blanket MB_i is:

p(X_{C_i} \mid X_{MB_i} = x_{MB_i}) \propto \exp\Big\{ \sum_{D \subseteq C_i} \phi_D(X_D) + \sum_{D \in B_i} \phi_D(X_{D \cap C_i}, x_{D \cap MB_i}) \Big\}    (2)

where B_i denotes the set of cliques that intersect with but are not contained in cluster C_i, and where the lower-case character x denotes a specific assignment to variable X. Without loss of generality, we assume that all the potentials are positively weighted (i.e., > 0) and the signs are subsumed in the potential functions. Given a clique D, let I_D denote the set of indices of clusters that have non-empty intersection with D.
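As a concrete illustration of the cluster conditional in Eq. (2), the sketch below runs GMF-style updates on a toy 4-node Ising model. All couplings, fields, and the cluster choice are hypothetical, and the update rule is a simplified stand-in for the paper's actual GMF equations: each cluster marginal is computed exactly by enumeration, with cliques that cross the cut evaluated at the current expectations of the Markov-blanket variables.

```python
# Minimal GMF-flavored sketch on a hypothetical 4-node Ising model (x_i in {-1,+1}).
# Cliques are edges; each cluster update enumerates the cluster's states exactly,
# and cut cliques are evaluated at current expectations of blanket variables.
import itertools
import math

theta = {(0, 1): 1.2, (1, 2): 0.3, (2, 3): 1.1, (3, 0): 0.2}  # edge couplings (made up)
field = {0: 0.5, 1: 0.0, 2: -0.3, 3: 0.0}                     # node fields (made up)
clusters = [(0, 1), (2, 3)]        # disjoint clusters; cut edges are (1,2) and (3,0)
mean = {i: 0.0 for i in field}     # current expectations <x_i> under q

def cluster_update(ci):
    """Exact marginal of cluster ci, with blanket variables in cut cliques
    replaced by their current expectations (cf. the structure of Eq. (2))."""
    states = list(itertools.product([-1, +1], repeat=len(ci)))
    weights = []
    for s in states:
        val = dict(zip(ci, s))
        e = sum(field[i] * val[i] for i in ci)
        for (a, b), th in theta.items():
            if a in val and b in val:      # clique contained in the cluster
                e += th * val[a] * val[b]
            elif a in val:                 # cut clique: plug in <x_b>
                e += th * val[a] * mean[b]
            elif b in val:                 # cut clique: plug in <x_a>
                e += th * mean[a] * val[b]
        weights.append(math.exp(e))
    z = sum(weights)
    for k, i in enumerate(ci):             # refresh expectations from the marginal
        mean[i] = sum(w * s[k] for w, s in zip(weights, states)) / z

for _ in range(25):                        # iterate cluster updates to a fixed point
    for c in clusters:
        cluster_update(c)
print({i: round(m, 3) for i, m in mean.items()})
```

Changing the partition changes which couplings are treated exactly and which are approximated in mean field across the cut; that trade-off is precisely the degree of freedom the graph partitioning strategies studied in this paper aim to optimize.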