MUQ  0.4.3
GaussianGammaSampling.cpp
/***
## Overview
The goal of this example is to demonstrate MCMC sampling with block updates.
The problem is to sample a Gaussian distribution where the variance of
the Gaussian is a random variable endowed with an inverse Gamma distribution.
Thus, there are two "blocks" of interest: the Gaussian random variable and the
variance hyperparameter. We will sample the joint distribution of both parameters
by constructing a Metropolis-Within-Gibbs sampler.

### Problem Formulation
Let $x$ denote the Gaussian random variable and $\sigma^2$ its
variance, which follows an Inverse Gamma distribution. Notice that the joint
density $\pi(x,\sigma^2)$ can be expanded in two ways:
$$
\pi(x,\sigma^2) = \pi(x | \sigma^2)\pi(\sigma^2)
$$
and
$$
\pi(x,\sigma^2) = \pi(\sigma^2 | x)\pi(x).
$$
We will use this to simplify the Metropolis-Hastings acceptance ratios below.


Now consider a two-stage Metropolis-Hastings algorithm. In the first stage
we take a step in $x$ with a fixed value of $\sigma^2$ and in the second stage
we take a step in $\sigma^2$ with a fixed value of $x$. Using the Metropolis-Hastings
rule for each stage, the algorithm is given by

1. Update $x$

 a. Propose a step in the $x$ block, $x^\prime \sim q(x | x^{(k-1)}, \sigma^{(k-1)})$

 b. Compute the acceptance probability using the expanded joint density
 $$\begin{eqnarray}
 \gamma &=& \frac{\pi(x^\prime | \sigma^{(k-1)})\pi(\sigma^{(k-1)})}{\pi(x^{(k-1)} | \sigma^{(k-1)}) \pi(\sigma^{(k-1)})} \frac{q(x^{(k-1)} | x^\prime, \sigma^{(k-1)})}{q(x^\prime | x^{(k-1)}, \sigma^{(k-1)})} \\\\
 &=& \frac{\pi(x^\prime | \sigma^{(k-1)})}{\pi(x^{(k-1)} | \sigma^{(k-1)})} \frac{q(x^{(k-1)} | x^\prime, \sigma^{(k-1)})}{q(x^\prime | x^{(k-1)}, \sigma^{(k-1)})}
 \end{eqnarray}$$

 c. Take the step in the $x$ block: $x^{(k)} = x^\prime$ with probability $\min(1,\gamma)$, else $x^{(k)} = x^{(k-1)}$

2. Update $\sigma^2$

 a. Propose a step in the $\sigma^2$ block, $\sigma^\prime \sim q(\sigma | x^{(k)}, \sigma^{(k-1)})$

 b. Compute the acceptance probability using the expanded joint density
 $$\begin{eqnarray}
 \gamma &=& \frac{\pi(\sigma^\prime | x^{(k)})\pi(x^{(k)})}{\pi(\sigma^{(k-1)} | x^{(k)}) \pi(x^{(k)})} \frac{q(\sigma^{(k-1)} | \sigma^\prime, x^{(k)})}{q(\sigma^\prime | \sigma^{(k-1)}, x^{(k)})} \\\\
 &=& \frac{\pi(\sigma^\prime | x^{(k)})}{\pi(\sigma^{(k-1)} | x^{(k)})} \frac{q(\sigma^{(k-1)} | \sigma^\prime, x^{(k)})}{q(\sigma^\prime | \sigma^{(k-1)}, x^{(k)})}.
 \end{eqnarray}$$

 c. Take the step in the $\sigma^2$ block: $\sigma^{(k)} = \sigma^\prime$ with probability $\min(1,\gamma)$, else $\sigma^{(k)} = \sigma^{(k-1)}$

The extra complexity of this two-stage approach is warranted when one or both of the block proposals
$q(\sigma^\prime | \sigma^{(k-1)}, x^{(k)})$ and $q(x^\prime | x^{(k-1)}, \sigma^{(k-1)})$
can be chosen to match the conditional target densities $\pi(\sigma^\prime | x^{(k)})$
and $\pi(x^\prime | \sigma^{(k-1)})$. When $\pi(x | \sigma^2)$ is Gaussian
and $\pi(\sigma^2)$ is Inverse Gamma, as is the case in this example, the conditional target distribution $\pi(\sigma^2 | x)$ can be sampled
directly, allowing us to choose $q(\sigma^\prime | \sigma^{(k-1)}, x^{(k)}) = \pi(\sigma^\prime | x^{(k)})$.
This guarantees an acceptance probability of one for the $\sigma^2$ update. Notice
that in this example, $\pi(x^\prime | \sigma^{(k-1)})$ is Gaussian and could also
be sampled directly. For illustrative purposes, however, we will mix a random
walk proposal on $x$ (whose symmetric proposal density cancels in the acceptance ratio of step 1) with an independent Inverse Gamma proposal on $\sigma^2$.
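
To see why the $\sigma^2$ update is conjugate, note that with $x\in\mathbb{R}^d$,
$\pi(x|\sigma^2) = N(\mu, \sigma^2 I)$, and an Inverse Gamma prior
$\pi(\sigma^2) \propto (\sigma^2)^{-\alpha-1}\exp(-\beta/\sigma^2)$ (the shape-scale
convention assumed here; $d=2$ in this example), the conditional density satisfies
$$
\pi(\sigma^2 | x) \propto (\sigma^2)^{-d/2}\exp\left(-\frac{\|x-\mu\|^2}{2\sigma^2}\right)(\sigma^2)^{-\alpha-1}\exp\left(-\frac{\beta}{\sigma^2}\right),
$$
which is again Inverse Gamma, with shape $\alpha + d/2$ and scale $\beta + \frac{1}{2}\|x-\mu\|^2$.
This closed-form conditional is exactly what an Inverse Gamma proposal can draw from,
which is why the corresponding Metropolis-Hastings acceptance probability is one.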

*/

/***
## Implementation
To sample the Gaussian-Inverse Gamma target, the code needs to do five things:

1. Define the joint Gaussian-Inverse Gamma target density with two inputs and set up a sampling problem.

2. Construct the blockwise Metropolis-Hastings algorithm with a mix of random walk and Inverse Gamma proposals.

3. Run the MCMC algorithm.

4. Analyze the results.

5. Compute convergence diagnostics by running multiple independent chains.

### Include statements
*/
// Besides WorkGraph, this example needs the MUQ headers declaring the distributions,
// operators, and sampling classes used below (paths assumed; adjust to your MUQ version).
#include "MUQ/Modeling/WorkGraph.h"
#include "MUQ/Modeling/IdentityOperator.h"
#include "MUQ/Modeling/ReplicateOperator.h"
#include "MUQ/Modeling/Distributions/Gaussian.h"
#include "MUQ/Modeling/Distributions/InverseGamma.h"
#include "MUQ/Modeling/Distributions/DensityProduct.h"
#include "MUQ/SamplingAlgorithms/SamplingProblem.h"
#include "MUQ/SamplingAlgorithms/MCMCFactory.h"
#include "MUQ/SamplingAlgorithms/Diagnostics.h"
#include "MUQ/Utilities/RandomGenerator.h"

#include <boost/property_tree/ptree.hpp>

namespace pt = boost::property_tree;
using namespace muq::Modeling;
using namespace muq::SamplingAlgorithms;
using namespace muq::Utilities;


int main(){

  /***
  ### 1. Define the joint Gaussian-Inverse Gamma target density
  Here we need to construct the joint density $\pi(x | \sigma^2)\pi(\sigma^2)$.
  Combining model components in MUQ is accomplished by creating a
  WorkGraph, adding components to the graph as nodes, and then adding edges to
  connect the components and construct the more complicated model. Once constructed,
  our graph should look like:

  <center>
  <img src="DocFiles/GraphImage.png" alt="Model Graph" style="width: 400px;"/>
  </center>

  Dashed nodes in the image correspond to model inputs, and nodes with solid borders
  represent model components implemented as children of the ModPiece class.
  The following code creates each of the components and then adds them to the
  graph. Notice that when the ModPieces are added to the graph, a node name is
  specified. Later, we will use these names to identify structure in the graph
  that can be exploited to generate the Inverse Gamma proposal. Note that the
  node names used below correspond to the names used in the figure above.
  */
  // The variance enters the graph through an identity operation so that it appears
  // as its own node ("Variance") and can feed both the Gaussian and its own prior density.
  auto varPiece = std::make_shared<IdentityOperator>(1);

  /***
  In previous examples, the only input to the Gaussian density was the parameter
  $x$ itself. Here, however, the variance is also an input. The Gaussian class
  provides an enum for defining this type of extra input. Available options are

  - `Gaussian::Mean` The mean should be included as an input
  - `Gaussian::DiagCovariance` The covariance diagonal is an input
  - `Gaussian::DiagPrecision` The precision diagonal is an input
  - `Gaussian::FullCovariance` The full covariance is an input (unraveled as a vector)
  - `Gaussian::FullPrecision` The full precision is an input (unraveled as a vector)

  The `|` operator can be used to combine options. For example, if both the mean and
  diagonal covariance will be inputs, then we could pass `Gaussian::Mean | Gaussian::DiagCovariance`
  to the Gaussian constructor.

  In our case, only the diagonal covariance will be an input, so we can simply use `Gaussian::DiagCovariance`.
  */
  Eigen::VectorXd mu(2);
  mu << 1.0, 2.0;

  auto gaussDens = std::make_shared<Gaussian>(mu, Gaussian::DiagCovariance)->AsDensity();

  std::cout << "Gaussian piece has " << gaussDens->inputSizes.size()
            << " inputs with sizes " << gaussDens->inputSizes.transpose() << std::endl;

  /***
  Here we construct the Inverse Gamma distribution $\pi(\sigma^2)$.
  */
  const double alpha = 2.5;
  const double beta = 1.0;

  auto varDens = std::make_shared<InverseGamma>(alpha,beta)->AsDensity();

  /***
  To define the product $\pi(x|\sigma^2)\pi(\sigma^2)$, we will use the DensityProduct class.
  */
  auto prodDens = std::make_shared<DensityProduct>(2);

/***
The Gaussian density used here is two dimensional with the same variance in each
dimension. The Gaussian ModPiece thus requires a two dimensional vector to define
the diagonal covariance. To support that, we need to replicate the 1D vector
returned by the "varPiece" IdentityOperator. The ReplicateOperator class
provides this functionality.
*/
  auto replOp = std::make_shared<ReplicateOperator>(1,2);

  auto graph = std::make_shared<WorkGraph>();

  graph->AddNode(gaussDens, "Gaussian Density");
  graph->AddNode(varPiece, "Variance");
  graph->AddNode(varDens, "Variance Density");
  graph->AddNode(prodDens, "Joint Density");
  graph->AddNode(replOp, "Replicated Variance");

  // AddEdge(source node, source output index, target node, target input index)
  graph->AddEdge("Variance", 0, "Replicated Variance", 0);
  graph->AddEdge("Replicated Variance", 0, "Gaussian Density", 1);
  graph->AddEdge("Variance", 0, "Variance Density", 0);

  graph->AddEdge("Gaussian Density", 0, "Joint Density", 0);
  graph->AddEdge("Variance Density", 0, "Joint Density", 1);

  /***
  #### Visualize the graph
  To check that we constructed the graph correctly, we will employ the
  WorkGraph::Visualize function. This function generates an image in the folder
  where the executable was run. The result will look like the image below. Looking
  at this image closely, we see that, apart from the additional "Replicated Variance"
  node, the structure matches what we expect.
  */
  graph->Visualize("DensityGraph.png");
  /***
  <center>
  <img src="DensityGraph.png" alt="MUQ-Generated Graph" style="width: 600px;"/>
  </center>
  */


  /***
  #### Construct the joint density model and sampling problem
  Here we wrap the graph into a single ModPiece that can be used to construct the
  sampling problem.
  */
  auto jointDens = graph->CreateModPiece("Joint Density");

  auto problem = std::make_shared<SamplingProblem>(jointDens);

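  /***
  As a quick sanity check, we can evaluate the log of the joint density directly
  through the graph-generated ModPiece. The call below assumes the standard
  ModPiece::Evaluate interface, which takes one Eigen::VectorXd per input block
  (here $x$ first and $\sigma^2$ second) and returns the density value as a
  one-dimensional vector.
  */
  std::vector<Eigen::VectorXd> testInputs{mu, Eigen::VectorXd::Ones(1)};
  std::cout << "Joint log-density at (mu, 1) = "
            << jointDens->Evaluate(testInputs).at(0)(0) << std::endl;
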
/***
### 2. Construct the blockwise Metropolis-Hastings algorithm
The entire two-block MCMC algorithm described above can be specified using the
same boost property tree approach used to specify a single chain algorithm. The
only difference is the number of kernels specified in the "KernelList" ptree
entry. Here, two additional ptree blocks are specified: "Kernel1" and "Kernel2".

"Kernel1" specifies the transition kernel used to update the Gaussian variable
$x$. Here, it is the same random walk Metropolis (RWM) algorithm used in the
first MCMC example.

"Kernel2" specifies the transition kernel used to update the variance $\sigma^2$.
It could also employ a random walk proposal, but here we use the Inverse Gamma
proposal, which requires knowledge about both $\pi(x | \sigma^2)$ and $\pi(\sigma^2)$.
To pass this information to the proposal, we specify which nodes in the WorkGraph
correspond to the Gaussian and Inverse Gamma densities.
*/
  pt::ptree pt;
  pt.put("NumSamples", 1e5); // number of MCMC steps
  pt.put("BurnIn", 1e4); // number of initial steps to discard
  pt.put("PrintLevel",3); // amount of information printed to the terminal
  pt.put("KernelList", "Kernel1,Kernel2"); // Names of the blocks that define the transition kernels

  pt.put("Kernel1.Method","MHKernel"); // Name of the transition kernel class
  pt.put("Kernel1.Proposal", "MyProposal"); // Name of block defining the proposal distribution
  pt.put("Kernel1.MyProposal.Method", "MHProposal"); // Name of proposal class
  pt.put("Kernel1.MyProposal.ProposalVariance", 1.0); // Variance of the isotropic MH proposal

  pt.put("Kernel2.Method","MHKernel"); // Name of the transition kernel class
  pt.put("Kernel2.Proposal", "GammaProposal"); // Name of block defining the proposal distribution

  pt.put("Kernel2.GammaProposal.Method", "InverseGammaProposal");
  pt.put("Kernel2.GammaProposal.InverseGammaNode", "Variance Density"); // Node defining pi(sigma^2)
  pt.put("Kernel2.GammaProposal.GaussianNode", "Gaussian Density"); // Node defining pi(x | sigma^2)

  /***
  Once the algorithm parameters are specified, we can pass them to the CreateSingleChain
  function of the MCMCFactory class to create an instance of the MCMC algorithm we defined in the
  property tree.
  */
  auto mcmc = MCMCFactory::CreateSingleChain(pt, problem);

  /***
  ### 3. Run the MCMC algorithm
  We are now ready to run the MCMC algorithm. Here we start the chain at the
  target density's mean. The resulting samples are returned in an instance
  of the SampleCollection class, which internally holds the steps in the MCMC chain
  as a vector of weighted SamplingStates.
  */
  std::vector<Eigen::VectorXd> startPt(2);
  startPt.at(0) = mu; // Start the Gaussian block at the mean
  startPt.at(1) = Eigen::VectorXd::Ones(1); // Set the starting value of the variance to 1

  std::shared_ptr<SampleCollection> samps = mcmc->Run(startPt);
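
  /***
  For post-processing or plotting outside of MUQ, the chain can also be extracted
  as a plain Eigen matrix. The AsMatrix call below assumes the SampleCollection
  interface that returns one column per MCMC state with the parameter blocks
  stacked into the rows.
  */
  Eigen::MatrixXd sampMat = samps->AsMatrix();
  std::cout << "Chain matrix is " << sampMat.rows() << " x " << sampMat.cols() << std::endl;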

  /***
  ### 4. Analyze the results
  */

  Eigen::VectorXd sampMean = samps->Mean();
  std::cout << "Sample Mean = \n" << sampMean.transpose() << std::endl;

  Eigen::VectorXd sampVar = samps->Variance();
  std::cout << "\nSample Variance = \n" << sampVar.transpose() << std::endl;

  Eigen::MatrixXd sampCov = samps->Covariance();
  std::cout << "\nSample Covariance = \n" << sampCov << std::endl;

  Eigen::VectorXd sampMom3 = samps->CentralMoment(3);
  std::cout << "\nSample Third Moment = \n" << sampMom3 << std::endl << std::endl;

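  /***
  Because MCMC samples are correlated, the accuracy of these estimates is governed
  by the effective sample size rather than the raw chain length. The call below
  assumes the SampleCollection class exposes an ESS member function, as in recent
  MUQ versions, returning one value per component.
  */
  Eigen::VectorXd ess = samps->ESS();
  std::cout << "Effective Sample Size = " << ess.transpose() << std::endl << std::endl;
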

  /***
  ### 5. Compute convergence diagnostics
  To quantitatively assess whether the chain has converged, we need to run multiple
  chains and then compare the results. Below we run three more independent chains (for a total of four)
  and then analyze convergence using the commonly employed $\hat{R}$ diagnostic. Values of
  $\hat{R}$ close to one (e.g., below roughly 1.05) indicate that the chains agree, while
  larger values signal a lack of convergence.

  Notice that a new MCMC sampler is defined each time. If we simply called `mcmc->Run()`
  multiple times, the sampler would always pick up where it left off.
  */
  int numChains = 4;
  std::vector<std::shared_ptr<SampleCollection>> chains(numChains);
  chains.at(0) = samps;

  for(int i=1; i<numChains; ++i){
    startPt.at(0) = mu + RandomGenerator::GetNormal(mu.size()); // Start the Gaussian block at a random point near the mean
    startPt.at(1) = RandomGenerator::GetNormal(1).array().exp(); // Draw a random positive starting value for the variance

    mcmc = MCMCFactory::CreateSingleChain(pt, problem);
    chains.at(i) = mcmc->Run(startPt);
  }

  Eigen::VectorXd rhat = Diagnostics::Rhat(chains);
  std::cout << "Rhat = " << rhat.transpose() << std::endl;

  return 0;
}