MUQ  0.4.3
GaussianSampling.cpp
/***
## Overview
The goal of this example is to demonstrate the use of MUQ's MCMC stack by sampling
a simple bivariate Gaussian density. To keep things as simple as possible, we
employ a Metropolis-Hastings transition kernel with a simple random walk proposal.
The idea is to introduce the MUQ MCMC workflow without the additional
complexities that come from more challenging target densities or more complicated
MCMC algorithms.

### Background
Let $x$ denote a random variable taking values in a space $\mathcal{X}$, and let $\pi(x)$ denote the
probability density of $x$. In many cases, we cannot compute expectations with
respect to $\pi(x)$ analytically, and we need to resort to some sort of numerical
integration. Typically, such approaches approximate an expectation $\mathbb{E}_x \left[f(x)\right]$
through some sort of weighted sum of the form
$$
\mathbb{E}_x\left[f(x)\right] \approx \sum_{k=1}^K w_k f\left(x^{(k)}\right).
$$
In standard Monte Carlo procedures, the weights are constant, $w_k=\frac{1}{K}$, and the
points $x^{(k)}$ are independent samples of $\pi(x)$. However, generating
independent samples is not always possible for general $\pi(x)$. Markov chain
Monte Carlo (MCMC) is one way to get around this. The basic idea is to create a sequence
of points (i.e., a chain) that are correlated, but that for large $K$ can still be used in a Monte
Carlo approximation. We further require that the chain is Markov, which simply means that $x^{(k)}$
depends only on the previous step in the chain, $x^{(k-1)}$.

#### Transition Kernels
The transition from $x^{(k-1)}$ to $x^{(k)}$ is controlled by a probability distribution
over $\mathcal{X}$ called the transition kernel. This distribution is constructed
to ensure that the chain can be used to form a Monte Carlo approximation as $K\rightarrow \infty$.
More precisely, the transition kernel is constructed to ensure that the chain forms
a stationary random process with stationary distribution $\pi(x)$ and is ergodic,
which ensures the sample mean will converge to the true expectation almost surely
as $K\rightarrow \infty$. Note that establishing a central limit theorem and
studying the variance of the MCMC estimator requires additional technical conditions
on the transition kernel. See <a href=https://www.springer.com/us/book/9780387212395>Robert and Casella, <i>Monte Carlo Statistical Methods</i></a>
for more details.

#### Metropolis-Hastings Rule
While not the only option, the Metropolis-Hastings rule is one of the most common
methods for constructing an appropriate transition kernel. The idea is to start
with a proposal distribution $q(x | x^{(k-1)})$ and then "correct" the proposal
with an accept/reject step to ensure ergodicity. The basic procedure is as follows:

1. Generate a sample $x^\prime$ from the proposal distribution
$$
x^\prime \sim q(x | x^{(k-1)}).
$$

2. Compute the acceptance probability $\gamma$
$$
\gamma = \min\left[1,\; \frac{\pi(x^\prime)}{\pi(x^{(k-1)})} \frac{q(x^{(k-1)} | x^\prime)}{q(x^\prime | x^{(k-1)})}\right].
$$

3. Accept the proposed point $x^\prime$ as the next state with probability $\gamma$
$$
x^{(k)} = \left\\{ \begin{array}{lll} x^\prime & \text{with probability} & \gamma\\\\ x^{(k-1)} & \text{with probability} & 1-\gamma\end{array}\right.
$$

#### Proposal Distributions
Clearly, the proposal density $q(x | x^{(k-1)})$ is a key component of the Metropolis-Hastings
transition kernel. Fortunately though, there is incredible flexibility in the
choice of proposal; the Metropolis-Hastings correction step ensures, at least
asymptotically, that the resulting chain will be ergodic. While not
the most efficient, a common choice of proposal is an isotropic Gaussian distribution
centered at $x^{(k-1)}$. The resulting algorithm is commonly referred to as
the "Random Walk Metropolis" (RWM) algorithm. Below, we will use the RWM
algorithm in MUQ to sample a simple bivariate Gaussian target density $\pi(x)$.

### MCMC in MUQ
The MCMC classes in MUQ are analogous to the mathematical components of an MCMC
algorithm: there is a base class representing the chain, another base class representing
the transition kernel, and, for Metropolis-Hastings, a third base class representing
the proposal distribution. The RWM algorithm can be constructed by combining an
instance of the "SingleChainMCMC" chain class with an instance of the "MHKernel"
transition kernel class and an instance of the "MHProposal" random walk proposal class. In
later examples, the flexibility of this structure will become clear, as we construct
algorithms of increasing complexity by simply exchanging each of these components
with alternative chains, kernels, and proposals.

## Problem Description
Use the Random Walk Metropolis algorithm to sample a two-dimensional normal
distribution with mean
$$
\mu = \left[\begin{array}{c} 1.0\\\\ 2.0\end{array}\right],
$$
and covariance
$$
\Sigma = \left[\begin{array}{cc} 1.0 & 0.8 \\\\ 0.8 & 1.5 \end{array}\right].
$$
*/

/***
## Implementation
To sample the Gaussian target, the code needs to do four things:

1. Define the target density and set up a sampling problem.

2. Construct the RWM algorithm.

3. Run the MCMC algorithm.

4. Analyze the results.

### Include statements
Include the necessary header files from MUQ and elsewhere. Notice our use of the
<a href=https://www.boost.org/doc/libs/1_65_1/doc/html/property_tree.html>boost::property_tree class</a>.
We use property trees to pass in algorithm parameters, and even to define the
chain, kernel, and proposal themselves.
*/
#include "MUQ/Modeling/Distributions/Gaussian.h"
#include "MUQ/Modeling/Distributions/Density.h"

#include "MUQ/SamplingAlgorithms/SamplingProblem.h"
#include "MUQ/SamplingAlgorithms/SingleChainMCMC.h"
#include "MUQ/SamplingAlgorithms/MCMCFactory.h"
#include "MUQ/SamplingAlgorithms/Diagnostics.h"

#include "MUQ/Utilities/RandomGenerator.h"
#include <boost/property_tree/ptree.hpp>

namespace pt = boost::property_tree;
using namespace muq::Modeling;
using namespace muq::SamplingAlgorithms;
using namespace muq::Utilities;


int main(){

  /***
  ### 1. Define the target density and set up the sampling problem
  MUQ has extensive tools for combining many model components into larger,
  more complicated models. The AbstractSamplingProblem base class and its
  children, like the SamplingProblem class, define the interface between
  sampling algorithms like MCMC and the models and densities they work with.

  Here, we create a very simple target density and then construct a SamplingProblem
  directly from the density.
  */
  /***
  Define the Target Density:
  */
  Eigen::VectorXd mu(2);
  mu << 1.0, 2.0;

  Eigen::MatrixXd cov(2,2);
  cov << 1.0, 0.8,
         0.8, 1.5;

  auto targetDensity = std::make_shared<Gaussian>(mu, cov)->AsDensity(); // Gaussian density with mean mu and covariance cov

  /***
  Create the Sampling Problem:
  */
  auto problem = std::make_shared<SamplingProblem>(targetDensity);

  /***
  ### 2. Construct the RWM algorithm
  One of the easiest ways to define an MCMC algorithm is to put all of the algorithm
  parameters, including the kernel and proposal definitions, in a property tree
  and then let MUQ construct each of the algorithm components: chain, kernel, and
  proposal.

  The boost property tree will have the following entries:

  - NumSamples : 10000
  - BurnIn : 1000
  - PrintLevel : 3
  - KernelList : "Kernel1"
  - Kernel1
    * Method : "MHKernel"
    * Proposal : "MyProposal"
    * MyProposal
      + Method : "MHProposal"
      + ProposalVariance : 2.5

  At the base level, we specify the number of steps in the chain with the entry "NumSamples".
  Note that this number includes any burn-in samples, whose count is set with the
  "BurnIn" entry. The kernel is then defined
  in the "KernelList" entry. The value, "Kernel1", specifies a block in the
  property tree with the kernel definition. In the "Kernel1" block, we set the
  kernel to "MHKernel," which specifies that we want to use the Metropolis-Hastings
  kernel. We also tell MUQ to use the "MyProposal" block to define the proposal.
  The proposal method is specified as "MHProposal", which is the random walk
  proposal used in the RWM algorithm, and the proposal variance is set to 2.5.
  */

  // parameters for the sampler
  pt::ptree pt;
  pt.put("NumSamples", 1e4); // number of MCMC steps
  pt.put("BurnIn", 1e3); // number of initial steps to discard as burn-in
  pt.put("PrintLevel", 3);
  pt.put("KernelList", "Kernel1"); // Name of block that defines the transition kernel
  pt.put("Kernel1.Method","MHKernel"); // Name of the transition kernel class
  pt.put("Kernel1.Proposal", "MyProposal"); // Name of block defining the proposal distribution
  pt.put("Kernel1.MyProposal.Method", "MHProposal"); // Name of proposal class
  pt.put("Kernel1.MyProposal.ProposalVariance", 2.5); // Variance of the isotropic MH proposal

  /***
  Once the algorithm parameters are specified, we can pass them to the CreateSingleChain
  function of the MCMCFactory class to create an instance of the MCMC algorithm we defined in the
  property tree.
  */
  Eigen::VectorXd startPt = mu;
  auto mcmc = MCMCFactory::CreateSingleChain(pt, problem);

  /***
  ### 3. Run the MCMC algorithm
  We are now ready to run the MCMC algorithm. Here we start the chain at the
  target density's mean. The resulting samples are returned in an instance
  of the SampleCollection class, which internally holds the steps in the MCMC chain
  as a vector of weighted SamplingStates.
  */
  std::shared_ptr<SampleCollection> samps = mcmc->Run(startPt);

  /***
  ### 4. Analyze the results

  When looking at the entries in a SampleCollection, it is important to note that
  the states stored by a SampleCollection are weighted, even in the MCMC setting.
  When a proposal $x^\prime$ is rejected, instead of making a copy of $x^{(k-1)}$
  for $x^{(k)}$, the weight on $x^{(k-1)}$ is simply incremented. This is useful
  for large chains in high-dimensional parameter spaces, where storing all duplicates
  could quickly consume available memory.
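  As an aside, here is a small standalone sketch (not using MUQ's classes) of how a sample mean can be computed from weighted states; the `WeightedState` struct is purely illustrative:

  ```cpp
  #include <iostream>
  #include <vector>

  // Illustrative stand-in for a weighted sample: a repeated state is stored
  // once, with its weight counting the number of repetitions.
  struct WeightedState {
    double weight;
    double state;
  };

  // Weighted sample mean: sum(w_k * x_k) / sum(w_k)
  double WeightedMean(const std::vector<WeightedState>& samps) {
    double wSum = 0.0, xSum = 0.0;
    for (const auto& s : samps) {
      wSum += s.weight;
      xSum += s.weight * s.state;
    }
    return xSum / wSum;
  }

  int main() {
    // The chain 1.0, 1.0, 1.0, 3.0 stored in compressed, weighted form
    std::vector<WeightedState> samps{{3.0, 1.0}, {1.0, 3.0}};
    std::cout << "Weighted mean: " << WeightedMean(samps) << std::endl; // prints 1.5
    return 0;
  }
  ```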

  The SampleCollection class provides several functions for computing sample moments.
  For example, here we compute the mean, variance, and third central moment.
  While the third moment is actually a tensor, here we only return the marginal
  values, i.e., $\mathbb{E}_x[(x_i-\mu_i)^3]$ for each $i$.
  */
  Eigen::VectorXd sampMean = samps->Mean();
  std::cout << "\nSample Mean = \n" << sampMean.transpose() << std::endl;

  Eigen::VectorXd sampVar = samps->Variance();
  std::cout << "\nSample Variance = \n" << sampVar.transpose() << std::endl;

  Eigen::MatrixXd sampCov = samps->Covariance();
  std::cout << "\nSample Covariance = \n" << sampCov << std::endl;

  Eigen::VectorXd sampMom3 = samps->CentralMoment(3);
  std::cout << "\nSample Third Moment = \n" << sampMom3 << std::endl << std::endl;

  /***
  #### Statistical Accuracy
  In addition to looking at moments and other expectations with respect to the target distribution, we can also look at the
  statistical accuracy of the mean estimate. Specifically, we can look at the Monte Carlo standard error (MCSE) and the effective
  sample size (ESS) of the chain. There are multiple ways of computing these quantities, and MUQ provides implementations of both
  batch and spectral methods. Batch methods use the means of different subsets of the chain to estimate the estimator variance,
  whereas spectral methods look at the autocorrelation of the chain. MUQ uses the spectral method described in
  [Monte Carlo errors with less error](https://doi.org/10.1016/S0010-4655(03)00467-3).
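  For intuition, a bare-bones version of the batch-means estimator (a sketch, independent of MUQ's actual implementation) is shown below; the batch size of $\lfloor\sqrt{K}\rfloor$ is a common convention, not something mandated by MUQ:

  ```cpp
  #include <cmath>
  #include <iostream>
  #include <random>
  #include <vector>

  // Batch-means estimate of the Monte Carlo standard error of the chain mean.
  // The chain is split into contiguous batches; the sample variance of the
  // batch means estimates the variance of the overall estimator.
  double BatchMCSE(const std::vector<double>& chain) {
    const int K = chain.size();
    const int batchSize = static_cast<int>(std::sqrt(K)); // common heuristic
    const int numBatches = K / batchSize;

    // Mean of each batch
    std::vector<double> batchMeans(numBatches, 0.0);
    for (int b = 0; b < numBatches; ++b) {
      for (int i = 0; i < batchSize; ++i)
        batchMeans[b] += chain[b * batchSize + i];
      batchMeans[b] /= batchSize;
    }

    // Overall mean of the batch means
    double mean = 0.0;
    for (double m : batchMeans) mean += m;
    mean /= numBatches;

    // Sample variance of the batch means
    double var = 0.0;
    for (double m : batchMeans) var += (m - mean) * (m - mean);
    var /= (numBatches - 1);

    // Estimator variance is approximately var(batch means) / numBatches
    return std::sqrt(var / numBatches);
  }

  int main() {
    // An IID N(0,1) "chain" for illustration; its true MCSE is 1/sqrt(K) = 0.01
    std::mt19937 gen(7);
    std::normal_distribution<double> dist(0.0, 1.0);
    std::vector<double> chain(10000);
    for (auto& x : chain) x = dist(gen);
    std::cout << "Batch MCSE: " << BatchMCSE(chain) << std::endl;
    return 0;
  }
  ```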
  */

  Eigen::VectorXd batchESS = samps->ESS("Batch");
  Eigen::VectorXd batchMCSE = samps->StandardError("Batch");

  Eigen::VectorXd spectralESS = samps->ESS("Wolff");
  Eigen::VectorXd spectralMCSE = samps->StandardError("Wolff");

  std::cout << "ESS:\n";
  std::cout << "  Batch:    " << batchESS.transpose() << std::endl;
  std::cout << "  Spectral: " << spectralESS.transpose() << std::endl;
  std::cout << "MCSE:\n";
  std::cout << "  Batch:    " << batchMCSE.transpose() << std::endl;
  std::cout << "  Spectral: " << spectralMCSE.transpose() << std::endl;

  /***
  ### 5. Compute convergence diagnostics
  To quantitatively assess whether the chain has converged, we need to run multiple
  chains and then compare the results. Below we run 3 more independent chains (for a total of 4)
  and then analyze convergence using the commonly employed $\hat{R}$ diagnostic. A value of $\hat{R}$ close to $1$ (e.g., $<1.01$)
  implies that the chains have converged. More discussion on this point, as well as a description of the split-rank approach used
  in MUQ to estimate $\hat{R}$, can be found in [Rank-normalization, folding, and localization: An improved R for assessing convergence of MCMC](https://arxiv.org/pdf/1903.08008.pdf).

  Notice that a new MCMC sampler is constructed for each chain with a randomly selected starting point. If we simply called `mcmc->Run()`
  multiple times, the sampler would always pick up where it left off. For the estimation of $\hat{R}$, it is also important that
  the initial states of these chains be drawn from a distribution that is more "diffuse" than the target distribution.
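  To show the idea behind the diagnostic, here is a sketch of the classic (non-split, non-rank-normalized) scalar $\hat{R}$, which compares the between-chain and within-chain variances; MUQ's Diagnostics::Rhat uses the more robust split-rank variant from the paper above:

  ```cpp
  #include <cmath>
  #include <iostream>
  #include <random>
  #include <vector>

  // Basic scalar Rhat: the ratio of a pooled variance estimate (combining
  // between-chain variance B and within-chain variance W) to W. Values near
  // 1 indicate the chains agree.
  double BasicRhat(const std::vector<std::vector<double>>& chains) {
    const int m = chains.size();    // number of chains
    const int n = chains[0].size(); // steps per chain

    std::vector<double> chainMeans(m, 0.0);
    double grandMean = 0.0;
    for (int j = 0; j < m; ++j) {
      for (double x : chains[j]) chainMeans[j] += x;
      chainMeans[j] /= n;
      grandMean += chainMeans[j] / m;
    }

    double B = 0.0; // between-chain variance
    for (double cm : chainMeans)
      B += n * (cm - grandMean) * (cm - grandMean) / (m - 1);

    double W = 0.0; // average within-chain variance
    for (int j = 0; j < m; ++j) {
      double s2 = 0.0;
      for (double x : chains[j]) s2 += (x - chainMeans[j]) * (x - chainMeans[j]);
      W += s2 / (n - 1) / m;
    }

    double varPlus = (n - 1.0) / n * W + B / n;
    return std::sqrt(varPlus / W);
  }

  int main() {
    // Four well-mixed "chains" drawn from the same distribution
    std::mt19937 gen(3);
    std::normal_distribution<double> dist(0.0, 1.0);
    std::vector<std::vector<double>> chains(4, std::vector<double>(2000));
    for (auto& chain : chains)
      for (auto& x : chain) x = dist(gen);

    std::cout << "Rhat: " << BasicRhat(chains) << std::endl; // close to 1
    return 0;
  }
  ```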
  */
  pt.put("PrintLevel", 0);
  int numChains = 4;
  std::vector<std::shared_ptr<SampleCollection>> chains(numChains);
  chains.at(0) = samps;

  for(int i=1; i<numChains; ++i){
    std::cout << "Running chain " << i << "..." << std::flush;
    Eigen::VectorXd x0 = startPt + 1.5*RandomGenerator::GetNormal(mu.size()); // Diffuse random starting point around the target mean

    mcmc = MCMCFactory::CreateSingleChain(pt, problem);
    chains.at(i) = mcmc->Run(x0);

    std::cout << " done" << std::endl;
  }

  Eigen::VectorXd rhat = Diagnostics::Rhat(chains);
  std::cout << "\nRhat = " << rhat.transpose() << std::endl;


  return 0;
}