A class to hold and analyze a collection of SamplingState objects. More...
#include <SampleCollection.h>
A class to hold and analyze a collection of SamplingState objects.
Definition at line 71 of file SampleCollection.h.
Public Member Functions | |
SampleCollection ()=default | |
virtual | ~SampleCollection ()=default |
virtual void | Add (std::shared_ptr< SamplingState > newSamp) |
virtual std::shared_ptr< SamplingState > | at (unsigned i) |
virtual const std::shared_ptr< SamplingState > | at (unsigned i) const |
virtual unsigned int | size () const |
virtual std::shared_ptr< SampleCollection > | head (unsigned int N) const |
virtual std::shared_ptr< SampleCollection > | tail (unsigned int N) const |
virtual std::shared_ptr< SampleCollection > | segment (unsigned int startInd, unsigned int length, unsigned int skipBy=1) const |
virtual const std::shared_ptr< SamplingState > | back () const |
virtual Eigen::VectorXd | CentralMoment (unsigned order, Eigen::VectorXd const &mean, int blockNum=-1) const override |
Computes the componentwise central moments (e.g., variance, skewness, kurtosis, etc..) of a specific order given that we already know the mean. More... | |
virtual Eigen::VectorXd | CentralMoment (unsigned int order, int blockNum=-1) const override |
virtual Eigen::VectorXd | Mean (int blockInd=-1) const override |
virtual Eigen::MatrixXd | Covariance (int blockInd=-1) const override |
virtual Eigen::MatrixXd | Covariance (Eigen::VectorXd const &mean, int blockInd=-1) const override |
virtual std::vector< Eigen::MatrixXd > | RunningCovariance (int blockInd=-1) const |
virtual std::vector< Eigen::MatrixXd > | RunningCovariance (Eigen::VectorXd const &mean, int blockInd=-1) const |
virtual Eigen::VectorXd | ESS (std::string const &method="Batch") const override |
Returns the effective sample size of the samples. More... | |
virtual Eigen::VectorXd | ESS (int blockDim) const override |
virtual Eigen::VectorXd | ESS (int blockDim, std::string const &method) const override |
Eigen::VectorXd | BatchESS (int blockInd=-1, int batchSize=-1, int overlap=-1) const |
double | MultiBatchESS (int blockInd=-1, int batchSize=-1, int overlap=-1) const |
virtual Eigen::VectorXd | StandardError (int blockInd, std::string const &method) const override |
virtual Eigen::VectorXd | StandardError (int blockInd) const override |
virtual Eigen::VectorXd | StandardError (std::string const &method="Batch") const override |
virtual Eigen::VectorXd | BatchError (int blockInd=-1, int batchSize=-1, int overlap=-1) const |
virtual Eigen::VectorXd | MultiBatchError (int blockInd=-1, int batchSize=-1, int overlap=-1) const |
virtual Eigen::MatrixXd | AsMatrix (int blockDim=-1) const |
virtual Eigen::VectorXd | Weights () const |
virtual Eigen::VectorXd | Rhat (int blockDim, unsigned int numSegments=4, boost::property_tree::ptree options=boost::property_tree::ptree()) const |
virtual Eigen::VectorXd | Rhat (unsigned int numSegments=4, boost::property_tree::ptree options=boost::property_tree::ptree()) const |
Public Member Functions inherited from muq::SamplingAlgorithms::SampleEstimator | |
virtual | ~SampleEstimator ()=default |
virtual unsigned int | BlockSize (int blockInd) const =0 |
virtual unsigned int | NumBlocks () const =0 |
virtual Eigen::VectorXd | CentralMoment (unsigned int order, Eigen::VectorXd const &mean, int blockNum=-1) const |
virtual Eigen::VectorXd | StandardizedMoment (unsigned int order, int blockInd=-1) const |
virtual Eigen::VectorXd | StandardizedMoment (unsigned int order, Eigen::VectorXd const &mean, int blockInd=-1) const |
virtual Eigen::VectorXd | StandardizedMoment (unsigned int order, Eigen::VectorXd const &mean, Eigen::VectorXd const &stdDev, int blockInd=-1) const |
virtual Eigen::VectorXd | Variance (int blockInd=-1) const |
virtual Eigen::VectorXd | Variance (Eigen::VectorXd const &mean, int blockInd=-1) const |
virtual Eigen::VectorXd | Skewness (int blockInd=-1) const |
virtual Eigen::VectorXd | Skewness (Eigen::VectorXd const &mean, int blockInd=-1) const |
virtual Eigen::VectorXd | Skewness (Eigen::VectorXd const &mean, Eigen::VectorXd const &stdDev, int blockInd=-1) const |
virtual Eigen::VectorXd | Kurtosis (int blockInd=-1) const |
virtual Eigen::VectorXd | Kurtosis (Eigen::VectorXd const &mean, int blockInd=-1) const |
virtual Eigen::VectorXd | Kurtosis (Eigen::VectorXd const &mean, Eigen::VectorXd const &stdDev, int blockInd=-1) const |
virtual Eigen::VectorXd | ExpectedValue (std::shared_ptr< muq::Modeling::ModPiece > const &f, std::vector< std::string > const &metains=std::vector< std::string >()) const =0 |
Static Public Member Functions | |
static std::shared_ptr< SampleCollection > | FromMatrix (Eigen::MatrixXd const &samps) |
|
default |
|
virtualdefault |
|
virtual |
Reimplemented in muq::SamplingAlgorithms::DistributedCollection.
Definition at line 107 of file SampleCollection.cpp.
References samples.
|
virtual |
Returns the samples in this collection as a matrix. Each column of the matrix will correspond to a single state in the chain.
Reimplemented in muq::SamplingAlgorithms::DistributedCollection.
Definition at line 438 of file SampleCollection.cpp.
References samples.
Referenced by muq::SamplingAlgorithms::MarkovChain::WolffESS().
|
virtual |
Reimplemented in muq::SamplingAlgorithms::DistributedCollection.
Definition at line 113 of file SampleCollection.cpp.
References samples.
Referenced by GetMeta(), RunningCovariance(), segment(), and muq::SamplingAlgorithms::MarkovChain::segment().
|
virtual |
Reimplemented in muq::SamplingAlgorithms::DistributedCollection.
Definition at line 117 of file SampleCollection.cpp.
References samples.
|
virtual |
Definition at line 134 of file SampleCollection.cpp.
References samples.
|
virtual |
Estimates the Monte Carlo standard error (square root of estimator variance) using overlapping batch means (OBM). For details, consult [Flegal2010] .
[in] | blockInd | Specifies the block of the sampling state we're interested in. Defaults to -1, which will result in all blocks of the sampling state being concatenated in the MCSE estimate. |
[in] | batchSize | The size of the batches \(b_n\) to use. Defaults to \(\sqrt{N}\). Note that to ensure the convergence of the MCSE estimate as \(N\rightarrow\infty\), the batch size must increase with \(N\). |
[in] | overlap | How much the batches overlap. Defaults to \(0.75*b_n\), where \(b_n\) is the batch size. A nonoverlapping batch estimate will be use is overlap=0. This value bust be smaller than \(b_n\). |
Definition at line 384 of file SampleCollection.cpp.
References muq::SamplingAlgorithms::SampleEstimator::BlockSize(), Mean(), segment(), and size().
Referenced by BatchESS(), muq::SamplingAlgorithms::MarkovChain::StandardError(), and StandardError().
Eigen::VectorXd SampleCollection::BatchESS | ( | int | blockInd = -1 , |
int | batchSize = -1 , |
||
int | overlap = -1 |
||
) | const |
Computes the effective sample size using the overlapping or nonoverlapping batch method. (See e.g., [Flegal2010]).
[in] | blockInd | The block of the SampleState that we want to use in the ESS computation. |
[in] | batchSize | The size of the batches \(b_n\) to use. Defaults to \(N^{1/3}\). Note that to ensure the convergence of the ESS estimate as \(N\rightarrow\infty\), the batch size must increase with \(N\). |
[in] | overlap | How much the batches overlap. Defaults to \(b_n/2\), where \(b_n\) is the batch size. A nonoverlapping batch estimate will be use is overlap=0. This value bust be smaller than \(b_n\). |
Definition at line 356 of file SampleCollection.cpp.
References BatchError(), size(), and muq::SamplingAlgorithms::SampleEstimator::Variance().
Referenced by muq::SamplingAlgorithms::MarkovChain::ESS(), and ESS().
|
inlineoverridevirtual |
The central moment of order $p$ is given by
\[ \mathbb{E}\left[ \left(X-\mathbb{E}[X]\right)^p \right] \]
This function returns a sample-based estimate of the central moment. In the default implementation of this function, the SampleEstimator::ExpectedValue function is called twice. Once to compute \(\mathbb{E}[X]\) and then again to compute \(\mathbb{E}\left[ \left(X-\mathbb{E}[X]\right)^p \right]\). Child classes might provide alternative implementations.
[in] | order | The order \(p\) of the central moment. \(p=2\) yields the variance. |
[in] | blockNum | (Optional) The block of the random variable \(x\) to use in the expectation. By default, blockNum=-1 and the expectation is computed with respect to the entire random variable \(x\). |
Reimplemented from muq::SamplingAlgorithms::SampleEstimator.
Definition at line 108 of file SampleCollection.h.
References CentralMoment(), and Mean().
|
overridevirtual |
Computes the componentwise central moments (e.g., variance, skewness, kurtosis, etc..) of a specific order given that we already know the mean.
Definition at line 139 of file SampleCollection.cpp.
References RecursiveSum(), and samples.
Referenced by CentralMoment().
|
overridevirtual |
Computes the sample covariance using a precomputed (or known) mean.
If blockInd is non-negative, only the mean of one block of the samples is computed.
Implements muq::SamplingAlgorithms::SampleEstimator.
Definition at line 161 of file SampleCollection.cpp.
References samples.
|
inlineoverridevirtual |
Computes the sample covariance of \(x\) with itself (if blockInd==-1
) or \(x_i\) with itself (if blockInd==i
), i.e.,
\[ \text{Cov}[x] = \mathbb{E}\left[ \left(x-\mathbb{E}[x]\right)\left(x-\mathbb{E}[x]\right)^T\right] \]
or
\[ \text{Cov}[x_i] = \mathbb{E}\left[ \left(x_i-\mathbb{E}[x_i]\right)\left(x_i-\mathbb{E}[x_i]\right)^T\right] \]
Note that it is only possible to compute the cross covariance of \(x_i\) with \(x_j\) by setting blockInd=-1
and computing the entire covariance matrix.
[in] | blockInd | (Optional) The block of the random variable $x$ to use in the expectation. By default, blockInd=-1 and the expectation is computed with respect to the entire random variable $x$. |
Reimplemented from muq::SamplingAlgorithms::SampleEstimator.
Reimplemented in muq::SamplingAlgorithms::DistributedCollection.
Definition at line 112 of file SampleCollection.h.
References muq::SamplingAlgorithms::SampleEstimator::Covariance().
Referenced by MultiBatchESS().
|
private |
[in] | hdf5file | The hdf5 file where the data will be written |
[in] | dataname | The name of the data set we (may) want to create |
[in] | dataSize | The number of rows (the size of the data in one sample) |
[in] | totSamps | The total number of samples we need to write to the file (max. number of samples) |
Definition at line 474 of file SampleCollection.cpp.
References size().
|
inlineoverridevirtual |
[in] | blockDim | The block of the SampleState that we want to use in the ESS computation. |
Reimplemented from muq::SamplingAlgorithms::SampleEstimator.
Reimplemented in muq::SamplingAlgorithms::DistributedCollection, and muq::SamplingAlgorithms::MarkovChain.
Definition at line 159 of file SampleCollection.h.
References ESS().
Referenced by ESS().
|
overridevirtual |
[in] | blockDim | The block of the SampleState that we want to use in the ESS computation. |
[in] | method | A string specifying the method used to estimate the ESS. Options in SampleCollection are "Batch" or "MultiBatch". The MarkovChain class also supports a "Wolff" method. |
Implements muq::SamplingAlgorithms::SampleEstimator.
Reimplemented in muq::SamplingAlgorithms::MarkovChain.
Definition at line 288 of file SampleCollection.cpp.
References BatchESS(), and MultiBatchESS().
|
inlineoverridevirtual |
Returns the effective sample size of the samples.
For almost all random variables of interest, the central limit theorem states that the variance of a Monte Carlo estimator
\[\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^N \theta^{(i)},\]
satisfies
\[ \sqrt{N}\mathbb{V}\left[\hat{\mu}_N-\mu\right] \rightarrow N(0,\sigma^2),\]
as the number of samples \(N\) increases. Here, \(mu\) is the true mean of the random variable, \(\sigma^2\) is the true variance of the random variable. This assumes that each \(\theta^{(i)}\) is an independent sample with unity weight.
When the samples are not independent or the weights are not unity, the variance of the estimator will generally be larger. The "effective sample size" (ESS) describes how many independent samples would have been needed to obtain the same estimator variance. In particular, let \(\tilde{\mu}_N\) be a Monte Carlo estimator based on \(N\) correlated samples.
If method=="Batch" (default) The overlapping batch method (OBM) described in [Flegal2010] is used. This method is also applied to each component independently, resulting in an ESS estimate for each component.
If method=="MultiBatch", The multivariate method of [Vats2019] is employed. This method takes into account the joint correlation of all components of the chain and returns a single ESS. This approach is preferred in high dimensional settings.
[in] | method | A string specifying the method used to estimate the ESS. Options in SampleCollection are "Batch" or "MultiBatch". The MarkovChain class also supports a "Wolff" method. |
Reimplemented from muq::SamplingAlgorithms::SampleEstimator.
Reimplemented in muq::SamplingAlgorithms::MarkovChain.
Definition at line 153 of file SampleCollection.h.
References ESS().
Referenced by ESS().
|
static |
Constructs a SampleCollection from a matrix of samples. Each column of the matrix is a sample.
Definition at line 427 of file SampleCollection.cpp.
|
protected |
Definition at line 540 of file SampleCollection.cpp.
|
inlinevirtual |
Returns a new sample collection with the first \(N\) states of this collection. Equivalent to a python command like list[0:N]
Definition at line 88 of file SampleCollection.h.
References segment().
|
overridevirtual |
Computes the sample mean of \(x\) (if blockInd==-1
) or \(x_i\) (if blockInd==i
).
If blockInd is non-negative, only the mean of one block of the samples is computed.
Reimplemented from muq::SamplingAlgorithms::SampleEstimator.
Reimplemented in muq::SamplingAlgorithms::DistributedCollection.
Definition at line 149 of file SampleCollection.cpp.
References RecursiveSum(), and samples.
Referenced by BatchError(), CentralMoment(), MultiBatchESS(), and RunningCovariance().
|
virtual |
Estimates the Monte Carlo standard error using the effective sample size coming from the multivariate overlapping batch method of [Vats2019] .
[in] | blockInd | Specifies the block of the sampling state we're interested in. Defaults to -1, which will result in all blocks of the sampling state being concatenated in the MCSE estimate. |
[in] | batchSize | The size of the batches \(b_n\) to use. Defaults to \(N^{1/2}\). Note that to ensure the convergence of the MCSE estimate as \(N\rightarrow\infty\), the batch size must increase with \(N\). |
[in] | overlap | How much the batches overlap. Defaults to \(0.75*b_n\), where \(b_n\) is the batch size. A nonoverlapping batch estimate will be use is overlap=0. This value bust be smaller than \(b_n\). |
Definition at line 420 of file SampleCollection.cpp.
References MultiBatchESS(), and muq::SamplingAlgorithms::SampleEstimator::Variance().
Referenced by muq::SamplingAlgorithms::MarkovChain::StandardError(), and StandardError().
double SampleCollection::MultiBatchESS | ( | int | blockInd = -1 , |
int | batchSize = -1 , |
||
int | overlap = -1 |
||
) | const |
Computes the multivariate effective sample size of [Vats2019]. This implementation is adapted from the python implementation at https://github.com/Gabriel-p/multiESS .
The standard definition of a univariate effective sample size is given in terms of the the ratio between the target distribution variance and the estimator variance. More precisely, \(\text{ESS}_u\) is given by
\[ \text{ESS}_u = N \frac{\sigma^2}{\hat{\sigma}^2}, \]
where \(N\) is the total number of samples in the Monte Carlo estimate, \(\sigma^2\) is the variance of the target random variable, and \(\hat{\sigma}^2\) is the variance of a Monte Carlo estimator for the mean of the target distribution.
The multivariate ESS is defined similarly, but using generalized variances. Let \(\Sigma\) denote the target distribution covariance and \(\hat{\Sigma}\) the covariance of the Monte Carlo mean estimator. The multivariate ESS of [Vats2019] is then given by
\[ \text{ESS}_m = N \left( \frac{|\Sigma|}{|\hat{\Sigma}|} \right)^{1/D}, \]
where \(|\cdot|\) denotes the determinant of the matrix and \(D\) is the dimension of target random variable.
As proposed in [Vats2019], it is possible to estimate \(\text{ESS}_m\) using overlapping batches of samples to calculate the estimator covariance \(\hat{\Sigma}\).
[in] | blockInd | The block of the SampleState that we want to use in the ESS computation. |
[in] | batchSize | The size of the batches \(b_n\) to use. Defaults to \(N^{1/3}\). Note that to ensure the convergence of the ESS estimate as \(N\rightarrow\infty\), the batch size must increase with \(N\). |
[in] | overlap | How much the batches overlap. Defaults to \(b_n/2\), where \(b_n\) is the batch size. A nonoverlapping batch estimate will be use is overlap=0. This value bust be smaller than \(b_n\). |
Definition at line 303 of file SampleCollection.cpp.
References muq::SamplingAlgorithms::SampleEstimator::BlockSize(), Covariance(), Mean(), segment(), and size().
Referenced by muq::SamplingAlgorithms::MarkovChain::ESS(), ESS(), and MultiBatchError().
|
inlinestaticprotected |
Definition at line 351 of file SampleCollection.h.
Referenced by CentralMoment(), and Mean().
|
staticprotected |
Returns the sum of the weights and the sum of the squared weights.
Definition at line 259 of file SampleCollection.cpp.
|
virtual |
Writes the samples, weights, and sample metadata to a group in an HDFfile.
true
, the output vector will have a single component. The MPSRF serves as a worse case estimate of \(\hat{R}\) over all linear combinations of the parameters. |[in] | blockDim | Specifies the block of the sampling state we're interested in. If blockDim=-1, then all blocks will be concatenated together. |
[in] | options | (optional) A property tree possibly containing settings for the "Multivariate" or "Transform" parameters listed above. |
Definition at line 684 of file SampleCollection.cpp.
Referenced by Rhat().
|
inlinevirtual |
Same as the other Rhat function, except all blocks are concatenated together.
Definition at line 338 of file SampleCollection.h.
References Rhat().
|
virtual |
Definition at line 203 of file SampleCollection.cpp.
|
virtual |
Computes running estimates of the covariance.
This function returns a vector of matrices where index \(n\) of the vector contains a Monte Carlo estimate of the covariance that is constructed using the first \(n\) samples in this SampleCollection.
Definition at line 199 of file SampleCollection.cpp.
References Mean().
|
virtual |
Returns a new sample collection containing a segment of this sample collection. The Equivalent to a python command like list[startInd:startInd+length:skipBy]
Reimplemented in muq::SamplingAlgorithms::MarkovChain.
Definition at line 122 of file SampleCollection.cpp.
Referenced by BatchError(), head(), MultiBatchESS(), and tail().
|
virtual |
Returns the number of samples in this SampleCollection.
Reimplemented in muq::SamplingAlgorithms::DistributedCollection.
Definition at line 502 of file SampleCollection.cpp.
References samples.
Referenced by BatchError(), BatchESS(), CreateDataset(), GetMeta(), MultiBatchESS(), RunningCovariance(), segment(), muq::SamplingAlgorithms::MarkovChain::segment(), muq::SamplingAlgorithms::MarkovChain::SingleComponentWolffESS(), and tail().
|
inlineoverridevirtual |
Reimplemented from muq::SamplingAlgorithms::SampleEstimator.
Reimplemented in muq::SamplingAlgorithms::MarkovChain.
Definition at line 222 of file SampleCollection.h.
References StandardError().
Referenced by StandardError().
|
overridevirtual |
Returns an estimate of the the Monte Carlo standard error (MCSE) \(\hat{\sigma}\) for a Monte Carlo estimate of the mean derived using this SampleEstimator. Recall at the MCSE is the standard deviation of the estimator variance employed in the Central Limit Theorem.
[in] | blockInd | Specifies the block of the sampling state we're interested in. Defaults to -1, which will result in all blocks of the sampling state being concatenated in the MCSE estimate. |
[in] | method | A string describing what method should be used to estimate the MCSE. Defaults to "Batch". Options are the same as the ESS function: "Batch", "MultiBatch", and "Wolff". The "Wolff" option is only valid for the "MarkovChain" and "MultiIndexEstimator" classes. |
Implements muq::SamplingAlgorithms::SampleEstimator.
Reimplemented in muq::SamplingAlgorithms::MarkovChain.
Definition at line 369 of file SampleCollection.cpp.
References BatchError(), and MultiBatchError().
|
inlineoverridevirtual |
Reimplemented from muq::SamplingAlgorithms::SampleEstimator.
Reimplemented in muq::SamplingAlgorithms::MarkovChain.
Definition at line 223 of file SampleCollection.h.
References StandardError().
Referenced by StandardError().
|
inlinevirtual |
Returns a new sample collection with the last \(N\) states of this collection. Equivalent to a python command like list[-N:]
Definition at line 93 of file SampleCollection.h.
|
virtual |
Returns a vector of unnormalized sample weights for computing empirical expectations. If the samples were generated with MC or MCMC, the weights will all be
Reimplemented in muq::SamplingAlgorithms::DistributedCollection.
Definition at line 466 of file SampleCollection.cpp.
References samples.
|
protected |
Definition at line 343 of file SampleCollection.h.
Referenced by Add(), AsMatrix(), at(), back(), CentralMoment(), Covariance(), Mean(), size(), and Weights().