## GP Basics¶

Sometimes an unknown parameter or variable in a model is not a scalar value or a fixed-length vector, but a *function*. A Gaussian process (GP) can be used as a prior probability distribution whose support is over the space of continuous functions. A GP prior on the function \(f(x)\) is usually written,

\[f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,.\]

The function values are modeled as a draw from a multivariate normal distribution that is parameterized by the mean function, \(m(x)\), and the covariance function, \(k(x, x')\). Gaussian processes are a convenient choice as priors over functions due to the marginalization and conditioning properties of the multivariate normal distribution. Usually, the marginal distribution over \(f(x)\) is evaluated during the inference step. The conditional distribution is then used for predicting the function values \(f(x_*)\) at new points, \(x_*\).

The joint distribution of \(f(x)\) and \(f(x_*)\) is multivariate normal,

\[\begin{split}\begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim \text{N}\left( \begin{bmatrix} m(x) \\ m(x_*) \\ \end{bmatrix} \,, \begin{bmatrix} k(x, x') & k(x, x_*) \\ k(x_*, x) & k(x_*, x_*') \\ \end{bmatrix} \right) \,.\end{split}\]

Starting from the joint distribution, one obtains the marginal distribution of \(f(x)\), as \(\text{N}(m(x),\, k(x, x'))\). The conditional distribution is

\[f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) \,,\, k(x_*, x_*) - k(x_*, x) k(x, x)^{-1} k(x, x_*) \right) \,.\]
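These conditioning formulas can be checked numerically. Below is a minimal numpy sketch (not PyMC3 code) that applies the conditional mean and covariance expressions above with a zero mean function and an exponentiated quadratic kernel; the helper name `k` and the toy data are illustrative only:

```python
import numpy as np

def k(xa, xb, ls=1.0):
    """Exponentiated quadratic covariance between two 1-D input arrays."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ls**2)

x = np.linspace(0, 1, 5)         # observed inputs
x_star = np.array([0.25, 0.75])  # prediction inputs
f = np.sin(2 * np.pi * x)        # observed function values, m(x) = 0 assumed

Kxx = k(x, x) + 1e-9 * np.eye(len(x))  # small jitter for numerical stability
Ksx = k(x_star, x)                     # k(x_*, x)
Kss = k(x_star, x_star)                # k(x_*, x_*)

# Conditional mean and covariance, term by term as in the formula above:
mean = Ksx @ np.linalg.solve(Kxx, f)
cov = Kss - Ksx @ np.linalg.solve(Kxx, Ksx.T)
```

The resulting `cov` is symmetric and positive semi-definite, as a valid conditional covariance must be.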

Note

For more information on GPs, check out the book Gaussian Processes for Machine Learning by Rasmussen & Williams, or this introduction by D. MacKay.

PyMC3 is a great environment for working with fully Bayesian Gaussian process models. GPs in PyMC3 have a clear syntax and are highly composable, and many predefined covariance functions (or kernels), mean functions, and several GP implementations are included. GPs are treated as distributions that can be used within larger or hierarchical models, not just as standalone regression models.

## Mean and covariance functions¶

Those who have used the GPy or GPflow Python packages will find the syntax for constructing mean and covariance functions somewhat familiar. When first instantiated, the mean and covariance functions are parameterized, but not given their inputs yet. The covariance functions must additionally be provided with the number of dimensions of the input matrix, and a list that indexes which of those dimensions they are to operate on. The reason for this design is so that covariance functions can be constructed that are combinations of other covariance functions.

For example, to construct an exponentiated quadratic covariance function that operates on the second and third column of a three column matrix representing three predictor variables:

```python
ls = [2, 5]  # the lengthscales
cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])
```

Here the `ls`, or lengthscale, parameter is two dimensional, allowing the second and third dimension to have a different lengthscale. The reason we have to specify `input_dim`, the total number of columns of `X`, and `active_dims`, which of those columns or dimensions the covariance function will act on, is because `cov_func` hasn't actually seen the input data yet. The `active_dims` argument is optional, and defaults to all columns of the matrix of inputs.
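To make the roles of `ls` and `active_dims` concrete, here is a hypothetical numpy sketch of an exponentiated quadratic kernel that, like the PyMC3 version above, scales each selected column by its own lengthscale and ignores inactive columns. The function `exp_quad` is not PyMC3's implementation, just an illustration:

```python
import numpy as np

def exp_quad(X, ls, active_dims):
    """Illustrative ExpQuad: acts only on `active_dims`, one lengthscale each."""
    Xa = X[:, active_dims] / np.asarray(ls, dtype=float)  # scale active columns
    sq = np.sum((Xa[:, None, :] - Xa[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq)

X = np.random.default_rng(0).normal(size=(4, 3))  # three predictor columns
K = exp_quad(X, ls=[2.0, 5.0], active_dims=[1, 2])
```

Because column 0 is inactive, changing it leaves the covariance matrix `K` unchanged, which is exactly what `active_dims` buys you when combining kernels over different subsets of inputs.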

Covariance functions in PyMC3 closely follow the algebraic rules for kernels, which allows users to combine covariance functions into new ones, for example:

The sum of two covariance functions is also a covariance function:

```python
cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...)
```

The product of two covariance functions is also a covariance function:

```python
cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)
```

The product (or sum) of a covariance function with a scalar is a covariance function:

```python
cov_func = eta**2 * pm.gp.cov.Matern32(...)
```
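These algebraic rules can be verified numerically: the sum, elementwise (Schur) product, and positive scalar multiple of covariance matrices built from valid kernels remain positive semi-definite. A numpy sketch, with illustrative helper names and toy inputs:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10)

def expquad(x, ls):
    return np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ls**2)

def periodic(x, period, ls):
    # Standard periodic kernel: exp(-2 sin^2(pi |d| / period) / ls^2)
    d = np.pi * np.abs(x[:, None] - x[None, :]) / period
    return np.exp(-2.0 * np.sin(d) ** 2 / ls**2)

K_sum = expquad(x, 1.0) + expquad(x, 0.1)
K_prod = expquad(x, 1.0) * periodic(x, 1.0, 0.5)  # elementwise (Schur) product
K_scaled = 2.0**2 * expquad(x, 1.0)

def is_psd(K, tol=1e-8):
    """All eigenvalues nonnegative, up to numerical tolerance."""
    return bool(np.all(np.linalg.eigvalsh(K) > -tol))
```

The product case relies on the Schur product theorem: the elementwise product of two positive semi-definite matrices is itself positive semi-definite.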

After the covariance function is defined, it is now a function that is evaluated by calling `cov_func(x, x)` (or `mean_func(x)`). Since PyMC3 is built on top of Theano, it is relatively easy to define and experiment with non-standard covariance and mean functions. For more information check out the tutorial on covariance functions.

## GP Implementations¶

PyMC3 includes several GP implementations, including marginal and latent variable models and also some fast approximations. Their usage all follows a similar pattern: First, a GP is instantiated with a mean function and a covariance function. Then, GP objects can be added together, allowing for function characteristics to be carefully modeled and separated. Finally, one of the `prior`, `marginal_likelihood`, or `conditional` methods is called on the GP object to actually construct the PyMC3 random variable that represents the function prior.

Using `gp.Latent` for the example, the syntax to first specify the GP is:

```python
gp = pm.gp.Latent(mean_func, cov_func)
```

The first argument is the mean function and the second is the covariance function. We've made the GP object, but we haven't made clear which function it is to be a prior for, what the inputs are, or what parameters it will be conditioned on.

Note

The `gp.Marginal` class and similar don't have a `prior` method. Instead they have a `marginal_likelihood` method that is used similarly, but has additional required arguments, such as the observed data, noise, or other, depending on the implementation. See the notebooks for examples. The `conditional` method works similarly.

Calling the `prior` method will create a PyMC3 random variable that represents the latent function \(f(x) = \mathbf{f}\):

```python
f = gp.prior("f", X)
```

`f` is a random variable that can be used within a PyMC3 model like any other type of random variable. The first argument is the name of the random variable representing the function we are placing the prior over. The second argument is the inputs to the function that the prior is over, `X`. The inputs are usually known and present in the data, but they can also be PyMC3 random variables. If the inputs are a Theano tensor or a PyMC3 random variable, the `shape` needs to be given.

Usually at this point, inference is performed on the model. The `conditional` method creates the conditional, or predictive, distribution over the latent function at arbitrary \(x_*\) input points, \(f(x_*)\). To construct the conditional distribution we write:

```python
f_star = gp.conditional("f_star", X_star)
```

## Additive GPs¶

The GP implementation in PyMC3 is constructed so that it is easy to define additive GPs and sample from individual GP components. We can write:

```python
gp1 = pm.gp.Marginal(mean_func1, cov_func1)
gp2 = pm.gp.Marginal(mean_func2, cov_func2)
gp3 = gp1 + gp2
```

The GP objects have to have the same type; a `gp.Marginal` cannot be added to a `gp.Latent`.

Consider two independent GP distributed functions, \(f_1(x) \sim \mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)\) and \(f_2(x) \sim \mathcal{GP}\left(m_2(x),\, k_2(x, x')\right)\). The joint distribution of \(f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2\) and \(f_1^* + f_2^*\) is

\[\begin{split}\begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^* \\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim \text{N}\left( \begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\ m_1 + m_2 \\ m_1^* + m_2^* \\ \end{bmatrix} \,,\, \begin{bmatrix} K_1 & K_1^* & 0 & 0 & K_1 & K_1^* \\ K_1^{*^T} & K_1^{**} & 0 & 0 & K_1^{*^T} & K_1^{**} \\ 0 & 0 & K_2 & K_2^* & K_2 & K_2^{*} \\ 0 & 0 & K_2^{*^T} & K_2^{**} & K_2^{*^T} & K_2^{**} \\ K_1 & K_1^{*} & K_2 & K_2^{*} & K_1 + K_2 & K_1^{*} + K_2^{*} \\ K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T}+K_2^{*^T} & K_1^{**}+K_2^{**} \end{bmatrix}\right) \,.\end{split}\]

Using the joint distribution to obtain the conditional distribution of \(f_1^*\) with the contribution due to \(f_1 + f_2\) factored out, we get

\[f_1^* \mid f_1 + f_2 \sim \text{N}\left( m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\, K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,.\]
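This decomposition has a consequence that is easy to verify numerically: with zero mean functions, the conditional means of \(f_1^*\) and \(f_2^*\) given \(f_1 + f_2\) sum to the conditional mean of \((f_1 + f_2)^*\). A numpy sketch with illustrative names and toy data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 5, size=8))   # observed inputs
x_star = np.linspace(0, 5, 4)            # prediction inputs

def expquad(xa, xb, ls):
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ls**2)

# Two components with different lengthscales, as in an additive GP.
K1 = expquad(x, x, 1.0)
K2 = expquad(x, x, 0.3)
K1s = expquad(x, x_star, 1.0)            # K_1^*
K2s = expquad(x, x_star, 0.3)            # K_2^*

y = rng.normal(size=8)                   # stand-in for the observed f1 + f2
alpha = np.linalg.solve(K1 + K2 + 1e-8 * np.eye(8), y)  # (K1 + K2)^{-1} y

# Conditional means from the formula above, with m_1 = m_2 = 0:
mu1_star = K1s.T @ alpha                 # E[f1* | f1 + f2]
mu2_star = K2s.T @ alpha                 # E[f2* | f1 + f2]
mu_star = (K1s + K2s).T @ alpha          # E[(f1 + f2)* | f1 + f2]
```

Since the same \((K_1 + K_2)^{-1}\left[f_1 + f_2\right]\) term appears in every component's conditional mean, the component predictions add up to the prediction for the sum.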

These equations show how to break down GP models into individual components to see how each contributes to the data. For more information, check out David Duvenaud's PhD thesis.

The GP objects in PyMC3 keep track of these marginals automatically. The following code sketch shows how to define the conditional distribution of \(f_2^*\). We use `gp.Marginal` in the example, but the same works for other implementations. The first block fits the GP prior. We denote \(f_1 + f_2\) as just \(f\) for brevity:

```python
with pm.Model() as model:
    gp1 = pm.gp.Marginal(mean_func1, cov_func1)
    gp2 = pm.gp.Marginal(mean_func2, cov_func2)

    # gp represents f1 + f2.
    gp = gp1 + gp2

    f = gp.marginal_likelihood("f", X, y, noise)

    trace = pm.sample(1000)
```

To construct the conditional distribution of `gp1` or `gp2`, we also need to include the additional arguments, `X`, `y`, and `noise`:

```python
with model:
    # conditional distributions of f1 and f2
    f1_star = gp1.conditional("f1_star", X_star,
                              given={"X": X, "y": y, "noise": noise, "gp": gp})
    f2_star = gp2.conditional("f2_star", X_star,
                              given={"X": X, "y": y, "noise": noise, "gp": gp})

    # conditional of f1 + f2, `given` not required
    f_star = gp.conditional("f_star", X_star)
```

This second block produces the conditional distributions. Notice that extra arguments are required for conditionals of \(f_1\) and \(f_2\), but not \(f\). This is because those arguments are cached when `.marginal_likelihood` is called on `gp`.

Note

When constructing conditionals, the additional arguments `X`, `y`, `noise` and `gp` must be provided as a dict called `given`!

Since the `marginal_likelihood` method of `gp1` or `gp2` wasn't called, their conditionals need to be provided with the required inputs. In the same fashion as the prior, `f_star`, `f1_star` and `f2_star` are random variables that can now be used like any other random variable in PyMC3.

Check the notebooks for detailed demonstrations of the usage of GP functionality in PyMC3.
