**EDIT:** Please see this new post for the same analysis done with RStan.

I’m currently working my way through Rasmussen and Williams’s book on Gaussian processes. It’s another one of those topics that seems to crop up a lot these days, particularly around control strategies for energy systems, and thought I should be able to at least perform basic analyses with this method.

While the book is sensibly laid-out and pretty comprehensive in its choice of topics, it is also a very hard read. My linear algebra may be rusty but I’ve heard some mathematicians describe the conventions used in the book as “an affront to notation”. So just be aware that if you try to work through the book, you will need to be patient. It’s not a cookbook that clearly spells out how to do everything step-by-step.

That said, I have now worked through the basics of Gaussian process regression as described in Chapter 2 and I want to share my code with you here. As always, I’m doing this in R and if you search CRAN, you will find a specific package for Gaussian process regression: gptk. The implementation shown below is much slower than the gptk functions, but by doing things manually I hope you will find it easier to understand what’s actually going on. The full code is given below and is available Github. (PS anyone know how to embed only a few lines from a gist?)

### Step 1: Generating functions

With a standard univariate statistical distribution, we draw single values. By contrast, a Gaussian process can be thought of as a distribution of functions. So the first thing we need to do is set up some code that enables us to generate these functions. The code at the bottom shows how to do this and hopefully it is pretty self-explanatory. The result is basically the same as Figure 2.2(a) in Rasmussen and Williams, although with a different random seed and plotting settings.

### Step 2: Fitting the process to noise-free data

Now let’s assume that we have a number of fixed data points. In other words, our Gaussian process is again generating lots of different functions but we know that each draw must pass through some given points. For now, we will assume that these points are perfectly known. In the code, I’ve tried to use variable names that match the notation in the book. In the resulting plot, which corresponds to Figure 2.2(b) in Rasmussen and Williams, we can see the explicit samples from the process, along with the mean function in red, and the constraining data points.

### Step 3: Fitting the process to noisy data

The next extension is to assume that the constraining data points are not perfectly known. Instead we assume that they have some amount of normally-distributed noise associated with them. This case is discussed on page 16 of the book, although an explicit plot isn’t shown. The code and resulting plot is shown below; again, we include the individual sampled functions, the mean function, and the data points (this time with error bars to signify 95% confidence intervals).

### Next steps

Hopefully that will give you a starting point for implementating Gaussian process regression in R. There are several further steps that could be taken now including:

- Changing the squared exponential covariance function to include the signal and noise variance parameters, in addition to the length scale shown here.
- Speed up the code by using the Cholesky decomposition, as described in Algorithm 2.1 on page 19.
- Try to implement the same regression using the gptk package. Sadly the documentation is also quite sparse here, but if you look in the source files at the various demo* files, you should be able to figure out what’s going on.

SMThank you for this, a wonderful script

XYCare to be more specific?

James KeirsteadPost authorIf I remember correctly, it was some friends of a student of mine who were used to a different notation. Though what the alternative was I have no idea.

XYOk thanks. I guess I just don’t want to end up having learned the “wrong” notation …

DionysiosThanks a lot for the tutorial.

I have a question regarding the value ranges of the X and Y variables. Does the above work for any value ranges? Or is it necessary to do some kind of normalisation?

James KeirsteadPost authorShould work fine with any value ranges. If you’re looking for more info though on data transformations for regression, Gelman and Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models has lots of great tips.

DionysiosThanks a lot for the reference James. One more point; what if my response variable takes only positive values? How can I bound my GP with such a condition?

James KeirsteadPost authorThe best option in this case might be to transform the data so that you create a new response variable that is positive-only. For example, in a normal regression case, you might calculate $latex y_2 = \log y = \beta x + \epsilon$

Pingback: Bajando el error de las predicciones (II): algoritmos | Open Data

skaaeI used most of your script in a shiny application. You can play around with observation noise, length scale, magnitude.

http://skaae.shinyapps.io/test_project/

James KeirsteadPost authorNice!

lingfang zhangwhy can’t I download the code

James KeirsteadPost authorNot sure. Just click “View raw” at the bottom of the code box.

AkiYour tutorial is simply awsome. It not only gave me a good exposure to GPM, but also some plotting capabilities in R. However my requirement is to use the derivatives of the observations in the predictions.

Can you please read chapter 9 and try to use the derivative of the observations in the R code.

JulieVery clear, thanks a lot ! It really helped me understand how to use the GP in practice.