Wednesday 20 February 2013

Bolt vs Emerson and the pause - who is right? Using piecewise regression to find an answer

There has been some debate about the interpretation of global temperature readings - specifically, whether there has been a "pause" in the rate of "global warming" over the last 16 years. A recent episode of this is presented at JoNova, and the actual exchange can be found by following the links on that page. Leaving aside the important point of whether a global temperature is a valid measure, the issue comes down to how to pose the question statistically. Fitting a regression line to a selected time period and inferring that there is no significant trend is not really valid: the window can be chosen after looking at the data, and a short window gives a trend estimate with a large standard error. Of course, a linear regression line can be fitted to the entire temperature record, and the result looks reasonable.

This model, however, assumes an unchanging process over the whole period (no hockey stick) and a linearly increasing "human influence" throughout. Even so, it is a fairly good fit to the data.
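To see why a short, selected window is a weak basis for inference, here is a minimal Python sketch (the original analysis used R) that computes the ordinary least-squares slope and its standard error. The data are synthetic and purely hypothetical: a constant trend plus independent noise, not the actual temperature record. The 16-year window has a far larger standard error than the full record, so a non-significant slope there says little about whether the underlying trend has changed. The sketch also ignores autocorrelation in the residuals, which in real temperature series would widen the uncertainty further.

```python
import numpy as np

def slope_with_se(t, y):
    """OLS slope of y on t and its standard error (assumes independent errors)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    tc = t - t.mean()
    slope = np.sum(tc * (y - y.mean())) / np.sum(tc ** 2)
    intercept = y.mean() - slope * t.mean()
    resid = y - (intercept + slope * t)
    sigma2 = np.sum(resid ** 2) / (n - 2)      # residual variance, n - 2 degrees of freedom
    se = np.sqrt(sigma2 / np.sum(tc ** 2))     # standard error of the slope
    return slope, se

# Hypothetical monthly anomalies: constant trend of 0.006 degrees/year plus noise.
rng = np.random.default_rng(0)
t = 1880 + np.arange(133 * 12) / 12.0
y = 0.006 * (t - 1880) + rng.normal(0.0, 0.15, t.size)

full = slope_with_se(t, y)
recent = slope_with_se(t[t >= 1997], y[t >= 1997])   # last 16 years only
print("full record : slope %.4f, t = %.1f" % (full[0], full[0] / full[1]))
print("last 16 yrs : slope %.4f, t = %.1f" % (recent[0], recent[0] / recent[1]))
```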

The regression was fitted in R; the correlation coefficient is 0.744.
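For readers without R, a hypothetical Python equivalent of the straight-line fit and its correlation coefficient is sketched below using NumPy's `polyfit` and `corrcoef`. The function name `linear_fit` is illustrative. For a single predictor with an intercept, the square of this correlation coefficient equals the R-squared that R's `summary()` reports for an `lm` fit.

```python
import numpy as np

def linear_fit(t, y):
    """Fit y = a + b*t by least squares; return (intercept, slope, r)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)   # coefficients come highest power first
    r = np.corrcoef(t, y)[0, 1]              # Pearson correlation coefficient
    return intercept, slope, r

# Toy usage on exact data: intercept about 1, slope about 2, r about 1.
a, b, r = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
```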

Of course, it looks much more dramatic if you restrict the data to the period after 1974. Fitting a linear model to that data gives a correlation coefficient of 0.821.

It correlates better, and the gradient is 3.8 times as steep. It also corresponds to a plausible period of increased "human influence". The problem is that if you allow the regression line to change at that point, you must allow other changes as well, which raises the question of whether there has been another change and, if so, to what?
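The "3.8 times as steep" comparison amounts to fitting the same straight-line model to a subset of the data and dividing the two slopes. A minimal NumPy sketch, assuming times are given as decimal years; the function names are hypothetical:

```python
import numpy as np

def trend(t, y):
    """Least-squares slope of y against t."""
    return np.polyfit(t, y, 1)[0]

def slope_ratio(t, y, start):
    """Ratio of the trend from `start` onwards to the trend over the whole record."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    recent = t >= start                      # e.g. start=1974 for the post-1974 fit
    return trend(t[recent], y[recent]) / trend(t, y)
```

For a series that is a single straight line the ratio is 1; any curvature or late upturn pushes it away from 1, which is why the ratio alone cannot say whether the trend itself has changed.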

It is true that, "eyeballing" the data, there seems to be a flattening off in later years.

This graph looks at the data after 1974 and compares two models: one with a breakpoint and one without.
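The post does not say which R routine produced the two-line fit. One simple way to fit a model made of two independent lines plus a break position is a grid search: try each candidate break, fit a separate least-squares line to each side, and keep the break with the smallest total residual sum of squares. The Python sketch below assumes the times are sorted; `fit_breakpoint` and its `min_points` guard (which stops either segment from becoming too short) are illustrative choices, not the author's method.

```python
import numpy as np

def rss_line(t, y):
    """Residual sum of squares of a straight-line least-squares fit."""
    coef = np.polyfit(t, y, 1)
    resid = y - np.polyval(coef, t)
    return float(np.sum(resid ** 2))

def fit_breakpoint(t, y, min_points=24):
    """Grid-search the break that minimises the total RSS of two separate lines.

    Each segment gets its own slope and intercept, so the lines need not meet.
    Returns (time at which the second segment starts, total RSS).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    best_tb, best_rss = None, np.inf
    # Both segments keep at least min_points observations.
    for i in range(min_points, t.size - min_points + 1):
        rss = rss_line(t[:i], y[:i]) + rss_line(t[i:], y[i:])
        if rss < best_rss:
            best_tb, best_rss = t[i], rss
    return best_tb, best_rss
```

The returned total RSS is what a likelihood-based comparison against the single-line fit would use.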

Now, comparing model fits like this is not straightforward. A model with more parameters will always fit at least as well, so the likelihood of the fit needs to be penalised for the number of parameters. The single-line model has only 2 parameters (slope and intercept), but the two-line model has 5: the slope and intercept of each line plus the breakpoint position. One way to compare the models is the AIC (Akaike Information Criterion), where lower is better. In this case, the AIC was -568 for the single line and -602 for the two-line model. Not much difference, but some evidence for a breakpoint in September 2005.
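With Gaussian errors, the AIC of a least-squares fit can be written in terms of the residual sum of squares. The sketch below assumes the parameter counts used in this post (2 for one line, 5 for two lines); the function names are hypothetical. R's `AIC()` for an `lm` fit also counts the residual variance and includes an additive constant, but both shift every model fitted to the same data by the same amount, so they do not change which model has the lower AIC.

```python
import math

def gaussian_aic(rss, n, k):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    rss: residual sum of squares; n: number of data points;
    k: number of fitted parameters. The constant n * (1 + ln(2*pi)) is omitted
    because it is identical for all models fitted to the same n points.
    """
    return n * math.log(rss / n) + 2 * k

def prefer_two_lines(rss_one, rss_two, n, k_one=2, k_two=5):
    """True if the two-line model has the lower (better) AIC."""
    return gaussian_aic(rss_two, n, k_two) < gaussian_aic(rss_one, n, k_one)
```

Written this way, the three extra parameters cost 2 × 3 = 6 AIC units, so the two-line model is preferred only when n·ln(RSS_one / RSS_two) exceeds 6.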