Saturday, 1 September 2012

Do it Yourself Climate Science with R and GHCN


As a scientist but yet-to-be-convinced about climate change, I was challenged to have a look at the climate data myself. After all, statistical modelling is what currently interests me and the open source software R does modelling particularly well.

It turns out that the main surface temperature dataset upon which all of the models are based is publically available and can be analysed with R

First you must get and install R – the program is available for all platforms from http://cran.r-project.org/

then you need to run r and install package RghcnV3 the package written to access and transform the GHCN data

the actual data can be downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/

monthly average, min or max data can be downloaded. This needs to be unzipped to the .dat file

this file is then loaded into R

> require(RghcnV3)
> v<-readV3Data(filename='c:\\yourdirectory\\foo.dat') #assuming your data has been renamed foo.dat and you have a windows machine - syntax is slightly different for a linux or mac
> stationNames<-dimnames(v) # get the station names

Now you need the id of the station you want. This is available here http://data.giss.nasa.gov/gistemp/station_data_v2/v2.temperature.inv.txt

for example - I tried Omeo : 50194911000

so now which column is Omeo ?

> which(stationNames[[1]]==”50194911000″)

this gives column 5778

so let’s get the data for that :

> odata<-data.frame(v[5776,,])

this creates a data frame of the omeo data. you can have a look at it by typing odata

X1941 X1942 X1943 X1944 X1945 X1946 X1947 X1948 X1949 X1950 X1951 X1952
Jan NA NA 23.0 25.1 24.5 26.0 24.8 21.8 22.4 22.6 23.7 24.6
Feb NA 23.3 24.0 22.3 22.4 23.6 25.2 23.6 21.8 21.9 23.5 23.2
Mar NA 21.9 22.4 20.2 19.1 18.0 20.3 18.4 20.5 20.2 21.7 21.7
Apr NA 16.9 14.7 13.6 15.9 14.8 16.3 16.2 14.3 15.5 13.3 15.4
May NA 13.2 11.1 11.0 11.8 11.2 13.5 10.3 11.3 11.7 11.3 10.8
Jun NA 9.8 7.8 7.7 10.8 7.3 9.2 8.9 7.3 9.0 10.2 9.1
Jul NA 8.0 7.1 7.8 7.6 8.9 8.2 6.6 7.7 9.3 7.5 6.9
Aug NA 9.7 7.8 9.1 9.8 8.5 8.5 9.5 8.8 9.3 8.1 9.4
Sep NA 11.6 11.8 12.8 11.2 11.0 12.0 12.4 11.7 12.4 12.2 11.0
Oct NA 15.1 15.2 16.4 14.6 14.6 13.7 13.7 15.3 14.4 13.8 14.1
Nov NA 18.9 17.1 21.7 18.9 20.2 16.9 18.0 17.0 17.5 16.1 15.7
Dec NA 24.2 21.7 21.5 24.6 22.6 21.1 22.6 20.7 21.9 21.1 20.4

This is a small piece of the data - in Omeo's case available mostly from 1943 to 2011.

To get a rough and ready plot, you can stack it into a single vector

> stackedOmeo<-stack(odata)

and this can be plotted

> plot(stackedOmeo$values,type=’l')

now, there are many missing values from 1701 so let’s just plot the ones we have

> plot(stackedOmeo$values[2900:3700],type=’l')


and you can try and spot the warming.

R has many tools to fit a warming line : linear model, robust linear model, polynomials etc so have fun



1 comment:

  1. My sister wrote a thesis on a similar topic. I helped her with this, because I also need it. As soon as we finished with this work, our computer started to give out an error. We could not open the file with our thesis work. We found such a solution how to open dat file https://wikiext.com/dat. Using this program, our problem was solved.

    ReplyDelete