Saturday 1 September 2012

Astigmatism

Astigmatism is a very difficult concept to explain to students and patients. There are no good webpage references on the combination of different spherocylinder lenses. Understanding how they add is however important in understanding how the Jackson Cross Cylinder works and also in understanding how astigmatic changes come about with surgery.
At North West Eye Specialists we have the Holladay software for analysing surgical astigmatism which is allowing each of our doctors to form their own database parameters for the optimized management of astigmatism.
As a teaching tool, i have developed an online astigmatism calculator – it can add and subtract spherocylindrical lenses. It is still in beta form and not 100% accurate but will help in the understanding of astigmatism. Please feel free to leave suggestions or drop me a line for the source code

Sample Size Calculations for Ethics Committee Applications



One of the key things that ethics committees look at when determining whether to accept a research proposal is the statistical justification. The ethical issue is twofold:

1. Is an excessive number of subjects being requested while ;
2. Is the number of subjects sufficient to reasonably expect to get a significant result.

Doing the statistical analysis for the application can be daunting and confusing for a prospective researcher. The online resources are often not very good either. I have found the easiest program to use for these calculations to be sigmastat. It is, however, quite expensive even for an academic license.

Fortunately, there is a free alternative. The free software package
r

can do all the power calculations for you and is available for windows, mac and even linux. It is not the most user friendly package so this guide is designed to help you use the program for your ethics commitee application.

First decide what test you are going to use.

For two groups with a continuous (assume normally distributed ) variable – for example the length of male vs female elephant tails – use a t test. You must also decide whether the test needs to consider whether the first group is larger or the same as the second (a one tailed test) or whether the first group could be larger, smaller or the same as the second (a two tailed test)
For more than one group with a continuous (assume normally distributed) variable – for example the average height of asians, africans and north americans – use an ANOVA
For the comparison of success or failure as a proportion of two or more groups – for example the proportion of women who conceived using a fertility treatment – use a test of proportions
Let’s consider these one by one

T-test
work out your parameters – in this example assume that the standard deviation of the elephant tails is 20% and we hypothesize that the male ones will on average be 40% greater.
Fire up R and type ?power.t.test this will display the help screen

In this case the command to get your analysis is

power.t.test(delta=0.4,sd=0.2,sig.level=0.05,power=0.8)

this gives you 5 subjects per group

if you want the one tailed option (ie we are only testing whether male tails are longer male tails being significantly shorter is not a possibility)- you need
power.t.test(delta=0.4,sd=0.2,sig.level=0.05,power=0.8,alternative=”one.sided”)

ANOVA test
This is the most tricky of the three. We first need to know our expected difference in the means and the standard deviations of the groups – is this example let’s say standard deviation again is 20% and we are looking for a difference in the means of 40%.

The first step is to work out Cohen’s D value – this is the difference in the means divided by the standard deviation – in this case 2.0

The next step is to work out Cohen’s effect size – this is given by



so in our example we have f = 0.707

now within r we need to load an additional module so start r and type

require(pwr)

then ?pwr.anova.test for the help screen

and for our power calculation

pwr.anova.test(k=6,f=0.707,sig.level=0.05,power=0.8)

which gives us 6 subjects per group (you can’t use fractions of a subject)

Test of Proportions
this one is fairly easy – for this example let’s say that we guess that 80% of women on a fertility treatment will conceive and of those on the placebo 10% will conceive
In R type

?power.prop.test

and for our example

power.prop.test(p1=0.8,p2=0.1,sig.level=0.05,power=0.8)

which gives us 7 per group

I hope this is useful and i might revise this with some more tricky power calculations later. Any experts are welcome to comment

Do it Yourself Climate Science with R and GHCN


As a scientist but yet-to-be-convinced about climate change, I was challenged to have a look at the climate data myself. After all, statistical modelling is what currently interests me and the open source software R does modelling particularly well.

It turns out that the main surface temperature dataset upon which all of the models are based is publically available and can be analysed with R

First you must get and install R – the program is available for all platforms from http://cran.r-project.org/

then you need to run r and install package RghcnV3 the package written to access and transform the GHCN data

the actual data can be downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/

monthly average, min or max data can be downloaded. This needs to be unzipped to the .dat file

this file is then loaded into R

> require(RghcnV3)
> v<-readV3Data(filename='c:\\yourdirectory\\foo.dat') #assuming your data has been renamed foo.dat and you have a windows machine - syntax is slightly different for a linux or mac
> stationNames<-dimnames(v) # get the station names

Now you need the id of the station you want. This is available here http://data.giss.nasa.gov/gistemp/station_data_v2/v2.temperature.inv.txt

for example - I tried Omeo : 50194911000

so now which column is Omeo ?

> which(stationNames[[1]]==”50194911000″)

this gives column 5778

so let’s get the data for that :

> odata<-data.frame(v[5776,,])

this creates a data frame of the omeo data. you can have a look at it by typing odata

X1941 X1942 X1943 X1944 X1945 X1946 X1947 X1948 X1949 X1950 X1951 X1952
Jan NA NA 23.0 25.1 24.5 26.0 24.8 21.8 22.4 22.6 23.7 24.6
Feb NA 23.3 24.0 22.3 22.4 23.6 25.2 23.6 21.8 21.9 23.5 23.2
Mar NA 21.9 22.4 20.2 19.1 18.0 20.3 18.4 20.5 20.2 21.7 21.7
Apr NA 16.9 14.7 13.6 15.9 14.8 16.3 16.2 14.3 15.5 13.3 15.4
May NA 13.2 11.1 11.0 11.8 11.2 13.5 10.3 11.3 11.7 11.3 10.8
Jun NA 9.8 7.8 7.7 10.8 7.3 9.2 8.9 7.3 9.0 10.2 9.1
Jul NA 8.0 7.1 7.8 7.6 8.9 8.2 6.6 7.7 9.3 7.5 6.9
Aug NA 9.7 7.8 9.1 9.8 8.5 8.5 9.5 8.8 9.3 8.1 9.4
Sep NA 11.6 11.8 12.8 11.2 11.0 12.0 12.4 11.7 12.4 12.2 11.0
Oct NA 15.1 15.2 16.4 14.6 14.6 13.7 13.7 15.3 14.4 13.8 14.1
Nov NA 18.9 17.1 21.7 18.9 20.2 16.9 18.0 17.0 17.5 16.1 15.7
Dec NA 24.2 21.7 21.5 24.6 22.6 21.1 22.6 20.7 21.9 21.1 20.4

This is a small piece of the data - in Omeo's case available mostly from 1943 to 2011.

To get a rough and ready plot, you can stack it into a single vector

> stackedOmeo<-stack(odata)

and this can be plotted

> plot(stackedOmeo$values,type=’l')

now, there are many missing values from 1701 so let’s just plot the ones we have

> plot(stackedOmeo$values[2900:3700],type=’l')


and you can try and spot the warming.

R has many tools to fit a warming line : linear model, robust linear model, polynomials etc so have fun