Sunday 10 November 2019

Cybersecurity - Ransomware resilience

Ransomware is a real and present danger to all businesses. Medical practices need to be especially vigilant because of the sensitivity of our data and the risk to our patients if we lose their clinical information.

We have been developing some guidelines for ransomware resilience - to decrease the likelihood of an attack and to survive it if the unthinkable happens.

The new site can be found here: http://cybersecurity.sarossy.com

Tuesday 2 September 2014

Plaquenil Screening

Plaquenil (hydroxychloroquine) and its cousin chloroquine are useful drugs - particularly for arthritis and some skin conditions. Both drugs can cause significant damage to the retina which is usually irreversible. Doctors have known for decades about the importance of monitoring all patients on long-term Plaquenil.

The technology for screening for Plaquenil toxicity has evolved over recent years. A nice review article can be found here. There has been much enthusiasm recently for high-resolution OCT scanning for the detection of early Plaquenil toxicity, and in many doctors' practices this has been at the expense of visual field testing.

A case was recently seen where substantial and permanent visual loss resulted from Plaquenil despite regular OCT scans.

Our approach now is, where possible, to do some baseline tests before the first dose of Plaquenil. These include colour vision, central visual field and OCT. We have found fundus autofluorescence (FAF) to be more effective than OCT in detecting damage, so for patients with more than three years of Plaquenil treatment we recommend annual visual fields, FAF and OCT.

In cases where patients have been on Plaquenil and it is working for their joint symptoms but there is some suspicion of toxicity, we recommend the addition of a multifocal ERG test. This is a test of the function of the retina, rather than its structure, which is what OCT and FAF assess.

At our Essendon rooms we can offer all of these types of tests, as well as testing for a large range of other eye diseases.

Wednesday 20 February 2013

Bolt vs Emerson and the pause - who is right? Using piece-wise regression to find an answer

There has been some debate about the interpretation of global temperature readings - specifically whether there has been a "pause" in the rate of "global warming" over the last 16 years. A recent episode of this is presented at JoNova. The actual exchange is available following the links at the above page. Leaving aside the important point of whether a global temperature is a valid measure, the issue comes down to how to pose the question statistically. Fitting a regression line to a selected time period and inferring that there is no significant trend is not really valid. Of course, fitting a linear regression line to the entire temperature record can be done

and you get a reasonable regression line. This model, however, assumes a stationary process over the period - no hockey stick - and a linearly increasing "human influence" over the entire period. It is actually a fairly good fit to the data.

The regression data from R are:

  Intercept   5.769e-02
  Gradient    1.272e-05

The correlation coefficient is 0.744.
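For reference, a fit of this kind is a one-liner in R. The sketch below is only illustrative and assumes the monthly series has been read into a data frame called temps with a numeric time column time and a temperature column anom (hypothetical names, not the actual dataset used above):

fit_all <- lm(anom ~ time, data = temps)   # ordinary least-squares trend over the whole record
coef(fit_all)                              # intercept and gradient
cor(temps$time, temps$anom)                # correlation coefficient
plot(temps$time, temps$anom, type = 'l')
abline(fit_all, col = 'red')               # overlay the fitted line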

Of course, it looks much more dramatic if you restrict the data to post-1974. Fitting a linear model to that data gives:
  Intercept   -1.638e-01
  Gradient     4.641e-05

The correlation coefficient is 0.821.

It correlates better and the gradient is 3.8 times as steep. It also corresponds to a plausible period of increased "human influence". A problem is that if you allow a change in the regression line at that time point, you must allow other changes as well, and this poses the question of whether there has been another change and, if so, to what?

It is true that, "eyeballing" the data, there seems to be a flattening off in later years.

This graph looks at the data post-1974 and compares two models: one with a breakpoint and one without.

Now, comparing model fits like this is not easy. The likelihood of the fitted model needs to be adjusted for the number of parameters. The single-line model has only 2 parameters - slope and intercept - but the two-line model has 5 parameters - the slope and intercept of both lines and the breakpoint position. One way to compare the models is to use the AIC (Akaike Information Criterion) - the lower the better. In this case, the AIC for the single line was -568 and for the two-line model was -602. Not much difference, but some evidence for a breakpoint in September 2005.
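As a sketch of how a comparison like this can be run in R (again assuming the hypothetical temps data frame from the earlier sketch, restricted to post-1974, and using the segmented package rather than whatever produced the figures quoted above - note that segmented fits a continuous broken line, which has one parameter fewer than two fully independent lines):

library(segmented)                            # install.packages("segmented") if needed

post74  <- subset(temps, time >= 1974)
fit_one <- lm(anom ~ time, data = post74)     # single line: slope and intercept
fit_two <- segmented(fit_one, seg.Z = ~ time, # broken line: adds a second slope
                     psi = 2000)              # and a breakpoint (initial guess at 2000)

AIC(fit_one, fit_two)   # lower AIC is better, after penalising the extra parameters
fit_two$psi             # the estimated breakpoint and its standard error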


Saturday 1 September 2012

Astigmatism

Astigmatism is a very difficult concept to explain to students and patients. There are no good webpage references on the combination of different spherocylinder lenses. Understanding how they add is however important in understanding how the Jackson Cross Cylinder works and also in understanding how astigmatic changes come about with surgery.
At North West Eye Specialists we have the Holladay software for analysing surgical astigmatism, which allows each of our doctors to build their own database of parameters for the optimized management of astigmatism.
As a teaching tool, I have developed an online astigmatism calculator – it can add and subtract spherocylindrical lenses. It is still in beta form and not 100% accurate, but it will help in the understanding of astigmatism. Please feel free to leave suggestions or drop me a line for the source code.
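For the curious, here is a minimal R sketch (not the calculator itself) of one standard way to add two spherocylinders: convert each lens to Thibos power-vector form (M, J0, J45), add the vectors component-wise, and convert the sum back to minus-cylinder notation. The function names are made up for illustration.

# convert sphere / cylinder / axis (degrees) to a power vector
to_power_vector <- function(S, C, axis) {
  a <- axis * pi / 180
  c(M = S + C / 2, J0 = -(C / 2) * cos(2 * a), J45 = -(C / 2) * sin(2 * a))
}

# convert a power vector back to minus-cylinder sphere / cylinder / axis
from_power_vector <- function(p) {
  cyl  <- -2 * sqrt(p[["J0"]]^2 + p[["J45"]]^2)
  sph  <- p[["M"]] - cyl / 2
  axis <- (atan2(p[["J45"]], p[["J0"]]) / 2) * 180 / pi
  if (axis <= 0) axis <- axis + 180            # keep the axis in (0, 180]
  c(sphere = sph, cylinder = cyl, axis = axis)
}

# example: (+1.00 / -2.00 x 90) added to (+0.50 / -1.00 x 180)
from_power_vector(to_power_vector(1.0, -2.0, 90) + to_power_vector(0.5, -1.0, 180))
# gives +0.50 / -1.00 x 90

Because M, J0 and J45 add linearly for thin lenses in contact, this handles obliquely crossed cylinders as well.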

Sample Size Calculations for Ethics Committee Applications



One of the key things that ethics committees look at when determining whether to accept a research proposal is the statistical justification. The ethical issue is twofold:

1. Is an excessive number of subjects being requested?
2. Is the number of subjects sufficient to reasonably expect a significant result?

Doing the statistical analysis for the application can be daunting and confusing for a prospective researcher, and the online resources are often not very good either. I have found the easiest program to use for these calculations to be SigmaStat. It is, however, quite expensive, even for an academic license.

Fortunately, there is a free alternative. The free software package R can do all the power calculations for you and is available for Windows, Mac and even Linux. It is not the most user-friendly package, so this guide is designed to help you use the program for your ethics committee application.

First decide what test you are going to use.

For two groups with a continuous (assumed normally distributed) variable – for example the length of male vs female elephant tails – use a t-test. You must also decide whether the test only needs to consider whether the first group is larger than or the same as the second (a one-tailed test), or whether the first group could be larger, smaller or the same as the second (a two-tailed test).
For more than two groups with a continuous (assumed normally distributed) variable – for example the average height of Asians, Africans and North Americans – use an ANOVA.
For the comparison of success or failure as a proportion in two or more groups – for example the proportion of women who conceived using a fertility treatment – use a test of proportions.
Let's consider these one by one.

T-test
Work out your parameters – in this example, assume that the standard deviation of the elephant tails is 20% and we hypothesize that the male ones will on average be 40% longer.
Fire up R and type ?power.t.test – this will display the help screen.

In this case the command to get your analysis is

power.t.test(delta=0.4,sd=0.2,sig.level=0.05,power=0.8)

This gives you 5 subjects per group.

If you want the one-tailed option (i.e. we are only testing whether male tails are longer; male tails being significantly shorter is not a possibility), you need
power.t.test(delta=0.4,sd=0.2,sig.level=0.05,power=0.8,alternative="one.sided")

ANOVA test
This is the trickiest of the three. We first need to know our expected difference in the means and the standard deviations of the groups – in this example, let's say the standard deviation is again 20% and we are looking for a difference in the means of 40%.

The first step is to work out Cohen’s D value – this is the difference in the means divided by the standard deviation – in this case 2.0

The next step is to work out Cohen’s effect size – this is given by



so in our example we have f = 0.707

Now, within R, we need to load an additional package, so start R and type

require(pwr)   # install.packages("pwr") first if the package is not already installed

Then type ?pwr.anova.test for the help screen,

and for our power calculation (k is the number of groups):

pwr.anova.test(k=6,f=0.707,sig.level=0.05,power=0.8)

which gives us 6 subjects per group (you can’t use fractions of a subject)

Test of Proportions
This one is fairly easy – for this example, let's say we guess that 80% of women on a fertility treatment will conceive, and 10% of those on the placebo will conceive.
In R type

?power.prop.test

and for our example

power.prop.test(p1=0.8,p2=0.1,sig.level=0.05,power=0.8)

which gives us 7 per group

I hope this is useful and I might revise this with some more tricky power calculations later. Any experts are welcome to comment.

Do it Yourself Climate Science with R and GHCN


As a scientist who is yet to be convinced about climate change, I was challenged to have a look at the climate data myself. After all, statistical modelling is what currently interests me, and the open-source software R does modelling particularly well.

It turns out that the main surface temperature dataset upon which all of the models are based is publicly available and can be analysed with R.

First you must get and install R – the program is available for all platforms from http://cran.r-project.org/

Then you need to run R and install the package RghcnV3, which was written to access and transform the GHCN data.
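For example (assuming the package is still on CRAN; if it has since been archived you may need to install it from a source archive instead):

> install.packages("RghcnV3")   # the GHCN v3 helper package
> library(RghcnV3)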

The actual data can be downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/

Monthly average, minimum or maximum data can be downloaded. The archive needs to be unzipped to get the .dat file.

This file is then loaded into R:

> require(RghcnV3)
> v<-readV3Data(filename='c:\\yourdirectory\\foo.dat') #assuming your data has been renamed foo.dat and you have a windows machine - syntax is slightly different for a linux or mac
> stationNames<-dimnames(v) # get the station names

Now you need the ID of the station you want. The station inventory is available here: http://data.giss.nasa.gov/gistemp/station_data_v2/v2.temperature.inv.txt

For example, I tried Omeo: 50194911000
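(If you would rather not search the inventory file by hand, a quick and dirty way - assuming you have saved the inventory locally as v2.temperature.inv.txt - is to grep it from within R:

> inv <- readLines("v2.temperature.inv.txt")   # the station inventory downloaded above
> inv[grep("OMEO", inv)]                       # lines containing the station name

The station id appears at the start of each matching line.)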

So now, which column is Omeo?

> which(stationNames[[1]]=="50194911000")

This gives column 5778.

So let's get the data for that:

> odata<-data.frame(v[5778,,])

This creates a data frame of the Omeo data. You can have a look at it by typing odata:

X1941 X1942 X1943 X1944 X1945 X1946 X1947 X1948 X1949 X1950 X1951 X1952
Jan NA NA 23.0 25.1 24.5 26.0 24.8 21.8 22.4 22.6 23.7 24.6
Feb NA 23.3 24.0 22.3 22.4 23.6 25.2 23.6 21.8 21.9 23.5 23.2
Mar NA 21.9 22.4 20.2 19.1 18.0 20.3 18.4 20.5 20.2 21.7 21.7
Apr NA 16.9 14.7 13.6 15.9 14.8 16.3 16.2 14.3 15.5 13.3 15.4
May NA 13.2 11.1 11.0 11.8 11.2 13.5 10.3 11.3 11.7 11.3 10.8
Jun NA 9.8 7.8 7.7 10.8 7.3 9.2 8.9 7.3 9.0 10.2 9.1
Jul NA 8.0 7.1 7.8 7.6 8.9 8.2 6.6 7.7 9.3 7.5 6.9
Aug NA 9.7 7.8 9.1 9.8 8.5 8.5 9.5 8.8 9.3 8.1 9.4
Sep NA 11.6 11.8 12.8 11.2 11.0 12.0 12.4 11.7 12.4 12.2 11.0
Oct NA 15.1 15.2 16.4 14.6 14.6 13.7 13.7 15.3 14.4 13.8 14.1
Nov NA 18.9 17.1 21.7 18.9 20.2 16.9 18.0 17.0 17.5 16.1 15.7
Dec NA 24.2 21.7 21.5 24.6 22.6 21.1 22.6 20.7 21.9 21.1 20.4

This is a small piece of the data - in Omeo's case available mostly from 1943 to 2011.

To get a rough and ready plot, you can stack it into a single vector

> stackedOmeo<-stack(odata)

and this can be plotted

> plot(stackedOmeo$values,type='l')

Now, there are many missing values - the array runs from 1701 - so let's just plot the ones we have:

> plot(stackedOmeo$values[2900:3700],type='l')


and you can try and spot the warming.

R has many tools to fit a warming line: linear model, robust linear model, polynomials, etc., so have fun.
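As a starting point, here is a rough sketch (nothing rigorous) that fits an ordinary least-squares trend to the non-missing monthly values and overlays it on the plot; a fairer analysis would work with annual means or deseasonalised anomalies rather than raw monthly temperatures:

> ok    <- !is.na(stackedOmeo$values)          # keep only the months with data
> month <- seq_along(stackedOmeo$values)       # a simple monthly index as the time axis
> trend <- lm(stackedOmeo$values[ok] ~ month[ok])
> summary(trend)                               # the gradient is in degrees per month
> plot(month[ok], stackedOmeo$values[ok], type='l')
> abline(trend, col='red')                     # overlay the fitted trend line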