Posts in category statistics

R tip: G-test, G-statistic, G^2, likelihood ratio, or whatever else you might want to call it

When analyzing categorical data, sometimes Chi-Square just isn't the right test for goodness-of-fit or independence, so many people recommend a G-test instead: http://www.biostathandbook.com/chiind.html

Being an R user, obviously I'd like to run this test alongside my other tests. A little searching the web turns up answers littered with, "R doesn't have a G-test built in, here's code to do it yourself..." Which is half true: unlike chisq.test, base R does not appear to have a G-test. But I'd rather leave the coding of standard statistics to people who really know the ins and outs of the formulas and have a good way to verify the answer.
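For reference, the do-it-yourself versions in those answers usually boil down to something like the sketch below. It assumes a test of independence on a small two-way table with no zero cells (a zero count would turn the log term into NaN), and the counts are made up purely for illustration. Treat it as a way to see what the statistic is, not as a replacement for a vetted package.

```r
# Hypothetical 2x2 table of counts (made-up numbers, illustration only)
observed <- matrix(c(12, 5, 7, 9), nrow = 2,
                   dimnames = list(Group = c("a", "b"),
                                   Outcome = c("yes", "no")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# G = 2 * sum(O * ln(O / E)); assumes every observed count is > 0
G <- 2 * sum(observed * log(observed / expected))

# Degrees of freedom for a two-way table: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# G is compared against the chi-square distribution, upper tail
p_value <- pchisq(G, df = df, lower.tail = FALSE)

print(c(G = G, df = df, p.value = p_value))
```

The expected counts here are the same ones chisq.test computes, so comparing against chisq.test(observed)$expected is a cheap sanity check on the first half of the calculation.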

So, a few hours later I find that Deducer has likelihood.test. So we're all good, right?

Well, then I got significant results and started looking for post-hoc tests. In doing so, it turns out that the following also do G-tests as part of their measures-of-association tests (typically used as post-hoc tests):

So there: base R doesn't have it, but at least 3 packages do, so people don't need to keep rewriting it.

FYI, http://www.rdocumentation.org/ is awesome if you haven't seen it yet.

Reshape R - long to wide conversion

--May be incorrect; working on a fix, will post when done--

Keeping data in long format just makes sense, but for some reason statistics often requires your data in wide format. The good news is that it's much easier to go from long to wide than the other way around, although the tool I'm about to describe can go both ways.

Using R and pulling a data frame in from an SQLite database, the following command takes the data frame and creates a new column for every Species listed. The records are then grouped by their Plot, so the Percent Cover for a given species in a plot becomes a value in one of the new columns instead of being its own row.

Plant (the data.frame)

Plot Species PrCover
A Poppy 5
A Redwood 20
B Oak 50
B Poppy 10

WidePlant <- reshape(Plant, v.names = "PrCover", idvar = "Plot", timevar = "Species", direction = "wide")

WidePlant (the results)

Plot PrCover.Poppy PrCover.Redwood PrCover.Oak
A 5 20 NA
B 10 NA 50

The documentation is kinda hard to read, so here's my attempt at plain English:

  • v.names = the column whose values you want to show up under your new columns
  • idvar = the id that you want to group your data records by (you get one row per id in the result)
  • timevar = the column whose values become the new columns; however many distinct values it has determines the number of new columns
  • direction = "wide", the destination or resulting format we want
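To try the example end to end without a database, here is a minimal sketch that builds the same Plant data frame by hand (standing in for the SQLite query, which isn't shown here) and runs the reshape call from above.

```r
# Stand-in for the data frame pulled from SQLite: same rows as the Plant table above
Plant <- data.frame(
  Plot    = c("A", "A", "B", "B"),
  Species = c("Poppy", "Redwood", "Oak", "Poppy"),
  PrCover = c(5, 20, 50, 10),
  stringsAsFactors = FALSE
)

# One row per Plot, one PrCover.<Species> column per distinct Species
WidePlant <- reshape(Plant,
                     v.names   = "PrCover",
                     idvar     = "Plot",
                     timevar   = "Species",
                     direction = "wide")

print(WidePlant)
```

Plot/species combinations that never appear in the long data (Oak in plot A, Redwood in plot B) come out as NA, and the new columns follow the order in which each species first appears in the long data.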