Posts in category database

How big was that database?

Database servers are great, but there's a lot of magic in there sometimes and it can be hard to figure just how much storage is being taken up by what database and which tables.

A nice little hint on how to check the size of the whole or parts of you database server (Postgres):  http://feeding.cloud.geek.nz/2009/02/finding-size-of-postgres-database-on.html

Or for the lazy

SELECT pg_database.datname,
       pg_size_pretty(pg_database_size(pg_database.datname)) AS size
  FROM pg_database;

Reshape R - long to wide conversion

--May be incorrect, working on a fix will post when done-- Keeping data in long format just makes sense, but for some reason statistics often requires your data in wide format. The good news is that it's much easier to go from long to wide than the other way around. Although the tool I'm about to describe can go both ways.

Using  R and pulling a dataframe in from an SQLite database the following command will take the dataframe and for every Species listed create a new column based on it. Then all the records are grouped by their Plot and the resulting Percent Cover for a given species in a plot is now a value in one of the columns instead of it being it's own row.

Plant (the data.frame)

Plot Species PrCover
A Poppy 5
A Redwood 20
B Oak 50
B Poppy 10
 WidePlant <- reshape(Plant, v.names = "PrCover", idvar = "Plot", timevar = "Species", direction = "wide")

WidePlant (the results)

Plot PrCover.Poppy PrCover.Redwood PrCover.Oak
A 5 20 NA
B 10 NA 50

The documentation is kinda hard to read, so here's my attempt at plain english

  • v.names = the values you want to show up under your new columns
  • idvar = the id that you want to group your data record by
  • timevar = the values that you want to make up the new columns, however many distinct values are in this column determines the number of new columns
  • direction = wide, the destination or resulting format we want

SQLite and ODBC for Data Entry

The one downside of all the good database systems is the lack of an easy tool for entering data, especially coding in data off of hand written field forms.

I recently revisited the idea of using Open Office Base or Access as a front end to better databases. In this particular case due the number of issues and my familiarity I got Access working, I plan to go back and also get OOo working next chance I get and taking over my friends windows box (The data entry is for her anyways).

Tools: sqlite-odbc driver (I tested it on Windows and Linux) An ODBC client: Access, Excel, Open Office Base, Calc

Issues:

  1. All your tables must have a primary key declared.
    1. If you don't have one it's real quick using the Firefox SQL Manager to fix that, however you have to make new tables), something like this.
      CREATE TABLE NewData (pk INTEGER PRIMARY KEY AUTOINCREMENT, Afield, someotherfield);
      INSERT INTO NewData (Afield,someotherfield) SELECT * FROM Data;
      DROP TABLE Data;
      ALTER TABLE NewDATA RENAME TO Data;
      
  2. Declaring field types as TEXT, Access will import them as blobs if you do and this makes linking the tables difficult. Just drop the type.
  3. The Relationship tool in Access and Open Office are useless with linked-tables in this case. OO tells you so, Access doens't.
    1. To get around this I created nested forms each based on 1 table. When inserting a nested table into another I built in the relationship to the form so that for every record in the parent child records would be matched automatically.

More details to come soon...

List of useful links:

  • a
  • b