Saturday, November 27, 2010

Google Refine lets you fix and handle huge, messy sets of data

Google has just introduced a new product, and this time it's a PC application (with a browser-based UI). It's called Google Refine, and it solves a problem that is enormous for some people: it lets you take massive sets of "messy data" and massage them into shape so that they're uniform, make sense, and can be statistically analyzed.

The video after the jump shows a good example, based on a CSV file exported from a publicly available data source (a government contract system, in this case). The data is realistic: descriptions are inconsistent ("Firm Fixed Price" on some rows and "FFP" on others), and even the number formats are inconsistent (0.78 on one row and a number in the millions on another).
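To make those two kinds of inconsistency concrete, here is a rough Python sketch of the same sort of cleanup done by hand. This is not how Google Refine works internally. The alias table, the field names, the sample rows, and the "1000 times the median" threshold are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical alias table: variant spellings mapped to one canonical label.
CANONICAL = {
    "ffp": "Firm Fixed Price",
    "firm fixed price": "Firm Fixed Price",
}


def normalize_label(raw: str) -> str:
    """Collapse case and whitespace variants, then map known aliases."""
    key = " ".join(raw.split()).lower()
    return CANONICAL.get(key, raw.strip())


def flag_outliers(values, ratio=1000.0):
    """Return values at least `ratio` times larger than the median value."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return [v for v in values if median > 0 and v / median >= ratio]


# Made-up rows that mimic the inconsistencies described above.
rows = [
    {"type": "Firm Fixed Price", "amount": 0.78},
    {"type": "FFP", "amount": 2500000.0},
    {"type": " firm  fixed price ", "amount": 1.2},
]

# Count rows per cleaned-up label, and collect amounts that look out of scale.
labels = Counter(normalize_label(r["type"]) for r in rows)
suspect = flag_outliers([r["amount"] for r in rows])
```

With these sample rows, all three descriptions collapse into a single "Firm Fixed Price" label, and the multi-million amount is singled out for a closer look. Doing this by hand for every column of a large file is exactly the tedium a tool like Refine is meant to remove.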

Google Refine lets you easily home in on those inconsistencies and fix them in a myriad of ways. This is an important data tool because those heaps of messy data are often public records, which are available but not transparent; being able to quickly analyze them could expose some very interesting patterns and anomalies in the way that public institutions and governments behave.

[Thanks, Yanksy, for the tip!]


Google Refine lets you fix and handle huge, messy sets of data originally appeared on Download Squad on Wed, 17 Nov 2010 10:30:00 EST.
