Data Wrangling


Data wrangling: sometimes we have to do dirty jobs

Michele Mauri, DensityDesign Research Lab


Data is often messy and needs to be cleaned, or at least converted.





My data cleaning toolkit


1. Textwrangler * ** http://www.barebones.com/products/textwrangler/

* (Notepad++ for Windows)
** (actually, any advanced text editor)


1. Textwrangler useful to:
- remove text formatting
- clean hidden characters
- replace separator characters
- structure data
- apply regexp
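The same kind of cleanup can also be scripted. What follows is a minimal Python sketch, not a TextWrangler feature: the function name `clean_line`, the sample line, and the choice of `;` and tab as separators are all assumptions made for illustration. It removes a few common hidden characters with regular expressions and swaps the field separator.

```python
import re

def clean_line(line: str, old_sep: str = ";", new_sep: str = "\t") -> str:
    """Strip hidden characters from one line and swap its field separator.

    Hypothetical helper: it splits naively on old_sep, so it does not
    handle quoted fields that contain the separator.
    """
    # Turn non-breaking spaces into plain spaces.
    line = line.replace("\u00a0", " ")
    # Remove zero-width characters and the byte order mark.
    line = re.sub(r"[\u200b\u200c\u200d\ufeff]", "", line)
    # Remove control characters, keeping tab and newline.
    line = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", line)
    # Trim whitespace around each field, then join with the new separator.
    fields = [field.strip() for field in line.split(old_sep)]
    return new_sep.join(fields)

# Sample line with a BOM, a zero-width space, extra spaces and a stray carriage return.
raw = "\ufeffname;city\u200b;  population\r"
cleaned = clean_line(raw)
print(repr(cleaned))
```

For files with quoted fields, a proper CSV parser is likely a safer choice than a plain split.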


2. Open Refine http://openrefine.org/


2. Open Refine useful to:
- convert formats
- reconcile data
- structure data
- enrich (link) data with Freebase
- apply GREL functions
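As an illustration of what "convert formats" means in practice, here is a small Python sketch that turns CSV text into JSON records. It does not use Open Refine or GREL; the function name `csv_to_json` and the sample data are assumptions.

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    # DictReader uses the first row as the keys for every record.
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)

# Made-up sample data, for illustration only.
sample = "city,country\nMilan,Italy\nParis,France\n"
json_text = csv_to_json(sample)
records = json.loads(json_text)
print(json_text)
```

All values stay strings here; converting numbers or dates would be a separate step.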


3. Data Wrangler http://vis.stanford.edu/wrangler/


3. Data Wrangler useful to:
- reformat data values
- correct erroneous or missing values
- (re)structure dataset
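One common way to correct missing values is to "fill down": an empty cell takes the last value seen above it in the same column. The Python sketch below shows the idea only and is not Data Wrangler's own implementation; `fill_down` and the sample rows are hypothetical.

```python
def fill_down(rows, column):
    """Return copies of rows in which empty cells of `column`
    take the last non-empty value seen above them."""
    filled = []
    last = None
    for row in rows:
        row = dict(row)  # copy, so the input rows stay untouched
        if row.get(column) in ("", None):
            row[column] = last
        else:
            last = row[column]
        filled.append(row)
    return filled

# Made-up sample: the region is only written on the first row of each group.
rows = [
    {"region": "North", "year": "2012"},
    {"region": "", "year": "2013"},
    {"region": "South", "year": "2012"},
    {"region": None, "year": "2013"},
]
result = fill_down(rows, "region")
print([r["region"] for r in result])
```

This pattern often shows up in spreadsheets where merged or blank cells stand for "same as above".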


4. Excel http://office.microsoft.com/en-us/excel/


4. Excel useful to:
- use formulas
- rearrange & filter
- pivot tables
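For readers who have not met pivot tables, the idea is to group rows by one field, spread a second field across columns, and aggregate a value. Here is a rough Python sketch of that logic, assuming a sum as the aggregation; the name `pivot_sum` and the sales figures are invented for the example and have nothing to do with Excel's internals.

```python
from collections import defaultdict

def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair,
    returned as a nested dict: {row: {column: total}}."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += float(row[value_key])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {r: dict(cols) for r, cols in table.items()}

# Made-up sample data, for illustration only.
sales = [
    {"region": "North", "year": "2012", "amount": "10"},
    {"region": "North", "year": "2013", "amount": "5"},
    {"region": "North", "year": "2012", "amount": "2.5"},
    {"region": "South", "year": "2013", "amount": "7"},
]
pivot = pivot_sum(sales, "region", "year", "amount")
print(pivot)
```

Combinations with no rows (here South in 2012) are simply absent from the result rather than shown as zero.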


5. Code (Processing, JavaScript...)


5. Code useful to:
- do everything

