Data wrangling
Sometimes we have to do dirty jobs
Michele Mauri DensityDesign Research Lab
Data is often messy and needs to be cleaned, or at least converted
My data cleaning toolkit
1. TextWrangler http://www.barebones.com/products/textwrangler/
(Notepad++ on Windows; actually, any advanced text editor will do)
1. TextWrangler useful to:
- remove text formatting
- clean hidden characters
- replace separator characters
- structure data
- apply regexp
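The same find-and-replace steps can be written down as regular expressions. This is a minimal sketch, not a TextWrangler feature: `cleanLine` and the sample string are made up for illustration, and it assumes that no value contains a semicolon.

```javascript
// Hypothetical helper (not from the slides): cleans one line of text.
function cleanLine(line) {
  return line
    // remove zero-width characters and the byte-order mark
    .replace(/[\u200B-\u200D\uFEFF]/g, "")
    // turn non-breaking spaces into ordinary spaces
    .replace(/\u00A0/g, " ")
    // replace semicolon separators (and the spaces around them) with tabs;
    // assumes semicolons never appear inside a value
    .replace(/\s*;\s*/g, "\t")
    // collapse runs of spaces left inside values
    .replace(/ {2,}/g, " ")
    .trim();
}

// Made-up sample line with a BOM, a non-breaking space and a zero-width space
const raw = "\uFEFFMilan ;\u00A01\u200B350 ;  Italy ";
console.log(JSON.stringify(cleanLine(raw))); // prints "Milan\t1350\tItaly"
```

Each `.replace` call matches one search-and-replace pass you would otherwise run by hand in the editor, with grep/regexp matching enabled.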
2. OpenRefine http://openrefine.org/
2. OpenRefine useful to:
- convert formats
- reconcile data
- structure data
- enrich (link) data with Freebase
- apply GREL functions
3. Data Wrangler http://vis.stanford.edu/wrangler/
3. Data Wrangler useful to:
- reformat data values
- correct erroneous or missing values
- (re)structure dataset
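To make these operations concrete, here is a rough sketch of reformatting values and filling in missing ones. It is plain JavaScript, not a script exported by Data Wrangler; the rows, field names and `parseValue` helper are assumptions made for illustration.

```javascript
// Hypothetical sample rows; the field names are illustrative only.
const rows = [
  { region: "Lombardy", year: "2012", value: "1.234,5" },
  { region: "",         year: "2013", value: "n/a" },
  { region: "Lazio",    year: "2012", value: "987" },
];

// Reformat a number written as "1.234,5" into a JavaScript number.
// Returns null when the value is missing or cannot be parsed.
function parseValue(text) {
  const normalized = text.trim().replace(/\./g, "").replace(",", ".");
  if (normalized === "") return null;
  const n = Number(normalized);
  return Number.isNaN(n) ? null : n;
}

let lastRegion = null;
const cleaned = rows.map((row) => {
  // fill down: an empty region inherits the value from the row above
  if (row.region.trim() !== "") lastRegion = row.region.trim();
  return {
    region: lastRegion,
    year: Number(row.year),
    value: parseValue(row.value),
  };
});

console.log(cleaned);
```

Marking unparseable values as `null` rather than `0` keeps missing data visible in later steps.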
4. Excel http://office.microsoft.com/en-us/excel/
4. Excel useful to:
- use formulas
- rearrange & filter
- pivot tables
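For readers who have not used a pivot table, the idea can be sketched in a few lines of code: group records by one field for the rows and another for the columns, then sum a value. The `pivot` function and the sample records are hypothetical, and this is only an approximation of what Excel does.

```javascript
// Hypothetical records; the field names are illustrative only.
const records = [
  { city: "Milan", year: 2012, visits: 10 },
  { city: "Milan", year: 2013, visits: 15 },
  { city: "Rome",  year: 2012, visits: 7 },
  { city: "Milan", year: 2012, visits: 5 },
];

// Build a nested object: table[row][column] = sum of the chosen value.
function pivot(data, rowKey, colKey, valueKey) {
  const table = {};
  for (const item of data) {
    const r = item[rowKey];
    const c = item[colKey];
    table[r] = table[r] || {};
    table[r][c] = (table[r][c] || 0) + item[valueKey];
  }
  return table;
}

const byCityAndYear = pivot(records, "city", "year", "visits");
console.log(byCityAndYear);
```

Here cities become the rows, years the columns, and each cell holds the total number of visits.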
5. Code (Processing, JavaScript, ...)
5. Code useful to:
- do everything