TextWrangler & regexp
Think of regexp as an advanced “find and replace”
With regexp you can look for characters or metacharacters
Selectors
\t tab
\r line break (other tools typically use \n; Windows files end lines with \r\n)
\s any whitespace (space, tab, line break)
\d any digit
\w any alphanumeric character or underscore
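The selectors above can also be tried outside TextWrangler. A minimal sketch in Python, assuming its standard `re` module (which writes a line break as \n rather than \r); the sample string is made up for illustration:

```python
import re

# Hypothetical sample: a tab after "id_7", single spaces elsewhere
sample = "id_7\tcosts 42 euro"

# \d matches one digit at a time
digits = re.findall(r"\d", sample)        # ['7', '4', '2']

# \w matches letters, digits and the underscore
words = re.findall(r"\w+", sample)        # ['id_7', 'costs', '42', 'euro']

# \s matches any whitespace, so it splits on the tab and on the spaces
parts = re.split(r"\s", sample)           # ['id_7', 'costs', '42', 'euro']

# \t matches only the tab
tabs = re.findall(r"\t", sample)

print(digits, words, parts, len(tabs))
```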
Examples
Find semicolons and replace them with tabs
Find empty lines and remove them
Find a period
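The three examples above, sketched in Python with the standard `re` module. The sample text is invented, and line breaks are written \n here (TextWrangler's search field uses \r):

```python
import re

# Hypothetical input with semicolons and empty lines
text = "name;city\n\nAnna;Milano\n\n\nLuca;Torino\n"

# Find semicolons and replace them with tabs
tabbed = re.sub(r";", "\t", text)

# Remove empty lines: collapse every run of line breaks into a single one
no_empty = re.sub(r"\n+", "\n", tabbed)

# Find a period: the backslash makes "." a literal period
periods = re.findall(r"\.", "3.14 and 2.71")

print(no_empty)
print(periods)
```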
Metacharacters ^ $ ( ) [ ] { } \ | . * + ?
Escaping metacharacters: adding a backslash before a metacharacter indicates that we want to use it as a literal character
\^ \$ \( \) \[ \] \{ \} \\ \| \. \* \+ \?
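A short sketch of what escaping changes, again assuming Python's `re` module; the price string is a made-up example. `re.escape` is a real helper in that module that adds the backslashes for you:

```python
import re

# Hypothetical sample containing several metacharacters
price = "Total: $4.50 (approx.)"

# Unescaped "." is the "any character" selector: it matches every character
unescaped = re.findall(r".", price)

# Escaped "\." matches only the literal periods
escaped = re.findall(r"\.", price)            # ['.', '.']

# \$ is a literal dollar sign, not "end of line"
amount = re.findall(r"\$\d+\.\d+", price)     # ['$4.50']

# \( and \) are literal parentheses, not a capture group
parens = re.findall(r"\(.*\)", price)         # ['(approx.)']

# re.escape builds a pattern that matches its input literally
literal = re.escape("a.b*c")

print(len(unescaped), escaped, amount, parens, literal)
```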
Ex. 1
• Go to “operabase > performances > Season 13/14”;
• copy & paste the Germany list of theatres;
• replace “ \\” with end of line;
• replace commas with tabs.
Ex. 1b
• Go to “List of countries by carbon dioxide emissions” on Wikipedia, copy & paste the list;
• replace periods with commas;
• remove commas between numbers;
• get rid of the “sources” column.
Any character
. any character (except a line break)
Repetitions
+ one or more (greedy: extends to the last possible match)
* zero or more (greedy: extends to the last possible match)
+? one or more (lazy: stops at the first possible match)
*? zero or more (lazy: stops at the first possible match)
{3} exact number of repetitions (here, 3)
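Examples
\s+ match one or more whitespace characters
cats* match “cat”, “cats”, “catss”, and so on (use cats? for only “cat” and “cats”)
value.* match “value” and every character after it, up to the end of the line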
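The difference between "until the last match" and "until the first match" is easiest to see side by side. A minimal sketch with Python's `re` module, on invented sample strings:

```python
import re

# Hypothetical snippet with two bold tags on one line
html = "<b>one</b> and <b>two</b>"

# Greedy .+ runs to the LAST "</b>" on the line: one long match
greedy = re.findall(r"<b>.+</b>", html)

# Lazy .+? stops at the FIRST "</b>": two short matches
lazy = re.findall(r"<b>.+?</b>", html)

# {3} takes exactly three digits at a time
exact = re.findall(r"\d{3}", "12 345 6789")

# s* allows zero or more "s", so "catss" matches too
cats = re.findall(r"cats*", "cat cats catss")

# .* takes everything after "value" up to the end of the line
value = re.search(r"value.*", "key=1 value=2 rest").group()

print(greedy, lazy, exact, cats, value)
```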
Group of characters []
Group of characters negation [^]
Examples
[azm] match “a”, “z” or “m”
[0-9] match digits
[a-z] match lowercase characters
[A-Z] match uppercase characters
[A-Za-z] match both upper & lowercase characters ([A-z] also matches [ \ ] ^ _ and `)
[0-9,.] match digits, commas and periods
[^b] match any character except “b”
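The character groups above, tried on an invented sample with Python's `re` module. The last two lines illustrate why [A-z] is a risky shortcut: the range also covers the punctuation that sits between “Z” and “a” in the character table.

```python
import re

# Hypothetical sample text
sample = "Zebra_42, maze [b] 3.5"

azm = re.findall(r"[azm]", sample)            # any one of a, z, m
digits = re.findall(r"[0-9]", sample)         # single digits
upper = re.findall(r"[A-Z]", sample)          # uppercase letters
numbers = re.findall(r"[0-9,.]+", sample)     # runs of digits, commas, periods
not_b = re.findall(r"[^b]", sample)           # everything except "b"

# [A-z] also matches "[", "_" and "]"; [A-Za-z] matches letters only
too_wide = re.findall(r"[A-z]", "[_]")
letters_only = re.findall(r"[A-Za-z]", "[_]")

print(azm, digits, upper, numbers, too_wide, letters_only)
```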
Ex. 2
• Go to “Craigslist milano for sale / wanted”;
• look at the source code;
• find all post links and descriptions;
• for each line, keep the URL and the description.
Hint: the link structure is: <a href="[URL]">[DESCRIPTION]</a>
Capture ()
Examples
Find a series of digits and write “number: ” before them
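A sketch of this example in Python's `re` module, on an invented line. The parentheses capture the digits, and \1 in the replacement puts the captured text back:

```python
import re

# Hypothetical input line
line = "room 12, floor 3"

# (\d+) captures each series of digits; \1 re-inserts it after the label
labelled = re.sub(r"(\d+)", r"number: \1", line)

print(labelled)   # room number: 12, floor number: 3
```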
Examples
Find dates followed by a time, like “3-feb-1984 10:23”, and split them into parts
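One possible way to split such a date, sketched with Python's `re` module. The pattern is an assumption about the format (one or two digits for the day, a three-letter lowercase month, a four-digit year, then hh:mm); each pair of parentheses becomes one part:

```python
import re

stamp = "3-feb-1984 10:23"

# Assumed layout: day-month-year hour:minute, one capture group per part
pattern = r"(\d{1,2})-([a-z]{3})-(\d{4}) (\d{2}):(\d{2})"

match = re.search(pattern, stamp)
day, month, year, hour, minute = match.groups()

# The numbered groups can be reordered in a replacement, here tab-separated
rearranged = re.sub(pattern, r"\3\t\2\t\1\t\4\t\5", stamp)

print(day, month, year, hour, minute)
print(rearranged)
```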
Start/end of line
^ line start
$ line end
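A small sketch of the two anchors using Python's `re` module, on invented text. In Python, ^ and $ only match at every line when the `re.MULTILINE` flag is set:

```python
import re

# Hypothetical three-line text
text = "apple pie\npineapple\napple"

# ^apple: "apple" only at the start of a line (lines 1 and 3)
starts = re.findall(r"^apple", text, flags=re.MULTILINE)

# apple$: "apple" only at the end of a line (lines 2 and 3)
ends = re.findall(r"apple$", text, flags=re.MULTILINE)

# ^apple$: lines that contain nothing but "apple" (line 3)
whole = re.findall(r"^apple$", text, flags=re.MULTILINE)

print(len(starts), len(ends), len(whole))
```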