Perl Quick Reference Card
version 0.02 – editor: John Bokma – freelance programmer
DRAFT VERSION, check: http://johnbokma.com/perl/

Backslashed Character Escapes 61

\n          Newline (usually LF)
\r          Carriage return (usually CR)
\t          Horizontal tab (HT)
\f          Form feed (FF)
\b          Backspace (BS)
\a          Alert (BEL)
\e          Escape (ESC)
\0          Null character (NUL)
\033        ESC in octal
\x7f        DEL in hexadecimal
\cC         Control-C
\x{263a}    Unicode, ☺ (smiley)
\N{NAME}    Named character

Translation Escapes 61

\u    Force next character to uppercase ("titlecase" in Unicode).
\l    Force next character to lowercase.
\U    Force all following characters to uppercase.
\L    Force all following characters to lowercase.
\Q    Backslash all following non-"word" characters (quotemeta).
\E    End \U, \L, or \Q.

Quote Constructs 63

Customary  Generic  Meaning                Interpolates
''         q//      Literal string         No
""         qq//     Literal string         Yes
``         qx//     Command execution      Yes
()         qw//     Word list              No
//         m//      Pattern match          Yes
s///       s///     Pattern substitution   Yes
y///       tr///    Character translation  No
""         qr//     Regular expression     Yes

Note: no interpolation is done if you use single quotes for delimiters.

Operator Precedence 87

Associativity  Arity  Precedence Class
None           0      Terms, and list operators (leftward)
Left           2      ->
None           1      ++ --
Right          2      **
Right          1      ! ~ \ and unary + and unary -
Left           2      =~ !~
Left           2      * / % x
Left           2      + - .
Left           2      << >>
Right          0,1    Named unary operators
None           2      < > <= >= lt gt le ge
None           2      == != <=> eq ne cmp
Left           2      &
Left           2      | ^
Left           2      &&
Left           2      ||
None           2      .. ...
Right          3      ?:
Right          2      = += -= *= and so on
Left           2      , =>
Right          0+     List operators (rightward)
Right          1      not
Left           2      and
Left           2      or xor

File Test Operators 98

-r    File is readable by effective UID/GID.
-w    File is writable by effective UID/GID.
-x    File is executable by effective UID/GID.
-o    File is owned by effective UID/GID.
-R    File is readable by real UID/GID.
-W    File is writable by real UID/GID.
-X    File is executable by real UID/GID.
-O    File is owned by real UID/GID.
-e    File exists.
-z    File has zero size.
-s    File has nonzero size (returns size).
-f    File is a plain file.
-d    File is a directory.
-l    File is a symbolic link.
-p    File is a named pipe (FIFO).
-S    File is a socket.
-b    File is a block special file.
-c    File is a character special file.
-t    Filehandle is open to a tty.
-u    File has setuid bit set.
-g    File has setgid bit set.
-k    File has sticky bit set.
-T    File is a text file.
-B    File is a binary file (opposite of -T).
-M    Age of file (at startup) in (fractional) days since modification.
-A    Age of file (at startup) in (fractional) days since last access.
-C    Age of file (at startup) in (fractional) days since inode change.

Pattern Modifiers 147

/i    Ignore alphabetic case distinctions (case insensitive).
/s    Let . match newline and ignore deprecated $* variable.
/m    Let ^ and $ match next to embedded \n.
/x    Ignore (most) whitespace and permit comments in pattern.
/o    Compile pattern only once.

Additional m// Modifiers 150

/g    Globally find all matches.
/cg   Allow continued search after failed /g match.

Additional s/// Modifiers 153

/g    Replace globally, that is, all occurrences.
/e    Evaluate the right side as an expression.
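The /g and /e modifiers of s/// can be illustrated with a minimal sketch. It is not part of the original card, and the variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Sample data (hypothetical, for illustration only).
my $text   = "a-b-c";
my $prices = "3 apples, 10 pears";

# Without /g only the first match is replaced.
(my $first = $text) =~ s/-/+/;

# /g: replace every occurrence.
(my $all = $text) =~ s/-/+/g;

# /e: the right side is evaluated as a Perl expression;
# combined with /g, every number is doubled.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

print "$first\n$all\n$doubled\n";
```

Copying into a new variable first, as in (my $copy = $orig) =~ s/…/…/, leaves the original string untouched.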
tr/// Modifiers 156

/c    Complement SEARCHLIST.
/d    Delete found but unreplaced characters.
/s    Squash duplicate replaced characters.
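A short sketch of the three tr/// modifiers, added as an illustration only. The sample strings and variable names are assumptions, not taken from the card.

```perl
use strict;
use warnings;

my $s = "hello, world";

# /c with an empty replacement list and no /d changes nothing;
# it just counts the characters NOT in a-z (the comma and the space).
my $non_lower = ($s =~ tr/a-z//c);

# /c plus /d: delete every character that is not in a-z.
(my $letters = $s) =~ tr/a-z//cd;

# /d alone: "a" is replaced by "A"; "b" and "c" are found but have
# no replacement, so they are deleted.
(my $deleted = "abcabc") =~ tr/a-c/A/d;

# /s: squash runs of the same translated character into one.
(my $squashed = "aabbccdd") =~ tr/a-c//s;

print "$non_lower $letters $deleted $squashed\n";
```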
General Regex Metacharacters 159

Symbol  Atomic  Meaning
\…      Varies  De-meta next nonalphanumeric character, meta next
                alphanumeric character (maybe).
…|…     No      Alternation (match one or the other).
(…)     Yes     Grouping (treat as a unit).
[…]     Yes     Character class (match one character from a set).
^       No      True at beginning of string (or after a newline, maybe).
.       Yes     Match one character (except newline, normally).
$       No      True at end of string (or before any newline, maybe).
Regex Quantifiers 159-160

Quantifier   Atomic  Meaning
*            No      Match 0 or more times (maximal).
+            No      Match 1 or more times (maximal).
?            No      Match 0 or 1 time (maximal).
{COUNT}      No      Match exactly COUNT times.
{MIN,}       No      Match at least MIN times (maximal).
{MIN,MAX}    No      Match at least MIN but not more than MAX times (maximal).
*?           No      Match 0 or more times (minimal).
+?           No      Match 1 or more times (minimal).
??           No      Match 0 or 1 time (minimal).
{MIN,}?      No      Match at least MIN times (minimal).
{MIN,MAX}?   No      Match at least MIN but not more than MAX times (minimal).
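The difference between maximal (greedy) and minimal quantifiers is easiest to see side by side. The following is a minimal sketch with made-up sample strings; it is not part of the original card.

```perl
use strict;
use warnings;

my $html = "<b>one</b> and <b>two</b>";

# Maximal: .* grabs as much as it can, up to the LAST </b>.
my ($greedy) = $html =~ /<b>(.*)<\/b>/;

# Minimal: .*? stops at the FIRST </b>.
my ($minimal) = $html =~ /<b>(.*?)<\/b>/;

# {MIN,MAX} takes as many as allowed; {MIN,MAX}? takes as few as allowed.
my ($max3) = "aaaaa" =~ /^(a{2,3})/;
my ($min2) = "aaaaa" =~ /^(a{2,3}?)/;

# {COUNT} matches exactly COUNT times.
my ($year) = "2024-01-15" =~ /^(\d{4})-\d{2}-\d{2}$/;

print "$greedy\n$minimal\n$max3 $min2 $year\n";
```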
Extended Regex Sequences 160

Extension        Atomic  Meaning
(?#…)            No      Comment, discard.
(?:…)            Yes     Cluster-only parentheses, no capturing.
(?imsx-imsx)     No      Enable/disable pattern modifiers.
(?imsx-imsx:…)   Yes     Cluster-only parentheses plus modifiers.
(?=…)            No      True if lookahead assertion succeeds.
(?!…)            No      True if lookahead assertion fails.
(?<=…)           No      True if lookbehind assertion succeeds.
(?<!…)           No      True if lookbehind assertion fails.
(?>…)            Yes     Match nonbacktracking subpattern.
(?{…})           No      Execute embedded Perl code.
(??{…})          Yes     Match regex from embedded Perl code.
(?(…)…|…)        Yes     Match with if-then-else pattern.
(?(…)…)          Yes     Match with if-then pattern.
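A few of the more common extended sequences, sketched with hypothetical sample strings. This is an illustration only and does not cover the embedded-code or conditional forms.

```perl
use strict;
use warnings;

# (?:…) clusters without capturing, so the first capture is the unit.
my ($unit) = "12px" =~ /^(?:\d+)(px|em)$/;

# (?=…) lookahead: replace "foo" only when "bar" follows;
# "bar" itself is not consumed.
(my $ahead = "foobar foobaz") =~ s/foo(?=bar)/FOO/g;

# (?!…) negative lookahead: replace "foo" only when "bar" does NOT follow.
(my $not_ahead = "foobar foobaz") =~ s/foo(?!bar)/FOO/g;

# (?<=…) lookbehind: digits that come right after a literal "$".
my ($amount) = 'cost: $42' =~ /(?<=\$)(\d+)/;

# (?i) switches on case-insensitive matching inside the pattern.
my $ci = "HELLO" =~ /(?i)hello/ ? 1 : 0;

print "$unit $ahead $not_ahead $amount $ci\n";
```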
Alphanumeric Regex Metasymbols 161-162

Symbol     Atomic  Meaning
\0         Yes     Match the null character (ASCII NUL).
\NNN       Yes     Match the character given in octal, up to \377.
\n         Yes     Match nth previously captured string (decimal).
\a         Yes     Match the alarm character (BEL).
\A         No      True at the beginning of a string.
\b         Yes     Match the backspace character (BS).
\b         No      True at a word boundary.
\B         No      True when not at a word boundary.
\cX        Yes     Match the control character Control-X (for example, \cZ).
\C         Yes     Match one byte (C char) even in utf8 (dangerous).
\d         Yes     Match any digit character.
\D         Yes     Match any non-digit character.
\e         Yes     Match the escape character (ASCII ESC, not backslash).
\E         -       End case (\L, \U) or quotemeta (\Q) translation.
\f         Yes     Match the form feed character (FF).
\G         No      True at end-of-match position of prior m//g.
\l         -       Lowercase the next character only.
\L         -       Lowercase till \E.
\n         Yes     Match the newline character (usually NL, but CR on Macs).
\N{NAME}   Yes     Match the named char (\N{greek:Sigma}).
\p{PROP}   Yes     Match any character with the named property.
\P{PROP}   Yes     Match any character without the named property.
\Q         -       Quote (de-meta) metacharacters till \E.
\r         Yes     Match the return character (usually CR, but NL on Macs).
\s         Yes     Match any whitespace character.
\S         Yes     Match any nonwhitespace character.
\t         Yes     Match the tab character (HT).
\u         -       Titlecase next character only.
\U         -       Uppercase (not titlecase) till \E.
\w         Yes     Match any "word" character (alphanum plus "_").
\W         Yes     Match any nonword character.
\xHEX      Yes     Match the character given as one or two hex digits.
\x{abcd}   Yes     Match the character given in hexadecimal.
\X         Yes     Match Unicode "combining character sequence" string.
\z         No      True at end of string only.
\Z         No      True at end of string or before optional newline.

Classic Character Classes 167

Symbol  Meaning                As Bytes         As utf8
\d      Digit                  [0-9]            \p{IsDigit}
\D      Nondigit               [^0-9]           \P{IsDigit}
\s      Whitespace             [ \t\n\r\f]      \p{IsSpace}
\S      Nonwhitespace          [^ \t\n\r\f]     \P{IsSpace}
\w      Word character         [a-zA-Z0-9_]     \p{IsWord}
\W      Non-(word character)   [^a-zA-Z0-9_]    \P{IsWord}

Composite Unicode Properties 168-169

Property   Equivalent
IsASCII    [\x00-\x7f]
IsAlnum    [\p{IsLl}\p{IsLu}\p{IsLt}\p{IsLo}\p{IsNd}]
IsAlpha    [\p{IsLl}\p{IsLu}\p{IsLt}\p{IsLo}]
IsCntrl    \p{IsC}
IsDigit    \p{IsNd}
IsGraph    [^\pC\p{IsSpace}]
IsLower    \p{IsLl}
IsPrint    \P{IsC}
IsPunct    \p{IsP}
IsSpace    [\t\n\f\r\p{IsZ}]
IsUpper    [\p{IsLu}\p{IsLt}]
IsWord     [_\p{IsLl}\p{IsLu}\p{IsLt}\p{IsLo}\p{IsNd}]
IsXDigit   [0-9a-fA-F]
Perl also provides the following composites:

Property  Meaning                             Normative
IsC       Crazy control characters and such   Yes
IsL       Letters                             Partly
IsM       Marks                               Yes
IsN       Numbers                             Yes
IsP       Punctuation                         No
IsS       Symbols                             No
IsZ       Separators (Zeparators?)            Yes

POSIX-Style Character Classes 174-175

Class    Meaning
alnum    Any alphanumeric, that is an alpha or a digit.
alpha    Any letter. (That's a lot more letters than you think, unless
         you're thinking Unicode, in which case it's still a lot.)
ascii    Any character with an ordinal value between 0 and 127.
cntrl    Any control character. Usually characters that don't produce
         output as such, but instead control the terminal somehow; for
         example, newline, form feed, and backspace.
digit    A character representing a decimal digit, such as 0 to 9.
         (Includes other characters under Unicode.) Equivalent to \d.
graph    Any alphanumeric or punctuation character.
lower    A lowercase letter.
print    Any alphanumeric or punctuation character or space.
punct    Any punctuation character.
space    Any space character. Includes tab, newline, form feed, and
         carriage return (and a lot more under Unicode). Equivalent to \s.
upper    Any uppercase (or titlecase) letter.
word     Any identifier character, either an alnum or underline.
xdigit   Any hexadecimal digit. Equivalent to [0-9a-fA-F].

You can negate the POSIX character classes by prefixing the class name with a ^ following the [:. (This is a Perl extension.)
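POSIX classes are written inside a bracketed character class, as in [[:digit:]], and the negated form puts the ^ right after the [:. A minimal sketch follows; the sample string and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $line = "Perl 5, est. 1987!";

# [[:digit:]] inside a bracketed class: collect every run of digits.
my @numbers = $line =~ /[[:digit:]]+/g;

# [[:punct:]]: strip punctuation characters.
(my $no_punct = $line) =~ s/[[:punct:]]//g;

# [[:^alpha:]] is the negated class (a Perl extension):
# remove everything that is not a letter.
(my $letters = $line) =~ s/[[:^alpha:]]//g;

print "@numbers\n$no_punct\n$letters\n";
```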