Saturday, August 13, 2016

Regular Expressions

/[^ ]*/g   Door County is beautiful year-round

Match a sequence of zero or more characters that are not a space. The g modifier after the
closing slash makes the match global, so every occurrence in the string is matched.

/[^ ]*/   Door County is beautiful year-round

Without /g only the first occurrence in the string is matched.
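
As a rough illustration of the two cases above, here is a minimal Python sketch. Python's re module has no slash delimiters or g modifier, so this assumes that search() stands in for the pattern without g and findall() for the pattern with g. The variable names are made up for the example.

```python
import re

text = "Door County is beautiful year-round"
non_space_run = re.compile(r"[^ ]*")

# Like the pattern without g: search() stops at the first match.
first = non_space_run.search(text).group()

# Like the pattern with g: findall() returns every match. Because * allows
# zero characters, the result also contains empty strings, dropped here.
words = [m for m in non_space_run.findall(text) if m]

print(first)  # Door
print(words)  # ['Door', 'County', 'is', 'beautiful', 'year-round']
```

The empty matches are a side effect of "zero or more". Using + instead of * would avoid them.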

/.*[ ]/U   Door County is beautiful year-round

Match a sequence of zero or more characters that ends with a space. The dot (.) matches any character, including spaces, and the * means zero or more. The pattern then requires a space, so the string we are matching must end with a space character. The /U option makes the match "ungreedy", so that it matches the shortest substring starting at the first possible position, rather than the longest substring.
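
The difference between greedy and ungreedy matching can be sketched in Python. Python's re module has no U modifier, so this sketch assumes the lazy quantifier *? as the closest equivalent. It is an approximation of the behavior described above, not the same syntax.

```python
import re

text = "Door County is beautiful year-round"

# Greedy: .* grabs as much as possible, then backs up to the last space.
greedy = re.search(r".*[ ]", text).group()

# Lazy (stand-in for /U): .*? grabs as little as possible before a space.
lazy = re.search(r".*?[ ]", text).group()

print(repr(greedy))  # 'Door County is beautiful '
print(repr(lazy))    # 'Door '
```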

/^https?:\/\//

Match "http://" or "https://" at the beginning of a sequence. s? says match a single s if there is one, otherwise skip it. The two forward slashes would otherwise be read as the closing delimiter of the regular expression, so each one has to be escaped with a backslash.
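
A small Python sketch of the same check follows. Python patterns are plain strings without slash delimiters, so the forward slashes need no escaping there. The example.com addresses and the variable names are placeholders chosen for illustration.

```python
import re

# No slash delimiters in Python, so "/" is an ordinary character here.
scheme = re.compile(r"^https?://")

# Placeholder inputs for illustration only.
candidates = [
    "http://example.com",
    "https://example.com",
    "ftp://example.com",
    "see http://example.com",
]

# Keep only the strings that start with http:// or https://
matched = [url for url in candidates if scheme.search(url)]
print(matched)  # ['http://example.com', 'https://example.com']
```

The last candidate is rejected because ^ anchors the match to the start of the string.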

Some regex rules:

  • Dot (.) matches any character, except newlines. If you provide a /s modifier then it will match newlines too. Example: /^The.*/s would match any sequence of text where the first line starts with the word "The", until it reaches the end of the text being searched.
  • If you need to explicitly match a dot (aka period), then you need to escape it. Let's change the example above as follows: /^The.*\./s Now the match must end with a period, which could be on a different line of the text. Because * is greedy, the match extends to the last period in the text being searched, not the first.
  • The ^ character in the above examples means match from the beginning of the text being searched.
  • You can also match at the end of the text being searched with $. So /\.$/ would match any string that ends with a period.
  • Instead of matching zero or more, you can also match one or more using +
  • You can also match zero or one, using ?
  • Curly braces let you define a length range you want to match, so you write {minlength,maxlength}. Example: Suppose you want to match a series of the character "a" that has a length between 3 and 6. You can do it like this: a{3,6}
  • Square brackets define a set of characters to match. So [ab] will match either a or b. This could also be written as [a-b]. If you wanted to match any lowercase letter, you'd write [a-z].
  • You can also use an "or", so that you match one string or another. Suppose you want to match "abc" or "xyz". You write it like this: /(abc)|(xyz)/ The parentheses are not needed here, but they make the grouping clear.
  • Note that the starting and ending slashes / are the delimiters of the regular expression (the start and end). You can place modifiers, like the g, after the ending slash to change the way the regular expression is interpreted.
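
The rules above can be tried out with a short Python sketch. It assumes re.DOTALL as the counterpart of the s modifier. The sample text, the colou?r pattern and the variable names are invented for the example.

```python
import re

# Invented two-line sample text.
sample = "The quick fox.\nIt jumped. Then it slept"

# /^The.*/s : with DOTALL the dot also matches newlines, so the match
# runs to the end of the text.
to_end = re.search(r"^The.*", sample, re.DOTALL).group()

# Without the flag, the dot stops at the first newline.
first_line = re.search(r"^The.*", sample).group()

# /^The.*\./s : the greedy .* runs to the last period in the text.
to_last_period = re.search(r"^The.*\.", sample, re.DOTALL).group()

# /\.$/ : a period at the end of the text.
ends_with_period = bool(re.search(r"\.$", "Done."))
no_period = bool(re.search(r"\.$", "Done"))

# + means one or more, ? means zero or one.
one_or_more = re.findall(r"[a-z]+", "Door County")
zero_or_one = re.findall(r"colou?r", "color colour")

# {3,6} : between 3 and 6 repetitions; a run of 8 yields the first 6.
bounded = re.search(r"a{3,6}", "c" + "a" * 8 + "t").group()
too_short = re.search(r"a{3,6}", "caat")

# abc|xyz : either alternative, written without the optional parentheses.
either = re.findall(r"abc|xyz", "abc 123 xyz")

print(repr(first_line))      # 'The quick fox.'
print(repr(to_last_period))  # 'The quick fox.\nIt jumped.'
print(one_or_more)           # ['oor', 'ounty']
print(zero_or_one)           # ['color', 'colour']
print(bounded)               # aaaaaa
print(either)                # ['abc', 'xyz']
```

Note that [a-z]+ skips the capital letters in "Door County", since the set contains only lowercase letters.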
