Introduction to Regular Expressions (Online Guide)
Table of Contents
1. What are Regular Expressions?
  - 1.1 A Simple Analogy
  - 1.2 Formal Definition
  - 1.3 Why Learn Regular Expressions?
  - 1.4 Use Cases
  - 1.5 Flavors of Regular Expressions
2. Basic Syntax and Concepts
  - 2.1 Literal Characters
  - 2.2 Metacharacters: The Special Operators
  - 2.3 Character Sets (`[]`)
    - 2.3.1 Ranges within Character Sets
    - 2.3.2 Negated Character Sets (`[^]`)
  - 2.4 The Dot (`.`) – Matching Any Character (Almost)
  - 2.5 Anchors: `^` (Start) and `$` (End)
  - 2.6 Alternation (`|`) – Matching “or”
  - 2.7 Quantifiers: `*`, `+`, `?`, and `{}`
    - 2.7.1 `*` (Zero or More)
    - 2.7.2 `+` (One or More)
    - 2.7.3 `?` (Zero or One)
    - 2.7.4 `{n}` (Exactly n times)
    - 2.7.5 `{n,}` (At least n times)
    - 2.7.6 `{n,m}` (Between n and m times)
  - 2.8 Greedy vs. Lazy Quantifiers
  - 2.9 Escaping Metacharacters (`\`)
  - 2.10 Word Boundaries (`\b`) and Non-Word Boundaries (`\B`)
3. Grouping and Capturing
  - 3.1 Parentheses `()` for Grouping
  - 3.2 Capturing Groups and Backreferences
  - 3.3 Non-Capturing Groups `(?:...)`
4. Character Classes (Shorthand Character Sets)
  - 4.1 `\d`: Digits
  - 4.2 `\w`: Word Characters
  - 4.3 `\s`: Whitespace Characters
  - 4.4 Negated Character Classes (`\D`, `\W`, `\S`)
5. Lookarounds (Zero-Width Assertions)
  - 5.1 Positive Lookahead `(?=...)`
  - 5.2 Negative Lookahead `(?!...)`
  - 5.3 Positive Lookbehind `(?<=...)`
  - 5.4 Negative Lookbehind `(?<!...)`
  - 5.5 Lookaround Limitations
6. Modifiers (Flags)
  - 6.1 Case-Insensitive (`i`)
  - 6.2 Global Search (`g`)
  - 6.3 Multiline (`m`)
  - 6.4 Dotall/Single Line (`s`)
  - 6.5 Other Modifiers
7. Regular Expressions in Practice: Common Tools and Languages
  - 7.1 Online Regex Testers (regex101, regexr, etc.)
  - 7.2 Text Editors (VS Code, Sublime Text, Notepad++, etc.)
  - 7.3 Command-Line Tools (grep, sed, awk)
  - 7.4 Programming Languages
    - 7.4.1 JavaScript
    - 7.4.2 Python
    - 7.4.3 Java
    - 7.4.4 C#
    - 7.4.5 PHP
    - 7.4.6 Perl
    - 7.4.7 Ruby
8. Common Regular Expression Patterns and Examples
  - 8.1 Matching Email Addresses
  - 8.2 Matching Phone Numbers
  - 8.3 Matching URLs
  - 8.4 Matching Dates and Times
  - 8.5 Matching HTML Tags
  - 8.6 Matching IP Addresses
  - 8.7 Validating User Input
  - 8.8 Extracting Data from Log Files
  - 8.9 Finding and Replacing Text
  - 8.10 Password Validation
9. Advanced Topics
  - 9.1 Atomic Groups
  - 9.2 Recursive Regular Expressions
  - 9.3 Unicode and Regular Expressions
  - 9.4 Regular Expression Optimization
10. Best Practices and Pitfalls
  - 10.1 Keep it Simple
  - 10.2 Comment Your Regexes
  - 10.3 Test Thoroughly
  - 10.4 Beware of Catastrophic Backtracking
  - 10.5 Understand Your Regex Flavor
  - 10.6 Don’t Use Regex When Simpler Methods Exist
11. Resources and Further Learning
1. What are Regular Expressions?
Regular expressions (often shortened to “regex” or “regexp”) are powerful tools for pattern matching within text. Think of them as a highly sophisticated “find and replace” mechanism, but instead of searching for a specific, literal string, you define a pattern that describes the text you want to find.
1.1 A Simple Analogy
Imagine you have a large box of LEGO bricks of various colors and shapes. You want to find all the red, 2×4 bricks. You could manually examine each brick, checking its color and size. This is like a simple string search.
Now, imagine you have a special tool that lets you specify a rule: “Find all bricks that are red AND have two studs wide AND four studs long.” This tool, in essence, is a regular expression engine. The rule you define is the regular expression.
1.2 Formal Definition
Formally, a regular expression is a sequence of characters that defines a search pattern. This pattern is then used by a regular expression engine (a software component) to search through a string (the “target string”) and identify portions of the text that match the defined pattern.
1.3 Why Learn Regular Expressions?
Regular expressions offer several significant advantages:
- Power and Flexibility: They can match complex patterns that would be incredibly difficult or impossible to describe with simple string searches.
- Efficiency: Once you’ve crafted a regex, it can often perform complex text manipulations very quickly.
- Conciseness: A single, well-written regex can replace many lines of procedural code that would be needed to achieve the same result.
- Ubiquity: Regular expressions are supported in virtually every programming language, text editor, and command-line tool, making them a highly portable skill.
- Time Savings: Automate repetitive text processing tasks, saving significant time and effort.
1.4 Use Cases
Regular expressions are used in a vast array of applications, including:
- Data Validation: Checking if user input (e.g., email addresses, phone numbers, passwords) conforms to a specific format.
- Data Extraction: Pulling specific pieces of information (e.g., dates, prices, URLs) from text or log files.
- Text Transformation: Modifying text by finding and replacing patterns (e.g., converting date formats, standardizing whitespace).
- Syntax Highlighting: Text editors use regexes to identify keywords, comments, and other language elements for colored display.
- Search and Replace: Performing complex find-and-replace operations in text editors or code.
- Web Scraping: Extracting data from websites by matching HTML patterns.
- Network Security: Analyzing network traffic for suspicious patterns.
- Bioinformatics: Searching and analyzing DNA and protein sequences.
1.5 Flavors of Regular Expressions
While the fundamental concepts of regular expressions are consistent, there are different “flavors” or implementations. These flavors differ slightly in the syntax they support, the features they offer, and how they handle certain edge cases. Some common flavors include:
- PCRE (Perl Compatible Regular Expressions): A widely used and feature-rich flavor, often treated as a de facto standard. It is used by PHP and by many tools. Python’s `re` module uses a similar Perl-style syntax, although it is a separate engine with some differences.
- JavaScript: JavaScript has its own built-in regex engine, which is similar to PCRE but has some differences.
- .NET: The .NET framework has its own regex engine, with some unique features.
- Java: Java’s `java.util.regex` package provides regex support.
- POSIX: A standard for regular expressions used in some Unix-like systems. There are two main POSIX flavors: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE).
It’s crucial to be aware of the specific flavor you’re using, as a regex that works perfectly in one environment might need slight adjustments to work in another. This guide will primarily focus on PCRE-like syntax, as it’s the most common, but will note significant differences where appropriate.
2. Basic Syntax and Concepts
Let’s dive into the building blocks of regular expressions.
2.1 Literal Characters
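The simplest regular expressions are just literal characters. For example, the regex `cat` will match the text “cat” exactly: a “c” followed immediately by “a” followed immediately by “t”. It matches wherever that sequence appears, including inside a longer word such as “concatenate”, and a search normally reports the first occurrence.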
2.2 Metacharacters: The Special Operators
The real power of regular expressions comes from metacharacters. These are special characters that have a specific meaning within a regex, rather than representing themselves literally. Here are the core metacharacters:
- `.` (Dot)
- `^` (Caret/Circumflex)
- `$` (Dollar sign)
- `*` (Asterisk)
- `+` (Plus sign)
- `?` (Question mark)
- `{` and `}` (Curly braces)
- `[` and `]` (Square brackets)
- `\` (Backslash)
- `|` (Pipe)
- `(` and `)` (Parentheses)
We’ll explore each of these in detail.
2.3 Character Sets (`[]`)
Square brackets `[]` define a character set. A character set matches any one of the characters enclosed within the brackets.
- `[abc]` will match “a”, “b”, or “c”.
- `[0123456789]` will match any single digit.
2.3.1 Ranges within Character Sets
You can specify a range of characters within a character set using a hyphen `-`.
- `[a-z]` will match any lowercase letter from “a” to “z”.
- `[A-Z]` will match any uppercase letter from “A” to “Z”.
- `[0-9]` will match any digit from 0 to 9 (equivalent to `[0123456789]`).
- `[a-zA-Z0-9]` will match any lowercase or uppercase letter, or any digit.
2.3.2 Negated Character Sets (`[^]`)
If the first character inside a character set is a caret `^`, it negates the set. This means it will match any character except those listed within the brackets.
- `[^abc]` will match any character except “a”, “b”, or “c”.
- `[^0-9]` will match any character that is not a digit.
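The character-set rules above can be tried out with a short sketch in Python's `re` module. The sample string and variable names are made up for illustration.

```python
import re

sample = "Room B12, floor 3"  # hypothetical sample text

# A plain set: matches any one of the listed characters.
vowels = re.findall(r"[aeiou]", sample)

# A range inside a set: any single digit.
digits = re.findall(r"[0-9]", sample)

# A negated set: anything that is not a letter or a digit.
non_alnum = re.findall(r"[^a-zA-Z0-9]", sample)

print(vowels)     # ['o', 'o', 'o', 'o']
print(digits)     # ['1', '2', '3']
print(non_alnum)  # [' ', ',', ' ', ' ']
```

Each set matches exactly one character per match, which is why `findall` returns single-character strings here.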
2.4 The Dot (`.`) – Matching Any Character (Almost)
The dot `.` is a metacharacter that matches any character except a newline character (`\n`) by default. (This behavior can be modified with the “dotall” or “single line” flag, discussed later.)
- `a.c` will match “abc”, “aac”, “a1c”, “a c”, etc. It will match any sequence where “a” is followed by any one character, followed by “c”.
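As a rough illustration in Python, the sketch below checks a few hypothetical candidate strings against `a.c`. It uses `re.fullmatch` so that the whole string has to fit the pattern.

```python
import re

candidates = ["abc", "a1c", "a c", "ac", "a\nc"]  # made-up test strings

# The dot needs exactly one character, and by default that character
# cannot be a newline.
dot_matches = [s for s in candidates if re.fullmatch(r"a.c", s)]
print(dot_matches)  # ['abc', 'a1c', 'a c']

# With the DOTALL flag the dot also accepts the newline.
newline_ok = re.fullmatch(r"a.c", "a\nc", re.DOTALL) is not None
print(newline_ok)  # True
```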
2.5 Anchors: `^` (Start) and `$` (End)
The `^` (caret) and `$` (dollar sign) are anchors. They don’t match any specific character, but rather assert a position within the string.
- `^` matches the beginning of the string (or the beginning of a line if the multiline flag is enabled).
- `$` matches the end of the string (or the end of a line if the multiline flag is enabled).
- `^hello` will match “hello” only if it appears at the beginning of the string (or line).
- `world$` will match “world” only if it appears at the end of the string (or line).
- `^hello world$` will match the string “hello world” only if it’s the entire string (or line).
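The anchor examples above can be checked with a small Python sketch. The dictionary keys are only descriptive labels chosen for this example.

```python
import re

checks = {
    # ^ requires the match to start at the beginning of the string.
    "hello_at_start": bool(re.search(r"^hello", "hello world")),
    "hello_not_at_start": bool(re.search(r"^hello", "say hello")),
    # $ requires the match to end at the end of the string.
    "world_at_end": bool(re.search(r"world$", "hello world")),
    # Both anchors together force the whole string to fit the pattern.
    "exact_string": bool(re.search(r"^hello world$", "hello world")),
    "extra_character": bool(re.search(r"^hello world$", "hello world!")),
}

for name, result in checks.items():
    print(name, result)
```

Only `hello_not_at_start` and `extra_character` come out `False`, because in both cases the text sits in the wrong position relative to the anchor.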
2.6 Alternation (`|`) – Matching “or”
The pipe symbol `|` represents alternation. It allows you to match one pattern or another.
- `cat|dog` will match either “cat” or “dog”.
- `gr(a|e)y` will match either “gray” or “grey”. (Parentheses are used for grouping, explained later.)
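A minimal Python sketch of both alternation examples follows. The sentence is invented for illustration. Because the second pattern contains parentheses, the sketch reads the full match from each match object with `group(0)`; `re.findall` would return only the parenthesized part.

```python
import re

sentence = "The gray cat chased the grey dog."  # hypothetical sample text

# Top-level alternation: either whole word.
animals = re.findall(r"cat|dog", sentence)

# Alternation limited to one letter by the parentheses.
colors = [m.group(0) for m in re.finditer(r"gr(a|e)y", sentence)]

print(animals)  # ['cat', 'dog']
print(colors)   # ['gray', 'grey']
```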
2.7 Quantifiers: `*`, `+`, `?`, and `{}`
Quantifiers specify how many times a preceding element (a character, a character set, or a group) should be matched.
2.7.1 `*` (Zero or More)
The asterisk `*` matches the preceding element zero or more times.
- `ab*c` will match “ac”, “abc”, “abbc”, “abbbc”, etc.
2.7.2 `+` (One or More)
The plus sign `+` matches the preceding element one or more times.
- `ab+c` will match “abc”, “abbc”, “abbbc”, etc., but not “ac”.
2.7.3 `?` (Zero or One)
The question mark `?` matches the preceding element zero or one time (i.e., it makes the preceding element optional).
- `colou?r` will match both “color” and “colour”.
2.7.4 `{n}` (Exactly n times)
Curly braces `{}` with a single number `n` inside match the preceding element exactly `n` times.
- `a{3}` will match “aaa” but not “aa” or “aaaa”.
2.7.5 `{n,}` (At least n times)
Curly braces with a number `n` followed by a comma match the preceding element at least `n` times.
- `a{2,}` will match “aa”, “aaa”, “aaaa”, etc.
2.7.6 `{n,m}` (Between n and m times)
Curly braces with two numbers `n` and `m` separated by a comma match the preceding element between `n` and `m` times (inclusive).
- `a{2,4}` will match “aa”, “aaa”, or “aaaa”, but not “a” or “aaaaa”.
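To compare the quantifiers side by side, here is a small Python sketch. The helper `matching` and the candidate list are hypothetical names introduced for this example. `re.fullmatch` is used so that each candidate must fit the pattern from start to end.

```python
import re

candidates = ["ac", "abc", "abbc", "abbbbbc"]  # made-up test strings


def matching(pattern: str) -> list[str]:
    """Return the candidates that fit the pattern as a whole string."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


print(matching(r"ab*c"))      # zero or more b: all four candidates
print(matching(r"ab+c"))      # one or more b: everything except 'ac'
print(matching(r"ab?c"))      # zero or one b: 'ac' and 'abc'
print(matching(r"ab{2}c"))    # exactly two b: 'abbc'
print(matching(r"ab{2,}c"))   # at least two b: 'abbc' and 'abbbbbc'
print(matching(r"ab{1,2}c"))  # one or two b: 'abc' and 'abbc'
```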
2.8 Greedy vs. Lazy Quantifiers
By default, quantifiers are greedy. This means they will try to match as much text as possible. Sometimes, this is not the desired behavior. You can make a quantifier lazy (or non-greedy) by adding a question mark `?` after it. A lazy quantifier will try to match as little text as possible.
Consider the string “aaaa” and the regex `a+`.
- `a+` (greedy) will match the entire string “aaaa”.
- `a+?` (lazy) will match only the first “a”.
Now consider the string `<p>This is a paragraph.</p>`, where we want to match just the opening tag:
- `<.*>` will match the entire string, because `.` can match any character, including `>`, and `*` is greedy, so the match runs on to the last `>`.
- `<.*?>` will match only `<p>`, because `*?` tells the dot to match as few characters as possible before the next part of the pattern (`>`) can match.
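The difference is easy to see in a short Python sketch that uses the same paragraph string. The third pattern, with a capturing group around the lazy part, is an extra illustration of how the tag content itself could be pulled out.

```python
import re

html = "<p>This is a paragraph.</p>"

# Greedy: the dot-star runs to the last '>' in the string.
greedy = re.search(r"<.*>", html).group(0)

# Lazy: the dot-star stops at the first '>' it can.
lazy = re.search(r"<.*?>", html).group(0)

# Lazy quantifier between two literal tags, with the content captured.
content = re.search(r"<p>(.*?)</p>", html).group(1)

print(greedy)   # <p>This is a paragraph.</p>
print(lazy)     # <p>
print(content)  # This is a paragraph.
```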
2.9 Escaping Metacharacters (`\`)
If you want to match a metacharacter literally (e.g., you want to find a literal dot, asterisk, or question mark), you need to escape it using a backslash `\`.
- `\.` will match a literal dot (.).
- `\*` will match a literal asterisk (*).
- `\+` will match a literal plus sign (+).
- `\?` will match a literal question mark (?).
- `\[` will match a literal opening square bracket ([).
- `\\` will match a literal backslash (\).
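The following Python sketch shows what goes wrong when a dot is left unescaped, using an invented price string. It also uses `re.escape`, a standard-library helper that adds the backslashes for you when the text to match comes from elsewhere.

```python
import re

price_text = "Total: 3.50 (was 3x50)"  # hypothetical sample text

# Unescaped dot: also matches '3x50', because '.' accepts any character.
unescaped = re.findall(r"3.50", price_text)

# Escaped dot: only the literal '3.50'.
escaped = re.findall(r"3\.50", price_text)

# re.escape builds a pattern that matches its argument literally.
literal_pattern = re.escape("a.b*c")

print(unescaped)        # ['3.50', '3x50']
print(escaped)          # ['3.50']
print(literal_pattern)  # a\.b\*c
```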
2.10 Word Boundaries (`\b`) and Non-Word Boundaries (`\B`)
`\b` is an anchor like `^` and `$`, matching a position called a “word boundary.” This position lies between a word character (`\w`) and a non-word character (`\W`), or at the beginning/end of the string if the first/last character is a word character.
- `\bcat\b` will match “cat” in “The cat sat”, but not in “concatenate”.
`\B` matches any position that `\b` doesn’t match.
- `\Bcat\B` will match “cat” in “concatenate”, but not in “The cat sat”.
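A quick Python sketch makes the two cases concrete. The sample sentence is made up, and the sketch records where each match starts so the standalone word and the embedded one can be told apart.

```python
import re

text = "The cat sat on the concatenated list"  # hypothetical sample text

# \b on both sides: 'cat' only as a standalone word.
standalone = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B on both sides: 'cat' only when it is buried inside a longer word.
embedded = [m.start() for m in re.finditer(r"\Bcat\B", text)]

print(standalone)  # [4]
print(embedded)    # [22]
```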
3. Grouping and Capturing
3.1 Parentheses `()` for Grouping
Parentheses `()` are used for grouping parts of a regular expression. This has several important effects:
- Applying Quantifiers to Groups: You can apply a quantifier to an entire group, rather than just a single character.
  - `(ab)+` will match “ab”, “abab”, “ababab”, etc.
- Alternation within Groups: You can use alternation (`|`) within a group to match different alternatives.
  - `(cat|dog)food` will match “catfood” or “dogfood”.
- Capturing Groups (see below): Parentheses also create capturing groups, which allow you to extract specific portions of the matched text.
3.2 Capturing Groups and Backreferences
When you use parentheses to group part of a regex, you also create a capturing group. The text matched by each capturing group is stored and can be accessed later, either within the regex itself (using backreferences) or after the match is complete (using the features of your programming language or tool).
- Backreferences: Within the regex, you can refer to the text captured by a group using `\1` for the first capturing group, `\2` for the second, and so on.
  - `([a-z])\1` will match any lowercase letter that is immediately repeated (e.g., “aa”, “bb”, “cc”). The `\1` refers back to the text captured by the first group `([a-z])`.
  - `(\w+) \1` will look for repeated words. It will match “hello hello” or “bye bye”.
- Accessing Captures After the Match: Most regex engines provide ways to access the captured text after a match is found. For instance, in Python, you can use the `group()` method of a match object, and in JavaScript, you can access the captured groups through the array returned by the `match()` or `exec()` methods.
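Here is a minimal Python sketch of the three ways captures are typically used: reading groups after a match, a backreference inside the pattern, and a group reference in a replacement string. All sample strings are invented, and the word-boundary anchors in the second pattern are an addition to keep it from matching parts of longer words.

```python
import re

# 1. Reading captured groups after the match.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-03-15.")
year, month, day = date_match.groups()

# 2. Backreference inside the pattern: the same word twice in a row.
repeated = re.search(r"\b(\w+) \1\b", "It was was a typo").group(1)

# 3. Group references in the replacement text: swap the two captures.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Lovelace, Ada")

print(year, month, day)  # 2024 03 15
print(repeated)          # was
print(swapped)           # Ada Lovelace
```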
3.3 Non-Capturing Groups `(?:...)`
Sometimes, you need to group parts of a regex for quantifiers or alternation, but you don’t want to capture the matched text. In this case, you can use a non-capturing group, which is created with `(?:...)`.
- `(?:ab)+` will match “ab”, “abab”, “ababab”, etc., just like `(ab)+`, but it won’t create a capturing group. This keeps group numbering tidy and can be slightly more efficient if you don’t need to access the captured text later.
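In Python the distinction has a visible side effect: `re.findall` returns the captured group when there is one and the whole match when there is none. The sketch below, with a made-up sample string, shows both behaviours along with the group count of each compiled pattern.

```python
import re

sample = "ababab xx ab"  # hypothetical sample text

# Capturing group: findall reports the group's last capture per match.
capturing = re.findall(r"(ab)+", sample)

# Non-capturing group: findall reports each whole match.
non_capturing = re.findall(r"(?:ab)+", sample)

print(capturing)      # ['ab', 'ab']
print(non_capturing)  # ['ababab', 'ab']

# The compiled pattern records how many capturing groups it has.
print(re.compile(r"(ab)+").groups)    # 1
print(re.compile(r"(?:ab)+").groups)  # 0
```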
4. Character Classes (Shorthand Character Sets)
For common character sets, regular expressions provide convenient shorthand notations called character classes.
4.1 `\d`: Digits
`\d` matches any single digit. In ASCII-only matching it is equivalent to `[0-9]`; some Unicode-aware engines (for example Python 3 string patterns and .NET) also match digits from other scripts unless told otherwise.
4.2 `\w`: Word Characters
`\w` matches any “word character”. This typically includes alphanumeric characters (letters and digits) and the underscore `_`. The exact set of characters matched by `\w` can depend on the regex flavor and locale settings, but it generally includes `[a-zA-Z0-9_]`.
4.3 `\s`: Whitespace Characters
`\s` matches any whitespace character. This typically includes space, tab (`\t`), newline (`\n`), carriage return (`\r`), form feed (`\f`), and vertical tab (`\v`).
4.4 Negated Character Classes (`\D`, `\W`, `\S`)
The uppercase versions of the character classes are their negations:
- `\D` matches any character that is not a digit. In ASCII-only matching it is equivalent to `[^0-9]`.
- `\W` matches any character that is not a word character. In ASCII-only matching it is equivalent to `[^a-zA-Z0-9_]`.
- `\S` matches any character that is not a whitespace character.
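The shorthand classes can be exercised with a short Python sketch on an invented log-style line. Note that Python 3 treats `\d`, `\w`, and `\s` as Unicode-aware for string patterns; the sample here is plain ASCII, so that does not affect the results.

```python
import re

line = "user_42 logged in\tat 09:15"  # hypothetical sample text

digits = re.findall(r"\d+", line)      # runs of digits
words = re.findall(r"\w+", line)       # runs of word characters
parts = re.split(r"\s+", line)         # split on any whitespace run
digits_only = re.sub(r"\D", "", line)  # delete every non-digit

print(digits)       # ['42', '09', '15']
print(words)        # ['user_42', 'logged', 'in', 'at', '09', '15']
print(parts)        # ['user_42', 'logged', 'in', 'at', '09:15']
print(digits_only)  # 420915
```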
5. Lookarounds (Zero-Width Assertions)
Lookarounds are powerful features that allow you to assert that a certain pattern must or must not exist before or after the main pattern you’re trying to match, without including those surrounding patterns in the overall match. They are zero-width assertions, meaning they don’t consume any characters in the string.
5.1 Positive Lookahead `(?=...)`
`(?=...)` asserts that the pattern inside the parentheses must follow the current position, but it’s not included in the match.
- `q(?=u)` will match “q” only if it’s followed by “u”, but the “u” is not part of the match. In the string “quit”, it will match the “q”, but the overall match will be just “q”, not “qu”.
5.2 Negative Lookahead `(?!...)`
`(?!...)` asserts that the pattern inside the parentheses must not follow the current position.
- `q(?!u)` will match “q” only if it’s not followed by “u”. In “Iraq”, it will match the “q”. In “quit”, it will not match anything.
5.3 Positive Lookbehind `(?<=...)`
`(?<=...)` asserts that the pattern inside the parentheses must precede the current position.
- `(?<=a)b` will match “b” only if it’s preceded by “a”. In “cab”, it will match the “b”.
5.4 Negative Lookbehind `(?<!...)`
`(?<!...)` asserts that the pattern inside the parentheses must not precede the current position.
- `(?<!a)b` will match “b” only if it’s not preceded by “a”. In “cb”, it will match the “b”. In “ab”, it will not match anything.
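The four assertions can be compared in a small Python sketch. The sample strings are made up, and match positions are recorded to show which occurrence each pattern picks. The last line is an extra illustration of a common use: matching a number only when a currency sign precedes it, without including the sign in the match.

```python
import re

q_text = "quit Iraq"  # 'q' at index 0 (followed by 'u') and at index 8
b_text = "cab ab cb"  # 'b' at indexes 2 and 5 (after 'a') and 8 (after 'c')

q_before_u = [m.start() for m in re.finditer(r"q(?=u)", q_text)]
q_not_before_u = [m.start() for m in re.finditer(r"q(?!u)", q_text)]
b_after_a = [m.start() for m in re.finditer(r"(?<=a)b", b_text)]
b_not_after_a = [m.start() for m in re.finditer(r"(?<!a)b", b_text)]

print(q_before_u, q_not_before_u)  # [0] [8]
print(b_after_a, b_not_after_a)    # [2, 5] [8]

# The lookbehind checks for '$' but keeps it out of the match.
dollar_amounts = re.findall(r"(?<=\$)\d+", "cost $40 or 50 EUR")
print(dollar_amounts)  # ['40']
```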
5.5 Lookaround Limitations
Lookbehind support varies between engines. JavaScript only gained lookbehind in ES2018, so older JavaScript engines reject it entirely. Many engines, including Python’s `re` module and (traditionally) PCRE, require lookbehind patterns to have a fixed length. This means you can’t use quantifiers like `*` or `+` inside a lookbehind in those engines. A few engines, such as .NET and modern JavaScript, do support variable-length lookbehind.
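Python’s built-in `re` module is one of the engines with the fixed-length restriction, which makes it a convenient place to see the limitation in action. In the sketch below, `compiles` is a hypothetical helper written for this example; it simply reports whether a pattern is accepted.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-length lookbehind: accepted.
fixed_ok = compiles(r"(?<=ab)c")

# Variable-length lookbehind: rejected by Python's re module.
variable_ok = compiles(r"(?<=a+)c")

print(fixed_ok)     # True
print(variable_ok)  # False
```

Other engines may accept the second pattern, so a check like this is only meaningful for the engine it runs on.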
6. Modifiers (Flags)
Modifiers (also called flags) are options that change the behavior of the regular expression engine. They are typically specified after the closing delimiter of the regex (e.g., after the last `/` in JavaScript or Perl) or as separate arguments to a function in some programming languages.
6.1 Case-Insensitive (`i`)
The `i` modifier makes the regex case-insensitive.
- `/abc/i` (JavaScript) will match “abc”, “Abc”, “aBc”, “ABC”, etc.
6.2 Global Search (`g`)
The `g` modifier tells the engine to find all matches in the string, not just the first one.
- `/abc/g` (JavaScript) will find all occurrences of “abc” in a string. Without the `g` flag, it would only find the first.
6.3 Multiline (`m`)
The `m` modifier changes the behavior of the `^` and `$` anchors. With the `m` flag:
- `^` matches the beginning of the string and the beginning of each line (after a newline character).
- `$` matches the end of the string and the end of each line (before a newline character).
Without the `m` flag, `^` and `$` only match the very beginning and very end of the entire string.
6.4 Dotall/Single Line (`s`)
The `s` modifier (sometimes called “dotall”) changes the behavior of the dot `.`. Normally, `.` matches any character except a newline. With the `s` flag, `.` will match any character, including newlines.
6.5 Other Modifiers
There are other modifiers available in some regex flavors, such as:
- `x` (Extended/Ignore Whitespace): Allows you to add whitespace and comments to your regex for readability (PCRE and some others).
- `u` (Unicode): Enables full Unicode support (JavaScript and some others).
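In Python, flags are passed as arguments rather than written after a delimiter. The sketch below, on an invented three-line string, uses `re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`, and `re.VERBOSE`, which correspond to the `i`, `m`, `s`, and `x` modifiers. Python has no `g` flag; functions such as `re.findall` return all matches instead.

```python
import re

log = "Error: disk full\nerror: retrying\nOK"  # hypothetical sample text

# i: ignore case.
any_case = re.findall(r"error", log, re.IGNORECASE)

# m: ^ matches at the start of every line, not only the whole string.
line_starts = re.findall(r"^\w+", log, re.MULTILINE)
string_start = re.findall(r"^\w+", log)

# s: the dot may cross line breaks.
spans_lines = re.search(r"disk.*retrying", log, re.DOTALL) is not None
stays_on_line = re.search(r"disk.*retrying", log) is None

# x: whitespace and comments inside the pattern are ignored.
year_month = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)

print(any_case)                              # ['Error', 'error']
print(line_starts)                           # ['Error', 'error', 'OK']
print(string_start)                          # ['Error']
print(spans_lines, stays_on_line)            # True True
print(year_month.search("2024-03").groups()) # ('2024', '03')
```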
7. Regular Expressions in Practice: Common Tools and Languages
Now that you understand the basic syntax, let’s see how to use regular expressions in various tools and programming languages.
7.1 Online Regex Testers (regex101, regexr, etc.)
Online regex testers are invaluable tools for learning, experimenting, and debugging regular expressions. They provide a user-friendly interface where you can:
- Enter your regular expression.
- Enter a test string.
- See the matches highlighted in real-time.
- View captured groups.
- Choose different regex flavors.
- Get explanations of the regex syntax.
- Generate code snippets for various languages.
Some popular online regex testers include:
- regex101.com: A very comprehensive and feature-rich tester with support for multiple flavors, explanations, and code generation. Highly recommended.
- regexr.com: Another excellent tester with a clean interface and good explanations.
- debuggex.com: A visual regex debugger that shows how the regex engine processes the string.
7.2 Text Editors (VS Code, Sublime Text, Notepad++, etc.)
Most modern text editors have built-in support for regular expression search and replace. The exact way to enable regex mode varies by editor, but it’s usually a checkbox or button in the find/replace dialog.
- VS Code: In the Find (Ctrl+F or Cmd+F) and Replace (Ctrl+H or Cmd+H) widgets, click the `.*` icon to enable regular expression mode.
- Sublime Text: Similar to VS Code, click the `.*` icon in the Find and Replace panels.
- Notepad++: In the Find and Replace dialogs, select the “Regular expression” radio button under “Search Mode”.
7.3 Command-Line Tools (grep, sed, awk)
Unix-like operating systems (Linux, macOS, BSD) come with powerful command-line tools that use regular expressions extensively:
- `grep` (Global Regular Expression Print): Searches for lines in files that match a given regex.
  - `grep 'pattern' filename.txt` (prints lines matching ‘pattern’)
  - `grep -i 'pattern' filename.txt` (case-insensitive search)
  - `grep -v 'pattern' filename.txt` (prints lines not matching ‘pattern’)
  - `grep -E 'pattern' filename.txt` (uses extended regular expressions – ERE)
  - `grep -P 'pattern' filename.txt` (uses Perl-compatible Regular Expressions – PCRE, if available)
- `sed` (Stream Editor): Performs text transformations based on regular expressions. The most common use is `s/pattern/replacement/flags`.
  - `sed 's/old/new/g' filename.txt` (replaces all occurrences of “old” with “new”)
  - `sed -E 's/([0-9]+)/\1\1/g' input.txt` (writes every run of digits twice, using a backreference)
- `awk`: A powerful text processing language that uses regular expressions for pattern matching and data manipulation.
7.4 Programming Languages
Almost all programming languages provide built-in support for regular expressions, either through standard libraries or built-in language features. Here are examples in some popular languages:
7.4.1 JavaScript
```javascript
// Create a RegExp object:
const regex = /world/i; // Literal notation: /pattern/flags
const regex2 = new RegExp("world", "i"); // Constructor: new RegExp("pattern", "flags")

// Test if a string matches:
const str = "hello world";
const matches = regex.test(str); // Returns true or false

// Find matches:
const matches2 = str.match(regex); // Returns an array of matches (or null)

// Replace matches:
const newStr = str.replace(regex, "replacement");

// Example: Extract digits
const str2 = "abc123def456";
const digitRegex = /\d+/g; // Match one or more digits globally
const digits = str2.match(digitRegex); // ["123", "456"]
```
7.4.2 Python
```python
import re

# Compile a regex (a raw string, r"...", is recommended):
regex = re.compile(r"(\w+) (\w+)", re.IGNORECASE)
string = "hello world"

# Test if a string matches:
match = regex.match(string)   # Matches only at the beginning of the string
match = regex.search(string)  # Searches anywhere in the string

# Find all matches:
matches = regex.findall(string)   # Returns a list of matches (tuples of groups when the pattern has groups)
matches = regex.finditer(string)  # Returns an iterator of match objects

# Replace matches:
new_string = regex.sub("replacement", string)

# Access captured groups:
if match:
    print(match.group(0))  # The entire match
    print(match.group(1))  # The first captured group
    print(match.group(2))  # The second captured group, etc.

# Example: Extract email addresses (the example.com addresses are placeholders)
text = "Contact us at support@example.com or sales@example.com."
email_regex = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
emails = email_regex.findall(text)  # ["support@example.com", "sales@example.com"]
```
7.4.3 Java
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The statements below belong inside a method.
// Compile a regex (flags such as Pattern.CASE_INSENSITIVE are optional):
Pattern pattern = Pattern.compile("(\\w+)", Pattern.CASE_INSENSITIVE);
String string = "hello world";

// Create a Matcher object:
Matcher matcher = pattern.matcher(string);

// Test if a string matches:
boolean matches = matcher.matches(); // Matches the entire string
boolean found = matcher.find(); // Finds the next match

// Access captured groups:
if (found) {
    System.out.println(matcher.group(0)); // The entire match
    System.out.println(matcher.group(1)); // The first captured group
    // …
}

// Replace matches:
String newString = matcher.replaceAll("replacement");

// Example (backslashes must be doubled inside Java string literals)
String text = "My phone number is 123-456-7890.";
Pattern phonePattern = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");
Matcher phoneMatcher = phonePattern.matcher(text);
if (phoneMatcher.find()) {
    System.out.println("Phone number found: " + phoneMatcher.group(0));
}
```
7.4.4 C#
```csharp
using System;
using System.Text.RegularExpressions;

// Create a Regex object (options such as RegexOptions.IgnoreCase are optional):
Regex regex = new Regex(@"(\w+)", RegexOptions.IgnoreCase);
string text = "hello world";

// Test if a string matches:
bool isMatch = regex.IsMatch(text);

// Find matches:
Match match = regex.Match(text); // Finds the first match
MatchCollection allMatches = regex.Matches(text); // Finds all matches

// Access captured groups:
if (match.Success) {
    Console.WriteLine(match.Value); // The entire match
    Console.WriteLine(match.Groups[1].Value); // The first captured group
    // …
}

// Replace matches:
string newString = regex.Replace(text, "replacement");

// Example (the dot is escaped so it matches only a literal decimal point)
string input = "The price is $19.99.";
Regex priceRegex = new Regex(@"\$(\d+\.\d{2})");
Match priceMatch = priceRegex.Match(input);
if (priceMatch.Success)
{
    Console.WriteLine("Price: " + priceMatch.Groups[1].Value); // Output: 19.99
}
```
7.4.5 PHP
```php
```
7.4.6 Perl
Perl has extremely strong built-in support for Regular Expressions.
```perl
use strict;
use warnings;

my $string = "hello world";

# Test if a string matches (flags such as i follow the closing delimiter)
if ($string =~ /world/i) {
    # ...
}

# Find matches (implicit assignment to special variables)
if ($string =~ /(world)/i) {
    print $&; # The entire match
    print $1; # The first captured group
    # ...
}

# Replace matches
$string =~ s/world/replacement/i;

# Example
my $text = "The date is 2024-03-15.";
if ($text =~ /(\d{4})-(\d{2})-(\d{2})/) {
    print "Year: $1, Month: $2, Day: $3\n";
}
```
7.4.7 Ruby
“`ruby
# Test if a string matches
if string =~ /