Java Regex Tutorial with Practical Examples: Mastering the Art of Pattern Matching
Regular expressions (regex or regexp) are powerful tools for matching and manipulating text. They provide a concise and flexible way to search, extract, and replace strings based on complex patterns. In Java, the `java.util.regex` package offers comprehensive support for regular expressions, enabling developers to perform a wide range of text processing tasks efficiently. This tutorial covers the fundamental concepts, syntax, common use cases, and practical examples of Java regex.
1. Introduction to Regular Expressions
A regular expression is essentially a pattern described using a specialized syntax. This pattern can then be used to match, search, or manipulate text. Think of it as a template that you can use to identify specific sequences of characters within a larger body of text.
2. The java.util.regex Package
Java’s regex functionality is primarily provided by three core classes within the `java.util.regex` package:
- `Pattern`: Represents a compiled regular expression. It is immutable and thread-safe, so a compiled pattern can be reused many times.
- `Matcher`: An engine that interprets the compiled `Pattern` and performs matching operations against an input string.
- `PatternSyntaxException`: Indicates a syntax error in a regular expression pattern.
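As a minimal sketch of how the three classes fit together, the example below compiles a pattern once, creates a `Matcher` per input, and catches `PatternSyntaxException` for an invalid pattern. The class name `CoreClassesDemo` and the helpers `firstNumber` and `isValidRegex` are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CoreClassesDemo {
    // Compile once and reuse: Pattern instances are immutable and thread-safe.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in the input, or null if there is none.
    static String firstNumber(String input) {
        Matcher matcher = DIGITS.matcher(input); // one Matcher per input string
        return matcher.find() ? matcher.group() : null;
    }

    // Returns true if the regex compiles, false if its syntax is invalid.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Order 66 shipped")); // 66
        System.out.println(isValidRegex("(unclosed"));       // false
    }
}
```

Keeping the compiled `Pattern` in a constant avoids recompiling the expression on every call.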
3. Basic Syntax and Metacharacters
Regex uses a special set of characters called metacharacters to define patterns. These metacharacters have specific meanings within the regex engine:
- `.` (Dot): Matches any single character except a line terminator (unless the DOTALL flag is set).
- `^` (Caret): Matches the beginning of the input, or the beginning of each line when the MULTILINE flag is set.
- `$` (Dollar): Matches the end of the input, or the end of each line when the MULTILINE flag is set.
- `*` (Asterisk): Matches the preceding element zero or more times.
- `+` (Plus): Matches the preceding element one or more times.
- `?` (Question Mark): Matches the preceding element zero or one time.
- `{n}`: Matches the preceding element exactly n times.
- `{n,}`: Matches the preceding element n or more times.
- `{n,m}`: Matches the preceding element at least n and at most m times.
- `[]` (Character Class): Defines a set of characters to match. For example, `[aeiou]` matches any lowercase vowel.
- `[^]` (Negated Character Class): Matches any character not within the specified set. For example, `[^aeiou]` matches any character that is not a lowercase vowel, which includes consonants but also digits, spaces, and punctuation.
- `|` (OR): Matches either the expression before or the expression after the `|`.
- `()` (Grouping): Groups parts of the expression together and allows for backreferencing.
- `\` (Backslash): Escapes special characters so that they match literally, as in `\.`, `\*`, and `\+`. Inside a Java string literal the backslash itself must be doubled, so the regex `\.` is written as `"\\."`.
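The effect of escaping is easiest to see side by side. The sketch below compares an unescaped dot, an escaped dot, and `Pattern.quote`, which treats an entire string as a literal. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Unescaped dot: any character is accepted in the second position.
    static boolean looseMatch(String input) {
        return Pattern.matches("3.14", input);
    }

    // Escaped dot: only a literal '.' is accepted. Note the doubled backslash
    // required by Java string literals.
    static boolean strictMatch(String input) {
        return Pattern.matches("3\\.14", input);
    }

    // Pattern.quote escapes every metacharacter in the given literal.
    static boolean literalMatch(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(looseMatch("3x14"));          // true
        System.out.println(strictMatch("3x14"));         // false
        System.out.println(strictMatch("3.14"));         // true
        System.out.println(literalMatch("a+b", "a+b"));  // true
        System.out.println(literalMatch("a+b", "aab"));  // false
    }
}
```

`Pattern.quote` is a reasonable choice when the text to match comes from user input and should never be interpreted as a pattern.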
4. Character Classes and Predefined Character Classes
Character classes provide a powerful way to specify sets of characters. Java regex also offers predefined character classes for common character sets:
- `\d`: Matches any digit (equivalent to `[0-9]`).
- `\D`: Matches any non-digit (equivalent to `[^0-9]`).
- `\s`: Matches any whitespace character (such as space, tab, or newline).
- `\S`: Matches any non-whitespace character.
- `\w`: Matches any word character (by default equivalent to `[a-zA-Z0-9_]`).
- `\W`: Matches any non-word character.
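The predefined classes combine well with the regex-aware methods on `String`. The following is a small sketch under that assumption; the class `PredefinedClassesDemo` and its helper methods are hypothetical names used only here.

```java
public class PredefinedClassesDemo {
    // Removes every non-digit character (\D) from the input.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (\s+) after trimming the ends.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // True if the whole string consists of word characters (\w) only.
    static boolean isIdentifierLike(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(555) 123-4567"));         // 5551234567
        System.out.println(words("  one\ttwo \n three ").length); // 3
        System.out.println(isIdentifierLike("user_42"));          // true
        System.out.println(isIdentifierLike("user-42"));          // false
    }
}
```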
5. Quantifiers
Quantifiers specify how many times a preceding element should be matched:
- `*` (Zero or more): `a*` matches “”, “a”, “aa”, “aaa”, etc.
- `+` (One or more): `a+` matches “a”, “aa”, “aaa”, etc., but not “”.
- `?` (Zero or one): `a?` matches “” or “a”.
- `{n}` (Exactly n): `a{3}` matches “aaa”.
- `{n,}` (n or more): `a{2,}` matches “aa”, “aaa”, “aaaa”, etc.
- `{n,m}` (n to m): `a{2,4}` matches “aa”, “aaa”, or “aaaa”.
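The quantifier rules above can be checked directly with `Pattern.matches`, which succeeds only when the whole input matches. This is a minimal sketch; `QuantifierDemo` and `matchesWhole` are hypothetical names.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("a*", ""));          // true: zero occurrences allowed
        System.out.println(matchesWhole("a+", ""));          // false: at least one required
        System.out.println(matchesWhole("a?", "aa"));        // false: at most one allowed
        System.out.println(matchesWhole("a{3}", "aaa"));     // true: exactly three
        System.out.println(matchesWhole("a{2,}", "a"));      // false: fewer than two
        System.out.println(matchesWhole("a{2,4}", "aaaaa")); // false: more than four
    }
}
```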
6. Anchors
Anchors don’t match characters but specify positions within the string:
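- `^` (Beginning of line/string): `^Hello` matches “Hello” at the beginning of the input (or of any line in MULTILINE mode).
- `$` (End of line/string): `world$` matches “world” at the end of the input (or of any line in MULTILINE mode).
- `\b` (Word boundary): `\bword\b` matches “word” as a whole word, but not the “word” inside “sword” or “wording”.
- `\B` (Non-word boundary): `\Bword\B` matches the “word” inside “swords”, where word characters sit on both sides, but not “word”, “sword”, or “wording”.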
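Boundary behavior is easy to misjudge, so a quick check helps. The sketch below uses `Matcher.find()` to test whether a pattern occurs anywhere in the input; note that `\Bword\B` requires a word character on both sides of “word”. The class `AnchorDemo` and the helper `contains` are hypothetical names.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // True if the regex matches somewhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(contains("^Hello", "Hello world"));    // true
        System.out.println(contains("^Hello", "Say Hello"));      // false
        System.out.println(contains("world$", "Hello world"));    // true
        System.out.println(contains("\\bword\\b", "a word here")); // true
        System.out.println(contains("\\bword\\b", "sword"));      // false
        System.out.println(contains("\\bword\\b", "wording"));    // false
        System.out.println(contains("\\Bword\\B", "swords"));     // true
        System.out.println(contains("\\Bword\\B", "sword"));      // false: boundary after 'd'
    }
}
```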
7. Groups and Capturing
Parentheses `()` create capturing groups, allowing you to extract specific parts of the matched string. Groups are numbered from 1 in the order of their opening parentheses; group 0 refers to the entire match.
8. Backreferences
Backreferences allow you to refer to previously captured groups within the same regex. `\n`, where n is a group number, refers to the nth captured group: `\1` matches the same text that group 1 matched.
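A common use of backreferences is spotting accidentally repeated words. The following sketch assumes a simple definition of “word” (a run of `\w` characters); the class `BackreferenceDemo` and its methods are hypothetical names. In replacement strings Java refers to groups as `$1`, `$2`, and so on, rather than `\1`.

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // (\w+) captures a word; \1 requires the same word again after whitespace.
    private static final Pattern REPEATED_WORD =
            Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // True if the text contains a word immediately repeated, e.g. "is is".
    static boolean hasRepeatedWord(String text) {
        return REPEATED_WORD.matcher(text).find();
    }

    // Replaces each doubled word with a single copy of it.
    static String collapseRepeats(String text) {
        return REPEATED_WORD.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("this is is a test")); // true
        System.out.println(hasRepeatedWord("this is a test"));    // false
        System.out.println(collapseRepeats("this is is a test")); // this is a test
    }
}
```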
9. Lookarounds (Lookahead and Lookbehind)
Lookarounds assert conditions without including the matched portion in the result:
- Positive Lookahead `(?=...)`: Asserts that the following characters match the specified pattern.
- Negative Lookahead `(?!...)`: Asserts that the following characters do not match the specified pattern.
- Positive Lookbehind `(?<=...)`: Asserts that the preceding characters match the specified pattern.
- Negative Lookbehind `(?<!...)`: Asserts that the preceding characters do not match the specified pattern.
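To illustrate all four lookarounds, the sketch below collects every match of a pattern and shows that the asserted context is never part of the result. The class `LookaroundDemo`, the helper `findAll`, and the sample inputs are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Collects the text of every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        // Positive lookahead: word characters directly followed by '@'.
        System.out.println(findAll("\\w+(?=@)", "alice@example.com"));      // [alice]
        // Negative lookahead: whole numbers not followed by '%'.
        System.out.println(findAll("\\b\\d+\\b(?!%)", "50% of 200"));       // [200]
        // Positive lookbehind: digits directly preceded by '$'.
        System.out.println(findAll("(?<=\\$)\\d+", "Cost: $45, qty 3"));    // [45]
        // Negative lookbehind: numbers not preceded by '-'.
        System.out.println(findAll("(?<!-)\\b\\d+", "5 -3 8"));             // [5, 8]
    }
}
```

The `\b` anchors in the negative lookahead example matter: without them the engine could backtrack and match just “5” from “50%”, because “5” is followed by “0” rather than “%”.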
10. Flags
Flags modify the behavior of the regex engine. Common flags include:
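- `CASE_INSENSITIVE` (`i`): Performs case-insensitive matching.
- `MULTILINE` (`m`): `^` and `$` match at the beginning and end of each line, not only at the beginning and end of the input.
- `DOTALL` (`s`): `.` matches any character, including a newline.

Each flag is a constant on `Pattern` (for example, `Pattern.CASE_INSENSITIVE`) passed as the second argument to `Pattern.compile`. The letter in parentheses is the embedded form, written inside the pattern itself as `(?i)`, `(?m)`, or `(?s)`.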
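The effect of each flag shows up most clearly when the same pattern is run with and without it. This is a minimal sketch; `FlagsDemo`, `countMatches`, and the sample text are hypothetical names and data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {
    // Counts how many times the pattern matches within the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String lines = "alpha\nbeta\nalpha";

        // Without MULTILINE, ^ and $ refer to the whole input only.
        System.out.println(countMatches(Pattern.compile("^alpha$"), lines)); // 0
        System.out.println(countMatches(
                Pattern.compile("^alpha$", Pattern.MULTILINE), lines));      // 2

        // Flags can be combined with the bitwise OR operator.
        System.out.println(countMatches(Pattern.compile("^ALPHA$",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE), lines));      // 2

        // DOTALL lets '.' match the newline between 'a' and 'b'.
        System.out.println(Pattern.compile("a.b").matcher("a\nb").matches());                 // false
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL).matcher("a\nb").matches()); // true

        // Embedded flag: (?i) inside the pattern instead of a compile argument.
        System.out.println(Pattern.matches("(?i)hello", "HELLO"));           // true
    }
}
```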
11. Practical Examples
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExamples {
    public static void main(String[] args) {
        // Example 1: Validating an email address (a deliberately loose check)
        // Placeholder address: the sample value was obscured in the source text.
        String email = "user@example.com";
        String emailRegex = "^[A-Za-z0-9+_.-]+@(.+)$";
        Pattern pattern = Pattern.compile(emailRegex);
        Matcher matcher = pattern.matcher(email);
        // matches() requires the entire input to match the pattern
        System.out.println("Email is valid: " + matcher.matches());

        // Example 2: Extracting phone numbers from a string
        String text = "My phone numbers are 123-456-7890 and 987-654-3210.";
        String phoneRegex = "\\d{3}-\\d{3}-\\d{4}";
        pattern = Pattern.compile(phoneRegex);
        matcher = pattern.matcher(text);
        // find() advances to the next match on each call
        while (matcher.find()) {
            System.out.println("Phone number: " + matcher.group());
        }

        // Example 3: Replacing whitespace with underscores
        String stringWithSpaces = "This string has spaces.";
        String replacedString = stringWithSpaces.replaceAll("\\s+", "_");
        System.out.println("Replaced string: " + replacedString);

        // Example 4: Case-insensitive matching
        String textCase = "Hello World";
        String caseInsensitiveRegex = "hello";
        pattern = Pattern.compile(caseInsensitiveRegex, Pattern.CASE_INSENSITIVE);
        matcher = pattern.matcher(textCase);
        System.out.println("Case-insensitive match found: " + matcher.find());

        // Example 5: Using groups and capturing
        String textWithGroups = "John Doe (1985)";
        String groupRegex = "(\\w+) (\\w+) \\((\\d+)\\)";
        pattern = Pattern.compile(groupRegex);
        matcher = pattern.matcher(textWithGroups);
        if (matcher.find()) {
            System.out.println("First Name: " + matcher.group(1));
            System.out.println("Last Name: " + matcher.group(2));
            System.out.println("Year: " + matcher.group(3));
        }

        // Example 6: Lookahead assertion
        String textLookahead = "apple banana orange";
        // Matches "banana" only if followed by " orange"
        String lookaheadRegex = "banana(?= orange)";
        pattern = Pattern.compile(lookaheadRegex);
        matcher = pattern.matcher(textLookahead);
        System.out.println("Lookahead match found: " + matcher.find());
    }
}
```
This tutorial provides an overview of Java regex. By understanding the syntax, metacharacters, and classes provided by the `java.util.regex` package, you can use regular expressions to perform complex text processing tasks efficiently and effectively. Remember to practice and experiment with different patterns to solidify your understanding and discover new applications of this powerful tool. Online regex testers and debuggers can be invaluable resources for visualizing and refining your regex patterns. As you gain experience, you’ll find regular expressions to be an indispensable asset in your Java development toolkit.