Java Regex: A Comprehensive Guide to Syntax and Examples
Regular expressions (regex or regexp) are a powerful tool for matching and manipulating text. In Java, the `java.util.regex` package supports tasks such as validating, searching, replacing, and extracting specific parts of strings. This article covers Java regex syntax and the core API, with practical examples throughout.
1. Basic Syntax and Metacharacters:
Regular expressions are essentially patterns composed of literal characters and metacharacters, which have special meanings. Understanding these metacharacters is crucial for constructing effective regex patterns.
- Literal Characters: Most characters in a regex pattern match themselves literally. For example, the pattern `hello` matches the string “hello”.
- `.` (Dot): Matches any single character except a line terminator. `a.b` matches “aab”, “acb”, “a1b”, etc.
- `^` (Caret): Matches the beginning of a line or string. `^Hello` finds a match in “Hello World” but not in “World Hello”.
- `$` (Dollar): Matches the end of a line or string. `World$` finds a match in “Hello World” but not in “World Hello”.
- `*` (Asterisk): Matches zero or more occurrences of the preceding character or group. `a*` matches “”, “a”, “aa”, “aaa”, etc.
- `+` (Plus): Matches one or more occurrences of the preceding character or group. `a+` matches “a”, “aa”, “aaa”, etc., but not “”.
- `?` (Question Mark): Matches zero or one occurrence of the preceding character or group. `colou?r` matches both “color” and “colour”.
- `{n}` (Braces): Matches exactly n occurrences of the preceding character or group. `a{3}` matches “aaa”.
- `{n,}` (Braces): Matches n or more occurrences of the preceding character or group. `a{2,}` matches “aa”, “aaa”, “aaaa”, etc.
- `{n,m}` (Braces): Matches at least n and at most m occurrences of the preceding character or group. `a{2,4}` matches “aa”, “aaa”, and “aaaa”.
- `[]` (Character Class): Matches any single character within the brackets. `[abc]` matches “a”, “b”, or “c”.
- `[^]` (Negated Character Class): Matches any single character not within the brackets. `[^abc]` matches any character except “a”, “b”, or “c”.
- `-` (Hyphen): Inside a character class, specifies a range of characters. `[a-z]` matches any lowercase ASCII letter.
- `|` (Pipe): Acts as an OR operator. `cat|dog` matches either “cat” or “dog”.
- `()` (Parentheses): Creates a capturing group, allowing you to extract specific parts of the matched string. `(abc)` matches “abc” and captures it as a group.
- `\d` (Digit): Matches any digit (equivalent to `[0-9]`).
- `\D` (Non-Digit): Matches any non-digit character (equivalent to `[^0-9]`).
- `\s` (Whitespace): Matches any whitespace character (space, tab, newline, etc.).
- `\S` (Non-Whitespace): Matches any non-whitespace character.
- `\w` (Word Character): Matches any word character (letters, digits, and underscore).
- `\W` (Non-Word Character): Matches any non-word character.
- `\b` (Word Boundary): Matches a word boundary (the position between a word character and a non-word character).
- `\B` (Non-Word Boundary): Matches a position that is not a word boundary.
- Escape Sequences: Special characters such as `.`, `*`, `+`, `?`, `[`, `]`, `(`, `)`, `{`, `}`, `|`, `^`, `$`, and `\` can be matched literally by preceding them with a backslash (`\`). For example, `\.` matches a literal dot. In a Java string literal the backslash itself must be escaped, so `\.` is written as `"\\."` and `\d` as `"\\d"`.
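A few of the metacharacters above can be tried out with a short sketch like the one below. The class name `MetacharacterDemo` and the helper `fullMatch` are made up for illustration; the only library call is `Pattern.matches`, which tests whether the entire input matches the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class MetacharacterDemo {
    // Returns true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a.b", "acb"));       // true: dot matches any single character
        System.out.println(fullMatch("colou?r", "color")); // true: the 'u' is optional
        System.out.println(fullMatch("a{2,4}", "aaaaa"));  // false: five a's exceed the upper bound
        System.out.println(fullMatch("[^abc]", "d"));      // true: 'd' is not in the negated class
        System.out.println(fullMatch("cat|dog", "dog"));   // true: alternation
        System.out.println(fullMatch("\\d+", "2023"));     // true: \d is written "\\d" in Java source
        System.out.println(fullMatch("a\\.b", "acb"));     // false: the escaped dot matches only "."
    }
}
```

Each call compiles the pattern again, which is fine for a quick experiment; reusing a compiled `Pattern` is the usual choice when the same regex is applied many times.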
2. Java Regex API:
The `java.util.regex` package provides the following core classes:
- `Pattern`: Represents a compiled regular expression.
- `Matcher`: Performs matching operations on a string using a `Pattern`.
3. Creating a Pattern:
```java
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        // Matches one or more letters followed by one or more digits.
        // The backslash is doubled because this is a Java string literal.
        String regex = "[a-zA-Z]+\\d+";
        Pattern pattern = Pattern.compile(regex);
    }
}
```
4. Creating a Matcher:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// … (previous code) …
Matcher matcher = pattern.matcher("Hello123");
```
5. Matching Methods:
`matches()`: Returns true if the entire input string matches the pattern.

```java
boolean isMatch = matcher.matches(); // true for "Hello123", false for "123Hello"
```

`find()`: Attempts to find the next subsequence of the input sequence that matches the pattern. A matcher keeps its position between calls, so call `matcher.reset()` before `find()` if `matches()` or `lookingAt()` has already succeeded on the same matcher.

```java
while (matcher.find()) {
    System.out.println(matcher.group()); // Prints the matched subsequence
}
```

`lookingAt()`: Attempts to match the pattern starting at the beginning of the input sequence, without requiring the whole input to match.

```java
boolean isBeginningMatch = matcher.lookingAt(); // true for "Hello123", false for "123Hello"
```
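The difference between the three methods is easiest to see on an input that contains more than one match. The following is a minimal sketch; the class `MatchingMethodsDemo` and its helper names are assumptions for illustration, and each helper creates a fresh `Matcher` so no state carries over between calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing matches(), lookingAt(), and find().
class MatchingMethodsDemo {
    // Same pattern as above: letters followed by digits.
    private static final Pattern LETTERS_THEN_DIGITS = Pattern.compile("[a-zA-Z]+\\d+");

    // The entire input must match.
    static boolean wholeMatch(String input) {
        return LETTERS_THEN_DIGITS.matcher(input).matches();
    }

    // Only a prefix of the input must match.
    static boolean prefixMatch(String input) {
        return LETTERS_THEN_DIGITS.matcher(input).lookingAt();
    }

    // Counts every non-overlapping match anywhere in the input.
    static int countMatches(String input) {
        Matcher matcher = LETTERS_THEN_DIGITS.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Hello123 and World456";
        System.out.println(wholeMatch(input));   // false: the spaces and "and" are not covered
        System.out.println(prefixMatch(input));  // true: "Hello123" matches at the start
        System.out.println(countMatches(input)); // 2: "Hello123" and "World456"
    }
}
```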
6. Capturing Groups and Backreferences:
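Parentheses `()` create capturing groups. You can access the captured groups using `group(int groupNum)`: `group(0)` returns the entire matched sequence, `group(1)` returns the first captured group, and so on. Inside the pattern itself, a backreference such as `\1` matches the same text that group 1 captured.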
```java
// Matches three hyphen-separated numbers, such as a date in YYYY-MM-DD format
String regex = "(\\d+)-(\\d+)-(\\d+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("2023-10-27");
if (matcher.matches()) {
    System.out.println("Year: " + matcher.group(1));  // Prints "Year: 2023"
    System.out.println("Month: " + matcher.group(2)); // Prints "Month: 10"
    System.out.println("Day: " + matcher.group(3));   // Prints "Day: 27"
}

String text = "Hello Hello";
// \1 refers back to group 1, so this matches a word repeated after a space
String regexBackreference = "(\\w+) \\1";
Pattern patternBackreference = Pattern.compile(regexBackreference);
Matcher matcherBackreference = patternBackreference.matcher(text);
if (matcherBackreference.find()) {
    System.out.println(matcherBackreference.group()); // Prints "Hello Hello"
}
```
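As a hedged sketch of how groups and backreferences might be wrapped in reusable helpers, consider the class below. `GroupDemo`, `parseDate`, and `firstRepeatedWord` are hypothetical names. The patterns are also slightly stricter than the ones above, which is a choice made for this example: the date pattern fixes the digit counts, and the repeated-word pattern adds `\b` anchors so that the “is” inside “This” is not treated as the start of a repeated word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers built on capturing groups and a backreference.
class GroupDemo {
    // Stricter variant of the date pattern: 4-digit year, 2-digit month and day.
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
    // \b anchors keep the match aligned to whole words.
    private static final Pattern REPEATED_WORD = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns {year, month, day}, or null if the input is not in YYYY-MM-DD form.
    // Only the shape is checked; a value such as month 13 is not rejected.
    static int[] parseDate(String input) {
        Matcher matcher = DATE.matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(matcher.group(1)),
            Integer.parseInt(matcher.group(2)),
            Integer.parseInt(matcher.group(3))
        };
    }

    // Returns the first word that appears twice in a row, or null if there is none.
    static String firstRepeatedWord(String text) {
        Matcher matcher = REPEATED_WORD.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        int[] date = parseDate("2023-10-27");
        System.out.println(date[0] + " " + date[1] + " " + date[2]); // 2023 10 27
        System.out.println(firstRepeatedWord("This is is a test"));  // is
    }
}
```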
7. Replacing and Splitting:
`replaceAll(String replacement)`: Replaces every match of the pattern with the specified replacement string.

```java
String replacedString = matcher.replaceAll("XXX");
```

`replaceFirst(String replacement)`: Replaces the first match of the pattern with the specified replacement string.

```java
String replacedFirst = matcher.replaceFirst("YYY");
```

`split(String regex)`: A `String` method that splits the string around matches of the given regular expression.

```java
String text = "apple,banana,orange";
String[] fruits = text.split(","); // ["apple", "banana", "orange"]
```
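The snippets above can be combined into one runnable sketch. The class `ReplaceSplitDemo`, its method names, and the sample strings are assumptions for illustration. The last helper also shows that a replacement string can refer to captured groups with `$1`, `$2`, and so on.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo of replaceAll, replaceFirst, and split.
class ReplaceSplitDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Replaces every run of digits.
    static String maskAll(String text) {
        return DIGITS.matcher(text).replaceAll("XXX");
    }

    // Replaces only the first run of digits.
    static String maskFirst(String text) {
        return DIGITS.matcher(text).replaceFirst("YYY");
    }

    // Splits on commas, tolerating whitespace around each comma.
    static String[] splitList(String text) {
        return text.split("\\s*,\\s*");
    }

    // Reorders YYYY-MM-DD into DD/MM/YYYY using group references in the replacement.
    static String reorderDate(String date) {
        return date.replaceAll("(\\d+)-(\\d+)-(\\d+)", "$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(maskAll("Order 12 has 345 items"));   // Order XXX has XXX items
        System.out.println(maskFirst("Order 12 has 345 items")); // Order YYY has 345 items
        System.out.println(Arrays.toString(splitList("apple, banana ,orange"))); // [apple, banana, orange]
        System.out.println(reorderDate("2023-10-27"));           // 27/10/2023
    }
}
```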
8. Flags:
Flags modify the behavior of the regex engine. Common flags include:
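- `CASE_INSENSITIVE` (`i`): Performs case-insensitive matching.
- `MULTILINE` (`m`): Makes `^` and `$` match at the beginning and end of each line, not only at the beginning and end of the entire string.
- `DOTALL` (`s`): Allows `.` to match any character, including newline.

Flags are constants on the `Pattern` class and are passed as the second argument to `Pattern.compile`. The letter in parentheses is the embedded form that can be written inside the pattern itself, for example `(?i)`.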
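To make the effect of each flag concrete, here is a small sketch. The class `FlagsDemo`, the helper names, and the sample patterns (`hello`, `^ERROR`, `BEGIN.*END`) are all invented for this example; the flag constants themselves are the ones defined on `Pattern`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of the three flags discussed above.
class FlagsDemo {
    // CASE_INSENSITIVE: "hello" also matches "HELLO" or "Hello".
    static boolean containsHello(String text) {
        return Pattern.compile("hello", Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    // MULTILINE: ^ matches at the start of every line, so each line is checked.
    static int countErrorLines(String log) {
        Matcher matcher = Pattern.compile("^ERROR", Pattern.MULTILINE).matcher(log);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // DOTALL: the dot also matches newlines, so the match can span lines.
    static boolean spansLines(String text) {
        return Pattern.compile("BEGIN.*END", Pattern.DOTALL).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsHello("Say HELLO"));                       // true
        System.out.println(countErrorLines("ERROR a\nINFO b\nERROR c"));      // 2
        System.out.println(spansLines("BEGIN\nmiddle\nEND"));                 // true
    }
}
```

Several flags can be combined with the bitwise OR operator, as in `Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE)`.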
9. Examples:
- Validating an Email Address:
```java
String emailRegex = "^[A-Za-z0-9+_.-]+@(.+)$";
```
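One way this pattern might be used is sketched below. The class `EmailCheck` and its methods are hypothetical. Note that the pattern is deliberately permissive: it only checks for an allowed local part, an `@`, and at least one character after it, so it is a plausibility check, not full address validation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the email pattern shown above.
class EmailCheck {
    private static final Pattern EMAIL = Pattern.compile("^[A-Za-z0-9+_.-]+@(.+)$");

    // True if the input has the rough shape local-part@domain.
    static boolean looksLikeEmail(String input) {
        return input != null && EMAIL.matcher(input).matches();
    }

    // Returns the text after the '@' (captured by group 1), or null if the input does not match.
    static String domainOf(String input) {
        Matcher matcher = EMAIL.matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com"));     // true
        System.out.println(looksLikeEmail("no-at-sign"));           // false
        System.out.println(domainOf("user.name+tag@example.com"));  // example.com
    }
}
```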
- Extracting Numbers from a String:
```java
String text = "There are 10 apples and 20 oranges.";
String numberRegex = "\\d+"; // One or more digits
Pattern pattern = Pattern.compile(numberRegex);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(matcher.group()); // Prints "10", then "20"
}
```
- Collapsing Extra Whitespace in a String:

```java
String textWithWhitespace = " This string has extra whitespace. ";
// Replace each run of whitespace with a single space, then trim the ends
String trimmedString = textWithWhitespace.replaceAll("\\s+", " ").trim();
```
- Validating a Phone Number:
```java
String phoneRegex = "^\\(?(\\d{3})\\)?[- ]?(\\d{3})[- ]?(\\d{4})$";
```
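Because the phone pattern captures the three digit groups, it can also be used to normalize accepted numbers into one format. The sketch below assumes a hypothetical class `PhoneCheck` and a `normalize` helper; the `555-123-4567` output format is an arbitrary choice for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the phone pattern shown above.
class PhoneCheck {
    private static final Pattern PHONE =
            Pattern.compile("^\\(?(\\d{3})\\)?[- ]?(\\d{3})[- ]?(\\d{4})$");

    // Returns the number as AAA-BBB-CCCC, or null if the input does not match.
    static String normalize(String input) {
        Matcher matcher = PHONE.matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        return matcher.group(1) + "-" + matcher.group(2) + "-" + matcher.group(3);
    }

    public static void main(String[] args) {
        System.out.println(normalize("(555) 123-4567")); // 555-123-4567
        System.out.println(normalize("5551234567"));     // 555-123-4567
        System.out.println(normalize("555-1234"));       // null
    }
}
```

The parentheses around the area code are each optional on their own, so an input with only one of them, such as “(555 123-4567”, is also accepted. A stricter pattern would be needed if balanced parentheses matter.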
This guide provides a solid foundation for understanding and using Java’s regular expression capabilities. Consult the official Java documentation for the `java.util.regex` package for further details, and use an online regex tester to check and refine your patterns. With practice, you can use regular expressions to solve a wide range of text processing problems in your Java applications.