Table of Contents
- Basics of Regular Expressions in Bash
- Tools Supporting Regex in Bash
- Advanced Regex Patterns
- Practical Examples
- Tips and Best Practices
- Common Pitfalls
- Conclusion
- References
Basics of Regular Expressions in Bash
Regex in Bash is built on POSIX standards, with two main flavors:
- Basic Regular Expressions (BRE): The default syntax of `grep` and `sed`. Grouping and interval braces must be escaped (`\(`, `\)`, `\{`, `\}`) to act as metacharacters. `+`, `?` and `|` are not special in POSIX BRE. GNU tools accept them as the extensions `\+`, `\?`, `\|`.
- Extended Regular Expressions (ERE): Used by `grep -E`, `sed -E`, and Bash's `[[ ... =~ ... ]]` conditional construct. Metacharacters work without escaping.
Below are fundamental regex components with examples:
1. Literals
Match exact characters.
- Example: `hello` matches the text “hello” wherever it appears in a line (e.g., in “say hello”).
2. Metacharacters
Special characters with reserved meanings:
| Metacharacter | Description | Example | Matches |
|---|---|---|---|
| `.` | Matches any single character (except newline) | `h.t` | “hat”, “hot”, “h1t”, “h@t” |
| `*` | Matches 0 or more occurrences of the preceding element | `ho*` | “h”, “ho”, “hoo”, “hooo” |
| `+` | Matches 1 or more occurrences (ERE; escaped as `\+` in GNU BRE) | `ho+` (with `grep -E`) | “ho”, “hoo”, “hooo” (not “h”) |
| `?` | Matches 0 or 1 occurrence (ERE; escaped as `\?` in GNU BRE) | `colou?r` | “color”, “colour” |
| `^` | Anchors the match to the start of a line | `^Error` | Lines starting with “Error” |
| `$` | Anchors the match to the end of a line | `done$` | Lines ending with “done” |
| `[ ]` | Character class: matches any one character inside | `[Hh]ello` | “Hello”, “hello” |
| `[^ ]` | Negated character class: matches any one character not inside | `[^0-9]` | Any non-digit character |
| `\|` | Alternation (OR logic; ERE, escaped with a backslash in GNU BRE) | `cat\|dog` | “cat” or “dog” |
| `( )` | Grouping (ERE; written `\( \)` in BRE) | `(hello)+` | “hello”, “hellohello” |
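The table can be checked by piping a few sample words through `grep -E` (ERE) and plain `grep` (BRE). The sketch below uses a made-up word list. The BRE line assumes GNU grep, because `\+` is a GNU extension.

```bash
# Made-up sample input, one word per line
words=$'h\nho\nhoo\nhat\nhot\ncolor\ncolour\ncat\ndog'

# . matches any single character -> hat, hot
printf '%s\n' "$words" | grep -E '^h.t$'

# + (ERE) means one or more -> ho, hoo (but not h)
printf '%s\n' "$words" | grep -E '^ho+$'

# The same pattern in GNU BRE, where + must be written \+
printf '%s\n' "$words" | grep '^ho\+$'

# ? makes the u optional -> color, colour
printf '%s\n' "$words" | grep -E '^colou?r$'

# Grouping plus alternation -> cat, dog
printf '%s\n' "$words" | grep -E '^(cat|dog)$'
```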
3. Character Classes
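Predefined classes for common patterns (POSIX syntax). A class is only valid inside a bracket expression, so you write `[[:digit:]]`, not `[:digit:]`. The equivalents below hold in the C/POSIX locale. In other locales, classes such as `[:alpha:]` may also match accented letters. The escapes in the `[:space:]` row are for readability only, because POSIX bracket expressions do not interpret `\t` or `\n`.
| Class | Description | Equivalent Regex |
|---|---|---|
| `[:alpha:]` | Letters (a-z, A-Z) | `[a-zA-Z]` |
| `[:digit:]` | Digits (0-9) | `[0-9]` |
| `[:alnum:]` | Alphanumeric (letters + digits) | `[a-zA-Z0-9]` |
| `[:space:]` | Whitespace (space, tab, newline, carriage return, form feed, vertical tab) | `[ \t\n\r\f\v]` |
| `[:lower:]` | Lowercase letters | `[a-z]` |
| `[:upper:]` | Uppercase letters | `[A-Z]` |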
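A short sketch of the classes in use, on made-up input. The same bracketed form works in `grep` and in Bash's `[[ =~ ]]`. The "identifier" rule in the second half is a hypothetical example pattern, not a standard.

```bash
# Classes go inside a bracket expression: [[:digit:]]
printf '%s\n' abc a1c ABC | grep '[[:digit:]]'        # a1c
printf '%s\n' abc a1c ABC | grep -E '^[[:upper:]]+$'  # ABC

# Hypothetical identifier rule: a letter or _, then letters, digits or _
id="user_42"
if [[ $id =~ ^[[:alpha:]_][[:alnum:]_]*$ ]]; then
  echo "$id looks like an identifier"
fi
```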
Tools Supporting Regex in Bash
Bash scripting relies on several tools to implement regex. Here’s how to use the most common ones:
1. grep: Search for Patterns in Text
grep searches input files or stdin for lines matching a regex pattern.
- Basic grep (BRE): Uses the default syntax. With GNU grep, escape `+`, `?`, `|` with `\`.
```bash
# Match lines with "error" or "warning" (GNU BRE: escape |)
grep 'error\|warning' app.log
```
- Extended grep (ERE): Use the `-E` flag for unescaped metacharacters.
```bash
# Match lines with "error" or "warning" (ERE: no escape needed for |)
grep -E 'error|warning' app.log
```
- Inverse match (`-v`): Exclude lines matching the pattern.
```bash
# Show lines NOT containing "debug"
grep -v 'debug' app.log
```
- Count matches (`-c`): Counts matching lines, not individual matches.
```bash
# Count lines with "error"
grep -c 'error' app.log
```
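The sketch below exercises these flags against a throwaway log file. The log lines are made up. It also adds `-i` (ignore case), which is useful because patterns are case-sensitive by default.

```bash
# Throwaway log with made-up contents
log=$(mktemp)
printf '%s\n' 'INFO start' 'ERROR disk full' 'debug cache miss' \
  'WARNING low memory' 'error retrying' > "$log"

# Case-sensitive ERE alternation: only "error retrying" matches
grep -E 'error|warning' "$log"

# -i ignores case, so the ERROR and WARNING lines match too
grep -iE 'error|warning' "$log"

# -v inverts the match; -c prints a count instead of the lines
no_debug=$(grep -cv 'debug' "$log")          # 4
problems=$(grep -ciE 'error|warning' "$log") # 3
echo "non-debug lines: $no_debug, error/warning lines: $problems"

rm -f "$log"
```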
2. sed: Stream Editor for Text Transformation
sed uses regex to edit text streams (substitute, delete, insert).
- Substitute (`s/pattern/replacement/`):
```bash
# Replace "old" with "new" in file.txt (ERE with -E); output goes to stdout
# g flag: replace every occurrence on a line, not just the first
sed -E 's/old/new/g' file.txt
```
- Delete lines (`d`):
```bash
# Delete lines containing "temp"
sed '/temp/d' file.txt
```
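To see what the `g` flag changes, the same substitution can be run with and without it on a sample line. The second part shows `d` dropping lines while the others pass through. The input strings are illustrative only.

```bash
# Without g, only the first match on each line is replaced
echo 'old old old' | sed 's/old/new/'     # new old old
echo 'old old old' | sed 's/old/new/g'    # new new new

# /pattern/d deletes matching lines; the rest are printed unchanged
printf '%s\n' keep temp.txt also-keep | sed '/temp/d'   # keep, also-keep
```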
3. awk: Pattern-Driven Text Processing
awk uses regex patterns to trigger actions (print, calculate, etc.).
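- Basic pattern-action:
```bash
# Print fields 1 and 2 of rows whose 3rd field is exactly "active"
# -F ',': set field separator to comma (simple CSV without quoted commas)
# ~: regex match operator; ^ and $ keep "inactive" from matching
awk -F ',' '$3 ~ /^active$/ {print $1, $2}' data.csv
```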
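`~` succeeds if the pattern matches anywhere in the field. That matters for values like "inactive". The sketch uses a small hypothetical CSV (name, email, status) to compare an unanchored pattern, an anchored one, and the negated operator `!~`.

```bash
# Hypothetical CSV: name,email,status
csv=$'alice,alice@example.com,active\nbob,bob@example.com,inactive\ncarol,carol@example.com,active'

# Unanchored: "inactive" contains "active", so bob is printed too
printf '%s\n' "$csv" | awk -F ',' '$3 ~ /active/ {print $1}'

# Anchored: only an exact "active" field matches (alice, carol)
printf '%s\n' "$csv" | awk -F ',' '$3 ~ /^active$/ {print $1}'

# !~ selects rows whose field does NOT match (bob)
printf '%s\n' "$csv" | awk -F ',' '$3 !~ /^active$/ {print $1}'
```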
4. Bash Conditional [[ ... =~ ... ]]
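Check whether a string matches a regex with the `=~` operator of Bash's `[[ ]]` construct (ERE syntax). Leave the pattern unquoted or store it in a variable, because quoted parts of the right-hand side are matched as literal text.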
```bash
str="hello123"
if [[ $str =~ ^[a-z]+[0-9]+$ ]]; then
  echo "String matches: letters followed by numbers"
fi
# Output: "String matches: letters followed by numbers"
```
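After a successful `=~` match, Bash stores the captured groups in the `BASH_REMATCH` array. The sketch below keeps the pattern in a variable (here named `re`, an arbitrary choice) and splits the same string into its letter and digit parts.

```bash
# Keeping the regex in a variable avoids quoting surprises after =~
re='^([a-z]+)([0-9]+)$'
str="hello123"

if [[ $str =~ $re ]]; then
  # BASH_REMATCH[0] is the whole match; [1], [2], ... are the groups
  letters=${BASH_REMATCH[1]}
  digits=${BASH_REMATCH[2]}
  echo "letters=$letters digits=$digits"   # letters=hello digits=123
fi
```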
Advanced Regex Patterns
1. Quantifiers
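Specify the number of occurrences with `{n}`, `{n,}`, `{n,m}` (ERE; in BRE, escape the braces as `\{n,m\}`):
- `{n}`: Exactly n times. Example: `[0-9]{3}` matches three consecutive digits (e.g., “123”). Shorthands like `\d` are PCRE syntax, so ERE needs `[0-9]` or `[[:digit:]]`.
- `{n,}`: At least n times. Example: `a{2,}` matches “aa”, “aaa”, etc.
- `{n,m}`: Between n and m times. Example: `[0-9]{1,3}` matches 1 to 3 digits (e.g., “5”, “42”, “123”).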
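With `grep -Eo`, each match is printed on its own line, which makes quantifier behavior easy to inspect. The sample text is made up. The first command shows that an interval without boundaries splits a longer number into several matches.

```bash
text='id 5, code 42, zip 12345'

# {1,3}: runs of 1 to 3 digits; 12345 is split into 123 and 45
echo "$text" | grep -Eo '[0-9]{1,3}'

# {2,}: two or more digits -> 42 and 12345
echo "$text" | grep -Eo '[0-9]{2,}'

# BRE spelling of an interval: the braces are escaped -> 12345
echo "$text" | grep -o '[0-9]\{5\}'
```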
2. Word Boundaries
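`\b` matches the position between a word character (`[a-zA-Z0-9_]`) and a non-word character. It is not part of POSIX regex. PCRE (`grep -P`) supports it, and so do GNU `grep` and GNU `sed` as an extension. GNU `awk` uses `\y` for the same purpose.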
Example:
```bash
# Match "cat" as a standalone word (not "category" or "scat")
grep -P '\bcat\b' file.txt
```
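If `grep -P` is unavailable, two other whole-word options exist. `grep -w` keeps lines where the pattern appears as a whole word. GNU grep also accepts `\<` and `\>` (start and end of word). The sketch assumes GNU grep and uses made-up lines.

```bash
sample=$'cat\nthe cat sat\ncategory\nscat\nbobcat!'

# -w: "cat" must be a whole word -> "cat", "the cat sat"
printf '%s\n' "$sample" | grep -w 'cat'

# GNU \< and \> anchor to word start and end -> same two lines
printf '%s\n' "$sample" | grep '\<cat\>'
```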
3. Backreferences
Refer to previously matched groups with \n (n = group number). Useful for repeated patterns.
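Example with sed (BRE: use `\( \)` for groups; `+` is literal in POSIX BRE, so “one or more letters” is written `[A-Za-z][A-Za-z]*`):
```bash
# Swap "Last, First" into "First Last" (e.g., "Doe, John" → "John Doe")
sed 's/\([A-Za-z][A-Za-z]*\), \([A-Za-z][A-Za-z]*\)/\2 \1/' names.txt
# ERE equivalent: unescaped groups and +
sed -E 's/([A-Za-z]+), ([A-Za-z]+)/\2 \1/' names.txt
```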
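Backreferences also work inside the search pattern, not only in the replacement. The sketch below is a rough doubled-word check on made-up lines. `\( \)` captures a run of letters and `\1` requires the same run again after a space. Without word boundaries it can also fire across word endings (e.g., “this is” matches on “is is”), so treat it as an illustration.

```bash
# POSIX BRE backreference in the pattern itself -> "the the cat", "see you you soon"
printf '%s\n' 'the the cat' 'one two three' 'see you you soon' |
  grep '\([[:alpha:]][[:alpha:]]*\) \1'
```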
4. Negation
[^...] matches any character not in the set.
Example:
```bash
# Match non-empty lines that contain no digits
grep -E '^[^0-9]+$' file.txt
```
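`^[^0-9]+$` has one subtlety: `+` requires at least one character, so empty lines never match. The sketch compares it with `grep -v '[0-9]'` on made-up input. The `-v` form keeps every line without a digit, including empty ones.

```bash
lines=$'abc\n\n123\nx1'

# Needs at least one non-digit character: only "abc" -> count 1
printf '%s\n' "$lines" | grep -cE '^[^0-9]+$'

# Drops lines that contain a digit: "abc" and the empty line -> count 2
printf '%s\n' "$lines" | grep -cv '[0-9]'
```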
Practical Examples
Example 1: Validate Email Addresses
Check if a string is a basic email (simplified regex):
```bash
email="user@example.com"
if [[ $email =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
  echo "Valid email: $email"
else
  echo "Invalid email: $email"
fi
```
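For reuse across a script, the same pattern can be wrapped in a function that returns the match status. `is_email` is a hypothetical name. This is still the simplified check from above, not full RFC 5322 validation.

```bash
# Hypothetical helper: exit status 0 for a plausible address, 1 otherwise
is_email() {
  local re='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
  [[ $1 =~ $re ]]
}

for candidate in user@example.com first.last+tag@mail.example.org not-an-email 'a@b.c'; do
  if is_email "$candidate"; then
    echo "valid:   $candidate"
  else
    echo "invalid: $candidate"   # a@b.c fails: the TLD needs 2+ letters
  fi
done
```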
Example 2: Extract IP Addresses from Logs
Use grep -Eo to extract IPs (e.g., from Apache logs):
```bash
# Extract IPv4-looking addresses (simplified: also accepts octets above 255)
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' access.log
```
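A regex alone cannot easily enforce the 0-255 range. One common approach is to check the shape with a regex and the numeric range in arithmetic. The sketch below does this with a hypothetical `is_ipv4` function that reads the four octets from `BASH_REMATCH`.

```bash
# Hypothetical helper: regex checks the dotted-quad shape,
# then Bash arithmetic checks that each octet is at most 255
is_ipv4() {
  local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
  [[ $1 =~ $re ]] || return 1
  local octet
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10, so an octet like 08 is not parsed as octal
    (( 10#$octet <= 255 )) || return 1
  done
  return 0
}

for ip in 192.168.1.10 10.0.0.255 999.1.1.1 1.2.3; do
  if is_ipv4 "$ip"; then echo "ok:  $ip"; else echo "bad: $ip"; fi
done
```

To filter extracted addresses, pipe the `grep -Eo` output into `while read -r ip; do is_ipv4 "$ip" && echo "$ip"; done`.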
Example 3: Count Failed SSH Logins
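Parse `/var/log/auth.log` to count failed SSH attempts. This is the Debian/Ubuntu location; RHEL-family systems use `/var/log/secure`. Reading either file usually requires root.
```bash
# Count lines with "Failed password"
grep -c 'Failed password' /var/log/auth.log
```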
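A per-address breakdown is often more useful than a single total. The sketch runs on a hypothetical excerpt that assumes the usual OpenSSH "Failed password ... from <ip>" line format (addresses from the reserved documentation ranges). It extracts the source IP of each failure and tallies them with `sort | uniq -c`.

```bash
# Hypothetical auth.log excerpt (OpenSSH-style lines, made-up data)
sample_log='Jan 10 10:00:01 host sshd[101]: Failed password for root from 203.0.113.5 port 52100 ssh2
Jan 10 10:00:05 host sshd[102]: Failed password for invalid user admin from 198.51.100.7 port 40022 ssh2
Jan 10 10:00:09 host sshd[103]: Failed password for root from 203.0.113.5 port 52101 ssh2
Jan 10 10:01:00 host sshd[104]: Accepted password for alice from 192.0.2.10 port 50000 ssh2'

# Keep failures, pull out "from <ip>", keep the IP, then count per IP
per_ip=$(printf '%s\n' "$sample_log" |
  grep 'Failed password' |
  grep -Eo 'from ([0-9]{1,3}\.){3}[0-9]{1,3}' |
  awk '{print $2}' |
  sort | uniq -c | sort -rn)

echo "$per_ip"
```

On a real system, replace the `printf` stage with `grep 'Failed password' /var/log/auth.log`.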
Example 4: Replace Text in Multiple Files
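Use `sed -i` (in-place editing) with `find` to replace text across files. The form below is for GNU sed. BSD/macOS sed requires a backup-suffix argument after `-i` (e.g., `sed -i '' ...`).
```bash
# Replace "old-url" with "new-url" in all .html files
find . -name "*.html" -exec sed -i 's/old-url/new-url/g' {} +
```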
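One way to write a command that runs on both GNU and BSD sed is to attach a backup suffix directly to the option (`-i.bak`). Both implementations accept that form, and it leaves a copy of each original file. The sketch works in a temporary directory with made-up files, then deletes the backups.

```bash
# Throwaway directory with two sample files
dir=$(mktemp -d)
printf '<a href="old-url">x</a>\n' > "$dir/a.html"
printf 'see old-url and old-url\n' > "$dir/b.html"

# -i.bak edits in place and keeps each original as <file>.bak
find "$dir" -name '*.html' -exec sed -i.bak 's/old-url/new-url/g' {} +

cat "$dir/a.html" "$dir/b.html"

# Remove the backups once the result looks right
find "$dir" -name '*.html.bak' -exec rm -f {} +
```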
Example 5: Check if a String Is a Number
Validate that a string is an integer (positive/negative):
```bash
num="-123"
if [[ $num =~ ^-?[0-9]+$ ]]; then
  echo "$num is a valid integer"
else
  echo "$num is not an integer"
fi
```
Tips and Best Practices
- Escape Metacharacters When Needed: In BRE (e.g., plain `grep`), `(`, `)`, `{`, `}` need a `\` to act as metacharacters, and GNU tools accept `\+`, `\?`, `\|` for the ERE operators. Use ERE (`grep -E`, `sed -E`) for cleaner syntax.
- Use Single Quotes for Regex: Prevent the shell from expanding metacharacters like `*`, `$`, or `[...]`:
```bash
grep '^[0-9]' file.txt   # Good (single quotes)
grep ^[0-9] file.txt     # Bad (the shell may expand [0-9] as a glob)
```
- Test Regex with `grep -Eo` or Online Tools: Validate patterns incrementally. Many online testers default to PCRE or JavaScript syntax, which accept shortcuts such as `\d` that `grep -E` does not support.
```bash
echo "test123" | grep -E '^[a-z]+[0-9]+$'   # Test in isolation
```
- Prefer Specific Tools:
  - Use `grep` for simple searches.
  - Use `sed` for substitutions/deletions.
  - Use `awk` for complex field-based logic.
Common Pitfalls
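- Confusing BRE and ERE: Forgetting `grep -E` or `sed -E` leads to unexpected behavior (e.g., in BRE, `+` matches a literal plus sign instead of “one or more”).
- Overlooking Case Sensitivity: By default, regex is case-sensitive. For case-insensitive matches, use `grep -i` or the `I` flag of GNU sed (`s/pattern/replacement/I`). Note that `sed -i` means in-place editing, not ignore-case. Other options are an explicit class such as `[Hh]ello`, or `shopt -s nocasematch` before `[[ ... =~ ... ]]`.
- Backreferences in `[[ ]]`: Bash's `[[ ... =~ ... ]]` only tests for a match. It has no replacement syntax, and backreferences inside the pattern depend on the system's regex library, so they are not portable. Read captured groups from the `BASH_REMATCH` array, and use `sed` or `awk` for substitutions.
- Greedy Quantifiers: `.*` matches as much as possible. Use `.*?` (non-greedy) with PCRE (`grep -P`) for minimal matches, or use a negated class such as `[^"]*` in BRE/ERE.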
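The sketch below shows working fixes for the case-sensitivity and greediness pitfalls on sample strings. The `I` flag in the `sed` line assumes GNU sed. `nocasematch` is turned back off immediately so it does not affect later tests.

```bash
# grep: -i ignores case
echo 'Error: disk' | grep -i 'error'

# GNU sed: the I flag (not -i, which edits files in place)
echo 'Error: disk' | sed 's/error/WARN/I'        # WARN: disk

# Bash: nocasematch makes =~ case-insensitive until it is unset
shopt -s nocasematch
[[ "HELLO" =~ ^hello$ ]] && echo "case-insensitive match"
shopt -u nocasematch

# Greedy .* runs to the last quote; a negated class stops at the first
echo 'say "a" and "b"' | grep -o '".*"'       # "a" and "b"
echo 'say "a" and "b"' | grep -o '"[^"]*"'    # "a", then "b"
```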
Conclusion
Regular expressions are a cornerstone of text processing in Bash scripting. By mastering regex, you can automate complex tasks like log analysis, data validation, and text transformation with precision. Start small—experiment with grep and sed—and gradually tackle advanced patterns. With practice, regex will become an indispensable tool in your Bash toolkit.