funwithlinux guide

A Guide to Using Regular Expressions in Bash Scripts

Regular expressions (regex) are powerful patterns used to match, search, and manipulate text. In Bash scripting, regex enables you to handle complex text-processing tasks—from validating input to parsing logs, extracting data, and automating repetitive edits. Whether you’re a system administrator, developer, or DevOps engineer, mastering regex in Bash can significantly boost your productivity. This guide will walk you through the fundamentals of regex in Bash, explore tools that support regex, dive into advanced patterns, and provide practical examples to help you apply these concepts in real-world scenarios. By the end, you’ll be equipped to write robust Bash scripts that leverage regex for efficient text manipulation.

Table of Contents

  1. Basics of Regular Expressions in Bash
  2. Tools Supporting Regex in Bash
  3. Advanced Regex Patterns
  4. Practical Examples
  5. Tips and Best Practices
  6. Common Pitfalls
  7. Conclusion
  8. References

Basics of Regular Expressions in Bash

Regex in Bash is built on POSIX standards, with two main flavors:

  • Basic Regular Expressions (BRE): The default for grep and sed. In BRE, +, ?, and | are literal characters; GNU grep and GNU sed accept \+, \?, and \| as extensions. Grouping and intervals are written \( \) and \{ \}.
  • Extended Regular Expressions (ERE): Supported by grep -E, sed -E, awk, and Bash’s [[ ... =~ ... ]] conditional construct. The metacharacters +, ?, |, ( ), and { } work without escaping.
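
The difference between the two flavors is easiest to see by running the same pattern both ways. The following is a minimal sketch, assuming GNU grep (the \? spelling is a GNU extension) and a made-up three-line sample:

```shell
# Made-up sample input, one word per line
sample='color
colour
colr'

# BRE (default grep): "zero or one" is spelled \? in GNU grep
bre_matches=$(printf '%s\n' "$sample" | grep 'colou\?r')

# ERE (grep -E): ? works without a backslash
ere_matches=$(printf '%s\n' "$sample" | grep -E 'colou?r')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre_matches" "$ere_matches"
```

Both commands select "color" and "colour" and skip "colr"; only the spelling of the quantifier changes.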

Below are fundamental regex components with examples:

1. Literals

Match exact characters.

  • Example: hello matches the string “hello”.

2. Metacharacters

Special characters with reserved meanings:

Metacharacter  Description                                                 Example               Matches
.              Matches any single character (except newline)               h.t                   “hat”, “hot”, “h1t”, “h@t”
*              Matches 0+ occurrences of the preceding element             ho*                   “h”, “ho”, “hoo”, “hooo”
+              Matches 1+ occurrences (ERE only)                           ho+ (with grep -E)    “ho”, “hoo”, “hooo” (not “h”)
?              Matches 0 or 1 occurrence (ERE only)                        colou?r               “color”, “colour”
^              Anchors match to the start of a line                        ^Error                Lines starting with “Error”
$              Anchors match to the end of a line                          done$                 Lines ending with “done”
[ ]            Character class: matches any one character inside           [Hh]ello              “Hello”, “hello”
[^ ]           Negated character class: matches any character not inside   [^0-9]                Any non-digit character
|              Alternation (OR logic, ERE only)                            error|warning         “error”, “warning”
( )            Grouping (ERE only)                                         (hello)+              “hello”, “hellohello”
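
A few of these metacharacters can be tried directly against a small sample. This is only a sketch: the four input lines are invented for illustration, and grep -c is used so that each pattern reports how many lines it matched.

```shell
# Invented sample lines
lines='Error: disk full
warning: retry
all done
Error: done'

# ^ anchors to the start of the line
starts=$(printf '%s\n' "$lines" | grep -c '^Error')

# $ anchors to the end of the line
ends=$(printf '%s\n' "$lines" | grep -c 'done$')

# | is alternation and needs ERE (-E)
either=$(printf '%s\n' "$lines" | grep -Ec 'warning|full')

# . stands for exactly one character, so "heat" is not matched
dot=$(printf 'hat\nhot\nheat\n' | grep -c '^h.t$')

echo "starts=$starts ends=$ends either=$either dot=$dot"
```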

3. Character Classes

Predefined classes for common patterns (POSIX syntax):

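Class        Description                        Equivalent Regex
[:alpha:]    Letters (a-z, A-Z)                 [a-zA-Z]
[:digit:]    Digits (0-9)                       [0-9]
[:alnum:]    Alphanumeric (letters + digits)    [a-zA-Z0-9]
[:space:]    Whitespace (space, tab, newline)   [ \t\n\r\f\v]
[:lower:]    Lowercase letters                  [a-z]
[:upper:]    Uppercase letters                  [A-Z]

These class names are only valid inside a bracket expression, so a pattern uses [[:digit:]], not [:digit:] on its own.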
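
The classes are written inside an outer pair of brackets when used in a pattern. A short sketch with an invented three-line input:

```shell
# Invented sample input
input='abc 123
ABC
42'

# Lines made up entirely of digits
digits_only=$(printf '%s\n' "$input" | grep -E '^[[:digit:]]+$')

# Lines containing at least one uppercase letter
has_upper=$(printf '%s\n' "$input" | grep '[[:upper:]]')

echo "digits_only=$digits_only has_upper=$has_upper"
```

Writing '[:digit:]' with a single pair of brackets would instead be read as a set containing the characters ":", "d", "i", "g", and "t".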

Tools Supporting Regex in Bash

Bash scripting relies on several tools to implement regex. Here’s how to use the most common ones:

1. grep: Search for Patterns in Text

grep searches input files or stdin for lines matching a regex pattern.

  • Basic grep (BRE): Use with default syntax (escape +, ?, | with \).

    # Match lines with "error" or "warning" (BRE: escape |)  
    grep 'error\|warning' app.log  
  • Extended grep (ERE): Use -E flag for unescaped metacharacters.

    # Match lines with "error" or "warning" (ERE: no escape needed for |)  
    grep -E 'error|warning' app.log  
  • Inverse match (-v): Exclude lines matching the pattern.

    # Show lines NOT containing "debug"  
    grep -v 'debug' app.log  
  • Count matches (-c):

    # Count lines with "error"  
    grep -c 'error' app.log  
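
In scripts, grep is often used for its exit status rather than its output. The sketch below assumes a made-up log held in a variable instead of app.log, and uses -q (quiet) so that only the exit status matters:

```shell
# Made-up log content standing in for app.log
log='2024-01-01 INFO start
2024-01-01 ERROR disk full
2024-01-02 debug noise
2024-01-02 WARNING low memory'

# -q prints nothing; the exit status is 0 when at least one line matches
if printf '%s\n' "$log" | grep -Eq 'ERROR|WARNING'; then
  status="problems found"
else
  status="clean"
fi

# Options combine: -v inverts the match, -c counts the remaining lines
non_debug=$(printf '%s\n' "$log" | grep -vc 'debug')

echo "$status ($non_debug non-debug lines)"
```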

2. sed: Stream Editor for Text Transformation

sed uses regex to edit text streams (substitute, delete, insert).

  • Substitute (s/pattern/replacement/):

    # Replace "old" with "new" in a file (ERE with -E)  
    sed -E 's/old/new/g' file.txt  
    # g flag (after the last /): replace all occurrences on each line, not just the first  
  • Delete lines (d):

    # Delete lines containing "temp"  
    sed '/temp/d' file.txt  
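
Deletion and substitution can be chained in one sed call, and a capture group lets the replacement reuse part of the match. A minimal sketch, assuming an invented key=value input:

```shell
# Invented sample input
text='name=alice
# name=bob
name=carol'

# First expression deletes comment lines; second rewrites the rest.
# With -E, ( ) captures a group and \1 inserts it into the replacement.
result=$(printf '%s\n' "$text" | sed -E -e '/^#/d' -e 's/^name=(.*)$/user: \1/')

printf '%s\n' "$result"
```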

3. awk: Pattern-Driven Text Processing

awk uses regex patterns to trigger actions (print, calculate, etc.).

  • Basic pattern-action:
    # Print the first two fields of rows whose 3rd field (CSV) is exactly "active"  
    awk -F ',' '$3 ~ /^active$/ {print $1, $2}' data.csv  
    # ~: regex match operator  
    # -F ',': set field separator to comma  
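
Because ~ tests whether the regex matches anywhere in the field, anchors matter when an exact value is wanted. The sketch below uses invented CSV rows to show the difference:

```shell
# Invented CSV rows: name,role,state
csv='alice,admin,active
bob,dev,inactive
carol,ops,active'

# Anchored: the 3rd field must be exactly "active"
exact=$(printf '%s\n' "$csv" | awk -F ',' '$3 ~ /^active$/ {print $1}')

# Unanchored: "inactive" also contains "active", so bob is included
loose=$(printf '%s\n' "$csv" | awk -F ',' '$3 ~ /active/ {print $1}')

printf 'exact:\n%s\nloose:\n%s\n' "$exact" "$loose"
```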

4. Bash Conditional [[ ... =~ ... ]]

Check if a string matches a regex using Bash’s [[ ]] construct (supports ERE).

str="hello123"  
if [[ $str =~ ^[a-z]+[0-9]+$ ]]; then  
  echo "String matches: letters followed by numbers"  
fi  
# Output: "String matches: letters followed by numbers"  
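
After a successful match, Bash stores the whole match and each parenthesized group in the BASH_REMATCH array. A small sketch extending the example above (the variable names re, letters, and digits are chosen here for illustration):

```shell
str="hello123"

# Keeping the pattern in a variable avoids quoting surprises
re='^([a-z]+)([0-9]+)$'

if [[ $str =~ $re ]]; then
  whole=${BASH_REMATCH[0]}    # entire match
  letters=${BASH_REMATCH[1]}  # first group
  digits=${BASH_REMATCH[2]}   # second group
  echo "letters=$letters digits=$digits"
fi
```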

Advanced Regex Patterns

1. Quantifiers

Specify the number of occurrences with {n}, {n,}, {n,m} in ERE (BRE uses the escaped forms \{n\}, \{n,\}, \{n,m\}):

  • {n}: Exactly n times.
    Example: [0-9]{3} matches exactly three consecutive digits (e.g., “123”).

  • {n,}: At least n times.
    Example: a{2,} matches “aa”, “aaa”, etc.

  • {n,m}: Between n and m times.
    Example: [0-9]{1,3} matches 1-3 digits (e.g., “5”, “42”, “123”). The \d shorthand is PCRE syntax (grep -P) and is not part of POSIX ERE.
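
The interval forms can be checked against a short list of values. This sketch uses [0-9] rather than \d so that it works with grep -E, and the input values are invented:

```shell
# Invented sample values, one per line
codes='7
42
123
1234
abc'

# Exactly three digits (ERE)
three=$(printf '%s\n' "$codes" | grep -E '^[0-9]{3}$')

# One to three digits (ERE), counted with -c
one_to_three=$(printf '%s\n' "$codes" | grep -Ec '^[0-9]{1,3}$')

# The same "exactly three" pattern in BRE needs escaped braces
bre_three=$(printf '%s\n' "$codes" | grep '^[0-9]\{3\}$')

echo "three=$three one_to_three=$one_to_three bre_three=$bre_three"
```

The anchors are what make these counts exact; without ^ and $, [0-9]{3} would also match inside "1234".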

2. Word Boundaries

\b matches the position between a word character ([a-zA-Z0-9_]) and a non-word character. It is available in PCRE (grep -P) and, as a GNU extension, in GNU grep's BRE and ERE modes; gawk uses \y for the same purpose.

Example:

# Match "cat" as a standalone word (not "category" or "scat")  
grep -P '\bcat\b' file.txt  
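
When PCRE support is not available, grep -w gives the same whole-word behavior for a simple pattern. The sketch below assumes GNU grep for the \b form and uses an invented word list:

```shell
# Invented sample lines
words='cat
category
scat
the cat sat'

# -w: the match must be bounded by non-word characters or line edges
whole=$(printf '%s\n' "$words" | grep -w 'cat')

# GNU grep also accepts \b without -P
gnu_b=$(printf '%s\n' "$words" | grep '\bcat\b')

printf '%s\n' "$whole"
```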

3. Backreferences

Refer to previously matched groups with \n (n = group number). Useful for repeated patterns.

Example with sed (BRE: use \( \) for groups):

# Swap first and last names (e.g., "Doe, John" → "John Doe")  
# In BRE a bare + is a literal plus sign, so "one or more" is written \{1,\}  
sed 's/\([A-Za-z]\{1,\}\), \([A-Za-z]\{1,\}\)/\2 \1/' names.txt  
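
The same swap reads more cleanly in ERE, where groups are plain parentheses. The sketch below also shows a backreference used inside the pattern itself to find a doubled word; that second command assumes GNU grep, since backreferences and \b in ERE are GNU extensions. All input lines are invented.

```shell
# Invented sample names standing in for names.txt
names='Doe, John
Smith, Jane'

# ERE version of the swap: ( ) groups, + means one or more
swapped=$(printf '%s\n' "$names" | sed -E 's/([A-Za-z]+), ([A-Za-z]+)/\2 \1/')

# Backreference in the pattern: a word followed by a space and itself
doubled=$(printf '%s\n' 'this is is a test' 'all good' |
  grep -E '\b([a-z]+) \1\b')

printf '%s\n' "$swapped" "$doubled"
```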

4. Negation

[^...] matches any character not in the set.

Example:

# Match non-empty lines that contain no digits  
grep -E '^[^0-9]+$' file.txt  

Practical Examples

Example 1: Validate Email Addresses

Check if a string is a basic email (simplified regex):

email="user@example.com"  
if [[ $email =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then  
  echo "Valid email: $email"  
else  
  echo "Invalid email: $email"  
fi  

Example 2: Extract IP Addresses from Logs

Use grep -Eo to extract IPs (e.g., from Apache logs):

# Extract all IPv4 addresses (simplified regex)  
grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' access.log  
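
The simplified pattern also accepts out-of-range values such as 999.1.1.1. One way to tighten it, sketched here in Bash, is to capture each octet and check its numeric range. The function name is_ipv4 and the sample lines are hypothetical and not part of any standard tool.

```shell
# Hypothetical helper: succeeds only for four dot-separated octets in 0-255
is_ipv4() {
  local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
  [[ $1 =~ $re ]] || return 1
  local octet
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10 so an octet like "08" is not read as octal
    (( 10#$octet <= 255 )) || return 1
  done
}

# Extract candidates with grep -Eo, then keep only the valid ones
valid_ips=$(printf '%s\n' 'from 192.0.2.1 port 22' 'bogus 999.1.1.1' |
  grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' |
  while read -r ip; do
    is_ipv4 "$ip" && echo "$ip"
  done)

echo "$valid_ips"
```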

Example 3: Count Failed SSH Logins

Parse /var/log/auth.log to count failed SSH attempts:

# Count lines with "Failed password"  
grep -c 'Failed password' /var/log/auth.log  

Example 4: Replace Text in Multiple Files

Use sed with find to replace text across files:

# Replace "old-url" with "new-url" in all .html files  
find . -name "*.html" -exec sed -i 's/old-url/new-url/g' {} +  
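
Since an in-place edit across many files is hard to undo, it can help to preview the affected files and keep backups. The sketch below runs in a throwaway directory with invented file contents; the -i.bak spelling (suffix attached to -i) is assumed here because both GNU and BSD sed accept it.

```shell
# Work in a temporary directory so nothing real is modified
workdir=$(mktemp -d)
printf '<a href="old-url">link</a>\n' > "$workdir/a.html"
printf 'no match here\n' > "$workdir/b.html"

# Preview: -l lists only the names of files that contain a match
to_change=$(grep -l 'old-url' "$workdir"/*.html)

# Edit in place, leaving the original of each file as <name>.bak
find "$workdir" -name '*.html' -exec sed -i.bak 's/old-url/new-url/g' {} +

updated=$(cat "$workdir/a.html")
echo "$updated"

rm -rf "$workdir"
```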

Example 5: Check if a String Is a Number

Validate that a string is an integer (positive/negative):

num="-123"  
if [[ $num =~ ^-?[0-9]+$ ]]; then  
  echo "$num is a valid integer"  
else  
  echo "$num is not an integer"  
fi  
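
One detail worth knowing when reusing patterns like this: in [[ ... =~ ... ]], a quoted right-hand side is compared as a literal string, not as a regex. A short sketch (the variable name int_re is chosen for illustration):

```shell
num="-123"
int_re='^-?[0-9]+$'

# Unquoted expansion: treated as a regular expression
if [[ $num =~ $int_re ]]; then
  unquoted="match"
else
  unquoted="no match"
fi

# Quoted expansion: treated as the literal text ^-?[0-9]+$
if [[ $num =~ "$int_re" ]]; then
  quoted="match"
else
  quoted="no match"
fi

echo "unquoted: $unquoted, quoted: $quoted"
```

Storing the pattern in a variable and expanding it unquoted keeps the regex readable and avoids escaping issues inside [[ ]].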

Tips and Best Practices

  1. Escape Metacharacters When Needed
    In BRE (e.g., basic grep), escape +, ?, |, (, ) with \. Use ERE (grep -E, sed -E) for cleaner syntax.

  2. Use Single Quotes for Regex
    Prevent shell expansion of metacharacters like * or $:

    grep '^[0-9]' file.txt  # Good (single quotes)  
    grep ^[0-9] file.txt    # Bad (shell may expand [0-9] as a glob)  
  3. Test Regex with grep -E or Online Tools
    Validate patterns incrementally:

    echo "test123" | grep -E '^[a-z]+[0-9]+$'  # Test in isolation  
  4. Prefer Specific Tools

    • Use grep for simple searches.
    • Use sed for substitutions/deletions.
    • Use awk for complex field-based logic.

Common Pitfalls

  • Confusing BRE and ERE: Forgetting to use grep -E or sed -E leads to unexpected behavior (e.g., + not matching).
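  • Overlooking Case Sensitivity: By default, regex is case-sensitive. For case-insensitive matches, use grep -i, the I flag in GNU sed (s/pattern/replacement/I), or shopt -s nocasematch before [[ ... =~ ... ]]. Note that sed -i means in-place editing, not case-insensitivity.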
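  • Backreferences in [[ ]]: POSIX ERE does not define backreferences, so \1 inside a [[ ... =~ ... ]] pattern is not portable. Read captured groups from the BASH_REMATCH array instead, or use sed or grep when the pattern itself must refer back to a group.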
  • Greedy Quantifiers: .* matches as much as possible. Use .*? (non-greedy) with PCRE (grep -P) for minimal matches.
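
Two of these pitfalls have workarounds that do not need PCRE. The sketch below, using an invented HTML fragment, replaces the greedy .* with a negated character class and switches on case-insensitive matching for [[ ]] with the nocasematch shell option:

```shell
line='<b>one</b> and <b>two</b>'

# Greedy: .* runs to the last </b>, so the whole line is one match
greedy=$(printf '%s\n' "$line" | grep -Eo '<b>.*</b>')

# Negated class: [^<]* cannot cross a tag, giving one match per element
tight=$(printf '%s\n' "$line" | grep -Eo '<b>[^<]*</b>')

# Case-insensitive =~ via a shell option, switched off again afterwards
shopt -s nocasematch
if [[ "ERROR" =~ ^error$ ]]; then
  ci="match"
else
  ci="no match"
fi
shopt -u nocasematch

printf '%s\n' "$tight"
echo "case-insensitive: $ci"
```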

Conclusion

Regular expressions are a cornerstone of text processing in Bash scripting. By mastering regex, you can automate complex tasks like log analysis, data validation, and text transformation with precision. Start small—experiment with grep and sed—and gradually tackle advanced patterns. With practice, regex will become an indispensable tool in your Bash toolkit.

References