funwithlinux blog

How to Recursively List All Files in a Directory Including Symlink Directories (Fixing `find` and `ls -R` Limitations)

Symlinks (symbolic links) are powerful tools in Unix-like systems, allowing you to create references to files or directories elsewhere on your system. They’re commonly used to organize projects, link shared resources (e.g., configuration files, assets), or access external drives. However, when you try to recursively list files in a directory containing symlink directories, the default behavior of tools like ls -R and find can be frustrating: they list the symlink itself but do not descend into the contents of its target.

This blog post will demystify why ls -R and find fail to include symlink directories by default, explore practical solutions to fix this, and provide advanced tips to avoid common pitfalls like symlink loops. By the end, you’ll be able to recursively list all files—including those in symlink directories—with confidence.

2025-11

Table of Contents#

  1. Understanding the Problem: Limitations of ls -R and find
  2. Why Follow Symlink Directories? Real-World Use Cases
  3. Solutions to Recursively List Files Including Symlink Directories
  4. Creating a Custom Script for Advanced Control
  5. Avoiding Pitfalls: Symlink Loops and Performance Considerations
  6. Conclusion
  7. References

Understanding the Problem: Limitations of ls -R and find#

Before diving into solutions, let’s clarify why standard tools fail to include symlink directories recursively.

The ls -R command lists files and directories recursively, but by default, it does not descend into symlink directories. Instead, it treats symlink directories as regular entries, listing their names but not their contents.

Example:
Suppose you have this directory structure:

project/
├── docs/
│   └── guide.txt
└── assets -> ../shared-assets/  # Symlink to an external directory
    # ../shared-assets/ contains:
    # ├── images/
    # │   └── logo.png
    # └── styles.css

Running ls -R project outputs:

project:
assets  docs  # Symlink listed, but ls never opens it

project/docs:
guide.txt

Notice assets is listed, but images/ and styles.css in shared-assets are missing.

The find command is designed to traverse directory trees, but it also avoids symlink directories by default to prevent accidental loops (e.g., a symlink pointing back to a parent directory). By default, find uses the -P flag (no dereferencing), meaning it treats symlink directories as files and does not descend into them.

Example:
Using the same project/ structure above, find project outputs:

project
project/docs
project/docs/guide.txt
project/assets  # Symlink listed, but no contents

Again, the contents of shared-assets (target of assets) are missing.
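
To reproduce the examples in this post, the layout above can be rebuilt in a scratch directory. This is a minimal sketch: the `demo` variable and the use of `mktemp -d` are assumptions made for illustration, not part of the original setup.

```shell
#!/bin/bash
# Build the example layout in a throwaway directory (hypothetical location).
demo=$(mktemp -d)
mkdir -p "$demo/project/docs" "$demo/shared-assets/images"
touch "$demo/project/docs/guide.txt" \
      "$demo/shared-assets/images/logo.png" \
      "$demo/shared-assets/styles.css"

# A relative target is resolved from the directory that holds the link,
# so ../shared-assets points at the sibling of project/.
ln -s ../shared-assets "$demo/project/assets"

# Default find (-P): the symlink is listed but not entered.
cd "$demo" && find project | sort
```

Running the later commands from inside `$demo` keeps the relative paths identical to the ones shown in this post.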

Why Follow Symlink Directories? Real-World Use Cases#

You might wonder: “Why would I need to list files through symlink directories?” Here are common scenarios:

  • Project Organization: Symlinks often link to shared resources (e.g., a company-wide assets folder) to avoid duplication.
  • External Storage: Symlinks can point to mounted drives (e.g., /mnt/external-drive/backups) that need inclusion in file audits.
  • Legacy Systems: Older projects may use symlinks to restructure directories without moving files.
  • Backup Scripts: You may need to archive all files, including those referenced via symlinks.

Solutions to Recursively List Files Including Symlink Directories#

Let’s explore tools and workarounds to include symlink directories in recursive listings.

The find command has a built-in solution: the -L (dereference) flag. When used, find -L dereferences symlinks, treating them as their target files/directories. This means it will descend into symlink directories and list their contents.

How It Works:#

  • -L tells find to dereference all symlinks encountered during traversal.
  • Symlinks to files are still printed under the symlink’s own path, but tests such as -type and -size are evaluated against the target.
  • Symlink directories are treated as regular directories, and find recurses into them.
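
One way to see dereferencing in action is to check which entries still count as links under -L. The sketch below assumes GNU find; the scratch directory and the names `good` and `dangling` are made up for the demonstration.

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir "$tmp/target"
ln -s target "$tmp/good"        # resolves to a directory
ln -s missing "$tmp/dangling"   # target does not exist

# Under -L a working symlink is reported with its target's type ...
dirs=$(cd "$tmp" && find -L . -mindepth 1 -type d | sort)
# ... so -type l can only match links that fail to resolve.
broken=$(cd "$tmp" && find -L . -type l)

printf 'directories:\n%s\nbroken links:\n%s\n' "$dirs" "$broken"
```

A handy side effect: find -L <dir> -type l doubles as a quick check for broken symlinks.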

Example:#

Using the project/ structure above, run:

find -L project  # `-L` enables symlink dereferencing

Output:

project
project/docs
project/docs/guide.txt
project/assets  # Symlink dereferenced to ../shared-assets
project/assets/images
project/assets/images/logo.png
project/assets/styles.css

The contents of assets (from shared-assets) are now included!

Key Flags to Pair with -L:#

  • -type f: List only files (exclude directories).
    find -L project -type f  # Lists all files, including symlink directory contents
  • -name "*.txt": Filter by filename pattern.
    find -L project -name "*.txt"  # Finds guide.txt (and any .txt in shared-assets)
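
When a script consumes these results, names containing spaces or newlines can break line-based parsing. A hedged sketch of a safer pattern using NUL-delimited output follows; the function name `list_followed` and the global array `FILES` are hypothetical, and -print0 assumes GNU or BSD find.

```shell
#!/bin/bash
# list_followed DIR: fill the global array FILES with every regular file
# under DIR, following symlinked directories. NUL delimiters keep names
# with spaces or newlines intact.
list_followed() {
  FILES=()
  local f
  while IFS= read -r -d '' f; do
    FILES+=("$f")
  done < <(find -L "$1" -type f -print0)
}

# Demo on a scratch tree (paths are illustrative only).
tmp=$(mktemp -d)
mkdir -p "$tmp/project" "$tmp/shared assets"
touch "$tmp/shared assets/logo 1.png"
ln -s "../shared assets" "$tmp/project/assets"

list_followed "$tmp/project"
printf '%s\n' "${FILES[@]}"
```

The file is reported under project/assets/ because find prints the path it walked, not the resolved target.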

If you prefer ls for its human-readable output, use ls -LR. The -L flag dereferences symlinks, and -R enables recursion—combined, ls -LR follows symlink directories.

How It Works:#

  • -L: Dereferences symlinks (treats them as their targets).
  • -R: Recursively lists subdirectories.

Example:#

Using the project/ structure:

ls -LR project  # `-L` dereferences, `-R` recurses

Output:

project:
assets  docs

project/assets:  # Now shows contents of ../shared-assets
images  styles.css

project/assets/images:
logo.png

project/docs:
guide.txt

Perfect! ls -LR includes the symlink directory’s contents.

Creating a Custom Script for Advanced Control#

For complex needs (e.g., avoiding loops, custom formatting, or filtering), a bash script gives you granular control. Below is a script that:

  • Follows symlink directories.
  • Skips symlink loops (e.g., a/b -> a).
  • Outputs full paths for clarity.

Custom Script: list-all-files.sh#

#!/bin/bash

# Recursively list all files, following symlinked directories, with loop detection.
# Requires bash 4+ (associative arrays) and GNU stat/readlink.
# Usage: ./list-all-files.sh <directory>

shopt -s nullglob dotglob  # Empty directories expand to nothing; include hidden files

declare -A visited_inodes=()  # Visited directories, keyed by "device:inode"

process_dir() {
  local dir="$1"
  [[ "$dir" != "/" ]] && dir="${dir%/}"  # Drop a trailing slash to avoid "dir//file"

  # Not a directory: print it and stop (regular files, symlinks to files)
  if [[ ! -d "$dir" ]]; then
    echo "$dir"
    return
  fi

  # Identify the dereferenced directory; inode numbers alone repeat across filesystems
  local key
  key=$(stat -L -c '%d:%i' -- "$dir") || return

  # Skip directories already processed (a loop, or a second link to the same place)
  if [[ -n "${visited_inodes[$key]:-}" ]]; then
    echo "⚠️  Skipping already-visited directory: $dir ($key)" >&2
    return
  fi
  visited_inodes[$key]=1  # Mark directory as visited

  # Walk every entry in the directory
  local item target
  for item in "$dir"/*; do
    if [[ -L "$item" ]]; then
      if [[ ! -e "$item" ]]; then
        echo "⚠️  Skipping broken symlink: $item" >&2
        continue
      fi
      target=$(readlink -f -- "$item")  # Absolute path of the target
      echo "🔗 $item -> $target"        # Optional: show the symlink relationship
      process_dir "$target"             # Recurse into (or print) the target
    elif [[ -d "$item" ]]; then
      # Regular directory: recurse
      process_dir "$item"
    else
      # Regular file: list
      echo "$item"
    fi
  done
}

# Validate input
if [[ $# -ne 1 || ! -d "$1" ]]; then
  echo "Usage: $0 <directory>" >&2
  exit 1
fi

process_dir "$1"

How It Works:#

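  1. Loop Detection: Uses an associative array (visited_inodes) to remember directories it has already visited, keyed by inode. Because inode numbers are unique only within a single filesystem, the key should also include the device number when symlinks may cross mount points. If a directory is encountered a second time, the script skips it.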
  2. Symlink Handling: Uses readlink -f to resolve symlinks to their absolute target paths.
  3. Clarity: Prints symlink relationships (e.g., 🔗 project/assets -> /path/to/shared-assets) for transparency.
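
The loop check rests on the fact that a symlink and its target resolve to the same directory identity. A small sketch of that idea, assuming GNU stat (-c and -L are GNU coreutils options; BSD/macOS stat uses different flags). The helper name `dir_key` is hypothetical.

```shell
#!/bin/bash
# dir_key PATH: print "device:inode" for PATH after following symlinks.
dir_key() {
  stat -L -c '%d:%i' -- "$1"
}

tmp=$(mktemp -d)
mkdir "$tmp/real" "$tmp/other"
ln -s real "$tmp/alias"          # alias -> real

if [[ "$(dir_key "$tmp/real")" == "$(dir_key "$tmp/alias")" ]]; then
  echo "alias and real are the same directory"
fi
if [[ "$(dir_key "$tmp/real")" != "$(dir_key "$tmp/other")" ]]; then
  echo "other is a different directory"
fi
```

Pairing the device number with the inode avoids false matches when two filesystems happen to reuse the same inode number.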

Usage:#

chmod +x list-all-files.sh
./list-all-files.sh project

Sample Output:

🔗 project/assets -> /home/user/shared-assets
/home/user/shared-assets/images/logo.png
/home/user/shared-assets/styles.css
project/docs/guide.txt

The script prints files only, and anything reached through a symlink appears under its resolved target path.

Avoiding Pitfalls: Symlink Loops and Performance Considerations#

While following symlinks is powerful, it comes with risks like infinite loops and performance hits. Here’s how to mitigate them.

A symlink loop occurs when a symlink points to one of its own ancestor directories (e.g., a/b -> a), so a traversal that follows it keeps re-entering the same tree.

Example of a Loop:#

mkdir -p looptest/a/b
ln -s ../.. looptest/a/b/c  # c -> looptest/ (points back to root)

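A traversal with no loop checking would spiral into looptest/a/b/c/a/b/c/a/b/c/... until the kernel’s symlink-resolution or path-length limit stops it. GNU find -L detects the loop itself: it prints a “File system loop detected” warning to standard error, does not descend into the offending link, and exits with a nonzero status.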

How to Avoid Loops:#

  • Use the Custom Script: The list-all-files.sh script above tracks inodes to skip loops.
  • find -maxdepth: Limit recursion depth (e.g., find -L looptest -maxdepth 5), but this is rigid.
  • Manual Checks: Use readlink -f to inspect symlink targets before traversal:
    readlink -f looptest/a/b/c  # Output: /path/to/looptest (reveals the loop)
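
To see how a particular find build reacts to a loop without risking a runaway process, the traversal can be depth-limited and wrapped in timeout. This is a sketch that assumes GNU findutils and coreutils (for timeout); the scratch path is illustrative.

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp/looptest/a/b"
ln -s ../.. "$tmp/looptest/a/b/c"     # c resolves back to looptest

# -maxdepth caps how long any path can grow, loop or not.
# Loop warnings (if this find build emits them) go to stderr, discarded here.
listing=$(cd "$tmp" && timeout 10 find -L looptest -maxdepth 4 2>/dev/null)
printf '%s\n' "$listing"
```

Dropping the 2>/dev/null redirect shows whatever loop diagnostics the installed find produces.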

Performance Tips for Large Directories#

Following many symlinks (especially to networked or slow storage) can slow down listings. Optimize with these tips:

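  • Filter Early: Tests such as -type f -name "*.txt" trim the output, but find still walks every directory it can reach. To cut the traversal itself, add -maxdepth or -prune (e.g., find -L <dir> -maxdepth 3 -type f -name "*.txt").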
  • Avoid Network Symlinks: Temporarily exclude symlinks to NFS/SMB shares if speed is critical.
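  • Parallelize (Advanced): Use xargs or parallel with find for large datasets, passing NUL-delimited names so paths with spaces survive (e.g., find -L . -type f -print0 | xargs -0 -n 100 -P 4 echo, where -P 4 runs up to four processes at once).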
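
Excluding a slow symlinked directory comes down to -prune, which stops find from entering a matching path at all. A minimal sketch, in which the `remote` link and the `slow-share` directory are stand-ins for a network mount:

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp/project/docs" "$tmp/slow-share/big"
touch "$tmp/project/docs/guide.txt" "$tmp/slow-share/big/data.txt"
ln -s ../slow-share "$tmp/project/remote"   # stands in for a network mount

# Skip the 'remote' symlink directory entirely; list .txt files elsewhere.
pruned=$(cd "$tmp" && find -L project -path project/remote -prune \
                           -o -type f -name '*.txt' -print)
printf '%s\n' "$pruned"
```

The explicit -print on the right-hand side of -o matters: without it, find would also print the pruned directory itself.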

Conclusion#

Recursively listing files through symlink directories doesn’t have to be a headache. Use:

  • find -L <dir> for simple, powerful listings with pattern matching.
  • ls -LR <dir> for human-readable output with symlink dereferencing.
  • Custom Scripts (like list-all-files.sh) for loop detection and advanced control.

Always watch for symlink loops and test performance with large directories. With these tools, you’ll never miss a file hidden behind a symlink again!

References#