How to Efficiently Find Files from a List Under a Directory Using Shell (Faster Than 1000 Find Commands)
Have you ever needed to locate multiple specific files scattered across a directory (or its subdirectories) using a pre-defined list of filenames? Maybe you’re auditing logs, verifying backups, or processing a batch of files—and you have a list.txt with hundreds or thousands of filenames to check.
The naive approach might be to loop through each filename in list.txt and run find for every entry (e.g., for file in $(cat list.txt); do find /target -name "$file"; done). But if your list has 1000 entries, this runs find 1000 times—each time traversing the entire directory tree from scratch. This is slow, inefficient, and unnecessary.
In this blog, we’ll explore faster methods that traverse the directory once and then check against your list of filenames. These approaches are orders of magnitude faster, especially for large lists or deep directory structures.
Let’s start by understanding why the naive loop is so inefficient. Suppose you have a list of 1000 filenames in list.txt, and you want to find them under /data/docs. The naive script might look like this:
```bash
# Naive (slow) approach: loop through the list and run `find` for each file
while IFS= read -r filename; do
  find /data/docs -name "$filename"
done < list.txt
```
- Redundant Directory Traversal: `find` scans the entire directory tree for every filename in the list. If /data/docs has 100,000 files, `find` will traverse all 100k files 1000 times—resulting in 100 million file checks!
- Overhead of Multiple Processes: Each `find` invocation spawns a new process, adding CPU/memory overhead.
- I/O Bottlenecks: Repeatedly accessing the same directory metadata (inodes, etc.) leads to unnecessary disk I/O, slowing things down further.
For large lists, this approach can take minutes or even hours. The solution? Traverse the directory once, then check all filenames against the list in a single pass.
The key insight is to generate a list of all files under the target directory in one go, then filter that list against your input list.txt. This reduces directory traversal from N full scans (one per entry in list.txt) to a single scan.
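The traverse-once idea can be sketched with plain `find` and `grep` before reaching for any extra tools. The directory and list below are a throwaway fixture standing in for /data/docs and list.txt. Note one caveat: plain `grep -F -f` matches each filename anywhere in the path, so it can over-match on substrings (exact basename matching is covered further down).

```bash
# Throwaway fixture standing in for /data/docs and list.txt.
tmp=$(mktemp -d)
mkdir -p "$tmp/docs/sub"
touch "$tmp/docs/a.txt" "$tmp/docs/sub/b.csv" "$tmp/docs/c.log"
printf 'a.txt\nb.csv\nmissing.dat\n' > "$tmp/list.txt"

# One traversal of the tree, then one grep over its output:
# -F = fixed strings (no regex), -f = read patterns from the list file.
find "$tmp/docs" -type f | grep -F -f "$tmp/list.txt"
```

Here the tree is walked once regardless of how many names the list contains; only a.txt and sub/b.csv are printed, while missing.dat simply matches nothing.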
fd (aka fdfind) is a modern replacement for find optimized for speed (parallel traversal, ignore patterns, etc.). It’s not preinstalled, but you can install it via apt install fd-find (Debian/Ubuntu) or brew install fd (macOS).
- Faster Traversal: `fd` walks the tree in parallel and by default skips hidden files and anything matched by .gitignore (add `--hidden` and `--no-ignore` to include them).
- Simpler Syntax: More intuitive flags than `find` (e.g., `-t f` for files, `-t d` for directories).
For large directory trees, fd can be 2–10x faster than find, making this method ideal for performance-critical tasks.
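The same grep filter applies directly to `fd`'s output. A sketch that resolves either binary name (Debian/Ubuntu package the executable as `fdfind`) and falls back gracefully when neither is installed; the fixture paths are placeholders:

```bash
tmp=$(mktemp -d)
mkdir -p "$tmp/docs/sub"
touch "$tmp/docs/a.txt" "$tmp/docs/sub/b.csv" "$tmp/docs/c.log"
printf 'a.txt\nb.csv\n' > "$tmp/list.txt"

# Debian/Ubuntu install the binary as `fdfind`; try both names.
FD=$(command -v fd || command -v fdfind || true)
if [ -n "$FD" ]; then
  # One parallel traversal, then the same grep filter as with find.
  "$FD" --type f . "$tmp/docs" | grep -F -f "$tmp/list.txt"
else
  echo "fd not installed; use the find variant instead" >&2
fi
```

The pattern `.` tells `fd` to match every name, so `--type f . DIR` is the equivalent of `find DIR -type f`.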
Method 3: Sorted Lists with comm (For Very Large Datasets)
If list.txt has 10,000+ filenames, the grep-based filter can slow down as the pattern file grows. A faster alternative is `comm`, which compares two sorted files and outputs their common lines.
```bash
# Step 1: Sort the input list (handle duplicates with `sort -u`)
sort -u list.txt > sorted_list.txt

# Step 2: Generate and sort all files under the target directory
find /data/docs -type f | sort > sorted_all_files.txt

# Step 3: Find common lines with `comm`
comm -12 sorted_all_files.txt sorted_list.txt
```
- `sort -u list.txt`: Sorts list.txt and removes duplicates (`-u`).
- `find ... | sort`: Sorts the list of all files to match sorted_list.txt.
- `comm -12 file1 file2`: Outputs lines present in both file1 and file2 (`-1` hides lines only in file1, `-2` hides lines only in file2).
This method is efficient for very large datasets because sorting is O(n log n) and comm runs in O(n) time, making it faster than grep for 10k+ filenames.
Match Filenames Only (Not Paths): If list.txt contains bare filenames (e.g., data.csv) and you want to ignore the path portion of each result, use GNU `find`'s `-printf "%f\n"` to output only the basename:
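A sketch of that variant (GNU `find` is assumed for `-printf`; the fixture paths are placeholders). With basenames on both sides, `grep -x` can require the whole line to equal a list entry, eliminating substring false positives:

```bash
tmp=$(mktemp -d)
mkdir -p "$tmp/docs/sub"
touch "$tmp/docs/a.txt" "$tmp/docs/sub/b.csv" "$tmp/docs/noise.log"
printf 'a.txt\nb.csv\nmissing.dat\n' > "$tmp/list.txt"

# -printf '%f\n' (GNU find) prints only the basename of each file,
# so -x can demand an exact whole-line match against the list.
find "$tmp/docs" -type f -printf '%f\n' | grep -F -x -f "$tmp/list.txt"
```

This tells you which listed filenames exist somewhere under the directory; if you also need their paths, drop `-x` and match full paths instead, accepting the substring caveat.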
Filenames can contain newlines (though rare), which break line-based tools like grep or comm. To handle this, use null-terminated lines (replace newlines with \0):
```bash
# For `find` + `grep`:
find /data/docs -type f -print0 | grep -F -x -z -f <(tr '\n' '\0' < list.txt)
```
- `-print0`: `find` outputs null-terminated lines.
- `-z`: `grep` reads null-terminated lines.
- `tr '\n' '\0' < list.txt`: Converts list.txt newlines to nulls.
Looping through `find` for each filename in a list is inefficient and slow. Instead, traverse the directory once with `find` or `fd`, then filter the results against your list using `grep` or `comm`. This cuts directory traversal from N full scans down to a single scan, making the process orders of magnitude faster.
- Use `find` + `grep` for universal compatibility.
- Use `fd` + `grep` for maximum speed (if `fd` is installed).
- Use `comm` for very large lists (10k+ filenames).
With these methods, you’ll turn a tedious task into a quick one-liner!