funwithlinux blog

How to Recursively Find Files with Specific Extensions (e.g., .pdf, .jpg) Without Repeating Filenames Using the Find Command

Whether you’re organizing a cluttered folder, backing up important documents, or cleaning up redundant files, the ability to recursively search for files by extension is a critical skill for anyone working in a command-line environment. The find command—available on Linux, macOS, and other Unix-like systems—is a powerful tool for this task. However, a common frustration arises when the same filename (e.g., report.pdf) appears in multiple subdirectories, leading to duplicate entries in your search results.

In this blog, we’ll demystify the find command, teach you how to target specific file extensions (like .pdf or .jpg), and show you how to eliminate duplicate filenames from your results. By the end, you’ll be able to efficiently locate files, avoid redundancy, and handle edge cases like filenames with spaces or special characters.

2025-11

Table of Contents

  1. Understanding the find Command: Basics
  2. Specifying File Extensions with find
  3. Avoiding Repeating Filenames: The Key Challenge
  4. Step-by-Step: Find Files with Specific Extensions and Unique Filenames
  5. Common Examples
  6. Advanced Tips and Best Practices
  7. Troubleshooting Common Issues
  8. Conclusion
  9. References

1. Understanding the find Command: Basics

The find command searches for files and directories recursively starting from a specified path. Its basic syntax is:

find [starting-path] [expression]  
  • starting-path: The directory to begin the search (use . for the current directory).
  • expression: Criteria to filter results (e.g., file type, name, size).

By default, find is recursive, meaning it will traverse all subdirectories under starting-path. This makes it ideal for hunting down files buried deep in folder structures.

2. Specifying File Extensions with find

To target files by extension, use the -name flag with a wildcard pattern (e.g., *.pdf). Here’s how:

Single Extension

To find all .pdf files starting from the current directory:

find . -type f -name "*.pdf"  
  • .: Start search in the current directory.
  • -type f: Only match regular files (exclude directories, symlinks, etc.).
  • -name "*.pdf": Match filenames ending with .pdf (case-sensitive).

Multiple Extensions

To search for multiple extensions (e.g., .pdf and .jpg), use the logical OR operator -o and group conditions with parentheses () (escaped with \ to avoid shell interpretation):

find . -type f \( -name "*.pdf" -o -name "*.jpg" \)  
  • -o: Stands for "or" (match .pdf or .jpg).
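  • \( and \): Group the two -name tests so that -type f applies to both. Without the parentheses, find's implicit AND binds more tightly than -o, so -type f would restrict only the first -name test.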
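
To see the grouping rule in action, the following sketch builds a throwaway directory and runs the search with and without the parentheses. The names docs, album.jpg, report.pdf, and photo.jpg are made up for illustration, and the sketch assumes a POSIX shell with mktemp available.

```shell
# Scratch tree (hypothetical names): two matching files plus a DIRECTORY
# whose name ends in .jpg.
demo=$(mktemp -d)
mkdir -p "$demo/docs" "$demo/album.jpg"
touch "$demo/docs/report.pdf" "$demo/docs/photo.jpg"

# Ungrouped: parsed as (-type f AND -name "*.pdf") OR (-name "*.jpg"),
# so the album.jpg directory is reported too.
ungrouped=$(cd "$demo" && find . -type f -name "*.pdf" -o -name "*.jpg" | LC_ALL=C sort)

# Grouped: -type f now applies to both -name tests.
grouped=$(cd "$demo" && find . -type f \( -name "*.pdf" -o -name "*.jpg" \) | LC_ALL=C sort)

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n' "$ungrouped" "$grouped"
rm -rf "$demo"
```

The ungrouped run lists ./album.jpg even though it is a directory, which is exactly the kind of surprise the parentheses prevent.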

Case Insensitivity

By default, -name is case-sensitive (e.g., it won’t match .PDF or .JPG). Use -iname for case-insensitive searches:

find . -type f \( -iname "*.pdf" -o -iname "*.jpg" \)  
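
As a quick check of the difference, this sketch counts matches for -name and -iname in a scratch directory. The three filenames are hypothetical, and it assumes a find that supports -iname (GNU and BSD find both do).

```shell
# Scratch directory with mixed-case extensions (hypothetical names).
demo=$(mktemp -d)
touch "$demo/scan.pdf" "$demo/INVOICE.PDF" "$demo/photo.Jpg"

# -name is case-sensitive: only scan.pdf matches.
sensitive=$(find "$demo" -type f -name "*.pdf" | wc -l)

# -iname ignores case: all three files match.
insensitive=$(find "$demo" -type f \( -iname "*.pdf" -o -iname "*.jpg" \) | wc -l)

# $(( ... )) strips the padding that some wc implementations add.
echo "case-sensitive matches:   $((sensitive))"
echo "case-insensitive matches: $((insensitive))"
rm -rf "$demo"
```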

3. Avoiding Repeating Filenames: The Key Challenge

A common problem with find is duplicate filenames. For example, if report.pdf exists in both ./docs/ and ./archive/, the command above will list both paths:

./docs/report.pdf  
./archive/report.pdf  

To only show unique filenames (ignoring their directory paths), we need to extract the filename (basename) and remove duplicates.

4. Step-by-Step: Find Files with Specific Extensions and Unique Filenames

To get unique filenames, combine find with:

  • basename: Extracts the filename from a full path.
  • sort -u: Sorts results and removes duplicates (-u = unique).

Full Command

Here’s how to find .pdf and .jpg files, extract their filenames, and show only unique ones:

find . -type f \( -name "*.pdf" -o -name "*.jpg" \) -exec basename {} \; | sort -u  

Breakdown of the Command

| Component | Purpose |
| --- | --- |
| find . -type f ... | Recursively search for regular files with .pdf or .jpg extensions. |
| -exec basename {} \; | For each match, run basename to extract the filename (e.g., ./docs/report.pdf → report.pdf). |
| sort -u | Sort the filenames and remove duplicates. |

Example Output

If report.pdf exists in two directories, the output will be:

report.pdf  
vacation.jpg  
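
If you want to reproduce this output without touching real files, here is a minimal sketch that creates the duplicate layout in a scratch directory. The directory names docs, archive, and photos are assumptions for the demo.

```shell
# Scratch tree (hypothetical layout): report.pdf exists in two directories.
demo=$(mktemp -d)
mkdir -p "$demo/docs" "$demo/archive" "$demo/photos"
touch "$demo/docs/report.pdf" "$demo/archive/report.pdf" "$demo/photos/vacation.jpg"

# Raw search: one path per file, so report.pdf is listed twice (3 paths in total).
all_paths=$(find "$demo" -type f \( -name "*.pdf" -o -name "*.jpg" \))

# Same search reduced to unique filenames (2 names).
unique_names=$(find "$demo" -type f \( -name "*.pdf" -o -name "*.jpg" \) -exec basename {} \; | sort -u)

printf '%s\n' "$unique_names"
rm -rf "$demo"
```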

5. Common Examples

Example 1: Single Extension with Unique Filenames

Find all .txt files and show unique filenames:

find ~/Documents -type f -name "*.txt" -exec basename {} \; | sort -u  

Example 2: Exclude Directories

To exclude specific directories (e.g., node_modules or .git), use -prune:

find . -path "./node_modules" -prune -o -type f \( -name "*.pdf" -o -name "*.jpg" \) -exec basename {} \; | sort -u  
  • -path "./node_modules" -prune: Skip the node_modules directory.
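
The command above prunes only node_modules. One possible way to skip several directories at once, wherever they appear in the tree, is to match them by name instead of by path. The sketch below is a variation on the example, not the only way to do it, and the file layout is invented for the demo.

```shell
# Scratch tree (hypothetical layout) with files hidden in node_modules and .git.
demo=$(mktemp -d)
mkdir -p "$demo/docs" "$demo/node_modules/pkg" "$demo/.git/objects"
touch "$demo/docs/report.pdf" "$demo/node_modules/pkg/logo.jpg" "$demo/.git/objects/blob.pdf"

# -name tests the directory's own name, so both directories are pruned at any depth.
# The first pair of parentheses keeps the two -name tests together ahead of -type d -prune.
kept=$(find "$demo" \( -name node_modules -o -name .git \) -type d -prune \
         -o -type f \( -name "*.pdf" -o -name "*.jpg" \) -exec basename {} \; | sort -u)

printf '%s\n' "$kept"
rm -rf "$demo"
```

Only report.pdf is printed, because find never descends into the two pruned directories.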

Example 3: Save Results to a File

Save unique filenames to unique_files.txt:

find . -type f \( -name "*.pdf" -o -name "*.jpg" \) -exec basename {} \; | sort -u > unique_files.txt  

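6. Advanced Tips and Best Practices

Handle Filenames with Spaces/Special Characters

Filenames with spaces (e.g., my report.pdf) are safe with -exec basename {} \;, because find passes each path to basename as a single argument. They become a problem when you pipe find's output into a tool that splits its input on whitespace, such as xargs. In that case, use -print0 with xargs -0:

find . -type f \( -name "*.pdf" -o -name "*.jpg" \) -print0 | xargs -0 -n 1 basename | sort -u  
  • -print0: Outputs filenames separated by null bytes (avoids issues with spaces).
  • xargs -0: Reads null-separated input and passes filenames to basename.
  • -n 1: Runs basename once per filename. Without it, xargs passes several paths to a single basename call, which treats a second argument as a suffix to strip and, in many implementations, rejects any further arguments.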
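
Running basename once per file can be slow on large trees. As an alternative sketch, you can let find hand batches of paths to a small inline sh script and strip the directory part with parameter expansion. This is a swapped-in technique rather than the pipeline described above, and the file names in the demo are hypothetical.

```shell
# Scratch tree (hypothetical names) with spaces in directory and file names.
demo=$(mktemp -d)
mkdir -p "$demo/2024 taxes" "$demo/trips"
touch "$demo/2024 taxes/my report.pdf" "$demo/trips/beach day.jpg" "$demo/trips/my report.pdf"

# {} + hands find's matches to sh in batches as separate arguments ("$@"),
# so spaces survive. ${path##*/} removes everything up to the last slash,
# which gives the same result as basename for these paths.
names=$(find "$demo" -type f \( -name "*.pdf" -o -name "*.jpg" \) -exec sh -c '
  for path in "$@"; do
    printf "%s\n" "${path##*/}"
  done
' sh {} + | sort -u)

printf '%s\n' "$names"
rm -rf "$demo"
```

Like every newline-delimited pipeline in this post, this still assumes that no filename contains a newline character.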

Limit Search Depth

Use -maxdepth N to descend at most N levels below the starting path. The starting path itself is level 0, so -maxdepth 2 matches files in the current directory and in its immediate subdirectories, but nothing deeper:

find . -maxdepth 2 -type f -name "*.pdf" -exec basename {} \; | sort -u  
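
Depth counting is easy to get wrong by one, so here is a small sketch that compares -maxdepth 1 and -maxdepth 2 on a three-level scratch tree. The names top.pdf, middle.pdf, and bottom.pdf are invented for the demo, and -maxdepth is assumed to be available (it is in GNU and BSD find).

```shell
# Scratch tree (hypothetical names): one PDF at each of three levels.
demo=$(mktemp -d)
mkdir -p "$demo/sub/deeper"
touch "$demo/top.pdf" "$demo/sub/middle.pdf" "$demo/sub/deeper/bottom.pdf"

# Depth 1: only entries directly inside the starting directory.
depth1=$(cd "$demo" && find . -maxdepth 1 -type f -name "*.pdf" | LC_ALL=C sort)

# Depth 2: also entries inside immediate subdirectories; sub/deeper is not searched.
depth2=$(cd "$demo" && find . -maxdepth 2 -type f -name "*.pdf" | LC_ALL=C sort)

printf 'maxdepth 1:\n%s\n\nmaxdepth 2:\n%s\n' "$depth1" "$depth2"
rm -rf "$demo"
```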

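Include Full Paths (But Unique Filenames)

If you want to see full paths but only one path per unique filename, use a longer pipeline: awk prefixes each path with its filename, sort deduplicates on that prefix, and cut strips the prefix again:

find . -type f \( -name "*.pdf" -o -name "*.jpg" \) | awk -F/ '{print $NF "|" $0}' | sort -u -t'|' -k1,1 | cut -d'|' -f2-  
  • This prints one full path for each unique filename. Which of the duplicate paths survives depends on your sort implementation, and the pipeline assumes that no filename contains a | character and that no path contains a newline.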
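
A shorter variant, offered here as an alternative sketch rather than a replacement, lets awk do the duplicate tracking itself with an associative array. Sorting the paths first makes the surviving path predictable. The array name seen and the demo layout are arbitrary choices.

```shell
# Scratch tree (hypothetical layout): report.pdf exists in two directories.
demo=$(mktemp -d)
mkdir -p "$demo/docs" "$demo/archive"
touch "$demo/docs/report.pdf" "$demo/archive/report.pdf" "$demo/docs/vacation.jpg"

# sort fixes the order; awk splits each path on "/" and prints a line only
# the first time its last field (the filename) is seen.
first_paths=$(cd "$demo" && find . -type f \( -name "*.pdf" -o -name "*.jpg" \) \
                | LC_ALL=C sort | awk -F/ '!seen[$NF]++')

printf '%s\n' "$first_paths"
rm -rf "$demo"
```

Because ./archive sorts before ./docs, the archive copy of report.pdf is the one that is kept. This version has no delimiter to collide with, though it still assumes paths without newlines.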

7. Troubleshooting Common Issues

Problem: No Results

  • Fix: Check the starting path (e.g., use ~/Documents instead of . if files are elsewhere).
  • Ensure extensions are correct (e.g., -name "*.jpeg" won’t match .jpg).

Problem: Duplicates Still Appear

  • Fix: Ensure you’re using sort -u after basename. If filenames have different cases (e.g., Report.pdf and report.pdf), use sort -fu for case-insensitive uniqueness.
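
The effect of the -f option is easy to verify on a fixed list of names. This sketch feeds three hypothetical filenames to sort directly, so it also works on case-insensitive filesystems where Report.pdf and report.pdf cannot coexist in one directory.

```shell
# Three sample names (hypothetical); two differ only in case.
names='Report.pdf
report.pdf
notes.txt'

# Plain -u treats Report.pdf and report.pdf as different lines.
exact=$(printf '%s\n' "$names" | LC_ALL=C sort -u | wc -l)

# -f folds case before comparing, so -u keeps only one of the two.
folded=$(printf '%s\n' "$names" | LC_ALL=C sort -fu | wc -l)

echo "sort -u keeps  $((exact)) names"
echo "sort -fu keeps $((folded)) names"
```

Which spelling survives with sort -fu is not something to rely on, so treat the result as a list of distinct names rather than exact on-disk spellings.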

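Problem: Filenames with Spaces Break the Command

  • Fix: Keep each path intact as a single argument. Either let find run the command itself with -exec basename {} \;, or pipe null-delimited names with -print0 | xargs -0 -n 1 basename (see Advanced Tips and Best Practices).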

8. Conclusion

The find command is a versatile tool for recursive file searches, and combining it with basename, sort -u, and other utilities lets you efficiently locate files by extension while avoiding duplicate filenames. Whether you’re managing backups, cleaning up storage, or auditing files, these techniques will save you time and reduce clutter in your results.

9. References