Table of Contents#
- Introduction
- Prerequisites
- Understanding the Task
- Method 1: Using Command-Line Tools (find + jq)
- Method 2: Python Script (For Readability & Control)
- Method 3: Node.js Script (JavaScript Users)
- Troubleshooting Common Issues
- Best Practices
- Conclusion
- References
Prerequisites#
Before starting, ensure you have the following tools installed based on your chosen method:
| Method | Required Tools |
|---|---|
| Command-Line (find + jq) | find (preinstalled on Linux/macOS), jq (install via brew install jq or apt install jq) |
| Python Script | Python 3.x (download from python.org) |
| Node.js Script | Node.js (download from nodejs.org) |
Understanding the Task#
"Recursively cat JSON files" means:
- Recursively search through a directory and all its subdirectories.
- Locate all .json files (e.g., data1.json, subdir/data2.json).
- Combine their contents into a single JSON file.
Critical note: JSON files cannot be naively "appended" with cat file1.json file2.json > combined.json—this often results in invalid JSON (e.g., {"a":1}{"b":2} is not valid). Instead, we need to merge them into a structured format like an array (e.g., [{"a":1}, {"b":2}]) or a single object (e.g., {"file1": {"a":1}, "file2": {"b":2}}).
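The failure mode is easy to demonstrate. A minimal Python sketch (the values are illustrative) shows why naive concatenation breaks, while merging into an array stays valid:

```python
import json

# Simulate `cat a.json b.json`: two JSON documents concatenated end to end
naive = '{"a": 1}{"b": 2}'
try:
    json.loads(naive)
except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e.msg}")  # the parser rejects the trailing second object

# Merging into an array instead yields one valid document
merged = json.dumps([json.loads('{"a": 1}'), json.loads('{"b": 2}')])
print(merged)  # [{"a": 1}, {"b": 2}]
```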
Method 1: Using Command-Line Tools (find + jq)#
For quick, terminal-based workflows, find (to locate files) and jq (to process JSON) are powerful.
Step 1: Locate All JSON Files Recursively#
Use find to list all .json files in the current directory and subdirectories:
```shell
find . -type f -name "*.json"
```
- `.`: Search starting from the current directory.
- `-type f`: Only include files (not directories).
- `-name "*.json"`: Match files ending with `.json`.
Example Output:
```text
./data/logs.json
./subdir/config.json
./old/backup.json
```
Step 2: Combine JSON Files into an Array#
To merge all JSON files into a single array (most common use case), pipe the find results into jq with "slurp" mode (-s), which reads all input into an array:
```shell
find . -type f -name "*.json" -print0 | xargs -0 jq -s . > combined.json
```
Breakdown:
- `-print0` + `xargs -0`: Safely handles filenames with spaces/newlines (avoids errors).
- `jq -s .`: "Slurp" all input JSON documents into a single array (`.` outputs the array).
- `> combined.json`: Write the result to `combined.json`.

Note: on a re-run, combined.json itself matches `*.json` and would be slurped back in; write the output outside the search tree or add `-not -name combined.json` to the find command.
Example Input Files:
- `data1.json`: `{"name": "Alice", "age": 30}`
- `subdir/data2.json`: `{"name": "Bob", "age": 25}`

Output (combined.json):
```json
[
  {"name": "Alice", "age": 30},
  {"name": "Bob", "age": 25}
]
```
Step 3: Merge JSON Objects (Instead of Arrays)#
If your JSON files are objects (not arrays) and you want to merge them into a single object (e.g., combining configs), use jq -s add (the add function merges objects; when the same key appears in multiple files, the value from the last file processed wins):
```shell
find . -type f -name "*.json" -print0 | xargs -0 jq -s add > merged_objects.json
```
Example Input Files:
- `config1.json`: `{"theme": "dark", "font": "Arial"}`
- `config2.json`: `{"font_size": 14, "notifications": true}`

Output (merged_objects.json):
```json
{
  "theme": "dark",
  "font": "Arial",
  "font_size": 14,
  "notifications": true
}
```
Method 2: Python Script (For Readability & Control)#
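As a bridge from jq to Python: jq's add resolves duplicate keys by letting the last file win, and the same semantics fall out of Python's dict unpacking. A sketch, with illustrative config values:

```python
import json

config1 = json.loads('{"theme": "dark", "font": "Arial"}')
config2 = json.loads('{"font": "Helvetica", "font_size": 14}')

# Later dicts overwrite earlier ones on duplicate keys, mirroring jq's `add`
merged = {**config1, **config2}
print(json.dumps(merged))  # {"theme": "dark", "font": "Helvetica", "font_size": 14}
```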
Python is ideal for users who prefer readable code or need custom logic (e.g., filtering files, handling errors).
Step 1: Recursively Find JSON Files#
Use Python’s os.walk to traverse directories and collect .json files:
```python
import os
import json

def find_json_files(root_dir):
    json_files = []
    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith(".json"):
                json_files.append(os.path.join(dirpath, filename))
    return json_files

# Usage: Find all JSON files starting from the current directory
json_files = find_json_files(".")
print(f"Found {len(json_files)} JSON files.")
```
Step 2: Read and Validate JSON Data#
Read each JSON file, load its contents, and handle errors (e.g., invalid JSON):
```python
combined_data = []  # Store all JSON data in an array

for file_path in json_files:
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)  # Parse JSON
        combined_data.append(data)
        print(f"Successfully read: {file_path}")
    except json.JSONDecodeError:
        print(f"⚠️ Skipping invalid JSON: {file_path}")
    except Exception as e:
        print(f"⚠️ Error reading {file_path}: {e}")
```
Step 3: Write Combined Data to Output File#
Dump the collected data into a single JSON file:
```python
output_file = "combined_python.json"
with open(output_file, "w", encoding="utf-8") as f:
    json.dump(combined_data, f, indent=2)  # indent=2 for pretty formatting
print(f"✅ Combined JSON saved to: {output_file}")
```
Full Python Script#
```python
import os
import json

def find_json_files(root_dir):
    json_files = []
    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith(".json"):
                json_files.append(os.path.join(dirpath, filename))
    return json_files

def main():
    root_dir = "."  # Start from the current directory
    json_files = find_json_files(root_dir)
    print(f"Found {len(json_files)} JSON files.")

    combined_data = []
    for file_path in json_files:
        try:
            with open(file_path, "r", encoding="utf-8") as f:
                data = json.load(f)
            combined_data.append(data)
            print(f"Read: {file_path}")
        except json.JSONDecodeError:
            print(f"⚠️ Skipping invalid JSON: {file_path}")
        except Exception as e:
            print(f"⚠️ Error reading {file_path}: {e}")

    output_file = "combined_python.json"
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(combined_data, f, indent=2)
    print(f"\n✅ Done! Combined data saved to: {output_file}")

if __name__ == "__main__":
    main()
```
Method 3: Node.js Script (JavaScript Users)#
For JavaScript developers, use Node.js’s fs (file system) and path modules to replicate the Python logic:
```javascript
const fs = require('fs');
const path = require('path');

// Recursively find all JSON files
function findJsonFiles(rootDir) {
  let jsonFiles = [];
  function traverse(dir) {
    const files = fs.readdirSync(dir);
    for (const file of files) {
      const fullPath = path.join(dir, file);
      if (fs.statSync(fullPath).isDirectory()) {
        traverse(fullPath); // Recurse into subdirectories
      } else if (path.extname(file) === '.json') {
        jsonFiles.push(fullPath);
      }
    }
  }
  traverse(rootDir);
  return jsonFiles;
}

// Main logic
const rootDir = '.';
const jsonFiles = findJsonFiles(rootDir);
console.log(`Found ${jsonFiles.length} JSON files.`);

const combinedData = [];
for (const file of jsonFiles) {
  try {
    const data = JSON.parse(fs.readFileSync(file, 'utf8'));
    combinedData.push(data);
    console.log(`Read: ${file}`);
  } catch (err) {
    console.log(`⚠️ Skipping ${file}: ${err.message}`);
  }
}

// Write output
const outputFile = 'combined_node.json';
fs.writeFileSync(outputFile, JSON.stringify(combinedData, null, 2));
console.log(`✅ Saved to: ${outputFile}`);
```
Run with: node combine-json.js
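Both scripts hand-roll the directory walk. In Python 3.5+, pathlib offers a more compact equivalent of the os.walk helper; a sketch matching the find_json_files contract used above:

```python
from pathlib import Path

def find_json_files(root_dir):
    """Recursively list .json files under root_dir, like the os.walk version."""
    return [str(p) for p in sorted(Path(root_dir).rglob("*.json")) if p.is_file()]
```

rglob("*.json") matches at any depth, and the is_file() check skips directories that happen to end in .json.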
Troubleshooting Common Issues#
| Issue | Solution |
|---|---|
| find: permission denied | Run with sudo (if needed) or exclude restricted directories: find . -type f -name "*.json" -not -path "./node_modules/*" |
| jq: command not found | Install jq via brew install jq (macOS) or apt install jq (Linux). |
| Invalid JSON in output | Use jq . combined.json to validate, or check for malformed input files. |
| Filenames with spaces | Use find -print0 + xargs -0 (command line) or Python/Node.js (handles spaces natively). |
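For the "Invalid JSON in output" row, jq is not strictly required: a few lines of Python report exactly where parsing fails. A sketch (check_json is a hypothetical helper name):

```python
import json

def check_json(path):
    """Return (True, 'valid') or (False, reason) for the file at path."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
        return True, "valid"
    except json.JSONDecodeError as e:
        return False, f"line {e.lineno}, column {e.colno}: {e.msg}"
```

For example, check_json("combined.json") pinpoints the line and column of the first syntax error instead of just failing.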
Best Practices#
- Backup First: Always back up original JSON files before merging.
- Validate Output: Use jq . combined.json or JSONLint to check validity.
- Handle Large Files: For 1000+ files, use streaming (e.g., Python’s ijson or Node.js streams) to avoid memory issues.
- Exclude Unneeded Files: Use find -not -path "./tmp/*" to skip directories like node_modules/ or tmp/.
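On the large-files point: even without ijson, the standard library can keep memory flat on the output side by parsing one input file at a time and writing array elements incrementally. A sketch (write_combined_streaming is a hypothetical name; each individual input file is still loaded whole):

```python
import json

def write_combined_streaming(paths, out_path):
    """Combine JSON files into one array, holding one parsed file in memory at a time."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("[\n")
        for i, p in enumerate(paths):
            with open(p, "r", encoding="utf-8") as f:
                data = json.load(f)  # only this file's data is resident
            if i:
                out.write(",\n")
            json.dump(data, out)
        out.write("\n]\n")
```

For truly huge individual files (not just many files), a streaming parser such as ijson is still the right tool.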
Conclusion#
Recursively combining JSON files is straightforward with the right tools. Choose:
- Command Line for speed (if you know find/jq).
- Python/Node.js for readability, error handling, or custom logic.
All methods ensure valid JSON output, whether as an array, merged object, or custom structure.