funwithlinux blog

Should the FUSE getattr Operation Always Be Serialised? Implications for RESTful API-Backed Filesystems

Filesystems are the backbone of how we interact with data, but traditional filesystems are bound to local storage. Today, with the rise of cloud storage and remote data access, userspace filesystems—powered by frameworks like FUSE (Filesystem in Userspace)—have become indispensable. FUSE allows developers to implement filesystems in userspace, enabling innovative use cases like mounting RESTful API-backed storage (e.g., AWS S3, Google Drive, or custom APIs) as local filesystems.

At the heart of any filesystem lies metadata: information like file size, permissions, and modification time. In FUSE, the getattr operation retrieves this metadata, making it one of the most frequently called operations (e.g., when listing directories with ls, opening files, or previewing file details). For RESTful API-backed filesystems, each getattr often translates to a remote API call (e.g., HEAD or GET requests to fetch metadata), introducing latency and dependencies on external services.

A critical question arises: Should getattr operations always be serialised (processed one at a time), or is parallelism (processing multiple simultaneously) acceptable? This decision impacts performance, reliability, and consistency—especially for filesystems relying on REST APIs with rate limits, latency, and caching constraints.

In this blog, we’ll dissect the tradeoffs of serialising vs. parallelising getattr, explore implications for RESTful API-backed filesystems, and outline best practices to strike the right balance.

2026-01

Table of Contents#

  1. Understanding FUSE and the getattr Operation
  2. What Does "Serialising getattr" Mean?
  3. Arguments in Favor of Serialising getattr
  4. Arguments Against Serialising getattr
  5. Implications for RESTful API-Backed Filesystems
  6. Best Practices: When to Serialise, When to Parallelise
  7. Conclusion
  8. References

1. Understanding FUSE and the getattr Operation#

What is FUSE?#

FUSE (Filesystem in Userspace) is a kernel module and userspace library that enables non-privileged users to create custom filesystems without writing kernel code. It acts as a bridge between the kernel’s VFS (Virtual File System) layer and a userspace process implementing the filesystem logic. When a user interacts with the filesystem (e.g., ls, cat), the kernel forwards requests (like getattr, read, or write) to the FUSE userspace process, which handles them and returns results to the kernel.

The Role of getattr#

The getattr operation (short for "get attributes") is a cornerstone of FUSE. Its primary job is to retrieve metadata for a file or directory, which the kernel uses to populate the struct stat data structure. Key fields in struct stat include:

  • st_size: File size in bytes.
  • st_mtime: Last modification time.
  • st_mode: File type and permissions (e.g., regular file, directory, read/write access).
  • st_nlink: Number of hard links.
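To make this concrete, here is a minimal sketch of how a getattr implementation might populate these fields from a metadata endpoint's JSON response. The response keys (`type`, `size`, `modified`) are hypothetical, not any real API's schema:

```python
import stat

def api_metadata_to_stat(meta):
    """Translate a (hypothetical) REST metadata response into the
    fields a FUSE getattr handler fills into struct stat."""
    is_dir = meta.get("type") == "directory"
    mode = (stat.S_IFDIR | 0o755) if is_dir else (stat.S_IFREG | 0o644)
    return {
        "st_mode": mode,                      # file type + permissions
        "st_size": meta.get("size", 0),       # size in bytes
        "st_mtime": meta.get("modified", 0),  # last modification (epoch secs)
        "st_nlink": 2 if is_dir else 1,       # conventional link counts
    }

# Example: a body a metadata endpoint might return for one file
attrs = api_metadata_to_stat({"type": "file", "size": 1024,
                              "modified": 1700000000})
```

In a real FUSE binding (e.g. libfuse's C API or a Python wrapper) this dict maps one-to-one onto the `struct stat` the kernel expects back.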

When is getattr called? Almost every interaction with the filesystem triggers getattr:

  • Running ls -l (to display file sizes and permissions).
  • Opening a file (to check if it exists and has read/write permissions).
  • File managers previewing directory contents.
  • Tools like find or grep scanning directories.

Given its frequency, getattr performance directly impacts perceived filesystem responsiveness.

2. What Does "Serialising getattr" Mean?#

In the context of FUSE, serialisation refers to processing getattr requests sequentially: one request is completed before the next starts. In contrast, parallelism allows multiple getattr requests to be processed simultaneously (e.g., overlapping network calls to a REST API).

FUSE’s Default Behavior#

By default, libfuse runs a multithreaded event loop (fuse_loop_mt): the kernel may have many requests in flight at once, and the library hands each one to a worker thread, so getattr calls can already execute concurrently. Mounting with -s (or driving the filesystem with the single-threaded fuse_loop) processes requests strictly one at a time instead. The low-level API goes further: a handler can return before answering and deliver the result later with fuse_reply_attr and friends, enabling fully asynchronous processing.

Thus, whether getattr is serialised or parallelised depends on how the FUSE filesystem is built and run. Serialising requires the single-threaded loop or explicit locking/queuing inside the handlers, while parallelism falls out of the default multithreaded loop (or asynchronous replies) with no extra work.
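As a sketch of the "explicit locking" option, a filesystem running under a multithreaded dispatcher can still force one-at-a-time getattr handling with a single mutex. `fetch` here is a stand-in for the real metadata lookup:

```python
import threading

class SerialisedGetattr:
    """Wrap a getattr-style callable so calls run one at a time,
    mimicking a single-threaded FUSE loop even when the dispatcher
    is multithreaded."""
    def __init__(self, fetch):
        self._fetch = fetch              # stand-in for the real lookup
        self._lock = threading.Lock()    # one request at a time

    def getattr(self, path):
        with self._lock:                 # next caller blocks here
            return self._fetch(path)

wrapped = SerialisedGetattr(lambda path: {"st_size": len(path)})
```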

3. Arguments in Favor of Serialising getattr#

Serialising getattr may seem counterintuitive for performance, but it offers benefits in specific scenarios—especially for RESTful API-backed filesystems.

3.1 Avoiding REST API Rate Limits#

Most REST APIs (e.g., AWS S3, Azure Blob Storage) enforce rate limits to prevent abuse. For example, S3 allows up to 5,500 requests per second per prefix for metadata operations like HEAD (which getattr often uses). If a filesystem parallelises getattr, it could exceed these limits, triggering throttling (HTTP 429 errors) or temporary bans. Serialising ensures requests are paced to stay under the API’s quota.
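Pacing a serialised stream is straightforward: enforce a minimum interval between outgoing calls. A minimal thread-safe sketch (the 100 requests/second figure is illustrative, not tied to any particular API):

```python
import threading
import time

class Pacer:
    """Enforce a minimum interval between outgoing API calls so a
    serialised getattr stream stays under a requests-per-second quota."""
    def __init__(self, max_per_second):
        self._interval = 1.0 / max_per_second
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request slot; call before each API hit."""
        with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            self._next_allowed = max(now, self._next_allowed) + self._interval
        if delay > 0:
            time.sleep(delay)

pacer = Pacer(max_per_second=100)   # at most 1 request every 10 ms
```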

3.2 Reducing Network Congestion#

Parallel getattr requests can flood the network with concurrent API calls, leading to increased latency (due to packet loss or contention) or resource exhaustion on the client side (e.g., exhausting ephemeral ports). Serialising limits concurrent network usage, making it easier to manage bandwidth and avoid timeouts.

3.3 Ensuring Cache Consistency#

Many FUSE filesystems cache metadata to reduce API calls. If two parallel getattr requests for the same file arrive simultaneously, both may miss the cache and issue redundant API calls, or one may read a stale entry while the other is mid-update. Serialising (at least per file) makes the check-then-update sequence atomic: the first request refreshes the cache, and subsequent requests use the fresh data.
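A sketch of that per-file atomicity: one lock per path makes the cache's check-then-fetch-then-store sequence atomic, while requests for different paths still proceed independently. `fetch` again stands in for the remote lookup:

```python
import threading
from collections import defaultdict

class MetadataCache:
    """Serialise getattr per path: concurrent requests for the same
    file share one fetch; different files don't block each other."""
    def __init__(self, fetch):
        self._fetch = fetch                        # remote metadata lookup
        self._cache = {}
        self._locks = defaultdict(threading.Lock)  # one lock per path
        self._guard = threading.Lock()             # protects the lock table

    def getattr(self, path):
        with self._guard:
            lock = self._locks[path]
        with lock:                  # atomic check-then-update per path
            if path not in self._cache:
                self._cache[path] = self._fetch(path)
            return self._cache[path]

calls = []
cache = MetadataCache(lambda p: calls.append(p) or {"st_size": 1})
cache.getattr("/f")
cache.getattr("/f")   # second call is served from the cache
```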

3.4 Simplifying Error Handling#

REST APIs can return transient errors (e.g., 5xx status codes). With parallelism, retrying failed getattr requests becomes complex—retries might overlap, exacerbating throttling. Serialising centralises error handling: failed requests can be retried sequentially without conflicting with new ones.
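With serialised requests, the retry policy can live in one place. A sketch of sequential retry with exponential backoff on transient statuses (`ApiError` and `fetch` are illustrative placeholders, not a real client library):

```python
import time

TRANSIENT = {429, 500, 502, 503, 504}   # statuses worth retrying

class ApiError(Exception):
    def __init__(self, status):
        self.status = status

def getattr_with_retry(fetch, path, attempts=4, base_delay=0.1):
    """Retry a metadata fetch on transient HTTP errors with exponential
    backoff. Because calls are serialised, retries never overlap with
    new requests."""
    for attempt in range(attempts):
        try:
            return fetch(path)
        except ApiError as e:
            if e.status not in TRANSIENT or attempt == attempts - 1:
                raise                              # permanent, or out of tries
            time.sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
```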

4. Arguments Against Serialising getattr#

While serialisation addresses rate limits and consistency, it introduces significant drawbacks—especially for performance-critical use cases.

4.1 Crippling Latency#

REST API calls inherently involve network round-trips (RTT), which can range from 50ms (edge networks) to 500ms+ (long-distance or high-latency links). If getattr is serialised, processing N requests takes N × RTT time. For example:

  • A directory with 100 files would take 100 × 100ms = 10 seconds to list with ls (serialised).
  • With parallelism (e.g., 10 concurrent requests), this drops to ~1 second (100/10 × 100ms).

For interactive use (e.g., file managers, command-line tools), this latency is unacceptable.
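The arithmetic is easy to demonstrate by simulating the round-trip with a sleep (20 ms here, chosen only to keep the demo fast) and comparing a serial loop against a bounded thread pool:

```python
import time
from concurrent.futures import ThreadPoolExecutor

RTT = 0.02   # simulate a 20 ms metadata round-trip

def fake_getattr(path):
    time.sleep(RTT)            # stand-in for the network call
    return {"st_size": 0}

paths = [f"/file{i}" for i in range(10)]

start = time.monotonic()
for p in paths:                # serialised: N x RTT
    fake_getattr(p)
serial = time.monotonic() - start

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:   # bounded parallelism
    list(pool.map(fake_getattr, paths))
parallel = time.monotonic() - start
```

On an idle machine the serial pass takes roughly N × RTT while the parallel pass approaches a single RTT, mirroring the ls example above.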

4.2 Underutilising Modern Hardware#

Modern systems have multi-core CPUs and high-bandwidth networks. Serialising getattr wastes these resources, as the filesystem process sits idle waiting for one API call to complete before starting the next. Parallelism leverages concurrency to keep CPUs and networks busy.

4.3 Over-Reliance on Caching#

Serialisation often forces developers to rely heavily on caching to mask latency. While caching helps, it introduces complexity:

  • Stale data: Short TTLs (time-to-live) reduce staleness but increase API calls; long TTLs risk serving outdated metadata (e.g., a file that was deleted remotely but still appears in ls).
  • Cache invalidation: Writing to a file requires invalidating its cache entry, but distributed systems (e.g., multi-user cloud storage) make invalidation non-trivial.

4.4 Not All APIs Have Strict Rate Limits#

Many REST APIs (e.g., self-hosted services, internal APIs) have lenient or no rate limits. Serialising getattr here is unnecessary and only degrades performance.

5. Implications for RESTful API-Backed Filesystems#

RESTful API-backed filesystems (e.g., s3fs, rclone mount, or custom implementations) face unique challenges that amplify the serialisation vs. parallelism debate. Let’s break down the key implications:

5.1 Latency vs. Rate Limits: The Core Tension#

Cloud APIs like S3 or Google Cloud Storage have high rate limits (thousands of requests/second) but non-trivial latency. For these, parallelism is often safe and performant—if the filesystem stays under the rate limit. For smaller APIs (e.g., a custom in-house REST service with 100 requests/second limits), serialisation or bounded parallelism (e.g., 10 concurrent requests) is necessary to avoid throttling.

5.2 Idempotency and Safety of REST GET#

REST GET requests (used for getattr via HEAD or GET methods) are idempotent (repeated calls have the same effect) and safe (no side effects). This makes parallel getattr inherently low-risk: even if two requests for the same file overlap, they won’t corrupt data or leave the filesystem in an inconsistent state.

5.3 Caching as a Force Multiplier#

Caching can reduce the number of getattr requests, mitigating the need for strict serialisation. For example:

  • Negative caching: Cache "file not found" responses to avoid repeated API calls for non-existent files.
  • Write-through caching: Invalidate or update the cache when a file is modified (via write or mkdir), ensuring subsequent getattr calls use fresh data.
  • TTL tuning: Set TTLs based on how often data changes (e.g., 5 minutes for logs, 1 hour for static assets).
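The TTL, negative-caching, and invalidation ideas above can be combined in one small structure. This is a sketch, not a production cache (no size bound, no locking); `fetch` returns a metadata dict or None for a missing file:

```python
import time

class TTLCache:
    """Metadata cache with per-entry TTL; missing files are cached too
    (negative caching) so repeated lookups skip the API."""
    MISSING = object()          # sentinel for "file not found"

    def __init__(self, fetch, ttl=300.0):
        self._fetch = fetch     # returns metadata dict, or None if absent
        self._ttl = ttl
        self._entries = {}      # path -> (value, expiry)

    def getattr(self, path):
        hit = self._entries.get(path)
        if hit and hit[1] > time.monotonic():
            value = hit[0]
        else:
            value = self._fetch(path)
            if value is None:
                value = self.MISSING
            self._entries[path] = (value, time.monotonic() + self._ttl)
        if value is self.MISSING:
            raise FileNotFoundError(path)
        return value

    def invalidate(self, path):
        """Call on write/unlink (write-through invalidation)."""
        self._entries.pop(path, None)
```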

5.4 Batching: A Middle Ground#

Some REST APIs support batch metadata requests (e.g., S3’s ListObjectsV2 for directory listings). Batching reduces the number of getattr calls by fetching metadata for multiple files in a single API request. For example, listing a directory with ListObjectsV2 returns st_size and st_mtime for all objects, eliminating the need for per-file getattr calls. Batching makes serialisation less painful by reducing the total number of round-trips.
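A sketch of that idea: seed a metadata table from one batched listing so later per-file getattr calls need no extra round-trips. The response shape (`Contents`/`Key`/`Size`/`LastModified`) loosely mirrors an S3 ListObjectsV2 result but is hand-built here, not a real client call:

```python
import stat

def stats_from_listing(listing):
    """Pre-populate per-file stat entries from one batched listing
    response, eliminating per-file metadata round-trips."""
    table = {}
    for obj in listing["Contents"]:
        table["/" + obj["Key"]] = {
            "st_mode": stat.S_IFREG | 0o644,
            "st_size": obj["Size"],
            "st_mtime": obj["LastModified"],
            "st_nlink": 1,
        }
    return table

listing = {"Contents": [
    {"Key": "logs/a.log", "Size": 512,  "LastModified": 1700000000},
    {"Key": "logs/b.log", "Size": 2048, "LastModified": 1700000100},
]}
table = stats_from_listing(listing)   # one API call, two files' metadata
```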

5.5 Consistency Models#

REST APIs often exhibit eventual consistency (S3 was the classic example until it moved to strong read-after-write consistency in December 2020; many other object stores and custom APIs still propagate updates asynchronously). Serialising getattr won’t guarantee strong consistency here—even sequential requests might return stale data if the backend hasn’t propagated updates yet. Thus, strict serialisation for consistency is often unnecessary in this context.

6. Best Practices: When to Serialise, When to Parallelise#

There’s no one-size-fits-all answer, but these guidelines can help decide:

Serialise When…#

  • API rate limits are tight: If the API allows <100 requests/second, serialise with pacing (e.g., 1 request every 10ms) to avoid throttling.
  • Latency is predictable and low: If API RTT is <20ms, serialisation may not feel sluggish for small directories.
  • Cache consistency is critical: For read-write filesystems where stale metadata could cause errors (e.g., editing a file that was remotely deleted), serialise to ensure cache updates are atomic.

Parallelise When…#

  • API rate limits are high: Cloud APIs such as S3 (5,500 GET/HEAD requests per second per prefix) publish high scalability targets and can absorb substantial parallelism safely; Azure Blob Storage documents similarly generous per-account limits.
  • Interactive use is prioritised: File managers or ls commands require sub-second responsiveness—parallelise with bounded concurrency (e.g., 10–20 concurrent requests).
  • Caching is aggressive: If most getattr calls hit the cache, parallelism has minimal API impact but still speeds up cache misses.

Hybrid Strategies#

  • Per-inode serialisation: Serialise getattr for the same file (to avoid cache conflicts) but parallelise for different files.
  • Token bucket limiting: Use a token bucket to allow parallelism while respecting rate limits (e.g., 100 tokens/second, refilled at 10 tokens/second, allowing bursts up to 100 requests).
  • Adaptive concurrency: Dynamically adjust parallelism based on API latency or throttling (e.g., reduce concurrency if 429 errors are detected).
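The token bucket strategy above can be sketched in a few lines: each API call takes a token, tokens refill at a fixed rate, and a full bucket permits bursts. This is a minimal thread-safe version; the capacity and refill numbers are illustrative:

```python
import threading
import time

class TokenBucket:
    """Allow parallel getattr bursts while bounding the sustained
    request rate: take one token per API call, refill at a fixed rate."""
    def __init__(self, capacity, refill_per_second):
        self._capacity = capacity
        self._tokens = float(capacity)
        self._rate = refill_per_second
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def acquire(self):
        """Block until a token is available, then consume it."""
        while True:
            with self._lock:
                now = time.monotonic()
                self._tokens = min(
                    self._capacity,
                    self._tokens + (now - self._last) * self._rate)
                self._last = now
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                shortfall = (1 - self._tokens) / self._rate
            time.sleep(shortfall)   # wait for the next token, lock released

bucket = TokenBucket(capacity=100, refill_per_second=10)
```

Every worker thread calls `bucket.acquire()` before its HEAD/GET, so concurrency stays unbounded within the burst budget but the sustained rate never exceeds the refill rate.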

7. Conclusion#

The question of whether to serialise FUSE’s getattr operation has no universal answer. For RESTful API-backed filesystems, the decision hinges on balancing:

  • Performance requirements (interactive vs. batch use).
  • API constraints (rate limits, latency, batching support).
  • Consistency needs (stale metadata tolerance).

Key takeaways:

  • Serialise to avoid rate limits, simplify error handling, or ensure cache consistency—but only if latency is acceptable.
  • Parallelise for interactive use, high-rate-limit APIs, or when caching reduces API calls—but bound concurrency to avoid throttling.
  • Use caching and batching aggressively to reduce getattr frequency, making both strategies more effective.

Ultimately, the best approach is context-dependent. By profiling API behavior, testing with realistic workloads, and prioritising user experience, developers can craft a getattr strategy that balances speed, reliability, and compliance.

8. References#