Is Backblaze B2 a Filesystem?
What is a filesystem?
The easy answer is that a filesystem is a system that holds files, but that simply opens up the question of what a file is, what is meant by holding a file, and further existential questions that usually devolve into unhelpful distinctions.
For this discussion, a file is an object composed of an arbitrary number of bytes, and a filesystem is some means of arranging these objects for access via some form of metadata. B2 is object storage, so it holds this sort of file, and it allows access to them by supplying a filename. A filename, in turn, is an identifying sequence of bytes, usually with some constraints on which bytes are permitted and in what order. Although these constraints may seem arbitrary, they generally reflect higher-level considerations (the name must be valid Unicode, cannot exceed 1024 bytes when encoded as UTF-8, and similar requirements).
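To make those constraints concrete, here is a minimal sketch of a filename check. The function name `name_problems` is hypothetical, and the specific rules (valid Unicode, at most 1024 UTF-8 bytes, no control characters) are assumptions modeled on B2's published naming rules, so check the current B2 documentation before relying on them.

```python
MAX_NAME_BYTES = 1024  # assumed limit, counted in UTF-8 bytes rather than characters


def name_problems(name: str) -> list:
    """Return the reasons a candidate filename would be rejected (empty list if none)."""
    try:
        encoded = name.encode("utf-8")
    except UnicodeEncodeError:
        # Python strings can contain lone surrogates, which are not valid Unicode text.
        return ["not valid Unicode"]
    problems = []
    if not encoded:
        problems.append("empty name")
    if len(encoded) > MAX_NAME_BYTES:
        problems.append(f"longer than {MAX_NAME_BYTES} bytes")
    # Assumed rule: no control characters (codes below 32, plus DEL).
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in name):
        problems.append("contains control characters")
    return problems
```

Note that the limit is in bytes: a name made of 513 copies of "é" is only 513 characters long, but it encodes to 1026 bytes and would be rejected.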
A filesystem often arranges files by filenames — but it does not have to; there are other possibilities. Fortunately, Backblaze B2 storage uses filenames, so those other methods need only be acknowledged, not further discussed.
Backblaze B2 storage is a filesystem
Backblaze B2 storage is organized by buckets and filenames. A bucket is not a physical object but a virtual one (technically, a namespace). Each B2 bucket has a unique name, and within a bucket every file has a unique name that identifies it. (Uploading a file under a name that already exists adds a new version of that file rather than creating a second file.)
The B2 Storage Filesystem
The B2 filesystem is a flat filesystem. Every file within a bucket has a unique filename, and there is no inherent organization of files beneath the bucket. A hierarchical filesystem, by contrast, arranges files into nested collections, usually referred to as folders and subfolders, or directories and subdirectories.
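As a toy illustration of the difference (plain in-memory Python dictionaries, not how B2 or any real filesystem stores data), a flat namespace finds a file with one lookup on its full name, while a hierarchy walks one level per path component. The names `flat_lookup` and `tree_lookup` are hypothetical.

```python
# Flat namespace: one mapping keyed by the full filename.
flat_bucket = {
    "foo/bar/prozog.txt": b"hello",
    "foo/readme.txt": b"notes",
}

# Hierarchical namespace: nested mappings, one level per directory.
tree = {
    "foo": {
        "bar": {"prozog.txt": b"hello"},
        "readme.txt": b"notes",
    },
}


def flat_lookup(bucket, name):
    # One lookup: the "/" characters are simply part of the key.
    return bucket[name]


def tree_lookup(root, path):
    # One lookup per path component, like opening each directory in turn.
    node, steps = root, 0
    for part in path.split("/"):
        node = node[part]
        steps += 1
    return node, steps
```

Here `tree_lookup(tree, "foo/bar/prozog.txt")` takes three steps to reach the data, while the flat version takes one, and the flat mapping has no entry for `foo/bar` at all.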
B2 storage does not have folders
B2 storage appears to have folders, but it does not. A file named foo/bar/prozog.txt has exactly that as its name: the two / characters are part of the name. It is not a file named prozog.txt in a directory bar, which in turn sits in a directory foo under the root directory; it is a single file named foo/bar/prozog.txt in a B2 storage bucket. Backblaze B2 restricts where a / character may occur in a filename (not at the start, not at the end, and not twice in a row), and as a result, moving files between a Backblaze B2 bucket and a hierarchical filesystem (such as Linux, MacOS, and Windows generally use) is as transparent as possible.
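A minimal sketch of that round trip, assuming the slash rules just described: a local path relative to some root becomes a single flat name joined with "/", and the name can be split back into directory components. The helper names are hypothetical, and real sync tools handle many more edge cases.

```python
from pathlib import PurePosixPath, PureWindowsPath


def slash_rule_ok(name: str) -> bool:
    # Assumed rules: "/" not at the start, not at the end, never two in a row.
    return not name.startswith("/") and not name.endswith("/") and "//" not in name


def to_object_name(relative_path) -> str:
    """Turn a path relative to some local root into one flat object name."""
    # as_posix() joins the components with "/" whatever the local OS uses.
    name = relative_path.as_posix()
    if not slash_rule_ok(name):
        raise ValueError(f"cannot store {name!r} as an object name")
    return name


def to_local_parts(name: str) -> tuple:
    """Split a flat object name back into directory components plus a filename."""
    return tuple(name.split("/"))
```

For example, the Windows path `foo\bar\prozog.txt` maps to the single name `foo/bar/prozog.txt`, while an absolute path such as `/foo/bar.txt` is rejected because its name would start with "/".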
As a side note, a B2 bucket may contain 0-byte files whose names end in .bzempty, such as bar/foo/.bzempty. These placeholder files make it appear that there is a folder foo inside the folder bar, which lets Backblaze B2 storage represent empty directories and complete the illusion of a hierarchical filesystem.
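A rough sketch of how a folder view could be derived from flat names plus these placeholders. It assumes the convention described above (a zero-byte object named `<folder>/.bzempty` stands for an empty folder); the helper names are hypothetical, and individual tools may treat the markers differently.

```python
MARKER = ".bzempty"


def folder_marker(folder):
    # Name of the zero-byte placeholder that represents an otherwise empty folder.
    return f"{folder.rstrip('/')}/{MARKER}"


def implied_folders(names):
    """Every folder implied by a flat collection of names, including marked empty ones."""
    folders = set()
    for name in names:
        parts = name.split("/")
        # Each run of leading components that ends before a "/" reads as a folder.
        for i in range(1, len(parts)):
            folders.add("/".join(parts[:i]))
    return folders


def visible_files(names):
    # Placeholders are bookkeeping, so a browser-style view hides them.
    return [n for n in names if n.rsplit("/", 1)[-1] != MARKER]
```

With the names `foo/bar/prozog.txt` and `bar/foo/.bzempty`, the implied folders are `foo`, `foo/bar`, `bar`, and `bar/foo`, yet only `foo/bar/prozog.txt` shows up as a file.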
Linux, MacOS, and Windows Hierarchical Filesystems
The many variations of Linux, Apple’s MacOS, and Microsoft’s Windows come with a hierarchical (folders and subfolders) filesystem, and it is this organization that most people think of when they think of a filesystem. This approach makes it possible to organize thousands, or even millions, of files in a way that is familiar to users and easy to understand through a physical-world metaphor: bookshelves holding cabinets holding drawers holding files holding documents.
Why doesn’t Backblaze B2 storage use a hierarchy?
Backblaze B2’s flat filesystem speeds access because Backblaze does not have to open and parse a chain of folder directories to locate a file. With a little care, a developer can still make Backblaze B2 storage look like the hierarchical filesystem users are familiar with by listing files filtered by prefix: asking Backblaze B2 for “all the files that start with foo/bar/” returns pretty much what the contents of folder bar in the top-level folder foo would be. This provides the advantages of both approaches.
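B2’s `b2_list_file_names` call accepts `prefix` and `delimiter` parameters that produce this kind of folder view on the server side. The sketch below only emulates the idea locally over a list of names; `list_folder` is a hypothetical helper, not Backblaze’s implementation.

```python
def list_folder(names, prefix="", delimiter="/"):
    """Emulate a one-level folder listing over a flat collection of names."""
    files, folders = [], []
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            # No further delimiter: the name sits directly "inside" the prefix.
            files.append(name)
        else:
            # Deeper names collapse into one folder entry ending in the delimiter.
            folder = prefix + rest[: cut + 1]
            if folder not in folders:
                folders.append(folder)
    return files, folders
```

Given `foo/bar/prozog.txt`, `foo/bar/baz/deep.txt`, `foo/readme.txt`, and `other.txt`, listing with prefix `foo/` yields the file `foo/readme.txt` and the single folder `foo/bar/`, just as opening folder foo in a file browser would.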
Accessing B2 as Hierarchical Storage
In the simplest sense, all the high-level tools access B2 as if it had folders and directories. The native B2 web interface displays folder icons, and clicking on a folder icon takes one to the files one would expect to see. Our integration partners’ tools (such as Cyberduck, FileZilla Pro, ForkLift, Panic Transmit, and others) all take the same approach, displaying the files as if they came from an underlying folder.
Mounting B2 as a Filesystem
If the Backblaze B2 filesystem is so easy to present as a hierarchical filesystem, is it possible to use it as if it were a local one? That is, can applications access Backblaze B2 storage directly, as if they were reading and writing to a local disk? This is referred to as “mounting” Backblaze B2 storage. (The term dates from when computer storage came in the form of removable media such as tapes or removable disks, and an operator would mount a tape on a tape drive.)
In practice, this makes Backblaze B2 storage available as either a disk or a folder in the existing filesystem, and one can run commands such as ls (Linux/Unix and Mac) or dir (Microsoft Windows) against it. A number of our integration partners’ tools can do this: CloudMounter, ExpanDrive, ForkLift, Mountain Duck, and others. Editing documents or reading a database directly from Backblaze B2, without first downloading them to a local directory and then uploading them again, is a fantastic goal.
That goal is not always realistic. Backblaze B2 storage is object storage, intended to preserve files without change and to access those unchanging files quickly and efficiently. That is not what a typical application working with data on disk expects: it expects fast local disk access to read and write parts of a file. Although mounting tools can read selected parts of a file, they must upload the entire file again whenever they emulate saving to disk, even if only a few bytes changed. Also, unlike most filesystems, each rewrite creates another version of the file in B2 storage; Backblaze B2 only deletes files when told to do so explicitly via a delete command, or implicitly through a bucket’s lifecycle rules.
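A toy model makes the mismatch visible. The in-memory class below is a hypothetical stand-in for a versioned object store, not the B2 API: reads can fetch a byte range of the latest version (B2 downloads support HTTP Range requests), but changing even a few bytes means uploading the whole object again, which adds a version.

```python
class ToyVersionedBucket:
    """In-memory stand-in for an object store that keeps every uploaded version."""

    def __init__(self):
        self._versions = {}  # name -> list of byte strings, oldest first

    def upload(self, name, data):
        # Objects are written whole; uploading an existing name adds a version.
        self._versions.setdefault(name, []).append(bytes(data))
        return len(self._versions[name])

    def read(self, name, start=0, end=None):
        # Reads can fetch part of the latest version (end is inclusive,
        # mirroring an HTTP Range header such as "bytes=0-4").
        latest = self._versions[name][-1]
        stop = len(latest) if end is None else end + 1
        return latest[start:stop]

    def version_count(self, name):
        return len(self._versions.get(name, []))

    def delete_all_versions(self, name):
        # Nothing disappears unless it is explicitly deleted.
        self._versions.pop(name, None)


def save_edit(bucket, name, offset, patch):
    """Change a few bytes the only way a whole-object store allows: re-upload everything."""
    whole = bytearray(bucket.read(name))       # fetch the full current version
    whole[offset:offset + len(patch)] = patch  # apply the small local change
    return bucket.upload(name, bytes(whole))   # upload the full object as a new version
```

Capitalizing "hello" in an 11-byte file still transfers all 11 bytes and leaves two stored versions; with a multi-gigabyte database file, that pattern quickly becomes impractical.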
This approach is good for browsing archived files, understanding what is there, and pulling them down (or putting them up) manually, but it is not a seamless extension that provides both expandable storage and fast access. If mounting Backblaze B2 directly as a filesystem isn’t the right approach, what is? Is there a way to use Backblaze B2 object storage to extend local storage seamlessly?
A Hierarchical Filestore backed by Backblaze B2
Storage gateways go a step beyond applications that mount Backblaze B2 remotely and pretend that this remote object storage is a local disk. These applications combine a local cache with remote storage to present fast and apparently unlimited storage. If a file is not present locally, the gateway pulls it from B2, and it is effectively local until it is no longer needed. Changes are written back to B2 at regular intervals (but not as often as every disk write). Once a file is no longer being accessed regularly, the gateway lets the cached copy expire, freeing local space for data that is in use. These are more complex applications and installations, consisting of sophisticated software and a significant amount of local storage for cache, and some of these tools offer other storage capabilities as well.
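The gateway pattern can be sketched in a few dozen lines. This is purely illustrative and assumes a plain dictionary standing in for the B2 bucket, least-recently-used eviction as the "expiry" policy, and a time-based write-back interval; `ToyGateway` is hypothetical, and OpenDedup, StarWind, and Tiger Bridge each have their own, far more sophisticated designs.

```python
import time
from collections import OrderedDict


class ToyGateway:
    """A local LRU cache in front of a remote store, with periodic write-back."""

    def __init__(self, remote, capacity, flush_interval=60.0, clock=time.monotonic):
        self.remote = remote              # dict-like stand-in for the B2 bucket
        self.capacity = capacity          # maximum number of files kept locally
        self.flush_interval = flush_interval
        self.clock = clock
        self.cache = OrderedDict()        # name -> bytes, least recently used first
        self.dirty = set()                # names changed locally but not yet uploaded
        self.last_flush = clock()

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.remote[name]  # cache miss: pull the file from remote
        self.cache.move_to_end(name)              # mark as most recently used
        data = self.cache[name]
        self._evict()
        return data

    def write(self, name, data):
        self.cache[name] = data                   # writes land in the local cache first
        self.cache.move_to_end(name)
        self.dirty.add(name)
        if self.clock() - self.last_flush >= self.flush_interval:
            self.flush()                          # periodic upload, not one per write
        self._evict()

    def flush(self):
        for name in self.dirty:
            self.remote[name] = self.cache[name]
        self.dirty.clear()
        self.last_flush = self.clock()

    def _evict(self):
        while len(self.cache) > self.capacity:
            name, data = self.cache.popitem(last=False)  # least recently used entry
            if name in self.dirty:
                self.remote[name] = data                 # upload unsaved changes before dropping
                self.dirty.discard(name)
```

The design choice worth noticing is that eviction never discards unsaved data: a dirty file is uploaded before its local copy is dropped, so local space can be reclaimed without waiting for the next scheduled flush.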
OpenDedup, StarWind, and Tiger Bridge all present a local cached filesystem that can offload less-used data to Backblaze B2 storage. OpenDedup, for example, provides deduplication as well as multisystem access to data (that is, multiple computers can use and share B2-backed data through OpenDedup). StarWind enables an expanded filesystem spanning multiple locations and also supports virtual automated tape libraries, as does Tiger Bridge.