Object storage is a type of data storage architecture that stores data as objects rather than files or blocks. In an object storage system, data is stored as discrete units of information that include the data itself, metadata describing the data, and a unique identifier for the object.
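
As an illustration of the three parts named above (data, metadata, identifier), a single object could be modelled roughly as follows. This is only a sketch: the class name `StoredObject`, its fields, and the example key and metadata values are hypothetical and not taken from any particular object storage product.

```python
from dataclasses import dataclass, field, FrozenInstanceError
import hashlib


@dataclass(frozen=True)
class StoredObject:
    """One object: a unique key, the payload bytes, and descriptive metadata."""
    key: str                      # unique identifier of the object
    data: bytes                   # the payload itself
    metadata: dict = field(default_factory=dict)  # free-form description

    @property
    def checksum(self) -> str:
        # A content checksum is a common piece of derived metadata.
        return hashlib.sha256(self.data).hexdigest()


# Hypothetical example object
obj = StoredObject(
    key="experiments/run-001/output.nc",
    data=b"example payload",
    metadata={"content-type": "application/octet-stream", "owner": "alice"},
)
print(obj.key, len(obj.data), obj.metadata["owner"])
```

The dataclass is frozen so that its fields cannot be reassigned after creation, which loosely mirrors the idea that an object is written as one unit.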

Unlike file and block storage systems, object storage systems do not organize data into a hierarchical directory structure. Instead, objects are stored in a flat address space, which makes it easier to store and manage large amounts of data: only the key is needed to identify an object, and no index or hierarchy needs to be maintained. Another difference is that objects cannot be modified in place; they must be replaced as a whole. Unlike in a filesystem, consistency and atomicity are not guaranteed while a replacement is in progress. Not allowing in-place changes also simplifies caching.
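
The flat address space and the replace-only behaviour can be sketched with a toy in-memory store. This is an assumption-laden illustration, not a real client: the class `FlatObjectStore` and its `put`/`get` methods are hypothetical names, and a real service adds networking, durability, and access control.

```python
class FlatObjectStore:
    """Toy object store: one flat mapping from key to bytes, no directories."""

    def __init__(self):
        self._objects = {}  # key (str) -> payload (bytes)

    def put(self, key, data):
        # A put always stores the complete object; an existing object
        # under the same key is replaced as a whole.
        self._objects[key] = bytes(data)

    def get(self, key):
        # Only the key is needed to find the object.
        return self._objects[key]


store = FlatObjectStore()
store.put("logs/2024/app.log", b"line 1\n")

# There is no append or partial write: to "change" an object, read it,
# build the new content locally, and replace the whole object.
updated = store.get("logs/2024/app.log") + b"line 2\n"
store.put("logs/2024/app.log", updated)
print(store.get("logs/2024/app.log"))
```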

Object storage is commonly used in cloud-based environments because of its scalability and cost-effectiveness. It is well suited for situations where large amounts of data need to be stored and accessed independently by many processes. Typically an additional indexing mechanism, such as a database (or a simple text file of keys), is needed for fast lookups.

Difference between filesystem and object storage


Filesystem | Object storage
Structure

File storage is organized into a strict tree-like hierarchy with directories, sub-directories, and so on. To access a stored file, you must follow a specific path to it.

Image from: https://www.datacore.com/blog/file-object-storage-differences/

Object storage, on the other hand, uses a “flat” address space. Each stored object has a unique identifier plus detailed metadata that makes it easy to find among potentially billions of other objects. While an object storage path might look like bucketname:path/to/a/file, there aren't actually any directories: the name (key) of that object just happens to have forward slashes in it.

Image from: https://www.datacore.com/blog/file-object-storage-differences/
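
To illustrate that "directories" in object storage are only a naming convention, the sketch below emulates a directory listing by filtering a flat list of keys on a prefix. The function `list_prefix` and the example keys are hypothetical; the idea is similar in spirit to the prefix and delimiter options that many object stores offer for listing.

```python
# Flat list of keys; the slashes are ordinary characters in the names.
keys = [
    "data/2023/jan.csv",
    "data/2023/feb.csv",
    "data/2024/jan.csv",
    "readme.txt",
]


def list_prefix(keys, prefix="", delimiter="/"):
    """Emulate a directory listing over a flat key space.

    Returns (objects, common_prefixes): keys directly "in" the prefix,
    and the pseudo-subdirectories found one level below it.
    """
    objects, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter acts like a sub-directory.
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common)


print(list_prefix(keys))
print(list_prefix(keys, "data/"))
print(list_prefix(keys, "data/2023/"))
```

Note that every listing scans the key names; nothing in the store itself knows about a "data" directory.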

Scalability

The hierarchy and pathing of file storage begins to max out at hundreds of millions of files.

While distributed filesystems do exist, they suffer from high overheads to maintain consistency between servers in the cluster, or need to sacrifice some of the guarantees (like consistent view of the filesystem between clients).

Object storage offers near-infinite scaling, to petabytes and beyond.
Latency
As long as the system has the path to where the data is located, grabbing it is fast and simple.
Object storage, on the other hand, was created with scalability in mind, and those advantages have typically come at the cost of speed and performance. Typically it is performant once it starts transferring, but initial setup takes longer.
Performance

File storage allows you to locate data very quickly through the hierarchical system, but lookups become slower and slower the more directories, folders, and files you have to open. Think of a directory with millions of sub-directories, which have millions of folders, which have millions of files each.

Object storage works best with larger objects: once a transfer is running it proceeds quickly, and the initial setup time is a smaller share of the total. In a filesystem, following a path means stepping through a number of directories to reach a file. Accessing a key is a single-step operation, so access time does not differ between objects the way it does between files at different depths.
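
The difference between stepping through directories and a single key lookup can be sketched with two toy data structures. This is a simplified model under stated assumptions: the nested dictionary stands in for a directory tree, the flat dictionary for an object store, and the names `resolve_path` and `resolve_key` are hypothetical.

```python
# Toy directory tree: each level must be opened to reach the next one.
tree = {"home": {"alice": {"results": {"run1.dat": b"payload"}}}}

# Toy object store: the whole name is one key in a flat mapping.
flat = {"home/alice/results/run1.dat": b"payload"}


def resolve_path(root, path):
    """Walk the tree one path component at a time and count the steps."""
    node, steps = root, 0
    for part in path.split("/"):
        node = node[part]
        steps += 1
    return node, steps


def resolve_key(objects, key):
    """Look the key up directly; always a single step."""
    return objects[key], 1


print(resolve_path(tree, "home/alice/results/run1.dat"))
print(resolve_key(flat, "home/alice/results/run1.dat"))
```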
Access protocol
Traditional networked file storage typically uses Network File System (NFS) or other common network protocols that are optimized for low latency and excellent throughput.
Traditional object storage uses HTTP to access data. This makes it simple to retrieve data via many different applications and even web browsers, and circumvents most firewalls. However, because HTTP isn't optimized for file transfer, it is processed more slowly than file storage protocols.
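
Because access goes over HTTP, fetching an object amounts to an ordinary GET request on a URL derived from the bucket and key. The sketch below only builds such a request with the Python standard library and does not send it. The endpoint `https://objectstore.example.org`, the bucket and key names, the path-style URL layout, and the helper `build_get_request` are all assumptions for illustration; authentication, which real services normally require, is omitted.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint, not a real service.
ENDPOINT = "https://objectstore.example.org"


def build_get_request(bucket, key, byte_range=None):
    """Build (but do not send) an HTTP GET request for one object.

    Assumes a path-style layout: <endpoint>/<bucket>/<key>.
    """
    # Percent-encode the names; slashes inside the key are kept as-is.
    url = f"{ENDPOINT}/{quote(bucket)}/{quote(key)}"
    req = Request(url, method="GET")
    if byte_range is not None:
        # A standard HTTP Range header asks for only part of the object.
        start, end = byte_range
        req.add_header("Range", f"bytes={start}-{end}")
    return req


req = build_get_request("my-bucket", "path/to/a file.txt", byte_range=(0, 1023))
print(req.get_method(), req.full_url, req.get_header("Range"))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`, provided the endpoint exists and accepts unauthenticated reads.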
Security
Filesystems are generally intended for local usage, not for sharing to the world, and secure usage is difficult to guarantee.
Object storage has a well-structured access control mechanism that enables local as well as world-wide usage.
Search
Filesystems are designed with hierarchical search in mind, and support this well. Searching for files that are not well sorted into the hierarchy takes significant effort, though. Tooling is generally excellent.
Searching object storage is generally a bad idea: while it works, it is not an efficient way to use it. However, if an external index such as a database or a list of keys is available, it can be optimized for the application and is extremely quick.
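
An external index of the kind mentioned above can be as small as one database table that maps searchable attributes to object keys. The following is a minimal sketch using an in-memory SQLite database; the table layout, the `instrument` and `year` columns, the example keys, and the helper `find_keys` are all hypothetical and would be chosen to fit the application.

```python
import sqlite3

# In-memory database standing in for the external index.
index = sqlite3.connect(":memory:")
index.execute(
    "CREATE TABLE objects (key TEXT PRIMARY KEY, instrument TEXT, year INTEGER)"
)
index.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [
        ("raw/0001.dat", "radar", 2023),
        ("raw/0002.dat", "lidar", 2023),
        ("raw/0003.dat", "radar", 2024),
        ("raw/0004.dat", "radar", 2023),
    ],
)


def find_keys(instrument, year):
    """Return the keys of matching objects without touching the object store."""
    rows = index.execute(
        "SELECT key FROM objects WHERE instrument = ? AND year = ? ORDER BY key",
        (instrument, year),
    )
    return [row[0] for row in rows]


# The search runs against the index; only the resulting keys are then
# fetched from object storage.
print(find_keys("radar", 2023))
```

The index has to be kept in step with the store whenever objects are added or replaced, which is the price of fast searches.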
Support in programming languages/libraries
Central to computing since the early days, filesystem access and tooling are ubiquitous in all languages and libraries.
Object storage is well supported in cloud computing and the languages and libraries used there (see How to access S3 buckets and perform actions). Support in scientific computing is less common but increasing.

How to use/interact with S3 buckets
