Paper: Google File System
Google File System / Distributed File System

Goal
Design a distributed file system to store huge files (terabytes and larger). The system should be scalable, reliable, and highly available.
Developed by Google for its large data-intensive applications.

Background
GFS was built for batch processing on large data sets and is designed for system-to-system interaction, not user-to-system interaction. It was designed with the following goals in mind:
- Component failures are the norm rather than the exception, so constant monitoring, fault tolerance, and automatic recovery are required.
- Files are huge by traditional standards, so design parameters such as block size must be revisited.
- Most files are mutated by appending new data rather than overwriting existing data.
- High sustained bandwidth matters more than low latency.

GFS Use Cases
- Built for distributed data-intensive applications like Gmail or YouTube.
- Google's BigTable uses GFS to store log files and data files.

APIs
GFS doesn't provide a standard POSIX-like API; instead, user-level APIs are provided. Files are organized hierarchically in directories and identified by their path names. GFS supports the usual file system operations: create, delete, open, close, read, and write.

Additional Special Operations
- Snapshot: creates a copy of a file or directory tree at low cost.
- Record append: lets multiple clients append to the same file concurrently while guaranteeing the atomicity of each individual append.

High Level Architecture

Agenda
- Chunks
- Chunk Handle
- Cluster
- Chunk Server
- Master
- Client

A GFS cluster consists of a single master and multiple chunk servers, and is accessed by multiple clients.

Chunk
As files stored in GFS tend to be very large, GFS breaks files into multiple fixed-size chunks, each 64 megabytes in size.

Chunk Handle
Each chunk is identified by an immutable and globally unique 64-bit ID called a chunk handle. This allows 2^64 unique chunks, for a total addressable capacity of 2^64 * 64 MB = 2^90 bytes, on the order of 10^9 exabytes.

Files are split into chunks, so the job of GFS is to provide a mapping from files to chunks, and then to support standard operations on files by mapping them down to operations on individual chunks.

Cluster
GFS is organized into a network of computers (nodes) called a cluster. A GFS cluster contains 3 types of entities: chunk servers, a master, and clients.

Chunk Server
- Nodes which store chunks on local disks as Linux files.
- Read or write chunk data specified by a chunk handle and byte range.
- For reliability, each chunk is replicated to multiple chunk servers. By default, GFS stores three replicas, though different replication factors can be specified on a per-file basis.

Master
Coordinator of the GFS cluster. Responsible for keeping track of filesystem metadata.
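The file-to-chunk translation the client library performs is simple arithmetic: a byte offset in a file maps to a chunk index, and the master resolves that index to a chunk handle. A minimal sketch (function names are illustrative, not from the paper):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS's fixed 64 MB chunk size

def chunk_index(byte_offset: int) -> int:
    """Which chunk of the file holds this byte offset."""
    return byte_offset // CHUNK_SIZE

def offset_within_chunk(byte_offset: int) -> int:
    """Where inside that chunk the chunk server must start reading."""
    return byte_offset % CHUNK_SIZE

# A client reading at offset 200 MB asks the master for the handle of
# chunk index 3, then asks a replica for bytes starting at
# 200 MB - 3 * 64 MB = 8 MB within that chunk.
print(chunk_index(200 * 1024 * 1024))          # → 3
print(offset_within_chunk(200 * 1024 * 1024))  # → 8388608 (8 MB)
```

Because the chunk size is fixed, this translation needs no per-file state beyond the ordered list of chunk handles the master keeps.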
Metadata stored at the master includes:
- The file and chunk namespaces.
- The mapping from files to chunks.
- The locations of each chunk's replicas.

The master also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunk servers. It periodically communicates with each chunk server in HeartBeat messages to give it instructions and collect its state.

For performance and fast random access, all metadata is stored in the master's main memory, i.e. the entire filesystem namespace as well as all the name-to-chunk mappings. For fault tolerance, and to survive a master crash, all metadata changes (every mutation to the file system) are written to disk in an operation log (similar to a journal), which is replicated to remote machines.

The benefit of having a single, centralized master is that it has a global view of the file system and can therefore make optimum management decisions, for example about chunk placement.

Client
The application/entity that makes read/write requests to GFS using the GFS client library. This library communicates with the master for all metadata-related operations, like creating or deleting files, looking up files, etc. To read or write data, the client (library) interacts directly with the chunk servers that hold the data. Neither the client nor the chunk server caches file data; chunk servers rely on the Linux buffer cache to keep frequently accessed data in memory.

Single Master and Large Chunk Size

Agenda
- Single Master
- Chunk Size

Single Master
Having a single master vastly simplifies GFS design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge. GFS minimizes the master's involvement in reads and writes so that it does not become a bottleneck.

Chunk Size
GFS chose 64 MB, which is much larger than typical filesystem block sizes (often around 4 KB). This is one of the key design parameters. Advantages of a large chunk size:
- Clients need to interact with the master less often, since reads and writes on the same chunk require only one initial request for chunk location information.
- A client is more likely to perform many operations on a given chunk, so it can reduce network overhead by keeping a persistent TCP connection to the chunk server.
- It reduces the amount of metadata the master must store in memory.

Lazy Space Allocation
Each chunk replica is stored as a plain Linux file on a chunk server. GFS does not allocate the whole 64 MB of disk space when creating a chunk.
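The master's in-memory state described above (namespace, file-to-chunk mapping, replica locations) can be sketched as a pair of lookup tables. This is a toy model under my own naming, not the paper's implementation; real GFS adds leases, chunk versions, and the on-disk operation log:

```python
import itertools

class MasterMetadata:
    """Toy sketch of the GFS master's in-memory metadata (illustrative names)."""

    _handles = itertools.count(1)  # stand-in for globally unique 64-bit chunk handles

    def __init__(self):
        self.file_to_chunks = {}   # path -> ordered list of chunk handles
        self.chunk_locations = {}  # chunk handle -> list of chunk server addresses

    def create_file(self, path):
        self.file_to_chunks[path] = []

    def add_chunk(self, path, replicas):
        """Allocate a new chunk for `path`, placed on the given replicas."""
        handle = next(self._handles)
        self.file_to_chunks[path].append(handle)
        self.chunk_locations[handle] = list(replicas)
        return handle

    def lookup(self, path, chunk_index):
        """What a client asks the master: (chunk handle, replica locations).
        The client then talks to the chunk servers directly for the data."""
        handle = self.file_to_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]

m = MasterMetadata()
m.create_file("/logs/web.log")
h = m.add_chunk("/logs/web.log", ["cs1:7000", "cs2:7000", "cs3:7000"])  # 3 replicas
print(m.lookup("/logs/web.log", 0))  # the handle plus its three replica addresses
```

Because both tables live in memory, a metadata lookup is a couple of dictionary accesses; this is why the master can answer location queries cheaply and stay out of the data path.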
Instead, as the client appends data, the chunk server lazily extends the chunk. One disadvantage of a large chunk size is the handling of small files: a small file occupies a single chunk, and the chunk servers holding it can become hot spots if many clients access the same file.

Metadata
Let's explore how GFS manages file system metadata.
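Lazy space allocation falls out naturally from storing each replica as an ordinary file: the file starts empty and only grows as appends arrive, up to the 64 MB cap. A minimal sketch (the function and its overflow check are my own, for illustration):

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB cap per chunk

def append_to_chunk(path, data):
    """Lazily extend a chunk replica stored as an ordinary file.
    No space is reserved up front; the file grows only as data arrives."""
    size = os.path.getsize(path) if os.path.exists(path) else 0
    if size + len(data) > CHUNK_SIZE:
        raise ValueError("append would overflow the 64 MB chunk")
    with open(path, "ab") as f:
        f.write(data)
    return os.path.getsize(path)

replica = os.path.join(tempfile.mkdtemp(), "chunk_0001")
print(append_to_chunk(replica, b"x" * 1024))  # 1024: only 1 KB on disk, not 64 MB
print(append_to_chunk(replica, b"y" * 1024))  # 2048 after the second append
```

This is why a large chunk size does not waste space on small files; the cost of small files is the hot-spot problem above, not disk usage.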
...