Wednesday, March 1, 2017

Azure Storage - High Level Architecture




Windows Azure storage is a cloud storage service that is highly durable, available, and scalable. Once your data is stored in Azure storage, you can access that data any time and from anywhere. It provides the following four abstractions (services)- Blob storage, Table storage, Queue storage, and File storage. Each of these has different role to play, you can get more information here…

In conjunction to aforementioned services It also provides the storage foundation for Azure Virtual Machines in the form of persistent data disks.
The goal of this article is not to explain the offering but to understand the fundamentals of Azure storage using which they are able able to achieve all the design goal sets out. Let’s try to decipher how Microsoft do some of these things on Azure storage under the cover.
Any data stored on Azure Storage is triplicated by default, in normal circumstances it creates at least 3 copies of that inside storage stamp (and potentially another region with geo-resiliency enabled). So, before diving into storage architecture it’s only fair to put some lights on Stamps-

Azure Storage Stamp


Azure Divides things into Stamps where each stamp has its own fabric controller. A single Storage Stamp can best be understood as a cluster of N racks of storage nodes, where each rack is built out as a separate fault domain with redundant networking and power. Clusters typically range from 10 to 20 racks with 18 disk-heavy storage nodes per rack.

Microsoft deploys these stamps in its Azure data centers across the world, and adds more stamps as demand grows. Inside the datacenter it's used as a unit of deployment and management. They provide huge assistance in achieving fault-tolerance.

When user creates storage accounts all the data of that is stored on a single stamp. They do get migrated between stamp only when need arises, the way it works is that Microsoft always make sure that single storage stamp should only be utilized ~75% of its capacity & bandwidth. This is because ~20% is kept as a reserve for –

  1. Disk short stroking to gain better seek times and higher throughput by utilizing the outer tracks of the disks
  2. To continue providing storage capacity and availability in the presence of a rack failure within a stamp. When the storage stamp reaches ~75% utilization, the location service migrates accounts to different stamps using Inter-Stamp replication


Location Service

Location Service in Azure is the one responsible for storage stamps management. Another notable responsibility of this to manage the account name-spaces across all stamps. Internally this Location Service itself is distributed across two geographical locations for its own disaster recovery. That makes it immune from geo failure.

As shown in the architecture diagram- We have a Location Service with two storage stamps and within the stamps we have the all three-layer mentioned. While tracking the resources Location Service look for them at each storage stamp in production across all the locations. Now when an application requests a new account for storing data, it specifies the location affinity for the storage (e.g., US North). The LS then chooses a storage stamp within that location as the primary stamp for the account using heuristics based on the load information across all stamps (which considers the fullness of the stamps and other metrics such as network and transaction utilization). The LS then stores the account metadata information in the chosen storage stamp, which tells the stamp to start taking traffic for the assigned account. The LS then updates DNS to allow requests to now route from the name https://AccountName.service.core.windows.net/ to that storage stamp’s IP which it has exposed for external traffic

Architecture Layers inside Stamps

system To keep the data durable, consistent and available with in specific Azure Region lest understand the different layer which constitute the stamp. Windows Azure Storage is a layered system, so let's take it from the bottom up-


Stream Layer: The first or lowest layer on this subsystem which also gets referred as DFS (Distributed File System) layer. This layer stores the bits on disk and is actually in charge of handling the disk, it’s the responsibility of this layer to persist your data on the disk by distributing and replicating the data across many servers to provide durability within a storage stamp. you can think of the underling system as JBOD with in the stamp, so your data when you store in the Azure Storage Service it gets stored in the DFS into these files which is called extent and these extend gets replicated three times across UD’s (update domain)/FD’s (fault domain). The unique thing about this file system is that this is append only file system so when you override any data azure always keeps on appending the data to these extents.

Partition Layer: Then comes the partition layer which can be called as the brain of Azure Storage service. Most of the decision making happens at this layer. It has this significance because it serves two unique purposes-First this layer is built for managing and understanding higher level data abstractions (Blob, Table, Queue, Files). So, it’s this layer which understand what blob, is table is and how to perform transaction on those objects.Secondly it providing a scalable object namespace, and responsible for massive scalable indexIn addition to above two it does help in providing transaction ordering and strong consistency for objects, storing object data on top of the stream layer.  '

Front End Layer:  And finally, at the top of Azure Storage Service we have front end layer which provides a rest protocol for those abstractions (Blob, Table, Queue, Files). The Front-End layer consists of a set of stateless servers that take incoming requests. Upon receiving a request, an FE looks up the AccountName, authenticates and authorizes the request, then routes the request to a partition server in the partition layer (based on the PartitionName). Every write request which comes to the system will have a partition key specified to it. So, in nutshell it does extend its roles for authentication/authorization.

Global Name-space


One of the key design goal of Azure Storage Service is to provide a single global namespace that allows the data to be stored and accessed in a consistent manner from any location in the world. To provide this capability Microsoft leverages DNS as part of the storage namespace and break the storage namespace into three parts: an account name, a partition name, and an object name. As a result, all data is accessible via a uniform URI of the form:

http(s)://AccountName.1.core.windows.net/PartitionName/ObjectName


Two Replication Engines

To address the replication challenge which is backbone of all the Azure Storage Service design goals Azure in general two types of replication-


 Intra-Stamp Replication: Under this replication model Azure keeps your data durable within a region or stamp. It provides synchronous replication and is focused on making sure all the data written into a stamp is kept durable within that stamp. It keeps three replicas of the data across different nodes in different fault domains/Update Domain to keep data durable within the stamp in the face of disk, node, and rack failures. Intra-stamp replication is done completely by the stream layer and is on the critical path of the customer’s write requests. Once a transaction has been replicated successfully with intra-stamp replication, success can be returned back to the customer. Under this engine Azure provides strong consistency, until data is written on all the three places transaction is not committed. And because the replication implementation is happening at stream layer it can provide


Inter-Stamp Replication: Under this replication model Azure provides asynchronous replication and is focused on replicating data across stamps. Inter-stamp replication is done in the background and is off the critical path of the customer’s request. Because of asynchronous replication the write operation is not strongly consistent. This replication is at the object level, where either the whole object is replicated or recent delta changes are replicated for a given account. Inter-stamp replication is configured for an account by the location service and performed by the partition layer. 

“Inter-stamp replication is focused on replicating objects and the transactions applied to those objects, whereas intra-stamp replication is focused on replicating blocks of disk storage that are used to make up the objects.”


Conclusion

When Windows Azure Storage was being comprehended, it had some design goals to achieve -

Consistency: One of the design goal of Microsoft for WAS was to provide strong consistency with in a region. By virtue of mentioned technological brilliance all the data which goes to Azure Storage gets triplicated and All committed data across all 3 replicas are identical.
Durability: Second design goal was that the data must be durable that means there must be proper replication mechanism in place, to address the same all data stored with at least 3 replicas
Availability: Can read from any 3 replicas; If any issues writing seal extent and continue appending to new extent
Performance/Scale: Retry based on 95% latencies; Auto scale out and load balance based on load/capacity. As it should automatically scale to meet the customers peak demand

*You also need a global name space to access the data around the world