July 7, 2024

Azure Blob Storage made easy for beginners

Let’s take a look together at the theory behind Azure Blob Storage accounts, and at a few analogies that can help us understand those concepts better.

What is a Blob?

Before we dive deep into the theory behind Blob Storage we have to understand what a blob really is. A blob is simply a Binary Large Object. It’s anything you might want to store – a picture, a document (PDF, TXT), a .zip archive, backups, virtual hard disks for VMs – almost anything you can think of, really.

Azure Blob Storage

Now that we know what a blob actually is, Blob Storage is pretty much self-explanatory. It is a solution where you can store large volumes of unstructured data. It has many good selling points, like:

  1. Blob Storage has built-in security features (which we will look into in more detail in a minute).
  2. It comes with client libraries for programming languages like .NET, Java, Node.js, Python, Go and PHP, or you can access objects directly using HTTP/HTTPS calls (see the short sketch after this list).
  3. It’s highly convenient and customisable (we will look into that in more detail too).
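
Since blobs are addressable over HTTP/HTTPS, the simplest possible access is a plain GET request. Below is a minimal sketch using only Python’s standard library; the account, container and blob names are made up, and it only works as-is if the container allows anonymous read access (otherwise the request needs a SAS token or an authorisation header).

```python
import urllib.request

# Every blob has a URL of the form:
#   https://<storage-account>.blob.core.windows.net/<container>/<blob-name>
url = "https://mystorageaccount.blob.core.windows.net/images/cat.jpg"

# Plain HTTPS GET; works only for containers with anonymous read access
with urllib.request.urlopen(url) as response:
    data = response.read()

print(f"Downloaded {len(data)} bytes")
```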

The general structure

I like thinking of the structure here as a wardrobe where you can store your underwear. We can think of the room as the highest level – the Azure Subscription. Then the wardrobe itself would be a Resource Group. Like everything in Azure, you have to start from a Resource Group; it is like a golden rule. Once you have that, we can create an Azure Storage Account, which will be our drawers. Drawers with rules. Perfect ones!

Here you can’t just throw everything into one drawer – socks, panties, bras. No! You have to categorise them! Just like you can have different types of underwear, you can have different services inside Azure Storage. Those services are: Files, Queues, Tables and Containers. Here we will focus only on Containers. This is the drawer where our blobs reside. You can have as many containers as you wish under one Storage Account.

Overall, the hierarchy looks something like this: Subscription → Resource Group → Storage Account → Container → Blob.
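
As a hedged sketch, here is how that hierarchy maps onto the Python client library (the azure-storage-blob package mentioned above); the connection string, container and blob names are placeholders for illustration.

```python
from azure.storage.blob import BlobServiceClient

# Storage Account level ("the drawers")
service_client = BlobServiceClient.from_connection_string("<connection-string>")

# Container level ("one drawer")
container_client = service_client.get_container_client("product-images")

# Blob level ("one item in the drawer")
blob_client = container_client.get_blob_client("sneakers.png")

# Prints https://<account>.blob.core.windows.net/product-images/sneakers.png
print(blob_client.url)
```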

Types of Blobs

We have several types of blobs that classify our objects:

  1. Block Blobs
  2. Page Blobs
  3. Append Blobs

Block Blobs

Block blobs are optimised for uploading data in “blocks”. A single block blob can consist of up to 50,000 blocks, and each block is managed individually using its block ID. You can store up to 4.75 TiB of data in one block blob.
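
As a rough sketch (with placeholder names), uploading a file as a block blob with the Python SDK looks like this; the SDK takes care of splitting larger files into blocks and committing them.

```python
from azure.storage.blob import BlobServiceClient

service_client = BlobServiceClient.from_connection_string("<connection-string>")
container_client = service_client.get_container_client("backups")

# Block blob is the default blob type for upload_blob
with open("report.pdf", "rb") as data:
    container_client.upload_blob(name="report.pdf", data=data, overwrite=True)
```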

Page Blobs

Page blobs are collections of 512-byte pages optimised for random read/write operations. Their primary usage is Virtual Hard Disk (VHD) files that serve as disks for Azure virtual machines. The maximum size of a page blob is 8 TiB.
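
A minimal page blob sketch, assuming the same Python SDK and placeholder names: the blob is created with a fixed size and then written in 512-byte-aligned pages.

```python
from azure.storage.blob import BlobServiceClient

service_client = BlobServiceClient.from_connection_string("<connection-string>")
blob_client = service_client.get_blob_client(container="disks", blob="data.vhd")

blob_client.create_page_blob(size=1024 * 1024)   # total size must be a multiple of 512
page = b"\x01" * 512                             # one 512-byte page
blob_client.upload_page(page, offset=0, length=len(page))   # random-access write
```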

Append Blobs

Append blobs also consist of blocks, but they are optimised for append operations. Any modification of an append blob is added at the end of the blob; updating or deleting existing blocks is not supported. That makes them perfect for, for example, logging. The maximum size is about 195 GiB.
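
A short sketch of how logging to an append blob could look (placeholder names again): each append_block call adds data at the end of the blob, and existing blocks are never rewritten.

```python
from azure.storage.blob import BlobServiceClient

service_client = BlobServiceClient.from_connection_string("<connection-string>")
blob_client = service_client.get_blob_client(container="logs", blob="app-2024-07-07.log")

blob_client.create_append_blob()   # create the empty append blob once
blob_client.append_block(b"2024-07-07 12:00 INFO service started\n")
blob_client.append_block(b"2024-07-07 12:01 INFO request handled\n")
```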

Accessing your blobs

A Storage Account allows you to run several actions on your blobs. You can delete, edit, upload or download them, depending on the type of blob, as we discussed in the previous paragraph. However, different types of actions come with different costs, and that cost depends on the access tier of your blob (or the default tier of your Storage Account).
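
For completeness, a hedged sketch of the remaining basic actions, downloading and then deleting a blob (names are placeholders):

```python
from azure.storage.blob import BlobServiceClient

service_client = BlobServiceClient.from_connection_string("<connection-string>")
blob_client = service_client.get_blob_client(container="product-images", blob="sneakers.png")

data = blob_client.download_blob().readall()   # read the whole blob into memory
print(f"Downloaded {len(data)} bytes")

blob_client.delete_blob()                      # remove the blob
```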

Let’s circle back to our wardrobe comparison for a second. We all have clothes that we use with different frequency. Surely you have some underwear that you use regularly all year long, no matter what. But you might also have time-specific pieces, like special hiking socks for hikes you go on only once every other month. Lastly, there are those things (that might well remember your high-school times) that you keep out of pure sentiment, with the hope that maybe one day you will manage to squeeze into them again.

The same thing happens with your blobs! If you are running an online shop and you store pictures of your inventory, you expect a lot of read requests, so you need low download latency. However, if you are working on a solution for banking, you might have regulatory reasons to store some pieces of information for X years with little hope that anyone will actually request them – and if that happens, they can wait a bit longer for the download.

Different kinds of tiers in Azure Storage Accounts cover all of those scenarios. Let’s look at them in more detail.

Hot tier

The Hot access tier is an online tier, perfect for frequently accessed data. If you need to modify your data often, it might be a good solution for you. In this case the cost of storage is the highest, but accessing data is the cheapest amongst all tiers.

Cool tier

The Cool access tier is an online tier optimised for infrequently accessed data. The cost of storage is lower compared to the Hot tier, but accessing data is more expensive. The minimum time for storing data is 30 days.

Cold tier

The Cold access tier is an online tier optimised for rarely used data that still requires fast retrieval. In this case the storage cost is lower, but the access cost is higher compared to the Cool tier. The minimum time for storing data is 90 days.

Archive tier

This is an offline tier that is optimised for storing rarely accessed data – just like files in banks stored for regulatory reasons. If you are flexible with your latency and can wait hours to download a file, this one is for you. It’s the cheapest option in terms of storage costs, but accessing data will be the most expensive amongst all tiers. The minimum time for storing data is 180 days.

NB! Please remember that setting access tiers is only available for block blobs!
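
As a hedged sketch, moving an individual block blob between tiers with the Python SDK looks roughly like this (the container and blob names are made up):

```python
from azure.storage.blob import BlobServiceClient, StandardBlobTier

service_client = BlobServiceClient.from_connection_string("<connection-string>")
blob_client = service_client.get_blob_client(container="invoices", blob="2019/march.pdf")

# Rarely needed any more, but must stay available for years
blob_client.set_standard_blob_tier(StandardBlobTier.ARCHIVE)
```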

You probably realised three things about tiers: we have online and offline tiers, the cost consists of two factors (storage and access), and some tiers have a minimum storage time. Since I don’t want to go deeper into pricing details, as those tend to change, let’s talk about the two other factors a little.

Online vs. offline

If data is stored in an online tier, it can be accessed immediately, while the Archive (offline) tier is best for storing data that is rarely or hardly ever accessed. Before an archived blob can be read, it has to be rehydrated back to an online tier, which can take hours.

Minimum storage time

This magical statement defines the minimum number of days that our blob has to reside in a given tier to avoid an early deletion penalty. You incur this penalty if you move, delete or overwrite your resource before the specified number of days passes. Let’s consider an example.

We moved our blob from the Hot to the Cold tier 15 days after its last modification (not manually of course – rules will be covered later). Then we decided to move the blob from the Cold tier to the Archive tier, as we realised that we won’t need it as often as we expected. It lived in the Cold tier for only 50 days before we moved it to Archive. This change will create an early deletion fee equivalent to 40 days (90 – 50) of storing that blob in the Cold tier!

So you can think of the minimum storage time as the minimum storage cost that you will pay for moving a blob to a tier. If the blob stays longer in the selected tier, we will of course have to cover the additional storage costs.
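
The back-of-the-envelope calculation is simple enough to write down; this tiny sketch just restates the example above and ignores the actual per-GiB prices.

```python
def early_deletion_days(minimum_days: int, days_stored: int) -> int:
    """Days of storage you are still billed for when leaving a tier early."""
    return max(0, minimum_days - days_stored)

# The example from the text: the blob left the Cold tier (90-day minimum)
# after only 50 days, so we pay for the remaining 40 days.
print(early_deletion_days(90, 50))  # 40
```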

Managing Blob’s lifecycle

Just like your underwear in the drawer has its lifecycle – and panties can go from regularly used (in high school) to kept purely out of sentiment – you can manage the lifecycle of your blobs. You can define a rule or set of rules to transition blob data to the appropriate access tier, or to expire it at the end of its lifecycle. Policies run on a base blob or, optionally, on a blob’s versions or snapshots.

A lifecycle management policy consists of a set of rules that run once per day. They define a set of actions to be taken when certain conditions are met, and filters can limit a rule to specific containers or a subset of blobs. We have a set of conditions to choose from:

  • Number of days since creation;
  • Days since last modification;
  • Days since last access;

To use the last access condition, you need to enable last access time tracking at the Storage Account level. You can create policies from code or directly in the Azure Portal.

NB! Only Block and Append Blobs in general-purpose v2, premium block blob, and Blob Storage accounts support lifecycle management.

If you define multiple actions for one blob, the cheapest one will be performed. For example, deleting is cheaper than tierToArchive, and tierToArchive is cheaper than tierToCool, etc.
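
A hedged sketch of what such a policy could look like: the rule below is expressed as a Python dict that mirrors the lifecycle policy JSON format, with a made-up rule name and container prefix. The resulting JSON can be pasted into the lifecycle management “Code view” in the Azure Portal or applied from code.

```python
import json

policy = {
    "rules": [
        {
            "name": "age-out-product-images",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["product-images/"],
                },
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}

print(json.dumps(policy, indent=2))
```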

Security

One thing that does not really fit our wardrobe comparison is security. All your data is automatically encrypted using SSE (server-side encryption) when persisted to the cloud. Moreover, the Azure Storage libraries for Blob and Queue Storage also provide client-side encryption if you need to encrypt data on the client. By default encryption happens with Microsoft-managed keys, but you can customise that as well. If you decide to manage encryption with your own keys, you can either specify a customer-managed key or a customer-provided key.

If you decide to use a customer-managed key, you have to store it in Azure Key Vault or Azure Key Vault Managed Hardware Security Module (HSM). Azure can then use your key to encrypt your blobs.

If you choose a customer-provided key, you’ll have to include the encryption key in every read or write request your client makes to Blob Storage. This way you have granular control over the encryption and decryption of your data.

Moreover, you have to be authorised before performing any action on your blob. Working directly in the Azure Portal is quite obvious – you have to log in – but the same applies from code. When creating your client you can authorise using either a connection string or by setting up Role-Based Access Control (RBAC).
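
To make that concrete, here is a hedged sketch of both options in Python; the account URL is a placeholder, and the RBAC option assumes the identity running the code has a suitable data-plane role (for example Storage Blob Data Contributor) on the account.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Option 1: connection string (contains the account key)
client_a = BlobServiceClient.from_connection_string("<connection-string>")

# Option 2: RBAC via Azure AD credentials
client_b = BlobServiceClient(
    account_url="https://mystorageaccount.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
```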

You can read more about security in the documentation.

In the next article we will cover creating a Storage Account and managing your blobs from code, so stay tuned! In the meantime, you can check out my previous article about reading secrets from Variable Groups in Azure Pipelines.
