The sheer number of Google Cloud Platform services can be overwhelming. I found that breaking them down into their component parts helped me to get a clearer understanding.
With that in mind, today I decided to write about one of the fundamental GCP data storage options: Cloud Storage.
Google Cloud Storage
When deciding which Google Cloud service to use to store data, you have to think about how you are going to use that data. Google offers different services depending on how your data is structured, how you will access it, and how often you will access it.
In my notes I defined Cloud Storage thusly:
Binary large-object storage accessible by URL. High durability, high availability. It’s not a file system – it’s “buckets”. Buckets contain immutable objects. Appropriate for web content, downloads, etc.
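To make “accessible by URL” concrete, here is a minimal Python sketch that builds the two address forms commonly used for an object: the `gs://` URI that tools like `gsutil` accept, and the `https://storage.googleapis.com/...` URL that serves publicly readable objects. The function name `object_urls` and the bucket and object names are hypothetical. The HTTPS form only works if the object has been made public; otherwise requests must be authenticated (or use signed URLs).

```python
from urllib.parse import quote
from typing import Tuple


def object_urls(bucket: str, object_name: str) -> Tuple[str, str]:
    """Return (gs:// URI, public HTTPS URL) for an object.

    Hypothetical helper: the HTTPS URL is only reachable without
    credentials if the object is publicly readable.
    """
    # gs:// URIs are the form gsutil and most client tools accept.
    gs_uri = f"gs://{bucket}/{object_name}"
    # Percent-encode the object name, keeping "/" so prefixed names stay readable.
    https_url = f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"
    return gs_uri, https_url
```

For example, `object_urls("my-example-bucket", "images/cat photo.jpg")` returns `gs://my-example-bucket/images/cat photo.jpg` and `https://storage.googleapis.com/my-example-bucket/images/cat%20photo.jpg`.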
If you are anything like me, the part of this definition that catches your attention is “buckets”. WTF does “buckets” mean? I found that it’s best not to overthink it 🙂
It really just means that in Google Cloud Storage you are not dealing with a file system like the one on your Linux or Windows computer. The closest file-system analogy is a folder or directory. In Cloud Storage, buckets are the most fundamental containers that hold data, and everything in Cloud Storage must be stored in a bucket.
One difference between directories and buckets is that, while you can use both to organize your content, you cannot nest buckets inside one another.
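A rough way to see the difference is a toy model. The sketch below is hypothetical, not the Cloud Storage client library: a bucket is just a flat mapping from object names to bytes, so there are no real directories and nothing can hold another bucket. “Folders” only appear because listing can group names by a prefix and a delimiter such as `/`, which is the same idea Cloud Storage’s object listing uses.

```python
from __future__ import annotations


class ToyBucket:
    """Hypothetical in-memory bucket: a flat map of object names to bytes."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Values are object data, never other buckets: there is no nesting.
        self._objects: dict[str, bytes] = {}

    def put(self, object_name: str, data: bytes) -> None:
        """Store an object; "/" in the name is just a character, not a directory."""
        self._objects[object_name] = data

    def list_objects(self, prefix: str = "",
                     delimiter: str | None = None) -> tuple[list[str], list[str]]:
        """Return (object names, pseudo-folder prefixes) under `prefix`.

        Names that still contain `delimiter` after the prefix are rolled up
        into a single pseudo-folder entry instead of being listed one by one.
        """
        names: list[str] = []
        prefixes: set[str] = set()
        for name in sorted(self._objects):
            if not name.startswith(prefix):
                continue
            rest = name[len(prefix):]
            if delimiter and delimiter in rest:
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                names.append(name)
        return names, sorted(prefixes)
```

With objects `photos/2018/a.jpg`, `photos/b.jpg`, and `readme.txt`, listing with `delimiter="/"` returns `(["readme.txt"], ["photos/"])`, and listing with `prefix="photos/"` and the same delimiter returns `(["photos/b.jpg"], ["photos/2018/"])`.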
The binary large-object part of the definition from my notes means that each item in a bucket is an object, which to GCP is just a chunk of data. Each object has associated metadata (name-value pairs) that describe the object’s qualities.
It’s also important to understand that objects are immutable, meaning that you cannot change an object in place. You can upload a new version to overwrite an object, and you can even turn on versioning to keep a history of earlier versions of the same object. What you can’t do is go into GCP and edit the object itself.
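The sketch below models that behaviour in plain Python. It is a hypothetical toy, not how GCP stores data: each upload creates a new frozen `StoredObject` record carrying the data and its name-value metadata, and with versioning turned on the overwritten record is kept as a noncurrent version instead of being discarded. Real Cloud Storage also identifies versions by a generation number, but those are not simple 1, 2, 3 counters as they are here, and the toy freezes the metadata too purely for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class StoredObject:
    """One object version: name, data, generation, and read-only metadata."""
    name: str
    data: bytes
    generation: int
    metadata: Mapping[str, str]


class VersionedToyBucket:
    """Hypothetical bucket where uploads never edit an object in place."""

    def __init__(self, versioning: bool = False) -> None:
        self.versioning = versioning
        self._live: dict[str, StoredObject] = {}
        self._noncurrent: dict[str, list[StoredObject]] = {}
        self._next_generation = 1  # simplification: real generations aren't 1, 2, 3...

    def upload(self, name: str, data: bytes,
               metadata: Optional[Mapping[str, str]] = None) -> StoredObject:
        """Create a brand-new object version; the old one is replaced or kept."""
        new = StoredObject(name, data, self._next_generation,
                           MappingProxyType(dict(metadata or {})))
        self._next_generation += 1
        old = self._live.get(name)
        if old is not None and self.versioning:
            # With versioning on, the overwritten version is kept, not deleted.
            self._noncurrent.setdefault(name, []).append(old)
        self._live[name] = new
        return new

    def get(self, name: str) -> StoredObject:
        """Return the live (current) version of an object."""
        return self._live[name]

    def history(self, name: str) -> list[StoredObject]:
        """Return every kept version of an object, oldest first."""
        return self._noncurrent.get(name, []) + [self._live[name]]
```

Uploading `report.csv` twice to a `VersionedToyBucket(versioning=True)` leaves `history("report.csv")` with generations 1 and 2, and trying to assign to `.data` on either version raises an `AttributeError` (a `dataclasses.FrozenInstanceError`), which is the toy’s stand-in for “you can’t edit the object itself”.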
You can control who has access to objects using GCP’s IAM policies or with Access Control Lists (ACLs).
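As an illustration of the IAM side, a bucket’s IAM policy is a list of bindings, each pairing a role such as `roles/storage.objectViewer` with members such as `user:alice@example.com` or the special `allUsers` (anyone on the internet). The `has_role` helper below is hypothetical and deliberately simplified: it checks only direct bindings and `allUsers`, ignoring IAM conditions, group membership, and policies inherited from the project, and it says nothing about ACLs.

```python
from typing import Any, Mapping


def has_role(policy: Mapping[str, Any], member: str, role: str) -> bool:
    """Return True if `member` holds `role` via a direct binding or `allUsers`.

    Hypothetical, simplified check: ignores IAM conditions, group
    membership, and policies inherited from the project or organization.
    """
    for binding in policy.get("bindings", []):
        if binding.get("role") != role:
            continue
        members = binding.get("members", [])
        # "allUsers" grants the role to anyone, signed in or not.
        if member in members or "allUsers" in members:
            return True
    return False


# Example policy in the shape of an IAM policy: a list of role/member bindings.
example_policy = {
    "bindings": [
        {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
        {"role": "roles/storage.objectAdmin", "members": ["user:alice@example.com"]},
    ]
}
```

Here `has_role(example_policy, "user:bob@example.com", "roles/storage.objectViewer")` is `True` because the viewer role is granted to `allUsers`, while the same call with `roles/storage.objectAdmin` is `False`.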
Cloud Storage Classes
Finally, you’ll need to decide which Cloud Storage class is the best fit for your data. There are four classes in total, but I think of them as really two classes, each with a subclass. Again, from my notes:
Multi-regional
High performance
For most frequently accessed data
Appropriate for content storage and delivery
Highest storage price – low transfer price
Regional
High performance
Data accessed frequently within a region
Appropriate for transcoding / regional analytics
Nearline
For data accessed less than once a month
Appropriate for backup / longtail content
Coldline
Accessed less than once a year
Archive / disaster recovery
Lowest storage price – highest transfer price
The idea is that if you or your users are going to access your data on the reg, you choose Multi-Regional or Regional. If you are mostly keeping your data for backup or archival, or doing some type of infrequent scheduled batch processing, you choose Nearline or Coldline.
The first two have a higher storage price and a lower transfer price; the latter two are the opposite.
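The rule of thumb above can be written as a tiny decision function. This is a sketch of the notes above, not Google’s official selection logic. The thresholds come straight from the list (less than once a year for Coldline, less than once a month for Nearline), while the `users_spread_across_regions` flag is a hypothetical stand-in for the Multi-Regional versus Regional choice. The returned strings are the names Cloud Storage used for these four classes.

```python
def suggest_storage_class(accesses_per_year: float,
                          users_spread_across_regions: bool) -> str:
    """Pick one of the four classes from the notes using access frequency.

    Hypothetical helper: a real decision should also weigh current pricing
    and retrieval fees, not just how often the data is read.
    """
    if accesses_per_year < 1:    # less than once a year: archive / disaster recovery
        return "COLDLINE"
    if accesses_per_year < 12:   # less than once a month: backup / long-tail content
        return "NEARLINE"
    # Frequently accessed data: choose based on where the readers are.
    return "MULTI_REGIONAL" if users_spread_across_regions else "REGIONAL"
```

For instance, data read twice a year maps to `"NEARLINE"`, while data read daily by users worldwide maps to `"MULTI_REGIONAL"`.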
Hopefully this gives you at least a basic understanding of Google Cloud Platform’s Cloud Storage. Cloud Storage is only one of the many data storage options GCP provides.
Next week I’ll write a post outlining the GCP database offerings. Until then, here is a great page to continue reading about Google Cloud Storage.