Google Cloud Storage (GCS)
- GCS stores unstructured data such as images, videos, and other static content, as well as backups and disaster-recovery copies.
- GCS stores data as objects on Colossus, Google's proprietary distributed file system.
- We can transfer data into or out of GCS using gsutil.
- Google has different storage offerings depending on the use case.
- Cloud Storage is simple and user-friendly (think of Google Drive).
- We don't need to allocate capacity up front.
- There is no limit on how much data we can store; GCS supports datasets of any size.
- A single object cannot be larger than 5 TiB.
- A bucket initially supports roughly 1000 object writes and 5000 object reads per second, and this capacity scales up as load grows.
- To store data in GCS, we first need to create a storage bucket.
- Basic Terminology
  - Buckets are the basic containers that hold your data. Each bucket holds a collection of objects.
  - Each bucket has a name, which must be globally unique.
  - Objects are the individual pieces of data that you store in Cloud Storage.
  - Object names are treated as object metadata. An object name can be at most 1024 bytes long when UTF-8 encoded.
  - Object names must be unique within a bucket.
  - Example object names
    - myphoto.jpeg
    - /year2021/jan/myphoto.jpeg
- Data in the buckets can be accessed as shown below
- The objects in a GCS bucket fall into two categories
  - Cold data: accessed infrequently
  - Hot data: accessed frequently
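The bucket and object rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyBucket` and `is_valid_object_name` are hypothetical names made up for these notes, the bucket lives purely in memory, and nothing here talks to the real GCS service or its client library.

```python
from typing import Dict, List

MAX_OBJECT_NAME_BYTES = 1024          # object-name limit, counted in UTF-8 bytes
MAX_OBJECT_SIZE_BYTES = 5 * 1024**4   # single-object size limit (5 TiB)


def is_valid_object_name(name: str) -> bool:
    """Return True if the name is non-empty and fits the byte limit."""
    size = len(name.encode("utf-8"))
    return 1 <= size <= MAX_OBJECT_NAME_BYTES


class ToyBucket:
    """Hypothetical in-memory stand-in for a bucket (not the real GCS client)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def upload(self, object_name: str, data: bytes) -> None:
        if not is_valid_object_name(object_name):
            raise ValueError(f"invalid object name: {object_name!r}")
        if len(data) > MAX_OBJECT_SIZE_BYTES:
            raise ValueError("object exceeds the single-object size limit")
        # Names are unique within a bucket, so re-using a name replaces the object.
        self._objects[object_name] = data

    def list_names(self) -> List[str]:
        return sorted(self._objects)


# Usage with the example object names from the notes.
bucket = ToyBucket("my-example-bucket")
bucket.upload("myphoto.jpeg", b"fake image bytes")
bucket.upload("/year2021/jan/myphoto.jpeg", b"fake image bytes")
print(bucket.list_names())
```

Note that the slashes in an object name are just characters in the name; the sketch stores both examples as flat keys in one dictionary.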
Storage Classes
- Buckets fall into four main classes based on availability, geographic relevance, and frequency of access
  - Hot and available – Multi-Regional:
    - This is geo-redundant. When we upload data to this storage class, two separate copies are stored in two GCP regions that are hundreds of miles apart.
    - This is the most expensive class of bucket and should be used only when we are certain of global traffic.
  - Hot and local – Regional:
    - This is used for web applications whose traffic is concentrated within one region.
  - Cool – Nearline: We use this storage class when we expect to access data infrequently (about once a month). Here we pay less for storage and more for access.
  - Coldline: This is cold storage for data that we expect to access less than once a year. It is suitable for disaster recovery and long-term archival storage.
- Let's estimate some costs using the pricing calculator. Refer Here
  - We need to store 10 TB of data
    - Multi-regional
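As a rough sketch of the reasoning in this section, the helper below maps an expected access frequency to one of the four classes and computes a storage-only monthly estimate. The function names and the exact thresholds (fewer than 1 access per year, up to 12 per year) are assumptions chosen to match the "once a month" and "once a year" guidance in these notes, and the price is supplied by the caller because real rates must come from the pricing calculator.

```python
def pick_storage_class(accesses_per_year: float, global_traffic: bool = False) -> str:
    """Map expected access frequency to a storage class (thresholds are assumptions)."""
    if accesses_per_year < 1:
        return "COLDLINE"        # accessed less than once a year
    if accesses_per_year <= 12:
        return "NEARLINE"        # accessed about once a month or less
    # Hot data: choose by where the traffic comes from.
    return "MULTI_REGIONAL" if global_traffic else "REGIONAL"


def monthly_storage_cost(size_tb: float, price_per_gb_month: float) -> float:
    """Storage-only estimate; ignores access, network, and operation charges."""
    size_gb = size_tb * 1000     # decimal TB; adjust if the rate is quoted per GiB
    return size_gb * price_per_gb_month


# Usage: 10 TB of hot data with global traffic.
HYPOTHETICAL_PRICE = 0.02        # made-up placeholder, NOT a real GCS price
print(pick_storage_class(accesses_per_year=365, global_traffic=True))
print(monthly_storage_cost(10, HYPOTHETICAL_PRICE))
```

Because Nearline and Coldline charge more for access, a full estimate would also need the expected retrieval volume, which this sketch deliberately leaves out.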
Working with GCS buckets
- Creating Buckets:
  - There are multiple ways to create buckets in GCP: the Console, gsutil, and the REST API.
  - To create a bucket we need three fields
    - Region
    - Storage class
    - Globally unique name
  - Creating a bucket using the Web Console
  - Creating buckets using gsutil:
    - For an overview Refer Here, and for installation if you don't have gsutil installed Refer Here
    - For the command reference, use the left-hand pane Refer Here
    - Let's create a bucket using gsutil
    - Refer Here for the CLI commands created in the class
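The three fields map onto a `gsutil mb` invocation roughly as follows. This is a minimal sketch, not the commands from the class: the helper names, the bucket name, and the local file name are hypothetical, and the code only builds and prints the command lines rather than running them, so it works without gsutil installed. It relies on the `-l` (location) and `-c` (storage class) options of `gsutil mb`, plus `gsutil cp` for copying a file into the bucket.

```python
import shlex
from typing import List


def gsutil_mb_command(bucket: str, location: str, storage_class: str) -> List[str]:
    """Build (but do not run) a `gsutil mb` command from the three required fields."""
    return ["gsutil", "mb", "-l", location, "-c", storage_class, f"gs://{bucket}"]


def gsutil_cp_command(source: str, destination: str) -> List[str]:
    """Build (but do not run) a `gsutil cp` command for moving data in or out."""
    return ["gsutil", "cp", source, destination]


# Hypothetical values; the bucket name must be globally unique in practice.
create = gsutil_mb_command("my-example-bucket-12345", "us-central1", "regional")
upload = gsutil_cp_command("myphoto.jpeg", "gs://my-example-bucket-12345/myphoto.jpeg")

print(shlex.join(create))
print(shlex.join(upload))
# To actually execute a command: subprocess.run(create, check=True)
```

Keeping the command as a list of arguments avoids shell-quoting problems if it is later passed to `subprocess.run`.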