MultiCloud Classroom notes 06/Sep/2025

Data lakes

  • A data lake is a centralized storage repository that can hold all types of data (structured, unstructured, or semi-structured) at any scale.
  • Structured data: CSV, tables, databases
  • Unstructured data: images, videos, PDF documents, audio
  • Semi-structured data: JSON, XML, logs, sensor data
  • Data lakes keep all enterprise data in one place, so the same data is flexible enough to serve:
    • AI/ML
    • BI
    • Ad hoc queries
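
The three categories above can be sketched as a small classifier that sorts object keys by file extension. This is only an illustration: the extension sets follow the examples in these notes, and the function name `classify` and the sample keys are made up for the sketch.

```python
from pathlib import PurePosixPath

# Extension sets follow the examples in the notes; real data lakes hold
# many more formats, so treat these as illustrative assumptions.
STRUCTURED = {".csv"}
SEMI_STRUCTURED = {".json", ".xml", ".log"}
UNSTRUCTURED = {".jpg", ".png", ".mp4", ".pdf", ".mp3", ".wav"}


def classify(key: str) -> str:
    """Return the data category for an object key based on its extension."""
    ext = PurePosixPath(key).suffix.lower()
    if ext in STRUCTURED:
        return "structured"
    if ext in SEMI_STRUCTURED:
        return "semi-structured"
    if ext in UNSTRUCTURED:
        return "unstructured"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical object keys, as they might appear in a data lake bucket.
    for key in ["sales/2025/orders.csv", "logs/app.log", "media/intro.mp4"]:
        print(key, "->", classify(key))
```

A data lake stores all of these side by side; the category only matters later, when choosing the tool that reads the data.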

Data lakes & S3

  • Since S3 buckets support all file types, scale storage as needed, and offer low-cost storage tiers, S3 is a good candidate for storing data lake data.
  • Ecosystem integration:
    • AWS Glue: ETL (extract, transform, load) service to catalog and clean data
    • Athena: run SQL queries directly on data in S3
    • Redshift
    • EMR
    • SageMaker
    • Lake Formation
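
To make "run SQL queries directly on data in S3" a bit more concrete, here is a minimal sketch that builds the request parameters for Athena's StartQueryExecution call. The table name `web_logs`, the database `datalake_demo`, and the results bucket are hypothetical, and the sketch only builds the request, it does not call AWS.

```python
def athena_query_params(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location that we choose.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# All names below are hypothetical placeholders.
params = athena_query_params(
    sql="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
    database="datalake_demo",
    output_location="s3://example-athena-results/queries/",
)
print(params)

# With boto3 installed and AWS credentials configured, the query could be
# submitted like this (not executed here):
#   import boto3
#   boto3.client("athena").start_query_execution(**params)
```

The table itself would normally be registered in the Glue Data Catalog first, which is where the Glue bullet above fits in.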

S3 bucket types

  • Classic/General Purpose Bucket:
    • Use cases:
      • Website Hosting
      • Backup and Archival
      • Big data storage
  • Directory Bucket (S3 Express One Zone):
    • These buckets have real directories (unlike the virtual folders in a general purpose bucket)
    • Each directory scales performance independently
    • Supports only one storage class: S3 Express One Zone
    • Use cases:
      • AI/ML training datasets
      • High frequency log ingestion
      • Single AZ workloads
  • Table Buckets: meant to store tabular, structured data.
    • Use cases:
      • Simpler data lake tables
  • Vector Buckets: a bucket type designed for storing vector data used in Gen AI applications
    • Use cases:
      • RAG
      • Recommendation systems
      • Image/audio search
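
The vector use cases above (RAG, recommendations, image/audio search) all come down to finding the stored vectors closest to a query vector. The following is a minimal in-memory sketch of that idea using cosine similarity. It does not use any S3 API; the document keys and the 3-dimensional vectors are made up (real embeddings have hundreds of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical embeddings keyed by document name.
INDEX = {
    "doc-s3-pricing": [0.9, 0.1, 0.0],
    "doc-ebs-resize": [0.1, 0.9, 0.1],
    "doc-azure-disks": [0.2, 0.8, 0.3],
}

print(top_k([0.0, 1.0, 0.1], INDEX, k=2))
# prints ['doc-ebs-resize', 'doc-azure-disks']
```

In a RAG setup, the returned keys would be used to fetch the matching documents and pass them to the model as context.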

Azure storage account types

  • General purpose storage account with blob containers (comparable to an S3 general purpose bucket)
  • ADLS Gen2 (Azure Data Lake Storage Gen2) => Blob storage + directory tree + analytics features
  • Table storage
  • Queue storage
  • Hierarchical namespace => comparable to an S3 directory bucket
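
The difference between virtual folders and a real directory tree can be sketched as follows. In a flat namespace (S3 general purpose bucket, plain blob container) every object is just a key, and a "folder" listing is emulated by filtering on a prefix and splitting at a delimiter, similar to how S3 listing returns common prefixes. The keys and the helper `list_folder` are hypothetical.

```python
def list_folder(keys, prefix, delimiter="/"):
    """Emulate listing one 'folder' level in a flat key namespace."""
    files, subfolders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Anything deeper is rolled up into a virtual subfolder.
            subfolders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        elif rest:
            files.append(key)
    return sorted(files), sorted(subfolders)


# Hypothetical flat list of object keys.
KEYS = [
    "raw/2025/09/events.json",
    "raw/2025/09/clicks.csv",
    "raw/2025/10/events.json",
    "raw/readme.txt",
]

print(list_folder(KEYS, "raw/"))
# prints (['raw/readme.txt'], ['raw/2025/'])
print(list_folder(KEYS, "raw/2025/"))
# prints ([], ['raw/2025/09/', 'raw/2025/10/'])
```

With a hierarchical namespace the directories exist as real entries, so the storage service does not have to derive them from key names like this.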

Disk Storage

  • In the cloud, when we attach a disk to a virtual machine (EC2/Azure VM), the important distinction is whether the disk has the OS on it (AWS root volume/Azure OS disk).
  • AWS allows adding additional non-root volumes and Azure allows adding data disks. In Azure, the number of data disks that can be attached depends on the VM size.
  • Both AWS and Azure allow us to increase disk sizes on the fly. Extending the existing filesystem to use the increased disk size has to be done by us.
  • A backup of a disk/EBS volume is referred to as a snapshot.
  • Neither AWS nor Azure allows us to decrease the size of a disk once it is created.
  • Generally, the disk and the VM should be in the same zone.
  • AWS allows changing disk sizes (increase only). AWS makes the new size available to the EC2 instance; we then need to grow the partition and resize the filesystem to make use of the additional space.
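
The "resize the filesystem" step can be sketched as a small helper that prints the usual Linux commands for it. This is a dry run under assumptions: the device names (`/dev/nvme0n1`, `/dev/xvda`) depend on the instance type, the helper `grow_commands` is hypothetical, and the commands should be checked against `lsblk` output on the actual instance before running them.

```python
def grow_commands(disk: str, partition_number: int, fstype: str,
                  mount_point: str = "/") -> list[str]:
    """Return the shell commands that extend a partition and its filesystem."""
    # NVMe names put a "p" before the partition number (/dev/nvme0n1p1);
    # names such as /dev/xvda do not (/dev/xvda1).
    sep = "p" if disk[-1].isdigit() else ""
    partition = f"{disk}{sep}{partition_number}"

    # Step 1: grow the partition to fill the enlarged disk.
    commands = [f"sudo growpart {disk} {partition_number}"]

    # Step 2: grow the filesystem to fill the partition.
    if fstype == "ext4":
        commands.append(f"sudo resize2fs {partition}")
    elif fstype == "xfs":
        commands.append(f"sudo xfs_growfs -d {mount_point}")
    else:
        raise ValueError(f"unsupported filesystem type: {fstype}")
    return commands


if __name__ == "__main__":
    # Assumed layout: root filesystem on partition 1 of an NVMe disk, ext4.
    for cmd in grow_commands("/dev/nvme0n1", 1, "ext4"):
        print(cmd)
```

The helper only builds the command strings, so it is safe to run anywhere; the commands themselves change the disk layout and are meant for the instance whose volume was enlarged.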

Create a Windows EC2 instance and increase disk sizes

  • Watch classroom video

Create a Linux EC2 instance and increase disk sizes

  • On the Linux instance, to view disks:
lsblk
  • To view mounts:
df -h
  • Prompt for helping you mount:
As an expert Linux admin, help me with some of the activities which I need. I'm using Ubuntu Server 24.04
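
To confirm that a mount point actually sees the extra space after a resize, the same check that `df -h` gives can be sketched in Python with the standard library. The helper name `usage_summary` is made up for this sketch, and the percentage is a simple used/total ratio, so it may differ slightly from what `df` reports.

```python
import shutil


def usage_summary(path: str = "/") -> str:
    """Summarize total and used space for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    gib = 1024 ** 3
    percent = (usage.used / usage.total * 100) if usage.total else 0.0
    return (f"{path}: {usage.total / gib:.1f} GiB total, "
            f"{usage.used / gib:.1f} GiB used ({percent:.0f}%)")


if __name__ == "__main__":
    print(usage_summary("/"))
```

Running it before and after the resize should show the total growing by roughly the amount added to the volume.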

By continuous learner