AWS Classroom Series – 01/Dec/2021

S3 Contd

  • S3 is accessed over HTTP(S). Whether we access S3 from the browser, the CLI, or programmatically, internally all of these requests are made through the S3 REST API.
  • Operations on S3
HTTP Verb   CRUD Operation in AWS S3
GET         Read
PUT         Create/Update
DELETE      Delete
POST        Create
  • The S3 URI used by the CLI is written as s3://<bucket-name>, or s3://<bucket-name>/<object-key> to refer to a specific object.
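To make the REST mapping concrete, here is a minimal Python sketch that splits a CLI-style s3://<bucket-name>/<object-key> URI and builds the virtual-hosted-style HTTPS URL (https://<bucket>.s3.<region>.amazonaws.com/<key>) that a GET or PUT for that object would be sent to. The function names and the default region us-east-1 are illustrative assumptions, and real requests to private buckets must also be signed (AWS Signature Version 4), which the CLI and SDKs do for you.

```python
from typing import Tuple
from urllib.parse import quote, urlparse

# CRUD operation -> HTTP verb, following the table above
CRUD_TO_VERB = {
    "read": "GET",
    "create": "PUT",   # POST is also used, e.g. for browser form uploads
    "update": "PUT",
    "delete": "DELETE",
}


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); the key may be empty."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Note: keys containing '?' or '#' would need extra handling with urlparse
    return parsed.netloc, parsed.path.removeprefix("/")


def to_https_url(uri: str, region: str = "us-east-1") -> str:
    """Build a virtual-hosted-style URL for the object named by an s3:// URI."""
    bucket, key = parse_s3_uri(uri)
    # Percent-encode the key but keep '/' so "folder" separators stay readable
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


if __name__ == "__main__":
    uri = "s3://qtaws/images/image2.1.png"
    print(CRUD_TO_VERB["read"], to_https_url(uri))
```

The same bucket/key pair is what the CLI, the SDKs and the console all resolve to before issuing the HTTP request.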

AWS S3 Data Consistency Model

  • S3 is an object store that can be accessed over the web (a web store).
  • The S3 service is intended for the "write once, read many" use case.
  • That's the reason why the S3 architecture is different from a traditional file system or network SAN (storage area network) architecture.
  • [Figure: S3 Infrastructure Preview]
  • The figure above is a simplified view of the AZs; in reality, S3 Standard stores data across a minimum of three AZs.
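  • When we create a new object, the data is stored synchronously across multiple facilities before S3 returns success; this provides read-after-write consistency.
  • Under S3's original consistency model, operations on existing objects (overwrites and deletes) were only eventually consistent. Since December 2020, S3 provides strong read-after-write consistency for all PUT and DELETE requests, including overwrites, as well as for list operations.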
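The link between synchronous replication and read-after-write consistency can be illustrated with a toy in-memory model. This is not how S3 is implemented internally; the three zone names, the class, and its methods are hypothetical. put() updates every replica before reporting success, so a read served by any replica sees the new data, while put_async() models the older eventually consistent overwrite behaviour by updating one replica now and the others later.

```python
import random
from typing import Optional


class ToyReplicatedStore:
    """Hypothetical store keeping one copy of every object per Availability Zone."""

    def __init__(self, zones=("az-a", "az-b", "az-c")):
        self.replicas = {zone: {} for zone in zones}  # zone -> {key: data}
        self.pending = []  # deferred (zone, key, data) writes made by put_async

    def put(self, key: str, data: bytes) -> str:
        # Synchronous replication: every replica is updated before success is returned
        for objects in self.replicas.values():
            objects[key] = data
        return "200 OK"

    def put_async(self, key: str, data: bytes) -> str:
        # Asynchronous replication: update the first replica now, the rest later
        first, *rest = self.replicas
        self.replicas[first][key] = data
        self.pending.extend((zone, key, data) for zone in rest)
        return "200 OK"

    def replicate(self) -> None:
        # Apply deferred writes so all replicas converge (eventual consistency)
        for zone, key, data in self.pending:
            self.replicas[zone][key] = data
        self.pending.clear()

    def get(self, key: str, zone: Optional[str] = None) -> bytes:
        # A read may be served by any replica; pick one at random unless one is given
        zone = zone or random.choice(list(self.replicas))
        return self.replicas[zone][key]


if __name__ == "__main__":
    store = ToyReplicatedStore()
    store.put("images/image2.1.png", b"v1")
    # After a synchronous PUT, every replica returns v1
    print({z: store.get("images/image2.1.png", z) for z in store.replicas})

    store.put_async("images/image2.1.png", b"v2")
    # Before replication, az-b and az-c still return the stale v1
    print({z: store.get("images/image2.1.png", z) for z in store.replicas})
    store.replicate()
    print({z: store.get("images/image2.1.png", z) for z in store.replicas})
```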

AWS S3 Performance Considerations

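  • It's important to understand the best practices for partitioning if the workload you plan to run on an S3 bucket is going to exceed 100 PUT/LIST/DELETE requests per second or 300 GET requests per second. (These thresholds come from AWS's older guidance; since 2018, S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, and randomizing key prefixes is generally no longer necessary.)
  • In this case we need to follow the partitioning guidelines so that we don't end up with a performance bottleneck.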
  • Bucket names are globally unique, so the bucket name combined with an object name (key) identifies every object uniquely across the globe.
  • S3 scales to support very high request rates; to do so, it automatically partitions your buckets by key-name prefix.
  • Object keys are stored in UTF-8 binary order, and a key can be at most 1,024 bytes long when UTF-8 encoded.
  • For example, suppose the bucket name is qtaws and it contains an image image2.1.png:
Bucket      Object Key
qtaws       images/image2.1.png
  • Let's assume you have 20 objects such as:
images/image2.1.png
images/image2.2.png
images/image2.3.png
images/image3.1.png
images/image3.2.png
images/image3.3.png
images/image3.4.png
images/image4.1.png
...
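  • Here everything falls under the same partition, qtaws/i, because every key starts with the same character (i), which acts as the partition key.
  • To solve this problem, prefix each key with a distinguishing character (here, the image's leading number):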
2images/image2.1.png
2images/image2.2.png
2images/image2.3.png
3images/image3.1.png
3images/image3.2.png
3images/image3.3.png
3images/image3.4.png
4images/image4.1.png
  • Now our objects are distributed across the following partitions instead of just one:
qtaws/1
qtaws/2
qtaws/3
qtaws/4
qtaws/5
qtaws/6
...
qtaws/9
  • Reverse the Key Name String
    • Suppose that when you upload data from your application, the sequential application ID in the key increases by 1 with every set of uploads:
    applicationid/421212342/log.txt
    applicationid/421212342/error.txt
    applicationid/421212343/log.txt
    applicationid/421212343/error.txt
    applicationid/421212344/log.txt
    applicationid/421212344/error.txt
    ....
    
    • In this case everything falls under the same partition, applicationid/4. Simply reverse the ID string in the key to solve the partitioning issue:
    applicationid/243212124/log.txt
    applicationid/243212124/error.txt
    applicationid/343212124/log.txt
    applicationid/343212124/error.txt
    applicationid/443212124/log.txt
    applicationid/443212124/error.txt
    ...
    
    • Now the partitioning issue is solved: the keys are spread across applicationid/2, applicationid/3, applicationid/4, and so on.
    • Another way to resolve this issue is to add a hash prefix to the ID:
    applicationid/178D421212342/log.txt
    applicationid/178D421212342/error.txt
    applicationid/CEAB421212343/log.txt
    applicationid/CEAB421212343/error.txt
    ...
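The bullet above noting that keys are stored in UTF-8 binary order with a 1,024-byte limit has two practical consequences: the limit is measured in bytes, not characters, and the ordering is case-sensitive byte order. A minimal Python sketch of both follows; key_bytes and utf8_binary_order are hypothetical helper names, not part of any AWS SDK.

```python
MAX_KEY_BYTES = 1024  # maximum length of an S3 object key, measured in UTF-8 bytes


def key_bytes(key: str) -> bytes:
    """Return the UTF-8 encoding of a key, enforcing the 1..1,024-byte range."""
    encoded = key.encode("utf-8")
    if not 0 < len(encoded) <= MAX_KEY_BYTES:
        raise ValueError(f"key must be 1..{MAX_KEY_BYTES} UTF-8 bytes, got {len(encoded)}")
    return encoded


def utf8_binary_order(keys):
    """Sort keys by their raw UTF-8 bytes (UTF-8 binary order)."""
    return sorted(keys, key=key_bytes)


if __name__ == "__main__":
    # 600 characters but 1,200 bytes, because 'é' takes two bytes in UTF-8
    try:
        key_bytes("é" * 600)
    except ValueError as err:
        print(err)
    # Digits sort before uppercase letters, which sort before lowercase letters
    print(utf8_binary_order(["images/a.png", "Images/b.png", "2images/c.png"]))
```

Because UTF-8 byte order matches Unicode code point order, this is the same order Python's default string sort produces; the point is that it is not a locale-aware or case-insensitive order.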
    
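The qtaws/i example can be checked with a toy model in which the partition is simply the bucket name plus the first character of the key, matching the simplification used in these notes (S3's real partitioning is internal and adaptive). toy_partition and number_prefixed are hypothetical helpers; the sketch counts how many of the sample image keys land in each partition before and after adding the numeric prefix.

```python
from collections import Counter


def toy_partition(bucket: str, key: str, prefix_len: int = 1) -> str:
    """Hypothetical partition label: the bucket plus the first prefix_len key characters."""
    return f"{bucket}/{key[:prefix_len]}"


def number_prefixed(key: str) -> str:
    """images/image2.1.png -> 2images/image2.1.png (prefix = leading image number)."""
    filename = key.rsplit("/", 1)[-1]           # e.g. image2.1.png
    number = filename.removeprefix("image")[0]  # e.g. 2
    return number + key


keys = [f"images/image{n}.png" for n in ("2.1", "2.2", "2.3", "3.1", "3.2", "3.3", "3.4", "4.1")]

before = Counter(toy_partition("qtaws", k) for k in keys)
after = Counter(toy_partition("qtaws", number_prefixed(k)) for k in keys)

if __name__ == "__main__":
    print(before)  # all eight keys land in qtaws/i
    print(after)   # keys are spread over qtaws/2, qtaws/3 and qtaws/4
```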

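Both key-rewriting techniques from the "Reverse the Key Name String" notes can be sketched in a few lines of Python. The sketch assumes keys have exactly three segments (prefix/ID/filename); MD5 and a 4-character uppercase hex prefix are assumptions chosen to resemble the example, so the generated prefixes will not match the illustrative 178D and CEAB values above. Hashing only the ID keeps log.txt and error.txt for the same upload under the same prefix.

```python
import hashlib


def reverse_id_key(key: str) -> str:
    """applicationid/421212342/log.txt -> applicationid/243212124/log.txt"""
    prefix, seq_id, filename = key.split("/")
    return f"{prefix}/{seq_id[::-1]}/{filename}"


def hash_prefixed_key(key: str, width: int = 4) -> str:
    """Prepend a short hex hash of the ID, e.g. applicationid/XXXX421212342/log.txt."""
    prefix, seq_id, filename = key.split("/")
    # Hash only the ID, so all files from one upload share the same prefix
    digest = hashlib.md5(seq_id.encode("utf-8")).hexdigest()[:width].upper()
    return f"{prefix}/{digest}{seq_id}/{filename}"


keys = [
    f"applicationid/{seq_id}/{name}"
    for seq_id in ("421212342", "421212343", "421212344")
    for name in ("log.txt", "error.txt")
]

if __name__ == "__main__":
    for key in keys:
        print(reverse_id_key(key), hash_prefixed_key(key))
    # The character after "applicationid/" now varies instead of always being 4
    print(sorted({reverse_id_key(k).split("/")[1][0] for k in keys}))
```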

About learningthoughtsadmin