AutoScaling
- When a user accesses the server, the server typically creates a thread to handle the request. Each thread is a unit of execution that can run in parallel with other threads, so it puts direct load on the CPU. Each thread is also allocated some memory.
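
To make the thread-per-request idea concrete, here is a minimal sketch in Python. The names `handle_request` and `serve_concurrently` are hypothetical, and the short sleep merely stands in for real request work; actual servers often reuse threads from a pool rather than creating one per request.

```python
import threading
import time


def handle_request(user_id, handled_by):
    # Stand-in for real request processing (assumption: 50 ms of work).
    time.sleep(0.05)
    # Record which thread served this user.
    handled_by[user_id] = threading.current_thread().name


def serve_concurrently(user_ids):
    """Start one thread per user request and wait for all of them to finish."""
    handled_by = {}
    threads = [
        threading.Thread(
            target=handle_request,
            args=(uid, handled_by),
            name=f"request-thread-{uid}",
        )
        for uid in user_ids
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled_by


if __name__ == "__main__":
    # Three simultaneous users lead to three separate threads.
    for user, thread_name in sorted(serve_concurrently([1, 2, 3]).items()):
        print(user, thread_name)
```

Every additional concurrent user adds another thread, which is why CPU and memory usage grow with traffic.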

- As more users access your application in parallel, the number of threads increases, which increases the load on the CPU.
- On a server, CPU is a finite resource, so there is a limit at which the server's CPU is exhausted.
- In the cloud, we can create as many VMs as we need and pay only for what we use.
- In GCP we have managed instance groups (MIGs), which let us create a group of VMs from the same instance template, so every VM runs the same image.
- Next, we enable autoscaling on the MIG: we define the conditions under which the number of VMs running the application should increase when demand rises and decrease when demand falls.
- Autoscaling is driven by an autoscaling policy. Policies can be based on CPU utilization or on Stackdriver (now Cloud Monitoring) metrics.
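- Let's try to understand the following terms:
- Cool down period: Also known as the application initialization period. This is the time an application needs to initialize on a newly started VM. Usage data from a VM during this period may not reflect normal load, so the autoscaler does not rely on it when deciding to scale out.
- Stabilization period: The recent window (the last 10 minutes in GCP) whose peak load the autoscaler considers before scaling in, so that a brief dip in demand does not remove VMs too early.
- Let's configure autoscaling for a managed instance group created from an instance template.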
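
The CPU-based policy and the stabilization period can be illustrated with a small Python sketch. This is a simplified model and not GCP's actual algorithm: the `Autoscaler` class, its parameters, and the proportional sizing formula are assumptions made for illustration, and the cool down period is not modeled.

```python
import math
from collections import deque


class Autoscaler:
    """Toy target-utilization autoscaler (illustrative only, not GCP's implementation)."""

    def __init__(self, target_cpu_percent, min_vms, max_vms, stabilization_seconds=600):
        self.target_cpu_percent = target_cpu_percent
        self.min_vms = min_vms
        self.max_vms = max_vms
        self.stabilization_seconds = stabilization_seconds
        self._history = deque()  # (timestamp, recommended size) pairs

    def decide(self, now, current_vms, avg_cpu_percent):
        # Size the group so that average CPU would land on the target.
        raw = math.ceil(current_vms * avg_cpu_percent / self.target_cpu_percent)
        # Clamp to the configured minimum and maximum group size.
        recommended = max(self.min_vms, min(self.max_vms, raw))
        self._history.append((now, recommended))
        # Forget recommendations that are older than the stabilization window.
        while self._history[0][0] <= now - self.stabilization_seconds:
            self._history.popleft()
        # Scale out immediately, but scale in only after the peak has aged out.
        return max(size for _, size in self._history)


if __name__ == "__main__":
    scaler = Autoscaler(target_cpu_percent=60, min_vms=1, max_vms=10)
    print(scaler.decide(now=0, current_vms=4, avg_cpu_percent=90))
    print(scaler.decide(now=60, current_vms=6, avg_cpu_percent=20))
    print(scaler.decide(now=700, current_vms=6, avg_cpu_percent=20))
```

In this sketch the group grows as soon as CPU exceeds the target, but it shrinks only once the earlier, higher recommendation has fallen out of the stabilization window.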

- Let's add a load balancer in front of the instance group.

- Exercise: Try the same exercise with an instance template that uses a VM size with at least 2 vCPUs and 4 GB of RAM.
