Infrastructure as code using AWS CloudFormation and Chef: CloudFormation

Resources for standalone EC2 instance

    "InstanceSecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {
        "GroupDescription": "Enable SSH access and HTTP",
        "SecurityGroupIngress": [{
          "IpProtocol": "tcp",
          "FromPort": "22",
          "ToPort": "22",
          "CidrIp": ""
        }, {
          "IpProtocol": "tcp",
          "FromPort": "80",
          "ToPort": "80",
          "CidrIp": ""
        }]
      }
    },
    "WebServerInstance": {
      "Type": "AWS::EC2::Instance",
      "Metadata": {
        "Comment": "Install chef",
        "AWS::CloudFormation::Init": {
          "configSets": {
            "All": ["setupDefault"]
          },
          "setupDefault": {
            "packages": {
              "yum": {
                "git": []
              },
              "rpm": {
                "chefdk": ""
              }
            },
            "files": {
              "/tmp/": {
                "source": "",
                "mode": "000400",
                "owner": "root",
                "group": "root"
              },
              "/etc/cfn/cfn-hup.conf": {
                "content": {
                  "Fn::Join": ["", [
                    "stack=", {
                      "Ref": "AWS::StackId"
                    }, "\n",
                    "region=", {
                      "Ref": "AWS::Region"
                    }, "\n"
                  ]]
                },
                "mode": "000400",
                "owner": "root",
                "group": "root"
              },
              "/etc/cfn/hooks.d/cfn-auto-reloader.conf": {
                "content": {
                  "Fn::Join": ["", [
                    "action=/opt/aws/bin/cfn-init -v ",
                    " --stack ", {
                      "Ref": "AWS::StackName"
                    },
                    " --resource LaunchConfig ",
                    " --region ", {
                      "Ref": "AWS::Region"
                    }, "\n"
                  ]]
                }
              }
            },
            "commands": {
              "install_chef": {
                "command": "bash /tmp/"
              },
              "clone_git": {
                "command": "sudo -u ec2-user bash -c 'cd ;git clone  '"
              }
            },
            "services": {
              "sysvinit": {
                "cfn-hup": {
                  "enabled": "true",
                  "ensureRunning": "true",
                  "files": ["/etc/cfn/cfn-hup.conf", ...]
                }
              }
            }
          }
        }
      }
    }

Autoscaling Group

Now that we have a standalone instance, we can add other components to the stack to turn it into a scalable web application: the group expands when traffic grows and contracts when traffic slows down. For such a web-scale application, we need the following resources:

  1. Autoscaling Group: Multiple Servers that can scale up and down
  2. Scaling up policy
  3. Scaling down policy
  4. Alarm that triggers scale up policy
  5. Alarm that triggers scale down policy
  6. Load Balancer: a server that distributes traffic to the autoscaling group.

Load Balancer

This is where the website's DNS record points. The load balancer accepts each incoming request and forwards it to one of the servers in the autoscaling group.

    "ElasticLoadBalancer": {
      "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
      "Properties": {
        "AvailabilityZones": {
          "Fn::GetAZs": ""
        },
        "CrossZone": "true",
        "Listeners": [{
          "LoadBalancerPort": "80",
          "InstancePort": "xxxx",
          "Protocol": "HTTP"
        }],
        "HealthCheck": {
          "Target": "HTTP:xxxx/health-check",
          "HealthyThreshold": "3",
          "UnhealthyThreshold": "5",
          "Interval": "30",
          "Timeout": "5"
        }
      }
    }

  1. This is a multi-AZ (Availability Zone) load balancer.
  2. Listeners: Declares the port the load balancer listens on (80) and the port the web application runs on.
  3. HealthCheck – Target: The load balancer requests the Target URL on each instance registered with it to make sure the instance is healthy. If the checks keep failing (no 200 response), it marks the instance as unhealthy and stops sending traffic to it. This is useful when something goes wrong and the application stops. We implemented a special endpoint for this.
  4. HealthCheck – HealthyThreshold: Number of consecutive times the health check endpoint must return 200 before the instance is marked healthy.
  5. HealthCheck – UnhealthyThreshold: Number of consecutive times the health check must fail (return anything but 200) before the instance is marked unhealthy.
  6. HealthCheck – Interval: Time in seconds between health checks. If an instance goes bad right after it's marked in service, the load balancer needs 5 failed checks at 30-second intervals in this case, so it may take about 150 seconds to stop traffic.
  7. HealthCheck – Timeout: How long, in seconds, the load balancer waits for a response. If the instance does not respond within that time, the check counts as a failure.
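
To see how the two thresholds interact, here is a minimal Python sketch of the consecutive-count logic described above. It is a simplified model, not the load balancer's actual implementation: the `HealthTracker` class, its field names, and the assumption that a new instance starts out of service are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one instance's state from consecutive health check results."""
    healthy_threshold: int = 3     # matches HealthyThreshold in the template
    unhealthy_threshold: int = 5   # matches UnhealthyThreshold in the template
    healthy: bool = False          # assumption: a new instance starts out of service
    streak: int = 0                # consecutive results that contradict the current state

    def record(self, status_code: int) -> bool:
        """Feed one health check result and return the state after it."""
        passed = status_code == 200
        if passed == self.healthy:
            # The result agrees with the current state, so the opposing streak resets.
            self.streak = 0
            return self.healthy
        self.streak += 1
        needed = self.healthy_threshold if passed else self.unhealthy_threshold
        if self.streak >= needed:
            self.healthy = passed
            self.streak = 0
        return self.healthy


def approx_detection_seconds(interval: int, unhealthy_threshold: int) -> int:
    """Rough time to pull a failed instance; ignores the per-check Timeout."""
    return interval * unhealthy_threshold


tracker = HealthTracker()
states = [tracker.record(code) for code in [200, 200, 200, 500, 500, 500, 500, 500]]
print(states)
print(approx_detection_seconds(interval=30, unhealthy_threshold=5))
```

With the values from the template, the instance only enters service on the third passing check and only leaves it on the fifth consecutive failure, which is why a low interval or threshold reacts faster but is also easier to trip by accident.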


Note: Take special care when updating health checks. If a health check is set up incorrectly or its timeouts are unreasonable, the load balancer can mark every instance unhealthy and take the whole application down.

Autoscaling group

This is a group of servers that grows and shrinks based on the specified scaling logic to account for increasing or decreasing traffic. Instead of website owners watching the metrics and adding servers manually, the group takes care of it automatically, which is great for uptime and customer experience.

    "WebServerGroup": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "Properties": {
        "AvailabilityZones": {
          "Fn::GetAZs": ""
        },
        "LaunchConfigurationName": {
          "Ref": "LaunchConfig"
        },
        "MinSize": "3",
        "MaxSize": "10",
        "LoadBalancerNames": [{
          "Ref": "ElasticLoadBalancer"
        }],
        "NotificationConfigurations": [{
          "TopicARN": {
            "Ref": "NotificationTopic"
          },
          "NotificationTypes": ["autoscaling:EC2_INSTANCE_LAUNCH", ...]
        }]
      },
      "CreationPolicy": {
        "ResourceSignal": {
          "Timeout": "PT15M",
          "Count": "1"
        }
      },
      "UpdatePolicy": {
        "AutoScalingRollingUpdate": {
          "MinInstancesInService": "1",
          "MaxBatchSize": "1",
          "PauseTime": "PT15M",
          "WaitOnResourceSignals": "true"
        }
      }
    },
    "LaunchConfig" : {
      "Type" : "AWS::AutoScaling::LaunchConfiguration",
      "Metadata" : {...}
    }

  1. The autoscaling group has a min, max and desired count. Desired always lies between min and max. It can be changed by an authorized user or an autoscaling policy.
  2. LaunchConfigurationName – This is the configuration for each instance in the autoscaling group. This is exactly the same as in the WebServerInstance discussed earlier.
  3. NotificationConfigurations – This is a hook for SNS notifications, in case you want to be notified whenever any scaling happens.
  4. UpdatePolicy: AutoScalingRollingUpdate – This handles the update policy for the group. This configuration means: update one instance at a time, keep at least one instance in service while performing the update, and wait up to 15 minutes before updating the next batch (the next instance, since the batch size is 1).
  5. UpdatePolicy: WaitOnResourceSignals – This means the autoscaling group must wait for a signal from the new instances for up to the pause time (15 minutes here). If no signal arrives in that time, the update does not complete.
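
As a rough illustration of what the rolling update settings imply, the following Python sketch splits a group into update batches and estimates how long an update could take. It is a simplified model under stated assumptions, not CloudFormation's actual algorithm: the function names and the instance IDs are hypothetical, and the estimate assumes every batch uses its full pause time.

```python
def rolling_update_batches(instance_ids, max_batch_size, min_in_service):
    """Split instances into update batches without dropping below min_in_service."""
    total = len(instance_ids)
    # Never take down so many instances at once that fewer than min_in_service remain.
    batch_size = min(max_batch_size, total - min_in_service)
    if batch_size < 1:
        raise ValueError("group is too small to update while keeping min_in_service")
    return [instance_ids[i:i + batch_size] for i in range(0, total, batch_size)]


def worst_case_minutes(batches, pause_minutes):
    """Upper estimate: every batch waits the full pause time for its signal."""
    return len(batches) * pause_minutes


# Hypothetical instance IDs for a group running at its MinSize of 3.
group = ["i-aaa", "i-bbb", "i-ccc"]
batches = rolling_update_batches(group, max_batch_size=1, min_in_service=1)
print(batches)
print(worst_case_minutes(batches, pause_minutes=15))
```

With a batch size of 1 the update is safe but slow, so for larger groups it may be worth raising MaxBatchSize as long as MinInstancesInService still covers the expected traffic.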


Scaling policies specify how the autoscaling group expands or shrinks when an alarm triggers.

    "WebServerScaleUpPolicy" : {
      "Type" : "AWS::AutoScaling::ScalingPolicy",
      "Properties" : {
        "AdjustmentType" : "ChangeInCapacity",
        "AutoScalingGroupName" : { "Ref" : "WebServerGroup" },
        "Cooldown" : "60",
        "ScalingAdjustment" : "1"
      }
    },
    "WebServerScaleDownPolicy" : {
      "Type" : "AWS::AutoScaling::ScalingPolicy",
      "Properties" : {
        "AdjustmentType" : "ChangeInCapacity",
        "AutoScalingGroupName" : { "Ref" : "WebServerGroup" },
        "Cooldown" : "60",
        "ScalingAdjustment" : "-1"
      }
    }

  1. WebServerScaleUpPolicy – Increase the size of the autoscaling group by 1 instance and wait 60 seconds before running this policy again.
  2. WebServerScaleDownPolicy – Decrease the size of the autoscaling group by 1 instance and wait 60 seconds before running this policy again.
  3. These can also be configured as a percentage of the autoscaling group. For instance, adding 1 instance when there are 3 increases capacity by 33%, but adding 1 when there are 5 increases capacity by 20%. For consistency's sake, it may be best to specify a percentage increase if the group can scale from very low to very high.
  4. Adding and removing instances may need some tuning. If the limits are tight, such as scale down below 50% and scale up above 55%, the group can add an instance that brings the load below 50%, then remove an instance, which takes the load up to 60% and triggers another add, so the group keeps scaling all the time. This is not good, so the advice here is to spread the limits out a bit, with a gap of about 20% between the scale-down and scale-up thresholds.
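
The oscillation described in the last point can be reproduced with a small Python sketch. It is a toy model under loud assumptions: the total load stays constant, it is spread evenly across instances, and each round applies at most one adjustment of 1 instance. The function name and the load figure of 180 are hypothetical.

```python
def simulate_scaling(total_load, instances, scale_down_below, scale_up_above,
                     min_size=3, max_size=10, rounds=6):
    """Return the instance count after each evaluation round.

    total_load is the summed CPU demand in "percent of one instance" units,
    so the group average is total_load / instances.
    """
    history = []
    for _ in range(rounds):
        average = total_load / instances
        if average > scale_up_above and instances < max_size:
            instances += 1   # scale-up policy: ScalingAdjustment = 1
        elif average < scale_down_below and instances > min_size:
            instances -= 1   # scale-down policy: ScalingAdjustment = -1
        history.append(instances)
    return history


# Tight limits (50% / 55%): 3 instances average 60%, 4 instances average 45%.
print(simulate_scaling(180, 3, scale_down_below=50, scale_up_above=55))

# Wider limits (40% / 60%): neither threshold is crossed, so the group is stable.
print(simulate_scaling(180, 3, scale_down_below=40, scale_up_above=60))
```

With the tight limits the group alternates between 3 and 4 instances on every round, while the wider gap leaves it at 3. Real traffic is not constant, so this only illustrates why the thresholds should not sit close together.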

CloudWatch Alarms

These alarms monitor the bottleneck metric of the autoscaling group; for most applications it's CPU utilization. When an alarm triggers, it activates the policy it is configured with, which causes the autoscaling group to grow or shrink based on the amount of traffic.

    "CPUAlarmHigh": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmDescription": "Scale-up if CPU > 60% for 10 minutes",
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Statistic": "Average",
        "Period": "300",
        "EvaluationPeriods": "2",
        "Threshold": "60",
        "AlarmActions": [{
          "Ref": "WebServerScaleUpPolicy"
        }],
        "Dimensions": [{
          "Name": "AutoScalingGroupName",
          "Value": {
            "Ref": "WebServerGroup"
          }
        }],
        "ComparisonOperator": "GreaterThanThreshold"
      }
    },
    "CPUAlarmLow": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmDescription": "Scale-down if CPU < 50% for 10 minutes",
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Statistic": "Average",
        "Period": "300",
        "EvaluationPeriods": "2",
        "Threshold": "50",
        "AlarmActions": [{
          "Ref": "WebServerScaleDownPolicy"
        }],
        "Dimensions": [{
          "Name": "AutoScalingGroupName",
          "Value": {
            "Ref": "WebServerGroup"
          }
        }],
        "ComparisonOperator": "LessThanThreshold"
      }
    }

  1. CPUAlarmHigh – This triggers WebServerScaleUpPolicy if the average CPU utilization of the autoscaling group is greater than 60% for 2 consecutive 300-second periods (10 minutes).
  2. CPUAlarmLow – This triggers WebServerScaleDownPolicy if the average CPU utilization of the autoscaling group is less than 50% for 2 consecutive 300-second periods (10 minutes).
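
The "2 consecutive periods" rule can be sketched in Python as follows. This is a simplified model of the evaluation described above, not CloudWatch's implementation: the function name, the `COMPARATORS` table, and the sample datapoints are hypothetical, and each datapoint stands for the group's average CPU over one 300-second period.

```python
import operator

# Only the two comparison operators used in the template are modelled here.
COMPARATORS = {
    "GreaterThanThreshold": operator.gt,
    "LessThanThreshold": operator.lt,
}


def alarm_state(period_averages, threshold, evaluation_periods, comparison_operator):
    """Return "ALARM" if the most recent evaluation_periods datapoints all breach."""
    compare = COMPARATORS[comparison_operator]
    recent = period_averages[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(compare(value, threshold) for value in recent) else "OK"


# CPUAlarmHigh: the last two periods are both above 60, so the alarm fires.
print(alarm_state([55, 62, 71], 60, 2, "GreaterThanThreshold"))

# A single spike followed by a dip does not fire the alarm.
print(alarm_state([62, 58], 60, 2, "GreaterThanThreshold"))

# CPUAlarmLow: two periods below 50 trigger the scale-down policy.
print(alarm_state([45, 48], 50, 2, "LessThanThreshold"))
```

Requiring two breaching periods in a row is what keeps a short spike from adding an instance, at the cost of reacting about 10 minutes after the load actually changes.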


Official AWS CloudFormation documentation:


