Automating infrastructure: playing Factorio on AWS

EC2, Lambda and API Gateway Infrastructure pipeline in code using Ansible, Packer and Terraform ⌛ 17 minutes

The first question you may ask is: What is Factorio and where can I get it?

Quoting the Factorio website:

Factorio is a game in which you build and maintain factories. You will be mining resources, researching technologies, building infrastructure, automating production and fighting enemies. Use your imagination to design your factory, combine simple elements into ingenious structures, apply management skills to keep it working and finally protect it from the creatures who don’t really like you.

Take a sneak peek at the game in their gameplay trailer.

Factorio is the game for engineers who love automation. That’s me, and probably you too.

In this post, Factorio will be used as the guinea pig for deploying an app to AWS, using:

Ansible and Packer for the image baking
Terraform for all AWS resources provisioning
AWS Auto Scaling Groups and Spot Instances for an always-on yet cheap deployment
AWS Lambda for simple management of Factorio instance
AWS API Gateway as a point of entry to the Lambda function
AWS S3 for backing up and restoring game saves

I’ve put together all the examples in a GitHub repo. Follow along with it throughout the post.

Dependencies
Building AMI with Ansible and Packer
- Ansible
- Packer
Provisioning AWS Resources with Terraform
Using the API
Wrapping up

Dependencies

Ansible - 2.3.1.0 - Installation Docs
Packer - 1.0.4 - Installation Docs
Terraform - v0.10.2 - Installation Docs

Building AMI with Ansible and Packer

If you’re new to either of these tools, worry not, I kind of cheated on this one and only used Ansible to install Docker and pull this Factorio image.

Ansible

This setup is very simple. Let’s start by including the Docker Ansible role in requirements.yml:

---

- name: geerlingguy.docker
  src: geerlingguy.docker
  version: 2.1.0

It can now be used in playbook.yml:

---

- hosts: all
  roles:
    - role: geerlingguy.docker
      become: yes

  tasks:
    - name: Pull factorio image
      become: true
      command: "docker pull dtandersen/factorio:{{factorio_version}}"

    - name: Tag image as latest
      become: true
      command: "docker tag dtandersen/factorio:{{factorio_version}} dtandersen/factorio:latest"

    - name: Install the package "duplicity"
      become: true
      apt:
        name: duplicity
        state: present

Now what I love about Ansible is how easy it is to read:

Install docker via role
Pull factorio image and tag it with latest
Install duplicity

Note: When Packer connects to the instance via SSH, it logs in as the admin user (at least on Debian). We need to become root to install the software above, hence all the become’s.

Packer

The Packer part is simpler still, assuming you already have a VPC with a public subnet. If you need help with this step, you can follow this guide from the AWS documentation.

After having a VPC and a public subnet ready, just copy the variables sample file and edit accordingly:

$ cat variables.json.sample
{
  "factorio_version": "0.15.34",
  "source_ami": "ami-d037cda9",
  "vpc_id": "",
  "subnet_id": ""
}

$ cp variables.json.sample variables.json
$ vim variables.json

For reference, that AMI is Debian Stretch r1.
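
Since the sample file ships with empty vpc_id and subnet_id values, it can be worth checking the edited file before starting a build. The following is a minimal sketch, not part of the repo: the helper names missing_variables and check_variables_file are made up for illustration, and it assumes all four keys from variables.json.sample are required.

```python
import json

# Keys taken from variables.json.sample in the post.
REQUIRED_KEYS = ("factorio_version", "source_ami", "vpc_id", "subnet_id")


def missing_variables(variables):
    """Return the required keys that are absent or empty in the parsed file."""
    missing = []
    for key in REQUIRED_KEYS:
        value = variables.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


def check_variables_file(path="variables.json"):
    """Load the Packer variables file and report the keys still to fill in."""
    with open(path) as handle:
        return missing_variables(json.load(handle))
```

Running check_variables_file() right after the cp step would return the two network keys, which is the cue to edit the file.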

Take a look at the Packer script factorio.json:

{
  "variables": {
    "factorio_version" : "{{env `factorio_version`}}",
    "source_ami" : "{{env `source_ami`}}",
    "vpc_id" : "{{env `vpc_id`}}",
    "subnet_id" : "{{env `subnet_id`}}"
  },

  "builders": [{
    "type": "amazon-ebs",
    "region": "eu-west-1",
    "source_ami": "{{user `source_ami`}}",
    "instance_type": "t2.micro",
    "ssh_username": "admin",
    "associate_public_ip_address": true,
    "vpc_id": "{{user `vpc_id`}}",
    "subnet_id": "{{user `subnet_id`}}",
    "ami_name": "factorio-{{user `factorio_version`}}-({{isotime \"20060102150405\"}})"
  }],

  "provisioners": [
    {
      "type": "ansible",
      "playbook_file": "./playbook.yml",
      "extra_arguments": [ "--extra-vars", "factorio_version={{user `factorio_version`}}" ]
    }
  ]
}

This is one of the shortest Packer templates I have seen. It’s just saying “Hey, this is my VPC and subnet, this is the source AMI, just create a new image there. Oh and apply this”. Just like that.

This is possible thanks to Packer’s Ansible provisioner, which applies your playbook over SSH from your computer. You don’t need to have Ansible installed on the instance being built.

Let’s build our image:

$ packer build -var-file=variables.json ./factorio.json

When the build finishes, it will print the ID of the new AMI. We’ll need that to provision our instances with Terraform.
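
If you capture the build output in a file or a pipe, the AMI ID can be picked out with a regular expression instead of copied by hand. This is a rough sketch under assumptions: it only relies on AMI IDs looking like ami- followed by 8 or 17 hex digits, and it assumes the newly built AMI is the last ID printed (the source AMI may show up earlier in the output). The function names are hypothetical.

```python
import re

# AMI IDs are "ami-" followed by 8 (older) or 17 (newer) hex digits.
AMI_ID = re.compile(r"\bami-(?:[0-9a-f]{17}|[0-9a-f]{8})\b")


def ami_ids(packer_output):
    """Return every distinct AMI ID found in the text, in order of appearance."""
    seen = []
    for match in AMI_ID.findall(packer_output):
        if match not in seen:
            seen.append(match)
    return seen


def last_ami_id(packer_output):
    """Return the last AMI ID mentioned, assumed here to be the new image."""
    found = ami_ids(packer_output)
    return found[-1] if found else None
```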

Provisioning AWS Resources with Terraform

Now that we have an image, we can use terraform to provision literally everything else on AWS.

Autoscaling Group with a Spot Instance

This setup is probably not the most common use of autoscaling groups. Normally one would use them to have dozens of instances coming up and down unattended, depending on usage, health or even custom metrics. In this use case, we’ll mostly take advantage of the Desired Capacity attribute, which tells AWS how many instances you want running at all times. In our case, it will be at most 1: 1 while we play and 0 when the game is off.

IAM Roles and Policies

Let’s dig into our terraform files, starting with main.tf:

provider "aws" {
  region = "${var.aws_region}"
}

resource "aws_security_group" "allow_factorio" {
...
}


resource "aws_s3_bucket" "factorio_backups" {
...
}

resource "aws_iam_role" "allow_s3_access" {
  name = "factorio-allow-s3-access"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Principal": {"Service": "ec2.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }
}
EOF

}

resource "aws_iam_policy" "allow_s3_access" {
  name        = "factorio-allow-s3-access"
  path        = "/"
  description = "Allow all S3 actions on factorio bucket"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "${aws_s3_bucket.factorio_backups.arn}",
        "${aws_s3_bucket.factorio_backups.arn}/**"
      ]
    }
  ]
}
EOF

}

resource "aws_iam_role_policy_attachment" "allow_s3_access" {
    role       = "${aws_iam_role.allow_s3_access.name}"
    policy_arn = "${aws_iam_policy.allow_s3_access.arn}"
}

resource "aws_iam_instance_profile" "factorio" {
  name  = "factorio"
  role = "${aws_iam_role.allow_s3_access.name}"
}

The security group and the bucket are pretty straightforward resources, so let’s skip them and check one of the biggest AWS demons: IAM Roles and Policies.

Because we’ll be backing up our game progress to S3, the instance needs to have permission to write on our bucket. The way to do this on AWS is to associate your instance with an instance profile. This instance profile is associated with a role. This role, in turn, can have multiple policies attached. For a more detailed explanation, please read the IAM Roles for Amazon EC2 documentation page.

In the example above, a role is created, where the assume_role_policy says that it can be assumed by EC2 instances. Then we create a policy that allows every S3 operation specifically for the bucket we created earlier and the objects inside it. Finally, we attach the policy to the role, and then the role to the instance profile that is now ready to use.
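
To make the two JSON documents less opaque, here is a small sketch that builds them as Python dictionaries. It is only an illustration of the structure, not how the repo does it, and the function names are invented. One assumption to flag: it uses the conventional "/*" suffix for objects, which matches the same keys as the "/**" used in main.tf, because an IAM wildcard matches any run of characters.

```python
import json


def trust_policy(service):
    """Trust policy: which AWS service is allowed to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        },
    }


def s3_bucket_policy(bucket_arn):
    """Permissions policy: every S3 action on the bucket and its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                # The bucket itself, plus every object key inside it.
                "Resource": [bucket_arn, bucket_arn + "/*"],
            }
        ],
    }


def to_json(policy):
    """Serialize a policy the way it would be pasted into a heredoc."""
    return json.dumps(policy, indent=2)
```

Calling trust_policy("ec2.amazonaws.com") gives the role used by the instance; only the service name changes when a different AWS service needs to assume a role.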

Autoscaling group and Spot Instance

# still main.tf

data "template_file" "factorio_init" {
  template = "${file("init.tpl")}"

  vars {
    hostname = "factorio"
    dns_domain = "${var.dns_domain}"
    s3_url = "s3://s3-${var.aws_region}.amazonaws.com/${var.s3_bucket_name}/"
  }
}

resource "aws_launch_configuration" "instance_conf" {
  image_id      = "${var.ami_id}"
  instance_type = "${var.instance_type}"
  key_name      = "${var.key_name}"
  security_groups = ["${aws_security_group.allow_factorio.id}"]
  iam_instance_profile = "${aws_iam_instance_profile.factorio.name}"

  user_data = "${data.template_file.factorio_init.rendered}"
  associate_public_ip_address = true
  spot_price        = "${var.spot_price}"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "factorio" {
  name                 = "factorio"
  launch_configuration = "${aws_launch_configuration.instance_conf.name}"
  vpc_zone_identifier  = ["${var.subnet_id}"]

  min_size             = 0
  max_size             = 1
  desired_capacity     = 0

  tag {
    key                 = "Name"
    value               = "factorio"
    propagate_at_launch = true
  }
}

Now that we have all the permissions figured out, we can take care of provisioning our instance. The data template_file refers to a template that will be executed on launch via cloud-init. We’ll go into it later.
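
The ${...} placeholders in init.tpl happen to use the same syntax as Python’s string.Template, so you can preview the substitution locally. This is only a sketch: Terraform’s template_file is a different engine, the fragment below copies just a few variable-bearing lines from init.tpl, and the function names are made up for illustration.

```python
from string import Template

# A few lines from init.tpl that use the three template variables.
INIT_FRAGMENT = Template(
    "fqdn: ${hostname}.${dns_domain}\n"
    "hostname: ${hostname}\n"
    "duplicity restore ${s3_url} /opt/factorio --no-encryption\n"
)


def s3_url(region, bucket):
    """Mirror the s3_url expression from the template_file vars block."""
    return "s3://s3-{}.amazonaws.com/{}/".format(region, bucket)


def render_fragment(hostname, dns_domain, url):
    """Substitute the variables; a missing one raises KeyError."""
    return INIT_FRAGMENT.substitute(
        hostname=hostname, dns_domain=dns_domain, s3_url=url
    )
```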

Setting up an autoscaling group takes two steps: first a launch configuration, which takes almost all the attributes a normal instance would have, and then the autoscaling group itself.

Notice that aws_launch_configuration includes the iam_instance_profile. We also load our cloud-init template into the user_data attribute, and this is also where we set the maximum price we’re willing to pay for an instance (spot_price).

Regarding the autoscaling group, it needs very few attributes. One of them is the launch configuration we just created.

Moving on to our cloud-init template init.tpl:

#cloud-config
fqdn: ${hostname}.${dns_domain}
hostname: ${hostname}
manage_etc_hosts: true

write_files:
  - content: |
      #!/bin/bash
      echo "Attempting factorio game save restore..."
      duplicity restore ${s3_url} /opt/factorio --no-encryption &> /dev/null || echo "Failed! Skipping"
      echo "Spinning up factorio..."
      docker run -d -p 34197:34197/udp -p 27015:27015/tcp -v /opt/factorio:/factorio --name factorio --restart=always dtandersen/factorio:latest

    path: /opt/setup_factorio.sh
    permissions: '0755'

  - content: |
      */10 * * * * root duplicity /opt/factorio ${s3_url} --no-encryption
      */30 * * * * root duplicity full /opt/factorio ${s3_url} --no-encryption
      5 * * * * root duplicity remove-all-but-n-full 1 ${s3_url} --force --no-encryption

    path: /etc/cron.d/factorio_backups
    permissions: '0644'

runcmd:
  - /opt/setup_factorio.sh

Here is where the magic happens regarding backup and restore of game saves to S3. Because we’re using duplicity, very simple commands can get the job done.

For backups, there are 3 cron jobs:

An incremental backup every 10 minutes (matches game setting of auto-saves)
A full backup every 30 minutes, to avoid having to download a lot of increments (helps decrease the number of S3 GetObject operations)
Every hour, at minute 5, a deletion of all backups except the most recent full one and its increments
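
To double-check what the minute field of each cron line means, here is a tiny sketch that expands it. It handles only the three forms used in /etc/cron.d/factorio_backups ("*", "*/N" and a single number), and it assumes the other four fields are "*", as they are in the template. The names are hypothetical.

```python
def minutes_matched(field):
    """Expand a cron minute field ('*', '*/N' or a number) into minutes."""
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, 60, step))
    return [int(field)]


# Minute fields copied from the three cron lines in init.tpl.
SCHEDULE = {
    "incremental backup": "*/10",
    "full backup": "*/30",
    "remove old backups": "5",
}


def runs_per_hour(schedule=SCHEDULE):
    """Count how many times per hour each job fires."""
    return {name: len(minutes_matched(field)) for name, field in schedule.items()}
```

With these fields the incremental job fires six times an hour, the full backup twice, and the cleanup once, at minute 5. Note that at minutes 0 and 30 the incremental and full jobs are scheduled together.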

As for restores, the instance will run /opt/setup_factorio.sh on boot, which will try to restore the last valid backup available on our S3 bucket. If no valid backup is found, factorio will create a new game by default.

To learn more about duplicity, check Personal backups: The geek way.

At this stage, you can already start playing Factorio with automated backups and restores. But it is not handy to log in to the AWS Console every day just to turn the game on and off. Let’s move to… Lambda!

Lambda

Even though it’s been a hot topic for some time now, a lot of people still don’t understand what AWS Lambda or Serverless is. Quoting Martin Fowler’s Serverless Architectures:

Serverless architectures refer to applications that significantly depend on third-party services (known as Backend as a Service or “BaaS”) or on custom code that’s run in ephemeral containers (Function as a Service or “FaaS”), the best known vendor host of which currently is AWS Lambda. By using these ideas, and by moving much behavior to the front end, such architectures remove the need for the traditional ‘always on’ server system sitting behind an application. Depending on the circumstances, such systems can significantly reduce operational cost and complexity at a cost of vendor dependencies and (at the moment) immaturity of supporting services.

In our case, we just want something to turn on and off our factorio instance. You definitely don’t need a server running 24/7 just for that.

AWS Lambda, in particular, is triggered in response to several kinds of AWS events. One of them is a request to an API Gateway endpoint, which is what we’ll be using.

As of writing, Lambda supports the following runtimes:

Node.js – v4.3.2 and 6.10.3
Java – Java 8
Python – Python 3.6 and 2.7
.NET Core – .NET Core 1.0.1 (C#)

The function I’ll present is written in Node.js, but for no particular reason; you could use any of the above.

Lambda function

// manage_factorio.js

// Configuring the AWS SDK
var AWS = require('aws-sdk');
AWS.config.update({region: process.env.REGION});

function setAsgDesiredCapacity(asgName, desiredCapacity, callback) {

  var autoscaling = new AWS.AutoScaling();

  var params = {
    AutoScalingGroupName: asgName,
    DesiredCapacity: desiredCapacity,
    HonorCooldown: false
  };

  autoscaling.setDesiredCapacity(params, function(err, data) {
    if (err) {
      callback(null, '{"ack":"false","reason":' + JSON.stringify(err) + '}');
    } else {
      callback(null, '{"ack":"true"}');  // successful response
    }
  });
}

// Starting and stopping only differ in the desired capacity they request
function startFactorio(asgName, callback) {
  setAsgDesiredCapacity(asgName, 1, callback);
}

function stopFactorio(asgName, callback) {
  setAsgDesiredCapacity(asgName, 0, callback);
}
...

function statusFactorio(asgName, callback) {
...
}

exports.handler = function(event, context, callback) {
  // Both fields are required, and the token must match the configured one
  if (event.token !== undefined && event.action !== undefined && event.token === process.env.AUTH_TOKEN) {
    // Dispatch on the requested action, handing Lambda's callback straight through
    if (event.action === "start") {
      startFactorio(process.env.ASG_NAME, callback);
    } else if (event.action === "stop") {
      stopFactorio(process.env.ASG_NAME, callback);
    } else if (event.action === "status") {
      statusFactorio(process.env.ASG_NAME, callback);
    } else {
      callback(null, '{"ack":"false","reason":"Action specified does not exist"}');
    }
  } else {
    callback(null, '{"ack":"false","reason":"Wrong token specified and/or missing action parameter."}');
  }
};

In this function there are two things to pay attention to. The first thing is the handler function, which is where the function will begin:

exports.handler = function(event, context, callback)

When the function is triggered, Lambda calls the handler function and your piece of computation starts from there. In this case I assume that the event is JSON and just look for the fields I decided were required: action and token.

Possible actions are start, stop and status, while the token was just for implementing a very simple authentication mechanism (don’t tell your security guy!). For a more serious authentication mechanism you could use AWS Cognito or Auth0.

The second thing is that we are using an autoscaling group to provision our Factorio instance, so this Lambda function basically turns Factorio on and off by setting the desiredCapacity of the autoscaling group to 1 or 0 respectively.
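
For readers who would rather use the Python runtime, the validation and dispatch logic of the handler could look roughly like the sketch below. It is not the code from the repo: route and the actions mapping are hypothetical names, the AWS calls are left out so the logic can be tried locally, and the constant-time token comparison via hmac.compare_digest is a small hardening that the Node.js version does not have.

```python
import hmac
import json


def _nack(reason):
    """Build the same kind of negative acknowledgement the Node.js code returns."""
    return json.dumps({"ack": "false", "reason": reason})


def route(event, auth_token, actions):
    """Validate the event, then call the function registered for its action.

    `actions` maps an action name ("start", "stop", "status") to a
    zero-argument callable that performs it and returns the response.
    """
    token = event.get("token")
    action = event.get("action")
    token_ok = isinstance(token, str) and hmac.compare_digest(
        token.encode("utf-8"), auth_token.encode("utf-8")
    )
    if action is None or not token_ok:
        return _nack("Wrong token specified and/or missing action parameter.")
    perform = actions.get(action)
    if perform is None:
        return _nack("Action specified does not exist")
    return perform()
```

In a real function, the callables in actions would wrap the Auto Scaling API call that sets the desired capacity to 1 or 0.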

Provisioning Lambda function

Let’s go back to Terraform to provision all the AWS resources needed to get the Lambda function working as expected.

#lambda.tf

resource "aws_iam_role" "iam_for_lambda" {
  name = "factorio_iam_for_lambda"
  ...
}


resource "aws_iam_policy" "allow_asg_access" {
  name        = "factorio-allow-asg-access"
  path        = "/"
  description = "Allow lambda to do API requests on autoscaling and ec2"
  ...
}

resource "aws_iam_role_policy_attachment" "allow_asg_access" {
...
}

resource "aws_lambda_function" "manage_factorio" {
  filename         = "manage_factorio.zip"
  function_name    = "manage_factorio"
  role             = "${aws_iam_role.iam_for_lambda.arn}"
  handler          = "manage_factorio.handler"
  source_code_hash = "${base64sha256(file("manage_factorio.zip"))}"
  runtime          = "nodejs6.10"
  timeout          = 5

  environment {
    variables = {
      ASG_NAME = "factorio"
      REGION = "${var.aws_region}"
      AUTH_TOKEN = "${var.lambda_auth_token}"
    }
  }
}

# Allow api gateway
resource "aws_lambda_permission" "allow_api_gateway" {
  ...
}

If you paid attention to the Lambda function code, you noticed the aws-sdk is being required and used to set the desired capacity of the autoscaling group, as well as for getting the status of that instance (check the code on GitHub).

Just like our EC2 instance, the Lambda function needs an IAM role, this time so that it is allowed to make API requests to Auto Scaling and EC2.

This aws_iam_role is exactly the same as the one we used for our S3 backups, but authorizing lambda.amazonaws.com instead of ec2.amazonaws.com.

This time, the aws_iam_policy is allowing Lambda to perform the following operations:

ec2:DescribeInstances
autoscaling:DescribeAutoScalingGroups
autoscaling:SetDesiredCapacity

Finally, in aws_lambda_function we give the path to the Lambda function zip file (yes, you need to zip it first), associate the function with the IAM role we created above, and pass the handler name, runtime and environment variables.

Of all of them, the tricky attribute is the handler: manage_factorio is the name of the js file, and handler happens to be the name of our exported function. If the Lambda function started with:

exports.factorio_manager = function(event, context, callback)...

then the value of the handler attribute in this Terraform resource would be manage_factorio.factorio_manager. That sounded weird, which is why I went with just handler.

We’ll look into aws_lambda_permission after we go through the API Gateway provisioning.

API Gateway

We want to trigger our Lambda function in a seamless way, and setting up an API endpoint is surely one way to do it.

Resources, Methods and Integration with Lambda

# apigateway.tf

resource "aws_api_gateway_rest_api" "factorio" {
  name        = "factorio"
  description = "This is an API to perform start and stop actions on factorio"
}

resource "aws_api_gateway_resource" "manage" {
  rest_api_id = "${aws_api_gateway_rest_api.factorio.id}"
  parent_id   = "${aws_api_gateway_rest_api.factorio.root_resource_id}"
  path_part   = "manage"
}

resource "aws_api_gateway_method" "post" {
  rest_api_id   = "${aws_api_gateway_rest_api.factorio.id}"
  resource_id   = "${aws_api_gateway_resource.manage.id}"
  http_method   = "POST"
  authorization = "NONE"
  api_key_required = true
}

resource "aws_api_gateway_integration" "lambda" {
  rest_api_id             = "${aws_api_gateway_rest_api.factorio.id}"
  resource_id             = "${aws_api_gateway_resource.manage.id}"
  http_method             = "${aws_api_gateway_method.post.http_method}"
  integration_http_method = "POST"
  type                    = "AWS"
  uri                     = "arn:aws:apigateway:${var.aws_region}:lambda:path/2015-03-31/functions/${aws_lambda_function.manage_factorio.arn}/invocations"
}
...

The code above shows how you create an API called factorio, with a resource called manage, that accepts the POST method. That means you can do a POST request to a URL that looks like sampleapi.com/factorio/manage/.

Now one of the advantages of using AWS API Gateway is that you can integrate it with other AWS products, like… Lambda. In the aws_api_gateway_integration we’re basically associating our API method with what happens to be a Lambda function. An important note: the integration_http_method refers to the HTTP method that will be used to communicate with your integration. In this case, Lambda only accepts POST, hence we use it.

Revisiting lambda.tf:

resource "aws_lambda_permission" "allow_api_gateway" {
  statement_id  = "AllowExecutionFromAPIGateway"
  action        = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.manage_factorio.arn}"
  principal     = "apigateway.amazonaws.com"

  source_arn = "arn:aws:execute-api:${var.aws_region}:${var.aws_account_id}:${aws_api_gateway_rest_api.factorio.id}/*/${aws_api_gateway_method.post.http_method}${aws_api_gateway_resource.manage.path}"
}

It’s easier to understand now what this aws_lambda_permission does, which is to allow the Lambda function we created earlier to be invoked by API Gateway, specifically from this source_arn, that refers to the Method we just created.
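
The source_arn is just a string assembled from a handful of parts, which the following sketch spells out. The function name and the example values in the usage note are placeholders; the "*" in the stage position is taken from the Terraform resource above and means "any stage".

```python
def execute_api_arn(region, account_id, api_id, http_method, resource_path, stage="*"):
    """Assemble an execute-api ARN in the shape used by source_arn.

    resource_path is expected to start with a slash, e.g. "/manage".
    """
    return "arn:aws:execute-api:{}:{}:{}/{}/{}{}".format(
        region, account_id, api_id, stage, http_method, resource_path
    )
```

For example, execute_api_arn("eu-west-1", "123456789012", "abc123", "POST", "/manage") restricts the permission to POST requests on /manage of that one API, whatever the stage.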

Method and Integration Responses

...
resource "aws_api_gateway_method_response" "post" {
  rest_api_id = "${aws_api_gateway_rest_api.factorio.id}"
  resource_id = "${aws_api_gateway_resource.manage.id}"
  http_method = "${aws_api_gateway_integration.lambda.http_method}"
  status_code = "200"

  response_models = {
    "application/json" = "Empty"
  }
}

resource "aws_api_gateway_integration_response" "lambda" {
  rest_api_id = "${aws_api_gateway_rest_api.factorio.id}"
  resource_id = "${aws_api_gateway_resource.manage.id}"
  http_method = "${aws_api_gateway_method_response.post.http_method}"
  status_code = "${aws_api_gateway_method_response.post.status_code}"

  response_templates = {
    "application/json" = ""
  }
}
...

API Gateway lets you define models for your method response to transform your data into one or more output formats. Because we don’t want to transform our data, we’ll just set it to Empty.

It also lets you set different mapping templates for your integration responses. We won’t need to do any mapping, so we’ll leave it blank.

Read here for more information on both Models and Mapping templates.

Stages, Deployments and Usage Plans

...
resource "aws_api_gateway_deployment" "factorio" {
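  # The integration must exist too, or the deployment can fail with a missing-integration error
  depends_on = ["aws_api_gateway_method.post", "aws_api_gateway_integration.lambda"]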

  rest_api_id = "${aws_api_gateway_rest_api.factorio.id}"
  stage_name  = "factorio"

  stage_description = "Live api for factorio management"

}

resource "aws_api_gateway_usage_plan" "usage_plan" {
  name         = "rate-limiter"
  description  = "Limit calls to 1/s and 1k a day"

  api_stages {
    api_id = "${aws_api_gateway_rest_api.factorio.id}"
    stage  = "${aws_api_gateway_deployment.factorio.stage_name}"
  }

  quota_settings {
    limit  = 1000
    offset = 0
    period = "DAY"
  }

  throttle_settings {
    burst_limit = 1
    rate_limit  = 1
  }
}

resource "aws_api_gateway_api_key" "my_factorio_key" {
  name = "my-factorio-key"

  stage_key {
    rest_api_id = "${aws_api_gateway_rest_api.factorio.id}"
    stage_name  = "${aws_api_gateway_deployment.factorio.stage_name}"
  }
}

resource "aws_api_gateway_usage_plan_key" "factorio" {
  key_id        = "${aws_api_gateway_api_key.my_factorio_key.id}"
  key_type      = "API_KEY"
  usage_plan_id = "${aws_api_gateway_usage_plan.usage_plan.id}"
}

Once you have everything up and running, mimicking a real-world application, you’ll want to have different environments for better testing and eventually deploy to production. This is what stages and deployments are for.

By setting up different stages you’re creating all those different environments (usually something like sandbox, staging, qa, production, etc, depending on how you want your pipeline). A deployment will represent a snapshot of one of those API stages that becomes callable by your API users.

This Factorio deploy is so awesome it just needs one stage and one deployment. By creating the aws_api_gateway_deployment resource we are also creating a stage by the name of factorio.
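
The stage name ends up in the URL you call. As a sketch, the invoke URL of a REST API stage can be put together like this; the function name is made up, and the API id in the usage note is a placeholder.

```python
def invoke_url(api_id, region, stage, resource_path):
    """Build the public URL for a resource on a deployed API Gateway stage."""
    return "https://{}.execute-api.{}.amazonaws.com/{}/{}".format(
        api_id, region, stage, resource_path.strip("/")
    )
```

With the stage and resource from this post, invoke_url("abc123", "eu-west-1", "factorio", "/manage") points at the manage resource on the factorio stage.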

And the API is up and running, ready to be used! Now because we all know how software works, and because we know it is never used properly be it on purpose or not, we’ll create a Usage Plan for our API to avoid abuse and more importantly, huge costs.

In this example I found that turning on factorio 1000 times per day would be enough, and that allowing 1 request per second would be reasonable.
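
API Gateway throttles requests with a token bucket, so the numbers in the usage plan can be explored with a toy model. The sketch below is a simplification under stated assumptions: class and method names are invented, time is passed in explicitly instead of read from a clock, the daily quota never resets, and throttled requests are assumed not to count against the quota.

```python
class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now):
        # Add the tokens earned since the last call, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class UsagePlan:
    """Combine a throttle with a fixed request quota, as in the Terraform above."""

    def __init__(self, rate=1, burst=1, daily_quota=1000):
        self.bucket = TokenBucket(rate, burst)
        self.remaining = daily_quota

    def allow(self, now):
        if self.remaining <= 0:
            return False  # quota exhausted
        if not self.bucket.allow(now):
            return False  # throttled
        self.remaining -= 1
        return True
```

With rate and burst both set to 1, two requests half a second apart get one success and one rejection, which is the behaviour the throttle_settings block asks for.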

API clients are now prevented from damaging our wallet, so next we create an API key for them and associate it with our Usage Plan. This association is needed because you can create different keys with different Usage Plans.

Using the API

To show the payoff of all the trouble we just went through with API Gateway, I wrote a small Python script for starting and stopping the Factorio instance, and also getting its status.

First create your env file (or export directly on your terminal):

$ cp .env factorio.env
$ cat factorio.env
export FACTORIO_API_URL=
export FACTORIO_API_KEY=
export FACTORIO_AUTH_TOKEN=
$ vim factorio.env

The result should look like this:

export FACTORIO_API_URL=https://09yxn77opyu.execute-api.eu-west-1.amazonaws.com/factorio/manage/
export FACTORIO_API_KEY=DjH3vyWrQY9076MScTXuktGm55MKlu6jB8jHSw0tx
export FACTORIO_AUTH_TOKEN=KlYjIhacbfbbagKiQf1X6vCprsKsI8Faeyd6frREo8vxELwxDAwSWiKf0KKzTsOU

Now just run the script with the arguments start, stop or status:

$ source factorio.env
$ ./factorio --help
usage: factorio [-h] <action>

Manage factorio in AWS.

positional arguments:
  <action>    action to perform (start|stop|status)

optional arguments:
  -h, --help  show this help message and exit

$ ./factorio status
INSTANCE_IP		STATE
-		No instances running

$ ./factorio start
Factorio started successfully

Disclaimer: Even though it tells you right away that the instance started, virtual machines in AWS do take a few minutes to spin up, so be patient and check the status until it returns the IP of the Factorio instance. Also take into account that we’re using spot instances, which are subject to a market price and infrastructure availability. If for some reason the price rises above the limit we chose ($0.03/h), the spot request will only be fulfilled once the price drops back below that limit.
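
If you want to call the API from your own code, the request boils down to a POST with the API key in the x-api-key header and a JSON body carrying the action and token fields that the Lambda handler reads. The sketch below is not the script from the repo; build_request and manage are hypothetical names, and it uses only the standard library.

```python
import json
import urllib.request


def build_request(api_url, api_key, auth_token, action):
    """Prepare the POST request without sending it."""
    body = json.dumps({"action": action, "token": auth_token}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # API Gateway reads the API key from this header.
            "x-api-key": api_key,
        },
        method="POST",
    )


def manage(api_url, api_key, auth_token, action, opener=urllib.request.urlopen):
    """Send the request and return the raw response body as text."""
    request = build_request(api_url, api_key, auth_token, action)
    with opener(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

The three values would come from the FACTORIO_API_URL, FACTORIO_API_KEY and FACTORIO_AUTH_TOKEN environment variables, for example via os.environ.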

Wrapping up

While we got a sneak peek at a lot of different AWS products and tools, there is a lot more to say about each one of them. This is a bird’s-eye view of what the infrastructure pipeline of your production application could look like in code. This example would surely need a LOT of work if you were to sell some kind of Factorio as a Service, but it’s a good place to start.

Don’t forget to give the code a try and please give feedback or even contribute!

Happy mining!

Sep 28, 2017· Ricardo Marques
