ShareWis ACT Staging Automation

Are you overwhelmed by all the tools you need to use at work? Do you feel like a lot of your daily tasks are repetitive and could be automated?
We can’t just stop using some services to simplify the work environment: each service has its purpose and is a painkiller for specific problems that plague every company.
My name is Adrien Lemaire. I joined ShareWis in March 2016 as a Full Stack Engineer, and I’ll explain how we started solving this issue.
At ShareWis, here are some of the tools used by the development team:

  • Slack: The main communication tool company-wide. It has many integrations to collect logs and statuses from services, errors, warnings, feedback from customers, etc.
  • Pivotal Tracker: The team’s main project management tool.
  • GitHub: Our code hosting, version control and code review tool.
  • AWS: The company’s main infrastructure hosting service.
  • Wercker: One of our Continuous Integration tools which runs tests on each commit pushed to a GitHub project.

While there are more tools, we found that a lot of repetition occurred within those tools for each development task. Here is a typical flow:

  1. A story is created on Pivotal Tracker, then moved from Icebox to Backlog, and from Backlog to Iteration during weekly meetings.
  2. A developer, let’s call him Mr. A, starts the story #123456789, then creates a new git feature branch and develops the code while testing it on a local Vagrant image.
  3. Mr. A creates a new GitHub Pull Request #1, then references the Pivotal story #123456789 in it so that other developers can easily map the code to the story.
  4. After the Wercker tests pass for the PR (Pull Request), Mr. A assigns Mr. B to review it. Mr. A also needs to change the Pivotal story status to “Finished”.
  5. Mr. B reviews the code and, if some design element is included, also pulls the branch and verifies it locally.
  6. After reviewing the code and getting Mr. A to fix whatever needs to be fixed, Mr. B adds the label “reviewed” to the PR.
  7. Mr. A can now merge the PR, and delete the old branch in GitHub.
  8. Then, Mr. A needs to manually deploy to the staging instance (using Capistrano) so that the Product Owner, Mr. C, can user-test the story. Mr. A also needs to change the Pivotal story status to “Delivered”.
  9. The flow finishes with Mr. C accepting the story. If Mr. C rejects the story, Mr. A needs to start over (new hotfix branch, new PR, etc.).

Considering that each developer handles between 2 and 6 stories per day on average, this flow is heavy and cuts into coding productivity.
Is there a way to improve productivity by removing some of these steps and automating the rest, while keeping a safe workflow? Thankfully yes: all our services provide APIs! But considering the workload we already had, allocating resources to such a project seemed like a low priority. The only way to get it done was to do it quickly. Since I’m a very lazy developer and had some experience with automation from a previous startup, I proposed a plan for one person to develop this in 3 to 4 days. It got accepted at the beginning of the month, hurray!
Here is the result:


Now, the new workflow is as follows:

  1. A story is created on Pivotal Tracker, then moved from Icebox to Backlog, and from Backlog to Iteration during weekly meetings. A label “auto” is added to the stories that need a staging review.
  2. The developer creates a new git feature branch with the Pivotal story ID in the branch name.
  3. The developer creates a PR.
  4. The developer assigns the reviewer to the PR.
  5. The reviewer labels the PR as “reviewed”.
  6. The story owner reviews the Pivotal story on a dedicated staging instance, then accepts the story.
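
Putting the story ID in the branch name (step 2) is what lets the automation link a PR back to its story. Here is a minimal sketch of that lookup. It assumes a naming convention such as `feature/123456789-login-form`, which is only an example since the exact convention is not described here, and the helper name is hypothetical:

```typescript
// Hypothetical helper: extract a Pivotal Tracker story ID from a branch name.
// Assumption: the ID is the first run of six or more digits in the name,
// e.g. "feature/123456789-login-form".
function storyIdFromBranch(branch: string): string | null {
  const match = /\d{6,}/.exec(branch);
  return match?.[0] ?? null;
}
```

A branch without such a number yields `null`, which the automation could treat as "no story to link, do nothing".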

Under the hood, a lot has been automated:

  • When a PR is created, it automatically gets the label “auto”, like its Pivotal story counterpart.
  • When a reviewer is assigned to the PR, a new staging instance is automatically created and provisioned with the latest commit of the PR. The related Pivotal story is automatically marked as finished.
  • When the staging instance is ready, its URL is shared on Slack, in the GitHub PR and in the Pivotal story.
  • When a new commit is pushed to the PR afterwards, a deployment is triggered to update the dedicated staging instance.
  • When the PR is marked as reviewed, the Pivotal story is automatically delivered, and the product owner is asked to review it using the staging URL.
  • When the Pivotal story is accepted, the PR is automatically merged, the old branch is deleted from GitHub, and the staging instance is deleted.
  • For each action, logs are sent to a dedicated Slack channel.
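
The rules above boil down to a lookup from an incoming event to a list of actions. Here is a rough sketch of that mapping. The trigger and action names are invented for illustration and are not taken from the actual Lambda code:

```typescript
// Hypothetical trigger names, one per rule in the list above.
type Trigger =
  | "pr_opened"
  | "reviewer_assigned"
  | "staging_ready"
  | "commit_pushed"
  | "pr_reviewed"
  | "story_accepted";

// Hypothetical action names; each would map to one or more API calls.
type Action = string;

const RULES: Record<Trigger, Action[]> = {
  pr_opened: ["label_pr_auto"],
  reviewer_assigned: ["create_staging_instance", "finish_story"],
  staging_ready: ["share_staging_url"],
  commit_pushed: ["redeploy_staging_instance"],
  pr_reviewed: ["deliver_story", "notify_product_owner"],
  story_accepted: ["merge_pr", "delete_branch", "delete_staging_instance"],
};

// Every trigger also produces a log entry for the dedicated Slack channel.
function actionsFor(trigger: Trigger): Action[] {
  return [...RULES[trigger], "log_to_slack"];
}
```

Keeping the rules in one table makes it easy to see, and later change, what each event is supposed to cause.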

Wow, that’s a lot! So how did we implement it? We basically needed 3 components: a watcher, to get activities from Pivotal Tracker, GitHub and AWS; a builder, to launch and delete staging instances; and a writer, to send messages to Slack, Pivotal Tracker and GitHub.
We could launch a new server, send periodic queries to each service’s API, and take action from there, but that is really not efficient. Since Pivotal Tracker and GitHub support webhooks, we do not need a timer and receive activities as soon as they happen. Also, AWS has a neat service called Lambda, which lets us execute functions without caring about the underlying infrastructure.
Since GitHub has a plugin to connect to AWS SNS (Simple Notification Service), we created a dedicated IAM user with appropriate policies, and GitHub started sending activities to Lambda through SNS.
Unfortunately, Pivotal Tracker doesn’t integrate with SNS, so it wasn’t possible to use the IAM access key ID and secret from there. Therefore, we created an AWS API Gateway endpoint, which behaves similarly to SNS in that it receives activity requests and forwards them to Lambda.
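
With two entry points, the Lambda function receives two different envelopes. Here is a minimal sketch of how they could be unwrapped into one common form. It assumes that GitHub events arrive as SNS records and that Pivotal Tracker events arrive through an API Gateway proxy-style integration with the raw JSON in `body`; the `Activity` type and the `unwrap` name are made up for this example:

```typescript
// Minimal shapes for the two kinds of Lambda input. Only the fields used
// below are declared; real events carry many more.
interface SnsEvent {
  Records: { Sns: { Message: string } }[];
}
interface ApiGatewayEvent {
  body: string | null;
}

// Hypothetical normalized form shared by both sources.
interface Activity {
  source: "github" | "pivotal";
  payload: unknown;
}

// Assumption: the envelope alone tells us where the activity came from,
// because only GitHub publishes through SNS in this setup.
function unwrap(event: SnsEvent | ApiGatewayEvent): Activity[] {
  if ("Records" in event) {
    return event.Records.map((record) => ({
      source: "github" as const,
      payload: JSON.parse(record.Sns.Message),
    }));
  }
  return [{ source: "pivotal", payload: JSON.parse(event.body ?? "{}") }];
}
```

After this step, the rest of the code only has to deal with `Activity` values, whichever service sent them.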
The Lambda code was written in Node.js, since JavaScript can be read by anybody on the team and all the APIs have dedicated npm packages (aws-sdk, pivotaltracker, github4). The code is rather simple: it checks the request parameters and takes action accordingly. But since some actions require further API calls, it ended up as a bit of callback spaghetti.
Now that we have our webhooks in place and can send messages back to Slack, Pivotal Tracker and GitHub, the most important part is automating the continuous deployment to staging! Using a CloudFormation JSON template made that quite simple. I created a custom AMI from the existing staging server and used that image in the template. The template also accepts UserData (shell scripts or cloud-init directives) as well as Metadata (AWS::CloudFormation::Init). I used the latter to write each step of the deployment (copied from our Capistrano recipe), and voilà! Using parameters to specify variables such as the git commit to fetch, we can now automatically launch as many staging instances as there are stories to review and approve.
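
To illustrate the parameter idea, here is a small sketch that builds the input for CloudFormation’s CreateStack call, one stack per story. The parameter keys `GitCommit` and `StoryId`, the stack naming scheme and the helper name are assumptions made for this example; the real template may use different names:

```typescript
// Shape of the input accepted by CloudFormation's CreateStack call
// (only the fields used here).
interface CreateStackInput {
  StackName: string;
  TemplateURL: string;
  Parameters: { ParameterKey: string; ParameterValue: string }[];
}

// Hypothetical helper: one staging stack per story, pinned to one commit.
function stagingStackInput(
  storyId: string,
  commit: string,
  templateUrl: string
): CreateStackInput {
  return {
    // Stack names must start with a letter and may contain only
    // letters, digits and hyphens.
    StackName: `staging-story-${storyId}`,
    TemplateURL: templateUrl,
    Parameters: [
      { ParameterKey: "GitCommit", ParameterValue: commit },
      { ParameterKey: "StoryId", ParameterValue: storyId },
    ],
  };
}
```

The resulting object could be passed to `createStack` in aws-sdk, and deleting the stack with the same `StackName` would tear the instance down once the story is accepted.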
Finally, CloudWatch was used to centralize logs from each component, to simplify debugging while building the workflow.
The allocated timeframe has passed, and there’s a lot more that could be improved:

  • Use a spot instance in a scaling group instead of t1.micro on-demand instances (much cheaper)
  • Auto-shutdown instances during the night
  • Auto-create the PR when the git branch is pushed to GitHub
  • Create CNAME records for the staging instances instead of sharing IPs
  • Auto-rebase other PRs on master once a branch has been merged, so that each staging instance always has the latest code available

We’ll do it little by little, whenever we find the time. The next big step will be automating the Production workflow as well!
All we did was use existing APIs and services; nothing needed to be built from scratch!
How about you? Can you find a similar use case that is quick to implement and will simplify your life?
