The HelloSign Ops team loves our chatbot and the ChatOps concept. As automation becomes more and more prominent in our industry, and with the benefit of enhanced communication tools, a ChatOps movement seemed almost inevitable.
Some key takeaways from Jesse Newland's presentation on ChatOps at GitHub:
- Results in an almost constant 'pairing' in the context of any chatbot functionality. Everyone sees what's happening, and any questions are simply answered in context for everyone to see.
- Automated/repeatable functions. By scripting/automating and creating chatbot functionality, a task is both automated AND as accessible to others as you care to make it.
- Communication for important tasks becomes a mandate; you need not inquire about an event or state, because it's available in your communication platform.
- Ease in incident response: alerts can be resolved by logging in to your communication platform and issuing a command, which should be possible from a mobile device.
ChatOps at GitHub started as a way to promote teaching by doing, and teaching in a public setting (the chatroom). ChatOps has continued to mature, and today I think it is almost as difficult to define as the elusive 'DevOps' term. At HelloSign, ChatOps is about automation, communication, learning, access control, and helping our team grow (whether that growth involves code deployment, meal plans, or memes).
1. Code Deployment
It's easy to have a chatbot running somewhere that can trigger a deployment with as much control as your deployment technique offers. Some of the options HelloSign supports allow deploy commands like:
- push stage01 www develop
- push stage02 docs mybranch@abcd1234
- push environment0001 master clear-sessions
Ours follow the format "push ENVIRONMENT TARGET_NODE_TYPE GIT_BRANCH[@COMMIT][ OPTION]". Some of our other deploy targets that are conceptually similar are the puppetmaster, packer config files, and external testing suites. Anywhere that code gets pushed, pulled, cloned, or installed is a prime target for a chatbot command.
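To make the command format concrete, here is a minimal parsing sketch. It is not our actual bot code. It assumes the `push` verb and the `@` commit separator used in the example commands, and the names `PushRequest` and `parse_push` are made up for illustration. Tokens are read strictly by position, so a three-token command like the third example is read as environment, node type, and branch.

```python
import re
from typing import NamedTuple, Optional

# Grammar assumed from the example commands:
#   push ENVIRONMENT TARGET_NODE_TYPE GIT_BRANCH[@COMMIT][ OPTION]
PUSH_RE = re.compile(
    r"^push\s+(?P<environment>\S+)\s+(?P<node_type>\S+)\s+"
    r"(?P<branch>[^\s@]+)(?:@(?P<commit>[0-9a-f]+))?"
    r"(?:\s+(?P<option>\S+))?$"
)


class PushRequest(NamedTuple):
    """Hypothetical structured form of a push command."""
    environment: str
    node_type: str
    branch: str
    commit: Optional[str]
    option: Optional[str]


def parse_push(message: str) -> Optional[PushRequest]:
    """Return a PushRequest, or None if the message is not a push command."""
    match = PUSH_RE.match(message.strip())
    if match is None:
        return None
    return PushRequest(**match.groupdict())
```

A handler would call `parse_push` on each incoming message and hand the result to whatever deployment tooling sits behind the bot. Rejecting anything that does not match keeps malformed requests from ever reaching a deploy script.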
2. Environment status
What is currently deployed to the staging servers? How many production web instances are we running? Is there a test environment provisioned? What are the oldest logs currently available? At HelloSign many of these questions are tightly linked to AWS state, and as a result we use a combination of custom scripts and AWS API wrappers to pull environment details like:
- ec2 instance tag:Name instance-name
- ec2 instance group-name elasticsearch
- ec2 instance private-dns-name 184.108.40.206
- as group stage01 www
These pull information about ec2 instances directly from ec2-describe-instances with an applied filter (based on the request), or display auto scaling group data by a particular group name using as-describe-auto-scaling-groups.
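As a rough sketch of that translation step, a chat request can be turned into an argument list for the underlying command-line tool. The function name `build_status_command` and the `ENVIRONMENT-NODE_TYPE` group naming convention are assumptions for illustration, not a description of our real scripts.

```python
from typing import List


def build_status_command(message: str) -> List[str]:
    """Translate a chat status request into a CLI argument list."""
    tokens = message.split()
    if len(tokens) == 4 and tokens[:2] == ["ec2", "instance"]:
        filter_name, value = tokens[2], tokens[3]
        # ec2-describe-instances accepts filters as name=value pairs.
        return ["ec2-describe-instances", "--filter", f"{filter_name}={value}"]
    if len(tokens) == 4 and tokens[:2] == ["as", "group"]:
        environment, node_type = tokens[2], tokens[3]
        # Hypothetical convention: groups are named ENVIRONMENT-NODE_TYPE.
        group_name = f"{environment}-{node_type}"
        return ["as-describe-auto-scaling-groups", group_name]
    raise ValueError(f"unrecognized status request: {message!r}")
```

The returned list can be passed to `subprocess.run` without going through a shell, so text typed into chat is never interpreted as shell syntax.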
3. Environment Updates
Our environments are modeled around auto scaling groups in AWS, so the environment updates are often a matter of an update to the auto scaling group in question. If someone is testing a feature in an environment called "someEnvironment," and they need to confirm that it works properly in a larger cluster, they can easily:
- as update someEnvironment docs 4 4
- as update someEnvironment www 2 4
These bring up 4 docs nodes and configure default auto scaling rules with a minimum of 2 and maximum of 4 instances for www nodes. This is very consistent across our various services and makes it easy to increase/decrease our cluster sizes.
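A small sketch of how such a request might be validated before anything touches the auto scaling group. The `ScalingRequest` type and `parse_scaling` function are hypothetical names, and the sketch assumes the command shape "as update ENVIRONMENT NODE_TYPE MIN MAX" seen in the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRequest:
    """Hypothetical structured form of an 'as update' command."""
    environment: str
    node_type: str
    min_size: int
    max_size: int


def parse_scaling(message: str) -> ScalingRequest:
    """Parse 'as update ENVIRONMENT NODE_TYPE MIN MAX' and check the bounds."""
    tokens = message.split()
    if len(tokens) != 6 or tokens[:2] != ["as", "update"]:
        raise ValueError("expected: as update ENVIRONMENT NODE_TYPE MIN MAX")
    environment, node_type = tokens[2], tokens[3]
    min_size, max_size = int(tokens[4]), int(tokens[5])
    if min_size < 0 or min_size > max_size:
        raise ValueError("MIN must be between 0 and MAX")
    return ScalingRequest(environment, node_type, min_size, max_size)
```

Checking the bounds in the bot means a typo like "4 2" gets an immediate reply in chat instead of an error from the AWS API.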
4. Provisioning
Puppet configures our servers at HelloSign, but from an Ops perspective we follow the Netflix OSS team's proposal that an AMI is the smallest unit of deployment, and in most cases the desired unit is actually an auto scaling group. This means that a 'provisioning' activity for us generally consists of packer generating an AMI with the help of our puppetmaster. Some of the chatbot commands we use to support this are:
- puppetmaster branch@abcd1234
- packer aws stage01 www
- packer vbox dev elasticsearch
These commands update our puppetmaster to the desired commit, then generate a stage01 www AMI and a VirtualBox image with a dev's elasticsearch cluster.
5. Server Deployment
Depending on your flow, Provisioning and Server Deployment may be the same action. If you 'pre-bake' images like we do at HelloSign, then you need to actually push those images into a cluster before you've made any real progress. In some clusters we consider this a specific deployment type; in others it is simply an auto scaling group update.
- ami-push stage01 www develop@abcd1234
- as update-lc stage01 docs
The first command updates the AMI used for www nodes in the stage01 environment and ensures that they use the desired code upon deploy. The second is a more generic auto scaling group change: it generates a new launch configuration so that the group launches instances from the latest AMI.
6. File retrieval (s3/cloud)
We sometimes require access to particular content for debugging. This is one of the more cloud-specific implementations, but rather than building access into an administrative interface, we simply use our chatbot to interface with s3 and make particular files accessible. This helps when multiple people are investigating something and also acts as an audit trail of who accessed what.
- s3 BUCKET FILE
These requests can push a file directly to your communication platform (where supported), create a temporary link, copy a file to a known location, etc. You can easily decorate requests like this, too. If the Marketing team requires access to particular data, a request like "acct_so_2001" in a marketing room or from a marketing team member could automatically perform the right s3 command. These types of beautification scripts can often make a big impression on your team.
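One way to picture the decoration idea is a lookup table that expands a room-specific shortcut into a full s3 command and records who asked for it. Everything in this sketch is hypothetical: the bucket name, the file path, and the `ALIASES`, `AUDIT_LOG`, and `expand_request` names are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table: (room, shortcut) -> full "s3 BUCKET FILE" command.
ALIASES: Dict[Tuple[str, str], str] = {
    ("marketing", "acct_so_2001"): "s3 example-marketing-bucket reports/acct_so_2001.csv",
}

# Simple in-memory audit trail of (user, room, command) entries.
AUDIT_LOG: List[Tuple[str, str, str]] = []


def expand_request(user: str, room: str, message: str) -> str:
    """Expand a room-specific shortcut and record any s3 access."""
    text = message.strip()
    # Fall back to the original text when no alias is defined for this room.
    command = ALIASES.get((room, text), text)
    if command.startswith("s3 "):
        AUDIT_LOG.append((user, room, command))
    return command
```

Keying the table on the room means the same shortcut does nothing outside the marketing room, and every expanded s3 request leaves an entry that shows who accessed what.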
7. Cat pics
If you're going to have a chatbot around, your co-workers should probably be able to use it to retrieve memes, gifs, images, or whatever from the internet. It's only fair, and it often sparks interesting discussion.
- mustache me MY_BOSS
- image me ALL THE THINGS
- youtube me Rick Astley
Every once in a while, try only communicating via images; you'll see.
8. Office/Admin services
Your chatbot can fetch all sorts of data for you. I tell co-workers that any time they find themselves doing something repeatedly to get new data, they should ask about having the chatbot do it.
Some data that our chatbot retrieves:
- schedules and events (calendars, pagerduty, etc)
- food information (zerocater or yelp)
- twitter (recent tweets about HelloSign)
- weather by zip code
These are easy to implement and save small amounts of hassle for lots of people. They always feel like a big win to me.
9. Self management
Your chatbot should be able to update/kill itself. When we push new chatbot code, it would be terrible to not let the chatbot push that code to itself.
- self update master@abcd1234
- self seppuku
Automate All The Things
Anything that happens 3 or more times fits my requirements for a thing the chatbot should do (scheduling these as a priority is more complicated, of course). These are a few to get your implementation running, but there are so many more. Check out some other common chatops features (hubot specific):
And learn to write your own!
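If you want a feel for what writing your own involves, the core of most chatbots is a table of patterns and handlers. The sketch below is a generic Python illustration, not hubot's API. The `respond` decorator, the `dispatch` function, and the placeholder weather handler are assumptions made up for this example.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

Handler = Callable[["re.Match[str]"], str]

# Registry of (compiled pattern, handler) pairs, checked in registration order.
_HANDLERS: List[Tuple[Pattern[str], Handler]] = []


def respond(pattern: str) -> Callable[[Handler], Handler]:
    """Register a handler for messages matching the given regex."""
    def register(handler: Handler) -> Handler:
        _HANDLERS.append((re.compile(pattern, re.IGNORECASE), handler))
        return handler
    return register


@respond(r"^weather (\d{5})$")
def weather(match: "re.Match[str]") -> str:
    # Placeholder reply; a real handler would call a weather service here.
    return f"Looking up weather for {match.group(1)}"


def dispatch(message: str) -> Optional[str]:
    """Run the first handler whose pattern matches, or return None."""
    text = message.strip()
    for pattern, handler in _HANDLERS:
        match = pattern.match(text)
        if match:
            return handler(match)
    return None
```

Each new command is then just another decorated function, which keeps the cost of automating the next repeated task low.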