Unplugging from the Matrix

Today’s my first day back at work after a week-long holiday. A holiday is nothing unusual of course, but I wanted to write down a few thoughts on one aspect of it which is a little more unusual. For the entire duration of that week, I was totally disconnected from the internet. I didn’t take my laptop with me. There was no wi-fi where I was staying. Data roaming on my phone was turned off.

This is a habit I acquired around two years ago now, and every year I make sure that I spend at least one week (but preferably two) completely cut off from the online world. Imagine that – no email, twitter, github, irc, skype or interwebs.

But guess what. The world kept on turning quite happily without me. Ops folk (and developers as well) are almost universally bad at unplugging in this way. Many of our professional lives revolve around the online world and the thought of forcibly disconnecting ourselves makes us all twitchy.

I must admit, my motivations for these annual offline weeks are purely selfish – I find it recharges my batteries and refreshes my mind. That’s been especially important this year as I’m entering the closing stages of writing a book, and had been skating on the edge of burnout-territory for a couple of months.

But there’s another side to this as well – I was catching up with one of my colleagues this morning, Pete Bellisano, who made the following observation:

I’d imagine it’s incredibly healthy thing to do, both from a personal and organizational view point

My feeling is that Pete was entirely correct on this point – although taking a week away from the constant barrage of information is good for me on a personal level, it’s also a healthy exercise for the company I work for and the people I work with.

The fact that I was able to take that week away without anybody having to call me on my mobile – the only means of communicating with me during that week – tells us that we’re doing a good job of making sure that I’m not a single point of failure for anything. It’s the classic “what if X gets hit by a bus” dilemma.

It’s my personal opinion that everybody should try taking an offline week at least once a year – without fail, I return to work feeling refreshed and ready to rock. I’m also writing the first post on this blog since October last year, which says something by itself!

Having said that, I realise that it’s not possible for everybody to do that – you might be the only person on your team, or your organization might not be all that “bus proof” at the moment. So my challenge to you is this:

  • Take 1 week off work per year and totally disconnect from the internet.
  • If you can’t do the above this year, aim to be able to do it next year. Figure out what you need to change to make that happen, and focus on fixing it.

Even if by this time next year you’re still not able to take that offline week, hopefully you’ll have gotten closer to it – and if you’re already working in a good team where you’re not a single point of failure, you haven’t got any excuses. Leave the laptop at home, and go take a break. You won’t regret it.

 

knife-spork 1.3.0 released

I don’t usually write blogposts for each new knife-spork release, but along with the usual smattering of bugfixes this release has a couple of fairly significant new features I wanted to highlight and explain in greater detail.

Spork Omni

One of the most requested features in knife-spork was a simpler workflow. A lot of the people who use knife-spork follow the bump, upload, promote (or promote --remote) pattern every time they change a cookbook.

With that in mind, I’ve added a new command, spork omni. This essentially combines bump, upload and promote (or promote --remote) into one step. Here’s an example of omni in action:

$ knife spork omni apache2 --remote
OMNI: Bumping apache2
Successfully bumped apache2 to v0.3.99!

OMNI: Uploading apache2
Freezing apache2 at 0.3.99...
Successfully uploaded apache2@0.3.99!

OMNI: Promoting apache2
Adding version constraint apache2 = 0.3.99
Saving changes to development.json
Uploading development.json to Chef Server
Promotion complete at 2013-08-08 11:43:12 +0100!
Adding version constraint apache2 = 0.3.99
Saving changes to production.json
Uploading production.json to Chef Server
Promotion complete at 2013-08-08 11:43:12 +0100!

When using omni, all spork plugins run just as they would if you ran each step individually, which is in fact what happens under the hood. Omni is really just a convenient wrapper for the most common spork workflow.
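To make the wrapping concrete, here is a minimal sketch of the manual sequence that one omni run stands in for. The helper name is hypothetical and this is not knife-spork’s own code. It only builds the command lines rather than running them, and it assumes spork is configured with default environments so that promote needs no environment argument, as in the example output above.

```ruby
# Hypothetical helper, not part of knife-spork: lists the knife commands
# that a single `knife spork omni COOKBOOK` run stands in for.
# It only builds the argument lists; nothing is executed.
def omni_equivalent_commands(cookbook, remote: false)
  # Assumption: default environments are configured, so promote is
  # called without an explicit environment name.
  promote = ["knife", "spork", "promote", cookbook]
  promote << "--remote" if remote

  [
    ["knife", "spork", "bump", cookbook],
    ["knife", "spork", "upload", cookbook],
    promote
  ]
end

# Print the three steps for the apache2 example.
omni_equivalent_commands("apache2", remote: true).each do |argv|
  puts argv.join(" ")
end
```

Keeping the steps as separate argument lists mirrors the point above: each step is still its own command, so plugins see the same events either way.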

Node, role and databag Commands

One of the annoyances that we’ve experienced at Etsy is that whilst spork gives us excellent visibility into cookbook changes, we’re still effectively in the dark when it comes to role, node and databag changes. With that in mind, I’ve added spork commands which wrap all of the destructive default knife node, role and databag commands.

By destructive commands, I mean those which alter the chef server in some way, by changing a run_list, uploading new data bag items and so on. All of the spork equivalents run the default knife command under the hood; they just wrap it in spork’s plugin API so that you can, say, see IRC notifications when you upload a role.

The following data bag commands are provided in knife-spork:

knife spork data bag create
knife spork data bag delete
knife spork data bag edit
knife spork data bag from file

The following node commands are provided in knife-spork:

knife spork node create
knife spork node delete
knife spork node edit
knife spork node from file
knife spork node run_list add
knife spork node run_list remove
knife spork node run_list set

The following role commands are provided in knife-spork:

knife spork role create
knife spork role delete
knife spork role edit
knife spork role from file

And here’s an example of the IRC notification you might see from running knife spork node run_list set mynode.mydomain.com:

CHEF: Jon Cowie set the  run_list for mynode.mydomain.com to 
["role[Base]", "recipe[awesome::stuff]"]

You can find knife-spork-1.3.0 on Rubygems.org now. Please do check out the CHANGELOG for details on the rest of the changes in the new version.

Why I retired mycrot.ch

Way back in 2009, I decided to buy myself an amusing vanity URL. After much careful thought, I ended up choosing mycrot.ch (I won’t lie, because I thought it was funny as hell), which I have been using for my personal site / blog until the end of last week. But I’ve now decided to retire it. Why? Read on…

Those of you who follow me on twitter may have seen part of a minor argument I had with some folks on Friday night. Here’s a link to as much of the thread as twitter will show me in one place (I’ll paste screenshots of tweets excluded from this below as needed) https://twitter.com/jonlives/status/272037883421020160

Now, for context as to what I was complaining about, have a look at the hall of fame at the bottom of this page.

The gist of the argument is that I think it’s totally unacceptable for Comic Relief to be suggesting it might be funny to place your underpants on the desk of your receptionist and tell her that you’re going commando – in my (non-lawyer) opinion, this sort of thing is pretty clearly sexual harassment. As you can see from the above twitter thread, several people (including @choosenick and @stef who worked on the project) didn’t see my complaint as anything other than being overly sensitive. So I replied with the following:

The aforementioned @choosenick then called me out with the following:

From subsequent comments on twitter I’m fairly sure he meant the above comment to be flippant, but it got me thinking. He’s actually got a perfectly valid point. Sure, my domain name is personal and not specifically directed at women, but is it really *that* different to what I’m complaining about? A good friend of mine observed that mycrot.ch was what he called “undirected sexual humour”, the problem arising from the direction in which such humour is normally aimed, i.e. towards women. I have no actual evidence that my domain name ever offended anyone, but it’s still symptomatic of the sort of “bro” humour that pervades tech these days.

My employer, Etsy, have been doing an excellent job of encouraging more women to apply for tech roles, and it’s got me thinking a lot more about the issue recently. I generally fall into the apathetic bracket when it comes to issues and doing anything about them, but this time it struck me I could actually do something to demonstrate where I stand.

So, I shut down mycrot.ch and tweeted the following.

The following are the direct @replies I received in response to that tweet, all from people involved in the Comic Relief project:

It may be that I’m over-reacting, or making tiresome assumptions about what women find offensive, but my honest opinion is that by perpetuating (even non-directed) sexual humour, I’m also helping to perpetuate the idea that it is normal and accepted within the Tech Community at large.

I’ve now migrated my personal site and blog to the jonliv.es domain you’re reading this on now. My personal website is also directly linked to my reputation, both professional and personal, and I’m making a stand that I don’t want either to say “Hey, this guy’s a bro just like us”. I feel strongly that we as a tech community need to take steps to reverse the “testosterone and booth-babes” culture which has become ingrained in our industry. It may even be the case that until the scales are balanced again, we have to become more sensitive about our humour etc than we’d perhaps like to be the case.

Maybe one day, when any woman who wants to work in tech can be sure of being treated as fairly as her male counterparts, and isn’t deterred from working in tech for any reason other than genuinely not being interested, I’ll reclaim mycrot.ch. But for now, jonliv.es.

Incidentally, Comic Relief still have that video online, so please do give them your feedback if you feel the same way as I do.

knife-spork 1.0.0 released

Ohai chefs!

It’s been nearly 3 months since the last knife-spork release, but I haven’t forgotten about you all. Oh no. I’m happy to announce that knife-spork has finally hit version 1.0.0, and you can get it from Rubygems or Github right now.

I don’t usually write blogposts for new knife-spork releases, but since a lot of the changes in this release are behind the scenes, I thought I’d give you all a bit more insight into what’s been done than just pointing you at the Changelog.

Since the first version of knife-spork was released in Jan 2012, there’s been what developers like to call a lot of “organic growth” in the codebase. In short, what this means is that I added a ton of new stuff without cleaning up the old. The result of this was that (and I’ll be the first to admit this) the codebase was rather cluttered, and contained a lot of duplicated code. This isn’t exactly ideal for me as the main programmer, but it’s doubly annoying for those of you who’ve contributed code to knife-spork, because finding your way around the source for the first time to figure out where your contribution belongs isn’t the easiest.

With that in mind, I’d like to say a big thank you to Seth Vargo from Customink (@sethvargo) who took on the task of refactoring the entire thing! Seth’s cleaned up the code, cleaned up command outputs, structured things a lot more logically and generally given the code a good tidyup.

Seth also contributed a significant new feature to knife-spork with his implementation of a plugin framework. Support for several external systems like irccat, Hipchat and Git was already present in knife-spork, but Seth’s separated this out into a proper plugin framework which will make it far easier to work on specific integration points and integrate new systems.

Alongside Seth’s sterling work, I’ve also added a few new features:

  • The spork-config.yml schema has changed slightly to reflect the new plugin framework. Please check out README.md in the knife-spork repo root for more details
  • “knife spork info” command to display the config Hash that spork is using, and show you the status of any Plugins you have installed
  • spork check now gracefully handles missing local / remote cookbooks (bug reported by Jason Perry)
  • spork check now has an optional “--fail” parameter to make it exit with a non-zero exit code if any checks fail (suggestion by Jason Perry)
  • A new safety check has been added which will prompt you for confirmation if you’re promoting a version constraint more than version_change_threshold versions ahead
    • version_change_threshold is now a configurable parameter in your spork-config file, and defaults to 2 if not set.
    • Version diffs are calculated as follows: a patch level release increments by 1, a minor level release increments by 10, a major release increments by 100.
    • The default threshold value corresponds to the patch level changing by 2, i.e. from 1.0.1 to 1.0.3
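The weighting described above can be sketched in a few lines of Ruby. This is an illustration of the arithmetic only, with hypothetical method names, and is not the code knife-spork itself uses. It assumes plain three-part versions (major.minor.patch) and follows the “more than version_change_threshold” wording, so a diff equal to the threshold does not trigger the prompt.

```ruby
# Hypothetical helpers illustrating the version diff weighting;
# not knife-spork's actual implementation.

# Weight a "major.minor.patch" string: major counts 100, minor 10, patch 1.
def version_weight(version)
  major, minor, patch = version.split(".").map(&:to_i)
  major * 100 + minor * 10 + patch
end

# How far ahead `to` is of `from`, in weighted units.
def version_diff(from, to)
  version_weight(to) - version_weight(from)
end

# Assumption: the prompt fires only when the diff exceeds the threshold.
def needs_confirmation?(from, to, threshold = 2)
  version_diff(from, to) > threshold
end

puts version_diff("1.0.1", "1.0.3")        # patch level moved by two
puts needs_confirmation?("1.0.1", "1.1.1") # a minor bump exceeds the default
```

Under these assumptions a jump from 1.0.1 to 1.0.3 gives a diff of 2 and passes silently, while any minor or major bump (10 or 100) would ask for confirmation with the default threshold.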

I’d also like to add a note of thanks to Bethany Benzur and Nick Marden who submitted patches for a couple of bugs pre-refactor. Although not present in the original form they were submitted in, both patches have been incorporated into the refactored code.

There’s a whole other pile of new features I wanted to get into this release, but I decided to get the new code out there for you all to use, and save the new shiny for version 1.0.1. Here’s a sneak preview of some of the stuff I’m working on for the next release:

  • Configurable git behavior modes to support different git workflows
  • The ability to “lock” and “unlock” cookbooks you’re working on
  • Expose organizations information in notification messages
  • Cookbook diffing to warn you about large changes

As always, thanks for using knife-spork and please keep the suggestions, Issue reports and Pull requests coming!

Scaling Chef with more API Workers

We’re big fans of Opscode’s Chef software at Etsy, and are using it on close to 700 nodes. Recently though, we found that we were beginning to see a large number of connection timeouts during Chef runs. A little digging revealed that although the hardware on which we run Chef was by no means struggling, the API worker (the process running on port 4000 you point knife at by default) was continually maxing out a CPU core.

The default configuration which Chef ships with runs a single API worker, which is more than sufficient for most environments, but evidently we’d hit the limit of what that worker could handle. Fortunately, scaling Chef to spawn more workers and make better use of a modern multi-core machine is easy, though a little poorly documented. So, as with most of the posts I write here, I thought I’d document the process for anyone else hitting the same issues.

Please note, the following instructions are for Redhat / CentOS based systems, although most of the steps are platform agnostic.

The first step to multiple worker nirvana is to configure chef-server to start multiple worker processes. To do this, you’ll want to edit /etc/sysconfig/chef-server and change the OPTIONS line to the following, changing the number of processes as desired – in this example, we’re starting 8:

#Configuration file for the chef-server service
#CONFIG=/etc/chef/server.rb
#PIDFILE=/var/run/chef/server.pid
#LOCKFILE=/var/lock/subsys/chef-server
#LOGFILE=/var/log/chef/server.log
#PORT=4000
#ENVIRONMENT=production
#ADAPTER=thin
#CHILDPIDFILES=/var/run/chef/server.%s.pid
#SERVER_USER=chef
#SERVER_GROUP=chef
#Any additional chef-server options.
OPTIONS="-c 8"

Once you’ve done this, run /etc/init.d/chef-server restart, and then run “ps -ef | grep merb”. You should now see output similar to the following:

chef 16495 1 10 Feb23 ? 2-02:55:03 merb : chef-server (api) : worker (port 4000)
chef 16498 1 8 Feb23 ? 1-15:48:30 merb : chef-server (api) : worker (port 4001)
chef 16503 1 8 Feb23 ? 1-17:33:12 merb : chef-server (api) : worker (port 4002)
chef 16506 1 8 Feb23 ? 1-17:34:43 merb : chef-server (api) : worker (port 4003)
chef 16509 1 9 Feb23 ? 1-17:59:06 merb : chef-server (api) : worker (port 4004)
chef 16515 1 8 Feb23 ? 1-17:45:54 merb : chef-server (api) : worker (port 4005)
chef 16518 1 8 Feb23 ? 1-16:06:50 merb : chef-server (api) : worker (port 4006)
chef 16523 1 8 Feb23 ? 1-17:39:14 merb : chef-server (api) : worker (port 4007)

As you can see from the above output, the new worker processes have been started on ports 4000 through 4007. If we want our chef-clients to hit our new workers, we’re going to need a load balancer sitting in front of the workers. Luckily, since our worker processes communicate over HTTP, we can use Apache for this through the use of its mod_proxy_balancer module. I’m going to assume that you’re familiar with the basics of setting up Apache here, and just cover the specifics of load balancing our workers.

The following vhost example shows how to use mod_proxy_balancer (which needs to be loaded in your Apache configuration) to balance across our new worker processes.

<VirtualHost *:80>
   ServerName chef.mydomain.com
   DocumentRoot /usr/share/chef-server/public
   ErrorLog /var/log/httpd/_error_log
   CustomLog /var/log/httpd/access_log combined
   <Directory /usr/share/chef-server/public>
     Options FollowSymLinks
     AllowOverride None
     Order allow,deny
     Allow from all
   </Directory>
   <Proxy balancer://chefworkers>
     BalancerMember http://127.0.0.1:4001
     BalancerMember http://127.0.0.1:4002
     BalancerMember http://127.0.0.1:4003
     BalancerMember http://127.0.0.1:4004
     BalancerMember http://127.0.0.1:4005
     BalancerMember http://127.0.0.1:4006
     BalancerMember http://127.0.0.1:4007
   </Proxy>
   <Location /balancer-manager>
     SetHandler balancer-manager
     Order Deny,Allow
     Deny from all
     Allow from localhost
     Allow from 127.0.0.1
   </Location>
   RewriteEngine On
   RewriteCond %{REQUEST_URI} !=/balancer-manager
   RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
   RewriteRule ^/(.*)$ balancer://chefworkers%{REQUEST_URI} [P,QSA,L]
</VirtualHost>

You might notice that I’ve omitted our original worker on port 4000 from the balancer pool – this is so that we can migrate traffic off our overloaded single worker without throwing any more at it. Once all of our nodes are talking to the load balanced pool, our original worker will be idle and can then safely be added into the pool with its fellows.
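If you change the number of workers often, the BalancerMember lines can be generated rather than typed by hand. The following is a rough sketch with hypothetical helper names. It assumes the workers listen on consecutive ports starting at 4000, as in the ps output above, and that the original worker on port 4000 is held back from the pool for now.

```ruby
# Hypothetical generator for the BalancerMember lines in the vhost.

# Workers are assumed to use consecutive ports starting at base_port.
def worker_ports(base_port, worker_count)
  (base_port...(base_port + worker_count)).to_a
end

# Build one BalancerMember line per port, skipping any excluded ports.
def balancer_members(ports, exclude: [])
  (ports - exclude).map { |port| "BalancerMember http://127.0.0.1:#{port}" }
end

ports = worker_ports(4000, 8) # 8 workers: ports 4000 through 4007
puts balancer_members(ports, exclude: [4000])
```

Once the original worker is idle, calling balancer_members(ports) without the exclude list produces the full pool, including port 4000.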

Once you’ve configured a suitable vhost with your worker pool, restart Apache and make sure that the host name you configured works properly. It’s also worth having a look at the balancer-manager we configured above as well (http://yourhost/balancer-manager) as this will show you the status of your worker pool and let you tweak weightings and so on if you so desire.
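Alongside the balancer-manager page, a quick script can confirm that each worker port is actually accepting connections before any clients are pointed at the pool. This is only a sketch: the helper names are made up for illustration, it checks nothing beyond whether a TCP connection succeeds, and the host and port range are taken from the example setup above.

```ruby
require "socket"

# Returns true when something accepts TCP connections on host:port.
# A refused or timed-out connection counts as "not listening".
def listening?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

# Map each port to whether it is accepting connections.
def worker_status(host, ports)
  ports.map { |port| [port, listening?(host, port)] }.to_h
end

# Assumption: the pooled workers are on ports 4001 through 4007.
worker_status("127.0.0.1", 4001..4007).each do |port, up|
  puts "port #{port}: #{up ? 'listening' : 'not reachable'}"
end
```

Run on the Chef server itself, this prints one line per worker. Any port reported as not reachable is worth investigating before moving on.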

Now that our load balanced worker pool is up and running, all that remains is to point chef-client on our nodes at the new host name. I’m going to assume here that you’re cheffing out your client.rb file – you are cheffing out your client.rb, aren’t you? Anyway, this step is as simple as changing the chef-server line from port 4000 to port 80 (or whatever port you set up your Apache vhost on) – a sample snippet from client.rb is below:

# Main config
log_level :info
log_location "/var/log/chef/client.log"
ssl_verify_mode :verify_none
registration_url "http://chef.mydomain.com:80"
template_url "http://chef.mydomain.com:80"
remotefile_url "http://chef.mydomain.com:80"
search_url "http://chef.mydomain.com:80"
role_url "http://chef.mydomain.com:80"
client_url "http://chef.mydomain.com:80"
chef_server_url "http://chef.mydomain.com:80"

With that all done, presto chango – your chef-clients are now pointing at a shiny new pool of load balanced workers making use of as many CPU cores as you can throw at them. Once chef-client has run on all of your nodes, you’ll probably want to add our original worker on port 4000 into the load balancer pool as well.

It’s worth noting that we found the optimum number of worker processes for our setup to be 10. We’re running close to 700 nodes with an interval of 450 seconds and a splay of 150 seconds, but your mileage may vary. Provided your chef-server’s underlying hardware can handle it, keep adding workers until you stop seeing connection timeout errors. I’d recommend you don’t add more workers than you have CPU cores, and remember that you need to leave enough free cores for the rest of Chef’s processes.
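As a rough illustration of that rule of thumb, the upper bound on workers is simply the core count minus whatever you hold back for Chef’s other processes. The helper below is hypothetical, and the number of reserved cores (2 here) is an assumption to tune for your own server, not a recommendation from our setup.

```ruby
require "etc"

# Hypothetical sizing helper: never more workers than cores, minus the
# cores reserved for the rest of Chef's processes. Always allows at
# least one worker.
def max_api_workers(total_cores, reserved_cores)
  [total_cores - reserved_cores, 1].max
end

cores = Etc.nprocessors # number of CPU cores visible to Ruby
reserved = 2            # assumption: adjust for your own server
puts "Up to #{max_api_workers(cores, reserved)} API workers on #{cores} cores"
```

This gives a ceiling only. The right number below it is still whatever makes the connection timeouts stop.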