Sunday, January 18, 2015

Averages are not good enough (F#)

Let's (no pun intended) look at a set of response times of a web service.

People like having a single number to summarize a piece of data. The average is the most popular candidate. The average is calculated by dividing the sum of the input elements by the number of input elements.
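As a minimal Python sketch of that definition: the `average` helper and the `response_times` list below are made up for illustration and are not the data set used in this post.

```python
def average(values):
    """Arithmetic mean: the sum of the elements divided by their count."""
    if not values:
        raise ValueError("the average of an empty list is undefined")
    return sum(values) / len(values)


# Hypothetical response times in milliseconds (invented for this example).
response_times = [18, 21, 24, 24, 29, 30, 34, 250]

print(average(response_times))  # 53.75
```

Note how the single 250 ms value drags the result above every other element in the list.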

The average is a measure of centre that is sensitive to outliers; one or two irregular values can skew the outcome. The median, on the other hand, stays representative of the centre even when the data distribution is not symmetric. The median is determined by sorting the input elements and picking the one in the middle (or, for an even number of elements, averaging the two middle ones).
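A small Python sketch of the median, again on made-up numbers. The `median` helper and the sample list are assumptions for illustration; for an even number of elements it averages the two middle ones, which is the usual convention.

```python
def median(values):
    """Middle element of the sorted input (mean of the two middle ones if even)."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical response times in milliseconds (invented for this example).
response_times = [18, 21, 24, 24, 29, 30, 34, 250]

print(median(response_times))  # 26.5
```

The 250 ms outlier barely moves the median, while it would pull the average of the same list up to 53.75.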

Both measures are terrible at showing how the data is distributed though. The average and median response time might look fair, but maybe there are a few outliers which are giving a few customers a bad time.

Instead of reducing our input down to a single number, it might be better to extract a table that displays the frequency of various outcomes in the input.
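One way to build such a table, sketched in Python. The `frequency_table` helper, the bucket width of 10 and the sample data are all assumptions for illustration, and the bucket labels assume whole-number response times.

```python
from collections import Counter


def frequency_table(values, bucket_size):
    """Count how many values fall into each bucket of width bucket_size."""
    counts = Counter((value // bucket_size) * bucket_size for value in values)
    return sorted(counts.items())


# Hypothetical response times in milliseconds (invented for this example).
response_times = [18, 21, 24, 24, 29, 30, 34, 250]

for start, count in frequency_table(response_times, 10):
    # Each row shows the bucket range and one '#' per value in that bucket.
    print(f"{start}-{start + 9}: {'#' * count}")
```

On this sample, most values land in the 20-29 bucket, with a lonely entry in the 250-259 bucket.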

Now this is more useful; we can clearly see that the data is not distributed equally and there are a few outliers in our response times we need to investigate further.

This table takes up quite a bit of ink though. What if we want to get rid of the table, but maintain a feel for the distribution of the data?

The standard deviation measures the amount of variation from the average. A low standard deviation means that the data points are very close to the mean. A high standard deviation indicates that the data points are spread out over a large range of values.
It is calculated by taking the square root of the average of the squared differences of the values from their average value.
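Spelled out in Python, that calculation could look like the sketch below. It divides by the number of elements, which makes it the population standard deviation (the sample variant divides by n - 1 instead). The `standard_deviation` helper and the sample values are assumptions for illustration.

```python
import math


def standard_deviation(values):
    """Population standard deviation: root of the mean squared deviation."""
    if not values:
        raise ValueError("the standard deviation of an empty list is undefined")
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return math.sqrt(variance)


# Hypothetical values (invented for this example); their mean is 5.
samples = [2, 4, 4, 4, 5, 5, 7, 9]

print(standard_deviation(samples))  # 2.0
```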

The standard deviation is even more useful when you put the average at the centre of a graph, lay out the input elements according to their deviation from the average and see a bell curve drawn. When the data is roughly normally distributed like this, we can use the empirical 68-95-99.7 rule to get a feel for how the data is distributed.
In statistics, the so-called 68–95–99.7 rule is a shorthand used to remember the percentage of values that lie within a band around the mean in a normal distribution with a width of one, two and three standard deviations, respectively; more accurately, 68.27%, 95.45% and 99.73% of the values lie within one, two and three standard deviations of the mean, respectively.

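For our set of response times, this means that 68.27% of the response times lie within the 24.8 and 66.4 range, 95.45% lie within the 4 and 87.2 range, while 99.73% lie within the -16.8 and 108 range. A negative response time is of course impossible, a reminder that the bell curve is only an approximation of the real data.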
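These ranges are simply the mean plus or minus one, two and three standard deviations. The sketch below assumes a mean of 45.6 and a standard deviation of 20.8; these are values I back-calculated from the ranges quoted above, not numbers stated elsewhere in the post, and `empirical_bands` is a hypothetical helper.

```python
# Approximate share of values within 1, 2 and 3 standard deviations of the
# mean. Only meaningful for data that is roughly normally distributed.
COVERAGE = {1: 68.27, 2: 95.45, 3: 99.73}


def empirical_bands(mean, std_dev):
    """Return (coverage %, lower bound, upper bound) for 1, 2 and 3 sigma."""
    return [(pct, mean - k * std_dev, mean + k * std_dev)
            for k, pct in COVERAGE.items()]


# Assumption: mean and standard deviation inferred from the quoted ranges.
mean, std_dev = 45.6, 20.8

for pct, low, high in empirical_bands(mean, std_dev):
    print(f"{pct}% of the values lie between {low:.1f} and {high:.1f}")
```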

When we calculate the standard deviation, we can put one extra number next to the average and derive from just two numbers how the data is distributed.

In conclusion, the mean and the median hide outliers. Looking at the frequency distribution gives you a more complete picture. If we insist on having less data to look at, the standard deviation and the 68–95–99.7 rule can compress much of that picture into just two numbers, as long as the data is roughly normally distributed.

Sunday, January 11, 2015

Consumed in 2014

Starting 2014, I wanted to look more closely at everything I consume. So I started keeping a list of everything I read, watch and listen to.

I started off with a Markdown file on GitHub that quickly evolved into a good excuse to dabble with an alternative stack. I ended up writing an event-sourced Node.js application on top of Postgres, hosted on Heroku. I could write a post on that particular experience, but it's safe to say this blog post captures it better than I ever could.

In 2014, I listened to 10 audiobooks and 18 podcasts, read 20 books and watched 12 movies and 13 shows.

I just went over the list and cherry-picked the three items per category that stuck with me the most.

Books
  1. Addiction by Design: An incredibly complete overview of the gambling industry, with insights into the human psyche which apply far outside the domain of gambling. 
  2. 11/22/63: I try to read a fiction book for every non-fiction book I read. This was my first "real" King book, and I was hooked instantly. I devoured the 880-page book in less than two weeks, and was left with a big gaping hole in my heart - the journey was more than worth it. 
  3. SQL Performance Explained: The author succeeds in capturing the essentials in only 204 pages. This is one of those books you want to have going around at the office. 
Movies
  1. Be Here To Love Me: Not-your-average documentary about Townes Van Zandt; my musical and philosophical discovery of 2014. A troubled soul, struggling with the futility of existence. Simple poetic songs beautifully crafted, capturing in a few words what would take others books. While darkness inspired his best work, it was also the search for relief from that same darkness that led him further down the path of self-destructive behaviour. Painful to watch. Songs: Rake, Waiting Around to Die, Snake Mountain Blues, Dollar Bill Blues.
  2. 3:10 to Yuma: A true modern Western, capturing the American West beautifully, making me nostalgic.
  3. Dallas Buyers Club: Is there one role Matthew McConaughey can't hack?
Audio books
  1. On Writing - A Memoir of the Craft: "What follows is an attempt to put down, briefly and simply, how I came to the craft (of telling stories on paper), what I know about it now, and how it's done. It's about the day job; it's about the language." If you're not a King fan, you will be after finishing this one. 
  2. Getting to Yes: Separate people and issues, focus on interests, generate options and use objective criteria. But also, what to do when the other party is more powerful or when they use dirty tricks.
  3. Blink: Making decisions in the blink of an eye; why you should go with instinct and why you shouldn't.
Shows
  1. True Detective: "Life's barely long enough to get good at one thing. So be careful what you get good at." "If the only thing keeping a person decent is the expectation of divine reward then, brother, that person is a piece of shit." "People incapable of guilt usually do have a good time." Heavy stuff.
  2. Rick and Morty: "Listen, Morty, I hate to break it to you but what people call love is just a chemical reaction that compels animals to breed." "Nobody exists on purpose. Nobody belongs anywhere. Everybody's gonna die. Come watch TV?" "Snuffles was my slave name, you can call me Snowball because my fur is white and pretty." Must be the best animated series of the moment, hands down; looking forward to next season. The first season is available for free.
  3. The Walking Dead: Addictive soap opera that features zombies.

Papers

Mostly classics on Functional Programming.

Sunday, December 28, 2014

TDD as the crack cocaine of software

The psychologist Mihaly Csikszentmihalyi popularized the term "flow" to describe states of absorption in which attention is so narrowly focused on an activity that a sense of time fades, along with the troubles and concerns of day-to-day life. "Flow provides an escape from the chaos of the quotidian," he wrote. 
This is a snippet from the highly recommended book Addiction by Design, which not only gives you an incredibly complete overview of the gambling industry, but also insights into the human psyche which apply far outside the domain of gambling.

For me, this book was an eye-opener, with the biggest realization being that most gamblers don't play to win. They play to lose. To lose themselves. Slot machines and video poker are for many people the quickest and surest way to reach flow. It's this phenomenon that has earned machine gambling the title of "the crack cocaine of gambling."

It's not just gamblers who crave flow, though; we all do.

Some of us get up early on the weekends to drive halfway across the country for a few hours of intensive mountain biking. Others come home after work, throw their laptop in the corner and engage in an online shooter, zoning out for a good hour. Others will accidentally waste their entire Sunday morning solving a crossword puzzle they stumbled upon while reading the newspaper.

All these activities meet a specific set of preconditions.
Csikszentmihalyi identified four preconditions of flow: first, each moment of the activity must have a little goal; second, the rules of attaining that goal must be clear; third, the activity must give immediate feedback so that one has certainty, from moment to moment, on where one stands; fourth, the tasks of the activity must be matched with operational skills, bestowing a sense of simultaneous control and challenge.
Machine play has all these properties. Let's look at video poker. The goal is to make a winning combination. The set of winning combinations should be easy enough to remember; they're similar to live poker. After pushing "deal" you get five cards. The player decides which cards to "hold". Pushing the "deal" button the second time will draw new cards from the same virtual deck. After the draw, you immediately know whether you've won or lost. Feedback is instantaneous. A game is over in a few seconds. Although the outcome is determined by chance, there is some degree of skill involved; it's up to you to hold the right cards.

As programmers we're lucky enough to inadvertently end up in the zone frequently. Without a doubt, it's in this state most of us do our best work. In the zone, it's constant feedback and a sense of moving forward that keep me going. One could argue that the zone is inherent to the activity of programming. I'd say that the length of the feedback loop and the size of the goals are critical and hard to maintain without working at it.

In this regard, there are a few techniques that help me reach a state of flow. At first I could get by just trying to get the code to compile or to launch whatever it was I was working on. But once you're comfortable with a code base, getting it to compile isn't much of a challenge, and having to start your application to get feedback gets old really quickly. These days it's most often TDD that helps me get there. You start off with a failing test; your mission: to make it pass. The rules are simple: when your test goes from red to green, you're allowed to move on. It's important that tests are fast, so that they can give you that immediate feedback. How fast? Fast enough for you not to lose focus. It goes without saying that the fourth precondition is met too; you're writing the code, doing your best to bend it your way.

When TDD is sold as a productivity booster, the arguments used are often strengths such as automated continuous validation of correctness, partitioning of work into smaller units, and cleaner, better-designed code. While these are valid arguments, it's a shame that the power of TDD as a consistent gateway to flow gets neglected or undersold.

Getting in the zone by yourself is one thing; getting there surrounded by a group of people is often out of the question. Here Event Storming has helped me out. Small goals: what happens before this event? Rules: write the previous event down on a yellow post-it. Feedback: once the post-it is up, we see that we're reaching a better understanding of the big picture. Control and challenge: you're the one searching for deeper insight, writing and putting up the post-its.

The activities that get me into a state of flow are the ones that I enjoy the most and that enable me to produce my best results. If you reread the four preconditions and assess the things that get you going, you might learn that the same goes for you.