Sunday, January 18, 2015

Averages are not good enough (F#)

Let's (no pun intended) look at a set of response times of a web service.

People like having a single number to summarize a piece of data. The average is the most popular candidate. The average is calculated by dividing the sum of the input elements by the number of input elements.
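As a minimal sketch of that calculation, assuming a made-up list of response times in milliseconds (the values and the `average` name are illustrative, not the data behind this post):

```fsharp
// Hypothetical response times in milliseconds; one request is much slower than the rest.
let responseTimes = [ 18.0; 21.0; 24.0; 29.0; 42.0; 166.0 ]

// Sum of the input elements divided by the number of input elements.
let average (xs: float list) =
    List.sum xs / float (List.length xs)

printfn "%.1f" (average responseTimes) // prints 50.0
```

The built-in `List.average` does the same thing; writing it out just makes the definition visible. Note how the single slow request drags the average well above what most requests experienced.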

The average is a measure of centre that is sensitive to outliers; one or two irregular values can skew the outcome. The median, on the other hand, stays representative of the centre even when the data distribution is not symmetric. The median is determined by sorting the input elements and picking the one in the middle (or, when there is an even number of elements, the average of the two in the middle).
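A possible sketch of the median, again on made-up response times (the `median` function and the sample values are assumptions for illustration):

```fsharp
// Hypothetical response times in milliseconds; one request is much slower than the rest.
let responseTimes = [ 18.0; 21.0; 24.0; 29.0; 42.0; 166.0 ]

// Sort the input and pick the middle element; for an even number of
// elements, take the average of the two middle elements.
let median (xs: float list) =
    let sorted = xs |> List.sort |> List.toArray
    let n = sorted.Length
    if n = 0 then invalidArg "xs" "The input must not be empty."
    elif n % 2 = 1 then sorted.[n / 2]
    else (sorted.[n / 2 - 1] + sorted.[n / 2]) / 2.0

printfn "%.1f" (median responseTimes) // prints 26.5
```

For this sample the average would be 50.0, while the median of 26.5 sits much closer to what a typical request looks like.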

Both measures are terrible at showing how the data is distributed though. The average and median response time might look fair, but maybe there are a few outliers which are giving a few customers a bad time.

Instead of reducing our input down to a single number, it might be better to extract a table that displays the frequency of various outcomes in the input.
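One way to build such a table is to group the response times into fixed-width buckets and count the elements in each. This is only a sketch: the bucket width of 10, the `frequencies` name and the sample values are assumptions.

```fsharp
// Hypothetical response times in milliseconds.
let responseTimes = [ 18; 21; 24; 29; 42; 166 ]
let bucketSize = 10

// Map every value to the lower bound of its bucket, count per bucket,
// and sort the buckets in ascending order.
let frequencies (size: int) (xs: int list) =
    xs
    |> Seq.countBy (fun x -> x / size * size)
    |> Seq.sortBy fst
    |> Seq.toList

frequencies bucketSize responseTimes
|> List.iter (fun (lower, count) ->
    printfn "%3d-%3d: %d" lower (lower + bucketSize - 1) count)
```

For the sample above this yields four buckets: one value in 10-19, three in 20-29, one in 40-49 and a lonely one in 160-169.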

Now this is more useful; we can clearly see that the data is not distributed evenly and that there are a few outliers in our response times we need to investigate further.

This table takes up quite a bit of ink though. What if we want to get rid of the table, but maintain a feel for the distribution of the data?

The standard deviation measures the amount of variation from the average. A low standard deviation means that the data points are very close to the mean. A high standard deviation indicates that the data points are spread out over a large range of values.
It is calculated by taking the square root of the average of the squared differences of the values from their average value.
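Spelled out in code, that description could look like the sketch below. It divides by the number of elements (the population standard deviation), which matches the wording above; the sample values and the `standardDeviation` name are made up for illustration.

```fsharp
// Hypothetical sample, chosen so the numbers work out nicely.
let values = [ 2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0 ]

// Square root of the average of the squared differences from the average.
let standardDeviation (xs: float list) =
    let mean = List.average xs
    let variance = xs |> List.averageBy (fun x -> (x - mean) ** 2.0)
    sqrt variance

printfn "%.1f" (standardDeviation values) // prints 2.0
```

If the input is only a sample of a larger population, the usual correction is to divide by the number of elements minus one instead.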

The standard deviation is even more useful when you put the average at the centre of a graph, lay out the input elements according to their deviation from the average and see a bell curve drawn. When the data is approximately normally distributed like this, we can use the empirical 68–95–99.7 rule to get a feel for how the data is distributed.
In statistics, the so-called 68–95–99.7 rule is a shorthand used to remember the percentage of values that lie within a band around the mean in a normal distribution, extending one, two and three standard deviations to either side of the mean respectively; more accurately, 68.27%, 95.45% and 99.73% of the values lie within one, two and three standard deviations of the mean.

For our set of response times, with an average of 45.6 and a standard deviation of 20.8, this means that roughly 68.27% of the response times lie between 24.8 and 66.4, 95.45% lie between 4 and 87.2, and 99.73% lie between -16.8 and 108.
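The arithmetic behind those ranges can be sketched as follows. The average of 45.6 and standard deviation of 20.8 are worked back from the ranges quoted above (the midpoint and half-width of the first range); the `band` helper is a name made up for this example.

```fsharp
// Worked back from the ranges in the text, not computed from the raw data here.
let mean = 45.6
let standardDeviation = 20.8

// The band that extends k standard deviations to either side of the mean.
let band (k: float) =
    mean - k * standardDeviation, mean + k * standardDeviation

[ 1.0, 68.27; 2.0, 95.45; 3.0, 99.73 ]
|> List.iter (fun (k, percentage) ->
    let lower, upper = band k
    printfn "%.2f%% within %.1f and %.1f" percentage lower upper)
// 68.27% within 24.8 and 66.4
// 95.45% within 4.0 and 87.2
// 99.73% within -16.8 and 108.0
```

The percentages only hold to the extent that the response times actually follow a normal distribution.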

When we calculate the standard deviation, we can put one extra number next to the average and derive from just two numbers how the data is distributed.

In conclusion, the mean and the median hide outliers. Looking at the frequency distribution gives you a more complete picture. If we insist on having less data to look at, the standard deviation and the 68–95–99.7 rule can compress much of that picture into just two numbers, provided the data is roughly normally distributed.

Sunday, January 11, 2015

Consumed in 2014

Starting in 2014, I wanted to look more closely at everything I consume. So I started keeping a list of everything I read, watch and listen to.

I started off with a markdown file on GitHub that quickly evolved into a good excuse to dabble with an alternative stack. I ended up writing an event sourced Node.js application on top of Postgres, hosted on Heroku. I could write a post on that particular experience, but it's safe to say this blog post captures it better than I ever could.

In 2014, I listened to 10 audiobooks and 18 podcasts, read 20 books and watched 12 movies and 13 shows.

I just went over the list and cherry-picked the three items per category that stuck with me the most.

Books
  1. Addiction by Design: An incredibly complete overview of the gambling industry, with insights into the human psyche which apply far outside the domain of gambling. 
  2. 11/22/63: I try to read a fiction book for every non-fiction book I read. This was my first "real" King book, and I was hooked instantly. I devoured the 880-page book in less than two weeks, and was left with a big gaping hole in my heart - the journey was more than worth it. 
  3. SQL Performance Explained: The author succeeds in capturing the essentials in only 204 pages. This is one of those books you want to have going around at the office. 
Movies
  1. Be Here To Love Me: Not-your-average documentary about Townes Van Zandt; my musical and philosophical discovery of 2014. A troubled soul, struggling with the futility of existence. Simple poetic songs beautifully crafted, capturing in a few words what would take others books. While darkness inspired his best work, it was also the search for relief from that same darkness that led him further down the path of self-destructive behaviour. Painful to watch. Songs: Rake, Waiting Around to Die, Snake Mountain Blues, Dollar Bill Blues.
  2. 3:10 to Yuma: A true modern Western, capturing the American West beautifully, making me nostalgic.
  3. Dallas Buyers Club: Is there one role Matthew McConaughey can't hack?
Audio books
  1. On Writing - A Memoir of the Craft: "What follows is an attempt to put down, briefly and simply, how I came to the craft (of telling stories on paper), what I know about it now, and how it's done. It's about the day job; it's about the language." If you're not a King fan, you will be after finishing this one. 
  2. Getting to Yes: Separate people and issues, focus on interests, generate options and use objective criteria. But also, what to do when the other party is more powerful or when they use dirty tricks.
  3. Blink: Making decisions in the blink of an eye; why you should go with instinct and why you shouldn't.
Shows
  1. True Detective: "Life's barely long enough to get good at one thing. So be careful what you get good at." "If the only thing keeping a person decent is the expectation of divine reward then, brother, that person is a piece of shit." "People incapable of guilt usually do have a good time." Heavy stuff.
  2. Rick and Morty: "Listen, Morty, I hate to break it to you but what people call love is just a chemical reaction that compels animals to breed." "Nobody exists on purpose. Nobody belongs anywhere. Everybody's gonna die. Come watch TV?" "Snuffles was my slave name, you can call me Snowball because my fur is white and pretty." Must be the best animated series of the moment, hands down; looking forward to next season. The first season is available for free.
  3. The Walking Dead: Addictive soap opera that features zombies.

Papers

Mostly classics on Functional Programming.