Sunday, December 20, 2015

Visualizing event streams

In my recent talk on Evil by Design, I showed how I've been visualizing event streams as a means to get a better grip on how aggregates behave in production. The talk's scope kept me from showing the code that goes together with the examples shown. Consider this post as an addendum to that talk.

First off, we need a few types: a string that identifies a stream, an event containing a timestamp and a name, and a stream which is a composition of an identifier and a sequence of events. We also need a function that's able to read a stream based on its identifier.
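A minimal sketch of those types - the names are my own, not necessarily the ones used in the talk:

    open System

    type StreamId = string

    type Event = { Name : string; Timestamp : DateTime }

    type Stream = { Id : StreamId; Events : Event seq }

    // Reading a stream by its identifier; the implementation depends on your store.
    type ReadStream = StreamId -> Stream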

Once we've implemented that, we want to go ahead and visualize a single stream. Having some experience with Google Charts, I used the XPlot.GoogleCharts package.

I want to visualize my event stream as a timeline. For that, it only makes sense to use the Timeline graph. This means that I'll have to make sure I transform my data into a format the Timeline chart can work with, which is a sequence of tuples.

So we write a function which accepts a stream, and returns a sequence of tuples containing the stream identifier, the event name and the timestamp of the event.
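Building on the types above, a sketch of that function:

    // One tuple per event: (stream identifier, event name, timestamp).
    let toRows (stream : Stream) =
        stream.Events
        |> Seq.map (fun e -> stream.Id, e.Name, e.Timestamp)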

With just a few lines of code, we can already compose our way to a timeline.
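Something along these lines. A timeline row needs a start and an end, so the timestamp is simply reused for both; the assumption that Chart.Timeline accepts these four-element tuples is mine, so check the overloads in your XPlot.GoogleCharts version.

    open XPlot.GoogleCharts

    let timelineOf (readStream : ReadStream) (id : StreamId) =
        readStream id
        |> toRows
        |> Seq.map (fun (streamId, name, timestamp) -> streamId, name, timestamp, timestamp)
        |> Chart.Timeline
        |> Chart.Show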




The result tells a small story: a withdrawal to a casino was requested at 9:44PM, approved at 12:15PM the next day, and eventually completed 7 hours later.

From an operational perspective, this visualization can be used as visual assistance for your support team when users have a question or a complaint. From a more technical perspective, it can be used to get a feel of the domain language and business processes without having to look at code or tests. I could even see this being used in the front-end, where you enable users to monitor a process; think package tracking, document verification and so on.

Once you start exploring aggregates, you will notice that some aggregates look healthier than others: lean and short-lived. Other aggregates are fat and long-lived, which can introduce a set of problems:

  • rebuilding state from a large event stream might kill performance
  • there's often more contention on larger aggregates making optimistic (or pessimistic) concurrency very annoying



Spotting one of these instances is an invitation to review your model, to revise true invariants and to break things apart.

We've now looked at an aggregate's event stream in isolation, but often something happening in one place leads to a reaction somewhere else. A simple example: when a new user registers, a promotion is awarded. We can visualize this by rendering multiple streams on one timeline.

Technically, we need to transform a sequence of streams into a single sequence of tuples which we can feed to the chart. It's as simple as mapping each stream and then flattening the result into a single sequence.
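With the toRows function from before, a sketch:

    // Map each stream to its rows and flatten everything into a single sequence.
    let toCombinedRows (streams : Stream seq) =
        streams |> Seq.collect toRows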

This one extra step makes the result even more useful.



There's more potential though; consider showing the payload when hovering over an event, adding commands in the mix, zooming out, zooming in, filtering...

If this is something you could see being useful to you or your organization, let me know! Maybe I can port some bits and polish the concept in the open.

Wednesday, November 18, 2015

Slides from my talk "Evil by Design" at Build Stuff

Third time attending Build Stuff, first time doing a talk. I'm happy that it's out of the way and can now just enjoy the conference, but I'm even more excited that it was well-received! The talk should have been recorded, but you can already find the abstract and slides below.
Last year I ventured into the domain of (online) gambling. Given that the industry has been around since forever, I expected most problems to be of the technical kind. As it turned out, the struggle with technology was only part of a bigger problem; to move forward we needed to fully grasp the industry and its consumers. 
Events started out as a way to dismantle a legacy system, but quickly proved to be an effective tool to gain a deeper understanding of our domain. Visualising event streams, we discovered patterns that helped us identify what drives different types of users. 
Having a better understanding of what customers are looking for, we dove into existing literature to learn which techniques and models casinos use to cater for each type of user. We learned how to program chance while staying true to the Random Number God. Even when variance is brutal, casinos have enough data and tools to steer clear from the pain barrier. 
All of this entails interesting problems and software, but isn't my code damaging society? Or is gambling just another human trait?

Monday, November 16, 2015

Defining big wins

Casinos invest a lot of energy selling the dream. One way to do this is by showing off people winning big in your casino. Everyone has seen those corny pictures of people holding human-sized cheques, right? It's a solid tactic, since empirical evidence shows that after a store has sold a large-prize winning lottery ticket, ticket sales increase by 12 to 38% over the following weeks.

If we look at slot machine play, what exactly defines a big win? The first stab we took at this was quite sloppy. We took an arbitrary number and said wins bigger than 500 euro are impressive. This was quick and easy to implement, but when we observed the results we noticed that for players playing at high stakes a win of 500 euro really isn't that impressive, and the exceptional high roller would often dominate the results.

What defines a big win is not the amount, but how many times the win multiplies your stake. Betting 1 euro to win 200 euro sounds like quite the return, right? Coming to this conclusion, we had to define a multiplier threshold that indicates a big win.

Having each win correlate to a bet, we could project the multipliers, and look at the distribution.

In this example I'm using MATLAB, but we could do the same using Excel or code.

So first we load the multipliers data set.

Then we look at its histogram, visualizing how the multipliers are distributed.




Here we notice that there is a skewness towards large values; a few points are much larger than the bulk of data. Logarithmic scales can help us here.


This shows us a pretty fitting bell curve, meaning the multipliers are somewhat log-normally distributed. We could now use the log standard deviation to pick the outliers.

But we can also tabulate the data set and hand pick the cut-off of normal wins.

We could now write a rule in our projection of big wins which states that a log(multiplier) larger than 3 is considered to be a big win.
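Back in code, a minimal F# sketch of that rule - assuming the natural logarithm, which is what MATLAB's log gives you:

    // A win is considered big when log(multiplier) exceeds the hand-picked threshold of 3.
    let isBigWin (stake : float) (win : float) =
        log (win / stake) > 3.0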

MATLAB, Excel and the like are great domain-specific tools for data exploration which can help you gain a better feel and understanding.

Sunday, October 18, 2015

Bulk SQL projections with F# and type providers

Early Summer, I had to set up an integration with an external partner. They required us to provide them daily with a relational dataset stored in SQL Server. Most, if not all, of the data was temporal and append-only by nature; think logins, financial transactions...

Since the data required largely lived in an eventstore on our end, I needed fast bulk projections. Having experimented with a few approaches, I eventually settled on projections in F# taking advantage of type providers.

Let's say we have an event for when users watched a video and one for when users shared a video.
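Sketched as a discriminated union; the field names are assumptions on my part:

    open System

    type Event =
        | WatchedVideo of userId : string * videoId : string * watchedOn : DateTime
        | SharedVideo of userId : string * videoId : string * sharedOn : DateTime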

We want to take streams from our eventstore and project them to a specific state; a stream goes in and state comes out.

Then we want to take that state, and store it in our SQL Server database.

Some infrastructure that reads a specific stream, runs the projection, stores the state and checkpoints the projection, could look like this.
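A rough sketch of that piece of infrastructure - the function names and signatures are assumptions, not the original code:

    open System.Data.SqlClient

    // Read the stream, run the projection, store the state and checkpoint the
    // projection - the last two inside a single transaction.
    let project (readStream : string -> int -> Event list)
                (runProjection : Event list -> 'state)
                (storeState : SqlTransaction -> 'state -> unit)
                (storeCheckpoint : SqlTransaction -> string -> int -> unit)
                (connectionString : string)
                (streamName : string)
                (checkpoint : int) =
        let events = readStream streamName checkpoint
        let state = runProjection events
        use connection = new SqlConnection(connectionString)
        connection.Open()
        use transaction = connection.BeginTransaction()
        storeState transaction state
        storeCheckpoint transaction streamName (checkpoint + List.length events)
        transaction.Commit()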

To avoid data corruption, storing the state and writing the checkpoint happens in the same transaction.

With this piece of infrastructure in place, we are close to implementing an example. But before we do that, we first need to install the FSharp.Data.SqlClient package. Using this package, we can use the SqlProgrammabilityProvider type provider to provide us with types for each table in our destination database. In the snippet below, I'll create a typed dataset for the WatchedVideos table and add a row.
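A sketch of that snippet; the connection string, table and column names are stand-ins for whatever your destination database looks like:

    open System
    open FSharp.Data

    [<Literal>]
    let ConnectionString =
        @"Data Source=.;Initial Catalog=Destination;Integrated Security=True"

    // One typed DataTable per table in the destination database.
    type Destination = SqlProgrammabilityProvider<ConnectionString>

    let watchedVideos = new Destination.dbo.Tables.WatchedVideos()
    // AddRow is generated from the table's columns.
    watchedVideos.AddRow(UserId = "user-1", VideoId = "video-1", WatchedOn = DateTime.UtcNow)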

I haven't defined this type, nor was it generated by me. The SqlProgrammabilityProvider type provider gives you these for free, based on the metadata it can extract from the destination database. This also means that when you change your table without changing your code, the compiler will have no mercy and immediately point out where you broke your code. In this use case, where you would rather rebuild your data than migrate it, the feedback loop of changing your database model becomes so short that it allows you to break stuff with confidence. The only caveat is that the compiler must always be able to access that specific database, or compilation fails. In practice, this means you need to ship your source with a build script that sets up your database locally before you do any work.

Going from a stream to a dataset is quite declarative and straightforward with the help of pattern matching.
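A sketch of such a projection, reusing the Event union and the typed table from above:

    let projectWatchedVideos (events : Event list) =
        let table = new Destination.dbo.Tables.WatchedVideos()
        events
        |> List.iter (function
            | WatchedVideo (userId, videoId, watchedOn) ->
                table.AddRow(UserId = userId, VideoId = videoId, WatchedOn = watchedOn)
            | SharedVideo _ -> ())
        table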

Storing the result in an efficient fashion is also simple, since the dataset directly exposes a BulkCopy method.

When we put this all together, we end up with this composition.

Executing this program, we can see the data was persisted as expected.


In the real world, you also want to take care of batching and logging, but that isn't too hard to implement.

Having this approach in production for some time now, I'm still quite happy with how it turned out. The implementation is fast, and the code is compact and easy to maintain.

Friday, September 11, 2015

Aspect ratio calculation

Earlier today I was writing a migration script in F# where I had to calculate the aspect ratio based on the given screen dimensions. This is one of those problems I don't even bother breaking my head over; I head straight to Stack Overflow to find an accepted answer I can just copy-paste. Since I didn't find an F# snippet I could use, I ported some JavaScript, and embedded the result below for future snippet hunters.

The aspectRatio function does two things: 1. Recursively find the greatest common divisor between the width and height 2. Divide the width and height by the greatest common divisor
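A sketch of what that ported snippet could look like:

    let aspectRatio (width : int) (height : int) =
        // 1. Recursively find the greatest common divisor of width and height.
        let rec gcd a b = if b = 0 then a else gcd b (a % b)
        let divisor = gcd width height
        // 2. Divide both dimensions by the greatest common divisor.
        width / divisor, height / divisor

    // aspectRatio 1920 1080 returns (16, 9)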

Monday, June 22, 2015

Basic casino math

In a previous series of posts, I went over the models used by casinos to spin a wheel (spinning, manipulating the odds, clustering and near misses). I did not yet expand on the basic mathematical models that ensure a casino makes money.

Let's pretend we are spinning the wheel again. The wheel has 5 pockets, and just one of those is the winning one. Given we will be using an unmodified wheel, you win 1 out of 5 spins. Each bet costs you 1 euro. Looking at the true odds (1/5), the casino should pay out 4 euro for you to break even.

Respecting the true odds would not make the casino any money, they pay out less to ensure that the house has an edge on you. So instead of paying out 4 euro, it will be a tad less.

The house edge can be cast into a fairly simple formula.
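One way to write it down - assuming the casino in this example pays out 3 euro instead of the break-even 4:

    // House edge = (true odds payout - actual payout) * probability of winning.
    let houseEdge trueOddsPayout actualPayout probabilityOfWinning =
        (trueOddsPayout - actualPayout) * probabilityOfWinning

    let edge = houseEdge 4.0 3.0 (1.0 / 5.0) // 0.2, or 20%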

In this example, the house edge is a whopping 20%, meaning statistically 20% of each bet will go to the casino. So the higher the house edge, the better?

Not really; if players constantly go through their bankroll in a matter of minutes, it's not very likely they will keep returning to your casino. The inverse of the house edge, and maybe an even more important number, is the payout percentage. When the house edge is 20%, the player's payout percentage will be 80%. For each bet you make, you will statistically see a return of 80%. As a player, to get maximum value for money - to play as long as possible - you should aim to play in a casino that has the highest payout percentages.

Often misunderstood is that this does not mean you will get to keep 80% of your bankroll by the end of the night. The payout percentage relates to a single bet. The casino's hold, or money eventually left on the table, is several times the house edge, since players tend to circulate through the same money more than once. So the longer you play, the more the house edge will nibble at your bankroll.

Knowing the house edge, it's pretty simple for a casino to predict a customer's worth; multiply the house edge, the average stake and the number of games per hour.

Given we spin the wheel 60 times an hour for a stake of 1 euro, we will make the casino 12 euro an hour on average. The higher this number, the bigger your potential, the harder a casino will try to make you a regular.
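In code, with the numbers from this example:

    // Customer worth per hour = house edge * average stake * games per hour.
    let worthPerHour houseEdge averageStake gamesPerHour =
        houseEdge * averageStake * gamesPerHour

    let hourlyWorth = worthPerHour 0.2 1.0 60.0 // 12 euro an hour on average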

Understanding how casinos make a living, it's safe to say casinos aren't the place to play for money, but to play with money.

Sunday, May 24, 2015

Consumed: Queries and projections (F#)

This is the third post in my series on porting a node.js application to an F# application.

So far, I've looked at parsing command line arguments, handling commands and storing events. Today, I want to project those events into something useful that can be formatted and printed to the console.

In the original application, I only had a single query. The result of this query lists all items consumed, grouped by category and sorted chronologically.

Handling the query is done in a similar fashion to handling commands. The handle function matches each query and has a dependency on the event store.

Where C# requires a bit of plumbing to get declarative projections going, F#'s pattern matching and set of built-in functions give you this for free.

We can fold over the event stream, starting with an empty list, to append each item that was consumed, excluding the ones that were removed later. Those projected items can then be grouped by category, to be mapped into a category type that contains a sorted list of items.
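A sketch of what that could look like - the event and record shapes are assumptions rather than the original code:

    open System

    type Event =
        | ConsumedItem of category : string * name : string * timestamp : DateTime
        | RemovedConsumedItem of category : string * name : string

    type Category = { Name : string; Items : (string * DateTime) list }

    let categories (events : Event seq) =
        events
        |> Seq.fold (fun items event ->
            match event with
            | ConsumedItem (category, name, timestamp) -> (category, name, timestamp) :: items
            | RemovedConsumedItem (category, name) ->
                items |> List.filter (fun (c, n, _) -> c <> category || n <> name)) []
        |> Seq.groupBy (fun (category, _, _) -> category)
        |> Seq.map (fun (category, items) ->
            { Name = category
              Items =
                items
                |> Seq.map (fun (_, name, timestamp) -> name, timestamp)
                |> Seq.sortBy snd
                |> List.ofSeq })
        |> List.ofSeq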

The result can be printed to the console using a more imperative style.

And that's it, we've come full circle. We can now consume items, remove items and query for a list of consumed items.

Compared to the node.js implementation, the F# version required substantially less code (two to three times less). More importantly, although I wrote tests for both, I felt way more confident completing the F# version. A strong type system, discriminated unions, pattern matching, purity, composability and a smart compiler makes way for sensible and predictable code.

Source code is up on Github.

Sunday, May 17, 2015

Consumed: Handling commands (F#)

As I wrote earlier, I've been working on porting a node.js web application to an F# console application. It's an application I wrote to learn node.js but still use today to keep track of all the things I consume.

The application is able to consume an item, to remove a consumed item and to query all consumed items.

In the previous post, I parsed command line arguments into typed commands and queries. Today, I'll look at handling the two commands.

I've refactored the command discriminated union to contain records with data that go along with the command - I found that this makes for more discoverable and refactor-friendly deconstruction later on.
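Something along these lines; the record fields are assumptions:

    type ConsumeItem = { Category : string; Name : string }
    type RemoveItem = { Category : string; Name : string }

    type Command =
        | ConsumeItem of ConsumeItem
        | RemoveItem of RemoveItem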

Validation

Before we do anything with the command, we need to make sure it passes basic validation. The validate function takes a command, and returns a success or failure result. Validation can fail because an argument is empty, its structure is invalid or it's out of range. Inside the function we match the command discriminated union with each case, validate the data and return a result.
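A sketch of that function, using a hand-rolled result type; the exact validations are assumptions:

    type Result<'TSuccess, 'TFailure> =
        | Success of 'TSuccess
        | Failure of 'TFailure

    let validate command =
        match command with
        | ConsumeItem c when c.Category = "" -> Failure "Category cannot be empty"
        | ConsumeItem c when c.Name = "" -> Failure "Name cannot be empty"
        | RemoveItem r when r.Category = "" || r.Name = "" -> Failure "Category and name cannot be empty"
        | _ -> Success command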

Producing events

Having validated the command, we can start thinking about doing something useful. I want the command handlers to be pure, to be able to focus on computation, without having to worry about side effects.

Since the node.js web application stores its data in the form of events, this one will too. I can now migrate the existing event store to a simple text file living in my Dropbox, and then drop the existing Postgres database.

This means that command handlers will need to produce events.

Dependencies

Looking at the tests the command handlers need to satisfy, we know that a command handler depends on the system time and the eventstore.

The dependency on time is just a function that takes no arguments and returns a datetime.

An implementation could look like this.
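For example, a sketch:

    open System

    // unit -> DateTime
    type Time = unit -> DateTime

    let time : Time = fun () -> DateTime.Now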

Reading an event stream is a function that takes a stream name and returns an event stream.

Implementing a naïve event store takes more than one line of code.

An eventstore

This implementation stores events in a text file. When an event is stored, it gets serialized to JSON and appended to a text file. When reading a stream, it reads all events from disk, deserializes them and filters them by stream name before returning them as an event stream - it's not exactly web scale.

The signature for reading a stream doesn't satisfy the signature we defined earlier though. We can satisfy it by creating a partially applied function.
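A sketch - the EventStore module and the file path are assumptions:

    // EventStore.readStream takes the path to the text file and a stream name;
    // baking in the path gives us back a function with the
    // streamName -> EventStream signature defined earlier.
    let readStream = EventStore.readStream "consumed.txt"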

Handlers

Handlers focus on pure computation, they just need to return an event or a failure.

We can only consume an item once, and we can only remove items that exist. It shouldn't be possible to consume items that have been removed. There isn't much needed on the inside to cover these use cases.

We inject the event store and time dependencies by passing in the relevant functions - since I'm already using this function further on in program.fs, the compiler can infer the signatures, no need to explicitly state the signatures I defined earlier.

Side effects

So far we have been able to avoid intentional side effects - we did introduce functions that might have accidental side effects (reading from disk and reading the system time). It would be nice to be able to restart the application without losing all state, so we need to take the result the command produced and persist it. A small function takes care of matching each result to invoke the relevant side effect. So far, we only want to store events. With this, we have successfully isolated side effects to one small function.

Putting it all together

By now, we can validate commands, handle them and take care of their side effects. We can now compose those pieces together using Railway Oriented Programming and invoke the pipeline. The output gets matched, so we can print something relevant for the user to see.
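A sketch of what that composition could look like - validate, handle and sideEffects are the functions discussed above, and bind is the glue that only calls the next step on success:

    // Only call the next function when the previous one returned Success.
    let bind f result =
        match result with
        | Success value -> f value
        | Failure failure -> Failure failure

    // Validate, handle, perform the side effects and report back to the user.
    let execute command =
        let result =
            command
            |> validate
            |> bind (handle readStream time)
        sideEffects result
        match result with
        | Success _ -> printfn "Done."
        | Failure failure -> printfn "Something went wrong: %A" failure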

Next time, we'll look at implementing queries.

Sunday, April 26, 2015

Finding unused code (F#)

Coming from C#, I'm used to the compiler warning me about unused variables. Relying on the compiler to help me with checked exceptions in F#, I noticed that unused values (and functions) would go unnoticed. Having accidentally read earlier that Haskell has a compiler flag to check for unused bindings, I looked for the F# equivalent but failed to find it, until Scott Wlaschin pointed me in the right direction.

By using the --warnon:1182 flag, the compiler will warn you about unused bindings.


For example, compiling Paket.Core with this flag enabled, outputs the following warnings.

Looking into these warnings revealed values and functions that can be deleted, but no apparent bugs. There are also cases where unused bindings make sense, for example when you pass in a function that does not use all of its arguments or when pattern matching. In these cases you can suppress the warning by prefixing the bindings with an underscore.
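A small example:

    // With --warnon:1182 enabled, the compiler warns that "unused" is never used.
    let calculate x =
        let unused = x * 2
        x + 1

    // Prefixing the binding with an underscore suppresses the warning.
    let calculate' x =
        let _unused = x * 2
        x + 1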

A useful compiler feature which strangely enough is opt-in. I plan on using it from now on.

Sunday, April 19, 2015

Consumed: Parsing command line arguments (F#)

Last year, I set out to write my first node.js application; a small web application for keeping lists of everything I consume. I had something working pretty quickly, deployed it to Heroku and still find myself using it today. Since there's very little use for having it running on a server, and because I wanted something to toy with getting better at F#, I decided to port it to an F# console application.

With the UI gone, I need to resort to passing in arguments from the command line to have my program transform those into valid commands and queries that can be executed.

The set of commands and queries is limited; consume an item, remove an item and query a list of everything consumed.

Ideally I go from a sequence of strings to a typed command or query. However, when the list of arguments can't be parsed, I expect a result telling me what failed just the same.

Since we need the name to identify the command or query, I expect the input to have at least two arguments.

Arguments come in pairs; a key and a value. My first thought was to build a map here, but that made key validation, key transformations and pattern matching harder.  I can actually get away with transforming the input to a sequence of tuples.
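A sketch of that transformation:

    // [ "--command"; "consume"; "--category"; "book" ] becomes
    // [ ("--command", "consume"); ("--category", "book") ]
    let rec toPairs arguments =
        match arguments with
        | key :: value :: rest -> (key, value) :: toPairs rest
        | _ -> []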

Hoping to avoid some mistakes in the input, basic validation makes sure the keys actually look like keys, instead of a value. Keys start with a single or double dash.

Once that validation is out of the way, I strip away those dashes. That should make the last two steps easier.

The name is required, so I wrote a small function that makes sure a specific key exists.

Now that I have a list of arguments,  I can map them into a typed command or query using pattern matching.

Having written all these small functions, I can simply compose them using Scott Wlaschin's Railway Oriented Programming.

This is far from a generic command line parser, but it's simple and covers my needs.

Next up, executing those commands and queries, and printing feedback.

Sunday, March 29, 2015

Checked errors in F#

In the land of C#, exceptions are king. By definition exceptions help us deal with "unexpected or exceptional situations that arise while a program is running". In that regard, we're often optimistic, overoptimistic. Most code bases treat errors as exceptional while they're often commonplace. We are so confident about the likelihood of things going wrong, we don't even feel the need to communicate to consumers what might go wrong. If a consumer of a method wants to know what exceptions might be thrown, he needs to resort to reading the documentation (or source) and hope it's up-to-date.

Java on the other hand has a concept of unchecked and checked exceptions. Unchecked exceptions are exceptions that are caused by a programming mistake and should be left unhandled (null reference, division by zero, argument out of range etc); while checked exceptions are exceptions that your program might be able to recover from. They become part of the method signature and the Java compiler forces consumers to handle them explicitly.

While checked exceptions might bloat the method's contract and enlarge the API surface area, they might have every right to. Dealing with errors is an important part of programming. Having discoverable errors which require thoughtful care, should improve overall quality. Having said that, it also requires careful consideration from the designer to decide what's truly exceptional.

Coming up with something that can compete with the mechanics of checked exceptions in C# seems to be impossible. We could return a result with an error from a method, but the compiler doesn't force you to do anything with that result.

F# on the other hand doesn't allow for the result of an expression to be thrown away. That is, unless you explicitly ignore it, or bind it and leave it unused.

Let's look at an example. We start by defining two discriminated unions. The first type defines a generic result; it can either be success or failure. The second type defines all the errors that can be returned after deleting a file.

Then we write a function that deletes a file, but instead of throwing exceptions when an error occurs, it returns a specific error. When no errors occur, success is returned.
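A sketch of both types and the function; the exact error cases are assumptions:

    open System
    open System.IO

    type Result<'TSuccess, 'TFailure> =
        | Success of 'TSuccess
        | Failure of 'TFailure

    type DeleteFileError =
        | FileNotFound
        | AccessDenied

    let deleteFile path =
        if not (File.Exists path) then Failure FileNotFound
        else
            try
                File.Delete path
                Success ()
            with
            | :? UnauthorizedAccessException -> Failure AccessDenied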

When I now use this function, the compiler will tell me that it has a return value which needs to be ignored or bound.

While ignoring a result stands out, an unused binding is easier to go unnoticed. I wish the F# compiler had a flag to detect unused bindings.

Assuming I don't ignore the result, I can use pattern matching to address each error specifically.

By not including a wildcard pattern, extending the contract by adding errors will introduce a breaking change. We'll have to consider what to do with newly added errors.

For example, if I add the error PathTooLong, the compiler shows me this warning.

In summary, it might be safer to be a bit less optimistic when it comes to errors. Instead of throwing exceptions, making errors part of the public interface, communicating errors explicitly, and handing responsibility for what to do with the error to the caller might lead to more robust systems. While this can be achieved with C#, the mechanics are error-prone. Expressions and pattern matching mean that F# allows for stronger, yet still not ideal, mechanics.

Sunday, March 15, 2015

Scaling promotion codes

In our system a backoffice user can issue a promotion code for users to redeem. Redeeming a promotion code, a user receives a discount on his next purchase or a free gift. A promotion code is only active for a limited amount of time, can only be redeemed a limited amount of times and can only be redeemed once per user.

In code these requirements translated into a promotion code aggregate which would guard three invariants.

The command handler looked something like this.
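Roughly along these lines - this F# sketch is a reconstruction of the invariants described above, not the original code:

    open System

    type RedeemPromotionCode =
        { PromotionCodeId : string; UserId : string; Timestamp : DateTime }

    type PromotionCode =
        { Id : string
          ActiveFrom : DateTime
          ActiveUntil : DateTime
          MaximumRedemptions : int
          RedeemedBy : Set<string> }

    type Event = PromotionCodeRedeemed of promotionCodeId : string * userId : string

    type Error =
        | PromotionCodeNotActive
        | PromotionCodeDepleted
        | PromotionCodeAlreadyRedeemed

    type Result<'TSuccess, 'TFailure> =
        | Success of 'TSuccess
        | Failure of 'TFailure

    // Guard the three invariants before producing an event.
    let handle (promotionCode : PromotionCode) (command : RedeemPromotionCode) =
        let active =
            command.Timestamp >= promotionCode.ActiveFrom
            && command.Timestamp <= promotionCode.ActiveUntil
        if not active then
            Failure PromotionCodeNotActive
        elif Set.count promotionCode.RedeemedBy >= promotionCode.MaximumRedemptions then
            Failure PromotionCodeDepleted
        elif Set.contains command.UserId promotionCode.RedeemedBy then
            Failure PromotionCodeAlreadyRedeemed
        else
            Success (PromotionCodeRedeemed (promotionCode.Id, command.UserId))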

Depending on the promotion code, we would often have a bunch of users doing this simultaneously, leading to a hot aggregate, leading to concurrency exceptions.

Studying the system, we discovered that the limit on the amount of times a promotion code could be redeemed was not being used in practice. Issued promotion codes all had the limit set to 999999. Just by looking at production usage, we were able to remove an invariant, saving us some trouble.

The next invariant we looked at is the one that avoids users redeeming a promotion code multiple times. Instead of this being part of the big promotion code aggregate, a promotion code redemption is now a separate aggregate. The promotion code aggregate picks up a new role, the role of a factory: it decides on the creation of new life.

The promotion code redemption's identifier is a composition of the promotion code identifier and the user identifier. Thus even when the aggregate is stored as a stream, we can check in a consistent fashion whether the aggregate (or stream) already exists, avoiding users redeeming a promotion code multiple times. On creation of the stream, the repository can pass to the event store that it expects no stream to be there yet, making absolutely sure we don't redeem twice. The event store would throw an exception when it would find a stream to already exist (think unique key constraint).

In this example, we were able to remove an annoying and expensive invariant by looking at the data. Even if we had to keep supporting promotion code depletion, we might have removed this invariant and replaced it with data fed into the aggregate/factory from the read model. Ask yourself: how big is the cost of having a few more people redeem a promotion code?

Teasing apart the aggregate even further, we discovered that the promotion code had a second role; a creational role. It now helps us spawn promotion code redemptions while still making sure this only happens when the promotion code is active. Each promotion code redemption is now a new short-lived aggregate, while the promotion code itself stays untouched. By checking the existence of the aggregate up front and by using the stream name to enforce uniqueness, we avoid users redeeming a promotion code more than once. This has allowed us to completely avoid contention on the promotion code, making it perform without hiccups.

Monday, February 23, 2015

Domain Language: The Playthrough Bonus

Since online gambling has been regulated in Belgium, basically each eligible license holder has complemented their land-based operations with an online counterpart. Being such a small country, everyone wants to secure their market share as soon as possible. The big players have been pouring tons of money into marketing and advertising; it's everywhere: radio, television, (online) newspapers, bus stops, billboards, sport events, airplane vouchers - you name it. While regulations for land-based casinos are very strict and almost overprotective, regulations for online play are much more permissive. This means that online casinos can be rather aggressive in acquiring new customers.

You will often see online casinos hand out free registration bonuses: "You get 10 euro for free when you sign up, no strings attached!". This makes it look like casinos are just handing out free cash right? We should all know better than that though.

There are always conditions to cash out a bonus. Bonuses come in different forms and flavors and preconditions to clear them vary wildly. The Playthrough Bonus is the favorite among players by far; it's straightforward and requires zero investment.

When you receive a Playthrough Bonus, you receive an amount of bonus money, which can be converted to cash by wagering it a specific amount of times. For example: you receive a Playthrough Bonus of 10 euro, which needs to be wagered 30 times before the bonus is cleared. This means that you need to bet 300 euro (10 euro multiplied by 30) in total to clear the bonus and to receive what's left of your balance.

So is betting the bonus amount 30 times realistic, or will you always close your browser empty-handed with nothing left to show on your balance? The answer depends heavily on the payout rate. This percentage represents how much of the money that goes into the casino is returned to players. In a formula, this is the wins divided by the bets. This percentage is generally a lot higher than most people expect. Casinos aim for a payout rate between 95 and 99 percent. They want to cultivate a long-term relationship with happy and social customers, not clear your bank account as soon as you open the door. Note that the payout percentage is an average; not all games are a smooth ride. Some players like big wins, and big losses, while others feel more comfortable losing small, but don't mind winning small either. Casinos also prefer a smooth ride, especially when it comes to bonuses. They might even tweak games to have less aggressive, more equally distributed wins when bonus money is in play.

Now let's look at how much money would be left on our balance when we try to clear a Playthrough Bonus of 10 euro with a playthrough of 30 and a payout percentage of 98.

I defined two records (excuse my primitive obsession). First a Bonus record that contains an amount, a balance, the total amount of bets and a playthrough. A few functions are associated with the Bonus record; they allow the bonus to be created, to bet, to win, to check if it still accepts bets and to check if the bonus has been cleared. The second record, the GameSettings, define the payout percentage and the stake of a bet.
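A sketch of those records and their members - the member names are assumptions:

    type GameSettings = { Payout : decimal; Stake : decimal }

    type Bonus =
        { Amount : decimal; Balance : decimal; TotalBets : decimal; Playthrough : int }

        static member Create (amount, playthrough) =
            { Amount = amount; Balance = amount; TotalBets = 0M; Playthrough = playthrough }

        member bonus.Bet stake =
            { bonus with Balance = bonus.Balance - stake; TotalBets = bonus.TotalBets + stake }

        member bonus.Win amount =
            { bonus with Balance = bonus.Balance + amount }

        member bonus.AcceptsBets stake = bonus.Balance >= stake

        member bonus.IsCleared =
            bonus.TotalBets >= bonus.Amount * decimal bonus.Playthrough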

After defining these structures, I defined a function that recursively keeps playing (bet and win) until either the bonus is cleared, or the bonus no longer accepts bets (out of money).
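Building on the records above, a sketch where every bet simply wins back the stake multiplied by the payout percentage, keeping the run perfectly average:

    let rec play (settings : GameSettings) (bonus : Bonus) =
        if bonus.IsCleared || not (bonus.AcceptsBets settings.Stake) then bonus
        else
            let afterBet = bonus.Bet settings.Stake
            // Every bet wins back the stake multiplied by the payout percentage.
            let afterWin = afterBet.Win (settings.Stake * settings.Payout)
            play settings afterWin

    // What's left on the balance once the bonus is cleared, or the money runs out.
    let result = play { Payout = 0.98M; Stake = 1M } (Bonus.Create (10M, 30))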

When we run this function we know the answer to our question. On average, we will clear the bonus with four euro left on our balance.

When we turn down the payout to be one percent lower, we only have 1 euro left on our balance. When we turn it down even more, there won't be anything left.

Given a few parameters which should be available to you (bonus amount, playthrough and even payout percentage), you can calculate how feasible it is to clear a Playthrough Bonus. Unless variance is on your side, I guess it will rarely turn out to be a lucrative grind.

Sunday, February 15, 2015

Side by side

This week marked my first year at my current employer. While that event went by rather silently, one year in, a few of my observations are finally shaping up to be cast into writing.

Where I used to work in the typical battery cage, I'm now part of a team of just four people, having the luxury of a big dedicated room to ourselves - a whole floor actually. The room is set up almost symmetrically; two desks on one side of the room and two more on the other side, with quite some space in between. Having only four people in the room makes it easy to casually throw something at the group - be it a question, a critique or a random idea.

I made good use of this perk early on, but noticed that I would too often find myself amid a Mexican Standoff. We would often get ourselves into discussions that quickly turned into a my-opinion-versus-your-opinion and would lead nowhere.

It didn't make sense how I got myself into this situation time after time, until I read somewhere how to approach petting an unfamiliar dog.
Our species have more in common than you would think. Our shared history of pack hunting has made both our species highly social and interdependent.
For example, when you approach an unfamiliar dog, you shouldn't pet him on the head since this can be very threatening. It's better to approach him from the side to rub his ears, neck or back. This behaviour is an evolutionary remnant of pack hunting; members of the pack would rub each other's shoulder constantly chasing their next meal.

It occurred to me that the cause of our unproductive discussions might be as simple as our desks being in an aggressive position, desk-to-desk or face-to-face.
Looking back at my previous jobs, I found no precedents of having discussions in this position. The horrors of the open plan had always forced me to either walk over or to find a meeting room.

But when I look at my personal relationships and discussions, I find more situations that confirm this theory. When going for drinks, most of my friends prefer a noisy and crowded bar, which force you side-by-side just to make yourself understood. Even when communicating with my girlfriend, it's not the cliché tete-a-tete dinner dates that yield the best conversations, it's taking a walk, long road trips or even cooking together.

When it came to avoiding these unproductive situations at work, I now carefully consider which things I can just throw out there, or which things require a side-by-side approach. Even when you got yourself in trouble face-to-face, you can still guide things into a more constructive direction by changing the situation.

Getting side-by-side shouldn't be too hard in a professional environment. Both pair programming and whiteboard sessions (or one of its more exotic evolutions - model storming, mob programming etc) are ingrained in most places now.

Evolution has wired our brain in a way that makes being side-by-side, preferably chasing the same goal, extremely amicable. Although we're no longer hunting for food, we still find ourselves chasing different means of prosperity and success. Just like before, it's still the case that we're at our best as a team, side-by-side, in pack.

Sunday, January 18, 2015

Averages are not good enough (F#)

Let's (no pun intended) look at a set of response times of a web service.

People like having a single number to summarize a piece of data. The average is the most popular candidate. The average is calculated by dividing the sum of the input elements by the number of input elements.

The average is a measure of centre which is fragile to outliers; one or two odd irregular values might skew the outcome. The median on the other hand is always representative of the centre, not just when the data distribution is symmetric. The median is determined by sorting the input elements and picking the one in the middle.
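A sketch - the numbers below are placeholders, not the original response times:

    // Placeholder response times in milliseconds.
    let responseTimes = [ 21.0; 34.0; 42.0; 45.0; 48.0; 51.0; 99.0 ]

    let average = responseTimes |> List.average

    // Sort and pick the middle element (for an even count you would average the two middle ones).
    let median =
        let sorted = List.sort responseTimes
        sorted.[List.length sorted / 2]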

Both measures are terrible at showing how the data is distributed though. The average and median response time might look fair, but maybe there are a few outliers which are giving a few customers a bad time.

Instead of reducing our input down to a single number, it might be better to extract a table that displays the frequency of various outcomes in the input.

Now this is more useful; we can clearly see that the data is not distributed equally and there are a few outliers in our response times we need to investigate further.

This table takes up quite a bit of ink though. What if we want to get rid of the table, but maintain a feel for the distribution of the data?

The standard deviation measures the amount of variation from the average. A low standard deviation means that the data points are very close to the mean. A high standard deviation indicates that the data points are spread out over a large range of values.
It is calculated by taking the square root of the average of the squared differences of the values from their average value.
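Following that definition, a sketch:

    // Square root of the average of the squared differences from the average.
    let standardDeviation values =
        let average = List.average values
        values
        |> List.map (fun value -> (value - average) ** 2.0)
        |> List.average
        |> sqrt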

The standard deviation is even more useful when you put the average at the centre of a graph, lay out the input elements according to their deviation from the average, and see a bell curve drawn. This means that we can use the empirical 68-95-99.7 rule to get a feel of how the data is distributed.
In statistics, the so-called 68–95–99.7 rule is a shorthand used to remember the percentage of values that lie within a band around the mean in a normal distribution with a width of one, two and three standard deviations, respectively; more accurately, 68.27%, 95.45% and 99.73% of the values lie within one, two and three standard deviations of the mean, respectively.

For our set of response times, this means that 68.27% of the response times lies within the 24.8 and 66.4 range, 95.45% lies within the 4 and 87.2 range, while 99.73% lies within the -16.8 and 108 range.

When we calculate the standard deviation, we can put one extra number next to the average and derive from just two numbers how the data is distributed.

In conclusion, the mean and the median hide outliers. Looking at the frequency distribution gives you a more complete picture. If we insist on having less data to look at, the standard deviation and the 68–95–99.7 rule can compress the same complete picture into just two numbers.  

Sunday, January 11, 2015

Consumed in 2014

Starting 2014, I wanted to look more closely at everything I consume. So I started keeping a list of everything I read, watch and listen to.

I started off with a markdown file on Github that quickly evolved into a good excuse to dabble with an alternative stack. I ended up writing an event sourced node.js application on top of Postgres, hosted on Heroku. I could write a post on that particular experience, but it's safe to say this blog post captures it better than I ever could.

In 2014, I listened to 10 audiobooks and 18 podcasts, read 20 books and watched 12 movies and 13 shows.

I just went over the list and cherry picked three items per category that stuck with me the most.

Books
  1. Addiction by Design: An incredibly complete overview of the gambling industry, with insights into the human psyche which apply far outside the domain of gambling. 
  2. 11/22/63: I try to read a fiction book for every non-fiction book I read. This was my first "real" King book, and I was hooked instantly. I devoured the 880-page book in less than two weeks, and was left with a big gaping hole in my heart - the journey was more than worth it.
  3. SQL Performance Explained: The author succeeds in capturing the essentials in only 204 pages. This is one of those books you want to have going around at the office. 
Movies
  1. Be Here To Love Me: Not-your-average documentary about Townes van Zandt; my musical and philosophical discovery of 2014. A troubled soul, struggling with the futility of existence. Simple poetic songs beautifully crafted, capturing in a few words what would take others books. While darkness inspired his best work, it was also the search for relief of that same darkness that led him further down the path of self-destructive behaviour. Painful to watch. Rake, Waiting Around to Die, Snake Mountain Blues, Dollar Bill Blues.
  2. 3:10 to Yuma: A true modern Western, capturing the American West beautifully, making me nostalgic.
  3. Dallas Buyers Club: Is there one role Matthew McConaughey can't hack?



Audio books
  1. On Writing - A Memoir of the Craft: "What follows is an attempt to put down, briefly and simply, how I came to the craft (of telling stories on paper), what I know about it now, and how it's done. It's about the day job; it's about the language." If you're not a King fan, you will be after finishing this one. 
  2. Getting to Yes: Separate people and issues, focus on interest, generate options and use objective criteria. But also, what to do when the other party is more powerful or when they use dirty tricks.
  3. Blink: Making decisions in the blink of an eye; why you should go with instinct and why you shouldn't.
Shows
  1. True Detective: "Life's barely long enough to get good at one thing. So be careful what you get good at." "If the only thing keeping a person decent is the expectation of divine reward then, brother, that person is a piece of shit." "People incapable of guilt usually do have a good time." Heavy stuff.
  2. Rick and Morty: "Listen, Morty, I hate to break it to you but what people call love is just a chemical reaction that compels animals to breed." "Nobody exists on purpose. Nobody belongs anywhere. Everybody's gonna die. Come watch TV?" "Snuffles was my slave name, you can call me Snowball because my fur is white and pretty." Must be the best animated series of the moment hands down, looking forward to next season. The first season is available for free.
  3. The Walking Dead: Addictive soap opera that features zombies.

Papers

Mostly classics on Functional Programming.