Wednesday, April 27, 2016

Pieter Hintjens

Writing doesn't always come naturally to me. It often takes me days, weeks or even months of toying with an idea before I think it's mature enough to put into writing. I can't afford that luxury this time though; I wouldn't think of myself as much of a friend if Pieter didn't get to read this in time.

I met Pieter for the first time in a bar in Vilnius, in December 2013, when I accidentally ended up sitting next to him during the traditional pre-conference drinks. The first thing that stood out was what a comfortable, warm sweater he was wearing - I still cherish the memory of that grey woolen sweater on cold winter nights. I'm still unsure whether it was the sweater or his undeniable radiant charisma that made its way into my memories. When Pieter talks, people tend to listen, or at least pay attention. That's what I ended up doing that night - listening, taking in the knowledge, afraid to make a fool of myself by joining the conversation.

The next day, Pieter opened the conference with probably the most motivational keynote I've ever attended, aptly titled "Building stuff changes everything". Him being a fellow countryman, and me having a few Lithuanian beers in me, helped me gather enough courage to properly introduce myself and talk for a bit.

From that moment on, we would only meet a few times a year, traveling to Vilnius with the Belgian delegation or as a guest at the Domain Driven Design Belgium meetup. During that time, we had a few - far from enough - lengthy conversations. Me mostly asking questions, him sharing his point of view, and me trying hard to keep up, taking it all in. Although he was always more than happy to entertain each question you would throw at him, I would always feel a bit selfish keeping him to myself for too long.

The most memorable talk I had with Pieter was during a tête-à-tête in the Vilnius sky bar. We would mingle Dutch and English, whichever language made the next sentence sound best. We shared some personal experiences, and he laid out most of the groundwork for what a good year later materialized into "The Psychopath Code". But most importantly, he allowed me a peek through his eyes at his playground, the thing we like to call life.

You don't need to see his Mensa membership card to realize he is a highly gifted individual. He could have pursued anything he wanted and been good at it, but he went all-out for people, freedom and love - making it his mission to evangelize his core beliefs.

His words - both spoken and written - have inspired me more than he can imagine, and they will likely continue to inspire others for many years to come. His work has given me a framework to build on for the rest of my life. There's so much to learn from the way he dissects the world, documents what is broken, and tirelessly tries to make it whole again - one protocol, one word, one hug at a time. Where I would often feel overwhelmed by dark, hopeless sentiments, he has given me enough tools to overcome them. From his mouth to my heart: "We're not much more than a pile of ants, but together we can achieve tremendous things".

Pieter, I'm not half the writer you are, but I hope these words can serve as a testimony to your children of what a great dad they had. If your time comes, know that I'm grateful that I've been able to call you my friend.

Sunday, April 24, 2016

Using a batch layer for fast(er) aggregations

In the oldest system I'm maintaining right now, we have an account aggregate that, besides mutating various balances, produces immutable financial transactions. These transactions are persisted together with the aggregate itself to a relational database. They can be queried by the owner of the account in an immediately consistent fashion.

The table with these transactions looks similar to this:
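
A simplified sketch - the column names here (AccountId, Timestamp, Type, CashAmount) are illustrative, and the real table carries a few more columns:

    // Illustrative DDL only; the real table looks a bit different.
    const string CreateTransactionTable = @"
        CREATE TABLE dbo.[Transaction] (
            Id         BIGINT IDENTITY(1,1) PRIMARY KEY,
            AccountId  INT            NOT NULL,
            Timestamp  DATETIME2      NOT NULL,
            Type       TINYINT        NOT NULL,
            CashAmount DECIMAL(18, 2) NOT NULL
        );

        CREATE INDEX IX_Transaction_Timestamp ON dbo.[Transaction] (Timestamp);
        CREATE INDEX IX_Transaction_AccountId ON dbo.[Transaction] (AccountId);
        CREATE INDEX IX_Transaction_Type      ON dbo.[Transaction] (Type);";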

There's an index on the timestamp, the account identifier and the transaction type, which allows for fast enough reads for the most common access patterns, since those only return a small subset of the rows.

In a use case we recently worked on, we wanted real-time statistics of an account's transactions over its lifetime.
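
A sketch of such a lifetime query - the real statistics aggregate more than a count and a sum, but this captures the shape:

    // Hypothetical lifetime statistics query, parameterized by account.
    const string LifetimeStatistics = @"
        SELECT COUNT(*)        AS TransactionCount,
               SUM(CashAmount) AS TotalCashAmount
        FROM   dbo.[Transaction]
        WHERE  AccountId = @AccountId;";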

Running this query seeks the account id index to look up all rows that match the given predicate. When an account has tens of thousands of transactions, this results in a large number of reads. If your database fits into memory, SQL Server can probably satisfy the query from its buffer cache. That still has overhead, but it's a lot faster than when SQL Server is forced to do physical reads - reading pages straight from disk. In this case, where transactions are often years old and the database does not fit into memory, odds are high that SQL Server will be reading from disk - which is dog-slow.

One option would be to create a new covering index (including columns like CashAmount etc.) for this specific workload. The problem is that indexes don't come for free. You pay for them on every write, and depending on your performance goals, that might be a cost you want to avoid. It might even be impossible, or too expensive, to create such an index in environments that have no maintenance window and no license that allows for online index builds - and if you don't own said license, odds are you don't have read replicas available either.
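
For this specific workload, such a covering index could look something like this:

    // A covering index for the lifetime query; every write would now have to maintain it too.
    const string CoveringIndex = @"
        CREATE INDEX IX_Transaction_AccountId_Covering
            ON dbo.[Transaction] (AccountId)
            INCLUDE (CashAmount);";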

Considering the workload, the never-changing nature of financial transactions and the constraints in place, we applied Lambda Architecture theory on a small scale, starting by building daily aggregations of transactions per account.
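
The batch table itself stays small and simple - something along these lines, with made-up names:

    // Hypothetical batch table: one row per account per day.
    const string CreateBatchTable = @"
        CREATE TABLE dbo.TransactionsPerAccountPerDay (
            AccountId        INT            NOT NULL,
            Day              DATE           NOT NULL,
            TransactionCount INT            NOT NULL,
            TotalCashAmount  DECIMAL(18, 2) NOT NULL,
            PRIMARY KEY (AccountId, Day)
        );";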

This translates into scheduling a job which catches up on all past days, by performing a query per day and appending the results to a dedicated table.
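
A minimal sketch of such a job, assuming the batch table above and a marker for the last day that was aggregated:

    using System;
    using System.Data.SqlClient;

    public static class DailyTransactionAggregator
    {
        // Aggregate a single day of transactions into the batch table.
        const string AggregateDay = @"
            INSERT INTO dbo.TransactionsPerAccountPerDay
                (AccountId, Day, TransactionCount, TotalCashAmount)
            SELECT AccountId, CAST(Timestamp AS DATE), COUNT(*), SUM(CashAmount)
            FROM   dbo.[Transaction]
            WHERE  Timestamp >= @Day AND Timestamp < DATEADD(DAY, 1, @Day)
            GROUP BY AccountId, CAST(Timestamp AS DATE);";

        // Catch up on every completed day we haven't aggregated yet.
        // Today is deliberately skipped: it's still being written to.
        public static void CatchUp(SqlConnection connection, DateTime lastAggregatedDay)
        {
            for (var day = lastAggregatedDay.Date.AddDays(1); day < DateTime.Today; day = day.AddDays(1))
            {
                using (var command = new SqlCommand(AggregateDay, connection))
                {
                    command.Parameters.AddWithValue("@Day", day);
                    command.ExecuteNonQuery();
                }
            }
        }
    }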

On our dataset, this compresses the transaction table by a factor of more than 300. Not just that, by separating reads from writes, we give ourselves so much more breathing room and options, which makes me sleep so much better at night.

As you probably noticed, for real-time statistics on this data, we're still missing today's transactions in this table. Since today's transactions are a much smaller set and likely to live in SQL Server's cache, we can query both the batch table and the transaction table, and merge the results of the two queries. For our use case, resource usage and query response times have dropped significantly, especially for the largest accounts.
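
For completeness, a sketch of that merged query, reusing the assumed names from above:

    // Combine the pre-aggregated days with today's still-growing transactions.
    const string LifetimeStatisticsWithToday = @"
        SELECT SUM(TransactionCount) AS TransactionCount,
               SUM(TotalCashAmount)  AS TotalCashAmount
        FROM (
            SELECT TransactionCount, TotalCashAmount
            FROM   dbo.TransactionsPerAccountPerDay
            WHERE  AccountId = @AccountId
            UNION ALL
            SELECT COUNT(*), SUM(CashAmount)
            FROM   dbo.[Transaction]
            WHERE  AccountId = @AccountId AND Timestamp >= @Today
        ) AS Combined;";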

I don't see it happening in the near future, but in case the usage of these queries grows, we can still borrow more Lambda Architecture practices and push further.

Sunday, April 17, 2016

Notifications from an event log

User notifications are a feature that came as an afterthought, but turned out to be rather easy to implement - without touching (read: breaking) existing functionality - thanks to having an immutable event log.

In the domain I'm working in at the moment, we often give users incentives to return to the website, or to extend their stay. At first these incentives were only communicated by email, which is a decent medium when you want users to return to the website. However, when you want to extend their stay, you want to avoid users switching contexts between your website and their mail client. And as soon as they do return to your website, you want to show them a crisp overview of all relevant calls to action. Since most calls to action map to a specific page, the list of notifications can serve as a one-click starting point, lowering the hurdle to browse to a relevant page.

Notifying a user is one thing. Another use case we wanted to solve is dismissing notifications as soon as they are no longer relevant.

Two examples of when a notification might no longer be considered relevant:
  1. When a bonus is awarded to a user, he might ignore the notification and activate the bonus by directly browsing to the specific page.
  2. When a bonus is awarded to a user, he might not visit the website before the bonus expires.
In these cases, to avoid confusion and unsatisfied customers, we want to dismiss the notification automatically.

Let's say that we're going to implement notifications for bonuses. These are the types of events we have to work with.
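
Stripped down to their essence, they could look like this (the real events carry more data):

    using System;

    // Illustrative bonus events, simplified for this example.
    public class BonusAwarded
    {
        public Guid BonusId { get; set; }
        public Guid UserId { get; set; }
        public decimal Amount { get; set; }
    }

    public class BonusActivated
    {
        public Guid BonusId { get; set; }
    }

    public class BonusExpired
    {
        public Guid BonusId { get; set; }
    }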

On the other hand, we have a set of commands that interact with notifications.
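
For example - the names and shapes here are mine, not necessarily the real ones:

    using System;
    using System.Collections.Generic;

    // Illustrative notification commands.
    public class NotifyUser
    {
        public Guid NotificationId { get; set; }
        public Guid UserId { get; set; }
        public string TemplateId { get; set; }
        public Dictionary<string, string> Data { get; set; }
        public string LinkedTo { get; set; }
    }

    public class DismissNotification
    {
        // Dismiss whatever notification is linked to this identifier.
        public string LinkedTo { get; set; }
    }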

A notification has an identifier, references a user, contains some data, and most importantly can be linked to something.

Working from an immutable event log, we can project the events to commands (to dispatch them eventually).

When a bonus is awarded to a user, we will notify the user, providing the template id and data that can be used inside of the template. In this example, the notification can be linked to a specific bonus, leveraging the bonus identifier.
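
As a sketch, assuming a projection with a When handler per event type, building on the events and commands above:

    // Sketch of the projection; the dispatching infrastructure is left out.
    public class BonusNotificationProjection
    {
        public NotifyUser When(BonusAwarded e)
        {
            return new NotifyUser
            {
                NotificationId = Guid.NewGuid(),
                UserId = e.UserId,
                TemplateId = "bonus-awarded",
                Data = new Dictionary<string, string> { { "amount", e.Amount.ToString() } },
                // Link the notification to the bonus, so it can be dismissed later on.
                LinkedTo = e.BonusId.ToString()
            };
        }
    }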

The user might now see a fresh notification show up in their overview, rendered from that template and data.

Being aware of the events a bonus produces over its lifetime, and of their significance, we chose to dismiss the notification as soon as the bonus is activated or expires (leveraging the bonus identifier as the link again).
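
Continuing the same hypothetical projection:

    // Inside the same BonusNotificationProjection: dismiss by the bonus link.
    public DismissNotification When(BonusActivated e)
    {
        return new DismissNotification { LinkedTo = e.BonusId.ToString() };
    }

    public DismissNotification When(BonusExpired e)
    {
        return new DismissNotification { LinkedTo = e.BonusId.ToString() };
    }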

Now it's up to the UX team (if you're lucky enough to have one) to decide how to visualize the difference between a read and a dismissed notification (if at all).