Sunday, August 21, 2016

My InfoQ interview on DDD, events and legacy

Seems that it's impossible to beat the Gaussian curve of blogging frequency. On the other hand, I did spend quite a bit of my mental blogging budget on an interview with InfoQ.

I'm a bit bummed out that it's such a large wall of text. When submitting the answers, I highlighted some snippets that should make for easier scanning. Too bad the formatting was lost when it was published. I included some highlights below.

The interview itself can be found here. Let me know what you think!
Extracting components: Starting out, this can be as trivial as trying to model boundaries as namespaces or modules. 
Invariants: Having core properties enforced deep within the model allows for a better night's sleep.
Sizing aggregates: Make your aggregates as small as they can be, but not any smaller. There's a big difference between an invariant that needs to be strongly held and data that helps the aggregate to make a decision, but which doesn't require strong consistency. 
ORM pitfalls: Being able to navigate through a graph, which basically walks through your whole database, is a great way to lose any sense of transactional boundaries. 
The value of bounded contexts: Now when I switch between bounded contexts, it feels like walking through a door, entering a separate room where you can tackle a problem with an unbiased mindset, allowing you to use the right tool for the job. 
Introducing domain events: When you don't want to, or can't afford to, invest in the full paradigm shift, there's a middle ground. You can try a hybrid approach in which you, in addition to persisting state, also persist the events that led up to that state. This does entail the risk of introducing a bug which causes split-brain, where your events no longer add up to your state. (A rough sketch of this hybrid approach follows after these highlights.)
Designing contracts: If you get the semantics wrong, you will end up with a system that's held together by brittle contracts that break constantly.
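
Coming back to that hybrid approach to domain events: a minimal sketch, assuming a relational store (table, column and parameter names are made up, not taken from the interview), is to write the new state and the events that led up to it in one and the same transaction. That keeps state and events from drifting apart at the storage level; the split-brain risk that remains is a bug in the code where the persisted events don't actually add up to the persisted state.

    BEGIN TRANSACTION;

    -- Persist the new state, guarded by an optimistic concurrency check.
    UPDATE dbo.Accounts
    SET Balance = @NewBalance,
        Version = @ExpectedVersion + 1
    WHERE AccountId = @AccountId
      AND Version = @ExpectedVersion;
    -- (In real code you'd check @@ROWCOUNT here and roll back on a conflict.)

    -- Persist the event(s) that led up to that state, in the same transaction.
    INSERT INTO dbo.AccountEvents (AccountId, Version, EventType, Payload, OccurredOn)
    VALUES (@AccountId, @ExpectedVersion + 1, @EventType, @Payload, SYSUTCDATETIME());

    COMMIT TRANSACTION;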

Wednesday, April 27, 2016

Pieter Hintjens

Writing doesn't always come naturally to me. It often takes me days, weeks or even months of toying with an idea before I think it's mature enough to put it down in writing. I can't afford that luxury this time though; I wouldn't think of myself as much of a friend if Pieter didn't get to read this in time.

I met Pieter for the first time in a bar in Vilnius in December 2013, when I accidentally ended up sitting next to him during the traditional pre-conference drinks. The first thing that stood out was what a comfortable, warm sweater he was wearing - I still cherish the memory of that grey woolen sweater on cold winter nights. I'm still unsure whether it was the sweater or his undeniable radiant charisma that made its way into my memories. When Pieter talks, people tend to listen, or at least pay attention. That's what I ended up doing that night - listening, drinking in the knowledge, afraid to make a fool out of myself by joining the conversation.

The next day, Pieter opened the conference with probably the most motivational keynote I have ever attended, aptly titled "Building stuff changes everything". Him being a fellow countryman and me having a few Lithuanian beers in me helped me gather enough courage to properly introduce myself and talk for a bit.

From that moment on, we would only meet a few times a year, traveling to Vilnius with the Belgian delegation or as a guest at the Domain Driven Design Belgium meetup. During that time, we had a few - far from enough - lengthy conversations. Me mostly asking questions, him sharing his point of view, and me trying hard to keep up, taking it all in. Although he was always more than happy to entertain each question you would throw at him, I would always feel a bit selfish keeping him to myself for too long.

The most memorable talk I had with Pieter was during a tête-à-tête in the Vilnius sky bar. We would mingle Dutch and English, whichever language made the next sentence sound best. We shared some personal experiences, and he laid out most of the groundwork for what, a good year later, materialized into "The Psychopath Code". But most importantly, he allowed me a peek through his eyes, at his playground - what we like to call life.

You don't need his Mensa membership card to realize he is a highly gifted individual. He could have pursued anything he wanted and been good at it, but he went all-out for people, freedom and love - making it his mission to evangelize his core beliefs.

His words - both spoken and written - have inspired me more than he can imagine. And they will likely continue to inspire others for many more years to come. His work has given me a framework to build on for the rest of my life. There's so much to learn from the way he dissects the world, documents the things that are broken, and tirelessly works to make it whole again - one protocol, one word, one hug at a time. Where I would often feel overwhelmed by dark, hopeless sentiments, he has given me enough tools to overcome them. From his mouth to my heart: "We're not much more than a pile of ants, but together we can achieve tremendous things".

Pieter, I'm not half the writer you are, but I hope these words can serve as a testimony to your children of what a great dad they had. If your time comes, know that I'm grateful that I've been able to call you my friend.

Sunday, April 24, 2016

Using a batch layer for fast(er) aggregations

In the oldest system I'm maintaining right now, we have an account aggregate that, in addition to mutating various balances, produces immutable financial transactions. These transactions are persisted together with the aggregate itself to a relational database. The transactions can be queried by the owner of the account in an immediately consistent fashion.

The table with these transactions looks similar to this (the column names and types below are approximations):
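
    -- Rough sketch only; the real table carries a few more columns.
    CREATE TABLE dbo.Transactions (
        TransactionId   BIGINT IDENTITY NOT NULL PRIMARY KEY,
        AccountId       INT             NOT NULL,
        [Timestamp]     DATETIME2       NOT NULL,
        TransactionType TINYINT         NOT NULL,
        CashAmount      DECIMAL(19, 4)  NOT NULL
    );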

There's an index on the timestamp, the account identifier and the transaction type, which allows for fast enough reads for the most common access patterns, since those only return a small subset of the rows.
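
In DDL terms, think of something along these lines - the index names are made up, and whether it's one composite index or a few separate ones doesn't change the rest of the story:

    -- The point being: AccountId is indexed, but the cash amounts are not
    -- part of any index.
    CREATE NONCLUSTERED INDEX IX_Transactions_AccountId
        ON dbo.Transactions (AccountId);

    CREATE NONCLUSTERED INDEX IX_Transactions_Timestamp
        ON dbo.Transactions ([Timestamp]);

    CREATE NONCLUSTERED INDEX IX_Transactions_TransactionType
        ON dbo.Transactions (TransactionType);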

In a use case we recently worked on, we wanted real-time statistics of an account's transactions over its lifetime.
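
Think of a query along these lines - the real one aggregates a few more measures, but the shape is the same:

    -- Lifetime statistics for a single account (illustrative names).
    SELECT
        TransactionType,
        COUNT(*)        AS TransactionCount,
        SUM(CashAmount) AS CashAmount
    FROM dbo.Transactions
    WHERE AccountId = @AccountId
    GROUP BY TransactionType;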

Running this query would seek the account id index to look up all rows that match the given predicate. In case one account has tens of thousands of transactions, this results in a high number of reads. If your database fits into memory, SQL Server can probably satisfy the query from its buffer cache. Although this still has overhead, it's supposed to be a lot faster than when SQL Server is forced to do physical reads - reading pages straight from disk. In this case, where transactions are often years old and the database does not fit into memory, odds are high that SQL Server will be reading from disk - which is dog-slow.

One option would be to create a new covering index (including columns like CashAmount etc.) for this specific workload. The problem is that indexes don't come for free. You pay for them on every write, and depending on your performance goals, that might be a cost you want to avoid. It might even be impossible, or too expensive, to create such an index on environments that have no maintenance window and no license that allows for online index builds. It's safe to assume that when you don't own said license, you don't have read replicas available either.
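
For reference, such a covering index could look something like this - names made up once again:

    -- With AccountId as the key and the queried columns included, SQL Server
    -- could answer the aggregation from the index alone, without key lookups.
    CREATE NONCLUSTERED INDEX IX_Transactions_AccountId_Covering
        ON dbo.Transactions (AccountId)
        INCLUDE (TransactionType, CashAmount);
    -- Building it without blocking writes takes WITH (ONLINE = ON), which is
    -- exactly the licensing problem mentioned above.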

Considering the workload, the never-changing nature of financial transactions and the constraints in place, we applied Lambda Architecture theory on a small scale, starting out by building daily aggregations of transactions per account.

This translates into scheduling a job which catches up on all days, performing one query per day and appending the results to a dedicated table.
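
In its simplest form, the batch table and the catch-up step boil down to something like this (again, illustrative names):

    -- One row per account, per day, per transaction type.
    CREATE TABLE dbo.TransactionsPerDay (
        AccountId        INT            NOT NULL,
        [Day]            DATE           NOT NULL,
        TransactionType  TINYINT        NOT NULL,
        TransactionCount INT            NOT NULL,
        CashAmount       DECIMAL(19, 4) NOT NULL,
        CONSTRAINT PK_TransactionsPerDay PRIMARY KEY (AccountId, [Day], TransactionType)
    );

    -- Executed by the job once for every day it still needs to catch up on.
    INSERT INTO dbo.TransactionsPerDay
        (AccountId, [Day], TransactionType, TransactionCount, CashAmount)
    SELECT
        AccountId,
        @Day,
        TransactionType,
        COUNT(*),
        SUM(CashAmount)
    FROM dbo.Transactions
    WHERE [Timestamp] >= @Day
      AND [Timestamp] < DATEADD(DAY, 1, @Day)
    GROUP BY AccountId, TransactionType;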

On our dataset, this compresses the transaction table by a factor of more than 300. Not just that, by separating reads from writes, we give ourselves so much more breathing room and options, which makes me sleep so much better at night.

As you probably noticed, for real-time statistics on this data, we're still missing today's transactions in this table. Since today's transactions are a much smaller subset, and likely to live in SQL Server's cache, we can query both the batch table and the transaction table, and merge the results of both queries, as in the sketch below. For our use case, resource usage and query response times have dropped significantly, especially for the largest accounts.
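
For completeness, the merged query looks roughly like this, reusing the illustrative tables from before:

    -- Closed days come from the batch table, today's rows come straight from
    -- the transaction table; one final GROUP BY merges both.
    SELECT
        TransactionType,
        SUM(TransactionCount) AS TransactionCount,
        SUM(CashAmount)       AS CashAmount
    FROM
    (
        SELECT TransactionType, TransactionCount, CashAmount
        FROM dbo.TransactionsPerDay
        WHERE AccountId = @AccountId

        UNION ALL

        SELECT TransactionType, 1, CashAmount
        FROM dbo.Transactions
        WHERE AccountId = @AccountId
          AND [Timestamp] >= CAST(GETDATE() AS DATE)
    ) AS combined
    GROUP BY TransactionType;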

I don't see it happening in the near future, but if the usage of these queries grows, we can still borrow more Lambda Architecture practices and push further.