Tuesday, December 27, 2016

Consumed in 2016

I'm keeping the tradition alive, sharing how much I've consumed over the last year and highlighting the things that stood out: 18 books, 8 movies and 9 shows. Looks like I consumed more than in previous years, which probably also explains why I produced less after-hours.

Books

I finished the Dark Tower series after 3 years. Following Roland Deschain and his ka-tet throughout the 8 books has been an epic adventure. Having finished Harry Potter by the time I was 18, I had little hope of ever being dragged into such a long and captivating tale again. The Stand, another epic by Stephen King, is also high up on my list. I seem to have taken a liking to stories set in a post-apocalyptic world, in which the antagonist is not necessarily a horde of zombies.
Control the things you can control, maggot. Let everything else take a flying fuck at you, and if you must go down, go down with your guns blazing. - Stephen King, The Dark Tower
Show me a man or a woman alone and I'll show you a saint. Give me two and they'll fall in love. Give me three and they'll invent the charming thing we call 'society'. Give me four and they'll build a pyramid. Give me five and they'll make one an outcast. Give me six and they'll reinvent prejudice. Give me seven and in seven years they'll reinvent warfare. Man may have been made in the image of God, but human society was made in the image of His opposite number, and is always trying to get back home. - Stephen King, The Stand
Culture and Empire and The Psychopath Code by the late Pieter Hintjens were thought-provoking, yet easy reads. I wish more authors could break down complex problems in such an understandable fashion. Pieter's ability to research any given topic and build a convincing case is a skill I'd like to acquire one day.
For the sake of argument, let's divide society into four roughly equal chunks. We have the bandits, who specialize in taking from others. Then, we have the beggars, who specialize in getting something for nothing. Middle management, perhaps. Then, we have the bureaucrats, who specialize in making rules and keeping things organized. Finally, we have the bakers, who specialize in making things that other people need. - Pieter Hintjens, Culture and Empire
When the cost of secrets held by one person or group outweighs the benefits to society, then it's right that those secrets be leaked. Security does not just trump Liberty, he takes her into a dark back alley, violates her repeatedly, and then beats her senseless with a heavy stick. - Pieter Hintjens, Culture and Empire
As far as technical books go, I've mostly been reading about things that are relevant to my current job. I learned most from Site Reliability Engineering and SQL Server 2012 Internals. They are long, very slow reads, but the information has a depth to it which is hard to come by in other formats. In retrospect, I do wonder if I might have been better off reading them more selectively.
Covering the same domain, I finally got around to reading the classic Release it! This book was published in 2007, and it shows. However, that doesn't mean it's not worth your time. It's fascinating to see how much the infrastructure landscape has changed over the years. Knowing where we came from in deploying and running large systems helps me understand why we do things a certain way today.

Movies

The Revenant. If you're looking for a rich storyline, you might end up disappointed. However, it's the high-quality cinematography and breathtaking scenery that make this film. I'm a sucker for the romance of the American frontier though.

Snowden. Not a documentary, but a Hollywood picture telling the story of the most disputed whistleblower of the century. To be fair, having worked with government tech over the years, I used to doubt that a government agency would be able to acquire the talent needed to build and run a mass surveillance system successfully. Being Belgian, I totally ignored the fact that in other countries there are some really smart people patriotic enough to serve their country no matter what. Although I read up on the subject in the early days, the massive scale of it all hit me like a ton of bricks halfway through the movie. Generally, I'm pretty excited to work in a field that has changed how we communicate, distribute and work in such a short period of time. However, it has become more and more apparent to me that the centralized nature of the things we build and use is also a huge threat to our freedom. It's crucial that we learn from this incident and grow towards a world where pardoning Snowden is the obvious thing to do, and in which the people in power toying with our privacy are the outcasts.
I think the greatest freedom that I have gained, the fact that I don't have to worry about what happens tomorrow, because I'm happy with what I've done today. - Edward Snowden

Shows

I got around to watching the first episode of Westworld yesterday, and my expectations for the rest of the series are sky-high. Friends have been hyping me up over this show since it premiered.

Earlier in the year, I watched Narcos and Stranger Things. Netflix focusing more on content creation instead of rapidly expanding its portfolio with third-party content might pay off big time. Both shows bring high-quality, bingeable content.

Podcasts

This American Life has been a favorite of mine for a long time. The Monday morning commute almost becomes enjoyable. Almost.

Software Engineering Daily is the podcast I use to break out of my technological echo chamber. The pace at which episodes are published is unheard of: five days a week there's a fresh one-hour interview with a high-profile guest involved in building incredibly interesting systems at scale.

Being the accidental DBA at work, I lean on the SQL Server Pain Relief podcast to compensate for the lack of people I know who have first-hand experience running large, highly available and performance-intensive SQL Server production systems.

If there's anything I really missed out on that I should watch or read, let me know!

Sunday, September 18, 2016

Commands and events with JustSaying and AWS

I've been looking into handing a bit of our messaging infrastructure over to a managed alternative. Managing your own highly available messaging infrastructure is not always an investment you want to make in this day and age. Going through the documentation and relying on the experiences of some people I trust, I ended up looking at AWS and SNS/SQS.

Making the rounds on GitHub looking for inspiration, I stumbled upon JustSaying: a library by the people from JustEat implementing a message bus on top of AWS.

I wanted to find two messaging patterns in this library:
  1. Command queuing. A common pattern in our components is to react to an event by making an HTTP request to an external partner. To improve reliability and throughput, we generally don't make that HTTP request in the projection itself, but rather drop a command onto a queue, which will then be processed in parallel using a bounded number of retries. When things do go wrong, we either retry the messages by moving them from the error queue back to the input queue, or we change the reaction and reset the projection checkpoint, sending the commands again.
  2. Pub-sub. Another pattern, used when there is a certain level of familiarity between components, is to have a component publish events. Other components can subscribe to these messages and have them delivered to their own queues.
Both these styles are supported by JustSaying.

In this example, I have two commands: BookFlight and CancelBooking, with two related events: FlightWasBooked and BookingWasCancelled.

Since JustSaying requires messages to inherit from a base class, these message definitions live on the outside, far from the domain. This makes it possible to decouple the domain from the outside contracts and to make sure the events go out to the world in the format I want them in.
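
A minimal sketch of what these definitions might look like - the message names come from the example above, the properties are placeholders I made up, and the base class is JustSaying's own Message type:

    using JustSaying.Models;

    // Commands
    public class BookFlight : Message
    {
        public string FlightNumber { get; set; }
        public string PassengerId { get; set; }
    }

    public class CancelBooking : Message
    {
        public string BookingId { get; set; }
    }

    // Events
    public class FlightWasBooked : Message
    {
        public string BookingId { get; set; }
        public string FlightNumber { get; set; }
    }

    public class BookingWasCancelled : Message
    {
        public string BookingId { get; set; }
    }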

To handle these messages, JustSaying requires you to implement the IHandler interface.
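
A handler might look like this, assuming the synchronous IHandler contract the library shipped with at the time (returning false tells JustSaying the message failed, so it can be retried and eventually moved to the error queue):

    using JustSaying.Messaging.MessageHandling;

    public class BookFlightHandler : IHandler<BookFlight>
    {
        public bool Handle(BookFlight message)
        {
            // This is where the HTTP request to the external partner would go.
            // Returning false signals failure, causing the message to be
            // retried and eventually end up on the error queue.
            return true;
        }
    }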

With that out of the way, we need to configure the bus (publishers and subscribers).

First of all, Amazon needs to know who we are and what we're allowed to do.
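
A sketch, assuming the default client factory hook JustSaying exposed at the time - outside of a sample you'd rather rely on IAM roles or credential profiles than inline keys:

    using Amazon.Runtime;
    using JustSaying;
    using JustSaying.AwsTools;

    // Inline credentials for brevity only
    CreateMeABus.DefaultClientFactory = () =>
        new DefaultAwsClientFactory(
            new BasicAWSCredentials("accessKey", "secretKey"));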

We should define which region our infrastructure lives in.
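
Assuming everything lives in eu-west-1, that's a one-liner with the AWS SDK:

    using Amazon;

    var region = RegionEndpoint.EUWest1.SystemName;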

Now we can configure our command queue. Commands should be published using an SQS publisher, directly dropping messages into the "Commands" queue. A point-to-point subscriber will directly pull messages from the "Commands" queue and hand them over to the command handlers.
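
A sketch of the command side, assuming the fluent builder of the JustSaying version I tried; CancelBookingHandler mirrors the BookFlightHandler shown earlier:

    var commandBus = CreateMeABus
        .InRegion(region)
        // Commands are dropped straight onto an SQS queue
        .WithSqsMessagePublisher<BookFlight>(configuration => { })
        .WithSqsMessagePublisher<CancelBooking>(configuration => { })
        // A point-to-point subscriber pulls them from the "Commands" queue
        .WithSqsPointToPointSubscriber()
        .IntoQueue("Commands")
        .WithMessageHandler(new BookFlightHandler())
        .WithMessageHandler(new CancelBookingHandler());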

Events are not dropped directly into an SQS queue; instead, each event type gets its own SNS topic. We can use SQS to subscribe to these topics and have the messages delivered to an "Events" queue.
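
The event side might then look like this, again as a sketch - SNS message publishers, with an SQS topic subscriber funneling the events into an "Events" queue (the handlers being analogous to the command handler shown earlier):

    var eventBus = CreateMeABus
        .InRegion(region)
        // Each event type gets its own SNS topic
        .WithSnsMessagePublisher<FlightWasBooked>()
        .WithSnsMessagePublisher<BookingWasCancelled>()
        // Subscribe an SQS queue to those topics
        .WithSqsTopicSubscriber()
        .IntoQueue("Events")
        .WithMessageHandler(new FlightWasBookedHandler())
        .WithMessageHandler(new BookingWasCancelledHandler());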

Once the bus has been created, we can start listening and publishing messages.
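
Roughly like this, with the sample values obviously made up:

    commandBus.StartListening();
    eventBus.StartListening();

    commandBus.Publish(new BookFlight
    {
        FlightNumber = "JE1234",
        PassengerId = "42"
    });

    eventBus.Publish(new FlightWasBooked
    {
        BookingId = "1",
        FlightNumber = "JE1234"
    });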

JustSaying will create two SNS topics and four SQS queues: two input queues and two error queues.

Those topic and queue names are not that descriptive once you introduce multiple components, and names might even start to collide. JustSaying allows you to define a custom naming strategy. I've settled on a strategy that is based on the message type and prefixed with the component name. This has the added advantage that each message type now goes into its own queue.
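
A sketch of such a strategy, assuming the INamingStrategy contract the library exposed at the time, with a made-up component name:

    using JustSaying.AwsTools.QueueCreation;

    public class ComponentNamingStrategy : INamingStrategy
    {
        public string GetTopicName(string topicName, string messageType)
        {
            return ("bookingcomponent-" + messageType).ToLowerInvariant();
        }

        public string GetQueueName(SqsReadConfiguration sqsConfig, string messageType)
        {
            return ("bookingcomponent-" + messageType).ToLowerInvariant();
        }
    }

Wiring it up should be a matter of adding a WithNamingStrategy(() => new ComponentNamingStrategy()) call to the builder.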



This whole experiment has had a scarily low learning curve (maybe a bit too low). While I'm still in the assess phase, I'm fairly optimistic that running on top of SNS/SQS might take away some of our operational burden. Going over the JustSaying API and code base, it's quite opinionated and there are things I might have approached differently. Some features I'd like to see, such as the library providing a message envelope as a first-class citizen (a base message class is something I've regretted in the past), are being worked on, so I'm keeping my eye on those. Since I'm only using command queuing at the moment, I should be pretty safe from future breaking changes to the message format and such.

Sunday, August 21, 2016

My InfoQ interview on DDD, events and legacy

It seems impossible to beat the Gaussian curve of blogging frequency. On the other hand, I spent quite a bit of my mental blogging budget on an interview with InfoQ.

I'm a bit bummed out that it's such a large wall of text. When submitting the answers, I highlighted some snippets which should have made for easier scanning. Too bad the formatting was lost when publishing. I've included some highlights below.

The interview itself can be found here. Let me know what you think!
Extracting components: Starting out, this can be as trivial as trying to model boundaries as namespaces or modules. 
Invariants: Having core properties enforced deep within the model, allows for a better night's sleep.
Sizing aggregates: Make your aggregates as small as they can be, but not any smaller. There's a big difference between an invariant that needs to be strongly held and data that helps the aggregate to make a decision, but which doesn't require strong consistency. 
ORM pitfalls: Being able to navigate through a graph, which basically walks through your whole database, is a great way to lose any sense of transactional boundaries. 
The value of bounded contexts: Now when I switch between bounded contexts, it feels like walking through a door, entering a separate room where you can tackle a problem with an unbiased mindset, allowing you to use the right tool for the job. 
Introducing domain events: When you don't want to or can't afford to invest in the full paradigm shift there's a middle ground. You can try a hybrid approach in which you, next to persisting state, also persist the events that led up to a specific state. This does entail the risk of introducing a bug which causes split-brain in which your events do not add up to your state.  
Designing contracts: If you get the semantics wrong, you will end up with a system that's held together by brittle contracts that break constantly.

Wednesday, April 27, 2016

Pieter Hintjens

Writing doesn't always come naturally to me. It often takes me days, weeks or even months of toying with an idea before I think it's mature enough to put down into writing. I can't afford that luxury this time though; I wouldn't think of myself as much of a friend if Pieter didn't get to read this in time.

I met Pieter for the first time in a bar in Vilnius in December 2013, when I accidentally ended up sitting next to him during the traditional pre-conference drinks. The first thing that stood out was what a comfortable warm sweater he was wearing - I still cherish the memory of that grey woolen sweater on cold winter nights. I'm still unsure whether it was the sweater or his undeniable radiant charisma that made its way into my memories. When Pieter talks, people tend to listen, or at least pay attention. That's what I ended up doing that night - listening, drinking in the knowledge, afraid to make a fool out of myself by joining the conversation.

The next day, Pieter opened the conference with probably the most motivational keynote I have ever attended, aptly titled "Building stuff changes everything". Him being a fellow countryman and me having a few Lithuanian beers in me helped me gather enough courage to properly introduce myself and talk for a bit.

From that moment on, we would only meet a few times a year, traveling to Vilnius with the Belgian delegation or welcoming him as a guest at the Domain Driven Design Belgium meetup. During that time, we had a few - far from enough - lengthy conversations. Me mostly asking questions, him sharing his point of view, and me trying hard to keep up, taking it all in. Although he was always more than happy to entertain each question you would throw at him, I would always feel a bit selfish keeping him to myself for too long.

The most memorable talk I had with Pieter was during a tête-à-tête in the Vilnius sky bar. We would mingle Dutch and English, whichever language made the next sentence sound best. We shared some personal experiences, and he laid out most of the groundwork for what a good year later materialized into "The Psychopath Code". But most importantly, he allowed me a peek through his eyes at the playground we like to call life.

You don't need his Mensa membership card to realize he is a highly gifted individual. He could have pursued anything he wanted and been good at it, but he went all-out for people, freedom and love - making it his mission to evangelize his core beliefs.

His words - both spoken and written - have inspired me more than he can imagine. And they will likely continue to inspire others for many more years to come. His work has given me a framework to build on for the rest of my life. There's so much to learn from how he is capable of dissecting the world, documenting the things that are broken, and tirelessly working to make them whole again - one protocol, one word, one hug at a time. Where I would often feel overwhelmed by dark, hopeless sentiments, he has given me enough tools to overcome them. From his mouth to my heart: "We're not much more than a pile of ants, but together we can achieve tremendous things".

Pieter, I'm not half the writer you are, but I hope these words can serve as a testimony to your children of what a great dad they had. If your time comes, know that I'm grateful that I've been able to call you my friend.

Sunday, April 24, 2016

Using a batch layer for fast(er) aggregations

In the oldest system I'm maintaining right now, we have an account aggregate that, next to mutating various balances, produces immutable financial transactions. These transactions are persisted together with the aggregate itself to a relational database. The transactions can be queried by the owner of the account in an immediately consistent fashion.

The table with these transactions looks similar to this:
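
Something along these lines - the real schema is wider, and these column names and types are made up for the example:

    CREATE TABLE dbo.Transactions (
        Id bigint IDENTITY(1, 1) PRIMARY KEY,
        AccountId int NOT NULL,
        Type tinyint NOT NULL,
        Timestamp datetime2 NOT NULL,
        CashAmount decimal(18, 2) NOT NULL
    );

    CREATE INDEX IX_Transactions_Timestamp ON dbo.Transactions (Timestamp);
    CREATE INDEX IX_Transactions_AccountId ON dbo.Transactions (AccountId);
    CREATE INDEX IX_Transactions_Type ON dbo.Transactions (Type);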

There are indexes on the timestamp, the account identifier and the transaction type, which allow for fast enough reads for the most common access patterns, which only return a small subset of the rows.

In a use case we recently worked on, we wanted real-time statistics of an account's transactions over its lifetime.
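
The query behind those statistics boils down to an aggregation over all of an account's transactions, simplified here to a single measure:

    SELECT COUNT(*) AS TransactionCount, SUM(CashAmount) AS TotalCashAmount
    FROM dbo.Transactions
    WHERE AccountId = @AccountId;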

Running this query would seek the account id index to look up all rows that match the given predicate. When an account has tens of thousands of transactions, this results in a high number of reads. If your database fits into memory, SQL Server can probably satisfy the query from its buffer cache. Although this still has overhead, it's supposed to be a lot faster than when SQL Server is forced to do physical reads - reading pages straight from disk. In this case, where transactions are often years old and the database does not fit into memory, odds are high that SQL Server will be reading from disk - which is dog-slow.

One option would be to create a new covering index (including columns like CashAmount etc.) for this specific workload. The problem is that indexes don't come for free. You pay for them on every write, and depending on your performance goals, that might be a cost you want to avoid. It might even be impossible, or too expensive, to create such an index in environments that have no maintenance window and no license that allows for online index builds. And when you don't own said license, odds are you don't have read replicas available either.

Considering the workload, the never-changing nature of financial transactions and the constraints in place, we applied Lambda Architecture theory on a small scale, starting by building daily aggregations of transactions per account.

This translates into scheduling a job which catches up on all days, performing one query per day and appending the results to a dedicated table.
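
A simplified sketch of the batch table, and of the query the job runs for each day it still needs to catch up on:

    CREATE TABLE dbo.TransactionsPerDay (
        AccountId int NOT NULL,
        Type tinyint NOT NULL,
        [Day] date NOT NULL,
        TransactionCount int NOT NULL,
        TotalCashAmount decimal(18, 2) NOT NULL,
        CONSTRAINT PK_TransactionsPerDay PRIMARY KEY (AccountId, Type, [Day])
    );

    -- Executed once per day to catch up on, with @Day the day in question
    INSERT INTO dbo.TransactionsPerDay
        (AccountId, Type, [Day], TransactionCount, TotalCashAmount)
    SELECT AccountId, Type, @Day, COUNT(*), SUM(CashAmount)
    FROM dbo.Transactions
    WHERE Timestamp >= @Day AND Timestamp < DATEADD(day, 1, @Day)
    GROUP BY AccountId, Type;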

On our dataset, this compresses the transaction table by a factor of more than 300. Not just that: by separating reads from writes, we give ourselves much more breathing room and many more options, which makes me sleep so much better at night.

As you probably noticed, for real-time statistics on this data, we're still missing today's transactions in this table. Since today's transactions are a much smaller subset, and likely to live in SQL Server's cache, we can query both the batch table and the transaction table, and merge the results of the two queries. For our use case, resource usage and query response times have dropped significantly, especially for the largest accounts.
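
Merging both sources then looks something like this:

    SELECT SUM(TransactionCount) AS TransactionCount,
           SUM(TotalCashAmount) AS TotalCashAmount
    FROM
    (
        -- Everything up until today, pre-aggregated in the batch table
        SELECT TransactionCount, TotalCashAmount
        FROM dbo.TransactionsPerDay
        WHERE AccountId = @AccountId

        UNION ALL

        -- Today's transactions, straight from the hot transaction table
        SELECT COUNT(*), SUM(CashAmount)
        FROM dbo.Transactions
        WHERE AccountId = @AccountId
          AND Timestamp >= CAST(GETDATE() AS date)
    ) AS Combined;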

I don't see it happening in the near future, but in case the usage of these queries grows, we can still borrow more Lambda Architecture practices and push further.

Sunday, April 17, 2016

Notifications from an event log

User notifications are a feature that came as an afterthought, but they turned out to be rather easy to implement - without touching (read: breaking) existing functionality - thanks to having an immutable event log.

In the domain I'm working in at the moment, we often give users incentives to return to the website, or to extend their stay on the website. At first, these incentives were only communicated by email, which is a decent medium when you want users to return to the website. However, when you want to extend their stay on the website, you want to avoid users switching contexts between your website and their mail client. And as soon as they return to your website, you want to show them a crisp overview of all relevant calls to action. Since most calls to action map to a specific page, the list of notifications can serve as a one-click starting point, lowering the hurdle to browse to a relevant page.

Notifying a user is one thing. Another use case we wanted to support is dismissing notifications as soon as they are no longer relevant.

Two examples of when a notification might no longer be considered relevant:
  1. When a bonus is awarded to a user, he might ignore the notification and activate the bonus by directly browsing to the specific page.
  2. When a bonus is awarded to a user, he might not visit the website before the bonus expires.
In these cases, to avoid confusion and unsatisfied customers, we want to dismiss the notification automatically.

Let's say that we're going to implement notifications for bonuses. We have these types of events to work with.
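
A trimmed-down version of those events - the real ones carry more data, and the property names here are made up for the example:

    using System;

    public class BonusAwarded
    {
        public Guid BonusId { get; set; }
        public Guid UserId { get; set; }
        public decimal Amount { get; set; }
    }

    public class BonusActivated
    {
        public Guid BonusId { get; set; }
    }

    public class BonusExpired
    {
        public Guid BonusId { get; set; }
    }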

On the other hand, we have a set of commands that interact with notifications.

A notification has an identifier, references a user, contains some data, and most importantly can be linked to something.
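
As a sketch, the commands might look like this - the link is a plain string, so a notification can point to anything:

    using System;
    using System.Collections.Generic;

    public class Notify
    {
        public Guid NotificationId { get; set; }
        public Guid UserId { get; set; }
        public string TemplateId { get; set; }
        public Dictionary<string, string> Data { get; set; }
        public string LinkedTo { get; set; }
    }

    public class DismissNotification
    {
        public string LinkedTo { get; set; }
    }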

Working from an immutable event log, we can project the events to commands (to be dispatched eventually).

When a bonus is awarded to a user, we will notify the user, providing the template id and data that can be used inside of the template. In this example, the notification can be linked to a specific bonus, leveraging the bonus identifier.
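
As a sketch, with the dispatch delegate standing in for whatever mechanism eventually hands the commands over to their handlers:

    public class BonusNotificationsProjection
    {
        private readonly Action<object> _dispatch;

        public BonusNotificationsProjection(Action<object> dispatch)
        {
            _dispatch = dispatch;
        }

        public void When(BonusAwarded e)
        {
            _dispatch(new Notify
            {
                NotificationId = Guid.NewGuid(),
                UserId = e.UserId,
                TemplateId = "bonus-awarded",
                Data = new Dictionary<string, string>
                {
                    { "amount", e.Amount.ToString() }
                },
                // Link the notification to this specific bonus
                LinkedTo = e.BonusId.ToString()
            });
        }
    }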

The user might now see something like this.


Being aware of the events a bonus produces over its lifetime, and of their significance, we choose to dismiss the notification as soon as the bonus is activated or expired (leveraging the bonus identifier as the link again).
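
Extending the same projection sketch with two more handlers:

    public void When(BonusActivated e)
    {
        _dispatch(new DismissNotification { LinkedTo = e.BonusId.ToString() });
    }

    public void When(BonusExpired e)
    {
        _dispatch(new DismissNotification { LinkedTo = e.BonusId.ToString() });
    }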


Now it's up to the UX team (if you're lucky enough to have one) to decide how to visualize the difference between a read and a dismissed notification (if at all).

Monday, March 28, 2016

Functional one-liner for running totals in C#

While visualizing some data earlier this week, I had to compute the running total of a sequence of numbers.

For example, if the input sequence was [ 100; 50; 25 ] the result of the computation would be a new sequence of [ 100; 150; 175 ].

Muscle memory made me take a procedural approach, which works, but it made me wonder if I could get away with fewer lines of code and without mutable state.
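
Reconstructing it from memory, the procedural version looked something like this:

    using System.Collections.Generic;

    var values = new[] { 100, 50, 25 };

    var runningTotals = new List<int>();
    var total = 0;
    foreach (var value in values)
    {
        total += value;
        runningTotals.Add(total);
    }
    // runningTotals now contains [ 100; 150; 175 ]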

Although C# doesn't try very hard to push a functional approach, the BCL does give you some useful tools.

The first thing that comes to mind is IEnumerable's Aggregate function, which applies a function over each item in the sequence and passes the aggregated partial result along each time the function is applied. Each time the function is applied, we can take the last item of the aggregated partial result (if it exists), add the current item's value to it, and append that sum to the aggregated partial result.
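
Which translates into something like this:

    using System.Collections.Generic;
    using System.Linq;

    var values = new[] { 100, 50, 25 };

    var runningTotals = values.Aggregate(
        new List<int>(),
        (acc, value) =>
        {
            // Add the current value to the last partial sum, if there is one
            acc.Add(acc.Count == 0 ? value : acc[acc.Count - 1] + value);
            return acc;
        });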

Another, more compact - but less efficient - approach I could think of is using the index of each element in the sequence to take subsets and sum their values.
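
Compact indeed, but quadratic, since every element sums its whole prefix again:

    var runningTotals = values.Select((value, index) => values.Take(index + 1).Sum());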

Running out of ideas, I ported F#'s Scan function, which allows for more compact code without giving up efficiency. This function, similar to the Aggregate function, applies a function over each item in the sequence. However, instead of passing along the aggregated partial result each time the function is applied, the value of the last computation is passed in, finally returning the list of all computations.
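
A port might look like this. Note that F#'s Scan also yields the initial state; I drop it here so the output lines up exactly with the running totals:

    using System;
    using System.Collections.Generic;

    public static class EnumerableExtensions
    {
        public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
            this IEnumerable<TSource> source,
            TAccumulate seed,
            Func<TAccumulate, TSource, TAccumulate> func)
        {
            var state = seed;
            foreach (var item in source)
            {
                // Pass in the value of the last computation, yield the new one
                state = func(state, item);
                yield return state;
            }
        }
    }

    // Usage: new[] { 100, 50, 25 }.Scan(0, (acc, value) => acc + value)
    //        -> [ 100; 150; 175 ]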

With a bit of good will, C# allows you to be more functional too.

Friday, January 1, 2016

Consumed in 2015

I started keeping lists of everything I consume in 2014. I've continued this effort throughout 2015 and can now share the items I particularly enjoyed.

In 2015, I read 16 books and 3 papers, watched 3 movies and 4 shows, and listened to 1 audiobook and no podcasts. A lot less TV compared to 2014, but most of that time went to playing video games with some of my friends. Also a lot less time spent in the car listening to audiobooks or podcasts, since I'm now dropping off my girlfriend every day.


Going over the books I've read this year, here are my recommendations:
Most of the TV shows I've watched this year were more than great: