Sunday, December 29, 2013

2013's most read posts

This time of year, we're all very busy honoring traditions. This post is no exception (2009, 2010, 2011, 2012).

Today was probably the first time this year that I took some time to study my blog's Google Analytics metrics. If I keep last year's black swan posts out of the equation, traffic seems to be comparable to last year's. The most meaningful metrics to me are the number of subscriptions and the average time spent reading. Both of them increased - happy about that!

The actor model has been gaining more and more popularity. Next to addressing the more infrastructural concerns, it can also be used as a framework for modeling and reasoning about complex systems. Working with a team of mainframe programmers over the last year, I observed a few similarities in how they have been designing their systems. Actor Model in COBOL is the third most read post of 2013.

The second most read post was Not handling edge cases, making them explicit instead. Inspired by a talk with Greg Young at DDDX, I tried to create a narrative that uncovered an edge case in a green field project. Instead of handling the edge case, we use an event to make it explicit, allowing a human to intervene. This way we can go to market more quickly, with less code, and we might even end up with happier customers.

Number one is But I already wrote it. In this post I shared the motivation behind an argument I had with a colleague about whether to dump some code that did too much. Don't cherish your code. It's nothing but a means to an end; a side product of creating a solution. Aim for simple and lean solutions; nobody likes bulky software, nobody likes fighting complexity all day. And don't neglect the hidden cost of that extra small feature you're throwing in.

Thank you for reading!

Sunday, December 22, 2013

Databases are growing on me

I learned all about logical design of relational databases back in school; tables, columns, data types, views, normalization, constraints, primary keys, foreign keys... At the same time, I learned how to use SQL to put data in, and how to get it out again; INSERT INTO, SELECT, FROM, WHERE, JOIN, GROUP...

In the first project I worked on just out of school, we weren't doing anything interesting with databases; we didn't have that many users, or that much data. A database veteran on the team took it upon himself to maintain the schema and to provide stored procedures we could work with.

All that time, I was consciously very ignorant of the database. I had no idea what was in the box, and I didn't care either; databases were boring, applications were where the fun was at.

Since then, I have rarely worked with a team that had a dedicated role for database design. Why invest in another person when you can do without? Database basics are not rocket science; with 20% of the knowledge, you get very far. Especially now that it's often easier and cheaper to throw hardware at the problem.

That being said, it's a good idea to keep a DBA close. Time and time again I see them only being called in when it's too late, when the much needed improvements have become too far-reaching and expensive. No wonder DBAs are grumpy all the time.

Being exposed to databases more and more, I got to pick up a few things here and there - mostly cargo-cult best practices. It wasn't until last year that I got really curious about what was in the box. Working on an application with a decent amount of data crunching for a year forced me to open up the lid. My ventures in NoSQL land and overhearing discussions on Twitter between kellabyte, ayende, gregyoung, pbailis and others had much to do with it too.

On opening the lid, I found a lot more than I expected. It had never occurred to me how many interesting problems databases have to solve. Making a database execute a query and return results within milliseconds only looks easy on the surface. Memory, disk, CPU, caching, networking, protocols, concurrency, fault tolerance, data structures, transactions, compilation... it's all in there.

The book Physical Database Design and the SQLite technical documentation were the first good reads that helped me understand what was going on closer to the metal. From there, I now try reading a paper (or a reference) from the Readings in Database Systems collection once in a while. This collection of papers is supposed to contain the most important papers in database research. Maybe academic, but delicious brain food nonetheless - stretching my mind in ways I'm not used to.

Sunday, December 15, 2013

Buildstuff 2013

Last night, I returned from Vilnius, after seven intensive days of Buildstuff. After a few long months, I was looking forward to being influenced and inspired by concepts and experiences foreign to my day-to-day job - surrounded by people hungry to build better software. Because of the diversity of the program, the high level of the speakers and the presence of some familiar DDD-community faces, my expectations were easily met. The location and the low rates for tickets, transport and beer also make it a very reasonable investment.

I expect the videos to be uploaded over the next few weeks, so I will not bore you with session summaries. Instead I picked out some quotes that stood out for me in one way or another; because they were spot on, because they summarize a concept in just a few words, or because they challenged me and gave me food for thought.

Hope to see you again next year, Buildstuff.

Bertrand Meyer
Code contracts help generate tests. Pre-condition failed? Not a useful test. Post-condition failed? We found a bug.
Documenting assumptions outside of code led to the rapid unscheduled disassembly of the Ariane 5.
Why separate tests from your program? 
Rob Ashton
If I have to download the internet to use your module, I'm not going to use it.
Ian Cooper
The test is the unit, not the object under test. 
If you use tests to drive the implementation that couple to the inside, remove them once you're finished.
Devs often find outside-in testing too hard because they don't master the good setup techniques.
Joe Armstrong
A program is the most precise description of the problem that we have.
Paper systems are far more fault tolerant than what we're replacing them with.
Failure should not be handled in each piece of code; let it be handled externally.
Jonas Bonér
Isolate the failure, compartmentalize, manage failures locally, avoid cascading failure.
Synchronous RPC, distributed transactions and shared mutable state are the graveyard of distributed systems. 
Synchronous RPC gives you a very high abstraction that ignores all problems.
It's all about message passing; let that be the programming model.
Simon Brown
You can have micro-services without them being different physical artifacts. Pull them out when it pays off.
Package by component instead of by layer. Components can have layering too.
If your code doesn't match the architecture, it becomes very hard to do architectural refactoring.
Good architecture enables agility.
Torben Hoffmann
Technical arguments are never enough, you need economical ones.
You need to experience pain to have the willingness to change.
Aras Pranckevicius
Better hardware and software is the cheapest way to get more work done.
Stupidity adds up, intelligence multiplies.
Guidelines, not rules.
Alberto Brandolini
Watching the ceiling while making decisions is forbidden; visualize.
Time-box decisions. Be happy with a decision, although it's probably not the best solution.
The easiest way to remove crap from your system is not to put crap in.
Greg Young
An append-only model can be cached forever.
Wrong models cause massive accidental complexity. 
Pieter Hintjens
You are defined by what you say online, not by who you are.
You can't avoid getting your information stolen, but you can make it extremely expensive.
As information centralizes, it gets easier to spy on us. 

Sunday, December 8, 2013

Book review: Antifragile

When things are fragile, they break easily. We often see fragility as a bad thing and design things to be robust. But this isn't what we're really after either; things that are robust might be hard to break, but they're also hard to change, making them fail to adapt to new stressors over time. The model that we're really after is antifragility; when something is antifragile it will benefit from stressors and get better over time.

A cup is fragile; drop it and it breaks. A skyscraper on the other hand is robust; it is designed to resist the forces of nature; from hurricanes to earthquakes. A perfect example of antifragility is the human body; go to the gym, work your ass off, and you will grow stronger.

In this book, Taleb takes this model and applies it to a wide variety of systems; biological, medical, economic and political. Most concepts are easily relatable to software too.

While the core concepts could be summarized in a few pages, Taleb uses anecdotes, ancient texts, narratives and formulas to prove his point - resulting in a book 519 pages thick. Not all passages are a smooth read though. Some parts read as if they were written in one sitting, dumping everything that had been building up for years onto paper, without being proofread afterwards. Mix that with Taleb's extremely rich (= hard) vocabulary and you get reading sessions where your full concentration is a must - not suitable for after-work commutes. It took me six weeks to finish the book - well worth it though.

There are a lot of things that stuck with me. Instead of sharing those in my own words, I revisited some quotes I highlighted and copied them below. I hope they give a better feel of what to expect from the book.
Remember that you need a name for the color blue when you build a narrative, but not in action - the thinker lacking a word for "blue" is handicapped; not the doer. 
It would have taken a bit of heroic courage to justify inaction in a democracy where the incentive is to always promise a better promise than the other guy, regardless of the actual, delayed cost. 
It's much easier to sell "Look what I did for you" than "Look what I avoided for you." Of course a bonus system based on "performance" exacerbates the problem. 
The benefits of procrastination apply similarly to medical procedures: we saw that procrastination protects you from error as it gives nature a chance to do its job, given the inconvenient fact that nature is less error-prone than scientists. 
People who build their strength using these modern expensive gym machines can lift extremely large weights, show great numbers and develop impressive looking muscles, but fail to lift a stone; they get completely hammered in a street fight by someone trained in more disorderly settings.  
In project management, Bent Flyvbjerg has shown firm evidence that an increase in the size of projects maps to poor outcomes and higher and higher costs of delays as a proportion of the total budget. But there is a nuance: it is the size per segment of the project that matters, not the entire project. 
What survives must be good at serving some purpose that time can see but our eyes and logical faculties can't capture. 
We notice what varies and changes more than what plays a large role but doesn't change. We rely more on water than on cell phones but because water does not change and cell phones do, we are prone to thinking that cell phones play a larger role than they do. 
Corporations that are large today should be gone, as they have always been weakened by what they think is their strength: size.  
There are secrets to our world that only practice can reveal, and no opinion or analysis will ever capture in full. 
Recall that under nonlinearities, the simple statements "harmful" or "beneficial" break down: it is all in the dosage. 
When you think you have found a free lunch, say, steroids or trans fat, something that helps the healthy without visible downside, it is most likely that there is a concealed trap somewhere. Actually, my days in trading, it was called a "sucker's trade." 
What made medicine mislead people for so long is that its successes were prominently displayed, and its mistakes literally buried - just like so many other interesting stories in the cemetery of history. 
We are built to be dupes of theories. But theories come and go; experience stays. 
The English went further and had the families of the engineers spend time with them under the bridge after it was built. 
In the old days, privilege came with obligations. You want war? First in battle.
Never ask anyone for their opinion, forecast, or recommendation. Just ask them what they have - or don't - in their portfolio. 
Corporate managers have incentives without disincentives - something the general public doesn't quite get, as they have the illusion that managers are properly "incentivized." 
Third layer, the even more serious violation: companies trying to misrepresent the product they sell by playing with our cognitive biases, our unconscious associations, and that's sneaky. The latter is done by, say, showing a poetic picture of a sunset with a cowboy smoking and forcing an association between great romantic moments and some given product that, logically, has no possible connection to it. You seek a romantic moment and what you get is cancer. 
Don't be fooled by money. These are just numbers. Being self-owned is a state of mind.

Sunday, November 24, 2013

Observations over assumptions

I heard a story once about an engineer who worked on the Disneyland site when it opened in Anaheim, CA in 1955.  
A month before the park opened, the new grass and sod were being applied to the grounds as one of the last items to be completed before the big grand opening. The parking lot was some distance from the park's gates and required a lot of turf. However, the sidewalks had not been planned or constructed to accommodate visitors' walking patterns.
Before engineers would validate the proper placement of the sidewalks, a heated internal discussion grew among landscape designers and park developers over how and what to build.  One engineer suggested they allow visitors to walk on the grass for months in order to observe the paths they created themselves. Then they would build over those paths that showed the most foot traffic. 
Those sidewalks are still there today. 
One engineer's focus on meeting the guests' needs saved the park millions of dollars' worth of error and political positioning.
I found this story on Quora a few weeks ago, and thought it was a testament to how observing - instead of assuming - can save you a lot of effort, and will end up serving the users best.

In the first product I helped build, there had been lots and lots of requirements gathering before we got to build the first functionality. Once we rolled out the first pieces to the first set of users, they were disappointed to say the least - enraged actually; "This is not usable at all! Do you even know what you're doing?". Apparently the requirements had been made up by people higher in rank, without consulting those who would have to use the software on a day-to-day basis. Not the best move for getting buy-in from actual users, but I can't really blame them either; requirements are hard, and oftentimes you're just making stuff up as you go along - with the best of intentions. Being in a crisis, we got to learn a lot over the next few months. One of us was sent out to the customer's location a few times, and got to observe how they were trying to use our software. That information proved invaluable, and gave us enough to start shipping better and more useful software. Users started to feel empowered by our software, which eventually earned us their trust and approval.

Another example can be found in my current project - which is not that visible to users, but more behind the scenes. Instead of guessing performance targets, it's by observing production metrics that we're able to set realistic goals.

Observing instead of assuming usually leads to better results. Keep in mind that you can misinterpret what you're observing too, and that there often is no other option than to assume. You can still avoid overly expensive mistakes, by validating these assumptions as early as possible.

Sunday, November 17, 2013

Event storming workshop slides

At Euricom, we all retreat to headquarters quarterly for a day of sharing and learning. This time, a few others and I organized and facilitated an event storming workshop.

After a short introduction on event storming, participants were introduced to the domain of Cambio CarSharing - which is packed with behaviour. After that, seven groups of five (plus one domain expert) spread out across the office, and spent two slots of twenty minutes modeling the domain - with two extra slots for feedback.

Even after an afternoon of taxing sessions, people were willing to tap into their energy reserves, and ended up presenting great results.

You can find the slides (heavily based on Alberto's material) I used here, or embedded below.



If you're interested in running your first event storming workshop, I'd love to come over and help you get started.


Sunday, November 10, 2013

An event store with optimistic concurrency

Like I mentioned last week - after only five posts on the subject - there are still a great many event sourcing nuances left to be discovered.

My current event store implementation only supports a single user. Due to an aggressive file lock, concurrently accessing an aggregate will throw an exception. Can we allow multiple users to write to and read from an event stream? Also, what can we do about users making changes to the same aggregate; can we somehow detect conflicts and prevent those changes from being committed?

Multi-user

In the current version, concurrently appending to or reading from an aggregate's event stream will throw since the file will already be locked.
Parallel.For(0, 1000, (i) =>
{    
    _eventStore.CreateOrAppend(aggregateId, new EventStream(new List<IEvent>() 
    { 
        new ConcurrencyTestEvent() 
    }));
    _eventStore.GetStream(aggregateId);    
});
The exception looks like this: "System.IO.IOException: The process cannot access the file 'C:\EventStore\92f42a08-8583-4dcf-98a5-440b06f34719.txt' because it is being used by another process."

To prevent concurrent file access, we can lock code accessing the aggregate's event stream. Instead of using a global lock, we maintain a dictionary of lock objects; one lock object per aggregate.
lock (Lock.For(aggregateId))
{
    using (var stream = new FileStream(
        path, FileMode.Append, FileAccess.Write, FileShare.Read))
    {
        // Access the aggregate's event stream
    }
}

public class Lock
{
    private static ConcurrentDictionary<Guid, object> _locks = 
        new ConcurrentDictionary<Guid, object>();

    public static object For(Guid aggregateId)
    {
        var aggregateLock = _locks.GetOrAdd(aggregateId, new object());

        return aggregateLock;
    }
}     
Optimistic concurrency

Before committing changes, we want to verify that no other changes have been committed in the meantime. Those changes could have influenced the behaviour of our aggregate significantly; appending our changes without considering what might have happened in between could corrupt our aggregate's state.

One way to verify this is by using a number (or a timestamp - clocks, bah) to keep track of an aggregate's version. It's up to the client to tell us which version it expects when appending to a stream. To accommodate this, we need to change the contract of our event store.
public interface IEventStore
{
    void Create(Guid aggregateId, EventStream eventStream);

    void Append(Guid aggregateId, EventStream eventStream, int expectedVersion);

    ReadEventStream GetStream(Guid aggregateId);
}
Clients now need to pass in the expected version when appending to a stream. The result of reading a stream will include the current version.
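The ReadEventStream type isn't spelled out in this post; a minimal sketch - nothing more than a wrapper around the events and the version they add up to - could look like this.
public class ReadEventStream
{
    private readonly List<IEvent> _events;

    public ReadEventStream(List<IEvent> events, int currentVersion)
    {
        _events = events;
        CurrentVersion = currentVersion;
    }

    // The version a client passes back in as the expected version on its next append.
    public int CurrentVersion { get; private set; }

    public IEnumerable<IEvent> Events
    {
        get { return _events; }
    }
}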

In the event store, we now store an index with every event.


When appending to an event stream, we get the current version by reading the highest index - storing this in the aggregate's meta data would make for faster reads. If the current version doesn't match the expected version, we throw an exception.
var currentVersion = GetCurrentVersion(path);

if (currentVersion != expectedVersion)
    throw new OptimisticConcurrencyException(expectedVersion, currentVersion);

using (var stream = new FileStream(
    path, FileMode.Append, FileAccess.Write, FileShare.Read))
{
    using (var streamWriter = new StreamWriter(stream))
    {
        foreach (var @event in eventStream)
        {
            currentVersion++;

            streamWriter.WriteLine(new Record(
                aggregateId, @event, currentVersion).Serialized());
        }
    }
}
A test for that looks something like this.
try
{
    GivenEventStore();
    GivenAggregateId();
    GivenEventStreamCreated();
    WhenAppendingTwoEventStreamsWithTheSameExpectedVersion();
}
catch (OptimisticConcurrencyException ocex) 
{
    _expectedConcurrencyException = ocex;
}

[TestMethod]
public void ThenTheConcurrencyExceptionHasANiceMessage()
{
    var expected = "Version found: 3, expected: 1";
    var actual = _expectedConcurrencyException.Message;

    Assert.AreEqual(expected, actual);
}
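The exception type itself isn't shown here; a minimal sketch that produces the message asserted above could be the following - the extra properties are just a convenience.
public class OptimisticConcurrencyException : Exception
{
    public OptimisticConcurrencyException(int expectedVersion, int currentVersion)
        : base(string.Format(
            "Version found: {0}, expected: {1}", currentVersion, expectedVersion))
    {
        ExpectedVersion = expectedVersion;
        CurrentVersion = currentVersion;
    }

    public int ExpectedVersion { get; private set; }

    public int CurrentVersion { get; private set; }
}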
Reading the event stream doesn't change much; we now also read the current version, and return it with the event stream. 
var lines = File.ReadAllLines(path);

if (lines.Any())
{
    var records = lines.Select(x => Record.Deserialize(x, _assembly));
    var currentVersion = records.Max(x => x.Version);
    var events = records.Select(x => x.Event).ToList();

    return new ReadEventStream(events, currentVersion);
}

return null; 
And that's one way to implement optimistic concurrency. The biggest bottleneck in this approach is how we read the current version; having to read all the events to find the current version isn't very efficient.
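A naive GetCurrentVersion - reusing the Record helper from the read side and scanning every record for the highest index - could look something like this.
private int GetCurrentVersion(string path)
{
    // No file yet means no events; the first event appended gets version 1.
    if (!File.Exists(path))
        return 0;

    var lines = File.ReadAllLines(path);

    if (!lines.Any())
        return 0;

    // Reading every record just to find the highest index is exactly the bottleneck mentioned above.
    return lines
        .Select(x => Record.Deserialize(x, _assembly))
        .Max(x => x.Version);
}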

Transactional behaviour is also missing. I've been thinking about adding a COMMIT flag after appending a set of events, and using that to resolve corruption on reads, or is this fundamentally flawed? 
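Roughly, that idea could look like this: write a commit marker as the last line of every append, and on reads drop any trailing records that were never followed by one - a sketch of the idea, not a vetted design.
public static class Commits
{
    private const string CommitMarker = "COMMIT";

    // Called right after the batch of events has been written.
    public static void WriteMarker(StreamWriter streamWriter)
    {
        streamWriter.WriteLine(CommitMarker);
    }

    // Called on reads: keep only records that were followed by a commit marker;
    // anything after the last marker belongs to an interrupted append.
    public static string[] CommittedRecords(string[] lines)
    {
        var lastCommit = Array.LastIndexOf(lines, CommitMarker);

        return lines
            .Take(lastCommit + 1)
            .Where(x => x != CommitMarker)
            .ToArray();
    }
}
On reads, the result of File.ReadAllLines would then be passed through CommittedRecords before deserializing.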

Sunday, November 3, 2013

Event source all the things?

Having covered projections last week, I think I have come full circle in these posts that turned out to be a small preliminary series on event sourcing. Even though there is still a vast number of nuances to discover, I think I've captured the gist of it. Even without running an event sourced system in production, I feel as if I have some idea of what event sourcing can bring to the table.

Event sourcing gives you a complete history of events that caused an aggregate to be in its current state. In some scenarios this will add an enormous amount of value, in other scenarios it will give you nothing - it might even steal time and effort.

The first thing you do - before even considering implementing event sourcing - is talk to your business. Do they feel as if events are a natural way to represent what's going on in their domain? Event sourcing is a lot more than just a technical implementation detail; discovering and understanding all of what goes on in a domain is a big investment - from both sides. Is it worth the trouble?

In my first job I worked on software for fire departments. I just now realize in how many bits of our solution event sourcing could have helped us:
  • the life cycle of a vehicle assigned to an emergency: vehicle dispatched, vehicle left the station, vehicle en route, vehicle arrived on the scene, vehicle back in the station...
  • a person's career: person was promoted, person was detached to another station, person learned a new skill...
  • a shift's schedule: person attached to unit, person returned to person pool, unit dispatched...
This data had to be made available in a set of diverse read models. Getting the data out was complex at times, often even impossible. A lot of these changes had to be propagated to external systems; there was no way to get that info out in real-time, and external systems had no notion of what happened.

In one of the functionalities of a system I'm currently working on, users also wanted to know what happened in the past, but for completely different reasons. Being in a financial context, they wanted to know who was responsible for changing system settings. Here it's not an event log they need, but a simple audit trail.

If it is just a passive log your business wants, you can get away with cheaper alternatives; a command journal, an audit trail and so on.

Benefits

Event sourcing goes hand-in-hand with Domain Driven Design. Events are a great tool to go from a structural model to a behavioural model, helping you to capture the true essence of a domain model.

Building and maintaining an event store should be doable. It's an append-only data model, storing serialized DTO's with some meta data. This makes - compared to ORM's and relational databases - tooling easier as well.

In traditional systems, you have to keep a lot of things in your head at once; how do I write my data, but also how do I query my data, and more importantly how do I get my data out in all these different use cases without making things too hard. In event sourced systems, separating writes from reads makes for more granular bits, easing the cognitive load.

Events can be projected into anything: a relational database, a document store, memory, files... This allows you to build a read model for each separate use case, while also giving you a lot of freedom in how you're going to persist them.

You can replay projections, rebuilding a read model from scratch. Forget about difficult data migrations.

Testing feels consistent and very complete. A test will assert that all the expected events were raised, but will also implicitly assert that unexpected events were not raised. Testing projections is also straightforward.
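To make that concrete, a test against the account aggregate from the earlier posts in this series could look something like this - assuming a withdrawal of 3000 exceeds the maximum amount policy used there.
[TestMethod]
public void WithdrawingOverTheMaximumOnlyRaisesWithdrawalAmountExceeded()
{
    var account = new Account(Guid.NewGuid());

    account.Withdraw(new Amount(3000));

    // Comparing against everything that was recorded also implicitly
    // asserts that no unexpected events (such as AmountWithdrawn) were raised.
    var recorded = account.RecordedEvents();

    Assert.AreEqual(1, recorded.Count());
    Assert.IsInstanceOfType(recorded.Single(), typeof(WithdrawalAmountExceeded));
}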

Events provide a natural way of integrating with other systems. Committed events can be published to external subscribers.

Troubleshooting becomes easier since a developer can copy an event stream from production and replay it locally - reproducing the exact issue without jumping through hoops to get the system into a specific state.

Instead of patching corrupted production data directly, you can send a compensating event or fix the projection and replay everything. This way nothing gets lost, and consistency between code and outcome is guaranteed.

Downsides

Defining events is hard. Defining good events takes a lot of practice and insight. If you're forcing a structural model into a behavioural one, it might even be impossible. So don't even consider turning CRUD into an event sourced model.

There are a few places where you need to be on the lookout for performance bottlenecks. Event streams of long lived aggregates might grow very big. Loading a giant event stream from a data store might take a while - snapshots can help here. Projecting giant event streams might get you into trouble too - how long will it take to rebuild your read model, and will it even fit into memory? Making projections immediately consistent might become a problem if you have a lot of them. Parallelization or giving up on immediate consistency might bring solace.

Events don't change, versioning might get awkward. Are you going to create a new event type for each change, or will you relax deserialization? Or maybe you want to implement event migrations?

Since you're persisting multiple models; events and one or more read models, you're going to consume more storage, which will cost you.

Adoption in the wild

Although there are - from a business and engineering perspective - some good arguments to be made for event sourcing, those arguments only apply to a modest percentage of projects. Even when there's a strong case to be made for event sourcing, there are very few people with actual experience implementing an event sourced system, and prescriptive frameworks that you can just drop into a project and feel good about are lacking. Most won't even care about event sourcing to start with, but even if they do, it's a fight upstream; it introduces a risk most might not be comfortable with.

Having said that, there are some really good projects out there that are steadily gaining popularity and maturity. Pioneers in the field are sharing and documenting their experiences, lowering the barriers for others. Things are moving for sure.

As always, event sourcing is not a paradigm to blindly apply to each and every scenario, but definitely one worth considering.

Since I'm not running any of it in production, tell me what I'm missing, there must be more things that turn out to be harder than they sound at first right? If you're not running it in production, but thinking about it, what are some of your concerns? What are your predictions for the future of event sourcing?

Sunday, October 27, 2013

Event projections

In my first two posts on event sourcing, I implemented an event sourced aggregate from scratch. After being able to have an aggregate record and play events, I looked at persisting them in an event store. Logically, the next question is: how do I query my aggregates, how do I get my state out?

In traditional systems, write and read models are not separated; they are one and the same. Event sourced systems on the other hand have a write model - event streams - and a separate read model. The read model is built from events committed to the write model; events are projected into one or more read models.


An interface for a projection could look like this.
public interface IProjection {
    void Handle(EventStream eventStream);                     
}  
A projection takes in an event stream, and projects it to some read model.

A read model can be anything; a cache, a document store, a key value store, a relational database, a file, or even some evil global state.
public class EvilStatisticsReadModel {
    public static int WithdrawalAmountExceededCount { get; set; }

    public static int AmountDepositedCount { get; set; }
}
In this model, we want to maintain statistics of events that happened. For that to happen, we need to define a projection of our event stream.
public class ProjectionsToEvilStatisticsReadModel : IProjection {
    public void Handle(EventStream eventStream) {
        foreach (var @event in eventStream)
            When((dynamic)@event);
    }

    public void When(WithdrawalAmountExceeded @event) {
        EvilStatisticsReadModel.WithdrawalAmountExceededCount++;
    }

    public void When(AmountDeposited @event) {
        EvilStatisticsReadModel.AmountDepositedCount++;
    }

    // Events this read model doesn't care about (such as AmountWithdrawn) end up here;
    // without this catch-all the dynamic dispatch above would throw when no overload matches.
    public void When(object @event) { }
}
If we now let this projection handle an event stream, our read model will be kept up-to-date.
[TestMethod]
public void ReadModelIsKeptUpToDateWhileProjectingTheEventStream() {
    var events = new List<IEvent>() {
        new WithdrawalAmountExceeded(new Amount(3000)),
        new AmountDeposited(new Amount(300)),
        new AmountDeposited(new Amount(500)),
        new AmountWithdrawn(new Amount(100))
    };
    var stream = new EventStream(events);

    new ProjectionsToEvilStatisticsReadModel().Handle(stream);

    Assert.AreEqual(1, EvilStatisticsReadModel.WithdrawalAmountExceededCount);
    Assert.AreEqual(2, EvilStatisticsReadModel.AmountDepositedCount);    
}
One could argue that all of this is too much - not worth the effort. Where you first just persisted the structure of an aggregate, and could query that same structure, you now have to persist events first, and then write projections that maintain separate read models that can be queried.

You have to look beyond that though. Those that have done any serious work on a traditional stack have felt the pain of migrations, complex queries that take up three pages, obscure stored procedures that run for hours, optimizing while having to consider a handful of different use cases, finding the balance between write and read performance, database servers that can't handle the load on special events, expensive licenses and so on. While these first few concerns are mostly technical, personally I'm often overwhelmed by how many concepts these designs force you to keep in your head all at once.

Separating reads from writes using event sourcing might bring some relief. Reducing cognitive overload by separating responsibilities into smaller, more granular bits might be the only argument you need. However, there's a lot more. Running an event store should be low-maintenance; it's an append-only data model storing simple serialized DTO's with some meta data - forget about big migrations (not completely though), schemas, indexes and so on. Even if you project into a relational database, being able to re-run projections should make migration scripts and versioning avoidable. An event can be projected into multiple read models, allowing you to optimize per use case, without having to take other use cases into account. Since it should be easy to rebuild read models, they can be stored in cheap and volatile storage - think key-value store, in-memory and so on, allowing for crazy fast reads.

Letting go of the single-model dogma seems to enable so much more, giving you a whole new set of possibilities. Another extremely useful use case that suddenly becomes a lot easier to support is business intelligence; when business experts think of new ways to look at the past, you just create a new projection and project events from day one. Getting statistics of how your users are using your system doesn't sound that hard now, does it?
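A replay could be as simple as feeding every stored stream through the new projection - a rough sketch, assuming you have a way to enumerate the stored aggregate ids (the file-based event store from an earlier post could simply list its files).
public class ProjectionReplayer {
    private readonly IEventStore _eventStore;

    public ProjectionReplayer(IEventStore eventStore) {
        _eventStore = eventStore;
    }

    // Feed every stored stream through a brand new projection,
    // rebuilding its read model from the very first event onwards.
    public void Replay(IProjection projection, IEnumerable<Guid> aggregateIds) {
        foreach (var aggregateId in aggregateIds)
            projection.Handle(_eventStore.GetStream(aggregateId));
    }
}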

One of the obvious drawbacks - next to writing a bit more, admittedly boring, code - is that storage costs will increase; you are now persisting the same data in multiple representations. But storage is cheap, right? Maybe money isn't an issue, but what about performance? It's slower to do three writes instead of one, right? For a lot of scenarios this won't be much of an issue, and if it is, there is a lot of room for optimizations when doing projections: parallelization, eventual consistency and so on.

Next week: event source all the things? 

Sunday, October 20, 2013

An event store

Last week, I implemented an event sourced aggregate from scratch. There I learned that there isn't much to a naively implemented event sourced aggregate; it should be able to initialize itself from a stream of events, and it should be able to record all the events it raises.
public interface IEventSourcedAggregate : IAggregate {
    void Initialize(EventStream eventStream);

    EventStream RecordedEvents();
}
The question I want to answer today is: how do I persist those event sourced aggregates?

In traditional systems, aggregate persistence is not a trivial topic. Relational databases in particular have a reputation for making things hard on us. Even though tools such as ORM's have tried to make the gap between the relational and object oriented models as small as possible, there is still a lot of friction associated with the notorious impedance mismatch.
The last two years I have done some work using one of the popular NoSQL variants: a document store. In this paradigm, each aggregate materializes into a single document. Structure, constraints and referential integrity are not enforced by the database, but by code. The advantage of relaxing consistency at the database is that it makes it easier to scale beyond a single machine, and that developers feel more empowered. Giving up on consistency guarantees is not acceptable for every system though. Again, pick the right tool for the job.
What both paradigms have in common is that they both focus on structure instead of behaviour.

Event sourced systems on the other hand, don't care about the structure of an aggregate, but about the events that caused the aggregate to be in its current state. Only having to store events - which are represented as DTO's - makes persistence and tooling much easier compared to traditional systems.

There are three things a minimalistic event store should be able to do:
  1. Store a new event stream 
  2. Append to an existing event stream
  3. Retrieve an existing event stream
An interface for that could look like this.
public interface IEventStore {
    void CreateOrAppend(Guid aggregateId, EventStream eventStream);

    EventStream GetStream(Guid aggregateId);
}
Notice that there is no update or delete - events happen, we can't jump in a time machine and alter the past. This allows us to get by with an append-only data model. Can you imagine how much easier to implement, optimize and distribute this must be compared to traditional models?

As an exercise, I took the interface I just defined and implemented a durable, non-transactional, non-scalable (up to 4294967295 streams), single-user event store that persists event streams in raw text files. Each record on disk represents a serialized event with a tiny bit of metadata. 
public class FileEventStore : IEventStore {    
    private const string Dir = @"C:\EventStore";            

    public void CreateOrAppend(Guid aggregateId, EventStream eventStream) {
        EnsureDirectoryExists();

        var path = EventStoreFilePath.From(Dir, aggregateId).Value;

        using (var stream = new FileStream(
            path, FileMode.Append, FileAccess.Write, FileShare.None))
        {
            using (var streamWriter = new StreamWriter(stream))
            {
                streamWriter.AutoFlush = false;
                foreach (var @event in eventStream)
                    streamWriter.WriteLine(
                        new Record(aggregateId, @event).Serialized());
            }
        }
    }
    
    public EventStream GetStream(Guid aggregateId) {           
        var path = EventStoreFilePath.From(Dir, aggregateId).Value;

        if (!File.Exists(path))
            return null;

        var lines = File.ReadAllLines(path);
        var events = lines
            .Select(x => Record.Deserialize(x))
            .Select(x => x.Event)
            .ToList();

        if (events.Any())
            return new EventStream(events);

        return null;
    }

    private void EnsureDirectoryExists()
    {
        if (!Directory.Exists(Dir))
            Directory.CreateDirectory(Dir);
    }
}
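I left the Record class out; a minimal sketch - one line per event, with the aggregate id and the event type as meta data, and a JSON serializer (Json.NET here) doing the heavy lifting - could look like this. The exact wire format is of little importance.
public class Record {
    private const char Separator = '|';

    public Record(Guid aggregateId, IEvent @event) {
        AggregateId = aggregateId;
        Event = @event;
    }

    public Guid AggregateId { get; private set; }

    public IEvent Event { get; private set; }

    // One line per event: the aggregate id, the CLR type of the event and its JSON payload.
    public string Serialized() {
        return string.Join(Separator.ToString(),
            AggregateId,
            Event.GetType().AssemblyQualifiedName,
            JsonConvert.SerializeObject(Event));
    }

    public static Record Deserialize(string line) {
        var parts = line.Split(new[] { Separator }, 3);
        var eventType = Type.GetType(parts[1]);
        var @event = (IEvent)JsonConvert.DeserializeObject(parts[2], eventType);

        return new Record(Guid.Parse(parts[0]), @event);
    }
}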
A long-ish test proves that I can create a stream, append to it and read it again without losing any data.
[TestMethod]
public void EventStoreCanCreateAppendAndRetrieveEventStreams() 
{
    var eventStore = new FileEventStore();

    var aggregateId = Guid.NewGuid();
    var account = new Account(aggregateId);
    account.Deposit(new Amount(3000));
    account.Withdraw(new Amount(400));    
    
    Assert.AreEqual(2, account.RecordedEvents().Count());
    Assert.AreEqual(new Amount(2600), account.Amount);

    eventStore.CreateOrAppend(aggregateId, account.RecordedEvents());
    var eventStream = eventStore.GetStream(aggregateId);

    Assert.AreEqual(2, eventStream.Count());

    var anotherAccount = new Account(aggregateId);
    anotherAccount.Initialize(eventStream);

    Assert.AreEqual(new Amount(2600), anotherAccount.Amount);

    anotherAccount.Withdraw(new Amount(200));

    Assert.AreEqual(new Amount(2400), anotherAccount.Amount);
    Assert.AreEqual(1, anotherAccount.RecordedEvents().Count());

    eventStore.CreateOrAppend(aggregateId, anotherAccount.RecordedEvents());

    var finalEventStream = eventStore.GetStream(aggregateId);
    Assert.AreEqual(3, finalEventStream.Count());
}
This produced the following artifact on disk.


While this implementation is far from ideal - dangerous, really - it does show that implementing a minimalistic event store is doable - especially if you can build on top of existing data stores.

Doable, but not trivial. Greg Young - having actually implemented an event store, on the CLR too - recently shared some invaluable insights into what it takes to build a real-world event store.
I have always said an event store is a fun project because you can go anywhere from an afternoon to years on an implementation. 
I think there is a misunderstanding how people normally use an event stream for event sourcing. They read from it. Then they write to it. They expect optimistic concurrency from another thread having read from then written to the same stream. This is currently not handled. This could be handled as simply as checking the expected previous event but this wouldn't work because the file could be scavenged in between. The way this is generally worked around is a monotonically increasing sequence that gets assigned to an event. This would be relatively trivial to add. 
The next issue is that I can only read the stream from the beginning to the end or vice versa. If I have a stream with 20m records in it and I have read 14m of them and the power goes out; when I come back up I want to start from 14m (stream.Position = previous; is a Seek() and 14m can be very expensive if you happen to be working with files the OS has not cached for you). This is a hugely expensive operation to redo and the position I could have saved won't help me as the file could get compacted in between. To allow arbitrary access to the stream is a bit more difficult. The naive way would be to use something like a sorted dictionary or dictionary of lists as an index but you will very quickly run out of memory. B+Trees/LSM are quite useful here. 
Even with the current index (stream name to current position) there is a fairly large problem as it gets large. With 5m+ streams you will start seeing large pauses from the serializing out the dictionary. At around 50m your process will blow up due to 1gb object size limit in CLR
Similar to the index issue is that with a dictionary of all keys being stored in memory and taking large numbers of writes per second it is quite likely you will run out of memory if people are using small streams (say I have 10000 sensors and I do a stream every 5 seconds for their data to partition). Performance will also drastically decrease as you use more memory due to GC.
A more sinister problem is the scavenge / compaction. It stops the writer. When I have 100mb of events this may be a short pause. When I have 50gb of events this pause may very well turn into minutes. 
There is also the problem of needing N * N/? disk space in order to do a scavenge (you need both files on disk). With write speeds of 10MB/second it obviously wouldn't take long to make these kinds of huge files especially in a day where we consider a few TB to be small. The general way of handling this is the file gets broken into chunks then each chunk can be scavenged independently (while still allowing reads off it). Chunks can for instance be combined as well as they get smaller (or empty). 
Another point to bring up is someone wanting to write N events together in a transactional fashion to a stream. This sounds like a trivial addition but its less than trivial to implement (especially with some of the other things discussed here). As was mentioned in a previous thread a transaction starts by definition when there is more than one thing to do. 
There are decades worth of previous art in this space. It might be worth some time looking through it. LSM trees are a good starting point as is some of the older material on various ways of implementing transaction logs.
Playing with Greg's event store is something that has been on my list for a long time.

What is your experience with implementing an event store?

Next week: but how do we query our aggregates now?

Sunday, October 13, 2013

An event sourced aggregate

Last week I shared my theoretical understanding of event sourcing. Today, I want to make an attempt at making that theory tangible by implementing an event sourced aggregate.

In traditional systems, we only persist the current state of an object.


In event sourced systems, we don't persist the current state of an object, but the sequence of events that caused the object to be in the current state.


If we want an aggregate to be event sourced, it should be able to rebuild itself from a stream of events, and it should be able to record all the events it raises.
public interface IEventSourcedAggregate : IAggregate
{
    void Initialize(EventStream eventStream);

    EventStream RecordedEvents();
}
Let's implement the example aggregate we used last week: an account. An account owner can deposit and withdraw an amount from his account. There is a maximum amount policy for withdrawals though.
public class Account : IEventSourcedAggregate {
    private readonly Guid _id;

    public Account(Guid id) {
        _id = id;
    }

    public Guid Id { get { return _id; } }

    public void Initialize(EventStream eventStream) { 
        throw new NotImplementedException();
    }

    public EventStream RecordedEvents() { 
        throw new NotImplementedException(); 
    }
    
    public void Deposit(Amount amount) { }
    
    public void Withdraw(Amount amount) { }
}
Apart from the Initialize and RecordedEvents methods, our aggregate's facade hasn't changed. We still have a Deposit and a Withdraw operation, just like we would in a traditional aggregate. How those two methods get implemented differs though.

When we deposit or withdraw an amount, we want to apply events instead of changing the state directly. When an event gets applied, its handler is invoked first, and the event is then recorded.
public void Deposit(Amount amount) {
    Apply(new AmountDeposited(amount));
}

public void Withdraw(Amount amount) {
    if (amount.IsOver(AmountPolicy.Maximum))     {
        Apply(new WithdrawalAmountExceeded(amount));

        return;
    }

    Apply(new AmountWithdrawn(amount));
}

private void Apply(IEvent @event) {
    When((dynamic)@event);
    _eventRecorder.Record(@event);
}
An event recorder is a small object - held by the aggregate in a private _eventRecorder field - that keeps track of the recorded events.
public class EventRecorder
{
    private readonly List<IEvent> _events = new List<IEvent>();

    public void Record(IEvent @event) {
        _events.Add(@event);
    }

    public EventStream RecordedEvents() {
        return new EventStream(_events);
    }
}
This object will be used to have our aggregate return a stream of recorded events.
public EventStream RecordedEvents() {
    return _eventRecorder.RecordedEvents();
}
We can now also implement initializing the aggregate from a stream of events.
public void Initialize(EventStream eventStream) {
    foreach (var @event in eventStream)
        When((dynamic)@event);
}
Here too, event handlers are invoked by using the dynamic run-time to find the best overload.

It's the event handlers that will change the aggregate's state. In this example, they can be implemented like this.
private void When(AmountWithdrawn @event) {
    _amount = _amount.Subtract(@event.Amount);
}

private void When(AmountDeposited @event) {
    _amount = _amount.Add(@event.Amount);
}

private void When(WithdrawalAmountExceeded @event) { }
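The events themselves are simple DTOs, named in the past tense. For this example they could look something like this - with IEvent assumed to be nothing more than a marker interface.
public interface IEvent { }

public class AmountDeposited : IEvent {
    public AmountDeposited(Amount amount) {
        Amount = amount;
    }

    public Amount Amount { get; private set; }
}

public class AmountWithdrawn : IEvent {
    public AmountWithdrawn(Amount amount) {
        Amount = amount;
    }

    public Amount Amount { get; private set; }
}

public class WithdrawalAmountExceeded : IEvent {
    public WithdrawalAmountExceeded(Amount amount) {
        Amount = amount;
    }

    public Amount Amount { get; private set; }
}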
A test verifies that when I invoke operations on the aggregate, all the events are recorded, and the state has changed. When I use those recorded events to rebuild the same aggregate, we end up with the same state.
[TestMethod]
public void ICanReplayTheEventsAndHaveTheStateRebuilt() {
    var account = new Account(Guid.NewGuid());

    account.Deposit(new Amount(2500));
    account.Withdraw(new Amount(100));
    account.Withdraw(new Amount(200));

    Assert.AreEqual(3, account.RecordedEvents().Count());
    Assert.AreEqual(new Amount(2200), account.Amount);

    var events = account.RecordedEvents();

    var secondAccount = new Account(Guid.NewGuid());
    secondAccount.Initialize(events);

    Assert.AreEqual(new Amount(2200), secondAccount.Amount);
    Assert.AreEqual(0, secondAccount.RecordedEvents().Count());
}
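The Amount value object (and the AmountPolicy it gets checked against) didn't make it into this post either; a minimal sketch with the operations and value equality used above could look like this - the concrete maximum is made up.
public class Amount {
    private readonly decimal _value;

    public Amount(decimal value) {
        _value = value;
    }

    public Amount Add(Amount amount) {
        return new Amount(_value + amount._value);
    }

    public Amount Subtract(Amount amount) {
        return new Amount(_value - amount._value);
    }

    public bool IsOver(Amount amount) {
        return _value > amount._value;
    }

    // Value equality is what makes Assert.AreEqual(new Amount(2200), account.Amount) pass.
    public override bool Equals(object obj) {
        var other = obj as Amount;

        return other != null && other._value == _value;
    }

    public override int GetHashCode() {
        return _value.GetHashCode();
    }
}

public static class AmountPolicy {
    public static Amount Maximum { get { return new Amount(1000); } }
}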
And this is all there is to an event sourced aggregate.

For this exercise I tried to keep the number of concepts low. Many will have noticed that extracting a few concepts would benefit re-use and explicitness.

Also, using the DLR to invoke the correct event handlers might be frowned upon; it's not the most performant approach, each event must have a handler, and when a handler is missing the exception is not pretty. Experienced readers will also have noticed that concepts such as versioning and snapshots are not implemented yet. I hope limiting the number of concepts and indirections made this blog post easier to read.

Any thoughts on this implementation?

Next week: where do I persist these events?

Sunday, October 6, 2013

My understanding of event sourcing

I've been studying event sourcing from a distance for a little over a year now; reading online material and going through some of the excellent open source code out there. Unfortunately, there would be no value in introducing it into my current project - it would even be a terrible idea - so I decided to satisfy my inquisitiveness by consolidating and sharing my understanding of the concept.

Domain events

An event is something that happened in the past.

Events are described as verbs in the past tense. For example: amount withdrawn, amount deposited, maximum withdrawal amount exceeded. Listen for them when talking to your domain experts; events are as much a part of the ubiquitous language as commands, aggregates, value objects etc...

Once you've captured a few events, you will notice how these concepts have always implicitly been there, but by making them explicit you introduce a whole new set of power tools to work with.

Event sourcing

Having defined domain events one more time, we can now look at event sourcing. By the name alone, it should be obvious events are going to play the lead role.

In traditional systems, we only persist the current state of an object. In event sourced systems, we don't persist the current state of an object, but the sequence of events that caused the object to be in the current state.

In traditional systems, every time a change happens, we retrieve the old state, mutate it, and store the result as our current state. In this example, only the last column would be persisted.

Old amount | Command        | Current amount
           | CreateAccount  | $0
$0         | Deposit $2000  | $2000
$2000      | Withdraw $100  | $1900
$1900      | Withdraw $500  | $1400
$1400      | Withdraw $2000 | $1400
$1400      | Withdraw $300  | $1100

In event sourced systems on the other hand, we store the changes that happened - the second column, not the current state. To arrive at the current state again, we take all these events - and replay them.

Command        | Event                               | Current amount
CreateAccount  | AccountCreated                      | $0
Deposit $2000  | Deposited $2000                     | $2000
Withdraw $100  | Withdrawn $100                      | $1900
Withdraw $500  | Withdrawn $500                      | $1400
Withdraw $2000 | Maximum withdrawal amount exceeded! | $1400
Withdraw $300  | Withdrawn $300                      | $1100

Notice how we already gain better insights into what's happening by seeing an explicit maximum amount exceeded event.

Next time: what does this look like in code?

Feel free to complement and correct my understanding of event sourcing.

Sunday, September 29, 2013

CZ The Trilogy

Over the weekend, we visited the Czech Republic for the third time (one and two). It's mostly chance that sends us that way every time though. This time, we were invited by friends to stay over with them at their family's house - a family who have made it their job to lead guided tours through Prague.

We left Thursday right after work, hoping to get there in eight hours. A decent traffic jam, a missed exit, and some bad map reading decided otherwise; it added three hours to the trip.

We stayed in a small town just twenty minutes outside of Prague, which has been able to preserve all of its rural character. We took one day to go out hiking, and were surprised by the local fauna; we were lucky enough to spot wild deer, wild boars and a viper (that last one already crushed to death though). Next to those species, the forest was heavily occupied by locals gathering mushrooms - must be the season?


We used the remaining two days to stroll through Prague, and to live like kings getting the most out of the favorable prices. Food and drinks cost less than half of what they do in Belgium. There is a zero-tolerance alcohol policy for drivers though - I got pulled over too, so they seem to be pretty serious about it.




Sunday, September 22, 2013

Actor Model in COBOL

In an Actor system, each Actor acts as a self-contained and autonomous component. An Actor can only communicate with other Actors by exchanging messages - they are not allowed to share state. Messages are handled asynchronously, and their ordering is nondeterministic. The location of Actors should be transparent; they can live on the same machine or be spread across a distributed system. These properties make the Actor Model a great fit for parallel and distributed computing.

Even without considering parallelism and distribution, the Actor Model appeals to me. If you take an existing system, and make each aggregate in that system an Actor, what would the impact be? You can get rid of all the messaging and queuing infrastructure; messages and asynchrony are now first class citizens. Where you used to need discipline to abide by the aggregate rules of thumb - modify one aggregate per transaction, no references to other aggregates, Tell Don't Ask... - the very nature of Actors will guide you into doing the right thing.
Next to these implementation concerns, the model itself can be used as a framework for modeling and reasoning about complex systems. Once they are well educated on the constraints, it must come naturally to domain experts as well.

Having worked with a team of mainframe programmers over the last year, it recently dawned on me that the way they have designed their systems over the years is compatible with a good number of the Actor Model's laws.


Composition

A good thing about COBOL seems to be that it's nearly impossible to write maintainable big programs, so you're forced to decompose your program into smallish autonomous components - into jobs.

Messages

Communication between these jobs happens by passing flat files around - the only format that's supported out-of-the-box.
Messages come in, and new messages go out. Jobs never mutate the incoming payload; a new copy is created instead - pipes and filters.
Folders serve as queues, allowing files to be processed asynchronously and nondeterministically.

Staying clear of mutating messages makes debugging extremely easy; you'll never hear someone on the team asking for reproduction steps - they just restore the production archives locally.

Addresses

Actors send messages to other Actors using their addresses. This can be a memory or disk address, a network address, email address, whatever really. In mainframe land, file system paths serve as addresses.

No shared state

In general they stay away from jobs sharing state; the default is to lock files exclusively, so sharing them is highly impractical. Even most static data gets synchronized instead of shared - banking reference data, customer addresses, configuration etc...

Scheduling

A scheduler sits on top of all these jobs. Its responsibility is to start a job when a new file arrives. If a job fails, the scheduler acts as a supervisor and will notify operations, which will investigate the issue - probably look at what's on the file system, and use the same scheduler to restart the failed job. Notice that one failing job doesn't impact other jobs.

All of this gives you an automated, highly observable and fault-tolerant system.

Although COBOL remains a horrible language, mainframe systems do have their strengths. There must be some good reason so many core business functions are still running on mainframes, right? Maybe the similarities with the Actor Model are far-fetched and merely a figment of my imagination. Feel free to share your thoughts.

Sunday, September 15, 2013

Slides from my talk on the Ubiquitous Language

I just returned from our yearly Euricom retreat. This year, all forty of us got to spend four days in the South of Spain. Where we had longish sessions and a few workshops last year, we experimented with shorter talks this year - a la lightning talks, TEDx style.

This format made it possible for everyone to speak, but also forced the speakers to keep the scope of their talks focused, and to organize the information in a way that attendees could get the gist of it in only twelve minutes. This makes for high-energy talks designed to pique one's interest, to share useful tips, to plant a seed or to pitch an idea.
Covering more than just technical ground made the topics extremely diverse: from query tuning to empathy, from automated testing to how to explain to your kids what you do for a living, from personal kanban to juggling a diabolo. Going back and forth between these technical and less technical sessions kept my brain from becoming oversaturated.

Definitely an experiment that only yielded positive results; we will be using this format more frequently when organizing internal events.

Initially, I planned on doing a session on the DDD strategic patterns; the ubiquitous language, subdomains, bounded contexts, context mapping... but I couldn't capture all of that in a meaningful way in less than twelve minutes. That's why I started over, focusing on the ubiquitous language alone.

You can find my slides embedded below or on Slideshare.


Friday, September 6, 2013

The first DDDBE Modellathon

On our way back from DDD Exchange, heavily influenced by yet another immersive DDD experience, we searched for ways to keep the momentum going. Sure, we met up regularly for CQRS beers, but we felt that we could do more, better. That's when we coined the term modellathon, something like a hackathon, but instead of writing code, we would build models.

Thanks to the efforts of Mathias, Stijn and Yves, Tuesday marked the first get-together of the Belgian DDD user group in its official form. Combell was kind enough to provide us with a location, while Mathias fronted the paper - lots of it - as well as post-its and markers.

Mathias and Stijn took the lead, introducing themselves as domain experts of the day. The domain? The United Schools of Kazakhstan.

We split up into groups of four, and used our first pomodoro trying to understand the domain. The second pomodoro, we threw everything away and started fresh.

The first modeling technique our group tried was Alberto Brandolini's event storming. We took what we thought was the most important event - report approved - wrote it on a post-it, and placed it at the center of our sheet of paper. Then we worked our way back to how we got there, but also looked at what happened next. This modeling approach yielded results very quickly; we all gained a decent understanding of everything that's going on in the domain. Talking to the domain expert made it obvious what the hotspots were; he kept referring to two post-its in particular.

We might have zoomed in right there, but for the sake of experimenting with event storming, we stuck to events a little longer. We added commands, looked for clusters, drew aggregate boundaries based on those, and looked at where the aggregates were talking to each other.

We initially used a sheet of paper and stayed seated, but this was holding us back. Once we stood up and moved to the wall, our synergy increased. Space and blood circulation seem to be important.

Flow is important too. Since we only had two domain experts, we often had to make assumptions and come up with a name that made sense. This slowed us down. We should just write down whatever we come up with, and make doubts explicit on the post-it. You can always verify and fix the language by consulting the domain experts later.

The next visualization technique, initiated by Yves, took a UI-first approach. While this quickly gives you something concrete to chat about with the domain expert, I learned it can also lead you to bounded context boundaries by helping you answer the question "Where is all this data coming from, which contexts does this data belong to?"

I thought this first experiment went really well - a lot better than I expected. It proves once again the value of visualization and collaboration. All models were probably wrong, but they turned out to be useful. The end result probably doesn't matter that much; the discovery and learning along the way do.



Note to self: take pictures of the end results; they would help explain some of the experiments.

Sunday, August 25, 2013

Inheritance is like Jenga

These last few days, I have been working on a piece of our codebase that accidentally got very inheritance heavy.

When it comes to inheritance versus composition, there are a few widely accepted rules of thumb out there. While 'prefer composition over inheritance' doesn't cover the nuances, it's not terrible advice; composition will statistically often be the better solution. Steve McConnell's "composition defines a 'has a'-relationship while inheritance defines an 'is a'-relationship" gives you a more nuanced yet simple tool to apply to a scenario. The Liskov substitution principle, which states that if S is a subtype of T, then objects of type T may be replaced with objects of type S without altering any of the desirable properties of that program, is probably the most complete advice.
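A contrived example to make the difference concrete: a Car is not a kind of Engine, so inheriting from Engine just to reuse Start() makes the hierarchy lie, while composition expresses the 'has a'-relationship just fine. The classes are made up purely for illustration.
public class Engine
{
    public void Start() { /* ignite */ }
}

// Inheritance driven by reuse - the 'is a'-relationship doesn't hold.
public class CarAsAnEngine : Engine { }

// Composition - a car 'has an' engine.
public class Car
{
    private readonly Engine _engine = new Engine();

    public void Start()
    {
        _engine.Start();
    }
}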

Inheritance, when applied with the wrong motivations - reuse and such - often leads to fragile monster class hierarchies that are too big to wrap your head around and extremely hard to change.

Working on such a monstrous hierarchy reminded me of playing Jenga. Some time not that long ago, someone built this tower from the ground up, laying block upon block, layer upon layer. On the surface it appears stable and rigid, but as soon as someone tries to winkle out one block, it becomes obvious that a single block can bring down the whole structure. The lower the block sits in the structure, the more layers rest on it, and the greater the risk of breaking everything on top. Even if you do succeed in pulling one block out, chances are you had to touch the surrounding blocks to prevent the tower from tumbling over.

Instead of Jenga, I'd prefer a puzzle made of just a few pieces - designed for toddlers. A puzzle is flat, you can see the big picture in one glance, while you can reason about each piece individually as well. As long as the edges of the pieces fit together, you can assemble whatever picture you want. 

Sunday, August 18, 2013

But I already wrote it

A few weeks ago, we set out to implement a feature that enabled back office users to set a new rate ahead of time. With our analyst and the involved user being out of the office for days, we had to rely solely on written requirements. Two of us skimmed the documents, but didn't take the time to make sure there wasn't any ambiguity - it looked trivial, really. I went back to what I was doing, while my colleague set out to implement this feature. Going over the implementation together the next day, I saw that he had built something a lot more advanced than I had anticipated. While I argued that this was a lot more than we needed, we agreed to wait for feedback until our analyst returned from her holiday.

When our analyst returned, she confirmed that the implementation did a lot more than we needed. I suggested removing what we didn't really need. My colleague argued that he now already had put in the effort to write it, and we should just leave it as is.

I can relate to feeling good about freshly written code, but that shouldn't stop you from throwing it away. Code is just a means to an end; the side product of creating a solution or learning about a problem. If you really can't let go, treasure it in a gist.

In this particular scenario, one could argue that making the solution more advanced than it needs to be isn't strong enough an argument to make a big deal out of it. We're giving the users a little extra for free, right?
I cannot stress the importance of simplicity enough; to be simple is to be great, perfection is achieved not when there is nothing more to add but when there is nothing left to take away, and all of that. Nobody likes bulky software. Nobody likes fighting complexity all day.
But by only considering the cost of initially writing it, you also stay ignorant of the true cost of what appears to be a small little extra on the surface. Users, developers, designers and analysts alike have yet another thing to wrap their heads around. More code is not a good thing: more code to test, more code to maintain. Each feature, however small it may seem, needs to be taken into account when planning new ones. Each feature, certainly an advanced one, makes the cost of training and support go up. The cost of implementing a feature is just a tiny portion of what it costs to support that feature through its entire lifetime.

Using this argument, I eventually succeeded in persuading my peer to dump the ballast. The real lesson for me, however, is probably that however trivial it might have seemed, we could have ruled out any possible ambiguity in advance by using one of the various tools we have at our disposal: a smallish whiteboard session, or maybe pairing on some high-level tests.

Thursday, August 15, 2013

Eventual consistent domain events with RavenDB and IronMQ

Working on side projects, I often find myself using RavenDB for storage and IronMQ for queueing. I wrote about that last one before here and here.

One project I'm working on right now makes use of domain events. As an example, I'll use the usual suspect: the BookingConfirmed event. When a booking has been confirmed, I want to notify my customer by sending him an email.

I don't want persisting a booking to fail because an event handler throws - say, because the mail server is unavailable. I also don't want an event handler executing an operation that can't be rolled back - sending out an email - without first making sure the booking was persisted successfully. If an event handler fails, I want to give it the opportunity to fix what's wrong and retry.
public void Confirm()
{
    Status = BookingStatus.Accepted;

    Events.Raise(new BookingConfirmed(Id));
}
Get in line

The idea is, instead of dealing with the domain events in memory, to push them out to a queue so that event handlers can deal with them asynchronously. If we're trusting IronMQ with our queues, we get in trouble guaranteeing that events aren't sent out unless the booking is persisted successfully; you can't make IronMQ enlist in a transaction.

Avoiding false events

To avoid pushing out events - and alerting our customer - without having successfully persisted the booking, I want to commit my events in the same transaction. Since IronMQ can't be enlisted in a transaction, we have to take a detour; instead of publishing the event directly, we're going to persist it as a RavenDB document. This guarantees the event is committed in the same transaction as the booking.
public class DomainEvent
{
    public DomainEvent(object body)
    {
        Guard.ForNull(body, "body");          
        
        Type = body.GetType();
        Body = body;
        Published = false;
        TimeStamp = DateTimeProvider.Now();
    }
    
    protected DomainEvent() { }

    public string Id { get; private set; }

    public DateTime TimeStamp { get; private set; }

    public Type Type { get; private set; }

    public object Body { get; private set; }

    public bool Published { get; private set; }

    public void MarkAsPublished()
    {
        Published = true;
    }
}

public class DomainEvents : IDomainEvents
{
    private IDocumentSession _session;

    public DomainEvents(IDocumentSession session)
    {
        _session = session;
    }

    public void Raise<T>(T args) where T : IDomainEvent
    {       
        _session.Store(new DomainEvent(args));
    }
}

Getting the events out

Now we still need to get the events out of RavenDB. Looking into this, I found this to be a very good use of the Changes API. Using the Changes API, you can subscribe to all changes made to a certain document. If you're familiar with relational databases, the Changes API might remind you of triggers - except that the Changes API doesn't live in the database, nor does it run in the same transaction. In this scenario, I use it to listen for changes to the domain events collection. On every change, I'll load the document, push the content out to IronMQ, and mark it as published.
public class DomainEventPublisher
{
    private readonly IQueueFactory _queueFactory;
    
    public DomainEventPublisher(IQueueFactory queueFactory)
    {           
        _queueFactory = queueFactory;
    }

    public void Start()
    {
        DocumentStore
          .Get()
          .Changes()
          .ForDocumentsStartingWith(typeof(DomainEvent).Name)
          .Subscribe(PublishDomainEvent);
    }

    private void PublishDomainEvent(DocumentChangeNotification change)
    {
        Task.Factory.StartNew(() =>
        {
            if (change.Type != DocumentChangeTypes.Put)
                return;

            using (var session = DocumentStore.Get().OpenSession())
            {
                var domainEvent = session.Load<DomainEvent>(change.Id);

                if (domainEvent.Published)
                    return;

                var queue = _queueFactory.CreateQueue(domainEvent.Type.Name);
                queue.Push(JsonConvert.SerializeObject(domainEvent.Body));

                domainEvent.MarkAsPublished();

                session.SaveChanges();
            }
        });
    }
}
I tested this by raising 10,000 events on my machine, and got to an average of 7 events pushed out per second. With an average of 250ms per request, the major culprit is posting messages to IronMQ. Since I'm posting these messages over the Atlantic, IronMQ is not really to blame. Once you get closer, response times go down to the 10ms - 100ms range.

A back-up plan

If the subscriber goes down, events won't be pushed out, so you need to have a back-up plan. I planned for missing events by scheduling a Quartz job that periodically queries for old unpublished domain events and publishes them.
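A rough sketch of what that catch-up job could look like, assuming Quartz.NET's IJob and a job factory that takes care of constructor injection, and reusing the DomainEvent document and IQueueFactory from above; the five-minute cut-off is arbitrary.
using System.Linq;
using Newtonsoft.Json;
using Quartz;

public class PublishMissedDomainEventsJob : IJob
{
    private readonly IQueueFactory _queueFactory;

    // Assumes a Quartz job factory that supports constructor injection.
    public PublishMissedDomainEventsJob(IQueueFactory queueFactory)
    {
        _queueFactory = queueFactory;
    }

    public void Execute(IJobExecutionContext context)
    {
        using (var session = DocumentStore.Get().OpenSession())
        {
            // Arbitrary cut-off: anything older than five minutes that was
            // never pushed out is considered missed.
            var cutOff = DateTimeProvider.Now().AddMinutes(-5);

            var missedEvents = session
                .Query<DomainEvent>()
                .Where(e => e.Published == false && e.TimeStamp < cutOff)
                .ToList();

            foreach (var domainEvent in missedEvents)
            {
                var queue = _queueFactory.CreateQueue(domainEvent.Type.Name);
                queue.Push(JsonConvert.SerializeObject(domainEvent.Body));

                domainEvent.MarkAsPublished();
            }

            session.SaveChanges();
        }
    }
}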

In conclusion

You don't need expensive infrastructure or a framework to handle domain events in an eventually consistent fashion. Using RavenDB as an event store, the Changes API as an event listener, and IronMQ for queuing, we ended up with a rather lightweight solution. It won't scale endlessly, but it doesn't have to either.

I'm interested in hearing which homegrown solutions you have come up with, or how I could improve mine.

Sunday, August 4, 2013

When your commands spell CUD

A good while ago, I blogged on commands (and queries). After exploring various flavors, I eventually settled on this one; commands, handlers and an in-memory bus that serves as a command executor.

Commands help you support the ubiquitous language by explicitly capturing user intent at the boundaries of your system - think use cases. You can look at them as messages being sent to your domain. In this regard, they also serve as a layer over your domain - decoupling the inside from the outside, allowing you to gradually introduce concepts on the inside without breaking the outside. The command executor gives you a nice pipeline you can take advantage of to centralize security, performance metrics, logging, session management and so on.
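For reference, a bare-bones sketch of that flavor might look like this; the command name is borrowed from later in this post, and the resolver delegate stands in for whatever container or registry you happen to use.
using System;

public interface ICommand { }

public interface IHandle<TCommand> where TCommand : ICommand
{
    void Handle(TCommand command);
}

// A command explicitly capturing user intent at the boundary of the system.
public class RemoveCarFromFleetCommand : ICommand
{
    public RemoveCarFromFleetCommand(Guid carId)
    {
        CarId = carId;
    }

    public Guid CarId { get; private set; }
}

public class RemoveCarFromFleetHandler : IHandle<RemoveCarFromFleetCommand>
{
    public void Handle(RemoveCarFromFleetCommand command)
    {
        // Load the car, remove it from the fleet, persist...
    }
}

// The in-memory bus: one central place through which every command passes.
public class CommandExecutor
{
    private readonly Func<Type, object> _resolveHandler;

    public CommandExecutor(Func<Type, object> resolveHandler)
    {
        _resolveHandler = resolveHandler;
    }

    public void Execute<TCommand>(TCommand command) where TCommand : ICommand
    {
        // Security, performance metrics, logging, session management...
        // can all be centralized around this single call.
        var handler = (IHandle<TCommand>)_resolveHandler(typeof(IHandle<TCommand>));
        handler.Handle(command);
    }
}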

We always need to be critical of abstractions though, and regularly assess their value. A smell that might indicate that commands aren't working for you, or are adding little value, is when the first letters of your commands spell CUD - Create, Update, Delete.

For example; CreateCarCommand, UpdateCarCommand and DeleteCarCommand.

The language needs attention

Possibly, your team hasn't fully grasped the power of cultivating the ubiquitous language. If you start listening to your domain experts, you might end up with totally different command names; TakeInNewCarCommand, RepaintCarCommand, InstallOptionCommand and RemoveCarFromFleetCommand.

Maybe, though, there is no language at all, and you're really just doing CRUD. If the context you are working on is implementing a generic or supporting subdomain, this might not be terrible.

If I'm doing CRUD, do I still need commands?

Commands help you decouple the inside from the outside. If there is no domain on the inside, they can still help you decouple the application layer from other concerns. You might prefer another facade to separate concerns, such as a thin service layer - though I don't think the service layer abstraction gives you anything commands don't. Maybe you don't find any value in separating things at all, and just dump everything in the application layer.

All of these approaches are valid. You just have to consider the trade-offs. With the last few approaches, next to losing the decoupling of the inside from the outside, you also lose the central pipeline that a command executor gives you.

Doesn't my application layer give me this pipeline for free?

It sure can. Looking at modern web stacks, these all have interception points built in. For example; NancyFx allows you to hook into the request pipeline using before and after hooks; Web API gives you message handlers and action filters; and WCF has a concept of interceptors.

Not all application frameworks do though - think of frameworks targeting desktop software.

The advantage of having your own pipeline, instead of relying solely on your application framework's interception points, is that you're the boss: you don't have to study the ins and outs of each framework, or hope the framework has thought of your needs. Also, when you need to support multiple application layers, you don't have to implement everything twice.
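As an illustration - and nothing more than a sketch building on the interfaces from the earlier sketch in this post - a decorator like this wraps every handler with logging exactly once, no matter which application framework the command came in through.
using System;
using System.Diagnostics;

// Wraps any handler; register it around the real handlers in your container.
public class LoggingHandler<TCommand> : IHandle<TCommand> where TCommand : ICommand
{
    private readonly IHandle<TCommand> _inner;

    public LoggingHandler(IHandle<TCommand> inner)
    {
        _inner = inner;
    }

    public void Handle(TCommand command)
    {
        var stopwatch = Stopwatch.StartNew();

        _inner.Handle(command);

        Console.WriteLine(
            "Executed {0} in {1}ms", typeof(TCommand).Name, stopwatch.ElapsedMilliseconds);
    }
}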

What about using aspects instead of a pipeline to centralize all these concerns?

You can - instead of a pipeline - use aspects to take care of crosscutting concerns such as security, logging, session management... and have them woven into your executable at compile- or runtime. I think of aspects as if they were macros, which save you lines of code written, but also often conceal the real problem: missing concepts. While they try to sell you on separation of concerns, notice that you're actually still producing procedural code, but instead of writing it by hand, you're now letting an AOP framework do it for you. Add harder testing, harder debugging and reduced readability to the mix, and you can understand why I'm not a fan of AOP. There are scenarios where all of this doesn't matter much though, and to strengthen the cliche: granular logging is a good use case.

When your commands spell CUD, it might indicate you could do without them. Do realize what the consequences are of taking them away though:
  • you lose the opportunity to capture user intent at the boundaries of your system, to strengthen the ubiquitous language
  • you may need an alternative facade to decouple in from out
  • you lose that command executor serving as your own pipeline