Sunday, October 27, 2013

Event projections

In my first two posts on event sourcing, I implemented an event sourced aggregate from scratch. Once an aggregate could record and replay events, I looked at persisting them in an event store. Logically, the next question is: how do I query my aggregates, how do I get my state out?

In traditional systems, the write and read models are not separated; they are one and the same. Event sourced systems, on the other hand, have a write model - event streams - and a separate read model. The read model is built from events committed to the write model; events are projected into one or more read models.


An interface for a projection could look like this.
public interface IProjection {
    void Handle(EventStream eventStream);                     
}  
A projection takes in an event stream, and projects it to some read model.

A read model can be anything: a cache, a document store, a key-value store, a relational database, a file, or even some evil global state.
public class EvilStatisticsReadModel {
    public static int WithdrawalAmountExceededCount { get; set; }

    public static int AmountDepositedCount { get; set; }
}
In this model, we want to maintain statistics of events that happened. For that to happen, we need to define a projection of our event stream.
public class ProjectionsToEvilStaticsReadModel : IProjection {
    public void Handle(EventStream eventStream) {
        foreach (var @event in eventStream)
            When((dynamic)@event);
    }

    public void When(WithdrawalAmountExceeded @event) {
        EvilStatisticsReadModel.WithdrawalAmountExceededCount++;
    }

    public void When(AmountDeposited @event) {
        EvilStatisticsReadModel.AmountDepositedCount++;
    }    
}
If we now let this projection handle an event stream, our read model will be kept up-to-date.
[TestMethod]
public void ReadModelIsKeptUpToDateWhileProjectingTheEventStream() {
    var events = new List<IEvent>() {
        new WithdrawalAmountExceeded(new Amount(3000)),
        new AmountDeposited(new Amount(300)),
        new AmountDeposited(new Amount(500)),
        new AmountWithdrawn(new Amount(100))
    };
    var stream = new EventStream(events);

    new ProjectionsToEvilStaticsReadModel().Handle(stream);

    Assert.AreEqual(1, EvilStatisticsReadModel.WithdrawalAmountExceededCount);
    Assert.AreEqual(2, EvilStatisticsReadModel.AmountDepositedCount);    
}
One could argue that all of this is too much - not worth the effort. Where you previously just persisted the structure of an aggregate and could query that same structure directly, you now have to persist events first, and then write projections that maintain separate read models that can be queried.

You have to look beyond that though. Those who have done any serious work on a traditional stack have felt the pain of migrations, complex queries that take up three pages, obscure stored procedures that run for hours, optimizing while having to consider a handful of different use cases, finding the balance between write and read performance, database servers that can't handle the load on special events, expensive licenses and so on. While these first few concerns are mostly technical, personally I'm often overwhelmed by how many concepts these designs force you to keep in your head all at once.

Separating reads from writes using event sourcing might bring some relief. Reducing cognitive overload by separating responsibilities into smaller, more granular bits might be the only argument you need. However, there's a lot more. Running an event store should be low-maintenance; it's an append-only data model storing simple serialized DTOs with some metadata - forget about big migrations (well, not completely), schemas, indexes and so on. Even if you project into a relational database, being able to re-run projections should make migration scripts and versioning avoidable. An event can be projected into multiple read models, allowing you to optimize per use case, without having to take other use cases into account. Since it should be easy to rebuild read models, they can be stored in cheap and volatile storage - think key-value store, in-memory and so on, allowing for crazy fast reads.
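As an illustration, rebuilding a read model boils down to wiping it and replaying the full stream through the projection. A minimal sketch against the types above - the ReadModelRebuilder helper is my own invention, not something from the earlier posts:

```csharp
public static class ReadModelRebuilder {
    // Hypothetical helper: reset the read model, then replay history.
    // After the replay, the counters reflect every event from day one.
    public static void Rebuild(IProjection projection, EventStream history) {
        EvilStatisticsReadModel.WithdrawalAmountExceededCount = 0;
        EvilStatisticsReadModel.AmountDepositedCount = 0;
        projection.Handle(history);
    }
}
```

Because the read model is disposable, losing it is no disaster; replaying the event stream brings it right back.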

Letting go of the single-model dogma seems to enable so much more, giving you a whole new set of possibilities. Another extremely useful use case that suddenly becomes a lot easier to support is business intelligence: when business experts think of new ways to look at the past, you just create a new projection and project the events from day one. Getting statistics on how your users are using your system doesn't sound that hard now, does it?
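Sketching that out: a new business question ("how much money was deposited in total?") becomes nothing more than another projection over the same historical stream. The read model below, the Value accessor on Amount, and the catch-all When overload are assumptions on my part:

```csharp
public class TotalAmountDepositedReadModel {
    public static int Total { get; set; }
}

public class ProjectionToTotalAmountDeposited : IProjection {
    public void Handle(EventStream eventStream) {
        foreach (var @event in eventStream)
            When((dynamic)@event);
    }

    // Only deposits matter for this report.
    public void When(AmountDeposited @event) {
        TotalAmountDepositedReadModel.Total += @event.Amount.Value;
    }

    // Catch-all overload: ignore every other event type instead of
    // letting the dynamic dispatch blow up on an unhandled event.
    public void When(object @event) { }
}
```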

One of the obvious drawbacks, next to writing a bit more boring code, is that storage costs will increase - you are now persisting the same data in multiple representations. But storage is cheap, right? Maybe money isn't an issue, but what about performance? It's slower to do three writes instead of one, right? For a lot of scenarios this won't be much of an issue, but if it is, there is a lot of room for optimizations when doing projections: parallelization, eventual consistency and so on.
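To sketch the first of those optimizations: since each projection owns its own read model, independent projections can handle the same stream concurrently. A rough sketch using the Task Parallel Library, assuming the projections share no state:

```csharp
using System.Threading.Tasks;

public class ParallelProjector {
    private readonly IProjection[] projections;

    public ParallelProjector(params IProjection[] projections) {
        this.projections = projections;
    }

    public void Project(EventStream eventStream) {
        // Each projection reads the same stream but writes to its own
        // read model, so they can safely run concurrently.
        Parallel.ForEach(projections, projection => projection.Handle(eventStream));
    }
}
```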

Next week: event source all the things? 

2 comments:

  1. There are a lot of problems with your IProjection, EventStream and its implementation. It's one thing to say "A projection takes in an event stream, and projects it to some read model", it's another to code it that way. For one, it assumes that the stream of events is finite, which it most definitely is not (at least not most of the time). You might run out of memory on large streams. There is also a very implicit contract between the Handle(EventStream) and each of the Whens. As you may know, 'dynamic' barfs when there isn't a When for a particular event. Painful and non-obvious to the caller/consumer. The tests could benefit from a higher abstraction because it's going to get repetitive really fast. The event stream - in this case - is a weaved subscription of all the event types handled by a particular (type of) projection, which could be materialized (e.g. what EventStore allows you to do) or not.

    Another thing to watch out for is the assumption that your read model fits in memory. Sometimes it does, sometimes it doesn't. This leads most down the path of what I call "connected" projection handlers hooked up to an ORM or plain old ADO.NET. A few months/years down the line they cry their hearts out because the projection that took 5 seconds to fill now takes 24 hours to rebuild. Not saying it's always the case, but I've seen it happen on more than one occasion. Batching, and decoupling building the DML statement from the execution of said statement, goes a long way in mitigating these problems. That along with a well thought out partitioning scheme.

    My 2c.

    ReplyDelete
    Replies
    1. I'm well aware this implementation with its abstractions won't get you far; I hoped it would be useful for demonstrating the gist without too many details.

      However, I have been totally ignorant in this post of the points you mention. Great advice, thanks!

      Delete