Sunday, November 3, 2013

Event source all the things?

Having covered projections last week, I think I have come full circle in these posts that turned out to be a small preliminary series on event sourcing. Even though there is still a vast number of nuances to discover, I think I've captured the gist of it. Even without having run an event sourced system in production, I feel as if I have some idea of what event sourcing can bring to the table.

Event sourcing gives you a complete history of events that caused an aggregate to be in its current state. In some scenarios this will add an enormous amount of value; in others it will give you nothing - it might even steal time and effort.

The first thing to do - before even considering implementing event sourcing - is to talk to your business. Do they feel as if events are a natural way to represent what's going on in their domain? Event sourcing is a lot more than just a technical implementation detail; discovering and understanding all of what goes on in a domain is a big investment - from both sides. Is it worth the trouble?

In my first job I worked on software for fire departments. Only now do I realize in how many parts of our solution event sourcing could have helped us:
  • the life cycle of a vehicle assigned to an emergency: vehicle dispatched, vehicle left the station, vehicle en route, vehicle arrived on the scene, vehicle back in the station...
  • a person's career: person was promoted, person was detached to another station, person learned a new skill...
  • a shift's schedule: person attached to unit, person returned to person pool, unit dispatched...
This data had to be made available in a set of diverse read models. Getting the data out was complex at times, often even impossible. A lot of these changes had to be propagated to external systems; there was no way to get that information out in real time, and external systems had no notion of what had happened.
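
As a rough sketch, such lifecycle changes map naturally onto small immutable event types. All names below are hypothetical, not taken from the actual system:

```python
# A sketch of lifecycle changes as immutable event types; all names here
# are hypothetical, not taken from the actual dispatching system.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class VehicleDispatched:
    vehicle_id: str
    emergency_id: str
    occurred_on: datetime

@dataclass(frozen=True)
class VehicleLeftStation:
    vehicle_id: str
    occurred_on: datetime

@dataclass(frozen=True)
class VehicleArrivedOnScene:
    vehicle_id: str
    emergency_id: str
    occurred_on: datetime
```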

In one part of a system I'm currently working on, users also wanted to know what happened in the past, but for completely different reasons. Since the system lives in a financial context, they wanted to know who was responsible for changing system settings. Here it's not an event log they need, but a simple audit trail.

If it is just a passive log your business wants, you can get away with cheaper alternatives: a command journal, an audit trail and so on.

Benefits

Event sourcing goes hand in hand with Domain-Driven Design. Events are a great tool for going from a structural model to a behavioural model, helping you capture the true essence of a domain model.

Building and maintaining an event store should be doable. It's an append-only data model, storing serialized DTOs with some metadata. Compared to ORMs and relational databases, this makes tooling easier as well.
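
A minimal sketch of that data model - assuming an in-memory store and JSON serialization - could look like this:

```python
# A minimal in-memory event store sketch: streams keyed by aggregate id,
# each entry holding a serialized payload plus some metadata. A real store
# adds durable storage, optimistic concurrency checks, and so on.
import json
from collections import defaultdict
from datetime import datetime, timezone

class EventStore:
    def __init__(self):
        self._streams = defaultdict(list)  # aggregate id -> ordered events

    def append(self, stream_id, event_type, payload):
        self._streams[stream_id].append({
            "type": event_type,
            "data": json.dumps(payload),  # the serialized DTO
            "metadata": {
                "recorded_at": datetime.now(timezone.utc).isoformat(),
                "version": len(self._streams[stream_id]) + 1,
            },
        })

    def load_stream(self, stream_id):
        return list(self._streams[stream_id])
```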

In traditional systems, you have to keep a lot of things in your head at once: how do I write my data, how do I query my data, and, more importantly, how do I get my data out in all these different use cases without making things too hard? In event sourced systems, separating writes from reads makes for more granular bits, easing the cognitive load.

Events can be projected into anything: a relational database, a document store, memory, files... This allows you to build a read model for each separate use case, while also giving you a lot of freedom in how you're going to persist them.

You can replay projections, rebuilding a read model from scratch. Forget about difficult data migrations.
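
As a sketch, a projection is little more than a fold over the event stream, and rebuilding the read model is running that fold again from event one. Event shapes and names here are made up:

```python
# A projection folds events into a read model (a plain dict here).
def project(read_model, event):
    if event["type"] == "AccountOpened":
        read_model[event["owner"]] = 0
    elif event["type"] == "AmountDeposited":
        read_model[event["owner"]] += event["amount"]
    return read_model

def rebuild(events):
    # Rebuilding from scratch is just replaying the fold from event one.
    read_model = {}
    for event in events:
        read_model = project(read_model, event)
    return read_model

history = [
    {"type": "AccountOpened", "owner": "alice"},
    {"type": "AmountDeposited", "owner": "alice", "amount": 100},
]
print(rebuild(history))  # {'alice': 100}
```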

Testing feels consistent and very complete. A test will assert that all the expected events were raised, but will also implicitly assert that unexpected events were not raised. Testing projections is also straightforward.
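
A sketch of what such a test could look like, using hypothetical Account events - comparing against the exact list of produced events implicitly asserts that nothing unexpected was raised:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccountOpened:
    owner: str

@dataclass(frozen=True)
class AmountDeposited:
    owner: str
    amount: int

class Account:
    def __init__(self, history):
        # Rehydrate state from past events ("given").
        self.owner = next(e.owner for e in history
                          if isinstance(e, AccountOpened))

    def deposit(self, amount):
        # Behaviour produces new events instead of mutating rows ("when").
        return [AmountDeposited(owner=self.owner, amount=amount)]

def test_deposit_raises_expected_events():
    account = Account([AccountOpened(owner="alice")])
    produced = account.deposit(100)
    # Comparing the full list also asserts no unexpected events ("then").
    assert produced == [AmountDeposited(owner="alice", amount=100)]

test_deposit_raises_expected_events()
```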

Events provide a natural way of integrating with other systems. Committed events can be published to external subscribers.
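
A naive sketch of that integration point - a real system would put a message bus or queue in between:

```python
# After a commit succeeds, hand the same events to whoever registered an
# interest; the event name below is made up.
class Publisher:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, events):
        for event in events:
            for handler in self._subscribers:
                handler(event)

publisher = Publisher()
publisher.subscribe(lambda e: print("external system notified of", e))
publisher.publish([{"type": "VehicleDispatched", "vehicle_id": "V-12"}])
```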

Troubleshooting becomes easier since a developer can copy an event stream from production and replay it locally - reproducing the exact issue without jumping through hoops to get the system into a specific state.
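
A sketch of that workflow, assuming the production stream was dumped as one JSON event per line; printing the state after each event shows exactly where things go off the rails:

```python
import json

# 'dump' stands in for an event stream copied from production; the format
# and the event names are assumptions for the example.
dump = [
    '{"type": "AccountOpened", "owner": "alice"}',
    '{"type": "AmountDeposited", "owner": "alice", "amount": 100}',
]

def apply(state, event):
    if event["type"] == "AmountDeposited":
        state["balance"] = state.get("balance", 0) + event["amount"]
    return state

def replay_locally(lines, apply, state):
    for number, line in enumerate(lines, start=1):
        event = json.loads(line)
        state = apply(state, event)
        print(f"after event {number} ({event['type']}): {state}")
    return state

replay_locally(dump, apply, {})
```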

Instead of patching corrupted production data directly, you can send a compensating event or fix the projection and replay everything. This way nothing gets lost, and consistency between code and outcome is guaranteed.
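
A tiny sketch with made-up event names: the erroneous event stays in the stream, and a compensating event corrects the outcome:

```python
# The deposit of 1000 was a mistake (it should have been 100); instead of
# editing immutable history, a compensating event is appended.
events = [
    {"type": "AmountDeposited", "owner": "alice", "amount": 1000},
    {"type": "DepositCorrected", "owner": "alice", "adjustment": -900},
]

def balance(events):
    total = 0
    for e in events:
        if e["type"] == "AmountDeposited":
            total += e["amount"]
        elif e["type"] == "DepositCorrected":
            total += e["adjustment"]
    return total

print(balance(events))  # 100 - history intact, outcome corrected
```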

Downsides

Defining events is hard. Defining good events takes a lot of practice and insight. If you're forcing a structural model into a behavioural one, it might even be impossible. So don't even consider turning CRUD into an event sourced model.
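
A small made-up illustration of the difference: a structural event only records that data changed, while a behavioural event captures why:

```python
# CRUD-ish, structural "event": what happened? why? Nobody knows.
structural = {"type": "CustomerUpdated",
              "fields": {"address": "Main St 1"}}

# Behavioural event: the intent is explicit and carries domain knowledge.
behavioural = {"type": "CustomerRelocated",
               "customer_id": "C-42",
               "new_address": "Main St 1"}
```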

There are a few places where you need to be on the lookout for performance bottlenecks. Event streams of long-lived aggregates might grow very big. Loading a giant event stream from a data store might take a while - snapshots can help here. Projecting giant event streams might get you into trouble too - how long will it take to rebuild your read model, and will it even fit into memory? Making projections immediately consistent might become a problem if you have a lot of them. Parallelization, or giving up on immediate consistency, might bring solace.
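
A sketch of the snapshot idea, assuming snapshots are stored as (state, version) pairs: load the latest snapshot and replay only the events recorded after it:

```python
def maybe_snapshot(snapshots, state, version, every=100):
    # Assumption: persist a snapshot of the state every 'every' events.
    if version % every == 0:
        snapshots.append((dict(state), version))

def load_aggregate(snapshots, events, apply):
    # Take the most recent snapshot, if any, and replay only the tail of
    # the stream instead of every event since day one.
    state, version = ({}, 0)
    if snapshots:
        state, version = snapshots[-1]
    for event in events[version:]:
        state = apply(state, event)
    return state
```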

Events don't change, so versioning might get awkward. Are you going to create a new event type for each change, or will you relax deserialization? Or maybe you want to implement event migrations?
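
One common approach is upconverters that lift old payloads to the newest shape at read time; a sketch with a hypothetical PersonRegistered event:

```python
# An immutable upconverter per schema change: new code only ever sees the
# latest version. Event names and fields here are hypothetical.
def upconvert_v1_to_v2(payload):
    # v1 stored a single "name"; v2 splits it into first and last name.
    first, _, last = payload["name"].partition(" ")
    return {"version": 2, "first_name": first, "last_name": last}

UPCONVERTERS = {("PersonRegistered", 1): upconvert_v1_to_v2}

def read(event_type, payload):
    # Keep converting until the payload is at the newest known version.
    while (event_type, payload.get("version", 1)) in UPCONVERTERS:
        payload = UPCONVERTERS[(event_type, payload["version"])](payload)
    return payload

print(read("PersonRegistered", {"version": 1, "name": "Jane Doe"}))
# {'version': 2, 'first_name': 'Jane', 'last_name': 'Doe'}
```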

Since you're persisting multiple models - the events plus one or more read models - you're going to consume more storage, which will cost you.

Adoption in the wild

Although there are - from a business and engineering perspective - some good arguments to be made for event sourcing, those arguments only apply to a modest percentage of projects. Even when there's a strong case to be made for event sourcing, there are very few people with actual experience implementing an event sourced system, and prescriptive frameworks that you can just drop into a project and feel good about are lacking. Most won't even care about event sourcing to start with, but even if they do, it's an uphill battle; it introduces a risk most might not be comfortable with.

Having said that, there are some really good projects out there that are steadily gaining popularity and maturity. Pioneers in the field are sharing and documenting their experiences, lowering the barriers for others. Things are moving for sure.

As always, event sourcing is not a paradigm to blindly apply to each and every scenario, but definitely one worth considering.

Since I'm not running any of it in production, tell me what I'm missing - there must be more things that turn out to be harder than they sound at first, right? If you're not running it in production but are thinking about it, what are some of your concerns? And what are your predictions for the future of event sourcing?

6 comments:

  1. I have recently deployed my first event driven application and it has been a very interesting process. As always, you don't discover/appreciate many of the challenges until you try it yourself. Overall it has proven to be a great approach. A couple of additional things I would consider if starting out on a greenfield project would be the skill level and teachability of the team. Do they/will they get it? It's also worth trying to identify the life expectancy of your application. A short-lived system might benefit more from a simplistic throwaway CRUD structure in terms of time to market. Linked to that point, if the system is intended to give the business a competitive advantage, it probably is worth considering event sourcing, as it allows for so much 'Agile'-like flexibility through the use of projections and cacheable read models etc. Anyway, nice post!

  2. A lot of the difficulty lies in all the mistakes you make that seem obvious once you figure them out. For example, you can't have projections depend on other read models if those read models are not guaranteed to be consistent at that time. Modelling in a way that writes do not depend on the read model can be very hard, and in a complex domain you'll get your aggregate boundaries wrong a lot. Your modelling and refactoring skills are really being put to the test. If you want high throughput, dealing with messages arriving out of order is a pain. Then there's the really stupid crap, like discovering that the serialisation library you're using, which is supposed to be top notch, is actually so slow that it is a bigger bottleneck than I/O...

    On the sunny side: all these things are the sort of problems a passionate developer loves to figure out. Doing CQRS/ES has really improved my programming, even on traditional projects. Fun times!

  3. Totally random thoughts:

    You might want to rethink the part about compensating "events" (really, events?? Uhm, you mean commands, right?) because it's not as simple as you make it out to be (yes, yes, I know, introductory series). When event data is a side effect of a data transformation induced by code, it can be HARD to get it to compensate the way you want it to. There are cases where it's obvious, but there are also cases where it's non-obvious until you hit them.

    I don't think "versioning gets awkward" is the proper term. Rather, most traditional fellahs are not accustomed to dealing with messaging in general and versioning in particular. The closest most get is when they write some migration script for their DB. It takes a different kind of mentality ;-)

    Most hard problems are waiting for you in production. Best school is doing, and bumping your head.

    Bottom line for me is: embrace messaging first, before you tackle event sourcing, coz your relational mind is not ready for it.

    Replies
    1. Correct, you must have a lot more messaging experience under your belt.

      In my messaging experience, we didn't have hundreds of message contracts, and those we had were very stable. If they did change, we only made the receiving end one version backwards compatible, since we wouldn't receive any older versions. There was one system producing messages, multiple systems were subscribed.

      With event sourcing, every event is a message, so you will likely end up with _a lot_ of message contracts. Since you want to be able to replay from day one, you need to be compatible with all preceding versions instead of just the last one - which 'complicates' versioning over 'ordinary' messaging?

    2. Short answer: It depends (TM)

      Slightly longer: Ways of dealing with contract versioning vary wildly. In general you want to be as append-only, additive-only as possible. You want to be picky about the data interchange format you choose and how it deals with versioning. Be diligent in the way you design contracts. The birth of reflection-based serialization frameworks and libraries has blinded us. I still remember the day we had to use reader/writer pairs to serialize our stuff. It was manual labor, but at least we had a high degree of control and performance was superb. But I'm digressing ...

      So do you have to keep all the old classes around? You could if you wanted to. It's easy. You'll have to write upconverters for sure. Once they're written and well tested you could replace the old class with something more dynamic (e.g. dynamic keyword or JObject or dictionary access), at which point you could ditch the old class. You just traded old classes for immutable upconverters (you don't ever touch them anymore). Using a string as a contract identifier instead of a type name helps a lot in this department. Alternatively, in a closed system, you could run an upgrade process on your event store, where you do the upconversion of events once (per upgrade). Obviously, you're making a tradeoff here since you just touched immutable data. Subscribers might not be too happy about that. I guess a lot depends on whether you look upon your event sourcing event contracts as being public or not.

      When your contracts change a lot there are other issues at play (lack of commitment, communication or review, unstable or unknown behavior).

    3. "'complicates' versioning over 'ordinary' messaging" makes no sense to me, BTW. Messaging and versioning go hand in hand.
