Sunday, November 24, 2013

Observations over assumptions

I heard a story once about an engineer who worked on the Disneyland site when it opened in Anaheim, CA in 1955.  
A month before the grand opening, new grass and sod were being laid as one of the last items to be completed. The parking lot was some distance from the park's gates and required a lot of turf. However, no sidewalks had been planned or constructed yet to accommodate visitors' walking patterns.
Before the engineers could validate the proper placement of the sidewalks, a heated internal discussion broke out among landscape designers and park developers over how and what to build. One engineer suggested letting visitors walk on the grass for months and observing the paths they created themselves; they would then build sidewalks over the paths that showed the most foot traffic.
Those sidewalks are still there today. 
One engineer's focus on meeting the guests' needs saved the park millions of dollars' worth of error and political positioning.
I found this story on Quora a few weeks ago, and thought it was a testament to how observing - instead of assuming - can save you a lot of effort, and ends up serving users best.

In the first product I helped build, there had been lots and lots of requirements gathering before we built the first piece of functionality. Once we rolled out the pieces to the first set of users, they were disappointed to say the least - enraged actually: "This is not usable at all! Do you even know what you're doing?". Apparently the requirements had been made up by people higher in rank, without consulting those who would have to use the software on a day-to-day basis. Not the best way to get buy-in from actual users, but I can't really blame them either; requirements are hard, and oftentimes you're just making stuff up as you go along - with the best intentions. Being in crisis mode, we got to learn a lot over the next few months. One of us was sent out to the customer's location a few times, and got to observe how they were trying to use our software. That information proved invaluable, and gave us enough to start shipping better and more useful software. Users started to feel empowered by it, which eventually earned us their trust and approval.

Another example can be found in my current project - less visible to users, more behind the scenes. Instead of guessing at performance targets, we observe production metrics, which allows us to set realistic goals.

Observing instead of assuming usually leads to better results. Keep in mind that you can misinterpret what you're observing too, and that often there is no option other than to assume. You can still avoid overly expensive mistakes by validating those assumptions as early as possible.

Sunday, November 17, 2013

Event storming workshop slides

At Euricom, we all retreat to headquarters once a quarter for a day of sharing and learning. This time, a few others and I organized and facilitated an event storming workshop.

After a short introduction to event storming, participants were introduced to the domain of Cambio CarSharing - which is packed with behaviour. After that, seven groups of five (plus one domain expert) spread out across the office, and spent two slots of twenty minutes modeling the domain - with two extra slots for feedback.

Even after an afternoon of taxing sessions, people were willing to tap into their energy reserves, and ended up presenting great results.

You can find the slides I used (heavily based on Alberto's material) here, or embedded below.



If you're interested in running your first event storming workshop, I'd love to come over and help you get started.


Sunday, November 10, 2013

An event store with optimistic concurrency

Like I mentioned last week - after only five posts on the subject - there are still a great many event sourcing nuances left to be discovered.

My current event store implementation only supports a single user. Due to an aggressive file lock, concurrently accessing an aggregate will throw an exception. Can we allow multiple users to write to and read from an event stream? And what can we do about users making changes to the same aggregate; can we somehow detect conflicts and prevent those changes from being committed?

Multi-user

In the current version, concurrently appending to or reading from an aggregate's event stream will throw since the file will already be locked.
Parallel.For(0, 1000, i =>
{
    _eventStore.CreateOrAppend(aggregateId, new EventStream(new List<IEvent>
    {
        new ConcurrencyTestEvent()
    }));
    _eventStore.GetStream(aggregateId);
});
The exception looks like this: "System.IO.IOException: The process cannot access the file 'C:\EventStore\92f42a08-8583-4dcf-98a5-440b06f34719.txt' because it is being used by another process."

To prevent concurrent file access, we can lock code accessing the aggregate's event stream. Instead of using a global lock, we maintain a dictionary of lock objects; one lock object per aggregate.
lock (Lock.For(aggregateId))
{
    using (var stream = new FileStream(
        path, FileMode.Append, FileAccess.Write, FileShare.Read))
    {
        // Access the aggregate's event stream
    }
}

public class Lock
{
    // One lock object per aggregate, instead of a single global lock.
    private static ConcurrentDictionary<Guid, object> _locks =
        new ConcurrentDictionary<Guid, object>();

    public static object For(Guid aggregateId)
    {
        // Returns the existing lock object for this aggregate, or
        // atomically adds a new one first.
        var aggregateLock = _locks.GetOrAdd(aggregateId, new object());

        return aggregateLock;
    }
}
Optimistic concurrency

Before committing changes, we want to verify that no other changes have been committed in the meantime. Those changes could have influenced the behaviour of our aggregate significantly; appending our changes without considering them might corrupt the aggregate's state.

One way to verify this is by using a number (or a timestamp - clocks, bah) to keep track of an aggregate's version. It's up to the client to tell us which version it expects when appending to a stream. To accommodate this, we need to change the contract of our event store.
public interface IEventStore
{
    void Create(Guid aggregateId, EventStream eventStream);

    void Append(Guid aggregateId, EventStream eventStream, int expectedVersion);

    ReadEventStream GetStream(Guid aggregateId);
}
Clients now need to pass in the expected version when appending to a stream. The result of reading a stream will include the current version.
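From the client's side that flow looks something like this - assuming ReadEventStream exposes the version it was read at as a Version property.
// Read the stream to learn the current version...
var stream = _eventStore.GetStream(aggregateId);

// ...and pass it along as the expected version when appending.
_eventStore.Append(aggregateId, new EventStream(new List<IEvent>
{
    new ConcurrencyTestEvent()
}), expectedVersion: stream.Version);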

In the event store, we now store an index with every event.
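The Record type doing that bookkeeping isn't spelled out in this post; a minimal sketch of what it could look like - assuming a pipe-delimited line format and Json.NET for the event payload - consistent with how it's used in the snippets below.
// Hypothetical record format: one line per event, carrying the
// aggregate id, the event's version (index), its type and its payload.
public class Record
{
    public Guid AggregateId { get; private set; }
    public IEvent Event { get; private set; }
    public int Version { get; private set; }

    public Record(Guid aggregateId, IEvent @event, int version)
    {
        AggregateId = aggregateId;
        Event = @event;
        Version = version;
    }

    public string Serialized()
    {
        // AggregateId|Version|EventType|JsonPayload; the payload goes
        // last so a '|' inside the JSON can't break the split below.
        return string.Join("|",
            AggregateId, Version, Event.GetType().Name,
            JsonConvert.SerializeObject(Event));
    }

    public static Record Deserialize(string line, Assembly assembly)
    {
        var parts = line.Split(new[] { '|' }, 4);
        // The assembly is used to resolve the event type by its name.
        var eventType = assembly.GetTypes().Single(t => t.Name == parts[2]);
        var @event = (IEvent)JsonConvert.DeserializeObject(parts[3], eventType);

        return new Record(Guid.Parse(parts[0]), @event, int.Parse(parts[1]));
    }
}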


When appending to an event stream, we get the current version by reading the highest index - storing this in the aggregate's meta data would make reads faster. If the current version doesn't match the expected version, we throw an exception.
var currentVersion = GetCurrentVersion(path);

if (currentVersion != expectedVersion)
    throw new OptimisticConcurrencyException(expectedVersion, currentVersion);

using (var stream = new FileStream(
    path, FileMode.Append, FileAccess.Write, FileShare.Read))
{
    using (var streamWriter = new StreamWriter(stream))
    {
        foreach (var @event in eventStream)
        {
            currentVersion++;

            streamWriter.WriteLine(new Record(
                aggregateId, @event, currentVersion).Serialized());
        }
    }
}
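The GetCurrentVersion helper isn't shown above; a minimal sketch that does just what the previous paragraph describes, scanning the stream for the highest index.
private int GetCurrentVersion(string path)
{
    // No file yet means no events: the aggregate starts at version zero.
    if (!File.Exists(path))
        return 0;

    var lines = File.ReadAllLines(path);
    if (!lines.Any())
        return 0;

    // The current version is the highest index stored with the events.
    return lines
        .Select(line => Record.Deserialize(line, _assembly))
        .Max(record => record.Version);
}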
A test for that looks something like this.
try
{
    GivenEventStore();
    GivenAggregateId();
    GivenEventStreamCreated();
    WhenAppendingTwoEventStreamsWithTheSameExpectedVersion();
}
catch (OptimisticConcurrencyException ocex) 
{
    _expectedConcurrencyException = ocex;
}

[TestMethod]
public void ThenTheConcurrencyExceptionHasANiceMessage()
{
    var expected = "Version found: 3, expected: 1";
    var actual = _expectedConcurrencyException.Message;

    Assert.AreEqual(expected, actual);
}
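The When step isn't shown either; one hypothetical implementation that produces the versions asserted above: the stream sits at version one after creation, a first append of two events bumps it to three, and a second append still expecting version one then throws.
private void WhenAppendingTwoEventStreamsWithTheSameExpectedVersion()
{
    var events = new EventStream(new List<IEvent>
    {
        new ConcurrencyTestEvent(),
        new ConcurrencyTestEvent()
    });

    // Succeeds: the stream is at version 1 as expected, and moves to 3.
    _eventStore.Append(_aggregateId, events, expectedVersion: 1);

    // Throws: the stream is now at version 3, not 1.
    _eventStore.Append(_aggregateId, events, expectedVersion: 1);
}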
Reading the event stream doesn't change much; we now also read the current version, and return it with the event stream. 
var lines = File.ReadAllLines(path);

if (lines.Any())
{
    // Materialize the records once, so the stream isn't deserialized
    // twice by the two queries below.
    var records = lines
        .Select(x => Record.Deserialize(x, _assembly))
        .ToList();
    var currentVersion = records.Max(x => x.Version);
    var events = records.Select(x => x.Event).ToList();

    return new ReadEventStream(events, currentVersion);
}

return null;
And that's one way to implement optimistic concurrency. The biggest bottleneck in this approach is how we read the current version; having to read all the events to find the current version isn't very efficient.
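One way out, hinted at earlier, is keeping the current version as meta data instead of deriving it on every append. A rough sketch, assuming a hypothetical in-memory cache next to the file store - warmed on first use, and bumped inside the aggregate's lock after every successful append.
public static class VersionCache
{
    private static readonly ConcurrentDictionary<Guid, int> _versions =
        new ConcurrentDictionary<Guid, int>();

    // Scans the stream only the first time an aggregate is touched.
    public static int GetCurrent(Guid aggregateId, Func<int> scanStream)
    {
        return _versions.GetOrAdd(aggregateId, _ => scanStream());
    }

    // To be called after a successful append, inside the aggregate's lock.
    public static void Set(Guid aggregateId, int version)
    {
        _versions[aggregateId] = version;
    }
}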

Transactional behaviour is also missing. I've been thinking about adding a COMMIT flag after appending a set of events, and using that to detect and resolve corruption on reads - or is this fundamentally flawed?

Sunday, November 3, 2013

Event source all the things?

Having covered projections last week, I think I have come full circle in these posts, which turned out to be a small preliminary series on event sourcing. Even though there is still a vast amount of nuance to discover, I think I've captured the gist of it. Even without running an event sourced system in production, I feel as if I have somewhat of an idea of what event sourcing can bring to the table.

Event sourcing gives you a complete history of events that caused an aggregate to be in its current state. In some scenarios this will add an enormous amount of value, in other scenarios it will give you nothing - it might even steal time and effort.

The first thing to do - before even considering implementing event sourcing - is talk to your business. Do they feel as if events are a natural way to represent what's going on in their domain? Event sourcing is a lot more than just a technical implementation detail; discovering and understanding all of what goes on in a domain is a big investment - from both sides. Is it worth the trouble?

In my first job I worked on software for fire departments. I only now realize in how many parts of our solution event sourcing could have helped us (a sketch of what such events might look like follows this list):
  • the life cycle of a vehicle assigned to an emergency: vehicle dispatched, vehicle left the station, vehicle en route, vehicle arrived on the scene, vehicle back in the station...
  • a person's career: person was promoted, person was detached to another station, person learned a new skill...
  • a shift's schedule: person attached to unit, person returned to person pool, unit dispatched...
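To make that concrete: hypothetical event types for the vehicle life cycle - the names and properties are mine, the real system never modeled these explicitly.
public class VehicleDispatched : IEvent
{
    public Guid VehicleId { get; set; }
    public Guid EmergencyId { get; set; }
    public DateTime DispatchedOn { get; set; }
}

public class VehicleLeftStation : IEvent
{
    public Guid VehicleId { get; set; }
    public DateTime LeftOn { get; set; }
}

public class VehicleBackInStation : IEvent
{
    public Guid VehicleId { get; set; }
    public DateTime ReturnedOn { get; set; }
}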
This data had to be made available in a set of diverse read models. Getting the data out was complex at times, often even impossible. A lot of these changes had to be propagated to external systems; there was no way to get that info out in real-time, and external systems had no notion of what happened.

In one of the functionalities of a system I'm currently working on, users also wanted to know what happened in the past, but for completely different reasons. Being in a financial context, they wanted to know who was responsible for changing system settings. Here it's not an event log they need, but a simple audit trail.

If it is just a passive log your business wants, you can get away with cheaper alternatives; a command journal, an audit trail and so on.

Benefits

Event sourcing goes hand-in-hand with Domain Driven Design. Events are a great tool to go from a structural model to a behavioural model, helping you to capture the true essence of a domain model.

Building and maintaining an event store should be doable. It's an append-only data model, storing serialized DTOs with some meta data. This makes tooling easier as well, compared to ORMs and relational databases.

In traditional systems, you have to keep a lot of things in your head at once: how do I write my data, how do I query it, and - more importantly - how do I get it out in all these different use cases without making things too hard? In event sourced systems, separating writes from reads makes for more granular bits, easing the cognitive load.

Events can be projected into anything: a relational database, a document store, memory, files... This allows you to build a read model for each separate use case, while also giving you a lot of freedom in how you're going to persist them.
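As a minimal sketch - reusing the hypothetical vehicle events from earlier - an in-memory projection could be as small as this.
// In-memory read model: how many vehicles are out of the station right now.
public class VehiclesOutProjection
{
    public int VehiclesOut { get; private set; }

    public void Handle(IEvent @event)
    {
        if (@event is VehicleLeftStation)
            VehiclesOut++;
        if (@event is VehicleBackInStation)
            VehiclesOut--;
    }
}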

You can replay projections, rebuilding a read model from scratch. Forget about difficult data migrations.

Testing feels consistent and very complete. A test will assert whether all the expected events were raised, but will also implicitly assert that unexpected events were not raised. Testing projections is also straightforward.

Events provide a natural way of integrating with other systems. Committed events can be published to external subscribers.

Troubleshooting becomes easier, since a developer can copy an event stream from production and replay it locally - reproducing the exact issue without jumping through hoops to get the system into a specific state.

Instead of patching corrupted production data directly, you can send a compensating event or fix the projection and replay everything. This way nothing gets lost, and consistency between code and outcome is guaranteed.

Downsides

Defining events is hard. Defining good events takes a lot of practice and insight. If you're forcing a structural model into a behavioural one, it might even be impossible. So don't even consider turning CRUD into an event sourced model.

There are a few places where you need to be on the lookout for performance bottlenecks. Event streams of long-lived aggregates might grow very big. Loading a giant event stream from a data store might take a while - snapshots can help here. Projecting giant event streams might get you into trouble too - how long will it take to rebuild your read model, and will it even fit into memory? Making projections immediately consistent might become a problem if you have a lot of them. Parallelization, or giving up on immediate consistency, might bring solace.
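Snapshots, for what it's worth, come down to persisting the aggregate's state at a known version, so rehydration only replays the events that came after it. A hypothetical shape:
// Hypothetical snapshot: the aggregate's state captured at a version;
// rehydration loads this, then replays only events with a higher version.
public class Snapshot<TState>
{
    public Guid AggregateId { get; set; }
    public int Version { get; set; }
    public TState State { get; set; }
}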

Events don't change, so versioning might get awkward. Are you going to create a new event type for each change, or will you relax deserialization? Or maybe you want to implement event migrations?

Since you're persisting multiple models - events and one or more read models - you're going to consume more storage, which will cost you.

Adoption in the wild

Although there are - from a business and engineering perspective - some good arguments to be made for event sourcing, those arguments only apply to a modest percentage of projects. Even when there's a strong case to be made for event sourcing, there are very few people with actual experience implementing an event sourced system, and prescriptive frameworks that you can just drop into a project and feel good about are lacking. Most won't even care about event sourcing to start with, but even if they do, it's an uphill battle; it introduces a risk most might not be comfortable with.

Having said that, there are some really good projects out there that are steadily gaining popularity and maturity. Pioneers in the field are sharing and documenting their experiences, lowering the barriers for others. Things are moving for sure.

As always, event sourcing is not a paradigm to blindly apply to each and every scenario, but definitely one worth considering.

Since I'm not running any of it in production, tell me what I'm missing; there must be more things that turn out to be harder than they sound at first, right? If you're not running it in production but thinking about it, what are some of your concerns? What are your predictions for the future of event sourcing?