Sunday, February 24, 2013

My Christmas holiday project postmortem

Somewhere over a year and a half ago I discovered the music of Dire Straits, which sparked a fanatical fascination in me with the guitar, and with basically every piece of music Mark Knopfler has ever touched (*). A year ago, I finally had the courage to pick up the guitar myself. Not sure whether I'd stick with it, I made an uninformed purchase of a rather inexpensive Squier Jazzmaster, just because it somewhat resembled the real object of desire, a Fender Stratocaster. Three months ago, I got rid of the Jazzmaster and bought myself a tango red Mexican Stratocaster, and I'm in love with it. Yet I have also become very fond of the sound of a Les Paul these days. Having just bought the Strat, I thought I could maybe find a cheap used Les Paul online.

I set out to find one on the most popular Belgian online secondhand marketplace (600k visitors daily), browsing their listings daily. This turned out to be rather cumbersome and inefficient: I was constantly repeating the same process, items were already sold before I could make a bid, or there were no new items since my last visit. It didn't take long before I started thinking of a way to automate this dull process. Looking at other marketplaces, I found that some unburden their users by providing notifications: push instead of pull. Maybe I could build something similar?

Doing a bit of research over the weekend, I found out that they expose what I call an accidental API: written in the first place to provide a snappy mobile user experience, not to expose all their data to third parties. Having an uncomplicated way of searching their data, I built a bit of code on top of it. Nothing too fancy though: a Quartz job which periodically queries their service, parses the results and stores them in a RavenDB database. A few times a day, these search results are compiled and sent to my inbox using MailGun. All of this runs for free on AppHarbor.

When I showed all of this to my girlfriend, she asked if I could set up the same thing for her, but for a specific type of camera. That's when I decided it could be useful enough to make publicly available. I thought of a few ways to cover the (extremely low) hosting costs: embed ridiculously relevant ads in the mails, make people pay for more frequent polling, or sell the whole thing (although very unlikely).

Two weeks later, I had something working online.


But how do I inform people of its existence? I first thought of using Twitter to monitor for relevant tweets where people ask for something secondhand. I started out with a new dedicated generic account, but it got suspended rather quickly. In the second iteration, I used my personal account. This yielded better results; people saw I was human, and regularly thanked me for the tip. They didn't sign up too often though.

Not willing to give up on the idea just yet, I used Google AdWords to advertise on the online marketplace directly. The results of this campaign were sobering, but extremely valuable: people just didn't care. As a side note, "secondhand" seems to be quite an expensive keyword!



In hindsight, I can think of plenty of reasons why this somewhat useful project never took off:
  • The problem it is trying to solve obviously isn't enough of a pain!
  • People don't Google for it, and even if they did, this site had hardly any chance of making it to the first page.
  • In general, the secondhand offering is enormous, and people often quickly settle for less.
  • Specialized markets aren't situated on these sites.
  • People don't like giving away their email address, definitely not to strangers they don't trust.
  • Everyone struggles to keep their inbox clean; receiving more mail puts people off.

Damn, it's always evident in hindsight.

Here are a few things I learned/got confirmed:
  • Being your own customer is priceless; you know exactly where the value is at.
  • Chances are you will never see a user; don't spend too much time optimizing for scalability and reliability.
  • Some Google ads plus a website with a simple form can be enough to let you cheaply validate the worthiness of pursuing an idea.

(*) Take some time to explore his work: Brothers in Arms, Postcards from Paraguay, Song for Sonny Liston, You and Your Friend, and so many more are worth a listen.

Sunday, February 17, 2013

Adding the R to CQS: some storage options

I've been writing quite a bit about CQS (command-query separation) lately. In my last post on using events, I already hinted towards bringing in the R: command query responsibility segregation.

With CQS, commands can mutate data, while queries can only read that data. CQRS takes this one step further, and assigns commands and queries each a dedicated model; we now talk of a write side, and a read side.

I like Clemens Vasters' definition best.
CQRS is a simple pattern that strictly segregates the responsibility of handling command input into an autonomous system from the responsibility of handling side-effect-free query/read access on the same system. Consequently, the decoupling allows for any number of homogeneous or heterogeneous query/read modules to be paired with a command processor. This principle presents a very suitable foundation for event sourcing, eventual-consistency state replication/fan-out and, thus, high-scale read access. In simple terms, you don’t service queries via the same module of a service that you process commands through. In REST terminology, GET requests wire up to a different thing from what PUT, POST, and DELETE requests wire up to.
A nice drawing also helps in understanding CQRS (from the CQRS journey material).


Although scalability seems to be one of the big selling points of CQRS, there are still some valid arguments applicable to my world, the strongest one being the ability to avoid the discrepancy that exists when you use the same model for reading and writing. I think everyone suffers from this one regularly. A popular and realistic example: an ORM is used to map our domain model to a relational database, and the tables are mapped very closely to the structure of the domain model. Not long after, it becomes evident that it's impossible to write simple and performant queries targeting this data structure. We could optimize for reads, but this would impact the complexity and performance of writes. With CQRS, reads and writes are segregated, so we can optimize both parts independently. And this doesn't only result in being able to show a list faster on a user's screen; interesting things can also be done to empower reporting and data mining. Think of how often using the same database for these tasks makes it hard and expensive to change things.

While I have still done very little with CQRS, I have been looking at more and more real-world examples, trying to fill in the blanks. What has always been kind of vague to me is how you go about storing your domain model and your read models in practice. Here are a few possible techniques: partially proven techniques, partially my own presumptions (you hardly find any OSS brownfield examples).

The compromise

CQRS doesn't necessarily have to be an application-wide architecture; nothing stops you from introducing it gently, applying it just to those parts of your application where the added value is obvious. This could mean that you use a conventional architecture - a relational database with an ORM, or a document store - not distinguishing the write side from the read side, yet for certain scenarios introduce a specialized read or write side. For example: update the statistics read model on every relevant write, maintain a denormalized read model optimized for searches, etc.
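To make that last scenario concrete, here is a minimal, self-contained sketch of keeping a denormalized search read model in sync on every relevant write. All type names are mine, invented for illustration; in a real application the write store would be an ORM or document store, and the read model would live in its own storage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Write side: the full domain object, stored however you like.
public class Subscription
{
    public string Id { get; set; }
    public string Value { get; set; }
    public string Category { get; set; }
    public string EmailAddress { get; set; }
}

// Read side: a flat, denormalized model optimized for searching.
public class SubscriptionSearchEntry
{
    public string Id { get; set; }
    public string SearchText { get; set; }
}

public class SubscriptionService
{
    private readonly Dictionary<string, Subscription> _writeStore =
        new Dictionary<string, Subscription>();
    private readonly List<SubscriptionSearchEntry> _readModel =
        new List<SubscriptionSearchEntry>();

    public void Create(Subscription subscription)
    {
        // The write touches its own model...
        _writeStore[subscription.Id] = subscription;

        // ...and immediately updates the denormalized read model.
        _readModel.Add(new SubscriptionSearchEntry
        {
            Id = subscription.Id,
            SearchText = (subscription.Value + " " + subscription.Category).ToLowerInvariant()
        });
    }

    // Queries never touch the write store.
    public IEnumerable<string> Search(string term)
    {
        var lowered = term.ToLowerInvariant();
        return _readModel
            .Where(entry => entry.SearchText.Contains(lowered))
            .Select(entry => entry.Id);
    }
}
```

Note how both sides can now evolve independently: adding a field to the domain object doesn't touch the search model, and reshaping the search model doesn't touch the domain object.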

NORM

While the relational paradigm definitely has its place, mapping your domain to the database can get complex and require much maintenance. If you don't expect your write side to be queryable, you can take advantage of less cumbersome techniques, such as a key-value store, to store your domain model. This does force you to completely separate reads from writes though.

Event Sourcing

When you look at most OSS CQRS implementations, Event Sourcing and CQRS go hand in hand. With Event Sourcing, you capture all application state changes as a sequence of events. I'm really fond of the theory behind this pattern, and I can imagine the added operational value of having a log of each change. Yet I also think you could largely achieve the same result by enabling journaling and adding some interception. Storage-wise, you store all the event streams in an event store, which is optimized for that task. Your read side can again be whatever you fancy.
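To illustrate the idea itself (not any particular event store product), a naive in-memory version could look like this; a stream is just an append-only list of events per aggregate, and current state is never stored but derived by replaying them. All names are mine, made up for the sketch.

```csharp
using System;
using System.Collections.Generic;

public interface IEvent { }

public class SubscriptionCreated : IEvent { public string EmailAddress; }
public class SubscriptionCancelled : IEvent { }

// A naive event store: one append-only stream of events per aggregate id.
public class InMemoryEventStore
{
    private readonly Dictionary<string, List<IEvent>> _streams =
        new Dictionary<string, List<IEvent>>();

    public void Append(string streamId, IEvent @event)
    {
        if (!_streams.ContainsKey(streamId))
            _streams[streamId] = new List<IEvent>();

        _streams[streamId].Add(@event);
    }

    public IEnumerable<IEvent> Read(string streamId)
    {
        return _streams.ContainsKey(streamId)
            ? _streams[streamId]
            : new List<IEvent>();
    }
}

// State is rebuilt by replaying the stream from the beginning.
public class SubscriptionState
{
    public bool Active { get; private set; }

    public static SubscriptionState Replay(IEnumerable<IEvent> stream)
    {
        var state = new SubscriptionState();
        foreach (var @event in stream)
        {
            if (@event is SubscriptionCreated) state.Active = true;
            if (@event is SubscriptionCancelled) state.Active = false;
        }
        return state;
    }
}
```

A real event store adds concurrency control, snapshots and durable storage on top of this, but the essence stays the same: the events are the source of truth, and everything else is a projection of them.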

These three techniques aren't mutually exclusive. There are a bunch of arguments to consider, and everything is highly dependent on your technical and operational requirements.

What is your experience with CQRS? Which techniques have you applied in practice?

Sunday, February 10, 2013

Premature judgment

When I started my first job, I hardly ever judged my peers. After all, how could I? Everything was unknown to me; I couldn't differentiate good from bad. Over the years that has changed a bit, but with that, I've also slowly become more judgmental towards peers, often prematurely, and not always deservedly.

The first few months of last year, I found myself doing maintenance on a legacy code base between projects. While I worked my way through layer after layer, I pointed my frustration towards those who had come before me; they were responsible for putting me in this mess. With half the office having touched the code base, that didn't really add up though. When I looked at the commit history of some of the offending modules, I found names that I didn't expect; those people were still around, and I actually thought pretty highly of them.

Judging someone's competence solely by code they have written in the past is flawed. There are very few pieces of code I have written over the years that I still feel comfortable about today. When I reflect on what made things go wrong, I don't have to look far to find a bunch of reasons to blame: consistency was favored over common sense, major breakthroughs occurred only after the project was already in maintenance mode, people inexperienced with the domain and infrastructure were dumped on the project last minute to make up for bad planning, knowledge of the technology stack hadn't matured, some patterns and practices weren't commonplace yet, etc. I always find plenty of reasons to shift blame, but when I look at code written by someone else, it has to be their own fault; they must not be very good at building software. And this is unfair; I have no way of knowing the constraints they had to deal with, nor the context they had to work in. None of this justifies neglecting basic hygiene though!

I tried to come up with other things that influence my opinion of someone before I have actually worked with them; I found two.
The biggest influencer is word of mouth. I try to surround myself with people who share a similar way of thinking, and if that trusted circle has a strong opinion on someone, I take note.
The last influencer is someone's online presence. When I learn of someone new joining the ranks, I can't resist looking up what they're doing online. Twitter, Facebook or a blog can give away quite a bit.

All in all, I think some preconception is human, and might be the result of subconsciously protecting your work. You only want to involve those with whom you will enjoy working towards your shared goal.

Have you experienced similar behavior? When do you judge prematurely?

Sunday, February 3, 2013

Raising events in commandhandlers

I've explored quite a few options on how to handle commands and queries in the last few posts. I finally settled on this approach. The example used in that post looked like this.
public class CreateSubscriptionCommandHandler : ICommandHandler<CreateSubscriptionCommand>
{
    private readonly IDocumentSession _session;

    public CreateSubscriptionCommandHandler(IDocumentSession session)
    {
        _session = session;
    }

    public void Handle(CreateSubscriptionCommand command)
    {
        var subscription = new Documents.Subscription(
            command.Value, command.Category, command.EmailAddress);

        _session.Store(subscription);
    }
}
Now imagine I would want to do some extra stuff after creating the subscription: update the sales statistics, append the email address to a mailing list, send out a confirmation email, etc.

You could go about this by simply extending the commandhandler, but the problem is that you quickly end up with a bulky and dependency-heavy commandhandler, which will fail to communicate its intent.

One solution could be to introduce events to decouple things in smaller pieces, and to help communicate intent more clearly.

The infrastructure to handle events is rather straightforward, and can be based on Udi Dahan's Domain Events Salvation.
public class Events : IEvents
{
    private readonly IKernel _kernel;

    public Events(IKernel kernel)
    {
        _kernel = kernel;
    }

    public void Raise<T>(T @event) where T : IEvent
    {
        var handlers = _kernel.GetAll<IEventHandler<T>>();

        foreach (var handler in handlers)        
            handler.Handle(@event);        
    }     
}
When an event is raised, the eventing infrastructure looks in the container for implementations that can handle the event, and invokes them in no particular order.

Raising an event from the commandhandler can be done by injecting this extra piece of infrastructure.
public class CreateSubscriptionCommandHandler : ICommandHandler<CreateSubscriptionCommand>
{
    private readonly IDocumentSession _session;
    private readonly IEvents _events;

    public CreateSubscriptionCommandHandler(IDocumentSession session, IEvents events)
    {
        _session = session;
        _events = events;
    }

    public void Handle(CreateSubscriptionCommand command)
    {
        var subscription = new Documents.Subscription(
            command.Value, command.Category, command.EmailAddress);

        _session.Store(subscription);

        _events.Raise(new SubscriptionCreatedEvent(subscription.Id));
    }
}
The SubscriptionCreatedEvent class is a simple value object, which exposes the subscription identifier.
public class SubscriptionCreatedEvent : IEvent
{
    public SubscriptionCreatedEvent(string subscriptionId)
    {
        SubscriptionId = subscriptionId;
    }

    public string SubscriptionId { get; private set; }

    public override bool Equals(Object other)
    {
        if (other == null)
            return false;

        var otherEvent = other as SubscriptionCreatedEvent;
        if (otherEvent == null)
            return false;

        return otherEvent.SubscriptionId == SubscriptionId;
    }

    public override int GetHashCode()
    {
        return SubscriptionId.GetHashCode();
    }
}   
To subscribe to this event, implement the IEventHandler interface, and register the implementation in the container.
public interface IEventHandler<T> where T : IEvent
{
    void Handle(T @event);
}

public class SendConfirmationMailOnSubscriptionCreated : IEventHandler<SubscriptionCreatedEvent>
{    
    public void Handle(SubscriptionCreatedEvent @event)
    {
        ...
    }
}

public class UpdateSalesStatisticsOnSubscriptionCreated : IEventHandler<SubscriptionCreatedEvent>
{    
    public void Handle(SubscriptionCreatedEvent @event)
    {
        ...
    }
}
Eventhandlers are invoked synchronously, and participate in the commandhandler's unit of work; if something goes haywire in one of the eventhandlers, nothing gets committed, not even the work of the original commandhandler. Depending on your requirements, you might want to handle this differently though.

With this approach, tests also become more compact. Commandhandler tests now only need to assert that the event gets raised, and all the other logic gets offloaded to separate tests per eventhandler.
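For example, with a simple recording fake for the IEvents interface (repeated below so the sketch stands on its own; the fake itself is my invention, not part of the infrastructure above), a commandhandler test boils down to handling the command and asserting on the recorded events.

```csharp
using System.Collections.Generic;
using System.Linq;

public interface IEvent { }

public interface IEvents
{
    void Raise<T>(T @event) where T : IEvent;
}

// A recording fake: remembers every raised event so tests can assert on them.
public class RecordingEvents : IEvents
{
    public readonly List<IEvent> RaisedEvents = new List<IEvent>();

    public void Raise<T>(T @event) where T : IEvent
    {
        RaisedEvents.Add(@event);
    }
}

public class SubscriptionCreatedEvent : IEvent
{
    public SubscriptionCreatedEvent(string subscriptionId)
    {
        SubscriptionId = subscriptionId;
    }

    public string SubscriptionId { get; private set; }
}
```

A test then news up the commandhandler with this fake, invokes Handle, and asserts that RaisedEvents contains a single SubscriptionCreatedEvent with the expected identifier; everything the eventhandlers do is covered by their own tests.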

Summary

By introducing events, you can decouple commandhandlers into more focused and intent-revealing bits. Your tests are the perfect proof of how much cleaner things get. One of the cues to listen for is "when you do x, also do y and z".

Are you using events? If so, domain events, or its big brother Event Sourcing?