Sunday, January 26, 2014

Repositories, where did we go wrong?

In essence, repositories are a simple abstraction over aggregate storage. A repository will insert, update, delete or fetch an aggregate from the underlying persistence mechanism. This abstraction keeps databases, SQL statements, Object Mappers and the like from leaking into your domain. In addition, swapping out repositories for an in-memory version makes testing easier.
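To make that concrete, here is a minimal sketch of what such an abstraction could look like. Everything in it is an assumption made for illustration: the `Order` aggregate, the `OrderRepository` interface and the `add_line_to_order` use case are hypothetical names, not taken from any project mentioned here.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Protocol
from uuid import UUID, uuid4


@dataclass
class Order:
    """Hypothetical aggregate, used only for illustration."""
    id: UUID
    lines: list[str] = field(default_factory=list)

    def add_line(self, product: str) -> None:
        self.lines.append(product)


class OrderRepository(Protocol):
    """The only thing the domain sees: no SQL, sessions or mapper types."""

    def get(self, order_id: UUID) -> Order: ...
    def save(self, order: Order) -> None: ...
    def delete(self, order_id: UUID) -> None: ...


class InMemoryOrderRepository:
    """Test double that keeps aggregates in a dictionary.

    Aggregates are deep-copied on the way in and out, so changes only
    become visible after an explicit save, as with a real store.
    """

    def __init__(self) -> None:
        self._orders: dict[UUID, Order] = {}

    def get(self, order_id: UUID) -> Order:
        # Raises KeyError when the aggregate does not exist.
        return copy.deepcopy(self._orders[order_id])

    def save(self, order: Order) -> None:
        self._orders[order.id] = copy.deepcopy(order)

    def delete(self, order_id: UUID) -> None:
        del self._orders[order_id]


def add_line_to_order(repo: OrderRepository, order_id: UUID, product: str) -> None:
    """Hypothetical use case: fetch, let the aggregate do its work, store."""
    order = repo.get(order_id)
    order.add_line(product)
    repo.save(order)
```

A production implementation would satisfy the same interface on top of an ORM or a document store, while tests hand `InMemoryOrderRepository` to the use case.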

Recently, the use of repositories has been questioned again.

Why would we wrap Object Mappers in yet another abstraction? Aren't Object Mappers already an implementation of the repository pattern? In a recent project, we left out repositories. In that project we're using RavenDB, which already has an expressive API and can be configured to use an in-memory database for testing. Even though LINQ and indexes help make simple queries expressive, a lot of cruft still leaks in, not doing the language any justice. In other projects, we did use repositories over our ORM, partly because setting up in-memory tests without them was awkward at best, but also because it removed constraints when trying to capture the language. Besides testing and expressiveness, you should also consider how comfortable you feel gluing everything to a library or framework. When it comes to aggregate storage, having those repositories is a small price to pay to keep technicalities out.

Another remark is that a repository makes it hard to control eager and lazy loading, which is contextual. In general I think that lazy loading introduces unpredictable behaviour. Getting into trouble without lazy loading is a strong indication that your aggregates are just too big.

The last and loudest argument is that once you have a view-heavy application, things get dirty really fast. It starts by adding a few badly named query methods on your repositories. Then, you start to see use cases where you need to query over multiple aggregates and deal with projections or aggregations. In these situations repositories won't help you.
The truth is that repositories were never intended for complex reads. Views on the data that your application needs rarely resemble the structure of your aggregates. Making your aggregates suited for querying inevitably steers away from behaviour thinking, back to data thinking. The trick is to separate read concerns from your domain. Instead of trying to use repositories for querying, make use of the best tool for the job, something as close to the database as possible. The implementation depends on your flavor, but what has worked for me is having use-case-optimized read models, a query object, and a query handler that reads from the database and converts the result into a read model. The implementation of each query handler can differ: from raw SQL, to Hibernate Query Language, to a micro ORM... whatever works best really.
Doing this, you allow your domain model to stay focused on the task at hand - handling complex business problems, staying far away from read concerns. Before you know it you're successfully applying that popular four-letter acronym, enabling you to even try other concepts without having to rewrite your model completely.
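
As a rough sketch of the read side described above (a read model, a query object and a query handler), the following uses raw SQL through Python's built-in `sqlite3` module. The table layout, the `CustomerOrderSummary` read model, the `TopCustomersQuery` object and the `create_demo_database` helper are all hypothetical and exist only to keep the example self-contained; a different handler could just as well use a micro ORM.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerOrderSummary:
    """Read model shaped for one screen, not for any aggregate."""
    customer_name: str
    order_count: int
    total_amount: float


@dataclass(frozen=True)
class TopCustomersQuery:
    """Query object: carries only the parameters of the question."""
    limit: int


class TopCustomersQueryHandler:
    """Reads straight from the database and maps rows to read models."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def handle(self, query: TopCustomersQuery) -> list:
        rows = self._connection.execute(
            """
            SELECT c.name, COUNT(o.id), SUM(o.amount)
            FROM customers c
            JOIN orders o ON o.customer_id = c.id
            GROUP BY c.id, c.name
            ORDER BY SUM(o.amount) DESC
            LIMIT ?
            """,
            (query.limit,),
        ).fetchall()
        return [CustomerOrderSummary(*row) for row in rows]


def create_demo_database() -> sqlite3.Connection:
    """Hypothetical schema and data, only here so the sketch can run."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount REAL NOT NULL
        );
        """
    )
    connection.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    connection.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0)],
    )
    return connection
```

Note that the handler joins and aggregates across what would be two aggregates on the write side, and no repository or domain object is involved in answering the query.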


  1. Is that like this: ?

  2. One more thought on lazy loading. For me, the need for such ORM tricks indicates that the aggregates are not properly designed. An aggregate defines the transactional consistency boundary. If some entities are sometimes loaded and sometimes not, it means they are not the part that needs to stay consistent with the rest of the aggregate. So they are not a part of it.

    1. I've found exactly the same - optional loading meant that I was designing my aggregates wrong. It was a happy day when I turned off NHibernate lazy loading.

  3. Hi
    I think you expressed my thoughts in a clear way. Rather than searching for "purity", it's better to be pragmatic and use the right tool.

  4. In my opinion, what happens on most software projects is that the team starts working on the "1 right way" to access the data. And they start bending and shaping that "1 right way" to fit everything. When in actual fact, what is required is to be more pragmatic, make use of many patterns, and use the right one for the task at hand!

    Splitting your projects out into isolated functional units of activity is far better than trying to code your entire project to use some predefined patterns, which may become too restrictive to meet isolated user stories.

    In my opinion, pragmatism should supersede patternism!

  5. I totally agree. After experimenting with CQRS I realized how much garbage my Entities had to carry around for the sake of generating the DTOs that my GUI needed. Even more amazing is realizing that now I can have way more Entities that expose no attributes, and just the operations named according to the domain model.

    A second side effect is that ORMs are less of a need. The database is less of a plain object container, and more of a collection of different projections of the same data. This increases the value of good database knowledge. Because of that, I am suddenly more interested in re-learning SQL and exploring beyond the NoSQL document databases, like the BigTable type.