Repositories, where did we go wrong?

In essence, repositories are a simple abstraction over aggregate storage. A repository will insert, update, delete or fetch an aggregate from the underlying persistence mechanism. This abstraction avoids that databases, SQL statements, Object Mappers and the like leak into your domain. Next to that, swapping out repositories for an in-memory version makes testing easier.

Recently, the use of repositories is being questioned again.

Why would we wrap Object Mappers in yet another abstraction? Aren’t Object Mappers already an implementation of the repository pattern? In a recent project, we left out repositories. In that project we’re using RavenDB, which already has an expressive API, and which can be configured to use an in-memory database for testing. Even though LINQ and indexes help make simple queries expressive, a lot of cruft still leaks in, not doing the language any justice. In other projects, we did make use of repositories over our ORM. Partly because setting up in-memory tests without was awkward at best, but also because it removed constraints trying to capture the language. Next to testing and expressiveness, you should also consider how comfortable you feel gluing everything to a library or framework. When it comes to aggregate storage, having those repositories is a small price to pay to keep technicalities out.

Another remark is that a repository makes it hard to control eager- and lazy loading, which is contextual. In general I think that lazy loading introduces unpredictable behaviour. Getting in trouble without lazy loading is a strong indication that your aggregates are just too big.

The last and loudest argument is that once you have a view heavy application things get dirty really fast. It starts by adding a few badly named query methods on your repositories. Then, you start to see use cases where you need to query over multiple aggregates and deal with projections or aggregations. In these situations repositories won’t help you.
Truth is that repositories were never intended for complex reads. Views on the data that your application needs rarely resemble the structure of your aggregates. Making your aggregates suited for querying inevitably steers away from behaviour thinking, back to data thinking. The trick is to separate read concerns from your domain. Instead of trying to use repositories for querying, make use of the best tool for the job, something as close to the database as possible. The implementation depends on your flavor, but what has worked for me is having use case optimized read models, a query object and a query handler that reads from the database and converts the result into a read model. The implementation of each query handler can differ; from raw SQL, to hibernate query language, to a micro ORM… whatever works best really.
Doing this, you allow your domain model to stay focused on the task at hand - handling complex business problems, staying far away from read concerns. Before you know it you’re successfully applying that popular four letter acronym, enabling you to even try other concepts without having to rewrite your model completely.