Sunday, May 26, 2013

Accidental entities - you don't need that identity

An entity is identified by an identifier, while value objects are identified by their value.

If I make a living renting cars to tourists, I might not care in the least about the identity of the colors the cars came in. I just care about their value; Rosso Corsa, Azzurro Metallic... If I repaint the car, the color changes, and the previous color is abandoned as a whole.
However, if I were a car paint manufacturer, I would care a great deal about the identity of a color. My first action might be to make up a marketable name for the color, something that I can identify it with - a la Burnt Sienna or Iceberg Blue. The color might have a certain structure from the get-go, but I might experiment with the structure along the way, while I'm still referring to it as the same color.

Imagine that I'm implementing a tool to manage the car rental's fleet, and that the CEO told me that color is one of the specifications that seems to matter a lot to their customers. The list of available colors is rather limited though; black, dark gray, and blue. Yet the CEO insists on managing this collection by herself; this should avoid having to call in another expensive consultant a few years down the road.

Color as an entity

So we define a collection of colors that will be persisted by NHibernate. Since NHibernate and our relational database don't play nice without a primary key, we add an identifier to the colors.

We end up with two classes; one entity that defines a color, and another entity that defines a car. A car references a color.
public class Car
{
    public Car(Color color) 
    {
        if (color == null)
            throw new ArgumentNullException("color");

        Color = color;
    }

    protected Car() { }

    public virtual int Id { get; protected set; }

    public virtual Color Color { get; protected set; }

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;

        var car = obj as Car;

        if (car == null)
            return false;

        return car.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public class Color
{
    public Color(string name, string hexadecimalNotation)
    {
        if (string.IsNullOrEmpty(name))
            throw new ArgumentNullException("name");
        if (string.IsNullOrEmpty(hexadecimalNotation))
            throw new ArgumentNullException("hexadecimalNotation");

        Name = name;
        HexadecimalNotation = hexadecimalNotation;
    }

    protected Color() { }

    public virtual int Id { get; protected set; }

    public virtual string Name { get; protected set; }

    public virtual string HexadecimalNotation { get; protected set; }

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;

        var color = obj as Color;

        if (color == null)
            return false;

        return color.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public class CarClassMap : ClassMap<Car>
{
    public CarClassMap()
    {
        Id(x => x.Id).GeneratedBy.HiLo("10");

        References(x => x.Color);          
    }
}

public class ColorClassMap : ClassMap<Color>
{
    public ColorClassMap()
    {
        Id(x => x.Id).GeneratedBy.HiLo("10");

        Map(x => x.Name).Length(30).Not.Nullable();

        Map(x => x.HexadecimalNotation).Length(7).Not.Nullable();
    }
}
The generated schema looks like this.



And while this looks innocent at first, accidentally creating an entity raises a bunch of new concerns and questions. What happens if a color is no longer available, and the CEO wants to remove it from the collection? Does that mean we should delete all models that came in this color? No, those colors still exist, we're not going to repaint all the vehicles; those colors just aren't available anymore. This hints towards a concept that might be missing.

Fighting symptoms

We see the CEO heading over to the cafeteria, so we jump up, and ask her whether it makes sense for her to mark those colors as unavailable, instead of deleting them. After a short delay she shrugs and replies: "Well, I could do that if that makes things easier for you." We go ahead and model our solution to reflect this new information. 
public class Color
{
    public Color(string name, string hexadecimalNotation)
    {
        if (string.IsNullOrEmpty(name))
            throw new ArgumentNullException("name");
        if (string.IsNullOrEmpty(hexadecimalNotation))
            throw new ArgumentNullException("hexadecimalNotation");

        Name = name;
        HexadecimalNotation = hexadecimalNotation;
        Available = true;
    }

    protected Color() { }

    public virtual int Id { get; protected set; }

    public virtual string Name { get; protected set; }

    public virtual string HexadecimalNotation { get; protected set; }
    
    public virtual bool Available { get; protected set; }
    
    public virtual void MakeUnavailable() 
    {
        Available = false;
    }

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;

        var color = obj as Color;

        if (color == null)
            return false;

        return color.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}
A few days later we show off what we came up with to the CEO. She looks content with what we built over the last few days, until we show her the user interface that manages the colors. "I'm rather busy so I often make mistakes when I take care of these administrative tasks, could you add a button to really delete a color from this list anyway?"

This brings us back to square one. The first thing we think about is soft deleting the colors. We could also only make it possible to remove a color if it hasn't been referenced yet. A voice in the back of our heads keeps telling us that we must be missing something though, and that this seems to be harder than it should be.

A few hours later, driving home after a tough day, it becomes obvious that the CEO really thinks of a color as a value instead of an entity, so we should really be modeling it as such.

Color as a component

Luckily, NHibernate makes this pretty simple. The next day, we arrive early at the office, and change our mapping to use a component, so that instead of the car referencing a color, we store the value, and lose the id.
public class CarClassMap : ClassMap<Car>
{
    public CarClassMap()
    {
        Id(x => x.Id).GeneratedBy.HiLo("10");

        Component(
           x => x.Color,
           m =>
           {
               m.Map(x => x.Name).Column("ColorName").Length(30).Not.Nullable();
               m.Map(x => x.HexadecimalNotation).Column("ColorHex").Length(7).Not.Nullable();
           });
    }
}

The generated schema now looks like this; we're no longer referencing the Color table.



When we store a color now, it's an entity. But as soon as we put it on a car, it's a value object. When we pull the car back out of our persistence store, we have lost the identity. We should modify our code so that the color's behaviour reflects these changes. We modify the default constructor so that the object gets initialized with a default identifier. The default constructor will get used by NHibernate when it hydrates the object after getting it out of the persistence store. We override the Equals and GetHashCode methods so that the identifiers are compared when there's an identity, but when the identifier isn't hydrated, the values are compared.
public class Color
{
    public Color(string name, string hexadecimalNotation)
    {
        if (string.IsNullOrEmpty(name))
            throw new ArgumentNullException("name");
        if (string.IsNullOrEmpty(hexadecimalNotation))
            throw new ArgumentNullException("hexadecimalNotation");

        Name = name;
        HexadecimalNotation = hexadecimalNotation;
    }

    protected Color() 
    {
        Id = -1;
    }

    public virtual int Id { get; protected set; }

    public virtual string Name { get; protected set; }

    public virtual string HexadecimalNotation { get; protected set; }

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;

        var color = obj as Color;

        if (color == null)
            return false;

        if (Id != -1)
            return Id == color.Id;

        return Name == color.Name && HexadecimalNotation == color.HexadecimalNotation;
    }

    public override int GetHashCode()
    {
        if (Id != -1)
            return Id.GetHashCode();

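        // Combine both hashes; a bitwise AND would clear bits and cause needless collisions.
        unchecked
        {
            return (Name.GetHashCode() * 397) ^ HexadecimalNotation.GetHashCode();
        }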
    }
}
This feels off though; using one concept in two different contexts makes things rather confusing. Is color an entity, or value object? Or does it depend?
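
One symptom of that confusion, as a minimal sketch: HybridColor below is a hypothetical, stripped-down stand-in for the class above, with the identifier exposed through the constructor so that both hydration paths can be simulated without NHibernate. The id values and the demo class are assumptions made for illustration. A color loaded as an entity and the same color loaded as a component disagree on whether they are equal, depending on which one you ask.

```csharp
using System;

// Hypothetical stand-in for the hybrid Color: same Equals/GetHashCode rules,
// but the id is passed in so we can mimic "hydrated as entity" (id > 0)
// and "hydrated as component" (id == -1).
public class HybridColor
{
    public HybridColor(int id, string name, string hexadecimalNotation)
    {
        Id = id;
        Name = name;
        HexadecimalNotation = hexadecimalNotation;
    }

    public int Id { get; private set; }

    public string Name { get; private set; }

    public string HexadecimalNotation { get; private set; }

    public override bool Equals(object obj)
    {
        var color = obj as HybridColor;

        if (color == null)
            return false;

        // With an identity, compare identifiers...
        if (Id != -1)
            return Id == color.Id;

        // ...without one, fall back to comparing values.
        return Name == color.Name && HexadecimalNotation == color.HexadecimalNotation;
    }

    public override int GetHashCode()
    {
        if (Id != -1)
            return Id.GetHashCode();

        unchecked
        {
            return (Name.GetHashCode() * 397) ^ HexadecimalNotation.GetHashCode();
        }
    }
}

public static class HybridColorDemo
{
    public static void Main()
    {
        var fromColorTable = new HybridColor(5, "Blue", "#0000FF");
        var fromCarRow = new HybridColor(-1, "Blue", "#0000FF");

        Console.WriteLine(fromColorTable.Equals(fromCarRow)); // False
        Console.WriteLine(fromCarRow.Equals(fromColorTable)); // True
    }
}
```

Equality that isn't symmetric tends to surface as hard-to-trace behaviour in collections and comparisons, which is one more reason to give each concept its own type.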

Separate concepts

We extract two explicit concepts instead; a color as a value object, and an available color as an entity. 
public class AvailableColor 
{
    public AvailableColor(string name, string hexadecimalNotation)            
    {
        if (string.IsNullOrEmpty(name))
            throw new ArgumentNullException("name");
        if (string.IsNullOrEmpty(hexadecimalNotation))
            throw new ArgumentNullException("hexadecimalNotation");

        Name = name;
        HexadecimalNotation = hexadecimalNotation;
    }

    protected AvailableColor()
    {
    }

    public virtual int Id { get; protected set; }

    public virtual string Name { get; protected set; }

    public virtual string HexadecimalNotation { get; protected set; }

    public static explicit operator Color(AvailableColor value)
    {
        return new Color(value.Name, value.HexadecimalNotation);
    }  

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;

        var color = obj as AvailableColor;

        if (color == null)
            return false;

        return color.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public class Color
{
    public Color(string name, string hexadecimalNotation)
    {
        if (string.IsNullOrEmpty(name))
            throw new ArgumentNullException("name");
        if (string.IsNullOrEmpty(hexadecimalNotation))
            throw new ArgumentNullException("hexadecimalNotation");

        Name = name;
        HexadecimalNotation = hexadecimalNotation;
    }

    protected Color()
    {
    }

    public virtual string Name { get; protected set; }

    public virtual string HexadecimalNotation { get; protected set; }

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;

        var color = obj as Color;

        if (color == null)
            return false;

        return Name == color.Name && HexadecimalNotation == color.HexadecimalNotation;
    }

    public override int GetHashCode()
    {
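        // Combine both hashes; a bitwise AND would clear bits and cause needless collisions.
        unchecked
        {
            return (Name.GetHashCode() * 397) ^ HexadecimalNotation.GetHashCode();
        }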
    }
}
This looks better. We now have two explicit concepts. An explicit conversion allows you to get a color out of an available color, losing the identifier.
var availableColorOrange = new AvailableColor("Orange", "#CC3232");
var car = new Car((Color)availableColorOrange);                    
                    
Console.WriteLine(car.Color.Equals(new Color("Orange", "#CC3232"))); // true

Conclusion

We meet up with the CEO one last time, and show her what we reworked. When we demo how she can manage the collection of available colors by just adding and deleting them, without caring whether one of the cars came in that color, a smile shows up on her face; "This is exactly what I needed, it really shouldn't be harder than this."
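
That demo can be sketched in memory. This is only an illustration under a few assumptions: a plain List<AvailableColor> stands in for the persisted collection, the classes are trimmed to the bare minimum (no NHibernate, no argument checks), and the id passed to the AvailableColor constructor is made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Value object: equality is based on the values only.
public sealed class Color
{
    public Color(string name, string hexadecimalNotation)
    {
        Name = name;
        HexadecimalNotation = hexadecimalNotation;
    }

    public string Name { get; private set; }

    public string HexadecimalNotation { get; private set; }

    public override bool Equals(object obj)
    {
        var other = obj as Color;

        return other != null
            && Name == other.Name
            && HexadecimalNotation == other.HexadecimalNotation;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (Name.GetHashCode() * 397) ^ HexadecimalNotation.GetHashCode();
        }
    }
}

// Entity: has an identifier, and converts to a Color by copying its values.
public sealed class AvailableColor
{
    public AvailableColor(int id, string name, string hexadecimalNotation)
    {
        Id = id;
        Name = name;
        HexadecimalNotation = hexadecimalNotation;
    }

    public int Id { get; private set; }

    public string Name { get; private set; }

    public string HexadecimalNotation { get; private set; }

    public static explicit operator Color(AvailableColor value)
    {
        return new Color(value.Name, value.HexadecimalNotation);
    }
}

public sealed class Car
{
    public Car(Color color)
    {
        Color = color;
    }

    public Color Color { get; private set; }
}

public static class FleetDemo
{
    public static void Main()
    {
        var orange = new AvailableColor(1, "Orange", "#CC3232");
        var availableColors = new List<AvailableColor> { orange };

        // The car receives a copy of the values, not a reference to the entity.
        var car = new Car((Color)orange);

        // The CEO removes the color from the list of available colors.
        availableColors.Remove(orange);

        Console.WriteLine(availableColors.Count); // 0
        Console.WriteLine(car.Color.Equals(new Color("Orange", "#CC3232"))); // True
    }
}
```

Because the car holds its own copy of the values, removing the available color needs no referential check and no soft delete.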

Tools often trick us into creating entities. These accidental entities then go on to introduce expensive coupling, raising questions and problems that could easily be avoided by copying values around instead.

Does your codebase contain accidental entities?

Next week: but what about the UI?

21 comments:

  1. Interesting modeling approach!
    I have one question though:
    What if the CEO wants to change the name of a color and have this change reflected in the cars?
    Are you gonna update all color names in the car entities?

    Replies
    1. This might indicate that we do need an entity.

      How plausible is this though? How likely is it that color names change?

      Even if it does, I can imagine it only happens once in a blue moon. Is it really that big of a deal for a user to change the color of those 30 cars by hand? It might be more of a problem if we own 1000 cars though. Maybe it's perfectly fine that, the day this does happen, an operational intervention (bulk update) straightens this out.

      It all depends though.. ;)

    2. Interesting question, the fact that the CEO asks this question might indicate that you didn't gather all requirements to start modeling your domain. There might be a reason why he asks this. Because when a car is painted in for instance "Testarossa Red", you can never call it "Just Red". The color is fixed from the time it's painted.

    3. > Because when a car is painted in for instance "Testarossa Red", you can never call it "Just Red". The color is fixed from the time it's painted.

      Exactly, I would say changing a color, would require a car to be repainted.

    4. It shouldn't always mean a repaint of the car.
      From a marketing perspective it can be an option to rename a color to a more "fancy" name to sell the large car stock that's left.
      But these are all assumptions of course.
      With the current information you have, the model above is probably the best choice. The model can still evolve as the domain knowledge expands.

    5. So this 'marketing perspective' would be a different context then? ;)

  2. If your main worry is about deleted values - you could just have a deleted attribute on the colour entity - then only return those not deleted for new car options (which might be useful if you want to link such things as paint codes)

    Replies
    1. Don't call it deleted then, the color is discontinued. You cannot sell it anymore. It's the same when a person leaves a company he is not deleted, he is fired or he resigned.

  3. AvailableColor, just love it ;)

  4. Great article!

    What do you think about making AvailableColor a subclass of Color?
    Because I see a lot of code duplication in those two classes.

    Replies
    1. I'm hesitant to introduce inheritance here. The 'is a' relationship doesn't really hold up; Color is a value object, while AvailableColor is an entity. AvailableColor overriding the behaviour of Equals violates the Liskov substitution principle, which might bite us.

  5. Stepping aside from the approach that you're taking to fulfill customer requirements, I'm not sure if your approach to gathering requirements is fitting.

    Why are you letting your customer micro-manage your implementation details? Your customer should not be giving you the solution to implement. You should be gathering what problem the CEO is trying to solve and then suggest a solution - while keeping the implementation details to yourself.

    Why is the CEO saying "I need to delete the colors so I won't make mistakes?" Why does she need to delete those colors? I would interpret that as "The list of colors is too cluttered. It's easy to make mistakes. How can you solve that?"

    My approach would be using what you already had: using the "Available" flag.
    Provide a check box in your UI "Only show available colors."

    Otherwise you're violating data integrity: as another commenter pointed out, what if the CEO wants to change the name of a color (because of a legal issue, for example)? Even if it's a one in a million chance, it can still happen and you can catch that from the beginning.

    Replies
    1. Thanks for your comment.

      I don't feel like we're letting the customer 'micro manage' our implementation. It's not like we are making him decide which database we'll be using. What we're trying to do though, is capture the language and intent of our customer and reflect that in our models and software. That's important, right?

      How am I violating data integrity, if a color is just a value, and the customer doesn't care about tracking its lifecycle?

    2. Maybe micro-manage isn't the correct word.

      My point was not to let customers come to you with "I want feature XYZ." Ask them "Why do you want that feature? What problem would that solve?" They often don't know what we can do. We know the resources and it's our job to give them the solution.

      Your "available-flag" solution was perfectly fine. In fact, if you had gone with that solution, you wouldn't have had to do the "-1" workaround on your UI (your next post) -- which is essentially the same thing as the available flag. Though your UI approach is fine for now, what happens if your CEO says "Hey, sometimes I'm very busy and only have time to update the repaint records at the end of the day. Can I update many cars (different colors) at once?" Now we're back to square one.

      I'll repeat again, your CEO does not know what ability you have. She might not even know the concept of "soft-deleting." Ask her, "why do you want to delete the colors?"

    3. To answer "How am I violating data integrity": violating the dont-repeat-yourself principle for no apparent reason.

      1. What happens if they call you back and say "internationalization?" Will your car entity suddenly have the fields "GermanName", "ChineseName", "EnglishName?"

      2. Oh no typo! Oh no competitor just got trademark on the color name! Now I have to change it.

    4. The CEO just wants a list of available colors. When they're not available anymore, she wants to remove them from the list. Why would I even introduce her to the concept of 'soft-deleting' if she does not care about the life cycle of those colors?

      I don't consider DRY a solid argument here. It's like having a FirstName table because some people share the same first name.

      Those last two points introduce new requirements/concepts which probably would make color an entity; they now need an identity and have a life cycle.

      It all depends on how the CEO perceives color in her context.

    5. Hi Jef,

      Your approach looks spot-on to me. I hadn't even thought about the possibility of using operator overloading to compare Color/AvailableColor, so that was a bit of an eye-opener!

      Re Steven's comment above - "What happens if they call you back and say "internationalization?" Will your car entity suddenly have the fields "GermanName", "ChineseName", "EnglishName?"

      - If indeed we had to make Color an entity at that point, the issue with updating existing Cars still remains, and I'm interested in how you would approach this? My understanding (based on the current context) is that once a Car has been assigned a Color, that Color is immutable (so existing cars with that Color should not change), so..... if we now have a Color table and the name does change, for example, from "Testarossa Red" to "F50 Red" (or perhaps the translated version changes to "F50 Rojo" etc) would we add a new Color instance (and table row) to represent "F50 Red", and leave the old instance intact (perhaps setting its available flag to false)?

      I'm interested in your view as that above proposal would be how I'd do it (I think!)

      Thanks,
      Phil

    6. Hi Phil

      What makes sense for business? ;) I wouldn't give Color a life cycle until business really feels it needs one.

      Regards
      Jef

  6. That's a really interesting article Jef. Your Color example reminded me of how we are modeling our system. I would like to hear your approach on it.

    We sell items online. An Order can have a status. Allowed statuses are (placed, shipped, cancelled, refunded). ERDs taught us to create a table for the allowed statuses and link them to the Order table using a foreign key, so the relation between Order and Status is Many-to-One.

    For the domain model I would have two entities Order and Status mapping to the 2 tables mentioned before.

    If I wanted to follow your approach and model them as Value Objects, what would it be like?

  7. I have been struggling with this dilemma for some time now. Been searching for a logical, "feel right" solution and I have only one thing to say... Finally! You nailed it, Sir, great article!
