Sunday, September 23, 2012

Finding the gaps in a sequence of dates

Somewhere earlier this week I had to find the gaps in a sequence of dates. Admittedly, my first action was to search Stackoverflow for a clean solution. But since no one asked the question there yet, I had to implement it myself.

The solution comes in the form of an extension method on IEnumerable<DateTime>, which takes a lower bound and an upper bound, and returns an enumerable of dates.
public static IEnumerable<DateTime> GetGaps(
    this IEnumerable<DateTime> sequence, DateTime lowerbound, DateTime upperbound)
{          
    var completeSequence = new List<DateTime>();           
    var tmpDay = lowerbound.Date;

    while (tmpDay <= upperbound.Date)
    {
        completeSequence.Add(tmpDay);
        tmpDay = tmpDay.AddDays(1);
    }
    
    var gaps = completeSequence.Except(sequence.Select(d => d.Date));
    
    return gaps;
}
I start with building a second - from lower bound to upper bound, thus gapless - sequence. Then I use the LINQ Except method to compare the complete sequence to the input sequence, and produce a difference set.

It was only after a few not so successful iterations that I found this - for me - satisfactory solution though. If I hadn't intellistumbled upon the LINQ Except method, I probably would have ended up with something a lot less elegant.


This extension method is backed up by a few tests.
[TestMethod()]
public void Given_an_empty_sequence_everything_between_bounds_is_a_gap()
{
    var sequence = new List<DateTime>();            
 
    var actual = sequence.GetGaps(new DateTime(2012, 1, 6), new DateTime(2012, 1, 9));

    Assert.IsTrue(actual.Count() == 4);
    Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 6)));
    Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 7)));
    Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 8)));
    Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 9)));
}

[TestMethod()]        
public void Given_sequence_with_duplicate_dates_one_is_not_returned_as_gap()
{
    var sequence = new[] 
    { 
        new DateTime(2012, 1, 6, 10, 0, 0), // double date
        new DateTime(2012, 1, 6, 11, 0, 0), // double date
        new DateTime(2012, 1, 7) 
    };

    var actual = sequence.GetGaps(new DateTime(2012, 1, 6), new DateTime(2012, 1, 9));

    Assert.IsTrue(actual.Count() == 2);
    Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 8)));
    Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 9)));
}

[TestMethod()]
public void Given_sequence_with_gaps_the_gaps_are_returned()
{
    var sequence = new[] 
    { 
        // gap
        new DateTime(2012, 1, 4), 
        new DateTime(2012, 1, 5), 
        // gap
        new DateTime(2012, 1, 7, 11, 10, 1) 
        // gap
    };

    var actual = sequence.GetGaps(new DateTime(2012, 1, 3), new DateTime(2012, 1, 8));

    Assert.IsTrue(actual.Count() == 3);
    Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 3)));
    Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 6)));
    Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 8)));
}

[TestMethod()]
public void Given_sequence_with_no_gaps_an_empty_enumerable_is_returned()
{
    var sequence = new[] 
    {                
        new DateTime(2012, 1, 4), 
        new DateTime(2012, 1, 5),                 
        new DateTime(2012, 1, 6)                 
    };

    var actual = sequence.GetGaps(new DateTime(2012, 1, 4), new DateTime(2012, 1, 6));

    Assert.IsFalse(actual.Any());
}
Do you know of a better way to do this?

7 comments:

  1. Did you add it as a question/anwser to SO? I think that's a better place for this. There's a multitude of ways to improve on this, one of which is to use Enumerable.Range to reduce the need for a temporary list of DateTime values.

    ReplyDelete
    Replies
    1. Using Enumerable.Range you'd also generate a second list.

      I think a solution with Enumerable.Range would be more terse though, but would it also be more readable/understandable?

      Delete
    2. Let's do this:
      var days = (int) (upperbound - lowerbound).TotalDays + 1;
      var completeSequence = Enumerable.Range(0, days).Select(add => lowerbound.AddDays(add));

      completeSequence is now an enumerable that doesn't need to be a (potentially large) list. Except() still creates an internal set of the second argument to do the lookups, though.

      If you could promise that the sequence of dates is in the right order, I suppose you could create an iterator that walks across the two sequences simultaneously, spotting missing dates. That would be the most optimal, I think.

      Delete
  2. Dave had the same solution as me. Here's my version:

    public static IEnumerable GetGapsRefactored(
    this IEnumerable sequence, DateTime lowerbound, DateTime upperbound)
    {
    return Enumerable.Range(0, (upperbound - lowerbound).Days)
    .Select(num => lowerbound.AddDays(num))
    .Except(sequence);
    }

    ReplyDelete
    Replies
    1. Oops, it appears this commenting system strips out C# generics.

      Delete
    2. Which is why I think it's a better idea to post it as a question/answer on SO and let others put up answers as well.

      Delete