Somewhere earlier this week I had to find the gaps in a sequence of dates. Admittedly, my first action was to search Stackoverflow for a clean solution. But since no one asked the question there yet, I had to implement it myself.
The solution comes in the form of an extension method on IEnumerable<DateTime>, which takes a lower bound and an upper bound, and returns an enumerable of dates.
I start with building a second - from lower bound to upper bound, thus gapless - sequence. Then I use the LINQ Except method to compare the complete sequence to the input sequence, and produce a difference set.
The solution comes in the form of an extension method on IEnumerable<DateTime>, which takes a lower bound and an upper bound, and returns an enumerable of dates.
public static IEnumerable<DateTime> GetGaps( this IEnumerable<DateTime> sequence, DateTime lowerbound, DateTime upperbound) { var completeSequence = new List<DateTime>(); var tmpDay = lowerbound.Date; while (tmpDay <= upperbound.Date) { completeSequence.Add(tmpDay); tmpDay = tmpDay.AddDays(1); } var gaps = completeSequence.Except(sequence.Select(d => d.Date)); return gaps; }
This extension method is backed up by a few tests.
[TestMethod()] public void Given_an_empty_sequence_everything_between_bounds_is_a_gap() { var sequence = new List<DateTime>(); var actual = sequence.GetGaps(new DateTime(2012, 1, 6), new DateTime(2012, 1, 9)); Assert.IsTrue(actual.Count() == 4); Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 6))); Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 7))); Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 8))); Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 9))); } [TestMethod()] public void Given_sequence_with_duplicate_dates_one_is_not_returned_as_gap() { var sequence = new[] { new DateTime(2012, 1, 6, 10, 0, 0), // double date new DateTime(2012, 1, 6, 11, 0, 0), // double date new DateTime(2012, 1, 7) }; var actual = sequence.GetGaps(new DateTime(2012, 1, 6), new DateTime(2012, 1, 9)); Assert.IsTrue(actual.Count() == 2); Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 8))); Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 9))); } [TestMethod()] public void Given_sequence_with_gaps_the_gaps_are_returned() { var sequence = new[] { // gap new DateTime(2012, 1, 4), new DateTime(2012, 1, 5), // gap new DateTime(2012, 1, 7, 11, 10, 1) // gap }; var actual = sequence.GetGaps(new DateTime(2012, 1, 3), new DateTime(2012, 1, 8)); Assert.IsTrue(actual.Count() == 3); Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 3))); Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 6))); Assert.IsTrue(actual.Contains(new DateTime(2012, 1, 8))); } [TestMethod()] public void Given_sequence_with_no_gaps_an_empty_enumerable_is_returned() { var sequence = new[] { new DateTime(2012, 1, 4), new DateTime(2012, 1, 5), new DateTime(2012, 1, 6) }; var actual = sequence.GetGaps(new DateTime(2012, 1, 4), new DateTime(2012, 1, 6)); Assert.IsFalse(actual.Any()); }
Did you add it as a question/anwser to SO? I think that's a better place for this. There's a multitude of ways to improve on this, one of which is to use Enumerable.Range to reduce the need for a temporary list of DateTime values.
ReplyDeleteUsing Enumerable.Range you'd also generate a second list.
DeleteI think a solution with Enumerable.Range would be more terse though, but would it also be more readable/understandable?
Let's do this:
Deletevar days = (int) (upperbound - lowerbound).TotalDays + 1;
var completeSequence = Enumerable.Range(0, days).Select(add => lowerbound.AddDays(add));
completeSequence is now an enumerable that doesn't need to be a (potentially large) list. Except() still creates an internal set of the second argument to do the lookups, though.
If you could promise that the sequence of dates is in the right order, I suppose you could create an iterator that walks across the two sequences simultaneously, spotting missing dates. That would be the most optimal, I think.
Nice alternative, thanks!
DeleteDave had the same solution as me. Here's my version:
ReplyDeletepublic static IEnumerable GetGapsRefactored(
this IEnumerable sequence, DateTime lowerbound, DateTime upperbound)
{
return Enumerable.Range(0, (upperbound - lowerbound).Days)
.Select(num => lowerbound.AddDays(num))
.Except(sequence);
}
Oops, it appears this commenting system strips out C# generics.
DeleteWhich is why I think it's a better idea to post it as a question/answer on SO and let others put up answers as well.
Delete