Wednesday, April 6, 2011

Using lambda expressions to get a subset where array elements are equal

I have an interesting problem, and I can't seem to figure out the lambda expression to make this work.

I have the following code:

List<string[]> list = GetSomeData(); // Returns large number of string[]'s
List<string[]> list2 = GetSomeData2(); // similar data, but smaller subset
&nbsp;
List<string[]> newList = list.FindAll(predicate(string[] line){ 
    return (???);
});

I want to return only those records in list in which element 0 of each string[] is equal to one of the element 0's in list2.

list contains data like this:

"000", "Data", "more data", "etc..."

list2 contains data like this:

"000", "different data", "even more different data"

Fundamentally, i could write this code like this:

List<string[]> newList = new List<string[]>();
foreach(var e in list)
{
    foreach(var e2 in list2)
    {
        if (e[0] == e2[0])
            newList.Add(e);
    }
}
return newList;

But, i'm trying to use generics and lambda's more, so i'm looking for a nice clean solution. This one is frustrating me though.. maybe a Find inside of a Find?

EDIT: Marc's answer below lead me to experiment with a varation that looks like this:

var z = list.Where(x => list2.Select(y => y[0]).Contains(x[0])).ToList();

I'm not sure how efficent this is, but it works and is sufficiently succinct. Anyone else have any suggestions?

From stackoverflow
  • You could join? I'd use two steps myself, though:

    var keys = new HashSet<string>(list2.Select(x => x[0]));
    var data = list.Where(x => keys.Contains(x[0]));
    

    If you only have .NET 2.0, then either install LINQBridge and use the above (or similar with a Dictionary<> if LINQBridge doesn't include HashSet<>), or perhaps use nested Find:

    var data = list.FindAll(arr => list2.Find(arr2 => arr2[0] == arr[0]) != null);
    

    note though that the Find approach is O(n*m), where-as the HashSet<> approach is O(n+m)...

    Mystere Man : What is the reason for the HashSet? It seems to work well without the has set (see my edit above). Does the HashSet make it more efficent?
    Marc Gravell : note; for very small lists, it *can* be more efficient to just scan the list [like your edit does]... but for small lists it is going to be very fast no matter what approach you use. As the list size increases, the scan approach can quickly become a bottleneck.
    Mystere Man : Ok. One piece of infromation I should mention is that the "keys" (or list2) will always be relatively small, probably less than 10. While the source (list) can be several hundred elements (up to 1000).
    Marc Gravell : But in general: much more efficient, yes. Firstly, it only keeps the *distinct* keys; secondly, it uses a hash algorithm (similar to dictionary) so that Contains tends to O(1) rather than O(n) [essentially, think of it as an "index" in database terms].
    Marc Gravell : If you have < 10 keys (list2), then either approach should be fine.
    Mystere Man : I think I prefer your original approach, even though it may be fine now, one never knows how things could grow in the future. Thanks
    Marc Gravell : In fact, for 10 keys, the optimal approach would probably be: var keys = list2.Select(x => x[0]).ToArray(); var data = list.Where(x => keys.Contains(x[0]));
    Marc Gravell : No problem; hope it helped.
  • You could use the Intersect extension method in System.Linq, but you would need to provide an IEqualityComparer to do the work.

     static void Main(string[] args)
     {
      List<string[]> data1 = new List<string[]>();
      List<string[]> data2 = new List<string[]>();
    
      var result = data1.Intersect(data2, new Comparer());
     }
    
     class Comparer : IEqualityComparer<string[]>
     {
      #region IEqualityComparer<string[]> Members
    
      bool IEqualityComparer<string[]>.Equals(string[] x, string[] y)
      {
       return x[0] == y[0];
      }
    
      int IEqualityComparer<string[]>.GetHashCode(string[] obj)
      {
       return obj.GetHashCode();
      }
    
      #endregion
     }
    
    Mystere Man : Interesting solution, but it's larger than the original problem ;)
    Ch00k : Depends on the context really. If this comes up repeatedly, put your equality comparer in a library assembly and you can use one simple call to Intersect to get the Intersection of your two lists. Later on if you need to, you can use the same comparer for Equals, or Except, or other uses
  • (q1) define lamda expression that represent a link list ? (2) lamda expression to find the number of elements in a given list ?

0 comments:

Post a Comment