Pair (of) Programmers: Linq

As I recently discovered, Linq is a very powerful query language that lets us query our objects in a very beautiful way (yes, beautiful).

Lately I had to write some join queries between lists, and I found out that Linq lets you do that in all kinds of ways.

So what kinds of joins can we do, and what is the difference between them?

To explain this we need an object model, a simple one.

Let's say we are running a store. We have products, and each product is made by a company.

   public class Company
   {
       public string Name { get; set; }
       public int Id { get; set; }
   }

   public class Product
   {
       public string Name { get; set; }
       public int CompanyId { get; set; }
   }

We will run our queries on these lists:

var companies = new List<Company>
{
    new Company() {Name = "Candy's", Id = 1},
    new Company() {Name = "Cow's", Id = 2},
    new Company() {Name = "Stuff", Id = 3},
};

var products = new List<Product>
{
    new Product() {CompanyId = 1, Name = "Bamba"},
    new Product() {CompanyId = 1, Name = "Bisli"},
    new Product() {CompanyId = 2, Name = "Milk"},
};

Inner-join

The first join is a simple inner join. We want to print each product from the list and its company's name:

from product in products
join company in companies on product.CompanyId equals company.Id
select new {Product = product.Name, Company = company.Name};

If we print the result, it will be:

Bamba - Candy's
Bisli - Candy's
Milk - Cow's

The query runs over the products list and, for each product, finds the matching company.
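For comparison, here is a minimal sketch of the same inner join in method syntax: the compiler turns the join clause into a call to Enumerable.Join, which takes the inner list, a key selector for each side and a result selector. To keep the sketch a single self-contained file of top-level statements, it builds the two lists as anonymous types with the same properties as Company and Product instead of declaring the classes.

```csharp
using System;
using System.Linq;

// The post's data, as anonymous types with the same properties as Company/Product.
var companies = new[]
{
    new { Name = "Candy's", Id = 1 },
    new { Name = "Cow's", Id = 2 },
    new { Name = "Stuff", Id = 3 },
};
var products = new[]
{
    new { CompanyId = 1, Name = "Bamba" },
    new { CompanyId = 1, Name = "Bisli" },
    new { CompanyId = 2, Name = "Milk" },
};

// Query syntax, as in the post.
var querySyntax = (from product in products
                   join company in companies on product.CompanyId equals company.Id
                   select new { Product = product.Name, Company = company.Name }).ToList();

// Method syntax: the same inner join written as a call to Join.
var methodSyntax = products.Join(
        companies,
        product => product.CompanyId,  // key taken from each product (outer list)
        company => company.Id,         // key taken from each company (inner list)
        (product, company) => new { Product = product.Name, Company = company.Name })
    .ToList();

foreach (var row in methodSyntax)
    Console.WriteLine($"{row.Product} - {row.Company}");
```

Both forms return the same three rows. "Stuff" is missing because an inner join drops companies that have no matching product.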

Another way to get the same result is with this "SQL-like" syntax:

from product in products
from company in companies 
where product.CompanyId == company.Id
select new { Product = product.Name, Company = company.Name };
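A quick sketch that runs both forms over the post's data (again with anonymous types standing in for Company and Product) and checks that they return the same rows in the same order:

```csharp
using System;
using System.Linq;

var companies = new[]
{
    new { Name = "Candy's", Id = 1 },
    new { Name = "Cow's", Id = 2 },
    new { Name = "Stuff", Id = 3 },
};
var products = new[]
{
    new { CompanyId = 1, Name = "Bamba" },
    new { CompanyId = 1, Name = "Bisli" },
    new { CompanyId = 2, Name = "Milk" },
};

var joinSyntax = (from product in products
                  join company in companies on product.CompanyId equals company.Id
                  select new { Product = product.Name, Company = company.Name }).ToList();

// Two from clauses pair every product with every company; the where
// clause then keeps only the pairs whose ids match.
var fromWhereSyntax = (from product in products
                       from company in companies
                       where product.CompanyId == company.Id
                       select new { Product = product.Name, Company = company.Name }).ToList();

// Anonymous types compare by value, so SequenceEqual compares the rows' contents.
Console.WriteLine(joinSyntax.SequenceEqual(fromWhereSyntax)); // True
```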

So what's the difference, and when should we use which? In a second…

Group-join

Now we would like to print each company and all of its products.

We could use the inner-join syntax as follows:

from company in companies
join product in products on company.Id equals product.CompanyId
select new { Product = product.Name, Company = company.Name };

And get this result:

Candy's - Bamba
Candy's - Bisli
Cow's - Milk

But what if we want to print the company's name only once, or keep each company's products in a list? That's why we have the group join.

from company in companies
join product in products on company.Id equals product.CompanyId into companysProducts
select new { Company = company.Name,Products = companysProducts };

As you can see, we added the "into" keyword, and now for each company we select its name and a list of its products.

The result will look like this:

Candy's
    Bamba
    Bisli
Cow's
    Milk
Stuff

Another interesting thing about this result is that the "Stuff" company appears even though it has no products, which didn't happen before. This is because we now collect each company's products into a group: if a company has no products, its list is simply empty. That's why a group join behaves like an "outer-join": every company appears, whether or not it has matches.
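The "into" clause turns the join into a group join, which in method syntax is Enumerable.GroupJoin: its result selector receives each company together with the (possibly empty) sequence of its products. A minimal sketch, with anonymous types standing in for Company and Product, that builds the nested result and prints it in the layout shown above:

```csharp
using System;
using System.Linq;

var companies = new[]
{
    new { Name = "Candy's", Id = 1 },
    new { Name = "Cow's", Id = 2 },
    new { Name = "Stuff", Id = 3 },
};
var products = new[]
{
    new { CompanyId = 1, Name = "Bamba" },
    new { CompanyId = 1, Name = "Bisli" },
    new { CompanyId = 2, Name = "Milk" },
};

// GroupJoin yields one result per company, even when no product matches.
var companiesWithProducts = companies.GroupJoin(
        products,
        company => company.Id,           // key taken from each company
        product => product.CompanyId,    // key taken from each product
        (company, companysProducts) => new
        {
            Company = company.Name,
            Products = companysProducts.ToList()  // empty list for "Stuff"
        })
    .ToList();

foreach (var entry in companiesWithProducts)
{
    Console.WriteLine(entry.Company);
    foreach (var product in entry.Products)
        Console.WriteLine("    " + product.Name);
}
```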

Outer-join

Besides the query we saw above, we have one more option to perform an outer join. What if we wanted to print "company-product" lines, but still wanted "Stuff" (which has no products) to appear in that list with an empty product?

For this, we have the "DefaultIfEmpty" method:

from company in companies
join product in products on company.Id equals product.CompanyId into companysProducts
from item in companysProducts.DefaultIfEmpty(new Product{Name = "No products"})
select new {Company = company.Name, Product = item.Name };

What we do here is create a "default product" that is used whenever the current company's companysProducts list is empty.

The result of this will be:

Candy's Bamba
Candy's Bisli
Cow's Milk
Stuff No products
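A runnable sketch of this outer join, with anonymous types standing in for Company and Product. With anonymous types the placeholder must have the same properties, in the same order and of the same types, as the product elements (the post's class-based version uses new Product{...} instead). The second query is an alternative, not from the post: DefaultIfEmpty() without an argument yields null for an empty group, and the select clause supplies the text.

```csharp
using System;
using System.Linq;

var companies = new[]
{
    new { Name = "Candy's", Id = 1 },
    new { Name = "Cow's", Id = 2 },
    new { Name = "Stuff", Id = 3 },
};
var products = new[]
{
    new { CompanyId = 1, Name = "Bamba" },
    new { CompanyId = 1, Name = "Bisli" },
    new { CompanyId = 2, Name = "Milk" },
};

// As in the post: a placeholder product is used for an empty group.
var withPlaceholder = (from company in companies
                       join product in products on company.Id equals product.CompanyId into companysProducts
                       from item in companysProducts.DefaultIfEmpty(new { CompanyId = 0, Name = "No products" })
                       select new { Company = company.Name, Product = item.Name }).ToList();

// Alternative: an empty group yields a single null item, handled in select.
var withNull = (from company in companies
                join product in products on company.Id equals product.CompanyId into companysProducts
                from item in companysProducts.DefaultIfEmpty()
                select new { Company = company.Name, Product = item?.Name ?? "No products" }).ToList();

foreach (var row in withPlaceholder)
    Console.WriteLine($"{row.Company} {row.Product}");
```

The placeholder version reads closer to the post; the null version avoids inventing a fake object when the element type is expensive or awkward to construct.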

Custom-join

As you noticed, all our joins so far used the "equals" comparison. But what if we want to select, for each company, all the products it didn't make? We need a not-equal comparison.

The "join" syntax we saw so far won't work here, since it only supports equality, so we use "from" clauses to get what we want:

from company in companies
from product in products 
where company.Id != product.CompanyId
select new { Company = company.Name,Product= product.Name };

The result will be:

Candy's Milk
Cow's Bamba
Cow's Bisli
Stuff Bamba
Stuff Bisli
Stuff Milk
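In method syntax, two from clauses become SelectMany, so the not-equal join can be sketched as follows (anonymous types standing in for Company and Product). It produces all 3 × 3 company/product pairs and keeps the 6 whose ids differ:

```csharp
using System;
using System.Linq;

var companies = new[]
{
    new { Name = "Candy's", Id = 1 },
    new { Name = "Cow's", Id = 2 },
    new { Name = "Stuff", Id = 3 },
};
var products = new[]
{
    new { CompanyId = 1, Name = "Bamba" },
    new { CompanyId = 1, Name = "Bisli" },
    new { CompanyId = 2, Name = "Milk" },
};

// SelectMany pairs every company with every product;
// Where keeps the pairs in which the company did NOT make the product.
var notMadeBy = companies
    .SelectMany(company => products, (company, product) => new { company, product })
    .Where(pair => pair.company.Id != pair.product.CompanyId)
    .Select(pair => new { Company = pair.company.Name, Product = pair.product.Name })
    .ToList();

foreach (var row in notMadeBy)
    Console.WriteLine($"{row.Company} {row.Product}");
```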

(We could also group these results per company, which would make them more readable.)
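Since "join … into" only supports equality, one way to get that grouped shape for a not-equal condition is a nested query per company. A sketch, with anonymous types standing in for Company and Product:

```csharp
using System;
using System.Linq;

var companies = new[]
{
    new { Name = "Candy's", Id = 1 },
    new { Name = "Cow's", Id = 2 },
    new { Name = "Stuff", Id = 3 },
};
var products = new[]
{
    new { CompanyId = 1, Name = "Bamba" },
    new { CompanyId = 1, Name = "Bisli" },
    new { CompanyId = 2, Name = "Milk" },
};

// For each company, a nested query collects the names of the products it didn't make.
var notMadeByCompany = (from company in companies
                        select new
                        {
                            Company = company.Name,
                            Products = products.Where(product => product.CompanyId != company.Id)
                                               .Select(product => product.Name)
                                               .ToList()
                        }).ToList();

foreach (var entry in notMadeByCompany)
    Console.WriteLine($"{entry.Company}: {string.Join(", ", entry.Products)}");
```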

So now it's time to explain "when to use what":

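The "join" syntax is designed to perform equality joins efficiently: in LINQ to Objects, join builds a hash lookup from one list and probes it with the keys of the other, while "from … from … where" checks the condition for every pair. So when you need the "==" operator, you'd better use the "join" syntax; it has the best performance.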
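A small sketch that makes the difference visible by counting work on the post's data (anonymous types standing in for Company and Product). The counters are only instrumentation for this example, and the numbers reflect how LINQ to Objects' Enumerable.Join is implemented; a provider such as LINQ to SQL translates the query to SQL instead, so this argument is about in-memory lists.

```csharp
using System;
using System.Linq;

var companies = new[]
{
    new { Name = "Candy's", Id = 1 },
    new { Name = "Cow's", Id = 2 },
    new { Name = "Stuff", Id = 3 },
};
var products = new[]
{
    new { CompanyId = 1, Name = "Bamba" },
    new { CompanyId = 1, Name = "Bisli" },
    new { CompanyId = 2, Name = "Milk" },
};

// join: each key selector runs once per element (3 products + 3 companies);
// matches are then found by looking keys up, not by comparing every pair.
int keySelectorCalls = 0;
var viaJoin = products.Join(
        companies,
        product => { keySelectorCalls++; return product.CompanyId; },
        company => { keySelectorCalls++; return company.Id; },
        (product, company) => new { Product = product.Name, Company = company.Name })
    .ToList();

// from ... from ... where: the condition is evaluated for every pair (3 x 3).
int comparisons = 0;
bool SameCompany(int companyId, int id)
{
    comparisons++;
    return companyId == id;
}
var viaFromWhere = (from product in products
                    from company in companies
                    where SameCompany(product.CompanyId, company.Id)
                    select new { Product = product.Name, Company = company.Name }).ToList();

Console.WriteLine($"join: {keySelectorCalls} key selector calls");
Console.WriteLine($"from/where: {comparisons} comparisons");
```

With n products and m companies, the join does roughly n + m key computations, while the cross join does n × m comparisons, which is why the gap grows quickly with larger lists.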

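When you want to use an operator other than equality, you need the "from … from … where" syntax. More than one condition is a bit more subtle: if all the conditions are equalities (for example, company.Id == product.CompanyId && company.State == product.State), "join" can still handle them by comparing composite keys; only when a condition is not an equality do you need "from … from … where".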
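A sketch of the composite-key form. The State property is hypothetical (it is not part of the post's object model) and is added to both anonymous-type stand-ins only to give the join a second equality condition. The two anonymous keys must have the same property names and types on both sides of "equals":

```csharp
using System;
using System.Linq;

// Hypothetical data: State is added only for this example.
var companies = new[]
{
    new { Name = "Candy's", Id = 1, State = "NY" },
    new { Name = "Cow's", Id = 2, State = "TX" },
};
var products = new[]
{
    new { CompanyId = 1, Name = "Bamba", State = "NY" },
    new { CompanyId = 1, Name = "Bisli", State = "CA" },
    new { CompanyId = 2, Name = "Milk", State = "TX" },
};

// Both conditions are equalities, so they become one composite key.
var joined = (from company in companies
              join product in products
                  on new { CompanyId = company.Id, company.State }
                  equals new { product.CompanyId, product.State }
              select new { Company = company.Name, Product = product.Name }).ToList();

foreach (var row in joined)
    Console.WriteLine($"{row.Company} {row.Product}");
```

"Bisli" is dropped because its State does not match its company's State, even though the ids match.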

This was just the tip of the iceberg of Linq queries. I hope I made things a bit more understandable, and that you are now able to refactor your 10-line "foreach" loops into one query statement (really, that's what happens).

First post - yay! :)

A while ago I was debugging a method that generates a Linq query and returns an IQueryable.
I had a problem with its results, and I wanted to check out the IQueryable object that was generated,
so I added a watch on the return value and tried to find my generated Linq query. That wasn't as simple as I thought.
In fact, the IQueryable does not keep the Linq statement as-is; it saves it in an Expression object (also known as an expression tree).
You can see this in the IQueryable interface definition:
public interface IQueryable : IEnumerable
{
   Type ElementType { get; }
   Expression Expression { get; }
   IQueryProvider Provider { get; }
}
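A minimal sketch of what that watch shows, assuming an in-memory array wrapped with AsQueryable() instead of a real database query: the query object holds no Linq text, only an expression tree whose root is a call to Queryable.Where.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// AsQueryable wraps an in-memory array in an IQueryable,
// so we can inspect the tree without a database.
IQueryable<int> numbers = new[] { 1, 2, 3, 4 }.AsQueryable();
IQueryable<int> query = numbers.Where(x => x > 2);

// The query is stored as a tree: a method call node for Queryable.Where,
// whose arguments are the source and the (quoted) lambda x => x > 2.
Expression tree = query.Expression;
var call = (MethodCallExpression)tree;

Console.WriteLine(tree.NodeType);      // Call
Console.WriteLine(call.Method.Name);   // Where
Console.WriteLine(query.ElementType);  // System.Int32
```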

What is an expression tree?
“Expression trees represent language-level code in the form of data”
msdn.microsoft.com
This means that an expression tree gives us a kind of metadata for our code: the code is stored as a data structure that we can inspect.

To explain the concept of expression trees, we need to start from the very beginning.
Let's say we have this method:
public void PrintNumbers(Func<int, bool> filter)
{
    // Print every number from 1 to 10 that passes the filter.
    for (int i = 1; i <= 10; ++i)
    {
        if (filter(i))
            Console.WriteLine(i);
    }
}

The method takes a filter function and prints all the numbers from 1 to 10 that match the filter.

Until now we had 3 ways to do this:
1) Declare a method that matches the parameter, something like this:
private bool MyFilter(int x)
{
    return (x % 2) == 1;
}

And call our method like this:
PrintNumbers(MyFilter);

2) We could call it with an anonymous delegate:
PrintNumbers(delegate(int x)
{
    return (x % 2) == 1;
});

3) A lambda expression:
PrintNumbers(x => (x % 2) == 1);
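The three options side by side, as a sketch. To make their results easy to compare, it uses a hypothetical helper, Matching, that runs the same loop as PrintNumbers but returns the numbers instead of printing them:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the same loop as PrintNumbers, but it collects
// the numbers from 1 to 10 that pass the filter instead of printing them.
List<int> Matching(Func<int, bool> filter)
{
    var result = new List<int>();
    for (int i = 1; i <= 10; ++i)
    {
        if (filter(i))
            result.Add(i);
    }
    return result;
}

// 1) A named method whose signature matches Func<int, bool>.
bool MyFilter(int x) => (x % 2) == 1;

var viaMethod = Matching(MyFilter);
// 2) An anonymous delegate.
var viaDelegate = Matching(delegate (int x) { return (x % 2) == 1; });
// 3) A lambda expression.
var viaLambda = Matching(x => (x % 2) == 1);

Console.WriteLine(string.Join(", ", viaLambda)); // 1, 3, 5, 7, 9
```

All three produce the same Func<int, bool> delegate behavior; they differ only in how much code you write to define it.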

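And now we have a new way, expressions:
Expression<Func<int, bool>> myExpression = x => (x % 2) == 1;

This variable is of type Expression<Func<int, bool>>, and it is assigned a lambda expression.
Because the target is an Expression rather than a delegate, the compiler builds an expression tree from the lambda instead of executable code, and the expression object's properties are filled in as follows: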
1) Body - the expression: (x % 2) == 1
2) Parameters - x
3) NodeType - Lambda (ExpressionType.Lambda)
4) Type - Func<int, bool>
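A minimal sketch that reads these properties back from the expression, and then compiles the tree into an ordinary delegate, which is how an expression can still be executed:

```csharp
using System;
using System.Linq.Expressions;

Expression<Func<int, bool>> myExpression = x => (x % 2) == 1;

// The four properties listed above.
Console.WriteLine(myExpression.Body);
Console.WriteLine(myExpression.Parameters[0].Name); // x
Console.WriteLine(myExpression.NodeType);           // Lambda
Console.WriteLine(myExpression.Type);

// Compile turns the tree into a Func<int, bool> that can be called.
Func<int, bool> filter = myExpression.Compile();
Console.WriteLine(filter(3)); // True
```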

Because expressions are nested (that's why they are called trees), we would like to view them in a tree format.
You can download the Expression Tree Visualizer from the Visual Studio 2008 samples.

For example, if we run the visualizer on our current expression, we get the following result:

[Figure: Expression Tree Visualizer output for x => (x % 2) == 1]

Our expression is an ExpressionEqual that compares an ExpressionModulo (left) with an ExpressionConstant (right).

The ExpressionModulo represents a modulo operation between the ExpressionParameter "x" and the constant "2", and the ExpressionConstant on the right represents the constant "1".
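The same tree can be built by hand with the factory methods of the Expression class, which makes the nesting explicit: in the System.Linq.Expressions API, the Equal and Modulo nodes are BinaryExpression objects, "x" is a ParameterExpression and the numbers are ConstantExpression objects. A sketch that builds the tree, walks the one the compiler generates, and checks that both behave the same:

```csharp
using System;
using System.Linq.Expressions;

// Build x => (x % 2) == 1 node by node.
ParameterExpression param = Expression.Parameter(typeof(int), "x");
BinaryExpression modulo = Expression.Modulo(param, Expression.Constant(2));  // x % 2
BinaryExpression equal = Expression.Equal(modulo, Expression.Constant(1));   // (x % 2) == 1
Expression<Func<int, bool>> built = Expression.Lambda<Func<int, bool>>(equal, param);

// Walk the tree the compiler generates for the same lambda.
Expression<Func<int, bool>> written = x => (x % 2) == 1;
var body = (BinaryExpression)written.Body;   // the Equal node
var left = (BinaryExpression)body.Left;      // the Modulo node
var right = (ConstantExpression)body.Right;  // the constant 1

Console.WriteLine(body.NodeType);                           // Equal
Console.WriteLine(left.NodeType);                           // Modulo
Console.WriteLine(((ParameterExpression)left.Left).Name);   // x
Console.WriteLine(((ConstantExpression)left.Right).Value);  // 2
Console.WriteLine(right.Value);                             // 1

// Both trees compile to delegates that give the same answers.
Func<int, bool> fromBuilt = built.Compile();
Func<int, bool> fromWritten = written.Compile();
Console.WriteLine(fromBuilt(3) == fromWritten(3)); // True
```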

Expression trees and Linq
As I said at the beginning of the post, the Linq statement is saved in the Expression property of the IQueryable object. When the query is executed against a database (with LINQ to SQL, for example), the provider translates that expression into an SQL statement.

For example, if we run the expression tree visualizer on this query:
var query = from c in db.Customers
            where c.City == "London"
            select new { c.City, c.CompanyName };

We get the following tree:

[Figure: Expression Tree Visualizer output for the query above]

And now that we can read this, it is simpler to debug Linq queries.
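Without the visualizer or a database, the same structure can be read in code. A sketch assuming a hypothetical in-memory stand-in for db.Customers (two anonymous-type customers wrapped with AsQueryable()): the query syntax compiles to customers.Where(...).Select(...), so the tree's root is the Select call and its first argument is the Where call.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical in-memory stand-in for db.Customers.
var customers = new[]
{
    new { City = "London", CompanyName = "Company A" },
    new { City = "Berlin", CompanyName = "Company B" },
}.AsQueryable();

var query = from c in customers
            where c.City == "London"
            select new { c.City, c.CompanyName };

// Root of the tree: the Select call; its first argument: the Where call.
var selectCall = (MethodCallExpression)query.Expression;
var whereCall = (MethodCallExpression)selectCall.Arguments[0];

Console.WriteLine(selectCall.Method.Name); // Select
Console.WriteLine(whereCall.Method.Name);  // Where
Console.WriteLine(query.Expression);       // the whole tree as text
```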

There are more uses for expression trees, such as generating code and reflection, but that's for a different post...

Barko(aka Tal)


Friday, February 5, 2010

Linq – Different between "join" syntax

Tuesday, January 19, 2010

Expression trees
