Friday, February 5, 2010

Linq – Different between "join" syntax

As I recently discovered, Linq is a very powerful query language that can help us query our objects in a very beautiful way (yes,beautiful).

Lately I had to write some join queries between lists and I found-out that linq let you do that in all kinds of ways.

So what kind of joins can we do,and what is the difference between them?

In order to explain this we need an object model,a simple one.

Lets say we are running a store. we have products and each product is created by a company.

   public class Company
   {
       public string Name { get; set; }
       public int Id { get; set; }
   }

   public class Product
   {
       public string Name { get; set; }
       public int CompanyId { get; set; }
   }
We will run our queries on these lists:
var companies = new List<Company>
                            {
                                new Company() {Name = "Candy's", Id = 1},
                                new Company() {Name = "Cow's",  Id = 2},
                                new Company() {Name = "Stuff",  Id = 3},
                            };

           var products = new List<Product>
                           {
                               new Product() {CompanyId = 1, Name = "Bamba"},
                               new Product() {CompanyId = 1, Name = "Bisli"},
                               new Product() {CompanyId = 2, Name = "Milk"},
                           };

Inner-join

The first join is a simple inner-join. we want to print each product from the list and it's company's name:

from product in products
join company in companies on product.CompanyId equals company.Id
select new {Product = product.Name, Company = company.Name};

If we print the result it will be:

Bamba - Candy's
Bisli - Candy's
Milk – Cow's

The query run on the products list and search the matching company.

another option to get these same result is with this "sql" syntax:

from product in products
from company in companies 
where product.CompanyId == company.Id
select new { Product = product.Name, Company = company.Name };
So what's the difference? when to use what? in a second…

Group-join

Now we would like to print a company and all of it's products.

we could use the inner-join syntax as follows:

from company in companies
join product in products on company.Id equals product.CompanyId
select new { Product = product.Name, Company = company.Name };
And get this result:

Candy's - Bamba
Candy's - Bisli
Cow's – Milk

But what if we want to print the company's name only once? or we want to keep the company's products in a list? that why we have the group join.

from company in companies
join product in products on company.Id equals product.CompanyId into companysProducts
select new { Company = company.Name,Products = companysProducts };
As you can see we added the "into" key-word and now,for each company we select it's name and a list of it's products.

The result will look like that:

Candy's
    Bamba
    Bisli
Cow's
    Milk
Stuff

Another interesting thing we can see in the query result is that the "Stuff" company appears although it has no products. that didn't happened before. this is because now we group for each company it's products. if it has no products the list will be empty. this query also called "outer-join".

Outer-join

Except for the query we saw above we have one more option to perform an outer-join. what if we wanted to print lines of type "company-product" but wanted "Stuff" (that has no products) to be in that list with empty product?

For this, we have "DefaultIfEmpty" method:

from company in companies
join product in products on company.Id equals product.CompanyId into companysProducts
from item in companysProducts.DefaultIfEmpty(new Product{Name = "No products"})
select new {Company = company.Name, Product = item.Name };
What we do is creating a "default product" that will be inserted if the companysProducts list of the current company is empty.

The result of this will be:

Candy's Bamba
Candy's Bisli
Cow's Milk
Stuff No products

Custom-join

As you noticed, all our joins so far used the "equals" comparison. but what if we want to select for each company all the products that it didn't make?? we need to use not-equal.

The "join" syntax we saw so far wont be good here. so we need to use the "from" statements to perform what we want:

from company in companies
from product in products 
where company.Id != product.CompanyId
select new { Company = company.Name,Product= product.Name };

The result will be:

Candy's Milk
Cow's Bamba
Cow's Bisli
Stuff Bamba
Stuff Bisli
Stuff Milk

(we could obviously do this with a group-join and the results would be more readable)

So now it's the time to explain "When to user what":

The "join" syntax is designed to perform equals joins in the best way,that's why when you want to use "==" operator you better use the "join" syntax – it has best performance.

When you want to use other operators than "equals" Or you want to use more than one condition (where company.id == product.CompanyId and company.state = product.state) you need to use the "from … from.. where.." syntax.

This was just the tip of the iceberg about linq queries, hope i made things a beat more understandable, and now you are able to refactor your 10 lines of "foreach" loop to one query statement(really, that what happens).

No comments:

Post a Comment