Improve Performance of your LINQ to SQL Queries - Query Compilation

LINQ to SQL is a great Object-relational mapping (ORM) software that encapsulates and simplifies a lot anything related to your database.

But as with every good things in life, it comes with a price. No I'm not talking about Montreal's hockey team here (because they're not good), I'm talking about performance penalty!

The Problem
The Solution
Recap
Conclusion
Related Articles

The Problem

To understand where's the performance penalty, let's clarify things up a little before jumping into a solution to our problem. It makes sense doesn't it? So here we go: When the code execution reaches a LINQ to SQL snippet, the framework goes through a series of stages:

Expression tree creation: The LINQ to SQL you wrote is converted into an expression tree. It's like an in-memory data struture that represents parsed code. To be more precise, IL is generated to construct a tree of objetcs representing your LINQ to SQL the expression.
SQL Query generation: It is now easier for the framework to build an SQL Query using the expression tree. With it, the framework knows what data to retrieve and generates SQL accordingly.
Query execution: The generated query is sent to the SQL server with the proper parameters.
Data retrieval: The data is retreived from the SQL server.
Objects creation: The framework converts the rows retreived to objects as specified by your LINQ to SQL expression.

So even when you're using the same query over and over, the framework always goes through those five steps. You may agree with me that some steps are redundant.

Take step one for example. If the query is always the same inside one of your functions, and you call that function over and over. It's useless to recreate the exact same expression tree again and again (assuming the LINQ to SQL expression never changes across function calls). This is also true for step two. If the expression tree generated is always the same, SQL generated from it will also be the same. On the other hand, it would be difficult to perform a SQL query without sending it over the wire! We need step three to five because these steps are all about the data.

The Solution

So we determined that step one and two could only be done once for a particular query. How can we do that? Is it possible?

Fortunately, Microsoft came up with a solution. It's possible for you to compile your query so that these steps are executed only once, even if the query is executed many times. This feature is possible through the CompiledQuery class, with the static member Compile. Here's a full example:

private static Func<YourDataContext, string, Member> MemberByEmail = CompiledQuery.Compile(
  (YourDataContext dc, string email) =>
  (
    from m in dc.Members
    where m.Email == email
    select m
  ).SingleOrDefault()
);

Here's what the code does: A private static member variable MemberByEmail is declared and assigned the return value of the CompiledQuery.Compile method. The return value is a delegate (A delegate in C# is like a function pointer in C/C++), so MemberByEmail is a member variable but we can use it like a function:

Member m = MemberByEmail(dc, "test@domain.com");

We should note that compilation does not occur until the first time the query is executed (the first time MemberByEmail is called). To compile one of your query, here's a template you can use to fill in the proper parameters to the Compile method (sorry for the readability):

private static Func<YourDataContext, TypeOfParam1, TypeOfParamN, TypeOfReturnValue> FunctionName = CompiledQuery.Compile(
  (YourDataContext dc, TypeOfParam1 param1, TypeOfParamN paramN) =>
  (
    * your query *
  )
);

Recap

There's quite a number of advantages to compile your LINQ to SQL queries:

Speed! Not compiling your queries can be twenty times slower than compiling them! See some benchmarks.
Free performance. Why the "free"!? Because there's no architectural changes involved. You only move your queries around. So it's a pretty safe way to optimize and you don't have to do aggressive regression testing.
Your code can be more readable. Instead of having a huge LINQ query in an important part of your code, you now have a simple a function call. You can then regroup your compiled queries together in a #region somewhere away from the implementation of your business rules.

But there's some limitations/drawbacks too:

The Compile method only provide overloads for a maximum of four parameters. It means that the function representing your compiled query can have a maximum of four parameters. If you need more that that, you have to declare a structure, fill your parameters into your structure, and then pass the structure to the Compile method. Maybe I'll blog about that in a post on itself.
You can't pass everything to the Compile method either... For example, you can't pass lists.
You can't build your queries in multiple steps depending on some conditions (Like adding a where if we sometimes want a search filter).
You can't call a compiled query across DataContexts with different LoadOptions.
It's harmful in terms of speed if you only call your query once

Conclusion

It's getting late, so I'll cut it short: Compile your queries for god sakes! I hope you liked the article and learned something.

I'm putting time and effort on this blog and having you here is my reward. Make me feel better and better everyday by spreading the love with the buttons below! Also, don't hesitate to leave a comment. Thanks for reading!

See ya

Mike Gleason jr's Blog

Wednesday, October 28, 2009