Venkat's SQL Server Club: Best Practices for TSQL Progamming.

Ø Variables - Use few variables as possible. It frees spaces in cache.

Ø Keywords - Use SQL keywords in capital letters to increase readability. Also use proper indentation to increase readability.

Ø SQL-92 - Always try to use ANSI 92 syntax. Till now the old syntax is working the old syntax will be deprecated in the future release of MS SQL server.

As an example, for JOINS, use the following query

SELECT * FROM employee e1 INNER JOIN employee_dtl e2

ON e1.id = e2.id

Instead of

SELECT * FROM employee e1, employee_dtl e2 WHERE e1.id = e2.id

Ø Dynamic Queries - Try to minimize the usage of dynamic queries. If you are using a dynamic query like:

SELECT * FROM emp where empid = @eid

then there is no problem. You can supply a value for the @eid parameter and there is no recompilation of the execution plan in the database cache. But if you are using a SQL query like :

SELECT * FROM emp where empid = " + @eid

and supply a parameter (say 100), then the cache will keep the execution plan for the value of 100 only. If the id changes (to say 101), it will recompile the statement. Hence, this approach is slower than the previous one.

Ø Fully Qualified Names - Always use the fully qualified name when calling stored procedures. For example, use EXEC dbo.Your_Proc_name instead of EXEC Your_Proc_name This is a very common mistake, which causes an extra trip to the procedure cache to get the execution plan for execution. Also try to use the schema name while creating a procedure.

Like: CREATE PROCEDURE dbo.<Proc_name> instead of CREATE PROCEDURE <Proc_name>.Its good practice to start with ‘usp_’ to identify user defined stored procedures.

Ø SET NOCOUNT ON - This suppresses the message that shows number of rows affected by SQL statement. Otherwise this can cause extra network traffic and can have some serious impact on performance when the procedure is called frequently.

I would recommend you to use SET NOCOUNT ON for the sake of performance unless there is a very good reason for using it.

Ø sp_executesql - Avoid the recompilation of a stored procedure. If you want to provide parameterized dynamic SQL, then go for sp_executesql instead of EXEC(proc_name). Here the execution plan for the procedure is stored with the variable name in cache memory. When the variable values are supplied, then the values are simply mapped to the query, hence no need for a recompilation.

Update: Keep in mind that recompilations are not always bad. Sometimes, using an old and inefficient plan can make the procedure to run slower.

As an example of sp_executesql, we can write:

sp_executesql N'SELECT * FROM dbo.emp where empid = @eid', N'@eid int', @eid=40

Ø SELECT vs SET - A single SELECT statement can assign values to different variables and is much faster than multiple SET statements assigning values to multiple different variables.

SELECT @Var1 = @Var1 + 1, @Var2 = @Var2 – 1

Instead of

SET @Var1 = @Var1 + 1

SET @Var2 = @Var2 - 1

Ø WHERE clause hints - Try to avoid a function in the WHERE clause as it presents SQL engine to do index seek. Even it forces SQL full index scans or even table scans.

Also, try to avoid IN. While checking the existence of some values, then use EXISTS instead of IN. IN counts the NULL values whereas EXISTS doesn’t. EXISTS returns Boolean(Yes/No) but IN returns all values hence result set for IN is heavier than EXISTS.

SELECT name FROM emp WHERE empid IN (SELECT empid FROM empdetails)

SELECT name FROM emp WHERE EXISTS (SELECT 1 FROM empdetails where emp. empid = empdetails. empid)

Ø CAST and CONVERT - Try to use CAST instead of CONVERT. CAST is ANSI-92 standard but CONVERT works in MS SQL server only. Also, some CONVERT styles may be deprecated in future MS SQL releases. It is better to use CONVERT only when you need to format the DATETIME data type with the style option. CAST cannot do this.

Ø Avoid calculations on the left-hand side of the WHERE clause and avoid treating dates like strings.

Ø Do not use any functionality that is undocumented. You will come across on website with many undocumented commands but do not use them. Because

· Microsoft will not provide you support on such command in case of any issue and

· Microsoft may stop supporting that command in future release.

Ø Avoid DISTINCT and ORDER BY - Try to avoid DISTINCT/ORDER BY clause. DISTINCT or ORDER BY clauses cause extra work to the database engine and reduce the performance. In some rare cases, ORDER BY helps to speed up the operation.

Ø Avoid using CURSORS - Using cursors make the program slower as it works against SET based SQL. Try to use temporary table/table variables with identity column and then iterate all the rows using WHILE loop and a looping counter, which will map with the identity column.

Ø SELECT Statements - Try to use only the required number of columns in the SELECT clause instead of using *. Using * returns all columns, which unnecessarily create a fat recordset.

Ø Subquery vs JOINs - In fact most sub queries can be expressed as an equivalent form of JOIN. Subquery is faster when we have to retrieve data from large number of tables because it becomes tedious to join more tables. JOIN is faster to retrieve data from database when we have less number of tables. Try to avoid correlated sub queries because it makes the query much slower.

Ø CREATE TABLE vs. SELECT INTO - Select * INTO works fine for small tables, but when dealing with large record sets or long-running queries, it creates locks on the system objects within the tempdb database. As a result, other queries and procedures that need to create objects within the tempdb database will have to wait for the long-running query to complete. This is because when an object is created, an exclusive lock is taken against the sysobjects, syscolumns, sysindexes tables.

Also, SELECT INTO not only copies data but also it copies the table structure, hence it performs slower.

Ø Table variables VS Temporary Tables - Temp tables can cause stored procedures to recompile. (From SQL 2005, using temporary table not always cause recompilations. But adding rows to temporary tables may cause recompilations). But table variables were designed specifically to guard against stored procedure recompiles during execution.

If the result set is not containing a huge number of records then you should stick to table variable, otherwise temp table has its advantages. There is a misconception that temp tables always use the tempdb database but table variable do not. Table variables also use tempdb after a certain size.

Ø Lock Hints – Avoid using query hints unless you know what exactly you are doing, and you have verified that hint actually boosts performance. Specifying table hints by using the WITH keyword. Make sure to use WITH (<HINT>) keyword.

Ø Datatypes - Try to define data types that use minimum storage unless it’s really necessary.

Do not use CHAR data type for Nullable column. If column can contain null value or multi length value then DO NOT use CHAR data type.Use sparse column type when you think the data has more nulls in it.

Ø Transaction Scope- Try to minimize the transaction scope within the sql statements. This reduces locking and increases application concurrency.

Ø Isolation Level: Use the least restrictive transaction isolation level possible for your user connection, instead of always using the default READ COMMITTED.

Ø Usage of Stored Procedures: Whenever a client application needs to send Transact-SQL to SQL Server, send it in the form of a stored procedure instead of a script or embedded Transact-SQL. Stored procedures offer many benefits, including:

· Reduced network traffic and latency, boosting application performance.

Stored procedure execution plans can be reused, staying cached in SQL Server’s memory, reducing server Overhead.

· Client execution requests are more efficient. For example, if an application needs to INSERT a large binary value into an image data column not using a stored procedure, it must convert the binary value to a character string (which doubles its size), and send it to SQL Server. When SQL Server receives it, it then must convert the character value back to the binary format. This is a lot of wasted overhead. A stored procedure eliminates this issue as parameter values stay in the binary format all the way from the application to SQL Server, reducing overhead and boosting performance.

· Stored procedures help promote code reuse. While this does not directly boost an application’s performance, it can boost the productivity of developers by reducing the amount of code required, along with reducing debugging time.

· Stored procedures can encapsulate logic. You can change stored procedure code without affecting clients (Assuming you keep the parameters the same and don’t remove any result sets columns). This saves developer time.

· Stored procedures provide better security to your data.

Ø CLR(Common Language Runtime) :

· Don’t turn on the CLR unless you will be using it. Use the CLR to complement Transact-SQL code, not to replace it.

· Standard data access, such as SELECT, INSERTs, UPDATEs, and DELETEs are best done via Transact-SQL code, not the CLR.

· Computationally or procedurally intensive business logic can often be encapsulated as functions running in the CLR.

· If a procedure includes both significant data access and computation and logic, consider separating the procedure code in the CLR that calls the Transact-SQL stored procedure that does most of the data access.

· Use the CLR for error handling, as it is more robust than what Transact-SQL offers.

· Use the CLR for string manipulation, as it is generally faster than using Transact-SQL.

· Use the CLR when you need to take advantage of the large base class library.

· Use the CLR when you want to access external resources, such as the file System, Event Log, a web service, or the registry.

Venkat's SQL Server Club

Saturday, August 13, 2011

Best Practices for TSQL Progamming.

No comments:

Post a Comment