Robert's SQL Blog

My thoughts on SQL Server, PowerShell and Microsoft products in general.
@rhartskeerl
Follow me on Twitter

Entity Framework 4, the end of stored procedures?

Last week I did a session for the Dutch PASS chapter on WCF Data Services, ADO.NET Entity Framework and Silverlight. It was a demo on how you can click-build an application in a couple of minutes that’s using databinding, master-detail grids and whatever more. I showed how the EF came up with a horrible query but as it turned out this was due to the way the data was retrieved by the developer. By changing this code the SQL build by EF was much more DBA friendly and more performant. What was noticed in the audiance was that the parameters we’re send over as is, instead of wrapping them in sp_execeutesql and proper parameters. As I didn’t have an answer that night I promised to get in more detail through my blog. So, here is my blog post on how EF4 does SQL.

To get things going we use a simple console application that retrieves data from an entity model. The model is shown below and it is used to build an agenda for an event, like SQL Saturday or SQL BITS. We use LINQ to retrieve data from the model and send some output back.

And here is the constructed LINQ we use.

PassDbEntities e = new PassDbEntities(ConfigurationManager.ConnectionStrings["PassDbEntities"].ConnectionString);
var q = from r in e.Rooms select r;

foreach (Room r in q)
{
    Console.WriteLine("Sessions for room: {0}, capacity: {1}",
        r.Name, r.Capacity);
    foreach (Slot s in r.Slots)
    {
        Console.WriteLine("-- Time: {0} Session: {1}",
        s.SlotDateTime, s.Session.Title);
    }
}

If you run this the outcome will be like the next screenshot. Everything behaves as expected.

Of course, the most interesting part for DBA’s is “What’s in the profiler?”. Actually,some nice and clean code is going on there. For the outer loop,the rooms, everything is selected.

SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[Name] AS [Name],
    [Extent1].[Capacity] AS [Capacity]
FROM [pass].[Rooms] AS [Extent1]

In the next loop we will retrieve data from the Slots table but also from the Sessions table by referring to Slots.Session.Title. Take a look at the SQL produced by this loop:

exec sp_executesql N'SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[SlotDateTime] AS [SlotDateTime],
    [Extent1].[RoomId] AS [RoomId],
    [Extent1].[SessionId] AS [SessionId]
FROM [pass].[Slots] AS [Extent1]
WHERE [Extent1].[RoomId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=1

exec sp_executesql N'SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[Title] AS [Title],
    [Extent1].[Abstract] AS [Abstract],
    [Extent1].[Level] AS [Level],
   [Extent1].[SpeakerId] AS [SpeakerId]
FROM [pass].[Sessions] AS [Extent1]
WHERE [Extent1].[Id] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=1

Wow, pretty cool huh. Except for the fact more columns are retrieved then needed, but it’s using parameters and sp_executesql so plans can be reused.

Things change though if the WCF Data Services kick in. In the previous example we used the Entity Framework within our project. But if you want to reuse it in Silverlight or expose data to the web you can connect the Entity Framework to WCF Data Services and you get instant Atom feeds and JSON serialization. That’s not just cool, it’s sub-zero! But there is a catch. I will use the same example as before but the code is changed a little to make it work with WCF Data Services.

PassDbEntities e = new PassDbEntities(new Uri("http://localhost:50789/Ef4DataService.svc"));
var q = from r in e.Rooms select r;

foreach (Room r in q)
{
    Console.WriteLine("Sessions for room: {0}, capacity: {1}",
        r.Name, r.Capacity);
    foreach (Slot s in e.Slots.Expand("Session").Where(t => t.RoomId == r.Id))
    {
        Console.WriteLine("-- Time: {0} Session: {1}",
            s.SlotDateTime, s.Session.Title);
    }
}

We cannot access the data like before (Slots.Session.Title) unless we “Expand” it. This will cause the entity framework to join the two tables – Slots and Sessions – and retrieve all data. This will get pretty ugly if a lot of columns and rows are involved. Or if you Expand().Expand().Expand(). In that case you get a lot of data and if you are lucky some outer joins, left joins and inner joins packed together. I’d say try to avoid Expand whenever you can. The result of this code is the same as the “regular” example but the difference is in the generated SQL. The first part – getting the rooms – is the same, but check out the next part.

SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[SlotDateTime] AS [SlotDateTime],
    [Extent1].[RoomId] AS [RoomId],
    [Extent1].[SessionId] AS [SessionId],
    [Extent2].[Id] AS [Id1],
    [Extent2].[Title] AS [Title],
    [Extent2].[Abstract] AS [Abstract],
    [Extent2].[Level] AS [Level],
    [Extent2].[SpeakerId] AS [SpeakerId]
FROM  [pass].[Slots] AS [Extent1]
INNER JOIN [pass].[Sessions] AS [Extent2] ON [Extent1].[SessionId] = [Extent2].[Id]
WHERE 1 = [Extent1].[RoomId]

You see the effect of the Expand, an INNER JOIN is performed on Slots and Sessions. But that’s not the most important part. Notice the WHERE clause. Instead of wrapping it all in sp_exucutesql and use parameters the values are pasted in! This means no plan reuse! To prove it here is a quick example. The first screenshot shows the performed statements and the second shows the cached plans. You will see three plans for the query not using parameters and one for the one that used parameters but with three executions. This doesn’t always have to be a better solution but it’s one of those things why DBA’s want to use stored procedures. Plan reuse and parameter sniffing.

It seems WCF is doing something with our call causing the Entity Framework not to use parameters. One way I found to force the use of parameters is creating a service operation. The next piece of code is added to the service class. This will take an id as parameter and return a filtered set.

[WebGet]
public IQueryable GetSlotByRoomId(int id)
{
    return CurrentDataSource.Slots.Where(a => a.RoomId == id);
}

If you add a service operation you must also configure the service to allow the user to execute it. Uncomment the line in the code that was generated by Visual Studio and fill in the proper names.

config.SetServiceOperationAccessRule("GetSlotByRoomId", ServiceOperationRights.All);

To call this function from our application is a bit harder. Make sure you recompile and update the servicereference. As you will see the service operation isn’t strongly typed. To access it you must build an uri with the request and filter in the string. To do this I created a function in my console application that will do the call, where localhost:50789 is my reference to the  service.

public static IEnumerable GetSlotByRoomId(int id)
{
    PassDbEntities e = new PassDbEntities(new Uri("http://localhost:50789/Ef4DataService.svc"));
    Uri uri = new Uri("http://localhost:50789/Ef4DataService.svc/GetSlotByRoomId?id=" + id);

    return e.Execute(uri);
}

And this function will be called from the code like this.

foreach (Slot s in GetSlotByRoomId(r.Id))
{
    Console.WriteLine("-- Time: {0} Session: {1}",
        s.SlotDateTime, s.SessionId);
}

Notice I did change the s.Session.Title to s.SessionId. Because of the use of a service operation you cannot expand anymore. There are lots of solutions to this one but for the sake of the demo and length of this post I will not get into this at this point.

So, parameters can be used albeit with the loss of flexibility. It really seems the WCF Data Service is changing the game plan at some point so I would start my investigation there. So, is it the end of stored procedures as we know it? The queries created by EF4 in this example aren’t bad at all and look a lot like the ones you would have created yourself. But stored procedures are also a way to tighten security. If you use stored procedures for insert, update and delete you do not have to give out permissions directly to the underlying tables. This would always force people to use stored procedures, even if they would access your database from something other than your application. Another reason to use stored procedures for a lot of people is they can incorporate business logic. Although I will agree it can, T-SQL was build and optimized for set based operations, and when I look at some of the stored procedures in the wild out there they are far too complex, simply because T-SQL wasn’t built for it. It can handle it, but C# or C++ would do it a lot faster and cleaner.

So, stored procedures or not? It depends, on your business requirements, on the size and complexity of your database. LINQ, EF4 and data services are a great addition to the developers tool belt and in our connected worlds. There was a time application talked to each other by dumping tables to delimited files every night and send them over with ftp, where the other application would read and import the file. Nowadays you just expose your data as a web service and you can combine two data services and mash up the results. Look at all the social network integrations. The latest additions for the developer cannot be ignored by DBA’s but developers cannot ignore the performance penalty with improper use of the entity framework.

Things your company doesn’t want to do with SQL Server

I used to work for a government agency, and not just a regular one, but a military one. So, as you can imagine security is a big issue. The same goes for financial and medical databases. For some people downtime is not an option, for others the integrity and liability of the data is the most important.

SQL Server helps you on this. It’s followed the path Microsoft took a while ago to make their products ‘secure by default’. This means the Windows Firewall is now enabled in the latest server releases, instead of disabled or absent in earlier releases. The same goes for SQL Server. No more Builtin Administrators in the SYSADMIN serverrole. And xp_cmdshell is disabled by default. As is the guest user in user databases.

I’ve seen a lot of developers turn pale white when their glorious application didn’t work in production, simply because it needed SYSADMIN. Well, it didn’t need SYSADMIN but they forgot to specify what permissions were needed and did not provide a databaserole with the proper permissions. If you do not have a strong organization it’s most likely you will give the application account SYSADMIN permissions. You might believe that some day you will change this to a databaserole but that day will never come.

Sometimes, and only sometimes and in small companies, they place the SQL service account in the Domain Admin group (yes, people who do this exist on this planet). And to be honest, if you are a DBA,this is the best thing that can happen to you. Because now you enable xp_cmdshell and with a simple ‘xp_cmdshell ‘dsadd user cn=backdoor,cn=users,dc=contoso,dc=com -samid adm‘’ you can add users to AD. But of course you can grant yourself Domain Admin permissions. And you don’t have to impersonate the account because xp_cmdshell always runs in the context of the SQL service account.

And if there are people who believe the SQL account has to be Domain Admin to be able to register a SPN at startup; Read this: http://msmvps.com/blogs/ad/archive/2010/07/15/how-to-delegate-the-right-to-delegate-kerberos-constrained-delegation.aspx

And any domain administrators who goes along in this and grant Domain Admin permissions to a service account should URALT (Update Resume And Leave Town).

T-SQL Tuesday #12–Why are DBA skills necessary?

alt

Why are DBA skills necessary? You can answer that question easily. You just can’t do it all. True, if you’re working at a company with only 10 employees you do not need DBA skills. It’s not likely to run into true DBA problems and SQL does a fair job of tuning and assisting you with nice GUI tools to make sure it all keeps running.

Things are different when you’re hosting more than 1.000 databases, if you’re hosting a 5TB SharePoint environment, if all your databases are on a SAN or if the business want’s to go virtual.

You cannot have 100 databases on a single instance if you don’t have a DBA watching over it. The more databases you put on an instance the more you have to think about how these interact with the instance. How much IO is each databases taking, which queries use the most CPU, what is SQL Server waiting on? There are tools that can assist you in the process, but they only make it easier to spot the pain. You still need DBA skills to understand the pain to know how to fix it. Take a doctor for example. If a patient walks in with his arm in an abnormal way the doctor can easily tell it’s broken. If a patient walks in screaming with pain in his arm but no visible injury the doctor would have to check certain things an would have to know the cause of the pain. He might even use a tool to assist him in the diagnose. The tool in this case could be a x-ray device. If the x-ray shows that the arm was in fact broken the doctor would know how to fix it. But it would take certain skills to be able to do this.

There is also the case when people rely on the tools too much. They are just tools, they can be wrong. The tools are written by people who tend to make mistakes, sometimes referred to as bugs. In SQL 2008 we came across an issue where a restore failed within SSMS. It turned out that the query build by SSMS to get the restore chain didn’t check if the log or differential backups actually belonged to the full backup. This would create a situation where SSMS would error out on the restore leaving you with a database in restoring state. If you would just gather the backups yourself and build your own restore script the database was completely restorable. This is fixed in SQL 2008 CU5 (https://connect.microsoft.com/SQLServer/feedback/details/411015/ssms-shows-incorrect-restore-chain). If you do not have the necessary DBA skills you might end up reporting to your boss that the data is lost.

Last but not least you need DBA skills if you have developers in the house. I have been amazed quite a few times when a developer requested a database backup but wasn’t able to restore it. Simply because the paths on our production servers differ from his dev vm. At one time a developer told me his plan to migrate a database from one datacenter to another. By backing up the database and burning it on disk, drive to the second datacenter (1 hour drive) and restore it there. This was faster than the copy they tried over the network. And this database was only 6GB on local storage. The backup itself took 1 hour. By simply splitting the backup in four files not only the backup time shortened but the copying over the network became more reliable. The whole procedure including the restore took less than an hour.

And what about performance tuning or at least performance perception. Most developers focus on execution time. If the query results are there in a few milliseconds it’s fast. DBA are more likely to focus on execution plans, plan reusability and IO’s. I have been in a discussion with a developer about one of his queries. I showed my query and he replied with “but mine is faster!”. Yes, it maybe now,but what if there’s million of records in the database instead of your reference set of 100 records. And what about concurrent connections? Did you test the query with one or one thousand concurrent connections. Although the execution times we’re almost equal,the difference in logical reads was ten times more. In a production environment with more users and data this would have created serious blocking issues. Then again, most developers circumvent that problem by adding a READUNCOMMITTED hint to the query. Yes, DBA skills are necessary.

As always with SQL, it depends. It depends on the size of your business. It depends on how many databases you manage and which techniques are involved. You could easily manage 50 databases on 50 vm’s without any DBA skills (you will need some vm skills). But if you have 5 databases that need 99.999 percent uptime and they are build on geo clusters, replicated by your SAN. You need DBA skills… And SAN skills, and Windows skills. In fact, the DBA needs to know it all, simply because the others lack DBA skills.

Autonumbering vs Semi-Autonumbering

A while ago someone wondered if you had to use auto numbering in SQL or if there was an alternative. Especially since Oracle doesn’t use Auto numbering like SQL does but uses Sequences (http://www.oradev.com/oracle_sequence.jsp).

So, what are the alternatives? There are actually two alternative methods.

  1. Use a sub select like SELECT MAX(ID) + 1 to calculate the next ID.
  2. Use a separate table to hold the last inserted value.

Let’s run through these options and see where they differ from auto numbering.

But first off, why use auto numbering. Most people use it as a primary key. It is definitely a proper use but the most important concept behind auto numbering is that it is sequentially correct. By default SQL Server creates a clustered index for the primary key unless specified not to do so. A clustered index determines how the pages are ordered. If the new value comes after the previous one it will just be added. But if the new value is somewhere in between previous values, pages must be moved to insert the value causing index fragmentation. This is the main reason you do not want to see a GUID as a clustered index. If you must then use NEWSEQUENTIALID() to generate the new value.

Alternative 1: sub select

To insert a new value it is possible to get the max value and add one. Typically this would look like this:

INSERT INTO Table1 (ID, SomeField) VALUES (
    (SELECT ISNULL(MAX(ID), 0) + 1 FROM Table1), ‘Some Text’);

You must use ISNULL to make sure the insert works when there are no records yet. But let’s make it a bit more complex. What happens if someone is inserting records in a transaction that’s not committed yet and someone else wants to insert a record as well? This can easily be tested by opening two query windows in SQL Management Studio and use a BEGIN TRAN to start the first insert and start the second on the other window. By not using BEGIN TRAN an implicit commit will be performed.

The first window, with the BEGIN TRAN indicates a successful insert. Remember that no record is really inserted until a COMMIT is issued. But in the second window nothing will happen. Because the first window is neither committed nor rolled back the table is locked and a select cannot be placed. If you would check out the DMV’s (use this query as a shortcut: http://weblogs.sqlteam.com/mladenp/archive/2008/04/29/SQL-Server-2005-Get-full-information-about-transaction-locks.aspx ) you will see an exclusive lock is placed on the KEY. Until a COMMIT or ROLLBACK is issued the insert statement will hold.

Now ROLLBACK the statement and check the sequence. Notice that the sequence is still correct.

Although this alternative seems likely it’s not useable. Not to mention what will happen if you have a 1M row table.

Alternative 2: separate table

To overcome the locking problem you could create a separate table to hold the last value. But in reality the same problem occurs as with the first alternative. To update the last value you will lock the table and it will be locked within the not yet committed transaction.

Bottom-line, is. If you want to create your own auto numbering strategy be prepared to do some complex snapshot isolation, triggering and perhaps some CLR functioning.

In the end you could just use auto numbering within SQL Server itself, no need to recreate what’s already there.

Count # of words in a string

Two weeks ago I was at the first Dutch SqlZaterdag which is the Dutch version of SQLBits or SqlSaturday. It was a great day and almost 200 people showed up. I’m sure there will be another SqlZaterdag.

This post is a reply to the session from @andrekamman and Henk van der Valk who has put the presentation online. In their session there was a question: What is the fastest way to count the words in a string? The correct answer according to Andre and Henk was:

select LEN(@s) - LEN(replace(@s, ' ', ''))+1;

Which indeed does the trick. If @s is “To be or not to be, that is the question.” it will return 10. But what if someone didn’t put a space after the colon or put two spaces. A typo is easily made. To test it I used this script:

declare @start datetime2 = SYSDATETIME();
declare @end datetime2;
declare @s varchar(100);
declare @s1 varchar(100);
declare @s2 varchar(100);

set @s = 'To be or not to be, that is the question.';
set @s1 = 'To be or not te be,that is the question.';
set @s2 = 'To be or not to be, that  is the question.';

select LEN(@s) - LEN(replace(@s, ' ', ''))+1;
select LEN(@s1) - LEN(replace(@s1, ' ', ''))+1;
select LEN(@s2) - LEN(replace(@s2, ' ',''))+1;
set @end = SYSDATETIME();
select @start;
select @end;
select DATEDIFF(MICROSECOND,@start, @end);

The first time it takes a wile to run but the second time it will run much faster. The problem is that this will return three different outcomes, 10,9 and 11. So, yes it maybe the fastest way to count words in a string but not the most precise. But then again, if that’s not what you’re after, this is it.

If you would ask a developer for a solution he/she will say Regular Expressions. So, how do we get these in T-SQL. Yes, by using a CLR function.

Fire up Visual Studio 2008 and create a new Database – SQL Server Project. Give it a name, in my case I named it CountWords. Select your project and choose Add – User-Defined Function

image

Name the function RegExCount and replace the contents with this code.

using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlInt32 RegExCOUNT(string pattern, string message) as Computed
    {
        Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
        int i = r.Matches(message).Count;
        return new SqlInt32(i);
    }
};

This CLR function will accept a pattern and a search string. It will then count the occurrences of the pattern in the string and return it. Great thing about Visual Studio is that you can deploy this function directly on your SQL Server. So click Build and then Deploy to get the DLL inside your database. Or do it the proper way, build it and use CREATE ASSEMBLY to add it to a database.

Now run the following script:

declare @start datetime2 = SYSDATETIME();
declare @end datetime2;
declare @s varchar(100);
declare @s1 varchar(100);
declare @s2 varchar(100);

set @s = 'To be or not to be, that is the question.';
set @s1 = 'To be or not te be,that is the question.';
set @s2 = 'To be or not to be, that  is the question.';

select dbo.RegExCount('(s+)|(w,w)|(w.w)', @s)+1;
select dbo.RegExCount('(s+)|(w,w)|(w.w)', @s1)+1;
select dbo.RegExCount('(s+)|(w,w)|(w.w)', @s2)+1;
set @end = SYSDATETIME();
select @start;
select @end;
select DATEDIFF(MICROSECOND, @start, @end);

 

We feed the CLR-function with a RegEx Pattern:

  • s+: Matches one or more whitespaces.
  • w,w: Matches a character followed by a colon followed by a character.
  • w.w: Matches a character followed by a period followed by a character.

 

The result will be 10 for all three strings and although the first time it will take some time (the DLL needs to be compiled), subsequent runs are faster. So is this the fastest way? Maybe not but it is the more accurate one. And the pattern can be extended to catch more typo’s.

But still, SqlZaterdag was great!!!

Who’s backing up my database’s?

DBA’s know one of the most important tasks is to make sure the databases are backed up. For numerous reasons; To make recovery possible, to get back to an earlier point when someone runs the wrong update statement or to free up space in the transaction log.

And then, a backup is as good as it’s ability to restore. A backup that cannot be restored is useless. Who remembers broken backup chains in SQL 2000. When an user asked for a copy of the data and you decided the easiest way would to be backup and restore the database. Than you threw away the backup file because you didn’t need it anymore, at least that’s what you thought. SQL 2005 and beyond brings us the possibility to make a copy only backup which is a backup but does not interfere with the backup schedule.

So what if you have a great backup schedule and proper procedures with your team and you see this in your log on a time you are not backing up the databases:

2010-03-13 14:07:25.70 spid56      I/O is frozen on database msdb. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2010-03-13 14:07:26.06 spid56      I/O was resumed on database msdb. No user action is required.
2010-03-13 14:07:27.52 Backup      Database backed up. Database: msdb, creation date(time): 2008/07/09(16:46:27), pages dumped: 1579, first LSN: 212:536:75, last LSN: 212:576:1, number of dump devices: 1, device information: (FILE=1,TYPE=VIRTUAL_DEVICE: {‘{34F344B1-0A76-44EF-A72A-6A7B21E301C0}3′}). This is an informational message only. No user action is required.

Something is backing up my database? But who? You call the people who backup the OS and politely tell them to stay away from your databases but they claim not to be backing up SQL although the timeframe corresponds with their backup of the OS. So,there are two clues here:

  1. The OS is being backed up.
  2. There is I/O freezing and resuming during the same time.

 

The IO freeze and resume has to do with Volume Shadow Copy. A technique in Windows to backup files and the system state. What’s even funnier is that SQL thinks the databases are backed up to, at least the management studio. If you look in the MSDB.DBO.BACKUPSET system table you will notice that it’s the LOCAL SYSTEM account that made the backup and that the column is_snapshot has a value of 1. So, question is, can you restore it? Well, no you cannot. Oh, but is it bad? Will it break my backup chain? That’s a no too. Because actually a backup never took place. A couple of things need to be in place for this to happen:

  • The SQL VSS Writer service needs to be running
  • The LOCAL SYSTEM account needs to be SYSADMIN (not a best practice but it happens)

 

If that’s the case, it’s the volume shadow copy backup that’s causing this behavior. So, what’s going on:

  1. Someone starts a backup of the system state using Shadow Copy (e.g. The Windows Backup in Windows itself)
  2. The Shadow Copy service will tell registered writers that it is starting a backup.
  3. If the SQL VSS Writer  is running it will freeze all I/O.
  4. Because you only wanted to backup the system state I/O will resume and no backup is taken.
  5. But SQL will write a record to the log and the MSDB anyway. So in a way this is a ghosted backup.

 

Simple solution is to disable the SQL VSS Writer service. If you’re not backing up SQL using Volume Shadow Copy you will not need it. On the other hand it is harmless. If you want to test it yourself fire up Windows backup and make sure the SQL VSS Writer service is running and the LOCAL SYSTEM account is SYSADMIN.

SQL Server back-up mania part 1

So, you thought you knew everything about SQL backups. Here are two cases for you. We recently came across these in our production environment.

Part 1. Do I need a full back-up before taking a differential or log backup?

BOL is clear on this, for a log backup a full backup is required and since a differential backup is based on the previous full backup, a full backup is required. So, let’s put this to the test.

  • First of we create a sample database with one table in it.
    
    create database BackupTest;gouse BackupTest
    
    go
    
    create table SampleTable (SampleColumn varchar(25));
    
    go
    
    insert into SampleTable (SampleColumn) values ('Put it to the test');  
    

    So try a differential backup followed by a log backup. And? Well is must be no surprise this isn’t working. So create a full backup and name it BackupTest_FULL_1.bak.

  • Next we restore the database to a new database. First off you must drop the database we just created and make sure you mark the Delete backup history option. Now restore the database but restore it to a different database name, for example BackupTest2.
    
    RESTORE DATABASE BackupTest2 FROM
    
      DISK = N'C:Projectsbackuptest_FULL_1.bak' WITH  FILE = 1, 
    
      MOVE N'BackupTest' TO
    
       N'C:Program FilesMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSQLDATABackupTest2.mdf',  MOVE N'BackupTest_log' TO    N'C:Program FilesMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSQLDATABackupTest2_1.LDF',  NOUNLOAD,  STATS = 10
    
    GO  
    
  • Now let’s try to perform a differential backup on this database. And? That works, so a log must also be working. I will spoil the surprise for you, this works. Now use SQL Server Management Studio to restore the database. Right click on the BackupTest2 database and choose tasks > restore > database. This will bring up the most recent restore chain,or will it?

ssms_restore1

So,no backup to restore from, which makes no sense since we were able to perform a differential backup. If you check the database properties for the last full backup data it will show none, which again explains why the restore is failing but it shows a date for the last log backup.

ssms_properties1

 

So check for part 2 where some other strange SSMS behavior and fixes are displayed.

To not Order by in a view

At my work a user called use saying we couldn’t migrate to SQL 2008 because there was a big bug in it. It had to do with sorting data. Instead of properly sorting the data, as SQL 2000 did, SQL 2008 appeared to not sort the data at all. So, we did what every IT pro would do in this situation. Assume the user is always right and you’re not, and migrate everything back to SQL 2000 and call PSS to submit a bug in SQL 2008. Well, that’s what you should never do.

So we tracked the object that would display the user data to check if the sorting was in SQL or in the application. The sorting was in SQL and to be more precise, inside a view. Something like this:

create view Samples.MySortedView  as  select top 100 percent      column1,      column2  from       Samples.OrderByView  order by      column2 

So, not much wrong here you would say? Well, if you execute the statement within the view the data is sorted by column2. If you would execute select column1, column2 from Samples.MySortedView the data would not be sorted as you can see in this screenshot.

image

So? Big bug in SQL Server or not so smart SQL Developers. If you check the execution plan you will see why the view is not sorted.

image

The execution plan shows that the view is not sorted. The reason for this is actually explained in Books Online.

The ORDER BY clause is used only to determine the rows that are returned by the TOP clause in the view definition. The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself.

This means that the TOP in a ORDER BY is only used to determine the rows that are returned. The query engine will first order the list and then return the top rows you specified. And it is here where SQL 2005/2008 differ from SQL 2000. SQL 2005/2008 will say,you want it all (100 percent) so there is no need to sort it first. The sorting is only to determine which rows are returned,and not in which order. That’s a side effect. To prove this you can change to view definition to something other than 100 percent and you will see the data will get sorted. Of course not the real solution. The real solution is to put the order by outside the view (select column1, column2 from Samples.MySortedView order by column2).

So, proven to the user, nothing wrong with SQL. It’s your developer that’s wrong, I suggest you upgrade the developer to a newer version.

CommandType.Text and CommandType.StoredProcedure

I attended the DevDays 2009 PreCon session by Gert Drapers, a well known name in SQL world. He demonstrated, in a very simple way, the difference between CommandType.Text and CommandType.StoredProcedure.

 

Most developers in .NET know you can specify how you’re CommandText will be interpreted. CommandType has three enumerations:

  • Text (Default)
  • StoredProcedure
  • TableDirect

According to MSDN TableDirect is only available for the OLE DB provider which leaves Text and StoredProcedure for the SqlClient provider. But besides choosing between Text and StoredProcedure there are two different ways of sending your query to SQL Server.

Method 1 – Using literals

   1: cmd.CommandText = string.Format(
   2:     @"insert into dbo.t1 values({0},'{1}')",
   3:     i,
   4:     "Insert Demo 1 - Literals");

Method 2 – Using SqlParameter

   1: SqlParameter p1 = new SqlParameter("@p1", SqlDbType.Int, 4);
   2: SqlParameter p2 = new SqlParameter("@p2", SqlDbType.VarChar, 50);
   3:  
   4: cmd.CommandText = "insert into dbo.t1 values(@p1, @p2)";
   5: cmd.Parameters.Add(p1);
   6: cmd.Parameters.Add(p2);

Both methods will get the job done but the first method is in more ways not preferred. Search the internet for the keyword Sqlinjection and you will find out why. The second method is also better if you look at what is actually send to SQL using the profiler. The second method will wrap inside a sp_executesql statement. This will be cached and reused by the engine. So,in more than one way a better option.

So what if you used stored procedures,which is always good. You could write a stored procedure for the insert statement passing two parameters. Something like this:

   1: create procedure dbo.InsertT1
   2: @p1 int, @p2 varchar(50)
   3: as
   4: insert into dbo.t1
   5:  values (@p1, @p2);

The two previous lines of code would look like this if you used CommandType.Text and a stored procedure.

   1: // Using a Procedure with literals
   2: cmd.CommandText =
   3:     string.Format("exec dbo.InsertT1 {0},'{1}'",
   4:     i,
   5:     "Insert Demo 3 - Proc literal");
   6:  
   7: // Using a Procedure with SqlParameter
   8: SqlParameter p1 = new SqlParameter("@p1", SqlDbType.Int, 4);
   9: SqlParameter p2 = new SqlParameter("@p2", SqlDbType.VarChar, 50);
  10:  
  11: cmd.CommandText = "exec dbo.InsertT1 @p1, @p2";
  12: cmd.Parameters.Add(p1);
  13: cmd.Parameters.Add(p2);

So, not much different from a coding point of view. Later on we will see what the difference in execution time is.

And then last but not least, using CommandType.StoredProcedure. The code will look like this.

   1: SqlParameter p1 = new SqlParameter("@p1", SqlDbType.Int, 4);
   2: SqlParameter p2 = new SqlParameter("@p2", SqlDbType.VarChar, 50);
   3:  
   4: cmd.CommandText = "dbo.InsertT1";
   5: cmd.CommandType = CommandType.StoredProcedure;
   6: cmd.Parameters.Add(p1);
   7: cmd.Parameters.Add(p2);

The most interesting part is to see which method is the fastest. If you wrap these five methods in a loop inserting 1000 records you can measure which method will executes the fastest.

  • Demo 1 – Using Literals: 408 ms
  • Demo 2 – Using Parameter: 296 ms
  • Demo 3 – Using Procedure with literals: 348 ms
  • Demo 4 – Using Procedure with parameter: 322 ms
  • Demo 5 – Using CommandType.StoredProcedure: 272 ms

So, the fastest method is the CommandType.StoredProcedure. The difference is only 24ms but at these ratings that’s 10 percent.

There is also a faster method. If you wrap each insert into a single transaction, execution times will drop 60 percent. The reason for this is that when you run into a single transaction there are less actions written to the transaction log. If you’re not using a single transaction for all inserts you will have a begin and commit for every insert. This means that three records are written to the transactionlog. For a single transaction only a begin is written at the start and a commit at after the last record. So, here is the reason for a 60 percent performance gain. You can test this yourself using some of the undocumented stored procedures which read the transaction log.