Dec
20
2012

Collation of character fields in temporary objects

It's a common practice to use scalar variables, table variables, and temporary tables when developing a database application. If these transient objects contain (or are) character fields, which collation do they have if a collation isn't explicitly specified?

 

Test Setup

The idea of the test is to demonstrate how collation conflicts are possible when using temporary objects.

First, we'll create a new database with a collation that differs from the instance collation. I've chosen one randomly to make it easy to pick out for the purposes of demonstration. The collation of the instance I'm testing on is SQL_Latin1_General_CP1_CI_AS, which is probably the most common.

CREATE DATABASE CollationTest COLLATE Thai_CS_AI;
GO

USE [CollationTest];
GO

Next, we'll create a base table with a couple of character fields in it, and a row of data to test with:

CREATE TABLE [dbo].[Table1]
(
	/* Inherit the default collation */
	Col1 varchar(20),

	/* Same as the instance collation */
	Col2 varchar(20) COLLATE SQL_Latin1_General_CP1_CI_AS
);
GO

INSERT INTO [dbo].[Table1](Col1, Col2) VALUES (N'a', N'b');
GO

Let's verify Col1's collation:

SELECT
	collation_name
	FROM sys.columns
	WHERE
		(object_id = OBJECT_ID(N'[dbo].[Table1]')) AND
		(name = N'Col1');

Running that query returns Thai_CS_AI, which was inherited from the database collation.

 

Scalar Variables

It turns out that there's no way to get a collation conflict using a scalar variable unless we explicitly use COLLATE on both sides of the comparison expression, like so:

DECLARE @v1 varchar(20) = 'B';

SELECT * FROM [dbo].[Table1]
	WHERE Col2 COLLATE Thai_CI_AI = @v1 COLLATE Thai_CS_AI;

The reason why is because of collation precedence: scalar variables (and string literals, by the way) are always considered to be coercible-default. In other words, the comparison is done using the collation of the field on the other side of the expression:

DECLARE @v1 varchar(20) = 'B';

SELECT * FROM [dbo].[Table1] WHERE Col2 = @v1;

The above query returns a row because the comparison used the case-insensitive SQL_Latin1_General_CP1_CI_AS collation.

It's interesting to note that the collation of a scalar variable can't be explicitly set in its definition; it will always inherit the database collation.

 

Table Variables

Since we know that table variables are actually backed by tempdb, we can now run some tests where the collation of the object doesn't always get clobbered by the comparison field.

Let's try comparing two fields that have the same collation label (implicit):

DECLARE @t table (Col1 varchar(20));

SELECT * FROM [dbo].[Table1] a INNER JOIN @t b ON b.Col1 = a.Col1;

This succeeds. We're comparing a Thai_CS_AI field in the base table with a table variable whose field Col1 has inherited the same collation from the database.

Now let's try two different collations by comparing against Col2 in the base table, and see what happens:

DECLARE @t table (Col1 varchar(20));

SELECT * FROM [dbo].[Table1] a INNER JOIN @t b ON b.Col1 = a.Col2;

Now we get the error, "Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Thai_CS_AI" in the equal to operation." In this case, the two collations are different and at the same precedence level. SQL Server can't simply give one collation precedence over the other, and it throws up its hands in disgust.

 

Temporary Tables

As mentioned in the link in the previous section, there are many similarities and differences between table variables and temporary tables. Let's find out where character field collation fits in. We'll run the exact same tests as in the table variable section, using a temporary table instead.

CREATE TABLE #t (Col1 varchar(20));

SELECT * FROM [dbo].[Table1] a INNER JOIN #t b ON b.Col1 = a.Col1;
GO

DROP TABLE #t;

Now we get a collation conflict error. What? It turns out that temporary tables are different than scalar variables in this respect -- the field collations now inherit from tempdb instead of from the current database. (Edit 2013-02-10: Thanks to Aaron Bertrand for pointing out that temporary tables created in the context of a contained database inherit the collation from the current database, not from tempdb.)

Running the second test now succeeds because both fields have an implicit collation of SQL_Latin1_General_CP1_CI_AS (since that is my instance, and hence tempdb, collation):

CREATE TABLE #t (Col1 varchar(20));

SELECT * FROM [dbo].[Table1] a INNER JOIN #t b ON b.Col1 = a.Col2;
GO

DROP TABLE #t;

 

DATABASE_DEFAULT

Because of this crazy behaviour with temporary tables, a virtual collation DATABASE_DEFAULT was created to say that the field should use the collation of the current database, instead of inheriting from the database where the field was created.

Here's the first query from the previous section again, with the special collation added:

CREATE TABLE #t (Col1 varchar(20) COLLATE DATABASE_DEFAULT);

SELECT * FROM [dbo].[Table1] a INNER JOIN #t b ON b.Col1 = a.Col1;
GO

DROP TABLE #t;

And this succeeds like we would expect.

 

Conclusion

When working with table variables and temporary tables, it's really important to know the similarities and differences between the two. In the case of character field collations, simply switching between the two types of tables (which is often done when doing performance tuning) can potentially have bigger side-effects than were intended!

Even though character fields in table variables always inherit the collation of the current database, I would strongly consider explicitly specifying the DATABASE_DEFAULT collation on those fields anyway (assuming this is appropriate), so that the data/business logic will still be the same if someone decides to go and switch things up when performance tuning.

Comments (1) -

moni

Thanks

Comments are closed