When designing new database structures, I feel it’s really important to make sure that data goes into the new tables as cleanly as possible. Doing this early on helps prevent bugs from creeping into the system as it’s developed, and preventing bugs as early as possible in the development cycle is almost certainly the most cost-efficient strategy: cleaning up bad data later on is extremely expensive, time-consuming, and frustrating.
One of the tools that can be used to reject bad data values going into tables is the CHECK constraint. This type of constraint validates a single row based on a predicate that can be a function of any combination of the table's columns in that row. (CHECK constraints can be used for things beyond this definition… but that doesn’t mean they should be.)
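As a quick illustration of what that rejection looks like in practice (this is just a hypothetical table, not one of the test tables used later), a CHECK constraint causes a violating INSERT to fail outright:
-- Hypothetical example table; the real test tables appear in The Setup below.
CREATE TABLE [dbo].[ProductsExample]
(
    Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
    Price decimal(10, 2) NOT NULL CHECK (Price >= 0)
);

INSERT INTO [dbo].[ProductsExample](Price) VALUES (9.99);  -- succeeds
INSERT INTO [dbo].[ProductsExample](Price) VALUES (-1.00); -- fails: the INSERT conflicts with the CHECK constraint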
In this post, I’m not going to address any more of the advantages of constraining data values. What I do want to investigate is performance.
The extra logic to do this validation necessarily has additional processing cost associated with it. The question is: how much extra? Does it really make a difference? What kind of general guidelines can be established for performance purposes?
The Setup
I decided to run two series of tests: one with a very narrow table (3 columns), and one with a wide table (21 columns). The test INSERTs a bunch of data into each of the tables. First, a baseline is established for each table with no constraints; then different constraints are added, and hopefully we’ll be able to see a difference in the load times.
Here is an abridged version of the table creation script (note: all scripts are available to download in full at the bottom of this post):
-- Narrow, no constraints
CREATE TABLE [dbo].[NarrowWithoutConstraints]
(
Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Col1 int NOT NULL,
Col2 varchar(10) NOT NULL
);
-- Wide, no constraints
CREATE TABLE [dbo].[WideWithoutConstraints]
(
Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Col10 int NOT NULL,
...
Col19 int NOT NULL,
Col20 varchar(10) NOT NULL,
...
Col29 varchar(10) NOT NULL
);
-- Narrow, simple constraints
CREATE TABLE [dbo].[NarrowWithConstraints1]
(
Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Col1 int NOT NULL CHECK (Col1 >= 0),
Col2 varchar(10) NOT NULL CHECK (LEN(Col2) > 0)
);
-- Wide, simple constraints
CREATE TABLE [dbo].[WideWithConstraints1]
(
Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Col10 int NOT NULL CHECK (Col10 >= 0),
...
Col19 int NOT NULL CHECK (Col19 >= 0),
Col20 varchar(10) NOT NULL CHECK (LEN(Col20) > 0),
...
Col29 varchar(10) NOT NULL CHECK (LEN(Col29) > 0)
);
-- Narrow, complex constraints
CREATE TABLE [dbo].[NarrowWithConstraints2]
(
Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Col1 int NOT NULL
CHECK ((Col1 >= 0) AND (Col1 > -9999) AND (Col1 < 5000) AND (Col1 < 9999)),
Col2 varchar(10) NOT NULL
CHECK ((LEN(Col2) > 0) AND (DATALENGTH(Col2) = DATALENGTH(LTRIM(RTRIM(Col2)))))
);
-- Wide, complex constraints
CREATE TABLE [dbo].[WideWithConstraints2]
(
Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Col10 int NOT NULL
CHECK ((Col10 >= 0) AND (Col10 > -9999) AND (Col10 < 5000) AND (Col10 < 9999)),
...
Col19 int NOT NULL
CHECK ((Col19 >= 0) AND (Col19 > -9999) AND (Col19 < 5000) AND (Col19 < 9999)),
Col20 varchar(10) NOT NULL
CHECK ((LEN(Col20) > 0) AND (DATALENGTH(Col20) = DATALENGTH(LTRIM(RTRIM(Col20))))),
...
Col29 varchar(10) NOT NULL
CHECK ((LEN(Col29) > 0) AND (DATALENGTH(Col29) = DATALENGTH(LTRIM(RTRIM(Col29)))))
);
-- Narrow, many complex constraints
CREATE TABLE [dbo].[NarrowWithConstraints3]
(
Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Col1 int NOT NULL
CHECK ((Col1 >= 0) AND (Col1 > -9999) AND (Col1 < 5000) AND (Col1 < 9999)),
Col2 varchar(10) NOT NULL
CHECK ((LEN(Col2) > 0) AND (DATALENGTH(Col2) = DATALENGTH(LTRIM(RTRIM(Col2)))))
);
ALTER TABLE [dbo].[NarrowWithConstraints3]
ADD CHECK ((Col1 > -1) AND (Col1 > -9998) AND (Col1 < 5001) AND (Col1 < 9998));
...
ALTER TABLE [dbo].[NarrowWithConstraints3]
ADD CHECK ((Col1 > -9) AND (Col1 > -9990) AND (Col1 < 5009) AND (Col1 < 9990));
ALTER TABLE [dbo].[NarrowWithConstraints3]
ADD CHECK ((LEN(Col2) > -1) AND ((DATALENGTH(Col2) + 1) = (DATALENGTH(LTRIM(RTRIM(Col2))) + 1)));
...
ALTER TABLE [dbo].[NarrowWithConstraints3]
ADD CHECK ((LEN(Col2) > -9) AND ((DATALENGTH(Col2) + 9) = (DATALENGTH(LTRIM(RTRIM(Col2))) + 9)));
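Since the ALTER TABLE ... ADD CHECK statements don’t name their constraints, a quick way to sanity-check how many constraints each table actually ended up with is to query the catalog views. This is just a helper sketch, not part of the downloadable scripts:
-- List the CHECK constraints (and their definitions) on each of the test tables
SELECT
    OBJECT_NAME(cc.parent_object_id) AS TableName,
    cc.name AS ConstraintName,
    cc.definition
FROM sys.check_constraints AS cc
WHERE OBJECT_NAME(cc.parent_object_id) LIKE N'%WithConstraints%'
ORDER BY TableName, ConstraintName;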
The Test
Testing these scenarios is pretty simple: shove a bunch of data into a table, and time how long it takes. One of the test harnesses (the narrow-table version) is shown below:
DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @i int = 1250;
SET NOCOUNT ON;
TRUNCATE TABLE [dbo].[<table>];
DBCC CHECKIDENT('dbo.<table>', RESEED, 1);
SET @startTime = SYSDATETIME();
WHILE @i > 0
BEGIN
INSERT INTO [dbo].[<table>](Col1, Col2)
SELECT
number,
CAST(number AS varchar(10))
FROM master..spt_values v
WHERE v.type = 'P'
SET @i -= 1;
END
SET @endTime = SYSDATETIME();
SELECT DATEDIFF(MILLISECOND, @startTime, @endTime);
So that’s pretty standard. I ran the narrow table test with @i = 1250 and the wide table test with @i = 750, which resulted in 2,560,000 and 1,536,000 rows inserted, respectively. For each table, I ran the test harness 11 times; the first result was discarded, and the remaining ten were recorded and averaged. Note that the table is TRUNCATEd and the identity seed reset before each run, and that the database was in SIMPLE recovery.
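For reference, putting the test database into the SIMPLE recovery model is a one-liner (TestDB is just a placeholder here; substitute whatever database you run the tests in):
-- Switch the test database to the SIMPLE recovery model before running the harness
ALTER DATABASE [TestDB] SET RECOVERY SIMPLE;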
The Results
My hypothesis was that the constraints (at least the simple ones) would add a little bit of overhead, but I wasn’t sure if it would be measurable, and it was unclear how much (if anything) SQL Server would do to simplify the expressions I used (turns out it didn’t simplify them at all).
| | Average (ms) | ms/1000 Rows | % Increase |
|---|---|---|---|
| Narrow (no constraints) | 12,401 | 0.484 | – |
| Narrow (simple constraints) | 13,034 | 0.509 | 5.1% |
| Narrow (complex constraints) | 14,505 | 0.567 | 17.0% |
| Narrow (many complex constraints) | 25,101 | 0.981 | 102.4% |
There are a couple of very interesting things here. First, the difference between no constraints and a small number of simple constraints is measurable. Second (and this is what really surprised me), a complex constraint is significantly more expensive than a simple one. In hindsight this makes sense: function calls are always going to be more expensive than a basic comparison. The difference is probably magnified here because the work to do the actual INSERT is small relative to the total work being done. That said, it appears that many complex constraints could at the very least hobble an INSERT-only workload.
| | Average (ms) | ms/1000 Rows | % Increase |
|---|---|---|---|
| Wide (no constraints) | 12,477 | 0.812 | – |
| Wide (simple constraints) | 14,647 | 0.954 | 17.4% |
| Wide (complex constraints) | 20,238 | 1.318 | 62.2% |
In this test, we would expect the differences to be larger simply because more constraints are being evaluated per row. As a sanity check, the ratio of the complex increase to the simple increase is roughly the same for the wide table (62.2% / 17.4% = 3.6) as for the narrow table (17.0% / 5.1% = 3.3).
If we compare the narrow table with many complex constraints to the wide table with complex constraints, they have the same number and type of constraints, and the ms/1000 rows figures increase by about the same amount (0.981 – 0.484 = 0.497) vs (1.318 – 0.812 = 0.506), respectively. This suggests that evaluating constraints has a fixed cost, independent of the table schema.
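If you want to see for yourself exactly which predicates get evaluated at INSERT time (and confirm that, as mentioned earlier, the expressions aren’t simplified), one way is to capture the estimated plan for a single-row INSERT and look at the predicate on the Assert operator. A rough sketch:
-- Capture the estimated plan XML for a single-row INSERT; the Assert operator's
-- predicate shows the CHECK expressions as they will be evaluated.
SET SHOWPLAN_XML ON;
GO
INSERT INTO [dbo].[NarrowWithConstraints2](Col1, Col2) VALUES (1, 'x');
GO
SET SHOWPLAN_XML OFF;
GO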
Conclusion
I certainly learned a lot from doing this experiment. There are a few takeaways from the results:
- Disabling CHECK constraints will reduce the load time, perhaps significantly (see the sketch after this list).
- A CHECK constraint’s overhead is determined by its evaluation complexity (comparisons good; function calls not so much).
- Evaluating CHECK constraints has a fixed cost, and thus a smaller relative cost when the other parts of the process are more expensive.
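For completeness (this wasn’t part of the test itself), disabling and re-enabling a table’s constraints around a bulk load might look something like the sketch below. Re-enabling WITH CHECK makes SQL Server re-validate the existing rows, so the constraints stay trusted:
-- Disable all CHECK constraints on the table before the load
ALTER TABLE [dbo].[NarrowWithConstraints1] NOCHECK CONSTRAINT ALL;

-- ... bulk load here ...

-- Re-enable the constraints; WITH CHECK re-validates existing rows so they remain trusted
ALTER TABLE [dbo].[NarrowWithConstraints1] WITH CHECK CHECK CONSTRAINT ALL;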
While these aren’t Earth-shattering conclusions, we did get to validate the best practice of disabling constraints for loading, and to see just how much overhead CHECK constraints can add.
What I want to stress here is that this test was for a relatively narrow situation involving only INSERTs. If your workload is 100% INSERTs, then you definitely need to pay attention to these kinds of things, but if your system isn’t INSERTing 200k rows/sec — or even anywhere near that — it’s likely that the benefits of the constraints will outweigh the performance penalty, as the overhead will be a small portion of the overall workload.
Full scripts and test results:
01 – Create Tables.sql
02 – Narrow Table Test Harness.sql
03 – Wide Table Test Harness.sql
Test Results.xlsx