Last week, Brent Ozar told us to stop worrying about index fragmentation and get on with our lives.
He makes good points, particularly about taking full advantage of the ridiculously low price of memory right now. This makes external fragmentation a thing of the past, while the server is running. SSDs make fragmentation a thing of the past, full stop, so if you’re running those, you can abandon this article now. (See you in the next paragraph!)
I want to bring up an important point that wasn’t mentioned in the article: if your SQL Server instance is turned off, deliberately or otherwise, all caches, including the buffer pool are now cold, and have to be reconstituted from somewhere — that somewhere is from disk.
This is crunch time: the data needs to be read from disk as fast as possible into the buffer pool so it can be served directly from memory the next time it’s requested. This could be the most critical period of time if you have performance SLAs to meet, because as Brent mentioned, random read performance can be orders of magnitude slower than sequential reads1. Putting in place an intelligent index maintenance process helps the system perform more sequential reads, and thus improve performance. The longer the reads take, the more your SLAs are under pressure, or perhaps broken entirely.
That brings up another important point: if you’re ever negotiating performance SLAs, make sure to take cold starts into account during the discussion. The business may be okay with relaxed SLAs in the 10 minutes following a restart, for example. But maybe they aren’t okay with that: index maintenance could play a part in meeting the business requirements without investing in more expensive technologies. In any event, if you’re going to agree to something, this is definitely one aspect you want to put in writing. Moreover, the process should be tested — now, before you agree to anything, and periodically in the future — to make sure that you can keep your word, and give you a proactive look at how long you have until the SLA can no longer be met.
The last thing you want during a failover is a flurry of heated questions from a CIO standing over your shoulder, but if you have a written document you can point to that says “this is what was agreed,” and that’s how the system is performing, that’s the end of the discussion right there.
1 The transfer rate cited — under 2 MB/s random reads — is a somewhat misleading example because no one would set up shared storage with a 4 KB stripe size. At least I hope they wouldn’t… A more typical stripe size is 64 KB, which would yield a higher transfer rate than the cited number, but would still be much slower than sequential reads.