Scaling Server Management - Theory

No one wants to constantly monitor and maintain their server farm 24 hours a day, 7 days a week.

In the face of a growing business with a growing number of servers and a growing number of databases to manage, it can be easy to get overwhelmed at times, especially when the business grows fast. This is why scalable management techniques are so important, even if you only have a single server to manage right now.

What would your management techniques look like if you suddenly had to manage 10 times as many servers than you do now?

Trick question. You should already be managing your servers that way.

But how does one accomplish that when it’s so intangible right now?

Ultimately, there is only one concept to get handled: consistency. The other, meaner, uglier, eviler side of that coin is exception, which is something to be avoided like the plague. Interestingly, to explain the concept of consistency, it’s much easier to cite cases of exception; to explain what consistency isn’t. You might even call it... inconsistency. Hmmm.

  • Do you have a database named Production1 that is backed by files named Testing1.mdf and Testing1.ldf? Is it actually a testing database?
  • Are your companys’ server or instance names picked by randomly selecting words from the dictionary?
  • Do you have an instance that contains development, testing and production databases, or any combination of more than one of those? (“...that was the testing database, right?”)

"Wait a second," I hear you say, "that isn’t hard to manage. In fact, it’s easy to remember." Famous last words.

...well, maybe not famous.

...and maybe not last.

My point is that remembering is precisely the problem.

Can you keep track of a single inconsistency for each of 10 databases? Alright, sure, no problem. How about 100 databases? Yeah, it’ll suck, but I can do it. How about 1,000? Uhhh... How about 10,000? No way! -- and please don’t tell my manager.

Clearly, your own memory does not scale as well as technology does.

"Aha! I know, I’ll use automation!"

Now we’re on the right track. But automation is not a silver bullet. You can try to automate your way out of an inconsistent situation (without changing the situation itself). What you’ll end up doing, though, is spending most of your time maintaining the automation system. It’s trading an intense direct effort, for an intense indirect effort with minimal scale, plus a whole lot of opportunity for mistakes, downtime, and even data loss.

If you have a database to manage granular settings of your databases, you’re (probably) doing it wrong.

In the context of database management, automation is really only useful in two ways:

  • Making many things similar. (Set all user databases in an instance to use the FULL recovery model.)
  • Doing something to many similar things. (Take transaction log backups every 15 minutes for all databases using the FULL recovery model.)

That may seem really simple, but it’s also really powerful. This is the key to scale.

The number of production databases is the sum of:

SELECT COUNT(*) - 4 FROM sys.databases

over all of your production servers, not "Umm... give me a few minutes." (Hint: growing more fingers and toes for counting does not scale well.)

Your ability to keep everything consistent directly reflects and affects the extent to which you can leverage automation to manage your servers, and therefore not have to scale yourself.

  1. Eliminate barriers that prevent consistency.
  2. Create consistency.
  3. Create systems to enforce the consistency you worked so hard to achieve.

In a subsequent post, I’ll introduce and explain some of the tools and techniques you can use to accomplish these steps in SQL Server.