Scale

When an OLTP database goes into production, it’s automatically required to meet certain demands: work correctly and reliably, store data safely and securely, and meet the performance needs of the users.

Depending on what the database is used for, there often comes a time when the business asks us to go back to the data that has accumulated over time, and answer questions to drive the business forward. The business may even want — or need — those answers as soon as new data is entered into the system.

These new demands place an additional workload on the database that may or may not have been anticipated when the system was first designed. Sometimes this workload can consume a significant amount of resources.

No system can be designed from the get-go with perfect future-proofness. Sometimes changing the design of the system is more expensive than creating a new, totally separate system, to answer the business questions. Sometimes the production database already handles a workload near its maximum capacity, and couldn’t maintain the original requirements after adding the new workload, even if the system was perfectly designed.

How could performance suffer with the additional workload?

Memory pressure: reading lots of historical data could potentially bump current data out of memory, causing disk thrashing to keep the required data in memory. This is even more perilous if the existing database is larger than the buffer pool. Using a different SQL instance or physical machine could alleviate this issue.
CPU usage: the workload may need to perform very intensive calculations on a lot of data all at once, thus slowing down other user queries. Business analysis queries may use parallelism, which will increase CPU usage, add contention for worker threads, and increase the probability of blocking and/or deadlocks. Using a different physical machine would alleviate this issue.
Locking, blocking, and deadlocks: if the production database runs at the default isolation level of READ COMMITTED* (or higher) and the business questions require us to use the same level (or higher), there is the potential to create blocks and/or deadlocks, particularly if the existing system is experiencing blocking or deadlocks already. This problem can be solved by using a separate copy of the data; in particular, a read-only database does not even take locks (because there are no writes allowed) and therefore there is no blocking, and no possibility of deadlocks.

The requirements for our project are as follows:

Answer the business questions
Minimize the impact on the production database
The data source should be transactionally consistent (most likely)
The data source should be kept up-to-date automatically (most likely)
The data source should be reconciled to a point-in-time according to the business needs (current as of today, or as of the last hour, etc.)
We don’t need to make changes to data, so a read-only version would be okay

So how do we meet the requirements? A read-only copy of the production database may be the solution. In SQL Server, there are many ways of doing this, depending on our exact needs.

Transaction Log Shipping

Take a transaction log backup; copy the backup to another location; restore the backup. That’s how log shipping works. Log shipping can be configured to restore the backups either in NORECOVERY mode, or STANDBY mode. The latter is what we’re interested in, because it allows us to read the data.
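The backup, copy, restore cycle can be sketched in T-SQL roughly as follows. This is a minimal illustration, not a full log shipping setup: the database name SalesDb and every path are hypothetical, and it assumes the secondary copy was already initialized from a full backup restored without recovery.

```sql
-- On the production instance: back up the transaction log.
-- SalesDb and the share path are hypothetical names for illustration.
BACKUP LOG SalesDb
    TO DISK = N'\\backupshare\logship\SalesDb_1200.trn';

-- Copy step: move the .trn file to the secondary server with any
-- file-copy mechanism (not shown here).

-- On the secondary instance: restore in STANDBY mode so the database
-- stays readable between restores. The undo file (hypothetical path)
-- holds uncommitted changes that are reapplied before the next restore.
RESTORE LOG SalesDb
    FROM DISK = N'D:\logship\SalesDb_1200.trn'
    WITH STANDBY = N'D:\logship\SalesDb_undo.dat';
```

Restoring WITH NORECOVERY instead would keep the copy in sync just as well, but it would leave the database unreadable, which defeats the purpose here.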

Pros:

Very easy to set up, and simple to understand.
Usually we’re already taking transaction log backups, so there’s no additional load on production.
The log backups can be restored on another server.
Although all editions except Express have built-in tools to configure log shipping, it can still be accomplished in Express by rolling our own mechanism. After all, log shipping is just glorified transaction log backup and restore.

Cons:

May require us to change our backup strategy, depending on how frequently the read-only copy needs to be updated.
Users have to be kicked out of the standby database to restore subsequent backups.
Requires enough storage to hold an entire second copy of the production database.

Notes:

The production database has to be using either the FULL or BULK_LOGGED recovery model.
CPU and disk resources are required to restore the backups.
Network resources may be required to copy the backups to a different location.
This gives us an entire copy of the database (this could be a pro or con).

Database Snapshots

This gives us a point-in-time, transactionally consistent copy of the entire database. We would keep the copy we read from up-to-date by dropping the snapshot, and taking a new snapshot at a later time.
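As a rough sketch of the take-and-roll-over cycle, assuming a hypothetical source database SalesDb with a single data file whose logical name is SalesDb_Data (the snapshot name and file path are also made up):

```sql
-- Take a snapshot. One sparse file must be listed for each data file
-- of the source database; all names and paths here are hypothetical.
CREATE DATABASE SalesDb_Snapshot_0800
ON ( NAME = SalesDb_Data,
     FILENAME = N'D:\Snapshots\SalesDb_Data_0800.ss' )
AS SNAPSHOT OF SalesDb;

-- Later, roll over: drop the old snapshot, then take a fresh one
-- with another CREATE DATABASE ... AS SNAPSHOT OF statement.
DROP DATABASE SalesDb_Snapshot_0800;
```

Queries read from the snapshot by using its name like any other database.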

Pros:

Only requires enough storage to hold the changes made to the production database during the rollover interval. This could allow us to take and keep multiple snapshots at different times and roll them over as a window, without taking up N times the storage of the original database.
Very little overhead to take a snapshot (therefore, it’s also fast).
All database recovery models are supported.

Cons:

Introduces extra disk write workload depending on how many changes are made to the production database.
Can only be created within the same instance of SQL Server as the production database.
Requires Enterprise Edition or higher.

Notes:

Requires that the snapshot files be placed on an NTFS volume.
Each file of the source database has a corresponding snapshot file. Try using my sp_TakeSnapshot stored procedure as a base to automate the process of taking a snapshot.
This gives us an entire copy of the database.

Database Mirroring

As of SQL Server 2012, database mirroring is deprecated, but I’m going to mention it here anyway, because the proposed alternative still… has its issues. Database mirroring sends transaction log records over the wire to another instance of SQL Server, where they get replayed on the copy of the database. While the secondary database is not directly readable, database snapshots taken against it are. So, the differences between this and snapshots are:

Pros:

Mirroring is usually used for high availability, so this could leverage existing infrastructure that would otherwise be underutilized.
The copy of the database can (in fact, must) be in a different instance (same machine, or another machine).
There’s nothing to mess around with in the file system; instead, communication is done through endpoints.

Cons:

The production database must be using the FULL recovery model.
Requires enough storage for an entire second copy of the database (plus the snapshots).
While database mirroring is available in synchronous mode in Standard Edition and higher, we need snapshots for the purpose of this discussion, which require Enterprise Edition. Enterprise also allows for asynchronous mirroring mode.

Notes:

The secondary has to do work to restore the log records, including operations such as index maintenance.
Network throughput may be a concern; it’s usually better amortized over time than, e.g., log shipping.
Likely will need to poke holes in the servers’ firewalls to allow communication through.
This gives us an entire copy of the database.

SQL Server Replication

Replication is a system built to sit on top of our database that reads committed transactions, converts them to a format any database system can understand, copies the changes to a subscriber, and applies those changes to another database. Unlike the other methods I’ve covered so far, replication doesn’t actually give us a physically read-only copy of the database. It does, however, give us a copy of the data which we can read. There are several different types of replication (snapshot, merge, and transactional), and for the purposes of this discussion I’m going to lump them all together into a single category.

Pros:

Can create a copy of a subset of our data, including vertical and/or horizontal partitioning of individual tables. Replication is the only SQL Server technology that lets us do this. This can be a huge pro for security of data and storage space, because we can publish only the objects we need to read later.
Express Edition can subscribe to all types of publications.
Standard Edition can publish all types of publications. **
Changes are applied to the subscriber without the need to kick people out. (Warning: this means we need to be mindful of the transaction isolation level we use when running queries.)
All database recovery models are supported.

Cons:

Requires configuration, storage, and management of an additional database (the distribution database).
Can be difficult to understand, configure, secure, and administer. (Don’t let this short list of cons fool you.)

Depending on how replication is configured and what we need to accomplish, database schema changes may not propagate to the subscriber. Also, some operations such as index maintenance don’t get applied at subscribers, which may or may not be desirable.

Availability Groups

This is SQL Server’s newest offering, and it’s a hybrid of database mirroring and SQL Server failover clustering all rolled into one feature. (This is the reason why database mirroring on its own is deprecated.)

Pros:

Allows scale-out of reads to multiple secondaries without the use of database snapshots.
Like database mirroring, the read scale-out can also be part of a high availability strategy.

Cons:

The production database must be using the FULL recovery model.
Requires Enterprise Edition.
Must run on a Windows Server failover cluster (WSFC).

Out of all the technologies I’ve listed, this one is probably the least likely to be a good solution to solve the problem of having a read-only copy of the database (and that’s it), unless existing infrastructure is in place for other reasons.
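If that infrastructure does already exist, opening a secondary for reads is a small change. A minimal sketch, assuming a hypothetical availability group named SalesAG with a secondary replica hosted on SQLNODE2 (both names invented for illustration); it would be run on the primary replica:

```sql
-- Allow read-intent connections to the secondary replica.
-- SalesAG and SQLNODE2 are hypothetical names.
ALTER AVAILABILITY GROUP [SalesAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
```

With READ_ONLY, clients have to specify ApplicationIntent=ReadOnly in their connection string to be allowed onto the secondary.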

So there we have it: 5 different technologies built into SQL Server to give us a read-only copy of our data. In addition to those, there are various storage-level technologies, including replication and snapshots, that can be used for the same purpose. Even if those technologies aren’t suitable for your project now, it’s worth asking your storage administrator to find out which options are readily available in the event you do need to use them in the future.

Another option is to enable one of the snapshot isolation levels, where only writers block writers. This doesn’t provide a copy of the database, and it has performance implications, but it can be a very easy way out in some circumstances.
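Both row-versioning options are database-level settings. A short sketch, using a hypothetical database named SalesDb:

```sql
-- Option 1: allow SNAPSHOT isolation, which sessions must request.
-- SalesDb is a hypothetical database name.
ALTER DATABASE SalesDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- A reporting session then opts in before running its queries:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

-- Option 2: make the default READ COMMITTED level use row versioning,
-- so existing queries benefit without code changes. This statement
-- generally needs to run while no other connections are active
-- in the database.
ALTER DATABASE SalesDb SET READ_COMMITTED_SNAPSHOT ON;
```

Either way, row versions are kept in tempdb, which is where much of the performance cost comes from.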

* READ COMMITTED is the default isolation level on the user-installable versions of SQL Server, while READ COMMITTED SNAPSHOT is the default for Azure databases (aka Windows Azure SQL Database).

** For completeness, Enterprise Edition is required for a peer-to-peer topology, but this is not a suitable technology to only solve the “read-only copy of our data” problem.

No one wants to constantly monitor and maintain their server farm 24 hours a day, 7 days a week.

In the face of a growing business with a growing number of servers and a growing number of databases to manage, it can be easy to get overwhelmed at times, especially when the business grows fast. This is why scalable management techniques are so important, even if you only have a single server to manage right now.

What would your management techniques look like if you suddenly had to manage 10 times as many servers as you do now?

Trick question. You should already be managing your servers that way.

But how does one accomplish that when it’s so intangible right now?

Ultimately, there is only one concept to get a handle on: consistency. The other, meaner, uglier, eviler side of that coin is exception, which is something to be avoided like the plague. Interestingly, to explain the concept of consistency, it’s much easier to cite cases of exception; to explain what consistency isn’t. You might even call it… inconsistency. Hmmm.

Do you have a database named Production1 that is backed by files named Testing1.mdf and Testing1.ldf? Is it actually a testing database?
Are your company’s server or instance names picked by randomly selecting words from the dictionary?
Do you have an instance that contains development, testing and production databases, or any combination of more than one of those? (“…that was the testing database, right?”)

“Wait a second,” I hear you say, “that isn’t hard to manage. In fact, it’s easy to remember.” Famous last words.

…well, maybe not famous.

…and maybe not last.

My point is that remembering is precisely the problem.

Can you keep track of a single inconsistency for each of 10 databases? Alright, sure, no problem. How about 100 databases? Yeah, it’ll suck, but I can do it. How about 1,000? Uhhh… How about 10,000? No way! — and please don’t tell my manager.

Clearly, your own memory does not scale as well as technology does.

“Aha! I know, I’ll use automation!”

Now we’re on the right track. But automation is not a silver bullet. You can try to automate your way out of an inconsistent situation (without changing the situation itself). What you’ll end up doing, though, is spending most of your time maintaining the automation system. It’s trading an intense direct effort, for an intense indirect effort with minimal scale, plus a whole lot of opportunity for mistakes, downtime, and even data loss.

If you have a database to manage granular settings of your databases, you’re (probably) doing it wrong.

In the context of database management, automation is really only useful in two ways:

Making many things similar. (Set all user databases in an instance to use the FULL recovery model.)
Doing something to many similar things. (Take transaction log backups every 15 minutes for all databases using the FULL recovery model.)

That may seem really simple, but it’s also really powerful. This is the key to scale.
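The first kind of automation ("making many things similar") might look something like the sketch below. It is only an illustration under a few assumptions: user databases are taken to be those with database_id greater than 4, and the generated statements are printed for review rather than executed.

```sql
-- Generate an ALTER DATABASE statement for every online, writable
-- user database that is not already using the FULL recovery model.
DECLARE @name sysname, @sql nvarchar(max);

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE database_id > 4                  -- assumption: skips master, tempdb, model, msdb
      AND recovery_model_desc <> N'FULL'
      AND state_desc = N'ONLINE'
      AND is_read_only = 0;

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'ALTER DATABASE ' + QUOTENAME(@name) + N' SET RECOVERY FULL;';
    PRINT @sql;                            -- review the statement first
    -- EXEC sys.sp_executesql @sql;        -- uncomment to apply the change
    FETCH NEXT FROM db_cursor INTO @name;
END;

CLOSE db_cursor;
DEALLOCATE db_cursor;
```

Notice that the script contains no list of database names. It only works because "user database" means the same thing everywhere, which is exactly the consistency argument.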

The number of production databases is the sum of:

SELECT COUNT(*) - 4 FROM sys.databases -- 4 = master, tempdb, model, msdb

over all of your production servers, not “Umm… give me a few minutes.” (Hint: growing more fingers and toes for counting does not scale well.)

Your ability to keep everything consistent directly reflects and affects the extent to which you can leverage automation to manage your servers, and therefore not have to scale yourself.

Eliminate barriers that prevent consistency.
Create consistency.
Create systems to enforce the consistency you worked so hard to achieve.
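As one small example of what enforcing consistency could look like, the query below flags the "Production1 backed by Testing1.mdf" case from earlier. It assumes a naming convention that is my own for this sketch, namely that every physical file name starts with the name of its database; substitute whatever convention you actually settle on.

```sql
-- List files of user databases whose path does not contain
-- "\<database name>" (assumed convention: file names start with
-- the name of the database they belong to).
SELECT d.name          AS database_name,
       mf.name         AS logical_file_name,
       mf.physical_name
FROM sys.master_files AS mf
JOIN sys.databases AS d
    ON d.database_id = mf.database_id
WHERE d.database_id > 4                    -- skip the system databases
  AND CHARINDEX(N'\' + d.name, mf.physical_name) = 0;
```

An empty result set means the convention holds on that instance; any rows returned are exceptions to go and eliminate.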

In a subsequent post, I’ll introduce and explain some of the tools and techniques you can use to accomplish these steps in SQL Server.

Voluntary DBA

Proactive Database Administration
