What changed in my database last week?

PANIC! A customer clicked through four layers of warning messages and accidentally deleted a bunch of data from the database, but didn’t bother to tell you about it until today.

Great. The database is in FULL or BULK_LOGGED recovery, and we have a full set of transaction log backups that contain all the transactions in the database for the time when things “happened.” Okay… now what? Log backups seem very opaque, as we can’t just open them up in Notepad and expect things to be human-readable.

Enter the undocumented table-valued function: fn_dump_dblog.

This function reads a transaction log backup file and returns a human-readable (well, geek-readable) description of the physical log records it contains.

(The sister function fn_dblog does the same sort of thing, except it operates only on the active transaction log of a database, not a transaction log backup.)
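For example, taking a quick peek at the active log of the current database is as simple as the following; the columns shown are the same ones the script below relies on:

SELECT [Current LSN], Operation, Context, [Transaction ID], [Begin Time]
	FROM fn_dblog(NULL, NULL);	-- NULL, NULL = the entire active log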

Paul Randal wrote a detailed blog entry on the basics of how to use both of these functions, so I won’t cover that here. Where this post differs is in consuming the output of the function so it’s much more usable.

The first step is to read the entire backup and dump the output into a temporary table. This will make querying the same set of transactions (or more transactions if you load additional backups into the table) much faster, as the log reading part of things is rather slow.

-- On the first load, create the temp table directly from the backup;
-- to append additional backups to the same table later, comment out the
-- SELECT ... INTO line and use the INSERT INTO line instead.
--INSERT INTO #transactions SELECT *
SELECT * INTO #transactions
	FROM
		fn_dump_dblog
		(
			NULL,	-- Start
			NULL,	-- End
			'DISK',	-- Device Type
			1,		-- File number in backup
			'',		-- Backup file (full path to the .trn file)
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL
		);

ALTER TABLE #transactions
	ADD
		StartDate datetime NULL,
		EndDate datetime NULL;

UPDATE #transactions
	SET
		StartDate = CAST([Begin Time] AS datetime),
		EndDate = CAST([End Time] AS datetime);

ALTER TABLE #transactions DROP COLUMN [Begin Time];
ALTER TABLE #transactions DROP COLUMN [End Time];

Now that we have the transactions available for querying more readily, let's pull out what we need to see in an easy-to-consume format. This works best if you've restored a copy of the database in STANDBY mode to a point in time before the time of interest. If the script is run in the context of that database, it will show you the names of the tables affected, the SID of the login that made each change, and also proactively return a DBCC PAGE command for when you want to look at the raw data values. This makes it really easy to inch through the transaction log to figure out what changed using out-of-the-box tools. (Yes, there are 3rd-party tools that do this, too.)
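If you haven't done a point-in-time STANDBY restore before, the overall shape is roughly this (the database name, file paths, and STOPAT time are placeholders for your own values):

RESTORE DATABASE CustomerDb_Standby
	FROM DISK = N'X:\Backups\CustomerDb_Full.bak'
	WITH
		MOVE N'CustomerDb' TO N'X:\Data\CustomerDb_Standby.mdf',
		MOVE N'CustomerDb_log' TO N'X:\Data\CustomerDb_Standby_log.ldf',
		STANDBY = N'X:\Backups\CustomerDb_Standby_undo.dat';

RESTORE LOG CustomerDb_Standby
	FROM DISK = N'X:\Backups\CustomerDb_Log_1.trn'
	WITH
		STANDBY = N'X:\Backups\CustomerDb_Standby_undo.dat',
		STOPAT = N'2015-01-01T09:00:00';	-- just before the time of interest

With the standby copy in place (and this script run in its context), here's the query itself: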

DECLARE @startDate datetime = NULL;
DECLARE @endDate datetime = NULL;
DECLARE @minLSN varchar(22) = NULL; /* '00000000:00000000:0000' */
DECLARE @maxLSN varchar(22) = NULL; /* '00000000:00000000:0000' */

SELECT
	a.TransactionId,
	a.Seq,
	a.LSN,
	a.SID,
	a.StartDate AS TransactionStartDate,
	a.EndDate AS TransactionEndDate,
	a.Operation,
	a.TableName,
	a.FileNumber,
	a.PageNumber,
	a.SlotId,
	(
		CASE WHEN a.FileNumber IS NOT NULL THEN
			'DBCC PAGE (''' + DB_NAME() + N''', ' + CAST(a.FileNumber AS varchar(MAX)) + ', ' + CAST(a.PageNumber AS varchar(MAX)) + ', 3) WITH TABLERESULTS'
		END
	) AS DBCCPageCommand
	FROM
	(
		SELECT
			UPPER(t.[Transaction ID]) AS TransactionId,
			ROW_NUMBER() OVER(PARTITION BY t.[Transaction ID] ORDER BY t.[Current LSN]) AS Seq,
			UPPER(t.[Current LSN]) AS LSN,
			bt.StartDate,
			ct.EndDate,
			t.Operation,
			CAST(CONVERT(varbinary, UPPER(LEFT(t.[Page ID], 4)), 2) AS int) AS FileNumber,
			CAST(CONVERT(varbinary, UPPER(RIGHT(t.[Page ID], 8)), 2) AS int) AS PageNumber,
			t.[Slot ID] AS SlotId,
			o.name AS TableName,
			bt.[Transaction SID] AS SID
			FROM #transactions t
			LEFT OUTER JOIN #transactions bt ON ((bt.[Transaction ID] = t.[Transaction ID]) AND (bt.Operation = 'LOP_BEGIN_XACT'))
			LEFT OUTER JOIN #transactions ct ON ((ct.[Transaction ID] = t.[Transaction ID]) AND (ct.Operation = 'LOP_COMMIT_XACT'))
			LEFT OUTER JOIN
			(
				sys.partitions p
				INNER JOIN sys.objects o ON o.object_id = p.object_id
			) ON p.partition_id = t.PartitionId
			WHERE
				(t.Context IN ('LCX_CLUSTERED', 'LCX_HEAP')) AND
				(t.[Transaction ID] != N'0000:00000000') AND
				((@startDate IS NULL) OR (t.StartDate IS NULL) OR (t.StartDate >= @startDate)) AND
				((@endDate IS NULL) OR (t.EndDate IS NULL) OR (t.EndDate <= @endDate)) AND
				((@minLSN IS NULL) OR (t.[Current LSN] >= @minLSN)) AND
				((@maxLSN IS NULL) OR (t.[Current LSN] <= @maxLSN))
	) a
	ORDER BY
		a.StartDate,
		a.EndDate,
		a.TransactionId,
		a.LSN;

If you feel like playing around, there are many more fields that come back from the function; I’ve chosen to output the set of columns that I find most useful when I need to use this script.

Once you've identified when the change occurred, you can run a data comparison tool between the STANDBY database and the current database (or a STANDBY copy from immediately after the change).
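If you don't have a comparison tool handy and only a table or two are involved, a pair of EXCEPT queries will do in a pinch (the database and table names here are placeholders):

-- Rows present in the standby (before) copy but missing or changed in the current database
SELECT * FROM CustomerDb_Standby.dbo.Orders
EXCEPT
SELECT * FROM CustomerDb.dbo.Orders;

-- Rows present now that weren't in the standby copy
SELECT * FROM CustomerDb.dbo.Orders
EXCEPT
SELECT * FROM CustomerDb_Standby.dbo.Orders;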

A copy of the full script can be downloaded here.

Managing the Size of the Transaction Log

When we manage databases in either the FULL or BULK_LOGGED recovery models, we need to pay particular attention to the size of the transaction log files. If our processes aren’t optimal, we can see log files grow either out of control, or beyond what we think is a reasonable size.

Virtual Log Files

As I mentioned in a previous post, the transaction log contains an ordered sequence of the physical operations that occur on a database.

What I didn’t mention is that the physical transaction log file also contains logical sections, called virtual log files (VLFs). It’s the VLFs which actually contain the physical operations I mentioned. The purpose of VLFs is to allow SQL Server to more efficiently manage the log records; specifically, to know which portions of the transaction log are used, and which aren’t.

Knowing the portions of the log that are used is important when we go to take a transaction log backup, which creates a copy of all the transaction log records since the last transaction log backup. After a transaction log backup, as long as all the log records within a VLF are not part of an active transaction (or are required for some other feature, such as replication), the VLF can be marked as unused/inactive. This allows SQL Server to reuse that portion of the log file. This process goes by many names, including clearing the log and log truncation. It does not affect the physical size of the log file.

Problems only start happening if all the VLFs end up being used, and none are available [1]. This means the log file (the physical file) has to grow to accommodate more transactions. A physical log file growth automatically creates new unused VLFs, and so the transactions can continue.
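If you want to see the VLFs for yourself, how many there are, and how many are currently in use, there are a couple of out-of-the-box options (DBCC LOGINFO works on any version; sys.dm_db_log_info is available on SQL Server 2016 SP2 and later):

-- One row per VLF for the current database; Status = 2 means the VLF is active (in use)
DBCC LOGINFO;

-- Newer equivalent; vlf_active = 1 means the VLF is in use
SELECT file_id, vlf_size_mb, vlf_active, vlf_status
	FROM sys.dm_db_log_info(DB_ID());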

What is causing the log to grow after I shrank it?

Any write operation in the database can potentially cause the log file to grow. The real question is: why are all the VLFs used up when that’s not what is expected?

Here are some of the problems you might encounter:

  • Not taking transaction log backups frequently enough, or not taking them at all. The only way to mark VLFs as inactive is to take a transaction log backup (again, only for FULL and BULK_LOGGED databases). If transaction log backups are only taken once a day, besides exposing yourself to a full day of data loss, you also need a transaction log file large enough to hold all the transactions that occur in a full day. That could be quite a bit of space! The solution is obvious, and it’s a win for recoverability, too: take transaction log backups more frequently (a minimal example follows this list).
  • Overly aggressive and/or too frequent index maintenance. I’m certainly a proponent of index maintenance, but it’s very easy to get carried away. Maintenance plans that rebuild shiny new indexes for you every night are both ridiculously easy to set up, and also ridiculously bad for the size of the transaction log. Typically, rebuilding all the indexes of a database takes more log space than the data files actually take up. If you’re dealing with databases of even small-to-moderate sizes (say, 1 GB and larger), this can add up to a tremendous amount of wasted storage space, because all that transaction log has to be backed up at some point, and the log file will likely end up larger than the data files. What I strongly recommend doing is putting in place a much more selective index maintenance process, such as Ola Hallengren’s index maintenance scripts, which I use in production.
  • The database is in FULL or BULK_LOGGED when it actually should be in SIMPLE. Before you go flipping the settings around (which can potentially cause data loss), go read about choosing the right recovery model for the database. Typically, I see this happen when a production backup is restored to a development environment, and a backup gets taken for some reason. And then the transaction log ends up growing, and growing, and growing as developers test and roll back their work, right up until things screech to a halt when the drive that contains the log file has filled up.
  • A SQL Server feature, such as replication or mirroring, is enabled. Until the transactions have been copied to the secondary server, the records cannot be cleared from the transaction log. If the secondary is slow applying changes, or if you don’t even have a secondary any more and things weren’t cleaned up correctly, this could lead to infinite transaction log growth. To solve this really depends on where the specific problem is — solve that first, and then see if there is still an issue with the transaction log.
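As mentioned in the first bullet, the log backup statement itself couldn't be simpler; picking a schedule that matches your data-loss tolerance is the harder part. The database name and path here are placeholders:

BACKUP LOG CustomerDb
	TO DISK = N'X:\Backups\CustomerDb_Log_20150101_0915.trn';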

If none of those cover your scenario, you can run the following query, which will tell you why the transaction log is still in use.

SELECT
    name,
    log_reuse_wait_desc,
    recovery_model_desc
    FROM sys.databases;

[1] Extra empty log space is pre-allocated such that there is always enough space for active transactions to be able to roll back. Regardless, that space is “used” in the sense that it’s been spoken for already.

Choosing a Database Recovery Model

The recovery model of a database is a critical setting that determines how the transaction log both records and retains information about transactions that occur in the database. It’s important to remember that the setting always applies to the entire database.

As we’ll explore in this post, there are 3 recovery models in SQL Server to choose from: FULL, BULK_LOGGED, and SIMPLE; each model has advantages, disadvantages, and tradeoffs. Only you can decide which model is appropriate for your particular situation and goals; we’ll go through some of those tradeoffs and a few typical scenarios.

First of all, let’s take a quick step back and ask why we care about what gets recorded in the transaction log.

What is recorded in the transaction log?

The transaction log contains an ordered sequence of physical operations that occur within a database. Each physical operation is written to the transaction log before the change itself is applied to the data page(s). This is called write-ahead logging, and it occurs regardless of the recovery model chosen.

Usually, the physical operations written to the transaction log also contain enough information to perform the inverse operation of the change (i.e., to undo a delete, do an insert). This, combined with the write-ahead logging mechanism, ensures transactions succeed or fail as a whole (atomicity) by giving SQL Server the ability to both undo and redo transactions. Hopefully those two words, undo and redo, set off hints about how logging is intimately tied to backup, restore, and several other features in SQL Server — we’ll get to those in a minute.

Minimal Logging

There are a handful of operations we can do in a database that can be minimally logged — this means that only enough information required to undo the transaction is logged. In the case of a BULK INSERT, for example, all that’s logged are page allocations — we can undo those by deallocating the pages, but we can’t redo them because the actual contents of the pages aren’t recorded in the log.

We gain speed by writing to the transaction log less, but lose the ability to perform point-in-time recovery during the operation.

You might be wondering how a database restored using only transaction log backups can be complete after a point in time where an operation such as a BULK INSERT took place. How does SQL Server know what to put on those data pages, since only the allocations were logged?

The answer is in what is recorded by the first transaction log backup taken after the minimally-logged operation. SQL Server keeps track of extents that are modified by minimally-logged operations in a special metadata page (the ML map, or minimally-logged map, which contains 1 bit per extent), and copies those extents into the transaction log backup, along with the required portion of the transaction log itself. Yes, transaction log backups can contain data!

The net result of minimal logging isn’t that we get the operation completely for free; we just defer a good chunk of the work to the backup subsystem for an immediate gain in speed.

Armed with this knowledge, let’s take a look at the features of each of the three recovery models.

FULL

  • Full logging, all the time.
  • Supports point-in-time recovery (you must be taking transaction log backups).
  • Only transaction log backups internally clear the transaction log, so a good backup strategy is a must.
  • Required for certain features, such as Database Mirroring and Availability Replicas.
  • Typically used for OLTP databases, and where data protection is a high priority.

BULK_LOGGED

  • Full logging, except when the required conditions are met for minimally-logged operations.
  • Supports point-in-time recovery (again, you have to take transaction log backups), but only within transaction log backups that do not contain minimally-logged operations. This means you should take a transaction log backup immediately before, and another immediately after, the minimally-logged operation(s) to maximize your ability to do a point-in-time restore (see the sketch after this list).
  • Only transaction log backups internally clear the transaction log (same as FULL).
  • Used for performing bulk operations on a database that would otherwise prefer to be in FULL recovery. A database can be switched freely between FULL and BULK_LOGGED without breaking the transaction log backup chain.
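A typical way to put that advice into practice looks something like the following; the database, table, and file names are placeholders, and TABLOCK is one of the usual conditions required for the insert to be minimally logged:

BACKUP LOG CustomerDb TO DISK = N'X:\Backups\CustomerDb_BeforeBulk.trn';

ALTER DATABASE CustomerDb SET RECOVERY BULK_LOGGED;

BULK INSERT dbo.ImportStaging
	FROM N'X:\Imports\bigfile.csv'
	WITH (TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

ALTER DATABASE CustomerDb SET RECOVERY FULL;

BACKUP LOG CustomerDb TO DISK = N'X:\Backups\CustomerDb_AfterBulk.trn';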

SIMPLE

  • Full logging, except when the required conditions are met for minimally-logged operations.
  • Does not support point-in-time recovery, and the transaction log cannot be backed up.
  • A database CHECKPOINT internally clears the transaction log as part of the process (since you can’t take transaction log backups), which means you don’t have to manage the transaction log as actively as with the other recovery models.
  • Usually used for databases that frequently handle large quantities of data (e.g., data warehouses), or databases where point-in-time recovery is not important.
  • Switching a database from FULL or BULK_LOGGED to SIMPLE breaks the transaction log backup chain. The transaction log backup chain can be restarted by switching from SIMPLE back to FULL or BULK_LOGGED, and taking a full or differential backup.
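If the chain does get broken (or you inherit a database where it was), restarting it looks roughly like this (names and paths are placeholders):

ALTER DATABASE CustomerDb SET RECOVERY FULL;

-- A differential (or full) backup restarts the log backup chain
BACKUP DATABASE CustomerDb
	TO DISK = N'X:\Backups\CustomerDb_Diff.bak'
	WITH DIFFERENTIAL;

-- Transaction log backups can now resume
BACKUP LOG CustomerDb TO DISK = N'X:\Backups\CustomerDb_Log.trn';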


A good place to start when choosing a recovery model is to answer the following questions. Remember that these are only guidelines, and your specific situation will determine the best course of action.

  • Does the database require point-in-time recovery? (Yes = FULL or BULK_LOGGED, No = SIMPLE)
  • Are special features going to be used on this database? (Yes = FULL or BULK_LOGGED — usually FULL — depending on the feature)
  • Will large quantities of data be imported into, or manipulated within, this database? (Yes = BULK_LOGGED or SIMPLE)