Allowing Remote Connections to SQL Express (the Right Way) (Video)

Sometimes we need to allow remote connections to a SQL Server Express or Developer edition instance, usually on our local machine. In this video demo, I show you how to allow remote access without resorting to disabling the Windows Firewall entirely, which is an unfortunately common "solution" pervading the internet.

Since there are many possible configurations (which is why I say “default” a lot in the video — sorry), and part of what I covered applies to all editions of SQL Server, a much more complete reference can be found here on MSDN.

Is my SQL Server’s memory over-committed?

As our applications’ data grows, so usually does the memory required by SQL Server to efficiently process requests for that data. Sometimes those requirements are more than the host operating system instance can handle, and we don’t find out about it until it’s too late and performance takes a nosedive. In this post, we’ll explore what memory over-commit is and why it’s a bad thing, how to mitigate the problem, and how to help prevent it from occurring in the first place.

Cause

It’s pretty obvious that memory over-commit occurs when the amount of memory required by applications exceeds the amount of physical memory available in the host operating system. (This applies equally to both physical and virtual machines. In this post, when I say “host operating system,” I mean an operating system instance that hosts SQL Server, not an operating system instance that hosts virtual machines.)

When the amount of memory required exceeds the amount of physical memory available, Windows uses disk (the page file) as a backing store to satisfy the excess memory requirements. This is why this mechanism is called Virtual Memory: it looks like normal memory to an application, but behind the scenes, Windows is backing it with disk.

How does this happen? Well, first, you’ll notice that I haven’t mentioned anything directly about SQL Server. Virtual Memory is a mechanism of Windows, and so it applies to all applications that run under Windows, including SQL Server. In fact, it’s possible the system is over-committed because of memory requirements from applications other than SQL Server. Usually, though, SQL Server is the largest consumer of memory in a Windows instance, so it’s also usually responsible for causing over-commit problems.

The heart of the issue is controlling the amount of memory SQL Server is allowed to allocate. The only real influence we have over this is the Max Server Memory setting. While that might sound really concrete, the problem is that it… doesn’t actually control the total amount of memory SQL Server can allocate. On SQL Server 2005 to 2008 R2, this setting controls the maximum amount of memory used for the buffer pool only; it doesn’t include other memory pools such as the procedure cache, which can be very significant (gigabytes!) in some scenarios. SQL Server 2012 improves the state of affairs by increasing the scope of what this setting covers. While it’s still not perfect, it’s a welcome improvement to better represent what the setting actually does, and offers greater control of maximum memory utilization. In any event, the point is that this setting underestimates the amount of memory that’s going to be used (sometimes significantly, as mentioned), which can lead to unexpected over-commit.
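
To get a feel for how far actual usage has drifted from the configured cap on a running instance, you can compare the setting against the memory the SQL Server process is actually using. Here's a minimal sketch (assuming SQL Server 2008 or later for sys.dm_os_process_memory, and VIEW SERVER STATE permission):

-- Compare the configured cap against what the process is actually using.
SELECT
	c.value_in_use AS MaxServerMemoryMB,
	pm.physical_memory_in_use_kb / 1024 AS PhysicalMemoryInUseMB,
	pm.memory_utilization_percentage AS MemoryUtilizationPct
	FROM sys.configurations c
	CROSS JOIN sys.dm_os_process_memory pm
	WHERE c.name = N'max server memory (MB)';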

Symptoms

The performance implications of backing memory with disk can be crippling: disk can be thousands of times slower than physical memory, particularly when it comes to where the Windows page file is placed, as we don't normally put it on our fastest, most expensive storage device. Probably the worst-case scenario is when the page file sits on a RAID 1 mirror (the typical physical machine scenario), which simply isn't meant to handle a huge number of random reads or writes.

In order to detect when memory over-commit is happening, you’ll have to be doing Performance Monitor (PerfMon) logging, as you really won’t see anything directly in SQL Server except that things are very slow (more accurately, the wait time associated with retrieving a page from the buffer pool without physical I/O will be high). I strongly recommend setting up 24/7 PerfMon logging in your environment, and at some point I’ll write a post or record a demo video of how to set it up.

Below are the key PerfMon counters you'll want to record to detect and troubleshoot memory over-commit. This, of course, is by no means an exhaustive list of all the counters you should be recording. (For a quick spot check of the same kind of information from inside SQL Server itself, see the sketch after the list.)

  • Paging File(_Total)\% Usage – Not surprisingly, this counter can be a dead giveaway to detect if there are issues. If it’s at any value greater than zero, you need to take a closer look at the other counters to determine if there’s a problem. Sometimes a system will be perfectly fine with a value less than 2-3% (it also depends on the size of the page file), but the higher this counter is, the more of a red flag it is. Also, watch this counter to make sure it’s stable, and not creeping up over time.
  • Memory\Available MBytes – If this number is below ~500 (in the absence of page file usage), you’re in the danger zone of over-commit. It’s recommended to keep at least this much memory available not only for unexpected SQL Server usage, but also to cover the case where administrators need to Remote Desktop into the box for some reason. User sessions take memory, so we need to keep some free for emergencies. I won’t get into the amount of memory to keep free on a SQL Server here, as that’s a discussion in itself. My point here is that if this counter is getting too low (less than ~500), you could start getting in trouble soon. I should note also that if the system is currently over-committed, this counter will reflect the amount of virtual memory provisioned, as it gets counted as available memory. So the system could be over-committed, yet appear to have plenty of available memory — look at the other counters to put the number in context.
  • Physical Disk\Disk Reads/sec and Physical Disk\Disk Writes/sec for the disk that has the page file on it – Normal operations do cause some disk activity here, but when memory over-commit happens, these counters will spike up dramatically from the baseline.
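
If PerfMon logging isn't in place yet, SQL Server itself exposes a similar view of the operating system's memory state through the sys.dm_os_sys_memory DMV (SQL Server 2008 and later; requires VIEW SERVER STATE). A minimal sketch, approximating the counters above:

-- A rough in-SQL approximation of the memory counters above.
SELECT
	total_physical_memory_kb / 1024 AS TotalPhysicalMemoryMB,
	available_physical_memory_kb / 1024 AS AvailableMemoryMB,	-- compare to Memory\Available MBytes
	(total_page_file_kb - available_page_file_kb) / 1024 AS CommittedMB,	-- commit charge (physical + page file), not page file usage alone
	system_memory_state_desc
	FROM sys.dm_os_sys_memory;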

Since memory over-commit can only happen when the amount of physical memory is exhausted, the system will only become slow after a certain point. In troubleshooting, sometimes a SQL instance (or Windows itself) is restarted, and it fixes the problem for a while, only to return some time later. By now it should be obvious that this happens because after a restart, the SQL Server buffer pool is empty, and there’s no possibility of over-commit until the physical memory is used up again.

Solution

Iteratively lower SQL Server’s Max Server Memory setting (or initially set it to a reasonable value), and monitor the performance counters until the system falls back to a stable configuration. Because of the nature of Virtual Memory, Windows can hold on to swapped-out pages for quite a long time, so it’s possible that the counters will stabilize with page file usage at a higher level than normal. That may be okay: once those pages are swapped back in, they won’t be swapped out again, unless the settings on this iteration are still out of whack. If the initial configuration was way off (say, the default Max Server Memory setting), you may want to restart the box to start with a clean slate, because the counters will be so far out.
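
For reference, here's a minimal sketch of the mechanics of changing the setting; the 12288 MB value is only an example, so substitute a number appropriate for your environment:

-- Max Server Memory is an advanced, dynamic option: no restart required,
-- though it may take a little while for SQL Server to release memory.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 12288;	-- example value only
RECONFIGURE;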

It seems counter-intuitive to lower the amount of memory SQL Server is able to allocate, but it works because SQL Server internally manages which sets of pages in memory are hot and cold, an insight Windows doesn’t have. This means that even though adjusting the Max Server Memory setting down gives SQL Server less memory to work with, it can still perform well by keeping the most active pages in memory and only occasionally going to physical disk for pages that aren’t in the buffer pool, as opposed to potentially going to disk for any memory access.

Prevention

While over-commit can never truly be prevented — users could potentially run other applications on the SQL box that require lots of memory — what you can put in place is an early-warning system by monitoring the PerfMon counters. Third-party software solutions should be able to help with this, particularly if you manage many servers.

Speaking of other applications, if you have any installed on the SQL box (including the third-party monitoring software I just mentioned), it’s doubly important to monitor the state of affairs, as these are variables out of your control. The Max Server Memory setting and the amount of memory you keep available should both be more conservative in this case.

It’s also important, particularly if your SQL Server is version 2005 to 2008 R2, to ensure the Max Server Memory setting allows for some future growth in your environment. Because the setting doesn’t encompass the plan cache, even adding an insignificantly small database could cause over-commit if many different queries are run against it. The setting and counters should be evaluated as part of the change process. For SQL Server 2012, this is less of a concern for the reasons previously mentioned, but it’s still worth checking things out as part of your regular change process.
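
A quick way to keep an eye on this during change reviews is to check how large the plan cache has grown relative to the setting. A minimal sketch (requires VIEW SERVER STATE):

-- Approximate size of the plan cache, which (as noted above) isn't covered
-- by Max Server Memory on SQL Server 2005 to 2008 R2.
SELECT
	COUNT(*) AS CachedPlans,
	SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS PlanCacheMB
	FROM sys.dm_exec_cached_plans;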

Finally, try to avoid letting users remote into the SQL box to do regular work or maintenance, as this can use up a tremendous amount of memory. Nearly all tasks can be accomplished remotely using SQL Server Management Studio and remotely/non-interactively using PowerShell. If your administrators’ workstations aren’t in the same domain as your servers, create a management box on the server domain, and remote into that instead to manage the servers.

What changed in my database last week?

PANIC! A customer clicked through four layers of warning messages and accidentally deleted a bunch of data from the database, but didn’t bother to tell you about it until today.

Great. The database is in FULL or BULK_LOGGED recovery, and we have a full set of transaction log backups that contain all the transactions in the database for the time when things “happened.” Okay… now what? Log backups seem very opaque, as we can’t just open them up in Notepad and expect things to be human-readable.

Enter the undocumented table-valued function: fn_dump_dblog.

This function reads a transaction log backup file and returns a human-readable (okay, geek-readable) description of the physical log records from the backup.

(The sister function fn_dblog does the same sort of thing, except it operates only on the active transaction log of a database, not a transaction log backup.)
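
For example, a quick peek at the active log of the current database looks like this (a sketch; the two parameters are optional start and end LSNs, and NULL means unbounded):

-- The most recent physical log records in the active transaction log.
SELECT TOP (100)
	[Current LSN],
	Operation,
	Context,
	[Transaction ID],
	[Begin Time]
	FROM fn_dblog(NULL, NULL)
	ORDER BY [Current LSN] DESC;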

Paul Randal wrote a detailed blog entry on the basics of how to use both of these functions, so I won’t cover that here. Where this post differs is in consuming the output of the function so it’s much more usable.

The first step is to read the entire backup and dump the output into a temporary table. This will make querying the same set of transactions (or more transactions if you load additional backups into the table) much faster, as the log reading part of things is rather slow.

--INSERT INTO #transactions SELECT *	-- use this form if #transactions already exists
SELECT * INTO #transactions
	FROM
		fn_dump_dblog
		(
			NULL,	-- Start
			NULL,	-- End
			'DISK',	-- Device Type
			1,		-- File number in backup
			'',		-- Backup file
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
			NULL, NULL, NULL
		);

ALTER TABLE #transactions
	ADD
		StartDate datetime NULL,
		EndDate datetime NULL;

-- Populate the new datetime columns from the Begin/End Time strings.
UPDATE #transactions
	SET
		StartDate = CAST([Begin Time] AS datetime),
		EndDate = CAST([End Time] AS datetime);

ALTER TABLE #transactions DROP COLUMN [Begin Time];
ALTER TABLE #transactions DROP COLUMN [End Time];

Now that we have the transactions available for querying more readily, let’s show what we need to see in an easy-to-consume format. This works best if you’ve restored a copy of the database in STANDBY mode to a point in time before the time of interest. If the script is run in the context of that database, the code will show you the names of the tables affected, the login SID of who made the change, and also proactively return a DBCC PAGE command for when you want to look at the raw data values. This makes it really easy to inch through the transaction log to figure out what changed using out-of-the-box tools. (Yes, there are 3rd-party tools that do this, too.)
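
If you need a refresher on getting that standby copy in place, here's a minimal sketch; all database, logical file, and path names (and the STOPAT time) are placeholders for your environment:

-- Restore the full backup, leaving the database readable in STANDBY mode.
RESTORE DATABASE MyDatabase_Standby
	FROM DISK = N'X:\Backups\MyDatabase_Full.bak'
	WITH
		MOVE N'MyDatabase' TO N'X:\Data\MyDatabase_Standby.mdf',
		MOVE N'MyDatabase_log' TO N'X:\Data\MyDatabase_Standby_log.ldf',
		STANDBY = N'X:\Data\MyDatabase_Standby_undo.bak';

-- Roll forward to just before the time of interest.
RESTORE LOG MyDatabase_Standby
	FROM DISK = N'X:\Backups\MyDatabase_1.trn'
	WITH
		STOPAT = N'2013-01-15T08:00:00',	-- just before the suspected change
		STANDBY = N'X:\Data\MyDatabase_Standby_undo.bak';

With the standby copy in place and the script run in its context, here's the main query: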

DECLARE @startDate datetime = NULL;
DECLARE @endDate datetime = NULL;
DECLARE @minLSN varchar(22) = NULL; /* '00000000:00000000:0000' */
DECLARE @maxLSN varchar(22) = NULL; /* '00000000:00000000:0000' */

SELECT
	a.TransactionId,
	a.Seq,
	a.LSN,
	a.SID,
	a.StartDate AS TransactionStartDate,
	a.EndDate AS TransactionEndDate,
	a.Operation,
	a.TableName,
	a.FileNumber,
	a.PageNumber,
	a.SlotId,
	(
		CASE WHEN a.FileNumber IS NOT NULL THEN
			'DBCC PAGE (''' + DB_NAME() + N''', ' + CAST(a.FileNumber AS varchar(MAX)) + ', ' + CAST(a.PageNumber AS varchar(MAX)) + ', 3) WITH TABLERESULTS'
		END
	) AS DBCCPageCommand
	FROM
	(
		SELECT
			UPPER(t.[Transaction ID]) AS TransactionId,
			ROW_NUMBER() OVER(PARTITION BY t.[Transaction ID] ORDER BY t.[Current LSN]) AS Seq,
			UPPER(t.[Current LSN]) AS LSN,
			bt.StartDate,
			ct.EndDate,
			t.Operation,
			CAST(CONVERT(varbinary, UPPER(LEFT(t.[Page ID], 4)), 2) AS int) AS FileNumber,
			CAST(CONVERT(varbinary, UPPER(RIGHT(t.[Page ID], 8)), 2) AS int) AS PageNumber,
			t.[Slot ID] AS SlotId,
			o.name AS TableName,
			bt.[Transaction SID] AS SID
			FROM #transactions t
			LEFT OUTER JOIN #transactions bt ON ((bt.[Transaction ID] = t.[Transaction ID]) AND (bt.Operation = 'LOP_BEGIN_XACT'))
			LEFT OUTER JOIN #transactions ct ON ((ct.[Transaction ID] = t.[Transaction ID]) AND (ct.Operation = 'LOP_COMMIT_XACT'))
			LEFT OUTER JOIN
			(
				sys.partitions p
				INNER JOIN sys.objects o ON o.object_id = p.object_id
			) ON p.partition_id = t.PartitionId
			WHERE
				(t.Context IN ('LCX_CLUSTERED', 'LCX_HEAP')) AND
				(t.[Transaction ID] != N'0000:00000000') AND
				((@startDate IS NULL) OR (t.StartDate IS NULL) OR (t.StartDate >= @startDate)) AND
				((@endDate IS NULL) OR (t.EndDate IS NULL) OR (t.EndDate <= @endDate)) AND
				((@minLSN IS NULL) OR (t.[Current LSN] >= @minLSN)) AND
				((@maxLSN IS NULL) OR (t.[Current LSN] <= @maxLSN))
	) a
	ORDER BY
		a.StartDate,
		a.EndDate,
		a.TransactionId,
		a.LSN;

If you feel like playing around, there are many more fields that come back from the function; I’ve chosen to output the set of columns that I find most useful when I need to use this script.

Once you’ve identified when the change occurred, you can run a data comparison tool between the STANDBY database and the current database (or a STANDBY copy from immediately after the change).
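
If you don't have a comparison tool handy, even a plain EXCEPT query can surface the differences for a single table, assuming it has the same schema in both databases. A minimal sketch with placeholder database and table names:

-- Rows that exist only in the current database (added or changed since the standby point)...
SELECT * FROM MyDatabase.dbo.MyTable
EXCEPT
SELECT * FROM MyDatabase_Standby.dbo.MyTable;

-- ...and rows that exist only in the standby copy (deleted or changed since then).
SELECT * FROM MyDatabase_Standby.dbo.MyTable
EXCEPT
SELECT * FROM MyDatabase.dbo.MyTable;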

A copy of the full script can be downloaded here.

Fix for Web Service Task “Cannot find definition for …. Service Description with namespace … is missing. Parameter name: name”

The problem is that the SSIS Web Service Task does not handle WSDL definitions that live in a namespace other than the default (target) namespace.

While the web service WSDL file may validate just fine, SSIS still balks at it. Here’s how to fix up the WSDL file.

Before:

<wsdl:definitions
	...
	targetNamespace="http://tempuri.org"
	xmlns:tns="http://tempuri.org"
	xmlns:ns1="http://www.MyServiceTargetNameSpace.com">
	...
	<wsdl:service name="MyServiceName">
		<wsdl:port name="MyPortName" binding="ns1:MyBindingName">
			...
		</wsdl:port>
	</wsdl:service>
</wsdl:definitions>

After:

<wsdl:definitions
	...
	targetNamespace="http://www.MyServiceTargetNameSpace.com"
	xmlns:tns="http://www.MyServiceTargetNameSpace.com">
	...
	<wsdl:service name="MyServiceName">
		<wsdl:port name="MyPortName" binding="tns:MyBindingName">
			...
		</wsdl:port>
	</wsdl:service>
</wsdl:definitions>

You could also just overwrite the targetNamespace and xmlns:tns attribute values with your service’s target namespace, leaving the ns1 declaration and binding reference as they are. Essentially, they all have to end up being the same namespace for it to work.

How are deadlocks created? (Video)

Deadlocks occur when two processes each hold locks on resources and each try to acquire locks on resources held by the other. The result is that neither process can continue.
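
Here's a minimal sketch of the classic pattern, using two hypothetical tables (dbo.TableA and dbo.TableB, each with an Id column); run the steps in the order shown, in two separate sessions:

-- Session 1, step 1: take a lock on a row in TableA.
BEGIN TRANSACTION;
UPDATE dbo.TableA SET Col1 = 1 WHERE Id = 1;

-- Session 2, step 2: take a lock on a row in TableB.
BEGIN TRANSACTION;
UPDATE dbo.TableB SET Col1 = 1 WHERE Id = 1;

-- Session 1, step 3: try to update the row Session 2 has locked (this blocks).
UPDATE dbo.TableB SET Col1 = 2 WHERE Id = 1;

-- Session 2, step 4: try to update the row Session 1 has locked.
-- Now neither session can proceed, and one of them will be chosen as the deadlock victim.
UPDATE dbo.TableA SET Col1 = 2 WHERE Id = 1;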

SQL Server resolves these situations automatically by terminating one of the processes, thus allowing the other(s) to proceed. This is done by an internal process called the Deadlock Monitor.

In this video demo, I demonstrate a basic sequence of events needed to create a deadlock. I also look at the locks that are held and requested by each process, and introduce you to reading a deadlock graph (which is what’s captured from the Extended Events deadlock capturing system I presented last week).

(Sorry my cursor isn’t visible in the video — I blame the capture program I used. It won’t happen again.)

If you want to reproduce the situation yourself, you can download the scripts here.