In this post, I’m going to give a high-level overview of where SQL Server stores user data. One of the most important topics, backup and restore, is closely related to how this (and logging) works, so it’s important to have a solid understanding of these concepts to be able to plan and execute a successful backup and restore strategy. I’ll cover logging and the transaction log in a future post.
The structures in which SQL Server stores data are arranged hierarchically. We’re going to start with the smallest relevant units and work our way up.
Pages and Extents
In the SQL Server database engine, a page is an 8 KB region of storage (applies to both memory and disk). Each page has a header region that stores metadata about the page (the page’s internal id, the type of the page, etc.), and a data region that contains the actual data.
While there are many different types of pages, there are only two basic classes:
- Data Pages – Contains information related to data rows themselves.
- Metadata Pages – Contains state information about other pages or extents.
Pages are grouped into sets of 8 (for a total of 64 KB), called extents, which makes the allocation system more efficient. Most extents only contain a single type of page1.
Logically groups extents into objects. A normal table is stored in a single partition, while a partitioned table is stored in multiple partitions. This is how ALTER TABLE … SWITCH is able to efficiently move partitions to/from an independent table, or between partitioned tables — a partition is a table.
A data file is the basic unit of a database in the file system2, and it contains some or all of the extents that make up a partition (see next section on filegroups).
All databases have a primary data file, which stores structural (metadata) information about the database, and by convention has the file extension .mdf. The primary data file cannot be removed from the database.
Secondary data files can be added, which have the extension .ndf by convention.
A filegroup is a named logical construct that contains data files. (Note: log files do not belong to filegroups.) All databases have a PRIMARY filegroup that contains (at least) the primary data file. The PRIMARY filegroup cannot be removed. Filegroups (except PRIMARY) can be set as read-only.
A filegroup allows for the distribution of extent and page allocations between multiple data files. SQL Server uses two algorithms simultaneously when making allocations:
- Round-robin. Allocations are made sequentially from the data files within the filegroup (i.e., for a filegroup with 3 files, the allocations would come from File1, File2, File3, File1, File2, File3, etc.).
- Proportional fill. The number of allocations made from a file is proportional to the amount of free space in that file. The purpose of this is to eventually reach a steady-state where allocations are balanced between all the files. However, you can get into trouble by adding a new, empty, data file, which can result in almost all the allocations coming from that one file, probably defeating the purpose of the allocation distribution among the files in the first place.
Filegroups also serve as the target for object storage. For example, the table [dbo].[Sales] can be stored in a filegroup called SALES.
In this context, a database is a logical container that contains one or more filegroups (PRIMARY, plus zero or more others). One of the database’s filegroups can be set as the default filegroup, where objects get created if the T-SQL statement does not explicitly specify a filegroup.
Now that you understand the basic internal storage structures SQL Server uses, you will be able to easily understand how many other areas of the system work, including backups, which is an important topic I will cover in a future post.
1 Mixed extents can contain multiple types of pages. These are a small minority of allocations.
2 Not getting into FILESTREAM or snapshot files here.