Scale out database applications using SQL Server Service Broker (video)

Database applications can be scaled up or out by using SQL Server Service Broker, which provides asynchronous messaging and queuing infrastructure. The great thing is that the framework of this functionality is already built into your existing databases! In this video, I explain the basics of how a Service Broker application works and the database objects behind creating your own Service Broker applications.



Video resources:

SqlDependency class
Service Broker samples on CodePlex
ssbdiagnose utility


Globalizing applications, in any locale

I was watching a car review on YouTube the other day, and the host made the point that the voice recognition system only understood names as they're pronounced, according to the locale of the infotainment software (U.S. English in this case). Moreover, he said the user should think about how this process works in order for it to be more usable.

Now, that's a great idea, and it kinda-sorta works, but it's not very user-friendly. Users shouldn't be expected to understand how a system functions internally for it to be usable. Many people know how to drive, yet they may not understand exactly how a car works.

These are the kinds of issues we encounter when developing globalized applications. How do we bridge the gap between what the user understands, and what the computerized system understands?

I think it comes down to assumptions. In other words, the computerized system assumes something, while the user either isn't aware of that assumption, or doesn't provide enough input (or is never prompted for input) for the system to do anything except make an assumption.

In the case of the voice recognition system, the system assumes the user's contact names are pronounced based on the locale of the system. This sounds like a reasonable assumption, and for the most part, it works. When thinking about it in the context of an increasingly globalized society, though, it's becoming a less and less valid assumption. If the system is only single-locale-aware, there's no other choice for it to make, just given a list of names.

While giving the user the ability to choose a different default locale for the system probably helps quite a bit, it doesn't solve the problem of the inevitable exceptions to the default.

If we think about the data available, is there any way we could help the system better understand the exceptions?

  • We could add a property to each name that specifies the locale in which to interpret it. This would solve the problem of complete misinterpretation as shown in the car review. It doesn't, however, solve the problem of idiosyncrasies within a locale (i.e. "Smith" and "Smythe" could be pronounced the same, and in more than just one way), and it would make the software more complex (although it could potentially save on costs, because the same bits could be deployed to all locales, just with different default settings).
  • Another approach would be to allow the user to input a phonetic version of the name, in order for the system to understand the pronunciation based only on the single default locale. This would solve the problem of exceptions, and the issue of same-pronunciations-different-spelling as mentioned in the previous point. While the phonetic data is assumed to be of the same locale as the locale of the names themselves, this is probably an acceptable drawback for most applications.

With globalization, other factors come into play, too:

  • The characters accepted by the system. For example, if the system's default locale is Chinese, is it going to allow the entry of names using English characters, and how would the system understand a phonetic spelling of that name?
  • What if there are differences between languages where some soundings of syllables may not exist in both?
  • How would efficient searching of the same data be accomplished for multiple locales?
  • How much extra effort (i.e., cost and time) is required to add a reasonably globalized solution for the specific application?

As you can probably tell, these are not necessarily problems that are easily and effortlessly solved, and I'm not going to claim for a second that either of the two approaches I mentioned will solve all application issues related to globalization. Since every application is different, though, they may offer a good solution, or at least a reasonable compromise -- either way, an improvement in usability, which was the original goal.

As database designers, we must be aware of how globalization affects which pieces of data we store, how they are arranged/sorted for efficient searching, and how users will be interacting with the data. It's also important to think about designing database structures such that they can easily be enhanced to accommodate a globalized application, even if the application will not support that kind of scenario in Version 1. It's coming; be prepared for it.


T-SQL Code Review Checklist

In software development, code review is a simple and effective way to reduce the number of bugs that creep in. Merely having a second pair of eyes look at a piece of code can often reveal logic that was implemented incorrectly or weakly, or misses use cases never considered. (Test-driven development is a separate discussion entirely; T-SQL unit testing is not the easiest thing in the world to do, and compromises sometimes have to be made, usually due to time constraints or politics.)

There are many good reasons why code review is a good practice, and I'm not going to get into that here. You can use Google as well as I can, or you can start by reading some of these questions on Programmers Stack Exchange. The purpose of this post is to consolidate many of the different things I look at when doing code review, or even when I'm looking over my own code. While this won't be a complete list, and the items will vary somewhat depending on your company's environment, it should be a good start.

In no particular order:

  • Syntax, including version-specific language use. This seems really obvious, but it can be easy to slip up. For example, if you normally do your development on a 2008+ server and the application still has to support 2005, it's easy to write something like DECLARE @i int = 0 and only have this get caught during testing. (Testing is done against all versions of the database engine that are supported by the application, right? ...right?) In terms of outright syntax, pay particular attention to T-SQL that's contained in string variables, meaning dynamic SQL, or SQL statements embedded in client applications, as these aren't validated until they are run through the containing code. Do all INSERT statements have a column list, and are all required columns specified?
  • Data/business logic protection. This could apply to either designing tables (which I do recommend code-reviewing, too), or when writing code to manipulate data (in the tables or otherwise). Are there enough constraints on the tables to prevent invalid data from entering the system? Are there any logical gaps in a piece of code? Does the piece of code accomplish the prescribed task? If in doubt, ask the developer what was intended. Sometimes logical gaps can be okay if a condition is rare, and the code is appropriately guarded. These same ideas apply to things like procedure parameters, particularly if any of the parameter values come from application user input.
  • Dynamic SQL. Follow through all logic paths to make sure the statements are being concatenated correctly for both syntax and potential injection attack vectors. Make sure the statements are parameterized appropriately.
  • Code Cleanliness. This is a bit tough to define. I personally don't expect strict adherence to a style guide, for example, but I do expect code to be legible, and make appropriate use of indentation, line breaks and comments. Everyone has their own natural style -- and even that can change over time -- so I think giving people more or less free reign over style is fine. Again, as long as the code is legible to others, because that is the most important thing.
  • Version- or edition-specific feature use. This shouldn't come up, as it should have been decided during the design phase, but at the end of the day, a developer can check anything in to the version control system.
  • Use of unexpected methods to accomplish a task. Does the code use a Rube-Goldberg-like approach to accomplish the task? Are cursors being used when all that's necessary is a slightly more difficult to write set-based operation? In these types of cases, I will always go back to the developer and find out why this kind of approach was taken, and what was intended to be accomplished. Most of the time, this kind of situation provides a great learning opportunity.
  • Query plan reuse. Does the code reuse (or not reuse) query plans as expected? Pay particular attention to dynamic SQL: for example, there can be subtle string concatenation bugs where two generated statements (via different code paths) only differ by whitespace, but this is enough to produce a separate query plan. Sometimes dynamic SQL is used to force the creation of an alternative plan for the expected data size -- does this work as expected?
  • Performance. Does the code meet the performance requirements, if any, and/or does it execute in a "reasonable" amount of time with a "reasonable" amount of resource usage? Does the code use table variables or temporary tables, and how does this interact with the query plan(s)? Does the code produce efficient query plans that will scale? It's important to test code on a real data set to find out these things. Some of this is technically a "testing" activity; however, I would say it's unlikely a testing department will have the technical skill required to determine if the code is doing the right thing. Reading and understanding query plans is a huge topic in itself; in short, look for inefficient operators that are unexpected, such as table scans (particularly as the bottom operator in a nested loops join), huge row counts (thick arrows), missing indexes, etc. Speaking of indexes, determine if the code would benefit from index changes, even if query performance is currently acceptable. Even if you don't make those changes now, record your findings, as it could be valuable information to have later if performance does become a problem. This could potentially save an incredible amount of time and effort doing a workload analysis.
  • Deprecated features. This is fairly broad, as deprecation can happen for a wide range of things, from major features (i.e., database mirroring) to minor T-SQL syntax (i.e., using HOLDLOCK without surrounding parentheses). That said, for every new release of SQL Server, it's a good idea to thoroughly read the official list of deprecated features, and make sure no new code is using any of it unless absolutely necessary (and in that case, there needs to be a plan to modify the system to remove the deprecated feature usage). I have a script on the Scripts & Code resource page that dumps the internal performance counters that track usage of deprecated features, which I think may be more broad than the official reference list. This method is great if you want to find out what your application is really doing in production, so you can proactively fix it long before it becomes a problem.

That's a pretty good start! If you think I've left out something important, let me know in the comments and I'll add it to the post, as I want this to be a good reference guide.