Aaron Bertrand

T-SQL Tuesday #33 : Trick Shots : Schema Switch-A-Roo


This month's T-SQL Tuesday is being hosted by Mike Fal (blog | twitter), and the topic is Trick Shots, where we're invited to tell the community about some solution we used in SQL Server that felt, at least to us, like a sort of "trick shot" – something similar to using massé, "English" or complicated bank shots in billiards or snooker. After working with SQL Server for some 15 years, I've had the occasion to come up with tricks to solve some pretty interesting problems, but one that seems to be quite reusable, easily adapts to many situations, and is simple to implement, is something I call "schema switch-a-roo."

Let's say you have a large lookup table that needs to be refreshed periodically. This lookup table is needed across many servers, and can contain data populated from an external or third-party source (e.g. IP or domain data), or data that comes from within your own environment.

The first couple of scenarios where I needed a solution for this were making metadata and denormalized data available to read-only "data caches" – really just SQL Server MSDE (and later Express) instances installed on various web servers, so the web servers pulled this cached data locally instead of bothering the primary OLTP system. This may seem redundant, but off-loading read activity away from the primary OLTP system, and taking the network connection out of the equation completely, led to a real bump in all-around performance – most notably for end users.

These servers did not need up-to-the-minute copies of the data; in fact, a lot of the cache tables were only updated daily. But since the systems were 24×7, and some of these updates could take several minutes, they often got in the way of real customers doing real things on the system.

The Original Approach(es)

At the very beginning, the code was rather simplistic: we deleted rows that had been removed from the source, updated all the rows that we could tell had changed, and inserted all the new rows. It looked something like this (error handling etc. removed for brevity):

BEGIN TRANSACTION;

-- remove rows that no longer exist in the source
DELETE dbo.Lookup 
  WHERE [key] NOT IN 
  (SELECT [key] FROM [source]);

-- update any rows we could tell had changed
UPDATE d SET [col] = s.[col]
  FROM dbo.Lookup AS d
  INNER JOIN [source] AS s
  ON d.[key] = s.[key]
  -- AND [condition to detect change];

-- add rows that are new in the source
INSERT dbo.Lookup([cols]) 
  SELECT [cols] FROM [source]
  WHERE [key] NOT IN 
  (SELECT [key] FROM dbo.Lookup);

COMMIT TRANSACTION;

Needless to say, this transaction could cause some real performance issues when the system was in use. Surely there were other ways to do this, but every method we tried was equally slow and expensive. How slow and expensive? "Let me count the scans…"

Since this pre-dated MERGE, and we had already discarded "external" approaches like DTS, through some testing we determined that it would be more efficient to just wipe the table and re-populate it, rather than try to sync to the source:

BEGIN TRANSACTION;

TRUNCATE TABLE dbo.Lookup;

INSERT dbo.Lookup([cols]) 
  SELECT [cols] FROM [source];

COMMIT TRANSACTION;

Now, as I explained, this query from [source] could take a couple of minutes, especially if all of the web servers were being updated in parallel (we tried to stagger where we could). And if a customer was on the site and trying to run a query involving the lookup table, they had to wait for that transaction to finish. In most cases, if they were running that query at midnight, it wouldn't really matter whether they got yesterday's copy of the lookup data or today's; so making them wait for the refresh seemed silly, and it actually did lead to a number of support calls.

So while this was better, it was certainly far from perfect.

My Initial Solution : sp_rename

My initial solution, back when SQL Server 2000 was cool, was to create a "shadow" table:

CREATE TABLE dbo.Lookup_Shadow([cols]);

This way I could populate the shadow table without interrupting users at all, and then perform a three-way rename – a fast, metadata-only operation – only after the population was complete. Something like this (again, grossly simplified):

TRUNCATE TABLE dbo.Lookup_Shadow;

INSERT dbo.Lookup_Shadow([cols]) 
  SELECT [cols] FROM [source];

BEGIN TRANSACTION;

  EXEC sp_rename N'dbo.Lookup',        N'Lookup_Fake';
  EXEC sp_rename N'dbo.Lookup_Shadow', N'Lookup';

COMMIT TRANSACTION;

-- if successful:
EXEC sp_rename N'dbo.Lookup_Fake', N'Lookup_Shadow';

The downside to this initial approach was that sp_rename has a non-suppressible output message warning you about the dangers of renaming objects. In our case we performed this task through SQL Server Agent jobs, and we handled a lot of metadata and other cache tables, so the job history was flooded with all these useless messages and actually caused real errors to be truncated from the history details. (I complained about this in 2007, but my suggestion was ultimately dismissed and closed as "Won't Fix.")
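For reference, the warning in question reads something like this (exact wording may vary by version):

Caution: Changing any part of an object name could break scripts and stored procedures.

One of these per sp_rename call, across dozens of tables and jobs, adds up to a lot of noise.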

A Better Solution : Schemas

Once we upgraded to SQL Server 2005, I discovered this fantastic command called CREATE SCHEMA. It was trivial to implement the same type of solution using schemas instead of renaming tables, and now the Agent history wouldn't be polluted with all of these unhelpful messages. Basically I created two new schemas:

CREATE SCHEMA fake   AUTHORIZATION dbo;
CREATE SCHEMA shadow AUTHORIZATION dbo;

Then I moved the Lookup_Shadow table into the shadow schema, and renamed it:

ALTER SCHEMA shadow TRANSFER dbo.Lookup_Shadow;

EXEC sp_rename N'shadow.Lookup_Shadow', N'Lookup';

(If you are just implementing this solution, you'd be creating a new copy of the table in the shadow schema, not moving the existing table there and renaming it.)
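A minimal sketch of what that fresh copy might look like, assuming a hypothetical two-column lookup table (in reality the columns, indexes and so on should mirror dbo.Lookup exactly):

CREATE TABLE shadow.Lookup
(
  [key] int         NOT NULL PRIMARY KEY,
  [col] varchar(32) NOT NULL
);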

With those two schemas in place, and a copy of the Lookup table in the shadow schema, my three-way rename became a three-way schema transfer:

TRUNCATE TABLE shadow.Lookup;

INSERT shadow.Lookup([cols]) 
  SELECT [cols] FROM [source];

-- perhaps an explicit statistics update here

BEGIN TRANSACTION;

  ALTER SCHEMA fake TRANSFER     dbo.Lookup;
  ALTER SCHEMA dbo  TRANSFER  shadow.Lookup;

COMMIT TRANSACTION;

ALTER SCHEMA shadow TRANSFER fake.Lookup;

At this point you can, of course, empty out the shadow copy of the table; however, in some cases I found it useful to leave the "old" copy of the data around for troubleshooting purposes:

TRUNCATE TABLE shadow.Lookup;

Anything further that you do with the shadow copy should be done outside of the transaction – the two transfer operations should be as concise and quick as possible.

Some Caveats

  • Foreign Keys
    This won't work out of the box if the lookup table is referenced by foreign keys. In our case we didn't point any constraints at these cache tables, but if you do, you may have to stick with intrusive methods such as MERGE. Or use append-only methods and disable or drop the foreign keys before performing any data modifications, then re-create or re-enable them afterward (a rough sketch of that pattern appears after this list). If you stick with MERGE / UPSERT techniques and you're doing this between servers or, worse yet, from a remote system, I highly recommend pulling the raw data locally first rather than trying to run these methods across servers.
     
  • Statistics
    Switching the tables (using rename or schema transfer) will lead to statistics flipping back and forth between the two copies of the table, and this can obviously be an issue for plans. So you may consider adding explicit statistics updates as part of this process (see the sketch after this list).
     
  • Other Approaches
    There are of course other ways to do this that I simply haven't had the occasion to try. Partition switching and using a view + synonym are two approaches I may investigate in the future for a more thorough treatment of the topic. I'd be interested to hear your experiences and how you've solved this problem in your environment. And yes, I realize that this problem is largely solved by Availability Groups and readable secondaries in SQL Server 2012, but I consider it a "trick shot" if you can solve the problem without throwing high-end licenses at it, or replicating an entire database just to make a few tables redundant. :-)
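As a rough illustration of the disable / re-enable pattern mentioned under Foreign Keys above (the child table and constraint names here are purely hypothetical):

ALTER TABLE dbo.SomeChildTable NOCHECK CONSTRAINT FK_SomeChildTable_Lookup;

-- ... perform the data modifications against dbo.Lookup here ...

ALTER TABLE dbo.SomeChildTable WITH CHECK CHECK CONSTRAINT FK_SomeChildTable_Lookup;

Re-enabling with WITH CHECK makes SQL Server re-validate the existing rows, so the constraint stays trusted and usable by the optimizer.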
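And a minimal sketch of the explicit statistics update mentioned under Statistics, run against the freshly loaded shadow copy just before the transfer (FULLSCAN is an assumption; sample at whatever rate makes sense for your data volumes):

UPDATE STATISTICS shadow.Lookup WITH FULLSCAN;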

Conclusion

If you can live with the limitations here, this approach may well be a better performer than a scenario where you essentially take a table offline using SSIS or your own MERGE / UPSERT routine, but please be sure to test both techniques. The most significant point is that the end user accessing the table should have the exact same experience, any time of the day, even if they hit the table in the middle of your periodic update.