The “Don’t Repeat Yourself” principle suggests that you should reduce repetition. This week I came across a case where DRY should be thrown out the window. There are other cases as well (for example, scalar functions), but this one was an interesting one involving Bitwise logic.
Let’s imagine the following table:
CREATE TABLE dbo.CarOrders ( OrderID INT PRIMARY KEY, WheelFlag TINYINT, OrderDate DATE --, ... other columns ... ); CREATE INDEX IX_WheelFlag ON dbo.CarOrders(WheelFlag);
The “WheelFlag” bits represent the following options:
0 = stock wheels 1 = 17" wheels 2 = 18" wheels 4 = upgraded tires
So possible combinations are:
0 = no upgrade 1 = upgrade to 17" wheels only 2 = upgrade to 18" wheels only 4 = upgrade tires only 5 = 1 + 4 = upgrade to 17" wheels and better tires 6 = 2 + 4 = upgrade to 18" wheels and better tires
Let’s set aside arguments, at least for now, about whether this should be packed into a single TINYINT in the first place, or stored as separate columns, or use an EAV model… fixing the design is a separate issue. This is about working with what you have.
To make the examples useful, let’s fill this table up with a bunch of random data. (And we’ll assume, for simplicity, that this table contains only orders that haven’t yet shipped.) This will insert 50,000 rows of roughly equal distribution between the six option combinations:
;WITH n AS ( SELECT n,Flag FROM (VALUES(1,0),(2,1),(3,2),(4,4),(5,5),(6,6)) AS n(n,Flag) ) INSERT dbo.CarOrders ( OrderID, WheelFlag, OrderDate ) SELECT x.rn, n.Flag, DATEADD(DAY, x.rn/100, '20100101') FROM n INNER JOIN ( SELECT TOP (50000) n = (ABS(s1.[object_id]) % 6) + 1, rn = ROW_NUMBER() OVER (ORDER BY s2.[object_id]) FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2 ) AS x ON n.n = x.n;
If we look at the breakdown, we can see this distribution. Note that your results may differ slightly than mine depending on the objects in your system:
SELECT WheelFlag, [Count] = COUNT(*) FROM dbo.CarOrders GROUP BY WheelFlag;
WheelFlag Count --------- ----- 0 7654 1 8061 2 8757 4 8682 5 8305 6 8541
Now let’s say it’s Tuesday, and we just got a shipment of 18″ wheels, which were previously out of stock. This means we are able to satisfy all of the orders that require 18″ wheels – both those that upgraded tires (6), and those that did not (2). So we *could* write a query like the following:
SELECT OrderID FROM dbo.CarOrders WHERE WheelFlag IN (2,6);
In real life, of course, you can’t really do that; what if more options are added later, like wheel locks, lifetime wheel warranty, or multiple tire options? You don’t want to have to write a series of IN() values for every possible combination. Instead we can write a BITWISE AND operation, to find all the rows where the 2nd bit is set, such as:
DECLARE @Flag TINYINT = 2; SELECT OrderID FROM dbo.CarOrders WHERE WheelFlag & @Flag = @Flag;
This gets me the same results as the IN() query, but if I compare them using SQL Sentry Plan Explorer, the performance is quite different:
It’s easy to see why. The first uses an index seek to isolate the rows that satisfy the query, with a filter on the WheelFlag column:
The second uses a scan, coupled with an implicit convert, and terribly inaccurate statistics. All due to the BITWISE AND operator:
So what does this mean? At the heart of it, this tells us that the BITWISE AND operation is not sargable.
But all hope is not lost.
If we ignore the DRY principle for a moment, we can write a slightly more efficient query by being a bit redundant in order to take advantage of the index on the WheelFlag column. Assuming that we’re after any WheelFlag option above 0 (no upgrade at all), we can re-write the query this way, telling SQL Server that the WheelFlag value must be at least the same value as flag (which eliminates 0 and 1), and then adding the supplemental information that it also must contain that flag (thus eliminating 5).
SELECT OrderID FROM dbo.CarOrders WHERE WheelFlag >= @Flag AND WheelFlag & @Flag = @Flag;
The >= portion of this clause is obviously covered by the BITWISE portion, so this is where we violate DRY. But because this clause we’ve added is sargable, relegating the BITWISE AND operation to a secondary search condition still yields the same result, and the overall query yields better performance. We see a similar index seek to the hard-coded version of the query above, and while the estimates are even further off (something that may be addressed as a separate issue), reads are still lower than with the BITWISE AND operation alone:
We can also see that a filter is used against the index, which we didn’t see when using the BITWISE AND operation alone:
Don’t be afraid to repeat yourself. There are times when this information can help the optimizer; even though it may not be entirely intuitive to *add* criteria in order to improve performance, it’s important to understand when additional clauses help whittle the data down for the end result rather than making it “easy” for the optimizer to find the exact rows on its own.