In my previous post on streamlining the operations of the transaction log, I discussed two of the most common causes of extra log records being generated: dead weight from unused nonclustered indexes and page split operations (that cause index fragmentation). Assuming you’ve read that post, I mentioned that there are more subtle problems that can be detrimental to transaction log performance, and I’m going to cover these here.
Many, Many Tiny Transactions
Every so often SQL Server will flush a portion of the transaction log to disk. A log flush occurs whenever:
- A transaction commit log record is generated.
- A transaction abort log record is generated at the end of a transaction roll back.
- 60KB of log records have been generated since the previous log flush.
The smallest log flush possible is a single 512-byte log block. If all transactions in a workload are very small (e.g. inserting a single, small table row) then there will be lots of minimally-sized log flushes occurring. Log flushes are performed asynchronously, to allow decent transaction log throughput, but there is a fixed limit of 32 concurrent log-flush I/Os at any one time.
There are two possible effects this may have:
- On a slow-performing I/O subsystem, the volume of tiny transaction log writes could overwhelm the I/O subsystem leading to high-latency writes and subsequent transaction log throughput degradation. This situation can be identified by high-write latencies for the transaction log file in the output of sys.dm_io_virtual_file_stats (see the demo links at the top of the previous post)
- On a high-performing I/O subsystem, the writes may complete extremely quickly, but the limit of 32 concurrent log-flush I/Os creates a bottleneck when trying to make the log records durable on disk. This situation can be identified by low write latencies and a near-constant number of outstanding transaction log writes near to 32 in the aggregated output of sys.dm_io_pending_io_requests (see the same demo links).
In both cases, making transactions longer (which is very counter-intuitive!) can reduce the frequency of transaction log flushes and increase performance. Additionally, in case #1, moving to a higher-performing I/O subsystem may help – but may lead to case #2. With case #2, if the transactions cannot be made longer, the only alternative is to split the workload over multiple databases to get around the fixed limit of 32 concurrent log-flush I/Os.
Transaction Log Auto-Growth
Whenever new space is added to the transaction log it must be zero-initialized (writing out zeroes to overwrite the previous use of that portion of the disk), no matter whether the instant file initialization feature is enabled or not. This applies to creation, manual growth, and auto-growth of the transaction log. While the zero initialization is taking places, log records cannot be flushed to the log, so auto-growth during a workload that is changing data can lead to a noticeable drop in throughput, especially if the auto-growth size is set to be large (say gigabytes, or left at the default 10%).
Auto-growth should be avoided, then, if at all possible by allowing the transaction log to clear so there is always free space that can be reused for new log records. Transaction log clearing (also known as transaction log truncation, not to be confused with transaction log shrinking) is performed by transaction log backups when using the Full or Bulk-Logged recovery modes, and by checkpoints when using the Simple recovery mode.
Log clearing can only occur if nothing requires the log records in the section of transaction log being cleared. One common problem that prevents log clearing is having long-running transactions. Until a transaction commits or rolls back, all the log records from the beginning of the transaction onwards are required in case the transaction rolls back – including all the log records from other transactions that are interspersed with those from the long-running transaction. Long-running transactions could be because of poor design, code that is waiting for human input, or improper use of nested transactions, for example. All of these can be avoided once they are identified as a problem.
Some high-availability features can also delay transaction log clearing:
- Database mirroring and availability groups when running asynchronously can build up a queue of log records that have not yet been sent to the redundant database copy. These log records must be kept around until they’re sent, delaying transaction log clearing.
- Transactional replication (and also Change Data Capture) relies on a Log Reader Agent job to periodically scan the transaction log for transactions that modify a table contained in a replication publication. If the Log Reader Agent falls behind for any reason, or is purposefully made to run infrequently, all the log records that have not been scanned by the job must be kept around, delaying transaction log clearing.
When running in synchronous mode, database mirroring and availability groups can cause other problems with the logging mechanism. Using synchronous database mirroring as an example, any transaction that commits on the principal cannot actually return to the user or application until all log records it generated have successfully been sent to the mirror server, adding a commit delay on the principal. If the average transaction size is long, and the delay is short, this may not be noticeable, but if the average transaction is very short, and the delay is quite long, this can have a noticeable effect on the workload throughput. In that case, either the performance goals of the workload need to be changed, the high-availability technology changed to asynchronous mode, or the network bandwidth and speed between the principal and redundant databases must be increased.
Incidentally, the same kind of issue can occur if the I/O subsystem itself is synchronously mirrored – with a potential delay for all writes that SQL Server performs.
As you can see, transaction log performance is not just about extra transaction log records being generated – there are many environmental factors that can have a profound effect too.
The bottom line is that transaction log health and performance are of paramount importance for maintaining overall workload performance. In these two posts I’ve detailed the major causes of transaction log performance problems so hopefully you’ll be able to identify and remediate any that you have.
If you want to learn a whole lot more about transaction log operations and performance tuning, I recommend that you check out my 7.5 hour online training course on logging, recovery, and the transaction log, available through Pluralsight.