Monday, March 24, 2008

MySQL InnoDB Performance Tuning for the Solaris 10 OS


By Luojia Chen, Updated February 2008
Abstract: You can maximize the performance of MySQL on the Solaris platform through configuration and tuning of the database server, along with optimizing the Solaris OS for MySQL. This paper is intended to help you define tuning parameters and tune them in your environment.
Contents

- Introduction
- InnoDB User Threads
- Using the Optimized Time Library to Minimize the time(2) System Call
- Using the Solaris OS mtmalloc Memory Allocator in MySQL
- Using 64-bit MySQL on the Solaris Platform
- Building MySQL With Sun Studio 11 Software
- Optimizing File System Performance
- InnoDB Data and Index Cache Size
- Transaction Log Flushing Mode
- Log Buffer Size
- Checkpoint Operation
- Query Cache Size
- Conclusion
- Resources
- About the Author

Introduction

MySQL is one of the world's most popular open source databases. A key strength of the MySQL database is its excellent performance and scalability, making it well suited to serve as a back end for web sites, data warehousing, and other data-intensive applications in the enterprise environment.

To maximize the performance of MySQL on the Solaris Operating System, database server configuration and tuning are important, as well as the optimization of the Solaris OS for MySQL. However, no universal MySQL server tuning parameters apply to all workloads on all platforms; the appropriate parameters will depend on your particular workload, hardware and OS platform, and MySQL usage. This paper is intended to help you define these parameters and tune them in your environment.

MySQL includes several storage engines, including MyISAM, InnoDB, HEAP, and Berkeley DB (BDB). The InnoDB and BDB storage engines support atomic, consistent, isolated, and durable (ACID) transactions with commit, rollback, and crash-recovery capabilities, and only InnoDB supports row-level locking, with queries running as non-locking consistent reads by default.

The InnoDB storage engine supports all four isolation levels: read uncommitted, read committed, repeatable read, and serializable. InnoDB also supports referential integrity through foreign key constraints, and it supports fast record lookups for queries using a primary key. Because of these and other powerful functions and features, InnoDB is often used in large, heavy-load production systems. This paper covers the different ways to improve the use of CPU, memory, and disk drive resources on the Solaris 10 OS. Topics include using optimized libraries, compiling 64-bit MySQL with Sun Studio 11 software, tuning the Solaris UFS file system, and MySQL server configuration and tuning for the InnoDB storage engine on the Solaris platform.

InnoDB User Threads

MySQL is a single-process, multithreaded application. One master thread runs at the highest priority among all MySQL threads and controls the server. This main thread is idle most of the time and “wakes up” every 300 milliseconds (msec) to check whether an action is required, such as flushing dirty blocks in the buffer pool.

In addition to the master thread, a dedicated set of user threads runs at normal priority in the thread pool to handle simultaneous client requests: for each connection, a single thread processes the client request and sends the result back once it is ready. There is also a single user thread that waits for input from the console, and a group of utility threads running at lower priority to handle background tasks.

Currently, MySQL does not scale well with the number of user threads handling client requests. Performance scales up efficiently with each additional user thread until it reaches a peak; beyond that point, additional user connections decrease MySQL performance because of thread concurrency contention. For applications where the number of user connections is tunable, you need to determine -- for each workload -- the optimum number of user connections for peak performance.

We ran the SysBench CPU-bound benchmark (the 1M-row data set fits in the InnoDB data and index cache buffer) on a four-way UltraSPARC IV processor-based server. MySQL performance peaked at 16 user connections and started to drop at 32 user connections, as shown in the following graph. (Please note: Results may vary.)



Figure 1: MySQL 5.0.7 SysBench Connection Scalability Test

This test shows that for the SysBench workload, peak MySQL performance on the UltraSPARC IV processor-based server can be achieved by setting the number of user connections to 4*CPUs on a 1-4 CPU system. Peak MySQL performance scales nearly linearly up to 4 CPUs, and the scalability ratio starts to drop at 8 CPUs. The following figure shows the scalability test results on systems with 1-24 UltraSPARC IV processors. (Please note: Results may vary.)



Figure 2: MySQL 5.0.7 SysBench CPU Scalability Test

For applications where the number of user connections is not tunable, the innodb_thread_concurrency parameter can be configured to cap the number of threads concurrently running inside InnoDB. You need to increase this value when you see many queries waiting in the queue in show innodb status. The semantics of this parameter have changed across releases: in MySQL versions before 5.0.8, setting it above 500 disables the concurrency checking, so as many threads run concurrently inside InnoDB as are needed to handle the different tasks inside the server; from MySQL 5.0.8, setting it equal to or greater than 20 disables the checking; and from MySQL 5.0.19 on, you must set innodb_thread_concurrency to 0 to disable it. For some workloads running on the Solaris platform, when you see a large amount of user-level lock time (LCK in the prstat -mL output), reducing this parameter increases the efficiency of CPU usage and improves overall performance. Tuning this parameter according to the behavior of your system at runtime can affect performance significantly.
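For example, the parameter can be set in the [mysqld] section of the MySQL option file; the value below is only an illustration and should be validated against your own workload and MySQL version:

```ini
# my.cnf (illustrative value only -- tune for your workload)
[mysqld]
# Cap on threads running concurrently inside InnoDB.
# Before 5.0.8:  a value over 500 disables the check.
# 5.0.8-5.0.18:  a value of 20 or more disables the check.
# 5.0.19+:       0 disables the check.
innodb_thread_concurrency = 8
```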

Using the Optimized Time Library to Minimize the time(2) System Call

On the Solaris platform, the time(2) system call takes a trap into the kernel to call gethrestime(), which is expensive and time consuming. When MySQL executes a query, it issues a time(2) system call at the start and at the end to measure how long the query takes. For some workloads, MySQL can spend more than 30 percent of system time on the time(2) system call, consuming a large amount of system CPU cycles, as illustrated below:

# truss -c -p 385
syscall               seconds    calls  errors
read                   28.286   450958    3248
write                  19.516   231648
open                     .000        2
close                    .000        2
time                   45.247   848307
lseek                    .329     6878    6800
alarm                    .140     2218
fdsync                  1.140     5520
fcntl                    .364     6510
lwp_park               12.288   187383
lwp_unpark             11.134   187381
poll                    4.535    67263
sigprocmask             2.072    36030
sigtimedwait             .381     2506
yield                    .741     9829
lwp_kill                 .201     2512
pread                    .000        2
pwrite                  1.040     5527
                     --------   ------    ----
sys totals:           127.447  2051007   10048
usr time:             101.288
elapsed:              123.750

On the Solaris platform, we can optimize the time(2) system call by calling the faster gethrtime(3C) interface instead of trapping into the kernel for gethrestime(). If your workload spends a lot of system CPU time on the time(2) system call (you can verify this by checking the truss output), it may make sense to recompile MySQL by linking it to the optimized time library described in A Performance Optimization for C/C++ Systems That Employ Time-Stamping. Do this by adding LIBS='-lfasttime' at the beginning of the configure file under the MySQL source tree home, or set LD_PRELOAD (32-bit) or LD_PRELOAD_64 (64-bit) to point to the location of the libfasttime.so library. In the OSDL Database Test 2 (DBT2) 10-warehouse workload test on an 8-way UltraSPARC system, the optimized time(2) improved MySQL performance by about 7 percent.
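In script form, the two approaches look like this; the install path /opt/lib is an assumption for illustration, so substitute the directory where libfasttime.so is actually installed on your system:

```shell
# Option 1: rebuild MySQL against the optimized time library by adding
# this line at the top of the configure script in the MySQL source tree:
LIBS='-lfasttime'

# Option 2: preload the library in the mysqld startup script instead of
# rebuilding (paths below are assumed examples):
LD_PRELOAD=/opt/lib/libfasttime.so        # for a 32-bit mysqld
LD_PRELOAD_64=/opt/lib/64/libfasttime.so  # for a 64-bit mysqld
export LD_PRELOAD LD_PRELOAD_64
```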

Using the Solaris OS mtmalloc Memory Allocator in MySQL

The malloc routine on the Solaris platform behaves well from a memory-footprint perspective; however, the default single-threaded malloc in libc serializes concurrent memory allocation requests in a queue, slowing down multithreaded applications. The Solaris OS offers several malloc implementations, including mtmalloc, libumem, and hoard, to improve memory allocation performance for multithreaded applications. It can be difficult to tell which implementation is best for a given application, since that depends on the application's memory allocation pattern and how the algorithms in each implementation fit that pattern.

MySQL actively uses malloc() and free() to allocate memory for unexpectedly large strings, and malloc calls can block mysqld threads on heap contention. Measuring with different memory allocators showed that replacing the default malloc with mtmalloc can significantly improve MySQL performance: we saw about a 65 percent improvement in the SysBench CPU-bound benchmark (the 1M-row data set fits in the InnoDB data and index cache buffer), run on an 8-way dual-core UltraSPARC IV processor-based system with MySQL 5.0.7.

To use mtmalloc, you can preload the mtmalloc library by setting the LD_PRELOAD or LD_PRELOAD_64 environment variable in the MySQL startup script on the Solaris platform, so that you do not have to rebuild the MySQL DB to link to the mtmalloc library.

For 32-bit MySQL, preload mtmalloc by setting LD_PRELOAD as follows:

LD_PRELOAD=/usr/lib/libmtmalloc.so             (x86 and SPARC)

For 64-bit MySQL, set LD_PRELOAD_64 as follows:

LD_PRELOAD_64=/usr/lib/amd64/libmtmalloc.so    (x64)
LD_PRELOAD_64=/usr/lib/sparcv9/libmtmalloc.so  (64-bit SPARC)

The MySQL 5.0 GA binary for the Solaris OS released by MySQL was built with the "--with-mysqld-libs=-lmtmalloc" configuration flag, so it already links against libmtmalloc and you do not need to set the LD_PRELOAD or LD_PRELOAD_64 environment variable to use mtmalloc.

Using 64-bit MySQL on the Solaris Platform

The Solaris OS is a full 64-bit computing environment that provides binary compatibility with 32-bit applications, so both 64-bit and 32-bit MySQL run well on the Solaris platform. Compared with 32-bit MySQL, 64-bit MySQL can address more memory for the data cache, code cache, and metadata caches inside MySQL, reducing disk I/O. In addition, with wider CPU operations in the 64-bit computing environment, 64-bit MySQL operates faster than 32-bit MySQL. The following figure depicts performance data from the SysBench CPU-bound test (the 1M-row data set fits in the InnoDB data and index cache buffer) on an 8-way UltraSPARC IV processor-based server. (Note: Results may vary.)



Figure 3: MySQL 4.1.11 CPU-Bound SysBench Test
Building MySQL With Sun Studio 11 Software

To build 64-bit MySQL on the Solaris platform using the Sun Studio 11 release, use the compiler flags and configuration options as shown here:

With the Solaris OS for x64 platforms:

CC=cc CFLAGS="-xO4 -mt -fsimple=1 -ftrap=%none -nofstore
-xbuiltin=%all -xlibmil -xlibmopt -xtarget=opteron
-xarch=amd64 -xregs=no%frameptr"
CXX=CC CXXFLAGS="-xO3 -mt -fsimple=1 -ftrap=%none -nofstore
-xbuiltin=%all -xlibmil -xlibmopt -xtarget=opteron
-xarch=amd64 -xregs=no%frameptr"
LDFLAGS="-xtarget=opteron -xarch=amd64"
./configure --prefix=/usr/local/mysql
--localstatedir=/usr/local/mysql/data
--libexecdir=/usr/local/mysql/bin --with-extra-charsets=complex
--with-server-suffix=-standard --enable-thread-safe-client
--enable-local-infile --with-named-curses=-lcurses
--with-big-tables --disable-shared --with-readline
--with-archive-storage-engine --with-innodb

With the Solaris OS for SPARC platforms:

CC=cc CFLAGS="-xO4 -mt -fsimple=1 -ftrap=%none
-xbuiltin=%all -xlibmil -xlibmopt -xstrconst -xarch=v9"
CXX=CC CXXFLAGS="-xO3 -noex -mt -fsimple=1
-ftrap=%none -xbuiltin=%all -xlibmil -xlibmopt
-xarch=v9"
./configure --prefix=/usr/local/mysql
--localstatedir=/usr/local/mysql/data
--libexecdir=/usr/local/mysql/bin
--with-extra-charsets=complex
--with-server-suffix=-standard
--enable-thread-safe-client --enable-local-infile
--with-named-z-libs=no --with-big-tables
--disable-shared --with-readline
--with-archive-storage-engine --with-innodb

To compare the performance of MySQL built with different compiler releases (Sun Studio 10 and Sun Studio 11), we used the DBT2 test suite. The workload represents a wholesale parts supplier operating out of a number of warehouses and associated sales districts. The tasks mix read-only and update-intensive transactions that enter and deliver orders, record payments, check the status of orders, and monitor the level of stock at the warehouses. The nine tables (Warehouse, District, Item, Stock, Customer, Order, New-Order, Order-Line, and History) scale with the number of warehouses, except the Item table.

We ran the DBT2 test using the 10-warehouse database on the Solaris 10 OS on four dual-core 2200-MHz AMD Opteron processor-based Sun Fire x64 servers. In this test case, most database queries were cached in the InnoDB buffer, and most of the CPU time was spent processing the queries, so the system was CPU bound. The test metric was throughput, that is, new-order transactions per minute. As shown in our test result data in the following figure, MySQL built with Sun Studio 11 performed around 13 percent better than MySQL built with Sun Studio 10 software. (Note: Results may vary.)



Figure 4: MySQL 5.0.15 DBT2 on Solaris 10 OS

In addition to the performance improvement over Sun Studio 10 on x64 architecture-based systems, Sun Studio 11 provides multicore and chip multithreading (CMT) optimizations for UltraSPARC processor-based systems. The Sun Studio 11 release also includes an advanced graphical debugger tool for easily setting breakpoints, examining variables, navigating the call stack, and debugging the multithreaded code inside MySQL. The sophisticated performance analysis tool in Sun Studio 11 software adds data-space profiling on UltraSPARC processor-based systems to evaluate the performance costs associated with an application's memory references. Detailed instructions on how to use these new debugger and performance analyzer features can be found in the comprehensive user guide.

Sun Studio 11 software also bundles the dmake tool, which can compile the MySQL source code in parallel. dmake can significantly improve compilation performance compared to make on a multiprocessor system: on an 8-core Sun Fire T2000 system with an UltraSPARC T1 processor, compiling the MySQL source code with dmake -j 64 took only 9 minutes -- over 3 times faster than make, which took 29 minutes.

Optimizing File System Performance

File system cluster size can have a big impact on system performance -- particularly when MySQL is running a workload with a database much bigger than system memory. On the Solaris platform, the UFS file system cluster size (the maxcontig parameter) is set to 128 by default, and the file system block size on the Solaris 10 OS is 8 Kbytes on both SPARC and x86/x64 platforms. You can get the values of maxcontig and bsize by running the mkfs -F ufs -m or fstyp -v command against the file system. With these defaults, a read triggers read-ahead of the whole cluster length (128 * 8 Kbytes), even for random I/O, which can saturate a disk and significantly degrade performance.

One way to solve this problem is to reduce the value of the maxcontig parameter so that the disk I/O transfer size matches the DB block size. You can change the maxcontig value by running the tunefs -a <maxcontig> command on the file system. The shortcoming of this approach is that it may hurt the performance of other workloads that issue large sequential I/O.

Another solution is to enable file system Direct I/O by mounting the file system with the forcedirectio option, which automatically disables read-ahead. In addition, since MySQL maintains its own data and cache buffers, Direct I/O also bypasses the file system page cache, saving the CPU cycles otherwise spent on double buffering. The following figure shows performance data on a Sun Fire V65x server running the SysBench I/O-bound test (the 100M-row data set does not fit in the InnoDB data and index cache buffer). The test compared the default maxcontig, maxcontig set to 5 (a disk transfer size of 5 * 4 Kbytes), and Direct I/O. (Note: Results may vary.)
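For example, maxcontig can be lowered with tunefs -a 5 /dev/rdsk/<device>, and Direct I/O can be enabled persistently with an /etc/vfstab entry along these lines (the device names and mount point are placeholders):

```
#device to mount   device to fsck      mount point  FS type  fsck pass  mount at boot  mount options
/dev/dsk/c1t0d0s6  /dev/rdsk/c1t0d0s6  /mysql-data  ufs      2          yes            forcedirectio
```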



Figure 5: MySQL 4.1.11 I/O-Bound SysBench Test
InnoDB Data and Index Cache Size

MySQL does not access the disk directly; instead, it reads data into its internal buffer cache, reads and writes blocks there, and flushes the changes back to disk. If the server requests data available in the cache, the data can be processed right away; otherwise, it must be loaded from disk. The bigger the cache size, the more disk accesses can be avoided. The default value of 8 Mbytes is too small for most workloads. You will need to increase this number when you see %b (percentage utilization of the disk) above 60 percent and svc_t (response time) above 35 msec in the iostat -xnt 5 trace output, along with a high number of reads in the FILE IO part of the show innodb status output.

However, do not set the innodb_buffer_pool_size parameter so high that other processes are left without enough RAM and the system starts paging, which will significantly degrade performance. For a system dedicated to a single MySQL instance, it is generally fine to set innodb_buffer_pool_size to between 70 and 80 percent of memory, since the footprint of the mysqld process itself is only around 2 to 3 Mbytes.
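For example, on a dedicated server with 4 Gbytes of RAM (an assumed configuration, for illustration only), the setting might look like this in my.cnf:

```ini
[mysqld]
# Dedicated 4-Gbyte server (example): roughly 75 percent of RAM.
# The 8-Mbyte default is too small for most workloads.
innodb_buffer_pool_size = 3000M
```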

Transaction Log Flushing Mode

InnoDB flushes the transaction log to disk approximately once per second in the background. By default, the log is also flushed to disk on each transaction commit. The safest way to avoid losing transactions in the event of a MySQL, OS, or hardware crash is to keep the mode innodb_flush_log_at_trx_commit = 1.

For workloads with many short transactions, you can reduce disk writes by setting the innodb_flush_log_at_trx_commit parameter to a different value.

With a value of 0, the log is not flushed on transaction commit at all. This reduces disk I/O and improves performance; however, up to a second of transactions can be lost if MySQL crashes.

With a value of 2, the log is written to the OS (file system) cache rather than to disk on each transaction commit. This also reduces disk I/O and performs a little slower than a value of 0; no transactions are lost if only MySQL crashes, although a loss will probably occur if the OS or hardware crashes.
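The three modes can be summarized in my.cnf terms as follows (the value 2 shown here is an example trade-off, not a recommendation):

```ini
[mysqld]
# 1 = flush log to disk at every commit (default, safest)
# 2 = write to the OS cache at commit; survives a mysqld crash,
#     but not an OS or hardware crash
# 0 = flush about once per second; fastest, but the most recent
#     transactions can be lost if mysqld crashes
innodb_flush_log_at_trx_commit = 2
```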

Log Buffer Size

With innodb_flush_log_at_trx_commit set to 1, the log buffer is flushed to disk on each transaction commit; for large transactions, a bigger buffer lets log records accumulate in memory until commit instead of forcing intermediate writes to disk, reducing disk I/O. If you see heavy log I/O in the show innodb status output at runtime, you probably need to set a larger value for the innodb_log_buffer_size parameter. For most workloads without long transactions, there is no need to waste memory on a large log buffer; it is usually fine to set it between 8 and 64 Mbytes.
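A sketch of the corresponding my.cnf setting (the value is an example within the range suggested above):

```ini
[mysqld]
# Rarely needs to be large unless transactions are long;
# 8M-64M is usually sufficient.
innodb_log_buffer_size = 16M
```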

Checkpoint Operation

The recovery management subsystem inside InnoDB flushes database pages and transaction operations to the log files for the purpose of backup and recovery. It implements a fuzzy checkpoint operation, continually flushing modified database pages from the buffer pool in small batches. InnoDB writes to its log files in a circular fashion, so when a log file reaches the configured limit set by the innodb_log_file_size parameter, a checkpoint operation is executed at once to flush the modified database pages, ensuring that committed modifications are available in the log files in case of recovery.

The size of each log file should be chosen to avoid executing checkpoint operations too often; a bigger log file reduces checkpoint disk I/O. You will need to increase this parameter when you see heavy page writes in the BUFFER POOL AND MEMORY part of the show innodb status output. However, a larger log file increases the "redo" recovery time in case of a server crash.
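As an illustration, a larger redo log might be configured as follows in my.cnf (the values are examples; balance checkpoint I/O against recovery time for your workload):

```ini
[mysqld]
# Bigger files mean fewer checkpoints but longer crash recovery.
innodb_log_file_size = 256M
# Number of log files in the circular group (the default is 2).
innodb_log_files_in_group = 2
```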

Query Cache Size

In addition to the data and index buffer cache, MySQL version 4.0.1 and later has a useful feature called the query cache, which stores SELECT statements issued by clients together with their result sets. Identical queries can then be served without repeated hard parsing. Because the result set is stored as well, the overhead of rebuilding complex result sets from disk or memory caches is avoided, reducing both physical and logical I/O. For applications that execute the same queries repeatedly, the query cache can greatly improve response time.

The query_cache_size parameter allocates the amount of memory used to cache frequently executed queries so that result sets can be returned to clients without real query execution. The query_cache_type parameter enables or disables the query cache in different modes. To decide how to set these two parameters, check the Qcache_inserts, Qcache_hits, and Qcache_free_memory status variables at runtime: Qcache_inserts is the number of queries added to the query cache, Qcache_hits is the number of query results served from the cache without real query execution, and Qcache_free_memory is the amount of query cache memory still unused.

If you see a high value for Qcache_hits relative to your total queries at runtime, or a low value for Qcache_free_memory, you probably need to increase query_cache_size accordingly; otherwise, decrease it to free memory for the other MySQL cache buffers. If Qcache_hits stays at 0 at runtime, turn the query cache off entirely by setting query_cache_type to 0 and query_cache_size to 0.

The query_cache_limit parameter sets the maximum size of an individual result set that can be stored in the query cache. A low ratio of Qcache_hits to Qcache_inserts at runtime may also be caused by too low a setting for query_cache_limit; in that case, increase it so that large query result sets can be cached.
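Putting the three query cache parameters together, an illustrative my.cnf fragment follows; check the Qcache_* counters (for example, with show status like 'Qcache%') before settling on values:

```ini
[mysqld]
query_cache_type  = 1    # 0 = off, 1 = on, 2 = on demand only
query_cache_size  = 64M  # total memory for cached result sets
query_cache_limit = 2M   # largest single result set that is cached
```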

Conclusion
The outstanding performance and scalability of MySQL can be enhanced even further by tuning the storage engines within MySQL for specific workloads running on Sun systems. While there are many variables that impact performance and many parameters for tuning for each workload, this paper has attempted to provide general guidelines and practical advice to help you optimize the performance of InnoDB on the Solaris platform. We welcome your feedback on this paper and your insights into achieving higher performance for MySQL on Sun systems.

Resources
About the Author
Luojia Chen is a software engineer in Sun's Market Development Engineering organization, in the open source team. She is currently responsible for MySQL adoption and migration for Sun's latest technologies, and she is focused on making MySQL run and scale well on the Solaris platform. She can be reached at luojia.chen@sun.com.
