A little bit of untruthiness about MySQL and Threads
Curt Monash has an interesting post. As with everything on the web, I agree and disagree with various bits – which is one of the great things about blogging … but this comment concerns me enough to respond, since it’s not so much an opinion but incorrect technical information…
As for your distinction between too many connections and memory pools — lack of memory is the reason for limits on connections.
Lack of memory is actually _NOT_ the reason for limits on connections in a well tuned MySQL installation. MySQL is a multi-threaded application and as such in its current implementation allocates a thread to a connection. In the standard web hosting platform, which is Linux, there is a point at which Linux itself can’t actually deal with the threads effectively, thus getting you into an overloaded run queue situation. The answer to high-traffic and high concurrency situations is actually often to lower the number of connections to decrease the amount of time linux spends managing the thread queue and thus lowering your individual thread service time. People fight this a lot, since they think “I’m maxing out my connections, I should increase the setting”, but it is quite easy to demonstrate.
There is a patch coming up to decouple the two which should help the few people out there who cannot, for whatever reason, manage a sensible number of connections. But it is very rare that I see a client for whom this “problem” can’t be managed or fixed by simple education.
In any case, the issue is almost never (I say almost because you can always grossly misconfigure anything) memory related.
What’s more, my web host, who to my knowledge doesn’t handle any terribly busy sites, finds that MySQL will at times eat up all his RAM even so.
In that case, I would suggest that you tell your web host to learn how to configure MySQL properly. MySQL will only eat up all his RAM if he configures it to do so. If he isn’t sure how to do that, I’d be happy to help him out!
I would also like to add that there is often an unfair double standard applied to MySQL as far as this goes. No one seems to doubt that a well functioning Oracle system needs a team of well-trained Oracle DBAs to tune and run. But many times when I suggest that a problem can be solved by learning a little more about how MySQL works, people tell me that I have just some how pointed out a deficiency in MySQL. Try running an top-10 web property with 100+ Oracle databases with a team of 1 DBA and see how long it remains running. I’d be happy to take the challenge of running the same thing on MySQL.
Technorati Tags: mysql threads performance
MySQL 5.1 auto-inc patch in action: InnoDB scalability test
I’ve recently done a quick scalability test with MySQL 5.0 and 5.1 to check the new auto-inc patch with InnoDB and to see how MySQL 5.1 scales with InnoDB:
New in MySQL 5.1: innodb_autoinc_lock_mode = 1 (“consecutive” lock mode)
With this lock mode, “simple inserts” (only) use a new locking model where a light-weight mutex is used during the allocation of auto-increment values, and no AUTO-INC table-level lock is used, unless an AUTO-INC lock is held by another transaction. If another transaction does hold an AUTO-INC lock, a “simple insert” waits for the AUTO-INC lock, as if it too were a “bulk insert.”
http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html#innodb-auto-increment-configurable
This fixes the Auto-Inc bug with InnoDB
I’ve run sysbench in OTLP mode against MySQL 5.0.45 and 5.1.22-rc. Insert queries with auto_inc field were running in multiple threads: 1, 4, 16, 64, 128 and 256 threads on 4 cores box. Benchmark is CPU bound (except for log flushing/fsync).
Results
With 256 threads MySQL 5.0 starts deadlocking and MySQL 5.1 shows no deadlocks.
However, MySQL 5.1 is still affected by broken group commit bug for InnoDB, so enabling bin-log and using innodb_flush_log_at_trx_commit = 1 with no battery backup cache caused significant decrease in performance: tests showed only around 100 tps constantly with MySQL 5.1 and MySQL 5.0. Waiting on InnoDB logs/bin logs serialized transactions; with this situation speed depends upon hardware. After enabling Battery Backup Cache (BBU cache) on RAID we were able to handle around 2000 tps.
PeterZ originally performed InnoDB scalability tests for MySQL 5.0 to show broken group commit bug.
further malloc() scaling
As a quick follow up to the last post on malloc() speed comparisons, I decided to extend the size out a bit to cover malloc()ing up to 500M.
Time for 128k: 0.035259 Time for 256k: 0.009718 Time for 1M: 0.478129 Time for 5M: 0.968945 Time for 10M: 0.965172 Time for 50M: 0.674316 Time for 500M: 1.018901
As you can see, once you make the jump up to mmap() (>256k), the cost is fairly well constant (give or take fluctuations). So it’s not that huge memory buffers are terrible, just that there is a cost difference between the smaller and larger buffer sizes that may or may not matter in your case.
more on malloc() speed
The recent read_buffer issue got me curious, so I hacked up a quick (ugly) test program to see if I could show the different speeds of malloc()ing different buffer sizes. Here’s the test code:
[C]
#include
#include
#include
#define LOOP 10000
int main(void) {
int x[4] = { 128,256,1024,5*1024 } ;
int f = 0;
for(f=0;f<4;f++) {
int val = x[f];
timeval before;
timeval after;
gettimeofday(&before,0);
int loop=0;
for(loop=0;loop after.tv_usec) {
udiff = (float)(after.tv_usec + 10000000 – before.tv_usec)/10000000.0;
diff = after.tv_sec – 1 – before.tv_sec;
} else {
udiff = (float)(after.tv_usec – before.tv_usec)/1000000.0;
diff = after.tv_sec – before.tv_sec;
}
printf(“Time for %dk:\t%f\n”,val,((float)diff+udiff));
}
return 0;
}
[/C]
And the results.
Time for 128k: 0.002641 Time for 256k: 0.002628 Time for 1024k: 0.097950 Time for 5120k: 0.060240
Which is rather huge and in line with the read buffer performance we were talking about. malloc()ing 1M isn’t 4x as slow as doing 256k… it’s 37x as slow. Obviously, there is some variance here, and I should probably make this run a little longer. So setting LOOP to 100000, I get this:
Time for 128k: 0.035105 Time for 256k: 0.028631 Time for 1024k: 0.942634 Time for 5120k: 0.985856
Luckily this shows about 10x the performance of before, so we’re at least on the right track here.
The long and short of this is that bigger memory buffers aren’t always better. Much depends on how and when they are used. Much also depends on where your bottleneck is. Many database loads are disk bound, so the cost of 37x isn’t much, since we’re talking microseconds here anyway. But when you’re talking high concurrency and everything is in memory, shaving microseconds from your queries can actually be crucial.
Read Buffer performance hit
I had some fun yesterday with some odd performance problems. So I did a run with oprofile and got this:
561612 25.0417 /lib64/tls/libc-2.3.4.so memset 429457 19.1491 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux clear_page 214268 9.5540 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_page_fault 144293 6.4339 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_no_page 94410 4.2097 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux buffered_rmqueue 64998 2.8982 /lib64/tls/libc-2.3.4.so memcpy 59565 2.6559 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __down_read_trylock 59369 2.6472 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux handle_mm_fault 47312 2.1096 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux free_hot_cold_page 39161 1.7462 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux release_pages 39140 1.7452 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux unmap_vmas 27200 1.2128 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __alloc_pages 21520 0.9596 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __pagevec_lru_add_active 20772 0.9262 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux page_add_anon_rmap 19852 0.8852 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux page_remove_rmap 16424 0.7323 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux bad_range 11993 0.5348 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __might_sleep
Which seemed to me like an awful lot of page faults, especially on a system that wasn’t
swapping at all.
Alexander and I got to talking about it, and I learned a few new things. The first thing is that mmap() appears as page_faults, because the machinery for doing mmap() and for doing swap is actually the same. Fun.
Back to the problem at hand… the following was in the my.cnf:
read_buffer=5M
As I understand it, if read_buffer set to over 256K, it uses mmap() instead of malloc() for memory
allocation. Actually – this is a libc malloc thing and is tunable, but defaults to 256k. From the manual:
Each thread that does a sequential scan allocates a buffer of this size (in bytes) for each table it scans.
In this case, the server is pretty much handling click logging, so all of the queries are pure inserts. Why then would the read buffer get used at all?
Well, it just so happens that this is a PHP app and is not using any sort of persistent connections, so the app is connecting and disconnecting 60 – 100 times a second. Each of those connections is going to have to authenticate against (at least) the mysql.user table. On the other hand, that _should_ be cached, so it shouldn’t be a problem. However, lowering the read buffer is quite easy … and it’ll be interesting.
So, theory to the fire – I reduced read_buffer to 128k (the default) and then got this output from oprofile:
2220 15.7760 /lib64/tls/libc-2.3.4.so memset 785 5.5785 /lib64/tls/libc-2.3.4.so memcpy 608 4.3206 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux thread_return 398 2.8283 /usr/libexec/mysqld yyparse(void*) 337 2.3948 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux system_call 234 1.6629 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_gettimeoffset_hpet 197 1.3999 /lib64/tls/libc-2.3.4.so _int_malloc 183 1.3005 /lib64/tls/libpthread-2.3.4.so pthread_mutex_lock 181 1.2862 /lib64/tls/libpthread-2.3.4.so pthread_mutex_unlock 181 1.2862 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux clear_page 176 1.2507 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux wake_futex 143 1.0162 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux sys_tgkill 138 0.9807 /usr/libexec/mysqld yylex(void*, void*) 130 0.9238 /usr/libexec/mysqld update_sys_var_str(sys_var_str*, _pthread_rwlock_t*, set_var*) 126 0.8954 /jbd (no symbols) 123 0.8741 /usr/libexec/mysqld my_strntoul_ucs2 106 0.7533 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __might_sleep 106 0.7533 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_futex 104 0.7391 /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux tcp_recvmsg
Which is more what we would expect.
The results on the system as a whole were that with read buffer set to 5M, extra 3-5% usr and 6-12% sys cpu time taken on an 8 cpu box (as reported by top) or, to look at it another way, about 100% of a single CPU taken by mysqld.
With read_buffer reduced to 128k, the CPU usage dropped to just about nothing.
OProfile, Kernel Images and InnoDB (oh my!)
Kristian Köhntopp has a wonderful article about using oprofile to track down problems in running programs. I thought I’d add a few thoughts.
If you need to get a vmlinux kernel on redhat, apparently you just need to install kernel-debuginfo, which will provide a vmlinux image you can profile against.
If you are using debian, unfortunately there is no package I could find to allow you to get a vmlinux. so what I did was:
($kver isn’t a real variable – it’s your kernel version. tab completion probably comes in handy at some point)
- Install linux-tree-$kver – which gets you the debian kernel sources
- Unpack the tar.bz2 file that is now in /usr/src
- Copy /boot/config-$kver to /usr/src/linux-$kver/.config
- cd /usr/src/linux-$kver
- make oldconfig
- make prepare
- make vmlinux
- cp vmlinux /boot
And now I get things that look like this:
322451 5.7214 /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld cmp_dtuple_rec_with_match 304963 5.4111 /lib/libpthread-0.60.so __pthread_rwlock_rdlock_internal 275748 4.8927 /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld my_lengthsp_8bit 275503 4.8884 /boot/vmlinux try_to_wake_up 221124 3.9235 /lib/libpthread-0.60.so __pthread_rwlock_wrlock_internal 188777 3.3496 /boot/vmlinux futex_wait 170249 3.0208 /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld btr_search_guess_on_hash 144030 2.5556 /lib/libc-2.3.2.so _wordcopy_bwd_aligned 131726 2.3373 /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld rec_get_nth_field 125462 2.2261 /lib/libpthread-0.60.so __pthread_mutex_unlock_internal 122389 2.1716 /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld safe_mutex_lock 116276 2.0631 /boot/vmlinux futex_wake 115654 2.0521 /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld safe_mutex_unlock
Which – btw – shows what your system looks like when it’s experiencing the InnoDB thread concurrency bug.