MySQL-HA


A little bit of untruthiness about MySQL and Threads

Posted in performance by mtaylor on January 25, 2008

Curt Monash has an interesting post. As with everything on the web, I agree and disagree with various bits – which is one of the great things about blogging … but this comment concerns me enough to respond, since it’s not so much an opinion but incorrect technical information…

As for your distinction between too many connections and memory pools — lack of memory is the reason for limits on connections.

Lack of memory is actually _NOT_ the reason for limits on connections in a well tuned MySQL installation. MySQL is a multi-threaded application and as such in its current implementation allocates a thread to a connection. In the standard web hosting platform, which is Linux, there is a point at which Linux itself can’t actually deal with the threads effectively, thus getting you into an overloaded run queue situation. The answer to high-traffic and high concurrency situations is actually often to lower the number of connections to decrease the amount of time linux spends managing the thread queue and thus lowering your individual thread service time. People fight this a lot, since they think “I’m maxing out my connections, I should increase the setting”, but it is quite easy to demonstrate.

There is a patch coming up to decouple the two which should help the few people out there who cannot, for whatever reason, manage a sensible number of connections. But it is very rare that I see a client for whom this “problem” can’t be managed or fixed by simple education.

In any case, the issue is almost never (I say almost because you can always grossly misconfigure anything) memory related.

What’s more, my web host, who to my knowledge doesn’t handle any terribly busy sites, finds that MySQL will at times eat up all his RAM even so.

In that case, I would suggest that you tell your web host to learn how to configure MySQL properly. MySQL will only eat up all his RAM if he configures it to do so. If he isn’t sure how to do that, I’d be happy to help him out!

I would also like to add that there is often an unfair double standard applied to MySQL as far as this goes. No one seems to doubt that a well functioning Oracle system needs a team of well-trained Oracle DBAs to tune and run. But many times when I suggest that a problem can be solved by learning a little more about how MySQL works, people tell me that I have just some how pointed out a deficiency in MySQL. Try running an top-10 web property with 100+ Oracle databases with a team of 1 DBA and see how long it remains running. I’d be happy to take the challenge of running the same thing on MySQL.

Technorati Tags:

MySQL 5.1 auto-inc patch in action: InnoDB scalability test

Posted in Benchmarks, InnoDB, performance by alikrubin on November 8, 2007

I’ve recently done a quick scalability test with MySQL 5.0 and 5.1 to check the new auto-inc patch with InnoDB and to see how MySQL 5.1 scales with InnoDB:

New in MySQL 5.1: innodb_autoinc_lock_mode = 1 (“consecutive” lock mode)
With this lock mode, “simple inserts” (only) use a new locking model where a light-weight mutex is used during the allocation of auto-increment values, and no AUTO-INC table-level lock is used, unless an AUTO-INC lock is held by another transaction. If another transaction does hold an AUTO-INC lock, a “simple insert” waits for the AUTO-INC lock, as if it too were a “bulk insert.”
http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html#innodb-auto-increment-configurable

This fixes the Auto-Inc bug with InnoDB

I’ve run sysbench in OTLP mode against MySQL 5.0.45 and 5.1.22-rc. Insert queries with auto_inc field were running in multiple threads: 1, 4, 16, 64, 128 and 256 threads on 4 cores box. Benchmark is CPU bound (except for log flushing/fsync).

Results

With 256 threads MySQL 5.0 starts deadlocking and MySQL 5.1 shows no deadlocks.
However, MySQL 5.1 is still affected by broken group commit bug for InnoDB, so enabling bin-log and using innodb_flush_log_at_trx_commit = 1 with no battery backup cache caused significant decrease in performance: tests showed only around 100 tps constantly with MySQL 5.1 and MySQL 5.0. Waiting on InnoDB logs/bin logs serialized transactions; with this situation speed depends upon hardware. After enabling Battery Backup Cache (BBU cache) on RAID we were able to handle around 2000 tps.

PeterZ originally performed InnoDB scalability tests for MySQL 5.0 to show broken group commit bug.

(more…)

further malloc() scaling

Posted in performance by mtaylor on September 10, 2007

As a quick follow up to the last post on malloc() speed comparisons, I decided to extend the size out a bit to cover malloc()ing up to 500M.

Time for        128k:   0.035259
Time for        256k:   0.009718
Time for        1M:     0.478129
Time for        5M:     0.968945
Time for        10M:    0.965172
Time for        50M:    0.674316
Time for        500M:   1.018901

As you can see, once you make the jump up to mmap() (>256k), the cost is fairly well constant (give or take fluctuations). So it’s not that huge memory buffers are terrible, just that there is a cost difference between the smaller and larger buffer sizes that may or may not matter in your case.

more on malloc() speed

Posted in performance by mtaylor on September 10, 2007

The recent read_buffer issue got me curious, so I hacked up a quick (ugly) test program to see if I could show the different speeds of malloc()ing different buffer sizes. Here’s the test code:

[C]
#include
#include
#include

#define LOOP 10000

int main(void) {

int x[4] = { 128,256,1024,5*1024 } ;

int f = 0;
for(f=0;f<4;f++) {
int val = x[f];

timeval before;
timeval after;

gettimeofday(&before,0);

int loop=0;
for(loop=0;loop after.tv_usec) {
udiff = (float)(after.tv_usec + 10000000 – before.tv_usec)/10000000.0;
diff = after.tv_sec – 1 – before.tv_sec;
} else {
udiff = (float)(after.tv_usec – before.tv_usec)/1000000.0;
diff = after.tv_sec – before.tv_sec;
}
printf(“Time for %dk:\t%f\n”,val,((float)diff+udiff));

}

return 0;
}
[/C]

And the results.

Time for 128k:  0.002641
Time for 256k:  0.002628
Time for 1024k: 0.097950
Time for 5120k: 0.060240

Which is rather huge and in line with the read buffer performance we were talking about. malloc()ing 1M isn’t 4x as slow as doing 256k… it’s 37x as slow. Obviously, there is some variance here, and I should probably make this run a little longer. So setting LOOP to 100000, I get this:

Time for 128k:  0.035105
Time for 256k:  0.028631
Time for 1024k: 0.942634
Time for 5120k: 0.985856

Luckily this shows about 10x the performance of before, so we’re at least on the right track here.

The long and short of this is that bigger memory buffers aren’t always better. Much depends on how and when they are used. Much also depends on where your bottleneck is. Many database loads are disk bound, so the cost of 37x isn’t much, since we’re talking microseconds here anyway. But when you’re talking high concurrency and everything is in memory, shaving microseconds from your queries can actually be crucial.

Read Buffer performance hit

Posted in performance by mtaylor on September 6, 2007

I had some fun yesterday with some odd performance problems. So I did a run with oprofile and got this:

561612   25.0417  /lib64/tls/libc-2.3.4.so memset
429457   19.1491  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux clear_page
214268    9.5540  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_page_fault
144293    6.4339  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_no_page
94410     4.2097  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux buffered_rmqueue
64998     2.8982  /lib64/tls/libc-2.3.4.so memcpy
59565     2.6559  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __down_read_trylock
59369     2.6472  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux handle_mm_fault
47312     2.1096  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux free_hot_cold_page
39161     1.7462  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux release_pages
39140     1.7452  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux unmap_vmas
27200     1.2128  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __alloc_pages
21520     0.9596  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __pagevec_lru_add_active
20772     0.9262  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux page_add_anon_rmap
19852     0.8852  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux page_remove_rmap
16424     0.7323  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux bad_range
11993     0.5348  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __might_sleep

Which seemed to me like an awful lot of page faults, especially on a system that wasn’t
swapping at all.

Alexander and I got to talking about it, and I learned a few new things. The first thing is that mmap() appears as page_faults, because the machinery for doing mmap() and for doing swap is actually the same. Fun.

Back to the problem at hand… the following was in the my.cnf:

read_buffer=5M

As I understand it, if read_buffer set to over 256K, it uses mmap() instead of malloc() for memory
allocation. Actually – this is a libc malloc thing and is tunable, but defaults to 256k. From the manual:

Each thread that does a sequential scan allocates a buffer of this size (in bytes) for each table it scans.

In this case, the server is pretty much handling click logging, so all of the queries are pure inserts. Why then would the read buffer get used at all?

Well, it just so happens that this is a PHP app and is not using any sort of persistent connections, so the app is connecting and disconnecting 60 – 100 times a second. Each of those connections is going to have to authenticate against (at least) the mysql.user table. On the other hand, that _should_ be cached, so it shouldn’t be a problem. However, lowering the read buffer is quite easy … and it’ll be interesting.

So, theory to the fire – I reduced read_buffer to 128k (the default) and then got this output from oprofile:

2220     15.7760  /lib64/tls/libc-2.3.4.so memset
785       5.5785  /lib64/tls/libc-2.3.4.so memcpy
608       4.3206  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux thread_return
398       2.8283  /usr/libexec/mysqld      yyparse(void*)
337       2.3948  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux system_call
234       1.6629  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_gettimeoffset_hpet
197       1.3999  /lib64/tls/libc-2.3.4.so _int_malloc
183       1.3005  /lib64/tls/libpthread-2.3.4.so pthread_mutex_lock
181       1.2862  /lib64/tls/libpthread-2.3.4.so pthread_mutex_unlock
181       1.2862  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux clear_page
176       1.2507  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux wake_futex
143       1.0162  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux sys_tgkill
138       0.9807  /usr/libexec/mysqld      yylex(void*, void*)
130       0.9238  /usr/libexec/mysqld      update_sys_var_str(sys_var_str*, _pthread_rwlock_t*, set_var*)
126       0.8954  /jbd                     (no symbols)
123       0.8741  /usr/libexec/mysqld      my_strntoul_ucs2
106       0.7533  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __might_sleep
106       0.7533  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_futex
104       0.7391  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux tcp_recvmsg

Which is more what we would expect.

The results on the system as a whole were that with read buffer set to 5M, extra 3-5% usr and 6-12% sys cpu time taken on an 8 cpu box (as reported by top) or, to look at it another way, about 100% of a single CPU taken by mysqld.

With read_buffer reduced to 128k, the CPU usage dropped to just about nothing.

OProfile, Kernel Images and InnoDB (oh my!)

Posted in performance by mtaylor on December 6, 2006

Kristian Köhntopp has a wonderful article about using oprofile to track down problems in running programs. I thought I’d add a few thoughts.

If you need to get a vmlinux kernel on redhat, apparently you just need to install kernel-debuginfo, which will provide a vmlinux image you can profile against.

If you are using debian, unfortunately there is no package I could find to allow you to get a vmlinux. so what I did was:
($kver isn’t a real variable – it’s your kernel version. tab completion probably comes in handy at some point)

  1. Install linux-tree-$kver – which gets you the debian kernel sources
  2. Unpack the tar.bz2 file that is now in /usr/src
  3. Copy /boot/config-$kver to /usr/src/linux-$kver/.config
  4. cd /usr/src/linux-$kver
  5. make oldconfig
  6. make prepare
  7. make vmlinux
  8. cp vmlinux /boot

And now I get things that look like this:

322451    5.7214  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  cmp_dtuple_rec_with_match
304963    5.4111  /lib/libpthread-0.60.so                                                                            __pthread_rwlock_rdlock_internal
275748    4.8927  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  my_lengthsp_8bit
275503    4.8884  /boot/vmlinux                                                                                       try_to_wake_up
221124    3.9235  /lib/libpthread-0.60.so                                                                            __pthread_rwlock_wrlock_internal
188777    3.3496  /boot/vmlinux                                                                                       futex_wait
170249    3.0208  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  btr_search_guess_on_hash
144030    2.5556  /lib/libc-2.3.2.so                                                                                   _wordcopy_bwd_aligned
131726    2.3373  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  rec_get_nth_field
125462    2.2261  /lib/libpthread-0.60.so                                                                            __pthread_mutex_unlock_internal
122389    2.1716  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  safe_mutex_lock
116276    2.0631  /boot/vmlinux                                                                                       futex_wake
115654    2.0521  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  safe_mutex_unlock

Which – btw – shows what your system looks like when it’s experiencing the InnoDB thread concurrency bug.