July 26, 2006
Linux Kernel Killing off Processes on 2.6.15 AMD64?

Lately we've had some random issues here at ufies.org that maybe the inter-tubes can help me with. Around 3am, about the same time the main backup process runs, the kernel on this system:
Linux master 2.6.15-gentoo-r5 #1 SMP Tue Feb 21 17:49:33 PST 2006 x86_64 Intel(R) Xeon(TM) CPU 3.40GHz GNU/Linux
decides to randomly kill off some processes (apache, mysql, ...). This was an occasional annoyance, but now it's almost a nightly thing, and for those of us who receive the alerts (luckily not me :) it's a real pisser. The system has gobs of RAM in it and we use pretty much no swap:
master cron.daily # free -m
             total       used       free     shared    buffers     cached
Mem:          5969       5514        455          0        296       2490
-/+ buffers/cache:       2726       3243
Swap:         1906          0       1905

Out of 6G of RAM, only about 2.7G is used by processes; the rest is either free or holding buffers and cached data. No swap is used. Fred dug up this post indicating this might be a 2.6.15 bug on AMD64, and I'm wondering if just upgrading to 2.6.16 (or 2.6.17, which just went stable) would fix it all. Has anyone had a similar experience? Can you lend a hand?
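The "-/+ buffers/cache" row is plain arithmetic on the Mem: row: subtract buffers and cache from "used", add them to "free". Here's a quick Python check using the numbers above. The second half cross-checks free's totals against the page counts printed at the end of the kernel dump further down (1703936 pages of RAM, 175694 reserved, 1951888kB total swap), assuming 4 kB pages, the x86_64 default.

```python
# Figures copied from the `free -m` output above, in MB.
used, free_mb, buffers, cached = 5514, 455, 296, 2490

# "-/+ buffers/cache": memory used by processes once buffers and page cache
# are discounted, and what would be available if that cache were dropped.
app_used = used - buffers - cached      # 2728 MB
app_free = free_mb + buffers + cached   # 3241 MB

# Cross-check against the end of the kernel dump.
# Assumption: 4 kB pages (the x86_64 default).
PAGE_KB = 4
ram_pages, reserved_pages = 1703936, 175694
usable_mb = (ram_pages - reserved_pages) * PAGE_KB // 1024  # 5969, free's Mem "total"
swap_total_mb = 1951888 // 1024                             # 1906, free's Swap "total"

print(f"-/+ buffers/cache: used {app_used} MB, free {app_free} MB")
print(f"RAM minus reserved pages: {usable_mb} MB; total swap: {swap_total_mb} MB")
```

The results land within a couple of MB of free's own row (2726 and 3243), presumably because free does the subtraction in kB and only then converts each figure to MB.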

Read on for the nitty-gritty details of the kernel dump.

oom-killer: gfp_mask=0xd1, order=0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 0, high 0, batch 1 used:0
cpu 0 cold: low 0, high 0, batch 1 used:0
cpu 1 hot: low 0, high 0, batch 1 used:0
cpu 1 cold: low 0, high 0, batch 1 used:0
cpu 2 hot: low 0, high 0, batch 1 used:0
cpu 2 cold: low 0, high 0, batch 1 used:0
cpu 3 hot: low 0, high 0, batch 1 used:0
cpu 3 cold: low 0, high 0, batch 1 used:0
DMA32 per-cpu:
cpu 0 hot: low 0, high 186, batch 31 used:183
cpu 0 cold: low 0, high 62, batch 15 used:51
cpu 1 hot: low 0, high 186, batch 31 used:176
cpu 1 cold: low 0, high 62, batch 15 used:48
cpu 2 hot: low 0, high 186, batch 31 used:158
cpu 2 cold: low 0, high 62, batch 15 used:21
cpu 3 hot: low 0, high 186, batch 31 used:175
cpu 3 cold: low 0, high 62, batch 15 used:47
Normal per-cpu:
cpu 0 hot: low 0, high 186, batch 31 used:168
cpu 0 cold: low 0, high 62, batch 15 used:11
cpu 1 hot: low 0, high 186, batch 31 used:156
cpu 1 cold: low 0, high 62, batch 15 used:11
cpu 2 hot: low 0, high 186, batch 31 used:23
cpu 2 cold: low 0, high 62, batch 15 used:1
cpu 3 hot: low 0, high 186, batch 31 used:110
cpu 3 cold: low 0, high 62, batch 15 used:12
HighMem per-cpu: empty
Free pages:      510320kB (0kB HighMem)
Active:329243 inactive:1025889 dirty:193094 writeback:0 unstable:0 free:127580 slab:31606 mapped:212907 pagetables:3911
DMA free:12kB min:16kB low:20kB high:24kB active:0kB inactive:0kB present:10992kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3512 6037 6037
DMA32 free:286540kB min:5776kB low:7220kB high:8664kB active:760260kB inactive:2358736kB present:3596384kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 2525 2525
Normal free:223768kB min:4152kB low:5188kB high:6228kB active:556712kB inactive:1744820kB present:2585600kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12kB
DMA32: 46049*4kB 8683*8kB 561*16kB 211*32kB 8*64kB 8*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 286540kB
Normal: 39766*4kB 5398*8kB 989*16kB 50*32kB 14*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 223768kB
HighMem: empty
Swap cache: add 9900, delete 9900, find 4865/5619, race 0+0
Free swap  = 1951164kB
Total swap = 1951888kB
Free swap:       1951164kB
1703936 pages of RAM
175694 reserved pages
794922 pages shared
0 pages swap cached
Out of Memory: Killed process 24143 (apache).
Out of Memory: Killed process 25925 (apache).
Out of Memory: Killed process 24297 (apache).
Out of Memory: Killed process 25934 (apache).
Out of Memory: Killed process 25961 (apache).
Out of Memory: Killed process 25924 (apache).
Out of Memory: Killed process 25960 (apache).
Out of Memory: Killed process 16420 (python).
Out of Memory: Killed process 25262 (python).
Out of Memory: Killed process 26005 (postmaster).
Out of Memory: Killed process 25263 (python).
Out of Memory: Killed process 25264 (python).
Out of Memory: Killed process 25265 (python).
Out of Memory: Killed process 26003 (postmaster).
Out of Memory: Killed process 25266 (python).
Out of Memory: Killed process 26022 (postmaster).
Out of Memory: Killed process 25267 (python).
Out of Memory: Killed process 25268 (python).
printk: 28 messages suppressed.
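The first line of the dump says the failed allocation had gfp_mask=0xd1, order=0. As a rough aid to reading it, here is a Python sketch that decodes that mask and picks out zones whose free memory is below their min watermark. The bit table is an assumption based on 2.6.15-era include/linux/gfp.h, not something read off this machine. The free-versus-min test is also a simplification of the kernel's own watermark check, which additionally factors in lowmem_reserve and the allocation order.

```python
import re

# Assumption: __GFP_* bit values as in 2.6.15-era include/linux/gfp.h on
# x86_64 (where ZONE_DMA32 gets its own bit). Only the low bits are listed.
GFP_BITS = [
    (0x01, "__GFP_DMA"),
    (0x02, "__GFP_HIGHMEM"),
    (0x04, "__GFP_DMA32"),
    (0x10, "__GFP_WAIT"),
    (0x20, "__GFP_HIGH"),
    (0x40, "__GFP_IO"),
    (0x80, "__GFP_FS"),
]

def decode_gfp(mask):
    """Return flag names set in mask, plus a hex value for any unknown bits."""
    names = [name for bit, name in GFP_BITS if mask & bit]
    known = 0
    for bit, _ in GFP_BITS:
        known |= bit
    leftover = mask & ~known
    if leftover:
        names.append(hex(leftover))
    return names

# Zone summary lines copied verbatim from the dump above.
ZONE_LINES = """\
DMA free:12kB min:16kB low:20kB high:24kB active:0kB inactive:0kB present:10992kB pages_scanned:0 all_unreclaimable? yes
DMA32 free:286540kB min:5776kB low:7220kB high:8664kB active:760260kB inactive:2358736kB present:3596384kB pages_scanned:0 all_unreclaimable? no
Normal free:223768kB min:4152kB low:5188kB high:6228kB active:556712kB inactive:1744820kB present:2585600kB pages_scanned:0 all_unreclaimable? no
"""

ZONE_RE = re.compile(r"^(\w+) free:(\d+)kB min:(\d+)kB.*all_unreclaimable\? (yes|no)$")

def zones_below_min(text):
    """Simplified check: free < min, ignoring lowmem_reserve and order."""
    flagged = []
    for line in text.splitlines():
        m = ZONE_RE.match(line)
        if m and int(m.group(2)) < int(m.group(3)):
            flagged.append((m.group(1), int(m.group(2)), int(m.group(3)),
                            m.group(4) == "yes"))
    return flagged

print(decode_gfp(0xd1))
print(zones_below_min(ZONE_LINES))
```

If those assumptions hold, 0xd1 is the usual GFP_KERNEL trio (__GFP_WAIT, __GFP_IO, __GFP_FS) plus __GFP_DMA, and the only zone under its watermark is the tiny DMA zone (12kB free against a 16kB min, flagged all_unreclaimable), while DMA32 and Normal each have a couple of hundred MB free.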


Posted by Arcterex at July 26, 2006 08:40 AM