Hello,
my new RS 2000 G7SE 15 years is very unstable. I have already been moved to a different node twice. The system, Debian Stretch running kernel 3.14.51-grsec, ran without problems on the Root-Server M SSD v6. I copied it to the new system with rsync from the recovery console and only adjusted the IP address under /etc and the fstab accordingly.
Since I suspected that the kernel does not work well with the new server, I updated the kernel to 4.11.8. Nevertheless, I still constantly get kernel messages like
Code
[Mi Jul 5 20:13:01 2017] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 30s! [imap:3474]
[Mi Jul 5 20:13:01 2017] Modules linked in: fuse ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter cpufreq_conservative cpufreq_userspace cpufreq_powersave quota_v2 quota_tree ghash_clmulni_intel pcbc ppdev aesni_intel aes_x86_64 snd_pcm crypto_simd cryptd joydev evdev glue_helper sg snd_timer serio_raw snd virtio_balloon soundcore parport_pc tpm_tis pcspkr tpm_tis_core tpm parport button loop ip_tables x_tables autofs4 aacraid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq raid1 raid0 md_mod sata_nv sata_sil sata_via sr_mod cdrom sd_mod ata_generic virtio_scsi virtio_net crc32c_intel psmouse uhci_hcd ehci_hcd ata_piix virtio_pci virtio_ring usbcore virtio usb_common floppy
[Mi Jul 5 20:13:01 2017] CPU: 0 PID: 3474 Comm: imap Not tainted 4.11.8 #1
[Mi Jul 5 20:13:01 2017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161116_142049-atsina 04/01/2014
[Mi Jul 5 20:13:01 2017] task: ffff88032e229200 task.stack: ffffc90003514000
[Mi Jul 5 20:13:01 2017] RIP: 0010:exit_to_usermode_loop+0x50/0xa0
[Mi Jul 5 20:13:01 2017] RSP: 0000:ffffc90003517f20 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[Mi Jul 5 20:13:01 2017] RAX: ffff88032e229200 RBX: 0000000000000008 RCX: ffffffff81e0e500
[Mi Jul 5 20:13:01 2017] RDX: 0140000000000000 RSI: 0000000000000008 RDI: ffffc90003517f58
[Mi Jul 5 20:13:01 2017] RBP: ffff88032e229200 R08: 0000000000000400 R09: ffff88032e229200
[Mi Jul 5 20:13:01 2017] R10: ffff88033fc03f28 R11: 0000000000000001 R12: ffff88032e229200
[Mi Jul 5 20:13:01 2017] R13: ffff88032e229200 R14: ffffc90003517f58 R15: 0000000000000000
[Mi Jul 5 20:13:01 2017] FS: 00007efc7156f200(0000) GS:ffff88033fc00000(0000) knlGS:0000000000000000
[Mi Jul 5 20:13:01 2017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mi Jul 5 20:13:01 2017] CR2: 00007f8ca83aa000 CR3: 000000032f970000 CR4: 00000000003406f0
[Mi Jul 5 20:13:01 2017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Mi Jul 5 20:13:01 2017] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Mi Jul 5 20:13:01 2017] Call Trace:
[Mi Jul 5 20:13:01 2017] ? prepare_exit_to_usermode+0x2a/0x30
[Mi Jul 5 20:13:01 2017] ? retint_user+0x8/0x10
[Mi Jul 5 20:13:01 2017] Code: 48 8b 2c 25 80 c5 00 00 eb 1d f6 c3 04 75 28 f6 c3 02 75 30 80 e7 08 75 45 fa 48 8b 45 00 a9 0e 18 00 00 89 c3 74 3e fb f6 c3 08 <74> dd e8 39 be 85 00 f6 c3 04 74 d8 4c 89 f7 e8 8c 9d 01 00 f6
[Mi Jul 5 20:13:01 2017] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
[Mi Jul 5 20:13:01 2017] clocksource: 'hpet' wd_now: 669d9d38 wd_last: a4637f60 mask: ffffffff
[Mi Jul 5 20:13:01 2017] clocksource: 'tsc' cs_now: 24c65c202be cs_last: 23a3046a04e mask: ffffffffffffffff
[Mi Jul 5 20:13:01 2017] sched_clock: Marking unstable (1047889686948, 49988077)<-(1048005102520, -65427495)
[Mi Jul 5 20:13:01 2017] tsc: Marking TSC unstable due to clocksource watchdog
[Mi Jul 5 20:13:01 2017] clocksource: Switched to clocksource hpet
[Mi Jul 5 20:13:38 2017] ata2: lost interrupt (Status 0x58)
[Mi Jul 5 20:13:38 2017] ata2: drained 8 bytes to clear DRQ
[Mi Jul 5 20:13:38 2017] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Mi Jul 5 20:13:38 2017] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Get event status notification 4a 01 00 00 10 00 00 00 08 00res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[Mi Jul 5 20:13:38 2017] ata2.00: status: { DRDY }
[Mi Jul 5 20:13:38 2017] ata2: soft resetting link
[Mi Jul 5 20:13:39 2017] ata2.01: NODEV after polling detection
[Mi Jul 5 20:13:39 2017] ata2.00: configured for MWDMA2
[Mi Jul 5 20:13:39 2017] ata2: EH complete
[Mi Jul 5 20:14:07 2017] perf: interrupt took too long (5225 > 5120), lowering kernel.perf_event_max_sample_rate to 38250
[Mi Jul 5 20:15:22 2017] perf: interrupt took too long (6549 > 6531), lowering kernel.perf_event_max_sample_rate to 30500
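To get an overview of how often each kind of event shows up, the log can be tallied with a short script. This is only a rough sketch in Python: the names `PATTERNS`, `SAMPLE` and `classify_dmesg` are made up for illustration, and the patterns cover only the message types that appear in the excerpt above.

```python
import re
from collections import Counter

# Hypothetical pattern table: covers only message types seen in the log above.
PATTERNS = {
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s"),
    "tsc_unstable": re.compile(r"Marking clocksource 'tsc' as unstable"),
    "ata_lost_interrupt": re.compile(r"ata\d+: lost interrupt"),
    "perf_slow_interrupt": re.compile(r"perf: interrupt took too long"),
}

# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "[Mi Jul 5 20:13:01 2017] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 30s! [imap:3474]",
    "[Mi Jul 5 20:13:01 2017] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:",
    "[Mi Jul 5 20:13:38 2017] ata2: lost interrupt (Status 0x58)",
    "[Mi Jul 5 20:14:07 2017] perf: interrupt took too long (5225 > 5120), lowering kernel.perf_event_max_sample_rate to 38250",
    "[Mi Jul 5 20:15:22 2017] perf: interrupt took too long (6549 > 6531), lowering kernel.perf_event_max_sample_rate to 30500",
]


def classify_dmesg(lines):
    """Count how many lines match each known pattern (one category per line)."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break
    return counts


if __name__ == "__main__":
    for name, n in classify_dmesg(SAMPLE).most_common():
        print(f"{name}: {n}")
```

For a real run, the lines of the saved `dmesg -T` output would be passed in place of `SAMPLE`.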
Code: /etc/sysctl.conf
vm.dirty_background_ratio=5
vm.dirty_ratio=10
kernel.perf_event_paranoid=3
kernel.sched_min_granularity_ns=10000000
vm.swappiness=10
kernel.sched_wakeup_granularity_ns=15000000
kernel.sysrq=1
kernel.dmesg_restrict=1
(Leaving out nohz=off does not bring any improvement.)
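Since the system was copied over from another server, it may be worth checking that the running kernel actually uses the values from /etc/sysctl.conf. The following Python sketch compares the configured values with what /proc/sys reports. The function names are made up for illustration, and mapping a key to a path by replacing dots with slashes is a simplification that holds for the keys listed above but not for every possible sysctl key.

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a sysctl key such as 'vm.swappiness' to its file under /proc/sys.

    Assumption: no path component of the key contains a dot itself.
    """
    return Path("/proc/sys") / key.replace(".", "/")


def report(settings):
    """Print the configured and the running value for each key."""
    for key, wanted in settings.items():
        try:
            running = proc_path(key).read_text().strip()
        except OSError:
            # Key not present in this kernel, or not readable by this user.
            running = "(not available)"
        marker = "ok" if running == wanted else "DIFFERS"
        print(f"{key}: configured={wanted} running={running} [{marker}]")


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    if conf.exists():
        report(parse_sysctl_conf(conf.read_text()))
```

A key reported as "(not available)" would suggest that the new kernel does not expose that setting, for example because the corresponding option was not built in.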