Hi. I'm using ZFS on my Fedora 14 backup server, and on the client I am performing the backup through rsync to the NFS-mounted file system. For the most part it is working well, but every once in a while rsync crashes and has to be restarted. I have found the following 3 types of errors in the server's dmesg log:
[ 2485.464779] SPL: Showing stack for process 5451
[ 2485.464785] Pid: 5451, comm: txg_sync Tainted: P 2.6.35.12-88.fc14.x86_64 #1
[ 2485.464789] Call Trace:
[ 2485.464807] [] spl_debug_dumpstack+0x30/0x32 [spl]
[ 2485.464818] [] kmem_alloc_debug+0x35/0xd7 [spl]
[ 2485.464864] [] dsl_scan_setup_sync+0x13c/0x1c0 [zfs]
[ 2485.464873] [] ? need_resched+0x23/0x2d
[ 2485.464910] [] dsl_sync_task_group_sync+0x129/0x189 [zfs]
[ 2485.464945] [] dsl_pool_sync+0x1da/0x41e [zfs]
[ 2485.464984] [] spa_sync+0x4b1/0x875 [zfs]
[ 2485.464990] [] ? autoremove_wake_function+0x16/0x39
[ 2485.464996] [] ? __wake_up+0x44/0x4d
[ 2485.465037] [] txg_sync_thread+0x1bd/0x322 [zfs]
[ 2485.465077] [] ? txg_sync_thread+0x0/0x322 [zfs]
[ 2485.465088] [] thread_generic_wrapper+0x71/0x7e [spl]
[ 2485.465098] [] ? thread_generic_wrapper+0x0/0x7e [spl]
[ 2485.465103] [] kthread+0x7f/0x87
[ 2485.465108] [] kernel_thread_helper+0x4/0x10
[ 2485.465113] [] ? kthread+0x0/0x87
[ 2485.465117] [] ? kernel_thread_helper+0x0/0x10
[17291.729355] general protection fault: 0000 [#1] SMP
[17291.729364] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map
[17291.729369] CPU 1
[17291.729372] Modules linked in: lzfs zfs(P) zcommon(P) znvpair(P) zavl(P) zlib_deflate zunicode(P) spl fuse nfs fscache ipmi_msghandler dell_rbu nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc p4_clockmod freq_table speedstep_lib ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput tg3 iTCO_wdt i2c_i801 i2c_core shpchp iTCO_vendor_support microcode i3000_edac edac_core dcdbas mptspi mptscsih mptbase scsi_transport_spi [last unloaded: mperf]
[17291.729432]
[17291.729438] Pid: 5564, comm: nfsd Tainted: P 2.6.35.12-88.fc14.x86_64 #1 0Y8628/PowerEdge 850
[17291.729448] RIP: 0010:[] [] virt_to_head_page+0x23/0x31
[17291.729461] RSP: 0018:ffff8800c669f710 EFLAGS: 00010207
[17291.729465] RAX: 020b772f8b8b8b78 RBX: 9595959595959595 RCX: ffff8800c14fdd00
[17291.729469] RDX: ffffea0000000000 RSI: ffffffffa027d878 RDI: 9595959595959595
[17291.729472] RBP: ffff8800c669f710 R08: ffff8800c669e000 R09: ffff88010a21f800
[17291.729475] R10: ffff8800c1fca800 R11: ffff8800ef3f3e10 R12: ffff8800c648a8b0
[17291.729479] R13: ffffffffa026fec0 R14: ffff88009abc0a00 R15: 0000000000000f70
[17291.729483] FS: 0000000000000000(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
[17291.729487] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[17291.729490] CR2: 00000000f0443000 CR3: 00000000e6c7f000 CR4: 00000000000006e0
[17291.729494] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17291.729498] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17291.729503] Process nfsd (pid: 5564, threadinfo ffff8800c669e000, task ffff8800ef2ac5c0)
[17291.729505] Stack:
[17291.729507] ffff8800c669f750 ffffffff811090d7 ffff8800c648a890 9595959595959595
[17291.729513] <0> ffff8800c648a8b0 ffff8800c648a890 ffff88009abc0a00 0000000000000f70
[17291.729520] <0> ffff8800c669f770 ffffffffa026fec0 ffff8800c669f770 ffff8800555e5a38
[17291.729527] Call Trace:
[17291.729533] [] kfree+0x5a/0xd7
[17291.729549] [] kmem_free_debug+0x28/0x2d [spl]
[17291.729562] [] __cv_destroy+0x2c/0x43 [spl]
[17291.729616] [] release_rl+0x38/0x49 [zfs]
[17291.729657] [] do_delayed_release+0x26/0x28 [zfs]
[17291.729698] [] zfs_range_lock+0x4d9/0x504 [zfs]
[17291.729740] [] zfs_write+0x2e1/0x915 [zfs]
[17291.729747] [] ? __raw_local_irq_save+0x1d/0x23
[17291.729786] [] ? sa_lookup_impl+0x13/0x15 [zfs]
[17291.729792] [] ? _cond_resched+0xe/0x22
[17291.729797] [] ? mutex_lock+0x29/0x50
[17291.729801] [] ? _raw_spin_lock+0xe/0x10
[17291.729809] [] lzfs_write+0x88/0xb4 [lzfs]
[17291.729816] [] lzfs_vnop_write+0x8f/0x25a [lzfs]
[17291.729822] [] do_loop_readv_writev+0x41/0x79
[17291.729828] [] ? lzfs_vnop_write+0x0/0x25a [lzfs]
[17291.729833] [] do_readv_writev+0xb7/0x127
[17291.729850] [] ? put_cred+0x16/0x26 [nfsd]
[17291.729862] [] ? nfsd_setuser+0x1b7/0x1d9 [nfsd]
[17291.729868] [] ? need_resched+0x23/0x2d
[17291.729873] [] ? should_resched+0xe/0x2e
[17291.729877] [] ? _cond_resched+0xe/0x22
[17291.729881] [] ? mutex_lock+0x29/0x50
[17291.729891] [] ? nfsd_setuser_and_check_port+0x7c/0x9f [nfsd]
[17291.729896] [] vfs_writev+0x45/0x47
[17291.729906] [] nfsd_vfs_write.clone.6+0x133/0x342 [nfsd]
[17291.729911] [] ? dentry_open+0x89/0x91
[17291.729922] [] ? nfsd_open+0x163/0x172 [nfsd]
[17291.729933] [] nfsd_write+0xd1/0xef [nfsd]
[17291.729946] [] nfsd3_proc_write+0xd2/0xee [nfsd]
[17291.729956] [] nfsd_dispatch+0xf1/0x1cc [nfsd]
[17291.729977] [] svc_process_common+0x2b4/0x4b6 [sunrpc]
[17291.729992] [] svc_process+0x126/0x13a [sunrpc]
[17291.730001] [] nfsd+0xf6/0x13f [nfsd]
[17291.730010] [] ? nfsd+0x0/0x13f [nfsd]
[17291.730015] [] kthread+0x7f/0x87
[17291.730021] [] kernel_thread_helper+0x4/0x10
[17291.730025] [] ? kthread+0x0/0x87
[17291.730030] [] ? kernel_thread_helper+0x0/0x10
[17291.730032] Code: 02 00 00 0f 4f c2 c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 50 e1 f2 ff 48 c1 e8 0c 48 ba 00 00 00 00 00 ea ff ff 48 6b c0 38 48 01 d0 <48> 8b 10 80 e6 80 74 04 48 8b 40 10 c9 c3 55 48 89 e5 41 57 41
[17291.730087] RIP [] virt_to_head_page+0x23/0x31
[17291.730092] RSP
[17291.730103] ---[ end trace 031c33c605fb5d6f ]---
[40329.911522] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[40975.531949] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[41775.912437] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[42620.390826] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[44356.427640] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[46601.886003] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[47662.237039] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[55633.304601] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[56101.442909] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[57550.661933] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Hi. I'm using ZFS on my Fedora 14 backup server, and on the client I am performing the backup through rsync to the NFS-mounted file system. For the most part it is working well, but every once in a while rsync crashes and has to be restarted. I have found the following 3 types of errors in the server's dmesg log:
[ 2485.464779] SPL: Showing stack for process 5451
[ 2485.464785] Pid: 5451, comm: txg_sync Tainted: P 2.6.35.12-88.fc14.x86_64 #1
[ 2485.464789] Call Trace:
[ 2485.464807] [] spl_debug_dumpstack+0x30/0x32 [spl]
[ 2485.464818] [] kmem_alloc_debug+0x35/0xd7 [spl]
[ 2485.464864] [] dsl_scan_setup_sync+0x13c/0x1c0 [zfs]
[ 2485.464873] [] ? need_resched+0x23/0x2d
[ 2485.464910] [] dsl_sync_task_group_sync+0x129/0x189 [zfs]
[ 2485.464945] [] dsl_pool_sync+0x1da/0x41e [zfs]
[ 2485.464984] [] spa_sync+0x4b1/0x875 [zfs]
[ 2485.464990] [] ? autoremove_wake_function+0x16/0x39
[ 2485.464996] [] ? __wake_up+0x44/0x4d
[ 2485.465037] [] txg_sync_thread+0x1bd/0x322 [zfs]
[ 2485.465077] [] ? txg_sync_thread+0x0/0x322 [zfs]
[ 2485.465088] [] thread_generic_wrapper+0x71/0x7e [spl]
[ 2485.465098] [] ? thread_generic_wrapper+0x0/0x7e [spl]
[ 2485.465103] [] kthread+0x7f/0x87
[ 2485.465108] [] kernel_thread_helper+0x4/0x10
[ 2485.465113] [] ? kthread+0x0/0x87
[ 2485.465117] [] ? kernel_thread_helper+0x0/0x10
[17291.729355] general protection fault: 0000 [#1] SMP
[17291.729364] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map
[17291.729369] CPU 1
[17291.729372] Modules linked in: lzfs zfs(P) zcommon(P) znvpair(P) zavl(P) zlib_deflate zunicode(P) spl fuse nfs fscache ipmi_msghandler dell_rbu nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc p4_clockmod freq_table speedstep_lib ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput tg3 iTCO_wdt i2c_i801 i2c_core shpchp iTCO_vendor_support microcode i3000_edac edac_core dcdbas mptspi mptscsih mptbase scsi_transport_spi [last unloaded: mperf]
[17291.729432]
[17291.729438] Pid: 5564, comm: nfsd Tainted: P 2.6.35.12-88.fc14.x86_64 #1 0Y8628/PowerEdge 850
[17291.729448] RIP: 0010:[] [] virt_to_head_page+0x23/0x31
[17291.729461] RSP: 0018:ffff8800c669f710 EFLAGS: 00010207
[17291.729465] RAX: 020b772f8b8b8b78 RBX: 9595959595959595 RCX: ffff8800c14fdd00
[17291.729469] RDX: ffffea0000000000 RSI: ffffffffa027d878 RDI: 9595959595959595
[17291.729472] RBP: ffff8800c669f710 R08: ffff8800c669e000 R09: ffff88010a21f800
[17291.729475] R10: ffff8800c1fca800 R11: ffff8800ef3f3e10 R12: ffff8800c648a8b0
[17291.729479] R13: ffffffffa026fec0 R14: ffff88009abc0a00 R15: 0000000000000f70
[17291.729483] FS: 0000000000000000(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
[17291.729487] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[17291.729490] CR2: 00000000f0443000 CR3: 00000000e6c7f000 CR4: 00000000000006e0
[17291.729494] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17291.729498] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17291.729503] Process nfsd (pid: 5564, threadinfo ffff8800c669e000, task ffff8800ef2ac5c0)
[17291.729505] Stack:
[17291.729507] ffff8800c669f750 ffffffff811090d7 ffff8800c648a890 9595959595959595
[17291.729513] <0> ffff8800c648a8b0 ffff8800c648a890 ffff88009abc0a00 0000000000000f70
[17291.729520] <0> ffff8800c669f770 ffffffffa026fec0 ffff8800c669f770 ffff8800555e5a38
[17291.729527] Call Trace:
[17291.729533] [] kfree+0x5a/0xd7
[17291.729549] [] kmem_free_debug+0x28/0x2d [spl]
[17291.729562] [] __cv_destroy+0x2c/0x43 [spl]
[17291.729616] [] release_rl+0x38/0x49 [zfs]
[17291.729657] [] do_delayed_release+0x26/0x28 [zfs]
[17291.729698] [] zfs_range_lock+0x4d9/0x504 [zfs]
[17291.729740] [] zfs_write+0x2e1/0x915 [zfs]
[17291.729747] [] ? __raw_local_irq_save+0x1d/0x23
[17291.729786] [] ? sa_lookup_impl+0x13/0x15 [zfs]
[17291.729792] [] ? _cond_resched+0xe/0x22
[17291.729797] [] ? mutex_lock+0x29/0x50
[17291.729801] [] ? _raw_spin_lock+0xe/0x10
[17291.729809] [] lzfs_write+0x88/0xb4 [lzfs]
[17291.729816] [] lzfs_vnop_write+0x8f/0x25a [lzfs]
[17291.729822] [] do_loop_readv_writev+0x41/0x79
[17291.729828] [] ? lzfs_vnop_write+0x0/0x25a [lzfs]
[17291.729833] [] do_readv_writev+0xb7/0x127
[17291.729850] [] ? put_cred+0x16/0x26 [nfsd]
[17291.729862] [] ? nfsd_setuser+0x1b7/0x1d9 [nfsd]
[17291.729868] [] ? need_resched+0x23/0x2d
[17291.729873] [] ? should_resched+0xe/0x2e
[17291.729877] [] ? _cond_resched+0xe/0x22
[17291.729881] [] ? mutex_lock+0x29/0x50
[17291.729891] [] ? nfsd_setuser_and_check_port+0x7c/0x9f [nfsd]
[17291.729896] [] vfs_writev+0x45/0x47
[17291.729906] [] nfsd_vfs_write.clone.6+0x133/0x342 [nfsd]
[17291.729911] [] ? dentry_open+0x89/0x91
[17291.729922] [] ? nfsd_open+0x163/0x172 [nfsd]
[17291.729933] [] nfsd_write+0xd1/0xef [nfsd]
[17291.729946] [] nfsd3_proc_write+0xd2/0xee [nfsd]
[17291.729956] [] nfsd_dispatch+0xf1/0x1cc [nfsd]
[17291.729977] [] svc_process_common+0x2b4/0x4b6 [sunrpc]
[17291.729992] [] svc_process+0x126/0x13a [sunrpc]
[17291.730001] [] nfsd+0xf6/0x13f [nfsd]
[17291.730010] [] ? nfsd+0x0/0x13f [nfsd]
[17291.730015] [] kthread+0x7f/0x87
[17291.730021] [] kernel_thread_helper+0x4/0x10
[17291.730025] [] ? kthread+0x0/0x87
[17291.730030] [] ? kernel_thread_helper+0x0/0x10
[17291.730032] Code: 02 00 00 0f 4f c2 c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 50 e1 f2 ff 48 c1 e8 0c 48 ba 00 00 00 00 00 ea ff ff 48 6b c0 38 48 01 d0 <48> 8b 10 80 e6 80 74 04 48 8b 40 10 c9 c3 55 48 89 e5 41 57 41
[17291.730087] RIP [] virt_to_head_page+0x23/0x31
[17291.730092] RSP
[17291.730103] ---[ end trace 031c33c605fb5d6f ]---
[40329.911522] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[40975.531949] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[41775.912437] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[42620.390826] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[44356.427640] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[46601.886003] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[47662.237039] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[55633.304601] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[56101.442909] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[57550.661933] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket