  1. Mar 05, 2010
  2. Mar 04, 2010
    • ceph: reset osd after relevant messages timed out · 422d2cb8
      Yehuda Sadeh authored
      
      This simplifies the process of timing out messages. We
      keep an LRU of the messages currently in flight. If a message's
      timeout has passed, we reset the osd connection so that its
      messages are retransmitted.  This is a failsafe in case
      we hit some sort of problem sending a message out to the OSD.
      Normally, we'll get notification via an updated osdmap if
      there are problems.
      
      If a request is older than the keepalive timeout, send a
      keepalive to ensure we detect any breaks in the TCP connection.
      
      Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
      Signed-off-by: Sage Weil <sage@newdream.net>
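The timeout scheme described in this commit message can be illustrated with a small toy model. This is only a sketch in Python, not the kernel code: the class name `InFlightTracker`, its methods, and the numeric timeout values in the usage note are all hypothetical. It assumes the keepalive interval is no longer than the reset timeout and that `now` never goes backwards.

```python
from collections import OrderedDict


class InFlightTracker:
    """Toy model of an LRU of in-flight requests (hypothetical, not the kernel code)."""

    def __init__(self, timeout, keepalive):
        self.timeout = timeout      # age at which the OSD connection is reset
        self.keepalive = keepalive  # age at which a keepalive is sent
        self.lru = OrderedDict()    # tid -> (osd, send_time), oldest first

    def sent(self, tid, osd, now):
        # (Re)sending a request moves it to the most-recent end of the LRU.
        self.lru.pop(tid, None)
        self.lru[tid] = (osd, now)

    def acked(self, tid):
        # A completed request is no longer in flight.
        self.lru.pop(tid, None)

    def tick(self, now):
        """Return (osds_to_reset, osds_to_keepalive) for the current time."""
        reset, ping = set(), set()
        for osd, stamp in self.lru.values():
            age = now - stamp
            if age >= self.timeout:
                reset.add(osd)      # failsafe: reset so requests are retransmitted
            elif age >= self.keepalive:
                ping.add(osd)       # probe for a broken TCP connection
            else:
                break               # LRU order: every later entry is younger
        # A connection that is being reset does not also need a keepalive.
        return reset, ping - reset
```

With assumed values `timeout=60` and `keepalive=5`, a request sent to `"osd0"` at time 0 and one sent to `"osd1"` at time 58 give `tick(now=64)` a reset for `"osd0"` and a keepalive for `"osd1"`. Because entries are kept in send order, the scan can stop at the first request that is younger than the keepalive interval.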
  3. Mar 01, 2010
  4. Feb 26, 2010
    • ceph: remove bogus mds forward warning · 080af17e
      Sage Weil authored
      
      The must_resend flag is always true, not false.  Either way, we can
      just ignore it.
      
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: remove fragile __map_osds optimization · c99eb1c7
      Sage Weil authored
      
      We used to try to avoid freeing and then reallocating the osd
      struct.  This is a bit fragile due to potential interactions with
      other references (beyond o_requests), and may be the cause of
      this crash:
      
      [120633.442358] BUG: unable to handle kernel NULL pointer dereference at (null)
      [120633.443292] IP: [<ffffffff812549b6>] rb_erase+0x11d/0x277
      [120633.443292] PGD f7ff3067 PUD f7f53067 PMD 0
      [120633.443292] Oops: 0000 [#1] PREEMPT SMP
      [120633.443292] last sysfs file: /sys/kernel/uevent_seqnum
      [120633.443292] CPU 1
      [120633.443292] Modules linked in: ceph fan ac battery psmouse ehci_hcd ide_pci_generic ohci_hcd thermal processor button
      [120633.443292] Pid: 3023, comm: ceph-msgr/1 Not tainted 2.6.32-rc2 #12 H8SSL
      [120633.443292] RIP: 0010:[<ffffffff812549b6>]  [<ffffffff812549b6>] rb_erase+0x11d/0x277
      [120633.443292] RSP: 0018:ffff8800f7b13a50  EFLAGS: 00010246
      [120633.443292] RAX: ffff880022907819 RBX: ffff880022907818 RCX: 0000000000000000
      [120633.443292] RDX: ffff8800f7b13a80 RSI: ffff8800f587eb48 RDI: 0000000000000000
      [120633.443292] RBP: ffff8800f7b13a60 R08: 0000000000000000 R09: 0000000000000004
      [120633.443292] R10: 0000000000000000 R11: ffff8800c4441000 R12: ffff8800f587eb48
      [120633.443292] R13: ffff8800f58eaa00 R14: ffff8800f413c000 R15: 0000000000000001
      [120633.443292] FS:  00007fbef6e226e0(0000) GS:ffff880009200000(0000) knlGS:0000000000000000
      [120633.443292] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      [120633.443292] CR2: 0000000000000000 CR3: 00000000f7c53000 CR4: 00000000000006e0
      [120633.443292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [120633.443292] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [120633.443292] Process ceph-msgr/1 (pid: 3023, threadinfo ffff8800f7b12000, task ffff8800f5858b40)
      [120633.443292] Stack:
      [120633.443292]  ffff8800f413c000 ffff8800f587e9c0 ffff8800f7b13a80 ffffffffa0098a86
      [120633.443292] <0> 00000000000006f1 0000000000000000 ffff8800f7b13af0 ffffffffa009959b
      [120633.443292] <0> ffff8800f413c000 ffff880022a68400 ffff880022a68400 ffff8800f587e9c0
      [120633.443292] Call Trace:
      [120633.443292]  [<ffffffffa0098a86>] __remove_osd+0x4d/0xbc [ceph]
      [120633.443292]  [<ffffffffa009959b>] __map_osds+0x199/0x4fa [ceph]
      [120633.443292]  [<ffffffffa00999f4>] ? __send_request+0xf8/0x186 [ceph]
      [120633.443292]  [<ffffffffa0099beb>] kick_requests+0x169/0x3cb [ceph]
      [120633.443292]  [<ffffffffa009a8c1>] ceph_osdc_handle_map+0x370/0x522 [ceph]
      
      Since we're probably screwed anyway if a small kmalloc is
      failing, don't bother with trying to be clever here.
      
      Signed-off-by: Sage Weil <sage@newdream.net>
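The simpler behaviour this commit message argues for (drop the old osd struct and allocate a new one, rather than recycling it) might look roughly like the toy model below. It is a sketch under stated assumptions, not the actual `__map_osds` logic: `Osd`, `OsdClient`, and `map_request` are hypothetical names, and a plain dict stands in for the kernel's rbtree of osd sessions.

```python
class Osd:
    """Hypothetical stand-in for the per-OSD session struct."""

    def __init__(self, osd_id):
        self.id = osd_id
        self.requests = set()  # tids currently mapped to this OSD


class OsdClient:
    """Hypothetical stand-in for the client's request-to-OSD mapping."""

    def __init__(self):
        self.osds = {}    # osd id -> Osd (a dict in place of the rbtree)
        self.target = {}  # tid -> osd id the request is mapped to

    def map_request(self, tid, osd_id):
        old_id = self.target.get(tid)
        if old_id == osd_id:
            return self.osds[osd_id]  # mapping unchanged
        if old_id is not None:
            old = self.osds[old_id]
            old.requests.discard(tid)
            if not old.requests:
                # Simple path: drop the idle session outright, never recycle it.
                del self.osds[old_id]
        osd = self.osds.get(osd_id)
        if osd is None:
            # Always a fresh object, even if one was freed a moment ago.
            osd = self.osds[osd_id] = Osd(osd_id)
        osd.requests.add(tid)
        self.target[tid] = osd_id
        return osd
```

Remapping a request from one OSD to another, for example `map_request(1, 0)` followed by `map_request(1, 3)`, yields a new `Osd` object and removes the old entry from `osds`. Nothing that still holds a reference to the old object can see it silently change identity, which is the kind of interaction the commit message calls fragile.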
  5. Feb 25, 2010
  6. Feb 23, 2010
  7. Feb 19, 2010
  8. Feb 17, 2010
  9. Feb 15, 2010
  10. Feb 14, 2010
  11. Feb 11, 2010