- Jan 17, 2013
-
-
Yan, Zheng authored
Otherwise osd may truncate the object to larger size. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- Dec 28, 2012
-
-
Sage Weil authored
We should not set con->state to CLOSED here; that happens in ceph_fault() in the caller, where it first asserts that the state is not yet CLOSED. Avoids a BUG when the features don't match. Since the fail_protocol() has become a trivial wrapper, replace calls to it with direct calls to reset_connection(). Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Alex Elder authored
A number of assertions in the ceph messenger are implemented with BUG_ON(), killing the system if a connection's state doesn't match what's expected. At this point our state model is (evidently) not well understood enough for these assertions to trigger a BUG(). Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...) so we learn about these issues without killing the machine. We now recognize that a connection fault can occur due to a socket closure at any time, regardless of the state of the connection. So there is really nothing we can assert about the state of the connection at that point, so eliminate that assertion. Reported-by:
Ugis <ugis22@gmail.com> Tested-by:
Ugis <ugis22@gmail.com> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
When ceph_osdc_handle_map() is called to process a new osd map, kick_requests() is called to ensure all affected requests are updated if necessary to reflect changes in the osd map. This happens in two cases: whenever an incremental map update is processed; and when a full map update (or the last one if there is more than one) gets processed. In the former case, the kick_requests() call is followed immediately by a call to reset_changed_osds() to ensure any connections to osds affected by the map change are reset. But for full map updates this isn't done. Both cases should be doing this osd reset. Rather than duplicating the reset_changed_osds() call, move it into the end of kick_requests(). Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
The kick_requests() function is called by ceph_osdc_handle_map() when an osd map change has been indicated. Its purpose is to re-queue any request whose target osd is different from what it was when it was originally sent. It is structured as two loops, one for incomplete but registered requests, and a second for handling completed linger requests. As a special case, in the first loop if a request marked to linger has not yet completed, it is moved from the request list to the linger list. This is a quick and dirty way to have the second loop handle sending the request along with all the other linger requests. Because of the way it's done now, however, this quick and dirty solution can result in these incomplete linger requests never getting re-sent as desired. The problem lies in the fact that the second loop only arranges for a linger request to be sent if it appears its target osd has changed. This is the proper handling for *completed* linger requests (it avoids issuing the same linger request twice to the same osd). But although the linger requests added to the list in the first loop may have been sent, they have not yet completed, so they need to be re-sent regardless of whether their target osd has changed. The first required fix is to avoid calling __map_request() on any incomplete linger request. Otherwise the subsequent __map_request() call in the second loop will find the target osd has not changed and will therefore not re-send the request. Second, we need to be sure that a sent but incomplete linger request gets re-sent. If the target osd is the same with the new osd map as it was when the request was originally sent, this won't happen. This can be fixed through careful handling when we move these requests from the request list to the linger list, by unregistering the request *before* it is registered as a linger request.
This works because a side-effect of unregistering the request is to make the request's r_osd pointer be NULL, and *that* will ensure the second loop actually re-sends the linger request. Processing of such a request is done at that point, so continue with the next one once it's been moved. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- Dec 20, 2012
-
-
Alex Elder authored
In kick_requests(), we need to register the request before we unregister the linger request. Otherwise the unregister will reset the request's osd pointer to NULL. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
The red-black node in the ceph osd request structure is initialized in ceph_osdc_alloc_request() using rb_init_node(). We do need to initialize this, because in __unregister_request() we call RB_EMPTY_NODE(), which expects the node it's checking to have been initialized. But rb_init_node() is apparently overkill, and may in fact be on its way out. So use RB_CLEAR_NODE() instead. For a little more background, see this commit: 4c199a93 "rbtree: empty nodes have no color" Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
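The convention this commit relies on can be sketched in userspace C. This is only an illustration, assuming that a cleared node is encoded as a node that is its own parent; the `demo_` names are hypothetical stand-ins and are not the kernel's rbtree API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's struct rb_node. */
struct demo_rb_node {
	uintptr_t parent_color;          /* parent pointer (color bit omitted here) */
	struct demo_rb_node *left;
	struct demo_rb_node *right;
};

/* Mark a node as "not in any tree" by making it its own parent. */
static void demo_rb_clear_node(struct demo_rb_node *node)
{
	node->parent_color = (uintptr_t)node;
}

/* Only meaningful if the node was cleared at allocation time. */
static int demo_rb_empty_node(const struct demo_rb_node *node)
{
	return node->parent_color == (uintptr_t)node;
}

/* Link a node under a parent, as a tree insertion would. */
static void demo_rb_link_node(struct demo_rb_node *node,
			      struct demo_rb_node *parent)
{
	assert(parent != node);
	node->parent_color = (uintptr_t)parent;
	node->left = 0;
	node->right = 0;
}
```

A node that was never cleared holds whatever the allocator left in it, so the emptiness test on it is meaningless; that is why the allocation paths need the clear call.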
-
Alex Elder authored
The red-black node in the ceph osd event structure is not initialized in ceph_osdc_create_event(). Because this node can be the subject of a RB_EMPTY_NODE() call later on, we should ensure the node is initialized properly for that. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
The red-black node in the ceph osd structure is not initialized in create_osd(). Because this node can be the subject of a RB_EMPTY_NODE() call later on, we should ensure the node is initialized properly for that. Add a call to RB_CLEAR_NODE() to initialize it. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
When a connection's socket disconnects, or if there's a protocol error of some kind on the connection, a fault is signaled and the connection is reset (closed and reopened, basically). We currently get an error message in the log whenever this occurs. A ceph connection will attempt to reestablish a socket connection repeatedly if a fault occurs. This means that these error messages will get repeatedly added to the log, which is undesirable. Change the error message to be a warning, so they don't get logged by default. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- Dec 17, 2012
-
-
Alex Elder authored
A connection's socket can close for any reason, independent of the state of the connection (and irrespective of the connection mutex). As a result, the connection can be in pretty much any state at the time its socket is closed. Handle those other cases at the top of con_work(). Pull this whole block of code into a separate function to reduce the clutter. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
In __unregister_linger_request(), the request is being removed from the osd client's req_linger list only when the request has a non-null osd pointer. It should be done whether or not the request currently has an osd. This is most likely a non-issue because I believe the request will always have an osd when this function is called. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
If an osd has no requests and no linger requests, __reset_osd() will just remove it with a call to __remove_osd(). That drops a reference to the osd, and therefore the osd may have been freed by the time __reset_osd() returns. That function offers no indication this may have occurred, and as a result the osd will continue to be used even when it's no longer valid. Change __reset_osd() so it returns an error (ENODEV) when it deletes the osd being reset. And change __kick_osd_requests() so it returns immediately (before referencing osd again) if __reset_osd() returns *any* error. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
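A minimal userspace sketch of the pattern described above: the reset function reports through its return value that it destroyed its argument, and the caller stops touching the pointer on any error. The struct and the `demo_` functions are hypothetical simplifications, not the osd client code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an osd with a count of pending requests. */
struct demo_osd {
	int nr_requests;
	int resets;              /* how many times the connection was reopened */
};

static int demo_osds_freed;      /* instrumentation for this example only */

static struct demo_osd *demo_osd_new(int nr_requests)
{
	struct demo_osd *osd = calloc(1, sizeof(*osd));

	assert(osd);
	osd->nr_requests = nr_requests;
	return osd;
}

/*
 * Reset an osd. An osd with no requests is removed (freed) and
 * -ENODEV is returned so the caller knows the pointer is dead.
 */
static int demo_reset_osd(struct demo_osd *osd)
{
	if (osd->nr_requests == 0) {
		free(osd);
		demo_osds_freed++;
		return -ENODEV;
	}
	osd->resets++;
	return 0;
}

/* Returns the number of requests kicked, or 0 if the osd went away. */
static int demo_kick_osd_requests(struct demo_osd *osd)
{
	int err = demo_reset_osd(osd);

	if (err)
		return 0;            /* do not reference osd again */
	return osd->nr_requests;     /* safe: osd is still valid */
}
```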
-
Alex Elder authored
In __unregister_request(), there is a call to list_del_init() referencing a request that was the subject of a call to ceph_osdc_put_request() on the previous line. This is not safe, because the request structure could have been freed by the time we reach the list_del_init(). Fix this by reversing the order of these lines. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
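The ordering rule can be illustrated with a small userspace sketch: unlink the object from its list first, and only then drop the reference that may free it. The list helpers and the request type below are hypothetical and only mimic the shape of the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modeled loosely on list_head. */
struct demo_list { struct demo_list *prev, *next; };

static void demo_list_init(struct demo_list *l) { l->prev = l->next = l; }
static int demo_list_empty(const struct demo_list *l) { return l->next == l; }

static void demo_list_add_tail(struct demo_list *item, struct demo_list *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void demo_list_del_init(struct demo_list *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	demo_list_init(item);
}

struct demo_request {
	int refcount;
	struct demo_list osd_item;
};

static int demo_requests_freed;   /* instrumentation for this example only */

static struct demo_request *demo_register_request(struct demo_list *head)
{
	struct demo_request *req = calloc(1, sizeof(*req));

	assert(req);
	req->refcount = 1;
	demo_list_add_tail(&req->osd_item, head);
	return req;
}

static void demo_put_request(struct demo_request *req)
{
	assert(req->refcount > 0);
	if (--req->refcount == 0) {
		free(req);
		demo_requests_freed++;
	}
}

static void demo_unregister_request(struct demo_request *req)
{
	demo_list_del_init(&req->osd_item);  /* last access to req's memory */
	demo_put_request(req);               /* may free req */
}
```

With the two calls in the opposite order, the unlink would write into memory that the put may already have released.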
-
- Dec 13, 2012
-
-
Sage Weil authored
This would reset a connection with any OSD that had an outstanding request that was taking more than N seconds. The idea was that if the OSD was buggy, the client could compensate by resending the request. In reality, this only served to hide server bugs, and we haven't actually seen such a bug in quite a while. Moreover, the userspace client code never did this. More importantly, often the request is taking a long time because the OSD is trying to recover, or overloaded, and killing the connection and retrying would only make the situation worse by giving the OSD more work to do. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
- Nov 01, 2012
-
-
Alex Elder authored
Define and export function ceph_pg_pool_name_by_id() to supply the name of a pg pool whose id is given. This will be used by the next patch. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Josh Durgin <josh.durgin@inktank.com>
-
- Oct 30, 2012
-
-
Sage Weil authored
Ensure that we set the err value correctly so that we do not pass a 0 value to ERR_PTR and confuse the calling code. (In particular, osd_client.c handle_map() will BUG_ON(!newmap)). Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
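Why a zero error value is a problem can be shown with a userspace re-creation of the error-pointer idiom. This is a sketch under the assumption that error pointers occupy the top 4095 addresses; the `demo_` helpers are hypothetical and only imitate ERR_PTR(), PTR_ERR() and IS_ERR().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value stored in an error pointer. */
static long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the reserved error range. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/*
 * Hypothetical decoder. When the input is bad and err was never set,
 * the error path returns demo_err_ptr(0), which is simply NULL.
 */
static void *demo_decode(int input_ok, int set_err)
{
	static int map;
	int err = 0;

	if (!input_ok) {
		if (set_err)
			err = -EINVAL;
		return demo_err_ptr(err);
	}
	return &map;
}
```

A caller that checks only for error pointers treats the NULL result as success and then trips over it later, which matches the failure mode this commit describes.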
-
- Oct 26, 2012
-
-
Sage Weil authored
The ceph_con_in_msg_alloc() method calls the ->alloc_msg() helper which may return NULL. It also drops con->mutex while it allocates a message, which means that the connection state may change (e.g., get closed). If that happens, we clean up and bail out. Avoid calling ceph_msg_put() on a NULL return value and triggering a crash. This was observed when an ->alloc_msg() call races with a timeout that resends a zillion messages and resets the connection, and ->alloc_msg() returns NULL (because the request was resent to another target). Fixes http://tracker.newdream.net/issues/3342 Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
- Oct 10, 2012
-
-
Alex Elder authored
This patch defines a single function, queue_con_delay() to call queue_delayed_work() for a connection. It basically generalizes what was previously queue_con() by adding the delay argument. queue_con() is now a simple helper that passes 0 for its delay. queue_con_delay() returns 0 if it queued work or an errno if it did not for some reason. If con_work() finds the BACKOFF flag set for a connection, it now calls queue_con_delay() to handle arranging to start again after a delay. Note about connection reference counts: con_work() only ever gets called as a work item function. At the time that work is scheduled, a reference to the connection is acquired, and the corresponding con_work() call is then responsible for dropping that reference before it returns. Previously, the backoff handling inside con_work() silently handed off its reference to delayed work it scheduled. Now that queue_con_delay() is used, a new reference is acquired for the newly-scheduled work, and the original reference is dropped by the con->ops->put() call at the end of the function. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
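The reference-count rule in this commit can be modeled in userspace. The sketch below is a simplified illustration: the connection type and the `demo_` helpers are hypothetical, and the specific error codes (-ENOENT, -EBUSY) are assumptions chosen for the example, since the message only says "an errno".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a connection with one pending work slot. */
struct demo_con {
	int refcount;
	int work_queued;        /* 1 while a work item is pending */
	unsigned long delay;    /* delay of the pending work item */
};

/* Take a reference; fails if the connection is already dead. */
static int demo_con_get(struct demo_con *con)
{
	if (con->refcount == 0)
		return 0;
	con->refcount++;
	return 1;
}

static void demo_con_put(struct demo_con *con)
{
	assert(con->refcount > 0);
	con->refcount--;
}

/* Models queue_delayed_work(): returns 0 if the item was already pending. */
static int demo_queue_delayed_work(struct demo_con *con, unsigned long delay)
{
	if (con->work_queued)
		return 0;
	con->work_queued = 1;
	con->delay = delay;
	return 1;
}

/* Each successfully queued work item owns exactly one reference. */
static int demo_queue_con_delay(struct demo_con *con, unsigned long delay)
{
	if (!demo_con_get(con))
		return -ENOENT;
	if (!demo_queue_delayed_work(con, delay)) {
		demo_con_put(con);   /* nothing queued, so give the ref back */
		return -EBUSY;
	}
	return 0;
}

/* The no-delay case is a thin wrapper. */
static int demo_queue_con(struct demo_con *con)
{
	return demo_queue_con_delay(con, 0);
}
```

Because the helper takes its own reference, the work function can drop the reference it was entered with unconditionally at its end, with no silent hand-off.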
-
Alex Elder authored
Both ceph_fault() and con_work() include handling for imposing a delay before doing further processing on a faulted connection. The latter is used only if ceph_fault() is unable to. Instead, just let con_work() always be responsible for implementing the delay. After setting up the delay value, set the BACKOFF flag on the connection unconditionally and call queue_con() to ensure con_work() will get called to handle it. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
Alex Elder authored
If ceph_fault() is unable to queue work after a delay, it sets the BACKOFF connection flag so con_work() will attempt to do so. In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't result in newly-queued work, it simply ignores this condition and proceeds as if no backoff delay were desired. There are two problems with this--one of which is a bug. The first problem is simply that the intended behavior is to back off, and if we aren't able to queue the work item to run after a delay we're not doing that. The only reason queue_delayed_work() won't queue work is if the provided work item is already queued. In the messenger, this means that con_work() is already scheduled to be run again. So if we simply set the BACKOFF flag again when this occurs, we know the next con_work() call will again attempt to hold off activity on the connection until after the delay. The second problem--the bug--is a leak of a reference count. If queue_delayed_work() returns 0 in con_work(), con->ops->put() drops the connection reference held on entry to con_work(). However, processing is (was) allowed to continue, and at the end of the function a second con->ops->put() is called. This patch fixes both problems. Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- Oct 01, 2012
-
-
Sage Weil authored
If we are creating an osd request and get an invalid layout, return -EINVAL to the caller. Change the return value to carry an error code instead of NULL, which implied -ENOMEM. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
-
Sage Weil authored
If we encounter an invalid (e.g., zeroed) mapping, return an error and avoid a divide by zero. Signed-off-by:
Sage Weil <sage@inktank.com> Reviewed-by:
Alex Elder <elder@inktank.com>
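The guard can be sketched as follows. The striping arithmetic here is a simplified assumption made for illustration, and the layout struct and `demo_calc_object()` are hypothetical; the point is only that every divisor is validated before it is used.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical file layout; a zeroed struct is an invalid mapping. */
struct demo_layout {
	uint32_t stripe_unit;   /* bytes per stripe unit */
	uint32_t stripe_count;  /* objects per object set */
	uint32_t object_size;   /* bytes per object */
};

/* Map a file offset to an object number, rejecting bad layouts. */
static int demo_calc_object(const struct demo_layout *l, uint64_t off,
			    uint64_t *ono)
{
	uint64_t su_per_object, block, stripeno, stripepos, objsetno;

	/* Every value used as a divisor below must be nonzero. */
	if (l->stripe_unit == 0 || l->stripe_count == 0)
		return -EINVAL;
	su_per_object = l->object_size / l->stripe_unit;
	if (su_per_object == 0)
		return -EINVAL;

	block = off / l->stripe_unit;
	stripeno = block / l->stripe_count;
	stripepos = block % l->stripe_count;
	objsetno = stripeno / su_per_object;
	*ono = objsetno * l->stripe_count + stripepos;
	return 0;
}
```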
-
Wei Yongjun authored
Using list_move_tail() instead of list_del() + list_add_tail(). Signed-off-by:
Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Iulius Curt authored
Make ceph_monc_do_poolop() static to remove the following sparse warning: * net/ceph/mon_client.c:616:5: warning: symbol 'ceph_monc_do_poolop' was not declared. Should it be static? Also drops the 'ceph_monc_' prefix, now being a private function. Signed-off-by:
Iulius Curt <icurt@ixiacom.com> Signed-off-by:
Sage Weil <sage@inktank.com>
-
Sage Weil authored
This is unused; use monc->client->have_fsid. Signed-off-by:
Sage Weil <sage@inktank.com>
-
- Sep 27, 2012
-
-
Nicolas Dichtel authored
When jiffies wraps around (for example, 5 minutes after the boot, see INITIAL_JIFFIES) and peer has just been created, now - peer->rate_last can be < XRLIM_BURST_FACTOR * timeout, so token is not set to the maximum value, thus some icmp packets can be unexpectedly dropped. Fix this case by initializing rate_last to 60 seconds in the past. Signed-off-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
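The arithmetic behind this fix can be sketched with a 32-bit tick counter. The tick rate, the burst factor and the `demo_` helpers are assumptions for illustration, and the token calculation is reduced to the part that matters for a freshly created peer.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HZ 1000u            /* assumed tick rate */
#define DEMO_BURST_FACTOR 6u     /* assumed burst factor */

/* Wrap-safe elapsed ticks: unsigned subtraction is modulo 2^32. */
static uint32_t demo_elapsed(uint32_t now, uint32_t rate_last)
{
	return now - rate_last;
}

/* Tokens for a new peer: elapsed ticks, capped at the burst size. */
static uint32_t demo_tokens(uint32_t now, uint32_t rate_last,
			    uint32_t timeout)
{
	uint32_t max = DEMO_BURST_FACTOR * timeout;
	uint32_t elapsed = demo_elapsed(now, rate_last);

	return elapsed > max ? max : elapsed;
}

/* The fix: start the peer as if it was last seen 60 seconds ago. */
static uint32_t demo_initial_rate_last(uint32_t now)
{
	return now - 60u * DEMO_HZ;
}
```

With a zero-initialized timestamp and a counter that has just wrapped to a small value, the elapsed time looks tiny and the peer starts with almost no tokens; starting 60 seconds in the past gives it the full burst regardless of where the counter is.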
-
Wei Yongjun authored
In case of error, the function genlmsg_put() returns a NULL pointer, not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with a NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by:
Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 25, 2012
-
-
Jan Engelhardt authored
Commit v2.6.19-rc1~1272^2~41 tells us that r->cost != 0 can happen when a running state is saved to userspace and then reinstated from there. Make sure that private xt_limit area is initialized with correct values. Otherwise, matches become random because uninitialized memory is used. Signed-off-by:
Jan Engelhardt <jengelh@inai.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Eric Dumazet authored
mip6_mh_filter() should not modify its input, or else its caller would need to recompute ipv6_hdr() if skb->head is reallocated. Use skb_header_pointer() instead of pskb_may_pull(). Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
icmpv6_filter() should not modify its input, or else its caller would need to recompute ipv6_hdr() if skb->head is reallocated. Use skb_header_pointer() instead of pskb_may_pull() and change the prototype to make clear both sk and skb are const. Also, if icmpv6 header cannot be found, do not deliver the packet, as we do in IPv4. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
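The difference between the two access styles can be sketched in userspace: a read-only accessor either returns a pointer into the linear data or copies the requested bytes into a caller-supplied buffer, and it never reshapes the packet. The packet struct and `demo_header_pointer()` are hypothetical and only imitate the idea behind skb_header_pointer().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet: a linear head followed by one paged fragment. */
struct demo_packet {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/*
 * Return a pointer to len bytes at offset without modifying pkt.
 * Bytes fully inside the linear head are returned in place; anything
 * crossing into the fragment is copied into buffer. NULL means the
 * packet is too short.
 */
static const void *demo_header_pointer(const struct demo_packet *pkt,
				       size_t offset, size_t len,
				       void *buffer)
{
	size_t total = pkt->head_len + pkt->frag_len;
	unsigned char *out = buffer;
	size_t i;

	if (offset > total || len > total - offset)
		return NULL;
	if (offset + len <= pkt->head_len)
		return pkt->head + offset;

	for (i = 0; i < len; i++) {
		size_t pos = offset + i;

		out[i] = pos < pkt->head_len ?
			 pkt->head[pos] : pkt->frag[pos - pkt->head_len];
	}
	return buffer;
}
```

Since the packet is taken as const and never reallocated, pointers the caller computed earlier (such as a header pointer) stay valid after the filter runs.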
-
- Sep 24, 2012
-
-
Eric Dumazet authored
It's possible to use RAW sockets to get a crash in tcp_set_keepalive() / sk_reset_timer(). Fix is to make sure socket is a SOCK_STREAM one. Reported-by:
Dave Jones <davej@redhat.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 23, 2012
-
-
Linus Lüssing authored
If we receive an OGM from a neighbor other than the currently selected one, and it has the same TQ, then we are supposed to switch if this neighbor provides a more symmetric link than the currently selected one. However, this symmetry check is currently broken if the interface of the neighbor we received the OGM from and the one of the currently selected neighbor differ: we currently try to determine the symmetry of the link towards the selected router via the link we received the OGM from, instead of just checking via the link towards the currently selected router. This leads to way more route switches than necessary and can lead to permanent route flapping in many common multi-interface setups. This patch fixes this issue by using the right interface for this symmetry check. Signed-off-by:
Linus Lüssing <linus.luessing@web.de>
-
Def authored
In the function interface_set_mac_addr(), tt_local_add() was invoked before dev->dev_addr was updated. As a result, the new MAC address was not tagged as NoPurge. Signed-off-by:
Def <def@laposte.net>
-
- Sep 22, 2012
-
-
Eric Dumazet authored
icmp_filter() should not modify its input, or else its caller would need to recompute ip_hdr() if skb->head is reallocated. Use skb_header_pointer() instead of pskb_may_pull() and change the prototype to make clear both sk and skb are const. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Alex Elder authored
In write_partial_msg_pages(), pages need to be kmapped in order to perform a CRC-32c calculation on them. As an artifact of the way this code used to be structured, the kunmap() call was separated from the kmap() call and both were done conditionally. But the conditions under which the kmap() and kunmap() calls were made differed, so there was a chance a kunmap() call would be done on a page that had not been mapped. The symptom of this was tripping a BUG() in kunmap_high() when pkmap_count[nr] became 0. Reported-by:
Bryan K. Wright <bryan@virginia.edu> Signed-off-by:
Alex Elder <elder@inktank.com> Reviewed-by:
Sage Weil <sage@inktank.com>
-
- Sep 21, 2012
-
-
Zhao Hongjiang authored
Change return value from -EACCES to -EPERM when the permission check fails. Signed-off-by:
Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
In case of error, the function fib6_add_1() returns ERR_PTR() or a NULL pointer. The ERR_PTR() case check is missing in fib6_add(). dpatch engine is used to generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by:
Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ed L. Cashin authored
A change in a series of VLAN-related changes appears to have inadvertently disabled the use of the scatter gather feature of network cards for transmission of non-IP ethernet protocols like ATA over Ethernet (AoE). Below is a reference to the commit that introduces a "harmonize_features" function that turns off scatter gather when the NIC does not support hardware checksumming for the ethernet protocol of an sk buff. commit f01a5236 Author: Jesse Gross <jesse@nicira.com> Date: Sun Jan 9 06:23:31 2011 +0000 net offloading: Generalize netif_get_vlan_features(). The can_checksum_protocol function is not equipped to consider a protocol that does not require checksumming. Calling it for a protocol that requires no checksum is inappropriate. The patch below has harmonize_features call can_checksum_protocol when the protocol needs a checksum, so that the network layer is not forced to perform unnecessary skb linearization on the transmission of AoE packets. Unnecessary linearization results in decreased performance and increased memory pressure, as reported here: http://www.spinics.net/lists/linux-mm/msg15184.html The problem has probably not been widely experienced yet, because only recently has the kernel.org-distributed aoe driver acquired the ability to use payloads of over a page in size, with the patchset recently included in the mm tree: https://lkml.org/lkml/2012/8/28/140 The coraid.com-distributed aoe driver already could use payloads of greater than a page in size, but its users generally do not use the newest kernels. Signed-off-by:
Ed Cashin <ecashin@coraid.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 20, 2012
-
-
Mathias Krause authored
The ESN replay window was already fully initialized in xfrm_alloc_replay_state_esn(). No need to copy it again. Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
Mathias Krause <minipli@googlemail.com> Acked-by:
Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-