- Jul 18, 2012
-
-
Saurabh authored
Incorporated David and Steffen's comments. Add hook for rx-path xfmr4_mode_tunnel for VTI tunnel module. Signed-off-by:
Saurabh Mohan <saurabh.mohan@vyatta.com> Reviewed-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Followup of commit 0c24604b (tcp: implement RFC 5961 4.2) As reported by Vijay Subramanian, we should send a challenge ACK instead of a dup ack if a SYN flag is set on a packet received out of window. This permits the ratelimiting to work as intended, and to increase correct SNMP counters. Suggested-by:
Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 17, 2012
-
-
Eric Dumazet authored
free_nh_exceptions() should use rcu_dereference_protected(..., 1) since its called after one RCU grace period. Also add some const-ification in recent code. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Need to mask it with (FNHE_HASH_SIZE - 1). Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
In a regime where we have subnetted route entries, we need a way to store persistent storage about destination specific learned values such as redirects and PMTU values. This is implemented here via nexthop exceptions. The initial implementation is a 2048 entry hash table with relaiming starting at chain length 5. A more sophisticated scheme can be devised if that proves necessary. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Implement the RFC 5691 mitigation against Blind Reset attack using SYN bit. Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop incoming packet, instead of resetting the session. Add a new SNMP counter to count number of challenge acks sent in response to SYN packets. (netstat -s | grep TCPSYNChallenge) Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session because of a SYN flag. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s | grep TCPChallengeACK) Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andrey Vagin authored
Before this patch sock_diag works for init_net only and dumps information about sockets from all namespaces. This patch expands sock_diag for all name-spaces. It creates a netlink kernel socket for each netns and filters data during dumping. v2: filter accoding with netns in all places remove an unused variable. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by:
Andrew Vagin <avagin@openvz.org> Acked-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Add three SNMP TCP counters, to better track TCP behavior at global stage (netstat -s), when packets are received Out Of Order (OFO) TCPOFOQueue : Number of packets queued in OFO queue TCPOFODrop : Number of packets meant to be queued in OFO but dropped because socket rcvbuf limit hit. TCPOFOMerge : Number of packets in OFO that were merged with other packets. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 16, 2012
-
-
David S. Miller authored
This abstracts away the call to dst_ops->update_pmtu() so that we can transparently handle the fact that, in the future, the dst itself can be invalidated by the PMTU update (when we have non-host routes cached in sockets). So we try to rebuild the socket cached route after the method invocation if necessary. This isn't used by SCTP because it needs to cache dsts per-transport, and thus will need it's own local version of this helper. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 13, 2012
-
-
David S. Miller authored
We only use it to fetch the rule's tclassid, so just store the tclassid there instead. This also decreases the size of fib_result by a full 8 bytes on 64-bit. On 32-bits it's a wash. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Socket state LAST_ACK should allow TSQ to send additional frames, or else we rely on incoming ACKS or timers to send them. Reported-by:
Yuchung Cheng <ycheng@google.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 12, 2012
-
-
David S. Miller authored
No longer used. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
All handler->err() routines expect that we've done a pskb_may_pull() test to make sure that IP header length + 8 bytes can be safely pulled. Reported-by:
Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Reported-by:
Fengguang Wu <fengguang.wu@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer necessary. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer needed, as the protocol handlers now all properly propagate the redirect back into the routing code. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
All of the redirect acceptance policy is now contained within. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Pass in the SKB rather than just the IP addresses, so that policy and other aspects can reside in ip_rt_redirect() rather then icmp_redirect(). Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
And thus, we can remove the ping_err() hack. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This introduce TSQ (TCP Small Queues) TSQ goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. sk->sk_wmem_alloc not allowed to grow above a given limit, allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a given time. TSO packets are sized/capped to half the limit, so that we have two TSO packets in flight, allowing better bandwidth use. As a side effect, setting the limit to 40000 automatically reduces the standard gso max limit (65536) to 40000/2 : It can help to reduce latencies of high prio packets, having smaller TSO packets. This means we divert sock_wfree() to a tcp_wfree() handler, to queue/send following frames when skb_orphan() [2] is called for the already queued skbs. Results on my dev machines (tg3/ixgbe nics) are really impressive, using standard pfifo_fast, and with or without TSO/GSO. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) < 8ms on 100Mbit (instead of 132 ms) I no longer have 4 MBytes backlogged in qdisc by a single netperf session, and both side socket autotuning no longer use 4 Mbytes. As skb destructor cannot restart xmit itself ( as qdisc lock might be taken at this point ), we delegate the work to a tasklet. We use one tasklest per cpu for performance reasons. If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag. This flag is tested in a new protocol method called from release_sock(), to eventually send new segments. [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable [2] skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Dave Taht <dave.taht@bufferbloat.net> Cc: Tom Herbert <therbert@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
The recent patch "tcp: Maintain dynamic metrics in local cache." introduced an out of bounds access due to what appears to be a typo. I believe this change should resolve the issue by replacing the access to RTAX_CWND with TCP_METRIC_CWND. Signed-off-by:
Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 11, 2012
-
-
Ben Hutchings authored
Signed-off-by:
Ben Hutchings <bhutchings@solarflare.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
Fix incorrect start markers, wrapped summary lines, missing section breaks, incorrect separators, and some name mismatches. Signed-off-by:
Ben Hutchings <bhutchings@solarflare.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer used. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Nothing every writes to ipv4 metrics any longer. PMTU is stored in rt->rt_pmtu. Dynamic TCP metrics are stored in a special TCP metrics cache, completely outside of the routes. Therefore ->cow_metrics() can simply nothing more than a WARN_ON trigger so we can catch anyone who tries to add new writes to ipv4 route metrics. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Blackhole routes have a COW metrics operation that returns NULL always, therefore this dst_copy_metrics() call did absolutely nothing. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Rather than at every struct rtable creation. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Maintaining this in the inetpeer entries was not the right way to do this at all. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Nobody provides non-zero values any longer. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer needed. TCP writes metrics, but now in it's own special cache that does not dirty the route metrics. Therefore there is no longer any reason to pre-cow metrics in this way. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Only use it in the absolutely required cases: 1) COW'ing metrics 2) ipv4 PMTU 3) ipv4 redirects Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer used. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer used. Signed-off-by:
David S. Miller <davem@davemloft.net>
-