Skip to content
  1. Jun 16, 2017
    • Vivien Didelot's avatar
      net: dsa: add cross-chip multicast support · a1a6b7ea
      Vivien Didelot authored
      
      
      Similarly to how cross-chip VLAN works, define a bitmap of multicast
      group members for a switch, now including its DSA ports, so that
      multicast traffic can be sent to all switches of the fabric.
      
      A switch may drop the frames if no user port is a member.
      
      This brings support for multicast in a multi-chip environment.
      As of now, all switches of the fabric must support the multicast
      operations in order to program a single fabric port.
      
      Reported-by: default avatarJason Cobham <jcobham@questertangent.com>
      Signed-off-by: default avatarVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Tested-by: default avatarJason Cobham <jcobham@questertangent.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1a6b7ea
    • Wei Wang's avatar
      decnet: always not take dst->__refcnt when inserting dst into hash table · 247488c0
      Wei Wang authored
      
      
      In the existing dn_route.c code, dn_route_output_slow() takes
      dst->__refcnt before calling dn_insert_route() while dn_route_input_slow()
      does not take dst->__refcnt before calling dn_insert_route().
      This makes the whole routing code very buggy.
      In dn_dst_check_expire(), dnrt_free() is called when rt expires. This
      makes the routes inserted by dn_route_output_slow() not able to be
      freed as the refcnt is not released.
      In dn_dst_gc(), dnrt_drop() is called to release rt which could
      potentially cause the dst->__refcnt to be dropped to -1.
      In dn_run_flush(), dst_free() is called to release all the dst. Again,
      it makes the dst inserted by dn_route_output_slow() not able to be
      released and also, it does not wait on the rcu and could potentially
      cause crash in the path where other users still refer to this dst.
      
      This patch makes sure both input and output path do not take
      dst->__refcnt before calling dn_insert_route() and also makes sure
      dnrt_free()/dst_free() is called when removing dst from the hash table.
      The only difference between those 2 calls is that dnrt_free() waits on
      the rcu while dst_free() does not.
      
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      247488c0
    • Sowmini Varadhan's avatar
      rds: tcp: Set linger when rejecting an incoming conn in rds_tcp_accept_one · 10beea7d
      Sowmini Varadhan authored
      
      
      Each time we get an incoming SYN to the RDS_TCP_PORT, the TCP
      layer accepts the connection and then the rds_tcp_accept_one()
      callback is invoked to process the incoming connection.
      
      rds_tcp_accept_one() may reject the incoming syn for a number of
      reasons, e.g., commit 1a0e100f ("RDS: TCP: Force every connection
      to be initiated by numerically smaller IP address"), or because
      we are getting spammed by a malicious node that is triggering
      a flood of connection attempts to RDS_TCP_PORT. If the incoming
      syn is rejected, no data would have been sent on the TCP socket,
      and we do not need to be in TIME_WAIT state, so we set linger on
      the TCP socket before closing, thereby closing the socket efficiently
      with a RST.
      
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Tested-by: default avatarImanti Mendez <imanti.mendez@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      10beea7d
    • Sowmini Varadhan's avatar
      rds: tcp: various endian-ness fixes · 00354de5
      Sowmini Varadhan authored
      
      
      Found when testing between sparc and x86 machines on different
      subnets, so the address comparison patterns hit the corner cases and
      brought out some bugs fixed by this patch.
      
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Tested-by: default avatarImanti Mendez <imanti.mendez@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      00354de5
    • Sowmini Varadhan's avatar
      rds: tcp: remove cp_outgoing · 41500c3e
      Sowmini Varadhan authored
      
      
      After commit 1a0e100f ("RDS: TCP: Force every connection to be
      initiated by numerically smaller IP address") we no longer need
      the logic associated with cp_outgoing, so clean up usage of this
      field.
      
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Tested-by: default avatarImanti Mendez <imanti.mendez@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      41500c3e
    • Martin KaFai Lau's avatar
      net: Add IFLA_XDP_PROG_ID · 58038695
      Martin KaFai Lau authored
      
      
      Expose prog_id through IFLA_XDP_PROG_ID.  This patch
      makes modification to generic_xdp.  The later patches will
      modify other xdp-supported drivers.
      
      prog_id is added to struct net_dev_xdp.
      
      iproute2 patch will be followed. Here is how the 'ip link'
      will look like:
      > ip link show eth0
      3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp(prog_id:1) qdisc fq_codel state UP mode DEFAULT group default qlen 1000
      
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@fb.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      58038695
    • Johannes Berg's avatar
      networking: add and use skb_put_u8() · 634fef61
      Johannes Berg authored
      
      
      Joe and Bjørn suggested that it'd be nicer to not have the
      cast in the fairly common case of doing
      	*(u8 *)skb_put(skb, 1) = c;
      
      Add skb_put_u8() for this case, and use it across the code,
      using the following spatch:
      
          @@
          expression SKB, C, S;
          typedef u8;
          identifier fn = {skb_put};
          fresh identifier fn2 = fn ## "_u8";
          @@
          - *(u8 *)fn(SKB, S) = C;
          + fn2(SKB, C);
      
      Note that due to the "S", the spatch isn't perfect, it should
      have checked that S is 1, but there's also places that use a
      sizeof expression like sizeof(var) or sizeof(u8) etc. Turns
      out that nobody ever did something like
      	*(u8 *)skb_put(skb, 2) = c;
      
      which would be wrong anyway since the second byte wouldn't be
      initialized.
      
      Suggested-by: default avatarJoe Perches <joe@perches.com>
      Suggested-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      634fef61
    • Johannes Berg's avatar
      networking: make skb_push & __skb_push return void pointers · d58ff351
      Johannes Berg authored
      
      
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
          @@
          expression SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - fn(SKB, LEN)[0]
          + *(u8 *)fn(SKB, LEN)
      
      Note that the last part there converts from push(...)[0] to the
      more idiomatic *(u8 *)push(...).
      
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d58ff351
    • Johannes Berg's avatar
      networking: make skb_pull & friends return void pointers · af72868b
      Johannes Berg authored
      
      
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = {
                  skb_pull,
                  __skb_pull,
                  skb_pull_inline,
                  __pskb_pull_tail,
                  __pskb_pull,
                  pskb_pull
          };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = {
                  skb_pull,
                  __skb_pull,
                  skb_pull_inline,
                  __pskb_pull_tail,
                  __pskb_pull,
                  pskb_pull
          };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af72868b
    • Johannes Berg's avatar
      networking: make skb_put & friends return void pointers · 4df864c1
      Johannes Berg authored
      
      
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
      
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4df864c1
    • Johannes Berg's avatar
      networking: introduce and use skb_put_data() · 59ae1d12
      Johannes Berg authored
      
      
      A common pattern with skb_put() is to just want to memcpy()
      some data into the new space, introduce skb_put_data() for
      this.
      
      An spatch similar to the one for skb_put_zero() converts many
      of the places using it:
      
          @@
          identifier p, p2;
          expression len, skb, data;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, len);
          |
          -memcpy(p, data, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb, data;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, sizeof(*p));
          |
          -memcpy(p, data, sizeof(*p));
          )
      
          @@
          expression skb, len, data;
          @@
          -memcpy(skb_put(skb, len), data, len);
          +skb_put_data(skb, data, len);
      
      (again, manually post-processed to retain some comments)
      
      Reviewed-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59ae1d12
    • Johannes Berg's avatar
      networking: convert many more places to skb_put_zero() · b080db58
      Johannes Berg authored
      
      
      There were many places that my previous spatch didn't find,
      as pointed out by yuan linyu in various patches.
      
      The following spatch found many more and also removes the
      now unnecessary casts:
      
          @@
          identifier p, p2;
          expression len;
          expression skb;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, len);
          |
          -memset(p, 0, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, sizeof(*p));
          |
          -memset(p, 0, sizeof(*p));
          )
      
          @@
          expression skb, len;
          @@
          -memset(skb_put(skb, len), 0, len);
          +skb_put_zero(skb, len);
      
      Apply it to the tree (with one manual fixup to keep the
      comment in vxlan.c, which spatch removed.)
      
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b080db58
    • David S. Miller's avatar
      tls: Depend upon INET not plain NET. · 54144b48
      David S. Miller authored
      
      
      We refer to TCP et al. symbols so have to use INET as
      the dependency.
      
         ERROR: "tcp_prot" [net/tls/tls.ko] undefined!
      >> ERROR: "tcp_rate_check_app_limited" [net/tls/tls.ko] undefined!
         ERROR: "tcp_register_ulp" [net/tls/tls.ko] undefined!
         ERROR: "tcp_unregister_ulp" [net/tls/tls.ko] undefined!
         ERROR: "do_tcp_sendpages" [net/tls/tls.ko] undefined!
      
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      54144b48
  2. Jun 15, 2017
  3. Jun 14, 2017
    • David Howells's avatar
      rxrpc: Cache the congestion window setting · f7aec129
      David Howells authored
      
      
      Cache the congestion window setting that was determined during a call's
      transmission phase when it finishes so that it can be used by the next call
      to the same peer, thereby shortcutting the slow-start algorithm.
      
      The value is stored in the rxrpc_peer struct and is accessed without
      locking.  Each call takes the value that happens to be there when it starts
      and just overwrites the value when it finishes.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7aec129
    • Jesper Dangaard Brouer's avatar
      net: don't global ICMP rate limit packets originating from loopback · 849a44de
      Jesper Dangaard Brouer authored
      
      
      Florian Weimer seems to have a glibc test-case which requires that
      loopback interfaces does not get ICMP ratelimited.  This was broken by
      commit c0303efe ("net: reduce cycles spend on ICMP replies that
      gets rate limited").
      
      An ICMP response will usually be routed back-out the same incoming
      interface.  Thus, take advantage of this and skip global ICMP
      ratelimit when the incoming device is loopback.  In the unlikely event
      that the outgoing it not loopback, due to strange routing policy
      rules, ICMP rate limiting still works via peer ratelimiting via
      icmpv4_xrlim_allow().  Thus, we should still comply with RFC1812
      (section 4.3.2.8 "Rate Limiting").
      
      This seems to fix the reproducer given by Florian.  While still
      avoiding to perform expensive and unneeded outgoing route lookup for
      rate limited packets (in the non-loopback case).
      
      Fixes: c0303efe ("net: reduce cycles spend on ICMP replies that gets rate limited")
      Reported-by: default avatarFlorian Weimer <fweimer@redhat.com>
      Reported-by: default avatar"H.J. Lu" <hjl.tools@gmail.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      849a44de
    • Dan Carpenter's avatar
      net/act_pedit: fix an error code · c4f65b09
      Dan Carpenter authored
      
      
      I'm reviewing static checker warnings where we do ERR_PTR(0), which is
      the same as NULL.  I'm pretty sure we intended to return ERR_PTR(-EINVAL)
      here.  Sometimes these bugs lead to a NULL dereference but I don't
      immediately see that problem here.
      
      Fixes: 71d0ed70 ("net/act_pedit: Support using offset relative to the conventional network headers")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarAmir Vadai <amir@vadai.me>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c4f65b09
    • Paolo Abeni's avatar
      net: use skb_unref() in napi_consume_skb() · 7608894e
      Paolo Abeni authored
      
      
      The commit 83ada39bb79d ("net: factor out a helper to decrement the
      skb refcount") provided and used a helper for decrementing skb usage,
      but I missed at least a spot for it.
      
      This change remove some more duplicated code reusing skb_unref() in
      napi_consume_skb(), too. The helper uses an additional, unneeded
      unlikely(!skb) test - napi_consume_skb() already check it a few lines
      above - but the compiler is smart enough to optimize the duplicated
      test out.
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7608894e
    • Yonghong Song's avatar
      bpf: permits narrower load from bpf program context fields · 31fd8581
      Yonghong Song authored
      
      
      Currently, verifier will reject a program if it contains an
      narrower load from the bpf context structure. For example,
              __u8 h = __sk_buff->hash, or
              __u16 p = __sk_buff->protocol
              __u32 sample_period = bpf_perf_event_data->sample_period
      which are narrower loads of 4-byte or 8-byte field.
      
      This patch solves the issue by:
        . Introduce a new parameter ctx_field_size to carry the
          field size of narrower load from prog type
          specific *__is_valid_access validator back to verifier.
        . The non-zero ctx_field_size for a memory access indicates
          (1). underlying prog type specific convert_ctx_accesses
               supporting non-whole-field access
          (2). the current insn is a narrower or whole field access.
        . In verifier, for such loads where load memory size is
          less than ctx_field_size, verifier transforms it
          to a full field load followed by proper masking.
        . Currently, __sk_buff and bpf_perf_event_data->sample_period
          are supporting narrowing loads.
        . Narrower stores are still not allowed as typical ctx stores
          are just normal stores.
      
      Because of this change, some tests in verifier will fail and
      these tests are removed. As a bonus, rename some out of bound
      __sk_buff->cb access to proper field name and remove two
      redundant "skb cb oob" tests.
      
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      31fd8581
    • WANG Cong's avatar
      net_sched: move tcf_lock down after gen_replace_estimator() · 74030603
      WANG Cong authored
      
      
      Laura reported a sleep-in-atomic kernel warning inside
      tcf_act_police_init() which calls gen_replace_estimator() with
      spinlock protection.
      
      It is not necessary in this case, we already have RTNL lock here
      so it is enough to protect concurrent writers. For the reader,
      i.e. tcf_act_police(), it needs to make decision based on this
      rate estimator, in the worst case we drop more/less packets than
      necessary while changing the rate in parallel, it is still acceptable.
      
      Reported-by: default avatarLaura Abbott <labbott@redhat.com>
      Reported-by: default avatarNick Huber <nicholashuber@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      74030603
  4. Jun 13, 2017
Loading