  1. Mar 27, 2018
  2. Mar 26, 2018
  3. Mar 24, 2018
  4. Mar 23, 2018
    • tipc: add 128-bit node identifier · d50ccc2d
      Jon Maloy authored
      
      
      We add a 128-bit node identity, as an alternative to the currently used
      32-bit node address.
      
      For the sake of compatibility and to minimize message header changes
      we retain the existing 32-bit address field. When not set explicitly by
      the user, this field will be filled with a hash value generated from the
      much longer node identity, and be used as a shorthand value for the
      latter.
      
      We permit either the address or the identity to be set by configuration,
      but not both, so when the address value is set by a legacy user the
      corresponding 128-bit node identity is generated based on that value.
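
      The relationship between the two values can be sketched as follows. This
      is only an illustration: it assumes a plain XOR fold of the identity's
      four 32-bit words, and the names node_id_to_addr() and addr_to_node_id()
      are hypothetical. The hash that TIPC actually uses may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NODE_ID_LEN 16 /* 128 bits */

/* Hypothetical helper: derive a 32-bit shorthand address from a 128-bit
 * identity by XOR-folding its four 32-bit words (assumed hash). */
static uint32_t node_id_to_addr(const uint8_t id[NODE_ID_LEN])
{
    uint32_t word, hash = 0;

    for (int i = 0; i < NODE_ID_LEN; i += 4) {
        memcpy(&word, id + i, sizeof(word));
        hash ^= word;
    }
    return hash;
}

/* Hypothetical helper for the legacy case: build an identity from a
 * configured 32-bit address by storing it in the first word and zeroing
 * the rest, so that folding the identity gives back the same address. */
static void addr_to_node_id(uint32_t addr, uint8_t id[NODE_ID_LEN])
{
    memset(id, 0, NODE_ID_LEN);
    memcpy(id, &addr, sizeof(addr));
}
```

      With this choice a legacy address survives the round trip through the
      identity unchanged, which is one way to keep both configuration paths
      consistent.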
      
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d50ccc2d
    • tls: RX path for ktls · c46234eb
      Dave Watson authored
      
      
      Add rx path for tls software implementation.
      
      recvmsg, splice_read, and poll implemented.
      
      An additional sockopt TLS_RX is added, with the same interface as
      TLS_TX.  Either TLS_RX or TLS_TX may be provided separately, or
      together (with two different setsockopt calls with appropriate keys).
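
      From userspace, the two setsockopt calls might look like the sketch
      below. It assumes TLS 1.2 with AES-GCM-128 and a Linux system with
      <linux/tls.h> installed. The struct ktls_keys type and the ktls_set_keys()
      and ktls_enable() helpers are hypothetical names introduced here, and the
      fallback #define values are only for older headers.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/tls.h>

#ifndef SOL_TLS
#define SOL_TLS 282
#endif
#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef TLS_RX
#define TLS_RX 2
#endif

/* Hypothetical container for the key material of one direction. */
struct ktls_keys {
    unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
    unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
    unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
    unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
};

/* Install keys for one direction; 'direction' is TLS_TX or TLS_RX. */
static int ktls_set_keys(int fd, int direction, const struct ktls_keys *k)
{
    struct tls12_crypto_info_aes_gcm_128 ci;

    memset(&ci, 0, sizeof(ci));
    ci.info.version = TLS_1_2_VERSION;
    ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(ci.key, k->key, sizeof(ci.key));
    memcpy(ci.iv, k->iv, sizeof(ci.iv));
    memcpy(ci.salt, k->salt, sizeof(ci.salt));
    memcpy(ci.rec_seq, k->rec_seq, sizeof(ci.rec_seq));
    return setsockopt(fd, SOL_TLS, direction, &ci, sizeof(ci));
}

/* Attach the "tls" ULP, then configure TX and RX with separate calls. */
static int ktls_enable(int fd, const struct ktls_keys *tx,
                       const struct ktls_keys *rx)
{
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;
    if (ktls_set_keys(fd, TLS_TX, tx) < 0)
        return -1;
    return ktls_set_keys(fd, TLS_RX, rx);
}
```

      The key material would normally come from a TLS handshake done in
      userspace. Either ktls_set_keys() call can be dropped if only one
      direction should be offloaded.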
      
      Control messages are passed via CMSG in a similar way to transmit.
      If no cmsg buffer is passed, then only application data records
      will be passed to userspace, and EIO is returned for other types of
      alerts.
      
      EBADMSG is returned for decryption errors and for framing that is too
      small, and EMSGSIZE is returned for framing that is too big (matching
      openssl semantics). EINVAL is returned for TLS versions that do not match
      the original setsockopt call.  All are unrecoverable.
      
      strparser is used to parse TLS framing.  Decryption is done directly
      into userspace buffers if they are large enough to support it; otherwise
      sk_cow_data is called (similar to ipsec), and buffers are decrypted in
      place and copied.  splice_read always decrypts in place, since no
      buffers are provided to decrypt into.
      
      sk_poll is overridden, and only returns POLLIN if a full TLS message is
      received.  Otherwise we wait for strparser to finish reading a full frame.
      Actual decryption is only done during recvmsg or splice_read calls.
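
      The framing check that decides when a full record is available can be
      sketched in plain C. This is an illustration, not the kernel code:
      tls_frame_size() and tls_record_ready() are hypothetical names, the upper
      bound is the RFC 5246 TLSCiphertext limit, and the lower bound assumes
      AES-GCM (8-byte explicit nonce plus 16-byte tag).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_HDR_LEN      5                  /* type(1) + version(2) + length(2) */
#define TLS_MAX_FRAGMENT ((1 << 14) + 2048) /* RFC 5246 TLSCiphertext limit */
#define TLS_MIN_FRAGMENT (8 + 16)           /* assumed: GCM nonce + tag */

/* Returns the total record size (header + body) once the header has
 * arrived, 0 if more bytes are needed, or a negative errno for bad
 * framing, using the error codes described above. */
static int tls_frame_size(const uint8_t *buf, size_t len,
                          uint16_t expected_version)
{
    uint16_t version;
    size_t body;

    if (len < TLS_HDR_LEN)
        return 0;
    version = (uint16_t)((buf[1] << 8) | buf[2]);
    body = (size_t)((buf[3] << 8) | buf[4]);
    if (body > TLS_MAX_FRAGMENT)
        return -EMSGSIZE;   /* framing too big */
    if (body < TLS_MIN_FRAGMENT)
        return -EBADMSG;    /* framing too small */
    if (version != expected_version)
        return -EINVAL;     /* does not match the configured version */
    return (int)(TLS_HDR_LEN + body);
}

/* Poll-style readiness: readable only when a whole record is buffered. */
static int tls_record_ready(const uint8_t *buf, size_t len, uint16_t version)
{
    int need = tls_frame_size(buf, len, version);

    return need > 0 && len >= (size_t)need;
}
```

      A partially received record therefore reports "not ready" even though
      bytes are queued, which matches the POLLIN behaviour described above.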
      
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c46234eb
    • tls: Refactor variable names · 58371585
      Dave Watson authored
      
      
      Several config variables are prefixed with tx. Drop the prefix,
      since these will be used for both tx and rx.
      
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      58371585
    • tls: Pass error code explicitly to tls_err_abort · f4a8e43f
      Dave Watson authored
      
      
      Pass EBADMSG explicitly to tls_err_abort.  The receive path will
      pass additional codes: EMSGSIZE if framing is larger than the maximum
      TLS record size, and EINVAL on a TLS version mismatch.
      
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4a8e43f
    • tls: Move cipher info to a separate struct · dbe42559
      Dave Watson authored
      
      
      Move the tx crypto parameters into a separate cipher_context struct.
      The same parameters will be used for rx using the same struct.
      
      tls_advance_record_sn is modified to only take the cipher info.
      
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dbe42559
    • devlink: Remove top_hierarchy arg for DEVLINK disabled path · e9de0018
      David Ahern authored
      
      
      An earlier change missed the path where CONFIG_NET_DEVLINK is disabled.
      Thanks to Jiri for spotting this.
      
      Fixes: 14530746 ("devlink: Remove top_hierarchy arg to devlink_resource_register")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e9de0018
    • Revert "mm: page_alloc: skip over regions of invalid pfns where possible" · f59f1caf
      Daniel Vacek authored
      This reverts commit b92df1de ("mm: page_alloc: skip over regions of
      invalid pfns where possible").  That commit was meant to speed up boot
      by skipping invalid pfns in the memmap_init_zone() loop.

      But given some specific memory mappings on x86_64 (or, more generally,
      in theory anywhere except on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
      implementation also skips valid pfns, which is plain wrong and causes
      'kernel BUG at mm/page_alloc.c:1389!'
      
        crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
        kernel BUG at mm/page_alloc.c:1389!
        invalid opcode: 0000 [#1] SMP
        --
        RIP: 0010: move_freepages+0x15e/0x160
        --
        Call Trace:
          move_freepages_block+0x73/0x80
          __rmqueue+0x263/0x460
          get_page_from_freelist+0x7e1/0x9e0
          __alloc_pages_nodemask+0x176/0x420
        --
      
        crash> page_init_bug -v | grep RAM
        <struct resource 0xffff88067fffd2f8>          1000 -        9bfff       System RAM (620.00 KiB)
        <struct resource 0xffff88067fffd3a0>        100000 -     430bffff       System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
        <struct resource 0xffff88067fffd410>      4b0c8000 -     4bf9cfff       System RAM ( 14.83 MiB = 15188.00 KiB)
        <struct resource 0xffff88067fffd480>      4bfac000 -     646b1fff       System RAM (391.02 MiB = 400408.00 KiB)
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct resource 0xffff88067fffd640>     100000000 -    67fffffff       System RAM ( 22.00 GiB)
      
        crash> page_init_bug | head -6
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct page 0xffffea0001ede200>   1fffff00000000  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        <struct page 0xffffea0001ede200>       505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
        <struct page 0xffffea0001ed8000>                0  0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA               1       4095
        <struct page 0xffffea0001edffc0>   1fffff00000400  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        BUG, zones differ!
      
        crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
              PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
        ffffea0001e00000  78000000                0        0  0 0
        ffffea0001ed7fc0  7b5ff000                0        0  0 0
        ffffea0001ed8000  7b600000                0        0  0 0       <<<<
        ffffea0001ede1c0  7b787000                0        0  0 0
        ffffea0001ede200  7b788000                0        0  1 1fffff00000000
      
      Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
      
      
      Fixes: b92df1de ("mm: page_alloc: skip over regions of invalid pfns where possible")
      Signed-off-by: Daniel Vacek <neelx@redhat.com>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f59f1caf
    • mm/vmalloc: add interfaces to free unmapped page table · b6bdb751
      Toshi Kani authored
      On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
      create pud/pmd mappings.  A kernel panic was observed on arm64 systems
      with Cortex-A75 in the following steps as described by Hanjun Guo.
      
       1. ioremap a 4K region; a valid page table is built.
       2. iounmap it; pte0 is set to 0.
       3. ioremap the same address with a 2M size; pgd/pmd is unchanged,
          then a new value is set for the pmd.
       4. pte0 is leaked.
       5. The CPU may take an exception because the old pmd is still in the
          TLB, which leads to a kernel panic.
      
      This panic is not reproducible on x86.  INVLPG, called from iounmap,
      purges all levels of entries associated with the purged address on x86.
      x86 still has the memory leak, however.
      
      The patch changes the ioremap path to free unmapped page table(s) since
      doing so in the unmap path has the following issues:
      
       - The iounmap() path is shared with vunmap(). Since vmap() only
         supports pte mappings, making vunmap() free a pte page adds
         overhead for regular vmap users, as they do not need a pte page
         freed up.

       - Checking if all entries in a pte page are cleared in the unmap path
         is racy, and serializing this check is expensive.

       - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
         Clearing a pud/pmd entry before the lazy TLB purges needs an extra
         TLB purge.
      
      Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
      clear a given pud/pmd entry and free up a page for the lower level
      entries.
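
      The leak and the intended fix can be illustrated with a toy userspace
      model of a single pmd entry. Everything here (struct toy_pmd, the toy_*
      functions, the live_pte_pages counter) is hypothetical and only mimics
      the idea of freeing the pte page in the map path; it is not kernel code.

```c
#include <assert.h>
#include <stdlib.h>

enum pmd_kind { PMD_NONE, PMD_TABLE, PMD_HUGE };

struct toy_pmd {
    enum pmd_kind kind;
    unsigned long *pte_page; /* meaningful only when kind == PMD_TABLE */
};

static int live_pte_pages;   /* allocated pte pages, used to expose leaks */

/* 4K mapping: allocate a pte page on first use, then fill one entry. */
static void toy_map_4k(struct toy_pmd *pmd, unsigned idx, unsigned long val)
{
    if (pmd->kind == PMD_NONE) {
        pmd->pte_page = calloc(512, sizeof(unsigned long));
        assert(pmd->pte_page != NULL);
        pmd->kind = PMD_TABLE;
        live_pte_pages++;
    }
    pmd->pte_page[idx] = val;
}

/* 4K unmap: only the pte is cleared; the pte page stays allocated. */
static void toy_unmap_4k(struct toy_pmd *pmd, unsigned idx)
{
    if (pmd->kind == PMD_TABLE)
        pmd->pte_page[idx] = 0;
}

/* Modeled on the idea of pmd_free_pte_page(): clear the pmd entry and
 * free the lower-level page. Returns 1 if the entry is clear afterwards. */
static int toy_pmd_free_pte_page(struct toy_pmd *pmd)
{
    if (pmd->kind == PMD_TABLE) {
        free(pmd->pte_page);
        pmd->pte_page = NULL;
        pmd->kind = PMD_NONE;
        live_pte_pages--;
    }
    return pmd->kind == PMD_NONE;
}

/* 2M mapping: with free_first == 0 an existing pte page is orphaned. */
static int toy_map_2m(struct toy_pmd *pmd, int free_first)
{
    if (free_first && !toy_pmd_free_pte_page(pmd))
        return 0;
    pmd->kind = PMD_HUGE;
    return 1;
}
```

      Running the 4K map, unmap, 2M map sequence with free_first set to 0
      leaves live_pte_pages at 1 (the leak from step 4), while free_first set
      to 1 brings it back to 0 before the huge entry is installed.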
      
      This patch implements their stub functions on x86 and arm64, which work
      as a workaround.
      
      [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
      Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
      
      
      Fixes: e61ce6ad ("mm: change ioremap to set up huge I/O mappings")
      Reported-by: Lei Li <lious.lilei@hisilicon.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b6bdb751
  5. Mar 22, 2018
  6. Mar 21, 2018
  7. Mar 19, 2018
    • bpf: sk_msg program helper bpf_sk_msg_pull_data · 015632bb
      John Fastabend authored
      
      
      Currently, if a bpf sk msg program is run the program
      can only parse data that the (start,end) pointers already
      consumed. For sendmsg hooks this is likely the first
      scatterlist element. For sendpage this will be the range
      (0,0) because the data is shared with userspace and by
      default we want to avoid allowing userspace to modify
      data while (or after) BPF verdict is being decided.
      
      To support pulling in additional bytes for parsing use
      a new helper bpf_sk_msg_pull(start, end, flags) which
      works similarly to cls tc logic. This helper will attempt
      to point the data start pointer at 'start' bytes offset
      into msg and data end pointer at 'end' bytes offset into
      message.
      
      After basic sanity checks to ensure 'start' <= 'end' and
      'end' <= msg_length there are a few cases we need to
      handle.
      
      First, the sendmsg hook has already copied the data from
      userspace and has exclusive access to it. Therefore, it
      is not necessary to copy the data. However, it may
      be required. After finding the scatterlist element with
      'start' offset byte in it there are two cases. In the first,
      the range (start,end) is entirely contained in the sg element
      and is already linear. All that is needed is to update the
      data pointers; no allocate/copy is needed. In the other case,
      (start, end) crosses sg element boundaries. In this
      case we allocate a block of size 'end - start' and copy
      the data to linearize it.
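
      The two sendmsg cases can be sketched on a toy scatterlist in userspace.
      The struct toy_sg and struct toy_msg types and toy_pull_data() are
      hypothetical names, not the kernel helper. The sketch linearizes into a
      side buffer only and does not model writing the copy back into the
      message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_sg { unsigned char *buf; size_t len; };

struct toy_msg {
    struct toy_sg *sg;
    size_t nsg;
    unsigned char *data, *data_end; /* window visible to the program */
    unsigned char *linear;          /* owned copy, or NULL if none */
};

/* Point [data, data_end) at bytes [start, end) of the message.
 * Returns 0 on success and -1 on a failed sanity check or allocation. */
static int toy_pull_data(struct toy_msg *m, size_t start, size_t end)
{
    size_t total = 0, off = 0, pos, copied = 0, i;
    unsigned char *copy;

    for (i = 0; i < m->nsg; i++)
        total += m->sg[i].len;
    if (start > end || end > total)
        return -1;

    /* Find the element holding byte 'start'; 'off' is its message offset. */
    for (i = 0; i < m->nsg; i++) {
        if (start < off + m->sg[i].len)
            break;
        off += m->sg[i].len;
    }
    if (i == m->nsg)
        return -1;

    /* Case 1: range lies inside one element, just move the pointers. */
    if (end <= off + m->sg[i].len) {
        m->data = m->sg[i].buf + (start - off);
        m->data_end = m->data + (end - start);
        return 0;
    }

    /* Case 2: range crosses elements, allocate 'end - start' and copy. */
    copy = malloc(end - start);
    if (!copy)
        return -1;
    for (pos = start; i < m->nsg && pos < end; i++) {
        size_t in_off = pos - off;
        size_t n = m->sg[i].len - in_off;

        if (n > end - pos)
            n = end - pos;
        memcpy(copy + copied, m->sg[i].buf + in_off, n);
        copied += n;
        pos += n;
        off += m->sg[i].len;
    }
    free(m->linear);
    m->linear = copy;
    m->data = copy;
    m->data_end = copy + copied;
    return 0;
}
```

      For a message made of the elements "abcd" and "efgh", pulling (1, 3)
      only moves the pointers into the first element, while pulling (2, 6)
      allocates four bytes and copies "cdef" into them.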
      
      Next, the sendpage hook has not copied any data in its initial
      state, so the data pointers are (0,0). In this case we
      handle it similarly to the above sendmsg case, except that the
      allocation/copy must always happen. Then when sending
      the data we have possibly three memory regions that
      need to be sent, (0, start - 1), (start, end), and
      (end + 1, msg_length). This is required to ensure any
      writes by the BPF program are correctly transmitted.
      
      Lastly this operation will invalidate any previous
      data checks so BPF programs will have to revalidate
      pointers after making this BPF call.
      
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      015632bb