Skip to content
  1. Jan 26, 2016
  2. Jan 19, 2016
  3. Jan 18, 2016
    • Chris Metcalf's avatar
      numa: remove stale node_has_online_mem() define · 00d27c63
      Chris Metcalf authored
      
      
      This isn't used anywhere, so delete it.
      
      Looks like the last usage (in x86-specific code) was removed by Tejun
      in 2011 in commit bd6709a9 ("x86, NUMA: Make 32bit use common NUMA
      init path").
      
      Signed-off-by: default avatarChris Metcalf <cmetcalf@ezchip.com>
      00d27c63
    • Chris Metcalf's avatar
      arch/tile: move user_exit() to early kernel entry sequence · 1bb50cad
      Chris Metcalf authored
      
      
      This ensures that we always notify context tracking that we
      have exited from user space no matter how we enter the kernel.
      It is similar to how arm64 handles context tracking, for example.
      
      This allows the removal of all the exception_enter() calls that
      were added in commit 49e4e156 ("tile: support CONTEXT_TRACKING and
      thus NOHZ_FULL").
      
      Signed-off-by: default avatarChris Metcalf <cmetcalf@ezchip.com>
      1bb50cad
    • Chris Metcalf's avatar
      tile: fix bug in setting PT_FLAGS_DISABLE_IRQ on kernel entry · 9ce815ed
      Chris Metcalf authored
      
      
      This flag value is saved in ptregs and used to decide whether
      to disable irqs when returning from the kernel.  Commit 1168df528fe4
      ("tile: don't assume user privilege is zero") performed a bad
      merge from some KVM-enabled code that had not yet been upstreamed.
      
      The only issue with the old code is that we will read the interrupt
      mask in more conditions than we need to (e.g., coming from user
      space when user space has the Interrupt Critical Section bit set, or
      coming from a guest kernel), which is a slow multi-cycle operation.
      This change saves those few cycles in the common case.
      
      Signed-off-by: default avatarChris Metcalf <cmetcalf@ezchip.com>
      9ce815ed
    • Chris Metcalf's avatar
      tile: fix tilepro casts for readl, writel, etc · acfb699e
      Chris Metcalf authored
      
      
      Missing parentheses could cause an argument of the form
      "integer + pointer" to get cast to "(long)integer + pointer"
      and remain a pointer type, causing compiler warnings.
      
      Signed-off-by: default avatarChris Metcalf <cmetcalf@ezchip.com>
      acfb699e
    • Chris Metcalf's avatar
      tile: fix a -Wframe-larger-than warning · f3a26163
      Chris Metcalf authored
      
      
      The warning occurs in setup.c, where it is known that it can't be
      a problem, but it's still a good idea to silence the warning.
      The onstack array is converted from an s32 to a u8, which still
      is plenty of range for the values being managed there.
      
      Signed-off-by: default avatarChris Metcalf <cmetcalf@ezchip.com>
      f3a26163
    • Chris Metcalf's avatar
      tile: include the syscall number in the backtrace · 5ac65abd
      Chris Metcalf authored
      
      
      This information is easily available in the backtrace data and can
      be helpful when trying to figure out the backtrace, particularly
      if we're early in kernel entry or late in kernel exit.
      
      Signed-off-by: default avatarChris Metcalf <cmetcalf@ezchip.com>
      5ac65abd
    • Chris Metcalf's avatar
      arch/tile: adopt prepare_exit_to_usermode() model from x86 · 583b24a2
      Chris Metcalf authored
      
      
      This change is a prerequisite change for TASK_ISOLATION but also
      stands on its own for readability and maintainability.  The existing
      tile do_work_pending() was called in a loop from assembly on
      the slow path; this change moves the loop into C code as well.
      For the x86 version see commit c5c46f59 ("x86/entry: Add new,
      comprehensible entry and exit handlers written in C").
      
      This change exposes a pre-existing bug on the older tilepro platform;
      the singlestep processing is done last, but on tilepro (unlike tilegx)
      we enable interrupts while doing that processing, so we could in
      theory miss a signal or other asynchronous event.  A future change
      could fix this by breaking the singlestep work into a "prepare"
      step done in the main loop, and a "trigger" step done after exiting
      the loop.  Since this change is intended as purely a restructuring
      change, we call out the bug explicitly now, but don't yet fix it.
      
      Signed-off-by: default avatarChris Metcalf <cmetcalf@ezchip.com>
      583b24a2
    • Zi Shen Lim's avatar
      arm64: bpf: add extra pass to handle faulty codegen · 42ff712b
      Zi Shen Lim authored
      
      
      Code generation functions in arch/arm64/kernel/insn.c previously
      BUG_ON invalid parameters. Following change of that behavior, now we
      need to handle the error case where AARCH64_BREAK_FAULT is returned.
      
      Instead of error-handling on every emit() in JIT, we add a new
      validation pass at the end of JIT compilation. There's no point in
      running JITed code at run-time only to trap due to AARCH64_BREAK_FAULT.
      Instead, we drop this failed JIT compilation and allow the system to
      gracefully fallback on the BPF interpreter.
      
      Signed-off-by: default avatarZi Shen Lim <zlim.lnx@gmail.com>
      Suggested-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      42ff712b
    • Zi Shen Lim's avatar
      arm64: insn: remove BUG_ON from codegen · c94ae4f7
      Zi Shen Lim authored
      
      
      During code generation, we used to BUG_ON unknown/unsupported encoding
      or invalid parameters.
      
      Instead, now we report these as errors and simply return the
      instruction AARCH64_BREAK_FAULT. Users of these codegen helpers should
      check for and handle this failure condition as appropriate.
      
      Otherwise, unhandled codegen failure will result in trapping at
      run-time due to AARCH64_BREAK_FAULT, which is arguably better than a
      BUG_ON.
      
      Signed-off-by: default avatarZi Shen Lim <zlim.lnx@gmail.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c94ae4f7
  4. Jan 16, 2016
    • Will Deacon's avatar
      Kconfig: remove HAVE_LATENCYTOP_SUPPORT · da48d094
      Will Deacon authored
      
      
      As illustrated by commit a3afe70b ("[S390] latencytop s390
      support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to
      advertise an implementation of save_stack_trace_tsk.
      
      However, as of 9212ddb5 ("stacktrace: provide save_stack_trace_tsk()
      weak alias") a dummy implementation is provided if STACKTRACE=y.  Given
      that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects
      STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether.
      
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Helge Deller <deller@gmx.de>
      Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      da48d094
    • Helge Deller's avatar
      parisc: Protect huge page pte changes with spinlocks · b0e55131
      Helge Deller authored
      
      
      PA-RISC doesn't have atomic instructions to modify page table entries, so it
      takes spinlock in the TLB handler and modifies the page table entry
      non-atomically. If you modify the page table entry without the spinlock, you
      may race with TLB handler on another CPU and your modification may be lost.
      Protect against that with usage of purge_tlb_start() and purge_tlb_end() which
      handles the TLB spinlock.
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # v4.4
      b0e55131
    • Dominik Dingel's avatar
      s390/mm: enable fixup_user_fault retrying · fef8953a
      Dominik Dingel authored
      
      
      By passing a non-null flag we allow fixup_user_fault to retry, which
      enables userfaultfd.  As during these retries we might drop the mmap_sem
      we need to check if that happened and redo the complete chain of
      actions.
      
      Signed-off-by: default avatarDominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fef8953a
    • Dominik Dingel's avatar
      mm: bring in additional flag for fixup_user_fault to signal unlock · 4a9e1cda
      Dominik Dingel authored
      
      
      During Jason's work with postcopy migration support for s390 a problem
      regarding gmap faults was discovered.
      
      The gmap code will call fixup_user_fault which will end up always in
      handle_mm_fault.  Till now we never cared about retries, but as the
      userfaultfd code kind of relies on it.  this needs some fix.
      
      This patchset does not take care of the futex code.  I will now look
      closer at this.
      
      This patch (of 2):
      
      With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault
      to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the
      faulting we ever unlocked mmap_sem.
      
      This patch brings in the logic to handle retries as well as it cleans up
      the current documentation.  fixup_user_fault was not having the same
      semantics as filemap_fault.  It never indicated if a retry happened and
      so a caller wasn't able to handle that case.  So we now changed the
      behaviour to always retry a locked mmap_sem.
      
      Signed-off-by: default avatarDominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4a9e1cda
    • Dan Williams's avatar
      mm, x86: get_user_pages() for dax mappings · 3565fce3
      Dan Williams authored
      
      
      A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
      has established a devm_memremap_pages() mapping, i.e.  when the pfn_t
      return from ->direct_access() has PFN_DEV and PFN_MAP set.  Later, when
      encountering _PAGE_DEVMAP during a page table walk we lookup and pin a
      struct dev_pagemap instance to keep the result of pfn_to_page() valid
      until put_page().
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Tested-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3565fce3
    • Dan Williams's avatar
      mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd · 5c7fb56e
      Dan Williams authored
      
      
      A dax-huge-page mapping while it uses some thp helpers is ultimately not
      a transparent huge page.  The distinction is especially important in the
      get_user_pages() path.  pmd_devmap() is used to distinguish dax-pmds
      from pmd_huge() and pmd_trans_huge() which have slightly different
      semantics.
      
      Explicitly mark the pmd_trans_huge() helpers that dax needs by adding
      pmd_devmap() checks.
      
      [kirill.shutemov@linux.intel.com: fix regression in handling mlocked pages in  __split_huge_pmd()]
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5c7fb56e
    • Dan Williams's avatar
      mm, dax: convert vmf_insert_pfn_pmd() to pfn_t · f25748e3
      Dan Williams authored
      
      
      Similar to the conversion of vm_insert_mixed() use pfn_t in the
      vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVICE when the
      pfn is backed by a devm_memremap_pages() mapping.
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f25748e3
    • Dan Williams's avatar
      mm, dax, gpu: convert vm_insert_mixed to pfn_t · 01c8f1c4
      Dan Williams authored
      
      
      Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose of
      evaluating the PFN_MAP and PFN_DEV flags.  When both are set it triggers
      _PAGE_DEVMAP to be set in the resulting pte.
      
      There are no functional changes to the gpu drivers as a result of this
      conversion.
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Airlie <airlied@linux.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      01c8f1c4
    • Dan Williams's avatar
      x86, mm: introduce _PAGE_DEVMAP · 69660fd7
      Dan Williams authored
      
      
      _PAGE_DEVMAP is a hardware-unused pte bit that will later be used in the
      get_user_pages() path to identify pfns backed by the dynamic allocation
      established by devm_memremap_pages.  Upon seeing that bit the gup path
      will lookup and pin the allocation while the pages are in use.
      
      Since the _PAGE_DEVMAP bit is > 32 it must be cast to u64 instead of a
      pteval_t to allow pmd_flags() usage in the realmode boot code to build.
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      69660fd7
    • Dan Williams's avatar
      frv: fix compiler warning from definition of __pmd() · 6d8113c7
      Dan Williams authored
      
      
      Take into account that the pmd_t type is a array inside a struct, so it
      needs two levels of brackets to initialize.  Otherwise, a usage of __pmd
      generates a warning:
      
        include/linux/mm.h:986:2: warning: missing braces around initializer [-Wmissing-braces]
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6d8113c7
    • Dan Williams's avatar
      avr32: convert to asm-generic/memory_model.h · 083fc214
      Dan Williams authored
      
      
      Switch avr32/include/asm/page.h to use the common defintions for
      pfn_to_page(), page_to_pfn(), and ARCH_PFN_OFFSET.
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      083fc214
    • Dan Williams's avatar
      libnvdimm, pfn, pmem: allocate memmap array in persistent memory · d2c0f041
      Dan Williams authored
      
      
      Use the new vmem_altmap capability to enable the pmem driver to arrange
      for a struct page memmap to be established in persistent memory.
      
      [linux@roeck-us.net: mn10300: declare __pfn_to_phys() to fix build error]
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d2c0f041
    • Dan Williams's avatar
      x86, mm: introduce vmem_altmap to augment vmemmap_populate() · 4b94ffdc
      Dan Williams authored
      
      
      In support of providing struct page for large persistent memory
      capacities, use struct vmem_altmap to change the default policy for
      allocating memory for the memmap array.  The default vmemmap_populate()
      allocates page table storage area from the page allocator.  Given
      persistent memory capacities relative to DRAM it may not be feasible to
      store the memmap in 'System Memory'.  Instead vmem_altmap represents
      pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
      requests.
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarkbuild test robot <lkp@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4b94ffdc
    • Dan Williams's avatar
      mm, dax, pmem: introduce pfn_t · 34c0fd54
      Dan Williams authored
      
      
      For the purpose of communicating the optional presence of a 'struct
      page' for the pfn returned from ->direct_access(), introduce a type that
      encapsulates a page-frame-number plus flags.  These flags contain the
      historical "page_link" encoding for a scatterlist entry, but can also
      denote "device memory".  Where "device memory" is a set of pfns that are
      not part of the kernel's linear mapping by default, but are accessed via
      the same memory controller as ram.
      
      The motivation for this new type is large capacity persistent memory
      that needs struct page entries in the 'memmap' to support 3rd party DMA
      (i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
      we also need it in support of maintaining a list of mapped inodes which
      need to be unmapped at driver teardown or freeze_bdev() time.
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      34c0fd54
    • Dan Williams's avatar
      kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams authored
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
      
      
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ba049e93
    • Dan Williams's avatar
      um: kill pfn_t · 16da3068
      Dan Williams authored
      The core has developed a need for a "pfn_t" type [1].  Convert the usage
      of pfn_t by usermode-linux to an unsigned long, and update pfn_to_phys()
      to drop its expectation of a typed pfn.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      
      
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      16da3068
    • Dan Williams's avatar
      pmem, dax: clean up clear_pmem() · 52db400f
      Dan Williams authored
      
      
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 25):
      
      Both __dax_pmd_fault, and clear_pmem() were taking special steps to
      clear memory a page at a time to take advantage of non-temporal
      clear_page() implementations.  However, x86_64 does not use non-temporal
      instructions for clear_page(), and arch_clear_pmem() was always
      incurring the cost of __arch_wb_cache_pmem().
      
      Clean up the assumption that doing clear_pmem() a page at a time is more
      performant.
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      52db400f
    • Minchan Kim's avatar
      arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP · 05ee26d9
      Minchan Kim authored
      
      
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_mkclean for THP page MADV_FREE support.
      
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      05ee26d9
    • Minchan Kim's avatar
      arch/arm/include/asm/pgtable-3level.h: add pmd_mkclean for THP · 44842045
      Minchan Kim authored
      
      
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_mkclean for THP page MADV_FREE support.
      
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      44842045
Loading