  1. Feb 18, 2016
    • mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys · 33a709b2
      Dave Hansen authored
      
      Today, for normal faults and page table walks, we check the VMA
      and/or PTE to ensure that it is compatible with the action.  For
      instance, if we get a write fault on a non-writeable VMA, we
      SIGSEGV.
      
      We try to do the same thing for protection keys.  Basically, we
      try to make sure that if a user does this:
      
      	mprotect(ptr, size, PROT_NONE);
      	*ptr = foo;
      
      they see the same effects with protection keys when they do this:
      
      	mprotect(ptr, size, PROT_READ|PROT_WRITE);
      	set_pkey(ptr, size, 4);
      	wrpkru(0xffffff3f); // access disable pkey 4
      	*ptr = foo;
      
      The state to do that checking is in the VMA, but we also
      sometimes have to do it on the page tables only, like when doing
      a get_user_pages_fast() where we have no VMA.
      
      We add two functions and expose them to generic code:
      
      	arch_pte_access_permitted(pte_flags, write)
      	arch_vma_access_permitted(vma, write)
      
      These are, of course, backed up in x86 arch code with checks
      against the PTE or VMA's protection key.
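
      As a rough illustration (not the exact mainline code; the pkey-extraction
      helper and the PKRU bit layout shown here are assumptions), the x86 PTE-side
      check boils down to testing the access-disable/write-disable bits that PKRU
      holds for the page's key:

      	static inline bool arch_pte_access_permitted(unsigned long pte_flags, bool write)
      	{
      		u16 pkey = pte_flags_pkey(pte_flags);	/* assumed helper: pulls the 4 pkey bits */
      		u32 pkru = read_pkru();			/* current thread's PKRU value */
      		bool ad = pkru & (1u << (2 * pkey));	/* access-disable bit for this key */
      		bool wd = pkru & (1u << (2 * pkey + 1));	/* write-disable bit for this key */

      		if (ad)
      			return false;
      		return !(write && wd);
      	}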
      
      But, there are also cases where we do not want to respect
      protection keys.  When we ptrace(), for instance, we do not want
      to apply the tracer's PKRU permissions to the PTEs from the
      process being traced.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-s390@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. Jan 21, 2016
    • powerpc: Remove newly added extra definition of pmd_dirty · 0e2bce74
      Stephen Rothwell authored
      
      Commit d5d6a443 ("arch/powerpc/include/asm/pgtable-ppc64.h:
      add pmd_[dirty|mkclean] for THP") added a new identical definition
      of pmd_dirty(). Remove it again.
      
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Wire up copy_file_range() syscall · d7f9ee60
      Chandan Rajendra authored
      
      Test runs on a ppc64 BE guest succeeded using modified fstests.
      
      Also tested on ppc64 LE using a home-made test - mpe.
      
      Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • dma-mapping: always provide the dma_map_ops based implementation · e1c7e324
      Christoph Hellwig authored
      
      Move the generic implementation to <linux/dma-mapping.h> now that all
      architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
      that everyone supports them.
      
      [valentinrothberg@gmail.com: remove leftovers in Kconfig]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc/fadump: rename cpu_online_mask member of struct fadump_crash_info_header · a0512164
      Rasmus Villemoes authored
      The four cpumasks cpu_{possible,online,present,active}_bits are exposed
      readonly via the corresponding const variables cpu_xyz_mask.  But they are
      also accessible for arbitrary writing via the exposed functions
      set_cpu_xyz.  There's quite a bit of code throughout the kernel which
      iterates over or otherwise accesses these bitmaps, and having the access
      go via the cpu_xyz_mask variables is nowadays [1] simply a useless
      indirection.
      
      It may be that any problem in CS can be solved by an extra level of
      indirection, but that doesn't mean every extra indirection solves a
      problem.  In this case, it even necessitates some minor ugliness (see
      4/6).
      
      Patch 1/6 is new in v2, and fixes a build failure on ppc by renaming a
      struct member, to avoid problems when the identifier cpu_online_mask
      becomes a macro later in the series.  The next four patches eliminate the
      cpu_xyz_mask variables by simply exposing the actual bitmaps, after
      renaming them to discourage direct access - that still happens through
      cpu_xyz_mask, which are now simply macros with the same type and value as
      they used to have.
      
      After that, there's no longer any reason to have the setter functions be
      out-of-line: The boolean parameter is almost always a literal true or
      false, so by making them static inlines they will usually compile to one
      or two instructions.
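
      For example, once the underlying bitmap is exposed directly (here assumed to
      be named __cpu_online_mask after the rename), a setter like set_cpu_online()
      can become a trivial static inline; this is a sketch, not the exact hunk:

      	static inline void set_cpu_online(unsigned int cpu, bool online)
      	{
      		if (online)
      			cpumask_set_cpu(cpu, &__cpu_online_mask);
      		else
      			cpumask_clear_cpu(cpu, &__cpu_online_mask);
      	}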
      
      For a defconfig build on x86_64, bloat-o-meter says we save ~3000 bytes.
      We also save a little stack (stackdelta says 127 functions have a 16 byte
      smaller stack frame, while two grow by that amount).  Mostly because, when
      iterating over the mask, gcc typically loads the value of cpu_xyz_mask
      into a callee-saved register and from there into %rdi before each
      find_next_bit call - now it can just load the appropriate immediate
      address into %rdi before each call.
      
      [1] See Rusty's kind explanation
      http://thread.gmane.org/gmane.linux.kernel/2047078/focus=2047722
      for some historic context.
      
      This patch (of 6):
      
      As preparation for eliminating the indirect access to the various global
      cpu_*_bits bitmaps via the pointer variables cpu_*_mask, rename the
      cpu_online_mask member of struct fadump_crash_info_header to simply
      online_mask, thus allowing cpu_online_mask to become a macro.
      
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Jan 16, 2016
    • kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams authored
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      This need for "struct page" for persistent memory requires memory
      capacity to store the memmap array.  Given that the memmap array for a
      large pool of persistent memory may exhaust available DRAM, introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
      
      
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THP · d5d6a443
      Minchan Kim authored
      
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents, since the MADV_FREE syscall can be called on a THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
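
      On ppc64 the THP helpers are typically thin wrappers around the pte
      variants; a plausible sketch of the added definitions (not necessarily the
      exact hunk) is:

      	#define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
      	#define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))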
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc, thp: remove infrastructure for handling splitting PMDs · 7aa9a23c
      Kirill A. Shutemov authored
      
      With the new refcounting we don't need to mark PMDs as splitting.  Let's
      drop the code that handles this.
      
      pmdp_splitting_flush() is not needed either: on splitting a PMD we will do
      pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do an IPI as
      needed for fast_gup.
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. Jan 13, 2016
    • powerpc/module: Handle R_PPC64_ENTRY relocations · a61674bd
      Ulrich Weigand authored
      
      GCC 6 will include changes to generated code with -mcmodel=large,
      which is used to build kernel modules on powerpc64le.  This was
      necessary because the large model is supposed to allow arbitrary
      sizes and locations of the code and data sections, but the ELFv2
      global entry point prolog still made the unconditional assumption
      that the TOC associated with any particular function can be found
      within 2 GB of the function entry point:
      
      func:
      	addis r2,r12,(.TOC.-func)@ha
      	addi  r2,r2,(.TOC.-func)@l
      	.localentry func, .-func
      
      To remove this assumption, GCC will now generate instead this global
      entry point prolog sequence when using -mcmodel=large:
      
      	.quad .TOC.-func
      func:
      	.reloc ., R_PPC64_ENTRY
      	ld    r2, -8(r12)
      	add   r2, r2, r12
      	.localentry func, .-func
      
      The new .reloc triggers an optimization in the linker that will
      replace this new prolog with the original code (see above) if the
      linker determines that the distance between .TOC. and func is in
      range after all.
      
      Since this new relocation is now present in module object files,
      the kernel module loader is required to handle them too.  This
      patch adds support for the new relocation and implements the
      same optimization done by the GNU linker.
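
      In apply_relocate_add() this amounts to something like the following sketch
      (helper names such as my_r2(), PPC_HA() and PPC_LO() are assumed from the
      existing module_64.c code, and the exact hunk may differ):

      	case R_PPC64_ENTRY:
      		/*
      		 * If .TOC. turns out to be within 2GB of the entry point
      		 * after all, rewrite the large-model prolog back to the
      		 * original addis/addi form, as the GNU linker would.
      		 */
      		value = my_r2(sechdrs, me) - (unsigned long)location;
      		if (value + 0x80008000 > 0xffffffff)
      			break;
      		((uint32_t *)location)[0] = 0x3c4c0000 + PPC_HA(value);	/* addis r2,r12,(.TOC.-func)@ha */
      		((uint32_t *)location)[1] = 0x38420000 + PPC_LO(value);	/* addi  r2,r2,(.TOC.-func)@l   */
      		break;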
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages · c88c5d43
      Russell Currey authored
      
      The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no
      parameters and returned nothing.  The call was updated to accept the
      terminal number to flush, and returned various values depending on the
      state of the output buffer.
      
      The prototype has been updated and its usage in the OPAL kmsg dumper has
      been modified to support its new behaviour as an incremental flush.
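
      The updated prototype presumably ends up along these lines (the parameter
      type shown is an assumption):

      	int64_t opal_console_flush(uint64_t term_number);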
      
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. Dec 27, 2015
    • powerpc/powernv: Add a kmsg_dumper that flushes console output on panic · affddff6
      Russell Currey authored
      
      On BMC machines, console output is controlled by the OPAL firmware and is
      only flushed when its pollers are called.  When the kernel is in a panic
      state, it no longer calls these pollers and thus console output does not
      completely flush, causing some output from the panic to be lost.
      
      Output is only actually lost when the kernel is configured to not power off
      or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
      flushes the console buffer as part of its power down routines.  Before this
      patch, however, only partial output would be printed during the timeout wait.
      
      This patch adds a new kmsg_dumper which gets called at panic time to ensure
      panic output is not lost.  It accomplishes this by calling OPAL_CONSOLE_FLUSH
      in the OPAL API, and if that is not available, the pollers are called enough
      times to (hopefully) completely flush the buffer.
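
      A minimal sketch of the shape of such a dumper (the function and wrapper
      names here are illustrative assumptions, not necessarily those in the
      patch):

      	static void opal_kmsg_dump(struct kmsg_dumper *dumper,
      				   enum kmsg_dump_reason reason)
      	{
      		if (reason != KMSG_DUMP_PANIC)
      			return;

      		if (opal_check_token(OPAL_CONSOLE_FLUSH))
      			opal_console_flush(0);		/* flush terminal 0 via the new call */
      		else
      			opal_poll_events(NULL);		/* fall back to running the pollers */
      	}

      	static struct kmsg_dumper opal_kmsg_dumper = {
      		.dump = opal_kmsg_dump,
      	};

      	/* registered once during platform setup */
      	kmsg_dump_register(&opal_kmsg_dumper);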
      
      The flushing mechanism will only affect output printed at and before the
      kmsg_dump call in kernel/panic.c:panic().  As such, the "end Kernel panic"
      message may still be truncated as follows:
      
      >Call Trace:
      >[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
      >[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
      >[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
      >[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
      >[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
      >[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
      >[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
      >---[ end Kernel panic - not
      
      This functionality is implemented as a kmsg_dumper as it seems to be the
      most sensible way to introduce platform-specific functionality to the
      panic function.
      
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Copy only required pieces of the mm_context_t to the paca · 2fc251a8
      Michael Neuling authored
      Currently we copy the whole mm_context_t to the paca but only access a
      few bits of it.  This is wasteful of paca space and also takes quite
      some time in the hot path of context switching.
      
      This patch pulls in only the required bits from the mm_context_t to
      the paca and on context switch, copies only those.
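
      Roughly, instead of copying the whole context, the switch path copies a
      handful of scalars; a sketch (field names follow the mm_ctx_foo rename noted
      below, but the exact set of fields is an assumption):

      	/* e.g. in the context switch path */
      	get_paca()->mm_ctx_id = next->context.id;
      	get_paca()->mm_ctx_user_psize = next->context.user_psize;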
      
      Benchmarking this (on top of Anton's recent MSR context switching
      changes [1]) using processes and yield shows an improvement of almost
      3% on POWER8:
      
        http://ozlabs.org/~anton/junkcode/context_switch2.c
        ./context_switch2 --test=yield --process 0 0
      
      1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.html
      
      
      
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      [mpe: Rename paca fields to be mm_ctx_foo rather than context_foo]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. Dec 17, 2015
    • powerpc/powernv: Add support for Nvlink NPUs · 5d2aa710
      Alistair Popple authored
      
      NVLink is a high speed interconnect that is used in conjunction with a
      PCI-E connection to create an interface between CPU and GPU that
      provides very high data bandwidth. A PCI-E connection to a GPU is used
      as the control path to initiate and report status of large data
      transfers sent via the NVLink.
      
      On IBM Power systems the NVLink processing unit (NPU) is similar to
      the existing PHB3. This patch adds support for a new NPU PHB type. DMA
      operations on the NPU are not supported as this patch sets the TCE
      translation tables to be the same as the related GPU PCIe device for
      each NVLink. Therefore all DMA operations are setup and controlled via
      the PCIe device.
      
      EEH is not presently supported for the NPU devices, although it may be
      added in future.
      
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add __raw_rm_writeq() function · a84bf321
      Alistair Popple authored
      
      Move __raw_rm_writeq() from platforms/powernv/pci-ioda.c to
      include/asm/io.h so that it can be used by other code.
      
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field" · 94973b24
      Alistair Popple authored
      
      This commit removed the pcidev field from struct pci_dn as it was no
      longer in use by the kernel. However, to support finding the
      association of Nvlink devices to GPU devices from the device-tree,
      this field is required.
      
      This reverts commit 250c7b27 ("powerpc/pci: Remove unused struct
      pci_dn.pcidev field").
      
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Add page soft dirty tracking · 7207f436
      Laurent Dufour authored
      
      The user space checkpoint and restart tool (CRIU) needs page changes to
      be soft-dirty tracked. This allows doing a pre-checkpoint and then
      dumping only the touched pages.
      
      This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when
      the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when
      the page is swapped out.
      
      To introduce a new PTE _PAGE_SOFT_DIRTY bit value common to hash 4k
      and hash 64k pte, the bits already defined in hash-*4k.h should be
      shifted left by one.
      
      The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type in
      the swap pte. A check is added to ensure that the bit is not
      overwritten by _PAGE_HPTEFLAGS.
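
      The accessors this enables are the usual generic soft-dirty helpers,
      roughly of this shape (a sketch, not the exact ppc64 hunk):

      	static inline pte_t pte_mksoft_dirty(pte_t pte)
      	{
      		return __pte(pte_val(pte) | _PAGE_SOFT_DIRTY);
      	}

      	static inline int pte_soft_dirty(pte_t pte)
      	{
      		return pte_val(pte) & _PAGE_SOFT_DIRTY;
      	}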
      
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kernel: Combine vec/loc for STD_EXCEPTION_PSERIES · 2613265c
      Michael Ellerman authored
      
      The STD_EXCEPTION_PSERIES macro takes both a vector number, and a
      location (memory address). However both are always identical, so combine
      them to save repeating ourselves.
      
      This does mean an exception handler must always exist at the location in
      memory that matches its vector number. But that's OK because this is the
      "STD" macro (standard), which does exactly that. We have other macros
      for the other cases, eg. STD_EXCEPTION_PSERIES_OOL (out of line).
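
      In other words, a call site that hypothetically looked like the first line
      below can drop the duplicated argument (illustrative, not a verbatim hunk):

      	-STD_EXCEPTION_PSERIES(0x300, 0x300, data_access)
      	+STD_EXCEPTION_PSERIES(0x300, data_access)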
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kernel: Open code SET_DEFAULT_THREAD_PPR · d8725ce8
      Michael Ellerman authored
      
      This is only used in one location, so open code it.
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kernel: Open code HMT_MEDIUM_LOW_HAS_PPR · d030a4b5
      Michael Ellerman authored
      
      HMT_MEDIUM_LOW_HAS_PPR is only used in one place, so open code it.
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kernel: Drop HMT_MEDIUM_PPR_DISCARD · d6265aea
      Michael Ellerman authored
      
      HMT_MEDIUM_PPR_DISCARD is a macro which is present at the start of most
      of our first level exception handlers. It conditionally executes a
      HMT_MEDIUM instruction, which sets the processor priority to medium.
      
      On modern systems, ie. Power7 and later, it is nop'ed out at boot.
      All it does is make the exception vectors more cramped and consume 4
      bytes of icache.
      
      On old systems it has the effect of boosting the processor priority at
      the start of exception processing. If we were previously in the idle
      loop for example, we may be at low or very low priority. This is
      desirable as we want to process the exception as fast as possible.
      
      However looking closely at the generated code, we see that in all cases
      we execute another HMT_MEDIUM just four instructions later. With code
      patching applied, the final code on an old (Power6) system will look
      like, eg:
      
        c000000000000300 <data_access_pSeries>:
        c000000000000300:	7c 42 13 78	mr	r2,r2		<-
        c000000000000304:	7d b2 43 a6	mtsprg	2,r13
        c000000000000308:	7d b1 42 a6	mfsprg	r13,1
        c00000000000030c:	f9 2d 00 80	std	r9,128(r13)
        c000000000000310:	60 00 00 00	nop
        c000000000000314:	7c 42 13 78	mr	r2,r2		<-
      
      So I suggest that the added code complexity of HMT_MEDIUM_PPR_DISCARD is
      not justified by the benefit of boosting the processor priority for the
      duration of four instructions, and therefore we drop it.
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/rtas: Make enter_rtas() private · cd5cdeb6
      Michael Ellerman authored
      
      There are no longer any users of enter_rtas() outside of rtas.c, so make
      it "private", by moving the declaration inside rtas.c. Hopefully this
      will encourage people to use one of the wrappers which takes the sharp
      edges off the RTAS calling sequence.
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/rtas: Add rtas_call_unlocked() · 209eb4e5
      Michael Ellerman authored
      
      Most users of RTAS (Run-Time Abstraction Services) use rtas_call(),
      which deals with locking as well as endian handling.
      
      However we have two users outside of rtas.c that can't use rtas_call()
      because they have different locking requirements.
      
      The hotplug CPU code can't take the RTAS lock because the CPU would go
      offline with the lock held and no other CPUs would be able to call RTAS
      until the CPU came back online.
      
      The xmon code doesn't want to take the lock because it would risk
      deadlocking when we are trying to recover from a crash.
      
      Both sites required multiple patches when we added little endian
      support, proving that programmers can't do endian right.
      
      Although that ship has sailed, we can still clean the code up by
      providing an unlocked version of rtas_call() which avoids the need to
      open code the logic elsewhere.
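
      The unlocked variant is expected to take a caller-provided rtas_args buffer;
      a sketch of its use (the signature and the particular RTAS call shown are
      assumptions based on the existing rtas_call() interface):

      	struct rtas_args args;

      	/* caller is responsible for any locking it needs (or must avoid) */
      	rtas_call_unlocked(&args, rtas_token("ibm,os-term"), 1, 1, NULL,
      			   __pa(rtas_os_term_buf));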
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL · e4d54f71
      Stewart Smith authored
      
      Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is
      just OPALv3, with nobody ever expecting anything on pre-OPALv3 to
      be cared about or supported by mainline kernels.
      
      So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL
      exclusively.
      
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Remove OPALv2 firmware define and references · 7261aafc
      Stewart Smith authored
      
      OPALv2 only ever existed in the lab and didn't escape to the world.
      All OPAL systems in the wild are OPALv3.
      
      The probability of there being an OPALv2 system still powered on
      anywhere inside IBM is approximately zero, let alone anyone
      expecting to run mainline kernels.
      
      So, start to remove references to OPALv2.
      
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>