  1. Nov 28, 2016
    • KVM: PPC: Decrease the powerpc default halt poll max value · f4944613
      Suraj Jitindar Singh authored
      
      
      KVM_HALT_POLL_NS_DEFAULT is an arch specific constant which sets the
      default value of the halt_poll_ns kvm module parameter which determines
      the global maximum halt polling interval.
      
      The current value for powerpc is 500000 (500us) which means that any
      repetitive workload with a period of less than that can drive the cpu
      usage to 100% where it may have been mostly idle without halt polling.
      This presents the possibility of a large increase in power usage with
      a comparatively small performance benefit.
      
      Reduce the default to 10000 (10us); users can then tune this
      themselves based on the trade-off between power and performance
      that they are willing to make.
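The trade-off described above can be sketched with a toy userspace model (illustrative only, not kernel code; names and the workload model are assumptions): a wakeup period shorter than the poll interval means every halt is absorbed by polling and the vCPU never actually idles.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of KVM halt polling: the vCPU spins for up to poll_ns before
 * really sleeping.  If wakeups arrive with a period shorter than poll_ns,
 * every halt is absorbed by polling and the CPU never idles. */
static bool halt_absorbed_by_polling(long wakeup_period_ns, long poll_ns)
{
	return wakeup_period_ns < poll_ns;
}

/* Fraction of halts that burn CPU instead of idling, for a workload with
 * a fixed wakeup period (crude 0%/100% model). */
static int busy_percent(long wakeup_period_ns, long poll_ns)
{
	return halt_absorbed_by_polling(wakeup_period_ns, poll_ns) ? 100 : 0;
}
```

With the old 500us default, a 100us-period workload polls constantly; with the new 10us default the same workload idles.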
      
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  2. Nov 23, 2016
    • KVM: PPC: Book3S HV: Enable hypervisor virtualization interrupts while in guest · 84f7139c
      Paul Mackerras authored
      
      
      The new XIVE interrupt controller on POWER9 can direct external
      interrupts to the hypervisor or the guest.  The interrupts directed to
      the hypervisor are controlled by an LPCR bit called LPCR_HVICE, and
      come in as a "hypervisor virtualization interrupt".  This sets the
      LPCR bit so that hypervisor virtualization interrupts can occur while
      we are in the guest.  We then also need to cope with exiting the guest
      because of a hypervisor virtualization interrupt.
      
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9 · f725758b
      Paul Mackerras authored
      
      
      POWER9 includes a new interrupt controller, called XIVE, which is
      quite different from the XICS interrupt controller on POWER7 and
      POWER8 machines.  KVM-HV accesses the XICS directly in several places
      in order to send and clear IPIs and handle interrupts from PCI
      devices being passed through to the guest.
      
      In order to make the transition to XIVE easier, OPAL firmware will
      include an emulation of XICS on top of XIVE.  Access to the emulated
      XICS is via OPAL calls.  The one complication is that the EOI
      (end-of-interrupt) function can now return a value indicating that
      another interrupt is pending; in this case, the XIVE will not signal
      an interrupt in hardware to the CPU, and software is supposed to
      acknowledge the new interrupt without waiting for another interrupt
      to be delivered in hardware.
      
      This adapts KVM-HV to use the OPAL calls on machines where there is
      no XICS hardware.  When there is no XICS, we look for a device-tree
      node with "ibm,opal-intc" in its compatible property, which is how
      OPAL indicates that it provides XICS emulation.
      
      In order to handle the EOI return value, kvmppc_read_intr() has
      become kvmppc_read_one_intr(), with a boolean variable passed by
      reference which can be set by the EOI functions to indicate that
      another interrupt is pending.  The new kvmppc_read_intr() keeps
      calling kvmppc_read_one_intr() until there are no more interrupts
      to process.  The return value from kvmppc_read_intr() is the
      largest non-zero value of the returns from kvmppc_read_one_intr().
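The loop structure described above can be sketched in plain C (a stand-in, not the kernel source; the callback stands in for kvmppc_read_one_intr() and its exact signature is an assumption based on the description):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the new kvmppc_read_intr(): keep consuming interrupts while
 * the EOI path reports another one pending, and return the largest
 * non-zero value seen from the per-interrupt handler. */
static int read_intr(int (*read_one_intr)(bool *again))
{
	int ret = 0;
	bool again = true;

	while (again) {
		again = false;
		int r = read_one_intr(&again); /* may set again via EOI result */
		if (r > ret)
			ret = r;
	}
	return ret;
}
```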
      
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9 · 7c5b06ca
      Paul Mackerras authored
      
      
      POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
      and tlbiel (local tlbie) instructions.  Both instructions get a
      set of new parameters (RIC, PRS and R) which appear as bits in the
      instruction word.  The tlbiel instruction now has a second register
      operand, which contains a PID and/or LPID value if needed, and
      should otherwise contain 0.
      
      This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
      as well as older processors.  Since we only handle HPT guests so
      far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
      word as on previous processors, so we don't need to conditionally
      execute different instructions depending on the processor.
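The "same instruction word" observation can be illustrated with a small encoder (the opcode constant and field bit positions below are assumptions for illustration, not the ISA's actual layout): the new RIC/PRS/R fields occupy previously-reserved bits, so encoding them all as zero reproduces the old instruction word bit-for-bit.

```c
#include <assert.h>
#include <stdint.h>

#define TLBIE_BASE 0x7c000264u  /* opcode bits only, illustrative   */
#define RIC_SHIFT  18           /* assumed positions, not the ISA's */
#define PRS_SHIFT  17
#define R_SHIFT    16

/* Build a tlbie instruction word with the new POWER9 fields. */
static uint32_t tlbie_word(uint32_t ric, uint32_t prs, uint32_t r)
{
	return TLBIE_BASE | (ric << RIC_SHIFT) | (prs << PRS_SHIFT) |
	       (r << R_SHIFT);
}
```

With RIC=0 PRS=0 R=0 (the HPT-guest case) the word is identical to the pre-POWER9 encoding, so no conditional instruction selection is needed.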
      
      The local flush on first entry to a guest in book3s_hv_rmhandlers.S
      is a loop which depends on the number of TLB sets.  Rather than
      using feature sections to set the number of iterations based on
      which CPU we're on, we now work out this number at VM creation time
      and store it in the kvm_arch struct.  That will make it possible to
      get the number from the device tree in future, which will help with
      compatibility with future processors.
      
      Since mmu_partition_table_set_entry() does a global flush of the
      whole LPID, we don't need to do the TLB flush on first entry to the
      guest on each processor.  Therefore we don't set all bits in the
      tlb_need_flush bitmap on VM startup on POWER9.
      
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs · e9cf1e08
      Paul Mackerras authored
      
      
      This adds code to handle two new guest-accessible special-purpose
      registers on POWER9: TIDR (thread ID register) and PSSCR (processor
      stop status and control register).  They are context-switched
      between host and guest, and the guest values can be read and set
      via the one_reg interface.
      
      The PSSCR contains some fields which are guest-accessible and some
      which are only accessible in hypervisor mode.  We only allow the
      guest-accessible fields to be read or set by userspace.
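The visibility rule above can be sketched as a masked update (the mask value here is hypothetical; the real kernel defines which PSSCR fields are guest-accessible): hypervisor-only bits are preserved from the current value, and only guest-visible bits are taken from userspace.

```c
#include <assert.h>
#include <stdint.h>

#define PSSCR_GUEST_VIS 0x00000000000f0fffULL /* illustrative mask */

/* Apply a userspace-supplied PSSCR value: only guest-visible fields may
 * change; hypervisor-only fields keep their current value. */
static uint64_t set_guest_psscr(uint64_t cur, uint64_t user_val)
{
	return (cur & ~PSSCR_GUEST_VIS) | (user_val & PSSCR_GUEST_VIS);
}
```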
      
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • powerpc/powernv: Define and set POWER9 HFSCR doorbell bit · 02ed21ae
      Michael Neuling authored
      
      
      Define and set the POWER9 HFSCR doorbell bit so that guests can use
      msgsndp.
      
      ISA 3.0 calls this MSGP, so name it accordingly in the code.
      
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. Nov 16, 2016
    • powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format · 6b243fcf
      Paul Mackerras authored
      
      
      This changes the way that we support the new ISA v3.00 HPTE format.
      Instead of adapting everything that uses HPTE values to handle either
      the old format or the new format, depending on which CPU we are on,
      we now convert explicitly between old and new formats if necessary
      in the low-level routines that actually access HPTEs in memory.
      This limits the amount of code that needs to know about the new
      format and makes the conversions explicit.  This is OK because the
      old format contains all the information that is in the new format.
      
      This also fixes operation under a hypervisor, because the H_ENTER
      hypercall (and other hypercalls that deal with HPTEs) will continue
      to require the HPTE value to be supplied in the old format.  At
      present the kernel will not boot in HPT mode on POWER9 under a
      hypervisor.
      
      This fixes and partially reverts commit 50de596d
      ("powerpc/mm/hash: Add support for Power9 Hash", 2016-04-29).
      
      Fixes: 50de596d ("powerpc/mm/hash: Add support for Power9 Hash")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. Oct 27, 2016
    • powerpc/64s: relocation, register save fixes for system reset interrupt · fb479e44
      Nicholas Piggin authored
      
      
      This patch does a couple of things. First of all, powernv immediately
      explodes when running a relocated kernel, because the system reset
      exception for handling sleeps does not do correct relocated branches.
      
      Secondly, the sleep handling code trashes the condition and cfar
      registers, which we would like to preserve for debugging purposes
      (in the non-sleep exception case).
      
      This patch changes the exception to use the standard format that saves
      registers before any tests or branches are made. It adds the test for
      idle-wakeup as an "extra" to break out of the normal exception path.
      Then it branches to a relocated idle handler that calls the various
      idle handling functions.
      
      After this patch, the POWER8 CPU simulator now boots a powernv
      kernel that is running at a non-zero address.
      
      Fixes: 948cf67c ("powerpc: Add NAP mode support on Power7 in HV mode")
      Cc: stable@vger.kernel.org # v3.0+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Use tlbiel only if we ever ran on the current cpu · bd77c449
      Aneesh Kumar K.V authored
      
      
      Before this patch, we used tlbiel if we had only ever run on this
      core.  That was mostly derived from the nohash usage of the same,
      but it is incorrect: ISA 3.0 clarifies tlbiel such that:
      
      "All TLB entries that have all of the following properties are made
      invalid on the thread executing the tlbiel instruction"
      
      ie. tlbiel only invalidates TLB entries on the current thread. So if the
      mm has been used on any other thread (aka. cpu) then we must broadcast
      the invalidate.
      
      This bug could lead to invalid TLB entries if a program runs on multiple
      threads of a core.
      
      Hence, use tlbiel only if we have only ever run on the current cpu.
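The corrected rule can be sketched as a cpumask check (a userspace stand-in for the kernel's mm_cpumask() test; the array representation is an assumption): tlbiel only invalidates entries on the executing thread, so the local form is safe only when the mm has never been active on any other thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Return true if this mm has only ever run on the current thread, in
 * which case the local tlbiel suffices; otherwise a broadcast tlbie is
 * required. */
static bool mm_is_thread_local(const bool used_on[NR_CPUS], int cur)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (used_on[cpu] && cpu != cur)
			return false; /* ran elsewhere: must broadcast */
	return true;                  /* only this thread: tlbiel is ok */
}
```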
      
      Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. Oct 08, 2016
    • powerpc: implement arch_reserved_kernel_pages · 1e76609c
      Srikar Dronamraju authored
      Currently a significant amount of memory is reserved only in a
      kernel booted to capture a kernel dump using the fa_dump method.
      
      Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will
      initialize only a certain amount of memory per node, which takes
      into account the dentry and inode cache sizes.  Currently the cache
      sizes are calculated based on the total system memory including the
      reserved memory.  However, when such a kernel boots the same kernel
      as the fadump kernel, it will not be able to allocate the amount of
      memory required for the dentry and inode caches.
      
      Hence implement arch_reserved_kernel_pages() for CONFIG_FA_DUMP
      configurations only.  The amount reserved will be excluded while
      calculating the large caches, avoiding crashes like the one below
      on large systems such as 32 TB systems.
      
        Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
        vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
        swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
        Call Trace:
           dump_stack+0xb0/0xf0 (unreliable)
           warn_alloc_failed+0x114/0x160
           __vmalloc_node_range+0x304/0x340
           __vmalloc+0x6c/0x90
           alloc_large_system_hash+0x1b8/0x2c0
           inode_init+0x94/0xe4
           vfs_caches_init+0x8c/0x13c
           start_kernel+0x50c/0x578
           start_here_common+0x20/0xa8
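The hook can be sketched as follows (a userspace stand-in under stated assumptions: the byte counter stands in for the kernel's reservation accounting, and the 64K page size is merely typical for these systems): report the reserved pages so large-hash sizing excludes them.

```c
#include <assert.h>

#define PAGE_SIZE 65536UL /* 64K pages, typical on these systems */

static unsigned long reserved_bytes; /* stand-in for memblock data */

/* Pages reserved by the architecture (e.g. for fadump) that should not
 * be counted when sizing the large system hashes. */
static unsigned long arch_reserved_kernel_pages(void)
{
	return reserved_bytes / PAGE_SIZE;
}

/* Cache sizing then uses (total - reserved) instead of total. */
static unsigned long pages_for_caches(unsigned long total_pages)
{
	return total_pages - arch_reserved_kernel_pages();
}
```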
      
      Link: http://lkml.kernel.org/r/1472476010-4709-4-git-send-email-srikar@linux.vnet.ibm.com
      
      
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Suggested-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. Oct 04, 2016
    • powerpc/bpf: Implement support for tail calls · ce076141
      Naveen N. Rao authored
      
      
      Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
      programs. This can be achieved either by:
      (1) retaining the stack setup by the first eBPF program and having all
      subsequent eBPF programs re-using it, or,
      (2) by unwinding/tearing down the stack and having each eBPF program
      deal with its own stack as it sees fit.
      
      To ensure that this does not create loops, there is a limit to how many
      tail calls can be done (currently 32). This requires the JIT'ed code to
      maintain a count of the number of tail calls done so far.
      
      Approach (1) is simple, but requires every eBPF program to have (almost)
      the same prologue/epilogue, regardless of whether they need it. This is
      inefficient for small eBPF programs which sometimes may not need a
      prologue at all. As such, to minimize the impact of the tail call
      implementation, we use approach (2) here which needs each eBPF program
      in the chain to use its own prologue/epilogue. This is not ideal when
      many tail calls are involved and when all the eBPF programs in the chain
      have similar prologue/epilogue. However, the impact is restricted to
      programs that do tail calls. Individual eBPF programs are not affected.
      
      We maintain the tail call count in a fixed location on the stack,
      and updated tail call count values are passed in through this
      location. The very first eBPF program in a chain initializes this
      to 0 (in its first 2 instructions). Subsequent tail calls skip the
      first two eBPF JIT instructions to maintain the count. For programs
      that don't do tail calls themselves, the first two instructions are
      NOPs.
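The counting scheme can be modelled in miniature (a toy model, not JIT output; the recursive call stands in for a tail call entering the next program past its first two instructions): a fresh entry zeroes the count, a tail call carries it over and bumps it, and the chain is cut off at the eBPF limit.

```c
#include <assert.h>

#define MAX_TAIL_CALL_CNT 32 /* mirrors the eBPF tail-call limit */

/* depth: remaining tail calls this chain will attempt.
 * fresh_entry models entering at the program start (the two zeroing
 * instructions); a tail call enters past them, so *tcc carries over. */
static int run_prog(int depth, int *tcc, int fresh_entry)
{
	if (fresh_entry)
		*tcc = 0;             /* the first two JIT instructions */
	if (depth == 0)
		return *tcc;          /* returns without tail-calling */
	if (++*tcc > MAX_TAIL_CALL_CNT)
		return -1;            /* limit exceeded: call rejected */
	return run_prog(depth - 1, tcc, 0); /* skip zeroing on entry */
}
```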
      
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: tm: Enable transactional memory (TM) lazily for userspace · 5d176f75
      Cyril Bur authored
      
      
      Currently the MSR TM bit is always set if the hardware is TM capable.
      This adds extra overhead as it means the TM SPRs (TFHAR, TEXASR and
      TFIAR) must be swapped for each process regardless of whether they
      use TM.
      
      For processes that don't use TM the TM MSR bit can be turned off
      allowing the kernel to avoid the expensive swap of the TM registers.
      
      A TM unavailable exception will occur if a thread does use TM, and
      the kernel will then enable MSR_TM and leave it set for some time
      afterwards.
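The lazy scheme can be sketched with a toy model (illustrative stand-ins throughout; the MSR_TM bit position is assumed, and the swap counter stands in for the real TM SPR save/restore cost):

```c
#include <assert.h>
#include <stdint.h>

#define MSR_TM (1ULL << 32) /* assumed position, for illustration */

struct thread {
	uint64_t msr;
	int tm_spr_swaps; /* counts the expensive TM SPR swaps */
};

/* On context switch, only threads with MSR_TM set pay for the swap of
 * TFHAR/TEXASR/TFIAR. */
static void context_switch(struct thread *t)
{
	if (t->msr & MSR_TM)
		t->tm_spr_swaps++;
}

/* First TM use traps; the handler enables MSR_TM lazily. */
static void tm_unavailable_exception(struct thread *t)
{
	t->msr |= MSR_TM;
}
```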
      
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Remove do_load_up_transact_{fpu,altivec} · d986d6f4
      Cyril Bur authored
      
      
      A previous rework of the TM code left these functions unused.
      
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: tm: Rename transct_(*) to ck(\1)_state · 000ec280
      Cyril Bur authored
      
      
      Name the structures used for checkpointed state consistently with
      pt_regs/ckpt_regs.
      
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: tm: Always use fp_state and vr_state to store live registers · dc310669
      Cyril Bur authored
      
      
      There is currently an inconsistency as to how the entire CPU register
      state is saved and restored when a thread uses transactional memory
      (TM).
      
      Using transactional memory results in the CPU having duplicated
      (almost) all of its register state. This duplication results in a set
      of registers which can be considered 'live', those being currently
      modified by the instructions being executed and another set that is
      frozen at a point in time.
      
      On context switch, both sets of state have to be saved and (later)
      restored. These two states are often called a variety of different
      things. Common terms for the state which only exists after the CPU has
      entered a transaction (performed a TBEGIN instruction) in hardware are
      'transactional' or 'speculative'.
      
      Between a TBEGIN and a TEND or TABORT (or an event that causes the
      hardware to abort), regardless of the use of TSUSPEND the
      transactional state can be referred to as the live state.
      
      The second state is often referred to as the 'checkpointed' state
      and is a duplication of the live state when the TBEGIN instruction is
      executed. This state is kept in the hardware and will be rolled back
      to on transaction failure.
      
      Currently all the registers stored in pt_regs are ALWAYS the live
      registers, that is, when a thread has transactional registers their
      values are stored in pt_regs and the checkpointed state is in
      ckpt_regs. A strange opposite is true for fp_state/vr_state. When a
      thread is non transactional fp_state/vr_state holds the live
      registers. When a thread has initiated a transaction fp_state/vr_state
      holds the checkpointed state and transact_fp/transact_vr become the
      structure which holds the live state (at this point it is a
      transactional state).
      
      This method creates confusion as to where the live state is; in some
      circumstances it requires extra work to determine where to put the
      live state and prevents the use of common functions designed (probably
      before TM) to save the live state.
      
      With this patch pt_regs, fp_state and vr_state all represent the
      same thing and the other structures [pending rename] are for
      checkpointed state.
      
      Acked-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: signals: Stop using current in signal code · d1199431
      Cyril Bur authored
      
      
      Much of the signal code takes a pt_regs on which it operates. Over
      time the signal code has needed to know more about the thread than
      what pt_regs can supply; this information is obtained as needed by
      using 'current'.
      
      This approach is not strictly incorrect, however it does mean that
      there is now a hard requirement that the pt_regs being passed around
      belongs to current, and this is never checked. A safer approach is for
      the majority of the signal functions to take a task_struct from which
      they can obtain pt_regs and any other information they need. The
      caveat that the task_struct they are passed must be current doesn't go
      away but can more easily be checked for.
      
      Functions called from outside powerpc signal code are passed a
      pt_regs and can confirm that the pt_regs is that of current before
      passing current to other functions. Furthermore, powerpc signal
      functions can check that the task_struct they are passed is the same
      as current, avoiding possible corruption of current (or the task
      they are passed) if this assertion ever fails.
      
      CC: paulus@samba.org
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Return the new MSR from msr_check_and_set() · 3cee070a
      Cyril Bur authored
      
      
      msr_check_and_set() always performs a mfmsr() to determine if it
      needs to perform an mtmsr().  Since mfmsr() can be a costly
      operation, msr_check_and_set() can return the MSR now on the CPU,
      saving callers from having to make their own mfmsr() call.
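The change can be sketched with mfmsr()/mtmsr() stubbed by a variable (a userspace stand-in; the stub names and counters are illustrative): the function returns the MSR value now in effect, so callers need no extra mfmsr().

```c
#include <assert.h>
#include <stdint.h>

static uint64_t fake_msr;   /* stand-in for the hardware MSR */
static int mfmsr_calls;     /* counts the costly reads */

static uint64_t mfmsr(void) { mfmsr_calls++; return fake_msr; }
static void mtmsr(uint64_t v) { fake_msr = v; }

/* Set the requested MSR bits if not already set, and return the MSR
 * value now on the CPU so the caller can avoid its own mfmsr(). */
static uint64_t msr_check_and_set(uint64_t bits)
{
	uint64_t oldmsr = mfmsr();
	uint64_t newmsr = oldmsr | bits;

	if (newmsr != oldmsr)
		mtmsr(newmsr);
	return newmsr;
}
```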
      
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Specify proper data type for PCI_SLOT_ID_PREFIX · 066bcd78
      Gavin Shan authored
      
      
      This fixes the warning reported from sparse:
      
        eeh-powernv.c:875:23: warning: constant 0x8000000000000000 is so big it is unsigned long
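The fix amounts to giving the constant an explicit unsigned suffix (the macro name matches the commit title and the value comes from the warning; the helper around it is illustrative): 0x8000000000000000 does not fit in a signed long, which is what sparse is complaining about.

```c
#include <assert.h>

/* Top bit set: needs an unsigned suffix to avoid the sparse warning. */
#define PCI_SLOT_ID_PREFIX (0x8000000000000000UL)

/* Illustrative use: compose a slot ID carrying the prefix bit. */
static unsigned long make_slot_id(unsigned long id)
{
	return PCI_SLOT_ID_PREFIX | id;
}
```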
      
      Fixes: ebe22531 ("powerpc/powernv: Support PCI slot ID")
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Remove static branch prediction in atomic{, 64}_add_unless · 61e98ebf
      Anton Blanchard authored
      
      
      I see quite a lot of static branch mispredictions on a simple
      web serving workload. The issue is in __atomic_add_unless(), called
      from _atomic_dec_and_lock(). There is no obvious common case, so it
      is better to let the hardware predict the branch.
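The idea can be shown with a plain-C stand-in for the ll/sc loop (illustrative only; the real primitive returns the old value and uses atomic sequences): the equality compare carries no likely()/unlikely() hint, leaving the outcome to the hardware predictor.

```c
#include <assert.h>

/* Stand-in for __atomic_add_unless: add a to *v unless *v == u.
 * Returns 1 if the add happened, 0 otherwise.  Note the compare has no
 * static branch hint: neither outcome dominates on this workload. */
static int atomic_add_unless(int *v, int a, int u)
{
	int old = *v;

	if (old == u)     /* no likely()/unlikely(): hardware predicts */
		return 0;
	*v = old + a;
	return 1;
}
```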
      
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: During context switch, check before setting mm_cpumask · bb85fb58
      Anton Blanchard authored
      
      
      During context switch, switch_mm() sets our current CPU in mm_cpumask.
      We can avoid this atomic sequence in most cases by checking before
      setting the bit.
      
      Testing on a POWER8 using our context switch microbenchmark:
      
      tools/testing/selftests/powerpc/benchmarks/context_switch \
      	--process --no-fp --no-altivec --no-vector
      
      Performance improves 2%.
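The check-before-set can be sketched as follows (a userspace stand-in: the bool array and counter model mm_cpumask() and the cost of its atomic RMW): a plain read skips the atomic operation when our bit is already set, the common case on repeated switches to the same CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

static int atomic_sets; /* counts the expensive atomic operations */

/* Set our CPU's bit in the mm's cpumask, but test first so repeated
 * context switches on the same CPU avoid the atomic sequence. */
static void mm_cpumask_set(bool mask[NR_CPUS], int cpu)
{
	if (!mask[cpu]) {      /* cheap plain read first ...        */
		mask[cpu] = true;  /* ... atomic set only when needed */
		atomic_sets++;
	}
}
```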
      
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Use gas sections for arranging exception vectors · 57f26649
      Nicholas Piggin authored
      
      
      Use assembler sections of fixed size and location to arrange the 64-bit
      Book3S exception vector code (64-bit Book3E also uses it in head_64.S
      for 0x0..0x100).
      
      This allows better flexibility in arranging exception code and hiding
      unimportant details behind macros.
      
      Gas sections can be a bit painful to use this way, mainly because the
      assembler does not know where they will be finally linked. Taking
      absolute addresses requires a bit of trickery for example, but it can
      be hidden behind macros for the most part.
      
      Generated code is mostly the same except locations, offsets, alignments.
      
      The "+ 0x2" is only required for the trap number / kvm exit number,
      which gets loaded as a constant into a register.
      
      Previously, code also used + 0x2 for label names, but we changed to
      using "H" to distinguish HV case for that. Remove the last vestiges
      of that.
      
      __after_prom_start is taking absolute address of a label in another
      fixed section. Newer toolchains seemed to compile this okay, but older
      ones do not. FIXED_SYMBOL_ABS_ADDR is more foolproof, it just takes an
      additional line to define.
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Consolidate exception handler alignment · be642c34
      Nicholas Piggin authored
      
      
      Move exception handler alignment directives into the head-64.h
      macros, because they will no longer work in-place after the next
      patch. This
      slightly changes functions that have alignments applied and therefore
      code generation, which is why it was not done initially (see earlier
      patch).
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Add new exception vector macros · da2bc464
      Michael Ellerman authored
      
      
      Create arch/powerpc/include/asm/head-64.h with macros that specify
      an exception vector (name, type, location), which will be used to
      label and lay out exceptions into the object file.
      
      Naming is moved out of exception-64s.h, which is used to specify the
      implementation of exception handlers.
      
      objdump of generated code in exception vectors is unchanged except for
      names. Alignment directives scattered around are annoying, but done
      this way so that disassembly can verify identical instruction
      generation before and after the patch. These get cleaned up in a
      future patch.
      
      We change the way KVMTEST works, explicitly passing EXC_HV or EXC_STD
      rather than overloading the trap number. This removes the need to have
      SOFTEN values for the overloaded trap numbers, eg. 0x502.
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  12. Sep 29, 2016
    • KVM: PPC: Book3S HV: Migrate pinned pages out of CMA · 2e5bbb54
      Balbir Singh authored
      
      
      When PCI Device pass-through is enabled via VFIO, KVM-PPC will
      pin pages using get_user_pages_fast(). One of the downsides of
      the pinning is that the page could be in the CMA region.  The CMA
      region is used for other allocations like the hash page table.
      Ideally we want the pinned pages to be from a non-CMA region.
      
      This patch (currently only for KVM PPC with VFIO) forcefully
      migrates the pages out (huge pages are omitted for the moment).
      There are more efficient ways of doing this, but that might
      be elaborate and might impact a larger audience beyond just
      the kvm ppc implementation.
      
      The magic is in new_iommu_non_cma_page() which allocates the
      new page from a non-CMA region.
      
      I've tested the patches lightly at my end. The full solution
      requires migration of THP pages in the CMA region. That work
      will be done incrementally on top of this.
      
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [mpe: Merged via powerpc tree as that's where the changes are]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • drivers/pci/hotplug: Support surprise hotplug in powernv driver · 360aebd8
      Gavin Shan authored
      
      
      This supports PCI surprise hotplug. The design is highlighted
      below:
      
         * The PCI slot's surprise hotplug capability is exposed through
           device node property "ibm,slot-surprise-pluggable", meaning
           PCI surprise hotplug will be disabled if skiboot doesn't support
           it yet.
         * An interrupt due to a presence or link state change is raised
           on a surprise hotplug event. One event is allocated and queued
           to the PCI slot for the workqueue to pick up and process in
           serialized fashion. The code flow for surprise hotplug is the
           same as that for managed hotplug, except that the affected PEs
           are put into the frozen state to avoid unexpected EEH error
           reporting in the surprise hot remove path.
      
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  13. Sep 27, 2016
    • KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register · fa73c3b2
      Thomas Huth authored
      
      
      The MMCR2 register is available twice, one time with number 785
      (privileged access), and one time with number 769 (unprivileged,
      but it can be disabled completely). In former times, the Linux
      kernel was using the unprivileged register 769 only, but since
      commit 8dd75ccb ("powerpc: Use privileged SPR number
      for MMCR2"), it uses the privileged register 785 instead.
      The KVM-PR code then of course also switched to use the SPR 785,
      but this is causing older guest kernels to crash, since these
      kernels still access 769 instead. So to support older kernels
      with KVM-PR again, we have to support register 769 in KVM-PR, too.
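The fix can be sketched as accepting both SPR numbers in the emulation path (the structure below is illustrative, not the KVM-PR source; both numbers come from the commit text): old guests read SPR 769, newer ones 785, and both should hit the same backing register.

```c
#include <assert.h>
#include <stdint.h>

#define SPR_MMCR2  785 /* privileged number   */
#define SPR_UMMCR2 769 /* unprivileged number */

static uint64_t vcpu_mmcr2; /* stand-in for the vcpu's MMCR2 state */

/* Emulate mfspr for the handled SPRs; return 0 on success. */
static int emulate_mfspr(int sprn, uint64_t *val)
{
	switch (sprn) {
	case SPR_MMCR2:
	case SPR_UMMCR2:  /* keep old guests working */
		*val = vcpu_mmcr2;
		return 0;
	default:
		return -1; /* unhandled SPR */
	}
}
```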
      
      Fixes: 8dd75ccb
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie · 4f053d06
      Balbir Singh authored
      
      
      Remove duplicate setting of the "B" field when doing a tlbie(l).
      In compute_tlbie_rb(), the "B" field is set again just before
      returning the rb value to be used for tlbie(l).
      
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread · 88b02cf9
      Paul Mackerras authored
      
      
      POWER8 has one virtual timebase (VTB) register per subcore, not one
      per CPU thread.  The HV KVM code currently treats VTB as a per-thread
      register, which can lead to spurious soft lockup messages from guests
      which use the VTB as the time source for the soft lockup detector.
      (CPUs before POWER8 did not have the VTB register.)
      
      For HV KVM, this fixes the problem by making only the primary thread
      in each virtual core save and restore the VTB value.  With this,
      the VTB state becomes part of the kvmppc_vcore structure.  This
      also means that "piggybacking" of multiple virtual cores onto one
      subcore is not possible on POWER8, because then the virtual cores
      would share a single VTB register.
      
      PR KVM emulates a VTB register, which is per-vcpu because PR KVM
      has no notion of CPU threads or SMT.  For PR KVM we move the VTB
      state into the kvmppc_vcpu_book3s struct.
      
      Cc: stable@vger.kernel.org # v3.14+
      Reported-by: Thomas Huth <thuth@redhat.com>
      Tested-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>