Skip to content
  1. Jul 28, 2014
    • Mihai Caraman's avatar
      KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 · 9c0d4e0d
      Mihai Caraman authored
      
      
      Add mising defines MAS0_GET_TLBSEL() and MAS1_GET_TSIZE() for Book3E.
      
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      9c0d4e0d
    • Bharat Bhushan's avatar
      kvm: ppc: Add SPRN_EPR get helper function · 34f754b9
      Bharat Bhushan authored
      
      
      kvmppc_set_epr() is already defined in asm/kvm_ppc.h, So
      rename and move get_epr helper function to same file.
      
      Signed-off-by: default avatarBharat Bhushan <Bharat.Bhushan@freescale.com>
      [agraf: remove duplicate return]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      34f754b9
    • Bharat Bhushan's avatar
      kvm: ppc: booke: Add shared struct helpers of SPRN_ESR · dc168549
      Bharat Bhushan authored
      
      
      Add and use kvmppc_set_esr() and kvmppc_get_esr() helper functions
      
      Signed-off-by: default avatarBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      dc168549
    • Bharat Bhushan's avatar
      kvm: ppc: bookehv: Added wrapper macros for shadow registers · 1dc0c5b8
      Bharat Bhushan authored
      
      
      There are shadow registers like, GSPRG[0-3], GSRR0, GSRR1 etc on
      BOOKE-HV and these shadow registers are guest accessible.
      So these shadow registers needs to be updated on BOOKE-HV.
      This patch adds new macro for get/set helper of shadow register .
      
      Signed-off-by: default avatarBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      1dc0c5b8
    • Alexander Graf's avatar
      KVM: PPC: Book3S: Make magic page properly 4k mappable · 89b68c96
      Alexander Graf authored
      
      
      The magic page is defined as a 4k page of per-vCPU data that is shared
      between the guest and the host to accelerate accesses to privileged
      registers.
      
      However, when the host is using 64k page size granularity we weren't quite
      as strict about that rule anymore. Instead, we partially treated all of the
      upper 64k as magic page and mapped only the uppermost 4k with the actual
      magic contents.
      
      This works well enough for Linux which doesn't use any memory in kernel
      space in the upper 64k, but Mac OS X got upset. So this patch makes magic
      page actually stay in a 4k range even on 64k page size hosts.
      
      This patch fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE
      hosts for me.
      
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      89b68c96
    • Alexander Graf's avatar
      KVM: PPC: Book3S: Add hack for split real mode · c01e3f66
      Alexander Graf authored
      
      
      Today we handle split real mode by mapping both instruction and data faults
      into a special virtual address space that only exists during the split mode
      phase.
      
      This is good enough to catch 32bit Linux guests that use split real mode for
      copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our
      instruction pointer and can map the user space process freely below there.
      
      However, that approach fails when we're running KVM inside of KVM. Here the 1st
      level last_inst reader may well be in the same virtual page as a 2nd level
      interrupt handler.
      
      It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a
      kernel copy_from/to_user implementation can easily overlap with user space
      addresses.
      
      The architecturally correct way to fix this would be to implement an instruction
      interpreter in KVM that kicks in whenever we go into split real mode. This
      interpreter however would not receive a great amount of testing and be a lot of
      bloat for a reasonably isolated corner case.
      
      So I went back to the drawing board and tried to come up with a way to make
      split real mode work with a single flat address space. And then I realized that
      we could get away with the same trick that makes it work for Linux:
      
      Whenever we see an instruction address during split real mode that may collide,
      we just move it higher up the virtual address space to a place that hopefully
      does not collide (keep your fingers crossed!).
      
      That approach does work surprisingly well. I am able to successfully run
      Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I
      apply a tiny timing probe hack to QEMU. I'd say this is a win over even more
      broken split real mode :).
      
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      c01e3f66
    • Alexander Graf's avatar
      KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct · 1287cb3f
      Alexander Graf authored
      
      
      When building KVM with a lot of vcores (NR_CPUS is big), we can potentially
      get out of the ld immediate range for dereferences inside that struct.
      
      Move the array to the end of our kvm_arch struct. This fixes compilation
      issues with NR_CPUS=2048 for me.
      
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      1287cb3f
    • Mihai Caraman's avatar
      KVM: PPC: e500: Emulate power management control SPR · debf27d6
      Mihai Caraman authored
      
      
      For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0)
      to enable idle power down for cores and devices by setting up the idle count
      period at boot time. With the host already controlling the power management
      configuration the guest could simply benefit from it, so emulate guest request
      as a general store.
      
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      debf27d6
    • Alexander Graf's avatar
      KVM: PPC: Book3S HV: Make HTAB code LE host aware · 6f22bd32
      Alexander Graf authored
      
      
      When running on an LE host all data structures are kept in little endian
      byte order. However, the HTAB still needs to be maintained in big endian.
      
      So every time we access any HTAB we need to make sure we do so in the right
      byte order. Fix up all accesses to manually byte swap.
      
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      6f22bd32
    • Alexander Graf's avatar
      PPC: Add asm helpers for BE 32bit load/store · 8f6822c4
      Alexander Graf authored
      
      
      From assembly code we might not only have to explicitly BE access 64bit values,
      but sometimes also 32bit ones. Add helpers that allow for easy use of lwzx/stwx
      in their respective byte-reverse or native form.
      
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      8f6822c4
    • Mihai Caraman's avatar
      KVM: PPC: e500: Fix default tlb for victim hint · d57cef91
      Mihai Caraman authored
      
      
      Tlb search operation used for victim hint relies on the default tlb set by the
      host. When hardware tablewalk support is enabled in the host, the default tlb is
      TLB1 which leads KVM to evict the bolted entry. Set and restore the default tlb
      when searching for victim hint.
      
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      Reviewed-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      d57cef91
    • Michael Neuling's avatar
      KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling · 9642382e
      Michael Neuling authored
      
      
      This adds support for the H_SET_MODE hcall.  This hcall is a
      multiplexer that has several functions, some of which are called
      rarely, and some which are potentially called very frequently.
      Here we add support for the functions that set the debug registers
      CIABR (Completed Instruction Address Breakpoint Register) and
      DAWR/DAWRX (Data Address Watchpoint Register and eXtension),
      since they could be updated by the guest as often as every context
      switch.
      
      This also adds a kvmppc_power8_compatible() function to test to see
      if a guest is compatible with POWER8 or not.  The CIABR and DAWR/X
      only exist on POWER8.
      
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      9642382e
    • Paul Mackerras's avatar
      KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled · ae2113a4
      Paul Mackerras authored
      
      
      This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL
      capability is used to enable or disable in-kernel handling of an
      hcall, that the hcall is actually implemented by the kernel.
      If not an EINVAL error is returned.
      
      This also checks the default-enabled list of hcalls and prints a
      warning if any hcall there is not actually implemented.
      
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      ae2113a4
    • Paul Mackerras's avatar
      KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling · 699a0ea0
      Paul Mackerras authored
      
      
      This provides a way for userspace controls which sPAPR hcalls get
      handled in the kernel.  Each hcall can be individually enabled or
      disabled for in-kernel handling, except for H_RTAS.  The exception
      for H_RTAS is because userspace can already control whether
      individual RTAS functions are handled in-kernel or not via the
      KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
      H_RTAS is out of the normal sequence of hcall numbers.
      
      Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
      KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
      The args field of the struct kvm_enable_cap specifies the hcall number
      in args[0] and the enable/disable flag in args[1]; 0 means disable
      in-kernel handling (so that the hcall will always cause an exit to
      userspace) and 1 means enable.  Enabling or disabling in-kernel
      handling of an hcall is effective across the whole VM.
      
      The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
      on PowerPC is new, added by this commit.  The KVM_CAP_ENABLE_CAP_VM
      capability advertises that this ability exists.
      
      When a VM is created, an initial set of hcalls are enabled for
      in-kernel handling.  The set that is enabled is the set that have
      an in-kernel implementation at this point.  Any new hcall
      implementations from this point onwards should not be added to the
      default set without a good reason.
      
      No distinction is made between real-mode and virtual-mode hcall
      implementations; the one setting controls them both.
      
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      699a0ea0
    • Alexander Graf's avatar
      KVM: PPC: Book3s HV: Fix tlbie compile error · f6bf3a66
      Alexander Graf authored
      
      
      Some compilers complain about uninitialized variables in the compute_tlbie_rb
      function. When you follow the code path you'll realize that we'll never get
      to that point, but the compiler isn't all that smart.
      
      So just default to 4k page sizes for everything, making the compiler happy
      and the code slightly easier to read.
      
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      f6bf3a66
    • Aneesh Kumar K.V's avatar
      KVM: PPC: BOOK3S: PR: Emulate instruction counter · 06da28e7
      Aneesh Kumar K.V authored
      
      
      Writing to IC is not allowed in the privileged mode.
      
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      06da28e7
    • Aneesh Kumar K.V's avatar
      KVM: PPC: BOOK3S: PR: Emulate virtual timebase register · 8f42ab27
      Aneesh Kumar K.V authored
      
      
      virtual time base register is a per VM, per cpu register that needs
      to be saved and restored on vm exit and entry. Writing to VTB is not
      allowed in the privileged mode.
      
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [agraf: fix compile error]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      8f42ab27
  2. Jul 06, 2014
  3. Jun 11, 2014
  4. Jun 05, 2014
  5. Jun 04, 2014
    • Fabian Frederick's avatar
      sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL · f6187769
      Fabian Frederick authored
      
      
      sys_sgetmask and sys_ssetmask are obsolete system calls no longer
      supported in libc.
      
      This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert
      mode configuration.That option is enabled by default for those
      architectures.
      
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f6187769
    • Mel Gorman's avatar
      mm: disable zone_reclaim_mode by default · 4f9b16a6
      Mel Gorman authored
      
      
      When it was introduced, zone_reclaim_mode made sense as NUMA distances
      punished and workloads were generally partitioned to fit into a NUMA
      node.  NUMA machines are now common but few of the workloads are
      NUMA-aware and it's routine to see major performance degradation due to
      zone_reclaim_mode being enabled but relatively few can identify the
      problem.
      
      Those that require zone_reclaim_mode are likely to be able to detect
      when it needs to be enabled and tune appropriately so lets have a
      sensible default for the bulk of users.
      
      This patch (of 2):
      
      zone_reclaim_mode causes processes to prefer reclaiming memory from
      local node instead of spilling over to other nodes.  This made sense
      initially when NUMA machines were almost exclusively HPC and the
      workload was partitioned into nodes.  The NUMA penalties were
      sufficiently high to justify reclaiming the memory.  On current machines
      and workloads it is often the case that zone_reclaim_mode destroys
      performance but not all users know how to detect this.  Favour the
      common case and disable it by default.  Users that are sophisticated
      enough to know they need zone_reclaim_mode will detect it.
      
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Reviewed-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4f9b16a6
    • Mel Gorman's avatar
      x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels · c46a7c81
      Mel Gorman authored
      
      
      _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
      faults on x86.  Care is taken such that _PAGE_NUMA is used only in
      situations where the VMA flags distinguish between NUMA hinting faults
      and prot_none faults.  This decision was x86-specific and conceptually
      it is difficult requiring special casing to distinguish between PROTNONE
      and NUMA ptes based on context.
      
      Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
      between an entry that is really unmapped and a page that is protected
      for NUMA hinting faults as if the PTE is not present then a fault will
      be trapped.
      
      Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset.
      This patch shrinks the maximum possible swap size and uses the bit to
      uniquely distinguish between NUMA hinting ptes and swap ptes.
      
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c46a7c81
  6. Jun 01, 2014
  7. May 30, 2014
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs · 9bc01a9b
      Paul Mackerras authored
      
      
      This adds workarounds for two hardware bugs in the POWER8 performance
      monitor unit (PMU), both related to interrupt generation.  The effect
      of these bugs is that PMU interrupts can get lost, leading to tools
      such as perf reporting fewer counts and samples than they should.
      
      The first bug relates to the PMAO (perf. mon. alert occurred) bit in
      MMCR0; setting it should cause an interrupt, but doesn't.  The other
      bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0.
      Setting PMAE when a counter is negative and counter negative
      conditions are enabled to cause alerts should cause an alert, but
      doesn't.
      
      The workaround for the first bug is to create conditions where a
      counter will overflow, whenever we are about to restore a MMCR0
      value that has PMAO set (and PMAO_SYNC clear).  The workaround for
      the second bug is to freeze all counters using MMCR2 before reading
      MMCR0.
      
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      9bc01a9b
    • Paul Mackerras's avatar
      KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number · e1d8a96d
      Paul Mackerras authored
      
      
      Commit b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8
      SPRs") added a definition of KVM_REG_PPC_WORT with the same register
      number as the existing KVM_REG_PPC_VRSAVE (though in fact the
      definitions are not identical because of the different register sizes.)
      
      For clarity, this moves KVM_REG_PPC_WORT to the next unused number,
      and also adds it to api.txt.
      
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      e1d8a96d
    • Aneesh Kumar K.V's avatar
      KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler · ddca156a
      Aneesh Kumar K.V authored
      
      
      Use make_dsisr instead of open coding it. This also have
      the added benefit of handling alignment interrupt on additional
      instructions.
      
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      ddca156a
    • Alexander Graf's avatar
      KVM: PPC: Disable NX for old magic page using guests · f3383cf8
      Alexander Graf authored
      
      
      Old guests try to use the magic page, but map their trampoline code inside
      of an NX region.
      
      Since we can't fix those old kernels, try to detect whether the guest is sane
      or not. If not, just disable NX functionality in KVM so that old guests at
      least work at all. For newer guests, add a bit that we can set to keep NX
      functionality available.
      
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      f3383cf8
    • Aneesh Kumar K.V's avatar
      KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest · 1f365bb0
      Aneesh Kumar K.V authored
      
      
      On recent IBM Power CPUs, while the hashed page table is looked up using
      the page size from the segmentation hardware (i.e. the SLB), it is
      possible to have the HPT entry indicate a larger page size.  Thus for
      example it is possible to put a 16MB page in a 64kB segment, but since
      the hash lookup is done using a 64kB page size, it may be necessary to
      put multiple entries in the HPT for a single 16MB page.  This
      capability is called mixed page-size segment (MPSS).  With MPSS,
      there are two relevant page sizes: the base page size, which is the
      size used in searching the HPT, and the actual page size, which is the
      size indicated in the HPT entry. [ Note that the actual page size is
      always >= base page size ].
      
      We use "ibm,segment-page-sizes" device tree node to advertise
      the MPSS support to PAPR guest. The penc encoding indicates whether
      we support a specific combination of base page size and actual
      page size in the same segment. We also use the penc value in the
      LP encoding of HPTE entry.
      
      This patch exposes MPSS support to KVM guest by advertising the
      feature via "ibm,segment-page-sizes". It also adds the necessary changes
      to decode the base page size and the actual page size correctly from the
      HPTE entry.
      
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      1f365bb0
    • Alexander Graf's avatar
      KVM: PPC: Book3S PR: Expose TAR facility to guest · e14e7a1e
      Alexander Graf authored
      
      
      POWER8 implements a new register called TAR. This register has to be
      enabled in FSCR and then from KVM's point of view is mere storage.
      
      This patch enables the guest to use TAR.
      
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      e14e7a1e
Loading