  1. Feb 02, 2016
    • MIPS: traps.c: Don't emulate RDHWR in the CpU #0 exception handler · 10f6d99f
      Maciej W. Rozycki authored
      
      
      In the regular MIPS instruction set RDHWR is encoded with the SPECIAL3
      (011111) major opcode.  Therefore it cannot trigger the CpU (Coprocessor
      Unusable) exception, and certainly not for coprocessor 0, as the opcode
      does not overlap with any of the older ISA reservations, i.e. LWC0
      (110000), SWC0 (111000), LDC0 (110100) or SDC0 (111100).  The closest
      match might be SDC3 (111111), possibly causing a CpU #3 exception;
      however, our code does not handle that anyway.  A quick check with a MIPS I
      and a MIPS III processor:
      
      CPU0 revision is: 00000220 (R3000)
      CPU0 revision is: 00000440 (R4400SC)
      
      indeed indicates that the RI (Reserved Instruction) exception is
      triggered.  It's only LL and SC that require emulation in the CpU #0
      exception handler as they reuse the LWC0 and SWC0 opcodes respectively.
      
      In the microMIPS instruction set RDHWR is mandatory and triggering the
      RI exception is required on unimplemented or disabled register accesses.
      Therefore emulating the microMIPS instruction in the CpU #0 exception
      handler is not required either.
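
      For reference, a rough sketch of the major-opcode values discussed above
      (constants named here for illustration only, not taken from the kernel
      sources):

        /* The major opcode field is bits 31:26 of the instruction word. */
        #define MAJOR_OPC(insn)   (((insn) >> 26) & 0x3f)

        #define OPC_SPECIAL3      0x1f  /* 011111: RDHWR lives here; traps as RI    */
        #define OPC_LWC0          0x30  /* 110000: reused by LL, emulated on CpU #0 */
        #define OPC_LDC0          0x34  /* 110100                                   */
        #define OPC_SWC0          0x38  /* 111000: reused by SC, emulated on CpU #0 */
        #define OPC_SDC0          0x3c  /* 111100                                   */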
      
      Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/12280/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
  2. Feb 01, 2016
    • MIPS: Fix FPU disable with preemption · 00fe56dc
      James Hogan authored
      
      
      The FPU should not be left enabled after a task context switch. This
      isn't usually a problem as the FPU enable bit is updated before
      returning to userland, however it can potentially mask kernel bugs, and
      in fact KVM assumes it won't happen and won't clear the FPU enable bit
      before returning to the guest, which allows the guest to use stale FPU
      context.
      
      Interrupts and exceptions save and restore most bits of the CP0 Status
      register which contains the FPU enable bit (CU1). When the kernel needs
      to enable or disable the FPU (for example due to attempted FPU use by
      userland, or the scheduler being invoked) both the actual Status
      register and the saved value in the userland context are updated.
      
      However this doesn't work correctly with full kernel preemption enabled,
      since the FPU enable bit can be cleared from within an interrupt when
      the scheduler is invoked, and only the userland context is updated, not
      the interrupt context.
      
      For example:
      1) Enter kernel with FPU already enabled, TIF_USEDFPU=1, Status.CU1=1
         saved.
      2) Take a timer interrupt while in kernel mode, Status.CU1=1 saved.
      3) Timer interrupt invokes scheduler to preempt the task, which clears
         TIF_USEDFPU and disables the FPU (Status.CU1=0) both in the Status
         register and in the value stored in the user context from step (1),
         but not in the interrupt context from step (2).
      4) When the process is scheduled back in again Status.CU1=0.
      5) The interrupt context from step (2) is restored, which sets
         Status.CU1=1. So from user context point of view, preemption has
         re-enabled FPU!
      6) If the scheduler is invoked again (via preemption or voluntarily)
         before returning to userland, TIF_USEDFPU=0 so the FPU is not
         disabled before the task context switch.
      7) The next task resumes from the context switch with FPU enabled!
      
      The restoring of the Status register on return from interrupt/exception
      is already selective about which bits to restore, leaving the interrupt
      mask bits alone so enabling/disabling of CPU interrupt lines can
      persist. Extend this to also leave both the CU1 bit (FPU enable) and the
      FR bit (which specifies the FPU mode and gets changed with CU1) untouched. This
      prevents a stale Status value being restored in step (5) above and
      persisting through subsequent context switches.
      
      Also switch to the use of definitions from asm/mipsregs.h while we're at
      it.
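
      As a rough C-level illustration of the idea (the real code is in the
      exception-return assembly; saved_status stands for the Status word saved
      at exception entry, and the ST0_* masks are the asm/mipsregs.h
      definitions mentioned above):

        /* Keep the live interrupt mask, CU1 and FR bits; take everything
         * else from the Status value saved at exception entry. */
        unsigned long keep = ST0_IM | ST0_CU1 | ST0_FR;

        write_c0_status((read_c0_status() & keep) | (saved_status & ~keep));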
      
      Since this change also affects the restoration of the Status register on
      the path back to userland, it increases the kernel's sensitivity to the
      problem of the FPU being left enabled, allowing it to propagate to
      userland. A warning is therefore also added to lose_fpu_inatomic() to
      point out any future recurrences before they do any damage.
      
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Reviewed-by: Paul Burton <paul.burton@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/12303/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Properly disable FPU in start_thread() · 76e5846d
      James Hogan authored
      
      
      start_thread() (called for execve(2)) clears the TIF_USEDFPU flag
      without atomically disabling the FPU. With a preemptive kernel, an
      unfortunately timed preemption after this could result in another
      task (or KVM guest) being scheduled in with the FPU still enabled, since
      lose_fpu_inatomic() only turns it off if TIF_USEDFPU is set.
      
      Use lose_fpu(0) instead of the separate FPU / MSA management, which
      should do the right thing (drop FPU properly and atomically without
      saving state) and will be more future proof.
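
      A minimal sketch of the resulting shape of start_thread() (heavily
      simplified; only the FPU-related step is shown):

        void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
        {
                /* ... reset Status bits, clear thread flags, etc. ... */

                lose_fpu(0);    /* atomically disable FPU/MSA and clear
                                 * TIF_USEDFPU without saving the state */

                /* ... set up regs->cp0_epc = pc, regs->regs[29] = sp ... */
        }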
      
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Reviewed-by: Paul Burton <paul.burton@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/12302/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Fix buffer overflow in syscall_get_arguments() · f4dce1ff
      James Hogan authored
      
      
      Since commit 4c21b8fd ("MIPS: seccomp: Handle indirect system calls
      (o32)"), syscall_get_arguments() attempts to handle o32 indirect syscall
      arguments by incrementing both the start argument number and the number
      of arguments to fetch. However, only the start argument number needs to
      be incremented; the number of arguments does not change, they're just
      shifted up by one, and in fact the output array is provided by the
      caller and is likely only n entries long, so reading more arguments
      overflows the output buffer.
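
      A sketch of the distinction (simplified; the fetch loop follows the
      arch/mips/include/asm/syscall.h helper of the time, and the
      indirect-syscall condition is paraphrased):

        if (o32_indirect_syscall) {     /* paraphrased condition */
                /* Shift the starting argument only; incrementing n as well
                 * reads one slot too many and overruns the caller's args[]. */
                i++;
        }

        while (n--)
                ret |= mips_get_syscall_arg(args++, task, regs, i++);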
      
      In the case of seccomp, this results in it fetching 7 arguments starting
      at the 2nd one, which overflows the unsigned long args[6] in
      populate_seccomp_data(). This clobbers the $s0 register from
      syscall_trace_enter() which __seccomp_phase1_filter() saved onto the
      stack, into which syscall_trace_enter() had placed its syscall number
      argument. This caused Chromium to crash.
      
      Credit goes to Milko for tracking it down as far as $s0 being clobbered.
      
      Fixes: 4c21b8fd ("MIPS: seccomp: Handle indirect system calls (o32)")
      Reported-by: Milko Leporis <milko.leporis@imgtec.com>
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.15-
      Patchwork: https://patchwork.linux-mips.org/patch/12213/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
  3. Jan 29, 2016
    • x86/mm/pat: Avoid truncation when converting cpa->numpages to address · 74256377
      Matt Fleming authored
      
      
      There are a couple of nasty truncation bugs lurking in the pageattr
      code that can be triggered when mapping EFI regions, e.g. when we pass
      a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting
      left by PAGE_SHIFT will truncate the resultant address to 32-bits.
      
      Viorel-Cătălin managed to trigger this bug on his Dell machine that
      provides a ~5GB EFI region which requires 1236992 pages to be mapped.
      When calling populate_pud() the end of the region gets calculated
      incorrectly in the following buggy expression,
      
        end = start + (cpa->numpages << PAGE_SHIFT);
      
      And only 188416 pages are mapped. Next, populate_pud() gets invoked
      for a second time because of the loop in __change_page_attr_set_clr(),
      only this time no pages get mapped because shifting the remaining
      number of pages (1048576) by PAGE_SHIFT is zero, at which point the
      loop in __change_page_attr_set_clr() spins forever because we fail to
      make progress.
      
      Hitting this bug depends very much on the virtual address we pick to
      map the large region at and how many pages we map on the initial run
      through the loop. This explains why this issue was only recently hit
      with the introduction of commit
      
        a5caa209 ("x86/efi: Fix boot crash by mapping EFI memmap
         entries bottom-up at runtime, instead of top-down")
      
      It's interesting to note that safe uses of cpa->numpages do exist in
      the pageattr code. If instead of shifting ->numpages we multiply by
      PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and
      so the result is unsigned long.
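
      A small illustration of the promotion rules at play (assuming the usual
      PAGE_SHIFT = 12 and PAGE_SIZE = 4096UL):

        int numpages = 1236992;                   /* ~5 GiB worth of 4 KiB pages */

        /* int << int stays int: the product needs 33 bits, so the high bit
         * is lost before the assignment widens the value. */
        unsigned long bad  = numpages << PAGE_SHIFT;

        /* int * unsigned long promotes numpages to unsigned long first,
         * so the full result survives. */
        unsigned long good = numpages * PAGE_SIZE;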
      
      To avoid surprises when users try to convert very large cpa->numpages
      values to addresses, change the data type from 'int' to 'unsigned
      long', thereby making it suitable for shifting by PAGE_SHIFT without
      any type casting.
      
      The alternative would be to make liberal use of casting, but that is
      far more likely to cause problems in the future when someone adds more
      code and fails to cast properly; this bug was difficult enough to
      track down in the first place.
      
      Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131
      Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • perf/x86: De-obfuscate code · 8f04b853
      Peter Zijlstra authored
      
      
      Get rid of the 'onln' obfuscation.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Fix uninitialized value usage · e01d8718
      Peter Zijlstra authored
      
      
      When calling intel_alt_er() with .idx != EXTRA_REG_RSP_* we will not
      initialize alt_idx and then use this uninitialized value to index an
      array.
      
      While not necessarily fatal in itself, it can result in an infinite loop
      in its caller __intel_shared_reg_get_constraints(), with IRQs disabled.
      
      Alternative error modes are random memory corruption due to the
      cpuc->shared_regs->regs[] array overrun, which manifests as either
      get_constraints or put_constraints doing weird stuff.
      
      Only took 6 hours of painful debugging to find this. Neither GCC nor
      Smatch warnings flagged this bug.
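
      The fix presumably boils down to giving the index a sane initial value,
      roughly along these lines (an illustrative sketch of intel_alt_er(), not
      the exact diff):

        static int intel_alt_er(int idx, u64 config)
        {
                int alt_idx = idx;      /* initialize: fall back to the original
                                         * index instead of indexing the
                                         * extra_regs array with garbage */

                if (idx == EXTRA_REG_RSP_0)
                        alt_idx = EXTRA_REG_RSP_1;
                if (idx == EXTRA_REG_RSP_1)
                        alt_idx = EXTRA_REG_RSP_0;

                /* ... validity check of config against extra_regs[alt_idx] ... */

                return alt_idx;
        }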
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: ae3f011f ("perf/x86/intel: Fix SLM MSR_OFFCORE_RSP1 valid_mask")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. Jan 28, 2016
    • powerpc/mm: Fixup _HPAGE_CHG_MASK · 2d19fc63
      Aneesh Kumar K.V authored
      
      
      This was wrongly updated by commit 7aa9a23c ("powerpc, thp: remove
      infrastructure for handling splitting PMDs") during the last merge
      window. Fix it up.
      
      This could lead to incorrect behaviour in THP and/or mprotect(), at a
      minimum.
      
      Fixes: 7aa9a23c ("powerpc, thp: remove infrastructure for handling splitting PMDs")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8 · 370f06c8
      Madhavan Srinivasan authored
      
      
      Commit 7a786832 ("powerpc/perf: Add an explict flag indicating
      presence of SLOT field") introduced the PPMU_HAS_SSLOT flag to remove
      the assumption that MMCRA[SLOT] was present when PPMU_ALT_SIPR was not
      set.
      
      That commit's changelog also mentions that Power8 does not support
      MMCRA[SLOT]. However, when the Power8 PMU support was merged, it
      erroneously included the PPMU_HAS_SSLOT flag.
      
      So remove PPMU_HAS_SSLOT from the Power8 flags.
      
      mpe: On systems where MMCRA[SLOT] exists, the field occupies bits 37:39
      (IBM numbering). On Power8 bit 37 is reserved, and 38:39 overlap with
      the high bits of the Threshold Event Counter Mantissa. I am not aware of
      any published events which use the threshold counting mechanism, which
      would cause the mantissa bits to be set. So in practice this bug is
      unlikely to trigger.
      
      Fixes: e05b9b9e ("powerpc/perf: Power8 PMU support")
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. Jan 25, 2016
    • arm64: Fix an enum typo in mm/dump.c · b3122023
      Masanari Iida authored
      
      
      This patch fixes a typo in mm/dump.c:
      "MODUELS_END_NR" should be "MODULES_END_NR".
      
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings · ac15bd63
      Catalin Marinas authored
      
      
      Currently, set_pte_at() only checks the software PTE_WRITE bit for user
      mappings when it sets or clears the hardware PTE_RDONLY accordingly. The
      kernel ptes are written directly without any modification, relying
      solely on the protection bits in macros like PAGE_KERNEL. However,
      modifying kernel pte attributes via pte_wrprotect() would be ignored by
      set_pte_at(). Since pte_wrprotect() does not set PTE_RDONLY (it only
      clears PTE_WRITE), the new permission is not taken into account.
      
      This patch changes set_pte_at() to adjust the read-only permission for
      kernel ptes as well. As a side effect, existing PROT_* definitions used
      for kernel ioremap*() need to include PTE_DIRTY | PTE_WRITE.
      
      (additionally, white space fix for PTE_KERNEL_ROX)
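
      The idea, roughly, is to derive the hardware PTE_RDONLY bit from the
      software PTE_WRITE (and dirty) state for every valid pte rather than
      only for user ones; a sketch, not the exact hunk:

        if (pte_valid(pte)) {
                if (pte_write(pte) && pte_dirty(pte))
                        pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
                else
                        pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
        }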
      
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: kernel: fix architected PMU registers unconditional access · f436b2ac
      Lorenzo Pieralisi authored
      
      
      The Performance Monitors extension is an optional feature of the
      AArch64 architecture, therefore, in order to access Performance
      Monitors registers safely, the kernel should detect the architected
      PMU unit presence through the ID_AA64DFR0_EL1 register PMUVer field
      before accessing them.
      
      This patch implements a guard by reading the ID_AA64DFR0_EL1 register
      PMUVer field to detect the architected PMU presence and prevent accessing
      PMU system registers if the Performance Monitors extension is not
      implemented in the core.
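
      The actual guard lives in the early assembly setup code; expressed in
      rough C it amounts to something like the following (field position per
      the ARM ARM, ID_AA64DFR0_EL1.PMUVer occupying bits [11:8]):

        u64 dfr0 = read_cpuid(ID_AA64DFR0_EL1);

        if (((dfr0 >> 8) & 0xf) != 0) {
                /* An architected PMU is implemented: safe to reset
                 * PMUSERENR_EL0 (and the other PMU registers). */
                asm volatile("msr pmuserenr_el0, %0" :: "r" (0UL));
        }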
      
      Cc: Peter Maydell <peter.maydell@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: <stable@vger.kernel.org>
      Fixes: 60792ad3 ("arm64: kernel: enforce pmuserenr_el0 initialization and restore")
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: kasan: ensure that the KASAN zero page is mapped read-only · 7b1af979
      Ard Biesheuvel authored
      
      
      When switching from the early KASAN shadow region, which maps the
      entire shadow space read-write, to the permanent KASAN shadow region,
      which uses a zero page to shadow regions that are not subject to
      instrumentation, the lowest level table kasan_zero_pte[] may be
      reused unmodified, which means that the mappings of the zero page
      that it contains will still be read-write.
      
      So update it explicitly to map the zero page read only when we
      activate the permanent mapping.
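
      Presumably something along these lines in the arm64 KASAN init code (a
      sketch; the kasan_zero_page name alongside kasan_zero_pte[] is assumed):

        int i;

        /* Remap the zero shadow page read-only once the permanent shadow
         * tables are in place. */
        for (i = 0; i < PTRS_PER_PTE; i++)
                set_pte(&kasan_zero_pte[i],
                        pfn_pte(virt_to_pfn(kasan_zero_page), PAGE_KERNEL_RO));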
      
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: hide __efistub_ aliases from kallsyms · 75feee3d
      Ard Biesheuvel authored
      
      
      Commit e8f3010f ("arm64/efi: isolate EFI stub from the kernel
      proper") isolated the EFI stub code from the kernel proper by prefixing
      all of its symbols with __efistub_, and selectively allowing access to
      core kernel symbols from the stub by emitting __efistub_ aliases for
      functions and variables that the stub can access legally.
      
      As an unintended side effect, these aliases are emitted into the
      kallsyms symbol table, which means they may turn up in backtraces,
      e.g.,
      
        ...
        PC is at __efistub_memset+0x108/0x200
        LR is at fixup_init+0x3c/0x48
        ...
        [<ffffff8008328608>] __efistub_memset+0x108/0x200
        [<ffffff8008094dcc>] free_initmem+0x2c/0x40
        [<ffffff8008645198>] kernel_init+0x20/0xe0
        [<ffffff8008085cd0>] ret_from_fork+0x10/0x40
      
      The backtrace in question has nothing to do with the EFI stub, but
      simply returns one of the several aliases of memset() that have been
      recorded in the kallsyms table. This is undesirable, since it may
      suggest to people who are not aware of this that the issue they are
      seeing is somehow EFI related.
      
      So hide the __efistub_ aliases from kallsyms by explicitly emitting them
      as absolute linker symbols. The distinction between absolute and
      section-relative symbols is completely irrelevant to these definitions
      and to the final link in which they are taken into account (it only
      matters for symbols defined inside a section definition when performing
      a partial link), so the resulting values are identical to the original
      ones. Since kallsyms ignores absolute symbols, these aliases will be
      omitted from its symbol table.
      
      After this patch, the backtrace generated from the same address looks
      like this:
        ...
        PC is at __memset+0x108/0x200
        LR is at fixup_init+0x3c/0x48
        ...
        [<ffffff8008328608>] __memset+0x108/0x200
        [<ffffff8008094dcc>] free_initmem+0x2c/0x40
        [<ffffff8008645198>] kernel_init+0x20/0xe0
        [<ffffff8008085cd0>] ret_from_fork+0x10/0x40
      
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • powerpc/mm: Allow user space to map rtas_rmo_buf · e256caa7
      Vasant Hegde authored
      
      
      Since commit 90a545e9 ("restrict /dev/mem to idle io memory ranges"),
      mapping rtas_rmo_buf from user space fails, so we are no longer able to
      make the RTAS syscall.
      
      This patch calls page_is_rtas_user_buf() before calling
      iomem_is_exclusive() in devmem_is_allowed(). This allows user space to
      map rtas_rmo_buf again, so the RTAS syscall works.
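
      A sketch of the resulting check order in devmem_is_allowed() (simplified;
      the surrounding logic is elided):

        int devmem_is_allowed(unsigned long pfn)
        {
                if (page_is_rtas_user_buf(pfn))
                        return 1;       /* allow the RTAS buffer first */
                if (iomem_is_exclusive(PFN_PHYS(pfn)))
                        return 0;
                if (!page_is_ram(pfn))
                        return 1;
                return 0;
        }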
      
      Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>