Skip to content
  1. May 09, 2016
  2. May 06, 2016
    • Dmitry V. Levin's avatar
      parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls · f0b22d1b
      Dmitry V. Levin authored
      
      
      Do not load one entry beyond the end of the syscall table when the
      syscall number of a traced process equals to __NR_Linux_syscalls.
      Similar bug with regular processes was fixed by commit 3bb457af
      ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").
      
      This bug was found by strace test suite.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Acked-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      f0b22d1b
    • Chen Yu's avatar
      x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO · 886123fb
      Chen Yu authored
      
      
      Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f;
      
      Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM
      (35.5), the ratio bits are bit 8-15.
      
      Ignoring the upper bits can result in an incorrect tsc ratio, which causes the
      TSC calibration and the Local APIC timer frequency to be incorrect.
      
      Fix this problem by masking 0xff instead.
      
      [ tglx: Massaged changelog ]
      
      Fixes: 7da7c156 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs"
      Signed-off-by: default avatarChen Yu <yu.c.chen@intel.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: stable@vger.kernel.org
      Cc: Bin Gao <bin.gao@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.com
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      886123fb
    • Andrea Arcangeli's avatar
      mm: thp: kvm: fix memory corruption in KVM with THP enabled · 127393fb
      Andrea Arcangeli authored
      
      
      After the THP refcounting change, obtaining a compound pages from
      get_user_pages() no longer allows us to assume the entire compound page
      is immediately mappable from a secondary MMU.
      
      A secondary MMU doesn't want to call get_user_pages() more than once for
      each compound page, in order to know if it can map the whole compound
      page.  So a secondary MMU needs to know from a single get_user_pages()
      invocation when it can map immediately the entire compound page to avoid
      a flood of unnecessary secondary MMU faults and spurious
      atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
      users).
      
      Ideally instead of the page->_mapcount < 1 check, get_user_pages()
      should return the granularity of the "page" mapping in the "mm" passed
      to get_user_pages().  However it's non trivial change to pass the "pmd"
      status belonging to the "mm" walked by get_user_pages up the stack (up
      to the caller of get_user_pages).  So the fix just checks if there is
      not a single pte mapping on the page returned by get_user_pages, and in
      turn if the caller can assume that the whole compound page is mapped in
      the current "mm" (in a pmd_trans_huge()).  In such case the entire
      compound page is safe to map into the secondary MMU without additional
      get_user_pages() calls on the surrounding tail/head pages.  In addition
      of being faster, not having to run other get_user_pages() calls also
      reduces the memory footprint of the secondary MMU fault in case the pmd
      split happened as result of memory pressure.
      
      Without this fix after a MADV_DONTNEED (like invoked by QEMU during
      postcopy live migration or balloning) or after generic swapping (with a
      failure in split_huge_page() that would only result in pmd splitting and
      not a physical page split), KVM would map the whole compound page into
      the shadow pagetables, despite regular faults or userfaults (like
      UFFDIO_COPY) may map regular pages into the primary MMU as result of the
      pte faults, leading to the guest mode and userland mode going out of
      sync and not working on the same memory at all times.
      
      Any other secondary MMU notifier manager (KVM is just one of the many
      MMU notifier users) will need the same information if it doesn't want to
      run a flood of get_user_pages_fast and it can support multiple
      granularity in the secondary MMU mappings, so I think it is justified to
      be exposed not just to KVM.
      
      The other option would be to move transparent_hugepage_adjust to
      mm/huge_memory.c but that currently has all kind of KVM data structures
      in it, so it's definitely not a cut-and-paste work, so I couldn't do a
      fix as cleaner as this one for 4.6.
      
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: "Li, Liang Z" <liang.z.li@intel.com>
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      127393fb
  3. May 05, 2016
  4. May 04, 2016
    • Josh Boyer's avatar
      x86/efi-bgrt: Switch all pr_err() to pr_notice() for invalid BGRT · 7f9b474c
      Josh Boyer authored
      
      
      The promise of pretty boot splashes from firmware via BGRT was at
      best only that; a promise.  The kernel diligently checks to make
      sure the BGRT data firmware gives it is valid, and dutifully warns
      the user when it isn't.  However, it does so via the pr_err log
      level which seems unnecessary.  The user cannot do anything about
      this and there really isn't an error on the part of Linux to
      correct.
      
      This lowers the log level by using pr_notice instead.  Users will
      no longer have their boot process uglified by the kernel reminding
      us that firmware can and often is broken when the 'quiet' kernel
      parameter is specified.  Ironic, considering BGRT is supposed to
      make boot pretty to begin with.
      
      Signed-off-by: default avatarJosh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Môshe van der Sterre <me@moshe.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462303781-8686-4-git-send-email-matt@codeblueprint.co.uk
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7f9b474c
  5. May 02, 2016
    • Anton Blanchard's avatar
      powerpc: Fix bad inline asm constraint in create_zero_mask() · b4c11211
      Anton Blanchard authored
      
      
      In create_zero_mask() we have:
      
      	addi	%1,%2,-1
      	andc	%1,%1,%2
      	popcntd	%0,%1
      
      using the "r" constraint for %2. r0 is a valid register in the "r" set,
      but addi X,r0,X turns it into an li:
      
      	li	r7,-1
      	andc	r7,r7,r0
      	popcntd	r4,r7
      
      Fix this by using the "b" constraint, for which r0 is not a valid
      register.
      
      This was found with a kernel build using gcc trunk, narrowed down to
      when -frename-registers was enabled at -O2. It is just luck however
      that we aren't seeing this on older toolchains.
      
      Thanks to Segher for working with me to find this issue.
      
      Cc: stable@vger.kernel.org
      Fixes: d0cebfa6 ("powerpc: word-at-a-time optimization for 64-bit Little Endian")
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      b4c11211
  6. Apr 29, 2016
    • Arnd Bergmann's avatar
      ARM: davinci: only use NVMEM when available · 04b9665b
      Arnd Bergmann authored
      
      
      The davinci platform contains code that calls into the nvmem
      subsystem, but that might be a loadable module, causing a
      link error:
      
      arch/arm/mach-davinci/built-in.o: In function `davinci_get_mac_addr':
      :(.text+0x1088): undefined reference to `nvmem_device_read'
      arch/arm/mach-davinci/built-in.o: In function `read_factory_config':
      :(.text+0x214c): undefined reference to `nvmem_device_read'
      
      Also, when NVMEM is completely disabled, the functions fail with
      nonobvious error messages.
      
      This ensures we only call the API functions when the code is actually
      reachable from the board file, and otherwise prints a unique log
      message.
      
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: bec3c11b ("misc: at24: replace memory_accessor with nvmem_device_read")
      Signed-off-by: default avatarSekhar Nori <nsekhar@ti.com>
      Signed-off-by: default avatarKevin Hilman <khilman@baylibre.com>
      04b9665b
  7. Apr 28, 2016
    • Kan Liang's avatar
      perf/x86/intel: Fix incorrect lbr_sel_mask value · cf3beb7c
      Kan Liang authored
      
      
      This patch fixes a bug which was introduced by:
      
       b16a5b52 ("perf/x86: Add option to disable reading branch flags/cycles")
      
      In this patch, lbr_sel_mask is used to mask the lbr_select. But LBR_SEL_MASK
      doesn't include the bit for LBR_CALL_STACK. So LBR call stack will never be
      set in lbr_select.
      
      This patch corrects the LBR_SEL_MASK by including all valid bits in
      LBR_SELECT. Also, the LBR_CALL_STACK bit is different as other bit in
      LBR_SELECT. It does not operate in suppress mode, so it needs to be
      specially handled in intel_pmu_setup_hw_lbr_filter.
      
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cf3beb7c
    • Alexander Shishkin's avatar
      perf/x86/intel/pt: Don't die on VMXON · 1c5ac21a
      Alexander Shishkin authored
      
      
      Some versions of Intel PT do not support tracing across VMXON, more
      specifically, VMXON will clear TraceEn control bit and any attempt to
      set it before VMXOFF will throw a #GP, which in the current state of
      things will crash the kernel. Namely:
      
        $ perf record -e intel_pt// kvm -nographic
      
      on such a machine will kill it.
      
      To avoid this, notify the intel_pt driver before VMXON and after
      VMXOFF so that it knows when not to enable itself.
      
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1c5ac21a
    • Adam Borowski's avatar
      perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX · 0a25556f
      Adam Borowski authored
      
      
      The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is
      referenced by filter_events() which expects undefined events to have a
      value of 0.
      
      Found via KASAN:
      
        UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
        index 9 is out of range for type 'u64 [9]'
        UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
        load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64'
      
      Signed-off-by: default avatarAdam Borowski <kilobyte@angband.pl>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0a25556f
    • Keith Busch's avatar
      x86/apic: Handle zero vector gracefully in clear_vector_irq() · 1bdb8970
      Keith Busch authored
      
      
      If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
      the already allocated vectors. This subsequently calls clear_vector_irq().
      
      The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
      clear_vector_irq().
      
      We cannot suppress the call to x86_vector_free_irqs() for the failed
      interrupt, because the other data related to this irq must be cleaned up as
      well. So calling clear_vector_irq() with vector == 0 is legitimate.
      
      Remove the BUG_ON and return if vector is zero,
      
      [ tglx: Massaged changelog ]
      
      Fixes: b5dc8e6c "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      1bdb8970
  8. Apr 27, 2016
    • Sascha Hauer's avatar
      ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel · 5616f367
      Sascha Hauer authored
      
      
      The secondary CPU starts up in ARM mode. When the kernel is compiled in
      thumb2 mode we have to explicitly compile the secondary startup
      trampoline in ARM mode, otherwise the CPU will go to Nirvana.
      
      Signed-off-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Reported-by: default avatarSteffen Trumtrar <s.trumtrar@pengutronix.de>
      Suggested-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDinh Nguyen <dinguyen@opensource.altera.com>
      Signed-off-by: default avatarKevin Hilman <khilman@baylibre.com>
      5616f367
    • David S. Miller's avatar
      sparc64: Fix bootup regressions on some Kconfig combinations. · 49fa5230
      David S. Miller authored
      
      
      The system call tracing bug fix mentioned in the Fixes tag
      below increased the amount of assembler code in the sequence
      of assembler files included by head_64.S
      
      This caused to total set of code to exceed 0x4000 bytes in
      size, which overflows the expression in head_64.S that works
      to place swapper_tsb at address 0x408000.
      
      When this is violated, the TSB is not properly aligned, and
      also the trap table is not aligned properly either.  All of
      this together results in failed boots.
      
      So, do two things:
      
      1) Simplify some code by using ba,a instead of ba/nop to get
         those bytes back.
      
      2) Add a linker script assertion to make sure that if this
         happens again the build will fail.
      
      Fixes: 1a40b953 ("sparc: Fix system call tracing register handling.")
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Reported-by: default avatarJoerg Abraham <joerg.abraham@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      49fa5230
    • Alexey Brodkin's avatar
      ARC: add support for reserved memory defined by device tree · 1b10cb21
      Alexey Brodkin authored
      
      
      Enable reserved memory initialization from device tree.
      
      Signed-off-by: default avatarAlexey Brodkin <abrodkin@synopsys.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      1b10cb21
    • Alexey Brodkin's avatar
      ARC: support generic per-device coherent dma mem · 32ed9a0e
      Alexey Brodkin authored
      
      
      Signed-off-by: default avatarAlexey Brodkin <abrodkin@synopsys.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      32ed9a0e
    • Romain Perier's avatar
      nios2: memset: use the right constraint modifier for the %4 output operand · a8950e49
      Romain Perier authored
      Depending on the size of the area to be memset'ed, the nios2 memset implementation
      either uses a naive loop (for buffers smaller or equal than 8 bytes) or a more optimized
      implementation (for buffers larger than 8 bytes). This implementation does 4-byte stores
      rather than 1-byte stores to speed up memset.
      
      However, we discovered that on our nios2 platform, memset() was not properly setting the
      buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to:
      
      0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ...
      
      Which is obviously incorrect. Our investigation has revealed that the problem lies in the
      incorrect constraints used in the inline assembly.
      
      The following piece of assembly, from the nios2 memset implementation, is supposed to
      create a 4-byte value that repeats 4 times the 1-byte pattern passed as memset argument:
      
      /* fill8 %3, %5 (c & 0xff) */
      "       slli    %4, %5, 8\n"
      "       or      %4, %4, %5\n"
      "       slli    %3, %4, 16\n"
      "       or      %3, %3, %4\n"
      
      However, depending on the compiler and optimization level, this code might be compiled as:
      
      34:	280a923a 	slli	r5,r5,8
      38:	294ab03a 	or	r5,r5,r5
      3c:	2808943a 	slli	r4,r5,16
      40:	2148b03a 	or	r4,r4,r5
      
      This is wrong because r5 gets used both for %5 and %4, which leads to the final pattern
      stored in r4 to be 0xff00ff00 rather than the expected 0xffffffff.
      
      %4 is defined with the "=r" constraint, i.e as an output operand. However, as explained in
      http://www.ethernut.de/en/documents/arm-inline-asm.html
      
      , this does not prevent gcc from
      using the same register for an output operand (%4) and input operand (%5). By using the
      constraint modifier '&', we indicate that the register should be used for output only. With this
      change, we get the following assembly output:
      
      34:	2810923a 	slli	r8,r5,8
      38:	4150b03a 	or	r8,r8,r5
      3c:	400e943a 	slli	r7,r8,16
      40:	3a0eb03a 	or	r7,r7,r8
      
      Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern.
      
      It is worth mentioning the observed consequence of this bug: we were hitting the kernel
      BUG() in mm/bootmem.c:__free() that verifies when marking a page as free that it was
      previously marked as occupied (i.e that the bit was set to 1). The entire bootmem bitmap is
      set to 0xff bit via a memset() during the bootmem initialization. The bootmem_free() call right
      after the initialization was finding some bits to be set to 0, which didn't make sense since the
      bitmap has just been memset'ed to 0xff. Except that due to the bug explained above, the
      bitmap was in fact initialized to 0xff00ff00.
      
      Thanks to Marek Vasut for his help and feedback.
      
      Signed-off-by: default avatarRomain Perier <romain.perier@free-electrons.com>
      Acked-by: default avatarMarek Vasut <marex@denx.de>
      Acked-by: default avatarLey Foon Tan <lftan@altera.com>
      a8950e49
    • Rui Salvaterra's avatar
      powerpc: wire up preadv2 and pwritev2 syscalls · d701cca6
      Rui Salvaterra authored
      
      
      Wire up preadv2/pwritev2 in the same way as preadv/pwritev. Fixes two
      build warnings on ppc64.
      
      mpe: Lightly tested with fio (slightly hacked to add the syscall
      wrappers):
      
        fio-4217  [009] ....  1304.635300: sys_preadv2(fd: 3, vec:
        10025821de0, vlen: 1, pos_l: 6253000, pos_h: 0, flags: 1)
        fio-4217  [009] ....  1304.635474: sys_preadv2 -> 0x1000
      
      Signed-off-by: default avatarRui Salvaterra <rsalvaterra@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      d701cca6
  9. Apr 26, 2016
Loading