  1. Dec 19, 2016
  2. Dec 14, 2016
    • ARCv2: intc: default all interrupts to priority 1 · 107177b1
      Vineet Gupta authored
      
      
      ARC HS cores support up to 16 configurable interrupt priority levels.
      In commit dec2b284 ("ARCv2: intc: Allow interruption by lowest
      priority interrupt") we switched to 15, which seems excessive:
      hardware implementing that many preemption levels AND running Linux
      should be rare. Two levels are likely to be far more common, so
      switch to 1 as the default priority level. This is the "lower"
      priority level, saving 0 for implementing NMI-style support.
      
      This scheme also works in systems with more than 2 priority levels.
      
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARCv2: entry: document intr disable in hard isr · 78833e79
      Vineet Gupta authored
      
      
      While at it, use the proper assembler macro, which already includes
      the optional irq tracing, de-uglifying the code a bit.
      
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  3. Dec 13, 2016
  4. Nov 30, 2016
  5. Nov 29, 2016
  6. Nov 28, 2016
  7. Nov 25, 2016
    • parisc: Also flush data TLB in flush_icache_page_asm · 5035b230
      John David Anglin authored
      
      
      This is the second issue I noticed in reviewing the parisc TLB code.
      
      The fic instruction may use either the instruction or data TLB in
      flushing the instruction cache.  Thus, on machines with a split TLB, we
      should also flush the data TLB after setting up the temporary alias
      registers.
      
      Although this has no functional impact, I changed the pdtlb and pitlb
      instructions to consistently use the index register %r0.  These
      instructions do not support integer displacements.
      
      Tested on rp3440 and c8000.
      
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: <stable@vger.kernel.org> # v3.16+
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Fix race in pci-dma.c · c0452fb9
      John David Anglin authored
      
      
      We are still troubled by occasional random segmentation faults and
      memory corruption on SMP machines.  This causes quite a few package
      builds to fail on the Debian buildd machines for parisc.  When gcc-6
      failed to build three times in a row, I looked again at the TLB
      related code and found a couple of issues.  This is the first.
      
      In general, we need to ensure page table updates and corresponding TLB
      purges are atomic.  The attached patch fixes an instance in pci-dma.c
      where the page table update was not guarded by the TLB lock.
      
      Tested on rp3440 and c8000.  So far, no further random segmentation
      faults have been observed.
      
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: <stable@vger.kernel.org> # v3.16+
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Switch to generic sched_clock implementation · 43b1f6ab
      Helge Deller authored
      
      
      Drop the open-coded sched_clock() function and replace it with the
      generic GENERIC_SCHED_CLOCK implementation.  We have seen quite a few
      hung tasks in the past, which seem to be fixed by this patch.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v4.7+
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Fix races in parisc_setup_cache_timing() · 741dc7bf
      John David Anglin authored
      
      
      Helge reported to me the following startup crash:
      
      [    0.000000] Linux version 4.8.0-1-parisc64-smp (debian-kernel@lists.debian.org) (gcc version 5.4.1 20161019 (GCC) ) #1 SMP Debian 4.8.7-1 (2016-11-13)
      [    0.000000] The 64-bit Kernel has started...
      [    0.000000] Kernel default page size is 4 KB. Huge pages enabled with 1 MB physical and 2 MB virtual size.
      [    0.000000] Determining PDC firmware type: System Map.
      [    0.000000] model 9000/785/J5000
      [    0.000000] Total Memory: 2048 MB
      [    0.000000] Memory: 2018528K/2097152K available (9272K kernel code, 3053K rwdata, 1319K rodata, 1024K init, 840K bss, 78624K reserved, 0K cma-reserved)
      [    0.000000] virtual kernel memory layout:
      [    0.000000]     vmalloc : 0x0000000000008000 - 0x000000003f000000   (1007 MB)
      [    0.000000]     memory  : 0x0000000040000000 - 0x00000000c0000000   (2048 MB)
      [    0.000000]       .init : 0x0000000040100000 - 0x0000000040200000   (1024 kB)
      [    0.000000]       .data : 0x0000000040b0e000 - 0x0000000040f533e0   (4372 kB)
      [    0.000000]       .text : 0x0000000040200000 - 0x0000000040b0e000   (9272 kB)
      [    0.768910] Brought up 1 CPUs
      [    0.992465] NET: Registered protocol family 16
      [    2.429981] Releasing cpu 1 now, hpa=fffffffffffa2000
      [    2.635751] CPU(s): 2 out of 2 PA8500 (PCX-W) at 440.000000 MHz online
      [    2.726692] Setting cache flush threshold to 1024 kB
      [    2.729932] Not-handled unaligned insn 0x43ffff80
      [    2.798114] Setting TLB flush threshold to 140 kB
      [    2.928039] Unaligned handler failed, ret = -1
      [    3.000419]       _______________________________
      [    3.000419]      < Your System ate a SPARC! Gah! >
      [    3.000419]       -------------------------------
      [    3.000419]              \   ^__^
      [    3.000419]                  (__)\       )\/\
      [    3.000419]                   U  ||----w |
      [    3.000419]                      ||     ||
      [    9.340055] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-1-parisc64-smp #1 Debian 4.8.7-1
      [    9.448082] task: 00000000bfd48060 task.stack: 00000000bfd50000
      [    9.528040]
      [   10.760029] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004025d154 000000004025d158
      [   10.868052]  IIR: 43ffff80    ISR: 0000000000340000  IOR: 000001ff54150960
      [   10.960029]  CPU:        1   CR30: 00000000bfd50000 CR31: 0000000011111111
      [   11.052057]  ORIG_R28: 000000004021e3b4
      [   11.100045]  IAOQ[0]: irq_exit+0x94/0x120
      [   11.152062]  IAOQ[1]: irq_exit+0x98/0x120
      [   11.208031]  RP(r2): irq_exit+0xb8/0x120
      [   11.256074] Backtrace:
      [   11.288067]  [<00000000402cd944>] cpu_startup_entry+0x1e4/0x598
      [   11.368058]  [<0000000040109528>] smp_callin+0x2c0/0x2f0
      [   11.436308]  [<00000000402b53fc>] update_curr+0x18c/0x2d0
      [   11.508055]  [<00000000402b73b8>] dequeue_entity+0x2c0/0x1030
      [   11.584040]  [<00000000402b3cc0>] set_next_entity+0x80/0xd30
      [   11.660069]  [<00000000402c1594>] pick_next_task_fair+0x614/0x720
      [   11.740085]  [<000000004020dd34>] __schedule+0x394/0xa60
      [   11.808054]  [<000000004020e488>] schedule+0x88/0x118
      [   11.876039]  [<0000000040283d3c>] rescuer_thread+0x4d4/0x5b0
      [   11.948090]  [<000000004028fc4c>] kthread+0x1ec/0x248
      [   12.016053]  [<0000000040205020>] end_fault_vector+0x20/0xc0
      [   12.092239]  [<00000000402050c0>] _switch_to_ret+0x0/0xf40
      [   12.164044]
      [   12.184036] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-1-parisc64-smp #1 Debian 4.8.7-1
      [   12.244040] Backtrace:
      [   12.244040]  [<000000004021c480>] show_stack+0x68/0x80
      [   12.244040]  [<00000000406f332c>] dump_stack+0xec/0x168
      [   12.244040]  [<000000004021c74c>] die_if_kernel+0x25c/0x430
      [   12.244040]  [<000000004022d320>] handle_unaligned+0xb48/0xb50
      [   12.244040]
      [   12.632066] ---[ end trace 9ca05a7215c7bbb2 ]---
      [   12.692036] Kernel panic - not syncing: Attempted to kill the idle task!
      
      We have the insn 0x43ffff80 in IIR but from IAOQ we should have:
         4025d150:   0f f3 20 df     ldd,s r19(r31),r31
         4025d154:   0f 9f 00 9c     ldw r31(ret0),ret0
         4025d158:   bf 80 20 58     cmpb,*<> r0,ret0,4025d18c <irq_exit+0xcc>
      
      Cpu0 has just completed running parisc_setup_cache_timing:
      
      [    2.429981] Releasing cpu 1 now, hpa=fffffffffffa2000
      [    2.635751] CPU(s): 2 out of 2 PA8500 (PCX-W) at 440.000000 MHz online
      [    2.726692] Setting cache flush threshold to 1024 kB
      [    2.729932] Not-handled unaligned insn 0x43ffff80
      [    2.798114] Setting TLB flush threshold to 140 kB
      [    2.928039] Unaligned handler failed, ret = -1
      
      From the backtrace, cpu1 is in smp_callin:
      
      void __init smp_callin(void)
      {
             int slave_id = cpu_now_booting;
      
             smp_cpu_init(slave_id);
             preempt_disable();
      
             flush_cache_all_local(); /* start with known state */
             flush_tlb_all_local(NULL);
      
             local_irq_enable();  /* Interrupts have been off until now */
      
             cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
      
      So, it has just flushed its caches and the TLB. It would seem either the
      flushes in parisc_setup_cache_timing or smp_callin have corrupted kernel
      memory.
      
      The attached patch reworks parisc_setup_cache_timing to remove the races
      in setting the cache and TLB flush thresholds. It also corrects the
      number of bytes flushed in the TLB calculation.
      
      The patch flushes the cache and TLB on cpu0 before starting the
      secondary processors so that they are started from a known state.
      
      Tested with a few reboots on c8000.
      
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: <stable@vger.kernel.org> # v3.18+
      Signed-off-by: Helge Deller <deller@gmx.de>
    • MIPS: mm: Fix output of __do_page_fault · 2a872a5d
      Matt Redfearn authored
      
      
      Since commit 4bcc595c ("printk: reinstate KERN_CONT for printing
      continuation lines") the output from __do_page_fault on MIPS has been
      pretty unreadable due to the lack of KERN_CONT markers. Use pr_cont
      to provide the appropriate markers & restore the expected output.
      
      Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/14544/
      
      
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • powerpc/mm: Fixup kernel read only mapping · 984d7a1e
      Aneesh Kumar K.V authored
      With commit e58e87ad ("powerpc/mm: Update _PAGE_KERNEL_RO") we
      started using the ppp value 0b110 to map kernel read-only. But that
      facility was only added as part of ISA 2.04; for earlier ISA versions
      the only supported ppp bit value for a read-only mapping is 0b011.
      (This implies both user and kernel get mapped using the same ppp bit
      value for read-only mappings.)
      
      Update the code so that for earlier architecture versions we use ppp
      value 0b011 for read-only mappings. We don't differentiate between
      power5+ and power5 here and apply the new ppp bits only from power6
      (ISA 2.05) onward. This keeps the changes minimal.
      
      This fixes issue with PS3 spu usage reported at
      https://lkml.kernel.org/r/rep.1421449714.geoff@infradead.org
      
      
      
      Fixes: e58e87ad ("powerpc/mm: Update _PAGE_KERNEL_RO")
      Cc: stable@vger.kernel.org # v4.7+
      Tested-by: Geoff Levand <geoff@infradead.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. Nov 24, 2016
    • KVM: x86: check for pic and ioapic presence before use · df492896
      Radim Krčmář authored
      
      
      Split irqchip allows pic and ioapic routes to be used without them
      being created, which results in a NULL access.  Check for NULL and
      avoid it.  (The setup is too racy for a nicer solution.)
      
      Found by syzkaller:
      
        general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in:
        CPU: 3 PID: 11923 Comm: kworker/3:2 Not tainted 4.9.0-rc5+ #27
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Workqueue: events irqfd_inject
        task: ffff88006a06c7c0 task.stack: ffff880068638000
        RIP: 0010:[...]  [...] __lock_acquire+0xb35/0x3380 kernel/locking/lockdep.c:3221
        RSP: 0000:ffff88006863ea20  EFLAGS: 00010006
        RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000
        RDX: 0000000000000039 RSI: 0000000000000000 RDI: 1ffff1000d0c7d9e
        RBP: ffff88006863ef58 R08: 0000000000000001 R09: 0000000000000000
        R10: 00000000000001c8 R11: 0000000000000000 R12: ffff88006a06c7c0
        R13: 0000000000000001 R14: ffffffff8baab1a0 R15: 0000000000000001
        FS:  0000000000000000(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000004abdd0 CR3: 000000003e2f2000 CR4: 00000000000026e0
        Stack:
         ffffffff894d0098 1ffff1000d0c7d56 ffff88006863ecd0 dffffc0000000000
         ffff88006a06c7c0 0000000000000000 ffff88006863ecf8 0000000000000082
         0000000000000000 ffffffff815dd7c1 ffffffff00000000 ffffffff00000000
        Call Trace:
         [...] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
         [...] __raw_spin_lock include/linux/spinlock_api_smp.h:144
         [...] _raw_spin_lock+0x38/0x50 kernel/locking/spinlock.c:151
         [...] spin_lock include/linux/spinlock.h:302
         [...] kvm_ioapic_set_irq+0x4c/0x100 arch/x86/kvm/ioapic.c:379
         [...] kvm_set_ioapic_irq+0x8f/0xc0 arch/x86/kvm/irq_comm.c:52
         [...] kvm_set_irq+0x239/0x640 arch/x86/kvm/../../../virt/kvm/irqchip.c:101
         [...] irqfd_inject+0xb4/0x150 arch/x86/kvm/../../../virt/kvm/eventfd.c:60
         [...] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096
         [...] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230
         [...] kthread+0x328/0x3e0 kernel/kthread.c:209
         [...] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
      
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 49df6397 ("KVM: x86: Split the APIC from the rest of IRQCHIP.")
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: fix out-of-bounds accesses of rtc_eoi map · 81cdb259
      Radim Krčmář authored
      
      
      KVM was using arrays of size KVM_MAX_VCPUS indexed by vcpu_id, but
      the ID can be bigger than the maximal number of VCPUs, resulting in
      out-of-bounds accesses.
      
      Found by syzkaller:
      
        BUG: KASAN: slab-out-of-bounds in __apic_accept_irq+0xb33/0xb50 at addr [...]
        Write of size 1 by task a.out/27101
        CPU: 1 PID: 27101 Comm: a.out Not tainted 4.9.0-rc5+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __apic_accept_irq+0xb33/0xb50 arch/x86/kvm/lapic.c:905
         [...] kvm_apic_set_irq+0x10e/0x180 arch/x86/kvm/lapic.c:495
         [...] kvm_irq_delivery_to_apic+0x732/0xc10 arch/x86/kvm/irq_comm.c:86
         [...] ioapic_service+0x41d/0x760 arch/x86/kvm/ioapic.c:360
         [...] ioapic_set_irq+0x275/0x6c0 arch/x86/kvm/ioapic.c:222
         [...] kvm_ioapic_inject_all arch/x86/kvm/ioapic.c:235
         [...] kvm_set_ioapic+0x223/0x310 arch/x86/kvm/ioapic.c:670
         [...] kvm_vm_ioctl_set_irqchip arch/x86/kvm/x86.c:3668
         [...] kvm_arch_vm_ioctl+0x1a08/0x23c0 arch/x86/kvm/x86.c:3999
         [...] kvm_vm_ioctl+0x1fa/0x1a70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099
      
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: af1bae54 ("KVM: x86: bump KVM_MAX_VCPU_ID to 1023")
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: drop error recovery in em_jmp_far and em_ret_far · 2117d539
      Radim Krčmář authored
      
      
      em_jmp_far and em_ret_far assumed that setting IP can only fail in
      64-bit mode, but syzkaller proved otherwise (and the SDM agrees).
      The code segment was restored upon failure, but it was left
      uninitialized outside of long mode, which could lead to a leak of
      host kernel stack.  We could have fixed that by always saving and
      restoring the CS, but we take a simpler approach and just break any
      guest that manages to fail, as the error recovery is error-prone and
      modern CPUs don't need the emulator for this.
      
      Found by syzkaller:
      
        WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
        Kernel panic - not syncing: panic_on_warn set ...
      
        CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __dump_stack lib/dump_stack.c:15
         [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
         [...] panic+0x1b7/0x3a3 kernel/panic.c:179
         [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
         [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
         [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
         [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
         [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
         [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
         [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
         [...] complete_emulated_io arch/x86/kvm/x86.c:6870
         [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
         [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
         [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
         [...] vfs_ioctl fs/ioctl.c:43
         [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
         [...] SYSC_ioctl fs/ioctl.c:694
         [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
         [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: d1442d85 ("KVM: x86: Handle errors when RIP is set during far jumps")
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: fix out-of-bounds access in lapic · 444fdad8
      Radim Krčmář authored
      
      
      Cluster xAPIC delivery incorrectly assumed that dest_id <= 0xff.
      With KVM_X2APIC_API_USE_32BIT_IDS enabled in KVM_CAP_X2APIC_API,
      userspace can send an interrupt with a dest_id that results in an
      out-of-bounds access.
      
      Found by syzkaller:
      
        BUG: KASAN: slab-out-of-bounds in kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 at addr ffff88003d9ca750
        Read of size 8 by task syz-executor/22923
        CPU: 0 PID: 22923 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __dump_stack lib/dump_stack.c:15
         [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
         [...] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
         [...] print_address_description mm/kasan/report.c:194
         [...] kasan_report_error mm/kasan/report.c:283
         [...] kasan_report+0x231/0x500 mm/kasan/report.c:303
         [...] __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:329
         [...] kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 arch/x86/kvm/lapic.c:824
         [...] kvm_irq_delivery_to_apic+0x132/0x9a0 arch/x86/kvm/irq_comm.c:72
         [...] kvm_set_msi+0x111/0x160 arch/x86/kvm/irq_comm.c:157
         [...] kvm_send_userspace_msi+0x201/0x280 arch/x86/kvm/../../../virt/kvm/irqchip.c:74
         [...] kvm_vm_ioctl+0xba5/0x1670 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3015
         [...] vfs_ioctl fs/ioctl.c:43
         [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
         [...] SYSC_ioctl fs/ioctl.c:694
         [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
         [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: e45115b6 ("KVM: x86: use physical LAPIC array for logical x2APIC")
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • MIPS: Mask out limit field when calculating wired entry count · 10313980
      Paul Burton authored
      
      
      Since MIPSr6 the Wired register is split into 2 fields, with the upper
      16 bits of the register indicating a limit on the value that the wired
      entry count in the bottom 16 bits of the register can take. This means
      that simply reading the wired register doesn't get us a valid TLB entry
      index any longer, and we instead need to retrieve only the lower 16 bits
      of the register. Introduce a new num_wired_entries() function which does
      this on MIPSr6 or higher and simply returns the value of the wired
      register on older architecture revisions, and make use of it when
      reading the number of wired entries.
      
      Since commit e710d666 ("MIPS: tlb-r4k: If there are wired entries,
      don't use TLBINVF") we have been using a non-zero number of wired
      entries to determine whether we should avoid use of the tlbinvf
      instruction (which would invalidate wired entries) and instead loop over
      TLB entries in local_flush_tlb_all(). This loop begins with the number
      of wired entries, or before this patch some large bogus TLB index on
      MIPSr6 systems. Thus since the aforementioned commit some MIPSr6 systems
      with FTLBs have been prone to leaving stale address translations in the
      FTLB & crashing in various weird & wonderful ways when we later observe
      the wrong memory.
      
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/14557/
      
      
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • powerpc/boot: Fix the early OPAL console wrappers · a1ff5741
      Oliver O'Halloran authored
      
      
      When configured with CONFIG_PPC_EARLY_DEBUG_OPAL=y the kernel expects
      the OPAL entry and base addresses to be passed in r8 and r9
      respectively. Currently the wrapper does not attempt to restore these
      values before entering the decompressed kernel which causes the kernel
      to branch into whatever happens to be in r9 when doing a write to the
      OPAL console in early boot.
      
      This patch adds a platform_ops hook that can be used to branch into the
      new kernel. The OPAL console driver patches this at runtime so that if
      the console is used it will be restored just prior to entering the
      kernel.
      
      Fixes: 656ad58e ("powerpc/boot: Add OPAL console to epapr wrappers")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  9. Nov 23, 2016
    • tile: avoid using clocksource_cyc2ns with absolute cycle count · e658a6f1
      Chris Metcalf authored
      
      
      For large values of "mult" and long uptimes, the intermediate
      result of "cycles * mult" can overflow 64 bits.  For example,
      the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
      we have mult = 853, and after 208.5 days, we overflow 64 bits.
      
      Since clocksource_cyc2ns() is intended to be used for relative
      cycle counts, not absolute cycle counts, performance is more
      important than accepting a wider range of cycle values.  So,
      just use mult_frac() directly in tile's sched_clock().
      
      Commit 4cecf6d4 ("sched, x86: Avoid unnecessary overflow
      in sched_clock") by Salman Qazi results in essentially the same
      generated code for x86 as this change does for tile.  In fact,
      a follow-on change by Salman introduced mult_frac() and switched
      to using it, so the C code was largely identical at that point too.
      
      Peter Zijlstra then added mul_u64_u32_shr() and switched x86
      to use it.  This is, in principle, better; by optimizing the
      64x64->64 multiplies to be 32x32->64 multiplies we can potentially
      save some time.  However, the compiler pipelines the 64x64->64
      multiplies pretty well, and the conditional branch in the generic
      mul_u64_u32_shr() causes some bubbles in execution, with the
      result that it's pretty much a wash.  If tilegx provided its own
      implementation of mul_u64_u32_shr() without the conditional branch,
      we could potentially save 3 cycles, but that seems like small gain
      for a fair amount of additional build scaffolding; no other platform
      currently provides a mul_u64_u32_shr() override, and tile doesn't
      currently have an <asm/div64.h> header to put the override in.
      
      Additionally, gcc currently has an optimization bug that prevents
      it from recognizing the opportunity to use a 32x32->64 multiply,
      and so the result would be no better than the existing mult_frac()
      until such time as the compiler is fixed.
      
      For now, just using mult_frac() seems like the right answer.
      
      Cc: stable@kernel.org [v3.4+]
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
    • Revert "arm: move exports to definitions" · 8478132a
      Russell King authored
      
      
      This reverts commit 4dd1837d.
      
      Moving the exports for assembly code into the assembly files breaks
      KSYM trimming, but also breaks modversions.
      
      While fixing the KSYM trimming is trivial, fixing modversions brings
      us to a technically worse position than we had prior to the above
      change:
      
      - We end up with the prototype definitions divorced from everything
        else, which means that adding or removing assembly-level ksym
        exports becomes more fragile:
        * if adding a new assembly ksyms export, a missed prototype in
          asm-prototypes.h results in a successful build if no module in
          the selected configuration makes use of the symbol.
        * when removing a ksyms export, asm-prototypes.h can be forgotten;
          with armksyms.c, you'd get a build error if you forgot to touch
          the file.
      
      - We end up with the same amount of include files and prototypes,
        they're just in a header file instead of a .c file with their
        exports.
      
      As for lines of code, we don't get much of a size reduction:
       (original commit)
       47 files changed, 131 insertions(+), 208 deletions(-)
       (fix for ksyms trimming)
       7 files changed, 18 insertions(+), 5 deletions(-)
       (two fixes for modversions)
       1 file changed, 34 insertions(+)
       3 files changed, 7 insertions(+), 2 deletions(-)
      which results in a net total of only 25 lines deleted.
      
      As there does not seem to be much benefit from this change of approach,
      revert the change.
      
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
  10. Nov 22, 2016
    • 4345a64a
    • perf/x86/intel/uncore: Allow only a single PMU/box within an events group · 033ac60c
      Peter Zijlstra authored
      
      
      Group validation expects all events to be of the same PMU; however,
      is_uncore_pmu() is too wide: it matches _all_ uncore events, even
      across PMUs.
      
      This triggers failure when we group different events from different
      uncore PMUs, like:
      
        perf stat -vv -e '{uncore_cbox_0/config=0x0334/,uncore_qpi_0/event=1/}' -a sleep 1
      
      Fix is_uncore_pmu() by only matching events to the box at hand.
      
      Note that generic code, run after this step, will disallow this
      mixture of PMU events.
      
      Reported-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20161118125354.GQ3117@twins.programming.kicks-ass.net
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Cure bogus unwind from PEBS entries · b8000586
      Peter Zijlstra authored
      
      
      Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event
      unwinds sometimes do 'weird' things. In particular, we seemed to be
      ending up unwinding from random places on the NMI stack.
      
      While it was somewhat expected that the event record BP,SP would not
      match the interrupt BP,SP in that the interrupt is strictly later than
      the record event, it was overlooked that it could be on an already
      overwritten stack.
      
      Therefore, don't copy the recorded BP,SP over the interrupted BP,SP
      when we need stack unwinds.
      
      Note that it's still possible the unwind doesn't fully match the
      actual event, as it's entirely possible to have done an (I)RET
      between record and interrupt, but on average it should still point
      in the general direction of where the event came from.  Also, it's
      the best we can do, considering.
      
      The particular scenario that triggered the bogus NMI stack unwind was
      a PEBS event with a very short period: upon enabling the event at the
      tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly
      triggers a record (while still on the NMI stack), which in turn
      triggers the next PMI. This causes back-to-back NMIs, and we'll try to
      unwind the stack frame from the last NMI, which by then has been
      overwritten by our own.
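      The fix described above can be sketched as follows. This is an
      illustrative simplification, not the kernel's actual `pt_regs` handling:
      the sample always takes the precise IP from the PEBS record, but keeps
      the interrupted BP/SP whenever a stack unwind is needed, since the
      recorded stack may already be overwritten.

```c
#include <stdint.h>

/* Simplified register snapshot (illustrative, not the kernel's pt_regs). */
struct regs { uint64_t ip, bp, sp; };

/* Sketch of the fix: the precise IP always comes from the PEBS record,
 * but the recorded BP/SP are only trusted when no unwind is required --
 * otherwise keep the interrupted BP/SP, whose stack is still live. */
static void setup_sample_regs(const struct regs *iregs, const struct regs *pebs,
                              struct regs *out, int need_unwind)
{
    *out = *iregs;       /* start from the interrupt register state */
    out->ip = pebs->ip;  /* precise instruction pointer from the record */
    if (!need_unwind) {  /* recorded stack state is only safe to use here */
        out->bp = pebs->bp;
        out->sp = pebs->sp;
    }
}
```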
      
      Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davej@codemonkey.org.uk <davej@codemonkey.org.uk>
      Cc: dvyukov@google.com <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: ca037701 ("perf, x86: Add PEBS infrastructure")
      Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.net
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b8000586
    • Johannes Weiner's avatar
      perf/x86: Restore TASK_SIZE check on frame pointer · ae31fe51
      Johannes Weiner authored
      
      
      The following commit:
      
        75925e1a ("perf/x86: Optimize stack walk user accesses")
      
      ... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual
      access_ok() check.
      
      Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE,
      whereas the access_ok() uses whatever the current address limit of the task is.
      
      We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and
      then see vmalloc faults when we access what looks like pointers into vmalloc
      space:
      
        [] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290
        [] CPU: 3 PID: 3685731 Comm: sh Tainted: G        W       4.6.0-5_fbk1_223_gdbf0f40 #1
        [] Call Trace:
        []  <NMI>  [<ffffffff814717d1>] dump_stack+0x4d/0x6c
        []  [<ffffffff81076e43>] __warn+0xd3/0xf0
        []  [<ffffffff81076f2d>] warn_slowpath_null+0x1d/0x20
        []  [<ffffffff8104a899>] vmalloc_fault+0x289/0x290
        []  [<ffffffff8104b5a0>] __do_page_fault+0x330/0x490
        []  [<ffffffff8104b70c>] do_page_fault+0xc/0x10
        []  [<ffffffff81794e82>] page_fault+0x22/0x30
        []  [<ffffffff81006280>] ? perf_callchain_user+0x100/0x2a0
        []  [<ffffffff8115124f>] get_perf_callchain+0x17f/0x190
        []  [<ffffffff811512c7>] perf_callchain+0x67/0x80
        []  [<ffffffff8114e750>] perf_prepare_sample+0x2a0/0x370
        []  [<ffffffff8114e840>] perf_event_output+0x20/0x60
        []  [<ffffffff8114aee7>] ? perf_event_update_userpage+0xc7/0x130
        []  [<ffffffff8114ea01>] __perf_event_overflow+0x181/0x1d0
        []  [<ffffffff8114f484>] perf_event_overflow+0x14/0x20
        []  [<ffffffff8100a6e3>] intel_pmu_handle_irq+0x1d3/0x490
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  [<ffffffff81197191>] ? vunmap_page_range+0x1a1/0x2f0
        []  [<ffffffff811972f1>] ? unmap_kernel_range_noflush+0x11/0x20
        []  [<ffffffff814f2056>] ? ghes_copy_tofrom_phys+0x116/0x1f0
        []  [<ffffffff81040d1d>] ? x2apic_send_IPI_self+0x1d/0x20
        []  [<ffffffff8100411d>] perf_event_nmi_handler+0x2d/0x50
        []  [<ffffffff8101ea31>] nmi_handle+0x61/0x110
        []  [<ffffffff8101ef94>] default_do_nmi+0x44/0x110
        []  [<ffffffff8101f13b>] do_nmi+0xdb/0x150
        []  [<ffffffff81795187>] end_repeat_nmi+0x1a/0x1e
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  <<EOE>>  <IRQ>  [<ffffffff8115d05e>] ? __probe_kernel_read+0x3e/0xa0
      
      Fix this by moving the valid_user_frame() check to before the uaccess
      that loads the return address and the pointer to the next frame.
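      The idea of the fix can be sketched as below. The limit value and
      function shape are illustrative, not the exact kernel code: the frame
      pointer is validated against the fixed user-space limit (TASK_SIZE)
      before any access, rather than against the task's current address
      limit, which may have been raised to KERNEL_DS.

```c
#include <stdint.h>

/* Illustrative user-space limit; stands in for TASK_SIZE on x86-64. */
#define TASK_SIZE_SKETCH ((uint64_t)1 << 47)

/* Sketch of the valid_user_frame() idea: reject any frame whose byte
 * range is not entirely below the fixed user-space limit, independent
 * of the task's current addr_limit (which may be KERNEL_DS during an
 * NMI that interrupted __probe_kernel_read()). */
static int valid_user_frame(uint64_t fp, uint64_t size)
{
    return fp < TASK_SIZE_SKETCH && fp + size <= TASK_SIZE_SKETCH;
}
```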
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 75925e1a ("perf/x86: Optimize stack walk user accesses")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ae31fe51
    • Nicholas Piggin's avatar
      powerpc: Fix missing CRCs, add more asm-prototypes.h declarations · 9e5f6884
      Nicholas Piggin authored
      
      
      After commit 4efca4ed ("kbuild: modversions for EXPORT_SYMBOL() for asm"),
      asm exports can have modversions CRCs generated for them if they have
      C declarations in asm-prototypes.h. This patch adds the missing
      declarations for 32-bit and 64-bit allmodconfig builds.
      
      Fixes: 9445aa1a ("ppc: move exports to definitions")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9e5f6884
    • Benjamin Herrenschmidt's avatar
      powerpc: Set missing wakeup bit in LPCR on POWER9 · 7a43906f
      Benjamin Herrenschmidt authored
      
      
      There is a new bit, LPCR_PECE_HVEE (Hypervisor Virtualization Exit
      Enable), which controls wakeup from STOP states on Hypervisor
      Virtualization Interrupts (which happen to also be all external
      interrupts in host or bare metal mode).
      
      It needs to be set or we will miss wakeups.
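      The change amounts to OR-ing one more wakeup-enable bit into the LPCR
      value. A minimal sketch, with an assumed bit position -- the real
      LPCR_PECE_HVEE definition lives in arch/powerpc/include/asm/reg.h and
      its position is ISA-defined:

```c
#include <stdint.h>

/* Assumed bit position for illustration only; see the Power ISA for the
 * authoritative LPCR layout on POWER9. */
#define LPCR_PECE_HVEE (1ULL << 18)

/* Enable wakeup from STOP states on Hypervisor Virtualization
 * Interrupts by setting the corresponding PECE bit in LPCR. */
static uint64_t lpcr_enable_hvee(uint64_t lpcr)
{
    return lpcr | LPCR_PECE_HVEE;
}
```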
      
      Fixes: 9baaef0a ("powerpc/irq: Add support for HV virtualization interrupts")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Rename it to HVEE to match the name in the ISA]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7a43906f