Skip to content
Snippets Groups Projects
  1. May 25, 2018
  2. Apr 30, 2018
    • Joel Stanley's avatar
      ARM: dts: aspeed: Describe random number device · 5daa8212
      Joel Stanley authored
      
      There is a random number generator that updates a register in the SCU
      every second. This is compatible with the timeriomem rng driver in the
      kernel.
      
      From the timeriomem_rng bindings:
      
        quality: estimated number of bits of true entropy per 1024 bits read
        from the rng.  Defaults to zero which causes the kernel's default
        quality to be used instead.  Note that the default quality is usually
        zero which disables using this rng to automatically fill the kernel's
        entropy pool.
      
      As to the recommended value for us to use:
      
       Rick Altherr <raltherr@google.com> wrote:
       > Quality is #bit of entropy per 1000 bits read.  100 is a
       > conservative value that was suggested by those in the know.
      
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      5daa8212
  3. Apr 21, 2018
  4. Apr 20, 2018
  5. Apr 19, 2018
    • Michael Ellerman's avatar
      powerpc/kvm: Fix lockups when running KVM guests on Power8 · 56376c58
      Michael Ellerman authored
      
      When running KVM guests on Power8 we can see a lockup where one CPU
      stops responding. This often leads to a message such as:
      
        watchdog: CPU 136 detected hard LOCKUP on other CPUs 72
        Task dump for CPU 72:
        qemu-system-ppc R  running task    10560 20917  20908 0x00040004
      
      And then backtraces on other CPUs, such as:
      
        Task dump for CPU 48:
        ksmd            R  running task    10032  1519      2 0x00000804
        Call Trace:
          ...
          --- interrupt: 901 at smp_call_function_many+0x3c8/0x460
              LR = smp_call_function_many+0x37c/0x460
          pmdp_invalidate+0x100/0x1b0
          __split_huge_pmd+0x52c/0xdb0
          try_to_unmap_one+0x764/0x8b0
          rmap_walk_anon+0x15c/0x370
          try_to_unmap+0xb4/0x170
          split_huge_page_to_list+0x148/0xa30
          try_to_merge_one_page+0xc8/0x990
          try_to_merge_with_ksm_page+0x74/0xf0
          ksm_scan_thread+0x10ec/0x1ac0
          kthread+0x160/0x1a0
          ret_from_kernel_thread+0x5c/0x78
      
      This is caused by commit 8c1c7fb0 ("powerpc/64s/idle: avoid sync
      for KVM state when waking from idle"), which added a check in
      pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is
      already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and
      test of kvm_hstate.hwthread_req.
      
      The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when
      entering the guest, so it can then come out to cede with
      KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after
      setting hwthread_req to 1, but because hwthread_state is still
      KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we
      wake up from idle and won't go to kvm_start_guest. From there the
      thread will return somewhere garbage and crash.
      
      Fix it by skipping the store of hwthread_state, but not the test of
      hwthread_req, when coming out of idle. It's OK to skip the sync in
      that case because hwthread_req will have been set on the same thread,
      so there is no synchronisation required.
      
      Fixes: 8c1c7fb0 ("powerpc/64s/idle: avoid sync for KVM state when waking from idle")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      56376c58
    • Michael Neuling's avatar
      powerpc/eeh: Fix enabling bridge MMIO windows · 13a83eac
      Michael Neuling authored
      
      On boot we save the configuration space of PCIe bridges. We do this so
      when we get an EEH event and everything gets reset that we can restore
      them.
      
      Unfortunately we save this state before we've enabled the MMIO space
      on the bridges. Hence if we have to reset the bridge when we come back
      MMIO is not enabled and we end up taking an PE freeze when the driver
      starts accessing again.
      
      This patch forces the memory/MMIO and bus mastering on when restoring
      bridges on EEH. Ideally we'd do this correctly by saving the
      configuration space writes later, but that will have to come later in
      a larger EEH rewrite. For now we have this simple fix.
      
      The original bug can be triggered on a boston machine by doing:
        echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
      On boston, this PHB has a PCIe switch on it.  Without this patch,
      you'll see two EEH events, 1 expected and 1 the failure we are fixing
      here. The second EEH event causes the anything under the PHB to
      disappear (i.e. the i40e eth).
      
      With this patch, only 1 EEH event occurs and devices properly recover.
      
      Fixes: 652defed ("powerpc/eeh: Check PCIe link after reset")
      Cc: stable@vger.kernel.org # v3.11+
      Reported-by: default avatarPridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Acked-by: default avatarRussell Currey <ruscur@russell.cc>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      13a83eac
  6. Apr 18, 2018
  7. Apr 17, 2018
    • Matt Redfearn's avatar
      MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup · daf70d89
      Matt Redfearn authored
      
      The __clear_user function is defined to return the number of bytes that
      could not be cleared. From the underlying memset / bzero implementation
      this means setting register a2 to that number on return. Currently if a
      page fault is triggered within the memset_partial block, the value
      loaded into a2 on return is meaningless.
      
      The label .Lpartial_fixup\@ is jumped to on page fault. In order to work
      out how many bytes failed to copy, the exception handler should find how
      many bytes left in the partial block (andi a2, STORMASK), add that to
      the partial block end address (a2), and subtract the faulting address to
      get the remainder. Currently it incorrectly subtracts the partial block
      start address (t1), which has additionally been clobbered to generate a
      jump target in memset_partial. Fix this by adding the block end address
      instead.
      
      This issue was found with the following test code:
            int j, k;
            for (j = 0; j < 512; j++) {
              if ((k = clear_user(NULL, j)) != j) {
                 pr_err("clear_user (NULL %d) returned %d\n", j, k);
              }
            }
      Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64).
      
      Suggested-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19108/
      
      
      Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      daf70d89
    • Mark Rutland's avatar
      arm64: kasan: avoid pfn_to_nid() before page array is initialized · 800cb2e5
      Mark Rutland authored
      
      In arm64's kasan_init(), we use pfn_to_nid() to find the NUMA node a
      span of memory is in, hoping to allocate shadow from the same NUMA node.
      However, at this point, the page array has not been initialized, and
      thus this is bogus.
      
      Since commit:
      
        f165b378 ("mm: uninitialized struct page poisoning sanity")
      
      ... accessing fields of the page array results in a boot time Oops(),
      highlighting this problem:
      
      [    0.000000] Unable to handle kernel paging request at virtual address dfff200000000000
      [    0.000000] Mem abort info:
      [    0.000000]   ESR = 0x96000004
      [    0.000000]   Exception class = DABT (current EL), IL = 32 bits
      [    0.000000]   SET = 0, FnV = 0
      [    0.000000]   EA = 0, S1PTW = 0
      [    0.000000] Data abort info:
      [    0.000000]   ISV = 0, ISS = 0x00000004
      [    0.000000]   CM = 0, WnR = 0
      [    0.000000] [dfff200000000000] address between user and kernel address ranges
      [    0.000000] Internal error: Oops: 96000004 [#1] PREEMPT SMP
      [    0.000000] Modules linked in:
      [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.16.0-07317-gf165b378bbdf #42
      [    0.000000] Hardware name: ARM Juno development board (r1) (DT)
      [    0.000000] pstate: 80000085 (Nzcv daIf -PAN -UAO)
      [    0.000000] pc : __asan_load8+0x8c/0xa8
      [    0.000000] lr : __dump_page+0x3c/0x3b8
      [    0.000000] sp : ffff2000099b7ca0
      [    0.000000] x29: ffff2000099b7ca0 x28: ffff20000a1762c0
      [    0.000000] x27: ffff7e0000000000 x26: ffff2000099dd000
      [    0.000000] x25: ffff200009a3f960 x24: ffff200008f9c38c
      [    0.000000] x23: ffff20000a9d3000 x22: ffff200009735430
      [    0.000000] x21: fffffffffffffffe x20: ffff7e0001e50420
      [    0.000000] x19: ffff7e0001e50400 x18: 0000000000001840
      [    0.000000] x17: ffffffffffff8270 x16: 0000000000001840
      [    0.000000] x15: 0000000000001920 x14: 0000000000000004
      [    0.000000] x13: 0000000000000000 x12: 0000000000000800
      [    0.000000] x11: 1ffff0012d0f89ff x10: ffff10012d0f89ff
      [    0.000000] x9 : 0000000000000000 x8 : ffff8009687c5000
      [    0.000000] x7 : 0000000000000000 x6 : ffff10000f282000
      [    0.000000] x5 : 0000000000000040 x4 : fffffffffffffffe
      [    0.000000] x3 : 0000000000000000 x2 : dfff200000000000
      [    0.000000] x1 : 0000000000000005 x0 : 0000000000000000
      [    0.000000] Process swapper (pid: 0, stack limit = 0x        (ptrval))
      [    0.000000] Call trace:
      [    0.000000]  __asan_load8+0x8c/0xa8
      [    0.000000]  __dump_page+0x3c/0x3b8
      [    0.000000]  dump_page+0xc/0x18
      [    0.000000]  kasan_init+0x2e8/0x5a8
      [    0.000000]  setup_arch+0x294/0x71c
      [    0.000000]  start_kernel+0xdc/0x500
      [    0.000000] Code: aa0403e0 9400063c 17ffffee d343fc00 (38e26800)
      [    0.000000] ---[ end trace 67064f0e9c0cc338 ]---
      [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
      [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
      
      Let's fix this by using early_pfn_to_nid(), as other architectures do in
      their kasan init code. Note that early_pfn_to_nid acquires the nid from
      the memblock array, which we iterate over in kasan_init(), so this
      should be fine.
      
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Fixes: 39d114dd ("arm64: add KASAN support")
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      800cb2e5
    • Joerg Roedel's avatar
      x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y · d6ef1f19
      Joerg Roedel authored
      
      The walk_pte_level() function just uses __va to get the virtual address of
      the PTE page, but that breaks when the PTE page is not in the direct
      mapping with HIGHPTE=y.
      
      The result is an unhandled kernel paging request at some random address
      when accessing the current_kernel or current_user file.
      
      Use the correct API to access PTE pages.
      
      Fixes: fe770bf0 ('x86: clean up the page table dumper and add 32-bit support')
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: jgross@suse.com
      Cc: JBeulich@suse.com
      Cc: hpa@zytor.com
      Cc: aryabinin@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Link: https://lkml.kernel.org/r/1523971636-4137-1-git-send-email-joro@8bytes.org
      d6ef1f19
    • Alison Schofield's avatar
      x86,sched: Allow topologies where NUMA nodes share an LLC · 1340ccfa
      Alison Schofield authored
      
      Intel's Skylake Server CPUs have a different LLC topology than previous
      generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided
      into two "slices", each containing half the cores, half the LLC, and one
      memory controller and each slice is enumerated to Linux as a NUMA
      node. This is similar to how the cores and LLC were arranged for the
      Cluster-On-Die (CoD) feature.
      
      CoD allowed the same cache line to be present in each half of the LLC.
      But, with SNC, each line is only ever present in *one* slice. This means
      that the portion of the LLC *available* to a CPU depends on the data being
      accessed:
      
          Remote socket: entire package LLC is shared
          Local socket->local slice: data goes into local slice LLC
          Local socket->remote slice: data goes into remote-slice LLC. Slightly
                          		higher latency than local slice LLC.
      
      The biggest implication from this is that a process accessing all
      NUMA-local memory only sees half the LLC capacity.
      
      The CPU describes its cache hierarchy with the CPUID instruction. One of
      the CPUID leaves enumerates the "logical processors sharing this
      cache". This information is used for scheduling decisions so that tasks
      move more freely between CPUs sharing the cache.
      
      But, the CPUID for the SNC configuration discussed above enumerates the LLC
      as being shared by the entire package. This is not 100% precise because the
      entire cache is not usable by all accesses. But, it *is* the way the
      hardware enumerates itself, and this is not likely to change.
      
      The userspace visible impact of all the above is that the sysfs info
      reports the entire LLC as being available to the entire package. As noted
      above, this is not true for local socket accesses. This patch does not
      correct the sysfs info. It is the same, pre and post patch.
      
      The current code emits the following warning:
      
       sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
      
      The warning is coming from the topology_sane() check in smpboot.c because
      the topology is not matching the expectations of the model for obvious
      reasons.
      
      To fix this, add a vendor and model specific check to never call
      topology_sane() for these systems. Also, just like "Cluster-on-Die" disable
      the "coregroup" sched_domain_topology_level and use NUMA information from
      the SRAT alone.
      
      This is OK at least on the hardware we are immediately concerned about
      because the LLC sharing happens at both the slice and at the package level,
      which are also NUMA boundaries.
      
      Signed-off-by: default avatarAlison Schofield <alison.schofield@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: brice.goglin@gmail.com
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com
      1340ccfa
    • Dou Liyang's avatar
      x86/processor: Remove two unused function declarations · 451cf3ca
      Dou Liyang authored
      
      early_trap_init() and cpu_set_gdt() have been removed, so remove the stale
      declarations as well.
      
      Signed-off-by: default avatarDou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Cc: luto@kernel.org
      Cc: hpa@zytor.com
      Cc: bp@suse.de
      Cc: kirill.shutemov@linux.intel.com
      Link: https://lkml.kernel.org/r/20180404064527.10562-1-douly.fnst@cn.fujitsu.com
      451cf3ca
    • Dou Liyang's avatar
      x86/acpi: Prevent X2APIC id 0xffffffff from being accounted · 10daf10a
      Dou Liyang authored
      
      RongQing reported that there are some X2APIC id 0xffffffff in his machine's
      ACPI MADT table, which makes the number of possible CPU inaccurate.
      
      The reason is that the ACPI X2APIC parser has no sanity check for APIC ID
      0xffffffff, which is an invalid id in all APIC types. See "Intel® 64
      Architecture x2APIC Specification", Chapter 2.4.1.
      
      Add a sanity check to acpi_parse_x2apic() which ignores the invalid id.
      
      Reported-by: default avatarLi RongQing <lirongqing@baidu.com>
      Signed-off-by: default avatarDou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: len.brown@intel.com
      Cc: rjw@rjwysocki.net
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20180412014052.25186-1-douly.fnst@cn.fujitsu.com
      10daf10a
    • Xiaoming Gao's avatar
      x86/tsc: Prevent 32bit truncation in calc_hpet_ref() · d3878e16
      Xiaoming Gao authored
      
      The TSC calibration code uses HPET as reference. The conversion normalizes
      the delta of two HPET timestamps:
      
          hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6
      
      and then divides the normalized delta of the corresponding TSC timestamps
      by the result to calulate the TSC frequency.
      
          tscfreq = ((tstsc1 - tstsc2 ) * 1e6) / hpetref
      
      This uses do_div() which takes an u32 as the divisor, which worked so far
      because the HPET frequency was low enough that 'hpetref' never exceeded
      32bit.
      
      On Skylake machines the HPET frequency increased so 'hpetref' can exceed
      32bit. do_div() truncates the divisor, which causes the calibration to
      fail.
      
      Use div64_u64() to avoid the problem.
      
      [ tglx: Fixes whitespace mangled patch and rewrote changelog ]
      
      Signed-off-by: default avatarXiaoming Gao <newtongao@tencent.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/38894564-4fc9-b8ec-353f-de702839e44e@gmail.com
      d3878e16
    • Christoph Hellwig's avatar
      x86: Remove pci-nommu.c · ef97837d
      Christoph Hellwig authored
      
      The commit that switched x86 to dma_direct_ops stopped using and building
      this file, but accidentally left it in the tree.  Remove it.
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: iommu@lists.infradead.org
      Link: https://lkml.kernel.org/r/20180416124442.13831-1-hch@lst.de
      ef97837d
    • Madhavan Srinivasan's avatar
      powerpc/64s: Default l1d_size to 64K in RFI fallback flush · 9dfbf78e
      Madhavan Srinivasan authored
      
      If there is no d-cache-size property in the device tree, l1d_size could
      be zero. We don't actually expect that to happen, it's only been seen
      on mambo (simulator) in some configurations.
      
      A zero-size l1d_size leads to the loop in the asm wrapping around to
      2^64-1, and then walking off the end of the fallback area and
      eventually causing a page fault which is fatal.
      
      Just default to 64K which is correct on some CPUs, and sane enough to
      not cause a crash on others.
      
      Fixes: aa8a5e00 ('powerpc/64s: Add support for RFI flush of L1-D cache')
      Signed-off-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      [mpe: Rewrite comment and change log]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      9dfbf78e
    • Martin Schwidefsky's avatar
      s390/signal: cleanup uapi struct sigaction · fae76491
      Martin Schwidefsky authored
      
      The struct sigaction for user space in arch/s390/include/uapi/asm/signal.h
      is ill defined. The kernel uses two structures 'struct sigaction' and
      'struct old_sigaction', the correlation in the kernel for both 31 and
      64 bit is as follows
      
          sys_sigaction -> struct old_sigaction
          sys_rt_sigaction -> struct sigaction
      
      The correlation of the (single) uapi definition for 'struct sigaction'
      under '#ifndef __KERNEL__':
      
          31-bit: sys_sigaction -> uapi struct sigaction
          31-bit: sys_rt_sigaction -> no structure available
      
          64-bit: sys_sigaction -> no structure available
          64-bit: sys_rt_sigaction -> uapi struct sigaction
      
      This is quite confusing. To make it a bit less confusing make the
      uapi definition of 'struct sigaction' usable for sys_rt_sigaction for
      both 31-bit and 64-bit.
      
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      fae76491
  8. Apr 16, 2018
Loading