  1. Nov 16, 2019
  2. Nov 06, 2019
  3. Nov 05, 2019
    • x86/tsc: Respect tsc command line parameter for clocksource_tsc_early · 63ec58b4
      Michael Zhivich authored
      
      
      The introduction of clocksource_tsc_early broke the functionality of
      "tsc=reliable" and "tsc=nowatchdog" command line parameters, since
      clocksource_tsc_early is unconditionally registered with
      CLOCK_SOURCE_MUST_VERIFY and thus put on the watchdog list.
      
      This can cause the TSC to be declared unstable during boot:
      
        clocksource: timekeeping watchdog on CPU0: Marking clocksource
                     'tsc-early' as unstable because the skew is too large:
        clocksource: 'refined-jiffies' wd_now: fffb7018 wd_last: fffb6e9d
                     mask: ffffffff
        clocksource: 'tsc-early' cs_now: 68a6a7070f6a0 cs_last: 68a69ab6f74d6
                     mask: ffffffffffffffff
        tsc: Marking TSC unstable due to clocksource watchdog
      
      The corresponding elapsed times are cs_nsec=1224152026 and wd_nsec=378942392, so
      the watchdog differs from TSC by 0.84 seconds.
      
      This happens when HPET is not available and jiffies are used as the TSC
      watchdog instead, and the jiffies update does not happen due to lost
      timer interrupts in periodic mode, which can occur e.g. with expensive
      debug mechanisms enabled or under massive overload conditions in
      virtualized environments.
      
      Before the introduction of the early TSC clocksource the command line
      parameters "tsc=reliable" and "tsc=nowatchdog" could be used to work around
      this issue.
      
      Restore the behaviour by disabling the watchdog if requested on the kernel
      command line.
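
The restored behaviour can be sketched in plain C. This is a minimal user-space model, not the actual kernel code: parse_tsc_param() and setup_tsc_early_flags() are illustrative names standing in for the tsc= parameter parser and the tsc-early registration path.

```c
#include <assert.h>
#include <string.h>

#define CLOCK_SOURCE_MUST_VERIFY 0x01

struct clocksource {
	unsigned long flags;
};

static int tsc_clocksource_reliable;	/* set from the command line */

/* hypothetical stand-in for the tsc= early parameter parser */
static void parse_tsc_param(const char *arg)
{
	if (strcmp(arg, "reliable") == 0 || strcmp(arg, "nowatchdog") == 0)
		tsc_clocksource_reliable = 1;
}

/*
 * Honor the command line before registering the early TSC clocksource:
 * dropping MUST_VERIFY keeps tsc-early off the watchdog list.
 */
static void setup_tsc_early_flags(struct clocksource *cs)
{
	cs->flags = CLOCK_SOURCE_MUST_VERIFY;
	if (tsc_clocksource_reliable)
		cs->flags &= ~CLOCK_SOURCE_MUST_VERIFY;
}
```

The point of the fix is that the flag adjustment happens before registration, so tsc-early is never put on the watchdog list in the first place.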
      
      [ tglx: Clarify changelog ]
      
      Fixes: aa83c457 ("x86/tsc: Introduce early tsc clocksource")
      Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20191024175945.14338-1-mzhivich@akamai.com
  4. Nov 04, 2019
    • x86/dumpstack/64: Don't evaluate exception stacks before setup · e361362b
      Thomas Gleixner authored
      
      
      Cyrill reported the following crash:
      
        BUG: unable to handle page fault for address: 0000000000001ff0
        #PF: supervisor read access in kernel mode
        RIP: 0010:get_stack_info+0xb3/0x148
      
      It turns out that if the stack tracer is invoked before the exception stack
      mappings are initialized, in_exception_stack() can erroneously classify an
      invalid address as an address inside of an exception stack:
      
          begin = this_cpu_read(cea_exception_stacks);  <- 0
          end = begin + sizeof(exception stacks);
      
      i.e. any address between 0 and end will be considered an exception stack
      address, and the subsequent code will then try to dereference the
      resulting stack frame at a non-mapped address.
      
       end = begin + (unsigned long)ep->size;
           ==> end = 0x2000
      
       regs = (struct pt_regs *)end - 1;
           ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0
      
       info->next_sp   = (unsigned long *)regs->sp;
           ==> Crashes due to accessing 0x1ff0
      
      Prevent this by checking the validity of the cea_exception_stack base
      address and bailing out if it is zero.
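
A minimal sketch of the guard, with illustrative stack sizes (not the real cpu_entry_area layout):

```c
#include <assert.h>
#include <stdbool.h>

#define EXCEPTION_STKSZ		0x1000UL	/* illustrative size */
#define N_EXCEPTION_STACKS	4

static unsigned long cea_exception_stacks;	/* 0 until setup has run */

static bool in_exception_stack(unsigned long addr)
{
	unsigned long begin = cea_exception_stacks;
	unsigned long end;

	/*
	 * The fix: a zero base means the stacks are not mapped yet, so
	 * nothing can be classified as an exception stack address.
	 */
	if (!begin)
		return false;

	end = begin + N_EXCEPTION_STACKS * EXCEPTION_STKSZ;
	return addr >= begin && addr < end;
}
```

Without the `!begin` check, any address below `end` (including the bogus 0x1ff0 from the crash above) would be classified as an exception stack address.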
      
      Fixes: afcd21da ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
      Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de
    • x86/apic/32: Avoid bogus LDR warnings · fe6f85ca
      Jan Beulich authored
      
      
      The removal of the LDR initialization in the bigsmp_32 APIC code unearthed
      a problem in setup_local_APIC().
      
      The code checks unconditionally for a mismatch of the logical APIC id by
      comparing the early APIC id which was initialized in get_smp_config() with
      the actual LDR value in the APIC.
      
      Due to the removal of the bogus LDR initialization, the check can now
      trigger on bigsmp_32 APIC systems, emitting a warning for every booting
      CPU. This is of course a false positive because the APIC is not using
      logical destination mode.
      
      Restrict the check and the possibly resulting fixup to systems which are
      actually using the APIC in logical destination mode.
      
      [ tglx: Massaged changelog and added Cc stable ]
      
      Fixes: bae3a8d3 ("x86/apic: Do not initialize LDR and DFR for bigsmp")
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/666d8f91-b5a8-1afd-7add-821e72a35f03@suse.com
    • timekeeping/vsyscall: Update VDSO data unconditionally · 52338415
      Huacai Chen authored
      
      
      The update of the VDSO data depends on __arch_use_vsyscall() returning
      true. This is a leftover from the attempt to map the features of various
      architectures 1:1 into generic code.
      
      The usage of __arch_use_vsyscall() in the actual vsyscall implementations
      got dropped and replaced by the requirement for the architecture code to
      return U64_MAX if the global clocksource is not usable in the VDSO.
      
      But the __arch_use_vsyscall() check in the update code stayed, which
      causes the VDSO data to be stale or invalid when an architecture actually
      implements that function and returns false when the current clocksource
      is not usable in the VDSO.
      
      As a consequence the VDSO implementations of clock_getres(), time(),
      clock_gettime(CLOCK_.*_COARSE) operate on invalid data and return bogus
      information.
      
      Remove the __arch_use_vsyscall() check from the VDSO update function and
      update the VDSO data unconditionally.
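
The resulting update logic can be sketched as follows. This is a user-space model with hypothetical field names, not the actual vdso_data layout:

```c
#include <assert.h>

typedef unsigned long long u64;
#define U64_MAX ((u64)~0ULL)

struct vdso_data {
	u64 cycle_last;
	int fresh;
};

/*
 * The update itself is unconditional; a clocksource that is unusable in
 * the VDSO is encoded as U64_MAX cycles, which makes the VDSO fall back
 * to the syscall path instead of computing time from stale data.
 */
static void update_vsyscall_sketch(struct vdso_data *vd, u64 cycles, int usable)
{
	vd->cycle_last = usable ? cycles : U64_MAX;
	vd->fresh = 1;		/* the data page is never left stale */
}
```

The design choice is to move the "is this clocksource usable" decision into the data itself rather than into a skip-the-update branch, so the data page is always consistent.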
      
      [ tglx: Massaged changelog and removed the now useless implementations
        in asm-generic/ARM64/MIPS ]
      
      Fixes: 44f57d78 ("timekeeping: Provide a generic update_vsyscall() implementation")
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1571887709-11447-1-git-send-email-chenhc@lemote.com
    • arm64: dts: zii-ultra: fix ARM regulator GPIO handle · f852497c
      Lucas Stach authored
      
      
      The GPIO handle is referencing the wrong GPIO, so the voltage did not
      actually change as intended. The pinmux is already correct, so just
      correct the GPIO number.
      
      Fixes: 4a13b3be ("arm64: dts: imx: add Zii Ultra board support")
      Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
      Signed-off-by: Shawn Guo <shawnguo@kernel.org>
  5. Nov 03, 2019
  6. Nov 01, 2019
  7. Oct 31, 2019
    • s390/idle: fix cpu idle time calculation · 3d7efa4e
      Heiko Carstens authored
      
      
      The idle time reported in /proc/stat sometimes incorrectly contains
      huge values on s390. This is caused by a bug in arch_cpu_idle_time().
      
      The kernel tries to figure out when a different cpu entered idle by
      accessing its per-cpu data structure. There is an ordering problem: if
      the remote cpu has an idle_enter value which is not zero, and an
      idle_exit value which is zero, it is assumed it is idle since
      "now". The "now" timestamp however is taken before the idle_enter
      value is read.
      
      Which in turn means that "now" can be smaller than idle_enter of the
      remote cpu. Unconditionally subtracting idle_enter from "now" can thus
      lead to a negative value (aka large unsigned value).
      
      Fix this by moving the get_tod_clock() invocation out of the
      loop. While at it also make the code a bit more readable.
      
      A similar bug also exists for show_idle_time(). Fix this as well.
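
The ordering problem can be modelled in a small user-space sketch. Here get_tod_clock() is stubbed out, and idle_time_sketch() is an illustrative stand-in for arch_cpu_idle_time(): "now" is sampled only after the remote cpu's idle_enter value is known, and the subtraction is guarded so it can never wrap into a huge unsigned value:

```c
#include <assert.h>

typedef unsigned long long u64;

/* stand-in for get_tod_clock(); settable for demonstration */
static u64 tod_clock_value;
static u64 get_tod_clock(void)
{
	return tod_clock_value;
}

static u64 idle_time_sketch(u64 idle_enter, u64 idle_exit)
{
	u64 now = get_tod_clock();	/* sampled after idle_enter is read */

	if (idle_enter && !idle_exit)	/* cpu is idle right now */
		return now > idle_enter ? now - idle_enter : 0;
	return idle_exit - idle_enter;	/* completed idle period */
}
```

With the old ordering, a "now" older than idle_enter made `now - idle_enter` wrap around, which is exactly the huge /proc/stat value described above.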
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    • s390/unwind: fix mixing regs and sp · a1d863ac
      Ilya Leoshkevich authored
      
      
      unwind_for_each_frame stops after the first frame if regs->gprs[15] <=
      sp.
      
      The reason is that in case regs are specified, the first frame should be
      regs->psw.addr and the second frame should be sp->gprs[8]. However,
      currently the second frame is regs->gprs[15], which confuses
      outside_of_stack().
      
      Fix by introducing a flag to distinguish this special case from
      unwinding the interrupt handler, for which the current behavior is
      appropriate.
      
      Fixes: 78c98f90 ("s390/unwind: introduce stack unwind API")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: stable@vger.kernel.org # v5.2+
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    • s390/cmm: fix information leak in cmm_timeout_handler() · b8e51a6a
      Yihui Zeng authored
      
      
      The problem is that we were putting the NUL terminator too far:
      
      	buf[sizeof(buf) - 1] = '\0';
      
      If the user input isn't NUL terminated and they haven't initialized the
      whole buffer then it leads to an info leak.  The NUL terminator should
      be:
      
      	buf[len - 1] = '\0';
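
The difference between the two terminators can be demonstrated with a small user-space model. cmm_read_param_sketch() is an illustrative stand-in for the copy-and-terminate step in cmm_timeout_handler(), with memcpy() standing in for copy_from_user():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Terminate at the number of bytes actually copied from userspace, not
 * at the end of the buffer: the bytes past len are uninitialized stack
 * memory and must never become part of the string.
 */
static void cmm_read_param_sketch(char *buf, size_t bufsize,
				  const char *user_data, size_t len)
{
	if (len > bufsize)
		len = bufsize;
	memcpy(buf, user_data, len);	/* stands in for copy_from_user() */
	buf[len - 1] = '\0';		/* fixed: was buf[bufsize - 1] */
}
```

With the old `buf[sizeof(buf) - 1]` terminator, a short, unterminated user input left whatever was on the stack between `len` and the end of the buffer readable as part of the string.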
      
      Signed-off-by: Yihui Zeng <yzeng56@asu.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      [heiko.carstens@de.ibm.com: keep semantics of how *lenp and *ppos are handled]
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    • arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo · 36c602dc
      Bjorn Andersson authored
      
      
      The Kryo cores share errata 1009 with Falkor, so add their model
      definitions and enable it for them as well.
      
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      [will: Update entry in silicon-errata.rst]
      Signed-off-by: Will Deacon <will@kernel.org>
    • KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active · 9167ab79
      Paolo Bonzini authored
      
      
      VMX already does so if the host has SMEP, in order to support the combination of
      CR0.WP=1 and CR4.SMEP=1.  However, it is perfectly safe to always do so, and in
      fact VMX already ends up running with EFER.NXE=1 on old processors that lack the
      "load EFER" controls, because it may help avoid a slow MSR write.  Removing
      all the conditionals simplifies the code.
      
      SVM does not have similar code, but it should since recent AMD processors do
      support SMEP.  So this patch also makes the code for the two vendors more similar
      while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors.
      
      Cc: stable@vger.kernel.org
      Cc: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86, efi: Never relocate kernel below lowest acceptable address · 220dd769
      Kairui Song authored
      
      
      Currently, the kernel fails to boot on some Hyper-V VMs when using EFI,
      and it's a potential issue on all x86 platforms.
      
      It's caused by broken kernel relocation on EFI systems when the three
      conditions below are met:
      
      1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
         by the loader.
      2. There isn't enough room to contain the kernel, starting from the
         default load address (e.g. something else occupies part of the region).
      3. In the memmap provided by EFI firmware, there is a memory region
         that starts below LOAD_PHYSICAL_ADDR and is suitable for containing
         the kernel.
      
      The EFI stub will perform a kernel relocation when condition 1 is met. But
      due to condition 2, the EFI stub can't relocate the kernel to the preferred
      address, so it falls back to asking the EFI firmware to allocate the lowest
      usable memory region, gets the low region mentioned in condition 3, and
      relocates the kernel there.
      
      It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR, which is
      the lowest acceptable kernel relocation address.
      
      The first thing that goes wrong is in arch/x86/boot/compressed/head_64.S.
      Kernel decompression will forcibly use LOAD_PHYSICAL_ADDR as the output
      address if the kernel is located below it. Then the relocation before
      decompression, which moves the kernel to the end of the decompression
      buffer, will overwrite other memory regions, as there is not enough
      memory there.
      
      To fix it, just don't let the EFI stub relocate the kernel to any address
      lower than the lowest acceptable address.
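
The clamping logic can be sketched as a toy model. pick_load_addr() is an illustrative stand-in for the efi_low_alloc_above() change, and the LOAD_PHYSICAL_ADDR value here is only a typical example, not the authoritative one:

```c
#include <assert.h>

#define LOAD_PHYSICAL_ADDR 0x1000000UL	/* illustrative 16 MiB default */

/*
 * Pick the first candidate region that does not place the kernel below
 * the lowest acceptable address; 0 means no usable region was found.
 */
static unsigned long pick_load_addr(const unsigned long *starts, int n,
				    unsigned long min_addr)
{
	for (int i = 0; i < n; i++)
		if (starts[i] >= min_addr)
			return starts[i];
	return 0;
}
```

The essential change is the `>= min_addr` filter: the old fallback path would happily accept the lowest region the firmware offered, even when it was below LOAD_PHYSICAL_ADDR.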
      
      [ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ]
      
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. Oct 30, 2019
  9. Oct 29, 2019