Skip to content
  1. Sep 08, 2015
    • Juergen Gross's avatar
      xen: limit memory to architectural maximum · cb9e444b
      Juergen Gross authored
      
      
      When a pv-domain (including dom0) is started it tries to size it's
      p2m list according to the maximum possible memory amount it ever can
      achieve. Limit the initial maximum memory size to the architectural
      limit of the hardware in order to avoid overflows during remapping
      of memory.
      
      This problem will occur when dom0 is started with an initial memory
      size being a multiple of 1GB, but without specifying it's maximum
      memory size. The kernel must be configured without
      CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen.
      
      Reported-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Tested-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      cb9e444b
    • Juergen Gross's avatar
      xen: avoid another early crash of memory limited dom0 · ab24507c
      Juergen Gross authored
      
      
      Commit b1c9f169047b ("xen: split counting of extra memory pages...")
      introduced an error when dom0 was started with limited memory occurring
      only on some hardware.
      
      The problem arises in case dom0 is started with initial memory and
      maximum memory being the same. The kernel must be configured without
      CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen. If all
      of this is true and the E820 map of the machine is sparse (some areas
      are not covered) then the machine might crash early in the boot
      process.
      
      An example E820 map triggering the problem looks like this:
      
      [    0.000000] e820: BIOS-provided physical RAM map:
      [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
      [    0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cf7fafff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000cf7fb000-0x00000000cf95ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cf960000-0x00000000cfb62fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfb63000-0x00000000cfd14fff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000cfd15000-0x00000000cfd61fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfd62000-0x00000000cfd6cfff] ACPI data
      [    0.000000] BIOS-e820: [mem 0x00000000cfd6d000-0x00000000cfd6ffff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfd70000-0x00000000cfd70fff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000cfd71000-0x00000000cfea8fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfea9000-0x00000000cfeb9fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfeba000-0x00000000cfecafff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfecb000-0x00000000cfecbfff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfecc000-0x00000000cfedbfff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfedc000-0x00000000cfedcfff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfedd000-0x00000000cfeddfff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfede000-0x00000000cfee3fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfee4000-0x00000000cfef6fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfef7000-0x00000000cfefffff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000020effffff] usable
      
      In this case the area a0000-dffff isn't present in the map. This will
      confuse the memory setup of the domain when remapping the memory from
      such holes to populated areas.
      
      To avoid the problem the accounting of to be remapped memory has to
      count such holes in the E820 map as well.
      
      Reported-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      ab24507c
    • Juergen Gross's avatar
      xen: avoid early crash of memory limited dom0 · eafd72e0
      Juergen Gross authored
      
      
      Commit b1c9f169047b ("xen: split counting of extra memory pages...")
      introduced an error when dom0 was started with limited memory.
      
      The problem arises in case dom0 is started with initial memory and
      maximum memory being the same and exactly a multiple of 1 GB. The
      kernel must be configured without CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
      for the problem to happen. In this case it will crash very early
      during boot due to the virtual mapped p2m list not being large
      enough to be able to remap any memory:
      
      (XEN) Freed 304kB init memory.
      mapping kernel into physical memory
      about to get started...
      (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
      (XEN) domain_crash_sync called from entry.S: fault at ffff82d080229a93 create_bounce_frame+0x12b/0x13a
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.5.2-pre  x86_64  debug=n Not tainted ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff81d120cb>]
      (XEN) RFLAGS: 0000000000000206   EM: 1 CONTEXT: pv guest (d0v0)
      (XEN) rax: ffffffff81db2000   rbx: 000000004d000000   rcx: 0000000000000000
      (XEN) rdx: 000000004d000000   rsi: 0000000000063000   rdi: 000000004d063000
      (XEN) rbp: ffffffff81c03d78   rsp: ffffffff81c03d28   r8:  0000000000023000
      (XEN) r9:  00000001040ff000   r10: 0000000000007ff0   r11: 0000000000000000
      (XEN) r12: 0000000000063000   r13: 000000000004d000   r14: 0000000000000063
      (XEN) r15: 0000000000000063   cr0: 0000000080050033   cr4: 00000000000006f0
      (XEN) cr3: 0000000105c0f000   cr2: ffffc90000268000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81c03d28:
      (XEN)   0000000000000000 0000000000000000 ffffffff81d120cb 000000010000e030
      (XEN)   0000000000010006 ffffffff81c03d68 000000000000e02b ffffffffffffffff
      (XEN)   0000000000000063 000000000004d063 ffffffff81c03de8 ffffffff81d130a7
      (XEN)   ffffffff81c03de8 000000000004d000 00000001040ff000 0000000000105db1
      (XEN)   00000001040ff001 000000000004d062 ffff8800092d6ff8 0000000002027000
      (XEN)   ffff8800094d8340 ffff8800092d6ff8 00003ffffffff000 ffff8800092d7ff8
      (XEN)   ffffffff81c03e48 ffffffff81d13c43 ffff8800094d8000 ffff8800094d9000
      (XEN)   0000000000000000 ffff8800092d6000 00000000092d6000 000000004cfbf000
      (XEN)   00000000092d6000 00000000052d5442 0000000000000000 0000000000000000
      (XEN)   ffffffff81c03ed8 ffffffff81d185c1 0000000000000000 0000000000000000
      (XEN)   ffffffff81c03e78 ffffffff810f8ca4 ffffffff81c03ed8 ffffffff8171a15d
      (XEN)   0000000000000010 ffffffff81c03ee8 0000000000000000 0000000000000000
      (XEN)   ffffffff81f0e402 ffffffffffffffff ffffffff81dae900 0000000000000000
      (XEN)   0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81d0cf0f
      (XEN)   0000000000000000 0000000000000000 0000000000000000 ffffffff81db82e0
      (XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN)   ffffffff81c03f38 ffffffff81d0c603 ffffffff81c03ff8 ffffffff81d11c86
      (XEN)   0300000100000032 0000000000000005 0000000000000020 0000000000000000
      (XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
      
      This can be avoided by allocating aneough space for the p2m to cover
      the maximum memory of dom0 plus the identity mapped holes required
      for PCI space, BIOS etc.
      
      Reported-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      eafd72e0
  2. Aug 20, 2015
  3. Aug 16, 2015
  4. Aug 13, 2015
  5. Aug 12, 2015
    • Matt Fleming's avatar
      perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler · d7a702f0
      Matt Fleming authored
      
      
      Tony reports that booting his 144-cpu machine with maxcpus=10 triggers
      the following WARN_ON():
      
      [   21.045727] WARNING: CPU: 8 PID: 647 at arch/x86/kernel/cpu/perf_event_intel_cqm.c:1267 intel_cqm_cpu_prepare+0x75/0x90()
      [   21.045744] CPU: 8 PID: 647 Comm: systemd-udevd Not tainted 4.2.0-rc4 #1
      [   21.045745] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0066.R00.1506021730 06/02/2015
      [   21.045747]  0000000000000000 0000000082771b09 ffff880856333ba8 ffffffff81669b67
      [   21.045748]  0000000000000000 0000000000000000 ffff880856333be8 ffffffff8107b02a
      [   21.045750]  ffff88085b789800 ffff88085f68a020 ffffffff819e2470 000000000000000a
      [   21.045750] Call Trace:
      [   21.045757]  [<ffffffff81669b67>] dump_stack+0x45/0x57
      [   21.045759]  [<ffffffff8107b02a>] warn_slowpath_common+0x8a/0xc0
      [   21.045761]  [<ffffffff8107b15a>] warn_slowpath_null+0x1a/0x20
      [   21.045762]  [<ffffffff81036725>] intel_cqm_cpu_prepare+0x75/0x90
      [   21.045764]  [<ffffffff81036872>] intel_cqm_cpu_notifier+0x42/0x160
      [   21.045767]  [<ffffffff8109a33d>] notifier_call_chain+0x4d/0x80
      [   21.045769]  [<ffffffff8109a44e>] __raw_notifier_call_chain+0xe/0x10
      [   21.045770]  [<ffffffff8107b538>] _cpu_up+0xe8/0x190
      [   21.045771]  [<ffffffff8107b65a>] cpu_up+0x7a/0xa0
      [   21.045774]  [<ffffffff8165e920>] cpu_subsys_online+0x40/0x90
      [   21.045777]  [<ffffffff81433b37>] device_online+0x67/0x90
      [   21.045778]  [<ffffffff81433bea>] online_store+0x8a/0xa0
      [   21.045782]  [<ffffffff81430e78>] dev_attr_store+0x18/0x30
      [   21.045785]  [<ffffffff8126b6ba>] sysfs_kf_write+0x3a/0x50
      [   21.045786]  [<ffffffff8126ad40>] kernfs_fop_write+0x120/0x170
      [   21.045789]  [<ffffffff811f0b77>] __vfs_write+0x37/0x100
      [   21.045791]  [<ffffffff811f38b8>] ? __sb_start_write+0x58/0x110
      [   21.045795]  [<ffffffff81296d2d>] ? security_file_permission+0x3d/0xc0
      [   21.045796]  [<ffffffff811f1279>] vfs_write+0xa9/0x190
      [   21.045797]  [<ffffffff811f2075>] SyS_write+0x55/0xc0
      [   21.045800]  [<ffffffff81067300>] ? do_page_fault+0x30/0x80
      [   21.045804]  [<ffffffff816709ae>] entry_SYSCALL_64_fastpath+0x12/0x71
      [   21.045805] ---[ end trace fe228b836d8af405 ]---
      
      The root cause is that CPU_UP_PREPARE is completely the wrong notifier
      action from which to access cpu_data(), because smp_store_cpu_info()
      won't have been executed by the target CPU at that point, which in turn
      means that ->x86_cache_max_rmid and ->x86_cache_occ_scale haven't been
      filled out.
      
      Instead let's invoke our handler from CPU_STARTING and rename it
      appropriately.
      
      Reported-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikas Shivappa <vikas.shivappa@intel.com>
      Link: http://lkml.kernel.org/r/1438863163-14083-1-git-send-email-matt@codeblueprint.co.uk
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d7a702f0
    • Peter Zijlstra's avatar
      perf/x86/intel: Fix memory leak on hot-plug allocation fail · dbc72b7a
      Peter Zijlstra authored
      
      
      We fail to free the shared_regs allocation if the constraint_list
      allocation fails.
      
      Cure this and be more consistent in NULL-ing the pointers after free.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dbc72b7a
  6. Aug 11, 2015
    • Nathan Lynch's avatar
      ARM: 8410/1: VDSO: fix coarse clock monotonicity regression · 09edea4f
      Nathan Lynch authored
      
      
      Since 906c5557 ("timekeeping: Copy the shadow-timekeeper over the
      real timekeeper last") it has become possible on ARM to:
      
      - Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp
        via syscall.
      - Subsequently obtain a timestamp for the same clock ID via VDSO which
        predates the first timestamp (by one jiffy).
      
      This is because ARM's update_vsyscall is deriving the coarse time
      using the __current_kernel_time interface, when it should really be
      using the timekeeper object provided to it by the timekeeping core.
      It happened to work before only because __current_kernel_time would
      access the same timekeeper object which had been passed to
      update_vsyscall.  This is no longer the case.
      
      Cc: stable@vger.kernel.org
      Fixes: 906c5557 ("timekeeping: Copy the shadow-timekeeper over the real timekeeper last")
      Signed-off-by: default avatarNathan Lynch <nathan_lynch@mentor.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      09edea4f
  7. Aug 10, 2015
    • Nathan Lynch's avatar
      arm64: VDSO: fix coarse clock monotonicity regression · 878854a3
      Nathan Lynch authored
      
      
      Since 906c5557 ("timekeeping: Copy the shadow-timekeeper over the
      real timekeeper last") it has become possible on arm64 to:
      
      - Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp
        via syscall.
      - Subsequently obtain a timestamp for the same clock ID via VDSO which
        predates the first timestamp (by one jiffy).
      
      This is because arm64's update_vsyscall is deriving the coarse time
      using the __current_kernel_time interface, when it should really be
      using the timekeeper object provided to it by the timekeeping core.
      It happened to work before only because __current_kernel_time would
      access the same timekeeper object which had been passed to
      update_vsyscall.  This is no longer the case.
      
      Signed-off-by: default avatarNathan Lynch <nathan_lynch@mentor.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      878854a3
    • Jason A. Donenfeld's avatar
      x86/xen: build "Xen PV" APIC driver for domU as well · fc5fee86
      Jason A. Donenfeld authored
      It turns out that a PV domU also requires the "Xen PV" APIC
      driver. Otherwise, the flat driver is used and we get stuck in busy
      loops that never exit, such as in this stack trace:
      
      (gdb) target remote localhost:9999
      Remote debugging using localhost:9999
      __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
      56              while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
      (gdb) bt
       #0  __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
       #1  __default_send_IPI_shortcut (shortcut=<optimized out>,
      dest=<optimized out>, vector=<optimized out>) at
      ./arch/x86/include/asm/ipi.h:75
       #2  apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
       #3  0xffffffff81011336 in arch_irq_work_raise () at
      arch/x86/kernel/irq_work.c:47
       #4  0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
      kernel/irq_work.c:100
       #5  0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
       #6  0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
      out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
      fmt=<optimized out>, args=<optimized out>)
          at kernel/printk/printk.c:1778
       #7  0xffffffff816010c8 in printk (fmt=<optimized out>) at
      kernel/printk/printk.c:1868
       #8  0xffffffffc00013ea in ?? ()
       #9  0x0000000000000000 in ?? ()
      
      Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
      
      
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      fc5fee86
  8. Aug 08, 2015
Loading