  1. May 02, 2007
    • [PATCH] i386: i386 separate hardware-defined TSS from Linux additions · a75c54f9
      Rusty Russell authored
      
      
      On Thu, 2007-03-29 at 13:16 +0200, Andi Kleen wrote:
      > Please clean it up properly with two structs.
      
      Not sure about this, now I've done it.  Running it here.
      
      If you like it, I can do x86-64 as well.
      
      ==
      lguest defines its own TSS struct because the "struct tss_struct"
      contains linux-specific additions.  Andi asked me to split the struct
      in processor.h.
      
      Unfortunately it makes usage a little awkward.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
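The split described above can be sketched as two structs: the architectural TSS layout on its own, embedded first inside the Linux wrapper. This is a minimal illustration with most fields elided; the field subset shown here is an assumption, not the full kernel definition, and it shows why usage becomes "a little awkward" (`tss->x86_tss.esp0` instead of `tss->esp0`).

```c
#include <stddef.h>
#include <stdint.h>

/* Hardware-defined 32-bit TSS layout (fields heavily abridged). */
struct i386_hw_tss {
	uint16_t back_link, __blh;
	uint32_t esp0;
	uint16_t ss0, __ss0h;
	/* ... remaining architectural fields elided ... */
	uint16_t trace, io_bitmap_base;
};

/* Linux wrapper: the hardware part must come first so the struct's
 * address can still be used as the TSS base in the GDT descriptor. */
struct tss_struct {
	struct i386_hw_tss x86_tss;
	/* Linux-specific additions live after the architectural part. */
	unsigned long io_bitmap[128 / sizeof(long) + 1];
	unsigned long stack[64];
};
```

A guest like lguest can then share the hardware-defined struct without dragging in the Linux-only fields.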
    • [PATCH] i386: Remove smp_alt_instructions · d0175ab6
      Jeremy Fitzhardinge authored
      
      
      The .smp_altinstructions section and its corresponding symbols are
      completely unused, so remove them.
      
      Also, remove stray #ifdef __KENREL__ in asm-i386/alternative.h
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
    • [PATCH] x86: Clean up x86 control register and MSR macros (corrected) · 4bc5aa91
      H. Peter Anvin authored
      
      
      This patch is based on Rusty's recent cleanup of the EFLAGS-related
      macros; it extends the same kind of cleanup to control registers and
      MSRs.
      
      It also unifies these between i386 and x86-64; at least with regards
      to MSRs, the two had definitely gotten out of sync.
      
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86: Don't use MWAIT on AMD Family 10 · f039b754
      Andi Kleen authored
      
      
      It doesn't put the CPU into deeper sleep states, so it's better to use the
      standard idle loop to save power.  But allow it to be re-enabled anyway for
      benchmarking.
      
      I also removed the obsolete idle=halt on i386.
      
      Cc: andreas.herrmann@amd.com
      
      Signed-off-by: Andi Kleen <ak@suse.de>
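The policy above can be sketched as a small check. This is a hypothetical helper, not the kernel's exact code; the struct and the `force_mwait` flag name are illustrative stand-ins for the CPU-identification data and the benchmarking override.

```c
/* Hypothetical sketch: skip MWAIT on AMD Family 0x10 unless forced. */
#define X86_VENDOR_AMD 2

struct cpuinfo { int vendor; int family; };

static int force_mwait;	/* set by a boot option when benchmarking */

static int cpu_should_use_mwait(const struct cpuinfo *c)
{
	if (force_mwait)
		return 1;	/* explicitly re-enabled for benchmarking */
	if (c->vendor == X86_VENDOR_AMD && c->family >= 0x10)
		return 0;	/* MWAIT reaches no deeper C-states here */
	return 1;
}
```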
    • [PATCH] x86-64: Clean up asm-x86_64/bugs.h · c169859d
      Jeremy Fitzhardinge authored
      
      
      Most of asm-x86_64/bugs.h is code which should be in a C file, so put it there.
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] i386: Make COMPAT_VDSO runtime selectable. · 1dbf527c
      Jeremy Fitzhardinge authored
      
      
      Now that relocation of the VDSO for COMPAT_VDSO users is done at
      runtime rather than compile time, it is possible to enable/disable
      compat mode at runtime.
      
      This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the
      kernel command line, or via sysctl.  (Switching on a running system
      shouldn't be done lightly; any process which was relying on the compat
      VDSO will be upset if it goes away.)
      
      The COMPAT_VDSO config option still exists, but if enabled it just
      makes vdso_enabled default to VDSO_COMPAT.
      
      From: Hugh Dickins <hugh@veritas.com>
      
      Fix oops from i386-make-compat_vdso-runtime-selectable.patch.
      
      Even mingetty at system startup finds it easy to trigger an oops
      while reading /proc/PID/maps: though it has a good hold on the mm
      itself, that cannot stop exit_mm() from resetting tsk->mm to NULL.
      
      (It is usually show_map()'s call to get_gate_vma() which oopses,
      and I expect we could change that to check priv->tail_vma instead;
      but no matter, even m_start()'s call just after get_task_mm() is racy.)
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: "Jan Beulich" <JBeulich@novell.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
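A minimal sketch of the "vdso=2" plumbing described above, under the assumption that the mode is just an integer with 2 selecting compat mode; the parser function and enum names here are illustrative, not the kernel's exact identifiers.

```c
#include <stdlib.h>

enum { VDSO_DISABLED = 0, VDSO_ENABLED = 1, VDSO_COMPAT = 2 };

static int vdso_enabled = VDSO_ENABLED;

/* Hypothetical "vdso=N" command-line handler: accept 0, 1 or 2. */
static int vdso_setup(const char *s)
{
	vdso_enabled = atoi(s);
	return vdso_enabled >= VDSO_DISABLED && vdso_enabled <= VDSO_COMPAT;
}
```

With COMPAT_VDSO configured in, the only remaining effect is that `vdso_enabled` defaults to `VDSO_COMPAT` instead of `VDSO_ENABLED`.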
    • [PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO · d4f7a2c1
      Jeremy Fitzhardinge authored
      
      
      Some versions of libc can't deal with a VDSO which doesn't have its
      ELF headers matching its mapped address.  COMPAT_VDSO maps the VDSO at
      a specific system-wide fixed address.  Previously this was all done at
      build time, on the grounds that the fixed VDSO address is always at
      the top of the address space.  However, a hypervisor may reserve some
      of that address space, pushing the fixmap address down.
      
      This patch does the adjustment dynamically at runtime, depending on
      the runtime location of the VDSO fixmap.
      
      [ Patch has been through several hands: Jan Beulich wrote the original
        version; Zach reworked it, and Jeremy converted it to relocate phdrs
        as well as sections. ]
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: "Jan Beulich" <JBeulich@novell.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
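The runtime adjustment amounts to walking the VDSO's program headers and shifting their addresses by the difference between the link-time address and the actual fixmap address. This sketch uses hand-rolled minimal ELF32 structs (only the fields touched) rather than `<elf.h>`, and is an illustration of the idea, not the patch's code.

```c
#include <stdint.h>

/* Minimal stand-in for the ELF32 program-header fields we touch. */
struct elf32_phdr { uint32_t p_type, p_offset, p_vaddr, p_paddr; };

/* Shift every program header by the distance between the link-time
 * address and the runtime fixmap address (delta may be negative when
 * a hypervisor has pushed the fixmap down). */
static void relocate_phdrs(struct elf32_phdr *phdr, int phnum, int32_t delta)
{
	for (int i = 0; i < phnum; i++) {
		phdr[i].p_vaddr += delta;
		phdr[i].p_paddr += delta;
	}
}
```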
    • [PATCH] i386: clean up identify_cpu · a6c4e076
      Jeremy Fitzhardinge authored
      
      
      identify_cpu() is used to identify both the boot CPU and secondary
      CPUs, but it performs some actions which only apply to the boot CPU.
      Those functions are therefore really __init functions, but because
      they're called by identify_cpu(), they must be marked __cpuinit.
      
      This patch splits identify_cpu() into identify_boot_cpu() and
      identify_secondary_cpu(), and calls the appropriate init functions
      from each.  Also, identify_boot_cpu() and all the functions it
      dominates are marked __init.
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: Clean up asm-i386/bugs.h · 1353ebb4
      Jeremy Fitzhardinge authored
      
      
      Most of asm-i386/bugs.h is code which should be in a C file, so put it there.
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] x86-64: fix arithmetic in comment · bbf30a16
      Avi Kivity authored
      
      
      The xmm space on x86_64 is 256 bytes.
      
      Signed-off-by: Avi Kivity <avi@qumranet.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Use X86_EFLAGS_IF in x86-64/irqflags.h. · 5d02d7ae
      Andi Kleen authored
      
      
      As per i386 patch: move X86_EFLAGS_IF et al out to a new header:
      processor-flags.h, so we can include it from irqflags.h and use it in
      raw_irqs_disabled_flags().
      
      As a side-effect, we could now use these flags in .S files.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
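Concretely, the move lets `raw_irqs_disabled_flags()` test the architectural interrupt-enable bit symbolically instead of hard-coding `0x200`. A minimal sketch of the shared definitions (only the IF bit shown):

```c
/* From the new processor-flags.h: architectural EFLAGS bits (subset). */
#define X86_EFLAGS_IF 0x00000200	/* Interrupt enable flag */

/* irqflags.h can now use the named constant. */
static inline int raw_irqs_disabled_flags(unsigned long flags)
{
	return !(flags & X86_EFLAGS_IF);
}
```

Because the constants are plain `#define`s rather than C expressions, they are also usable from .S files, the side effect the commit notes.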
    • [PATCH] x86: fix amd64-agp aperture validation · b92e9fac
      Jan Beulich authored
      
      
      Under CONFIG_DISCONTIGMEM, assuming that a !pfn_valid() implies all
      subsequent pfn-s are also invalid is wrong. Thus replace this by
      explicitly checking against the E820 map.
      
      AK: make e820 on x86-64 not initdata
      
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
    • [PATCH] x86-64: Account for module percpu space separately from kernel percpu · b00742d3
      Jeremy Fitzhardinge authored
      
      
      Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as
      the sum of kernel_percpu + PERCPU_MODULE_RESERVE.  This is now common
      to all architectures; if an architecture wants to set
      PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is
      the only one which does).
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
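The sizing rule above reduces to simple arithmetic. In the kernel the per-cpu data size comes from the linker symbols `__per_cpu_end - __per_cpu_start`; this sketch simulates that with a plain parameter, and the reserve value shown is illustrative.

```c
/* Per-cpu allocation = kernel per-cpu data + a fixed reserve for modules,
 * replacing the old single PERCPU_ENOUGH_ROOM constant. */
#define PERCPU_MODULE_RESERVE 8192	/* illustrative value */

static unsigned long percpu_enough_room(unsigned long kernel_percpu_size)
{
	return kernel_percpu_size + PERCPU_MODULE_RESERVE;
}
```

An architecture that needs something special (the commit says ia64 does) can still override the result with its own constant.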
    • [PATCH] i386: Add machine_ops interface to abstract halting and rebooting · 07f3331c
      Jeremy Fitzhardinge authored
      
      
      machine_ops is an interface for the machine_* functions defined in
      <linux/reboot.h>.  This is intended to allow hypervisors to intercept
      the reboot process, but it could be used to implement other x86
      subarchitecture reboots.
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: Add smp_ops interface · 01a2f435
      Jeremy Fitzhardinge authored
      
      
      Add a smp_ops interface.  This abstracts the API defined by
      <linux/smp.h> for use within arch/i386.  The primary intent is that it
      be used by a paravirtualizing hypervisor to implement SMP, but it
      could also be used by non-APIC-using sub-architectures.
      
      This is related to CONFIG_PARAVIRT, but is implemented unconditionally
      since it is simpler that way and not a highly performance-sensitive
      interface.
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
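The ops-table pattern behind both machine_ops and smp_ops looks roughly like the following. The field set here is a guess at the shape, not the exact kernel interface: the generic entry point dispatches through a struct of function pointers whose defaults are the native implementations, and a hypervisor simply swaps in its own.

```c
/* Illustrative ops table; field names are hypothetical. */
struct smp_ops {
	int (*cpu_up)(unsigned int cpu);
};

static int native_cpu_up(unsigned int cpu)
{
	(void)cpu;
	return 0;	/* native path: boot the AP via the APIC */
}

static struct smp_ops smp_ops = { .cpu_up = native_cpu_up };

static int pv_calls;
static int pv_cpu_up(unsigned int cpu)
{
	(void)cpu;
	pv_calls++;	/* a paravirt backend would ask the hypervisor */
	return 0;
}

/* The generic API always goes through the table. */
static int __cpu_up(unsigned int cpu)
{
	return smp_ops.cpu_up(cpu);
}
```

Implementing this unconditionally (not just under CONFIG_PARAVIRT) is cheap because CPU bring-up is not a hot path.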
    • [PATCH] i386: cleanup GDT Access · 4fbb5968
      Rusty Russell authored
      
      
      Now we have an explicit per-cpu GDT variable, we don't need to keep the
      descriptors around to use them to find the GDT: expose cpu_gdt directly.
      
      We could go further and make load_gdt() pack the descriptor for us, or even
      assume it means "load the current cpu's GDT" which is what it always does.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86: sys_ioperm() prototype cleanup · ca906e42
      Adrian Bunk authored
      
      
      - there's no reason for duplicating the prototype from
        include/linux/syscalls.h in include/asm-x86_64/unistd.h
      - every file should #include the headers containing the prototypes for
        its global functions
      
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: use lru instead of page->index and page->private for pgd lists management. · 2bff7383
      Christoph Lameter authored
      
      
      x86_64 currently simulates a list using the index and private fields of the
      page struct.  Seems that the code was inherited from i386.  But x86_64 does
      not use the slab to allocate pgds and pmds etc.  So the lru field is not
      used by the slab and therefore available.
      
      This patch uses standard list operations on page->lru to realize pgd
      tracking.
      
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
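With pgd pages allocated outside the slab, `page->lru` is free, so the hand-rolled list on `page->index`/`page->private` can become standard list operations. A self-contained sketch with a minimal `<linux/list.h>`-style list (the `struct page` here is of course reduced to the one field used):

```c
#include <stddef.h>

/* Minimal doubly-linked list in the style of <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

struct page { struct list_head lru; };

static struct list_head pgd_list;

/* pgd tracking becomes plain list operations on page->lru. */
static void pgd_list_add(struct page *page) { list_add(&page->lru, &pgd_list); }
static void pgd_list_del(struct page *page) { list_del(&page->lru); }
```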
    • [PATCH] i386: Use X86_EFLAGS_IF in irqflags.h. · b4531e86
      Andi Kleen authored
      
      
      Move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we
      can include it from irqflags.h and use it in raw_irqs_disabled_flags().
      
      As a side-effect, we could now use these flags in .S files.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86: tighten kernel image page access rights · 6fb14755
      Jan Beulich authored
      
      
      On x86-64, kernel memory freed after init can be entirely unmapped instead
      of just getting 'poisoned' by overwriting with a debug pattern.
      
      On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and bug table
      can also be write-protected.
      
      Compared to the first version, this one prevents re-creating deleted
      mappings in the kernel image range on x86-64, if those got removed
      previously. This, together with the original changes, prevents temporarily
      having inconsistent mappings when cacheability attributes are being
      changed on such pages (e.g. from AGP code). While on i386 such duplicate
      mappings don't exist, the same change is done there, too, both for
      consistency and because checking pte_present() before using various other
      pte_XXX functions is a requirement anyway. At the same time, the i386 code
      is adjusted to use pte_huge() instead of open-coding this.
      
      AK: split out cpa() changes
      
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86: Improve handling of kernel mappings in change_page_attr · d01ad8dd
      Jan Beulich authored
      
      
      Fix various broken corner cases in i386 and x86-64 change_page_attr.
      
      AK: split off from tighten kernel image access rights
      
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: rationalize paravirt wrappers · 90a0a06a
      Rusty Russell authored
      
      
      paravirt.c used to implement native versions of all low-level
      functions.  Far cleaner is to have the native versions exposed in the
      headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply
      #define XXX native_XXX.
      
      There are several nice side effects:
      
      1) write_dt_entry() now takes the correct "struct Xgt_desc_struct *"
         not "void *".
      
      2) load_TLS is reintroduced to the for loop, not manually unrolled
         with a #error in case the bounds ever change.
      
      3) Macros become inlines, with type checking.
      
      4) Access to the native versions is trivial for KVM, lguest, Xen and
         others who might want it.
      
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Avi Kivity <avi@qumranet.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
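The `native_XXX` + `#define` pattern can be shown with a non-asm stand-in (a real kernel version would wrap privileged instructions; `thing` here is purely hypothetical). The typed inline gives type checking, the macro gives the zero-cost !CONFIG_PARAVIRT path, and KVM/lguest/Xen can call the `native_` names directly.

```c
/* Stand-in for some piece of privileged hardware state. */
static unsigned long hardware_reg;

/* Native implementations exposed as typed inlines in the header. */
static inline void native_write_thing(unsigned long v) { hardware_reg = v; }
static inline unsigned long native_read_thing(void) { return hardware_reg; }

/* Without CONFIG_PARAVIRT, the generic name is simply the native one. */
#ifndef CONFIG_PARAVIRT
#define write_thing native_write_thing
#define read_thing  native_read_thing
#endif
```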
    • [PATCH] i386: clean up cpu_init() · d2cbcc49
      Rusty Russell authored
      
      
      We now have cpu_init() and secondary_cpu_init() doing nothing but calling
      _cpu_init() with the same arguments.  Rename _cpu_init() to cpu_init() and use
      it as a replacement for secondary_cpu_init().
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: Use per-cpu GDT immediately upon boot · bf504672
      Rusty Russell authored
      
      
      Now we are no longer dynamically allocating the GDT, we don't need the
      "cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the
      per-cpu GDT.  This means initializing the cpu_gdt array in C.
      
      The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it
      switches to the per-cpu copy just allocated.  For secondary CPUs, the
      early_gdt_descr is set to point directly to their per-cpu copy.
      
      For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP,
      but we never have to move.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: Use per-cpu variables for GDT, PDA · ae1ee11b
      Rusty Russell authored
      
      
      Allocating PDA and GDT at boot is a pain.  Using simple per-cpu variables adds
      happiness (although we need the GDT page-aligned for Xen, which we do in a
      followup patch).
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: Allow i386 crash kernels to handle x86_64 dumps · 79e03011
      Ian Campbell authored
      
      
      The specific case I am encountering is kdump under Xen with a 64 bit
      hypervisor and 32 bit kernel/userspace.  The dump created is 64 bit due to
      the hypervisor but the dump kernel is 32 bit for maximum compatibility.
      
      It's possibly less likely to be useful in a purely native scenario but I
      see no reason to disallow it.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: Horms <horms@verge.net.au>
      Cc: Magnus Damm <magnus.damm@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: Introduce load_TLS to the "for" loop. · eab0c72a
      Rusty Russell authored
      
      
      GCC (4.1 at least) unrolls it anyway, but I can't believe this code
      was ever justifiable.  (I've also submitted a patch which cleans up
      i386, which is even uglier).
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
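The change replaces hand-unrolled GDT entry copies with a plain loop. A reduced sketch (descriptor layout and slot numbers simplified for illustration; the real load_TLS copies three TLS descriptors from the thread into the CPU's GDT):

```c
#include <stdint.h>

#define GDT_ENTRY_TLS_MIN     12	/* illustrative slot */
#define GDT_ENTRY_TLS_ENTRIES 3

struct desc_struct { uint32_t a, b; };

struct thread { struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; };

/* A loop instead of manual unrolling: GCC unrolls it anyway, and the
 * bounds no longer need a #error guard if they ever change. */
static void load_TLS(const struct thread *t, struct desc_struct *gdt)
{
	for (int i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++)
		gdt[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i];
}
```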
    • [PATCH] i386: Initialize esp0 properly all the time · 692174b9
      Rusty Russell authored
      
      
      Whenever we schedule, __switch_to calls load_esp0 which does:
      
      	tss->esp0 = thread->esp0;
      
      This is never initialized for the initial thread (ie "swapper"), so when we're
      scheduling that, we end up setting esp0 to 0.  This is fine: the swapper never
      leaves ring 0, so this field is never used.
      
      lguest, however, gets upset that we're trying to use an unmapped page as our
      kernel stack.  Rather than work around it there, let's initialize it.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: configurable fake numa node sizes · 8b8ca80e
      David Rientjes authored
      
      
      Extends the numa=fake x86_64 command-line option to allow for configurable
      node sizes.  These nodes can be used in conjunction with cpusets for coarse
      memory resource management.
      
      The old command-line option is still supported:
        numa=fake=32	gives 32 fake NUMA nodes, ignoring the NUMA setup of the
      		actual machine.
      
      But now you may configure your system for the node sizes of your choice:
        numa=fake=2*512,1024,2*256
      		gives two 512M nodes, one 1024M node, two 256M nodes, and
      		the rest of system memory to a sixth node.
      
      The existing hash function is maintained to support the various node sizes
      that are possible with this implementation.
      
      Each node of the same size receives roughly the same amount of available
      pages, regardless of any reserved memory within its address range.  The total
      available pages on the system is calculated and divided by the number of equal
      nodes to allocate.  These nodes are then dynamically allocated and their
      borders extended until such time as their number of available pages reaches
      the required size.
      
      Configurable node sizes are recommended when used in conjunction with cpusets
      for memory control because it eliminates the overhead associated with scanning
      the zonelists of many smaller full nodes on page_alloc().
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
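The extended syntax is each comma-separated entry being either SIZE or COUNT*SIZE (in MB). Below is a hypothetical parser illustrating that grammar; it is a sketch of the option format, not the kernel's parsing code, and it only collects the explicit sizes (the "rest of memory" node the commit describes is handled elsewhere).

```c
#include <stdlib.h>

/* Parse "2*512,1024,2*256" into an array of node sizes (MB).
 * Returns the number of nodes written, capped at max. */
static int parse_fake_nodes(const char *s, unsigned long *sizes, int max)
{
	int n = 0;
	while (*s && n < max) {
		char *end;
		unsigned long count = 1, size = strtoul(s, &end, 10);
		if (*end == '*') {		/* COUNT*SIZE form */
			count = size;
			size = strtoul(end + 1, &end, 10);
		}
		while (count-- && n < max)
			sizes[n++] = size;	/* one node per repetition */
		if (*end != ',')
			break;
		s = end + 1;
	}
	return n;
}
```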
    • [PATCH] x86: Log reason why TSC was marked unstable · 5a90cf20
      John Stultz authored
      
      
      Change mark_tsc_unstable() so it takes a string argument, which holds the
      reason the TSC was marked unstable.
      
      This is then displayed the first time mark_tsc_unstable is called.
      
      This should help us better debug why the TSC was marked unstable on certain
      systems and allow us to make sure we're not being overly paranoid when
      throwing out this troublesome clocksource.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
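The interface change is small enough to sketch directly: the function gains a reason string and prints it only on the first call. The message wording here is illustrative, not the exact kernel log line.

```c
#include <stdio.h>

static int tsc_unstable;

/* Record that the TSC is unusable as a clocksource, logging why
 * the first time this happens. */
static void mark_tsc_unstable(const char *reason)
{
	if (!tsc_unstable) {
		tsc_unstable = 1;
		printf("Marking TSC unstable due to: %s\n", reason);
	}
}
```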
    • [PATCH] i386: modpost apic related warning fixes · 1833d6bc
      Vivek Goyal authored
      
      
      o Modpost generates warnings for i386 if compiled with CONFIG_RELOCATABLE=y
      
      WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0101eda) and 'enable_apic_mode'
      WARNING: vmlinux - Section mismatch: reference to .init.text:acpi_get_table_header_early from .text between 'acpi_madt_oem_check' (at offset 0xc0101ef0) and 'enable_apic_mode'
      WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'acpi_madt_oem_check' (at offset 0xc0101f2e) and 'enable_apic_mode'
      WARNING: vmlinux - Section mismatch: reference to .init.text:setup_unisys from .text between 'acpi_madt_oem_check' (at offset 0xc0101f37) and 'enable_apic_mode'WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'mps_oem_check' (at offset 0xc0101ec7) and 'acpi_madt_oem_check'
      WARNING: vmlinux - Section mismatch: reference to .init.text:es7000_sw_apic from .text between 'enable_apic_mode' (at offset 0xc0101f48) and 'check_apicid_present'
      
      o Some functions which are inline (acpi_madt_oem_check) are not inlined by
        compiler as these functions are accessed using function pointer. These
        functions are put in the .text section and they in turn access __init-type
        functions hence modpost generates warnings.
      
      o Do not inline acpi_madt_oem_check; instead make it __init.
      
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: Set HASHDIST_DEFAULT to 1 for x86_64 NUMA · e073ae1b
      Ravikiran G Thirumalai authored
      
      
      Enable system hashtable memory to be distributed among nodes on x86_64 NUMA
      
      Forcing the kernel to use node interleaved vmalloc instead of bootmem for
      the system hashtable memory (alloc_large_system_hash) reduces the memory
      imbalance on node 0 by around 40MB on a 8 node x86_64 NUMA box:
      
      Before the following patch, on bootup of a 8 node box:
      
      Node 0 MemTotal:      3407488 kB
      Node 0 MemFree:       3206296 kB
      Node 0 MemUsed:        201192 kB
      Node 0 Active:           7012 kB
      Node 0 Inactive:          512 kB
      Node 0 Dirty:               0 kB
      Node 0 Writeback:           0 kB
      Node 0 FilePages:        1912 kB
      Node 0 Mapped:            420 kB
      Node 0 AnonPages:        5612 kB
      Node 0 PageTables:        468 kB
      Node 0 NFS_Unstable:        0 kB
      Node 0 Bounce:              0 kB
      Node 0 Slab:             5408 kB
      Node 0 SReclaimable:      644 kB
      Node 0 SUnreclaim:       4764 kB
      
      After the patch (or using hashdist=1 on the kernel command line):
      
      Node 0 MemTotal:      3407488 kB
      Node 0 MemFree:       3247608 kB
      Node 0 MemUsed:        159880 kB
      Node 0 Active:           3012 kB
      Node 0 Inactive:          616 kB
      Node 0 Dirty:               0 kB
      Node 0 Writeback:           0 kB
      Node 0 FilePages:        2424 kB
      Node 0 Mapped:            380 kB
      Node 0 AnonPages:        1200 kB
      Node 0 PageTables:        396 kB
      Node 0 NFS_Unstable:        0 kB
      Node 0 Bounce:              0 kB
      Node 0 Slab:             6304 kB
      Node 0 SReclaimable:     1596 kB
      Node 0 SUnreclaim:       4708 kB
      
      I guess it is a good idea to keep HASHDIST_DEFAULT "on" for x86_64 NUMA
      since x86_64 has no dearth of vmalloc space?  Or maybe enable hash
      distribution for all 64bit NUMA arches?  The following patch does it only
      for x86_64.
      
      I ran a HPC MPI benchmark -- 'Ansys wingsolid', which takes up quite a bit of
      memory and uses up tlb entries.  This was on a 4 way, 2 socket
      Tyan AMD box (non-vsmp), with 8G total memory (4G per node).
      
      The results with and without hash distribution are:
      
      1. Vanilla - runtime of 1188.000s
      2. With hashdist=1 runtime of 1154.000s
      
      Oprofile output for the duration of run is:
      
      1. Vanilla:
      CPU: AMD64 processors, speed 2411.16 MHz (estimated)
      Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
      mask of 0x00 (No unit mask) count 500
      samples  %        app name                 symbol name
      163054    6.5513  libansys1.so             MultiFront::decompose(int, int,
      Elemset *, int *, int, int, int)
      162061    6.5114  libansys3.so             blockSaxpy6L_fd
      162042    6.5107  libansys3.so             blockInnerProduct6L_fd
      156286    6.2794  libansys3.so             maxb33_
      87879     3.5309  libansys1.so             elmatrixmultpcg_
      84857     3.4095  libansys4.so             saxpy_pcg
      58637     2.3560  libansys4.so             .st4560
      46612     1.8728  libansys4.so             .st4282
      43043     1.7294  vmlinux-t                copy_user_generic_string
      41326     1.6604  libansys3.so             blockSaxpyBackSolve6L_fd
      41288     1.6589  libansys3.so             blockInnerProductBackSolve6L_fd
      
      2. With hashdist=1
      CPU: AMD64 processors, speed 2411.13 MHz (estimated)
      Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
      mask of 0x00 (No unit mask) count 500
      samples  %        app name                 symbol name
      162993    6.9814  libansys1.so             MultiFront::decompose(int, int,
      Elemset *, int *, int, int, int)
      160799    6.8874  libansys3.so             blockInnerProduct6L_fd
      160459    6.8729  libansys3.so             blockSaxpy6L_fd
      156018    6.6826  libansys3.so             maxb33_
      84700     3.6279  libansys4.so             saxpy_pcg
      83434     3.5737  libansys1.so             elmatrixmultpcg_
      58074     2.4875  libansys4.so             .st4560
      46000     1.9703  libansys4.so             .st4282
      41166     1.7632  libansys3.so             blockSaxpyBackSolve6L_fd
      41033     1.7575  libansys3.so             blockInnerProductBackSolve6L_fd
      35762     1.5318  libansys1.so             inner_product_sub
      35591     1.5245  libansys1.so             inner_product_sub2
      28259     1.2104  libansys4.so             addVectors
      
      Signed-off-by: Pravin B. Shelar <pravin.shelar@calsoftinc.com>
      Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: Shai Fultheim <shai@scalex86.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: fix x86_64-mm-sched-clock-share · 184c44d2
      Andrew Morton authored
      
      
      Fix for the following patch. Provide dummy cpufreq functions when
      CPUFREQ is not compiled in.
      
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: build-time checking · 6a50a664
      Vivek Goyal authored
      
      
      o The x86_64 kernel should run from a 2MB-aligned address for two reasons.
      	- Performance.
      	- For relocatable kernels, page tables are updated based on difference
      	  between compile time address and load time physical address.
      	  This difference should be multiple of 2MB as kernel text and data
      	  is mapped using 2MB pages and PMD should be pointing to a 2MB
      	  aligned address. Life is simpler if both compile time and load time
      	  kernel addresses are 2MB aligned.
      
      o Flag the error at compile time if one tries to build a kernel that
        does not meet the alignment restriction.
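
The check can be sketched in C preprocessor terms (the real patch does this in the x86-64 build glue; the constant value and helper below are illustrative):

```c
/* Sketch of a build-time 2MB alignment check.  CONFIG_PHYSICAL_START is
 * the compile-time load address; 0x200000 (2MB) is the PMD mapping
 * granularity.  The value chosen here is only an example. */
#include <stdint.h>

#define LARGE_PAGE_SIZE       0x200000UL  /* 2MB */
#define CONFIG_PHYSICAL_START 0x200000UL  /* example value */

#if (CONFIG_PHYSICAL_START % 0x200000) != 0
#error "CONFIG_PHYSICAL_START must be 2MB aligned"
#endif

/* Runtime equivalent, for illustration: does an address satisfy
 * the restriction? */
static int is_2mb_aligned(uint64_t phys)
{
    return (phys & (LARGE_PAGE_SIZE - 1)) == 0;
}
```

With a misaligned `CONFIG_PHYSICAL_START` the `#error` fires and the build stops, which is exactly the "flag it at compile time" behaviour the patch wants.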
      
      Signed-off-by: default avatarVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      6a50a664
    • Vivek Goyal's avatar
      [PATCH] x86-64: Relocatable Kernel Support · 1ab60e0f
      Vivek Goyal authored
      
      
      This patch modifies the x86_64 kernel so that it can be loaded and run
      at any 2M aligned address, below 512G.  The technique used is to
      compile the decompressor with -fPIC and modify it so the decompressor
      is fully relocatable.  For the main kernel the page tables are
      modified so the kernel remains at the same virtual address.  In
      addition, a variable phys_base is kept that holds the physical address
      the kernel is loaded at.  __pa_symbol is modified to add that offset
      when we take the address of a kernel symbol.
      
      When loaded with a normal bootloader, the decompressor will decompress
      the kernel to 2M and it will run there.  This both ensures the
      relocation code is always exercised, and makes it easier to use 2M
      pages for the kernel and the cpu.
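
The phys_base bookkeeping described above can be sketched as follows. The constant matches the x86-64 kernel-text mapping base of that era, but the phys_base value and the helper name are illustrative:

```c
/* Sketch of the relocation bookkeeping: the kernel keeps its compile-time
 * virtual address, and phys_base records where it actually landed in
 * physical memory.  __pa_symbol-style translation adds that offset. */
#include <stdint.h>

#define __START_KERNEL_map 0xffffffff80000000UL  /* kernel text virtual base */

/* Example: early boot code found the kernel loaded 2MB above its
 * compile-time physical address. */
static uint64_t phys_base = 0x200000UL;

static uint64_t pa_symbol(uint64_t vaddr)
{
    /* symbol virtual address -> physical: strip the text-mapping base,
     * then add the load-time offset */
    return vaddr - __START_KERNEL_map + phys_base;
}
```

When the kernel is loaded exactly at its compile-time address, phys_base is 0 and the translation degenerates to the non-relocatable case.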
      
      AK: changed to not make RELOCATABLE default in Kconfig
      
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      1ab60e0f
    • Vivek Goyal's avatar
      [PATCH] x86: __pa and __pa_symbol address space separation · 0dbf7028
      Vivek Goyal authored
      
      
      Currently __pa_symbol is for use with symbols in the kernel address
      map and __pa is for use with pointers into the physical memory map.
      But the code is implemented so you can usually interchange the two.
      
      __pa, which is much more common, can be implemented much more cheaply
      if it doesn't have to worry about any other kernel address
      spaces.  This is especially true with a relocatable kernel, as
      __pa_symbol needs to perform an extra variable read to resolve
      the address.
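
The cost difference can be sketched with the two translations side by side. The mapping bases below match the x86-64 layout of that era, but both helpers are written as plain functions purely for illustration:

```c
/* Sketch of the __pa / __pa_symbol separation: __pa only handles the
 * direct physical-memory mapping (one subtraction), while __pa_symbol
 * handles the kernel-text mapping and must also read phys_base on a
 * relocated kernel -- the "extra variable read" mentioned above. */
#include <stdint.h>

#define PAGE_OFFSET        0xffff810000000000UL  /* direct-mapping base */
#define __START_KERNEL_map 0xffffffff80000000UL  /* kernel-text mapping base */

static uint64_t phys_base = 0;  /* load-time relocation offset */

/* cheap: for pointers into the direct physical-memory map */
static uint64_t pa(uint64_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}

/* dearer: for symbols in the kernel image; needs the phys_base read */
static uint64_t pa_symbol(uint64_t vaddr)
{
    return vaddr - __START_KERNEL_map + phys_base;
}
```

Keeping the two distinct is what lets `__pa` stay a single subtraction even on a relocated kernel.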
      
      There is a third macro that is added for the vsyscall data
      __pa_vsymbol for finding the physical addesses of vsyscall pages.
      
      Most of this patch is simply sorting through the references to
      __pa or __pa_symbol and using the proper one.  A little of
      it is continuing to use a physical address when we have it
      instead of recalculating it several times.
      
      swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
      and init_mm.pgd is initialized at boot (instead of compile time)
      to the physmem virtual mapping of init_level4_pgd.  The
      physical address changed.
      
      Except for the EMPTY_ZERO page, all of the remaining references
      to __pa_symbol appear to be during kernel initialization.  So this
      should reduce the cost of __pa in the common case, even on a relocated
      kernel.
      
      As this is technically a semantic change we need to be on the lookout
      for anything I missed.  But it works for me (tm).
      
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      0dbf7028
    • Vivek Goyal's avatar
      [PATCH] x86-64: Remove the identity mapping as early as possible · cfd243d4
      Vivek Goyal authored
      
      
      With the rewrite of the SMP trampoline and the early page
      allocator there is nothing that needs identity mapped pages,
      once we start executing C code.
      
      So add zap_identity_mappings() in head64.c and remove
      zap_low_mappings() from much later in the code.  The functions
      are subtly different, hence the name change.
      
      This also kills boot_level4_pgt which was from an earlier
      attempt to move the identity mappings as early as possible,
      and is now no longer needed.  Essentially I have replaced
      boot_level4_pgt with trampoline_level4_pgt in trampoline.S
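
Conceptually, zapping the identity mappings means clearing the low half of the top-level page table once C code is running. The sketch below models a PML4 as a plain array; the names and sizes mirror the real layout, but the demo setup is illustrative:

```c
/* Sketch of what zap_identity_mappings() does conceptually: drop the
 * low (identity, virtual == physical) entries of the top-level page
 * table, keeping only the high kernel-half mappings. */
#include <stdint.h>
#include <stddef.h>

#define PTRS_PER_PGD     512
#define PGD_KERNEL_START (PTRS_PER_PGD / 2)  /* kernel half starts here */

static void zap_identity_mappings(uint64_t pgd[PTRS_PER_PGD])
{
    /* early boot only ever installed the identity mapping in the
     * low half, so clear every entry there */
    for (size_t i = 0; i < PGD_KERNEL_START; i++)
        pgd[i] = 0;
    /* a real kernel would flush the TLB here */
}

/* demo: install one identity entry and one kernel entry, zap, and
 * check that only the identity entry is gone */
static int demo(void)
{
    uint64_t pgd[PTRS_PER_PGD] = {0};
    pgd[0]   = 0x1003;  /* identity mapping, low half */
    pgd[256] = 0x2003;  /* kernel mapping, high half */
    zap_identity_mappings(pgd);
    return pgd[0] == 0 && pgd[256] == 0x2003;
}
```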
      
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      cfd243d4
    • Vivek Goyal's avatar
      [PATCH] x86-64: wakeup.S rename registers to reflect right names · 7db681d7
      Vivek Goyal authored
      
      
      o Use appropriate names for 64-bit registers.
      
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      7db681d7
    • Vivek Goyal's avatar
      [PATCH] x86-64: Add EFER to the register set saved by save_processor_state · 3c321bce
      Vivek Goyal authored
      
      
      EFER varies like %cr4 depending on the cpu capabilities, and on which
      cpu capabilities we want to make use of.  So save/restore it to make
      certain we have the same EFER value when we are done.
      
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      3c321bce
    • Vivek Goyal's avatar
      [PATCH] x86-64: cleanup segments · 30f47289
      Vivek Goyal authored
      
      
      Move __KERNEL32_CS up into the unused gdt entry.  __KERNEL32_CS is
      used when entering the kernel so putting it first is useful when
      trying to keep boot gdt sizes to a minimum.
      
      Set the accessed bit on all gdt entries.  We don't care about it,
      so there is no need for the cpu to burn the extra cycles,
      and it potentially allows the pages to be immutable.  Plus it is
      confusing when debugging and your gdt entries mysteriously
      change.
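
The mechanics behind this: on first use of a segment whose accessed bit is clear, the CPU sets it, which means a write to an otherwise read-only GDT. Pre-setting the bit (access byte 0x9b instead of 0x9a for kernel code) avoids that. The descriptor values below follow the classic x86-64 flat-segment layout and are given for illustration:

```c
/* Sketch of the accessed bit in a GDT descriptor.  The accessed bit is
 * type bit 0, i.e. bit 40 of the 8-byte descriptor; a 64-bit kernel
 * code segment with it set has access byte 0x9b rather than 0x9a. */
#include <stdint.h>

#define GDT_ACCESSED_BIT (1ULL << 40)  /* type bit 0 of the descriptor */

/* 64-bit flat kernel code segment, accessed bit already set */
static const uint64_t kernel_cs_desc = 0x00af9b000000ffffULL;

static int accessed_bit_set(uint64_t desc)
{
    return (desc & GDT_ACCESSED_BIT) != 0;
}
```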
      
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      30f47289