- Apr 28, 2005
David Mosberger-Tang authored
Using stf8 seemed like a clever idea at the time, but stf8 forces the cache line to be invalidated in the L1D (if it happens to be there already). This patch eliminates a guaranteed L1D cache miss and, by itself, is good for a 1-2 cycle improvement for heavy-weight syscalls.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
David Mosberger-Tang authored
Why is this a good idea? Clearing b7 to 0 is guaranteed to do us no good, whereas writing it with __kernel_syscall_via_epc() yields a 6-cycle improvement _if_ the application performs another EPC-based system call without overwriting b7, which is not all that uncommon. That is well worth the minimal cost of 1 bundle of code.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
David Mosberger-Tang authored
Decreases syscall overhead by approximately 6 cycles.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
David Mosberger-Tang authored
By itself, this is good for a 1-2 cycle speedup. The effect is bigger when combined with the later patches.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
David Mosberger-Tang authored
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
- Apr 25, 2005
Kenji Kaneshige authored
The vector sharing patch had a typo: a spin_lock() was mismatched with a spin_unlock_irq(). Fix from Kenji Kaneshige.
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Tony Luck authored
Rohit and Suresh changed their minds about the order in which to print things in /proc/cpuinfo, but didn't include the change in the version of the patch they sent to me.
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Kenji Kaneshige authored
Current ia64 Linux cannot handle more than 184 interrupt sources because it runs out of vectors. The following patch enables ia64 Linux to handle more than 184 interrupt sources by allowing the same vector number to be shared by multiple IOSAPIC RTEs. The design of this patch is based on the "Intel(R) Itanium(R) Processor Family Interrupt Architecture Guide". Even if you don't have a large I/O system, you can observe vector sharing by setting IOSAPIC_LAST_DEVICE_VECTOR to a smaller value.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
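To make the idea of vector sharing concrete, here is a small user-space model. It is a sketch under stated assumptions, not the IOSAPIC code: every name (`rte_sketch`, `assign_vector`, `dispatch`, `NR_DEVICE_VECTORS`) is hypothetical, the tiny vector count mimics lowering IOSAPIC_LAST_DEVICE_VECTOR, and the "share the least-loaded vector" policy is an assumption. The point it shows is that once a vector is shared, the vector alone no longer identifies the source, so every RTE on that vector gets a chance to handle the interrupt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: only 3 device vectors, as if
 * IOSAPIC_LAST_DEVICE_VECTOR had been lowered. */
#define NR_DEVICE_VECTORS 3
#define MAX_RTES 8

struct rte_sketch {             /* one redirection table entry */
    int gsi;                    /* global system interrupt number */
    int fired;                  /* how often this source was serviced */
    struct rte_sketch *next;    /* next RTE sharing the same vector */
};

static struct rte_sketch rtes[MAX_RTES];
static struct rte_sketch *vector_list[NR_DEVICE_VECTORS]; /* RTEs per vector */
static int vector_users[NR_DEVICE_VECTORS];               /* RTEs per vector */

/* Assign a vector to RTE slot idx. A free vector is used if one exists;
 * otherwise the least-loaded vector is shared (policy is an assumption). */
static int assign_vector(int idx, int gsi)
{
    int best = 0;

    assert(idx >= 0 && idx < MAX_RTES);
    for (int v = 1; v < NR_DEVICE_VECTORS; v++)
        if (vector_users[v] < vector_users[best])
            best = v;

    rtes[idx].gsi = gsi;
    rtes[idx].fired = 0;
    rtes[idx].next = vector_list[best];   /* push onto the vector's list */
    vector_list[best] = &rtes[idx];
    vector_users[best]++;
    return best;
}

/* An interrupt on a shared vector services every RTE attached to it. */
static void dispatch(int vector)
{
    for (struct rte_sketch *r = vector_list[vector]; r != NULL; r = r->next)
        r->fired++;
}
```

With three vectors, the fourth GSI registered lands on vector 0 next to the first one, and an interrupt on vector 0 then reaches both sources.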
-
Suresh Siddha authored
Version 3 - rediffed to apply on top of Ashok's hotplug CPU patch. /proc/cpuinfo output is in step with x86.
This is an updated MC/MT identification patch based on the previous discussions on the list. Add Multi-core and Multi-threading detection for IPF:
- Add new core and threading related fields to /proc/cpuinfo: Physical id, Core id, Thread id, Siblings
- Set up cpu_core_map and cpu_sibling_map appropriately
- Handle hotplug CPU
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Gordon Jin <gordon.jin@intel.com>
Signed-off-by: Rohit Seth <rohit.seth@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
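One way to picture how the two maps follow from the per-CPU ids is the sketch below. It assumes the x86 convention the patch says it follows: cpu_sibling_map holds the threads of the same core (same physical id and core id), cpu_core_map holds all CPUs in the same package (same physical id), and the "Siblings" field is the number of CPUs in cpu_core_map. Everything here (`cpu_topo`, `build_maps`, `remove_cpu_from_maps`, 64-bit masks instead of cpumask_t) is hypothetical and simplified.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8    /* hypothetical; bit i of a mask = CPU i */

struct cpu_topo {           /* ids as shown in /proc/cpuinfo */
    int physical_id;        /* package */
    int core_id;            /* core within the package */
    int thread_id;          /* thread within the core */
};

static uint64_t sibling_map[NR_CPUS_SKETCH]; /* threads of the same core */
static uint64_t core_map[NR_CPUS_SKETCH];    /* CPUs of the same package */

static void build_maps(const struct cpu_topo *t, int ncpus)
{
    assert(ncpus <= NR_CPUS_SKETCH);
    for (int i = 0; i < ncpus; i++) {
        sibling_map[i] = core_map[i] = 0;
        for (int j = 0; j < ncpus; j++) {
            if (t[i].physical_id != t[j].physical_id)
                continue;                      /* different package */
            core_map[i] |= UINT64_C(1) << j;
            if (t[i].core_id == t[j].core_id)
                sibling_map[i] |= UINT64_C(1) << j;
        }
    }
}

/* Hotplug removal: drop the CPU from every map. */
static void remove_cpu_from_maps(int cpu, int ncpus)
{
    for (int j = 0; j < ncpus; j++) {
        sibling_map[j] &= ~(UINT64_C(1) << cpu);
        core_map[j] &= ~(UINT64_C(1) << cpu);
    }
}

/* "Siblings" field: number of CPUs sharing this CPU's package. */
static int siblings(int cpu)
{
    int n = 0;
    for (uint64_t m = core_map[cpu]; m != 0; m &= m - 1)
        n++;                                   /* clear lowest set bit */
    return n;
}
```

For example, with CPUs 0 and 1 as two threads of core 0, CPU 2 as core 1 of the same package, and CPU 3 in a second package, CPU 0 has two thread siblings and three CPUs in its package until CPU 1 is removed.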
-
David Mosberger-Tang authored
Sadly, I goofed in this syscall-tuning patch:
ChangeSet 1.1966.1.40 2005/01/22 13:31:05 davidm@hpl.hp.com
[IA64] Improve ia64_leave_syscall() for McKinley-type cores.
Optimize ia64_leave_syscall() a bit better for McKinley-type cores. The patch looks big, but that's mostly due to renaming r16/r17 to r2/r3. Good for a 13 cycle improvement.
The problem is that the size of the physical stacked registers was loaded into the wrong register (r3 instead of r17). Since r17 by coincidence always had the value 1, this had the effect of turning rse_clear_invalid into a no-op. That poses the risk of leaking kernel state back to user-land and is hence not acceptable.
The fix below is simple, but unfortunately it costs us about 28 cycles in syscall overhead. ;-( There isn't much we can do about that, since those registers have to be cleared one way or another.
--david
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Stephane Eranian authored
- make pfm_sysctl a global such that it is possible to enable/disable debug printk in sampling formats using PFM_DEBUG.
- remove unused pfm_debug_var variable
- fix a bug in pfm_handle_work where a BUG_ON() could be triggered. There is a path where pfm_handle_work() can be called with interrupts enabled, i.e., when TIF_NEED_RESCHED is set. The fix corrects the masking and unmasking of interrupts in pfm_handle_work() such that we restore the interrupt mask as it was upon entry.
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
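The save-and-restore discipline described in the last item can be modelled outside the kernel as follows. This is only an illustration of the pattern, not pfm_handle_work() itself: the global `irqs_enabled` flag simulates the CPU's interrupt-enable bit, and `sketch_irq_save()`/`sketch_irq_restore()` are hypothetical stand-ins for the kernel's local_irq_save()/local_irq_restore(). Because the handler restores the state it found instead of assuming interrupts were off on entry, it behaves correctly on both paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated interrupt-enable state of the current CPU (hypothetical). */
static bool irqs_enabled = false;

/* Stand-in for local_irq_save(): record the state, then mask. */
static bool sketch_irq_save(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    return flags;
}

/* Stand-in for local_irq_restore(): return to the recorded state. */
static void sketch_irq_restore(bool flags)
{
    irqs_enabled = flags;
}

/* Hypothetical handler that may be entered with interrupts on or off. */
static void handle_work_sketch(void)
{
    bool flags = sketch_irq_save();   /* remember the state on entry */

    assert(!irqs_enabled);            /* critical work runs masked */
    /* ... work that needs interrupts masked ... */

    sketch_irq_restore(flags);        /* leave as we found it */
}
```

Calling the handler once with interrupts enabled and once with them disabled leaves the flag unchanged in both cases.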
-
David Mosberger-Tang authored
Recently I noticed that clearing ar.ssd/ar.csd right before srlz.d is causing significant stalling in the syscall path. The patch below fixes that by moving the register-writes after srlz.d. On a Madison, this drops break-based getpid() from 241 to 226 cycles (-15 cycles).
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Keith Owens authored
Detect user space by the unwind frame with predicate PRED_USER_STACK set, instead of a user space IP. Tighten up the last-ditch check for running off the top of the kernel stack. Based on a suggestion by David Mosberger, reworked to fit the current tree. This survives my stress test, which used to break 2.6.9 kernels. Unlike 2.6.11, the stress test now unwinds to the correct point, so gdb can get the user space registers.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
David Mosberger-Tang authored
Call cpu_relax() in busy-waiting loops of the ITC-syncing code.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
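The shape of a busy-wait loop that calls cpu_relax() on every iteration is sketched below. This is a hypothetical, single-threaded model rather than the ITC-syncing code: `cpu_relax_sketch()` is only a C11 compiler barrier plus a counter (the kernel's cpu_relax() is architecture-specific), and the `max_spins` bound is an addition so the example terminates when nobody sets the flag. In the real code another CPU sets the flag while this one spins.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int go;               /* flag another CPU would set */
static unsigned long relax_calls;   /* counts relax hints, for the demo */

/* Hypothetical stand-in for cpu_relax(): a compiler barrier only. */
static void cpu_relax_sketch(void)
{
    relax_calls++;
    atomic_signal_fence(memory_order_seq_cst);
}

/* Spin until *flag is nonzero, relaxing the CPU on each iteration.
 * Returns 1 if the flag was seen, 0 after max_spins attempts. */
static int wait_for_flag(atomic_int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire))
            return 1;
        cpu_relax_sketch();         /* be polite while polling */
    }
    return 0;
}
```

Polling a flag that stays clear for five attempts issues five relax hints and gives up; once the flag is set, the loop returns on the first check without relaxing.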
-
- Apr 22, 2005
Ashok Raj authored
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Ashok Raj authored
This patch is required to support CPU removal for IPF systems. The existing code just fakes a real offline: it keeps the CPU running the idle thread, polling for its bit to reappear in cpu_state so it can leave the idle loop. For CPU offline to work correctly, we need to pass control of this CPU back to SAL so it can continue in the boot-rendez mode. This lets SAL avoid picking this CPU as the monarch processor for global MCA events, and SAL also does not wait for this CPU to check in for global MCA events. The handoff is implemented as documented in SAL specification section 3.2.5.1 "OS_BOOT_RENDEZ to SAL return State".
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
- Apr 16, 2005
Linus Torvalds authored
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
-