- Feb 11, 2009
Milton Miller authored
find_min_common_depth() was checking the property length incorrectly. The value is in bytes, not cells, and it is the second entry that is used.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
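As a minimal sketch of the corrected check (assuming the usual device-tree accessors; the function body is illustrative, not the literal patch):

```c
#include <linux/of.h>

/* of_get_property() reports the length in BYTES, so "at least two
 * entries" must be checked against 2 * sizeof(cell), and it is the
 * second cell that is actually consumed. */
static int find_min_common_depth_sketch(struct device_node *rtas_root)
{
	const unsigned int *ref_points;
	int len;	/* a byte count, not a cell count */

	ref_points = of_get_property(rtas_root,
			"ibm,associativity-reference-points", &len);
	if (!ref_points || len < 2 * sizeof(unsigned int))
		return -1;

	return ref_points[1];	/* the second entry */
}
```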
-
- Feb 10, 2009
Kumar Gala authored
Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t instead of unsigned long. In 36-bit physical mode we really need these functions to deal with phys_addr_t when trying to match a physical address or when returning one.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
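As a sketch, only the types in the prototypes change; the matching logic stays the same:

```c
#include <linux/types.h>

/* With 36-bit physical addressing, a 32-bit unsigned long silently
 * truncates any physical address above 4 GB, so both the matched and
 * the returned physical address must be phys_addr_t. */
phys_addr_t v_mapped_by_tlbcam(unsigned long va);  /* VA in, PA out */
unsigned long p_mapped_by_tlbcam(phys_addr_t pa);  /* PA in, VA out */
```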
-
- Jan 29, 2009
Trent Piepho authored
On booke processors, the code that maps low memory only uses up to three CAM entries, even though there are sixteen and nothing else uses them. Make this number configurable in the advanced options menu along with max low memory size. If one wants 1 GB of lowmem, then it's typically necessary to have four CAM entries.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
-
Trent Piepho authored
The code that maps kernel low memory would only use page sizes up to 256 MB. On E500v2, pages up to 4 GB are supported. However, a page must be aligned to a multiple of the page's size, i.e. 256 MB pages must be aligned to a 256 MB boundary. This was enforced by a requirement that the physical and virtual addresses of the start of lowmem be aligned to 256 MB. Clearly, requiring 1 GB or 4 GB alignment to allow pages of that size isn't acceptable. To solve this, I simply have adjust_total_lowmem() take alignment into account when it decides what size pages to use. Give it PAGE_OFFSET = 0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2 GB of RAM, and it will map pages like this:

PA 0x3000_0000 VA 0x7000_0000 Size 256 MB
PA 0x4000_0000 VA 0x8000_0000 Size 1 GB
PA 0x8000_0000 VA 0xC000_0000 Size 256 MB
PA 0x9000_0000 VA 0xD000_0000 Size 256 MB
PA 0xA000_0000 VA 0xE000_0000 Size 256 MB

Because the lowmem mapping code now takes alignment into account, PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB. Even lower might be possible. The lowmem code will work down to 4 kB, but it's possible some of the boot code will fail before then. Poor alignment will force small pages to be used, which, combined with the limited number of TLB1 pages available, will result in very little memory getting mapped. So alignments less than 64 MB probably aren't very useful anyway.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
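The selection logic reduces to a small, runnable sketch: pick the largest supported E500v2 page size (powers of 4 from 4 kB to 4 GB) that fits in the remaining area and divides both addresses. Names here are illustrative, not the kernel's; the inputs are the ones from the commit message:

```c
#include <stdio.h>

static unsigned long long pick_cam_size(unsigned long long virt,
					unsigned long long phys,
					unsigned long long remaining)
{
	unsigned long long size;

	/* E500v2 TLB1 page sizes: powers of 4, from 4 GB down to 4 kB */
	for (size = 1ULL << 32; size >= 4096; size >>= 2)
		if (size <= remaining &&
		    !(virt & (size - 1)) && !(phys & (size - 1)))
			return size;
	return 0;
}

int main(void)
{
	unsigned long long va = 0x70000000;	/* PAGE_OFFSET    */
	unsigned long long pa = 0x30000000;	/* PHYSICAL_START */
	unsigned long long left = 2ULL << 30;	/* 2 GB of RAM    */

	while (left) {
		unsigned long long sz = pick_cam_size(va, pa, left);

		if (!sz)
			break;
		printf("PA 0x%llx VA 0x%llx Size %llu MB\n",
		       pa, va, sz >> 20);
		va += sz;
		pa += sz;
		left -= sz;
	}
	return 0;
}
```

With those inputs it reproduces the five mappings listed above (the 1 GB entry printing as 1024 MB); the real code additionally has to respect the number of available CAM entries.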
-
Trent Piepho authored
The code to map lowmem uses three CAM aka TLB[1] entries to cover it. The size of each is stored in three globals named __cam0, __cam1, and __cam2. All the code that uses them is duplicated three times for each of the three variables. We have these things called arrays and loops.... Once converted to use an array, it will be easier to make the number of CAMs configurable.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
-
- Jan 28, 2009
Gerhard Pircher authored
_PAGE_COHERENT is now always set in _PAGE_RAM and, in turn, in PAGE_KERNEL. It therefore has to be masked out if the BAT mapping should be non-cacheable, or if CPU_FTR_NEED_COHERENT is not set. This will work on normal SMP setups because we force-set CPU_FTR_NEED_COHERENT as part of CPU_FTR_COMMON on SMP.
Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
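A hedged sketch of the masking logic this implies; the flag and feature names are the kernel's, the helper itself is illustrative:

```c
/* Strip _PAGE_COHERENT (the M bit) from BAT setup flags when it must
 * not be set: non-cacheable mappings, or CPUs without the
 * CPU_FTR_NEED_COHERENT feature. */
static unsigned long fixup_bat_flags(unsigned long flags)
{
	if ((flags & _PAGE_NO_CACHE) ||
	    !cpu_has_feature(CPU_FTR_NEED_COHERENT))
		flags &= ~_PAGE_COHERENT;

	return flags;
}
```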
-
- Jan 16, 2009
Dave Kleikamp authored
powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices

The subpage_prot syscall fails on second and subsequent calls for a given region, because is_hugepage_only_range() is mis-identifying the 4 kB slices when the process has a 64 kB page size.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- Jan 13, 2009
Ingo Molnar authored
Convert arch/powerpc/ over to long long based u64:

    -#ifdef __powerpc64__
    -# include <asm-generic/int-l64.h>
    -#else
    -# include <asm-generic/int-ll64.h>
    -#endif
    +#include <asm-generic/int-ll64.h>

This will avoid recurring spurious warnings in core kernel code that come up when people test on their own hardware (i.e. x86 in ~98% of the cases). This is what x86 uses, and it generally helps keep 64-bit code 32-bit clean too. [Adjusted to not impact user mode (from paulus) - sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- Jan 08, 2009
Anton Vorontsov authored
The clear_fixmap() routine calls map_page() with flags set to 0. Currently this triggers a BUG_ON() inside map_page(), as it assumes that a PTE should be clear before mapping. This patch makes map_page() trigger the BUG_ON() only if flags are set.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
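A minimal sketch of the relaxed assertion, assuming the usual pgtable helpers (an illustration of the described change, not the literal patch):

```c
/* Only insist on a clear PTE when real flags are being set;
 * clear_fixmap() legitimately passes flags == 0 because it is
 * tearing a mapping down, not creating one. */
static int map_page_sketch(pte_t *pg, unsigned long va,
			   phys_addr_t pa, int flags)
{
	BUG_ON(flags && !pte_none(*pg)); /* was: BUG_ON(!pte_none(*pg)) */

	set_pte_at(&init_mm, va, pg,
		   pfn_pte((unsigned long)(pa >> PAGE_SHIFT),
			   __pgprot(flags)));
	return 0;
}
```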
-
Benjamin Herrenschmidt authored
This is a brown-paper-bag fix for one of my earlier patches that broke the build on 40x and 8xx. And yes, I've now added 40x and 8xx to my list of test configs :-)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Dave Liu authored
Signed-off-by: Dave Liu <daveliu@freescale.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Dave Hansen authored
Both users of careful_allocation() immediately memset() the result. So, just do it in one place. Also give careful_allocation() a 'z' prefix to bring it in line with kzalloc() and friends.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Dave Hansen authored
Since we memset() the result in both of the uses here, just make careful_alloc() return a virtual address. Also, add a separate variable to store the physical address that comes back from the lmb_alloc() functions. This makes it less likely that someone will screw it up by forgetting to convert before returning, since the vaddr is always in a void * and the paddr is always in an unsigned long. I admit this is arbitrary, since one of its users needs a paddr and one a vaddr, but it does remove a good number of casts.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
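Putting these two commits together, the reshaped helper would look roughly like this sketch (names follow the commit text; the allocator call and the panic wording are assumptions):

```c
/* careful_zallocation() sketch: allocate near a node, convert the
 * physical address to a virtual one exactly once, and hand back
 * already-zeroed memory so callers can drop their memset()s. */
static void *careful_zallocation(int nid, unsigned long size,
				 unsigned long align,
				 unsigned long end_pfn)
{
	unsigned long ret_paddr;	/* the paddr stays in an unsigned long */
	void *ret;			/* the vaddr stays in a void *          */

	ret_paddr = __lmb_alloc_base(size, align, end_pfn << PAGE_SHIFT);
	if (!ret_paddr)
		panic("numa.c: cannot allocate %lu bytes on (hint) node %d",
		      size, nid);

	ret = __va(ret_paddr);
	memset(ret, 0, size);
	return ret;
}
```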
-
Dave Hansen authored
If we fail a bootmem allocation, the bootmem code itself panics. No need to redo it here. Also change the wording of the other panic. We don't strictly have to allocate memory on the specified node. It is just a hint and that node may not even *have* any memory on it. In that case we can and do fall back to other nodes.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Dave Hansen authored
The behavior in careful_allocation() really confused me at first. Add a comment to hopefully make it easier on the next doofus that looks at it.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- Jan 07, 2009
Trent Piepho authored
This is a global variable defined in fsl_booke_mmu.c with a value that gets initialized in assembly code in head_fsl_booke.S. It's never used. If some code ever does want to know the number of entries in TLB1, then "numcams = mfspr(SPRN_TLB1CFG) & 0xfff" is a whole lot simpler than a global initialized during kernel boot from assembly.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
-
Trent Piepho authored
Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam to 20 when it indexed the TLBCAM table. Anyone changing the size of struct tlbcam would not know to expect that. The kernel already has a system to get the size of C structures into assembly language files, asm-offsets, so let's use it. The definition of the struct gets moved to a header, so that asm-offsets.c can include it.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
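The asm-offsets pattern being adopted looks like this sketch; the symbol name and the header holding struct tlbcam are assumptions:

```c
/* in arch/powerpc/kernel/asm-offsets.c (sketch) */
#include <linux/kbuild.h>
#include <asm/mmu.h>	/* assumed new home of struct tlbcam */

int main(void)
{
	/* Emits "#define TLBCAM_SIZE <n>" for assembly to use in place
	 * of the hard-coded 20 when indexing the TLBCAM table. */
	DEFINE(TLBCAM_SIZE, sizeof(struct tlbcam));
	return 0;
}
```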
-
- Jan 06, 2009
Gary Hade authored
Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example:

/sys/devices/system/node/node1/memory135 -> ../../memory/memory135

indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change.

Immediate:
- Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out.
- Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly.
- Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes.

Future:
- Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
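For readers unfamiliar with the mechanism, a hedged sketch of how such a link is created with the driver-model API (the real call sites are in drivers/base/node.c; the helper and its arguments are illustrative):

```c
#include <linux/sysfs.h>
#include <linux/kobject.h>

/* Creates /sys/devices/system/node/nodeX/memoryY -> ../../memory/memoryY */
static int link_mem_section_to_node(struct kobject *node_kobj,
				    struct kobject *section_kobj,
				    unsigned long section_nr)
{
	char name[32];

	snprintf(name, sizeof(name), "memory%lu", section_nr);
	return sysfs_create_link(node_kobj, section_kobj, name);
}
```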
-
Mel Gorman authored
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the kernel to back a VMA. This matches the size used by the MMU in the majority of cases. However, one counter-example occurs on PPC64 kernels, whereby a kernel using 64K as a base pagesize may still use 4K pages for the MMU on older processors. To distinguish the two, this patch reports MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
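The change amounts to a second helper alongside vma_kernel_pagesize(); a sketch assuming the usual weak-default pattern, where an architecture such as ppc64 overrides it to report the size the MMU actually uses:

```c
/* Generic fallback: unless the architecture says otherwise, the MMU
 * page size is the same as the kernel's view of the VMA's page size. */
unsigned long __attribute__((weak))
vma_mmu_pagesize(struct vm_area_struct *vma)
{
	return vma_kernel_pagesize(vma);
}
```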
-
- Dec 28, 2008
Ilya Yanok authored
This adds support for 16k and 64k page sizes on PowerPC 44x processors. The PGDIR table is much smaller than a page when using 16k or 64k pages (512 and 32 bytes respectively) so we allocate the PGDIR with kzalloc() instead of __get_free_pages(). One PTE table covers rather a large memory area when using 16k or 64k pages (32MB or 512MB respectively), so we can easily put FIXMAP and PKMAP in the area covered by one PTE table.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Vladimir Panfilov <pvr@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
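A sketch of the allocation choice described above; the macro names are illustrative:

```c
/* With 16k/64k pages the PGD (512 or 32 bytes) is far smaller than a
 * page, so a slab allocation beats burning a whole page on it. */
pgd_t *pgd_alloc(struct mm_struct *mm)
{
	if (PGD_TABLE_SIZE < PAGE_SIZE)
		return kzalloc(PGD_TABLE_SIZE, GFP_KERNEL);

	return (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
					 get_order(PGD_TABLE_SIZE));
}
```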
-
- Dec 23, 2008
Dale Farnsworth authored
Add the ability for a classic ppc kernel to be loaded at an address of 32MB. This is done by fixing a few places that assume we are loaded at address 0, and by changing several uses of KERNELBASE to use PAGE_OFFSET instead.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
Anton Vorontsov authored
While for debugging it is good to catch bogus users of ioremap, for kdump support it is more convenient to use __ioremap for copy_oldmem_page() (exactly as we do for PPC64 currently). Note that copy_oldmem_page() calls __ioremap with flags set to '0', so it should be safe with regard to the caches. The other option is to use kmap_atomic_pfn() [1], but it will not work for kernels compiled without HIGHMEM. That is, on a board with 256MB RAM and crashkernel=64M@32M, the !HIGHMEM capturing kernel maps the 0-96M range, which does not include all the memory needed to capture the dump. And, obviously, accessing anything above 96M will cause faults.

[1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046747.html

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
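A minimal sketch of the PPC32 path this enables, mirroring the PPC64 version the commit references (user-buffer handling elided; not the literal patch):

```c
/* Copy one page of the crashed kernel's memory through a transient
 * __ioremap() mapping; flags == 0 forces no cache attributes. */
static ssize_t copy_oldmem_page_sketch(unsigned long pfn, char *buf,
				       size_t csize, unsigned long offset)
{
	void __iomem *vaddr;

	if (!csize)
		return 0;

	vaddr = __ioremap((phys_addr_t)pfn << PAGE_SHIFT, PAGE_SIZE, 0);
	memcpy(buf, (void *)vaddr + offset, csize);
	iounmap(vaddr);

	return csize;
}
```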
-
- Dec 21, 2008
Benjamin Herrenschmidt authored
The rework of the MMU code dropped a much-missed 'blr' instruction.
Brown-Paper-Bag-Worn-By: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
-
Benjamin Herrenschmidt authored
Currently, we never set _PAGE_COHERENT in the PTEs; we just OR it in within the hash code based on some CPU feature bit. We also manipulate _PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places. This changes the logic so that instead, the PTE now contains _PAGE_COHERENT for all normal RAM pages that have I = 0, on platforms that need it. The hash code clears it if the feature bit is not set. It also adds some clean accessors to set up various valid combinations of access flags, and changes various bits of code to use them instead. This should help ensure that the PTE actually contains the bit combinations that we really want. I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead set it explicitly from the TLB miss. I will ultimately remove it completely, as it appears that it might not be needed after all, but in the meantime having it in the TLB miss makes things a lot easier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
Benjamin Herrenschmidt authored
This makes the MMU context code used for CPUs with no hash table (except 603) dynamically allocate the various maps used to track the state of contexts. Only the main free map and the CPU 0 stale map are allocated at boot time. Other CPU maps are allocated when those CPUs are brought up, and freed if they are unplugged. This also moves the initialization of the MMU context management slightly later during the boot process, which should be fine as it's really only needed when userland is first started anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
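A hedged sketch of the hotplug-driven allocation described above; the map name and size macro are illustrative:

```c
/* Only CPU 0's stale map exists from boot; every other CPU gets its
 * map allocated on the way up and freed on the way down. */
static int mmu_context_cpu_notify(struct notifier_block *self,
				  unsigned long action, void *hcpu)
{
	unsigned int cpu = (unsigned int)(long)hcpu;

	switch (action) {
	case CPU_UP_PREPARE:
		stale_map[cpu] = kzalloc(CTX_MAP_SIZE, GFP_KERNEL);
		break;
	case CPU_DEAD:
		kfree(stale_map[cpu]);
		stale_map[cpu] = NULL;
		break;
	}
	return NOTIFY_OK;
}
```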
-
Benjamin Herrenschmidt authored
The handlers for Critical, Machine Check or Debug interrupts will save and restore MMUCR nowadays; thus we only need to disable normal interrupts when invalidating TLB entries.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
Benjamin Herrenschmidt authored
Currently, the various forms of low-level TLB invalidations are all implemented in misc_32.S for 32-bit processors, in a fairly scary mess of #ifdef's, and with interesting duplication such as a whole bunch of code for FSL _tlbie and _tlbia which are no longer used. This moves things around such that _tlbie is now defined in hash_low_32.S and is only used by the 32-bit hash code, and all nohash CPUs use the various _tlbil_* forms that are now moved to a new file, tlb_nohash_low.S. I moved all the definitions for that stuff out of include/asm/tlbflush.h and into mm/mmu_decl.h, as they are really internal mm stuff. The code should have no functional changes. I kept some variants inline for trivial forms on things like 40x and 8xx.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
Benjamin Herrenschmidt authored
This commit moves the whole no-hash TLB handling out of line into a new tlb_nohash.c file, and implements some basic SMP support using IPIs and/or broadcast tlbivax instructions. Note that I'm using local invalidations for D->I cache coherency. At worst, if another processor is trying to execute the same code and has the old entry in its TLB, it will just take a fault and re-do the TLB flush locally (it won't re-do the cache flush in any case).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
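The IPI path can be pictured with this hedged sketch; the parameter struct and helper names are illustrative:

```c
struct tlb_flush_param {
	struct mm_struct *mm;
	unsigned long addr;
};

/* Runs on each CPU: invalidate the local TLB entry for one page. */
static void do_flush_tlb_page_ipi(void *param)
{
	struct tlb_flush_param *p = param;

	_tlbil_va(p->addr, p->mm->context.id);
}

void flush_tlb_page_sketch(struct vm_area_struct *vma, unsigned long addr)
{
	struct tlb_flush_param p = { vma->vm_mm, addr };

	/* Cross-call every CPU that may hold the stale entry. */
	on_each_cpu(do_flush_tlb_page_ipi, &p, 1);
}
```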
-
Benjamin Herrenschmidt authored
We're soon running out of CPU features and I need to add some new ones for various MMU related bits, so this patch separates the MMU features from the CPU features. I moved over the 32-bit MMU related ones, added base features for MMU type families, but didn't move over any 64-bit only feature yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
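A sketch of what the split typically looks like; the bit values shown are illustrative:

```c
/* MMU feature bits live in their own word, with their own test,
 * parallel to cpu_has_feature(). */
#define MMU_FTR_HPTE_TABLE	0x00000001	/* hash-table MMU     */
#define MMU_FTR_TYPE_44x	0x00000002	/* 44x-family TLB MMU */

static inline int mmu_has_feature(unsigned long feature)
{
	return cur_cpu_spec->mmu_features & feature;
}
```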
-
Benjamin Herrenschmidt authored
This reworks the context management code used by 4xx, 8xx and Freescale BookE. It adds support for SMP by implementing a concept of stale context map to lazily flush the TLB on processors where a context may have been invalidated. This also contains the ground work for generalizing such lazy TLB flushing by just picking up a new PID and marking the old one stale. This will be implemented later. This is a first implementation that uses a global spinlock. Ideally, we should try to get at least the fast path (context ID already assigned) lockless or limited to a per context lock, but for now this will do. I tried to keep the UP case reasonably simple to avoid adding too much overhead to 8xx, which does a lot of context stealing since it effectively has only 16 PIDs available.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
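The lazy-flush idea on the context-switch path, as a hedged sketch with locking elided and illustrative names:

```c
/* If this CPU marked the incoming context stale, flush the local TLB
 * before using it, then clear the stale bit. */
static void switch_mmu_context_sketch(struct mm_struct *next)
{
	unsigned int id = next->context.id;
	unsigned int cpu = smp_processor_id();

	if (test_bit(id, stale_map[cpu])) {
		local_flush_tlb_mm(next);	/* this CPU only */
		__clear_bit(id, stale_map[cpu]);
	}
	set_context(id, next->pgd);
}
```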
-
Benjamin Herrenschmidt authored
This splits the mmu_context handling between 32-bit hash based processors, 64-bit hash based processors and everybody else. This is preliminary work for adding SMP support for BookE processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
- Dec 16, 2008
Benjamin Herrenschmidt authored
The function flush_HPTE() is used in only one place, the implementation of DEBUG_PAGEALLOC on ppc32. It's actually a dup of flush_tlb_page() though it's -slightly- more efficient on hash based processors. We remove it and replace it by a direct call to the hash flush code on those processors and to flush_tlb_page() for everybody else.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
Benjamin Herrenschmidt authored
This renames the files to clarify the fact that they are used by the hash based family of CPUs (the 603 being an exception in that family, though it is still handled by that code). This paves the way for the new tlb_nohash.c coming via a subsequent commit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
Dave Hansen authored
careful_allocation() was calling into the bootmem allocator for nodes which had not been fully initialized and caused a previous bug:

http://patchwork.ozlabs.org/patch/10528/

So, I merged a few broken out loops in do_init_bootmem() to fix it. That changed the code ordering. I think this bug is triggered by having reserved areas for a node which are spanned by another node's contents. In the mark_reserved_regions_for_nid() code, we attempt to reserve the area for a node before we have allocated the NODE_DATA() for that nid. We do this since I reordered that loop. I suck. This is causing crashes at bootup on some systems, as reported by Jon Tollefson. This may only present on some systems that have 16GB pages reserved. But, it can probably happen on any system that is trying to reserve large swaths of memory that happen to span other nodes' contents. This commit ensures that we do not touch bootmem for any node which has not been initialized, and also removes a compile warning about an unused variable.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
Brian King authored
It looks like most of the hugetlb code is doing the correct thing if hugepages are not supported, but the mmap code is not. If we get into the mmap code when hugepages are not supported, such as in an LPAR which is running Active Memory Sharing, we can oops the kernel. This fixes the oops being seen in this path.

oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: nfs(N) lockd(N) nfs_acl(N) sunrpc(N) ipv6(N) fuse(N) loop(N) dm_mod(N) sg(N) ibmveth(N) sd_mod(N) crc_t10dif(N) ibmvscsic(N) scsi_transport_srp(N) scsi_tgt(N) scsi_mod(N)
Supported: No
NIP: c000000000038d60 LR: c00000000003945c CTR: c0000000000393f0
REGS: c000000077e7b830 TRAP: 0300 Tainted: G (2.6.27.5-bz50170-2-ppc64)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 44000448 XER: 20000001
DAR: c000002000af90a8, DSISR: 0000000040000000
TASK = c00000007c1b8600[4019] 'hugemmap01' THREAD: c000000077e78000 CPU: 6
GPR00: 0000001fffffffe0 c000000077e7bab0 c0000000009a4e78 0000000000000000
GPR04: 0000000000010000 0000000000000001 00000000ffffffff 0000000000000001
GPR08: 0000000000000000 c000000000af90c8 0000000000000001 0000000000000000
GPR12: 000000000000003f c000000000a73880 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000010000
GPR20: 0000000000000000 0000000000000003 0000000000010000 0000000000000001
GPR24: 0000000000000003 0000000000000000 0000000000000001 ffffffffffffffb5
GPR28: c000000077ca2e80 0000000000000000 c00000000092af78 0000000000010000
NIP [c000000000038d60] .slice_get_unmapped_area+0x6c/0x4e0
LR [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
Call Trace:
[c000000077e7bbc0] [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
[c000000077e7bc30] [c000000000107e30] .get_unmapped_area+0x64/0xd8
[c000000077e7bcb0] [c00000000010b140] .do_mmap_pgoff+0x140/0x420
[c000000077e7bd80] [c00000000000bf5c] .sys_mmap+0xc4/0x140
[c000000077e7be30] [c0000000000086b4] syscall_exit+0x0/0x40
Instruction dump:
fac1ffb0 fae1ffb8 fb01ffc0 fb21ffc8 fb41ffd0 fb61ffd8 fb81ffe0 fbc1fff0
fbe1fff8 f821fef1 f8c10158 f8e10160 <7d49002e> f9010168 e92d01b0 eb4902b0
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
- Dec 03, 2008
Kumar Gala authored
Refactor the RCU based pte free code that was used on ppc64 to be used on all powerpc. Additionally refactor pte_free() & pte_free_kernel() into common code between ppc32 & ppc64.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
Kumar Gala authored
Clean up the ifdefs so we only use hash_page_sync if we have CONFIG_SMP && CONFIG_PPC_STD_MMU_32.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
- Nov 30, 2008
Dave Hansen authored
I got a bug report about a distro kernel not booting on a particular machine. It would freeze during boot:

> ...
> Could not find start_pfn for node 1
> [boot]0015 Setup Done
> Built 2 zonelists in Node order, mobility grouping on. Total pages: 123783
> Policy zone: DMA
> Kernel command line:
> [boot]0020 XICS Init
> [boot]0021 XICS Done
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> clocksource: timebase mult[7d0000] shift[22] registered
> Console: colour dummy device 80x25
> console handover: boot [udbg0] -> real [hvc0]
> Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
> Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
> freeing bootmem node 0

I've reproduced this on 2.6.27.7. It is caused by commit 8f64e1f2 ("powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes"). The problem is that Jon took a loop which was (in pseudocode):

    for_each_node(nid)
        NODE_DATA(nid) = careful_alloc(nid);
        setup_bootmem(nid);
        reserve_node_bootmem(nid);

and broke it up into:

    for_each_node(nid)
        NODE_DATA(nid) = careful_alloc(nid);
        setup_bootmem(nid);

    for_each_node(nid)
        reserve_node_bootmem(nid);

The issue comes in when the 'careful_alloc()' is called on a node with no memory. It falls back to using bootmem from a previously-initialized node. But bootmem has not yet been reserved when Jon's patch is applied. It gives back bogus memory (0xc000000000000000) and pukes later in boot. The following patch collapses the loop back together. It also breaks the mark_reserved_regions_for_nid() code out into a function and adds some comments. I think a huge part of introducing this bug is that the for loop was too long and hard to read. The actual bug fix here is the:

    +	if (end_pfn <= node->node_start_pfn ||
    +	    start_pfn >= node_end_pfn)
    +		continue;

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
Al Viro authored
This is called only from __init code and itself calls __init code. Incidentally, it ought to be static in its file.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 19, 2008
Robert Jennings authored
Linux will report the number of page-ins so that the hypervisor can better determine partition memory pressure. The hardware page size and the OS page size can be different. In the case where the hardware page size is 4k and the OS is running with 64k pages, the code in commit 40900194 ("powerpc: Update page-in counter for CMM") would under-report the number of pages. This corrects the reporting to the hypervisor by incrementing the page_in count by 1 << PAGE_FACTOR each time.
Reported-by: Andrew Theurer <habanero@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
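The arithmetic is easy to check in a standalone sketch (the macro values mirror the 4k hardware / 64k OS case from the text):

```c
#include <stdio.h>

#define HW_PAGE_SHIFT	12	/* 4 kB hypervisor pages */
#define OS_PAGE_SHIFT	16	/* 64 kB kernel pages    */
#define PAGE_FACTOR	(OS_PAGE_SHIFT - HW_PAGE_SHIFT)

int main(void)
{
	unsigned long page_in = 0;

	/* One OS-level page-in covers 1 << 4 = 16 hardware pages, so
	 * the counter must grow by that much, not by 1. */
	page_in += 1 << PAGE_FACTOR;
	printf("reported hardware page-ins: %lu\n", page_in);
	return 0;
}
```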
-