- Mar 22, 2007
-
-
Linas Vepstas authored
Some drivers attempt to perform a lot of MMIO even after an EEH event has been detected. This is especially the case for fast CPUs and PCI-E slots. Be a bit more lenient in allowing this. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Linas Vepstas authored
Change the order in which the PCI error state is examined; the "capabilities" value is not valid if "reset state" is 5. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Feb 12, 2007
-
-
Arjan van de Ven authored
Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. [akpm@osdl.org: sparc64 fix] Signed-off-by:
Arjan van de Ven <arjan@linux.intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
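The effect of constifying an operations table can be sketched in plain user-space C. This is only an illustration: `struct demo_file_ops` and the `demo_*` functions are hypothetical stand-ins, not the kernel's `struct file_operations` or any real driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file_operations. */
struct demo_file_ops {
    int (*open)(int minor);
    long (*read)(int minor, char *buf, size_t len);
};

static int demo_open(int minor)
{
    return minor >= 0 ? 0 : -1;
}

static long demo_read(int minor, char *buf, size_t len)
{
    (void)minor;
    if (len == 0)
        return 0;
    buf[0] = 'x';
    return 1;
}

/*
 * Because the table is const, the toolchain may place it in a read-only
 * section, and an accidental write such as "demo_ops.open = NULL;" is
 * rejected at compile time.
 */
static const struct demo_file_ops demo_ops = {
    .open = demo_open,
    .read = demo_read,
};

/* Dispatch through the read-only table. */
int demo_dispatch_open(int minor)
{
    assert(demo_ops.open != NULL);
    return demo_ops.open(minor);
}
```

Callers only ever read the function pointers, so nothing is lost by making the table immutable.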
-
- Feb 07, 2007
-
-
Linas Vepstas authored
It appears that EEH is improperly enabled for some Power4 systems. On these systems, ibm,set-eeh-option returns success even when EEH is not supported on the given node. Thus, an explicit check for support is required. During boot on Power4, without this patch, one sees messages similar to: EEH: event on unsupported device, rc=0 dn=/pci@400000000110/IBM,sp@1 EEH: event on unsupported device, rc=0 dn=/pci@400000000110/pci@2 EEH: event on unsupported device, rc=0 dn=/pci@400000000110/pci@2,2 etc. The patch makes these go away. Without this patch, EEH recovery does seem to work correctly for at least some devices (I tested ethernet e1000), but fails to recover others (the Emulex LightPulse LPFC, most notably). Off the top of my head, I don't remember why some devices are affected but not others. The PAPR indicates that the correct way to test for EEH is as done in this patch; it's not clear to me if this was in the PAPR all along or was recently added; if it was there all along, it's not clear to me why this hadn't been fixed long ago. I suspect only certain firmware levels are affected. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Dec 08, 2006
-
-
Linas Vepstas authored
If one attempts to create a device driver recovery sequence that does not depend on a hard reset of the device, but simply just attempts to resume processing, then one discovers that the recovery sequence implemented on powerpc is not quite right. This patch fixes this up. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Sep 26, 2006
-
-
Linas Vepstas authored
Bug fix: when marking a slot as frozen, we forgot to mark the PCI device itself as frozen. (We did manage to mark the PCI children, but forgot the parent itself.) This is needed so that some device drivers can check the PCI status in critical sections (e.g. in spin loops with interrupts disabled). Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Sep 22, 2006
-
-
Linas Vepstas authored
On detection of an EEH error, some Power4 systems seem to occasionally want to be reset twice before they report themselves as fully recovered. This patch re-arranges the code to attempt additional resets if the first one doesn't take. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
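A retry loop of the kind this commit describes might look like the following user-space sketch. The slot model, the `demo_*` names and the limit of three attempts are assumptions made for illustration; the actual patch talks to firmware and its retry limit is not stated in the commit message.

```c
#include <assert.h>

#define DEMO_MAX_RESETS 3 /* assumed limit, not taken from the patch */

/* Hypothetical slot model: reports healthy after `resets_needed` resets. */
struct demo_slot {
    int resets_needed;
    int resets_done;
};

static void demo_reset_once(struct demo_slot *slot)
{
    slot->resets_done++;
}

static int demo_slot_ok(const struct demo_slot *slot)
{
    return slot->resets_done >= slot->resets_needed;
}

/* Returns 0 once the slot reports recovery, -1 after giving up. */
int demo_reset_slot(struct demo_slot *slot)
{
    int i;

    assert(slot);
    for (i = 0; i < DEMO_MAX_RESETS; i++) {
        demo_reset_once(slot);
        if (demo_slot_ok(slot))
            return 0;
    }
    return -1;
}
```

Bounding the number of attempts keeps a slot that never recovers from being reset forever.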
-
- Sep 21, 2006
-
-
Linas Vepstas authored
Add wrapper around the rtas call to enable MMIO or DMA on a frozen pci slot. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Linas Vepstas authored
Clean up subroutine documentation; mostly formatting changes, with some new content. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Jul 31, 2006
-
-
Jeremy Kerr authored
Now that get_property() returns a void *, there's no need to cast its return value. Also, treat the return value as const, so we can constify get_property later. pseries platform changes. Built for pseries_defconfig Signed-off-by:
Jeremy Kerr <jk@ozlabs.org> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Apr 13, 2006
-
-
Linas Vepstas authored
Repeated calls to eeh_remove_device() can result in multiple (and thus unbalanced) calls to pci_dev_put(). Make sure that pci_dev_put() is called only once (since there was only one call to the matching pci_dev_get()). Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
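The balancing rule can be sketched with a toy reference count in user-space C. All `demo_*` names are hypothetical; the kernel code uses the real PCI reference-counting calls and its own per-node bookkeeping, which may differ from the pointer-clearing approach assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted device; stands in for a struct pci_dev. */
struct demo_dev {
    int refcount;
};

/* Toy per-node record that holds at most one reference to a device. */
struct demo_node {
    struct demo_dev *dev;
};

static void demo_dev_get(struct demo_dev *d)
{
    d->refcount++;
}

static void demo_dev_put(struct demo_dev *d)
{
    assert(d->refcount > 0);
    d->refcount--;
}

/* Takes exactly one reference and remembers it in the node. */
void demo_add_device(struct demo_node *n, struct demo_dev *d)
{
    demo_dev_get(d);
    n->dev = d;
}

/* Safe to call repeatedly: the reference is dropped only once. */
void demo_remove_device(struct demo_node *n)
{
    if (n->dev == NULL)
        return; /* already removed, nothing left to put */
    demo_dev_put(n->dev);
    n->dev = NULL;
}
```

Clearing the stored pointer after the put is what makes a second removal a no-op.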
-
- Apr 01, 2006
-
-
Nathan Fontenot authored
This patch removes unnecessary exports, marks functions as static when possible, and simplifies some list-related code. Signed-off-by:
Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Mar 28, 2006
-
-
Benjamin Herrenschmidt authored
This removes statically assigned platform numbers and reworks the powerpc platform probe code to use a better mechanism. With this, board support files can simply declare a new machine type with a macro, and implement a probe() function that uses the flattened device-tree to detect if they apply for a given machine. We now have a machine_is() macro that replaces the comparisons of _machine with the various PLATFORM_* constants. This commit also changes various drivers to use the new macro instead of looking at _machine. Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Feb 28, 2006
-
-
John Rose authored
Some hotplug driver functions were migrated to the kernel for use by EEH in commit 2bf6a8fa. Previously, the PCI Hotplug module had been changed to use the new OFDT-based PCI probe when appropriate (commit 5fa80fcd). When rpaphp_pci_config_slot() was moved from the rpaphp driver to the new kernel function pcibios_add_pci_devices(), the OFDT-based probe code was dropped. This patch restores it. Signed-off-by:
John Rose <johnrose@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Jan 12, 2006
-
-
Olof Johansson authored
Remove warning in eeh code about mixed variables and code. Signed-off-by:
Olof Johansson <olof@lixom.net> Acked-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Jan 10, 2006
-
-
Linas Vepstas authored
This fixes a crash on null-pointer deref during dlpar slot addition. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from 1c87c0f84943fbbc91826967ff4fea1b059a526f commit)
-
Linas Vepstas authored
242-eeh-no-percpu-counters.patch Remove per-cpu counters from the EEH code. These statistics counters are incremented at a very low frequency, and the performance gains of per-cpu variables are negligible. By contrast, the counters weren't safe against cpu off/online operations, and it's not worth the effort to make them so (other than to turn them into plain globals). Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from be3b5d1be053ccb41e91fa5a6f43ef5db301357d commit)
-
Linas Vepstas authored
241-eeh-save-bars-earlier.patch Save the PCI device bars *before* any PCI probing is done. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from 76c902b919098860f3d4e125f847abcc4cb1782a commit)
-
Linas Vepstas authored
239-eeh-multifunction-consolidate.patch New-style firmware will often place multiple different functions under a non-EEH-aware parent. However, these devices might share a common PE ("partitionable endpoint") and config address, and thus any EEH event will affect all of the devices in common. This patch makes the effort to find all of these common devices and handle them together. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from 216810296bb97d39da8e176822e9de78d2f00187 commit)
-
Linas Vepstas authored
238-eeh-stop-if-reset_failed.patch If the firmware is unable to reset the PCI slot for some reason, then don't attempt any further recovery steps after that point. Instead, mark the device as permanently failed. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from e06b942521eb2cdaf232726f45a820d5837acb12 commit)
-
Linas Vepstas authored
237-eeh-bridge-token.patch Minor: the rtas-bridge token should be set up the same way that all the other rtas tokens are set up. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from 78379b6c5fc17b6666c40b05988e6708e98479c0 commit)
-
Linas Vepstas authored
236-eeh-config-addr.patch The PE configuration address wasn't being consistently used in all locations where a config address is called for. This patch adds it to the places it should have appeared in. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from c2bc904a28095aca0b04a37854b63b78622a032e commit)
-
Linas Vepstas authored
234-eeh-find-pe.patch The find_device_pe() routine is duplicated in two files. Remove one of the two copies, declare the other extern. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from 48408e708282d4d0269136ff27ea5acbd9410b5a commit)
-
Linas Vepstas authored
233-eeh-buid-fix.patch Remove un-desired warning print from EEH code. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from 241239e6aff69788a177d97c5d06fe9995c74cca commit)
-
Linas Vepstas authored
26-eeh-partition-endpoint.patch New versions of firmware introduce a new method by which the "partitionable endpoint" (the point at which the pci bus is cut) should be located. This code adds the support for this (mandatory) new feature. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from 9fcfb5d35b5294659f9299aa9cae6fd16325c07e commit)
-
Linas Vepstas authored
25-pci-address-cache.patch The core EEH file is rather large. This patch splits out a self-contained chunk of it into its own file. This is the chunk that performs the caching and lookup of PCI devices based on the I/O addresses of their resources. This code is almost architecture-independent and could be used by any system that wanted to find a PCI device based only on the I/O address used by the device. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from b0b291d59906d4a9a89ed9e34d9fd684c7188924 commit)
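The idea of finding a device from an I/O address can be sketched as a range lookup. This sketch swaps in a binary search over a sorted array of non-overlapping ranges, purely as an assumption for illustration; it does not reproduce the data structure used by the patch, and `struct demo_range` and `demo_find_dev()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One cached I/O resource: an inclusive address range owned by a device. */
struct demo_range {
    unsigned long start; /* inclusive */
    unsigned long end;   /* inclusive */
    int dev_id;          /* hypothetical device handle */
};

/*
 * Binary search over ranges sorted by start address and assumed not to
 * overlap. Returns the owning dev_id, or -1 if no range contains addr.
 */
int demo_find_dev(const struct demo_range *tbl, size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;

    assert(tbl != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < tbl[mid].start)
            hi = mid;
        else if (addr > tbl[mid].end)
            lo = mid + 1;
        else
            return tbl[mid].dev_id;
    }
    return -1;
}
```

A lookup like this lets an error path that only knows a failing I/O address work out which device it belongs to.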
-
Linas Vepstas authored
Various PCI bus errors can be signaled by newer PCI controllers. The core error recovery routines are architecture dependent. This patch adds a recovery infrastructure for the PPC64 pSeries systems. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)
-
- Jan 09, 2006
-
-
Linas Vepstas authored
20-rpaphp-eeh-cleanup.patch This patch moves some code from the rpaphp directory to the powerpc directory, where it should have been all along. (Among other things, I need it in the powerpc directory for the PCI error recovery.) Please note that this patch affects TWO maintainers: Paul, after applying the powerpc part, please ask that GregKH apply the PCI part. It is safe to have the powerpc part go in first. It would be bad to have the PCI part go in first. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Nov 17, 2005
-
-
David Woodhouse authored
If the kernel supports both G5 and pSeries, and CONFIG_EEH is enabled, eeh_init() is (quite reasonably) never called when we boot on a G5. Yet eeh_check_failure() still gets called. We should avoid doing that if !eeh_subsystem_enabled. Signed-off-by:
David Woodhouse <dwmw2@infradead.org> Signed-off-by:
Paul Mackerras <paulus@samba.org>
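The guard this commit describes amounts to an early return when the subsystem was never initialised. A minimal user-space sketch follows, with hypothetical `demo_*` names and a counter standing in for the firmware query; the all-ones test is an assumption about how a failed read is recognised.

```c
#include <assert.h>

static int demo_subsystem_enabled; /* set only by demo_eeh_init() */
static int demo_fw_queries;        /* counts simulated firmware calls */

/* Only runs on platforms that support the feature. */
void demo_eeh_init(void)
{
    demo_subsystem_enabled = 1;
}

/* Stand-in for the firmware query; must never run before init. */
static int demo_query_firmware(void)
{
    assert(demo_subsystem_enabled);
    demo_fw_queries++;
    return 1; /* pretend the slot is frozen */
}

int demo_firmware_query_count(void)
{
    return demo_fw_queries;
}

/*
 * Called when a read returned `val`. Returns 1 if the slot is frozen,
 * 0 otherwise. Bails out first if the subsystem was never enabled.
 */
int demo_check_failure(unsigned int val)
{
    if (!demo_subsystem_enabled)
        return 0;
    if (val != 0xffffffffu)
        return 0;
    return demo_query_firmware();
}
```

On a machine where init never ran, the check costs one flag test and never reaches the firmware.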
-
- Nov 10, 2005
-
-
Linas Vepstas authored
17-eeh-slot-marking-bug.patch A device that experiences a PCI outage may be just one device out of many that were affected. In order to avoid repeated reports of a failure, the entire tree of affected devices should be marked as failed. This patch marks up the entire tree. Signed-off-by:
Linas Vepstas <linas@linas.org> Signed-off-by:
Paul Mackerras <paulus@samba.org>
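Marking a whole subtree can be sketched as a recursive walk over a child/sibling tree. The node layout, the flag value and the `demo_*` names are assumptions for illustration and do not mirror the kernel's device-node structures.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MODE_ISOLATED 0x1 /* hypothetical flag value */

/* Toy device-tree node using child/sibling links. */
struct demo_dn {
    int eeh_mode;
    struct demo_dn *child;
    struct demo_dn *sibling;
};

/* Marks dn, its siblings, and everything below them. */
static void demo_mark_children(struct demo_dn *dn, int flag)
{
    for (; dn != NULL; dn = dn->sibling) {
        dn->eeh_mode |= flag;
        demo_mark_children(dn->child, flag);
    }
}

/* Marks the top node itself plus its whole subtree, but not its siblings. */
void demo_mark_slot(struct demo_dn *top, int flag)
{
    assert(top != NULL);
    top->eeh_mode |= flag;
    demo_mark_children(top->child, flag);
}
```

Starting the recursion at `top->child` rather than at `top` keeps unrelated siblings of the failed slot unmarked.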
-
Paul Mackerras authored
Gcc 4 doesn't like being told to inline a recursive function... Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Paul Mackerras authored
This patch merges platform codes. systemcfg->platform is no longer used, systemcfg use in general is deprecated as much as possible (and renamed _systemcfg before it gets completely moved elsewhere in a future patch), and _machine is now used on ppc64 as well as ppc32. Platform codes aren't gone yet, but we are getting a step closer. A bunch of asm code in head[_64].S is also turned into C code. Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Linas Vepstas authored
14-eeh-device-bar-save.patch After a PCI device has been reset, the device BARs and other config space info must be restored to the same state as they were in when the firmware first handed us this device. This will allow the PCI device driver, when restarted, to correctly recognize and set up the device. This patch saves the device config space as early as reasonable after the firmware has handed over the device. The state restore function is intended for use by the EEH recovery routines. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Linas Vepstas authored
13-eeh-recovery-support-routines.patch EEH Recovery support routines This patch adds routines required to help drive the recovery of EEH-frozen slots. The main function is to drive the PCI #RST signal line high for a quarter of a second, and then allow for a second and a half of settle time. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Linas Vepstas authored
12-eeh-event-dispatcher.patch ppc64: EEH Recovery dispatcher thread This patch adds a mechanism to create recovery threads when an EEH event is received. Since an EEH freeze state may be detected within an interrupt context, we need to get out of the interrupt context before starting recovery. This dispatcher does this in two steps: first, it uses a workqueue to get out, and then it launches a kernel thread, so that the recovery routine can sleep for extended periods without upsetting keventd. A kernel thread is created with each EEH event, rather than having one long-running daemon started at boot time. This is because it is anticipated that EEH events will be very rare (very very rare, ideally), and so it's pointless to clutter the process tables with a daemon that will almost never run. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Linas Vepstas authored
11-eeh-move-to-powerpc.patch Move arch/ppc64/kernel/eeh.c to arch/powerpc/platforms/pseries/eeh.c. No other changes (except for the Makefile to build it). Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Linas Vepstas authored
10-EEH-enable-bugfix.patch Bugfix: With the current linux-2.6.14-rc2-git6, EEH errors are ignored because their detection requires an unused, uninitialized flag to be set. This patch removes the unused flag. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Linas Vepstas authored
08-eeh-spin-counter.patch Once an EEH event is triggered, all further I/O to a device is blocked (until reset). Bad device drivers may end up spinning in their interrupt handlers, trying to read an interrupt status register that will never change state. This patch moves that spin counter to a per-device structure, and adds some diagnostic prints to help locate the bad driver. Signed-off-by:
Linas Vepstas <linas@linas.org> Signed-off-by:
Paul Mackerras <paulus@samba.org>
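A per-device spin counter of the sort described here could be sketched as follows. The structure, the threshold of five reads and the message text are assumptions for illustration; none of them are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_MAX_CHECKS 5 /* hypothetical threshold */

/* Toy per-device record; the counter lives here, not in a global. */
struct demo_pci_dn {
    int frozen;               /* nonzero once the slot is isolated */
    unsigned int check_count; /* reads attempted while frozen */
    const char *driver_name;  /* used only for the diagnostic */
};

/*
 * Called on each read attempt. Returns 1 once the driver has polled a
 * frozen device DEMO_MAX_CHECKS times or more, 0 otherwise. Prints one
 * diagnostic when the threshold is first reached.
 */
int demo_note_read(struct demo_pci_dn *pdn)
{
    assert(pdn != NULL);
    if (!pdn->frozen)
        return 0;

    pdn->check_count++;
    if (pdn->check_count == DEMO_MAX_CHECKS)
        fprintf(stderr, "demo: driver %s keeps polling a frozen device\n",
                pdn->driver_name);
    return pdn->check_count >= DEMO_MAX_CHECKS;
}
```

Keeping the counter per device means one misbehaving driver cannot push a well-behaved one over the limit, and the diagnostic can name the culprit.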
-
Linas Vepstas authored
07-eeh-report-race.patch When a PCI slot is isolated, all PCI functions under that slot are affected. If these functions have separate device drivers, the EEH isolation event might be reported multiple times. This patch adds a lock to prevent the racing of such multiple reports. It also marks every device under the slot as having experienced an EEH event, so that multiple reports may be recognized more easily. Signed-off-by:
Linas Vepstas <linas@linas.org> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
Linas Vepstas authored
06-eeh-empty-slot-error.patch Performing PCI config-space reads to empty PCI slots can lead to reports of "permanent failure" from the firmware. Ignore permanent failures on empty slots. Signed-off-by:
Linas Vepstas <linas@linas.org> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-