- Mar 20, 2006
-
-
Trond Myklebust authored
Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Christoph Hellwig authored
Currently lockd directly access the file_lock_list from fs/locks.c. It does so to mark locks granted or reclaimable. This is very suboptimal, because a) lockd needs to poke into locks.c internals, and b) it needs to iterate over all locks in the system for marking locks granted or reclaimable. This patch adds lists for granted and reclaimable locks to the nlm_host structure instead, and adds locks to those. nlmclnt_lock: now adds the lock to h_granted instead of setting the NFS_LCK_GRANTED, still O(1) nlmclnt_mark_reclaim: goes away completely, replaced by a list_splice_init. Complexity reduced from O(locks in the system) to O(1) reclaimer: iterates over h_reclaim now, complexity reduced from O(locks in the system) to O(locks per nlm_host) Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Otherwise, the block may disappear from underneath us when in nlmsvc_retry_blocked. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Currently NFS O_DIRECT writes use FILE_SYNC so that a COMMIT is not necessary. This simplifies the internal logic, but this could be a difficult workload for some servers. Instead, let's send UNSTABLE writes, and after they all complete, send a COMMIT for the dirty range. After the COMMIT returns successfully, then do the wake_up or fire off aio_complete(). Test plan: Async direct I/O tests against Solaris (or any server that requires committed unstable writes). Reboot server during test. Based on an earlier patch by Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
We need to use nfs_commit_alloc() in fs/nfs/direct.c. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Duplicate infrastructure from direct read path that will allow write path to generate multiple write requests concurrently. This will enable us to add support for aio in this path. Temporarily we will lose the ability to do UNSTABLE writes followed by a COMMIT in the direct write path. However, all applications I am aware of that use NFS O_DIRECT currently write in relatively small chunks, so this should not be inconvenient in any way. Test plan: Millions of fsx-odirect ops. OraSim. Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Same callback hierarchy inversion as for the NFS write calls. This patch is not strictly speaking needed by the O_DIRECT code, but avoids confusing differences between the asynchronous read and write code. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
This patch inverts the callback hierarchy for NFS write calls. Instead of having the NFSv2/v3/v4-specific code set up the RPC callback ops, we allow the original caller to do so. This allows for more flexibility w.r.t. how to set up and tear down the nfs_write_data structure while still allowing the NFSv3/v4 code to perform error handling. The greater flexibility is needed by the asynchronous O_DIRECT code, which wants to be able to hold on to the original nfs_write_data structures after the WRITE RPC call has completed in order to be able to replay them if the COMMIT call determines that the server has rebooted. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
J. Bruce Fields authored
Currently lockd identifies its own locks using the FL_LOCKD flag. This doesn't scale well to multiple lock managers--if we did this in nfsv4 too, for example, we'd be left with only one free flag bit. Instead, we just check whether the file manager ops (fl_lmops) set on this lock are our own. The only use for this is in nlm_traverse_locks, which uses it to find locks that need cleaning up when freeing a host or a file. In the long run it might be nice to do reference counting instead of traversing all the locks like this.... Signed-off-by:
J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Andy Adamson authored
posix_test_lock() returns a pointer to a struct file_lock which is unprotected and can be removed while in use by the caller. Move the conflicting lock from the return to a parameter, and copy the conflicting lock. In most cases the caller ends up putting the copy of the conflicting lock on the stack. On i386, sizeof(struct file_lock) appears to be about 100 bytes. We're assuming that's reasonable. Signed-off-by:
Andy Adamson <andros@citi.umich.edu> Signed-off-by:
J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Andy Adamson authored
posix_lock_file() is used to add a blocked lock to Lockd's block, so posix_block_lock() is no longer needed. Signed-off-by:
Andy Adamson <andros@citi.umich.edu> Signed-off-by:
J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Clean-up: replace rpc_call() helper with direct call to rpc_call_sync. This makes NFSv2 and NFSv3 synchronous calls more computationally efficient, and reduces stack consumption in functions that used to invoke rpc_call more than once. Test plan: Compile kernel with CONFIG_NFS enabled. Connectathon on NFS version 2, version 3, and version 4 mount points. Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Add fields to the rpc_procinfo struct that allow the display of a human-readable name for each procedure in the rpc_iostats output. Also fix it so that the NFSv4 stats are broken up correctly by sub-procedure number. NFSv4 uses only two real RPC procedures: NULL, and COMPOUND. Test plan: Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats". Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Add a simple mechanism for collecting stats in the RPC client. Stats are tabulated during xprt_release. Note that per_cpu shenanigans are not required here because the RPC client already serializes on the transport write lock. Test plan: Compile kernel with CONFIG_NFS enabled. Basic performance regression testing with high-speed networking and high performance server. Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Account for various things that occur while an RPC task is executed. Separate timers for RPC round trip and RPC execution time show how long RPC requests wait in queue before being sent. Eventually these will be accumulated at xprt_release time in one place where they can be viewed from userland. Test plan: Compile kernel with CONFIG_NFS enabled. Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Monitor generic transport events. Add a transport switch callout to format transport counters for export to user-land. Test plan: Compile kernel with CONFIG_NFS enabled. Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
RPC wait queue length will eventually be exported to userland via the RPC iostats interface. Test plan: Compile kernel with CONFIG_NFS enabled. Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Add a field in nfs_server to record a timestamp when a mount succeeds. Report the number of seconds the file system has been mounted via nfs_show_stats(). Test plan: Mount an NFS file system, watch the mountstats reports and compare with clock time. Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Add a per-superblock performance counter facility to the NFS client. This facility mimics the counters available for block devices and for networking. Expose these new counters via the new /proc/self/mountstats interface. Thanks to Andrew Morton and Trond Myklebust for their review and comments. Test plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Sometimes it's important to know the exact RPC retransmit settings the kernel is using for an NFS mount point. Add this facility to the NFS client's show_options method. Test plan: Set various retransmit settings via the mount command, and check that the settings are reflected in /proc/mounts. Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Create a new file under /proc/self, called mountstats, where mounted file systems can export information (configuration options, performance counters, and so on). Use a mechanism similar to /proc/mounts and s_ops->show_options. This mechanism does not violate namespace security, and is safe to use while other processes are unmounting file systems. Thanks to Mike Waychison for his review and comments. Test-plan: Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats. Signed-off-by:
Chuck Lever <cel@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Goldwyn Rodrigues authored
read_cache_mtime is no longer used in nfs_inode. This patch removes references of read_cache_mtime in the code comments. Signed-off-by:
Goldwyn Rodrigues <rgoldwyn@gmail.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Instead we use the nlm_lockowner->pid. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
The nfs_open_context may live longer than the file descriptor that spawned it, so it needs to carry a reference to the vfsmount. If not, then generic_shutdown_super() may end up being called before reads and writes have been flushed out. Make a couple of functions static while we're at it... Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
- Mar 19, 2006
-
-
Michael Chan authored
The 40-bit DMA workaround recently implemented for 5714, 5715, and 5780 needs to be expanded because there may be other tg3 devices behind the EPB Express to PCIX bridge in the 5780 class device. For example, some 4-port card or mother board designs have 5704 behind the 5714. All devices behind the EPB require the 40-bit DMA workaround. Thanks to Chris Elmquist again for reporting the problem and testing the patch. Signed-off-by:
Michael Chan <mchan@broadcom.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ralf Baechle authored
If the AX.25 dialect chosen by the sysadmin is set to DAMA master / 3 (or DAMA slave / 2, if CONFIG_AX25_DAMA_SLAVE=n) ax25_kick() will fall through the switch statement without calling ax25_send_iframe() or any other function that would eventually free skbn thus leaking the packet. Fix by restricting the sysctl inferface to allow only actually supported AX.25 dialects. The system administration mistake needed for this to happen is rather unlikely, so this is an uncritical hole. Coverity #651. Signed-off-by:
Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Mar 18, 2006
-
-
Ralf Baechle authored
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>: sb1250_gettimeoffset() simply reads the current cpu 0 timer remaining value, however once this counter reaches 0 and the interrupt is raised, it immediately resets and begins to count down again. If sb1250_gettimeoffset() is called on cpu 1 via do_gettimeofday() after the timer has reset but prior to cpu 0 processing the interrupt and taking write_seqlock() in timer_interrupt() it will return a full value (or close to it) causing time to jump backwards 1ms. Once cpu 0 handles the interrupt and timer_interrupt() gets far enough along it will jump forward 1ms. Fix this problem by implementing mips_hpt_*() on sb1250 using a spare timer unrelated to the existing periodic interrupt timers. It runs at 1Mhz with a full 23bit counter. This eliminated the custom do_gettimeoffset() for sb1250 and allowed use of the generic fixed_rate_gettimeoffset() using mips_hpt_*() and timerhi/timerlo. Signed-off-by:
Ralf Baechle <ralf@linux-mips.org>
-
Ralf Baechle authored
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>: Field width should be 23 bits not 20 bits. Signed-off-by:
Ralf Baechle <ralf@linux-mips.org>
-
Ralf Baechle authored
If a call to set_io_port_base() was being followed by usage of mips_io_port_base in the same function gcc was possibly using the old value due to some clever abuse of const. Adding a barrier will keep the optimization and result in correct code with latest gcc. Signed-off-by:
Ralf Baechle <ralf@linux-mips.org>
-
Atsushi Nemoto authored
If dcache_size != icache_size or dcache_size != scache_size, or set-associative cache, icache/scache does not flushed properly. Make blast_?cache_page_indexed() masks its index value correctly. Also, use physical address for physically indexed pcache/scache. Signed-off-by:
Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by:
Ralf Baechle <ralf@linux-mips.org>
-
Ralf Baechle authored
The SB1 core has a three cycle interrupt disable hazard but we were wrongly treating it as fully interlocked. Signed-off-by:
Ralf Baechle <ralf@linux-mips.org>
-
Alexey Kuznetsov authored
It is broken, the condition is checked out of socket lock. It is wonderful the bug survived for so long time. [ This fixes bugzilla #6233: race condition in tcp_sendmsg when connection became established ] Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Mar 16, 2006
-
-
John Rose authored
The dynamic add path for PCI Host Bridges can fail to configure children adapters under P5IOC controllers. It fails to properly fixup bus/device resources, and it fails to properly enable EEH. Both of these steps need to occur before any children devices are enabled in pci_bus_add_devices(). Signed-off-by:
John Rose <johnrose@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- Mar 15, 2006
-
-
Ben Dooks authored
Patch from Ben Dooks The enable_hlt and disable_hlt should be declared in include/asm/setup.h. This fixes sparse errors from arch/arm/kernel/process.c Signed-off-by:
Ben Dooks <ben-linux@fluff.org> Signed-off-by:
Russell King <rmk+kernel@arm.linux.org.uk>
-
- Mar 12, 2006
-
-
Russell King authored
This patch removes the reliance of iwmmxt on hand coded alignments. Since thread_info is always 8K aligned, specifying that fpstate is 8-byte aligned achieves the same effect without needing to resort to hand coded alignments. Signed-off-by:
Russell King <rmk+kernel@arm.linux.org.uk>
-
- Mar 11, 2006
-
-
Christoph Hellwig authored
The patch '[PATCH] RCU signal handling' [1] added an export for __put_task_struct_cb, a put_task_struct helper newly introduced in that patch. But the put_task_struct couldn't be used modular previously as __put_task_struct wasn't exported. There are not callers of it in modular code, and it shouldn't be exported because we don't want drivers to hold references to task_structs. This patch removes the export and folds __put_task_struct into __put_task_struct_cb as there's no other caller. [1] http://www2.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e56d090310d7625ecb43a1eeebd479f04affb48b Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Kirill Korotaev authored
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in ext3_symlink(). Such allocation may re-enter ext3 code from try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal handle in task_struct and, hence, is not reentrable. This bug led to "Assertion failure in journal_dirty_metadata()" messages. http://bugzilla.openvz.org/show_bug.cgi?id=115 Signed-off-by:
Andrey Savochkin <saw@saw.sw.com.sg> Signed-off-by:
Kirill Korotaev <dev@openvz.org> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-