- Feb 06, 2018
-
-
David Howells authored
Support the AFS dynamic root which is a pseudo-volume that doesn't connect to any server resource, but rather is just a root directory that dynamically creates mountpoint directories where the name of such a directory is the name of the cell. Such a mount can be created thus: mount -t afs none /afs -o dyn Dynamic root superblocks aren't shared except by bind mounts and propagation. Cell root volumes can then be mounted by referring to them by name, e.g.: ls /afs/grand.central.org/ ls /afs/.grand.central.org/ The kernel will upcall to consult the DNS if the address wasn't supplied directly. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Rearrange afs_select_fileserver() a little to put the use_server chunk before the next_server chunk so that with the removal of a couple of gotos the main path through the function is all one sequence. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Remove some old unused code. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Fix server list handling in the following ways: (1) In afs_alloc_volume(), remove duplicate server list build code. This was already done by afs_alloc_server_list() which afs_alloc_volume() previously called. This just results in twice as many VL RPCs. (2) In afs_deliver_vl_get_entry_by_name_u(), use the number of server records indicated by ->nServers in the UVLDB record returned by the VL.GetEntryByNameU RPC call rather than scanning all NMAXNSERVERS slots. Unused slots may contain garbage. (3) In afs_alloc_server_list(), don't stop converting a UVLDB record into a server list just because we can't look up one of the servers. Just skip that server and go on to the next. If we can't look up any of the servers then we'll fail at the end. Without this patch, an attempt to view the umich.edu root cell using something like "ls /afs/umich.edu" on a dynamic root (future patch) mount or an autocell mount will result in ENOMEDIUM. The failure is due to kafs not stopping after nServers'worth of records have been read, but then trying to access a server with a garbage UUID and getting an error, which aborts the server list build. Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by:
Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by:
David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
-
David Howells authored
In afs_select_fileserver(), we need to clear the ->responded flag in the address list when reusing it. We should also clear it in afs_select_current_fileserver(). To this end, just memset() the object before initialising it. Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by:
David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
-
David Howells authored
afs_select_fileserver() ends the address cursor it is using in the case in which we get some sort of network error and run out of addresses to iterate through, before it jumps to try the next server. This also needs to be done when the server aborts with some sort of error that means we should try the next server. Fix this by: (1) Move the iterate_address afs_end_cursor() call to the next_server case. (2) End the cursor in the failed case. (3) Make afs_end_cursor() clear the ->begun flag and ->addr pointer in the address cursor. (4) Make afs_end_cursor() able to be called on an already cleared cursor. Without this, something like the following oops may occur: AFS: Assertion failed 18446612134397189888 == 0 is false 0xffff88007c279f00 == 0x0 is false ------------[ cut here ]------------ kernel BUG at fs/afs/rotate.c:360! RIP: 0010:afs_select_fileserver+0x79b/0xa30 [kafs] Call Trace: afs_statfs+0xcc/0x180 [kafs] ? p9_client_statfs+0x9e/0x110 [9pnet] ? _cond_resched+0x19/0x40 statfs_by_dentry+0x6d/0x90 vfs_statfs+0x1b/0xc0 user_statfs+0x4b/0x80 SYSC_statfs+0x15/0x30 SyS_statfs+0xe/0x10 entry_SYSCALL_64_fastpath+0x20/0x83 Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by:
Marc Dionne <marc.dionne@auristor.com> Signed-off-by:
David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
-
David Howells authored
afs_alloc_volume() needs to release the cell ref it obtained in the case of an error. Fix this by adding an afs_put_cell() call into the error path. This can triggered when a lookup for a cell in a dynamic root or an autocell mount returns an error whilst trying to look up the server (such as ENOMEDIUM). This results in an assertion failure oops when the module is unloaded due to outstanding refs on a cell record. Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by:
David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
-
- Jan 29, 2018
-
-
Jeff Layton authored
For AFS, it's generally treated as an opaque value, so we use the *_raw variants of the API here. Note that AFS has quite a different definition for this counter. AFS only increments it on changes to the data to the data in regular files and contents of the directories. Inode metadata changes do not result in a version increment. We'll need to reconcile that somehow if we ever want to present this to userspace via statx. Signed-off-by:
Jeff Layton <jlayton@redhat.com>
-
- Jan 02, 2018
-
-
David Howells authored
afs_write_end() is missing page unlock and put if afs_fill_page() fails. Reported-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Repeating creation and deletion of a file on an afs mount will run the box out of memory, e.g.: dd if=/dev/zero of=/afs/scratch/m0 bs=$((1024*1024)) count=512 rm /afs/scratch/m0 The problem seems to be that it's not properly decrementing the nlink count so that the inode can be scrapped. Note that this doesn't fix local creation followed by remote deletion. That's harder to handle and will require a separate patch as we're not told that the file has been deleted - only that the directory has changed. Reported-by:
Marc Dionne <marc.dionne@auristor.com> Signed-off-by:
David Howells <dhowells@redhat.com>
-
Dan Carpenter authored
Smatch warns that: fs/afs/rxrpc.c:922 afs_extract_data() error: uninitialized symbol 'remote_abort'. Smatch is right that "remote_abort" might be uninitialized when we pass it to afs_set_call_complete(). I don't know if that function uses the uninitialized variable. Anyway, the comment for rxrpc_kernel_recv_data(), says that "*_abort should also be initialised to 0." and this patch does that. Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
David Howells <dhowells@redhat.com>
-
- Dec 01, 2017
-
-
David Howells authored
When an AFS inode is allocated by afs_alloc_inode(), the allocated afs_vnode struct isn't necessarily reset from the last time it was used as an inode because the slab constructor is only invoked once when the memory is obtained from the page allocator. This means that information can leak from one inode to the next because we're not calling kmem_cache_zalloc(). Some of the information isn't reset, in particular the permit cache pointer. Bring the clearances up to date. Signed-off-by:
David Howells <dhowells@redhat.com> Tested-by:
Marc Dionne <marc.dionne@auristor.com>
-
David Howells authored
Fix four refcount bugs in afs_cache_permit(): (1) When checking the result of the kzalloc(), we can't just return, but must put 'permits'. (2) We shouldn't put permits immediately after hashing a new permit as we need to keep the pointer stable so that we can check to see if vnode->permit_cache has changed before we decide whether to assign to it. (3) 'permits' is being put twice. (4) We need to put either the replacement or the thing replaced after the assignment to vnode->permit_cache. Without this, lots of the following are seen: Kernel BUG at ffffffffa039857b [verbose debug info unavailable] ------------[ cut here ]------------ Kernel BUG at ffffffffa039858a [verbose debug info unavailable] ------------[ cut here ]------------ The addresses are in the .text..refcount section of the kafs.ko module. Following the relocation records for the __ex_table section shows one to be due to the decrement in afs_put_permits() and the other to be key_get() in afs_cache_permit(). Occasionally, the following is seen: refcount_t overflow at afs_cache_permit+0x57d/0x5c0 [kafs] in cc1[562], uid/euid: 0/0 WARNING: CPU: 0 PID: 562 at kernel/panic.c:657 refcount_error_report+0x9c/0xac ... Reported-by:
Marc Dionne <marc.dionne@auristor.com> Signed-off-by:
David Howells <dhowells@redhat.com> Tested-by:
Marc Dionne <marc.dionne@auristor.com>
-
- Nov 27, 2017
-
-
Linus Torvalds authored
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBIND...
-
- Nov 24, 2017
-
-
Colin Ian King authored
The assignment of dvnode to itself is redundant and can be removed. Cleans up warning detected by cppcheck: fs/afs/dir.c:975: (warning) Redundant assignment of 'dvnode' to itself. Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by:
Colin Ian King <colin.king@canonical.com> Signed-off-by:
David Howells <dhowells@redhat.com>
-
Gustavo A. R. Silva authored
Due to recent changes this piece of code is no longer needed. Addresses-Coverity-ID: 1462033 Link: https://lkml.kernel.org/r/4923.1510957307@warthog.procyon.org.uk Signed-off-by:
Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
afs_mkdir(), afs_create(), afs_link() and afs_symlink() all need to drop the target dentry if a signal causes the operation to be killed immediately before we try to contact the server. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Fix some of dentry handling in AFS directory ops: (1) Do d_drop() on the new_dentry before assigning a new inode to it in afs_vnode_new_inode(). It's fine to do this before calling afs_iget() because the operation has taken place on the server. (2) Replace d_instantiate()/d_rehash() with d_add(). (3) Don't d_drop() the new_dentry in afs_rename() on error. Also fix afs_link() and afs_rename() to call key_put() on all error paths where the key is taken. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Make afs_write_begin() wait for a page that's marked PG_writeback because: (1) We need to avoid interference with the data being stored so that the data on the server ends up in a defined state. (2) page->private is used to track the window of dirty data within a page, but it's also used by the storage code to track what's being written, being cleared by the completion notification. Ownership can't be relinquished by the storage code until completion because it a store fails, the data must be remarked dirty. Tracing shows something like the following (edited): x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-125 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store+ 0-125 x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-2052 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 clear 0-2052 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store 0-0 kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 WARN 0-0 The clear (completion) corresponding to the store+ (store continuation from a previous page) happens between the second begin (afs_write_begin) and the store corresponding to that. This results in the second store not seeing any data to write back, leading to the following warning: WARNING: CPU: 2 PID: 114 at ../fs/afs/write.c:403 afs_write_back_from_locked_page+0x19d/0x76c [kafs] Modules linked in: kafs(E) CPU: 2 PID: 114 Comm: kworker/u8:3 Tainted: G E 4.14.0-fscache+ #242 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Workqueue: writeback wb_workfn (flush-afs-2) task: ffff8800cad72600 task.stack: ffff8800cad44000 RIP: 0010:afs_write_back_from_locked_page+0x19d/0x76c [kafs] RSP: 0018:ffff8800cad47aa0 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff8800bef33a20 RCX: 0000000000000000 RDX: 000000000000000f RSI: ffffffff81c5d0e0 RDI: ffff8800cad72e78 RBP: ffff8800d31ea1e8 R08: ffff8800c1358000 R09: ffff8800ca00e400 R10: ffff8800cad47a38 R11: ffff8800c5d9e400 R12: 0000000000000000 R13: ffffea0002d9df00 R14: ffffffffa0023c1c R15: 0000000000007fdf FS: 0000000000000000(0000) GS:ffff8800ca700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f85ac6c4000 CR3: 0000000001c10001 CR4: 00000000001606e0 Call Trace: ? clear_page_dirty_for_io+0x23a/0x267 afs_writepages_region+0x1be/0x286 [kafs] afs_writepages+0x60/0x127 [kafs] do_writepages+0x36/0x70 __writeback_single_inode+0x12f/0x635 writeback_sb_inodes+0x2cc/0x452 __writeback_inodes_wb+0x68/0x9f wb_writeback+0x208/0x470 ? wb_workfn+0x22b/0x565 wb_workfn+0x22b/0x565 ? worker_thread+0x230/0x2ac process_one_work+0x2cc/0x517 ? worker_thread+0x230/0x2ac worker_thread+0x1d4/0x2ac ? rescuer_thread+0x29b/0x29b kthread+0x15d/0x165 ? kthread_create_on_node+0x3f/0x3f ? call_usermodehelper_exec_async+0x118/0x11f ret_from_fork+0x24/0x30 Signed-off-by:
David Howells <dhowells@redhat.com>
-
- Nov 17, 2017
-
-
David Howells authored
Fix the AFS file locking whereby the use of the big kernel lock (which could be slept with) was replaced by a spinlock (which couldn't). The problem is that the AFS code was doing stuff inside the critical section that might call schedule(), so this is a broken transformation. Fix this by the following means: (1) Use a state machine with a proper state that can only be changed under the spinlock rather than using a collection of bit flags. (2) Cache the key used for the lock and the lock type in the afs_vnode struct so that the manager work function doesn't have to refer to a file_lock struct that's been dequeued. This makes signal handling safer. (4) Move the unlock from afs_do_unlk() to afs_fl_release_private() which means that unlock is achieved in other circumstances too. (5) Unlock the file on the server before taking the next conflicting lock. Also change: (1) Check the permits on a file before actually trying the lock. (2) fsync the file before effecting an explicit unlock operation. We don't fsync if the lock is erased otherwise as we might not be in a context where we can actually do that. Further fixes: (1) Fixed-fileserver address rotation is made to work. It's only used by the locking functions, so couldn't be tested before. Fixes: 72f98e72 ("locks: turn lock_flocks into a spinlock") Signed-off-by:
David Howells <dhowells@redhat.com> cc: jlayton@redhat.com
-
- Nov 16, 2017
-
-
Mel Gorman authored
Every pagevec_init user claims the pages being released are hot even in cases where it is unlikely the pages are hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net Signed-off-by:
Mel Gorman <mgorman@techsingularity.net> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Use find_get_pages_range_tag() in afs_writepages_region() as we are interested only in pages from given range. Remove unnecessary code after this conversion. Link: http://lkml.kernel.org/r/20171009151359.31984-16-jack@suse.cz Signed-off-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 13, 2017
-
-
David Howells authored
Protect call->state changes against the call being prematurely terminated due to a signal. What can happen is that a signal causes afs_wait_for_call_to_complete() to abort an afs_call because it's not yet complete whilst afs_deliver_to_call() is delivering data to that call. If the data delivery causes the state to change, this may overwrite the state of the afs_call, making it not-yet-complete again - but no further notifications will be forthcoming from AF_RXRPC as the rxrpc call has been aborted and completed, so kAFS will just hang in various places waiting for that call or on page bits that need clearing by that call. A tracepoint to monitor call state changes is also provided. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Add a trace event that logs the dirtying and cleaning of pages attached to AFS inodes. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Implement shared-writeable mmap for AFS. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Get rid of the afs_writeback record that kAFS is using to match keys with writes made by that key. Instead, keep a list of keys that have a file open for writing and/or sync'ing and iterate through those. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Introduce a file-private data record for kAFS and put the key into it rather than storing the key in file->private_data. Signed-off-by:
David Howells <dhowells@redhat.com>
-
Marc Dionne authored
It is not required that the afs client operate on port 7001. The port could be in use because another kernel or userspace client has already bound to it. If the port is in use, just fallback to using a dynamic port. Signed-off-by:
Marc Dionne <marc.dionne@auristor.com> Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Because parsing of the directory wasn't being done under any sort of lock, the pages holding the directory content can get invalidated whilst the parsing is ongoing. Further, the directory page check function gets called outside of the page lock, so if the page gets cleared or updated, this may return reports of bad magic numbers in the directory page. Also, the directory may change size whilst checking and parsing are ongoing, so more care needs to be taken here. Fix this by: (1) Perform the page check from the page filling function before we set PageUptodate and drop the page lock. (2) Check for the file having shrunk and the page having been abandoned before checking the page contents. (3) Lock the page whilst parsing it for the directory iterator. Whilst we're at it, add a tracepoint to report check failure. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Add a pair of tracepoints to log the sending of pages for an FS.StoreData or FS.StoreData64 operation. Tracepoint afs_send_pages notes each set of pages added to the operation. There may be several of these per operation as we get up at most 8 contiguous pages in one go because the bvec we're using is on the stack. Tracepoint afs_sent_pages notes the end of adding data from a whole run of pages to the operation and the completion of the request phase. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Add tracepoints to trace the initiation and completion of client calls within the kafs filesystem. The afs_make_vl_call tracepoint watches calls to the volume location database server. The afs_make_fs_call tracepoint watches calls to the file server. The afs_call_done tracepoint watches for call completion. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Fix the total-length calculation in afs_make_call() when the operation being dispatched has data from a series of pages attached. Despite the patched code looking like that it should reduce mathematically to the current code, it doesn't because the 32-bit unsigned arithmetic being used to calculate the page-offset-difference doesn't correctly extend to a 64-bit value when the result is effectively negative. Without this, some FS.StoreData operations that span multiple pages fail, reporting too little or too much data. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Only progress the AFS call state at the end of Tx phase from the callback passed to rxrpc_kernel_send_data() rather than setting it before the last data send call. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
YFS VL servers offer an upgraded Volume Location service that can return IPv6 addresses to fileservers and volume servers in addition to IPv4 addresses using the YFSVL.GetEndpoints operation which we should use if it's available. To this end: (1) Make rxrpc_kernel_recv_data() return the call's current service ID so that the caller can detect service upgrade and see what the service was upgraded to. (2) When we see a VL server address we haven't seen before, send a VL.GetCapabilities operation to it with the service upgrade bit set. If we get an upgrade to the YFS VL service, change the service ID in the address list for that address to use the upgraded service and set a flag to note that this appears to be a YFS-compatible server. (3) If, when a server's addresses are being looked up, we note that we previously detected a YFS-compatible server, then send the YFSVL.GetEndpoints operation rather than VL.GetAddrsU. (4) Build a fileserver address list from the reply of YFSVL.GetEndpoints, including both IPv4 and IPv6 addresses. Volume server addresses are discarded. (5) The address list is sorted by address and port now, instead of just address. This allows multiple servers on the same host sitting on different ports. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
The current code assumes that volumes and servers are per-cell and are never shared, but this is not enforced, and, indeed, public cells do exist that are aliases of each other. Further, an organisation can, say, set up a public cell and a private cell with overlapping, but not identical, sets of servers. The difference is purely in the database attached to the VL servers. The current code will malfunction if it sees a server in two cells as it assumes global address -> server record mappings and that each server is in just one cell. Further, each server may have multiple addresses - and may have addresses of different families (IPv4 and IPv6, say). To this end, the following structural changes are made: (1) Server record management is overhauled: (a) Server records are made independent of cell. The namespace keeps track of them, volume records have lists of them and each vnode has a server on which its callback interest currently resides. (b) The cell record no longer keeps a list of servers known to be in that cell. (c) The server records are now kept in a flat list because there's no single address to sort on. (d) Server records are now keyed by their UUID within the namespace. (e) The addresses for a server are obtained with the VL.GetAddrsU rather than with VL.GetEntryByName, using the server's UUID as a parameter. (f) Cached server records are garbage collected after a period of non-use and are counted out of existence before purging is allowed to complete. This protects the work functions against rmmod. (g) The servers list is now in /proc/fs/afs/servers. (2) Volume record management is overhauled: (a) An RCU-replaceable server list is introduced. This tracks both servers and their coresponding callback interests. (b) The superblock is now keyed on cell record and numeric volume ID. (c) The volume record is now tied to the superblock which mounts it, and is activated when mounted and deactivated when unmounted. This makes it easier to handle the cache cookie without causing a double-use in fscache. (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU to get the server UUID list. (e) The volume name is updated if it is seen to have changed when the volume is updated (the update is keyed on the volume ID). (3) The vlocation record is got rid of and VLDB records are no longer cached. Sufficient information is stored in the volume record, though an update to a volume record is now no longer shared between related volumes (volumes come in bundles of three: R/W, R/O and backup). and the following procedural changes are made: (1) The fileserver cursor introduced previously is now fleshed out and used to iterate over fileservers and their addresses. (2) Volume status is checked during iteration, and the server list is replaced if a change is detected. (3) Server status is checked during iteration, and the address list is replaced if a change is detected. (4) The abort code is saved into the address list cursor and -ECONNABORTED returned in afs_make_call() if a remote abort happened rather than translating the abort into an error message. This allows actions to be taken depending on the abort code more easily. (a) If a VMOVED abort is seen then this is handled by rechecking the volume and restarting the iteration. (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is handled by sleeping for a short period and retrying and/or trying other servers that might serve that volume. A message is also displayed once until the condition has cleared. (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the moment. (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to see if it has been deleted; if not, the fileserver is probably indicating that the volume couldn't be attached and needs salvaging. (e) If statfs() sees one of these aborts, it does not sleep, but rather returns an error, so as not to block the umount program. (5) The fileserver iteration functions in vnode.c are now merged into their callers and more heavily macroised around the cursor. vnode.c is removed. (6) Operations on a particular vnode are serialised on that vnode because the server will lock that vnode whilst it operates on it, so a second op sent will just have to wait. (7) Fileservers are probed with FS.GetCapabilities before being used. This is where service upgrade will be done. (8) A callback interest on a fileserver is set up before an FS operation is performed and passed through to afs_make_call() so that it can be set on the vnode if the operation returns a callback. The callback interest is passed through to afs_iget() also so that it can be set there too. In general, record updating is done on an as-needed basis when we try to access servers, volumes or vnodes rather than offloading it to work items and special threads. Notes: (1) Pre AFS-3.4 servers are no longer supported, though this can be added back if necessary (AFS-3.4 was released in 1998). (2) VBUSY is retried forever for the moment at intervals of 1s. (3) /proc/fs/afs/<cell>/servers no longer exists. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Move server rotation code into its own file. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Add an RCU replaceable address list structure to hold a list of server addresses. The list also holds the To this end: (1) A cell's VL server address list can be loaded directly via insmod or echo to /proc/fs/afs/cells or dynamically from a DNS query for AFSDB or SRV records. (2) Anyone wanting to use a cell's VL server address must wait until the cell record comes online and has tried to obtain some addresses. (3) An FS server's address list, for the moment, has a single entry that is the key to the server list. This will change in the future when a server is instead keyed on its UUID and the VL.GetAddrsU operation is used. (4) An 'address cursor' concept is introduced to handle iteration through the address list. This is passed to the afs_make_call() as, in the future, stuff (such as abort code) that doesn't outlast the call will be returned in it. In the future, we might want to annotate the list with information about how each address fares. We might then want to propagate such annotations over address list replacement. Whilst we're at it, we allow IPv6 addresses to be specified in colon-delimited lists by enclosing them in square brackets. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Overhaul the way that the in-kernel AFS client keeps track of cells in the following manner: (1) Cells are now held in an rbtree to make walking them quicker and RCU managed (though this is probably overkill). (2) Cells now have a manager work item that: (A) Looks after fetching and refreshing the VL server list. (B) Manages cell record lifetime, including initialising and destruction. (B) Manages cell record caching whereby threads are kept around for a certain time after last use and then destroyed. (C) Manages the FS-Cache index cookie for a cell. It is not permitted for a cookie to be in use twice, so we have to be careful to not allow a new cell record to exist at the same time as an old record of the same name. (3) Each AFS network namespace is given a manager work item that manages the cells within it, maintaining a single timer to prod cells into updating their DNS records. This uses the reduce_timer() facility to make the timer expire at the soonest timed event that needs happening. (4) When a module is being unloaded, cells and cell managers are now counted out using dec_after_work() to make sure the module text is pinned until after the data structures have been cleaned up. (5) Each cell's VL server list is now protected by a seqlock rather than a semaphore. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Overhaul permit caching in AFS by making it per-vnode and sharing permit lists where possible. When most of the fileserver operations are called, they return a status structure indicating the (revised) details of the vnode or vnodes involved in the operation. This includes the access mark derived from the ACL (named CallerAccess in the protocol definition file). This is cacheable and if the ACL changes, the server will tell us that it is breaking the callback promise, at which point we can discard the currently cached permits. With this patch, the afs_permits structure has, at the end, an array of { key, CallerAccess } elements, sorted by key pointer. This is then cached in a hash table so that it can be shared between vnodes with the same access permits. Permit lists can only be shared if they contain the exact same set of key->CallerAccess mappings. Note that that table is global rather than being per-net_ns. If the keys in a permit list cross net_ns boundaries, there is no problem sharing the cached permits, since the permits are just integer masks. Since permit lists pin keys, the permit cache also makes it easier for a future patch to find all occurrences of a key and remove them by means of setting the afs_permits::invalidated flag and then clearing the appropriate key pointer. In such an event, memory barriers will need adding. Lastly, the permit caching is skipped if the server has sent either a vnode-specific or an entire-server callback since the start of the operation. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Overhaul the AFS callback handling by the following means: (1) Don't give up callback promises on vnodes that we are no longer using, rather let them just expire on the server or let the server break them. This is actually more efficient for the server as the callback lookup is expensive if there are lots of extant callbacks. (2) Only give up the callback promises we have from a server when the server record is destroyed. Then we can just give up *all* the callback promises on it in one go. (3) Servers can end up being shared between cells if cells are aliased, so don't add all the vnodes being backed by a particular server into a big FID-indexed tree on that server as there may be duplicates. Instead have each volume instance (~= superblock) register an interest in a server as it starts to make use of it and use this to allow the processor for callbacks from the server to find the superblock and thence the inode corresponding to the FID being broken by means of ilookup_nowait(). (4) Rather than iterating over the entire callback list when a mass-break comes in from the server, maintain a counter of mass-breaks in afs_server (cb_seq) and make afs_validate() check it against the copy in afs_vnode. It would be nice not to have to take a read_lock whilst doing this, but that's tricky without using RCU. (5) Save a ref on the fileserver we're using for a call in the afs_call struct so that we can access its cb_s_break during call decoding. (6) Write-lock around callback and status storage in a vnode and read-lock around getattr so that we don't see the status mid-update. This has the following consequences: (1) Data invalidation isn't seen until someone calls afs_validate() on a vnode. Unfortunately, we need to use a key to query the server, but getting one from a background thread is tricky without caching loads of keys all over the place. (2) Mass invalidation isn't seen until someone calls afs_validate(). (3) Callback breaking is going to hit the inode_hash_lock quite a bit. Could this be replaced with rcu_read_lock() since inodes are destroyed under RCU conditions. Signed-off-by:
David Howells <dhowells@redhat.com>
-