summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| | * | | [JFFS2] Fix compr_rubin.c build after include file elimination.Andrew Morton2007-04-261-10/+9Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems to be silly season lately. (Oops, test builds are more useful if the file in question is actually configured on. dwmw2). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Handle inodes with only a single metadata node with non-zero isizeDavid Woodhouse2007-04-253-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should never happen unless there's corruption on the medium and the actual data nodes go missing. But the failure mode (an oops when we assume the fragtree isn't empty and go looking for its last node) isn't useful. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Tidy up licensing/copyright boilerplate.David Woodhouse2007-04-2545-217/+155Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In particular, remove the bit in the LICENCE file about contacting Red Hat for alternative arrangements. Their errant IS department broke that arrangement a long time ago -- the policy of collecting copyright assignments from contributors came to an end when the plug was pulled on the servers hosting the project, without notice or reason. We do still dual-license it for use with eCos, with the GPL+exception licence approved by the FSF as being GPL-compatible. It's just that nobody has the right to license it differently. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Better fix for all-zero node headersJoakim Tjernlund2007-04-252-15/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to check for all-zero header since the header cannot be zero due to other checks. Replace the all-zero header check in readinode.c with a check for the magic word. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Improve read_inode memory usage, v2.David Woodhouse2007-04-253-599/+618
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We originally used to read every node and allocate a jffs2_tmp_dnode_info structure for each, before processing them in (reverse) version order and discarding the ones which are obsoleted by later nodes. With huge logfiles, this behaviour caused memory problems. For example, a file involved in OLPC trac #1292 has 1822391 nodes, and would cause the XO machine to run out of memory during the first stage of read_inode(). Instead of just inserting nodes into a tree in version order as we find them, we now put them into a tree in order of their offset within the file, which allows us to immediately discard nodes which are completely obsoleted. We don't use a full tree with 'fragments' pointing to the real data structure, as we do in the normal fragtree. We sort only on the start address, and add an 'overlapped' flag to the tmp_dnode_info to indicate that the node in question is (partially) overlapped by another. When the scan is complete, we start at the end of the file, adding each node to a real fragtree as before. Where the node is non-overlapped, we just add it (it doesn't matter that it's not the latest version; there is no overlap). When the node at the end of the tree _is_ overlapped, we sort it and all its overlapping nodes into version order and then add them to the fragtree in that order. This 'early discard' reduces the peak allocation of tmp_dnode_info structures from 1.8M to a mere 62872 (3.5%) in the degenerate case referenced above. This version of the patch also correctly rememembers the highest node version# seen for an inode when it's scanned. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Improve failure mode if inode checking leaves unchecked space.David Woodhouse2007-04-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should never find the unchecked size is non-zero after we've finished checking all inodes. If it happens, used to BUG(), leaving the alloc_sem held and deadlocking. Instead, just return -ENOSPC after complaining. The GC thread will die, but read-only operation should be able to continue and the file system should be unmountable. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Fix cross-endian build.David Woodhouse2007-04-232-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling a LE-capable JFFS2 on PowerPC, wbuf.c fails to compile: fs/jffs2/wbuf.c:973: error: braced-group within expression allowed only inside a function fs/jffs2/wbuf.c:973: error: initializer element is not constant fs/jffs2/wbuf.c:973: error: (near initialization for ‘oob_cleanmarker.magic’) fs/jffs2/wbuf.c:974: error: braced-group within expression allowed only inside a function fs/jffs2/wbuf.c:974: error: initializer element is not constant fs/jffs2/wbuf.c:974: error: (near initialization for ‘oob_cleanmarker.nodetype’) fs/jffs2/wbuf.c:975: error: braced-group within expression allowed only inside a function fs/jffs2/wbuf.c:976: error: initializer element is not constant fs/jffs2/wbuf.c:976: error: (near initialization for ‘oob_cleanmarker.totlen’) Provide constant_cpu_to_je{16,32} functions, and use them for initialising the offending structure. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Obsolete dirent nodes immediately on unlink, where possible.Joakim Tjernlund2007-04-211-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Speed up mount for directly-mapped NOR flashJoakim Tjernlund2007-04-171-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove excessive scanning of empty flash after a clean marker for users of the point/unpoint method. cfi_cmdset_0001 uses point/unpoint by default iff flash mapping is linear. The speedup is several orders of magnitude if FS is less than half full. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] fix buffer sise calculations in jffs2_get_inode_nodes()Artem Bityutskiy2007-04-171-65/+33Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In read inode we have an optimization which prevents one min. I/O unit (e.g. NAND page) to be read more then once. Namely, at the beginning we do not know which node type we read, so we read so we assume we read the directory entry, because it has the smallest node header. When we read it, we read up to the next min. I/O unit, just because if later we'll need to read more, we already have this data. If it turns out to be that the node is not directory entry, and we need more data, and we did not read it because it sits in the next min. I/O unit, we read the whole next (or several next) min. I/O unit(s). And if it happens to be that we read a data node, and we've read part of its data, we calculate partial CRC. So if later we need to check data CRC, we'll only read the rest of the data from further min. I/O units and continue CRC checking. This code was a bit messy and buggy. The bug was that it assumed relatively large min. I/O unit, so that the largest node header could overlap only one min. I/O unit boundary. This parch clean-ups the code a bit and fixes this bug. The patch was not tested on flash with small min. I/O unit, like NOR-ECC, nut it was tested on NAND with 512 bytes NAND page, so it at least does not break NAND. It was also tested with mtdram so it should not break NOR. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Disable summary after wbuf recoveryAdrian Hunter2007-04-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a write error, any data in the write buffer must be relocated. This is handled by the jffs2_wbuf_recover function. This function does not fix up the erase block summary information that is collected for writing at the end of the block, which results in an incorrect summary (or BUG if the summary was found to be empty). As the summary is not essential (it is an optimisation), it may be disabled for the current erase block when this situation arises. This patch does that. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Prevent list corruption when handling write errorsAdrian Hunter2007-04-171-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a write error occurs, the affected block is placed on the bad_used_list. In the case that the write error occured when writing summary data the block was also being placed on the dirty_list, which caused list corruption and ultimately a soft lockup in jffs2_mark_node_obsolete. This fixes that. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] fix deadlock on error pathArtem Bityutskiy2007-04-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the MTD driver returns write failure, the following deadlock occurs: We are in __jffs2_flush_wbuf(), we hold &c->wbuf_sem. Write failure. jffs2_wbuf_recover()->jffs2_reserve_space_gc()->jffs2_do_reserve_space() ->jffs2_erase_pending_blocks()->jffs2_flash_read() and it tries to lock &c->wbuf_sem again. Deadlock. Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] check node crc before doing anything elseThomas Gleixner2007-04-171-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check the node CRC on scan before doing anything else with the node. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Delete everything related to obsolete JFFS2_PROC optionRobert P. J. Day2007-04-022-148/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Delete everything related to the apparently non-existent kernel config option JFFS2_PROC. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| | * | | [JFFS2] Remove superfluous source file fs/jffs2/comprtest.cRobert P. J. Day2007-03-101-307/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Delete the obsolete source file fs/jffs2/comprtest.c. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| * | | | Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds2007-04-276-38/+110
| |\ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (46 commits) dev_dbg: check dev_dbg() arguments drivers/base/attribute_container.c: use mutex instead of binary semaphore mod_sysfs_setup() doesn't return errno when kobject_add_dir() failure occurs s2ram: add arch irq disable/enable hooks define platform wakeup hook, use in pci_enable_wake() security: prevent permission checking of file removal via sysfs_remove_group() device_schedule_callback() needs a module reference s390: cio: Delay uevents for subchannels sysfs: bin.c printk fix Driver core: use mutex instead of semaphore in DMA pool handler driver core: bus_add_driver should return an error if no bus debugfs: Add debugfs_create_u64() the overdue removal of the mount/umount uevents kobject: Comment and warning fixes to kobject.c Driver core: warn when userspace writes to the uevent file in a non-supported way Driver core: make uevent-environment available in uevent-file kobject core: remove rwsem from struct subsystem qeth: Remove usage of subsys.rwsem PHY: remove rwsem use from phy core IEEE1394: remove rwsem use from ieee1394 core ...
| | * | | security: prevent permission checking of file removal via sysfs_remove_group()James Morris2007-04-272-22/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent permission checking from being performed when the kernel wants to unconditionally remove a sysfs group, by introducing an kernel-only variant of lookup_one_len(), lookup_one_len_kern(). Additionally, as sysfs_remove_group() does not check the return value of the lookup before using it, a BUG_ON has been added to pinpoint the cause of any problems potentially caused by this (and as a form of annotation). Signed-off-by: James Morris <jmorris@namei.org> Cc: Nagendra Singh Tomar <nagendra_tomar@adaptec.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * | | device_schedule_callback() needs a module referenceAlan Stern2007-04-271-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (as896b) fixes an oversight in the design of device_schedule_callback(). It is necessary to acquire a reference to the module owning the callback routine, to prevent the module from being unloaded before the callback can run. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Cc: Satyam Sharma <satyam.sharma@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * | | sysfs: bin.c printk fixAndrew Morton2007-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fs/sysfs/bin.c: In function 'read': fs/sysfs/bin.c:77: warning: format '%zd' expects type 'signed size_t', but argument 4 has type 'int' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * | | debugfs: Add debugfs_create_u64()Michael Ellerman2007-04-271-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I went to use this the other day, only to find it didn't exist. It's a straight copy of the debugfs u32 code, then s/u32/u64/. A quick test shows it seems to be working. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * | | the overdue removal of the mount/umount ueventsAdrian Bunk2007-04-271-12/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains the overdue removal of the mount/umount uevents. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | Merge branch 'for-linus' of git://git.infradead.org/ubi-2.6Linus Torvalds2007-04-273-0/+42
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.infradead.org/ubi-2.6: UBI: remove unused variable UBI: add me to MAINTAINERS JFFS2: add UBI support UBI: Unsorted Block Images
| | * | | | JFFS2: add UBI supportArtem Bityutskiy2007-04-273-0/+42
| | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch make JFFS2 able to work with UBI volumes via the emulated MTD devices which are directly mapped to these volumes. Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
| * | | | Merge branch 'upstream-linus' of ↵Linus Torvalds2007-04-2730-2235/+4690
| |\ \ \ \ | | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (27 commits) ocfs2: Cache extent records ocfs2: Remember rw lock level during direct io ocfs2: Fix up i_blocks calculation to know about holes ocfs2: Fix extent lookup to return true size of holes ocfs2: Read from an unwritten extent returns zeros ocfs2: make room for unwritten extents flag ocfs2: Use own splice write actor ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate() [PATCH] Turn do_sync_file_range() into do_sync_mapping_range() ocfs2: zero tail of sparse files on truncate ocfs2: Teach ocfs2_get_block() about holes ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write() ocfs2: teach ocfs2_file_aio_write() about sparse files ocfs2: Turn off shared writeable mmap for local files systems with holes. ocfs2: abstract out allocation locking ocfs2: teach extend/truncate about sparse files ocfs2: temporarily remove extent map caching ocfs2: sparse b-tree support ocfs2: small cleanup of ocfs2_request_delete() ocfs2: remove unused code ...
| | * | | ocfs2: Cache extent recordsMark Fasheh2007-04-277-0/+289
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when meta data block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no un-necessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Remember rw lock level during direct ioMark Fasheh2007-04-273-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cluster locking might have been redone because a direct write won't complete, so this needs to be reflected in the iocb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Fix up i_blocks calculation to know about holesMark Fasheh2007-04-279-30/+25Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Fix extent lookup to return true size of holesMark Fasheh2007-04-275-12/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initially, we had wired things to return a size '1' of holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Read from an unwritten extent returns zerosMark Fasheh2007-04-2710-23/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: make room for unwritten extents flagMark Fasheh2007-04-276-69/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the size of our group bitmaps, we'll never have a leaf node extent record with more than 16 bits worth of clusters. Split e_clusters up so that leaf nodes can get a flags field where we can mark unwritten extents. Interior nodes whose length references all the child nodes beneath it can't split their e_clusters field, so we use a union to preserve sizing there. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Use own splice write actorMark Fasheh2007-04-273-1/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to fill holes during a splice write. Provide our own splice write actor which can call ocfs2_file_buffered_write() with a splice-specific callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate()Mark Fasheh2007-04-271-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do this instead of filemap_fdatawrite() - this way we sync only the range between i_size and the cluster boundary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | [PATCH] Turn do_sync_file_range() into do_sync_mapping_range()Mark Fasheh2007-04-271-5/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do_sync_file_range() accepts a file * from which it takes an address_space to sync. Abstract out the bulk of the function into do_sync_mapping_range() which takes the address_space directly. This way callers who want to sync an address_space directly can take advantage of the functionality provided. do_sync_file_range() is preserved as a small wrapper around do_sync_mapping_range(). Ocfs2 in particular would like to use this to initiate a sync of a specific inode range during truncate, where a file * may not be available. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | * | | ocfs2: zero tail of sparse files on truncateMark Fasheh2007-04-277-25/+328
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and and end of it's cluster. Otherwise a subsequent extend could expose bad data. This introduced a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Teach ocfs2_get_block() about holesMark Fasheh2007-04-271-38/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ocfs2_get_block() didn't understand sparse files, fix that. Also remove some code that isn't really useful anymore. We can fix up ocfs2_direct_IO_get_blocks() at the same time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write()Mark Fasheh2007-04-271-120/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are no longer used, and can't handle file systems with sparse file allocation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: teach ocfs2_file_aio_write() about sparse filesMark Fasheh2007-04-277-57/+1076
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock() because allocating writes will require zeroing of pages adjacent to the I/O for cluster sizes greater than page size. Implement a custom file write here, which can order page locks for zeroing. This also has the advantage that cluster locks can easily be ordered outside of the page locks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Turn off shared writeable mmap for local files systems with holes.Mark Fasheh2007-04-271-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will be turned back on once we can do allocation in ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: abstract out allocation lockingMark Fasheh2007-04-271-27/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, file allocation for ocfs2 is done within ocfs2_extend_file(), which is either called from ->setattr() (for an i_size change), or at the top of ocfs2_file_aio_write(). Inodes on file systems with sparse file support will want to do their allocation during the actual write call. In either case the cluster locking decisions are the same. We abstract out that code into a new function, ocfs2_lock_allocators() which will be used by a later patch to enable writing to sparse files. This also provides a nice cleanup of ocfs2_extend_allocation(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: teach extend/truncate about sparse filesMark Fasheh2007-04-273-237/+320
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no longer exists since i_size is not tied to i_clusters. In ocfs2_extend_file(), we skip the allocation / page zeroing code for file systems which understand sparse files. The core truncate code is changed to do a bottom up tree traversal. This gets abstracted out into it's own function. To make things more readable, most of the special case handling for in-inode extents from ocfs2_do_truncate() is also removed. Though write support for sparse files comes in a later patch, we at least update ocfs2_prepare_inode_for_write() to skip allocation for sparse files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: temporarily remove extent map cachingMark Fasheh2007-04-2714-996/+96Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: sparse b-tree supportMark Fasheh2007-04-268-497/+2002
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce tree rotations into the b-tree code. This will allow ocfs2 to support sparse files. Much of the added code is designed to be generic (in the ocfs2 sense) so that it can later be re-used to implement large extended attributes. This patch only adds the rotation code and does minimal updates to callers of the extent api. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: small cleanup of ocfs2_request_delete()Mark Fasheh2007-04-261-33/+13Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two checks in there (one for inode newness, one for other mounted nodes) which are unnecessary, so remove them. The DLM will allow the trylock in either case without any messaging overhead. Removing these makes ocfs2_request_delete() a one liner function, so just move the trylock out one level into ocfs2_query_inode_wipe(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: remove unused codeTiger Yang2007-04-264-329/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove node messaging code that becomes unused with the delete inode vote removal. [Removed even more cruft which I spotted during review --Mark] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Remove delete inode voteTiger Yang2007-04-2610-38/+205
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: filter more error printsMark Fasheh2007-04-262-3/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't want to print anything at all in ocfs2_lookup() when getting an error from ocfs2_iget() - it could be something as innocuous as a signal being detected in the dlm. ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can return if the inode was deleted on another node. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Replace panic() with emergency_restart() when fencingSunil Mushran2007-04-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have noticed panic() hanging leading us to a situation in which the node, while otherwise dead, is still disk heartbeating. This leads to a hung cluster as the other nodes are waiting for this node to stop disk heartbeating. This situation is only resolved by power resetting the box. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Silence compiler warningsSunil Mushran2007-04-262-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| | * | | ocfs2: Local mounts should skip inode updatesMark Fasheh2007-04-261-12/+9Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't want the extent map and uptodate cache destruction in ocfs2_meta_lock_update() on a local mount, so skip that. This fixes several bugs with uptodate being cleared on buffers and extent maps being corrupted. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>