summaryrefslogtreecommitdiffstats
path: root/fs/ocfs2/dlm/dlmrecovery.c
Commit message (Collapse)AuthorAgeFilesLines
* ocfs2/dlm: don't access beyond bitmap sizeWengang Wang2010-07-121-1/+1
| | | | | | | | | | | dlm->recovery_map is defined as unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; We should treat O2NM_MAX_NODES as the bit map size in bits. This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap size. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* ocfs2: print node # when tcp failsWengang Wang2010-05-061-9/+18
| | | | | | | Print the node number of a peer node if sending it a message failed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* dlm: allow dlm do recovery during shutdownSrinivas Eeda2010-02-271-1/+1
| | | | | | | | | | | | | If a node down event happens while dlm shutdown in progress, dlm recovery should be done before dlm is shutdown. We can't migrate unrecovered locks, obviously. But dlm_reco_thread only does recovery if the dlm_state is in DLM_CTXT_JOINED. dlm_reco_thread should do recovery if dlm_state is in DLM_CTXT_JOINED or DLM_CTXT_IN_SHUTDOWN. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead nodeSunil Mushran2010-02-041-1/+6
| | | | | | | | | | | | | | | | | | | During recovery, the dlm frees the locks for the dead node. If it finds a lock in a resource for the dead node, it expects that node to also have a ref in that lock resource. If not, it BUGs. ossbz#1175 was filed with the above BUG. Now, while it is correct that we should be expecting the ref, I see no reason why we have to BUG. After all, we are freeing up the lock and clearing the ref. This patch replaces the BUG_ON with a printk(). Hopefully, that will give us more clues next time this happens. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1175 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* ocfs2/dlm: Handle EAGAIN for compatibility - v2Sunil Mushran2010-02-031-1/+7
| | | | | | | | | | | | Mainline commit aad1b15310b9bcd59fa81ab8f2b1513b59553ea8 made the dlm_begin_reco_handler() return -EAGAIN instead of EAGAIN. As this error is transmitted over the wire, we want the receiver, dlm_send_begin_reco_message(), to understand both the older EAGAIN and the newer -EAGAIN, to allow rolling upgrade of the cluster nodes. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* ocfs2/dlm: Print more messages during lock migrationSunil Mushran2010-01-261-8/+38
| | | | | | | | | | | | | | When a lock resource is migrated, the dlm compares the migrated locks with that that was already existing on the new node. If the comparison fails, it BUGs. This patch prints more messages when the comparison fails inorder to help with the root cause analyis. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1206 This does not fix bz1206. However, if we run into it again, we will have more information to chew on. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* ocfs2/dlm: Ignore LVBs of locks in the Blocked listSunil Mushran2010-01-261-14/+34
| | | | | | | | | | | | | | | | | | During lock resource migration, o2dlm fills the packet with a LVB from the first valid lock. For sanity, it ensures that the other valid locks have the same LVB. If not, it BUGs. The valid locks are ones that have granted EX or PR lock levels and are either on the Granted or Converting lists. Locks in the Blocked list cannot have a valid LVB. This patch ensures that we skip the locks in the Blocked list. Fixes oss bugzilla#1202 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1202 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* ocfs2/trivial: Remove trailing whitespacesSunil Mushran2010-01-261-19/+19
| | | | | | | Patch removes trailing whitespaces. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* ocfs2: return -EAGAIN instead of EAGAIN in dlmTiger Yang2009-12-031-9/+9
| | | | | | | | | We used to return positive EAGAIN to indicate a retry action is needed in dlm_begin_reco_handler(). Now we return negative -EAGAIN to erase the confusion caused by this error code. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* headers: utsname.h reduxAlexey Dobriyan2009-09-241-1/+0Star
| | | | | | | | | | | | | * remove asm/atomic.h inclusion from linux/utsname.h -- not needed after kref conversion * remove linux/utsname.h inclusion from files which do not need it NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however due to some personality stuff it _is_ needed -- cowardly leave ELF-related headers and files alone. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: trivial fix for s/migrate/migration/ in dlmrecovery.c loggingJeff Liu2009-07-091-1/+1
| | | | | | | | in dlmrecovery.c:1121, replace 'migrate' to 'migration' to keep the consistency by comparing to other lines with the similar log info in the same file. Signed-off-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* ocfs2/dlm: Print message showing the recovery masterSunil Mushran2008-03-101-3/+3
| | | | | | | | | Knowing the dlm recovery master helps in debugging recovery issues. This patch prints a message on the recovery master node. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2/dlm: Add missing dlm_lockres_put()s in migration pathSunil Mushran2008-03-101-6/+34
| | | | | | | | | | | | | | | | During migration, the recovery master node may be asked to master a lockres it may not know about. In that case, it would not only have to create a lockres and add it to the hash, but also remember to to do the _put_ corresponding to the kref_init in dlm_init_lockres(), as soon as the migration is completed. Yes, we don't wait for the dlm_purge_lockres() to do that matching put. Note the ref added for it being in the hash protects the lockres from being freed prematurely. This patch adds that missing put, as described above, to plug a memleak. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2/dlm: Add missing dlm_lock_put()sSunil Mushran2008-03-101-0/+9
| | | | | | | | | | | | | | | | | | | Normally locks for remote nodes are freed when that node sends an UNLOCK message to the master. The master node tags an DLM_UNLOCK_FREE_LOCK action to do an extra put on the lock at the end. However, there are times when the master node has to free the locks for the remote nodes forcibly. Two cases when this happens are: 1. When the master has migrated the lockres plus all locks to another node. 2. When the master is clearing all the locks of a dead node. It was in the above two conditions that the dlm was missing the extra put. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: Use dlm_print_one_lock_resource for lock resource printTao Ma2008-03-101-1/+1
| | | | | | | | | | __dlm_print_one_lock_resource must be called with spin_lock the res->spinlock. While in some cases, we use it without this precondition and lead to the failure of assert_spin_locked. So call dlm_print_one_lock_resource instead. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2/dlm: Clear joining_node on hearbeat node downTao Ma2008-01-261-6/+6
| | | | | | | | | | | | | | | | Currently the process of dlm join contains 2 steps: query join and assert join. After query join, the joined node will set its joining_node. So if the joining node happens to panic before the 2nd step, the joined node will fail to clear its joining_node flag because that node isn't in the domain map. It at least cause 2 problems. 1. All the new join request will fail. So no new node can mount the volume. 2. The joined node can't umount the volume since during the umount process it has to wait for the joining_node to be unknown. So the umount will be hanged. The solution is to clear the joining_node before we check the domain map. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2_dlm: Call node eviction callbacks from heartbeat handlerMark Fasheh2008-01-251-0/+7
| | | | | | | | With this, a dlm client can take advantage of the group protocol in the dlm to get full notification whenever a node within the dlm domain leaves unexpectedly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* Use helpers to obtain task pid in printksPavel Emelyanov2007-10-191-5/+5
| | | | | | | | | | | | | | | | The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [akpm@linux-foundation.org: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in ↵Shani Moideen2007-07-111-1/+1
| | | | | | | | | | fs/ocfs2/dlm/dlmrecovery.c Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c Signed-off-by: Shani Moideen <shani.moideen@wipro.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* [PATCH] ocfs2: use list_for_each_entry where beneficalChristoph Hellwig2007-07-111-52/+25Star
| | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: fix sparse warnings in fs/ocfs2/dlmMark Fasheh2007-05-031-2/+2
| | | | Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2_dlm: fix race in dlm_remaster_locksSrinivas Eeda2007-04-261-0/+2
| | | | | | | | | | | There is a possibility that dlm_remaster_locks could overwride node->state with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock spinlock. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: Added post handler callable function in o2net message handlerKurt Hackel2007-02-071-6/+12
| | | | | | | | | | | | | Currently o2net allows one handler function per message type. This patch adds the ability to call another function to be called after the handler has returned the message to the other node. Handlers are now given the option of returning a context (in the form of a void **) which will be passed back into the post message handler function. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2_dlm: Cookies in locks not being printed correctly in error messagesKurt Hackel2007-02-071-6/+6
| | | | | | | | | | | | The dlm encodes the node number and a sequence number in the lock cookie. It also stores the cookie in the lockres in the big endian format to avoid swapping 8 bytes on each lock request. The bug here was that it was assuming the cookie to be in the cpu format when decoding it for printing the error message. This patch swaps the bytes before the print. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2_dlm: wake up sleepers on the lockres waitqueueKurt Hackel2007-02-071-0/+1
| | | | | | | | | | The dlm was not waking up threads waiting on the lockres wait queue, waiting for the lockres to be no longer be in the DLM_LOCK_RES_IN_PROGRESS and the DLM_LOCK_RES_MIGRATING states. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2_dlm: Dlm dispatch was stopping too earlyKurt Hackel2007-02-071-3/+0Star
| | | | | | | | | | dlm_dispatch_work was not processing the queued up tasks at the first sign of the node leaving the domain leading to not only incompleted tasks but also a mismatch in the dlm refcnt. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2_dlm: Drop inflight refmap even if no locks found on the lockresKurt Hackel2007-02-071-5/+3Star
| | | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2_dlm: Fix migrate lockres handler queue scanningKurt Hackel2007-02-071-6/+20
| | | | | | | | | | | | The migrate lockres handler was only searching for its lock on migrated lockres on the expected queue. This could be problematic as the new master could have also issued a convert request during the migration and thus moved the lock to the convert queue. We now search for the lock on all three queues. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2_dlm: Make dlmunlock() wait for migration to completeKurt Hackel2007-02-071-0/+1
| | | | | | | | | dlmunlock() was not waiting for migration to complete before releasing locks on locally mastered locks. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2_dlm: fix cluster-wide refcounting of lock resourcesKurt Hackel2007-02-071-7/+116
| | | | | | | | | | | This was previously broken and migration of some locks had to be temporarily disabled. We use a new (and backward-incompatible) set of network messages to account for all references to a lock resources held across the cluster. once these are all freed, the master node may then free the lock resource memory once its local references are dropped. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* [PATCH] Fix numerous kcalloc() calls, convert to kzalloc()Robert P. J. Day2006-12-131-3/+3
| | | | | | | | | | | | | | | | | | | All kcalloc() calls of the form "kcalloc(1,...)" are converted to the equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect ordering of the first two arguments are fixed. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Adam Belay <ambx1@neo.rr.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* WorkStruct: make allyesconfigDavid Howells2006-11-221-2/+3
| | | | | | Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
* ocfs2: Allow binary names in the DLMMark Fasheh2006-09-241-1/+2
| | | | | | | | The OCFS2 DLM uses strlen() to determine lock name length, which excludes the possibility of putting binary values in the name string. Fix this by requiring that string length be passed in as a parameter. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* [PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() staticAdrian Bunk2006-06-301-2/+6
| | | | | | | dlm_lockres_master_requery() became global without any external usage. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* [PATCH] spin/rwlock init cleanupsIngo Molnar2006-06-281-2/+2
| | | | | | | | | | | | | | | | | | | | | locking init cleanups: - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK() - convert rwlocks in a similar manner this patch was generated automatically. Motivation: - cleanliness - lockdep needs control of lock initialization, which the open-coded variants do not give - it's also useful for -rt and for lock debugging in general Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fs/ocfs2/dlm/: cleanupsAdrian Bunk2006-06-261-1/+1
| | | | | | | | | | | | This patch #if 0's the no longer used dlm_dump_lock_resources(). Since this makes dlmdebug.h empty, this patch also removes this header. Additionally, the needlessly global dlm_is_node_recovered() is made static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: move dlm work to a private work queueKurt Hackel2006-06-261-2/+11
| | | | | | | | The work that is done can block for long periods of time and so is not appropriate for keventd. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: tune down some noisy messages during dlm recoveryKurt Hackel2006-06-261-1/+1
| | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: display message before waiting for recovery to completeKurt Hackel2006-06-261-0/+7
| | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: use GFP_NOFS in some dlm operationsKurt Hackel2006-06-261-5/+5
| | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: wait for recovery when starting lock masteryKurt Hackel2006-06-261-0/+30
| | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: continue recovery when a dead node is encounteredKurt Hackel2006-06-261-0/+1
| | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: remove unneccesary spin_unlock() in dlm_remaster_locks()Kurt Hackel2006-06-261-1/+0Star
| | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: dlm_remaster_locks() should never exit without completingKurt Hackel2006-06-261-54/+62
| | | | | | | | | We cannot restart recovery. Once we begin to recover a node, keep the state of the recovery intact and follow through, regardless of any other node deaths that may occur. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: special case recovery lock in dlmlock_remote()Kurt Hackel2006-06-261-0/+4
| | | | | | | | | | If the previous master of the recovery lock dies, let calc_usage take it down completely and let the caller completely redo the dlmlock() call. Otherwise, there will never be an opportunity to re-master the lockres and recovery wont be able to progress. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: update lvb immediately during recoveryKurt Hackel2006-06-261-18/+26
| | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: dump mismatching migrated lvbs before BUG()Kurt Hackel2006-06-261-2/+13
| | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: make dlm recovery finalization 2 stageKurt Hackel2006-06-261-17/+95
| | | | | | | Makes it easier for the recovery process to deal with node death. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: dlm recovery / lockres reference count fixKurt Hackel2006-06-261-3/+13
| | | | | | | Take a reference on lockres structures while they are on the recovery list. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: clean up recovery related messagesKurt Hackel2006-06-261-12/+90
| | | | | Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>