| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to avoid redoing synchronization/recovery/reshape partially,
the raid set got frozen until after all passed in table line flags had
been cleared. The related table reload sequence had to be precisely
followed, or reshaping may lead to data corruption caused by the active
mapping carrying on with a reshape when the inactive mapping already
had retrieved a stale reshape position.
Harden by retrieving the actual resync/recovery/reshape position
during resume whilst the active table is suspended thus avoiding
to keep the raid set frozen altogether. This prevents superfluous
redoing of an already resynchronized or recovered segment and,
most importantly, potential for redoing of an already reshaped
segment causing data corruption.
Fixes: d39f0010e ("dm raid: fix raid_resume() to keep raid set frozen as needed")
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Verifying the current raid sets redundancy based on retrieved
superblock content has to use the superblock's raid level (e.g. raid0),
not the constructor requested one (e.g. raid10).
Using the requested raid level of raid10 lead to a "divide error"
on raid0 which defines data copies divided by to be zero.
Also check for bogus data copies.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
| |
Also update Documentation accordingly.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move raid_resume()'s setting of 'rw' and 'in_sync' to just prior to
mddev_resume().
Also, remove unused 'bitmap_loaded' member from "struct raid_set".
No functional changes.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix various sync state issues causing racy/bogus sync ratio,
sync_action ad health chars in dm_status() info output.
Sync ratio could be N/N (i.e. 100%) shortly after raid set
creation, i.e. creating a new RaidLV or upconverting a linear LV to
raid1 thus:
"0 2097152 raid raid1 2 Aa 2097162/2097152 recover 0 0 -"
instead of:
"0 2097152 raid raid1 2 Aa 0/2097152 idle 0 0 -"
Sync action could be non-idle, when the MD thread was done with io.
Health chars could be 'A' when they should be 'a' for a short time
before a resynchonization started.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
The raid_status() function passes the bool array_in_sync variable around
providing synchronization state of the MD array. Replace it with a
runtime flag. This will avoid a pattern of having to pass discrete
variables to various functions.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The MD sync thread updates recovery flags providing state of any
running, idle, frozen, recovering, reshaping, ... activity it performs
and updates respective flags asynchronously versus dm processing
raid_status(). To close that race window, take a single copy of the
flags and pass it into its callees.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
During a reshape request: if userspace reloads a "raid" table multiple
times, resulting in multiple superblock reads, the raid set needs to
stay frozen until all config changes (chunk size, layout data_offset,
delta_disks) have been stored in the superblocks and respective flags
cleared.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Check all component data device sizes versus calculated size.
Reject if device(s) are too small. Otherwise, MD will fail the
operation by accessing beyond the end of the data device.
An example use-case is that growing bitmap won't fit any more and the MD
runtime will report an error when DM raid should catch this earlier.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The raid set size is being revalidated unconditionally before a
reshaping conversion is started. MD requires the size to only be
reduced in case of a stripe removing (i.e. shrinking) reshape but not
when growing because the raid array has to stay small until after the
growing reshape finishes.
Fix by avoiding the size revalidation in preresume unless a shrinking
reshape is requested.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Pay attention to existing reshape space to define if a raid set needs
resizing. Otherwise we can hit "Can't resize a reshaping raid set"
when a reshape is being requested.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The md raid personalities call md_finish_reshape() at the end of a
reshape conversion which adjusts rdev->sectors.
Correct/check rdev->sectors before initiating a reshape and raise the
recovery pointer accordingly.
Otherwise, the DM raid coordinated reshape will fail.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
md_stop_writes() is called in raid_presuspend() causing deadlocks on
bios submitted afterwards -- which happens on loaded raid sets with
conversion requests.
Fix by moving md_stop_writes() to raid_postsuspend(). NOTE: when the
recovery's frozen (MD_RECOVERY_FROZEN), writes haven't been started (or
are already stopped) so don't stop them again.
Also remove superfluous readonly setting.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When system is under memory pressure it is observed that dm bufio
shrinker often reclaims only one buffer per scan. This change fixes
the following two issues in dm bufio shrinker that cause this behavior:
1. ((nr_to_scan - freed) <= retain_target) condition is used to
terminate slab scan process. This assumes that nr_to_scan is equal
to the LRU size, which might not be correct because do_shrink_slab()
in vmscan.c calculates nr_to_scan using multiple inputs.
As a result when nr_to_scan is less than retain_target (64) the scan
will terminate after the first iteration, effectively reclaiming one
buffer per scan and making scans very inefficient. This hurts vmscan
performance especially because mutex is acquired/released every time
dm_bufio_shrink_scan() is called.
New implementation uses ((LRU size - freed) <= retain_target)
condition for scan termination. LRU size can be safely determined
inside __scan() because this function is called after dm_bufio_lock().
2. do_shrink_slab() uses value returned by dm_bufio_shrink_count() to
determine number of freeable objects in the slab. However dm_bufio
always retains retain_target buffers in its LRU and will terminate
a scan when this mark is reached. Therefore returning the entire LRU size
from dm_bufio_shrink_count() is misleading because that does not
represent the number of freeable objects that slab will reclaim during
a scan. Returning (LRU size - retain_target) better represents the
number of freeable objects in the slab. This way do_shrink_slab()
returns 0 when (LRU size < retain_target) and vmscan will not try to
scan this shrinker avoiding scans that will not reclaim any memory.
Test: tested using Android device running
<AOSP>/system/extras/alloc-stress that generates memory pressure
and causes intensive shrinker scans
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit ca5beb76 ("dm mpath: micro-optimize the hot path relative to
MPATHF_QUEUE_IF_NO_PATH") caused bio-based DM-multipath to fail mptest's
"test_02_sdev_delete".
Restoring the logic that existed prior to commit ca5beb76 fixes this
bio-based DM-multipath regression. Also verified all mptest tests pass
with request-based DM-multipath.
This commit effectively reverts commit ca5beb76 -- but it does so
without reintroducing the need to take the m->lock spinlock in
must_push_back_{rq,bio}.
Fixes: ca5beb76 ("dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
created
A NULL pointer is seen if two concurrent "vgchange -ay -K <vg name>"
processes race to load the dm-thin-pool module:
PID: 25992 TASK: ffff883cd7d23500 CPU: 4 COMMAND: "vgchange"
#0 [ffff883cd743d600] machine_kexec at ffffffff81038fa9
0000001 [ffff883cd743d660] crash_kexec at ffffffff810c5992
0000002 [ffff883cd743d730] oops_end at ffffffff81515c90
0000003 [ffff883cd743d760] no_context at ffffffff81049f1b
0000004 [ffff883cd743d7b0] __bad_area_nosemaphore at ffffffff8104a1a5
0000005 [ffff883cd743d800] bad_area at ffffffff8104a2ce
0000006 [ffff883cd743d830] __do_page_fault at ffffffff8104aa6f
0000007 [ffff883cd743d950] do_page_fault at ffffffff81517bae
0000008 [ffff883cd743d980] page_fault at ffffffff81514f95
[exception RIP: kmem_cache_alloc+108]
RIP: ffffffff8116ef3c RSP: ffff883cd743da38 RFLAGS: 00010046
RAX: 0000000000000004 RBX: ffffffff81121b90 RCX: ffff881bf1e78cc0
RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000000
RBP: ffff883cd743da68 R8: ffff881bf1a4eb00 R9: 0000000080042000
R10: 0000000000002000 R11: 0000000000000000 R12: 00000000000000d0
R13: 0000000000000000 R14: 00000000000000d0 R15: 0000000000000246
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
0000009 [ffff883cd743da70] mempool_alloc_slab at ffffffff81121ba5
0000010 [ffff883cd743da80] mempool_create_node at ffffffff81122083
0000011 [ffff883cd743dad0] mempool_create at ffffffff811220f4
0000012 [ffff883cd743dae0] pool_ctr at ffffffffa08de049 [dm_thin_pool]
0000013 [ffff883cd743dbd0] dm_table_add_target at ffffffffa0005f2f [dm_mod]
0000014 [ffff883cd743dc30] table_load at ffffffffa0008ba9 [dm_mod]
0000015 [ffff883cd743dc90] ctl_ioctl at ffffffffa0009dc4 [dm_mod]
The race results in a NULL pointer because:
Process A (vgchange -ay -K):
a. send DM_LIST_VERSIONS_CMD ioctl;
b. pool_target not registered;
c. modprobe dm_thin_pool and wait until end.
Process B (vgchange -ay -K):
a. send DM_LIST_VERSIONS_CMD ioctl;
b. pool_target registered;
c. table_load->dm_table_add_target->pool_ctr;
d. _new_mapping_cache is NULL and panic.
Note:
1. process A and process B are two concurrent processes.
2. pool_target can be detected by process B but
_new_mapping_cache initialization has not ended.
To fix dm-thin-pool, and other targets (cache, multipath, and snapshot)
with the same problem, simply dm_register_target() after all resources
created during module init (as labelled with __init) are finished.
Cc: stable@vger.kernel.org
Signed-off-by: monty <monty_pavel@sina.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
conversion
Multiple refcounts are needed if the device was already added. The
micro-optimization of setting the refcount to 1 on first added (rather
than fall thru to a common refcount_inc) lost sight of the fact that the
refcount_inc is also needed for the case when the device already exists
and the mode need not be upgraded.
Fixes: 2a0b4682e0 ("dm: convert dm_dev_internal.count from atomic_t to refcount_t")
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull ARM fix from Russell King:
"Just one fix this time around, for the late commit in the merge window
that triggered a problem with qemu. Qemu is apparently also going to
receive a fix for the discovered issue"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: avoid faulting on qemu
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When qemu starts a kernel in a bare environment, the default SCR has
the AW and FW bits clear, which means that the kernel can't modify
the PSR A or PSR F bits, and means that FIQs and imprecise aborts are
always masked.
When running uboot under qemu, the AW and FW SCR bits are set, and the
kernel functions normally - and this is how real hardware behaves.
Fix this for qemu by ignoring the FIQ bit.
Fixes: 8bafae202c82 ("ARM: BUG if jumping to usermode address in kernel mode")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Here are two bugfixes for I2C, fixing a memleak in the core and irq
allocation for i801.
Also three bugfixes for the at24 eeprom driver which Bartosz collected
while taking over maintainership for this driver"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
eeprom: at24: check at24_read/write arguments
eeprom: at24: fix reading from 24MAC402/24MAC602
eeprom: at24: correctly set the size for at24mac402
i2c: i2c-boardinfo: fix memory leaks on devinfo
i2c: i801: Fix Failed to allocate irq -2147483648 error
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current
Please consider pulling the following fixes for v4.15. While it doesn't
fix any regression introduced in the v4.15 merge window, we have a
feature in at24 since linux v4.8 - reading the mac address block from
at24mac series - which turned out to be not working.
This pull request contains changes that fix it together with a patch
that hardens the read and write argument sanitization with
out-of-bounds checks that were missing.
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
So far we completely rely on the caller to provide valid arguments.
To be on the safe side perform an own sanity check.
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Chip datasheet mentions that word addresses other than the actual
start position of the MAC delivers undefined results. So fix this.
Current implementation doesn't work due to this wrong offset.
Cc: stable@vger.kernel.org
Fixes: 0b813658c115 ("eeprom: at24: add support for at24mac series")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There's an ilog2() expansion in AT24_DEVICE_MAGIC() which rounds down
the actual size of EUI-48 byte array in at24mac402 eeproms to 4 from 6,
making it impossible to read it all.
Fix it by manually adjusting the value in probe().
This patch contains a temporary fix that is suitable for stable
branches. Eventually we'll probably remove the call to ilog2() while
converting the magic values to actual structs.
Cc: stable@vger.kernel.org
Fixes: 0b813658c115 ("eeprom: at24: add support for at24mac series")
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently when an error occurs devinfo is still allocated but is
unused when the error exit paths break out of the for-loop. Fix
this by kfree'ing devinfo to avoid the leak.
Detected by CoverityScan, CID#1416590 ("Resource Leak")
Fixes: 4124c4eba402 ("i2c: allow attaching IRQ resources to i2c_board_info")
Fixes: 0daaf99d8424 ("i2c: copy device properties when using i2c_register_board_info()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On Apollo Lake devices the BIOS does not set up IRQ routing for the i801
SMBUS controller IRQ, so we end up with dev->irq set to IRQ_NOTCONNECTED.
Detect this and do not try to use the irq in this case silencing:
i801_smbus 0000:00:1f.1: Failed to allocate irq -2147483648: -107
Cc: stable@vger.kernel.org
BugLink: https://communities.intel.com/thread/114759
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fixes:
- Drop reference to obsolete maintainer tree
- Fix overflow bug in pmbus driver
- Fix SMBUS timeout problem in jc42 driver
For the SMBUS timeout handling, we had a brief discussion if this
should be considered a bug fix or a feature. Peter says "it fixes real
problems where the application misbehave due to faulty content when
reading from an eeprom", and he needs the patch in his company's v4.14
images. This is good enough for me and warrants backport to stable
kernels"
* tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (jc42) optionally try to disable the SMBUS timeout
hwmon: (pmbus) Use 64bit math for DIRECT format values
hwmon: Drop reference to Jean's tree
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
With a nxp,se97 chip on an atmel sama5d31 board, the I2C adapter driver
is not always capable of avoiding the 25-35 ms timeout as specified by
the SMBUS protocol. This may cause silent corruption of the last bit of
any transfer, e.g. a one is read instead of a zero if the sensor chip
times out. This also affects the eeprom half of the nxp-se97 chip, where
this silent corruption was originally noticed. Other I2C adapters probably
suffer similar issues, e.g. bit-banging comes to mind as risky...
The SMBUS register in the nxp chip is not a standard Jedec register, but
it is not special to the nxp chips either, at least the atmel chips
have the same mechanism. Therefore, do not special case this on the
manufacturer, it is opt-in via the device property anyway.
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Peter Rosin <peda@axentia.se>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Power values in the 100s of watt range can easily blow past
32bit math limits when processing everything in microwatts.
Use 64bit math instead to avoid these issues on common 32bit ARM
BMC platforms.
Fixes: 442aba78728e ("hwmon: PMBus device driver")
Signed-off-by: Robert Lippert <rlippert@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This tree has not been used for over a year, Guenter is taking all
the hwmon patches in practice.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull NFS client fixes from Anna Schumaker:
"These patches fix a problem with compiling using an old version of
gcc, and also fix up error handling in the SUNRPC layer.
- NFSv4: Ensure gcc 4.4.4 can compile initialiser for
"invalid_stateid"
- SUNRPC: Allow connect to return EHOSTUNREACH
- SUNRPC: Handle ENETDOWN errors"
* tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Handle ENETDOWN errors
SUNRPC: Allow connect to return EHOSTUNREACH
NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
gcc 4.4.4 is too old to have full C11 anonymous union support, so
the current initialiser fails to compile.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(compile-)Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull xfs fixes from Darrick Wong:
"Here are some bug fixes for 4.15-rc2.
- fix memory leaks that appeared after removing ifork inline data
buffer
- recover deferred rmap update log items in correct order
- fix memory leaks when buffer construction fails
- fix memory leaks when bmbt is corrupt
- fix some uninitialized variables and math problems in the quota
scrubber
- add some omitted attribution tags on the log replay commit
- fix some UBSAN complaints about integer overflows with large sparse
files
- implement an effective inode mode check in online fsck
- fix log's inability to retry quota item writeout due to transient
errors"
* tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Properly retry failed dquot items in case of error during buffer writeback
xfs: scrub inode mode properly
xfs: remove unused parameter from xfs_writepage_map
xfs: ubsan fixes
xfs: calculate correct offset in xfs_scrub_quota_item
xfs: fix uninitialized variable in xfs_scrub_quota
xfs: fix leaks on corruption errors in xfs_bmap.c
xfs: fortify xfs_alloc_buftarg error handling
xfs: log recovery should replay deferred ops in order
xfs: always free inline data before resetting inode fork during ifree
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Once the inode item writeback errors is already fixed, it's time to fix the same
problem in dquot code.
Although there were no reports of users hitting this bug in dquot code (at least
none I've seen), the bug is there and I was already planning to fix it when the
correct approach to fix the inodes part was decided.
This patch aims to fix the same problem in dquot code, regarding failed buffers
being unable to be resubmitted once they are flush locked.
Tested with the recently test-case sent to fstests list by Hou Tao.
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Since we've used up all the bits in i_mode, the existing mode check
doesn't actually do anything useful. However, we've not used all the
bit values in the format portion of i_mode, so we /do/ need to test
that for bad values.
Fixes: 80e4e1268 ("xfs: scrub inodes")
Fixes-coverity-id: 1423992
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The first thing that xfs_writepage_map does is clobber the offset
parameter. Since we never use the passed-in value, turn the parameter
into a local variable. This gets rid of an UBSAN warning in generic/466.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fix some complaints from the UBSAN about signed integer addition overflows.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
It's only used for tracepoints so it's relatively harmless,
but the offset is calculated incorrectly in xfs_scrub_quota_item.
qi_dqperchunk is the nr. of dquots per "chunk" which we have
conveniently *cough* defined to always be 1 FSB. Therefore
block_offset * qi_dqperchunk == first id in that chunk,
and so offset = id / qi_dqperchunk
id * dqperchunk is ... meaningless.
Fixes-coverity-id: 1423965
Fixes: c2fc338c ("xfs: scrub quota information")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
On the first pass through the while(1) loop, we get to
xfs_scrub_should_terminate() which can test the uninitialized
error variable.
Fixes-coverity-id: 1423737
Fixes: c2fc338c ("xfs: scrub quota information")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Use _GOTO instead of _RETURN so we can free the allocated
cursor on error.
Fixes: bf80628 ("xfs: remove xfs_bmse_shift_one")
Fixes-coverity-id: 1423813, 1423676
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
percpu_counter_init failure path doesn't clean up &btp->bt_lru list.
Call list_lru_destroy in that error path. Similarly register_shrinker
error path is not handled.
While it is unlikely to trigger these error path, it is not impossible
especially the later might fail with large NUMAs. Let's handle the
failure to make the code more robust.
Noticed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As part of testing log recovery with dm_log_writes, Amir Goldstein
discovered an error in the deferred ops recovery that lead to corruption
of the filesystem metadata if a reflink+rmap filesystem happened to shut
down midway through a CoW remap:
"This is what happens [after failed log recovery]:
"Phase 1 - find and verify superblock...
"Phase 2 - using internal log
" - zero log...
" - scan filesystem freespace and inode maps...
" - found root inode chunk
"Phase 3 - for each AG...
" - scan (but don't clear) agi unlinked lists...
" - process known inodes and perform inode discovery...
" - agno = 0
"data fork in regular inode 134 claims CoW block 376
"correcting nextents for inode 134
"bad data fork in inode 134
"would have cleared inode 134"
Hou Tao dissected the log contents of exactly such a crash:
"According to the implementation of xfs_defer_finish(), these ops should
be completed in the following sequence:
"Have been done:
"(1) CUI: Oper (160)
"(2) BUI: Oper (161)
"(3) CUD: Oper (194), for CUI Oper (160)
"(4) RUI A: Oper (197), free rmap [0x155, 2, -9]
"Should be done:
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI A
"(8) RUD: for RUI B
"Actually be done by xlog_recover_process_intents()
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI B
"(8) RUD: for RUI A
"So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
from the log record in post_mount.log (generated after umount) and the trace
print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
entry [0x155, 2, -9] are freed."
When reconstructing the internal log state from the log items found on
disk, it's required that deferred ops replay in exactly the same order
that they would have had the filesystem not gone down. However,
replaying unfinished deferred ops can create /more/ deferred ops. These
new deferred ops are finished in the wrong order. This causes fs
corruption and replay crashes, so let's create a single defer_ops to
handle the subsequent ops created during replay, then use one single
transaction at the end of log recovery to ensure that everything is
replayed in the same order as they're supposed to be.
Reported-by: Amir Goldstein <amir73il@gmail.com>
Analyzed-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |/ /
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In xfs_ifree, we reset the data/attr forks to extents format without
bothering to free any inline data buffer that might still be around
after all the blocks have been truncated off the file. Prior to commit
43518812d2 ("xfs: remove support for inlining data/extents into the
inode fork") nobody noticed because the leftover inline data after
truncation was small enough to fit inside the inline buffer inside the
fork itself.
However, now that we've removed the inline buffer, we /always/ have to
free the inline data buffer or else we leak them like crazy. This test
was found by turning on kmemleak for generic/001 or generic/388.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V cleanups and ABI fixes from Palmer Dabbelt:
"This contains a handful of small cleanups that are a result of
feedback that didn't make it into our original patch set, either
because the feedback hadn't been given yet, I missed the original
emails, or we weren't ready to submit the changes yet.
I've been maintaining the various cleanup patch sets I have as their
own branches, which I then merged together and signed. Each merge
commit has a short summary of the changes, and each branch is based on
your latest tag (4.15-rc1, in this case). If this isn't the right way
to do this then feel free to suggest something else, but it seems sane
to me.
Here's a short summary of the changes, roughly in order of how
interesting they are.
- libgcc.h has been moved from include/lib, where it's the only
member, to include/linux. This is meant to avoid tab completion
conflicts.
- VDSO entries for clock_get/gettimeofday/getcpu have been added.
These are simple syscalls now, but we want to let glibc use them
from the start so we can make them faster later.
- A VDSO entry for instruction cache flushing has been added so
userspace can flush the instruction cache.
- The VDSO symbol versions for __vdso_cmpxchg{32,64} have been
removed, as those VDSO entries don't actually exist.
- __io_writes has been corrected to respect the given type.
- A new READ_ONCE in arch_spin_is_locked().
- __test_and_op_bit_ord() is now actually ordered.
- Various small fixes throughout the tree to enable allmodconfig to
build cleanly.
- Removal of some dead code in our atomic support headers.
- Improvements to various comments in our atomic support headers"
* tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: (23 commits)
RISC-V: __io_writes should respect the length argument
move libgcc.h to include/linux
RISC-V: Clean up an unused include
RISC-V: Allow userspace to flush the instruction cache
RISC-V: Flush I$ when making a dirty page executable
RISC-V: Add missing include
RISC-V: Use define for get_cycles like other architectures
RISC-V: Provide stub of setup_profiling_timer()
RISC-V: Export some expected symbols for modules
RISC-V: move empty_zero_page definition to C and export it
RISC-V: io.h: type fixes for warnings
RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
RISC-V: use generic serial.h
RISC-V: remove spin_unlock_wait()
RISC-V: `sfence.vma` orderes the instruction cache
RISC-V: Add READ_ONCE in arch_spin_is_locked()
RISC-V: __test_and_op_bit_ord should be strongly ordered
RISC-V: Remove smb_mb__{before,after}_spinlock()
RISC-V: Remove __smp_bp__{before,after}_atomic
RISC-V: Comment on why {,cmp}xchg is ordered how it is
...
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Olaf said: Here's a short series of patches that produces a working
allmodconfig. Would be nice to see them go in so we can add build
coverage.
I've dropped patches 8 and 10 from the original set:
* [PATCH 08/10] (RISC-V: Set __ARCH_WANT_RENAMEAT to pick up generic
version) has a better fix that I've sent out for review, we don't want
renameat.
* [PATCH 10/10] (input: joystick: riscv has get_cycles) has already been
taken into Dmitry Torokhov's tree.
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Fixes:
include/asm-generic/mm_hooks.h:20:11: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/mm_hooks.h:19:38: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|