summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core/device.c
Commit message (Collapse)AuthorAgeFilesLines
...
* | RDMA/restrack: Hide restrack DB from IB/coreLeon Romanovsky2019-02-191-1/+5
| | | | | | | | | | | | | | There is no need to expose internals of restrack DB to IB/core. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Move device addition deletion to device.cParav Pandit2019-02-161-1/+10
| | | | | | | | | | | | | | | | | | Move core device addition and removal from sysfs.c to device.c as device.c is more appropriate place for device management. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Use simpler device_del() instead of device_unregister()Parav Pandit2019-02-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Instead of holding extra reference using get_device() that device_unregister() releases, simplify it as below. device_add() balances with device_del(). device_initialize() balances with put_device(), always via ib_dealloc_device(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/devices: Re-organize device.c lockingJason Gunthorpe2019-02-091-140/+221
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The locking here started out with a single lock that covered everything and then has lately veered into crazy town. The fundamental problem is that several places need to iterate over a linked list, but also need to drop their locks to avoid deadlock during client callbacks. xarray's restartable iteration offers a simple solution to the problem. Once all the lists are xarrays we can drop locks in the places that need that and rely on xarray to provide consistency and locking for the data structure. The resulting simplification is that each of the three lists has a dedicated rwsem that must be held when working with the list it covers. One data structure is no longer covered by multiple locks. The sleeping semaphore is selected because the read side generally needs to be held over something sleeping, and using RCU reader locking in those cases is overkill. In the process this simplifies the entire registration/unregistration flow to be the expected list of setups and the reversed list of matching teardowns, and the registration lock 'refcount' can now be revised to be released after the ULPs are removed, providing a very sane semantic for this feature. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/devices: Use xarray to store the client_dataJason Gunthorpe2019-02-091-178/+170Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have a small ID for each client we can use xarray instead of linearly searching linked lists for client data. This will give much faster and scalable client data lookup, and will lets us revise the locking scheme. Since xarray can store 'going_down' using a mark just entirely eliminate the struct ib_client_data and directly store the client_data value in the xarray. However this does require a special iterator as we must still iterate over any NULL client_data values. Also eliminate the client_data_lock in favour of internal xarray locking. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/devices: Use xarray to store the clientsJason Gunthorpe2019-02-091-5/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | This gives each client a unique ID and will let us move client_data to use xarray, and revise the locking scheme. clients have to be add/removed in strict FIFO/LIFO order as they interdepend. To support this the client_ids are assigned to increase in FIFO order. The existing linked list is kept to support reverse iteration until xarray can get a reverse iteration API. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
* | RDMA/device: Use an ida instead of a free page in alloc_nameJason Gunthorpe2019-02-091-11/+17
| | | | | | | | | | | | | | | | ida is the proper data structure to hold list of clustered small integers and then allocate an unused integer. Get rid of the convoluted and limited open-coded bitmap. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/device: Get rid of reg_stateJason Gunthorpe2019-02-091-6/+2Star
| | | | | | | | | | | | | | | | This really has no purpose anymore, refcount can be used to tell if the device is still registered. Keeping it around just invites mis-use. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
* | RDMA/device: Call ib_cache_release_one() only from ib_device_release()Jason Gunthorpe2019-02-091-29/+12Star
| | | | | | | | | | | | | | | | | | | | | | Instead of complicated logic about when this memory is freed, always free it during device release(). All the cache pointers start out as NULL, so it is safe to call this before the cache is initialized. This makes for a simpler error unwind flow, and a simpler understanding of the lifetime of the memory allocations inside the struct ib_device. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/device: Ensure that security memory is always freedJason Gunthorpe2019-02-091-7/+3Star
| | | | | | | | | | | | | | | | Since this only frees memory it should be done during the release callback. Otherwise there are possible error flows where it might not get called if registration aborts. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/device: Check that the rename is nop under the lockJason Gunthorpe2019-02-091-4/+6
| | | | | | | | | | | | | | | | | | | | Since another rename could be running in parallel it is safer to check that the name is not changing inside the lock, where we already know the device name will not change. Fixes: d21943dd19b5 ("RDMA/core: Implement IB device rename function") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
* | RDMA: Handle PD allocations by IB/coreLeon Romanovsky2019-02-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The PD allocations in IB/core allows us to simplify drivers and their error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in had with relevant update in .dealloc_pd(). We will use this opportunity and convert .dealloc_pd() to don't fail, as it was suggested a long time ago, failures are not happening as we have never seen a WARN_ON print. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Share driver structure size with coreLeon Romanovsky2019-02-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Add new macros to be used in drivers while registering ops structure and IB/core while calling allocation routines, so drivers won't need to perform kzalloc/kfree in their paths. The change in allocation stage allows us to initialize common fields prior to calling to drivers (e.g. restrack). Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Don't register each MAD agent for LSM notifierDaniel Jurgens2019-02-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating many MAD agents in a short period of time, receive packet processing can be delayed long enough to cause timeouts while new agents are being added to the atomic notifier chain with IRQs disabled. Notifier chain registration and unregstration is an O(n) operation. With large numbers of MAD agents being created and destroyed simultaneously the CPUs spend too much time with interrupts disabled. Instead of each MAD agent registering for it's own LSM notification, maintain a list of agents internally and register once, this registration already existed for handling the PKeys. This list is write mostly, so a normal spin lock is used vs a read/write lock. All MAD agents must be checked, so a single list is used instead of breaking them down per device. Notifier calls are done under rcu_read_lock, so there isn't a risk of similar packet timeouts while checking the MAD agents security settings when notified. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | Merge tag 'v5.0-rc5' into rdma.git for-nextJason Gunthorpe2019-02-041-3/+10
|\| | | | | | | | | | | | | | | Linux 5.0-rc5 Needed to merge the include/uapi changes so we have an up to date single-tree for these files. Patches already posted are also expected to need this for dependencies.
| * RDMA/device: Expose ib_device_try_get(()Jason Gunthorpe2019-01-211-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out future patches need this capability quite widely now, not just for netlink, so provide two global functions to manage the registration lock refcount. This also moves the point the lock becomes 1 to within ib_register_device() so that the semantics of the public API are very sane and clear. Calling ib_device_try_get() will fail on devices that are only allocated but not yet registered. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
* | RDMA/core: Use the ops infrastructure to keep all callbacks in one placeLeon Romanovsky2019-01-311-0/+1
| | | | | | | | | | | | | | | | | | As preparation to hide rdma_restrack_root, refactor the code to use the ops structure instead of a special callback which is hidden in rdma_restrack_root. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Simplify restrack interfaceLeon Romanovsky2019-01-311-2/+2
| | | | | | | | | | | | | | | | | | | | In the current implementation, we have one restrack root per-device and all users are simply providing it directly. Let's simplify the interface and have callers provide the ib_device and internally access the restrack_root. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA: Add indication for in kernel API support to IB deviceGal Pressman2019-01-311-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers that do not provide kernel verbs support should not be used by ib kernel clients at all. In case a device does not implement all mandatory verbs for kverbs usage mark it as a non kverbs provider and prevent its usage for all clients except for uverbs. The device is marked as a non kverbs provider using the 'kverbs_provider' flag which should only be set by the core code. The clients can choose whether kverbs are requested for its usage using the 'no_kverbs_req' flag which is currently set for uverbs only. This patch allows drivers to remove mandatory verbs stubs and simply set the callbacks to NULL. The IB device will be registered as a non-kverbs provider. Note that verbs that are required for the device registration process must be implemented. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA: Provide safe ib_alloc_device() functionLeon Romanovsky2019-01-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | All callers to ib_alloc_device() provide a larger size than struct ib_device and rely on the fact that struct ib_device is embedded in their driver specific structure as the first member. Provide a safer variant of ib_alloc_device() that checks and enforces this approach to make sure the drivers are using it right. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Simplify rdma cgroup registrationParav Pandit2019-01-181-7/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | RDMA cgroup registration routine always returns success, so simplify function to be void and run clang formatter over whole CONFIG_CGROUP_RDMA art of core_priv.h. This reduces unwinding error path for regular registration and future net namespace change functionality for rdma device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/device: Use __ib_device_get_by_name() in ib_device_rename()Jason Gunthorpe2019-01-181-6/+3Star
| | | | | | | | | | | | | | No reason to open code this loop. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
* | RDMA: Introduce and use rdma_device_to_ibdev()Parav Pandit2019-01-141-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce and use rdma_device_to_ibdev() API for those drivers which are registering one sysfs group and also use in ib_core. In subsequent patch, device->provider_ibdev one-to-one mapping is no longer holds true during accessing sysfs entries. Therefore, introduce an API rdma_device_to_ibdev() that provides such information. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA: Rename port_callback to init_portParav Pandit2019-01-141-4/+3Star
|/ | | | | | | | | | | | | | Most provider routines are callback routines which ib core invokes. _callback suffix doesn't convey information about when such callback is invoked. Therefore, rename port_callback to init_port. Additionally, store the init_port function pointer in ib_device_ops, so that it can be accessed in subsequent patches when binding rdma device to net namespace. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/core: Add advise_mr to the list of known opsMoni Shoua2019-01-021-0/+1
| | | | | | | | | | | | | | | We need to add advise_mr to the list of operation setters on the ib_device or otherwise callers to ib_set_device_ops() for advise_mr operation will not have their callback registered. When the advise_mr series was merged with the device ops series the SET_DEVICE_OPS() was missed. Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA: Start use ib_device_opsKamal Heib2018-12-121-104/+107
| | | | | | | Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Introduce ib_device_opsKamal Heib2018-12-111-0/+98
| | | | | | | | | | | | | | This change introduces the ib_device_ops structure that defines all the InfiniBand device operations in one place, so the code will be more readable and clean, unlike today when the ops are mixed with ib_device data members. The providers will need to define the supported operations and assign them using ib_set_device_ops(), that will also make the providers code more readable and clean. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Validate port number in query_pkey verbYuval Shaia2018-12-071-0/+3
| | | | | | | Before calling the driver's function let's make sure port is valid. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Sync unregistration with netlink commandsParav Pandit2018-11-221-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the rdma device is getting removed, get resource info can race with device removal, as below: CPU-0 CPU-1 -------- -------- rdma_nl_rcv_msg() nldev_res_get_cq_dumpit() mutex_lock(device_lock); get device reference mutex_unlock(device_lock); [..] ib_unregister_device() /* Valid reference to * device->dev exists. */ ib_dealloc_device() [..] provider->fill_res_entry(); Even though device object is not freed, fill_res_entry() can get called on device which doesn't have a driver anymore. Kernel core device reference count is not sufficient, as this only keeps the structure valid, and doesn't guarantee the driver is still loaded. Similar race can occur with device renaming and device removal, where device_rename() tries to rename a unregistered device. While this is fine for devices of a class which are not net namespace aware, but it is incorrect for net namespace aware class coming in subsequent series. If a class is net namespace aware, then the below [1] call trace is observed in above situation. Therefore, to avoid the race, keep a reference count and let device unregistration wait until all netlink users drop the reference. [1] Call trace: kernfs: ns required in 'infiniband' for 'mlx5_0' WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 kernfs_find_ns+0x104/0x120 libahci i2c_core mlxfw libata dca [last unloaded: devlink] RIP: 0010:kernfs_find_ns+0x104/0x120 Call Trace: kernfs_find_and_get_ns+0x2e/0x50 sysfs_rename_link_ns+0x40/0xb0 device_rename+0xb2/0xf0 ib_device_rename+0xb3/0x100 [ib_core] nldev_set_doit+0x165/0x190 [ib_core] rdma_nl_rcv_msg+0x249/0x250 [ib_core] ? netlink_deliver_tap+0x8f/0x3e0 rdma_nl_rcv+0xd6/0x120 [ib_core] netlink_unicast+0x17c/0x230 netlink_sendmsg+0x2f0/0x3e0 sock_sendmsg+0x30/0x40 __sys_sendto+0xdc/0x160 Fixes: da5c85078215 ("RDMA/nldev: add driver-specific resource tracking") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Refactor ib_register_device() functionParav Pandit2018-10-171-51/+75
| | | | | | | | | | | ib_register_device() does several allocation and initialization steps. Split it into smaller more readable functions for easy review and maintenance. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/core: Fix unwinding flow in case of error to register deviceParav Pandit2018-10-171-2/+4
| | | | | | | | | | | | | If port pkey list initialization fails, free the port_immutable memory during cleanup path. Currently it is missed out. If cache setup fails, free the pkey list during cleanup path. Fixes: d291f1a65 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/core: Implement IB device rename functionLeon Romanovsky2018-10-161-0/+25
| | | | | | | Generic implementation of IB device rename function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/core: Use dev_name instead of ibdev->nameJason Gunthorpe2018-09-261-5/+3Star
| | | | | | | | | | These return the same thing but dev_name is a more conventional use of the kernel API. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
* RDMA/core: Use dev_err/dbg/etc instead of pr_* + ibdev->nameJason Gunthorpe2018-09-261-14/+19
| | | | | | | | | | | | Any messages related to a device should be printed with the dev_* formatters. This provides greater consistency for the user. The core does not set pr_fmt so this has no significant change. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
* RDMA: Fully setup the device name in ib_register_deviceJason Gunthorpe2018-09-261-15/+20
| | | | | | | | | | | | | | | | | | | | | | | The current code has two copies of the device name, ibdev->dev and dev_name(&ibdev->dev), and they are setup at different times, which is very confusing. Set them both up at the same time and make dev_name() the lead name, which is the proper use of the driver core APIs. To make it very clear that the name is not valid until registration pass it in to the ib_register_device() call rather than messing with ibdev->name directly. Also the reorganization now checks that dev_name is unique even if it does not contain a %. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
* RDMA/core: Assign device ifindex before publishing the deviceParav Pandit2018-09-061-1/+2
| | | | | | | | | | | Even though device->ifindex is assigned before adding the device in the list which is read by netlink flow, it is better to assign rdma device index before publishing the device in the system to users and clients. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Follow correct unregister order between sysfs and cgroupParav Pandit2018-09-061-1/+1
| | | | | | | | | | | | | During register_device() init sequence is, (a) register with rdma cgroup followed by (b) register with sysfs Therefore, unregister_device() sequence should follow the reverse order. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Define client_data_lock as rwlock instead of spinlockParav Pandit2018-09-061-15/+15
| | | | | | | | | | | Even though device registration/unregistration and client registration/unregistration is not a performance path, define the client_data_lock as rwlock for code clarity. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Use simpler spin lock irq API from blocking contextParav Pandit2018-09-061-11/+9Star
| | | | | | | | | | | | | | | add_client_context(), ib_unregister_device() and ib_unregister_client() are designed to call from blocking context. There is no need to save and restore last interrupt state when called from such blocking context. Even though this is not a performance path, using the right spin lock API is desired for code clarity. To avoid checkpatch warning while removing flags, sizeof() is used. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Remove context entries from list while unregistering deviceParav Pandit2018-09-061-1/+5
| | | | | | | | | | | While unregistering a device, remove the context elements from the list to not have any stale entries. With that any errors/bugs can be checked when device is freed. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Use simplified list_for_eachParav Pandit2018-09-061-5/+4Star
| | | | | | | | | | | While traversing client_data_list in following conditions, linked list is only read, no elements of the list are removed. Therefore, use list_for_each_entry(), instead of list_for_each_safe(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: No need to protect kfree with spin lock and semaphoreParav Pandit2018-09-061-1/+1
| | | | | | | | | | | While unregistering a client, only context removal should be protected with lock. There is no need to protect a freeing of such context which is already removed from the list. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/core: Add an unbound WQ type to the new CQ APIJack Morgenstein2018-09-051-1/+14
| | | | | | | | | | | | | | | | | | | | | | | The upstream kernel commit cited below modified the workqueue in the new CQ API to be bound to a specific CPU (instead of being unbound). This caused ALL users of the new CQ API to use the same bound WQ. Specifically, MAD handling was severely delayed when the CPU bound to the WQ was busy handling (higher priority) interrupts. This caused a delay in the MAD "heartbeat" response handling, which resulted in ports being incorrectly classified as "down". To fix this, add a new "unbound" WQ type to the new CQ API, so that users have the option to choose either a bound WQ or an unbound WQ. For MADs, choose the new "unbound" WQ. Fixes: b7363e67b23e ("IB/device: Convert ib-comp-wq to be CPU-bound") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.m> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/core: Remove {create,destroy}_ah from mandatory verbsKamal Heib2018-07-311-2/+0Star
| | | | | | | | {create,destroy}_ah aren't mandatory verbs, because not all providers are implementing them. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB: Replace ib_query_gid/ib_get_cached_gid with rdma_query_gidParav Pandit2018-06-181-20/+1Star
| | | | | | | | | | | | | If the gid_attr argument is NULL then the functions behave identically to rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything can be consolidated to one function. Now that all callers either use rdma_query_gid() or ib_get_cached_gid(), ib_query_gid() API is removed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* treewide: kzalloc() -> kcalloc()Kees Cook2018-06-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
* RDMA/core: Remove indirection through ib_cache_setup()Jason Gunthorpe2018-05-291-2/+2
| | | | | | | | | This once might have made sense when cache.c was in a different module from device.c, but today it just obfuscation. Get rid of the wrappers and call roge_gid_mgmt_init()/cleanup() directly. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
* Merge tag 'for-linus-unmerged' of ↵Linus Torvalds2018-04-071-13/+5Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Doug and I are at a conference next week so if another PR is sent I expect it to only be bug fixes. Parav noted yesterday that there are some fringe case behavior changes in his work that he would like to fix, and I see that Intel has a number of rc looking patches for HFI1 they posted yesterday. Parav is again the biggest contributor by patch count with his ongoing work to enable container support in the RDMA stack, followed by Leon doing syzkaller inspired cleanups, though most of the actual fixing went to RC. There is one uncomfortable series here fixing the user ABI to actually work as intended in 32 bit mode. There are lots of notes in the commit messages, but the basic summary is we don't think there is an actual 32 bit kernel user of drivers/infiniband for several good reasons. However we are seeing people want to use a 32 bit user space with 64 bit kernel, which didn't completely work today. So in fixing it we required a 32 bit rxe user to upgrade their userspace. rxe users are still already quite rare and we think a 32 bit one is non-existing. - Fix RDMA uapi headers to actually compile in userspace and be more complete - Three shared with netdev pull requests from Mellanox: * 7 patches, mostly to net with 1 IB related one at the back). This series addresses an IRQ performance issue (patch 1), cleanups related to the fix for the IRQ performance problem (patches 2-6), and then extends the fragmented completion queue support that already exists in the net side of the driver to the ib side of the driver (patch 7). * Mostly IB, with 5 patches to net that are needed to support the remaining 10 patches to the IB subsystem. This series extends the current 'representor' framework when the mlx5 driver is in switchdev mode from being a netdev only construct to being a netdev/IB dev construct. The IB dev is limited to raw Eth queue pairs only, but by having an IB dev of this type attached to the representor for a switchdev port, it enables DPDK to work on the switchdev device. * All net related, but needed as infrastructure for the rdma driver - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers - SRP performance updates - IB uverbs write path cleanup patch series from Leon - Add RDMA_CM support to ib_srpt. This is disabled by default. Users need to set the port for ib_srpt to listen on in configfs in order for it to be enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port) - TSO and Scatter FCS support in mlx4 - Refactor of modify_qp routine to resolve problems seen while working on new code that is forthcoming - More refactoring and updates of RDMA CM for containers support from Parav - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory' user API features - Infrastructure updates for the new IOCTL interface, based on increased usage - ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit kernel as was originally intended. See the commit messages for extensive details - Syzkaller bugs and code cleanups motivated by them" * tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits) IB/rxe: Fix for oops in rxe_register_device on ppc64le arch IB/mlx5: Device memory mr registration support net/mlx5: Mkey creation command adjustments IB/mlx5: Device memory support in mlx5_ib net/mlx5: Query device memory capabilities IB/uverbs: Add device memory registration ioctl support IB/uverbs: Add alloc/free dm uverbs ioctl support IB/uverbs: Add device memory capabilities reporting IB/uverbs: Expose device memory capabilities to user RDMA/qedr: Fix wmb usage in qedr IB/rxe: Removed GID add/del dummy routines RDMA/qedr: Zero stack memory before copying to user space IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR IB/mlx5: Add information for querying IPsec capabilities IB/mlx5: Add IPsec support for egress and ingress {net,IB}/mlx5: Add ipsec helper IB/mlx5: Add modify_flow_action_esp verb IB/mlx5: Add implementation for create and destroy action_xfrm IB/uverbs: Introduce ESP steering match filter IB/uverbs: Add modify ESP flow_action ...
| * IB/core: Simplify ib_query_gid to always refer to cacheParav Pandit2018-04-041-12/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently following inconsistencies exist. 1. ib_query_gid() returns GID from the software cache for a RoCE port and returns GID from the HCA for an IB port. This is incorrect because software GID cache is maintained regardless of HCA port type. 2. GID is queries from the HCA via ib_query_gid and updated in the software cache for IB link layer. Both of them might not be in sync. ULPs such as SRP initiator, SRP target, IPoIB driver have historically used ib_query_gid() API to query the GID. However CM used cached version during CM processing, When software cache was introduced, this inconsitency remained. In order to simplify, improve readability and avoid link layer specific above inconsistencies, this patch brings following changes. 1. ib_query_gid() always refers to the cache layer regardless of link layer. 2. cache module who reads the GID entry from HCA and builds the cache, directly invokes the HCA provider verb's query_gid() callback function. 3. ib_query_port() is being called in early stage where GID cache is not yet build while reading port immutable property. Therefore it needs to read the default GID from the HCA for IB link layer to publish the subnet prefix. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/providers: Simplify query_gid callback of RoCE providersParav Pandit2018-04-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the provider drivers for RoCE to treat query_gid() callback as never called for RoCE, and only require non-RoCE devices to implement it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>