openslx/kernel-qcow2-linux.git - In-kernel qcow2 (Kernel part)

	Commit message (Collapse)	Author	Age	Files	Lines
*	svcrdma: svc_rdma_put_context() is invoked twice in Send error path	Chuck Lever	2016-05-13	1	-15/+13
\| \| \| \| \| \| \| \| \|	Get a fresh op_ctxt in send_reply() instead of in svc_rdma_sendto(). This ensures that svc_rdma_put_context() is invoked only once if send_reply() fails. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Use new CQ API for RPC-over-RDMA server send CQs	Chuck Lever	2016-03-01	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context is replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking is removed, as they are no longer referenced or used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Make RDMA_ERROR messages work	Chuck Lever	2016-03-01	1	-0/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: svc_rdma_post_recv() should close connection on error	Chuck Lever	2016-03-01	1	-6/+1
\| \| \| \| \| \| \| \| \| \| \|	Clean up: Most svc_rdma_post_recv() call sites close the transport connection when a receive cannot be posted. Wrap that in a common helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Close connection when a send error occurs	Chuck Lever	2016-03-01	1	-2/+6
\| \| \| \| \|	Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Do not send Write chunk XDR pad with inline content	Chuck Lever	2016-03-01	1	-5/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The NFS server's XDR encoders adds an XDR pad for content in the xdr_buf page list at the beginning of the xdr_buf's tail buffer. On RDMA transports, Write chunks are sent separately and without an XDR pad. If a Write chunk is being sent, strip off the pad in the tail buffer so that inline content following the Write chunk remains XDR-aligned when it is sent to the client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Do not write xdr_buf::tail in a Write chunk	Chuck Lever	2016-03-01	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the Linux NFS server writes an odd-length data item into a Write chunk, it finishes with XDR pad bytes. If the data item is smaller than the Write chunk, the pad bytes are written at the end of the data item, but still inside the chunk (ie, in the application's buffer). Since this is direct data placement, that exposes the pad bytes. XDR pad bytes are inserted in order to preserve the XDR alignment of the next XDR data item in an XDR stream. But Write chunks do not appear in the payload XDR stream, and only one data item is allowed in each chunk. Thus XDR padding is not needed in a Write chunk. With NFSv4, the Linux NFS server places the results of any operations that follow an NFSv4 READ or READLINK in the xdr_buf's tail. Those results also should never be sent as a part of a Write chunk. The current logic in send_write_chunks() appears to assume that the xdr_buf's tail contains only pad bytes (ie, NFSv3). The server should write only the contents of the xdr_buf's page list in a Write chunk. If there's more than an XDR pad in the tail, that needs to go inline or in the Reply chunk. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Find client-provided write and reply chunks once per reply	Chuck Lever	2016-03-01	1	-44/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The client provides the location of Write chunks into which the server writes bulk payload. The client provides these when the Upper Layer Protocol wants direct data placement and the Binding allows it. (For NFS, this is READ and READLINK operations). The client also provides the location of a Reply chunk into which the server writes the non-bulk part of an RPC reply. The client provides this chunk whenever it believes the reply can be larger than its receive buffers. The server then uses the presence of these chunks to determine how it will form its reply message. svc_rdma_sendto() was looking for Write and Reply chunks multiple times for every reply message. It would be more efficient to do it just once. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svc_rdma: use local_dma_lkey	Christoph Hellwig	2016-01-19	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	We now alwasy have a per-PD local_dma_lkey available. Make use of that fact in svc_rdma and stop registering our own MR. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
*	svcrdma: Make map_xdr non-static	Chuck Lever	2016-01-19	1	-7/+7
\| \| \| \| \| \| \| \|	Pre-requisite to use map_xdr in the backchannel code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
*	svcrdma: Remove last two __GFP_NOFAIL call sites	Chuck Lever	2016-01-19	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \|	Clean up. These functions can otherwise fail, so check for page allocation failures too. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
*	svcrdma: Add gfp flags to svc_rdma_post_recv()	Chuck Lever	2016-01-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	svc_rdma_post_recv() allocates pages for receive buffers on-demand. It uses GFP_KERNEL so the allocator tries hard, and may sleep. But I'm about to add a call to svc_rdma_post_recv() from a function that may not sleep. Since all svc_rdma_post_recv() call sites can tolerate its failure, allow it to fail if the page allocator returns nothing. Longer term, receive buffers, being a finite resource per-connection, should be pre-allocated and re-used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
*	svcrdma: Improve allocation of struct svc_rdma_req_map	Chuck Lever	2016-01-19	1	-3/+3
\| \| \| \| \| \| \| \| \|	To ensure this allocation cannot fail and will not sleep, pre-allocate the req_map structures per-connection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
*	IB: split struct ib_send_wr	Christoph Hellwig	2015-10-08	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>
*	svcrdma: Clean up svc_rdma_get_reply_array()	Chuck Lever	2015-07-20	1	-0/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Kernel coding conventions frown upon having large nontrivial functions in header files, and the preference these days is to allow the compiler to make inlining decisions if possible. As these functions are re-homed into a .c file, be sure that comparisons with fields in struct rpcrdma_msg are with be32 constants. This is a refactoring change; no behavior change is intended. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Fix send_reply() scatter/gather set-up	Chuck Lever	2015-07-20	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Linux NFS server returns garbage in the data payload of inline NFS/RDMA READ replies. These are READs of under 1000 bytes or so where the client has not provided either a reply chunk or a write list. The NFS server delivers the data payload for an NFS READ reply to the transport in an xdr_buf page list. If the NFS client did not provide a reply chunk or a write list, send_reply() is supposed to set up a separate sge for the page containing the READ data, and another sge for XDR padding if needed, then post all of the sges via a single SEND Work Request. The problem is send_reply() does not advance through the xdr_buf when setting up scatter/gather entries for SEND WR. It always calls dma_map_xdr with xdr_off set to zero. When there's more than one sge, dma_map_xdr() sets up the SEND sge's so they all point to the xdr_buf's head. The current Linux NFS/RDMA client always provides a reply chunk or a write list when performing an NFS READ over RDMA. Therefore, it does not exercise this particular case. The Linux server has never had to use more than one extra sge for building RPC/RDMA replies with a Linux client. However, an NFS/RDMA client _is_ allowed to send small NFS READs without setting up a write list or reply chunk. The NFS READ reply fits entirely within the inline reply buffer in this case. This is perhaps a more efficient way of performing NFS READs that the Linux NFS/RDMA client may some day adopt. Fixes: b432e6b3d9c1 ('svcrdma: Change DMA mapping logic to . . .') BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL	Chuck Lever	2015-06-04	1	-1/+1
\| \| \| \| \| \| \| \| \|	At the 2015 LSF/MM, it was requested that memory allocation call sites that request GFP_KERNEL allocations in a loop should be annotated with __GFP_NOFAIL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Fix byte-swapping in svc_rdma_sendto.c	Chuck Lever	2015-06-04	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In send_write_chunks(), we have: for (xdr_off = rqstp->rq_res.head[0].iov_len, chunk_no = 0; xfer_len && chunk_no < arg_ary->wc_nchunks; chunk_no++) { . . . } Note that arg_ary->wc_nchunk is in network byte-order. For the comparison to work correctly, both have to be in native byte-order. In send_reply_chunks, we have: write_len = min(xfer_len, htonl(ch->rs_length)); xfer_len is in native byte-order, and ch->rs_length is in network byte-order. be32_to_cpu() is the correct byte swap for ch->rs_length. As an additional clean up, replace ntohl() with be32_to_cpu() in a few other places. This appears to address a problem with large rsize hangs while using PHYSICAL memory registration. I suspect that is the only registration mode that uses more than one chunk element. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Find rmsgp more reliably	Chuck Lever	2015-01-15	1	-14/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xdr_start() can return the wrong rmsgp address if an assumption about how the xdr_buf was constructed changes. When it gets it wrong, the client receives a reply that has gibberish in the RPC/RDMA header, preventing it from matching a waiting RPC request. Instead, make (and document) just one assumption: that the RDMA header for the client's RPC call is at the start of the first page in rq_pages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Scrub BUG_ON() and WARN_ON() call sites	Chuck Lever	2015-01-15	1	-6/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current convention is to avoid using BUG_ON() in places where an oops could cause complete system failure. Replace BUG_ON() call sites in svcrdma with an assertion error message and allow execution to continue safely. Some BUG_ON() calls are removed because they have never fired in production (that we are aware of). Some WARN_ON() calls are also replaced where a back trace is not helpful; e.g., in a workqueue task. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: send_write() must not overflow the device's max sge	Steve Wise	2014-07-11	1	-24/+15
\| \| \| \| \| \| \| \|	Function send_write() must stop creating sges when it reaches the device max and return the amount sent in the RDMA Write to the caller. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: refactor marshalling logic	Steve Wise	2014-06-07	1	-208/+22
\| \| \| \| \| \| \| \| \| \|	This patch refactors the NFSRDMA server marshalling logic to remove the intermediary map structures. It also fixes an existing bug where the NFSRDMA server was not minding the device fast register page list length limitations. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
*	svcrdma: fix offset calculation for non-page aligned sge entries	Jeff Layton	2014-03-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	The xdr_off value in dma_map_xdr gets passed to ib_dma_map_page as the offset into the page to be mapped. This calculation does not correctly take into account the case where the data starts at some offset into the page. Increment the xdr_off by the page_base to ensure that it is respected. Cc: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	Fix regression in NFSRDMA server	Tom Tucker	2014-03-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	The server regression was caused by the addition of rq_next_page (afc59400d6c65bad66d4ad0b2daf879cbff8e23e). There were a few places that were missed with the update of the rq_respages array. Signed-off-by: Tom Tucker <tom@ogc.us> Tested-by: Steve Wise <swise@ogc.us> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: cleanup: replace rq_resused count by rq_next_page pointer	J. Bruce Fields	2012-12-18	1	-1/+3
\| \| \| \| \| \| \|	It may be a matter of personal taste, but I find this makes the code clearer. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Cleanup sparse warnings in the svcrdma module	Tom Tucker	2012-02-18	1	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The svcrdma transport was un-marshalling requests in-place. This resulted in sparse warnings due to __beXX data containing both NBO and HBO data. The code has been restructured to do byte-swapping as the header is parsed instead of when the header is validated immediately after receipt. Also moved extern declarations for the workqueue and memory pools to the private header file. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Cleanup DMA unmapping in error paths.	Tom Tucker	2010-10-19	1	-0/+2
\| \| \| \| \| \| \| \|	There are several error paths in the code that do not unmap DMA. This patch adds calls to svc_rdma_unmap_dma to free these DMA contexts. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcrdma: Change DMA mapping logic to avoid the page_address kernel API	Tom Tucker	2010-10-19	1	-22/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was logic in the send path that assumed that a page containing data to send to the client has a KVA. This is not always the case and can result in data corruption when page_address returns zero and we end up DMA mapping zero. This patch changes the bus mapping logic to avoid page_address() where necessary and converts all calls from ib_dma_map_single to ib_dma_map_page in order to keep the map/unmap calls symmetric. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	tree-wide: fix assorted typos all over the place	André Goddard Rosa	2009-12-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	That is "success", "unknown", "through", "performance", "[re\|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over\|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
*	svcrdma: dma unmap the correct length for the RPCRDMA header page.	Steve Wise	2009-05-28	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The svcrdma module was incorrectly unmapping the RPCRDMA header page. On IBM pserver systems this causes a resource leak that results in running out of bus address space (10 cthon iterations will reproduce it). The code was mapping the full page but only unmapping the actual header length. The fix is to only map the header length. I also cleaned up the use of ib_dma_map_page() calls since the unmap logic always uses ib_dma_unmap_single(). I made these symmetrical. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
*	svcrdma: clean up error paths.	Steve Wise	2009-05-03	1	-0/+3
\| \| \| \| \| \| \| \| \|	These fixes resolved crashes due to resource leak BUG_ON checks. The resource leaks were detected by introducing asynchronous transport errors. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
*	SVCRDMA: fix recent printk format warnings.	Tom Talpey	2009-03-19	1	-1/+1
\| \| \| \| \| \| \| \|	printk formats in prior commit were reversed/incorrect. Compiled without warning on x86 and x86_64, but detected on ppc. Signed-off-by: Tom Talpey <tmtalpey@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
*	SVCRDMA: remove faulty assertions in rpc/rdma chunk validation.	Tom Talpey	2009-03-11	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \|	Certain client-provided RPCRDMA chunk alignments result in an additional scatter/gather entry, which triggered nfs/rdma server assertions incorrectly. OpenSolaris nfs/rdma client connectathon testing was blocked by these in the special/locking section. Signed-off-by: Tom Talpey <tmtalpey@gmail.com> Cc: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
*	net: Make static	Roel Kluin	2008-12-11	1	-1/+1
\| \| \| \| \| \| \|	Sparse asked whether these could be static. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	svcrdma: Modify the RPC reply path to use FRMR when available	Tom Tucker	2008-10-06	1	-40/+215
\| \| \| \| \| \| \| \|	Use FRMR to map local RPC reply data. This allows RDMA_WRITE to send reply data using a single WR. The FRMR is invalidated by linking the LOCAL_INV WR to the RDMA_SEND message used to complete the reply. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
*	dma-mapping: add the device argument to dma_mapping_error()	FUJITA Tomonori	2008-07-26	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. [akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	svcrdma: Add dma map count and WARN_ON	Tom Tucker	2008-07-02	1	-0/+3
\| \| \| \| \| \| \|	Add a dma map count in order to verify that all DMA mapping resources have been freed when the transport is closed. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
*	svcrdma: Use RPC reply map for RDMA_WRITE processing	Tom Tucker	2008-07-02	1	-87/+76
\| \| \| \| \| \| \| \|	Use the new svc_rdma_req_map data type for mapping the client side memory to the server side memory. Move the DMA mapping to the context pointed to by each WR individually so that it is unmapped after the WR completes. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
*	svcrdma: Free context on post_recv error in send_reply	Tom Tucker	2008-05-19	1	-1/+2
\| \| \| \| \| \| \| \|	If an error is encountered trying to post a recv buffer in send_reply, free the passed in context. Return an error to the caller so it is aware that the request was not posted. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
*	svcrdma: Simplify receive buffer posting	Tom Tucker	2008-05-19	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \|	The svcrdma transport provider currently allocates receive buffers to the RQ through the xpo_release_rqst method. This approach is overly complicated since it means that the rqstp rq_xprt_ctxt has to be selectively set based on whether the RPC is going to be processed immediately or deferred. Instead, just post the receive buffer when we are certain that we are replying in the send_reply function. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
*	SVCRDMA: Fix erroneous BUG_ON in send_write	Tom Tucker	2008-03-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	The assertion that checks for sge context overflow is incorrectly hard-coded to 32. This causes a kernel bug check when using big-data mounts. Changed the BUG_ON to use the computed value RPCSVC_MAXPAGES. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	SUNPRC: Fix printk format warning	Roland Dreier	2008-02-11	1	-1/+2
\| \| \| \| \| \| \| \|	net/sunrpc/xprtrdma/svc_rdma_sendto.c:160: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'u64' Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
*	rdma: SVCRDMA sendto	Tom Tucker	2008-02-01	1	-0/+520
	This file implements the RDMA transport sendto function. A RPC reply on an RDMA transport consists of some number of RDMA_WRITE requests followed by an RDMA_SEND request. The sendto function parses the ONCRPC RDMA reply header to determine how to send the reply back to the client. The send queue is sized so as to be able to send complete replies for requests in most cases. In the event that there are not enough SQ WR slots to reply, e.g. big data, the send will block the NFSD thread. The I/O callback functions in svc_rdma_transport.c that reap WR completions wake any waiters blocked on the SQ. In general, the goal is not to block NFSD threads and the has_wspace method stall requests when the SQ is nearly full. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>