summaryrefslogtreecommitdiffstats
path: root/net/rxrpc
Commit message (Collapse)AuthorAgeFilesLines
...
* rxrpc: Preset timestamp on Tx sk_buffsDavid Howells2016-09-231-0/+5
| | | | | | | | | | | | | Set the timestamp on sk_buffs holding packets to be transmitted before queueing them because the moment the packet is on the queue it can be seen by the retransmission algorithm - which may see a completely random timestamp. If the retransmission algorithm sees such a timestamp, it may retransmit the packet and, in future, tell the congestion management algorithm that the retransmit timer expired. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Reduce the number of PING ACKs sentDavid Howells2016-09-222-3/+6
| | | | | | | | | | We don't want to send a PING ACK for every new incoming call as that just adds to the network traffic. Instead, we send a PING ACK to the first three that we receive and then once per second thereafter. This could probably be made adjustable in future. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Reduce the number of ACK-Requests sentDavid Howells2016-09-224-4/+13
| | | | | | | | | | | | | | | Reduce the number of ACK-Requests we set on DATA packets that we're sending to reduce network traffic. We set the flag on odd-numbered DATA packets to start off the RTT cache until we have at least three entries in it and then probe once per second thereafter to keep it topped up. This could be made tunable in future. Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA packets as we transmit them and not stored statically in the sk_buff. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Obtain RTT data by requesting ACKs on DATA packetsDavid Howells2016-09-227-20/+57
| | | | | | | | | | | | | | | In addition to sending a PING ACK to gain RTT data, we can set the RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK. The ACK packet contains the serial number of the packet it is in response to, so we can look through the Tx buffer for a matching DATA packet. This requires that the data packets be stamped with the time of transmission as a ktime rather than having the resend_at time in jiffies. This further requires the resend code to do the resend determination in ktimes and convert to jiffies to set the timer. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Expedite ping response transmissionDavid Howells2016-09-221-0/+4
| | | | | | | | | | | | | Expedite the transmission of a response to a PING ACK by sending it from sendmsg if one is pending. We're most likely to see a PING ACK during the client call Tx phase as the other side may use it to determine a number of parameters, such as the client's receive window size, the RTT and whether the client is doing slow start (similar to RFC5681). If we don't expedite it, it's left to the background processing thread to transmit. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Send pings to get RTT dataDavid Howells2016-09-224-8/+80
| | | | | | | | | | | | | | | | | | | | | | | | Send a PING ACK packet to the peer when we get a new incoming call from a peer we don't have a record for. The PING RESPONSE ACK packet will tell us the following about the peer: (1) its receive window size (2) its MTU sizes (3) its support for jumbo DATA packets (4) if it supports slow start (similar to RFC 5681) (5) an estimate of the RTT This is necessary because the peer won't normally send us an ACK until it gets to the Rx phase and we send it a packet, but we would like to know some of this information before we start sending packets. A pair of tracepoints are added so that RTT determination can be observed. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add per-peer RTT trackerDavid Howells2016-09-223-4/+70
| | | | | | | | | | | | | | Add a function to track the average RTT for a peer. Sources of RTT data will be added in subsequent patches. The RTT data will be useful in the future for determining resend timeouts and for handling the slow-start part of the Rx protocol. Also add a pair of tracepoints, one to log transmissions to elicit a response for RTT purposes and one to log responses that contribute RTT data. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add re-sent Tx annotationDavid Howells2016-09-223-12/+32
| | | | | | | | | Add a Tx-phase annotation for packet buffers to indicate that a buffer has already been retransmitted. This will be used by future congestion management. Re-retransmissions of a packet don't affect the congestion window managment in the same way as initial retransmissions. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Don't store the rxrpc header in the Tx queue sk_buffsDavid Howells2016-09-226-88/+71Star
| | | | | | | | | | | Don't store the rxrpc protocol header in sk_buffs on the transmit queue, but rather generate it on the fly and pass it to kernel_sendmsg() as a separate iov. This reduces the amount of storage required. Note that the security header is still stored in the sk_buff as it may get encrypted along with the data (and doesn't change with each transmission). Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add config to inject packet lossDavid Howells2016-09-173-0/+24
| | | | | | | | | | Add a configuration option to inject packet loss by discarding approximately every 8th packet received and approximately every 8th DATA packet transmitted. Note that no locking is used, but it shouldn't really matter. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Improve skb tracingDavid Howells2016-09-1713-55/+127
| | | | | | | | | | | | | | | | | | | | | | | Improve sk_buff tracing within AF_RXRPC by the following means: (1) Use an enum to note the event type rather than plain integers and use an array of event names rather than a big multi ?: list. (2) Distinguish Rx from Tx packets and account them separately. This requires the call phase to be tracked so that we know what we might find in rxtx_buffer[]. (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the event type. (4) A pair of 'rotate' events are added to indicate packets that are about to be rotated out of the Rx and Tx windows. (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for packet loss injection recording. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Remove printks from rxrpc_recvmsg_data() to fix uninit varDavid Howells2016-09-171-8/+0Star
| | | | | | | Remove _enter/_debug/_leave calls from rxrpc_recvmsg_data() of which one uses an uninitialised variable. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add a tracepoint to follow what recvmsg doesDavid Howells2016-09-173-8/+57
| | | | | Add a tracepoint to follow what recvmsg does within AF_RXRPC. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add a tracepoint to follow packets in the Rx bufferDavid Howells2016-09-175-1/+40
| | | | | | | Add a tracepoint to follow the life of packets that get added to a call's receive buffer. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add a tracepoint to log ACK transmissionDavid Howells2016-09-172-1/+9
| | | | | | Add a tracepoint to log information about ACK transmission. Signed-off-by: David Howels <dhowells@redhat.com>
* rxrpc: Add a tracepoint to log received ACK packetsDavid Howells2016-09-171-0/+2
| | | | | | Add a tracepoint to log information from received ACK packets. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add a tracepoint to follow the life of a packet in the Tx bufferDavid Howells2016-09-174-1/+31
| | | | | | | Add a tracepoint to follow the insertion of a packet into the transmit buffer, its transmission and its rotation out of the buffer. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add connection tracepoint and client conn state tracepointDavid Howells2016-09-178-59/+214
| | | | | | | Add a pair of tracepoints, one to track rxrpc_connection struct ref counting and the other to track the client connection cache state. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add some additional call tracingDavid Howells2016-09-172-4/+17
| | | | | | | | | | Add additional call tracepoint points for noting call-connected, call-released and connection-failed events. Also fix one tracepoint that was using an integer instead of the corresponding enum value as the point type. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Print the packet type name in the Rx packet traceDavid Howells2016-09-171-3/+3
| | | | | | | Print a symbolic packet type name for each valid received packet in the trace output, not just a number. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Fix the basic transmit DATA packet content size at 1412 bytesDavid Howells2016-09-171-1/+1
| | | | | | | | | | | | Fix the basic transmit DATA packet content size at 1412 bytes so that they can be arbitrarily assembled into jumbo packets. In the future, I'm thinking of moving to keeping a jumbo packet header at the beginning of each packet in the Tx queue and creating the packet header on the spot when kernel_sendmsg() is invoked. That way, jumbo packets can be assembled on the spur of the moment for (re-)transmission. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Be consistent about switch value in rxrpc_send_call_packet()David Howells2016-09-171-1/+1
| | | | | | | | rxrpc_send_call_packet() should use type in both its switch-statements rather than using pkt->whdr.type. This might give the compiler an easier job of uninitialised variable checking. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Don't transmit an ACK if there's no reason setDavid Howells2016-09-171-0/+5
| | | | | | | | Don't transmit an ACK if call->ackr_reason in unset. There's the possibility of a race between recvmsg() sending an ACK and the background processing thread trying to send the same one. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Fix retransmission algorithmDavid Howells2016-09-171-8/+4Star
| | | | | | | | | | | | | | Make the retransmission algorithm use for-loops instead of do-loops and move the counter increments into the for-statement increment slots. Though the do-loops are slighly more efficient since there will be at least one pass through the each loop, the counter increments are harder to get right as the continue-statements skip them. Without this, if there are any positive acks within the loop, the do-loop will cycle forever because the counter increment is never done. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Fix the parsing of soft-ACKsDavid Howells2016-09-171-1/+1
| | | | | | | | | | | The soft-ACK parser doesn't increment the pointer into the soft-ACK list, resulting in the first ACK/NACK value being applied to all the relevant packets in the Tx queue. This has the potential to miss retransmissions and cause excessive retransmissions. Fix this by incrementing the pointer. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Fix unexposed client conn releaseDavid Howells2016-09-171-1/+0Star
| | | | | | | | | | | | | | | | | | | | If the last call on a client connection is release after the connection has had a bunch of calls allocated but before any DATA packets are sent (so that it's not yet marked RXRPC_CONN_EXPOSED), an assertion will happen in rxrpc_disconnect_client_call(). af_rxrpc: Assertion failed - 1(0x1) >= 2(0x2) is false ------------[ cut here ]------------ kernel BUG at ../net/rxrpc/conn_client.c:753! This is because it's expecting the conn to have been exposed and to have 2 or more refs - but this isn't necessarily the case. Simply remove the assertion. This allows the conn to be moved into the inactive state and deleted if it isn't resurrected before the final put is called. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()David Howells2016-09-171-24/+12Star
| | | | | | | | | | | | Call rxrpc_release_call() on getting an error in rxrpc_new_client_call() rather than trying to do the cleanup ourselves. This isn't a problem, provided we set RXRPC_CALL_HAS_USERID only if we actually add the call to the calls tree as cleanup code fragments that would otherwise cause problems are conditional. Without this, we miss some of the cleanup. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Fix the putting of client connectionsDavid Howells2016-09-171-15/+13Star
| | | | | | | | | | | | | | | | | | | In rxrpc_put_one_client_conn(), if a connection has RXRPC_CONN_COUNTED set on it, then it's accounted for in rxrpc_nr_client_conns and may be on various lists - and this is cleaned up correctly. However, if the connection doesn't have RXRPC_CONN_COUNTED set on it, then the put routine returns rather than just skipping the extra bit of cleanup. Fix this by making the extra bit of clean up conditional instead and always killing off the connection. This manifests itself as connections with a zero usage count hanging around in /proc/net/rxrpc_conns because the connection allocated, but discarded, due to a race with another process that set up a parallel connection, which was then shared instead. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Purge the to_be_accepted queue on socket releaseDavid Howells2016-09-171-0/+10
| | | | | | | | | | Purge the queue of to_be_accepted calls on socket release. Note that purging sock_calls doesn't release the ref owned by to_be_accepted. Probably the sock_calls list is redundant given a purges of the recvmsg_q, the to_be_accepted queue and the calls tree. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Record calls that need to be acceptedDavid Howells2016-09-171-0/+2
| | | | | | | | | | | Record calls that need to be accepted using sk_acceptq_added() otherwise the backlog counter goes negative because sk_acceptq_removed() is called. This causes the preallocator to malfunction. Calls that are preaccepted by AFS within the kernel aren't affected by this. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Fix handling of the last packet in rxrpc_recvmsg_data()David Howells2016-09-172-17/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code for determining the last packet in rxrpc_recvmsg_data() has been using the RXRPC_CALL_RX_LAST flag to determine if the rx_top pointer points to the last packet or not. This isn't a good idea, however, as the input code may be running simultaneously on another CPU and that sets the flag *before* updating the top pointer. Fix this by the following means: (1) Restrict the use of RXRPC_CALL_RX_LAST to the input routines only. There's otherwise a synchronisation problem between detecting the flag and checking tx_top. This could probably be dealt with by appropriate application of memory barriers, but there's a simpler way. (2) Set RXRPC_CALL_RX_LAST after setting rx_top. (3) Make rxrpc_rotate_rx_window() consult the flags header field of the DATA packet it's about to discard to see if that was the last packet. Use this as the basis for ending the Rx phase. This shouldn't be a problem because the recvmsg side of things is guaranteed to see the packets in order. (4) Make rxrpc_recvmsg_data() return 1 to indicate the end of the data if: (a) the packet it has just processed is marked as RXRPC_LAST_PACKET (b) the call's Rx phase has been ended. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Check the return value of rxrpc_locate_data()David Howells2016-09-171-1/+4
| | | | | | Check the return value of rxrpc_locate_data() in rxrpc_recvmsg_data(). Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Move the check of rx_pkt_offset from rxrpc_locate_data() to callerDavid Howells2016-09-171-5/+4Star
| | | | | | | Move the check of rx_pkt_offset from rxrpc_locate_data() to the caller, rxrpc_recvmsg_data(), so that it's more clear what's going on there. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Remove some whitespace.David Howells2016-09-171-1/+1
| | | | | | Remove a tab that's on a line that should otherwise be blank. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Make IPv6 support conditional on CONFIG_IPV6David Howells2016-09-178-2/+34
| | | | | | | | | | | | | | Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it. This is then made conditional on CONFIG_IPV6. Without this, the following can be seen: net/built-in.o: In function `rxrpc_init_peer': >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags' Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* rxrpc: Add IPv6 supportDavid Howells2016-09-147-83/+154
| | | | | | | | | | | | | | | | | | | Add IPv6 support to AF_RXRPC. With this, AF_RXRPC sockets can be created: service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6); instead of: service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET); The AFS filesystem doesn't support IPv6 at the moment, though, since that requires upgrades to some of the RPC calls. Note that a good portion of this patch is replacing "%pI4:%u" in print statements with "%pISpc" which is able to handle both protocols and print the port. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Use rxrpc_extract_addr_from_skb() rather than doing this manuallyDavid Howells2016-09-142-34/+11Star
| | | | | | | | | There are two places that want to transmit a packet in response to one just received and manually pick the address to reply to out of the sk_buff. Make them use rxrpc_extract_addr_from_skb() instead so that IPv6 is handled automatically. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Don't specify protocol to when creating transport socketDavid Howells2016-09-141-2/+2
| | | | | | | Pass 0 as the protocol argument when creating the transport socket rather than IPPROTO_UDP. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Create an address for sendmsg() to bind unbound socket withDavid Howells2016-09-141-0/+12
| | | | | | | | | | | | Create an address for sendmsg() to bind unbound socket with rather than using a completely blank address otherwise the transport socket creation will fail because it will try to use address family 0. We use the address family specified in the protocol argument when the AF_RXRPC socket was created and SOCK_DGRAM as the default. For anything else, bind() must be used. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Correctly initialise, limit and transmit call->rx_winsizeDavid Howells2016-09-136-13/+26
| | | | | | | | | | | | | | call->rx_winsize should be initialised to the sysctl setting and the sysctl setting should be limited to the maximum we want to permit. Further, we need to place this in the ACK info instead of the sysctl setting. Furthermore, discard the idea of accepting the subpackets of a jumbo packet that lie beyond the receive window when the first packet of the jumbo is within the window. Just discard the excess subpackets instead. This allows the receive window to be opened up right to the buffer size less one for the dead slot. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Fix prealloc refcountingDavid Howells2016-09-133-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The preallocated call buffer holds a ref on the calls within that buffer. The ref was being released in the wrong place - it worked okay for incoming calls to the AFS cache manager service, but doesn't work right for incoming calls to a userspace service. Instead of releasing an extra ref service calls in rxrpc_release_call(), the ref needs to be released during the acceptance/rejectance process. To this end: (1) The prealloc ref is now normally released during rxrpc_new_incoming_call(). (2) For preallocated kernel API calls, the kernel API's ref needs to be released when the call is discarded on socket close. (3) We shouldn't take a second ref in rxrpc_accept_call(). (4) rxrpc_recvmsg_new_call() needs to get a ref of its own when it adds the call to the to_be_accepted socket queue. In doing (4) above, we would prefer not to put the call's refcount down to 0 as that entails doing cleanup in softirq context, but it's unlikely as there are several refs held elsewhere, at least one of which must be put by someone in process context calling rxrpc_release_call(). However, it's not a problem if we do have to do that. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Adjust the call ref tracepoint to show kernel API refsDavid Howells2016-09-134-2/+7
| | | | | | | | | | | Adjust the call ref tracepoint to show references held on a call by the kernel API separately as much as possible and add an additional trace to at the allocation point from the preallocation buffer for an incoming call. Note that this doesn't show the allocation of a client call for the kernel separately at the moment. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Allow tx_winsize to grow in response to an ACKDavid Howells2016-09-131-3/+5
| | | | | | | Allow tx_winsize to grow when the ACK info packet shows a larger receive window at the other end rather than only permitting it to shrink. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Use skb->len not skb->data_lenDavid Howells2016-09-131-4/+4
| | | | | | | | | | | | | skb->len should be used rather than skb->data_len when referring to the amount of data in a packet. This will only cause a malfunction in the following cases: (1) We receive a jumbo packet (validation and splitting both are wrong). (2) We see if there's extra ACK info in an ACK packet (we think it's not there and just ignore it). Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add missing unlock in rxrpc_call_accept()David Howells2016-09-131-3/+5
| | | | | | | Add a missing unlock in rxrpc_call_accept() in the path taken if there's no call to wake up. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Requeue call for recvmsg if more dataDavid Howells2016-09-131-0/+4
| | | | | | | | | | | | | | rxrpc_recvmsg() needs to make sure that the call it has just been processing gets requeued for further attention if the buffer has been filled and there's more data to be consumed. The softirq producer only queues the call and wakes the socket if it fills the first slot in the window, so userspace might end up sleeping forever otherwise, despite there being data available. This is not a problem provided the userspace buffer is big enough or it empties the buffer completely before more data comes in. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: The IDLE ACK packet should use rxrpc_idle_ack_delayDavid Howells2016-09-131-1/+1
| | | | | | | The IDLE ACK packet should use the rxrpc_idle_ack_delay setting when the timer is set for it. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Add missing wakeup on Tx window rotationDavid Howells2016-09-131-0/+2
| | | | | | | | | | | | We need to wake up the sender when Tx window rotation due to an incoming ACK makes space in the buffer otherwise the sender is liable to just hang endlessly. This problem isn't noticeable if the Tx phase transfers no more than will fit in a single window or the Tx window rotates fast enough that it doesn't get full. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Make sure we initialise the peer hash keyDavid Howells2016-09-131-1/+1
| | | | | | | | | Peer records created for incoming connections weren't getting their hash key set. This meant that incoming calls wouldn't see more than one DATA packet - which is not a problem for AFS CM calls with small request data blobs. Signed-off-by: David Howells <dhowells@redhat.com>
* rxrpc: Rewrite the data and ack handling codeDavid Howells2016-09-0821-3286/+1983Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite the data and ack handling code such that: (1) Parsing of received ACK and ABORT packets and the distribution and the filing of DATA packets happens entirely within the data_ready context called from the UDP socket. This allows us to process and discard ACK and ABORT packets much more quickly (they're no longer stashed on a queue for a background thread to process). (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead keep track of the offset and length of the content of each packet in the sk_buff metadata. This means we don't do any allocation in the receive path. (3) Jumbo DATA packet parsing is now done in data_ready context. Rather than cloning the packet once for each subpacket and pulling/trimming it, we file the packet multiple times with an annotation for each indicating which subpacket is there. From that we can directly calculate the offset and length. (4) A call's receive queue can be accessed without taking locks (memory barriers do have to be used, though). (5) Incoming calls are set up from preallocated resources and immediately made live. They can than have packets queued upon them and ACKs generated. If insufficient resources exist, DATA packet #1 is given a BUSY reply and other DATA packets are discarded). (6) sk_buffs no longer take a ref on their parent call. To make this work, the following changes are made: (1) Each call's receive buffer is now a circular buffer of sk_buff pointers (rxtx_buffer) rather than a number of sk_buff_heads spread between the call and the socket. This permits each sk_buff to be in the buffer multiple times. The receive buffer is reused for the transmit buffer. (2) A circular buffer of annotations (rxtx_annotations) is kept parallel to the data buffer. Transmission phase annotations indicate whether a buffered packet has been ACK'd or not and whether it needs retransmission. Receive phase annotations indicate whether a slot holds a whole packet or a jumbo subpacket and, if the latter, which subpacket. They also note whether the packet has been decrypted in place. (3) DATA packet window tracking is much simplified. Each phase has just two numbers representing the window (rx_hard_ack/rx_top and tx_hard_ack/tx_top). The hard_ack number is the sequence number before base of the window, representing the last packet the other side says it has consumed. hard_ack starts from 0 and the first packet is sequence number 1. The top number is the sequence number of the highest-numbered packet residing in the buffer. Packets between hard_ack+1 and top are soft-ACK'd to indicate they've been received, but not yet consumed. Four macros, before(), before_eq(), after() and after_eq() are added to compare sequence numbers within the window. This allows for the top of the window to wrap when the hard-ack sequence number gets close to the limit. Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also to indicate when rx_top and tx_top point at the packets with the LAST_PACKET bit set, indicating the end of the phase. (4) Calls are queued on the socket 'receive queue' rather than packets. This means that we don't need have to invent dummy packets to queue to indicate abnormal/terminal states and we don't have to keep metadata packets (such as ABORTs) around (5) The offset and length of a (sub)packet's content are now passed to the verify_packet security op. This is currently expected to decrypt the packet in place and validate it. However, there's now nowhere to store the revised offset and length of the actual data within the decrypted blob (there may be a header and padding to skip) because an sk_buff may represent multiple packets, so a locate_data security op is added to retrieve these details from the sk_buff content when needed. (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is individually secured and needs to be individually decrypted. The code to do this is broken out into rxrpc_recvmsg_data() and shared with the kernel API. It now iterates over the call's receive buffer rather than walking the socket receive queue. Additional changes: (1) The timers are condensed to a single timer that is set for the soonest of three timeouts (delayed ACK generation, DATA retransmission and call lifespan). (2) Transmission of ACK and ABORT packets is effected immediately from process-context socket ops/kernel API calls that cause them instead of them being punted off to a background work item. The data_ready handler still has to defer to the background, though. (3) A shutdown op is added to the AF_RXRPC socket so that the AFS filesystem can shut down the socket and flush its own work items before closing the socket to deal with any in-progress service calls. Future additional changes that will need to be considered: (1) Make sure that a call doesn't hog the front of the queue by receiving data from the network as fast as userspace is consuming it to the exclusion of other calls. (2) Transmit delayed ACKs from within recvmsg() when we've consumed sufficiently more packets to avoid the background work item needing to run. Signed-off-by: David Howells <dhowells@redhat.com>