summaryrefslogtreecommitdiffstats
path: root/net/ipv4/inet_diag.c
Commit message (Collapse)AuthorAgeFilesLines
* tcp: Tail loss probe (TLP)Nandita Dukkipati2013-03-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch series implement the Tail loss probe (TLP) algorithm described in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The first patch implements the basic algorithm. TLP's goal is to reduce tail latency of short transactions. It achieves this by converting retransmission timeouts (RTOs) occuring due to tail losses (losses at end of transactions) into fast recovery. TLP transmits one packet in two round-trips when a connection is in Open state and isn't receiving any ACKs. The transmitted packet, aka loss probe, can be either new or a retransmission. When there is tail loss, the ACK from a loss probe triggers FACK/early-retransmit based fast recovery, thus avoiding a costly RTO. In the absence of loss, there is no change in the connection state. PTO stands for probe timeout. It is a timer event indicating that an ACK is overdue and triggers a loss probe packet. The PTO value is set to max(2*SRTT, 10ms) and is adjusted to account for delayed ACK timer when there is only one oustanding packet. TLP Algorithm On transmission of new data in Open state: -> packets_out > 1: schedule PTO in max(2*SRTT, 10ms). -> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms) -> PTO = min(PTO, RTO) Conditions for scheduling PTO: -> Connection is in Open state. -> Connection is either cwnd limited or no new data to send. -> Number of probes per tail loss episode is limited to one. -> Connection is SACK enabled. When PTO fires: new_segment_exists: -> transmit new segment. -> packets_out++. cwnd remains same. no_new_packet: -> retransmit the last segment. Its ACK triggers FACK or early retransmit based recovery. ACK path: -> rearm RTO at start of ACK processing. -> reschedule PTO if need be. In addition, the patch includes a small variation to the Early Retransmit (ER) algorithm, such that ER and TLP together can in principle recover any N-degree of tail loss through fast recovery. TLP is controlled by the same sysctl as ER, tcp_early_retrans sysctl. tcp_early_retrans==0; disables TLP and ER. ==1; enables RFC5827 ER. ==2; delayed ER. ==3; TLP and delayed ER. [DEFAULT] ==4; TLP only. The TLP patch series have been extensively tested on Google Web servers. It is most effective for short Web trasactions, where it reduced RTOs by 15% and improved HTTP response time (average by 6%, 99th percentile by 10%). The transmitted probes account for <0.5% of the overall transmissions. Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2012-12-131-1/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking changes from David Miller: 1) Allow to dump, monitor, and change the bridge multicast database using netlink. From Cong Wang. 2) RFC 5961 TCP blind data injection attack mitigation, from Eric Dumazet. 3) Networking user namespace support from Eric W. Biederman. 4) tuntap/virtio-net multiqueue support by Jason Wang. 5) Support for checksum offload of encapsulated packets (basically, tunneled traffic can still be checksummed by HW). From Joseph Gasparakis. 6) Allow BPF filter access to VLAN tags, from Eric Dumazet and Daniel Borkmann. 7) Bridge port parameters over netlink and BPDU blocking support from Stephen Hemminger. 8) Improve data access patterns during inet socket demux by rearranging socket layout, from Eric Dumazet. 9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and Jon Maloy. 10) Update TCP socket hash sizing to be more in line with current day realities. The existing heurstics were choosen a decade ago. From Eric Dumazet. 11) Fix races, queue bloat, and excessive wakeups in ATM and associated drivers, from Krzysztof Mazur and David Woodhouse. 12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions in VXLAN driver, from David Stevens. 13) Add "oops_only" mode to netconsole, from Amerigo Wang. 14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also allow DCB netlink to work on namespaces other than the initial namespace. From John Fastabend. 15) Support PTP in the Tigon3 driver, from Matt Carlson. 16) tun/vhost zero copy fixes and improvements, plus turn it on by default, from Michael S. Tsirkin. 17) Support per-association statistics in SCTP, from Michele Baldessari. And many, many, driver updates, cleanups, and improvements. Too numerous to mention individually. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) net/mlx4_en: Add support for destination MAC in steering rules net/mlx4_en: Use generic etherdevice.h functions. net: ethtool: Add destination MAC address to flow steering API bridge: add support of adding and deleting mdb entries bridge: notify mdb changes via netlink ndisc: Unexport ndisc_{build,send}_skb(). uapi: add missing netconf.h to export list pkt_sched: avoid requeues if possible solos-pci: fix double-free of TX skb in DMA mode bnx2: Fix accidental reversions. bna: Driver Version Updated to 3.1.2.1 bna: Firmware update bna: Add RX State bna: Rx Page Based Allocation bna: TX Intr Coalescing Fix bna: Tx and Rx Optimizations bna: Code Cleanup and Enhancements ath9k: check pdata variable before dereferencing it ath5k: RX timestamp is reported at end of frame ath9k_htc: RX timestamp is reported at end of frame ...
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-11-111-1/+4
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c Minor conflict between the BCM_CNIC define removal in net-next and a bug fix added to net. Based upon a conflict resolution patch posted by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: better retrans tracking for defer-acceptEric Dumazet2012-11-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For passive TCP connections using TCP_DEFER_ACCEPT facility, we incorrectly increment req->retrans each time timeout triggers while no SYNACK is sent. SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for which we received the ACK from client). Only the last SYNACK is sent so that we can receive again an ACK from client, to move the req into accept queue. We plan to change this later to avoid the useless retransmit (and potential problem as this SYNACK could be lost) TCP_INFO later gives wrong information to user, claiming imaginary retransmits. Decouple req->retrans field into two independent fields : num_retrans : number of retransmit num_timeout : number of timeouts num_timeout is the counter that is incremented at each timeout, regardless of actual SYNACK being sent or not, and used to compute the exponential timeout. Introduce inet_rtx_syn_ack() helper to increment num_retrans only if ->rtx_syn_ack() succeeded. Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans when we re-send a SYNACK in answer to a (retransmitted) SYN. Prior to this patch, we were not counting these retransmits. Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS only if a synack packet was successfully queued. Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Elliott Hughes <enh@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sock-diag: Report shutdown for inet and unix sockets (v2)Pavel Emelyanov2012-10-231-0/+3
| | | | | | | | | | | | | | | | | | | | | Make it simple -- just put new nlattr with just sk->sk_shutdown bits. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | inet_diag: validate port comparison byte code to prevent unsafe readsNeal Cardwell2012-12-101-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add logic to verify that a port comparison byte code operation actually has the second inet_diag_bc_op from which we read the port for such operations. Previously the code blindly referenced op[1] without first checking whether a second inet_diag_bc_op struct could fit there. So a malicious user could make the kernel read 4 bytes beyond the end of the bytecode array by claiming to have a whole port comparison byte code (2 inet_diag_bc_op structs) when in fact the bytecode was not long enough to hold both. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()Neal Cardwell2012-12-101-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add logic to check the address family of the user-supplied conditional and the address family of the connection entry. We now do not do prefix matching of addresses from different address families (AF_INET vs AF_INET6), except for the previously existing support for having an IPv4 prefix match an IPv4-mapped IPv6 address (which this commit maintains as-is). This change is needed for two reasons: (1) The addresses are different lengths, so comparing a 128-bit IPv6 prefix match condition to a 32-bit IPv4 connection address can cause us to unwittingly walk off the end of the IPv4 address and read garbage or oops. (2) The IPv4 and IPv6 address spaces are semantically distinct, so a simple bit-wise comparison of the prefixes is not meaningful, and would lead to bogus results (except for the IPv4-mapped IPv6 case, which this commit maintains). Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | inet_diag: validate byte code to prevent oops in inet_diag_bc_run()Neal Cardwell2012-12-101-3/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND operations. Previously we did not validate the inet_diag_hostcond, address family, address length, and prefix length. So a malicious user could make the kernel read beyond the end of the bytecode array by claiming to have a whole inet_diag_hostcond when the bytecode was not long enough to contain a whole inet_diag_hostcond of the given address family. Or they could make the kernel read up to about 27 bytes beyond the end of a connection address by passing a prefix length that exceeded the length of addresses of the given family. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV stateNeal Cardwell2012-12-101-14/+39
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix inet_diag to be aware of the fact that AF_INET6 TCP connections instantiated for IPv4 traffic and in the SYN-RECV state were actually created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This means that for such connections inet6_rsk(req) returns a pointer to a random spot in memory up to roughly 64KB beyond the end of the request_sock. With this bug, for a server using AF_INET6 TCP sockets and serving IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to inet_diag_fill_req() causing an oops or the export to user space of 16 bytes of kernel memory as a garbage IPv6 address, depending on where the garbage inet6_rsk(req) pointed. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: inet_diag -- Return error code if protocol handler is missedCyrill Gorcunov2012-11-041-1/+4
|/ | | | | | | | | | | | | | | | | | We've observed that in case if UDP diag module is not supported in kernel the netlink returns NLMSG_DONE without notifying a caller that handler is missed. This patch makes __inet_diag_dump to return error code instead. So as example it become possible to detect such situation and handle it gracefully on userspace level. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: David Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Pavel Emelyanov <xemul@parallels.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: Rename pid to portid to avoid confusionEric W. Biederman2012-09-101-16/+16
| | | | | | | | | | | | | | | It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* userns: Teach inet_diag to work with user namespacesEric W. Biederman2012-08-151-6/+15
| | | | | | | | | | | Compute the user namespace of the socket that we are replying to and translate the kuids of reported sockets into that user namespace. Cc: Andrew Vagin <avagin@openvz.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* net: make sock diag per-namespaceAndrey Vagin2012-07-171-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | Before this patch sock_diag works for init_net only and dumps information about sockets from all namespaces. This patch expands sock_diag for all name-spaces. It creates a netlink kernel socket for each netns and filters data during dumping. v2: filter accoding with netns in all places remove an unused variable. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Andrew Vagin <avagin@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Do not use RTA_PUT() macrosThomas Graf2012-06-281-59/+53Star
| | | | | | | | Also, no need to trim on nlmsg_put() failure, nothing has been added yet. We also want to use nlmsg_end(), nlmsg_new() and nlmsg_free(). Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Move away from NLMSG_PUT().David S. Miller2012-06-271-19/+24
| | | | | | | And use nlmsg_data() while we're here too, and remove useless casts. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-05-081-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/intel/e1000e/param.c drivers/net/wireless/iwlwifi/iwl-agn-rx.c drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c drivers/net/wireless/iwlwifi/iwl-trans.h Resolved the iwlwifi conflict with mainline using 3-way diff posted by John Linville and Stephen Rothwell. In 'net' we added a bug fix to make iwlwifi report a more accurate skb->truesize but this conflicted with RX path changes that happened meanwhile in net-next. In e1000e a conflict arose in the validation code for settings of adapter->itr. 'net-next' had more sophisticated logic so that logic was used. Signed-off-by: David S. Miller <davem@davemloft.net>
| * udp_diag: implement idiag_get_info for udp/udplite to get queue informationShan Wei2012-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | When we use netlink to monitor queue information for udp socket, idiag_rqueue and idiag_wqueue of inet_diag_msg are returned with 0. Keep consistent with netstat, just return back allocated rmem/wmem size. Signed-off-by: Shan Wei <davidshan@tencent.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sock_diag_handler structs can be constShan Wei2012-04-261-2/+2
|/ | | | | | | | read only, so change it to const. Signed-off-by: Shan Wei <davidshan@tencent.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: add netlink_dump_control structure for netlink_dump_start()Pablo Neira Ayuso2012-02-261-6/+12
| | | | | | | | | | | | | | | Davem considers that the argument list of this interface is getting out of control. This patch tries to address this issue following his proposal: struct netlink_dump_control c = { .dump = dump, .done = done, ... }; netlink_dump_start(..., &c); Suggested by David S. Miller. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Rename inet_diag_req_compat into inet_diag_reqPavel Emelyanov2012-01-111-4/+4
| | | | | Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Rename inet_diag_req into inet_diag_req_v2Pavel Emelyanov2012-01-111-17/+17
| | | | | Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Add the SKMEMINFO extensionPavel Emelyanov2011-12-301-0/+4
| | | | | Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sock_diag: Generalize requests cookies managementsPavel Emelyanov2011-12-161-19/+4Star
| | | | | | | | | The sk address is used as a cookie between dump/get_exact calls. It will be required for unix socket sdumping, so move it from inet_diag to sock_diag. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sock_diag: Fix module netlink aliasesPavel Emelyanov2011-12-161-3/+4
| | | | | | | | | | | | | | I've made a mistake when fixing the sock_/inet_diag aliases :( 1. The sock_diag layer should request the family-based alias, not just the IPPROTO_IP one; 2. The inet_diag layer should request for AF_INET+protocol alias, not just the protocol one. Thus fix this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use IS_ENABLED(CONFIG_IPV6)Eric Dumazet2011-12-121-8/+8
| | | | | | | Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Generalize inet_diag dump and get_exact callsPavel Emelyanov2011-12-091-5/+6
| | | | | | | | | | | | | | | Introduce two callbacks in inet_diag_handler -- one for dumping all sockets (with filters) and the other one for dumping a single sk. Replace direct calls to icsk handlers with indirect calls to callbacks provided by handlers. Make existing TCP and DCCP handlers use provided helpers for icsk-s. The UDP diag module will provide its own. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Introduce the inet socket dumping routinePavel Emelyanov2011-12-091-19/+34
| | | | | | | | | | | The existing inet_csk_diag_fill dumps the inet connection sock info into the netlink inet_diag_message. Prepare this routine to be able to dump only the inet_sock part of a socket if the icsk part is missing. This will be used by UDP diag module when dumping UDP sockets. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Introduce the byte-code run on an inet socketPavel Emelyanov2011-12-091-24/+31
| | | | | | | | The upcoming UDP module will require exactly this ability, so just move the existing code to provide one. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Split inet_diag_get_exact into partsPavel Emelyanov2011-12-091-12/+16
| | | | | | | | | Similar to previous patch: the 1st part locks the inet handler and will get generalized and the 2nd one dumps icsk-s and will be used by TCP and DCCP handlers. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Split inet_diag_get_exact into partsPavel Emelyanov2011-12-091-16/+22
| | | | | | | | | | | | The 1st part locks the inet handler and the 2nd one dump the inet connection sock. In the next patches the 1st part will be generalized to call the socket dumping routine indirectly (i.e. TCP/UDP/DCCP) and the 2nd part will be used by TCP and DCCP handlers. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Export inet diag cookie checking routinePavel Emelyanov2011-12-091-5/+14
| | | | | | | | | | | | The netlink diag susbsys stores sk address bits in the nl message as a "cookie" and uses one when dumps details about particular socket. The same will be required for udp diag module, so introduce a heler in inet_diag module Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Reduce the number of args for bytecode run routinePavel Emelyanov2011-12-091-6/+8
| | | | | Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Remove indirect sizeof from inet diag handlersPavel Emelyanov2011-12-091-3/+2Star
| | | | | | | | | There's an info_size value stored on inet_diag_handler, but for existing code this value is effectively constant, so just use sizeof(struct tcp_info) where required. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sock_diag: Move the sock_ code to net/core/Pavel Emelyanov2011-12-061-99/+6Star
| | | | | | | | | This patch moves the sock_ code from inet_diag.c to generic sock_diag.c file and provides necessary request_module-s calls and a pointer on inet_diag_compat dumping routine. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Cleanup type2proto last userPavel Emelyanov2011-12-061-23/+24
| | | | | | | | | | | | | Now all the code works with sock_diag_req-compatible structs, so it's possible to stop using the inet_diag_type2proto in inet_csk_diag_fill. Pass the inet_diag_req into it and use the sdiag_protocol field. At the same time remove the explicit ext argument, since it's also on the req. However, this conversion is still required in _compat code, so just move this routine, not remove. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Introduce socket family checksPavel Emelyanov2011-12-061-1/+11
| | | | | | | | | | | The new API will specify family to work with. Teach the existing socket walking code to bypass not interesting ones. To preserve compatibility with existing behavior the _compat code sets interesting family to AF_UNSPEC to dump them all. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Switch the _dump to work with new headerPavel Emelyanov2011-12-061-18/+53
| | | | | | | | | | | | | Make inet_diag_dumo work with given header instead of calculating one from the nl message. The SOCK_DIAG_BY_FAMILY just passes skb's one through, the compat code converts the old header to new one. Also fix the bytecode calculation to find one at proper offset. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Switch the _get_exact to work with new headerPavel Emelyanov2011-12-061-7/+22
| | | | | | | | | | | Make inet_diag_get_exact work with given header instead of calculating one from the nl message. The SOCK_DIAG_BY_FAMILY just passes skb's one through, the compat code converts the old header to new one. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Introduce new inet_diag_req headerPavel Emelyanov2011-12-061-7/+7
| | | | | | | | | | | This one coinsides with the sock_diag_req in the beginning and contains only used fields from its previous analogue. The existing code is patched to use the _compat version of it for now. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sock_diag: Initial skeletonPavel Emelyanov2011-12-061-2/+100
| | | | | | | | | | | When receiving the SOCK_DIAG_BY_FAMILY message we have to find the handler for provided family and pass the nl message to it. This patch describes an infrastructure to work with such nandlers and implements stubs for AF_INET(6) ones. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Switch from _GETSOCK to IPPROTO_ numbersPavel Emelyanov2011-12-061-11/+23
| | | | | | | | | | | Sorry, but the vger didn't let this message go to the list. Re-sending it with less spam-filter-prone subject. When dumping the AF_INET/AF_INET6 sockets user will also specify the protocol, so prepare the protocol diag handlers to work with IPPROTO_ constants. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Move byte-code finding up the call-stackPavel Emelyanov2011-12-061-19/+17Star
| | | | | | | | Current code calculates it at fixed offset. This offset will change, so move the BC calculation upper to make the further patching simpler. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sock_diag: Introduce new message typePavel Emelyanov2011-12-061-2/+15
| | | | | | | | This type will run the family+protocol based socket dumping. Also prepare the stub function for it. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_diag: Partly rename inet_ to sock_Pavel Emelyanov2011-12-061-14/+19
| | | | | | | | | | The ultimate goal is to get the sock_diag module, that works in family+protocol terms. Currently this is suitable to do on the inet_diag basis, so rename parts of the code. It will be moved to sock_diag.c later. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-11-261-3/+6
|\ | | | | | | | | Conflicts: net/ipv4/inet_diag.c
| * net-netlink: fix diag to export IPv4 tos for dual-stack IPv6 socketsMaciej Żenczykowski2011-11-221-5/+9
| | | | | | | | | | Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: remove ipv6_addr_copy()Alexey Dobriyan2011-11-221-12/+6Star
|/ | | | | | | C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-netlink: Add a new attribute to expose TCLASS values via netlinkMaciej Żenczykowski2011-11-141-2/+2
| | | | | | | | | | | | | | | commit 3ceca749668a52bd795585e0f71c6f0b04814f7b added a TOS attribute. Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket, furthermore they can have different values. As such one cannot in a sane way expose both through a single attribute. Signed-off-by: Maciej Żenczyowski <maze@google.com> CC: Murali Raja <muralira@google.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-netlink: Add a new attribute to expose TOS values via netlinkMurali Raja2011-10-131-0/+5
| | | | | | | | | | | | | | This patch exposes the tos value for the TCP sockets when the TOS flag is requested in the ext_flags for the inet_diag request. This would mainly be used to expose TOS values for both for TCP and UDP sockets. Currently it is supported for TCP. When netlink support for UDP would be added the support to expose the TOS values would alse be done. For IPV4 tos value is exposed and for IPV6 tclass value is exposed. Signed-off-by: Murali Raja <muralira@google.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2011-06-211-8/+6Star
|\ | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c