summaryrefslogtreecommitdiffstats
path: root/include/uapi/linux/pkt_cls.h
Commit message (Collapse)AuthorAgeFilesLines
* net/sched: cls_flower: Add matching on conntrack infoPaul Blakey2019-07-091-0/+16
| | | | | | | | | | New matches for conntrack mark, label, zone, and state. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: Introduce action ctPaul Blakey2019-07-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow sending a packet to conntrack module for connection tracking. The packet will be marked with conntrack connection's state, and any metadata such as conntrack mark and label. This state metadata can later be matched against with tc classifers, for example with the flower classifier as below. In addition to committing new connections the user can optionally specific a zone to track within, set a mark/label and configure nat with an address range and port range. Usage is as follows: $ tc qdisc add dev ens1f0_0 ingress $ tc qdisc add dev ens1f0_1 ingress $ tc filter add dev ens1f0_0 ingress \ prio 1 chain 0 proto ip \ flower ip_proto tcp ct_state -trk \ action ct zone 2 pipe \ action goto chain 2 $ tc filter add dev ens1f0_0 ingress \ prio 1 chain 2 proto ip \ flower ct_state +trk+new \ action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \ action mirred egress redirect dev ens1f0_1 $ tc filter add dev ens1f0_0 ingress \ prio 1 chain 2 proto ip \ flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \ action ct nat pipe \ action mirred egress redirect dev ens1f0_1 $ tc filter add dev ens1f0_1 ingress \ prio 1 chain 0 proto ip \ flower ip_proto tcp ct_state -trk \ action ct zone 2 pipe \ action goto chain 1 $ tc filter add dev ens1f0_1 ingress \ prio 1 chain 1 proto ip \ flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \ action ct nat pipe \ action mirred egress redirect dev ens1f0_0 Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Changelog: V5->V6: Added CONFIG_NF_DEFRAG_IPV6 in handle fragments ipv6 case V4->V5: Reordered nf_conntrack_put() in tcf_ct_skb_nfct_cached() V3->V4: Added strict_start_type for act_ct policy V2->V3: Fixed david's comments: Removed extra newline after rcu in tcf_ct_params , and indent of break in act_ct.c V1->V2: Fixed parsing of ranges TCA_CT_NAT_IPV6_MAX as 'else' case overwritten ipv4 max Refactored NAT_PORT_MIN_MAX range handling as well Added ipv4/ipv6 defragmentation Removed extra skb pull push of nw offset in exectute nat Refactored tcf_ct_skb_network_trim after pull Removed TCA_ACT_CT define Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: add mpls manipulation actions to TCJohn Hurley2019-07-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, TC offers the ability to match on the MPLS fields of a packet through the use of the flow_dissector_key_mpls struct. However, as yet, TC actions do not allow the modification or manipulation of such fields. Add a new module that registers TC action ops to allow manipulation of MPLS. This includes the ability to push and pop headers as well as modify the contents of new or existing headers. A further action to decrement the TTL field of an MPLS header is also provided with a new helper added to support this. Examples of the usage of the new action with flower rules to push and pop MPLS labels are: tc filter add dev eth0 protocol ip parent ffff: flower \ action mpls push protocol mpls_uc label 123 \ action mirred egress redirect dev eth1 tc filter add dev eth0 protocol mpls_uc parent ffff: flower \ action mpls pop protocol ipv4 \ action mirred egress redirect dev eth1 Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: remove NET_CLS_IND config optionJiri Pirko2019-06-151-1/+1
| | | | | | | | | | | | | This config option makes only couple of lines optional. Two small helpers and an int in couple of cls structs. Remove the config option and always compile this in. This saves the user from unexpected surprises when he adds a filter with ingress device match which is silently ignored in case the config option is not set. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: Introduce act_ctinfo actionKevin 'ldir' Darbyshire-Bryant2019-05-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ctinfo is a new tc filter action module. It is designed to restore information contained in firewall conntrack marks to other packet fields and is typically used on packet ingress paths. At present it has two independent sub-functions or operating modes, DSCP restoration mode & skb mark restoration mode. The DSCP restore mode: This mode copies DSCP values that have been placed in the firewall conntrack mark back into the IPv4/v6 diffserv fields of relevant packets. The DSCP restoration is intended for use and has been found useful for restoring ingress classifications based on egress classifications across links that bleach or otherwise change DSCP, typically home ISP Internet links. Restoring DSCP on ingress on the WAN link allows qdiscs such as but by no means limited to CAKE to shape inbound packets according to policies that are easier to set & mark on egress. Ingress classification is traditionally a challenging task since iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT lookups, hence are unable to see internal IPv4 addresses as used on the typical home masquerading gateway. Thus marking the connection in some manner on egress for later restoration of classification on ingress is easier to implement. Parameters related to DSCP restore mode: dscpmask - a 32 bit mask of 6 contiguous bits and indicate bits of the conntrack mark field contain the DSCP value to be restored. statemask - a 32 bit mask of (usually) 1 bit length, outside the area specified by dscpmask. This represents a conditional operation flag whereby the DSCP is only restored if the flag is set. This is useful to implement a 'one shot' iptables based classification where the 'complicated' iptables rules are only run once to classify the connection on initial (egress) packet and subsequent packets are all marked/restored with the same DSCP. A mask of zero disables the conditional behaviour ie. the conntrack mark DSCP bits are always restored to the ip diffserv field (assuming the conntrack entry is found & the skb is an ipv4/ipv6 type) e.g. dscpmask 0xfc000000 statemask 0x01000000 |----0xFC----conntrack mark----000000---| | Bits 31-26 | bit 25 | bit24 |~~~ Bit 0| | DSCP | unused | flag |unused | |-----------------------0x01---000000---| | | | | ---| Conditional flag v only restore if set |-ip diffserv-| | 6 bits | |-------------| The skb mark restore mode (cpmark): This mode copies the firewall conntrack mark to the skb's mark field. It is completely the functional equivalent of the existing act_connmark action with the additional feature of being able to apply a mask to the restored value. Parameters related to skb mark restore mode: mask - a 32 bit mask applied to the firewall conntrack mark to mask out bits unwanted for restoration. This can be useful where the conntrack mark is being used for different purposes by different applications. If not specified and by default the whole mark field is copied (i.e. default mask of 0xffffffff) e.g. mask 0x00ffffff to mask out the top 8 bits being used by the aforementioned DSCP restore mode. |----0x00----conntrack mark----ffffff---| | Bits 31-24 | | | DSCP & flag| some value here | |---------------------------------------| | | v |------------skb mark-------------------| | | | | zeroed | | |---------------------------------------| Overall parameters: zone - conntrack zone control - action related control (reclassify | pipe | drop | continue | ok | goto chain <CHAIN_INDEX>) Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Change TCA_ACT_* to TCA_ID_* to match that of TCA_ID_POLICEEli Cohen2019-02-101-1/+1
| | | | | | | | | | | | | | | Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For example, use TCA_ID_GACT instead of TCA_ACT_GACT. This will align with TCA_ID_POLICE and also differentiates these identifier, used in struct tc_action_ops type field, from other macros starting with TCA_ACT_. To make things clearer, we name the enum defining the TCA_ID_* identifiers and also change the "type" field of struct tc_action to id. Signed-off-by: Eli Cohen <eli@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Move all TC actions identifiers to one placeEli Cohen2019-02-101-3/+40
| | | | | | | | | | | | Move all the TC identifiers to one place, to the same enum that defines the identifier of police action. This makes it easier choose numbers for new actions since they are now defined in one place. We preserve the original values for binary compatibility. New IDs should be added inside the enum. Signed-off-by: Eli Cohen <eli@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: add performance counters for basic filterCong Wang2019-01-201-0/+7
| | | | | | | | | | | | | | | | | | | | | | Similar to u32 filter, it is useful to know how many times we reach each basic filter and how many times we pass the ematch attached to it. Sample output: filter protocol arp pref 49152 basic chain 0 filter protocol arp pref 49152 basic chain 0 handle 0x1 (rule hit 3 success 3) action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 installed 81 sec used 4 sec Action statistics: Sent 126 bytes 3 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: add hit counter for matchallCong Wang2019-01-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Although matchall always matches packets, however, it still relies on a protocol match first. So it is still useful to have such a counter for matchall. Of course, unlike u32, every time we hit a matchall filter, it is always a success, so we don't have to distinguish them. Sample output: filter protocol 802.1Q pref 100 matchall chain 0 filter protocol 802.1Q pref 100 matchall chain 0 handle 0x1 not_in_hw (rule hit 10) action order 1: vlan pop continue index 1 ref 1 bind 1 installed 40 sec used 1 sec Action statistics: Sent 836 bytes 10 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: cls_flower: Classify packets using port rangesAmritha Nambiar2018-11-151-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added support in tc flower for filtering based on port ranges. Example: 1. Match on a port range: ------------------------- $ tc filter add dev enp4s0 protocol ip parent ffff:\ prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\ action drop $ tc -s filter show dev enp4s0 parent ffff: filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 eth_type ipv4 ip_proto tcp dst_port range 20-30 skip_hw not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 85 sec used 3 sec Action statistics: Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0) backlog 0b 0p requeues 0 2. Match on IP address and port range: -------------------------------------- $ tc filter add dev enp4s0 protocol ip parent ffff:\ prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\ skip_hw action drop $ tc -s filter show dev enp4s0 parent ffff: filter protocol ip pref 1 flower chain 0 handle 0x2 eth_type ipv4 ip_proto tcp dst_ip 192.168.1.1 dst_port range 100-200 skip_hw not_in_hw action order 1: gact action drop random type none pass val 0 index 2 ref 1 bind 1 installed 58 sec used 2 sec Action statistics: Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0) backlog 0b 0p requeues 0 v4: 1. Added condition before setting port key. 2. Organized setting and dumping port range keys into functions and added validation of input range. v3: 1. Moved new fields in UAPI enum to the end of enum. 2. Removed couple of empty lines. v2: Addressed Jiri's comments: 1. Added separate functions for dst and src comparisons. 2. Removed endpoint enum. 3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range lookup. 4. Cleaned up fl_lookup function. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: cls_flower: dump offload count valueVlad Buslov2018-09-101-0/+2
| | | | | | | | | | Change flower in_hw_count type to fixed-size u32 and dump it as TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared blocks and re-offload functionality. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: allow flower to match tunnel optionsPieter Jansen van Vuuren2018-08-071-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow matching on options in Geneve tunnel headers. This makes use of existing tunnel metadata support. The options can be described in the form CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is represented as a 16bit hexadecimal value, TYPE as an 8bit hexadecimal value and DATA as a variable length hexadecimal value. e.g. # ip link add name geneve0 type geneve dstport 0 external # tc qdisc add dev geneve0 ingress # tc filter add dev geneve0 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \ ip_proto udp \ action mirred egress redirect dev eth1 This patch adds support for matching Geneve options in the order supplied by the user. This leads to an efficient implementation in the software datapath (and in our opinion hardware datapaths that offload this feature). It is also compatible with Geneve options matching provided by the Open vSwitch kernel datapath which is relevant here as the Flower classifier may be used as a mechanism to program flows into hardware as a form of Open vSwitch datapath offload (sometimes referred to as OVS-TC). The netlink Kernel/Userspace API may be extended, for example by adding a flag, if other matching options are desired, for example matching given options in any order. This would require an implementation in the TC software datapath. And be done in a way that drivers that facilitate offload of the Flower classifier can reject or accept such flows based on hardware datapath capabilities. This approach was discussed and agreed on at Netconf 2017 in Seoul. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: user-space can't set unknown tcfa_action valuesPaolo Abeni2018-07-301-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | Currently, when initializing an action, the user-space can specify and use arbitrary values for the tcfa_action field. If the value is unknown by the kernel, is implicitly threaded as TC_ACT_UNSPEC. This change explicitly checks for unknown values at action creation time, and explicitly convert them to TC_ACT_UNSPEC. No functional changes are introduced, but this will allow introducing tcfa_action values not exposed to user-space in a later patch. Note: we can't use the above to hide TC_ACT_REDIRECT from user-space, as the latter is already part of uAPI. v3 -> v4: - use an helper to check for action validity (JiriP) - emit an extack for invalid actions (JiriP) v4 -> v5: - keep messages on a single line, drop net_warn (Marcelo) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: Support matching on ip tos and ttl for tunnelsOr Gerlitz2018-07-201-0/+5
| | | | | | | | | | Allow users to set rules matching on ipv4 tos and ttl or ipv6 traffic-class and hoplimit of tunnel headers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: flower: Add supprt for matching on QinQ vlan headersJianbo Liu2018-07-071-0/+4
| | | | | | | | | As support dissecting of QinQ inner and outer vlan headers, user can add rules to match on QinQ vlan headers. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sched: cls: enable verbose loggingMarcelo Ricardo Leitner2018-05-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when the rule is not to be exclusively executed by the hardware, extack is not passed along and offloading failures don't get logged. The idea was that hardware failures are okay because the rule will get executed in software then and this way it doesn't confuse unware users. But this is not helpful in case one needs to understand why a certain rule failed to get offloaded. Considering it may have been a temporary failure, like resources exceeded or so, reproducing it later and knowing that it is triggering the same reason may be challenging. The ultimate goal is to improve Open vSwitch debuggability when using flower offloading. This patch adds a new flag to enable verbose logging. With the flag set, extack will be passed to the driver, which will be able to log the error. As the operation itself probably won't fail (not because of this, at least), current iproute will already log it as a Warning. The flag is generic, so it can be reused later. No need to restrict it just for HW offloading. The command line will follow the syntax that tc-ebpf already uses, tc ... [ verbose ] ... , and extend its meaning. For example: # ./tc qdisc add dev p7p1 ingress # ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \ flower verbose \ src_mac ed:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \ src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop Warning: TC offload is disabled on net device. # echo $? 0 # ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \ flower \ src_mac ff:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \ src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop # echo $? 0 Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: Add support to handle first frag as match fieldPieter Jansen van Vuuren2018-03-081-0/+1
| | | | | | | | | | | | | | Allow setting firstfrag as matching option in tc flower classifier. # tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 \ ip_flags firstfrag action mirred egress redirect dev eth1 Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: add em_ipt ematch for calling xtables matchesEyal Birger2018-02-211-1/+2
| | | | | | | | | | | | | | | | | | | | | The commit a new tc ematch for using netfilter xtable matches. This allows early classification as well as mirroning/redirecting traffic based on logic implemented in netfilter extensions. Current supported use case is classification based on the incoming IPSec state used during decpsulation using the 'policy' iptables extension (xt_policy). The module dynamically fetches the netfilter match module and calls it using a fake xt_action_param structure based on validated userspace provided parameters. As the xt_policy match does not access skb->data, no skb modifications are needed on match. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* License cleanup: add SPDX license identifier to uapi header files with no ↵Greg Kroah-Hartman2017-11-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | license Many user space API headers are missing licensing information, which makes it hard for compliance tools to determine the correct license. By default are files without license information under the default license of the kernel, which is GPLV2. Marking them GPLV2 would exclude them from being included in non GPLV2 code, which is obviously not intended. The user space API headers fall under the syscall exception which is in the kernels COPYING file: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work". otherwise syscall usage would not be possible. Update the files which contain no license information with an SPDX license identifier. The chosen identifier is 'GPL-2.0 WITH Linux-syscall-note' which is the officially assigned identifier for the Linux syscall exception. SPDX license identifiers are a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. See the previous patch in this series for the methodology of how this patch was researched. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* bpf: expose prog id for cls_bpf and act_bpfDaniel Borkmann2017-06-211-0/+1
| | | | | | | | | | | In order to be able to retrieve the attached programs from cls_bpf and act_bpf, we need to expose the prog ids via netlink so that an application can later on get an fd based on the id through the BPF_PROG_GET_FD_BY_ID command, and dump related prog info via BPF_OBJ_GET_INFO_BY_FD command for bpf(2). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: introduce a TRAP control actionJiri Pirko2017-06-061-0/+7
| | | | | | | | | | | | | | | | There is need to instruct the HW offloaded path to push certain matched packets to cpu/kernel for further analysis. So this patch introduces a new TRAP control action to TC. For kernel datapath, this action does not make much sense. So with the same logic as in HW, new TRAP behaves similar to STOLEN. The skb is just dropped in the datapath (and virtually ejected to an upper level, which does not exist in case of kernel). Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: add support for matching on ip tos and ttlOr Gerlitz2017-06-051-0/+5
| | | | | | | | | | Benefit from the support of ip header fields dissection and allow users to set rules matching on ipv4 tos and ttl or ipv6 traffic-class and hoplimit. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: flower: add support for matching on tcp flagsJiri Pirko2017-05-241-0/+3
| | | | | | | | Benefit from the support of tcp flags dissection and allow user to insert rules matching on tcp flags. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: add termination action to allow goto chainJiri Pirko2017-05-171-0/+1
| | | | | | | | | | | | Introduce new type of termination action called "goto_chain". This allows user to specify a chain to be processed. This action type is then processed as a return value in tcf_classify loop in similar way as "reclassify" is, only it does not reset to the first filter in chain but rather reset to the first filter of the desired chain. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: add helpers to handle extended actionsJiri Pirko2017-05-021-1/+14
| | | | | | | | | | | | | Jump is now the only one using value action opcode. This is going to change soon. So introduce helpers to work with this. Convert TC_ACT_JUMP. This also fixes the TC_ACT_JUMP check, which is incorrectly done as a bit check, not a value check. Fixes: e0ee84ded796 ("net sched actions: Complete the JUMPX opcode") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* cls_flower: add support for matching MPLS fields (v2)Benjamin LaHaise2017-04-241-0/+5
| | | | | | | | | | | | | | | | | | | Add support to the tc flower classifier to match based on fields in MPLS labels (TTL, Bottom of Stack, TC field, Label). Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Simon Horman <simon.horman@netronome.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Gao Feng <fgao@ikuai8.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: Reflect HW offload statusOr Gerlitz2017-02-171-2/+4
| | | | | | | | | | | | | | | | | | | Currently there is no way of querying whether a filter is offloaded to HW or not when using "both" policy (where none of skip_sw or skip_hw flags are set by user-space). Add two new flags, "in hw" and "not in hw" such that user space can determine if a filter is actually offloaded to hw or not. The "in hw" UAPI semantics was chosen so it's similar to the "skip hw" flag logic. If none of these two flags are set, this signals running over older kernel. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Amir Vadai <amir@vadai.me> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net sched actions: Add support for user cookiesJamal Hadi Salim2017-01-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce optional 128-bit action cookie. Like all other cookie schemes in the networking world (eg in protocols like http or existing kernel fib protocol field, etc) the idea is to save user state that when retrieved serves as a correlator. The kernel _should not_ intepret it. The user can store whatever they wish in the 128 bits. Sample exercise(showing variable length use of cookie) .. create an accept action with cookie a1b2c3d4 sudo $TC actions add action ok index 1 cookie a1b2c3d4 .. dump all gact actions.. sudo $TC -s actions ls action gact action order 0: gact action pass random type none pass val 0 index 1 ref 1 bind 0 installed 5 sec used 5 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie a1b2c3d4 .. bind the accept action to a filter.. sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \ u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1 ... send some traffic.. $ ping 127.0.0.1 -c 3 PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms 64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-01-171-1/+1
|\
| * bpf: rework prog_digest into prog_tagDaniel Borkmann2017-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") was recently discussed, partially due to admittedly suboptimal name of "prog_digest" in combination with sha1 hash usage, thus inevitably and rightfully concerns about its security in terms of collision resistance were raised with regards to use-cases. The intended use cases are for debugging resp. introspection only for providing a stable "tag" over the instruction sequence that both kernel and user space can calculate independently. It's not usable at all for making a security relevant decision. So collisions where two different instruction sequences generate the same tag can happen, but ideally at a rather low rate. The "tag" will be dumped in hex and is short enough to introspect in tracepoints or kallsyms output along with other data such as stack trace, etc. Thus, this patch performs a rename into prog_tag and truncates the tag to a short output (64 bits) to make it obvious it's not collision-free. Should in future a hash or facility be needed with a security relevant focus, then we can think about requirements, constraints, etc that would fit to that situation. For now, rework the exposed parts for the current use cases as long as nothing has been released yet. Tested on x86_64 and s390x. Fixes: 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: cls_flower: Support matching on ARPSimon Horman2017-01-111-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support matching on ARP operation, and hardware and protocol addresses for Ethernet hardware and IPv4 protocol addresses. Example usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol arp parent ffff: flower indev eth0 \ arp_op request arp_sip 10.0.0.1 action drop tc filter add dev eth0 protocol rarp parent ffff: flower indev eth0 \ arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net-tc: convert tc_from to tc_from_ingress and tc_redirectedWillem de Bruijn2017-01-091-6/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tc_from field fulfills two roles. It encodes whether a packet was redirected by an act_mirred device and, if so, whether act_mirred was called on ingress or egress. Split it into separate fields. The information is needed by the special IFB loop, where packets are taken out of the normal path by act_mirred, forwarded to IFB, then reinjected at their original location (ingress or egress) by IFB. The IFB device cannot use skb->tc_at_ingress, because that may have been overwritten as the packet travels from act_mirred to ifb_xmit, when it passes through tc_classify on the IFB egress path. Cache this value in skb->tc_from_ingress. That field is valid only if a packet arriving at ifb_xmit came from act_mirred. Other packets can be crafted to reach ifb_xmit. These must be dropped. Set tc_redirected on redirection and drop all packets that do not have this bit set. Both fields are set only on cloned skbs in tc actions, so original packet sources do not have to clear the bit when reusing packets (notably, pktgen and octeon). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net-tc: convert tc_verd to integer bitfieldsWillem de Bruijn2017-01-091-31/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Extract the remaining two fields from tc_verd and remove the __u16 completely. TC_AT and TC_FROM are converted to equivalent two-bit integer fields tc_at and tc_from. Where possible, use existing helper skb_at_tc_ingress when reading tc_at. Introduce helper skb_reset_tc to clear fields. Not documenting tc_from and tc_at, because they will be replaced with single bit fields in follow-on patches. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net-tc: extract skip classify bit from tc_verdWillem de Bruijn2017-01-091-6/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Packets sent by the IFB device skip subsequent tc classification. A single bit governs this state. Move it out of tc_verd in anticipation of removing that __u16 completely. The new bitfield tc_skip_classify temporarily uses one bit of a hole, until tc_verd is removed completely in a follow-up patch. Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long. With that many options, little value in documenting it. Introduce a helper function to deduplicate the logic in the two sites that check this bit. The field tc_skip_classify is set only in IFB on skbs cloned in act_mirred, so original packet sources do not have to clear the bit when reusing packets (notably, pktgen and octeon). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net-tc: make MAX_RECLASSIFY_LOOP localWillem de Bruijn2017-01-091-5/+0Star
| | | | | | | | | | | | | | | | This field is no longer kept in tc_verd. Remove it from the global definition of that struct. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net-tc: remove unused tc_verd fieldsWillem de Bruijn2017-01-091-7/+0Star
|/ | | | | | | | Remove the last reference to tc_verd's munge and redirect ttl bits. These fields are no longer used. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: Support matching on ICMP type and codeSimon Horman2016-12-081-0/+10
| | | | | | | | | | | | | | | | | | Support matching on ICMP type and code. Example usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent ffff: flower \ indev eth0 ip_proto icmp type 8 code 0 action drop tc filter add dev eth0 protocol ipv6 parent ffff: flower \ indev eth0 ip_proto icmpv6 type 128 code 0 action drop Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: Add support for matching on flagsOr Gerlitz2016-12-081-0/+7
| | | | | | | | | | | | | Add UAPI to provide set of flags for matching, where the flags provided from user-space are mapped to flow-dissector flags. The 1st flag allows to match on whether the packet is an IP fragment and corresponds to the FLOW_DIS_IS_FRAGMENT flag. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf: add prog_digest and expose it via fdinfo/netlinkDaniel Borkmann2016-12-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When loading a BPF program via bpf(2), calculate the digest over the program's instruction stream and store it in struct bpf_prog's digest member. This is done at a point in time before any instructions are rewritten by the verifier. Any unstable map file descriptor number part of the imm field will be zeroed for the hash. fdinfo example output for progs: # cat /proc/1590/fdinfo/5 pos: 0 flags: 02000002 mnt_id: 11 prog_type: 1 prog_jited: 1 prog_digest: b27e8b06da22707513aa97363dfb11c7c3675d28 memlock: 4096 When programs are pinned and retrieved by an ELF loader, the loader can check the program's digest through fdinfo and compare it against one that was generated over the ELF file's program section to see if the program needs to be reloaded. Furthermore, this can also be exposed through other means such as netlink in case of a tc cls/act dump (or xdp in future), but also through tracepoints or other facilities to identify the program. Other than that, the digest can also serve as a base name for the work in progress kallsyms support of programs. The digest doesn't depend/select the crypto layer, since we need to keep dependencies to a minimum. iproute2 will get support for this facility. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: Add UDP port to tunnel parametersHadar Hen Zion2016-11-091-0/+5
| | | | | | | | | | The current IP tunneling classification supports only IP addresses and key. Enhance UDP based IP tunneling classification parameters by adding UDP src and dst port. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: Support matching on SCTP portsSimon Horman2016-11-031-0/+5
| | | | | | | | | | | | | | | | | Support matching on SCTP ports in the same way that matching on TCP and UDP ports is already supported. Example usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 ip_proto sctp dst_port 80 \ action drop Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cls_bpf: limit hardware offload by software-only flagJakub Kicinski2016-09-221-0/+1
| | | | | | | | | | | | | Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag. Unlike U32 and flower cls_bpf already has some netlink flags defined. Create a new attribute to be able to use the same flag values as the above. Unlike U32 and flower reject unknown flags. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: Specify vlan attributes format in the UAPI headerOr Gerlitz2016-09-161-3/+3
| | | | | | | | Specify the format (size and endianess) for the vlan attributes. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: Support masking for matching on tcp/udp portsOr Gerlitz2016-09-161-0/+4
| | | | | | | | | | Add the definitions for src/dst udp/tcp port masks and use them when setting && dumping the relevant keys. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_flower: Classify packet in ip tunnelsAmir Vadai2016-09-111-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce classifying by metadata extracted by the tunnel device. Outer header fields - source/dest ip and tunnel id, are extracted from the metadata when classifying. For example, the following will add a filter on the ingress Qdisc of shared vxlan device named 'vxlan0'. To forward packets with outer src ip 11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be forwarded to tap device 'vnet0' (after metadata is released): $ tc filter add dev vxlan0 protocol ip parent ffff: \ flower \ enc_src_ip 11.11.0.2 \ enc_dst_ip 11.11.0.1 \ enc_key_id 11 \ dst_ip 11.11.11.1 \ action tunnel_key release \ action mirred egress redirect dev vnet0 The action tunnel_key, will be introduced in the next patch in this series. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: flower: Add vlan supportHadar Hen Zion2016-08-191-0/+3
| | | | | | | | | | | | | | | | | | | | Enhance flower to support 802.1Q vlan protocol classification. Currently, the supported fields are vlan_id and vlan_priority. Example: # add a flower filter with vlan id and priority classification tc filter add dev ens4f0 protocol 802.1Q parent ffff: \ flower \ indev ens4f0 \ vlan_ethtype ipv4 \ vlan_id 100 \ vlan_prio 3 \ action vlan pop Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: Add match-all classifier hw offloading.Yotam Gigi2016-07-251-0/+1
| | | | | | | | | | | | | | | | | | Following the work that have been done on offloading classifiers like u32 and flower, now the match-all classifier hw offloading is possible. if the interface supports tc offloading. To control the offloading, two tc flags have been introduced: skip_sw and skip_hw. Typical usage: tc filter add dev eth25 parent ffff: \ matchall skip_sw \ action mirred egress mirror \ dev eth27 Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: introduce Match-all classifierJiri Pirko2016-07-251-0/+11
| | | | | | | | | | | The matchall classifier matches every packet and allows the user to apply actions on it. This filter is very useful in usecases where every packet should be matched, for example, packet mirroring (SPAN) can be setup very easily using that filter. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net sched: indentation and other OCD stylistic fixesJamal Hadi Salim2016-06-081-3/+3
| | | | | Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
* net sched actions: introduce timestamp for firsttime useJamal Hadi Salim2016-06-081-0/+1
| | | | | | | | Useful to know when the action was first used for accounting (and debugging) Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>