summaryrefslogtreecommitdiffstats
path: root/include/net/netfilter/nf_tables.h
Commit message (Collapse)AuthorAgeFilesLines
* net: flow_offload: add flow_block structure and use itPablo Neira Ayuso2019-07-201-2/+3
| | | | | | | | | | | | This object stores the flow block callbacks that are attached to this block. Update flow_block_cb_lookup() to take this new object. This patch restores the block sharing feature. Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: nf_tables: add hardware offload supportPablo Neira Ayuso2019-07-091-0/+14
| | | | | | | | | | | | | | | | | | | | | This patch adds hardware offload support for nftables through the existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER classifier and the flow rule API. This hardware offload support is available for the NFPROTO_NETDEV family and the ingress hook. Each nftables expression has a new ->offload interface, that is used to populate the flow rule object that is attached to the transaction object. There is a new per-table NFT_TABLE_F_HW flag, that is set on to offload an entire table, including all of its chains. This patch supports for basic metadata (layer 3 and 4 protocol numbers), 5-tuple payload matching and the accept/drop actions; this also includes basechain hardware offload only. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: nf_tables: enable set expiration time for set elementsLaura Garcia Liebana2019-06-191-1/+1
| | | | | | | | | | | | | | | | | Currently, the expiration of every element in a set or map is a read-only parameter generated at kernel side. This change will permit to set a certain expiration date per element that will be required, for example, during stateful replication among several nodes. This patch handles the NFTA_SET_ELEM_EXPIRATION in order to configure the expiration parameter per element, or will use the timeout in the case that the expiration is not set. Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: drop include of module.h from nf_tables.hPaul Gortmaker2019-04-301-1/+2
| | | | | | | | | | | | | | | | | | Ideally, header files under include/linux shouldn't be adding includes of other headers, in anticipation of their consumers, but just the headers needed for the header itself to pass parsing with CPP. The module.h is particularly bad in this sense, as it itself does include a whole bunch of other headers, due to the complexity of module support. Since nf_tables.h is not going into a module struct looking for specific fields, we can just let it know that module is a struct, just like about 60 other include/linux headers already do. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: relocate header content to consumerPaul Gortmaker2019-04-301-17/+0Star
| | | | | | | | | | | | | | | | | | | | | The nf_tables.h header is used in a lot of files, but it turns out that there is only one actual user of nft_expr_clone(). Hence we relocate that function to be with the one consumer of it and avoid having to process it with CPP for all the other files. This will also enable a reduction in the other headers that the nf_tables.h itself has to include just to be stand-alone, hence a pending further significant reduction in the CPP content that needs to get processed for each netfilter file. Note that the explicit "inline" has been dropped as part of this relocation. In similar changes to this, I believe Dave has asked this be done, so we free up gcc to make the choice of whether to inline or not. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: make two functions staticFlorian Westphal2019-04-081-2/+0Star
| | | | | | | They have no external callers anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: merge route type into coreFlorian Westphal2019-04-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | very little code, so it really doesn't make sense to have extra modules or even a kconfig knob for this. Merge them and make functionality available unconditionally. The merge makes inet family route support trivial, so add it as well here. Before: text data bss dec hex filename 835 832 0 1667 683 nft_chain_route_ipv4.ko 870 832 0 1702 6a6 nft_chain_route_ipv6.ko 111568 2556 529 114653 1bfdd nf_tables.ko After: text data bss dec hex filename 113133 2556 529 116218 1c5fa nf_tables.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: bogus EBUSY when deleting set after flushPablo Neira Ayuso2019-03-111-0/+6
| | | | | | | | | | Set deletion after flush coming in the same batch results in EBUSY. Add set use counter to track the number of references to this set from rules. We cannot rely on the list of bindings for this since such list is still populated from the preparation phase. Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: fix set double-free in abort pathPablo Neira Ayuso2019-03-081-4/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The abort path can cause a double-free of an anonymous set. Added-and-to-be-aborted rule looks like this: udp dport { 137, 138 } drop The to-be-aborted transaction list looks like this: newset newsetelem newsetelem rule This gets walked in reverse order, so first pass disables the rule, the set elements, then the set. After synchronize_rcu(), we then destroy those in same order: rule, set element, set element, newset. Problem is that the anonymous set has already been bound to the rule, so the rule (lookup expression destructor) already frees the set, when then cause use-after-free when trying to delete the elements from this set, then try to free the set again when handling the newset expression. Rule releases the bound set in first place from the abort path, this causes the use-after-free on set element removal when undoing the new element transactions. To handle this, skip new element transaction if set is bound from the abort path. This is still causes the use-after-free on set element removal. To handle this, remove transaction from the list when the set is already bound. Joint work with Florian Westphal. Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path") Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1325 Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nft_compat: use .release_ops and remove list of extensionPablo Neira Ayuso2019-02-271-0/+3
| | | | | | | | | | | | | | Add .release_ops, that is called in case of error at a later stage in the expression initialization path, ie. .select_ops() has been already set up operations and that needs to be undone. This allows us to unwind .select_ops from the error path, ie. release the dynamic operations for this extension. Moreover, allocate one single operation instead of recycling them, this comes at the cost of consuming a bit more memory per rule, but it simplifies the infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-02-091-4/+13
|\ | | | | | | | | | | | | | | | | | | An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_tables: unbind set in rule from commit pathPablo Neira Ayuso2019-02-041-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Anonymous sets that are bound to rules from the same transaction trigger a kernel splat from the abort path due to double set list removal and double free. This patch updates the logic to search for the transaction that is responsible for creating the set and disable the set list removal and release, given the rule is now responsible for this. Lookup is reverse since the transaction that adds the set is likely to be at the tail of the list. Moreover, this patch adds the unbind step to deliver the event from the commit path. This should not be done from the worker thread, since we have no guarantees of in-order delivery to the listener. This patch removes the assumption that both activate and deactivate callbacks need to be provided. Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Mikhail Morfikov <mmorfikov@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: handle nft_object lookups via rhltableFlorian Westphal2019-01-181-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of linear search, use rhlist interface to look up the objects. This fixes rulesets with thousands of named objects (quota, counters and the like). We only use a single table for this and consider the address of the table we're doing the lookup in as a part of the key. This reduces restore time of a sample ruleset with ~20k named counters from 37 seconds to 0.8 seconds. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: prepare nft_object for lookups via hashtableFlorian Westphal2019-01-181-5/+14
|/ | | | | | | | Add a 'key' structure for object, so we can look them up by name + table combination (the name can be the same in each table). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: asynchronous releaseFlorian Westphal2018-09-171-0/+2
| | | | | | | | | | | | Release the committed transaction log from a work queue, moving expensive synchronize_rcu out of the locked section and providing opportunity to batch this. On my test machine this cuts runtime of nft-test.py in half. Based on earlier patch from Pablo Neira Ayuso. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: split set destruction in deactivate and destroy phaseFlorian Westphal2018-09-171-1/+6
| | | | | | | | | | | | | | | | | Splits unbind_set into destroy_set and unbinding operation. Unbinding removes set from lists (so new transaction would not find it anymore) but keeps memory allocated (so packet path continues to work). Rebind function is added to allow unrolling in case transaction that wants to remove set is aborted. Destroy function is added to free the memory, but this could occur outside of transaction in the future. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: fix register orderingFlorian Westphal2018-08-161-1/+1
| | | | | | | | | | We must register nfnetlink ops last, as that exposes nf_tables to userspace. Without this, we could theoretically get nfnetlink request before net->nft state has been initialized. Fixes: 99633ab29b213 ("netfilter: nf_tables: complete net namespace support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nft_set: fix allocation size overflow in privsize callback.Taehee Yoo2018-08-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to determine allocation size of set, ->privsize is invoked. At this point, both desc->size and size of each data structure of set are used. desc->size means number of element that is given by user. desc->size is u32 type. so that upperlimit of set element is 4294967295. but return type of ->privsize is also u32. hence overflow can occurred. test commands: %nft add table ip filter %nft add set ip filter hash1 { type ipv4_addr \; size 4294967295 \; } %nft list ruleset splat looks like: [ 1239.202910] kasan: CONFIG_KASAN_INLINE enabled [ 1239.208788] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 1239.217625] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 1239.219329] CPU: 0 PID: 1603 Comm: nft Not tainted 4.18.0-rc5+ #7 [ 1239.229091] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set] [ 1239.229091] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16 [ 1239.229091] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246 [ 1239.229091] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001 [ 1239.229091] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410 [ 1239.229091] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030 [ 1239.229091] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0 [ 1239.229091] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000 [ 1239.229091] FS: 00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 1239.229091] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1239.229091] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0 [ 1239.229091] Call Trace: [ 1239.229091] ? nft_hash_remove+0xf0/0xf0 [nf_tables_set] [ 1239.229091] ? memset+0x1f/0x40 [ 1239.229091] ? __nla_reserve+0x9f/0xb0 [ 1239.229091] ? memcpy+0x34/0x50 [ 1239.229091] nf_tables_dump_set+0x9a1/0xda0 [nf_tables] [ 1239.229091] ? __kmalloc_reserve.isra.29+0x2e/0xa0 [ 1239.229091] ? nft_chain_hash_obj+0x630/0x630 [nf_tables] [ 1239.229091] ? nf_tables_commit+0x2c60/0x2c60 [nf_tables] [ 1239.229091] netlink_dump+0x470/0xa20 [ 1239.229091] __netlink_dump_start+0x5ae/0x690 [ 1239.229091] nft_netlink_dump_start_rcu+0xd1/0x160 [nf_tables] [ 1239.229091] nf_tables_getsetelem+0x2e5/0x4b0 [nf_tables] [ 1239.229091] ? nft_get_set_elem+0x440/0x440 [nf_tables] [ 1239.229091] ? nft_chain_hash_obj+0x630/0x630 [nf_tables] [ 1239.229091] ? nf_tables_dump_obj_done+0x70/0x70 [nf_tables] [ 1239.229091] ? nla_parse+0xab/0x230 [ 1239.229091] ? nft_get_set_elem+0x440/0x440 [nf_tables] [ 1239.229091] nfnetlink_rcv_msg+0x7f0/0xab0 [nfnetlink] [ 1239.229091] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink] [ 1239.229091] ? debug_show_all_locks+0x290/0x290 [ 1239.229091] ? sched_clock_cpu+0x132/0x170 [ 1239.229091] ? find_held_lock+0x39/0x1b0 [ 1239.229091] ? sched_clock_local+0x10d/0x130 [ 1239.229091] netlink_rcv_skb+0x211/0x320 [ 1239.229091] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink] [ 1239.229091] ? netlink_ack+0x7b0/0x7b0 [ 1239.229091] ? ns_capable_common+0x6e/0x110 [ 1239.229091] nfnetlink_rcv+0x2d1/0x310 [nfnetlink] [ 1239.229091] ? nfnetlink_rcv_batch+0x10f0/0x10f0 [nfnetlink] [ 1239.229091] ? netlink_deliver_tap+0x829/0x930 [ 1239.229091] ? lock_acquire+0x265/0x2e0 [ 1239.229091] netlink_unicast+0x406/0x520 [ 1239.509725] ? netlink_attachskb+0x5b0/0x5b0 [ 1239.509725] ? find_held_lock+0x39/0x1b0 [ 1239.509725] netlink_sendmsg+0x987/0xa20 [ 1239.509725] ? netlink_unicast+0x520/0x520 [ 1239.509725] ? _copy_from_user+0xa9/0xc0 [ 1239.509725] __sys_sendto+0x21a/0x2c0 [ 1239.509725] ? __ia32_sys_getpeername+0xa0/0xa0 [ 1239.509725] ? retint_kernel+0x10/0x10 [ 1239.509725] ? sched_clock_cpu+0x132/0x170 [ 1239.509725] ? find_held_lock+0x39/0x1b0 [ 1239.509725] ? lock_downgrade+0x540/0x540 [ 1239.509725] ? up_read+0x1c/0x100 [ 1239.509725] ? __do_page_fault+0x763/0x970 [ 1239.509725] ? retint_user+0x18/0x18 [ 1239.509725] __x64_sys_sendto+0x177/0x180 [ 1239.509725] do_syscall_64+0xaa/0x360 [ 1239.509725] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1239.509725] RIP: 0033:0x7f5a8f468e03 [ 1239.509725] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb d0 0f 1f 84 00 00 00 00 00 83 3d 49 c9 2b 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 [ 1239.509725] RSP: 002b:00007ffd78d0b778 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 1239.509725] RAX: ffffffffffffffda RBX: 00007ffd78d0c890 RCX: 00007f5a8f468e03 [ 1239.509725] RDX: 0000000000000034 RSI: 00007ffd78d0b7e0 RDI: 0000000000000003 [ 1239.509725] RBP: 00007ffd78d0b7d0 R08: 00007f5a8f15c160 R09: 000000000000000c [ 1239.509725] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd78d0b7e0 [ 1239.509725] R13: 0000000000000034 R14: 00007f5a8f9aff60 R15: 00005648040094b0 [ 1239.509725] Modules linked in: nf_tables_set nf_tables nfnetlink ip_tables x_tables [ 1239.670713] ---[ end trace 39375adcda140f11 ]--- [ 1239.676016] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set] [ 1239.682834] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16 [ 1239.705108] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246 [ 1239.711115] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001 [ 1239.719269] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410 [ 1239.727401] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030 [ 1239.735530] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0 [ 1239.743658] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000 [ 1239.751785] FS: 00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 1239.760993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1239.767560] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0 [ 1239.775679] Kernel panic - not syncing: Fatal exception [ 1239.776630] Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 1239.776630] Rebooting in 5 seconds.. Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: use dev->name directlyFlorian Westphal2018-07-201-1/+0Star
| | | | | | | | | | | | | | | no need to store the name in separate area. Furthermore, it uses kmalloc but not kfree and most accesses seem to treat it as char[IFNAMSIZ] not char *. Remove this and use dev->name instead. In case event zeroed dev, just omit the name in the dump. Fixes: d92191aa84e5f1 ("netfilter: nf_tables: cache device name in flowtable object") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: fix jumpstack depth validationTaehee Yoo2018-07-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The level of struct nft_ctx is updated by nf_tables_check_loops(). That is used to validate jumpstack depth. But jumpstack validation routine doesn't update and validate recursively. So, in some cases, chain depth can be bigger than the NFT_JUMP_STACK_SIZE. After this patch, The jumpstack validation routine is located in the nft_chain_validate(). When new rules or new set elements are added, the nft_table_validate() is called by the nf_tables_newrule and the nf_tables_newsetelem. The nft_table_validate() calls the nft_chain_validate() that visit all their children chains recursively. So it can update depth of chain certainly. Reproducer: %cat ./test.sh #!/bin/bash nft add table ip filter nft add chain ip filter input { type filter hook input priority 0\; } for ((i=0;i<20;i++)); do nft add chain ip filter a$i done nft add rule ip filter input jump a1 for ((i=0;i<10;i++)); do nft add rule ip filter a$i jump a$((i+1)) done for ((i=11;i<19;i++)); do nft add rule ip filter a$i jump a$((i+1)) done nft add rule ip filter a10 jump a11 Result: [ 253.931782] WARNING: CPU: 1 PID: 0 at net/netfilter/nf_tables_core.c:186 nft_do_chain+0xacc/0xdf0 [nf_tables] [ 253.931915] Modules linked in: nf_tables nfnetlink ip_tables x_tables [ 253.932153] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #48 [ 253.932153] RIP: 0010:nft_do_chain+0xacc/0xdf0 [nf_tables] [ 253.932153] Code: 83 f8 fb 0f 84 c7 00 00 00 e9 d0 00 00 00 83 f8 fd 74 0e 83 f8 ff 0f 84 b4 00 00 00 e9 bd 00 00 00 83 bd 64 fd ff ff 0f 76 09 <0f> 0b 31 c0 e9 bc 02 00 00 44 8b ad 64 fd [ 253.933807] RSP: 0018:ffff88011b807570 EFLAGS: 00010212 [ 253.933807] RAX: 00000000fffffffd RBX: ffff88011b807660 RCX: 0000000000000000 [ 253.933807] RDX: 0000000000000010 RSI: ffff880112b39d78 RDI: ffff88011b807670 [ 253.933807] RBP: ffff88011b807850 R08: ffffed0023700ece R09: ffffed0023700ecd [ 253.933807] R10: ffff88011b80766f R11: ffffed0023700ece R12: ffff88011b807898 [ 253.933807] R13: ffff880112b39d80 R14: ffff880112b39d60 R15: dffffc0000000000 [ 253.933807] FS: 0000000000000000(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000 [ 253.933807] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 253.933807] CR2: 00000000014f1008 CR3: 000000006b216000 CR4: 00000000001006e0 [ 253.933807] Call Trace: [ 253.933807] <IRQ> [ 253.933807] ? sched_clock_cpu+0x132/0x170 [ 253.933807] ? __nft_trace_packet+0x180/0x180 [nf_tables] [ 253.933807] ? sched_clock_cpu+0x132/0x170 [ 253.933807] ? debug_show_all_locks+0x290/0x290 [ 253.933807] ? __lock_acquire+0x4835/0x4af0 [ 253.933807] ? inet_ehash_locks_alloc+0x1a0/0x1a0 [ 253.933807] ? unwind_next_frame+0x159e/0x1840 [ 253.933807] ? __read_once_size_nocheck.constprop.4+0x5/0x10 [ 253.933807] ? nft_do_chain_ipv4+0x197/0x1e0 [nf_tables] [ 253.933807] ? nft_do_chain+0x5/0xdf0 [nf_tables] [ 253.933807] nft_do_chain_ipv4+0x197/0x1e0 [nf_tables] [ 253.933807] ? nft_do_chain_arp+0xb0/0xb0 [nf_tables] [ 253.933807] ? __lock_is_held+0x9d/0x130 [ 253.933807] nf_hook_slow+0xc4/0x150 [ 253.933807] ip_local_deliver+0x28b/0x380 [ 253.933807] ? ip_call_ra_chain+0x3e0/0x3e0 [ 253.933807] ? ip_rcv_finish+0x1610/0x1610 [ 253.933807] ip_rcv+0xbcc/0xcc0 [ 253.933807] ? debug_show_all_locks+0x290/0x290 [ 253.933807] ? ip_local_deliver+0x380/0x380 [ 253.933807] ? __lock_is_held+0x9d/0x130 [ 253.933807] ? ip_local_deliver+0x380/0x380 [ 253.933807] __netif_receive_skb_core+0x1c9c/0x2240 Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: handle chain name lookups via rhltableFlorian Westphal2018-06-031-1/+6
| | | | | | | | | | | | | | | | | | If there is a significant amount of chains list search is too slow, so add an rhlist table for this. This speeds up ruleset loading: for every new rule we have to check if the name already exists in current generation. We need to be able to cope with duplicate chain names in case a transaction drops the nfnl mutex (for request_module) and the abort of this old transaction is still pending. The list is kept -- we need a way to iterate chains even if hash resize is in progress without missing an entry. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: add destroy_clone expressionPablo Neira Ayuso2018-06-031-0/+2
| | | | | | | | | | | | Before this patch, cloned expressions are released via ->destroy. This is a problem for the new connlimit expression since the ->destroy path drop a reference on the conntrack modules and it unregisters hooks. The new ->destroy_clone provides context that this expression is being released from the packet path, so it is mirroring ->clone(), where neither module reference is dropped nor hooks need to be unregistered - because this done from the control plane path from the ->init() path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: garbage collection for stateful expressionsPablo Neira Ayuso2018-06-031-0/+4
| | | | | | | | | | | | Use garbage collector to schedule removal of elements based of feedback from expression that this element comes with. Therefore, the garbage collector is not guided by timeout expirations in this new mode. The new connlimit expression sets on the NFT_EXPR_GC flag to enable this behaviour, the dynset expression needs to explicitly enable the garbage collector via set->ops->gc_init call. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: pass ctx to nf_tables_expr_destroy()Pablo Neira Ayuso2018-06-031-0/+4
| | | | | | | | | nft_set_elem_destroy() can be called from call_rcu context. Annotate netns and table in set object so we can populate the context object. Moreover, pass context object to nf_tables_set_elem_destroy() from the commit phase, since it is already available from there. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: pass context to object destroy indirectionPablo Neira Ayuso2018-06-031-1/+2
| | | | | | | The new connlimit object needs this to properly deal with conntrack dependencies. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: fix chain dependency validationPablo Neira Ayuso2018-06-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following ruleset: add table ip filter add chain ip filter input { type filter hook input priority 4; } add chain ip filter ap add rule ip filter input jump ap add rule ip filter ap masquerade results in a panic, because the masquerade extension should be rejected from the filter chain. The existing validation is missing a chain dependency check when the rule is added to the non-base chain. This patch fixes the problem by walking down the rules from the basechains, searching for either immediate or lookup expressions, then jumping to non-base chains and again walking down the rules to perform the expression validation, so we make sure the full ruleset graph is validated. This is done only once from the commit phase, in case of problem, we abort the transaction and perform fine grain validation for error reporting. This patch requires 003087911af2 ("netfilter: nfnetlink: allow commit to fail") to achieve this behaviour. This patch also adds a cleanup callback to nfnl batch interface to reset the validate state from the exit path. As a result of this patch, nf_tables_check_loops() doesn't use ->validate to check for loops, instead it just checks for immediate expressions. Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: remove synchronize_rcu in commit phaseFlorian Westphal2018-05-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | synchronize_rcu() is expensive. The commit phase currently enforces an unconditional synchronize_rcu() after incrementing the generation counter. This is to make sure that a packet always sees a consistent chain, either nft_do_chain is still using old generation (it will skip the newly added rules), or the new one (it will skip old ones that might still be linked into the list). We could just remove the synchronize_rcu(), it would not cause a crash but it could cause us to evaluate a rule that was removed and new rule for the same packet, instead of either-or. To resolve this, add rule pointer array holding two generations, the current one and the future generation. In commit phase, allocate the rule blob and populate it with the rules that will be active in the new generation. Then, make this rule blob public, replacing the old generation pointer. Then the generation counter can be incremented. nft_do_chain() will either continue to use the current generation (in case loop was invoked right before increment), or the new one. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2018-05-231-4/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal. 2) Add support for map lookups to numgen, random and hash expressions, from Laura Garcia. 3) Allow to register nat hooks for iptables and nftables at the same time. Patchset from Florian Westpha. 4) Timeout support for rbtree sets. 5) ip6_rpfilter works needs interface for link-local addresses, from Vincent Bernat. 6) Add nf_ct_hook and nf_nat_hook structures and use them. 7) Do not drop packets on packets raceing to insert conntrack entries into hashes, this is particularly a problem in nfqueue setups. 8) Address fallout from xt_osf separation to nf_osf, patches from Florian Westphal and Fernando Mancera. 9) Remove reference to struct nft_af_info, which doesn't exist anymore. From Taehee Yoo. This batch comes with is a conflict between 25fd386e0bc0 ("netfilter: core: add missing __rcu annotation") in your tree and 2c205dd3981f ("netfilter: add struct nf_nat_hook and use it") coming in this batch. This conflict can be solved by leaving the __rcu tag on __netfilter_net_init() - added by 25fd386e0bc0 - and remove all code related to nf_nat_decode_session_hook - which is gone after 2c205dd3981f, as described by: diff --cc net/netfilter/core.c index e0ae4aae96f5,206fb2c4c319..168af54db975 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo EXPORT_SYMBOL_GPL(nf_ct_zone_dflt); #endif /* CONFIG_NF_CONNTRACK */ - static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max) -#ifdef CONFIG_NF_NAT_NEEDED -void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *); -EXPORT_SYMBOL(nf_nat_decode_session_hook); -#endif - + static void __net_init + __netfilter_net_init(struct nf_hook_entries __rcu **e, int max) { int h; I can also merge your net-next tree into nf-next, solve the conflict and resend the pull request if you prefer so. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_tables: allow chain type to override hook registerFlorian Westphal2018-05-231-4/+4
| | | | | | | | | | | | | | | | Will be used in followup patch when nat types no longer use nf_register_net_hook() but will instead register with the nat core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-05-211-0/+5
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | S390 bpf_jit.S is removed in net-next and had changes in 'net', since that code isn't used any more take the removal. TLS data structures split the TX and RX components in 'net-next', put the new struct members from the bug fix in 'net' into the RX part. The 'net-next' tree had some reworking of how the ERSPAN code works in the GRE tunneling code, overlapping with a one-line headroom calculation fix in 'net'. Overlapping changes in __sock_map_ctx_update_elem(), keep the bits that read the prog members via READ_ONCE() into local variables before using them. Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_tables: bogus EBUSY in chain deletionsPablo Neira Ayuso2018-05-091-0/+5
| | | | | | | | | | | | | | | | | | When removing a rule that jumps to chain and such chain in the same batch, this bogusly hits EBUSY. Add activate and deactivate operations to expression that can be called from the preparation and the commit/abort phases. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: support timeouts larger than 23 daysFlorian Westphal2018-04-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Marco De Benedetto says: I would like to use a timeout of 30 days for elements in a set but it seems there is a some kind of problem above 24d20h31m23s. Fix this by using 'jiffies64' for timeout handling to get same behaviour on 32 and 64bit systems. nftables passes timeouts as u64 in milliseconds to the kernel, but on kernel side we used a mixture of 'long' and jiffies conversions rather than u64 and jiffies64. Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1237 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: Simplify set backend selectionPhil Sutter2018-04-241-19/+15Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop nft_set_type's ability to act as a container of multiple backend implementations it chooses from. Instead consolidate the whole selection logic in nft_select_set_ops() and the actual backend provided estimate() callback. This turns nf_tables_set_types into a list containing all available backends which is traversed when selecting one matching userspace requested criteria. Also, this change allows to embed nft_set_ops structure into nft_set_type and pull flags field into the latter as it's only used during selection phase. A crucial part of this change is to make sure the new layout respects hash backend constraints formerly enforced by nft_hash_select_ops() function: This is achieved by introduction of a specific estimate() callback for nft_hash_fast_ops which returns false for key lengths != 4. In turn, nft_hash_estimate() is changed to return false for key lengths == 4 so it won't be chosen by accident. Also, both callbacks must return false for unbounded sets as their size estimate depends on a known maximum element count. Note that this patch partially reverts commit 4f2921ca21b71 ("netfilter: nf_tables: meter: pick a set backend that supports updates") by making nft_set_ops_candidate() not explicitly look for an update callback but make NFT_SET_EVAL a regular backend feature flag which is checked along with the others. This way all feature requirements are checked in one go. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: simplify lookup functionsPablo Neira Ayuso2018-04-241-6/+6
| | | | | | | | | | | | | | | | Replace the nf_tables_ prefix by nft_ and merge code into single lookup function whenever possible. In many cases we go over the 80-chars boundary function names, this save us ~50 LoC. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_flow_table: track flow tables in nf_flow_table directlyFelix Fietkau2018-04-241-3/+0Star
|/ | | | | | | | Avoids having nf_flow_table depend on nftables (useful for future iptables backport work) Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-04-021-0/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_tables: cache device name in flowtable objectPablo Neira Ayuso2018-03-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | Devices going away have to grab the nfnl_lock from the netdev event path to avoid races with control plane updates. However, netlink dumps in netfilter do not hold nfnl_lock mutex. Cache the device name into the objects to avoid an use-after-free situation for a device that is going away. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: rename to nft_set_lookup_global()Pablo Neira Ayuso2018-03-301-5/+5
| | | | | | | | | | | | To prepare shorter introduction of shorter function prefix. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: enable conntrack if NAT chain is registeredPablo Neira Ayuso2018-03-301-0/+4
| | | | | | | | | | | | | | | | | | | | | | Register conntrack hooks if the user adds NAT chains. Users get confused with the existing behaviour since they will see no packets hitting this chain until they add the first rule that refers to conntrack. This patch adds new ->init() and ->free() indirections to chain types that can be used by NAT chains to invoke the conntrack dependency. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: build-in filter chain typePablo Neira Ayuso2018-03-301-0/+3
| | | | | | | | | | | | | | | | One module per supported filter chain family type takes too much memory for very little code - too much modularization - place all chain filter definitions in one single file. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: nft_register_chain_type() returns voidPablo Neira Ayuso2018-03-301-1/+1
| | | | | | | | | | | | | | | | Use WARN_ON() instead since it should not happen that neither family goes over NFPROTO_NUMPROTO nor there is already a chain of this type already registered. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: nf_tables: rename struct nf_chain_typePablo Neira Ayuso2018-03-301-8/+8
|/ | | | | | | | Use nft_ prefix. By when I added chain types, I forgot to use the nftables prefix. Rename enum nft_chain_type to enum nft_chain_types too, otherwise there is an overlap. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: allocate handle and delete objects via handleHarsha Sharma2018-01-191-2/+9
| | | | | | | | This patch allows deletion of objects via unique handle which can be listed via '-a' option. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: get rid of struct nft_af_info abstractionPablo Neira Ayuso2018-01-101-21/+2Star
| | | | | | | Remove the infrastructure to register/unregister nft_af_info structure, this structure stores no useful information anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: get rid of pernet familiesPablo Neira Ayuso2018-01-101-2/+2
| | | | | | | | | | | Now that we have a single table list for each netns, we can get rid of one pointer per family and the global afinfo list, thus, shrinking struct netns for nftables that now becomes 64 bytes smaller. And call __nft_release_afinfo() from __net_exit path accordingly to release netnamespace objects on removal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: add single table list for all familiesPablo Neira Ayuso2018-01-101-4/+4
| | | | | | | | | | | | | | Place all existing user defined tables in struct net *, instead of having one list per family. This saves us from one level of indentation in netlink dump functions. Place pointer to struct nft_af_info in struct nft_table temporarily, as we still need this to put back reference module reference counter on table removal. This patch comes in preparation for the removal of struct nft_af_info. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: remove flag field from struct nft_af_infoPablo Neira Ayuso2018-01-101-6/+0Star
| | | | | | Replace it by a direct check for the netdev protocol family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: remove nhooks field from struct nft_af_infoPablo Neira Ayuso2018-01-101-2/+0Star
| | | | | | | | | | We already validate the hook through bitmask, so this check is superfluous. When removing this, this patch is also fixing a bug in the new flowtable codebase, since ctx->afi points to the table family instead of the netdev family which is where the flowtable is really hooked in. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: add flow table netlink frontendPablo Neira Ayuso2018-01-081-0/+48
| | | | | | | | | | | | This patch introduces a netlink control plane to create, delete and dump flow tables. Flow tables are identified by name, this name is used from rules to refer to an specific flow table. Flow tables use the rhashtable class and a generic garbage collector to remove expired entries. This also adds the infrastructure to add different flow table types, so we can add one for each layer 3 protocol family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: remove nft_dereference()Pablo Neira Ayuso2018-01-081-3/+0Star
| | | | | | | This macro is unnecessary, it just hides details for one single caller. nfnl_dereference() is just enough. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>