summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* netfilter: ebtables: fix compat entry paddingAlin Nastac2018-06-041-5/+5
| | | | | | | | | | | | | On arm64, ebt_entry_{match,watcher,target} structs are 40 bytes long while on 32-bit arm these structs have a size of 36 bytes. COMPAT_XT_ALIGN() macro cannot be used here to determine the necessary padding for the CONFIG_COMPAT because it imposes an 8-byte boundary alignment, condition that is not found in 32-bit ebtables application. Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* ipvs: fix check on xmit to non-local addressesJulian Anastasov2018-06-041-1/+1
| | | | | | | | | | | | | | | | | | There is mistake in the rt_mode_allow_non_local assignment. It should be used to check if sending to non-local addresses is allowed, now it checks if local addresses are allowed. As local addresses are allowed for most of the cases, the only places that are affected are for traffic to transparent cache servers: - bypass connections when cache server is not available - related ICMP in FORWARD hook when sent to cache server Fixes: 4a4739d56b00 ("ipvs: Pull out crosses_local_route_boundary logic") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nft_reject_bridge: fix skb allocation size in ↵Taehee Yoo2018-06-041-1/+1
| | | | | | | | | nft_reject_br_send_v6_unreach In order to allocate icmpv6 skb, sizeof(struct ipv6hdr) should be used. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* ipvs: register conntrack hooks for ftpJulian Anastasov2018-06-022-0/+34
| | | | | | | | | | | | | | | ip_vs_ftp requires conntrack modules for mangling of FTP command responses in passive mode. Make sure the conntrack hooks are registered when real servers use NAT method in FTP virtual service. The hooks will be registered while the service is present. Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: nf_tables: check msg_type before nft_trans_set(trans)Alexey Kodanev2018-06-011-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch moves the "trans->msg_type == NFT_MSG_NEWSET" check before using nft_trans_set(trans). Otherwise we can get out of bounds read. For example, KASAN reported the one when running 0001_cache_handling_0 nft test. In this case "trans->msg_type" was NFT_MSG_NEWTABLE: [75517.177808] BUG: KASAN: slab-out-of-bounds in nft_set_lookup_global+0x22f/0x270 [nf_tables] [75517.279094] Read of size 8 at addr ffff881bdb643fc8 by task nft/7356 ... [75517.375605] CPU: 26 PID: 7356 Comm: nft Tainted: G E 4.17.0-rc7.1.x86_64 #1 [75517.489587] Hardware name: Oracle Corporation SUN SERVER X4-2 [75517.618129] Call Trace: [75517.648821] dump_stack+0xd1/0x13b [75517.691040] ? show_regs_print_info+0x5/0x5 [75517.742519] ? kmsg_dump_rewind_nolock+0xf5/0xf5 [75517.799300] ? lock_acquire+0x143/0x310 [75517.846738] print_address_description+0x85/0x3a0 [75517.904547] kasan_report+0x18d/0x4b0 [75517.949892] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.019153] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.088420] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.157689] nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.224869] nf_tables_newsetelem+0x1a5/0x5d0 [nf_tables] [75518.291024] ? nft_add_set_elem+0x2280/0x2280 [nf_tables] [75518.357154] ? nla_parse+0x1a5/0x300 [75518.401455] ? kasan_kmalloc+0xa6/0xd0 [75518.447842] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink] [75518.507743] ? nfnetlink_rcv+0x7a5/0x1bdf [nfnetlink] [75518.569745] ? nfnl_err_reset+0x3c0/0x3c0 [nfnetlink] [75518.631711] ? lock_acquire+0x143/0x310 [75518.679133] ? netlink_deliver_tap+0x9b/0x1070 [75518.733840] ? kasan_unpoison_shadow+0x31/0x40 [75518.788542] netlink_unicast+0x45d/0x680 [75518.837111] ? __isolate_free_page+0x890/0x890 [75518.891913] ? netlink_attachskb+0x6b0/0x6b0 [75518.944542] netlink_sendmsg+0x6fa/0xd30 [75518.993107] ? netlink_unicast+0x680/0x680 [75519.043758] ? netlink_unicast+0x680/0x680 [75519.094402] sock_sendmsg+0xd9/0x160 [75519.138810] ___sys_sendmsg+0x64d/0x980 [75519.186234] ? copy_msghdr_from_user+0x350/0x350 [75519.243118] ? lock_downgrade+0x650/0x650 [75519.292738] ? do_raw_spin_unlock+0x5d/0x250 [75519.345456] ? _raw_spin_unlock+0x24/0x30 [75519.395065] ? __handle_mm_fault+0xbde/0x3410 [75519.448830] ? sock_setsockopt+0x3d2/0x1940 [75519.500516] ? __lock_acquire.isra.25+0xdc/0x19d0 [75519.558448] ? lock_downgrade+0x650/0x650 [75519.608057] ? __audit_syscall_entry+0x317/0x720 [75519.664960] ? __fget_light+0x58/0x250 [75519.711325] ? __sys_sendmsg+0xde/0x170 [75519.758850] __sys_sendmsg+0xde/0x170 [75519.804193] ? __ia32_sys_shutdown+0x90/0x90 [75519.856725] ? syscall_trace_enter+0x897/0x10e0 [75519.912354] ? trace_event_raw_event_sys_enter+0x920/0x920 [75519.979432] ? __audit_syscall_entry+0x720/0x720 [75520.036118] do_syscall_64+0xa3/0x3d0 [75520.081248] ? prepare_exit_to_usermode+0x47/0x1d0 [75520.139904] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75520.201680] RIP: 0033:0x7fc153320ba0 [75520.245772] RSP: 002b:00007ffe294c3638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [75520.337708] RAX: ffffffffffffffda RBX: 00007ffe294c4820 RCX: 00007fc153320ba0 [75520.424547] RDX: 0000000000000000 RSI: 00007ffe294c46b0 RDI: 0000000000000003 [75520.511386] RBP: 00007ffe294c47b0 R08: 0000000000000004 R09: 0000000002114090 [75520.598225] R10: 00007ffe294c30a0 R11: 0000000000000246 R12: 00007ffe294c3660 [75520.684961] R13: 0000000000000001 R14: 00007ffe294c3650 R15: 0000000000000001 [75520.790946] Allocated by task 7356: [75520.833994] kasan_kmalloc+0xa6/0xd0 [75520.878088] __kmalloc+0x189/0x450 [75520.920107] nft_trans_alloc_gfp+0x20/0x190 [nf_tables] [75520.983961] nf_tables_newtable+0xcd0/0x1bd0 [nf_tables] [75521.048857] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink] [75521.108655] netlink_unicast+0x45d/0x680 [75521.157013] netlink_sendmsg+0x6fa/0xd30 [75521.205271] sock_sendmsg+0xd9/0x160 [75521.249365] ___sys_sendmsg+0x64d/0x980 [75521.296686] __sys_sendmsg+0xde/0x170 [75521.341822] do_syscall_64+0xa3/0x3d0 [75521.386957] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75521.467867] Freed by task 23454: [75521.507804] __kasan_slab_free+0x132/0x180 [75521.558137] kfree+0x14d/0x4d0 [75521.596005] free_rt_sched_group+0x153/0x280 [75521.648410] sched_autogroup_create_attach+0x19a/0x520 [75521.711330] ksys_setsid+0x2ba/0x400 [75521.755529] __ia32_sys_setsid+0xa/0x10 [75521.802850] do_syscall_64+0xa3/0x3d0 [75521.848090] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75521.929000] The buggy address belongs to the object at ffff881bdb643f80 which belongs to the cache kmalloc-96 of size 96 [75522.079797] The buggy address is located 72 bytes inside of 96-byte region [ffff881bdb643f80, ffff881bdb643fe0) [75522.221234] The buggy address belongs to the page: [75522.280100] page:ffffea006f6d90c0 count:1 mapcount:0 mapping:0000000000000000 index:0x0 [75522.377443] flags: 0x2fffff80000100(slab) [75522.426956] raw: 002fffff80000100 0000000000000000 0000000000000000 0000000180200020 [75522.521275] raw: ffffea006e6fafc0 0000000c0000000c ffff881bf180f400 0000000000000000 [75522.615601] page dumped because: kasan: bad access detected Fixes: 37a9cc525525 ("netfilter: nf_tables: add generation mask to sets") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: xt_CT: Reject the non-null terminated string from user spaceGao Feng2018-06-011-0/+10
| | | | | | | | | | | | | The helper and timeout strings are from user-space, we need to make sure they are null terminated. If not, evil user could make kernel read the unexpected memory, even print it when fail to find by the following codes. pr_info_ratelimited("No such helper \"%s\"\n", helper_name); Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* net-sysfs: Fix memory leak in XPS configurationAlexander Duyck2018-06-011-3/+3
| | | | | | | | | | This patch reorders the error cases in showing the XPS configuration so that we hold off on memory allocation until after we have verified that we can support XPS on a given ring. Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ixgbe: fix parsing of TC actions for HW offloadOndřej Hlavatý2018-06-011-5/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous code was optimistic, accepting the offload of whole action chain when there was a single known action (drop/redirect). This results in offloading a rule which should not be offloaded, because its behavior cannot be reproduced in the hardware. For example: $ tc filter add dev eno1 parent ffff: protocol ip \ u32 ht 800: order 1 match tcp src 42 FFFF \ action mirred egress mirror dev enp1s16 pipe \ drop The controller is unable to mirror the packet to a VF, but still offloads the rule by dropping the packet. Change the approach of the function to a pessimistic one, rejecting the chain when an unknown action is found. This is better suited for future extensions. Note that both recognized actions always return TC_ACT_SHOT, therefore it is safe to ignore actions behind them. Signed-off-by: Ondřej Hlavatý <ohlavaty@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethernet: davinci_emac: fix error handling in probe()Dan Carpenter2018-05-311-10/+12
| | | | | | | | | | | | | | | | | The current error handling code has an issue where it does: if (priv->txchan) cpdma_chan_destroy(priv->txchan); The problem is that ->txchan is either valid or an error pointer (which would lead to an Oops). I've changed it to use multiple error labels so that the test can be removed. Also there were some missing calls to netif_napi_del(). Fixes: 3ef0fdb2342c ("net: davinci_emac: switch to new cpdma layer") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/ncsi: Fix array size in dumpit handlerSamuel Mendoza-Jonas2018-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With CONFIG_CC_STACKPROTECTOR enabled the kernel panics as below when parsing a NCSI_CMD_PKG_INFO command: [ 150.149711] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08 [ 150.149711] [ 150.159919] CPU: 0 PID: 1301 Comm: ncsi-netlink Not tainted 4.13.16-468cbec6d2c91239332cb91b1f0a73aafcb6f0c6 #1 [ 150.170004] Hardware name: Generic DT based system [ 150.174852] [<80109930>] (unwind_backtrace) from [<80106bc4>] (show_stack+0x20/0x24) [ 150.182641] [<80106bc4>] (show_stack) from [<805d36e4>] (dump_stack+0x20/0x28) [ 150.189888] [<805d36e4>] (dump_stack) from [<801163ac>] (panic+0xdc/0x278) [ 150.196780] [<801163ac>] (panic) from [<801162cc>] (__stack_chk_fail+0x20/0x24) [ 150.204111] [<801162cc>] (__stack_chk_fail) from [<805cff08>] (ncsi_pkg_info_all_nl+0x244/0x258) [ 150.212912] [<805cff08>] (ncsi_pkg_info_all_nl) from [<804f939c>] (genl_lock_dumpit+0x3c/0x54) [ 150.221535] [<804f939c>] (genl_lock_dumpit) from [<804f873c>] (netlink_dump+0xf8/0x284) [ 150.229550] [<804f873c>] (netlink_dump) from [<804f8d44>] (__netlink_dump_start+0x124/0x17c) [ 150.237992] [<804f8d44>] (__netlink_dump_start) from [<804f9880>] (genl_rcv_msg+0x1c8/0x3d4) [ 150.246440] [<804f9880>] (genl_rcv_msg) from [<804f9174>] (netlink_rcv_skb+0xd8/0x134) [ 150.254361] [<804f9174>] (netlink_rcv_skb) from [<804f96a4>] (genl_rcv+0x30/0x44) [ 150.261850] [<804f96a4>] (genl_rcv) from [<804f7790>] (netlink_unicast+0x198/0x234) [ 150.269511] [<804f7790>] (netlink_unicast) from [<804f7ffc>] (netlink_sendmsg+0x368/0x3b0) [ 150.277783] [<804f7ffc>] (netlink_sendmsg) from [<804abea4>] (sock_sendmsg+0x24/0x34) [ 150.285625] [<804abea4>] (sock_sendmsg) from [<804ac1dc>] (___sys_sendmsg+0x244/0x260) [ 150.293556] [<804ac1dc>] (___sys_sendmsg) from [<804ad98c>] (__sys_sendmsg+0x5c/0x9c) [ 150.301400] [<804ad98c>] (__sys_sendmsg) from [<804ad9e4>] (SyS_sendmsg+0x18/0x1c) [ 150.308984] [<804ad9e4>] (SyS_sendmsg) from [<80102640>] (ret_fast_syscall+0x0/0x3c) [ 150.316743] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08 This turns out to be because the attrs array in ncsi_pkg_info_all_nl() is initialised to a length of NCSI_ATTR_MAX which is the maximum attribute number, not the number of attributes. Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family") Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'wireless-drivers-for-davem-2018-05-30' of ↵David S. Miller2018-05-312-9/+8Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.17 Two last minute fixes, hopefully they make it to 4.17 still. rt2x00 * revert a fix which caused even more problems iwlwifi * fix a crash when there are 16 or more logical CPUs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUsHao Wei Tee2018-05-291-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there are 16 or more logical CPUs, we request for `IWL_MAX_RX_HW_QUEUES` (16) IRQs only as we limit to that number of IRQs, but later on we compare the number of IRQs returned to nr_online_cpus+2 instead of max_irqs, the latter being what we actually asked for. This ends up setting num_rx_queues to 17 which causes lots of out-of-bounds array accesses later on. Compare to max_irqs instead, and also add an assertion in case num_rx_queues > IWM_MAX_RX_HW_QUEUES. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199551 Fixes: 2e5d4a8f61dc ("iwlwifi: pcie: Add new configuration to enable MSIX") Signed-off-by: Hao Wei Tee <angelsl@in04.sg> Tested-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
| * Revert "rt2800: use TXOP_BACKOFF for probe frames"Stanislaw Gruszka2018-05-291-4/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | This reverts commit fb47ada8dc3c30c8e7b415da155742b49536c61e. In some situations when we set TXOP_BACKOFF, the probe frame is not sent at all. What it worse then sending probe frame as part of AMPDU and can degrade 11n performance to 11g rates. Cc: stable@vger.kernel.org Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
* | cls_flower: Fix incorrect idr release when failing to modify rulePaul Blakey2018-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When we fail to modify a rule, we incorrectly release the idr handle of the unmodified old rule. Fix that by checking if we need to release it. Fixes: fe2502e49b58 ("net_sched: remove cls_flower idr on failure") Reported-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sonic: Use dma_mapping_error()Finn Thain2018-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | With CONFIG_DMA_API_DEBUG=y, calling sonic_open() produces the message, "DMA-API: device driver failed to check map error". Add the missing dma_mapping_error() call. Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vhost_net: flush batched heads before trying to busy pollingJason Wang2018-05-301-13/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit e2b3b35eb989 ("vhost_net: batch used ring update in rx"), we tend to batch updating used heads. But it doesn't flush batched heads before trying to do busy polling, this will cause vhost to wait for guest TX which waits for the used RX. Fixing by flush batched heads before busy loop. 1 byte TCP_RR performance recovers from 13107.83 to 50402.65. Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tun: Fix NULL pointer dereference in XDP redirectToshiaki Makita2018-05-291-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling XDP redirection requires bh disabled. Softirq can call another XDP function and redirection functions, then the percpu static variable ri->map can be overwritten to NULL. This is a generic XDP case called from tun. [ 3535.736058] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 3535.743974] PGD 0 P4D 0 [ 3535.746530] Oops: 0000 [#1] SMP PTI [ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel scsi_transport_sas tg3 i2c_core ptp pps_core [ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 4.17.0-rc4 #2 [ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017 [ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30 [ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246 [ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: ffff9fdf27fe3100 [ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 0000000000000000 [ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 0000000000000ec0 [ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: ffff9fdf9e72c000 [ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: ffff9fdfc82cdd00 [ 3535.874990] FS: 0000000000000000(0000) GS:ffff9fdfcfe80000(0000) knlGS:0000000000000000 [ 3535.883152] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 00000000007626e0 [ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3535.910538] PKRU: 55555554 [ 3535.913267] Call Trace: [ 3535.915736] xdp_do_generic_redirect+0x7a/0x310 [ 3535.920310] do_xdp_generic.part.117+0x285/0x370 [ 3535.924970] tun_get_user+0x5b9/0x1260 [tun] [ 3535.929279] tun_sendmsg+0x52/0x70 [tun] [ 3535.933237] handle_tx+0x2ad/0x5f0 [vhost_net] [ 3535.937721] vhost_worker+0xa5/0x100 [vhost] [ 3535.942030] kthread+0xf5/0x130 [ 3535.945198] ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost] [ 3535.950031] ? kthread_bind+0x10/0x10 [ 3535.953727] ret_from_fork+0x35/0x40 [ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 c0 c3 [ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58 [ 3535.982883] CR2: 0000000000000018 [ 3535.987096] ---[ end trace 383b299dd1430240 ]--- [ 3536.131325] Kernel panic - not syncing: Fatal exception [ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]--- And a kernel with generic case fixed still panics in tun driver XDP redirect, because it disabled only preemption, but not bh. [ 2055.128746] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 2055.136662] PGD 0 P4D 0 [ 2055.139219] Oops: 0000 [#1] SMP PTI [ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas crc32c_intel i2c_core ptp pps_core [ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G W 4.17.0-rc5-fix-tun+ #1 [ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017 [ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30 [ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246 [ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 0000000000000004 [ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 0000000000000000 [ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 0000000000005d80 [ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: ffff998b097a9000 [ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: ffff998b07607d78 [ 2055.269879] FS: 0000000000000000(0000) GS:ffff8dbd8ff00000(0000) knlGS:0000000000000000 [ 2055.278039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 00000000007626e0 [ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2055.305424] PKRU: 55555554 [ 2055.308153] Call Trace: [ 2055.310624] xdp_do_redirect+0x7b/0x380 [ 2055.314499] tun_get_user+0x10fe/0x12a0 [tun] [ 2055.318895] tun_sendmsg+0x52/0x70 [tun] [ 2055.322852] handle_tx+0x2ad/0x5f0 [vhost_net] [ 2055.327337] vhost_worker+0xa5/0x100 [vhost] [ 2055.331646] kthread+0xf5/0x130 [ 2055.334813] ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost] [ 2055.339646] ? kthread_bind+0x10/0x10 [ 2055.343343] ret_from_fork+0x35/0x40 [ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 c0 c3 [ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0 [ 2055.372500] CR2: 0000000000000018 [ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]--- [ 2055.523626] Kernel panic - not syncing: Fatal exception [ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]--- v2: - Removed preempt_disable/enable since local_bh_disable will prevent preemption as well, feedback from Jason Wang. Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | be2net: Fix error detection logic for BE3Suresh Reddy2018-05-291-1/+3
| | | | | | | | | | | | | | | | | | Check for 0xE00 (RECOVERABLE_ERR) along with ARMFW UE (0x0) in be_detect_error() to know whether the error is valid error or not Fixes: 673c96e5a ("be2net: Fix UE detection logic for BE3") Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: qmi_wwan: Add Netgear Aircard 779SJosh Hill2018-05-291-0/+1
| | | | | | | | | | | | | | | | Add support for Netgear Aircard 779S Signed-off-by: Josh Hill <josh@joshuajhill.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAGPetr Machata2018-05-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VLAN 1 is internally used for untagged traffic. Prevent creation of explicit netdevice for that VLAN, because that currently isn't supported and leads to the NULL pointer dereference cited below. Fix by preventing creation of VLAN devices with VID of 1 over mlxsw devices or LAG devices that involve mlxsw devices. [ 327.175816] ================================================================================ [ 327.184544] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c:200:12 [ 327.193667] member access within null pointer of type 'const struct mlxsw_sp_fid' [ 327.201226] CPU: 0 PID: 8983 Comm: ip Not tainted 4.17.0-rc4-petrm_net_ip6gre_headroom-custom-140 #11 [ 327.210496] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016 [ 327.219872] Call Trace: [ 327.222384] dump_stack+0xc3/0x12b [ 327.234007] ubsan_epilogue+0x9/0x49 [ 327.237638] ubsan_type_mismatch_common+0x1f9/0x2d0 [ 327.255769] __ubsan_handle_type_mismatch+0x90/0xa7 [ 327.264716] mlxsw_sp_fid_type+0x35/0x50 [mlxsw_spectrum] [ 327.270255] mlxsw_sp_port_vlan_router_leave+0x46/0xc0 [mlxsw_spectrum] [ 327.277019] mlxsw_sp_inetaddr_port_vlan_event+0xe1/0x340 [mlxsw_spectrum] [ 327.315031] mlxsw_sp_netdevice_vrf_event+0xa8/0x100 [mlxsw_spectrum] [ 327.321626] mlxsw_sp_netdevice_event+0x276/0x430 [mlxsw_spectrum] [ 327.367863] notifier_call_chain+0x4c/0x150 [ 327.372128] __netdev_upper_dev_link+0x1b3/0x260 [ 327.399450] vrf_add_slave+0xce/0x170 [vrf] [ 327.403703] do_setlink+0x658/0x1d70 [ 327.508998] rtnl_newlink+0x908/0xf20 [ 327.559128] rtnetlink_rcv_msg+0x50c/0x720 [ 327.571720] netlink_rcv_skb+0x16a/0x1f0 [ 327.583450] netlink_unicast+0x2ca/0x3e0 [ 327.599305] netlink_sendmsg+0x3e2/0x7f0 [ 327.616655] sock_sendmsg+0x76/0xc0 [ 327.620207] ___sys_sendmsg+0x494/0x5d0 [ 327.666117] __sys_sendmsg+0xc2/0x130 [ 327.690953] do_syscall_64+0x66/0x370 [ 327.694677] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 327.699782] RIP: 0033:0x7f4c2f3f8037 [ 327.703393] RSP: 002b:00007ffe8c389708 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 327.711035] RAX: ffffffffffffffda RBX: 000000005b03f53e RCX: 00007f4c2f3f8037 [ 327.718229] RDX: 0000000000000000 RSI: 00007ffe8c389760 RDI: 0000000000000003 [ 327.725431] RBP: 00007ffe8c389760 R08: 0000000000000000 R09: 00007f4c2f443630 [ 327.732632] R10: 00000000000005eb R11: 0000000000000246 R12: 0000000000000000 [ 327.739833] R13: 00000000006774e0 R14: 00007ffe8c3897e8 R15: 0000000000000000 [ 327.747096] ================================================================================ Fixes: 9589a7b5d7d9 ("mlxsw: spectrum: Handle VLAN devices linking / unlinking") Suggested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | atm: zatm: fix memcmp castingIvan Bornyakov2018-05-291-2/+2
| | | | | | | | | | | | | | | | | | memcmp() returns int, but eprom_try_esi() cast it to unsigned char. One can lose significant bits and get 0 from non-0 value returned by the memcmp(). Signed-off-by: Ivan Bornyakov <brnkv.i1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: netsec: reduce DMA mask to 40 bitsArd Biesheuvel2018-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The netsec network controller IP can drive 64 address bits for DMA, and the DMA mask is set accordingly in the driver. However, the SynQuacer SoC, which is the only silicon incorporating this IP at the moment, integrates this IP in a manner that leaves address bits [63:40] unconnected. Up until now, this has not resulted in any problems, given that the DDR controller doesn't decode those bits to begin with. However, recent firmware updates for platforms incorporating this SoC allow the IOMMU to be enabled, which does decode address bits [47:40], and allocates top down from the IOVA space, producing DMA addresses that have bits set that have been left unconnected. Both the DT and ACPI (IORT) descriptions of the platform take this into account, and only describe a DMA address space of 40 bits (using either dma-ranges DT properties, or DMA address limits in IORT named component nodes). However, even though our IOMMU and bus layers may take such limitations into account by setting a narrower DMA mask when creating the platform device, the netsec probe() entrypoint follows the common practice of setting the DMA mask uncondionally, according to the capabilities of the IP block itself rather than to its integration into the chip. It is currently unclear what the correct fix is here. We could hack around it by only setting the DMA mask if it deviates from its default value of DMA_BIT_MASK(32). However, this makes it impossible for the bus layer to use DMA_BIT_MASK(32) as the bus limit, and so it appears that a more comprehensive approach is required to take DMA limits imposed by the SoC as a whole into account. In the mean time, let's limit the DMA mask to 40 bits. Given that there is currently only one SoC that incorporates this IP, this is a reasonable approach that can be backported to -stable and buys us some time to come up with a proper fix going forward. Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver") Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jassi Brar <jaswinder.singh@linaro.org> Cc: Masahisa Kojima <masahisa.kojima@linaro.org> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inlineMathieu Xhonneux2018-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | seg6_do_srh_encap and seg6_do_srh_inline can possibly do an out-of-bounds access when adding the SRH to the packet. This no longer happen when expanding the skb not only by the size of the SRH (+ outer IPv6 header), but also by skb->mac_len. [ 53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620 [ 53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674 [ 53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90 [ 53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 [ 53.796673] Call Trace: [ 53.796679] <IRQ> [ 53.796689] dump_stack+0x71/0xab [ 53.796700] print_address_description+0x6a/0x270 [ 53.796707] kasan_report+0x258/0x380 [ 53.796715] ? seg6_do_srh_encap+0x284/0x620 [ 53.796722] memmove+0x34/0x50 [ 53.796730] seg6_do_srh_encap+0x284/0x620 [ 53.796741] ? seg6_do_srh+0x29b/0x360 [ 53.796747] seg6_do_srh+0x29b/0x360 [ 53.796756] seg6_input+0x2e/0x2e0 [ 53.796765] lwtunnel_input+0x93/0xd0 [ 53.796774] ipv6_rcv+0x690/0x920 [ 53.796783] ? ip6_input+0x170/0x170 [ 53.796791] ? eth_gro_receive+0x2d0/0x2d0 [ 53.796800] ? ip6_input+0x170/0x170 [ 53.796809] __netif_receive_skb_core+0xcc0/0x13f0 [ 53.796820] ? netdev_info+0x110/0x110 [ 53.796827] ? napi_complete_done+0xb6/0x170 [ 53.796834] ? e1000_clean+0x6da/0xf70 [ 53.796845] ? process_backlog+0x129/0x2a0 [ 53.796853] process_backlog+0x129/0x2a0 [ 53.796862] net_rx_action+0x211/0x5c0 [ 53.796870] ? napi_complete_done+0x170/0x170 [ 53.796887] ? run_rebalance_domains+0x11f/0x150 [ 53.796891] __do_softirq+0x10e/0x39e [ 53.796894] do_softirq_own_stack+0x2a/0x40 [ 53.796895] </IRQ> [ 53.796898] do_softirq.part.16+0x54/0x60 [ 53.796900] __local_bh_enable_ip+0x5b/0x60 [ 53.796903] ip6_finish_output2+0x416/0x9f0 [ 53.796906] ? ip6_dst_lookup_flow+0x110/0x110 [ 53.796909] ? ip6_sk_dst_lookup_flow+0x390/0x390 [ 53.796911] ? __rcu_read_unlock+0x66/0x80 [ 53.796913] ? ip6_mtu+0x44/0xf0 [ 53.796916] ? ip6_output+0xfc/0x220 [ 53.796918] ip6_output+0xfc/0x220 [ 53.796921] ? ip6_finish_output+0x2b0/0x2b0 [ 53.796923] ? memcpy+0x34/0x50 [ 53.796926] ip6_send_skb+0x43/0xc0 [ 53.796929] rawv6_sendmsg+0x1216/0x1530 [ 53.796932] ? __orc_find+0x6b/0xc0 [ 53.796934] ? rawv6_rcv_skb+0x160/0x160 [ 53.796937] ? __rcu_read_unlock+0x66/0x80 [ 53.796939] ? __rcu_read_unlock+0x66/0x80 [ 53.796942] ? is_bpf_text_address+0x1e/0x30 [ 53.796944] ? kernel_text_address+0xec/0x100 [ 53.796946] ? __kernel_text_address+0xe/0x30 [ 53.796948] ? unwind_get_return_address+0x2f/0x50 [ 53.796950] ? __save_stack_trace+0x92/0x100 [ 53.796954] ? save_stack+0x89/0xb0 [ 53.796956] ? kasan_kmalloc+0xa0/0xd0 [ 53.796958] ? kmem_cache_alloc+0xd2/0x1f0 [ 53.796961] ? prepare_creds+0x23/0x160 [ 53.796963] ? __x64_sys_capset+0x252/0x3e0 [ 53.796966] ? do_syscall_64+0x69/0x160 [ 53.796968] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.796971] ? __alloc_pages_nodemask+0x170/0x380 [ 53.796973] ? __alloc_pages_slowpath+0x12c0/0x12c0 [ 53.796977] ? tty_vhangup+0x20/0x20 [ 53.796979] ? policy_nodemask+0x1a/0x90 [ 53.796982] ? __mod_node_page_state+0x8d/0xa0 [ 53.796986] ? __check_object_size+0xe7/0x240 [ 53.796989] ? __sys_sendto+0x229/0x290 [ 53.796991] ? rawv6_rcv_skb+0x160/0x160 [ 53.796993] __sys_sendto+0x229/0x290 [ 53.796996] ? __ia32_sys_getpeername+0x50/0x50 [ 53.796999] ? commit_creds+0x2de/0x520 [ 53.797002] ? security_capset+0x57/0x70 [ 53.797004] ? __x64_sys_capset+0x29f/0x3e0 [ 53.797007] ? __x64_sys_rt_sigsuspend+0xe0/0xe0 [ 53.797011] ? __do_page_fault+0x664/0x770 [ 53.797014] __x64_sys_sendto+0x74/0x90 [ 53.797017] do_syscall_64+0x69/0x160 [ 53.797019] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.797022] RIP: 0033:0x7f43b7a6714a [ 53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a [ 53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004 [ 53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c [ 53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661 [ 53.797171] Allocated by task 642: [ 53.797460] kasan_kmalloc+0xa0/0xd0 [ 53.797463] kmem_cache_alloc+0xd2/0x1f0 [ 53.797465] getname_flags+0x40/0x210 [ 53.797467] user_path_at_empty+0x1d/0x40 [ 53.797469] do_faccessat+0x12a/0x320 [ 53.797471] do_syscall_64+0x69/0x160 [ 53.797473] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.797607] Freed by task 642: [ 53.797869] __kasan_slab_free+0x130/0x180 [ 53.797871] kmem_cache_free+0xa8/0x230 [ 53.797872] filename_lookup+0x15b/0x230 [ 53.797874] do_faccessat+0x12a/0x320 [ 53.797876] do_syscall_64+0x69/0x160 [ 53.797878] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.798014] The buggy address belongs to the object at ffff88011975e600 which belongs to the cache names_cache of size 4096 [ 53.799043] The buggy address is located 1786 bytes inside of 4096-byte region [ffff88011975e600, ffff88011975f600) [ 53.800013] The buggy address belongs to the page: [ 53.800414] page:ffffea000465d600 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 53.801259] flags: 0x17fff0000008100(slab|head) [ 53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000 0000000100070007 [ 53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40 0000000000000000 [ 53.803787] page dumped because: kasan: bad access detected [ 53.804384] Memory state around the buggy address: [ 53.804788] ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805384] ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.806577] ^ [ 53.807165] ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.807762] ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.808356] ================================================================== [ 53.808949] Disabling lock debugging due to kernel taint Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2018-05-299-43/+71
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Null pointer dereference when dumping conntrack helper configuration, from Taehee Yoo. 2) Missing sanitization in ebtables extension name through compat, from Paolo Abeni. 3) Broken fetch of tracing value, from Taehee Yoo. 4) Incorrect arithmetics in packet ratelimiting. 5) Buffer overflow in IPVS sync daemon, from Julian Anastasov. 6) Wrong argument to nla_strlcpy() in nfnetlink_{acct,cthelper}, from Eric Dumazet. 7) Fix splat in nft_update_chain_stats(). 8) Null pointer dereference from object netlink dump path, from Taehee Yoo. 9) Missing static_branch_inc() when enabling counters in existing chain, from Taehee Yoo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_tables: increase nft_counters_enabled in nft_chain_stats_replace()Taehee Yoo2018-05-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a chain is updated, a counter can be attached. if so, the nft_counters_enabled should be increased. test commands: %nft add table ip filter %nft add chain ip filter input { type filter hook input priority 4\; } %iptables-compat -Z input %nft delete chain ip filter input we can see below messages. [ 286.443720] jump label: negative count! [ 286.448278] WARNING: CPU: 0 PID: 1459 at kernel/jump_label.c:197 __static_key_slow_dec_cpuslocked+0x6f/0xf0 [ 286.449144] Modules linked in: nf_tables nfnetlink ip_tables x_tables [ 286.449144] CPU: 0 PID: 1459 Comm: nft Tainted: G W 4.17.0-rc2+ #12 [ 286.449144] RIP: 0010:__static_key_slow_dec_cpuslocked+0x6f/0xf0 [ 286.449144] RSP: 0018:ffff88010e5176f0 EFLAGS: 00010286 [ 286.449144] RAX: 000000000000001b RBX: ffffffffc0179500 RCX: ffffffffb8a82522 [ 286.449144] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88011b7e5eac [ 286.449144] RBP: 0000000000000000 R08: ffffed00236fce5c R09: ffffed00236fce5b [ 286.449144] R10: ffffffffc0179503 R11: ffffed00236fce5c R12: 0000000000000000 [ 286.449144] R13: ffff88011a28e448 R14: ffff88011a28e470 R15: dffffc0000000000 [ 286.449144] FS: 00007f0384328700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 286.449144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 286.449144] CR2: 00007f038394bf10 CR3: 0000000104a86000 CR4: 00000000001006f0 [ 286.449144] Call Trace: [ 286.449144] static_key_slow_dec+0x6a/0x70 [ 286.449144] nf_tables_chain_destroy+0x19d/0x210 [nf_tables] [ 286.449144] nf_tables_commit+0x1891/0x1c50 [nf_tables] [ 286.449144] nfnetlink_rcv+0x1148/0x13d0 [nfnetlink] [ ... ] Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: fix NULL-ptr in nf_tables_dump_obj()Taehee Yoo2018-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The table field in nft_obj_filter is not an array. In order to check tablename, we should check if the pointer is set. Test commands: %nft add table ip filter %nft add counter ip filter ct1 %nft reset counters Splat looks like: [ 306.510504] kasan: CONFIG_KASAN_INLINE enabled [ 306.516184] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 306.524775] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 306.528284] Modules linked in: nft_objref nft_counter nf_tables nfnetlink ip_tables x_tables [ 306.528284] CPU: 0 PID: 1488 Comm: nft Not tainted 4.17.0-rc4+ #17 [ 306.528284] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015 [ 306.528284] RIP: 0010:nf_tables_dump_obj+0x52c/0xa70 [nf_tables] [ 306.528284] RSP: 0018:ffff8800b6cb7520 EFLAGS: 00010246 [ 306.528284] RAX: 0000000000000000 RBX: ffff8800b6c49820 RCX: 0000000000000000 [ 306.528284] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffed0016d96e9a [ 306.528284] RBP: ffff8800b6cb75c0 R08: ffffed00236fce7c R09: ffffed00236fce7b [ 306.528284] R10: ffffffff9f6241e8 R11: ffffed00236fce7c R12: ffff880111365108 [ 306.528284] R13: 0000000000000000 R14: ffff8800b6c49860 R15: ffff8800b6c49860 [ 306.528284] FS: 00007f838b007700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 306.528284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 306.528284] CR2: 00007ffeafabcf78 CR3: 00000000b6cbe000 CR4: 00000000001006f0 [ 306.528284] Call Trace: [ 306.528284] netlink_dump+0x470/0xa20 [ 306.528284] __netlink_dump_start+0x5ae/0x690 [ 306.528284] ? nf_tables_getobj+0x1b3/0x740 [nf_tables] [ 306.528284] nf_tables_getobj+0x2f5/0x740 [nf_tables] [ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables] [ 306.528284] ? nf_tables_getobj+0x740/0x740 [nf_tables] [ 306.528284] ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables] [ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables] [ 306.528284] nfnetlink_rcv_msg+0x8ff/0x932 [nfnetlink] [ 306.528284] ? nfnetlink_rcv_msg+0x216/0x932 [nfnetlink] [ 306.528284] netlink_rcv_skb+0x1c9/0x2f0 [ 306.528284] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink] [ 306.528284] ? debug_check_no_locks_freed+0x270/0x270 [ 306.528284] ? netlink_ack+0x7a0/0x7a0 [ 306.528284] ? ns_capable_common+0x6e/0x110 [ ... ] Fixes: e46abbcc05aa8 ("netfilter: nf_tables: Allow table names of up to 255 chars") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: disable preemption in nft_update_chain_stats()Pablo Neira Ayuso2018-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following splat. [118709.054937] BUG: using smp_processor_id() in preemptible [00000000] code: test/1571 [118709.054970] caller is nft_update_chain_stats.isra.4+0x53/0x97 [nf_tables] [118709.054980] CPU: 2 PID: 1571 Comm: test Not tainted 4.17.0-rc6+ #335 [...] [118709.054992] Call Trace: [118709.055011] dump_stack+0x5f/0x86 [118709.055026] check_preemption_disabled+0xd4/0xe4 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: provide correct argument to nla_strlcpy()Eric Dumazet2018-05-242-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent patch forgot to remove nla_data(), upsetting syzkaller a bit. BUG: KASAN: slab-out-of-bounds in nla_strlcpy+0x13d/0x150 lib/nlattr.c:314 Read of size 1 at addr ffff8801ad1f4fdd by task syz-executor189/4509 CPU: 1 PID: 4509 Comm: syz-executor189 Not tainted 4.17.0-rc6+ #62 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430 nla_strlcpy+0x13d/0x150 lib/nlattr.c:314 nfnl_acct_new+0x574/0xc50 net/netfilter/nfnetlink_acct.c:118 nfnetlink_rcv_msg+0xdb5/0xff0 net/netfilter/nfnetlink.c:212 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448 nfnetlink_rcv+0x1fe/0x1ba0 net/netfilter/nfnetlink.c:513 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 sock_write_iter+0x35a/0x5a0 net/socket.c:908 call_write_iter include/linux/fs.h:1784 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x64d/0x960 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0xf9/0x250 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 Fixes: 4e09fc873d92 ("netfilter: prefer nla_strlcpy for dealing with NLA_STRING attributes") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Florian Westphal <fw@strlen.de> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | ipvs: fix buffer overflow with sync daemon and serviceJulian Anastasov2018-05-231-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller reports for buffer overflow for interface name when starting sync daemons [1] What we do is that we copy user structure into larger stack buffer but later we search NUL past the stack buffer. The same happens for sched_name when adding/editing virtual server. We are restricted by IP_VS_SCHEDNAME_MAXLEN and IP_VS_IFNAME_MAXLEN being used as size in include/uapi/linux/ip_vs.h, so they include the space for NUL. As using strlcpy is wrong for unsafe source, replace it with strscpy and add checks to return EINVAL if source string is not NUL-terminated. The incomplete strlcpy fix comes from 2.6.13. For the netlink interface reduce the len parameter for IPVS_DAEMON_ATTR_MCAST_IFN and IPVS_SVC_ATTR_SCHED_NAME, so that we get proper EINVAL. [1] kernel BUG at lib/string.c:1052! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 373 Comm: syz-executor936 Not tainted 4.17.0-rc4+ #45 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051 RSP: 0018:ffff8801c976f800 EFLAGS: 00010282 RAX: 0000000000000022 RBX: 0000000000000040 RCX: 0000000000000000 RDX: 0000000000000022 RSI: ffffffff8160f6f1 RDI: ffffed00392edef6 RBP: ffff8801c976f800 R08: ffff8801cf4c62c0 R09: ffffed003b5e4fb0 R10: ffffed003b5e4fb0 R11: ffff8801daf27d87 R12: ffff8801c976fa20 R13: ffff8801c976fae4 R14: ffff8801c976fae0 R15: 000000000000048b FS: 00007fd99f75e700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200001c0 CR3: 00000001d6843000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: strlen include/linux/string.h:270 [inline] strlcpy include/linux/string.h:293 [inline] do_ip_vs_set_ctl+0x31c/0x1d00 net/netfilter/ipvs/ip_vs_ctl.c:2388 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x7d/0xd0 net/netfilter/nf_sockopt.c:115 ip_setsockopt+0xd8/0xf0 net/ipv4/ip_sockglue.c:1253 udp_setsockopt+0x62/0xa0 net/ipv4/udp.c:2487 ipv6_setsockopt+0x149/0x170 net/ipv6/ipv6_sockglue.c:917 tcp_setsockopt+0x93/0xe0 net/ipv4/tcp.c:3057 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903 __do_sys_setsockopt net/socket.c:1914 [inline] __se_sys_setsockopt net/socket.c:1911 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x447369 RSP: 002b:00007fd99f75dda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00000000006e39e4 RCX: 0000000000447369 RDX: 000000000000048b RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000018 R09: 0000000000000000 R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000006e39e0 R13: 75a1ff93f0896195 R14: 6f745f3168746576 R15: 0000000000000001 Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 d2 8f 48 fa eb de 55 48 89 fe 48 c7 c7 60 65 64 88 48 89 e5 e8 91 dd f3 f9 <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56 RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801c976f800 Reported-and-tested-by: syzbot+aac887f77319868646df@syzkaller.appspotmail.com Fixes: e4ff67513096 ("ipvs: add sync_maxlen parameter for the sync daemon") Fixes: 4da62fc70d7c ("[IPVS]: Fix for overflows") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nft_limit: fix packet ratelimitingPablo Neira Ayuso2018-05-231-14/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Credit calculations for the packet ratelimiting are not correct, as per the applied ratelimit of 25/second and burst 8, a total of 33 packets should have been accepted. This is true in iptables(33) but not in nftables (~65). For packet ratelimiting, use: div_u64(limit->nsecs, limit->rate) * limit->burst; to calculate credit, just like in iptables' xt_limit does. Moreover, use default burst in iptables, users are expecting similar behaviour. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nft_meta: fix wrong value dereference in nft_meta_set_evalTaehee Yoo2018-05-231-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the nft_meta_set_eval, nftrace value is dereferenced as u32 from sreg. But correct type is u8. so that sometimes incorrect value is dereferenced. Steps to reproduce: %nft add table ip filter %nft add chain ip filter input { type filter hook input priority 4\; } %nft add rule ip filter input nftrace set 0 %nft monitor Sometimes, we can see trace messages. trace id 16767227 ip filter input packet: iif "enp2s0" ether saddr xx:xx:xx:xx:xx:xx ether daddr xx:xx:xx:xx:xx:xx ip saddr 192.168.0.1 ip daddr 255.255.255.255 ip dscp cs0 ip ecn not-ect ip trace id 16767227 ip filter input rule nftrace set 0 (verdict continue) trace id 16767227 ip filter input verdict continue trace id 16767227 ip filter input Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: ebtables: handle string from userspace with carePaolo Abeni2018-05-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | strlcpy() can't be safely used on a user-space provided string, as it can try to read beyond the buffer's end, if the latter is not NULL terminated. Leveraging the above, syzbot has been able to trigger the following splat: BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline] BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user net/bridge/netfilter/ebtables.c:1957 [inline] BUG: KASAN: stack-out-of-bounds in ebt_size_mwt net/bridge/netfilter/ebtables.c:2059 [inline] BUG: KASAN: stack-out-of-bounds in size_entry_mwt net/bridge/netfilter/ebtables.c:2155 [inline] BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0 net/bridge/netfilter/ebtables.c:2194 Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504 CPU: 0 PID: 4504 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #40 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 memcpy+0x37/0x50 mm/kasan/kasan.c:303 strlcpy include/linux/string.h:300 [inline] compat_mtw_from_user net/bridge/netfilter/ebtables.c:1957 [inline] ebt_size_mwt net/bridge/netfilter/ebtables.c:2059 [inline] size_entry_mwt net/bridge/netfilter/ebtables.c:2155 [inline] compat_copy_entries+0x96c/0x14a0 net/bridge/netfilter/ebtables.c:2194 compat_do_replace+0x483/0x900 net/bridge/netfilter/ebtables.c:2285 compat_do_ebt_set_ctl+0x2ac/0x324 net/bridge/netfilter/ebtables.c:2367 compat_nf_sockopt net/netfilter/nf_sockopt.c:144 [inline] compat_nf_setsockopt+0x9b/0x140 net/netfilter/nf_sockopt.c:156 compat_ip_setsockopt+0xff/0x140 net/ipv4/ip_sockglue.c:1279 inet_csk_compat_setsockopt+0x97/0x120 net/ipv4/inet_connection_sock.c:1041 compat_tcp_setsockopt+0x49/0x80 net/ipv4/tcp.c:2901 compat_sock_common_setsockopt+0xb4/0x150 net/core/sock.c:3050 __compat_sys_setsockopt+0x1ab/0x7c0 net/compat.c:403 __do_compat_sys_setsockopt net/compat.c:416 [inline] __se_compat_sys_setsockopt net/compat.c:413 [inline] __ia32_compat_sys_setsockopt+0xbd/0x150 net/compat.c:413 do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline] do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7fb3cb9 RSP: 002b:00000000fff0c26c EFLAGS: 00000282 ORIG_RAX: 000000000000016e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000000 RDX: 0000000000000080 RSI: 0000000020000300 RDI: 00000000000005f4 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 The buggy address belongs to the page: page:ffffea0006c2afc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x2fffc0000000000() raw: 02fffc0000000000 0000000000000000 0000000000000000 00000000ffffffff raw: 0000000000000000 ffffea0006c20101 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Fix the issue replacing the unsafe function with strscpy() and taking care of possible errors. Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Reported-and-tested-by: syzbot+4e42a04e0bc33cb6c087@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: fix NULL pointer dereference on nft_ct_helper_obj_dump()Taehee Yoo2018-05-171-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the nft_ct_helper_obj_dump(), always priv->helper4 is dereferenced. But if family is ipv6, priv->helper6 should be dereferenced. Steps to reproduces: #test.nft table ip6 filter { ct helper ftp { type "ftp" protocol tcp } chain input { type filter hook input priority 4; ct helper set "ftp" } } %nft -f test.nft %nft list ruleset we can see the below messages: [ 916.286233] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 916.294777] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 916.302613] Modules linked in: nft_objref nf_conntrack_sip nf_conntrack_snmp nf_conntrack_broadcast nf_conntrack_ftp nft_ct nf_conntrack nf_tables nfnetlink [last unloaded: nfnetlink] [ 916.318758] CPU: 1 PID: 2093 Comm: nft Not tainted 4.17.0-rc4+ #181 [ 916.326772] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015 [ 916.338773] RIP: 0010:strlen+0x1a/0x90 [ 916.342781] RSP: 0018:ffff88010ff0f2f8 EFLAGS: 00010292 [ 916.346773] RAX: dffffc0000000000 RBX: ffff880119b26ee8 RCX: ffff88010c150038 [ 916.354777] RDX: 0000000000000002 RSI: ffff880119b26ee8 RDI: 0000000000000010 [ 916.362773] RBP: 0000000000000010 R08: 0000000000007e88 R09: ffff88010c15003c [ 916.370773] R10: ffff88010c150037 R11: ffffed002182a007 R12: ffff88010ff04040 [ 916.378779] R13: 0000000000000010 R14: ffff880119b26f30 R15: ffff88010ff04110 [ 916.387265] FS: 00007f57a1997700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000 [ 916.394785] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 916.402778] CR2: 00007f57a0ac80f0 CR3: 000000010ff02000 CR4: 00000000001006e0 [ 916.410772] Call Trace: [ 916.414787] nft_ct_helper_obj_dump+0x94/0x200 [nft_ct] [ 916.418779] ? nft_ct_set_eval+0x560/0x560 [nft_ct] [ 916.426771] ? memset+0x1f/0x40 [ 916.426771] ? __nla_reserve+0x92/0xb0 [ 916.434774] ? memcpy+0x34/0x50 [ 916.434774] nf_tables_fill_obj_info+0x484/0x860 [nf_tables] [ 916.442773] ? __nft_release_basechain+0x600/0x600 [nf_tables] [ 916.450779] ? lock_acquire+0x193/0x380 [ 916.454771] ? lock_acquire+0x193/0x380 [ 916.458789] ? nf_tables_dump_obj+0x148/0xcb0 [nf_tables] [ 916.462777] nf_tables_dump_obj+0x5f0/0xcb0 [nf_tables] [ 916.470769] ? __alloc_skb+0x30b/0x500 [ 916.474779] netlink_dump+0x752/0xb50 [ 916.478775] __netlink_dump_start+0x4d3/0x750 [ 916.482784] nf_tables_getobj+0x27a/0x930 [nf_tables] [ 916.490774] ? nft_obj_notify+0x100/0x100 [nf_tables] [ 916.494772] ? nf_tables_getobj+0x930/0x930 [nf_tables] [ 916.502579] ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables] [ 916.506774] ? nft_obj_notify+0x100/0x100 [nf_tables] [ 916.514808] nfnetlink_rcv_msg+0x8ab/0xa86 [nfnetlink] [ 916.518771] ? nfnetlink_rcv_msg+0x550/0xa86 [nfnetlink] [ 916.526782] netlink_rcv_skb+0x23e/0x360 [ 916.530773] ? nfnetlink_bind+0x200/0x200 [nfnetlink] [ 916.534778] ? debug_check_no_locks_freed+0x280/0x280 [ 916.542770] ? netlink_ack+0x870/0x870 [ 916.546786] ? ns_capable_common+0xf4/0x130 [ 916.550765] nfnetlink_rcv+0x172/0x16c0 [nfnetlink] [ 916.554771] ? sched_clock_local+0xe2/0x150 [ 916.558774] ? sched_clock_cpu+0x144/0x180 [ 916.566575] ? lock_acquire+0x380/0x380 [ 916.570775] ? sched_clock_local+0xe2/0x150 [ 916.574765] ? nfnetlink_net_init+0x130/0x130 [nfnetlink] [ 916.578763] ? sched_clock_cpu+0x144/0x180 [ 916.582770] ? lock_acquire+0x193/0x380 [ 916.590771] ? lock_acquire+0x193/0x380 [ 916.594766] ? lock_acquire+0x380/0x380 [ 916.598760] ? netlink_deliver_tap+0x262/0xa60 [ 916.602766] ? lock_acquire+0x193/0x380 [ 916.606766] netlink_unicast+0x3ef/0x5a0 [ 916.610771] ? netlink_attachskb+0x630/0x630 [ 916.614763] netlink_sendmsg+0x72a/0xb00 [ 916.618769] ? netlink_unicast+0x5a0/0x5a0 [ 916.626766] ? _copy_from_user+0x92/0xc0 [ 916.630773] __sys_sendto+0x202/0x300 [ 916.634772] ? __ia32_sys_getpeername+0xb0/0xb0 [ 916.638759] ? lock_acquire+0x380/0x380 [ 916.642769] ? lock_acquire+0x193/0x380 [ 916.646761] ? finish_task_switch+0xf4/0x560 [ 916.650763] ? __schedule+0x582/0x19a0 [ 916.655301] ? __sched_text_start+0x8/0x8 [ 916.655301] ? up_read+0x1c/0x110 [ 916.655301] ? __do_page_fault+0x48b/0xaa0 [ 916.655301] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe [ 916.655301] __x64_sys_sendto+0xdd/0x1b0 [ 916.655301] do_syscall_64+0x96/0x3d0 [ 916.655301] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 916.655301] RIP: 0033:0x7f57a0ff5e03 [ 916.655301] RSP: 002b:00007fff6367e0a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 916.655301] RAX: ffffffffffffffda RBX: 00007fff6367f1e0 RCX: 00007f57a0ff5e03 [ 916.655301] RDX: 0000000000000020 RSI: 00007fff6367e110 RDI: 0000000000000003 [ 916.655301] RBP: 00007fff6367e100 R08: 00007f57a0ce9160 R09: 000000000000000c [ 916.655301] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff6367e110 [ 916.655301] R13: 0000000000000020 R14: 00007f57a153c610 R15: 0000562417258de0 [ 916.655301] Code: ff ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 fa 53 48 c1 ea 03 48 b8 00 00 00 00 00 fc ff df 48 89 fd 48 83 ec 08 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f [ 916.655301] RIP: strlen+0x1a/0x90 RSP: ffff88010ff0f2f8 [ 916.771929] ---[ end trace 1065e048e72479fe ]--- [ 916.777204] Kernel panic - not syncing: Fatal exception [ 916.778158] Kernel Offset: 0x14000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2018-05-2616-43/+125
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "16 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kasan: fix memory hotplug during boot kasan: free allocated shadow memory on MEM_CANCEL_ONLINE checkpatch: fix macro argument precedence test init/main.c: include <linux/mem_encrypt.h> kernel/sys.c: fix potential Spectre v1 issue mm/memory_hotplug: fix leftover use of struct page during hotplug proc: fix smaps and meminfo alignment mm: do not warn on offline nodes unless the specific node is explicitly requested mm, memory_hotplug: make has_unmovable_pages more robust mm/kasan: don't vfree() nonexistent vm_area MAINTAINERS: change hugetlbfs maintainer and update files ipc/shm: fix shmat() nil address after round-down when remapping Revert "ipc/shm: Fix shmat mmap nil-page protection" idr: fix invalid ptr dereference on item delete ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio" mm: fix nr_rotate_swap leak in swapon() error case
| * | | kasan: fix memory hotplug during bootDavid Hildenbrand2018-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using module_init() is wrong. E.g. ACPI adds and onlines memory before our memory notifier gets registered. This makes sure that ACPI memory detected during boot up will not result in a kernel crash. Easily reproducible with QEMU, just specify a DIMM when starting up. Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com Fixes: 786a8959912e ("kasan: disable memory hotplug") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | kasan: free allocated shadow memory on MEM_CANCEL_ONLINEDavid Hildenbrand2018-05-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have to free memory again when we cancel onlining, otherwise a later onlining attempt will fail. Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | checkpatch: fix macro argument precedence testJoe Perches2018-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | checkpatch's macro argument precedence test is broken so fix it. Link: http://lkml.kernel.org/r/5dd900e9197febc1995604bb33c23c136d8b33ce.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | init/main.c: include <linux/mem_encrypt.h>Mathieu Malaterre2018-05-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit c7753208a94c ("x86, swiotlb: Add memory encryption support") a call to function `mem_encrypt_init' was added. Include prototype defined in header <linux/mem_encrypt.h> to prevent a warning reported during compilation with W=1: init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes] Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | kernel/sys.c: fix potential Spectre v1 issueGustavo A. R. Silva2018-05-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `resource' can be controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap) kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap) Fix this by sanitizing *resource* before using it to index current->signal->rlim Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm/memory_hotplug: fix leftover use of struct page during hotplugJonathan Cameron2018-05-263-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The case of a new numa node got missed in avoiding using the node info from page_struct during hotplug. In this path we have a call to register_mem_sect_under_node (which allows us to specify it is hotplug so don't change the node), via link_mem_sections which unfortunately does not. Fix is to pass check_nid through link_mem_sections as well and disable it in the new numa node path. Note the bug only 'sometimes' manifests depending on what happens to be in the struct page structures - there are lots of them and it only needs to match one of them. The result of the bug is that (with a new memory only node) we never successfully call register_mem_sect_under_node so don't get the memory associated with the node in sysfs and meminfo for the node doesn't report it. It came up whilst testing some arm64 hotplug patches, but appears to be universal. Whilst I'm triggering it by removing then reinserting memory to a node with no other elements (thus making the node disappear then appear again), it appears it would happen on hotplugging memory where there was none before and it doesn't seem to be related the arm64 patches. These patches call __add_pages (where most of the issue was fixed by Pavel's patch). If there is a node at the time of the __add_pages call then all is well as it calls register_mem_sect_under_node from there with check_nid set to false. Without a node that function returns having not done the sysfs related stuff as there is no node to use. This is expected but it is the resulting path that fails... Exact path to the problem is as follows: mm/memory_hotplug.c: add_memory_resource() The node is not online so we enter the 'if (new_node)' twice, on the second such block there is a call to link_mem_sections which calls into drivers/node.c: link_mem_sections() which calls drivers/node.c: register_mem_sect_under_node() which calls get_nid_for_pfn and keeps trying until the output of that matches the expected node (passed all the way down from add_memory_resource) It is effectively the same fix as the one referred to in the fixes tag just in the code path for a new node where the comments point out we have to rerun the link creation because it will have failed in register_new_memory (as there was no node at the time). (actually that comment is wrong now as we don't have register_new_memory any more it got renamed to hotplug_memory_register in Pavel's patch). Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com Fixes: fc44f7f9231a ("mm/memory_hotplug: don't read nid from struct page during hotplug") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | proc: fix smaps and meminfo alignmentHugh Dickins2018-05-261-5/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 4.17-rc /proc/meminfo and /proc/<pid>/smaps look ugly: single-digit numbers (commonly 0) are misaligned. Remove seq_put_decimal_ull_width()'s leftover optimization for single digits: it's wrong now that num_to_str() takes care of the width. Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805241554210.1326@eggly.anvils Fixes: d1be35cb6f96 ("proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm: do not warn on offline nodes unless the specific node is explicitly ↵Michal Hocko2018-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | requested Oscar has noticed that we splat WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9 [...] CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G W E 4.17.0-rc5-next-20180517-1-default+ #66 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: kacpi_hotplug acpi_hotplug_work_fn Call Trace: vmemmap_populate+0xf2/0x2ae sparse_mem_map_populate+0x28/0x35 sparse_add_one_section+0x4c/0x187 __add_pages+0xe7/0x1a0 add_pages+0x16/0x70 add_memory_resource+0xa3/0x1d0 add_memory+0xe4/0x110 acpi_memory_device_add+0x134/0x2e0 acpi_bus_attach+0xd9/0x190 acpi_bus_scan+0x37/0x70 acpi_device_hotplug+0x389/0x4e0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x146/0x340 worker_thread+0x47/0x3e0 kthread+0xf5/0x130 ret_from_fork+0x35/0x40 when adding memory to a node that is currently offline. The VM_WARN_ON is just too loud without a good reason. In this particular case we are doing alloc_pages_node(node, GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_NOWARN, order) so we do not insist on allocating from the given node (it is more a hint) so we can fall back to any other populated node and moreover we explicitly ask to not warn for the allocation failure. Soften the warning only to cases when somebody asks for the given node explicitly by __GFP_THISNODE. Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Oscar Salvador <osalvador@techadventures.net> Tested-by: Oscar Salvador <osalvador@techadventures.net> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm, memory_hotplug: make has_unmovable_pages more robustMichal Hocko2018-05-261-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oscar has reported: : Due to an unfortunate setting with movablecore, memblocks containing bootmem : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable. : So while trying to remove that memory, the system failed in do_migrate_range : and __offline_pages never returned. : : This can be reproduced by running : qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M : and movablecore=4G kernel command line : : linux kernel: BIOS-provided physical RAM map: : linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable : linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved : linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved : linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable : linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved : linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved : linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved : linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable : linux kernel: NX (Execute Disable) protection: active : linux kernel: SMBIOS 2.8 present. : linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org : linux kernel: Hypervisor detected: KVM : linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved : linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable : linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000 : : linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0 : linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1 : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff] : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff] : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff] : linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff] : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug : linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0 : linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0 : linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff] : linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff] : : zoneinfo shows that the zone movable is placed into both numa nodes: : Node 0, zone Movable : pages free 160140 : min 1823 : low 2278 : high 2733 : spanned 262144 : present 262144 : managed 245670 : Node 1, zone Movable : pages free 448427 : min 3827 : low 4783 : high 5739 : spanned 524288 : present 524288 : managed 515766 Note how only Node 0 has a hutplugable memory region which would rule it out from the early memblock allocations (most likely memmap). Node1 will surely contain memmaps on the same node and those would prevent offlining to succeed. So this is arguably a configuration issue. Although one could argue that we should be more clever and rule early allocations from the zone movable. This would be correct but probably not worth the effort considering what a hack movablecore is. Anyway, We could do better for those cases though. We rely on start_isolate_page_range resp. has_unmovable_pages to do their job. The first one isolates the whole range to be offlined so that we do not allocate from it anymore and the later makes sure we are not stumbling over non-migrateable pages. has_unmovable_pages is overly optimistic, however. It doesn't check all the pages if we are withing zone_movable because we rely that those pages will be always migrateable. As it turns out we are still not perfect there. While bootmem pages in zonemovable sound like a clear bug which should be fixed let's remove the optimization for now and warn if we encounter unmovable pages in zone_movable in the meantime. That should help for now at least. Btw. this wasn't a real problem until commit 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early") because we used to have a small number of retries and then failed. This turned out to be too fragile though. Link: http://lkml.kernel.org/r/20180523125555.30039-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Oscar Salvador <osalvador@techadventures.net> Tested-by: Oscar Salvador <osalvador@techadventures.net> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm/kasan: don't vfree() nonexistent vm_areaAndrey Ryabinin2018-05-261-2/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KASAN uses different routines to map shadow for hot added memory and memory obtained in boot process. Attempt to offline memory onlined by normal boot process leads to this: Trying to vfree() nonexistent vm area (000000005d3b34b9) WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190 Call Trace: kasan_mem_notifier+0xad/0xb9 notifier_call_chain+0x166/0x260 __blocking_notifier_call_chain+0xdb/0x140 __offline_pages+0x96a/0xb10 memory_subsys_offline+0x76/0xc0 device_offline+0xb8/0x120 store_mem_state+0xfa/0x120 kernfs_fop_write+0x1d5/0x320 __vfs_write+0xd4/0x530 vfs_write+0x105/0x340 SyS_write+0xb0/0x140 Obviously we can't call vfree() to free memory that wasn't allocated via vmalloc(). Use find_vm_area() to see if we can call vfree(). Unfortunately it's a bit tricky to properly unmap and free shadow allocated during boot, so we'll have to keep it. If memory will come online again that shadow will be reused. Matthew asked: how can you call vfree() on something that isn't a vmalloc address? vfree() is able to free any address returned by __vmalloc_node_range(). And __vmalloc_node_range() gives you any address you ask. It doesn't have to be an address in [VMALLOC_START, VMALLOC_END] range. That's also how the module_alloc()/module_memfree() works on architectures that have designated area for modules. [aryabinin@virtuozzo.com: improve comments] Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com [akpm@linux-foundation.org: fix typos, reflow comment] Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Paul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | MAINTAINERS: change hugetlbfs maintainer and update filesMike Kravetz2018-05-261-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current hugetlbfs maintainer has not been active for more than a few years. I have been been active in this area for more than two years and plan to remain active in the foreseeable future. Also, update the hugetlbfs entry to include linux-mm mail list and additional hugetlbfs related files. hugetlb.c and hugetlb.h are not 100% hugetlbfs, but a majority of their content is hugetlbfs related. Link: http://lkml.kernel.org/r/20180518225236.19079-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Nadia Yvette Chambers <nyc@holomorphy.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | ipc/shm: fix shmat() nil address after round-down when remappingDavidlohr Bueso2018-05-261-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | shmat()'s SHM_REMAP option forbids passing a nil address for; this is in fact the very first thing we check for. Andrea reported that for SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check, but we need to check again if the address was rounded down to nil. As of this patch, such cases will return -EINVAL. Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805 Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | Revert "ipc/shm: Fix shmat mmap nil-page protection"Davidlohr Bueso2018-05-261-7/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "ipc/shm: shmat() fixes around nil-page". These patches fix two issues reported[1] a while back by Joe and Andrea around how shmat(2) behaves with nil-page. The first reverts a commit that it was incorrectly thought that mapping nil-page (address=0) was a no no with MAP_FIXED. This is not the case, with the exception of SHM_REMAP; which is address in the second patch. I chose two patches because it is easier to backport and it explicitly reverts bogus behaviour. Both patches ought to be in -stable and ltp testcases need updated (the added testcase around the cve can be modified to just test for SHM_RND|SHM_REMAP). [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805 This patch (of 2): Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection") worked on the idea that we should not be mapping as root addr=0 and MAP_FIXED. However, it was reported that this scenario is in fact valid, thus making the patch both bogus and breaks userspace as well. For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem initialization[1]. [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347 Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection") Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | idr: fix invalid ptr dereference on item deleteMatthew Wilcox2018-05-262-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the radix tree underlying the IDR happens to be full and we attempt to remove an id which is larger than any id in the IDR, we will call __radix_tree_delete() with an uninitialised 'slot' pointer, at which point anything could happen. This was easiest to hit with a single entry at id 0 and attempting to remove a non-0 id, but it could have happened with 64 entries and attempting to remove an id >= 64. Roman said: The syzcaller test boils down to opening /dev/kvm, creating an eventfd, and calling a couple of KVM ioctls. None of this requires superuser. And the result is dereferencing an uninitialized pointer which is likely a crash. The specific path caught by syzbot is via KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are other user-triggerable paths, so cc:stable is probably justified. Matthew added: We have around 250 calls to idr_remove() in the kernel today. Many of them pass an ID which is embedded in the object they're removing, so they're safe. Picking a few likely candidates: drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl. drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar drivers/atm/nicstar.c could be taken down by a handcrafted packet Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree") Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com> Debugged-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting ↵Changwei Ge2018-05-261-10/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | incorrect bio" This reverts commit ba16ddfbeb9d ("ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"). In my testing, this patch introduces a problem that mkfs can't have slots more than 16 with 4k block size. And the original logic is safe actually with the situation it mentions so revert this commit. Attach test log: (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ffffea0002d7ed40, len 0, vec_len 4096, vec_start 0,bi_sector 8192 (mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5 (mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5 (mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5 Link: http://lkml.kernel.org/r/SIXPR06MB0461721F398A5A92FC68C39ED5920@SIXPR06MB0461.apcprd06.prod.outlook.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | mm: fix nr_rotate_swap leak in swapon() error caseOmar Sandoval2018-05-261-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If swapon() fails after incrementing nr_rotate_swap, we don't decrement it and thus effectively leak it. Make sure we decrement it if we incremented it. Link: http://lkml.kernel.org/r/b6fe6b879f17fa68eee6cbd876f459f6e5e33495.1526491581.git.osandov@fb.com Fixes: 81a0298bdfab ("mm, swap: don't use VMA based swap readahead if HDD is used as swap") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>