summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'akpm' (patches from Andrew)Linus Torvalds2015-06-254-7/+129
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge first patchbomb from Andrew Morton: - a few misc things - ocfs2 udpates - kernel/watchdog.c feature work (took ages to get right) - most of MM. A few tricky bits are held up and probably won't make 4.2. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits) mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() mm, thp: respect MPOL_PREFERRED policy with non-local node tmpfs: truncate prealloc blocks past i_size mm/memory hotplug: print the last vmemmap region at the end of hot add memory mm/mmap.c: optimization of do_mmap_pgoff function mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan mm: kmemleak: avoid deadlock on the kmemleak object insertion error path mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup() mm: kmemleak: fix delete_object_*() race when called on the same memory block mm: kmemleak: allow safe memory scanning during kmemleak disabling memcg: convert mem_cgroup->under_oom from atomic_t to int memcg: remove unused mem_cgroup->oom_wakeups frontswap: allow multiple backends x86, mirror: x86 enabling - find mirrored memory ranges mm/memblock: allocate boot time data structures from mirrored memory mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute mm: do not ignore mapping_gfp_mask in page cache allocation paths mm/cma.c: fix typos in comments mm/oom_kill.c: print points as unsigned int mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages ...
| * mm: oom_kill: clean up victim marking and exiting interfacesJohannes Weiner2015-06-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename unmark_oom_victim() to exit_oom_victim(). Marking and unmarking are related in functionality, but the interface is not symmetrical at all: one is an internal OOM killer function used during the killing, the other is for an OOM victim to signal its own death on exit later on. This has locking implications, see follow-up changes. While at it, rename mark_tsk_oom_victim() to mark_oom_victim(), which is easier on the eye. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * watchdog: add watchdog_cpumask sysctl to assist nohzChris Metcalf2015-06-253-5/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the default behavior of watchdog so it only runs on the housekeeping cores when nohz_full is enabled at build and boot time. Allow modifying the set of cores the watchdog is currently running on with a new kernel.watchdog_cpumask sysctl. In the current system, the watchdog subsystem runs a periodic timer that schedules the watchdog kthread to run. However, nohz_full cores are designed to allow userspace application code running on those cores to have 100% access to the CPU. So the watchdog system prevents the nohz_full application code from being able to run the way it wants to, thus the motivation to suppress the watchdog on nohz_full cores, which this patchset provides by default. However, if we disable the watchdog globally, then the housekeeping cores can't benefit from the watchdog functionality. So we allow disabling it only on some cores. See Documentation/lockup-watchdogs.txt for more information. [jhubbard@nvidia.com: fix a watchdog crash in some configurations] Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * smpboot: allow excluding cpus from the smpboot threadsChris Metcalf2015-06-251-1/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch series allows the watchdog to run by default only on the housekeeping cores when nohz_full is in effect; this seems to be a good compromise short of turning it off completely (since the nohz_full cores can't tolerate a watchdog). To provide customizability, we add /proc/sys/kernel/watchdog_cpumask so that the set of cores running the watchdog can be tuned to different values after bootup. To implement this customizability, we add a new smpboot_update_cpumask_percpu_thread() API to the smpboot_thread subsystem that lets us park or unpark "unwanted" threads. And now that threads can be parked for long periods of time, we tweak the /proc/<pid>/stat and /proc/<pid>/status code so parked threads aren't reported as running, which is otherwise confusing. This patch (of 3): This change allows some cores to be excluded from running the smp_hotplug_thread tasks. The following commit to update kernel/watchdog.c to use this functionality is the motivating example, and more information on the motivation is provided there. A new smp_hotplug_thread field is introduced, "cpumask", which is cpumask field managed by the smpboot subsystem that indicates whether or not the given smp_hotplug_thread should run on that core; the cpumask is checked when deciding whether to unpark the thread. To limit the cpumask to less than cpu_possible, you must call smpboot_update_cpumask_percpu_thread() after registering. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2015-06-257-118/+413
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Add TX fast path in mac80211, from Johannes Berg. 2) Add TSO/GRO support to ibmveth, from Thomas Falcon 3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau. 4) Lots of new rhashtable tests, from Thomas Graf. 5) Run ingress qdisc lockless, from Alexei Starovoitov. 6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet. 7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov. 8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck. 9) Move page frag allocator under mm/, also from Alexander. 10) Add xmit_more support to hv_netvsc, from KY Srinivasan. 11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler. 12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko. 13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn. 14) Add netdev driver for GENEVE tunnels, from John W Linville. 15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso. 16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet. 17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann. 18) Add tail call support to BPF, from Alexei Starovoitov. 19) Add several pktgen helper scripts, from Jesper Dangaard Brouer. 20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa. 21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet. 22) Add Cavium ThunderX driver, from Sunil Goutham. 23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov. 24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai. 25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu. 26) Add more entropy inputs to flow dissector, from Tom Herbert. 27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen. 28) Convert ipset over to RCU locking, from Jozsef Kadlecsik. 29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...
| * | bpf: let kprobe programs use bpf_get_smp_processor_id() helperAlexei Starovoitov2015-06-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | It's useful to do per-cpu histograms. Suggested-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bpf: allow networking programs to use bpf_trace_printk() for debuggingAlexei Starovoitov2015-06-162-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpf_trace_printk() is a helper function used to debug eBPF programs. Let socket and TC programs use it as well. Note, it's DEBUG ONLY helper. If it's used in the program, the kernel will print warning banner to make sure users don't use it in production. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bpf: introduce current->pid, tgid, uid, gid, comm accessorsAlexei Starovoitov2015-06-163-0/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | eBPF programs attached to kprobes need to filter based on current->pid, uid and other fields, so introduce helper functions: u64 bpf_get_current_pid_tgid(void) Return: current->tgid << 32 | current->pid u64 bpf_get_current_uid_gid(void) Return: current_gid << 32 | current_uid bpf_get_current_comm(char *buf, int size_of_buf) stores current->comm into buf They can be used from the programs attached to TC as well to classify packets based on current task fields. Update tracex2 example to print histogram of write syscalls for each process instead of aggregated for all. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-06-142-2/+2
| |\ \
| * \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-06-093-3/+20
| |\ \ \
| * | | | bpf: allow programs to write to certain skb fieldsAlexei Starovoitov2015-06-071-9/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allow programs read/write skb->mark, tc_index fields and ((struct qdisc_skb_cb *)cb)->data. mark and tc_index are generically useful in TC. cb[0]-cb[4] are primarily used to pass arguments from one program to another called via bpf_tail_call() which can be seen in sockex3_kern.c example. All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs. mark, tc_index are writeable from tc_cls_act only. cb[0]-cb[4] are writeable by both sockets and tc_cls_act. Add verifier tests and improve sample code. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-06-022-6/+11
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/phy/amd-xgbe-phy.c drivers/net/wireless/iwlwifi/Kconfig include/net/mac80211.h iwlwifi/Kconfig and mac80211.h were both trivial overlapping changes. The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and the bug fix that happened on the 'net' side is already integrated into the rest of the amd-xgbe driver. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ebpf: misc core cleanupDaniel Borkmann2015-06-012-48/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Besides others, move bpf_tail_call_proto to the remaining definitions of other protos, improve comments a bit (i.e. remove some obvious ones, where the code is already self-documenting, add objectives for others), simplify bpf_prog_array_compatible() a bit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ebpf: allow bpf_ktime_get_ns_proto also for networkingDaniel Borkmann2015-06-013-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As this is already exported from tracing side via commit d9847d310ab4 ("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might as well want to move it to the core, so also networking users can make use of it, e.g. to measure diffs for certain flows from ingress/egress. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | bpf: add missing rcu protection when releasing programs from prog_arrayAlexei Starovoitov2015-05-312-3/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Normally the program attachment place (like sockets, qdiscs) takes care of rcu protection and calls bpf_prog_put() after a grace period. The programs stored inside prog_array may not be attached anywhere, so prog_array needs to take care of preserving rcu protection. Otherwise bpf_tail_call() will race with bpf_prog_put(). To solve that introduce bpf_prog_put_rcu() helper function and use it in 3 places where unattached program can decrement refcnt: closing program fd, deleting/replacing program in prog_array. Fixes: 04fd61ab36ec ("bpf: allow bpf programs to tail-call other bpf programs") Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-05-234-50/+82
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/cadence/macb.c drivers/net/phy/phy.c include/linux/skbuff.h net/ipv4/tcp.c net/switchdev/switchdev.c Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD} renaming overlapping with net-next changes of various sorts. phy.c was a case of two changes, one adding a local variable to a function whilst the second was removing one. tcp.c overlapped a deadlock fix with the addition of new tcp_info statistic values. macb.c involved the addition of two zyncq device entries. skbuff.h involved adding back ipv4_daddr to nf_bridge_info whilst net-next changes put two other existing members of that struct into a union. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | bpf: allow bpf programs to tail-call other bpf programsAlexei Starovoitov2015-05-215-8/+220
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | introduce bpf_tail_call(ctx, &jmp_table, index) helper function which can be used from BPF programs like: int bpf_prog(struct pt_regs *ctx) { ... bpf_tail_call(ctx, &jmp_table, index); ... } that is roughly equivalent to: int bpf_prog(struct pt_regs *ctx) { ... if (jmp_table[index]) return (*jmp_table[index])(ctx); ... } The important detail that it's not a normal call, but a tail call. The kernel stack is precious, so this helper reuses the current stack frame and jumps into another BPF program without adding extra call frame. It's trivially done in interpreter and a bit trickier in JITs. In case of x64 JIT the bigger part of generated assembler prologue is common for all programs, so it is simply skipped while jumping. Other JITs can do similar prologue-skipping optimization or do stack unwind before jumping into the next program. bpf_tail_call() arguments: ctx - context pointer jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table index - index in the jump table Since all BPF programs are idenitified by file descriptor, user space need to populate the jmp_table with FDs of other BPF programs. If jmp_table[index] is empty the bpf_tail_call() doesn't jump anywhere and program execution continues as normal. New BPF_MAP_TYPE_PROG_ARRAY map type is introduced so that user space can populate this jmp_table array with FDs of other bpf programs. Programs can share the same jmp_table array or use multiple jmp_tables. The chain of tail calls can form unpredictable dynamic loops therefore tail_call_cnt is used to limit the number of calls and currently is set to 32. Use cases: Acked-by: Daniel Borkmann <daniel@iogearbox.net> ========== - simplify complex programs by splitting them into a sequence of small programs - dispatch routine For tracing and future seccomp the program may be triggered on all system calls, but processing of syscall arguments will be different. It's more efficient to implement them as: int syscall_entry(struct seccomp_data *ctx) { bpf_tail_call(ctx, &syscall_jmp_table, ctx->nr /* syscall number */); ... default: process unknown syscall ... } int sys_write_event(struct seccomp_data *ctx) {...} int sys_read_event(struct seccomp_data *ctx) {...} syscall_jmp_table[__NR_write] = sys_write_event; syscall_jmp_table[__NR_read] = sys_read_event; For networking the program may call into different parsers depending on packet format, like: int packet_parser(struct __sk_buff *skb) { ... parse L2, L3 here ... __u8 ipproto = load_byte(skb, ... offsetof(struct iphdr, protocol)); bpf_tail_call(skb, &ipproto_jmp_table, ipproto); ... default: process unknown protocol ... } int parse_tcp(struct __sk_buff *skb) {...} int parse_udp(struct __sk_buff *skb) {...} ipproto_jmp_table[IPPROTO_TCP] = parse_tcp; ipproto_jmp_table[IPPROTO_UDP] = parse_udp; - for TC use case, bpf_tail_call() allows to implement reclassify-like logic - bpf_map_update_elem/delete calls into BPF_MAP_TYPE_PROG_ARRAY jump table are atomic, so user space can build chains of BPF programs on the fly Implementation details: ======================= - high performance of bpf_tail_call() is the goal. It could have been implemented without JIT changes as a wrapper on top of BPF_PROG_RUN() macro, but with two downsides: . all programs would have to pay performance penalty for this feature and tail call itself would be slower, since mandatory stack unwind, return, stack allocate would be done for every tailcall. . tailcall would be limited to programs running preempt_disabled, since generic 'void *ctx' doesn't have room for 'tail_call_cnt' and it would need to be either global per_cpu variable accessed by helper and by wrapper or global variable protected by locks. In this implementation x64 JIT bypasses stack unwind and jumps into the callee program after prologue. - bpf_prog_array_compatible() ensures that prog_type of callee and caller are the same and JITed/non-JITed flag is the same, since calling JITed program from non-JITed is invalid, since stack frames are different. Similarly calling kprobe type program from socket type program is invalid. - jump table is implemented as BPF_MAP_TYPE_PROG_ARRAY to reuse 'map' abstraction, its user space API and all of verifier logic. It's in the existing arraymap.c file, since several functions are shared with regular array map. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-05-134-13/+13
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Four minor merge conflicts: 1) qca_spi.c renamed the local variable used for the SPI device from spi_device to spi, meanwhile the spi_set_drvdata() call got moved further up in the probe function. 2) Two changes were both adding new members to codel params structure, and thus we had overlapping changes to the initializer function. 3) 'net' was making a fix to sk_release_kernel() which is completely removed in 'net-next'. 4) In net_namespace.c, the rtnl_net_fill() call for GET operations had the command value fixed, meanwhile 'net-next' adjusted the argument signature a bit. This also matches example merge resolutions posted by Stephen Rothwell over the past two days. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | seccomp, filter: add and use bpf_prog_create_from_user from seccompDaniel Borkmann2015-05-091-30/+12Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Seccomp has always been a special candidate when it comes to preparation of its filters in seccomp_prepare_filter(). Due to the extra checks and filter rewrite it partially duplicates code and has BPF internals exposed. This patch adds a generic API inside the BPF code code that seccomp can use and thus keep it's filter preparation code minimal and better maintainable. The other side-effect is that now classic JITs can add seccomp support as well by only providing a BPF_LDX | BPF_W | BPF_ABS translation. Tested with seccomp and BPF test suites. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | seccomp: simplify seccomp_prepare_filter and reuse bpf_prepare_filterNicolas Schichan2015-05-091-46/+22Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the calls to bpf_check_classic(), bpf_convert_filter() and bpf_migrate_runtime() and let bpf_prepare_filter() take care of that instead. seccomp_check_filter() is passed to bpf_prepare_filter() so that it gets called from there, after bpf_check_classic(). We can now remove exposure of two internal classic BPF functions previously used by seccomp. The export of bpf_check_classic() symbol, previously known as sk_chk_filter(), was there since pre git times, and no in-tree module was using it, therefore remove it. Joint work with Daniel Borkmann. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | Merge branch 'sched-hrtimers-for-linus' of ↵Linus Torvalds2015-06-256-492/+587
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: "This series of scheduler updates depends on sched/core and timers/core branches, which are already in your tree: - Scheduler balancing overhaul to plug a hard to trigger race which causes an oops in the balancer (Peter Zijlstra) - Lockdep updates which are related to the balancing updates (Peter Zijlstra)" * 'sched-hrtimers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched,lockdep: Employ lock pinning lockdep: Implement lock pinning lockdep: Simplify lock_release() sched: Streamline the task migration locking a little sched: Move code around sched,dl: Fix sched class hopping CBS hole sched, dl: Convert switched_{from, to}_dl() / prio_changed_dl() to balance callbacks sched,dl: Remove return value from pull_dl_task() sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks sched,rt: Remove return value from pull_rt_task() sched: Allow balance callbacks for check_class_changed() sched: Use replace normalize_task() with __sched_setscheduler() sched: Replace post_schedule with a balance callback list
| * | | | | | | | sched,lockdep: Employ lock pinningPeter Zijlstra2015-06-195-8/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Employ the new lockdep lock pinning annotation to ensure no 'accidental' lock-breaks happen with rq->lock. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124744.003233193@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | lockdep: Implement lock pinningPeter Zijlstra2015-06-191-0/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a lockdep annotation that WARNs if you 'accidentially' unlock a lock. This is especially helpful for code with callbacks, where the upper layer assumes a lock remains taken but a lower layer thinks it maybe can drop and reacquire the lock. By unwittingly breaking up the lock, races can be introduced. Lock pinning is a lockdep annotation that helps with this, when you lockdep_pin_lock() a held lock, any unlock without a lockdep_unpin_lock() will produce a WARN. Think of this as a relative of lockdep_assert_held(), except you don't only assert its held now, but ensure it stays held until you release your assertion. RFC: a possible alternative API would be something like: int cookie = lockdep_pin_lock(&foo); ... lockdep_unpin_lock(&foo, cookie); Where we pick a random number for the pin_count; this makes it impossible to sneak a lock break in without also passing the right cookie along. I've not done this because it ends up generating code for !LOCKDEP, esp. if you need to pass the cookie around for some reason. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124743.906731065@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | lockdep: Simplify lock_release()Peter Zijlstra2015-06-191-101/+18Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lock_release() takes this nested argument that's mostly pointless these days, remove the implementation but leave the argument a rudiment for now. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124743.840411606@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched: Streamline the task migration locking a littlePeter Zijlstra2015-06-191-42/+34Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The whole migrate_task{,s}() locking seems a little shaky, there's a lot of dropping an require happening. Pull the locking up into the callers as far as possible to streamline the lot. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124743.755256708@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched: Move code aroundPeter Zijlstra2015-06-191-186/+178Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to reworking set_cpus_allowed_ptr() move some code around. This also removes some superfluous #ifdefs and adds comments to some #endifs. text data bss dec hex filename 12211532 1738144 1081344 15031020 e55aec defconfig-build/vmlinux.pre 12211532 1738144 1081344 15031020 e55aec defconfig-build/vmlinux.post Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124743.662086684@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched,dl: Fix sched class hopping CBS holePeter Zijlstra2015-06-191-66/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We still have a few pending issues with the deadline code, one of which is that switching between scheduling classes can 'leak' CBS state. Close the hole by retaining the current CBS state when leaving SCHED_DEADLINE and unconditionally programming the deadline timer. The timer will then reset the CBS state if the task is still !SCHED_DEADLINE by the time it hits. If the task left SCHED_DEADLINE it will not call task_dead_dl() and we'll not cancel the hrtimer, leaving us a pending timer in free space. Avoid this by giving the timer a task reference, this avoids littering the task exit path for this rather uncommon case. In order to do this, I had to move dl_task_offline_migration() below the replenishment, such that the task_rq()->lock fully covers that. While doing this, I noticed that it (was) buggy in assuming a task is enqueued and or we need to enqueue the task now. Fixing this means select_task_rq_dl() might encounter an offline rq -- look into that. As a result this kills cancel_dl_timer() which included a rq->lock break. Fixes: 40767b0dc768 ("sched/deadline: Fix deadline parameter modification handling") Cc: Wanpeng Li <wanpeng.li@linux.intel.com> Cc: Luca Abeni <luca.abeni@unitn.it> Cc: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: Luca Abeni <luca.abeni@unitn.it> Cc: Juri Lelli <juri.lelli@arm.com> Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124743.574192138@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched, dl: Convert switched_{from, to}_dl() / prio_changed_dl() to balance ↵Peter Zijlstra2015-06-191-21/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | callbacks Remove the direct {push,pull} balancing operations from switched_{from,to}_dl() / prio_changed_dl() and use the balance callback queue. Again, err on the side of too many reschedules; since too few is a hard bug while too many is just annoying. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.968262663@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched,dl: Remove return value from pull_dl_task()Peter Zijlstra2015-06-191-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to be able to use pull_dl_task() from a callback, we need to do away with the return value. Since the return value indicates if we should reschedule, do this inside the function. Since not all callers currently do this, this can increase the number of reschedules due rt balancing. Too many reschedules is not a correctness issues, too few are. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.859398977@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance ↵Peter Zijlstra2015-06-191-16/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | callbacks Remove the direct {push,pull} balancing operations from switched_{from,to}_rt() / prio_changed_rt() and use the balance callback queue. Again, err on the side of too many reschedules; since too few is a hard bug while too many is just annoying. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.766832367@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched,rt: Remove return value from pull_rt_task()Peter Zijlstra2015-06-191-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to be able to use pull_rt_task() from a callback, we need to do away with the return value. Since the return value indicates if we should reschedule, do this inside the function. Since not all callers currently do this, this can increase the number of reschedules due rt balancing. Too many reschedules is not a correctness issues, too few are. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.679002000@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched: Allow balance callbacks for check_class_changed()Peter Zijlstra2015-06-191-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to remove dropping rq->lock from the switched_{to,from}()/prio_changed() sched_class methods, run the balance callbacks after it. We need to remove dropping rq->lock because its buggy, suppose using sched_setattr()/sched_setscheduler() to change a running task from FIFO to OTHER. By the time we get to switched_from_rt() the task is already enqueued on the cfs runqueues. If switched_from_rt() does pull_rt_task() and drops rq->lock, load-balancing can come in and move our task @p to another rq. The subsequent switched_to_fair() still assumes @p is on @rq and bad things will happen. By using balance callbacks we delay the load-balancing operations {rt,dl}x{push,pull} until we've done all the important work and the task is fully set up. Furthermore, the balance callbacks do not know about @p, therefore they cannot get confused like this. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Link: http://lkml.kernel.org/r/20150611124742.615343911@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched: Use replace normalize_task() with __sched_setscheduler()Peter Zijlstra2015-06-191-42/+23Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce duplicate logic; normalize_task() is a simplified version of __sched_setscheduler(). Parametrize the difference and collapse. This reduces the amount of check_class_changed() sites. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.532642391@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | sched: Replace post_schedule with a balance callback listPeter Zijlstra2015-06-194-38/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generalize the post_schedule() stuff into a balance callback list. This allows us to more easily use it outside of schedule() and cross sched_class. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.424032725@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | Merge branch 'timers/core' into sched/hrtimersThomas Gleixner2015-06-1931-1069/+939Star
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge sched/core and timers/core so we can apply the sched balancing patch queue, which depends on both.
* | \ \ \ \ \ \ \ \ Merge branch 'sched-locking-for-linus' of ↵Linus Torvalds2015-06-243-34/+88
|\ \ \ \ \ \ \ \ \ \ | |_|_|_|_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "These locking updates depend on the alreay merged sched/core branch: - Lockless top waiter wakeup for rtmutex (Davidlohr) - Reduce hash bucket lock contention for PI futexes (Sebastian) - Documentation update (Davidlohr)" * 'sched-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Update stale plist comments futex: Lower the lock contention on the HB lock during wake up locking/rtmutex: Implement lockless top-waiter wakeup
| * | | | | | | | | locking/rtmutex: Update stale plist commentsDavidlohr Bueso2015-06-191-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... as of fb00aca4744 (rtmutex: Turn the plist into an rb-tree) we no longer use plists for queuing any waiters. Update stale comments. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1432056298-18738-4-git-send-email-dave@stgolabs.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | | futex: Lower the lock contention on the HB lock during wake upSebastian Andrzej Siewior2015-06-193-18/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wake_futex_pi() wakes the task before releasing the hash bucket lock (HB). The first thing the woken up task usually does is to acquire the lock which requires the HB lock. On SMP Systems this leads to blocking on the HB lock which is released by the owner shortly after. This patch rearranges the unlock path by first releasing the HB lock and then waking up the task. [ tglx: Fixed up the rtmutex unlock path ] Originally-from: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/20150617083350.GA2433@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | | locking/rtmutex: Implement lockless top-waiter wakeupDavidlohr Bueso2015-06-181-11/+10Star
| |/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark the task for later wakeup after the wait_lock has been released. This way, once the next task is awoken, it will have a better chance to of finding the wait_lock free when continuing executing in __rt_mutex_slowlock() when trying to acquire the rtmutex, calling try_to_take_rt_mutex(). Upon contended scenarios, other tasks attempting take the lock may acquire it first, right after the wait_lock is released, but (a) this can also occur with the current code, as it relies on the spinlock fairness, and (b) we are dealing with the top-waiter anyway, so it will always take the lock next. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1432056298-18738-2-git-send-email-dave@stgolabs.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | | | | Merge tag 'pm+acpi-4.2-rc1' of ↵Linus Torvalds2015-06-234-63/+78
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "The rework of backlight interface selection API from Hans de Goede stands out from the number of commits and the number of affected places perspective. The cpufreq core fixes from Viresh Kumar are quite significant too as far as the number of commits goes and because they should reduce CPU online/offline overhead quite a bit in the majority of cases. From the new featues point of view, the ACPICA update (to upstream revision 20150515) adding support for new ACPI 6 material to ACPICA is the one that matters the most as some new significant features will be based on it going forward. Also included is an update of the ACPI device power management core to follow ACPI 6 (which in turn reflects the Windows' device PM implementation), a PM core extension to support wakeup interrupts in a more generic way and support for the ACPI _CCA device configuration object. The rest is mostly fixes and cleanups all over and some documentation updates, including new DT bindings for Operating Performance Points. There is one fix for a regression introduced in the 4.1 cycle, but it adds quite a number of lines of code, it wasn't really ready before Thursday and you were on vacation, so I refrained from pushing it on the last minute for 4.1. Specifics: - ACPICA update to upstream revision 20150515 including basic support for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO, XENV, WPBT, NFIT, IORT), changes related to the other tables (DTRM, FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI, _MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore, Lv Zheng). - ACPI device power management core code update to follow ACPI 6 which reflects the ACPI device power management implementation in Windows (Rafael J Wysocki). - rework of the backlight interface selection logic to reduce the number of kernel command line options and improve the handling of DMI quirks that may be involved in that and to make the code generally more straightforward (Hans de Goede). - fixes for the ACPI Embedded Controller (EC) driver related to the handling of EC transactions (Lv Zheng). - fix for a regression related to the ACPI resources management and resulting from a recent change of ACPI initialization code ordering (Rafael J Wysocki). - fix for a system initialization regression related to ACPI introduced during the 3.14 cycle and caused by running the code that switches the platform over to the ACPI mode too early in the initialization sequence (Rafael J Wysocki). - support for the ACPI _CCA device configuration object related to DMA cache coherence (Suravee Suthikulpanit). - ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov). - ACPI battery driver cleanups (Luis Henriques, Mathias Krause). - ACPI processor driver cleanups (Hanjun Guo). - cleanups and documentation update related to the ACPI device properties interface based on _DSD (Rafael J Wysocki). - ACPI device power management fixes (Rafael J Wysocki). - assorted cleanups related to ACPI (Dominik Brodowski, Fabian Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki). - fix for a long-standing issue causing General Protection Faults to be generated occasionally on return to user space after resume from ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar). - fix to make the suspend core code return -EBUSY consistently in all cases when system suspend is aborted due to wakeup detection (Ruchi Kandoi). - support for automated device wakeup IRQ handling allowing drivers to make their PM support more starightforward (Tony Lindgren). - new tracepoints for suspend-to-idle tracing and rework of the prepare/complete callbacks tracing in the PM core (Todd E Brandt, Rafael J Wysocki). - wakeup sources framework enhancements (Jin Qian). - new macro for noirq system PM callbacks (Grygorii Strashko). - assorted cleanups related to system suspend (Rafael J Wysocki). - cpuidle core cleanups to make the code more efficient (Rafael J Wysocki). - powernv/pseries cpuidle driver update (Shilpasri G Bhat). - cpufreq core fixes related to CPU online/offline that should reduce the overhead of these operations quite a bit, unless the CPU in question is physically going away (Viresh Kumar, Saravana Kannan). - serialization of cpufreq governor callbacks to avoid race conditions in some cases (Viresh Kumar). - intel_pstate driver fixes and cleanups (Doug Smythies, Prarit Bhargava, Joe Konno). - cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep Holla, Felipe Balbi, Tang Yuantian). - assorted cleanups in cpufreq drivers and core (Shailendra Verma, Fabian Frederick, Wang Long). - new Device Tree bindings for representing Operating Performance Points (Viresh Kumar). - updates for the common clock operations support code in the PM core (Rajendra Nayak, Geert Uytterhoeven). - PM domains core code update (Geert Uytterhoeven). - Intel Knights Landing support for the RAPL (Running Average Power Limit) power capping driver (Dasaratharaman Chandramouli). - fixes related to the floor frequency setting on Atom SoCs in the RAPL power capping driver (Ajay Thomas). - runtime PM framework documentation update (Ben Dooks). - cpupower tool fix (Herton R Krzesinski)" * tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (194 commits) cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state x86: Load __USER_DS into DS/ES after resume PM / OPP: Add binding for 'opp-suspend' PM / OPP: Allow multiple OPP tables to be passed via DT PM / OPP: Add new bindings to address shortcomings of existing bindings ACPI: Constify ACPI device IDs in documentation ACPI / enumeration: Document the rules regarding the PRP0001 device ID ACPI / video: Make acpi_video_unregister_backlight() private acpi-video-detect: Remove old API toshiba-acpi: Port to new backlight interface selection API thinkpad-acpi: Port to new backlight interface selection API sony-laptop: Port to new backlight interface selection API samsung-laptop: Port to new backlight interface selection API msi-wmi: Port to new backlight interface selection API msi-laptop: Port to new backlight interface selection API intel-oaktrail: Port to new backlight interface selection API ideapad-laptop: Port to new backlight interface selection API fujitsu-laptop: Port to new backlight interface selection API eeepc-laptop: Port to new backlight interface selection API dell-wmi: Port to new backlight interface selection API ...
| | \ \ \ \ \ \ \ \
| | \ \ \ \ \ \ \ \
| *-. \ \ \ \ \ \ \ \ Merge branches 'pm-sleep' and 'pm-runtime'Rafael J. Wysocki2015-06-193-8/+19
| |\ \ \ \ \ \ \ \ \ \ | | | | |_|_|/ / / / / | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-sleep: PM / sleep: trace_device_pm_callback coverage in dpm_prepare/complete PM / wakeup: add a dummy wakeup_source to record statistics PM / sleep: Make suspend-to-idle-specific code depend on CONFIG_SUSPEND PM / sleep: Return -EBUSY from suspend_enter() on wakeup detection PM / tick: Add tracepoints for suspend-to-idle diagnostics PM / sleep: Fix symbol name in a comment in kernel/power/main.c leds / PM: fix hibernation on arm when gpio-led used with CPU led trigger ARM: omap-device: use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS bus: omap_l3_noc: add missed callbacks for suspend-to-disk PM / sleep: Add macro to define common noirq system PM callbacks PM / sleep: Refine diagnostic messages in enter_state() PM / wakeup: validate wakeup source before activating it. * pm-runtime: PM / Runtime: Update last_busy in rpm_resume PM / runtime: add note about re-calling in during device probe()
| | * | | | | | | | | PM / sleep: Make suspend-to-idle-specific code depend on CONFIG_SUSPENDRafael J. Wysocki2015-05-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since idle_should_freeze() is defined to always return 'false' for CONFIG_SUSPEND unset, all of the code depending on it in cpuidle_idle_call() is not necessary in that case. Make that code depend on CONFIG_SUSPEND too to avoid building it when it is not going to be used. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | | | | PM / sleep: Return -EBUSY from suspend_enter() on wakeup detectionRuchi Kandoi2015-05-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a wakeup source is found to be pending in the last stage of suspend after syscore suspend, then the machine won't suspend, but suspend_enter() will return 0. That is confusing, as wakeup detection elsewhere causes -EBUSY to be returned from suspend_enter(). To avoid the confusion, make suspend_enter() return -EBUSY in that case too. Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com> [ rjw: Subject and changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | | | | PM / tick: Add tracepoints for suspend-to-idle diagnosticsRafael J. Wysocki2015-05-151-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add suspend/resume tracepoints to tick_freeze() and tick_unfreeze() to catch when timekeeping is suspended and resumed during suspend-to-idle so as to be able to check whether or not we enter the "frozen" state and to measure the time spent in it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | | | | PM / sleep: Fix symbol name in a comment in kernel/power/main.cRafael J. Wysocki2015-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | | | | PM / sleep: Refine diagnostic messages in enter_state()Rafael J. Wysocki2015-05-121-3/+3
| | | |_|_|/ / / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of the system suspend diagnostic messages related to suspend-to-idle refer to it as "freeze sleep" or "freeze state" while the others say "suspend-to-idle". To reduce the possible confusion that may result from that, refine the former either to say "suspend to idle" too or to make it clearer that what is printed is a state string written to /sys/power/state ("mem", "standby", or "freeze"). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | | | Merge branch 'pm-cpuidle'Rafael J. Wysocki2015-06-191-55/+59
| |\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-cpuidle: cpuidle: Do not use CPUIDLE_DRIVER_STATE_START in cpuidle.c cpuidle: Select a different state on tick_broadcast_enter() failures sched / idle: Call default_idle_call() from cpuidle_enter_state() sched / idle: Call idle_set_state() from cpuidle_enter_state() cpuidle: Fix the kerneldoc comment for cpuidle_enter_state() sched / idle: Eliminate the "reflect" check from cpuidle_idle_call() cpuidle: Check the sign of index in cpuidle_reflect() sched / idle: Move the default idle call code to a separate function
| | * | | | | | | | | sched / idle: Call default_idle_call() from cpuidle_enter_state()Rafael J. Wysocki2015-05-141-13/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check of the cpuidle_enter() return value against -EBUSY made in call_cpuidle() will not be necessary any more if cpuidle_enter_state() calls default_idle_call() directly when it is about to return -EBUSY, so make that happen and eliminate the check. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Kevin Hilman <khilman@linaro.org>
| | * | | | | | | | | sched / idle: Call idle_set_state() from cpuidle_enter_state()Rafael J. Wysocki2015-05-141-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a wrapper function around idle_set_state() called sched_idle_set_state() that will pass this_rq() to it as the first argument and make cpuidle_enter_state() call the new function before and after entering the target state. At the same time, remove direct invocations of idle_set_state() from call_cpuidle(). This will allow the invocation of default_idle_call() to be moved from call_cpuidle() to cpuidle_enter_state() safely and call_cpuidle() to be simplified a bit as a result. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Kevin Hilman <khilman@linaro.org>
| | * | | | | | | | | sched / idle: Eliminate the "reflect" check from cpuidle_idle_call()Rafael J. Wysocki2015-05-041-44/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since cpuidle_reflect() should only be called if the idle state to enter was selected by cpuidle_select(), there is the "reflect" variable in cpuidle_idle_call() whose value is used to determine whether or not that is the case. However, if the entire code run between the conditional setting "reflect" and the call to cpuidle_reflect() is moved to a separate function, it will be possible to call that new function in both branches of the conditional, in which case cpuidle_reflect() will only need to be called from one of them too and the "reflect" variable won't be necessary any more. This eliminates one check made by cpuidle_idle_call() on the majority of its invocations, so change the code as described. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>