summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* cpufreq: Replace recover_policy with new_policy in cpufreq_online()Rafael J. Wysocki2015-07-311-12/+11Star
| | | | | | | | | | | | | | | | | The recover_policy is unsed in cpufreq_online() to indicate whether a new policy object is created or an existing one is reinitialized. The "recover" part of the name is slightly confusing (it should be "reinitialization" rather than "recovery") and the logical not (!) operator is applied to it in almost all of the checks it is used in, so replace that variable with a new one called "new_policy" that will be true in the case of a new policy creation. While at it, drop one of the labels that is jumped to from only one spot. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: Separate CPU device registration from CPU onlineRafael J. Wysocki2015-07-311-43/+47
| | | | | | | | | | | | To separate the CPU online interface from the CPU device registration, split cpufreq_online() out of cpufreq_add_dev() and make cpufreq_cpu_callback() call the former, while cpufreq_add_dev() itself will only be used as the CPU device addition subsystem interface callback. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Suggested-by: Russell King <linux@arm.linux.org.uk> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottlingShilpasri G Bhat2015-07-281-2/+29
| | | | | | | | | | | | If frequency is throttled due to OCC reset then cpus will be in Psafe frequency, so restore the frequency on all cpus to policy->cur when OCCs are active again. And if frequency is throttled due to Pmax capping then restore the frequency of all the cpus in the chip on unthrottling. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is setShilpasri G Bhat2015-07-281-9/+3Star
| | | | | | | | | | | | | | | | | | | On a reset cycle of OCC, although the system retires from safe frequency state the local pstate is not restored to Pmin or last requested pstate. Now if the cpufreq governor initiates a pstate change, the local pstate will be in Psafe and we will be reporting a false positive when we are not throttled. So in powernv_cpufreq_throttle_check() remove the condition which checks if local pstate is less than Pmin while checking for Psafe frequency. If the cpus are forced to Psafe then PMSR.psafe_mode_active bit will be set. So, when OCCs become active this bit will be cleared. Let us just rely on this bit for reporting throttling. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLEShilpasri G Bhat2015-07-281-2/+26
| | | | | | | | | | | | | | | | | Re-evaluate the chip's throttled state on recieving OCC_THROTTLE notification by executing *throttle_check() on any one of the cpu on the chip. This is a sanity check to verify if we were indeed throttled/unthrottled after receiving OCC_THROTTLE notification. We cannot call *throttle_check() directly from the notification handler because we could be handling chip1's notification in chip2. So initiate an smp_call to execute *throttle_check(). We are irq-disabled in the notification handler, so use a worker thread to smp_call throttle_check() on any of the cpu in the chipmask. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: powernv: Register for OCC related opal_message notificationShilpasri G Bhat2015-07-281-1/+73
| | | | | | | | | | | | | | | | | | | | | OCC is an On-Chip-Controller which takes care of power and thermal safety of the chip. During runtime due to power failure or overtemperature the OCC may throttle the frequencies of the CPUs to remain within the power budget. We want the cpufreq driver to be aware of such situations to be able to report the reason to the user. We register to opal_message_notifier to receive OCC messages from opal. powernv_cpufreq_throttle_check() reports any frequency throttling and this patch will report the reason or event that caused throttling. We can be throttled if OCC is reset or OCC limits Pmax due to power or thermal reasons. We are also notified of unthrottling after an OCC reset or if OCC restores Pmax on the chip. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* powerpc/powernv: Add definition of OPAL_MSG_OCC message typeShilpasri G Bhat2015-07-281-0/+12
| | | | | | | | | | | | | | | | | | | | | | Add OPAL_MSG_OCC message definition to opal_message_type to receive OCC events like reset, load and throttled. Host performance can be affected when OCC is reset or OCC throttles the max Pstate. We can register to opal_message_notifier to receive OPAL_MSG_OCC type of message and report it to the userspace so as to keep the user informed about the reason for a performance drop in workloads. The reset and load OCC events are notified to kernel when FSP sends OCC_RESET and OCC_LOAD commands. Both reset and load messages are sent to kernel on successful completion of reset and load operation respectively. The throttle OCC event indicates that the Pmax of the chip is reduced. The chip_id and throttle reason for reducing Pmax is also queued along with the message. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: powernv: Handle throttling due to Pmax capping at chip levelShilpasri G Bhat2015-07-281-4/+55
| | | | | | | | | | | | | | | | | The On-Chip-Controller(OCC) can throttle cpu frequency by reducing the max allowed frequency for that chip if the chip exceeds its power or temperature limits. As Pmax capping is a chip level condition report this throttling behavior at chip level and also do not set the global 'throttled' on Pmax capping instead set the per-chip throttled variable. Report unthrottling if Pmax is restored after throttling. This patch adds a structure to store chip id and throttled state of the chip. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: Pass CPU number to cpufreq_policy_alloc()Rafael J. Wysocki2015-07-281-4/+8
| | | | | | | | | | | | Change cpufreq_policy_alloc() to take a CPU number instead of a CPU device pointer as its argument, as it is the only function called by cpufreq_add_dev() taking a device pointer argument at this point. That will allow us to split the CPU online part from cpufreq_add_dev() more cleanly going forward. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: Do not update related_cpus on every policy activationRafael J. Wysocki2015-07-281-5/+5
| | | | | | | | | | | | | | The related_cpus mask includes CPUs whose cpufreq_cpu_data per-CPU pointers have been set the the given policy. Since those pointers are only set at the policy creation time and unset when the policy is deleted, the related_cpus should not be updated between those two operations. For this reason, avoid updating it whenever the first of the "related" CPUs goes online. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: Drop unused dev argument from two functionsRafael J. Wysocki2015-07-281-6/+4Star
| | | | | | | | | The dev argument of cpufreq_add_policy_cpu() and cpufreq_add_dev_interface() is not used by any of them, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: Drop unnecessary label from cpufreq_add_dev()Rafael J. Wysocki2015-07-281-3/+2Star
| | | | | | | | The leftover out_release_rwsem label in cpufreq_add_dev() is not necessary any more and confusing, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: Drop cpufreq_policy_restore()Rafael J. Wysocki2015-07-281-33/+11Star
| | | | | | | | | | | | | | | | | | | Notice that when cpufreq_policy_restore() is called, its per-CPU cpufreq_cpu_data variable has been already dereferenced and if that variable is not NULL, the policy local pointer in cpufreq_add_dev() contains its value. Therefore it is not necessary to dereference it again and the policy pointer can be used directly. Moreover, if that pointer is not NULL, the policy is inactive (or the previous check would have made us return from cpufreq_add_dev()) so the restoration code from cpufreq_policy_restore() can be moved to that point in cpufreq_add_dev(). Do that and drop cpufreq_policy_restore(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: Rework two functions related to CPU offlineRafael J. Wysocki2015-07-281-20/+12Star
| | | | | | | | | | | | | Since __cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish() are about CPU offline rather than about CPU removal, rename them to cpufreq_offline_prepare() and cpufreq_offline_finish(), respectively. Also change their argument from a struct device pointer to a CPU number, because they use the CPU number only internally anyway and make them void as their return values are ignored. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* Merge back earlier cpufreq material for v4.3.Rafael J. Wysocki2015-07-2815-272/+341
|\
| * cpufreq: Remove cpufreq_rwsemSebastian Andrzej Siewior2015-07-251-38/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpufreq_rwsem was introduced in commit 6eed9404ab3c4 ("cpufreq: Use rwsem for protecting critical sections) in order to replace try_module_get() on the cpu-freq driver. That try_module_get() worked well until the refcount was so heavily used that module removal became more or less impossible. Though when looking at the various (undocumented) protection mechanisms in that code, the randomly sprinkeled around cpufreq_rwsem locking sites are superfluous. The policy, which is acquired in cpufreq_cpu_get() and released in cpufreq_cpu_put() is sufficiently protected already. cpufreq_cpu_get(cpu) /* Protects against concurrent driver removal */ read_lock_irqsave(&cpufreq_driver_lock, flags); policy = per_cpu(cpufreq_cpu_data, cpu); kobject_get(&policy->kobj); read_unlock_irqrestore(&cpufreq_driver_lock, flags); The reference on the policy serializes versus module unload already: cpufreq_unregister_driver() subsys_interface_unregister() __cpufreq_remove_dev_finish() per_cpu(cpufreq_cpu_data) = NULL; cpufreq_policy_put_kobj() If there is a reference held on the policy, i.e. obtained prior to the unregister call, then cpufreq_policy_put_kobj() will wait until that reference is dropped. So once subsys_interface_unregister() returns there is no policy pointer in flight and no new reference can be obtained. So that rwsem protection is useless. The other usage of cpufreq_rwsem in show()/store() of the sysfs interface is redundant as well because sysfs already does the proper kobject_get()/put() pairs. That leaves CPU hotplug versus module removal. The current down_write() around the write_lock() in cpufreq_unregister_driver() is silly at best as it protects actually nothing. The trivial solution to this is to prevent hotplug across cpufreq_unregister_driver completely. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: ia64: Fix a memory leak in acpi_cpufreq_cpu_exit()Pan Xinhui2015-07-221-0/+1
| | | | | | | | | | | | | | | | freq_table should be alloced in ->init and freed in ->exit, but it it is not freed. Fix this memory leak in acpi_cpufreq_cpu_exit(). Signed-off-by: Pan Xinhui <xinhuix.pan@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: ia64: remove redundant freq_table of acpi_cpufreq_dataPan Xinhui2015-07-221-7/+7
| | | | | | | | | | | | | | | | | | freq_table is now stored as policy->freq_table, so drop the redundant freq_table from struct cpufreq_acpi_io. Signed-off-by: Pan Xinhui <xinhuix.pan@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: acpi-cpufreq: Fix up the handling of cpb sysfs attributeRafael J. Wysocki2015-07-221-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cpb sysfs attribute is only exposed by the ACPI cpufreq driver after a runtime check. For this purpose, the driver keeps a NULL placeholder in its table of sysfs attributes and replaces the NULL with a pointer to an attribute structure if it decides to expose cpb. That is confusing, so make the driver set the pointer to the cpb attribute structure upfront and replace it with NULL if the attribute should not be exposed instead. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
| * cpufreq: acpi-cpufreq: Drop acpi_data from struct acpi_cpufreq_dataRafael J. Wysocki2015-07-221-13/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | After commit 8cfcfd39000d (acpi-cpufreq: Fix an ACPI perf unregister issue) we store both a pointer to per-CPU data of the first policy CPU and the number of that CPU which are redundant. Since the CPU number has to be stored anyway for the unregistration, the pointer to the CPU's per-CPU data may be dropped and we can access the data in question via per_cpu_ptr(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
| * ACPI / processor: Drop an unused argument of a cleanup routineRafael J. Wysocki2015-07-228-29/+17Star
| | | | | | | | | | | | | | | | acpi_processor_unregister_performance() actually doesn't use its first argument, so drop it and update the callers accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
| * cpufreq: propagate errors returned from __cpufreq_governor()Viresh Kumar2015-07-211-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Return codes aren't honored properly in cpufreq_set_policy(). This can lead to two problems: - wrong errors propagated to sysfs - we try to do next state-change even if the previous one failed cpufreq_governor_dbs() now returns proper errors on all invalid state-transition requests and this code should honor that. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: governor: Don't WARN on invalid statesViresh Kumar2015-07-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | With previous commit, governors have started to return errors on invalid state-transition requests. We already have a WARN for an invalid state-transition request in cpufreq_governor_dbs(). This does trigger today, as the sequence of events isn't guaranteed by cpufreq core. Lets stop warning on that for now, and make sure we don't enter an invalid state. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: governor: Avoid invalid states with additional checksViresh Kumar2015-07-211-11/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There can be races where the request has come to a wrong state. For example INIT followed by STOP (instead of START) or START followed by EXIT (instead of STOP). Address these races by making sure the state-machine never gets into any invalid state. Also return an error if an invalid state-transition is requested. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: governor: split out common part of {cs|od}_dbs_timer()Viresh Kumar2015-07-214-51/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some part of cs_dbs_timer() and od_dbs_timer() is exactly same and is unnecessarily duplicated. Create the real work-handler in cpufreq_governor.c and put the common code in this routine (dbs_timer()). Shouldn't make any functional change. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: governor: Keep single copy of information common to policy->cpusViresh Kumar2015-07-214-58/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some information is common to all CPUs belonging to a policy, but are kept on per-cpu basis. Lets keep that in another structure common to all policy->cpus. That will make updates/reads to that less complex and less error prone. The memory for cpu_common_dbs_info is allocated/freed at INIT/EXIT, so that it we don't reallocate it for STOP/START sequence. It will be also be used (in next patch) while the governor is stopped and so must not be freed that early. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: governor: rename cur_policy as policyViresh Kumar2015-07-174-24/+25
| | | | | | | | | | | | | | | | | | Just call it 'policy', cur_policy is unnecessarily long and doesn't have any special meaning. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: governor: name pointer to cpu_dbs_info as 'cdbs'Viresh Kumar2015-07-171-13/+13
| | | | | | | | | | | | | | | | | | It is called as 'cdbs' at most of the places and 'cpu_dbs' at others. Lets use 'cdbs' consistently for better readability. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: governor: Rename 'cpu_dbs_common_info' to 'cpu_dbs_info'Viresh Kumar2015-07-172-17/+15Star
| | | | | | | | | | | | | | | | | | | | Its not common info to all CPUs, but a structure representing common type of cpu info to both governor types. Lets drop 'common_' from its name. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: governor: Drop unused field 'cpu'Viresh Kumar2015-07-172-2/+0Star
| | | | | | | | | | | | | | | | Its not used at all, drop it. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: governor: Name delayed-work as dworkViresh Kumar2015-07-174-9/+9
| | | | | | | | | | | | | | | | | | Delayed work was named as 'work' and to access work within it we do work.work. Not much readable. Rename delayed_work as 'dwork'. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * intel_pstate: enable HWP per CPUKristen Carlson Accardi2015-07-161-4/+7
| | | | | | | | | | | | | | | | | | | | | | HWP previously was only enabled at driver load time, on the boot CPU, however, HWP must be enabled per package. Move the code to enable HWP to the cpufreq driver init path so that it will be called per CPU. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Tested-by: David Zhuang <david.zhuang@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: integrator: fixed coding style issuesCristian Ardelean2015-07-161-10/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | | | Fixed coding style issues found by checkpatch.pl tool. Changed space indentation to tab, removed unneccesary braces, removed space between MODULE macros and parentheses. REMARKS: failed to 'make' this file with error message 'fatal error: asm/mach-types.h: No such file or directory'. Signed-off-by: Cristian Ardelean <cristian97.ardelean@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * acpi-cpufreq: Fix an ACPI perf unregister issuePan Xinhui2015-07-161-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As policy->cpu may not be same in acpi_cpufreq_cpu_init() and acpi_cpufreq_cpu_exit(). There is a risk that we use different CPU to un/register ACPI performance. So acpi_processor_unregister_performance() may not be able to do the cleanup work. That causes a memory leak. And if there will be another acpi_processor_register_performance() call, it may also fail thanks to the internal check of pr->performace. So add a new struct acpi_cpufreq_data field, acpi_perf_cpu, to fix this issue. Signed-off-by: Pan Xinhui <xinhuix.pan@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: Properly handle errors from cpufreq_init_policy()Viresh Kumar2015-07-161-9/+11
| | | | | | | | | | | | | | | | | | | | | | cpufreq_init_policy() can fail, and we don't do anything except a call to ->exit() on that. The policy should be freed if this happens. Do it properly. Reported-and-tested-by: "Jon Medhurst (Tixy)" <tixy@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: cpufreq_add_dev: name goto labels based on what they doViresh Kumar2015-07-161-8/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | These labels are are named in two ways normally: - Based on what caused to jump to such labels - Based on what we do under such labels We follow the first naming convention today and that leads to multiple labels for doing the same work. Fix it by switching to the second way of naming them. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * acpi-cpufreq: replace per_cpu with driver_data of policyPan Xinhui2015-07-161-18/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers can store their internal per-policy information in policy->driver_data, lets use it. we have benefits after this replacing. 1) memory saving. 2) policy is shared by several cpus, per_cpu seems not correct. using *driver_data* is more reasonable. 3) fix a memory leak in acpi_cpufreq_cpu_exit. as policy->cpu might change during cpu hotplug. So sometimes we cant't free *data*, use *driver_data* to fix it. 4) fix a zero return value of get_cur_freq_on_cpu. Only per_cpu of policy->cpu is set to *data*, if we try to get cpufreq on other cpus, we get zero instead of correct values. Use *driver_data* to fix it. Signed-off-by: Pan Xinhui <xinhuix.pan@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | cpufreq: Avoid attempts to create duplicate symbolic linksRafael J. Wysocki2015-07-282-53/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 87549141d516 (cpufreq: Stop migrating sysfs files on hotplug) there is a problem with CPUs that share cpufreq policy objects with other CPUs and are initially offline. Say CPU1 shares a policy with CPU0 which is online and is registered first. As part of the registration process, cpufreq_add_dev() is called for it. It creates the policy object and a symbolic link to it from the CPU1's sysfs directory. If CPU1 is registered subsequently and it is offline at that time, cpufreq_add_dev() will attempt to create a symbolic link to the policy object for it, but that link is present already, so a warning about that will be triggered. To avoid that warning, make cpufreq use an additional CPU mask containing related CPUs that are actually present for each policy object. That mask is initialized when the policy object is populated after its creation (for the first online CPU using it) and it includes CPUs from the "policy CPUs" mask returned by the cpufreq driver's ->init() callback that are physically present at that time. Symbolic links to the policy are created only for the CPUs in that mask. If cpufreq_add_dev() is invoked for an offline CPU, it checks the new mask and only creates the symlink if the CPU was not in it (the CPU is added to the mask at the same time). In turn, cpufreq_remove_dev() drops the given CPU from the new mask, removes its symlink to the policy object and returns, unless it is the CPU owning the policy object. In that case, the policy object is moved to a new CPU's sysfs directory or deleted if the CPU being removed was the last user of the policy. While at it, notice that cpufreq_remove_dev() can't fail, because its return value is ignored, so make it ignore return values from __cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish() and prevent these functions from aborting on errors returned by __cpufreq_governor(). Also drop the now unused sif argument from them. Fixes: 87549141d516 (cpufreq: Stop migrating sysfs files on hotplug) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-and-tested-by: Russell King <linux@arm.linux.org.uk> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* | intel_pstate: Add get_scaling cpu_defaults param to Knights LandingLukasz Anaczkowski2015-07-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Scaling for Knights Landing is same as the default scaling (100000). When Knigts Landing support was added to the pstate driver, this parameter was omitted resulting in a kernel panic during boot. Fixes: b34ef932d79a (intel_pstate: Knights Landing support) Reported-by: Yasuaki Ishimatsu <yishimat@redhat.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: 4.1+ <stable@vger.kernel.org> # 4.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | Linux 4.2-rc4Linus Torvalds2015-07-261-1/+1
| |
* | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2015-07-261-0/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single fix for the intel cqm perf facility to prevent IPIs from interrupt context" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/cqm: Return cached counter value from IRQ context
| * | perf/x86/intel/cqm: Return cached counter value from IRQ contextMatt Fleming2015-07-261-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Peter reported the following potential crash which I was able to reproduce with his test program, [ 148.765788] ------------[ cut here ]------------ [ 148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260() [ 148.765797] Modules linked in: [ 148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4 [ 148.765803] ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007 [ 148.765805] 0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000 [ 148.765807] ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640 [ 148.765809] Call Trace: [ 148.765810] <NMI> [<ffffffff818bdfd5>] dump_stack+0x45/0x57 [ 148.765818] [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0 [ 148.765822] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765824] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765825] [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20 [ 148.765827] [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260 [ 148.765829] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765831] [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60 [ 148.765832] [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0 [ 148.765836] [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400 [ 148.765839] [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590 [ 148.765840] [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380 [ 148.765841] [<ffffffff811d3497>] perf_event_output+0x47/0x60 [ 148.765843] [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240 [ 148.765844] [<ffffffff811d4124>] perf_event_overflow+0x14/0x20 [ 148.765847] [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440 [ 148.765849] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765853] [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0 [ 148.765854] [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20 [ 148.765859] [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0 [ 148.765863] [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30 [ 148.765865] [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20 [ 148.765869] [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40 [ 148.765872] [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80 [ 148.765875] [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40 [ 148.765877] [<ffffffff81063ed9>] nmi_handle+0x79/0x100 [ 148.765879] [<ffffffff81064422>] default_do_nmi+0x42/0x100 [ 148.765880] [<ffffffff81064563>] do_nmi+0x83/0xb0 [ 148.765884] [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e [ 148.765886] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765888] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765890] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765891] <<EOE>> [<ffffffff8110ab66>] finish_task_switch+0x156/0x210 [ 148.765898] [<ffffffff818c1671>] __schedule+0x341/0x920 [ 148.765899] [<ffffffff818c1c87>] schedule+0x37/0x80 [ 148.765903] [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80 [ 148.765905] [<ffffffff818c1f4a>] schedule_user+0x1a/0x50 [ 148.765907] [<ffffffff818c666c>] retint_careful+0x14/0x32 [ 148.765908] ---[ end trace e33ff2be78e14901 ]--- The CQM task events are not safe to be called from within interrupt context because they require performing an IPI to read the counter value on all sockets. And performing IPIs from within IRQ context is a "no-no". Make do with the last read counter value currently event in event->count when we're invoked in this context. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Will Auld <will.auld@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2015-07-2611-59/+81
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "This update contains: - the manual revert of the SYSCALL32 changes which caused a regression - a fix for the MPX vma handling - three fixes for the ioremap 'is ram' checks. - PAT warning fixes - a trivial fix for the size calculation of TLB tracepoints - handle old EFI structures gracefully This also contains a PAT fix from Jan plus a revert thereof. Toshi explained why the code is correct" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/pat: Revert 'Adjust default caching mode translation tables' x86/asm/entry/32: Revert 'Do not use R9 in SYSCALL32' commit x86/mm: Fix newly introduced printk format warnings mm: Fix bugs in region_is_ram() x86/mm: Remove region_is_ram() call from ioremap x86/mm: Move warning from __ioremap_check_ram() to the call site x86/mm/pat, drivers/media/ivtv: Move the PAT warning and replace WARN() with pr_warn() x86/mm/pat, drivers/infiniband/ipath: Replace WARN() with pr_warn() x86/mm/pat: Adjust default caching mode translation tables x86/fpu: Disable dependent CPU features on "noxsave" x86/mpx: Do not set ->vm_ops on MPX VMAs x86/mm: Add parenthesis for TLB tracepoint size calculation efi: Handle memory error structures produced based on old versions of standard
| * | x86/mm/pat: Revert 'Adjust default caching mode translation tables'Thomas Gleixner2015-07-261-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Toshi explains: "No, the default values need to be set to the fallback types, i.e. minimal supported mode. For WC and WT, UC is the fallback type. When PAT is disabled, pat_init() does update the tables below to enable WT per the default BIOS setup. However, when PAT is enabled, but CPU has PAT -errata, WT falls back to UC per the default values." Revert: ca1fec58bc6a 'x86/mm/pat: Adjust default caching mode translation tables' Requested-by: Toshi Kani <toshi.kani@hp.com> Cc: Jan Beulich <jbeulich@suse.de> Link: http://lkml.kernel.org/r/1437577776.3214.252.camel@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/asm/entry/32: Revert 'Do not use R9 in SYSCALL32' commitDenys Vlasenko2015-07-241-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change reverts most of commit 53e9accf0f 'Do not use R9 in SYSCALL32'. I don't yet understand how, but code in that commit sometimes fails to preserve EBP. See https://bugzilla.kernel.org/show_bug.cgi?id=101061 "Problems while executing 32-bit code on AMD64" Reported-and-tested-by: Krzysztof A. Sobiecki <sobkas@gmail.com> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Will Drewry <wad@chromium.org> Cc: Kees Cook <keescook@chromium.org> CC: x86@kernel.org Link: http://lkml.kernel.org/r/1437740203-11552-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/mm: Fix newly introduced printk format warningsThomas Gleixner2015-07-241-2/+2
| | | | | | | | | | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | mm: Fix bugs in region_is_ram()Toshi Kani2015-07-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | region_is_ram() looks up the iomem_resource table to check if a target range is in RAM. However, it always returns with -1 due to invalid range checks. It always breaks the loop at the first entry of the table. Another issue is that it compares p->flags and flags, but it always fails. flags is declared as int, which makes it as a negative value with IORESOURCE_BUSY (0x80000000) set while p->flags is unsigned long. Fix the range check and flags so that region_is_ram() works as advertised. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Mike Travis <travis@sgi.com> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Roland Dreier <roland@purestorage.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1437088996-28511-4-git-send-email-toshi.kani@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/mm: Remove region_is_ram() call from ioremapToshi Kani2015-07-221-18/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __ioremap_caller() calls region_is_ram() to walk through the iomem_resource table to check if a target range is in RAM, which was added to improve the lookup performance over page_is_ram() (commit 906e36c5c717 "x86: use optimized ioresource lookup in ioremap function"). page_is_ram() was no longer used when this change was added, though. __ioremap_caller() then calls walk_system_ram_range(), which had replaced page_is_ram() to improve the lookup performance (commit c81c8a1eeede "x86, ioremap: Speed up check for RAM pages"). Since both checks walk through the same iomem_resource table for the same purpose, there is no need to call both functions. Aside of that walk_system_ram_range() is the only useful check at the moment because region_is_ram() always returns -1 due to an implementation bug. That bug in region_is_ram() cannot be fixed without breaking existing ioremap callers, which rely on the subtle difference of walk_system_ram_range() versus non page aligned ranges. Once these offending callers are fixed we can use region_is_ram() and remove walk_system_ram_range(). [ tglx: Massaged changelog ] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Roland Dreier <roland@purestorage.com> Cc: Mike Travis <travis@sgi.com> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1437088996-28511-3-git-send-email-toshi.kani@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/mm: Move warning from __ioremap_check_ram() to the call siteToshi Kani2015-07-221-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __ioremap_check_ram() has a WARN_ONCE() which is emitted when the given pfn range is not RAM. The warning is bogus in two aspects: - it never triggers since walk_system_ram_range() only calls __ioremap_check_ram() for RAM ranges. - the warning message is wrong as it says: "ioremap on RAM' after it established that the pfn range is not RAM. Move the WARN_ONCE() to __ioremap_caller(), and update the message to include the address range so we get an actual warning when something tries to ioremap system RAM. [ tglx: Massaged changelog ] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Roland Dreier <roland@purestorage.com> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1437088996-28511-2-git-send-email-toshi.kani@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | Merge tag 'efi-urgent' of ↵Ingo Molnar2015-07-212-4/+33
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull an EFI fix from Matt Fleming: - Fix a bug in the Common Platform Error Record (CPER) driver that caused old UEFI spec (< 2.3) versions of the memory error record structure to be declared invalid. (Tony Luck) Signed-off-by: Ingo Molnar <mingo@kernel.org>