* refs/heads/tmp-960923f:
Linux 4.9.89
usb: gadget: bdc: 64-bit pointer capability check
usb: dwc3: Fix GDBGFIFOSPACE_TYPE values
USB: gadget: udc: Add missing platform_device_put() on error in bdc_pci_probe()
scsi: qla2xxx: Fix extraneous ref on sp's after adapter break
btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
btrfs: alloc_chunk: fix DUP stripe size handling
scsi: sg: only check for dxfer_len greater than 256M
scsi: sg: fix static checker warning in sg_is_valid_dxfer
scsi: sg: fix SG_DXFER_FROM_DEV transfers
irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis
fs/aio: Use RCU accessors for kioctx_table->table[]
fs/aio: Add explicit RCU grace period when freeing kioctx
lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
fs: Teach path_connected to handle nfs filesystems with multiple roots.
drm/amdgpu/dce: Don't turn off DP sink when disconnected
drm/amdgpu: fix prime teardown order
ALSA: seq: Clear client entry before deleting else at closing
ALSA: seq: Fix possible UAF in snd_seq_check_queue()
ALSA: hda - Revert power_save option default value
ALSA: pcm: Fix UAF in snd_pcm_oss_get_formats()
parisc: Handle case where flush_cache_range is called with no context
x86/mm: Fix vmalloc_fault to use pXd_large
x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist
x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels
x86/vm86/32: Fix POPF emulation
selftests/x86/entry_from_vm86: Add test cases for POPF
selftests/x86: Add tests for the STR and SLDT instructions
selftests/x86: Add tests for User-Mode Instruction Prevention
selftests/x86/entry_from_vm86: Exit with 1 if we fail
x86/cpufeatures: Add Intel PCONFIG cpufeature
x86/boot/32: Fix UP boot on Quark and possibly other platforms
net: hns: Some checkpatch.pl script & warning fixes
ima: relax requiring a file signature for new files with zero length
locking/locktorture: Fix num reader/writer corner cases
rcutorture/configinit: Fix build directory error message
ipvlan: add L2 check for packets arriving via virtual devices
ASoC: nuc900: Fix a loop timeout test
mac80211: remove BUG() when interface type is invalid
mac80211_hwsim: enforce PS_MANUAL_POLL to be set after PS_ENABLED
agp/intel: Flush all chipset writes after updating the GGTT
powerpc/modules: Don't try to restore r2 after a sibling call
drm/amdkfd: Fix memory leaks in kfd topology
veth: set peer GSO values
media: cpia2: Fix a couple off by one bugs
media: vsp1: Prevent suspending and resuming DRM pipelines
scsi: dh: add new rdac devices
scsi: devinfo: apply to HP XP the same flags as Hitachi VSP
scsi: core: scsi_get_device_flags_keyed(): Always return device flags
bnxt_en: Don't print "Link speed -1 no longer supported" messages.
spi: sun6i: disable/unprepare clocks on remove
tools/usbip: fixes build with musl libc toolchain
ath10k: fix invalid STS_CAP_OFFSET_MASK
mwifiex: cfg80211: do not change virtual interface during scan processing
clk: qcom: msm8916: fix mnd_width for codec_digcodec
pwm: stmpe: Fix wrong register offset for hwpwm=2 case
scsi: ses: don't ask for diagnostic pages repeatedly during probe
ath10k: update tdls teardown state to target
power: supply: ab8500_charger: Bail out in case of error in 'ab8500_charger_init_hw_registers()'
power: supply: ab8500_charger: Fix an error handling path
leds: pm8058: Silence pointer to integer size warning
userns: Don't fail follow_automount based on s_user_ns
mtd: nand: ifc: update bufnum mask for ver >= 2.0.0
ARM: dts: omap3-n900: Fix the audio CODEC's reset pin
ARM: dts: am335x-pepper: Fix the audio CODEC's reset pin
net: thunderx: Set max queue count taking XDP_TX into account
mtd: nand: fix interpretation of NAND_CMD_NONE in nand_command[_lp]()
net: xfrm: allow clearing socket xfrm policies.
net: ieee802154: adf7242: Fix bug if defined DEBUG
test_firmware: fix setting old custom fw path back on exit
sched: Stop resched_cpu() from sending IPIs to offline CPUs
sched: Stop switched_to_rt() from sending IPIs to offline CPUs
ARM: dts: exynos: Correct Trats2 panel reset line
clk: meson: gxbb: fix wrong clock for SARADC/SANA
iwlwifi: mvm: rs: don't override the rate history in the search cycle
HID: elo: clear BTN_LEFT mapping
video/hdmi: Allow "empty" HDMI infoframes
drm/edid: set ELD connector type in drm_edid_to_eld()
mwifiex: Fix invalid port issue
perf stat: Fix bug in handling events in error state
wil6210: fix memory access violation in wil_memcpy_from/toio_32
wil6210: fix protection against connections during reset
ath10k: fix compile time sanity check for CE4 buffer size
mac80211_hwsim: use per-interface power level
Bluetooth: 6lowpan: fix delay work init in add_peer_chan()
Bluetooth: Avoid bt_accept_unlink() double unlinking
clk: qcom: msm8996: Fix the vfe1 powerdomain name
pwm: tegra: Increase precision in PWM rate calculation
kprobes/x86: Set kprobes pages read-only
kprobes/x86: Fix kprobe-booster not to boost far call instructions
ALSA: hda: Add Geminilake id to SKL_PLUS
scsi: sg: close race condition in sg_remove_sfp_usercontext()
scsi: sg: check for valid direction before starting the request
vfio/spapr_tce: Check kzalloc() return when preregistering memory
vfio/powerpc/spapr_tce: Enforce IOMMU type compatibility check
perf session: Don't rely on evlist in pipe mode
net: fec: add phy-reset-gpios PROBE_DEFER check
perf inject: Copy events when reordering events in pipe mode
drivers/perf: arm_pmu: handle no platform_device
iwlwifi: mvm: fix RX SKB header size and align it properly
perf evsel: Return exact sub event which failed with EPERM for wildcards
usb: gadget: dummy_hcd: Fix wrong power status bit clear/reset in dummy_hub_control()
usb: dwc2: Make sure we disconnect the gadget state
powerpc/nohash: Fix use of mmu_has_feature() in setup_initial_memory_limit()
md.c:didn't unlock the mddev before return EINVAL in array_size_store
md/raid6: Fix anomily when recovering a single device in RAID6.
regulator: isl9305: fix array size
v4l: vsp1: Register pipe with output WPF
v4l: vsp1: Prevent multiple streamon race commencing pipeline early
MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters
MIPS: r2-on-r6-emu: Fix BLEZL and BGTZL identification
MIPS: BPF: Fix multiple problems in JIT skb access helpers.
MIPS: BPF: Quit clobbering callee saved registers in JIT code.
serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off
tty: amba-pl011: Fix spurious TX interrupts
lkdtm: turn off kcov for lkdtm_rodata_do_nothing:
coresight: Fixes coresight DT parse to get correct output port ID.
i40e: only register client on iWarp-capable devices
drm/rockchip: vop: Enable pm domain before vop_initial
drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)
drm/radeon: Fail fb creation from imported dma-bufs.
video: ARM CLCD: fix dma allocation size
kvm: nVMX: Disallow userspace-injected exceptions in guest mode
kvm/svm: Setup MCG_CAP on AMD properly
iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range
apparmor: Make path_max parameter readonly
qed: Correct MSI-x for storage
scsi: ses: don't get power status of SES device slot on probe
EDAC, altera: Fix peripheral warnings for Cyclone5
fm10k: correctly check if interface is removed
ALSA: firewire-digi00x: handle all MIDI messages on streaming packets
ALSA: firewire-digi00x: add support for console models of Digi00x series
IB/hfi1: Check for QSFP presence before attempting reads
ASoC: rt5677: Add OF device ID table
reiserfs: Make cancel_old_flush() reliable
ARM: dts: koelsch: Correct clock frequency of X2 DU clock input
drm: rcar-du: Handle event when disabling CRTCs
printk: Correctly handle preemption in console_unlock()
rtmutex: Fix PI chain order integrity
qed: Fix TM block ILT allocation
net/faraday: Add missing include of of.h
net: hns: Correct HNS RSS key set function
powerpc: Avoid taking a data miss on every userspace instruction miss
ARM: dts: r8a7793: Correct parent of SSI[0-9] clocks
ARM: dts: r8a7791: Correct parent of SSI[0-9] clocks
ARM: dts: r8a7790: Correct parent of SSI[0-9] clocks
ARM: dts: r7s72100: fix ethernet clock parent
NFC: pn533: change order of free_irq and dev unregistration
NFC: nfcmrvl: double free on error path
NFC: nfcmrvl: Include unaligned.h instead of access_ok.h
vxlan: vxlan dev should inherit lowerdev's gso_max_size
drm/vmwgfx: Fixes to vmwgfx_fb
braille-console: Fix value returned by _braille_console_setup
powerpc/mm/hugetlb: Filter out hugepage size not supported by page table layout
PCI: Apply Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
bonding: refine bond_fold_stats() wrap detection
drm/ttm: never add BO that failed to validate to the LRU list
f2fs: relax node version check for victim data in gc
perf trace: Handle unpaired raw_syscalls:sys_exit event
regulator: core: Limit propagation of parent voltage count and list
blk-throttle: make sure expire time isn't too big
ARM: dts: silk: Correct clock of DU1
ARM: dts: r8a7794: Correct clock of DU1
ARM: dts: r8a7794: Add DU1 clock to device tree
ALSA: firewire-lib: add a quirk of packet without valid EOH in CIP format
mm: Fix false-positive VM_BUG_ON() in page_cache_{get,add}_speculative()
bonding: make speed, duplex setting consistent with link state
driver: (adm1275) set the m,b and R coefficients correctly for power
scsi: be2iscsi: Check tag in beiscsi_mccq_compl_wait
i40e/i40evf: Fix use after free in Rx cleanup path
perf buildid: Do not assume that readlink() returns a null terminated string
perf annotate: Fix a bug following symbolic link of a build-id file
ARM: dts: bcm2835: add index to the ethernet alias
usb: dwc3: make sure UX_EXIT_PX is cleared
dmaengine: imx-sdma: add 1ms delay to ensure SDMA channel is stopped
tcp: sysctl: Fix a race to avoid unexpected 0 window from space
spi: omap2-mcspi: poll OMAP2_MCSPI_CHSTAT_RXS for PIO transfer
ASoC: rcar: ssi: don't set SSICR.CKDV = 000 with SSIWSR.CONT
PCI: hv: Lock PCI bus on device eject
PCI: hv: Properly handle PCI bus remove
sched: act_csum: don't mangle TCP and UDP GSO packets
Input: qt1070 - add OF device ID table
sysrq: Reset the watchdog timers while displaying high-resolution timers
timers, sched_clock: Update timeout for clock wrap
media: i2c/soc_camera: fix ov6650 sensor getting wrong clock
scsi: ipr: Fix missed EH wakeup
scsi: fnic: Fix for "Number of Active IOs" in fnicstats becoming negative
x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up
solo6x10: release vb2 buffers in solo_stop_streaming()
of: fix of_device_get_modalias returned length when truncating buffers
batman-adv: handle race condition for claims between gateways
zd1211rw: fix NULL-deref at probe
s390/topology: fix typo in early topology code
qed: Always publish VF link from leading hwfn
ARM: dts: Adjust moxart IRQ controller and flags
net/8021q: create device with all possible features in wanted_features
HID: clamp input to logical range if no null state
perf probe: Return errno when not hitting any event
perf probe: Fix concat_probe_trace_events
omapfb: dss: Handle return errors in dss_init_ports()
x86/mce: Init some CPU features early
netem: apply correct delay when rate throttling
net: ethernet: bgmac: Allow MAC address to be specified in DTB
ARM: bcm2835: Enable missing CMA settings for VC4 driver
usb: misc: lvs: fix race condition in disconnect handling
ath10k: fix fetching channel during potential radar detection
ath10k: disallow DFS simulation if DFS channel is not enabled
drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off)
drivers: net: xgene: Fix Rx checksum validation logic
drivers: net: xgene: Fix wrong logical operation
drivers: net: phy: xgene: Fix mdio write
drivers: net: xgene: Fix hardware checksum setting
ARM: brcmstb: Enable ZONE_DMA for non 64-bit capable peripherals
perf tools: Make perf_event__synthesize_mmap_events() scale
i40e: fix ethtool to get EEPROM data from X722 interface
i40e: Acquire NVM lock before reads on all devices
eventpoll.h: fix epoll event masks
x86/mce: Handle broadcasted MCE gracefully with kexec
perf sort: Fix segfault with basic block 'cycles' sort dimension
x86/mm: Make mmap(MAP_32BIT) work correctly
selinux: check for address length in selinux_socket_bind()
PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown()
drm/sun4i: Fix TCON clock and regmap initialization sequence
ath10k: fix a warning during channel switch with multiple vaps
drm/sun4i: Set drm_crtc.port to the underlying TCON's output port node
drm/sun4i: Fix up error path cleanup for master bind function
arm64: dts: r8a7796: Remove unit-address and reg from integrated cache
ARM: dts: r8a7794: Remove unit-address and reg from integrated cache
ARM: dts: r8a7793: Remove unit-address and reg from integrated cache
ARM: dts: r8a7792: Remove unit-address and reg from integrated cache
ARM: dts: r8a7791: Remove unit-address and reg from integrated cache
drm: qxl: Don't alloc fbdev if emulation is not supported
HID: reject input outside logical range only if null state is set
staging: wilc1000: add check for kmalloc allocation failure.
staging: speakup: Replace BUG_ON() with WARN_ON().
perf stat: Issue a HW watchdog disable hint
Input: tsc2007 - check for presence and power down tsc2007 during probe
blkcg: fix double free of new_blkg in blkcg_init_queue
staging: android: ashmem: Fix possible deadlock in ashmem_ioctl
Conflicts:
drivers/net/wireless/ath/wil6210/main.c
drivers/usb/dwc3/core.h
Change-Id: I2d77962cfc3dbc8b051ba51bf10b577d327ffaa9
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
[ Upstream commit 0107042768658fea9f5f5a9c00b1c90f5dab6a06 ]
On systems with a large number of CPUs, running sysrq-<q> can cause
watchdog timeouts. There are two slow sections of code in the sysrq-<q>
path in timer_list.c.
1. print_active_timers() - This function is called by print_cpu() and
contains a slow goto loop. On a machine with hundreds of CPUs, this
loop took approximately 100ms for the first CPU in a NUMA node.
(Subsequent CPUs in the same node ran much quicker.) The total time
to print all of the CPUs is ultimately long enough to trigger the
soft lockup watchdog.
2. print_tickdevice() - This function outputs a large amount of textual
information. This function also took approximately 100ms per CPU.
Since sysrq-<q> is not a performance critical path, there should be no
harm in touching the nmi watchdog in both slow sections above. Touching
it in just one location was insufficient on systems with hundreds of
CPUs as occasional timeouts were still observed during testing.
This issue was observed on an Oracle T7 machine with 128 CPUs, but I
anticipate it may affect other systems with similarly large numbers of
CPUs.
Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to
prevent short sleeps, but we do not account for that in interfaces which
retrieve the remaining time.
Helge observed that timerfd can return a remaining time larger than the
relative timeout. That's not expected and breaks userland test programs.
Store the information that the timer was armed relative and provide functions
to adjust the remaining time. To avoid bloating the hrtimer struct make state
a u8, which as a bonus results in better code on x86 at least.
Reported-and-tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160114164159.273328486@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I noticed for non-monotonic timers in timer_list, some of the
output looked a little confusing.
For example:
#1: <0000000000000000>, posix_timer_fn, S:01, hrtimer_start_range_ns, leap-a-day/2360
# expires at 1434412800000000000-1434412800000000000 nsecs [in 1434410725062375469 to 1434410725062375469 nsecs]
You'll note the relative time till the expiration "[in xxx to
yyy nsecs]" is incorrect. This is because its printing the delta
between CLOCK_MONOTONIC time to the CLOCK_REALTIME expiration.
This patch fixes this issue by adding the clock offset to the
"now" time which we use to calculate the delta.
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.
We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.
The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.
With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.
Before:
47.76% hog [.] main
14.84% [kernel] [k] _raw_spin_lock_irqsave
9.55% [kernel] [k] _raw_spin_unlock_irqrestore
6.71% [kernel] [k] mod_timer
6.24% [kernel] [k] lock_timer_base.isra.38
3.76% [kernel] [k] detach_if_pending
3.71% [kernel] [k] del_timer
2.50% [kernel] [k] internal_add_timer
1.51% [kernel] [k] get_nohz_timer_target
1.28% [kernel] [k] __internal_add_timer
0.78% [kernel] [k] timerfn
0.48% [kernel] [k] wake_up_nohz_cpu
After:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When no timers/hrtimers are pending, the expiry time is set to a
special value: 'KTIME_MAX'. This normally happens with
NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer
(NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device
(NOHZ_MODE_LOWRES). But, the clockevent device is already
reprogrammed from the tick-handler for next tick.
As the clock event device is programmed in ONESHOT mode it will at
least fire one more time (unnecessarily). Timers on few
implementations (like arm_arch_timer, etc.) only support PERIODIC mode
and their drivers emulate ONESHOT over that. Which means that on these
platforms we will get spurious interrupts periodically (at last
programmed interval rate, normally tick rate).
In order to avoid spurious interrupts, the clockevent device should be
stopped or its interrupts should be masked.
A simple (yet hacky) solution to get this fixed could be: update
hrtimer_force_reprogram() to always reprogram clockevent device and
update clockevent drivers to STOP generating events (or delay it to
max time) when 'expires' is set to KTIME_MAX. But the drawback here is
that every clockevent driver has to be hacked for this particular case
and its very easy for new ones to miss this.
However, Thomas suggested to add an optional state ONESHOT_STOPPED to
solve this problem: lkml.org/lkml/2014/5/9/508.
This patch adds support for ONESHOT_STOPPED state in clockevents
core. It will only be available to drivers that implement the
state-specific callbacks instead of the legacy ->set_mode() callback.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/b8b383a03ac07b13312c16850b5106b82e4245b5.1428031396.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
'enum clock_event_mode' is used for two purposes today:
- to pass mode to the driver of clockevent device::set_mode().
- for managing state of the device for clockevents core.
For supporting new modes/states we have moved away from the
legacy set_mode() callback to new per-mode/state callbacks. New
modes/states shouldn't be exposed to the legacy (now OBSOLOTE)
callbacks and so we shouldn't add new states to 'enum
clock_event_mode'.
Lets have separate enums for the two use cases mentioned above.
Keep using the earlier enum for legacy set_mode() callback and
mark it OBSOLETE. And add another enum to clearly specify the
possible states of a clockevent device.
This also renames the newly added per-mode callbacks to reflect
state changes.
We haven't got rid of 'mode' member of 'struct
clock_event_device' as it is used by some of the clockevent
drivers and it would automatically die down once we migrate
those drivers to the new interface. It ('mode') is only updated
now for the drivers using the legacy interface.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/b6b0143a8a57bd58352ad35e08c25424c879c0cb.1425037853.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is not possible for the clockevents core to know which modes (other than
those with a corresponding feature flag) are supported by a particular
implementation. And drivers are expected to handle transition to all modes
elegantly, as ->set_mode() would be issued for them unconditionally.
Now, adding support for a new mode complicates things a bit if we want to use
the legacy ->set_mode() callback. We need to closely review all clockevents
drivers to see if they would break on addition of a new mode. And after such
reviews, it is found that we have to do non-trivial changes to most of the
drivers [1].
Introduce mode-specific set_mode_*() callbacks, some of which the drivers may or
may not implement. A missing callback would clearly convey the message that the
corresponding mode isn't supported.
A driver may still choose to keep supporting the legacy ->set_mode() callback,
but ->set_mode() wouldn't be supporting any new modes beyond RESUME. If a driver
wants to benefit from using a new mode, it would be required to migrate to
the mode specific callbacks.
The legacy ->set_mode() callback and the newly introduced mode-specific
callbacks are mutually exclusive. Only one of them should be supported by the
driver.
Sanity check is done at the time of registration to distinguish between optional
and required callbacks and to make error recovery and handling simpler. If the
legacy ->set_mode() callback is provided, all mode specific ones would be
ignored by the core but a warning is thrown if they are present.
Call sites calling ->set_mode() directly are also updated to use
__clockevents_set_mode() instead, as ->set_mode() may not be available anymore
for few drivers.
[1] https://lkml.org/lkml/2014/12/9/605
[2] https://lkml.org/lkml/2015/1/23/255
Suggested-by: Thomas Gleixner <tglx@linutronix.de> [2]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Link: http://lkml.kernel.org/r/792d59a40423f0acffc9bb0bec9de1341a06fa02.1423788565.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When running with 4096 cores attemping to read /proc/timer_list will fail
with an ENOMEM condition. On a sufficantly large systems the total amount
of data is more then 4mb, so it won't fit into a single buffer. The
failure can also occur on smaller systems when memory fragmentation is
high as reported by Dave Jones.
Convert /proc/timer_list to a proper seq_file with its own iterator. This
is a little more complex given that we have to make two passes with two
separate headers.
sysrq_timer_list_show also needed to be updated to reflect the fact that
now timer_list_show only does one cpu at at time.
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/1364345790-14577-3-git-send-email-nzimmer@sgi.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For the ondemand cpufreq governor, it is desired that the iowait
time is microaccounted in a similar way as idle time is.
This patch introduces the infrastructure to account and expose
this information via the get_cpu_iowait_time_us() function.
[akpm@linux-foundation.org: fix CONFIG_NO_HZ=n build]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082523.284feab6@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current logic which handles clock events programming failures can
increase min_delta_ns unlimited and even can cause overflows.
Sanitize it by:
- prevent zero increase when min_delta_ns == 1
- limiting min_delta_ns to a jiffie
- bail out if the jiffie limit is hit
- add retries stats for /proc/timer_list so we can gather data
Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
struct cpumask will be undefined soon with CONFIG_CPUMASK_OFFSTACK=y,
to avoid them being declared on the stack.
cpumask_bits() does what we want here (of course, this code is crap).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: Thomas Gleixner <tglx@linutronix.de>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
execution time of the hrtimer callbacks.
This is error-prone for virtual machines, where a guest vcpu can be
scheduled out during the execution of the callbacks (and the callbacks
themselves can do operations that translate to blocking operations in
the hypervisor), which in can lead to large min_delta_ns rendering the
system unusable.
Replace the current heuristics with something more reliable. Allow the
interrupt code to try 3 times to catch up with the lost time. If that
fails use the total time spent in the interrupt handler to defer the
next timer interrupt so the system can catch up with other things
which got delayed. Limit that deferment to 100ms.
The retry events and the maximum time spent in the interrupt handler
are recorded and exposed via /proc/timer_list
Inspired by a patch from Marcelo.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
In the dynamic tick code, "max_delta_ns" (member of the
"clock_event_device" structure) represents the maximum sleep time
that can occur between timer events in nanoseconds.
The variable, "max_delta_ns", is defined as an unsigned long
which is a 32-bit integer for 32-bit machines and a 64-bit
integer for 64-bit machines (if -m64 option is used for gcc).
The value of max_delta_ns is set by calling the function
"clockevent_delta2ns()" which returns a maximum value of LONG_MAX.
For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in
nanoseconds this equates to ~2.15 seconds. Hence, the maximum
sleep time for a 32-bit machine is ~2.15 seconds, where as for
a 64-bit machine it will be many years.
This patch changes the type of max_delta_ns to be "u64" instead of
"unsigned long" so that this variable is a 64-bit type for both 32-bit
and 64-bit machines. It also changes the maximum value returned by
clockevent_delta2ns() to KTIME_MAX. Hence this allows a 32-bit
machine to sleep for longer than ~2.15 seconds. Please note that this
patch also changes "min_delta_ns" to be "u64" too and although this is
unnecessary, it makes the patch simpler as it avoids to fixup all
callers of clockevent_delta2ns().
[ tglx: changed "unsigned long long" to u64 as we use this data type
through out the time code ]
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The mult and shift factors of clock events differ in their data type
from those of clock sources for no reason. u32 is sufficient for
both. shift is always <= 32 and mult is limited to 2^32-1 to avoid
64bit multiplication overflows in the conversion.
Preparatory patch for a generic mult/shift factor calculation
function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikael Pettersson <mikpe@it.uu.se>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20091111134229.725664788@linutronix.de>
The base address of a (per cpu) clock base is a useful debug info.
Add it and bump the version number of timer_lists.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The per cpu clock events device output of timer_list lacks an
association of the device to the cpu which is annoying when looking at
the output of /proc/timer_list from a 128 way system.
Add the CPU number info and mark the broadcast device in the device
list printout.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current timer_list output prints the address of the on stack copy
of the active hrtimer instead of the hrtimer itself.
Print the address of the real timer instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
to help debugging and visibility of timer ranges, show them
in the existing timer list in /proc/timer_list
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
In order to be able to do range hrtimers we need to use accessor functions
to the "expire" member of the hrtimer struct.
This patch converts kernel/* to these accessors.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Relative expiry time can get negative, so it should be signed.
Signed-off-by: Pavel Machek <Pavel@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To allow better diagnosis of tick-sched related, especially NOHZ
related problems, we need to know when the last wakeup via an irq
happened and when the CPU left the idle state.
Add two fields (idle_waketime, idle_exittime) to the tick_sched
structure and add them to the timer_list output.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes sure printk format strings contain no more than a single
line.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
KSYM_NAME_LEN is peculiar in that it does not include the space for the
trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
buffer. This is nonsense and error-prone. Moreover, when the caller
forgets that it's very likely to subtly bite back by corrupting the stack
because the last position of the buffer is always cleared to zero.
This patch increments KSYM_NAME_LEN by one and updates code accordingly.
* off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
is fixed.
* Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
MODULE_NAME_LEN was treated as if it didn't include space for the
trailing '\0'. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Paulo Marques <pmarques@grupopie.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kallsyms_lookup() can go iterating over modules list unprotected which is OK
for emergency situations (oops), but not OK for regular stuff like
/proc/*/wchan.
Introduce lookup_symbol_name()/lookup_module_symbol_name() which copy symbol
name into caller-supplied buffer or return -ERANGE. All copying is done with
module_mutex held, so...
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several kallsyms_lookup() pass dummy arguments but only need, say, module's
name. Make kallsyms_lookup() accept NULLs where possible.
Also, makes picture clearer about what interfaces are needed for all symbol
resolving business.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>