* refs/heads/tmp-05baf14:
Linux 4.9.93
spi: davinci: fix up dma_mapping_error() incorrect patch
Revert "ip6_vti: adjust vti mtu according to mtu of lower device"
Revert "mtip32xx: use runtime tag to initialize command header"
Revert "spi: bcm-qspi: shut up warning about cfi header inclusion"
Revert "ARM: dts: omap3-n900: Fix the audio CODEC's reset pin"
Revert "ARM: dts: am335x-pepper: Fix the audio CODEC's reset pin"
Fix slab name "biovec-(1<<(21-12))"
net: hns: Fix ethtool private flags
md/raid10: reset the 'first' at the end of loop
ARM: dts: am57xx-idk-common: Add overide powerhold property
ARM: dts: am57xx-beagle-x15-common: Add overide powerhold property
ARM: dts: dra7: Add power hold and power controller properties to palmas
Documentation: pinctrl: palmas: Add ti,palmas-powerhold-override property definition
vt: change SGR 21 to follow the standards
Input: i8042 - enable MUX on Sony VAIO VGN-CS series to fix touchpad
Input: i8042 - add Lenovo ThinkPad L460 to i8042 reset list
Input: ALPS - fix TrackStick detection on Thinkpad L570 and Latitude 7370
staging: comedi: ni_mio_common: ack ai fifo error interrupts.
crypto: x86/cast5-avx - fix ECB encryption when long sg follows short one
crypto: ahash - Fix early termination in hash walk
parport_pc: Add support for WCH CH382L PCI-E single parallel port card.
media: usbtv: prevent double free in error case
mei: remove dev_err message on an unsupported ioctl
USB: serial: cp210x: add ELDAT Easywave RX09 id
USB: serial: ftdi_sio: add support for Harman FirmwareHubEmulator
USB: serial: ftdi_sio: add RT Systems VX-8 cable
arm64: idmap: Use "awx" flags for .idmap.text .pushsection directives
arm64: entry: Reword comment about post_ttbr_update_workaround
arm64: Force KPTI to be disabled on Cavium ThunderX
arm64: kpti: Add ->enable callback to remap swapper using nG mappings
arm64: kpti: Make use of nG dependent on arm64_kernel_unmapped_at_el0()
arm64: Turn on KPTI only on CPUs that need it
arm64: cputype: Add MIDR values for Cavium ThunderX2 CPUs
arm64: capabilities: Handle duplicate entries for a capability
arm64: Allow checking of a CPU-local erratum
arm64: Take into account ID_AA64PFR0_EL1.CSV3
arm64: Kconfig: Reword UNMAP_KERNEL_AT_EL0 kconfig entry
arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0
arm64: use RET instruction for exiting the trampoline
arm64: kaslr: Put kernel vectors address in separate data page
arm64: entry: Add fake CPU feature for unmapping the kernel at EL0
arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native tasks
arm64: entry: Hook up entry trampoline to exception vectors
arm64: entry: Explicitly pass exception level to kernel_ventry macro
arm64: mm: Map entry trampoline into trampoline and kernel page tables
arm64: entry: Add exception trampoline page for exceptions from EL0
module: extend 'rodata=off' boot cmdline parameter to module mappings
arm64: factor out entry stack manipulation
arm64: mm: Invalidate both kernel and user ASIDs when performing TLBI
arm64: mm: Add arm64_kernel_unmapped_at_el0 helper
arm64: mm: Allocate ASIDs in pairs
arm64: mm: Move ASID from TTBR0 to TTBR1
arm64: mm: Use non-global mappings for kernel space
usb: dwc2: Improve gadget state disconnection handling
scsi: virtio_scsi: always read VPD pages for multiqueue too
llist: clang: introduce member_address_is_nonnull()
Bluetooth: Fix missing encryption refresh on Security Request
netfilter: x_tables: add and use xt_check_proc_name
netfilter: bridge: ebt_among: add more missing match size checks
xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systems
net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()
RDMA/ucma: Introduce safer rdma_addr_size() variants
RDMA/ucma: Check that device exists prior to accessing it
RDMA/ucma: Check that device is connected prior to access it
RDMA/ucma: Ensure that CM_ID exists prior to access it
RDMA/ucma: Fix use-after-free access in ucma_close
RDMA/ucma: Check AF family prior resolving address
xfrm_user: uncoditionally validate esn replay attribute struct
mm/vmscan.c: fix unsequenced modification and access warning
selinux: Remove redundant check for unknown labeling behavior
arm64: avoid overflow in VA_START and PAGE_OFFSET
btrfs: Remove extra parentheses from condition in copy_items()
mac80211: ibss: Fix channel type enum in ieee80211_sta_join_ibss()
mac80211: Fix clang warning about constant operand in logical operation
netfilter: ctnetlink: Make some parameters integer to avoid enum mismatch
HID: sony: Use LED_CORE_SUSPENDRESUME
cfg80211: Fix array-bounds warning in fragment copy
nl80211: Fix enum type of variable in nl80211_put_sta_rate()
xgene_enet: remove bogus forward declarations
usb: gadget: remove redundant self assignment
frv: declare jiffies to be located in the .data section
jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp
fs: compat: Remove warning from COMPATIBLE_IOCTL
selinux: Remove unnecessary check of array base in selinux_set_mapping()
cpumask: Add helper cpumask_available()
genirq: Use cpumask_available() for check of cpumask variable
netfilter: nf_nat_h323: fix logical-not-parentheses warning
Input: mousedev - fix implicit conversion warning
dm ioctl: remove double parentheses
PCI: Make PCI_ROM_ADDRESS_MASK a 32-bit constant
kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
partitions/msdos: Unable to mount UFS 44bsd partitions
powerpc/64s: Fix i-side SLB miss bad address handler saving nonvolatile GPRs
powerpc/64s: Fix lost pending interrupt due to race causing lost update to irq_happened
ipc/shm.c: add split function to shm_vm_ops
ceph: only dirty ITER_IOVEC pages for direct read
perf/hwbp: Simplify the perf-hwbp code, fix documentation
ALSA: pcm: potential uninitialized return values
ALSA: pcm: Use dma_bytes as size parameter in dma_mmap_coherent()
ALSA: usb-audio: Add native DSD support for TEAC UD-301
mtd: jedec_probe: Fix crash in jedec_read_mfr()
ARM: 8746/1: vfp: Go back to clearing vfp_current_hw_state[]
ANDROID: fuse: Add null terminator to path in canonical path to avoid issue
ANDROID: sdcardfs: Fix sdcardfs to stop creating cases-sensitive duplicate entries.
ANDROID: cpufreq: times: skip printing invalid frequencies
ANDROID: cpufreq: Add time_in_state to /proc/uid directories
ANDROID: proc: Add /proc/uid directory
ANDROID: cpufreq: times: track per-uid time in state
ANDROID: cpufreq: track per-task time in state
arm64: fix show_data fallout from KERN_CONT changes
arm: fix show_data fallout from KERN_CONT changes
Conflicts:
arch/arm64/include/asm/assembler.h
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/sysreg.h
arch/arm64/kernel/cpufeature.c
kernel/sched/core.c
Change-Id: If39e1c5577a1c9345b1b2739f4a5368422cef135
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
* refs/heads/tmp-bb52bba:
Linux 4.9.88
PCI: dwc: Fix enumeration end when reaching root subordinate
earlycon: add reg-offset to physical address before mapping
serial: core: mark port as initialized in autoconfig
serial: 8250_pci: Add Brainboxes UC-260 4 port serial device
usb: gadget: f_fs: Fix use-after-free in ffs_fs_kill_sb()
usb: usbmon: Read text within supplied buffer size
usb: quirks: add control message delay for 1b1c:1b20
usbip: vudc: fix null pointer dereference on udc->lock
USB: storage: Add JMicron bridge 152d:2567 to unusual_devs.h
staging: android: ashmem: Fix lockdep issue during llseek
staging: comedi: fix comedi_nsamples_left.
uas: fix comparison for error code
tty/serial: atmel: add new version check for usart
serial: sh-sci: prevent lockup on full TTY buffers
ASoC: rt5651: Fix regcache sync errors on resume
ASoC: sgtl5000: Fix suspend/resume
x86: Treat R_X86_64_PLT32 as R_X86_64_PC32
x86/module: Detect and skip invalid relocations
NFS: Fix unstable write completion
NFS: Fix an incorrect type in struct nfs_direct_req
scsi: qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport
ubi: Fix race condition between ubi volume creation and udev
ext4: inplace xattr block update fails to deduplicate blocks
netfilter: x_tables: pack percpu counter allocations
netfilter: x_tables: pass xt_counters struct to counter allocator
netfilter: x_tables: pass xt_counters struct instead of packet counter
netfilter: ipv6: fix use-after-free Write in nf_nat_ipv6_manip_pkt
netfilter: bridge: ebt_among: add missing match size checks
netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets
netfilter: IDLETIMER: be syzkaller friendly
netfilter: nat: cope with negative port range
netfilter: x_tables: fix missing timer initialization in xt_LED
netfilter: add back stackpointer size checks
tc358743: fix register i2c_rd/wr function fix
Input: tca8418_keypad - remove double read of key event register
ARM: omap2: hide omap3_save_secure_ram on non-OMAP3 builds
watchdog: hpwdt: Remove legacy NMI sourcing.
watchdog: hpwdt: fix unused variable warning
watchdog: hpwdt: Check source of NMI
watchdog: hpwdt: SMBIOS check
x86/paravirt, objtool: Annotate indirect calls
x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
x86/speculation, objtool: Annotate indirect calls/jumps for objtool
x86/retpoline: Support retpoline builds with Clang
x86/speculation: Use IBRS if available before calling into firmware
Revert "x86/retpoline: Simplify vmexit_fill_RSB()"
nospec: Include <asm/barrier.h> dependency
nospec: Kill array_index_nospec_mask_check()
ALSA: hda: add dock and led support for HP ProBook 640 G2
ALSA: hda: add dock and led support for HP EliteBook 820 G3
ALSA: seq: More protection for concurrent write and ioctl races
ALSA: seq: Don't allow resizing pool in use
ALSA: hda/realtek - Make dock sound work on ThinkPad L570
ALSA: hda/realtek - Fix dock line-out volume on Dell Precision 7520
ALSA: hda/realtek: Limit mic boost on T480
x86/spectre_v2: Don't check microcode versions when running under hypervisors
perf tools: Fix trigger class trigger_on()
x86/MCE: Serialize sysfs changes
bcache: don't attach backing with duplicate UUID
bcache: fix crashes in duplicate cache device register
IB/mlx5: Fix incorrect size of klms in the memory region
kbuild: Handle builtin dtb file names containing hyphens
KVM: s390: fix memory overwrites when not using SCA entries
virtio_ring: fix num_free handling in error case
loop: Fix lost writes caused by missing flag
Input: matrix_keypad - fix race when disabling interrupts
MIPS: OCTEON: irq: Check for null return on kzalloc allocation
MIPS: ath25: Check for kzalloc allocation failure
MIPS: BMIPS: Do not mask IPIs during suspend
drm/amdgpu:Always save uvd vcpu_bo in VM Mode
drm/amdgpu:Correct max uvd handles
drm/amdgpu: fix KV harvesting
drm/radeon: fix KV harvesting
drm/amdgpu: Notify sbios device ready before send request
drm/amdgpu: Fix deadlock on runtime suspend
drm/radeon: Fix deadlock on runtime suspend
drm/nouveau: Fix deadlock on runtime suspend
drm: Allow determining if current task is output poll worker
workqueue: Allow retrieval of current task's work struct
drm/i915: Always call to intel_display_set_init_power() in resume_early.
scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS
drm/i915: Try EDID bitbanging on HDMI after failed read
RDMA/mlx5: Fix integer overflow while resizing CQ
RDMA/ucma: Check that user doesn't overflow QP state
RDMA/ucma: Limit possible option size
ANDROID: sdcardfs: fix lock issue on 32 bit/SMP architectures
UPSTREAM: kasan: add functions for unpoisoning stack variables
UPSTREAM: kasan: add tests for alloca poisoning
UPSTREAM: kasan: support alloca() poisoning
UPSTREAM: kasan/Makefile: support LLVM style asan parameters
BACKPORT: kasan: add compiler support for clang
kbuild: fix --gc-sections
BACKPORT: fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'"
UPSTREAM: netfilter: xt_bpf: add overflow checks
UPSTREAM: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
UPSTREAM: netfilter: xt_bpf: support ebpf
FROMLIST: f2fs: don't put dentry page in pagecache into highmem
Change-Id: I7f13fedc725fe5333e18e4e5b6639eee27ea1120
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
Changes in 4.9.88
RDMA/ucma: Limit possible option size
RDMA/ucma: Check that user doesn't overflow QP state
RDMA/mlx5: Fix integer overflow while resizing CQ
drm/i915: Try EDID bitbanging on HDMI after failed read
scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS
drm/i915: Always call to intel_display_set_init_power() in resume_early.
workqueue: Allow retrieval of current task's work struct
drm: Allow determining if current task is output poll worker
drm/nouveau: Fix deadlock on runtime suspend
drm/radeon: Fix deadlock on runtime suspend
drm/amdgpu: Fix deadlock on runtime suspend
drm/amdgpu: Notify sbios device ready before send request
drm/radeon: fix KV harvesting
drm/amdgpu: fix KV harvesting
drm/amdgpu:Correct max uvd handles
drm/amdgpu:Always save uvd vcpu_bo in VM Mode
MIPS: BMIPS: Do not mask IPIs during suspend
MIPS: ath25: Check for kzalloc allocation failure
MIPS: OCTEON: irq: Check for null return on kzalloc allocation
Input: matrix_keypad - fix race when disabling interrupts
loop: Fix lost writes caused by missing flag
virtio_ring: fix num_free handling in error case
KVM: s390: fix memory overwrites when not using SCA entries
kbuild: Handle builtin dtb file names containing hyphens
IB/mlx5: Fix incorrect size of klms in the memory region
bcache: fix crashes in duplicate cache device register
bcache: don't attach backing with duplicate UUID
x86/MCE: Serialize sysfs changes
perf tools: Fix trigger class trigger_on()
x86/spectre_v2: Don't check microcode versions when running under hypervisors
ALSA: hda/realtek: Limit mic boost on T480
ALSA: hda/realtek - Fix dock line-out volume on Dell Precision 7520
ALSA: hda/realtek - Make dock sound work on ThinkPad L570
ALSA: seq: Don't allow resizing pool in use
ALSA: seq: More protection for concurrent write and ioctl races
ALSA: hda: add dock and led support for HP EliteBook 820 G3
ALSA: hda: add dock and led support for HP ProBook 640 G2
nospec: Kill array_index_nospec_mask_check()
nospec: Include <asm/barrier.h> dependency
Revert "x86/retpoline: Simplify vmexit_fill_RSB()"
x86/speculation: Use IBRS if available before calling into firmware
x86/retpoline: Support retpoline builds with Clang
x86/speculation, objtool: Annotate indirect calls/jumps for objtool
x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
x86/paravirt, objtool: Annotate indirect calls
watchdog: hpwdt: SMBIOS check
watchdog: hpwdt: Check source of NMI
watchdog: hpwdt: fix unused variable warning
watchdog: hpwdt: Remove legacy NMI sourcing.
ARM: omap2: hide omap3_save_secure_ram on non-OMAP3 builds
Input: tca8418_keypad - remove double read of key event register
tc358743: fix register i2c_rd/wr function fix
netfilter: add back stackpointer size checks
netfilter: x_tables: fix missing timer initialization in xt_LED
netfilter: nat: cope with negative port range
netfilter: IDLETIMER: be syzkaller friendly
netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets
netfilter: bridge: ebt_among: add missing match size checks
netfilter: ipv6: fix use-after-free Write in nf_nat_ipv6_manip_pkt
netfilter: x_tables: pass xt_counters struct instead of packet counter
netfilter: x_tables: pass xt_counters struct to counter allocator
netfilter: x_tables: pack percpu counter allocations
ext4: inplace xattr block update fails to deduplicate blocks
ubi: Fix race condition between ubi volume creation and udev
scsi: qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport
NFS: Fix an incorrect type in struct nfs_direct_req
NFS: Fix unstable write completion
x86/module: Detect and skip invalid relocations
x86: Treat R_X86_64_PLT32 as R_X86_64_PC32
ASoC: sgtl5000: Fix suspend/resume
ASoC: rt5651: Fix regcache sync errors on resume
serial: sh-sci: prevent lockup on full TTY buffers
tty/serial: atmel: add new version check for usart
uas: fix comparison for error code
staging: comedi: fix comedi_nsamples_left.
staging: android: ashmem: Fix lockdep issue during llseek
USB: storage: Add JMicron bridge 152d:2567 to unusual_devs.h
usbip: vudc: fix null pointer dereference on udc->lock
usb: quirks: add control message delay for 1b1c:1b20
usb: usbmon: Read text within supplied buffer size
usb: gadget: f_fs: Fix use-after-free in ffs_fs_kill_sb()
serial: 8250_pci: Add Brainboxes UC-260 4 port serial device
serial: core: mark port as initialized in autoconfig
earlycon: add reg-offset to physical address before mapping
PCI: dwc: Fix enumeration end when reaching root subordinate
Linux 4.9.88
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit ae0ac0ed6fcf5af3be0f63eb935f483f44a402d2 upstream.
instead of allocating each xt_counter individually, allocate 4k chunks
and then use these for counter allocation requests.
This should speed up rule evaluation by increasing data locality,
also speeds up ruleset loading because we reduce calls to the percpu
allocator.
As Eric points out we can't use PAGE_SIZE, page_allocator would fail on
arches with 64k page size.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d31eef5176df06f218201bc9c0ce40babb41660 upstream.
On SMP we overload the packet counter (unsigned long) to contain
percpu offset. Hide this from callers and pass xt_counters address
instead.
Preparation patch to allocate the percpu counters in page-sized batch
chunks.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux Kernel SIP ALG did not handle Segmented TCP Packets
because of which SIP communication could not be established
for some clients. This change fixes that issue.
Also,This change handle porting of below changes
1)Additional fixes for SIP Segmentation Support
I49fa88eebadb24c7ce03c4d189c71da0fe034e1d
2)Changes to handle MT call issue in SIP ALG
Ia4c83fb18c0e92e8402742f50c0ef05a344e51a4
3)Enable/Disable SIP Segmentation Support
I0e2fdc7dfa43f5d1cc00daa5c3103231c979ac43
Change-Id: I8c77322f69cf4d9ad4c7b4971da924ffd585dea0
Acked-by: Suraj Jaiswal <c_surajj@qti.qualcomm.com>
Signed-off-by: Ravinder Konka <rkonka@codeaurora.org>
Signed-off-by: Tyler Wear <twear@codeaurora.org>
Signed-off-by: Mohammed Javid <mjavid@codeaurora.org>
To prevent protential risk of memory leak caused by closing socket with
out untag it from qtaguid module, the qtaguid module now do not hold any
socket file reference count. Instead, it will increase the sk_refcnt of
the sk struct to prevent a reuse of the socket pointer. And when a socket
is released. It will delete the tag if the socket is previously tagged so
no more resources is held by xt_qtaguid moudle. A flag is added to the untag
process to prevent possible kernel crash caused by fail to delete
corresponding socket_tag_entry list.
Bug: 36374484
Test: compile and run test under system/extra/test/iptables,
run cts -m CtsNetTestCases -t android.net.cts.SocketRefCntTest
Signed-off-by: Chenbo Feng <fengc@google.com>
Change-Id: Iea7c3bf0c59b9774a5114af905b2405f6bc9ee52
The original xt_quota in the kernel is plain broken:
- counts quota at a per CPU level
(was written back when ubiquitous SMP was just a dream)
- provides no way to count across IPV4/IPV6.
This patch is the original unaltered code from:
http://sourceforge.net/projects/xtables-addons
at commit e84391ce665cef046967f796dd91026851d6bbf3
Change-Id: I19d49858840effee9ecf6cff03c23b45a97efdeb
Signed-off-by: JP Abgrall <jpa@google.com>
This module allows tracking stats at the socket level for given UIDs.
It replaces xt_owner.
If the --uid-owner is not specified, it will just count stats based on
who the skb belongs to. This will even happen on incoming skbs as it
looks into the skb via xt_socket magic to see who owns it.
If an skb is lost, it will be assigned to uid=0.
To control what sockets of what UIDs are tagged by what, one uses:
echo t $sock_fd $accounting_tag $the_billed_uid \
> /proc/net/xt_qtaguid/ctrl
So whenever an skb belongs to a sock_fd, it will be accounted against
$the_billed_uid
and matching stats will show up under the uid with the given
$accounting_tag.
Because the number of allocations for the stats structs is not that big:
~500 apps * 32 per app
we'll just do it atomic. This avoids walking lists many times, and
the fancy worker thread handling. Slabs will grow when needed later.
It use netdevice and inetaddr notifications instead of hooks in the core dev
code to track when a device comes and goes. This removes the need for
exposed iface_stat.h.
Put procfs dirs in /proc/net/xt_qtaguid/
ctrl
stats
iface_stat/<iface>/...
The uid stats are obtainable in ./stats.
Change-Id: I01af4fd91c8de651668d3decb76d9bdc1e343919
Signed-off-by: JP Abgrall <jpa@google.com>
These counters sit in hot path and do show up in perf, this is especially
true for 'found' and 'searched' which get incremented for every packet
processed.
Information like
searched=212030105
new=623431
found=333613
delete=623327
does not seem too helpful nowadays:
- on busy systems found and searched will overflow every few hours
(these are 32bit integers), other more busy ones every few days.
- for debugging there are better methods, such as iptables' trace target,
the conntrack log sysctls. Nowadays we also have perf tool.
This removes packet path stat counters except those that
are expected to be 0 (or close to 0) on a normal system, e.g.
'insert_failed' (race happened) or 'invalid' (proto tracker rejects).
The insert stat is retained for the ctnetlink case.
The found stat is retained for the tuple-is-taken check when NAT has to
determine if it needs to pick a different source address.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There are two existing strutures which defines the GRE and PPTP header.
So use these two structures instead of the ones defined by netfilter to
keep consitent with other codes.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There are already some GRE_* macros in kernel, so it is unnecessary
to define these macros. And remove some useless macros
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We should report the over quota message to the right net namespace
instead of the init netns.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The dummy ruleset I used to test the original validation change was broken,
most rules were unreachable and were not tested by mark_source_chains().
In some cases rulesets that used to load in a few seconds now require
several minutes.
sample ruleset that shows the behaviour:
echo "*filter"
for i in $(seq 0 100000);do
printf ":chain_%06x - [0:0]\n" $i
done
for i in $(seq 0 100000);do
printf -- "-A INPUT -j chain_%06x\n" $i
printf -- "-A INPUT -j chain_%06x\n" $i
printf -- "-A INPUT -j chain_%06x\n" $i
done
echo COMMIT
[ pipe result into iptables-restore ]
This ruleset will be about 74mbyte in size, with ~500k searches
though all 500k[1] rule entries. iptables-restore will take forever
(gave up after 10 minutes)
Instead of always searching the entire blob for a match, fill an
array with the start offsets of every single ipt_entry struct,
then do a binary search to check if the jump target is present or not.
After this change ruleset restore times get again close to what one
gets when reverting 3647234101 (~3 seconds on my workstation).
[1] every user-defined rule gets an implicit RETURN, so we get
300k jumps + 100k userchains + 100k returns -> 500k rule entries
Fixes: 3647234101 ("netfilter: x_tables: validate targets of jumps")
Reported-by: Jeff Wu <wujiafu@gmail.com>
Tested-by: Jeff Wu <wujiafu@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter uses multiple FWINV #defines with identical form that hide a
specific structure variable and dereference it with a invflags member.
$ git grep "#define FWINV"
include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg))
net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg))
net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg)))
net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg)))
net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg)))
net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg)))
Consolidate these macros into a single NF_INVF macro.
Miscellanea:
o Neaten the alignment around these uses
o A few lines are > 80 columns for intelligibility
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a forward-port of the original patch from Andrzej Hajda,
he said:
"IS_ERR_VALUE should be used only with unsigned long type.
Otherwise it can work incorrectly. To achieve this function
xt_percpu_counter_alloc is modified to return unsigned long,
and its result is assigned to temporary variable to perform
error checking, before assigning to .pcnt field.
The patch follows conclusion from discussion on LKML [1][2].
[1]: http://permalink.gmane.org/gmane.linux.kernel/2120927
[2]: http://permalink.gmane.org/gmane.linux.kernel/2150581"
Original patch from Andrzej is here:
http://patchwork.ozlabs.org/patch/582970/
This patch has clashed with input validation fixes for x_tables.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, mostly from Florian Westphal to sort out the lack of sufficient
validation in x_tables and connlabel preparation patches to add
nf_tables support. They are:
1) Ensure we don't go over the ruleset blob boundaries in
mark_source_chains().
2) Validate that target jumps land on an existing xt_entry. This extra
sanitization comes with a performance penalty when loading the ruleset.
3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables.
4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables.
5) Make sure the minimal possible target size in x_tables.
6) Similar to #3, add xt_compat_check_entry_offsets() for compat code.
7) Check that standard target size is valid.
8) More sanitization to ensure that the target_offset field is correct.
9) Add xt_check_entry_match() to validate that matches are well-formed.
10-12) Three patch to reduce the number of parameters in
translate_compat_table() for {arp,ip,ip6}tables by using a container
structure.
13) No need to return value from xt_compat_match_from_user(), so make
it void.
14) Consolidate translate_table() so it can be used by compat code too.
15) Remove obsolete check for compat code, so we keep consistent with
what was already removed in the native layout code (back in 2007).
16) Get rid of target jump validation from mark_source_chains(),
obsoleted by #2.
17) Introduce xt_copy_counters_from_user() to consolidate counter
copying, and use it from {arp,ip,ip6}tables.
18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump
functions.
19) Move nf_connlabel_match() to xt_connlabel.
20) Skip event notification if connlabel did not change.
21) Update of nf_connlabels_get() to make the upcoming nft connlabel
support easier.
23) Remove spinlock to read protocol state field in conntrack.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_data() is now aligned on a 64-bit area.
The temporary function nla_put_be64_32bit() is removed in this patch.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The three variants use same copy&pasted code, condense this into a
helper and use that.
Make sure info.name is 0-terminated.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We're currently asserting that targetoff + targetsize <= nextoff.
Extend it to also check that targetoff is >= sizeof(xt_entry).
Since this is generic code, add an argument pointing to the start of the
match/target, we can then derive the base structure size from the delta.
We also need the e->elems pointer in a followup change to validate matches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
32bit rulesets have different layout and alignment requirements, so once
more integrity checks get added to xt_check_entry_offsets it will reject
well-formed 32bit rulesets.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently arp/ip and ip6tables each implement a short helper to check that
the target offset is large enough to hold one xt_entry_target struct and
that t->u.target_size fits within the current rule.
Unfortunately these checks are not sufficient.
To avoid adding new tests to all of ip/ip6/arptables move the current
checks into a helper, then extend this helper in followup patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This fix adds a new reference counter (ref_netlink) for the struct ip_set.
The other reference counter (ref) can be swapped out by ip_set_swap and we
need a separate counter to keep track of references for netlink events
like dump. Using the same ref counter for dump causes a race condition
which can be demonstrated by the following script:
ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \
counters
ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \
counters
ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \
counters
ipset save &
ipset swap hash_ip3 hash_ip2
ipset destroy hash_ip3 /* will crash the machine */
Swap will exchange the values of ref so destroy will see ref = 0 instead of
ref = 1. With this fix in place swap will not succeed because ipset save
still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink).
Both delete and swap will error out if ref_netlink != 0 on the set.
Note: The changes to *_head functions is because previously we would
increment ref whenever we called these functions, we don't do that
anymore.
Reviewed-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
delay hook registration until the table is being requested inside a
namespace.
Historically, a particular table (iptables mangle, ip6tables filter, etc)
was registered on module load.
When netns support was added to iptables only the ip/ip6tables ruleset was
made namespace aware, not the actual hook points.
This means f.e. that when ipt_filter table/module is loaded on a system,
then each namespace on that system has an (empty) iptables filter ruleset.
In other words, if a namespace sends a packet, such skb is 'caught' by
netfilter machinery and fed to hooking points for that table (i.e. INPUT,
FORWARD, etc).
Thanks to Eric Biederman, hooks are no longer global, but per namespace.
This means that we can avoid allocation of empty ruleset in a namespace and
defer hook registration until we need the functionality.
We register a tables hook entry points ONLY in the initial namespace.
When an iptables get/setockopt is issued inside a given namespace, we check
if the table is found in the per-namespace list.
If not, we attempt to find it in the initial namespace, and, if found,
create an empty default table in the requesting namespace and register the
needed hooks.
Hook points are destroyed only once namespace is deleted, there is no
'usage count' (it makes no sense since there is no 'remove table' operation
in xtables api).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Following mmapped netlink removal this code can be simplified by
removing the alloc wrapper.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains the first batch of Netfilter updates for
the upcoming 4.5 kernel. This batch contains userspace netfilter header
compilation fixes, support for packet mangling in nf_tables, the new
tracing infrastructure for nf_tables and cgroup2 support for iptables.
More specifically, they are:
1) Two patches to include dependencies in our netfilter userspace
headers to resolve compilation problems, from Mikko Rapeli.
2) Four comestic cleanup patches for the ebtables codebase, from Ian Morris.
3) Remove duplicate include in the netfilter reject infrastructure,
from Stephen Hemminger.
4) Two patches to simplify the netfilter defragmentation code for IPv6,
patch from Florian Westphal.
5) Fix root ownership of /proc/net netfilter for unpriviledged net
namespaces, from Philip Whineray.
6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.
7) Add mangling support to our nf_tables payload expression, from
Patrick McHardy.
8) Introduce a new netlink-based tracing infrastructure for nf_tables,
from Florian Westphal.
9) Change setter functions in nfnetlink_log to be void, from
Rami Rosen.
10) Add netns support to the cttimeout infrastructure.
11) Add cgroup2 support to iptables, from Tejun Heo.
12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.
13) Add support for mangling pkttype in the nf_tables meta expression,
also from Florian.
BTW, I need that you pull net into net-next, I have another batch that
requires changes that I don't yet see in net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the net pointer to the call_batch callback functions so we can skip
recurrent lookups.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
ip_ct_sctp is an internal structure, embedded by the union
nf_conntrack_proto to store sctp-specific information at conntrack
entries. It has no business with UAPI.
This patch moves it from UAPI to a saner place, together with similar
structs for other protocols.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The data extensions in ipset lacked the proper memory alignment and
thus could lead to kernel crash on several architectures. Therefore
the structures have been reorganized and alignment attributes added
where needed. The patch was tested on armv7h by Gerhard Wiesinger and
on x86_64, sparc64 by Jozsef Kadlecsik.
Reported-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
This patch makes lockdep_nfnl_is_held return bool to improve
readability due to this particular function only using either
one or zero as its return value.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As xt_action_param lives on the stack this does not bloat any
persistent data structures.
This is a first step in making netfilter code that needs to know
which network namespace it is executing in simpler.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fengguang reported, that some randconfig generated the following linker
issue with nf_ct_zone_dflt object involved:
[...]
CC init/version.o
LD init/built-in.o
net/built-in.o: In function `ipv4_conntrack_defrag':
nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
net/built-in.o: In function `ipv6_defrag':
nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
make: *** [vmlinux] Error 1
Given that configurations exist where we have a built-in part, which is
accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user()
and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a
module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in
area when netfilter is configured in general.
Therefore, split the more generic parts into a common header under
include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in
section that already holds parts related to CONFIG_NF_CONNTRACK in the
netfilter core. This fixes the issue on my side.
Fixes: 308ac9143e ("netfilter: nf_conntrack: push zone object into functions")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Move the nfnl_acct_list into the network namespace, initialize
and destroy it per namespace
- Keep track of refcnt on nfacct objects, the old logic does not
longer work with a per namespace list
- Adjust xt_nfacct to pass the namespace when registring objects
Signed-off-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Don't bother testing if we need to switch to alternate stack
unless TEE target is used.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In most cases there is no reentrancy into ip/ip6tables.
For skbs sent by REJECT or SYNPROXY targets, there is one level
of reentrancy, but its not relevant as those targets issue an absolute
verdict, i.e. the jumpstack can be clobbered since its not used
after the target issues absolute verdict (ACCEPT, DROP, STOLEN, etc).
So the only special case where it is relevant is the TEE target, which
returns XT_CONTINUE.
This patch changes ip(6)_do_table to always use the jump stack starting
from 0.
When we detect we're operating on an skb sent via TEE (percpu
nf_skb_duplicated is 1) we switch to an alternate stack to leave
the original one alone.
Since there is no TEE support for arptables, it doesn't need to
test if tee is active.
The jump stack overflow tests are no longer needed as well --
since ->stacksize is the largest call depth we cannot exceed it.
A much better alternative to the external jumpstack would be to just
declare a jumps[32] stack on the local stack frame, but that would mean
we'd have to reject iptables rulesets that used to work before.
Another alternative would be to start rejecting rulesets with a larger
call depth, e.g. 1000 -- in this case it would be feasible to allocate the
entire stack in the percpu area which would avoid one dereference.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
On 32bit archs gcc complains due to cast from void* to u64.
Add intermediate casts to long to silence these warnings.
include/linux/netfilter/x_tables.h:376:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
include/linux/netfilter/x_tables.h:384:15: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
include/linux/netfilter/x_tables.h:391:23: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
include/linux/netfilter/x_tables.h:400:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Fixes: 71ae0dff02 ("netfilter: xtables: use percpu rule counters")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Let's force a 16 bytes alignment on xt_counter percpu allocations,
so that bytes and packets sit in same cache line.
xt_counter being exported to user space, we cannot add __align(16) on
the structure itself.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After Florian patches, there is no need for XT_TABLE_INFO_SZ anymore :
Only one copy of table is kept, instead of one copy per cpu.
We also can avoid a dereference if we put table data right after
xt_table_info. It reduces register pressure and helps compiler.
Then, we attempt a kmalloc() if total size is under order-3 allocation,
to reduce TLB pressure, as in many cases, rules fit in 32 KB.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Replace rwlock_t with spinlock_t in "struct ip_set" and change the locking
accordingly. Convert the comment extension into an rcu-avare object. Also,
simplify the timeout routines.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
When elements added to a hash:* type of set and resizing triggered,
parallel listing could start to list the original set (before resizing)
and "continue" with listing the new set. Fix it by references and
using the original hash table for listing. Therefore the destroying of
the original hash table may happen from the resizing or listing functions.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Commit "Simplify cidr handling for hash:*net* types" broke the cidr
handling for the hash:*net* types when the sets were used by the SET
target: entries with invalid cidr values were added to the sets.
Reported by Jonathan Johnson.
Testsuite entry is added to verify the fix.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
We store the rule blob per (possible) cpu. Unfortunately this means we can
waste lot of memory on big smp machines. ipt_entry structure ('rule head')
is 112 byte, so e.g. with maxcpu=64 one single rule eats
close to 8k RAM.
Since previous patch made counters percpu it appears there is nothing
left in the rule blob that needs to be percpu.
On my test system (144 possible cpus, 400k dummy rules) this
change saves close to 9 Gigabyte of RAM.
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>