* refs/heads/tmp-b324a70:
Linux 4.9.86
MIPS: Implement __multi3 for GCC7 MIPS64r6 builds
KVM: arm/arm64: Fix check for hugepage size when allocating at Stage 2
net: gianfar_ptp: move set_fipers() to spinlock protecting area
sctp: make use of pre-calculated len
xen/gntdev: Fix partial gntdev_mmap() cleanup
xen/gntdev: Fix off-by-one error when unmapping with holes
SolutionEngine771x: fix Ether platform data
mdio-sun4i: Fix a memory leak
xen-netfront: enable device after manual module load
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
mac80211: mesh: drop frames appearing to be from us
nl80211: Check for the required netlink attribute presence
i40e/i40evf: Account for frags split over multiple descriptors in check linearize
uapi libc compat: add fallback for unsupported libcs
drm/ttm: check the return value of kzalloc
NET: usb: qmi_wwan: add support for YUGA CLM920-NC5 PID 0x9625
e1000: fix disabling already-disabled warning
macvlan: Fix one possible double free
xfs: quota: check result of register_shrinker()
xfs: quota: fix missed destroy of qi_tree_lock
IB/ipoib: Fix race condition in neigh creation
IB/mlx4: Fix mlx4_ib_alloc_mr error flow
s390/dasd: fix wrongly assigned configuration data
genirq: Guard handle_bad_irq log messages
IB/mlx5: Fix mlx5_ib_alloc_mr error flow
led: core: Fix brightness setting when setting delay_off=0
bnx2x: Improve reliability in case of nested PCI errors
tg3: Enable PHY reset in MTU change path for 5720
tg3: Add workaround to restrict 5762 MRRS to 2048
tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path
tipc: error path leak fixes in tipc_enable_bearer()
lib/mpi: Fix umul_ppmm() for MIPS64r6
ARM: dts: ls1021a: fix incorrect clock references
scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error
net: stmmac: Fix TX timestamp calculation
ip6_tunnel: get the min mtu properly in ip6_tnl_xmit
net: arc_emac: fix arc_emac_rx() error paths
net: mediatek: setup proper state for disabled GMAC on the default
ASoC: nau8825: fix issue that pop noise when start capture
spi: atmel: fixed spin_lock usage inside atmel_spi_remove
mac80211_hwsim: Fix a possible sleep-in-atomic bug in hwsim_get_radio_nl
drm/nouveau/pci: do a msi rearm on init
net: phy: xgene: disable clk on error paths
sget(): handle failures of register_shrinker()
x86/asm: Allow again using asm.h when building for the 'bpf' clang target
ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
ipv6: icmp6: Allow icmp messages to be looped back
mtd: nand: brcmnand: Zero bitflip is not an error
mtd: nand: gpmi: Fix failure when a erased page has a bitflip at BBM
net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
nvme: check hw sectors before setting chunk sectors
dmaengine: fsl-edma: disable clks on all error paths
f2fs: fix a bug caused by NULL extent tree
i2c: designware: must wait for enable
hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers)
ANDROID: kbuild: change LTO into a choice
ANDROID: arm64: crypto: fix AES CE when built as a module
ANDROID: staging: lustre: fix filler function type
ANDROID: fs: logfs: fix filler function type
ANDROID: fs: gfs2: fix filler function type
ANDROID: fs: exofs: fix filler function type
ANDROID: fs: afs: fix filler function type
ANDROID: keychord: Check for write data size
media-device: fix ioctl function types
drivers/perf: arm_pmu: fix function type mismatch
dummycon: fix function types
fs: nfs: fix filler function type
mm: fix filler function type mismatch
mm: fix drain_local_pages function type
BACKPORT: vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
arch/arm64/crypto: fix CFI in AES CE
arch/arm64/crypto: fix CFI in SHA CE
arm64: disable CFI for cpu_replace_ttbr1
v4l2-ioctl: fix function types for IOCTL_INFO_STD
UPSTREAM: module: Do not paper over type mismatches in module_param_call()
BACKPORT: treewide: Fix function prototypes for module_param_call()
UPSTREAM: module: Prepare to convert all module_param_call() prototypes
bpf: fix function type for __bpf_prog_run
kallsyms: strip the .cfi postfix from symbols with CONFIG_CFI_CLANG
add support for clang Control Flow Integrity (CFI)
HACK: init: ensure initcall ordering with LTO
xen/efi: don't use -fshort-wchar
drivers/misc: disable LTO for lkdtm_rodata.o
arm64: vdso: disable LTO
FROMLIST: BACKPORT: arm64: select ARCH_SUPPORTS_LTO_CLANG
FROMLIST: BACKPORT: arm64: disable RANDOMIZE_MODULE_REGION_FULL with LTO_CLANG
FROMLIST: arch/arm64/crypto: disable LTO for aes-ce-cipher.c
arm64: disable ARM64_ERRATUM_843419 for clang LTO
arm64: pass code model to LLVMgold
FROMLIST: BACKPORT: arm64: make mrs_s and msr_s macros work with LTO
FROMLIST: arm64: kvm: use -fno-jump-tables with clang
FROMLIST: efi/libstub: disable LTO
FROMLIST: scripts/mod: disable LTO for empty.c
FROMLIST: BACKPORT: kbuild: fix dynamic ftrace with clang LTO
FROMLIST: BACKPORT: kbuild: add support for clang LTO
FROMLIST: BACKPORT: arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS
FROMLIST: arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold
FROMLIST: kbuild: add __ld-ifversion and linker-specific macros
FROMLIST: kbuild: add ld-name macro
FROMLIST: BACKPORT: arm64: keep .altinstructions and .altinstr_replacement
arm64: fix LD_DEAD_CODE_DATA_ELIMINATION
FROMLIST: kbuild: fix LD_DEAD_CODE_DATA_ELIMINATION
FROMLIST: BACKPORT: kbuild: add __cc-ifversion and compiler-specific variants
FROMLIST: kbuild: add clang-version.sh
Revert "binder: add missing binder_unlock()"
Linux 4.9.85
x86/entry/64: Clear extra registers beyond syscall arguments, to reduce speculation attack surface
mm: fail get_vaddr_frames() for filesystem-dax mappings
mm: Fix devm_memremap_pages() collision handling
libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
IB/core: disable memory registration of filesystem-dax vmas
v4l2: disable filesystem-dax mapping support
mm: introduce get_user_pages_longterm
device-dax: implement ->split() to catch invalid munmap attempts
libnvdimm: fix integer overflow static analysis warning
fs/dax.c: fix inefficiency in dax_writeback_mapping_range()
mm: avoid spurious 'bad pmd' warning messages
X.509: fix NULL dereference when restricting key with unsupported_sig
binder: add missing binder_unlock()
drm/amdgpu: add new device to use atpx quirk
drm/amdgpu: Avoid leaking PM domain on driver unbind (v2)
drm/amdgpu: add atpx quirk handling (v2)
drm/amdgpu: Add dpm quirk for Jet PRO (v2)
usb: renesas_usbhs: missed the "running" flag in usb_dmac with rx path
usb: gadget: f_fs: Process all descriptors during bind
Revert "usb: musb: host: don't start next rx urb if current one failed"
usb: ldusb: add PIDs for new CASSY devices supported by this driver
usb: dwc3: gadget: Set maxpacket size for ep0 IN
drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA
Add delay-init quirk for Corsair K70 RGB keyboards
arm64: Disable unhandled signal log messages by default
usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks()
ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()
PCI/cxgb4: Extend T3 PCI quirk to T4+ devices
irqchip/gic-v3: Use wmb() instead of smb_wmb() in gic_raise_softirq()
x86/oprofile: Fix bogus GCC-8 warning in nmi_setup()
iio: adis_lib: Initialize trigger before requesting interrupt
iio: buffer: check if a buffer has been set up when poll is called
RDMA/uverbs: Protect from command mask overflow
PKCS#7: fix certificate chain verification
X.509: fix BUG_ON() when hash algorithm is unsupported
cfg80211: fix cfg80211_beacon_dup
scsi: ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info
xtensa: fix high memory/reserved memory collision
netfilter: drop outermost socket lock in getsockopt()
ANDROID: sdcardfs: Set num in extension_details during make_item
Conflicts:
Makefile
arch/arm64/include/asm/arch_gicv3.h
arch/arm64/kernel/module.lds
drivers/usb/gadget/function/f_fs.c
scripts/link-vmlinux.sh
Change in module_param_call() definition requires alignment in:
drivers/hwtracing/coresight/coresight-event.c
drivers/media/radio/radio-iris-transport.c
drivers/power/reset/msm-poweroff.c
drivers/soc/qcom/wcnss/wcnss_wlan.c
drivers/video/fbdev/msm/mdss_dsi_status.c
Change-Id: I2fa32c39bd4ba8a132f8f8abc8132a2ceb32907a
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:
@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@
module_param_call(_name, _set_func, _get_func, _arg, _mode);
@fix_set_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _set_func(
-_val_type _val
+const char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
@fix_get_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _get_func(
-_val_type _val
+char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:
drivers/platform/x86/thinkpad_acpi.c
fs/lockd/svc.c
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Bug: 67506682
Change-Id: I2c9c0ee8ed28065e63270a52c155e5e7d2791295
(cherry picked from commit e4dca7b7aa08b22893c45485d222b5807c1375ae)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
* 4.9/tmp-8cca21f:
Linux 4.9.65
mm/pagewalk.c: report holes in hugetlb ranges
coda: fix 'kernel memory exposure attempt' in fsync
mm/page_alloc.c: broken deferred calculation
ipmi: fix unsigned long underflow
ocfs2: should wait dio before inode lock in ocfs2_setattr()
ocfs2: fix cluster hang after a node dies
dmaengine: dmatest: warn user when dma test times out
serial: 8250_fintek: Fix finding base_port with activated SuperIO
serial: omap: Fix EFR write on RTS deassertion
ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
crypto: dh - Fix double free of ctx->p
crypto: dh - fix memleak in setkey
net/sctp: Always set scope_id in sctp_inet6_skb_msgname
fealnx: Fix building error on MIPS
sctp: do not peel off an assoc from one netns to another one
af_netlink: ensure that NLMSG_DONE never fails in dumps
vlan: fix a use-after-free in vlan_device_event()
net: usb: asix: fill null-ptr-deref in asix_suspend
qmi_wwan: Add missing skb_reset_mac_header-call
net: qmi_wwan: fix divide by 0 on bad descriptors
net: cdc_ether: fix divide by 0 on bad descriptors
bonding: discard lowest hash bit for 802.3ad layer3+4
netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
tcp: do not mangle skb->cb[] in tcp_make_synack()
net: vrf: correct FRA_L3MDEV encode type
tcp_nv: fix division by zero in tcpnv_acked()
Linux 4.9.64
staging: greybus: spilib: fix use-after-free after deregistration
brcmfmac: don't preset all channels as disabled
x86/MCE/AMD: Always give panic severity for UC errors in kernel context
USB: serial: garmin_gps: fix memory leak on probe errors
USB: serial: garmin_gps: fix I/O after failed probe and remove
USB: serial: qcserial: add pid/vid for Sierra Wireless EM7355 fw update
usb: gadget: f_fs: Fix use-after-free in ffs_free_inst
USB: Add delay-init quirk for Corsair K70 LUX keyboards
USB: usbfs: compute urb->actual_length for isochronous
crypto: dh - Don't permit 'key' or 'g' size longer than 'p'
crypto: dh - Don't permit 'p' to be 0
Revert "dt-bindings: Add LEGO MINDSTORMS EV3 compatible specification"
Revert "dt-bindings: Add vendor prefix for LEGO"
uapi: fix linux/rds.h userspace compilation errors
uapi: fix linux/rds.h userspace compilation error
Revert "uapi: fix linux/rds.h userspace compilation errors"
Revert "crypto: xts - Add ECB dependency"
MIPS: Netlogic: Exclude netlogic,xlp-pic code from XLR builds
MIPS: traps: Ensure L1 & L2 ECC checking match for CM3 systems
MIPS: init: Ensure reserved memory regions are not added to bootmem
MIPS: init: Ensure bootmem does not corrupt reserved memory
MIPS: End asm function prologue macros with .insn
staging: greybus: add host device function pointer checks
staging: wilc1000: Fix endian sparse warning
staging: rtl8712: fixed little endian problem
ixgbe: do not disable FEC from the driver
ixgbe: add mask for 64 RSS queues
ixgbe: Reduce I2C retry count on X550 devices
ixgbe: Fix reporting of 100Mb capability
ixgbe: handle close/suspend race with netif_device_detach/present
ixgbe: fix AER error handling
ixgbe: Configure advertised speeds correctly for KR/KX backplane
arm64: dts: NS2: reserve memory for Nitro firmware
ALSA: hda/realtek - Add new codec ID ALC299
gpu: drm: mgag200: mgag200_main:- Handle error from pci_iomap
backlight: adp5520: Fix error handling in adp5520_bl_probe()
backlight: lcd: Fix race condition during register
drm/omap: panel-sony-acx565akm.c: Add MODULE_ALIAS
ALSA: vx: Fix possible transfer overflow
ALSA: vx: Don't try to update capture stream before running
power: supply: axp288_fuel_gauge: Read 12 bit values 2 registers at a time
power: supply: axp288_fuel_gauge: Read 15 bit values 2 registers at a time
rtc: rx8010: change lock mechanism
scsi: lpfc: Clear the VendorVersion in the PLOGI/PLOGI ACC payload
scsi: lpfc: Correct issue leading to oops during link reset
scsi: lpfc: Correct host name in symbolic_name field
scsi: lpfc: FCoE VPort enable-disable does not bring up the VPort
scsi: lpfc: Add missing memory barrier
x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers
staging: rtl8188eu: fix incorrect ERROR tags from logs
tcp: provide timestamps for partial writes
scsi: ufs: add capability to keep auto bkops always enabled
scsi: ufs-qcom: Fix module autoload
igb: Fix hw_dbg logging in igb_update_flash_i210
igb: close/suspend race in netif_device_detach
igb: reset the PHY before reading the PHY ID
drm/sti: sti_vtg: Handle return NULL error from devm_ioremap_nocache
ata: SATA_MV should depend on HAS_DMA
ata: SATA_HIGHBANK should depend on HAS_DMA
ata: ATA_BMDMA should depend on HAS_DMA
ARM: dts: omap5-uevm: Allow bootloader to configure USB Ethernet MAC
ARM: dts: Fix omap3 off mode pull defines
ARM: OMAP2+: Fix init for multiple quirks for the same SoC
ARM: dts: Fix am335x and dm814x scm syscon to probe children
ARM: dts: Fix compatible for ti81xx uarts for 8250
fm10k: request reset when mbx->state changes
extcon: palmas: Check the parent instance to prevent the NULL
extcon: Remove potential problem when calling extcon_register_notifier()
Bluetooth: btusb: fix QCA Rome suspend/resume
arm: crypto: reduce priority of bit-sliced AES cipher
media: dib0700: fix invalid dvb_detach argument
media: imon: Fix null-ptr-deref in imon_probe
Linux 4.9.63
misc: panel: properly restore atomic counter on error path
qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2)
target/iscsi: Fix iSCSI task reassignment handling
brcmfmac: remove setting IBSS mode when stopping AP
security/keys: add CONFIG_KEYS_COMPAT to Kconfig
netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"
netfilter: nat: avoid use of nf_conn_nat extension
Revert "ARM: dts: imx53-qsb-common: fix FEC pinmux config"
ALSA: seq: Cancel pending autoload work at unbinding device
Input: ims-psu - check if CDC union descriptor is sane
usb: usbtest: fix NULL pointer dereference
mac80211: don't compare TKIP TX MIC key in reinstall prevention
mac80211: use constant time comparison with keys
mac80211: accept key reinstall without changing anything
ppp: fix race in ppp device destruction
net_sched: avoid matching qdisc with zero handle
sctp: reset owner sk for data chunks on out queues when migrating a sock
tun: allow positive return values on dev_get_valid_name() call
ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit
ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err
ipip: only increase err_count for some certain type icmp in ipip_err
tap: double-free in error path in tap_open()
net/unix: don't show information about sockets from other namespaces
tcp/dccp: fix other lockdep splats accessing ireq_opt
tcp/dccp: fix lockdep splat in inet_csk_route_req()
sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND
ipv6: flowlabel: do not leave opt->tot_len with garbage
soreuseport: fix initialization race
packet: avoid panic in packet_getsockopt()
tcp/dccp: fix ireq->opt races
sctp: add the missing sock_owned_by_user check in sctp_icmp_redirect
tun: call dev_get_valid_name() before register_netdevice()
l2tp: check ps->sock before running pppol2tp_session_ioctl()
tcp: fix tcp_mtu_probe() vs highest_sack
net: call cgroup_sk_alloc() earlier in sk_clone_lock()
netlink: do not set cb_running if dump's start() errs
ipv6: addrconf: increment ifp refcount before ipv6_del_addr()
tun/tap: sanitize TUNSETSNDBUF input
gso: fix payload length when gso_size is zero
FROMLIST: binder: fix proc->files use-after-free
Conflicts:
drivers/scsi/ufs/ufshcd.h
include/net/netfilter/nf_conntrack.h
Change-Id: I38fd3aa5f077a7bde0a8de4ebe9dc9316075f199
Signed-off-by: Kyle Yan <kyan@codeaurora.org>
NATTYPE entry timeout will not be updated when packets
start flowing through IPA and eventually can timeout.
Make changes to refresh the NATTYPE entry timeout from
the connection tracking netlink module which is used by
IPANAT to update conntrack timeout. Also make the timeout
dynamic by taking the expiry timeout value from connection
tracking entry. Otherwise NATTYPE entry will not timeout
even when all the corresponding conntrack entries expire.
Also, This change take care of porting of below changes
1) netfilter: Add of NATTYPE COOKIE Check
I1a53fa0dc6961dd3e53d382642b413d4ee781ed6
2)ipt_NATTYPE: Fix for changes in
I31e4c4acae30378f37bd4b8e4d7d617957a0545d
3)netfilter: Move NATTYPE forward mode to POSTROUTING chain.
Ic03436339e2b0a6c4277146942f518e6c7d49574
CRs-FIXED: 490813
Change-Id: Ibb225d7b91070ca9948c7a11f0b5925a8435915c
Acked-by: Suraj Jaiswal <c_surajj@qti.qualcomm.com>
Signed-off-by: Tyler Wear <twear@codeaurora.org>
Signed-off-by: Mohammed Javid <mjavid@codeaurora.org>
Linux Kernel SIP ALG did not handle Segmented TCP Packets
because of which SIP communication could not be established
for some clients. This change fixes that issue.
Also,This change handle porting of below changes
1)Additional fixes for SIP Segmentation Support
I49fa88eebadb24c7ce03c4d189c71da0fe034e1d
2)Changes to handle MT call issue in SIP ALG
Ia4c83fb18c0e92e8402742f50c0ef05a344e51a4
3)Enable/Disable SIP Segmentation Support
I0e2fdc7dfa43f5d1cc00daa5c3103231c979ac43
Change-Id: I8c77322f69cf4d9ad4c7b4971da924ffd585dea0
Acked-by: Suraj Jaiswal <c_surajj@qti.qualcomm.com>
Signed-off-by: Ravinder Konka <rkonka@codeaurora.org>
Signed-off-by: Tyler Wear <twear@codeaurora.org>
Signed-off-by: Mohammed Javid <mjavid@codeaurora.org>
When IPA is present, all the data packets go through IPA
and as a result NATTYPE entry timeout will not be refreshed
and eventually it times out. IPA periodically refreshes
the connection tracking entry timeout. So make changes to refresh
the NATTYPE entry timeout from the connection tracking module.
Change-Id: I5861427990af4bfd6046d21809a778409d0d8d5f
Signed-off-by: Mohammed Javid <mjavid@codeaurora.org>
Acked-by: Suraj Jaiswal <c_surajj@qti.qualcomm.com>
commit e1bf1687740ce1a3598a1c5e452b852ff2190682 upstream.
This reverts commit 870190a9ec.
It was not a good idea. The custom hash table was a much better
fit for this purpose.
A fast lookup is not essential, in fact for most cases there is no lookup
at all because original tuple is not taken and can be used as-is.
What needs to be fast is insertion and deletion.
rhlist removal however requires a rhlist walk.
We can have thousands of entries in such a list if source port/addresses
are reused for multiple flows, if this happens removal requests are so
expensive that deletions of a few thousand flows can take several
seconds(!).
The advantages that we got from rhashtable are:
1) table auto-sizing
2) multiple locks
1) would be nice to have, but it is not essential as we have at
most one lookup per new flow, so even a million flows in the bysource
table are not a problem compared to current deletion cost.
2) is easy to add to custom hash table.
I tried to add hlist_node to rhlist to speed up rhltable_remove but this
isn't doable without changing semantics. rhltable_remove_fast will
check that the to-be-deleted object is part of the table and that
requires a list walk that we want to avoid.
Furthermore, using hlist_node increases size of struct rhlist_head, which
in turn increases nf_conn size.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821
Reported-by: Ivan Babrou <ibobrik@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shortcut forward Engine (SFE) is a software packet accelerator
which works on packet tuple entires (SFE entry) based on
conntrack information.
net:core has changes to invoke SFE module during packet traversal.
net:netfilter has changes to remove SFE Entries when conntrack is
deleted or expires. Also has changes to avoid tcp window check for
incoming packets.
Change-Id: I1622677e472870f8100c72221d9b1fab7fa768be
Signed-off-by: Mohammed Javid <mjavid@codeaurora.org>
Add support to send conntrack event when
configured packet threshold is met.
Change-Id: I11fe78a74512d901d260deead3354461fc4990ab
Acked-by: Chaitanya Pratapa <cpratapa@qti.qualcomm.com>
Signed-off-by: Mohammed Javid <mjavid@codeaurora.org>
[ Upstream commit 568af6de058cb2b0c5b98d98ffcf37cdc6bc38a7 ]
Phil Sutter reports that IPv6 AH header matching is broken. From
userspace, nft generates bytecode that expects to find the AH header at
NFT_PAYLOAD_TRANSPORT_HEADER both for IPv4 and IPv6. However,
pktinfo->thoff is set to the inner header after the AH header in IPv6,
while in IPv4 pktinfo->thoff points to the AH header indeed. This
behaviour is inconsistent. This patch fixes this problem by updating
ipv6_find_hdr() to get the IP6_FH_F_AUTH flag so this function stops at
the AH header, so both IPv4 and IPv6 pktinfo->thoff point to the AH
header.
This is also inconsistent when trying to match encapsulated headers:
1) A packet that looks like IPv4 + AH + TCP dport 22 will *not* match.
2) A packet that looks like IPv6 + AH + TCP dport 22 will match.
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stas Nichiporovich reports oops in nf_nat_bysource_cmp(), trying to
access nf_conn struct at address 0xffffffffffffff50.
This is the result of fetching a null rhash list (struct embedded at
offset 176; 0 - 176 gets us ...fff50).
The problem is that conntrack entries are allocated from a
SLAB_DESTROY_BY_RCU cache, i.e. entries can be free'd and reused
on another cpu while nf nat bysource hash access the same conntrack entry.
Freeing is fine (we hold rcu read lock); zeroing rhlist_head isn't.
-> Move the rhlist struct outside of the memset()-inited area.
Fixes: 7c96643519 ("netfilter: move nat hlist_head to nf_conn")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
As Liping Zhang reports, after commit a8b1e36d0d ("netfilter: nft_dynset:
fix element timeout for HZ != 1000"), priv->timeout was stored in jiffies,
while set->timeout was stored in milliseconds. This is inconsistent and
incorrect.
Firstly, we already call msecs_to_jiffies in nft_set_elem_init, so
priv->timeout will be converted to jiffies twice.
Secondly, if the user did not specify the NFTA_DYNSET_TIMEOUT attr,
set->timeout will be used, but we forget to call msecs_to_jiffies
when do update elements.
Fix this by using jiffies internally for traditional sets and doing the
conversions to/from msec when interacting with userspace - as dynset
already does.
This is preferable to doing the conversions, when elements are inserted or
updated, because this can happen very frequently on busy dynsets.
Fixes: a8b1e36d0d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Reported-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
Acked-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
I got offlist bug report about failing connections and high cpu usage.
This happens because we hit 'elasticity' checks in rhashtable that
refuses bucket list exceeding 16 entries.
The nat bysrc hash unfortunately needs to insert distinct objects that
share same key and are identical (have same source tuple), this cannot
be avoided.
Switch to the rhlist interface which is designed for this.
The nulls_base is removed here, I don't think its needed:
A (unlikely) false positive results in unneeded port clash resolution,
a false negative results in packet drop during conntrack confirmation,
when we try to insert the duplicate into main conntrack hash table.
Tested by adding multiple ip addresses to host, then adding
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
... and then creating multiple connections, from same source port but
different addresses:
for i in $(seq 2000 2032);do nc -p 1234 192.168.7.1 $i > /dev/null & done
(all of these then get hashed to same bysource slot)
Then, to test that nat conflict resultion is working:
nc -s 10.0.0.1 -p 1234 192.168.7.1 2000
nc -s 10.0.0.2 -p 1234 192.168.7.1 2000
tcp .. src=10.0.0.1 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1024 [ASSURED]
tcp .. src=10.0.0.2 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1025 [ASSURED]
tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1234 [ASSURED]
tcp .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2001 src=192.168.7.1 dst=192.168.7.10 sport=2001 dport=1234 [ASSURED]
[..]
-> nat altered source ports to 1024 and 1025, respectively.
This can also be confirmed on destination host which shows
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1024
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1025
ESTAB 0 0 192.168.7.1:2000 192.168.7.10:1234
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 870190a9ec ("netfilter: nat: convert nat bysrc hash to rhashtable")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is now a fixed-size extension, so we don't need to pass a variable
alloc size. This (harmless) error results in allocating 32 instead of
the needed 16 bytes for this extension as the size gets passed twice.
Fixes: 23014011ba ("netfilter: conntrack: support a fixed size of 128 distinct labels")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 36b701fae1 ("netfilter: nf_tables: validate maximum value of
u32 netlink attributes") introduced nft_parse_u32_check with a return
value of "unsigned int", yet on error it returns "-ERANGE".
This patch corrects the mismatch by changing the return value to "int",
which happens to match the actual users of nft_parse_u32_check already.
Found by Coverity, CID 1373930.
Note that commit 21a9e0f156 ("netfilter: nft_exthdr: fix error
handling in nft_exthdr_init()) attempted to address the issue, but
did not address the return type of nft_parse_u32_check.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Laura Garcia Liebana <nevola@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 36b701fae1 ("netfilter: nf_tables: validate maximum value...")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When nft_expr_clone failed, a series of problems will happen:
1. module refcnt will leak, we call __module_get at the beginning but
we forget to put it back if ops->clone returns fail
2. memory will be leaked, if clone fail, we just return NULL and forget
to free the alloced element
3. set->nelems will become incorrect when set->size is specified. If
clone fail, we should decrease the set->nelems
Now this patch fixes these problems. And fortunately, clone fail will
only happen on counter expression when memory is exhausted.
Fixes: 086f332167 ("netfilter: nf_tables: add clone interface to expression operations")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
net/netfilter/core.c
net/netfilter/nf_tables_netdev.c
Resolve two conflicts before pull request for David's net-next tree:
1) Between c73c248490 ("netfilter: nf_tables_netdev: remove redundant
ip_hdr assignment") from the net tree and commit ddc8b6027a
("netfilter: introduce nft_set_pktinfo_{ipv4, ipv6}_validate()").
2) Between e8bffe0cf9 ("net: Add _nf_(un)register_hooks symbols") and
Aaron Conole's patches to replace list_head with single linked list.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
NFTA_LOG_FLAGS attribute is already supported, but the related
NF_LOG_XXX flags are not exposed to the userspace. So we cannot
explicitly enable log flags to log uid, tcp sequence, ip options
and so on, i.e. such rule "nft add rule filter output log uid"
is not supported yet.
So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In
order to keep consistent with other modules, change NF_LOG_MASK to
refer to all supported log flags. On the other hand, add a new
NF_LOG_DEFAULT_MASK to refer to the original default log flags.
Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP
and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the
userspace.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Inverse ranges != [a,b] are not currently possible because rules are
composites of && operations, and we need to express this:
data < a || data > b
This patch adds a new range expression. Positive ranges can be already
through two cmp expressions:
cmp(sreg, data, >=)
cmp(sreg, data, <=)
This new range expression provides an alternative way to express this.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The netfilter hook list never uses the prev pointer, and so can be trimmed to
be a simple singly-linked list.
In addition to having a more light weight structure for hook traversal,
struct net becomes 5568 bytes (down from 6400) and struct net_device becomes
2176 bytes (down from 2240).
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A future patch will modify the hook drop and outfn functions. This will
cause the line lengths to take up too much space. This is simply a
readability change.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This replaces the last uses of NF_HOOK_THRESH().
Followup patch will remove it and rename nf_hook_thresh.
The reason is that inet (non-bridge) netfilter no longer invokes the
hooks from hooks, so we do no longer need the thresh value to skip hooks
with a lower priority.
The bridge netfilter however may need to do this. br_nf_hook_thresh is a
wrapper that is supposed to do this, i.e. only call hooks with a
priority that exceeds NF_BR_PRI_BRNF.
It's used only in the recursion cases of br_netfilter. It invokes
nf_hook_slow while holding an rcu read-side critical section to make a
future cleanup simpler.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After commit ac28634456 ("netfilter: bridge: add nf_afinfo to enable
queuing to userspace"), we can queue packets to the user space in bridge
family. But when the user specify the queue range, packets will be only
delivered to the first queue num. Because in nfqueue_hash, we only support
ipv4 and ipv6 family. Now add support for bridge family too.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fetch value and validate u32 netlink attribute. This validation is
usually required when the u32 netlink attributes are being stored in a
field whose size is smaller.
This patch revisits 4da449ae1d ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").
Fixes: 96518518cc ("netfilter: add nftables")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When memory is exhausted, nfct_seqadj_ext_add may fail to add the
synproxy and seqadj extensions. The function nf_ct_seqadj_init doesn't
check if get valid seqadj pointer by the nfct_seqadj.
Now drop the packet directly when fail to add seqadj extension to
avoid dereference NULL pointer in nf_ct_seqadj_init from
init_conntrack().
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
drivers/net/ethernet/mediatek/mtk_eth_soc.c
drivers/net/ethernet/qlogic/qed/qed_dcbx.c
drivers/net/phy/Kconfig
All conflicts were cases of overlapping commits.
Signed-off-by: David S. Miller <davem@davemloft.net>
hash_v6 is used by both nftables and ip6tables, so depend on
IP6_NF_IPTABLES is not properly.
Actually, it only parses ipv6hdr and computes a hash value, so
even if IPV6 is disabled, there's no side effect too, remove it.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is overly conservative and not flexible at all, so better let them
go through and let the filtering policy decide what to do with them. We
use skb_header_pointer() all over the place so we would just fail to
match when trying to access fields from malformed traffic.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Consolidate pktinfo setup and validation by using the new generic
functions so we converge to the netdev family codebase.
We only need a linear IPv4 and IPv6 header from the reject expression,
so move nft_bridge_iphdr_validate() and nft_bridge_ip6hdr_validate()
to net/bridge/netfilter/nft_reject_bridge.c.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These functions are extracted from the netdev family, they initialize
the pktinfo structure and validate that the IPv4 and IPv6 headers are
well-formed given that these functions are called from a path where
layer 3 sanitization did not happen yet.
These functions are placed in include/net/netfilter/nf_tables_ipv{4,6}.h
so they can be reused by a follow up patch to use them from the bridge
family too.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Make sure the pktinfo protocol fields are initialized if this fails to
parse the transport header.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch introduces nft_set_pktinfo_unspec() that ensures proper
initialization all of pktinfo fields for non-IP traffic. This is used
by the bridge, netdev and arp families.
This new function relies on nft_set_pktinfo_proto_unspec() to set a new
tprot_set field that indicates if transport protocol information is
available. Remain fields are zeroed.
The meta expression has been also updated to check to tprot_set in first
place given that zero is a valid tprot value. Even a handcrafted packet
may come with the IPPROTO_RAW (255) protocol number so we can't rely on
this value as tprot unset.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After commit adf0516845 ("netfilter: remove ip_conntrack* sysctl
compat code"), ctl_table_path member in struct nf_conntrack_l3proto{}
is not used anymore, remove it.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The nf_log_set is an interface function, so it should do the strict sanity
check of parameters. Convert the return value of nf_log_set as int instead
of void. When the pf is invalid, return -EOPNOTSUPP.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After timer removal this just calls nf_ct_delete so remove the __ prefix
version and make nf_ct_kill a shorthand for nf_ct_delete.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
Eric Dumazet pointed out during netfilter workshop 2016.
Eric also says: "Another reason was the fact that Thomas was about to
change max timer range [..]" (500462a9de, 'timers: Switch to
a non-cascading wheel').
Remove the timer and use a 32bit jiffies value containing timestamp until
entry is valid.
During conntrack lookup, even before doing tuple comparision, check
the timeout value and evict the entry in case it is too old.
The dying bit is used as a synchronization point to avoid races where
multiple cpus try to evict the same entry.
Because lookup is always lockless, we need to bump the refcnt once
when we evict, else we could try to evict already-dead entry that
is being recycled.
This is the standard/expected way when conntrack entries are destroyed.
Followup patches will introduce garbage colliction via work queue
and further places where we can reap obsoleted entries (e.g. during
netlink dumps), this is needed to avoid expired conntracks from hanging
around for too long when lookup rate is low after a busy period.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The reliable event delivery mode currently (ab)uses the DYING bit to
detect which entries on the dying list have to be skipped when
re-delivering events from the eache worker in reliable event mode.
Currently when we delete the conntrack from main table we only set this
bit if we could also deliver the netlink destroy event to userspace.
If we fail we move it to the dying list, the ecache worker will
reattempt event delivery for all confirmed conntracks on the dying list
that do not have the DYING bit set.
Once timer is gone, we can no longer use if (del_timer()) to detect
when we 'stole' the reference count owned by the timer/hash entry, so
we need some other way to avoid racing with other cpu.
Pablo suggested to add a marker in the ecache extension that skips
entries that have been unhashed from main table but are still waiting
for the last reference count to be dropped (e.g. because one skb waiting
on nfqueue verdict still holds a reference).
We do this by adding a tristate.
If we fail to deliver the destroy event, make a note of this in the
eache extension. The worker can then skip all entries that are in
a different state. Either they never delivered a destroy event,
e.g. because the netlink backend was not loaded, or redelivery took
place already.
Once the conntrack timer is removed we will now be able to replace
del_timer() test with test_and_set_bit(DYING, &ct->status) to avoid
racing with other cpu that tries to evict the same conntrack.
Because DYING will then be set right before we report the destroy event
we can no longer skip event reporting when dying bit is set.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the NLM_F_EXCL flag is set, then new elements that clash with an
existing one return EEXIST. In case you try to add an element whose
data area differs from what we have, then this returns EBUSY. If no
flag is specified at all, then this returns success to userspace.
This patch also update the set insert operation so we can fetch the
existing element that clashes with the one you want to add, we need
this to make sure the element data doesn't differ.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
"meta pkttype set" is only supported on prerouting chain with bridge
family and ingress chain with netdev family.
But the validate check is incomplete, and the user can add the nft
rules on input chain with bridge family, for example:
# nft add table bridge filter
# nft add chain bridge filter input {type filter hook input \
priority 0 \;}
# nft add chain bridge filter test
# nft add rule bridge filter test meta pkttype set unicast
# nft add rule bridge filter input jump test
This patch fixes the problem.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After I add the nft rule "nft add rule filter prerouting reject
with tcp reset", kernel panic happened on my system:
NULL pointer dereference at ...
IP: [<ffffffff81b9db2f>] nf_send_reset+0xaf/0x400
Call Trace:
[<ffffffff81b9da80>] ? nf_reject_ip_tcphdr_get+0x160/0x160
[<ffffffffa0928061>] nft_reject_ipv4_eval+0x61/0xb0 [nft_reject_ipv4]
[<ffffffffa08e836a>] nft_do_chain+0x1fa/0x890 [nf_tables]
[<ffffffffa08e8170>] ? __nft_trace_packet+0x170/0x170 [nf_tables]
[<ffffffffa06e0900>] ? nf_ct_invert_tuple+0xb0/0xc0 [nf_conntrack]
[<ffffffffa07224d4>] ? nf_nat_setup_info+0x5d4/0x650 [nf_nat]
[...]
Because in the PREROUTING chain, routing information is not exist,
then we will dereference the NULL pointer and oops happen.
So we restrict reject expression to INPUT, FORWARD and OUTPUT chain.
This is consistent with iptables REJECT target.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since commit 64b87639c9 ("netfilter: conntrack: fix race between
nf_conntrack proc read and hash resize") introduce the
nf_conntrack_get_ht, so there's no need to check nf_conntrack_generation
again and again to get the hash table and hash size. And convert
nf_conntrack_get_ht to inline function here.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This backward compatibility has been around for more than ten years,
since Yasuyuki Kozakai introduced IPv6 in conntrack. These days, we have
alternate /proc/net/nf_conntrack* entries, the ctnetlink interface and
the conntrack utility got adopted by many people in the user community
according to what I observed on the netfilter user mailing list.
So let's get rid of this.
Note that nf_conntrack_htable_size and unsigned int nf_conntrack_max do
not need to be exported as symbol anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next,
they are:
1) Count pre-established connections as active in "least connection"
schedulers such that pre-established connections to avoid overloading
backend servers on peak demands, from Michal Kubecek via Simon Horman.
2) Address a race condition when resizing the conntrack table by caching
the bucket size when fulling iterating over the hashtable in these
three possible scenarios: 1) dump via /proc/net/nf_conntrack,
2) unlinking userspace helper and 3) unlinking custom conntrack timeout.
From Liping Zhang.
3) Revisit early_drop() path to perform lockless traversal on conntrack
eviction under stress, use del_timer() as synchronization point to
avoid two CPUs evicting the same entry, from Florian Westphal.
4) Move NAT hlist_head to nf_conn object, this simplifies the existing
NAT extension and it doesn't increase size since recent patches to
align nf_conn, from Florian.
5) Use rhashtable for the by-source NAT hashtable, also from Florian.
6) Don't allow --physdev-is-out from OUTPUT chain, just like
--physdev-out is not either, from Hangbin Liu.
7) Automagically set on nf_conntrack counters if the user tries to
match ct bytes/packets from nftables, from Liping Zhang.
8) Remove possible_net_t fields in nf_tables set objects since we just
simply pass the net pointer to the backend set type implementations.
9) Fix possible off-by-one in h323, from Toby DiPasquale.
10) early_drop() may be called from ctnetlink patch, so we must hold
rcu read size lock from them too, this amends Florian's patch #3
coming in this batch, from Liping Zhang.
11) Use binary search to validate jump offset in x_tables, this
addresses the O(n!) validation that was introduced recently
resolve security issues with unpriviledge namespaces, from Florian.
12) Fix reference leak to connlabel in error path of nft_ct, from Zhang.
13) Three updates for nft_log: Fix log prefix leak in error path. Bail
out on loglevel larger than debug in nft_log and set on the new
NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang.
14) Allow to filter rule dumps in nf_tables based on table and chain
names.
15) Simplify connlabel to always use 128 bits to store labels and
get rid of unused function in xt_connlabel, from Florian.
16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack
helper, by Gao Feng.
17) Put back x_tables module reference in nft_compat on error, from
Liping Zhang.
18) Add a reference count to the x_tables extensions cache in
nft_compat, so we can remove them when unused and avoid a crash
if the extensions are rmmod, again from Zhang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntrack label extension is currently variable-sized, e.g. if
only 2 labels are used by iptables rules then the labels->bits[] array
will only contain one element.
We track size of each label storage area in the 'words' member.
But in nftables and openvswitch we always have to ask for worst-case
since we don't know what bit will be used at configuration time.
As most arches are 64bit we need to allocate 24 bytes in this case:
struct nf_conn_labels {
u8 words; /* 0 1 */
/* XXX 7 bytes hole, try to pack */
long unsigned bits[2]; /* 8 24 */
Make bits a fixed size and drop the words member, it simplifies
the code and only increases memory requirements on x86 when
less than 64bit labels are required.
We still only allocate the extension if its needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>