This patch reintroduces immediate rehash during insertion. If
we find during insertion that the table is full or the chain
length exceeds a set limit (currently 16 but may be disabled
with insecure_elasticity) then we will force an immediate rehash.
The rehash will contain an expansion if the table utilisation
exceeds 75%.
If this rehash fails then the insertion will fail. Otherwise the
insertion will be reattempted in the new hash table.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the missing bits to allow multiple rehashes. The
read-side as well as remove already handle this correctly. So it's
only the rehasher and insertion that need modification to handle
this.
Note that this patch doesn't actually enable it so for now rehashing
is still only performed by the worker thread.
This patch also disables the explicit expand/shrink interface because
the table is meant to expand and shrink automatically, and continuing
to export these interfaces unnecessarily complicates the life of the
rehasher since the rehash process is now composed of two parts.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since every current rhashtable user uses jhash as their hash
function, the fact that jhash is an inline function causes each
user to generate a copy of its code.
This function provides a solution to this problem by allowing
hashfn to be unset. In which case rhashtable will automatically
set it to jhash. Furthermore, if the key length is a multiple
of 4, we will switch over to jhash2.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When rht_key_hashfn is called from rhashtable itself and params
is equal to ht->p, there is no point in checking params.key_len
and falling back to ht->p.key_len.
For some reason gcc couldn't figure out that params is the same
as ht->p. So let's help it by only checking params.key_len when
it's a constant.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJVD+1xAAoJECte4hHFiupUVD8P/2ymRe2lcNILciuzaErjkDjc
GvbvsDM2rIPKfxT1ptrn52l9MhvIN/oFwVjSzdBNFKz9GmCnqvwwT+EHpV2CxdIR
deblAbDAGuzYbDbY7uadOXU2CQWcUw7r7wCgNhIcWs1qd7ndfGVj3eBKFsJsGuvM
5s2sSP4s7U2xgIk2Ts8RHAGdofL/YVUUqSTOWzUfTJm7rPDTtWLXHMk0e3V8kFN0
im9yy00gOnMYpuAMfemD7fm6OClO/3L2x5gjVXNMGrf73TivfLMSrCGC3eqN3W9J
FY7G9502c6cMxNK7eneZE31XCHxpVIOHwGD9C4AWi/b9kHGZjKdQ5HK9KTXAbCLh
eN8h2hpnlrotS1wDXh0ThGqZYpW7LhzKyZvZ3DSW9WBnwf2NqTdHx2HtIQL6WE8X
WAZbmUGJhOh+ZZ7hGhIBNHqAT7qz6rprMIKfjio609ZU3vZscnwEUv+8FEKipKFD
qZTSUE1ZLTczHZg/jMQ208ecE4uD06GsmNzi1kzzHjbaCtGnRhwKfD3yl3V8dn1O
Zz8n2nJqF+V5F+AWFHgJ6AkM5Na7UOvVhtY/8R7u1D1gn+yQStYjY4y4jxR4JdsU
8mIUc+1PhJgIggsvbKYP1IzXvMecWtPDDfIkPIXsOaFANF4jN8PTfLXe1ZYGhpCV
aMuUZk7XUthdJavD0FWZ
=m6GG
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-4.1-20150323' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2015-03-23
this is a pull request of 6 patches for net-next/master.
A patch by Florian Westphal, converts the skb->destructor to use
sock_efree() instead of own destructor. Ahmed S. Darwish's patch
converts the kvaser_usb driver to use unregister_candev(). A patch by
me removes a return from a void function in the m_can driver. Yegor
Yefremov contributes a patch for combined rx/tx LED trigger support. A
sparse warning in the esd_usb2 driver was fixes by Thomas Körper. Ben
Dooks converts the at91_can driver to use endian agnostic IO accessors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next.
Basically, more incremental updates for br_netfilter from Florian
Westphal, small nf_tables updates (including one fix for rb-tree
locking) and small two-liner to add extra validation for the REJECT6
target.
More specifically, they are:
1) Use the conntrack status flags from br_netfilter to know that DNAT is
happening. Patch for Florian Westphal.
2) nf_bridge->physoutdev == NULL already indicates that the traffic is
bridged, so let's get rid of the BRNF_BRIDGED flag. Also from Florian.
3) Another patch to prepare voidization of seq_printf/seq_puts/seq_putc,
from Joe Perches.
4) Consolidation of nf_tables_newtable() error path.
5) Kill nf_bridge_pad used by br_netfilter from ip_fragment(),
from Florian Westphal.
6) Access rb-tree root node inside the lock and remove unnecessary
locking from the get path (we already hold nfnl_lock there), from
Patrick McHardy.
7) You cannot use a NFT_SET_ELEM_INTERVAL_END when the set doesn't
support interval, also from Patrick.
8) Enforce IP6T_F_PROTO from ip6t_REJECT to make sure the core is
actually restricting matches to TCP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
dccp_v4_err() can restrict lookups to ehash table, and not to listeners.
Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.
New dccp_req_err() helper is exported so that we can use it in IPv6
in following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not needed, and req->sk_listener points to the listener anyway.
request_sock argument can be const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add <ifname>-rxtx trigger, that will be activated both for tx
as rx events. This trigger mimics "activity" LED for Ethernet
devices.
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
It is identical to the can destructor.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The br_netfilter frag output function calls skb_cow_head() so in
case it needs a larger headroom to e.g. re-add a previously stripped PPPOE
or VLAN header things will still work (at cost of reallocation).
We can then move nf_bridge_encap_header_len to br_netfilter.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This reverts commit ca10b9e9a8.
No longer needed after commit eb8895debe
("tcp: tcp_make_synack() should use sock_wmalloc")
When under SYNFLOOD, we build lot of SYNACK and hit false sharing
because of multiple modifications done on sk_listener->sk_wmem_alloc
Since tcp_make_synack() uses sock_wmalloc(), there is no need
to call skb_set_owner_w() again, as this adds two atomic operations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
net/core/sysctl_net_core.c
net/ipv4/inet_diag.c
The be_main.c conflict resolution was really tricky. The conflict
hunks generated by GIT were very unhelpful, to say the least. It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().
So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged. And this worked beautifully.
The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to include linux/errno.h in rhashtable.h since it doesn't
always get included otherwise.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that all rhashtable users have been converted over to the
inline interface, this patch removes the unused out-of-line
interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch deals with the complaint that we make indirect function
calls on the fast paths unnecessarily in rhashtable. We resolve
it by moving the fast paths into inline functions that take struct
rhashtable_param (which obviously must be the same set of parameters
supplied to rhashtable_init) as an argument.
The only remaining indirect call is to obj_hashfn (or key_hashfn it
obj_hashfn is unset) on the rehash as well as the insert-during-
rehash slow path.
This patch also extends the support of vairable-length keys to
include those where the key is fixed but scattered in the object.
For example, in netlink we want to key off the namespace and the
portid but they're not next to each other.
This patch does this by directly using the object hash function
as the indicator of whether the key is accessible or not. It
also adds a new function obj_cmpfn to compare a key against an
object. This means that the caller no longer needs to supply
explicit compare functions.
All this is done in a backwards compatible manner so no existing
users are affected until they convert to the new interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch marks the rhashtable_init params argument const as
there is no reason to modify it since we will always make a copy
of it in the rhashtable.
This patch also fixes a bug where we don't actually round up the
value of min_size unless it is less than HASH_MIN_SIZE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix up consumer return values on pin control stubs.
- Four patches fixing up the interrupt handling and
sleep context save in the Baytrail driver.
- Make default output directions work properly in the
Cherryview driver.
- Fix interrupt locking in the AT91 driver.
- Fix setting interrupt generating lines as input in
the sunxi driver.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVCo4HAAoJEEEQszewGV1z+HQQAMAzKL9igh3iCDR8p3tmB1sp
ZBTgWl4NHT9MFeA1AmilVEn1JbmUDmDetTN7P/sVC8mxaJiheY8PrFbj4bwBsvzA
JV0mlSBSj7jw8CxNM9rzSWZxRNtdpKr45siLA1SBPdni2x711SRW1H57eK73UQnQ
PEVW9hWzXY4cHU6Q7dX67YDJuteQsu5A1QCy6hBYX4Kyyy5gT8RJs6lAvx2f5k8g
Gsdgs60T/bTmIAiQT3FIf6VUQezW2m1PZn2fhuJeUCZWM17ej2YjVoanQUKu3Bz5
GPvV3wt2FpbKJsum5p4FJQPRD2qPsuq4jg7Msk6QOiVWHOl/QtL30AnS6N/iQ97z
TlblAH3ze2t182rHeI4J8d4FIX8jRfftb9DHlgBhLZFU4k0EanMvqEWN6/8yqgmy
n5nYUB88y6rI5RRLoGAStudlRIHqpz0fFvsU6IW5aZ7wpobIv+JPtMBUbfIKdQBV
Xj39LDJj9W5jtI7Icl2v8q7oTknnEa7rUuH/VYbptMLkXBxndWPKG2JLFwfhQ3Py
KMZvFdLP7E2uAR89KNvqQxbQQuYOK8wx5T2nFV57wVPX6VFv/G9sKUqdS5F7iraS
qg8aBloBN9k79mBBWyIo/XEdluYC/zuKf2KPRvH7UhXep+e9iqCSxWyXqgbgv6F9
MtWvNLoK9kZyxymnqbmJ
=qp0z
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here is a slew of pin control fixes I've accumulated for the v4.0
kernel. Nothing special, just driver fixes (mainly embedded Intel it
seems) and a misunderstanding regarding the stub functions was
reverted:
- Fix up consumer return values on pin control stubs.
- Four patches fixing up the interrupt handling and sleep context
save in the Baytrail driver.
- Make default output directions work properly in the Cherryview
driver.
- Fix interrupt locking in the AT91 driver.
- Fix setting interrupt generating lines as input in the sunxi
driver"
* tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sun4i: GPIOs configured as irq must be set to input before reading
pinctrl: at91: move lock/unlock_as_irq calls into request/release
pinctrl: update direction_output function of cherryview driver
pinctrl: baytrail: Save pin context over system sleep
pinctrl: baytrail: Rework interrupt handling
pinctrl: baytrail: Clear interrupt triggering from pins that are in GPIO mode
pinctrl: baytrail: Relax GPIO request rules
Revert "pinctrl: consumer: use correct retval for placeholder functions"
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-03-19
This wont the last 4.1 bluetooth-next pull request, but we've piled up
enough patches in less than a week that I wanted to save you from a
single huge "last-minute" pull somewhere closer to the merge window.
The main changes are:
- Simultaneous LE & BR/EDR discovery support for HW that can do it
- Complete LE OOB pairing support
- More fine-grained mgmt-command access control (normal user can now do
harmless read-only operations).
- Added RF power amplifier support in cc2520 ieee802154 driver
- Some cleanups/fixes in ieee802154 code
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix packet header offset calculation in _decode_session6(), from
Hajime Tazaki.
2) Fix route leak in error paths of xfrm_lookup(), from Huaibin Wang.
3) Be sure to clear state properly when scans fail in iwlwifi mvm code,
from Luciano Coelho.
4) iwlwifi tries to stop scans that aren't actually running, also from
Luciano Coelho.
5) mac80211 should drop mesh frames that are not encrypted, fix from
Bob Copeland.
6) Add new device ID to b43 wireless driver for BCM432228 chips, from
Rafał Miłecki.
7) Fix accidental addition of members after variable sized array in
struct tc_u_hnode, from WANG Cong.
8) Don't re-enable interrupts until after we call napi_complete() in
ibmveth and WIZnet drivers, frm Yongbae Park.
9) Fix regression in vlan tag handling of fec driver, from Fugang Duan.
10) If a network namespace change fails during rtnl_newlink(), we don't
unwind the device registry properly.
11) Fix two TCP regressions, from Neal Cardwell:
- Don't allow snd_cwnd_cnt to accumulate huge values due to missing
test in tcp_cong_avoid_ai().
- Restore CUBIC back to advancing cwnd by 1.5x packets per RTT.
12) Fix performance regression in xne-netback involving push TX
notifications, from David Vrabel.
13) __skb_tstamp_tx() can be called with a NULL sk pointer, do not
dereference blindly. From Willem de Bruijn.
14) Fix potential stack overflow in RDS protocol stack, from Arnd
Bergmann.
15) VXLAN_VID_MASK used incorrectly in new remote checksum offload
support of VXLAN driver. Fix from Alexey Kodanev.
16) Fix too small netlink SKB allocation in inet_diag layer, from Eric
Dumazet.
17) ieee80211_check_combinations() does not count interfaces correctly,
from Andrei Otcheretianski.
18) Hardware feature determination in bxn2x driver references a piece of
software state that actually isn't initialized yet, fix from Michal
Schmidt.
19) inet_csk_wait_for_connect() needs a sched_annotate_sleep()
annoation, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
Revert "net: cx82310_eth: use common match macro"
net/mlx4_en: Set statistics bitmap at port init
IB/mlx4: Saturate RoCE port PMA counters in case of overflow
net/mlx4_en: Fix off-by-one in ethtool statistics display
IB/mlx4: Verify net device validity on port change event
act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome
Revert "smc91x: retrieve IRQ and trigger flags in a modern way"
inet: Clean up inet_csk_wait_for_connect() vs. might_sleep()
ip6_tunnel: fix error code when tunnel exists
netdevice.h: fix ndo_bridge_* comments
bnx2x: fix encapsulation features on 57710/57711
mac80211: ignore CSA to same channel
nl80211: ignore HT/VHT capabilities without QoS/WMM
mac80211: ask for ECSA IE to be considered for beacon parse CRC
mac80211: count interfaces correctly for combination checks
isdn: icn: use strlcpy() when parsing setup options
rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
caif: fix MSG_OOB test in caif_seqpkt_recvmsg()
bridge: reset bridge mtu after deleting an interface
can: kvaser_usb: Fix tx queue start/stop race conditions
...
When a networking device is taken down that has a non-trivial number
of VLAN devices configured under it, we eat a full synchronize_net()
for every such VLAN device.
This is because of the call chain:
NETDEV_DOWN notifier
--> vlan_device_event()
--> dev_change_flags()
--> __dev_change_flags()
--> __dev_close()
--> __dev_close_many()
--> dev_deactivate_many()
--> synchronize_net()
This is kind of rediculous because we already have infrastructure for
batching doing operation X to a list of net devices so that we only
incur one sync.
So make use of that by exporting dev_close_many() and adjusting it's
interfaace so that the caller can fully manage the batch list. Use
this in vlan_device_event() and all the overhead goes away.
Reported-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to port id allow netdevices to specify port names and export
the name via sysfs. Drivers can implement the netdevice operation to
assist udev in having sane default names for the devices using the
rule:
$ cat /etc/udev/rules.d/80-net-setup-link.rules
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_port_name}!="",
NAME="$attr{phys_port_name}"
Use of phys_name versus phys_id was suggested-by Jiri Pirko.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
in favor of their inner __ ones, which doesn't grab rtnl.
As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.
So this patch:
- move rtnl handling to callers instead while already fixing some
reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
__ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
__ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
const qualifiers ease code review by making clear
which objects are not written in a function.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the low-level device commands and definitions used for QP max-rate limiting.
This is done through the following elements:
- read rate-limit device caps in QUERY_DEV_CAP: number of different
rates and the min/max rates in Kbs/Mbs/Gbs units
- enhance the QP context struct to contain rate limit units and value
- allow to do run time rate-limit setting to QPs through the
update-qp firmware command
- QP rate-limiting is disallowed for VFs
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a tx_maxrate attribute to the tx queue sysfs entry allowing
for max-rate limiting. Along with DCB-ETS and BQL this provides another
knob to tune queue performance. The limit units are Mbps.
By default it is disabled. To disable the rate limitation after it
has been set for a queue, it should be set to zero.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull livepatching fix from Jiri Kosina:
- fix for potential race with module loading, from Petr Mladek.
The race is very unlikely to be seen in real world and has been found
by code inspection, but should be fixed for 4.0 anyway.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Fix subtle race with coming and going modules
The TI CC2521 is an RF power amplifier that is designed to interface
with the CC2520. Conveniently, it directly interfaces with the CC2520
and does not require any pins to be connected to a
microcontroller/processor. Adding a CC2591 increases the CC2520's range,
which is useful for border router and other wall-powered applications.
Using the CC2591 with the CC2520 requires configuring the CC2520 GPIOs
that are connected to the CC2591 to correctly set the CC2591 into TX and
RX modes. Further, TI recommends that the CC2520_TXPOWER and
CC2520_AGCCTRL1 registers are set differently to maximize the CC2591's
performance. These settings are covered in TI Application Note AN065.
This patch adds an optional `amplified` field to the cc2520 entry in the
device tree. If present, the CC2520 will be configured to operate with a
CC2591.
The expected pin mapping is:
CC2520 GPIO0 --> CC2591 EN
CC2520 GPIO5 --> CC2591 PAEN
Signed-off-by: Brad Campbell <bradjc5@gmail.com>
Acked-by: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that nobody uses max_shift and min_shift, we can safely remove
them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the parameters max_size and min_size which are
meant to replace max_shift and min_shift.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keeping both size and shift is silly. We only need one.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The listener field in struct tcp_request_sock is a pointer
back to the listener. We now have req->rsk_listener, so TCP
only needs one boolean and not a full pointer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The argument 'flags' was missing in ndo_bridge_setlink().
ndo_bridge_dellink() was missing.
Fixes: 407af3299e ("bridge: Add netlink interface to configure vlans on bridge ports")
Fixes: add511b382 ("bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellink")
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a notifier that handles live patches for coming and going modules.
It takes klp_mutex lock to avoid races with coming and going patches but
it does not keep the lock all the time. Therefore the following races are
possible:
1. The notifier is called sometime in STATE_MODULE_COMING. The module
is visible by find_module() in this state all the time. It means that
new patch can be registered and enabled even before the notifier is
called. It might create wrong order of stacked patches, see below
for an example.
2. New patch could still see the module in the GOING state even after
the notifier has been called. It will try to initialize the related
object structures but the module could disappear at any time. There
will stay mess in the structures. It might even cause an invalid
memory access.
This patch solves the problem by adding a boolean variable into struct module.
The value is true after the coming and before the going handler is called.
New patches need to be applied when the value is true and they need to ignore
the module when the value is false.
Note that we need to know state of all modules on the system. The races are
related to new patches. Therefore we do not know what modules will get
patched.
Also note that we could not simply ignore going modules. The code from the
module could be called even in the GOING state until mod->exit() finishes.
If we start supporting patches with semantic changes between function
calls, we need to apply new patches to any still usable code.
See below for an example.
Finally note that the patch solves only the situation when a new patch is
registered. There are no such problems when the patch is being removed.
It does not matter who disable the patch first, whether the normal
disable_patch() or the module notifier. There is nothing to do
once the patch is disabled.
Alternative solutions:
======================
+ reject new patches when a patched module is coming or going; this is ugly
+ wait with adding new patch until the module leaves the COMING and GOING
states; this might be dangerous and complicated; we would need to release
kgr_lock in the middle of the patch registration to avoid a deadlock
with the coming and going handlers; also we might need a waitqueue for
each module which seems to be even bigger overhead than the boolean
+ stop modules from entering COMING and GOING states; wait until modules
leave these states when they are already there; looks complicated; we would
need to ignore the module that asked to stop the others to avoid a deadlock;
also it is unclear what to do when two modules asked to stop others and
both are in COMING state (situation when two new patches are applied)
+ always register/enable new patches and fix up the potential mess (registered
patches order) in klp_module_init(); this is nasty and prone to regressions
in the future development
+ add another MODULE_STATE where the kallsyms are visible but the module is not
used yet; this looks too complex; the module states are checked on "many"
locations
Example of patch stacking breakage:
===================================
The notifier could _not_ _simply_ ignore already initialized module objects.
For example, let's have three patches (P1, P2, P3) for functions a() and b()
where a() is from vmcore and b() is from a module M. Something like:
a() b()
P1 a1() b1()
P2 a2() b2()
P3 a3() b3(3)
If you load the module M after all patches are registered and enabled.
The ftrace ops for function a() and b() has listed the functions in this
order:
ops_a->func_stack -> list(a3,a2,a1)
ops_b->func_stack -> list(b3,b2,b1)
, so the pointer to b3() is the first and will be used.
Then you might have the following scenario. Let's start with state when patches
P1 and P2 are registered and enabled but the module M is not loaded. Then ftrace
ops for b() does not exist. Then we get into the following race:
CPU0 CPU1
load_module(M)
complete_formation()
mod->state = MODULE_STATE_COMING;
mutex_unlock(&module_mutex);
klp_register_patch(P3);
klp_enable_patch(P3);
# STATE 1
klp_module_notify(M)
klp_module_notify_coming(P1);
klp_module_notify_coming(P2);
klp_module_notify_coming(P3);
# STATE 2
The ftrace ops for a() and b() then looks:
STATE1:
ops_a->func_stack -> list(a3,a2,a1);
ops_b->func_stack -> list(b3);
STATE2:
ops_a->func_stack -> list(a3,a2,a1);
ops_b->func_stack -> list(b2,b1,b3);
therefore, b2() is used for the module but a3() is used for vmcore
because they were the last added.
Example of the race with going modules:
=======================================
CPU0 CPU1
delete_module() #SYSCALL
try_stop_module()
mod->state = MODULE_STATE_GOING;
mutex_unlock(&module_mutex);
klp_register_patch()
klp_enable_patch()
#save place to switch universe
b() # from module that is going
a() # from core (patched)
mod->exit();
Note that the function b() can be called until we call mod->exit().
If we do not apply patch against b() because it is in MODULE_STATE_GOING,
it will call patched a() with modified semantic and things might get wrong.
[jpoimboe@redhat.com: use one boolean instead of two]
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Its not needed anymore since 2bf540b73e
([NETFILTER]: bridge-netfilter: remove deferred hooks).
Before this it was possible to have physoutdev set for locally generated
packets -- this isn't the case anymore:
BRNF_STATE_BRIDGED flag is set when we assign nf_bridge->physoutdev,
so physoutdev != NULL means BRNF_STATE_BRIDGED is set.
If physoutdev is NULL, then we are looking at locally-delivered and
routed packet.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ask conntrack instead of storing ipv4 address in nf_bridge_info->data.
Ths avoids the need to use ->data during NF_PRE_ROUTING.
Only two functions that need ->data remain.
These will be addressed in followup patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
As discussed at netconf, introduce swdev_ops as first step to move switchdev
ops from ndo to swdev. This will keep switchdev from cluttering up ndo ops
space.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
introduce user accessible mirror of in-kernel 'struct sk_buff':
struct __sk_buff {
__u32 len;
__u32 pkt_type;
__u32 mark;
__u32 queue_mapping;
};
bpf programs can do:
int bpf_prog(struct __sk_buff *skb)
{
__u32 var = skb->pkt_type;
which will be compiled to bpf assembler as:
dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type)
bpf verifier will check validity of access and will convert it to:
dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset))
dst_reg &= 7
since skb->pkt_type is a bitfield.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the possibility to obtain raw_smp_processor_id() in
eBPF. Currently, this is only possible in classic BPF where commit
da2033c282 ("filter: add SKF_AD_RXHASH and SKF_AD_CPU") has added
facilities for this.
Perhaps most importantly, this would also allow us to track per CPU
statistics with eBPF maps, or to implement a poor-man's per CPU data
structure through eBPF maps.
Example function proto-type looks like:
u32 (*smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id;
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work is similar to commit 4cd3675ebf ("filter: added BPF
random opcode") and adds a possibility for packet sampling in eBPF.
Currently, this is only possible in classic BPF and useful to
combine sampling with f.e. packet sockets, possible also with tc.
Example function proto-type looks like:
u32 (*prandom_u32)(void) = (void *)BPF_FUNC_get_prandom_u32;
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
driver fixes for new regressions since v3.19. Second are fixes to the
common clock divider type caused by recent changes to how we round clock
rates. This affects many clock drivers that use this common code.
Finally there are fixes for drivers that improperly compared struct clk
pointers (drivers must not deref these pointers). While some of these
drivers have done this for a long time, this did not cause a problem
until we started generating unique struct clk pointers for every
consumer. A new function, clk_is_match was introduced to get these
drivers working again and they are fixed up to no longer deref the
pointers themselves.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVBewEAAoJEKI6nJvDJaTUR2gP/2frIXCm/krwkDyofIGxxQ+F
RwIXFTn9GG9QEUwKLUcxRUegHbWbZMXYp6W19hUcUdYz3pD+uEJSuH0NI8kW1Ohy
32P5/ALuoTq7OVzBBz/9di9jBDdIM1wusLZGJfOWk9DXBLOS3VHuhN55D47dqWS/
GsszeEpp8r1WBKFVmAkuQ5Jc0CqgS5GxvMOndXXVN3kDMhCT+9pBiqtUT0V3YV/J
d5GCvfPlO/Xmjnpjf99MPButkfiW/o6YXt3H0QY6hhskS1Av8alyfabVctk8lqOW
py8SQFY7MdRLZ84Zk87sqKCKUc/vHkTBT9vKWYm65l3yJ5OEFv60NaFMYY61HVlJ
n6qWU6SbFZvkPnQTJn6Ii/v7BQ92bXjYpLNcBK8UY35jIjmHsPS/YXCbkmArtn1N
/yAB4TIfH8uX93smFb3XEmZLSiKFuZAhU8YbjDzYgsGuQ9EwN3aNP9c5mzC7Soou
tYnDqVic0i993qQTD2Db5dplGwxelCRJpazO2kK6NW/EJzE8XJaM6XVy1xIBKiDX
bbWPdp53/eWV7gbUEZ8zzcS06G/DLw5/N45XyaWIx+2ThDjhOAlVHTAFDY+Oa743
42Dkwtr5GZ6yORyXY1wI5HttUGU0gnPN0kM84GQG8O/GbzGureZSW1e6G7tJJljn
JXhROl0w4aIPwUUxry7R
=nc5C
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clock framework fixes from Michael Turquette:
"The clk fixes for 4.0-rc4 comprise three themes.
First are the usual driver fixes for new regressions since v3.19.
Second are fixes to the common clock divider type caused by recent
changes to how we round clock rates. This affects many clock drivers
that use this common code.
Finally there are fixes for drivers that improperly compared struct
clk pointers (drivers must not deref these pointers). While some of
these drivers have done this for a long time, this did not cause a
problem until we started generating unique struct clk pointers for
every consumer. A new function, clk_is_match was introduced to get
these drivers working again and they are fixed up to no longer deref
the pointers themselves"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
ASoC: kirkwood: fix struct clk pointer comparing
ASoC: fsl_spdif: fix struct clk pointer comparing
ARM: imx: fix struct clk pointer comparing
clk: introduce clk_is_match
clk: don't export static symbol
clk: divider: fix calculation of initial best divider when rounding to closest
clk: divider: fix selection of divider when rounding to closest
clk: divider: fix calculation of maximal parent rate for a given divider
clk: divider: return real rate instead of divider value
clk: qcom: fix platform_no_drv_owner.cocci warnings
clk: qcom: fix platform_no_drv_owner.cocci warnings
clk: qcom: Add PLL4 vote clock
clk: qcom: lcc-msm8960: Fix PLL rate detection
clk: qcom: Fix slimbus n and m val offsets
clk: ti: Fix FAPLL parent enable bit handling
- armada-370-xp
- Chained per-cpu interrupts
- gic{,-v3,v3-its}
- Various fixes for safer operation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJVBO4uAAoJEP45WPkGe8Zn8jAQAL0bbO4wTUf8eIAVMQ8gvpWT
RmmPx806bctxWzCQdZeI+5SLbAoMGH4sGN2nvZjxs+gf1+8TDxcNbqpcJdng88+v
KcYCBVAtNCae3ZScCIiqgbqOWO6EeCi2XoEdxKmmoY6xvw/S+392Gq5Yx6rRhTui
SYuhq6ZEdeugfaFDZoJE4piBPQWsH6KTCZ2GbLeQbrPGYI1ChIz79bwn0vE3KNXH
Ctlv3F9CdRgqXMvX5RbE0bqYlAfzV+qkZciIGLd1iLXxfve68HTQYeO1bfMXd2LT
o3JQBivSU+RMYYXY/+xG3h4+poT9mOlMwArQyEFmYsLCADUkKY0XEkiSfYypcqtl
RBrIxX1bbcjAFWQPGADP9x7MeR1fgHrPbSvfWRaKRM+R7+mPXyfidP4CNvhCdVoN
37TpU+XKy/T+bTuciDPieqOHjIcCDZLFy4EHgs1T8KkcGlMr9/AC6noW5ce46zut
L4CajpAuwsRVAEgERRLU0xdDcRJYHO6GIou4RS4aGyejjaK1bkq1n9/svDt3GmfI
urUCNcjkkLsYqSBlYhmcUE6ijPnhvLCaaACfomUU2Jlk4YRhGhEpzN2l5sFDvbaA
xr4DcPV7/XGqzNTwpk1MHKZaloaRhHF2HDjfyKv+23xMPeNu+nRovabHWP4Il2LI
vowXFSS9ciCxnVOLfEXA
=DqZ5
-----END PGP SIGNATURE-----
Merge tag 'irqchip-fixes-4.0' of git://git.infradead.org/users/jcooper/linux
Pull irqchip fixes from Jason Cooper:
"armada-370-xp:
- Chained per-cpu interrupts
gic{,-v3,v3-its}"
- Various fixes for safer operation"
* tag 'irqchip-fixes-4.0' of git://git.infradead.org/users/jcooper/linux:
irqchip: gicv3-its: Support safe initialization
irqchip: gicv3-its: Define macros for GITS_CTLR fields
irqchip: gicv3-its: Add limitation to page order
irqchip: gicv3-its: Use 64KB page as default granule
irqchip: gicv3-its: Zero itt before handling to hardware
irqchip: gic-v3: Fix out of bounds access to cpu_logical_map
irqchip: gic: Fix unsafe locking reported by lockdep
irqchip: gicv3-its: Fix unsafe locking reported by lockdep
irqchip: gicv3-its: Iterate over PCI aliases to generate ITS configuration
irqchip: gicv3-its: Allocate enough memory for the full range of DeviceID
irqchip: gicv3-its: Fix ITS CPU init
irqchip: armada-370-xp: Fix chained per-cpu interrupts
This patch moves future_tbl to open up the possibility of having
multiple rehashes on the same table.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a rehash counter to bucket_table to indicate
the last bucket that has been rehashed. This serves two purposes:
1. Any bucket that has been rehashed can never gain a new object.
2. If the rehash counter reaches the size of the table, the table
will forever remain empty.
This patch also downsizes bucket_table->size to an unsigned int
since we do not support sizes greater than 32 bits yet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is in fact no need to wait for an RCU grace period in the
rehash function, since all insertions are guaranteed to go into
the new table through spin locks.
This patch uses call_rcu to free the old/rehashed table at our
leisure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously whenever the walker encountered a resize it simply
snaps back to the beginning and starts again. However, this only
works if the rehash started and completed while the walker was
idle.
If the walker attempts to restart while the rehash is still ongoing,
we may miss objects that we shouldn't have.
This patch fixes this by making the walker walk the old table
followed by the new table just like all other readers. If a
rehash is detected we will still signal our caller of the fact
so they can prepare for duplicates but we will simply continue
the walk onto the new table after the old one is finished either
by us or by the rehasher.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
Here's another set of Bluetooth & ieee802154 patches intended for 4.1:
- Added support for QCA ROME chipset family in the btusb driver
- at86rf230 driver fixes & cleanups
- ieee802154 cleanups
- Refactoring of Bluetooth mgmt API to allow new users
- New setting for static Bluetooth address exposed to user space
- Refactoring of hci_dev flags to remove limit of 32
- Remove unnecessary fast-connectable setting usage restrictions
- Fix behavior to be consistent when trying to pair already paired device
- Service discovery corner-case fixes
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix the max sifs size correction when the
IEEE802154_HW_TX_OMIT_CKSUM flag is set. With this flag the sk_buff
doesn't contain the CRC, because the transceiver will add the CRC
while transmit.
Also add some defines for the max sifs frame size value and frame check
sequence according to 802.15.4 standard.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>