Commit graph

30472 commits

Author SHA1 Message Date
Eyal Perry
d324353919 net/vlan: Provide read access to the vlan egress map
Provide a method for read-only access to the vlan device egress mapping.

Do this by refactoring vlan_dev_get_egress_qos_mask() such that now it
receives as an argument the skb priority instead of pointer to the skb.

Such an access is needed for the IBoE stack where the control plane
goes through the network stack. This is an add-on step on top of commit
d4a968658c "net/route: export symbol ip_tos2prio" which allowed the RDMA-CM
to use ip_tos2prio.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:09:44 -05:00
Erik Hugne
a715b49e79 tipc: reassembly failures should cause link reset
If appending a received fragment to the pending fragment chain
in a unicast link fails, the current code tries to force a retransmission
of the fragment by decrementing the 'next received sequence number'
field in the link. This is done under the assumption that the failure
is caused by an out-of-memory situation, an assumption that does
not hold true after the previous patch in this series.

A failure to append a fragment can now only be caused by a protocol
violation by the sending peer, and it must hence be assumed that it
is either malicious or buggy.  Either way, the correct behavior is now
to reset the link instead of trying to revert its sequence number.
So, this is what we do in this commit.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 18:30:11 -05:00
Erik Hugne
40ba3cdf54 tipc: message reassembly using fragment chain
When the first fragment of a long data data message is received on a link, a
reassembly buffer large enough to hold the data from this and all subsequent
fragments of the message is allocated. The payload of each new fragment is
copied into this buffer upon arrival. When the last fragment is received, the
reassembled message is delivered upwards to the port/socket layer.

Not only is this an inefficient approach, but it may also cause bursts of
reassembly failures in low memory situations. since we may fail to allocate
the necessary large buffer in the first place. Furthermore, after 100 subsequent
such failures the link will be reset, something that in reality aggravates the
situation.

To remedy this problem, this patch introduces a different approach. Instead of
allocating a big reassembly buffer, we now append the arriving fragments
to a reassembly chain on the link, and deliver the whole chain up to the
socket layer once the last fragment has been received. This is safe because
the retransmission layer of a TIPC link always delivers packets in strict
uninterrupted order, to the reassembly layer as to all other upper layers.
Hence there can never be more than one fragment chain pending reassembly at
any given time in a link, and we can trust (but still verify) that the
fragments will be chained up in the correct order.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 18:30:11 -05:00
Erik Hugne
528f6f4bf3 tipc: don't reroute message fragments
When a message fragment is received in a broadcast or unicast link,
the reception code will append the fragment payload to a big reassembly
buffer through a call to the function tipc_recv_fragm(). However, after
the return of that call, the logics goes on and passes the fragment
buffer to the function tipc_net_route_msg(), which will simply drop it.
This behavior is a remnant from the now obsolete multi-cluster
functionality, and has no relevance in the current code base.

Although currently harmless, this unnecessary call would be fatal
after applying the next patch in this series, which introduces
a completely new reassembly algorithm. So we change the code to
eliminate the redundant call.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 18:30:11 -05:00
Linus Torvalds
c224b76b56 NFS client updates for Linux 3.13
Highlights include:
 
 - Changes to the RPC socket code to allow NFSv4 to turn off timeout+retry
   - Detect TCP connection breakage through the "keepalive" mechanism
 - Add client side support for NFSv4.x migration (Chuck Lever)
 - Add support for multiple security flavour arguments to the "sec=" mount
   option (Dros Adamson)
 - fs-cache bugfixes from David Howells:
   - Fix an issue whereby caching can be enabled on a file that is open for
     writing
 - More NFSv4 open code stable bugfixes
 - Various Labeled NFS (selinux) bugfixes, including one stable fix
 - Fix buffer overflow checking in the RPCSEC_GSS upcall encoding
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSe8TEAAoJEGcL54qWCgDydu0QAJVtVhfwlUKm/HZ4oAy0Q5T8
 rJOWupqGnwyqTNLIRTlNegFSwMY+bABbkihXzSoj641o5zRb200KePlNxknzzlu1
 Q715035LDeEC1jrrHHeztTa9uWxAZ9B6gstMzilJYbV72VRYuWA6Q5LstXwQy/jN
 ViSldrGJ4sRZUe6wpNLPBRDBfOMWOtZdyRqqqjm71ZHJJnaqQWLBvThTG4MsLlpg
 j/khi5189MxJWePTKI9zGZdnXZAZ0ar1tAi1QWDNv044EwsS3LZZIko+YdBh6LZx
 9IBwk6TqOXFY0jxPDsIZtTfWPf4pjewRrPINMkjlZl3TJEf97sIlavZ7gWqvVIz5
 eXzFGy7D2XBgub8TGcmZM/7keHY/sqghz7lXZ8FulXlVem52r/95NiQ9tu8l8hq3
 Ab0FUnjtXeuaDFPBCHlKb3zmCMGFF89VqtpCj2plCPvfcGgJvXJqddWBRisQw9St
 UgD1PQWRFGtkrHv5EcQkd5boVdRNjAVAC9PaCWNpOpSVDjJyuUE+v/k75+ZwDcG8
 afAFMJSbCwRxW+cFlLAsQTfQztzuWTTOOVQvJDxfyYulcWshyIruhiYItRDfJqRp
 RynuVzrBERzUs5wsefnBbC218C/WSlOrodPbsZvdhKolvRx1RNtWT29ilZ6+p2tH
 4378ZRLtQvm9RXBnAkRc
 =gflJ
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Changes to the RPC socket code to allow NFSv4 to turn off
     timeout+retry:
      * Detect TCP connection breakage through the "keepalive" mechanism
   - Add client side support for NFSv4.x migration (Chuck Lever)
   - Add support for multiple security flavour arguments to the "sec="
     mount option (Dros Adamson)
   - fs-cache bugfixes from David Howells:
     * Fix an issue whereby caching can be enabled on a file that is
       open for writing
   - More NFSv4 open code stable bugfixes
   - Various Labeled NFS (selinux) bugfixes, including one stable fix
   - Fix buffer overflow checking in the RPCSEC_GSS upcall encoding"

* tag 'nfs-for-3.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
  NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security
  NFSv4: Sanity check the server reply in _nfs4_server_capabilities
  NFSv4.2: encode_readdir - only ask for labels when doing readdirplus
  nfs: set security label when revalidating inode
  NFSv4.2: Fix a mismatch between Linux labeled NFS and the NFSv4.2 spec
  NFS: Fix a missing initialisation when reading the SELinux label
  nfs: fix oops when trying to set SELinux label
  nfs: fix inverted test for delegation in nfs4_reclaim_open_state
  SUNRPC: Cleanup xs_destroy()
  SUNRPC: close a rare race in xs_tcp_setup_socket.
  SUNRPC: remove duplicated include from clnt.c
  nfs: use IS_ROOT not DCACHE_DISCONNECTED
  SUNRPC: Fix buffer overflow checking in gss_encode_v0_msg/gss_encode_v1_msg
  SUNRPC: gss_alloc_msg - choose _either_ a v0 message or a v1 message
  SUNRPC: remove an unnecessary if statement
  nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'
  nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function
  nfs: Remove useless 'error' assignment
  sunrpc: comment typo fix
  SUNRPC: Add correct rcu_dereference annotation in rpc_clnt_set_transport
  ...
2013-11-08 05:57:46 +09:00
Linus Torvalds
0324e74534 Driver Core / sysfs patches for 3.13-rc1
Here's the big driver core / sysfs update for 3.13-rc1.
 
 There's lots of dev_groups updates for different subsystems, as they all
 get slowly migrated over to the safe versions of the attribute groups
 (removing userspace races with the creation of the sysfs files.)  Also
 in here are some kobject updates, devres expansions, and the first round
 of Tejun's sysfs reworking to enable it to be used by other subsystems
 as a backend for an in-kernel filesystem.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJ6xAMACgkQMUfUDdst+yk1kQCfcHXhfnrvFZ5J/mDP509IzhNS
 ddEAoLEWoivtBppNsgrWqXpD1vi4UMsE
 =JmVW
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core / sysfs patches from Greg KH:
 "Here's the big driver core / sysfs update for 3.13-rc1.

  There's lots of dev_groups updates for different subsystems, as they
  all get slowly migrated over to the safe versions of the attribute
  groups (removing userspace races with the creation of the sysfs
  files.) Also in here are some kobject updates, devres expansions, and
  the first round of Tejun's sysfs reworking to enable it to be used by
  other subsystems as a backend for an in-kernel filesystem.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (83 commits)
  sysfs: rename sysfs_assoc_lock and explain what it's about
  sysfs: use generic_file_llseek() for sysfs_file_operations
  sysfs: return correct error code on unimplemented mmap()
  mdio_bus: convert bus code to use dev_groups
  device: Make dev_WARN/dev_WARN_ONCE print device as well as driver name
  sysfs: separate out dup filename warning into a separate function
  sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c
  sysfs: remove unused sysfs_get_dentry() prototype
  sysfs: honor bin_attr.attr.ignore_lockdep
  sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr
  devres: restore zeroing behavior of devres_alloc()
  sysfs: fix sysfs_write_file for bin file
  input: gameport: convert bus code to use dev_groups
  input: serio: remove bus usage of dev_attrs
  input: serio: use DEVICE_ATTR_RO()
  i2o: convert bus code to use dev_groups
  memstick: convert bus code to use dev_groups
  tifm: convert bus code to use dev_groups
  virtio: convert bus code to use dev_groups
  ipack: convert bus code to use dev_groups
  ...
2013-11-07 11:42:15 +09:00
John Stultz
5ac68e7c34 ipv6: Fix possible ipv6 seqlock deadlock
While enabling lockdep on seqlocks, I ran across the warning below
caused by the ipv6 stats being updated in both irq and non-irq context.

This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested
by Eric Dumazet) to resolve this problem.

[   11.120383] =================================
[   11.121024] [ INFO: inconsistent lock state ]
[   11.121663] 3.12.0-rc1+ #68 Not tainted
[   11.122229] ---------------------------------
[   11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes:
[   11.124505]  (&stats->syncp.seq#6){+.?...}, at: [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
[   11.125736] {SOFTIRQ-ON-W} state was registered at:
[   11.126447]   [<c10e0eb7>] __lock_acquire+0x5c7/0x1af0
[   11.127222]   [<c10e2996>] lock_acquire+0x96/0xd0
[   11.127925]   [<c1a9a2c3>] write_seqcount_begin+0x33/0x40
[   11.128766]   [<c1a9aa03>] ip6_dst_lookup_tail+0x3a3/0x460
[   11.129582]   [<c1a9e0ce>] ip6_dst_lookup_flow+0x2e/0x80
[   11.130014]   [<c1ad18e0>] ip6_datagram_connect+0x150/0x4e0
[   11.130014]   [<c1a4d0b5>] inet_dgram_connect+0x25/0x70
[   11.130014]   [<c198dd61>] SYSC_connect+0xa1/0xc0
[   11.130014]   [<c198f571>] SyS_connect+0x11/0x20
[   11.130014]   [<c198fe6b>] SyS_socketcall+0x12b/0x300
[   11.130014]   [<c1bbf880>] syscall_call+0x7/0xb
[   11.130014] irq event stamp: 1184
[   11.130014] hardirqs last  enabled at (1184): [<c1086901>] local_bh_enable+0x71/0x110
[   11.130014] hardirqs last disabled at (1183): [<c10868cd>] local_bh_enable+0x3d/0x110
[   11.130014] softirqs last  enabled at (0): [<c108014d>] copy_process.part.42+0x45d/0x11a0
[   11.130014] softirqs last disabled at (1147): [<c1086e05>] irq_exit+0xa5/0xb0
[   11.130014]
[   11.130014] other info that might help us debug this:
[   11.130014]  Possible unsafe locking scenario:
[   11.130014]
[   11.130014]        CPU0
[   11.130014]        ----
[   11.130014]   lock(&stats->syncp.seq#6);
[   11.130014]   <Interrupt>
[   11.130014]     lock(&stats->syncp.seq#6);
[   11.130014]
[   11.130014]  *** DEADLOCK ***
[   11.130014]
[   11.130014] 3 locks held by init/4483:
[   11.130014]  #0:  (rcu_read_lock){.+.+..}, at: [<c109363c>] SyS_setpriority+0x4c/0x620
[   11.130014]  #1:  (((&ifa->dad_timer))){+.-...}, at: [<c108c1c0>] call_timer_fn+0x0/0xf0
[   11.130014]  #2:  (rcu_read_lock){.+.+..}, at: [<c1ab6494>] ndisc_send_skb+0x54/0x5d0
[   11.130014]
[   11.130014] stack backtrace:
[   11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68
[   11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   11.130014]  00000000 00000000 c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 c1ec1123
[   11.130014]  c1ec1484 00001183 00000000 00000000 00000001 00000003 00000001 00000000
[   11.130014]  c1ec1484 00000004 c5712dcc 00000000 c55e5c84 c10de492 00000004 c10755f2
[   11.130014] Call Trace:
[   11.130014]  [<c1bb0e71>] dump_stack+0x4b/0x66
[   11.130014]  [<c1badf79>] print_usage_bug+0x1d3/0x1dd
[   11.130014]  [<c10de492>] mark_lock+0x282/0x2f0
[   11.130014]  [<c10755f2>] ? kvm_clock_read+0x22/0x30
[   11.130014]  [<c10dd8b0>] ? check_usage_backwards+0x150/0x150
[   11.130014]  [<c10e0e74>] __lock_acquire+0x584/0x1af0
[   11.130014]  [<c10b1baf>] ? sched_clock_cpu+0xef/0x190
[   11.130014]  [<c10de58c>] ? mark_held_locks+0x8c/0xf0
[   11.130014]  [<c10e2996>] lock_acquire+0x96/0xd0
[   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [<c1ab66d3>] ndisc_send_skb+0x293/0x5d0
[   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
[   11.130014]  [<c108cc32>] ? mod_timer+0xf2/0x160
[   11.130014]  [<c1aa706e>] ? addrconf_dad_timer+0xce/0x150
[   11.130014]  [<c1aa70aa>] addrconf_dad_timer+0x10a/0x150
[   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [<c108c233>] call_timer_fn+0x73/0xf0
[   11.130014]  [<c108c1c0>] ? __internal_add_timer+0xb0/0xb0
[   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [<c108c5b1>] run_timer_softirq+0x141/0x1e0
[   11.130014]  [<c1086b20>] ? __do_softirq+0x70/0x1b0
[   11.130014]  [<c1086b70>] __do_softirq+0xc0/0x1b0
[   11.130014]  [<c1086e05>] irq_exit+0xa5/0xb0
[   11.130014]  [<c106cfd5>] smp_apic_timer_interrupt+0x35/0x50
[   11.130014]  [<c1bbfbca>] apic_timer_interrupt+0x32/0x38
[   11.130014]  [<c10936ed>] ? SyS_setpriority+0xfd/0x620
[   11.130014]  [<c10e26c9>] ? lock_release+0x9/0x240
[   11.130014]  [<c10936d7>] ? SyS_setpriority+0xe7/0x620
[   11.130014]  [<c1bbee6d>] ? _raw_read_unlock+0x1d/0x30
[   11.130014]  [<c1093701>] SyS_setpriority+0x111/0x620
[   11.130014]  [<c109363c>] ? SyS_setpriority+0x4c/0x620
[   11.130014]  [<c1bbf880>] syscall_call+0x7/0xb

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:40:28 +01:00
John Stultz
827da44c61 net: Explicitly initialize u64_stats_sync structures for lockdep
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:40:25 +01:00
Duan Jiong
249a3630c4 ipv6: drop the judgement in rt6_alloc_cow()
Now rt6_alloc_cow() is only called by ip6_pol_route() when
rt->rt6i_flags doesn't contain both RTF_NONEXTHOP and RTF_GATEWAY,
and rt->rt6i_flags hasn't been changed in ip6_rt_copy().
So there is no neccessary to judge whether rt->rt6i_flags contains
RTF_GATEWAY or not.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 22:17:05 -05:00
Hannes Frederic Sowa
0e033e04c2 ipv6: fix headroom calculation in udp6_ufo_fragment
Commit 1e2bd517c1 ("udp6: Fix udp
fragmentation for tunnel traffic.") changed the calculation if
there is enough space to include a fragment header in the skb from a
skb->mac_header dervived one to skb_headroom. Because we already peeled
off the skb to transport_header this is wrong. Change this back to check
if we have enough room before the mac_header.

This fixes a panic Saran Neti reported. He used the tbf scheduler which
skb_gso_segments the skb. The offsets get negative and we panic in memcpy
because the skb was erroneously not expanded at the head.

Reported-by: Saran Neti <Saran.Neti@telus.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 22:09:53 -05:00
Hannes Frederic Sowa
482fc6094a ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.

(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)

Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>

This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.

Cc: David S. Miller <davem@davemloft.net>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 21:52:27 -05:00
John W. Linville
33b443422e Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-11-05 15:50:22 -05:00
John W. Linville
b476d3f143 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-11-05 15:49:16 -05:00
John W. Linville
353c78152c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	net/wireless/reg.c
2013-11-05 15:49:02 -05:00
Florent Fourcot
b579035ff7 ipv6: remove old conditions on flow label sharing
The code of flow label in Linux Kernel follows
the rules of RFC 1809 (an informational one) for
conditions on flow label sharing. There rules are
not in the last proposed standard for flow label
(RFC 6437), or in the previous one (RFC 3697).

Since this code does not follow any current or
old standard, we can remove it.

With this removal, the ipv6_opt_cmp function is
now a dead code and it can be removed too.

Changelog to v1:
 * add justification for the change
 * remove the condition on IPv6 options

[ Remove ipv6_hdr_cmp and it is now unused as well. -DaveM ]

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 14:40:53 -05:00
David S. Miller
cfce0a2b61 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please accept the following pull request intended for the 3.13 tree...

I had intended to pass most of these to you as much as two weeks ago.
Unfortunately, I failed to account for the effects of bad Internet
connections and my own fatique/laziness while traveling.  On the bright
side, at least these have been baking in linux-next for some time!

For the mac80211 bits, Johannes says:

"This time I have two fixes for P2P (which requires not using CCK rates)
and a workaround for APs with broken WMM information."

For the iwlwifi bits, Johannes says:

"I have a few fixes for warnings/issues: one from Alex, fixing scan
timings, one from Emmanuel fixing a WARN_ON in the DVM driver, one from
Stanislaw removing a trigger-happy WARN_ON in the MVM driver and a
change from myself to try to recover when the device isn't processing
commands quickly."

And:

"For this round, I have a lot of changes:
 * power management improvements
 * BT coexistence improvements/updates
 * new device support
 * VHT support
 * IBSS support (though due to a small bug it requires new firmware)
 * various other fixes/improvements."

For the Bluetooth bits, Gustavo says:

"More patches for 3.12, busy times for Bluetooth. More than a 100 commits since
the last pull. The bulk of work comes from Johan and Marcel, they are doing
fixes and improvements all over the Bluetooth subsystem, as the diffstat can
show."

For the ath10k and ath6kl bits, Kalle says:

"Bartosz added support to ath10k for our 10.x AP firmware branch, which
gives us AP specific features and fixes. We still support the main
firmware branch as well just like before, ath10k detects runtime what
firmware is used. Unfortunately the firmware interface in 10.x branch is
somewhat different so there was quite a lot of changes in ath10k for
this.

Michal and Sujith did some performance improvements in ath10k. Vladimir
fixed a compiler warning and Fengguang removed an extra semicolon."

For the NFC bits, Samuel says:

"It's a fairly big one, with the following highlights:

- NFC digital layer implementation: Most NFC chipsets implement the NFC
  digital layer in firmware, but others have more basic functionalities
  and expect the host to implement the digital layer. This layer sits
  below the NFC core.

- Sony's port100 support: This is "soft" NFC USB dongle that expects the
  digital layer to be implemented on the host. This is the first user of
  our NFC digital stack implementation.

- Secure element API: We now provide a netlink API for enabling,
  disabling and discovering NFC attached (embedded or UICC ones) secure
  elements. With some userspace help, this allows us to support NFC
  payments.
  Only the pn544 driver currently supports that API.

- NCI SPI fixes and improvements: In order to support NCI devices over
  SPI, we fixed and improved our NCI/SPI implementation. The currently
  most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode
  and we're planning to use our NCI/SPI framework to implement a
  driver for it.

- pn533 fragmentation support in target mode: This was the only missing
  feature from our pn533 impementation. We now support fragmentation in
  both Tx and Rx modes, in target mode."

On top of all that, brcmfmac and rt2x00 both get the usual flurry
of updates.  A few other drivers get hit here or there as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 02:34:57 -05:00
Jason Wang
f8e617e100 net: introduce skb_coalesce_rx_frag()
Sometimes we need to coalesce the rx frags to avoid frag list. One example is
virtio-net driver which tries to use small frags for both MTU sized packet and
GSO packet. So this patch introduce skb_coalesce_rx_frag() to do this.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Dalton <mwdalton@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 20:03:52 -05:00
Yuchung Cheng
9f9843a751 tcp: properly handle stretch acks in slow start
Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
regardless the number of packets. Consequently slow start performance
is highly dependent on the degree of the stretch ACKs caused by
receiver or network ACK compression mechanisms (e.g., delayed-ACK,
GRO, etc).  But slow start algorithm is to send twice the amount of
packets of packets left so it should process a stretch ACK of degree
N as if N ACKs of degree 1, then exits when cwnd exceeds ssthresh. A
follow up patch will use the remainder of the N (if greater than 1)
to adjust cwnd in the congestion avoidance phase.

In addition this patch retires the experimental limited slow start
(LSS) feature. LSS has multiple drawbacks but questionable benefit. The
fractional cwnd increase in LSS requires a loop in slow start even
though it's rarely used. Configuring such an increase step via a global
sysctl on different BDPS seems hard. Finally and most importantly the
slow start overshoot concern is now better covered by the Hybrid slow
start (hystart) enabled by default.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 19:57:59 -05:00
Yuchung Cheng
0d41cca490 tcp: enable sockets to use MSG_FASTOPEN by default
Applications have started to use Fast Open (e.g., Chrome browser has
such an optional flag) and the feature has gone through several
generations of kernels since 3.7 with many real network tests. It's
time to enable this flag by default for applications to test more
conveniently and extensively.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 19:57:47 -05:00
David S. Miller
f8785c5514 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
This batch contains fives nf_tables patches for your net-next tree,
they are:

* Fix possible use after free in the module removal path of the
  x_tables compatibility layer, from Dan Carpenter.

* Add filter chain type for the bridge family, from myself.

* Fix Kconfig dependencies of the nf_tables bridge family with
  the core, from myself.

* Fix sparse warnings in nft_nat, from Tomasz Bursztyka.

* Remove duplicated include in the IPv4 family support for nf_tables,
  from Wei Yongjun.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 19:48:57 -05:00
David S. Miller
72c39a0ade Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
This is another batch containing Netfilter/IPVS updates for your net-next
tree, they are:

* Six patches to make the ipt_CLUSTERIP target support netnamespace,
  from Gao feng.

* Two cleanups for the nf_conntrack_acct infrastructure, introducing
  a new structure to encapsulate conntrack counters, from Holger
  Eitzenberger.

* Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann.

* Skip checksum recalculation in SCTP support for IPVS, also from
  Daniel Borkmann.

* Fix behavioural change in xt_socket after IP early demux, from
  Florian Westphal.

* Fix bogus large memory allocation in the bitmap port set type in ipset,
  from Jozsef Kadlecsik.

* Fix possible compilation issues in the hash netnet set type in ipset,
  also from Jozsef Kadlecsik.

* Define constants to identify netlink callback data in ipset dumps,
  again from Jozsef Kadlecsik.

* Use sock_gen_put() in xt_socket to replace xt_socket_put_sk,
  from Eric Dumazet.

* Improvements for the SH scheduler in IPVS, from Alexander Frolkin.

* Remove extra delay due to unneeded rcu barrier in IPVS net namespace
  cleanup path, from Julian Anastasov.

* Save some cycles in ip6t_REJECT by skipping checksum validation in
  packets leaving from our stack, from Stanislav Fomichev.

* Fix IPVS_CMD_ATTR_MAX definition in IPVS, larger that required, from
  Julian Anastasov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 19:46:58 -05:00
Dan Carpenter
c359c4157c netfilter: nft_compat: use _safe version of list_for_each
We need to use the _safe version of list_for_each_entry() here otherwise
we have a use after free bug.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-04 22:58:30 +01:00
David S. Miller
6fcf018ae4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
Open vSwitch

A set of updates for net-next/3.13. Major changes are:
 * Restructure flow handling code to be more logically organized and
   easier to read.
 * Rehashing of the flow table is moved from a workqueue to flow
   installation time. Before, heavy load could block the workqueue for
   excessive periods of time.
 * Additional debugging information is provided to help diagnose megaflows.
 * It's now possible to match on TCP flags.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 16:25:04 -05:00
Daniel Borkmann
cea80ea8d2 net: checksum: fix warning in skb_checksum
This patch fixes a build warning in skb_checksum() by wrapping the
csum_partial() usage in skb_checksum(). The problem is that on a few
architectures, csum_partial is used with prefix asmlinkage whereas
on most architectures it's not. So fix this up generically as we did
with csum_block_add_ext() to match the signature. Introduced by
2817a336d4 ("net: skb_checksum: allow custom update/combine for
walking skb").

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 15:27:08 -05:00
John W. Linville
87bc0728d4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/brcm80211/brcmfmac/sdio_host.h
2013-11-04 14:51:28 -05:00
John W. Linville
01925efdf7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/drv.c
2013-11-04 14:45:14 -05:00
David S. Miller
394efd19d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/netconsole.c
	net/bridge/br_private.h

Three mostly trivial conflicts.

The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.

In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".

Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 13:48:30 -05:00
Daniel Borkmann
7926c1d5be net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb
Introduced in f9e42b8535 ("net: sctp: sideeffect: throw BUG if
primary_path is NULL"), we intended to find a buggy assoc that's
part of the assoc hash table with a primary_path that is NULL.
However, we better remove the BUG_ON for now and find a more
suitable place to assert for these things as Mark reports that
this also triggers the bug when duplication cookie processing
happens, and the assoc is not part of the hash table (so all
good in this case). Such a situation can for example easily be
reproduced by:

  tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1
  tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20%
  tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \
            protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2

This drops 20% of COOKIE-ACK packets. After some follow-up
discussion with Vlad we came to the conclusion that for now we
should still better remove this BUG_ON() assertion, and come up
with two follow-ups later on, that is, i) find a more suitable
place for this assertion, and possibly ii) have a special
allocator/initializer for such kind of temporary assocs.

Reported-by: Mark Thomas <Mark.Thomas@metaswitch.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 00:46:44 -05:00
Arvid Brodin
f421436a59 net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)
High-availability Seamless Redundancy ("HSR") provides instant failover
redundancy for Ethernet networks. It requires a special network topology where
all nodes are connected in a ring (each node having two physical network
interfaces). It is suited for applications that demand high availability and
very short reaction time.

HSR acts on the Ethernet layer, using a registered Ethernet protocol type to
send special HSR frames in both directions over the ring. The driver creates
virtual network interfaces that can be used just like any ordinary Linux
network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring
must be HSR capable.

This code is a "best effort" to comply with the HSR standard as described in
IEC 62439-3:2010 (HSRv0).

Signed-off-by: Arvid Brodin <arvid.brodin@xdin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:20:14 -05:00
Eric Dumazet
74d332c13b net: extend net_device allocation to vmalloc()
Joby Poriyath provided a xen-netback patch to reduce the size of
xenvif structure as some netdev allocation could fail under
memory pressure/fragmentation.

This patch is handling the problem at the core level, allowing
any netdev structures to use vmalloc() if kmalloc() failed.

As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Joby Poriyath <joby.poriyath@citrix.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:19:00 -05:00
Daniel Borkmann
e6d8b64b34 net: sctp: fix and consolidate SCTP checksumming code
This fixes an outstanding bug found through IPVS, where SCTP packets
with skb->data_len > 0 (non-linearized) and empty frag_list, but data
accumulated in frags[] member, are forwarded with incorrect checksum
letting SCTP initial handshake fail on some systems. Linearizing each
SCTP skb in IPVS to prevent that would not be a good solution as
this leads to an additional and unnecessary performance penalty on
the load-balancer itself for no good reason (as we actually only want
to update the checksum, and can do that in a different/better way
presented here).

The actual problem is elsewhere, namely, that SCTP's checksumming
in sctp_compute_cksum() does not take frags[] into account like
skb_checksum() does. So while we are fixing this up, we better reuse
the existing code that we have anyway in __skb_checksum() and use it
for walking through the data doing checksumming. This will not only
fix this issue, but also consolidates some SCTP code with core
sk_buff code, bringing it closer together and removing respectively
avoiding reimplementation of skb_checksum() for no good reason.

As crc32c() can use hardware implementation within the crypto layer,
we leave that intact (it wraps around / falls back to e.g. slice-by-8
algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
combinator for crc32c blocks.

Also, we remove all other SCTP checksumming code, so that we only
have to use sctp_compute_cksum() from now on; for doing that, we need
to transform SCTP checkumming in output path slightly, and can leave
the rest intact.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:57 -05:00
Daniel Borkmann
2817a336d4 net: skb_checksum: allow custom update/combine for walking skb
Currently, skb_checksum walks over 1) linearized, 2) frags[], and
3) frag_list data and calculats the one's complement, a 32 bit
result suitable for feeding into itself or csum_tcpudp_magic(),
but unsuitable for SCTP as we're calculating CRC32c there.

Hence, in order to not re-implement the very same function in
SCTP (and maybe other protocols) over and over again, use an
update() + combine() callback internally to allow for walking
over the skb with different algorithms.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:57 -05:00
Wei Yongjun
ca0e8bd68b netfilter: nf_tables: remove duplicated include from nf_tables_ipv4.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-03 22:36:25 +01:00
Holger Eitzenberger
4542fa4727 netfilter: ctnetlink: account both directions in one step
With the intent to dump other accounting data later.
This patch is a cleanup.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-03 21:49:32 +01:00
Holger Eitzenberger
f7b13e4330 netfilter: introduce nf_conn_acct structure
Encapsulate counters for both directions into nf_conn_acct. During
that process also consistently name pointers to the extend 'acct',
not 'counters'. This patch is a cleanup.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-03 21:48:49 +01:00
Jason Wang
6f09234385 net: flow_dissector: fail on evil iph->ihl
We don't validate iph->ihl which may lead a dead loop if we meet a IPIP
skb whose iph->ihl is zero. Fix this by failing immediately when iph->ihl
is evil (less than 5).

This issue were introduced by commit ec5efe7946
(rps: support IPIP encapsulation).

Cc: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-02 02:16:07 -04:00
David S. Miller
296c10639a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	net/xfrm/xfrm_policy.c

Minor merge conflict in xfrm_policy.c, consisting of overlapping
changes which were trivial to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-02 02:13:48 -04:00
David S. Miller
2e19ef0251 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
1) Fix a possible race on ipcomp scratch buffers because
   of too early enabled siftirqs. From Michal Kubecek.

2) The current xfrm garbage collector threshold is too small
   for some workloads, resulting in bad performance on these
   workloads. Increase the threshold from 1024 to 32768.

3) Some codepaths might not have a dst_entry attached to the
   skb when calling xfrm_decode_session(). So add a check
   to prevent a null pointer dereference in this case.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-02 01:22:39 -04:00
Pravin B Shelar
8ddd094675 openvswitch: Use flow hash during flow lookup operation.
Flow->hash can be used to detect hash collisions and avoid flow key
compare in flow lookup.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-11-01 18:43:46 -07:00
Jarno Rajahalme
5eb26b156e openvswitch: TCP flags matching support.
tcp_flags=flags/mask
        Bitwise  match on TCP flags.  The flags and mask are 16-bit num‐
        bers written in decimal or in hexadecimal prefixed by 0x.   Each
        1-bit  in  mask requires that the corresponding bit in port must
        match.  Each 0-bit in mask causes the corresponding  bit  to  be
        ignored.

        TCP  protocol  currently  defines  9 flag bits, and additional 3
        bits are reserved (must be transmitted as zero), see  RFCs  793,
        3168, and 3540.  The flag bits are, numbering from the least
        significant bit:

        0: FIN No more data from sender.

        1: SYN Synchronize sequence numbers.

        2: RST Reset the connection.

        3: PSH Push function.

        4: ACK Acknowledgement field significant.

        5: URG Urgent pointer field significant.

        6: ECE ECN Echo.

        7: CWR Congestion Windows Reduced.

        8: NS  Nonce Sum.

        9-11:  Reserved.

        12-15: Not matchable, must be zero.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-11-01 18:43:45 -07:00
Jarno Rajahalme
df23e9f642 openvswitch: Widen TCP flags handling.
Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-11-01 18:43:45 -07:00
Pravin B Shelar
3cdb35b074 openvswitch: Enable all GSO features on internal port.
OVS already can handle all types of segmentation offloads that
are supported by the kernel.
Following patch specifically enables UDP and IPV6 segmentation
offloads.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-11-01 18:17:50 -07:00
Ingo Molnar
fb10d5b7ef Merge branch 'linus' into sched/core
Resolve cherry-picking conflicts:

Conflicts:
	mm/huge_memory.c
	mm/memory.c
	mm/mprotect.c

See this upstream merge commit for more details:

  52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-01 08:24:41 +01:00
Steffen Klassert
84502b5ef9 xfrm: Fix null pointer dereference when decoding sessions
On some codepaths the skb does not have a dst entry
when xfrm_decode_session() is called. So check for
a valid skb_dst() before dereferencing the device
interface index. We use 0 as the device index if
there is no valid skb_dst(), or at reverse decoding
we use skb_iif as device interface index.

Bug was introduced with git commit bafd4bd4dc
("xfrm: Decode sessions with output interface.").

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-11-01 07:08:46 +01:00
Trond Myklebust
a1311d87fa SUNRPC: Cleanup xs_destroy()
There is no longer any need for a separate xs_local_destroy() helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-31 09:31:17 -04:00
NeilBrown
93dc41bdc5 SUNRPC: close a rare race in xs_tcp_setup_socket.
We have one report of a crash in xs_tcp_setup_socket.
The call path to the crash is:

  xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.

The 'sock' passed to that last function is NULL.

The only way I can see this happening is a concurrent call to
xs_close:

  xs_close -> xs_reset_transport -> sock_release -> inet_release

inet_release sets:
   sock->sk = NULL;
inet_stream_connect calls
   lock_sock(sock->sk);
which gets NULL.

All calls to xs_close are protected by XPRT_LOCKED as are most
activations of the workqueue which runs xs_tcp_setup_socket.
The exception is xs_tcp_schedule_linger_timeout.

So presumably the timeout queued by the later fires exactly when some
other code runs xs_close().

To protect against this we can move the cancel_delayed_work_sync()
call from xs_destory() to xs_close().

As xs_close is never called from the worker scheduled on
->connect_worker, this can never deadlock.

Signed-off-by: NeilBrown <neilb@suse.de>
[Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-31 09:14:50 -04:00
Alexander Aring
3582b900ad 6lowpan: cleanup skb copy data
This patch drops the direct memcpy on skb and uses the right skb
memcpy functions. Also remove an unnecessary check if plen is non zero.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:18:46 -04:00
Alexander Aring
578d524127 6lowpan: set 6lowpan network and transport header
This is necessary to access network header with the skb_network_header
function instead of calculate the position with mac_len, etc.
Do the same for the transport header, when we replace the IPv6 header
with the 6LoWPAN header.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:18:46 -04:00
Alexander Aring
3e69162ea4 6lowpan: set and use mac_len for mac header length
Set the mac header length while creating the 802.15.4 mac header.

Drop the function for recalculate mac header length in upper layers
which was static and works for intra pan communication only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:18:46 -04:00
Alexander Aring
3961532fd4 6lowpan: remove unnecessary set of headers
On receiving side we don't need to set any headers in skb because the
6LoWPAN layer do not access it. Currently these values will set twice
after calling netif_rx.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:18:45 -04:00
Duan Jiong
ba4865027c ipv6: remove the unnecessary statement in find_match()
After reading the function rt6_check_neigh(), we can
know that the RT6_NUD_FAIL_SOFT can be returned only
when the IS_ENABLE(CONFIG_IPV6_ROUTER_PREF) is false.
so in function find_match(), there is no need to execute
the statement !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF).

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:06:51 -04:00
Chen Weilong
83a1a7ce60 mac802154: Use pr_err(...) rather than printk(KERN_ERR ...)
This change is inspired by checkpatch.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 17:05:44 -04:00
Ying Xue
3af390e2c5 tipc: remove two indentation levels in tipc_recv_msg routine
The message dispatching part of tipc_recv_msg() is wrapped layers of
while/if/if/switch, causing out-of-control indentation and does not
look very good. We reduce two indentation levels by separating the
message dispatching from the blocks that checks link state and
sequence numbers, allowing longer function and arg names to be
consistently indented without wrapping. Additionally we also rename
"cont" label to "discard" and add one new label called "unlock_discard"
to make code clearer. In all, these are cosmetic changes that do not
alter the operation of TIPC in any way.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Andreas Bofjäll <andreas.bofjall@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 16:54:54 -04:00
Wei Yongjun
09c3e54635 SUNRPC: remove duplicated include from clnt.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-30 11:13:34 -04:00
Vinod Koul
f0dad6e701 Merge branch 'dma_complete' into next 2013-10-30 15:42:19 +05:30
Yuchung Cheng
c968601d17 tcp: temporarily disable Fast Open on SYN timeout
Fast Open currently has a fall back feature to address SYN-data being
dropped but it requires the middle-box to pass on regular SYN retry
after SYN-data. This is implemented in commit aab487435 ("net-tcp:
Fast Open client - detecting SYN-data drops")

However some NAT boxes will drop all subsequent packets after first
SYN-data and blackholes the entire connections.  An example is in
commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast
Open".

The sender should note such incidents and fall back to use the regular
TCP handshake on subsequent attempts temporarily as well: after the
second SYN timeouts the original Fast Open SYN is most likely lost.
When such an event recurs Fast Open is disabled based on the number of
recurrences exponentially.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:50:41 -04:00
Daniel Borkmann
97203abe6b net: ipvs: sctp: do not recalc sctp csum when ports didn't change
Unlike UDP or TCP, we do not take the pseudo-header into
account in SCTP checksums. So in case port mapping is the
very same, we do not need to recalculate the whole SCTP
checksum in software, which is very expensive.

Also, similarly as in TCP, take into account when a private
helper mangled the packet. In that case, we also need to
recalculate the checksum even if ports might be same.

Thanks for feedback regarding skb->ip_summed checks from
Julian Anastasov; here's a discussion on these checks for
snat and dnat:

* For snat_handler(), we can see CHECKSUM_PARTIAL from
  virtual devices, and from LOCAL_OUT, otherwise it
  should be CHECKSUM_UNNECESSARY. In general, in snat it
  is more complex. skb contains the original route and
  ip_vs_route_me_harder() can change the route after
  snat_handler. So, for locally generated replies from
  local server we can not preserve the CHECKSUM_PARTIAL
  mode. It is an chicken or egg dilemma: snat_handler
  needs the device after rerouting (to check for
  NETIF_F_SCTP_CSUM), while ip_route_me_harder() wants
  the snat_handler() to put the new saddr for proper
  rerouting.

* For dnat_handler(), we should not see CHECKSUM_COMPLETE
  for SCTP, in fact the small set of drivers that support
  SCTP offloading return CHECKSUM_UNNECESSARY on correctly
  received SCTP csum. We can see CHECKSUM_PARTIAL from
  local stack or received from virtual drivers. The idea is
  that SCTP decides to avoid csum calculation if hardware
  supports offloading. IPVS can change the device after
  rerouting to real server but we can preserve the
  CHECKSUM_PARTIAL mode if the new device supports
  offloading too. This works because skb dst is changed
  before dnat_handler and we see the new device. So, checks
  in the 'if' part will decide whether it is ok to keep
  CHECKSUM_PARTIAL for the output. If the packet was with
  CHECKSUM_NONE, hence we deal with unknown checksum. As we
  recalculate the sum for IP header in all cases, it should
  be safe to use CHECKSUM_UNNECESSARY. We can forward wrong
  checksum in this case (without cp->app). In case of
  CHECKSUM_UNNECESSARY, the csum was valid on receive.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-10-30 09:48:16 +09:00
Vlad Yasevich
06499098a0 bridge: pass correct vlan id to multicast code
Currently multicast code attempts to extrace the vlan id from
the skb even when vlan filtering is disabled.  This can lead
to mdb entries being created with the wrong vlan id.
Pass the already extracted vlan id to the multicast
filtering code to make the correct id is used in
creation as well as lookup.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:40:08 -04:00
David S. Miller
911aeb1084 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
One patch for net/3.12 fixing an issue where devices could be in an
invalid state they are removed while still attached to OVS.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:36:39 -04:00
Michael Drüing
706e282b69 net: x25: Fix dead URLs in Kconfig
Update the URLs in the Kconfig file to the new pages at sangoma.com and cisco.com

Signed-off-by: Michael Drüing <michael@drueing.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:35:17 -04:00
Daniel Borkmann
7d1d65cb84 net: sched: cls_bpf: add BPF-based classifier
This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:

     0: No match
    -1: Select classid given in "tc filter ..." command
  else: flowid, overwrite the default one

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:33:17 -04:00
David S. Miller
68783ec73c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
This pull request contains the following netfilter fix:

* fix --queue-bypass in xt_NFQUEUE revision 3. While adding the
  revision 3 of this target, the bypass flags were not correctly
  handled anymore, thus, breaking packet bypassing if no application
  is listening from userspace, patch from Holger Eitzenberger,
  reported by Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 16:53:44 -04:00
Holger Eitzenberger
d954777324 netfilter: xt_NFQUEUE: fix --queue-bypass regression
V3 of the NFQUEUE target ignores the --queue-bypass flag,
causing packets to be dropped when the userspace listener
isn't running.

Regression is in since 8746ddcf12 ("netfilter: xt_NFQUEUE:
introduce CPU fanout").

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-29 13:05:54 +01:00
Mathias Krause
1c5ad13f7c net: esp{4,6}: get rid of struct esp_data
struct esp_data consists of a single pointer, vanishing the need for it
to be a structure. Fold the pointer into 'data' direcly, removing one
level of pointer indirection.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-29 06:39:42 +01:00
Mathias Krause
123b0d1ba0 net: esp{4,6}: remove padlen from struct esp_data
The padlen member of struct esp_data is always zero. Get rid of it.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-29 06:39:42 +01:00
Zhi Yong Wu
cdfb97bc01 net, mc: fix the incorrect comments in two mc-related functions
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:19:05 -04:00
Zhi Yong Wu
ab1a2d7773 net, iovec: fix the incorrect comment in memcpy_fromiovecend()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:19:04 -04:00
Zhi Yong Wu
c4e819d16c net, datagram: fix the incorrect comment in zerocopy_sg_from_iovec()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:19:04 -04:00
Hannes Frederic Sowa
daba287b29 ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK
UFO as well as UDP_CORK do not respect IP_PMTUDISC_DO and
IP_PMTUDISC_PROBE well enough.

UFO enabled packet delivery just appends all frags to the cork and hands
it over to the network card. So we just deliver non-DF udp fragments
(DF-flag may get overwritten by hardware or virtual UFO enabled
interface).

UDP_CORK does enqueue the data until the cork is disengaged. At this
point it sets the correct IP_DF and local_df flags and hands it over to
ip_fragment which in this case will generate an icmp error which gets
appended to the error socket queue. This is not reflected in the syscall
error (of course, if UFO is enabled this also won't happen).

Improve this by checking the pmtudisc flags before appending data to the
socket and if we still can fit all data in one packet when IP_PMTUDISC_DO
or IP_PMTUDISC_PROBE is set, only then proceed.

We use (mtu-fragheaderlen) to check for the maximum length because we
ensure not to generate a fragment and non-fragmented data does not need
to have its length aligned on 64 bit boundaries. Also the passed in
ip_options are already aligned correctly.

Maybe, we can relax some other checks around ip_fragment. This needs
more research.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:15:22 -04:00
Eric Dumazet
0d08c42cf9 tcp: gso: fix truesize tracking
commit 6ff50cd555 ("tcp: gso: do not generate out of order packets")
had an heuristic that can trigger a warning in skb_try_coalesce(),
because skb->truesize of the gso segments were exactly set to mss.

This breaks the requirement that

skb->truesize >= skb->len + truesizeof(struct sk_buff);

It can trivially be reproduced by :

ifconfig lo mtu 1500
ethtool -K lo tso off
netperf

As the skbs are looped into the TCP networking stack, skb_try_coalesce()
warns us of these skb under-estimating their truesize.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:04:47 -04:00
David S. Miller
5d9efa7ee9 ipv6: Remove privacy config option.
The code for privacy extentions is very mature, and making it
configurable only gives marginal memory/code savings in exchange
for obfuscation and hard to read code via CPP ifdef'ery.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 20:07:50 -04:00
Alexander Aring
8ef007fd1d 6lowpan: remove unnecessary break
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:52 -04:00
Alexander Aring
b236b954de 6lowpan: remove skb->dev assignment
This patch removes the assignment of skb->dev. We don't need it here because
we use the netdev_alloc_skb_ip_align function which already sets the
skb->dev.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:52 -04:00
Alexander Aring
b614442f34 6lowpan: use netdev_alloc_skb instead dev_alloc_skb
This patch uses the netdev_alloc_skb instead dev_alloc_skb function and
drops the seperate assignment to skb->dev.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:51 -04:00
Alexander Aring
53cb5717b4 6lowpan: remove unnecessary check on err >= 0
The err variable can only be zero in this case.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:51 -04:00
Alexander Aring
545f3613a8 6lowpan: remove unnecessary ret variable
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 19:47:51 -04:00
Trond Myklebust
9d3a2260f0 SUNRPC: Fix buffer overflow checking in gss_encode_v0_msg/gss_encode_v1_msg
In gss_encode_v1_msg, it is pointless to BUG() after the overflow has
happened. Replace the existing sprintf()-based code with scnprintf(),
and warn if an overflow is ever triggered.

In gss_encode_v0_msg, replace the runtime BUG_ON() with an appropriate
compile-time BUILD_BUG_ON.

Reported-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:53:21 -04:00
Trond Myklebust
5fccc5b52e SUNRPC: gss_alloc_msg - choose _either_ a v0 message or a v1 message
Add the missing 'break' to ensure that we don't corrupt a legacy 'v0' type
message by appending the 'v1'.

Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:44:20 -04:00
wangweidong
8313164c36 SUNRPC: remove an unnecessary if statement
If req allocated failed just goto out_free, no need to check the
'i < num_prealloc'. There is just code simplification, no
functional changes.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:16:56 -04:00
J. Bruce Fields
e3bfab1848 sunrpc: comment typo fix
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:16:54 -04:00
Trond Myklebust
34751b9d04 SUNRPC: Add correct rcu_dereference annotation in rpc_clnt_set_transport
rpc_clnt_set_transport should use rcu_derefence_protected(), as it is
only safe to be called with the rpc_clnt::cl_lock held.

Cc: Chuck Lever <Chuck.Lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 16:46:37 -04:00
Trond Myklebust
40b00b6b17 SUNRPC: Add a helper to switch the transport of an rpc_clnt
Add an RPC client API to redirect an rpc_clnt's transport from a
source server to a destination server during a migration event.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: forward ported to 3.12 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:21:32 -04:00
Chuck Lever
d746e54522 SUNRPC: Modify synopsis of rpc_client_register()
The rpc_client_register() helper was added in commit e73f4cc0,
"SUNRPC: split client creation routine into setup and registration,"
Mon Jun 24 11:52:52 2013.  In a subsequent patch, I'd like to invoke
rpc_client_register() from a context where a struct rpc_create_args
is not available.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:20:29 -04:00
Pablo Neira Ayuso
46413825a7 netfilter: bridge: nf_tables: add filter chain type
This patch adds the filter chain type which is required to
create filter chains in the bridge family from userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-28 18:07:35 +01:00
Tomasz Bursztyka
98c37b6b01 netfilter: nft_nat: Fix endianness issue reported by sparse
This patch fixes this:

CHECK   net/netfilter/nft_nat.c
net/netfilter/nft_nat.c:50:43: warning: incorrect type in assignment (different base types)
net/netfilter/nft_nat.c:50:43:    expected restricted __be32 [addressable] [usertype] ip
net/netfilter/nft_nat.c:50:43:    got unsigned int [unsigned] [usertype] <noident>
net/netfilter/nft_nat.c:51:43: warning: incorrect type in assignment (different base types)
net/netfilter/nft_nat.c:51:43:    expected restricted __be32 [addressable] [usertype] ip
net/netfilter/nft_nat.c:51:43:    got unsigned int [unsigned] [usertype] <noident>
net/netfilter/nft_nat.c:65:37: warning: incorrect type in assignment (different base types)
net/netfilter/nft_nat.c:65:37:    expected restricted __be16 [addressable] [assigned] [usertype] all
net/netfilter/nft_nat.c:65:37:    got unsigned int [unsigned] <noident>
net/netfilter/nft_nat.c:66:37: warning: incorrect type in assignment (different base types)
net/netfilter/nft_nat.c:66:37:    expected restricted __be16 [addressable] [assigned] [usertype] all
net/netfilter/nft_nat.c:66:37:    got unsigned int [unsigned] <noident>

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-28 17:41:49 +01:00
Pablo Neira Ayuso
6e078bc2f2 netfilter: bridge: fix nf_tables bridge dependencies with main core
when CONFIG_NF_TABLES[_MODULE] is not enabled,
but CONFIG_NF_TABLES_BRIDGE is enabled:

net/bridge/netfilter/nf_tables_bridge.c: In function 'nf_tables_bridge_init_net':
net/bridge/netfilter/nf_tables_bridge.c:24:5: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:25:9: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:28:2: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:30:34: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:35:11: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c: In function 'nf_tables_bridge_exit_net':
net/bridge/netfilter/nf_tables_bridge.c:41:27: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:42:11: error: 'struct net' has no member named 'nft'

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-28 17:41:21 +01:00
Andrei Otcheretianski
d0a361a5b3 nl80211: fix channel switch parsing
The nl80211 attribute NL80211_ATTR_CSA_C_OFF_BEACON should be nested
inside NL80211_ATTR_CSA_IES, but commit ee4bc9e758
("nl80211: enable IBSS support for channel switch announcements")
added a check in the outer message attributes.

Fix channel switch calls by removing the erroneus condition.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:13:04 +01:00
Michal Kazior
0951ebb8aa mac80211: fix uninitialized variable
CSA completion could call in a driver
bss_info_changed() with a garbled `changed` flag
leading to all sorts of problems.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:31 +01:00
Chun-Yeow Yeoh
33a45867c5 mac80211: process mesh channel switching using beacon
Trigger the mesh channel switching procedure if the mesh STA
happens to miss the CSA action frame but able to receive the
beacon containing the CSA and MCSP elements from its peer
mesh STAs.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
[fix locking in ieee80211_mesh_process_chnswitch()]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:30 +01:00
Chun-Yeow Yeoh
b8456a14e9 {nl,cfg,mac}80211: implement mesh channel switch userspace API
Implement the required procedures for mesh channel switching as defined
in the IEEE Std 802.11-2012 section 10.9.8.4.3 and also handle the CSA
and MCSP elements as followed:
 * Add the function for updating the beacon and probe response frames
   with CSA and MCSP elements during the period of switching to the new
   channel. Both CSA and MCSP elements must be included in beacon and
   probe response frames until the intended channel switch time.
 * The ifmsh->csa_settings is set to NULL and the CSA and MCSP elements
   will then be removed from the beacon or probe response frames once the
   new channel is switched to.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:30 +01:00
Chun-Yeow Yeoh
c6da674aff {nl,cfg,mac}80211: enable the triggering of CSA frame in mesh
Allow the triggering of CSA frame using mesh interface. The
rules are more or less same with IBSS, such as not allowed to
change between the band and channel width has to be same from
the previous mode. Also, move the ieee80211_send_action_csa
to a common space so that it can be re-used by mesh interface.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:29 +01:00
Chun-Yeow Yeoh
8f2535b92d mac80211: process the CSA frame for mesh accordingly
Process the CSA frame according to the procedures define in IEEE Std
802.11-2012 section 10.9.8.4.3 as follow:
* The mesh channel switch parameters element (MCSP) must be availabe.
* If the MCSP's TTL is 1, drop the frame but still process the CSA.
* If the MCSP's precedence value is less than or equal to the current
  precedence value, drop the frame and do not process the CSA.
* The CSA frame is forwarded after TTL is decremented by 1 and the
  initiator field is set to 0. Transmit restrict field and others
  are maintained as is.
* No beacon or probe response frame are handled here.

Also, introduce the debug message used for mesh CSA purpose.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:28 +01:00
Chun-Yeow Yeoh
c0f17eb9b2 mac80211: refactor the parsing of chan switch ie
Refactor the channel switch IE parsing to reduce the number
of function parameters.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:28 +01:00
Felix Fietkau
06be6b149f mac80211: add ieee80211_tx_prepare_skb() helper function
This can be used by a driver to prepare skbs for transmission, which were
obtained via functions such as ieee80211_probereq_get or
ieee80211_nullfunc_get.

This is useful for drivers that want to send those frames directly, but
need rate control information to be prepared first.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:27 +01:00
Luis R. Rodriguez
034c6d6e67 cfg80211: export reg_initiator_name()
Drivers can now use this to parse the regulatory request and
be more verbose when needed.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:27 +01:00
Marco Porsch
446075d76b mac80211: fixes for mesh powersave logic
This patch fixes errors in the mesh powersave logic which
cause that remote peers do not get peer power mode change
notifications and mesh peer service periods (MPSPs) got
stuck.

When closing a peer link, set the (now invalid) peer-specific
power mode to 'unknown'.

Avoid overhead when local power mode is unchanged.

Reliably clear MPSP flags on peering status update.

Avoid MPSP flags getting stuck by not requesting a further
MPSP ownership if we already are an MPSP owner.

Signed-off-by: Marco Porsch <marco@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:26 +01:00
Seth Forshee
17ac49594f mac80211: Remove check for offchannel state when waking netdev queues
6c17b77b67 ensures that a device's
mac80211 queues will remain stopped while offchannel. Since the
vif can no longer be offchannel when the queues wake it's not
necessary to check for this before waking its netdev queues.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:26 +01:00
Heikki Krogerus
ef91ffaa03 net: rfkill: gpio: add ACPI support
Including ACPI ID for Broadcom GPS receiver BCM4752.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:25 +01:00
Heikki Krogerus
262c91ee5e net: rfkill: gpio: prepare for DT and ACPI support
This will add the relevant values like the gpios and the
type in rfkill_gpio_platform_data to the rfkill_gpio_data
structure. It will allow those values to be easily picked
from DT and ACPI tables later.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:25 +01:00
Heikki Krogerus
0be81727a4 net: rfkill: gpio: spinlock-safe GPIO access
This sets the direction of the gpio once when it's requested,
and uses the spinlock-safe gpio_set_state() to change the
state.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:24 +01:00
Heikki Krogerus
f02ae59b12 net: rfkill: gpio: clean up clock handling
Use a simple flag to see the state of the clock, and make
the clock available even without a name. Also, get rid of
HAVE_CLK dependency.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:24 +01:00
Heikki Krogerus
5e7ca3937f net: rfkill: gpio: convert to resource managed allocation
And remove now unneeded resource freeing.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:23 +01:00
Simon Wunderlich
8e8d347da7 mac80211: enable DFS for IBSS mode
Allow changing to DFS channels if the channel is available for
beaconing and userspace controls DFS operation.

Channel switch announcement from other stations on DFS channels will
be interpreted as radar event. These channels will then be marked as
unvailable.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:21 +01:00
Simon Wunderlich
5336fa88e8 nl80211/cfg80211: enable DFS for IBSS mode
To use DFS in IBSS mode, userspace is required to react to radar events.
It can inform nl80211 that it is capable of doing so by adding a
NL80211_ATTR_HANDLE_DFS attribute when joining the IBSS.

This attribute is supplied to let the kernelspace know that the
userspace application can and will handle radar events, e.g. by
intiating channel switches to a valid channel. DFS channels may
only be used if this attribute is supplied and the driver supports
it. Driver support will be checked even if a channel without DFS
will be initially joined, as a DFS channel may be chosen later.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
[fix attribute name in commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:21 +01:00
Emmanuel Grumbach
687da13223 mac80211: implement SMPS for AP
When the driver requests to move to STATIC or DYNAMIC SMPS,
we send an action frame to each associated station and
reconfigure the channel context / driver.
Of course, non-MIMO stations are ignored.

The beacon isn't updated. The association response will
include the original capabilities. Stations that associate
while in non-OFF SMPS mode will get an action frame right
after association to inform them about our current state.
Note that we wait until the end of the EAPOL. Sending an
action frame before the EAPOL is finished can be an issue
for a few clients. Clients aren't likely to send EAPOL
frames in MIMO anyway.

When the SMPS configuration gets more permissive (e.g.
STATIC -> OFF), we don't wake up stations that are asleep
We remember that they don't know about the change and send
the action frame when they wake up.

When the SMPS configuration gets more restrictive (e.g.
OFF -> STATIC), we set the TIM bit for every sleeping STA.
uAPSD stations might send MIMO until they poll the action
frame, but this is for a short period of time.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[fix vht streams loop, initialisation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-28 15:05:11 +01:00
Daniel Borkmann
6e7cd27c0f net: ipvs: sctp: add missing verdict assignments in sctp_conn_schedule
If skb_header_pointer() fails, we need to assign a verdict, that is
NF_DROP in this case, otherwise, we would leave the verdict from
conn_schedule() uninitialized when returning.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-10-28 19:00:49 +09:00
Steffen Klassert
eeb1b73378 xfrm: Increase the garbage collector threshold
With the removal of the routing cache, we lost the
option to tweak the garbage collector threshold
along with the maximum routing cache size. So git
commit 703fb94ec ("xfrm: Fix the gc threshold value
for ipv4") moved back to a static threshold.

It turned out that the current threshold before we
start garbage collecting is much to small for some
workloads, so increase it from 1024 to 32768. This
means that we start the garbage collector if we have
more than 32768 dst entries in the system and refuse
new allocations if we are above 65536.

Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-28 07:37:52 +01:00
wangweidong
747edc0f9e sctp: merge two if statements to one
Two if statements do the same work, we can merge them to
one. And fix some typos. There is just code simplification,
no functional changes.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 01:02:34 -04:00
wangweidong
3dc0a548a0 sctp: remove the repeat initialize with 0
kmem_cache_zalloc had set the allocated memory to zero. I think no need
to initialize with 0. And move the comments to the function begin.

Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 01:02:34 -04:00
wangweidong
2bccbadf20 sctp: fix some comments in chunk.c and associola.c
fix some typos

Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 01:02:34 -04:00
Eric Dumazet
8c3a897bfa inet: restore gso for vxlan
Alexei reported a performance regression on vxlan, caused
by commit 3347c96029 "ipv4: gso: make inet_gso_segment() stackable"

GSO vxlan packets were not properly segmented, adding IP fragments
while they were not expected.

Rename 'bool tunnel' to 'bool encap', and add a new boolean
to express the fact that UDP should be fragmented.
This fragmentation is triggered by skb->encapsulation being set.

Remove a "skb->encapsulation = 1" added in above commit,
as its not needed, as frags inherit skb->frag from original
GSO skb.

Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 00:23:06 -04:00
Eric Dumazet
fc59d5bdf1 pkt_sched: fq: clear time_next_packet for reused flows
When a socket is freed/reallocated, we need to clear time_next_packet
or else we can inherit a prior value and delay first packets of the
new flow.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-28 00:18:31 -04:00
Yuchung Cheng
2f715c1dde tcp: do not rearm RTO when future data are sacked
Patch ed08495c3 "tcp: use RTT from SACK for RTO" always re-arms RTO upon
obtaining a RTT sample from newly sacked data.

But technically RTO should only be re-armed when the data sent before
the last (re)transmission of write queue head are (s)acked. Otherwise
the RTO may continue to extend during loss recovery on data sent
in the future.

Note that RTTs from ACK or timestamps do not have this problem, as the RTT
source must be from data sent before.

The new RTO re-arm policy is
1) Always re-arm RTO if SND.UNA is advanced
2) Re-arm RTO if sack RTT is available, provided the sacked data was
   sent before the last time write_queue_head was sent.

Signed-off-by: Larry Brakmo <brakmo@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-27 16:50:06 -04:00
Yuchung Cheng
2909d874f3 tcp: only take RTT from timestamps if new data is acked
Patch ed08495c3 "tcp: use RTT from SACK for RTO" has a bug that
it does not check if the ACK acknowledge new data before taking
the RTT sample from TCP timestamps. This patch adds the check
back as required by the RFC.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-27 16:50:06 -04:00
Yuchung Cheng
bc15afa39e tcp: fix SYNACK RTT estimation in Fast Open
tp->lsndtime may not always be the SYNACK timestamp if a passive
Fast Open socket sends data before handshake completes. And if the
remote acknowledges both the data and the SYNACK, the RTT sample
is already taken in tcp_ack(), so no need to call
tcp_update_ack_rtt() in tcp_synack_rtt_meas() aagain.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-27 16:50:06 -04:00
Florian Westphal
6b8dbcf2c4 bridge: netfilter: orphan skb before invoking ip netfilter hooks
Pekka Pietikäinen reports xt_socket behavioural change after commit
00028aa37098o (netfilter: xt_socket: use IP early demux).

Reason is xt_socket now no longer does an unconditional sk lookup -
it re-uses existing skb->sk if possible, assuming ->sk was set by
ip early demux.

However, when netfilter is invoked via bridge, this can cause 'bogus'
sockets to be examined by the match, e.g. a 'tun' device socket.

bridge netfilter should orphan the skb just like the routing path
before invoking ipv4/ipv6 netfilter hooks to avoid this.

Reported-and-tested-by: Pekka Pietikäinen <pp@ee.oulu.fi>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-27 21:44:33 +01:00
Michael Opdenacker
1e56555ee1 netfilter: ipset: remove duplicate define
This patch removes a duplicate define from
net/netfilter/ipset/ip_set_hash_gen.h

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-10-27 19:24:45 +01:00
Jozsef Kadlecsik
dc476e7c8e netfilter:ipset: Fix memory allocation for bitmap:port
At the restructuring of the bitmap types creation in ipset, for the
bitmap:port type wrong (too large) memory allocation was copied
(netfilter bugzilla id #859).

Reported-by: Quentin Armitage <quentin@armitage.org.uk>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-10-27 19:24:39 +01:00
Jeff Layton
cf4c024b90 sunrpc: trim off EC bytes in GSSAPI v2 unwrap
As Bruce points out in RFC 4121, section 4.2.3:

   "In Wrap tokens that provide for confidentiality, the first 16 octets
    of the Wrap token (the "header", as defined in section 4.2.6), SHALL
    be appended to the plaintext data before encryption.  Filler octets
    MAY be inserted between the plaintext data and the "header.""

...and...

   "In Wrap tokens with confidentiality, the EC field SHALL be used to
    encode the number of octets in the filler..."

It's possible for the client to stuff different data in that area on a
retransmission, which could make the checksum come out wrong in the DRC
code.

After decrypting the blob, we should trim off any extra count bytes in
addition to the checksum blob.

Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-26 15:36:55 -04:00
Hannes Frederic Sowa
e3bc10bd95 ipv6: ip6_dst_check needs to check for expired dst_entries
On receiving a packet too big icmp error we check if our current cached
dst_entry in the socket is still valid. This validation check did not
care about the expiration of the (cached) route.

The error path I traced down:
The socket receives a packet too big mtu notification. It still has a
valid dst_entry and thus issues the ip6_rt_pmtu_update on this dst_entry,
setting RTF_EXPIRE and updates the dst.expiration value (which could
fail because of not up-to-date expiration values, see previous patch).

In some seldom cases we race with a) the ip6_fib gc or b) another routing
lookup which would result in a recreation of the cached rt6_info from its
parent non-cached rt6_info. While copying the rt6_info we reinitialize the
metrics store by copying it over from the parent thus invalidating the
just installed pmtu update (both dsts use the same key to the inetpeer
storage). The dst_entry with the just invalidated metrics data would
just get its RTF_EXPIRES flag cleared and would continue to stay valid
for the socket.

We should have not issued the pmtu update on the already expired dst_entry
in the first placed. By checking the expiration on the dst entry and
doing a relookup in case it is out of date we close the race because
we would install a new rt6_info into the fib before we issue the pmtu
update, thus closing this race.

Not reliably updating the dst.expire value was fixed by the patch "ipv6:
reset dst.expires value when clearing expire flag".

Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-by: Valentijn Sessink <valentyn@blub.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Valentijn Sessink <valentyn@blub.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:26:59 -04:00
Antonio Quartulli
8fb479a47c netpoll: fix rx_hook() interface by passing the skb
Right now skb->data is passed to rx_hook() even if the skb
has not been linearised and without giving rx_hook() a way
to linearise it.

Change the rx_hook() interface and make it accept the skb
and the offset to the UDP payload as arguments. rx_hook() is
also renamed to rx_skb_hook() to ensure that out of the tree
users notice the API change.

In this way any rx_skb_hook() implementation can perform all
the needed operations to properly (and safely) access the
skb data.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:26:58 -04:00
Alexei Starovoitov
7f29405403 net: fix rtnl notification in atomic context
commit 991fb3f74c "dev: always advertise rx_flags changes via netlink"
introduced rtnl notification from __dev_set_promiscuity(),
which can be called in atomic context.

Steps to reproduce:
ip tuntap add dev tap1 mode tap
ifconfig tap1 up
tcpdump -nei tap1 &
ip tuntap del dev tap1 mode tap

[  271.627994] device tap1 left promiscuous mode
[  271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940
[  271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip
[  271.677525] INFO: lockdep is turned off.
[  271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G        W    3.12.0-rc3+ #73
[  271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[  271.731254]  ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428
[  271.760261]  ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8
[  271.790683]  0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8
[  271.822332] Call Trace:
[  271.838234]  [<ffffffff817544e5>] dump_stack+0x55/0x76
[  271.854446]  [<ffffffff8108bad1>] __might_sleep+0x181/0x240
[  271.870836]  [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0
[  271.887076]  [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0
[  271.903368]  [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0
[  271.919716]  [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0
[  271.936088]  [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0
[  271.952504]  [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0
[  271.968902]  [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100
[  271.985302]  [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0
[  272.001642]  [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0
[  272.017917]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.033961]  [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50
[  272.049855]  [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0
[  272.065494]  [<ffffffff81732052>] packet_notifier+0x1b2/0x380
[  272.080915]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.096009]  [<ffffffff81761c66>] notifier_call_chain+0x66/0x150
[  272.110803]  [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10
[  272.125468]  [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20
[  272.139984]  [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70
[  272.154523]  [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20
[  272.168552]  [<ffffffff816224c5>] rollback_registered_many+0x145/0x240
[  272.182263]  [<ffffffff81622641>] rollback_registered+0x31/0x40
[  272.195369]  [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90
[  272.208230]  [<ffffffff81547ca0>] __tun_detach+0x140/0x340
[  272.220686]  [<ffffffff81547ed6>] tun_chr_close+0x36/0x60

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:45 -04:00
Hannes Frederic Sowa
66415cf8a1 net: initialize hashrnd in flow_dissector with net_get_random_once
We also can defer the initialization of hashrnd in flow_dissector
to its first use. Since net_get_random_once is irq safe now we don't
have to audit the call paths if one of this functions get called by an
interrupt handler.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:39 -04:00
Hannes Frederic Sowa
f84be2bd96 net: make net_get_random_once irq safe
I initial build non irq safe version of net_get_random_once because I
would liked to have the freedom to defer even the extraction process of
get_random_bytes until the nonblocking pool is fully seeded.

I don't think this is a good idea anymore and thus this patch makes
net_get_random_once irq safe. Now someone using net_get_random_once does
not need to care from where it is called.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:39 -04:00
Nikolay Aleksandrov
974daef7f8 net: add missing dev_put() in __netdev_adjacent_dev_insert
I think that a dev_put() is needed in the error path to preserve the
proper dev refcount.

CC: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:39 -04:00
Hagen Paul Pfeifer
4a3ad7b3ea netem: markov loss model transition fix
The transition from markov state "3 => lost packets within a burst
period" to "1 => successfully transmitted packets within a gap period"
has no *additional* loss event. The loss already happen for transition
from 1 -> 3, this additional loss will make things go wild.

E.g. transition probabilities:

p13:   10%
p31:  100%

Expected:

Ploss = p13 / (p13 + p31)
Ploss = ~9.09%

... but it isn't. Even worse: we get a double loss - each time.
So simple don't return true to indicate loss, rather break and return
false.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Stefano Salsano <stefano.salsano@uniroma2.it>
Cc: Fabio Ludovici <fabio.ludovici@yahoo.it>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-25 19:03:39 -04:00
Vinod Koul
27bf697083 net: use DMA_COMPLETE for dma completion status
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-10-25 11:16:21 +05:30
Al Viro
72c2d53192 file->f_op is never NULL...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:54 -04:00
Al Viro
1e903edadf sunrpc: switch to %pd
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:51 -04:00
David S. Miller
b45bd46dec Included changes:
- data structure reshaping to accommodate multiple routing protocol
   implementations
 - routing protocol API enhancement
 - send to userspace the event "batman-adv Gateway loss" in case of soft-iface
   destruction and a "batman-adv Gateway" was configured
 - improve the TT component to support and advertise runtime flag changes
 - minor code refactoring
 - make the ICMP kernel-to-userspace communication more generic
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJSZ+VyAAoJEADl0hg6qKeOccIP/1RE68wFqd20BTh6WtnKo4s6
 H/F5WMUrk/HCNe3p4JnIOYv5WESR3tqTylCqBIl3yzcsks2KsLm4zEG5FIx/0J3Y
 1Mgy8V/xlGVN7M+wFSLCgzpTnJ3aCvy7+ied2qoz9KsC4vgiKUimDkPTsbUL7NUp
 OBmJGYfe/nLDoPI/CXu3nJCtNXcDgv5a5Z8ZMBuipeK++JsBMZRLfJEIM+7Q4Ouc
 KNLZFXavwtXAxsHLpWDS48MkVAz0tbyy4P6e2k7iQQq+W1WZjCoFMz0xLIKRFf+Y
 yOOZXItpTTX7rzLxFHkLAopZo9UMPsFjm/OceFBlnAbp24ftfR0b4gjFAUQiKFsq
 lGlGIXVkhsR6arfQ4SIlrrGOW7h+Kea2I1aPWC7yoi/97+22Nrr/a903p+kkhP4t
 sAoMk7DbbdajcV01RULF+xjaBFvEdaSfSBVB5j76Gf9AxNZfSGd7wH1qPG7O7HFT
 jO6Z4fbG6bbHBHMt9j4o9oGg4h5X8epbhZB7e8rwoBe/dSzw+B4CK2Y+j1/QW3C4
 PDqL3t5yi0O0+dhkI2G8DmhTOm+ZKVZt60WIMM+G5T6DiLyECveexGWIbjjZR67w
 qgVXsvE0PwHabY8Ne/z+IlnHY8zegUFYVusQ0lQfLpkKhdjoYXLF709RT42r2QIN
 8/6WlNGD2YG/0sWEDF7H
 =sYKR
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
this is another set of changes intended for net-next/linux-3.13.
(probably our last pull request for this cycle)

Patches 1 and 2 reshape two of our main data structures in a way that they can
easily be extended in the future to accommodate new routing protocols.

Patches from 3 to 9 improve our routing protocol API and its users so that all
the protocol-related code is not mixed up with the other components anymore.

Patch 10 limits the local Translation Table maximum size to a value such that it
can be fully transfered over the air if needed. This value depends on
fragmentation being enabled or not and on the mtu values.

Patch 11 makes batman-adv send a uevent in case of soft-interface destruction
while a "bat-Gateway" was configured (this informs userspace about the GW not
being available anymore).

Patches 13 and 14 enable the TT component to detect non-mesh client flag
changes at runtime (till now those flags where set upon client detection and
were not changed anymore).

Patch 16 is a generalisation of our user-to-kernel space communication (and
viceversa) used to exchange ICMP packets to send/received to/from the mesh
network. Now it can easily accommodate new ICMP packet types without breaking
the existing userspace API anymore.

Remaining patches are minor changes and cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:12:33 -04:00
Hannes Frederic Sowa
7088ad74e6 inet: remove old fragmentation hash initializing
All fragmentation hash secrets now get initialized by their
corresponding hash function with net_get_random_once. Thus we can
eliminate the initial seeding.

Also provide a comment that hash secret seeding happens at the first
call to the corresponding hashing function.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:01:41 -04:00
Hannes Frederic Sowa
b1190570b4 ipv6: split inet6_hash_frag for netfilter and initialize secrets with net_get_random_once
Defer the fragmentation hash secret initialization for IPv6 like the
previous patch did for IPv4.

Because the netfilter logic reuses the hash secret we have to split it
first. Thus introduce a new nf_hash_frag function which takes care to
seed the hash secret.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:01:40 -04:00
Hannes Frederic Sowa
e7b519ba55 ipv4: initialize ip4_frags hash secret as late as possible
Defer the generation of the first hash secret for the ipv4 fragmentation
cache as late as possible.

ip4_frags.rnd gets initial seeded by inet_frags_init and regulary
reseeded by inet_frag_secret_rebuild. Either we call ipqhashfn directly
from ip_fragment.c in which case we initialize the secret directly.

If we first get called by inet_frag_secret_rebuild we install a new secret
by a manual call to get_random_bytes. This secret will be overwritten
as soon as the first call to ipqhashfn happens. This is safe because we
won't race while publishing the new secrets with anyone else.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 17:01:40 -04:00
Daniel Borkmann
fecda03493 net: sctp: fix ASCONF to allow non SCTP_ADDR_SRC addresses in ipv6
Commit 8a07eb0a50 ("sctp: Add ASCONF operation on the single-homed host")
implemented possible use of IPv4 addresses with non SCTP_ADDR_SRC state
as source address when sending ASCONF (ADD) packets, but IPv6 part for
that was not implemented in 8a07eb0a50. Therefore, as this is not restricted
to IPv4-only, fix this up to allow the same for IPv6 addresses in SCTP.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Michio Honda <micchie@sfc.wide.ad.jp>
Acked-by: Michio Honda <micchie@sfc.wide.ad.jp>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:57:14 -04:00
David S. Miller
afb14c7cb6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains three netfilter fixes for your net
tree, they are:

* A couple of fixes to resolve info leak to userspace due to uninitialized
  memory area in ulogd, from Mathias Krause.

* Fix instruction ordering issues that may lead to the access of
  uninitialized data in x_tables. The problem involves the table update
 (producer) and the main packet matching (consumer) routines. Detected in
  SMP ARMv7, from Will Deacon.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:55:04 -04:00
David S. Miller
c3fa32b976 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/qmi_wwan.c
	include/net/dst.h

Trivial merge conflicts, both were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:49:34 -04:00
Hannes Frederic Sowa
34d92d5315 net: always inline net_secret_init
Currently net_secret_init does not get inlined, so we always have a call
to net_secret_init even in the fast path.

Let's specify net_secret_init as __always_inline so we have the nop in
the fast-path without the call to net_secret_init and the unlikely path
at the epilogue of the function.

jump_labels handle the inlining correctly.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:26:46 -04:00
Simon Wunderlich
da6b8c20a5 batman-adv: generalize batman-adv icmp packet handling
Instead of handling icmp packets only up to length of icmp_packet_rr,
the code should handle any icmp length size. Therefore the length
truncating is moved to when the packet is actually sent to userspace
(this does not support lengths longer than icmp_packet_rr yet). Longer
packets are forwarded without truncating.

This patch also cleans up some parts where the icmp header struct could
be used instead of other icmp_packet(_rr) structs to make the code more
readable.

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-23 17:03:47 +02:00
Simon Wunderlich
15c33da6e8 batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-23 17:03:46 +02:00
Antonio Quartulli
0eb01568f0 batman-adv: include the sync-flags when compute the global/local table CRC
Flags covered by TT_SYNC_MASK are kept in sync among the
nodes in the network and therefore they have to be
considered while computing the global/local table CRC.

In this way a generic originator is able to understand if
its table contains the correct flags or not.

Bits from 4 to 7 in the TT flags fields are now reserved for
"synchronized" flags only.

This allows future developers to add more flags of this type
without breaking compatibility.

It's important to note that not all the remote TT flags are
synchronised. This comes from the fact that some flags are
used to inject an information once only.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-10-23 17:03:46 +02:00
Antonio Quartulli
3c4f7ab60c batman-adv: improve the TT component to support runtime flag changes
Some flags (i.e. the WIFI flag) may change after that the
related client has already been announced. However it is
useful to informa the rest of the network about this change.

Add a runtime-flag-switch detection mechanism and
re-announce the related TT entry to advertise the new flag
value.

This mechanism can be easily exploited by future flags that
may need the same treatment.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-10-23 17:03:45 +02:00
Antonio Quartulli
0c69aecc5b batman-adv: invoke dev_get_by_index() outside of is_wifi_iface()
Upcoming changes need to perform other checks on the
incoming net_device struct.

To avoid performing dev_get_by_index() for each and every
check, it is better to move it outside of is_wifi_iface()
and search the netdev object once only.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-10-23 17:03:44 +02:00
Antonio Quartulli
8257f55ae2 batman-adv: send GW_DEL event in case of soft-iface destruction
In case of soft_iface destruction send a GW DEL event to
userspace so that applications which are listening for GW
events are informed about the lost of connectivity and can
react accordingly.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2013-10-23 17:03:44 +02:00
Marek Lindner
a19d3d85e1 batman-adv: limit local translation table max size
The local translation table size is limited by what can be
transferred from one node to another via a full table request.

The number of entries fitting into a full table request depend
on whether the fragmentation is enabled or not. Therefore this
patch introduces a max table size check and refuses to add
more local clients when that size is reached. Moreover, if the
max full table packet size changes (MTU change or fragmentation
is disabled) the local table is downsized instantaneously.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
2013-10-23 17:03:43 +02:00
Antonio Quartulli
4627456a77 batman-adv: adapt the TT component to use the new API functions
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 17:03:42 +02:00
Antonio Quartulli
d0015fdd3d batman-adv: provide orig_node routing API
Some operations executed on an orig_node depends on the
current routing algorithm being used. To easily make this
mechanism routing algorithm agnostic add a orig_node
specific API that each algorithm can populate with its own
routines.

Such routines are then invoked by the code when needed,
without knowing which routing algorithm is currently in use

With this patch 3 API functions are added:
- orig_free (to free routing depending internal structs)
- orig_add_if (to change the inner state of an orig_node
  when a new hard interface is added)
- orig_del_if (to change the inner state of an orig_node
  when an hard interface is removed)

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 17:03:21 +02:00
Antonio Quartulli
81e26b1a1c batman-adv: adapt the neighbor purging routine to use the new API functions
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 15:33:12 +02:00
Antonio Quartulli
6680a1249f batman-adv: adapt bonding to use the new API functions
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 15:33:12 +02:00
Antonio Quartulli
c43c981e50 batman-adv: add bat_neigh_is_equiv_or_better API function
Each routing protocol has its own metric semantic and
therefore is the protocol itself the only component able to
compare two metrics to check their "similarity".

This new API allows each routing protocol to implement its
own logic and make the external code protocol agnostic.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 15:33:11 +02:00
Antonio Quartulli
a3285a8f20 batman-adv: add bat_neigh_cmp API function
This new API allows to compare the two neighbours based on
the metric avoiding the user to deal with any routing
algorithm specific detail

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 15:33:10 +02:00
Antonio Quartulli
737a2a2297 batman-adv: add bat_orig_print API function
Each routing protocol has its own metric and private
variables, therefore it is useful to introduce a new API
for originator information printing.

This API needs to be implemented by each protocol in order
to provide its specific originator table output.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 15:33:10 +02:00
Antonio Quartulli
bbad0a5e36 batman-adv: make struct batadv_orig_node algorithm agnostic
some of the struct batadv_orig_node members are B.A.T.M.A.N. IV
specific and therefore they are moved in a algorithm specific
substruct in order to make batadv_orig_node routing algorithm
agnostic

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 15:33:09 +02:00
Antonio Quartulli
0538f75991 batman-adv: make struct batadv_neigh_node algorithm agnostic
some of the fields in struct batadv_neigh_node are strictly
related to the B.A.T.M.A.N. IV algorithm. In order to
make the struct usable by any routing algorithm it has to be
split and made more generic

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-23 15:33:08 +02:00
Stanislav Fomichev
f2020b27be netfilter: ip6t_REJECT: skip checksum verification for outgoing ipv6 packets
Don't verify checksum for outgoing packets because checksum calculation
may be done by the device.

Without this patch:
$ ip6tables -I OUTPUT -p tcp --dport 80 -j REJECT --reject-with tcp-reset
$ time telnet ipv6.google.com 80
Trying 2a00:1450:4010:c03::67...
telnet: Unable to connect to remote host: Connection timed out

real    0m7.201s
user    0m0.000s
sys     0m0.000s

With the patch applied:
$ ip6tables -I OUTPUT -p tcp --dport 80 -j REJECT --reject-with tcp-reset
$ time telnet ipv6.google.com 80
Trying 2a00:1450:4010:c03::67...
telnet: Unable to connect to remote host: Connection refused

real    0m0.085s
user    0m0.000s
sys     0m0.000s

Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-23 11:20:00 +02:00
Linus Lüssing
454594f3b9 Revert "bridge: only expire the mdb entry when query is received"
While this commit was a good attempt to fix issues occuring when no
multicast querier is present, this commit still has two more issues:

1) There are cases where mdb entries do not expire even if there is a
querier present. The bridge will unnecessarily continue flooding
multicast packets on the according ports.

2) Never removing an mdb entry could be exploited for a Denial of
Service by an attacker on the local link, slowly, but steadily eating up
all memory.

Actually, this commit became obsolete with
"bridge: disable snooping if there is no querier" (b00589af3b)
which included fixes for a few more cases.

Therefore reverting the following commits (the commit stated in the
commit message plus three of its follow up fixes):

====================
Revert "bridge: update mdb expiration timer upon reports."
This reverts commit f144febd93.
Revert "bridge: do not call setup_timer() multiple times"
This reverts commit 1faabf2aab.
Revert "bridge: fix some kernel warning in multicast timer"
This reverts commit c7e8e8a8f7.
Revert "bridge: only expire the mdb entry when query is received"
This reverts commit 9f00b2e7cf.
====================

CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Reviewed-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22 14:41:02 -04:00
ZHAO Gang
0a6957e7d4 net: remove function sk_reset_txq()
What sk_reset_txq() does is just calls function sk_tx_queue_reset(),
and sk_reset_txq() is used only in sock.h, by dst_negative_advice().
Let dst_negative_advice() calls sk_tx_queue_reset() directly so we
can remove unneeded sk_reset_txq().

Signed-off-by: ZHAO Gang <gamerh2o@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22 14:00:21 -04:00
Andy Zhou
1bd7116f1c openvswitch: collect mega flow mask stats
Collect mega flow mask stats. ovs-dpctl show command can be used to
display them for debugging and performance tuning.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-10-22 10:42:46 -07:00
Jozsef Kadlecsik
1a869205c7 netfilter: ipset: The unnamed union initialization may lead to compilation error
The unnamed union should be possible to be initialized directly, but
unfortunately it's not so:

/usr/src/ipset/kernel/net/netfilter/ipset/ip_set_hash_netnet.c: In
function ?hash_netnet4_kadt?:
/usr/src/ipset/kernel/net/netfilter/ipset/ip_set_hash_netnet.c:141:
error: unknown field ?cidr? specified in initializer

Reported-by: Husnu Demir <hdemir@metu.edu.tr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-22 10:14:01 +02:00
Jozsef Kadlecsik
93302880d8 netfilter: ipset: Use netlink callback dump args only
Instead of cb->data, use callback dump args only and introduce symbolic
names instead of plain numbers at accessing the argument members.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-22 10:13:59 +02:00
Will Deacon
b416c144f4 netfilter: x_tables: fix ordering of jumpstack allocation and table update
During kernel stability testing on an SMP ARMv7 system, Yalin Wang
reported the following panic from the netfilter code:

  1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
  [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
  [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
  [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
  [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
  [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
  [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
  [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
  [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
  [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
  [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
  [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
  [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
  [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
  Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
  ---[ end trace da227214a82491bd ]---
  Kernel panic - not syncing: Fatal exception in interrupt

This comes about because CPU1 is executing xt_replace_table in response
to a setsockopt syscall, resulting in:

	ret = xt_jumpstack_alloc(newinfo);
		--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);

	[...]

	table->private = newinfo;
	newinfo->initial_entries = private->initial_entries;

Meanwhile, CPU0 is handling the network receive path and ends up in
ipt_do_table, resulting in:

	private = table->private;

	[...]

	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];

On weakly ordered memory architectures, the writes to table->private
and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
Furthermore, on architectures which don't respect ordering of address
dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.

This patch adds an smp_wmb() before the assignment to table->private
(which is essentially publishing newinfo) to ensure that all writes to
newinfo will be observed before plugging it into the table structure.
A dependent-read barrier is also added on the consumer sides, to ensure
the same ordering requirements are also respected there.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-22 10:11:29 +02:00
Neal Cardwell
02cf4ebd82 tcp: initialize passive-side sk_pacing_rate after 3WHS
For passive TCP connections, upon receiving the ACK that completes the
3WHS, make sure we set our pacing rate after we get our first RTT
sample.

On passive TCP connections, when we receive the ACK completing the
3WHS we do not take an RTT sample in tcp_ack(), but rather in
tcp_synack_rtt_meas(). So upon receiving the ACK that completes the
3WHS, tcp_ack() leaves sk_pacing_rate at its initial value.

Originally the initial sk_pacing_rate value was 0, so passive-side
connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs
made in the first RTT. With a default initial cwnd of 10 packets, this
happened to be correct for RTTs 5ms or bigger, so it was hard to
see problems in WAN or emulated WAN testing.

Since 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing"), the
initial sk_pacing_rate is 0xffffffff. So after that change, passive
TCP connections were keeping this value (and using large numbers of
segments per skbuff) until receiving an ACK for data.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:56:23 -04:00
Hannes Frederic Sowa
c2f17e827b ipv6: probe routes asynchronous in rt6_probe
Routes need to be probed asynchronous otherwise the call stack gets
exhausted when the kernel attemps to deliver another skb inline, like
e.g. xt_TEE does, and we probe at the same time.

We update neigh->updated still at once, otherwise we would send to
many probes.

Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:56:22 -04:00
Eric Dumazet
61c1db7fae ipv6: sit: add GSO/TSO support
Now ipv6_gso_segment() is stackable, its relatively easy to
implement GSO/TSO support for SIT tunnels

Performance results, when segmentation is done after tunnel
device (as no NIC is yet enabled for TSO SIT support) :

Before patch :

lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      3168.31   4.81     4.64     2.988   2.877

After patch :

lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      5525.00   7.76     5.17     2.763   1.840

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:49:39 -04:00
Eric Dumazet
d3e5e0062d ipv6: gso: make ipv6_gso_segment() stackable
In order to support GSO on SIT tunnels, we need to make
inet_gso_segment() stackable.

It should not assume network header starts right after mac
header.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:49:39 -04:00
Eric W. Biederman
fd2d5356d9 ipv4: Allow unprivileged users to use per net sysctls
Allow unprivileged users to use:
/proc/sys/net/ipv4/icmp_echo_ignore_all
/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
/proc/sys/net/ipv4/icmp_ignore_bogus_error_response
/proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr
/proc/sys/net/ipv4/icmp_ratelimit
/proc/sys/net/ipv4/icmp_ratemask
/proc/sys/net/ipv4/ping_group_range
/proc/sys/net/ipv4/tcp_ecn
/proc/sys/net/ipv4/ip_local_ports_range

These are occassionally handy and after a quick review I don't see
any problems with unprivileged users using them.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:03 -04:00
Eric W. Biederman
0a6fa23dcb ipv4: Use math to point per net sysctls into the appropriate struct net.
Simplify maintenance of ipv4_net_table by using math to point the per
net sysctls into the appropriate struct net, instead of manually
reassinging all of the variables into hard coded table slots.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Eric W. Biederman
2e685cad57 tcp_memcontrol: Kill struct tcp_memcontrol
Replace the pointers in struct cg_proto with actual data fields and kill
struct tcp_memcontrol as it is not fully redundant.

This removes a confusing, unnecessary layer of abstraction.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Eric W. Biederman
a4fe34bf90 tcp_memcontrol: Remove the per netns control.
The code that is implemented is per memory cgroup not per netns, and
having per netns bits is just confusing.  Remove the per netns bits to
make it easier to see what is really going on.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Eric W. Biederman
f594d63199 tcp_memcontrol: Remove setting cgroup settings via sysctl
The code is broken and does not constrain sysctl_tcp_mem as
tcp_update_limit does.  With the result that it allows the cgroup tcp
memory limits to be bypassed.

The semantics are broken as the settings are not per netns and are in a
per netns table, and instead looks at current.

Since the code is broken in both design and implementation and does not
implement the functionality for which it was written remove it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Eric W. Biederman
cd91cce620 tcp_memcontrol: Remove tcp_max_memory
This function is never called. Remove it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:43:02 -04:00
Julian Anastasov
56e42441ed netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper
Now when rt6_nexthop() can return nexthop address we can use it
for proper nexthop comparison of directly connected destinations.
For more information refer to commit bbb5823cf7
("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper").

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:37:01 -04:00
Julian Anastasov
550bab42f8 ipv6: fill rt6i_gateway with nexthop address
Make sure rt6i_gateway contains nexthop information in
all routes returned from lookup or when routes are directly
attached to skb for generated ICMP packets.

The effect of this patch should be a faster version of
rt6_nexthop() and the consideration of local addresses as
nexthop.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:37:01 -04:00
Gustavo Padovan
d78a32a8fc Bluetooth: Remove sk member from struct l2cap_chan
There is no access to chan->sk in L2CAP core now. This change marks the
end of the task of splitting L2CAP between Core and Socket, thus sk is now
gone from struct l2cap_chan.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 13:50:56 -07:00
Gustavo Padovan
7f5396a774 Bluetooth: Use bt_cb(skb)->chan to send raw data back
Instead of accessing skb->sk in L2CAP core we now compare the channel
a skb belongs to and not send it back if the channel is same. This change
removes another struct socket usage from L2CAP core.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 13:50:55 -07:00
Gustavo Padovan
0e790c64f3 Bluetooth: Add L2CAP channel to skb private data
Adding the channel to the skb private data makes possible to us know which
channel the skb we have came from.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 13:50:55 -07:00
Gustavo Padovan
8ffb929098 Bluetooth: Remove parent socket usage from l2cap_core.c
The parent socket is not used inside the L2CAP core anymore. We only lock
it to indirect access through the new_connection() callback. The hold of
the socket lock was moved to the new_connection() callback.

Inside L2CAP core the channel lock is now used in l2cap_le_conn_ready()
and l2cap_conn_ready() to protect the execution of these two functions
during the handling of new incoming connections.

This change remove the socket lock usage from L2CAP core while keeping
the code safe against race conditions.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 12:58:17 -07:00
Gustavo Padovan
f93fa27323 Bluetooth: Remove socket lock from l2cap_state_change()
This simplify and make safer the state change handling inside l2cap_core.c.
we got rid of __l2cap_state_change(). And l2cap_state_change() doesn't lock
the socket anymore, instead the socket is locked inside the ops callback for
state change in l2cap_sock.c.

It makes the code safer because in some we were using a unlocked version,
and now we are calls to l2cap_state_change(), when dealing with sockets, use
the locked version.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 12:58:17 -07:00
Gustavo Padovan
acdcabf532 Bluetooth: Hold socket in defer callback in L2CAP socket
In both places that we use the defer callback the socket lock is held for
a indirect sk access inside __l2cap_change_state() and chan->ops->defer(),
all the rest of the code between lock_sock() and release_sock() is
already protected by the channel lock and won't be affected by this
change.

We now use l2cap_change_state(), the locked version of the change state
function, and the defer callback does the locking itself now. This does
not affect other uses of the defer callback.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 12:58:17 -07:00
Gustavo Padovan
0f2c615374 Bluetooth: Do not access chan->sk directly
In the process of removing socket usage from L2CAP we now access the L2CAP
socket from the data member of struct l2cap_chan. For the L2CAP socket
user the data member points to the L2CAP socket.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 12:58:16 -07:00
Gustavo Padovan
d42970f319 Bluetooth: Remove not used struct sock
It is a leftover from the recent effort of remove sk usage from L2CAP
core.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 12:58:16 -07:00
Johan Hedberg
547003b114 Bluetooth: Fix enabling fast connectable on LE-only controllers
The current "fast connectable" feature is BR/EDR-only, so add a proper
check for BR/EDR support before proceeding with the associated HCI
commands.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21 07:18:27 -07:00
Michal Kazior
bbe09bbcf4 cfg80211: update dfs_state_entered upon dfs_state change
The timestamp wasn't updated after transitioning
to the NL80211_DFS_USABLE state after NOP time.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-21 14:50:27 +02:00
Michal Kazior
c532a58b0f cfg80211: fix DFS channel recovery timeout
The timeout was not properly converted from msecs
to jiffies. As a result channel transition to
NL80211_DFS_USABLE was delayed depending on
CONFIG_HZ configuration, e.g. HZ=100 would delay
the NOP from 30 minutes to 300 minutes.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-21 14:50:27 +02:00
Johannes Berg
79845c662e cfg80211: fix scheduled scan pointer access
Since rdev->sched_scan_req is dereferenced outside the
lock protecting it, this might be done at the wrong
time, causing crashes. Move the dereference to where
it should be - inside the RTNL locked section.

Cc: stable@vger.kernel.org [3.8+]
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-21 11:59:15 +02:00
Steffen Klassert
4d53eff48b xfrm: Don't queue retransmitted packets if the original is still on the host
It does not make sense to queue retransmitted packets if the
original packet is still in some queue of this host. So add
a check to xdst_queue_output() and drop the packet if the
original packet is not yet sent.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
2013-10-21 09:45:20 +02:00
Eric Dumazet
5cf4eb54c2 xfrm: use vmalloc_node() for percpu scratches
scratches are per cpu, we can use vmalloc_node() for proper
NUMA affinity.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-21 09:38:24 +02:00
Johan Hedberg
9a43e25fff Bluetooth: Update Set Discoverable to support LE
This patch updates the Set Discoverable management command to also be
applicable for LE. In particular this affects the advertising flags
where we can say "general discoverable" or "limited discoverable".

Since the device flags may not be up-to-date when the advertising data
is written this patch introduces a get_adv_discov_flags() helper
function which also looks at any pending mgmt commands (a pending
set_discoverable would be the exception when the flags are not yet
correct).

The patch also adds HCI_DISCOVERABLE flag clearing to the
mgmt_discoverable_timeout function, since the code was previously
relying on the mgmt_discoverable callback to handle this, which is only
called for the BR/EDR-only HCI_Write_Scan_Enable command.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20 09:05:41 -07:00
Johan Hedberg
b456f87cb0 Bluetooth: Move HCI_LIMITED_DISCOVERABLE changes to a general place
We'll soon be introducing also LE support for the Set Discoverable
management command, so move the HCI_LIMITED_DISCOVERABLE flag clearing
and setting out from the if-branch that is only used for a BR/EDR
specific HCI command.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20 09:05:41 -07:00
Johan Hedberg
4b580614e1 Bluetooth: Fix sending write_scan_enable when BR/EDR is disabled
We should only send the HCI_Write_Scan_Enable command from
mgmt_set_powered_failed() when BR/EDR support is enabled. This is
particularly important when the discoverable setting is also tied to LE.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20 09:05:40 -07:00
Johan Hedberg
eb2a8d202f Bluetooth: Move mgmt_pending_find to avoid forward declarations
We will soon need this function for updating the advertising data, so
move it higher up in mgmt.c to avoid a forward declaration.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20 09:05:40 -07:00
Johan Hedberg
a81070ba37 Bluetooth: Fix updating settings when there are no HCI commands to send
It is possible that the Set Connectable management command doesn't cause
any HCI commands to send (such as when BR/EDR is disabled). We can't
just send a response to user space in this case but must also update the
necessary device flags and settings. This patch fixes the issue by using
the recently introduced set_connectable_update_settings function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20 09:05:40 -07:00
Johan Hedberg
e8ba3a1f08 Bluetooth: Refactor set_connectable settings update to separate function
We will need to directly update the device flags and notify user space
of the new settings not just when we're powered off but also if it turns
out that there are no HCI commands to send (which can happen in
particular when BR/EDR is disabled). Since this is a considerable amount
of code, refactor it to a separate function so it can be reused for the
"no HCI commands to send" case.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20 09:05:40 -07:00
Johan Hedberg
f87ea1dabb Bluetooth: Add missing check for BREDR_ENABLED flag in update_class()
We shouldn't be sending the HCI_Write_Class_Of_Device command when
BR/EDR is disabled since this is a BR/EDR-only command.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20 09:05:40 -07:00
Johan Hedberg
10994ce6e6 Bluetooth: Check for flag instead of features in update_adv_data()
It's better to check for the device flag instead of device features so
that we avoid unnecessary HCI commands when the feature is supported but
disabled (i.e. the flag is unset).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20 09:05:40 -07:00
Johan Hedberg
7751ef1b31 Bluetooth: Check for flag instead of features in update_scan_rsp_data()
It's better to check for the device flag instead of device features so
that we avoid unnecessary HCI commands when the feature is supported but
disabled (i.e. the flag is unset).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20 09:05:40 -07:00
David S. Miller
5bf47256f5 Included changed:
- email addresses update in documentation, source files and MAINTAINERS
 - make the TT component distinguish non-mesh clients based on the VLAN they
   belong to
 - improve all the internal components to properly work on a per-VLAN basis
   (enabled by the new TT-VLAN feature)
 - enhance the sysfs interface in order to provide behaviour switches on a
   per-VLAN basis (enabled by the new TT-VLAN feature)
 - improve TT lock mechanism
 - improve unicast transmission APIs
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJSYvnjAAoJEADl0hg6qKeOGFAQAK+P3XkProP+MWhgpQzWDc+F
 TCVvi3eQ9NKVzxVnglRTznEVqXePVArK9oWb39KbCeguoqsuo7T8I+oNK06qPCH6
 1aodkBqJLq0OT1EWIxMo+1eOHCevRRqBjS1Jh0DMxuugMsKZZu3/DrHK59ay/4y7
 8wRb8CqQrKpILsh43cKRm9SPNJj0nmFPIwoWmgu++ffPyfIPMnBHSowMEgxqJ3h4
 Vp4adjJQU2D3qa1Vln99MYzdJaUhRDVDjxdAroCbuk6M1bl9o88UjhFxRvZJN8JN
 HdxiMN1hvlDJ7OsiBGw42RROnibyqkui8BZl5hP85sjbKSSU9lCqMJ1XWW+gVNhx
 sKA7LIm7NPNW9Ysvgd3FhjX/cg18WgjC2HHU26uMhYmGrGUfP8eBw55XidabApgb
 TpGhKjFxhYqfGnPhAtarsqLYfxWh6vbb1G6cyaC5jJ4baIa5YKqt8tejHCNiFJLI
 WrVnmi0TJfGjdoULfUdkBOx/pI6zyZ3PWPISbIDUIslQXrnEzKUj37VbN3N0Qlj1
 QcVcC+iVd3/gJ/dnvKzmeGjsm2nKK5eEwewRtNuPkQSaM13/dN2CUQ4/+/6BSY1D
 wODn+Wc5zCoi8sxvVb7TcT+NLO27QkH0REJh4W9KxJx9NSws0BfiVBcTKPFCra+x
 gMsgNYNdgoCTLQBNj8KK
 =sEEg
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
this is another batch intended for net-next/linux-3.13.

This pull request is a bit bigger than usual, but 6 patches are very small
(three of them are about email updates)..

Patch 1 is fixing a previous merge conflict resolution that went wrong
(I realised that only now while checking other patches..).
Patches from 2 to 4 that are updating our emails in all the proper files
(Documentation/, headers and MAINTAINERS).

Patches 5, 6 and 7 are bringing a big improvement to the TranslationTable
component: it is now able to group non-mesh clients based on the VLAN they
belong to. In this way a lot a new enhancements are now possible thanks to the
fact that each batman-adv behaviour can be applied on a per VLAN basis.

And, of course, in patches from 8 to 12 you have some of the enhancements I was
talking about:
- make the batman-Gateway selection VLAN dependent
- make DAT (Distributed ARP Table) group ARP entries on a VLAN basis (this
  allows DAT to work even when the admin decided to use the same IP subnet on
  different VLANs)
- make the AP-Isolation behaviour switchable on each VLAN independently
- export VLAN specific attributes via sysfs. Switches like the AP-Isolation are
  now exported once per VLAN (backward compatibility of the sysfs interface has
  been preserved)

Patches 13 and 14 are small code cleanups.
Patch 15 is a minor improvement in the TT locking mechanism.

Patches 16 and 17 are other enhancements to the TT component. Those allow a
node to parse a "non-mesh client announcement message" and accept only those
TT entries belonging to certain VLANs.

Patch 18 exploits this parse&accept mechanism to make the Bridge Loop Avoidance
component reject only TT entries connected to the VLAN where it is operating.
Previous to this change, BLA was rejecting all the entries coming from any other
Backbone node, regardless of the VLAN (for more details about how the Bridge
Loop Avoidance works please check [1]).

[1] http://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance-II
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:52:42 -04:00
Hannes Frederic Sowa
e34c9a6997 net: switch net_secret key generation to net_get_random_once
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:36 -04:00
Hannes Frederic Sowa
222e83d2e0 tcp: switch tcp_fastopen key generation to net_get_random_once
Changed key initialization of tcp_fastopen cookies to net_get_random_once.

If the user sets a custom key net_get_random_once must be called at
least once to ensure we don't overwrite the user provided key when the
first cookie is generated later on.

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Hannes Frederic Sowa
1bbdceef1e inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once
Initialize the ehash and ipv6_hash_secrets with net_get_random_once.

Each compilation unit gets its own secret now:
  ipv4/inet_hashtables.o
  ipv4/udp.o
  ipv6/inet6_hashtables.o
  ipv6/udp.o
  rds/connection.o

The functions still get inlined into the hashing functions. In the fast
path we have at most two (needed in ipv6) if (unlikely(...)).

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Hannes Frederic Sowa
b23a002fc6 inet: split syncookie keys for ipv4 and ipv6 and initialize with net_get_random_once
This patch splits the secret key for syncookies for ipv4 and ipv6 and
initializes them with net_get_random_once. This change was the reason I
did this series. I think the initialization of the syncookie_secret is
way to early.

Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Hannes Frederic Sowa
a48e42920f net: introduce new macro net_get_random_once
net_get_random_once is a new macro which handles the initialization
of secret keys. It is possible to call it in the fast path. Only the
initialization depends on the spinlock and is rather slow. Otherwise
it should get used just before the key is used to delay the entropy
extration as late as possible to get better randomness. It returns true
if the key got initialized.

The usage of static_keys for net_get_random_once is a bit uncommon so
it needs some further explanation why this actually works:

=== In the simple non-HAVE_JUMP_LABEL case we actually have ===
no constrains to use static_key_(true|false) on keys initialized with
STATIC_KEY_INIT_(FALSE|TRUE). So this path just expands in favor of
the likely case that the initialization is already done. The key is
initialized like this:

___done_key = { .enabled = ATOMIC_INIT(0) }

The check

                if (!static_key_true(&___done_key))                     \

expands into (pseudo code)

                if (!likely(___done_key > 0))

, so we take the fast path as soon as ___done_key is increased from the
helper function.

=== If HAVE_JUMP_LABELs are available this depends ===
on patching of jumps into the prepared NOPs, which is done in
jump_label_init at boot-up time (from start_kernel). It is forbidden
and dangerous to use net_get_random_once in functions which are called
before that!

At compilation time NOPs are generated at the call sites of
net_get_random_once. E.g. net/ipv6/inet6_hashtable.c:inet6_ehashfn (we
need to call net_get_random_once two times in inet6_ehashfn, so two NOPs):

      71:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
      76:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Both will be patched to the actual jumps to the end of the function to
call __net_get_random_once at boot time as explained above.

arch_static_branch is optimized and inlined for false as return value and
actually also returns false in case the NOP is placed in the instruction
stream. So in the fast case we get a "return false". But because we
initialize ___done_key with (enabled != (entries & 1)) this call-site
will get patched up at boot thus returning true. The final check looks
like this:

                if (!static_key_true(&___done_key))                     \
                        ___ret = __net_get_random_once(buf,             \

expands to

                if (!!static_key_false(&___done_key))                     \
                        ___ret = __net_get_random_once(buf,             \

So we get true at boot time and as soon as static_key_slow_inc is called
on the key it will invert the logic and return false for the fast path.
static_key_slow_inc will change the branch because it got initialized
with .enabled == 0. After static_key_slow_inc is called on the key the
branch is replaced with a nop again.

=== Misc: ===
The helper defers the increment into a workqueue so we don't
have problems calling this code from atomic sections. A seperate boolean
(___done) guards the case where we enter net_get_random_once again before
the increment happend.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Hannes Frederic Sowa
b50026b5ac ipv6: split inet6_ehashfn to hash functions per compilation unit
This patch splits the inet6_ehashfn into separate ones in
ipv6/inet6_hashtables.o and ipv6/udp.o to ease the introduction of
seperate secrets keys later.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:34 -04:00
Hannes Frederic Sowa
65cd8033ff ipv4: split inet_ehashfn to hash functions per compilation unit
This duplicates a bit of code but let's us easily introduce
separate secret keys later. The separate compilation units are
ipv4/inet_hashtabbles.o, ipv4/udp.o and rds/connection.o.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:34 -04:00
Eric Dumazet
cb32f511a7 ipip: add GSO/TSO support
Now inet_gso_segment() is stackable, its relatively easy to
implement GSO/TSO support for IPIP

Performance results, when segmentation is done after tunnel
device (as no NIC is yet enabled for TSO IPIP support) :

Before patch :

lpq83:~# ./netperf -H 7.7.9.84 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      3357.88   5.09     3.70     2.983   2.167

After patch :

lpq83:~# ./netperf -H 7.7.9.84 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      7710.19   4.52     6.62     1.152   1.687

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:36:19 -04:00
Eric Dumazet
3347c96029 ipv4: gso: make inet_gso_segment() stackable
In order to support GSO on IPIP, we need to make
inet_gso_segment() stackable.

It should not assume network header starts right after mac
header.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:36:18 -04:00
Eric Dumazet
2d26f0a3c0 ipv4: generalize gre_handle_offloads
This patch makes gre_handle_offloads() more generic
and rename it to iptunnel_handle_offloads()

This will be used to add GSO/TSO support to IPIP tunnels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:36:18 -04:00
Eric Dumazet
030737bcc3 net: generalize skb_segment()
While implementing GSO/TSO support for IPIP, I found skb_segment()
was assuming network header was immediately following mac header.

Its not really true in the case inet_gso_segment() is stacked :
By the time tcp_gso_segment() is called, network header points
to the inner IP header.

Let's instead assume nothing and pick the current offsets found in
original skb, we have skb_headers_offset_update() helper for that.

Also move the csum_start update inside skb_headers_offset_update()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:36:18 -04:00
Jiri Pirko
e93b7d748b ip_output: do skb ufo init for peeked non ufo skb as well
Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

introduced by:
commit e89e9cf539 "[IPv4/IPv6]: UFO Scatter-gather approach"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:20:52 -04:00
Jiri Pirko
c547dbf55d ip6_output: do skb ufo init for peeked non ufo skb as well
Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

introduced by:
commit e89e9cf539 "[IPv4/IPv6]: UFO Scatter-gather approach"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:20:52 -04:00
Jiri Pirko
e36d3ff911 udp6: respect IPV6_DONTFRAG sockopt in case there are pending frames
if up->pending != 0 dontfrag is left with default value -1. That
causes that application that do:
sendto len>mtu flag MSG_MORE
sendto len>mtu flag 0
will receive EMSGSIZE errno as the result of the second sendto.

This patch fixes it by respecting IPV6_DONTFRAG socket option.

introduced by:
commit 4b340ae20d "IPv6: Complete IPV6_DONTFRAG support"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:20:52 -04:00
Eric Dumazet
b917eb155c ipv6: gso: remove redundant locking
ipv6_gso_send_check() and ipv6_gso_segment() are called by
skb_mac_gso_segment() under rcu lock, no need to use
rcu_read_lock() / rcu_read_unlock()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:14:14 -04:00
Joe Perches
c1b1203d65 net: misc: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:12:11 -04:00
Joe Perches
7e58487b8c net: ipv4/ipv6: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:12:11 -04:00
Joe Perches
a402a5aa9b net: dccp: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:12:11 -04:00
Joe Perches
348662a142 net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:12:11 -04:00
Eric Dumazet
47d27aad44 ipv4: gso: send_check() & segment() cleanups
inet_gso_segment() and inet_gso_send_check() are called by
skb_mac_gso_segment() under rcu lock, no need to use
rcu_read_lock() / rcu_read_unlock()

Avoid calling ip_hdr() twice per function.

We can use ip_send_check() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:11:56 -04:00
Daniel Borkmann
90c6bd34f8 net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race
In the case of credentials passing in unix stream sockets (dgram
sockets seem not affected), we get a rather sparse race after
commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").

We have a stream server on receiver side that requests credential
passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
on each spawned/accepted socket on server side to 1 first (as it's
not inherited), it can happen that in the time between accept() and
setsockopt() we get interrupted, the sender is being scheduled and
continues with passing data to our receiver. At that time SO_PASSCRED
is neither set on sender nor receiver side, hence in cmsg's
SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
(== overflow{u,g}id) instead of what we actually would like to see.

On the sender side, here nc -U, the tests in maybe_add_creds()
invoked through unix_stream_sendmsg() would fail, as at that exact
time, as mentioned, the sender has neither SO_PASSCRED on his side
nor sees it on the server side, and we have a valid 'other' socket
in place. Thus, sender believes it would just look like a normal
connection, not needing/requesting SO_PASSCRED at that time.

As reverting 16e5726 would not be an option due to the significant
performance regression reported when having creds always passed,
one way/trade-off to prevent that would be to set SO_PASSCRED on
the listener socket and allow inheriting these flags to the spawned
socket on server side in accept(). It seems also logical to do so
if we'd tell the listener socket to pass those flags onwards, and
would fix the race.

Before, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
        msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
        cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
        msg_flags=0}, 0) = 5

After, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
        msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
        cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
        msg_flags=0}, 0) = 5

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 18:50:15 -04:00
Antonio Quartulli
cfd4f75701 batman-adv: make the backbone gw check VLAN specific
The backbone gw check has to be VLAN specific so that code
using it can specify VID where the check has to be done.

In the TT code, the check has been moved into the
tt_global_add() function so that it can be performed on a
per-entry basis instead of ignoring all the TT data received
from another backbone node. Only TT global entries belonging
to the VLAN where the backbone node is connected to are
skipped.
All the other spots where the TT code was checking whether a
node is a backbone have been removed.

Moreover, batadv_bla_is_backbone_gw_orig() now returns bool
since it used to return only 1 or 0.

Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 23:25:38 +02:00
Antonio Quartulli
95fb130d68 batman-adv: make the TT global purge routine VLAN specific
Instead of unconditionally removing all the TT entries
served by a given originator, make tt_global_orig_del()
remove only entries matching a given VLAN identifier
provided as argument.

If such argument is negative all the global entries
served by the originator are removed.

This change is used into the BLA code to purge entries
served by a newly discovered Backbone node, but limiting
the operation only to those connected to the VLAN where the
backbone has been discovered.

Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 23:25:37 +02:00
Antonio Quartulli
7ea7b4a142 batman-adv: make the TT CRC logic VLAN specific
This change allows nodes to handle the TT table on a
per-VLAN basis. This is needed because nodes may have to
store only some of the global entries advertised by another
node.

In this scenario such nodes would re-create only a partial
global table and would not be able to compute a correct CRC
anymore.

This patch splits the logic and introduces one CRC per VLAN.
In this way a node fetching only some entries belonging to
some VLANs is still able to compute the needed CRCs and
still check the table correctness.

With this patch the shape of the TVLV-TT is changed too
because now a node needs to advertise all the CRCs of all
the VLANs that it is wired to.

The debug output of the local Translation Table now shows
the CRC along with each entry since there is not a common
value for the entire table anymore.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 23:25:12 +02:00
Greg Kroah-Hartman
a7204d72db Merge 3.12-rc6 into driver-core-next
We want these fixes here too.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-19 13:05:38 -07:00
Marcel Holtmann
2be48b6542 Bluetooth: Fix minor coding style issue in hci_core.c
A few variable assignments ended up with missing a space between the
variable and equal sign.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 20:46:27 +03:00
Marcel Holtmann
58f01aa93f Bluetooth: Fix UUID values in debugfs file
The uuid entry struct is used for the UUID byte stream. That is
actually the wrong value. The correct value is uuid->uuid.

Besides fixing this up, use the %pUb modifier to print the UUID
string. However since the UUID is stored in big endian with
reversed byte order, change the byte order before printing.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 20:46:12 +03:00
Marcel Holtmann
4b4148e9ac Bluetooth: Add support for setting DUT mode
The Device Under Test (DUT) mode is useful for doing certification
testing and so expose this as debugfs option.

This mode is actually special since you can only enter it. Restoring
normal operation means that a HCI Reset is required. The current mode
value gets tracked as a new device flag and when disabling it, the
correct command to reset the controller is sent.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 18:56:56 +03:00
Marcel Holtmann
4e70c7e71c Bluetooth: Expose debugfs settings for LE connection interval
For testing purposes expose the default LE connection interval values
via debugfs.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 18:56:54 +03:00
Marcel Holtmann
06f5b7785a Bluetooth: Add support for setting SSP debug mode
Enabling and disabling SSP debug mode is useful for development. This
adds a debugfs entry that allows to configure the SSP debug mode.

On purpose this has been implemented as debugfs entry and not a public
API since it is really only useful during testing and development.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 18:56:52 +03:00
Antonio Quartulli
a70a9aa990 batman-adv: lock around TT operations to avoid sending inconsistent data
A TT response may be prepared and sent while the local or
global translation table is getting updated.

The worst case is when one of the tables is accessed after
its content has been recently updated but the metadata
(TTVN/CRC) has not yet. In this case the reader will get a
table content which does not match the TTVN/CRC.
This will lead to an inconsistent state and so to a TT
recovery.

To avoid entering this situation, put a lock around those TT
operations recomputing the metadata and around the TT
Response creation (the latter is the only reader that
accesses the metadata together with the table).

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 17:31:56 +02:00
Antonio Quartulli
e75de4fa41 batman-adv: remove bogus comment
this comment refers to the old batmand codebase and does
not make sense anymore. Remove it

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 17:31:55 +02:00
Linus Lüssing
e300d31466 batman-adv: refine API calls for unicast transmissions of SKBs
With this patch the functions batadv_send_skb_unicast() and
batadv_send_skb_unicast_4addr() are further refined into
batadv_send_skb_via_tt(), batadv_send_skb_via_tt_4addr() and
batadv_send_skb_via_gw(). This way we avoid any "guessing" about where to send
a packet in the unicast forwarding methods and let the callers decide.

This is going to be useful for the upcoming multicast related patches in
particular.

Further, the return values were polished a little to use the more
appropriate NET_XMIT_* defines.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-19 17:31:54 +02:00
Antonio Quartulli
b8cbd81d09 batman-adv: make the AP isolation attribute VLAN specific
AP isolation has to be enabled on one VLAN interface only.
This patch moves the AP isolation attribute to the per-vlan
interface attribute set, enabling it to have a different
value depending on the selected vlan.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 17:28:47 +02:00
Antonio Quartulli
90f4435da4 batman-adv: add sysfs framework for VLAN
Each VLAN can now have its own set of attributes which are
exported through a new subfolder in the sysfs tree.
Each VLAN created on top of a soft_iface will have its own
subfolder.

The subfolder is named "vlan%VID" and it is created inside
the "mesh" sysfs folder belonging to batman-adv.

Attributes corresponding to the untagged LAN are stored in
the root sysfs folder as before.

This patch also creates all the needed macros and data
structures to easily handle new VLAN spacific attributes.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 17:28:42 +02:00
Antonio Quartulli
5d2c05b213 batman-adv: add per VLAN interface attribute framework
Since batman-adv is now fully VLAN-aware, a proper framework
able to handle per-vlan-interface attributes is needed.

Those attributes will affect the associated VLAN interface
only, rather than the real soft_iface (which would result
in every vlan interface having the same attribute
configuration).

To make the code simpler and easier to extend, attributes
associated to the standalone soft_iface are now treated
like belonging to yet another vlan having a special vid.
This vid is different from the others because it is made up
by all zeros and the VLAN_HAS_TAG bit is not set.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 17:28:08 +02:00
Antonio Quartulli
be1db4f661 batman-adv: make the Distributed ARP Table vlan aware
The same IP subnet can be used on different VLANs, therefore
DAT has to differentiate whether the IP to resolve belongs
to one or the other virtual LAN.
To accomplish this task DAT has to deal with the VLAN tag
and store it together with each ARP entry.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 17:28:07 +02:00
Antonio Quartulli
bbb877ed77 batman-adv: make the GW module correctly talk to the new VLAN-TT
The gateway code is now adapted in order to correctly
interact with the Translation Table component by using the
vlan ID

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 17:27:11 +02:00
Marcel Holtmann
3497ac84bd Bluetooth: Remove interval parameter from HCI connection
The conn->interval parameter of HCI connections is not used at all
and so just remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:50:05 +03:00
Marcel Holtmann
cfbb2b5b91 Bluetooth: Add LE features to debugfs if available
For LE capable controllers at the special LE features page to the
debugfs list with all the other features pages.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:49:54 +03:00
Marcel Holtmann
12c269d7e3 Bluetooth: Expose setting if debug keys are used or not
The system can be figured to accept and use debug keys. Expose this
value in debugfs for debugging purposes.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:30:44 +03:00
Marcel Holtmann
922021854b Bluetooth: Expose debugfs entry read/write own address type
For some testing it is important to know the current own addres type,
but also be able to change it. The change is lost over powery cycles
and only intended for debugging.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:29:03 +03:00
Marcel Holtmann
79830f66e3 Bluetooth: Select the own address type during initial setup phase
The own address type is based on the fact if the controller has
a public address or not. This means that this detail can be just
configured once during setup phase.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:28:06 +03:00
Marcel Holtmann
8f8625cd80 Bluetooth: Expose current list of long term keys via debugfs
For debugging purposes expose the current list of long term keys
via debugfs. This file is read-only and limited to root access.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:26:20 +03:00
Marcel Holtmann
d0f729b8c1 Bluetooth: Expose white list size information in debugfs
Knowing the white list size information is important for
debugging. So export it via debugfs.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:26:10 +03:00
Marcel Holtmann
e132f7f6a2 Bluetooth: Remove bus attribute in favor of hierarchy
The bus information are exposed in the actual hierarchy and should
not be exposed as attribute.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:25:55 +03:00
Marcel Holtmann
02d08d15e0 Bluetooth: Expose current list of link keys via debugfs
For debugging purposes expose the current list of link keys via
debugfs. This file is read-only and limited to root access.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:24:50 +03:00
Marcel Holtmann
babdbb3c13 Bluetooth: Move export of class of device information into hci_core.c
The class of device debugfs information should be directly exported
from hci_core.c and so move them over there.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:24:27 +03:00
Marcel Holtmann
0d5551f5e4 Bluetooth: Store local version information only during setup phase
The local version information from the controller can not change
since they are static. So store them only once during setup
phase and not bother overwriting them every time this command
gets executed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:24:09 +03:00
Marcel Holtmann
ceeb3bc0f1 Bluetooth: Move manufacturer, hci_ver and hci_rev into hci_core.c
Move the debugfs entries for manufacturer, hci_ver and hci_rev into
hci_core.c and use the new helpers for static entries that will not
change at runtime. Once passed the setup procedure, they will stay
fixed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:23:51 +03:00
Marcel Holtmann
f96bc0a7f4 Bluetooth: Remove debug entry for connection features
The debug entry for connection features is incomplete and also does
not work with AMP controllers and physical links. So just remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:23:34 +03:00
Marcel Holtmann
57af75a8cf Bluetooth: Add workaround for buggy max_page features page value
Some controllers list the max_page value from the extended features
response as 0 when SSP has not yet been enabled. To workaround this
issue, force the max_page value to 1 when SSP support has been
detected.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:22:52 +03:00
Marcel Holtmann
dfb826a8b0 Bluetooth: Move HCI device features into hci_core.c
Move the handling of HCI device features debugfs into hci_core.c and
also extend it with handling of multiple feature pages.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-19 16:22:30 +03:00
Antonio Quartulli
1605278901 batman-adv: print the VID together with the TT entries
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19 15:11:25 +02:00