linux-hardened/net
Pavel Emelyanov 23fe18669e [NETNS]: Fix race between put_net() and netlink_kernel_create().
The comment about a "race free view of the set of network
namespaces" was a bit hasty. Look (there can even be only
one CPU, as discovered by Alexey Dobriyan and Denis Lunev):

put_net()
  if (atomic_dec_and_test(&net->refcnt))
    /* true */
      __put_net(net);
        queue_work(...);

/*
 * note: the net now has refcnt 0, but still in
 * the global list of net namespaces
 */

== re-schedule ==

register_pernet_subsys(&some_ops);
  register_pernet_operations(&some_ops);
    some_ops.init(net);
      /*
       * we call netlink_kernel_create() here
       * in some places
       */
      netlink_kernel_create();
         sk_alloc();
            get_net(net); /* refcnt = 1 */
         /*
          * now we drop the net refcount not to
          * block the net namespace exit in the
          * future (or this can be done on the
          * error path)
          */
         put_net(sk->sk_net);
             if (atomic_dec_and_test(&...))
                   /*
                    * true. BOOOM! The net is
                    * scheduled for release twice
                    */
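
The double release can be reproduced with a minimal userspace model of the refcount sequence traced above. This is a sketch, not kernel code. `struct model_net`, `model_get_net()`, `model_put_net()` and `model_init_callback_old()` are hypothetical stand-ins for `struct net`, `get_net()`, `put_net()` and an init callback that ends up in `netlink_kernel_create()`. A plain counter replaces `queue_work()`, and C11 `<stdatomic.h>` is assumed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of struct net: a refcount plus a counter of how
 * many times the namespace was queued for release. */
struct model_net {
    atomic_int refcnt;
    int cleanup_queued;
};

static void model_net_init(struct model_net *net, int refs)
{
    atomic_init(&net->refcnt, refs);
    net->cleanup_queued = 0;
}

/* Stand-in for get_net(): take one reference. */
static void model_get_net(struct model_net *net)
{
    atomic_fetch_add(&net->refcnt, 1);
}

/* Stand-in for put_net(): whoever drops the last reference schedules
 * the release (the counter replaces queue_work()). */
static void model_put_net(struct model_net *net)
{
    int old = atomic_fetch_sub(&net->refcnt, 1);

    assert(old > 0);          /* model-only sanity check */
    if (old == 1)             /* the count just reached zero */
        net->cleanup_queued++;
}

/* The problematic pattern: an init callback running on a net whose
 * count is already zero gets and puts it, so it hits zero twice. */
static void model_init_callback_old(struct model_net *net)
{
    model_get_net(net);       /* sk_alloc(): 0 -> 1 */
    model_put_net(net);       /* 1 -> 0: release queued again */
}
```

Dropping the last reference and then running the callback leaves `cleanup_queued` at 2. In this model, that is the "scheduled for release twice" case from the trace.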

While thinking about this problem, I decided that getting and
putting the net in an init callback is wrong. If some init
callback needs a refcount-less reference to the struct net,
_it_ has to be careful itself, rather than relying on the
infrastructure to handle this correctly.

In the case of netlink_kernel_create(), the problem is that
sk_alloc() takes a reference on the given namespace, and
passing down the information that we don't want it to do so
is too heavy.

Instead, I propose to create the socket inside the init_net
namespace and then re-attach it to the desired one right
after the socket is created.
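
The proposed re-attachment can be sketched with the same kind of userspace model. It is an illustration of the idea under stated assumptions, not the kernel patch itself. `struct model_sock`, `model_sk_alloc()` and `model_kernel_sock_create()` are hypothetical. `model_sk_alloc()` assumes, as the text says of `sk_alloc()`, that allocation always pins the namespace it is given. The reference is therefore taken on init_net, which never drops to zero, and is then released. After that, the socket points at the target net without holding a reference on it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical models of struct net and struct sock. */
struct model_net {
    atomic_int refcnt;
    int cleanup_queued;          /* times the release would be queued */
};

struct model_sock {
    struct model_net *sk_net;
};

static void model_net_init(struct model_net *net, int refs)
{
    atomic_init(&net->refcnt, refs);
    net->cleanup_queued = 0;
}

static void model_get_net(struct model_net *net)
{
    atomic_fetch_add(&net->refcnt, 1);
}

static void model_put_net(struct model_net *net)
{
    int old = atomic_fetch_sub(&net->refcnt, 1);

    assert(old > 0);             /* model-only sanity check */
    if (old == 1)
        net->cleanup_queued++;
}

/* Stand-in for sk_alloc(): always takes a reference on the given net. */
static void model_sk_alloc(struct model_sock *sk, struct model_net *net)
{
    model_get_net(net);
    sk->sk_net = net;
}

/* Allocate against init_net, drop that reference, then re-attach the
 * socket to the target net without touching the target's refcount. */
static void model_kernel_sock_create(struct model_sock *sk,
                                     struct model_net *init_net,
                                     struct model_net *net)
{
    model_sk_alloc(sk, init_net);   /* pins init_net, not net */
    model_put_net(sk->sk_net);      /* init_net stays above zero */
    sk->sk_net = net;               /* refcount-less back-pointer */
}
```

Even when the target net is already at zero, the model never moves its count, so no second release is queued. init_net ends with the same count it started with.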

After doing this, we also have to be careful on error paths
not to drop a reference on a namespace we never took one on.
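
One way to respect this on teardown can be sketched with a hypothetical userspace model. It does not reproduce the actual kernel error path. In the model, the generic free routine `model_sk_free()` always drops a reference on `sk_net`, as if the socket owned one. Before a refcount-less socket is handed to it, `model_kernel_sock_release()` points the socket back at init_net and takes a fresh reference there. The final put therefore balances against init_net rather than the net the socket never pinned. All names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical models of struct net and struct sock. */
struct model_net {
    atomic_int refcnt;
    int cleanup_queued;          /* times the release would be queued */
};

struct model_sock {
    struct model_net *sk_net;
};

static void model_net_init(struct model_net *net, int refs)
{
    atomic_init(&net->refcnt, refs);
    net->cleanup_queued = 0;
}

static void model_get_net(struct model_net *net)
{
    atomic_fetch_add(&net->refcnt, 1);
}

static void model_put_net(struct model_net *net)
{
    int old = atomic_fetch_sub(&net->refcnt, 1);

    assert(old > 0);             /* model-only sanity check */
    if (old == 1)
        net->cleanup_queued++;
}

/* Generic teardown: assumes the socket owns a reference on sk_net. */
static void model_sk_free(struct model_sock *sk)
{
    model_put_net(sk->sk_net);
    sk->sk_net = NULL;
}

/* Teardown for a socket that holds no reference on its net: move it
 * back to init_net with a real reference so the generic put is balanced. */
static void model_kernel_sock_release(struct model_sock *sk,
                                      struct model_net *init_net)
{
    model_get_net(init_net);
    sk->sk_net = init_net;
    model_sk_free(sk);
}
```

Calling `model_sk_free()` directly on such a socket would instead decrement a net it never referenced. In this model, that is the unbalanced put the paragraph warns about.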

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:22 -08:00
9p [NET] 9p: kill dead static inline buf_put_string 2008-01-31 19:27:05 -08:00
802 [TR]: Use ctl paths to register net/token-ring/ table 2008-01-28 14:56:28 -08:00
8021q [VLAN]: sparse warning fix 2008-01-28 15:10:17 -08:00
appletalk [APPLETALK]: Annotations to clear sparse warnings 2008-01-28 15:02:43 -08:00
atm [NETNS]: Add namespace parameter to ip_route_output_key. 2008-01-28 15:11:07 -08:00
ax25 [AX25]: Kill ax25_bind() user triggable printk. 2008-01-31 19:27:06 -08:00
bluetooth [BLUETOOTH]: Add conn add/del workqueues to avoid connection fail. 2008-01-31 19:27:12 -08:00
bridge [NETNS]: Add namespace parameter to ip_route_output_key. 2008-01-28 15:11:07 -08:00
can [CAN]: Add virtual CAN netdevice driver 2008-01-28 14:54:12 -08:00
core [NET]: Introducing socket mark socket option. 2008-01-31 19:27:19 -08:00
dccp [NETNS]: Add namespace parameter to ip_route_output_flow. 2008-01-28 15:11:06 -08:00
decnet [NETNS]: FIB rules API cleanup. 2008-01-28 15:08:13 -08:00
econet [NET]: Convert init_timer into setup_timer 2008-01-28 14:53:35 -08:00
ethernet [ETH]: Combine format_addr() with print_mac(). 2008-01-28 15:00:05 -08:00
ieee80211 ieee80211: beacon->capability is little-endian 2008-01-28 15:08:48 -08:00
ipv4 [XFRM]: constify 'struct xfrm_type' 2008-01-31 19:27:20 -08:00
ipv6 [XFRM]: constify 'struct xfrm_type' 2008-01-31 19:27:20 -08:00
ipx [NET]: Simple ctl_table to ctl_path conversions. 2008-01-28 15:01:07 -08:00
irda [IrDA]: LMP discovery timer not started by default 2008-01-28 15:10:54 -08:00
iucv [IUCV]: use LIST_HEAD instead of LIST_HEAD_INIT 2008-01-28 14:56:54 -08:00
key [XFRM] xfrm_policy_destroy: Rename and relative fixes. 2008-01-28 15:00:46 -08:00
lapb [LAPB] net/lapb/lapb_iface.c: use LIST_HEAD instead of LIST_HEAD_INIT 2008-01-28 14:56:52 -08:00
llc [NET]: Simple ctl_table to ctl_path conversions. 2008-01-28 15:01:07 -08:00
mac80211 mac80211: fixing null qos data frames check for reordering buffer 2008-01-31 19:26:38 -08:00
netfilter SELinux: Enable dynamic enable/disable of the network access checks 2008-01-30 08:17:26 +11:00
netlabel NetLabel: Add auditing to the static labeling mechanism 2008-01-30 08:17:29 +11:00
netlink [NETNS]: Fix race between put_net() and netlink_kernel_create(). 2008-01-31 19:27:22 -08:00
netrom [NET]: Simple ctl_table to ctl_path conversions. 2008-01-28 15:01:07 -08:00
packet [PACKET]: Fix sparse warnings in af_packet.c 2008-01-28 15:00:48 -08:00
rfkill rfkill: add the WiMAX radio type 2008-01-31 19:26:46 -08:00
rose [ROSE]: Supress sparse warnings 2008-01-28 15:02:44 -08:00
rxrpc [AF_RXRPC]: constify function pointer tables 2008-01-31 19:27:18 -08:00
sched [NET_SCHED]: Use nla_policy for attribute validation in ematches 2008-01-28 15:11:24 -08:00
sctp [SCTP]: Fix miss of report unrecognized HMAC Algorithm parameter 2008-01-31 19:27:09 -08:00
sunrpc Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-02-01 11:45:47 +11:00
tipc [TIPC]: Use tipc_port_unlock 2008-01-28 15:01:05 -08:00
unix [NET]: Add some acquires/releases sparse annotations. 2008-01-28 15:00:31 -08:00
wanrouter [NET]: Make /proc/net per network namespace 2007-10-10 16:49:06 -07:00
wireless WEXT: remove unused variable 2008-01-28 15:10:48 -08:00
x25 [AX25]: Beautify x25_init() version printk. 2008-01-31 19:27:06 -08:00
xfrm [XFRM]: constify 'struct xfrm_type' 2008-01-31 19:27:20 -08:00
compat.c [NETFILTER]: ip6_tables: add compat support 2008-01-28 14:58:36 -08:00
Kconfig [NETFILTER]: Add CONFIG_NETFILTER_ADVANCED option 2008-01-28 14:59:12 -08:00
Makefile [CAN]: Add PF_CAN core module 2008-01-28 14:54:10 -08:00
nonet.c
socket.c [NET] sysctl: make sysctl_somaxconn per-namespace 2008-01-28 14:56:57 -08:00
sysctl_net.c [NET]: Remove the empty net_table 2008-01-28 14:56:29 -08:00
TUNABLE