linux-hardened/net
Daniel Wagner 8a8e04df47 cgroup: Assign subsystem IDs during compile time
WARNING: With this change it is impossible to load external built
controllers anymore.

In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is
set, corresponding subsys_id should also be a constant. Up to now,
net_prio_subsys_id and net_cls_subsys_id would be of the type int and
the value would be assigned during runtime.

By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN
to IS_ENABLED, all *_subsys_id will have constant value. That means we
need to remove all the code which assumes a value can be assigned to
net_prio_subsys_id and net_cls_subsys_id.

A close look is necessary on the RCU part which was introduces by
following patch:

  commit f845172531
  Author:	Herbert Xu <herbert@gondor.apana.org.au>  Mon May 24 09:12:34 2010
  Committer:	David S. Miller <davem@davemloft.net>  Mon May 24 09:12:34 2010

  cls_cgroup: Store classid in struct sock

  Tis code was added to init_cgroup_cls()

	  /* We can't use rcu_assign_pointer because this is an int. */
	  smp_wmb();
	  net_cls_subsys_id = net_cls_subsys.subsys_id;

  respectively to exit_cgroup_cls()

	  net_cls_subsys_id = -1;
	  synchronize_rcu();

  and in module version of task_cls_classid()

	  rcu_read_lock();
	  id = rcu_dereference(net_cls_subsys_id);
	  if (id >= 0)
		  classid = container_of(task_subsys_state(p, id),
					 struct cgroup_cls_state, css)->classid;
	  rcu_read_unlock();

Without an explicit explaination why the RCU part is needed. (The
rcu_deference was fixed by exchanging it to rcu_derefence_index_check()
in a later commit, but that is a minor detail.)

So here is my pondering why it was introduced and why it safe to
remove it now. Note that this code was copied over to net_prio the
reasoning holds for that subsystem too.

The idea behind the RCU use for net_cls_subsys_id is to make sure we
get a valid pointer back from task_subsys_state(). task_subsys_state()
is just blindly accessing the subsys array and returning the
pointer. Obviously, passing in -1 as id into task_subsys_state()
returns an invalid value (out of lower bound).

So this code makes sure that only after module is loaded and the
subsystem registered, the id is assigned.

Before unregistering the module all old readers must have left the
critical section. This is done by assigning -1 to the id and issuing a
synchronized_rcu(). Any new readers wont call task_subsys_state()
anymore and therefore it is safe to unregister the subsystem.

The new code relies on the same trick, but it looks at the subsys
pointer return by task_subsys_state() (remember the id is constant
and therefore we allways have a valid index into the subsys
array).

No precautions need to be taken during module loading
module. Eventually, all CPUs will get a valid pointer back from
task_subsys_state() because rebind_subsystem() which is called after
the module init() function will assigned subsys[net_cls_subsys_id] the
newly loaded module subsystem pointer.

When the subsystem is about to be removed, rebind_subsystem() will
called before the module exit() function. In this case,
rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer
and then it calls synchronize_rcu(). All old readers have left by then
the critical section. Any new reader wont access the subsystem
anymore.  At this point we are safe to unregister the subsystem. No
synchronize_rcu() call is needed.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
2012-09-14 09:57:43 -07:00
..
9p net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
802 tokenring: delete all remaining driver support 2012-05-15 20:23:16 -04:00
8021q vlan: clean up vlan_dev_hard_start_xmit() 2012-08-14 14:33:32 -07:00
appletalk net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
atm atm: fix info leak via getsockname() 2012-08-15 21:36:30 -07:00
ax25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-19 11:17:30 -07:00
batman-adv batman-adv: Fix mem leak in the batadv_tt_local_event() function 2012-08-08 16:04:04 -07:00
bluetooth Bluetooth: L2CAP - Fix info leak via getsockname() 2012-08-15 21:36:31 -07:00
bridge bridge: fix rcu dereference outside of rcu_read_lock 2012-08-15 15:09:41 -07:00
caif caif: Do not dereference NULL in chnl_recv_cb() 2012-08-20 02:47:49 -07:00
can can: gw: Remove pointless casts 2012-07-10 22:36:17 +02:00
ceph libceph: avoid truncation due to racing banners 2012-08-21 15:55:27 -07:00
core cgroup: Assign subsystem IDs during compile time 2012-09-14 09:57:43 -07:00
dcb net: Fix non-kernel-doc comments with kernel-doc start marker 2012-07-10 23:13:45 -07:00
dccp dccp: fix info leak via getsockopt(DCCP_SOCKOPT_CCID_TX_INFO) 2012-08-15 21:36:31 -07:00
decnet ipv4: Restore old dst_free() behavior. 2012-07-31 14:41:38 -07:00
dns_resolver Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2012-05-21 20:27:36 -07:00
dsa dsa: Convert compare_ether_addr to ether_addr_equal 2012-05-09 20:49:19 -04:00
ethernet ipx: move peII functions 2012-07-19 10:48:00 -07:00
ieee802154 6lowpan: Change byte order when storing/accessing to len field 2012-07-16 22:52:02 -07:00
ipv4 ipv4: fix ip header ident selection in __ip_make_skb() 2012-08-21 14:51:06 -07:00
ipv6 net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child() 2012-08-20 03:03:33 -07:00
ipx ipx: move peII functions 2012-07-19 10:48:00 -07:00
irda irda: Fix typo in irda 2012-07-16 23:23:52 -07:00
iucv net: remove skb_orphan_try() 2012-06-15 15:30:15 -07:00
key net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
l2tp l2tp: fix info leak via getsockname() 2012-08-15 21:36:31 -07:00
lapb lapb: Neaten debugging 2012-05-17 18:45:20 -04:00
llc llc: fix info leak via getsockname() 2012-08-15 21:36:31 -07:00
mac80211 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-08-02 13:49:38 -04:00
mac802154 mac802154: sparse warnings: make symbols static 2012-07-12 07:54:45 -07:00
netfilter Merge git://1984.lsi.us.es/nf 2012-08-20 02:44:29 -07:00
netlabel netlabel: use GFP flags from caller instead of GFP_ATOMIC 2012-03-22 19:29:57 -04:00
netlink af_netlink: force credentials passing [CVE-2012-3520] 2012-08-21 14:53:01 -07:00
netrom net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
nfc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
openvswitch Revert "openvswitch: potential NULL deref in sample()" 2012-07-27 13:45:51 -07:00
packet af_packet: don't emit packet on orig fanout group 2012-08-20 02:37:29 -07:00
phonet net: remove my future former mail address 2012-06-17 16:29:38 -07:00
rds rds: set correct msg_namelen 2012-07-23 01:01:44 -07:00
rfkill rfkill: Add the capability to switch all devices of all type in __rfkill_switch_all(). 2012-06-06 15:18:17 -04:00
rose net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-10 23:56:33 -07:00
sched cgroup: Assign subsystem IDs during compile time 2012-09-14 09:57:43 -07:00
sctp netvm: prevent a stream-specific deadlock 2012-07-31 18:42:47 -07:00
sunrpc Merge branch 'akpm' (Andrew's patch-bomb) 2012-07-31 19:25:39 -07:00
tipc tipc: remove print_buf and deprecated log buffer code 2012-07-13 19:34:43 -04:00
unix af_netlink: force credentials passing [CVE-2012-3520] 2012-08-21 14:53:01 -07:00
wanrouter wanmain: comparing array with NULL 2012-07-24 13:55:21 -07:00
wimax net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
wireless cfg80211: process pending events when unregistering net device 2012-08-06 14:29:58 -04:00
x25 net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
xfrm net: ipv6: fix oops in inet_putpeer() 2012-08-20 02:56:56 -07:00
compat.c net: Fix references to out-of-scope variables in put_cmsg_compat() 2012-07-22 17:50:49 -07:00
Kconfig net: drop NET dependency from HAVE_BPF_JIT 2012-05-21 12:50:12 -07:00
Makefile econet: remove ancient bug ridden protocol 2012-05-18 01:35:08 -04:00
nonet.c
socket.c net: fix info leak in compat dev_ifconf() 2012-08-15 21:36:31 -07:00
sysctl_net.c net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00