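Kernel notifier chains: principles and mechanism
Keywords: Linux, C++, IP change notification, signals
Kernel notifier chains are not complicated, but they are used in many important places in the kernel. Plenty has already been written about them; these notes are for study purposes only. References: 《深入理解Linux网络内幕》 (Understanding Linux Network Internals) and various articles found online.
Notifier chains are used only between kernel subsystems. Notifications between the kernel and user space are handled by other mechanisms, such as ioctl.
Kernel source references: include/linux/notifier.h
kernel/notifier.c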
The basic data structure of a notifier chain is defined as:
struct notifier_block {
    int (*notifier_call)(struct notifier_block *, unsigned long, void *);
    struct notifier_block __rcu *next;
    int priority;
};
The core registration function of a notifier chain:
/*
 * Notifier chain core routines. The exported routines below
 * are layered on top of these, with appropriate locking added.
 */
static int notifier_chain_register(struct notifier_block **nl,
                                   struct notifier_block *n)
{
    /* Walk the chain until an entry with lower priority is found. */
    while ((*nl) != NULL) {
        if (n->priority > (*nl)->priority)
            break;
        nl = &((*nl)->next);
    }
    /* Link the new block in front of that entry. */
    n->next = *nl;
    rcu_assign_pointer(*nl, n);
    return 0;
}
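To make the ordering rule concrete, here is a minimal user-space sketch of the same insertion walk. It is only a model: `struct demo_block` and `demo_chain_register` are hypothetical names invented for this illustration, and the RCU publication step is replaced by a plain pointer store.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct notifier_block (hypothetical, no RCU). */
struct demo_block {
    int (*notifier_call)(struct demo_block *, unsigned long, void *);
    struct demo_block *next;
    int priority;
};

/*
 * Same walk as notifier_chain_register(): skip every entry whose priority
 * is greater than or equal to the new block's, then splice the block in.
 */
int demo_chain_register(struct demo_block **nl, struct demo_block *n)
{
    assert(nl != NULL && n != NULL);

    while (*nl != NULL) {
        if (n->priority > (*nl)->priority)
            break;
        nl = &(*nl)->next;
    }
    n->next = *nl;
    *nl = n;    /* the kernel uses rcu_assign_pointer() here */
    return 0;
}
```

With this walk the chain stays sorted by descending priority, and a block whose priority equals that of existing entries lands after them, so blocks of equal priority are called in registration order.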
Generating a notification:
the notifier_call_chain function
/**
 * notifier_call_chain - Informs the registered notifiers about an event.
 * @nl:         Pointer to head of the blocking notifier chain
 * @val:        Value passed unmodified to notifier function
 * @v:          Pointer passed unmodified to notifier function
 * @nr_to_call: Number of notifier functions to be called. Don't care
 *              value of this parameter is -1.
 * @nr_calls:   Records the number of notifications sent. Don't care
 *              value of this field is NULL.
 * @returns:    notifier_call_chain returns the value returned by the
 *              last notifier function called.
 */
static int __kprobes notifier_call_chain(struct notifier_block **nl,
                                         unsigned long val, void *v,
                                         int nr_to_call, int *nr_calls)
{
    int ret = NOTIFY_DONE;
    struct notifier_block *nb, *next_nb;

    nb = rcu_dereference_raw(*nl);

    while (nb && nr_to_call) {
        /* Fetch the next entry first, in case the callback unregisters nb. */
        next_nb = rcu_dereference_raw(nb->next);

#ifdef CONFIG_DEBUG_NOTIFIERS
        if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
            WARN(1, "Invalid notifier called!");
            nb = next_nb;
            continue;
        }
#endif
        ret = nb->notifier_call(nb, val, v);

        if (nr_calls)
            (*nr_calls)++;

        /* A callback can stop the traversal by setting NOTIFY_STOP_MASK. */
        if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
            break;
        nb = next_nb;
        nr_to_call--;
    }
    return ret;
}
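The traversal and its early-stop rule can be tried out in user space with a small model. This is a sketch under stated assumptions, not the kernel code: the `demo_` names are hypothetical, the `DEMO_NOTIFY_*` constants mirror the values in notifier.h, and RCU and the debug check are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror NOTIFY_DONE/OK/STOP_MASK/STOP in include/linux/notifier.h. */
#define DEMO_NOTIFY_DONE      0x0000
#define DEMO_NOTIFY_OK        0x0001
#define DEMO_NOTIFY_STOP_MASK 0x8000
#define DEMO_NOTIFY_STOP      (DEMO_NOTIFY_OK | DEMO_NOTIFY_STOP_MASK)

/* Simplified stand-in for struct notifier_block (hypothetical, no RCU). */
struct demo_block {
    int (*notifier_call)(struct demo_block *, unsigned long, void *);
    struct demo_block *next;
    int priority;
};

/* Same loop as notifier_call_chain(), minus RCU and the debug check. */
int demo_call_chain(struct demo_block **nl, unsigned long val, void *v,
                    int nr_to_call, int *nr_calls)
{
    int ret = DEMO_NOTIFY_DONE;
    struct demo_block *nb, *next_nb;

    assert(nl != NULL);
    nb = *nl;
    while (nb && nr_to_call) {
        next_nb = nb->next;
        ret = nb->notifier_call(nb, val, v);
        if (nr_calls)
            (*nr_calls)++;
        if ((ret & DEMO_NOTIFY_STOP_MASK) == DEMO_NOTIFY_STOP_MASK)
            break;
        nb = next_nb;
        nr_to_call--;
    }
    return ret;
}

/* Hypothetical callback that lets the chain continue. */
int demo_cb_ok(struct demo_block *nb, unsigned long val, void *v)
{
    (void)nb; (void)val; (void)v;
    return DEMO_NOTIFY_OK;
}

/* Hypothetical callback that halts the chain. */
int demo_cb_stop(struct demo_block *nb, unsigned long val, void *v)
{
    (void)nb; (void)val; (void)v;
    return DEMO_NOTIFY_STOP;
}
```

Passing -1 as nr_to_call means "no limit": the counter is decremented on every call but does not reach zero in any realistic chain, so the loop ends only at the end of the chain or at a callback that sets the stop mask.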
Notifier chains come in four variants.
Here is the relevant comment from notifier.h:
/*
* Notifier chains are of four types:
*
* Atomic notifier chains: Chain callbacks run in interrupt/atomic
* context. Callouts are not allowed to block.
* Blocking notifier chains: Chain callbacks run in process context.
* Callouts are allowed to block.
* Raw notifier chains: There are no restrictions on callbacks,
* registration, or unregistration. All locking and protection
* must be provided by the caller.
* SRCU notifier chains: A variant of blocking notifier chains, with
* the same restrictions.
*
* atomic_notifier_chain_register() may be called from an atomic context,
* but blocking_notifier_chain_register() and srcu_notifier_chain_register()
* must be called from a process context. Ditto for the corresponding
* _unregister() routines.
*
* atomic_notifier_chain_unregister(), blocking_notifier_chain_unregister(),
* and srcu_notifier_chain_unregister() _must not_ be called from within
* the call chain.
*
* SRCU notifier chains are an alternative form of blocking notifier chains.
* They use SRCU (Sleepable Read-Copy Update) instead of rw-semaphores for
* protection of the chain links. This means there is _very_ low overhead
* in srcu_notifier_call_chain(): no cache bounces and no memory barriers.
* As compensation, srcu_notifier_chain_unregister() is rather expensive.
* SRCU notifier chains should be used when the chain will be called very
* often but notifier_blocks will seldom be removed. Also, SRCU notifier
* chains are slightly more difficult to use because they require special
* runtime initialization.
*/
1. Atomic notifier chains: the callbacks of the chain elements (the functions executed when the event occurs) run in interrupt/atomic context and must not block.
struct atomic_notifier_head {
    spinlock_t lock;
    struct notifier_block __rcu *head;
};
2. Blocking notifier chains: the callbacks of the chain elements run in process context and are allowed to block.
struct blocking_notifier_head {
    struct rw_semaphore rwsem;
    struct notifier_block __rcu *head;
};
3. Raw notifier chains: there are no restrictions on the callbacks of the chain elements; all locking and protection must be provided by the caller.
struct raw_notifier_head {
    struct notifier_block __rcu *head;
};
4. SRCU notifier chains: a variant of blocking notifier chains.
struct srcu_notifier_head {
    struct mutex mutex;
    struct srcu_struct srcu;
    struct notifier_block __rcu *head;
};
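We will not analyse each variant here, only raw notifier chains.
Before the analysis, here are some chains that the kernel already defines:
Die notifier: registered via register_die_notifier. It is notified when a kernel function triggers a trap or fault, for example through an oops, a page fault, or a breakpoint hit. If you are writing a device driver for a medical-grade card, for instance, you might register with the die notifier so that the medical electronics can be switched off when the kernel panics.
Net device notifier: registered via register_netdevice_notifier. It is notified when a network interface goes up or down.
CPU frequency notifier: registered via cpufreq_register_notifier. It is dispatched when the processor frequency changes.
Internet address notifier: registered via register_inetaddr_notifier. It is sent when a change in a network interface's IP address is detected.
The kernel has others that are not listed here.
Here we take netif_carrier_on as an example to analyse how the mechanism works.
A driver calls this function when it detects carrier on the NIC's link, so that kernel subsystems are notified and can react accordingly. It is used mainly in network device drivers under drivers/net/*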
/**
 * netif_carrier_on - set carrier
 * @dev: network device
 *
 * Device has detected that carrier.
 */
void netif_carrier_on(struct net_device *dev)
{
    if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
        if (dev->reg_state == NETREG_UNINITIALIZED)
            return;
        linkwatch_fire_event(dev);
        if (netif_running(dev))
            __netdev_watchdog_up(dev);
    }
}
The function we care about here is linkwatch_fire_event(dev), that is, how the handling routine gets notified.
void linkwatch_fire_event(struct net_device *dev)
{
    bool urgent = linkwatch_urgent_event(dev);

    if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
        linkwatch_add_event(dev);
    } else if (!urgent)
        return;

    linkwatch_schedule_work(urgent);
}
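This function performs two key operations, which are easy to spot:
linkwatch_add_event(dev) and linkwatch_schedule_work(urgent).
The code above also shows that, before linkwatch_add_event(dev) is called, one piece of preparation is done for the later work scheduling: bool urgent = linkwatch_urgent_event(dev);
Its result decides how the work is scheduled afterwards.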
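The control flow of linkwatch_fire_event can be summarised with a small user-space model. This is only a sketch: `struct demo_dev`, `demo_test_and_set` and `demo_fire_event` are hypothetical names, the pending bit is a plain bool instead of an atomic bit operation, and the two kernel calls are replaced by counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct net_device used here. */
struct demo_dev {
    bool pending;         /* models __LINK_STATE_LINKWATCH_PENDING */
    int times_queued;     /* models calls to linkwatch_add_event() */
    int times_scheduled;  /* models calls to linkwatch_schedule_work() */
};

/* Non-atomic stand-in for test_and_set_bit(): set the bit, return old value. */
static bool demo_test_and_set(bool *bit)
{
    bool old = *bit;
    *bit = true;
    return old;
}

void demo_fire_event(struct demo_dev *dev, bool urgent)
{
    assert(dev != NULL);

    if (!demo_test_and_set(&dev->pending)) {
        dev->times_queued++;     /* first event: put the device on the list */
    } else if (!urgent) {
        return;                  /* already pending and nothing urgent */
    }
    dev->times_scheduled++;      /* schedule (or re-schedule) the work */
}
```

The model shows that a device is queued at most once while its event is pending, and that a repeated event only triggers scheduling again when it is urgent.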
linkwatch_add_event:
static void linkwatch_add_event(struct net_device *dev)
{
    unsigned long flags;

    spin_lock_irqsave(&lweventlist_lock, flags);
    if (list_empty(&dev->link_watch_list)) {
        list_add_tail(&dev->link_watch_list, &lweventlist);
        dev_hold(dev);
    }
    spin_unlock_irqrestore(&lweventlist_lock, flags);
}
This adds the device whose carrier was detected to lweventlist, the list consumed by the work item. The work is then scheduled for execution:
static void linkwatch_schedule_work(int urgent)
{
    unsigned long delay = linkwatch_nextevent - jiffies;

    if (test_bit(LW_URGENT, &linkwatch_flags))
        return;

    /* Minimise down-time: drop delay for up event. */
    if (urgent) {
        if (test_and_set_bit(LW_URGENT, &linkwatch_flags))
            return;
        delay = 0;
    }

    /* If we wrap around we'll delay it by at most HZ. */
    if (delay > HZ)
        delay = 0;

    /*
     * This is true if we've scheduled it immediately or if we don't
     * need an immediate execution and it's already pending.
     */
    if (schedule_delayed_work(&linkwatch_work, delay) == !delay)
        return;

    /* Don't bother if there is nothing urgent. */
    if (!test_bit(LW_URGENT, &linkwatch_flags))
        return;

    /* It's already running which is good enough. */
    if (!__cancel_delayed_work(&linkwatch_work))
        return;

    /* Otherwise we reschedule it again for immediate execution. */
    schedule_delayed_work(&linkwatch_work, 0);
}
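The delay arithmetic at the top of linkwatch_schedule_work relies on unsigned wrap-around, which is easy to misread. The following user-space sketch isolates just that computation. The names `DEMO_HZ` and `demo_linkwatch_delay` are hypothetical, the value 100 for HZ is an assumption (the real value depends on the kernel configuration), and the LW_URGENT flag handling is deliberately left out.

```c
#include <assert.h>

#define DEMO_HZ 100UL   /* assumed tick rate; the kernel's HZ is configurable */

/* Models only the delay computation, not the LW_URGENT early returns. */
unsigned long demo_linkwatch_delay(unsigned long nextevent,
                                   unsigned long now, int urgent)
{
    /* If nextevent is already in the past, this wraps to a huge value. */
    unsigned long delay = nextevent - now;

    if (urgent)
        delay = 0;

    /* A wrapped (overdue) delay exceeds HZ and is clamped to "run now". */
    if (delay > DEMO_HZ)
        delay = 0;

    return delay;
}

/* Small usage example; call it from any test harness. */
void demo_delay_selfcheck(void)
{
    assert(demo_linkwatch_delay(1050, 1000, 0) == 50); /* 50 ticks remain */
    assert(demo_linkwatch_delay(1000, 1050, 0) == 0);  /* overdue: clamped */
    assert(demo_linkwatch_delay(1050, 1000, 1) == 0);  /* urgent: no delay */
}
```

Because linkwatch_nextevent is never set more than HZ ticks ahead of jiffies, any difference larger than HZ can only mean the deadline has already passed, which is why a single comparison is enough.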
The work item being scheduled is linkwatch_work. Here is its initialisation:
static DECLARE_DELAYED_WORK(linkwatch_work, linkwatch_event);
So the function executed when the work runs is:
static void linkwatch_event(struct work_struct *dummy)
{
    rtnl_lock();
    __linkwatch_run_queue(time_after(linkwatch_nextevent, jiffies));
    rtnl_unlock();
}
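The argument passed to __linkwatch_run_queue is the result of time_after(linkwatch_nextevent, jiffies): it is true while the next permitted event time is still in the future, in which case only urgent events are processed. The comparison stays correct when jiffies wraps around, as the following user-space sketch illustrates. `demo_time_after` is a hypothetical re-implementation of the kernel macro's core expression without its type checks, and it assumes the usual two's-complement conversion from unsigned long to long.

```c
#include <assert.h>

/* Core of the kernel's time_after(a, b): true if a is later than b. */
int demo_time_after(unsigned long a, unsigned long b)
{
    /* The signed view of the difference survives counter wrap-around. */
    return (long)(b - a) < 0;
}

/* Small usage example around the wrap point of the tick counter. */
void demo_time_after_selfcheck(void)
{
    unsigned long now = ~0UL - 5;   /* tick counter about to wrap */

    assert(demo_time_after(now + 10, now));   /* later, across the wrap */
    assert(!demo_time_after(now, now + 10));  /* earlier */
    assert(!demo_time_after(now, now));       /* equal is not "after" */
}
```

A plain `a > b` comparison would give the wrong answer in the first case, because `now + 10` wraps to a small number.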
__linkwatch_run_queue:
static void __linkwatch_run_queue(int urgent_only)
{
    struct net_device *dev;
    LIST_HEAD(wrk);

    /*
     * Limit the number of linkwatch events to one
     * per second so that a runaway driver does not
     * cause a storm of messages on the netlink
     * socket. This limit does not apply to up events
     * while the device qdisc is down.
     */
    if (!urgent_only)
        linkwatch_nextevent = jiffies + HZ;
    /* Limit wrap-around effect on delay. */
    else if (time_after(linkwatch_nextevent, jiffies + HZ))
        linkwatch_nextevent = jiffies;

    clear_bit(LW_URGENT, &linkwatch_flags);

    spin_lock_irq(&lweventlist_lock);
    list_splice_init(&lweventlist, &wrk);