
Kernel Ticks and Task Scheduling


This article explores how kernel functions and task scheduling are closely linked to time.

Time is vital to kernel programming, since many kernel functions are time-driven. Some run periodically, such as push and pull migration for load balancing, balancing the scheduler runqueues, or refreshing the screen; their frequencies are fixed (for example, 100 times per second). The kernel schedules other functions, such as delayed disk I/O, at a relative time in the future. For example, the kernel might schedule the floppy device driver to shut off the floppy drive motor 50 milliseconds (say) after the drive becomes inactive, or after completion of a certain task, so this sort of kernel timekeeping is relative. The kernel must also manage the system uptime and the current date and time.

Events that occur periodically, say every 10 milliseconds, are driven by the system timer. This is a programmable piece of hardware that issues an interrupt at a fixed frequency; the interrupt handler for this timer is called the timer interrupt. The system timer works off an electronic time source, such as a digital clock or the frequency of the processor, and the kernel uses it to gauge the passing of time. The system timer goes off (often called hitting or popping) at a pre-programmed frequency, called the tick rate. When the system timer goes off, it issues an interrupt that the kernel handles via a special interrupt handler. Because the kernel knows the pre-programmed tick rate, it knows the time between any two successive timer interrupts. This period is called a tick, and it is how the kernel keeps track of both wall time and system uptime. Wall time, the actual time of day, is important to user-space applications; the kernel keeps track of it simply because the kernel controls the timer interrupt. The tick rate is defined by the constant HZ in <asm/param.h>. For example, the x86 architecture has a tick rate of 100 Hz, whereas the Intel Itanium architecture (earlier, IA-64) has a rate of 1024 Hz.

Timer interrupts

Interrupts are asynchronous events that are usually fired by external hardware; the CPU is interrupted in its current activity and executes special code, the ISR (Interrupt Service Routine), to service the interrupt. Besides programming for hardware interrupts, the kernel services other interrupts. A module is expected to request an interrupt (or IRQ, for interrupt request) channel before using it, and to release it when it's done. The following functions, declared in <linux/sched.h>, implement the interface:

int request_irq(unsigned int irq,     /* interrupt number being requested */
    void (*handler)(int, void *, struct pt_regs *), /* pointer to the handler function */
    unsigned long flags,              /* interrupt management flags */
    const char *dev_name,             /* name of the owner of the interrupt */
    void *dev_id);                    /* identifies which device is interrupting */

void free_irq(unsigned int irq, void *dev_id);

Here, request_irq returns either 0 to indicate success, or a negative error code. Every time a timer interrupt occurs, the value of an internal kernel counter is incremented. The counter is initialised to 0 at system boot, so it represents the number of clock ticks since the last boot. The counter is a 64-bit variable (even on 32-bit architectures) and is called jiffies_64.

Jiffies

The global variable jiffies holds the number of ticks that have occurred since the system booted. On boot, the kernel initialises the variable to zero, and it is incremented by one during each timer interrupt. Thus, because there are HZ timer interrupts in a second, there are HZ jiffies in a second, and the system uptime is therefore jiffies/HZ seconds. What actually happens is slightly more complicated: the kernel initialises jiffies to a special initial value that causes the variable to overflow more often, catching bugs; when the actual value of jiffies is sought, this 'offset' is first subtracted. The jiffies variable is declared in <linux/jiffies.h> as extern unsigned long volatile jiffies; one should generally include <linux/sched.h>, which automatically pulls in <linux/jiffies.h>, to use the counter and its utility functions.

Calculating system date

The current time of day (the wall time) is maintained in kernel/time/timekeeping.c. The structures involved in fetching the system date are as follows:

struct timespec {
	__kernel_time_t tv_sec;        /* seconds */
	long tv_nsec;                  /* nanoseconds */
};
/* __kernel_time_t is a long, defined in posix_types.h */

struct timeval {
	__kernel_time_t tv_sec;        /* seconds */
	__kernel_suseconds_t tv_usec;  /* microseconds */
};

struct timezone {
	int tz_minuteswest;            /* minutes west of Greenwich */
	int tz_dsttime;                /* type of dst correction */
};

The timeval, timespec and timezone data structures are defined in <linux/time.h>. The xtime.tv_sec value stores the number of seconds that have elapsed since January 1, 1970 (UTC). This date is called the epoch; in most UNIX systems, the current wall time is kept relative to it. The xtime.tv_nsec value stores the number of nanoseconds that have elapsed in the current second. The time of day is fetched by do_gettimeofday, which in turn calls getnstimeofday; both are defined in kernel/time/timekeeping.c as follows:

/* Returns the time of day in a timeval */
void do_gettimeofday(struct timeval *tv)
{
	struct timespec now;

	getnstimeofday(&now); /* returns the time of day in a timespec */
	tv->tv_sec = now.tv_sec;
	tv->tv_usec = now.tv_nsec / 1000;
}

Kernel code (especially a driver) often needs a way to delay execution for some time without using timers, usually to give the hardware time to complete a given task. The time involved is typically quite short. For example, the specifications for a network card might list the time to change Ethernet modes as two microseconds; after setting the desired speed, the driver should wait at least two microseconds before continuing.

Long and short delays

Long delays: Occasionally, a driver needs to delay execution for relatively long periods, that is, more than one clock tick. Some solutions hog the processor while holding up real work; others release it, but offer no guarantee that the code will resume in exactly the required time. The main approaches to a long delay are listed below.

The brain-dead approach: The simplest and easiest implementation is busy waiting or busy looping. It is applied as follows:

unsigned long j = jiffies + jit_delay * HZ;

while (jiffies < j)
	/* nothing */;

However, this technique prevents the CPU from performing any other task, since the loop continues until jiffies reaches j.

The scheduling approach: This explicitly releases the CPU by calling the schedule function, declared in <linux/sched.h>:

while (time_before(jiffies, j1))
	schedule();

The timeout approach: The best way to implement a long delay is to use the kernel's intelligence. If, say, a driver uses a wait queue for some event, and simultaneously wants the event to be completed within a particular time span, it should use wait_event_timeout, declared in <linux/wait.h> as follows:

long wait_event_timeout(wait_queue_head_t q, condition, long timeout);

Small delays: Sometimes kernel code, a driver for instance, needs very short delays in order to synchronise with hardware. Delays shorter than a clock tick cannot be measured with jiffies, so the kernel provides the functions udelay and mdelay for short waits. Their prototypes are:

#include <linux/delay.h>

void udelay(unsigned long usecs);
void mdelay(unsigned long msecs);

The udelay function delays execution by busy looping for the specified number of microseconds; the mdelay function does the same for milliseconds. Since 1 second = 1,000 milliseconds = 1,000,000 microseconds, udelay(150) delays for 150 μs. The udelay() function is implemented as a loop that knows how many iterations can be executed in a given period of time: because the kernel knows how many loops the processor can complete in a second, it simply scales that value to the correct number of loop iterations for the given delay. The mdelay() function is then implemented in terms of udelay().

Task scheduling and the use of the kernel timer

In multi-tasking OSs, many tasks run at the same time. Allocating the processor to the appropriate task is called task scheduling, and the kernel component that distributes the available CPU time among tasks is the task scheduler (also called the process scheduler): the part of the kernel that decides which task to run next. It is one of the essential pieces of a multi-tasking OS. One feature many drivers need is the ability to schedule the execution of some task at a later time without resorting to interrupts. Linux offers three different interfaces for this purpose: task queues, tasklets, and kernel timers.

Task queues and tasklets provide a flexible utility for scheduling execution at a later time, triggered by the kernel itself, and are often used to manage hardware that cannot generate interrupts. They always run at interrupt time, and they run only once, even if scheduled to run multiple times. Kernel timers, by contrast, are used to schedule a task to run at a specific time in the future; they are easy to use, and they do not need to re-register themselves, unlike task queues.

At times, one needs to execute operations detached from any process context, like finishing a lengthy shutdown operation. In that case, delaying the return from close() (the function that closes a file descriptor, returning 0 on success and -1 on failure) wouldn't be fair to the application program. Using a task queue would be wasteful, because a queued task must continually re-register itself until the requisite time has passed.

The kernel timers are organised as a doubly linked list. The functions add_timer and del_timer add a timer to, and remove it from, the list; once a timer expires, it is automatically removed. The doubly linked structure is an advantage, since a timer can be deleted without first walking the list to find its predecessor. A timer is marked by its timeout value (in jiffies) and the function to be called when that value expires. The timer handler receives an argument, which is stored in the data structure together with a pointer to the handler itself. The timer's data structure is in <linux/timer.h>:

struct timer_list {
	struct timer_list *next;         /* pointer to the next timer in the list */
	struct timer_list *prev;         /* pointer to the previous timer in the list */
	unsigned long expires;           /* the timeout, in jiffies */
	unsigned long data;              /* argument passed to the handler */
	void (*function)(unsigned long); /* handler called on expiry */
	volatile int running;
};

Here, expires is a value in jiffies: timer->function is executed once jiffies is equal to or greater than timer->expires. The timeout is thus an absolute value, the sum of the current value of jiffies and the desired delay. The first step in creating a timer is to declare it:

struct timer_list k_timer;

Next, it must be initialised:

init_timer(&k_timer);

The timer_list structure is initialised once; the function add_timer then inserts it into a sorted list, which is polled about 100 times per second. add_timer is declared in <linux/timer.h> and defined in kernel/timer.c.

Prototype:

extern void add_timer(struct timer_list *timer);

Now the timer should be activated.

add_timer(&k_timer);

Even systems (such as the Alpha) that run with a higher clock interrupt frequency do not check the timer list more often than that—the added timer resolution would not justify the cost of the extra passes through the list.

References

'Linux Kernel Development' by Robert Love

By: Debasree Panda

The author is an open source developer. His area of expertise is Teradata databases.
