-[ BFi - English version ]----------------------------------------------------
     BFi is an e-zine written by the Italian hacker community.
     Full source code and original Italian version are available at:
        http://bfi.freaknet.org/dev/BFi12-dev-01.tar.gz
        http://www.s0ftpj.org/bfi/dev/BFi12-dev-01.tar.gz
     English version translated by twiz
------------------------------------------------------------------------------

==============================================================================
---------------------[ BFi12-dev - file 01 - 13/01/2003 ]---------------------
==============================================================================

-[ DiSCLAiMER ]---------------------------------------------------------------
     All the material contained in BFi has informative and educational
     purposes only. In no event shall the authors be held liable for damage
     caused to people or things through the use of the code, programs,
     information and techniques published in the e-zine.
     BFi is a free and autonomous means of expression; we, the authors, are
     as free to write BFi as you are free to go on reading or to stop right
     now. Therefore, if you think you could be harmed by the topics covered
     and/or by the way they are presented, * stop reading immediately and
     remove these files from your computer * .
     By going on, you, the reader, take upon yourself all responsibility for
     the use you make of the information published in BFi.
     You are not allowed to post BFi to newsgroups or to spread *parts* of
     the magazine: please distribute BFi in its original and complete form.
------------------------------------------------------------------------------


-[ HACKiNG ]------------------------------------------------------------------
---[ FiND HiDDEN RESiDENT PR0CESSES
-----[ twiz sgrakkyu


---[ Preface

This article is divided into two sections. The first one analyzes the
theoretical background behind FHRP: concepts, linux kernel implementation
case studies and so on. The second, "practical" one presents the purposes and
the actual code implementation.
The first part isn't strictly necessary to use FHRP, but it gives the
foundations to deeply understand fhrp and, maybe, extend or adapt it to fit
your needs.
In a nutshell, FHRP is a linux 2.4.* loadable kernel module, implemented only
for uniprocessor systems, designed to find hidden processes on a box using
the cr3 value as a signature. A significant part of the ideas and of the code
should also be valid (even if not implemented) on other operating systems,
on SMP systems and, possibly, on other architectures.


---[ Processes and the scheduler

Find hidden processes, we said. I assume you all know what a process is and
that you have clear the "abstraction" made inside the linux kernel, where it
would be more correct to talk of tasks, since linux basically sees and
manages kernel threads and userland processes in the same manner, that is
with a struct task_struct.

[Note: That doesn't mean that there's no difference between kthreads and
       userland processes. For example, the struct mm_struct pointer of a
       kthread will always be NULL, since the virtual memory that a kthread
       references is the one directly mapped in kernel space.
       Moreover, in the "base" 2.4 kernel tree the full preemption patch
       isn't applied, so kernel threads and whatever else runs in kernel
       space aren't preemptible]

Likewise we won't dwell on what a scheduler is and how it works: there are
plenty of books and texts on the net that deal with the topic (for a bare
list check the references at the end of this paper). In a nutshell, we can
define the scheduler as the part of an operating system that chooses, among
the processes that compete for the CPU, which one actually runs, using a
particular algorithm (there are many possible algorithms with different
purposes: just think of the differences between a batch system and a
real-time system).
The situations in which the scheduler gets called can differ too: for
example when a process ends or blocks waiting for a specific resource (f.e.
i/o on a port), when that resource becomes available, when a process forks
or when it has exhausted its time quantum.
A last definition that is useful to give is the difference between
*non-preemptive* and *preemptive* scheduling. In the first case the scheduler
picks a process and lets it execute until it ends, blocks or voluntarily
yields the cpu. In the second case the process is assigned a time quantum
that, once expired, forces the process to stop and the scheduler to pick
another one. Whenever "priority" among processes (Priority Scheduling
Algorithm) is implemented, a process with higher priority that becomes
runnable (f.e. because the resource it was blocking on is now available) gets
scheduled and the process that was running gets suspended.
The conditio sine qua non of preemptive scheduling is obviously the presence
of a timer interrupt.

Since, as we said before, we are going to work with the linux kernel on a
32bit x86 uniprocessor system, let's go deeper and look closer at how all
that is implemented.
There are at least 3 texts online (References [2], [3] and [4]) that analyze,
even quite deeply, the linux kernel scheduler (both UP and SMP), so we will
focus on the points that are most interesting for us, going as deep as
possible into them and leaving to the interested reader the possibility to
examine the other subjects by reading those texts. The best way to understand
the linux kernel remains, though, a read of kernel/sched.c and of some parts
of kernel/timer.c (sys_alarm and sys_nanosleep are implemented there), as
well as include/linux/sched.h and time[r].h .

At a given time a task can be in one of these 5 states (task->state inside
struct task_struct) :

[snip]
#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define TASK_ZOMBIE             4
#define TASK_STOPPED            8
[snip]

TASK_RUNNING -> task is on the runqueue and so competes for the CPU. Every
process on the runqueue is in TASK_RUNNING state, while the opposite may not
be true, since setting a process to TASK_RUNNING and putting it on the
runqueue isn't an atomic operation.

TASK_INTERRUPTIBLE -> task is sleeping, but can be woken up by a signal or
when the sleep time set with schedule_timeout() expires. When a process goes
to sleep in TASK_INTERRUPTIBLE state because it blocks on a resource, its
task_struct is inserted into the waitqueue linked to that resource.

TASK_UNINTERRUPTIBLE -> task is sleeping and it's guaranteed that signals
won't wake it up: it keeps that state until the timer set with
schedule_timeout() expires or the operation it waits for completes. That
option is seldom used inside the linux kernel, but it is useful, for example,
for a device driver that has to wait for a particular operation to end and
that, if interrupted, could return a wrong value or leave the device in an
unpredictable and/or corrupted state.

TASK_STOPPED -> task has been stopped by a signal or because there is an
attempt to trace it with ptrace (a PTRACE_ATTACH corresponds to a SIGSTOP
sent to the traced process).
A task in TASK_STOPPED state isn't, obviously, on the runqueue or in a
waitqueue.

TASK_ZOMBIE state isn't of any particular interest: the task has finished its
execution, but the father hasn't executed a wait() on its status yet. If the
father is gone, those processes become children "adopted" by init, which
periodically issues wait() calls, de facto erasing them.

What is interesting to notice is that *only* TASK_RUNNING processes compete
for the CPU, while all the others don't get any CPU time (unless, obviously,
they get woken up by an expiry, for the processes in timeout, or, except for
the TASK_UNINTERRUPTIBLE ones, by a signal).
The importance of that, and of the fact that TASK_STOPPED processes are not
in any waitqueue, will be clearer when the foundations of fhrp are analyzed,
especially the signature system.


-----] Case study n.1 -> schedule_timeout()

Now that we have introduced the concept of timeout and schedule_timeout(),
let's see how the linux kernel handles it. The function we start from is
precisely schedule_timeout(), from kernel/sched.c .

signed long schedule_timeout(signed long timeout)

This function receives as parameter the "time", in jiffies, that the process
will have to spend sleeping. Jiffies are just the number of clock ticks since
the boot-up of the box: the value of the jiffies variable is incremented at
every timer interrupt. FHRP heavily relies on the timer interrupt, which
will be analyzed later.
At this point, after declaring a struct timer_list (used to set the timeout)
and an "unsigned long expire", a switch on the timeout value is executed (for
readability and space the comments inside sched.c have been removed):

        switch (timeout)
        {
        case MAX_SCHEDULE_TIMEOUT:
                schedule();
                goto out;
        default:
                if (timeout < 0)
                {
                        printk(KERN_ERR "schedule_timeout: wrong timeout "
                               "value %lx from %p\n", timeout,
                               __builtin_return_address(0));
                        current->state = TASK_RUNNING;
                        goto out;
                }
        }

We are mostly interested in the MAX_SCHEDULE_TIMEOUT case : in that
situation no timer is set, the scheduler is simply invoked. That way the
process, which has previously been set to TASK_INTERRUPTIBLE (generally) or
TASK_UNINTERRUPTIBLE (seldom), is removed from the runqueue and another one
is scheduled. So the sleeping process *WON'T* wake up after any fixed period.
The MAX_SCHEDULE_TIMEOUT case is of direct interest for FHRP because
sys_accept (precisely, wait_for_connect() ) passes exactly that parameter to
schedule_timeout() , creating some trouble in finding hidden processes that
are listening on a specific port. The solution to that problem will be clear
after the practical analysis of FHRP (second part of this article).
For all the other cases (default), there's a check (*paranoid*, as we can
read in the comments inside the source) for a negative value passed to
schedule_timeout() (something that should *never* happen). In that case an
error is logged, the process is put back to TASK_RUNNING and
schedule_timeout() returns 0.
Whenever none of that happens (and it's by far the most common situation)
the function goes on this way :

        expire = timeout + jiffies;

        init_timer(&timer);
        timer.expires = expire;
        timer.data = (unsigned long) current;
        timer.function = process_timeout;

        add_timer(&timer);
        schedule();
        del_timer_sync(&timer);

        timeout = expire - jiffies;

Nothing strange: the value of expire gets computed (adding the current value
of the jiffies variable to the delay contained in timeout) and the members of
the struct timer_list are filled in. What is interesting to notice is that
the first function that will be called to wake up the process is
process_timeout() .
add_timer(&timer) adds the timer to the global list of active timers, while
del_timer_sync(&timer) is used to avoid a race condition if the function
returns before the expected time (f.e. we have been woken up by a signal). In
that situation 'timeout' is returned, that is the time still remaining before
the timer would have expired (0 if the timer expired regularly).
For any other doubt about schedule_timeout() the comments inside sched.c
should be enough :)

Before going further, just two words about how a process can be put to
sleep, with or without timeout. The possibilities are basically two : an
invocation of schedule_timeout() that I would define "manual", or the use of
interruptible_sleep_on_timeout / interruptible_sleep_on / sleep_on_timeout /
sleep_on .

An example of "manual" invocation is inside sys_nanosleep, the syscall
invoked when we write something like sleep(10) in a C program. Leaving out
the lines concerning realtime processes ( task->policy set to SCHED_RR or
SCHED_FIFO ), the interesting lines are :

        current->state = TASK_INTERRUPTIBLE;
        expire = schedule_timeout(expire);

In that case there is no need to set up a waitqueue, because the process is
not waiting (that is, blocking) for a particular resource: it simply has to
be kept sleeping for a given amount of time.
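The return-value arithmetic described above can be modelled in user space.
What follows is only a sketch, not kernel code: model_schedule_timeout(), its
two jiffies parameters and MODEL_MAX_TIMEOUT are names made up for this
illustration (MODEL_MAX_TIMEOUT stands in for MAX_SCHEDULE_TIMEOUT, assumed
here to be LONG_MAX as in include/linux/sched.h). It just reproduces
expire = timeout + jiffies at call time and timeout = expire - jiffies at
wake-up time, clamped to 0.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's MAX_SCHEDULE_TIMEOUT. */
#define MODEL_MAX_TIMEOUT LONG_MAX

/*
 * User-space model of the value schedule_timeout() hands back.
 *   jiffies_at_call:   tick counter when the process goes to sleep
 *   jiffies_at_wakeup: tick counter when the process runs again
 */
long model_schedule_timeout(long timeout,
                            unsigned long jiffies_at_call,
                            unsigned long jiffies_at_wakeup)
{
    unsigned long expire;

    if (timeout == MODEL_MAX_TIMEOUT)
        return timeout;     /* no timer armed: the value is handed back */
    if (timeout < 0)
        return 0;           /* the "paranoid" branch */

    expire = (unsigned long)timeout + jiffies_at_call;
    timeout = (long)(expire - jiffies_at_wakeup);

    /* 0 when the timer expired, remaining ticks when woken up early */
    return timeout < 0 ? 0 : timeout;
}
```

For example, a process that asks for 100 ticks at jiffies == 1000 and is
woken up by a signal at jiffies == 1040 gets 60 back, while one that sleeps
the whole period gets 0.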
If the process really blocks waiting for a resource/event and
schedule_timeout() is invoked "manually", a wait queue entry is set up and
added to a wait queue head beforehand. We will see an example when
wait_for_connect() is briefly analysed.

The *sleep_on* family (where '*' is used as a regexp ;)) does exactly the
same, except that a wait queue entry is always set up and added to the
wait_queue_head_t struct passed as parameter. Setting the state of the
process and, eventually, calling schedule_timeout() are handled inside those
functions.
Whether there is a timeout or not depends, with schedule_timeout() , on the
value of timeout (whether it is equal to MAX_SCHEDULE_TIMEOUT or not).


------] Case study n.2 -> The path to a process wake up

As we saw a few lines ago, when the timer set by schedule_timeout() expires
the function that gets called is process_timeout() , which receives as
parameter an unsigned long that is nothing more than a pointer to the
task_struct put to sleep.
From this moment on we will jump around among various functions, each one
acting as a wrapper to the following one, until we arrive at
try_to_wake_up() , which is the function that actually wakes up our process.
What we're going to do in this second case study is follow the stages,
trying to spot the most interesting ones (from the FHRP point of view) and
the reasons behind hooking at particular points of the code.

The first function that gets called is process_timeout() , and it is also the
function that FHRP hooks to check this type of processes. The function
itself, like every wrapper, is quite simple :

static void process_timeout(unsigned long __data)
{
        struct task_struct * p = (struct task_struct *) __data;

        wake_up_process(p);
}

A pointer is declared and, with a cast, made to point to the process that had
been put to sleep; after that, wake_up_process() is called.
process_timeout() is the function hooked inside FHRP because :

- It is the first function called when a process in timeout needs to be
  woken up, which means that it does not depend on other functions that
  could have been hooked by the attacker and so give us wrong/compromised
  results.

- It is really short, so it is possible to rewrite it completely inside the
  hook, being sure that nothing will "interfere".

- If we put a process in timeout via schedule_timeout() and, for whatever
  reason, this process isn't passed to wake_up_process() and so never
  arrives at try_to_wake_up(), that process won't wake up any more... FHRP
  uses this in a somewhat "rude" (but efficient) way to make a possible
  hidden process harmless.

wake_up_process() is itself a wrapper function that calls try_to_wake_up() .
It is used, for example, when a SIGCONT is sent to a process
( kernel/signal.c ).

inline int wake_up_process(struct task_struct * p)
{
        return try_to_wake_up(p, 0);
}

As we said a few lines ago, that function does nothing more than call
try_to_wake_up(), passing "0" as second parameter (int synchronous, as we
will see in a minute), that is the request to call reschedule_idle() besides
inserting the process in the runqueue. The result is that, inside
reschedule_idle() , the goodness of the woken-up process is computed (thanks
to the dynamic priority) and, if it has a higher priority than the currently
running process, a context switch will occur and the process just woken up
will get the CPU right away.
Let's see try_to_wake_up() as well :

static inline int try_to_wake_up(struct task_struct * p, int synchronous)
{
        unsigned long flags;
        int success = 0;

        spin_lock_irqsave(&runqueue_lock, flags);
        p->state = TASK_RUNNING;
        if (task_on_runqueue(p))
                goto out;
        add_to_runqueue(p);
        if (!synchronous || !(p->cpus_allowed & (1 << smp_processor_id())))
                reschedule_idle(p);
        success = 1;
out:
        spin_unlock_irqrestore(&runqueue_lock, flags);
        return success;
}

This function is quite simple too :

- a lock on the runqueue is acquired with spin_lock_irqsave(): in an SMP
  context, besides taking the lock, interrupts get disabled on the current
  CPU (on UP it is just the classic sequence save_flags(), cli() and, later,
  restore_flags()), and the interrupt state of the processor is saved in
  flags;

- the state of the process is set to TASK_RUNNING and there's a check to see
  whether the process is already on the runqueue (if it is, the lock is
  released, the interrupt state restored through flags and '0' returned);

  [Note: the state of the process is set to TASK_RUNNING without any check on
         the previous state and on the fact that it could have been modified
         in the meantime. That is the reason why, if we manually move
         processes to TASK_STOPPED, those that had set a not-yet-expired
         timer are woken up anyway... not that this is a worry: we can (and
         we do) wake up *only* those processes with a valid cr3/signature ]

- the task is added to the runqueue and, if synchronous == 0 or if the
  process cannot run on the current CPU (a condition that on UP is *always*
  false; cpus_allowed is just a bitmask that lists the valid CPUs for the
  switch), reschedule_idle() is called. In either case success is set to 1 to
  indicate that the task has been successfully inserted in the runqueue.
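The three steps listed above can be played with in user space. The following
is a toy model, not the kernel function: struct model_task, model_runqueue,
model_resched_calls and model_try_to_wake_up() are hypothetical names, the
runqueue is a plain singly linked list, there is no locking, and the call to
reschedule_idle() is replaced by a counter (the cpus_allowed test is left
out, as it never triggers on UP).

```c
#include <stddef.h>

enum { MODEL_TASK_RUNNING = 0, MODEL_TASK_INTERRUPTIBLE = 1 };

struct model_task {
    int state;
    int on_runqueue;
    struct model_task *next;        /* simplistic runqueue link */
};

static struct model_task *model_runqueue = NULL;
static int model_resched_calls = 0; /* counts "reschedule_idle()" requests */

/* Mirrors the control flow of try_to_wake_up() on a UP box. */
int model_try_to_wake_up(struct model_task *p, int synchronous)
{
    p->state = MODEL_TASK_RUNNING;  /* set without looking at the old state */
    if (p->on_runqueue)
        return 0;                   /* already queued: nothing else to do */

    p->next = model_runqueue;       /* stands in for add_to_runqueue(p) */
    model_runqueue = p;
    p->on_runqueue = 1;

    if (!synchronous)
        model_resched_calls++;      /* stands in for reschedule_idle(p) */
    return 1;
}
```

Waking the same task twice shows the behaviour discussed in the note: the
second call still forces the state to "running" but returns 0 and does not
ask for a reschedule.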
reschedule_idle() is the function, as we said some paragraphs ago, that
checks whether the goodness of the woken-up process is better than that of
the currently running process and, if that is the case, asks for a context
switch in favour of the former. The code is quite complex on SMP (its purpose
is to "find" an idle cpu to let the process run on), while it's just a few
lines on UP:

        int this_cpu = smp_processor_id();
        struct task_struct *tsk;

        tsk = cpu_curr(this_cpu);
        if (preemption_goodness(tsk, p, this_cpu) > 1)
                tsk->need_resched = 1;

tsk is set to the process currently on the CPU, while preemption_goodness()
does nothing more than subtract the goodness of the current process from the
goodness of the woken-up one. If the value returned is greater than one, the
priority of the woken-up process is higher and need_resched is set to 1,
forcing a scheduler invocation at the next ret_from_intr or
ret_from_sys_call .
reschedule_idle() , as a comment in kernel/sched.c says too, is absolutely
timing critical since, as you remember from try_to_wake_up() , it is called
with the runqueue lock held and it must not claim the tasklist_lock .

Because there are many texts online focusing on this (References [2], [3] and
[4]... besides sched.c) we won't go any further into other sections of the
scheduler, such as the goodness (the core of the scheduling algorithm) or
schedule() itself, since they are widely analyzed both in Linux Kernel
Internals 2.4 [2] and in the freely downloadable chapter of Understanding the
Linux Kernel [3].


------] Case study n.3 -> About the PIT, raising the clock frequency

As we already said, the timer interrupt is the conditio sine qua non of
preemptive scheduling.
It is thanks to the timer interrupt that we can periodically decrease the
time quantum of a process ( task->counter ) and, when it reaches 0 (expired),
set task->need_resched to 1, thus invoking the scheduler at the following
ret_from_intr or ret_from_sys_call and obtaining a context switch.

In the linux kernel the frequency (modifiable at compile time by simply
changing the HZ value in asm/param.h ... default is 100) is set to 1 tick
every 10ms. It's clear that if we raise the value of HZ we get a better
response time, that is the time between the sending of a command and the
execution of the command itself, but also a bigger overhead, because the
number of context switches increases noticeably and every process has, in
every epoch, "little time" to run, since its counter expires in a shorter
lapse of time. It's more or less as clear that if we lower the HZ value we
get longer and longer response times.
Both operations (raising and lowering the timer interrupt frequency) favour
particular kinds of processes and penalize others. A shell or any other
interactive application benefits from a raise, while lowering the frequency
benefits an operation like a find on the whole hard disk or a backup.

FHRP, when loaded, raises the timer interrupt frequency to 1 tick every
millisecond (a value that is obviously modifiable, as we'll see while
analyzing that portion of the code), while still calling the original timer
interrupt handling routine at the classic rate of once every 10ms. That gives
us the possibility to check more than once, between one "real" timer
interrupt and the next, what is actually running on the CPU.

Let's now see how all that is possible and, moreover, what raising the
interrupt frequency permits and how the linux kernel manages the whole thing.
At the end of the case study I'll provide a simple module that lets us easily
raise and lower the frequency.
[Note: this part isn't strictly necessary to understand how FHRP works and
       will probably be more suitable for those interested in low level
       architecture and devices. If you are not interested you can just skip
       it or read it out of curiosity, without paying too much attention to
       the details.
       Moreover, even if part of the description is correct/applicable to SMP
       systems too, the case study and the code given are only for UP. ]

Let's start from the architecture. We will analyze the 8253/8254 Programmable
Interval Timer chip (de facto we will focus only on the 8253, since we won't
discuss the 8254 extensions). Of the 3 channels available on the PIT we're
only interested in channel/timer 0, that is the device that the linux kernel
uses to keep track of time (timer interrupt).

[Note: if you're interested in reading more about what is not going to be
       analyzed here, that is the other two channels, channel/timer 1 linked
       to the DRAM refresh and channel/timer 2 linked to the speaker, you
       can check References [5] and [6]]

The 3 timers of the 8253 chip are driven by the same clock signal, which
derives from the quartz oscillator on the motherboard. The frequency of this
clock signal is roughly 1.19318 MHz.
Every timer has a programmable counter which, in a nutshell, determines how
often, or after how long (depending on the Operation Mode, described later),
it sends its "signal".
Timer 0 of the 8253 chip is in fact linked to the 8259 PIC (Programmable
Interrupt Controller), which normally listens on 8 interrupt sources and
passes those interrupts, one at a time, to the CPU, according to a priority
mechanism that allows a more important one to interrupt a less important one
and so get the cpu. The 8259 PIC lets us mask particular interrupts with the
8 bits (one per interrupt) of the IMR (Interrupt Mask Register): a bit set to
1 in the IMR prevents that interrupt from "reaching" the CPU to be handled.
[Note: IRQs are actually more than 8: they are exactly twice as many, 16,
       divided into 8 Master, directly linked to the CPU, and 8 Slave, which
       pass through IRQ2. However this is not the place to go deeper, if you
       are interested read Reference [5]]

Timer 0 of the 8253 chip is linked to IRQ0 of the 8259, the Interrupt Request
(this is how an interrupt source that passes through the 8259 is called) with
the highest priority.

As we said more than once, the default frequency inside the linux kernel is
100 Hz, a tick every 10ms. That is achieved by setting the timer counter to
11932 (from the simple division we get almost 100Hz or 10ms); let's see how
it is computed inside the kernel.
The value is returned by the LATCH macro:

#define LATCH  ((CLOCK_TICK_RATE + HZ/2) / HZ)  /* For divider */

CLOCK_TICK_RATE is defined in include/asm/timex.h and is equal to 1193180.
The reason behind this division is quite straightforward. LATCH is a generic
macro, suitable for every controller of every supported architecture, while
CLOCK_TICK_RATE depends on the frequency of the quartz oscillator, and so is
architecture dependent (adding HZ/2 before dividing just rounds the result to
the nearest integer). So the result is : 11932.

Before looking at how that value is set in the counter of timer 0 it's
necessary to spend a few words on the ports the PIT is linked to.

The PIT is linked to 4 ports:

- 0x40 - Channel 0 counter
- 0x41 - Channel 1 counter
- 0x42 - Channel 2 counter
- 0x43 - Mode Control Register

Port 0x43 is the one of most interest for us since, before setting the
counter, we have to "tell it" how to behave. That is achieved through an out
on port 0x43.

The Mode Control Register is made of 8 bits:

    | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
    |_______|_______|___________|___|
     counter  R/W/L     mode     BCD

- bit 0 acts as a switch to indicate how the counter value will be passed,
  whether as 16 bit binary or as 4-decade BCD.
- bits 1, 2 and 3 select one of the 6 possible modes (for a description of
  each mode check Reference [5]). The mode that we need to set is n. 2,
  which generates a pulse every time the programmed number (counter) of clock
  cycles has elapsed.
- bits 4 and 5 are the Read/Write/Latch format bits: we will set both of them
  to 1, signaling that we will pass the value to set in the counter, to the
  port selected with bits 6 and 7, with two consecutive outs, LSB first and
  then MSB.
- bits 6 and 7 identify which counter we will modify and so on which port the
  outs are expected. We will set them to 0, selecting the counter linked to
  channel 0.

In the end, so, the bitmask to pass is 00110100, that is 0x34 in hex, and
it's exactly what the kernel does in arch/i386/kernel/i8259.c :

        outb_p(0x34,0x43);              /* binary, mode 2, LSB/MSB, ch 0 */
        outb_p(LATCH & 0xff , 0x40);    /* LSB */
        outb(LATCH >> 8 , 0x40);        /* MSB */

Everything should be quite clear and there should be no need for further
explanation. The last thing to do is to show a practical application. The
module I present here is written in at&t syntax (gas) and lets you pass as a
parameter the value of the counter you want to set. The default value raises
the clock frequency to 1000Hz... try to insmod it and launch some interactive
programs, like top, or execute shell commands. If you have the framebuffer,
your cursor will behave madly :)

<-| fhrp/citf.s |->
/*
 * citf.s
 *
 * Change the interrupt timer frequency via lkm
 *
 * This module gets the value to set in the counter of channel 0 via
 * parameter 'freq' and sets it. Default value is 0x4a9, the value you'll
 * get setting HZ to 1000 inside kernel
 *
 * I list here some values you could find useful to know :
 *
 * for HZ == 100  (default in linux kernel) -> 0x2e9c
 * for HZ == 1000 (default in this lkm)     -> 0x4a9
 * for HZ == 50                             -> 0x5d38
 *
 * Example : insmod citf.o freq=0x5d38
 */

.globl init_module
.globl cleanup_module
.globl freq

.data
        .align 4
        .size freq, 4
freq:
        .long 0x4a9

.text
        .align 4

start:

init_module:
        pushl %ebp
        movl %esp,%ebp
        xorl %eax, %eax
        pushfl                  /* save the interrupt state */
        cli
        movb $0x34, %al         /* binary, mode 2, LSB/MSB, ch 0 */
        outb %al, $0x43
        movw freq, %ax
        outb %al, $0x40         /* LSB */
        movb %ah, %al
        outb %al, $0x40         /* MSB */
        popfl
        xorl %eax, %eax         /* return 0 */
        leave
        ret

cleanup_module:
        pushl %ebp
        movl %esp,%ebp
        xorl %eax, %eax
        pushfl
        cli
        movb $0x34, %al
        outb %al, $0x43
        movw $0x2e9c, %ax       /* back to HZ == 100 */
        outb %al, $0x40
        movb %ah, %al
        outb %al, $0x40
        popfl
        leave
        ret

.globl __module_parm_freq

.section .modinfo
__module_kernel_version:
        .ascii "kernel_version=2.4.9\0"
__module_parm_freq:
        .ascii "parm_freq=i\0"
<-X->

The following is a simple C program that computes the values to pass as a
parameter. It is just meant to be handy, since I hope it's clear by now how
they are computed.

<-| fhrp/freq.c |->
#include <stdio.h>
#include <stdlib.h>

#define MY_LATCH(x)  ((1193180 + (x)/2) / (x))  /* For divider */

int main(int argc, char **argv)
{
        int i = (argc > 1) ? atoi(argv[1]) : 0;

        if (i <= 0)
                return 1;       /* usage: freq <HZ> */
        printf("%x\n", MY_LATCH(i));
        return 0;
}
<-X->


---[ FHRP: purposes, ideas and implementation

After this "theoretical" part it's time to present the tool and analyze its
purposes and the ideas that led to its writing. At the end of the
presentation we will give some possible ideas to improve it or to extend some
of its functions.


------] Purposes

The purpose of FHRP is, in the very end, just one: find processes that have
been hidden by the attacker.
FHRP makes an assumption: processes have been hidden by removing them *at
least* from the task_struct doubly linked list (the reason will be clear in a
minute). Its aim is to find a process that, even though removed from whatever
kernel list (runqueue, pidhash list, task_list..), receives a "quantum of
CPU", be it through a low-level manual switch or through a heavy and invasive
modification of the scheduler (I'm not ignoring all the difficulties of
implementing a manual switch or such an invasive modification of the
scheduler, it's just the _extreme_ case and we want to be consistent in that
situation too).

The very first thing we need to achieve our aim is a way to recognize
processes, a sort of valid signature that discriminates "good" processes from
"evil" ones. The choice fell on the cr3 (control register) value, because :

- It's essential to the execution of the process

  The cr3 register (aka PDBR - Page Directory Base Register) keeps the
  physical address of the base of the page directory and two flags (PCD and
  PWT). Only the 20 most significant bits of the address are specified, while
  the other 12 are assumed to be 0.
  Precisely because it's essential to the execution of the process, it can't
  be "faked".

- It's fast/easy to retrieve

  The cr3 value of every process can be obtained from its task_struct, in
  task->mm->pgd (to compare it we have to remember to "translate" it into a
  physical address, with __pa() ), so it's fast to retrieve it to set up the
  database of valid cr3s. Moreover it's fast to retrieve it while the process
  is running on the CPU too (movl %cr3, %eax), and that is good news, since
  we're in interrupt time and the clock is raised to 1000Hz.

The cr3 is loaded by the kernel at context-switch time by switch_mm() , with
a simple inline assembly instruction:

        asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd)));

Now that we have our signature we need a point where we can sit and
constantly check the CPU and the cr3 register value.
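To fix the idea of "cr3 as a signature", here is a user-space sketch of the
comparison step. It is an assumption-laden illustration, not code taken from
the module: cr3_is_known(), CR3_FLAGS_MASK and the flat array of valid values
are hypothetical, and the values passed in are plain numbers rather than
anything read from a real register. The only facts it relies on are the ones
stated above: the page directory base sits in the 20 most significant bits,
the low 12 bits carry flags and must not take part in the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Low 12 bits of cr3: flags (PCD, PWT) and reserved bits, not address. */
#define CR3_FLAGS_MASK 0xFFFu

/*
 * Returns 1 if the page directory base found in `cr3` matches one of the
 * `n` entries of the `valid` table, 0 otherwise (hidden-process candidate).
 */
int cr3_is_known(uint32_t cr3, const uint32_t *valid, size_t n)
{
    uint32_t base = cr3 & ~(uint32_t)CR3_FLAGS_MASK;
    size_t i;

    for (i = 0; i < n; i++)
        if ((valid[i] & ~(uint32_t)CR3_FLAGS_MASK) == base)
            return 1;
    return 0;
}
```

A linear scan is used only to keep the sketch short; with a check running at
every timer tick, the table layout is a design choice worth some thought.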
[Note: FHRP code is divided into 4 files: 3 .c sources and a .h include. We
       are going to analyze their most important characteristics; read the
       rest, if you're interested, directly inside the files themselves]


------] cr3-timer.c

Inside this file there are the raise_timer() and restore_timer() functions,
which let fhrp raise the timer interrupt frequency at insmod time and bring
it back down at rmmod time. If you've read case study n.3 you should have no
problem in understanding them (it's nothing more than the C transposition of
citf.s); if you skipped it, the only thing you need to know is that, for as
long as the module is linked to the kernel, a timer interrupt will be raised
every ms. The decision to go up to 1000Hz has been taken to have more chances
to find a process hidden on the CPU, and the little overhead that our
checking functions create during interrupt time showed no particular
balancing problem.

handler_new() is the new handler that we set for the timer interrupt. That
handler computes from HZ and MY_HZ how often the original handling routine
(and so the scheduler) has to be called, to avoid a change in the succession
of context switches. The check of the active cr3 against the valid-cr3 list
is executed inside handler_new() as well.

As we understand by following timer.c and the various functions called, there
are many points where we could hook to get more or less the same effect.
We've decided to place ourselves at the top of the chain, by replacing the
address contained inside the struct irqaction irq0 (the address of the timer
interrupt handler) with the address of our new function. That way we also
override other possible hooks that aim to somehow "manage" hidden processes.
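The frequency split performed by handler_new() can be sketched in user space
as a simple tick divider. This is a guess at the general shape, under stated
assumptions, and not the module's code: MODEL_HZ, MODEL_MY_HZ, model_ticks and
model_tick_should_chain() are hypothetical names, and MODEL_MY_HZ is assumed
to be an exact multiple of MODEL_HZ. In the real handler the cr3 check would
run on every call, while the original timer_interrupt routine would be
chained only when the function below says so.

```c
#define MODEL_HZ     100    /* rate the kernel was built for (assumed) */
#define MODEL_MY_HZ  1000   /* rate the PIT is reprogrammed to (assumed) */

static unsigned int model_ticks = 0;

/*
 * To be called at every "fast" timer interrupt.
 * Returns 1 when the original handler should run as well, that is once
 * every MODEL_MY_HZ / MODEL_HZ calls, and 0 otherwise.
 */
int model_tick_should_chain(void)
{
    if (++model_ticks >= MODEL_MY_HZ / MODEL_HZ) {
        model_ticks = 0;
        return 1;
    }
    return 0;
}
```

With the values above, nine calls out of ten only give room for the
signature check and the tenth also lets the kernel see its usual 10ms tick,
so jiffies and time quanta keep advancing at the pace the kernel expects.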
How an interrupt is handled is outside the scope of this article, so I won't
spend time on how the IDT is handled and on the mechanism linked to the
switch to kernel land. What is interesting, in order to understand how FHRP
works, is rather to show how ISRs (Interrupt Service Routines) are invoked.
The fundamental struct that keeps the ISR is the struct irqaction

struct irqaction {
        void (*handler)(int, void *, struct pt_regs *);
        unsigned long flags;
        unsigned long mask;
        const char *name;
        void *dev_id;
        struct irqaction *next;
};

- the first field identifies the ISR itself, that is the function that will
  handle the interrupt;
- flags sets the way this routine has to be executed.. among the most
  important settings we list the possibility of executing the routine with
  interrupts disabled (SA_INTERRUPT) and the possibility of sharing the IRQ
  line with other devices (SA_SHIRQ);
- the name field is just an identifier that gives the name of the I/O device;
- dev_id identifies the Major/Minor device number;
- next, finally, is a pointer to another struct irqaction, which makes it
  possible, if the IRQ line is shared (SA_SHIRQ), to have a list of structs,
  each one linked to its device and its routine.

Since we're working with the irq0 struct, let's see how it's declared :

static struct irqaction irq0 =
        { timer_interrupt, SA_INTERRUPT, 0, "timer", NULL, NULL};

The address of the struct irq0, obtained from nm or System.map, is at the
basis of the set_irq() and restore_irq() functions, which respectively set
our new struct irqaction (and so our new handler) and bring things back to
the original state.


------] cr3-func.c

Inside this file there are some of the functions essential to the working of
FHRP, which noticeably raise the probability, already quite high, of finding
a hidden process.
The first two functions we analyze are stop_all_process_safe() and
resume_all_process() , which deal, respectively, with "STOP"ping the running
processes and waking them up again when the work is done.
The advantage that blocking all the good ones gives us is a higher chance
that a hidden process gets scheduled, since the scheduler is left, de facto,
with the possibility of giving the CPU only to kthreads, "safe" processes
(we will talk about them in a couple of lines) and, obviously, the hidden
ones.
stop_all_process_safe() lists all the legal processes active on the box at
insmod time (walking the task_struct doubly linked list with list_for_each)
and stops them one at a time with a force_sig(SIGSTOP, p).
However, not *all* processes are stopped, as we see from the checks :

....
if(p->mm)
....
if((t != pid_bash_safe) && (t != SAFE_P_KLOGD) && (t != SAFE_P_SYSLOGD) &&
   (t != SAFE_P_INIT))
....
if((p->state != TASK_UNINTERRUPTIBLE) && (p!=current))
....

These are, in fact, left running:
- kernel threads -> kernel threads, as we said, don't have a mm_struct (since
  they directly reference the memory mapped in kernel space) and so every
  attempt to reach the pgd, besides being absolutely pointless, would end in
  a segfault;
- the bash parent of insmod -> its pid is retrieved from p->p_opptr->pid and
  it leaves us a shell from which to launch rmmod (thus avoiding a complete
  freeze of the box) and, if you want, other programs (but remember that
  their cr3s will be interpreted as "evil" and, if they use
  schedule_timeout() , they won't be woken up again);
- klogd and syslogd -> they let our module send messages to the console;
- init;
- processes in TASK_UNINTERRUPTIBLE state -> first of all, sending a signal
  to those processes would be of no immediate use: they will keep on sleeping
  until the end of the timeout and/or of the operation (generally I/O).
  Moreover, processes in TASK_UNINTERRUPTIBLE are seldom present and, even if
  they needed to wake up, they would be correctly handled by the
  process_timeout() hook;
- current -> stopping the current process gives trouble in many situations.
  In our case current *is* insmod, so stopping it wouldn't let our module
  load properly and would probably lead to a machine freeze. Moreover, we
  aren't particularly interested in insmod, since its destiny is to exit
  right after loading our module.

resume_all_process() is nothing more than the opposite of
stop_all_process_safe() and again uses force_sig() , this time to send a
SIGCONT (if some processes have trouble reacquiring the tty, a bunch of "fg"
should be enough).

The last, but crucial, function we find there (and that was already
"introduced") is take_global_page_dir() , which retrieves the cr3 of the
currently running process.

unsigned long int take_global_page_dir(void)
{
        unsigned long int cr3;

        /* read %cr3 through %eax and return it explicitly */
        __asm__ __volatile__ ("movl %%cr3, %%eax" : "=a" (cr3));
        return cr3;
}


------] cr3-main.c

This is the heart of the module and, besides init_module and cleanup_module
(which set up and clean the module environment), there are some other
interesting functions.
First of all the table of "allowed" processes is set up, with the
routine_set_table() ; after that the process_timeout() hook is set.
This hook is quite simple (like every hook to a wrapper) and gives us a way
to check all the processes that "wake up" (and that will probably go back to
sleep right away, the resource not being available, without receiving cpu in
userspace). Only authorized processes are allowed to wake up, while the
others are lost in the kernel and 99% of the time (if the attacker hasn't
set a parallel check) they won't get the CPU anymore, becoming harmless.

The second function that we analyze is check_listening_socket() . This
function gives us the possibility to retrieve hidden processes that are
listening, after an accept(), on a given port.
First of all let's clarify why it is necessary.
Let's take as an example a tcp socket listening on a port (a common
backdoor, f.e.). The last step of the accept, on which it will sleep, is in
net/ipv4/tcp.c and is wait_for_connect() .

static int wait_for_connect(struct sock * sk, long timeo)
{
        DECLARE_WAITQUEUE(wait, current);
        int err;

        add_wait_queue_exclusive(sk->sleep, &wait);
        for (;;) {
                current->state = TASK_INTERRUPTIBLE;
                release_sock(sk);
                if (sk->tp_pinfo.af_tcp.accept_queue == NULL)
                        timeo = schedule_timeout(timeo);
[...]

This is the part we're interested in. The waitqueue is declared, there's a
check on whether someone is in 'accept_queue' for the socket and, if not,
schedule_timeout() is invoked.
However, the hook on process_timeout is not helpful in this situation: in
the classic case that we're considering, timeo is set to
MAX_SCHEDULE_TIMEOUT . As we saw in the case study about schedule_timeout() ,
a value of MAX_SCHEDULE_TIMEOUT doesn't set any timer, but just invokes the
scheduler and puts the process to sleep until a signal (generally SIGIO)
manages to wake it up.

At this point we need :
- a way to list all listening sockets;
- a way to retrieve, from the struct sock of the listening socket, the
  task_struct of the controlling process.

The solution should be quite straightforward looking at the code: we take
the various hash tables of listening sockets (tcp, udp and raw), then, once
the sock struct is tracked down, we get the wait_queue_head struct from
sk->sleep . Walking that one we get a list of "possible" struct wait_queue ,
from which we can track down the struct task_struct and do the check there.
That check is done by check_wait_process() . When the result is reported, at
rmmod time, the port and the kind of socket are reported too.
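The walk just described (hash chain -> struct sock -> sk->sleep ->
task_struct -> cr3 check) can be sketched in plain userland C. It's only a
sketch under loud assumptions: the mock_* structs are simplified stand-ins
invented for this example, not the real 2.4 layouts, and scan_listeners()
and cr3_is_known() are hypothetical names, not the check_listening_socket()
and check_wait_process() code of FHRP.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structs (NOT the real layouts) */
struct mock_task   { int pid; unsigned long cr3; };
struct mock_waiter { struct mock_task *task; struct mock_waiter *next; };
struct mock_sock   { unsigned short port;
                     struct mock_waiter *sleep;   /* plays sk->sleep */
                     struct mock_sock *next; };   /* next in hash chain */

/* Returns 1 if cr3 is in the table of allowed values, 0 otherwise */
static int cr3_is_known(const unsigned long *table, size_t n,
                        unsigned long cr3)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (table[i] == cr3)
            return 1;
    return 0;
}

/*
 * Walk one chain of listening sockets and, for every sleeper whose cr3 is
 * not in the table, store the socket port in ports[]. Returns the number
 * of ports stored, never more than max.
 */
static size_t scan_listeners(const struct mock_sock *chain,
                             const unsigned long *table, size_t n,
                             unsigned short *ports, size_t max)
{
    const struct mock_sock *sk;
    const struct mock_waiter *w;
    size_t found = 0;

    for (sk = chain; sk != NULL; sk = sk->next)
        for (w = sk->sleep; w != NULL; w = w->next)
            if (w->task != NULL && found < max &&
                !cr3_is_known(table, n, w->task->cr3))
                ports[found++] = sk->port;
    return found;
}
```

The same two nested loops would be repeated for each hash table (tcp, udp
and raw); only the way of reaching the first socket of a chain changes.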
Another possible idea (and maybe the most natural one) was to track down the
struct file from the struct sock and do a sort of pattern matching with the
struct file reached from the task_struct, but that would have altered the
nature of FHRP, which is the cr3 check.

[Note: if you give a quick look at net/netsyms.c you'll see that two symbols
that are of direct interest to us (udp_hash and tcp_hashinfo) are exported
only if at least one among CONFIG_IPV6_MODULE , CONFIG_KHTTPD and
CONFIG_KHTTPD_MODULE is set. The problem is quickly solved by hooking two
other functions but, since fhrp is a tool for admins, that hasn't been added
to the code (if you want to, a bunch of #ifdef should be enough). The
fastest solution remains, obviously, to recompile with, f.e.,
CONFIG_IPV6_MODULE set]


-----] config.h

The FHRP header contains the #define needed to hook some functions (f.e.
process_timeout() ) or to handle some structs at kernel level (f.e. irq0 or
the listening raw socket hash table); they have to be set correctly using nm
or an up to date System.map . The name of the "string" to look for appears
in the comments next to every #define .
In the same file you should set the pids of syslogd and klogd: since those
two daemons are started at system startup, they should generally keep the
same pid after a reboot too.
The last two things you could be interested in setting are MY_HZ, which sets
the frequency to raise the clock to, and MAX_RESULTS, which sets the maximum
number of results to report at rmmod time.
Both are set to values that, generally, should behave just fine... If
exactly 10 results are reported, it may be worth, to be sure, insmod-ing the
module again with MAX_RESULTS increased. If you want, with a simple addition
you could let the maximum number of results be set at insmod time with a
MODULE_PARM .

The last part of config.h we spend some words on is compare_cr3() .
The function is declared static inline to avoid a CALL (we're in interrupt
time and even a few cycles less are welcome... without forgetting that a
call can stall the pipeline).
This function has been structured to be as optimized as possible, for
example with a sort of cache that keeps in memory the latest cr3 found
(indeed, with the clock raised ten times, it's quite probable that the same
evil cr3, whenever the program runs on the cpu for a quantum or more, is
found over and over again). The cache lets us avoid walking the list of
found cr3s every time... don't forget we're in interrupt time.


-----] To the end and yet again

Let's start with how it works... it should already be quite clear that this
is not a module meant to stay resident. Just the opposite: a few seconds
should be enough, the time necessary for a couple of epochs to end, and you
should have an image of what is running on your box (at least if you don't
need to be sure against a sleep(100) or similar long sleeps, but repeated
cross-checks should find those too).

Moreover, this module always returns a FALSE POSITIVE. It's the cr3 value of
rmmod... we opted to let fhrp report it, printing the value beforehand, for
more safety. It's quite obvious that a simple hidden_task - 1 erases that
false positive... but why not be paranoid?

Let's see an example of the module at work :

root@twiz:/home/twiz/cr3/cr3-dev# insmod fhrp.o
[snip]
Pid: 79 Context: 2577000
Pid: 83 Context: 25da000
Pid: 85 Context: 2523000
Pid: 93 Context: 3c9c000
Pid: 94 Context: 3e1a000
Pid: 95 Context: 3cd1000
Pid: 96 Context: 3bd4000
Pid: 97 Context: 3b75000
Pid: 98 Context: 3ccd000
Pid: 99 Context: 24af000
Pid: 100 Context: 233d000
Pid: 101 Context: 23cc000
Pid: 194 Context: 3a36000
[snip]
Setting up process_timeout hook..
root@twiz:/home/twiz/cr3/cr3-dev# rmmod fhrp
Restoring process_timeout..
Ripristining all process...
Leaving Module
Hidden Processes Foud : 1
Current-> it should _likely_ be rmmod: 2139000
cr3 malign : 2139000 pid : 672 got from Interrupt handler
root@twiz:/home/twiz/cr3/cr3-dev#

The cr3 malign reported is nothing but the false positive we were just
talking about. If you want to test the consistency of the module against a
listening backdoor or a process that you run, you can simply make it
"forget" to collect that cr3 during the table setup (a simple
if (p->pid == pidtoforget) ) or try some lkm that hides a process by
removing it from the task list. We got good results :)

Let's go on with something we already said, but that is quite important:
this lkm isn't a panacea. If a trojaned ps, some code that modifies proc or
a bunch of syscalls (getdents?) is hiding the process, this module can't do
anything but list all the running processes at insmod time.
There are other ways to check: first of all walk the task_struct list (just
like the module does at insmod time), use a safe ps or check the md5sum (if
the redirect isn't at kernel level), or use KSTAT to check syscalls.
That means that there isn't *the* tool, but a combined work of various tools
when it's necessary to check the integrity of a box. This LKM comes in aid
by finding the more complex hides, those that remove the process from the
known lists, playing low-level and relying very little on kernel functions.

Nothing is 100% safe when both the attacker and the admin can play in
kernel-land. The attacker could have changed create_module() to do pattern
matching on some "signature opcodes", looking for, f.e., movl %cr3, %eax or
something else. At that point we could obfuscate the code a little with
random nops (a nop isn't necessarily \x90 ;)) or just, in the case of
movl %cr3, %eax, change it into movl %cr3, %ebx; movl %ebx, %eax (control
registers can only be moved to and from general registers)... and so on.
The attacker could do a statistical check on force_sig, looking for many
SIGSTOPs in a short time (and with incremental pids too), and thus stop his
process until a reasonable number of SIGCONTs has passed through
force_sig... and we could just not use force_sig at all and stop the
processes manually (changing their state). Besides, such a change to
force_sig by the attacker should be quite visible when checking vmlinux
opcodes, looking for, f.e., a suspicious movl %eax/jmp *%eax, pushl/ret and
so on.
The very fact that the module has been written mostly at low level and is
not resident gives us a good level of protection but, obviously, not 100%
safety :)

The module as it is leaves room for some nice extensions that aren't
included in the release version. Among those :
- Check of kernel threads - There's no check implemented at kspace (mostly
  because the cr3 approach isn't applicable); however the cr3 isn't the only
  consistent signature. Positive tests have been carried out using the %esp
  value (p->thread.esp) as signature. A quick look at the implementation of
  get_current/GET_CURRENT should give you the idea.
- Manual check on other waitqueues - This is possible too, through the
  wait_queue_t internal to the task_struct . The function used for sockets
  is intentionally quite generic, so that it can be applied to other
  scenarios.
- Destroying found processes - That is possible too. We have p->thread.esp ,
  so (if you gave a look at get_current it should be clear how) we know how
  to reach the task_struct linked to the process. At that point we can
  simply behave like sys_exit would and release mm locks, open file
  descriptors (etc.), possible links, and free the memory. We can find out
  any information about the hidden process... but (there is a but) few of
  those fields are *absolutely necessary* (think of a manual switching) and
  the attacker could have modified them on purpose to make our module crash.
- The irq0 isn't the only point where you can hook to have a time-based
  check, it's possible to use the RTC too... moreover, in an SMP context
  some things may have to be modified... Reference [6] gives some more
  details.

That said, we think (and hope) you'll find this tool interesting; similarly,
we hope you've found the information inside this paper useful/interesting.
For any doubt, criticism, suggestion, patch & co contact us.
Before leaving you to the references, some thanks to vecna, Dark-Angel, albe
and rene @ irc.kernelnewbies.org. A hello goes to the racl guys
(racl.oltrelinux.org) and _oink (thanks for the postcard ;) ndtwiz).


---[ References

[1] - Modern Operating Systems - Second Edition - Andrew S. Tanenbaum
[2] - Linux Kernel Internals 2.4 - Tigran Aivazian
      http://www.moses.uklinux.net/patches/lki.html
[3] - Understanding the Linux Kernel - Bovet, Cesati - Ch10 "Scheduling"
      http://www.oreilly.com/catalog/linuxkernel/chapter/ch10.html
[4] - For Kernel_Newbies By a Kernel_Newbie - A.R.Karthick
      http://www.freeos.com/articles/4536/
[5] - http://www.nondot.org/sabre/os/articles/MiscellaneousDevices/
[6] - Timer-related functionality in Linux kernels 2.x.x - Andre Derric Balsa
      http://www.cse.msu.edu/~zhengpei/tech/Linux/timerin2.2.htm
[7] - Linux Kernel Sources 2.4.*


-[ WEB ]----------------------------------------------------------------------

        http://www.bfi.cx
        http://bfi.freaknet.org
        http://www.s0ftpj.org/bfi/


-[ E-MAiL ]-------------------------------------------------------------------

        bfi@s0ftpj.org


-[ PGP ]----------------------------------------------------------------------

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: 2.6.3i

mQENAzZsSu8AAAEIAM5FrActPz32W1AbxJ/LDG7bB371rhB1aG7/AzDEkXH67nni
DrMRyP+0u4tCTGizOGof0s/YDm2hH4jh+aGO9djJBzIEU8p1dvY677uw6oVCM374
nkjbyDjvBeuJVooKo+J6yGZuUq7jVgBKsR0uklfe5/0TUXsVva9b1pBfxqynK5OO
lQGJuq7g79jTSTqsa0mbFFxAlFq5GZmL+fnZdjWGI0c2pZrz+Tdj2+Ic3dl9dWax
iuy9Bp4Bq+H0mpCmnvwTMVdS2c+99s9unfnbzGvO6KqiwZzIWU9pQeK+v7W6vPa3
TbGHwwH4iaAWQH0mm7v+KdpMzqUPucgvfugfx+kABRO0FUJmSTk4IDxiZmk5OEB1
c2EubmV0PokBFQMFEDZsSu+5yC9+6B/H6QEBb6EIAMRP40T7m4Y1arNkj5enWC/b
a6M4oog42xr9UHOd8X2cOBBNB8qTe+dhBIhPX0fDJnnCr0WuEQ+eiw0YHJKyk5ql
GB/UkRH/hR4IpA0alUUjEYjTqL5HZmW9phMA9xiTAqoNhmXaIh7MVaYmcxhXwoOo
WYOaYoklxxA5qZxOwIXRxlmaN48SKsQuPrSrHwTdKxd+qB7QDU83h8nQ7dB4MAse
gDvMUdspekxAX8XBikXLvVuT0ai4xd8o8owWNR5fQAsNkbrdjOUWrOs0dbFx2K9J
l3XqeKl3XEgLvVG8JyhloKl65h9rUyw6Ek5hvb5ROuyS/lAGGWvxv2YJrN8ABLo=
=o7CG
-----END PGP PUBLIC KEY BLOCK-----


==============================================================================
-----------------------------------[ EOF ]------------------------------------
==============================================================================