NO MEMORY IS LEAKED ONCE PROCESS EXITS

When we write short-lived programs, we might allocate some space using malloc() and forget to free it. The program runs and then completes successfully.

Strictly speaking, it is wrong not to call free() on the allocated memory. But forgetting it here does not leave any memory leaked once the process has exited. The reason is simple: two levels of memory management are at work.

  1. The memory management performed by the OS: it hands out memory to processes as they run and takes it back when they exit.
  2. The memory management performed by the user: the program manages this within the process by explicitly calling malloc() and free().

When the process ends, no matter what the state of heap/stack is, everything is reclaimed by the OS, thus ensuring no memory is leaked.

But not calling free() is an issue in a long-running program, such as a web service or any other service that runs for a long time: memory that is never freed keeps accumulating for as long as the process lives.

Common Memory Management errors:

There are a number of common errors that arise in the use of malloc() and free().

  1. Forgetting To Allocate Memory
  2. Not Allocating Enough Memory
  3. Forgetting to Initialize Allocated Memory
  4. Forgetting To Free Memory
  5. Freeing Memory Before You Are Done With It
  6. Freeing Memory Repeatedly
  7. Calling free() Incorrectly
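Several of these errors can be avoided with a couple of small habits. The following is only a minimal sketch: dup_string() and free_and_clear() are hypothetical helper names, not part of any library. The first allocates enough room (including the terminating '\0'), checks the result of malloc(), and initializes every byte; the second frees a pointer and sets it to NULL so that a repeated call cannot turn into a double free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL on failure. */
char *dup_string(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src) + 1;   /* +1 for the terminating '\0' (error 2) */
    char *copy = malloc(len);       /* allocate before use (error 1) */
    if (copy == NULL)
        return NULL;                /* malloc() can fail; always check */
    memcpy(copy, src, len);         /* every allocated byte is initialized (error 3) */
    return copy;
}

/* Hypothetical helper: frees *p and clears it, so a repeated call becomes
 * a harmless free(NULL) instead of a double free (errors 4 to 6). */
void free_and_clear(char **p)
{
    assert(p != NULL);
    free(*p);
    *p = NULL;
}
```

A caller would write char *s = dup_string("hello"); and later free_and_clear(&s);. For error 7, the rule is that free() must only ever receive a pointer exactly as malloc() returned it, never a pointer into the middle of a block.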

It Compiled or It Ran != IT IS CORRECT

Just because a program compiled (with or without warnings), or ran correctly once or even an impressive number of times, does not mean that the program is correct.

Many circumstances may have conspired to make the program appear to work, and then a slight change in any one of them makes it fail.
A common programmer’s reaction is: “But it was working last time and many times! This is wrong! The compiler is wrong! The OS is faulty!”

But the problem is usually right where you think it would be, in your code. Get to work and debug it before you blame those other components.


Writing a simple kernel module

Why do we need a kernel module?

Sometimes we need to carry out a privileged operation that is not available in ring 3. Linux kernel modules are a way to run code in ring 0. Although Linux provides a lot of APIs, the need for a kernel module still arises sometimes.

A Linux kernel module is a piece of compiled binary code that is inserted directly into the Linux kernel, running at ring 0, the lowest-numbered and most privileged (least restricted) ring of execution in the x86-64 processor. Code here runs completely unchecked but operates at incredible speed and has access to everything in the system.

Getting Started

Make a folder where you would put your kernel module code.

$ mkdir sample_module

Create a file for the main module code and enter the sample contents below. Here the file is named technicalityinside.c.

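#include <linux/init.h>    /* module_init(), module_exit() */
#include <linux/module.h>  /* MODULE_LICENSE(), MODULE_AUTHOR() */

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Akash Panda");

/* Called when the module is loaded; returning 0 signals success. */
static int technicalityinside_init(void) {
        printk(KERN_ALERT "Module loaded\n");
        return 0;
}

/* Called when the module is removed. */
static void technicalityinside_exit(void) {
        printk(KERN_ALERT "Goodbye cruel world\n");
}

/* Register the two functions above as the load and unload entry points. */
module_init(technicalityinside_init);
module_exit(technicalityinside_exit);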

Now we have the simplest of all modules. Let us go through it line by line.

The #include lines pull in the header files required for Linux kernel module development.
MODULE_LICENSE accepts several license strings:

  • “GPL” [GNU Public License v2 or later]
  • “GPL v2” [GNU Public License v2]
  • “GPL and additional rights” [GNU Public License v2 rights and more]
  • “Dual BSD/GPL” [GNU Public License v2 or BSD license choice]
  • “Dual MIT/GPL” [GNU Public License v2 or MIT license choice]
  • “Dual MPL/GPL” [GNU Public License v2 or MPL license choice]

We define both the init (loading) and exit (unloading) functions as static. The init function returns an int (0 on success), while the exit function returns nothing (void).

Please note that at the end of the file we invoke the module_init and module_exit macros. They register our functions as the entry and exit points, which lets us name the init and exit functions as we like.

Now let us look at writing the Makefile. Note that each recipe line must begin with a tab character, not spaces.

obj-m += technicalityinside.o
all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Once the Makefile is ready, we can run make to compile our module.

Once it has compiled, we are ready to insert the module and test it.

$ sudo insmod technicalityinside.ko

Once we have loaded the module, we can see its output in the kernel log (dmesg shows the same messages on systems without /var/log/kern.log):

$ tail -4 /var/log/kern.log

Once we see that our module is working, we can remove it with the rmmod command.
To learn how to load and unload kernel modules, follow this article.

Linux x86 ring usage overview

In x86 protected mode, the CPU is always in one of 4 rings. The Linux kernel only uses 0 and 3:

  • 0 for kernel
  • 3 for users

This is the most clear-cut definition of kernel versus userland.

Why does Linux not use rings 1 and 2?

Intel’s intent in providing rings 1 and 2 was for the OS to put device drivers at those levels, so that they are privileged but somewhat separated from the rest of the kernel code.

Rings 1 and 2 are, in a way, “mostly” privileged. They can access supervisor pages, but if they attempt to use a privileged instruction, they still raise a general protection fault (GPF), just as ring 3 would. So they are not a bad place for drivers, as Intel planned.

VirtualBox, a virtual machine monitor, puts the guest kernel code in ring 1. Some operating systems may use these rings, but it is not a popular design at present.

What can each ring do?

  • ring 0 can do anything
  • ring 3 cannot run several instructions or write to several registers, most notably:
    • cannot change its own ring!
    • cannot modify the page tables.
    • cannot register interrupt handlers.
    • cannot execute IO instructions like in and out, and thus cannot make arbitrary hardware accesses.
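The restriction on privileged instructions can be observed from userland. The sketch below assumes an x86 Linux system and a GCC or Clang compiler (for the inline assembly); the function name is hypothetical. It executes HLT, a ring 0 only instruction, in a child process. The CPU raises a general protection fault, which Linux delivers to the process as SIGSEGV.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the privileged HLT instruction in a child
 * process. Returns the signal that killed the child, 0 if the child
 * exited normally, or -1 on error or on a non-x86 build. */
int signal_from_privileged_instruction(void)
{
#if defined(__x86_64__) || defined(__i386__)
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        __asm__ volatile ("hlt");  /* allowed in ring 0 only; faults in ring 3 */
        _exit(0);                  /* reached only if HLT did not fault */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    /* Without WUNTRACED, waitpid() reports only terminated children. */
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
#else
    return -1;                     /* the experiment is x86 specific */
#endif
}
```

The fork() keeps the experiment contained: only the child is killed, and the parent merely inspects how it died.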

What is the point of having multiple rings?

There are two major advantages of separating kernel and userland:

  • It is easier to write programs, as you can be more certain that one won’t interfere with another. E.g., thanks to paging, one userland process does not have to worry about overwriting the memory of another program, nor about putting hardware in an invalid state for another process.
  • It is more secure. E.g., file permissions and memory separation can prevent a malicious app from reading your bank data. This supposes, of course, that you trust the kernel.


Kernel module – loading and removing

Kernel modules are pieces of code that can be loaded into and unloaded from the kernel on demand.

Kernel modules offer an easy way to extend the functionality of the base kernel without having to rebuild the kernel. Most drivers are implemented as Linux kernel modules. When a driver is not needed, we can unload just that driver, which reduces the memory footprint of the running kernel.

Kernel modules have a .ko extension. On a typical Linux system, they reside inside the /lib/modules/<kernel_version>/kernel/ directory.

1. lsmod – List Modules That Are Already Loaded
The lsmod command lists the modules that are already loaded in the kernel, as shown below.

$ lsmod
Module                  Size  Used by
ipt_MASQUERADE         16384  3
nf_nat_masquerade_ipv4 16384  1 ipt_MASQUERADE
iptable_nat            16384  1
nf_nat_ipv4            16384  1 iptable_nat
nf_nat                 32768  2 nf_nat_masquerade_ipv4,nf_nat_ipv4
nf_conntrack_ipv4      16384  5
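lsmod is essentially a formatter for the contents of /proc/modules, so the same listing can be produced from a program. The following is a rough sketch rather than a reimplementation of lsmod; list_modules() is a hypothetical name, and only the first three fields of each line (name, size, use count) are read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints "name size use-count" for each entry of a
 * file in /proc/modules format. Returns the number of entries printed,
 * or -1 if the file cannot be opened. */
int list_modules(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char line[1024];
    int count = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        char name[64];
        unsigned long size;
        int refs;
        /* First three fields: module name, size in bytes, use count.
         * Lines that do not match this shape are skipped. */
        if (sscanf(line, "%63s %lu %d", name, &size, &refs) == 3) {
            printf("%-24s %8lu %d\n", name, size, refs);
            count++;
        }
    }
    fclose(fp);
    return count;
}
```

Calling list_modules("/proc/modules") on a running system should print one line per loaded module; taking the path as a parameter also makes the function easy to try on a saved copy of the file.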

2. insmod – Insert Module into Kernel
The insmod command inserts a new module into the kernel.

$ insmod hello.ko 
$ lsmod | grep "hello"
hello 16384 0

3. modinfo – Display Module Info
The modinfo command displays information about a kernel module, as shown below.

$ modinfo hello.ko
filename:       /home/akash/data/project/code/kernel-modules/SimplestLKM/hello.ko
author:         maK
license:        GPL
srcversion:     C7C2D304485DDC1C93263AE
depends:
retpoline:      Y
name:           hello
vermagic:       4.15.0-46-generic SMP mod_unload

4. rmmod – Remove Module from Kernel
The rmmod command removes a module from the kernel. You cannot remove a module that is currently in use.

$ rmmod hello.ko

5. modprobe – Add or Remove Modules from the Kernel
modprobe is an intelligent command that loads and unloads modules while taking the dependencies between modules into account. Refer to modprobe commands for more detailed examples.

Mapped memory regions

A memory-mapped file is a segment of virtual memory that has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource. This resource is typically a file that is physically present on disk, but it can also be a device, a shared memory object, or another resource that the operating system can reference through a file descriptor. Once established, this correlation between the file and the memory space permits applications to treat the mapped portion as if it were primary memory.
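To make “treat the mapped portion as if it were primary memory” concrete, here is a minimal sketch using mmap(2). The function name count_byte_in_file() is hypothetical; it maps a file read-only and then walks its contents like an ordinary array, without a single read() call.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the file read-only and counts how often
 * `byte` occurs in it. Returns -1 on any error. */
long count_byte_in_file(const char *path, unsigned char byte)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)   /* file contents read like an array */
        if (data[i] == byte)
            count++;

    munmap(data, len);
    return count;
}
```

While such a mapping is alive, it appears as one line in the /proc/[PID]/maps file of the process.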

The /proc/[PID]/maps file contains the currently mapped memory regions of a process and their access permissions.

The format is:

address           perms offset   dev   inode pathname
08048000-08049000 r-xp  00000000 03:00 8312  /opt/test

where “address” is the address range that the mapping occupies in the process’s address space, “perms” is a set of permissions (r = read, w = write, x = execute, s = shared, p = private), “offset” is the offset into the mapped file, “dev” is the device (major:minor), and “inode” is the inode on that device. An inode of 0 indicates that no inode is associated with the memory region, as would be the case with BSS (uninitialized data). The “pathname” shows the name of the file associated with this mapping.

If the mapping is not associated with a file:
[heap] = the heap of the program
[stack] = the stack of the main process
[vdso] = the “virtual dynamic shared object”, the kernel system call handler
or if empty, the mapping is anonymous.
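The field layout described above maps naturally onto a single sscanf() call. This is a sketch under the assumption that lines follow exactly the format shown; struct map_entry and parse_maps_line() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type for one line of /proc/[PID]/maps. */
struct map_entry {
    unsigned long start, end;          /* address range */
    char perms[5];                     /* e.g. "r-xp" */
    unsigned long offset;              /* offset into the mapped file */
    unsigned int dev_major, dev_minor; /* device, printed in hexadecimal */
    unsigned long inode;               /* 0 if no file backs the mapping */
    char pathname[256];                /* empty for anonymous mappings */
};

/* Parses one maps line into *out. Returns 0 on success, -1 otherwise. */
int parse_maps_line(const char *line, struct map_entry *out)
{
    assert(line != NULL && out != NULL);
    int consumed = -1;
    out->pathname[0] = '\0';
    int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %n",
                   &out->start, &out->end, out->perms, &out->offset,
                   &out->dev_major, &out->dev_minor, &out->inode, &consumed);
    if (n < 7)
        return -1;
    if (consumed >= 0) {
        /* The rest of the line, if any, is the pathname or a pseudo-name
         * such as [heap]; copy it and strip the trailing newline. */
        strncpy(out->pathname, line + consumed, sizeof out->pathname - 1);
        out->pathname[sizeof out->pathname - 1] = '\0';
        out->pathname[strcspn(out->pathname, "\n")] = '\0';
    }
    return 0;
}
```

Feeding it each line of /proc/self/maps (read with fgets()) would give a program a view of its own address space; an empty pathname after parsing corresponds to an anonymous mapping.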

The /proc/[PID]/smaps is an extension based on maps, showing the memory consumption for each of the process’s mappings.


/proc/[PID]/map_files – Information about memory mapped files

This directory contains symbolic links which represent memory mapped files the process is maintaining.

Example output:

dr-x------ 2 akash akash  0 Mar 27 21:32 ./
dr-xr-xr-x 9 akash akash 0 Mar 27 21:32 ../
lr-------- 1 akash akash 64 Mar 27 21:32 564277720000-56427773f000 -> /bin/ls*
lr-------- 1 akash akash 64 Mar 27 21:32 56427793e000-564277940000 -> /bin/ls*
lr-------- 1 akash akash 64 Mar 27 21:32 564277940000-564277941000 -> /bin/ls*
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1a833000-7fcb1a83e000 -> /lib/x86_64-linux-gnu/libnss_files-2.27.so
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1a83e000-7fcb1aa3d000 -> /lib/x86_64-linux-gnu/libnss_files-2.27.so
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1aa3d000-7fcb1aa3e000 -> /lib/x86_64-linux-gnu/libnss_files-2.27.so
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1aa3e000-7fcb1aa3f000 -> /lib/x86_64-linux-gnu/libnss_files-2.27.so
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1aa45000-7fcb1aa5c000 -> /lib/x86_64-linux-gnu/libnsl-2.27.so
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1aa5c000-7fcb1ac5b000 -> /lib/x86_64-linux-gnu/libnsl-2.27.so
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1ac5b000-7fcb1ac5c000 -> /lib/x86_64-linux-gnu/libnsl-2.27.so
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1ac5c000-7fcb1ac5d000 -> /lib/x86_64-linux-gnu/libnsl-2.27.so
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1ac5f000-7fcb1ac6a000 -> /lib/x86_64-linux-gnu/libnss_nis-2.27.so
lr-------- 1 akash akash 64 Mar 27 21:32 7fcb1ac6a000-7fcb1ae69000 -> /lib/x86_64-linux-gnu/libnss_nis-2.27.so

The name of a link represents the virtual memory bounds of a mapping, i.e. vm_area_struct::vm_start-vm_area_struct::vm_end.

The main purpose of map_files is to retrieve the set of memory-mapped files quickly, instead of parsing /proc/[PID]/maps or /proc/[PID]/smaps, both of which contain many more records. In addition, one can open(2) mappings from the listings of two processes and compare their inode numbers to figure out which anonymous memory areas are actually shared.
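Because each link name encodes the bounds of its mapping, a program that walks this directory (for example with readdir(3)) can recover the start, end, and size of a mapping from the name alone. A small sketch, with parse_map_files_name() being a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: splits a map_files link name of the form
 * "<vm_start>-<vm_end>" (hexadecimal, no 0x prefix) into its bounds.
 * Returns 0 on success, -1 if the name does not match that form. */
int parse_map_files_name(const char *name,
                         unsigned long long *start, unsigned long long *end)
{
    assert(name != NULL && start != NULL && end != NULL);
    char trailing;
    /* Exactly two conversions must succeed; a third means trailing junk. */
    if (sscanf(name, "%llx-%llx%c", start, end, &trailing) != 2)
        return -1;
    return (*end > *start) ? 0 : -1;
}
```

For the first link in the example output, 564277720000-56427773f000, the difference between the two bounds is 0x1f000 bytes. Directory entries such as “.” and “..” are rejected with -1.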

KSM – What is it?

Kernel Samepage Merging (KSM) allows deduplication of memory in Linux and was introduced in kernel version 2.6.32. KSM tries to find identical memory pages and merge them to free memory. It concentrates on pages that are seldom updated, since merging frequently changing pages would be inefficient.

Originally KSM was developed for virtual machines. If these virtual machines use the same programs or operating systems, the overall memory usage can be reduced dramatically and more virtual machines can be operated with the available physical RAM.

Tests by Red Hat have shown that 52 virtual machines running Windows XP, each with 1 GB of RAM, can be operated on one server with only 16 GB of RAM.

The following command checks whether KSM is built into the kernel:

$ grep KSM /boot/config-`uname -r`
CONFIG_KSM=y

Further information and configuration options can be found in the sysfs file system:

$ ls -1  /sys/kernel/mm/ksm/
full_scans
pages_shared
pages_sharing
pages_to_scan
pages_unshared
pages_volatile
run
sleep_millisecs
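Each of these entries holds a single number, so reading them from a program is straightforward. The sketch below uses two hypothetical helpers. According to the kernel’s KSM documentation, pages_sharing counts how many additional sites share the merged pages, so multiplying it by the page size gives a rough estimate of the memory saved.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads a single non-negative integer from a sysfs
 * style file such as /sys/kernel/mm/ksm/pages_sharing.
 * Returns the value, or -1 if the file cannot be read or parsed. */
long long read_counter(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    long long value = -1;
    if (fscanf(fp, "%lld", &value) != 1 || value < 0)
        value = -1;
    fclose(fp);
    return value;
}

/* Rough estimate of memory saved: one page per additional sharing site. */
long long ksm_saved_bytes(long long pages_sharing, long page_size)
{
    assert(pages_sharing >= 0 && page_size > 0);
    return pages_sharing * page_size;
}
```

In a real program the page size would come from sysconf(_SC_PAGESIZE) rather than being hard-coded.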

Checking if Hugepages are enabled in Linux

Issue:

You want to check whether transparent hugepages are enabled on your Linux system.

Solution:

Simple:

  • cat /sys/kernel/mm/transparent_hugepage/enabled

You will get an output like this:

  • [always] madvise never

You’ll see a list of all possible options (always, madvise, never), with the currently active option enclosed in brackets.
madvise is a common default, although the actual default depends on how the kernel was configured.

madvise means transparent hugepages are only enabled for memory regions that explicitly request hugepages using madvise(2).

always means transparent hugepages are always enabled for each process. This usually increases performance, but if you have a use case with many processes that each consume only a small amount of memory, your overall memory usage could grow drastically.

never means transparent hugepages won’t be enabled even if requested using madvise.
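A program that wants to know the active mode can read the same sysfs file and pick out the bracketed token. A minimal sketch, with active_option() being a hypothetical helper that works on one line of that file:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the bracketed token of a line such as
 * "[always] madvise never" into out. Returns 0 on success, -1 if no
 * bracketed token is found or it does not fit into out. */
int active_option(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);
    const char *lb = strchr(line, '[');
    if (lb == NULL)
        return -1;
    const char *rb = strchr(lb, ']');
    if (rb == NULL)
        return -1;
    size_t len = (size_t)(rb - lb - 1);   /* characters between the brackets */
    if (len >= out_size)
        return -1;
    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}
```

The line itself would typically be read with fgets() from /sys/kernel/mm/transparent_hugepage/enabled.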

For details, take a look at Documentation/vm/transhuge.txt in the Linux kernel documentation (Documentation/admin-guide/mm/transhuge.rst in newer kernel trees).
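In madvise mode, a region only becomes eligible for transparent hugepages after the program asks for it. The following sketch shows what such a request might look like; alloc_with_thp_hint() is a hypothetical name, and the code assumes Linux with glibc. The call is only a hint: the memory remains usable with normal pages if the kernel cannot or will not provide hugepages.

```c
#define _GNU_SOURCE   /* exposes MADV_HUGEPAGE in <sys/mman.h> on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: maps `len` bytes of anonymous memory and asks the
 * kernel to back it with transparent hugepages. Returns the mapping, or
 * NULL on failure; *hinted is set to 1 only if madvise() succeeded. */
void *alloc_with_thp_hint(size_t len, int *hinted)
{
    assert(len > 0 && hinted != NULL);
    *hinted = 0;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Fails with EINVAL on kernels built without transparent hugepage
     * support; the mapping is still valid in that case. */
    if (madvise(p, len, MADV_HUGEPAGE) == 0)
        *hinted = 1;
#endif
    return p;
}
```

The region should be released with munmap(p, len) when it is no longer needed. A length that is a multiple of the hugepage size (commonly 2 MiB on x86-64, which is an assumption here) gives the kernel the best chance of actually using hugepages.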

How to change the default value

Option 1: Modify sysfs directly as root (the setting reverts to the default upon reboot):

  • echo always >/sys/kernel/mm/transparent_hugepage/enabled
  • echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
  • echo never >/sys/kernel/mm/transparent_hugepage/enabled

Option 2: Change system default by recompiling kernel with modified config (this is only recommended if you’re using your own custom kernel anyway):

In order to set default to always, set these options:

  • CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
  • # Comment out CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y

In order to set default to madvise, set these options:

  • CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
  • # Comment out CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
