Linux Internals
This article is my attempt to explain Linux internals to myself and act as a structured way of keeping notes.
Linux is an operating system kernel, which is distinct from the user space where user applications run. The kernel interfaces directly with hardware through drivers and device firmware. It also manages resources shared between user applications. Other open-source kernels that may be of interest are those of the BSD family and GNU Mach, the microkernel that the GNU Hurd runs on.
The Linux kernel can be interacted with or controlled from user space through a file-based API or a C API, both of which aim to be POSIX compatible.
The types of hardware interfaced in the kernel include:
- graphics
- audio
- memory
- cpu
- gpu
- network
- keyboard and mouse
- storage and i/o peripherals
with the kernel handling control messages including power messages for associated devices.
As part of its resource management the kernel includes:
- filesystems and virtual file management
- memory management functionality
- process scheduling and interprocess communication
- networking
- security
- media support
In the user space of operating systems that use the Linux kernel (which I will, controversially, call Linux distros for brevity) some foundational elements and applications are:
- an init system - often systemd, and previously collections of 'sysvinit scripts' or niche alternatives like shepherd
- system daemons for administrative services
- a graphics system - often wayland, now replacing x org
- an audio or multimedia system - pipewire is becoming a common replacement for pulseaudio and others
- a terminal emulator and shell
- network interface management, such as ethernet or wifi
- a package manager
- a bootloader
APIs
https://en.wikipedia.org/wiki/Linux_kernel_interfaces#Linux_API
The kernel tries to follow the Portable Operating System Interface (POSIX) and Single Unix Specification where applicable.
File Based
Device drivers are interacted with in directories:
- /dev https://en.wikipedia.org/wiki/Device_file#DEVFS
- /sys
Processes are interacted with in:
- /proc
- /proc/sys
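As a small illustration of the file-based API, the sketch below reads the current process's entry under /proc and turns it into a dictionary. The helper names parse_status and read_self_status are my own, and the sketch assumes the usual "Key: value" layout of /proc/self/status; it returns an empty result where /proc is not mounted.

```python
from pathlib import Path


def parse_status(text):
    """Parse 'Key: value' lines, as found in /proc/<pid>/status, into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore any line without a colon
            fields[key.strip()] = value.strip()
    return fields


def read_self_status():
    """Read this process's own status; returns {} if /proc is unavailable."""
    path = Path("/proc/self/status")
    if not path.exists():
        return {}
    return parse_status(path.read_text())


if __name__ == "__main__":
    status = read_self_status()
    # Name, Pid and VmRSS are common fields; .get() avoids a KeyError
    # on systems where one of them is missing.
    for key in ("Name", "Pid", "VmRSS"):
        print(key, "=", status.get(key))
```

No special library is involved: the kernel generates the file's contents when it is read, so ordinary file I/O is enough.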
System calls and similar
https://en.wikipedia.org/wiki/System_call
ioctl (input/output control) is a system call for device-specific I/O operations. Related control calls are:
- sysctl (system control)
- ioctl (io control)
- fcntl (file control)
Other communication mechanisms include netlink sockets, which allow IPC between the kernel and user-space programs as well as between user-space programs. Netlink is designed to be a more flexible successor to ioctl.
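Of the control calls listed above, fcntl is the easiest to try from a script. The following is a minimal sketch using Python's standard fcntl module to switch a pipe's read end to non-blocking mode; set_nonblocking is a name I made up for the example.

```python
import fcntl
import os


def set_nonblocking(fd):
    """Set O_NONBLOCK on fd via fcntl(2) and return the new status flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)  # read the current file status flags
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    return fcntl.fcntl(fd, fcntl.F_GETFL)


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    try:
        new_flags = set_nonblocking(read_end)
        print("non-blocking:", bool(new_flags & os.O_NONBLOCK))
    finally:
        os.close(read_end)
        os.close(write_end)
```

The same module also exposes fcntl.ioctl, but ioctl request codes are device specific, so a generic example is harder to give.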
User Space
Init Systems
Systemd
Important utilities:
- systemctl
- journalctl
- notify
- loginctl
- systemd-boot
Important daemons:
- systemd
- journald
- resolved
- networkd
- logind
- user-session
- udevd
Important libraries:
- libnotify
- libudev
Devices
Devices are managed in user space by udev, which has the following parts:
- libudev, which can be used as a library for device info
- the udevd daemon for managing the /dev virtual file hierarchy
- the udevadm command line utility for admin and diagnostics
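Much of the device information in question starts out as plain KEY=VALUE text in a per-device uevent file under /sys. The sketch below parses that format directly. It is not libudev and does not talk to udevd; parse_uevent and device_properties are hypothetical helper names, and the loopback path in the usage section is only assumed to exist.

```python
from pathlib import Path


def parse_uevent(text):
    """Parse KEY=VALUE lines from a sysfs 'uevent' file into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore any line without '='
            props[key] = value
    return props


def device_properties(sysfs_dir):
    """Return the properties of one device directory, or {} if unreadable."""
    try:
        return parse_uevent((Path(sysfs_dir) / "uevent").read_text())
    except OSError:
        return {}


if __name__ == "__main__":
    # /sys/class/net/lo is the loopback interface; assumed present here.
    print(device_properties("/sys/class/net/lo"))
```

For anything beyond a quick look, udevadm info gives a fuller view that also includes properties added by udev rules.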
Window management and device input events
Wayland and X are the two most common display server systems on Linux. Both follow a client-server approach; in Wayland the server role is typically taken by the compositor itself.
Display servers can react to device input events via the libinput library, which in turn uses libevdev to handle evdev ioctls from the kernel.
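To make the evdev layer a little more concrete, here is a sketch that decodes one raw event record of the kind the kernel writes to /dev/input/event* nodes. It assumes the struct input_event layout of a typical 64-bit system (a struct timeval followed by type, code and value), and it decodes a synthetic record rather than opening a real device, which would normally need extra permissions. The names decode_event and InputEvent are my own.

```python
import struct
from collections import namedtuple

# Assumed layout of struct input_event from <linux/input.h>:
# struct timeval (two longs), __u16 type, __u16 code, __s32 value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

# A few event type constants from <linux/input-event-codes.h>.
EVENT_TYPES = {0x00: "EV_SYN", 0x01: "EV_KEY", 0x02: "EV_REL", 0x03: "EV_ABS"}

InputEvent = namedtuple("InputEvent", "seconds microseconds type code value")


def decode_event(raw):
    """Decode one raw evdev record into an InputEvent."""
    sec, usec, ev_type, code, value = struct.unpack(EVENT_FORMAT, raw[:EVENT_SIZE])
    # Unknown types are shown as hex rather than raising an error.
    return InputEvent(sec, usec, EVENT_TYPES.get(ev_type, hex(ev_type)), code, value)


if __name__ == "__main__":
    # Synthetic record: EV_KEY (0x01), code 30 (KEY_A), value 1 (key press).
    raw = struct.pack(EVENT_FORMAT, 1700000000, 250000, 0x01, 30, 1)
    print(decode_event(raw))
```

In practice libevdev and libinput do this decoding, plus device capability queries and gesture handling, so applications rarely parse the records by hand.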
Network
iptables allows configuration of IP packet filter rules in the Linux kernel firewall, which is implemented as netfilter modules. nftables is a newer framework, configured with the nft userspace utility, that replaces iptables and similar tools.
netfilter operates by providing hooks that are called at different stages of packet processing, allowing decisions to be made on the fate of the packet. The user-space side of iptables has the concept of 'tables', which define some function to perform on a packet, and 'chains', which correspond to when the functions will be applied, mapping to the netfilter hooks. Chains have a priority controlling their order of application on the netfilter hooks.
The following are the netfilter hooks and corresponding built-in iptables chains:
- NF_IP_PRE_ROUTING / PREROUTING
- NF_IP_LOCAL_IN / INPUT
- NF_IP_FORWARD / FORWARD
- NF_IP_LOCAL_OUT / OUTPUT
- NF_IP_POST_ROUTING / POSTROUTING
The following are the available iptables tables:
- nat: network address translation
- filter: whether a packet can proceed or not - essentially firewalling
- mangle: IP header modification
- raw: allows marking packets to opt out of connection tracking
- security: used in SELinux and similar applications
iptables rules are placed in a specific chain of a specific table. Rules have a 'matching' component and a 'target' component, which gives an action. An action can be terminating, which stops evaluation of the chain, or non-terminating, in which case evaluation continues with the next rule.
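The rule evaluation described above can be sketched as a toy model. This is plain Python that imitates the behaviour, not the iptables or netfilter API: Rule and evaluate_chain are invented names, packets are simple dictionaries, and only ACCEPT, DROP and REJECT are treated as terminating targets.

```python
from dataclasses import dataclass
from typing import Callable

TERMINATING = {"ACCEPT", "DROP", "REJECT"}


@dataclass
class Rule:
    match: Callable[[dict], bool]  # the 'matching' component
    target: str                    # the 'target' component


def evaluate_chain(rules, packet, policy="ACCEPT"):
    """Walk rules in order; return (verdict, non-terminating targets that fired)."""
    fired = []
    for rule in rules:
        if not rule.match(packet):
            continue
        if rule.target in TERMINATING:
            return rule.target, fired  # terminating: stop evaluating the chain
        fired.append(rule.target)      # non-terminating (e.g. LOG): keep going
    return policy, fired               # no rule terminated: chain policy applies


if __name__ == "__main__":
    input_chain = [
        Rule(lambda p: p["proto"] == "tcp", "LOG"),
        Rule(lambda p: p["proto"] == "tcp" and p["dport"] == 22, "ACCEPT"),
        Rule(lambda p: True, "DROP"),
    ]
    print(evaluate_chain(input_chain, {"proto": "tcp", "dport": 22}))
    print(evaluate_chain(input_chain, {"proto": "udp", "dport": 53}))
```

The first packet matches the LOG rule, carries on, and is then accepted; the second skips both tcp rules and falls through to the catch-all DROP.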
avahi is a zero-configuration networking implementation (allows network service use by freshly networked computers or peripherals) including multicast DNS and DNS service discovery. Apple's Bonjour and systemd's systemd-resolved are other implementations.
The Desktop Bus (D-Bus) is a user space middleware allowing communication between multiple processes (i.e. IPC).
- https://www.digitalocean.com/community/tutorials/how-the-iptables-firewall-works
- https://www.digitalocean.com/community/tutorials/a-deep-dive-into-iptables-and-netfilter-architecture
- https://www.digitalocean.com/community/tutorials/how-to-choose-an-effective-firewall-policy-to-secure-your-servers
- https://www.chiark.greenend.org.uk/~peterb/network/drop-vs-reject
Audio
ALSA (Advanced Linux Sound Architecture) has both kernel and user-space elements, the latter provided by alsa-lib.
Pipewire is a new low-level multimedia framework.
Power Management
UPower (previously DeviceKit-power) is a power management middleware - it spawns upowerd.
Systemd has some session and power management through its logind. elogind is a standalone fork of this logind.
The systemctl suspend analogue for elogind is loginctl suspend.
Other
- avahi
- dbus
- udisks
- cgroups
- autofs
- kdbus
- Polkit (formerly PolicyKit) https://en.wikipedia.org/wiki/Polkit
- Pluggable Authentication Module (PAM) https://en.wikipedia.org/wiki/Pluggable_Authentication_Module
- Name Service Switch (NSS) https://en.wikipedia.org/wiki/Name_Service_Switch
- procfs https://en.wikipedia.org/wiki/Procfs
- sysfs https://en.wikipedia.org/wiki/Sysfs
Kernel Space
Input events
- evdev (/dev/input)
Filesystems
- ext4
- btrfs
- xfs
- jfs
- fat32
- FUSE
Storage
- SCSI
- libATA
Virtualization
- KVM
- Xen
Process Management
- clone(2) and clone3(2)
- futex(7) and futex(2)
- Completely Fair Scheduler
- https://en.wikipedia.org/wiki/Earliest_eligible_virtual_deadline_first_scheduling
- Native POSIX Thread Library (NPTL) is the glibc (user-space) implementation of pthreads, built on the kernel's clone and futex
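One observable consequence of threads being kernel tasks: every thread has its own kernel thread ID while sharing the process ID. The sketch below shows this with Python's threading module, on the assumption that CPython threads on Linux are ordinary pthreads; worker_ids is a name invented for the example, and threading.get_native_id needs Python 3.8 or later.

```python
import os
import threading


def worker_ids():
    """Start one thread; return (process id, that thread's kernel thread id)."""
    result = {}

    def worker():
        # The native ID is the thread ID assigned by the kernel.
        result["tid"] = threading.get_native_id()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return os.getpid(), result["tid"]


if __name__ == "__main__":
    pid, tid = worker_ids()
    print("pid:", pid, "worker tid:", tid)
```

On Linux the main thread's ID equals the PID, so the worker's ID differs from it; while a thread is alive it also appears as an entry under /proc/<pid>/task.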
Security
- Linux Security Modules
- SELinux
- AppArmor
- POSIX ACLs
Memory
- DMA buffers
Audio
- Advanced Linux Sound Architecture (ALSA)
Graphics
- Direct Rendering Manager (DRM)
- Kernel Mode Setting (KMS)
Network
- New API (NAPI)
- mac80211
- Netfilter
Power and Control
- ACPI https://en.wikipedia.org/wiki/ACPI
PAM and Login
- https://kl.wtf/posts/2022/03/12/login-managers-an-introduction.html#pluggable-authentication-modules