Module types
Available on crate feature dep_nc only.
Structs
- The remap event
- The trace itself
- User setup structure passed with BLKTRACESETUP
- anonymous struct for BPF_BTF_LOAD
- anonymous struct used by BPF_MAP_*_ELEM commands
- anonymous struct used by BPF_*_GET_*_ID
- anonymous struct used by BPF_OBJ_GET_INFO_BY_FD
- anonymous struct used by BPF_MAP_CREATE command
- anonymous struct used by BPF_OBJ_* commands
- anonymous struct used by BPF_PROG_ATTACH/DETACH commands
- anonymous struct used by BPF_PROG_LOAD command
- anonymous struct used by BPF_PROG_TEST_RUN command
- anonymous struct used by BPF_PROG_QUERY command
- Key of a BPF_MAP_TYPE_LPM_TRIE entry
- Use bpf_sock_addr struct to access socket fields and sockaddr struct passed by user and intended to be used by socket (e.g. to bind to, depends on attach type).
- User bpf_sock_ops struct to access socket values and specify request ops and their replies.
- Arguments for the clone3 syscall.
- POSIX 1003.1g - ancillary data object information. Ancillary data consists of a sequence of pairs of (cmsghdr, cmsg_data[])
- IA64 and x86_64 need to avoid the 32-bit padding at the end, to be compatible with the i386 ABI
- from struct btrfs_ioctl_file_extent_same_info
- from struct btrfs_ioctl_file_extent_same_args
- And dynamically-tunable limits and defaults:
- Structure for FS_IOC_FSGETXATTR[A] and FS_IOC_FSSETXATTR.
- A waiter for vectorized wait.
- Cache for getcpu() to speed it up. Results might be a short time out of date, but will be faster.
- Internet address.
- struct inotify_event - structure read from the inotify device for each event
- read() from /dev/aio returns these structures.
- Filled with the offset for mmap(2)
- IO completion data structure (Completion Queue Entry)
- Passed in for io_uring_setup(2). Copied back with updated info on success
- IO submission data structure (Submission Queue Entry)
- we always use a 64bit off_t when communicating with userland. It’s up to libraries to do the proper padding and aio_error abstraction
- Berkeley style UIO structures
- Request struct for multicast socket ops
- The generic ipc64_perm structure: Note extra padding because this structure is passed back and forth between kernel and user space.
- These are used to wrap system calls. See architecture code for ugly details.
- Obsolete, used only for backwards compatibility and libc5 compiles
- Slot for KCMP_EPOLL_TFD
- legacy timeval structure, only embedded in structures that traditionally used ‘timeval’ to pass time intervals (not absolute times).
- This structure is used to hold the arguments that are used when loading kernel binaries.
- For recvmmsg/sendmmsg
- Structure for passing mount ID and miscellaneous parameters to statmount(2) and listmount(2).
- mount_setattr()
- message buffer for msgsnd and msgrcv calls
- As we do 4.4BSD message passing we use a 4.4BSD message passing system, not 4.3. Thus msg_accrights(len) are now missing. They belong in an obscure libc emulation or the bin.
- buffer for msgctl calls IPC_INFO, MSG_INFO
- Generic msqid64_ds structure.
- Arguments for how openat2(2) should open the target path. If only flags and mode are non-zero, then openat2(2) operates very similarly to openat(2).
- single taken branch record layout:
- Hardware event_id to monitor via a performance monitoring event:
- Structure of the page that can be mapped via mmap
- Structure used by the PERF_EVENT_IOC_QUERY_BPF command to query bpf programs attached to the same perf tracepoint as the given perf event.
- This structure provides a new memory descriptor map which mostly modifies /proc/pid/stat[m] output for a task. This is mostly done for the sake of checkpoint/restore functionality.
- Per-thread list head:
- Support for robust futexes: the kernel cleans up held futexes at thread exit time.
- struct rseq_cs is aligned on 4 * 8 bytes to ensure it is always contained within a single cache-line. It is usually declared as link-time constant data.
- struct rseq is aligned on 4 * 8 bytes to ensure it is always contained within a single cache-line.
- Extended scheduling parameters data structure.
- struct seccomp_data - the format the BPF program executes over.
- semop system calls take an array of these.
- Obsolete, used only for backwards compatibility and libc5 compiles
- Serial input interrupt line counters – external structure Four lines can interrupt: CTS, DSR, RI, DCD
- Serial interface for controlling ISO7816 settings on chips with suitable support. Set with TIOCSISO7816 and get with TIOCGISO7816 if supported by your platform.
- Multiport serial configuration structure — external structure
- Serial interface for controlling RS485 settings on chips with suitable support.
- The shmid64_ds structure for x86 architecture. Note extra padding because this structure is passed back and forth between kernel and user space.
- Obsolete, used only for backwards compatibility and libc5 compiles
- Obsolete, used only for backwards compatibility
- kill()
- POSIX.1b signals
- SIGCHLD
- SIGPOLL
- SIGSYS
- POSIX.1b timers
- user accessible metadata for SK_MSG packet hook, new fields must be added to the end of this structure
- 1003.1g requires sa_family_t and that sa_data is char.
- ARM needs to avoid the 32-bit padding at the end, for consistency between EABI and OABI
- Structure for getting mount/superblock/filesystem info with statmount(2).
- Structures for the extended file attribute retrieval system call (statx()).
- Timestamp structure for the timestamps in struct statx.
- syscall interface - used (mainly by NTP daemon) to discipline kernel clock oscillator
- Note on 64bit base and limit is ignored and you cannot set DS/ES/CS not to the default values if you still want to do syscalls. This call is more for 32bit mode therefore.
- user accessible metadata for XDP packet hook; new fields must be added to the end of this structure
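The inotify_event entry above describes the records returned by read(2) on an inotify file descriptor. A minimal sketch of splitting such a buffer into events follows, assuming the <linux/inotify.h> layout (an i32 wd, then u32 mask, cookie and len, followed by a NUL-padded name of len bytes) in native byte order. The Event type, the word and parse_events helpers, and the locally defined IN_* constants are hypothetical illustrations, not items exported by this module.

```rust
// Event bits copied from <linux/inotify.h> so the sketch needs no crate.
const IN_MODIFY: u32 = 0x0000_0002;
const IN_CREATE: u32 = 0x0000_0100;
const IN_ISDIR: u32 = 0x4000_0000;

/// Size of the fixed part of struct inotify_event: wd, mask, cookie, len.
const HEADER_LEN: usize = 16;

/// Owned copy of one inotify record (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Event {
    wd: i32,
    mask: u32,
    cookie: u32,
    name: Option<String>,
}

/// Copy four bytes starting at `at` into an array for from_ne_bytes.
fn word(buf: &[u8], at: usize) -> [u8; 4] {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[at..at + 4]);
    b
}

/// Split a buffer filled by read(2) on an inotify fd into events.
/// A truncated trailing record is ignored.
fn parse_events(buf: &[u8]) -> Vec<Event> {
    let mut events = Vec::new();
    let mut off = 0;
    while off + HEADER_LEN <= buf.len() {
        let wd = i32::from_ne_bytes(word(buf, off));
        let mask = u32::from_ne_bytes(word(buf, off + 4));
        let cookie = u32::from_ne_bytes(word(buf, off + 8));
        let len = u32::from_ne_bytes(word(buf, off + 12)) as usize;
        let end = off + HEADER_LEN + len;
        if end > buf.len() {
            break;
        }
        // `len` counts the name plus its NUL padding; 0 means no name.
        let raw = &buf[off + HEADER_LEN..end];
        let name = if len == 0 {
            None
        } else {
            let stop = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
            Some(String::from_utf8_lossy(&raw[..stop]).into_owned())
        };
        events.push(Event { wd, mask, cookie, name });
        off = end;
    }
    events
}
```

Because each record carries its own len, the loop advances by HEADER_LEN + len rather than by a fixed stride; a caller would typically test bits such as IN_ISDIR on each event's mask.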
Enums
- values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set
- The format of the data returned by read() on a perf event fd as specified by attr.read_format:
- Bits that can be set in attr.sample_type to request information in the overflow packets.
- perf_event_type
- Generalized hardware cache events:
- Generalized performance event event_id types, used by the attr.event_id parameter of the sys_perf_event_open() syscall: Common hardware events, generalized by the kernel:
- Values to determine ABI of the registers dump.
- Special “software” events provided by the kernel, even if the hardware does not support performance events.
- attr.type
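The generalized hardware cache events are selected by packing three enum values into one config word. A minimal sketch follows, assuming the encoding documented in the perf_event_open(2) man page for attr.type = PERF_TYPE_HW_CACHE: the cache id in bits 0-7, the operation id in bits 8-15 and the result id in bits 16-23. The Rust enums and the hw_cache_config / split_hw_cache_config helpers are hypothetical local mirrors of the perf_hw_cache_* enums, not this crate's bindings.

```rust
/// Value of attr.type for generalized cache events (PERF_TYPE_HW_CACHE).
const PERF_TYPE_HW_CACHE: u32 = 3;

/// Mirrors perf_hw_cache_id.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum CacheId { L1d = 0, L1i = 1, Ll = 2, Dtlb = 3, Itlb = 4, Bpu = 5, Node = 6 }

/// Mirrors perf_hw_cache_op_id.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum CacheOp { Read = 0, Write = 1, Prefetch = 2 }

/// Mirrors perf_hw_cache_op_result_id.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum CacheResult { Access = 0, Miss = 1 }

/// Build attr.config for a PERF_TYPE_HW_CACHE event.
fn hw_cache_config(id: CacheId, op: CacheOp, result: CacheResult) -> u64 {
    (id as u64) | ((op as u64) << 8) | ((result as u64) << 16)
}

/// Split a config word back into its (id, op, result) bytes.
fn split_hw_cache_config(config: u64) -> (u8, u8, u8) {
    (config as u8, (config >> 8) as u8, (config >> 16) as u8)
}
```

For example, an L1 data-cache read-miss counter would pair type PERF_TYPE_HW_CACHE with hw_cache_config(CacheId::L1d, CacheOp::Read, CacheResult::Miss), which is 0x10000.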
Constants
- /proc/sys/abi
- default handler for ELF binaries
- default handler for procs using lcall7
- default handler for a libc.so ELF interp
- fake target utsname information
- tracing flags
- address bit
- disable randomization of VA space
- estimated time error
- frequency offset
- maximum time error
- select microsecond resolution
- select nanosecond resolution
- Mode codes (timex.mode) time offset
- old-fashioned adjtime
- read-only adjtime
- add ‘time’ to current time
- clock status
- set TAI offset
- tick value
- pll time constant
- Algorithm sockets
- AppleTalk DDP
- Ash
- ATM PVCs
- ATM SVCs
- Amateur Radio AX.25
- Bluetooth sockets
- Multiprotocol bridge
- CAIF sockets
- Controller Area Network
- Reserved for DECnet project
- Acorn Econet
- Native InfiniBand address
- IEEE802154 sockets
- Internet IP Protocol
- IP version 6
- Novell IPX
- IRDA sockets
- mISDN sockets
- IUCV sockets
- Kernel Connection Multiplexor
- PF_KEY key management API
- Linux LLC
- POSIX name for AF_UNIX
- For now..
- MPLS
- Reserved for 802.2LLC project
- Amateur Radio NET/ROM
- NFC sockets
- Packet family
- Phonet sockets
- PPPoX sockets
- Qualcomm IPC Router
- RDS sockets
- Amateur Radio X.25 PLP
- Alias to emulate 4.4BSD
- RxRPC sockets
- Security callback pseudo AF
- smc sockets: reserve number for PF_SMC protocol family that reuses AF_INET address family
- Linux SNA Project (nutters!)
- TIPC sockets
- Unix domain sockets
- Supported address families.
- vSockets
- Wanpipe API Sockets
- Reserved for X.25 project
- XDP sockets
- Don’t use 0x3001-0x3004 because of old glibcs
- ARCH_SHSTK_ features bits
- bytes of args + environ for exec()
- For the close wait times, 0 means wait forever for serial port to flush its output. 65535 means don’t wait at all.
- Allow empty relative pathname
- Special value used to indicate openat should use the current working directory.
- Suppress terminal automount traversal
- Apply to the entire subtree
- Remove directory instead of unlinking file.
- Don’t sync attributes with the server
- Force the attributes to be sync’d with the server
- Do whatever stat() does
- Type of synchronisation required from statx()
- Follow symbolic links.
- Do not follow symbolic links.
- c_cflag bit meaning
- flush buffer cache
- get filesystem (mm/filemap.c) read-ahead
- set filesystem (mm/filemap.c) read-ahead
- return device size /512 (long *arg)
- return device size in bytes (u64 *arg)
- get current read ahead setting
- set read ahead for block device
- get read-only status (0 = read_write)
- the read-only stuff doesn’t really belong here, but any other place is probably as bad and I don’t want to create yet another include file. set device read-only (0 = read-write)
- re-read partition table
- get max sectors per request (ll_rw_blk.c)
- set max sectors per request (ll_rw_blk.c)
- get block device sector size
- Trace actions in full.
- readahead
- completions
- discard requests
- binary per-driver data
- flush
- fs requests
- fua requests
- issue
- metadata
- special message
- pc requests
- queueing/merging
- reads
- requeueing
- sync IO
- writes
- obsolete - kept for compatibility
- Mode for BPF_FUNC_skb_adjust_room helper.
- alu mode in double word width
- flags for BPF_MAP_UPDATE_ELEM command: create new element or update existing
- sign extending arithmetic shift right
- function call
- ld/ldx fields double word (64-bit)
- change endianness of a register flags for endianness conversion:
- update existing element
- function return
- (symbol + offset) or addr
- (symbol + offset) or addr
- tp name
- tp name
- filename + offset
- filename + offset
- dest is blackholed; can be dropped
- fragmentation required to fwd
- fwding is not enabled on ingress
- packet is not forwarded
- no neighbor entry for nh
- dest not allowed; can be dropped
- lookup successful
- dest is unreachable; can be dropped
- fwd requires encapsulation
- DIRECT: Skip the FIB rules and go to FIB table associated with device. OUTPUT: Do lookup from egress perspective; default is ingress
- cgroup-bpf attach flags used in BPF_PROG_ATTACH command
- If BPF_F_ANY_ALIGNMENT is used in BPF_PROG_LOAD command, the verifier will allow any alignment whatsoever.
- BPF_FUNC_perf_event_output for sk_buff input context.
- Current network namespace
- flags used by BPF_FUNC_get_stackid only.
- BPF_FUNC_l3_csum_replace and BPF_FUNC_l4_csum_replace flags.
- BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and BPF_FUNC_perf_event_read_value flags.
- BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags.
- spin_lock-ed map_lookup/map_update
- Instead of having one common LRU list in the BPF_MAP_TYPE_LRU_[PERCPU_]HASH map, use a percpu LRU list which can scale and perform better.
- flags for BPF_MAP_CREATE command
- Specify numa node during map creation
- BPF_FUNC_l4_csum_replace flags.
- flags for BPF_PROG_QUERY
- Flags for accessing BPF object
- BPF_FUNC_skb_store_bytes flags.
- flags for both BPF_FUNC_get_stackid and BPF_FUNC_get_stack.
- Flag for stack_map, store build_id+offset instead of pointer
- The verifier will perform strict alignment checking.
- BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags.
- flags used by BPF_FUNC_get_stack only.
- BPF_FUNC_skb_set_tunnel_key flags.
- Zero-initialize hash function seed. This should only be used for testing.
- Mode for BPF_FUNC_skb_load_bytes_relative helper.
- LE is unsigned, ‘<=’
- LT is unsigned, ‘<’
- Extended instruction set based on top of classic BPF. Instruction classes: jmp mode in word width
- jmp encodings: jump !=
- SGE is signed ‘>=’, GE in x86
- SGT is signed ‘>’, GT in x86
- SLE is signed, ‘<=’
- SLT is signed, ‘<’
- Encapsulation type for BPF_FUNC_lwt_push_encap helper.
- BPF syscall commands, see bpf(2) man-page for details.
- alu/jmp fields: mov reg to reg
- create new element if it didn’t exist
- Generic BPF return codes which all BPF program types may support.
- Program types of bpf.
- when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative offset to another bpf function
- when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd
- Register numbers
- Calls BPF program when an active connection is established
- Mask of all currently supported cb flags
- Get base RTT.
- If connection’s congestion control needs ECN
- Calls BPF program when a passive connection is established
- Called when skb is retransmitted.
- Called when an RTO has triggered.
- Definitions for bpf_sock_ops_cb_flags
- Should return initial advertized window (in packets) or -1 if default value should be used
- Called when TCP changes state.
- Calls BPF program right before an active connection is initialized
- Called on listen(2), right after socket transition to LISTEN state.
- Should return SYN-RTO value to use or -1 if default value should be used
- List of known BPF sock_ops operators. New entries can only be added at the end
- user space needs an empty entry to identify end of a trace
- couldn’t get build_id, fallback to ip
- with valid build_id and offset
- Now a valid state
- List of TCP states.
- Leave at the end!
- convert to big-endian
- convert to little-endian
- exclusive add
- Signal interrupt on break
- SIGBUS si_codes invalid address alignment
- non-existent physical address
- /proc/sys/bus/isa
- hardware memory error detected in process but not consumed: action optional
- hardware memory error consumed on a machine check: action required
- object specific hardware error
- Allow configuration of audit via unicast netlink socket.
- Allow reading the audit log via multicast netlink socket.
- Allow writing the audit log via unicast netlink socket.
- Allow preventing system suspends.
- POSIX-draft defined capabilities. In a system with the _POSIX_CHOWN_RESTRICTED option defined, this overrides the restriction of changing file ownership and group ownership.
- Override all DAC access, including ACL execute access if _POSIX_ACL is defined. Excluding DAC access covered by CAP_LINUX_IMMUTABLE.
- Overrides all DAC restrictions regarding read and search on files and directories, including ACL restrictions if _POSIX_ACL is defined. Excluding DAC access covered by CAP_LINUX_IMMUTABLE.
- Overrides all restrictions about allowed operations on files, where file owner ID must be equal to the user ID, except where CAP_FSETID is applicable. It doesn’t override MAC and DAC restrictions.
- Overrides the restriction that the effective user ID shall match the file owner ID when setting the S_ISUID and S_ISGID bits on that file.
- Allow locking of shared memory segments. Allow mlock and mlockall (which doesn’t really have anything to do with IPC)
- Override IPC ownership checks
- Overrides the restriction that the real or effective user ID of a process sending a signal must match the real or effective user ID of the process receiving the signal.
- Allow taking of leases on files.
- Allow modification of S_IMMUTABLE and S_APPEND file attributes
- Allow MAC configuration or state changes.
- Override MAC access.
- Allow the privileged aspects of mknod().
- Network administration capacity.
- Allows binding to TCP/UDP sockets below 1024. Allows binding to ATM VCIs below 32
- Allow broadcasting, listen to multicast
- Allow use of RAW sockets. Allow use of PACKET sockets. Allow binding to any address for transparent proxying (also via NET_ADMIN)
- Allows setgid(2) manipulation. Allows setgroups(2). Allows forged gids on socket credentials passing.
- Linux-specific capabilities
- Allows set*uid(2) manipulation (including fsuid). Allows forged pids on socket credentials passing.
- Allow configuring the kernel’s syslog (printk behaviour).
- Administration capacity.
- Allow use of reboot()
- Allow use of chroot()
- Insert and remove kernel modules - modify kernel without limit
- Allow raising priority and setting priority on other (different UID) processes.
- Allow configuration of process accounting
- Allow ptrace() of any process
- Allow ioperm/iopl access. Allow sending USB messages to any device via /dev/bus/usb
- Override resource limits. Set resource limits.
- Allow manipulation of system clock.
- Allow configuration of tty devices.
- Allow triggering something that will wake the system.
- c_cflag bit meaning
- input baud rate;
- stopped child has continued
- child terminated abnormally
- There is an additional set of SIGTRAP si_codes used by ptrace that are of the form: ((PTRACE_EVENT_XXX << 8) | SIGTRAP). SIGCHLD si_codes: child has exited
- child was killed
- child has stopped
- traced child has trapped
- The IDs of the various system clocks (for POSIX.1b interval timers):
- The driver implementing this got removed. The clock ID is kept as a place holder. Do not reuse!
- sizeof first published struct
- sizeof second published struct
- sizeof third published struct
- clear the TID in the child
- set the TID in the child
- Flags for the clone3() syscall. Clear any signal handler and reset to SIG_DFL.
- Unused, ignored
- set if open files shared between processes
- set if fs info shared between processes
- Clone into a specific cgroup given the right permissions.
- Clone io context
- New cgroup namespace
- New ipc namespace
- New network namespace
- New mount namespace group
- New pid namespace
- cloning flags intersect with CSIGNAL so can be used with unshare and clone3() syscalls only: New time namespace
- New user namespace
- New utsname namespace
- set if we want to have the same parent as the cloner
- set the TID in the parent
- set if a pidfd should be placed in parent
- set if we want to let tracing continue on the child too
- create a new TLS for the child
- set if signal handlers and blocked signals shared
- share system V SEM_UNDO semantics
- Same thread group?
- set if the tracing process can’t force CLONE_PTRACE on this clone
- set if the parent wants the child to wake it up on mm_release
- set if VM shared between processes
- Set the FD_CLOEXEC bit instead of closing the file descriptor.
- Unshare the file descriptor table before closing file descriptors.
- This cluster is free
- This cluster is backing a transparent huge page
- This cluster has no next cluster
- mark or space (stick) parity
- Flag swap_map continuation for full count
- flow control
- cloning flags: signal mask to be sent at exit
- Binary emulation
- arlan wireless driver
- Busses
- CTL_BUS names:
- CPU stuff (speed scaling, etc)
- Debugging
- Devices
- frv specific sysctls
- Filesystems
- Top-level names:
- how many path components do we allow in a call to sysctl? In other words, what is the largest acceptable value for the nlen member of a struct sysctl_args_t to have?
- Networking
- frv power management
- removal breaks strace(1) compilation
- s390 debug
- sunrpc debug
- VM management
- CTL_DEV names:
- /proc/sys/dev/cdrom
- /proc/sys/dev/ipmi
- /proc/sys/dev/mac_hid
- /proc/sys/dev/parport
- /proc/sys/dev/parport/default
- /proc/sys/dev/parport/parport n/devices/
- /proc/sys/dev/parport/parport n /devices/device n
- /proc/sys/dev/parport/parport n
- /proc/sys/dev/raid
- /proc/sys/dev/scsi
- Used by the DIPC package, try and avoid reusing it
- Types of directory notifications that may be requested. File accessed
- File changed attributes
- File created
- File removed
- File modified
- Don’t remove notifier
- File renamed
- Kernel internal flags invisible to userspace
- Root squash enabled (for v1 quota format)
- Quota stored in a system file
- 16
- these are defined by POSIX and also present in glibc’s dirent.h
- SIGEMT si_codes tag overflow
- Set the Edge Triggered behaviour for the target file descriptor
- Set exclusive wakeup mode for the target file descriptor
- Epoll event masks
- Set the One Shot behaviour for the target file descriptor
- Request the handling of system wakeup events so as to prevent system suspends from happening while those events are being processed.
- Flags for epoll_create1().
- Valid opcodes to issue to sys_epoll_ctl()
- fanotify_init() flags that require CAP_SYS_ADMIN.
- Directory entry modification events - reported only to directory where entry is modified and not to a watching parent.
- Events and flags relevant only for directories
- Events that can only be reported with data type FSNOTIFY_EVENT_ERROR
- Events that user can request to be notified on
- Extra flags that may be reported with event or control handling of events
- Events that can be reported with event->fd
- Events that can only be reported with data type FSNOTIFY_EVENT_INODE
- Events that may be reported to user
- Events that can be reported with data type FSNOTIFY_EVENT_PATH. Note that FAN_MODIFY can also be reported with data type FSNOTIFY_EVENT_INODE.
- Flags allowed to be passed from/to userspace.
- Events that require a permission response from user
- These masks check for invalid bits in permission responses.
- Internal group flags
- fanotify_init() flags that are allowed for user without CAP_SYS_ADMIN.
- the following events that user-space can register for File was accessed
- File accessed in perm check
- Legit userspace responses to a _PERM event
- Deprecated - do not use this in programs and do not add new flags here!
- Deprecated - do not use this in programs and do not add new flags here!
- Deprecated - do not use this in programs and do not add new flags here!
- Deprecated - do not use this in programs and do not add new flags here!
- Deprecated - do not use this in programs and do not add new flags here!
- All events which require a permission response from userspace Deprecated - do not use this in programs and do not add new flags here!
- Metadata changed
- Bitmask to create audit record for result
- These are NOT bitwise flags. Both bits are used together.
- flags used for fanotify_init()
- helper events close
- Unwritable file closed
- Writable file closed
- Subfile was created
- Subfile was deleted
- Self was deleted
- Reserved for FAN_EVENT_INFO_TYPE_OLD_DFID 11
- Special info types for FAN_RENAME
- Interested in child events
- Filesystem error
- Bitmask to indicate additional information
- flags used for fanotify_modify_mark()
- FAN_MARK_FILESYSTEM is 0x0000_0100
- This bit is mutually exclusive with FAN_MARK_IGNORED_MASK bit
- FAN_MARK_MOUNT is 0x0000_0010
- Convenience macro - FAN_MARK_IGNORE requires FAN_MARK_IGNORED_SURV_MODIFY for non-inode mark types.
- These are NOT bitwise flags. Both bits can be used together.
- File was modified
- moves
- File was moved from X
- File was moved to Y
- Self was moved
- No fd set in event
- Event occurred against dir
- File was opened
- File was opened for exec
- File open/exec in perm check
- File open in perm check
- Event queued overflowed
- File was renamed
- Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID
- Convenience macro - FAN_REPORT_TARGET_FID requires all other FID flags
- Report unique directory id
- Report unique file id
- Report events with name
- Flags to determine fanotify event format Report pidfd for event->pid
- Report dirent target id
- event->pid is thread id
- Reserved for FAN_EVENT_INFO_TYPE_NEW_DFID 13
- fcntl, for BSD compatibility
- userspace function ptrs point to descriptors (signal handling)
- For F_[GET|SET]FL
- This allows for 1024 file descriptors:
- bmap access
- Data is encrypted by fs.
- Data mixed with metadata.
- Multiple files in block.
- Location still pending.
- Data can not be read while fs is unmounted
- Last extent in file.
- File does not natively support extents.
- Extent offsets may not be block aligned.
- Space shared with other files.
- Data location unknown.
- Space allocated, but no data (i.e. zero).
- request caching of the extents
- sync file data before map
- map extended attribute tree
- Freeze
- get the block size used for bmap
- extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions
- Some arches already define FIOQSIZE due to a historical conflict with a Hayes modem-specific ioctl value.
- Socket-level I/O control calls.
- Thaw
- Trim
- trap on condition
- floating point divide by zero
- floating point invalid operation
- floating point overflow
- floating point inexact result
- subscript out of range
- floating point underflow
- undiagnosed floating-point exception
- SIGFPE si_codes integer divide by zero
- integer overflow
- top of stack page
- Create new or reuse existing superblock
- Create new superblock, fail if reusing existing superblock
- Invoke superblock reconfiguration
- Set parameter, supplying a binary blob value
- Set parameter, supplying an object by fd
- Set parameter, supplying no value
- Set parameter, supplying an object by path
- Set parameter, supplying an object by (empty) path
- Set parameter, supplying a string value
- Max chars for the interface; each fs may differ
- fsmount() flags.
- fsopen() flags.
- fspick() flags.
- system-wide maximum number of aio requests
- current system-wide number of aio requests
- writes to file may only append
- btree format dir
- One or more compressed clusters
- Compress file
- dirsync behaviour (directories only)
- Reserved for compression usage…
- int: directory notification enabled
- disc quota usage statistics and control
/proc/sys/fs/quota/
- Inode used for large EA
- Encryption algorithms
- Removed, do not use.
- Removed, do not use.
- End compression flags — maybe not all used Encrypted file
- Reserved for ext4
- Extents
- User modifiable flags
- User visible flags
- Reserved for ext4
- AFS directory
- Immutable file
- hash-indexed directory
- Reserved for ext4
- inotify submenu
- Reserved for ext3
- File system encryption support Policy provided via an ioctl on the topmost directory
- Parameters for passing an encryption key into the kernel keyring
- int: leases enabled
- int: maximum time to wait for a lease break
- int: maximum number of dquots that can be allocated
- int: maximum number of file descriptors that can be allocated
- int: maximum number of inodes that can be allocated
- int: maximum number of super_blocks that can be allocated
- Structure that userspace passes to the kernel keyring
- do not update atime
- Don’t compress
- Do not cow file
- do not dump file
- file tail should not be merged
- int: current number of allocated dquots
- int: current number of allocated file descriptors
- CTL_FS names:
- int: current number of allocated super_blocks
- ocfs2
- int: overflow GID
- int: overflow UID
- use master key directly
- Create with parents projid
- reserved for ext2 lib
- Inode flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS)
- Synchronous updates
- Top of directory hierarchies
- Undelete
- all writes append
- CoW extent size allocator hint
- use DAX for IO
- extent size allocator hint
- inherit inode extent size
- use filestream allocator
- no DIFLAG for this
- file cannot be modified
- do not update access time
- do not defragment
- do not include in backups
- disallow symlink creation
- preallocated file extents
- create with parents projid
- Flags for the fsx_xflags field: data in realtime volume
- create with rt bit set
- all writes synchronous
- struct: control xfs parameters
- fs on-disk file types.
- Flags to specify the bit length of the futex word for futex2 syscalls. Currently, only 32 is supported.
- bitset with all bits set for the FUTEX_xxx_BITSET OPs to request a match of any bit.
- *(int *)UADDR2 += OPARG;
- *(int *)UADDR2 &= ~OPARG;
- if (oldval == CMPARG) wake
- if (oldval >= CMPARG) wake
- if (oldval > CMPARG) wake
- if (oldval <= CMPARG) wake
- if (oldval < CMPARG) wake
- if (oldval != CMPARG) wake
- Use (1 << OPARG) instead of OPARG.
- *(int *)UADDR2 |= OPARG;
- *(int *)UADDR2 = OPARG;
- *(int *)UADDR2 ^= OPARG;
- The kernel signals via this bit that a thread holding a futex has exited without unlocking the futex.
- The rest of the robust-futex field is for the TID:
- Second argument to futex syscall
- Are there any waiters for this robust futex:
- Maximum number of elements in a futex_waitv_t array.
- Set/Get seals
- Cancel a blocking posix lock; internal use only until we expose an asynchronous lock api to userspace:
- Create a file descriptor with FD_CLOEXEC set.
- for old implementation of bsd flock()
- using ‘struct flock64’
- Set/Get write life time hints. {GET,SET}_RW_HINT operate on the underlying inode, while {GET,SET}_FILE_RW_HINT operate only on the specific file.
- Request notifications on a directory. See below for events that may be notified.
- Open File Description Locks
- Check for file existence.
- for posix fcntl() and lockf()
- prevent future writes while mapped
- prevent file from growing
- Types of seals prevent further seals from being set
- prevent file from shrinking
- prevent writes
- Set and get of pipe page size array
- get all semval’s
- get semncnt
- semctl Command Definitions. get sempid
- get semval
- get semzcnt
- element used for group quotas
- Shift from CBAUD to CIBAUD
- Map CR to NL on input
- c_iflag bits: Ignore break condition
- Ignore CR
- Ignore characters with parity errors
- Structure used for setting quota information about file via quotactl. Following flags are used to specify which fields are valid
- unimplemented instruction address
- internal stack error
- coprocessor error
- illegal addressing mode
- SIGILL si_codes illegal opcode
- illegal operand
- illegal trap
- privileged opcode
- privileged register
- 224.0.0.1
- 224.0.0.2
- 224.0.0.106
- Address to accept any incoming messages.
- Address to send to all hosts.
- Address to loopback in software to local host. 127.0.0.1
- 224.0.0.255
- Address indicating an error return.
- Defines for Multicast INADDR 224.0.0.0
- Map NL to CR on input
- /proc/sys/fs/inotify/ max instances per user
- max watches per user
- Enable input parity check
- Fixed constants first: Initial setting for nfile rlimits
- Hard limit for nfile rlimits
- the following are legal, implemented events that user-space can watch for File was accessed
- All of the events - we build the list by hand so that we can add flags in the future and not break backward compatibility.
- Metadata changed
- Flags for sys_inotify_init1.
- close
- Unwritable file closed
- Writable file was closed
- Subfile was created
- Subfile was deleted
- Self was deleted
- don’t follow a sym link
- exclude events on unlinked objects
- File was ignored
- event occurred against dir
- Network number for local host loopback.
- add to the mask of an already existing watch
- only create watches
- File was modified
- moves
- File was moved from X
- File was moved to Y
- Self was moved
- only send event once
- special flags only watch the path if it is a directory
- File was opened
- Event queued overflowed
- the following are legal events. they are sent as needed to any watch Backing fs was unmounted
- Valid flags for the aio_flags member of the struct iocb.
- …and for the drivers/sound files…
- Direction bits, which any architecture can choose to override before including this file.
- Let any architecture override either of the following before including this file.
- 8 best effort priority levels are supported
- These are the io priority groups as implemented by CFQ.
- Gives us 8 prio classes with 13-bits of data for each class
- Fallback BE priority
- cqe->flags
- io_uring_enter(2) flags
- io_uring_params->features flags
- sqe->fsync_flags
- Magic offsets for the application to mmap the data it needs
- io_uring_register(2) opcodes and arguments
- attach to existing wq
- clamp SQ/CQ ring sizes
- app defines CQ size
- io_uring_setup() flags: io_context is polled
- SQ poll thread
- sq_thread_cpu is valid
- sq_ring->flags: needs io_uring_enter wakeup
- sqe->timeout_flags
- always go async
- select buffer from sqe->buf_group
- sqe->flags: use fixed fileset
- issue after inflight IO
- like LINK, but stronger
- links next sqe
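The IOSQE_* entries above are per-entry bits in the one-byte sqe->flags field. "links next sqe" means an entry carrying IOSQE_IO_LINK must complete before its successor starts, so a chain is expressed by setting the flag on every entry except the last. A sketch under the assumption that the bit positions match include/uapi/linux/io_uring.h; chain_flags is a hypothetical helper that only computes flag bytes and builds no real ring.

```rust
// Flag bits assumed from include/uapi/linux/io_uring.h (IOSQE_*_BIT).
// IOSQE_IO_DRAIN (bit 1), IOSQE_ASYNC (bit 4) and IOSQE_BUFFER_SELECT (bit 5)
// live in the same byte but are not needed here.
const IOSQE_FIXED_FILE: u8 = 1 << 0;
const IOSQE_IO_LINK: u8 = 1 << 2;
const IOSQE_IO_HARDLINK: u8 = 1 << 3;

/// Hypothetical helper: the sqe->flags values for `n` entries that should run
/// as one chain. Every entry except the last links to its successor; with
/// `hard` the link survives a failed entry. `common` is OR-ed into every entry.
fn chain_flags(n: usize, hard: bool, common: u8) -> Vec<u8> {
    let link = if hard { IOSQE_IO_HARDLINK } else { IOSQE_IO_LINK };
    (0..n)
        .map(|i| if i + 1 < n { common | link } else { common })
        .collect()
}

fn main() {
    // e.g. write, fsync, close on a registered (fixed) file, in that order.
    let flags = chain_flags(3, false, IOSQE_FIXED_FILE);
    for (i, f) in flags.iter().enumerate() {
        println!("sqe {}: flags = {:#04x}", i, f);
    }
}
```

With a plain IOSQE_IO_LINK chain, a failing entry cancels the rest of the chain; IOSQE_IO_HARDLINK ("like LINK, but stronger") keeps the ordering without severing the chain on failure.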
- Resource get request flags: create if key is nonexistent
- These fields are used by the DIPC package, so the kernel as standard should avoid using them if possible. Make it distributed
- fail if key exists
- See ipcs
- return error on wait
- Version flags for semctl, msgctl, and shmctl commands. These are passed as bitflags or-ed with the actual command
- this machine is the DIPC owner
- Control commands used with semctl, msgctl and shmctl (see also specific commands in sem.h, msg.h and shm.h): remove resource
- Set ipc_perm options
- Get ipc_perm options
- Authentication Header protocol
- IP option pseudo header for BEET
- Compression Header Protocol
- Datagram Congestion Control Protocol
- IPv6 destination options
- Exterior Gateway Protocol
- Encapsulation Header
- Encapsulation Security Payload protocol
- IPv6 fragmentation header
- Cisco GRE tunnels (rfc 1701,1702)
- IPv6 extension headers: IPv6 hop-by-hop options
- Internet Control Message Protocol
- ICMPv6
- XNS IDP protocol
- Internet Group Management Protocol
- INET: An implementation of the TCP/IP protocol suite for the LINUX operating system. INET is implemented using the BSD Socket interface as the means of communication with the user level.
- IPIP tunnels (older KA9Q tunnels use 94)
- IPv6-in-IPv4 tunnelling
- IPv6 mobility header
- MPLS in IP (RFC 4023)
- Multicast Transport Protocol
- IPv6 no next header
- Protocol Independent Multicast
- PUP protocol
- Raw IP packets
- IPv6 routing header
- RSVP Protocol
- Stream Control Transport Protocol
- Transmission Control Protocol
- SO Transport Protocol Class 4
- User Datagram Protocol
- UDP-Lite (RFC 3828)
- IPV6 socket options
- RFC5014: Source address selection
- obsolete
- Bitmask constant declarations to help applications select out the flow label and priority fields.
- Flowlabel
- RFC5082: Generalized TTL Security Mechanism
- IPV6_MTU_DISCOVER values
- same as IPV6_PMTUDISC_PROBE, provided for symmetry with IPv4; also see comments on IP_PMTUDISC_INTERFACE
- weaker version of IPV6_PMTUDISC_INTERFACE, which allows packets to get fragmented if they exceed the interface MTU
- These definitions are obsolete
- Advanced API (RFC3542) (1). Note: IPV6_RECVRTHDRDSTOPTS does not exist.
- Advanced API (RFC3542) (2)
- RFC 5570
- home address option
- IPv6 TLV options.
- IPX options
- These need to appear somewhere around here
- Proxy original addresses
- Always DF
- IP_MTU_DISCOVER values: Never send DF frames
- Always use interface MTU (ignores dst PMTU) but don’t set DF flag. Also incoming ICMP frag_needed notifications will be ignored on this socket to prevent accepting spoofed ones.
- Ignore dst PMTU
- Use per route hints
- BSD compatibility
- c_lflag bits
- Strip 8th bit off characters
- Names of the interval timers, and structure defining a timer setting:
- c_iflag bits
- Any character will restart after stop
- Comparison type - enum kcmp_type.
- BSD process accounting parameters
- int: flags for setting up video after ACPI sleep
- int: boot loader type
- int: PID of the process to notify on CAD
- int: print compat layer messages
- string: pattern for core-file names
- int: use core or core.%pid
- int: allow ctl-alt-del to reboot
- string: domainname
- string: path to uevent helper (deprecated)
- int: hppa soft-power enable
- int: hppa unaligned-trap enable
- int: hz timer on or off
- int: ia64 unaligned userland trap enable
- int: unimplemented ieee instructions
- int: rtmutex’s maximum lock depth
- int: Maximum nr of threads in the system
- string: modprobe path
- int: Maximum size of a message
- int: Maximum message queue size
- int: msg queue identifiers
- int: Maximum system message pool size
- Name translation
- int: NGROUPS_MAX
- int: enable/disable nmi watchdog
- string: hostname
- string: system release
- int: system revision
- CTL_KERN names:
- int: overflow GID
- int: overflow UID
- int: panic timeout
- int: whether we will panic on an unrecovered
- int: whether we will panic on an oops
- int: call panic() in WARN() functions
- ulong: bitmask to print system info on panic
- int: PID # limit
- turn htab reclamation on/off on PPC
- l2cr register on PPC
- use nap mode for power saving
- turn idle page zeroing on/off on PPC
- struct: control printk logging parameters
- int: tune printk ratelimiting
- int: tune printk ratelimiting
- table: profiling information
- dir: pty driver
- Random driver
- int: randomize virtual address space
- real root device to mount after initrd
- Max queueable
- Number of rt sigs queued
- int: dumps of user faults
- struct: maximum rights mask
- struct: sysv semaphore limits
- int: behaviour of dumps for setuid core
- int: sg driver reserved buffer size
- int: Maximum size of shared memory
- long: Maximum shared memory segment
- int: shm array identifiers
- string: path to shm fs
- reboot command on Sparc
- int: serial console power-off halt
- int: Sparc Stop-A enable
- int: number of spinlock retries
- int: Sysreq enable
- int: various kernel tainted flags
- int: unknown nmi panic flag
- string: compile time info
- These values match the ELF architecture values. Unless there is a good reason that should continue to be the case.
- Kexec file load interface flags.
- kexec system call - It loads the new kernel to boot into.
- The artificial cap on the number of segments passed to kexec_load.
- Key is built into kernel
- Override the check on restricted keyrings
- add to quota, reject if would overrun
- not in quota
- add to quota, permit even if overrun
- allocating a user or user session keyring
- authentication token / access credential / keyring
- set if key is built in to the kernel
- set if key type has been deleted
- set if key has been invalidated
- set if key consumes quota
- set if key should not be removed
- set if key had been revoked
- set if key can be cleared by root without permission
- set if key can be invalidated by root without permission
- set if key is a user or user session keyring
- set if key is being constructed in userspace
- group permissions…
- Positively instantiated
- All the above permissions
- Require permission to link
- Require permission to read content
- Require permission to search (keyring) or find (key)
- Require permission to change attributes
- The permissions required on a key that we’re looking up. Require permission to view attributes
- Require permission to update / modify
- third party permissions…
- possessor can create a link to a key/keyring
- possessor can read key payload / view keyring
- possessor can find a key in search / search a keyring
- possessor can set key attributes
- possessor can view a key’s attributes
- possessor can update key payload / add link to keyring
- user permissions…
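The KEY_POS_*, KEY_USR_*, KEY_GRP_* and KEY_OTH_* entries above are the same six permission bits (view, read, write, search, link, setattr) repeated in four bytes of one 32-bit mask, the kind of value passed to keyctl(KEYCTL_SETPERM, ...): possessor in the top byte, then user, group and other. A sketch of building and checking such a mask, assuming that byte layout and bit order; the perm and allows helpers are hypothetical and the values are copied locally.

```rust
// Assumed layout of a key permission mask: the same six bits repeated in
// four bytes (possessor, user, group, other), as in KEY_POS_*, KEY_USR_*,
// KEY_GRP_* and KEY_OTH_*.
const VIEW: u32 = 0x01;
const READ: u32 = 0x02;
const WRITE: u32 = 0x04;
const SEARCH: u32 = 0x08;
const LINK: u32 = 0x10;
const SETATTR: u32 = 0x20;
const ALL: u32 = 0x3f;

const POS_SHIFT: u32 = 24; // possessor
const USR_SHIFT: u32 = 16; // owning user
const GRP_SHIFT: u32 = 8; // owning group
const OTH_SHIFT: u32 = 0; // everyone else

/// Moves a set of per-class bits into the byte for one class.
fn perm(shift: u32, bits: u32) -> u32 {
    (bits & ALL) << shift
}

/// True if the class at `shift` holds every bit in `need`.
fn allows(mask: u32, shift: u32, need: u32) -> bool {
    let need = need & ALL; // only the six per-class bits are meaningful
    ((mask >> shift) & need) == need
}

fn main() {
    // Possessor may do everything; the owner may only view and read.
    let mask = perm(POS_SHIFT, ALL) | perm(USR_SHIFT, VIEW | READ);
    println!("mask = {:#010x}", mask);
    println!("owner may write: {}", allows(mask, USR_SHIFT, WRITE));
    println!("possessor may link: {}", allows(mask, POS_SHIFT, LINK));
}
```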
- Definitions of structures used with the modify_ldt system call. Maximum number of LDT entries supported.
- The size of each LDT entry.
- links a file may have
- Backwardly compatible definition for source code - trapped in a 32-bit world. If you find you need this, please consider using libcap to untrap yourself…
- User-level code does most of the mapping between kernel and user capabilities based on the version tag given by the kernel.
- deprecated - use v3
- Commands accepted by the _reboot() system call.
- Magic values required to use _reboot() system call.
- exclusive lock
- This is a mandatory flock …
- or’d with one of the above to prevent blocking
- which allows concurrent read operations
- which allows concurrent read & write ops
- operations for bsd flock(), also used by the kernel implementation: shared lock
- remove lock
- which allows concurrent write operations
- Special mnt_id values that can be passed to listmount: root mount
- deactivate these pages
- Clear the MADV_DONTDUMP flag
- do inherit across fork
- Explicitly exclude from the core dump, overrides the coredump filter bits
- don’t inherit across fork
- don’t need these pages
- common parameters: try to keep these consistent across architectures. Free pages only if memory pressure
- Worth backing with hugepages
- poison a page for testing
- Undo MADV_WIPEONFORK
- KSM may merge identical pages
- Not worth backing with hugepages
- no further special treatment
- reclaim these pages
- expect random page references
- remove these pages & resources
- expect sequential page references
- soft offline page for testing
- KSM may not merge identical pages
- will need these pages
- Zero memory on fork, child only
- don’t use a file
- ETXTBSY
- mark it as an executable
- compatibility flags
- Interpret addr exactly
- MAP_FIXED which doesn’t unmap underlying mapping
- stack-like segment
- create a huge page mapping
- Huge page size encoding when MAP_HUGETLB is specified, and a huge page size other than the default is desired.
- pages are locked
- do not block on IO
- don’t check for reservations
- 0x0100 - 0x4000 flags are defined in asm-generic/mman.h. Populate (prefault) pagetables
- Changes are private
- Share changes
- share + validate extension flags
- give out an address that is best suited for process/thread stacks
- perform synchronous page faults for the mapping
- 0x01 - 0x03 are defined in linux/mman.h. Mask for type of mapping
- For anonymous mmap, memory could be uninitialized
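The "Huge page size encoding" entry above refers to storing log2 of the desired huge page size in a 6-bit field of the mmap flags, starting at bit 26, so 2 MiB pages are requested with 21 << 26 alongside MAP_HUGETLB. A sketch of the encoding, assuming MAP_HUGE_SHIFT = 26 and MAP_HUGE_MASK = 0x3f from include/uapi/asm-generic/hugetlb_encode.h and MAP_HUGETLB = 0x40000 (the asm-generic value; a few architectures use a different bit). Both helpers are hypothetical.

```rust
// Assumed values: the asm-generic MAP_HUGETLB bit (a few architectures use a
// different one) and the hugetlb_encode.h field (6 bits starting at bit 26).
const MAP_HUGETLB: u32 = 0x0004_0000;
const MAP_HUGE_SHIFT: u32 = 26;
const MAP_HUGE_MASK: u32 = 0x3f;

/// mmap flag bits requesting huge pages of `page_size` bytes, or None if the
/// size is not a power of two of at least 2 bytes.
fn map_huge_flags(page_size: u64) -> Option<u32> {
    if page_size < 2 || !page_size.is_power_of_two() {
        return None;
    }
    let log2 = page_size.trailing_zeros(); // 1..=63, fits in the 6-bit field
    Some(MAP_HUGETLB | ((log2 & MAP_HUGE_MASK) << MAP_HUGE_SHIFT))
}

/// Page size encoded in `flags`; 0 means "use the default huge page size".
fn requested_page_size(flags: u32) -> u64 {
    let log2 = (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK;
    if log2 == 0 {
        0
    } else {
        1u64 << log2
    }
}

fn main() {
    for &size in &[2u64 << 20, 1u64 << 30] {
        let flags = map_huge_flags(size).unwrap(); // both sizes are powers of two
        println!("{} bytes -> flags {:#010x} -> {} bytes", size, flags, requested_page_size(flags));
    }
}
```

The same field layout is what the MFD_HUGETLB and SHM_HUGETLB size-encoding entries in this list describe.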
- max frequency error (ns/s)
- max phase error (ns)
- max interval between updates (s)
- BPF has 10 general purpose 64-bit registers and stack frame.
- size of the canonical input queue
- size of the type-ahead buffer
- MAX_SWAPFILES defines the maximum number of swaptypes: things which can be swapped to.
- lock all current mappings
- lock all future mappings
- lock all pages that are faulted in
- enum membarrier_cmd - membarrier system call command
- Alias for header backward compatibility.
- flags for memfd_create(2) (unsigned int)
- Huge page size encoding when MFD_HUGETLB is specified, and a huge page size other than the default is desired.
- min interval between updates (s)
- Flags for mlock: Lock pages in range after they are faulted in, do not prefault
- Just detach from the tree
- Mark for expiry
- Umount options: Attempt to forcibly umount
- List of all mnt_id_req versions. sizeof first published struct
- NTP userland likes the MOD_ prefix better
- Idmap mount to @userns_fd in struct mount_attr.
- Do not update access times.
- Disallow access to device special files
- Do not update directory access times
- Disallow program execution
- Ignore suid and sgid bits
- Do not follow symlinks
- Mount attributes used in fsmount(). Mount read-only
- Update atime relative to mtime/ctime.
- List of all mount_attr versions. sizeof first published struct
- Always perform atime updates
- Setting on how atime should be updated
- Mount beneath top mount
- Follow automounts on from path
- Empty from path permitted
- move_mount() flags. Follow symlinks on from path
- Set sharing group instead
- Follow automounts on to path
- Empty to path permitted
- Follow symlinks on to path
- Policies
- look up vma using address
- preferred local allocation
- return allowed memories
- this policy wants migrate on fault
- Migrate On protnone Reference On Node
- Flags for get_mempolicy: return next IL mode instead of node mask
- Internal flags that share the struct mempolicy flags word with “mode flags”.
- Flags for set_mempolicy
- always last member of enum
- Internal flags start here
- Modifies ’_MOVE: lazy migrate on fault
- Move pages owned by this process to conform to policy
- Move every page to conform to policy
- Flags for mbind: Verify existing pages in the mapping
- MPOL_MODE_FLAGS is the union of all possible optional mode flags passed to either set_mempolicy() or mbind().
- per-uid limit of kernel memory used by mqueue, in bytes
- Parameters used to convert the timespec values:
- number of entries in message map
- <= INT_MAX, max size of message (bytes)
- <= INT_MAX, default max size of a message queue
- MSGMNI, MSGMAX and MSGMNB are default values which can be modified by sysctl.
- unused
- max no. of segments
- message segment size
- number of system message headers
- sendmmsg(): more messages coming
- Set close_on_exec for file descriptor received through SCM_RIGHTS
- We never have 32 bit fixups
- Confirm path validity
- copy (not remove) all queue messages
- Nonblocking io
- End of record
- Fetch message from error queue
- recv any msg except of specified type.
- Send data in TCP SYN
- Sender will send more
- msgrcv options: no error if message is too big
- Do not generate SIGPIPE
- sendpage() internal: page frags are not shared
- Flags we can use with send/ and recv. Added those for 1003.1g; not all are supported yet
- Do not send. Only probe path, e.g. for MTU
- sendpage() internal: page may carry plain text and require encryption
- sendpage() internal: do not apply policy
- sendpage() internal: not the last page
- ipcs ctl commands
- Synonym for MSG_DONTROUTE for DECnet
- Wait for a full request
- recvmmsg(): block until 1+ packets avail
- Use user data in kernel path
- sync memory asynchronously
- Directory modifications are synchronous
- invalidate the caches
- Update inode I_version field
- this is a kern_mount call
- Update the on-disk acm times lazily
- Allow mandatory locks on an FS
- Old magic mount flag and mask
- Do not update access times.
- Disallow access to device special files
- Do not update directory access times
- Disallow program execution
- Ignore suid and sgid bits
- VFS does not apply the umask
- change to private
- These are the fs-independent mount-flags: up to 32 flags are supported
- Update atime relative to mtime/ctime.
- Alter flags of a mounted FS
- Superblock flags that can be altered by MS_REMOUNT
- change to shared
- change to slave
- Always perform atime updates
- These sb flags are internal to the kernel
- synchronous memory sync
- Writes are synced at once
- change to unbindable
- MS_VERBOSE is deprecated.
- chars in a file name
- /proc/sys/net/appletalk
- /proc/sys/net/ax25
- /proc/sys/net/bridge
- CTL_NET names:
- was NET_CORE_DESTROY_DELAY
- /proc/sys/net/core
- /proc/sys/net/dccp
- /proc/sys/net/decnet/conf/<dev>/
- /proc/sys/net/decnet/conf/<dev>
- /proc/sys/net/decnet/
- /proc/sys/net/ipv4
- /proc/sys/net/ipv4/netfilter
- obsolete since 2.6.38
- obsolete since 2.6.25
- obsolete since 2.6.25
- /proc/sys/net/ipv6
- /proc/sys/net/ipv6/icmp
- /proc/sys/net/ipx
- /proc/sys/net/llc
- /proc/sys/net/llc/llc2/timeout
- /proc/sys/net/llc/llc2
- /proc/sys/net/llc/station
- /proc/sys/net/<protocol>/neigh/<dev>
- /proc/sys/net/netrom
- /proc/sys/net/netfilter
- /proc/sys/net/rose
- /proc/sys/net/sctp
- /proc/sys/net/token-ring
- /proc/sys/net/unix
- /proc/sys/net/x25
- supplemental group IDs are available
- SIGEV_THREAD implementation:
- NET: An implementation of the SOCKET network access protocol. This is the master header file for the Linux NET layer, or, in plain English: the networking handling part of the kernel.
- number of available namespaces
- NTP API version
- c_oflag bits
- Close the file on execve()
- open_tree() flags. Clone the target tree and attach the clone
- c_oflag bits
- On syscall entry, this is syscall#. On CPU exception, this is error code. On hw interrupt, it’s IRQ number:
- set close_on_exec
- not fcntl
- direct disk access hint
- must be a directory
- used to be O_SYNC, see below
- not fcntl
- not fcntl
- don’t follow links
- a horrid kludge trying to make sure that this will fail on old kernels
- not fcntl
- Mark parity and framing errors
- chars in a path name including nul
- sizeof first published struct
- add: config2
- add: branch_sample_type
- add: sample_regs_user; add: sample_stack_user
- add: sample_regs_intr
- add: aux_watermark
- sample collided with another
- snapshot from overwrite mode
- record contains gaps
- PERF_RECORD_AUX::flags bits: record was truncated to fit
- function call
- conditional
- conditional function call
- conditional function return
- indirect
- indirect function call
- function return
- syscall
- syscall return
- unconditional
- Common flow change classification: unknown
- O_CLOEXEC
- pid=cgroup id, per-cpu mode only
- Ioctls that can be done on a perf event fd:
- locked transaction
- locked instruction not available
- 5-0xa available. Any cache
- L1
- L2
- L3
- L4
- LFB
- N/A
- PMEM
- RAM
- hit level
- I/O memory
- L1
- L2
- L3
- Line Fill Buffer
- Local DRAM
- miss level
- memory hierarchy (memory level, hit or miss) not available
- Remote Cache (1 hop)
- Remote Cache (2 hops)
- Remote DRAM (1 hop)
- Remote DRAM (2 hops)
- Uncached memory
- code (execution)
- load instruction
- type of opcode (load/store/prefetch,code) not available
- prefetch
- store instruction
- Remote
- forward
- 1 free
- snoop hit
- snoop hit modified
- snoop miss
- snoop mode not available
- no snoop
- hit level
- L1
- L2
- miss level
- TLB access not available
- OS fault handler
- Hardware Walker
- These PERF_RECORD_MISC_* flags below are safely reused for the following events:
- Reserve the last bit to indicate some extended misc field
- Following PERF_RECORD_MISC_* are used on different events, so can reuse the same bit position:
- Indicates that /proc/PID/maps parsing was truncated by timeout.
- bits 32..63 are reserved for the abort code
- Instruction not related
- Capacity read abort
- Capacity write abort
- Conflict abort
- Values for the memory transaction event qualifier, mostly for abort events. Multiple bits can be set. From elision
- non-ABI
- Retry possible
- Instruction is related
- From transaction
- Security-relevant compatibility flags that must be cleared upon setuid or setgid exec:
- IRIX5 32-bit
- IRIX6 64-bit
- IRIX6 new 32-bit
- Personality types.
- OSF/1 v4
- Protocol families, same as address families.
- Flags for pidfd_open().
- Flags for pidfd_send_signal().
- bytes in atomic write to a pipe
- The clock frequency of the i8253/i8254 PIT
- currently only for epoll
- These are specified by
iBCS2
- The rest seem to be more-or-less nonstandard. Check them!
- i/o error
- device disconnected
- SIGPOLL (or any other signal without signal specific si_codes) si_codes: data input available
- input message available
- output buffers available
- high priority input available
- Oxford Semiconductor
- usurped by cyclades.c
- RSA-DV II/S card
- usurped by cyclades.c
- These are the supported serial types.
- Don’t need these pages.
- Data will be accessed once.
- No further special treatment.
- Expect random page references.
- Expect sequential page references.
- Will need these pages.
- element used for project quotas
- page can be executed
- mprotect flag: extend change to start of growsdown vma
- mprotect flag: extend change to end of growsup vma
- 0x10 reserved for arch-specific use; 0x20 reserved for arch-specific use. Page can not be accessed
- page can be read
- page may be used for atomic ops
- page can be written
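The PROT_* entries above are OR-ed together into the prot argument of mmap(2) and mprotect(2). A sketch that renders such a value in the r/w/x style of /proc/PID/maps and flags the writable-and-executable combination, assuming the asm-generic values PROT_NONE = 0, PROT_READ = 0x1, PROT_WRITE = 0x2 and PROT_EXEC = 0x4; both helpers are hypothetical.

```rust
// Assumed asm-generic values for the PROT_* bits.
const PROT_NONE: u32 = 0x0;
const PROT_READ: u32 = 0x1;
const PROT_WRITE: u32 = 0x2;
const PROT_EXEC: u32 = 0x4;

/// Renders a prot value in the "rwx" style used by /proc/PID/maps.
fn prot_string(prot: u32) -> String {
    let flag = |bit: u32, c: char| if (prot & bit) != 0 { c } else { '-' };
    [flag(PROT_READ, 'r'), flag(PROT_WRITE, 'w'), flag(PROT_EXEC, 'x')]
        .iter()
        .collect()
}

/// True if a mapping would be writable and executable at the same time,
/// a combination many hardening policies reject.
fn is_write_exec(prot: u32) -> bool {
    (prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC)
}

fn main() {
    for &prot in &[PROT_NONE, PROT_READ, PROT_READ | PROT_WRITE, PROT_READ | PROT_EXEC] {
        println!("{:#x} -> {} (w+x: {})", prot, prot_string(prot), is_write_exec(prot));
    }
}
```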
- Get/set the capability bounding set (as per security/commoncap.c)
- Control the ambient capability set
- True little endian mode
- PowerPC pseudo little endian
- silently emulate fp operations accesses
- don’t emulate fp operations, send SIGFPE instead
- async recoverable exception mode
- FP exceptions disabled
- floating point divide by zero
- floating point invalid operation
- async non-recoverable exc. mode
- floating point overflow
- precise exception mode
- floating point inexact result
- Use FPEXC for FP exception enables
- floating point underflow
- 64b FP registers
- 32b compatibility
- Get/set current->mm->dumpable
- Get/set process endian
- Get/set floating-point emulation control bits (if meaningful)
- Get/set floating-point exception mode (if meaningful)
- Get/set whether or not to drop capabilities on setuid() away from uid 0 (as per security/commoncap.c)
- Get process name
- Second arg is a ptr to return the signal
- Get/set process seccomp mode
- Get/set securebits (as per security/commoncap.c)
- Per task speculation control
- Get/set whether we use statistical process timing or accurate timestamp based process timing
- Get/set the process’ ability to use the timestamp counter instruction
- Get/set unaligned access control bits (if meaningful)
- Set early/late kill mode for hwpoison memory corruption. This influences when the process gets killed on a memory corruption.
- No longer implemented, but left here to ensure the numbers stay reserved:
- Reset arm64 pointer authentication keys
- Control reclaim behavior when allocating memory
- Tune up process memory map specifics.
- Set process name
- If no_new_privs is set, then operations that grant new privileges (i.e. execve) will either fail or not grant them. This affects suid/sgid, file capabilities, and LSMs.
- Values to pass as first argument to prctl(). Second arg is a signal
- Set specific pid that is allowed to ptrace the current task. A value of 0 means “no process”.
- Tagged user address controls for arm64
- Get/set the timerslack as used by poll/select/nanosleep. A value of 0 means “use default”
- Return and control values for PR_SET/GET_SPECULATION_CTRL
- Speculation control variants
- get task vector length
- arm64 Scalable Vector Extension controls. Flag values must be kept in sync with ptrace NT_ARM_SVE interface. Set task vector length
- defer effect until exec
- inherit across exec
- Bits common to PR_SVE_SET_VL and PR_SVE_GET_VL
- Normal, traditional, statistical process timing
- Accurate timestamp based process timing
- allow the use of the timestamp counter
- throw a SIGSEGV instead of reading the TSC
- silently fix up unaligned user accesses
- generate SIGBUS on unaligned user access
- These values are stored in task->ptrace_message by tracehook_report_syscall_* to describe the current syscall-stop.
- Wait extended result codes for the above trace options.
- Extended result codes which are enabled by means other than options.
- Arbitrarily choose the same ptrace numbers as used by the Sparc code.
- Generic ptrace interface that exports the architecture specific regsets using the corresponding NT_* types (which are also used in the core dump).
- only useful for access 32bit programs / kernels
- eventless options
- Options set using PTRACE_SETOPTIONS or using PTRACE_SEIZE @data param
- Read signals from a shared (process wide) queue
- 0x4200-0x4300 are reserved for architecture-independent additions.
- resume execution until next branch
- /proc/sys/kernel/pty
- First argument to waitid:
- Quota format type IDs
- Quota structure used for communication with userspace via quotactl. Following flags are used to specify which fields are valid
- Size of block in which space limits are passed through the quota interface
- Masks for quota types when used as a bitmask
- Usage got below block hardlimit
- Block hardlimit reached
- Usage got below block softlimit
- Block grace time expired
- Block softlimit reached
- Usage got below inode hardlimit
- Inode hardlimit reached
- Usage got below inode softlimit
- Inode grace time expired
- Inode softlimit reached
- Definitions for quota netlink interface
- get quota format used on given filesystem
- get information about quota files
- get disk limits and usage >= ID
- get user quota structure
- turn quotas off
- turn quotas on
- set information about quota files
- set user quota structure
- sync disk copy of a filesystems quotas
- These regs are callee-clobbered. Always saved on kernel entry.
- C ABI says these regs are callee-preserved. They aren’t saved on kernel entry unless syscall needs a complete, fully filled struct pt_regs.
- /proc/sys/kernel/random
- Exchange source and dest
- Don’t overwrite target
- Whiteout source
- Block “lexical” trickery like “..”, symlinks, and absolute paths which escape the dirfd.
- Only complete if resolution can be completed through cached lookup.
- Make all jumps to “/” and “..” be scoped inside the dirfd (similar to chroot(2)).
- Block traversal through procfs-style “magic-links”.
- Block traversal through all symlinks (implies OEXT_NO_MAGICLINKS)
- how->resolve flags for openat2(2).
- Return frame for iretq
- address space limit
- max core file size
- Resource limit IDs
- max data size
- Maximum filesize
- maximum file locks held
- max locked-in-memory address space
- maximum bytes in POSIX mqueues
- max nice prio allowed to raise to 0-39 for nice level 19 .. -20
- max number of open files
- max number of processes
- max resident set size
- maximum realtime priority
- timeout for RT tasks in us
- max number of pending signals
- max stack size
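The RLIMIT_NICE entry above ("max nice prio allowed to raise to 0-39 for nice level 19 .. -20") stores the limit as an unsigned number rather than as a nice value. The setrlimit(2) manual page gives the conversion: the lowest nice value a process without CAP_SYS_NICE may lower itself to is 20 - rlim_cur. A sketch of that conversion in both directions; the helper names are hypothetical and RLIM_INFINITY is assumed to be all bits set, as 64-bit userspace usually sees it.

```rust
/// RLIM_INFINITY as 64-bit userspace usually sees it (all bits set); assumed.
const RLIM_INFINITY: u64 = u64::MAX;

/// Lowest nice value that an RLIMIT_NICE soft limit of `rlim_cur` lets an
/// unprivileged process (no CAP_SYS_NICE) lower itself to: 20 - rlim_cur.
/// Returns None when no value in -20..=19 qualifies; raising one's own nice
/// value does not depend on this limit.
fn lowest_nice(rlim_cur: u64) -> Option<i32> {
    if rlim_cur == 0 {
        return None;
    }
    // Limits of 40 or more, including RLIM_INFINITY, all mean nice -20.
    let capped = rlim_cur.min(40) as i32;
    Some(20 - capped)
}

/// Inverse mapping: the RLIMIT_NICE value needed to allow `nice`.
fn rlimit_for_nice(nice: i32) -> u64 {
    (20 - nice.clamp(-20, 19)) as u64
}

fn main() {
    for &rlim in &[0, 1, 20, 40, RLIM_INFINITY] {
        println!("RLIMIT_NICE {} -> lowest nice {:?}", rlim, lowest_nice(rlim));
    }
    println!("nice -5 needs RLIMIT_NICE >= {}", rlimit_for_nice(-5));
}
```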
- SuS says limits have to be unsigned. Which makes a ton more sense anyway.
- This limit protects against a deliberately circular list. (Not worth introducing an rlimit for it)
- sys_wait4() uses this
- Resource control/accounting header file for linux. Definition of struct rusage taken from BSD 4.3 Reno
- only the calling thread
- per-IO O_APPEND
- per-IO O_DSYNC
- high priority request, poll if possible
- per-IO, return -EAGAIN if operation would block
- mask of flags supported by the kernel
- per-IO O_SYNC
- (1U << 31) is reserved for signed error codes
- Check file is readable.
- non-uapi in-kernel SA_FLAGS for those indicates ABI for a signal frame.
- SA_FLAGS values:
- sizeof first published struct
- add: util_{min,max}
- For the sched_{set,get}attr() calls
- SCHED_ISO: reserved but not implemented yet
- Scheduling policies
- Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork
- rw: struct ucred
- Ancillary data object information MACROS. Table 5-14 of POSIX 1003.1g. “Socket”-level control message types: rw: access rights (array of int)
- rw: security label
- Valid flags for SECCOMP_SET_MODE_FILTER
- Flags for seccomp notification fd ioctl.
- Valid values for seccomp.mode and prctl(PR_SET_SECCOMP, <mode>): seccomp is not in use.
- uses user-supplied filter.
- uses hard-coded filter.
- Masks for the return value sections.
- allow
- returns an errno
- All BPF programs must return a 32-bit value.
- kill the thread
- allow after logging
- pass to a tracer or disallow
- disallow and force a SIGSYS
- notifies userspace
- Valid operations for seccomp syscall.
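The SECCOMP_RET_* entries above split a filter's 32-bit return value into an action in the upper 16 bits and optional data (for SECCOMP_RET_ERRNO, the errno) in the lower 16 bits. When several filters are stacked, the kernel runs all of them and keeps the result whose action, reinterpreted as a signed 32-bit value, is lowest, which is why SECCOMP_RET_KILL_PROCESS (0x80000000) outranks everything. A sketch of that arithmetic, assuming the values from include/uapi/linux/seccomp.h; it evaluates return values only, installs no filter, and the helper names are hypothetical.

```rust
// Values assumed from include/uapi/linux/seccomp.h.
const SECCOMP_RET_KILL_PROCESS: u32 = 0x8000_0000;
const SECCOMP_RET_KILL_THREAD: u32 = 0x0000_0000;
const SECCOMP_RET_TRAP: u32 = 0x0003_0000;
const SECCOMP_RET_ERRNO: u32 = 0x0005_0000;
const SECCOMP_RET_USER_NOTIF: u32 = 0x7fc0_0000;
const SECCOMP_RET_TRACE: u32 = 0x7ff0_0000;
const SECCOMP_RET_LOG: u32 = 0x7ffc_0000;
const SECCOMP_RET_ALLOW: u32 = 0x7fff_0000;
const SECCOMP_RET_ACTION_FULL: u32 = 0xffff_0000;
const SECCOMP_RET_DATA: u32 = 0x0000_ffff;

/// Return value meaning "fail the syscall with this errno".
fn ret_errno(errno: u32) -> u32 {
    SECCOMP_RET_ERRNO | (errno & SECCOMP_RET_DATA)
}

/// Action bits reinterpreted as signed; lower means higher precedence.
fn action_rank(ret: u32) -> i32 {
    (ret & SECCOMP_RET_ACTION_FULL) as i32
}

/// Combines results of stacked filters in evaluation order, starting from
/// ALLOW; a later result replaces the current one only if strictly lower.
fn combine(results: &[u32]) -> u32 {
    let mut best = SECCOMP_RET_ALLOW;
    for &r in results {
        if action_rank(r) < action_rank(best) {
            best = r;
        }
    }
    best
}

/// Name of the action part of a return value.
fn action_name(ret: u32) -> &'static str {
    match ret & SECCOMP_RET_ACTION_FULL {
        SECCOMP_RET_KILL_PROCESS => "kill_process",
        SECCOMP_RET_KILL_THREAD => "kill_thread",
        SECCOMP_RET_TRAP => "trap",
        SECCOMP_RET_ERRNO => "errno",
        SECCOMP_RET_USER_NOTIF => "user_notif",
        SECCOMP_RET_TRACE => "trace",
        SECCOMP_RET_LOG => "log",
        SECCOMP_RET_ALLOW => "allow",
        _ => "unknown",
    }
}

fn main() {
    // Errno 1 is EPERM on Linux.
    let results = [SECCOMP_RET_LOG, ret_errno(1), SECCOMP_RET_ALLOW];
    let r = combine(&results);
    println!("{} with data {}", action_name(r), r & SECCOMP_RET_DATA);
}
```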
- seek relative to current file position
- seek to the next data
- seek relative to end of file
- seek to the next hole
- seek relative to beginning of file
- ADI not enabled for mapped object
- invalid permissions for mapped object
- Disrupting MCD error
- Precise MCD exception
- failed address bound checks
- SIGSEGV si_codes: address not mapped to object
- failed protection key checks
- adjust on exit max value
- of entries in semaphore map
- SEMMNI, SEMMSL and SEMMNS are default values which can be modified by sysctl. The values have been chosen to be larger than necessary for any known configuration.
- <= INT_MAX: max # of semaphores in system
- num of undo structures system wide
- <= INT_MAX: max num of semaphores per id
- <= 1 000: max num of ops per semop call
- unused: max num of undo entries per process
- sizeof struct sem_undo
- <= 32767: semaphore maximum value
- ipcs ctl cmds
- semop flags: undo the operation on exit
- If enabled
- Logical level for RTS pin after sent
- Logical level for RTS pin when sending
- Enable bus termination (if supported)
- set all semval’s
- set semval
- SHIFT_USEC defines the scaling (shift) of the time_freq and time_tolerance variables, which represent the current frequency offset and maximum frequency tolerance. Frequency offset scale (shift)
- max shm system wide (pages)
- max shared seg size (bytes)
- SHMMNI, SHMMAX and SHMALL are default upper limits which can be modified by sysctl.
- max num of segs system wide
- max shared segs per process
- execution access
- Bits 9 & 10 are IPC_CREAT and IPC_EXCL. Segment will use huge TLB pages
- Huge page size encoding when SHM_HUGETLB is specified, and a huge page size other than the default is desired. See hugetlb_encode.h
- super user shmctl commands
- don’t check for reservations
- shmget() shmflg values. The bottom nine bits are the same as open(2) mode flags or S_IRUGO from <linux/stat.h>
- shmat() shmflg values: read-only access
- take-over region on attach
- round attach address to SHMLBA boundary
- ipcs ctl commands
- or S_IWUGO from <linux/stat.h>
- enum sock_shutdown_cmd - Shutdown types:
- Shutdown receptions/transmissions
- Shutdown transmissions
- other notification: meaningless
- sigevent definitions
- deliver via thread creation
- deliver to thread
- These should not be considered constants from userland.
- for blocking signals
- default signal handling
- error return from signal
- ignore signal
- for setting the signal mask
- for unblocking signals
- Get stamp (timeval)
- Get stamp (timespec)
- sent by AIO completion
- sent by glibc async name lookup completion
- sent by execve() killing subsidiary threads
- sent by the kernel from somewhere
- sent by real time mesq state change
- sent by sigqueue
- sent by queued SIGIO
- sent by timer expiration
- sent by tkill system call
- How these fields are to be accessed.
- si_code values: Digital reserves positive values for kernel-generated signals. Sent by kill, sigsend, raise
- Historically, SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA were located in sock->flags, but moved into sk->sk_wq->flags to be RCU protected. Eventually all flags will be in sk->sk_wq->flags.
- Flags for socket, socketpair, accept4
- Datagram Congestion Control Protocol socket
- Datagram (conn.less) socket
- Linux specific way of getting packets at the dev level.
- Raw socket
- Reliably-delivered message
- sequential packet socket
- Structure describing an Internet (IP) socket address. sizeof(struct sockaddr)
- enum sock_type - Socket types. For writing rarp and other similar things on the user level.
- Mask which covers at least up to SOCK_MASK - 1. The remaining bits are used as flags.
- ATM Adaption Layer (packet level)
- ATM layer (cell level)
- setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx
- For setsockopt(3)
- #define SOL_ICMP 1 No-no-no! Due to Linux :-) we cannot use SOL_ICMP=1
- UDP-Lite (RFC 3828)
- Maximum queue length specifiable by listen.
- Socket filtering
- Instruct lower device to use last 4-bytes of skb data as FCS
- powerpc only differs in these
- Security levels - as per NRL IPv6 - don’t actually do anything
- on 64-bit and x32, avoid the ?: operator
- sqe->splice_flags
- pages passed in are a gift
- expect more data
- Flags passed in from splice/tee/vmsplice: move pages instead of copying
- don’t block on the pipe splicing (but we may still block on the fd we splice from/to, of course)
- bit-flags: disable sas during sighandling
- connected to socket
- in process of connecting
- in process of disconnecting
- mask for all SS_xxx flags
- not allocated
- unconnected to any socket
- Want/got fs_type
- Want/got mnt_…
- Want/got mnt_point
- Want/got mnt_root
- Want/got propagate_from
- mask bits for statmount(2): Want/got sb_…
- All currently supported flags
- Want/got stx_atime
- [I] File is append-only
- Dir: Automount trigger
- Attributes to be found in stx_attributes and masked in stx_attributes_mask.
- [I] File requires key to decrypt in fs
- [I] File is marked immutable
- [I] File is not to be dumped
- The stuff in the normal stat struct
- Want/got
stx_blocks
- Want/got
stx_btime
- Want/got
stx_ctime
- Want/got
stx_gid
- Want/got
stx_ino
- Want/got
stx_mode & ~S_IFMT
- Want/got
stx_mtime
- Want/got
stx_nlink
- Want/got
stx_size
- Flags to be stx_mask
- Want/got
stx_uid
- Reserved for future struct statx expansion
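The STATX_* "Want/got" entries above are used in both directions: the caller ORs them into the mask argument of statx(2) to say which fields it wants, and the kernel sets stx_mask to the fields it actually filled in, which may be fewer (for example, no stx_btime on a filesystem that does not record creation time). A sketch of checking for missing fields, assuming the bit values from include/uapi/linux/stat.h; the missing helper is hypothetical.

```rust
// Bit values assumed from include/uapi/linux/stat.h.
const STATX_TYPE: u32 = 0x0000_0001;
const STATX_MODE: u32 = 0x0000_0002;
const STATX_NLINK: u32 = 0x0000_0004;
const STATX_UID: u32 = 0x0000_0008;
const STATX_GID: u32 = 0x0000_0010;
const STATX_ATIME: u32 = 0x0000_0020;
const STATX_MTIME: u32 = 0x0000_0040;
const STATX_CTIME: u32 = 0x0000_0080;
const STATX_INO: u32 = 0x0000_0100;
const STATX_SIZE: u32 = 0x0000_0200;
const STATX_BLOCKS: u32 = 0x0000_0400;
const STATX_BASIC_STATS: u32 = 0x0000_07ff; // "the stuff in the normal stat struct"
const STATX_BTIME: u32 = 0x0000_0800;

/// Bits that were requested but are absent from the returned stx_mask.
fn missing(requested: u32, stx_mask: u32) -> u32 {
    requested & !stx_mask
}

fn main() {
    // Ask for the basic fields plus birth time; suppose the filesystem could
    // not supply stx_btime, so the kernel left that bit clear in stx_mask.
    let requested = STATX_BASIC_STATS | STATX_BTIME;
    let stx_mask = STATX_BASIC_STATS;
    let gaps = missing(requested, stx_mask);
    println!("missing = {:#x}, btime missing: {}", gaps, (gaps & STATX_BTIME) != 0);
}
```

Checking stx_mask before reading a field matters because an unfilled field is not guaranteed to hold a meaningful value.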
- clock source (0 = A, 1 = B) (ro)
- clock hardware fault (ro)
- delete leap (rw)
- select frequency-lock mode (rw)
- hold frequency (rw)
- insert leap (rw)
- mode (0 = PLL, 1 = FLL) (ro)
- resolution (0 = us, 1 = ns) (ro)
- Status codes (timex.status): enable PLL updates (rw)
- PPS signal calibration error (ro)
- enable PPS freq discipline (rw)
- PPS signal jitter exceeded (ro)
- PPS signal present (ro)
- enable PPS time discipline (rw)
- PPS signal wander exceeded (ro)
- read-only bits
- clock unsynchronized (rw)
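The STA_* status bits above mix fields a caller may set through adjtimex(2) (marked rw) with fields only the kernel reports (marked ro), and STA_RONLY collects the read-only ones. As we understand the kernel's NTP code, an ADJ_STATUS update ends by keeping the current read-only bits and taking the read-write bits from the request. A sketch of that merge, assuming the bit values from include/uapi/linux/timex.h; merge_status is a hypothetical helper and performs no system call.

```rust
// Status bits assumed from include/uapi/linux/timex.h.
const STA_PLL: u32 = 0x0001; // rw
const STA_INS: u32 = 0x0010; // rw
const STA_UNSYNC: u32 = 0x0040; // rw
const STA_PPSSIGNAL: u32 = 0x0100; // ro
const STA_PPSJITTER: u32 = 0x0200; // ro
const STA_PPSWANDER: u32 = 0x0400; // ro
const STA_PPSERROR: u32 = 0x0800; // ro
const STA_CLOCKERR: u32 = 0x1000; // ro
const STA_NANO: u32 = 0x2000; // ro
const STA_MODE: u32 = 0x4000; // ro
const STA_CLK: u32 = 0x8000; // ro

const STA_RONLY: u32 = STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR
    | STA_CLOCKERR | STA_NANO | STA_MODE | STA_CLK;

/// Merges a requested status word into the current one: read-only bits keep
/// their current value, read-write bits are taken from the request.
fn merge_status(current: u32, requested: u32) -> u32 {
    (current & STA_RONLY) | (requested & !STA_RONLY)
}

fn main() {
    // Current state: nanosecond resolution (ro) and unsynchronized (rw).
    let current = STA_NANO | STA_UNSYNC;
    // Request PLL updates and a leap second insertion; the CLOCKERR bit in the
    // request is read-only, so it is dropped.
    let requested = STA_PLL | STA_INS | STA_CLOCKERR;
    println!("new status = {:#06x}", merge_status(current, requested));
}
```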
- Command definitions for the ‘quotactl’ system call.
- One swap address space for each 64M swap space
- Special value in each swap_map continuation.
- enable discard for swap
- discard swap area at swapon-time
- discard page-clusters after use
- set if swap priority specified
- Bit flag in swap_map.
- Note page is bad
- Special value in first swap_map.
- Owned by shmem/tmpfs
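The swap priority and discard bits are combined into the single swapflags argument of swapon(2), with the priority packed into the low 15 bits. The sketch below assumes the linux/swap.h values, redefined locally. It also assumes that SWAP_FLAG_DISCARD_ONCE and SWAP_FLAG_DISCARD_PAGES refine SWAP_FLAG_DISCARD and are passed together with it. DiscardPolicy and swapon_flags are hypothetical names.

```rust
// swapon(2) flag values, redefined locally (assumed from linux/swap.h).
const SWAP_FLAG_PREFER: u32 = 0x8000; // set if swap priority specified
const SWAP_FLAG_PRIO_MASK: u32 = 0x7fff;
const SWAP_FLAG_PRIO_SHIFT: u32 = 0;
const SWAP_FLAG_DISCARD: u32 = 0x10000; // enable discard for swap
const SWAP_FLAG_DISCARD_ONCE: u32 = 0x20000; // discard swap area at swapon-time
const SWAP_FLAG_DISCARD_PAGES: u32 = 0x40000; // discard page-clusters after use

/// Hypothetical discard policy selector for this sketch.
#[derive(Clone, Copy, Debug)]
enum DiscardPolicy {
    Off,
    Once,
    Pages,
}

/// Build a swapflags word. Assumption: the ONCE/PAGES bits refine
/// SWAP_FLAG_DISCARD and are therefore passed together with it.
fn swapon_flags(priority: Option<u32>, discard: DiscardPolicy) -> u32 {
    let mut flags = 0;
    if let Some(prio) = priority {
        // Priorities above 0x7fff do not fit and are truncated by the mask.
        flags |= SWAP_FLAG_PREFER | ((prio << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK);
    }
    flags |= match discard {
        DiscardPolicy::Off => 0,
        DiscardPolicy::Once => SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_ONCE,
        DiscardPolicy::Pages => SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_PAGES,
    };
    flags
}
```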
- Clear ring buffer.
- Close the log. Currently a NOP.
- Set level of messages printed to console
- Disable printk output to console
- Enable printk output to console
- Open the log. Currently a NOP.
- Read from the log.
- Read all messages remaining in the ring buffer.
- Read and clear all messages remaining in the ring buffer
- Return size of the log buffer
- Return number of unread characters in the log buffer
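The syslog actions above are small integers passed as the first argument of syslog(2). The sketch below models them as a Rust enum. It assumes the numbering given in the syslog(2) man page (SYSLOG_ACTION_CLOSE = 0 through SYSLOG_ACTION_SIZE_BUFFER = 10) rather than importing any constants from this crate. The enum name, from_raw and needs_buffer are hypothetical.

```rust
/// syslog(2) action numbers (assumed from the syslog(2) man page),
/// modeled as a Rust enum for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(i32)]
enum SyslogAction {
    Close = 0,        // Close the log. Currently a NOP.
    Open = 1,         // Open the log. Currently a NOP.
    Read = 2,         // Read from the log.
    ReadAll = 3,      // Read all messages remaining in the ring buffer.
    ReadClear = 4,    // Read and clear all messages remaining in the ring buffer.
    Clear = 5,        // Clear ring buffer.
    ConsoleOff = 6,   // Disable printk output to console.
    ConsoleOn = 7,    // Enable printk output to console.
    ConsoleLevel = 8, // Set level of messages printed to console.
    SizeUnread = 9,   // Return number of unread characters in the log buffer.
    SizeBuffer = 10,  // Return size of the log buffer.
}

impl SyslogAction {
    const ALL: [SyslogAction; 11] = [
        SyslogAction::Close,
        SyslogAction::Open,
        SyslogAction::Read,
        SyslogAction::ReadAll,
        SyslogAction::ReadClear,
        SyslogAction::Clear,
        SyslogAction::ConsoleOff,
        SyslogAction::ConsoleOn,
        SyslogAction::ConsoleLevel,
        SyslogAction::SizeUnread,
        SyslogAction::SizeBuffer,
    ];

    /// Map a raw action number back to the enum; None if out of range.
    fn from_raw(raw: i32) -> Option<Self> {
        Self::ALL.iter().copied().find(|action| *action as i32 == raw)
    }

    /// Only the read actions copy log text into the caller's buffer.
    fn needs_buffer(self) -> bool {
        matches!(self, SyslogAction::Read | SyslogAction::ReadAll | SyslogAction::ReadClear)
    }
}
```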
- sys_accept4(2)
- sys_accept(2)
- sys_bind(2)
- sys_connect(2)
- sys_getpeername(2)
- sys_getsockname(2)
- sys_getsockopt(2)
- sys_listen(2)
- sys_recvfrom(2)
- sys_recvmmsg(2)
- sys_recvmsg(2)
- sys_recv(2)
- SIGSYS si_codes: seccomp triggered
- Return from SYS_SECCOMP as it is already used by a syscall num.
- sys_sendmmsg(2)
- sys_sendmsg(2)
- sys_sendto(2)
- sys_send(2)
- sys_setsockopt(2)
- sys_shutdown(2)
- sys_socketpair(2)
- sys_socket(2)
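Some architectures, such as 32-bit x86, multiplex socket operations through socketcall(2). On those architectures, the sys_*(2) entries above are the call numbers. The following is a minimal lookup sketch. It assumes the numbering from linux/net.h (SYS_SOCKET = 1 through SYS_SENDMMSG = 20), redefined locally as a table. The helper socketcall_number is hypothetical.

```rust
/// socketcall(2) call numbers, redefined locally (assumed from linux/net.h).
const SOCKETCALLS: [(&str, u32); 20] = [
    ("sys_socket", 1),
    ("sys_bind", 2),
    ("sys_connect", 3),
    ("sys_listen", 4),
    ("sys_accept", 5),
    ("sys_getsockname", 6),
    ("sys_getpeername", 7),
    ("sys_socketpair", 8),
    ("sys_send", 9),
    ("sys_recv", 10),
    ("sys_sendto", 11),
    ("sys_recvfrom", 12),
    ("sys_shutdown", 13),
    ("sys_setsockopt", 14),
    ("sys_getsockopt", 15),
    ("sys_sendmsg", 16),
    ("sys_recvmsg", 17),
    ("sys_accept4", 18),
    ("sys_recvmmsg", 19),
    ("sys_sendmmsg", 20),
];

/// Hypothetical helper: look up the socketcall number for a call name.
fn socketcall_number(name: &str) -> Option<u32> {
    SOCKETCALLS
        .iter()
        .find(|(call_name, _)| *call_name == name)
        .map(|&(_, call)| call)
}
```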
- struct dirent file types exposed to user via getdents(2), readdir(3)
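The d_type values correspond to the file-type bits of st_mode. Shifting (mode & S_IFMT) right by 12 gives the matching DT_* value, which is how glibc's IFTODT macro is defined. The sketch below assumes the usual Linux values (S_IFMT = 0o170000, DT_REG = 8, DT_DIR = 4, and so on), redefined locally, with DT_WHT omitted. A DT_UNKNOWN entry means the filesystem did not report a type, so the caller has to fall back to stat(2). mode_to_dtype and dtype_name are hypothetical helpers.

```rust
// File-type bits of st_mode and the matching d_type values, redefined
// locally (assumed from the Linux headers; DT_WHT omitted).
const S_IFMT: u32 = 0o170000;
const DT_UNKNOWN: u8 = 0;
const DT_FIFO: u8 = 1;
const DT_CHR: u8 = 2;
const DT_DIR: u8 = 4;
const DT_BLK: u8 = 6;
const DT_REG: u8 = 8;
const DT_LNK: u8 = 10;
const DT_SOCK: u8 = 12;

/// Convert an st_mode value into a d_type value (same rule as IFTODT).
fn mode_to_dtype(mode: u32) -> u8 {
    ((mode & S_IFMT) >> 12) as u8
}

/// Describe a d_type read from getdents(2)/readdir(3).
fn dtype_name(d_type: u8) -> &'static str {
    match d_type {
        DT_FIFO => "fifo",
        DT_CHR => "character device",
        DT_DIR => "directory",
        DT_BLK => "block device",
        DT_REG => "regular file",
        DT_LNK => "symbolic link",
        DT_SOCK => "socket",
        // The filesystem did not report a type: the caller must stat(2) the entry.
        DT_UNKNOWN => "unknown",
        _ => "other",
    }
}
```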
- 0x54 is just a magic number to make these relatively unique (‘T’)
- SYS5 TCGETX compatibility
- tcflush() QUEUE_SELECTOR argument and TCFLSH use these: discard data received but not yet read
- Send a STOP character
- Discard all pending data
- Send a START character
- Discard data written but not yet sent
- tcflow() ACTION argument and TCXONC use these: suspend output
- Restart suspended output
- ECN was negotiated at TCP session init
- we received at least one packet with ECT
- SYN-ACK acked data in SYN sent or rcvd
- for TCP_INFO socket option
- Set TCP initial congestion window
- Set sndcwnd_clamp
- Get Congestion Control (optional) info
- Congestion control algorithm
- Never send partially complete segments
- Wake up listener only when data arrive
- Enable FastOpen on listeners
- Attempt FastOpen with connect
- Set the key for Fast Open (cookie)
- Enable TFO without a TFO cookie
- Information about this connection.
- Notify bytes available to read as a cmsg on read
- Number of keepalives before death
- Start keepalives after this period
- Interval between keepalives
- Life time of orphaned FIN-WAIT-2 state
- Limit MSS
- TCP MD5 Signature (RFC2385)
- TCP MD5 Signature with extensions
- ifindex set
- tcp_md5sig extension flags for TCP_MD5SIG_EXT: address prefix length
- for TCP_MD5SIG socket option
- TCP general constants: IPv4 (RFC1122, RFC2581)
- IPv6 (tunneled), EDNS0 (RFC3226)
- TCP socket options: turn off Nagle’s algorithm.
- limit number of unsent bytes in write queue
- Block/reenable quick acks
- TCP sock is under repair right now
- Turn off without window probes
- Get/set window parameters
- Get SYN headers recorded for connection
- Record SYN headers for new connections
- Number of SYN retransmits
- Fast retrans. after 1 dupack
- Use linear timeouts for thin streams
- delay outgoing packets by XX usec
- Attach a ULP to a TCP connection
- How long for loss retry before timeout
- Bound advertised window
- tcsetattr uses these
- Needed for POSIX tcsendbreak()
- CAREFUL: Check include/asm-generic/fcntl.h when defining new flags, since they might collide with O_* ones.
- Located here for timespec[64]_valid_strict
- The various flags for setting POSIX.1b interval timers:
- bw compat
- delete leap second
- clock not synchronized
- insert leap second
- Clock states (time_state): clock synchronized, no leap second
- leap second in progress
- Limits for settimeofday():
- leap second has occurred
- BSD compatibility
- Get primary device node of /dev/console
- Get exclusive mode state
- read serial port inline interrupt counts
- Get packet mode state
- Get Pty lock state
- Get Pty Number (of pty-mux device)
- Safely open the slave
- Return the session ID of FD
- wait for a change on serial input line(s)
- modem lines
- Used for packet mode
- BSD compatibility
- Get line status register
- Get multiport config
- For debugging only
- Set multiport config
- Transmitter physically empty
- pty: generate signal
- Lock/unlock Pty
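The tty ioctls above use the 'T' (0x54) magic number. In the generic ioctl layout, each request code packs a direction, that magic type, a command number and the argument size into one 32-bit value. The sketch below assumes the asm-generic field widths (8-bit number, 8-bit type, 14-bit size, 2-bit direction). Architectures such as powerpc, mips and sparc use different widths, so the resulting numbers are not universal. On x86 and arm, this gives TIOCGPTN = 0x80045430. The helper names ioc, ior, iow and the decoders are hypothetical stand-ins for the C macros.

```rust
// Generic ("asm-generic") ioctl number layout. Assumption: some
// architectures (e.g. powerpc, mips, sparc) use different field widths
// and direction values, so these numbers are not universal.
const IOC_NRBITS: u32 = 8;
const IOC_TYPEBITS: u32 = 8;
const IOC_SIZEBITS: u32 = 14;
const IOC_NRSHIFT: u32 = 0;
const IOC_TYPESHIFT: u32 = IOC_NRSHIFT + IOC_NRBITS; // 8
const IOC_SIZESHIFT: u32 = IOC_TYPESHIFT + IOC_TYPEBITS; // 16
const IOC_DIRSHIFT: u32 = IOC_SIZESHIFT + IOC_SIZEBITS; // 30
const IOC_WRITE: u32 = 1; // userland writes, kernel reads
const IOC_READ: u32 = 2; // kernel writes, userland reads

/// Pack direction, type ("magic"), number and argument size into one code.
const fn ioc(dir: u32, ty: u8, nr: u8, size: u32) -> u32 {
    (dir << IOC_DIRSHIFT)
        | ((ty as u32) << IOC_TYPESHIFT)
        | ((nr as u32) << IOC_NRSHIFT)
        | (size << IOC_SIZESHIFT)
}

const fn ior(ty: u8, nr: u8, size: u32) -> u32 {
    ioc(IOC_READ, ty, nr, size)
}

const fn iow(ty: u8, nr: u8, size: u32) -> u32 {
    ioc(IOC_WRITE, ty, nr, size)
}

// Decoders: the inverse of `ioc`.
const fn ioc_dir(code: u32) -> u32 {
    code >> IOC_DIRSHIFT
}
const fn ioc_type(code: u32) -> u8 {
    ((code >> IOC_TYPESHIFT) & ((1 << IOC_TYPEBITS) - 1)) as u8
}
const fn ioc_nr(code: u32) -> u8 {
    ((code >> IOC_NRSHIFT) & ((1 << IOC_NRBITS) - 1)) as u8
}
const fn ioc_size(code: u32) -> u32 {
    (code >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1)
}

// 'T' (0x54) is the tty ioctl magic; both calls take a 4-byte int argument.
const TIOCGPTN: u32 = ior(b'T', 0x30, std::mem::size_of::<u32>() as u32); // Get Pty Number
const TIOCSPTLCK: u32 = iow(b'T', 0x31, std::mem::size_of::<i32>() as u32); // Lock/unlock Pty
```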
- process taken branch trap
- SIGTRAP si_codes: process breakpoint
- hardware breakpoint/watchpoint
- process trace trap
- undiagnosed trap
- UIO_MAXIOV shall be at least 16 1003.1g (5.4.1.1)
- Don’t follow symlink on umount
- Flag guaranteed to be unused
- Flags for bug emulation.
- element used for user quotas
- c_cc characters
- block dump mode
- dirty_background_ratio
- dirty_expire_centisecs
- dirty_ratio
- dirty_writeback_centisecs
- int: nuke lots of pagecache
- permitted hugetlb group
- int: Number of available Huge Pages
- vm laptop mode
- legacy/compatibility virtual address space layout
- reservation ratio for lower memory zones
- int: Maximum number of mmaps/address-space
- Minimum free kilobytes to maintain
- Percent pages ignored by zone reclaim
- Set min percent of unmapped pages
- nr_pdflush_threads
- Turn off the virtual memory safety limit
- percent of RAM to allow overcommit in
- struct: Control pagebuf parameters
- int: set number of pages to swap together
- panic at out-of-memory
- int: fraction of pages in each percpu_pagelist
- Tendency to steal mapped memory
- default time for token time out
- CTL_VM names: was: int: Linear or sqrt() swapout for hogs
- was: struct: Set free page thresholds
- Spare
- was: struct: Set buffer memory thresholds
- was: struct: Set cache memory thresholds
- was: struct: Control kswapd behaviour
- was: struct: Set page table cache parameters
- map VDSO into new processes?
- dcache/icache reclaim pressure
- reclaim local zone memory before going off node
- Don’t reap, just poll status.
- Check file is writable.
- set value, fail if attr already exists
- Security namespace
- size of extended attribute namelist (64k)
- chars in an extended attribute name
- Namespaces
- set value, fail if attr does not exist
- size of an extended attribute value (64k)
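The xattr flags and limits above constrain the arguments of setxattr(2). The sketch below performs client-side argument checks before such a call. It assumes XATTR_CREATE = 0x1, XATTR_REPLACE = 0x2, XATTR_NAME_MAX = 255 and XATTR_SIZE_MAX = 65536 from the Linux UAPI headers, redefined locally. It also assumes the four namespace prefixes security., system., trusted. and user. The check_setxattr_args helper is hypothetical. The kernel remains the authority, and individual filesystems may impose smaller limits.

```rust
// Extended attribute flags and limits, redefined locally (values assumed
// from the Linux UAPI headers).
const XATTR_CREATE: u32 = 0x1; // set value, fail if attr already exists
const XATTR_REPLACE: u32 = 0x2; // set value, fail if attr does not exist
const XATTR_NAME_MAX: usize = 255; // chars in an extended attribute name
const XATTR_SIZE_MAX: usize = 65536; // size of an extended attribute value (64k)
const NAMESPACES: [&str; 4] = ["security.", "system.", "trusted.", "user."];

/// Hypothetical pre-flight check for setxattr(2) arguments.
fn check_setxattr_args(name: &str, value: &[u8], flags: u32) -> Result<(), &'static str> {
    if (flags & !(XATTR_CREATE | XATTR_REPLACE)) != 0 {
        return Err("unknown flag bits");
    }
    // Length is counted in bytes, excluding the trailing NUL of the C string.
    if name.is_empty() || name.len() > XATTR_NAME_MAX {
        return Err("name length out of range");
    }
    if !NAMESPACES.iter().any(|ns| name.starts_with(ns)) {
        return Err("unknown namespace prefix");
    }
    if value.len() > XATTR_SIZE_MAX {
        return Err("value too large");
    }
    Ok(())
}
```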
- User return codes for XDP prog type.
- Check file is executable.
- Most things should be clean enough to redefine this at will, if care is taken to make libc match.
- Limit the stack to some sane default: root can always increase this limit if needed. 8MB seems reasonable.
- request aborted
- back merged to existing rq
- bio was bounced
- from a cgroup
- completed by driver
- driver-specific binary data
- front merge to existing rq
- allocated new request
- insert request
- sent to driver
- queue was plugged
- queued
- bio was remapped
- request requeued
- sleeping on rq allocation
- bio was split
- queue was unplugged by io
- queue was unplugged by timer
- from a cgroup
- Character string message
- establish pid/name mapping
- include system clock
- decimal division by zero
- packed decimal error
- decimal overflow
- invalid ASCII digit
- invalid decimal digit
- bundle-update (modification) in progress
- illegal break
- Before Linux 2.6.33 only O_DSYNC semantics were implemented, but using the O_SYNC flag.
- performed a listen
- Wait on all children, regardless of type
- Wait only on non-SIGCHLD children
- Don’t wait on children of other threads in this group
Functions§
- FUTEX_WAKE_OP will perform atomically.
- Used to create numbers.
- used to decode ioctl numbers.
- Definitions of the bits in an Internet address integer. On subnets, host and network parts are found according to the subnet mask, not these masks.
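FUTEX_WAKE_OP packs an operation, its operand, a comparison and a comparison argument into a single 32-bit word. The sketch below assumes the encoding described in futex(2): a 4-bit op, a 4-bit cmp, and two 12-bit arguments, with FUTEX_OP_OPARG_SHIFT meaning "use 1 << oparg". The operation and comparison values listed there are redefined locally. The 12-bit fields are sign-extended here, which is our understanding of current kernel behavior. simulate_wake_op is a hypothetical userspace model of the atomic step applied to *uaddr2. It is not a way to invoke the syscall.

```rust
// FUTEX_WAKE_OP operations and comparisons (values assumed from futex(2)).
const FUTEX_OP_SET: u32 = 0; // uaddr2 = oparg
const FUTEX_OP_ADD: u32 = 1; // uaddr2 += oparg
const FUTEX_OP_OR: u32 = 2; // uaddr2 |= oparg
const FUTEX_OP_ANDN: u32 = 3; // uaddr2 &= ~oparg
const FUTEX_OP_XOR: u32 = 4; // uaddr2 ^= oparg
const FUTEX_OP_OPARG_SHIFT: u32 = 8; // use (1 << oparg) as the operand
const FUTEX_OP_CMP_EQ: u32 = 0;
const FUTEX_OP_CMP_NE: u32 = 1;
const FUTEX_OP_CMP_LT: u32 = 2;
const FUTEX_OP_CMP_LE: u32 = 3;
const FUTEX_OP_CMP_GT: u32 = 4;
const FUTEX_OP_CMP_GE: u32 = 5;

/// Encode the operation word (same layout as the C FUTEX_OP macro).
const fn futex_op(op: u32, oparg: u32, cmp: u32, cmparg: u32) -> u32 {
    ((op & 0xf) << 28) | ((cmp & 0xf) << 24) | ((oparg & 0xfff) << 12) | (cmparg & 0xfff)
}

/// Sign-extend a 12-bit field.
fn sign_extend12(x: u32) -> i32 {
    ((x << 20) as i32) >> 20
}

/// Hypothetical model of the atomic step: given the old value of *uaddr2,
/// return (value stored back, whether waiters on uaddr2 are woken).
/// Unknown operations or comparisons yield None (the kernel rejects them).
fn simulate_wake_op(encoded: u32, oldval: i32) -> Option<(i32, bool)> {
    let op = (encoded >> 28) & 7;
    let cmp = (encoded >> 24) & 0xf;
    let mut oparg = sign_extend12((encoded >> 12) & 0xfff);
    let cmparg = sign_extend12(encoded & 0xfff);
    if (encoded & (FUTEX_OP_OPARG_SHIFT << 28)) != 0 {
        // Shift counts are taken modulo 32 here.
        oparg = 1i32.wrapping_shl(oparg as u32);
    }
    let newval = match op {
        FUTEX_OP_SET => oparg,
        FUTEX_OP_ADD => oldval.wrapping_add(oparg),
        FUTEX_OP_OR => oldval | oparg,
        FUTEX_OP_ANDN => oldval & !oparg,
        FUTEX_OP_XOR => oldval ^ oparg,
        _ => return None,
    };
    let wake = match cmp {
        FUTEX_OP_CMP_EQ => oldval == cmparg,
        FUTEX_OP_CMP_NE => oldval != cmparg,
        FUTEX_OP_CMP_LT => oldval < cmparg,
        FUTEX_OP_CMP_LE => oldval <= cmparg,
        FUTEX_OP_CMP_GT => oldval > cmparg,
        FUTEX_OP_CMP_GE => oldval >= cmparg,
        _ => return None,
    };
    Some((newval, wake))
}
```

For example, futex_op(FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1) encodes "store 0, and wake uaddr2 waiters if the old value was greater than 1".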
Type Aliases§
- anything below here should be completely generic
- Most 32 bit architectures use unsigned int size_t, and all 64 bit architectures use unsigned long size_t.
- Most 64-bit platforms use ‘long’, while most 32-bit platforms use ‘__u32’.
- The default si_band type is “long”, as specified by POSIX. However, some architectures want to override this to “int” for historical compatibility reasons, so we allow that.
- Basic trace actions
- Trace categories
- Notify events.
- The type of fsconfig() call made.
- key handle permissions mask
- key handle serial number
- Type of a SYSV IPC key.
- Anything below here should be completely generic.
- at least 32 bits
- The type of an index into the pagecache.
- Type in which we store ids in memory
- Type in which we store sizes
- The restore function should be written in assembly, or as a naked Rust function, so that it does not modify the stack frame.
- Flags for preadv2/pwritev2:
- The type used for indexing onto a disc or disc partition.
- Type of a signal handler.
- signalfn_t as usize
- restorefn_t as usize
- Most 32 bit architectures use unsigned int size_t, and all 64 bit architectures use unsigned long size_t.
- socket-state enum.
Unions§
- inputs to lookup
- IPv6 address structure
- arg for semctl system calls.