106 Chronologically structured chapter that builds foundational Linux concepts step-by-step
Here's a chronologically structured chapter that builds foundational Linux concepts step-by-step and prepares you for FAANG-level interviews with tools, commands, and real-time scenarios.
🧠 Chapter: Mastering Linux Internals for Interviews
1. The Big Picture: How Linux Works
- Overview: Role of the kernel, user space vs kernel space
- Key Concept: Linux as a multitasking, multiuser, monolithic kernel system
2. CPU: The Core Executor
- What: Executes instructions, context switching, scheduling
- Terms: User time, system time, idle time
- Commands: top, htop, mpstat, uptime
- Interview Insight: Explain CPU-bound vs I/O-bound processes
3. Memory: RAM and Virtual Memory
- Concepts: Virtual memory, paging, swapping, buffers, cache
- Commands: free -h, vmstat, /proc/meminfo, top (RES/VIRT/SHR)
- Interview Tip: What causes high swap usage?
4. Processes and Threads
- Process: An independent executing program (PID, PPID)
- Thread: Lightweight process sharing the same address space
- Commands: ps -ef, pstree, top, htop
- System Calls: fork(), exec(), exit(), wait()
- Key Difference: fork() duplicates, exec() replaces, exit() terminates
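The four calls above can be watched from a shell, which uses them for every command it launches. A minimal sketch, assuming a POSIX shell; the helper name run_child and the exit code 3 are illustrative choices only:

```shell
# run_child is a hypothetical helper name used only for this illustration.
run_child() {
    # '&' makes the shell fork(); the child exec()s sh, which calls exit(3).
    sh -c 'exit 3' &
    child=$!
    # The wait builtin reaps the child (wait() family) and returns its exit status.
    wait "$child"
    echo "child $child exited with status $?"
}

run_child
```

Running the same command under strace -f would show the underlying clone/execve/exit_group/wait4 calls on most Linux systems.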
5. Kernel: The Brain
- Components: Process scheduler, memory manager, I/O manager
- System Calls Interface: Bridge between user and kernel space
- Commands: uname -a, dmesg
- Real World: Debugging kernel logs with dmesg
6. I/O and Disk Subsystems
- Concepts: Block vs character devices, buffered I/O, async I/O
- Commands: iostat, iotop, df -h, du -sh, lsblk, mount
- Use Case: Identify I/O bottlenecks with iotop, pidstat -d
7. System Calls Deep Dive
- What: Interface to kernel services (e.g., file ops, process control)
- Examples: read(), write(), open(), close(), kill()
- Tool: strace – trace system calls and signals
- Interview: How does exec() work under the hood? Use strace to show.
8. Process States and Lifecycle
- States: Running, Sleeping (interruptible or uninterruptible), Stopped, Zombie. An orphan is not a state: it is a live process whose parent has exited.
- Monitoring Tools:
  - top, htop – for real-time process view
  - watch -n 1 'ps aux | grep <pid>'
  - pidstat – CPU, memory, I/O usage over time
  - lsof – list open files by a process
- Real-World Debug: A zombie process scenario
9. Signals in Depth
- Types: TERM, KILL, STOP, CONT, HUP, INT, etc.
- Commands:
  - kill -SIGTERM <pid> – graceful shutdown
  - kill -9 <pid> – force kill
  - trap in shell scripts
- Tools: strace -p <pid> to inspect signal handling
- Real-Time Example: Releasing stuck processes via SIGKILL
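The difference between a trappable SIGTERM and an untrappable SIGKILL is easiest to see with trap. A minimal sketch, assuming a POSIX shell; graceful_demo is a hypothetical name and the 30-second sleep stands in for real work:

```shell
# graceful_demo is a hypothetical name; sleep 30 stands in for a real workload.
graceful_demo() {
    (
        # On SIGTERM: stop the stand-in workload, report, and exit cleanly.
        trap 'kill "$sleeper" 2>/dev/null; echo "caught SIGTERM, cleaning up"; exit 0' TERM
        sleep 30 &
        sleeper=$!
        # Waiting on a background job lets the shell run the trap as soon as the signal arrives.
        wait "$sleeper"
    ) &
    worker=$!
    sleep 1                 # give the worker time to install its trap
    kill -TERM "$worker"    # same signal as a plain "kill <pid>"
    wait "$worker"
    echo "worker exit status: $?"
}

graceful_demo
```

Sending kill -9 instead would end the worker without running the trap, which is why SIGKILL is best kept as a last resort.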
10. Background & Detached Execution
- Commands:
  - nohup command & – runs even after logout
  - disown %job_id – remove from job table
- Use Case: Run long jobs on remote systems safely
11. Advanced Performance Debugging
- renice – change process priority
- pidstat – profile specific PIDs
- strace – syscall tracing
- lsof – open file/socket tracking
- dmesg – kernel ring buffer
12. Process Monitoring in Production
- Monitor All States:
  - ps -eo pid,state,cmd
  - top -H – thread view
  - watch -n 1 'ps aux | grep <app>'
- Real-World Case: Memory leak or CPU spike in production
  - top → strace → lsof → kill or renice
Let's break down and deeply explain the concepts of CPU time and the Linux scheduler using real-world analogies, command-line examples, and system internals. This will help you understand it at an interview level, especially for FAANG or senior DevOps/System Engineer roles.
🔧 Part 1: What is CPU Time?
✅ Definition:
CPU Time refers to the amount of time a CPU spends executing a specific process's instructions, excluding any time the process is idle or waiting for I/O (disk/network) operations.
🔄 Breakdown:
There are typically two types of CPU time:
- User CPU Time: Time spent executing user-space code (your application).
- System CPU Time: Time spent in the kernel (system calls, managing files, sockets, memory).
💡 Analogy:
Imagine the CPU as a chef in a kitchen.
- Each process is a customer placing an order (program to run).
- CPU time is the time the chef (CPU) actually spends cooking the dish (executing instructions), not waiting for ingredients (I/O).
📌 Real Example:
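$ time ls -l
Output:
real    0m0.003s
user    0m0.001s
sys     0m0.002s
- real: Total elapsed wall-clock time (you watching).
- user: CPU time spent executing in user space (0.001s).
- sys: CPU time spent in kernel space on the process's behalf (0.002s).
So CPU time = user + sys = 0.003s. For an I/O-bound command, user + sys is usually much smaller than real; for a multi-threaded program on several cores it can exceed real.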
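The same user/sys split is available for any running process through /proc/<pid>/stat, where utime and stime are fields 14 and 15, counted in clock ticks. A minimal sketch, assuming a Linux /proc filesystem; cpu_ticks is a hypothetical helper name:

```shell
# cpu_ticks is a hypothetical helper: prints "utime stime" (in clock ticks) for a PID.
cpu_ticks() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Drop "pid (comm) " up to the last ") " so a command name with spaces cannot shift fields.
    rest=${stat##*) }
    # With pid and comm removed, utime (field 14) is now 12th and stime (field 15) is 13th.
    set -- $rest
    echo "${12} ${13}"
}

ticks=$(cpu_ticks $$)
hz=$(getconf CLK_TCK)     # clock ticks per second, commonly 100
echo "$ticks" | awk -v hz="$hz" '{ printf "user=%.2fs sys=%.2fs total=%.2fs\n", $1/hz, $2/hz, ($1+$2)/hz }'
```

This is the data that ps shows in its TIME column and top in TIME+.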
⚙️ Part 2: Linux Scheduler (How CPU Time is Shared)
✅ What is the Scheduler?
The Linux scheduler is a kernel component responsible for deciding which process/thread runs on the CPU and for how long.
It manages CPU time sharing to ensure:
- Efficiency
- Fairness (all get CPU)
- Responsiveness (interactive processes run fast)
- Throughput (keep CPUs busy)
📦 Types of Scheduling Policies in Linux:
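Policy | Description |
---|---|
SCHED_OTHER (SCHED_NORMAL, default) | Time-sharing policy implemented by CFS (Completely Fair Scheduler), which balances CPU time fairly across processes; kernels from 6.6 onward implement it with EEVDF |
SCHED_FIFO | Real-time: first-in, first-out. No time slice; runs until it blocks, yields, or is preempted by a higher-priority real-time task. |
SCHED_RR | Real-time: round-robin. Like SCHED_FIFO, but tasks of equal priority rotate on a time slice. |
SCHED_DEADLINE | Real-time: earliest-deadline-first scheduling that guarantees deadlines for tasks declaring a runtime, deadline, and period. |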
🧠 How the CFS Scheduler Works (Deep Dive)
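CFS (Completely Fair Scheduler) has been the default scheduler for normal (non-real-time) tasks since kernel 2.6.23. Kernel 6.6 replaced its task-selection logic with EEVDF, but the vruntime and weight concepts described here still apply.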
🔁 Key Concept:
Each process is assigned a virtual runtime (vruntime). The process with the lowest vruntime gets the CPU.
📈 Idea:
- Track how long a process has used the CPU.
- If a process has had less CPU time than others, it is prioritized next.
- Ensures fair CPU time proportionate to process weight (nice value).
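To make "proportionate to process weight" concrete: the kernel maps each nice value to a weight (nice 0 is 1024, nice 5 is 335, each step being a factor of about 1.25), and tasks that are always runnable share a CPU in the ratio of their weights. A simplified sketch of that arithmetic; cpu_share is a hypothetical helper, and the model ignores sleeping tasks, multiple cores, and cgroups:

```shell
# cpu_share is a hypothetical helper: percentage of one CPU that a task with weight $1
# receives when competing with a single task of weight $2 (both always runnable).
cpu_share() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / (a + b) }'
}

# Weights from the kernel's nice-to-weight table: nice 0 -> 1024, nice 5 -> 335.
cpu_share 1024 1024   # two nice-0 tasks: 50.0
cpu_share 1024 335    # nice 0 competing with nice 5: 75.3
```

So renicing one of two busy tasks from 0 to 5 shifts the split from 50/50 to roughly 75/25.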
🛠️ Tools:
You can view scheduling details using:
ps -eo pid,comm,ni,pri,cls,stat --sort=pid
- ni – Nice value (lower = higher priority)
- pri – Kernel priority
- cls – Scheduling class (TS = SCHED_OTHER/CFS, FF = SCHED_FIFO, RR = SCHED_RR)
- stat – Process state
🧮 Example of Scheduler in Action
Imagine three processes:
- Process A (Interactive shell)
- Process B (Background database)
- Process C (CPU-intensive encoding)
What scheduler does:
- Process A: gets quick CPU bursts so shell is responsive.
- Process B: gets occasional CPU as it's mostly waiting for I/O.
- Process C: gets fair chunk, but not all CPU, to keep system responsive.
📟 Demo: Viewing CPU Time & Scheduling
🧪 Check CPU time per process:
ps -eo pid,etime,time,comm --sort=-time | head
Field | Meaning |
---|---|
etime | Elapsed real time since the process started |
time | Total CPU time (user + system) consumed |
👨💻 Strace Example:
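To see which system calls a process makes and how long each one takes:
strace -T -p <pid>
- -T shows the wall-clock time spent in each system call. This includes time spent blocked, so it is not pure CPU time.
- strace -c -p <pid> prints a per-syscall summary (calls, errors, time) when you detach with Ctrl+C.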
🧠 Interview-Level Questions
Question | Answer |
---|---|
What is CPU time? | Time CPU spends executing user + system code of a process. |
Difference between wall-clock and CPU time? | Wall-clock is total elapsed; CPU time is time the CPU actually executed your process. |
What does the Linux scheduler do? | Decides which process/thread to run next on the CPU. |
What is vruntime in CFS? | A measure of how much CPU time a process has had; lower values are run first. |
What is the default scheduler in Linux? | CFS (Completely Fair Scheduler). |
📚 Summary (In a Nutshell)
Concept | Description |
---|---|
CPU Time | Actual processing time used by the CPU for a process. |
User Time | Time in user space (application code). |
System Time | Time in kernel space (system calls). |
Linux Scheduler | Kernel component that selects which process gets CPU time. |
CFS | Ensures fair CPU sharing using vruntime . |
Tools | top , ps , strace , htop , time , pidstat |
105 Process states and Management
Here’s a complete guide on Process Management – Process States, focused on FAANG-level interview preparation, with clear explanations, real-world examples, and 20 high-quality Q&A to test and reinforce your understanding.
🔧 1. Key Topics to Cover (Process Management & Process States)
Category | Topics |
---|---|
Process Basics | PID, PPID, UID, GID, nice, renice |
Process Lifecycle | Creation, Execution, Termination |
Process States (Linux/Unix) | Running , Waiting , Stopped , Zombie , Sleeping , Dead |
State Transitions | How and when process moves between states |
Signals | kill , SIGKILL , SIGTERM , SIGSTOP , SIGCONT , etc. |
Parent-Child Relationship | Orphan and Zombie processes |
Foreground/Background Jobs | fg , bg , jobs , & , nohup |
Process Scheduling | nice , renice , priority (niceness level), time slicing |
Troubleshooting Tools | ps , top , htop , pstree , strace , lsof , kill , nice , renice , watch |
🚦 2. Process States in Linux (FAANG-focused)
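State | Description |
---|---|
R (Running) | Executing on a CPU or runnable and waiting in the run queue |
S (Sleeping) | Interruptible sleep: waiting for an event such as I/O, a timer, or a signal |
D (Uninterruptible sleep) | Blocked inside the kernel, usually on disk or NFS I/O; signals (even SIGKILL) are not acted on until the wait ends |
T (Stopped) | Suspended by a signal such as SIGSTOP or SIGTSTP |
Z (Zombie) | Process has exited but its parent has not yet called wait() |
X (Dead) | Terminated; transient and should not normally be visible |
I (Idle) | Idle kernel thread; rarely relevant when debugging applications |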
📚 3. FAANG-Ready Interview Questions + Answers
✅ Basic to Intermediate
Q1. What are the different states a process can be in? Explain.
Answer:
- Running (R): Actively on CPU or ready to run.
- Sleeping (S): Waiting for I/O; can be interrupted by signals.
- Uninterruptible Sleep (D): Waiting on resources like disk; cannot be interrupted.
- Stopped (T): Halted by a signal (e.g., SIGSTOP).
- Zombie (Z): Terminated but not reaped by parent.
- Dead (X): Process terminated and removed from system.
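These state letters come straight from the kernel and can be read without ps. A minimal sketch, assuming a Linux /proc filesystem; proc_state is a hypothetical helper and the background sleep stands in for any process waiting on a timer:

```shell
# proc_state is a hypothetical helper: prints the one-letter state of a PID.
proc_state() {
    awk '$1 == "State:" { print $2 }' "/proc/$1/status"
}

sleep 30 >/dev/null 2>&1 &   # stand-in for a process blocked waiting on a timer
pid=$!
sleep 1
state=$(proc_state "$pid")
echo "PID $pid is in state $state"
kill "$pid"
```

A sleeping process reports S here; a busy loop would report R, and a process frozen with kill -STOP would report T.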
Q2. How can you identify zombie processes on Linux?
Answer:
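ps aux | awk '$8 ~ /^Z/'
Or:
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'
Matching on the state column matters: a plain grep Z also matches any user name or command line that contains the letter Z.
Zombie processes show a state beginning with Z (usually with <defunct> after the command name). They disappear only when the parent calls wait(), or when the parent exits and init reaps them.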
Q3. What is the difference between zombie and orphan process?
Zombie | Orphan |
---|---|
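Child terminated, parent alive but didn’t call wait() | Child alive, parent terminated |
Holds a PID and process-table entry, but no memory or CPU | Keeps running normally and using resources |
Can’t be killed (already dead); cleared only when reaped | Re-parented to init (PID 1) or the nearest subreaper, which reaps it when it exits |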
Q4. How does a process move from running to waiting (sleeping)?
Answer:
A process transitions from running to sleeping when it requests I/O or is waiting for a resource (disk, network, user input). The scheduler then switches to another process until I/O is ready.
Q5. Explain the lifecycle of a Linux process.
Answer:
- Created – via fork() (or clone()); the child typically calls exec() to load a new program
- Ready – added to scheduler queue
- Running – executes instructions
- Waiting – for I/O or other resource
- Terminated – exits using exit() or is killed by a signal
- Zombie – if not yet reaped by parent with wait()
✅ Advanced FAANG-Level
Q6. How do you handle a zombie process in a production server?
Answer:
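- Identify with ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'
- Check the PPID to find the parent that is not reaping its children.
- Fix or restart the parent so that it calls wait(); a few zombies are harmless, but a growing number points to a bug in the parent.
- As a last resort, kill the parent process so init (PID 1) adopts and reaps the zombie.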
Q7. How to find which syscall a process is waiting on?
Answer:
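Use strace:
strace -p <pid>
The last line printed without a return value is the system call the process is currently blocked in. Without attaching, cat /proc/<pid>/wchan shows the kernel function the process is sleeping in.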
Q8. What does the D state (Uninterruptible sleep) mean and why is it dangerous?
Answer:
It indicates the process is blocked inside the kernel waiting for I/O and will not respond to signals (even kill -9) until that I/O completes. Tasks in D state also count toward the load average. This can be a sign of hardware issues, an NFS hang, or I/O lock contention.
Q9. How does the Linux kernel schedule processes?
Answer:
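- Linux uses the CFS (Completely Fair Scheduler) for normal tasks (EEVDF from kernel 6.6 onward).
- It keeps runnable tasks in a red-black tree ordered by vruntime and runs the task with the smallest vruntime next.
- Each process gets a share of CPU time proportional to its weight, which is derived from its nice value.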
Q10. What happens when you run kill -9 <pid> on a zombie?
Answer:
Nothing. Zombie processes are already terminated, so there is nothing left to kill. The parent must reap the zombie with wait(); if it never does, terminating the parent lets init adopt and reap it.
✅ Real-World + Debugging
Q11. How do you debug high CPU usage from a process?
Answer:
top or htop
pidstat -p <pid> 1
strace -p <pid>
lsof -p <pid>
These tools help track what the process is doing, system calls, and open files.
Q12. How do you change the priority of a running process?
Answer:
renice -n 10 -p <pid>
Niceness range: -20 (highest priority) to 19 (lowest priority).
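Raising niceness (lowering priority) needs no privileges, while lowering it below the current value normally requires root. A minimal sketch of the relative adjustment, assuming a nice implementation such as GNU coreutils that prints the current niceness when run without a command:

```shell
base=$(nice)                 # with no command, nice prints the current niceness
child=$(nice -n 5 nice)      # run a second nice with niceness raised by 5
echo "current niceness: $base, child niceness: $child"
```

The adjustment is relative to the caller and is capped at 19, so a shell already running at niceness 10 would start the child at 15.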
Q13. What’s the use of the pstree command?
Answer:
Shows process hierarchy, useful for visualizing parent-child relationships:
pstree -p
Q14. What signal does the kill command send by default?
Answer:
SIGTERM (15) – allows process to gracefully shut down.
Q15. When would you use nohup?
Answer:
To keep a process running after logout/session close:
nohup ./my_script.sh &
✅ Behavioral + Situational
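Q16. You found a process in D state for hours. What do you do?
Answer:
- Check what it is blocked on with cat /proc/<pid>/wchan and, as root, cat /proc/<pid>/stack. strace is of little use here because a task stuck in D state is not making new system calls.
- Look in dmesg for hung-task warnings or I/O errors.
- Check if NFS, disk, or DB is hung.
- Consider restarting the service or rebooting if stuck.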
Q17. A user reports background jobs disappearing after logout. Why?
Answer:
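When the session ends, the shell sends SIGHUP to its jobs, and the default action for SIGHUP is to terminate the process. Use nohup or disown to detach processes from the terminal session:
nohup script.sh &
disown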
Q18. How to monitor all states of processes?
Answer:
ps -eo pid,user,state,cmd
To count processes per state:
ps -eo state= | sort | uniq -c
Also use top, htop, or watch for a live view.
Q19. What causes a process to become a zombie?
Answer:
Child process exits, but parent hasn’t called wait() to read its exit status. Kernel keeps it to return info when parent requests.
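This is easy to reproduce safely. In the sketch below (assuming a Linux /proc filesystem), a subshell starts a short-lived child and then replaces itself with sleep, which never calls wait(), so the finished child stays a zombie for about three seconds. zombies_of is a hypothetical helper name:

```shell
# zombies_of is a hypothetical helper: prints PIDs of zombie children of a parent PID.
zombies_of() {
    for f in /proc/[0-9]*/status; do
        awk -v parent="$1" '
            $1 == "Pid:"   { pid = $2 }
            $1 == "State:" { state = $2 }
            $1 == "PPid:"  { ppid = $2 }
            END { if (state == "Z" && ppid == parent) print pid }
        ' "$f" 2>/dev/null
    done
}

# The subshell forks "sleep 0", then exec()s into "sleep 3", which never reaps it.
( sleep 0 & exec sleep 3 ) >/dev/null 2>&1 &
parent=$!
sleep 1
found=$(zombies_of "$parent")
echo "zombie children of $parent: ${found:-none}"
wait "$parent"   # once the parent exits, the zombie is re-parented and reaped
```

While the zombie exists, ps shows it with state Z and a <defunct> suffix.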
Q20. How can you force the cleanup of multiple zombie processes?
Answer:
Find the parents with:
ps -eo ppid,pid,state,cmd | awk '$3 == "Z"'
Then fix, restart, or (as a last resort) kill each parent so that it, or init after adopting the children, reaps them.
📌 Summary Cheat Sheet
Process Tool | Purpose |
---|---|
top / htop | Real-time resource usage |
ps | List process info |
strace | Trace system calls |
lsof | List open files/sockets |
kill / pkill / killall | Send signals |
nice / renice | Adjust priority |
jobs / bg / fg | Background job control |
pstree | Visualize process hierarchy |
104 Basic Commands
The commands top, ps, and kill are essential tools in Linux for monitoring and managing processes and system resources. There is no native cpu command, so CPU details come from tools such as lscpu and mpstat. Let’s go over each command in detail, including:
- Purpose
- Syntax
- Common use-cases
- Example output
- How to analyze the output
🔹 1. top: Real-time System Monitoring
📌 Purpose:
Displays real-time information about system processes, CPU, memory usage, and load average.
📌 Syntax:
top
📌 Example Output (Partial):
top - 08:45:26 up 2:34, 2 users, load average: 0.15, 0.20, 0.25
Tasks: 138 total, 1 running, 137 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.0 us, 1.0 sy, 0.0 ni, 95.0 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7850.2 total, 2200.5 free, 3400.6 used, 2249.1 buff/cache
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1324 root 20 0 123456 45678 1234 R 23.4 0.6 0:10.53 chrome
📌 How to Analyze:
- Load average: First line shows 1, 5, and 15-minute system load. Rule of thumb: if load stays above the number of CPUs, tasks are queuing for CPU or blocked on I/O.
- %Cpu(s): High us (user) means the CPU is running application code, sy (system) is kernel time, id is idle time, and wa is idle time spent waiting for I/O.
- %MEM: Share of physical memory used per process.
- PID/COMMAND: Helps identify the exact process consuming resources.
🔹 2. ps: Snapshot of Current Processes
📌 Purpose:
Displays a snapshot of current processes (unlike top, which refreshes in real time).
📌 Syntax:
ps aux # All processes
ps -ef # Also shows all, but with different format
📌 Example Output:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 8960 2340 ? Ss 08:00 0:01 /sbin/init
prakash 1324 5.5 1.2 123456 45678 ? Sl 08:45 0:10 chrome
📌 Key Fields to Analyze:
- PID: Process ID.
- %CPU, %MEM: CPU and memory usage.
- VSZ/RSS: Virtual and resident memory size.
- STAT: Process state (R running, S sleeping, Z zombie).
- COMMAND: The command/process name.
🔹 3. kill: Send Signal to a Process
📌 Purpose:
Sends a signal to a process identified by its PID. The default signal, SIGTERM, asks the process to terminate.
📌 Syntax:
kill <PID> # Send SIGTERM (default)
kill -9 <PID> # Send SIGKILL (force kill)
kill -l # List all signals
📌 Example:
ps aux | grep chrome
# prakash 1324 5.5 1.2 ... chrome
kill -9 1324
📌 How to Analyze:
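- If a process is unresponsive, send SIGTERM first and give it time to exit; use kill -9 only if that fails, because SIGKILL gives the process no chance to clean up.
- Use ps or top to find the PID of the problem process before killing.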
🔹 4. cpu (Note: There's no native cpu command in Linux)
📌 Possible Meanings:
- You might be referring to:
  - Checking CPU usage via top, htop, or mpstat.
  - lscpu, to get CPU architecture info.
✅ Example 1: View CPU architecture
lscpu
Output:
Architecture: x86_64
CPU(s): 8
Model name: Intel(R) Core(TM) i7-8565U
CPU MHz: 1800.000
✅ Example 2: CPU usage
mpstat -P ALL 1
Output:
11:28:01 AM CPU %usr %sys %idle
11:28:02 AM all 5.00 1.00 94.00
🔍 Summary Comparison Table
Command | Use Case | Output Highlights | When to Use |
---|---|---|---|
top | Real-time process monitoring | Load avg, CPU %, MEM %, PID, COMMAND | Live resource troubleshooting |
ps | Snapshot of process state | PID, %CPU, %MEM, STAT, COMMAND | Get PID or process list |
kill | Send signals (terminate) | N/A (command-line tool) | Stop/kill hung or rogue processes |
lscpu | View CPU architecture info | Model name, cores, threads | Debug CPU availability or model |
mpstat | CPU usage per core over time | %user, %sys, %idle | Performance bottleneck diagnosis |
📌 Real-world Interview Tip (FAANG-ready):
Be prepared to:
- Find high CPU processes using top or ps.
- Stop a runaway process with kill (escalating to kill -9 only if needed), and explain why a zombie cannot be killed and must be reaped by its parent.
- Monitor memory leaks or CPU bottlenecks.
- Explain CPU load and how to scale (e.g., vertical scaling, multithreading).
Category: General Process Monitoring
1. Q: How do you find the top 5 memory-consuming processes on a Linux system?
A:
ps aux --sort=-%mem | head -n 6
- --sort=-%mem: Sorts descending by memory usage.
- head -n 6: Keeps the header line plus the top five processes.
2. Q: How do you monitor CPU usage per core in real-time?
A:
mpstat -P ALL 1
- -P ALL: Show stats for all cores.
- 1: Refresh every second.
3. Q: A process is stuck in zombie state. Can you kill it?
A: No. Zombie processes are already dead; they just haven’t been cleaned up. The parent process must call wait() to release them. If it never does, killing the parent lets init adopt and reap them.
4. Q: How do you identify zombie processes?
A:
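ps aux | awk '$8 ~ /^Z/ { print $2, $11 }'
OR
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'
The STAT value is often Z+ or Zs rather than a bare Z, so match on the first letter instead of testing for equality.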
5. Q: How do you kill all Java processes running on the system?
A:
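pkill java
This matches on the process name. pkill -f java matches the full command line instead, so it can also hit unrelated processes whose arguments contain "java"; preview the matches with pgrep -af java first.
Or, using a PID list (pgrep never matches itself, unlike ps aux | grep java):
pgrep -x java | xargs -r kill
Add -9 only if the processes ignore SIGTERM.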
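🔸 Category: top Analysis
6. Q: What does the load average in top mean?
A: It is the average number of tasks that are running, runnable (waiting for a CPU), or in uninterruptible sleep (usually waiting on I/O):
- First = 1-minute average
- Second = 5-minute average
- Third = 15-minute average
If it stays above the number of cores, work is queuing; check whether the cause is CPU demand or blocked I/O.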
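The comparison against core count can be scripted. A minimal sketch, assuming Linux's /proc/loadavg and a getconf that supports _NPROCESSORS_ONLN (nproc is an alternative); load_per_core is a hypothetical helper name:

```shell
# load_per_core is a hypothetical helper: divides a load average by a CPU count.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

cores=$(getconf _NPROCESSORS_ONLN)
read -r one five fifteen rest < /proc/loadavg
echo "1-min load per core: $(load_per_core "$one" "$cores")"
```

A sustained value above 1.00 per core means more tasks want to run (or are blocked on I/O) than there are CPUs to serve them.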
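7. Q: In top, what does %wa mean in CPU stats?
A: %wa is the share of time the CPU sat idle while at least one task was waiting for I/O to complete. High %wa usually points to a storage bottleneck (local disk, or a network filesystem such as NFS).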
8. Q: How do you sort by memory in top?
A: Inside top, press Shift + M.
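9. Q: You see a process consuming 100% CPU in top. How do you find out what it's doing?
A:
- Get PID from top
- Use strace -p <pid> to trace syscalls. If it prints little or nothing, the process is spinning in user space, and a profiler such as perf top -p <pid> is the better tool.
- Use lsof -p <pid> to inspect open files/sockets.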
10. Q: How can you monitor top 10 CPU processes in real-time with a script?
A:
watch -n 2 "ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 11"
🔸 Category: ps and kill
11. Q: What's the difference between kill and kill -9?
A:
- kill sends SIGTERM (15): Graceful shutdown.
- kill -9 sends SIGKILL (9): Force kill, can't be trapped or ignored.
12. Q: How do you list all running processes by a specific user?
A:
ps -u <username>
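13. Q: How do you kill all child processes of a PID, or a whole process group?
A:
pkill -P <parent_pid>
This signals only the direct children of <parent_pid>. To signal every process in a process group, pass the negated group ID to kill:
kill -- -<pgid>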
14. Q: How do you determine parent-child relationships between processes?
A:
ps -eo pid,ppid,cmd
- ppid: Parent PID
- Use pstree for visual tree
15. Q: How do you identify processes using the most open files?
A:
lsof | awk '{ print $2 }' | sort | uniq -c | sort -nr | head
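Note that the lsof pipeline above also counts rows that are not file descriptors (cwd, txt, mem mappings) and can be slow on busy hosts. For a single process, /proc/<pid>/fd gives an exact descriptor count. A minimal sketch, assuming a Linux /proc filesystem; fd_count is a hypothetical helper name:

```shell
# fd_count is a hypothetical helper: number of open file descriptors for a PID.
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

n=$(fd_count $$)
echo "shell $$ has $n open file descriptors"
```

Comparing this number with the soft limit from ulimit -n shows how close a process is to "Too many open files" errors.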
🔸 Category: CPU & System Info
16. Q: How do you get CPU core count and thread info on Linux?
A:
lscpu
- CPU(s): Logical CPUs
- Core(s) per socket: Physical cores per CPU socket
- Thread(s) per core: Hardware threads per core (2 when hyper-threading is enabled)
17. Q: How do you monitor CPU utilization over time and graph trends?
A:
- Use sar from sysstat (here: one sample per second, ten samples):
sar -u 1 10
- Or use tools like Grafana + Prometheus.
18. Q: How do you detect a CPU bottleneck vs memory bottleneck?
A:
- High %us or %sy and low %id in top: CPU bottleneck.
- High %wa: I/O bottleneck.
- High memory usage with sustained swapping (non-zero si/so in vmstat): Memory bottleneck.
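19. Q: What is the difference between VSZ and RSS in ps?
A:
- VSZ: Virtual memory size, i.e. all address space the process has mapped (code, data, shared libs), including pages it has never touched.
- RSS: Resident Set Size, the part currently held in physical memory. Shared pages are counted in every process that maps them.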
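Both figures can be read directly from /proc/<pid>/status, where VmSize corresponds to VSZ and VmRSS to RSS. A minimal sketch, assuming a Linux /proc filesystem; mem_kb is a hypothetical helper name:

```shell
# mem_kb is a hypothetical helper: prints a /proc/<pid>/status field (e.g. VmSize) in kB.
mem_kb() {
    awk -v key="$2:" '$1 == key { print $2 }' "/proc/$1/status"
}

vsz=$(mem_kb $$ VmSize)   # corresponds to the VSZ column in ps
rss=$(mem_kb $$ VmRSS)    # corresponds to the RSS column in ps
echo "shell $$: VSZ=${vsz} kB, RSS=${rss} kB"
```

RSS is never larger than VSZ; a steadily growing RSS over time is the usual first sign of a memory leak.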
20. Q: How do you trace what files or ports a process is using?
A:
lsof -p <pid>
-
Shows open files, sockets, network connections, etc.
✅ Bonus Tip for Interviews:
- Always explain why a tool is used.
- Tie back to real-world issues (e.g., server slowness, memory leaks, runaway processes).
- Use watch + ps/top/lsof/netstat combinations to show dynamic diagnosis.
- Know when to use strace, perf, or iotop for deeper profiling.
If a process is consuming 100% CPU in top, you can analyze what it is doing using a combination of strace, lsof, and other tools. Here's a complete step-by-step example to guide you through this investigation:
🔍 Scenario:
You observe high CPU usage in top
, and you want to investigate what that process is doing.
✅ Step-by-Step Analysis:
Step 1: Identify the Process in top
top
Look for the process using the most CPU (e.g., 100%).
Sample output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12345 user1 20 0 10240 920 860 R 100.0 0.0 1:23.45 python
→ The PID is 12345
, and it's a Python process.
Step 2: Use strace
to Trace System Calls
Attach strace
to the running process to see what system calls it's making:
sudo strace -p 12345
Sample output:
clock_gettime(CLOCK_MONOTONIC, {tv_sec=12345, tv_nsec=678900000}) = 0
read(3, "", 4096) = 0
epoll_wait(4, [], 128, 0) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=12345, tv_nsec=679000000}) = 0
This shows the process is making repeated system calls—maybe polling or in a busy loop.
You can also log the output to a file and analyze later:
sudo strace -tt -T -p 12345 -o strace.log
Step 3: Use lsof
to See Open Files/Sockets
This tells you which files, libraries, sockets, etc., the process is using.
sudo lsof -p 12345
Sample output:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 12345 user1 cwd DIR 8,1 4096 123456 /home/user1/myapp
python 12345 user1 txt REG 8,1 123456 654321 /usr/bin/python3.8
python 12345 user1 mem REG 8,1 45678 987654 /lib/x86_64-linux-gnu/libc.so.6
python 12345 user1 3u REG 8,1 345678 456789 /home/user1/myapp/log.txt
python 12345 user1 4u IPv4 23456 0t0 TCP 127.0.0.1:45678->127.0.0.1:3306 (ESTABLISHED)
→ It’s connected to a MySQL server on port 3306 — maybe it’s stuck querying DB.
Step 4: Optional - Check the Stack Trace
Use pstack
(if installed) to print the stack trace of the process:
sudo pstack 12345
Sample output:
#0 0x00007f3f5dbae430 in __libc_recv () from /lib64/libc.so.6
#1 0x000055b6d2e5a891 in socket_read ()
#2 0x000055b6d2e4ff3a in main_loop ()
This helps in debugging loops or tight recursion inside the application.
🧠 Interview-ready Summary
Q: You see a process using 100% CPU in top. How do you find out what it's doing?
A:
- Get the PID from top.
- Use strace -p <pid> to trace system calls in real-time.
- Use lsof -p <pid> to check files, sockets, and shared libs used.
- (Optional) Use pstack <pid> for stack trace.
- Analyze logs or application logic based on what these tools show.
This helps you understand whether it’s stuck in a loop, polling, doing excessive computation, or waiting on I/O.
Here's a clear and detailed explanation of kill vs pkill, including differences, use cases, and real examples, perfect for interview prep:
🔪 kill vs pkill in Linux
Feature | kill | pkill |
---|---|---|
Targets | Process by PID | Process by name or pattern |
Signal | Default is SIGTERM (15) | Default is SIGTERM (15) |
Flexibility | Needs PID only | More user-friendly (no PID needed) |
User Scope | Affects any PID (if permitted) | Can restrict by user, session, terminal |
✅ kill
— Send signal to PID
📌 Syntax:
kill [-SIGNAL] PID
✅ Example:
top
Find a process with PID 12345
.
Kill it:
kill 12345
Send a specific signal (e.g., SIGKILL = 9):
kill -9 12345
📌 You must know the PID beforehand.
✅ pkill
— Send signal to process name
📌 Syntax:
pkill [-SIGNAL] pattern
✅ Example:
To kill all processes whose name matches python (the pattern is a regular expression matched as a substring, so python3 is included):
pkill python
To force kill (SIGKILL) all nginx processes:
pkill -9 nginx
Restrict to processes run by a specific user:
pkill -u prakash java
Send signal only to processes matching name and terminal:
pkill -t pts/1 bash
📌 You don’t need to look up the PID manually.
🧠 Real-World Use Case Comparison
Task | Command |
---|---|
Kill process with known PID | kill 9876 |
Kill all Java processes | pkill java |
Gracefully stop nginx (default SIGTERM) | pkill nginx |
Force kill Python script by PID | kill -9 13579 |
Kill all processes for a user | pkill -u prakash |
Kill process by exact match (not substring) | pkill -x nginx |
🛑 Common Signals
Signal Name | Number | Purpose |
---|---|---|
SIGTERM | 15 | Graceful termination |
SIGKILL | 9 | Forceful termination (cannot be caught or ignored) |
SIGHUP | 1 | Hangup; many daemons reload configuration on it |
SIGSTOP | 19 | Pause process (number on x86/ARM Linux) |
SIGCONT | 18 | Resume stopped process (number on x86/ARM Linux) |
🧠 Interview-ready Summary
Q: What's the difference between kill and pkill?
A:
- kill sends signals to a specific process ID (PID).
- pkill matches process names or patterns, making it easier to target multiple or unknown PIDs.
Example:
- kill -9 1234 kills PID 1234.
- pkill -9 nginx kills all nginx processes.
Here’s a detailed explanation of the pstree command with an example output, useful for Linux interviews and real-time debugging.
🌳 What is pstree?
pstree shows processes in a tree format, visualizing parent-child relationships (i.e., which process spawned which).
✅ Basic Syntax:
pstree
📌 Example Output:
systemd─┬─NetworkManager───2*[{NetworkManager}]
        ├─sshd───sshd───bash───pstree
        ├─cron
        ├─dbus-daemon
        ├─gnome-shell───2*[{gnome-shell}]
        ├─firefox───9*[{firefox}]
        ├─pulseaudio───{pulseaudio}
        └─cupsd
🔍 How to Read It:
- systemd is the root (PID 1).
- sshd started a child sshd session, which launched bash, which launched pstree.
- firefox has 9 threads shown as {firefox}.
- 2*[{NetworkManager}] means two threads under NetworkManager.
🔧 Useful Options:
Command | Description |
---|---|
pstree -p | Show PIDs |
pstree -u | Show usernames |
pstree -a | Show command line arguments |
pstree -h | Highlight current process tree |
pstree <username> | Show processes owned by a specific user |
pstree <PID> | Show subtree for a specific process |
📌 Example with PID and arguments:
pstree -p -a
Output:
systemd,1
  ├─cron,585
  ├─nginx,850 -g daemon off;
  │   └─nginx,851
  └─sshd,980
      └─sshd,1032
          └─bash,1033
              └─pstree,1080 -p -a
🧠 Interview Tip
Q: Why use pstree instead of ps?
A: pstree visually shows parent-child process hierarchy, helping you trace how a process was spawned (e.g., sshd → bash → python script), which is useful in debugging background services, daemons, or containerized environments.
Here's a code-block-style diagram to help you visually understand bottlenecks in a Linux system, categorized by key system components like CPU, memory, disk I/O, network, and processes. This is extremely useful for both troubleshooting and interviews:
📊 Linux System Bottlenecks – Block Diagram (Text-Based)
+-----------------+ +----------------+ +------------------+
| CPU |<-----> | Processes |<----->| Memory |
+-----------------+ +----------------+ +------------------+
^ | ^ | |
| | | | v
| | +-----+ +------+ +-------------+
| | | High CPU usage | | Swapping |
| +------------------->| Infinite loops | | Thrashing |
| | Busy-waiting | +-------------+
|
| +----------------+
| | Interrupts |
+----------| Context Switch |
+----------------+
|
v
+----------------+
| Load Avg |
| (uptime, top) |
+----------------+
+-------------------+ +-------------------+
| Disk I/O |<-----> | Processes |
+-------------------+ +-------------------+
| ^
| |
v |
+-------------------+ +--------------------------+
| High I/O Wait |<-------| Log-heavy apps, DBs |
| (iostat, iotop) | | Misconfigured writes |
+-------------------+ +--------------------------+
+-------------------+ +-------------------+
| Network I/O |<-----> | Processes |
+-------------------+ +-------------------+
| ^
| |
v |
+---------------------+ +-------------------------+
| High RX/TX, Drops |<------| Chatty apps, poor MTU |
| (iftop, netstat) | | Packet loss, congestion |
+---------------------+ +-------------------------+
🔍 Mapping Bottlenecks to Tools
Bottleneck Area | Symptoms | Commands to Diagnose |
---|---|---|
CPU | High load, 100% usage | top , htop , mpstat , pidstat |
Memory | Swapping, OOM kills | free -m , vmstat , dmesg , top |
Disk I/O | High iowait, slow FS access | iotop , iostat , dstat , df -h , lsof |
Network | Packet drops, latency spikes | iftop , nethogs , ss , netstat , ping |
Process | Zombie/stuck/high CPU tasks | ps aux , top , pstree , strace , lsof |
System Load | Load avg spikes | uptime , top , w , sar , vmstat |
🧠 Interview Tip
Q: How would you diagnose a slow Linux system?
A:
Start by checking system-wide metrics:
- top / htop for CPU, memory, process bottlenecks.
- iotop or iostat for disk I/O wait.
- iftop or netstat for network delays.
Then drill down using:
- strace, lsof, vmstat, and dmesg to isolate the culprit.
103 Sudo vs Su and Principle of least privileges
1. Difference Between su and sudo
Basic Definition
- su (Substitute User or Switch User):
  Allows you to switch to another user account, typically the root user. You’ll be prompted to enter the target user’s password.
- sudo (originally "superuser do", now usually read as "substitute user do"):
  Runs a single command as another user, root by default, after you authenticate with your own password (not root's). Access is controlled through the /etc/sudoers file.
Key Differences
Feature | su | sudo |
---|---|---|
Purpose | Switch to another user session | Run a command with elevated privileges |
Password Needed | Target user's password (e.g. root) | Your own password |
Security | Less secure (full shell access, shared root password) | More secure (access can be limited to specific commands) |
User Traceability | Only the switch itself is logged, not the commands run afterwards | Each command is logged (/var/log/auth.log on Debian/Ubuntu, /var/log/secure on RHEL-family systems) |
Configuration | No per-command configuration | Highly configurable via /etc/sudoers |
Session Scope | Creates a new shell | Executes a single command (sudo -i or sudo -s opens a shell) |
Real-World Usage
- Use su when:
  - You need to maintain a full root shell session.
  - You're working in an environment where sudo isn’t configured.
- Use sudo when:
  - You want to minimize risk by limiting access to specific commands.
  - You're working in a multi-user environment and need an audit trail.
  - You want to adhere to best security practices.
Example
# Using su to become root
su -
# (enter root password)
apt update
# Using sudo to run a command as root
sudo apt update
# (enter your password)
2. Principle of Least Privilege (PoLP)
What It Means
The Principle of Least Privilege is a security best practice stating that users and processes should be granted only the minimum level of access needed to perform their tasks — no more, no less.
💡 Why It’s Important
- Reduces attack surface: Limits what attackers can do if they gain access.
- Minimizes human error: Prevents accidental deletion or system changes.
- Improves auditability: Easier to track and understand permission usage.
- Supports compliance: Many regulations (e.g., HIPAA, GDPR) require it.
How It's Applied in Linux
- User Permissions
  Users are placed into groups and given access to only necessary files or directories.
- Sudo Configuration
  The /etc/sudoers file (edited with visudo, which checks the syntax before saving) is used to allow users to run specific commands as root, without giving full root access. Example:
  john ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart apache2
  This means john can restart Apache with sudo, without a password prompt, but can run nothing else through sudo.
- File Ownership and Permissions
  The chmod, chown, and chgrp commands are used to tightly control file access.
- Service Accounts
  Processes like web servers or databases run under dedicated users (e.g., www-data) that only have access to their specific directories.
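A syntax error in a sudoers file can lock everyone out of sudo, so a rule like the one for john is worth validating before it is installed. A minimal sketch, assuming visudo is available; the temporary file stands in for a drop-in such as one under /etc/sudoers.d/, and nothing is written to /etc here:

```shell
# Hypothetical rule and temp file; this sketch installs nothing into /etc.
rule='john ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart apache2'
tmp=$(mktemp)
printf '%s\n' "$rule" > "$tmp"

# visudo -c -f parses the given file without installing it; skip quietly if visudo is absent.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f "$tmp" || echo "syntax check failed; do not install this file"
else
    echo "visudo not found; skipping syntax check"
fi

checked_rule=$(cat "$tmp")
rm -f "$tmp"
```

After a rule is installed, sudo -l -U john (run as root) lists what john is actually allowed to execute.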
Interview-Worthy Talking Points
- “Using sudo instead of su enforces least privilege by giving users temporary access to specific tasks.”
- “We implement PoLP by reviewing user permissions regularly and removing unnecessary sudo privileges.”
- “I once audited a system where developers had full root access — we moved them to role-based sudo rules, which enhanced security significantly.”
101 Linux ACL
What is ACL (Access Control List) in Linux?
Definition
ACL is a fine-grained permission system in Linux that allows you to grant different permissions to multiple users and groups on a single file or directory — something that traditional Unix permissions can't do.
Why Use ACL?
Traditional Linux file permissions only allow you to define:
-
One owner (user)
-
One group
-
Permissions for others
So you can only have 3 sets of permissions: user, group, others
But what if:
-
You want to give read access to another user
-
Or let a different group have write access
-
Without changing ownership or primary group
That's where ACL comes in.
ACL vs Traditional Permissions
Feature | Traditional Permissions | ACL |
---|---|---|
Number of users/groups | 1 user, 1 group | Multiple users and groups |
Granularity | Limited (r/w/x) | Fine-grained per user/group |
Inheritance | No | Yes (default ACL on directories) |
Enabling and Using ACL
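Check if filesystem supports ACL
Most modern Linux distros with ext4, xfs, or btrfs support ACLs and enable them by default, so the acl option often does not appear in the mount output even when ACLs work.
mount | grep acl
If ACLs are not enabled (typically on older ext2/ext3 setups), remount with:
mount -o remount,acl /mount/point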
ACL Commands
1. Check current ACLs
getfacl filename
Example:
getfacl report.txt
Output:
# file: report.txt
# owner: prakash
# group: devs
user::rw-
user:john:r--
group::r--
mask::r--
other::---
2. Add ACL for specific user
setfacl -m u:john:r file.txt
John gets read-only access, even if he is not the owner or in the group.
3. Add ACL for a group
setfacl -m g:designers:rw file.txt
Group 'designers' can read/write file.txt
4. Remove ACL entry
setfacl -x u:john file.txt
5. Set Default ACL on directory (for inheritance)
setfacl -d -m u:john:rw /project/folder
All new files inside /project/folder will automatically give john read/write access.
6. Remove all ACLs
setfacl -b file.txt
Example Scenario
You’re building a CI/CD pipeline and want:
-
Dev team to have read/write on app.conf
-
Ops team to have read-only access
-
Jenkins user to have write-only access
Using ACL:
setfacl -m g:dev:rw app.conf
setfacl -m g:ops:r-- app.conf
setfacl -m u:jenkins:-w- app.conf
Output Explanation (getfacl)
# file: file.txt
# owner: prakash
# group: devs
user::rw-
user:john:r--
group::r--
group:designers:rw-
mask::rw-
other::---
The mask is the upper bound on the permissions of every named user, every named group, and the owning group. It does not affect the file owner (user::) or others (other::).
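The masking rule is a bitwise AND between an entry's rights and the mask. Here is a minimal sketch of that calculation in plain shell; the helper name effective_perms is made up for illustration, and the arguments are single octal digits (r=4, w=2, x=1). getfacl itself reports the same result as an #effective: comment next to a restricted entry.

```shell
# Hypothetical helper: AND an ACL entry's rights with the mask.
# Both arguments are single octal digits (r=4, w=2, x=1).
effective_perms() {
    entry=$1
    mask=$2
    eff=$(( entry & mask ))
    # Render the result as an rwx triplet.
    out=""
    if [ $(( eff & 4 )) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $(( eff & 2 )) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $(( eff & 1 )) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
    printf '%s\n' "$out"
}

effective_perms 6 4   # entry rw-, mask r--  -> prints r--
effective_perms 6 6   # entry rw-, mask rw-  -> prints rw-
```

So a user:john:rw- entry under mask::r-- behaves as read-only until the mask is widened, for example with setfacl -m m::rw file.txt.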
Interview-Ready Q and A
Q1: Why use ACL when traditional permissions exist?
A: Traditional permissions only allow one user and one group. ACL allows multiple users and groups to have different access levels on the same file or directory — useful in collaborative or enterprise environments.
Q2: What does 'setfacl -m u:john:rw file.txt' do?
A: It gives read/write access to user john on file.txt without changing ownership or default permissions.
Q3: What does 'mask::r--' mean in 'getfacl'?
A: The mask caps the effective permissions of all named users, named groups, and the owning group; the file owner and others are not affected. Even if an ACL entry grants rw-, a mask of r-- reduces it to read-only.
Q4: How do you make ACL changes persistent across reboots?
A: ACLs are stored in extended attributes of files, and are persistent across reboots — provided the filesystem is mounted with acl support.
Q5: What’s the difference between default ACL and access ACL?
Type | Applied to | Purpose |
---|---|---|
Access ACL | Files/Dirs | Controls access to the object itself, extending the standard permissions |
Default ACL | Directories | Inherited by new files/dirs |
Summary Cheat Sheet
Command | Description |
---|---|
getfacl file.txt | View ACLs |
setfacl -m u:john:rw file.txt | Set ACL for user |
setfacl -m g:devs:rw file.txt | Set ACL for group |
setfacl -x u:john file.txt | Remove ACL for user |
setfacl -b file.txt | Remove all ACL entries |
setfacl -d -m u:john:rw dir/ | Set default ACL for directory |
101 Linux Permissions
Here’s a comprehensive guide to Linux Permissions tailored for FAANG-level interviews—starting from beginner to advanced, along with 10 interview-style questions and answers for practice.
1. Linux Permissions: Beginner to Advanced
1.1 Basics of Linux Permissions
Each file/directory has:
[File Type][Owner][Group][Others]
Example:
-rwxr-xr-- 1 prakash devs 2345 Jun 6 12:00 script.sh
Breakdown:
-
- = Regular file (d for a directory)
-
rwx = Owner: read, write, execute
-
r-x = Group: read, execute
-
r-- = Others: read only
1.2 Types of Permissions
Symbol | Meaning | Octal |
---|---|---|
r | Read | 4 |
w | Write | 2 |
x | Execute | 1 |
To get the octal value for each class, add the values of the permissions it grants:
chmod 754 filename
# => Owner: 7 (rwx = 4+2+1), Group: 5 (r-x = 4+1), Others: 4 (r--)
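The digit-by-digit addition can be sketched as a small shell function. The name sym_to_octal is a hypothetical helper, not a standard tool, and it expects exactly the nine permission characters (without the leading file-type character and without special bits such as s or t).

```shell
# Hypothetical helper: convert a 9-character permission string to octal.
sym_to_octal() {
    perms=$1
    result=""
    # Process the string three characters at a time: owner, group, others.
    while [ -n "$perms" ]; do
        triplet=$(printf '%s' "$perms" | cut -c1-3)
        perms=$(printf '%s' "$perms" | cut -c4-)
        digit=0
        case $triplet in r??) digit=$(( digit + 4 )) ;; esac
        case $triplet in ?w?) digit=$(( digit + 2 )) ;; esac
        case $triplet in ??x) digit=$(( digit + 1 )) ;; esac
        result="${result}${digit}"
    done
    printf '%s\n' "$result"
}

sym_to_octal rwxr-xr--   # prints 754
sym_to_octal rw-rw-r--   # prints 664
```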
1.3 Managing Permissions
-
View:
ls -l
-
Change using symbolic mode:
chmod u+x file.sh # Add execute for owner
chmod g-w file.sh # Remove write for group
chmod o=r file.sh # Set read-only for others
-
Change using numeric mode:
chmod 755 file.sh
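To watch symbolic and numeric modes take effect, the commands above can be run against a throwaway file. This is a sketch assuming GNU coreutils (stat -c is the GNU form); the directory and variable names are illustrative.

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
f="$demo_dir/file.sh"
touch "$f"

chmod 644 "$f"                             # start from rw-r--r--
chmod u+x "$f"                             # add execute for owner
mode_after_ux=$(stat -c '%a' "$f")         # 744

chmod g-w "$f"                             # group had no write bit, so nothing changes
chmod o=r "$f"                             # others set to exactly r--
mode_after_symbolic=$(stat -c '%a' "$f")   # 744

chmod 755 "$f"                             # numeric mode sets all nine bits at once
mode_after_numeric=$(stat -c '%a' "$f")    # 755

echo "$mode_after_ux $mode_after_symbolic $mode_after_numeric"
rm -rf "$demo_dir"
```

Symbolic mode edits individual bits relative to the current state, while numeric mode replaces the whole permission set, which is why scripts that must reach a known state usually use the numeric form.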
1.4 Ownership
-
Change owner:
chown prakash file.txt
-
Change group:
chgrp devs file.txt
-
Change both:
chown prakash:devs file.txt
1.5 Special Permissions (Advanced)
1.5.1 SetUID (s) — Run as file owner
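chmod u+s my_binary
ls -l my_binary => -rwsr-xr-x
Used by programs like passwd. Linux ignores the SetUID bit on interpreted scripts, so it only takes effect on compiled binaries.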
1.5.2 SetGID (s) — Run with group permissions
chmod g+s my_script
ls -l => -rwxr-sr-x
1.5.3 Sticky Bit (t) — Protect deletion in shared dirs
chmod +t /tmp
ls -ld /tmp => drwxrwxrwt
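Besides reading ls -l output, the three special bits can be checked from a script with the test operators -u (SetUID), -g (SetGID), and -k (sticky). A minimal sketch on throwaway files, assuming GNU stat; the file and directory names are placeholders.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/prog"
mkdir "$demo_dir/shared" "$demo_dir/drop"

chmod 755 "$demo_dir/prog"
chmod u+s "$demo_dir/prog"      # SetUID on a file
chmod g+s "$demo_dir/shared"    # SetGID on a directory
chmod 1777 "$demo_dir/drop"     # sticky bit plus rwx for everyone

# test operators: -u SetUID, -g SetGID, -k sticky bit
[ -u "$demo_dir/prog" ]   && suid_set=yes   || suid_set=no
[ -g "$demo_dir/shared" ] && sgid_set=yes   || sgid_set=no
[ -k "$demo_dir/drop" ]   && sticky_set=yes || sticky_set=no
prog_mode=$(stat -c '%A' "$demo_dir/prog")   # -rwsr-xr-x

echo "$suid_set $sgid_set $sticky_set $prog_mode"
rm -rf "$demo_dir"
```

In numeric form the special bits are a fourth leading digit: 4 for SetUID, 2 for SetGID, 1 for sticky, so 1777 is the usual mode of /tmp.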
1.6 Default Permissions – umask
Check umask:
umask # e.g., 0022
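Meaning (the umask bits are cleared from the base mode, which matches digit-by-digit subtraction for common masks):
-
File default: 666 with umask 0022 gives 644 (rw-r--r--)
-
Dir default: 777 with umask 0022 gives 755 (rwxr-xr-x)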
1.7 Recursive Permission Change
chmod -R 755 /var/www
chown -R prakash:www-data /var/www
2. 10 Linux Permission Questions & Answers
Q1. What does chmod 755 file.sh do?
Answer:
Sets permissions to:
-
Owner: rwx (7)
-
Group: r-x (5)
-
Others: r-x (5)
Q2. What is the use of chmod +x script.sh?
Answer:
Adds execute permission for owner, group, and others (limited by the current umask), allowing the script to be run directly. Use chmod u+x to add it for the owner only.
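Without a class letter, chmod +x behaves like a+x filtered through the umask, which is easy to miss in an interview. The sketch below shows the difference under a deliberately restrictive umask; it assumes GNU chmod and stat, and the file name is illustrative.

```shell
demo_dir=$(mktemp -d)
f="$demo_dir/script.sh"
touch "$f"

# Command substitution runs in a subshell, so the umask change stays local.
plus_x_mode=$(
    umask 077
    chmod 644 "$f"
    chmod +x "$f"          # no class given: bits masked by the umask are skipped
    stat -c '%a' "$f"
)
a_plus_x_mode=$(
    umask 077
    chmod 644 "$f"
    chmod a+x "$f"         # explicit class: the umask is ignored
    stat -c '%a' "$f"
)

echo "$plus_x_mode $a_plus_x_mode"   # 744 755
rm -rf "$demo_dir"
```

With the common umask of 022 both forms give 755, because 022 masks only write bits.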
Q3. What is the difference between chmod 777 and chmod 755?
Answer:
-
777: Everyone has full access (read/write/execute).
-
755: Only the owner has write access; group and others can read and execute but not modify.
Q4. How do you give read & write to the owner and read-only to group and others?
Answer:
chmod 644 file.txt
Q5. What does -rwsr-xr-x mean in ls -l output?
Answer:
SetUID is set:
-
The file runs with the owner’s privileges, not the current user’s.
Q6. What does the Sticky Bit do?
Answer:
Prevents users from deleting others’ files in a shared directory like /tmp. A mode of drwxrwxrwt indicates the sticky bit is set.
Q7. How do you recursively change ownership of a directory and its contents?
Answer:
chown -R user:group /path/to/dir
Q8. Explain what umask 027 means?
Answer:
Default permissions mask; the masked bits are cleared from the base mode:
-
For files: 666 masked by 027 = 640 (rw-r-----)
-
For dirs: 777 masked by 027 = 750 (rwxr-x---)
Q9. What’s the permission number for rw-rw-r--?
Answer:
rw- rw- r-- = 664
Q10. How to give only execute permission to others?
Answer:
chmod o=x file.sh
Bonus Practice Examples
Task | Command |
---|---|
Give full permission to owner only | chmod 700 file.sh |
Remove all permissions for others | chmod o= file.sh |
Set SetGID on a directory | chmod g+s dir |
Make a directory with full access for all | mkdir -m 777 shared_dir |
View special permissions | ls -l or stat file |
umask (user file-creation mode mask) in detail: concept, default values, calculation logic, and practical examples.
What is umask?
umask defines the permission bits to remove from newly created files or directories.
-
It doesn’t grant permissions; it masks/restricts them.
-
When a user creates a file or directory, Linux starts from a base permission, then clears every bit that is set in the umask. For common values this looks like subtracting the umask digit by digit.
Default Permission Values
Type | Max default permission |
---|---|
File | 666 (rw-rw-rw- ) – no x by default |
Directory | 777 (rwxrwxrwx ) |
How umask Works (Step-by-Step)
Let's say:
-
umask = 022
For files:
Default: 666
UMASK: 022
Result: 644 → rw-r--r--
For directories:
Default: 777
UMASK: 022
Result: 755 → rwxr-xr-x
So:
-
Owner keeps everything the base mode grants
-
Group & Others lose w permission
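Strictly speaking the kernel does not subtract; it computes base AND (NOT umask). The two only disagree when the umask has a bit the base mode lacks. A minimal sketch of the real calculation using shell arithmetic, where apply_umask is a made-up helper name:

```shell
# Hypothetical helper: apply a umask to a base mode (both given as octal digits).
apply_umask() {
    base=$1
    mask=$2
    # Clear every bit that is set in the mask: base AND (NOT mask).
    printf '%03o\n' $(( 0$base & ~0$mask ))
}

apply_umask 666 022   # prints 644
apply_umask 777 022   # prints 755
apply_umask 666 033   # prints 644, whereas naive subtraction would suggest 633
```

The last call is the case worth remembering: umask 033 tries to remove execute from a file mode that never had it, so only the write bits are actually cleared.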
Common umask values & their effects
UMASK | File Permission | Dir Permission | Notes |
---|---|---|---|
000 | 666 → 666 | 777 → 777 | Everyone full access |
022 | 666 → 644 | 777 → 755 | Group/Others: no write |
027 | 666 → 640 | 777 → 750 | Group: read, Others: no access |
077 | 666 → 600 | 777 → 700 | Owner-only full access |
Examples in Shell
🔍 View current umask:
umask
# Output: 0022
Set umask temporarily:
umask 0027
touch file.txt
mkdir dir1
ls -l
# file.txt => -rw-r-----
# dir1 => drwxr-x---
⚙️ Make umask permanent:
For the bash shell, open ~/.bashrc in an editor:
vi ~/.bashrc
Add this line to the file:
umask 027
Then reload the file in the current shell:
source ~/.bashrc
Important Rules
-
umask removes bits; it never adds them.
-
Files are not given execute (x) by default, even if the umask allows it, because most programs create regular files with a base mode of 666.
-
Directories do get x so you can cd into them.
Interview-ready Example
Q: If a user has a umask of 0077 and runs touch test.txt and mkdir demo, what are the resulting permissions?
Answer:
-
Default for file: 666
-
umask: 077
-
Final: 600 (rw-------)
-
Default for dir: 777
-
umask: 077
-
Final: 700 (rwx------)
So only the owner can access them: read/write on the file, and read/write/traverse on the directory. This is a secure configuration.
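The answer can be checked in a scratch directory. The sketch below assumes GNU stat and a temp directory without default ACLs or an inherited SetGID bit (either would change the numbers); the variable names are illustrative.

```shell
demo_dir=$(mktemp -d)

# The subshell keeps the umask change local to this demo.
(
    cd "$demo_dir" || exit 1
    umask 0077
    touch test.txt
    mkdir demo
)

file_mode=$(stat -c '%a' "$demo_dir/test.txt")   # 600
dir_mode=$(stat -c '%a' "$demo_dir/demo")        # 700
echo "$file_mode $dir_mode"
rm -rf "$demo_dir"
```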
Visual Table: umask Logic
Resource | Base Mode | UMASK (bits cleared) | Final Permission |
---|---|---|---|
File | 666 | 027 | 640 (rw-r-----) |
Dir | 777 | 027 | 750 (rwxr-x---) |
Linux Special Permissions: SetUID, SetGID, and the Sticky Bit, with real-world use cases and practical examples.
Why Special Permissions Exist
Linux has standard permission bits for:
-
User (Owner)
-
Group
-
Others
But what if you want:
-
A user to run a program as root?
-
Files in a shared directory to always inherit the same group?
-
To prevent users from deleting others’ files in /tmp?
That’s where special permission bits come in.
1. SetUID (Set User ID)
What it Does:
When an executable binary has SetUID, the process started from it runs with the effective user ID of the file's owner (usually root) rather than that of the user who launched it.
Real-World Use Case:
-
The passwd command allows any user to update their own password, but it needs to write to /etc/shadow, which is owned by root.
-
So passwd runs as root, even when executed by a regular user.
Example:
ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root 54256 Jan 1 00:00 /usr/bin/passwd
   ^
   └── 's' here means SetUID is ON for user
How to Set:
chmod u+s my_binary   # Linux ignores SetUID on interpreted scripts, so this must be a compiled binary
🔍 How to Verify:
ls -l my_binary
-rwsr-xr-x 1 prakash devs 2345 Jun 6 12:00 my_binary
2. SetGID (Set Group ID)
What it Does:
A) On Files:
-
The process runs with the file's group as its effective group ID, not the executing user's group.
B) On Directories:
-
New files/directories inside the directory will inherit the group ownership of the directory — not the creator's default group.
Real-World Use Case:
-
In collaborative environments (e.g., /shared/projects), you want files created by any team member to have the same group, like devs.
Example on Directory:
mkdir shared
chgrp devs shared
chmod g+s shared
ls -ld shared
drwxr-sr-x 2 prakash devs 4096 Jun 6 12:00 shared
      ^
      └── 's' on group means SetGID is ON
Now, any file created inside shared/ will automatically belong to group devs. Files that already existed keep their group.
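Group inheritance can be confirmed without root. The sketch below uses whatever group the temp directory happens to get, so the group names will simply match; to see a visible change, chgrp the directory to a secondary group you belong to first. It assumes GNU stat, and the paths are illustrative.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/shared"
chmod g+s "$demo_dir/shared"

touch "$demo_dir/shared/notes.txt"
mkdir "$demo_dir/shared/sub"

dir_group=$(stat -c '%G' "$demo_dir/shared")
file_group=$(stat -c '%G' "$demo_dir/shared/notes.txt")
# Subdirectories also inherit the SetGID bit itself.
[ -g "$demo_dir/shared/sub" ] && sub_sgid=yes || sub_sgid=no

echo "$dir_group $file_group $sub_sgid"
rm -rf "$demo_dir"
```

Because new subdirectories carry the SetGID bit forward, the group setting propagates down the whole tree as it grows.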
How to Set on File:
chmod g+s my_binary
How to Set on Directory:
chmod g+s /some/dir
3. Sticky Bit (t)
What it Does:
Inside a sticky directory, a file can be deleted or renamed only by the file's owner, the directory's owner, or root, even if others have write access to the directory.
Real-World Use Case:
-
/tmp directory: World-writable directory used by all users. Without sticky bit, users could delete each other’s temporary files.
Example:
ls -ld /tmp
drwxrwxrwt 10 root root 4096 Jun 6 12:00 /tmp
         ^
         └── 't' means Sticky Bit is set
How to Set:
chmod +t mydir
How to Remove:
chmod -t mydir
Summary Table
Permission | Symbol | Applies to | Effect |
---|---|---|---|
SetUID | s (user) | Executable file | Runs as file owner |
SetGID | s (group) | File or directory | File: runs as group, Dir: inherits group |
Sticky Bit | t (others) | Directory | Only file owner can delete |
Interview-Style Example
Q: A file shows -rwsr-xr-x. What does it mean and why is it used?
Answer:
-
The s in user field = SetUID.
-
This means: when the file is executed, it runs with the owner's privileges, not the executor’s.
-
Common use: commands like passwd which need to access /etc/shadow.
FAANG-Level Takeaway Tips
-
Know how SetUID can be a security risk if misused (e.g., privilege escalation).
-
Sticky bit is essential for shared directories to prevent accidental file deletion.
-
SetGID on directories is useful in CI/CD pipelines and group collaboration.
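A common follow-up is how to audit a system for SetUID files. The sketch below runs find against a scratch directory so it is safe to try; the file names are placeholders.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/normal" "$demo_dir/suid_prog"
chmod 755 "$demo_dir/normal"
chmod 4755 "$demo_dir/suid_prog"

# -perm -4000 matches files with the SetUID bit set, whatever the other bits are.
suid_hits=$(find "$demo_dir" -type f -perm -4000)
echo "$suid_hits"
rm -rf "$demo_dir"
```

On a real host the same predicate is usually run from the root of a filesystem, for example find / -xdev -type f -perm -4000 2>/dev/null, and -perm -2000 lists SetGID files the same way.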