106 Chronologically structured chapter that builds foundational Linux concepts step-by-step

 Here's a chronologically structured chapter that builds foundational Linux concepts step-by-step and prepares you for FAANG-level interviews with tools, commands, and real-time scenarios.


🧠 Chapter: Mastering Linux Internals for Interviews


1. The Big Picture: How Linux Works

  • Overview: Role of the kernel, user space vs kernel space

  • Key Concept: Linux as a multitasking, multiuser, monolithic kernel system


2. CPU: The Core Executor

  • What: Executes instructions, context switching, scheduling

  • Terms: User time, system time, idle time

  • Command: top, htop, mpstat, uptime

  • Interview Insight: Explain CPU-bound vs I/O-bound processes


3. Memory: RAM and Virtual Memory

  • Concepts: Virtual memory, paging, swapping, buffers, cache

  • Commands: free -h, vmstat, /proc/meminfo, top (RES/VIRT/SHR)

  • Interview Tip: What causes high swap usage?


4. Processes and Threads

  • Process: An independent executing program (PID, PPID)

  • Thread: Lightweight process sharing the same address space

  • Commands: ps -ef, pstree, top, htop

  • System Calls: fork(), exec(), exit(), wait()

  • Key Difference: fork() duplicates, exec() replaces, exit() terminates
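A minimal sketch of these calls as seen from the shell, assuming bash is installed. The shell forks a child for every background job, wait collects the exit status, and exec swaps the program while keeping the PID. All variable names are illustrative.

```shell
# fork + exit + wait: the shell forks a child for the subshell,
# the child exits with status 7, and wait collects that status.
( exit 7 ) &
child_pid=$!
if wait "$child_pid"; then child_status=0; else child_status=$?; fi
echo "child $child_pid exited with status $child_status"

# exec: the outer bash prints its PID, then exec replaces it with a
# second bash that prints its own PID. No new process is created,
# so both lines show the same number.
pids=$(bash -c 'echo $$; exec bash -c "echo \$\$"')
pid_before_exec=$(printf '%s\n' "$pids" | sed -n 1p)
pid_after_exec=$(printf '%s\n' "$pids" | sed -n 2p)
echo "before exec: $pid_before_exec, after exec: $pid_after_exec"
```

In an interview, the second half is a compact way to show that exec() does not create a process: the PID survives while the program image changes.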


5. Kernel: The Brain

  • Components: Process scheduler, memory manager, I/O manager

  • System Calls Interface: Bridge between user and kernel space

  • Command: uname -a, dmesg

  • Real World: Debugging kernel logs with dmesg


6. I/O and Disk Subsystems

  • Concepts: Block vs character devices, buffered I/O, async I/O

  • Commands: iostat, iotop, df -h, du -sh, lsblk, mount

  • Use Case: Identify I/O bottlenecks with iotop, pidstat -d


7. System Calls Deep Dive

  • What: Interface to kernel services (e.g., file ops, process control)

  • Examples: read(), write(), open(), close(), kill()

  • Tool: strace — trace system calls and signals

  • Interview: How exec() works under the hood? Use strace to show.


8. Process States and Lifecycle

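  • States: Running (R), Sleeping (S), Uninterruptible sleep (D), Stopped (T), Zombie (Z). An orphan is not a state: it is a live process whose parent has exited.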

  • Monitoring Tools:

    • top, htop – for real-time process view

    • watch -n 1 'ps -o pid,stat,pcpu,pmem,cmd -p <pid>' – poll one process without matching the grep itself

    • pidstat – CPU, memory, I/O usage over time

    • lsof – list open files by a process

  • Real-World Debug: A zombie process scenario


9. Signals in Depth

  • Types: TERM, KILL, STOP, CONT, HUP, INT, etc.

  • Commands:

    • kill -SIGTERM <pid> – graceful shutdown

    • kill -9 <pid> – force kill

    • trap in shell scripts

  • Tools: strace -p <pid> to inspect signal handling

  • Real-Time Example: Releasing stuck processes via SIGKILL
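The difference between a catchable and an uncatchable signal can be sketched with trap. This assumes bash is available; the handler text and variable names are made up for the demo.

```shell
log=$(mktemp)

# Child: install a SIGTERM handler, then idle in short sleeps so the
# handler can run promptly (bash runs traps between commands).
bash -c 'trap "echo cleaning up; exit 0" TERM; while :; do sleep 0.1; done' > "$log" &
pid=$!

sleep 0.5               # give the child time to install its handler
kill -s TERM "$pid"     # polite request; the handler decides what happens
if wait "$pid"; then term_status=0; else term_status=$?; fi

handler_output=$(cat "$log")
rm -f "$log"
echo "handler said: $handler_output (exit status $term_status)"
```

Sending KILL instead of TERM would end the child with no handler output, because SIGKILL cannot be trapped. That is why SIGTERM is normally tried first.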


10. Background & Detached Execution

  • Commands:

    • nohup command & – runs even after logout

    • disown %job_id – remove from job table

  • Use Case: Run long jobs on remote systems safely
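A small sketch of what nohup buys you, assuming GNU coreutils: the job is started under nohup, a hangup is simulated with an explicit SIGHUP, and the job is checked afterwards. The log file and the sleep command stand in for a real long-running job.

```shell
job_log=$(mktemp)

# nohup makes the command ignore SIGHUP; output goes to a log file
# instead of the terminal that is about to disappear.
nohup sleep 30 > "$job_log" 2>&1 &
pid=$!
sleep 0.3                      # let nohup exec the real command

kill -s HUP "$pid"             # simulate the hangup sent at logout
sleep 0.3
if kill -0 "$pid" 2>/dev/null; then survived=yes; else survived=no; fi
echo "survived SIGHUP: $survived"

kill "$pid"                    # clean up the demo job
wait "$pid" 2>/dev/null || true
rm -f "$job_log"
```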


11. Advanced Performance Debugging

  • renice – change process priority

  • pidstat – profile specific PIDs

  • strace – syscall tracing

  • lsof – open file/socket tracking

  • dmesg – kernel ring buffer


12. Process Monitoring in Production

  • Monitor All States:

    • ps -eo pid,state,cmd

    • top -H – thread view

    • watch -n 1 'ps aux | grep <app>'

  • Real-World Case: Memory leak or CPU spike in production

    • top → strace → lsof → kill or renice



Let's break down and deeply explain the concepts of CPU time and the Linux scheduler using real-world analogies, command-line examples, and system internals. This will help you understand it at an interview level, especially for FAANG or senior DevOps/System Engineer roles.


🔧 Part 1: What is CPU Time?

✅ Definition:

CPU Time refers to the amount of time a CPU spends executing a specific process's instructions, excluding any time the process is idle or waiting for I/O (disk/network) operations.

🔄 Breakdown:

There are typically two types of CPU time:

  • User CPU Time: Time spent executing user-space code (your application).

  • System CPU Time: Time spent in the kernel (system calls, managing files, sockets, memory).

💡 Analogy:

Imagine the CPU as a chef in a kitchen.

  • Each process is a customer placing an order (program to run).

  • CPU time is the time the chef (CPU) actually spends cooking the dish (executing instructions) — not waiting for ingredients (I/O).


📌 Real Example:

$ time ls -l

Output:

real    0.003s
user    0.001s
sys     0.002s
  • real: Total elapsed wall-clock time (you watching).

  • user: Time spent executing in user space (0.001s).

  • sys: Time spent in kernel space (0.002s).

So CPU time = user + sys = 0.003s.
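To see wall-clock time and CPU time diverge, time a command that mostly waits. The sketch below assumes bash (TIMEFORMAT is a bash variable) and uses sleep as the stand-in workload.

```shell
# Wall-clock vs CPU time: sleeping for one second takes about one second
# of real time but almost no CPU time.
# TIMEFORMAT is a bash variable: %R = real, %U = user, %S = sys (seconds).
timing=$(LC_ALL=C bash -c 'TIMEFORMAT="%R %U %S"; time sleep 1' 2>&1)

real=$(printf '%s\n' "$timing" | awk '{ print $1 }')
cpu=$(printf '%s\n' "$timing" | awk '{ print $2 + $3 }')
echo "real=${real}s cpu(user+sys)=${cpu}s"
```

A large gap between real and user + sys means the process spent its time waiting (sleep, I/O, locks) rather than computing.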


⚙️ Part 2: Linux Scheduler (How CPU Time is Shared)

✅ What is the Scheduler?

The Linux scheduler is a kernel component responsible for deciding which process/thread runs on the CPU and for how long.

It manages CPU time sharing to ensure:

  • Efficiency

  • Fairness (all get CPU)

  • Responsiveness (interactive processes run fast)

  • Throughput (keep CPUs busy)


📦 Types of Scheduling Policies in Linux:

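Policy | Description
SCHED_OTHER (default) | Normal time-sharing policy. Handled by CFS (Completely Fair Scheduler), which balances CPU time across tasks by weight.
SCHED_FIFO | Real-time, first-in first-out. No time slice: a task runs until it blocks, yields, or is preempted by a higher-priority real-time task.
SCHED_RR | Real-time round-robin. Like SCHED_FIFO, but tasks of equal priority rotate on a time slice.
SCHED_DEADLINE | Deadline-based real-time policy: each task declares a runtime, deadline, and period.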

🧠 How the CFS Scheduler Works (Deep Dive)

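CFS (Completely Fair Scheduler) has been the default scheduler for normal (SCHED_OTHER) tasks since kernel 2.6.23. Kernels 6.6 and later replace its task-selection logic with EEVDF, but the vruntime and weight concepts described here still apply.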

🔁 Key Concept:

Each process is assigned a virtual runtime (vruntime). The process with the lowest vruntime gets the CPU.

📈 Idea:

  • Track how long a process has used the CPU.

  • If a process has had less CPU time than others, it is prioritized next.

  • Ensures fair CPU time proportionate to process weight (nice value).

🛠️ Tools:

You can view scheduling details using:

ps -eo pid,comm,ni,pri,cls,stat --sort=pid
  • ni – Nice value (lower = higher priority)

  • pri – Kernel priority

  • cls – Scheduling class (TS = SCHED_OTHER, the normal time-sharing policy handled by CFS; FF = SCHED_FIFO; RR = SCHED_RR)

  • stat – Process state
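The ni and cls columns can be checked against a job started with a known nice value. A rough sketch, assuming procps ps; the sleep job is only a placeholder.

```shell
base_ni=$(ps -o ni= -p $$ | tr -d ' ')   # niceness of this shell

nice -n 10 sleep 5 &                      # start a job 10 steps "nicer"
pid=$!
sleep 0.3                                 # let nice apply the value and exec sleep

child_ni=$(ps -o ni= -p "$pid" | tr -d ' ')
child_cls=$(ps -o cls= -p "$pid" | tr -d ' ')
echo "shell ni=$base_ni, child ni=$child_ni, class=$child_cls"

kill "$pid"
wait "$pid" 2>/dev/null || true
```

A higher nice value lowers the task's weight, so under contention it receives a smaller share of CPU time than tasks at nice 0.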


🧮 Example of Scheduler in Action

Imagine three processes:

  • Process A (Interactive shell)

  • Process B (Background database)

  • Process C (CPU-intensive encoding)

What scheduler does:

  • Process A: gets quick CPU bursts so shell is responsive.

  • Process B: gets occasional CPU as it's mostly waiting for I/O.

  • Process C: gets fair chunk, but not all CPU, to keep system responsive.


📟 Demo: Viewing CPU Time & Scheduling

🧪 Check CPU time per process:

ps -eo pid,etime,time,comm --sort=-time | head

Field | Meaning
etime | Elapsed real time since the process started
time | Total CPU time (user + system) consumed


👨‍💻 Strace Example:

To see how system calls contribute to CPU/system time:

strace -T -p <pid>
  • -T shows how much time each system call takes.


🧠 Interview-Level Questions

Question | Answer
What is CPU time? | Time the CPU spends executing user + system code of a process.
Difference between wall-clock and CPU time? | Wall-clock is total elapsed time; CPU time is the time the CPU actually spent executing your process.
What does the Linux scheduler do? | Decides which process/thread runs next on the CPU.
What is vruntime in CFS? | A weighted measure of how much CPU time a process has had; lower values run first.
What is the default scheduler in Linux? | CFS (Completely Fair Scheduler).

📚 Summary (In a Nutshell)

Concept | Description
CPU Time | Actual processing time used by the CPU for a process.
User Time | Time in user space (application code).
System Time | Time in kernel space (system calls).
Linux Scheduler | Kernel component that selects which process gets CPU time.
CFS | Ensures fair CPU sharing using vruntime.
Tools | top, ps, strace, htop, time, pidstat



105 Process states and Management

 Here’s a complete guide on Process Management – Process States, focused on FAANG-level interview preparation, with clear explanations, real-world examples, and 20 high-quality Q&A to test and reinforce your understanding.


🔧 1. Key Topics to Cover (Process Management & Process States)

Category | Topics
Process Basics | PID, PPID, UID, GID, nice, renice
Process Lifecycle | Creation, Execution, Termination
Process States (Linux/Unix) | Running, Waiting, Stopped, Zombie, Sleeping, Dead
State Transitions | How and when a process moves between states
Signals | kill, SIGKILL, SIGTERM, SIGSTOP, SIGCONT, etc.
Parent-Child Relationship | Orphan and Zombie processes
Foreground/Background Jobs | fg, bg, jobs, &, nohup
Process Scheduling | nice, renice, priority (niceness level), time slicing
Troubleshooting Tools | ps, top, htop, pstree, strace, lsof, kill, nice, renice, watch

🚦 2. Process States in Linux (FAANG-focused)

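State | Description
R (Running) | Executing on a CPU, or runnable and waiting in the run queue
S (Sleeping) | Interruptible sleep: waiting for an event such as I/O, a timer, or a signal
D (Uninterruptible sleep) | Waiting inside the kernel, usually on disk or NFS I/O; signals are not acted on until it wakes
T (Stopped) | Suspended by a job-control signal such as SIGSTOP or SIGTSTP
Z (Zombie) | Process has exited but its parent has not yet called wait()
X (Dead) | Terminated; should not be visible in ps output
I (Idle) | Idle kernel thread (reported by ps and top on recent kernels)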

📚 3. FAANG-Ready Interview Questions + Answers

✅ Basic to Intermediate


Q1. What are the different states a process can be in? Explain.

Answer:

  • Running (R): Actively on CPU or ready to run.

  • Sleeping (S): Waiting for I/O; can be interrupted by signals.

  • Uninterruptible Sleep (D): Waiting on resources like disk; cannot be interrupted.

  • Stopped (T): Halted by a signal (e.g., SIGSTOP).

  • Zombie (Z): Terminated but not reaped by parent.

  • Dead (X): Process terminated and removed from system.
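The S and T states are easy to reproduce on a throwaway process. A minimal sketch, assuming procps ps; state_of is a helper defined only for this demo.

```shell
state_of() { ps -o state= -p "$1" | tr -d ' '; }

sleep 30 &
pid=$!
sleep 0.3
state_idle=$(state_of "$pid")        # blocked on a timer: S

kill -s STOP "$pid"; sleep 0.3
state_stopped=$(state_of "$pid")     # suspended by SIGSTOP: T

kill -s CONT "$pid"; sleep 0.3
state_resumed=$(state_of "$pid")     # resumed and sleeping again: S

kill "$pid"; wait "$pid" 2>/dev/null || true
echo "$state_idle -> $state_stopped -> $state_resumed"
```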


Q2. How can you identify zombie processes on Linux?

Answer:

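# Match on the STAT column; a plain grep Z also matches any line that contains a capital Z
ps aux | awk '$8 ~ /^Z/'

Or:

ps -eo pid,ppid,state,cmd | awk '$3 == "Z"'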

Zombie processes show as state Z and can only be cleared if the parent process is terminated or calls wait().
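To practise spotting a zombie, one can be created on purpose. This sketch assumes bash and procps ps; it relies on exec to leave a child behind with a parent (sleep) that never calls wait().

```shell
# The inner bash starts a short-lived child, then replaces itself with
# "sleep 5" via exec. sleep never calls wait(), so when the child exits
# it stays in the process table as a zombie.
bash -c 'sleep 0.2 & exec sleep 5' &
parent=$!
sleep 1

# Match on the STAT column (Z, Z+, ...) and on our demo parent only.
zombie_pid=$(ps -eo pid=,ppid=,stat= | awk -v p="$parent" '$2 == p && $3 ~ /^Z/ { print $1 }')
echo "zombie PID: ${zombie_pid:-none} (parent $parent)"

# Ending the parent hands the zombie to init (or a subreaper), which
# is expected to reap it.
kill "$parent"
wait "$parent" 2>/dev/null || true
```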


Q3. What is the difference between zombie and orphan process?

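Zombie | Orphan
Child has terminated; parent is alive but has not called wait() | Child is alive; parent has terminated
Still occupies a PID and a process-table entry | Reparented to init (PID 1) or the nearest subreaper
Cannot be killed (it is already dead) | Runs normally and is reaped by its new parent when it exits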

Q4. How does a process move from running to waiting (sleeping)?

Answer:
A process transitions from running to sleeping when it requests I/O or is waiting for a resource (disk, network, user input). The scheduler then switches to another process until I/O is ready.


Q5. Explain the lifecycle of a Linux process.

Answer:

  1. Created – via fork() (clone() under the hood); exec() then replaces the program image inside the new process

  2. Ready – added to scheduler queue

  3. Running – executes instructions

  4. Waiting – for I/O or other resource

  5. Terminated – exits using exit()

  6. Zombie – if not reaped by parent


✅ Advanced FAANG-Level


Q6. How do you handle a zombie process in a production server?

Answer:

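  1. Identify with ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'

  2. Check the PPID to find the parent that is not reaping its children.

  3. Fix or restart the parent so that it calls wait(). As a last resort, kill the parent so init (PID 1) adopts and reaps the zombie.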


Q7. How to find which syscall a process is waiting on?

Answer:
Use strace:

strace -p <pid>

This shows the system calls being waited on.


Q8. What does the D state (Uninterruptible sleep) mean and why is it dangerous?

Answer:
It indicates the process is waiting for I/O and cannot be killed or interrupted (even with kill -9). This can be a sign of hardware issues, NFS hang, or I/O lock.


Q9. How does the Linux kernel schedule processes?

Answer:

  • Linux uses the CFS (Completely Fair Scheduler).

  • It keeps runnable tasks in a red-black tree ordered by vruntime and picks the leftmost (lowest vruntime) task next.

  • Each process gets a fair share of time based on niceness and priority.


Q10. What happens when you run kill -9 <pid> on a zombie?

Answer:
Nothing. Zombie processes are already terminated. They can’t be killed. You need to terminate the parent process so the kernel can clean up the zombie.


✅ Real-World + Debugging


Q11. How do you debug high CPU usage from a process?

Answer:

top or htop
pidstat -p <pid> 1
strace -p <pid>
lsof -p <pid>

These tools help track what the process is doing, system calls, and open files.


Q12. How do you change the priority of a running process?

Answer:

renice -n 10 -p <pid>

Niceness range: -20 (highest priority) to 19 (lowest priority).
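A quick way to confirm a renice took effect is to read the NI column before and after. A sketch assuming util-linux renice and procps ps; raising niceness to 19 needs no special privileges, whereas lowering it usually requires root.

```shell
sleep 30 &
pid=$!
sleep 0.2

ni_before=$(ps -o ni= -p "$pid" | tr -d ' ')
renice -n 19 -p "$pid" > /dev/null     # lowest priority; needs no privileges
ni_after=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "niceness: $ni_before -> $ni_after"

kill "$pid"; wait "$pid" 2>/dev/null || true
```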


Q13. What’s the use of the pstree command?

Answer:
Shows process hierarchy, useful for visualizing parent-child relationships:

pstree -p

Q14. What signal is sent by default using kill command?

Answer:
SIGTERM (15) – allows process to gracefully shut down.


Q15. When would you use nohup?

Answer:
To run a process after logout/session close:

nohup ./my_script.sh &

✅ Behavioral + Situational


Q16. You found a process in D state for hours. What do you do?

Answer:

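  • Read /proc/<pid>/wchan and /proc/<pid>/stack (as root) to see where it is blocked in the kernel, and check dmesg for hung-task or I/O errors. strace rarely helps here because the process is not returning to user space.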

  • Check if NFS, disk, or DB is hung.

  • Consider restarting the service or rebooting if stuck.


Q17. A user reports background jobs disappearing after logout. Why?

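Answer:
When the terminal session ends, the shell receives SIGHUP and forwards it to its jobs; the default action for SIGHUP is termination.

  • Use nohup or disown to detach processes from the terminal session.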

nohup script.sh &
disown

Q18. How to monitor all states of processes?

Answer:

ps -eo pid,user,state,cmd | sort

Also use top, htop, watch.


Q19. What causes a process to become a zombie?

Answer:
Child process exits, but parent hasn’t called wait() to read its exit status. Kernel keeps it to return info when parent requests.


Q20. How can you force the cleanup of multiple zombie processes?

Answer:
Find the parent with:

ps -eo ppid,pid,state,cmd | awk '$3 == "Z"'

Then kill or restart the parent to allow init to reap the child processes.


📌 Summary Cheat Sheet

Tool | Purpose
top / htop | Real-time resource usage
ps | List process info
strace | Trace system calls
lsof | List open files/sockets
kill / pkill / killall | Send signals
nice / renice | Adjust priority
jobs / bg / fg | Background job control
pstree | Visualize process hierarchy





104 Basic Commands

The commands top, ps, and kill, together with CPU inspection tools such as lscpu and mpstat, are essential in Linux for monitoring and managing processes and system resources. Let’s go over each command in detail, including:

  • Purpose

  • Syntax

  • Common use-cases

  • Example output

  • How to analyze the output


🔹 1. top — Real-time System Monitoring

📌 Purpose:

Displays real-time information about system processes, CPU, memory usage, and load average.

📌 Syntax:

top

📌 Example Output (Partial):

top - 08:45:26 up  2:34,  2 users,  load average: 0.15, 0.20, 0.25
Tasks: 138 total,   1 running, 137 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.0 us,  1.0 sy,  0.0 ni, 95.0 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7850.2 total,  2200.5 free,  3400.6 used,  2249.1 buff/cache
PID  USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
1324 root      20   0  123456  45678   1234 R  23.4  0.6   0:10.53 chrome

📌 How to Analyze:

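  • Load average: First line shows the 1-, 5-, and 15-minute system load. Rule of thumb: a load that stays above the number of CPUs means tasks are queuing. On Linux the figure also counts tasks in uninterruptible (D) sleep, so check %wa before blaming the CPU.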

  • %CPU: High %us (user) means CPU is working on your tasks, %sy (system) is kernel, %id is idle time.

  • %MEM: Memory usage per process.

  • PID/COMMAND: Helps identify the exact process consuming resources.
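The %Cpu(s) line in top is derived from counters in /proc/stat. The sketch below recomputes a simplified version over a one-second window; it assumes a Linux /proc and reports only us, sy, id, and wa.

```shell
snap1=$(grep '^cpu ' /proc/stat)
sleep 1
snap2=$(grep '^cpu ' /proc/stat)

# Columns after the "cpu" label: user nice system idle iowait irq softirq steal
cpu_line=$(printf '%s\n%s\n' "$snap1" "$snap2" | awk '
  NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i }
  NR == 2 {
    total = 0
    for (i = 2; i <= 9; i++) { delta[i] = $i - first[i]; total += delta[i] }
    if (total <= 0) total = 1          # avoid dividing by zero on an idle tick
    printf "us=%.1f sy=%.1f id=%.1f wa=%.1f\n",
      100 * delta[2] / total, 100 * delta[4] / total,
      100 * delta[5] / total, 100 * delta[6] / total
  }')
echo "$cpu_line"
```

Knowing where the numbers come from helps when top is unavailable, for example inside a minimal container image.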


🔹 2. ps — Snapshot of Current Processes

📌 Purpose:

Displays a snapshot of current processes (unlike top which is real-time).

📌 Syntax:

ps aux       # All processes
ps -ef       # Also shows all, but with different format

📌 Example Output:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  8960  2340 ?        Ss   08:00   0:01 /sbin/init
prakash    1324  5.5  1.2 123456 45678 ?       Sl   08:45   0:10 chrome

📌 Key Fields to Analyze:

  • PID: Process ID.

  • %CPU, %MEM: CPU and memory usage.

  • VSZ/RSS: Virtual and resident memory size.

  • STAT: Process state (R running, S sleeping, Z zombie).

  • COMMAND: The command/process name.


🔹 3. kill — Send Signal to a Process

📌 Purpose:

Terminates (or sends other signals) to a process using its PID.

📌 Syntax:

kill <PID>               # Send SIGTERM (default)
kill -9 <PID>            # Send SIGKILL (force kill)
kill -l                  # List all signals

📌 Example:

ps aux | grep chrome
# prakash   1324  5.5  1.2 ... chrome
kill -9 1324

📌 How to Analyze:

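  • If a process is unresponsive, send SIGTERM first (kill <PID>) and give it a few seconds; use kill -9 only if it does not exit, because SIGKILL skips all cleanup handlers.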

  • Use ps or top to find the PID of the problem process before killing.


🔹 4. cpu — (Note: There's no native cpu command in Linux)

📌 Possible Meanings:

  • There is no cpu command; the usual tools are:

    • Checking CPU usage via top, htop, or mpstat.

    • lscpu — to get CPU architecture info.

✅ Example 1: View CPU architecture

lscpu

Output:

Architecture:        x86_64
CPU(s):              8
Model name:          Intel(R) Core(TM) i7-8565U
CPU MHz:             1800.000

✅ Example 2: CPU usage

mpstat -P ALL 1

Output:

11:28:01 AM  CPU    %usr   %sys   %idle
11:28:02 AM  all     5.00   1.00   94.00

🔍 Summary Comparison Table

Command | Use Case | Output Highlights | When to Use
top | Real-time process monitoring | Load avg, CPU %, MEM %, PID, COMMAND | Live resource troubleshooting
ps | Snapshot of process state | PID, %CPU, %MEM, STAT, COMMAND | Get PID or process list
kill | Send signals (terminate) | N/A (command-line tool) | Stop/kill hung or rogue processes
lscpu | View CPU architecture info | Model name, cores, threads | Debug CPU availability or model
mpstat | CPU usage per core over time | %usr, %sys, %idle | Performance bottleneck diagnosis

📌 Real-world Interview Tip (FAANG-ready):

Be prepared to:

  • Find high CPU processes using top or ps.

  • Stop a runaway process with kill (SIGTERM first, then kill -9), and clear a zombie by making its parent call wait() or by restarting the parent.

  • Monitor memory leaks or CPU bottlenecks.

  • Explain CPU load and how to scale (e.g., vertical scaling, multithreading).



🔸 Category: General Process Monitoring

1. Q: How do you find the top 5 memory-consuming processes on a Linux system?

A:

ps aux --sort=-%mem | head -n 6
  • --sort=-%mem: Sorts descending by memory usage.

  • head -n 6: First line is header.


2. Q: How do you monitor CPU usage per core in real-time?

A:

mpstat -P ALL 1
  • -P ALL: Show stats for all cores.

  • 1: Refresh every second.


3. Q: A process is stuck in zombie state. Can you kill it?

A: No. Zombie processes are already dead; they just haven’t been cleaned up. The parent process must wait() to release them. You can kill the parent process to clean it up.


4. Q: How do you identify zombie processes?

A:

ps aux | awk '$8 ~ /^Z/ { print $2, $11 }'

OR

ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'

5. Q: How do you kill all Java processes running on the system?

A:

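pkill -x java                    # exact process name, default SIGTERM

Or, to review the matches before sending a signal:

pgrep -a -x java                 # list PID and command line of each match
pgrep -x java | xargs -r kill    # add -9 only for processes that ignore SIGTERM

Note that pkill -f java matches any command line that merely contains the string java.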

🔸 Category: top Analysis

6. Q: What does the load average in top mean?

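A: It is the average number of tasks that are running, runnable (waiting for a CPU), or in uninterruptible sleep, over three windows:

  • First = 1-minute average

  • Second = 5-minute average

  • Third = 15-minute average

If it stays above the number of cores, tasks are queuing and the system is likely overloaded.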
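The rule of thumb can be scripted directly from /proc/loadavg. A minimal sketch, assuming Linux and the coreutils nproc command; the verdict strings are arbitrary.

```shell
read -r load1 load5 load15 _ < /proc/loadavg
cores=$(nproc)

# awk handles the floating-point comparison that the shell cannot.
verdict=$(awk -v l="$load1" -v c="$cores" \
  'BEGIN { if (l > c) print "above core count"; else print "within core count" }')
echo "load averages: $load1 $load5 $load15 on $cores CPUs -> $verdict"
```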


7. Q: In top, what does %wa mean in CPU stats?

A: %wa is the share of time the CPU was idle while at least one task was waiting on block I/O. High %wa usually points to a disk or NFS bottleneck; ordinary network waits are not counted.


8. Q: How do you sort by memory in top?

A: Inside top, press Shift + M.


9. Q: You see a process consuming 100% CPU in top. How do you find out what it's doing?

A:

  1. Get PID from top

  2. Use strace -p <pid> to trace syscalls.

  3. Use lsof -p <pid> to inspect open files/sockets.


10. Q: How can you monitor top 10 CPU processes in real-time with a script?

A:

watch -n 2 "ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 11"

🔸 Category: ps and kill

11. Q: What's the difference between kill and kill -9?

A:

  • kill sends SIGTERM (15): Graceful shutdown.

  • kill -9 sends SIGKILL (9): Force kill, can't be trapped or ignored.


12. Q: How do you list all running processes by a specific user?

A:

ps -u <username>

13. Q: How do you kill all child processes of a PID, or a whole process group?

A:

pkill -P <parent_pid>      # direct children of the parent only
kill -- -<pgid>            # every process in process group <pgid>

14. Q: How do you determine parent-child relationships between processes?

A:

ps -eo pid,ppid,cmd
  • ppid: Parent PID

  • Use pstree for visual tree


15. Q: How do you identify processes using the most open files?

A:

lsof | awk '{ print $2 }' | sort | uniq -c | sort -nr | head

🔸 Category: CPU & System Info

16. Q: How do you get CPU core count and thread info on Linux?

A:

lscpu
  • CPU(s): Logical cores

  • Core(s) per socket: Physical cores per CPU

  • Thread(s) per core: Hyper-threading


17. Q: How do you monitor CPU utilization over time and graph trends?

A:

  • Use sar from sysstat:

sar -u 1 10
  • Or use tools like Grafana + Prometheus.


18. Q: How do you detect a CPU bottleneck vs memory bottleneck?

A:

  • High %us or %sy and low %id in top: CPU bottleneck.

  • High %wa: I/O bottleneck.

  • High memory usage with sustained swapping (non-zero si/so columns in vmstat): Memory bottleneck.


19. Q: What is the difference between VSZ and RSS in ps?

A:

  • VSZ: Virtual memory size (includes code, data, shared libs).

  • RSS: Resident Set Size (actual physical memory in use).
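Both figures can be read for any PID with ps. A short sketch using the current shell as the subject, assuming procps ps (values are in KiB).

```shell
# VSZ and RSS of the current shell, in KiB, as reported by ps.
sizes=$(ps -o vsz=,rss= -p $$)
vsz_kib=$(printf '%s\n' "$sizes" | awk '{ print $1 }')
rss_kib=$(printf '%s\n' "$sizes" | awk '{ print $2 }')
echo "VSZ=${vsz_kib} KiB, RSS=${rss_kib} KiB"
```

VSZ is normally much larger than RSS because it counts mapped but untouched pages; a steadily growing RSS is the more useful signal when hunting a memory leak.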


20. Q: How do you trace what files or ports a process is using?

A:

lsof -p <pid>
  • Shows open files, sockets, network connections, etc.


✅ Bonus Tip for Interviews:

  • Always explain why a tool is used.

  • Tie back to real-world issues (e.g., server slowness, memory leaks, runaway processes).

  • Use watch + ps/top/lsof/netstat combinations to show dynamic diagnosis.

  • Know when to use strace, perf, or iotop for deeper profiling.



If a process is consuming 100% CPU in top, you can analyze what it is doing using a combination of strace, lsof, and other tools. Here's a complete step-by-step example to guide you through this investigation:


🔍 Scenario:

You observe high CPU usage in top, and you want to investigate what that process is doing.


✅ Step-by-Step Analysis:

Step 1: Identify the Process in top

top

Look for the process using the most CPU (e.g., 100%).

Sample output:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
12345 user1     20   0   10240    920    860 R 100.0  0.0   1:23.45 python

→ The PID is 12345, and it's a Python process.


Step 2: Use strace to Trace System Calls

Attach strace to the running process to see what system calls it's making:

sudo strace -p 12345

Sample output:

clock_gettime(CLOCK_MONOTONIC, {tv_sec=12345, tv_nsec=678900000}) = 0
read(3, "", 4096)                         = 0
epoll_wait(4, [], 128, 0)                = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=12345, tv_nsec=679000000}) = 0

This shows the process is making repeated system calls—maybe polling or in a busy loop.

You can also log the output to a file and analyze later:

sudo strace -tt -T -p 12345 -o strace.log

Step 3: Use lsof to See Open Files/Sockets

This tells you which files, libraries, sockets, etc., the process is using.

sudo lsof -p 12345

Sample output:

COMMAND   PID   USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
python  12345  user1  cwd    DIR  8,1     4096      123456 /home/user1/myapp
python  12345  user1  txt    REG  8,1   123456      654321 /usr/bin/python3.8
python  12345  user1  mem    REG  8,1    45678      987654 /lib/x86_64-linux-gnu/libc.so.6
python  12345  user1    3u   REG  8,1   345678      456789 /home/user1/myapp/log.txt
python  12345  user1    4u  IPv4 23456      0t0        TCP 127.0.0.1:45678->127.0.0.1:3306 (ESTABLISHED)

→ It’s connected to a MySQL server on port 3306 — maybe it’s stuck querying DB.


Step 4: Optional - Check the Stack Trace

Use pstack (if installed) to print the stack trace of the process:

sudo pstack 12345

Sample output:

#0  0x00007f3f5dbae430 in __libc_recv () from /lib64/libc.so.6
#1  0x000055b6d2e5a891 in socket_read ()
#2  0x000055b6d2e4ff3a in main_loop ()

This helps in debugging loops or tight recursion inside the application.


🧠 Interview-ready Summary

Q: You see a process using 100% CPU in top. How do you find out what it's doing?

A:

  1. Get the PID from top.

  2. Use strace -p <pid> to trace system calls in real-time.

  3. Use lsof -p <pid> to check files, sockets, and shared libs used.

  4. (Optional) Use pstack <pid> for stack trace.

  5. Analyze logs or application logic based on what these tools show.

This helps you understand whether it’s stuck in a loop, polling, doing excessive computation, or waiting on I/O.
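Before attaching strace, it can help to check whether the CPU time is being burned in user space or in the kernel. The sketch below reads utime and stime from /proc/<pid>/stat for a deliberately busy demo process; it assumes Linux and bash, and the busy loop is a stand-in for the real culprit.

```shell
# Demo workload: a pure user-space busy loop.
bash -c 'while :; do :; done' &
pid=$!
sleep 1

# /proc/<pid>/stat: drop everything up to the ") " that closes the
# command name, so the remaining fields start at field 3 (state).
stat_line=$(cat "/proc/$pid/stat")
rest=${stat_line##*) }
state=$(printf '%s\n' "$rest" | awk '{ print $1 }')
utime=$(printf '%s\n' "$rest" | awk '{ print $12 }')   # field 14: user ticks
stime=$(printf '%s\n' "$rest" | awk '{ print $13 }')   # field 15: kernel ticks
echo "state=$state utime=$utime stime=$stime"

kill "$pid"; wait "$pid" 2>/dev/null || true
```

If utime dominates, the process is computing in user space and strace will show little, so a profiler such as perf is the better next step. If stime dominates, the process is syscall-heavy and strace output is likely to be informative.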


Here's a clear and detailed explanation of kill vs pkill, including differences, use cases, and real examples, perfect for interview prep:


🔪 kill vs pkill in Linux

Feature | kill | pkill
Targets | Process by PID | Process by name or pattern
Signal | Default is SIGTERM (15) | Default is SIGTERM (15)
Flexibility | Needs the PID | No PID lookup needed
User Scope | Affects any PID (if permitted) | Can restrict by user, session, terminal

kill — Send signal to PID

📌 Syntax:

kill [-SIGNAL] PID

✅ Example:

top

Find a process with PID 12345.

Kill it:

kill 12345

Send a specific signal (e.g., SIGKILL = 9):

kill -9 12345

📌 You must know the PID beforehand.


pkill — Send signal to process name

📌 Syntax:

pkill [-SIGNAL] pattern

✅ Example:

To kill all processes with the name python:

pkill python

To force kill (SIGKILL) all nginx processes:

pkill -9 nginx

Restrict to processes run by a specific user:

pkill -u prakash java

Send signal only to processes matching name and terminal:

pkill -t pts/1 bash

📌 You don’t need to look up the PID manually.


🧠 Real-World Use Case Comparison

Task | Command
Kill process with known PID | kill 9876
Kill all Java processes | pkill java
Gracefully stop nginx (default SIGTERM) | pkill nginx
Force kill Python script by PID | kill -9 13579
Kill all processes for a user | pkill -u prakash
Kill process by exact match (not substring) | pkill -x nginx

🛑 Common Signals

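Signal Name | Number | Purpose
SIGTERM | 15 | Graceful termination
SIGKILL | 9 | Forceful termination (cannot be caught or ignored)
SIGHUP | 1 | Terminal hangup; many daemons reload their configuration on it
SIGSTOP | 19 | Pause process (cannot be caught or ignored)
SIGCONT | 18 | Resume stopped process

The numbers for SIGSTOP and SIGCONT are those used on x86 and ARM Linux; kill -l shows the values on your system.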

🧠 Interview-ready Summary

Q: What's the difference between kill and pkill?

A:

  • kill sends signals to a specific process ID (PID).

  • pkill matches process names or patterns, making it easier to target multiple or unknown PIDs.

Example:

  • kill -9 1234 kills PID 1234.

  • pkill -9 nginx kills all nginx processes.
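Both styles can be tried safely on throwaway processes. This sketch assumes procps pkill and GNU sleep; the odd sleep durations exist only to give each worker a unique command line.

```shell
sleep 20.25 &             # worker A: will be signalled by PID
pid_a=$!
sleep 20.75 &             # worker B: will be signalled by pattern
pid_b=$!
sleep 0.3

kill "$pid_a"                       # kill: exact PID, default SIGTERM
pkill -f -x 'sleep 20\.75'          # pkill: regex on the full command line,
                                    # -x requires the whole line to match

if wait "$pid_a"; then status_a=0; else status_a=$?; fi
if wait "$pid_b"; then status_b=0; else status_b=$?; fi
echo "worker A: $status_a, worker B: $status_b"   # 143 = 128 + SIGTERM(15)
```

Combining -f with -x keeps the pattern from matching unrelated processes whose command line merely contains the same text.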

Here’s a detailed explanation of the pstree command with an example output — useful for Linux interviews and real-time debugging.


🌳 What is pstree?

pstree shows processes in a tree format, visualizing parent-child relationships (i.e., which process spawned which).


✅ Basic Syntax:

pstree

📌 Example Output:

systemd─┬─NetworkManager───2*[{NetworkManager}]
        ├─sshd───sshd───bash───pstree
        ├─cron
        ├─dbus-daemon
        ├─gnome-shell───2*[{gnome-shell}]
        ├─firefox───9*[{firefox}]
        ├─pulseaudio───{pulseaudio}
        └─cupsd

🔍 How to Read It:

  • systemd is the root (PID 1).

  • sshd started a child sshd session, which launched bash, which launched pstree.

  • firefox has 9 threads shown as {firefox}.

  • 2*[{NetworkManager}] means two threads under NetworkManager.


🔧 Useful Options:

Command | Description
pstree -p | Show PIDs
pstree -u | Show usernames
pstree -a | Show command line arguments
pstree -h | Highlight the current process and its ancestors
pstree <username> | Show processes owned by a specific user
pstree <PID> | Show subtree for a specific process

📌 Example with PID and arguments:

pstree -p -a

Output:

systemd(1)─┬─cron(585)
           ├─sshd(980)───sshd(1032)───bash(1033)───pstree(1080) -p -a
           ├─nginx(850) -g daemon off;───nginx(851)

🧠 Interview Tip

Q: Why use pstree instead of ps?

A: pstree visually shows parent-child process hierarchy, helping you trace how a process was spawned (e.g., sshd → bash → python script) — useful in debugging background services, daemons, or containerized environments.
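When pstree is not installed, the same ancestry can be reconstructed by following PPID links with ps. A rough sketch, assuming procps ps; it walks from the current shell up towards PID 1.

```shell
pid=$$
chain=""
depth=0
# Follow PPID links from this shell towards PID 1 (capped at 32 hops).
while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ "$depth" -lt 32 ]; do
  name=$(ps -o comm= -p "$pid") || break
  chain="${name}(${pid})${chain:+ -> $chain}"
  pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
  depth=$((depth + 1))
done
echo "$chain"
```

Inside a container the chain usually stops at the container's own PID 1 rather than the host's systemd.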



Here's a code-block-style diagram to help you visually understand bottlenecks in a Linux system, categorized by key system components like CPU, memory, disk I/O, network, and processes. This is extremely useful for both troubleshooting and interviews:


📊 Linux System Bottlenecks – Block Diagram (Text-Based)

+-----------------+        +----------------+       +------------------+
|     CPU         |<-----> |    Processes   |<----->|      Memory      |
+-----------------+        +----------------+       +------------------+
      ^  |                          ^   |                      |
      |  |                          |   |                      v
      |  |                    +-----+   +------+        +-------------+
      |  |                    | High CPU usage |        |   Swapping  |
      |  +------------------->| Infinite loops |        |   Thrashing |
      |                       | Busy-waiting   |        +-------------+
      |
      |          +----------------+    
      |          | Interrupts     |     
      +----------| Context Switch |    
                 +----------------+   

        |
        v
+----------------+
|     Load Avg   |
|  (uptime, top) |
+----------------+

+-------------------+        +-------------------+
|     Disk I/O      |<-----> |     Processes     |
+-------------------+        +-------------------+
        |                              ^
        |                              |
        v                              |
+-------------------+        +--------------------------+
|  High I/O Wait    |<-------|  Log-heavy apps, DBs     |
|  (iostat, iotop)  |        |  Misconfigured writes    |
+-------------------+        +--------------------------+
+-------------------+        +-------------------+
|    Network I/O    |<-----> |     Processes     |
+-------------------+        +-------------------+
        |                              ^
        |                              |
        v                              |
+---------------------+       +-------------------------+
| High RX/TX, Drops   |<------| Chatty apps, poor MTU   |
| (iftop, netstat)    |       | Packet loss, congestion |
+---------------------+       +-------------------------+

🔍 Mapping Bottlenecks to Tools

Bottleneck Area | Symptoms | Commands to Diagnose
CPU | High load, 100% usage | top, htop, mpstat, pidstat
Memory | Swapping, OOM kills | free -m, vmstat, dmesg, top
Disk I/O | High iowait, slow FS access | iotop, iostat, dstat, df -h, lsof
Network | Packet drops, latency spikes | iftop, nethogs, ss, netstat, ping
Process | Zombie/stuck/high CPU tasks | ps aux, top, pstree, strace, lsof
System Load | Load avg spikes | uptime, top, w, sar, vmstat

🧠 Interview Tip

Q: How would you diagnose a slow Linux system?

A:
Start by checking system-wide metrics:

  • top/htop for CPU, memory, process bottlenecks.

  • iotop or iostat for disk I/O wait.

  • iftop or netstat for network delays.
    Then drill down using:

  • strace, lsof, vmstat, and dmesg to isolate the culprit.




103 Sudo vs Su and Principle of least privileges

1. Difference Between su and sudo

Basic Definition

  • su (Substitute User or Switch User):
    Allows you to switch to another user account, typically the root user. You’ll be prompted to enter the target user’s password.

  • sudo (Superuser Do):
    Allows you to run a single command with elevated privileges, but you use your own password (not root's). Access is controlled through the /etc/sudoers file.


Key Differences

Feature | su | sudo
Purpose | Switch to another user session | Run a command with elevated privileges
Password Needed | Target user's password (e.g. root) | Your own password
Security | Less secure (full shell access) | More secure (limited command access)
User Traceability | Only the switch is logged, not the commands run afterwards | Each command is logged (/var/log/auth.log on Debian/Ubuntu, /var/log/secure on RHEL-family)
Configuration | No per-command configuration | Highly configurable via /etc/sudoers
Session Scope | Creates a new shell | Executes a single command

Real-World Usage

  • Use su when:

    • You need to maintain a full root shell session.

    • You're working in an environment where sudo isn’t configured.

  • Use sudo when:

    • You want to minimize risk by limiting access to specific commands.

    • You're working in a multi-user environment and need an audit trail.

    • You want to adhere to best security practices.


Example

# Using su to become root
su -
# (enter root password)
apt update

# Using sudo to run a command as root
sudo apt update
# (enter your password)

2. Principle of Least Privilege (PoLP)

What It Means

The Principle of Least Privilege is a security best practice stating that users and processes should be granted only the minimum level of access needed to perform their tasks — no more, no less.


💡 Why It’s Important

  • Reduces attack surface: Limits what attackers can do if they gain access.

  • Minimizes human error: Prevents accidental deletion or system changes.

  • Improves auditability: Easier to track and understand permission usage.

  • Supports compliance: Many regulations (e.g., HIPAA, GDPR) require it.


How It's Applied in Linux

  1. User Permissions
    Users are placed into groups and given access to only necessary files or directories.

  2. Sudo Configuration
    The /etc/sudoers file is used to allow users to run specific commands as root, without giving full root access.

    Example:

    john ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart apache2
    

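    This means john can restart Apache with sudo without being asked for a password (NOPASSWD). If this is his only sudoers entry, he cannot run anything else through sudo. Edit the file with visudo so that syntax errors are caught before they are saved.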

  3. File Ownership and Permissions
    The chmod, chown, and chgrp commands are used to tightly control file access.

  4. Service Accounts
    Processes like web servers or databases run under dedicated users (e.g., www-data) that only have access to their specific directories.
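
The sudoers rule from step 2 can be tried out safely before it goes anywhere near /etc/sudoers. The sketch below writes it to a scratch file and, if visudo is installed, asks it to check the syntax only. The scratch path comes from mktemp, and the run-as list is narrowed to (root) here as an assumption, since the rule only needs root; john and the systemctl path are taken from the example above.

```shell
#!/bin/sh
# Write the rule to a scratch file first, then validate it.
rule_file=$(mktemp)
cat > "$rule_file" <<'EOF'
# john may restart Apache as root, without a password, and nothing else
john ALL=(root) NOPASSWD: /usr/bin/systemctl restart apache2
EOF

# visudo -c -f FILE checks syntax without installing anything.
if command -v visudo >/dev/null 2>&1; then
  visudo -c -f "$rule_file"
else
  echo "visudo not found; skipping syntax check"
fi
```

Once the check passes, the usual home for such a rule is a drop-in file under /etc/sudoers.d/, edited with visudo -f.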


 Interview-Worthy Talking Points

  • “Using sudo instead of su enforces least privilege by giving users temporary access to specific tasks.”

  • “We implement PoLP by reviewing user permissions regularly and removing unnecessary sudo privileges.”

  • “I once audited a system where developers had full root access — we moved them to role-based sudo rules, which enhanced security significantly.”



101 Linux ACL


What is ACL (Access Control List) in Linux?

Definition

ACL is a fine-grained permission system in Linux that allows you to grant different permissions to multiple users and groups on a single file or directory — something that traditional Unix permissions can't do.


Why Use ACL?

Traditional Linux file permissions only allow you to define:

  • One owner (user)

  • One group

  • Permissions for others

So you can only have 3 sets of permissions: user, group, others

But what if:

  • You want to give read access to another user

  • Or let a different group have write access

  • Without changing ownership or primary group

That's where ACL comes in.


ACL vs Traditional Permissions

Feature | Traditional Permissions | ACL
Number of users/groups | 1 user, 1 group | Multiple users and groups
Granularity | Limited (r/w/x) | Fine-grained per user/group
Inheritance | No | Yes (default ACL on directories)

Enabling and Using ACL

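Check if filesystem supports ACL

Most modern Linux distros with ext4, xfs, or btrfs support ACLs and usually enable them by default. In that case acl does not appear in the mount output, so an empty result here does not mean ACLs are off:

mount | grep acl

If ACLs really are disabled (for example on an older ext3/ext4 setup), remount with:

mount -o remount,acl /mount/point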

ACL Commands

1. Check current ACLs

getfacl filename

Example:

getfacl report.txt

Output:

# file: report.txt
# owner: prakash
# group: devs
user::rw-
user:john:r--
group::r--
mask::r--
other::---

2. Add ACL for specific user

setfacl -m u:john:r file.txt

John gets read-only access, even if he is not the owner or in the group.


3. Add ACL for a group

setfacl -m g:designers:rw file.txt

Group 'designers' can read/write file.txt


4. Remove ACL entry

setfacl -x u:john file.txt

5. Set Default ACL on directory (for inheritance)

setfacl -d -m u:john:rw /project/folder

New files and subdirectories created inside /project/folder inherit this entry, so john gets read/write access to them automatically. Files that already exist are not changed.


6. Remove all ACLs

setfacl -b file.txt

Example Scenario

You’re building a CI/CD pipeline and want:

  • Dev team to have read/write on app.conf

  • Ops team to have read-only access

  • Jenkins user to have write-only access

Using ACL:

setfacl -m g:dev:rw app.conf
setfacl -m g:ops:r-- app.conf
setfacl -m u:jenkins:-w- app.conf

Output Explanation (getfacl)

# file: file.txt
# owner: prakash
# group: devs
user::rw-
user:john:r--
group::r--
group:designers:rw-
mask::rw-
other::---

The mask sets the maximum permissions for named users, named groups, and the owning group. It does not limit the file owner or others.
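
To make the mask rule concrete, the sketch below computes an effective permission by keeping only the bits present in both the ACL entry and the mask. The function name effective_perm is hypothetical and the logic is a plain-shell illustration of the rule, not a call into the ACL tools; getfacl itself flags a reduced entry with an #effective: comment.

```shell
#!/bin/sh
# effective_perm ENTRY MASK -> prints the rwx triplet that survives the mask.
effective_perm() {
  entry=$1
  mask=$2
  out=""
  for i in 1 2 3; do
    e=$(printf '%s' "$entry" | cut -c"$i")
    m=$(printf '%s' "$mask" | cut -c"$i")
    # A permission is effective only if the entry AND the mask both grant it.
    if [ "$e" != "-" ] && [ "$m" != "-" ]; then
      out="$out$e"
    else
      out="$out-"
    fi
  done
  printf '%s\n' "$out"
}

effective_perm rw- r--   # group:designers:rw- under mask::r--
effective_perm rw- rw-   # same entry under mask::rw-
```

With mask::rw- as in the output above, the designers entry keeps its full rw-; tightening the mask to r-- would leave it with read-only access.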


Interview-Ready Q and A

Q1: Why use ACL when traditional permissions exist?

A: Traditional permissions only allow one user and one group. ACL allows multiple users and groups to have different access levels on the same file or directory — useful in collaborative or enterprise environments.


Q2: What does 'setfacl -m u:john:rw file.txt' do?

A: It gives read/write access to user john on file.txt without changing ownership or default permissions.


Q3: What does 'mask::r--' mean in 'getfacl'?

A: The mask sets the upper limit for named users, named groups, and the owning group (the file owner and others are not affected). Even if an ACL entry grants rw-, a mask of r-- reduces the effective permission to read-only.


Q4: How do you make ACL changes persistent across reboots?

A: ACLs are stored in extended attributes of files, and are persistent across reboots — provided the filesystem is mounted with acl support.


Q5: What’s the difference between default ACL and access ACL?

Type | Applied to | Purpose
Access ACL | Files/Dirs | Controls access to the object itself, extending the standard permissions
Default ACL | Directories | Inherited by new files/dirs

Summary Cheat Sheet

Command | Description
getfacl file.txt | View ACLs
setfacl -m u:john:rw file.txt | Set ACL for user
setfacl -m g:devs:rw file.txt | Set ACL for group
setfacl -x u:john file.txt | Remove ACL for user
setfacl -b file.txt | Remove all ACL entries
setfacl -d -m u:john:rw dir/ | Set default ACL for directory


101 Linux Permissions


A guide to Linux permissions for FAANG-level interviews, from beginner to advanced, with 10 interview-style questions and answers for practice.


1. Linux Permissions: Beginner to Advanced


1.1 Basics of Linux Permissions

Each file/directory has:

[File Type][Owner][Group][Others]

Example:

-rwxr-xr-- 1 prakash devs 2345 Jun 6 12:00 script.sh

Breakdown:

  • - = File (can be d for directory)

  • rwx = Owner: read, write, execute

  • r-x = Group: read, execute

  • r-- = Others: read only


1.2 Types of Permissions

Symbol | Meaning | Octal
r | Read | 4
w | Write | 2
x | Execute | 1

To get the octal digit for a class, add the values of the permissions it has (for example, r + w + x = 4 + 2 + 1 = 7):

chmod 754 filename
# => Owner: 7 (rwx), Group: 5 (r-x), Others: 4 (r--)
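
The addition can be sketched as a small shell function that turns the nine permission characters from ls -l into the octal form chmod expects. The name sym_to_octal is hypothetical, and the sketch deliberately ignores the special s and t flags covered later.

```shell
#!/bin/sh
# sym_to_octal PERMS -> prints the 3-digit octal mode for a 9-character string.
sym_to_octal() {
  perms=$1
  result=""
  for start in 1 4 7; do
    # Take one class at a time: owner (1-3), group (4-6), others (7-9).
    triplet=$(printf '%s' "$perms" | cut -c"$start-$((start + 2))")
    digit=0
    case $triplet in r??) digit=$((digit + 4)) ;; esac
    case $triplet in ?w?) digit=$((digit + 2)) ;; esac
    case $triplet in ??x) digit=$((digit + 1)) ;; esac
    result="$result$digit"
  done
  printf '%s\n' "$result"
}

sym_to_octal rwxr-xr--   # prints 754
sym_to_octal rw-rw-r--   # prints 664
```

Drop the leading file-type character from the ls -l output before passing the string in.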

1.3 Managing Permissions

  • View:

    ls -l
    
  • Change using symbolic mode:

    chmod u+x file.sh     # Add execute for owner
    chmod g-w file.sh     # Remove write for group
    chmod o=r file.sh     # Set read-only for others
    
  • Change using numeric mode:

    chmod 755 file.sh
    

1.4 Ownership

  • Change owner:

    chown prakash file.txt
    
  • Change group:

    chgrp devs file.txt
    
  • Change both:

    chown prakash:devs file.txt
    

1.5 Special Permissions (Advanced)

1.5.1 SetUID (s) — Run as file owner

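chmod u+s my_script
ls -l my_script      # => -rwsr-xr-x

Used in programs like passwd. Note that Linux honors SetUID only on binary executables; the bit is ignored on interpreted scripts.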

1.5.2 SetGID (s) — Run with group permissions

chmod g+s my_script
ls -l my_script      # => -rwxr-sr-x

1.5.3 Sticky Bit (t) — Protect deletion in shared dirs

chmod +t /tmp
ls -ld /tmp          # => drwxrwxrwt

1.6 Default Permissions – umask

Check umask:

umask        # e.g., 0022

Meaning:

  • File default: 666 – 0022 = 644 (rw-r--r--)

  • Dir default: 777 – 0022 = 755 (rwxr-xr-x)


1.7 Recursive Permission Change

chmod -R 755 /var/www
chown -R prakash:www-data /var/www

2. 10 Linux Permission Questions & Answers


Q1. What does chmod 755 file.sh do?

Answer:
Sets permissions to:

  • Owner: rwx (7)

  • Group: r-x (5)

  • Others: r-x (5)


Q2. What is the use of chmod +x script.sh?

Answer:
Adds execute permission for owner, group, and others (limited by the current umask), allowing the script to be run directly. Use chmod u+x to add it for the owner only.
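
The difference between chmod +x and chmod u+x is easy to check in a scratch directory. This is a minimal sketch assuming GNU-style stat -c and a umask of 022, which the snippet sets temporarily; the file names are placeholders.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
old_umask=$(umask)
umask 022

touch "$demo_dir/all.sh" "$demo_dir/owner.sh"
chmod 644 "$demo_dir/all.sh" "$demo_dir/owner.sh"

chmod +x  "$demo_dir/all.sh"     # no class given: u, g and o, minus umask bits
chmod u+x "$demo_dir/owner.sh"   # owner only

mode_all=$(stat -c '%a' "$demo_dir/all.sh")
mode_owner=$(stat -c '%a' "$demo_dir/owner.sh")
echo "chmod +x  -> $mode_all"     # 755
echo "chmod u+x -> $mode_owner"   # 744

umask "$old_umask"               # restore the caller's umask
```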


Q3. What is the difference between chmod 777 and chmod 755?

Answer:

  • 777: Everyone has full access (read/write/execute).

  • 755: Only owner has write access, others can read/execute but not modify.


Q4. How do you give read & write to the owner and read-only to group and others?

Answer:

chmod 644 file.txt

Q5. What does -rwsr-xr-x mean in ls -l output?

Answer:
SetUID is set:

  • File will run with owner’s privileges, not current user’s.


Q6. What does the Sticky Bit do?

Answer:
Prevents users from deleting or renaming files they do not own in a shared directory like /tmp (the directory owner and root still can).
drwxrwxrwt indicates sticky bit set.


Q7. How do you recursively change ownership of a directory and its contents?

Answer:

chown -R user:group /path/to/dir

Q8. Explain what umask 027 means?

Answer:
Default permissions mask:

  • For files: 666 – 027 = 640 (rw-r-----)

  • For dirs: 777 – 027 = 750 (rwxr-x---)


Q9. What’s the permission number for rw-rw-r--?

Answer:

rw- rw- r-- = 664

Q10. How to give only execute permission to others?

Answer:

chmod o=x file.sh

Bonus Practice Examples

Task | Command
Give full permission to owner only | chmod 700 file.sh
Remove all permissions for others | chmod o= file.sh
Set SetGID on a directory | chmod g+s dir
Make a directory with full access for all | mkdir -m 777 shared_dir
View special permissions | ls -l or stat file

This section covers umask (the user file-creation mode mask) in detail: the concept, default values, calculation logic, and practical examples, which are especially useful for FAANG-level interviews.


What is umask?

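umask defines the permission bits that are removed from newly created files or directories.

  • It doesn’t grant permissions; it masks/restricts them.

  • When a user creates a file or directory, the program requests a default mode (usually 666 for files, 777 for directories) and the kernel clears every bit that is set in the umask. This is a bitwise operation (mode AND NOT umask), not true subtraction. The two give the same answer for common values such as 022 and 027, but not for a mask like 033, where 666 becomes 644 rather than 633.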


Default Permission Values

Type | Max default permission
File | 666 (rw-rw-rw-), no x by default
Directory | 777 (rwxrwxrwx)

How umask Works (Step-by-Step)

Let's say:

  • umask = 022

For files:

Default:   666
UMASK:     022
Result:    644 → rw-r--r--

For directories:

Default:   777
UMASK:     022
Result:    755 → rwxr-xr-x

So:

  • Owner keeps full permissions

  • Group & Others lose w permission
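
The calculation above can be sketched with shell arithmetic. Strictly speaking the kernel clears the umask bits with a bitwise AND NOT rather than subtracting, which only matters for masks such as 033. The function name apply_umask is hypothetical; both arguments are octal strings.

```shell
#!/bin/sh
# apply_umask BASE MASK -> prints BASE with every MASK bit cleared (octal).
apply_umask() {
  # The leading 0 makes the shell read each argument as an octal constant.
  printf '%03o\n' $(( 0$1 & ~0$2 & 0777 ))
}

apply_umask 666 022   # prints 644
apply_umask 777 022   # prints 755
apply_umask 666 033   # prints 644 (plain subtraction would give 633)
```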


Common umask values & their effects

UMASK | File Permission | Dir Permission | Notes
000 | 666 → 666 | 777 → 777 | No bits masked; everyone can read/write
022 | 666 → 644 | 777 → 755 | Group/Others: no write
027 | 666 → 640 | 777 → 750 | Group: no write, Others: no access
077 | 666 → 600 | 777 → 700 | Owner-only access

Examples in Shell

🔍 View current umask:

umask
# Output: 0022

 Set umask temporarily:

umask 0027
touch file.txt
mkdir dir1

ls -l
# file.txt => -rw-r-----
# dir1     => drwxr-x---
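
The same experiment can be made repeatable by working in a scratch directory and reading the modes back with stat. This is a minimal sketch assuming GNU-style stat -c and a scratch directory without a default ACL or SetGID bit; it restores the previous umask at the end so the rest of the session is unaffected.

```shell
#!/bin/sh
workdir=$(mktemp -d)
old_umask=$(umask)

umask 027
touch "$workdir/file.txt"
mkdir "$workdir/dir1"
umask "$old_umask"               # restore the caller's umask

file_mode=$(stat -c '%a' "$workdir/file.txt")
dir_mode=$(stat -c '%a' "$workdir/dir1")
echo "file.txt -> $file_mode"    # 640
echo "dir1     -> $dir_mode"     # 750
```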

⚙️ Make umask permanent:

For the bash shell, open ~/.bashrc and add the umask line:

vi ~/.bashrc
# add this line to the file:
umask 027

Then reload it:

source ~/.bashrc

Important Rules

  1. umask removes bits, not adds them.

  2. Regular files are not given execute (x) by default, because programs such as touch and text editors request at most 666; the umask can only remove bits, never add x.

  3. Directories do get x so you can cd into them.


Interview-ready Example

Q: If a user has a umask of 0077 and runs touch test.txt and mkdir demo, what are the resulting permissions?

Answer:

  • Default for file: 666

  • umask: 077

  • Final: 600 (rw-------)

  • Default for dir: 777

  • umask: 077

  • Final: 700 (rwx------)

So only the owner has access: read/write on the file, and read/write/execute on the directory. This is a secure configuration.


Visual Table: umask Logic

Resource | Max Default | UMASK | Final Permission
File | 666 | 027 | 640 (rw-r-----)
Dir | 777 | 027 | 750 (rwxr-x---)

This section covers the Linux special permissions (SetUID, SetGID, and the Sticky Bit) in detail, with real-world use cases and practical examples.


Why Special Permissions Exist

Linux has standard permission bits for:

  • User (Owner)

  • Group

  • Others

But what if you want:

  • A user to run a program as root?

  • Files in a shared directory to always inherit the same group?

  • Prevent users from deleting others’ files in /tmp?

That’s where special permission bits come in.


1. SetUID (Set User ID)

What it Does:

When an executable has SetUID set, any user who runs it gets a process whose effective user ID is the file's owner (usually root) for the lifetime of that process.

Real-World Use Case:

  • The passwd command allows any user to update their own password, but it needs to write to /etc/shadow, which is owned by root.

  • So passwd runs as root, even when executed by a regular user.

Example:

ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root 54256 Jan  1 00:00 /usr/bin/passwd
   ^
   └── 's' here means SetUID is ON for user

How to Set:

chmod u+s my_script

🔍 How to Verify:

ls -l my_script
-rwsr-xr-x 1 prakash devs 2345 Jun 6 12:00 my_script

2. SetGID (Set Group ID)

What it Does:

A) On Files:

  • The process runs with group permissions of the file, not the executing user.

B) On Directories:

  • New files/directories inside the directory will inherit the group ownership of the directory — not the creator's default group.

Real-World Use Case:

  • In collaborative environments (e.g., /shared/projects), you want files created by any team member to have the same group, like devs.

Example on Directory:

mkdir shared
chgrp devs shared
chmod g+s shared
ls -ld shared
drwxr-sr-x 2 prakash devs 4096 Jun 6 12:00 shared
      ^
      └── 's' on group means SetGID is ON

Now, any new file created inside shared/ will automatically belong to group devs; files that already existed keep their group.
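
The inheritance can be observed without a devs group by using a scratch directory and comparing numeric group IDs. This is a sketch assuming GNU-style stat -c on Linux; the directory stays in the current user's own group, so it shows the mechanism rather than a group change. On Linux, new subdirectories also inherit the SetGID bit itself.

```shell
#!/bin/sh
shared=$(mktemp -d)
chmod g+s "$shared"              # turn SetGID on for the directory

touch "$shared/report.txt"       # new file
mkdir "$shared/sub"              # new subdirectory

dir_gid=$(stat -c '%g' "$shared")
file_gid=$(stat -c '%g' "$shared/report.txt")
sub_mode=$(stat -c '%A' "$shared/sub")

echo "directory gid: $dir_gid"
echo "new file gid:  $file_gid"  # same as the directory's gid
echo "subdir mode:   $sub_mode"  # s (or S) in the group execute position
```

On a real shared directory, run chgrp devs on it first, as in the example above; files created by any team member then land in devs.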


How to Set on File:

chmod g+s my_binary

How to Set on Directory:

chmod g+s /some/dir

3. Sticky Bit (t)

What it Does:

Within a directory that has the sticky bit set, a file can be deleted or renamed only by the file's owner, the directory's owner, or root, even if others have write access to the directory.

Real-World Use Case:

  • /tmp directory: World-writable directory used by all users. Without sticky bit, users could delete each other’s temporary files.

Example:

ls -ld /tmp
drwxrwxrwt 10 root root 4096 Jun 6 12:00 /tmp
         ^
         └── 't' means Sticky Bit is set

How to Set:

chmod +t mydir

How to Remove:

chmod -t mydir

Summary Table

Permission | Symbol | Applies to | Effect
SetUID | s (user) | Executable file | Runs as file owner
SetGID | s (group) | File or directory | File: runs as group, Dir: inherits group
Sticky Bit | t (others) | Directory | Only file owner can delete
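
All three bits from the table can be set in a scratch directory and read back in both octal and symbolic form. This is a minimal sketch assuming GNU-style stat -c; the file and directory names are placeholders. In octal, the special bits occupy a fourth leading digit: 4 for SetUID, 2 for SetGID, 1 for the sticky bit.

```shell
#!/bin/sh
lab=$(mktemp -d)

touch "$lab/prog"; chmod 755 "$lab/prog"; chmod u+s "$lab/prog"   # SetUID
mkdir "$lab/team"; chmod 775 "$lab/team"; chmod g+s "$lab/team"   # SetGID
mkdir "$lab/drop"; chmod 777 "$lab/drop"; chmod +t  "$lab/drop"   # Sticky

suid_mode=$(stat -c '%a %A' "$lab/prog")
sgid_mode=$(stat -c '%a %A' "$lab/team")
sticky_mode=$(stat -c '%a %A' "$lab/drop")

echo "$suid_mode"     # 4755 -rwsr-xr-x
echo "$sgid_mode"     # 2775 drwxrwsr-x
echo "$sticky_mode"   # 1777 drwxrwxrwt
```

The same result can be reached in one step with the numeric form, for example chmod 1777 on the shared directory.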

Interview-Style Example

Q: A file shows -rwsr-xr-x. What does it mean and why is it used?

Answer:

  • The s in user field = SetUID.

  • This means: when the file is executed, it runs with the owner's privileges, not the executor’s.

  • Common use: commands like passwd which need to access /etc/shadow.


FAANG-Level Takeaway Tips

  • Know how SetUID can be a security risk if misused (e.g., privilege escalation).

  • Sticky bit is essential for shared directories to prevent accidental file deletion.

  • SetGID on directories is useful in CI/CD pipelines and group collaboration.
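
For the SetUID risk in the first tip, a common audit step is to list every file that carries the bit. The sketch below builds a small scratch tree to show the find expression; the paths and file names are placeholders, and on a real system the search would typically start from / with errors discarded.

```shell
#!/bin/sh
audit_dir=$(mktemp -d)
touch "$audit_dir/plain" "$audit_dir/suid_demo"
chmod 755  "$audit_dir/plain"
chmod 4755 "$audit_dir/suid_demo"

# -perm -4000 matches any regular file with the SetUID bit set.
found=$(find "$audit_dir" -type f -perm -4000)
echo "$found"
```

Comparing such a list against a known-good baseline is one way to spot an unexpected SetUID binary.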

