A thread is the smallest unit of execution within a process. A process can have multiple threads — they share the same heap memory (objects, static variables) but each thread gets its own stack, program counter, and local variables.
Process vs Thread: A process has its own isolated memory space. Threads within a process share memory, making communication cheap but synchronization necessary.
class MyThread extends Thread {
    @Override
    public void run() {
        System.out.println("Thread: " + Thread.currentThread().getName());
    }
}
new MyThread().start(); // start() creates new thread; run() would NOT
> Interview point: Always call start(), never run() directly. run() executes on the current thread — no new thread is created.
Daemon threads are background threads (e.g., GC). JVM exits when only daemon threads remain. Set via thread.setDaemon(true) before start().
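A brief sketch of the daemon pattern, assuming a background heartbeat task:
Thread heartbeat = new Thread(() -> {
    while (true) {
        // periodic background work here
        try { Thread.sleep(5_000); } catch (InterruptedException e) { return; }
    }
});
heartbeat.setDaemon(true); // must precede start(); calling it afterwards throws IllegalThreadStateException
heartbeat.start();         // JVM can exit even while this thread is running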
Java threads have 6 states defined in Thread.State:
| State | Description | Transitions To |
|---|---|---|
| NEW | Created, not yet started | RUNNABLE on start() |
| RUNNABLE | Running or ready to run (OS scheduler decides) | BLOCKED, WAITING, TIMED_WAITING, TERMINATED |
| BLOCKED | Waiting to acquire an intrinsic lock | RUNNABLE when lock acquired |
| WAITING | wait(), join() with no timeout | RUNNABLE on notify/interrupt |
| TIMED_WAITING | sleep(ms), wait(ms), join(ms) | RUNNABLE on timeout/notify |
| TERMINATED | run() completed or exception thrown | — |
> Warning: BLOCKED = waiting for a monitor lock. WAITING = voluntarily gave up CPU (called wait/join). This is an important distinction in interviews!
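A small demo of the transitions (a sketch, assuming a main that declares throws InterruptedException):
Thread t = new Thread(() -> {
    try { Thread.sleep(500); } catch (InterruptedException ignored) { }
});
System.out.println(t.getState()); // NEW
t.start();
System.out.println(t.getState()); // RUNNABLE (TIMED_WAITING once it reaches sleep)
t.join();                         // the calling thread is WAITING until t finishes
System.out.println(t.getState()); // TERMINATED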
| Aspect | Runnable | extends Thread |
|---|---|---|
| Inheritance | ✅ Can extend other class | ❌ No multiple inheritance |
| Reusability | ✅ Same Runnable → multiple threads | ❌ Tightly coupled |
| Best for | Task definition (preferred) | Simple demos only |
| Return value | ❌ void run() | ❌ void run() |
| Exceptions | ❌ No checked exceptions | ❌ No checked exceptions |
// Preferred: Runnable (or lambda)
Runnable task = () -> System.out.println("Hello");
new Thread(task).start();
// Better yet: use ExecutorService
ExecutorService ex = Executors.newFixedThreadPool(4);
ex.submit(task);
> In production code, you almost never extend Thread or use new Thread() directly. Always use an ExecutorService or virtual threads (Java 21+).
| Feature | Runnable | Callable&lt;V&gt; |
|---|---|---|
| Method | void run() | V call() throws Exception |
| Return value | ❌ None | ✅ Returns V |
| Checked exceptions | ❌ | ✅ Can throw |
| Used with | execute(), submit() | submit() → Future&lt;V&gt; |
Callable<Integer> task = () -> {
    Thread.sleep(1000); // call() may throw checked exceptions
    return 42;
};
Future<Integer> future = executor.submit(task);
Integer result = future.get(); // blocks until done
The Executor framework decouples task submission from execution mechanics. It manages thread lifecycle, pooling, and scheduling so you don't create threads manually.
Key interfaces: Executor → ExecutorService → ScheduledExecutorService
// Core hierarchy
Executor // execute(Runnable)
└── ExecutorService // submit(), shutdown(), invokeAll()
└── ScheduledExecutorService // schedule(), scheduleAtFixedRate()
// Factory methods via Executors class
Executors.newFixedThreadPool(4) // bounded pool
Executors.newCachedThreadPool() // unbounded, reuses idle
Executors.newSingleThreadExecutor() // serial execution
Executors.newScheduledThreadPool(2) // scheduled tasks
Executors.newWorkStealingPool() // ForkJoinPool based
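For the scheduled flavor, a minimal fixed-rate sketch:
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
scheduler.scheduleAtFixedRate(
    () -> System.out.println("tick"),
    1, 10, TimeUnit.SECONDS); // initial delay 1 s, then every 10 s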
> Always shut down ExecutorService: executor.shutdown() (graceful) or executor.shutdownNow() (interrupts running tasks). Use try-with-resources in Java 19+.
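A common graceful-shutdown idiom (the 30-second timeout is illustrative):
executor.shutdown();                                   // stop accepting new tasks
try {
    if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
        executor.shutdownNow();                        // interrupt stragglers
    }
} catch (InterruptedException e) {
    executor.shutdownNow();
    Thread.currentThread().interrupt();                // restore interrupt status
}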
ThreadPoolExecutor is the real implementation. Key params: corePoolSize, maximumPoolSize, keepAliveTime, BlockingQueue. Understanding these is a senior-level interview question.
| Aspect | newFixedThreadPool(n) | newCachedThreadPool() |
|---|---|---|
| Thread count | Fixed at n | 0 to Integer.MAX_VALUE |
| Queue | LinkedBlockingQueue (unbounded) | SynchronousQueue (no capacity) |
| Idle thread TTL | Lives forever (risky!) | 60 seconds, then terminated |
| Best for | CPU-bound, predictable load | Many short-lived I/O tasks |
| Risk | OOM if queue grows unbounded | Thread explosion under load |
> Warning: In production, prefer ThreadPoolExecutor with explicit bounded queue over Executors factory methods. Unbounded queues hide backpressure issues.
// Production-grade thread pool
new ThreadPoolExecutor(
4, // corePoolSize
8, // maximumPoolSize
60L, TimeUnit.SECONDS,
new ArrayBlockingQueue<>(100), // bounded!
new ThreadPoolExecutor.CallerRunsPolicy() // backpressure
);
ForkJoinPool implements the work-stealing algorithm. Each thread has its own deque. Idle threads steal tasks from the tail of other threads' deques — maximizing CPU utilization for recursive divide-and-conquer tasks.
class SumTask extends RecursiveTask<Long> {
    private final int[] arr;
    private final int lo, hi;
    SumTask(int[] arr, int lo, int hi) { this.arr = arr; this.lo = lo; this.hi = hi; }
    @Override
    protected Long compute() {
        if (hi - lo <= 1000) {          // small enough: compute directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += arr[i];
            return sum;
        }
        int mid = (lo + hi) / 2;
        SumTask left = new SumTask(arr, lo, mid);
        left.fork();                     // async execution on the pool
        SumTask right = new SumTask(arr, mid, hi);
        return right.compute() + left.join(); // compute right inline, then join left
    }
}
ForkJoinPool.commonPool().invoke(new SumTask(arr, 0, arr.length));
> RecursiveTask returns a value; RecursiveAction returns void. Parallel streams internally use ForkJoinPool.commonPool().
Synchronization ensures mutual exclusion (only one thread executes a critical section) and memory visibility (changes made inside a synchronized block are visible to all threads that subsequently acquire the same lock).
Java uses the monitor pattern — every object has an intrinsic lock (monitor). The synchronized keyword acquires this lock before entering and releases it on exit (even via exceptions).
class Counter {
private int count = 0;
public synchronized void increment() { count++; } // instance lock
public void add(int n) {
synchronized(this) { count += n; } // equivalent
}
}
> Warning: Synchronization has cost: context switching, memory barriers. Don't over-synchronize. The critical section should be as small as possible.
| Type | Lock on | Syntax | Scope |
|---|---|---|---|
| Object-level | Instance (this) | synchronized method / synchronized(this) | Per instance — different objects don't block each other |
| Class-level | Class object (MyClass.class) | static synchronized / synchronized(MyClass.class) | All instances share ONE lock |
class Demo {
public synchronized void instanceMethod() { } // lock on 'this'
public static synchronized void staticMethod() { } // lock on Demo.class
public void block() {
synchronized(Demo.class) { } // explicit class lock
}
}
> Object lock and class lock are completely independent. Two threads can simultaneously call instanceMethod() on different objects, and a third thread can call staticMethod().
Every Java object has an intrinsic lock (monitor) baked into the object header. When a thread acquires it via synchronized, all other threads trying to acquire the same lock are placed in the BLOCKED state.
Key properties:
- Reentrant — A thread that already holds a lock can re-acquire it without deadlocking (increments hold count).
- Automatic release — Always released when the synchronized block exits, even on exception.
- Non-interruptible — A thread waiting to acquire cannot be interrupted (unlike ReentrantLock.lockInterruptibly()).
class Parent {
public synchronized void method() {
childMethod(); // re-entrant: same thread, same lock — no deadlock
}
public synchronized void childMethod() { /* same lock, increments count */ }
}
| Feature | synchronized | ReentrantLock |
|---|---|---|
| Fairness | ❌ No fairness guarantee | ✅ new ReentrantLock(true) |
| Try-lock | ❌ | ✅ tryLock() / tryLock(timeout) |
| Interruptible | ❌ | ✅ lockInterruptibly() |
| Multiple conditions | ❌ One wait set | ✅ Multiple Condition objects |
| Explicit unlock | Auto on block exit | Must call unlock() in finally |
| Performance (modern JVMs) | Similar (JVM lock optimizations) | Similar |
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
// critical section
} finally {
lock.unlock(); // ALWAYS in finally!
}
1. tryLock() — non-blocking attempt to acquire the lock (the timed overload waits up to a timeout); either way, no indefinite blocking, which helps prevent deadlock:
if (lock.tryLock(100, TimeUnit.MILLISECONDS)) {
try { /* work */ } finally { lock.unlock(); }
} else { /* handle contention */ }
2. Multiple Conditions — fine-grained wait/notify control (see the full sketch after this list):
Condition notFull = lock.newCondition();
Condition notEmpty = lock.newCondition();
// Producer signals notEmpty; consumer signals notFull
// Unlike synchronized: each condition has its own wait set
3. Fair lock — FIFO ordering, prevents starvation (at cost of throughput).
4. lockInterruptibly() — Thread waiting for lock can be interrupted, enabling cancellation.
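A sketch of the multiple-conditions idea from item 2 — a minimal bounded buffer where producers and consumers wait on separate wait sets (class and field names are illustrative; ArrayBlockingQueue implements this pattern for real):
class BoundedBuffer<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    private final Object[] items = new Object[16];
    private int count, putIdx, takeIdx;

    void put(T x) throws InterruptedException {
        lock.lock();
        try {
            while (count == items.length) notFull.await(); // producers wait in notFull's wait set only
            items[putIdx] = x;
            putIdx = (putIdx + 1) % items.length;
            count++;
            notEmpty.signal(); // wake one consumer — sleeping producers stay asleep
        } finally { lock.unlock(); }
    }

    @SuppressWarnings("unchecked")
    T take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) notEmpty.await();
            T x = (T) items[takeIdx];
            takeIdx = (takeIdx + 1) % items.length;
            count--;
            notFull.signal(); // wake one producer
            return x;
        } finally { lock.unlock(); }
    }
}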
> Use synchronized for simple use cases. Reach for ReentrantLock when you need: timeouts, interruptibility, fairness, or multiple conditions.
Deadlock occurs when ≥2 threads are permanently blocked, each waiting for a resource held by another.
4 Coffman conditions (all must hold for deadlock): Mutual exclusion, Hold-and-wait, No preemption, Circular wait.
// Classic deadlock
// Thread 1:
synchronized (lockA) {
    synchronized (lockB) { /* work */ }
}
// Thread 2 acquires in the opposite order:
synchronized (lockB) {
    synchronized (lockA) { /* work */ } // each holds what the other needs → DEADLOCK
}
// Fix: ALL threads acquire locks in the same order (lockA, then lockB)
synchronized (lockA) {
    synchronized (lockB) { /* both threads */ }
}
Prevention strategies:
- Lock ordering — always acquire in a fixed order
- tryLock with timeout — back off and retry
- Lock coarsening — use fewer locks
- Avoid nested locks
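A sketch of the tryLock-with-back-off strategy from the list above (the lock names a and b, and the back-off bound, are illustrative):
void transfer(ReentrantLock a, ReentrantLock b) throws InterruptedException {
    while (true) {
        if (a.tryLock()) {
            try {
                if (b.tryLock()) {
                    try { /* critical section using both resources */ return; }
                    finally { b.unlock(); }
                }
            } finally { a.unlock(); }   // failed to get b — release a so others can proceed
        }
        Thread.sleep(ThreadLocalRandom.current().nextInt(10)); // randomized back-off, then retry
    }
}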
Livelock is like deadlock except threads ARE active (not blocked) but keep responding to each other's state changes without making progress. CPU is consumed but no work is done.
Classic analogy: Two people in a corridor both step aside in the same direction repeatedly — polite but stuck.
// Livelock pattern: both threads acquire lockA, fail on lockB, release, retry in lockstep
while (true) {
    lockA.lock();
    if (lockB.tryLock()) break;  // got both locks — proceed
    lockA.unlock();              // back off and retry...
    Thread.sleep(100);           // ...but both threads use the same delay and keep colliding
}
// Fix: randomized back-off with jitter
> Livelock is harder to detect than deadlock because threads are running. Thread dumps show RUNNABLE state, but progress metrics are flat.
Starvation occurs when a thread is perpetually denied access to resources it needs, while other threads proceed. The starved thread never makes progress but is not blocked in a cycle.
Causes:
- Unfair lock implementation (non-fair ReentrantLock)
- High-priority threads hogging CPU, low-priority starved
- Long-running synchronized methods preventing others from entering
Solutions:
- Use new ReentrantLock(true) for a fair lock (FIFO)
- Avoid holding locks for long periods
- Set appropriate thread priorities
> Starvation vs Deadlock: In deadlock all involved threads are stuck. In starvation, some threads proceed fine — only the starved thread is denied.
A race condition occurs when the correctness of a program depends on the relative timing of thread execution. Two threads race to access/modify shared state, producing different results each run.
// Classic race: check-then-act
if (!map.containsKey(key)) { // Thread 1 checks
map.put(key, value); // Thread 2 also checked! Both insert!
}
// Fix: Use ConcurrentHashMap.putIfAbsent() or synchronized
// Classic race: read-modify-write
count++; // NOT atomic: read, increment, write (3 ops!)
// Fix: AtomicInteger.incrementAndGet() or synchronized
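Hedged one-liners for the two fixes named in the comments above:
ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
map.putIfAbsent("key", 1);                // atomic check-then-act
AtomicInteger count = new AtomicInteger();
count.incrementAndGet();                  // atomic read-modify-write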
> Race conditions are non-deterministic and often don't manifest in single-threaded tests. Use stress testing tools or thread sanitizers in CI.
volatile guarantees two things:
1. Visibility — writes are immediately flushed to main memory; reads always fetch from main memory (bypasses CPU caches).
2. Ordering — prevents instruction reordering around volatile reads/writes (memory barrier).
class Singleton {
    private static volatile Singleton instance; // volatile is critical here!
    private Singleton() { }                     // prevent external instantiation
    public static Singleton getInstance() {
        if (instance == null) {                 // check 1 (no lock)
            synchronized (Singleton.class) {
                if (instance == null)           // check 2 (with lock)
                    instance = new Singleton();
            }
        }
        return instance;
    }
}
> volatile does NOT provide atomicity! volatile int count; count++ is still NOT thread-safe — use AtomicInteger instead.
Good use cases: flags, status fields, double-checked locking, publishing immutable objects.
Atomic classes provide lock-free thread-safe operations using CPU-level CAS (Compare-And-Swap) instructions. No locking = no context switches = better scalability.
| Class | Use for |
|---|---|
| AtomicInteger/Long | Counters, flags |
| AtomicBoolean | One-time actions, flags |
| AtomicReference&lt;V&gt; | Object references |
| AtomicIntegerArray | Array with atomic element ops |
| LongAdder | High-contention counters (faster than AtomicLong) |
| LongAccumulator | Custom accumulation functions |
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet();        // atomic ++
counter.compareAndSet(5, 10);     // if value == 5, set to 10
counter.updateAndGet(x -> x * 2); // atomic update
> LongAdder is preferred for high-concurrency counters — it stripes the counter across cells to reduce contention, then sums them on read.
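A minimal sketch of the LongAdder API mentioned above:
LongAdder hits = new LongAdder();
hits.increment();        // contended increments land on separate internal cells
long total = hits.sum(); // sums the cells on read (not a point-in-time snapshot under concurrency)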
Happens-before (HB) is a formal guarantee in the Java Memory Model: if action A happens-before action B, then all effects of A are visible to B.
Key HB rules:
- Program order: statements in a thread execute in order
- Monitor unlock HB subsequent lock of the same monitor
- Volatile write HB subsequent volatile read of same variable
- Thread.start() HB any action in the started thread
- Thread.join() HB actions after join() returns
- Static initializer HB first use of the class
volatile int flag = 0;
int data = 0;
// Thread 1:
data = 42; // HB volatile write (program order rule)
flag = 1; // volatile write
// Thread 2:
if (flag == 1) // volatile read HB subsequent reads
use(data); // data=42 guaranteed visible!
The JMM defines how and when changes made by one thread become visible to others. Without the JMM, CPUs and JIT compilers can reorder reads/writes for performance.
The problem: Modern CPUs have caches (L1/L2/L3). Each core can cache values. Without synchronization, Thread A's writes may never reach Thread B's cache.
JMM key concepts:
- Main memory — shared heap
- Working memory — per-thread CPU cache view
- Actions: read, load, use, assign, store, write, lock, unlock
- The JMM specifies when working memory must sync with main memory
> JMM doesn't mandate a specific implementation — it defines what guarantees the JVM must provide. Synchronization primitives (synchronized, volatile, Atomic) create the necessary memory barriers.
| Problem | Description | Fix |
|---|---|---|
| Visibility | Thread B doesn't see Thread A's write (cached in A's CPU) | volatile, synchronized, Atomic |
| Atomicity | A compound operation (read-modify-write) is interrupted mid-way | synchronized, Atomic classes, Lock |
// volatile fixes visibility but NOT atomicity:
volatile int count = 0;
count++;                       // STILL BROKEN: 3 separate ops (read, +1, write)
// plain long/double writes may be non-atomic (two 32-bit writes):
volatile long val;             // volatile makes long/double writes atomic too
// AtomicInteger fixes both:
AtomicInteger atomicCount = new AtomicInteger();
atomicCount.incrementAndGet(); // atomic AND visible
| Feature | wait() | sleep() |
|---|---|---|
| Class | Object | Thread (static) |
| Lock release | ✅ Releases monitor lock | ❌ Holds lock |
| Wakeup | notify()/notifyAll() or timeout | Timeout only |
| Context | Must be in synchronized block | Anywhere |
| Use case | Inter-thread communication | Pause execution |
// wait() — must hold the lock
synchronized(lock) {
while (!condition) // while loop, not if (spurious wakeups!)
lock.wait();
// proceed
}
// sleep() — just pauses, keeps any held locks
Thread.sleep(1000); // does NOT release locks!
> Always use wait() in a while loop, not an if. Spurious wakeups (wakeup without notify) are allowed by the JVM spec.
| Aspect | notify() | notifyAll() |
|---|---|---|
| Wakes up | 1 arbitrary waiting thread | ALL waiting threads |
| Risk | Wrong thread woken (missed signal) | Thundering herd (all compete) |
| Safe when | All waiters identical (homogeneous) | Multiple different conditions |
// notifyAll is safer in most cases:
synchronized(lock) {
ready = true;
lock.notifyAll(); // all waiting threads re-evaluate their while condition
}
> Prefer notifyAll() unless you can guarantee all waiting threads are identical and any one can proceed. With ReentrantLock you can use separate Conditions to wake specific threads.
Classic coordination pattern. Best implemented with BlockingQueue in modern Java:
BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);
// Producer
Runnable producer = () -> {
    try {
        while (true) {
            queue.put(produce());  // blocks if full
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore interrupt status and exit
    }
};
// Consumer
Runnable consumer = () -> {
    try {
        while (true) {
            consume(queue.take()); // blocks if empty
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
};
BlockingQueue types:
- ArrayBlockingQueue — bounded, fair option available
- LinkedBlockingQueue — optionally bounded (default Integer.MAX_VALUE)
- PriorityBlockingQueue — priority-ordered
- SynchronousQueue — no capacity, handoff only
- LinkedTransferQueue — producer waits until consumer picks up
A Semaphore maintains a set of permits. acquire() blocks until a permit is available; release() adds one back. Unlike locks, a semaphore can be released by a different thread than the one that acquired it.
// Limit concurrent DB connections to 5
Semaphore sem = new Semaphore(5);
void query() throws InterruptedException {
sem.acquire();
try {
runQuery();
} finally {
sem.release(); // always release!
}
}
Use cases: Rate limiting, resource pool sizing, binary semaphore as mutex, throttling parallel requests.
A binary semaphore (permits=1) acts like a mutex but without ownership — useful when one thread signals another.
| Feature | CountDownLatch | CyclicBarrier |
|---|---|---|
| Reusable | ❌ One-time use | ✅ Resets after each barrier |
| Who waits | One/more threads await() others to countDown() | All threads wait for each other |
| Action on trigger | Releases all waiting threads | Optional Runnable, then releases |
| Best for | Start signal, wait for N completions | Phased iteration (parallel algorithms) |
// CountDownLatch: wait for 3 services to start
CountDownLatch latch = new CountDownLatch(3);
// Each service calls latch.countDown() when ready
latch.await(); // main thread waits

// CyclicBarrier: N threads sync at each phase
CyclicBarrier barrier = new CyclicBarrier(4, () -> mergeResults());
// Each worker calls barrier.await() at the end of a phase (auto-resets!)
Phaser is a flexible, reusable synchronization barrier that combines features of both CountDownLatch and CyclicBarrier, with dynamic party registration.
Advantages over CyclicBarrier:
- Parties can register/deregister dynamically
- Tracks phase number
- Can be tiered (parent-child hierarchy)
- Supports inter-phase actions via an onAdvance() override
Phaser phaser = new Phaser(1); // register main thread
for (int i = 0; i < 3; i++) {
phaser.register(); // dynamic registration
new Thread(() -> {
doPhase1();
phaser.arriveAndAwaitAdvance(); // sync point
doPhase2();
phaser.arriveAndDeregister(); // done with all phases
}).start();
}
phaser.arriveAndDeregister(); // main deregisters
ReadWriteLock allows multiple concurrent readers OR one exclusive writer. Ideal for read-heavy workloads.
Rules: Multiple readers can hold the read lock simultaneously. A writer needs exclusive access (no readers, no other writers).
ReadWriteLock rwLock = new ReentrantReadWriteLock();
Lock readLock = rwLock.readLock();
Lock writeLock = rwLock.writeLock();
void read() {
readLock.lock();
try { doRead(); } finally { readLock.unlock(); }
}
void write(Data d) {
writeLock.lock();
try { doWrite(d); } finally { writeLock.unlock(); }
}
Java 8+: Consider StampedLock — supports optimistic reads (no lock acquired, just validate stamp afterward). Even better performance for read-dominant cases.
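A minimal optimistic-read sketch with StampedLock (the guarded fields x and y are illustrative):
StampedLock sl = new StampedLock();
double x, y; // state guarded by sl

double read() {
    long stamp = sl.tryOptimisticRead(); // no lock acquired — just a stamp
    double cx = x, cy = y;               // read the state optimistically
    if (!sl.validate(stamp)) {           // a write intervened — fall back to a real read lock
        stamp = sl.readLock();
        try { cx = x; cy = y; } finally { sl.unlockRead(stamp); }
    }
    return cx + cy;
}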
Java 7: Segment-based locking (16 segments by default). Lock on segment for writes.
Java 8+: Completely redesigned. No segments. Uses a table of Node[] (bins).
Write operations: CAS for empty bins, synchronized on the first node of the bin (fine-grained lock). Only the affected bin is locked, not the whole map.
Read operations: Fully non-blocking using volatile reads. No locking at all.
Structure per bin:
- 0–7 entries: linked list
- 8+ entries: converts to red-black tree (TreeBin) for O(log n)
- Returns to linked list if size drops below 6
// Atomic compound operations
map.putIfAbsent(key, value);
map.computeIfAbsent(key, k -> new ArrayList<>());
map.merge(key, 1, Integer::sum); // for counting
> size() / isEmpty() are approximate — other threads may modify concurrently. Use mappingCount() for large maps.
CopyOnWriteArrayList and CopyOnWriteArraySet: on every write (add/remove/set), a new copy of the underlying array is created. Readers always see a consistent snapshot.
| Aspect | Detail |
|---|---|
| Reads | Lock-free, very fast (no synchronization) |
| Writes | Expensive — copies entire array + lock |
| Iteration | Never throws ConcurrentModificationException (snapshots iterator) |
| Best for | Read-dominant, rarely modified (event listener lists) |
| Worst for | Frequently mutated lists (huge GC pressure) |
> Iterators reflect the state at iteration start — mutations during iteration are NOT visible. This is by design but can confuse.
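A typical read-dominant use, assuming a simple listener list:
List<Runnable> listeners = new CopyOnWriteArrayList<>();
listeners.add(() -> System.out.println("config changed")); // each mutation copies the backing array

for (Runnable l : listeners) { // iterates a snapshot — never throws ConcurrentModificationException
    l.run();
}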
CompletableFuture is Java 8's way to write non-blocking async pipelines with composable stages.
// Chain of async operations
CompletableFuture
.supplyAsync(() -> fetchUser(id)) // runs in ForkJoinPool
.thenApplyAsync(user -> fetchOrders(user)) // on completion
.thenCombine(fetchInventory(), (o, i) -> merge(o, i))
.exceptionally(ex -> fallback()) // error handling
.thenAccept(System.out::println);
Key methods:
- thenApply — transform result (like map)
- thenCompose — chain CF returning CF (like flatMap)
- thenCombine — zip two CFs
- allOf — wait for all
- anyOf — first to complete
- exceptionally/handle — error recovery
- Async variants (thenApplyAsync) — run on separate thread
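A short sketch contrasting thenApply and thenCompose, plus allOf (fetchUser and fetchProfileAsync are assumed helpers; fetchProfileAsync returns a CompletableFuture):
CompletableFuture<User> userCf = CompletableFuture.supplyAsync(() -> fetchUser(id));
CompletableFuture<String> nameCf = userCf.thenApply(u -> u.name());                 // map: plain value
CompletableFuture<Profile> profCf = userCf.thenCompose(u -> fetchProfileAsync(u));  // flatMap: avoids CF<CF<Profile>>
CompletableFuture.allOf(nameCf, profCf).join();                                     // wait for both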
> Methods without the "Async" suffix run on the thread that completed the previous stage. Use the "Async" variants (optionally supplying your own Executor) to keep the work off the completing thread.
| Feature | Future&lt;V&gt; | CompletableFuture&lt;V&gt; |
|---|---|---|
| Non-blocking | ❌ get() always blocks | ✅ Callbacks, thenApply etc. |
| Chaining | ❌ No composition | ✅ Full pipeline |
| Exception handling | ExecutionException on get() | ✅ exceptionally, handle |
| Manual complete | ❌ | ✅ complete(), completeExceptionally() |
| Combine multiple | ❌ | ✅ allOf, anyOf, thenCombine |
> Future.get() is a blocking call — it defeats async purpose if called immediately. Always prefer callbacks unless you truly need the result right now.
Parallel streams split data across ForkJoinPool.commonPool() threads. Simple to enable but easy to misuse.
// Sequential
list.stream().map(heavyCompute).collect(Collectors.toList());
// Parallel — splits work across CPU cores
list.parallelStream().map(heavyCompute).collect(Collectors.toList());
// Custom pool (don't pollute commonPool with I/O)
ForkJoinPool customPool = new ForkJoinPool(4);
customPool.submit(() -> list.parallelStream().forEach(...)).get();
Avoid parallel streams when:
- Small data sets — fork/join overhead exceeds the benefit (rule of thumb: < 10,000 elements)
- I/O-bound operations — threads block waiting for I/O, wasting pool threads; use async I/O instead
- Order matters — parallel processing breaks encounter order unless forced ordered (findFirst on a parallel stream is expensive)
- Shared mutable state — parallelStream() with side effects = data races
- Blocking operations inside — blocks ForkJoinPool.commonPool(), affecting all parallel streams app-wide
- LinkedList or other non-splittable sources — can't split efficiently; poor performance
> Never do blocking calls (DB, HTTP) inside parallelStream() on commonPool. Use a custom ForkJoinPool or CompletableFuture instead.
ThreadLocal<T> provides per-thread variable instances. Each thread has its own independent copy — no synchronization needed.
static ThreadLocal<SimpleDateFormat> formatter =
ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));
// Each thread gets its own SDF instance — SDF is NOT thread-safe
Common use cases:
- Per-thread database connections / JDBC transactions
- Request context propagation (user ID, locale, trace ID)
- Non-thread-safe utility instances (SimpleDateFormat, Random)
- Spring's @Transactional uses ThreadLocal internally
> Always call threadLocal.remove() when done, especially in thread pools! Threads are reused — stale data from a previous request can leak to the next.
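A minimal request-scoped sketch of the remove() discipline (Request, traceId(), and process() are illustrative names):
static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

void handle(Request req) {   // e.g., a filter or handler running on a pooled thread
    TRACE_ID.set(req.traceId());
    try {
        process(req);
    } finally {
        TRACE_ID.remove();   // without this, the next request on this thread sees stale data
    }
}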
In order of preference (easiest to use correctly):
- Immutability — No state = no race conditions. Use final fields, defensive copies. Best option.
- Thread confinement — Don't share state (ThreadLocal, stack-local vars, request-scoped beans).
- Stateless design — Lambdas, pure functions. Naturally thread-safe.
- Concurrent collections — ConcurrentHashMap, CopyOnWriteArrayList, BlockingQueue.
- Atomic variables — Lock-free updates via CAS.
- Synchronization — synchronized, ReentrantLock. Use as last resort due to contention cost.
> "The safest shared state is no shared state." Design for immutability first, then reach for synchronization only when necessary.
Lock-free algorithms guarantee at least one thread makes progress at any time, even if others are delayed. No locks = no deadlocks, no context switches on contention.
Non-blocking levels:
- Wait-free — every thread completes in bounded steps (strongest guarantee)
- Lock-free — at least one thread completes (others may retry)
- Obstruction-free — completes if no interference
// Lock-free counter using CAS loop
AtomicInteger counter = new AtomicInteger();
void increment() {
int current, next;
do {
current = counter.get();
next = current + 1;
} while (!counter.compareAndSet(current, next));
// retry if another thread changed value
}
Java's ConcurrentLinkedQueue, ConcurrentSkipListMap are lock-free implementations.
CAS is a CPU-level atomic instruction: "If memory location equals expected, update to new value atomically. Return success/fail."
// CAS pseudocode
boolean CAS(addr, expected, newValue) {
if (*addr == expected) { *addr = newValue; return true; }
return false; // atomic, no interruption possible
}
ABA Problem: Thread reads A, gets swapped to B and back to A by another thread, CAS succeeds but data changed in between. Solution: AtomicStampedReference or AtomicMarkableReference (adds version/stamp).
AtomicStampedReference<Integer> ref =
new AtomicStampedReference<>(initialVal, 0);
int[] stamp = new int[1];
Integer val = ref.get(stamp); // gets value AND stamp
ref.compareAndSet(val, newVal, stamp[0], stamp[0]+1);
False sharing occurs when two threads modify independent variables that happen to share the same CPU cache line (64 bytes). Every write by one thread invalidates the other thread's cache line, causing massive performance degradation.
// Bad: counter1 and counter2 likely on same cache line
class Counters {
long counter1 = 0; // 8 bytes
long counter2 = 0; // 8 bytes — SAME cache line!
}
// Fix: @Contended adds padding (jdk.internal.vm.annotation; sun.misc.Contended on Java 8)
class Counters {
@Contended long counter1; // padded to separate cache line
@Contended long counter2;
}
// Requires JVM flag: -XX:-RestrictContended
> LongAdder solves this by striping into separate Cell objects — each on its own cache line. sum() aggregates on read.
Brian Goetz formula: N_threads = N_cpu × U_cpu × (1 + W/C)
Where: N_cpu = CPU cores, U_cpu = target utilization (0–1), W = wait time (I/O), C = compute time.
| Task type | Formula result | Example |
|---|---|---|
| CPU-bound | N_cpu (or N_cpu + 1) | Image processing |
| 50/50 I/O | 2 × N_cpu | Mixed workload |
| I/O dominant (10:1) | 10 × N_cpu | Web scraping |
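Plugging assumed numbers into the formula — an 8-core box, full target utilization, and tasks that wait 900 ms on I/O per 100 ms of compute:
int nCpu = 8;                  // Runtime.getRuntime().availableProcessors()
double uCpu = 1.0;             // target utilization
double waitComputeRatio = 9.0; // W/C = 900 ms wait / 100 ms compute
int poolSize = (int) (nCpu * uCpu * (1 + waitComputeRatio)); // 8 × 1 × 10 = 80 threads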
> With virtual threads, pool sizing is no longer your concern for I/O-bound tasks — create a virtual thread per task. For CPU-bound, still use bounded pool ≈ CPU cores.
Always validate with load testing + monitoring (thread utilization, queue depth, latency p99).
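A Java 21 sketch of the virtual-thread-per-task approach from the note above (fetchRemote is an assumed blocking I/O call):
try (ExecutorService vexec = Executors.newVirtualThreadPerTaskExecutor()) {
    for (int i = 0; i < 10_000; i++) {
        vexec.submit(() -> fetchRemote()); // blocking is fine — the carrier thread is released while waiting
    }
} // close() implicitly shuts down and waits for submitted tasks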
Context switching = OS saves current thread's registers/stack, loads another thread's state. Typical cost: 1–10 microseconds plus cache pollution.
Voluntary switch: thread blocks (I/O, lock, sleep).
Involuntary switch: OS preempts running thread (time quantum expired).
Hidden costs:
- CPU cache flush — new thread accesses cold cache data
- TLB flush — virtual memory mappings change
- Branch predictor retraining
Reduce context switches by:
- Right-sizing thread pools (fewer threads = fewer switches)
- Batch work instead of many small tasks
- Use lock-free algorithms
- Virtual threads (much cheaper to switch — no OS involvement)
| Aspect | CPU-Bound | I/O-Bound |
|---|---|---|
| Bottleneck | Processor speed | Disk/network latency |
| Examples | Encryption, compression, ML inference | DB queries, HTTP calls, file reads |
| Thread count | ≈ CPU cores | Much more than CPU cores |
| Parallelism benefit | High (use all cores) | High (hide latency while waiting) |
| Virtual threads | No benefit | Massive benefit |
> Most web applications are I/O-bound (DB, external APIs). Virtual threads make thread-per-request viable at scale without reactive programming complexity.
Backpressure is a mechanism for a consumer to signal to a producer to slow down when it can't keep up. Without it, fast producers overwhelm slow consumers → OOM.
Java mechanisms:
- ArrayBlockingQueue — bounded; the producer blocks when it is full (natural backpressure)
- ThreadPoolExecutor.CallerRunsPolicy — the submitting thread executes the task itself, slowing the submission rate
- Reactive Streams spec (Publisher.subscribe() → Subscription.request(n)) — explicit demand signaling
- Semaphore — limit concurrent tasks
// CallerRunsPolicy as backpressure
new ThreadPoolExecutor(4, 4, 0L, MILLISECONDS,
new ArrayBlockingQueue<>(100),
new ThreadPoolExecutor.CallerRunsPolicy()); // submitter slows down
Structured concurrency treats a group of related tasks as a single unit of work. Child tasks cannot outlive their parent scope — mirrors structured programming's block scoping.
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
Supplier<User> user = scope.fork(() -> fetchUser(id));
Supplier<Orders> orders = scope.fork(() -> fetchOrders(id));
scope.join(); // wait for all
scope.throwIfFailed(); // propagate first failure
return new Dashboard(user.get(), orders.get());
} // scope closes: cancels any running subtasks on exit
ShutdownOnFailure — cancels all if any fails
ShutdownOnSuccess — cancels all when first succeeds (race pattern)
> No more leaked threads! In traditional code, if parent exits early, subtasks keep running invisibly. Structured concurrency prevents this structurally.
- Check-then-act race: if (!map.containsKey(k)) map.put(k, v) — use putIfAbsent()
- Read-modify-write race: count++ on a non-atomic field — use AtomicInteger
- Visibility bug: Thread B reads a stale value of a flag set by Thread A — use volatile
- Escaped this reference: publishing the object from its constructor before it is fully initialized (e.g., registering this as a listener in the constructor)
- Deadlock from inconsistent lock order — enforce a consistent ordering
- ThreadLocal leak: not calling remove() in a thread pool — values pollute the next request
- Synchronizing on the wrong object: synchronized(new Object()) inside a method — a new object per call means no mutual exclusion
- Double-checked locking without volatile: a pre-Java 5 bug — always declare the DCL field volatile
Concurrency bugs are non-deterministic — don't always appear in standard unit tests.
Strategies:
1. Stress testing — run with many threads for an extended time; look for failures
2. jcstress — the Java Concurrency Stress test framework from the OpenJDK team
3. Lincheck — model checking for concurrent data structures
4. Thread.sleep / CyclicBarrier — force specific interleavings in tests
5. CompletableFuture-based testing — verify async behavior
// Force an interleaving with a barrier
CyclicBarrier start = new CyclicBarrier(2);
// Both threads: start.await(), then the racy operation —
// much higher chance of exposing the race condition
> A passing test does NOT prove thread safety. Absence of bugs in testing ≠ absence of bugs. Code review + formal reasoning about happens-before is essential.
Design principles:
- Prefer immutability — immutable objects are inherently thread-safe
- Minimize shared state — share only what you must
- Document synchronization policy — @GuardedBy, @ThreadSafe annotations
- Use higher-level abstractions — prefer concurrent collections over manual locking
- Keep locks short — don't do I/O or expensive ops while holding a lock
Practical rules:
- Always release locks in finally blocks
- Never call alien methods (callbacks, plugins) while holding a lock
- Prefer executor services over raw threads
- Use @Immutable, @ThreadSafe, @NotThreadSafe (Java Concurrency in Practice annotations)
- For Java 21+: embrace virtual threads for I/O; structured concurrency for task groups
> 📖 Essential reading: "Java Concurrency in Practice" by Brian Goetz et al. Still the definitive reference.
| Aspect | Traditional Threads | Reactive (Reactor/RxJava) |
|---|---|---|
| I/O model | Blocking (thread waits) | Non-blocking, event-driven |
| Scalability | One thread per request | Few threads, millions of events |
| Complexity | Lower (imperative) | Higher (functional/reactive) |
| Backpressure | Manual (queues) | Built-in (Reactive Streams spec) |
| Debugging | Easier stack traces | Harder (async stack traces) |
> Java 21 Virtual Threads blur this distinction — you can write blocking-style code with near-reactive scalability. Reactive still wins for built-in backpressure and stream operators.
Thread dumps show the state of all threads at a point in time. Critical for diagnosing deadlocks, hangs, and high CPU.
How to capture:
- kill -3 <pid> on Linux
- jstack <pid>
- jcmd <pid> Thread.print
- VisualVM / JMC
What to look for:
- BLOCKED threads + "waiting to lock" → contention or deadlock
- "Found deadlock" section at bottom → exact cycle
- Many threads in WAITING on same object → potential bottleneck
- All pool threads RUNNABLE but high CPU → runaway loop or tight spin
> Take 3 thread dumps 5–10 seconds apart. Threads stuck in the same state across all dumps = root cause. Tools: fastthread.io, IBM TMDA, TDA.
Runtime detection: JVM detects deadlocks involving only intrinsic locks (synchronized). Check via jstack or programmatically:
ThreadMXBean bean = ManagementFactory.getThreadMXBean();
long[] deadlocked = bean.findDeadlockedThreads();
if (deadlocked != null) {
ThreadInfo[] info = bean.getThreadInfo(deadlocked, true, true);
// log info and alert
}
Prevention is better than detection:
- Consistent lock ordering
- tryLock with timeout
- A single lock where possible
- Avoid calling external code while holding a lock