Garbage collection is more than freeing unused memory — it’s about managing time, space, and safety while keeping performance predictable.
Different algorithms optimize for different workloads: responsiveness, throughput, or simplicity.
This note summarizes how the most common collectors work and the principles that make them correct and efficient.
Note
The algorithms below can be mixed. Most modern runtimes combine generational copying with concurrent marking or regional compaction.
Mark–Sweep Collection
This is the classic tracing algorithm and the foundation for most others.
Mark phase: Start from root references (globals, stacks, registers). Traverse every reachable object and mark it as live.
Sweep phase: Scan the entire heap, reclaiming unmarked (unreachable) objects.
Pros
Simple and space-efficient (no copying).
Works with large, complex object graphs.
Cons
Leaves fragmentation: free memory scattered in small chunks.
Full-heap traversal pauses the program.
Example
Suppose an application allocates 1 GB of objects but only 200 MB remains live.
Mark–sweep reclaims the rest, but allocation may still fail if no contiguous 50 MB block exists — fragmentation.
Tri-Color Invariant
Incremental and concurrent GCs extend mark–sweep using the tri-color abstraction.
Objects are divided into three sets:
White: not yet reached (potential garbage).
Grey: discovered but not fully scanned.
Black: scanned completely.
The critical invariant is:
No black object points to a white one.
Maintaining this prevents reachable objects from being lost during concurrent marking.
Write barriers ensure that if a black object writes a pointer to a white object, the white one is shaded grey.
Note
Barriers are small snippets of code the compiler or runtime inserts automatically to preserve correctness during mutation.
Copying Collection
The copying collector eliminates fragmentation by compacting live data as it collects.
How It Works
Divide the heap into two equal halves: from-space and to-space.
During collection, copy every reachable object from from-space into to-space.
Update references to point to the new locations.
Swap roles: to-space becomes the new from-space.
Benefits
Always leaves a contiguous free region — no fragmentation.
Naturally compacts memory and improves cache locality.
Collection time proportional to the number of live objects, not heap size.
Costs
Requires 2× memory (half the heap unused at a time).
Copying overhead for large object graphs.
Relocation invalidates pointers — GC must patch all references.
Tip
Copying collection shines when most objects are short-lived — typical in functional and object-oriented programs.
Generational Collection
Empirical studies show most objects die young — a principle known as the weak generational hypothesis.
Generational collectors exploit this by dividing the heap into young and old regions.
Mechanism
Minor collection — runs frequently on the young generation.
Copies survivors into a survivor space or promotes them to the old generation.
Major collection — scans the entire heap, including the old generation.
Occurs rarely.
Objects that survive several minor collections are assumed long-lived and promoted.
Write Barriers
Since old objects can reference young ones, the collector must track these links.
Write barriers record such updates to a remembered set or card table, ensuring the GC doesn’t miss reachable young objects during a minor collection.
Example
When an old object assigns a pointer into the young heap:
old->child = new_young;
The write barrier notes this so the young generation collector treats new_young as reachable.
Diagram Explanation — Tri-Color & Generational
gc_tricolor_and_generational.svg should illustrate:
Tri-color marking: white, grey, black sets with arrows showing write barriers maintaining the invariant.
Generational layout: two young “from” and “to” spaces beside an older generation region.
Arrows from old → young objects recorded in a remembered set.
A timeline of collection cycles (minor vs major).
This helps visualize both how GCs preserve correctness and why they focus effort on new allocations.
Pitfalls and Tuning
Warning
Barrier bugs break invariants, causing live objects to be reclaimed.
Promotion heuristics affect pause time and memory usage.
Large objects often bypass copying spaces and are managed separately.
Tenuring thresholds that are too short cause thrashing between generations.
.NET CLR: generational mark–compact with concurrent phases.
Go: concurrent mark–sweep with small heap fragments, tuned for low pause times.
Rust: none — relies on ownership for deterministic lifetime management.
Tip
Even in non-GC languages, these principles influence memory-safe ownership systems — reasoning about reachability and lifetime statically rather than dynamically.