What is a diff?
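A diff (short for difference) is a representation of what changed between two versions of a text. The original Unix diff command was created in 1974. Its default output marked removed lines with < and added lines with >; the - and + markers used by Git and most modern tools come from the later unified format. The underlying idea, a line-by-line list of removals and additions, is still the foundation of modern diff tools, including Git.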
At its core, a diff answers one question: given version A and version B, what is the shortest description of how to transform A into B? The fewer edits required, the better the diff algorithm has done its job.
The LCS algorithm explained
The key insight behind diff is finding the Longest Common Subsequence (LCS): the longest sequence of lines that appear in both files in the same order, though not necessarily next to each other. Every line in that common subsequence is "unchanged". Everything in the old file that is not part of the LCS is a removal; everything in the new file that is not part of the LCS is an addition.
Consider two short texts:
| Original | Modified |
|---|---|
| apple | apple |
| banana | cherry |
| cherry | banana |
| date | date |
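One LCS here is apple, cherry, date: three lines that appear in both files in the same order. (apple, banana, date is equally long, so the choice is not unique.) With that choice, banana falls outside the LCS in both files, so it shows as a removal at line 2 of the original and an addition at line 3 of the modified file. The diff output would keep apple, mark banana as removed (- banana), keep cherry, mark banana as added (+ banana), and keep date.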
The algorithm builds an m × n table (where m and n are the line counts of each file) and fills it with LCS lengths using dynamic programming. Backtracking through the table then reconstructs the edit sequence. This approach takes O(mn) time and space, which is fast enough for most files; very large files benefit from more advanced algorithms such as the Myers diff algorithm, whose cost grows with the number of differences rather than with the full table.
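The table-and-backtrack procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code any particular diff tool uses; the function names lcs_table and line_diff are made up for this example, and the tie-breaking rule in the backtracking step is one arbitrary choice among several valid ones.

```python
def lcs_table(a, b):
    """Return a (len(a)+1) x (len(b)+1) table where table[i][j] is the
    LCS length of the first i lines of a and the first j lines of b."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def line_diff(a, b):
    """Backtrack from the bottom-right corner of the table and return a
    list of (tag, line) pairs, where tag is ' ', '-' or '+'."""
    table = lcs_table(a, b)
    i, j = len(a), len(b)
    edits = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            edits.append((" ", a[i - 1]))      # line is part of the LCS
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or table[i][j - 1] >= table[i - 1][j]):
            edits.append(("+", b[j - 1]))      # line only in the new file
            j -= 1
        else:
            edits.append(("-", a[i - 1]))      # line only in the old file
            i -= 1
    edits.reverse()                            # backtracking runs end to start
    return edits


original = ["apple", "banana", "cherry", "date"]
modified = ["apple", "cherry", "banana", "date"]

for tag, line in line_diff(original, modified):
    print(f"{tag} {line}")
# Prints:
#   apple
# - banana
#   cherry
# + banana
#   date
```

With this tie-breaking rule the sketch keeps apple, cherry, date as the common subsequence. Flipping the comparison in the elif branch would keep apple, banana, date instead, which is an equally short diff.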
Reading diff output — the unified format
Most tools output diffs in unified format, which condenses the output by showing a few lines of context around each change rather than the entire file. Here is what each symbol means:
| Symbol | Meaning | Example |
|---|---|---|
| --- | Original file header | --- a/file.txt |
| +++ | Modified file header | +++ b/file.txt |
| @@ | Hunk header — line range in each file | @@ -3,6 +3,7 @@ |
| - | Line removed from original | - old line |
| + | Line added in modified | + new line |
| (space) | Unchanged context line | unchanged |
The hunk header @@ -3,6 +3,7 @@ means: the hunk covers 6 lines of the original starting at line 3, and 7 lines of the modified file starting at line 3. Both counts include the unchanged context lines, so the difference between them reflects a net gain of one line.
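To see these symbols in an actual diff, one option is Python's standard difflib module, which can emit unified format. The sketch below is only an illustration: the ten-line input and the helper names make_unified_diff and parse_hunk_header are invented for this example, and the file names a/file.txt and b/file.txt simply mirror the table above. Inserting one line after line 5 of a ten-line file, with the default three lines of context, happens to reproduce the @@ -3,6 +3,7 @@ header.

```python
import difflib
import re

# Matches "@@ -start[,count] +start[,count] @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def make_unified_diff(old_lines, new_lines):
    """Return the unified diff of two lists of lines as a list of strings."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="a/file.txt",
            tofile="b/file.txt",
            lineterm="",  # inputs have no trailing newlines, so add none
        )
    )


def parse_hunk_header(header):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))


original = [f"line {k}" for k in range(1, 11)]            # line 1 .. line 10
modified = original[:5] + ["new line"] + original[5:]     # insert after line 5

diff_lines = make_unified_diff(original, modified)
print("\n".join(diff_lines))
# Prints:
# --- a/file.txt
# +++ b/file.txt
# @@ -3,6 +3,7 @@
#  line 3
#  line 4
#  line 5
# +new line
#  line 6
#  line 7
#  line 8

print(parse_hunk_header(diff_lines[2]))
# Prints: (3, 6, 3, 7)
```

Lines 1, 2, 9 and 10 do not appear in the output because they lie more than three lines away from the change.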
How Git uses diff
Git stores complete snapshots of your project at each commit, not diffs. However, diff is used constantly at the surface level: git diff, git show, pull request views, and blame all compute diffs on the fly by comparing two stored snapshots.
Git's diff implementation (based on the Myers algorithm) has several practical enhancements:
- Patience diff: anchors the comparison on lines that occur exactly once in both files, which often produces more human-readable output when code is restructured.
- Histogram diff: an extension of patience diff that also handles lines occurring more than once. It is selected with git diff --histogram; Myers remains Git's default.
- Word-level diff: git diff --word-diff shows changes within lines rather than just whole lines.
- Rename detection: Git can detect that a file was moved and show a diff against its new path.
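The idea behind word-level diffing can be illustrated by running the same kind of sequence comparison on words instead of lines. The sketch below uses Python's difflib.SequenceMatcher and borrows the [-removed-] and {+added+} markers from Git's plain word-diff output. It is not Git's implementation: the function name word_diff is invented here, whitespace handling is simplified to splitting on spaces, and the spacing around the markers differs slightly from what Git prints.

```python
import difflib


def word_diff(old_line, new_line):
    """Return a single line marking removed words as [-...-] and
    added words as {+...+}, comparing the two inputs word by word."""
    old_words = old_line.split()
    new_words = new_line.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


print(word_diff("the quick brown fox", "the slow brown fox"))
# Prints: the [-quick-] {+slow+} brown fox
```

A line-level diff of the same input would report the whole line as removed and re-added, which hides the fact that only one word changed.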
When to use a diff tool
Diff tools are most useful in four situations:
- Code review — comparing a feature branch to main to understand what changed before merging.
- Configuration auditing — spotting what changed between two versions of a config file or deployment script.
- Document comparison — tracking edits between drafts of a report, contract, or article.
- Debugging regressions — narrowing down which change introduced a bug by diffing known-good and broken versions.