What is a diff?

A diff (short for difference) is a representation of what changed between two versions of a text. The original Unix diff command dates back to 1974, and it marked changed lines with < and >. The unified format that came later, which marks removed lines with - and added lines with +, is the output format most modern diff tools use, including Git.

At its core, a diff answers one question: given version A and version B, what is the shortest description of how to transform A into B? The fewer edits required, the better the diff algorithm has done its job.

The LCS algorithm explained

The key insight behind diff is finding the Longest Common Subsequence (LCS): the longest sequence of lines that appear in both files in the same order, though not necessarily next to each other. Every line in that common subsequence is "unchanged". Everything in the old file that is not part of the LCS is a removal; everything in the new file that is not part of the LCS is an addition.

Consider two short texts:

Original          Modified
─────────         ─────────
apple             apple
banana            cherry
cherry            banana
date              date

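One LCS here is apple, cherry, date: three lines that appear in both files in the same order. (apple, banana, date is equally long, so a diff tool may legitimately pick either.) With the first choice, banana is the line left over, so it shows as a removal from position 2 and an addition at position 3. The diff output would be: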

 apple
-banana
 cherry
+banana
 date

The algorithm builds an m × n table (where m and n are the line counts of each file) and fills it with LCS lengths using dynamic programming. Backtracking through the table then reconstructs the edit sequence. This approach is O(mn) in time and space — fast enough for most files, though very large files may benefit from more advanced variants like the Myers diff algorithm.
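
The table-and-backtrack procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular diff tool is implemented: the function name lcs_diff is made up for this example, the table has one extra row and column to hold the empty-suffix case, and ties are broken in favor of removals.

```python
def lcs_diff(old, new):
    """Return (marker, line) pairs: ' ' unchanged, '-' removed, '+' added."""
    m, n = len(old), len(new)

    # table[i][j] holds the LCS length of old[i:] and new[j:].
    # The extra row and column stay 0 and represent an empty suffix.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the edit sequence.
    edits = []
    i = j = 0
    while i < m and j < n:
        if old[i] == new[j]:
            edits.append((" ", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping the old line loses nothing from the LCS.
            edits.append(("-", old[i]))
            i += 1
        else:
            edits.append(("+", new[j]))
            j += 1

    # Whatever remains on either side is a pure removal or addition.
    edits.extend(("-", line) for line in old[i:])
    edits.extend(("+", line) for line in new[j:])
    return edits


if __name__ == "__main__":
    original = ["apple", "banana", "cherry", "date"]
    modified = ["apple", "cherry", "banana", "date"]
    for marker, line in lcs_diff(original, modified):
        print(marker + line)
```

Run on the apple/banana example, this sketch marks banana as removed before cherry and added after it, and leaves the other three lines unchanged.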

Reading diff output — the unified format

Most tools output diffs in unified format, which condenses the output by showing a few lines of context around each change rather than the entire file. Here is what each symbol means:

Symbol    Meaning                                 Example
───────   ─────────────────────────────────────   ───────────────
---       Original file header                    --- a/file.txt
+++       Modified file header                    +++ b/file.txt
@@        Hunk header (line range in each file)   @@ -3,6 +3,7 @@
-         Line removed from original              - old line
+         Line added in modified                  + new line
(space)   Unchanged context line                    unchanged
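
To see these symbols in real output, a unified diff can be generated with difflib.unified_diff from Python's standard library. The file names a/file.txt and b/file.txt are placeholders chosen for this sketch. Note that difflib uses its own matching heuristic, so the edits it picks may differ from those of an LCS-based tool or Git while still describing the same change.

```python
import difflib

old = ["apple", "banana", "cherry", "date"]
new = ["apple", "cherry", "banana", "elderberry", "date"]

# lineterm="" because the input lines carry no trailing newlines.
diff_lines = list(
    difflib.unified_diff(
        old,
        new,
        fromfile="a/file.txt",  # placeholder name for the original
        tofile="b/file.txt",    # placeholder name for the modified version
        lineterm="",
    )
)

print("\n".join(diff_lines))
```

The first two lines of the result are the --- and +++ file headers, the third is the hunk header, and every remaining line starts with -, + or a space.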

The hunk header @@ -3,6 +3,7 @@ means: the hunk covers 6 lines of the original starting at line 3, and 7 lines of the modified file starting at line 3, so the change adds one line on balance. A hunk from a real unified diff looks like this:

@@ -1,4 +1,5 @@
 apple
-banana
 cherry
+banana
+elderberry
 date
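
Hunk headers are regular enough to parse with a short regular expression. The sketch below is one possible way to do it; the names HUNK_HEADER and parse_hunk_header are invented for this example. It relies on one detail of the unified format: when a count is omitted, the range is exactly one line long.

```python
import re

# Matches "@@ -start[,count] +start[,count] @@", optionally followed by text.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count), or None if no match."""
    match = HUNK_HEADER.match(line)
    if match is None:
        return None
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,  # omitted count means 1
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


if __name__ == "__main__":
    print(parse_hunk_header("@@ -3,6 +3,7 @@"))
```

For the header discussed above, the function returns the four numbers 3, 6, 3 and 7; subtracting the old count from the new count gives the net number of lines added by the hunk.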

How Git uses diff

Git stores complete snapshots of your project at each commit, not diffs. However, diff is used constantly at the surface level: git diff, git show, pull request views, and blame all compute diffs on the fly by comparing two stored snapshots.

Git's diff implementation (based on the Myers algorithm) adds several practical enhancements. One of them is rename detection:

Git does not track renames natively. Instead, it infers them at diff time by comparing file contents: if a newly added file is more than 50% similar to a deleted file (the default threshold), Git reports the pair as a rename.
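
The idea of similarity-based rename detection can be illustrated with a rough sketch. This is not Git's algorithm: Git computes its own similarity index, while the example below substitutes the ratio from Python's difflib.SequenceMatcher over lines as a stand-in. The function name looks_like_rename and the sample file contents are made up for illustration.

```python
import difflib


def looks_like_rename(deleted_text, added_text, threshold=0.5):
    """Guess whether an added file is a renamed copy of a deleted one.

    Uses difflib's line-based similarity ratio as a stand-in for Git's
    similarity index; the two measures are not the same.
    """
    matcher = difflib.SequenceMatcher(
        None, deleted_text.splitlines(), added_text.splitlines()
    )
    return matcher.ratio() > threshold


if __name__ == "__main__":
    deleted = "apple\nbanana\ncherry\ndate\n"
    added = "apple\nbanana\ncherry\nelderberry\n"
    print(looks_like_rename(deleted, added))
```

With three of four lines shared, the two sample files clear the 50% threshold and would be paired as a rename; two files with no lines in common would be reported as an unrelated deletion and addition.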

When to use a diff tool

Diff tools are most useful in four situations:

Tip: When reviewing a large diff, focus on the hunk headers first. They tell you which parts of the file changed so you can jump straight to the relevant sections rather than reading every context line.