How diff works — comparing files line by line

What is a diff?

A diff (short for difference) is a representation of what changed between two versions of a text. The original Unix diff command dates from 1974. Its default output marked removed lines with < and added lines with >; the - and + markers most people recognize today come from the later unified format, which is what modern diff tools, including Git, produce by default.

At its core, a diff answers one question: given version A and version B, what is the shortest description of how to transform A into B? The fewer edits required, the better the diff algorithm has done its job.

The LCS algorithm explained

The key insight behind diff is finding the Longest Common Subsequence (LCS): the longest sequence of lines that appear in both files in the same order, though not necessarily next to each other. Every line in that common subsequence is "unchanged". Everything in the old file that is not part of the LCS is a removal; everything in the new file that is not part of the LCS is an addition.

Consider two short texts:

Original          Modified
─────────         ─────────
apple             apple
banana            cherry
cherry            banana
date              date

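One LCS here is apple, cherry, date: three lines that appear in both files in order. (apple, banana, date is equally long, so the algorithm has to pick one of the two.) With the first choice, banana is the line left over, so it shows as a removal from position 2 and an addition at position 3. The diff output would be: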

 apple
-banana
 cherry
+banana
 date

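The algorithm builds a table with one row per line of the first file and one column per line of the second (m × n cells, plus one extra row and column for the empty case) and fills it with LCS lengths using dynamic programming. Backtracking through the table then reconstructs the edit sequence. This approach takes O(mn) time and space, which is fast enough for most files. For very large files, tools turn to more advanced algorithms such as Myers diff, whose running time is O(ND), where N is the combined length of the two files and D is the number of differing lines, so it is quick when the files are mostly alike.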
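The table-and-backtrack procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code of any particular diff tool: the function name lcs_diff and the (tag, line) output shape are choices made for this example, and the table is filled from the end of each file so that the walk can run front to back.

```python
def lcs_diff(old, new):
    """Return a list of (tag, line) pairs, where tag is ' ', '-' or '+'."""
    m, n = len(old), len(new)

    # table[i][j] holds the LCS length of old[i:] and new[j:].
    # The extra row and column represent the empty remainder.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the edit script.
    edits = []
    i = j = 0
    while i < m and j < n:
        if old[i] == new[j]:
            edits.append((" ", old[i]))   # line is part of the LCS
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            edits.append(("-", old[i]))   # dropping the old line loses nothing
            i += 1
        else:
            edits.append(("+", new[j]))   # otherwise take the new line
            j += 1

    # Whatever is left on either side is pure removal or pure addition.
    edits.extend(("-", line) for line in old[i:])
    edits.extend(("+", line) for line in new[j:])
    return edits


old = ["apple", "banana", "cherry", "date"]
new = ["apple", "cherry", "banana", "date"]
for tag, line in lcs_diff(old, new):
    print(tag + line)
```

For the fruit example, this keeps apple, cherry and date and reports banana as removed and then re-added after cherry. The tie-breaking rule (prefer a removal when both directions are equally good) is what selects that LCS over apple, banana, date; a different rule would give an equally short but different diff.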

Reading diff output — the unified format

Most tools output diffs in unified format, which condenses the output by showing a few lines of context around each change rather than the entire file. Here is what each symbol means:

Symbol	Meaning	Example
---	Original file header	`--- a/file.txt`
+++	Modified file header	`+++ b/file.txt`
@@	Hunk header — line range in each file	`@@ -3,6 +3,7 @@`
-	Line removed from original	`-old line`
+	Line added in modified	`+new line`
(space)	Unchanged context line	` unchanged`

The hunk header @@ -3,6 +3,7 @@ means: starting at line 3 in the original, 6 lines are shown; starting at line 3 in the modified, 7 lines are shown (a net gain of one line). A single hunk from a unified diff looks like this:

@@ -1,4 +1,5 @@
 apple
-banana
 cherry
+banana
+elderberry
 date
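To see a complete unified diff, file headers included, you can generate one with Python's standard difflib module. This is only a sketch: the a/file.txt and b/file.txt labels are placeholders chosen for the example, and difflib uses its own matching heuristic rather than a strict LCS, so it may mark different lines as moved than the hunk shown above.

```python
import difflib

old = ["apple", "banana", "cherry", "date"]
new = ["apple", "cherry", "banana", "elderberry", "date"]

# lineterm="" because the input lines carry no trailing newlines.
diff_lines = list(difflib.unified_diff(
    old,
    new,
    fromfile="a/file.txt",   # placeholder label for the original
    tofile="b/file.txt",     # placeholder label for the modified version
    lineterm="",
))

print("\n".join(diff_lines))
```

The first three lines of the output are the --- header, the +++ header and the hunk header @@ -1,4 +1,5 @@. Whichever lines the tool decides to mark, the counts must agree with that header: the context and - lines add up to 4, and the context and + lines add up to 5.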

How Git uses diff

Git stores complete snapshots of your project at each commit, not diffs. However, diff is used constantly at the surface level: git diff, git show, pull request views, and blame all compute diffs on the fly by comparing two stored snapshots.

Git's default diff algorithm is a variant of Myers, and Git layers several alternatives and enhancements on top of it:

Patience diff: first matches lines that appear exactly once in both files and uses them as anchors, which often gives more readable output when code is restructured (git diff --patience).
Histogram diff: an extension of patience diff that also handles lines repeated a small number of times. It is not Git's default, but you can select it with git diff --histogram or the diff.algorithm setting.
Word-level diff: git diff --word-diff shows changes within lines rather than just whole lines.
Rename detection: Git can detect that a file was moved and show a diff against its new path.
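A word-level diff is the same idea applied to words instead of lines. The sketch below is not Git's implementation: it splits on whitespace, relies on Python's difflib for the matching, and borrows the [-removed-] and {+added+} markers from Git's plain word-diff output with simplified spacing. The function name word_diff is made up for this example.

```python
import difflib


def word_diff(old_line, new_line):
    """Mark word-level changes between two lines of text."""
    old_words = old_line.split()
    new_words = new_line.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("timeout = 30 seconds", "timeout = 60 seconds"))
# prints: timeout = [-30-] {+60+} seconds
```

A line-based diff would report the whole line as removed and re-added; here only the changed value is flagged, which is why word-level output suits prose and configuration files.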

Git does not record renames in its history. It infers them at diff time by comparing file contents: when a deleted file and an added file are at least 50% similar (the default threshold, adjustable with -M or --find-renames), Git reports the pair as a rename.
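The inference step can be sketched as follows. Treat this as an illustration of the idea only: Git computes its similarity index differently, and the line-based difflib ratio used here is a stand-in. The names similarity, detect_renames and RENAME_THRESHOLD are hypothetical, as are the sample file names.

```python
import difflib

RENAME_THRESHOLD = 0.5  # mirrors the 50% default described above


def similarity(old_text, new_text):
    """Rough line-based similarity in [0, 1]; a stand-in, not Git's metric."""
    return difflib.SequenceMatcher(
        a=old_text.splitlines(), b=new_text.splitlines(), autojunk=False
    ).ratio()


def detect_renames(deleted, added, threshold=RENAME_THRESHOLD):
    """Pair each deleted path with its most similar added path.

    `deleted` and `added` map a path to that file's content. A pair is
    reported only if its similarity reaches the threshold.
    """
    renames = {}
    unclaimed = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in unclaimed.items():
            score = similarity(old_text, new_text)
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames[old_path] = (best_path, best_score)
            del unclaimed[best_path]  # each added file is matched at most once
    return renames


deleted = {"utils.py": "import os\n\ndef read(p):\n    return open(p).read()\n"}
added = {
    "helpers.py": "import os\n\ndef read(p):\n    with open(p) as f:\n        return f.read()\n",
    "notes.txt": "todo\n",
}
for old_path, (new_path, score) in detect_renames(deleted, added).items():
    print(f"{old_path} -> {new_path} ({score:.0%} similar)")
```

Here utils.py and helpers.py share three lines, which puts them above the threshold, so the pair is reported as a rename even though the content was edited. notes.txt shares nothing with the deleted file and stays a plain addition.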

When to use a diff tool

Diff tools are most useful in four situations:

Code review — comparing a feature branch to main to understand what changed before merging.
Configuration auditing — spotting what changed between two versions of a config file or deployment script.
Document comparison — tracking edits between drafts of a report, contract, or article.
Debugging regressions — narrowing down which change introduced a bug by diffing known-good and broken versions.

Tip: When reviewing a large diff, focus on the hunk headers first. They tell you which parts of the file changed so you can jump straight to the relevant sections rather than reading every context line.
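If you have the diff as text, pulling out the hunk headers is easy to script. The following is a minimal sketch that assumes standard unified-format input; list_hunks and HUNK_RE are names invented for this example, and the sample diff is made up.

```python
import re

# Matches "@@ -start[,count] +start[,count] @@"; the count may be omitted.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def list_hunks(diff_text):
    """Return (old_start, old_count, new_start, new_count) for each hunk."""
    hunks = []
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if match:
            old_start, old_count, new_start, new_count = match.groups()
            hunks.append((
                int(old_start),
                int(old_count) if old_count is not None else 1,  # omitted count means 1
                int(new_start),
                int(new_count) if new_count is not None else 1,
            ))
    return hunks


sample = """\
--- a/file.txt
+++ b/file.txt
@@ -3,6 +3,7 @@
 context
+added
@@ -40 +41,2 @@ def helper():
-old
+new
+newer
"""

for old_start, old_count, new_start, new_count in list_hunks(sample):
    print(f"old lines {old_start}+{old_count}, new lines {new_start}+{new_count}")
```

For the sample, this reports one hunk starting at line 3 and another starting at line 40 of the original, which is the overview you want before reading any individual change.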
