Linux and Unix diff command

Quick links

About diff
Syntax
Examples
Related commands
Linux and Unix main page

About diff

diff analyzes two files and prints the lines that are different. Essentially, it outputs a set of instructions for how to change one file in order to make it identical to the second file.

It does not actually change the files; however, it can optionally generate a script (with the -e option) for the program ed (or ex which can be used to apply the changes.

How diff Works

Let's say we have two files, file1.txt and file2.txt.

If file1.txt contains the following four lines of text:

I need to buy apples.
I need to run the laundry.
I need to wash the dog.
I need to get the car detailed.

...and file2.txt contains these four lines:

I need to buy apples.
I need to do the laundry.
I need to wash the car.
I need to get the dog detailed.

...then we can use diff to automatically display for us which lines differ between the two files with this command:

diff file1.txt file2.txt

...and the output will be:

2,4c2,4
< I need to run the laundry.
< I need to wash the dog.
< I need to get the car detailed.
---
> I need to do the laundry.
> I need to wash the car.
> I need to get the dog detailed.

Let's take a look at what this output means. The important thing to remember is that when diff is describing these differences to you, it's doing so in a prescriptive context: it's telling you how to change the first file to make it match the second file.

The first line of the diff output will contain:

  • line numbers corresponding to the first file,
  • a letter (a for add, c for change, or d for delete), and
  • line numbers corresponding to the second file.

In our output above, "2,4c2,4" means: "Lines 2 through 4 in the first file need to be changed in order to match lines 2 through 4 in the second file." It then tells us what those lines are in each file:

  • Lines preceded by a < are lines from the first file;
  • lines preceded by > are lines from the second file.

The three dashes ("---") merely separate the lines of file 1 and file 2.

Let's look at another example. Let's say our two files look like this:

file1.txt:

I need to go to the store.
I need to buy some apples.
When I get home, I'll wash the dog.

file2.txt:

I need to go to the store.
I need to buy some apples.
Oh yeah, I also need to buy grated cheese.
When I get home, I'll wash the dog.
diff file1.txt file2.txt

Output:

2a3
> Oh yeah, I also need to buy grated cheese.

Here, the output is telling us "After line 2 in the first file, a line needs to be added: line 3 from the second file." It then shows us what that line is.

Now let's see what it looks like when diff tells us we need to delete a line.

file1:

I need to go to the store.
I need to buy some apples.
When I get home, I'll wash the dog.
I promise.

file2:

I need to go to the store.
I need to buy some apples.
When I get home, I'll wash the dog.

Our command:

diff file1.txt file2.txt

The output:

4d3
< I promise.

Here, the output is telling us "You need to delete line 4 in the first file so that both files sync up at line 3." It then shows us the contents of the line that needs to be deleted.

Viewing diff Output In Context

The examples above show the default output of diff. It's intended to be read by a computer, not a human, so for human purposes, sometimes it helps to see the context of the changes.

GNU diff, which is the version most linux users will be using, offers two different ways to do this: "context mode" and "unified mode".

To view differences in context mode, use the -c option. For instance, let's say file1.txt and file2.txt contain the following:

file1.txt:

apples
oranges
kiwis
carrots

file2.txt:

apples
kiwis
carrots
grapefruits

Let's look at the contextual output for the diff of these two files. Our command is:

diff -c file1.txt file2.txt

And our output looks like this:

*** file1.txt   2014-08-21 17:58:29.764656635 -0400
--- file2.txt   2014-08-21 17:58:50.768989841 -0400
***************
*** 1,4 ****
  apples
- oranges
  kiwis
  carrots
--- 1,4 ----
  apples
  kiwis
  carrots
+ grapefruits

The first two lines of this output show us information about our "from" file (file 1) and our "to" file (file 2). It lists the filename, modification date, and modification time of each of our files, one per line. The "from" file is indicated by "***", and the "to" file is indicated by "---".

The line "***************" is just a separator.

The next line has three asterisks ("***") followed by a line range from the first file (in this case lines 1 through 4, separated by a comma). Then four asterisks ("****").

Then it shows us the contents of those lines. If the line is unchanged, it's simply prefixed by two spaces. If the line is changed, however, it's prefixed by an indicative character and a space. The character meanings are as follows:

character meaning
! Indicates that this line is part of a group of one or more lines that needs to change. There is a corresponding group of lines prefixed with "!" in the other file's context as well.
+ Indicates a line in the second file that needs to be added to the first file.
- Indicates a line in the first file that needs to be deleted.

After the lines from the first file, there are three dashes ("---"), then a line range, then four dashes ("----"). This indicates the line range in the second file that will sync up with our changes in the first file.

If there is more than one section that needs to change, diff will show these sections one after the other. Lines from the first file will still be indicated with "***", and lines from the second file with "---".

Unified Mode

Unified mode (the -u option) is similar to context mode, but it doesn't display any redundant information. Here's an example, using the same input files as our last example:

file1.txt:

apples
oranges
kiwis
carrots

file2.txt:

apples
kiwis
carrots
grapefruits

Our command:

diff -u file1.txt file2.txt

The output:

--- file1.txt   2014-08-21 17:58:29.764656635 -0400
+++ file2.txt   2014-08-21 17:58:50.768989841 -0400
@@ -1,4 +1,4 @@
 apples
-oranges
 kiwis
 carrots

The output is similar to above, but as you can see, the differences are "unified" into one set.

Finding Differences In Directory Contents

diff can also compare directories; simply provide directory names instead of filenames. See the Examples section.

Using diff To Create An Editing Script

The -e option tells diff to output a script, which can be used by the editing programs ed or ex, that contains a sequence of commands. The commands are a combination of c (change), a (add), and d (delete) which, when executed by the editor, will modify the contents of file1 (the first file specified on the diff command line) so that it matches the contents of file2 (the second file specified).

Let's say we have two files with the following contents:

file1.txt:

Once upon a time, there was a girl named Persephone.
She had black hair.
She loved her mother more than anything.
She liked to sit outside in the sunshine with her cat, Daisy.
She dreamed of being a painter when she grew up.

file2.txt

Once upon a time, there was a girl named Persephone.
She had red hair.
She loved chocolate chip cookies more than anything.
She liked to sit outside in the sunshine with her cat, Daisy.
She would look up into the clouds and dream of being a world-famous baker.

We can run the following command to analyze the two files with diff and produce a script to create a file identical to file2.txt from the contents of file1.txt:

diff -e file1.txt file2.txt

...and the output will look like this:

5c
She would look up into the clouds and dream of being a world-famous baker.
.
2,3c
She had red hair.
She loved chocolate chip cookies more than anything.
.

Notice that the changes are listed in reverse order: the changes closer to the end of the file are listed first, and changes closer to the beginning of the file are listed last. This is to preserve line numbering; if we made the changes at the beginning of the file first, that might change the line numbers later in the file. So the script starts at the end, and works backwards.

Here, the script is telling the editing program: "change line 5 to (the following line), and change lines 2 through 3 to (the following two lines)."

Next, we should save the script to a file. We can redirect the diff output to a file using the > operator, like this:

diff -e file1.txt file2.txt > my-ed-script.txt

This will not display anything on the screen (unless there is an error); instead, the output is redirected to the file my-ed-script.txt. If my-ed-script.txt doesn't exist, it will be created; if it exists already, it will be overwritten.

If we now check the contents of my-ed-script.txt with the cat command...

cat my-ed-script.txt

...we will see the same script we saw displayed above.

There's still one thing missing, though: we need the script to tell ed to actually write the file. All that's missing from the script is the w command, which will write the changes. We can add this to our script by echoing the letter "w" and using the >> operator to add it to our file. (The >> operator is similar to the > operator. It redirects output to a file, but instead of overwriting the destination file, it appends to the end of the file.) The command looks like this:

echo "w" >> my-ed-script.txt

Now, we can check to see that our script has changed by running the cat command again:

cat my-ed-script.txt
5c
She would look up into the clouds and dream of being a world-famous baker.
.
2,3c
She had red hair.
She loved chocolate chip cookies more than anything.
.
w

Now our script, when issued to ed, will make the changes and write the changes to disk.

So how do we get ed do do this?

We can issue this script to ed with the following command, telling it to overwrite our original file. The dash ("-") tells ed to read from the standard input, and the < operator directs our script to that input. In essence, the system enters whatever is in our script as input to the editing program. The command looks like this:

ed - file1.txt < my-ed-script.txt

This command displays nothing, but if we look at the contents of our original file...

cat file1.txt
Once upon a time, there was a girl named Persephone.
She had red hair.
She loved chocolate chip cookies more than anything.
She liked to sit outside in the sunshine with her cat, Daisy.
She would look up into the clouds and dream of being a world-famous baker.

...we can see that file1.txt now matches file2.txt exactly.

Warning! In this example, ed overwrote the contents of our original file, file1.txt. After running the script, the original text of file1.txt disappears, so make sure you understand what you're doing before running these commands!

Commonly-Used diff Options

Here are some useful diff options to take note of:

-b Ignore any changes which only change the amount of whitespace (such as spaces or tabs).
-w Ignore whitespace entirely.
-B Ignore blank lines when calculating differences.
-y Display output in two columns.

These are only some of the most commonly-used diff options. What follows is a complete list of diff options and their function.

Command Syntax

diff [OPTION]... FILES

Options

--normal output a "normal" diff. (This is the default)
-q, --brief Produce output only when files differ. If there are no differences, output nothing.
-s, --report-identical-files Report when two files are the same.
-c, -C NUM, --context[=NUM] Provide NUM (default 3) lines of context.
-u, -U NUM, --unified[=NUM] Provide NUM (default 3) lines of unified context.
-e, --ed Output an ed script.
-n, --rcs Output an RCS-format diff.
-y, --side by side Format output in two columns.
-W, --width=NUM Output at most NUM (default 130) print columns.
--left-column Output only the left column of common lines.
--suppress-common-lines Do not output lines common between the two files.
-p, --show-c-function For files that contain C code, also show which C function each change is in.
-F, --show-function-line=RE Show the most recent line matching regular expression RE.
--label LABEL When displaying output, use the label LABEL instead of the file name. This option can be issued more than once for multiple labels.
-t, --expand-tabs Expand tabs to spaces in output.
-T, --initial-tab Make tabs line up by prepending a tab if necessary.
--tabsize=NUM Define a tab stop as NUM (default 8) columns.
--suppress-blank-empty Suppress spaces and/or tabs before empty output lines.
-l, --paginate Pass output through pr to paginate it.
-r, --recursive Recursively compare any subdirectories found.
-N, --new-file If a specified file does not exist, perform the diff as if it is an empty file.
--unidirectional-new-file Same as -n, but only applies to the first file.
--ignore-file-name-case Ignore case when comparing file names.
--no-ignore-file-name-case Consider case when comparing file names.
-x, --exclude=PAT Exclude files that match filename pattern PAT.
-X, --exclude-from=FILE Exclude files that match any filename pattern in file FILE.
-S, --starting-file=FILE Start with file FILE when comparing directories.
--from-file=FILE1 Compare FILE1 to all operands; FILE1 can be a directory.
--to-file=FILE2 compare all operands to FILE2; FILE2 can be a directory.
-i, --ignore-case ignore case differences in file contents.
-E, --ignore-tab-expansion ignore changes due to tab expansion.
-b, --ignore-space-change ignore changes in the amount of white space.
-w, --ignore-all-space ignore all white space.
-B, --ignore-blank-lines ignore changes whose lines are all blank.
-I, --ignore-matching-lines=RE ignore changes whose lines all match regular expression RE.
-a, --text treat all files as text.
--strip-trailing-cr strip trailing carriage return on input.
-D, --ifdef=NAME output merged file with "#ifdef NAME" diffs.
--GTYPE-group-format=GFMT format GTYPE input groups with GFMT.
--line-format=LFMT format all input lines with LFMT.
--LTYPE-line-format=LFMT format LTYPE input lines with LFMT.

These format options provide fine-grained control over the output of diff, generalizing -D/--ifdef.

LTYPE is old, new, or unchanged.

GTYPE can be any of the LTYPE values, or the value changed.

GFMT (but not LFMT) may contain:

%< lines from FILE1
%> lines from FILE2
%= lines common to FILE1 and FILE2.
%[-][WIDTH][.[PREC]]{doxX}LETTER printf-style spec for LETTER
LETTERs are as follows for new group, lower case for old group:

F first line number
L last line number
N number of lines = L - F + 1
E F - 1
M L + 1
%(A=B?T:E) if A equals B then T else E
LFMT (only) may contain:

%L contents of line
%l contents of line, excluding any trailing newline
%[-][WIDTH][.[PREC]]{doxX}n printf-style spec for input line number
Both GFMT and LFMT may contain:

%% A literal %
%c'C' the single character C
%c'\OOO' the character with octal code OOO
C the character C (other characters simply represent themselves)
-d, --minimal try hard to find a smaller set of changes.
--horizon-lines=NUM keep NUM lines of the common prefix and suffix.
--speed-large-files assume large files and many scattered small changes.
--help display a help message and exit.
-v, --version output version information and exit.

FILES takes the form "FILE1 FILE2" or "DIR1 DIR2" or "DIR FILE..." or "FILE... DIR".

If the --from-file or --to-file options are given, there are no restrictions on FILE(s). If a FILE is a dash ("-"), diff reads from standard input.

Exit status is either 0 if inputs are the same, 1 if different, or 2 if diff encounters any trouble.

Examples

Here's an example of using diff to examine the differences between two files side by side using the -y option, given the following input files:

file1.txt:

apples
oranges
kiwis
carrots

file2.txt:

apples
kiwis
carrots
grapefruits
diff -y file1.txt file2.txt

Output:

apples            apples
oranges         <
kiwis             kiwis
carrots           carrots
                > grapefruits

And as promised, here is an example of using diff to compare two directories:

diff dir1 dir2

Output:

Only in dir1: tab2.gif
Only in dir1: tab3.gif
Only in dir1: tab4.gif
Only in dir1: tape.htm
Only in dir1: tbernoul.htm
Only in dir1: tconner.htm
Only in dir1: tempbus.psd

Related commands

bdiff — Identify the differences between two very big files.
cmp — Compare two files byte by byte.
comm — Compare two sorted files line by line.
dircmp — Compare the contents of two directories, listing unique files.
ed — A simple text editor.
pr — Format a text file for printing.
ls — List the contents of a directory or directories.
sdiff — Compare two files, side-by-side.