join Command in Linux



The join command in Linux is a versatile utility that combines two files based on a common field. It's particularly useful when dealing with data that's related but stored in separate files.



Understanding the join Command

The join command reads two files and, for each pair of lines whose join fields are identical, writes one combined line to standard output. The result resembles an inner join in a relational database: by default, lines without a match in the other file are omitted.

The options described below control which field acts as the key, how fields are separated, which fields are printed, and whether unmatched lines are included.

How to Use join Command?

The default behavior of the join command is to take the first field of each file as the key for joining. However, with the options listed below, you can customize the behavior to suit your specific needs.

For example, to join two files on the second field of each, use -1 2 -2 2 (or the shorthand -j 2). To output only the unpairable lines from file 1, use -v 1; to output them in addition to the joined lines, use -a 1.

Syntax of join Command

join [options] file1 file2

Options of join Command

Option              Description
-1 FIELD            Join on field FIELD of file 1.
-2 FIELD            Join on field FIELD of file 2.
-j FIELD            Join on field FIELD of both files. This is equivalent to specifying -1 FIELD -2 FIELD.
-a FILENUM          Also print the unpairable lines from file FILENUM (1 or 2), in addition to the joined lines.
-v FILENUM          Like -a FILENUM, but suppress the joined lines, so that only the unpairable lines from file FILENUM are printed.
-e EMPTY            Replace missing input fields with the string EMPTY. Mostly useful together with -a and -o.
-i, --ignore-case   Ignore differences in case when comparing the join fields.
-o FORMAT           Build each output line according to FORMAT, a comma- or blank-separated list of FILENUM.FIELD specifiers (e.g., 1.2). The specifier 0 stands for the join field.
-t CHAR             Use CHAR as the field separator for both input and output. By default, fields are separated by blanks.
--check-order       Check that the input is correctly sorted, even if all input lines are pairable.
--nocheck-order     Do not check that the input is correctly sorted.
--help              Display a help message and exit.
--version           Display version information and exit.

By mastering these options, you can efficiently combine data from multiple sources, making the join command a powerful tool in your Linux toolkit.

It's important to note that the join command requires both input files to be sorted on the join field. If they are not, sort them beforehand with the sort command. The --nocheck-order option only turns off the sort-order check; it does not make join produce correct results on unsorted files.
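
The sorting requirement can be sketched as follows. The file names are hypothetical, the files are created in a temporary directory, and LC_ALL=C is an assumption made here so that sort and join agree on the same byte-wise ordering.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)"

# Hypothetical unsorted inputs.
printf '3 HEMANT\n1 AAYUSH\n2 APAAR\n' > names_unsorted.txt
printf '2 102\n3 103\n1 101\n' > ids_unsorted.txt

# Sort both files on the join field (field 1) before joining.
LC_ALL=C sort -k1,1 names_unsorted.txt > names_sorted.txt
LC_ALL=C sort -k1,1 ids_unsorted.txt > ids_sorted.txt

result=$(LC_ALL=C join names_sorted.txt ids_sorted.txt)
printf '%s\n' "$result"
# Prints:
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103
```

Using the same locale for both sort and join avoids mismatches between the order sort produces and the order join expects.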

Examples of join Command in Linux

Take a look at the following examples to get a clear understanding of how the join command works in Linux −

  • Basic Usage
  • Specifying the Join Field
  • Joining on Different Fields
  • Including Unpairable Lines
  • Changing the Output Format
  • Using a Different Field Separator
  • Case-Insensitive Joining
  • Checking for Sorted Input
  • Redirecting Output to a File

Basic Usage

The simplest form of the join command is when you have two files with a common field, usually the first column. Consider two files, file1.txt and file2.txt, with the following content −

file1.txt

1 AAYUSH
2 APAAR
3 HEMANT
4 KARTIK

file2.txt

1 101
2 102
3 103
4 104

To join these files, you would use −

join file1.txt file2.txt
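
As a runnable sketch of this example, the two files can be recreated in a temporary directory (an assumption made here to keep the example self-contained) and joined on their first field:

```shell
cd "$(mktemp -d)"

# Recreate the two sample files shown above.
printf '1 AAYUSH\n2 APAAR\n3 HEMANT\n4 KARTIK\n' > file1.txt
printf '1 101\n2 102\n3 103\n4 104\n' > file2.txt

# Join on the first field of each file (the default).
result=$(join file1.txt file2.txt)
printf '%s\n' "$result"
# Prints:
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103
# 4 KARTIK 104
```

Each output line starts with the join field, followed by the remaining fields of file1.txt and then those of file2.txt.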

Specifying the Join Field

If the common field is not the first column, you can specify the join field using the -1 and -2 options followed by the field number. For example −

join -1 2 -2 1 file1.txt file2.txt
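
The sample files from the basic example share their key in the first column, so this command would find no matches on them. A minimal sketch with a hypothetical file whose key sits in the second column (names_by_id.txt and ids.txt are illustrative names, not part of the original example):

```shell
cd "$(mktemp -d)"

# Hypothetical input: the ID is the SECOND field here, and the file is sorted on it.
printf 'AAYUSH 1\nAPAAR 2\nHEMANT 3\n' > names_by_id.txt
# The ID is the first field here.
printf '1 101\n2 102\n3 103\n' > ids.txt

# Use field 2 of the first file and field 1 of the second file as the key.
result=$(join -1 2 -2 1 names_by_id.txt ids.txt)
printf '%s\n' "$result"
# Prints:
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103
```

The join field is always printed first, regardless of which column it came from in the input.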

Joining on Different Fields

You can join two files on different fields in each file using -1 for the first file and -2 for the second file. For instance −

join -1 1 -2 2 file1.txt file2.txt
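
When the key is not the first column, the file must be sorted on that column rather than on the whole line. A sketch under that assumption, using a hypothetical departments file (depts.txt) whose key is its second field:

```shell
cd "$(mktemp -d)"

printf '1 AAYUSH\n2 APAAR\n3 HEMANT\n' > file1.txt
# Hypothetical second file: department name first, ID second, not yet sorted by ID.
printf 'sales 2\nhr 1\nit 3\n' > depts.txt

# Sort the second file on its join field (field 2) before joining.
LC_ALL=C sort -k2,2 depts.txt > depts_sorted.txt

# Key is field 1 of file1.txt and field 2 of depts_sorted.txt.
result=$(LC_ALL=C join -1 1 -2 2 file1.txt depts_sorted.txt)
printf '%s\n' "$result"
# Prints:
# 1 AAYUSH hr
# 2 APAAR sales
# 3 HEMANT it
```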

Including Unpairable Lines

By default, join only outputs lines that have a match in both files. To include lines from the first file that don't have a corresponding match in the second file, use the -a option −

join -a 1 file1.txt file2.txt
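
The effect is only visible when some keys are missing from one file. A small sketch with a hypothetical second file (partial.txt) that lacks IDs 2 and 4, also showing the related -v option for comparison:

```shell
cd "$(mktemp -d)"

printf '1 AAYUSH\n2 APAAR\n3 HEMANT\n4 KARTIK\n' > file1.txt
# Hypothetical second file with keys 2 and 4 missing.
printf '1 101\n3 103\n' > partial.txt

# -a 1: joined lines plus the unpairable lines from file 1.
with_unpaired=$(join -a 1 file1.txt partial.txt)
printf '%s\n' "$with_unpaired"
# Prints:
# 1 AAYUSH 101
# 2 APAAR
# 3 HEMANT 103
# 4 KARTIK

# -v 1: only the unpairable lines from file 1.
only_unpaired=$(join -v 1 file1.txt partial.txt)
printf '%s\n' "$only_unpaired"
# Prints:
# 2 APAAR
# 4 KARTIK
```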

Changing the Output Format

The -o option allows you to customize the output format. Each entry has the form FILENUM.FIELD. For example, to output only the name from file1.txt and the ID from file2.txt, you would use −

join -o 1.2,2.2 file1.txt file2.txt
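
A short sketch of how the format list selects and orders fields, using shortened copies of the sample files in a temporary directory:

```shell
cd "$(mktemp -d)"

printf '1 AAYUSH\n2 APAAR\n' > file1.txt
printf '1 101\n2 102\n' > file2.txt

# 1.2 = field 2 of file 1 (name), 2.2 = field 2 of file 2 (ID).
result=$(join -o 1.2,2.2 file1.txt file2.txt)
printf '%s\n' "$result"
# Prints:
# AAYUSH 101
# APAAR 102

# 0 stands for the join field; fields may be listed in any order.
with_key=$(join -o 0,2.2,1.2 file1.txt file2.txt)
printf '%s\n' "$with_key"
# Prints:
# 1 101 AAYUSH
# 2 102 APAAR
```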

Using a Different Field Separator

The -t option lets you specify a different field separator if your files don't use whitespace. For example, if your files use a colon −

join -t ':' file1.txt file2.txt
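
A minimal sketch with hypothetical colon-separated files (names.txt and ids.txt are illustrative names):

```shell
cd "$(mktemp -d)"

# Hypothetical colon-separated inputs.
printf '1:AAYUSH\n2:APAAR\n' > names.txt
printf '1:101\n2:102\n' > ids.txt

# The same character separates fields in the input and in the output.
result=$(join -t ':' names.txt ids.txt)
printf '%s\n' "$result"
# Prints:
# 1:AAYUSH:101
# 2:APAAR:102
```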

Case-Insensitive Joining

The -i option allows you to perform a case-insensitive join. This is useful when the case of text in the join field may not match −

join -i file1.txt file2.txt
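
A sketch with hypothetical files whose keys differ only in case. LC_ALL=C is an assumption made here to keep the comparison rules predictable.

```shell
cd "$(mktemp -d)"

# Hypothetical inputs: same keys, different letter case.
printf 'apple 1\nbanana 2\n' > lower.txt
printf 'APPLE red\nBanana yellow\n' > mixed.txt

# With -i, "apple" matches "APPLE" and "banana" matches "Banana".
result=$(LC_ALL=C join -i lower.txt mixed.txt)
printf '%s\n' "$result"
# Prints:
# apple 1 red
# banana 2 yellow
```

With -i, the inputs should be sorted without regard to case (for example with sort -f) so that the order matches the comparison join performs.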

Checking for Sorted Input

The --check-order option checks that the input is correctly sorted, which is a requirement for the join command to work properly −

join --check-order file1.txt file2.txt
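
A sketch of the check in action, assuming GNU join (where --check-order is available) and a deliberately unsorted hypothetical input file:

```shell
cd "$(mktemp -d)"

# Hypothetical input that is NOT sorted on the join field.
printf '2 APAAR\n1 AAYUSH\n' > unsorted.txt
printf '1 101\n2 102\n' > ids.txt

# With --check-order, join reports the problem on stderr and exits non-zero.
if LC_ALL=C join --check-order unsorted.txt ids.txt > /dev/null 2> errors.txt; then
    status=0
else
    status=$?
fi

echo "exit status: $status"
cat errors.txt   # the diagnostic names the file and line that is out of order
```

Checking the exit status this way lets a script stop before using output that may be incomplete.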

Redirecting Output to a File

To save the output of the join operation to a new file, redirect the output using the > operator −

join file1.txt file2.txt > joined.txt

The following one-liners recap several of the options covered above. Print the joined lines together with the unmatched lines from file1 −

join -a 1 file1.txt file2.txt

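Replace missing fields with "missing". The -e option is mostly useful together with -a, so that unpaired lines are printed at all, and -o, so that every output line has the same fields −

join -a 1 -e "missing" -o 0,1.2,2.2 file1.txt file2.txt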
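
A sketch of the filler string in action, using a hypothetical second file (partial.txt) that lacks IDs 2 and 4:

```shell
cd "$(mktemp -d)"

printf '1 AAYUSH\n2 APAAR\n3 HEMANT\n4 KARTIK\n' > file1.txt
# Hypothetical second file with keys 2 and 4 missing.
printf '1 101\n3 103\n' > partial.txt

# -a 1 keeps unpaired lines from file 1; -o fixes the output columns;
# -e supplies the text for any column that has no value.
result=$(join -a 1 -e missing -o 0,1.2,2.2 file1.txt partial.txt)
printf '%s\n' "$result"
# Prints:
# 1 AAYUSH 101
# 2 APAAR missing
# 3 HEMANT 103
# 4 KARTIK missing
```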

Output fields in a specific format, here the first field of file2 followed by the second field of file1, with the list written as a single comma-separated argument −

join -o 2.1,1.2 file1.txt file2.txt

Use a custom field separator −

join -t "," file1.txt file2.txt

Ignore case when comparing fields −

join -i file1.txt file2.txt

Conclusion

The join command is a powerful tool for combining related data sets. With these examples, you should be able to leverage its capabilities to streamline your data processing tasks in Linux.

Remember, the key to effectively using the join command is ensuring that your input files are properly sorted on the join field. With a bit of practice, you'll find the join command to be an indispensable part of your Linux toolkit.
