
join Command in Linux
The join command in Linux merges lines from two files that share a common field. It is particularly useful when related data is stored in separate files.
Table of Contents
This guide covers the following topics −
- Understanding the join Command
- How to Use join Command?
- Syntax of join Command
- Options of join Command
- Examples of join Command in Linux
Understanding the join Command
The join command works much like a join in a relational database. For each pair of input lines whose join fields are identical, it writes one output line containing the join field, then the remaining fields from the first file, then the remaining fields from the second file.
Knowing the available options lets you choose the join field, control which lines are printed, and shape the output.
How to Use join Command?
By default, join uses the first field of each file as the join key. The options listed below let you change that behavior.
For example, to join two files on the second field of each, use -1 2 -2 2. To print the unpairable lines from file 1 along with the joined lines, use -a 1; to print only the unpairable lines from file 1, use -v 1.
Syntax of join Command
join [options] file1 file2
Options of join Command
Options | Description |
---|---|
-1 FIELD | Join on FIELD of file 1. |
-2 FIELD | Join on FIELD of file 2. |
-j FIELD | Join the files based on FIELD. This is equivalent to specifying -1 FIELD -2 FIELD. |
-a FILENUM | In addition to the joined lines, print the unpairable lines from file FILENUM (1 or 2), that is, lines with no matching line in the other file. |
-v FILENUM | Like -a FILENUM, but suppress the joined output lines, so only the unpairable lines are printed. |
-i or --ignore-case | Ignore differences in case when comparing the join fields. |
-e EMPTY | Replace missing input fields with the specified EMPTY string. |
-o FORMAT | Construct each output line according to FORMAT, a comma-separated list of FILENUM.FIELD entries (e.g., 1.2,2.2). |
-t CHAR | Use CHAR as the field delimiter for both input and output. By default, whitespace is used. |
--check-order | Check that the input is correctly sorted, even if all input lines are pairable. |
--nocheck-order | Do not check that the input is correctly sorted. |
--help | Display a help message and exit. |
--version | Display version information and exit. |
With these options, you can combine data from two sources in a single command.
The join command requires both input files to be sorted on the join field. If they are not, sort them first, for example with the sort command. The --nocheck-order option only disables the sortedness check; it does not make unsorted input safe to join, so use it only when you are certain the files are already ordered the way join expects.
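The sorting requirement can be met with a short preparation step. The following is a minimal sketch, assuming two unsorted files with the hypothetical names names.txt and ids.txt; both are sorted on the first field under the same locale that join then runs in:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical unsorted inputs (names.txt and ids.txt are illustrative names).
printf '%s\n' '3 HEMANT' '1 AAYUSH' '2 APAAR' > names.txt
printf '%s\n' '2 102' '3 103' '1 101' > ids.txt

# Sort each file on field 1, ignoring leading blanks, in the C locale.
LC_ALL=C sort -k 1b,1 names.txt > names.sorted
LC_ALL=C sort -k 1b,1 ids.txt > ids.sorted

# Join the sorted copies in the same locale.
sorted_join=$(LC_ALL=C join names.sorted ids.sorted)
printf '%s\n' "$sorted_join"
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103
```

Using the same locale for sort and join matters because both tools order keys by the locale's collation rules.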
Examples of join Command in Linux
Take a look at the following examples to get a clear understanding of how the join command works in Linux −
- Basic Usage
- Specifying the Join Field
- Joining on Different Fields
- Including Unpairable Lines
- Changing the Output Format
- Using a Different Field Separator
- Case-Insensitive Joining
- Checking for Sorted Input
- Redirecting Output to a File
Basic Usage
The simplest form of the join command is when you have two files with a common field, usually the first column. Consider two files, file1.txt and file2.txt, with the following content −
file1.txt −
1 AAYUSH
2 APAAR
3 HEMANT
4 KARTIK
file2.txt −
1 101
2 102
3 103
4 104
To join these files, you would use −
join file1.txt file2.txt
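As a runnable sketch of this basic case, the snippet below recreates the two sample files in a temporary directory (the directory is an assumption made for illustration) and joins them on the default first field:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the two sample files, one record per line.
printf '%s\n' '1 AAYUSH' '2 APAAR' '3 HEMANT' '4 KARTIK' > file1.txt
printf '%s\n' '1 101' '2 102' '3 103' '4 104' > file2.txt

# Join on the first field of each file (the default).
joined=$(join file1.txt file2.txt)
printf '%s\n' "$joined"
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103
# 4 KARTIK 104
```

Each output line starts with the join field, followed by the remaining fields of file1.txt and then those of file2.txt.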

Specifying the Join Field
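If the common field is not the first column, specify the join field with the -1 and -2 options, each followed by a field number. The command below joins on the second field of file1.txt and the first field of file2.txt; each file must be sorted on its own join field. With the sample files above it prints nothing, because no name in file1.txt equals a key in file2.txt −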
join -1 2 -2 1 file1.txt file2.txt
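To see -1 and -2 produce matches, the key has to sit in the named columns. The sketch below assumes a hypothetical file names_first.txt whose key is in the second column, joined against a file whose key is in the first column:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical inputs: the key is column 2 in the first file, column 1 in the second.
printf '%s\n' 'AAYUSH 1' 'APAAR 2' 'HEMANT 3' > names_first.txt
printf '%s\n' '1 101' '2 102' '3 103' > ids.txt

# Join field 2 of the first file with field 1 of the second file.
by_field=$(join -1 2 -2 1 names_first.txt ids.txt)
printf '%s\n' "$by_field"
# 1 AAYUSH 101
# 2 APAAR 102
# 3 HEMANT 103
```

The join field is printed first regardless of which column it came from.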

Joining on Different Fields
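The two field numbers do not have to be the same: -1 selects the field in the first file and -2 selects the field in the second file. The command below compares the first field of file1.txt with the second field of file2.txt. With the sample files above there are no matching values, so it prints nothing; it is useful when the key of the second file is stored in its second column −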
join -1 1 -2 2 file1.txt file2.txt

Including Unpairable Lines
By default, join only outputs lines that have a match in both files. To include lines from the first file that don't have a corresponding match in the second file, use the -a option −
join -a 1 file1.txt file2.txt
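The difference between -a and -v is easiest to see when one file has a line without a partner. This sketch assumes shortened copies of the sample files plus a hypothetical extra record (5 DEEPAK) that exists only in file1.txt:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# file1.txt has a fifth key that file2.txt lacks (the DEEPAK row is made up).
printf '%s\n' '1 AAYUSH' '2 APAAR' '5 DEEPAK' > file1.txt
printf '%s\n' '1 101' '2 102' > file2.txt

# -a 1: joined lines plus the unpairable lines from file 1.
with_unpaired=$(join -a 1 file1.txt file2.txt)
printf '%s\n' "$with_unpaired"
# 1 AAYUSH 101
# 2 APAAR 102
# 5 DEEPAK

# -v 1: only the unpairable lines from file 1.
only_unpaired=$(join -v 1 file1.txt file2.txt)
printf '%s\n' "$only_unpaired"
# 5 DEEPAK
```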

Changing the Output Format
The -o option lets you choose which fields appear in the output and in what order. Each entry has the form FILENUM.FIELD. For example, to output only the name (field 2 of file1.txt) and the ID (field 2 of file2.txt), you would use −
join -o 1.2,2.2 file1.txt file2.txt
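A runnable sketch of this output format, using the sample files recreated in a temporary directory (an assumption for illustration):

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' '1 AAYUSH' '2 APAAR' '3 HEMANT' '4 KARTIK' > file1.txt
printf '%s\n' '1 101' '2 102' '3 103' '4 104' > file2.txt

# Print field 2 of file 1, then field 2 of file 2; the join field is omitted.
formatted=$(join -o 1.2,2.2 file1.txt file2.txt)
printf '%s\n' "$formatted"
# AAYUSH 101
# APAAR 102
# HEMANT 103
# KARTIK 104
```

To keep the join field in the output, add 0 to the list, as in -o 0,1.2,2.2.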

Using a Different Field Separator
The -t option lets you specify a different field separator if your files don't use whitespace. For example, if your files use a colon −
join -t ':' file1.txt file2.txt
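The following sketch assumes colon-delimited versions of the sample files (the file names are hypothetical) and shows that the same delimiter is used in the output:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical colon-separated inputs.
printf '%s\n' '1:AAYUSH' '2:APAAR' '3:HEMANT' > names.colon
printf '%s\n' '1:101' '2:102' '3:103' > ids.colon

# Use ':' as the field separator for both input and output.
colon_join=$(join -t ':' names.colon ids.colon)
printf '%s\n' "$colon_join"
# 1:AAYUSH:101
# 2:APAAR:102
# 3:HEMANT:103
```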

Case-Insensitive Joining
The -i option allows you to perform a case-insensitive join. This is useful when the case of text in the join field may not match −
join -i file1.txt file2.txt
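As an illustration, the sketch below assumes GNU join (the -i option is not part of the POSIX specification) and two hypothetical files whose keys differ only in case:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical inputs: lowercase keys in one file, uppercase in the other.
printf '%s\n' 'a apple' 'b banana' > fruits.txt
printf '%s\n' 'A red' 'B yellow' > colors.txt

# Without -i the keys would not match; with -i they do.
# The C locale keeps the comparison rules predictable.
nocase_join=$(LC_ALL=C join -i fruits.txt colors.txt)
printf '%s\n' "$nocase_join"
# a apple red
# b banana yellow
```

The join field in the output is taken from the first file, so its original case is preserved.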

Checking for Sorted Input
The --check-order option checks that the input is correctly sorted, which is a requirement for the join command to work properly −
join --check-order file1.txt file2.txt
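The sketch below shows one way to use this check in a script. It assumes GNU join, where --check-order is available, and a deliberately unsorted first file with the hypothetical name unsorted.txt; the command is expected to report the problem on standard error and exit with a non-zero status:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# The first file is deliberately out of order on the join field.
printf '%s\n' '2 APAAR' '1 AAYUSH' > unsorted.txt
printf '%s\n' '1 101' '2 102' > file2.txt

# Capture the exit status instead of letting the failure stop the script.
if join --check-order unsorted.txt file2.txt > /dev/null 2>&1; then
  check_status=0
else
  check_status=$?
fi

if [ "$check_status" -ne 0 ]; then
  echo "input is not sorted on the join field"
fi
```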

Redirecting Output to a File
To save the output of the join operation to a new file, redirect the output using the > operator −
join file1.txt file2.txt > joined.txt

Print unmatched lines from file1 in addition to the joined lines −
join -a 1 file1.txt file2.txt

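Replace missing fields with "missing" (combine -e with -a and -o so that unpairable lines are printed and the missing fields are named in the output format) −
join -a 1 -e "missing" -o 0,1.2,2.2 file1.txt file2.txt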
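The -e option only has a visible effect when an output field is actually missing. This sketch assumes a second file that lacks key 2 and combines -e with -a 1 and an explicit -o list:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# file2.txt has no record for key 2.
printf '%s\n' '1 AAYUSH' '2 APAAR' '3 HEMANT' > file1.txt
printf '%s\n' '1 101' '3 103' > file2.txt

# Keep unpairable lines from file 1 and fill the absent ID with "missing".
filled=$(join -a 1 -e missing -o 0,1.2,2.2 file1.txt file2.txt)
printf '%s\n' "$filled"
# 1 AAYUSH 101
# 2 APAAR missing
# 3 HEMANT 103
```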

Output the first field of file2 followed by the second field of file1 −
join -o 2.1,1.2 file1.txt file2.txt

Use a custom field separator −
join -t "," file1.txt file2.txt

Ignore case when comparing fields −
join -i file1.txt file2.txt

Conclusion
The join command is a powerful tool for combining related data sets. With these examples, you should be able to leverage its capabilities to streamline your data processing tasks in Linux.
Remember, the key to effectively using the join command is ensuring that your input files are properly sorted on the join field. With a bit of practice, you'll find the join command to be an indispensable part of your Linux toolkit.