Thursday, 12 May 2016

How to split a file in Linux

In Linux, everything is a file. Here is a small tip on how to split a file into pieces of a given size and then join the pieces back into the original. For example, to split a large ISO image into pieces of at most 900 MB each:
# split -b 900m image.iso image.iso.
The last argument is the prefix for the output names, so an image that needs three pieces yields image.iso.aa, image.iso.ab and image.iso.ac. Afterwards, use the cat command to combine the pieces and get back the original file:
# cat image.iso.* > new-image.iso
That's it: the file is back, with its original size and contents.
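
One way to check that a rejoined file really matches the original is to compare the two byte by byte. The sketch below is only an illustration on a small stand-in file rather than a real ISO; the names demo.bin, part. and rejoined.bin and the 100-byte piece size are made up for the example.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 250-byte stand-in for the ISO image (hypothetical name).
head -c 250 /dev/urandom > demo.bin

# Split into pieces of at most 100 bytes, using "part." as the prefix.
split -b 100 demo.bin part.

# Rejoin: the shell expands part.* in sorted order (part.aa, part.ab, ...).
cat part.* > rejoined.bin

# cmp -s exits with status 0 only when both files are identical.
if cmp -s demo.bin rejoined.bin; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "$roundtrip"
```

The same cmp check can be run on the real image and its rejoined copy before the pieces are deleted.
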
The split command in Unix splits a large file into smaller files. The split can be based on various criteria: the number of lines per output file, the number of output files, the byte count, and so on. The examples below use this sample file:
$ cat file
Unix
Linux
AIX
Solaris
HPUX
Ubuntu
Cygwin

1. Split a file:

$ split file
By default, the split command splits the file into multiple files with 1000 lines in each output file.
The output file generated in this case is:
$ ls x*
xaa
Since the input file contains fewer than 1000 lines, all the contents are put into a single output file, “xaa”. By default, the output file names consist of the prefix “x” followed by the suffix “aa”, “ab”, “ac” and so on.

2. Split file into multiple files with 3 lines each:

$ split -l 3 file
The option -l specifies the number of lines per output file. Since the input file contains 7 lines, the output files contain 3, 3 and 1 respectively.
The output files generated are:
$ ls x*
xaa  xab  xac
$ cat xaa
Unix
Linux
AIX
The file “xab” contains the 4th to 6th lines, and the file “xac” contains the last line.
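
The 3, 3 and 1 distribution can be confirmed by counting the lines in each piece. This is a minimal sketch that recreates the 7-line sample file in a scratch directory; the variable name line_counts is just a label chosen for the example.

```shell
# Recreate the 7-line sample file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' Unix Linux AIX Solaris HPUX Ubuntu Cygwin > file

# Split into pieces of 3 lines each (default prefix "x").
split -l 3 file

# Print each piece name with its line count; tr strips the padding
# that some wc implementations add.
line_counts=$(for piece in x*; do
    printf '%s %s\n' "$piece" "$(wc -l < "$piece" | tr -d ' ')"
done)
echo "$line_counts"
```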

3. Split file into multiple files with a user defined prefix:

$ split -l 3 file F
$ ls F*
Faa  Fab  Fac
The prefix, if provided, is the last argument of the split command. Since the prefix provided is “F”, the files created are “Faa”, “Fab”, and so on.

4. Split file into multiple files with a single character suffix:

$ split -l 3 -a 1 file F
$ ls F*
Fa  Fb  Fc
In the above examples, the suffixes generated are “aa”, “ab” and so on. This makes sense when the number of output files is large. For our example, a single-character suffix is enough. The option -a of split controls the length of the suffix. With a suffix length of 1, the files created are “Fa”, “Fb”, and so on.
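
A shorter suffix also means fewer possible file names: one letter allows only 26 pieces. The sketch below illustrates this under the assumption that split refuses to continue once an explicitly chosen suffix length runs out of names (GNU split reports that the suffixes are exhausted). The file name numbers and the prefix S are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 30 one-line pieces need more names than a single letter (a-z) provides.
seq 1 30 > numbers

# With -a 1 the split is expected to fail after the 26th piece.
if split -l 1 -a 1 numbers S 2>/dev/null; then
    one_char=ok
else
    one_char=failed
fi

# Remove the partial result and retry with a two-letter suffix.
rm -f S*
split -l 1 -a 2 numbers S
two_char_count=$(ls S* | wc -l | tr -d ' ')

echo "$one_char $two_char_count"
```

When the number of pieces is not known in advance, a longer suffix is the safer choice.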

5. Split file into multiple files with a numeric suffix:

$ split -l 3 -d file F
$ ls F*
F00  F01  F02
The option -d of split enables a numeric suffix. With this, the files generated are “F00”, “F01”, “F02”, and so on. To get a single-digit numeric suffix:
$ split -l 3 -a 1 -d file F
$ ls F*
F0  F1  F2
Setting the option -a to 1 produces a single-digit numeric suffix.

6. Split file into multiple files with 10 bytes per output file:

$ split -b 10 -a 1 -d file F
The -b option of split divides the file on the basis of byte count. The byte count includes the new line character present at the end of the line as well.
$ ls F*
F0  F1  F2  F3  F4

$ cat F0
Unix
Linux

$ cat F1

AIX
Solar
The file F0 contains 10 characters: 5 from the first line (Unix plus the newline) and 5 from the second line (Linux). The newline character of the 2nd line moves to the 2nd output file, which is why F1 starts with an empty line.
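
The byte counts can be checked directly with wc -c. This sketch recreates the 42-byte sample file and prints the size of every piece. It assumes a split that supports -d (GNU coreutils does); the variable name byte_counts is only a label for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' Unix Linux AIX Solaris HPUX Ubuntu Cygwin > file

# 10 bytes per piece, single-digit numeric suffix, prefix F.
split -b 10 -a 1 -d file F

# Print each piece name with its size in bytes.
byte_counts=$(for piece in F*; do
    printf '%s %s\n' "$piece" "$(wc -c < "$piece" | tr -d ' ')"
done)
echo "$byte_counts"
```

The first four pieces hold 10 bytes each and the last one holds the remaining 2 bytes of the 42-byte file.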

7. Split file with kilobytes or megabytes of data per output file:

$ split -b 1k file
This splits the file with 1 KB (1024 bytes) of data per output file. Similarly, to split the file with 1 MB of data per output file:
$ split -b 1m file
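
To see what the k multiplier means in practice, the following sketch splits a 2500-byte file with -b 1k and prints the resulting sizes. The file name sample.bin and the prefix K are hypothetical names chosen for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 2500-byte test file filled with zero bytes.
head -c 2500 /dev/zero > sample.bin

# 1k means 1024 bytes per piece.
split -b 1k sample.bin K

# Print each piece name with its size in bytes.
kb_sizes=$(for piece in K*; do
    printf '%s %s\n' "$piece" "$(wc -c < "$piece" | tr -d ' ')"
done)
echo "$kb_sizes"
```

Two full pieces of 1024 bytes are followed by a final piece with the remaining 452 bytes.
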
Note: The commands below use the option -n, which is not available in all Unix flavors.

8. Split a file into 2 files of equal length:

$ split -n 2 -a 1 -d file F
At times, the requirement is to split a file equally into 2 files, unlike the earlier cases where the split is based on the number of lines or bytes per output file. The -n option of split does this. With “-n 2”, the file is split equally into 2 files as shown below:
$ ls F*
F0  F1

$ cat F0
Unix
Linux
AIX
Solari

$ cat F1
s
HPUX
Ubuntu
Cygwin
Note: -n divides the file into pieces of equal length based on its byte count. As shown above, since the file has 42 characters, each piece gets 21 characters.

9. Split file into 2 files with complete lines of output:

$ split -n l/2 -a 1 -d file F
The option “-n l/2” splits the file into 2 pieces without breaking any line. Hence, the file F0 contains the complete 4th line “Solaris”, and the rest goes to the 2nd file.
$ ls F*
F0  F1

$ cat F0
Unix
Linux
AIX
Solaris
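
To see the difference between byte-based and line-based halves side by side, the following sketch runs both on the same sample file and prints the size of each piece. It assumes GNU split (for -n and -d); the prefixes B and L are arbitrary names chosen for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' Unix Linux AIX Solaris HPUX Ubuntu Cygwin > file

split -n 2 -a 1 -d file B      # two halves by byte count
split -n l/2 -a 1 -d file L    # two halves made of complete lines

# Print each piece name with its size in bytes.
half_sizes=$(for piece in B0 B1 L0 L1; do
    printf '%s %s\n' "$piece" "$(wc -c < "$piece" | tr -d ' ')"
done)
echo "$half_sizes"
```

B0 and B1 should both report 21 bytes, while L0 comes out larger than L1 because it keeps the whole line “Solaris”.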

10. split command to display only a section of the file:

$ split -n 1/4 file
Unix
Linux
The option “-n 1/4” does not create any output files. It writes the requested part of the file to standard output. The 4 means the file is divided into 4 equal parts or sections, and 1/4 selects the 1st of the 4 sections. In other words, it displays the 1st part in the terminal. Similarly, to display the 2nd of the 4 parts:
$ split -n 2/4 file

AIX
Solar
Note: As seen above, the output does not contain complete lines. The split is done purely on the basis of equal byte count.
To display a section made of complete lines:
$ split -n l/1/4 file
Unix
Linux

$ split -n l/2/4  file
AIX
Solaris
With the l prefix in the -n argument, each section ends at the end of a line.
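
Because the line-based sections never break a line, writing all of them out in order should reproduce the original file. The sketch below loops over the 4 sections and checks this. It assumes GNU split (for -n); the file name rebuilt and the variable names are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' Unix Linux AIX Solaris HPUX Ubuntu Cygwin > file

chunks=4
: > rebuilt          # start with an empty result file

# Append section 1 of 4, then 2 of 4, and so on.
k=1
while [ "$k" -le "$chunks" ]; do
    split -n "l/$k/$chunks" file >> rebuilt
    k=$((k + 1))
done

# cmp -s exits with status 0 only when both files are identical.
if cmp -s file rebuilt; then
    chunks_ok=yes
else
    chunks_ok=no
fi
echo "$chunks_ok"
```

The same loop can feed each section to a different command instead of a file, which is useful for processing a large file in parts without creating intermediate pieces.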
