04 - Working with Shell - II
File Compression and Archival
Viewing file sizes
du (disk usage): to inspect the size of a file or directory.
du -sk: shows the size of a file or directory in kilobytes.
$ du -sk test.img # command
100000  test.img # output
du -sh: shows the size of a file or directory in a human-readable format (e.g. MB).
$ du -sh test.img # command
98M     test.img # output
ls -lh (long listing with human-readable sizes): also prints the size of the file.
$ ls -lh test.img # command
-rw-rw-r-- 1 98M Mar 13 15:48 test.img # output
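The three size commands can be tried side by side on a throwaway file. This is a minimal bash sketch that assumes GNU coreutils. A 1 MiB file of random bytes stands in for test.img. The du figures depend on how the filesystem allocates blocks, so the comments show no expected numbers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory that is removed when the script exits
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

# Stand-in for test.img: 1 MiB (1048576 bytes) of random data
head -c 1048576 /dev/urandom > test.img

du -sk test.img   # disk usage in kilobytes, followed by the file name
du -sh test.img   # disk usage in a human-readable unit (K, M, G)
ls -lh test.img   # long listing; the size column is human-readable

wc -c < test.img  # the file's length in bytes, for comparison
```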
Archiving Files
tar is used to group multiple files and directories into a single file, which makes it especially useful for archiving data. tar is an abbreviation for tape archive.
Files created with tar are often called tarballs.
Commands:
tar -cf <tarfile-name> <files..>: to archive files or directories.
-c: create an archive
-f: specify the name of the tar file to be created
$ tar -cf test.tar file1 file2 file3 # c = create an archive, f = specify the name of tar file
$ ls -ltr test.tar
tar -tf <tarfile-name>: to see the contents of the tarball.
$ tar -tf test.tar
tar -xf <tarfile-name>: to extract the contents of the tarball.
$ tar -xf test.tar
tar -zcf <tarfile-name> <files..>: to create a gzip-compressed tarball, which reduces its size (-z passes the archive through gzip).
$ tar -zcf test.tar.gz file1 file2 file3
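The tar commands above can be run end to end on a few small files. This is a minimal bash sketch. The file names and contents are hypothetical, and it assumes a tar that supports -z (GNU tar and bsdtar both do). It also uses the -C option, which is not covered above. -C extracts into a separate directory so the original files are not overwritten.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

# Hypothetical input files
echo "one" > file1
echo "two" > file2
echo "three" > file3

tar -cf test.tar file1 file2 file3       # create an archive
tar -tf test.tar                         # list its contents: file1, file2, file3
tar -zcf test.tar.gz file1 file2 file3   # create a gzip-compressed archive
ls -l test.tar test.tar.gz               # the .tar.gz is much smaller here

# Extract into a separate directory (-C) so the originals are not overwritten
mkdir extracted
tar -xf test.tar -C extracted
cat extracted/file2                      # prints: two
```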
Compression
Compression is a technique used to reduce the space consumed by a file or dataset.
The following commands can be used to compress files in Linux:
bzip2 (.bz2 extension)
gzip (.gz extension)
xz (.xz extension)
$ bzip2 test.img
$ gzip test1.img
$ xz test2.img
- Note: By default each command replaces the original file with a compressed copy (test.img.bz2, test1.img.gz, test2.img.xz). The size of the compressed files depends on several factors: the type of data being compressed, the compression algorithm used by each command, and the compression level.
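To see how the algorithm and the level affect the result, the three tools can be run on the same input. This is a minimal bash sketch that assumes gzip, bzip2 and xz are installed. The repetitive sample text is a hypothetical stand-in for test.img. The script uses -c, which writes the compressed data to STDOUT instead of replacing the input, so all three tools can read the same original file.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

# Repetitive sample text (stand-in for test.img) compresses well
seq 1 20000 | sed 's/^/line of repetitive sample text number /' > test.img

# -c writes the compressed data to STDOUT, so test.img itself is kept
gzip  -c test.img > test.img.gz
bzip2 -c test.img > test.img.bz2
xz    -c test.img > test.img.xz

# Compression level: -1 favours speed, -9 favours a smaller result
gzip -1 -c test.img > fast.gz
gzip -9 -c test.img > best.gz

# Compare byte counts; exact numbers depend on the data and tool versions
wc -c test.img test.img.gz test.img.bz2 test.img.xz fast.gz best.gz
```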
Decompression
Compressed files can be decompressed with the following commands:
bunzip2
gunzip
unxz
$ bunzip2 test.img.bz2
$ gunzip test1.img.gz
$ unxz test2.img.xz
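One way to confirm that each decompressor restores the original is a round trip that compares the result against a saved copy. This is a minimal bash sketch that assumes gzip, bzip2 and xz are installed. The file contents are hypothetical. The comparison uses cmp, which prints nothing and exits with status 0 when the two files are identical.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

printf 'alpha\nbeta\ngamma\n' > test.img
cp test.img original.img   # reference copy for comparison

# Each tool replaces test.img with a compressed file;
# the matching command restores test.img and removes the compressed file
gzip test.img
gunzip test.img.gz
cmp original.img test.img  # no output and exit status 0 means identical

bzip2 test.img
bunzip2 test.img.bz2
cmp original.img test.img

xz test.img
unxz test.img.xz
cmp original.img test.img

echo "all round trips restored the original content"
```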
Note
Compressed files do not always need to be decompressed.
Tools such as zcat, bzcat and xzcat allow compressed files to be read without decompressing them first.
$ bzcat hostfile.txt.bz2
$ zcat hostfile.txt.gz
$ xzcat hostfile.txt.xz
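Here each reader is paired with its own format. This is a minimal bash sketch that assumes GNU gzip, bzip2 and xz on Linux; on some other systems zcat only handles .Z files. The hostfile.txt contents are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

# Hypothetical content for hostfile.txt, compressed three ways
printf '10.0.0.1 web01\n10.0.0.2 db01\n' > hostfile.txt
gzip  -c hostfile.txt > hostfile.txt.gz
bzip2 -c hostfile.txt > hostfile.txt.bz2
xz    -c hostfile.txt > hostfile.txt.xz

# Each reader prints the original text; no decompressed file is written
zcat  hostfile.txt.gz
bzcat hostfile.txt.bz2
xzcat hostfile.txt.xz

# The output is ordinary text on STDOUT, so it can be piped, e.g. into grep
zcat hostfile.txt.gz | grep db01
```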
Searching for Files and Patterns
There are multiple ways to locate a file or directory in the filesystem.
locate
find
grep
locate
locate <filename>: to find the file in the filesystem.
$ locate City.txt
The downside of the locate command is that it depends on a database called mlocate.db to look up file names. If you have just installed Linux, or if the file you are trying to locate was created recently, the locate command may not give useful results, because the database may not have been updated yet.
To manually update the database, run the updatedb command and then run the locate command again.
$ sudo updatedb
find
Another way is to use the find command, followed by the directory under which you want to search. To search for a file by name, use the -name option followed by the name of the file.
$ find /home/rohit -name City.txt
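The find search can be tried on a small throwaway tree, with the temporary directory playing the role of /home/rohit. This is a minimal bash sketch; the directory layout is hypothetical. The second command shows that -name also accepts a quoted shell-style pattern.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Hypothetical tree: $demo_dir plays the role of /home/rohit
mkdir -p "$demo_dir/docs/cities"
touch "$demo_dir/docs/cities/City.txt" "$demo_dir/docs/notes.txt"

# Search everything under $demo_dir, at any depth, for a file named City.txt
find "$demo_dir" -name City.txt

# -name also accepts a shell-style pattern; quote it so the shell
# passes the * to find instead of expanding it itself
find "$demo_dir" -name "*.txt"
```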
grep
To search within files, the most popular command in Linux is grep. grep is commonly used to print the lines of a file that match a pattern, but it also offers a variety of other options.
The grep command is case-sensitive by default.
grep <search-word> <filename>: for example, to search for the word second in sample.txt:
$ cat sample.txt # command
This is the first line. # output
Followed by the second line.
And after that the third line.
The fourth line has CAPITAL LETTERS.
The fifth line does not want to be printed.
$ grep second sample.txt # command
Followed by the second line. # output
grep -i <search-word> <filename>: case-insensitive search.
$ grep -i capital sample.txt # command
The fourth line has CAPITAL LETTERS. # output
grep -r <search-word> <directory>: to search for a pattern recursively in a directory.
$ grep -r "third line" /home/rohit # command
/home/rohit/sample.txt:And after that the third line. # output
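The basic, case-insensitive and recursive searches can be run against the same sample.txt. This is a minimal bash sketch that assumes GNU grep. The script searches recursively from the current directory (.) instead of /home/rohit, so each match is prefixed with ./ followed by the file name.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

# Recreate sample.txt from the example above
cat > sample.txt <<'EOF'
This is the first line.
Followed by the second line.
And after that the third line.
The fourth line has CAPITAL LETTERS.
The fifth line does not want to be printed.
EOF

grep second sample.txt        # Followed by the second line.

# grep exits with status 1 when nothing matches; || keeps set -e from stopping
grep capital sample.txt || echo "no match: the search is case-sensitive"
grep -i capital sample.txt    # The fourth line has CAPITAL LETTERS.

# Recursive search from the current directory; each match is prefixed with its path
grep -r "third line" .        # ./sample.txt:And after that the third line.
```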
grep -v <search-word> <filename>: to print the lines that don't match the pattern.
$ grep -v "printed" sample.txt # command
This is the first line. # output
Followed by the second line.
And after that the third line.
The fourth line has CAPITAL LETTERS.
grep -w <word> <filename>: to search for the whole word.
$ cat examples.txt # command
grep examples # output (file content)
linux exam on 12th
$ grep -w exam examples.txt # command
linux exam on 12th # output
Multiple options can also be combined. For example, to invert the search and print all lines of the same file that don't match the whole word exam:
$ grep -vw exam examples.txt # command
grep examples # output
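The first command below shows why -w matters: without it, exam also matches inside examples. This is a minimal bash sketch. The sample.txt here is a shortened, hypothetical version with a line containing "printed", used for the -v example.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

printf 'grep examples\nlinux exam on 12th\n' > examples.txt
printf 'first line\nsecond line\nnot to be printed\n' > sample.txt

grep exam examples.txt      # both lines: "exam" also occurs inside "examples"
grep -w exam examples.txt   # linux exam on 12th
grep -vw exam examples.txt  # grep examples
grep -v printed sample.txt  # first line, second line
```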
To print a given number of lines after or before a match, use the grep command with the -A and -B flags respectively.
$ cat premier-league-table.txt # command
1 Arsenal # output (file content)
2 Liverpool
3 Chelsea
4 Manchester City
$ grep -A1 Arsenal premier-league-table.txt # command
1 Arsenal # output
2 Liverpool
$ grep -B1 4 premier-league-table.txt # command
3 Chelsea # output
4 Manchester City
The -A and -B flags can be combined in a single search to print a number of lines before and after a match.
$ grep -A1 -B1 Chelsea premier-league-table.txt # command
2 Liverpool # output
3 Chelsea
4 Manchester City
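This script reproduces the premier-league-table.txt examples so the context flags can be checked directly. It is a minimal bash sketch; the file is recreated from the example above.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

printf '1 Arsenal\n2 Liverpool\n3 Chelsea\n4 Manchester City\n' > premier-league-table.txt

grep -A1 Arsenal premier-league-table.txt       # match plus 1 line after
grep -B1 4 premier-league-table.txt             # match plus 1 line before
grep -A1 -B1 Chelsea premier-league-table.txt   # 1 line on each side
```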
IO Redirection
Three data streams are created whenever a Linux command is launched:
Standard Input (STDIN)
- STDIN is the standard input stream, which accepts text as input.
Standard Output (STDOUT)
- Text output is delivered through STDOUT, the standard output stream.
Standard Error (STDERR)
- Error messages of the command are sent through the standard error stream (STDERR).
- With IO redirection, STDOUT and STDERR can be redirected to a file, and STDIN can be read from a file.
REDIRECT STDOUT
>: to redirect STDOUT to a file instead of printing it on the screen (an existing file is overwritten).
$ echo $SHELL > shell.txt
>>: to append STDOUT to an existing file.
$ echo $SHELL >> shell.txt
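The difference between overwriting and appending is easiest to see by writing to the same file several times. This is a minimal bash sketch. It uses fixed strings instead of $SHELL so the result does not depend on the login shell, and out.txt is a hypothetical file name.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

echo "first"  > out.txt   # > creates out.txt (or empties it if it exists)
echo "second" > out.txt   # > again: the old content is overwritten
echo "third" >> out.txt   # >> appends after the existing content
cat out.txt               # second, then third
```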
REDIRECT STDERR
2>: to redirect just the error messages (STDERR) to a file.
$ cat missing_file 2> error.txt
2>>: to append STDERR to an existing file.
$ cat missing_file 2>> error.txt
2> /dev/null: to run a command without printing error messages on the screen, even if it writes to STDERR. (/dev/null is a bit bucket, where you can dump anything you don't need.)
$ cat missing_file 2> /dev/null
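The three STDERR redirections can be tried with a file that does not exist. This is a minimal bash sketch; missing_file and error.txt follow the names used above. Because cat fails on purpose, each call is guarded so that set -e does not stop the script. STDOUT is not redirected in any of these commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

# cat fails because missing_file does not exist; "|| true" stops set -e from exiting
cat missing_file 2> error.txt || true    # error message written to error.txt
cat missing_file 2>> error.txt || true   # a second message is appended
cat error.txt                            # two error lines

# Discard the error message entirely; the exit status still reports the failure
if ! cat missing_file 2> /dev/null; then
  echo "cat failed, but nothing was printed to the screen"
fi
```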
Command Line Pipes
Command line pipes allow multiple commands to be linked together.
In simple terms, a pipe lets the standard output of the first command be used as the standard input of the second command.
Pipes are written with the vertical bar symbol (|).
$ grep Hello sample.txt | less
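Pipes can be tried with a non-interactive example. This is a minimal bash sketch. It replaces the interactive less with wc -l and cut so the output can be checked, and the sample.txt contents are hypothetical. Each | feeds one command's STDOUT into the next command's STDIN.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

printf 'Hello world\nGoodbye\nHello again\n' > sample.txt

# STDOUT of grep becomes STDIN of wc (wc -l stands in for the interactive less)
grep Hello sample.txt | wc -l           # 2

# Pipes can be chained: keep the Hello lines, then print their second word
grep Hello sample.txt | cut -d' ' -f2   # world, again
```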
Instead of the redirect operator, we can use a command line pipe (|) followed by the tee command. tee writes its input both to the screen and to the file.
$ echo $SHELL | tee shell.txt
Use tee with the -a option to append to the file instead of overwriting it.
$ echo "This is the bash shell" | tee -a shell.txt
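tee's two modes can be tried in sequence. This is a minimal bash sketch. It uses the fixed string /bin/bash in place of $SHELL so the file contents are predictable. Each tee call both prints its input and writes it to shell.txt.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

# tee copies its STDIN to STDOUT (the screen) and to the named file
echo "/bin/bash" | tee shell.txt                   # shell.txt is overwritten
echo "This is the bash shell" | tee -a shell.txt   # -a appends instead
cat shell.txt                                      # both lines
```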
VI Editor
Text Editor
There are several text editors available, but one of the most popular is the vi editor.
The vi editor is available out of the box in almost all Linux distributions.
vi <filename>: to open a file in the vi editor to create or edit it.
$ vi /home/rohit/sample.txt
The vi editor has three operating modes.
Command Mode
When the vi editor opens a file, it always starts in COMMAND MODE.
In this mode, the editor only understands commands; keystrokes are not inserted as text.
Insert Mode
To switch from command mode to INSERT MODE, type lowercase i. This mode allows you to write text into the file.
Once you are done editing the file, press the ESC key to go back to command mode.
Other keys can also be used to enter insert mode from command mode: I (insert at the beginning of the line), o (open a new line below), O (open a new line above), a (append after the cursor), or A (append at the end of the line).
Last Line Mode
Pressing the : key in command mode takes you to LAST LINE MODE. In this mode you can choose to save changes to the file (:w), discard changes and quit (:q!), or save and exit (:wq).
From last line mode, press the ESC key to go back to command mode.
VIM Editor
VIM is an improved version of VI with added features, but very similar in appearance to VI. In most distros today, vi is a symbolic link to the vim editor.
Command
update-alternatives --display editor: command to see which editor is the default.
$ update-alternatives --display editor