File Compression and Archival

Viewing file sizes

du (disk usage): reports the disk space used by a file or directory.

du -sk: shows the size of a file or directory in kilobytes (-s prints a single summary line, -k reports the size in kilobytes).

  $ du -sk test.img        # command
  100000  test.img         # output

du -sh: shows the size of a file or directory in human-readable format, with a unit suffix such as K, M or G chosen automatically.

  $ du -sh test.img        # command
  98M   test.img           # output

ls -lh (long listing with human-readable sizes): prints the size of the file along with its permissions and modification time.

  $ ls -lh test.img                             # command
  -rw-rw-r-- 1 98M Mar 13 15:48 test.img        # output
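
As a rough sketch of how these commands relate, the script below creates a throwaway 1 MiB file and reads its size in more than one way. The temporary directory and the name demo.img are made up for illustration. Keep in mind that du reports the disk space allocated to the file, while ls -l and wc -c report its length in bytes, so the numbers may differ slightly.

```shell
# Work in a temporary directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 1 MiB file of random bytes (demo.img is a hypothetical name).
head -c 1048576 /dev/urandom > demo.img

# du prints "<size><TAB><name>"; keep only the first field.
size_kb=$(du -sk demo.img | cut -f1)

# Exact length in bytes, for comparison.
size_bytes=$(wc -c < demo.img)

echo "du -sk : ${size_kb} KB"
echo "wc -c  : ${size_bytes} bytes"
ls -lh demo.img
```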

Archiving Files

tar is used to group multiple files and directories into a single file, which makes it especially useful for archiving data.
- Tar is an abbreviation for tape archive.
- Files created with tar are often called tarballs.

Commands:

tar -cf <tarfile-name> <files..>: to archive a file or directory.
- -c to create an archive
- -f is used to specify the name of the tar file to be created

    $ tar -cf test.tar file1 file2 file3         # c = create an archive, f = specify the name of tar file
    $ ls -ltr test.tar

tar -tf <tarfile-name>: to see the contents of the tarball.
```
  $ tar -tf test.tar
```
tar -xf <tarfile-name>: to extract the contents from the tarball.
```
  $ tar -xf test.tar
```
tar -zcf <tarfile-name> <files..>: to create a gzip-compressed tarball, which reduces its size.
```
  $ tar -zcf test.tar.gz file1 file2 file3
```
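
The commands above can be tried end to end with a small round trip. This is only a sketch: the file names and the restored directory are invented for the example, and -C (change to a directory before extracting) is an extra tar option not covered above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three small sample files to archive.
echo "one"   > file1
echo "two"   > file2
echo "three" > file3

# Create a plain archive and a gzip-compressed one from the same files.
tar -cf  demo.tar    file1 file2 file3
tar -zcf demo.tar.gz file1 file2 file3

# List the archive contents without extracting anything.
listing=$(tar -tf demo.tar)
echo "$listing"

# Extract the compressed archive into a separate directory.
mkdir restored
tar -zxf demo.tar.gz -C restored
restored_text=$(cat restored/file2)
echo "$restored_text"
```

Extracting into a separate directory keeps the restored copies from overwriting the originals, which makes it easy to compare the two.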

Compression

Compression is the technique used to reduce the space consumed by a file or a dataset.
To compress a file on Linux, the most common commands are:
- bzip2 (.bz2 extension)
- gzip (.gz extension)
- xz (.xz extension)

    $ bzip2 test.img
    $ gzip test1.img
    $ xz test2.img

Note: The size of the compressed files produced by these three commands depends on the type of data being compressed, the compression algorithm each command uses, and the compression level chosen.
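
To get a feel for the effect of the compression level, the sketch below compresses the same file with gzip at its lowest and highest levels and prints the resulting sizes. The file names are hypothetical, and the exact byte counts will vary with the data and the gzip version.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Plain text made of sequential numbers compresses well.
seq 1 20000 > numbers.txt

# -c writes to standard output, so the original file is left in place.
gzip -1 -c numbers.txt > fast.gz    # level 1: fastest, least compression
gzip -9 -c numbers.txt > best.gz    # level 9: slowest, most compression

original=$(wc -c < numbers.txt)
fast=$(wc -c < fast.gz)
best=$(wc -c < best.gz)

echo "original: $original bytes"
echo "gzip -1 : $fast bytes"
echo "gzip -9 : $best bytes"
```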

Uncompression

The compressed files can be uncompressed by using the below commands:
- bunzip2
- gunzip
- unxz

    $ bunzip2 test.img.bz2
    $ gunzip test1.img.gz
    $ unxz test2.img.xz

Note

Compressed files do not need to be uncompressed every time you want to read them.
Tools such as zcat, bzcat and xzcat print the contents of a compressed file without uncompressing it on disk. Each tool reads its own format:
```
  $ bzcat hostfile.txt.bz2
  $ zcat hostfile.txt.gz
  $ xzcat hostfile.txt.xz
```
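
A minimal sketch of reading a compressed file in place, using gzip and zcat only. The host entries are invented sample data.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '127.0.0.1 localhost\n10.0.0.5 build-server\n' > hostfile.txt

# gzip replaces hostfile.txt with hostfile.txt.gz.
gzip hostfile.txt

# Read the compressed file directly; the .gz file stays on disk unchanged
# and no uncompressed copy is written.
contents=$(zcat hostfile.txt.gz)
echo "$contents"
```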

Searching for Files and Patterns

There are multiple ways to locate a file or directory in the filesystem.
1. locate
2. find
3. grep

locate

locate <filename>: to locate/find the file in the filesystem.
```
  $ locate City.txt
```
The downside of the locate command is that it depends on a database called mlocate.db to look up filenames.
If you have just installed Linux, or if the file you are trying to locate was created recently, the locate command may not give useful results, because the database may not have been updated yet.
To update the database manually, run the updatedb command and then run locate again.
```
  $ sudo updatedb
```

find

Another way is to use the find command.
Give find the directory under which you want to search. To search for a file by name, add the -name option followed by the name of the file.
```
  $ find /home/rohit -name City.txt
```
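
The sketch below builds a small directory tree and searches it, which shows that find looks through every subdirectory of the starting point. The directory layout is made up; the wildcard pattern and -type f are additional find features beyond the basic -name search shown above.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/asia/india" "$workdir/europe"
touch "$workdir/asia/india/City.txt" "$workdir/europe/City.txt" "$workdir/europe/notes.md"

# Search by exact file name; find descends into every subdirectory.
find "$workdir" -name City.txt

# Quote wildcards so the shell passes them to find unexpanded,
# and use -type f to match regular files only.
txt_count=$(find "$workdir" -type f -name '*.txt' | wc -l)
echo "text files found: $txt_count"
```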

grep

To search within files, the most popular command in Linux is grep.
- Grep is commonly used to print lines of a file matching a pattern but it also offers a variety of other options.
- The grep command is case-sensitive by default.

grep <search-word> <filename>: to search for a word in a file, for example the word second in sample.txt.

  $ cat sample.txt                                # command
  This is the first line.                         # output
  Followed by the second line.
  And after that the third line.
  The fourth line has CAPITAL LETTERS.
  The fifth line does not want to be printed.

  $ grep second sample.txt                         # command
  Followed by the second line.                     # output

grep -i <search-word> <filename>: case-insensitive search.

  $ grep -i capital sample.txt            # command
  The fourth line has CAPITAL LETTERS.    # output

grep -r <search-word> <directory>: to search for a pattern recursively in a directory.

  $ grep -r "third line" /home/rohit                         # command
  /home/rohit/sample.txt:And after that the third line.      # output

grep -v <search-word> <filename>: to print the lines that do not match the pattern.

  $ grep -v "printed" sample.txt                  # command
  This is the first line.                         # output (file content)
  Followed by the second line.
  And after that the third line.
  The fourth line has CAPITAL LETTERS.

grep -w <word> <filename>: to search for the whole word.

  $ cat examples.txt                    # command
  grep examples                         # output (file content)
  linux exam on 12th                    

  $ grep -w exam examples.txt           # command
  linux exam on 12th                    # output

Multiple options can also be combined. For example, to invert the search and print all lines of the same file that do not contain the whole word exam:

  $ grep -vw exam examples.txt            # command
  grep examples                           # output
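
Other option combinations work the same way. The sketch below uses a slightly extended copy of examples.txt (the third line is added for illustration) and introduces two further grep options, -c and -n, that are not covered above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'grep examples\nlinux exam on 12th\nEXAM results\n' > examples.txt

# -i and -w combined: the whole word "exam" in any letter case.
grep -iw exam examples.txt

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -ciw exam examples.txt)
echo "matching lines: $count"

# -n prefixes each matching line with its line number.
numbered=$(grep -nw exam examples.txt)
echo "$numbered"
```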

To print a number of lines after or before each match, use grep with the -A and -B flags respectively, each followed by the number of lines to print.

  $ cat premier-league-table.txt                            # command
  1 Arsenal                                                 # output (file content)
  2 Liverpool 
  3 Chelsea
  4 Manchester City

  $ grep -A1 Arsenal premier-league-table.txt               # command
  1 Arsenal                                                 # output
  2 Liverpool                                                       

  $ grep -B1 4 premier-league-table.txt                     # command
  3 Chelsea                                                 # output
  4 Manchester City

The -A and -B can be combined into one single search to print a number of lines before and after a match.

  $ grep -A1 -B1 Chelsea premier-league-table.txt            # command
  2 Liverpool                                                # output
  3 Chelsea
  4 Manchester City
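
When the same number of lines is wanted on both sides, grep also accepts -C. This option is not covered above; the sketch recreates the table file and checks that -C1 gives the same result as -A1 -B1.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '1 Arsenal\n2 Liverpool\n3 Chelsea\n4 Manchester City\n' > premier-league-table.txt

# -C1 prints one line of context on each side of the match.
both_sides=$(grep -C1 Chelsea premier-league-table.txt)

# The same search spelled out with -A and -B.
after_before=$(grep -A1 -B1 Chelsea premier-league-table.txt)

echo "$both_sides"
```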

IO Redirection

Three data streams are created when we run a Linux command.
- Standard Input (STDIN)
  - STDIN is the standard input stream which accepts text as an input.
- Standard Output (STDOUT)
  - Text output is delivered as STDOUT or the standard out stream
- Standard ERROR (STDERR)
  - Error messages of the command are sent through the standard ERROR stream (STDERR)

With IO redirection, STDOUT and STDERR can be sent to a text file, and STDIN can be read from one.

REDIRECT STDOUT

>: To redirect STDOUT to a file instead of printing it on the screen.
```
  $ echo $SHELL > shell.txt
```
>>: To append STDOUT to an existing file.
```
  $ echo $SHELL >> shell.txt
```
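
The difference between the two operators is easiest to see side by side. In this small sketch, log.txt is a hypothetical file name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first"  > log.txt     # creates log.txt
echo "second" > log.txt     # overwrites: "first" is gone
echo "third"  >> log.txt    # appends after "second"

cat log.txt

# wc reads the file through STDIN here, so it prints only the number.
line_count=$(wc -l < log.txt)
echo "lines in log.txt: $line_count"
```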

REDIRECT STDERR

2>: To redirect just the ERROR message.
```
  $ cat missing_file 2> error.txt
```
2>>: To append STDERR to an existing file.
```
  $ cat missing_file 2>> error.txt
```
2> /dev/null: To run a command without printing error messages on the screen, even if it writes to STDERR. (/dev/null is a bit bucket: anything written to it is discarded.)
```
  $ cat missing_file 2> /dev/null
```
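
STDOUT and STDERR can be redirected in the same command, each to its own file. The sketch below lists one file that exists and one that does not; the file names are invented, and the exact wording of the error message depends on the ls implementation.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch present.txt

# ls prints the existing name on STDOUT and a complaint about the
# missing one on STDERR; each stream goes to its own file.
status=0
ls present.txt missing.txt > out.txt 2> err.txt || status=$?

echo "exit status: $status"
echo "STDOUT captured: $(cat out.txt)"
echo "STDERR captured: $(cat err.txt)"
```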

Command Line Pipes

Command line pipes allow multiple commands to be linked together.
In simple terms, a pipe passes the first command's standard output to the second command as its standard input.
Pipes are written with the vertical bar symbol (|).
```
  $ grep Hello sample.txt | less
```
Instead of the redirect operator, we can use the command line pipe (|) followed by the tee command, which writes the output to a file and still prints it on the screen.
```
  $ echo $SHELL | tee shell.txt
```
Use tee with the -a option to append to the file instead of overwriting it.
```
  $ echo "This is the bash shell" | tee -a shell.txt
```
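
Pipes can be chained, with tee saving a copy part of the way through. A minimal sketch, using a made-up sample.txt and matches.txt:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Hello world\nGoodbye\nHello again\n' > sample.txt

# grep filters, tee saves a copy to matches.txt and passes the lines on,
# wc -l counts what arrives at the end of the pipeline.
match_count=$(grep Hello sample.txt | tee matches.txt | wc -l)
echo "matching lines: $match_count"

# -a appends to matches.txt instead of overwriting it.
echo "appended line" | tee -a matches.txt > /dev/null
saved_lines=$(wc -l < matches.txt)
echo "lines saved: $saved_lines"
```

Sending tee's own output to /dev/null in the last step keeps the screen quiet while the file is still written.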

VI Editor

Text Editor

There are several text editors available, but the most popular is the VI editor.
The VI editor is available out of the box in almost all Linux distributions.
vi <filename>: To open the vi editor to create or edit a file.
```
  $ vi /home/rohit/sample.txt
```
The VI EDITOR has three operation modes.
1. Command Mode
  - When the vi editor opens a file, it always goes to the COMMAND MODE first.
  - In this mode, the editor only understands commands.

2. Insert Mode
  - To switch from command mode to INSERT MODE, type lower case i.
  - This mode allows you to write text into the file.
  - Once you are done editing the file, hit the ESC key to go back to command mode.
  - To enter insert mode from command mode you may also use other keys such as I, o, O, a, or A.

3. Last Line Mode
  - Pressing the : key in command mode will take you to the LAST LINE MODE.
  - In this mode you can choose to save changes to the file, discard changes, or save and exit.
  - From the last line mode, hit the ESC key to go back to the command mode.

VIM Editor

VIM is an improved version of VI, with added features but a very similar appearance.
In most distros today, vi is a symbolic link to the vim editor.

Command

update-alternatives --display editor: Command to see which editor is the default.
```
  $ update-alternatives --display editor
```
