Management Information Systems, Lecture 06: Archiving information. CLARK UNIVERSITY, College of Professional and Continuing Education (COPACE)

Dec 24, 2015


Abigail Conley
Transcript
Page 1

Management Information Systems

Lecture 06: Archiving information

CLARK UNIVERSITY

College of Professional and Continuing Education (COPACE)

Page 2

Plan

• Coding of numeric information
• Coding of textual information
• Coding of graphical information
• Archiving of information
• Shannon-Fano coding
• Huffman coding

Page 3

Basic terms

• Coding is the conversion of a message into a code, that is, into a set of symbols transmitted over a communication channel.

Page 4

Coding of numeric information

• Binary encoding, used in computing, is based on representing data as a sequence of two characters: 0 and 1.

• These characters are called binary digits, or bits for short.

Page 5

Coding of numeric information

One bit can represent two values: 0 or 1 (yes or no, true or false, etc.). If the number of bits is increased to two, four different values can be represented:

00 01 10 11

Three bits can encode eight different values:

000 001 010 011 100 101 110 111

Page 6

Coding binary data

The general formula is:

$N = 2^i$

where $N$ is the number of independent values that can be encoded and $i$ is the number of bits in the binary code.
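The relationship between the code length and the number of values can be checked by listing every code word. The following is a minimal sketch; the function name `all_codes` is an assumption made for illustration and does not come from the lecture.

```python
from itertools import product


def all_codes(bits):
    """Return every binary code word of the given length, in counting order."""
    return ["".join(digits) for digits in product("01", repeat=bits)]


for i in (1, 2, 3):
    codes = all_codes(i)
    # The number of distinct code words equals 2 to the power of i.
    assert len(codes) == 2 ** i
    print(i, "bit(s):", len(codes), "values:", " ".join(codes))
```

Each extra bit doubles the list, because every existing code word can be extended with either 0 or 1.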

Page 7

Coding of binary integers

Principle: the integer is repeatedly halved (integer division by two) until the quotient is zero or one. The remainders of the divisions are written from right to left, and the final quotient is placed in front of them; together they form the binary equivalent of the decimal number.

Page 8

Example

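19 : 2 = 9, remainder 1
9 : 2 = 4, remainder 1
4 : 2 = 2, remainder 0
2 : 2 = 1, remainder 0

So, writing the last quotient (1) followed by the remainders from last to first: 19₁₀ = 10011₂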
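The repeated-division procedure can be sketched in a few lines. This version is an illustration rather than the lecture's own code: the name `to_binary` is hypothetical, and the loop continues until the quotient reaches zero, so the final quotient needs no special handling.

```python
def to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)   # quotient and remainder of one division step
        remainders.append(str(remainder))
    # The first remainder is the rightmost digit, so the list is reversed.
    return "".join(reversed(remainders))


print(to_binary(19))
```

For 19 the remainders are collected in the order 1, 1, 0, 0, 1 and read back in reverse.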

Page 9

Coding of binary integers

• 8 bits are enough to encode the integers from 0 to 255.

• 16-bit coding is used for integers from 0 to 65535.

• 24 bits can represent more than 16.7 million different numbers (2^24 = 16,777,216).
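Going in the other direction, the number of bits needed for a given largest value can be computed directly. A small sketch, assuming unsigned integers starting at 0; `bits_needed` is a name chosen here for illustration.

```python
def bits_needed(max_value):
    """Smallest number of bits that can hold every integer from 0 to max_value."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    # int.bit_length() gives the length of the binary form; 0 still needs one bit.
    return max(1, max_value.bit_length())


for largest in (255, 65535, 2 ** 24 - 1):
    print(largest, "fits in", bits_needed(largest), "bits")
```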

Page 10

Coding of textual information

• If each letter of the alphabet is assigned a certain integer, then binary code can be used to encode textual information.

• Eight bits are sufficient to encode 256 different characters.
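As an illustration of mapping characters to integers and then to 8-bit codes, here is a minimal sketch that assumes the text contains only basic ASCII characters. The helper names `text_to_bits` and `bits_to_text` are hypothetical.

```python
def text_to_bits(text):
    """Encode ASCII text as space-separated 8-bit binary codes."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits):
    """Decode space-separated 8-bit binary codes back into ASCII text."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("ascii")


encoded = text_to_bits("Hi")
print(encoded)
print(bits_to_text(encoded))
```

The letter H has the number 72 and i has the number 105, so the two characters become two 8-bit groups.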

Page 11

Coding of textual information

The American National Standards Institute (ANSI) introduced the ASCII encoding system (American Standard Code for Information Interchange).

Page 12

Coding of textual information

• ASCII has two encoding tables: a basic one (characters numbered 0 to 127) and an extended one (128 to 255).

Page 13

The extended ASCII character set

Page 14

Windows 1251 character set

Page 15

Coding of textual information

• Multiple encodings are in use at the same time because an 8-bit set of codes is limited to 256 characters.

• The character set originally based on a 16-bit character encoding is called universal: UNICODE.

• A 16-bit encoding contains unique codes for 65,536 different characters (later versions of the standard extended the code space beyond 16 bits).

• For a long time, the transition to this system was held back by insufficient computing resources.
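The problem with several coexisting 8-bit tables can be shown with a single byte. The sketch below uses the Windows 1251 table mentioned earlier and, as an assumption for comparison, the Latin-1 table, which is not part of the lecture.

```python
# Cyrillic capital letter "Ya", written as an escape so the file stays ASCII-only.
letter = "\u042f"

cp1251_bytes = letter.encode("cp1251")      # one byte in Windows 1251
utf16_bytes = letter.encode("utf-16-be")    # two bytes in a 16-bit Unicode encoding

# The same single byte stands for a different character in another 8-bit table.
misread = cp1251_bytes.decode("latin-1")

print(cp1251_bytes.hex(), utf16_bytes.hex(), ascii(misread))
```

With an 8-bit table the reader must know which table was used, while the 16-bit code identifies the character on its own.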

Page 16

Coding of graphical information

• A graphic image is made up of tiny dots (pixels) that form a grid called a raster.

Page 17

Example

• Sevenfold magnification

Page 18

Coding of graphical information

• Pixels with only two possible colors (black and white) can be encoded by two values, 0 or 1, so only 1 bit per pixel is needed.

• For black-and-white illustrations, coding with 256 shades of gray is generally accepted. How many bits are needed then?
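The question can be answered by computing how many bits are required to number the available shades. This is a sketch under stated assumptions: the function names and the 640 x 480 raster size are hypothetical and used only for illustration.

```python
def bits_per_pixel(levels):
    """Number of bits needed to distinguish the given number of colors or shades."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    # Codes run from 0 to levels - 1, so the largest code decides the length.
    return max(1, (levels - 1).bit_length())


def raster_size_bytes(width, height, levels):
    """Storage needed for an uncompressed raster, rounded up to whole bytes."""
    total_bits = width * height * bits_per_pixel(levels)
    return (total_bits + 7) // 8


print(bits_per_pixel(2), bits_per_pixel(256))
print(raster_size_bytes(640, 480, 2), raster_size_bytes(640, 480, 256))
```

With 256 shades each pixel takes one byte, so the grayscale raster is eight times larger than the two-color one.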

Page 19

Example

Page 20

Coding of graphical information

• The color image on the screen is obtained by mixing three primary colors: red (Red), green (Green) and blue (Blue).

Page 21

Coding of graphical information

Page 22

Coding of graphical information

• When encoding color images, the principle of decomposing any color into its basic components is used.

• Such a coding system is called RGB.

• If 256 levels (8 bits) are used to encode each of the main color components, the system provides 16,777,216 different colors (256 × 256 × 256).
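A 24-bit color value can be built from three 8-bit components. The sketch below assumes the common red-green-blue byte order; the function names `pack_rgb` and `unpack_rgb` are hypothetical.

```python
def pack_rgb(red, green, blue):
    """Combine three 8-bit components into one 24-bit color number."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must be in the range 0..255")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(value):
    """Split a 24-bit color number back into its red, green and blue components."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


orange = pack_rgb(255, 128, 0)
print(orange, unpack_rgb(orange))
print("different colors:", 256 ** 3)
```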

Page 23

Archiving of information

• Data archiving is the process of converting the information stored in a file into a form that reduces redundancy in its representation and thus requires less storage space.

Page 24

Archiving of information

• Archiving (packing) is the movement of source files into an archive file in a compressed format.

• Decompression (unpacking) is the process of recovering files from the archive in the exact form they had before archiving.
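Packing and unpacking can be tried out with Python's standard `zlib` module. This is only an illustration of the round trip on an in-memory byte string, not of any particular archiver; the sample data is made up.

```python
import zlib

# Hypothetical, highly repetitive sample data.
original = b"Archiving reduces redundancy. " * 200

packed = zlib.compress(original)      # packing
restored = zlib.decompress(packed)    # unpacking

print(len(original), "bytes before,", len(packed), "bytes after")
# Unpacking must return the data in exactly the form it had before packing.
assert restored == original
```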

Page 25

Archiving of information

The aims:
• storage in a more compact form on the disk
• reduction of the time (or cost) of transmitting information through communication channels
• simplification of transferring files from one computer to another
• protection from unauthorised access

Page 26

Archiving of information

• One of the first archiving methods was proposed in 1844 by Samuel Morse in the Morse code system.

• Frequent characters are coded by shorter sequences.

Page 27

Archiving of information

• In the 1940s Shannon, the founder of modern information theory, and independently of him Fano developed a universal algorithm for constructing efficient (near-optimal) codes. A related algorithm, which always yields an optimal prefix code, was later proposed by Huffman.

• The principle of these algorithms is to encode frequently occurring characters by shorter sequences of bits.

Page 28

Archiving of information

• In the 1970s Lempel and Ziv proposed the LZ77 and LZ78 algorithms; LZW, a refinement by Welch, followed in 1984.

• The algorithm finds repeated sequences and replaces them with numbers taken from a dynamically generated dictionary.

• Most modern archivers (WinRAR, WinZip) are based on variations of the Lempel-Ziv algorithm.
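The dictionary idea can be sketched with a small LZW-style coder. This is a teaching illustration, not the method used by any specific archiver; it assumes every input character has a code below 256, and the function names are hypothetical.

```python
def lzw_compress(text):
    """Replace repeated sequences with numbers from a dictionary built on the fly."""
    dictionary = {chr(i): i for i in range(256)}   # start with all single characters
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # keep extending the known sequence
        else:
            codes.append(dictionary[current])        # emit the longest known sequence
            dictionary[candidate] = len(dictionary)  # learn the new sequence
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the text by regenerating the same dictionary from the codes."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[0]   # sequence defined by this very step
        else:
            raise ValueError("invalid code sequence")
        pieces.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


sample = "TOBEORNOTTOBEORTOBEORNOT"
packed = lzw_compress(sample)
print(len(sample), "characters ->", len(packed), "codes")
assert lzw_decompress(packed) == sample
```

The dictionary is never stored in the output: the decoder rebuilds it step by step from the codes it reads.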

Page 29

Archiving of information

$$K_c = \frac{V_c}{V_r} \cdot 100\%$$

where $K_c$ is the compression coefficient, $V_c$ is the volume of the compressed file, and $V_r$ is the volume of the source file.

The degree of compression depends on the archiving program, the method and the type of source file.
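The coefficient is easy to compute once both volumes are known. The sketch below takes the coefficient as the compressed volume divided by the source volume, times 100%. It uses Python's `zlib` module as a stand-in for an archiving program; the sample data and the function name `compression_coefficient` are assumptions made for illustration.

```python
import zlib


def compression_coefficient(compressed_size, source_size):
    """Compressed volume as a percentage of the source volume."""
    if source_size <= 0:
        raise ValueError("source_size must be positive")
    return compressed_size / source_size * 100.0


text = b"management information systems " * 500   # hypothetical repetitive data
packed = zlib.compress(text)
repacked = zlib.compress(packed)                   # compressing already packed data

kc_text = compression_coefficient(len(packed), len(text))
kc_packed = compression_coefficient(len(repacked), len(packed))
print(f"repetitive text: {kc_text:.1f}%")
print(f"already packed:  {kc_packed:.1f}%")
```

A smaller coefficient means better compression; data that has already been packed leaves almost nothing to remove, so its coefficient stays close to 100%.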

Page 30

Archiving of information

• The degree of compression for graphical, text and data files is 5-40%.

• The degree of compression for executable files is 60-90%.

• The degree of compression for archived files is 90-100%.

Page 31

Archiving of information

• A self-extracting archive is an executable module that can unpack the files it contains without using the archiver.

• Large archive files can be split into several volumes.

Page 32

Shannon-Fano coding

Page 33

1. Develop a list of probabilities or frequency counts.
2. Sort the list of symbols according to frequency.
3. Divide the list into two parts, with the total frequency counts of the left part being as close to the total of the right as possible.
4. The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1.
5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has a code.
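The five steps above can be sketched as a short recursive function. This is one possible reading of the procedure, not a definitive implementation: it assumes the list is sorted from the most frequent symbol to the least frequent, and the frequency counts in the example are invented for illustration.

```python
def shannon_fano(freqs):
    """Build a Shannon-Fano code table {symbol: bit string} from frequency counts."""
    # Steps 1-2: take the frequency list and sort it, most frequent symbol first.
    symbols = sorted(freqs, key=freqs.get, reverse=True)
    codes = {symbol: "" for symbol in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(freqs[s] for s in group)
        running = 0
        best_cut, best_diff = 1, None
        # Step 3: find the cut where the two parts have the closest totals.
        for i in range(1, len(group)):
            running += freqs[group[i - 1]]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        # Step 4: the left part gets the digit 0, the right part the digit 1.
        for s in left:
            codes[s] += "0"
        for s in right:
            codes[s] += "1"
        # Step 5: repeat for each part until every symbol stands alone.
        split(left)
        split(right)

    split(symbols)
    return codes


# Hypothetical frequency counts.
table = shannon_fano({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5})
for symbol, code in table.items():
    print(symbol, code)
```

With these counts the first cut separates A and B (total 22) from C, D and E (total 17), so the two most frequent symbols receive 2-bit codes.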

Page 34

Huffman coding

Symbol  Code
a1      0
a2      10
a3      110
a4      111

Page 35

Huffman coding

• A source generates 4 different symbols, each with its own probability.

• A binary tree is generated from left to right by taking the two least probable symbols and combining them into an equivalent symbol whose probability equals the sum of the two.

• The process is repeated until there is just one symbol.

• The tree can then be read backwards, from right to left, assigning different bits to different branches.
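The merging process can be sketched with a priority queue. The probabilities below are an assumption, since the values from the slide are not in the transcript; they were chosen so that the result agrees with the Symbol/Code table shown earlier. The function name `huffman_codes` and the rule for which branch gets 0 are also illustrative choices.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Build a Huffman code table {symbol: bit string} from symbol probabilities."""
    tie_breaker = count()   # keeps heap entries comparable when probabilities are equal
    heap = [(p, next(tie_breaker), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two least probable (possibly already merged) symbols.
        p_low, _, low = heapq.heappop(heap)
        p_high, _, high = heapq.heappop(heap)
        # Prepend one bit per branch: reading the tree backwards builds the codes.
        merged = {symbol: "0" + code for symbol, code in low.items()}
        merged.update({symbol: "1" + code for symbol, code in high.items()})
        # The merged symbol carries the sum of the two probabilities.
        heapq.heappush(heap, (p_low + p_high, next(tie_breaker), merged))
    return heap[0][2]


# Hypothetical probabilities for the four symbols.
source = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}
for symbol, code in sorted(huffman_codes(source).items()):
    print(symbol, code)
```

The most probable symbol ends up with a 1-bit code and the two least probable ones with 3-bit codes, so no code is the beginning of another.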