
Types of Digital Data

Introduction

Digital data can be classified into three forms:
- Unstructured data
- Semi-structured data
- Structured data

BI and Its Applications

[Figure: Distribution of digital data in three forms: unstructured data, semi-structured data, structured data]

Unstructured data: data that does not conform to a data model and is not in a form that a computer program can use easily.
Examples: memos, chat rooms, PowerPoint presentations, images, videos, letters, the body of an email.

Semi-structured data: data that does not conform to a data model but has some structure. It is still not in a form that a computer program can use easily. Metadata for this data is available but is not sufficient.
Examples: emails, XML, markup languages such as HTML.


Structured data: data that is in an organized form (e.g., rows and columns) and can be easily used by a computer program. Relationships exist between entities of data, such as classes and their objects. Data stored in a database is an example of structured data.

Goodlife DB
- Doctors' or nurses' notes in an electronic report
- Emails sharing information about consultations or investigations
- Surveillance system reports
- Narrative portions of electronic medical records
- Investigative reports
- Chat room


Goodlife maintains a database which stores data only in a structured format.

GOODLIFE HEALTHCARE: Patient Index Card
Fields: Patient ID, Nurse name, Patient name, Patient age, Date, Body temperature, Blood pressure

A snapshot of structured data


Getting to know structured data

Characteristics of structured data:
- Similar entities are grouped
- Conforms to a data model
- Data is stored in rows and columns
- Attributes in a group are the same
- The definition, format, and meaning of data are explicitly known
- Data resides in fixed fields within a record or a file


Structured data is organized in semantic chunks, with similar entities grouped together to form relations or classes. Entities in the same group have the same descriptions, i.e., attributes. The descriptions of all entities in a group:
- Have the same defined format
- Have the same predefined length
- Follow the same order
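To make the idea of fixed fields concrete, here is a minimal sketch that stores the Patient Index Card fields in a relational table using Python's built-in sqlite3 module. The table name, column names, column types, and the two sample rows are assumptions made up for illustration; they are not taken from the Goodlife database.

```python
import sqlite3


def build_patient_db():
    """Create an in-memory table whose rows all share the same attributes."""
    conn = sqlite3.connect(":memory:")
    # Every record has the same columns, in the same order, with declared types.
    conn.execute(
        """
        CREATE TABLE patient_index_card (
            patient_id     INTEGER PRIMARY KEY,
            nurse_name     TEXT NOT NULL,
            patient_name   TEXT NOT NULL,
            patient_age    INTEGER NOT NULL,
            visit_date     TEXT NOT NULL,
            body_temp_c    REAL,
            blood_pressure TEXT
        )
        """
    )
    # Hypothetical sample rows, invented for this sketch.
    rows = [
        (1, "Nurse A", "Patient X", 34, "2024-01-05", 37.1, "120/80"),
        (2, "Nurse B", "Patient Y", 61, "2024-01-05", 38.4, "140/90"),
    ]
    conn.executemany(
        "INSERT INTO patient_index_card VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def column_names(conn):
    """Return the attribute names shared by every row in the table."""
    cur = conn.execute("SELECT * FROM patient_index_card")
    return [description[0] for description in cur.description]
```

Because the columns are declared NOT NULL, an insert that omits a required field such as the patient name is rejected with sqlite3.IntegrityError, which is one way a data model keeps every entity in the group in the same shape.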

Where does structured data come from?
- Databases
- Spreadsheets
- SQL
- OLTP systems

To summarize, structured data:
- Consists of fully described data sets
- Has clearly defined categories and subcategories
- Is placed neatly in rows and columns
- Goes into records, and hence the database is regulated by a well-defined structure
- Can be indexed easily, either by the DBMS itself or manually
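The last point, indexing by the DBMS, can be sketched with sqlite3 as follows. The table, the index name idx_patient_name, and the sample rows are hypothetical and chosen only for this illustration.

```python
import sqlite3


def make_indexed_table():
    """Build a small in-memory table and ask the DBMS to index one column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients ("
        "patient_id INTEGER PRIMARY KEY, patient_name TEXT, patient_age INTEGER)"
    )
    # Hypothetical sample rows, invented for this sketch.
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?, ?)",
        [(1, "Patient X", 34), (2, "Patient Y", 61), (3, "Patient Z", 47)],
    )
    # The DBMS builds and maintains the index; queries do not need to change.
    conn.execute("CREATE INDEX idx_patient_name ON patients (patient_name)")
    return conn


def find_by_name(conn, name):
    """Look up patients by name; the DBMS may use the index for this filter."""
    return conn.execute(
        "SELECT patient_id, patient_age FROM patients WHERE patient_name = ?",
        (name,),
    ).fetchall()


def index_names(conn):
    """List the indexes recorded in SQLite's schema catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    ).fetchall()
    return [row[0] for row in rows]
```

The query is written the same way with or without the index. Whether the index is actually used is decided by the database engine, which is the practical meaning of "indexed by the DBMS itself".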


It's so easy with structured data
- Storage
- Scalability
- Security
- Update and delete

Hassle-free retrieval (ease with structured data)
- Retrieving information
- Indexing and searching
- Data mining
- BI operations


Getting to know unstructured data

Unstructured data:
- Does not conform to any data model
- Cannot be stored in the form of rows and columns as in a database
- Has no easily identifiable structure
- Is not in any particular format or sequence
- Does not follow any rules or semantics
- Is not easily usable by a program

Where does unstructured data come from?
- Web pages
- Memos
- Videos
- Images
- Body of an email
- Word documents
- PowerPoint presentations
- Chats
- Surveys
- White papers
- Reports

Unstructured data can be classified into two broad categories:
- Bitmap objects: images, video, audio files
- Textual objects: Word documents, emails, Excel sheets

A lot of unstructured data is also noisy text, such as chats, emails, and SMS texts.


[Figure: A typical web page. Labels: Multimedia, Image map, XML, Web pages, Database, Text]

How to manage unstructured data
- Indexing
- Tags/metadata
- Classification/taxonomy
- CAS (content addressable storage): it stores data based on their metadata and assigns a unique name to every object stored in it. An object is retrieved based on its content and not its location. It is used extensively to store emails, etc.
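The CAS idea of naming and retrieving an object by its content rather than its location can be sketched as below. This is a toy in-memory illustration, not a real CAS product. The class name ContentAddressableStore and the choice of a SHA-256 digest as the unique name are assumptions for this sketch.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store in which an object's name is derived from its bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store the bytes and return their content address (a SHA-256 hex digest)."""
        address = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address, so it is stored once.
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Retrieve an object by its content address, not by a path or location."""
        return self._objects[address]

    def __len__(self):
        return len(self._objects)
```

Storing the same email body twice yields the same address and a single stored copy, which is one reason this style of storage suits archives of largely unchanging content.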

How to store unstructured data?
- Storage space: it is difficult to store and manage unstructured data. A lot of space is required to store such data, e.g., images and audio.
- Scalability: as the data grows, scalability becomes an issue and the cost of storing such data grows.
- Retrieving information: even if the unstructured data is stored, it is difficult to retrieve and recover information from it.
- Security
- Update and delete
- Indexing and searching

Solutions to storage challenges of unstructured data
- Changing the format: for example, converting audio or video data to text
- Developing new hardware
- Storing in RDBMS/BLOBs: binary large objects (BLOBs)
- Storing in XML format: XML tries to give the data some structure by using tags and elements
- CAS (content addressable storage)
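As a small illustration of the RDBMS/BLOB option, the sketch below writes raw bytes into a BLOB column and reads them back using sqlite3. The table name attachments, its columns, and the file name passed in are hypothetical.

```python
import sqlite3


def store_and_load_blob(file_name: str, payload: bytes) -> bytes:
    """Save raw bytes in a BLOB column, then read the same bytes back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attachments ("
        "id INTEGER PRIMARY KEY, file_name TEXT, content BLOB)"
    )
    # Python bytes objects are bound to the BLOB column unchanged.
    conn.execute(
        "INSERT INTO attachments (file_name, content) VALUES (?, ?)",
        (file_name, payload),
    )
    row = conn.execute(
        "SELECT content FROM attachments WHERE file_name = ?", (file_name,)
    ).fetchone()
    conn.close()
    return bytes(row[0])
```

The database treats the BLOB as opaque bytes: it can store, secure, and return the object, but the structured columns around it (here, the file name) are what make it searchable.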

How to extract information from stored unstructured data?

A few challenges in extracting information from unstructured data:
- Interpretation: unstructured data is not easily interpreted by conventional search algorithms.
- Indexing
- Deriving meaning: a computer program cannot automatically derive meaning or structure from unstructured data.
- File formats: the increasing number of file formats makes it difficult to interpret data.
- Tags: as the data grows, it is not possible to add tags manually.

The possible solutions to the challenges just mentioned are described below:
- Tags: unstructured data can be stored in a virtual repository and be tagged automatically.
- Text mining: text mining tools help in grouping and classifying unstructured data, and assist in analysis by considering grammar, context, synonyms, etc.
- Application platforms: application platforms like XOLAP help extract information from email and XML-based documents.
- Classification/taxonomy: a taxonomy within the organization can be managed automatically to organize data in hierarchical structures.
- Naming conventions/standards: following naming conventions or standards across an organization can greatly improve storage, retrieval, indexing, and search.
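Automatic tagging can be approximated, very roughly, with keyword rules. The sketch below is a deliberately simple assumption: the tag names and keyword sets in TAG_KEYWORDS are invented for illustration, and real text mining tools go much further by considering grammar, context, and synonyms.

```python
import re

# Hypothetical taxonomy: each tag is triggered by any of its keywords.
TAG_KEYWORDS = {
    "cardiology": {"heart", "cardiac", "pressure"},
    "infection": {"fever", "infection", "antibiotic"},
}


def auto_tag(text, tag_keywords=TAG_KEYWORDS):
    """Return the sorted list of tags whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # A tag applies when the text shares at least one word with its keyword set.
    return sorted(tag for tag, keywords in tag_keywords.items() if words & keywords)
```

For example, a note that mentions both "fever" and "blood pressure" receives both tags, while a note with none of the keywords receives an empty list and would still need manual review.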


UIMA: A possible solution for unstructured data

UIMA (Unstructured Information Management Architecture) is an open-source platform from IBM which integrates different kinds of analysis engines to provide a complete solution for knowledge discovery from unstructured data. UIMA stores information in a structured format. The structured resources can then be mined, searched, and put to other uses.


Various analysis engines analyze unstructured data in different ways, such as:
- Breaking up documents into separate words
- Grouping and classifying according to a taxonomy
- Detecting parts of speech, grammar, and synonyms
- Detecting events and times
- Detecting relationships between various elements
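Two of the analyses listed above, breaking a document into words and detecting times, can be imitated in plain Python as follows. This sketch does not use the UIMA API. The function names and the assumption that dates are written as YYYY-MM-DD are illustrative only.

```python
import re
from collections import Counter


def split_into_words(text):
    """Break a document into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def find_dates(text):
    """Detect dates written as YYYY-MM-DD (an assumed format for this sketch)."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)


def analyze(text):
    """Turn free text into a small structured record that can be queried."""
    words = split_into_words(text)
    return {
        "word_count": len(words),
        "top_words": Counter(words).most_common(3),
        "dates": find_dates(text),
    }
```

The result of analyze is a dictionary with fixed keys, which mirrors the overall idea: analysis engines read unstructured text and emit structured information that can then be searched or mined.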


[Figure: UIMA. Labels: Acquired from various sources, Subjected to semantic analysis, Structured information, Structured information access, Query & Presentation, Users]


Getting to know semi-structured data

Characteristics of semi-structured data:
- Similar entities are grouped
- Does not conform to a data model but contains tags and elements
- Cannot be stored in the form of rows and columns as in a database
- Attributes in a group may not be the same
- The tags and elements describe how the data is stored
- Metadata is not sufficient


Address1, Address2

Email follows a standard format:
To:
From:
Subject:
CC:
Body:
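The split between the structured part of an email (the header fields) and the unstructured part (the body) can be seen with Python's standard email package. The addresses and message text in RAW_EMAIL are made up for this sketch.

```python
from email import message_from_string

# Hypothetical message: header fields first, a blank line, then free-text body.
RAW_EMAIL = (
    "To: nurse@example.org\n"
    "From: doctor@example.org\n"
    "Cc: records@example.org\n"
    "Subject: Consultation follow-up\n"
    "\n"
    "Patient was seen today. Please schedule a follow-up visit.\n"
)


def split_email(raw):
    """Separate the named header fields from the free-text body."""
    msg = message_from_string(raw)
    headers = {name: msg[name] for name in ("To", "From", "Cc", "Subject")}
    body = msg.get_payload()
    return headers, body
```

The headers can be read by name like fields in a record, while the body comes back as one block of text that a program cannot interpret without further analysis. This is why email is usually classed as semi-structured.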

ABC Healthcare: Blood Test Report
Fields: Date, Department, Attending doctor, Patient name, Patient age, Hb content, RBC count, WBC count, Platelet count, Diagnosis <notes>, Conclusion

The blood test report, an example of semi-structured data
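A report like this can be expressed in XML, where tags describe each value but different reports need not carry the same elements. The sketch below parses such a document with xml.etree.ElementTree. The element names and sample values in REPORT_XML are assumptions for illustration, not a real ABC Healthcare format.

```python
import xml.etree.ElementTree as ET

# Hypothetical reports: the second one omits elements the first one has.
REPORT_XML = """
<reports>
  <report>
    <patient_name>Patient X</patient_name>
    <hb_content>13.5</hb_content>
    <notes>Free-text remarks written by the attending doctor.</notes>
  </report>
  <report>
    <patient_name>Patient Y</patient_name>
    <wbc_count>7200</wbc_count>
  </report>
</reports>
"""


def report_fields(xml_text):
    """Return one dict per report, keyed by whatever tags that report contains."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in report}
        for report in root.findall("report")
    ]
```

Each report yields a dictionary with a different set of keys, so the data cannot be dropped directly into a table with fixed columns, and the notes element still holds free text that the tags label but do not structure.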
