OpenSAP Hsta1 Week 4 Exercise


openSAP

TEXT ANALYTICS WITH SAP HANA PLATFORM – WEEK 4

Version: January 22, 2016

Exercises / Solutions

Anthony Waite / SAP Labs, LLC.

Bill Miller / SAP Labs, LLC.



Contents

Exercise 1 – Solution

In SAP HANA Studio

Exercise 2 – Solution

In SAP HANA Studio

Exercise 3 – Solution

REMINDER BEFORE YOU START

System Host: HANA IP address

System Instance Number: 00

System User ID: SYSTEM

Password: Master Password you entered for the solution when creating the instance in the SAP Cloud Appliance Library



EXERCISE 1 – CREATE FULLTEXT INDEX WITH TEXT MINING ON AND GET RELEVANT DOCUMENTS

Objective

In this exercise, you will execute a SQL statement to create a fulltext index for your copy of the workshop data and specify the TEXT MINING ON parameter. You will then discover the top-ranked relevant documents based on an input term.

Exercise Description

Create fulltext index and text mining index from reference document set

Monitor the progress and status of text processing

Execute the text mining function TM_GET_RELEVANT_DOCUMENTS with the input term “enzyme”

Show top-ranked documents from the reference documentation set relevant to “enzyme”



EXERCISE 1 – SOLUTION

In SAP HANA Studio

Steps

1) Under the “Repositories” tab, navigate to “(Default) / student00 / solutions / week-4”. Double-click on “exercises.sql”.

2) If “No connection to database” is displayed in the SQL console, click on the “Choose Connection” icon, which is found to the right of the green circle with an arrow (Execute) icon.



3) In the “Choose Connection” dialog, select the appropriate database.

Click the “OK” button.

4) In the SQL console, highlight the following SQL syntax:

SET SCHEMA OPENSAP_TA_WORKSHOP;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

Note: If you close this session at any point while working on Week 4 exercises, you will need to re-execute this command at the start.


5) In the SQL console, highlight the following SQL syntax:

CREATE FULLTEXT INDEX AWARDS_IDX ON "student00.data::AWARDS"(AWARD_ABSTRACT)
FAST PREPROCESS OFF
TEXT MINING ON;



6) In the SQL console, highlight the following SQL syntax:

SELECT * FROM SYS.M_FULLTEXT_QUEUES
WHERE SCHEMA_NAME = 'OPENSAP_TA_WORKSHOP'
  AND TABLE_NAME = 'student00.data::AWARDS';

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

7) You can monitor the progress and status of the text analysis processing (tokenization, stemming and part-of-speech tagging), which improves the quality of text mining. After the job finishes, the text mining index (a.k.a. term-document matrix) is created.

Note: Wait until all of the reference documents have been indexed before executing the following text mining functions.


8) In the SQL console, highlight the following SQL syntax:

SELECT T.FEDERAL_AWARD_ID_NUMBER,
       T.AWARD_TITLE,
       T.TOTAL_TERM_COUNT,
       T.SCORE
FROM TM_GET_RELEVANT_DOCUMENTS (
    TERM 'enzyme'
    SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 16 FEDERAL_AWARD_ID_NUMBER, AWARD_TITLE
) AS T;


9) Notice this text mining function shows the top-ranked documents relevant to the input term "enzyme".

Note: You can find concepts and the usage of the text mining capabilities in the SAP HANA Text Mining Developer Guide posted on the SAP Help Portal.
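To make the idea of a term-document matrix more concrete, here is a minimal Python sketch that builds one for three tiny documents and ranks them for the term "enzyme". It is not how SAP HANA implements its text mining index: the document IDs and texts are made up, the tokenizer only lowercases and splits on whitespace (no stemming or part-of-speech tagging), and the score (share of a document's tokens that match the term) is an illustrative assumption.

```python
from collections import Counter

# Hypothetical mini "reference document set" (IDs and texts are made up).
DOCS = {
    101: "enzyme kinetics of a bacterial enzyme",
    102: "protein folding and enzyme activity",
    103: "galaxy formation in the early universe",
}


def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_term_document_matrix(docs):
    """Map each term to {doc_id: count}, i.e. one matrix row per term."""
    matrix = {}
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            matrix.setdefault(term, {})[doc_id] = count
    return matrix


def relevant_documents(matrix, docs, term, top=16):
    """Rank documents by the share of their tokens that equal `term`."""
    row = matrix.get(term.lower(), {})
    scored = [
        (doc_id, count / len(tokenize(docs[doc_id])))
        for doc_id, count in row.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


matrix = build_term_document_matrix(DOCS)
ranking = relevant_documents(matrix, DOCS, "enzyme")
print(ranking)
```

Documents that never mention the term have no entry in its matrix row, so they do not appear in the ranking at all.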



EXERCISE 2 – GET RELATED DOCUMENTS WITH REFERENCE DOCUMENT

Objective

In this exercise, discover the top-ranked related documents based on an input document found already in the reference collection.


Exercise Description

View the initial input document from the reference document set about enzymes

Execute the text mining function TM_GET_RELATED_DOCUMENTS with the input document about enzymes

Show top-ranked documents from the reference documentation set related to the input document about enzymes



EXERCISE 2 – SOLUTION

In SAP HANA Studio

Steps


1) In the SQL console, highlight the following SQL syntax:

SELECT * FROM "student00.data::AWARDS"
WHERE FEDERAL_AWARD_ID_NUMBER = 1330760;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

2) Notice the document from the reference document set is about enzymes.


3) In the SQL console, highlight the following SQL syntax:

SELECT T.FEDERAL_AWARD_ID_NUMBER,
       T.AWARD_TITLE,
       T.TOTAL_TERM_COUNT,
       T.SCORE
FROM TM_GET_RELATED_DOCUMENTS (
    DOCUMENT IN FULLTEXT INDEX WHERE FEDERAL_AWARD_ID_NUMBER = 1330760
    SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 16 FEDERAL_AWARD_ID_NUMBER, AWARD_TITLE
) AS T;




4) Notice this text mining function shows the top-ranked documents related to the initial input document already found in the reference documentation set. The initial input document is also returned with a score of 1.0, since it’s a perfect match for itself.
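The self-match score of 1.0 is easy to see in a small Python sketch. This assumes cosine similarity over plain bag-of-words vectors as the similarity measure; the three documents and their IDs are made up, and SAP HANA's actual term weighting and preprocessing may differ.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: term -> count (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reference documents keyed by a made-up ID.
DOCS = {
    1: "enzyme catalysis in yeast cells",
    2: "enzyme structure and catalysis",
    3: "sea ice and ocean currents",
}


def related_documents(docs, input_id, top=16):
    """Rank all documents, including the input itself, by similarity."""
    query = vectorize(docs[input_id])
    scored = [
        (doc_id, cosine(query, vectorize(text)))
        for doc_id, text in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


ranking = related_documents(DOCS, 1)
print(ranking)
```

Because the input document is compared against every document in the collection, including itself, it comes back first with a score of (numerically almost exactly) 1.0, while a document sharing no terms with it scores 0.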




EXERCISE 3 – GET RELATED DOCUMENTS WITH NEW DOCUMENT

Objective

In this exercise, discover the top-ranked related documents based on a new (previously unseen) input document.


Exercise Description

Execute the text mining function TM_GET_RELATED_DOCUMENTS with a new input document

Show top-ranked documents from the reference documentation set related to the new input document



EXERCISE 3 – SOLUTION

In SAP HANA Studio

Steps


1) In the SQL console, highlight the following SQL syntax:

SELECT T.FEDERAL_AWARD_ID_NUMBER,
       T.AWARD_TITLE,
       T.TOTAL_TERM_COUNT,
       T.SCORE
FROM TM_GET_RELATED_DOCUMENTS (
    DOCUMENT '
The molecule known as coenzyme A plays a key role in cell metabolism by
regulating the actions of nitric oxide. Coenzyme A sets into motion a process
known as protein nitrosylation, which unleashes nitric oxide to alter the
shape and function of proteins within cells to modify cell behavior. The
purpose of manipulating the behavior of cells is to tailor their actions to
accommodate the ever-changing needs of the body’s metabolism.
'
    SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 16 FEDERAL_AWARD_ID_NUMBER, AWARD_TITLE
) AS T;




2) Notice this shows the top-ranked documents related to a new input document not found in the reference document set.




EXERCISE 4 – GET RELEVANT TERMS

Objective

In this exercise, discover the top-ranked relevant terms (key phrases) that describe a document.


Exercise Description

Execute the text mining function TM_GET_RELEVANT_TERMS with an input document already found in the reference document set

Show top-ranked relevant terms from the reference documentation set that describe the input document



EXERCISE 4 – SOLUTION

In SAP HANA Studio

Steps


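1) In the SQL console, highlight the following SQL syntax:

SELECT T.TERM,
       T.NORMALIZED_TERM,
       T.TERM_TYPE,
       T.TERM_FREQUENCY,
       T.DOCUMENT_FREQUENCY,
       T.SCORE
FROM TM_GET_RELEVANT_TERMS (
    DOCUMENT IN FULLTEXT INDEX WHERE FEDERAL_AWARD_ID_NUMBER = 1330760
    SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 16
) AS T;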


2) Notice this shows the top-ranked relevant terms (key phrases) that describe the input document already found in the reference collection.
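As a rough illustration of how term frequency and document frequency can combine into a relevance score, the Python sketch below ranks the terms of one document with a classic TF-IDF weight. TF-IDF is an assumption chosen for illustration, not a statement of the formula SAP HANA uses, and the three documents are made up.

```python
import math
from collections import Counter

# Hypothetical reference documents keyed by a made-up ID.
DOCS = {
    1: "enzyme binds substrate and enzyme releases product",
    2: "substrate and product in the cell",
    3: "the cell and the membrane",
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming)."""
    return text.lower().split()


def relevant_terms(docs, doc_id, top=16):
    """Return (term, term_freq, doc_freq, score) rows, best score first."""
    n_docs = len(docs)
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(tokenize(text)))  # count each term once per doc
    term_freq = Counter(tokenize(docs[doc_id]))
    scored = [
        (term, tf, doc_freq[term], tf * math.log(n_docs / doc_freq[term]))
        for term, tf in term_freq.items()
    ]
    return sorted(scored, key=lambda row: row[3], reverse=True)[:top]


terms = relevant_terms(DOCS, 1)
for row in terms:
    print(row)
```

A term that is frequent in the input document but rare in the collection ("enzyme" here) ranks first, while a term that occurs in every document ("and") gets a score of zero.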




EXERCISE 5 – GET RELATED TERMS

Objective

In this exercise, discover the top-ranked related terms based on co-occurrence with an input term.

Exercise Description

Execute the text mining function TM_GET_RELATED_TERMS with the input term “enzyme”

Show top-ranked terms from the reference documentation set related to the input term “enzyme”



EXERCISE 5 – SOLUTION

In SAP HANA Studio

Steps


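1) In the SQL console, highlight the following SQL syntax:

SELECT T.TERM,
       T.NORMALIZED_TERM,
       T.TERM_TYPE,
       T.TERM_FREQUENCY,
       T.DOCUMENT_FREQUENCY,
       T.SCORE
FROM TM_GET_RELATED_TERMS (
    TERM 'enzyme'
    SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 16
) AS T;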


2) Notice this text mining function shows the top-ranked terms related to the input term "enzyme". The input term itself is already found in the reference documentation set, so it is returned with a perfect score of "1".
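The co-occurrence idea can be sketched in a few lines of Python. The score used here (the share of the input term's documents in which another term also appears) is an illustrative assumption, not SAP HANA's formula, and the documents are made up. It does reproduce the effect noted above: the input term always co-occurs with itself, so it scores 1.

```python
from collections import Counter

# Hypothetical reference documents.
DOCS = [
    "enzyme substrate reaction",
    "enzyme protein reaction",
    "protein membrane",
]


def related_terms(docs, term, top=16):
    """Score terms by the share of `term`'s documents they also appear in."""
    term = term.lower()
    doc_sets = [set(text.lower().split()) for text in docs]
    with_term = [terms for terms in doc_sets if term in terms]
    if not with_term:
        return []
    counts = Counter()
    for terms in with_term:
        counts.update(terms)
    scored = [(t, n / len(with_term)) for t, n in counts.items()]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top]


related = related_terms(DOCS, "enzyme")
print(related)
```

Terms that never share a document with the input term ("membrane" in this toy data) are not returned.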




EXERCISE 6 – GET SUGGESTED TERMS

Objective

In this exercise, discover the top-ranked terms matching an initial substring.


Exercise Description

Execute the text mining function TM_GET_SUGGESTED_TERMS with the input substring “enz”

Show top-ranked suggested terms from the reference documentation set matching the input substring “enz”



EXERCISE 6 – SOLUTION

In SAP HANA Studio

Steps


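1) In the SQL console, highlight the following SQL syntax:

SELECT T.TERM,
       T.NORMALIZED_TERM,
       T.TERM_TYPE,
       T.TERM_FREQUENCY,
       T.DOCUMENT_FREQUENCY,
       T.SCORE
FROM TM_GET_SUGGESTED_TERMS (
    TERM 'enz'
    SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 16
) AS T;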


2) Notice this text mining function shows the top-ranked suggested terms for the input substring "enz".
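Term suggestion from an initial substring can be pictured as a prefix lookup over the indexed vocabulary. The Python sketch below ranks matching terms by their total frequency; the ranking criterion and the sample documents are assumptions for illustration only, and SAP HANA's own scoring may differ.

```python
from collections import Counter

# Hypothetical reference documents.
DOCS = [
    "enzyme activity and enzyme kinetics",
    "enzymatic reactions and enzymes",
    "energy transfer in enzymes",
]


def suggested_terms(docs, prefix, top=16):
    """Return (term, frequency) for terms starting with `prefix`."""
    prefix = prefix.lower()
    term_freq = Counter()
    for text in docs:
        term_freq.update(text.lower().split())
    matches = [(t, n) for t, n in term_freq.items() if t.startswith(prefix)]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))[:top]


suggestions = suggested_terms(DOCS, "enz")
print(suggestions)
```

Note that "energy" is not suggested: it shares only the first two letters with the input substring.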




EXERCISE 7 – CATEGORIZE

Objective

In this exercise, provide an input document in order to determine the document categories from the reference collection that are most similar to the input document based on the terms used.

Exercise Description

Execute the text mining function TM_CATEGORIZE_KNN with a new input document

Show top most-similar categories from the reference documentation set matched to the new input document



EXERCISE 7 – SOLUTION

In SAP HANA Studio

Steps


1) In the SQL console, highlight the following SQL syntax:

SELECT T.RANK,
       T.CATEGORY_VALUE,
       T.NEIGHBOR_COUNT,
       T.SCORE
FROM TM_CATEGORIZE_KNN (
    DOCUMENT '
The molecule known as coenzyme A plays a key role in cell metabolism by
regulating the actions of nitric oxide. Coenzyme A sets into motion a process
known as protein nitrosylation, which unleashes nitric oxide to alter the
shape and function of proteins within cells to modify cell behavior. The
purpose of manipulating the behavior of cells is to tailor their actions to
accommodate the ever-changing needs of the body’s metabolism.
'
    SEARCH NEAREST NEIGHBORS 15 "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 16 PROGRAM FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
) AS T;




2) Notice the categorization function determines the top categories from the most-similar reference documents and does a weighted comparison by adding and normalizing the similarities for each category value.
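The "adding and normalizing" step can be sketched in Python as a similarity-weighted k-nearest-neighbors vote. This is a minimal illustration under stated assumptions: cosine similarity over bag-of-words vectors, neighbors with zero similarity ignored, and category scores normalized so that they sum to 1. The labeled documents are made up, and the exact normalization SAP HANA applies is not given in this document.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled reference documents: (category, text).
REFERENCE = [
    ("BIOLOGY", "enzyme protein cell metabolism"),
    ("BIOLOGY", "cell protein membrane"),
    ("PHYSICS", "particle energy collision"),
    ("CHEMISTRY", "enzyme reaction catalyst"),
]


def vectorize(text):
    """Bag-of-words vector: term -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize_knn(reference, document, neighbors=15, top=16):
    """Return (category, neighbor_count, normalized_score), best first."""
    query = vectorize(document)
    nearest = sorted(
        ((cosine(query, vectorize(text)), category)
         for category, text in reference),
        key=lambda pair: pair[0],
        reverse=True,
    )[:neighbors]
    totals = defaultdict(float)
    counts = Counter()
    for similarity, category in nearest:
        if similarity > 0:  # skip neighbors that share no terms
            totals[category] += similarity
            counts[category] += 1
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(category, counts[category], score / grand_total)
            for category, score in ranked[:top]]


result = categorize_knn(REFERENCE, "enzyme regulates cell metabolism")
for row in result:
    print(row)
```

A category backed by several similar neighbors accumulates a larger summed similarity than one backed by a single neighbor, which is why BIOLOGY outranks CHEMISTRY in this toy data.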



www.sap.com

© 2015 SAP SE or an SAP affiliate company. All rights reserved.


