openSAP TEXT ANALYTICS WITH SAP HANA PLATFORM – WEEK 4
Version: January 22, 2016
Exercises / Solutions
Anthony Waite / SAP Labs, LLC.
Bill Miller / SAP Labs, LLC.
EXERCISE 1 – CREATE FULLTEXT INDEX WITH TEXT MINING ON AND GET RELEVANT DOCUMENTS
Objective
In this exercise, you will execute a SQL statement to create a fulltext index for your copy of the workshop data and specify the TEXT MINING ON parameter. You will then discover the top-ranked relevant documents based on an input term.
Exercise Description
Create a fulltext index and text mining index from the reference document set
Monitor the progress and status of text processing
Execute the text mining function TM_GET_RELEVANT_DOCUMENTS with the input term “enzyme”
Show top-ranked documents from the reference documentation set relevant to “enzyme”
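The statements in exercises.sql are the authoritative versions. As a rough sketch of the two operations listed above, the following assumes the schema, table, and text column that appear elsewhere in this document. The index name AWARDS_TM_IDX, the index options other than TEXT MINING ON, and the returned column AWARD_ID are hypothetical and chosen only for illustration.

```sql
-- Sketch only: the index name and the options other than TEXT MINING ON are assumptions.
CREATE FULLTEXT INDEX "AWARDS_TM_IDX"
    ON "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS" ("AWARD_ABSTRACT")
    FAST PREPROCESS OFF                  -- assumed: allow full linguistic processing
    TEXT ANALYSIS ON                     -- assumed: tokenization, stemming, POS tagging
    CONFIGURATION 'LINGANALYSIS_FULL'    -- assumed configuration name
    TEXT MINING ON;                      -- builds the text mining index

-- Sketch only: "AWARD_ID" is a hypothetical identifying column of the table.
SELECT T.*
FROM TM_GET_RELEVANT_DOCUMENTS (
    TERM 'enzyme'
    SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 16 "AWARD_ID"
) AS T;
```

Run the second statement only after indexing has finished, since the text mining functions read from the text mining index.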
1) Under the “Repositories” tab, navigate to “(Default) / student00 / solutions / week-4”. Double-click on “exercises.sql”.
2) If “No connection to database” is displayed in the SQL console, click on the “Choose Connection” icon, which is found to the right of the green circle with an arrow (Execute) icon.
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
6) In the SQL console, highlight the following SQL syntax:
SELECT * FROM SYS.M_FULLTEXT_QUEUES
WHERE SCHEMA_NAME = 'OPENSAP_TA_WORKSHOP'
  AND TABLE_NAME = 'student00.data::AWARDS';
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
7) You can monitor the progress and status of the text analysis processing (tokenization, stemming and part-of-speech tagging), which improves the quality of text mining. After the job finishes, the text mining index (a.k.a. term-document matrix) is created.
Note: Wait until all of the reference documents have been indexed before executing the following text mining functions.
8) In the SQL console, highlight the following SQL syntax:
4) Notice this text mining function shows the top-ranked documents related to the initial input document already found in the reference documentation set. The initial input document is also returned with a score of 1.0, since it’s a perfect match for itself.
1) In the SQL console, highlight the following SQL syntax:
SELECT T.TERM,
       T.NORMALIZED_TERM,
       T.TERM_TYPE,
       T.TERM_FREQUENCY,
       T.DOCUMENT_FREQUENCY,
       T.SCORE
FROM TM_GET_RELATED_TERMS (
    TERM 'enzyme'
    SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 16
) AS T;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
2) Notice this text mining function shows the top-ranked terms related to the input term "enzyme". The input term itself, which is already found in the reference documentation set, is returned with a perfect score of "1".
In this exercise, provide an input document in order to determine the document categories from the reference collection that are most similar to the input document based on the terms used.
Exercise Description
Execute the text mining function TM_CATEGORIZE_KNN with a new input document
Show top most-similar categories from the reference documentation set matched to the new input
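A categorization call of this kind might look like the sketch below. It assumes the same table and text column used elsewhere in this document. The input text, the neighbor count of 20, the TOP 5 limit, and the category column AWARD_CATEGORY are hypothetical; the statement in exercises.sql is the one to run.

```sql
-- Sketch only: the input text, k = 20, TOP 5 and "AWARD_CATEGORY" are assumptions.
SELECT T.*
FROM TM_CATEGORIZE_KNN (
    DOCUMENT 'A short abstract about enzyme activity in plant cells.'
    SEARCH NEAREST NEIGHBORS 20 "AWARD_ABSTRACT"
        FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    RETURN TOP 5 "AWARD_CATEGORY"
        FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
) AS T;
```

The NEAREST NEIGHBORS value controls how many of the most-similar reference documents are consulted, and the RETURN clause names the column whose values serve as categories.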
2) Notice the categorization function determines the top categories from the most-similar reference documents and does a weighted comparison by adding and normalizing the