How simple should Text Analytics be? (Playing with Semantria.com)
Nov 19, 2014
Disclaimer: this PowerPoint document was written 5 minutes after I first saw a demonstration… I haven’t tested it myself yet to find its
limitations. There are other text analytics providers out there.
Semantria’s key benefits.
1. Excel based* – everyone knows Excel.
2. It’s for SMEs rather than Enterprise-scale
companies, with a pricing model to match (so
even someone in a hurry in a large corporation can
also get it signed off quickly).
3. Fast! - Processes up to 2000 docs/sec.†
4. Has Lexalytics as its engine – Radian6 etc. use
Lexalytics as their sentiment engine
* Excel for Windows 2010
† We did 1,000 TripAdvisor comments quicker than a sneeze
Semantria’s key benefits...cont.
5. It is simple to use – even I understood it.
Procedure
1. Pull verbatims into Excel (see next slide)
2. Start analysis
3. Document mode
4. Select range
5. [processes the document]
6. Get the results…
7. Yep, that easy.
Notes
1. It shows multiple entries for single verbatims –
because that’s the way it is
2. Document sentiment from a bank of 1.8 million
sentiment phrases (e.g. good, v good) with
‘amplifiers’ (e.g. really + good) – logarithmic scale of
-7 -> +7; 95% of results fall within -1 -> +1
3. Entity = people, places, companies, job titles, times
etc.
4. Entity evidence – how many phrases are there to
support that sentiment judgement*
5. Themes – noun-phrases which are important to the
document and bear the most value to the theme of the
sentiment
* (1=ok for Twitter because it’s a short communication; ignore 1 for longer
document types)
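To make notes 2 and 4 a bit more concrete, here is a toy sketch of phrase-based scoring with amplifiers. It is not Semantria’s or Lexalytics’ algorithm: the tiny lexicon, the weights, the amplifier multiplier and the function name `score_document` are all invented for illustration, and the only thing borrowed from the demo is the -7 -> +7 range (the logarithmic scaling itself is not modelled).

```python
# Toy illustration only: these phrases, weights and multipliers are made up
# for this sketch; they are NOT Semantria's or Lexalytics' data.
PHRASE_SCORES = {"good": 0.5, "bad": -0.5, "rude": -0.75, "friendly": 0.75}
AMPLIFIERS = {"really": 1.5, "very": 1.5, "v": 1.5}


def score_document(text):
    """Return (score, evidence) for one verbatim.

    score    -- sum of phrase weights, clamped to the -7..+7 range
    evidence -- how many sentiment phrases supported the score
    """
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    total = 0.0
    evidence = 0
    for i, word in enumerate(words):
        if word not in PHRASE_SCORES:
            continue
        weight = PHRASE_SCORES[word]
        # An amplifier directly before the phrase strengthens it.
        if i > 0 and words[i - 1] in AMPLIFIERS:
            weight *= AMPLIFIERS[words[i - 1]]
        total += weight
        evidence += 1
    # Clamp to the -7..+7 range mentioned in the notes.
    return max(-7.0, min(7.0, total)), evidence


if __name__ == "__main__":
    for verbatim in ["Really good food.", "The staff were rude."]:
        print(verbatim, score_document(verbatim))
```

The second value returned is the ‘evidence’ count: a score backed by a single phrase is fine for a tweet, less convincing for a long review.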
Notes…cont.
7. Categorisation Engines built-in: so the user doesn’t
need to train the software in the user’s industry.*
8. Query – allows you to further personalise categories
to suit the user’s industry/specific needs
* searched 7TB of Wikipedia to build a giant thesaurus at the heart of the
engine…it knows ‘Coca Cola’ is a beverage, closely related to vodka, not
at all related to shoes…
The ‘Collection’ option allows users to dive into the problems identified in the first stage
1. You have an overview from the previous stage (e.g. ‘rude’
+ ‘staff’ seems to be coming up a lot).
2. Build a ‘Query’ that helps you identify which
posts/verbatims have this as a theme (e.g. look for
verbatims where ‘rude’ occurs within 20 characters of
‘staff’)
3. You can then either contact the customer who posted
the verbatim and apologise; or refer the matter
internally.
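For anyone who wants to see what a proximity query like the ‘rude’ near ‘staff’ example boils down to, here is a minimal Python sketch. It does not use Semantria’s query syntax (which I haven’t tried yet); the function names `within_chars` and `flag_verbatims` are hypothetical, and I’m assuming ‘within 20 characters’ means at most 20 characters between the two words, in either order.

```python
import re


def within_chars(text, term_a, term_b, max_gap=20):
    """True if term_a and term_b appear as whole words with at most
    max_gap characters between them, in either order (case-insensitive)."""
    a, b = re.escape(term_a), re.escape(term_b)
    pattern = re.compile(
        rf"\b{a}\b.{{0,{max_gap}}}?\b{b}\b|\b{b}\b.{{0,{max_gap}}}?\b{a}\b",
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.search(text) is not None


def flag_verbatims(verbatims, term_a="rude", term_b="staff", max_gap=20):
    """Return only the verbatims where the two terms occur close together."""
    return [v for v in verbatims if within_chars(v, term_a, term_b, max_gap)]


if __name__ == "__main__":
    comments = [
        "The staff were so rude to us",
        "Lovely room, friendly staff",
        "Rude staff at check-in",
    ]
    for hit in flag_verbatims(comments):
        print(hit)
```

The flagged verbatims are the ones you would then follow up on, either with the customer or internally.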
I’m off to try it now…
*10,000 API calls as part of a free trial.
I’ll let you know how it goes…
Appendices
Verbal Identity is a brand consultancy specialising in language.
We create language which creates value for our clients.
We work for brands in automotive, retail and telecoms.
Find out more: www.verbalidentity.co.uk
As part of our approach to providing quantifiable solutions, we will work with a number of text
analytics providers. We have no connection with Semantria.com