© Digital Sonata Pty Ltd

MiaMia: Poor Man's Internet

By Vadim Berman

Introduction

Today mobile communication as a technology has two faces. First, it is a glitzy parade of
smartphones, 3G, wireless internet, built-in cameras boasting megapixels of resolution,
games, and other things you would not have expected to find in a tiny device in your
pocket. Second, it is the much less trumpeted phenomenon of finally bringing modern
communication capabilities to people in remote parts of the world.

Techies are rarely fascinated with the old or the trivial. It is not surprising, then, that
the statistic that 9% of mobile users use mobile internet makes the headlines, while the
fact that China has 600 million mobile subscribers, half of whom don't even have a voice
mailbox, goes unnoticed. Not many people know that the world's best-selling model is the
cheap Nokia 1100, with over 200 million units sold by 2007. It does not support internet,
let alone 3G. This makes the humble no-frills devices more significant than the iPhone,
both as agents of change and in monetary terms. Low-tech mobile communication is not just
an empty half of the glass; it is more like a significant 90%. Clearly, 90% of the users
cannot be ignored.

Using the New to Power the Old

Sending and receiving text is a capability that nearly every mobile phone has. The
original purpose of the internet was getting answers. MiaMia's purpose is to provide this
basic functionality. A user asks a question and gets an answer. The question can be asked
via voice or SMS, and the answer is delivered via SMS or email.

MiaMia fundamentally differs from other answering services. Rather than completely
delegating the task to humans or, at the opposite extreme, providing limited functionality
with a fixed syntax, MiaMia is a hybrid engine in which humans work alongside powerful
natural language processing software. MiaMia is not limited to English; it is multilingual.

We use powerful NLP software to drive MiaMia's artificial brain. It first tries to
understand the question, then checks whether it has the knowledge or tools to answer it.
If not, the question is delegated to humans, whose answer is inspected and learned.
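
The flow above (understand, try to answer, otherwise delegate and learn) can be sketched
roughly as follows. This is only an illustration under loud assumptions: the class
HybridAnswerer, its methods, and the trivial text normalization that stands in for real NLP
analysis are all hypothetical and are not part of MiaMia.

```python
from typing import Callable, Dict, Tuple


class HybridAnswerer:
    """Toy model of a hybrid human/machine answering loop (hypothetical names)."""

    def __init__(self, ask_human: Callable[[str], str]) -> None:
        # Answers the system already "knows", keyed by the normalized question.
        self.knowledge: Dict[str, str] = {}
        # Stand-in for the human operators.
        self.ask_human = ask_human

    @staticmethod
    def understand(question: str) -> str:
        # Placeholder for real analysis: lowercase, trim punctuation, squeeze spaces.
        return " ".join(question.lower().strip(" ?!.").split())

    def answer(self, question: str) -> Tuple[str, str]:
        key = self.understand(question)
        if key in self.knowledge:
            return self.knowledge[key], "machine"
        reply = self.ask_human(question)
        if reply.strip():
            # Minimal "inspection": only non-empty human answers are learned.
            self.knowledge[key] = reply
        return reply, "human"
```

The first time a question arrives it goes to the human callback; a later question that
normalizes to the same key is answered from the learned store without human involvement.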

MiaMia's purpose, as the title implies, is to serve as a poor man's internet. MiaMia can be
perceived as an NLP and basic reasoning application server, where the most standard
tasks are handled by small applications, or NLPlets, if you will. Does a specific class of
questions require looking up data in a database? Connecting to a web service? Searching
the web? With all the dirty work of disambiguation, communication with the user,
part-of-speech tagging, language detection, domain extraction, and insanely complex NLP
logic done by MiaMia, creating a new NLPlet is similar to building a widget. The NLPlets
are triggered not by keywords, but by special language-neutral criteria.
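
One way to picture NLPlets being triggered by language-neutral criteria rather than
keywords is a registry keyed on concept identifiers. The sketch below is an assumption for
illustration only: the concept IDs, the NLPletRegistry class, and the "most specific match
wins" rule are hypothetical and are not taken from MiaMia.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

Handler = Callable[[Set[int]], str]

# Hypothetical concept IDs; real ones would come from the semantic database.
RESTAURANT, LOCATION, NEWS = 101, 102, 201


class NLPletRegistry:
    """Maps sets of required concepts to handlers (hypothetical design)."""

    def __init__(self) -> None:
        self._entries: List[Tuple[FrozenSet[int], Handler]] = []

    def register(self, required: Set[int], handler: Handler) -> None:
        self._entries.append((frozenset(required), handler))

    def dispatch(self, concepts: Set[int]) -> Optional[str]:
        # A handler qualifies when all of its required concepts were detected.
        matches = [(req, h) for req, h in self._entries if req <= concepts]
        if not matches:
            return None
        # Prefer the most specific handler (largest set of required concepts).
        _, handler = max(matches, key=lambda m: len(m[0]))
        return handler(concepts)
```

Because the handlers see concept IDs rather than words, the same registration would fire
for a question asked in any language that the analysis step maps to those concepts.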

The NLPlets are best demonstrated using MiaMia's test client application. The test client
connects to the MiaMia database and NLP engine, parses the question, and decides which
NLPlet is to be invoked. In addition to the reply, the test client outputs the language
detected by the system, the domains, and the application code, such as an SQL query or a
URL that calls a web service.

Restaurant search is standard fare these days. Figure 1 demonstrates an answer to a
relatively simple question, “Are there Middle Eastern eateries in Ghent?”

Fig. 1

When asked about Middle Eastern restaurants in a particular location, the engine does not
just look for restaurants that describe themselves as “Middle Eastern”; it analyzes the
question in depth and also tries to find restaurants with a more specific cuisine by
expanding the argument using semantic connections. The T=x% fragments provide a
cross-lingual representation, which means that a user can search in different languages.
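
The idea of expanding an argument through semantic connections can be sketched as
follows. Everything here is assumed for illustration: the tiny NARROWER table, the
restaurants table, and its column names are hypothetical, and the actual engine works on
language-neutral concept identifiers rather than English strings.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment of a semantic network: broader cuisine -> narrower cuisines.
NARROWER: Dict[str, List[str]] = {
    "middle eastern": ["lebanese", "turkish", "persian"],
}


def expand(term: str) -> List[str]:
    """Return the term plus everything reachable through narrower-term links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(NARROWER.get(current, []))
    return sorted(seen)


def build_query(cuisine: str, city: str) -> Tuple[str, List[str]]:
    """Build a parameterized SQL query over a hypothetical restaurants table."""
    terms = expand(cuisine)
    placeholders = ", ".join("?" for _ in terms)
    sql = (
        "SELECT name FROM restaurants "
        f"WHERE city = ? AND cuisine IN ({placeholders})"
    )
    return sql, [city] + terms
```

With this table, a query for "middle eastern" in "ghent" would also match restaurants
listed only as Lebanese, Turkish, or Persian.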

It is not just about matching the cuisine, of course; the questions can be more complex. If
you are a lost tourist, the currency conversion feature should come in handy, as shown in
Figure 2.

Fig. 2

As Figure 2 shows, the colloquial “bucks” is interpreted as “American dollars”, which in
turn are converted into the system currency, the euro (USD 90 = EUR 57.29 at the time).
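
The two steps in this example, normalizing a colloquial unit and converting to the system
currency, can be illustrated with a short sketch. The SLANG table and function name are
hypothetical, and the exchange rate is simply the one implied by the figures quoted above
(57.29 / 90), not a current rate.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical mapping from colloquial units to ISO currency codes.
SLANG = {"buck": "USD", "bucks": "USD"}

# Rate implied by the example in the text: USD 90 = EUR 57.29.
RATES_TO_EUR = {"USD": Decimal("57.29") / Decimal("90")}


def to_system_currency(amount: str, unit: str) -> Decimal:
    """Convert an amount in a (possibly colloquial) unit to euros."""
    code = SLANG.get(unit.lower(), unit.upper())
    rate = RATES_TO_EUR[code]  # raises KeyError for currencies without a rate
    euros = Decimal(amount) * rate
    return euros.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Decimal is used instead of float so that the result can be rounded to cents predictably.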

If you want to try something even more complex, you can ask about a dish, special
features, payment methods, and more, as shown in Figure 3.

Fig. 3

MiaMia is not just about food and establishments. If news is so big on the regular
internet, there is no reason why the “poor man's internet” can't have it too. The user does
not have to scream NEWS; MiaMia can understand a subtle hint as well, as shown in
Figure 4.

Fig. 4

A question like this also works in Dutch, as shown in Figure 5, and in French, as shown
in Figure 6.

Fig. 5

Fig. 6

We have covered the search function, but what about communications? If a user wants
to use email, MiaMia can help here as well. The user simply asks to send a message to an
email address, and the message is sent, as shown in Figure 7, and received by the
recipient, as shown in Figure 8.

Fig. 7

Fig. 8

Perhaps the most interesting use of MiaMia is the creation of mobile social networks.
In communities that are not used to the internet or have limited access to it, building a
low-cost “mobile craigslist” may create an information infrastructure that would otherwise
take years to build.

Imagine that you're a craftsman in the middle of nowhere. You want to expand your
reach, but the locals don't often visit the net for information. Even landlines are
scarce, so the Yellow Pages are not a hit either. This is where MiaMia comes in handy:
Figure 9 shows how a simple message is analyzed to update the database. This way, our
mini-web becomes truly interactive.

Fig. 9

Under the Hood: Carabao Language Kit

The MiaMia NLP application server relies on a natural language processing module
called Carabao Language Kit. Carabao is a comprehensive suite of tools and components
that perform a range of tasks, from text analysis to paraphrasing and translation.

Carabao views text as raw material which is eventually transmogrified into entities and
concepts. The input is broken into tokens, which are studied, analyzed, and run through
hundreds of what-if scenarios. Based on the results of these games, Carabao decides exactly
what each word means. Is this bass the fish or bass the voice? It's all about the context,
and, as Figure 10 shows, Carabao can figure out which one is which.

Fig. 10

Possibly the most distinctive feature of the suite's architecture is its linguistic
abstraction. Concepts like “noun”, “adjective”, “verb”, “part of speech”, and
“morphological case” are not known to the kernel. They are stored in a transactional
database, which can be edited to alter any aspect of the processing, as shown in Figure 11.

Fig. 11

The fact that the linguistic logic is not hard-coded opens boundless possibilities for
customization. A deployment team or a third party can modify absolutely everything,
develop a new language, or completely change the semantic network. No IT skills are
necessary: there are GUI tools for editing the database.

Grammar is often intertwined with semantics: sometimes semantics can be deduced from the
morphology, and in other cases grammar is influenced by the word's meaning. Therefore,
Carabao does not draw a solid distinction between pragmatics and grammatical rules. An
entity called a sequence takes care of a range of tasks, from part-of-speech tagging and
pragmatics mapping to transfer rules in translation mode and entity / event extraction. A
sequence can be described as a “regular expression for natural language processing”,
where each element of the sequence represents an element in a sentence. A mini-language
expresses the various properties required of or assigned to the element, resulting in
condensed code, which usually scares away the uninitiated. However, knowing this code is
not required. For example, a sequence like this:

T=91730$O=1$P=1$I=4%T=311$R10=BASE$I=3%T=306$R1=PART$I=2%R1=VERB$R10=BASE$R23=BODY$I=1%

is presented and edited in an editor tool like the one shown in Figure 12.

Fig. 12

Even a superficial examination of the sequence code shows that its elements are not linked
to the actual text or even to stems. Instead, sequence members are linked by sense. The
same paradigm is used to create a language-neutral representation of sentences, which can
be searched using any language in Carabao's database.
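
For readers curious about the surface shape of the sequence code quoted earlier, the
sketch below splits it mechanically. It rests on an assumption drawn only from how the
string looks: that “%” ends each element, “$” separates properties, and “=” separates a
property name from its value. What the individual keys (T, O, P, I, R1, and so on) mean is
not documented here, and the function name parse_sequence is hypothetical.

```python
from typing import Dict, List

# The sequence string quoted in the text.
SEQUENCE = (
    "T=91730$O=1$P=1$I=4%T=311$R10=BASE$I=3%"
    "T=306$R1=PART$I=2%R1=VERB$R10=BASE$R23=BODY$I=1%"
)


def parse_sequence(code: str) -> List[Dict[str, str]]:
    """Split a sequence string into elements, each a dict of property -> value."""
    elements: List[Dict[str, str]] = []
    for chunk in code.split("%"):
        if not chunk:
            continue  # skip the empty piece after the final "%"
        props: Dict[str, str] = {}
        for pair in chunk.split("$"):
            key, _, value = pair.partition("=")
            props[key] = value
        elements.append(props)
    return elements
```

Read this way, the example breaks into four elements, three of which carry a numeric T
property and none of which contains a literal word or stem.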

As Carabao is a universal reasoning machine, it can be used to refine intermediate results
in third-party applications such as speech recognition and optical recognition software;
after all, you need a brain between your ears to make out what was said. Figure 13 shows
how Carabao selects the most “sensible” version of similar-sounding words.
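
As a rough illustration of re-ranking candidate transcriptions, the sketch below scores
each candidate by how plausible its adjacent word pairs are and keeps the best one. This
is a deliberately simple stand-in technique (word-pair scoring), not Carabao's semantic
analysis, and the scores in PAIR_SCORE are invented for the example.

```python
from typing import Dict, List, Tuple

# Invented plausibility scores for adjacent word pairs (hypothetical data).
PAIR_SCORE: Dict[Tuple[str, str], float] = {
    ("how", "to"): 2.0,
    ("to", "recognize"): 2.0,
    ("recognize", "speech"): 3.0,
    ("a", "nice"): 1.0,
    ("nice", "beach"): 1.5,
}


def plausibility(words: List[str]) -> float:
    """Sum the scores of all adjacent word pairs; unknown pairs score zero."""
    return sum(PAIR_SCORE.get(pair, 0.0) for pair in zip(words, words[1:]))


def most_sensible(candidates: List[str]) -> str:
    """Pick the candidate transcription with the highest plausibility."""
    return max(candidates, key=lambda c: plausibility(c.lower().split()))
```

Given the candidates "how to wreck a nice beach" and "how to recognize speech", the second
wins because its word pairs score higher in the table.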

Fig. 13

History

In the early 2000s, Lernout & Hauspie, the Belgian speech and language technology giant,
went bankrupt. In its last years, when mobile communication was still limited to
phone calls and occasional messages, the company was busy, as it had always been,
building the future: a search engine called SofIA (Society of Intelligent Assistants).
SofIA's purpose was simple yet revolutionary: get a spoken question, return an answer.
Development halted when the company collapsed. The founders of L&H parted ways.

In another part of the planet, Vadim Berman, an Israeli software engineer, was building
an unorthodox automatic translation engine. The purpose of the experiments was to
create a generic analysis and transformation machine rather than a regular rule-based or
statistical translator. The hobby became an obsession; the obsession became a livelihood,
and, after five years and three countries, it materialized into Carabao, the linguistic
suite, and Digital Sonata, an Australian company providing linguistic engineering
products and services.

Jo Lernout, one of the founders of L&H, did not give up the dream of a more intuitive,
easier-to-use search engine. MIIA Holding was founded in Hong Kong in order to
implement the idea. Seeking guidance in business matters from a veteran of the language
engineering industry, Vadim Berman contacted Jo Lernout, only to discover that Jo was
still active in the field and that there was an excellent match between MIIA's needs and
Carabao.

Today, MiaMia is offered as a service or as a server for custom applications. Access to
information is necessary to get around in human society, whether it is industrialized or
developing. We are working hard to make MiaMia available in developing countries.
With a broad user base, no user learning curve, and no infrastructure required, MiaMia can
become a real catalyst of change.