Oct 31, 2014

The wave of Big Data is still at its peak, roughly five years into its age of prominence. Many are still merely amused by it, while a fortunate few have had a genuine taste of it – taste with essence. Others linger around the topics, the terminology, and the buzz!

This series is an attempt to get our arms around the domain and the key coordinates of the subject, and subsequently to dig deeper into implementation challenges, navigating closer to their core. Along the way we will vet tools and solution approaches, and see how knowledge from related fields of science fits into the overall ball game!

The main abode for this series going forward will be www.ganaakruti.com.

be3

Big Data in a Box!

Part I: Comprehending the Landscape

By

Kalyana Chakravarthy Kadiyala

Contact Info:

Tweet – #ganaakruti

Email – [email protected]

LinkedIn – http://www.linkedin.com/in/kadiyalakc/

insights: f(data) = ∫ log(data) d(context); context = {0, …, ∞}


Table of Contents

1. Disclaimer
2. Foreword
3. Datanomics – The Science behind Big Data
4. Big Data – Technical Enumeration
   4.1 How much big is Big?
   4.2 Handling the Big
   4.3 Lego talk – Key Building blocks
   4.4 Controls to Throttle
5. Big Data – Data-driven Infotainment
   5.1 Infotainment – Defining Moment
   5.2 Enterprises – What do they do?
   5.3 Role of an Analyst
   5.4 Netizens – Ideal State
6. Conclusion
7. Keywords
8. Bibliography

Illustration Index

Illustration 1: Connected World – Digitized Human Interactions
Illustration 2: Chaos Condition – Interaction Derailment
Illustration 3: Big Data Challenges – a visual perspective!
Illustration 4: Focus Areas – Big Data Solution Design
Illustration 5: Moving Parts – Layered Approach
Illustration 6: Key Throttles to control!
Illustration 7: Data-Driven Infotainment
Illustration 8: Enterprise Infotainment
Illustration 9: Connected World Experience – “All things Me”


1. Disclaimer

Below are a few pointers to help set your expectations about this article, help you form a context baseline, and let you explore the information to suit your appetite for consuming it.

• What is it about? It's an article about Big Data, covering the subject since its rise to prominence.

• Why now? There is a lot of buzz and jargon, and many variants of tool kits and solutions alike.

• How is it approached? Key domain coordinates are mapped out alongside a comprehension of the scenario.

• Concepts touched? Definitions with logical reasoning, key impact areas and drivers.

• Audience? Technical and functional readers who are intrigued by the phenomenon of Big Data. Categorically, this is meant for someone who would like to get their arms around the subject as a reconciliation attempt.

• Gotchas! For the most part we try to stay focused on the functional aspects. A few sections sway a bit into technicalities to cover the extents of the subject; you can skip those if you prefer.

• Handling? Technical concepts are elucidated by tying them to application through a use case; at minimum, a reasoning is given to touch on the aspect of 'why'. These concepts are classified by each player segment of socio-economics – Enterprises, Analysts and End Users.

• Objective – Intent is multi-faceted here, as described below:

◦ Gain a comprehensive understanding about the domain

◦ Gather essential coordinates to navigate the landscape

◦ Set the stone to build a Big Data Platform for running controlled big data science experiments

◦ Solutions built on top of this platform using unit metrics should run in any environment that supports similar configurations – again, just by being able to deal with configurations.

• Why this approach? It is a validation approach as we consume different topics, tool kits and implementation concepts. That requires an approach agnostic to any connected environment or platform – hence this one, where we can travel beyond mere hello-world intros.

• Is this a single shot? The idea is to split the learning into a series of activities and share the lessons in a series of connected articles. Code examples will be shared via Git. This article covers the landscape highlights; sequels dive deep into specifics, with use cases and experiments to solve specific problems.

• Note on Consumption

◦ Absolute No's – plagiarism is prohibited by all means and conditions.

◦ Questions – approach the Author

◦ Criticism – welcome by all means. It helps to fail early rather than fail late in this endeavor.

◦ Feedback – tweets, email, wall posts or general comments to follow

◦ Terms & Conditions – Author has adopted the GPL license. Terms and conditions in there apply here too!


2. Foreword

The wave of Big Data is still riding the markets at its high peaks, sweeping across industries and geographies. Deep within the tech dungeons, the effects of the euphoria still linger over many minds. Select groups have had an opportunity to experience the phenomenon, such as those who dealt with data generated from events at an astronomical scale – at least the events that led to the digital footprints we see today – long before the lexicon went viral and gained its prominence. Others gazed, and still gaze, at it with amusement.

The current trend had its genesis at least a decade ago, during the heyday of Telecommunications and the Internet revolution in general. Key contributors to the innovation we see and experience today include players from verticals such as Digital Marketing, Information Search, Telecommunications and Social Media (a relative newcomer). We now have several platforms to support the management of exponentially large digital footprints, along with tool kits that extract insights from such footprints. The quality and success of such tools is measured by how effectively they handle the noise levels in the data streams.

The age of prominence is probably about five years now. The reason we are drawn to it is that we seek betterment – in our overall socio-economic condition as well as in how we engage in interactions. The dynamics have changed by far. It all starts with an inquisitive impulse, gradually settling in as an essential need rather than a desire. This has resulted in the emergence of a new breed of humans – Netizens – at least within the urban realms; many more are being touched as the undercurrents spread to new territories. Enterprises are driving the adoption, primarily for reasons of their own, such as heeding the competition, sustaining bottom-line performance and accelerating the top line.

With adoption rates on the rise, the information highways are seeking expansion in their transport capabilities and processing efficiencies. Phrases such as "text me", "tweet me", "post it to my wall", customer-360, sentiments, emotions and so forth are now part of common expression. When set in motion, the data nuggets go through various stages of transformation, and at times get cloned. Related technical terms include Noise, Signals, Data Science, Data Streams, etcetera.

That said, in this article we will first seek coordinates to map out the Big Data landscape. We will begin by describing the factors that have driven the phenomenon so far, then try to paraphrase the subject in a more logical sense, using both technical and functional aspects. Finally, we provide a perspective on how each stakeholder category perceives and opines about the situation – at least how they try to relate the effects to their socio-economic conditions. The comprehension approach leverages terms such as Big Data, Voids – Data & Digital, Datanomics, Socio-Economics, Human Cognition, etc. Visual cues are used where appropriate to support the verbal elaboration at a high level of detail.

All theory and no practicals is not a good way of learning either, and it is a fair question to ask for practical exposure. In subsequent articles we will delve into specific aspects: go beyond abstractions, pick some specific use cases and try them using a Big Data Box. We will build the platform in due course. The canonical title given to this complete attempt is "be3: Experimenting Big Data in a Box!".


3. Datanomics – The Science behind Big Data

Datanomics – gibberish as it may sound – is the science behind all things Big Data. It is a lexicon derived from two other famous words: Data and Economics. Data, as most of us know, is a nugget of information that can describe an entity or an event, either in parts or as a whole. Economics is a social science that helps us visualize how well our economies function; it emphasizes studying the patterns associated with the production, distribution and consumption of various goods and services holding some positional value.

As social dwellers, we are among the entities that participate in, interact with and contribute to the overall economic functions and outcomes. Some are personal, while others are more generic and apply to the common core. With digitization, the trails of our participation are now held by machines – the common buzzword being hi-tech. Digital footprints, as we know, are mere electronic traces locked inside several silos (specific and contextual). When analyzed with other sources of information as catalysts, these traces can reveal opportunities to better and improvise our economic functions. They can also uncover patterns that help us foresee future probabilities, letting us either mitigate adversities or innovate and generate newer opportunities.

Illustration 1 describes the connected and integrated state of our digitized lives, with reasons to quote. The interaction experiences that we, as humans, gain are very sensitive to how we participate, interact and respond. These differ by situation and context. Some have short-term impact, while others last long with positive, negative or neutral effects. The takeaways are very contextual. For example, an Enterprise organization likes to gain better conversion rates to increase its market share, retain customers, sustain the bottom line, achieve top-line performance, and so forth. All of these require access to insightful information – information one can access on demand, in the shortest possible amount of time. It is important to provision such insights without compromising on confidence and accuracy. Analytics is the key here.

Changing dynamics in our socio-economic conditions are forcing us to adapt and innovate at a much faster rate than the rate at which factual information can be gathered and processed.

[Illustration 1: Connected World – Digitized Human Interactions. Cycle: capture → learn → apply → refine; re-use or re-purpose. Contextual factors: relevance, timing, location, outcome, language, comprehension, presentation, factual vs non-factual, situation.]

The success of Analytics is hedged on the contextual relevance of the insights it produces, given space and time dimensions as key variables. Converting information into insights can be challenging even when the context is kept constant. On the consuming end, participating entities require information at varying levels of abstraction – abstractions that suit their contextual needs. Some require detailed insights, while others are okay with the gist itself. The mode of representation and communication affects the ability to grasp; information can be represented and exchanged verbally, non-verbally, visually, or as a combination of these.

That said, the field of Information Technology is going through a major shift in its course of evolution. There are now many tools and choices, for both Enterprises and Individuals. As the adoption rate of digital avenues increases, the size of the footprints will also explode, in exponential proportions. Fragmentation can lead to higher noise levels at the time of processing and/or consumption. Simplifying this process to provide fact-based insights can get complex.

Analytics, stripped of its complexity gig, starts with a few basic assertions, so that we can comprehend and rationalize the current state of affairs – or at least question the condition in context. Diving deeper usually requires a few hypothesis-based data experiments. Such deep analysis is needed to separate knowns from unknowns, or at the least to keep us from entering a chaotic condition (Illustration 2).
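To make the idea of a hypothesis-based data experiment concrete, here is a minimal sketch in Python: a simple permutation test that asks whether an observed difference between two groups of measurements is signal or noise. The data, the scenario and the interpretation are hypothetical illustrations, not part of any specific toolkit.

```python
import random

def permutation_test(group_a, group_b, trials=10_000, seed=42):
    """Estimate how often a difference at least as large as the observed
    one shows up when group labels are shuffled at random (the null
    hypothesis being that the labels carry no signal)."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            extreme += 1
    return extreme / trials  # a small p-value suggests signal, not noise

# Hypothetical experiment: engagement scores before and after a campaign.
before = [12, 15, 11, 14, 13, 12, 16, 10]
after = [17, 19, 14, 18, 16, 20, 15, 18]
print("p ≈", permutation_test(before, after))
```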

Whatever the solution may be, one must pay close attention to the state of the data nuggets via their key features. These include the dimensions of time and space, and those describing their relevance to other participating entities. A breakdown of a few key aspects includes:

• Why – to stay-put as a social animal in line with the changing dynamics of our social interactions

• What – care for expressions, impressions, presumptions, imaginations, emotions, sentiments, etcetera

• How – juxtaposing your representation against the generational currents in socio-economic conditions

• Where – at every touch point between more than one entity (humans, machines and digital avatars)

• Things to note – time dimension and significance of timing based on relative context

• Don't – just scrub the glass surface by mere attraction or inquisitiveness

• Beware – of territorial conditions that apply and vary by context

• Enforce – self-induced, tolerable discipline. Baby-sitting is next to being nonsensical!

• Mitigate – degradation of biological reflexes, including drained cognition.

[Illustration 2: Chaos Condition – Interaction Derailment]

4. Big Data – Technical Enumeration

4.1 How much big is Big?

Big is a relative annotation, based on who is consuming and in what context. Something that is big for one entity may seem trivial to another. Let's check a few coordinates, so that we can quantify the element of Big without losing its contextual sense and associated quality aspects. The coordinates, in technical terms, are Volume, Variety, Velocity and Veracity. Some choose Value in place of Veracity; since Value is more contextual, we will leave it at its best abstract meaning and purpose. With all the data flowing through various channels, communications and exchange hubs, orchestration can become complicated.

The complexity of the situation can be expressed on a geometric plane, where each aspect of the challenge is represented by an axis. Illustration 3 provides a visual perspective of this scenario. The axes describe the constraints – Volume, Variety, Velocity and Veracity – and together these constraints drive the overall positional value of insights. A minimal sketch of quantifying them follows the list below:

• Volume – sheer size of the data sets as manifested from their source

• Variety – discrete forms in which data or facts can exist, either in parts or as a whole

• Velocity – the rate at which data gets generated

• Veracity – this usually refers to data lineage that is used to ascertain the facts
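As a rough illustration of how these four coordinates might be quantified for a concrete micro-batch of data, here is a minimal Python sketch. The record shape, the 'source' lineage field and the measures themselves are hypothetical assumptions chosen for illustration only, not an established metric.

```python
import json

def profile_four_vs(records, elapsed_seconds):
    """Crude 4-V profile of a batch of dict-shaped records.

    volume   - payload size in bytes
    variety  - number of distinct field-name signatures seen
    velocity - records per second for this batch
    veracity - fraction of records carrying a 'source' lineage field
    """
    payload = "\n".join(json.dumps(r) for r in records)
    signatures = {tuple(sorted(r.keys())) for r in records}
    with_lineage = sum(1 for r in records if r.get("source"))
    return {
        "volume_bytes": len(payload.encode("utf-8")),
        "variety_shapes": len(signatures),
        "velocity_rps": len(records) / elapsed_seconds,
        "veracity_ratio": with_lineage / len(records),
    }

# Hypothetical micro-batch of events from heterogeneous sources.
batch = [
    {"source": "sensor-7", "temp_c": 21.5, "ts": 1414713600},
    {"source": "web-log", "url": "/home", "ts": 1414713601},
    {"temp_c": 22.1, "ts": 1414713602},  # no lineage field
]
print(profile_four_vs(batch, elapsed_seconds=2.0))
```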

The Big Data play has three main components – Data, Infrastructure and Data Science. Data, as we know, is a collection of nuggets that describe facts and entities. The tool kits necessary to support data aggregation, storage and accessibility form the underlying infrastructure. Data Science is the black magic that turns discrete nuggets into composite information insights.

For the more intrigued mind, Big Data is a quest for Singularity. Data nuggets are in a constant spin, where they get sliced and diced until essential insights are produced. This includes analysis of deterministic behaviors and patterns along with non-deterministic causalities. It is an amalgamation of inferential statistics and applied mathematics.

[Illustration 3: Big Data Challenges – a visual perspective! Axes: volume, variety, velocity, veracity. Source types: machine-generated, digital documents, empirical sets, Internet streams, trade & commerce, academic, decision support, analytical QA archive. Handling stages: acquire, cleanse, integrate, enable, secure.]

Domain expertise and experience are key, so that you can better control noise levels and deliver clear insights.

Know the sources of your data, and ask the right questions. Also, assume by default that data can come from heterogeneous sources in discrete forms. The sources can be either internal or external to your operating realm. Following are a few examples:

• Machine Generated data such as event logs, sensor data, usage metrics, etc.

• Socio-economic digital footprints – social media posts, feedback, and other disjoint sets, etc.

• Residual data from our past consumption. For example, emails, text messages, etc.

• Disintegrated and fractured data – often caused by territorial or boundary conflicts.

The questions you ask must emphasize the aspect of "why". That means you must assert your feeds at each stage of consumption, which translates into a feature requirement for the underlying systems: support ad-hoc, filtered access to data without diluting its richness (see the sketch below). Once the data is available to you, gradually progress towards rationalization, comprehension and consolidation of your thoughts. This will help you meet your needs with specificity and objectivity.
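As a minimal sketch of what asserting your feeds at each stage of consumption might look like, here is a Python generator pipeline; the record layout, field names and filter rule are all hypothetical. Each stage validates or filters lazily, so ad-hoc access stays cheap and the surviving records keep their full richness.

```python
def acquire(lines):
    """Stage 1 - parse raw lines into records; drop what cannot be parsed."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 3:
            yield {"user": parts[0], "action": parts[1], "ts": parts[2]}

def assert_feed(records):
    """Stage 2 - assert basic expectations about every record."""
    for r in records:
        assert r["user"], "record without a user - bad feed?"
        yield r

def ad_hoc_filter(records, action):
    """Stage 3 - an ad-hoc, question-driven filter (the 'why' of the data)."""
    return (r for r in records if r["action"] == action)

raw = ["alice,click,1414713600", "bob,view,1414713601", "carol,click,1414713602"]
for r in ad_hoc_filter(assert_feed(acquire(raw)), action="click"):
    print(r)
```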

Also, please note that any experiment without clarity can cost you exponentially. Hence it is very important that you fail fast rather than fail long.

4.2 Handling the Big

Is Big Data a technological phenomenon, or a functional aspect driven by a use-case requirement? Is Big Data transactional or analytical in nature? Is it handled in batches or in real time? If it's real time, how real time are we talking? Do we get to work with snapshot data, or data that is constantly in motion, aka streams? What will be the vastness of the time dimension in the context? Is it forecasting or prediction? What will it take to process such vast amounts of data? Is there an impact on consumption – visually, or when approached on an ad-hoc basis? Is there a difference from the approach we have used so far for information security? These are a few of the many questions being asked by legions of technologists.

Taking a moment away from the buzz, let us quickly check how the tools are being transformed. Key components include computational power, I/O, storage and presentation. Multi-core processors are now the norm. Typical laptops come packed with quad-core processors clocking minimum speeds of at least 2.7 GHz, 8 GB of memory handling transactions at rates of 1800 MHz or better, and at least a terabyte of storage capacity – all at economically affordable prices on average. These components carry more intelligence than legacy components. For example, Graphics Processing Units (GPUs) are now being deployed to run computations that yield higher throughput than Central Processing Units (CPUs), helping balance the computational demands. This is all cutting edge.

Software frameworks and other requisite tool kits have also gone through a major overhaul. Programming languages now support constructs that reveal hooks to unlock the true potential of multi-core systems. Tool selection is critical to ensure an optimal balance between the key runtime expectations of a system – consistency, availability and partition tolerance – of course, while still delivering the best performance.

Digressing a bit, it is now possible to build an experimental platform on a mobile (in the functional sense) computing device itself. This will support controlled big data science experimentation needs, wherein different aspects of your application and data flows can be vetted, including the ability to extract logarithmic performance. This is highly helpful for debugging your applications in their gory detail before moving onto a larger cluster. The Open Source stable now has all the required software components, from Operating Systems through to Visualization frameworks. This sets the tone for the level of maturity each of the moving parts has reached over the past few years.
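Before committing a laptop to the "Big Data in a Box" role, it is worth sanity-checking it against the kind of specs mentioned above. Here is a minimal Python sketch; the thresholds simply mirror the numbers quoted in the text, and the memory probe assumes a Linux or macOS system.

```python
import os
import shutil

# Minimum specs named above: quad-core, 8 GB RAM, ~1 TB storage.
MIN_CORES, MIN_RAM_GB, MIN_DISK_GB = 4, 8, 1000

def box_readiness():
    """Very rough check that this machine can host controlled big data
    experiments. Memory probing via sysconf works on Linux/macOS only;
    adjust for other platforms."""
    cores = os.cpu_count() or 0
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    disk_gb = shutil.disk_usage("/").total / 1024**3
    checks = {
        "cores": (cores, MIN_CORES),
        "ram_gb": (round(ram_gb, 1), MIN_RAM_GB),
        "disk_gb": (round(disk_gb), MIN_DISK_GB),
    }
    for name, (have, need) in checks.items():
        print(f"{name}: have {have}, need {need} -> {'ok' if have >= need else 'short'}")

box_readiness()
```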

Speaking about mobility, there is one key area that you should still worry about – Energy Efficiency.


Reasons vary from being a pure mechanical control under the hood to how your applications are designed.

What else do we need to know about Big Data? This much is simple: don't get baffled. Embrace the change and be ready to fail fast instead of failing long.

4.3 Lego talk – Key Building blocks

A platform that handles big data sets in a cohesive manner must be built by factoring in both the functional and technical aspects of the data. This is essential at each stage, from initial analysis through to the implementation and operational phases. Following are a few essential terms that should be understood well in this context (a small sketch follows the list):

• Real-time – this metric is based on the time and space dimensions of the context that you are trying to meet objectively. Downstream validation can be vetted either in the form of presentation or as an event that drives further actions.

• Data Pipelines – these are conduits that acquire data from discrete sources into the data platform. They also support dissemination flows for data-as-a-service requirements.

• Insights Extraction – a constant cycle of mining and modeling data for insights. The goal is to enhance the richness of the insights the data can produce.

• Integrated Analytics – productive utilization of big data insights can be achieved only if the insights exhibit high levels of accuracy and confidence in how they represent the truth factor. This is where analytics as an interactive function is essential, achieved through visual and verbal cues.

• Semantic Inferences – the cognitive capacity of humans, and their attention span, is very limited. To deliver a consistent experience of insights, semantic inferences cannot be ignored when you package insights for consumption.

• Quality – as we learn, quality is a relative measure driven by consumer requirements. From a producer's perspective, quality is defined by quantitative descriptors such as durability, timing, etc.

• Post Relational – this topic concerns the representation formats of insights as integral pieces of data nuggets. Its emphasis is on the point that existing methods of representation, and the associated tool kits, are limited in their ability to cater to different data scenarios. A particular challenge is representing unknowns during mining exercises, where you may already hold a whole bunch of known pieces of information. Scalability and performance become non-trivial; data and computation complexity are the two sides of this coin.
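To ground the "Real-time" and "Data Pipelines" bullets, here is a minimal Python sketch of a pipeline stage that buckets an incoming stream into fixed time windows. The event shape and window size are hypothetical; the point is that the window you choose is exactly the time-and-space context that defines what real time means for you.

```python
from collections import defaultdict

def windowed_counts(events, window_seconds=60):
    """Bucket a (possibly unbounded) time-ordered event stream into fixed
    windows and emit a count per window as each window closes. 'Real-time'
    here is relative: pick the window that matches the context you serve,
    not an absolute notion of immediacy."""
    counts = defaultdict(int)
    current = None
    for ts, _payload in events:
        bucket = ts - (ts % window_seconds)
        if current is not None and bucket != current:
            yield current, counts.pop(current)  # previous window closed
        current = bucket
        counts[bucket] += 1
    if current is not None:
        yield current, counts.pop(current)      # flush the last window

# Hypothetical stream: (unix_timestamp, payload) pairs arriving in order.
stream = [(1414713600, "a"), (1414713630, "b"), (1414713665, "c"), (1414713700, "d")]
for window_start, n in windowed_counts(stream, window_seconds=60):
    print(window_start, n)
```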

[Illustration 4: Focus Areas – Big Data Solution Design. Key building blocks: Data Sourcing & Data Pipelines; Stream Processing & Insights Extraction; Data Interactions, Analytics, Reports & Visualization; Provisioning, Accessibility, DevOps & Security; Regulations, Governance, Standards & Enforcement.]


4.4 Controls to Throttle

As we learned in the previous sections, there are several moving parts that form a cohesive whole, which must then be scaled to meet the corresponding performance and throughput requirements. Illustration 5 represents this requirement. Instead of emphasizing the tenets of the CAP theorem alone, we focus on the concurrency, throughput and fail-over aspects. The use of terminology such as Java, Hadoop, Unix, etc. is only to set a contextual reference; there are many more such tools, terminologies and techniques, depending on the context.

Teeing these topics back into the discussions on aspects such as Data Streams and Data Science, we see a Big Data Engine at full throttle. Efficiency is a measure of how perfectly the discrete pieces align with their peers, such that you see a fluid, straight-line pass-through on the Digital Highways.

Skipping the gory details of implementation, let's assume you have a system that is now completely built from an infrastructure standpoint and is operational. In such circumstances, how do you monitor and manage the system and ensure five-nines productivity and efficiency? What does it even mean to ask for five nines? We are asking for consistent performance while keeping computational efficiency at a constant time factor. Other influential variables include data moving in the space and time dimensions.

Your try-catch blocks (exception or error handling) can only cover deterministic behaviors that account for known issues. Taming systems when they exhibit nondeterminism is a daunting task. Illustration 6 presents a set of least common denominators – controls – that you can throttle to monitor and manage your operational environments.

The goal here is to balance the operational constraints of Consistency, Availability and Partition Tolerance (the CAP aspects):

• Consistency – guarantee consistent usability, even if the underlying parameters deviate and vary.

• Availability – system services are always available on demand, per a constant SLA factor.

• Partition Tolerance – data and compute capacities are distributed across the various nodes of the cluster that forms the platform. Partition tolerance is about dealing with inherent fragmentation or dropped communications while still providing a complete experience to the entities that interact with the system. This primarily affects performance, and then results in a dwindling state of consistency and availability.

[Illustration 5: Moving Parts – Layered Approach. Layers: Hardware Resources (Physical, Emulated) → Infrastructure (Hadoop, JVM, Unix, etc.) → Application (Heap, Data Models, Compute, etc.).]

[Illustration 6: Key Throttles to control! Axes: Concurrency, Throughput, Failover.]
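As a toy illustration of throttling the three knobs in Illustration 6 – concurrency, throughput and failover – here is a minimal Python sketch. The limits, the back-off schedule and the flaky "service" are hypothetical stand-ins, not a production pattern.

```python
import random
import threading
import time

MAX_CONCURRENCY = 4          # concurrency throttle
MAX_REQUESTS_PER_SEC = 10.0  # throughput throttle (very crude pacing)
MAX_RETRIES = 3              # failover budget per call

slots = threading.BoundedSemaphore(MAX_CONCURRENCY)
results = []

def flaky_service(x):
    """Stand-in for a partition-prone downstream dependency."""
    if random.random() < 0.3:
        raise ConnectionError("simulated partition")
    return x * x

def call_with_throttles(x):
    with slots:                                  # cap in-flight work
        time.sleep(1.0 / MAX_REQUESTS_PER_SEC)   # pace the call rate
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                return flaky_service(x)
            except ConnectionError:
                if attempt == MAX_RETRIES:
                    raise                        # fail fast once budget is spent
                time.sleep(0.1 * attempt)        # back off, then retry

def worker(x):
    try:
        results.append((x, call_with_throttles(x)))
    except ConnectionError:
        results.append((x, "gave-up"))           # surfaced, not swallowed silently

threads = [threading.Thread(target=worker, args=(v,)) for v in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```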

5. Big Data – Data-driven Infotainment

5.1 Infotainment – Defining Moment

Paraphrasing the discussions so far: we need tools that deliver insightful, contextually relevant information – primarily with the ability to access it on demand, derived from quantifiable metrics and qualitative inferences. Anything else will be discarded as garbage without even a single glance. Once an effective baseline is established, at least three broad categories of audience will be drawn towards it – Enterprises, Analysts and Consumers (aka Users). Their requirements will be very discrete. Zoom in close, and no wonder you can also sense the randomized patterns that each of their requirements and levels of abstraction carry: everyone from a naive brain to a highly analytical, mature one will be glancing at the same piece of information.

Illustration 7 sets the visual context of data flows as a perennial stream. The participating entities produce and exchange information in more than one form – verbal, non-verbal, audio and video. During its course, the data gets transformed, and even trans-morphed, to suit different needs and contexts. An ideal infotainment situation is one where unwanted information gets discarded securely and with some sense of responsibility.

[Illustration 7: Data-Driven Infotainment – perennial data streams, with irrelevant data securely disposed of.]

5.2 Enterprises – What do they do?

Let's switch context to more specific examples. Moving in the order of enumeration, Enterprises (for-profit or non-profit) consider the infotainment medium a power boost. It is an essential tool in their chest that can unlock potential capabilities which they can further explore, if not totally capitalize on. There are now business models that are purely data-driven; data is their new currency of trade and commerce, used either to improve top-line performance or to sustain the bottom line. The field of Business Intelligence (where information is constantly augmented to produce actionable insights) is now at least two decades into its path towards maturity. From a merely passive, referential reporting or summed-up dashboard experience, it has moved to a very dynamic field of study – one that can reflect the current state of affairs with more agility, accuracy and confidence.

As the changes in the underlying fabric become more apparent, the rules of the trade (standards and governance rules) and the scope of their enforcement are also changing. The number of stakeholders and data stewards is now vast and diverse. As humans, we are still fond of instantaneous results. We capture the perspective of an Enterprise (for-profit or non-profit) in Illustration 8:

[Illustration 8: Enterprise Infotainment]

5.3 Role of an Analyst

Let's move to the next category of users – the Analysts. On the stage of infotainment, this group's role is primarily to dissect the state of affairs. Rational thinking and negation are key characteristics of this group. Their prime focus is on asking the fundamental questions: Why, What, Where, When and How. Their modus operandi can be either independent or biased, largely influenced by their level of association and degree of affiliation to a particular Enterprise or Organization. Their core strength lies in their ability to reflect the current state of affairs as a single truth statement. Their job is to provide a clear picture of the undercurrents within the socio-economic fabric.

This group's information requirements come at a variety of abstraction levels. Their use of information is subject to the standards, compliance and rules enforced by their controlling authority; once they travel beyond their confinements, the degree of enforcement gets even stricter. Unauthorized application or misrepresentation of facts will only backfire – instantaneously, decisively and intensely. They need to be supported in these requirements as well, besides the accessibility and availability of the required data. Once again, time and space will be your challenges.

5.4 Netizens – Ideal State

Moving down the path is our penultimate and critical actor on the stage of Infotainment – the Netizen! As powerful as they may sound, they can be just as vulnerable; they are the most exposed entity in the whole ecosystem. Compliance and social responsibility are usually at the very end of their priority list. They assume several aspects, ranging from availability to security. Wait a second! Let's understand this first: they like goods and services for free, but within a bond of trust – trust formed either by goodwill or by the presence of societal laws and governance policies.

Information is produced and consumed in frames. Think of a moment in a reel of film, where you have subjects and some context – something that is subject to change with the next scene on the roll. Typical challenges faced by this group include fragmentation of information and communications. Information will lose its relevance or value if it is not provided with the right timing. It's the same in both scenarios – when it is being asked for, and when it can be provided because of their subscriptions and preferences.

Irrespective of all the challenges, limitations, goodies and other factors that influence this group's touch points, it would be great if they could get a snapshot of their overall socio-economic condition. Illustration 9 provides a visual perspective of the need here:

[Illustration 9: Connected World Experience – "All things Me". Point-in-time interactions across society, relations, culture, career, finance and health, driven by motivators at varying know-how levels (serve the purpose, boost confidence, objective realization, clear messaging, elevate position, sustain, grow, secure). Information containment and exploration sources: infotainment media, educational systems, personalized digital assets, non-personal digital assets, non-digital assets, etcetera.]

6. Conclusion

Big Data, which started as marketing buzz, has now settled into more practical channels of application. In a retrospective approach, we have tried to get our arms around the concept of Big Data. This involved attempts to learn about the generational shifts, the manifestation sources, the general applicability, and the practical relevance to the different categories of entities that exist in our socio-economic landscape. We also touched on a few technical aspects such as platform architecture, degree of complexity, challenges, etc. As mentioned at the very beginning of this article, we want to experiment with and experience the phenomenon. We will cover the practical aspects in a sequel article – be3: Controlled Big Data Experimentation.

7. Keywords

We live in the age of Search. We depend on search tools to ask questions, and we are okay with even the slightest clue about what we are looking for. This section provides a vocabulary of words and phrases that will help you gain the context quickly and easily.

• Big Data • Contextual Relevance • Human Cognition
• Relative Sense • Time or Timing • Confidence
• Accuracy • Hadoop • High Performance Computing
• Data and Economics • Infotainment • Datanomics
• Data Science • Mathematics • Relational vs Post-relational
• Meta-data • Information Insights • Data Nuggets
• Netizen • Controlled Data Experiments • Patterns
• Noise • Signals • Data Abstraction
• Governance • CAP • Standards Compliance
• Semantic Inferences • Amplification • Insights with clarity


8. Bibliography

This section serves as a bibliography, linking to the various Internet sources that were tapped to acquire background knowledge on the topics of Big Data, Hadoop, Linux and High Performance Computing. Most of the known articles are referenced directly; many forum posts on sites such as Stack Overflow, OSDIR, Google Forums, etc. went untracked.

While some portions of this article provide links to external references, the following is the list of all known and tracked resources from the Internet. These were used to further the understanding and refine the grasp.

• Contextual Computing: Our Sixth, Seventh and Eighth Senses [http://www.forbes.com/sites/reuvencohen/2013/10/18/contextual-computing-our-sixth-seventh-and-eighth-senses/]

• Economics [http://en.wikipedia.org/wiki/Economics]

• Optimality Theory [http://en.wikipedia.org/wiki/Optimality_Theory]

• Open Sans Font – Apache License [http://cooltext.com/Download-Font-Open+Sans]

• Oracle VirtualBox Documentation [https://www.virtualbox.org/manual/UserManual.html]

• Consistency Types [http://en.wikipedia.org/wiki/Consistency_model#Types]

• Big Data Interest is Soaring, but Adoption Rates are Stalling [http://www.hightech-highway.com/communicate/big-data-interest-is-soaring-but-adoption-rates-are-stalling/]

• Is iOS7 A Better Innovation Platform than Android? [http://www.forbes.com/sites/haydnshaughnessy/2013/06/19/is-ios-7-a-better-innovation-platform-than-android/]

• Manifold [http://en.wikipedia.org/wiki/Manifold]

• Technological Singularity [http://en.wikipedia.org/wiki/Technological_singularity]

• Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services [http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf]

• w3.org: Semantic Web – Inference [http://www.w3.org/standards/semanticweb/inference]
