MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices

Demetris Zeinalipour

School of Pure and Applied Sciences, Open University of Cyprus

http://is.ouc.ac.cy/~zeinalipour/

Microsoft Research Cambridge, January 11th, 2008

Presentation Goals

• To provide an overview of recent developments in Wireless Sensor Network Technology

• To highlight some important storage and retrieval challenges that arise in this context

Acknowledgements

• This is joint work with my collaborators at the University of California, Riverside.

• Our results were presented in the following papers:

– "MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices", D. Zeinalipour-Yazti, S. Lin, V. Kalogeraki, D. Gunopulos and W. Najjar, The 4th USENIX Conference on File and Storage Technologies (FAST'05), San Francisco, USA, December 2005.

– "Efficient Indexing Data Structures for Flash-Based Sensor Devices", S. Lin, D. Zeinalipour-Yazti, V. Kalogeraki, D. Gunopulos and W. Najjar, ACM Transactions on Storage (TOS), ACM Press, Vol. 2, No. 4, pp. 468-503, November 2006.

Presentation Outline

1. Overview of Wireless Sensor Networks

2. Overview of Data Acquisition Models

3. The MicroHash Index Structure

4. MicroHash Experimental Evaluation

5. Conclusions and Future Work

Wireless Sensor Networks

• Resource-constrained devices used to monitor and study the physical world at high fidelity.

Wireless Sensor Device

Wireless Sensor Network

Wireless Sensor Networks

• Applications have already emerged in:
– Environmental and habitat monitoring
– Seismic and structural monitoring
– Understanding animal migrations and interactions between species
– Automation, tracking, hazard-monitoring scenarios, urban monitoring, etc.

Great Duck Island, Maine: temperature, humidity, etc.

Golden Gate Bridge, San Francisco: vibration and displacement of the bridge structure.

ZebraNet, Kenya: GPS trajectories.

Wireless Sensor Networks: The Great Duck Island Study (Maine, USA)

• Large-scale deployment by Intel Research, Berkeley, in 2002-2003.
• Focuses on monitoring the microclimate in and around the nests of endangered species that are sensitive to disturbance.
• They deployed more than 166 motes, installed in remote locations (e.g., 1,000 feet into the forest).

Wireless Sensor Networks

WebServer

Wireless Sensor Networks: The James Reserve Project, CA, USA

Available at: http://dms.jamesreserve.edu/

The Anatomy of a Sensor Device

• Processor, in various modes (sleep, idle, active)
• Power source: AA or coin batteries, solar panels
• SRAM, used for the program code and for in-memory buffering
• LEDs, used for debugging
• Radio, used for transmitting the acquired data to some storage site (the sink) at 9.6 Kbps-250 Kbps
• Sensors: numeric readings in a limited range (e.g., temperature -40F..+250F with one decimal point of precision) at a high frequency (2-2000 Hz)
• Storage

Sensor Devices & Capabilities

Sensing capabilities:
• Light
• Temperature
• Humidity
• Pressure
• Tone detection
• Wind speed
• Soil moisture
• Location (GPS)
• etc.

Devices pictured: Xbow's i-mote2, UC-Riverside's RISE, Xbow's Telos, UC-Berkeley's mica2dot, Xbow's Mica.

Characteristics

1. The energy source is limited.
Energy source: AA batteries, solar panels.

2. Local processing is cheaper than transmitting over the radio.
Transmitting 1 byte over the radio consumes as much energy as ~1200 CPU instructions.

3. Local storage is cheaper than transmitting over the radio.
Transmitting 512 B over a single-hop 9.6 Kbps (915 MHz) radio requires 82,000 μJ, while writing it to local flash requires only 760 μJ.
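A quick check of what the quoted measurements imply (plain arithmetic on the numbers above, nothing more):

```latex
\frac{E_{\text{radio}}(512\,\text{B})}{E_{\text{flash}}(512\,\text{B})}
  = \frac{82{,}000\,\mu\text{J}}{760\,\mu\text{J}} \approx 108,
\qquad
\frac{82{,}000\,\mu\text{J}}{512\,\text{B}} \approx 160\,\mu\text{J/B}
\quad\text{vs.}\quad
\frac{760\,\mu\text{J}}{512\,\text{B}} \approx 1.5\,\mu\text{J/B}
```

So, for this radio, sending a page over the air costs roughly two orders of magnitude more energy than writing it locally; by the ~1200-instructions-per-byte figure, the same 512 B transmission is worth about 512 × 1200 = 614,400 CPU instructions.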

Presentation Outline

1. Overview of Wireless Sensor Networks (WSN)

2. Overview of Data Acquisition Models

3. The MicroHash Index Structure

4. MicroHash Experimental Evaluation

5. Conclusions and Future Work

The Centralized Storage Model

• A Database that collects readings from many Sensors.

• Centralized: Storage, Indexing, Query Processing, Triggers, etc.

Centralized Storage I

Available at: http://www.xbow.com/

Crossbow's MoteView software
• No in-network aggregation
• No in-network filtering

Centralized Storage II

Available at: http://telegraph.cs.berkeley.edu/tinydb/

TinyDB - A Declarative Interface for Data Acquisition in Sensor Networks.

• In-network aggregation
• In-network filtering (i.e., the WHERE clause)

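[Figure: in-network MAX aggregation over sensors v1-v5; partial maxima are merged along the routing tree so that the sink only receives MAX = 90]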

e.g., SELECT MAX(temp) FROM sensors
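To make the in-network aggregation idea concrete, here is a minimal C sketch of how a MAX query could be evaluated up a routing tree: each node combines its own reading with its children's partial maxima and forwards a single value. The `struct node` tree and the `partial_max` function are hypothetical illustrations, not TinyDB's code; in a real deployment each recursive call would correspond to a radio message from child to parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routing-tree node: one local reading plus up to
 * MAX_CHILDREN children. Illustrates in-network aggregation only;
 * this is not TinyDB's implementation. */
#define MAX_CHILDREN 4

struct node {
    int reading;                               /* local sensor value */
    size_t n_children;
    const struct node *children[MAX_CHILDREN];
};

/* Each node forwards one partial result (its subtree maximum)
 * instead of relaying every raw reading it receives. */
int partial_max(const struct node *n)
{
    assert(n != NULL && n->n_children <= MAX_CHILDREN);
    int best = n->reading;
    for (size_t i = 0; i < n->n_children; i++) {
        int child_best = partial_max(n->children[i]);
        if (child_best > best)
            best = child_best;
    }
    return best;
}
```

The sink would call `partial_max` on the root; the saving is that every link carries one value per query, regardless of how many sensors sit below it.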

Centralized Storage: Conclusions

• Frameworks such as TinyDB:
- Are suitable for continuous queries.
- Push aggregation into the network but keep much of the processing at the sink.

• New challenges:
- Many applications do not require the query to be evaluated continuously (e.g., "What was the average temperature in the last 6 months?").
- In many applications there is no sink (e.g., remote deployments and mobile sensor networks).
- Local storage on sensors keeps increasing (e.g., RISE and, more recently, imote2).

Our Model: In-Situ Data Storage

1. The data remains in situ (at the generating site) in a sliding-window fashion.

2. When required, users conduct on-demand queries to retrieve information of interest.

[Figure: A Network of Sensor Databases, showing the sink and a programming board]
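A minimal sketch of the in-situ model, assuming a fixed-size window of readings kept on the node (`WINDOW`, `window_append` and `window_avg` are hypothetical names chosen for illustration): new readings overwrite the oldest ones, and a query such as an average over a time range is evaluated locally only when a user asks for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical window size; a real node would size this to its flash. */
#define WINDOW 8

struct reading { unsigned ts; int value; };

struct window {
    struct reading slots[WINDOW];
    size_t next;     /* slot that the next reading overwrites */
    size_t count;    /* number of valid slots (<= WINDOW) */
};

/* Store a reading locally; once full, the oldest reading is overwritten. */
void window_append(struct window *w, unsigned ts, int value)
{
    w->slots[w->next].ts = ts;
    w->slots[w->next].value = value;
    w->next = (w->next + 1) % WINDOW;
    if (w->count < WINDOW)
        w->count++;
}

/* On-demand query evaluated at the node: average of the readings whose
 * timestamp lies in [from, to]. Sets *found to 0 (and returns 0.0)
 * when no reading matches. */
double window_avg(const struct window *w, unsigned from, unsigned to, int *found)
{
    assert(w != NULL && found != NULL);
    long sum = 0;
    size_t n = 0;
    for (size_t i = 0; i < w->count; i++) {
        const struct reading *r = &w->slots[i];
        if (r->ts >= from && r->ts <= to) {
            sum += r->value;
            n++;
        }
    }
    *found = n > 0;
    return n > 0 ? (double)sum / (double)n : 0.0;
}
```

Only the query result would travel over the radio; the raw readings stay on the node until the window slides past them.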

In-Situ Data Storage: Motivation

Soil-Organism Monitoring (Center for Conservation Biology, UCR)
– A set of sensors monitors the CO2 levels in the soil over a large window of time.
– Not a real-time application.
– Many values may not be very interesting.

D. Zeinalipour-Yazti, S. Neema, D. Gunopulos, V. Kalogeraki and W. Najjar, "Data Acquisition in Sensor Networks with Large Memories", IEEE Intl. Workshop on Networking Meets Databases NetDB (ICDE'2005), Tokyo, Japan, 2005.

Presentation Outline

1. Overview of Wireless Sensor Networks

2. Overview of Data Acquisition Models

3. The MicroHash Index Structure

4. MicroHash Experimental Evaluation

5. Conclusions and Future Work

Flash Memory at a Glance

• The most prevalent storage medium used for sensor devices is flash memory (NAND flash).
• Fastest-growing memory market ('05: $8.7B, '06: $11B).

(NAND) Flash Advantages
• Simple cell architecture (high capacity in a small surface)
• Fast random access (50-80 μs) compared to 10-20 ms in disks
• Economical reproduction
• Shock resistant
• Power efficient

[Figure: surface-mount NAND flash; removable NAND devices]

Flash Memory at a Glance

1. Delete constraint: erasure of a page can only be performed at block granularity (i.e., 8 KB-64 KB).
2. Write constraint: writing can only be performed at page granularity (256 B-512 B), after the respective page (and its entire 8 KB-64 KB block!) has been erased.
3. Wear constraint: each page can only be written a limited number of times (typically 10,000-100,000).

[Figure: flash media divided into Block 1, Block 2, ..., Block n, each containing occupied and empty pages]

Asymmetric read/write energy cost (measurements using RISE, page size = 512 B):
• Read = 24 μJ
• Write = 763 μJ
• Block erase = 425 μJ
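The three constraints and the asymmetric costs can be captured in a small toy model. This C sketch is a hypothetical in-memory NAND simulator (the geometry and function names are assumptions; only the 24/763/425 μJ figures come from the measurements above): it refuses to rewrite a programmed page until its whole block is erased, counts erasures per block to track wear, and accumulates the energy spent.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy NAND model. Geometry is hypothetical; energy figures are the RISE
 * measurements for 512 B pages. */
#define PAGE_SIZE       512
#define PAGES_PER_BLOCK 16
#define N_BLOCKS        4
#define N_PAGES (N_BLOCKS * PAGES_PER_BLOCK)

#define E_READ_UJ  24.0
#define E_WRITE_UJ 763.0
#define E_ERASE_UJ 425.0

struct flash {
    unsigned char data[N_PAGES][PAGE_SIZE];
    bool programmed[N_PAGES];      /* written since the last erase? */
    unsigned erase_count[N_BLOCKS];/* wear per block */
    double energy_uj;
};

/* Delete constraint: erasure happens for a whole block at once. */
void flash_erase_block(struct flash *f, unsigned block)
{
    assert(block < N_BLOCKS);
    unsigned first = block * PAGES_PER_BLOCK;
    for (unsigned p = first; p < first + PAGES_PER_BLOCK; p++) {
        memset(f->data[p], 0xFF, PAGE_SIZE);  /* erased NAND reads as 1s */
        f->programmed[p] = false;
    }
    f->erase_count[block]++;                  /* wear constraint */
    f->energy_uj += E_ERASE_UJ;
}

/* Write constraint: a page is programmed at most once per erase cycle.
 * Returns false (and writes nothing) if the page is not erased. */
bool flash_write_page(struct flash *f, unsigned page, const unsigned char *buf)
{
    assert(page < N_PAGES);
    if (f->programmed[page])
        return false;                         /* erase the block first */
    memcpy(f->data[page], buf, PAGE_SIZE);
    f->programmed[page] = true;
    f->energy_uj += E_WRITE_UJ;
    return true;
}

void flash_read_page(struct flash *f, unsigned page, unsigned char *buf)
{
    assert(page < N_PAGES);
    memcpy(buf, f->data[page], PAGE_SIZE);
    f->energy_uj += E_READ_UJ;
}
```

An in-place update of one page thus costs a block erase plus rewriting every live page of that block, which is why the index designs that follow favour append-only writing.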

MicroHash Objectives

General objectives:
• Provide efficient access to any record stored on flash by timestamp or value.
• Execute a wide spectrum of queries based on our index, similarly to generic DB indexes.

Design objectives:
• Avoid wearing out specific pages.
• Minimize random-access deletions of pages.
• Minimize SRAM structures.
– SRAM is extremely limited (8 KB-64 KB).
– A small memory footprint => quick initialization.

Main Structures

• 4 page types: a) root page, b) directory page, c) index page and d) data page.

• 4 phases of operation: a) initialization, b) growing, c) repartition and d) garbage collection.
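One way to picture the four page types is as a common 512 B page with a small typed header. The layout below is purely a hypothetical sketch (the field names, the 8 B header and the `last_ts` field are assumptions, not the paper's on-flash format); the C11 `_Static_assert` lines check that the struct maps exactly onto one flash page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-flash layout for the four MicroHash page types. Field
 * names and sizes are illustrative assumptions only. */
#define PAGE_SIZE   512
#define HEADER_SIZE 8

enum page_type { PAGE_ROOT = 1, PAGE_DIRECTORY = 2, PAGE_INDEX = 3, PAGE_DATA = 4 };

struct page_header {        /* 8 bytes in total */
    uint8_t  type;          /* one of enum page_type */
    uint8_t  n_records;     /* used slots in the payload */
    uint16_t reserved;
    uint32_t last_ts;       /* e.g. newest timestamp known to this page */
};

struct page {
    struct page_header hdr;
    uint8_t payload[PAGE_SIZE - HEADER_SIZE];
};

/* Compile-time checks that a page maps exactly onto one flash page. */
_Static_assert(sizeof(struct page_header) == HEADER_SIZE, "header must be 8 B");
_Static_assert(sizeof(struct page) == PAGE_SIZE, "page must be 512 B");

/* Classify a page just read from flash by looking at its header only. */
enum page_type page_kind(const struct page *p)
{
    assert(p->hdr.type >= PAGE_ROOT && p->hdr.type <= PAGE_DATA);
    return (enum page_type)p->hdr.type;
}
```

Tagging every page with its type is what lets index and data pages share the same append-only log.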

Growing the MicroHash Index

• Collect data in an SRAM buffer page Pwrite.
• When Pwrite is full, flush it out to the flash media.
• Next, create index records for each data record in Pwrite.
• If SRAM gets full, index pages are forced out to the flash media by an LRU policy.

[Figure: buffer Pwrite holding the record (ts, 74F), the directory with buckets labeled 40-90, and the index pages]
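The growing phase can be sketched in C as follows. This is a simplified, hypothetical model rather than the MicroHash implementation: flash is an append-only array whose next empty page is `idx`, the directory is a fixed set of equiwidth buckets, index records store only the data page id (as in the NoOffset layout), and SRAM holds at most `SRAM_INDEX_SLOTS` index pages, evicting the least recently used one to flash when a new bucket needs room.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes; "flash" is an append-only array, idx its next empty page. */
#define RECS_PER_DATA_PAGE 4
#define IDS_PER_INDEX_PAGE 4
#define N_BUCKETS          8
#define SRAM_INDEX_SLOTS   2          /* index pages that fit in SRAM at once */
#define FLASH_PAGES        256
#define VMIN (-40)
#define VMAX 250

struct rec { unsigned ts; int value; };

struct fpage {                        /* a page on the simulated flash */
    int is_index;
    int bucket;                       /* index pages only */
    int n;
    struct rec recs[RECS_PER_DATA_PAGE];   /* data pages only */
    unsigned ids[IDS_PER_INDEX_PAGE];      /* index pages only */
};

struct slot { int bucket; unsigned last_use; struct fpage pg; };

struct microhash {
    struct fpage flash[FLASH_PAGES];
    unsigned idx;                          /* next empty flash page */
    struct fpage pwrite;                   /* SRAM buffer page Pwrite */
    struct slot sram[SRAM_INDEX_SLOTS];    /* bucket == -1 means free */
    unsigned clock;
};

static int bucket_of(int v) { return (v - VMIN) * N_BUCKETS / (VMAX - VMIN + 1); }

static unsigned flush(struct microhash *m, const struct fpage *p)
{
    assert(m->idx < FLASH_PAGES);
    m->flash[m->idx] = *p;
    return m->idx++;
}

/* Return the SRAM index page of bucket b; if absent, take a free slot or
 * evict the least recently used index page to flash (LRU policy). */
static struct slot *index_slot(struct microhash *m, int b)
{
    struct slot *victim = &m->sram[0];
    for (size_t i = 0; i < SRAM_INDEX_SLOTS; i++) {
        if (m->sram[i].bucket == b) return &m->sram[i];
        if (m->sram[i].bucket == -1 ||
            (victim->bucket != -1 && m->sram[i].last_use < victim->last_use))
            victim = &m->sram[i];
    }
    if (victim->bucket != -1 && victim->pg.n > 0)
        flush(m, &victim->pg);
    victim->bucket = b;
    victim->pg = (struct fpage){ .is_index = 1, .bucket = b, .n = 0 };
    return victim;
}

void mh_init(struct microhash *m)
{
    m->idx = 0; m->clock = 0; m->pwrite.n = 0; m->pwrite.is_index = 0;
    for (size_t i = 0; i < SRAM_INDEX_SLOTS; i++) m->sram[i].bucket = -1;
}

/* Append a reading; when Pwrite fills, flush it and index its records. */
void mh_insert(struct microhash *m, unsigned ts, int value)
{
    assert(value >= VMIN && value <= VMAX);
    m->pwrite.recs[m->pwrite.n++] = (struct rec){ ts, value };
    if (m->pwrite.n < RECS_PER_DATA_PAGE) return;

    unsigned data_id = flush(m, &m->pwrite);   /* data page goes first */
    for (int i = 0; i < RECS_PER_DATA_PAGE; i++) {
        struct slot *s = index_slot(m, bucket_of(m->pwrite.recs[i].value));
        s->last_use = ++m->clock;
        struct fpage *ip = &s->pg;
        if (ip->n > 0 && ip->ids[ip->n - 1] == data_id)
            continue;                          /* page already referenced */
        if (ip->n == IDS_PER_INDEX_PAGE) {     /* full index page -> flash */
            flush(m, ip);
            ip->n = 0;
        }
        ip->ids[ip->n++] = data_id;
    }
    m->pwrite.n = 0;
}
```

For brevity the sketch omits the back-pointers that would chain older index pages of the same bucket together, which a search by value needs in order to walk a bucket's history.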

Growing the MicroHash Index

[Figure: a populated flash media; idx marks the next empty page]

Garbage Collection in MicroHash

• When the media gets full, some pages need to be deleted => delete the oldest pages.
• Oldest block? The next block following the idx pointer.

Note:
• This might create invalid index records.
• These are handled by our search algorithm.
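A minimal sketch of this policy, assuming a circular flash whose size is a whole number of blocks (the names and geometry are hypothetical): when `idx` wraps onto a block that still holds data, that block is the oldest one and is erased in full before the new page is written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical geometry for a circular, log-structured flash. */
#define PAGES_PER_BLOCK 4
#define N_BLOCKS        3
#define N_PAGES (PAGES_PER_BLOCK * N_BLOCKS)

struct flash_log {
    unsigned idx;              /* next empty page (wraps around) */
    bool used[N_PAGES];        /* page holds data or index content */
    unsigned payload[N_PAGES]; /* stand-in for page contents */
    unsigned erases;
};

/* Reclaim the block containing `page`. Index records elsewhere may still
 * point into it; the search algorithm must treat those as invalid. */
static void gc_erase_block_at(struct flash_log *lg, unsigned page)
{
    unsigned first = (page / PAGES_PER_BLOCK) * PAGES_PER_BLOCK;
    for (unsigned p = first; p < first + PAGES_PER_BLOCK; p++)
        lg->used[p] = false;
    lg->erases++;
}

/* Append one page; when the media is full, the oldest block (the one
 * idx is about to enter) is erased first. */
unsigned log_append(struct flash_log *lg, unsigned value)
{
    assert(lg->idx < N_PAGES);
    if (lg->used[lg->idx])            /* wrapped around: media is full */
        gc_erase_block_at(lg, lg->idx);
    unsigned at = lg->idx;
    lg->used[at] = true;
    lg->payload[at] = value;
    lg->idx = (lg->idx + 1) % N_PAGES;
    return at;
}
```

Because writing always advances in one direction, this also spreads erasures evenly over all blocks, which serves the "avoid wearing out specific pages" objective.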

Directory Repartition in MicroHash

• MicroHash starts out with a directory that is segmented into equiwidth buckets (e.g., the temperature range [-40, 250] is divided into c buckets).
• This is not efficient, as certain buckets will not be utilized.
– Consider the first few or last few buckets below.
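The initial equiwidth mapping is a one-line computation. In this hypothetical C helper, integer values in [vmin, vmax] are split into c buckets; with the range [-40, 250] and, say, c = 10, bucket 0 covers -40F to -11F (in whole degrees), so a deployment whose readings never drop below 0F leaves it permanently empty.

```c
#include <assert.h>

/* Equiwidth directory: split the integer range [vmin, vmax] into c
 * buckets of (almost) equal width. Integer-only arithmetic, which suits
 * microcontrollers without an FPU. */
int equiwidth_bucket(int v, int vmin, int vmax, int c)
{
    assert(c > 0 && vmin <= v && v <= vmax);
    long span = (long)vmax - vmin + 1;          /* number of integer values */
    return (int)(((long)v - vmin) * c / span);  /* result in 0 .. c-1 */
}
```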

Directory Repartition in MicroHash

• If bucket A links to more than τ index pages, evict the least used bucket B and segment bucket A into A and A'.
• We want to avoid bucket reassignments of old records, as this would be very expensive.

Example: τ = 2; C: #entries since last split; S: timestamp of last addition.
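The split rule could look roughly like this in C. It is a sketch under stated assumptions, not the paper's algorithm: buckets are kept sorted by lower boundary, "least used" is taken to mean the smallest C with ties broken by the oldest S, and an evicted bucket's range is simply absorbed by a neighbour. Index pages already on flash are never touched, in line with the goal of not reassigning old records.

```c
#include <assert.h>

/* Hypothetical directory: bucket i covers [b[i].lo, b[i+1].lo); the last
 * bucket extends to vmax. */
#define MAX_BUCKETS 4
#define TAU 2                       /* split threshold, as in the example */

struct bucket {
    int lo;                         /* lower boundary of the value range */
    unsigned c;                     /* C: index pages since last split */
    unsigned s;                     /* S: timestamp of last addition */
};

struct directory { struct bucket b[MAX_BUCKETS]; int n; int vmax; };

static int width(const struct directory *d, int i)
{
    int hi = (i + 1 < d->n) ? d->b[i + 1].lo : d->vmax + 1;
    return hi - d->b[i].lo;
}

/* Split bucket a if it links to more than TAU index pages, first evicting
 * the least used other bucket when the directory is full. Returns 1 if a
 * split happened. */
int maybe_split(struct directory *d, int a)
{
    assert(a >= 0 && a < d->n);
    if (d->b[a].c <= TAU || width(d, a) < 2) return 0;

    if (d->n == MAX_BUCKETS) {
        int victim = -1;
        for (int i = 0; i < d->n; i++) {
            if (i == a) continue;
            if (victim < 0 || d->b[i].c < d->b[victim].c ||
                (d->b[i].c == d->b[victim].c && d->b[i].s < d->b[victim].s))
                victim = i;
        }
        if (victim == 0)
            d->b[1].lo = d->b[0].lo;          /* right neighbour absorbs it */
        for (int i = victim; i + 1 < d->n; i++)
            d->b[i] = d->b[i + 1];            /* otherwise the left one does */
        d->n--;
        if (victim < a) a--;
    }
    int mid = d->b[a].lo + width(d, a) / 2;   /* A keeps [lo, mid) */
    for (int i = d->n; i > a + 1; i--)        /* open a slot for A' */
        d->b[i] = d->b[i - 1];
    d->b[a + 1] = (struct bucket){ .lo = mid, .c = 0, .s = d->b[a].s };
    d->b[a].c = 0;
    d->n++;
    return 1;
}
```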

Searching in MicroHash

• Searching by value: "Find the timestamp(s) on which the temperature was 100F"
– Simple operation in MicroHash.
– We simply find the right directory bucket, from there the respective index page, and then the data record (page by page).

• Searching by timestamp: "Find the temperature of some sensor at a given timestamp tq"
– Problem: index pages are mixed together with data pages.
– Solutions:
1. Binary search (O(log n), 18 pages for a 128 MB media)
2. LBSearch (fewer than 10 pages for a 128 MB media)
3. ScaleSearch (better than LBSearch, ~4.5 pages for a 128 MB media)
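A plain binary search works if probes that land on index pages are redirected to a nearby data page. The sketch below assumes each data page exposes the first timestamp it holds while index pages hold none (the "No Anchor" case); `struct page` and `find_by_ts` are hypothetical names. Since log2(256,000) ≈ 17.97, a 128 MB media needs about 18 probes, matching the figure quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated flash log: data pages carry the first timestamp they hold;
 * index pages carry none. */
struct page { int is_index; unsigned first_ts; };

/* Return the position of the last data page whose first timestamp is
 * <= tq (the page that would contain tq), or -1. Timestamps are assumed
 * to grow with page position. A probe that lands on an index page walks
 * left to the nearest data page. *reads counts page reads. */
long find_by_ts(const struct page *pg, size_t n, unsigned tq, unsigned *reads)
{
    assert(pg != NULL && reads != NULL);
    long lo = 0, hi = (long)n - 1, best = -1;
    *reads = 0;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        long p = mid;
        while (p >= lo) {                 /* skip index pages */
            ++*reads;
            if (!pg[p].is_index) break;
            p--;
        }
        if (p < lo) {                     /* only index pages in [lo, mid] */
            lo = mid + 1;
            continue;
        }
        if (pg[p].first_ts <= tq) {       /* candidate; look further right */
            best = p;
            lo = mid + 1;
        } else {
            hi = p - 1;
        }
    }
    return best;
}
```

The extra reads spent stepping over index pages are exactly the overhead that LBSearch and ScaleSearch try to avoid.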

LBSearch and ScaleSearch

Solutions to the search-by-timestamp problem:

A) LBSearch: we recursively create a lower bound on the position of tq until the given timestamp is located.

B) ScaleSearch: quite similar to LBSearch, but in the first step we proceed more aggressively (by exploiting the data distribution).

[Figure: a query for tq = 500; successive probes read pages with timestamps 300, 350, 420 and 490 before reaching 500]
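The lower-bound idea can be sketched under two simplifying assumptions that are ours, not the slides': every reading gets the next integer timestamp, and each data page holds exactly R readings. Because interleaved index pages can only push a record further right, a data page at position p with first timestamp t guarantees that tq lies at least (tq - t) / R positions ahead, so the search can jump there safely and repeat. The names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this sketch: consecutive integer timestamps (no gaps)
 * and exactly R readings per data page. */
#define R 4

struct page { int is_index; unsigned first_ts; };

/* Lower-bound search from the oldest page; returns the position of the
 * data page holding tq, or -1. *reads counts page reads. */
long lb_search(const struct page *pg, size_t n, unsigned tq, unsigned *reads)
{
    assert(pg != NULL && reads != NULL);
    size_t p = 0;
    *reads = 0;
    for (;;) {
        while (p < n) {                      /* advance to next data page */
            ++*reads;
            if (!pg[p].is_index) break;
            p++;
        }
        if (p >= n || pg[p].first_ts > tq)
            return -1;                       /* tq is not on the media */
        unsigned gap = tq - pg[p].first_ts;
        if (gap < R)
            return (long)p;                  /* tq lives on this page */
        p += gap / R;                        /* jump to the lower bound */
    }
}
```

ScaleSearch would replace the first jump with a more aggressive estimate based on the data distribution; that step is not shown.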

Searching Bottlenecks

• Index pages written on flash might not be fully occupied.
• When we access these pages, we transfer a lot of empty bytes (padding) between the flash media and SRAM.
• Proposed solutions:
– Solution 1: Two-phase page reads
– Solution 2: ELF-like chaining of index pages

Improving Search Performance

• Solution 1: Utilize two-phase page reads.
– Read the 8 B header from the flash media.
– Then read the correct payload in the next phase.
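A sketch of a two-phase read, assuming a hypothetical header whose first two bytes hold the record count and the record size; `flash_read` stands in for a driver call and is not a real TinyOS API. Phase one moves only the 8 B header; phase two moves only the occupied part of the payload.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout: byte 0 = record count, byte 1 = record size. */
#define PAGE_SIZE   512
#define HEADER_SIZE 8

/* Stand-in for a driver call that transfers len bytes starting at offset
 * off of a page; it also tallies the bytes moved into SRAM. */
static size_t bytes_moved;
static void flash_read(const uint8_t *page, size_t off, size_t len, uint8_t *dst)
{
    assert(off + len <= PAGE_SIZE);
    memcpy(dst, page + off, len);
    bytes_moved += len;
}

/* Phase 1 reads only the header; phase 2 reads just the used payload,
 * skipping the padding of a partially filled index page. */
size_t read_two_phase(const uint8_t *page, uint8_t *dst)
{
    uint8_t hdr[HEADER_SIZE];
    flash_read(page, 0, HEADER_SIZE, hdr);
    size_t used = (size_t)hdr[0] * hdr[1];   /* n_records * record_size */
    assert(used <= PAGE_SIZE - HEADER_SIZE);
    flash_read(page, HEADER_SIZE, used, dst);
    return used;
}
```

The trade-off is an extra flash command per page, which pays off when index pages are mostly empty.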

Improving Search Performance

• Solution 2: Avoid non-full index pages using ELF*.
– ELF: a linked list in which each page, other than the last page, is completely full.
– It keeps copying the last non-full page into a newer page when new records are added.

*Dai et al., Efficient Log Structured Flash File System, SenSys 2004
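The ELF rule can be modelled as a copy-on-append chain: whenever a record is added to a non-full tail page, the tail plus the new record is written to a fresh page and the old copy becomes garbage. The structure and names below are a hypothetical sketch, not ELF's actual code.

```c
#include <assert.h>

/* Only the tail page may be non-full. Flash cannot be updated in place,
 * so each append writes one fresh page. Sizes are hypothetical. */
#define CAP       4                 /* records per index page */
#define MAX_PAGES 64

struct ipage { int n; unsigned rec[CAP]; int prev; };  /* prev: older page */

struct chain {
    struct ipage flash[MAX_PAGES];
    int used;                       /* pages written so far */
    int tail;                       /* newest live page, -1 if empty */
    unsigned writes;
};

void chain_append(struct chain *c, unsigned rec)
{
    assert(c->used < MAX_PAGES);
    struct ipage np = { .n = 0, .prev = c->tail };   /* tail full: new page */
    if (c->tail >= 0 && c->flash[c->tail].n < CAP)
        np = c->flash[c->tail];     /* copy the non-full tail (keeps its prev);
                                       the old copy becomes garbage */
    np.rec[np.n++] = rec;
    c->flash[c->used] = np;
    c->tail = c->used++;
    c->writes++;
}
```

Because this toy writes a page on every append, it exaggerates the extra writes; the evaluation reports about 15% more writes for ELF in the actual comparison.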

Presentation Outline

1. Overview of Wireless Sensor Networks

2. Overview of Data Acquisition Models

3. The MicroHash Index Structure

4. MicroHash Experimental Evaluation

5. Conclusions and Future Work

Experimental Evaluation

• Implemented MicroHash in nesC.
• We tested it using TinyOS along with a trace-driven experimental methodology.
• Datasets:
– Washington State Climate: a 268 MB dataset containing readings from 2000-2005.
– Great Duck Island: 97,000 readings between October and November 2002.
• Evaluation parameters: i) space overhead, ii) energy overhead, iii) search performance.

Space Overhead of Index

• Index page overhead Φ = IndexPages / (DataPages + IndexPages)
• Two index page layouts:
– Offset, in which an index record has the form {datapageid, offset}
– NoOffset, in which an index record has the form {datapageid}
• 128 MB flash media (256,000 pages)
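A small helper for Φ, together with a records-per-page estimate for the two layouts. The byte sizes are assumptions chosen for illustration: an 8 B header, a 4 B datapageid (256,000 pages need 18 bits, so 16 bits would not suffice) and a 2 B offset.

```c
#include <assert.h>

/* Index page overhead: Phi = IndexPages / (DataPages + IndexPages). */
double index_overhead(unsigned long data_pages, unsigned long index_pages)
{
    assert(data_pages + index_pages > 0);
    return (double)index_pages / (double)(data_pages + index_pages);
}

/* Hypothetical sizes: 512 B page, 8 B header, 4 B page id, 2 B offset. */
enum { PAGE = 512, HDR = 8, ID_BYTES = 4, OFF_BYTES = 2 };

/* Index records that fit in one page for the Offset (with_offset != 0)
 * and NoOffset layouts. */
unsigned records_per_index_page(int with_offset)
{
    unsigned rec = ID_BYTES + (with_offset ? OFF_BYTES : 0);
    return (PAGE - HDR) / rec;
}
```

Under these assumptions NoOffset packs 126 records per index page against 84 for Offset, so it needs fewer index pages (lower Φ), at the cost of scanning the whole data page to locate a record.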

Space Overhead of Index

[Figure: flash-media layouts; black denotes the index pages]

Increasing the buffer decreases the index overhead.

Search Performance

• 128 MB flash media (256,000 pages), varied SRAM (buffer) size
• 2 index page layouts:
– Anchor: index pages store the last known timestamp
– No Anchor: the timestamp is only stored in data pages

Search Performance

• We compared MicroHash vs. ELF index page chaining by searching all values in the range [20, 100].
• Keeping full index pages increases search performance but decreases insertion performance.

Using ELF decreases indexing performance (15% more writes).

Using ELF increases search performance (10% fewer reads).

Indexing the Great Duck Island Trace

• We used a 3 KB index buffer and a 4 MB flash card to store all 97,000 20-byte data readings.
– The index pages never require more than 28% additional space.
– Indexing the records only slightly increases the energy demand: the energy cost of storing the records on flash without an index is 3042 mJ.
– We were able to find any record by its timestamp with 4.75 page reads on average.
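A rough sanity check of the storage budget, assuming 512 B pages, records that do not span pages, and reading "28% additional space" as index pages amounting to at most 0.28 times the data pages (all three are assumptions, not figures from the study; an 8 B page header would not change the 25 records per page):

```latex
\left\lfloor \tfrac{512}{20} \right\rfloor = 25\ \text{records/page},\qquad
\tfrac{97{,}000}{25} = 3{,}880\ \text{data pages} \approx 1.9\,\text{MB},\qquad
3{,}880 \times 1.28 \approx 4{,}966\ \text{pages} \approx 2.4\,\text{MB} < 4\,\text{MB}
```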

Presentation Outline

1. Overview of Wireless Sensor Networks

2. Overview of Data Acquisition Models

3. The MicroHash Index Structure

4. MicroHash Experimental Evaluation

5. Conclusions and Future Work

Conclusions & Future Work

• We proposed the MicroHash index, an efficient external-memory hash index that addresses the distinct characteristics of flash memory.
• Our experimental evaluation shows that the proposed structure is both efficient and practical.
• Future work:
– Develop a complete library of indexes and data structures (stacks, queues, B+-trees, etc.)
– Buffer optimizations and online compression
– Support for range queries

Thank you!

Questions?

Related Publications
• "MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices", D. Zeinalipour-Yazti, S. Lin, V. Kalogeraki, D. Gunopulos, W. Najjar, In USENIX FAST'05.
• "Efficient Indexing Data Structures for Flash-Based Sensor Devices", ACM Transactions on Storage (TOS), November 2006.

Presentation and publications available at: http://is.ouc.ac.cy/~zeinalipour/

Backup Slides

The Programming Cycle

• The Operating System
TinyOS (UC-Berkeley): a component-based architecture that allows programmers to wire together the minimum required components in order to minimize code size and energy consumption.
(The operating system is really a set of libraries that are statically linked into the sensor binary at compile time.)

• The Programming Language
nesC (Intel Research, Berkeley): an event-based C variant optimized for programming sensor devices.

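"Hello World": blinking the red LED!

    // 'state' is a module-level variable holding the current LED state
    event result_t Clock.fire() {
      state = !state;          // toggle on every clock tick
      if (state)
        call Leds.redOn();
      else
        call Leds.redOff();
      return SUCCESS;          // result_t events must return a status
    }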

The Programming Cycle: The Testing Environment

• Debugging code directly on a sensor device is a tedious procedure.
• nesC allows programmers to compile their code to:
– A binary file that is burnt to the sensor
– A binary file that runs on a PC
• TOSSIM (TinyOS Simulation) is the environment that allows programmers to run the PC binary as a simulation directly on a PC.
• This enables accurate simulations, fine-grained energy modeling (with PowerTOSSIM) and visualization (TinyViz).

The Programming Cycle: The Pre-deployment Environment

• Once you have created and debugged your code, you can perform a deployment in a laboratory environment.
• Harvard's MoteLab uses 190 sensors, powered from wall power and interconnected through Ethernet.
• The Ethernet is used only for debugging and reprogramming, while the radio is used for actual communication between motes.
• Motes can be reprogrammed through a web interface.

Available at: http://motelab.eecs.harvard.edu/

