Page 1: PPT

RFID Data Management

Kamlesh Laddhad (05329014)
Karthik B. (05329021)

Guide: Prof. Bernard Menezes

Page 2: PPT

Outline

• Introduction to RFID Technology.
• Issues with RFID Technology.
• RFID Data Characteristics.
• Data Warehousing.
  – Expressive Temporal Model: Dynamic Relationship ER Model.
  – RFID-Cuboids.
  – Use of Bitmap Datatype.
• Data Cleaning.
  – Extensible Sensor stream Processing (ESP).
  – Statistical sMoothing for Unreliable RFid data (SMURF).
• Future Plans.

Page 3: PPT

Introduction

• Radio Frequency Identification (RFID):
  – It is an automatic identification and data capture technology.
  – Fast.
  – Needs no contact or line of sight.
  – Uses radio-frequency waves to transfer data.
• Components
  – Tag: small, low-cost device that can hold a limited amount of data.
    • Associated with objects, such as pallets, cases, and even individual items.
  – Reader: recognizes the presence of a tag and reads the information stored on it.
• A unique electronic product code (EPC) is associated with each tag.
• By placing RFID tag readers at various locations, one can track the movement of objects through supply chain networks.

Page 4: PPT

Applications and Adoptions

• Supply chain management: real-time inventory tracking.
  – US Department of Defense: shipments to armed forces.
• Retail: active shelves monitor product availability.
  – Wal-Mart, Albertsons: major retail stores.
• Access control: toll collection, transportation.
  – Airline luggage management:
    • British Airways: 20 million bags a year.
    • Implemented to reduce lost/misplaced luggage.
• Anti-counterfeiting and security:
  – Food and Drug Administration: to reduce counterfeiting in the pharmaceutical supply chain.

Page 5: PPT

Prospects for RFID Research

• The physics of building tags and readers
  – Tags have few gates: apart from basic operation, they have very little computing power.
  – Radio-frequency signals have trouble operating in certain physical media.
• Privacy and safety issues:
  – Complex encryption schemes are not possible on RFID tags.
  – Counterfeiting by means of either illegitimate readers or spoofed tags is possible.
  – Reader-tag communication is wireless: third parties can eavesdrop on signals.
• Software architecture to collect, filter, organize, and answer online queries:
  – The number of tags is proportional to the number of items being serviced/tracked.
  – The number of readers is proportional to the number of strategic locations/areas to be traced.
    • Each reader picks up tag signals on a continuous basis.
    • The data generated by RFID systems is enormous.
    • E.g., Wal-Mart is expected to generate 7 terabytes of RFID data per day.
• Our focus: the third stream.

Page 6: PPT

Data Warehousing Techniques

Page 7: PPT

Data Management Challenges

• Data explosion: example
  – A retailer with 3,000 stores, selling 10,000 items a day per store.
  – Each item moves 10 times on average before being sold.
    • Each movement is recorded as (EPC, location, second).
  – Data volume: 300 million tuples per day.
  – Example OLAP query: "Average time for items to move from warehouse to checkout counter in March 2006?"
    • Costly to answer if there are a billion tuples for March 2006.
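The data-volume figure above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the monthly total assumes that every one of March's 31 days produces the same volume, which the slide does not state.

```python
# Back-of-the-envelope check of the slide's data-volume example.
stores = 3_000
items_per_store_per_day = 10_000
moves_per_item = 10  # each move yields one (EPC, location, second) tuple

tuples_per_day = stores * items_per_store_per_day * moves_per_item

# Assumption: every day of March has the same volume (31 days).
tuples_in_march = tuples_per_day * 31

print(f"{tuples_per_day:,} tuples per day")
print(f"{tuples_in_march:,} tuples in March")
```

Under that assumption the monthly total is on the order of several billion tuples, consistent with the slide's point that a month-long OLAP query over raw readings is costly.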

Page 8: PPT

Data Characteristics

• Temporal and history oriented
  – Applications dynamically generate observations (readings).
  – Object locations and the containment relationships among objects change.
  – Need: expressive data model.
• Inaccurate data and implicit semantics
  – False positive: a non-existent tag is incorrectly read.
  – False negative: the reader misses a tag that was in its vicinity.
  – Noisy data and duplicate readings (redundancy): the same tag is read more than once.
  – Need: automated data filtering and transformation.
• Streaming and large volume
  – Objects stay in place for long durations while readers record them periodically, so data keeps accumulating.
  – We need to preserve this data for tracking and monitoring.
  – Need: scalable storage scheme and compression techniques to reduce data.
• Data granularity
  – Data collection granularity needs to be decided.
  – Differs across applications.

Page 9: PPT

Warehousing Helps!!

• Lossless compression
  – Remove redundancy: $(r_1, l_1, t_1), (r_1, l_1, t_2), \ldots, (r_1, l_1, t_{10}) \Rightarrow (r_1, l_1, t_1, t_{10})$
  – Group objects that move and stay together.
• Data cleaning: multi-reading, missed-reading, error-reading, bulky movement.
• Data mining: find trends, outliers, and frequent, sequential, and flow patterns.
• Multi-dimensional summary: product, location, time, …
  – Store manager: check item movements from the backroom to different shelves in his store.
  – Region manager: collapse intra-store movements and look at distribution centers, warehouses, and stores.
• Query processing
  – Support for OLAP: roll-up, drill-down, slice, and dice.
  – Path query: new to RFID warehouses, about the structure of paths.
    • What products that go through quality control have shorter paths?
    • What locations are common to the paths of a set of defective auto parts?
    • Identify containers at a port that have deviated from their historic paths.
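The redundancy-removal step under "Lossless compression" can be sketched as follows. This is a minimal illustration, not the warehouse's actual implementation: the function name `compress_readings` and the tuple layout `(tag, location, time)` are assumptions made for the example.

```python
def compress_readings(readings):
    """Collapse consecutive readings of a tag at one location into a stay.

    readings: iterable of (tag, location, time) tuples.
    Returns a list of (tag, location, time_in, time_out) tuples.
    """
    stays = []
    # Order by tag, then time, so that consecutive readings are adjacent.
    for tag, loc, t in sorted(readings, key=lambda r: (r[0], r[2])):
        if stays and stays[-1][0] == tag and stays[-1][1] == loc:
            # Same tag still at the same place: extend the current stay.
            stays[-1] = (tag, loc, stays[-1][2], t)
        else:
            # New tag, or the tag moved: open a new stay.
            stays.append((tag, loc, t, t))
    return stays


# Ten readings of r1 at l1 collapse into a single stay record.
ten_readings = [("r1", "l1", t) for t in range(1, 11)]
print(compress_readings(ten_readings))
```

A tag that leaves a location and later returns produces two separate stays, so the path structure is preserved and the compression stays lossless at the granularity of stays.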

Page 10: PPT

Dynamic Relationship ER Model

• Proposed by Wang and Liu from Siemens.
• RFID entities are static and are not altered.
• RFID relationships are dynamic and change all the time.
• Two types of dynamic relationships are added:
  – Event-based dynamic relationship: a timestamp attribute is added to record when the event occurred.
  – State-based dynamic relationship: tstart and tend attributes are added to represent the lifespan of a state.

Page 11: PPT

• Static entity tables
  – OBJECT (object_epc, name, description)
  – LOCATION (location_id, name, owner)
  – SENSOR (sensor_epc, name, description)
  – TRANSACTION (transaction_id, transaction_type)
• Dynamic relationship tables
  – OBSERVATION (sensor_epc, value, timestamp)
  – TRANSACTIONITEM (transaction_id, epc, timestamp)
  – OBJECTLOCATION (epc, location_id, tstart, tend)
  – CONTAINMENT (epc, parent_epc, tstart, tend)
  – SENSORLOCATION (sensor_epc, location_id, position, tstart, tend)

Page 12: PPT

Monitoring

• Missing RFID object detection:
  – Find when and where the object holding EPC = 'MEPC' was lost.
    • select location_id, tstart, tend from objectlocation where epc='MEPC' and tstart = ( select max(o.tstart) from objectlocation o where o.epc='MEPC' )
  – Check whether any objects are missing at the current location C, knowing that all objects were present at the previous location L at time T.
    • select l.epc from objectlocation l where l.location_id = 'L' and l.tstart <= 'T' and l.tend >= 'T' and l.epc not in ( select c.epc from objectlocation c where c.location_id = 'C' )
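The two monitoring queries can be tried against a small in-memory table. The sketch below uses Python's standard `sqlite3` module; the integer timestamps, the sample rows, and the helper names `make_db`, `last_seen`, and `missing_at` are assumptions made for illustration, while the table and column names follow the OBJECTLOCATION schema.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory OBJECTLOCATION table holding the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE objectlocation "
        "(epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
    )
    con.executemany("INSERT INTO objectlocation VALUES (?, ?, ?, ?)", rows)
    return con


def last_seen(con, epc):
    """Where and when the object was last recorded (first query)."""
    return con.execute(
        "SELECT location_id, tstart, tend FROM objectlocation "
        "WHERE epc = ? AND tstart = "
        "(SELECT max(o.tstart) FROM objectlocation o WHERE o.epc = ?)",
        (epc, epc),
    ).fetchone()


def missing_at(con, prev_loc, t, cur_loc):
    """Objects present at prev_loc at time t but never seen at cur_loc."""
    rows = con.execute(
        "SELECT l.epc FROM objectlocation l "
        "WHERE l.location_id = ? AND l.tstart <= ? AND l.tend >= ? "
        "AND l.epc NOT IN "
        "(SELECT c.epc FROM objectlocation c WHERE c.location_id = ?) "
        "ORDER BY l.epc",
        (prev_loc, t, t, cur_loc),
    ).fetchall()
    return [r[0] for r in rows]


# Hypothetical sample data: A reaches location C; B and MEPC do not.
ROWS = [
    ("A", "L", 0, 10), ("B", "L", 0, 10), ("A", "C", 12, 20),
    ("MEPC", "L", 0, 5), ("MEPC", "D", 6, 9),
]
con = make_db(ROWS)
print(last_seen(con, "MEPC"))
print(missing_at(con, "L", 5, "C"))
```

Query parameters are bound with `?` placeholders rather than pasted into the SQL text, which is the usual way to pass the EPC, location, and time values safely.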

Page 13: PPT

Tracking

• RFID object moving time inquiry:
  – How long does it take to supply 'OEPC' from location S to location E?
    • select (e.tstart-s.tstart) as supplying_time from objectlocation e, objectlocation s where e.epc = 'OEPC' and s.epc='OEPC' and s.location_id ='S' and e.location_id='E'
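A runnable version of the moving-time query, again as a sketch with Python's `sqlite3` module. The helper name `supplying_time`, the integer timestamps, and the two sample rows are hypothetical.

```python
import sqlite3


def supplying_time(con, epc, start_loc, end_loc):
    """Time between the object's arrival at start_loc and at end_loc."""
    row = con.execute(
        "SELECT e.tstart - s.tstart AS supplying_time "
        "FROM objectlocation e, objectlocation s "
        "WHERE e.epc = ? AND s.epc = ? "
        "AND s.location_id = ? AND e.location_id = ?",
        (epc, epc, start_loc, end_loc),
    ).fetchone()
    return None if row is None else row[0]


# Hypothetical sample data: OEPC arrives at S at t=100 and at E at t=140.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE objectlocation "
    "(epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
)
con.executemany(
    "INSERT INTO objectlocation VALUES (?, ?, ?, ?)",
    [("OEPC", "S", 100, 120), ("OEPC", "E", 140, 200)],
)
print(supplying_time(con, "OEPC", "S", "E"))
```

If an object visits S or E more than once, the self-join returns one row per pair of visits, so a real query would need to pick a specific pair (for example the earliest of each).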

Page 14: PPT

Compression Idea

• Bulky object movements
  – Objects often move and stay together through the supply chain.
  – If 1000 packs of product P stay together at the distribution center, register a single record:
  – (GID, distribution center, time_in, time_out).
  – GID is a generalized identifier that represents the 1000 packs that stayed together at the distribution center.
• Analysis usually takes place at a much higher level of abstraction than the one present in raw RFID data.

[Figure: supply chain hierarchy. Factory → Dist. Center 1 / Dist. Center 2 (10 pallets, 1000 cases) → store 1 / store 2 (20 cases, 1000 packs) → shelf 1 / shelf 2 (10 packs, 12 sodas)]

Page 15: PPT

RFID Cuboids

• Fact table: (EPC, location, time_in, time_out).
• In a supply chain, items travel through a series of locations.
• Query: what is the average time that product P stays at the store in location A?
• Traditional cubes miss the path structure of the data.
• Stay table: (GIDs, location, time_in, time_out: measures):
  – Records information on items that stay together at a given location.
  – Alternative of recording transitions: queries are difficult to answer because many intersections are needed.
• Map table: (GID, <GID1, .., GIDn>)
  – Links together stages that belong to the same path. Provides additional compression and query processing efficiency.
  – A high-level GID points to lower-level GIDs.
  – Alternative of saving complete EPC lists: high I/O cost to retrieve long lists, and costly query processing.
• Information table: (EPC list, attribute 1, ..., attribute n)
  – Records path-independent attributes of the items, e.g., color, manufacturer, price.

Page 16: PPT

EPC Overview

• Electronic product code
  – Standard naming scheme, proposed by the Auto-ID Center.
  – An EPC uniquely identifies an item.
  – Format: <Header, Manager No., Object Class, Serial No.>
    • Header: identifies the length, type, structure, version, and generation of the EPC.
    • Manager Number: identifies an organizational entity.
    • Object Class: identifies a "class", or type of thing.
    • Serial Number: the specific instance of the Object Class being tagged.
  – We will refer to
    • <Header, Manager No., Object Class> as the prefix
    • <Serial No.> as the suffix
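The prefix/suffix split can be illustrated with plain bit arithmetic. The sketch below assumes the 64-bit layout shown on the later bitmap slide (2-bit header, 21-bit manager number, 17-bit object class, 24-bit serial number); the function name `split_epc64` and the returned dictionary keys are made up for the example.

```python
# Field widths of the assumed 64-bit EPC layout (2 + 21 + 17 + 24 = 64).
HEADER_BITS, MANAGER_BITS, OBJECT_CLASS_BITS, SERIAL_BITS = 2, 21, 17, 24


def split_epc64(epc):
    """Split a 64-bit EPC into its fields, plus the prefix and suffix."""
    suffix = epc & ((1 << SERIAL_BITS) - 1)   # serial number
    prefix = epc >> SERIAL_BITS               # header + manager + class
    object_class = prefix & ((1 << OBJECT_CLASS_BITS) - 1)
    manager = (prefix >> OBJECT_CLASS_BITS) & ((1 << MANAGER_BITS) - 1)
    header = prefix >> (OBJECT_CLASS_BITS + MANAGER_BITS)
    return {
        "header": header,
        "manager": manager,
        "object_class": object_class,
        "prefix": prefix,
        "suffix": suffix,
    }


parts = split_epc64(0x4AA890001F62C160)
print(hex(parts["prefix"]), hex(parts["suffix"]))
```

All items of one product from one manufacturer share the prefix, which is what makes it worthwhile to store the prefix once and represent the varying suffixes compactly.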

Page 17: PPT

Use of Bitmap Datatype

• Observation: items move together.
  – Groups of items in the same proximity, e.g. on a shelf or in a shipment.
  – Groups of items with the same property, e.g. the same product.
• Use a bitmap type for modeling a collection of EPCs that can occur in item tracking applications.
  – Instead of storing a tuple per item, store one tuple for all the items that have the same prefix.
  – New fields in place of epc:
    • <Len, Suffix_length, Prefix, Suffix_start, Suffix_end, bitmap>

Page 18: PPT

Example: Product Inventory

• With EPC collections

Store_id  Prod_id  Time  Item_collection
s1        p1       t1    epc11, epc12, epc13, …
s1        p2       t2    epc21, epc22, epc23, …
…         …        …     …

• With epc_bitmaps

Store_id  Prod_id  Time  Item_bmap
s1        p1       t1    bmap1
s1        p2       t2    bmap2
…         …        …     …

Page 19: PPT

Use of Bitmap Datatype (contd.)

Header  EPC_Manager  Object_Class  Serial_Number
2 bits  21 bits      17 bits       24 bits

EPC range: 0x4AA890001F62C160 … 0x4AA890001FA0B38E

Len  Suff_len  Prefix        Suff_start  Suff_end  bitmap
64   24        0x4AA890001F  0x62C160    0xA0B38E  101001…00010
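One way to build such a record from a set of EPCs is sketched below. It is an illustration under stated assumptions, not the datatype's real encoding: Python integers stand in for the bitmap, bit k (counting from the least significant bit) represents suffix `suffix_start + k`, and the names `epcs_to_record` and `record_to_epcs` are hypothetical.

```python
def epcs_to_record(epcs, epc_len=64, suffix_len=24):
    """Pack EPCs that share one prefix into a single bitmap record."""
    mask = (1 << suffix_len) - 1
    prefixes = {e >> suffix_len for e in epcs}
    if len(prefixes) != 1:
        raise ValueError("all EPCs must share exactly one prefix")
    suffixes = sorted(e & mask for e in epcs)
    start, end = suffixes[0], suffixes[-1]
    bitmap = 0
    for s in suffixes:
        bitmap |= 1 << (s - start)  # assumed bit order: offset from start
    return {
        "len": epc_len, "suffix_len": suffix_len, "prefix": prefixes.pop(),
        "suffix_start": start, "suffix_end": end, "bitmap": bitmap,
    }


def record_to_epcs(rec):
    """Expand a bitmap record back into the sorted list of EPCs."""
    epcs, offset, bitmap = [], 0, rec["bitmap"]
    while bitmap:
        if bitmap & 1:
            suffix = rec["suffix_start"] + offset
            epcs.append((rec["prefix"] << rec["suffix_len"]) | suffix)
        bitmap >>= 1
        offset += 1
    return epcs


sample = [0x4AA890001F62C160, 0x4AA890001F62C162, 0x4AA890001F62C165]
rec = epcs_to_record(sample)
print(hex(rec["prefix"]), bin(rec["bitmap"]))
```

The record stores the prefix once, so each additional item costs one bit instead of a full 64-bit EPC, provided the suffixes fall within a reasonably dense range.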

Page 20: PPT

Bitmap Operations

• To use such a datatype in SQL, we need operations on these bitmaps.
• Conversion and counting operations: epc2Bmap, bmap2Epc, and bmap2Count
• Pairwise logical operations: bmapAnd, bmapOr, bmapMinus, and bmapXor
• Maintenance operations: bmapInsert and bmapDelete
• Membership testing operation: bmapExists
• Comparison operation: bmapEqual
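The semantics of these operations can be sketched with Python integers as bitmaps. The snake_case functions below are stand-ins named after the slide's operations, not the database's implementation, and they assume that both operands are aligned to the same prefix and suffix start.

```python
def bmap_and(a, b):
    return a & b            # items present in both collections


def bmap_or(a, b):
    return a | b            # items present in either collection


def bmap_minus(a, b):
    return a & ~b           # items in a that are not in b


def bmap_xor(a, b):
    return a ^ b            # items in exactly one of the two


def bmap_count(a):
    return bin(a).count("1")   # number of items (set bits)


def bmap_exists(a, pos):
    return (a >> pos) & 1 == 1  # is the item at offset pos present?


def bmap_insert(a, pos):
    return a | (1 << pos)


def bmap_delete(a, pos):
    return a & ~(1 << pos)


def bmap_equal(a, b):
    return a == b


shelf_t1 = 0b0111   # items at offsets 0, 1, 2
shelf_t2 = 0b1101   # items at offsets 0, 2, 3
added = bmap_minus(shelf_t2, shelf_t1)
print(bin(added), bmap_count(bmap_and(shelf_t1, shelf_t2)))
```

Set difference between two snapshots of the same shelf gives the items that were added in between, which is the pattern the SQL examples on the next slide rely on.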

Page 21: PPT

Use of these operations in SQL

• Items added to a given shelf between time t1 and t2.
  – SELECT bmap2Epc(bmapMinus(s2.item_bmap, s1.item_bmap)) FROM Shelf_Inventory s1, Shelf_Inventory s2 WHERE s1.shelf_id = <sid1> AND s1.shelf_id = s2.shelf_id AND s1.time = <t1> AND s2.time = <t2>;
• A book store classifies its books into various categories.
  – The following query determines the shelves that currently hold books with both the 'Adventure' and 'Romance' properties.
  – SELECT s.shelf_id FROM Shelf_Inventory s WHERE bmap2Count(bmapAnd(s.item_bmap, (SELECT bmapAnd(p.Adventure, p.Romance) FROM Property_Inventory p))) > 0 AND s.time = <current_date>;

Page 22: PPT

Road Ahead

• Extension to the bitmap proposal:
  – The bitmap datatype is more appropriate for initial bulk load and batch updates.
  – It performs badly for incremental updates.
  – A 'hybrid scheme' for incremental updates:
    • Maintain periodic inventory checkpoints using bitmaps.
    • For changes occurring between checkpoints, maintain a traditional item-level table.
    • Answer queries by merging the latest checkpoint bitmap with the item-level data for the period since that checkpoint.
• The epc_suffix values in a collection may not be contiguous.
  – The bitmap will then be sparse, with a lot of zeros.
  – Compress it using some encoding scheme.
    • Good for initial bulk loading and batch updates.
    • May reduce the efficiency of bitmap operations.
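The hybrid scheme can be sketched as a checkpoint bitmap plus a log of item-level changes that is replayed at query time. Everything here is an assumption made for illustration: the function name `current_inventory`, the `("add" | "remove", offset)` log format, and the use of a Python integer as the bitmap.

```python
def current_inventory(checkpoint_bitmap, delta_log):
    """Merge the latest checkpoint bitmap with later item-level changes.

    delta_log: list of (op, offset) pairs in arrival order, where op is
    "add" or "remove" and offset is the item's bit position.
    """
    bitmap = checkpoint_bitmap
    for op, offset in delta_log:
        if op == "add":
            bitmap |= 1 << offset
        elif op == "remove":
            bitmap &= ~(1 << offset)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return bitmap


# Checkpoint holds items 0 and 2; since then item 1 and item 3 arrived
# and item 0 left.
log = [("add", 1), ("remove", 0), ("add", 3)]
print(bin(current_inventory(0b0101, log)))
```

The checkpoint stays immutable between batch rebuilds, so incremental updates only append to the small item-level log instead of rewriting the bitmap.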

Page 23: PPT

Open Problems

• Efficient methods for data mining problems
  – Trend analysis
  – Outlier detection
  – Path clustering
• We will try exploring data mining applications to RFID data.

Page 24: PPT

RFID Data Cleaning

Page 25: PPT

Issues in Data Cleaning

• Lack of completeness
  – RFID readers capture only 60-70% of all tags that are in the vicinity.
  – Data is smoothed to compensate for the lost intermediate messages.
• Temporal nature of data, or tag dynamics
  – RFID tags are in motion, which makes them more difficult to handle.
  – The motion of a tag causes messages to be dropped.
• RFID data streams are very fast and very numerous.
  – Hence filtering is important before sending them to the database.

Page 26: PPT

Current Strategies

• Temporal granule:
  – Based on the fact that tag data does not differ much over a small time period.
  – Data can be grouped over a small time frame.
• Spatial granule:
  – Similarly, data from physically close readers is also homogeneous.

Page 27: PPT

Stages of ESP

• Point: operates over a single value in a sensor stream, filtered by a predicate in the WHERE clause.
• Smooth: granularity defined by the application to correct for missed readings temporally (over one input only); uses an aggregate function over the input.
• Merge: granularity specified by the application to correct for missed readings spatially; grouped by the specified spatial granule.

Page 28: PPT

Stages of ESP (contd.)

• Arbitrate: deals with conflicts between different spatial granules; groups by spatial granule first and then uses the HAVING construct to determine those conflicts.
• Virtualize: used for combining data streams from different sources, possibly from different devices; a join construct combines the streams, and the result is then filtered using some predicate.
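ESP expresses its stages as declarative queries; the Python below is only a procedural stand-in for two of them, to make the idea concrete. The function names, the sliding window measured in epochs, and the "most readings wins" rule for Arbitrate are all assumptions of this sketch.

```python
from collections import Counter


def smooth(read_epochs, window, horizon):
    """Smooth stage: fill in missed readings over a temporal granule.

    The tag counts as present at epoch t if it was read at least once in
    the last `window` epochs, i.e. in [t - window + 1, t].
    """
    return [
        any((t - k) in read_epochs for k in range(window))
        for t in range(horizon)
    ]


def arbitrate(readings):
    """Arbitrate stage: assign each tag to a single spatial granule.

    readings: iterable of (tag, granule) pairs. The granule with the most
    readings wins; ties go to the alphabetically first granule.
    """
    counts = {}
    for tag, granule in readings:
        counts.setdefault(tag, Counter())[granule] += 1
    return {
        tag: max(sorted(c), key=c.__getitem__) for tag, c in counts.items()
    }


# A tag read at epochs 0 and 3 only, smoothed with a 2-epoch window.
print(smooth({0, 3}, window=2, horizon=6))
# Tag x is seen by two shelves; shelf1 has more readings.
print(arbitrate([("x", "shelf1"), ("x", "shelf1"), ("x", "shelf2")]))
```

A wider smoothing window hides more missed readings but also delays the moment a departed tag stops being reported, which is the trade-off SMURF later addresses by adapting the window size.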

Page 29: PPT

Smooth stage

• False positives (erroneous readings): reporting objects that are not actually present.
• False negatives (missed readings): not reporting objects that actually are present.

False positives and False Negatives [Jeff06]

Page 30: PPT

Tag List

• The reader has an internal table called the Tag List.
• An epoch is the smallest unit of interaction between the reader and the middleware.
• Every epoch consists of a certain number of interrogation cycles.
• An interrogation cycle is one run of the reader protocol to determine all tags.
• At every epoch the reader sends the tag list to the middleware.

Tag ID    Responses  Timestamp
12341234  6          t1
12347890  1          t2

Page 31: PPT

SMURF – Per-tag Cleaning

• SMURF uses statistical methods to reduce the false negatives and false positives in the RFID stream.
• The goal is twofold: first, to determine the statistical window size, and second, to ensure that tag transitions are detected.
• To determine the window size, we need to fit a probability distribution to the sample size.
• To detect the transition of a tag out of the reader's vicinity, we define a 98% confidence interval within that probability distribution on the sample size $|S_i|$.

Page 32: PPT

SMURF – Per-tag Cleaning (contd.)

• Using the tag list, the per-epoch sampling probability $p_{i,t}$ is determined:
  $p_{i,t}$ = (number of times the tag was read in an epoch) / (interrogation cycles per epoch)
• We average this over the sample size $|S_i|$ to get the average read rate $p_i^{avg}$ for a tag $i$.
• If the same probability $p_i^{avg}$ is assumed for each epoch throughout the window, then each successful observation is like a Bernoulli trial.
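The two quantities above can be computed directly from the response counts in the tag list. This is a minimal sketch: the function names are illustrative, the figure of 10 interrogation cycles per epoch is an assumption, and the average is taken over the epochs in which the tag was actually read (the sample $S_i$).

```python
def per_epoch_probability(responses, cycles_per_epoch):
    """p_{i,t}: fraction of interrogation cycles in which the tag replied."""
    return responses / cycles_per_epoch


def average_read_rate(responses_in_window, cycles_per_epoch):
    """p_i^avg: mean of p_{i,t} over the epochs where the tag was read.

    responses_in_window: one response count per epoch in the window;
    a count of 0 means the tag was not observed in that epoch.
    """
    sample = [r for r in responses_in_window if r > 0]
    if not sample:
        return 0.0
    return sum(r / cycles_per_epoch for r in sample) / len(sample)


# Tag 12341234 answered 6 times in an epoch of (assumed) 10 cycles.
print(per_epoch_probability(6, 10))
# Four-epoch window in which the tag was read in two epochs.
print(average_read_rate([6, 0, 4, 0], 10))
```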

Page 33: PPT

SMURF – Per-tag Cleaning (contd.)

• So $|S_i|$ is a binomial random variable for a sample $S_i$, with mean $w_i \, p_i^{avg}$ and variance $w_i \, p_i^{avg} (1 - p_i^{avg})$.
• Using this, we can express the window size as a limit.
• If the current window size is less than the calculated one, the window size is adjusted accordingly.
• Similarly, using the central limit theorem for transition detection, we get $\left| |S_i| - \mu \right| > 2\sigma$.
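The window-size limit itself did not survive in this transcript. As an assumption, the sketch below uses the completeness bound from the cited SMURF paper, $w_i \ge \lceil \ln(1/\delta) / p_i^{avg} \rceil$, where $\delta$ is the tolerated probability of missing a tag that is present; the default $\delta = 0.05$ and the function names are choices made for the example. The transition test is the slide's condition with $\mu = w_i p_i^{avg}$ and $\sigma = \sqrt{w_i p_i^{avg} (1 - p_i^{avg})}$.

```python
import math


def required_window(p_avg, delta=0.05):
    """Smallest window (in epochs) meeting the assumed completeness bound."""
    return math.ceil(math.log(1 / delta) / p_avg)


def transition_detected(observed, window, p_avg):
    """True if the observed sample size |S_i| is more than two standard
    deviations away from its binomial mean."""
    mean = window * p_avg
    sigma = math.sqrt(window * p_avg * (1 - p_avg))
    return abs(observed - mean) > 2 * sigma


print(required_window(0.5), required_window(0.1))
print(transition_detected(observed=1, window=20, p_avg=0.5))
print(transition_detected(observed=9, window=20, p_avg=0.5))
```

A tag with a low read rate needs a long window to be seen reliably, while a sharp drop in readings relative to the binomial mean is evidence that the tag has left and the window should shrink.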

Page 34: PPT

Normal Sliding Window…

• Epoch-based mid-point sliding window.
• Emits a reading with an epoch value corresponding to the middle of the window.

Page 35: PPT

Ensuring Completeness

• In the first window, $p_i^{avg}$ demands a larger window.
• Thus the window size is increased.

Page 36: PPT

Transition Detection

• In the first window the number of readings decreases by a statistically significant amount.
• Thus a transition is likely to have occurred, so the window is halved.

[Franklin06]

Page 37: PPT

SMURF – Multi-tag Aggregate Cleaning

• Similar to per-tag cleaning, the window for multi-tag cleaning is determined by:

Here, $p^{avg}$ is the average per-epoch sampling probability over all observed tags.

• To detect a transition in the population count, we estimate the population count over two windows, $[t - w_i, t]$ and $[t - w_i/2, t]$, with true populations $N_w$ and $N_{w'}$.
• A transition is taken to have happened when the difference between the two estimates exceeds the limit $2(\sigma_w + \sigma_{w'})$.

Page 38: PPT

SMURF – Multi-tag Aggregate Cleaning (contd.)

• To calculate the estimate of the population count, we use π-estimators. The estimated population count is given by:
• Similarly, by π-estimators, and assuming independence across different tags, the variance of the estimate is estimated as:
• Here $\pi_i$ is the probability of reading tag $i$ at least once during the whole window, given by $\pi_i = 1 - (1 - p_i^{avg})^w$.
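The two estimator formulas are missing from this transcript. The sketch below fills them with the standard π-estimator (Horvitz-Thompson) forms for a count under independent sampling, which is an assumption rather than a quotation of the slide: $\hat{N} = \sum_{i \in S} 1/\pi_i$ and $\widehat{\mathrm{Var}}(\hat{N}) = \sum_{i \in S} (1 - \pi_i)/\pi_i^2$. The function names are illustrative, and the transition check applies the $2(\sigma_w + \sigma_{w'})$ limit from the previous slide.

```python
import math


def read_probability(p_avg, window):
    """pi_i: probability of reading the tag at least once in the window."""
    return 1 - (1 - p_avg) ** window


def estimate_population(p_avgs, window):
    """Estimated tag count and its variance from the observed tags.

    p_avgs: average read rate p_i^avg of each tag observed in the window.
    Uses the assumed Horvitz-Thompson forms named in the lead-in.
    """
    pis = [read_probability(p, window) for p in p_avgs]
    n_hat = sum(1 / pi for pi in pis)
    var_hat = sum((1 - pi) / pi ** 2 for pi in pis)
    return n_hat, var_hat


def population_transition(n_w, var_w, n_half, var_half):
    """True if the full-window and half-window estimates differ by more
    than 2 * (sigma_w + sigma_w')."""
    limit = 2 * (math.sqrt(var_w) + math.sqrt(var_half))
    return abs(n_w - n_half) > limit


n_hat, var_hat = estimate_population([0.5, 0.5], window=2)
print(round(n_hat, 3), round(var_hat, 3))
```

Each observed tag counts for more than one item when its read probability is below 1, which compensates for similar tags that were present but never read.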

Page 39: PPT

The Road Ahead…

• RFID applications do not tolerate delays in data delivery.
• Data is either in the cache or in the database: data in the database increases processing time, and data in the cache cannot be queried with SQL-like queries.
• Anomaly detection is also an important part of object tracking.
• Issues like untraceability, forward security, and database desynchronization are still not completely resolved.
• One more serious problem with RFID is counterfeiting.
• In the next stage we expect to look into some of these issues.

Page 40: PPT

????

Page 41: PPT

Thank You.

Page 42: PPT

References

• Xiaolei Li, Hector Gonzalez, Jiawei Han and Diego Klabjan. Warehousing and analyzing massive RFID data sets. ICDE, 2006.

• Fusheng Wang and Peiya Liu. Temporal management of RFID data. VLDB, 2005.

• Timothy Chorma, Ying Hu, Seema Sundara and Jagannathan Srinivasan. Supporting RFID-based item tracking applications in Oracle DBMS using a bitmap datatype. VLDB, 2005.

Page 43: PPT

References (contd.)

• Minos Garofalakis, Shawn R. Jeffery and Michael J. Franklin. Adaptive cleaning for RFID data streams. VLDB, 2006.

• Michael J. Franklin, Wei Hong, Shawn R. Jeffery, Gustavo Alonso and Jennifer Widom. Declarative support for sensor data cleaning. Pervasive, 2006.

• Sridhar Ramachandran, Sudarshan S. Chawathe, Venkat Krishnamurthy and Sanjay E. Sarma. Managing RFID data. VLDB, 2004.