Twitter Frenzy FPGA Data Stream Processing
Cory Kleinheksel (Team Leader), Tim Meyer
David Graziano, Josh Clausman
Project Idea
• Twitter Frenzy: a way to filter tweets as a set of frequencies, using an FPGA to perform packet analysis.
• Accelerate the stream processing of Twitter data queries.
• Specifically, accelerate computationally intensive, long-lived queries over short-lived data.
• The design/implementation of a frequency-based query will be the primary focus (interesting application of signal processing).
Details
• Input: Live (or simulated) Twitter stream data
  • Java program used to simulate the Twitter feed by reading from a dataset
• Processing:
  1. Extract tweets from input stream
  2. Filter tweets based on query parameters
     • Text Matching
  3. Determine tweet frequency components
     • Frequency Analysis
  4. Apply signal filter (signal processing)
• Output: Tweets matching filter
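Processing steps 1 to 3 above can be sketched in software as follows. This is a minimal illustration, not the project's simulator: it assumes one JSON object per input line, and the field names `text` and `timestamp`, the substring-based keyword match, and the fixed-window counting are all assumptions made for the example.

```python
import json
from collections import Counter


def extract_tweets(lines):
    """Step 1: yield parsed tweet records, skipping lines that are not usable JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        # "text" is an assumed field name for the tweet body.
        if isinstance(record, dict) and "text" in record:
            yield record


def matches_query(tweet, keywords):
    """Step 2: case-insensitive substring match against any query keyword."""
    text = tweet["text"].lower()
    return any(k.lower() in text for k in keywords)


def counts_per_window(tweets, window_seconds):
    """Step 3: count tweets per fixed time window, keyed by window index."""
    counts = Counter()
    for t in tweets:
        # "timestamp" (seconds) is an assumed field name.
        counts[int(t["timestamp"]) // window_seconds] += 1
    return counts


def run_pipeline(lines, keywords, window_seconds=60):
    matched = [t for t in extract_tweets(lines) if matches_query(t, keywords)]
    return matched, counts_per_window(matched, window_seconds)
```

The per-window counts form a discrete-time signal, which is what step 4 would then filter.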
Design Issues
• Ability to acquire data from Twitter at a useful speed
• Determining packet usefulness (send/drop) in an efficient manner
• Managing concurrently arriving packets and multi-fragment packets
• How to calculate frequency and filter corresponding packets
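One way to picture the multi-fragment problem is a per-stream reassembly buffer. The sketch below is hypothetical and is not taken from the project: it assumes tweets are separated by a `\r\n` delimiter and that each packet carries a stream identifier, and the class name `TweetReassembler` is invented for the illustration.

```python
class TweetReassembler:
    """Buffers packet payloads per stream and emits only complete tweets."""

    def __init__(self, delimiter=b"\r\n"):
        self.delimiter = delimiter
        self.buffers = {}  # stream_id -> bytes of the partial tweet seen so far

    def feed(self, stream_id, payload):
        """Append one packet payload and return any tweets it completes."""
        data = self.buffers.get(stream_id, b"") + payload
        # Everything before the last delimiter is complete; the tail is kept.
        *complete, remainder = data.split(self.delimiter)
        self.buffers[stream_id] = remainder
        return [c for c in complete if c]
```

Keeping one buffer per stream lets fragments from concurrently arriving streams interleave without corrupting each other, at the cost of memory proportional to the number of open streams.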
Implementation Issues
• How to properly buffer and send fragmented tweets
• Time/clock cycles needed to perform frequency calculations
• Time to perform hashing: created a lookup-table-based hashing block
• Modules consuming data at different rates
• Debugging HW
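The slides do not say which lookup-table hash was built. As an illustration of the general idea only, the sketch below uses Pearson hashing, a well-known scheme that needs one table lookup and one XOR per input byte. The table contents and the seed are assumptions; any permutation of 0..255 works.

```python
import random


def make_table(seed=0):
    """Build a 256-entry permutation table (seeded so it is reproducible)."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def pearson_hash(data, table):
    """Return an 8-bit hash of a bytes object: one lookup and one XOR per byte."""
    h = 0
    for byte in data:
        h = table[h ^ byte]
    return h
```

A scheme of this shape avoids multipliers and dividers entirely, which is the usual motivation for table-based hashing when clock cycles per byte are the constraint.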
System Architecture Diagram
Breakdown: Network Data Flow
Breakdown: Text Matching
Breakdown: Frequency Analysis
Algorithms
• Hashing
• String Matching
• Frequency Analysis
• Filtering (FIR)
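For the FIR filtering step, a direct-form filter over a sequence of per-interval tweet counts might look like the sketch below. The tap values and the idea of filtering per-interval counts are assumptions for illustration; the slides do not give the coefficients or the exact signal definition.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k].

    Samples before the start of the sequence are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        out.append(acc)
    return out
```

For example, taps of `[0.25, 0.25, 0.25, 0.25]` give a four-sample moving average that smooths short bursts. Each output needs one multiply-accumulate per tap, which is why the filter length bears directly on the clock-cycle budget.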
Project Results
• Analyzed the problem
• Implemented full simulator in software
• Implemented in VHDL
• Simulated in ModelSim
• Tested on hardware, confirmed results against software implementation
Dataset: JSON_29493.txt
• Processed 29493 tweets
• 192 passed string filter
• 133 passed frequency filter
Software Simulator Example
Demo
References
Berinde, Indyk, Cormode, Strauss. "Space-optimal Heavy Hitters with Strong Error Bounds"
Cormode, Korn, Tirthapura. "Time-Decaying Aggregates in Out-of-order Streams"
Charikar, Chen, Farach-Colton. "Finding Frequent Items in Data Streams"