Tagging and Folksonomy
Schema Design for Scalability and Performance
Jay Pipes, Community Relations Manager, North America (jay@mysql.com)
MySQL, Inc.
Copyright MySQL AB. The World’s Most Popular Open Source Database.
TELECONFERENCE: please dial a number to hear the audio portion of this presentation.
Toll-free US/Canada: 866-469-3239
Direct US/Canada: 650-429-3300
0800-295-240 Austria, 0800-71083 Belgium, 80-884912 Denmark, 0-800-1-12585 Finland, 0800-90-5571 France, 0800-101-6943 Germany, 00800-12-6759 Greece, 1-800-882019 Ireland, 800-780-632 Italy, 0-800-9214652 Israel, 800-2498 Luxembourg, 0800-022-6826 Netherlands, 800-15888 Norway, 900-97-1417 Spain, 020-79-7251 Sweden, 0800-561-201 Switzerland, 0800-028-8023 UK, 1800-093-897 Australia, 800-90-3575 Hong Kong, 00531-12-1688 Japan
EVENT NUMBER/ACCESS CODE:
Transcript
Slide 2
craigslist
Communicate
Connect
Share
Play
Trade
Search & Look Up
Slide 3: Software Industry Evolution
Figure labels: CLOSED SOURCE SOFTWARE; FREE & OPEN SOURCE SOFTWARE; WEB 1.0; WEB 2.0; timeline marks 1990, 1995, 2000, 2005
Slide 4: How MySQL Enables Web 2.0
Figure labels: lower TCO, ubiquitous with developers, ease of use, interoperable, connectors, language support, bundled, open source, scale out, replication, query cache, session management, tag databases, world-wide community, LAMP, full-text search, large objects, pluggable storage engines, reliability, performance, XML, partitioning (5.1)
Slide 5: Agenda
• Web 2.0: more than just rounded corners
• Tagging Concepts in SQL
• Folksonomy Concepts in SQL
• Scaling out sensibly
Slide 6: More than just rounded corners
• What is Web 2.0?
  – Participation
  – Interaction
  – Connection
• A changing of design patterns
  – AJAX and XML-RPC are changing the way data is queried
  – Rich web-based clients
  – Everyone's got an API
  – API leads to increased requests per second. Must deal with growth
Slide 7: AJAX and XML-RPC change the game
• Traditional (Web 1.0) page request builds entire page
  – Lots of HTML, style and data in each page request
  – Lots of data processed or queried in each request
  – Complex page requests and application logic
• AJAX/XML-RPC request returns small data set, or page fragment
  – Smaller amount of data being passed on each request
  – But often many more requests compared to Web 1.0
  – Exposed APIs mean data services distribute your data to syndicated sites
  – RSS feeds supply data, not full web page
Slide 8: flickr Tag Cloud(s)
Slide 9: Participation, Interaction and Connection
• Multiple users editing the same content
  – Content explosion
  – Lots of textual data to manage. How do we organize it?
• User-driven data
  – “Tracked” pages/items
  – Allows interlinking of content via the user to user relationship (folksonomy)
• Tags becoming the new categorization/filing of this content
  – Anyone can tag the data
  – Tags connect one thing to another, similar to the way that the folksonomy relationships link users together
• So, the common concept here is “linking”
Slide 10: What The User Dimension Gives You
• Folksonomy adds the “user dimension”
• Tags often provide the “glue” between user and item dimensions
• Answers questions such as:
  – Who shares my interests?
  – What are the other interests of users who share my interests?
  – Someone tagged my bookmark (or any other item) with something. What other items did that person tag with the same thing?
  – What is the user's immediate “ecosystem”? What are the fringes of that ecosystem?
• Marks the entry of web applications into the world of data warehousing and analysis
Slide 11: Linking Is the “R” in RDBMS
• The schema becomes the absolute driving force behind Web 2.0 applications
• Understanding of many-to-many relationships is critical
• Take advantage of MySQL's architecture so that our schema is as efficient as possible
• Ensure your data store is normalized, standardized, and consistent
• So, let's digg in! (pun intended)
Slide 12: Comparison of Tag Schema Designs
• “MySQLicious”
  – Entirely denormalized
  – One main fact table with one field storing a delimited list of tags
  – Unless using FULLTEXT indexing, does not scale well at all
  – Very inflexible
• “Scuttle” solution
  – Two tables. Lookup one-way from item table to tag table
  – Somewhat inflexible
  – Doesn't represent a many-to-many relationship correctly
• “Toxi” solution
  – Almost gets it right, except uses surrogate keys in mapping tables
  – Flexible, normalized approach
  – Closest to our recommended architecture
Slide 13: Tagging Concepts in SQL
Slide 14: The Tags Table
• The “Tags” table is the foundational table upon which all tag links are built
• Lean and mean
• Make primary key an INT

    CREATE TABLE Tags (
      tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      tag_text VARCHAR(50) NOT NULL,
      PRIMARY KEY pk_Tags (tag_id),
      UNIQUE INDEX uix_TagText (tag_text)
    ) ENGINE=InnoDB;
Slide 15: Example Tag2Post Mapping Table
• The mapping table creates the link between a tag and anything else
• In other terms, it maps a many-to-many relationship
• Important to index from both “sides”

    CREATE TABLE Tag2Post (
      tag_id INT UNSIGNED NOT NULL,
      post_id INT UNSIGNED NOT NULL,
      PRIMARY KEY pk_Tag2Post (tag_id, post_id),
      INDEX (post_id)
    ) ENGINE=InnoDB;
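To make the "index from both sides" point concrete, here is a minimal sketch that builds the two tables and looks up links in both directions. It assumes Python's standard-library sqlite3 module as a stand-in for MySQL, so ENGINE=InnoDB and AUTO_INCREMENT are dropped; the index name `ix_Tag2Post_post` and the helper `tag_post` are hypothetical names introduced for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption), so MySQL-only clauses are omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (
    tag_id   INTEGER PRIMARY KEY,          -- integer surrogate key
    tag_text VARCHAR(50) NOT NULL UNIQUE   -- one row per distinct tag
);
CREATE TABLE Tag2Post (
    tag_id  INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    PRIMARY KEY (tag_id, post_id)          -- serves lookups by tag
);
CREATE INDEX ix_Tag2Post_post ON Tag2Post (post_id);  -- serves lookups by post
""")


def tag_post(post_id, tag_text):
    """Link a post to a tag, creating the tag row on first use (hypothetical helper)."""
    conn.execute("INSERT OR IGNORE INTO Tags (tag_text) VALUES (?)", (tag_text,))
    (tag_id,) = conn.execute(
        "SELECT tag_id FROM Tags WHERE tag_text = ?", (tag_text,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO Tag2Post (tag_id, post_id) VALUES (?, ?)",
        (tag_id, post_id),
    )


for post_id, text in [(1, "beach"), (1, "cloud"), (2, "beach"), (3, "flower")]:
    tag_post(post_id, text)

# "Tag side": every post carrying a given tag.
posts_for_beach = [r[0] for r in conn.execute(
    "SELECT t2p.post_id FROM Tag2Post t2p JOIN Tags t ON t2p.tag_id = t.tag_id "
    "WHERE t.tag_text = ? ORDER BY t2p.post_id", ("beach",))]

# "Post side": every tag attached to a given post.
tags_for_post_1 = [r[0] for r in conn.execute(
    "SELECT t.tag_text FROM Tag2Post t2p JOIN Tags t ON t2p.tag_id = t.tag_id "
    "WHERE t2p.post_id = ? ORDER BY t.tag_text", (1,))]
```

The composite primary key covers the tag-to-post direction and the secondary index covers the post-to-tag direction, which is the pairing the slide recommends.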
Slide 16: The Tag Cloud
• Tag density typically represented by larger fonts or different colors

    SELECT tag_text, COUNT(*) AS num_tags
    FROM Tag2Post t2p
    INNER JOIN Tags t ON t2p.tag_id = t.tag_id
    GROUP BY tag_text;
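One way the counts from this query could be turned into font sizes is a simple linear scale. The sketch below is an illustration only: the function name `font_sizes` and the pixel bounds are assumptions, not something the slides specify.

```python
def font_sizes(tag_counts, min_px=12, max_px=32):
    """Map each tag's count to a font size, scaled linearly between min_px and max_px."""
    if not tag_counts:
        return {}
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = hi - lo
    sizes = {}
    for tag, count in tag_counts.items():
        # When every tag has the same count there is nothing to scale; use the smallest size.
        weight = (count - lo) / span if span else 0.0
        sizes[tag] = round(min_px + weight * (max_px - min_px))
    return sizes


# Rows shaped like the (tag_text, num_tags) result of the tag cloud query.
rows = [("beach", 40), ("cloud", 10), ("flower", 25)]
sizes = font_sizes(dict(rows))
```

A logarithmic scale may suit heavily skewed tag counts better; the linear form is used here only because it is the simplest to follow.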
Slide 17: Efficiency Issues with the Tag Cloud
• With InnoDB tables, you don't want to be issuing COUNT(*) queries, even on an indexed field

    CREATE TABLE TagStat (
      tag_id INT UNSIGNED NOT NULL,
      num_posts INT UNSIGNED NOT NULL,
      num_xxx INT UNSIGNED NOT NULL,
      PRIMARY KEY (tag_id)
    ) ENGINE=InnoDB;

    SELECT tag_text, ts.num_posts
    FROM Tag2Post t2p
    INNER JOIN Tags t ON t2p.tag_id = t.tag_id
    INNER JOIN TagStat ts ON t.tag_id = ts.tag_id
    GROUP BY tag_text;
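The slides do not show how TagStat is kept up to date. One possible approach, sketched here under assumptions, is to bump the counter in the same transaction that inserts the mapping row. SQLite (via Python's sqlite3) stands in for MySQL, the `num_xxx` placeholder column is left out, and `link_tag` is a hypothetical helper.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); ENGINE=InnoDB is omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text VARCHAR(50) NOT NULL UNIQUE);
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE TABLE TagStat (tag_id INTEGER PRIMARY KEY, num_posts INTEGER NOT NULL);
INSERT INTO Tags (tag_id, tag_text) VALUES (1, 'beach'), (2, 'cloud');
INSERT INTO TagStat (tag_id, num_posts) VALUES (1, 0), (2, 0);
""")


def link_tag(tag_id, post_id):
    """Insert the mapping row and bump the counter in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO Tag2Post (tag_id, post_id) VALUES (?, ?)",
            (tag_id, post_id),
        )
        if cur.rowcount == 1:  # only count links that were actually new
            conn.execute(
                "UPDATE TagStat SET num_posts = num_posts + 1 WHERE tag_id = ?",
                (tag_id,),
            )


for tag_id, post_id in [(1, 10), (1, 11), (2, 10), (1, 10)]:  # last pair is a repeat
    link_tag(tag_id, post_id)

# With the counters in place, the cloud can be read without COUNT(*) or GROUP BY.
cloud = conn.execute(
    "SELECT t.tag_text, ts.num_posts FROM Tags t "
    "JOIN TagStat ts ON t.tag_id = ts.tag_id ORDER BY t.tag_text"
).fetchall()
```

Reading straight from Tags joined to TagStat, as in the last query, is a variation on the slide's query that skips the mapping table entirely; whether that fits depends on what else the cloud needs to filter on.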
Slide 18: The Typical Related Items Query
• Get all posts tagged with any tag attached to Post #6
• In other words, “Get me all posts related to post #6”

    SELECT p2.post_id
    FROM Tag2Post p1
    INNER JOIN Tag2Post p2 ON p1.tag_id = p2.tag_id
    WHERE p1.post_id = 6
    GROUP BY p2.post_id;
Slide 19: Problems With Related Items Query
• Joining small to medium sized “tag sets” works great
• But when you've got a large tag set on either “side” of the join, problems can occur with scalability
• One way to solve is via derived tables

    SELECT p2.post_id
    FROM (
      SELECT tag_id FROM Tag2Post
      WHERE post_id = 6 LIMIT 10
    ) AS p1
    INNER JOIN Tag2Post p2 ON p1.tag_id = p2.tag_id
    GROUP BY p2.post_id
    LIMIT 10;
Slide 20: The Typical Related Tags Query
• Get all tags related to a particular tag via an item
• The “reverse” of the related items query; we want a set of related tags, not related posts

    SELECT t2p2.tag_id, t2.tag_text
    FROM (
      SELECT post_id FROM Tags t1
      INNER JOIN Tag2Post ON t1.tag_id = Tag2Post.tag_id
      WHERE t1.tag_text = 'beach' LIMIT 10
    ) AS t2p1
    INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id
    INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
    GROUP BY t2p2.tag_id
    LIMIT 10;
Slide 21: Dealing With More Than One Tag
• What if we want only items related to each of a set of tags?
• Here is the typical way of dealing with this problem

    SELECT t2p.post_id
    FROM Tags t1
    INNER JOIN Tag2Post t2p ON t1.tag_id = t2p.tag_id
    WHERE t1.tag_text IN ('beach', 'cloud')
    GROUP BY t2p.post_id
    HAVING COUNT(DISTINCT t2p.tag_id) = 2;
Slide 22: Dealing With More Than One Tag (cont'd)
• The GROUP BY and the HAVING COUNT(DISTINCT ...) can be eliminated through joins
• Thus, you eliminate the “Using temporary; Using filesort” in the query execution
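The transcript does not preserve the join-based query for this slide, so the following is a reconstruction under assumptions rather than the presenter's exact SQL: one Tags/Tag2Post pair per required tag, joined on post_id. SQLite (via Python's sqlite3) stands in for MySQL, and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text VARCHAR(50) NOT NULL UNIQUE);
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE INDEX ix_Tag2Post_post ON Tag2Post (post_id);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
-- post 1: beach + cloud; post 2: beach only; post 3: beach + cloud + flower
INSERT INTO Tag2Post VALUES (1, 1), (2, 1), (1, 2), (1, 3), (2, 3), (3, 3);
""")

# One Tags/Tag2Post pair per required tag; the join on post_id keeps only
# posts that carry both tags, so no GROUP BY or HAVING is needed.
JOIN_QUERY = """
SELECT t2p1.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id
INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
WHERE t1.tag_text = 'beach'
  AND t2.tag_text = 'cloud'
ORDER BY t2p1.post_id
"""

# The GROUP BY / HAVING form from the previous slide, for comparison.
GROUPED_QUERY = """
SELECT t2p.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p ON t1.tag_id = t2p.tag_id
WHERE t1.tag_text IN ('beach', 'cloud')
GROUP BY t2p.post_id
HAVING COUNT(DISTINCT t2p.tag_id) = 2
ORDER BY t2p.post_id
"""

joined = [r[0] for r in conn.execute(JOIN_QUERY)]
grouped = [r[0] for r in conn.execute(GROUPED_QUERY)]
```

Both forms return the same posts on this data. The join form grows by one pair of joins per additional required tag, which is the trade-off for dropping the aggregation.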
Slide 23: Excluding Tags From A Resultset
• Here, we want a search query like “Give me all posts tagged with “beach” and “cloud” but not tagged with “flower””. Typical solution you will see:

    SELECT t2p1.post_id
    FROM Tag2Post t2p1
    INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id
    WHERE t1.tag_text IN ('beach', 'cloud')
    AND t2p1.post_id NOT IN (
      SELECT post_id FROM Tag2Post t2p2
      INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
      WHERE t2.tag_text = 'flower'
    )
    GROUP BY t2p1.post_id
    HAVING COUNT(DISTINCT t2p1.tag_id) = 2;
Slide 24: Excluding Tags From A Resultset (cont'd)
• More efficient to use an outer join to filter out the “minus” operator, plus get rid of the GROUP BY
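The outer-join query itself is missing from the transcript. A sketch of what such a rewrite could look like follows; it is an assumption about the technique, not the presenter's exact SQL. The excluded tag is attached with LEFT JOINs and only rows where no match was found are kept. SQLite (via Python's sqlite3) stands in for MySQL, with made-up sample rows.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text VARCHAR(50) NOT NULL UNIQUE);
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE INDEX ix_Tag2Post_post ON Tag2Post (post_id);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
-- post 1: beach + cloud; post 2: beach only; post 3: beach + cloud + flower
INSERT INTO Tag2Post VALUES (1, 1), (2, 1), (1, 2), (1, 3), (2, 3), (3, 3);
""")

# Required tags are matched with inner joins; the excluded tag is attached with
# outer joins, and "IS NULL" keeps only posts that have no such link.
OUTER_JOIN_QUERY = """
SELECT t2p1.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id
INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
LEFT JOIN Tags t3 ON t3.tag_text = 'flower'
LEFT JOIN Tag2Post t2p3
       ON t2p3.post_id = t2p1.post_id AND t2p3.tag_id = t3.tag_id
WHERE t1.tag_text = 'beach'
  AND t2.tag_text = 'cloud'
  AND t2p3.post_id IS NULL
ORDER BY t2p1.post_id
"""

# The NOT IN / GROUP BY form from the previous slide, for comparison.
NOT_IN_QUERY = """
SELECT t2p1.post_id
FROM Tag2Post t2p1
INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id
WHERE t1.tag_text IN ('beach', 'cloud')
  AND t2p1.post_id NOT IN (
      SELECT post_id FROM Tag2Post t2p2
      INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
      WHERE t2.tag_text = 'flower')
GROUP BY t2p1.post_id
HAVING COUNT(DISTINCT t2p1.tag_id) = 2
ORDER BY t2p1.post_id
"""

outer_joined = [r[0] for r in conn.execute(OUTER_JOIN_QUERY)]
not_in = [r[0] for r in conn.execute(NOT_IN_QUERY)]
```

Using a LEFT JOIN for the excluded tag's Tags row as well means the query still returns matches when that tag does not exist at all.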
Slide 25: Summary of SQL Tips for Tagging
• Index fields in mapping tables properly. Ensure that each GROUP BY can access an index from the left side of the index
• Use summary or statistic tables to eliminate the use of COUNT(*) expressions in tag clouding
• Get rid of GROUP BY and HAVING COUNT(...) by using standard join techniques
• Get rid of NOT IN expressions via a standard outer join
• Use derived tables with an internal LIMIT expression to prevent wild relation queries from breaking scalability
Slide 26: Folksonomy Concepts in SQL
Slide 27
Here we see tagging and folksonomy together: the user dimension
Slide 28: Folksonomy Adds The User Dimension
• Adding the user dimension to our schema
• The tag is the relationship glue between the user and item dimensions

    CREATE TABLE UserTagPost (
      user_id INT UNSIGNED NOT NULL,
      tag_id INT UNSIGNED NOT NULL,
      post_id INT UNSIGNED NOT NULL,
      PRIMARY KEY (user_id, tag_id, post_id),
      INDEX (tag_id),
      INDEX (post_id)
    ) ENGINE=InnoDB;
Slide 29: Who Shares My Interest Directly?
• Find out the users who have linked to the same item I have
• Direct link; we don't go through the tag glue

    SELECT user_id
    FROM UserTagPost
    WHERE post_id = @my_post_id;
Slide 30: Who Shares My Interests Indirectly?
• Find out the users who have similar tag sets
• But, how much matching do we want to do? In other words, what radius do we want to match on?
• The first step is to find my tags that are within the search radius; this yields my “top” or most popular tags

    SELECT tag_id
    FROM UserTagPost
    WHERE user_id = @my_user_id
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= @radius;
Slide 31: Who Shares My Interests (cont'd)
• Now that we have our “top” tag set, we want to find users who match all of our top tags

    SELECT others.user_id
    FROM UserTagPost others
    INNER JOIN (
      SELECT tag_id
      FROM UserTagPost
      WHERE user_id = @my_user_id
      GROUP BY tag_id
      HAVING COUNT(tag_id) >= @radius
    ) AS my_tags ON others.tag_id = my_tags.tag_id
    GROUP BY others.user_id;
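As written, the join on this slide returns every user who shares at least one of the top tags. If the goal is users who match all of them, one possible refinement is a HAVING clause that compares each user's distinct matched tags with the size of the top tag set. The sketch below illustrates that under assumptions: SQLite (via Python's sqlite3) stands in for MySQL, named parameters replace the user variables, the extra filter excluding the current user is an addition, and the sample rows are made up.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
    user_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, tag_id, post_id)
);
CREATE INDEX ix_UserTagPost_tag ON UserTagPost (tag_id);
INSERT INTO UserTagPost (user_id, tag_id, post_id) VALUES
    (1, 1, 100), (1, 1, 101), (1, 2, 100), (1, 2, 102), (1, 3, 103),
    (2, 1, 200), (2, 2, 200),
    (3, 1, 300), (3, 3, 301);
""")

# My "top" tags: the derived table from the slide, with named parameters.
MY_TOP_TAGS = """
    SELECT tag_id FROM UserTagPost
    WHERE user_id = :me
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= :radius
"""

# The slide's query: users sharing at least one top tag (includes the user themselves).
ANY_MATCH = f"""
SELECT others.user_id
FROM UserTagPost others
INNER JOIN ({MY_TOP_TAGS}) AS my_tags ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
ORDER BY others.user_id
"""

# Refinement: keep only users whose matched tags cover the whole top tag set.
ALL_MATCH = f"""
SELECT others.user_id
FROM UserTagPost others
INNER JOIN ({MY_TOP_TAGS}) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
GROUP BY others.user_id
HAVING COUNT(DISTINCT others.tag_id) = (SELECT COUNT(*) FROM ({MY_TOP_TAGS}))
ORDER BY others.user_id
"""

params = {"me": 1, "radius": 2}
any_match = [r[0] for r in conn.execute(ANY_MATCH, params)]
all_match = [r[0] for r in conn.execute(ALL_MATCH, params)]
```

Here user 1's top tags are tags 1 and 2. User 3 shares only tag 1, so it appears in the first result but not the second.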
Slide 32: Ranking Ecosystem Matches
• What about finding our “closest” ecosystem matches?
• We can “rank” other users based on whether they have tagged items a number of times similar to ourselves

    SELECT others.user_id, (COUNT(*) - my_tags.num_tags) AS rank
    FROM UserTagPost others
    INNER JOIN (
      SELECT tag_id, COUNT(*) AS num_tags
      FROM UserTagPost
      WHERE user_id = @my_user_id
      GROUP BY tag_id
      HAVING COUNT(tag_id) >= @radius
    ) AS my_tags ON others.tag_id = my_tags.tag_id
    GROUP BY others.user_id
    ORDER BY rank DESC;
Slide 33: Ranking Ecosystem Matches Efficiently
• But... we've still got our COUNT(*) problem
• How about another summary table?

    CREATE TABLE UserTagStat (
      user_id INT UNSIGNED NOT NULL,
      tag_id INT UNSIGNED NOT NULL,
      num_posts INT UNSIGNED NOT NULL,
      PRIMARY KEY (user_id, tag_id),
      INDEX (tag_id)
    ) ENGINE=InnoDB;
Slide 34: Ranking Ecosystem Matches Efficiently 2
• Hey, we've eliminated the aggregation!

    SELECT others.user_id, (others.num_posts - my_tags.num_posts) AS rank
    FROM UserTagStat others
    INNER JOIN (
      SELECT tag_id, num_posts
      FROM UserTagStat
      WHERE user_id = @my_user_id
      AND num_posts >= @radius
    ) AS my_tags ON others.tag_id = my_tags.tag_id
    ORDER BY rank DESC;
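To see the summary-table ranking end to end, the sketch below fills UserTagStat from UserTagPost in one batch and then runs the ranking query against it. Several things here are assumptions: SQLite (via Python's sqlite3) stands in for MySQL, `rebuild_user_tag_stat` is a hypothetical helper (the slides do not say how the table is populated), the alias `score` replaces `rank`, the tag_id column and the filter excluding the current user are additions, and the rows are made up.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
    user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, tag_id, post_id)
);
CREATE TABLE UserTagStat (
    user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL, num_posts INTEGER NOT NULL,
    PRIMARY KEY (user_id, tag_id)
);
CREATE INDEX ix_UserTagStat_tag ON UserTagStat (tag_id);
INSERT INTO UserTagPost (user_id, tag_id, post_id) VALUES
    (1, 1, 100), (1, 1, 101), (1, 2, 100), (1, 2, 102), (1, 3, 103),
    (2, 1, 200), (2, 1, 201), (2, 1, 202), (2, 2, 200),
    (3, 1, 300), (3, 3, 301);
""")


def rebuild_user_tag_stat(conn):
    """Recompute per-user, per-tag counts in one batch (hypothetical helper)."""
    with conn:  # one transaction: readers never see a half-filled table
        conn.execute("DELETE FROM UserTagStat")
        conn.execute(
            "INSERT INTO UserTagStat (user_id, tag_id, num_posts) "
            "SELECT user_id, tag_id, COUNT(*) FROM UserTagPost "
            "GROUP BY user_id, tag_id"
        )


rebuild_user_tag_stat(conn)

# The slide's ranking query with named parameters; no aggregation at read time.
RANKING = """
SELECT others.user_id, others.tag_id,
       (others.num_posts - my_tags.num_posts) AS score
FROM UserTagStat others
INNER JOIN (
    SELECT tag_id, num_posts FROM UserTagStat
    WHERE user_id = :me AND num_posts >= :radius
) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
ORDER BY score DESC, others.user_id, others.tag_id
"""

ranking = conn.execute(RANKING, {"me": 1, "radius": 2}).fetchall()
```

The aggregation has moved from read time to the batch rebuild. Incrementing the counters at write time, as with TagStat, would be an alternative to the batch approach.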
Slide 35: Scaling Out Sensibly
Slide 36: MySQL Replication (Scale Out)
Figure labels: Load Balancer; Web/App Servers; Master MySQL Server (Writes & Reads); Slave MySQL Servers (Reads) …; Replication
• Write to one master
• Read from many slaves
• Excellent for read intensive apps
Slide 37: Scale Out Using Replication
• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that's scaling out)
• Slave can provide hot standby as well as backup server
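The write-to-master, read-from-slaves rule can be sketched at the application level as a small router. This is an illustration under assumptions, not how any particular MySQL connector works: the class `ReplicatedPool`, the host names, and the round-robin policy are all hypothetical.

```python
import itertools


class ReplicatedPool:
    """Route writes to the master and spread reads across slaves (round-robin)."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master for reads when no slave is configured.
        self._read_cycle = itertools.cycle(slaves or [master])

    def server_for(self, sql):
        """Pick a server name based on the statement's leading keyword."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._read_cycle)


# Host names are placeholders.
pool = ReplicatedPool("db-master", ["db-slave-1", "db-slave-2"])
routes = [pool.server_for(q) for q in [
    "INSERT INTO Tag2Post (tag_id, post_id) VALUES (1, 6)",
    "SELECT tag_text FROM Tags",
    "SELECT post_id FROM Tag2Post WHERE tag_id = 1",
    "SELECT tag_text FROM Tags",
]]
```

Adding a slave to the list is all it takes to widen read capacity in this sketch. Because replication is asynchronous, a read that must see a just-committed write would still need to go to the master, which fits the slide's point that slaves serve non-realtime reads.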
Slide 38: Scale Out Strategies
• Slave storage engine can differ from Master
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: “this week's popular tags”
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week
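A batch job of the kind described here might look like the following sketch, which a cron entry could call once a week on each slave. Nearly everything in it is an assumption made for illustration: SQLite (via Python's sqlite3) stands in for MySQL, the `tagged_at` column and the `PopularTagsThisWeek` table do not appear in the slides, and the dates are sample data.

```python
import sqlite3

# SQLite stands in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text VARCHAR(50) NOT NULL UNIQUE);
-- tagged_at is a hypothetical column; the slides' Tag2Post has no timestamp.
CREATE TABLE Tag2Post (
    tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL, tagged_at TEXT NOT NULL,
    PRIMARY KEY (tag_id, post_id)
);
-- Hypothetical summary table holding the semi-static weekly result.
CREATE TABLE PopularTagsThisWeek (tag_id INTEGER PRIMARY KEY, num_posts INTEGER NOT NULL);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
INSERT INTO Tag2Post VALUES
    (1, 1, '2024-01-08'), (1, 2, '2024-01-09'), (1, 3, '2023-12-01'),
    (2, 1, '2024-01-08'), (3, 4, '2023-12-15');
""")


def rebuild_popular_tags(conn, week_start, limit=10):
    """Empty the summary table and refill it with this week's most-used tags."""
    with conn:  # one transaction, so readers never see an empty table
        conn.execute("DELETE FROM PopularTagsThisWeek")  # SQLite has no TRUNCATE
        conn.execute(
            "INSERT INTO PopularTagsThisWeek (tag_id, num_posts) "
            "SELECT tag_id, COUNT(*) FROM Tag2Post WHERE tagged_at >= ? "
            "GROUP BY tag_id ORDER BY COUNT(*) DESC, tag_id LIMIT ?",
            (week_start, limit),
        )


rebuild_popular_tags(conn, "2024-01-08")

# Page requests read the small precomputed table instead of aggregating.
popular = conn.execute(
    "SELECT t.tag_text, p.num_posts FROM PopularTagsThisWeek p "
    "JOIN Tags t ON p.tag_id = t.tag_id ORDER BY p.num_posts DESC, t.tag_text"
).fetchall()
```

The expensive GROUP BY runs once per batch on the slave, and every page view after that is a cheap read of a handful of rows.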
Slide 39: Scale Out Strategies (cont'd)
• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (e.g., TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement
Slide 40: Questions?
MySQL Forge: http://forge.mysql.com
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org
Jay Pipes (jay@mysql.com)
MySQL AB
2Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 2The Worldrsquos Most Popular Open Source Database
craigslist
Communicate
Connect
Share
Play
Trade
Search amp Look Up
3Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 3The Worldrsquos Most Popular Open Source Database
Software Industry Evolution
CLOSED SOURCE SOFTWARE
FREE amp OPEN SOURCE SOFTWARE
20051995
WEB 10 WEB 20
1990 2000
4Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 4The Worldrsquos Most Popular Open Source Database
How MySQL Enables Web 20
lower tco ubiquitous with developers
ease of use
interoperableconnectors
language support
bundledopen sourcescale out
replication
query cache
session management
tag databases
world-widecommunity
lampfull-text search
large objects
pluggable storage engines
reliability
performance
XML
partitioning (51)
5Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 5The Worldrsquos Most Popular Open Source Database
Agenda
bull Web 20 more than just rounded cornersbull Tagging Concepts in SQL bull Folksonomy Concepts in SQLbull Scaling out sensibly
6Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 6The Worldrsquos Most Popular Open Source Database
More than just rounded corners
bull What is Web 20ndash Participationndash Interactionndash Connection
bull A changing of design patternsndash AJAX and XML-RPC are changing the way data is queriedndash Rich web-based clientsndash Everyones got an APIndash API leads to increased requests per second Must deal with
growth
7Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 7The Worldrsquos Most Popular Open Source Database
AJAX and XML-RPC change the game
bull Traditional (Web 10) page request builds entire pagendash Lots of HTML style and data in each page requestndash Lots of data processed or queried in each requestndash Complex page requests and application logic
bull AJAXXML-RPC request returns small data set or page fragmentndash Smaller amount of data being passed on each requestndash But often many more requests compared to Web 10ndash Exposed APIs mean data services distribute your data to
syndicated sitesndash RSS feeds supply data not full web page
8Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 8The Worldrsquos Most Popular Open Source Database
flickr Tag Cloud(s)
9Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 9The Worldrsquos Most Popular Open Source Database
Participation Interaction and Connection
bull Multiple users editing the same contentndash Content explosionndash Lots of textual data to manage How do we organize it
bull User-driven datandash ldquoTrackedrdquo pagesitemsndash Allows interlinking of content via the user to user relationship
(folksonomy)
bull Tags becoming the new categorizationfiling of this contentndash Anyone can tag the datandash Tags connect one thing to another similar to the way that the
folksonomy relationships link users together
bull So the common concept here is ldquolinkingrdquo
10Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 10The Worldrsquos Most Popular Open Source Database
What The User Dimension Gives You
bull Folksonomy adds the ldquouser dimensionrdquobull Tags often provide the ldquogluerdquo between user and item
dimensionsbull Answers questions such as
ndash Who shares my interestsndash What are the other interests of users who share my interestsndash Someone tagged my bookmark (or any other item) with
something What other items did that person tag with the same thing
ndash What is the users immediate ldquoecosystemrdquo What are the fringes of that ecosystem
bull Marks a new aspect of web applications into the world of data warehousing and analysis
11Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 11The Worldrsquos Most Popular Open Source Database
Linking Is the ldquoRrdquo in RDBMS
bull The schema becomes the absolute driving force behind Web 20 applications
bull Understanding of many-to-many relationships is criticalbull Take advantage of MySQLs architecture so that our
schema is as efficient as possiblebull Ensure your data store is normalized standardized
and consistentbull So lets digg in (pun intended)
12Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 12The Worldrsquos Most Popular Open Source Database
Comparison of Tag Schema Designs
bull ldquoMySQLiciousrdquondash Entirely denormalizedndash One main fact table with one field storing delimited list of tagsndash Unless using FULLTEXT indexing does not scale well at allndash Very inflexible
bull ldquoScuttlerdquo solutionndash Two tables Lookup one-way from item table to tag tablendash Somewhat inflexiblendash Doesnt represent a many-to-many relationship correctly
bull ldquoToxirdquo solutionndash Almost gets it right except uses surrogate keys in mapping
tables ndash Flexible normalized approachndash Closest to our recommended architecture
13Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 13The Worldrsquos Most Popular Open Source Database
Tagging Concepts in SQL
14Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 14The Worldrsquos Most Popular Open Source Database
The Tags Table
bull The ldquoTagsrdquo table is the foundational table upon which all tag links are built
bull Lean and meanbull Make primary key an INT
CREATE TABLE Tags (tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT tag_text VARCHAR(50) NOT NULL PRIMARY KEY pk_Tags (tag_id) UNIQUE INDEX uix_TagText (tag_text)) ENGINE=InnoDB
15Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 15The Worldrsquos Most Popular Open Source Database
Example Tag2Post Mapping Table
bull The mapping table creates the link between a tag and anything else
bull In other terms it maps a many-to-many relationshipbull Important to index from both ldquosidesrdquo
CREATE TABLE Tag2Post (tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY pk_Tag2Post (tag_id post_id) INDEX (post_id)) ENGINE=InnoDB
16Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 16The Worldrsquos Most Popular Open Source Database
The Tag Cloud
bull Tag density typically represented by larger fonts or different colors
SELECT tag_text COUNT() as num_tagsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idGROUP BY tag_text
17Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 17The Worldrsquos Most Popular Open Source Database
Efficiency Issues with the Tag Cloudbull With InnoDB tables you dont want to be issuing
COUNT() queries even on an indexed fieldCREATE TABLE TagStattag_id INT UNSIGNED NOT NULL num_posts INT UNSIGNED NOT NULL num_xxx INT UNSIGNED NOT NULL PRIMARY KEY (tag_id)) ENGINE=InnoDB
SELECT tag_text tsnum_postsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idINNER JOIN TagStat tsON ttag_id = tstag_idGROUP BY tag_text
18Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 18The Worldrsquos Most Popular Open Source Database
The Typical Related Items Query
bull Get all posts tagged with any tag attached to Post 6bull In other words ldquoGet me all posts related to post 6rdquo
SELECT p2post_id FROM Tag2Post p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idWHERE p1post_id = 6GROUP BY p2post_id
19Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 19The Worldrsquos Most Popular Open Source Database
Problems With Related Items Query
bull Joining small to medium sized ldquotag setsrdquo works greatbull But when youve got a large tag set on either ldquosiderdquo of
the join problems can occur with scalablitybull One way to solve is via derived tables
SELECT p2post_id FROM (SELECT tag_id FROM Tag2PostWHERE post_id = 6 LIMIT 10) AS p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idGROUP BY p2post_id LIMIT 10
20Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 20The Worldrsquos Most Popular Open Source Database
The Typical Related Tags Querybull Get all tags related to a particular tag via an itembull The ldquoreverserdquo of the related items query we want a set
of related tags not related posts
SELECT t2p2tag_id t2tag_text FROM (SELECT post_id FROM Tags t1INNER JOIN Tag2Post ON t1tag_id = Tag2Posttag_idWHERE t1tag_text = beach LIMIT 10) AS t2p1INNER JOIN Tag2Post t2p2ON t2p1post_id = t2p2post_idINNER JOIN Tags t2ON t2p2tag_id = t2tag_idGROUP BY t2p2tag_id LIMIT 10
21Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 21The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag
bull What if we want only items related to each of a set of tags
bull Here is the typical way of dealing with this problem
SELECT t2ppost_idFROM Tags t1 INNER JOIN Tag2Post t2pON t1tag_id = t2ptag_idWHERE t1tag_text IN (beachcloud)GROUP BY t2ppost_id HAVING COUNT(DISTINCT t2ptag_id) = 2
22Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 22The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag (contd)
bull The GROUP BY and the HAVING COUNT(DISTINCT ) can be eliminated through joins
bull Thus you eliminate the Using temporary using filesort in the query execution
23Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 23The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset
bull Here we want have a search query like ldquoGive me all posts tagged with ldquobeachrdquo and ldquocloudrdquo but not tagged with ldquoflowerrdquo Typical solution you will see
SELECT t2p1post_idFROM Tag2Post t2p1INNER JOIN Tags t1 ON t2p1tag_id = t1tag_idWHERE t1tag_text IN (beachcloud)AND t2p1post_id NOT IN(SELECT post_id FROM Tag2Post t2p2INNER JOIN Tags t2 ON t2p2tag_id = t2tag_idWHERE t2tag_text = flower)GROUP BY t2p1post_id HAVING COUNT(DISTINCT t2p1tag_id) = 2
24Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 24The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset (contd)bull More efficient to use an outer join to filter out the
ldquominusrdquo operator plus get rid of the GROUP BY
25Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 25The Worldrsquos Most Popular Open Source Database
Summary of SQL Tips for Tagging
bull Index fields in mapping tables properly Ensure that each GROUP BY can access an index from the left side of the index
bull Use summary or statistic tables to eliminate the use of COUNT() expressions in tag clouding
bull Get rid of GROUP BY and HAVING COUNT() by using standard join techniques
bull Get rid of NOT IN expressions via a standard outer joinbull Use derived tables with an internal LIMIT expression to
prevent wild relation queries from breaking scalability
26Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 26The Worldrsquos Most Popular Open Source Database
Folksonomy Concepts in SQL
27Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 27The Worldrsquos Most Popular Open Source Database
Here we see tagging and folksonomy
together the user dimension
28Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 28The Worldrsquos Most Popular Open Source Database
Folksonomy Adds The User Dimension
bull Adding the user dimension to our schemabull The tag is the relationship glue between the user and
item dimensions
CREATE TABLE UserTagPost (user_id INT UNSIGNED NOT NULL tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY (user_id tag_id post_id) INDEX (tag_id) INDEX (post_id)) ENGINE=InnoDB
29Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 29The Worldrsquos Most Popular Open Source Database
Who Shares My Interest Directly
bull Find out the users who have linked to the same item I have
bull Direct link we dont go through the tag glue
SELECT user_id FROM UserTagPostWHERE post_id = my_post_id
30Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 30The Worldrsquos Most Popular Open Source Database
Who Shares My Interests Indirectly
bull Find out the users who have similar tag setsbull But how much matching do we want to do In other
words what radius do we want to match onbull The first step is to find my tags that are within the
search radius this yields my ldquotoprdquo or most popular tags
SELECT tag_id FROM UserTagPostWHERE user_id = my_user_idGROUP BY tag_idHAVING COUNT(tag_id) gt= radius
31Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 31The Worldrsquos Most Popular Open Source Database
Who Shares My Interests (contd)
bull Now that we have our ldquotoprdquo tag set we want to find users who match all of our top tags
SELECT othersuser_id FROM UserTagPost others INNER JOIN (SELECT tag_id FROM UserTagPostWHERE user_id = my_user_idGROUP BY tag_idHAVING COUNT(tag_id) gt= radius) AS my_tagsON otherstag_id = my_tagstag_idGROUP BY othersuser_id

Ranking Ecosystem Matches
• What about finding our “closest” ecosystem matches?
• We can “rank” other users based on whether they have tagged items a number of times similar to ourselves

SELECT others.user_id, (COUNT(*) - my_tags.num_tags) AS rank
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id, COUNT(*) AS num_tags
  FROM UserTagPost
  WHERE user_id = my_user_id
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
ORDER BY rank DESC;

Ranking Ecosystem Matches Efficiently
• But we've still got our COUNT(*) problem
• How about another summary table?

CREATE TABLE UserTagStat (
  user_id INT UNSIGNED NOT NULL,
  tag_id INT UNSIGNED NOT NULL,
  num_posts INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, tag_id),
  INDEX (tag_id)
) ENGINE=InnoDB;

Ranking Ecosystem Matches Efficiently 2
• Hey, we've eliminated the aggregation!

SELECT others.user_id, (others.num_posts - my_tags.num_posts) AS rank
FROM UserTagStat others
INNER JOIN (
  SELECT tag_id, num_posts
  FROM UserTagStat
  WHERE user_id = my_user_id
  AND num_posts >= radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
ORDER BY rank DESC;

Scaling Out Sensibly

MySQL Replication (Scale Out)
[Diagram: web/app servers sit behind a load balancer; the master MySQL server takes writes and reads and replicates to the slave MySQL servers, which take reads.]
• Write to one master
• Read from many slaves
• Excellent for read-intensive apps

Scale Out Using Replication
• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that's scaling out)
• Slave can provide hot standby as well as backup server
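
The write-to-master, read-from-slaves rule can be sketched in application code. Everything in this sketch is hypothetical: the ReplicationRouter class, the host names, and the simple first-keyword test for writes. A real deployment would more likely rely on a load balancer or driver-level routing, and would also have to allow for replication lag when a read must see a write that was just made.

```python
import itertools

class ReplicationRouter:
    """Pick a server for each statement: master for writes, slaves for reads."""

    WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "REPLACE",
                      "CREATE", "ALTER", "DROP", "TRUNCATE")

    def __init__(self, master, slaves):
        self.master = master
        # Rotate through the slaves; fall back to the master if none exist.
        self._slaves = itertools.cycle(slaves or [master])

    def is_write(self, sql):
        # Naive check: look only at the first keyword of the statement.
        words = sql.lstrip().split(None, 1)
        return bool(words) and words[0].upper() in self.WRITE_KEYWORDS

    def server_for(self, sql):
        if self.is_write(sql):
            return self.master
        return next(self._slaves)

router = ReplicationRouter("master-db", ["slave-db-1", "slave-db-2"])
statements = [
    "INSERT INTO Tag2Post (tag_id, post_id) VALUES (1, 6)",
    "SELECT post_id FROM Tag2Post WHERE tag_id = 1",
    "SELECT tag_text FROM Tags",
    "SELECT COUNT(*) FROM Tags",
]
targets = [router.server_for(sql) for sql in statements]
print(targets)
```

Adding a slave to the list is all it takes to spread reads further, which is the "plug in another slave" idea in miniature.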

Scale Out Strategies
• Slave storage engine can differ from Master
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: “this week's popular tags”
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week
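
A weekly summary of this kind could be rebuilt by a small batch job. The sketch below is one possible shape, with loud assumptions: the schema in these slides has no timestamp, so the tagged_at column is hypothetical, as are the WeeklyTagStat table, the function names and the sample rows. SQLite (via Python's sqlite3) stands in for the slave's MySQL tables, and the cron scheduling itself is left out.

```python
import sqlite3

def rebuild_weekly_tag_stat(conn, week_start):
    """Recompute the weekly summary from scratch (what the cron job would run)."""
    with conn:  # one transaction: empty the table, then refill it
        conn.execute("DELETE FROM WeeklyTagStat")  # TRUNCATE TABLE in MySQL
        conn.execute("""
            INSERT INTO WeeklyTagStat (tag_id, num_posts)
            SELECT tag_id, COUNT(*)
            FROM Tag2Post
            WHERE tagged_at >= ?
            GROUP BY tag_id
        """, (week_start,))

def popular_tags(conn, limit):
    """Read the precomputed summary; no aggregation happens at read time."""
    return conn.execute("""
        SELECT t.tag_text, w.num_posts
        FROM WeeklyTagStat AS w
        INNER JOIN Tags AS t ON t.tag_id = w.tag_id
        ORDER BY w.num_posts DESC, t.tag_text
        LIMIT ?
    """, (limit,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
    CREATE TABLE Tag2Post (
        tag_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        tagged_at TEXT NOT NULL,   -- hypothetical column, not in the slides
        PRIMARY KEY (tag_id, post_id)
    );
    CREATE TABLE WeeklyTagStat (
        tag_id INTEGER PRIMARY KEY,
        num_posts INTEGER NOT NULL
    );
    INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
    INSERT INTO Tag2Post VALUES
        (1, 10, '2006-05-01'), (1, 11, '2006-05-09'), (1, 12, '2006-05-10'),
        (2, 10, '2006-05-08'), (3, 13, '2006-04-20');
""")
rebuild_weekly_tag_stat(conn, "2006-05-08")
weekly = popular_tags(conn, 10)
print(weekly)
```

Because the summary is rebuilt locally from data the slave already holds, the master never sees the aggregate query.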

Scale Out Strategies (cont'd)
• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement

Questions?
MySQL Forge: http://forge.mysql.com/
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org/
Jay Pipes (jay@mysql.com)
MySQL AB

Software Industry Evolution
[Timeline figure with marks at 1990, 1995, 2000 and 2005; labels: CLOSED SOURCE SOFTWARE, FREE & OPEN SOURCE SOFTWARE, WEB 1.0, WEB 2.0]

How MySQL Enables Web 2.0
[Word-cloud figure: lower TCO, ubiquitous with developers, ease of use, interoperable, connectors, language support, bundled, open source, scale out, replication, query cache, session management, tag databases, world-wide community, LAMP, full-text search, large objects, pluggable storage engines, reliability, performance, XML, partitioning (5.1)]

Agenda
• Web 2.0: more than just rounded corners
• Tagging Concepts in SQL
• Folksonomy Concepts in SQL
• Scaling out sensibly

More than just rounded corners
• What is Web 2.0?
  – Participation
  – Interaction
  – Connection
• A changing of design patterns
  – AJAX and XML-RPC are changing the way data is queried
  – Rich web-based clients
  – Everyone's got an API
  – API leads to increased requests per second. Must deal with growth!

AJAX and XML-RPC change the game
• Traditional (Web 1.0) page request builds entire page
  – Lots of HTML, style and data in each page request
  – Lots of data processed or queried in each request
  – Complex page requests and application logic
• AJAX/XML-RPC request returns small data set or page fragment
  – Smaller amount of data being passed on each request
  – But often many more requests compared to Web 1.0
  – Exposed APIs mean data services distribute your data to syndicated sites
  – RSS feeds supply data, not full web page
flickr Tag Cloud(s)

Participation, Interaction and Connection
• Multiple users editing the same content
  – Content explosion
  – Lots of textual data to manage. How do we organize it?
• User-driven data
  – “Tracked” pages/items
  – Allows interlinking of content via the user to user relationship (folksonomy)
• Tags becoming the new categorization/filing of this content
  – Anyone can tag the data
  – Tags connect one thing to another, similar to the way that the folksonomy relationships link users together
• So, the common concept here is “linking”

What The User Dimension Gives You
• Folksonomy adds the “user dimension”
• Tags often provide the “glue” between user and item dimensions
• Answers questions such as:
  – Who shares my interests?
  – What are the other interests of users who share my interests?
  – Someone tagged my bookmark (or any other item) with something. What other items did that person tag with the same thing?
  – What is the user's immediate “ecosystem”? What are the fringes of that ecosystem?
• Moves web applications into the world of data warehousing and analysis

Linking Is the “R” in RDBMS
• The schema becomes the absolute driving force behind Web 2.0 applications
• Understanding of many-to-many relationships is critical
• Take advantage of MySQL's architecture so that our schema is as efficient as possible
• Ensure your data store is normalized, standardized, and consistent
• So, let's digg in! (pun intended)

Comparison of Tag Schema Designs
• “MySQLicious”
  – Entirely denormalized
  – One main fact table with one field storing delimited list of tags
  – Unless using FULLTEXT indexing, does not scale well at all
  – Very inflexible
• “Scuttle” solution
  – Two tables. Lookup one-way from item table to tag table
  – Somewhat inflexible
  – Doesn't represent a many-to-many relationship correctly
• “Toxi” solution
  – Almost gets it right, except uses surrogate keys in mapping tables
  – Flexible, normalized approach
  – Closest to our recommended architecture

Tagging Concepts in SQL

The Tags Table
• The “Tags” table is the foundational table upon which all tag links are built
• Lean and mean
• Make primary key an INT!

CREATE TABLE Tags (
  tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  tag_text VARCHAR(50) NOT NULL,
  PRIMARY KEY pk_Tags (tag_id),
  UNIQUE INDEX uix_TagText (tag_text)
) ENGINE=InnoDB;

Example Tag2Post Mapping Table
• The mapping table creates the link between a tag and anything else
• In other terms, it maps a many-to-many relationship
• Important to index from both “sides”

CREATE TABLE Tag2Post (
  tag_id INT UNSIGNED NOT NULL,
  post_id INT UNSIGNED NOT NULL,
  PRIMARY KEY pk_Tag2Post (tag_id, post_id),
  INDEX (post_id)
) ENGINE=InnoDB;

The Tag Cloud
• Tag density typically represented by larger fonts or different colors

SELECT tag_text, COUNT(*) AS num_tags
FROM Tag2Post t2p
INNER JOIN Tags t
ON t2p.tag_id = t.tag_id
GROUP BY tag_text;

Efficiency Issues with the Tag Cloud
• With InnoDB tables, you don't want to be issuing COUNT(*) queries, even on an indexed field

CREATE TABLE TagStat (
  tag_id INT UNSIGNED NOT NULL,
  num_posts INT UNSIGNED NOT NULL,
  num_xxx INT UNSIGNED NOT NULL,
  PRIMARY KEY (tag_id)
) ENGINE=InnoDB;

SELECT tag_text, ts.num_posts
FROM Tag2Post t2p
INNER JOIN Tags t
ON t2p.tag_id = t.tag_id
INNER JOIN TagStat ts
ON t.tag_id = ts.tag_id
GROUP BY tag_text;
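
The slides do not show how TagStat is kept up to date. One possible approach, sketched here under assumptions, is to bump the counter in the same transaction that writes the mapping row, then read the cloud from Tags and TagStat alone. SQLite (via Python's sqlite3) stands in for MySQL; the helper names tag_post and tag_cloud are hypothetical, the num_xxx placeholder column is left out, and in MySQL the two-statement counter update could be a single INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

def tag_post(conn, tag_id, post_id):
    """Link a tag to a post and keep the TagStat counter in step."""
    with conn:  # all writes below commit together or not at all
        exists = conn.execute(
            "SELECT 1 FROM Tag2Post WHERE tag_id = ? AND post_id = ?",
            (tag_id, post_id)).fetchone()
        if exists:
            return  # link already recorded; do not count it twice
        conn.execute(
            "INSERT INTO Tag2Post (tag_id, post_id) VALUES (?, ?)",
            (tag_id, post_id))
        # Make sure a counter row exists, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO TagStat (tag_id, num_posts) VALUES (?, 0)",
            (tag_id,))
        conn.execute(
            "UPDATE TagStat SET num_posts = num_posts + 1 WHERE tag_id = ?",
            (tag_id,))

def tag_cloud(conn):
    """Read tag text and post counts without COUNT(*) or GROUP BY."""
    return conn.execute("""
        SELECT t.tag_text, ts.num_posts
        FROM Tags AS t
        INNER JOIN TagStat AS ts ON t.tag_id = ts.tag_id
        ORDER BY t.tag_text
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
    CREATE TABLE Tag2Post (
        tag_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        PRIMARY KEY (tag_id, post_id)
    );
    CREATE TABLE TagStat (tag_id INTEGER PRIMARY KEY, num_posts INTEGER NOT NULL);
    INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud');
""")
# Hypothetical sample links; the last pair repeats the first on purpose.
for tag_id, post_id in [(1, 6), (1, 7), (2, 6), (1, 6)]:
    tag_post(conn, tag_id, post_id)
cloud = tag_cloud(conn)
print(cloud)
```

With the counter maintained at write time, the read side needs neither the Tag2Post join nor the GROUP BY.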

The Typical Related Items Query
• Get all posts tagged with any tag attached to Post 6
• In other words, “Get me all posts related to post 6”

SELECT p2.post_id
FROM Tag2Post p1
INNER JOIN Tag2Post p2
ON p1.tag_id = p2.tag_id
WHERE p1.post_id = 6
GROUP BY p2.post_id;

Problems With Related Items Query
• Joining small to medium sized “tag sets” works great
• But when you've got a large tag set on either “side” of the join, problems can occur with scalability
• One way to solve is via derived tables

SELECT p2.post_id
FROM (
  SELECT tag_id
  FROM Tag2Post
  WHERE post_id = 6
  LIMIT 10
) AS p1
INNER JOIN Tag2Post p2
ON p1.tag_id = p2.tag_id
GROUP BY p2.post_id
LIMIT 10;

The Typical Related Tags Query
• Get all tags related to a particular tag via an item
• The “reverse” of the related items query; we want a set of related tags, not related posts

SELECT t2p2.tag_id, t2.tag_text
FROM (
  SELECT post_id
  FROM Tags t1
  INNER JOIN Tag2Post
  ON t1.tag_id = Tag2Post.tag_id
  WHERE t1.tag_text = 'beach'
  LIMIT 10
) AS t2p1
INNER JOIN Tag2Post t2p2
ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2
ON t2p2.tag_id = t2.tag_id
GROUP BY t2p2.tag_id
LIMIT 10;

Dealing With More Than One Tag
• What if we want only items related to each of a set of tags?
• Here is the typical way of dealing with this problem

SELECT t2p.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p
ON t1.tag_id = t2p.tag_id
WHERE t1.tag_text IN ('beach', 'cloud')
GROUP BY t2p.post_id
HAVING COUNT(DISTINCT t2p.tag_id) = 2;

Dealing With More Than One Tag (cont'd)
• The GROUP BY and the HAVING COUNT(DISTINCT ...) can be eliminated through joins
• Thus, you eliminate the “Using temporary; Using filesort” in the query execution
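
The join-based query itself is not reproduced in this transcript. One way it could look, offered as a sketch rather than the presenter's exact SQL, is to join Tag2Post once per required tag so that each join enforces one tag. SQLite (via Python's sqlite3) stands in for MySQL here, and the sample rows are hypothetical.

```python
import sqlite3

# One Tags/Tag2Post pair per required tag; no GROUP BY or HAVING needed.
POSTS_WITH_BOTH_TAGS = """
    SELECT t2p1.post_id
    FROM Tags AS t1
    INNER JOIN Tag2Post AS t2p1 ON t1.tag_id = t2p1.tag_id
    INNER JOIN Tag2Post AS t2p2 ON t2p1.post_id = t2p2.post_id
    INNER JOIN Tags AS t2 ON t2p2.tag_id = t2.tag_id
    WHERE t1.tag_text = :first
      AND t2.tag_text = :second
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
    CREATE TABLE Tag2Post (
        tag_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        PRIMARY KEY (tag_id, post_id)
    );
    CREATE INDEX ix_Tag2Post_post ON Tag2Post (post_id);
    INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
    INSERT INTO Tag2Post VALUES
        (1, 6), (2, 6),            -- post 6: beach, cloud
        (1, 7),                    -- post 7: beach only
        (2, 8),                    -- post 8: cloud only
        (1, 9), (2, 9), (3, 9);    -- post 9: beach, cloud, flower
""")
rows = conn.execute(POSTS_WITH_BOTH_TAGS, {"first": "beach", "second": "cloud"})
both = sorted(r[0] for r in rows)
print(both)
```

Because tag_text is unique and (tag_id, post_id) is the primary key, each qualifying post appears once, so no de-duplication step is required. The cost is one extra pair of joins per additional tag.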

Excluding Tags From A Resultset
• Here we want a search query like “Give me all posts tagged with ‘beach’ and ‘cloud’ but not tagged with ‘flower’”. Typical solution you will see:

SELECT t2p1.post_id
FROM Tag2Post t2p1
INNER JOIN Tags t1
ON t2p1.tag_id = t1.tag_id
WHERE t1.tag_text IN ('beach', 'cloud')
AND t2p1.post_id NOT IN (
  SELECT post_id
  FROM Tag2Post t2p2
  INNER JOIN Tags t2
  ON t2p2.tag_id = t2.tag_id
  WHERE t2.tag_text = 'flower'
)
GROUP BY t2p1.post_id
HAVING COUNT(DISTINCT t2p1.tag_id) = 2;

Excluding Tags From A Resultset (cont'd)
• More efficient to use an outer join to filter out the “minus” operator, plus get rid of the GROUP BY
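
The outer-join form is not reproduced in this transcript. A possible version, given as a hedged sketch and not as the presenter's exact SQL, uses inner joins for the required tags and a LEFT JOIN for the excluded tag, keeping only rows where the excluded tag found no match. SQLite (via Python's sqlite3) stands in for MySQL, and the sample rows are hypothetical.

```python
import sqlite3

POSTS_WITH_TAGS_MINUS_ONE = """
    SELECT t2p1.post_id
    FROM Tags AS t1
    INNER JOIN Tag2Post AS t2p1 ON t1.tag_id = t2p1.tag_id
    INNER JOIN Tag2Post AS t2p2 ON t2p1.post_id = t2p2.post_id
    INNER JOIN Tags AS t2 ON t2p2.tag_id = t2.tag_id
    -- outer joins look for the excluded tag on the same post
    LEFT JOIN Tags AS t3 ON t3.tag_text = :minus
    LEFT JOIN Tag2Post AS t2p3
           ON t2p3.tag_id = t3.tag_id
          AND t2p3.post_id = t2p1.post_id
    WHERE t1.tag_text = :first
      AND t2.tag_text = :second
      AND t2p3.post_id IS NULL   -- no match means the post lacks that tag
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
    CREATE TABLE Tag2Post (
        tag_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        PRIMARY KEY (tag_id, post_id)
    );
    CREATE INDEX ix_Tag2Post_post ON Tag2Post (post_id);
    INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
    INSERT INTO Tag2Post VALUES
        (1, 6), (2, 6),            -- post 6: beach, cloud
        (1, 7),                    -- post 7: beach only
        (1, 9), (2, 9), (3, 9);    -- post 9: beach, cloud, flower
""")
params = {"first": "beach", "second": "cloud", "minus": "flower"}
kept = sorted(r[0] for r in conn.execute(POSTS_WITH_TAGS_MINUS_ONE, params))
print(kept)
```

If the excluded tag does not exist at all, the outer joins simply find nothing and every post carrying both required tags is returned.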

Summary of SQL Tips for Tagging
• Index fields in mapping tables properly. Ensure that each GROUP BY can access an index from the left side of the index
• Use summary or statistic tables to eliminate the use of COUNT(*) expressions in tag clouding
• Get rid of GROUP BY and HAVING COUNT(...) by using standard join techniques
• Get rid of NOT IN expressions via a standard outer join
• Use derived tables with an internal LIMIT expression to prevent wild relation queries from breaking scalability

Folksonomy Concepts in SQL

Here we see tagging and folksonomy together: the user dimension
4Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 4The Worldrsquos Most Popular Open Source Database
How MySQL Enables Web 20
lower tco ubiquitous with developers
ease of use
interoperableconnectors
language support
bundledopen sourcescale out
replication
query cache
session management
tag databases
world-widecommunity
lampfull-text search
large objects
pluggable storage engines
reliability
performance
XML
partitioning (51)
5Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 5The Worldrsquos Most Popular Open Source Database
Agenda
bull Web 20 more than just rounded cornersbull Tagging Concepts in SQL bull Folksonomy Concepts in SQLbull Scaling out sensibly
6Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 6The Worldrsquos Most Popular Open Source Database
More than just rounded corners
bull What is Web 20ndash Participationndash Interactionndash Connection
bull A changing of design patternsndash AJAX and XML-RPC are changing the way data is queriedndash Rich web-based clientsndash Everyones got an APIndash API leads to increased requests per second Must deal with
growth
7Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 7The Worldrsquos Most Popular Open Source Database
AJAX and XML-RPC change the game
bull Traditional (Web 10) page request builds entire pagendash Lots of HTML style and data in each page requestndash Lots of data processed or queried in each requestndash Complex page requests and application logic
bull AJAXXML-RPC request returns small data set or page fragmentndash Smaller amount of data being passed on each requestndash But often many more requests compared to Web 10ndash Exposed APIs mean data services distribute your data to
syndicated sitesndash RSS feeds supply data not full web page
8Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 8The Worldrsquos Most Popular Open Source Database
flickr Tag Cloud(s)
9Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 9The Worldrsquos Most Popular Open Source Database
Participation Interaction and Connection
bull Multiple users editing the same contentndash Content explosionndash Lots of textual data to manage How do we organize it
bull User-driven datandash ldquoTrackedrdquo pagesitemsndash Allows interlinking of content via the user to user relationship
(folksonomy)
bull Tags becoming the new categorizationfiling of this contentndash Anyone can tag the datandash Tags connect one thing to another similar to the way that the
folksonomy relationships link users together
bull So the common concept here is ldquolinkingrdquo
10Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 10The Worldrsquos Most Popular Open Source Database
What The User Dimension Gives You
bull Folksonomy adds the ldquouser dimensionrdquobull Tags often provide the ldquogluerdquo between user and item
dimensionsbull Answers questions such as
ndash Who shares my interestsndash What are the other interests of users who share my interestsndash Someone tagged my bookmark (or any other item) with
something What other items did that person tag with the same thing
ndash What is the users immediate ldquoecosystemrdquo What are the fringes of that ecosystem
bull Marks a new aspect of web applications into the world of data warehousing and analysis
11Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 11The Worldrsquos Most Popular Open Source Database
Linking Is the ldquoRrdquo in RDBMS
bull The schema becomes the absolute driving force behind Web 20 applications
bull Understanding of many-to-many relationships is criticalbull Take advantage of MySQLs architecture so that our
schema is as efficient as possiblebull Ensure your data store is normalized standardized
and consistentbull So lets digg in (pun intended)
12Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 12The Worldrsquos Most Popular Open Source Database
Comparison of Tag Schema Designs
bull ldquoMySQLiciousrdquondash Entirely denormalizedndash One main fact table with one field storing delimited list of tagsndash Unless using FULLTEXT indexing does not scale well at allndash Very inflexible
bull ldquoScuttlerdquo solutionndash Two tables Lookup one-way from item table to tag tablendash Somewhat inflexiblendash Doesnt represent a many-to-many relationship correctly
bull ldquoToxirdquo solutionndash Almost gets it right except uses surrogate keys in mapping
tables ndash Flexible normalized approachndash Closest to our recommended architecture
13Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 13The Worldrsquos Most Popular Open Source Database
Tagging Concepts in SQL
14Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 14The Worldrsquos Most Popular Open Source Database
The Tags Table
bull The ldquoTagsrdquo table is the foundational table upon which all tag links are built
bull Lean and meanbull Make primary key an INT
CREATE TABLE Tags (tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT tag_text VARCHAR(50) NOT NULL PRIMARY KEY pk_Tags (tag_id) UNIQUE INDEX uix_TagText (tag_text)) ENGINE=InnoDB
15Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 15The Worldrsquos Most Popular Open Source Database
Example Tag2Post Mapping Table
bull The mapping table creates the link between a tag and anything else
bull In other terms it maps a many-to-many relationshipbull Important to index from both ldquosidesrdquo
CREATE TABLE Tag2Post (tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY pk_Tag2Post (tag_id post_id) INDEX (post_id)) ENGINE=InnoDB
16Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 16The Worldrsquos Most Popular Open Source Database
The Tag Cloud
bull Tag density typically represented by larger fonts or different colors
SELECT tag_text COUNT() as num_tagsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idGROUP BY tag_text
17Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 17The Worldrsquos Most Popular Open Source Database
Efficiency Issues with the Tag Cloudbull With InnoDB tables you dont want to be issuing
COUNT() queries even on an indexed fieldCREATE TABLE TagStattag_id INT UNSIGNED NOT NULL num_posts INT UNSIGNED NOT NULL num_xxx INT UNSIGNED NOT NULL PRIMARY KEY (tag_id)) ENGINE=InnoDB
SELECT tag_text tsnum_postsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idINNER JOIN TagStat tsON ttag_id = tstag_idGROUP BY tag_text
18Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 18The Worldrsquos Most Popular Open Source Database
The Typical Related Items Query
bull Get all posts tagged with any tag attached to Post 6bull In other words ldquoGet me all posts related to post 6rdquo
SELECT p2post_id FROM Tag2Post p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idWHERE p1post_id = 6GROUP BY p2post_id
19Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 19The Worldrsquos Most Popular Open Source Database
Problems With Related Items Query
bull Joining small to medium sized ldquotag setsrdquo works greatbull But when youve got a large tag set on either ldquosiderdquo of
the join problems can occur with scalablitybull One way to solve is via derived tables
SELECT p2post_id FROM (SELECT tag_id FROM Tag2PostWHERE post_id = 6 LIMIT 10) AS p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idGROUP BY p2post_id LIMIT 10
20Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 20The Worldrsquos Most Popular Open Source Database
The Typical Related Tags Querybull Get all tags related to a particular tag via an itembull The ldquoreverserdquo of the related items query we want a set
of related tags not related posts
SELECT t2p2tag_id t2tag_text FROM (SELECT post_id FROM Tags t1INNER JOIN Tag2Post ON t1tag_id = Tag2Posttag_idWHERE t1tag_text = beach LIMIT 10) AS t2p1INNER JOIN Tag2Post t2p2ON t2p1post_id = t2p2post_idINNER JOIN Tags t2ON t2p2tag_id = t2tag_idGROUP BY t2p2tag_id LIMIT 10
21Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 21The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag
bull What if we want only items related to each of a set of tags
bull Here is the typical way of dealing with this problem
SELECT t2ppost_idFROM Tags t1 INNER JOIN Tag2Post t2pON t1tag_id = t2ptag_idWHERE t1tag_text IN (beachcloud)GROUP BY t2ppost_id HAVING COUNT(DISTINCT t2ptag_id) = 2
22Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 22The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag (contd)
bull The GROUP BY and the HAVING COUNT(DISTINCT ) can be eliminated through joins
bull Thus you eliminate the Using temporary using filesort in the query execution
23Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 23The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset
bull Here we want have a search query like ldquoGive me all posts tagged with ldquobeachrdquo and ldquocloudrdquo but not tagged with ldquoflowerrdquo Typical solution you will see
SELECT t2p1post_idFROM Tag2Post t2p1INNER JOIN Tags t1 ON t2p1tag_id = t1tag_idWHERE t1tag_text IN (beachcloud)AND t2p1post_id NOT IN(SELECT post_id FROM Tag2Post t2p2INNER JOIN Tags t2 ON t2p2tag_id = t2tag_idWHERE t2tag_text = flower)GROUP BY t2p1post_id HAVING COUNT(DISTINCT t2p1tag_id) = 2
24Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 24The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset (contd)bull More efficient to use an outer join to filter out the
ldquominusrdquo operator plus get rid of the GROUP BY
25Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 25The Worldrsquos Most Popular Open Source Database
Summary of SQL Tips for Tagging
bull Index fields in mapping tables properly Ensure that each GROUP BY can access an index from the left side of the index
bull Use summary or statistic tables to eliminate the use of COUNT() expressions in tag clouding
bull Get rid of GROUP BY and HAVING COUNT() by using standard join techniques
bull Get rid of NOT IN expressions via a standard outer joinbull Use derived tables with an internal LIMIT expression to
prevent wild relation queries from breaking scalability
26Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 26The Worldrsquos Most Popular Open Source Database
Folksonomy Concepts in SQL
27Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 27The Worldrsquos Most Popular Open Source Database
Here we see tagging and folksonomy
together the user dimension
28Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 28The Worldrsquos Most Popular Open Source Database
Folksonomy Adds The User Dimension
bull Adding the user dimension to our schemabull The tag is the relationship glue between the user and
item dimensions
CREATE TABLE UserTagPost (user_id INT UNSIGNED NOT NULL tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY (user_id tag_id post_id) INDEX (tag_id) INDEX (post_id)) ENGINE=InnoDB
29Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 29The Worldrsquos Most Popular Open Source Database
Who Shares My Interest Directly?
• Find out the users who have linked to the same item I have
• Direct link: we don't go through the tag glue
SELECT user_id FROM UserTagPost
WHERE post_id = @my_post_id;
Who Shares My Interests Indirectly?
• Find out the users who have similar tag sets
• But how much matching do we want to do? In other words, what “radius” do we want to match on?
• The first step is to find my tags that are within the search radius; this yields my “top” or most popular tags
SELECT tag_id FROM UserTagPost
WHERE user_id = @my_user_id
GROUP BY tag_id
HAVING COUNT(tag_id) >= @radius;
Who Shares My Interests (cont'd)
• Now that we have our “top” tag set, we want to find users who match our top tags. As written, the join returns every user who shares at least one of them
SELECT others.user_id
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id FROM UserTagPost
  WHERE user_id = @my_user_id
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= @radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id;
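If the goal is users who carry every one of my top tags rather than any of them, one possible tightening is to count the distinct matched tags per user and compare that with the size of my top tag set. The sketch below is an assumption about how that could look, not part of the original deck. It uses in-memory SQLite as a stand-in for MySQL, and the sample rows, the `:me` and `:radius` parameters and the `top_tags` alias are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
    user_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, tag_id, post_id)
);
CREATE INDEX ix_UserTagPost_tag ON UserTagPost (tag_id);
-- User 1 is "me": tags 1 and 2 are used twice each, tag 3 only once.
INSERT INTO UserTagPost VALUES
    (1, 1, 100), (1, 1, 101), (1, 2, 100), (1, 2, 102), (1, 3, 103),
    (2, 1, 100), (2, 2, 101),   -- user 2 has both of my top tags
    (3, 1, 100), (3, 3, 103),   -- user 3 has only one of them
    (4, 2, 100), (4, 2, 101);   -- user 4 uses one top tag twice
""")

MATCH_ALL_TOP_TAGS = """
SELECT others.user_id
FROM UserTagPost others
INNER JOIN (
    SELECT tag_id FROM UserTagPost
    WHERE user_id = :me
    GROUP BY tag_id
    HAVING COUNT(*) >= :radius
) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
GROUP BY others.user_id
-- Require as many distinct matched tags as there are tags in my top set.
HAVING COUNT(DISTINCT others.tag_id) = (
    SELECT COUNT(*) FROM (
        SELECT tag_id FROM UserTagPost
        WHERE user_id = :me
        GROUP BY tag_id
        HAVING COUNT(*) >= :radius
    ) AS top_tags
)
ORDER BY others.user_id
"""

params = {"me": 1, "radius": 2}
matching_users = [row[0] for row in conn.execute(MATCH_ALL_TOP_TAGS, params)]
print(matching_users)
```

This keeps the aggregation the deck is trying to avoid, so it is a correctness illustration rather than a scalability recommendation; a per-user summary table would let the inner counts be read instead of computed.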
Ranking Ecosystem Matches
• What about finding our “closest” ecosystem matches?
• We can “rank” other users based on whether they have tagged items a number of times similar to ourselves
SELECT others.user_id, (COUNT(*) - my_tags.num_tags) AS rank
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id, COUNT(*) AS num_tags
  FROM UserTagPost
  WHERE user_id = @my_user_id
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= @radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
ORDER BY rank DESC;
Ranking Ecosystem Matches Efficiently
• But we've still got our COUNT(*) problem
• How about another summary table?
CREATE TABLE UserTagStat (
  user_id INT UNSIGNED NOT NULL,
  tag_id INT UNSIGNED NOT NULL,
  num_posts INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, tag_id),
  INDEX (tag_id)
) ENGINE=InnoDB;
Ranking Ecosystem Matches Efficiently 2
• Hey, we've eliminated the aggregation!
SELECT others.user_id, (others.num_posts - my_tags.num_posts) AS rank
FROM UserTagStat others
INNER JOIN (
  SELECT tag_id, num_posts
  FROM UserTagStat
  WHERE user_id = @my_user_id
  AND num_posts >= @radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
ORDER BY rank DESC;
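A summary table like UserTagStat only helps if it stays in step with UserTagPost, and the deck does not show how it is maintained. One plausible approach, sketched here with in-memory SQLite standing in for MySQL, is to bump the counter in the same transaction that records the tagging event. The helper name `tag_post` and the sample calls are hypothetical; on MySQL the update-then-insert pair could likely be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE statement.

```python
import sqlite3

def tag_post(conn, user_id, tag_id, post_id):
    """Record one tagging event and keep the per-user summary row in step."""
    with conn:  # one transaction: both tables change together or not at all
        conn.execute(
            "INSERT INTO UserTagPost (user_id, tag_id, post_id) VALUES (?, ?, ?)",
            (user_id, tag_id, post_id),
        )
        cur = conn.execute(
            "UPDATE UserTagStat SET num_posts = num_posts + 1 "
            "WHERE user_id = ? AND tag_id = ?",
            (user_id, tag_id),
        )
        if cur.rowcount == 0:
            # First time this user has used this tag: create the summary row.
            conn.execute(
                "INSERT INTO UserTagStat (user_id, tag_id, num_posts) "
                "VALUES (?, ?, 1)",
                (user_id, tag_id),
            )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
    user_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, tag_id, post_id)
);
CREATE TABLE UserTagStat (
    user_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    num_posts INTEGER NOT NULL,
    PRIMARY KEY (user_id, tag_id)
);
""")

# Illustrative events: user 1 uses tag 1 on two posts, user 2 on one post.
tag_post(conn, 1, 1, 100)
tag_post(conn, 1, 1, 101)
tag_post(conn, 2, 1, 100)

stats = conn.execute(
    "SELECT user_id, tag_id, num_posts FROM UserTagStat ORDER BY user_id, tag_id"
).fetchall()
print(stats)
```

A duplicate tagging event violates the UserTagPost primary key, so the whole transaction rolls back and the counter is not incremented twice.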
Scaling Out Sensibly
MySQL Replication (Scale Out)
[Diagram: web/app servers behind a load balancer; one master MySQL server takes writes & reads, slave MySQL servers take reads; the master replicates to the slaves]
• Write to one master
• Read from many slaves
• Excellent for read-intensive apps
Scale Out Using Replication
• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that's scaling out)
• Slave can provide hot standby as well as backup server
Scale Out Strategies
• Slave storage engine can differ from Master!
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: “this week's popular tags”
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week
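The weekly "popular tags" batch could look roughly like the sketch below, which a cron job might run on each slave. Everything beyond the Tag2Post name is an assumption: the deck's Tag2Post table has no timestamp, so the `tagged_at` column, the `WeeklyPopularTags` table and the `rebuild_weekly_popular_tags` helper are hypothetical, and in-memory SQLite stands in for a MySQL slave.

```python
import sqlite3

def rebuild_weekly_popular_tags(conn, week_start):
    """Throw away last week's summary and regenerate it in one transaction."""
    with conn:
        # On MySQL this step would typically be TRUNCATE TABLE.
        conn.execute("DELETE FROM WeeklyPopularTags")
        conn.execute(
            "INSERT INTO WeeklyPopularTags (tag_id, num_posts) "
            "SELECT tag_id, COUNT(*) FROM Tag2Post "
            "WHERE tagged_at >= ? GROUP BY tag_id",
            (week_start,),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag2Post (
    tag_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    tagged_at TEXT NOT NULL,          -- hypothetical ISO date column
    PRIMARY KEY (tag_id, post_id)
);
CREATE TABLE WeeklyPopularTags (      -- hypothetical summary table
    tag_id INTEGER PRIMARY KEY,
    num_posts INTEGER NOT NULL
);
-- Illustrative data: three tag links inside the week, one before it.
INSERT INTO Tag2Post VALUES
    (1, 10, '2007-01-02'), (1, 11, '2007-01-03'),
    (2, 10, '2007-01-03'), (2, 12, '2006-12-20');
""")

rebuild_weekly_popular_tags(conn, "2007-01-01")
popular = conn.execute(
    "SELECT tag_id, num_posts FROM WeeklyPopularTags "
    "ORDER BY num_posts DESC, tag_id"
).fetchall()
print(popular)
```

The expensive GROUP BY runs once per batch, and page requests read the small precomputed table instead of aggregating the mapping table on every hit.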
Scale Out Strategies (cont'd)
• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use the Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement
Questions?
MySQL Forge: http://forge.mysql.com/
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org
Jay Pipes (jay@mysql.com)
MySQL AB
6Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 6The Worldrsquos Most Popular Open Source Database
More than just rounded corners
bull What is Web 20ndash Participationndash Interactionndash Connection
bull A changing of design patternsndash AJAX and XML-RPC are changing the way data is queriedndash Rich web-based clientsndash Everyones got an APIndash API leads to increased requests per second Must deal with
growth
7Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 7The Worldrsquos Most Popular Open Source Database
AJAX and XML-RPC change the game
bull Traditional (Web 10) page request builds entire pagendash Lots of HTML style and data in each page requestndash Lots of data processed or queried in each requestndash Complex page requests and application logic
bull AJAXXML-RPC request returns small data set or page fragmentndash Smaller amount of data being passed on each requestndash But often many more requests compared to Web 10ndash Exposed APIs mean data services distribute your data to
syndicated sitesndash RSS feeds supply data not full web page
8Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 8The Worldrsquos Most Popular Open Source Database
flickr Tag Cloud(s)
9Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 9The Worldrsquos Most Popular Open Source Database
Participation Interaction and Connection
bull Multiple users editing the same contentndash Content explosionndash Lots of textual data to manage How do we organize it
bull User-driven datandash ldquoTrackedrdquo pagesitemsndash Allows interlinking of content via the user to user relationship
(folksonomy)
bull Tags becoming the new categorizationfiling of this contentndash Anyone can tag the datandash Tags connect one thing to another similar to the way that the
folksonomy relationships link users together
bull So the common concept here is ldquolinkingrdquo
10Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 10The Worldrsquos Most Popular Open Source Database
What The User Dimension Gives You
bull Folksonomy adds the ldquouser dimensionrdquobull Tags often provide the ldquogluerdquo between user and item
dimensionsbull Answers questions such as
ndash Who shares my interestsndash What are the other interests of users who share my interestsndash Someone tagged my bookmark (or any other item) with
something What other items did that person tag with the same thing
ndash What is the users immediate ldquoecosystemrdquo What are the fringes of that ecosystem
bull Marks a new aspect of web applications into the world of data warehousing and analysis
11Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 11The Worldrsquos Most Popular Open Source Database
Linking Is the ldquoRrdquo in RDBMS
bull The schema becomes the absolute driving force behind Web 20 applications
bull Understanding of many-to-many relationships is criticalbull Take advantage of MySQLs architecture so that our
schema is as efficient as possiblebull Ensure your data store is normalized standardized
and consistentbull So lets digg in (pun intended)
12Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 12The Worldrsquos Most Popular Open Source Database
Comparison of Tag Schema Designs
bull ldquoMySQLiciousrdquondash Entirely denormalizedndash One main fact table with one field storing delimited list of tagsndash Unless using FULLTEXT indexing does not scale well at allndash Very inflexible
bull ldquoScuttlerdquo solutionndash Two tables Lookup one-way from item table to tag tablendash Somewhat inflexiblendash Doesnt represent a many-to-many relationship correctly
bull ldquoToxirdquo solutionndash Almost gets it right except uses surrogate keys in mapping
tables ndash Flexible normalized approachndash Closest to our recommended architecture
13Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 13The Worldrsquos Most Popular Open Source Database
Tagging Concepts in SQL
14Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 14The Worldrsquos Most Popular Open Source Database
The Tags Table
bull The ldquoTagsrdquo table is the foundational table upon which all tag links are built
bull Lean and meanbull Make primary key an INT
CREATE TABLE Tags (tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT tag_text VARCHAR(50) NOT NULL PRIMARY KEY pk_Tags (tag_id) UNIQUE INDEX uix_TagText (tag_text)) ENGINE=InnoDB
15Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 15The Worldrsquos Most Popular Open Source Database
Example Tag2Post Mapping Table
bull The mapping table creates the link between a tag and anything else
bull In other terms it maps a many-to-many relationshipbull Important to index from both ldquosidesrdquo
CREATE TABLE Tag2Post (tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY pk_Tag2Post (tag_id post_id) INDEX (post_id)) ENGINE=InnoDB
16Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 16The Worldrsquos Most Popular Open Source Database
The Tag Cloud
bull Tag density typically represented by larger fonts or different colors
SELECT tag_text COUNT() as num_tagsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idGROUP BY tag_text
17Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 17The Worldrsquos Most Popular Open Source Database
Efficiency Issues with the Tag Cloudbull With InnoDB tables you dont want to be issuing
COUNT() queries even on an indexed fieldCREATE TABLE TagStattag_id INT UNSIGNED NOT NULL num_posts INT UNSIGNED NOT NULL num_xxx INT UNSIGNED NOT NULL PRIMARY KEY (tag_id)) ENGINE=InnoDB
SELECT tag_text tsnum_postsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idINNER JOIN TagStat tsON ttag_id = tstag_idGROUP BY tag_text
18Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 18The Worldrsquos Most Popular Open Source Database
The Typical Related Items Query
bull Get all posts tagged with any tag attached to Post 6bull In other words ldquoGet me all posts related to post 6rdquo
SELECT p2post_id FROM Tag2Post p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idWHERE p1post_id = 6GROUP BY p2post_id
19Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 19The Worldrsquos Most Popular Open Source Database
Problems With Related Items Query
bull Joining small to medium sized ldquotag setsrdquo works greatbull But when youve got a large tag set on either ldquosiderdquo of
the join problems can occur with scalablitybull One way to solve is via derived tables
SELECT p2post_id FROM (SELECT tag_id FROM Tag2PostWHERE post_id = 6 LIMIT 10) AS p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idGROUP BY p2post_id LIMIT 10
20Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 20The Worldrsquos Most Popular Open Source Database
The Typical Related Tags Querybull Get all tags related to a particular tag via an itembull The ldquoreverserdquo of the related items query we want a set
of related tags not related posts
SELECT t2p2tag_id t2tag_text FROM (SELECT post_id FROM Tags t1INNER JOIN Tag2Post ON t1tag_id = Tag2Posttag_idWHERE t1tag_text = beach LIMIT 10) AS t2p1INNER JOIN Tag2Post t2p2ON t2p1post_id = t2p2post_idINNER JOIN Tags t2ON t2p2tag_id = t2tag_idGROUP BY t2p2tag_id LIMIT 10
21Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 21The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag
bull What if we want only items related to each of a set of tags
bull Here is the typical way of dealing with this problem
SELECT t2ppost_idFROM Tags t1 INNER JOIN Tag2Post t2pON t1tag_id = t2ptag_idWHERE t1tag_text IN (beachcloud)GROUP BY t2ppost_id HAVING COUNT(DISTINCT t2ptag_id) = 2
22Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 22The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag (contd)
bull The GROUP BY and the HAVING COUNT(DISTINCT ) can be eliminated through joins
bull Thus you eliminate the Using temporary using filesort in the query execution
23Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 23The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset
bull Here we want have a search query like ldquoGive me all posts tagged with ldquobeachrdquo and ldquocloudrdquo but not tagged with ldquoflowerrdquo Typical solution you will see
SELECT t2p1post_idFROM Tag2Post t2p1INNER JOIN Tags t1 ON t2p1tag_id = t1tag_idWHERE t1tag_text IN (beachcloud)AND t2p1post_id NOT IN(SELECT post_id FROM Tag2Post t2p2INNER JOIN Tags t2 ON t2p2tag_id = t2tag_idWHERE t2tag_text = flower)GROUP BY t2p1post_id HAVING COUNT(DISTINCT t2p1tag_id) = 2
24Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 24The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset (contd)bull More efficient to use an outer join to filter out the
ldquominusrdquo operator plus get rid of the GROUP BY
25Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 25The Worldrsquos Most Popular Open Source Database
Summary of SQL Tips for Tagging
bull Index fields in mapping tables properly Ensure that each GROUP BY can access an index from the left side of the index
bull Use summary or statistic tables to eliminate the use of COUNT() expressions in tag clouding
bull Get rid of GROUP BY and HAVING COUNT() by using standard join techniques
bull Get rid of NOT IN expressions via a standard outer joinbull Use derived tables with an internal LIMIT expression to
prevent wild relation queries from breaking scalability
26Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 26The Worldrsquos Most Popular Open Source Database
Folksonomy Concepts in SQL
27Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 27The Worldrsquos Most Popular Open Source Database
Here we see tagging and folksonomy
together the user dimension
28Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 28The Worldrsquos Most Popular Open Source Database
Folksonomy Adds The User Dimension
bull Adding the user dimension to our schemabull The tag is the relationship glue between the user and
item dimensions
CREATE TABLE UserTagPost (user_id INT UNSIGNED NOT NULL tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY (user_id tag_id post_id) INDEX (tag_id) INDEX (post_id)) ENGINE=InnoDB
29Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 29The Worldrsquos Most Popular Open Source Database
Who Shares My Interest Directly?
• Find out the users who have linked to the same item I have
• Direct link: we don't go through the tag glue

SELECT user_id FROM UserTagPost
WHERE post_id = my_post_id

Who Shares My Interests Indirectly?
• Find out the users who have similar tag sets
• But how much matching do we want to do? In other words, what radius do we want to match on?
• The first step is to find my tags that are within the search radius; this yields my “top” or most popular tags

SELECT tag_id FROM UserTagPost
WHERE user_id = my_user_id
GROUP BY tag_id
HAVING COUNT(tag_id) >= radius

Who Shares My Interests (cont'd)
• Now that we have our “top” tag set, we want to find users who match all of our top tags

SELECT others.user_id
FROM UserTagPost others
INNER JOIN (
    SELECT tag_id FROM UserTagPost
    WHERE user_id = my_user_id
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= radius) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
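
As written, the join above returns every user who shares at least one of the top tags. If a strict "all of my top tags" match is wanted, one possible variant (a sketch, not part of the original deck) compares the number of distinct matched tags with the size of the top tag set. The user variables @my_user_id and @radius and their values are assumptions standing in for the my_user_id and radius placeholders.

```sql
SET @my_user_id = 1;  -- hypothetical user id
SET @radius = 3;      -- hypothetical minimum number of uses per tag

SELECT others.user_id
FROM UserTagPost others
INNER JOIN (
    SELECT tag_id FROM UserTagPost
    WHERE user_id = @my_user_id
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= @radius
) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> @my_user_id          -- leave myself out
GROUP BY others.user_id
-- keep only users whose matched tags cover my whole top tag set
HAVING COUNT(DISTINCT others.tag_id) = (
    SELECT COUNT(*) FROM (
        SELECT tag_id FROM UserTagPost
        WHERE user_id = @my_user_id
        GROUP BY tag_id
        HAVING COUNT(tag_id) >= @radius
    ) AS top_tags
);
```

This reintroduces aggregation, so it trades speed for strictness; the looser "any top tag" form may be good enough when results are ranked afterwards.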

Ranking Ecosystem Matches
• What about finding our “closest” ecosystem matches?
• We can “rank” other users based on whether they have tagged items a number of times similar to ourselves

SELECT others.user_id,
    (COUNT(*) - my_tags.num_tags) AS rank
FROM UserTagPost others
INNER JOIN (
    SELECT tag_id, COUNT(*) AS num_tags
    FROM UserTagPost
    WHERE user_id = my_user_id
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= radius) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
ORDER BY rank DESC

Ranking Ecosystem Matches Efficiently
• But... we've still got our COUNT(*) problem
• How about another summary table?

CREATE TABLE UserTagStat (
    user_id INT UNSIGNED NOT NULL,
    tag_id INT UNSIGNED NOT NULL,
    num_posts INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, tag_id),
    INDEX (tag_id)
) ENGINE=InnoDB

Ranking Ecosystem Matches Efficiently 2
• Hey, we've eliminated the aggregation!

SELECT others.user_id,
    (others.num_posts - my_tags.num_posts) AS rank
FROM UserTagStat others
INNER JOIN (
    SELECT tag_id, num_posts
    FROM UserTagStat
    WHERE user_id = my_user_id
    AND num_posts >= radius) AS my_tags
ON others.tag_id = my_tags.tag_id
ORDER BY rank DESC

Scaling Out Sensibly

MySQL Replication (Scale Out)
[Diagram: web/app servers behind a load balancer; writes and reads go to the master MySQL server, reads go to the slave MySQL servers, and the slaves are fed from the master by replication]
• Write to one master
• Read from many slaves
• Excellent for read-intensive apps

Scale Out Using Replication
• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that's scaling out...)
• Slave can provide hot standby as well as backup server

Scale Out Strategies
• Slave storage engine can differ from Master!
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: “this week's popular tags”
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week
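
A sketch of what the weekly cron job on a slave might execute. Everything schema-specific here is an assumption: WeeklyPopularTags is a hypothetical slave-local table, and Posts with a created_at column is a hypothetical items table, since the deck does not show how post dates are stored.

```sql
-- Hypothetical slave-local summary table (not from the original deck).
CREATE TABLE IF NOT EXISTS WeeklyPopularTags (
    tag_id INT UNSIGNED NOT NULL,
    num_posts INT UNSIGNED NOT NULL,
    PRIMARY KEY (tag_id)
) ENGINE=MyISAM;

-- Start each week from an empty table.
TRUNCATE TABLE WeeklyPopularTags;

-- Recompute the counts once, in batch, from the replicated data.
INSERT INTO WeeklyPopularTags (tag_id, num_posts)
SELECT t2p.tag_id, COUNT(*)
FROM Tag2Post t2p
INNER JOIN Posts p ON t2p.post_id = p.post_id   -- Posts is assumed
WHERE p.created_at >= NOW() - INTERVAL 7 DAY    -- created_at is assumed
GROUP BY t2p.tag_id;
```

The expensive GROUP BY then runs once per week per slave, and page requests read the small precomputed table instead.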

Scale Out Strategies (cont'd)
• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement
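
To illustrate the SQL_NO_CACHE advice, a minimal sketch against the TagStat and Tags tables from this deck (the tag id 42 is a made-up value). This applies to the MySQL 5.x servers the talk targets; the query cache was removed in MySQL 8.0.

```sql
-- Frequently updated summary table: skip the query cache, since any
-- write to TagStat would invalidate the cached result anyway.
SELECT SQL_NO_CACHE tag_id, num_posts
FROM TagStat
WHERE tag_id = 42;

-- Rarely changing lookup data: a good candidate for caching.
-- SQL_CACHE only matters when query_cache_type is set to DEMAND.
SELECT SQL_CACHE tag_id, tag_text
FROM Tags
WHERE tag_text = 'beach';
```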

Questions?
MySQL Forge: http://forge.mysql.com/
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org/
Jay Pipes (jay@mysql.com)
MySQL AB

AJAX and XML-RPC change the game
• Traditional (Web 1.0) page request builds entire page
  – Lots of HTML, style and data in each page request
  – Lots of data processed or queried in each request
  – Complex page requests and application logic
• AJAX/XML-RPC request returns small data set or page fragment
  – Smaller amount of data being passed on each request
  – But, often, many more requests compared to Web 1.0!
  – Exposed APIs mean data services distribute your data to syndicated sites
  – RSS feeds supply data, not full web page

flickr Tag Cloud(s)

Participation, Interaction and Connection
• Multiple users editing the same content
  – Content explosion
  – Lots of textual data to manage. How do we organize it?
• User-driven data
  – “Tracked” pages/items
  – Allows interlinking of content via the user-to-user relationship (folksonomy)
• Tags becoming the new categorization/filing of this content
  – Anyone can tag the data
  – Tags connect one thing to another, similar to the way that the folksonomy relationships link users together
• So, the common concept here is “linking”...

What The User Dimension Gives You
• Folksonomy adds the “user dimension”
• Tags often provide the “glue” between user and item dimensions
• Answers questions such as:
  – Who shares my interests?
  – What are the other interests of users who share my interests?
  – Someone tagged my bookmark (or any other item) with something. What other items did that person tag with the same thing?
  – What is the user's immediate “ecosystem”? What are the fringes of that ecosystem?
• Marks the move of web applications into the world of data warehousing and analysis

Linking Is the “R” in RDBMS
• The schema becomes the absolute driving force behind Web 2.0 applications
• Understanding of many-to-many relationships is critical
• Take advantage of MySQL's architecture so that our schema is as efficient as possible
• Ensure your data store is normalized, standardized, and consistent
• So, let's digg in! (pun intended)

Comparison of Tag Schema Designs
• “MySQLicious”
  – Entirely denormalized
  – One main fact table with one field storing delimited list of tags
  – Unless using FULLTEXT indexing, does not scale well at all
  – Very inflexible
• “Scuttle” solution
  – Two tables. Lookup one-way from item table to tag table
  – Somewhat inflexible
  – Doesn't represent a many-to-many relationship correctly
• “Toxi” solution
  – Almost gets it right, except uses surrogate keys in mapping tables...
  – Flexible, normalized approach
  – Closest to our recommended architecture...

Tagging Concepts in SQL

The Tags Table
• The “Tags” table is the foundational table upon which all tag links are built
• Lean and mean
• Make primary key an INT!

CREATE TABLE Tags (
    tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    tag_text VARCHAR(50) NOT NULL,
    PRIMARY KEY pk_Tags (tag_id),
    UNIQUE INDEX uix_TagText (tag_text)
) ENGINE=InnoDB

Example Tag2Post Mapping Table
• The mapping table creates the link between a tag and anything else
• In other terms, it maps a many-to-many relationship
• Important to index from both “sides”

CREATE TABLE Tag2Post (
    tag_id INT UNSIGNED NOT NULL,
    post_id INT UNSIGNED NOT NULL,
    PRIMARY KEY pk_Tag2Post (tag_id, post_id),
    INDEX (post_id)
) ENGINE=InnoDB

The Tag Cloud
• Tag density typically represented by larger fonts or different colors

SELECT tag_text, COUNT(*) AS num_tags
FROM Tag2Post t2p
INNER JOIN Tags t
ON t2p.tag_id = t.tag_id
GROUP BY tag_text

Efficiency Issues with the Tag Cloud
• With InnoDB tables, you don't want to be issuing COUNT(*) queries, even on an indexed field

CREATE TABLE TagStat (
    tag_id INT UNSIGNED NOT NULL,
    num_posts INT UNSIGNED NOT NULL,
    num_xxx INT UNSIGNED NOT NULL,
    PRIMARY KEY (tag_id)
) ENGINE=InnoDB

SELECT tag_text, ts.num_posts
FROM Tag2Post t2p
INNER JOIN Tags t
ON t2p.tag_id = t.tag_id
INNER JOIN TagStat ts
ON t.tag_id = ts.tag_id
GROUP BY tag_text
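
The deck does not show how TagStat is kept current. One possible approach, assumed here rather than taken from the slides, is to bump the counter in the same transaction that inserts the mapping row. The ids 42 and 6 are made-up values, and num_xxx is simply initialized to 0.

```sql
START TRANSACTION;

-- Attach tag 42 to post 6.
INSERT INTO Tag2Post (tag_id, post_id) VALUES (42, 6);

-- Create the counter row on first use, otherwise increment it.
INSERT INTO TagStat (tag_id, num_posts, num_xxx)
VALUES (42, 1, 0)
ON DUPLICATE KEY UPDATE num_posts = num_posts + 1;

COMMIT;

-- With the counts precomputed, the cloud could also be read from
-- Tags and TagStat alone, without touching Tag2Post or grouping.
SELECT t.tag_text, ts.num_posts
FROM Tags t
INNER JOIN TagStat ts ON t.tag_id = ts.tag_id;
```

Removing a tag from a post would need the matching decrement; that side is left out of this sketch.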

The Typical Related Items Query
• Get all posts tagged with any tag attached to Post 6
• In other words, “Get me all posts related to post 6”

SELECT p2.post_id
FROM Tag2Post p1
INNER JOIN Tag2Post p2
ON p1.tag_id = p2.tag_id
WHERE p1.post_id = 6
GROUP BY p2.post_id

Problems With Related Items Query
• Joining small to medium sized “tag sets” works great
• But... when you've got a large tag set on either “side” of the join, problems can occur with scalability
• One way to solve is via derived tables

SELECT p2.post_id
FROM (
    SELECT tag_id FROM Tag2Post
    WHERE post_id = 6 LIMIT 10) AS p1
INNER JOIN Tag2Post p2
ON p1.tag_id = p2.tag_id
GROUP BY p2.post_id LIMIT 10

The Typical Related Tags Query
• Get all tags related to a particular tag via an item
• The “reverse” of the related items query: we want a set of related tags, not related posts

SELECT t2p2.tag_id, t2.tag_text
FROM (
    SELECT post_id FROM Tags t1
    INNER JOIN Tag2Post ON t1.tag_id = Tag2Post.tag_id
    WHERE t1.tag_text = 'beach' LIMIT 10) AS t2p1
INNER JOIN Tag2Post t2p2
ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2
ON t2p2.tag_id = t2.tag_id
GROUP BY t2p2.tag_id LIMIT 10

Dealing With More Than One Tag
• What if we want only items related to each of a set of tags?
• Here is the typical way of dealing with this problem

SELECT t2p.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p
ON t1.tag_id = t2p.tag_id
WHERE t1.tag_text IN ('beach','cloud')
GROUP BY t2p.post_id
HAVING COUNT(DISTINCT t2p.tag_id) = 2

Dealing With More Than One Tag (cont'd)
• The GROUP BY and the HAVING COUNT(DISTINCT ...) can be eliminated through joins
• Thus, you eliminate the “Using temporary; Using filesort” steps in the query execution plan
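
The join-based query for this slide did not survive in the transcript. A minimal sketch of the technique, assuming the Tags and Tag2Post tables from this deck and the same 'beach' and 'cloud' tags: each required tag gets its own pair of joins, so no aggregation is needed.

```sql
-- Posts tagged with both 'beach' and 'cloud', without GROUP BY/HAVING.
SELECT t2p1.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id     -- rows for 'beach'
INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id -- same post...
INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id           -- ...also 'cloud'
WHERE t1.tag_text = 'beach'
  AND t2.tag_text = 'cloud';
```

Each extra required tag adds one more Tag2Post/Tags join pair, so this form suits short tag lists; running EXPLAIN on both versions is the way to confirm whether the temporary table and filesort are actually gone on a given server.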

Excluding Tags From A Resultset
• Here, we want to have a search query like “Give me all posts tagged with “beach” and “cloud” but not tagged with “flower””. Typical solution you will see:

SELECT t2p1.post_id
FROM Tag2Post t2p1
INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id
WHERE t1.tag_text IN ('beach','cloud')
AND t2p1.post_id NOT IN (
    SELECT post_id FROM Tag2Post t2p2
    INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
    WHERE t2.tag_text = 'flower')
GROUP BY t2p1.post_id
HAVING COUNT(DISTINCT t2p1.tag_id) = 2

8Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 8The Worldrsquos Most Popular Open Source Database
flickr Tag Cloud(s)
9Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 9The Worldrsquos Most Popular Open Source Database
Participation Interaction and Connection
bull Multiple users editing the same contentndash Content explosionndash Lots of textual data to manage How do we organize it
bull User-driven datandash ldquoTrackedrdquo pagesitemsndash Allows interlinking of content via the user to user relationship
(folksonomy)
bull Tags becoming the new categorizationfiling of this contentndash Anyone can tag the datandash Tags connect one thing to another similar to the way that the
folksonomy relationships link users together
bull So the common concept here is ldquolinkingrdquo
10Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 10The Worldrsquos Most Popular Open Source Database
What The User Dimension Gives You
bull Folksonomy adds the ldquouser dimensionrdquobull Tags often provide the ldquogluerdquo between user and item
dimensionsbull Answers questions such as
ndash Who shares my interestsndash What are the other interests of users who share my interestsndash Someone tagged my bookmark (or any other item) with
something What other items did that person tag with the same thing
ndash What is the users immediate ldquoecosystemrdquo What are the fringes of that ecosystem
bull Marks a new aspect of web applications into the world of data warehousing and analysis
11Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 11The Worldrsquos Most Popular Open Source Database
Linking Is the ldquoRrdquo in RDBMS
bull The schema becomes the absolute driving force behind Web 20 applications
bull Understanding of many-to-many relationships is criticalbull Take advantage of MySQLs architecture so that our
schema is as efficient as possiblebull Ensure your data store is normalized standardized
and consistentbull So lets digg in (pun intended)
12Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 12The Worldrsquos Most Popular Open Source Database
Comparison of Tag Schema Designs
bull ldquoMySQLiciousrdquondash Entirely denormalizedndash One main fact table with one field storing delimited list of tagsndash Unless using FULLTEXT indexing does not scale well at allndash Very inflexible
bull ldquoScuttlerdquo solutionndash Two tables Lookup one-way from item table to tag tablendash Somewhat inflexiblendash Doesnt represent a many-to-many relationship correctly
bull ldquoToxirdquo solutionndash Almost gets it right except uses surrogate keys in mapping
tables ndash Flexible normalized approachndash Closest to our recommended architecture
13Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 13The Worldrsquos Most Popular Open Source Database
Tagging Concepts in SQL
14Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 14The Worldrsquos Most Popular Open Source Database
The Tags Table
bull The ldquoTagsrdquo table is the foundational table upon which all tag links are built
bull Lean and meanbull Make primary key an INT
CREATE TABLE Tags (tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT tag_text VARCHAR(50) NOT NULL PRIMARY KEY pk_Tags (tag_id) UNIQUE INDEX uix_TagText (tag_text)) ENGINE=InnoDB
15Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 15The Worldrsquos Most Popular Open Source Database
Example Tag2Post Mapping Table
bull The mapping table creates the link between a tag and anything else
bull In other terms it maps a many-to-many relationshipbull Important to index from both ldquosidesrdquo
CREATE TABLE Tag2Post (tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY pk_Tag2Post (tag_id post_id) INDEX (post_id)) ENGINE=InnoDB
16Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 16The Worldrsquos Most Popular Open Source Database
The Tag Cloud
bull Tag density typically represented by larger fonts or different colors
SELECT tag_text COUNT() as num_tagsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idGROUP BY tag_text
17Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 17The Worldrsquos Most Popular Open Source Database
Efficiency Issues with the Tag Cloudbull With InnoDB tables you dont want to be issuing
COUNT() queries even on an indexed fieldCREATE TABLE TagStattag_id INT UNSIGNED NOT NULL num_posts INT UNSIGNED NOT NULL num_xxx INT UNSIGNED NOT NULL PRIMARY KEY (tag_id)) ENGINE=InnoDB
SELECT tag_text tsnum_postsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idINNER JOIN TagStat tsON ttag_id = tstag_idGROUP BY tag_text
18Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 18The Worldrsquos Most Popular Open Source Database
The Typical Related Items Query
bull Get all posts tagged with any tag attached to Post 6bull In other words ldquoGet me all posts related to post 6rdquo
SELECT p2post_id FROM Tag2Post p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idWHERE p1post_id = 6GROUP BY p2post_id
19Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 19The Worldrsquos Most Popular Open Source Database
Problems With Related Items Query
bull Joining small to medium sized ldquotag setsrdquo works greatbull But when youve got a large tag set on either ldquosiderdquo of
the join problems can occur with scalablitybull One way to solve is via derived tables
SELECT p2post_id FROM (SELECT tag_id FROM Tag2PostWHERE post_id = 6 LIMIT 10) AS p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idGROUP BY p2post_id LIMIT 10
20Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 20The Worldrsquos Most Popular Open Source Database
The Typical Related Tags Querybull Get all tags related to a particular tag via an itembull The ldquoreverserdquo of the related items query we want a set
of related tags not related posts
SELECT t2p2tag_id t2tag_text FROM (SELECT post_id FROM Tags t1INNER JOIN Tag2Post ON t1tag_id = Tag2Posttag_idWHERE t1tag_text = beach LIMIT 10) AS t2p1INNER JOIN Tag2Post t2p2ON t2p1post_id = t2p2post_idINNER JOIN Tags t2ON t2p2tag_id = t2tag_idGROUP BY t2p2tag_id LIMIT 10
21Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 21The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag
bull What if we want only items related to each of a set of tags
bull Here is the typical way of dealing with this problem
SELECT t2ppost_idFROM Tags t1 INNER JOIN Tag2Post t2pON t1tag_id = t2ptag_idWHERE t1tag_text IN (beachcloud)GROUP BY t2ppost_id HAVING COUNT(DISTINCT t2ptag_id) = 2
22Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 22The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag (contd)
bull The GROUP BY and the HAVING COUNT(DISTINCT ) can be eliminated through joins
bull Thus you eliminate the Using temporary using filesort in the query execution
23Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 23The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset
bull Here we want have a search query like ldquoGive me all posts tagged with ldquobeachrdquo and ldquocloudrdquo but not tagged with ldquoflowerrdquo Typical solution you will see
SELECT t2p1post_idFROM Tag2Post t2p1INNER JOIN Tags t1 ON t2p1tag_id = t1tag_idWHERE t1tag_text IN (beachcloud)AND t2p1post_id NOT IN(SELECT post_id FROM Tag2Post t2p2INNER JOIN Tags t2 ON t2p2tag_id = t2tag_idWHERE t2tag_text = flower)GROUP BY t2p1post_id HAVING COUNT(DISTINCT t2p1tag_id) = 2
24Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 24The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset (contd)bull More efficient to use an outer join to filter out the
ldquominusrdquo operator plus get rid of the GROUP BY
25Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 25The Worldrsquos Most Popular Open Source Database
Summary of SQL Tips for Tagging
bull Index fields in mapping tables properly Ensure that each GROUP BY can access an index from the left side of the index
bull Use summary or statistic tables to eliminate the use of COUNT() expressions in tag clouding
bull Get rid of GROUP BY and HAVING COUNT() by using standard join techniques
bull Get rid of NOT IN expressions via a standard outer joinbull Use derived tables with an internal LIMIT expression to
prevent wild relation queries from breaking scalability
26Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 26The Worldrsquos Most Popular Open Source Database
Folksonomy Concepts in SQL
27Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 27The Worldrsquos Most Popular Open Source Database
Here we see tagging and folksonomy
together the user dimension
28Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 28The Worldrsquos Most Popular Open Source Database
Folksonomy Adds The User Dimension
bull Adding the user dimension to our schemabull The tag is the relationship glue between the user and
item dimensions
CREATE TABLE UserTagPost (user_id INT UNSIGNED NOT NULL tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY (user_id tag_id post_id) INDEX (tag_id) INDEX (post_id)) ENGINE=InnoDB
29Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 29The Worldrsquos Most Popular Open Source Database
Who Shares My Interest Directly
bull Find out the users who have linked to the same item I have
bull Direct link we dont go through the tag glue
SELECT user_id FROM UserTagPostWHERE post_id = my_post_id
30Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 30The Worldrsquos Most Popular Open Source Database
Who Shares My Interests Indirectly
bull Find out the users who have similar tag setsbull But how much matching do we want to do In other
words what radius do we want to match onbull The first step is to find my tags that are within the
search radius this yields my ldquotoprdquo or most popular tags
SELECT tag_id FROM UserTagPostWHERE user_id = my_user_idGROUP BY tag_idHAVING COUNT(tag_id) gt= radius
31Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 31The Worldrsquos Most Popular Open Source Database
Who Shares My Interests (contd)
bull Now that we have our ldquotoprdquo tag set we want to find users who match all of our top tags
SELECT othersuser_id FROM UserTagPost others INNER JOIN (SELECT tag_id FROM UserTagPostWHERE user_id = my_user_idGROUP BY tag_idHAVING COUNT(tag_id) gt= radius) AS my_tagsON otherstag_id = my_tagstag_idGROUP BY othersuser_id
32Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 32The Worldrsquos Most Popular Open Source Database
Ranking Ecosystem Matchesbull What about finding our ldquoclosestrdquo ecosystem matchesbull We can ldquorankrdquo other users based on whether they have
tagged items a number of times similar to ourselvesSELECT othersuser_id (COUNT() shy my_tagsnum_tags) AS rankFROM UserTagPost others INNER JOIN (SELECT tag_id COUNT() AS num_tags FROM UserTagPostWHERE user_id = my_user_idGROUP BY tag_idHAVING COUNT(tag_id) gt= radius) AS my_tagsON otherstag_id = my_tagstag_idGROUP BY othersuser_idORDER BY rank DESC
Ranking Ecosystem Matches Efficiently

• But we've still got our COUNT(*) problem
• How about another summary table?

CREATE TABLE UserTagStat (
  user_id INT UNSIGNED NOT NULL,
  tag_id INT UNSIGNED NOT NULL,
  num_posts INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, tag_id),
  INDEX (tag_id)
) ENGINE=InnoDB;

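A summary table only pays off if it stays in step with UserTagPost. One way to do that, sketched here as an assumption rather than as the presenter's method, is to bump the counter in the same transaction that records the tagging. The helper name tag_post and the sample values are hypothetical, and Python's sqlite3 stands in for MySQL (where INSERT ... ON DUPLICATE KEY UPDATE would be the usual idiom).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
  user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
  PRIMARY KEY (user_id, tag_id, post_id)
);
CREATE TABLE UserTagStat (
  user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
  num_posts INTEGER NOT NULL,
  PRIMARY KEY (user_id, tag_id)
);
""")

def tag_post(conn, user_id, tag_id, post_id):
    """Record one tagging and bump the matching counter in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("INSERT INTO UserTagPost VALUES (?, ?, ?)",
                     (user_id, tag_id, post_id))
        # Create the counter row on first use, then increment it.
        conn.execute("INSERT OR IGNORE INTO UserTagStat VALUES (?, ?, 0)",
                     (user_id, tag_id))
        conn.execute("UPDATE UserTagStat SET num_posts = num_posts + 1 "
                     "WHERE user_id = ? AND tag_id = ?", (user_id, tag_id))

tag_post(conn, 1, 10, 100)
tag_post(conn, 1, 10, 101)
tag_post(conn, 1, 20, 100)

stats = conn.execute(
    "SELECT tag_id, num_posts FROM UserTagStat "
    "WHERE user_id = 1 ORDER BY tag_id"
).fetchall()
print(stats)  # [(10, 2), (20, 1)]
```

Because both writes share a transaction, a rejected duplicate tagging also leaves the counter untouched.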
Ranking Ecosystem Matches Efficiently 2

• Hey, we've eliminated the aggregation!

SELECT others.user_id, (others.num_posts - my_tags.num_posts) AS rank
FROM UserTagStat others
INNER JOIN (
  SELECT tag_id, num_posts
  FROM UserTagStat
  WHERE user_id = my_user_id
  AND num_posts >= radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
ORDER BY rank DESC;

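The query above returns one row per (user, tag) pair, includes my own rows, and sorts the heaviest taggers first. If "closest" is read as "smallest gap to my own count", a variation could sort on the absolute difference instead. This is a sketch under that assumption, run against SQLite from Python with made-up rows; the alias gap is hypothetical and also sidesteps the fact that rank became a reserved word in MySQL 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagStat (
  user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
  num_posts INTEGER NOT NULL,
  PRIMARY KEY (user_id, tag_id)
);
CREATE INDEX ix_tag ON UserTagStat (tag_id);
INSERT INTO UserTagStat VALUES
  (1, 10, 5), (1, 20, 1),   -- me: tag 10 is inside the radius, tag 20 is not
  (2, 10, 4),
  (3, 10, 9), (3, 20, 7);
""")

CLOSEST = """
SELECT others.user_id,
       ABS(others.num_posts - my_tags.num_posts) AS gap
FROM UserTagStat others
INNER JOIN (
  SELECT tag_id, num_posts
  FROM UserTagStat
  WHERE user_id = :me AND num_posts >= :radius
) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me      -- leave myself out of the ranking
ORDER BY gap ASC, others.user_id
"""
closest = conn.execute(CLOSEST, {"me": 1, "radius": 2}).fetchall()
print(closest)  # [(2, 1), (3, 4)]
```

User 2 tagged with tag 10 four times against my five, so it lands ahead of user 3, who used it nine times.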
Scaling Out Sensibly
MySQL Replication (Scale Out)

[Diagram: web/app servers behind a load balancer; writes and reads go to the master MySQL server, reads go to the slave MySQL servers, and the master feeds the slaves through replication]

• Write to one master
• Read from many slaves
• Excellent for read-intensive apps

Scale Out Using Replication

• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that's scaling out)
• Slave can provide hot standby as well as backup server

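The write-to-master, read-from-slaves split can be sketched in application code. The class below is a hypothetical illustration, not part of MySQL or any connector: it picks a target by looking at the first keyword of the statement and rotates reads across the slaves. Plain strings stand in for real connections so the example stays self-contained.

```python
import itertools

class ReplicationRouter:
    """Send writes to the master and spread reads across the slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master if no slave is configured.
        self._readers = itertools.cycle(slaves or [master])

    def pick(self, sql):
        """Return the server that should run this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)  # round-robin over the slaves

router = ReplicationRouter("master", ["slave1", "slave2"])
targets = [
    router.pick("INSERT INTO Tag2Post VALUES (1, 6)"),
    router.pick("SELECT tag_id FROM Tag2Post WHERE post_id = 6"),
    router.pick("SELECT tag_text FROM Tags"),
    router.pick("select tag_id from TagStat"),
]
print(targets)  # ['master', 'slave1', 'slave2', 'slave1']
```

Since slaves serve non-realtime reads, a read that must see a just-committed write would still need to be sent to the master explicitly.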
Scale Out Strategies

• Slave storage engine can differ from Master
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: “this week's popular tags”
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week

Scale Out Strategies (cont'd)

• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use the Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (e.g. TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement

Questions?

MySQL Forge: http://forge.mysql.com/
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org
Jay Pipes (jay@mysql.com)
MySQL AB

Participation, Interaction and Connection

• Multiple users editing the same content
  – Content explosion
  – Lots of textual data to manage. How do we organize it?
• User-driven data
  – “Tracked” pages/items
  – Allows interlinking of content via the user to user relationship (folksonomy)
• Tags becoming the new categorization/filing of this content
  – Anyone can tag the data
  – Tags connect one thing to another, similar to the way that the folksonomy relationships link users together
• So, the common concept here is “linking”

What The User Dimension Gives You

• Folksonomy adds the “user dimension”
• Tags often provide the “glue” between user and item dimensions
• Answers questions such as:
  – Who shares my interests?
  – What are the other interests of users who share my interests?
  – Someone tagged my bookmark (or any other item) with something. What other items did that person tag with the same thing?
  – What is the user's immediate “ecosystem”? What are the fringes of that ecosystem?
• Marks a move of web applications into the world of data warehousing and analysis

Linking Is the “R” in RDBMS

• The schema becomes the absolute driving force behind Web 2.0 applications
• Understanding of many-to-many relationships is critical
• Take advantage of MySQL's architecture so that our schema is as efficient as possible
• Ensure your data store is normalized, standardized, and consistent
• So, let's digg in (pun intended)

Comparison of Tag Schema Designs

• “MySQLicious”
  – Entirely denormalized
  – One main fact table with one field storing delimited list of tags
  – Unless using FULLTEXT indexing, does not scale well at all
  – Very inflexible
• “Scuttle” solution
  – Two tables. Lookup one-way from item table to tag table
  – Somewhat inflexible
  – Doesn't represent a many-to-many relationship correctly
• “Toxi” solution
  – Almost gets it right, except uses surrogate keys in mapping tables
  – Flexible, normalized approach
  – Closest to our recommended architecture

Tagging Concepts in SQL
The Tags Table

• The “Tags” table is the foundational table upon which all tag links are built
• Lean and mean
• Make primary key an INT

CREATE TABLE Tags (
  tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  tag_text VARCHAR(50) NOT NULL,
  PRIMARY KEY pk_Tags (tag_id),
  UNIQUE INDEX uix_TagText (tag_text)
) ENGINE=InnoDB;

Example Tag2Post Mapping Table

• The mapping table creates the link between a tag and anything else
• In other terms, it maps a many-to-many relationship
• Important to index from both “sides”

CREATE TABLE Tag2Post (
  tag_id INT UNSIGNED NOT NULL,
  post_id INT UNSIGNED NOT NULL,
  PRIMARY KEY pk_Tag2Post (tag_id, post_id),
  INDEX (post_id)
) ENGINE=InnoDB;

The Tag Cloud

• Tag density typically represented by larger fonts or different colors

SELECT tag_text, COUNT(*) AS num_tags
FROM Tag2Post t2p
INNER JOIN Tags t
ON t2p.tag_id = t.tag_id
GROUP BY tag_text;

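To make the "larger fonts" idea concrete, here is a minimal sketch that runs the tag cloud query and maps each count onto a font size. Python's sqlite3 stands in for MySQL; the sample rows, the font_size helper and the 12 to 32 pixel range are all assumptions chosen for illustration, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (
  tag_id INTEGER PRIMARY KEY AUTOINCREMENT,
  tag_text VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE Tag2Post (
  tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
  PRIMARY KEY (tag_id, post_id)
);
CREATE INDEX ix_post ON Tag2Post (post_id);
INSERT INTO Tags (tag_id, tag_text)
  VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
INSERT INTO Tag2Post VALUES (1, 6), (1, 7), (1, 8), (2, 6), (2, 7), (3, 8);
""")

counts = conn.execute("""
SELECT tag_text, COUNT(*) AS num_tags
FROM Tag2Post t2p
INNER JOIN Tags t ON t2p.tag_id = t.tag_id
GROUP BY tag_text
ORDER BY tag_text
""").fetchall()

def font_size(n, lo, hi, min_px=12, max_px=32):
    """Linearly map a count in [lo, hi] onto a pixel size."""
    if hi == lo:
        return min_px
    return round(min_px + (n - lo) * (max_px - min_px) / (hi - lo))

lo = min(n for _, n in counts)
hi = max(n for _, n in counts)
cloud = {tag: font_size(n, lo, hi) for tag, n in counts}
print(cloud)  # {'beach': 32, 'cloud': 22, 'flower': 12}
```

A linear scale is the simplest choice; heavily skewed tag counts may read better on a logarithmic scale.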
Efficiency Issues with the Tag Cloud

• With InnoDB tables, you don't want to be issuing COUNT(*) queries, even on an indexed field

CREATE TABLE TagStat (
  tag_id INT UNSIGNED NOT NULL,
  num_posts INT UNSIGNED NOT NULL,
  num_xxx INT UNSIGNED NOT NULL,
  PRIMARY KEY (tag_id)
) ENGINE=InnoDB;

SELECT tag_text, ts.num_posts
FROM Tag2Post t2p
INNER JOIN Tags t
ON t2p.tag_id = t.tag_id
INNER JOIN TagStat ts
ON t.tag_id = ts.tag_id
GROUP BY tag_text;

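Since TagStat holds one row per tag, the cloud can in principle be read from Tags and TagStat alone, with no GROUP BY at read time. The sketch below assumes the summary is rebuilt in a batch from Tag2Post; the function name rebuild_tag_stat and the sample rows are hypothetical, the num_xxx column is left out for brevity, and SQLite (via Python) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY,
                   tag_text VARCHAR(50) NOT NULL UNIQUE);
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE TABLE TagStat (tag_id INTEGER PRIMARY KEY,
                      num_posts INTEGER NOT NULL);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
INSERT INTO Tag2Post VALUES (1, 6), (1, 7), (1, 8), (2, 6), (2, 7);
""")

def rebuild_tag_stat(conn):
    """Recompute every counter in one batch; the COUNT(*) runs here only."""
    with conn:
        conn.execute("DELETE FROM TagStat")
        conn.execute(
            "INSERT INTO TagStat (tag_id, num_posts) "
            "SELECT tag_id, COUNT(*) FROM Tag2Post GROUP BY tag_id"
        )

rebuild_tag_stat(conn)

# Read path: a plain two-table join, no aggregation.
cloud = conn.execute("""
SELECT t.tag_text, ts.num_posts
FROM Tags t
INNER JOIN TagStat ts ON t.tag_id = ts.tag_id
ORDER BY t.tag_text
""").fetchall()
print(cloud)  # [('beach', 3), ('cloud', 2)]
```

The unused tag 'flower' has no TagStat row and therefore drops out of the cloud, which is usually what a tag cloud wants.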
The Typical Related Items Query

• Get all posts tagged with any tag attached to Post 6
• In other words, “Get me all posts related to post 6”

SELECT p2.post_id
FROM Tag2Post p1
INNER JOIN Tag2Post p2
ON p1.tag_id = p2.tag_id
WHERE p1.post_id = 6
GROUP BY p2.post_id;

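Running the self-join on a few made-up rows shows two things worth knowing: post 6 comes back as related to itself, and the result carries no notion of how strongly two posts are related. The second query is a hypothetical variation, not from the slides, that drops post 6 and orders by the number of shared tags. SQLite via Python stands in for MySQL, and ORDER BY is added for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE INDEX ix_post ON Tag2Post (post_id);
INSERT INTO Tag2Post VALUES
  (1, 6), (2, 6),   -- post 6 carries tags 1 and 2
  (1, 7), (2, 7),   -- post 7 shares both
  (1, 8),           -- post 8 shares one
  (3, 9);           -- post 9 shares none
""")

# The related items query as shown on the slide.
as_on_slide = [r[0] for r in conn.execute("""
SELECT p2.post_id
FROM Tag2Post p1
INNER JOIN Tag2Post p2 ON p1.tag_id = p2.tag_id
WHERE p1.post_id = 6
GROUP BY p2.post_id
ORDER BY p2.post_id
""")]

# Variation: skip the post itself and rank by overlap.
ranked = conn.execute("""
SELECT p2.post_id, COUNT(*) AS shared_tags
FROM Tag2Post p1
INNER JOIN Tag2Post p2 ON p1.tag_id = p2.tag_id
WHERE p1.post_id = 6 AND p2.post_id <> 6
GROUP BY p2.post_id
ORDER BY shared_tags DESC, p2.post_id
""").fetchall()

print(as_on_slide)  # [6, 7, 8]
print(ranked)       # [(7, 2), (8, 1)]
```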
Problems With Related Items Query

• Joining small to medium sized “tag sets” works great
• But when you've got a large tag set on either “side” of the join, problems can occur with scalability
• One way to solve is via derived tables

SELECT p2.post_id
FROM (
  SELECT tag_id
  FROM Tag2Post
  WHERE post_id = 6
  LIMIT 10
) AS p1
INNER JOIN Tag2Post p2
ON p1.tag_id = p2.tag_id
GROUP BY p2.post_id
LIMIT 10;

The Typical Related Tags Query

• Get all tags related to a particular tag via an item
• The “reverse” of the related items query; we want a set of related tags, not related posts

SELECT t2p2.tag_id, t2.tag_text
FROM (
  SELECT post_id
  FROM Tags t1
  INNER JOIN Tag2Post
  ON t1.tag_id = Tag2Post.tag_id
  WHERE t1.tag_text = 'beach'
  LIMIT 10
) AS t2p1
INNER JOIN Tag2Post t2p2
ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2
ON t2p2.tag_id = t2.tag_id
GROUP BY t2p2.tag_id
LIMIT 10;

Dealing With More Than One Tag

• What if we want only items related to each of a set of tags?
• Here is the typical way of dealing with this problem

SELECT t2p.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p
ON t1.tag_id = t2p.tag_id
WHERE t1.tag_text IN ('beach', 'cloud')
GROUP BY t2p.post_id
HAVING COUNT(DISTINCT t2p.tag_id) = 2;

Dealing With More Than One Tag (cont'd)

• The GROUP BY and the HAVING COUNT(DISTINCT ...) can be eliminated through joins
• Thus, you eliminate the “Using temporary; Using filesort” in the query execution

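The transcript does not preserve the join-based query for this slide, so the following is only one possible shape of it, assuming one Tag2Post join per required tag. It is run against SQLite from Python with made-up rows and checked against the GROUP BY / HAVING form; ORDER BY is added just to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY,
                   tag_text VARCHAR(50) NOT NULL UNIQUE);
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE INDEX ix_post ON Tag2Post (post_id);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
INSERT INTO Tag2Post VALUES
  (1, 6), (2, 6),          -- post 6: beach, cloud
  (1, 7),                  -- post 7: beach only
  (2, 8),                  -- post 8: cloud only
  (1, 9), (2, 9), (3, 9);  -- post 9: beach, cloud, flower
""")

# One join per required tag: a post survives only if it has both.
via_joins = [r[0] for r in conn.execute("""
SELECT t2p1.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id
INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
WHERE t1.tag_text = 'beach' AND t2.tag_text = 'cloud'
ORDER BY t2p1.post_id
""")]

# The "typical" form, for comparison.
via_having = [r[0] for r in conn.execute("""
SELECT t2p.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p ON t1.tag_id = t2p.tag_id
WHERE t1.tag_text IN ('beach', 'cloud')
GROUP BY t2p.post_id
HAVING COUNT(DISTINCT t2p.tag_id) = 2
ORDER BY t2p.post_id
""")]

print(via_joins)   # [6, 9]
print(via_having)  # [6, 9]
```

The trade-off is that the join form needs one extra pair of joins per tag, so the statement has to be generated for the number of tags being searched.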
Excluding Tags From A Resultset

• Here we want a search query like “Give me all posts tagged with ‘beach’ and ‘cloud’ but not tagged with ‘flower’”. Typical solution you will see:

SELECT t2p1.post_id
FROM Tag2Post t2p1
INNER JOIN Tags t1
ON t2p1.tag_id = t1.tag_id
WHERE t1.tag_text IN ('beach', 'cloud')
AND t2p1.post_id NOT IN (
  SELECT post_id
  FROM Tag2Post t2p2
  INNER JOIN Tags t2
  ON t2p2.tag_id = t2.tag_id
  WHERE t2.tag_text = 'flower'
)
GROUP BY t2p1.post_id
HAVING COUNT(DISTINCT t2p1.tag_id) = 2;

Excluding Tags From A Resultset (cont'd)

• More efficient to use an outer join to filter out the “minus” operator, plus get rid of the GROUP BY

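The outer-join query itself is missing from the transcript. A minimal sketch of the technique, assuming a LEFT JOIN against the excluded tag plus an IS NULL filter, could look like this. It uses SQLite from Python in place of MySQL, with made-up rows; the alias names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY,
                   tag_text VARCHAR(50) NOT NULL UNIQUE);
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE INDEX ix_post ON Tag2Post (post_id);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
INSERT INTO Tag2Post VALUES
  (1, 6), (2, 6),          -- post 6: beach, cloud
  (1, 7),                  -- post 7: beach only
  (1, 9), (2, 9), (3, 9);  -- post 9: beach, cloud, flower
""")

# Inner joins require 'beach' and 'cloud'; the LEFT JOIN looks for a
# 'flower' row on the same post, and IS NULL keeps posts without one.
posts = [r[0] for r in conn.execute("""
SELECT t2p1.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id
INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
LEFT JOIN Tag2Post unwanted
  ON unwanted.post_id = t2p1.post_id
 AND unwanted.tag_id = (SELECT tag_id FROM Tags WHERE tag_text = 'flower')
WHERE t1.tag_text = 'beach'
  AND t2.tag_text = 'cloud'
  AND unwanted.post_id IS NULL
ORDER BY t2p1.post_id
""")]
print(posts)  # [6]
```

Post 9 has both required tags but also 'flower', so the LEFT JOIN finds a row for it and the IS NULL test removes it.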
Summary of SQL Tips for Tagging

• Index fields in mapping tables properly. Ensure that each GROUP BY can access an index from the left side of the index
• Use summary or statistic tables to eliminate the use of COUNT(*) expressions in tag clouding
• Get rid of GROUP BY and HAVING COUNT(...) by using standard join techniques
• Get rid of NOT IN expressions via a standard outer join
• Use derived tables with an internal LIMIT expression to prevent wild relation queries from breaking scalability

Folksonomy Concepts in SQL
Here we see tagging and folksonomy together: the user dimension

Folksonomy Adds The User Dimension

• Adding the user dimension to our schema
• The tag is the relationship glue between the user and item dimensions

CREATE TABLE UserTagPost (
  user_id INT UNSIGNED NOT NULL,
  tag_id INT UNSIGNED NOT NULL,
  post_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, tag_id, post_id),
  INDEX (tag_id),
  INDEX (post_id)
) ENGINE=InnoDB;
11Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 11The Worldrsquos Most Popular Open Source Database
Linking Is the “R” in RDBMS
• The schema becomes the absolute driving force behind Web 2.0 applications
• Understanding of many-to-many relationships is critical
• Take advantage of MySQL’s architecture so that our schema is as efficient as possible
• Ensure your data store is normalized, standardized, and consistent
• So, let’s digg in! (pun intended)
Comparison of Tag Schema Designs
• “MySQLicious”
  – Entirely denormalized
  – One main fact table with one field storing a delimited list of tags
  – Unless using FULLTEXT indexing, does not scale well at all
  – Very inflexible
• “Scuttle” solution
  – Two tables; lookup is one-way, from item table to tag table
  – Somewhat inflexible
  – Doesn’t represent a many-to-many relationship correctly
• “Toxi” solution
  – Almost gets it right, except it uses surrogate keys in mapping tables
  – Flexible, normalized approach
  – Closest to our recommended architecture
Tagging Concepts in SQL
The Tags Table
• The “Tags” table is the foundational table upon which all tag links are built
• Lean and mean
• Make the primary key an INT

    CREATE TABLE Tags (
      tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      tag_text VARCHAR(50) NOT NULL,
      PRIMARY KEY pk_Tags (tag_id),
      UNIQUE INDEX uix_TagText (tag_text)
    ) ENGINE=InnoDB;
Example Tag2Post Mapping Table
• The mapping table creates the link between a tag and anything else
• In other terms, it maps a many-to-many relationship
• Important to index from both “sides”

    CREATE TABLE Tag2Post (
      tag_id INT UNSIGNED NOT NULL,
      post_id INT UNSIGNED NOT NULL,
      PRIMARY KEY pk_Tag2Post (tag_id, post_id),
      INDEX (post_id)
    ) ENGINE=InnoDB;
The Tag Cloud
• Tag density is typically represented by larger fonts or different colors

    SELECT tag_text, COUNT(*) AS num_tags
    FROM Tag2Post t2p
    INNER JOIN Tags t
    ON t2p.tag_id = t.tag_id
    GROUP BY tag_text;
Efficiency Issues with the Tag Cloud
• With InnoDB tables, you don’t want to be issuing COUNT(*) queries, even on an indexed field

    CREATE TABLE TagStat (
      tag_id INT UNSIGNED NOT NULL,
      num_posts INT UNSIGNED NOT NULL,
      num_xxx INT UNSIGNED NOT NULL,  -- placeholder for any other per-tag counter
      PRIMARY KEY (tag_id)
    ) ENGINE=InnoDB;

    SELECT tag_text, ts.num_posts
    FROM Tag2Post t2p
    INNER JOIN Tags t
    ON t2p.tag_id = t.tag_id
    INNER JOIN TagStat ts
    ON t.tag_id = ts.tag_id
    GROUP BY tag_text;
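A minimal sketch of how a TagStat-style summary table might be kept current and then read. It rests on several assumptions: Python's built-in sqlite3 stands in for MySQL so the example runs anywhere (ENGINE=InnoDB and UNSIGNED are dropped), tag_post is a hypothetical helper, and the sample tags are invented. Because TagStat already holds one row per tag, this version reads the cloud from Tags joined to TagStat alone, without touching Tag2Post.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; ENGINE=InnoDB and UNSIGNED are omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE TABLE TagStat (tag_id INTEGER PRIMARY KEY, num_posts INTEGER NOT NULL);
""")


def tag_post(conn, tag_text, post_id):
    """Hypothetical helper: link a tag to a post and keep TagStat in step."""
    with conn:  # one transaction: committed on success, rolled back on error
        row = conn.execute(
            "SELECT tag_id FROM Tags WHERE tag_text = ?", (tag_text,)).fetchone()
        if row is None:
            cur = conn.execute("INSERT INTO Tags (tag_text) VALUES (?)", (tag_text,))
            tag_id = cur.lastrowid
            conn.execute(
                "INSERT INTO TagStat (tag_id, num_posts) VALUES (?, 0)", (tag_id,))
        else:
            tag_id = row[0]
        linked = conn.execute(
            "SELECT 1 FROM Tag2Post WHERE tag_id = ? AND post_id = ?",
            (tag_id, post_id)).fetchone()
        if linked is None:  # count each (tag, post) link only once
            conn.execute(
                "INSERT INTO Tag2Post (tag_id, post_id) VALUES (?, ?)",
                (tag_id, post_id))
            conn.execute(
                "UPDATE TagStat SET num_posts = num_posts + 1 WHERE tag_id = ?",
                (tag_id,))


for text, post in [("beach", 1), ("beach", 2), ("cloud", 2), ("beach", 2)]:
    tag_post(conn, text, post)

# Tag cloud read straight from the summary table: no COUNT(*), no GROUP BY.
cloud = conn.execute("""
    SELECT t.tag_text, ts.num_posts
    FROM Tags t
    INNER JOIN TagStat ts ON t.tag_id = ts.tag_id
    ORDER BY t.tag_text
""").fetchall()

# The aggregate query it replaces, kept here only as a cross-check.
counted = conn.execute("""
    SELECT t.tag_text, COUNT(*) AS num_tags
    FROM Tag2Post t2p
    INNER JOIN Tags t ON t2p.tag_id = t.tag_id
    GROUP BY t.tag_text
    ORDER BY t.tag_text
""").fetchall()
```

The counter is updated in the same transaction as the link insert, so the summary cannot drift from Tag2Post if a write fails halfway. On MySQL the same idea could also be expressed with a trigger or a periodic batch job; which one fits depends on write volume.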
The Typical Related Items Query
• Get all posts tagged with any tag attached to Post #6
• In other words: “Get me all posts related to post #6”

    SELECT p2.post_id
    FROM Tag2Post p1
    INNER JOIN Tag2Post p2
    ON p1.tag_id = p2.tag_id
    WHERE p1.post_id = 6
    GROUP BY p2.post_id;
Problems With Related Items Query
• Joining small to medium sized “tag sets” works great
• But when you’ve got a large tag set on either “side” of the join, scalability problems can occur
• One way to solve this is via derived tables

    SELECT p2.post_id
    FROM (
      SELECT tag_id FROM Tag2Post
      WHERE post_id = 6 LIMIT 10
    ) AS p1
    INNER JOIN Tag2Post p2
    ON p1.tag_id = p2.tag_id
    GROUP BY p2.post_id LIMIT 10;
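The effect of the inner LIMIT can be seen on a toy data set. This is a sketch under assumptions: sqlite3 stands in for MySQL, related_posts is a hypothetical helper, the rows are invented, and the two ORDER BY clauses are additions that make the output deterministic (the slide's query has none, so which tags survive the cap is up to the server).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
conn.executescript("""
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE INDEX ix_Tag2Post_post ON Tag2Post (post_id);
INSERT INTO Tag2Post (tag_id, post_id) VALUES
  (1, 6), (2, 6), (3, 6),          -- post 6 carries three tags
  (1, 7), (2, 8), (3, 9), (3, 10);
""")

RELATED_SQL = """
SELECT p2.post_id
FROM (
    SELECT tag_id FROM Tag2Post
    WHERE post_id = ?
    ORDER BY tag_id          -- added so the capped tag set is deterministic
    LIMIT ?
) AS p1
INNER JOIN Tag2Post p2 ON p1.tag_id = p2.tag_id
GROUP BY p2.post_id
ORDER BY p2.post_id          -- added for a stable result order
LIMIT ?
"""


def related_posts(conn, post_id, max_tags, max_posts):
    """Hypothetical helper: posts sharing any of the first max_tags tags."""
    rows = conn.execute(RELATED_SQL, (post_id, max_tags, max_posts)).fetchall()
    return [r[0] for r in rows]


capped = related_posts(conn, 6, 2, 10)       # only tags 1 and 2 feed the join
uncapped = related_posts(conn, 6, 1000, 10)  # all three tags feed the join
```

With the cap at two tags, the join never visits the rows for tag 3, so posts 9 and 10 drop out. That is the trade the derived table makes: bounded work in exchange for a possibly incomplete related set. As in the slide's query, the source post itself appears in the result.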
The Typical Related Tags Query
• Get all tags related to a particular tag via an item
• The “reverse” of the related items query: we want a set of related tags, not related posts

    SELECT t2p2.tag_id, t2.tag_text
    FROM (
      SELECT post_id FROM Tags t1
      INNER JOIN Tag2Post ON t1.tag_id = Tag2Post.tag_id
      WHERE t1.tag_text = 'beach' LIMIT 10
    ) AS t2p1
    INNER JOIN Tag2Post t2p2
    ON t2p1.post_id = t2p2.post_id
    INNER JOIN Tags t2
    ON t2p2.tag_id = t2.tag_id
    GROUP BY t2p2.tag_id LIMIT 10;
Dealing With More Than One Tag
• What if we want only items related to each of a set of tags?
• Here is the typical way of dealing with this problem

    SELECT t2p.post_id
    FROM Tags t1
    INNER JOIN Tag2Post t2p
    ON t1.tag_id = t2p.tag_id
    WHERE t1.tag_text IN ('beach', 'cloud')
    GROUP BY t2p.post_id
    HAVING COUNT(DISTINCT t2p.tag_id) = 2;
Dealing With More Than One Tag (cont’d)
• The GROUP BY and the HAVING COUNT(DISTINCT ...) can be eliminated through joins
• Thus, you eliminate the “Using temporary; Using filesort” in the query execution
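One way the join-based form might look, sketched with Python's sqlite3 standing in for MySQL and invented sample rows. The idea is one Tags/Tag2Post pair per required tag, chained on post_id; because (tag_id, post_id) is the primary key of Tag2Post, each matching post comes back once without any grouping. Whether the plan on a real server actually drops “Using temporary; Using filesort” depends on data and MySQL version, so check it with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE INDEX ix_Tag2Post_post ON Tag2Post (post_id);
INSERT INTO Tags (tag_id, tag_text) VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
INSERT INTO Tag2Post (tag_id, post_id) VALUES
  (1, 1), (2, 1),           -- post 1: beach, cloud
  (1, 2),                   -- post 2: beach
  (1, 3), (2, 3), (3, 3),   -- post 3: beach, cloud, flower
  (2, 4);                   -- post 4: cloud
""")

# One Tags/Tag2Post pair per required tag, chained on post_id.
JOIN_SQL = """
SELECT t2p1.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id
INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
WHERE t1.tag_text = ? AND t2.tag_text = ?
"""

# The GROUP BY / HAVING form from the previous slide, for comparison.
GROUP_SQL = """
SELECT t2p.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p ON t1.tag_id = t2p.tag_id
WHERE t1.tag_text IN (?, ?)
GROUP BY t2p.post_id
HAVING COUNT(DISTINCT t2p.tag_id) = 2
"""

via_joins = sorted(r[0] for r in conn.execute(JOIN_SQL, ("beach", "cloud")))
via_group = sorted(r[0] for r in conn.execute(GROUP_SQL, ("beach", "cloud")))
```

Each additional required tag adds another Tag2Post/Tags pair to the chain, so the query text grows with the tag count, while the GROUP BY form only changes its IN list and the number in the HAVING clause.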
Excluding Tags From A Resultset
• Here we want a search query like “Give me all posts tagged with ‘beach’ and ‘cloud’ but not tagged with ‘flower’”. The typical solution you will see:

    SELECT t2p1.post_id
    FROM Tag2Post t2p1
    INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id
    WHERE t1.tag_text IN ('beach', 'cloud')
    AND t2p1.post_id NOT IN (
      SELECT post_id FROM Tag2Post t2p2
      INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
      WHERE t2.tag_text = 'flower'
    )
    GROUP BY t2p1.post_id
    HAVING COUNT(DISTINCT t2p1.tag_id) = 2;
Excluding Tags From A Resultset (cont’d)
• More efficient to use an outer join to filter out the “minus” operator, plus get rid of the GROUP BY
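A sketch of the outer-join form, again with sqlite3 standing in for MySQL and invented rows; posts_with_both_but_not is a hypothetical helper. The two required tags are matched with inner joins, the excluded tag is attached with LEFT JOINs, and only rows where that attachment found nothing (IS NULL) are kept. The excluded tag is looked up with a LEFT JOIN too, so a tag that does not exist at all excludes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
CREATE INDEX ix_Tag2Post_post ON Tag2Post (post_id);
INSERT INTO Tags (tag_id, tag_text) VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
INSERT INTO Tag2Post (tag_id, post_id) VALUES
  (1, 1), (2, 1),           -- post 1: beach, cloud
  (1, 2),                   -- post 2: beach
  (1, 3), (2, 3), (3, 3),   -- post 3: beach, cloud, flower
  (2, 4);                   -- post 4: cloud
""")

EXCLUDE_SQL = """
SELECT t2p1.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id
INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
LEFT JOIN Tags t3 ON t3.tag_text = :minus
LEFT JOIN Tag2Post t2p3
       ON t2p3.tag_id = t3.tag_id AND t2p3.post_id = t2p1.post_id
WHERE t1.tag_text = :first AND t2.tag_text = :second
  AND t2p3.post_id IS NULL      -- keep posts with no link to the excluded tag
"""


def posts_with_both_but_not(conn, first, second, minus):
    """Hypothetical helper: posts tagged first AND second but NOT minus."""
    rows = conn.execute(
        EXCLUDE_SQL, {"first": first, "second": second, "minus": minus})
    return sorted(r[0] for r in rows)


without_flower = posts_with_both_but_not(conn, "beach", "cloud", "flower")
unknown_minus = posts_with_both_but_not(conn, "beach", "cloud", "sunset")
```

Post 3 carries all three tags, so it is filtered out in the first call; in the second call the excluded tag is unknown, so both candidate posts remain.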
Summary of SQL Tips for Tagging
• Index fields in mapping tables properly. Ensure that each GROUP BY can access an index from the left side of the index
• Use summary or statistic tables to eliminate the use of COUNT(*) expressions in tag clouding
• Get rid of GROUP BY and HAVING COUNT(...) by using standard join techniques
• Get rid of NOT IN expressions via a standard outer join
• Use derived tables with an internal LIMIT expression to prevent wild relation queries from breaking scalability
Folksonomy Concepts in SQL
Here we see tagging and folksonomy together: the user dimension
Folksonomy Adds The User Dimension
• Adding the user dimension to our schema
• The tag is the relationship glue between the user and item dimensions

    CREATE TABLE UserTagPost (
      user_id INT UNSIGNED NOT NULL,
      tag_id INT UNSIGNED NOT NULL,
      post_id INT UNSIGNED NOT NULL,
      PRIMARY KEY (user_id, tag_id, post_id),
      INDEX (tag_id),
      INDEX (post_id)
    ) ENGINE=InnoDB;
Who Shares My Interest Directly?
• Find out the users who have linked to the same item I have
• Direct link: we don’t go through the tag glue

    SELECT user_id FROM UserTagPost
    WHERE post_id = @my_post_id;
Who Shares My Interests Indirectly?
• Find out the users who have similar tag sets
• But how much matching do we want to do? In other words, what radius do we want to match on?
• The first step is to find my tags that are within the search radius; this yields my “top” or most popular tags

    SELECT tag_id FROM UserTagPost
    WHERE user_id = @my_user_id
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= @radius;
Who Shares My Interests (cont’d)
• Now that we have our “top” tag set, we want to find users who match all of our top tags

    SELECT others.user_id
    FROM UserTagPost others
    INNER JOIN (
      SELECT tag_id FROM UserTagPost
      WHERE user_id = @my_user_id
      GROUP BY tag_id
      HAVING COUNT(tag_id) >= @radius
    ) AS my_tags
    ON others.tag_id = my_tags.tag_id
    GROUP BY others.user_id;
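As transcribed, the query on this slide appears to return every user who shares at least one of my top tags, myself included. The sketch below contrasts that with a stricter variant that keeps only users covering all of them. Everything here is an assumption for illustration: sqlite3 stands in for MySQL, the rows are invented, the user_id <> :me filter and the HAVING COUNT(DISTINCT ...) clause are additions, and the stricter variant reintroduces an aggregate that the deck otherwise tries to avoid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
conn.executescript("""
CREATE TABLE UserTagPost (user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
                          post_id INTEGER NOT NULL,
                          PRIMARY KEY (user_id, tag_id, post_id));
CREATE INDEX ix_UserTagPost_tag ON UserTagPost (tag_id);
INSERT INTO UserTagPost (user_id, tag_id, post_id) VALUES
  (1, 10, 1), (1, 10, 2), (1, 20, 1), (1, 20, 3), (1, 30, 4),  -- user 1 is "me"
  (2, 10, 5), (2, 20, 6),                                      -- shares tags 10 and 20
  (3, 10, 5), (3, 10, 7),                                      -- shares tag 10 only
  (4, 30, 4);                                                  -- shares neither top tag
""")

# Step one from the previous slide: my tags used at least :radius times.
TOP_TAGS_SQL = """
SELECT tag_id FROM UserTagPost
WHERE user_id = :me
GROUP BY tag_id
HAVING COUNT(tag_id) >= :radius
"""

# Users sharing at least one top tag; the WHERE clause (an addition) skips myself.
SHARE_ANY_SQL = """
SELECT others.user_id
FROM UserTagPost others
INNER JOIN (""" + TOP_TAGS_SQL + """) AS my_tags
  ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
GROUP BY others.user_id
"""

# Stricter variant: keep only users who cover every one of my top tags.
SHARE_ALL_SQL = SHARE_ANY_SQL + "HAVING COUNT(DISTINCT others.tag_id) = :n_top\n"

params = {"me": 1, "radius": 2}
top_tags = sorted(r[0] for r in conn.execute(TOP_TAGS_SQL, params))
share_any = sorted(r[0] for r in conn.execute(SHARE_ANY_SQL, params))
share_all = sorted(r[0] for r in conn.execute(
    SHARE_ALL_SQL, dict(params, n_top=len(top_tags))))
```

With a radius of 2, my top tags are 10 and 20. Users 2 and 3 each share at least one of them, but only user 2 has used both.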
Ranking Ecosystem Matches
• What about finding our “closest” ecosystem matches?
• We can “rank” other users based on whether they have tagged items a number of times similar to ourselves

    SELECT others.user_id, (COUNT(*) - my_tags.num_tags) AS rank
    FROM UserTagPost others
    INNER JOIN (
      SELECT tag_id, COUNT(*) AS num_tags
      FROM UserTagPost
      WHERE user_id = @my_user_id
      GROUP BY tag_id
      HAVING COUNT(tag_id) >= @radius
    ) AS my_tags
    ON others.tag_id = my_tags.tag_id
    GROUP BY others.user_id
    ORDER BY rank DESC;
Ranking Ecosystem Matches Efficiently
• But we’ve still got our COUNT(*) problem
• How about another summary table?

    CREATE TABLE UserTagStat (
      user_id INT UNSIGNED NOT NULL,
      tag_id INT UNSIGNED NOT NULL,
      num_posts INT UNSIGNED NOT NULL,
      PRIMARY KEY (user_id, tag_id),
      INDEX (tag_id)
    ) ENGINE=InnoDB;
Ranking Ecosystem Matches Efficiently 2
• Hey, we’ve eliminated the aggregation!

    SELECT others.user_id, (others.num_posts - my_tags.num_posts) AS rank
    FROM UserTagStat others
    INNER JOIN (
      SELECT tag_id, num_posts
      FROM UserTagStat
      WHERE user_id = @my_user_id
      AND num_posts >= @radius
    ) AS my_tags
    ON others.tag_id = my_tags.tag_id
    ORDER BY rank DESC;
Scaling Out Sensibly
MySQL Replication (Scale Out)
[Diagram: web/app servers behind a load balancer; writes and reads go to the master MySQL server, reads go to the slave MySQL servers, which the master feeds via replication]
• Write to one master
• Read from many slaves
• Excellent for read-intensive apps
13Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 13The Worldrsquos Most Popular Open Source Database
Tagging Concepts in SQL
14Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 14The Worldrsquos Most Popular Open Source Database
The Tags Table
bull The ldquoTagsrdquo table is the foundational table upon which all tag links are built
bull Lean and meanbull Make primary key an INT
CREATE TABLE Tags (tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT tag_text VARCHAR(50) NOT NULL PRIMARY KEY pk_Tags (tag_id) UNIQUE INDEX uix_TagText (tag_text)) ENGINE=InnoDB
15Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 15The Worldrsquos Most Popular Open Source Database
Example Tag2Post Mapping Table
bull The mapping table creates the link between a tag and anything else
bull In other terms it maps a many-to-many relationshipbull Important to index from both ldquosidesrdquo
CREATE TABLE Tag2Post (tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY pk_Tag2Post (tag_id post_id) INDEX (post_id)) ENGINE=InnoDB
16Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 16The Worldrsquos Most Popular Open Source Database
The Tag Cloud
bull Tag density typically represented by larger fonts or different colors
SELECT tag_text COUNT() as num_tagsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idGROUP BY tag_text
17Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 17The Worldrsquos Most Popular Open Source Database
Efficiency Issues with the Tag Cloudbull With InnoDB tables you dont want to be issuing
COUNT() queries even on an indexed fieldCREATE TABLE TagStattag_id INT UNSIGNED NOT NULL num_posts INT UNSIGNED NOT NULL num_xxx INT UNSIGNED NOT NULL PRIMARY KEY (tag_id)) ENGINE=InnoDB
SELECT tag_text tsnum_postsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idINNER JOIN TagStat tsON ttag_id = tstag_idGROUP BY tag_text
18Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 18The Worldrsquos Most Popular Open Source Database
The Typical Related Items Query
bull Get all posts tagged with any tag attached to Post 6bull In other words ldquoGet me all posts related to post 6rdquo
SELECT p2post_id FROM Tag2Post p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idWHERE p1post_id = 6GROUP BY p2post_id
19Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 19The Worldrsquos Most Popular Open Source Database
Problems With Related Items Query
bull Joining small to medium sized ldquotag setsrdquo works greatbull But when youve got a large tag set on either ldquosiderdquo of
the join problems can occur with scalablitybull One way to solve is via derived tables
SELECT p2post_id FROM (SELECT tag_id FROM Tag2PostWHERE post_id = 6 LIMIT 10) AS p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idGROUP BY p2post_id LIMIT 10
20Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 20The Worldrsquos Most Popular Open Source Database
The Typical Related Tags Querybull Get all tags related to a particular tag via an itembull The ldquoreverserdquo of the related items query we want a set
of related tags not related posts
SELECT t2p2tag_id t2tag_text FROM (SELECT post_id FROM Tags t1INNER JOIN Tag2Post ON t1tag_id = Tag2Posttag_idWHERE t1tag_text = beach LIMIT 10) AS t2p1INNER JOIN Tag2Post t2p2ON t2p1post_id = t2p2post_idINNER JOIN Tags t2ON t2p2tag_id = t2tag_idGROUP BY t2p2tag_id LIMIT 10
21Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 21The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag
bull What if we want only items related to each of a set of tags
bull Here is the typical way of dealing with this problem
SELECT t2ppost_idFROM Tags t1 INNER JOIN Tag2Post t2pON t1tag_id = t2ptag_idWHERE t1tag_text IN (beachcloud)GROUP BY t2ppost_id HAVING COUNT(DISTINCT t2ptag_id) = 2
22Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 22The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag (contd)
bull The GROUP BY and the HAVING COUNT(DISTINCT ) can be eliminated through joins
bull Thus you eliminate the Using temporary using filesort in the query execution
23Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 23The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset
bull Here we want have a search query like ldquoGive me all posts tagged with ldquobeachrdquo and ldquocloudrdquo but not tagged with ldquoflowerrdquo Typical solution you will see
SELECT t2p1post_idFROM Tag2Post t2p1INNER JOIN Tags t1 ON t2p1tag_id = t1tag_idWHERE t1tag_text IN (beachcloud)AND t2p1post_id NOT IN(SELECT post_id FROM Tag2Post t2p2INNER JOIN Tags t2 ON t2p2tag_id = t2tag_idWHERE t2tag_text = flower)GROUP BY t2p1post_id HAVING COUNT(DISTINCT t2p1tag_id) = 2
24Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 24The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset (contd)bull More efficient to use an outer join to filter out the
ldquominusrdquo operator plus get rid of the GROUP BY
25Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 25The Worldrsquos Most Popular Open Source Database
Summary of SQL Tips for Tagging
bull Index fields in mapping tables properly Ensure that each GROUP BY can access an index from the left side of the index
bull Use summary or statistic tables to eliminate the use of COUNT() expressions in tag clouding
bull Get rid of GROUP BY and HAVING COUNT() by using standard join techniques
bull Get rid of NOT IN expressions via a standard outer joinbull Use derived tables with an internal LIMIT expression to
prevent wild relation queries from breaking scalability
26Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 26The Worldrsquos Most Popular Open Source Database
Folksonomy Concepts in SQL
27Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 27The Worldrsquos Most Popular Open Source Database
Here we see tagging and folksonomy
together the user dimension
Folksonomy Adds The User Dimension
• Adding the user dimension to our schema
• The tag is the relationship glue between the user and item dimensions

CREATE TABLE UserTagPost (
  user_id INT UNSIGNED NOT NULL,
  tag_id INT UNSIGNED NOT NULL,
  post_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, tag_id, post_id),
  INDEX (tag_id),
  INDEX (post_id)
) ENGINE=InnoDB;
Who Shares My Interest Directly?
• Find out the users who have linked to the same item I have
• Direct link: we don’t go through the tag glue

SELECT user_id FROM UserTagPost
WHERE post_id = @my_post_id;
Who Shares My Interests Indirectly?
• Find out the users who have similar tag sets
• But how much matching do we want to do? In other words, what radius do we want to match on?
• The first step is to find my tags that are within the search radius; this yields my “top” or most popular tags

SELECT tag_id FROM UserTagPost
WHERE user_id = @my_user_id
GROUP BY tag_id
HAVING COUNT(tag_id) >= @radius;
Who Shares My Interests (cont’d)
• Now that we have our “top” tag set, we want to find users who match all of our top tags

SELECT others.user_id
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id FROM UserTagPost
  WHERE user_id = @my_user_id
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= @radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id;
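A note on “match all”: as transcribed, the query above groups by user without a HAVING clause, so it returns every user who shares at least one top tag. The sketch below, an assumption on my part rather than something from the slides, shows one way to tighten it to users who share every top tag by comparing the number of distinct matched tags with the size of the top tag set. It uses Python's sqlite3 with invented rows; the :me and :radius parameters play the role of the user id and radius in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
  user_id INT NOT NULL, tag_id INT NOT NULL, post_id INT NOT NULL,
  PRIMARY KEY (user_id, tag_id, post_id));
INSERT INTO UserTagPost VALUES
  (1, 1, 10), (1, 1, 11), (1, 2, 10), (1, 2, 12), (1, 3, 13),  -- me (user 1)
  (2, 1, 20), (2, 2, 21),   -- shares both of my top tags
  (3, 1, 30),               -- shares one of them
  (4, 3, 40);               -- shares none of them
""")

# Tags I have used at least :radius times (the "top" tag set).
MY_TOP_TAGS = """
SELECT tag_id FROM UserTagPost
WHERE user_id = :me
GROUP BY tag_id
HAVING COUNT(tag_id) >= :radius
"""

# Users sharing at least one top tag (the shape of the slide's query).
MATCH_ANY = f"""
SELECT others.user_id
FROM UserTagPost others
INNER JOIN ({MY_TOP_TAGS}) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
GROUP BY others.user_id
"""

# Users sharing every top tag: distinct matched tags must equal the set size.
MATCH_ALL = MATCH_ANY + f"""
HAVING COUNT(DISTINCT others.tag_id) =
       (SELECT COUNT(*) FROM ({MY_TOP_TAGS}) AS t)
"""

params = {"me": 1, "radius": 2}
match_any = sorted(r[0] for r in conn.execute(MATCH_ANY, params))
match_all = sorted(r[0] for r in conn.execute(MATCH_ALL, params))
print(match_any, match_all)
```

With these rows, users 2 and 3 share at least one top tag, while only user 2 shares both. The extra HAVING brings back an aggregate, so on a large table it is worth weighing against the summary-table approach.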
Ranking Ecosystem Matches
• What about finding our “closest” ecosystem matches?
• We can “rank” other users based on whether they have tagged items a number of times similar to ourselves

SELECT others.user_id, (COUNT(*) - my_tags.num_tags) AS rank
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id, COUNT(*) AS num_tags
  FROM UserTagPost
  WHERE user_id = @my_user_id
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= @radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
ORDER BY rank DESC;
Ranking Ecosystem Matches Efficiently
• But we’ve still got our COUNT(*) problem
• How about another summary table?

CREATE TABLE UserTagStat (
  user_id INT UNSIGNED NOT NULL,
  tag_id INT UNSIGNED NOT NULL,
  num_posts INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, tag_id),
  INDEX (tag_id)
) ENGINE=InnoDB;
Ranking Ecosystem Matches Efficiently 2
• Hey, we’ve eliminated the aggregation!

SELECT others.user_id, (others.num_posts - my_tags.num_posts) AS rank
FROM UserTagStat others
INNER JOIN (
  SELECT tag_id, num_posts
  FROM UserTagStat
  WHERE user_id = @my_user_id
  AND num_posts >= @radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
ORDER BY rank DESC;
Scaling Out Sensibly
MySQL Replication (Scale Out)
[Diagram: web/app servers behind a load balancer. Writes & reads go to the master MySQL server; reads go to the slave MySQL servers, which are fed from the master by replication.]
• Write to one master
• Read from many slaves
• Excellent for read intensive apps
Scale Out Using Replication
• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that’s scaling out)
• Slave can provide hot standby as well as backup server
Scale Out Strategies
• Slave storage engine can differ from Master!
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: “this week’s popular tags”
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week
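As a rough illustration of the “this week’s popular tags” batch, here is a minimal sketch of what such a cron-driven job might run on a slave. Several things in it are assumptions that do not appear in the slides: the WeeklyPopularTags table, the created_at column on Tag2Post, and the refresh_weekly_popular_tags function name. SQLite (via Python's sqlite3) stands in for MySQL so the sketch is runnable on its own.

```python
import sqlite3

def refresh_weekly_popular_tags(conn, week_start):
    """Rebuild the summary table from tag links created on or after week_start."""
    with conn:  # run the wipe and the rebuild as one transaction
        # The weekly "truncate" step from the slide.
        conn.execute("DELETE FROM WeeklyPopularTags")
        # The aggregation runs once per batch instead of once per page view.
        conn.execute(
            """
            INSERT INTO WeeklyPopularTags (tag_id, num_posts)
            SELECT tag_id, COUNT(*)
            FROM Tag2Post
            WHERE created_at >= ?
            GROUP BY tag_id
            """,
            (week_start,),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag2Post (
  tag_id INT NOT NULL, post_id INT NOT NULL,
  created_at TEXT NOT NULL,   -- hypothetical column, not in the slides
  PRIMARY KEY (tag_id, post_id));
CREATE TABLE WeeklyPopularTags (   -- hypothetical summary table
  tag_id INT PRIMARY KEY, num_posts INT NOT NULL);
INSERT INTO Tag2Post VALUES
  (1, 10, '2006-05-01'), (1, 11, '2006-05-09'), (1, 12, '2006-05-10'),
  (2, 10, '2006-05-09'), (3, 13, '2006-04-20');
""")

refresh_weekly_popular_tags(conn, "2006-05-08")
popular = conn.execute(
    "SELECT tag_id, num_posts FROM WeeklyPopularTags "
    "ORDER BY num_posts DESC, tag_id"
).fetchall()
print(popular)
```

Page requests then read the small WeeklyPopularTags table directly, and the GROUP BY cost is paid only when the job runs.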
Scale Out Strategies (cont’d)
• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement
Questions?
MySQL Forge: http://forge.mysql.com
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org
Jay Pipes (jay@mysql.com)
MySQL AB
The Tags Table
• The “Tags” table is the foundational table upon which all tag links are built
• Lean and mean
• Make primary key an INT!

CREATE TABLE Tags (
  tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  tag_text VARCHAR(50) NOT NULL,
  PRIMARY KEY pk_Tags (tag_id),
  UNIQUE INDEX uix_TagText (tag_text)
) ENGINE=InnoDB;
Example Tag2Post Mapping Table
• The mapping table creates the link between a tag and anything else
• In other terms, it maps a many-to-many relationship
• Important to index from both “sides”

CREATE TABLE Tag2Post (
  tag_id INT UNSIGNED NOT NULL,
  post_id INT UNSIGNED NOT NULL,
  PRIMARY KEY pk_Tag2Post (tag_id, post_id),
  INDEX (post_id)
) ENGINE=InnoDB;
The Tag Cloud
• Tag density typically represented by larger fonts or different colors

SELECT tag_text, COUNT(*) AS num_tags
FROM Tag2Post t2p
INNER JOIN Tags t
ON t2p.tag_id = t.tag_id
GROUP BY tag_text;
Efficiency Issues with the Tag Cloud
• With InnoDB tables, you don’t want to be issuing COUNT(*) queries, even on an indexed field

CREATE TABLE TagStat (
  tag_id INT UNSIGNED NOT NULL,
  num_posts INT UNSIGNED NOT NULL,
  num_xxx INT UNSIGNED NOT NULL,
  PRIMARY KEY (tag_id)
) ENGINE=InnoDB;

SELECT tag_text, ts.num_posts
FROM Tag2Post t2p
INNER JOIN Tags t
ON t2p.tag_id = t.tag_id
INNER JOIN TagStat ts
ON t.tag_id = ts.tag_id
GROUP BY tag_text;
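The slides do not show how TagStat is kept current, so the following is only a sketch of one possible approach: bump the counter in the same transaction that creates the tag link, then read the cloud from Tags and TagStat alone. The tag_post helper is a hypothetical name, the num_xxx placeholder column is left out, and SQLite (via Python's sqlite3) stands in for MySQL, where INSERT ... ON DUPLICATE KEY UPDATE could play the role of the two counter statements.

```python
import sqlite3

def tag_post(conn, tag_id, post_id):
    """Link a tag to a post and bump the tag's counter in one transaction."""
    with conn:
        already_linked = conn.execute(
            "SELECT 1 FROM Tag2Post WHERE tag_id = ? AND post_id = ?",
            (tag_id, post_id),
        ).fetchone()
        if already_linked:
            return False  # nothing new to count
        conn.execute(
            "INSERT INTO Tag2Post (tag_id, post_id) VALUES (?, ?)",
            (tag_id, post_id),
        )
        # Make sure a counter row exists, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO TagStat (tag_id, num_posts) VALUES (?, 0)",
            (tag_id,),
        )
        conn.execute(
            "UPDATE TagStat SET num_posts = num_posts + 1 WHERE tag_id = ?",
            (tag_id,),
        )
        return True

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
CREATE TABLE Tag2Post (
  tag_id INT NOT NULL, post_id INT NOT NULL,
  PRIMARY KEY (tag_id, post_id));
CREATE TABLE TagStat (tag_id INTEGER PRIMARY KEY, num_posts INT NOT NULL);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud');
""")

for tag_id, post_id in [(1, 10), (1, 11), (1, 11), (2, 10)]:  # one repeat
    tag_post(conn, tag_id, post_id)

# The cloud is read from the two small tables; Tag2Post is not scanned.
cloud = conn.execute(
    "SELECT t.tag_text, ts.num_posts "
    "FROM Tags t INNER JOIN TagStat ts ON ts.tag_id = t.tag_id "
    "ORDER BY t.tag_text"
).fetchall()
print(cloud)
```

The trade-off is a slightly heavier write path in exchange for a tag cloud query that needs no COUNT(*) and no GROUP BY.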
The Typical Related Items Query
• Get all posts tagged with any tag attached to Post 6
• In other words, “Get me all posts related to post 6”

SELECT p2.post_id
FROM Tag2Post p1
INNER JOIN Tag2Post p2
ON p1.tag_id = p2.tag_id
WHERE p1.post_id = 6
GROUP BY p2.post_id;
Problems With Related Items Query
• Joining small to medium sized “tag sets” works great
• But when you’ve got a large tag set on either “side” of the join, problems can occur with scalability
• One way to solve this is via derived tables

SELECT p2.post_id
FROM (
  SELECT tag_id FROM Tag2Post
  WHERE post_id = 6 LIMIT 10
) AS p1
INNER JOIN Tag2Post p2
ON p1.tag_id = p2.tag_id
GROUP BY p2.post_id LIMIT 10;
The Typical Related Tags Query
• Get all tags related to a particular tag via an item
• The “reverse” of the related items query; we want a set of related tags, not related posts

SELECT t2p2.tag_id, t2.tag_text
FROM (
  SELECT post_id FROM Tags t1
  INNER JOIN Tag2Post ON t1.tag_id = Tag2Post.tag_id
  WHERE t1.tag_text = 'beach' LIMIT 10
) AS t2p1
INNER JOIN Tag2Post t2p2
ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2
ON t2p2.tag_id = t2.tag_id
GROUP BY t2p2.tag_id LIMIT 10;
Dealing With More Than One Tag
• What if we want only items related to each of a set of tags?
• Here is the typical way of dealing with this problem

SELECT t2p.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p
ON t1.tag_id = t2p.tag_id
WHERE t1.tag_text IN ('beach', 'cloud')
GROUP BY t2p.post_id
HAVING COUNT(DISTINCT t2p.tag_id) = 2;
Dealing With More Than One Tag (cont’d)
• The GROUP BY and the HAVING COUNT(DISTINCT ...) can be eliminated through joins
• Thus, you eliminate the “Using temporary; Using filesort” steps from the query execution plan
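The join-based query from this slide is missing from the transcript. As a sketch of the idea only (not necessarily the presenter's SQL), each required tag can get its own join against Tag2Post, so a post survives only if a row exists for every tag. The example runs on in-memory SQLite through Python's sqlite3 with invented rows, and checks the result against the GROUP BY / HAVING form from the previous slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
CREATE TABLE Tag2Post (
  tag_id INT NOT NULL, post_id INT NOT NULL,
  PRIMARY KEY (tag_id, post_id));
CREATE INDEX ix_Tag2Post_post ON Tag2Post (post_id);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
INSERT INTO Tag2Post VALUES
  (1, 10), (2, 10),            -- post 10: beach, cloud
  (1, 11), (2, 11), (3, 11),   -- post 11: beach, cloud, flower
  (1, 12);                     -- post 12: beach only
""")

# Typical form from the previous slide: aggregate, then filter on the count.
GROUPED = """
SELECT t2p.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p ON t1.tag_id = t2p.tag_id
WHERE t1.tag_text IN ('beach', 'cloud')
GROUP BY t2p.post_id
HAVING COUNT(DISTINCT t2p.tag_id) = 2
ORDER BY t2p.post_id
"""

# Join form: one Tags/Tag2Post pair per required tag, no aggregation.
JOINED = """
SELECT b.post_id
FROM Tags tb
INNER JOIN Tag2Post b ON b.tag_id = tb.tag_id
INNER JOIN Tag2Post c ON c.post_id = b.post_id
INNER JOIN Tags tc ON tc.tag_id = c.tag_id
WHERE tb.tag_text = 'beach'
  AND tc.tag_text = 'cloud'
ORDER BY b.post_id
"""

grouped = [row[0] for row in conn.execute(GROUPED)]
joined = [row[0] for row in conn.execute(JOINED)]
print(grouped, joined)
```

Both queries return posts 10 and 11 for the sample rows. The join form grows by one join pair per extra tag, so it suits searches over a small, known number of tags.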
16Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 16The Worldrsquos Most Popular Open Source Database
The Tag Cloud
bull Tag density typically represented by larger fonts or different colors
SELECT tag_text COUNT() as num_tagsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idGROUP BY tag_text
17Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 17The Worldrsquos Most Popular Open Source Database
Efficiency Issues with the Tag Cloudbull With InnoDB tables you dont want to be issuing
COUNT() queries even on an indexed fieldCREATE TABLE TagStattag_id INT UNSIGNED NOT NULL num_posts INT UNSIGNED NOT NULL num_xxx INT UNSIGNED NOT NULL PRIMARY KEY (tag_id)) ENGINE=InnoDB
SELECT tag_text tsnum_postsFROM Tag2Post t2pINNER JOIN Tags tON t2ptag_id = ttag_idINNER JOIN TagStat tsON ttag_id = tstag_idGROUP BY tag_text
18Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 18The Worldrsquos Most Popular Open Source Database
The Typical Related Items Query
bull Get all posts tagged with any tag attached to Post 6bull In other words ldquoGet me all posts related to post 6rdquo
SELECT p2post_id FROM Tag2Post p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idWHERE p1post_id = 6GROUP BY p2post_id
19Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 19The Worldrsquos Most Popular Open Source Database
Problems With Related Items Query
bull Joining small to medium sized ldquotag setsrdquo works greatbull But when youve got a large tag set on either ldquosiderdquo of
the join problems can occur with scalablitybull One way to solve is via derived tables
SELECT p2post_id FROM (SELECT tag_id FROM Tag2PostWHERE post_id = 6 LIMIT 10) AS p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idGROUP BY p2post_id LIMIT 10
20Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 20The Worldrsquos Most Popular Open Source Database
The Typical Related Tags Querybull Get all tags related to a particular tag via an itembull The ldquoreverserdquo of the related items query we want a set
of related tags not related posts
SELECT t2p2tag_id t2tag_text FROM (SELECT post_id FROM Tags t1INNER JOIN Tag2Post ON t1tag_id = Tag2Posttag_idWHERE t1tag_text = beach LIMIT 10) AS t2p1INNER JOIN Tag2Post t2p2ON t2p1post_id = t2p2post_idINNER JOIN Tags t2ON t2p2tag_id = t2tag_idGROUP BY t2p2tag_id LIMIT 10
21Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 21The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag
bull What if we want only items related to each of a set of tags
bull Here is the typical way of dealing with this problem
SELECT t2ppost_idFROM Tags t1 INNER JOIN Tag2Post t2pON t1tag_id = t2ptag_idWHERE t1tag_text IN (beachcloud)GROUP BY t2ppost_id HAVING COUNT(DISTINCT t2ptag_id) = 2

Dealing With More Than One Tag (cont'd)
• The GROUP BY and the HAVING COUNT(DISTINCT ...) can be eliminated through joins
• Thus, you eliminate the “Using temporary; Using filesort” in the query execution plan
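
The join-based form is not shown in the transcript, so the following is a hedged sketch of what such a rewrite could look like, not the presenter's exact query. It assumes the Tags and Tag2Post tables used earlier, runs on Python's built-in sqlite3 as a stand-in for MySQL, and uses invented sample rows. The idea: join Tag2Post to itself once per required tag, so a post only survives if it has a row for every tag.

```python
import sqlite3

# In-memory SQLite stand-in for the deck's MySQL schema (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
CREATE TABLE Tag2Post (
  tag_id INTEGER NOT NULL,
  post_id INTEGER NOT NULL,
  PRIMARY KEY (tag_id, post_id)
);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
-- Post 10: beach + cloud; post 11: beach only; post 12: all three tags.
INSERT INTO Tag2Post VALUES (1, 10), (2, 10), (1, 11), (1, 12), (2, 12), (3, 12);
""")

# One Tags/Tag2Post pair per required tag; no GROUP BY or HAVING needed.
JOIN_QUERY = """
SELECT t2p1.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id
INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
WHERE t1.tag_text = 'beach'
  AND t2.tag_text = 'cloud'
"""

posts_with_both = sorted(row[0] for row in conn.execute(JOIN_QUERY))
print(posts_with_both)
```

Because (tag_id, post_id) is the primary key, each qualifying post appears exactly once, which is why no DISTINCT or GROUP BY is required. The cost is one extra pair of joins per additional tag.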

Excluding Tags From A Resultset
• Here, we want a search query like “Give me all posts tagged with ‘beach’ and ‘cloud’ but not tagged with ‘flower’”. Typical solution you will see:
SELECT t2p1.post_id
FROM Tag2Post t2p1
INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id
WHERE t1.tag_text IN ('beach','cloud')
AND t2p1.post_id NOT IN (
  SELECT post_id FROM Tag2Post t2p2
  INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
  WHERE t2.tag_text = 'flower'
)
GROUP BY t2p1.post_id
HAVING COUNT(DISTINCT t2p1.tag_id) = 2;

Excluding Tags From A Resultset (cont'd)
• More efficient to use an outer join to filter out the “minus” operator, plus get rid of the GROUP BY
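
The outer-join form is also missing from the transcript. A minimal sketch of one way to write it follows, again using Python's sqlite3 as a stand-in for MySQL with invented sample rows; the alias `flowered` is hypothetical. The excluded tag is LEFT JOINed in, and only rows where that join found nothing (IS NULL) are kept.

```python
import sqlite3

# In-memory SQLite stand-in for the deck's MySQL schema (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tag_id INTEGER PRIMARY KEY, tag_text TEXT NOT NULL UNIQUE);
CREATE TABLE Tag2Post (
  tag_id INTEGER NOT NULL,
  post_id INTEGER NOT NULL,
  PRIMARY KEY (tag_id, post_id)
);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
-- Post 10: beach + cloud; post 11: beach only; post 12: all three tags.
INSERT INTO Tag2Post VALUES (1, 10), (2, 10), (1, 11), (1, 12), (2, 12), (3, 12);
""")

# Inner joins require 'beach' and 'cloud'; the LEFT JOIN plus IS NULL
# removes every post that also carries 'flower'.
EXCLUDE_QUERY = """
SELECT t2p1.post_id
FROM Tags t1
INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id
INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id
INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
LEFT JOIN (
  SELECT t2p3.post_id
  FROM Tags t3
  INNER JOIN Tag2Post t2p3 ON t3.tag_id = t2p3.tag_id
  WHERE t3.tag_text = 'flower'
) AS flowered ON flowered.post_id = t2p1.post_id
WHERE t1.tag_text = 'beach'
  AND t2.tag_text = 'cloud'
  AND flowered.post_id IS NULL
"""

posts = sorted(row[0] for row in conn.execute(EXCLUDE_QUERY))
print(posts)
```

Post 12 has both required tags but also “flower”, so the LEFT JOIN finds a match and the IS NULL filter drops it.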

Summary of SQL Tips for Tagging
• Index fields in mapping tables properly. Ensure that each GROUP BY can access an index from the left side of the index
• Use summary or statistic tables to eliminate the use of COUNT(*) expressions in tag clouding
• Get rid of GROUP BY and HAVING COUNT(...) by using standard join techniques
• Get rid of NOT IN expressions via a standard outer join
• Use derived tables with an internal LIMIT expression to prevent wild relation queries from breaking scalability
Folksonomy Concepts in SQL

Here we see tagging and folksonomy together: the user dimension

Folksonomy Adds The User Dimension
• Adding the user dimension to our schema
• The tag is the relationship glue between the user and item dimensions
CREATE TABLE UserTagPost (
  user_id INT UNSIGNED NOT NULL,
  tag_id INT UNSIGNED NOT NULL,
  post_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, tag_id, post_id),
  INDEX (tag_id),
  INDEX (post_id)
) ENGINE=InnoDB;

Who Shares My Interest Directly?
• Find out the users who have linked to the same item I have
• Direct link; we don't go through the tag glue
SELECT user_id FROM UserTagPost
WHERE post_id = my_post_id;

Who Shares My Interests Indirectly?
• Find out the users who have similar tag sets
• But how much matching do we want to do? In other words, what radius do we want to match on?
• The first step is to find my tags that are within the search radius; this yields my “top” or most popular tags
SELECT tag_id FROM UserTagPost
WHERE user_id = my_user_id
GROUP BY tag_id
HAVING COUNT(tag_id) >= radius;

Who Shares My Interests (cont'd)
• Now that we have our “top” tag set, we want to find users who match all of our top tags
SELECT others.user_id
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id FROM UserTagPost
  WHERE user_id = my_user_id
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= radius
) AS my_tags
  ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id;
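
As written, the join keeps any user who shares at least one of the top tags. If a strict match on all top tags is wanted, one possible variant (an assumption of this sketch, not something the slides show) compares each user's count of matched tags with the size of the top tag set. The example runs on Python's sqlite3 as a stand-in for MySQL, with invented rows; `:me` and `:radius` are hypothetical bound parameters replacing `my_user_id` and `radius`, and the `<>` filter that removes the searching user is also an addition.

```python
import sqlite3

# In-memory SQLite stand-in for the deck's MySQL schema (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
  user_id INTEGER NOT NULL,
  tag_id INTEGER NOT NULL,
  post_id INTEGER NOT NULL,
  PRIMARY KEY (user_id, tag_id, post_id)
);
-- User 1 ("me") uses tag 1 twice, tag 2 twice and tag 3 once.
INSERT INTO UserTagPost VALUES (1,1,10), (1,1,11), (1,2,10), (1,2,12), (1,3,13);
-- User 2 shares tags 1 and 2; user 3 shares tags 1 and 3.
INSERT INTO UserTagPost VALUES (2,1,20), (2,2,21), (3,1,30), (3,3,31);
""")

# Keep only users whose matched tags cover the whole "top" tag set.
MATCH_ALL = """
SELECT others.user_id
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id FROM UserTagPost
  WHERE user_id = :me
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= :radius
) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
GROUP BY others.user_id
HAVING COUNT(DISTINCT others.tag_id) = (
  SELECT COUNT(*) FROM (
    SELECT tag_id FROM UserTagPost
    WHERE user_id = :me
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= :radius
  )
)
"""

params = {"me": 1, "radius": 2}
matching_users = [row[0] for row in conn.execute(MATCH_ALL, params)]
print(matching_users)
```

With a radius of 2, the top tag set is tags 1 and 2. User 2 has both and is returned; user 3 has only tag 1 from that set and is filtered out.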

Ranking Ecosystem Matches
• What about finding our “closest” ecosystem matches?
• We can “rank” other users based on whether they have tagged items a number of times similar to ourselves
SELECT others.user_id, (COUNT(*) - my_tags.num_tags) AS rank
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id, COUNT(*) AS num_tags
  FROM UserTagPost
  WHERE user_id = my_user_id
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= radius
) AS my_tags
  ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
ORDER BY rank DESC;

Ranking Ecosystem Matches Efficiently
• But we've still got our COUNT(*) problem
• How about another summary table?
CREATE TABLE UserTagStat (
  user_id INT UNSIGNED NOT NULL,
  tag_id INT UNSIGNED NOT NULL,
  num_posts INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, tag_id),
  INDEX (tag_id)
) ENGINE=InnoDB;

Ranking Ecosystem Matches Efficiently 2
• Hey, we've eliminated the aggregation!
SELECT others.user_id, (others.num_posts - my_tags.num_posts) AS rank
FROM UserTagStat others
INNER JOIN (
  SELECT tag_id, num_posts
  FROM UserTagStat
  WHERE user_id = my_user_id
  AND num_posts >= radius
) AS my_tags
  ON others.tag_id = my_tags.tag_id
ORDER BY rank DESC;
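
A summary table only helps if something keeps it current. The slides do not say how UserTagStat is maintained, so the following is a sketch under assumptions: the counter is bumped in the same transaction as each tagging event, the helper `tag_post` is hypothetical, and Python's sqlite3 stands in for MySQL. The alias `match_rank` is used instead of `rank` because RANK is a reserved word in newer MySQL versions, and the filter excluding the searching user is an addition.

```python
import sqlite3

# In-memory SQLite stand-in for the deck's MySQL schema (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
  user_id INTEGER NOT NULL,
  tag_id INTEGER NOT NULL,
  post_id INTEGER NOT NULL,
  PRIMARY KEY (user_id, tag_id, post_id)
);
CREATE TABLE UserTagStat (
  user_id INTEGER NOT NULL,
  tag_id INTEGER NOT NULL,
  num_posts INTEGER NOT NULL,
  PRIMARY KEY (user_id, tag_id)
);
""")


def tag_post(user_id, tag_id, post_id):
    """Record one tagging event and bump the per-user, per-tag counter."""
    with conn:  # one transaction keeps the detail and summary rows in step
        conn.execute(
            "INSERT INTO UserTagPost VALUES (?, ?, ?)",
            (user_id, tag_id, post_id),
        )
        # Create the counter row on first use, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO UserTagStat VALUES (?, ?, 0)",
            (user_id, tag_id),
        )
        conn.execute(
            "UPDATE UserTagStat SET num_posts = num_posts + 1"
            " WHERE user_id = ? AND tag_id = ?",
            (user_id, tag_id),
        )


# Invented events: user 1 uses tag 1 twice, user 2 three times, user 3 once.
for event in [(1, 1, 10), (1, 1, 11), (2, 1, 20), (2, 1, 21), (2, 1, 22), (3, 1, 30)]:
    tag_post(*event)

# Ranking read straight from the summary table: no COUNT(*), no GROUP BY.
RANK_QUERY = """
SELECT others.user_id, (others.num_posts - my_tags.num_posts) AS match_rank
FROM UserTagStat others
INNER JOIN (
  SELECT tag_id, num_posts FROM UserTagStat
  WHERE user_id = :me AND num_posts >= :radius
) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
ORDER BY match_rank DESC
"""

ranked = conn.execute(RANK_QUERY, {"me": 1, "radius": 2}).fetchall()
print(ranked)
```

On MySQL the two counter statements could be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE; the two-step form is used here only because it runs on any SQLite version.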
Scaling Out Sensibly

MySQL Replication (Scale Out)
[Diagram: web/app servers behind a load balancer; writes and reads go to the master MySQL server, reads go to the slave MySQL servers, and the master feeds the slaves via replication]
• Write to one master
• Read from many slaves
• Excellent for read intensive apps

Scale Out Using Replication
• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that's scaling out)
• Slave can provide hot standby as well as backup server

Scale Out Strategies
• Slave storage engine can differ from Master
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: “this week's popular tags”
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week

Scale Out Strategies (cont'd)
• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement

Questions?
MySQL Forge: http://forge.mysql.com/
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org
Jay Pipes (jay@mysql.com)
MySQL AB

Efficiency Issues with the Tag Cloud
• With InnoDB tables, you don't want to be issuing COUNT(*) queries, even on an indexed field
CREATE TABLE TagStat (
  tag_id INT UNSIGNED NOT NULL,
  num_posts INT UNSIGNED NOT NULL,
  num_xxx INT UNSIGNED NOT NULL,
  PRIMARY KEY (tag_id)
) ENGINE=InnoDB;
SELECT tag_text, ts.num_posts
FROM Tag2Post t2p
INNER JOIN Tags t
  ON t2p.tag_id = t.tag_id
INNER JOIN TagStat ts
  ON t.tag_id = ts.tag_id
GROUP BY tag_text;
19Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 19The Worldrsquos Most Popular Open Source Database
Problems With Related Items Query
bull Joining small to medium sized ldquotag setsrdquo works greatbull But when youve got a large tag set on either ldquosiderdquo of
the join problems can occur with scalablitybull One way to solve is via derived tables
SELECT p2post_id FROM (SELECT tag_id FROM Tag2PostWHERE post_id = 6 LIMIT 10) AS p1INNER JOIN Tag2Post p2ON p1tag_id = p2tag_idGROUP BY p2post_id LIMIT 10
20Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 20The Worldrsquos Most Popular Open Source Database
The Typical Related Tags Querybull Get all tags related to a particular tag via an itembull The ldquoreverserdquo of the related items query we want a set
of related tags not related posts
SELECT t2p2tag_id t2tag_text FROM (SELECT post_id FROM Tags t1INNER JOIN Tag2Post ON t1tag_id = Tag2Posttag_idWHERE t1tag_text = beach LIMIT 10) AS t2p1INNER JOIN Tag2Post t2p2ON t2p1post_id = t2p2post_idINNER JOIN Tags t2ON t2p2tag_id = t2tag_idGROUP BY t2p2tag_id LIMIT 10
21Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 21The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag
bull What if we want only items related to each of a set of tags
bull Here is the typical way of dealing with this problem
SELECT t2ppost_idFROM Tags t1 INNER JOIN Tag2Post t2pON t1tag_id = t2ptag_idWHERE t1tag_text IN (beachcloud)GROUP BY t2ppost_id HAVING COUNT(DISTINCT t2ptag_id) = 2
22Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 22The Worldrsquos Most Popular Open Source Database
Dealing With More Than One Tag (contd)
bull The GROUP BY and the HAVING COUNT(DISTINCT ) can be eliminated through joins
bull Thus you eliminate the Using temporary using filesort in the query execution
23Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 23The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset
bull Here we want have a search query like ldquoGive me all posts tagged with ldquobeachrdquo and ldquocloudrdquo but not tagged with ldquoflowerrdquo Typical solution you will see
SELECT t2p1post_idFROM Tag2Post t2p1INNER JOIN Tags t1 ON t2p1tag_id = t1tag_idWHERE t1tag_text IN (beachcloud)AND t2p1post_id NOT IN(SELECT post_id FROM Tag2Post t2p2INNER JOIN Tags t2 ON t2p2tag_id = t2tag_idWHERE t2tag_text = flower)GROUP BY t2p1post_id HAVING COUNT(DISTINCT t2p1tag_id) = 2
24Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 24The Worldrsquos Most Popular Open Source Database
Excluding Tags From A Resultset (contd)bull More efficient to use an outer join to filter out the
ldquominusrdquo operator plus get rid of the GROUP BY
25Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 25The Worldrsquos Most Popular Open Source Database
Summary of SQL Tips for Tagging
• Index fields in mapping tables properly. Ensure that each GROUP BY can access an index from the left side of the index
• Use summary or statistic tables to eliminate the use of COUNT(*) expressions in tag clouding
• Get rid of GROUP BY and HAVING COUNT(...) by using standard join techniques
• Get rid of NOT IN expressions via a standard outer join
• Use derived tables with an internal LIMIT expression to prevent wild relation queries from breaking scalability
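The summary-table tip can be sketched as follows. This is an illustration only: the TagStat column layout (tag_id, num_posts) and the tag_post helper are hypothetical, and SQLite stands in for MySQL, where the counter update would more likely be a single INSERT ... ON DUPLICATE KEY UPDATE statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
-- Hypothetical layout for a summary table: one counter row per tag.
CREATE TABLE TagStat (tag_id INTEGER PRIMARY KEY, num_posts INTEGER NOT NULL);
""")

def tag_post(conn, tag_id, post_id):
    """Record a tagging and bump the per-tag counter in one transaction."""
    with conn:
        conn.execute("INSERT INTO Tag2Post (tag_id, post_id) VALUES (?, ?)",
                     (tag_id, post_id))
        updated = conn.execute(
            "UPDATE TagStat SET num_posts = num_posts + 1 WHERE tag_id = ?",
            (tag_id,)).rowcount
        if updated == 0:  # first use of this tag: create its counter row
            conn.execute("INSERT INTO TagStat (tag_id, num_posts) VALUES (?, 1)",
                         (tag_id,))

for tag_id, post_id in [(1, 10), (1, 11), (1, 12), (2, 10), (3, 12)]:
    tag_post(conn, tag_id, post_id)

# The tag cloud reads precomputed counters instead of running COUNT(*).
cloud = conn.execute(
    "SELECT tag_id, num_posts FROM TagStat ORDER BY num_posts DESC, tag_id"
).fetchall()
print(cloud)
```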
Folksonomy Concepts in SQL
Here we see tagging and folksonomy together: the user dimension
Folksonomy Adds The User Dimension
• Adding the user dimension to our schema
• The tag is the relationship glue between the user and item dimensions
CREATE TABLE UserTagPost (
    user_id INT UNSIGNED NOT NULL,
    tag_id INT UNSIGNED NOT NULL,
    post_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, tag_id, post_id),
    INDEX (tag_id),
    INDEX (post_id)
) ENGINE=InnoDB;
Who Shares My Interest Directly?
• Find out the users who have linked to the same item I have
• Direct link; we don’t go through the tag glue
SELECT user_id FROM UserTagPost
WHERE post_id = @my_post_id;
Who Shares My Interests Indirectly?
• Find out the users who have similar tag sets
• But how much matching do we want to do? In other words, what radius do we want to match on?
• The first step is to find my tags that are within the search radius; this yields my “top” or most popular tags
SELECT tag_id FROM UserTagPost
WHERE user_id = @my_user_id
GROUP BY tag_id
HAVING COUNT(tag_id) >= @radius;
Who Shares My Interests (cont’d)
• Now that we have our “top” tag set, we want to find users who match our top tags (as written, the query returns every user sharing at least one of them)
SELECT others.user_id
FROM UserTagPost others
INNER JOIN (
    SELECT tag_id FROM UserTagPost
    WHERE user_id = @my_user_id
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= @radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id;
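The difference between sharing any top tag and sharing all of them can be sketched as below. SQLite stands in for MySQL, the sample rows are invented, and both the exclusion of the searching user and the HAVING variant are additions made for illustration rather than part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
    user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, tag_id, post_id));
-- user 1 ("me"): tag 1 twice, tag 2 twice, tag 3 once
INSERT INTO UserTagPost VALUES (1,1,10), (1,1,11), (1,2,10), (1,2,12), (1,3,13);
-- user 2 uses tags 1 and 2; user 3 only tag 1; user 4 only tag 3
INSERT INTO UserTagPost VALUES (2,1,20), (2,2,21), (3,1,30), (4,3,40);
""")

# My "top" tags: those I have used at least :radius times.
MY_TAGS = """
    SELECT tag_id FROM UserTagPost
    WHERE user_id = :me
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= :radius
"""
params = {"me": 1, "radius": 2}

# Users sharing at least one of my top tags (excluding myself).
any_sql = f"""
SELECT others.user_id
FROM UserTagPost others
INNER JOIN ({MY_TAGS}) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
GROUP BY others.user_id
"""
# Users sharing every one of my top tags: compare distinct matches
# against the size of my top-tag set.
all_sql = any_sql + f"""
HAVING COUNT(DISTINCT others.tag_id) =
       (SELECT COUNT(*) FROM ({MY_TAGS}) AS top_tags)
"""
shares_any = sorted(r[0] for r in conn.execute(any_sql, params))
shares_all = sorted(r[0] for r in conn.execute(all_sql, params))
print(shares_any, shares_all)
```

With a radius of 2 the top tags are 1 and 2, so users 2 and 3 share at least one of them while only user 2 shares both.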
Ranking Ecosystem Matches
• What about finding our “closest” ecosystem matches?
• We can “rank” other users based on whether they have tagged items a number of times similar to ourselves
SELECT others.user_id, (COUNT(*) - my_tags.num_tags) AS rank
FROM UserTagPost others
INNER JOIN (
    SELECT tag_id, COUNT(*) AS num_tags
    FROM UserTagPost
    WHERE user_id = @my_user_id
    GROUP BY tag_id
    HAVING COUNT(tag_id) >= @radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
ORDER BY rank DESC;
Ranking Ecosystem Matches Efficiently
• But we’ve still got our COUNT(*) problem
• How about another summary table?
CREATE TABLE UserTagStat (
    user_id INT UNSIGNED NOT NULL,
    tag_id INT UNSIGNED NOT NULL,
    num_posts INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, tag_id),
    INDEX (tag_id)
) ENGINE=InnoDB;
Ranking Ecosystem Matches Efficiently 2
• Hey, we’ve eliminated the aggregation!
SELECT others.user_id, (others.num_posts - my_tags.num_posts) AS rank
FROM UserTagStat others
INNER JOIN (
    SELECT tag_id, num_posts
    FROM UserTagStat
    WHERE user_id = @my_user_id
    AND num_posts >= @radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
ORDER BY rank DESC;
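To see what the summary-table ranking returns, here is a small sketch on invented data, with SQLite standing in for MySQL. It deviates from the slide in three labeled ways: the alias is tag_rank rather than rank (RANK is a reserved word in MySQL 8.0 and later), tag_id is added to the select list, and the searching user is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagStat (
    user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
    num_posts INTEGER NOT NULL,
    PRIMARY KEY (user_id, tag_id));
CREATE INDEX idx_usertagstat_tag ON UserTagStat (tag_id);
-- user 1 is "me": 5 posts under tag 1, 3 under tag 2, 1 under tag 3
INSERT INTO UserTagStat VALUES (1,1,5), (1,2,3), (1,3,1);
INSERT INTO UserTagStat VALUES (2,1,7), (2,2,1), (3,1,4), (4,3,9);
""")

rank_sql = """
SELECT others.user_id, others.tag_id,
       (others.num_posts - my_tags.num_posts) AS tag_rank
FROM UserTagStat others
INNER JOIN (
    SELECT tag_id, num_posts FROM UserTagStat
    WHERE user_id = :me AND num_posts >= :radius
) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
ORDER BY tag_rank DESC, others.user_id, others.tag_id
"""
# No GROUP BY or COUNT(*): the counters are read straight from the table.
ranked = conn.execute(rank_sql, {"me": 1, "radius": 2}).fetchall()
print(ranked)
```

Note that without aggregation the result has one row per (user, shared tag) pair, so a user who shares several top tags appears several times.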
Scaling Out Sensibly
MySQL Replication (Scale Out)
[Diagram: web/app servers, load balancer, one master MySQL server (writes and reads), slave MySQL servers (reads), replication from master to slaves]
• Write to one master
• Read from many slaves
• Excellent for read-intensive apps
Scale Out Using Replication
• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that’s scaling out)
• Slave can provide hot standby as well as backup server
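The write-to-master, read-from-slaves split is usually implemented in the application or a proxy. The sketch below shows one simple way to route statements; the ReplicationRouter class and the host names are hypothetical, and a real deployment would also have to account for replication lag when a read must see a write that was just made.

```python
import itertools

class ReplicationRouter:
    """Pick a server per statement: writes to the master, reads to slaves."""

    READ_VERBS = ("SELECT", "SHOW")

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if none exist.
        self._readers = itertools.cycle(slaves or [master])

    def route(self, sql):
        # The first keyword of the statement decides where it goes.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.READ_VERBS:
            return next(self._readers)
        return self.master

router = ReplicationRouter("master:3306", ["slave1:3306", "slave2:3306"])
targets = [router.route(q) for q in (
    "INSERT INTO Tag2Post (tag_id, post_id) VALUES (1, 10)",
    "SELECT post_id FROM Tag2Post WHERE tag_id = 1",
    "SELECT post_id FROM Tag2Post WHERE tag_id = 2",
    "select tag_id from Tags",
)]
print(targets)
```

Adding a slave in this sketch is just one more entry in the list passed to the router, which mirrors the “plug in another slave” point above.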
Scale Out Strategies
• Slave storage engine can differ from Master!
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: “this week’s popular tags”
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week
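A weekly rebuild job of this kind might look like the sketch below, which a cron entry on each slave could invoke. Everything specific here is an assumption for illustration: the WeeklyPopularTags table, the tagged_on column, and the rebuild_weekly_popular_tags function do not appear in the slides, and SQLite stands in for MySQL (where the first statement would be TRUNCATE TABLE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- tagged_on is a hypothetical column; the slides do not show a timestamp.
CREATE TABLE Tag2Post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
                       tagged_on TEXT NOT NULL,
                       PRIMARY KEY (tag_id, post_id));
-- Hypothetical summary table that lives only on the slave.
CREATE TABLE WeeklyPopularTags (tag_id INTEGER PRIMARY KEY,
                                num_posts INTEGER NOT NULL);
INSERT INTO Tag2Post VALUES
    (1, 10, '2006-05-01'), (1, 11, '2006-05-09'), (1, 12, '2006-05-10'),
    (2, 10, '2006-05-09'), (3, 12, '2006-04-20');
""")

def rebuild_weekly_popular_tags(conn, week_start):
    """Empty the summary table and refill it from this week's taggings."""
    with conn:  # a single transaction in this SQLite sketch
        conn.execute("DELETE FROM WeeklyPopularTags")
        conn.execute(
            """
            INSERT INTO WeeklyPopularTags (tag_id, num_posts)
            SELECT tag_id, COUNT(*) FROM Tag2Post
            WHERE tagged_on >= ?
            GROUP BY tag_id
            """,
            (week_start,),
        )

rebuild_weekly_popular_tags(conn, "2006-05-08")
popular = conn.execute(
    "SELECT tag_id, num_posts FROM WeeklyPopularTags "
    "ORDER BY num_posts DESC, tag_id"
).fetchall()
print(popular)
```

The expensive GROUP BY runs once per rebuild, and page requests then read the small precomputed table.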
Scale Out Strategies (cont’d)
• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement
Questions?
MySQL Forge: http://forge.mysql.com/
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org
Jay Pipes (jay@mysql.com)
MySQL AB

Excluding Tags From A Resultset

• Here we want a search query like "Give me all posts tagged with 'beach' and 'cloud' but not tagged with 'flower'". Typical solution you will see:

SELECT t2p1.post_id
FROM Tag2Post t2p1
INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id
WHERE t1.tag_text IN ('beach','cloud')
AND t2p1.post_id NOT IN (
  SELECT post_id
  FROM Tag2Post t2p2
  INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id
  WHERE t2.tag_text = 'flower'
)
GROUP BY t2p1.post_id
HAVING COUNT(DISTINCT t2p1.tag_id) = 2;

Excluding Tags From A Resultset (cont'd)

• More efficient to use an outer join to perform the "minus" operation, plus get rid of the GROUP BY
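
The query that accompanied this slide did not survive in the transcript. As a minimal sketch of the outer-join idea, the example below assumes the same Tags and Tag2Post tables as the previous slide and runs against an in-memory SQLite database as a stand-in for MySQL. The table definitions, the sample rows, and the Python harness are illustrative assumptions, not the author's code.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table layouts are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (
  tag_id   INTEGER PRIMARY KEY,
  tag_text TEXT NOT NULL UNIQUE
);
CREATE TABLE Tag2Post (
  tag_id  INTEGER NOT NULL,
  post_id INTEGER NOT NULL,
  PRIMARY KEY (tag_id, post_id)
);
INSERT INTO Tags VALUES (1, 'beach'), (2, 'cloud'), (3, 'flower');
-- post 10: beach + cloud          post 11: beach + cloud + flower
-- post 12: beach only             post 13: cloud + flower
INSERT INTO Tag2Post VALUES
  (1, 10), (2, 10),
  (1, 11), (2, 11), (3, 11),
  (1, 12),
  (2, 13), (3, 13);
""")

# One inner join per required tag replaces GROUP BY ... HAVING COUNT(...);
# a LEFT JOIN that must find nothing (IS NULL) replaces NOT IN.
EXCLUDE_WITH_OUTER_JOIN = """
SELECT t2p_beach.post_id
FROM Tags t_beach
INNER JOIN Tag2Post t2p_beach ON t2p_beach.tag_id = t_beach.tag_id
INNER JOIN Tag2Post t2p_cloud ON t2p_cloud.post_id = t2p_beach.post_id
INNER JOIN Tags t_cloud ON t_cloud.tag_id = t2p_cloud.tag_id
LEFT JOIN Tags t_flower ON t_flower.tag_text = 'flower'
LEFT JOIN Tag2Post t2p_flower
  ON t2p_flower.post_id = t2p_beach.post_id
 AND t2p_flower.tag_id = t_flower.tag_id
WHERE t_beach.tag_text = 'beach'
  AND t_cloud.tag_text = 'cloud'
  AND t2p_flower.post_id IS NULL
ORDER BY t2p_beach.post_id
"""

matching_posts = [row[0] for row in conn.execute(EXCLUDE_WITH_OUTER_JOIN)]
print(matching_posts)
```

The sketch relies on tag_text being unique and on (tag_id, post_id) being the primary key of the mapping table, so no join can multiply rows.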

Summary of SQL Tips for Tagging

• Index fields in mapping tables properly. Ensure that each GROUP BY can access an index from the left side of the index
• Use summary or statistic tables to eliminate the use of COUNT(*) expressions in tag clouding
• Get rid of GROUP BY and HAVING COUNT(*) by using standard join techniques
• Get rid of NOT IN expressions via a standard outer join
• Use derived tables with an internal LIMIT expression to prevent wild relation queries from breaking scalability
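
To illustrate the last tip, here is a small sketch of a derived table with an internal LIMIT: the set of tags is capped before it is joined to the mapping table, so the join cannot fan out over every tag. It assumes a TagStat summary table with (tag_id, num_posts) columns and uses in-memory SQLite in place of MySQL; the column layout, sample rows, and the max_tags parameter are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; TagStat's columns are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TagStat (
  tag_id    INTEGER PRIMARY KEY,
  num_posts INTEGER NOT NULL
);
CREATE TABLE Tag2Post (
  tag_id  INTEGER NOT NULL,
  post_id INTEGER NOT NULL,
  PRIMARY KEY (tag_id, post_id)
);
INSERT INTO TagStat VALUES (1, 3), (2, 2), (3, 1);
INSERT INTO Tag2Post VALUES
  (1, 10), (1, 11), (1, 12),
  (2, 10), (2, 11),
  (3, 11);
""")

# The derived table picks at most :max_tags tags from the small summary
# table first; only those tags are then joined to the large mapping table.
TOP_TAG_POSTS = """
SELECT top_tags.tag_id, t2p.post_id
FROM (
  SELECT tag_id
  FROM TagStat
  ORDER BY num_posts DESC
  LIMIT :max_tags
) AS top_tags
INNER JOIN Tag2Post t2p ON t2p.tag_id = top_tags.tag_id
ORDER BY top_tags.tag_id, t2p.post_id
"""

rows = conn.execute(TOP_TAG_POSTS, {"max_tags": 2}).fetchall()
print(rows)
```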

Folksonomy Concepts in SQL

Here we see tagging and folksonomy together: the user dimension

Folksonomy Adds The User Dimension

• Adding the user dimension to our schema
• The tag is the relationship glue between the user and item dimensions

CREATE TABLE UserTagPost (
  user_id INT UNSIGNED NOT NULL
, tag_id INT UNSIGNED NOT NULL
, post_id INT UNSIGNED NOT NULL
, PRIMARY KEY (user_id, tag_id, post_id)
, INDEX (tag_id)
, INDEX (post_id)
) ENGINE=InnoDB;

Who Shares My Interest Directly

• Find out the users who have linked to the same item I have
• Direct link: we don't go through the tag glue

SELECT user_id
FROM UserTagPost
WHERE post_id = my_post_id;

Who Shares My Interests Indirectly

• Find out the users who have similar tag sets
• But how much matching do we want to do? In other words, what radius do we want to match on?
• The first step is to find my tags that are within the search radius; this yields my "top" or most popular tags

SELECT tag_id
FROM UserTagPost
WHERE user_id = my_user_id
GROUP BY tag_id
HAVING COUNT(tag_id) >= radius;

Who Shares My Interests (cont'd)

• Now that we have our "top" tag set, we want to find users who match all of our top tags

SELECT others.user_id
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id
  FROM UserTagPost
  WHERE user_id = my_user_id
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id;
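
As written, the join returns every user who shares at least one of the top tags, including ourselves. A hedged sketch of one way to require all of them: count the distinct top tags each other user matched and compare that with the size of our own top-tag set. The example uses in-memory SQLite in place of MySQL; the sample rows, the :me and :radius parameters, and the extra HAVING and WHERE clauses are additions for illustration, not part of the original slide.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
  user_id INTEGER NOT NULL,
  tag_id  INTEGER NOT NULL,
  post_id INTEGER NOT NULL,
  PRIMARY KEY (user_id, tag_id, post_id)
);
-- user 1 ("me"): tag 1 twice, tag 2 twice, tag 3 once
INSERT INTO UserTagPost VALUES
  (1, 1, 10), (1, 1, 11), (1, 2, 10), (1, 2, 12), (1, 3, 13),
  (2, 1, 10), (2, 2, 11),   -- shares both of my top tags
  (3, 1, 10),               -- shares only one of them
  (4, 3, 13);               -- shares none of them
""")

# Step one from the slides: my tags used at least :radius times.
MY_TOP_TAGS = """
  SELECT tag_id
  FROM UserTagPost
  WHERE user_id = :me
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= :radius
"""

# The slide's query: users sharing ANY of my top tags (includes me).
MATCH_ANY = f"""
SELECT others.user_id
FROM UserTagPost others
INNER JOIN ({MY_TOP_TAGS}) AS my_tags ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
ORDER BY others.user_id
"""

# Stricter variant (an addition): users sharing ALL of my top tags.
MATCH_ALL = f"""
SELECT others.user_id
FROM UserTagPost others
INNER JOIN ({MY_TOP_TAGS}) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
GROUP BY others.user_id
HAVING COUNT(DISTINCT others.tag_id) =
       (SELECT COUNT(*) FROM ({MY_TOP_TAGS}) AS t)
ORDER BY others.user_id
"""

params = {"me": 1, "radius": 2}
any_match = [row[0] for row in conn.execute(MATCH_ANY, params)]
all_match = [row[0] for row in conn.execute(MATCH_ALL, params)]
print(any_match, all_match)
```

The stricter form brings back a GROUP BY with an aggregate, so it trades some of the speed the deck is after for an exact "all tags" match.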

Ranking Ecosystem Matches

• What about finding our "closest" ecosystem matches?
• We can "rank" other users based on whether they have tagged items a number of times similar to ourselves

SELECT others.user_id
, (COUNT(*) - my_tags.num_tags) AS rank
FROM UserTagPost others
INNER JOIN (
  SELECT tag_id, COUNT(*) AS num_tags
  FROM UserTagPost
  WHERE user_id = my_user_id
  GROUP BY tag_id
  HAVING COUNT(tag_id) >= radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
GROUP BY others.user_id
ORDER BY rank DESC;

Ranking Ecosystem Matches Efficiently

• But we've still got our COUNT(*) problem
• How about another summary table?

CREATE TABLE UserTagStat (
  user_id INT UNSIGNED NOT NULL
, tag_id INT UNSIGNED NOT NULL
, num_posts INT UNSIGNED NOT NULL
, PRIMARY KEY (user_id, tag_id)
, INDEX (tag_id)
) ENGINE=InnoDB;

Ranking Ecosystem Matches Efficiently 2

• Hey, we've eliminated the aggregation!

SELECT others.user_id
, (others.num_posts - my_tags.num_posts) AS rank
FROM UserTagStat others
INNER JOIN (
  SELECT tag_id, num_posts
  FROM UserTagStat
  WHERE user_id = my_user_id
  AND num_posts >= radius
) AS my_tags
ON others.tag_id = my_tags.tag_id
ORDER BY rank DESC;
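
The summary table only helps if it is kept in step with UserTagPost. The sketch below shows one possible way to do that at write time and then runs the aggregation-free ranking query against it. It uses in-memory SQLite in place of MySQL; the add_tagging helper, the sample data, the rank_diff alias (RANK is a reserved word in MySQL 8.0 and later), and the filter that excludes our own rows are all assumptions made for the example. In MySQL the update-then-insert step could be written as a single INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; DDL is simplified accordingly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagPost (
  user_id INTEGER NOT NULL,
  tag_id  INTEGER NOT NULL,
  post_id INTEGER NOT NULL,
  PRIMARY KEY (user_id, tag_id, post_id)
);
CREATE TABLE UserTagStat (
  user_id   INTEGER NOT NULL,
  tag_id    INTEGER NOT NULL,
  num_posts INTEGER NOT NULL,
  PRIMARY KEY (user_id, tag_id)
);
CREATE INDEX ix_usertagstat_tag ON UserTagStat (tag_id);
""")


def add_tagging(conn, user_id, tag_id, post_id):
    """Hypothetical helper: record a tagging and bump its counter atomically."""
    with conn:  # one transaction: both tables change or neither does
        conn.execute(
            "INSERT INTO UserTagPost (user_id, tag_id, post_id) VALUES (?, ?, ?)",
            (user_id, tag_id, post_id),
        )
        cur = conn.execute(
            "UPDATE UserTagStat SET num_posts = num_posts + 1 "
            "WHERE user_id = ? AND tag_id = ?",
            (user_id, tag_id),
        )
        if cur.rowcount == 0:  # first time this user applies this tag
            conn.execute(
                "INSERT INTO UserTagStat (user_id, tag_id, num_posts) "
                "VALUES (?, ?, 1)",
                (user_id, tag_id),
            )


for tagging in [
    (1, 1, 10), (1, 1, 11), (1, 1, 12), (1, 2, 10),
    (2, 1, 10), (2, 1, 11), (2, 1, 12), (2, 1, 13),
    (3, 1, 10), (3, 2, 10), (3, 2, 11),
]:
    add_tagging(conn, *tagging)

stats = conn.execute(
    "SELECT user_id, tag_id, num_posts FROM UserTagStat "
    "ORDER BY user_id, tag_id"
).fetchall()

# The slide's ranking query, plus a filter that leaves out our own rows.
RANK_QUERY = """
SELECT others.user_id,
       (others.num_posts - my_tags.num_posts) AS rank_diff
FROM UserTagStat others
INNER JOIN (
  SELECT tag_id, num_posts
  FROM UserTagStat
  WHERE user_id = :me AND num_posts >= :radius
) AS my_tags ON others.tag_id = my_tags.tag_id
WHERE others.user_id <> :me
ORDER BY rank_diff DESC, others.user_id
"""
ranked = conn.execute(RANK_QUERY, {"me": 1, "radius": 2}).fetchall()
print(stats, ranked)
```

The query returns one row per (user, tag) pair, so a user who shares several top tags appears several times.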

Scaling Out Sensibly

MySQL Replication (Scale Out)

[Diagram: Web/App Servers, Load Balancer, Master MySQL Server (Writes & Reads), Slave MySQL Servers (Reads), Replication from Master to Slaves]

• Write to one master
• Read from many slaves
• Excellent for read-intensive apps

Scale Out Using Replication

• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that's scaling out)
• Slave can provide hot standby as well as backup server
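
On the application side, the write-to-master, read-from-slaves split can be sketched as a small router that picks a server per statement. This is only an illustration under stated assumptions: the ReplicationRouter class, the host names, and the realtime flag are hypothetical, and the router returns a server name instead of opening a real connection.

```python
import itertools


class ReplicationRouter:
    """Hypothetical router: writes go to the master, reads rotate over slaves."""

    READ_VERBS = ("SELECT", "SHOW")

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; None means "no slaves configured".
        self._slaves = itertools.cycle(slaves) if slaves else None

    def pick(self, sql, realtime=False):
        """Return the name of the server that should run this statement.

        realtime=True forces a read onto the master, for reads that cannot
        tolerate replication lag.
        """
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        is_read = verb in self.READ_VERBS
        if is_read and not realtime and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = ReplicationRouter("master-db", ["slave-1", "slave-2"])
targets = [
    router.pick("INSERT INTO UserTagPost VALUES (1, 2, 3)"),
    router.pick("SELECT user_id FROM UserTagPost WHERE post_id = 3"),
    router.pick("SELECT tag_id FROM UserTagStat WHERE user_id = 1"),
    router.pick("SELECT 1"),
    router.pick("SELECT num_posts FROM UserTagStat WHERE user_id = 1",
                realtime=True),
]
print(targets)
```

Adding a slave means adding one more name to the list, which is the "plug in another slave" step from the slide.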

Scale Out Strategies

• Slave storage engine can differ from Master!
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: "this week's popular tags"
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week
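
A rough sketch of what such a cron job might run on each slave: empty the summary table, then refill it from the replicated mapping table. It uses in-memory SQLite in place of MySQL, where the first step would be a TRUNCATE. The WeeklyPopularTags table, the tagged_at column on Tag2Post, the function name, and the sample dates are all hypothetical; the deck does not show how the tagging time is stored.

```python
import sqlite3

# In-memory SQLite stands in for a slave's MySQL instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag2Post (
  tag_id    INTEGER NOT NULL,
  post_id   INTEGER NOT NULL,
  tagged_at TEXT NOT NULL,   -- hypothetical column, ISO dates
  PRIMARY KEY (tag_id, post_id)
);
CREATE TABLE WeeklyPopularTags (   -- hypothetical summary table
  tag_id    INTEGER PRIMARY KEY,
  num_posts INTEGER NOT NULL
);
INSERT INTO Tag2Post VALUES
  (1, 10, '2007-01-01'), (1, 11, '2007-01-09'), (1, 12, '2007-01-10'),
  (2, 10, '2007-01-09'),
  (3, 13, '2006-12-25');
""")


def rebuild_weekly_popular_tags(conn, week_start, max_tags=10):
    """Empty and refill the summary table in one transaction."""
    with conn:
        conn.execute("DELETE FROM WeeklyPopularTags")  # TRUNCATE in MySQL
        conn.execute(
            """
            INSERT INTO WeeklyPopularTags (tag_id, num_posts)
            SELECT tag_id, COUNT(*) AS num_posts
            FROM Tag2Post
            WHERE tagged_at >= ?
            GROUP BY tag_id
            ORDER BY num_posts DESC, tag_id
            LIMIT ?
            """,
            (week_start, max_tags),
        )
    return conn.execute(
        "SELECT tag_id, num_posts FROM WeeklyPopularTags "
        "ORDER BY num_posts DESC, tag_id"
    ).fetchall()


popular = rebuild_weekly_popular_tags(conn, "2007-01-08")
popular_again = rebuild_weekly_popular_tags(conn, "2007-01-08")  # safe to rerun
print(popular)
```

Readers then select from the small summary table and never pay for the COUNT(*) themselves.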

Scale Out Strategies (cont'd)

• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use the Partitioning feature of MySQL 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement

Questions?

MySQL Forge: http://forge.mysql.com
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org
Jay Pipes (jay@mysql.com)
MySQL AB
27Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 27The Worldrsquos Most Popular Open Source Database
Here we see tagging and folksonomy
together the user dimension
28Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 28The Worldrsquos Most Popular Open Source Database
Folksonomy Adds The User Dimension
bull Adding the user dimension to our schemabull The tag is the relationship glue between the user and
item dimensions
CREATE TABLE UserTagPost (user_id INT UNSIGNED NOT NULL tag_id INT UNSIGNED NOT NULL post_id INT UNSIGNED NOT NULL PRIMARY KEY (user_id tag_id post_id) INDEX (tag_id) INDEX (post_id)) ENGINE=InnoDB
29Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 29The Worldrsquos Most Popular Open Source Database
Who Shares My Interest Directly
bull Find out the users who have linked to the same item I have
bull Direct link we dont go through the tag glue
SELECT user_id FROM UserTagPostWHERE post_id = my_post_id
30Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 30The Worldrsquos Most Popular Open Source Database
Who Shares My Interests Indirectly
bull Find out the users who have similar tag setsbull But how much matching do we want to do In other
words what radius do we want to match onbull The first step is to find my tags that are within the
search radius this yields my ldquotoprdquo or most popular tags
SELECT tag_id FROM UserTagPostWHERE user_id = my_user_idGROUP BY tag_idHAVING COUNT(tag_id) gt= radius
31Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 31The Worldrsquos Most Popular Open Source Database
Who Shares My Interests (contd)
bull Now that we have our ldquotoprdquo tag set we want to find users who match all of our top tags
SELECT othersuser_id FROM UserTagPost others INNER JOIN (SELECT tag_id FROM UserTagPostWHERE user_id = my_user_idGROUP BY tag_idHAVING COUNT(tag_id) gt= radius) AS my_tagsON otherstag_id = my_tagstag_idGROUP BY othersuser_id
32Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 32The Worldrsquos Most Popular Open Source Database
Ranking Ecosystem Matchesbull What about finding our ldquoclosestrdquo ecosystem matchesbull We can ldquorankrdquo other users based on whether they have
tagged items a number of times similar to ourselvesSELECT othersuser_id (COUNT() shy my_tagsnum_tags) AS rankFROM UserTagPost others INNER JOIN (SELECT tag_id COUNT() AS num_tags FROM UserTagPostWHERE user_id = my_user_idGROUP BY tag_idHAVING COUNT(tag_id) gt= radius) AS my_tagsON otherstag_id = my_tagstag_idGROUP BY othersuser_idORDER BY rank DESC
33Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 33The Worldrsquos Most Popular Open Source Database
Ranking Ecosystem Matches Efficientlybull But weve still got our COUNT() problembull How about another summary table
CREATE TABLE UserTagStat (user_id INT UNSIGNED NOT NULL tag_id INT UNSIGNED NOT NUL num_posts INT UNSIGNED NOT NULL PRIMARY KEY (user_id tag_id) INDEX (tag_id)) ENGINE=InnoDB
34Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 34The Worldrsquos Most Popular Open Source Database
Ranking Ecosystem Matches Efficiently 2bull Hey weve eliminated the aggregation
SELECT othersuser_id (othersnum_posts shy my_tagsnum_posts) AS rankFROM UserTagStat others INNER JOIN (SELECT tag_id num_postsFROM UserTagStatWHERE user_id = my_user_idAND num_posts gt= radius) AS my_tagsON otherstag_id = my_tagstag_idORDER BY rank DESC
35Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 35The Worldrsquos Most Popular Open Source Database
Scaling Out Sensibly
36Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 36The Worldrsquos Most Popular Open Source Database
MySQL Replication (Scale Out)
[Diagram: web/app servers behind a load balancer; writes and reads go to the master MySQL server, reads go to the slave MySQL servers; the master replicates to the slaves]
• Write to one master
• Read from many slaves
• Excellent for read-intensive apps
Scale Out Using Replication
• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that's scaling out)
• Slave can provide hot standby as well as backup server
Scale Out Strategies
• Slave storage engine can differ from Master
  – InnoDB on Master (great update/insert/delete performance)
  – MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  – Example: “this week's popular tags”
    • Generate the data via cron job on each slave. No need to burden the master server
    • Truncate every week
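The "this week's popular tags" batch could look roughly like the following script, run by cron on each slave. This is only a sketch under assumptions: the `WeeklyPopularTag` table name is hypothetical, and so is the `created` DATETIME column on `UserTagPost`, which does not appear in the schema shown in these slides.

```sql
-- Slave-local summary table (hypothetical name); MyISAM for cheap reads.
CREATE TABLE IF NOT EXISTS WeeklyPopularTag (
    tag_id    INT UNSIGNED NOT NULL,
    num_posts INT UNSIGNED NOT NULL,
    PRIMARY KEY (tag_id),
    INDEX (num_posts)
) ENGINE=MyISAM;

-- Start each run from an empty table.
TRUNCATE TABLE WeeklyPopularTag;

-- Rebuild from the last seven days of tagging activity.
-- ASSUMPTION: UserTagPost has a `created` DATETIME column.
INSERT INTO WeeklyPopularTag (tag_id, num_posts)
SELECT tag_id, COUNT(*)
FROM UserTagPost
WHERE created >= NOW() - INTERVAL 7 DAY
GROUP BY tag_id;

-- What the web tier would then read.
SELECT tag_id, num_posts
FROM WeeklyPopularTag
ORDER BY num_posts DESC
LIMIT 20;
```

The expensive `GROUP BY` runs once per batch on the slave, and page views only touch the small precomputed table.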
Scale Out Strategies (cont'd)
• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat)
  – Very effective for high-read environments: can yield 200-250% performance improvement
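As a small illustration of the query cache advice, the sketch below bypasses the cache for a counter table that changes on every tagging, then inspects the cache configuration and hit counters. It uses `UserTagStat` from the earlier slides as the frequently updated table, and `@my_user_id` is a hypothetical user variable. Note that the query cache exists only in older server versions; it was removed in MySQL 8.0.

```sql
-- Hypothetical input.
SET @my_user_id = 1;

-- Frequently updated table: a cached result would be invalidated almost
-- immediately, so skip storing it in the query cache.
SELECT SQL_NO_CACHE tag_id, num_posts
FROM UserTagStat
WHERE user_id = @my_user_id;

-- Check how the cache is configured and how well it is performing.
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';
```

Comparing `Qcache_hits` against `Qcache_inserts` over time gives a rough sense of whether cached results are being reused or merely churned.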
Questions?
MySQL Forge: http://forge.mysql.com/
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org
Jay Pipes (jay@mysql.com)
MySQL AB
Folksonomy Adds The User Dimension
• Adding the user dimension to our schema
• The tag is the relationship glue between the user and item dimensions

CREATE TABLE UserTagPost (
  user_id INT UNSIGNED NOT NULL
, tag_id INT UNSIGNED NOT NULL
, post_id INT UNSIGNED NOT NULL
, PRIMARY KEY (user_id, tag_id, post_id)
, INDEX (tag_id)
, INDEX (post_id)
) ENGINE=InnoDB
Who Shares My Interest Directly?
• Find out the users who have linked to the same item I have
• Direct link: we don't go through the tag glue

SELECT user_id
FROM UserTagPost
WHERE post_id = my_post_id
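Since the primary key of `UserTagPost` is `(user_id, tag_id, post_id)`, a user who attached several tags to the same post appears once per tag in the result of the direct lookup. A slightly tightened variant, assuming `@my_post_id` and `@my_user_id` are user variables standing in for the slide's placeholders (the values are illustrative), might be:

```sql
-- Hypothetical inputs.
SET @my_post_id = 1001;
SET @my_user_id = 1;

-- One row per user, excluding myself; the lookup can use INDEX (post_id).
SELECT DISTINCT user_id
FROM UserTagPost
WHERE post_id = @my_post_id
  AND user_id <> @my_user_id;
```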
Who Shares My Interests Indirectly?
• Find out the users who have similar tag sets
• But how much matching do we want to do? In other words, what radius do we want to match on?
• The first step is to find my tags that are within the search radius; this yields my “top” or most popular tags

SELECT tag_id
FROM UserTagPost
WHERE user_id = my_user_id
GROUP BY tag_id
HAVING COUNT(tag_id) >= radius
34Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 34The Worldrsquos Most Popular Open Source Database
Ranking Ecosystem Matches Efficiently 2bull Hey weve eliminated the aggregation
SELECT othersuser_id (othersnum_posts shy my_tagsnum_posts) AS rankFROM UserTagStat others INNER JOIN (SELECT tag_id num_postsFROM UserTagStatWHERE user_id = my_user_idAND num_posts gt= radius) AS my_tagsON otherstag_id = my_tagstag_idORDER BY rank DESC
35Copyright MySQL AB The Worldrsquos Most Popular Open Source Database 35The Worldrsquos Most Popular Open Source Database
Scaling Out Sensibly

MySQL Replication (Scale Out)
[Diagram: Load Balancer, Web/App Servers, one Master MySQL Server (writes & reads), Slave MySQL Servers (reads), with replication from the master to the slaves]
• Write to one master
• Read from many slaves
• Excellent for read-intensive apps

Scale Out Using Replication
• Master DB stores all writes
• Master has InnoDB tables
• Slaves handle aggregate reads, non-realtime reads
• Web servers can be load balanced (directed) to one or more slaves
• Just plug in another slave to increase read performance (that's scaling out)
• Slave can provide hot standby as well as backup server
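
The write-to-master, read-from-slaves split can be sketched in application code as below. This is only an illustration under stated assumptions: the `ReplicatedPool` class, its verb list, and the host names are hypothetical, and a real deployment might do this in a load balancer or a proxy instead.

```python
import itertools


class ReplicatedPool:
    """Send writes to the master and rotate reads across the slaves."""

    # Statement verbs treated as writes (an illustrative list, not exhaustive).
    WRITE_VERBS = {
        "INSERT", "UPDATE", "DELETE", "REPLACE",
        "CREATE", "ALTER", "DROP", "TRUNCATE",
    }

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin iterator over the slaves; None when there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        """Return the server that should run this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS or self._slaves is None:
            return self.master
        return next(self._slaves)


if __name__ == "__main__":
    pool = ReplicatedPool("master-db", ["slave-db-1", "slave-db-2"])
    print(pool.route("INSERT INTO Tag (tag_text) VALUES ('mysql')"))
    print(pool.route("SELECT tag_text FROM Tag"))
```

Adding read capacity then means appending another host to the slave list. Since replication is asynchronous, a read that must see a row written a moment ago should still go to the master, which is why the slide assigns only non-realtime reads to the slaves.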

Scale Out Strategies
• Slave storage engine can differ from Master
  - InnoDB on Master (great update/insert/delete performance)
  - MyISAM on Slave (fantastic read performance as well as excellent concurrent insert performance, plus can use FULLTEXT indexing)
• Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data
  - Example: "this week's popular tags"
• Generate the data via cron job on each slave. No need to burden the master server
• Truncate every week
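
A job of the kind a cron entry on each slave might run could look like the sketch below. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL slave, and the table names `TagPost` and `PopularTagWeek`, their columns, and the sample rows are hypothetical.

```python
import sqlite3


def demo_tagging_db():
    """Create the hypothetical raw and summary tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TagPost (tag_id INTEGER, created_on TEXT)")
    conn.execute(
        "CREATE TABLE PopularTagWeek ("
        " tag_id INTEGER PRIMARY KEY, num_posts INTEGER)"
    )
    conn.executemany(
        "INSERT INTO TagPost VALUES (?, ?)",
        [
            (1, "2007-03-01"), (1, "2007-03-05"), (1, "2007-03-06"),
            (2, "2007-03-05"), (2, "2007-03-07"),
            (3, "2007-03-06"), (3, "2007-02-20"),
        ],
    )
    conn.commit()
    return conn


def rebuild_popular_tags(conn, week_start, limit=10):
    """Empty the summary table, then refill it from the raw tagging rows."""
    with conn:  # commit both steps together, or roll both back on error
        # SQLite has no TRUNCATE; on MySQL this step would be TRUNCATE TABLE.
        conn.execute("DELETE FROM PopularTagWeek")
        conn.execute(
            """
            INSERT INTO PopularTagWeek (tag_id, num_posts)
            SELECT tag_id, COUNT(*) AS num_posts
            FROM TagPost
            WHERE created_on >= ?
            GROUP BY tag_id
            ORDER BY num_posts DESC, tag_id
            LIMIT ?
            """,
            (week_start, limit),
        )


if __name__ == "__main__":
    conn = demo_tagging_db()
    rebuild_popular_tags(conn, "2007-03-05", limit=2)
    print(conn.execute("SELECT * FROM PopularTagWeek").fetchall())
```

Page requests then read the small summary table instead of aggregating the raw rows each time, and the aggregation cost is paid once per run on the slave rather than on the master.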

Scale Out Strategies (cont'd)
• Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc.
• Use the Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density
• Use the MySQL Query Cache effectively
  - Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat)
  - Very effective for high-read environments: can yield 200-250% performance improvement
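
The SQL_NO_CACHE advice follows from how the query cache works: any write to a table invalidates every cached result for that table, so caching results from a busy table adds overhead with little chance of a hit. A minimal sketch of applying the hint in application code follows; the `build_select` helper, the `VOLATILE_TABLES` set, and the column names are hypothetical.

```python
# Tables assumed to change too often for cached results to survive.
VOLATILE_TABLES = {"TagStat"}


def build_select(table, columns, where=None):
    """Return a SELECT, adding SQL_NO_CACHE for frequently updated tables.

    The WHERE text is pasted in as-is for brevity; real code should bind
    values as parameters rather than format them into the string.
    """
    hint = "SQL_NO_CACHE " if table in VOLATILE_TABLES else ""
    sql = f"SELECT {hint}{', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    print(build_select("TagStat", ["tag_id", "num_posts"], "tag_id = 42"))
    print(build_select("Tag", ["tag_id", "tag_text"]))
```

Queries against rarely changing tables are left without the hint so the query cache can keep serving them.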

Questions?
MySQL Forge: http://forge.mysql.com/
MySQL Forge Tag Schema Wiki pages: http://forge.mysql.com/wiki/TagSchema
PlanetMySQL: http://www.planetmysql.org
Jay Pipes (jay@mysql.com)
MySQL AB