Figure 1: Model Viewer – one of three interfaces we have developed to view and compare embeddings of navigation graphs induced from search engine query transitions. While the LSE (Laplacian spectral embedding) model to the left shows different sub-intents for the query “brexit” (e.g., “brexit deal what is it”), the ASE (Adjacency spectral embedding) model to the right shows associated events of comparable public interest (e.g., “government shutdown” and “hong kong protests”). These different kinds of similar query may be used to drive different kinds of query recommendation in the search engine user experience.
Making Sense of Search: Using Graph Embedding and Visualization to Transform Query Understanding

Jonathan Larson, Microsoft Research, Silverdale, WA, [email protected]
Nathan Evans, Microsoft Research, Silverdale, WA, [email protected]
Darren Edge, Microsoft Research, Cambridge, [email protected]
Christopher White, Microsoft Research, Redmond, WA, [email protected]
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
CHI ’20 Extended Abstracts, April 25–30, 2020, Honolulu, HI, USA.
Copyright is held by the author/owner(s).
ACM ISBN 978-1-4503-6819-3/20/04
http://dx.doi.org/10.1145/3334480.3375233
Abstract
We present a suite of interfaces for the visual exploration and evaluation of graph embeddings – machine learning models that reveal implicit relationships not directly observed in the input graph. Our focus is on the embedding of navigation graphs induced from search engine query logs, and how visualization of similar queries across different embeddings, combined with the interactive tuning of results through multi-attribute ranking and post-filtering (e.g., using raw query frequency or derived entity type), can provide a universal foundation for query recommendation. We describe the process of technology transfer from our applied research team to the Microsoft Bing product team, examining the critical role that visualization played in their decisions to ship the technology on bing.com.
Author Keywords
query logs; navigation graphs; graph embedding; query recommendation; visualization; tech transfer

CCS Concepts
•Human-centered computing → Activity centered design; Visual analytics; •Information systems → Content ranking; Query log analysis; Query suggestion; Evaluation of retrieval results;
CHI 2020 Case Study CHI 2020, April 25–30, 2020, Honolulu, HI,
USA
CS36, Page 1
Introduction
While it is easy to make recommendations, it is hard to provide consistently good recommendations – especially when the questions can be vague and ill-formed, in any language, and on any topic. This is the fundamental problem of search engine query recommendation. A corresponding opportunity, arising from the growing dominance of mobile internet use, is that touch navigation of recommendation links is significantly faster and easier than manually typing new queries. Successive rounds of query recommendation also allow the user to define and refine their intent in an iterative, incremental, and exploratory fashion. Developing a universal solution for query recommendation thus has the potential to transform the search experience for all users.
Stage        Description
Awareness    Knowing of innovation
Interest     Seeking more information
Evaluation   Deciding on initial use
Trial        Learning from experiences
Adoption     Deciding on continued use

Table 1: The diffusion of innovations (Rogers, 1962 [7]).
Figure 2: From graph layout in 2D (top) to graph embedding projected into 3D (bottom). The proximity of related vertices is preserved.
This case study addresses the challenge of search query recommendation in the context of the Bing search engine, examining the critical role that data visualization and interactive data interfaces played in facilitating product group adoption of new recommendation mechanisms based on graph embedding. We describe the different qualities of the visual representations that drove the “diffusion” of this innovation (Table 1) from our applied research team in Microsoft Research, via key stakeholders in Bing product teams, to the broader Bing organization and onto bing.com.
The main lessons from this case study are twofold: (1) that the sensitivity of graph embedding to its configuration and the subjectivity of evaluating recommendation quality demand a shared medium for stakeholders to explore different options and discover best practices; and (2) that while standalone explanatory data visualizations may have sufficient power to develop awareness of and interest in new technologies, shared exploratory data interfaces may be necessary to drive the real-time evaluation by individual stakeholders that leads to collective trial and adoption.
Graph Embedding
Let G = (V, E) represent a graph (network) of vertices (nodes) connected by edges (links), where the edges may optionally have a weight (strength) and direction (flow). Graph embedding [1] describes a family of machine learning techniques that take conventional edge list or adjacency matrix graph representations, which are hard to reason about on account of their discrete and high-dimensional structure, and transform them into feature vector vertex representations that are relatively easier to reason about because their continuous and low-dimensional structure defines a metric space. Graph embedding algorithms aim to perform this transformation in ways that preserve local structure, such that vertices sharing similar edges in the graph are mapped to similar locations in the embedding (Figure 2). The result is that rather than edges specifying the presence of a relationship between vertices, the distance between any pair of vertices can be interpreted as a measure of their relatedness. This enables simple spatial implementations of three fundamental inference tasks: 1) vertex nomination – given a query vertex, find its nearest neighbours in the embedded space; 2) link prediction – given a similarity threshold, find possible edges missing from the input graph; and 3) community detection – find clusters of vertices preferentially connected to one another.
While stochastic approaches to graph embedding use random walks to characterize vertex neighbourhoods before learning multidimensional representations (e.g., DeepWalk [5] and node2vec [4]), spectral approaches use eigendecomposition to factor matrix representations of the graph into a multidimensional set of orthogonal basis vectors. Different methods group vertices in different ways, e.g., Laplacian spectral embedding (LSE) groups vertices with similar connections, while Adjacency spectral embedding (ASE) groups vertices with a similar structural role (e.g., “hub”) [6].
Navigation Graphs from Query Logs
As a way of understanding the relationships between search engine queries, we are interested in inducing navigation graphs that are represented implicitly in search engine query logs (Figure 3). These logs record both queries and clicks (on page results, query recommendations, and ads) for anonymous user sessions using Bing search. There is no one correct way to induce such a graph: the choice of vertex type (e.g., raw text vs normalized text vs named entities only), the relative weights given to different edge types (e.g., query → query vs query → recommendation), and the minimum threshold for edge inclusion all make a significant impact on the character of the results.
The transfer of graph embedding technologies as a diffusion of innovations
In the following series of sidebars, we provide a commentary on the process of technology transfer using quotes from two of our product group partners in Bing. We use the five stages of the “diffusion of innovations” to structure these observations.

Awareness – knowing of innovation. Our partners knew of graph embedding and its possible use, but only in a general sense:

“There were teams doing similar things, just not using the technology that we’re collaborating on and as a result, a lot of what they did was kind of handcrafted rather than a scaled solution.” (P2)

“I do think that the ranking team were aware of graph embedding and may have been using it in specific cases, but not across the whole graph.” (P2)
The use of graph representations to understand co-occurrence relationships is a well established technique used in diverse domains ranging from literary analysis [10] to intelligence analysis [8]. The induction of navigation graphs from user history has also been practiced for at least two decades [2]. Such “memory-based recommender systems” use collaborative filtering in conjunction with association measures like pointwise mutual information (PMI) to rank observed transitions [9]. The promise of embedding-based approaches is that they are able to recommend relevant transitions even in areas where the transition signal is sparse. Recent work from Taobao [11] reports substantial sparsity in item transitions on the Taobao e-commerce platform (
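The PMI baseline mentioned above ranks only transitions that were actually observed, which is why it struggles where the signal is sparse. A toy sketch of that baseline (the counts and names here are made up for illustration):

```python
import math

def pmi(count_qr, count_q, count_r, total):
    """PMI(q, r) = log( P(q, r) / (P(q) * P(r)) )."""
    p_qr = count_qr / total
    return math.log(p_qr / ((count_q / total) * (count_r / total)))

def rank_transitions(transitions, unigrams, total):
    """transitions: {(q, r): co-occurrence count}; unigrams: {q: count}.
    Returns (q, r, score) tuples sorted by PMI, descending."""
    scored = [(q, r, pmi(c, unigrams[q], unigrams[r], total))
              for (q, r), c in transitions.items()]
    return sorted(scored, key=lambda t: -t[2])
```

PMI rewards transitions that co-occur more than their individual popularities predict, so a rare but specific follow-up query outranks a globally popular one; unseen pairs simply get no score at all.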
Figure 4: VS Code extension for exploring graph embeddings using coordinated views of a 2D node-link graph (top), a 3D embedding projection (bottom), and a ranked list of nodes (right).
…embeddings and derived data such as hierarchical “communities” of related entities) such that they may be automatically indexed and joined with product group datasets (e.g., types and identifiers for knowledge graph entities) for interactive exploration in the browser. Our data application has three main interface tabs, introduced next.
With Hierarchy Viewer (Figure 5), the user can explore hierarchical communities of queries related to a given query, augmented with the query frequency and information on the dominant entity from our internal knowledge graph. Communities are detected from navigation graphs using a range of methods, including statistical (graph-based) and spatial (embedding-based) community detection techniques.
Interest – seeking more information. Our first graph visualizations persuaded our partners about the value of the approach, but the prevailing organizational culture was to persuade with data:

“I was just showing him how our users are navigating from entity to entity... He returned with this magnificent, wonderful, beautiful visualization of what the graph looks like and it was like, ‘Wow, this feels like a much more useful and interesting dataset than we first thought’.” (P1)

“For me, I was sold even at the very beginning with the initial tool [Figure 4], which was more of a strict visualization tool that shows you the whole of the graph color-coded... Right there, I got it, I totally understand what the value would be of having all of Bing in a graph.” (P2)

“There was initially a bias where people would look at this and think it was [just] a visualization tool... but really people are not interested in visualization tools in and of themselves.” (P2)
With Embeddings Browser (Figure 6), the user can explore many possible rankings of entity recommendations related to a given query, augmented with information from our knowledge graph and query logs (e.g., number of typed
Figure 5: Hierarchy Viewer for the query “Gone with the wind”, listing top entities in the matching leaf cluster. Includes a lead actor (Vivien Leigh), their character (Scarlett O’Hara), and their famous lover not cast in the film (Laurence Olivier).
queries vs requery clicks). Our use of the LineUp visualization [3] allows the user to rerank the recommended entities based on any attribute (e.g., its own frequency vs its similarity to the query). The user can also create weighted combinations of attributes by dragging columns onto one another, before dynamically reweighting the attributes (and thus reranking the results) by directly manipulating relative column widths. Recommendations may also be filtered interactively (e.g., based on frequencies and/or entity types) to derive multiple kinds of recommendation, each with a distinctive character and purpose in terms of user experience.
With Model Viewer (Figure 1), the user can load any subset of the embedding models for side-by-side comparison of the top N results across a series of queries. Result lists can be filtered to show their union (all items), intersection (common items), and symmetric difference (unique items).
Figure 6: Embedding Browser for the query “BMW”. Results ranked by similarity (top) are refined by jointly ranking on frequency (75%–25%) and filtering by entity type (bottom).

Toggling any item highlights that item in a common color
Evaluation – deciding on initial use. Experimentation persuaded segment owners to trial in production:

“When we meet with different segment owners, we show the tool and they really like playing with it, because they find for themselves and they understand for themselves that combining similarity with other signals works well in general... eventually they familiarize themselves with enough queries and results that they trust the data and give the go ahead to the engineers to take the raw data and try to scale this and flight this and ship this.” (P1)

Trial – learning from experiences. Personal evaluations outweighed prior results in other areas:

“the other segment owners, they all want to be individually convinced by more than just a flight... when I show them this view [Figure 6], it looks user friendly, it looks high quality, and they feel empowered to make their own decision.” (P1)
across all juxtaposed models, while “Rank by Selection” ranks all models based on their coverage of the selected items (using the Jaccard set similarity measure of intersection over union). Finally, “Model Stats” presents a matrix comparing all loaded models to one another using either Jaccard similarity or average precision (treating the model represented by each row as the ground truth ranking for all other models). Together, these capabilities allow detailed comparison of many candidate models across a wide range of queries, allowing the user to understand which of the existing models works best for their needs, and which
Figure 7: Similar products feature on bing.com, showing a carousel of recommended links for smartphones similar to the query “iPhone XR” (Country/Region: United States – English).
new models should be developed and evaluated next before flighting or releasing to production. Figure 7 shows the desktop and mobile experiences of one such feature shipped on bing.com – “Similar products” recommendation.
Finally, the web application offers a link to “Segment spreadsheets” – a SharePoint folder in which the top recommendations for all queries associated with each predefined search segment (e.g., health or automotive) are exported to their own Excel spreadsheets for custom analysis (e.g., by filtering and ranking in Excel, by importing to Business Intelligence platforms like Power BI and Tableau, or by performing data science using languages like R and Python).
How Visual Representations Drove Adoption
We interviewed two of our primary partners in Bing to understand their experience of the technology transfer process and how our interfaces have influenced sense-making and decision-making across the organization. Both are highly experienced program managers who have worked in a variety of Microsoft product groups before their current roles in Bing, where they are responsible for the development of new metrics and features, as well as the cross-group coordination with segment owners (e.g., for queries on health, sports, politics, food, etc.) required to ship new and improved features in production.

A central theme emerging from these interviews was that visual representations had played a vital role at each stage of the adoption process [7], which we have illustrated through a series of sidebars concluding in the left margin of this page. In the sections below, we reflect on how exploratory data interfaces helped to make sense of search.
Adoption – deciding on continued use. Improvements in user experience and revenue ultimately secured continued investment:

“In the past we were just using PMI to do related entities, but the unique challenge in product segments is that there is a very sparse signal of users transitioning from one product to another. The cool thing now is the similarity score, where even if you don’t have a lot of raw signal, we’re able to find close relationships. So really it helped us to create the related entities feature that we couldn’t have done just with raw data... and it’s having a direct monetary impact on Bing, it’s really exciting.” (P1)

Overall, this analysis shows the power of exploratory data interfaces to overcome both the sensitivity of machine learning and the subjectivity of human decision makers, as well as the power of graph models and embeddings to reveal latent value in data.
Visual representations suggest new metrics and experiences
P1 described the close relationship between new visualizations and metrics inspired by newly-visible phenomena: “The question is, ‘What should the next metric be, and how should we get people to buy into that metric?’ And really visualization is the key there, because if people can see the problem, they will want to act on it and track the improvement over time”. From “the maybe 50-60 metrics” he had created and presented to executives, navigation graph metrics had “gained the most traction”. One reason was the clear advantage that embedding-based methods had over the use of PMI, which was the prevailing standard practice: “It’s what you’re blind to with PMI ... let’s say you’re on this specific node. PMI is just going to tell you, ‘go here, go here, go here, go here... [to adjacent nodes]’, but that’s not the only thing the user should do. They should also move to nearby clusters”. This sense of ‘bottom up’ query clustering in ways that may be independent of ‘top down’ segment definitions, along with a sense for the spatial proximity of such clusters, both emerged from exposure to our early node-link visualization of navigation graphs (Figures 3 & 4).
Visual clusters reveal gaps in existing knowledge
P2 explained how one of the most important capabilities offered by our graph embedding approach was the automatic generation of “reasonably clustered sets where we can kind of narrow down and identify how dense the entity graph is within that particular cluster”. Tool users can thus interactively answer the questions, “Are there missing entities in that particular cluster? Are the entities too generic? Do we need more specific entities in [our knowledge graph]?”. At the same time, both partners critiqued how the query-driven, list-based representation in this view showed only a narrow slice through the navigation graph, and thus lacked the perceptual and navigational affordances of the node-link visualizations we had initially used to communicate the ideas of navigation graphs and graph embeddings (Figures 3 & 4). While our design choices with the current tool were guided by the ‘search and list’ paradigm most familiar to our partner team, this reinforces the idea that new tools can lead to cultural change (which the tools must then follow).
P1 gave a detailed account of how a Bing-scale navigation graph combined with the ability to experiment in real-time enabled a new kind of collaborative decision-making: “Let’s say I’m on a call and they say, ‘I have this problem...’, I can go there in real-time and try their specific query, their specific segment, tweak the ranking, the filtering, and show it to them in real-time... being able to explore all this intelligence from the dataset in real-time is valuable because you are riding the wave of the meeting you’re in... Instead of saying, ‘let’s schedule another meeting’, or ‘I’ll reply after the meeting with some extra insights’, you can extract it in real-time and have a discussion in real-time... there’s a lot of value in the tool but what I like the most is that I have access to the full dataset and I can explore it in real-time”.
Visual juxtaposition builds understanding of methods
P1 called out Model Viewer as ending up “the most useful to us” as it allowed direct model comparison at scale, emphasizing their similarities and differences as well as their suitability to different problems: “We can compare different embedding algorithms side-by-side and this is how we really understood that one algorithm is really good for typos, one is really good for sub-intent, one is really good for related entities. That was really a breakthrough there, and once you understand which embedding algorithm is good for what, then you need to try it with different permutations of hyperparameters for tuning, and this is the other part where this tool was useful, because then you can try like 200 permutations and figure out which one’s best”. Showing an example query for ‘Bill Gates’, P2 also called out the different qualities of result lists arising from the use of Adjacency vs Laplacian spectral embedding: “In the ASE model, you can see that this is a really tight set of entities, especially at the top. This would be the ‘find missing entities and find which entities are related to a specific entity’ cut. Whereas the LSE model shows a lot of re-queries with the term Bill Gates, which is so powerful... For example, you could look at the popular re-query terms for all the people that are like Bill Gates, and then you could rank those to get the best set of re-query terms for that particular entity type. And by entity type, Bill Gates would be people.person, but if you look at this [ASE] list you can clearly see we could create a new type of people.person.billionaire”. These insights – that multiple embedding methods can be used together to infer new entity types as well as their sub-intents – were both consequences of visual model juxtaposition.
Conclusion
In this case study, we have shown the value of both inducing and embedding navigation graphs as a novel approach to “making sense of search”. While the resulting data assets may be a necessary foundation for universal query recommendation, such data alone are insufficient for driving decisions by product owners to trial the data and associated algorithms in production. In our experience, such decisions greatly benefit from the visual representation and exploration of data, especially when the data are comprehensive, the queries are relevant to the viewer, and the results are tunable based on subjective notions of quality.

At the same time, results that look good to software engineers and program managers in principle may not perform well with end-users in practice, and the continued use and broader adoption of navigation graph embeddings within Bing will always depend on their measured impact on both user engagement and revenue. Our models are driving experimental flights of both desktop and mobile experiences, as well as shipped experiences for “Similar products” and “People also search for” in a limited but expanding set of segments and markets. We are observing statistically significant improvements on the general metrics of session success rate, time to success, and estimated revenue per click, as well as recommendation clicks and carousel actions. We hope that our work continues to transform the Bing user experience for the better and that others may benefit from the lessons of this case study.
Acknowledgements
We thank our team members in Microsoft Research, our product group partners in Bing, and Carey Priebe and his team at the Johns Hopkins University Applied Mathematics and Statistics Department.
REFERENCES
[1] Hongyun Cai, Vincent W. Zheng, and Kevin Chen-Chuan Chang. 2018. A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering 30, 9 (2018), 1616–1637.

[2] Xiaobin Fu, Jay Budzik, and Kristian J. Hammond. 2000. Mining Navigation History for Recommendation. In Proceedings of the 5th International Conference on Intelligent User Interfaces (IUI ’00). ACM, New York, NY, USA, 106–112. DOI: http://dx.doi.org/10.1145/325737.325796

[3] Samuel Gratzl, Alexander Lex, Nils Gehlenborg, Hanspeter Pfister, and Marc Streit. 2013. LineUp: Visual analysis of multi-attribute rankings. IEEE Transactions on Visualization and Computer Graphics 19, 12 (2013), 2277–2286. https://github.com/lineupjs

[4] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 855–864. https://github.com/aditya-grover/node2vec

[5] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 701–710. https://github.com/phanein/deepwalk

[6] Carey E. Priebe, Youngser Park, Joshua T. Vogelstein, John M. Conroy, Vince Lyzinski, Minh Tang, Avanti Athreya, Joshua Cape, and Eric Bridgeford. 2019. On a two-truths phenomenon in spectral graph clustering. Proceedings of the National Academy of Sciences 116, 13 (2019), 5995–6000.

[7] Everett M. Rogers. 1962. Diffusion of Innovations (1st ed.). Free Press of Glencoe, New York.

[8] John Stasko, Carsten Görg, and Zhicheng Liu. 2008. Jigsaw: supporting investigative analysis through interactive visualization. Information Visualization 7, 2 (2008), 118–132.

[9] Eva Suárez-García, Alfonso Landin, Daniel Valcarce, and Álvaro Barreiro. 2018. Term Association Measures for Memory-based Recommender Systems. In Proceedings of the 5th Spanish Conference on Information Retrieval. ACM, 6.

[10] Romain Vuillemot, Tanya Clement, Catherine Plaisant, and Amit Kumar. 2009. What’s being said near “Martha”? Exploring name entities in literary text collections. In 2009 IEEE Symposium on Visual Analytics Science and Technology. IEEE, 107–114.

[11] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in Alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 839–848.