MANUFACTURING & SERVICE OPERATIONS MANAGEMENT
Articles in Advance, pp. 1–10
http://pubsonline.informs.org/journal/msom ISSN 1523-4614 (print), ISSN 1526-5498 (online)

NetEase Cloud Music Data

Dennis J. Zhang (a), Ming Hu (b), Xiaofei Liu (c), Yuxiang Wu (c), Yong Li (c)

(a) Operations and Manufacturing Management, Olin Business School, Washington University in St. Louis, St. Louis, Missouri 63130; (b) Operations Management and Statistics, Rotman School of Management, University of Toronto, Toronto, Ontario M5S 3E6, Canada; (c) Engineering Group, NetEase Cloud Music Inc., Hangzhou, Zhejiang 310052, China

Contact: [email protected], https://orcid.org/0000-0002-4544-775X (DJZ); [email protected], https://orcid.org/0000-0003-0900-7631 (MH); [email protected] (XL); [email protected] (YW); [email protected] (YL)

Received: March 15, 2020
Revised: May 21, 2020
Accepted: May 31, 2020
Published Online in Articles in Advance: December 8, 2020

https://doi.org/10.1287/msom.2020.0923
Copyright: © 2020 INFORMS
Abstract. This paper describes the impression/display data and corresponding user, creator, and music content card data from NetEase Cloud Music. This data set is collectively supplied by the Revenue Management and Pricing (RMP) section of INFORMS and NetEase Cloud Music to support data-driven research in operations management. The data contain more than 57 million impressions/displays of music content cards recommended to a random sample of 2,085,533 users from November 1, 2019 to November 30, 2019. For each impression, the data provide the corresponding user activities, such as clicks, likes, and follows. Moreover, the data set also contains information on each user, each content creator, and each content card in the impression sample.

Keywords: data competition • platform operations • revenue management
1. Introduction

With the development of faster internet speeds and better mobile connections, many people now stream music through services such as Spotify and Apple Music instead of purchasing hard-copy music CDs. In fact, a recent report from the Recording Industry Association of America (RIAA) reveals that streaming accounted for 80% of the U.S. music market in 2019, compared with 7% in 2010.1 According to the same report from RIAA, the number of music streaming subscribers in the United States rose from 1.5 million to around 61 million from 2010 to the first half of 2019.

The United States is not the only country in which music streaming is reshaping the music entertainment industry. Other countries observe a similar trend. In this paper, we describe a data set from one of the largest music streaming companies in China, NetEase Cloud Music (hereafter referred to as NCM). NCM is a free music streaming service developed and owned by NetEase, Inc. It was first launched on April 23, 2013 and then became immensely popular in China. According to a recent report, the music streaming service had around 800 million users in 2019, with a valuation of around $9 billion.2
The major product of NCM is its music app (hereafter referred to as the music app). Figure 1 shows the main user interfaces of this app. As we can see, there are five main tabs at the bottom of the app. From left to right, the first is the “main page” tab, which consists of the recommended albums and podcasts. The second is the “music video” tab, which contains a single feed of music videos. The third tab, in the middle, is the “my own music” tab, which shows one’s own locally stored music. The fourth tab is the “cloud village” tab, which contains two feeds of short music content cards (hereafter referred to as cards) that are recommended to a user. A music content card can be either a short video with background music or a set of pictures and texts with background music. The last tab is the “account and settings” tab, where users can change their account settings.

In this data set, we provide more than 57 million impression-level data points from the cloud village tab associated with 2,085,533 users in a one-month-long sample period from November 1, 2019 to November 30, 2019. Impression is a commonly used term in the advertising literature, which refers to the display of an advertisement on a web page to a user.3 In our context, each impression corresponds to a card displayed to a given user on his or her feed in the cloud village through the recommender system. We also provide users’, creators’, and cards’ characteristic information regarding all users, creators, and cards that appear in the impression data. Our data can be divided into six different tables, and we describe each table in detail in Section 2.

To help researchers identify practical problems that are of interest to NCM, we discussed the following list of research questions with the management group of NCM:

1. The company defines a user to be inactive if he or she has a zero or very low average click probability on recommended cards. The company wants to design recommender systems to make inactive users active and to make active users remain active. How could the company design different recommender systems to serve users with different activity levels?
2. What are the characteristics and preferences of active users on the platform? How could the platform predict whether a user will be active or inactive from his or her early actions, such as clicks, likes, and shares? How could the platform design the recommender system to maximize the number of active users?

3. How do different types of feedback information, such as the number of likes and follows, change a creator’s motivation to publish new content? How could the platform design the recommender system to encourage creators to create more content?

4. The company’s long-term goal for the cloud village tab is to maximize the daily number of clicks/plays and the daily number of content cards created. How could the company create a recommender system that trades off short-term goals, such as the number of clicks/plays in one day, with this long-term goal?
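
As a minimal sketch of the inactivity notion in question 1, the empirical click probability of a user can be estimated as the share of his or her impressions that were clicked. The rows below are toy data, and the cutoff INACTIVE_THRESHOLD is a hypothetical value chosen for illustration; the paper does not state the threshold that NCM uses.

```python
from collections import defaultdict

# Toy impression rows (userId, isClick); the IDs are made up for illustration.
impressions = [
    ("U1", 1), ("U1", 0), ("U1", 1), ("U1", 0),
    ("U2", 0), ("U2", 0), ("U2", 0),
    ("U3", 0), ("U3", 1),
]

INACTIVE_THRESHOLD = 0.05  # hypothetical cutoff, not a value from NCM


def click_rates(rows):
    """Return {userId: clicks / impressions} over all rows."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for user_id, is_click in rows:
        shown[user_id] += 1
        clicked[user_id] += is_click
    return {u: clicked[u] / shown[u] for u in shown}


def inactive_users(rows, threshold=INACTIVE_THRESHOLD):
    """Users whose empirical click rate is at or below the threshold."""
    return sorted(u for u, r in click_rates(rows).items() if r <= threshold)


rates = click_rates(impressions)
print(rates)
print(inactive_users(impressions))
```

With real data, the same aggregation could be restricted to a user's first few days to build the early-action predictors mentioned in question 2.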
2. Data Description

In this section, we describe all tables in the data set provided to researchers. To ensure confidentiality and user privacy, certain characteristics and key identification information, such as username, user ID, and music genre, are anonymized or dropped. The data set is centered around 2,085,533 users who have participated in the cloud village tab, as shown in Figure 1, during a one-month-long sample period from November 1, 2019 to November 30, 2019. As mentioned above, the cloud village tab is one of the five main tabs in the NCM app, and it is a platform where users can post short videos or sets of images with specific music. The users in our data are not the entire population of users who accessed the cloud village tab during our sample period. Rather, because of confidentiality, we randomly sampled a subset of users from all who accessed the focal tab at least once during the sample period.

Figure 2 shows the details of our focal tab, the cloud village tab. As shown, this tab can be divided into two subtabs: the discovery subtab and the follow subtab, which are shown at the top of the app. The discovery subtab shows two vertical streams of music video cards that are recommended to a user by the recommender system developed by NCM. Each card contains the first frame of the video or the first picture in a set of images, the creator’s information, a short description, and the number of likes that the card has accumulated until the current impression. The follow subtab shows the cards from creators that a user has followed, and these cards are ranked chronologically based on their publication time. Because the majority of the activities in the cloud village tab happen in the discovery subtab, and we would like to provide data that may be useful for operations management researchers to test various recommender systems, we focus on the impression data in the discovery subtab.

Figure 1. (Color online) Five Main Tabs on NCM App

These impression data start with a data table containing the 57,750,395 card impressions displayed to users in the discovery subtab. Each impression represents a card shown to a specific user at a particular time during the sample period. Because each impression consists of a user, a creator, and a card, in the other five tables, we also provide daily information with respect to each user, each creator, and each card that appear in the impression data table. Table 1 provides a summary of all six tables in the data set. In the following subsection, we first discuss the impression-level table and then move on to the card, creator, and user data tables.

Figure 2. (Color online) The Cloud Village Tab’s User Interface

Table 1. Summary of Tables in the Data Set
File | Category | Section in the paper | Data level | Number of observations
Impression_data.csv | Impression | 2.1 | impression-level | 57,750,395
Mlog_demographics.csv | Card | 2.2 | card-level | 252,955
Mlog_stats.csv | Card | 2.2 | card-day-level | 4,191,677
Creator_demographics.csv | Creator | 2.3 | creator-level | 90,534
Creator_stats.csv | Creator | 2.3 | creator-day-level | 2,572,512
User_demographics.csv | User | 2.4 | user-level | 2,085,533

2.1. Impression Data

Table 2 describes each data field in the impression-level data table. This data table contains 57,750,395 impression-level data points covering 2,085,533 unique users during the 30-day-long sample period. The table contains 13 data fields. The userId data field uniquely identifies each user in the entire data set, and it can be used to join this table with the user table in Section 2.4. The mlogId data field uniquely identifies each card and can be used to join this impression table with the card tables in Section 2.2. The impressTime data field records the epoch time when the impression is first shown to the user on his or her app (instead of the time when the user clicks on the impression). Each user may have multiple impressions in a given day; each card may be shown to multiple users during the sample period. Therefore, each row of impression data is uniquely identified by a combination of userId, mlogId, and impressTime, representing that a card is shown to a user at a particular time. Each impression for a user comes with a position in his or her feed stream, and it is recorded in the impressPosition data field. The position starts at one and is counted from left to right within a row and then from top to bottom. In other words, if we have four cards on the app screen, as shown in Figure 2, the upper left has position one, the upper right has position two, the lower left has position three, and the lower right has position four.
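
The position convention and the time stamp can be illustrated with a short sketch. It assumes that the discovery feed always has two columns, as described above, and that impressTime is measured in milliseconds since the Unix epoch; the latter is inferred from the magnitude of the sample value in Table 2 and is not stated explicitly in the paper. The function names are ours.

```python
from datetime import datetime, timezone

N_COLUMNS = 2  # the discovery subtab shows two vertical streams of cards


def grid_cell(impress_position, n_columns=N_COLUMNS):
    """Map a 1-based impressPosition to a 1-based (row, column) pair."""
    if impress_position < 1:
        raise ValueError("impressPosition starts at one")
    row, col = divmod(impress_position - 1, n_columns)
    return row + 1, col + 1


def impress_datetime(impress_time_ms):
    """Interpret impressTime as milliseconds since the Unix epoch (assumed)."""
    return datetime.fromtimestamp(impress_time_ms / 1000, tz=timezone.utc)


print([grid_cell(p) for p in (1, 2, 3, 4)])
print(impress_datetime(1574478123000).isoformat())
```

Under the millisecond assumption, the sample value 1574478123000 falls on November 23, 2019 (UTC), which lies inside the sample period.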

For each impression, the table provides the user’s actions on the recommended card. First, as shown in Figure 4(a), a user can click on a card once the impression of the card is presented to the user. Once the user clicks on the card, the music video in the card is automatically played in the user’s app in full screen mode if the card contains a video. If a card contains a set of images, the first image is shown to the user in full screen mode. Such an action is recorded in the isClick data field. Second, Figure 4(a) also shows that, after clicking on a card, the user can comment and view other comments on the card. As shown in the middle scenario of Figure 4(b), users can view or post comments on a card by clicking on the comment section while watching the card. Once a user clicks on the comment tab, the comment section showing other people’s comments appears along with a text box at the bottom. The user can view others’ comments or type text in the text box to post a comment. Moreover, Figure 4(a) displays that the total number of comments of a card is shown below the comment button to all users who have clicked on the card. Whether a user comments on a card for an impression is recorded in the isComment data field, and whether a user views others’ comments on a recommended card is recorded in the isViewComment data field.

Third, a user can also like a card by clicking on the thumb-up button. In the upper scenario of Figure 4(b), we can observe that, once the user clicks on the thumb-up button, the button turns from white to red and a big thumb-up logo appears in the center of the screen for a couple of seconds to indicate that the user has successfully liked the card. Comparing Figure 2 and Figure 4(a), we can see that the total number of likes of a card, unlike its total number of comments, is visible to all users, regardless of whether they have clicked on the card or not. Whether a user likes a card or not is recorded in the isLike data field. Fourth, a user can also click on the share button, as shown in the lower scenario of Figure 4(b). Once a user clicks on the share button, a share screen pops up from the bottom, asking the user which channel to share with. The user can share through various social media channels, such as WeChat and QQ. Whether a user decides to share the card from an impression, regardless of the channel, is recorded in the isShare data field. Moreover, a user can click on the creator’s personal profile logo above the like button to enter the creator’s personal page; whether a user enters the creator’s personal homepage from a card is recorded in the isIntoPersonalHomepage data field.

Last but not least, users can also swipe down on a card. Once a user swipes down, a new card is automatically recommended to the user and automatically shown in full screen mode. Notice that, if the current card has impression position n, the next card recommended to the user after swiping down may not have position n + 1 because the user’s actions of playing the current card give the algorithm more information and update the algorithm’s recommendation. If a user chooses to swipe down after watching a card recommended to him or her, the information regarding each of the cards shown after swiping down is stored in the detailMlogInfoList data field of the focal card’s data point in Table 2. The number of data points in detailMlogInfoList represents how many times the user has swiped down after clicking on a card. Notice that the impressions through swiping down are different from those in the discovery subtab. A card is automatically played if it comes from swiping down, but a card is played only after being clicked if it comes from the discovery subtab.

Table 2. Data Dictionary for impression_data.csv
Field name | Data type | Description | Sample value
UserId | String | The unique identifier of each user in the data set | MCPCHCMCHCIC
Dt | Numeric | The number of days from the start of the sample period | 11
MlogId | String | The unique identifier of each card in the data set | NCPCKCKCMCPCNC
ImpressTime | Numeric | The epoch time of the impression | 1574478123000
ImpressPosition | Numeric | The position of the impression in the feed | 10
IsClick | Binary | One if the user clicks on the card, zero otherwise | 1
IsComment | Binary | One if the user comments on the card, zero otherwise | 0
IsLike | Binary | One if the user likes the card, zero otherwise | 1
IsShare | Binary | One if the user shares the card, zero otherwise | 1
IsViewComment | Binary | One if the user views comments from other users on the card, zero otherwise | 0
IsIntoPersonalHomepage | Binary | One if the user enters the creator’s homepage through the card, zero otherwise | 0
MlogViewTime | Numeric | The number of seconds that the user has spent on the card | 136.09
DetailMlogInfoList | String | JSON file that contains all cards that the user sees if s/he swipes down | [{‘isZan’: ‘0’, ‘isComment’: ‘0’. . .]a
a. isZan is the same as isLike.
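
The following sketch shows one way to read a detailMlogInfoList cell. The sample value in Table 2 uses single quotes, which strict JSON parsers reject, so the sketch first tries JSON and then falls back to Python-literal syntax; the exact serialization in the released file is an assumption here. The cell below is a toy value, and only the keys isZan and isComment are taken from the paper.

```python
import ast
import json


def parse_detail_list(raw):
    """Parse a detailMlogInfoList cell into a list of dicts.

    Tries strict JSON first, then falls back to Python-literal syntax
    (single-quoted keys and values, as in the Table 2 sample).
    """
    if raw is None or raw.strip() == "":
        return []
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return ast.literal_eval(raw)


# Hypothetical cell value: two cards seen after swiping down.
cell = "[{'isZan': '0', 'isComment': '0'}, {'isZan': '1', 'isComment': '0'}]"
cards = parse_detail_list(cell)
n_swipes = len(cards)  # one entry per swipe-down impression
n_likes = sum(card["isZan"] == "1" for card in cards)  # isZan is isLike
print(n_swipes, n_likes)
```

Because the number of entries equals the number of swipes, the length of the parsed list can serve as a simple engagement measure for the focal impression.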
Figure 3. Tenure of Users, Creators, and Cards
Note. (a) User tenure; (b) creator tenure; (c) card tenure.

In summary, the impression-level data contain two sets of impressions: (a) impressions through the two streams of video cards in the tab, which are stored in each data point of Table 2, and (b) impressions through swiping down, which are stored in the detailMlogInfoList data field in Table 2.

Figure 4. (Color online) Actions on a Card
Note. (a) Clicking to play a card; (b) like, comment, and share
a card.

Furthermore, the table also provides the number of seconds for which the user has played the card. In other words, this is the difference between the time when the user clicks on the card and the time when the user swipes down, leaves the card by clicking the back button, or closes the app. If the card contains a video and the user is still on the video page when the video ends, the video automatically replays. The watch time of an impression is recorded in the mlogViewTime data field. Notice that a user’s total app usage in a day cannot be imputed from this watch time because the user may browse other tabs in the app on a given day.

2.2. Card Data

We then introduce two data tables regarding each card in the impression data. Panel A of Table 3 describes the card demographics table, which is at the card level. This table offers information about each card that does not change over time. Each row in the table is uniquely identified by the mlogId data field, representing each card. For each card, we use the songId data field to identify the song used in the background. Each card can have only one song associated with it, but each song can be used for multiple cards. In fact, the most popular song in this data set has been used for 92,426 cards. For each card, we also provide the unique artist of the song used in the card, which is recorded in the artistId data field. The creatorId data field stores a unique identifier for each creator in the data set, and it can be used to join this card-level table with the creator tables in Section 2.3. The publishTime data field represents the number of days between the time when the card was initially published and the end of the sample period, which is December 1, 2019. Figure 3(c) shows the histogram of the publishTime data field. It can be seen that, even though the sample period is one month long, the sample contains cards that were created before the sample period (i.e., cards with publishTime larger than 30). The type data field differentiates a music video card from an image card. The type is one if the card contains a set of images and text with background music and two if the card contains a music video. The contentId and talkId data fields represent anonymized categorical data related to each card. The contentId data field contains 122 levels representing the content category of the card, such as gaming or concert. The talkId data field has 9,914 levels indicating the hashtags used in the card, which are often about a particular event.
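
To make the publishTime and type conventions concrete, the sketch below converts a publishTime value into an approximate publication date and decodes the type field. It assumes that publishTime counts whole days back from December 1, 2019; the labels "image" and "video" and the function name are ours, whereas the codes 1 and 2 are those given in Table 3.

```python
from datetime import date, timedelta

SAMPLE_END = date(2019, 12, 1)         # reference date for publishTime
SAMPLE_DAYS = 30                       # November 1 to November 30, 2019
CARD_TYPES = {1: "image", 2: "video"}  # labels are ours; codes are from Table 3


def describe_card(publish_time, card_type):
    """Derive an approximate publish date and a readable type label."""
    published_on = SAMPLE_END - timedelta(days=publish_time)
    return {
        "published_on": published_on,
        "before_sample": publish_time > SAMPLE_DAYS,
        "kind": CARD_TYPES[card_type],
    }


# Sample values from Table 3: publishTime = 109, type = 1.
print(describe_card(109, 1))
```

For the sample row, the card is an image card published around August 14, 2019, that is, well before the sample period began.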

Besides this card-level table, the data set also includes a card-day-level table that provides the daily summary statistics of a card in the sample period. Panel B of Table 3 describes this data table. Notice that the summary statistics of a card include not only the actions generated by users in our random sample (i.e., users in Table 4) but also the actions of all other users on the platform who used the discovery subtab on the given date and interacted with the card. Each row in this table is uniquely identified by the combination of the mlogId data field and the dt data field, representing the summary statistics of a card for a given date.

We provide eight different summary statistics with respect to each card in a given day. The userImprssionCount refers to the total number of unique users a card was shown to in a given day. The userClickCount summarizes the total number of unique users who clicked on a card during a given date. The userLikeCount, userShareCount, and userCommentCount data fields represent the number of unique users who liked, shared, or commented on a card in a given day. Similarly, the userViewCommentCount and userIntoPersonalHomepageCount show the number of unique users who browsed comments on or entered the creator’s home page from a given card in a given day. Last, the userFollowCreatorCount shows the total number of unique users who followed the creator of a card through this card in a day. Note that this number does not represent the total number of new followers that a creator gains in a day because users can also follow the creator from other cards and/or from other tabs.
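
As an illustration of how the card-day-level counts can be used, the sketch below computes a daily click rate and a like rate per card. The CSV text is a toy excerpt: its first row reuses the sample values from Table 3, its second row is invented, and the lowercase column headers are an assumption about the released file.

```python
import csv
import io

# Toy excerpt in the layout of mlog_stats.csv; the first row uses the Table 3
# sample values, the second row is invented.
MLOG_STATS = """mlogId,dt,userImprssionCount,userClickCount,userLikeCount
NCPCKCKCMCPCNC,11,133,65,8
NCPCKCKCMCPCNC,12,0,0,0
"""


def daily_rates(csv_text):
    """Yield (mlogId, dt, click rate, like rate) per card-day row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        shown = int(row["userImprssionCount"])
        clicks = int(row["userClickCount"])
        likes = int(row["userLikeCount"])
        ctr = clicks / shown if shown else None         # undefined without impressions
        like_rate = likes / clicks if clicks else None  # likes per clicking user
        yield row["mlogId"], int(row["dt"]), ctr, like_rate


for record in daily_rates(MLOG_STATS):
    print(record)
```

Because these counts cover all platform users rather than only the sampled ones, such rates describe a card's platform-wide performance and need not match rates computed from the impression table.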

Table 3. Data Dictionary for mlog_demographics.csv and mlog_stats.csv

Panel A: Data dictionary for mlog_demographics.csv
Field name | Data type | Description | Sample value
MlogId | String | The unique identifier of each card in the data set | NCPCKCKCMCPCNC
SongId | String | The unique identifier of each song in the data set | LCLCNCGCPCLCPCJCGC
ArtistId | String | The unique identifier of each artist of a song in the data set | PCNCHCNCNCPCPCJC
CreatorId | String | The unique identifier of each creator of a card in the data set | KCJCKCNCNCLCLCICNC
PublishTime | Numeric | The number of days from when the card was published until December 1, 2019 | 109
Type | Binary | One if the card contains a set of images and text with background music, two if the card contains a music video | 1
ContentId | Numeric | The anonymized type of a card’s content with 122 unique levels | 500,150,125,068
TalkId | Numeric | The anonymized topic of a card with 9,914 unique levels | 27,004

Panel B: Data dictionary for mlog_stats.csv
Field name | Data type | Description | Sample value
MlogId | String | The unique identifier of each card in the data set | NCPCKCKCMCPCNC
Dt | Numeric | The number of days from the start of the sample period | 11
UserImprssionCount | Numeric | The number of unique users the card was shown to for a given date | 133
UserClickCount | Numeric | The number of unique users who clicked on the card for a given date | 65
UserLikeCount | Numeric | The number of unique users who liked the card for a given date | 8
UserCommentCount | Numeric | The number of unique users who commented on the card for a given date | 1
UserViewCommentCount | Numeric | The number of unique users who viewed others’ comments on the card for a given date | 1
UserShareCount | Numeric | The number of unique users who shared the card for a given date | 0
UserIntoPersonalHomepageCount | Numeric | The number of unique users who entered the creator’s homepage from this card for a given date | 0
UserFollowCreatorCount | Numeric | The number of unique users who followed the creator from the card for a given date | 0

2.3. Creator Data

Next, we discuss the two tables associated with each creator in the data set. The first table provides demographic information about each creator, and it is at the creator level. The detailed data fields of this creator-level table are described in panel A of Table 5. The unique identifier of this table is the creatorId data field. For each creator, we provide several key pieces of demographic information that could be revealed. First, we offer each creator’s predicted gender in the gender data field, which can be either “male” or “female.” This data field can also be either “NA” or “unknown,” both of which represent that the gender of the creator is not known to the platform. Second, we provide the tenure of each creator (i.e., the number of months for which this creator has been registered until December 1, 2019), which is recorded in the registeredMonthCnt data field. Figure 3(b) shows the histogram of this registeredMonthCnt data field, which indicates that the data set covers a wide set of creators with heterogeneous registration times on the platform. Third, we also provide each creator’s number of followers and the number of people he or she has followed by November 1, 2019. This information is stored in the followeds and follows data fields. Fourth, the creatorType data field gives the anonymized genre of creators, which has 10 levels. Last but not least, the level data field represents the activity intensity of the creator, ranging from 0 to 10. The activity intensity level is a combination of a user’s app time and frequency of interactions with the app; the smaller this number is, the less active the user is.

The second table in this category provides daily information about each creator, and it is described in panel B of Table 5. Each row in this table is uniquely identified by the combination of the creatorId and dt data fields, which represents a unique creator in a given day. For each creator in each day, we provide the number of cards that this creator has created in that day. This information is recorded in the PublishMlogCnt data field (listed as PushlishMlogCnt in Table 5).
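
The two creator tables can be combined through creatorId. The sketch below, on invented toy rows, sums each creator's daily card counts over the sample period and counts the days on which the creator published at least one card, then lists these next to the follower count from the creator-level table. The helper name monthly_output is ours.

```python
from collections import Counter

# Toy rows; IDs and counts are invented for illustration.
creator_demographics = {            # creatorId -> followeds (followers)
    "C1": 120,
    "C2": 0,
}
creator_stats = [                   # (creatorId, dt, daily card count)
    ("C1", 1, 2), ("C1", 2, 0), ("C1", 3, 1),
    ("C2", 1, 0), ("C2", 2, 1),
]


def monthly_output(stats):
    """Sum daily card counts per creator and count days with any publication."""
    cards = Counter()
    active_days = Counter()
    for creator_id, _dt, published in stats:
        cards[creator_id] += published
        active_days[creator_id] += published > 0  # True counts as 1
    return cards, active_days


cards, active_days = monthly_output(creator_stats)
for creator_id, followers in creator_demographics.items():
    print(creator_id, followers, cards[creator_id], active_days[creator_id])
```

Such creator-level aggregates are a natural starting point for research question 3 on how feedback relates to content production.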

2.4. User Data

The last part of our data consists of user demographic information for all users who appear in our impression data set. Table 4 describes each data field of this user-level table. Each row in the table is uniquely identified by the userId data field. Similar to the creator-level table, we provide six key pieces of information regarding each user in our sample. First, we provide the province where the user resides, which is recorded in the province data field. This data field can be any province in China or NA, representing that the province of the user is not known. Second, the age and gender data fields provide the predicted age and gender of the user in our data set. Third, similar to the creator-level table, the registeredMonthCnt data field counts the number of months between this user’s registration time and December 1, 2019. Figure 3(a) shows the histogram of the registeredMonthCnt data field. It can be seen that, even though the data sample is only one month long, it covers users with a wide spectrum of tenure on the platform. Fourth, we provide the number of people a user has followed until December 1, 2019, in the followCnt data field. Last, we also provide the activity intensity level of each user in the level data field. Notice that, because a user can also post videos on NetEase Cloud Village, a user can also be a creator. One can match userId in Table 4 with creatorId in Table 5 to obtain data on a user’s creation and consumption activities at the same time.4

Table 4. Data Dictionary for user_demographics.csv
Field name | Data type | Description | Sample value
UserId | String | The unique identifier of each user in the data set | MCPCHCMCHCIC
Province | String | The province in Pinyin that this user comes from | an hui
Age | Numeric | The predicted age of the user | 21
Gender | String | The predicted gender of the user | male
RegisteredMonthCnt | Numeric | The number of months between a user’s registration time and December 1, 2019 | 66
FollowCnt | Numeric | The number of people a user has followed till December 1, 2019 | 1
Level | Numeric | The activity intensity level of a user ranging from 0–10 | 10

3. Conclusion and Data Access

This data set provided by NCM consists of six tables describing 2,085,533 users’ impression-level activities from the discovery subtab on the NCM app from November 1, 2019 to November 30, 2019. NCM and the Revenue Management and Pricing (RMP) section of INFORMS collectively invite researchers to conduct novel data-driven research in the field of revenue management and innovative marketplace analysis by using this data set. Among all possible research questions, the team at NCM that provides the data is most interested in the ones described in Section 1.

As mentioned above, the six tables in the data set can be divided into four categories. The first category is about the impression data and contains the impression table (i.e., impression_data.csv). This table records all 57 million impressions of cards that those 2,085,533 users experienced in the discovery subtab during the 30-day-long sample period. The second category is about cards and consists of the card-level table and the card-day-level table (i.e., mlog_demographics.csv and mlog_stats.csv). These tables contain time-invariant information and time-varying daily information regarding each card in the impression data. The third category is about creators and contains two tables: one is at the creator level (i.e., creator_demographics.csv) and the other is at the creator-day level (i.e., creator_stats.csv). These tables contain information regarding each of the 90,534 creators who appear in our impression data. The last category has a user-level table, which provides user demographic information regarding each of the 2,085,533 users in our impression data set.

We believe that this data set is unique in several ways compared with other public data sets in the literature and provides unique research opportunities. First, unlike previous data sets used in the recommendation literature, such as the Expedia data set,5 this data set contains not only what products were recommended (i.e., the impression) but also the sequence of products that were recommended (i.e., the impression position). Such features allow researchers to explore dynamic recommendation policies. Second, although the past literature on user-generated content typically focuses on user-day-level data (see, e.g., Zhang and Zhu 2011, Huang et al. 2019), this data set provides impression-level details. Such details could allow researchers to study how different microlevel dynamics affect the demand and production of user-generated content on a platform. Third, compared with traditional data sets used in research on two-sided platforms (see, e.g., Shen et al. 2019), these data cover not only the demand-side consumption but also the production-side response to demand feedback, such as likes, shares, and comments. Researchers can utilize these data to study typical two-sided control problems on a platform, such as matching, which otherwise can be difficult to study.

To access the data and participate in the competition, members of the RMP section at INFORMS can go to the data hosting website on the INFORMS RMP section website and follow the download link. The data file is a compressed file that contains all six tables in the comma-separated values format.
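
After extracting the files, a first sanity check is that the identifying fields named in Section 2 are indeed unique in each table. The sketch below records those keys and reports repeated key tuples for an open CSV file. The lowercase file names follow this section, the column-header spelling is an assumption about the released files, and the helper duplicate_keys and the toy input are ours.

```python
import csv
import io

# File names and identifying fields as described in Sections 2.1 to 2.4.
PRIMARY_KEYS = {
    "impression_data.csv": ("userId", "mlogId", "impressTime"),
    "mlog_demographics.csv": ("mlogId",),
    "mlog_stats.csv": ("mlogId", "dt"),
    "creator_demographics.csv": ("creatorId",),
    "creator_stats.csv": ("creatorId", "dt"),
    "user_demographics.csv": ("userId",),
}


def duplicate_keys(file_name, handle):
    """Return the key tuples that occur more than once in an open CSV file."""
    key_cols = PRIMARY_KEYS[file_name]
    seen, dupes = set(), set()
    for row in csv.DictReader(handle):
        key = tuple(row[c] for c in key_cols)
        (dupes if key in seen else seen).add(key)
    return dupes


# Toy stand-in for creator_stats.csv with one repeated (creatorId, dt) pair.
toy = io.StringIO("creatorId,dt,PushlishMlogCnt\nC1,1,2\nC1,2,0\nC1,1,3\n")
print(duplicate_keys("creator_stats.csv", toy))
```

For the impression table, holding every key in memory may be impractical at 57 million rows, so a chunked or database-backed check may be preferable there.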

Endnotes

1 See https://variety.com/2019/biz/news/music-streaming-soared-2010s-decade-riaa-1203454233/.
2 See https://www.musicbusinessworldwide.com/alibaba-is-spending-2bn-to-acquire-20-of-netease-cloud-music-say-sources/.
3 See https://www.investopedia.com/terms/i/impression.asp.
4 Note that, because we only have 5% of users, not all creators in Table 5 in our data can be matched with a user. Similarly, not all users in Table 4 can be matched with a creator.
5 See https://www.kaggle.com/c/expedia-hotel-recommendations/overview.

References

Huang N, Burtch G, Gu B, Hong Y, Liang C, Wang K, Fu D, Yang B (2019) Motivating user-generated content with performance feedback: Evidence from randomized field experiments. Management Sci. 65(1):327–345.

Shen Z-J M, Tang CS, Wu D, Yuan R, Zhou W (2019) JD.com: Transaction level data for the 2020 MSOM data driven research challenge. Preprint, submitted January 23, https://ssrn.com/abstract=3511861.

Zhang XM, Zhu F (2011) Group size and incentives to contribute: A natural experiment at Chinese Wikipedia. Amer. Econom. Rev. 101(4):1601–1615.

Table 5. Data Dictionary for creator_demographics.csv and creator_stats.csv

Panel A: Data dictionary for creator_demographics.csv
Field name | Data type | Description | Sample value
CreatorId | String | The unique identifier of each creator of a card in the data set | KCJCKCNCNCLCLCICNC
Gender | String | The predicted gender of the creator, which can be unknown or NA | male
RegisteredMonthCnt | Numeric | The number of months between this creator’s registration time and December 1, 2019 | 66
Follows | Numeric | The number of people a creator has followed on November 1, 2019 | 66
Followeds | Numeric | The number of followers a creator has on November 1, 2019 | 1
CreatorType | Numeric | The anonymized type of a creator with 10 levels | 0
Level | Numeric | The activity intensity level of a creator ranging from 0–10 | 10

Panel B: Data dictionary for creator_stats.csv
Field name | Data type | Description | Sample value
CreatorId | String | The unique identifier of each creator of a card in the data set | KCJCKCNCNCLCLCICNC
Dt | Numeric | The number of days from the start of the sample period | 11
PushlishMlogCnt | Numeric | The number of cards this creator has created for a given date | 1