
Final project - SJTU



Final project

Group members: Xinzhe Cao, Fan Zhou, Zheyu Hua, Hanwen Liu

June 23, 2018


Contents

1 Brief introduction of our website
1.1 The composition of the website
1.2 The main functions of the website

2 Main algorithm of the website
2.1 Search engine design
2.2 Recommendation algorithm

3 Data Visualization
3.1 Force-Directed-Graph
3.1.1 3-step Process

4 The database improvements
4.1 Elasticsearch
4.1.1 Installation
4.1.2 Searching
4.1.3 Request Body
4.1.4 Space
4.1.5 Codes
4.1.6 Comparison
4.1.7 Kibana
4.2 The improvement of our SQL database

5 The UI designing
5.1 Home page designing
5.2 Navigation parts
5.2.1 Top navigation
5.2.2 Side navigation
5.3 Other pages


1 Brief introduction of our website

1.1 The composition of the website

Home page

– Provides a search engine for titles, authors and venues (conferences). Depending on the search type, it leads to one of these pages:

∗ result page for author

∗ result page for title

∗ Conference page

Author page

– Provides two kinds of information about the target author:

∗ Papers of the author

∗ Relationship graph of the author

Paper page

– Includes two kinds of information:

∗ Detailed information about the paper

∗ Recommended related papers

Conference page

– Includes two kinds of information:

∗ Brief introduction of the conference

∗ All the papers of the conference

Reference page

– Includes two kinds of information:

∗ Detailed information about the paper

∗ Papers that cite the target paper

1.2 The main functions of the website

1 A search engine for authors, papers and conferences.

– On the home page, a search box lets users search by author, paper title or conference.

– We tried several ways to improve the response speed of the website. By applying Elasticsearch together with some SQL improvements, we managed to bring the response time of each page down to about 1 s or less.

2 Complete information, with hyperlinks between pages for easier browsing.

– For each paper we list its title, year, conference, affiliation, and all of its authors in author-sequence order. The title, the authors and the conference are links to their own pages.

3 Data visualization to present information about each author more usefully.

– A force-directed graph on the author page shows the relationships between the author and his or her collaborators. We use softmax regression to predict the type of relationship with each collaborator, and draw the graph with d3.js.
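The report does not give the features or classes used for the softmax regression, so the following is only a hedged illustration of the technique itself: a minimal pure-Python softmax (multinomial logistic) regression trained by batch gradient descent on the mean cross-entropy loss. The feature vectors and the three relationship classes are hypothetical stand-ins, not the project's real data.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _scores(W, x):
    """Linear class scores; the last weight in each row is the bias."""
    xb = list(x) + [1.0]
    return [sum(w * v for w, v in zip(row, xb)) for row in W]


def train_softmax(X, y, n_classes, lr=0.5, epochs=2000, seed=0):
    """Fit softmax regression by batch gradient descent on the mean cross-entropy."""
    rng = random.Random(seed)
    n_weights = len(X[0]) + 1  # features plus bias
    W = [[rng.uniform(-0.01, 0.01) for _ in range(n_weights)]
         for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_weights for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            probs = softmax(_scores(W, x))
            for k in range(n_classes):
                # gradient of -log p_label w.r.t. score k is p_k - [k == label]
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_weights):
                    grad[k][j] += err * xb[j]
        for k in range(n_classes):
            for j in range(n_weights):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(W, x):
    """Return the index of the most probable class for feature vector x."""
    scores = _scores(W, x)
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with hypothetical standardized features per collaborator (say, co-authored papers and affiliation overlap) and labels 0, 1, 2, `W = train_softmax(X, y, n_classes=3)` followed by `predict(W, x)` gives a class index that could serve as the node `group` in the graph.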


2 Main algorithm of the website

2.1 Search engine design

To support different search directions (author, title or conference), I use a drop-down <select> element:

<select name="catid" id="catid_1" class="selectpicker form-control"
        onchange="show_or_hide(this.value)">
  <option value="result.php">Author</option>
  <option value="title.php">Title</option>
  <option value="conference.php">Conference</option>
</select>

The onchange handler reads the selected option in real time. Depending on the selection, it changes the target page of the form and attaches a different autocomplete function to the search box.

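<script type="text/javascript">
function show_or_hide(v) {
  if (v == "result.php") {
    $(function () {
      $("#key").autocomplete({
        source: "search.php",  // server page that returns author suggestions
        minLength: 2,          // only suggest after 2 characters are typed
        autoFill: true         // not a documented jQuery UI autocomplete option; may be ignored
      });
    });
  }
  // ....
}
</script>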
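The page points the author autocomplete at search.php, whose code is not shown. Assuming the widget is jQuery UI's autocomplete (when `source` is a URL, it sends the typed text as a `term` query parameter and expects a JSON array of suggestions), the matching logic behind such an endpoint could look like this Python sketch. The real endpoint is PHP; the function name, the in-memory candidate list and the prefix-first ordering are hypothetical.

```python
import json


def autocomplete_suggestions(term, candidates, limit=10, min_length=2):
    """Return a JSON array of suggestions for the typed `term`.

    Mirrors minLength: 2 on the client, so shorter terms get no suggestions.
    Matching is case-insensitive; names that start with the term are listed
    before names that only contain it somewhere else.
    """
    term = term.strip().lower()
    if len(term) < min_length:
        return json.dumps([])
    prefix, inner = [], []
    for name in candidates:
        lowered = name.lower()
        if lowered.startswith(term):
            prefix.append(name)
        elif term in lowered:
            inner.append(name)
    return json.dumps((prefix + inner)[:limit])
```

In practice the candidates would come from a database query (for example on the author table) rather than from a Python list.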

We also attach a function, selectAction(), to the search button. It reads the currently selected option and rewrites the form's action URL accordingly, so the same search box can run different types of search.

HTML:

<button class="btn btn-info" style="background-color: white" aria-label="Left Align"
        onclick="selectAction();">
  <!-- button content not shown -->
</button>

JavaScript:

function selectAction() {
  var url = "http://localhost/begin/";
  var selector = document.getElementById("catid_1");
  var theForm = document.getElementById("head_sh");
  // value of the selected option: result.php, title.php or conference.php
  var checkValue = selector.options[selector.selectedIndex].value;

  // send the form to the result page that matches the selection
  theForm.action = url + checkValue;
  theForm.submit();
}

2.2 Recommendation algorithm

Another feature of our site is paper recommendation. Google Scholar and other scholarly websites commonly show a list of related papers, so we added this feature too. The key problem is to design an algorithm that recommends papers related to the one the user is viewing.


Table 1: Recommended papers in Baidu Scholar


We decided that our page will draw on three types of papers:

• Other papers by the same authors

• Papers cited by this paper

• Papers with similar titles

We then sort these candidates so that the most influential papers appear first. As the measure of influence we use the number of times each paper is cited, which we precompute with a Python script:

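# Assumes: f is the open tab-separated paper file (PaperID in the first column),
# and db/cursor are an open MySQL connection and cursor created elsewhere.
# The %s placeholders (MySQLdb/PyMySQL style) let the driver escape values.
for line in f:
    line = line.strip('\n')
    if not line:
        break  # stop at the first empty line, as the original loop did
    fields = line.split('\t')
    paper_id = fields[0]
    # Count this paper's rows in paper_reference (COUNT(*) gives 0 if none)
    cursor.execute(
        "SELECT COUNT(*) FROM paper_reference WHERE PaperID = %s",
        (paper_id,))
    count = cursor.fetchone()[0]
    # Store the count in papers.ReferenceTime
    cursor.execute(
        "UPDATE papers SET ReferenceTime = %s WHERE PaperID = %s",
        (count, paper_id))
db.commit()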

Then, on the paper page, we use either SQL or Elasticsearch to look up the candidates for the current PaperID and show the top 5 or 10 papers of the result list. This part is mostly page and UI work.
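The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not the site's code: it assumes the three candidate lists (same-author papers, cited papers, similar-title papers) have already been fetched as PaperID lists, and that citation counts are available as a dict, for example loaded from the ReferenceTime column filled in by the script above. The function name, the default k = 5 and the PaperID tie-break are illustrative choices.

```python
from typing import Dict, Iterable, List


def recommend(current_id: str,
              same_author: Iterable[str],
              cited: Iterable[str],
              similar_title: Iterable[str],
              cited_num: Dict[str, int],
              k: int = 5) -> List[str]:
    """Merge the three candidate sources and return the k most-cited papers.

    Duplicates are removed, the paper being viewed is excluded, and ties on
    citation count are broken by PaperID so the order is deterministic.
    Papers missing from cited_num are treated as having 0 citations.
    """
    candidates = set(same_author) | set(cited) | set(similar_title)
    candidates.discard(current_id)
    ranked = sorted(candidates, key=lambda pid: (-cited_num.get(pid, 0), pid))
    return ranked[:k]
```

Using a set to merge the sources means a paper that appears in several lists is shown only once; weighting papers that appear in more than one list would be a natural refinement.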

3 Data Visualization

3.1 Force-Directed-Graph

A force-directed graph layout is an algorithm that positions the nodes of a graph in two- or three-dimensional space so that all edges have roughly equal length and as few edges cross as possible. It does this by assigning forces among the edges and nodes based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy.
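As a concrete illustration, here is a minimal pure-Python sketch of one classic force-directed method in the style of Fruchterman and Reingold: every pair of nodes repels with force k²/d, every edge attracts with force d²/k, and each node's step is capped by a cooling "temperature". This is not how d3 does it (d3-force integrates velocities with a velocity Verlet scheme and configurable many-body and link forces); the function and parameter names are illustrative.

```python
import math
import random


def force_layout(nodes, edges, width=1.0, height=1.0, iterations=200, seed=0):
    """Fruchterman-Reingold-style spring-electric layout.

    nodes is a list of hashable ids, edges a list of (a, b) pairs.
    k is the ideal edge length; each node moves at most `temp` per
    iteration, and `temp` cools linearly towards 0.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))
    start_temp = width / 10
    temp = start_temp
    for it in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        pairs = [(a, b, True) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
        pairs += [(a, b, False) for a, b in edges]
        for a, b, repel in pairs:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            # positive f pushes a and b apart, negative f pulls them together
            f = k * k / d if repel else -d * d / k
            disp[a][0] += dx / d * f
            disp[a][1] += dy / d * f
            disp[b][0] -= dx / d * f
            disp[b][1] -= dy / d * f
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temp)  # cap the move by the temperature
                pos[n][0] = min(width, max(0.0, pos[n][0] + dx / length * step))
                pos[n][1] = min(height, max(0.0, pos[n][1] + dy / length * step))
        temp = start_temp * (1 - (it + 1) / iterations)
    return {n: (p[0], p[1]) for n, p in pos.items()}
```

For two connected nodes, repulsion and attraction balance when k²/d = d²/k, that is d = k; in the unit square k = √(1/2) ≈ 0.71. The pairwise repulsion loop costs O(n²) per iteration, which is fine for one author's collaborators but not for very large graphs.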

Here are some examples of the Force-Directed-Graph.

Table 2: result page1


3.1.1 3-step Process

In our project, the force-directed graph (FDG) on the author page shows the relationships between academic collaborators, so that users can clearly see how the author is related to each collaborator.

Building a force-directed graph takes three main steps:

• Feed the node and link data in JSON form

• Set the canvas size and other basic constants

• Add details to improve the visualization

On our page, the relationships are predicted on the back end: we have already computed and stored them with a Python machine-learning method, so here we take them as given.
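The d3 code in this section reads `graph.nodes` (using `d.id` and `d.group`) and `graph.links` (using `d.value`). A hedged sketch of how a back end might serialize one author's collaboration data into that shape follows; the input format, the function name and the convention of giving the central author group 0 are hypothetical. It also assumes the link force looks nodes up by `id`, as in the standard d3 force-directed example, since d3's default link `id` accessor uses the node index.

```python
import json


def build_graph_json(author, coauthors):
    """Build the {"nodes": [...], "links": [...]} JSON read by the d3 code.

    author: name of the central author (given group 0 here).
    coauthors: maps each collaborator's name to (group, weight), where group
    is the predicted relationship class (node colour) and weight the link
    value (stroke width is Math.sqrt(value) on the page).
    """
    nodes = [{"id": author, "group": 0}]
    links = []
    for name, (group, weight) in sorted(coauthors.items()):
        nodes.append({"id": name, "group": group})
        links.append({"source": author, "target": name, "value": weight})
    return json.dumps({"nodes": nodes, "links": links})
```

Sorting the collaborators only makes the output deterministic; d3 does not depend on the order.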

The graph consists of nodes and links, so the JavaScript embedded in our PHP page first defines these two selections with d3:

// svg, color and graph are defined earlier in the page (not shown)
var link = svg.append("g")
    .attr("class", "links")
  .selectAll("line")
  .data(graph.links)
  .enter().append("line")
    // larger link values give thicker lines
    .attr("stroke-width", function(d) { return Math.sqrt(d.value); });

var node = svg.append("g")
    .attr("class", "nodes")
  .selectAll("circle")
  .data(graph.nodes)
  .enter().append("circle")
    .attr("r", 5)
    // colour each node by its group (predicted relationship type)
    .attr("fill", function(d) { return color(d.group); })
    .call(d3.drag()
        .on("start", dragstarted)
        .on("drag", dragged)
        .on("end", dragended))

Next, we add tooltips, connect the nodes and links to the force simulation, and define the tick and drag handlers:

// linkedByIndex (a lookup of connected node pairs) and simulation
// are created earlier in the page (not shown)
function isConnected(a, b) {
  return linkedByIndex[a.index + "," + b.index] ||
         linkedByIndex[b.index + "," + a.index] ||
         a.index == b.index;
}

// show each node's id as a tooltip
node.append("title")
    .text(function(d) { return d.id; });

simulation
    .nodes(graph.nodes)
    .on("tick", ticked);

simulation.force("link")
    .links(graph.links);

// on every tick, move the lines and circles to their new positions
function ticked() {
  link
      .attr("x1", function(d) { return d.source.x; })
      .attr("y1", function(d) { return d.source.y; })
      .attr("x2", function(d) { return d.target.x; })
      .attr("y2", function(d) { return d.target.y; });
  node
      .attr("cx", function(d) { return d.x; })
      .attr("cy", function(d) { return d.y; });
}

  }   // closes a block opened earlier in the page (not shown)
});   // closes the enclosing data-loading callback (not shown)

// drag handlers: pin the node while dragging and reheat the simulation
function dragstarted(d) {
  if (!d3.event.active) simulation.alphaTarget(0.3).restart();
  d.fx = d.x;
  d.fy = d.y;
}

// older d3 v3-style drag helper; not used by the d3.drag() handlers above
function drag() {
  return force.drag()
      .on("dragstart", function(d) {
        d3.event.sourceEvent.stopPropagation();
        d.fixed = true;
      });
}

function dragged(d) {
  d.fx = d3.event.x;
  d.fy = d3.event.y;
}

function dragended(d) {
  if (!d3.event.active) simulation.alphaTarget(0);
  d.fx = null;
  d.fy = null;
}

With the code above, a working force-directed graph is in place. The graph is still plain, though, so to show the types of collaboration and the relationships more clearly, we style the nodes and add some interaction:

    // chained onto the node selection: emphasise the hovered node's links
    .on("mouseover", function(d) {
      link
          // thicken links touching the hovered node, thin out the rest
          .style("stroke-width", function(edge) {
            return (edge.source === d || edge.target === d) ? "2px" : "0.5px";
          })
          // draw the hovered node's links in black; null keeps the default colour
          .style("stroke", function(edge) {
            return (edge.source === d || edge.target === d) ? "#000" : null;
          });
    })
    .on("mouseout", function() {
      // remove the inline styles so the original stroke-width attribute
      // (Math.sqrt(d.value)) and the default stroke colour apply again
      link.style("stroke-width", null).style("stroke", null);
    });

This is part of the code that makes the links of the node under the mouse pointer stand out while the other links fade, so the current node is emphasized. Our final force-directed graph is shown below; the nodes can be dragged to explore its structure.

Table 3: Force-Directed-Graph results

4 The database improvements

4.1 Elasticsearch

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications that have complex search features and requirements.

4.1.1 Installation

I first installed the following tools to build the environment:

Java, Elasticsearch, Composer and curl.

With these in place, we can start searching.

4.1.2 Searching

Elasticsearch provides two ways to run a search: one is the request body, the other is the request URI.

4.1.3 Request Body

Elasticsearch provides a JSON-style domain-specific language that you can use to execute queries.


Qbody = {
    "query": {
        "match": {
            "Title": "home"  # match papers whose Title contains "home"
        }
    }
}
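A request body like this is sent to the `_search` endpoint of an index. The sketch below only prepares such a request with Python's standard library (it does not send it); the host, the `hwtry/paper` path and `size=100` follow the examples in this section, while the function name is hypothetical. The typed path `/index/type/_search` matches the Elasticsearch 6-era mapping types used here; mapping types were deprecated in 7.x and removed in 8.0, where the path is `/index/_search`. Since Elasticsearch 6.0, requests with a body must carry a Content-Type header.

```python
import json
import urllib.request


def build_match_request(field, text, index="hwtry", doc_type="paper",
                        host="http://localhost:9200", size=100):
    """Prepare (but do not send) a request-body search: a match query on
    one field, limited to `size` hits."""
    body = {"query": {"match": {field: text}}, "size": size}
    url = "{}/{}/{}/_search".format(host, index, doc_type)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a running cluster, `json.load(urllib.request.urlopen(build_match_request("Title", "home")))` would return the parsed response.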

• Request URI: this way is brief and suitable for our needs.

$value = "localhost:9200/hwtry/paper/_search?q=Title:*home*&size=100";

$value = "localhost:9200/hwtry/paper/_search?q=Title:a%20a&size=100";

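• Fuzzy search: we use * in place of missing letters. Strictly speaking this is a wildcard query (the query-string syntax has a separate ~ operator for fuzzy matching), but it serves our need for fuzzy search.

• More conditions: we use & to pass more than one parameter in the URL. In my code, size=100 limits the result to at most 100 hits, which improves efficiency and speed.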

4.1.4 Space

One problem held me up for several days: how to search for text that contains a space.

• Inspiration: fortunately, while browsing the web I noticed that Zhihu's search URLs encode a space as %20 (you can see the %20 in the picture). This standard URL percent-encoding solved the problem.

• Finding: not only Zhihu; I found that many other websites, such as Baidu and Wikipedia, also use Elasticsearch.
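Building the request URI by hand is where the space problem appears, so it helps to let a library do the percent-encoding. The following sketch assumes the same host, index and type as the PHP examples above; the function name and the `wildcard` flag are hypothetical. `urllib.parse.quote` turns a space into %20 and, with `safe=":*"`, leaves the field separator and wildcards readable.

```python
from urllib.parse import quote


def build_uri_search(field, text, size=100, index="hwtry", doc_type="paper",
                     host="http://localhost:9200", wildcard=False):
    """Build a URI-search URL such as .../_search?q=Title:*home*&size=100.

    Spaces and other reserved characters in the query are percent-encoded
    (a space becomes %20); ':' and '*' are kept as they are.
    """
    term = "*{}*".format(text) if wildcard else text
    q = quote("{}:{}".format(field, term), safe=":*")
    return "{}/{}/{}/_search?q={}&size={}".format(host, index, doc_type, q, size)
```

For example, `build_uri_search("Title", "a a")` yields the `q=Title:a%20a&size=100` form shown earlier.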

4.1.5 Codes

• Python: creating the database. I use Python to create the Elasticsearch index.

# one bulk-index action per paper; i is a running counter used as the document id
action = {
    "_index": "hwtry",
    "_type": "paper",
    "_id": i,
    "_source": {
        "paperID": paper[0],
        "Title": paper[1:-2][0],   # the field right after the PaperID
        "PaperPublishYear": paper[-2].rstrip("\n"),
        "ConferenceID": paper[-1].rstrip("\n")
    }
}

Indexing the data this way takes only about 7 seconds.
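Indexing this quickly normally relies on bulk requests. A hedged sketch of a generator that turns tab-separated paper records into actions shaped like the dict above follows; the record layout (PaperID, Title, ..., year, ConferenceID) is inferred from that dict, and the function name is hypothetical. Such a generator can be passed to `elasticsearch.helpers.bulk(client, actions)` from the official Python client. The `_type` key only applies to the Elasticsearch 6-era cluster used here; newer versions drop mapping types.

```python
def paper_actions(lines, index="hwtry", doc_type="paper"):
    """Yield one bulk-index action per tab-separated paper record.

    Assumed layout: PaperID, Title, ..., PaperPublishYear, ConferenceID.
    Blank or malformed lines (fewer than 4 fields) are skipped.
    """
    for i, line in enumerate(lines):
        paper = line.rstrip("\n").split("\t")
        if len(paper) < 4:
            continue
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": i,
            "_source": {
                "paperID": paper[0],
                "Title": paper[1],
                "PaperPublishYear": paper[-2],
                "ConferenceID": paper[-1],
            },
        }
```

Because it is a generator, the whole file never has to be held in memory; the bulk helper consumes it in chunks.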

• PHP: searching

require 'vendor/autoload.php';  // Composer autoloader for elasticsearch-php

$client = Elasticsearch\ClientBuilder::create()->setHosts(['localhost:9200'])->build();

$params['index'] = 'hwtry';
$params['type'] = 'paper';
$params['body']['query']['match']['Title'] = 'based on';
$results = $client->search($params);

Besides the documents themselves, the response contains further details such as the fields below. I want to emphasize the shards in particular: splitting an index into shards that can be spread across nodes is one of the great strengths of Elasticsearch.

# took
# timed_out
# _shards
# hits.total
# hits.max_score
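To make these fields concrete, the helper below pulls them out of a parsed search response (a Python dict, for example from `json.loads` on the response body). One version caveat: `hits.total` is a plain integer in Elasticsearch 6.x, but from 7.0 on it is an object with `value` and `relation`, so the sketch accepts both. The function and output key names are made up for illustration.

```python
def summarize_response(resp):
    """Extract took, timed_out, _shards, hits.total and hits.max_score.

    took is reported by Elasticsearch in milliseconds.  hits.total is an
    int in 6.x and {"value": ..., "relation": ...} from 7.0 on.
    """
    total = resp["hits"]["total"]
    if isinstance(total, dict):
        total = total["value"]
    shards = resp["_shards"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "shards_ok": shards["successful"] == shards["total"],
        "total_hits": total,
        "max_score": resp["hits"]["max_score"],
    }
```

Checking `shards_ok` is a cheap way to notice partial results: if some shards fail, the response can still arrive but with fewer hits.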

4.1.6 Comparison

As you can see, the original database needs 3.8 seconds, and the improved SQL database cuts this to 0.56 seconds. With Elasticsearch the time drops further to 0.08 seconds, about 2 percent of the original.


4.1.7 Kibana

Finally, I want to introduce Kibana, which made our interaction with Elasticsearch much easier.


4.2 The improvement of our SQL database

To meet the response-speed requirement, our group redesigned the database tables that store this data.

Old tables                  New tables
affiliations                new new paper
authors                     new paper
conference                  new author
paper author affiliation
paper reference
papers

Our new database needs only three tables. Because each table already holds the combined information a page needs, we no longer have to use JOIN and UNION operations, which greatly simplifies the program.

We now describe the three tables.

1 new new paper

– To meet the needs of the paper page, this table has the fields PaperID, Title, CitedNum, AffiliationName, PublishYear, ConferenceID and ConferenceName. It serves three needs:

1 Fuzzy search on the title.

2 Detailed information about a given paper.

3 The citation count of the paper.

2 new paper

– Derived from the new new paper table, it contains all the authors of each paper, so we can quickly retrieve them and list them in author order.

3 new author

– This table supports fuzzy search on authors. It contains the fields AuthorID, AuthorName, PaperNum and MainAffiliation, which is everything the author search needs.
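The idea behind these tables is to do the joins once, when the tables are built, instead of on every page request. A minimal sketch of that idea using Python's built-in sqlite3 (rather than the project's MySQL database) follows. The source tables (papers with a CitedNum column, conference), the identifier new_new_paper and the subset of fields are illustrative assumptions, not the actual schema.

```python
import sqlite3


def build_denormalized(conn):
    """Precompute a wide paper table so page queries need no JOIN.

    Only a subset of the new new paper fields is built here; the source
    tables and column names are illustrative stand-ins.
    """
    conn.executescript("""
        DROP TABLE IF EXISTS new_new_paper;
        CREATE TABLE new_new_paper AS
        SELECT p.PaperID, p.Title, p.CitedNum, p.PublishYear,
               p.ConferenceID, c.ConferenceName
        FROM papers AS p
        LEFT JOIN conference AS c ON c.ConferenceID = p.ConferenceID;
        CREATE INDEX idx_new_new_paper_id ON new_new_paper(PaperID);
    """)


def search_titles(conn, text, limit=10):
    """Fuzzy (substring) title search on the precomputed table, most-cited first."""
    return conn.execute(
        "SELECT PaperID, Title, ConferenceName FROM new_new_paper "
        "WHERE Title LIKE ? ORDER BY CitedNum DESC LIMIT ?",
        ("%" + text + "%", limit),
    ).fetchall()
```

One hedged caveat: a LIKE pattern with a leading % generally cannot use an ordinary B-tree index, so substring search on a large table remains comparatively slow, which is one plausible reason a full-text engine such as Elasticsearch is much faster for title search.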

With these improved tables, our website responds within 1 second, which is a substantial improvement.

5 The UI designing

5.1 Home page designing

I placed a background picture under the search bar. Since scandals of academic cheating have been springing up in recent years, I chose a picture of mountains. In Chinese culture (and perhaps many other cultures), mountains can hardly be moved, so they represent permanence. Through the page I intended to convey that scholars should stay true to their hearts and not be swayed by the outside environment. For us students, the background picture is also a reminder to stay upright and honest in our daily studies.

Below the search bar is some information about IEEE and ACM, the two largest associations in computer science and electronic engineering; this brief introduction adds to the practicality of our page. At the bottom of the home page is a footer containing our logo. If possible, we could also add more information there, such as the names and contact details of our team. There is also a side bar, activated by the icon in the top right corner of the home page, through which you can easily navigate to the rest of the pages.

Home

5.2 Navigation parts

For a website, good navigation helps visitors reach the page they are looking for.


5.2.1 Top navigation

So I designed a navigation bar at the top of the website.

As shown, it contains hyperlinks to the different pages and a search bar for finding target information.

5.2.2 Side navigation

For the side navigation, I used the Bootstrap plug-in affix.js.

This navigation tells visitors what information the page offers and helps them locate the target information more quickly. In the future we may add the titles or keywords of the papers to it, which would help visitors even more.


5.3 Other pages

For the other pages, I kept to the same idea: simple, efficient and easy to use. So the other pages are relatively simple in design but still contain enough information.

Result of the Author search

Author


Paper


Conference
