ETL into Neo4j Max De Marzi
ETL into Neo4j

May 10, 2015

Technology

Max De Marzi

Learn some of the ways to load data into Neo4j quickly.
Transcript
Page 1: ETL into Neo4j

ETL into Neo4j

Max De Marzi

Page 2: ETL into Neo4j

About Me

• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: [email protected]
• GitHub: http://github.com/maxdemarzi

Built the Neography Gem (Ruby wrapper to the Neo4j REST API)

Playing with Neo4j since 10/2009

Page 3: ETL into Neo4j

Agenda

• ETL your mind
• ETL with Batch and the REST API
• ETL with Gremlin and Groovy
• ETL with the Batch Importer
• ETL from SQL

Page 4: ETL into Neo4j

ETL your Mind

You have to start there

Page 5: ETL into Neo4j

More Relational than Relational

Stop thinking about how tables are related

Start thinking about relationships

Page 6: ETL into Neo4j

Objects like to mingle

Optimized for “trees” of data

Optimized for seeing the forest and the trees, and the branches, and the trunks

Page 7: ETL into Neo4j

SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1

Page 8: ETL into Neo4j

START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill
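To make the contrast between the two queries concrete, here is a minimal sketch in Python. The table contents, the `level` column, and the `user_skills` adjacency dictionary are made up for illustration; only the shape of the join comes from the slide. The first function runs the join in an in-memory SQLite database, the second follows relationships directly from the user, which is roughly what the graph query does.

```python
import sqlite3

# In-memory database with the three tables the join assumes.
# All rows and the "level" column are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level INTEGER);
    INSERT INTO users VALUES (1, 'max'), (2, 'ann');
    INSERT INTO skills VALUES (10, 'ruby'), (11, 'neo4j');
    INSERT INTO user_skill VALUES (1, 10, 5), (1, 11, 4), (2, 10, 3);
""")


def skills_via_join(user_id):
    """Relational style: find the skills by joining through user_skill."""
    return conn.execute(
        """
        SELECT skills.name, user_skill.level
        FROM users
        JOIN user_skill ON users.id = user_skill.user_id
        JOIN skills ON user_skill.skill_id = skills.id
        WHERE users.id = ?
        ORDER BY skills.id
        """,
        (user_id,),
    ).fetchall()


# Graph style: each user node already holds its outgoing relationships,
# so there is nothing to join; we just walk them.
user_skills = {
    1: [("ruby", {"level": 5}), ("neo4j", {"level": 4})],
    2: [("ruby", {"level": 3})],
}


def skills_via_traversal(user_id):
    """Follow the user's relationships and return (skill, level) pairs."""
    return [(skill, rel["level"]) for skill, rel in user_skills.get(user_id, [])]
```

Both functions return the same pairs for user 1; the difference is that the traversal starts at one node and never touches rows belonging to other users.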

Page 9: ETL into Neo4j

Property Graph

Page 10: ETL into Neo4j

Graph model:

Language (name, code, word_count) -[IS_SPOKEN_IN {as_primary}]-> Country (name, code, flag_uri)

Relational model:

Language (language_code, language_name, word_count)

Country (country_code, country_name, flag_uri)

LanguageCountry (language_code, country_code, primary)
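The mapping from the three tables to the property graph can be sketched as follows. This is only an illustration under assumptions: the function name `relational_to_graph`, the tuple keys used to identify nodes, and every sample row are hypothetical. The idea it shows is that each Language and Country row becomes a node, and each LanguageCountry join row becomes an IS_SPOKEN_IN relationship carrying `as_primary` as a property.

```python
def relational_to_graph(languages, countries, language_country):
    """Turn table rows into nodes and relationships.

    Nodes are keyed by (label, code). Each join-table row becomes one
    IS_SPOKEN_IN relationship with an as_primary property.
    """
    nodes = {}
    for row in languages:
        nodes[("Language", row["language_code"])] = {
            "name": row["language_name"],
            "code": row["language_code"],
            "word_count": row["word_count"],
        }
    for row in countries:
        nodes[("Country", row["country_code"])] = {
            "name": row["country_name"],
            "code": row["country_code"],
            "flag_uri": row["flag_uri"],
        }
    relationships = [
        (
            ("Language", row["language_code"]),
            "IS_SPOKEN_IN",
            ("Country", row["country_code"]),
            {"as_primary": bool(row["primary"])},
        )
        for row in language_country
    ]
    return nodes, relationships


# Made-up sample rows, for illustration only.
languages = [{"language_code": "fr", "language_name": "French", "word_count": 1000}]
countries = [{"country_code": "CA", "country_name": "Canada", "flag_uri": "flags/ca.png"}]
language_country = [{"language_code": "fr", "country_code": "CA", "primary": 0}]

nodes, relationships = relational_to_graph(languages, countries, language_country)
```

Note that the join table disappears as a table: its two foreign keys become the endpoints of the relationship and its remaining column becomes a relationship property.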

Page 11: ETL into Neo4j

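As a property on one node:

name: “Canada”, languages_spoken: “[ ‘English’, ‘French’ ]”

As nodes and relationships:

Country nodes: name: “Canada”, name: “USA”, name: “France”

Language nodes: language: “English”, language: “French”

Four spoken_in relationships connect the language nodes to the country nodes.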
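Pulling a list-valued property out into shared nodes can be sketched like this. The function name `explode_languages` and the input records are assumptions made for the example; the point is that a language that appears in several countries ends up as a single node with one spoken_in relationship per country.

```python
def explode_languages(countries):
    """Replace a languages_spoken list property with shared language nodes.

    Returns (language_nodes, relationships), where each relationship is a
    (language, "spoken_in", country_name) triple.
    """
    language_nodes = {}
    relationships = []
    for country in countries:
        for language in country["languages_spoken"]:
            # Create the language node only the first time we see it.
            language_nodes.setdefault(language, {"language": language})
            relationships.append((language, "spoken_in", country["name"]))
    return language_nodes, relationships


# Hypothetical input records for illustration.
countries = [
    {"name": "Canada", "languages_spoken": ["English", "French"]},
    {"name": "USA", "languages_spoken": ["English"]},
    {"name": "France", "languages_spoken": ["French"]},
]

language_nodes, relationships = explode_languages(countries)
```

With the languages as nodes, "which countries speak French?" becomes a one-hop traversal instead of a string search inside a property.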

Page 12: ETL into Neo4j

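One wide Country record:

Country (name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name)

Split into nodes and relationships:

Country (name, flag_uri) -[SPEAKS]-> Language (name, number_of_words, yes, no)

Country (name, flag_uri) -[USES_CURRENCY]-> Currency (code, name)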

Page 13: ETL into Neo4j

ETL with Batch and the REST API

Page 14: ETL into Neo4j

Batch command from REST API

Great for importing Facebook/Twitter friends

Keep each request under 10k commands

Preferably send a request every 2k to 5k commands
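The sizing advice above can be sketched as a small chunking helper. This is an illustration, not Neography's implementation: the helper names are hypothetical, and the command shape (`method`, `to`, `body`, `id`) follows the legacy Neo4j REST batch format as I understand it. The sketch only builds JSON payloads; actually POSTing them to the server's batch endpoint is left out.

```python
import json


def node_command(job_id, properties):
    """One 'create node' job in the legacy REST batch format (assumed shape)."""
    return {"method": "POST", "to": "/node", "body": properties, "id": job_id}


def chunked(commands, size=2000):
    """Yield successive slices of at most `size` commands."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(commands), size):
        yield commands[start:start + size]


def batch_payloads(commands, size=2000):
    """Serialize each chunk as one JSON request body."""
    return [json.dumps(chunk) for chunk in chunked(commands, size)]


# 4500 made-up friend records become three requests of 2000, 2000 and 500.
commands = [node_command(i, {"name": "friend-%d" % i}) for i in range(4500)]
payloads = batch_payloads(commands, size=2000)
```

One thing to watch with this kind of chunking: as far as I know, placeholders such as `{0}` refer to an earlier job in the same request, so a node and any relationship that references it by placeholder should land in the same chunk.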

Page 15: ETL into Neo4j

Using Batch from Neography

Page 16: ETL into Neo4j

Why Batch

Transactional: if any command fails, nothing is committed.

Ordered: responses guaranteed to be in the same order as sent.

Continuous loading/updating nodes and relationships in spurts or streaming.

Page 17: ETL into Neo4j

ETL with Gremlin and Groovy

Page 18: ETL into Neo4j

Commit every 1000 changes or so, and make sure to stop the transaction at the very end so the last few changes are committed.
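The commit pattern can be sketched independently of Gremlin. In the snippet below `write` and `commit` are hypothetical callbacks standing in for "create the vertex or edge" and "stop the transaction successfully"; the part worth noticing is the final commit after the loop, which picks up the leftover changes that did not fill a whole batch.

```python
def load_with_periodic_commit(items, write, commit, every=1000):
    """Write each item, committing after every `every` writes.

    Returns the number of commits performed. The trailing commit handles
    the last partial batch, which is easy to forget.
    """
    pending = 0
    commits = 0
    for item in items:
        write(item)
        pending += 1
        if pending == every:
            commit()
            commits += 1
            pending = 0
    if pending:  # commit the last few changes at the very end
        commit()
        commits += 1
    return commits


# Stand-in storage so the sketch runs without a database.
written = []
committed_sizes = []

commit_count = load_with_periodic_commit(
    range(2500),
    write=written.append,
    commit=lambda: committed_sizes.append(len(written)),
    every=1000,
)
```

Here 2500 items produce three commits, after 1000, 2000 and 2500 writes; without the final commit the last 500 would be lost.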

Look into auto-indexing to make life easier.

Auto-indexing is disabled by default. See the docs for a trick to make it a full-text index instead of an exact one.

http://docs.neo4j.org/chunked/milestone/auto-indexing.html

Page 19: ETL into Neo4j

Crazy format is OK

Id :: Title :: Genre|Genre|Genre

But it’s preferable to stay clear of escape characters like “|”

The location of the data file is given as a string, converted to a URL, then processed one line at a time.

A movie vertex is created, a genre vertex is created unless it already exists (index lookup), and an edge from movie to genre is created.
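The line-by-line load described above can be sketched in Python rather than Groovy. Everything named here is an assumption for illustration: the function names, the `hasGenre` edge label, and the sample lines. A plain dictionary plays the role of the genre index, so a genre vertex is created only when the lookup misses.

```python
def parse_movie_line(line):
    """Split 'Id :: Title :: Genre|Genre|Genre' into its three parts."""
    movie_id, title, genres = (part.strip() for part in line.split("::"))
    # Python's str.split takes "|" literally. In Groovy or Java, split()
    # takes a regular expression, so "|" has to be escaped there.
    return int(movie_id), title, [g for g in genres.split("|") if g]


def load_movies(lines):
    """Create movie vertices, genre vertices (once each) and edges."""
    movies = {}
    genre_index = {}  # stands in for the index lookup
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        movie_id, title, genres = parse_movie_line(line)
        movies[movie_id] = {"title": title}
        for name in genres:
            genre_index.setdefault(name, {"name": name})
            edges.append((movie_id, "hasGenre", name))
    return movies, genre_index, edges


# Made-up sample lines in the format from the slide.
sample = [
    "1 :: Movie A :: Comedy|Drama",
    "2 :: Movie B :: Drama|Crime",
]
movies, genre_index, edges = load_movies(sample)
```

"Drama" appears on both lines but ends up as a single genre vertex with two incoming edges, which is the effect of the index lookup.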

Full walk-through on http://maxdemarzi.com/2012/01/13/neo4j-on-heroku-part-one/

Page 20: ETL into Neo4j

ETL with the Batch Importer

Page 21: ETL into Neo4j

Installation Walk-Through

Page 22: ETL into Neo4j

Testing it

7.5M nodes, 42M relationships in just over 3 minutes on a laptop.

Page 23: ETL into Neo4j

Loading it into Neo4j

Full walk-through on http://maxdemarzi.com/2012/02/28/batch-importer-part-1/

Page 24: ETL into Neo4j

When to use the Batch Importer?

• 1st time loading or periodic reloading

• When you need Speed

• When you don’t mind a little Java

Page 25: ETL into Neo4j

ETL from SQL

Page 26: ETL into Neo4j

Identities who vouched for each other

row_number() and INTO are our friends

Page 27: ETL into Neo4j

The “term” vouched for will serve as our relationship type; status is a relationship property.

Page 28: ETL into Neo4j

Notice there are no node ids. These are automatic: clkao is node 1.
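The implicit numbering can be sketched as follows. This is an assumption-laden illustration: apart from clkao, the identity names, terms and statuses are made up, and the tab-separated layout with a `start end type status` header is my reading of the batch importer's input format, not something shown on the slide. Node ids are simply line positions starting at 1, and the relationship rows refer to those positions.

```python
def assign_node_ids(names, first_id=1):
    """Ids are implicit: the n-th line of the nodes file is node n."""
    return {name: first_id + offset for offset, name in enumerate(names)}


def relationship_rows(vouches, ids):
    """Translate (from_name, to_name, term, status) into id-based rows."""
    return [(ids[a], ids[b], term, status) for a, b, term, status in vouches]


def to_tsv(header, rows):
    """Render a header plus rows as tab-separated text."""
    lines = ["\t".join(header)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


# clkao comes from the slide; everything else is made-up sample data.
names = ["clkao", "alice", "bob"]
vouches = [("alice", "clkao", "Perl", "confirmed"), ("clkao", "bob", "Ruby", "pending")]

ids = assign_node_ids(names)
nodes_file = to_tsv(["name"], [(name,) for name in names])
rels_file = to_tsv(["start", "end", "type", "status"], relationship_rows(vouches, ids))
```

The nodes file carries no id column at all; the relationship file is the only place the numbers appear, so the order of the node rows must not change between generating the two files.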

Page 29: ETL into Neo4j

No time to get coffee >8-[

Page 30: ETL into Neo4j

What about multiple types of nodes?

No problem: just add the MAX(node_id) from the first table to the row numbers of the second.
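The offset trick can be sketched in a few lines. The helper name `number_rows` and the sample rows are hypothetical; the enumeration stands in for SQL's row_number(). The second node type starts counting where the first one stopped, so ids never collide.

```python
def number_rows(rows, offset=0):
    """Number rows 1, 2, 3, ... like row_number(), shifted by `offset`."""
    return [(offset + n, row) for n, row in enumerate(rows, start=1)]


# Made-up rows for two node types.
identities = number_rows(["clkao", "alice", "bob"])

# MAX(node_id) of the first table; 0 if that table is empty.
max_identity_id = identities[-1][0] if identities else 0

# The second table continues from there: ids 4 and 5 in this example.
skills = number_rows(["Perl", "Ruby"], offset=max_identity_id)

all_ids = [node_id for node_id, _ in identities + skills]
```

A third node type would use the highest id of the second table as its offset, and so on.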

Full walk-through at: http://maxdemarzi.com/2012/02/28/batch-importer-part-2/

Need help? E-mail me, catch me on Google chat or Skype.

Please don’t be shy… and read my blog:

http://maxdemarzi.com

Page 31: ETL into Neo4j

Thank you!

http://maxdemarzi.com