NoSQL - Graph databases (Neo4J), or the graph model - English Version

For the Portuguese version, click here!

/* Ladies and gentlemen DBAs!!!
     In my first post in English (sorry in advance), I would like to show you one of the most interesting database models: the graph database (or property graph data model).

     A graph is a model for storing data that is naturally structured as a network. A social network is a typical example: users connect to each other in many different ways (friend, follower, etc.), and they can also connect to many other objects such as ideas, products or blog posts (like, follow, etc.).

      Other good examples are hierarchies, corporate networks and semantic web applications.

Fig.1 – Modeling my family

      Instead of using tables, columns and PK/FK relationships, a graph database uses vertices, properties and edges.

      A vertex is an object with properties. Example: in a bookstore, the vertices could be books, authors and publishers. Books could be described with properties like Title, Genre, Price, Total Pages, etc. Authors get properties like Name, Birthday, Biography, etc.
             The edges connect the vertices (also called nodes) with labels like Written_by, Published_by, etc. Edges can also have properties.
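To make the vertex/property/edge vocabulary concrete, here is a minimal sketch in plain Python (not Neo4J or Gremlin). The bookstore titles, names and values are placeholders I made up, and the helper `follow` is hypothetical.

```python
# Vertices: id -> properties. All titles, names and values are placeholders.
vertices = {
    1: {"Kind": "Book", "Title": "Some Book", "Genre": "Fiction", "Price": 10.0},
    2: {"Kind": "Author", "Name": "Some Author"},
    3: {"Kind": "Publisher", "Name": "Some Publisher"},
}

# Edges: (source id, label, target id, edge properties).
edges = [
    (1, "Written_by", 2, {}),
    (1, "Published_by", 3, {"Year": 2000}),  # edges can carry properties too
]


def follow(vertex_id, label):
    """Return the properties of every vertex reached from vertex_id via label."""
    return [
        vertices[target]
        for source, edge_label, target, _ in edges
        if source == vertex_id and edge_label == label
    ]


print(follow(1, "Written_by"))
print(follow(1, "Published_by"))
```

Note that there is no join table here: the relationship itself is a first-class record that points at both ends.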


      The tool we are going to use in our "hands on" is Neo4J, which can be downloaded for free here. Neo4J is a Blueprints implementation, and Blueprints is a generic API for graph models. Other Blueprints implementations are TinkerGraph, OrientDB, DEX, InfiniteGraph, Rexster, and SailRDFStores.
      We will also use the Gremlin language to manipulate the data. Gremlin is based on Groovy and Pipes. There are other options we could use (the REST API, Cypher). The main point is to always choose the right tool for the job.
Note: Excuse me for the avalanche of new names. So many options can confuse any beginner, especially those of us coming from the relational world, where the products are more integrated and ready to use, but it is important to have an overview of all the tools involved in a graph DB ecosystem if you want to implement it in the real world.

Creating a graph

      If your OS is Windows, all you need to do is download the Neo4J package from the website, uncompress it and execute the Neo4J.bat file located in the Bin folder.
      The script will open a Java window (this is the service running).

      Neo4J provides a web interface hosted on your machine. By default, you can access the web admin at http://localhost:7474/webadmin/

   To start, click on the Console tab and select the Gremlin button in the upper-right corner.

Inserting vertices and properties, and listing them
   Let's create a basic social network. The vertices represent users, and the only edge type available will be KNOWS (CONHECE in Portuguese).
   Below you can see the code to add some users (a little tribute to some friends of mine).

            Just after the ASCII art, the webadmin instantiates the variable g, which represents the graph.
After that, we use the addVertex function to add our first vertex.
            The vertex properties are passed as a parameter of the function. It is a JSON-like map (a Groovy map literal) that basically has this structure:
         [PropertyName:PropertyValue, ...]

As you can see, one difference between a relational database and a NoSQL one is that NoSQL is schemaless, so the properties can vary from vertex to vertex.
To list the vertices, use g.V, where V is the collection of vertices inside the graph g. To list a specific property, type the property name after V. Example: g.V.Name or g.V.any_other_property.
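The idea of schemaless vertices can be sketched in plain Python. This is only an in-memory stand-in, not the Gremlin console: the class `Graph` and its methods `add_vertex` and `values` are hypothetical names, and the "Profissao" value is a placeholder.

```python
class Graph:
    """Tiny in-memory stand-in for a graph; only vertices are modelled here."""

    def __init__(self):
        self.vertices = {}  # vertex id -> dict of properties
        self._next_id = 1

    def add_vertex(self, properties):
        """Store a copy of the property map and return the new vertex id."""
        vertex_id = self._next_id
        self._next_id += 1
        self.vertices[vertex_id] = dict(properties)
        return vertex_id

    def values(self, name):
        """Roughly what g.V.<name> lists: the property of each vertex that has it."""
        return [props[name] for props in self.vertices.values() if name in props]


g = Graph()
g.add_vertex({"Name": "Felipe"})
g.add_vertex({"Name": "Bruno Salim", "Profissao": "placeholder"})  # extra property
g.add_vertex({"Name": "Gustavo"})
g.add_vertex({"Name": "Spigariol"})

print(g.values("Name"))
print(g.values("Profissao"))  # only vertices that define the property show up
```

Nothing had to be declared up front for "Profissao": one vertex simply carries a property the others do not.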

We notice that Bruno Salim's position ("profissão" in Portuguese) was written wrong. Let's update it (the vertex ID of Bruno is 28).

Very simple, and adding new properties is just as simple. For example, let's add a position to the Gustavo vertex.

Or add a completely new property (this is schemaless!).

     The map shows all of the vertex's properties (a JSON-like structure).
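As a rough Python analogy for the updates above, properties behave like entries in a dictionary. Vertex id 28 comes from the text; id 29 for Gustavo, the property name "Hobby" and all values are assumptions made up for this sketch.

```python
# Vertex id -> property map. Id 28 is from the post; id 29 is made up.
vertices = {
    28: {"Name": "Bruno Salim", "Profissao": "mistyped placeholder"},
    29: {"Name": "Gustavo"},
}

# Update an existing property on vertex 28.
vertices[28]["Profissao"] = "corrected placeholder"

# Add a property that Gustavo did not have before.
vertices[29]["Profissao"] = "placeholder"

# Add a completely new property name ("Hobby" is hypothetical): no schema change.
vertices[29]["Hobby"] = "placeholder"

# Show the whole property map of one vertex.
print(vertices[29])
```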

Connecting vertices
      Now we are going to create connections between vertices through the edge type KNOWS.

First of all, we assign the vertices to friendly named variables, then we add the edges between them.
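Here is the same two-step idea as a plain Python sketch. The function `add_edge` is a hypothetical helper, and the exact set of KNOWS connections is my assumption (the text only tells us that Spigariol knows Gustavo and that Felipe can reach Gustavo by two paths).

```python
# Step 1: friendly named variables, one per vertex.
felipe = {"Name": "Felipe"}
bruno = {"Name": "Bruno Salim"}
spigariol = {"Name": "Spigariol"}
gustavo = {"Name": "Gustavo"}

edges = []


def add_edge(source, target, label):
    """Record a directed, labelled edge from source to target."""
    edge = {"out": source, "in": target, "label": label}
    edges.append(edge)
    return edge


# Step 2: connect the vertices (assumed connections, see the note above).
add_edge(felipe, bruno, "KNOWS")
add_edge(felipe, spigariol, "KNOWS")
add_edge(bruno, gustavo, "KNOWS")
add_edge(spigariol, gustavo, "KNOWS")

for edge in edges:
    print(edge["out"]["Name"], edge["label"], edge["in"]["Name"])
```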
      Finally, let's answer a few typical social network questions and get a feel for the power of a graph DB.
      Gremlin queries are made of steps separated by dots.
A step uses the result of the previous step and can transform it, filter it, or produce side effects.
Who do I know?
      The Felipe object is me. We use the out step to return all vertices that receive edges from the object.

Who are the friends of my friends?
     By applying the out step again, we list the vertices connected to the vertices returned by the first out.
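The chaining of out steps can be imitated with a small Python function. The `knows` table below is an assumed set of connections (not taken from the original console session), and `out` is a hypothetical helper that mimics the step of the same name.

```python
# Who knows whom (directed). The connections are assumed for illustration.
knows = {
    "Felipe": ["Bruno Salim", "Spigariol"],
    "Bruno Salim": ["Gustavo"],
    "Spigariol": ["Gustavo"],
    "Gustavo": [],
}


def out(names):
    """For every incoming name, emit each vertex it points to."""
    return [friend for name in names for friend in knows[name]]


friends = out(["Felipe"])
friends_of_friends = out(out(["Felipe"]))

print(friends)             # ['Bruno Salim', 'Spigariol']
print(friends_of_friends)  # ['Gustavo', 'Gustavo']
```

Gustavo appears twice in the second result because he is reached once per friend; each step just consumes whatever the previous step produced.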

     One way to avoid typing the out step many times is to use the loop step. It repeats the previous step for as long as the condition in braces holds (in our case, we limited the loop to two iterations by checking the it.loops property).
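The looping idea can be sketched in Python as "repeat a step while a condition on the loop counter holds". This is a simplification: the helper `loop`, its zero-based counter and the `knows` connections are my assumptions, and Gremlin's own counter may be numbered differently.

```python
# Assumed connections, for illustration only.
knows = {
    "Felipe": ["Bruno Salim", "Spigariol"],
    "Bruno Salim": ["Gustavo"],
    "Spigariol": ["Gustavo"],
    "Gustavo": [],
}


def out(names):
    """For every incoming name, emit each vertex it points to."""
    return [friend for name in names for friend in knows[name]]


def loop(start, step, keep_going):
    """Apply step repeatedly while keep_going(loops) is true.

    loops counts completed passes, starting at 0 (a choice of this sketch).
    """
    current = list(start)
    loops = 0
    while keep_going(loops):
        current = step(current)
        loops += 1
    return current


# Two passes of out, written once instead of out(out(...)).
print(loop(["Felipe"], out, lambda loops: loops < 2))
```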
Who can introduce Felipe to Gustavo?
     The power of graphs shows when solving this question. In a relational database, the corresponding query would be heavy and unsuitable for applications with many users, but in a graph DB the traversal only follows the edges around the starting vertex, so its performance depends much less on the total number of users.

     The loop iterates until it finds the Gustavo vertex. The query lists two possible paths.
     Replacing each "out" step with a pair of "outE" (outbound edges) and "inV" (the vertices those edges point to) steps, we can see how they relate.
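To show what "list the paths, including the edges" means, here is a small Python sketch that extends paths hop by hop until they reach the target. The edge list is assumed (chosen so that two paths exist, as in the text), and `find_paths` with its `max_hops` limit is a hypothetical helper.

```python
# (source, label, target) triples. The connections are assumed for illustration.
edges = [
    ("Felipe", "KNOWS", "Bruno Salim"),
    ("Felipe", "KNOWS", "Spigariol"),
    ("Bruno Salim", "KNOWS", "Gustavo"),
    ("Spigariol", "KNOWS", "Gustavo"),
]


def find_paths(start, goal, max_hops=3):
    """Collect every path from start to goal that uses at most max_hops edges.

    Each path alternates vertex, edge label, vertex, ... like outE/inV would.
    """
    found = []
    frontier = [[start]]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for source, label, target in edges:
                if source != path[-1]:
                    continue
                extended = path + [label, target]
                if target == goal:
                    found.append(extended)  # stop extending once the goal is hit
                else:
                    next_frontier.append(extended)
        frontier = next_frontier
    return found


for path in find_paths("Felipe", "Gustavo"):
    print(" -> ".join(path))
```

The hop limit is there so the search also terminates on graphs with cycles.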

     Let's delete the connection between Spigariol and Gustavo and try again.

    In the first line, we use 'g.E' to list all edges, then 'inV' to get the vertices they point to, and 'has' to keep only the vertices whose Name property equals "Gustavo". Finally, we use back(2) to return to the result of two steps back (that is, back to E).
    With the next command we remove the edge. Now there is only one possible path to Gustavo.
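The effect of removing the edge can be checked with a plain Python sketch. The edge list is assumed, `paths` is a hypothetical recursive helper, and it only terminates because this small graph has no cycles.

```python
# (source, label, target) triples. The connections are assumed for illustration.
edges = [
    ("Felipe", "KNOWS", "Bruno Salim"),
    ("Felipe", "KNOWS", "Spigariol"),
    ("Bruno Salim", "KNOWS", "Gustavo"),
    ("Spigariol", "KNOWS", "Gustavo"),
]


def paths(start, goal):
    """List every vertex path from start to goal (assumes an acyclic graph)."""
    if start == goal:
        return [[goal]]
    return [
        [start] + rest
        for source, _, target in edges
        if source == start
        for rest in paths(target, goal)
    ]


# Same idea as filtering on the edge's target vertex and stepping back to the edge.
into_gustavo = [edge for edge in edges if edge[2] == "Gustavo"]
print(into_gustavo)

before = paths("Felipe", "Gustavo")
edges.remove(("Spigariol", "KNOWS", "Gustavo"))
after = paths("Felipe", "Gustavo")

print(len(before), "paths before,", len(after), "path after")
print(after)
```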


    That was a little tour of graph databases: the tools used to access them and a "hands on" to create, delete and query stored vertices. This post is not a full tutorial; for more complete information, please see this great video by Andreas Kollegger, in which he explains the graph database concept and presents a full demo of Neo4j. My favorite part of the video is when he says that a relational database is great for calculating the average salary of the webcast attendees, but a graph would be better for identifying who would buy him a beer.
      Thanks for reading. If you have any questions, feel free to send me an email or follow me on Twitter. I'm always posting content about the marvelous world of data!

Best Regards!

Felipe Antunes

twitter: @felipe_store