Author Archives: admin

Using Neo4j CYPHER queries through the REST API

Lately I have been busy with graph databases. After reading the free eBook “Graph Databases” I installed Neo4j and played around with it. Later I went as far as to follow the introductory course as well as the advanced graph modeling course at Xebia. This really helped me work with Neo4j in a more structured manner than I did before the courses.

I can recommend installing Neo4j and just starting to use it, as it has a great user interface with excellent help manuals. For instance, this is the start screen:

[Screenshot: Neo4j start screen]

Easy, right?

One of the things that struck me was the ease with which you can access the data from ECMAScript (or JavaScript if you’re very old and soon-to-be obsoleted). Using the REST API you can access the graph in several ways, reading data from and writing data to the database. It’s the standard interface, actually. There’s a whole section in the Neo4j manual dedicated to the REST API, so I’ll leave most of it alone for now.

What’s important is that you can also fire Cypher queries at the database and receive the answer in either JSON or XML notation, or even as an HTML page. This matters because Cypher queries are *very* easy to write and understand. The examples below query the sample database that comes with Neo4j, which contains movies and actors.

Suppose we want to show all nodes that are of type Movie. Then the statement would be:

MATCH (m:Movie) RETURN m

A standard query to discover what’s in the database is:

MATCH (m) RETURN m LIMIT 100

The LIMIT clause restricts the result to 100 nodes, because without it the query returns every node in the database. The graph view in the user interface is gorgeous, but it slows down noticeably when result sets get big. Here’s how it looks:

[Screenshot: Neo4j query results]

Very nice, but not that useful if we want a particular piece of data. If we want to show only the people that acted in movies, we could say:

MATCH (p:Person)-[n:ACTED_IN]->(m:Movie) RETURN p

This returns all nodes of type Person that are related to a node of type Movie through an edge of type ACTED_IN.

While I won’t go into more detail on Cypher, let’s just say it is a very powerful abstraction layer for graph queries that would be very hard to write in SQL. It’s not as performant as giving Neo4j explicit commands through the REST API, which is what you want to do if you build an application where sub-second performance is an issue, but for most day-to-day queries it’s pretty awesome.

So how do we use the REST API? That’s pretty easy, actually. There are two options, one of which is now deprecated: the old Cypher endpoint. So we use the new http://localhost:7474/db/data/transaction/commit endpoint, which starts a transaction and immediately commits it. And yes, you can create and delete nodes through this endpoint as well, so it’s highly recommended not to expose the database to the internet, unless you don’t mind everyone using your server as a public litterbox.

You have to POST requests to this endpoint. There are endpoints you can access with GET, like http://localhost:7474/db/data/node/1, which returns the node with id=1 as an HTML page, but the transactional endpoint is accessed using POST.
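To make the shape of such a POST request concrete, here is a minimal sketch that only builds the request and does not send it. The helper name buildCypherRequest is made up for this example; the URL and the "statements" wrapper are the ones used in this post, and the default base URL assumes a local installation.

```javascript
// Hypothetical helper: describes the POST request for one or more Cypher statements.
// It only builds the request; sending it is left to $.ajax or a similar function.
function buildCypherRequest(statements, baseUrl) {
    // Assumption: a local, default Neo4j installation as used in this post.
    var root = baseUrl || 'http://localhost:7474';
    var list = Array.isArray(statements) ? statements : [statements];
    return {
        url: root + '/db/data/transaction/commit',
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Accept': 'application/json; charset=UTF-8'
        },
        // The endpoint expects {"statements": [{"statement": "..."}, ...]}
        body: JSON.stringify({
            statements: list.map(function (s) { return { statement: s }; })
        })
    };
}

var request = buildCypherRequest('MATCH (m:Movie) RETURN m');
console.log(request.method + ' ' + request.url);
console.log(request.body);
```

Passing an array of statements instead of a single string puts them all into the same request, so they run in the same transaction.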

The easiest way to try out the REST API is to start a simple web server, create a simple HTML page, and add JavaScript to it that responds to user input and calls the Neo4j REST API.

Since we’re going to use JavaScript, be smart and use jQuery as well. It’s pretty much a standard include.

How to proceed:

  • First, start the Neo4j software. This opens a screen where you can start the database server. In the bottom left of that screen there is a button labeled “Options…”. Click it, then click the “Edit…” button in the Server Configuration section. Disable authentication for now (and make very sure you don’t do this on a server connected to the internet) by changing the setting to the following:

    # Require (or disable the requirement of) auth to access Neo4j
    dbms.security.auth_enabled=false

    This makes sure we don’t have the hassle of authentication for now. Don’t do this on a connected server though.

  • Now, we start the Neo4j database. Otherwise we get strange errors.
  • Then, proceed to build a new HTML-page (I suggest index.html) on your webserver, that looks like this:
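    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>Cypher-test</title>
    <script src="scripts/jquery-2.1.3.js"></script>
    </head>
    <body>
        <script type="text/javascript">
            // Escape a value so it is shown as text instead of being parsed as HTML.
            function escapeHtml(value) {
                return $('<div>').text(String(value)).html();
            }

            // Send the Cypher statement from the input box to Neo4j and show the result.
            function post_cypherquery() {
                $('#messageArea').html('<h3>(loading)</h3>');
                $('#resultsArea, #outputArea').empty();

                $.ajax({
                    url: "http://localhost:7474/db/data/transaction/commit",
                    type: 'POST',
                    data: JSON.stringify({ "statements": [{ "statement": $('#cypher-in').val() }] }),
                    contentType: 'application/json',
                    // jQuery has no 'accept' option, so set the Accept header explicitly
                    headers: { 'Accept': 'application/json; charset=UTF-8' },
                    dataType: 'json'
                }).done(function (data) {
                    // Show the raw JSON answer
                    $('#resultsArea').text(JSON.stringify(data));
                    // Cypher errors arrive in the "errors" array of the response body
                    if (data.errors && data.errors.length > 0) {
                        $('#messageArea').html('<h3>' + escapeHtml(data.errors[0].message) + '</h3>');
                        return;
                    }
                    $('#messageArea').empty();
                    /* process data */
                    // data.results[0].data holds the records. Each record has a "row" array with
                    // one value per returned column (for a node: its key/value pairs).
                    var htmlString = '<table><tr><td>Columns:</td><td>' + escapeHtml(data.results[0].columns.join(', ')) + '</td></tr>';
                    $.each(data.results[0].data, function (k, v) {
                        $.each(v.row, function (k2, v2) {
                            htmlString += '<tr>';
                            if (v2 !== null && typeof v2 === 'object') {
                                $.each(v2, function (property, nodeval) {
                                    htmlString += '<td>' + escapeHtml(property) + ':</td><td>' + escapeHtml(nodeval) + '</td>';
                                });
                            } else {
                                // Plain values, for example from RETURN m.title
                                htmlString += '<td>' + escapeHtml(v2) + '</td>';
                            }
                            htmlString += '</tr>';
                        });
                    });
                    $('#outputArea').html(htmlString + '</table>');
                })
                .fail(function (jqXHR, textStatus, errorThrown) {
                    $('#messageArea').html('<h3>' + escapeHtml(textStatus + ' : ' + errorThrown) + '</h3>');
                });
            }
        </script>

    <h1>Cypher-test</h1>
    <div id="messageArea"></div>
    <div id="resultsArea"></div>
    <table>
      <tr>
        <td><input name="cypher" id="cypher-in" value="MATCH (n) RETURN n LIMIT 10" /></td>
        <td><button name="post cypher" onclick="post_cypherquery();">execute</button></td>
      </tr>
    </table>
    <div id="outputArea"></div>
    </body>
    </html>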
    

    Don’t forget to download jQuery and put the downloaded file in the scripts subdirectory below the directory in which you place this file. If you rename the file or place it somewhere else, change the filename in the <script src="scripts/jquery-2.1.3.js"> line in the head of the page accordingly.

While this doesn’t look very pretty, it gets the job done. It executes an AJAX call to Neo4j, using the transactional endpoint. After receiving a successful response, it writes the raw answer (JSON) into the resultsArea above the input box. Then it parses the result and writes it to a table in the outputArea.
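The page above assumes that authentication has been disabled. If you would rather leave it switched on, the request needs an Authorization header. The sketch below assumes the server accepts HTTP Basic authentication with a user name and password; the helper name basicAuthHeader and the credentials are made up for this example.

```javascript
// Hypothetical helper: builds an HTTP Basic Authorization header value.
// Assumption: the server accepts HTTP Basic authentication (user name and password).
function basicAuthHeader(user, password) {
    var raw = user + ':' + password;
    // btoa exists in browsers; fall back to Buffer when run under Node.js.
    // Note: btoa only handles Latin-1 characters.
    var encoded = (typeof btoa === 'function')
        ? btoa(raw)
        : Buffer.from(raw, 'utf8').toString('base64');
    return 'Basic ' + encoded;
}

// Example credentials, purely for illustration.
var authHeaders = { 'Authorization': basicAuthHeader('neo4j', 'secret') };
console.log(authHeaders.Authorization);
```

An object like authHeaders can be passed as the headers option of the $.ajax call. Keep in mind that a password embedded in a web page is readable by anyone who can open that page.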

The resultset from Neo4j is returned as a data object that looks like this:

{
  "results" : [ {
    "columns" : [ "n" ],
    "data" : [ 
      {"row" : [{"name":"Leslie Zemeckis"}]}, 
      {"row" : [{"title":"The Matrix","released":1999,"tagline":"Welcome to the Real World"}]}, 
      {"row" : [{"name":"Keanu Reeves","born":1964}]} 
      ]
  } ],
  "errors" : [ ]
}

Note the different row variants. Since we did not limit ourselves to a single type of node, we got both Movie and Person nodes in the result. And even within a single node type, not every node has the same properties. The Neo4j manual has more information about the possible contents of the resultset.
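Because the rows differ in shape, it can help to flatten the resultset into one table with a column for every property that occurs. The following is a minimal sketch of that idea, not part of the page above. The helper name toTable is made up, it assumes every returned value is a node (a plain object of properties), and it assumes that entries in the errors array carry a code and a message field.

```javascript
// Hypothetical helper: turns a transactional-endpoint response into a flat table.
// Assumes every returned value is a node, i.e. a plain object of properties.
function toTable(response) {
    if (response.errors && response.errors.length > 0) {
        // Assumption: error entries carry "code" and "message" fields.
        throw new Error(response.errors.map(function (e) {
            return e.code + ': ' + e.message;
        }).join('; '));
    }
    var headers = [];   // union of all property names, in order of first appearance
    var records = [];   // one object per returned node
    response.results.forEach(function (result) {
        result.data.forEach(function (entry) {
            entry.row.forEach(function (node) {
                Object.keys(node).forEach(function (key) {
                    if (headers.indexOf(key) === -1) { headers.push(key); }
                });
                records.push(node);
            });
        });
    });
    // Properties a node does not have become empty strings.
    var rows = records.map(function (node) {
        return headers.map(function (key) {
            return Object.prototype.hasOwnProperty.call(node, key) ? node[key] : '';
        });
    });
    return { headers: headers, rows: rows };
}

// The sample resultset from this post.
var sample = {
    results: [{
        columns: ['n'],
        data: [
            { row: [{ name: 'Leslie Zemeckis' }] },
            { row: [{ title: 'The Matrix', released: 1999, tagline: 'Welcome to the Real World' }] },
            { row: [{ name: 'Keanu Reeves', born: 1964 }] }
        ]
    }],
    errors: []
};

var table = toTable(sample);
console.log(table.headers.join(' | '));
table.rows.forEach(function (row) { console.log(row.join(' | ')); });
```

For the sample this gives the headers name, title, released, tagline and born, with one row per node and empty cells where a node lacks a property.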

Please note that ANY valid Cypher-statement will be executed, including CREATE and DELETE statements, so feel free to play around with this.

– Ronald Kunenborg.

Presentation: history of DWH modeling

Dear readers, on June 6th I gave a keynote presentation in front of 300 people, summarizing the state of DWH modeling. The conference proceedings of the day are available at BI-Podium.

My own presentation is available here as well: Next Generation DWH Modeling 2013 conference keynote speech

The Anchor Modeling folks also wrote a summary: Next Generation DWH Modeling

New website

Dear reader,

Welcome to the new Grundsätzlich IT website. I hope to be able to keep you better informed with this site, because I no longer have to edit the entire site using Notepad and FTP. Integration with Twitter and company pages on Facebook and LinkedIn was also high on the list of desired features. Finally, a comment section has been made available for postings, so that readers can provide feedback more easily.

Enjoy the website.

Time Dimension Generator v1.41

In a data warehouse we often use a table containing information about dates. This is called a Time dimension (it should rather be called a Date Dimension, but let’s not quibble). This table makes calculations regarding time in SQL queries a bit easier. It also enables us to attach meaning to dates, by letting us add attributes to them. Using time-shifted columns in the table we can then say things like “give me all the sales of the same period last year, for period X to Z”. Given the right query, of course.
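To give an idea of what such a table contains, here is a minimal sketch that generates rows for a range of dates, including one time-shifted column. This is not the code of the generator described here: the function name and all column names are invented for the example, and it leaves out calendar-specific attributes such as working days.

```javascript
// Hypothetical sketch: generates date-dimension rows for an inclusive date range.
// Column names are invented for this example; dates are 'YYYY-MM-DD' strings.
function generateDateDimension(startIso, endIso) {
    var DAY_MS = 24 * 60 * 60 * 1000;
    var rows = [];
    // Work in UTC so daylight saving time cannot skip or repeat a day.
    var end = Date.parse(endIso + 'T00:00:00Z');
    for (var t = Date.parse(startIso + 'T00:00:00Z'); t <= end; t += DAY_MS) {
        var d = new Date(t);
        // Time-shifted column: the same calendar date one year earlier.
        // 29 February has no counterpart in a non-leap year and rolls over to 1 March.
        var lastYear = new Date(Date.UTC(d.getUTCFullYear() - 1, d.getUTCMonth(), d.getUTCDate()));
        rows.push({
            date_key: d.toISOString().slice(0, 10),
            year: d.getUTCFullYear(),
            month: d.getUTCMonth() + 1,
            day: d.getUTCDate(),
            day_of_week: d.getUTCDay() === 0 ? 7 : d.getUTCDay(), // Monday = 1 ... Sunday = 7
            is_weekend: d.getUTCDay() === 0 || d.getUTCDay() === 6,
            same_date_last_year: lastYear.toISOString().slice(0, 10)
        });
    }
    return rows;
}

var dimension = generateDateDimension('2016-02-28', '2016-03-01');
dimension.forEach(function (row) { console.log(JSON.stringify(row)); });
```

With a column like same_date_last_year in place, a “same period last year” query becomes a join of the sales table against the dimension on that column, instead of date arithmetic inside the query itself.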

This tool generates a table you can load into a data warehouse. For more information, please see http://en.wikipedia.org/wiki/Dimension_(data_warehouse) (it needs improvement but it’s a start). Note that it is available in the Dutch language only, due to specific features that depend on the Dutch local calendar (like working days). I may (on request) rebuild it for international use, but please check the following alternatives first: http://it.toolbox.com/wiki/index.php/Create_a_Time_Dimension_/_Date_Table or http://www.ipcdesigns.com/dim_date/.

Start the Time Dimension Generator (Dutch language version only).

Templator v1.0

This tool provides a template replacement service. You provide a template with variables, a single line with the names of the variables and a number of lines (say, N) with replacements (CSV format). The result will be N templates, with all the variables replaced by rows 1..N from the replacements. Those templates will be present in a single download.
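The principle can be sketched in a few lines. This is not the Templator’s actual code: the function name is invented, the sketch assumes that variable names appear literally in the template, and it only handles simple CSV (comma-separated, no quoted fields or embedded commas).

```javascript
// Hypothetical sketch of template replacement; not the Templator's actual code.
// Assumptions: variable names appear literally in the template, and the CSV
// is simple (comma-separated, no quoted fields or embedded commas).
function fillTemplates(template, variableLine, replacementLines) {
    var names = variableLine.split(',').map(function (s) { return s.trim(); });
    return replacementLines.map(function (line) {
        var values = line.split(',').map(function (s) { return s.trim(); });
        return names.reduce(function (text, name, i) {
            // split/join replaces every occurrence without regex escaping issues;
            // a missing value is replaced by an empty string
            return text.split(name).join(values[i] !== undefined ? values[i] : '');
        }, template);
    });
}

var outputs = fillTemplates(
    '<flow name="$NAME"><source table="$TABLE"/></flow>',
    '$NAME, $TABLE',
    ['load_customers, CUSTOMERS', 'load_orders, ORDERS']
);
outputs.forEach(function (o) { console.log(o); });
```

Replacements are applied one variable at a time, so a variable name that is a prefix of another one, or a replacement value that itself contains a variable name, will give surprising results in this sketch.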

What do you use it for? Well… currently I use it as follows: create an ETL-workflow in BODS. Export the workflow to XML. Replace the items that are variable with variable-names. Save this as a template. Now, when a new workflow is necessary, it is just a matter of copying the variables, their replacements and the template into the Templator. Then, run it and use the download as an import (after removing superfluous begin and end-tags).

It works best if you have highly similar flows based on highly similar inputs that differ only by up to ten variables. Beyond that, it tends to become more cumbersome to gather the replacement values than to copy and paste the original workflow. Your mileage may vary, however.

Start the Templator or the Dutch language version.

Form follows function

The required functionality determines the form of a data warehouse. This presentation aims to show how the form and architecture of a given data warehouse are determined by the challenges a company meets when turning data into information. Solutions for those challenges shape the data warehouse, both technically and organizationally.

Please note that the presentation is in Dutch.

This presentation can be given on request in return for reimbursement of travel expenses, and takes about 45 minutes.

Download the presentation (make sure to check the license before using it)

Data Vault Cheat Sheet Poster v1.0.8

This is the poster from 2010 that displays on one A3-size cheat sheet the most important rules of the Data Vault modelling method version 1.0.8.

You can find the rules that were used for this poster on the website of Dan Linstedt.

Download the PDF

**Update**: Please note that the current version is version 1.0.9. This version is kept for archival and to amuse data archaeologists 🙂

Anchor Modeling


Anchor Modeling is a new method of modeling a domain in a database. The method puts every attribute in its own table. This seems complex, but it actually simplifies maintenance. Furthermore, the method is flexible, quite resilient to change over time, does not need updates and is highly scalable.

These are good properties for a data warehouse model. In the article I explain how Anchor Modeling works and why you should at least take a look at it.
The article appeared in November 2009 in Database Magazine, a Dutch magazine for database professionals. However, the magazine is now defunct and has been superseded by Business Information Magazine.

Download the PDF

Reasons for failure in data warehouses

This article discusses the reasons why some data warehouse projects fail. The focus is on the question whether the resemblances to standard IT projects may be greater than the differences, and where the differences could be found. A number of guidelines are given that help to recognize and prevent project failures.

Originally published in July 2009, reworked in September 2009. Please note that the article is in Dutch.

The Dutch title of the article is “Faalfactoren bij Data Warehouses”.

Download the PDF