Author Archives: admin

A jQuery cheat sheet

jQuery cheat sheet

A few days ago I received an e-mail from Robert Mening at WebsiteSetup.org, who kindly pointed me to his jQuery cheat sheet. There are of course more than a few cheatsheets for jQuery out there (and I will link a few more here), but this one at least has the advantage that it all fits on one page. It is at the bottom of this page. Note that clicking on the image will take you to a webpage where you can also find a PDF-version of the cheat sheet.

Alternative cheat sheets can be found with a quick search on Google, and from various sources that integrate others (for instance at https://www.sitepoint.com/10-jquery-cheat-sheets/ ). The ones I found most useful though, are the following:

I do have some issues with the cheat sheets in question: there is usually no license on the sheet itself, and the version of jQuery for which it is relevant isn’t always mentioned. These small failings also apply to the cheat sheet shown below. However, if you’re doing some jQuery work now and then, you could do worse than just putting up a copy of the cheat sheet displayed here.

jQuery Cheat Sheet

Image source: websitesetup.org – free to link with attribution

… or not migrating to Cloudflare

www.cloudflare.com

Several days ago I wrote a blogpost about my Cloudflare migration. I have done a bit more research since then and unfortunately, there are a few nasty little snags that could really hurt your website if you use Cloudflare without considering some of the nastier implications of the way in which Cloudflare and similar services work.

Main issues

There are two main issues, as described by an interesting blogpost on the blog of Sven Slootweg. Sven is the former administrator of AnonNews.org, and a security researcher. He was also suspected of being part of LulzSec, a very high profile group of hackers. I’ve read most of his posts now and I take them very seriously.

What he states is basically that

  • CloudFlare offers services that are very insecure by default [1];
  • CloudFlare has a business model that depends on them being the “Man in the Middle” with access to a lot of traffic [1];
  • CloudFlare does not mitigate DDOS attacks, unless they are quite small and only affect your webserver [1];
  • Some other minor issues, which I’ll skip. You should read the references for things like issues with Tor [2], SEO rankings [3], website impersonation [4], etcetera, to determine whether they apply to you.

The main problem with the security is that while the user may consider his or her experience to be encrypted (by SSL), the path from CloudFlare to the origin server is not. Now, consider an enduser in a country with an oppressive government who uses your CloudFlare-fronted website. CloudFlare may have a local server in that country. Suppose the user visits a page on the governments blacklist and leaves a nasty comment. Normally, SSL would encrypt all traffic and the government couldn’t intercept the traffic and look inside it. Well, not unless they were using a bit more resources than a simple scanner. But with CloudFlare the traffic between the CloudFlare local server and the webserver hosting the incriminating page, that traffic goes over the border *unencrypted*. Ouch. In some countries, your user will not survive this experience. And the user isn’t even warned: the browser tells the user that everything is fine. Even an SSL test (https://www.ssllabs.com/ssltest/analyze.html?d=www.grundsatzlich-it.nl) will say so. But it’s not. Oh, and if you have a front-end that accepts credit card information, that information will *also* travel the entire internet unencrypted. Not as dangerous as the first scenario, but probably not something you’d like to see as someone using a webstore.

Now, you can actually use the option to also encrypt the traffic between CloudFlare and your server. Let’s just say it’s not the default and requires a bit more expertise than just “point and click”. However, it’s not that difficult. That still leaves CloudFlare as the single point where a lot of web traffic goes through – entirely unencrypted. Sven Slootweg comments that:

This may not sound that bad – after all, they’re just a service provider, right? – but let’s put this in context for a moment. Currently, CloudFlare essentially controls 11% of the 10k biggest websites, over 8% of the 100k biggest websites (source), and almost 5% of sites on the entire web (source). According to their own numbers from 2012(!), they had more traffic than several of the most popular sites and services on earth combined, and almost half the traffic of Facebook. It has only grown since. And unlike every other backbone provider and mitigation provider, they can read your traffic in plaintext, TLS or not.

And finally, the DDOS issue. CloudFlare uses a method that mitigates against DDOS attacks by putting up a front page that asks you to enter a CAPTCHA. Apart from the fact it blocks bots, even backup bots, this doesn’t actually stop any big DDOS attack against something other than your web pages since it doesn’t block attacks by inspecting the packets but just relies on the CAPTCHA and stands in front of you. Dedicated DDOS mitigation works by making sure the packets never reach the real servers – CLoudFlare actually lets the packets reach their servers and relies on having “enough servers”. Given the new threat environment with DDOS attacks now going up to 600 Gbit/sec thanks to the “Internet of (apparently quite insecure) Things”, this may not be enough. Certainly, for the money you have to pay for the mitigation service it’s probably more effective to pay a dedicated DDOS protection service. Since CloudFlare is also responsible for hosting many of the DDOS-service provider websites, paying them for DDOS protection feels a bit like paying protection money to the Mafia.

The verdict

If you use CloudFlare to avoid buying a real certificate to secure your blog on “Human rights in Russia”, or run a webstore that accepts creditcard payments on your own webpages instead of using a payment processor page, you’re opening your users up to huge risks. And pleading innocent is less and less an option now that this information is out there. You can actually get better protection but CloudFlare will always be a Man in the Middle. Sven’s blog lists a number of alternatives if you can’t accept that.

However, for websites like mine, it’s not a big deal to accept less security. After all, before I migrated to CloudFlare the entire site was in cleartext. It’s slightly more secure now than it was, and thanks to this information I will make it more secure in the near future by using a better certificate.

If you move to CloudFlare like I did, you need to carefully weigh the pro’s and cons – better than I did, at least – before moving. But for a lot of blogs it may still be a very good option. If however your webshop or political blog is hosted by CloudFlare, you’d better do some checking before you post or pay. When in doubt, do not enter.

References

2016
[1] http://cryto.net/~joepie91/blog/2016/07/14/cloudflare-we-have-a-problem/
[2] https://blog.torproject.org/blog/trouble-cloudflare

2015
[3] https://salt.agency/blog/seo-rankings-cloudflare/

2014
[4] https://scotthelme.co.uk/tls-conundrum-and-leaving-cloudflare/
[5] https://scotthelme.co.uk/cloudflares-great-new-features-and-why-i-wont-use-them/

Server-test
[6] https://www.ssllabs.com/ssltest/analyze.html?d=www.grundsatzlich-it.nl

Integrating Twitter in WordPress

twitter large logo

Last year Twitter decided to change the way Twitter interacts with the rest of the world, by making it more difficult to integrate its twitter-streams with your own website. While you can get around this if you can deploy server-side software and go through the hassle of signing up for a developer key, a lot of folks run websites without being interested in having to program just to get their own tweets to display.

Twitter does have a solution, but this just dumps the stream on your site with the lay-out and styling of Twitter. While this is understandable from a branding and marketing point of view, it’s incredibly annoying to have your website look like a hash of different styles just because Twitter doesn’t like you changing the lay-out. So there are a lot of people looking for alternatives.

The best alternative I’ve found for my purpose is http://jasonmayes.com/projects/twitterApi/. Jason Mayes twitter API just takes the formatted twitter-feed, removes the formatting and provides the stream with normal tags to the page. Using standard CSS you can then style the stream and presto, you have a nice looking twitter feed.

How it works in WordPress is as follows:
– Download the software from http://jasonmayes.com/projects/twitterApi/
– Upload the javascript file “twitterFetcher_min.js” to your website. This could be as media but I chose to use FTP to upload it into a theme. As long as it’s on your website it’s okay though, the location is unimportant.
– Add a Text widget to the page where you want the tweets to show up.
– Include the following text in the widget:


<script src="/{path}/twitterFetcher_min.js"></script>
<div id="tweet-element">Tweets by Ronald Kunenborg</div>

<script>
var configProfile = {
"profile": {"screenName": '{yourtwittername}'},
"domId": 'tweet-element',
"maxTweets": 10,
"enableLinks": true,
"showUser": true,
"showTime": true,
"showImages": true
};
twitterFetcher.fetch(configProfile);
</script>

Replace “{yourtwittername}” with your own twitter name (of that of someone whose timeline you wish to show), and the {path} with the path of the uploaded javascript and you’re good to go. However, this looks pants. So we need to style it. In order to do that, include the following text in the widget before the script:
<style>
/*
* Tweet CSS - on Jason Mayes tweetgrabber (http://jasonmayes.com/projects/twitterApi/)
*/

div#tweet-element ul {
list-style: none;
}

div#tweet-element h2 {
clear:both;
}

div#tweet-element p {
font-size: 9pt;
margin: 0 0 0 0;
}

div#tweet-element ul li {
list-style:none;
overflow:hidden;
border-top:1px solid #dedede;
margin: 5px 0 10px 0;
padding: 0px;
}

div#tweet-element ul li:hover {
background-color:#f0f3fb;
}

/* tekst of tweet */
.tweet {
clear: left;
}

.user {
clear:left;
float:left;
}

.user a {
}

/* hide the @twittername, which is the 3rd span in the user class */
.user span:nth-child(3) {
display: none;
}

.user a > span {
margin-left:2px;
}

.user a > span {
display: table-cell;
vertical-align: middle;
margin: 5px;
padding: 5px;
}

.widget-text img,
.user a span img {
display: block;
float:left;
max-width: 40px;
margin: 2px 2px 2px 2px;
}

div#tweet-element p.timePosted {
clear: left;
font-style: italic;
}

div#tweet-element p.timePosted a {
color: #444;
}

.interact {
float:left;
margin-top:-7px;
width: 100%;
}

.interact a {
margin-left: 0px;
margin-right: 5px;
width: 30%;
}

.interact a.twitter_reply_icon {
float:left;
text-align: center;
}

.interact a.twitter_retweet_icon {
float:left;
text-align: center;
}

.interact a.twitter_fav_icon {
float:right;
text-align: center;
}

/* show media on front-page - hide it with display:none if you don't want to show media included by others. */
.media img {
max-width:100%;
}

#linkage {
position:fixed;
top:0px;
right:0px;
background-color:#3d3d3d;
color:#ffffff;
text-decoration:none;
padding:5px;
width:10%;
font-family:arial;
}
</style>

Make sure the <style> part is first in the Text widget.

Of course you can also put the style (without the <style> tags) in a stylesheet (.css) file, upload it and then refer to it, instead of pasting the stylesheet in the Text widget. In that case use the following command:

<link rel='stylesheet' id='twitter-css' href='/{path}/twitter-style.css' type='text/css' media='all' />

And please replace {path} with the desired path.

I hope this helps you as much as it helped me.

Migrating to CloudFlare

www.cloudflare.com

I recently added CloudFlare as cache in front of my website. Not only does it provide worldwide local caching of my website, it also improves security by adding in an easy manner all kinds of features you’d otherwise find hard to arrange. It’s still a hassle, but not as much as it used to be.

The standard CloudFlare plan is free. Yep. And you can’t beat the value. The following features are part of it:

  • Caching at a server that is local to your visitors, improving their browsing speeds. Pretty nice and worth the price all in and of itself.
  • Analytics in an easy dashboard. You can get this by incorporating Google Analytics on your pages, true, but here it’s already in the product. However, CloudFlare also has a nice button that allows you to add Google Analytics to all of your pages, if you really want it, without changing your website in any way.
  • Safe browsing over SSL for people visiting the CloudFlare cache, at no charge.
  • DNSSEC can be turned on, securing your DSN entries (DNS translates the name of your website into the IP-address you need to actually get there) against rogue DNS-servers that change the IP-address to their own sites, so they can intercept the traffic or just spoof your website and change pages around. Could be quite embarassing if you are a dissident or well-known political figure, or a bank.
  • A “web firewall” that tries to catch spambots and scrapers before they even reach your website. The more advanced options are paid, but the free option is pretty nice. It has, for instance, the option of asking suspect browsers to authenticate their “humanity” before allowed to access your site. This is enabled by default.
  • IPv6 to IPv4 translation. If you’re on a provider that does not provide IPv6 website hosting you should move ASAP anyway, but suppose you can’t? In that case you have the option of pretending to your rather outdated server that the request is actually an IPv4 type request. Could be useful.
  • An option called “Apps”. Apps are small features you can enable that are provided by 3rd parties. One of these for instance is “A Better Browser” which warns users of older browsers that they should upgrade. Once again, no code on your website changes and you can turn it on and off quite easily. Other apps provide analytics, more security and monitoring but almost all of these are paid options.
  • Email-address obfuscation. For the truly paranoid, you can turn all your emailadresses on the site into addresses that cannot be harvested by the scrapers they said they would stop. I don’t bother with this, but feel free.
  • Hotlink protection. This is pretty nifty if you have a site with a lot of images, and people blogging about them link directly to your site from their article. That means their pageviews count against your bandwidth. With this option you can prevent those requests from being served.

These options are all easily accessible through a set of buttons as displayed here: cloudflare-buttons

Pretty nice all by itself. But I’ll discuss the setup and two main features in more detail.

Setup

The setup is easy. Just sign up and add your website. The main thing you need to get working is the nameservers. If you cannot change the nameservers for your website, things will get really tough because that is how CloudFlare works. If you cannot change them, contact your provider. If your provider does not allow nameserver changes, move away to another that does support it. Otherwise none of the newer features of the internet will work unless your provider agrees to provide them to you. That won’t be cheap.

After you get the nameservers changed, you have to log out and wait a few hours. By then the change will have been recognized by CloudFlare, and now you can actually use its features. The two most useful features are of course caching and encryption, which I explain below in a bit more detail.

Caching

The caching features of the CloudFlare platform help you in the sense that small DDOS attacks won’t bring down your website or hurt your direct provider. Big ones will mean you have to pay up (a lot), but it’s better than your direct provider shutting down your website for a minor DDOS assault, right? They also have the option named “always online(tm)” that provides a cached copy of your website, if it is offline on your own side. Note that this only goes for the popular (cached) pages but these are the most important ones anyway. Of course, caching can be disabled (temporarily) by turning on “development mode”.

Encryption

Encrypting the website gives you the option to have browsers come in over SSL. And this is very interesting because browsers are now signalling by default that your site is untrusted if it is not protected by SSL. The CloudFlare option provides SSL for your website from visitor browser to CloudFlare, but if you don’t add something more, it will still be unencrypted between CloudFlare and your original website.

If you trust the channel between your website and CloudFlare, this is still pretty safe. For most websites it’s a major improvement because they go from no SSL at all, to SSL between visitor and cache. But if you want more it’s pretty easy. Most website hosting companies provide you with the ability to place a self-signed certificate on your website, and CloudFLare can be set to acknowledge that certificate. You could also set CloudFlare to acknowledge only certificates signed by a trusted authority, increasing the security of your channel either further, or reducing it to zero, depending on who you trusted as certificate provider. In my case, I go with the self-signed certificate.

DNSSEC is however a bit more involved. I was unable to get this working because my hosting provider does not provide me with the ability to add a “DS” record to the DNS-server. I’m still looking into it. HOWEVER… my provider has automatic DNSSEC as long as I use their nameservers… This effectively means that I am going to have to do without DNSSEC *or* CloudFlare. Given the risks involved (minor) I’m going to stick with CloudFlare for a while, but I may be returning to the provider I have. I would really like them to have this though.

Summary

All in all, I can highly recommend CloudFlare. It’s free, it’s easy and provides immediate benefits for most websites. If you’re big enough to already have most of this it may be less interesting, but for 90% of the internet this is a step forward.

Update

Update 09-okt-2016: I’ve written a new article about why you should be careful when moving to CloudFlare, as it is not quite as suitable as I thought for websites that require actual security and encryption.

Encryption is not a silver bullet

Have I been pwned?Recently, well-known security researcher Troy Hunt, responsible for the website Have I been pwned? described how someone lost 324000 records with full creditcard details, including security codes, by posting them on a public server. There were two parties suspected of the data breach, but neither could find any breach at first. So both parties stated categorically that there was no breach, all data was 100% encrypted and completely secure on their servers so the problem had to lie elsewere. And they were right, all the data was encrypted.

Now, encrypted data should be safe. And to be honest, encryption is more and more the mainstay of securing your data. Firewalls can be breached, servers and companies infiltrated, but if the data is encrypted it should remain secure even if you publish it on the internet. This is somewhat correct – barring adversaries like national intelligence services, who are very likely to be able to decrypt most schemes at the moment. It’s well known that the Dutch National Intelligence and Security Service (AIVD) is investing heavily in quantum computing research, for instance, which means that the NSA probably has one working right now. But apart from those entities, it’s still quite hard to crack decently encrypted data.

That is why in the new SQL Server edition, SQL Server 2016, it is now possible to keep the data encrypted all the time. Only the client can decrypt the data with their own keys. Barring vulnerabilities in the implementation this is a huge step forward: it is impossible for the database administrators to access data they aren’t allowed to see and the loss of a key only affects data stored for that client. Both are very important steps forward to enable clients to trust databases in the cloud. Which is one reason why Microsoft is pressing forward on this, because they will become entirely dependent on Azure in less than a decade, according to their own predictions. This means that trust in Azure will be a make-or-break issue for the company and their focus on improvements in security reflects this knowledge.

And let me be clear: this is a huge leap forward. The old situation could encrypt some data with server-side keys, but when you made a backup it was decrypted. And in several other scenarios it didn’t work if your data was encrypted. But now it works all over the database, you can set it up quite easily and even choose whether columns are encrypted in a deterministic way that gives the same result every time you encrypt the same value, which enables searching and joining, or random: every time you encrypt the value is different. The latter gives more protection from attackers who encrypt “likely values” and see if they match, which is a classic attack against password-files (see: rainbow tables / dictionary attacks).

In the picture you can see how it works by storing the keys on the client:
Always Encrypted SQL Server 2016

This means we can now store creditcard information and sensitive information in the cloud while not having to rely solely on the goodwill of the Azure database administrator.

There is unfortunately also a downside. The fact that data is now safer does not mean it is safe in all circumstances. The way “always encrypted” works has consequences for your implementation that could blow your encryption scheme right out of the water if misused. So while the temptation to store sensitive but potentially very interesting data because hey, “it’s encrypted” and thus safe, can overcome common sense and even regulations, we should still firmly ignore that temptation.

Because the case I linked in the beginning showed everyone that even if data is encrypted, it is not always safe. In the case which I quoted at the start of the article, the data was encrypted too, and it still leaked. The reason was that the encryption keys were known to the organisation involved and used to decrypt data for analysis. That decrypted textfile was then stored on a publicly accessible server. Encryption cannot mitigate that scenario if the keys are part of the webapplication and the owner of the application can also access the data. Anyone who can get to the keys, can decrypt the information. After that, the security of the data once again depends on what that person does with it – such as putting it on a public server.

This is the reason that if you want to process creditcard information, for instance, you need to be PCI compliant. This is a set of regulations drafted by the financial industry that tell you what data you can store and how. Very sensitive details such as the security code should NEVER be stored. They don’t give security regulations for the storage of the security code: storing it violates all the rules, no matter what you do. The case with Regpack shows that this is still true. What you store will eventually leak, even with encryption. Once quantum computers become available widely, all current encryption schemes are broken and that nicely encrypted data on the internet that wasn’t a problem… is suddenly readable text.

So while “always encrypted” is a step forward, you still need to be very careful about what you store and it still needs to be secure – processing encrypted data on an insecure platform means your data is just as insecure, as the data can be intercepted in memory. While solutions are in the works (Philips, IBM and others are working on homomorphic encryption schemes) this is currently not an option.

recommendations

My recommendations on this subject are as follows.

  • Do not store any data you are not allowed to store.
    If you do this anyway and lose the data, you will get fined or even shut down when this comes to light.

  • Do not store any sensitive data you do not have to store.
    Everything you store is a security risk, if you don’t store anything there are no risks. Being smart about what data to store is a big part of any security strategy.

  • If you do store sensitive data, let the owner of the data hold the key to that data if at all possible.
    After all, a file where every line is encrypted with a different key you don’t have, is a file that will be pretty hard to decrypt and certainly can’t be decrypted by accident by one of your employees.

  • If you cannot do even that, and your application does the encrypting, make sure the decryption key is locked in hardware like a smart card that is NOT reachable on any computer without physical presence.
    Violating this simple rule was what destroyed the Dutch Public Key provider Diginotar.

Some companies prioritize time-to-market and lower cost over data security. But eventually, those companies will be destroyed over that practice. The current digital environment is just too hostile to survive such practices for very long.

Certified Anchor Modeler

As of today, I am certified as Anchor Modeler. My thanks go to Lars Rönnbäck (UpToChange.com), the best teacher you could have, as well as Juan-José van der Linden for inviting me and to Essent for hosting the course.

While the community of Anchor Modelers is still quite small, it will likely expand as the concurrent-reliance-temporal model is extremely interesting. The notion of positors and reliance combined with the positing and changing time is quite advanced. I’m looking forward to combining this with Martijn Evers’ notions about timeline choices with respect to Consistency/Accuracy/Availability.

DataVault Cheat Sheet Poster v1.0.9

This poster displays the most important rules of the Data Vault modelling method version 1.0.9 on one A3-size cheat sheet. I decided to not add personal interpretation and keep the sheet as close to the original specs as possible.

You can find the rules that were used for this poster on the website of Dan Linstedt.

DataVault Cheat Sheet v109 (A3) PDF

A version where the Colors of the Data Vault have been used, is available as well:
DataVault Cheat Sheet v109 (A3, color) PDF

Creating brilliant visualizations of graph data with D3 and Neo4j

Okay, so someone recommended I spice up the titles a bit. I hope you’re happy now!

Anyway, it really is the truth: you can create brilliant visualizations of data with the D3 javascript library, and when you combine it with Neo4j and the REST API that gives you acccess to its data, you can create brilliant visualizations of graph data.

Examples of d3 visualizations

Examples of d3 visualizations, laid out in a hexadecimal grid

So what’s D3? Basically, D3 is a library that enables a programmer to construct and manipulate the DOM (Document Object Model) in your webbrowser. The DOM is what lives in the memory of your computer once a webpage has been read from the server and parsed by your browser. If you change anything in the DOM, it will be reflected on the webpage immediately.

There are more libraries that can manipulate the DOM (such as JQuery), but D3 is focused towards ease of use when using data as the driver for such manipulations, instead of having code based on mouseclicks do some alterations. There are commands to read CSV or other formats, parse them and then feed them to further commands that tell D3 how to change the DOM based on the data. This focus on using data to drive the shape of the DOM is gives D3.js its name: Data Driven Documents.

An example of what you can achieve with minimal coding is for instance the Neo4j browser itself, and the force-connected network that is shown as the output for a query returning nodes and/or relationships. However, another visualization of a network of nodes and relationships is the Sankey diagram:

An example of a Sankey diagram

An example of a Sankey diagram

The Sankey diagram as shown above was created using d3.js, a Sankey plug-in (javascript) and the lines of code that control d3: about 70 lines of Javascript in all.

To demonstrate how easy it is to use d3.js and Neo4j as database to create a nice visualization, I’m not going to use the Sankey example, however. It’s too complex to use as an example for that, although I will write an article about that particular topic in the near future.

No, we’re going to create a bar chart. We’ll use the previous article Using Neo4j CYPHER queries through the REST API as a basis on which to build upon.

The bar chart, when done, will look like this:

Barchart showing the number of players per movie

Barchart showing the number of players per movie

You will need some understanding of JavaScript (ECMAscript), but this can be obtained easily by reading the quite good book, Eloquent Javascript.

You will also need to understand at least some of the basics of D3, or this article will be incomprehensible. You can obtain such understanding from d3js.org, and I recommend this tutorial (building a bar chart) that goes into much more detail than I do here. An even better introduction is the book “D3 tips and tricks” that starts to build a graph from the ground up, explaining everything while it’s done.

Please note that I used the d3.js library while developing, and it ran fine from the development server. However, when I used d3 with the standard Microsoft webserver, it mangled the Greek alphabet soup in the code and it didn’t work. The minified version (d3.min.js) does not have that issue, so if you run into it, just use the minified version.

We will use nearly the same code as in the previous article, but with a few changes.

First, we add a new include: the D3 library needs to be included. We use the minified version here.

<html>
<head>
<title>Brilliant visualization of graph data with D3 and Neo4j</title>
<script src="scripts/jquery-2.1.3.js"></script>
<script src="scripts/d3.min.js"></script>
</head>
<body>

Next, we add the function “post_cypherquery()” again, to retrieve data from Neo4j. We use exactly the same routine we used the last time.

    <script type="text/javascript">
        function post_cypherquery() {
            // while busy, show we're doing something in the messageArea.
            $('#messageArea').html('<h3>(loading)</h3>');

            // get the data from neo4j
            $.ajax({
                url: "http://localhost:7474/db/data/transaction/commit",
                type: 'POST',
                data: JSON.stringify({ "statements": [{ "statement": $('#cypher-in').val() }] }),                
                contentType: 'application/json',
                accept: 'application/json; charset=UTF-8',
                success: function () { },
                error: function (jqXHR, textStatus, errorThrown) { $('#messageArea').html('<h3>' + textStatus + ' : ' + errorThrown + '</h3>') },
                complete: function () { }
            }).then(function (data) {

Once we have obtained the data, we display the query we used to obtain the result, and clear the “(Loading)” message.

                $('#outputArea').html("<p>Query: '"+ $('#cypher-in').val() +"'</p>");
                $('#messageArea').html('');

Then, we create an empty array to hold the attribute-value pairs we want and push the rows from the resultset into the d3 array. Basically, we make a copy of the resultset in a more practical form.

                var d3_data = [];
                $.each(data.results[0].data, function (k, v) { d3_data.push(v.row); });

Then we determine how big our chart should be. We will be using Mike Bostocks margin convention for this.

We create a barchart that has a margin of 40 pixels on top and bottom, and 200 pixels on the right – because I want to add the movienames on that side of the chart. Our graphic will occupy half the display, so the real area we can draw in is half the window size, minus the horizontal margin. The height of the graph will be scaled to 3/4 of the height of the window, minus the margins. We scale the bars to fit in that size.

                var margin = { top: 40, right: 200, bottom: 40, left: 40 },
                    width = ($(window).width()/2) - margin.left - margin.right,
                    height = ($(window).height()/2) - margin.top - margin.bottom, 
                    barHeight = height / d3_data.length;

Here we use our very first D3 function: d3.max. It will run over the d3_data array and apply our selector function to each element, then find the maximum value of the set.

This will give us the highest amount of players on any movie. Then we add a bit of margin to that so our barchart will look nicer later on, when we use this value to drive the size of the bars in the chart.

                var maxrange = d3.max(d3_data, function (d) { return d[1]; }) + 3;

Next, we use an important part of the D3 library: scales. Scales are used everywhere. Basically, they transform a range of values into another range. You can have all kinds of scales, logarithmic, exponential, etcetera, but we will stick to a linear scale for now. We will use one scale to transform the number of players into a size of the bar (scale_x), and another to transform the position of a movie in the array into a position on the barchart (scale_y).

We use rangeRound at the end, instead of range, to make sure our values are rounded to integers. Otherwise our axis ticks will be on fractional pixels and D3 will anti-alias them, creating very fuzzy axis tickmarks.

                var scale_x = d3.scale.linear()
                    .domain([0, maxrange])
                    .rangeRound([0, width]);

                var scale_y = d3.scale.linear()
                    .domain([d3_data.length, 0])
                    .rangeRound([0, height]);

And once we have the scales, we define our axes. Note that this doesn’t “draw” anything, we’re just defining functions here that tell D3 what they are like. An axis is defined by its scale, the number of ticks we want to see on the axis, and the orientation of the tickmarks.

                var xAxis = d3.svg.axis()
                    .scale(scale_x)
                    .ticks(maxrange)
                    .orient("bottom");

                var yAxis = d3.svg.axis()
                    .scale(scale_y)
                    .ticks(d3_data.length)
                    .orient("left");      

So far, we’ve just loaded our data, and defined the graph area we will use. Now, we’ll start to manipulate the Document Object Model to add tags where we need them. We will start with the most important one: the SVG tag. SVG stands for Scalable Vector Graphics, and it’s a web standard that allows us to draw in the browser page, inside the area defined by this tag. And that is what we will do now, inside the already existing element with id = “outputArea”. This allows us to place the graphics right where we want them to be on the page.

The preserveAspectRatio attribute defines how the chart will behave when the area is resized. See the definition of PreserveAspectRatioAttribute for more information.

                var chart = d3.select("#outputArea")
                    .append("svg")
                    .attr("width", (width + margin.left + margin.right) + "px")
                    .attr("height", (height + margin.top + margin.bottom) + "px")
                    .attr("version", "1.1") 
                    .attr("preserveAspectRatio", "xMidYMid")
                    .attr("xmlns", "http://www.w3.org/2000/svg");

Note that we assign this manipulation to a variable. This variable will hold the position in the DOM where the tag “svg” is placed and we can just add to it, to add more tags.

The first svg element in the svg should have a title and a description, as per the standard. So that is what we will do. After the <svg> tag, we will append a <title> tag with a text.

                chart.append("title")
                    .text("Number of players per movie");

                chart.append("desc")
                    .text("This SVG is a demonstration of the power of Neo4j combined with d3.js.");

Now, we will place a grouping element inside the svg tag. This element < g > will be placed at the correct margin offsets, so anything inside it has the correct margins on the left- and top sides.

                chart = chart.append("g")
                    .attr("transform", "translate(" + (+margin.left) + "," + (+margin.top) + ")");

Now we place the x- and y-axis that we defined earlier on, in the chart. That definition was a function – and now we come CALLing. Here we will also add a class-attribute, that will later allow us to style the x and y-axis separately. We put the x-axis on the bottom of the graph, and the y-axis on the left side.

Since the axes are composed of many svg-elements, it makes sense to define them inside a group-element, to make sure the entire axis and all its elements will be moved to the same location.

Please note that the SVG-coordinates have the (0,0) point at the top left of the svg area.

                chart.append("g")
                    .attr("class", "x axis")
                    .attr("transform", "translate(0," + (+height) + ")")
                    .call(xAxis);
                chart.append("g")
                    .attr("class", "y axis")
                    .attr("transform", "translate(" + (-1) + ",0)")
                    .call(yAxis);

Finally, we get to the point where we add the bars in the chart. Now, this looks strange. Because what happens is that we define a placeholder element in the SVG for every data element, and then D3 will walk over the data elements and call all of the functions after the “data” statement for each data-element.

So everything after the data-statement will be called for EACH element. And if it is a new data-element that wasn’t yet part of the DOM, it will be added to it. And all of the statements that manipulate the DOM, will be called for it.

So, we define the bar as an SVG-group, with a certain class (“bar”) and a position, that is based on the position in the array of elements. We just display the elements ordered in the way we received them. So adding an ORDER BY statement to the CYPHER query will change the order of the bars in the chart.

                var bar = chart.selectAll("g.bar")
                    .data(d3_data)
                    .enter().append("g").attr("class","bar")
                    .attr("transform", function (d, i) { return "translate(0," + i * barHeight + ")"; });

Then, still working with the bar itself, we define a rectangle of a certain width and height. We add the text “players: ” to it, for display inside the rectangle. We define the text as having class “info”. Then, we add the text with the name of the movie for display on the right of the bar, and give it class “movie”. And that concludes our D3 script.

                bar.append("rect")
                    .attr("width", function (d) { return scale_x(d[1]) + "px"; }) 
                    .attr("height", (barHeight - 1) + "px" );

                bar.append("text")
                    .attr("class", "info")
                    .attr("x", function (d) { return (scale_x(d[1]) - 3) + "px"; })
                    .attr("y", (barHeight / 2) + "px")
                    .attr("dy", ".35em")
                    .text(function (d) { return 'players: ' + d[1]; });

                bar.append("text")
                    .attr("class","movie")
                    .attr("x", function (d) { return (scale_x(d[1]) + 3) + "px"; })
                    .attr("y", (barHeight / 2) + "px")
                    .attr("dy", ".35em")
                    .text(function (d) { return d[0]; });
            });
        };
    </script>

All that remains is to define the HTML of the page itself that will display at first. This is the same HTML as before, but with a different CYPHER query.

<h1>Cypher-test</h1>
<p>
<div id="messageArea"></div>
<p>
<table>
  <tr>
    <td><input name="cypher" id="cypher-in" value="MATCH (n:Movie)-[:ACTED_IN]-(p:Person) return n.title as movietitle, count(p) as players" /></td>
    <td><button name="post cypher" onclick="post_cypherquery();">execute</button></td>
  </tr>
</table>
<p>
<div id="outputArea"></div>
<p>
</body>
</html>

Unfortunately, at this point our barchart will look like this:

Unstyled d3 barchart in black and white with blocky axes

Unstyled d3 barchart

What happened was that we didn’t use ANY styling at all. That doesn’t look very nice, so we will add a stylesheet to the page. Note that you can style SVG-elements just as you can style standard HTML elements, but there is one caveat: the properties are different. Where you can use the color attribute (style="color:red") on an HTML element, you would have to use the stroke and fill attributes for SVG elements. Just the text element alone has a lot of options, as shown in this tutorial.

So, we now add a stylesheet at the end of the <head> section. We start with the definitions of the bars – they will be steelblue rectangles with white text. The standard text will be white, right-adjusted text that stands to the left of the starting point. The movie-text will be left-adjusted and stand to the right of its starting position, in italic black font.

<style>
#outputArea {
  height: 50px;
}

#outputArea rect {
  fill: steelblue; 
}

#outputArea text {
  fill: white;
  font: 10px sans-serif;
  text-anchor: end;
  color: white;
}

#outputArea text.movie {
  fill: black;
  font: 10px sans-serif;
  font-style: italic;
  text-anchor: start;
}

Now we define the axes. They will be rendered with very small lines (crispEdges), in black. The minor tickmarks will be less visible than the normal tickmarks.

.axis {
  shape-rendering: crispEdges;
  stroke: black;
}

.axis text {
  stroke: none;
  fill: black;
  font: 10px sans-serif;
}

.y.axis text {
  display: none;
}

.x.axis path,
.x.axis line,
.y.axis path,
.y.axis line {
  fill: none;
  stroke: black;
  stroke-width: 1px;
  shape-rendering: crispEdges;
}

.x.axis .minor,
.y.axis .minor {
  stroke-opacity: .5;
}
</style>

And now, we get this:

Styled d3 barchart in color with crisp axes

Styled d3 barchart

We can add more bells and whistles, such as animations and nice gradients for the bars, but that’s something I’ll leave to you.

By the way: we can add SVG elements, but in the same manner we could also just add plain HTML elements and create a nicely styled tabular lay-out for the same data. Or we could create a Sankey diagram. But that’s something for another post.

Using Neo4j CYPHER queries through the REST API

Lately I have been busy with graph databases. After reading the free eBook “Graph Databases” I installed Neo4j and played around with it. Later I went as far as to follow the introduction course as well as the advanced graph modeling course at Xebia. This really helped me start playing around with Neo4j in a bit more structured manner than I was doing before the course.

I can recommend installing Neo4j and just starting to use it, as it has a great user interface with excellent help manuals. For instance, this is the startscreen:

Neo4j-startscreen

Easy, right?

One of the things that struck me was the ease with which you could access the data from ECMAscript (or Javascript if you’re very old and soon-to-be obsoleted). Using the REST API you can access the graph in several ways, reading and writing data from and to the database. It’s the standard interface, actually. There’s a whole section in the Neo4j help dedicated to using the REST API, so I’ll leave most of it alone for now.

What’s important, is that you can also fire CYPHER queries at the database, receiving an answer in either JSON or XML notation, or even as an HTML page. This is important because CYPHER queries are *very* easy to write and understand. As an example, the following query will search the sample database that is part of the Neo4j database, with Movies and Actors.

Suppose we want to show all nodes that are of type Movie. Then the statement would be:

MATCH (m:Movie) RETURN m

A standard query to discover what’s in the database is
MATCH (m) RETURN m LIMIT 100
This is limited to 100 items (nodes and/or relationships), because it does return the entire database otherwise and in the user interface this starts to slow things down. It’s gorgeous, but when your resultsets are getting big it does slow things down. Here’s how it looks:

Neo4j-results-1

Very nice. But not that useful if we want a particular piece of data. However, if we want to show only the actors that played in movies, we could say:

MATCH (p:Person)-[n:ACTED_IN]->(m:Movie) RETURN p

This returns all nodes of type Person that are related to a node of type Movie through an edge of type ACTED_IN.

While I won’t go into more detail on Cypher, let’s just say it is a very powerful abstraction layer for queries on graphs that would be very hard to do with SQL. It’s not as performant as actually giving Neo4J explicit commands using the REST API, which you want to do if you build an application where sub-second performance is an issue, but for most day-to-day queries it’s pretty awesome.

So how do we use the REST API? That’s pretty easy, actually. There are two options, and one of them is now deprecated – that is the old CYPHER endpoint. So we use the new http://localhost:7474/db/data/transaction/commit endpoint, which starts a transaction and immediately commits it. And yes, you can delete and create nodes through this endpoint as well so it’s highly recommended to not expose the database to the internet, unless you don’t mind everyone using your server as a public litterbox.

You have to POST requests to the endpoint. There are endpoints you can access with GET, like http://localhost:7474/db/data/node/1 which returns the node with id=1 on a HTML page, but the transactional endpoint is accessed using POST.

The easiest way to use a REST API is to start a simple webserver, create a simple HTML-page, add Javascript to it that responds to user input and that calls the Neo4j REST API.

Since we’re going to use Javascript, be smart and use JQuery as well. It’s pretty much a standard include.

How to proceed:

  • First, start the Neo4j software. This opens a screen where you can start the database server, and in the bottom left of the screen you can see a button labeled “Options…”. Click that, then click the “Edit…” button in the Server Configuration section. Disable authentication for now (and make very sure you don’t do this on a server connected to the internet) by changing the code to the following:

    # Require (or disable the requirement of) auth to access Neo4j
    dbms.security.auth_enabled=false

    This makes sure we don’t have the hassle of authentication for now. Don’t do this on a connected server though.

  • Now, we start the Neo4j database. Otherwise we get strange errors.
  • Then, proceed to build a new HTML-page (I suggest index.html) on your webserver, that looks like this:
    <html>
    <head>
    <title>Cypher-test</title>
    <script src="scripts/jquery-2.1.3.js"></script>
    </head>
    <body>
        <script type="text/javascript">
            function post_cypherquery() {
                $('#messageArea').html('<h3>(loading)</h3>');
    
                $.ajax({
                    url: "http://localhost:7474/db/data/transaction/commit",
                    type: 'POST',
                    data: JSON.stringify({ "statements": [{ "statement": $('#cypher-in').val() }] }),
                    contentType: 'application/json',
                    accept: 'application/json; charset=UTF-8'                
                }).done(function (data) {
                    $('#resultsArea').text(JSON.stringify(data));
                    /* process data */
                    // Data contains the entire resultset. Each separate record is a data.value item, containing the key/value pairs.
                    var htmlString = '<table><tr><td>Columns:</td><td>' + data.results[0].columns + '</td></tr>';
                    $.each(data.results[0].data, function (k, v) {
                        $.each(v.row, function (k2, v2) {
                            htmlString += '<tr>';
                            $.each(v2, function (property, nodeval) {
                                htmlString += '<td>' + property + ':</td><td>' + nodeval + '</td>';
                            });
                            htmlString += '</tr>';
                        });
                    });
                    $('#outputArea').html(htmlString + '</table>');
                })
                .fail(function (jqXHR, textStatus, errorThrown) {
                    $('#messageArea').html('<h3>' + textStatus + ' : ' + errorThrown + '</h3>')
                });
            };
        </script>
    
    <h1>Cypher-test</h1>
    <p>
    <div id="messageArea"></div>
    <p>
    <table>
      <tr>
        <td><input name="cypher" id="cypher-in" value="MATCH (n) RETURN n LIMIT 10" /></td>
        <td><button name="post cypher" onclick="post_cypherquery();">execute</button></td>
      </tr>
    </table>
    <p>
    <div id="outputArea"></div>
    <p>
    </body>
    </html>
    

    Make sure you don’t forget to download JQuery and put the downloaded file in the scripts subdirectory below the directory in which you place this file. The line where you need to change the corresponding filename if you rename the file or place it somewhere else is highlighted in red.

While this doesn’t look very pretty, it gets the job done. It executes an AJAX call to Neo4j, using the transactional endpoint. After receiving a success-response, it writes the raw answer (JSON) into the resultsArea over the input box. Then, it parses the result and writes the results to a table in the dataArea.

The resultset from neo4j is returned as a data-object that looks like this:

{
  "results" : [ {
    "columns" : [ "n" ],
    "data" : [ 
      {"row" : [{"name":"Leslie Zemeckis"}]}, 
      {"row" : [{"title":"The Matrix","released":1999,"tagline":"Welcome to the Real World"}]}, 
      {"row" : [{"name":"Keanu Reeves","born":1964}]} 
      ]
  } ],
  "errors" : [ ]
}

Note the different row-variants. Since we did not limit ourselves to a single type of node, we got both Movie- and Actor-nodes in the result. And even within a single node-type, not every node has the same properties. The neo4j manual has more information about the possible contents of the resultset.

Please note that ANY valid Cypher-statement will be executed, including CREATE and DELETE statements, so feel free to play around with this.

– Ronald Kunenborg.

Presentation: history of DWH modeling

Dear readers, on june 6th I held a keynote presentation in front of 300 people, summarizing the state of DWH modeling. The conference proceedings of the day are available at BI-Podium .

My own presentation is available here as well: Next Generation DWH Modeling 2013 conference keynote speech

The Anchor Modeling folks also wrote a summary: Next Generation DWH Modeling