Using Neo4J for Website Analytics

This is a cross-post from the digitalian

Working at the office customizing and installing different content management systems (CMS) for some of our clients, I have seen different ways of tracking users and then using the collected data to:

  1. generate analytics reports
  2. personalize content

I am not talking about simple Google Analytics data. I am referring to ways to map users into predefined personas and then modify the content of the site based on what that persona is interested into.

Recently, I grew interested in graph databases – in particular Neo4J – and since it’s quite simple to create a graph that maps the structure of a website (i.e. the website sitemap), I begun to think that there might be a way to employ the power of graph DBs to:

  1. track users more efficiently
  2. use the collected data to generate better analytics reports
  3. get better realtime content personalization
  4. be able to efficiently answers questions that would have been hard to answer using regular tracking mechanisms

This morning, while I was having breakfast, I started brainstorming possible ways to use Neo4J to track users. My first obvious ideas was to simply connect each user to all the pages they had visited:

An interesting starting point which I immediately redraw adding the concept of time tracking in each visit, by connecting visited pages in a sequence using a linked list:

Next, before going to the office, I posted these two sketches on Twitter asking for some feedback. Lots of people interested in graph databases (and Neo4J) were immediately interested.

Among them, @kennybastani suggested to “think of each visit as an event node” in order to “preserve continuity of paths“. This is my interpretation of his suggestion:

At this point we realized that the 140 characters limit imposed by Twitter would get in the way of an interesting discussion and we decided to turn this brainstorming session into a blog post to give everyone a chance to participate.


A persona is a group of visitors that visit a website, that share a common objective, purpose or background which make them distinct from another group of visitors.

Understanding in real time, as the visitor browses a site, what is the purpose of his visit, can help tailoring the content displayed in the sidebars to better appeal his/her interests.

Each page of the site can be assigned a value for each of the different personas identified as the most likely types of visitors for the site at hand. Here is my try to add this concept to the previous graph:

In order to better understand the idea behind this, let’s take a concrete example: an animal rescue shelter. Some visitors will come to the site to look to volunteer; others will want to understand how to fundraise. Others are looking to adopt either a cat or a dog. Some are coming to donate money to the shelter.

The following graph takes in account all the above personas and shows some possible values assigned to certain pages.

By the time the user gets to the Puppies:Page, he/she has already accumulated from the previous pages:

  • 12 points as a person that wants to adopt
  • 4 points as a dogs lover
  • 2 points as a possible donor

therefore in the sidebars of the Puppies:Page we should show some Calls To Action (CTAs) to donate money to the shelter using images of dog puppies. “Duh — you might say — the person is on the Dogs/Puppies page, so obviously you will show such a CTA!” That’s true… but let’s imagine that, after visiting Lucy:Page, that the visitor goes back to the homepage. By this time he/she has accumulated:

  • 26 points as a person that wants to adopt
  • 20 points as a dogs lover
  • 2 points as a possible donor

Guess what should be shown big on the homepage?

  • At the top I would show a large CTA inviting the visitor to donate, using the image of a puppy dog (because the user is a dogs lover).
  • In the sidebars, I would place a few CTAs inviting the user to adopt some of the dogs that have been at the shelter for the longest time (because the user is interested in adopting).
  • Most of the articles featured on the home page should be happy stories about dogs who were adopted and now live a happy life with their new families.

Value of using graphs

Everything I said so far can be accomplished with relational databases. The main concern there is the size of the database (tracking databases tend to become large quickly and this obviously can deter real time performance).

The question that remains on the table is whether graphs can bring value to this scenario either by:

  • providing faster feedback to the content management engine
  • allowing to do better offline reporting
  • allowing to answer questions that would be difficult to answer with a traditional relational database

Ideas? Thoughts?


Building a Single Page Application with AngularJS and Neo4J – Introduction

This is a cross-post from the digitalian

Lucky Coincidences

A series of lucky coincidences has recently brought me to dip my toes into a JavaScript-centric programming world completely new to me, from top to bottom. Or maybe I should better say, from front to back:

Front End

  • Using AngularJS, an open-source JavaScript framework, maintained by Google, that assists with running single-page applications.

Back End

  • Using Javascript also on the server side, with ExpressJS, a web application framework built on top of Node.js
  • Using JSON (JavaScript Object Notation) to transmit data objects between the client, the server and the database
  • Using Neo4J, a fully transactional Java persistence engine that stores data structured in graphs rather than in tables.

Initially I had picked the MEAN stack, a full-stack JavaScript development environment very trendy amongst prototypers in the Angular community (where MEAN is an acronym standing for MongoDB, ExpressJS, AngularJS and Node.js).

However, after I had already built a very simple prototype of my SPA (Single Page Application) using the MEAN stack, another lucky coincidence exposed me to Neo4J and it was love at first sight. My decision was that, if I had to learn something new, I would have rather enjoyed being forced into a completely new way of thinking from a data perspective as well, and so I dropped MongoDB in favor of Neo4J from my personal stack.*

“Abandon all hope, ye who enter here.”

In Dante Alighieri’s “Divine Commedy”, when Dante passes through the gate of Hell, the gate bears an inscription which reads “Abandon all hope, ye who enter here.”

It might sound a bit too dramatic to quote that inscription here but, being used to work in a very Microsoft-centric programming environment (Windows, ASP.NET and SQL Server), diving into a Javascript-centric environment felt a lot more challenging than I could have ever imagined. I had to abandon at the entrance of this new world most of what I was used to, most of what I knew and all the tools I was already familiar with. As much as it was frightening, though, finding myself almost completely disoriented felt also refreshingly good.

In this series of posts I will try to put down in words both the mental process I went through and some of the technical challenges that I found myself facing and had to overcome while building a ficticious personal project, which consists of building a Single Page Application (SPA) recreating some/most of Trello‘s functionality.

New Tools for the Trade

Being that this is a personal project and since it doesn’t require using any Microsoft technology, I decided to work off of my MacBook Pro. This decision meant that, among other things, I had to find either a decent text editor or an IDE to write my code, as well as an easy way to version control my files.

After trying out a few different tools, I opted for Sublime Text, a very versatile cross-platform text and source code editor which I liked from the beginning.

To manage the source code I decided to grab the opportunity to try out Git, a free and open source distributed version control system designed and developed by Linus Torvalds in 2005. I downloaded it and installed it on my Mac and connected it to my empty GitHub account.

Next, I created my first repo (i.e. repository) and, in order to better understand the inner workings of Git, I decided to just use the OS X Terminal, issuing git commands directly from the command line. Needless to say, not being a command line type of person, I grew tired of that within a couple of weeks, especially because I found the add/commit/push sequence a bit too cumbersome/annoying (even with some of the shortcuts that git offers). So, recently, I have downloaded GitHub for Mac. So far so good, although it’s a bit too early to really say how I feel about it. I do enjoy the fact that with one click I am finally able to commit all my changes and sync them to my GitHub repo.

A side effect of working with GitHub was the rediscovery of the Markdown language, a minimal markup language used for all README files in GitHub. Being a plain text formatting syntax, no special editor is needed to write content using Markdown, although there are some specially-designed editors which preview markdown files with styles. For now I picked Mou a very nice markdown editor which provides instant preview side-by-side to the text being edited.

Mou Editor

Coincidentally, just a few weeks ago, finally, WordPress has announced that they are now supporting GitHub’s version of Markdown within the blog post editor! Perfect timing for me to really get addicted to this simple writing tool! Because Markdown is simple enough to learn in a few minutes, clean and elegant enough to be readable no matter your context, and it is becoming the defacto markup language of the Internet, at least among the slightly geekier types who do know markup languages :)

Perfect timing! Markdown on WordPress!

this is an image

Awesome! Just when I was getting more into the swing of things (got the image?) using Markdown, between writing .md files on GitHub and using IA Writer on my Mac, WordPress has announced that we can finally use a fancy version of Markdown (which includes also code formatting and code highlighting, just like the GitHub version) also on WordPress hosted blogs!

Now, if only iPads had a Markdown friendly keyboard!1

  1. I know, I know, some applications do use the new features provided by iOS7 and do extend the default keyboard, but really the default keyboard should be a bit more Markdown friendly and at least provide easy access to symbols like #,*,@ and tabs 

Maybe Responsive Design is not such a good idea?


While working on a side project and researching about node.js, I stumbled upon an interesting comment, made by Kiran Prasad, head of the LinkedIn’s mobile development team.

Responsive design might work for uncomplicated, one-off websites, but for applications or networks (such as LinkedIn is), responsive design is actually bad. We’re looking at

  • the entrenched use case [for desktop users],
  • the coffee-and-couch use case [for tablet users],
  • and the two-minute use case [for mobile phone users].

You can’t take a mobile app and just scale it up to tablet or desktop. A lot of responsive design is building one site that works everywhere. This might work for some websites. But it’s a bad approach for others and especially for apps… You have to come up with a completely different design for each of the above use cases.

It made me think, and brought back to mind what a colleague was recently recommending to a client (a cabinet maker company) when they asked about building a mobile version of their website. Others might have jumped on the opportunity and would have suggested a complete website redesign to make it responsive. Instead, he brought up the use case of the user who access their site from their mobile phone. “When could that happen?” he asked. “When would a person browse a cabinet maker’s website on their mobile phone? It could happen when they are at the Home Depot, looking for cabinets and they want to make some quick comparisons. So no need to put on the mobile website any media heavy content, no need to focus on the company history, or on the cabinet maker application. Just simple info about their products, with measurements and prices.

I liked his approach, use-case based. As software engineers we often end up falling to deep into the technical trenches and forget about the end user, who in fact are the people for whom we are ultimately building our products. And working from a use case perspective, responsive design, makes really no sense, because, even though it’s a really cool concept from a technical perspective, it doesn’t necessarily address any of the needs of the end users. It seems to me that it’s just another case of “just because you can do it, it doesn’t mean you should be doing it” type of scenario.