Using Neo4J for Website Analytics

This is a cross-post from the digitalian

Working at the office customizing and installing different content management systems (CMS) for some of our clients, I have seen different ways of tracking users and then using the collected data to:

  1. generate analytics reports
  2. personalize content

I am not talking about simple Google Analytics data. I am referring to ways to map users into predefined personas and then modify the content of the site based on what that persona is interested into.

Recently, I grew interested in graph databases – in particular Neo4J – and since it’s quite simple to create a graph that maps the structure of a website (i.e. the website sitemap), I begun to think that there might be a way to employ the power of graph DBs to:

  1. track users more efficiently
  2. use the collected data to generate better analytics reports
  3. get better realtime content personalization
  4. be able to efficiently answers questions that would have been hard to answer using regular tracking mechanisms

This morning, while I was having breakfast, I started brainstorming possible ways to use Neo4J to track users. My first obvious ideas was to simply connect each user to all the pages they had visited:

An interesting starting point which I immediately redraw adding the concept of time tracking in each visit, by connecting visited pages in a sequence using a linked list:

Next, before going to the office, I posted these two sketches on Twitter asking for some feedback. Lots of people interested in graph databases (and Neo4J) were immediately interested.

Among them, @kennybastani suggested to “think of each visit as an event node” in order to “preserve continuity of paths“. This is my interpretation of his suggestion:

At this point we realized that the 140 characters limit imposed by Twitter would get in the way of an interesting discussion and we decided to turn this brainstorming session into a blog post to give everyone a chance to participate.

Personas

A persona is a group of visitors that visit a website, that share a common objective, purpose or background which make them distinct from another group of visitors.

Understanding in real time, as the visitor browses a site, what is the purpose of his visit, can help tailoring the content displayed in the sidebars to better appeal his/her interests.

Each page of the site can be assigned a value for each of the different personas identified as the most likely types of visitors for the site at hand. Here is my try to add this concept to the previous graph:

In order to better understand the idea behind this, let’s take a concrete example: an animal rescue shelter. Some visitors will come to the site to look to volunteer; others will want to understand how to fundraise. Others are looking to adopt either a cat or a dog. Some are coming to donate money to the shelter.

The following graph takes in account all the above personas and shows some possible values assigned to certain pages.

By the time the user gets to the Puppies:Page, he/she has already accumulated from the previous pages:

  • 12 points as a person that wants to adopt
  • 4 points as a dogs lover
  • 2 points as a possible donor

therefore in the sidebars of the Puppies:Page we should show some Calls To Action (CTAs) to donate money to the shelter using images of dog puppies. “Duh — you might say — the person is on the Dogs/Puppies page, so obviously you will show such a CTA!” That’s true… but let’s imagine that, after visiting Lucy:Page, that the visitor goes back to the homepage. By this time he/she has accumulated:

  • 26 points as a person that wants to adopt
  • 20 points as a dogs lover
  • 2 points as a possible donor

Guess what should be shown big on the homepage?

  • At the top I would show a large CTA inviting the visitor to donate, using the image of a puppy dog (because the user is a dogs lover).
  • In the sidebars, I would place a few CTAs inviting the user to adopt some of the dogs that have been at the shelter for the longest time (because the user is interested in adopting).
  • Most of the articles featured on the home page should be happy stories about dogs who were adopted and now live a happy life with their new families.

Value of using graphs

Everything I said so far can be accomplished with relational databases. The main concern there is the size of the database (tracking databases tend to become large quickly and this obviously can deter real time performance).

The question that remains on the table is whether graphs can bring value to this scenario either by:

  • providing faster feedback to the content management engine
  • allowing to do better offline reporting
  • allowing to answer questions that would be difficult to answer with a traditional relational database

Ideas? Thoughts?

Building a SPA with AngularJS and Neo4J – Data Structure (Second Try)

This is a crosspost from The Digitalian

At the end of last post I quoted the eye-opening comment that Graph Grandmaster Wes Freeman left on one of my questions on Stack Overflow. Following his advice I decided to change the way each of the queues in my application was handled, adding two extra nodes, the head and the tail.

new queue structure

Inserting a New Card

Moving the concepts of head and tail from simple relationships to nodes allows to have a single case when inserting a new card. Even in the special case of an empty queue…

new queue structure

all we have to do to add a new card to the tail of the queue is:

  • find the (previous) node connected by a [PREV_CARD] and a [NEXT_CARD] relationships to the (tail) node of the queue
  • create a (newCard) node
  • connect the (newCard) node to the (tail) node with both a [PREV_CARD] and a [NEXT_CARD] relationships
  • connect the (newCard) node to the (previous) node with both a [PREV_CARD] and a [NEXT_CARD] relationships
  • finally delete the original [PREV_CARD] and a [NEXT_CARD] relationships that connected the (previous) node to the (tail) node of the queue

new queue structure

which translates into the following cypher query:

MATCH (theList:List)-[tlt:TAIL_CARD]->(tail)-[tp:PREV_CARD]->(previous)-[pt:NEXT_CARD]->(tail)
WHERE ID(theList)={{listId}}
WITH theList, tail, tp, pt, previous
CREATE (newCard:Card { title: "Card Title", description: "" })
CREATE (tail)-[:PREV_CARD]->(newCard)-[:NEXT_CARD]->(tail)
CREATE (newCard)-[:PREV_CARD]->(previous)-[:NEXT_CARD]->(newCard)
DELETE tp,pt
RETURN newCard

Archiving a Card

Now let’s reconsider the use case in which we want to archive a card. Let’s review the architecture:

new queue structure

We have:

  • each project has a queue of lists
  • each project has an archive queue to store all archived cards
  • each list has a queue of cards

In the previous queue architecture I had 4 different scenarios, depending in whether the card to be archived was the head, the tail, or a card in between or if it was the last card left in the quee.

Now, with the introduction of the head and tail nodes, there is only one scenario, because the head and the tail node are there to stay, even in the case in which the queue is empty:

  • we need to find the (previous) and the (next) nodes, immediately before and after (theCard) node, which is the node that we want to archive
  • then, we need to connect (previous) and (next) with both a [NEXT_CARD] and a [PREV_CARD] relationship
  • then, we need to delete all the relationships that were connecting (theCard) to the (previous) and (next) nodes

The resulting cypher query can be subdivided in three distinct parts. The first part is in charge of finding (theArchive) node, given the ID of (theCard) node:

MATCH (theCard)<-[:NEXT_CARD|HEAD_CARD]-(l:List)<-[:NEXT_LIST]-(h)(theArchive:Archive)
WHERE ID(theCard)={{cardId}}

Next, we execute the logic that I described few lines earlier:

WITH theCard, theArchive
MATCH (previous)-[ptc:NEXT_CARD]->(theCard)-[tcn:NEXT_CARD]->(next)-[ntc:PREV_CARD]->(theCard)-[tcp:PREV_CARD]->(previous)
WITH theCard, theArchive, previous, next, ptc, tcn, ntc, tcp
CREATE (previous)-[:NEXT_CARD]->(next)-[:PREV_CARD]->(previous)
DELETE ptc, tcn, ntc, tcp

Finally, we insert (theCard) at the tail of the archive queue:

WITH theCard, theArchive
MATCH (theArchive)-[tat:TAIL_CARD]->(archiveTail)-[tp:PREV_CARD]->(archivePrevious)-[pt:NEXT_CARD]->(archiveTail)
WITH theCard, theArchive, archiveTail, tp, pt, archivePrevious
CREATE (archiveTail)-[:PREV_CARD]->(theCard)-[:NEXT_CARD]->(archiveTail)
CREATE (theCard)-[:PREV_CARD]->(archivePrevious)-[:NEXT_CARD]->(theCard)
DELETE tp,pt
RETURN theCard

Performance

I am very satisfied with how much simpler the new queries are, both to write and to understand, when compared to the older one discussed in the previous post of this series. At this point Wes suggested to run some performance tests. The results can be found below. Not much of a difference from a performance point of view – especially because this was an end-to-end performance test, calling N times the server, executing one insertion/archival at a time.
I ran each test three times, with the individual times in columns TIME 1, TIME 2 and TIME 3.

Using the New Architecture

TEST #1: INSERT (n) CARDS (in the same list)
# CARDS TIME 1 (ms) TIME 2 (ms) TIME 3 (ms)
10 77 56 58
100 552 534 526
1000 5133 4978 4903
10000 51342 51454 54709
TEST #2: ARCHIVE (n) CARDS (from the same list)
# CARDS TIME 1 (ms) TIME 2 (ms) TIME 3 (ms)
10 54 56 55
100 509 402 377
1000 3395 3508 2903
10000 27675 23381 22312

Using the Old Queries

TEST #3: INSERT (n) CARDS (in the same list)
# CARDS TIME 1 (ms) TIME 2 (ms) TIME 3 (ms)
10 116 118 111
100 1019 996 899
1000 7673 6262 6200
10000 62680 55663 58081
TEST #4: ARCHIVE (n) CARDS (from the same list)
# CARDS TIME 1 (ms) TIME 2 (ms) TIME 3 (ms)
10 148 133 124
100 954 784 676
1000 4958 3950 3539
10000 26921 23908 22942

Conclusions

I hope you find this post interesting as I found working through this exercise. I want to thank again Wes for his remote help (via Twitter and Stack Overflow) in this interesting (at least to me) experiment.

Building a SPA with AngularJS and Neo4J – Data Structure (First Try)

This is a crosspost from The Digitalian

Building Trello with AngularJS and Neo4J

As I mentioned in one of the previous articles of this series, the project I am working on — called Collaborative_Minds — consists of implementing the basic functionalities of Trello, a free web-based project management application made by Fog Creek Software, the legendary New York based software development company founded by Joel Spolsky, the father of Stack Overflow.

Trello uses a paradigm for managing projects known as kanban:

  • Projects are represented by a board
  • each board contains a number of lists (corresponding to task lists)
  • lists contain cards (corresponding to tasks).

Cards are supposed to progress from one list to the next, for instance mirroring the flow of a feature from idea to implementation.

Queue of queues of queues

Trello is a Single Page Application built with Backbone.js, Node.js and MongoDB — you can read here all the details about Trello’s cutting edge technology (as Joel himself describes it).

Maybe because I use Trello on a daily basis at work, I figured it would be the perfect candidate for an exercise to learn about SPAs, AngularJS, JavaScript-based programming stacks and Neo4J.

So, let’s review one more time the basic idea of Trello:

  • we have a number of Projects (represented as boards)
  • every Project has a number of Lists (whose order can matter)
  • every List contains a number of Cards (again the order could potentially matter)

I decided to implement this with a queue of queues of queues as the main data structure:

  • a main queue of Projects
  • each Project in the main queue has a queue of Lists
  • each List contains a queue of Cards

Queue of queues of queues

This structure is represented in the figure above as an oriented graph:

  • the Application node contains a queue of projects (in the sample graph we see represented three projects)
  • each queue is implemented with two relationships:
  • HEAD_XYZ pointing to the head node of the queue
  • TAIL_XYZ pointing to the tail node of the queue
  • all nodes in the queue are linked together by two types of relationships:
  • NEXT_XYZ pointing from the HEAD to the TAIL
  • PREV_XYZ pointing from the TAIL to the HEAD
  • each Project node contains a queue of lists (in the graph above, only “Second Project” has a queue of lists attached, with three lists, namely “To Do“, “In Progress” and “Done“)
  • each Project node also has an Archive list, connected with the ARCHIVE_LIST relationship, which is used to store archived cards
  • each List node contains a queue of cards (in the graph above, only the “In Progress” list has cards, three cards)

To represent an empty queue, for example when a List contains no cards, I adopted the convention where both the HEAD_CARD and the TAIL_CARD relationships point back to the list node itself.

Let’s see what all this looks like when implemented in Neo4J. The following figure is a screenshot taken directly from the awesome Neo4J broswer. In this screenshot we can see a very basic structure, with 3 distinct projects, each with 3 lists and an archive list as well. No cards are present in this diagram.

Basic Graph

For reference, this is the cypher query that generates exactly the sample graph show above:

CREATE (mainApp:CollaborativeMinds { name: “Collaborative Minds” }),
(proj1:Project { name: “My First Project”, company: “ABC Inc.” }),
(proj2:Project { name: “My Second Project”, company: “ACME” }),
(proj3:Project { name: “My Third Project”, company: “XYZ Corp.” }),
(mainApp)-[:HEAD_PROJECT]->(proj1), (mainApp)-[:TAIL_PROJECT]->(proj3),
(proj1)-[:NEXT_PROJECT]->(proj2), (proj2)-[:NEXT_PROJECT]->(proj3), (proj3)-[:PREV_PROJECT]->(proj2), (proj2)-[:PREV_PROJECT]->(proj1),
(proj1ToDoList:List { name: “To Do” }), (proj1InProgressList:List { name: “In Progress” }), (proj1DoneList:List { name: “Done” }), (proj1ArchiveList:List { name: “Archive” }),
(proj1)-[:ARCHIVE_LIST]->(proj1ArchiveList), (proj1)-[:HEAD_LIST]->(proj1ToDoList), (proj1)-[:TAIL_LIST]->(proj1DoneList),
(proj1ToDoList)-[:NEXT_LIST]->(proj1InProgressList), (proj1InProgressList)-[:NEXT_LIST]->(proj1DoneList), (proj1DoneList)-[:PREV_LIST]->(proj1InProgressList), (proj1InProgressList)-[:PREV_LIST]->(proj1ToDoList),
(proj1ArchiveList)-[:HEAD_CARD]->(proj1ArchiveList), (proj1ArchiveList)-[:TAIL_CARD]->(proj1ArchiveList),
(proj1ToDoList)-[:HEAD_CARD]->(proj1ToDoList), (proj1ToDoList)-[:TAIL_CARD]->(proj1ToDoList),
(proj1InProgressList)-[:HEAD_CARD]->(proj1InProgressList), (proj1InProgressList)-[:TAIL_CARD]->(proj1InProgressList),
(proj1DoneList)-[:HEAD_CARD]->(proj1DoneList), (proj1DoneList)-[:TAIL_CARD]->(proj1DoneList),
(proj2ToDoList:List { name: “To Do” }), (proj2InProgressList:List { name: “In Progress” }), (proj2DoneList:List { name: “Done” }), (proj2ArchiveList:List { name: “Archive” }),
(proj2)-[:ARCHIVE_LIST]->(proj2ArchiveList), (proj2)-[:HEAD_LIST]->(proj2ToDoList), (proj2)-[:TAIL_LIST]->(proj2DoneList),
(proj2ToDoList)-[:NEXT_LIST]->(proj2InProgressList), (proj2InProgressList)-[:NEXT_LIST]->(proj2DoneList), (proj2DoneList)-[:PREV_LIST]->(proj2InProgressList), (proj2InProgressList)-[:PREV_LIST]->(proj2ToDoList),
(proj2ArchiveList)-[:HEAD_CARD]->(proj2ArchiveList), (proj2ArchiveList)-[:TAIL_CARD]->(proj2ArchiveList),
(proj2ToDoList)-[:HEAD_CARD]->(proj2ToDoList), (proj2ToDoList)-[:TAIL_CARD]->(proj2ToDoList),
(proj2InProgressList)-[:HEAD_CARD]->(proj2InProgressList), (proj2InProgressList)-[:TAIL_CARD]->(proj2InProgressList),
(proj2DoneList)-[:HEAD_CARD]->(proj2DoneList), (proj2DoneList)-[:TAIL_CARD]->(proj2DoneList),
(proj3ToDoList:List { name: “To Do” }), (proj3InProgressList:List { name: “In Progress” }), (proj3DoneList:List { name: “Done” }), (proj3ArchiveList:List { name: “Archive” }),
(proj3)-[:ARCHIVE_LIST]->(proj3ArchiveList), (proj3)-[:HEAD_LIST]->(proj3ToDoList), (proj3)-[:TAIL_LIST]->(proj3DoneList),
(proj3ToDoList)-[:NEXT_LIST]->(proj3InProgressList), (proj3InProgressList)-[:NEXT_LIST]->(proj3DoneList), (proj3DoneList)-[:PREV_LIST]->(proj3InProgressList), (proj3InProgressList)-[:PREV_LIST]->(proj3ToDoList),
(proj3ArchiveList)-[:HEAD_CARD]->(proj3ArchiveList), (proj3ArchiveList)-[:TAIL_CARD]->(proj3ArchiveList),
(proj3ToDoList)-[:HEAD_CARD]->(proj3ToDoList), (proj3ToDoList)-[:TAIL_CARD]->(proj3ToDoList),
(proj3InProgressList)-[:HEAD_CARD]->(proj3InProgressList), (proj3InProgressList)-[:TAIL_CARD]->(proj3InProgressList),
(proj3DoneList)-[:HEAD_CARD]->(proj3DoneList), (proj3DoneList)-[:TAIL_CARD]->(proj3DoneList)

In the next figure, I have added a few cards to each of the lists of the second project. I have archived a few cards as well, to show what the archive looks like:

Basic Graph

Once settled for the above data structure, I immediately started working on building the two main queries I needed to be able to insert new cards and to archive existing ones.

Inserting a New Card

Being these queues, all cards need to be inserted at the end of the queue, in other words at the tail of the queue, keeping in mind that an empty queue is represented by both the head and the tail relationships pointing back to the empty list node.
This means that we have two cases:

  • when the list is initially empty, inserting the first card means changing the HEAD_CARD and the TAIL_CARD relationships to be both pointing at the first card
  • when the list has already at least one card, in other words when the list already has a tail, we need to change the TAIL_CARD relationship to point to the new card, and then build:
  • a PREV_CARD relationship going from the new card to the previous tail
  • a NEXT_CARD relationship going from the previous tail to the new card

Using the power of the OPTIONAL MATCH, I translated this idea in the following query:

// first get a hold of the list to which we want to add the new card
MATCH (theList:List) WHERE ID(theList)=5
// check if the list already has at least one card
OPTIONAL MATCH (theList)-[tlct:TAIL_CARD]->(currentTail:Card)
// check if the list is empty
OPTIONAL MATCH (theList)-[tltl1:TAIL_CARD]->(theList)-[tltl2:HEAD_CARD]->(theList)
WITH
theList,
CASE WHEN currentTail IS NULL THEN [] ELSE [(currentTail)] END AS currentTails,
currentTail, tlct,
CASE WHEN tltl1 IS NULL THEN [] ELSE [(theList)] END AS emptyLists,
tltl1, tltl2
// create the new card
CREATE (newCard:Card { title: “Card Title”, description: “” })
// handle the case in which the list already had at least one card
FOREACH (value IN currentTails |
CREATE (theList)-[:TAIL_CARD]->(newCard)
CREATE (newCard)-[:PREV_CARD]->(currentTail)
CREATE (currentTail)-[:NEXT_CARD]->(newCard)
DELETE tlct)
// handle the case in which the list was empty
FOREACH (value IN emptyLists |
CREATE (theList)-[:TAIL_CARD]->(newCard)
CREATE (theList)-[:HEAD_CARD]->(newCard)
DELETE tltl1, tltl2)
RETURN newCard

Not too bad… Although I have to admit that it took me a while to get to this query in the form you see it here, being that I am not familiar with cypher and graph databases.

Essentially, OPTIONAL MATCH statements identify which scenario are we in, either working with an empty list or with a list that has already some cards in it.

Then, each of those CASE statements generate either an empty array or an array with exactly one element. These arrays are then being used as logical switches by the FOREACH statements that drive the logic underneath each case.

Archiving a Card

Let’s take a look again at the simple graph I introduced earlier that shows a few cards loaded into three lists:

Basic Graph

Now imagine the process of taking anyone of those cards and moving it into the archive queue. It shouldn’t be too hard to see that we can have 4 different scenarios:

  1. The card to be archived is the only card in that list (see card #19 in the graph above). In this case we need to move the card and then, based on the convention we adopted, point the HEAD_CARD and TAIL_CARD relationships back to the list node.
  2. The card is in the middle of a queue (see cards #22 and #23 in the graph above). This is the simplest case. Simply move the card, delete all its relationships with the card before and after, and finally link the card before and the card after with both a NEXT_CARD and a PREV_CARD relationships.
  3. The card is at the head of a queue (such as cards #16 and #21 in the graph above). In this case we need to move the card to the archive, delete all its relationships with both the list node and the card immediately next in the queue. Finally we need to create a new HEAD_CARD relationship going from the list node to the new head of the queue.
  4. Symmetric case to #3. The card is at the tail of a queue (such as cards #17 and #24 in the graph above). In this case we need to move the card to the archive, delete all its relationships with both the list node and the card immediately previous in the queue. Finally we need to create a new TAIL_CARD relationship going from the list node to the new tail of the queue.

If you have understood well how the insertion query works, you should be able to grasp the following query as well:

// first let’s get a hold of the card we want to archive
MATCH (theCard:Card) WHERE ID(theCard)=44
// next, let’s get a hold of the correspondent archive list node, since we need to move the card in that list
OPTIONAL MATCH (theCard)<-[:NEXT_CARD|HEAD_CARD*]-(theList:List)(theArchive:List)
// let’s check if we are in the case where the card to be archived is in the middle of a list
OPTIONAL MATCH (before:Card)-[btc:NEXT_CARD]->(theCard:Card)-[tca:NEXT_CARD]->(after:Card)
OPTIONAL MATCH (next:Card)-[ntc:PREV_CARD]->(theCard:Card)-[tcp:PREV_CARD]->(previous:Card)
// let’s check if the card to be archived is the only card in the list
OPTIONAL MATCH (listOfOne:List)-[lootc:TAIL_CARD]->(theCard:Card)(theCard:Card)-[tcs:NEXT_CARD]->(second:Card)-[stc:PREV_CARD]->(theCard:Card)
// let’s check if the card to be archived is at the tail of the list
OPTIONAL MATCH (listToTail:List)-[ltttc:TAIL_CARD]->(theCard:Card)-[tcntl:PREV_CARD]->(nextToLast:Card)-[ntltc:NEXT_CARD]->(theCard:Card)
WITH
theCard, theList, theProject, theArchive,
CASE WHEN theArchive IS NULL THEN [] ELSE [(theArchive)] END AS archives,
CASE WHEN before IS NULL THEN [] ELSE [(before)] END AS befores,
before, btc, tca, after,
CASE WHEN next IS NULL THEN [] ELSE [(next)] END AS nexts,
next, ntc, tcp, previous,
CASE WHEN listOfOne IS NULL THEN [] ELSE [(listOfOne)] END AS listsOfOne,
listOfOne, lootc, tcloo,
CASE WHEN listToHead IS NULL THEN [] ELSE [(listToHead)] END AS listsToHead,
listToHead, lthtc, tcs, second, stc,
CASE WHEN listToTail IS NULL THEN [] ELSE [(listToTail)] END AS listsToTail,
listToTail, ltttc, tcntl, nextToLast, ntltc
// let’s handle the case in which the archived card was in the middle of a list
FOREACH (value IN befores |
CREATE (before)-[:NEXT_CARD]->(after)
CREATE (after)-[:PREV_CARD]->(before)
DELETE btc, tca)
FOREACH (value IN nexts | DELETE ntc, tcp)
// let’s handle the case in which the archived card was the one and only card in the list
FOREACH (value IN listsOfOne |
CREATE (listOfOne)-[:HEAD_CARD]->(listOfOne)
CREATE (listOfOne)-[:TAIL_CARD]->(listOfOne)
DELETE lootc, tcloo)
// let’s handle the case in which the archived card was at the head of the list
FOREACH (value IN listsToHead |
CREATE (listToHead)-[:HEAD_CARD]->(second)
DELETE lthtc, tcs, stc)
// let’s handle the case in which the archived card was at the tail of the list
FOREACH (value IN listsToTail |
CREATE (listToTail)-[:TAIL_CARD]->(nextToLast)
DELETE ltttc, tcntl, ntltc)
// finally, let’s move the card in the archive
// first get a hold of the archive list to which we want to add the card
WITH
theCard,
theArchive
// first get a hold of the list to which we want to add the new card
OPTIONAL MATCH (theArchive)-[tact:TAIL_CARD]->(currentTail:Card)
// check if the list is empty
OPTIONAL MATCH (theArchive)-[tata1:TAIL_CARD]->(theArchive)-[tata2:HEAD_CARD]->(theArchive)
WITH
theArchive, theCard,
CASE WHEN currentTail IS NULL THEN [] ELSE [(currentTail)] END AS currentTails,
currentTail, tact,
CASE WHEN tata1 IS NULL THEN [] ELSE [(theArchive)] END AS emptyLists,
tata1, tata2
// handle the case in which the list already had at least one card
FOREACH (value IN currentTails |
CREATE (theArchive)-[:TAIL_CARD]->(theCard)
CREATE (theCard)-[:PREV_CARD]->(currentTail)
CREATE (currentTail)-[:NEXT_CARD]->(theCard)
DELETE tact)
// handle the case in which the list was empty
FOREACH (value IN emptyLists |
CREATE (theArchive)-[:TAIL_CARD]->(theCard)
CREATE (theArchive)-[:HEAD_CARD]->(theCard)
DELETE tata1, tata2)
RETURN theCard

Nonetheless, this is a massive query. I am not sure about its performance, because I haven’t measured it, but its relative complexity is already enough to make me think that there must be a way to simplify this, possibly modifying the underlying data structure.

Furthermore, keep in mind, that it took me a long time to come up with this query. What inspired me to build this query the way I did, was a clever graph gist to play Tic Tac Toe using cypher queries posted by @SylvainRoussy. Before getting this inspiration I was stuck at running four separate queries, which is when I decided to ask for help on Stack Overflow.

A few days after posting my question, Graph Grandmaster Wes Freeman left an eye-opening comment:

You might be interested to see my skip list graph gist… it handles empty lists by having a tail and head that are never deleted, so the case is always the same (removing an internal node)

This is brilliant. Not just because it reveals a much easier way to solve the problem at hand, but because it was a real eye opener for me. When I think of data structures, in my head I visualize them the way I was used to do it in college, like in the following figure, which is taken from Wikipedia and shows a doubly-linked list:

doubly-linked list

This automatically translated in my head into the need for a graph relationship every time I saw a pointer in the data structure. Wes’ comment made me finally realize that a node is an even better translation, especially when it allows to simplify the number of scenarios when working with the data structure itself.

And so we are back to the drawing board. The results will come in the next post of this series.