Graph Databases, RoamResearch, and Personal Knowledge Management
How RoamResearch Uses New Database Technology To Bring Life To An 80 Year-old Idea
Quick Roadmap
In this short article, I explain the (very) basics of graph databases and use RoamResearch’s note-taking software as an example of an early and powerful mainstream application of this new technology.
Technical Background
As knowledge advances in any field, understanding the connections between ideas and data becomes increasingly important for making new insights and forging progress.
Graph databases (GDBs) are a type of NoSQL database designed to store and query highly connected information efficiently. More formally, Neo4j defines a graph database as a “database designed to treat the relationships between data as equally important to the data itself” [1]. To handle these datasets, GDBs support semantic queries, which retrieve information by matching patterns of relationships rather than by looking up isolated records.
Written in a graph query language such as SPARQL (for RDF stores) or Cypher (for Neo4j), semantic queries process the relationships between pieces of information and return precise answers to open-ended questions by making inferences from the network of linked data stored in a GDB.
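To make the idea of pattern matching over relationships concrete, here is a minimal Python sketch. It is not how any real GDB engine is implemented, and the data, the relationship names, and the helper functions `match` and `two_hop` are all hypothetical, chosen only for illustration. Facts are stored as (subject, relationship, object) triples, and a query is a pattern in which `None` acts as a wildcard.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data: each fact is a (subject, relationship, object) triple.
TRIPLES: Set[Triple] = {
    ("Vannevar Bush", "wrote", "As We May Think"),
    ("As We May Think", "describes", "memex"),
    ("Roam", "inspired_by", "memex"),
    ("Roam", "stores_data_in", "graph database"),
}


def match(pattern: Tuple[Optional[str], Optional[str], Optional[str]],
          triples: Set[Triple] = TRIPLES) -> List[Triple]:
    """Return every triple that fits the pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


def two_hop(start: str,
            triples: Set[Triple] = TRIPLES) -> List[Tuple[str, str, str, str, str]]:
    """Follow two relationships in a row, starting from `start`."""
    paths = []
    for s, r1, mid in match((start, None, None), triples):
        for _, r2, end in match((mid, None, None), triples):
            paths.append((s, r1, mid, r2, end))
    return paths


# "What is connected to the memex, and how?"
for subject, relationship, _ in match((None, None, "memex")):
    print(subject, relationship, "memex")

# "What does Bush's writing lead to?" (a simple two-step inference)
for path in two_hop("Vannevar Bush"):
    print(" -> ".join(path))
```

The second query is the interesting one: no single stored fact links Bush to the memex, but chaining two relationships recovers the connection. Real query languages express the same idea declaratively and can optimize it far better than this loop does.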
While this technology can be applied to any area of study, I’ll focus on the growing use of GDBs in the field of personal knowledge management to argue my main points in this article. More specifically, I’ll be using RoamResearch’s note-taking app as a case study.
The Memex Is Here!?
In 1945, Vannevar Bush theorized the usefulness of computers for augmenting human thought. His ‘memex’ encapsulates this idea: “a memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility” [2].
Driven by ubiquitous tools such as Kindle, Evernote, and Google Drive, computers have already delivered the storage component of Bush’s vision. Individuals can curate their own portable, digital collection of books, articles, media, and other artifacts. As amazing as these features are, however, they don’t represent a new paradigm of tools for human thought.
In their white paper, RoamResearch argues that “almost every technology follows the same basic ‘file cabinet’ format: A unit of knowledge is saved to a certain file path, which places it within a taxonomy of folders, chapters or categories” [3]. Though significant, the only innovations brought about by the above tools are faster search, better indexing, sharing, portability, and ‘infinite’ storage space relative to analog alternatives (e.g., books on a bookshelf). Roam continues, “…unlike the brain, the file cabinet approach makes it difficult or impossible to remix or reuse the same piece of information” [3].
The problem is that this mingling of loosely correlated information is the defining memex feature that Bush argued would usher in a new paradigm for augmenting thought: “…this is the essential feature of the memex. The process of tying two items together is the important thing” [2]. By choosing to store all of a user’s data in a GDB, Roam is among the first tools to implement this important capability.
Roam explains, “…if current tools resemble filing cabinets, Roam is more akin to the nodal networks in telecommunications, or the neurons in the human brain. Rather than existing in a vacuum, each note or file becomes a node in an interconnected graph of ideas. A single node may simultaneously hold positions in several different sequences, hierarchies or file paths, and can ‘talk’ to other nodes, communicating information back and forth about the nature of each relationship” [3].
Roam’s use of a GDB for organizing files and notes increases the probability that users will generate serendipitous insights, just as Bush hoped.
Following Bush’s line of thinking, Roam users can display a graph for each page (node) in their database (network) that diagrams all of the explicit connections to other notes they have. Additionally, the footer of each note optionally displays a list of both explicit and implicit relationships to other pages in the database (nodes in the network).
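The difference between explicit and implicit relationships can be sketched in a few lines of Python. This is only an illustration, not Roam’s actual implementation: the sample pages and the function names are hypothetical, and the sketch assumes that an explicit link is written as `[[Page Title]]` while an implicit relationship is a plain-text mention of another page’s title.

```python
import re
from typing import Dict, List, Set

# Hypothetical notes: page title -> page text.
# Assumption: [[Title]] marks an explicit link to another page.
PAGES: Dict[str, str] = {
    "Memex": "Bush's device for storing books and records.",
    "Vannevar Bush": "Proposed the [[Memex]] in 1945.",
    "Roam": "A note tool. Its design echoes the memex idea.",
    "Graph Database": "Used by [[Roam]] to store notes.",
}

LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def explicit_links(text: str) -> Set[str]:
    """Titles that the text links to with [[...]]."""
    return set(LINK.findall(text))


def linked_references(title: str, pages: Dict[str, str] = PAGES) -> List[str]:
    """Pages that explicitly link to `title`."""
    return sorted(
        other for other, body in pages.items()
        if other != title and title in explicit_links(body)
    )


def unlinked_references(title: str, pages: Dict[str, str] = PAGES) -> List[str]:
    """Pages that mention `title` in plain text without linking to it."""
    word = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
    found = []
    for other, body in pages.items():
        if other == title or title in explicit_links(body):
            continue
        plain = LINK.sub("", body)  # ignore text inside explicit links
        if word.search(plain):
            found.append(other)
    return sorted(found)


print("Explicit:", linked_references("Memex"))
print("Implicit:", unlinked_references("Memex"))
```

In this toy dataset the “Roam” page never links to “Memex”, yet the sketch still surfaces it as an implicit relationship. That kind of unrequested suggestion is what the footer view is meant to provide.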
In combination, these features and standardized views repeatedly draw the user’s attention to unrealized, unexpected connections between ideas. This is significant because novel, unforeseen interactions between previously disparate ideas are commonly the basis for combinatorial innovation.
Consider the invention of the printing press: “…Gutenberg’s printing press was a classic combinatorial innovation, more bricolage than breakthrough. Each of the key elements that made it such a transformative machine — the movable type, the ink, the paper, and the press itself — had been developed separately well before Gutenberg printed his first Bible” [4].
My entire thesis rests on this core idea: if every individual had access to a tool like Roam, we’d exponentially accelerate the pace of individual breakthroughs.
As Gutenberg shows, the serendipitous revelation of previously unseen combinations is often the jolt behind a new idea or the starting point for a novel invention. Though Gutenberg’s breakthrough came from luck and life circumstance, users of Roam manufacture their own luck by automating semantic querying and using computers to constantly surface potential relationships. GDBs are what make this possible.
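One simple way a program might surface potential relationships is to look for notes that are not linked to each other but share a neighbor in the graph. The Python sketch below illustrates that heuristic only. It is an assumption for the sake of example, not a description of what Roam does, and the notes, topics, and the `suggest_connections` function are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical link graph: note title -> titles of the pages it links to.
LINKS: Dict[str, Set[str]] = {
    "Compound interest": {"Finance", "Exponential growth"},
    "Habit formation": {"Psychology", "Exponential growth"},
    "Index funds": {"Finance"},
    "Spaced repetition": {"Psychology", "Memory"},
}


def suggest_connections(
    links: Dict[str, Set[str]] = LINKS,
) -> List[Tuple[str, str, List[str]]]:
    """Suggest pairs of notes that are not directly linked but share neighbors.

    Pairs with more shared neighbors are listed first.
    """
    suggestions = []
    for a, b in combinations(sorted(links), 2):
        if b in links[a] or a in links[b]:
            continue  # already connected, nothing new to surface
        shared = sorted(links[a] & links[b])
        if shared:
            suggestions.append((a, b, shared))
    suggestions.sort(key=lambda s: (-len(s[2]), s[0], s[1]))
    return suggestions


for a, b, shared in suggest_connections():
    print(f"{a} <-> {b} (both link to: {', '.join(shared)})")
```

Here “Compound interest” and “Habit formation” are paired because both link to “Exponential growth”, even though neither note mentions the other. A production system would need a better ranking than a raw count of shared neighbors, but the principle of mining the graph for candidate connections is the same.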
In addition to technological innovation, the sparks inspired by Roam revealing connections between ideas can lead to breakthroughs in communication and education. By exposing users to unseen relationships between ideas, Roam serves up possible hooks for analogical reasoning, that is, thinking via metaphor. Many ideas are more easily communicated when presented in the context of an already understood idea. For example, if someone understands compound interest from finance, it is easier to explain compounding to them in other contexts.
Some Disadvantages
The most frequently argued disadvantages of GDBs stem from the age of the technology. GDB algorithms aren’t yet fully optimized for space and runtime efficiency. The industry doesn’t yet follow a common set of standards [5]. Companies change and update GDB software very frequently, which makes tool selection difficult.
As with all computer technologies, it takes thousands of iterations and multiple decades to reach theoretically ideal algorithms, marketplace incentives, and protocols. No different from any other engineering topic, GDBs are subject to trade-offs: resource use, cost, time, speed, and accessibility.
With time and research, computer scientists will converge on an optimal balance given these constraints. Early relational databases were not nearly as efficient or powerful as their current incarnations. It would be unreasonable to think that the same won’t hold for the early GDBs we are working with today. The development environment is changing rapidly because new improvements are being made all the time. Many commercially available GDB engines don’t fully utilize parallel processing, which means many potential speed enhancements are not yet being implemented [5]. Additionally, semantic query processing is a new and not yet fully explored research area.
I have complete faith that, with time, the necessary innovations will emerge in these contexts, just as they have in other areas of computer science such as algorithm development, hardware design, relational database design, and networks. Simply put, we do not get things perfect the first time.
Wrap Up
I’m extremely optimistic about the future of GDBs. It is common knowledge that the world is experiencing a data explosion. A corollary to this is the explosion in potential relationships between data. GDBs and their underlying technologies will play an ever-increasing role in making use of the incomprehensibly large quantity of data being collected.
The breakthrough usefulness of RoamResearch for personal knowledge management is just the tip of the iceberg when it comes to realizing the full potential of GDBs for organizing the world’s information. More so than many realize, GDBs are already making a splash. Companies like Comcast, Adobe, and eBay have successfully solved real-world problems using Neo4j’s GDB tools [6].
As GDBs become better understood and more widely discussed, and as continuous improvements accumulate, the use of GDBs will grow exponentially.
Just as a hammer is the obvious tool for a nail, GDBs will one day be as common a problem-solving tool for their appropriate set of problems as machine learning algorithms, relational DBs, and other widely available tools are for theirs.
References
[1] “What is a Graph Database? - Neo4j Graph Database Platform”, Neo4j Graph Database Platform, 2020. [Online]. Available: https://neo4j.com/developer/graph-database/. [Accessed: 20-Oct-2020]
[2] V. Bush, “As We May Think”, The Atlantic, 1945. [Online]. Available: https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/. [Accessed: 20-Oct-2020]
[3] R. Research, C. White-Sullivan and R. Meadows, “Roam Research - A note taking tool for networked thought.”, Roam Research. [Online]. Available: https://roamresearch.com/#/app/help/page/Vu1MmjinS. [Accessed: 20-Oct-2020]
[4] S. Johnson, Where good ideas come from. London: Penguin, 2011, p. 152.
[5] Y. Schulz, “A quick primer on graph databases”, IT World Canada, 2020. [Online]. Available: https://www.itworldcanada.com/blog/a-quick-primer-on-graphdatabases/434215. [Accessed: 20-Oct-2020]
[6] “Neo4j Customers - Neo4j Graph Database Platform”, Neo4j Graph Database Platform. [Online]. Available: https://neo4j.com/customers/. [Accessed: 20-Oct-2020]
Special thanks to Yogi for the help when I was writing this paper!