Tropology is an experiment in visualizing conceptual networks from TVTropes. The initial idea was that, starting from a chosen concept, we could visualize the list of tropes and articles it referenced, along with the relationships between them.
Experimenting with the data, which even after some judicious pruning includes over 220 thousand nodes and 11 million relationships, led me to quickly conclude that the model was too widely connected. We couldn’t just visualize the raw data and needed to apply some sort of direction in order to make the visualizations sensible.
Let’s see how to go about that.
Or, apples vs. papaya salad.
Short version: I’ve moved Tropology to PostgreSQL for performance reasons, and because after some evaluation, Neo4j wasn’t as good a fit as it seemed at first blush.
You may want to start by catching up with the other Tropology articles.
Episode IV: A new approach
We got better performance with the last changes I made to the import process, at the cost of parallelization and a significantly less clean Clojure codebase.
As I was going about wrapping them up, I got a recommendation from Michael Hunger. Summarized:
- Don’t use the batch REST API
- The Cypher query endpoint is outdated
- Try merges
Well then. So much for that feature branch.
The story so far
Yesterday we were discussing our admittedly makeshift, not-quite-batched Neo4j implementation.
The long and short of it is that I was initially attacking the import process as one would on a JDBC client for a relational database. Query for these values here, create those if some don’t exist, insert relationships here, etc.
That seems to be woefully inefficient in Neo4j.
Welcome. You may want to first catch up on Part 1 of this little experiment.
All done? OK then.
This is the first part of a series of articles on a small experiment I’m building. The intent is to crawl TVTropes, to find possible relationships between tropes and the material they appear in, as well as shared concepts.
I want to be able to not only visualize a concept and the tropes that it links to, but also how they relate to each other. A bonus would be being able to query how far away a concept is from another - for instance, how many steps we need to take before we can go from Cowboy Bebop to Macross Missile Massacre.
This one bit me a bit hard and, seeing as I could find only one reference for it in the context of Clojure and Leiningen, I thought a small write-up would help.
After including some ClojureScript code in my project, lein started barfing when I attempted to build an uberjar with any optimization setting other than none.
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.io.ByteStreams.limit(Ljava/io/InputStream;J)Ljava/io/InputStream;, compiling:(/private/var/folders/64/ckdtdxm14059n9wh8rhf1mvw0000gn/T/form-init527739448955044836.clj:1:123)
There was only one obscure reference I found, from an IRC chat now a couple of years old. My first thought was that something in the ClojureScript code was throwing off the compiler, but it seemed odd that this would trigger a call to a specific non-existent method.
Other searches - outside of the lein context - made me realize it was actually a dependency conflict. It was only triggered on uberjar because there hadn’t been any ClojureScript code in the project before, so the compiler had never had to be invoked.
lein deps :tree
showed there were multiple versions of com.google.guava being referenced, with the one used by ClojureScript being 18.0 but a much older 11.0.1 being referenced by urly. Seems like lein either doesn’t realize there’s a conflict or defaults to the older library, which was causing the error when ClojureScript attempted to build.
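A conflict like this is usually handled by excluding the stale transitive dependency and pinning the newer one. What follows is only a sketch of that approach, not necessarily the fix that ended up in the project: the project name, the Clojure version and the urly version are assumptions for illustration, and the remaining dependencies are left out.

```clojure
;; project.clj (fragment). Project name and versions are illustrative assumptions.
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.7.0"]
                 ;; Stop urly from dragging in its old transitive Guava (11.0.1)
                 [clojurewerkz/urly "1.0.0" :exclusions [com.google.guava/guava]]
                 ;; Pin the Guava version that ClojureScript expects
                 [com.google.guava/guava "18.0"]])
```

Running lein deps :tree again afterwards should show a single Guava version if the exclusion took effect.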
DECEMBER 2015 UPDATE: I’ve run some tests with the latest ClojureScript and made some minor changes to the code, which significantly improved performance. Read this post for updated numbers. Keeping this post around for archival reasons.
The set up
This whole thing started not as a performance test, but as me experimenting with ClojureScript and Quil while reading Matt Pearson’s Generative Art. As such, it is not the most scientific of comparisons, and is instead born out of my notes from exploring how to do sketches with ClojureScript for the web.
Consider the following example:
In the datatypes chapter of Programming Clojure 2nd edition (page 155) there’s the following bit where the CryptoVault is extended to support the default input streams:
The calls to spit/slurp didn’t work in my tests at first, even after reloading the namespace.