Open source project details

Up until now, whenever someone asked about our open source projects, I referred them directly to my GitHub page. That’s where the repositories live, sure, but it can be a bit noisy: it mixes the projects I’ve created and maintain with repositories I’ve forked (some of which I contribute to, but not all) and with older projects.

I decided it was time to clear things up a bit, so I’ve done a brief write-up on the ones I’m actively maintaining. It gives a better introduction to each project, including what it is for and how it is licensed.

Head over to the open source page to read more, and let me know if you have any questions.

cljs-build configuration tip: prefer maps over a vector

If you look at the lein cljs-build sample project, you’ll see that it states:

; The :builds option should be set to a sequence of maps.  Each
; map will be treated as a separate, independent, ClojureScript
; compiler configuration.

If you look at the example right below that description, you’ll see that it is actually set as a map of configuration profiles with the configuration name as the key. So what gives?
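As a rough illustration of the two shapes being contrasted, here is a minimal sketch of a `:cljsbuild` section. The project name, build names (`:dev`, `:prod`), paths, and plugin version are all placeholders I made up for the example, not values taken from the sample project.

```clojure
;; project.clj (sketch only; every name, path and version here is a placeholder)
(defproject example "0.1.0-SNAPSHOT"
  :plugins [[lein-cljsbuild "1.0.5"]]
  :cljsbuild
  {:builds
   ;; Map form: the key doubles as the name of the build configuration.
   {:dev  {:source-paths ["src/cljs"]
           :compiler {:output-to "resources/public/js/app.js"
                      :optimizations :whitespace}}
    :prod {:source-paths ["src/cljs"]
           :compiler {:output-to "resources/public/js/app.min.js"
                      :optimizations :advanced}}}})

;; Sequence form, as the comment describes it: each map carries its own :id.
;; :builds [{:id "dev"
;;           :source-paths ["src/cljs"]
;;           :compiler {:output-to "resources/public/js/app.js"
;;                      :optimizations :whitespace}}]
```

With either shape, a single configuration can be selected by name, for example `lein cljsbuild once dev`.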

Visualizing TVTropes - Concept exploration

Introduction

Tropology is an experiment in visualizing conceptual networks from TVTropes. The initial idea was that, starting from a chosen concept, we could visualize the list of tropes and articles it referenced, along with the relationships between them.

Experimenting with the data, which even after some judicious pruning includes over 220 thousand nodes and 11 million relationships, led me to quickly conclude that the model was too widely connected. We couldn’t just visualize the raw data and needed to apply some sort of direction in order to make the visualizations sensible.

Let’s see how to go about that.

Tropology performance: PostgreSQL vs Neo4j

Or, apples vs. papaya salad.

Short version: I’ve moved Tropology to PostgreSQL for performance reasons, and because after some evaluation, Neo4j wasn’t as good a fit as it seemed at first blush.

You may want to start by catching up with the other Tropology articles.

All done?

Visualizing TVTropes - Part 4: Merges

Episode IV: A new approach

We got better performance with the last changes I made to the import process, at the cost of parallelization and a significantly less clean Clojure codebase.

As I was going about wrapping them up, I got a recommendation from Michael Hunger. Summarized:

  • Don’t use the batch REST API
  • The Cypher query endpoint is outdated
  • Try merges

Well then. So much for that feature branch.

Visualizing TVTropes - Part 3: Proper batching

The story so far

Yesterday we were discussing our (admittedly) somewhat ghetto, not-quite-batched Neo4j implementation.

The long and short of it is that I was initially attacking the import process as one would on a JDBC client for a relational database. Query for these values here, create those if some don’t exist, insert relationships here, etc.

That seems to be woefully inefficient in Neo4j.

Visualizing TVTropes - Part 2: Neo4j optimizations

Welcome. You may want to first catch up on Part 1 of this little experiment.

All done? OK then.

Visualizing TVTropes - Part 1

Introduction

This is the first part of a series of articles on a small experiment I’m building. The intent is to crawl TVTropes, to find possible relationships between tropes and the material they appear in, as well as shared concepts.

I want to be able to not only visualize a concept and the tropes that it links to, but also how they relate to each other. A bonus would be being able to query how far away a concept is from another - for instance, how many steps we need to take before we can go from Cowboy Bebop to Macross Missile Massacre.

Chasing after NoSuchMethodError: com.google.common.io.ByteStreams.limit

This one bit me a bit hard and, seeing as I could find only one reference for it in the context of Clojure and Leiningen, I thought a small write-up would help.

After including some ClojureScript code in my project, lein started barfing when I attempted to build an uberjar with any optimization setting other than none.

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.io.ByteStreams.limit(Ljava/io/InputStream;J)Ljava/io/InputStream;, compiling:(/private/var/folders/64/ckdtdxm14059n9wh8rhf1mvw0000gn/T/form-init527739448955044836.clj:1:123)

There was only one obscure reference I found, from an IRC chat now a couple of years old. My first thought was that something in the ClojureScript code was throwing off the compiler, but it seemed odd that this would trigger a call to a specific non-existent method.

Other searches - outside of the lein context - made me realize it was actually a dependency conflict. It was only triggered on uberjar because, since the project hadn’t included any ClojureScript code before, the compiler had never had to be invoked.

Calling

lein deps :tree

showed there were multiple versions of com.google.guava being referenced, with the one used by ClojureScript being 18.0 but a much older 11.0.1 being referenced by urly. Seems like lein either doesn’t realize there’s a conflict or defaults to the older library, which was causing the error when ClojureScript attempted to build.
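One common way to deal with this kind of conflict is to exclude the older transitive dependency, so that only the Guava version ClojureScript needs ends up on the classpath. The fragment below is a sketch under assumptions: the project name and all version numbers are placeholders, and only the `:exclusions` entry is the point of the example.

```clojure
;; project.clj (sketch only; project name and version numbers are placeholders)
(defproject example "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [org.clojure/clojurescript "0.0-2913"]
                 ;; Keep urly, but stop it from pulling in its older Guava,
                 ;; leaving the version required by ClojureScript in place.
                 [clojurewerkz/urly "1.0.0"
                  :exclusions [com.google.guava/guava]]])
```

After a change like this, running `lein deps :tree` again should list a single Guava version.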

Extending types from Clojure while working on the REPL

In the datatypes chapter of Programming Clojure, 2nd edition (page 155), there’s the following bit where the CryptoVault is extended to support the default input streams:

[Image: Book capture]

The calls to spit/slurp didn’t work in my tests at first, even after reloading the namespace.