Open source project details

Up until now, whenever someone asked about our open source projects, I referred them directly to my GitHub page. That’s where the repositories live, sure, but it can be a bit noisy: it mixes the projects I’ve created and maintain with repositories I’ve forked (some of which I contribute to, but not all) and with older projects.

I decided it was time to clear things up a bit, so I’ve done a brief write-up on the projects that I’m actively maintaining. It has a better introduction to each, including both their application and licensing details.

Head over to the open source page to read more, and let me know if you have any questions.

Visualizing TVTropes - Concept exploration

Introduction

Tropology is an experiment in visualizing conceptual networks from TVTropes. The initial idea was that, starting from a chosen concept, we could visualize the list of tropes and articles it referenced, along with the relationships between them.
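As a rough illustration of that initial idea, here is a minimal Python sketch that collects a concept, the pages it references, and the links among them. The `neighborhood` function, the `links` dictionary, and the page names are all hypothetical, made up for this example; this is not Tropology’s actual code or data model.

```python
def neighborhood(links, concept):
    """Return (nodes, edges) for a concept, the pages it references,
    and every link that connects two pages in that set."""
    nodes = {concept} | set(links.get(concept, ()))
    edges = sorted(
        (src, dst)
        for src in nodes
        for dst in links.get(src, ())
        if dst in nodes  # drop links that leave the neighborhood
    )
    return nodes, edges


# Hypothetical link data, invented for illustration only
links = {
    "Concept": ["Trope A", "Trope B"],
    "Trope A": ["Trope B", "Unrelated Page"],
    "Trope B": ["Concept"],
}

nodes, edges = neighborhood(links, "Concept")
for src, dst in edges:
    print(f"{src} -> {dst}")
```

Even in this toy case the edge list grows faster than the node list, which hints at why a densely linked data set gets hard to read.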

Experimenting with the data, which even after some judicious pruning includes over 220 thousand nodes and 11 million relationships, led me to quickly conclude that the model was too widely connected. We couldn’t just visualize the raw data and needed to apply some sort of direction in order to make the visualizations sensible.

Let’s see how to go about that.

Tropology performance: PostgreSQL vs Neo4j

Or, apples vs. papaya salad.

Short version: I’ve moved Tropology to PostgreSQL for performance reasons and because, after some evaluation, Neo4j wasn’t as good a fit as it seemed at first blush.

You may want to start by catching up with the other Tropology articles.

All done?

Visualizing TVTropes - Part 4: Merges

Episode IV: A new approach

We got better performance with the last changes I made to the import process, at the cost of parallelization and a significantly less clean Clojure codebase.

As I was going about wrapping them up, I got a recommendation from Michael Hunger. Summarized:

  • Don’t use the batch REST API
  • The Cypher query endpoint is outdated
  • Try merges

Well then. So much for that feature branch.

Visualizing TVTropes - Part 3: Proper batching

The story so far

Yesterday we were discussing our (admittedly) somewhat hacky, not-quite-batched Neo4j implementation.

The long and short of it is that I was initially attacking the import process as one would with a JDBC client for a relational database. Query for these values here, create those that don’t exist, insert relationships there, etc.

That seems to be woefully inefficient in Neo4j.

Visualizing TVTropes - Part 2: Neo4j optimizations

Welcome. You may want to first catch up on Part 1 of this little experiment.

All done? OK then.

Visualizing TVTropes - Part 1

Introduction

This is the first part of a series of articles on a small experiment I’m building. The intent is to crawl TVTropes, to find possible relationships between tropes and the material they appear in, as well as shared concepts.

I want to be able to visualize not only a concept and the tropes that it links to, but also how they relate to each other. A bonus would be being able to query how far one concept is from another - for instance, how many steps we need to take before we can go from Cowboy Bebop to Macross Missile Massacre.
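To make the "how many steps" idea concrete, here is a minimal sketch in Python using a breadth-first search over an in-memory link map. Everything in it is an assumption for illustration: the `steps_between` function, the `links` dictionary, and the intermediate page names are invented, and the link data does not reflect the real TVTropes graph or how the project stores it.

```python
from collections import deque


def steps_between(links, start, goal):
    """Return the fewest link hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical link data, invented for illustration only
links = {
    "Cowboy Bebop": ["Hypothetical Trope A", "Hypothetical Trope B"],
    "Hypothetical Trope A": ["Hypothetical Work C"],
    "Hypothetical Trope B": ["Macross Missile Massacre"],
}

hops = steps_between(links, "Cowboy Bebop", "Macross Missile Massacre")
print(hops)
```

Breadth-first search visits pages in order of distance, so the first time it reaches the goal it has found a shortest path. On a graph with millions of links, the same query would more likely be pushed down to the database than run in memory like this.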