Data ownership and layer fungibility

I recently got into the topic of data ownership during a conversation with Carey Lening, which helped catalyze a few ideas I’ve been kicking around for a while.

In short, data ownership requires layer fungibility.

I’ll elaborate below.

We are both dismissive of how “own your data” has been co-opted to mean “extract your data into our system so you can monetize it and we can get a cut” by countless (often token-issuing) startups. They are too eager to disregard not only how hard it is to build two-sided marketplaces, but also the fact that we currently seriously lack private data analysis tools.

So let’s define it first.

When I say data ownership, I mean that:

  1. You should have control of where your data is stored and how/if it’s shared; and
  2. A copy that you have should be interchangeable with the data in the system you extracted it from: it should either be “live” (directly accessed) or in hibernation (something you can restore from), not an embalmed slab of XML missing major organs.
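The second condition boils down to a round-trip test: a copy is in hibernation only if restoring it gives you back exactly what you exported. Here is a minimal sketch of that test, assuming a toy workout record. `Workout`, `export_full`, `export_lossy`, `restore` and `is_hibernation` are hypothetical names for illustration, not Apple Fitness or HealthKit APIs.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Workout:
    # Hypothetical record type standing in for a single Fitness entry.
    id: str
    kind: str
    minutes: int
    heart_rate: list[int] = field(default_factory=list)


def export_full(records: list[Workout]) -> str:
    """Serialize every field, so the copy can be restored later."""
    return json.dumps([asdict(r) for r in records])


def export_lossy(records: list[Workout]) -> str:
    """The 'embalmed slab' case: the heart-rate series is silently dropped."""
    return json.dumps(
        [{"id": r.id, "kind": r.kind, "minutes": r.minutes} for r in records]
    )


def restore(blob: str) -> list[Workout]:
    """Re-import an export into the (toy) live system."""
    return [Workout(**d) for d in json.loads(blob)]


def is_hibernation(records: list[Workout], exporter) -> bool:
    """A copy is 'in hibernation' if restoring it yields the original data."""
    return restore(exporter(records)) == records


data = [Workout("w1", "run", 30, [120, 135, 142])]
print(is_hibernation(data, export_full))   # True
print(is_hibernation(data, export_lossy))  # False
```

The lossy export still restores without errors, which is exactly the problem: it looks like a backup, but what comes back is missing organs.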

As a trivial example: there is no reason why my Fitness information should be anywhere other than my watch and the phone it’s connected to. Maybe most users trust Apple and want it backed up to iCloud; maybe I don’t trust them and want it included exclusively in my local backup. Having that choice would satisfy the first condition. You could satisfy the second by making sure that you can re-import this backup into Fitness at any point, which would make the backup something in hibernation, waiting for you to reactivate it.

Contrast that with how Twitter is set up: you have no control over where your data is stored, you don’t get to choose who you share it with and, once you export a backup, that’s it - there is no way to restore your tweets to the platform if an overeager flunky decides to nuke your account because you posted that Musk is cisgender.

It should be obvious that data ownership is a sliding scale, not a toggle.

Being able to choose whether your data ever leaves your device is step one; choosing where to back it up gets you further; remote services accessing data solely from your personal data store (PDS) even more so, as does on-device private data analysis; all the way to the theoretical, pie-in-the-sky ideal where you get to revoke access even to a datum you have already shared and that has left your PDS.

But to get any of that, you need system layer fungibility.

Back to the Twitter example: Twitter’s entire stack is take-it-or-leave-it - it is non-fungible and 100% coupled with the user interface you use to access it and with whatever policies the current deranged owner may choose to apply.

Mastodon and ActivityPub fare better. The protocol is a standard, and you get your choice of instance, but your control over storage is coupled with the instance’s policies and the domain it runs under. It’s better, sure, but the only way to have a real measure of control is to run and maintain your own instance, something I expect most users are unlikely to do. This also leads to some other issues, such as the side effects of having to customize your Fediverse experience by picking an instance, but I’ve discussed that before on BlueSky:

I suspect the fact that you are meant to customize your FediVerse experience by choosing an instance has something to do with it.

It feels like a case of "most intolerant minority wins", but aggregating the various absurd intolerant behaviors that develop separately on federated-enough instances.


— Ricardo J. Méndez (@ricardo.bsky.social) September 5, 2024 at 11:03 AM

And then we have BlueSky and AT Protocol, which decouple storage, moderation, and identity. Under such a setup, anyone could use BlueSky’s default interface to create their content, but keep it in their own PDS, where no rogue moderator gets to modify their data.

If BlueSky continues growing at its current pace, I expect a small market of fungible PDS providers to pop up, with 1-click migrations. Potentially you could even use a second provider for replication in case your primary were to crash, instead of being fully tied to a single instance à la ActivityPub. Any provider would benefit from lowered friction, because users would not need to also decide on a new root domain and username, nor wonder how their followers will find them.

You are much closer to owning your data in that scenario than in either of the previous two.

This is also why I expect an OSI-like model for identity to be more resilient: an application layer should not be able to dictate terms to the layers below, nor decide how a lower layer communicates with its peers at the same level; at the same time, these lower layers should be fungible for any layer above.
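To make “fungible for any layer above” concrete, here is a minimal sketch assuming a toy key-value storage layer. The application layer only talks to an interface, and a separate identity layer maps a stable user id to whichever store currently holds that user’s data, loosely echoing the storage/identity split described above. `PostStore`, `MemoryStore`, `Directory`, `publish` and `migrate` are hypothetical names for illustration, not AT Protocol or BlueSky APIs.

```python
from __future__ import annotations

from typing import Protocol


class PostStore(Protocol):
    """Storage layer: the only contract the layers above may rely on (hypothetical)."""
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> str | None: ...
    def keys(self) -> list[str]: ...


class MemoryStore:
    """A hypothetical hosted provider; any class with this shape is interchangeable."""
    def __init__(self, name: str) -> None:
        self.name = name
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str | None:
        return self._data.get(key)

    def keys(self) -> list[str]:
        return sorted(self._data)


class Directory:
    """Identity layer: maps a stable user id to the store that currently holds their data."""
    def __init__(self) -> None:
        self._where: dict[str, PostStore] = {}

    def bind(self, user: str, store: PostStore) -> None:
        self._where[user] = store

    def resolve(self, user: str) -> PostStore:
        return self._where[user]


def publish(directory: Directory, user: str, key: str, text: str) -> None:
    """Application layer: writes through the interface, never to a concrete provider."""
    directory.resolve(user).put(key, text)


def migrate(directory: Directory, user: str, new_store: PostStore) -> None:
    """Copy everything to a new provider, then repoint the identity; the user id never changes."""
    old = directory.resolve(user)
    for key in old.keys():
        value = old.get(key)
        if value is not None:
            new_store.put(key, value)
    directory.bind(user, new_store)


directory = Directory()
directory.bind("alice", MemoryStore("provider-a"))
publish(directory, "alice", "post-1", "hello")
migrate(directory, "alice", MemoryStore("provider-b"))
publish(directory, "alice", "post-2", "still me")
print(directory.resolve("alice").keys())  # ['post-1', 'post-2']
```

Neither `publish` nor anyone resolving “alice” has to know that the provider changed; that ignorance is what makes the storage layer fungible.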

Finally, this aligns with something we have seen repeatedly: either you have a tight, invisible bundling, à la Apple, or you win by providing the cleanest unbundling possible.

Sitting on the fence doesn’t get you far enough.


Published: 2024-09-19