FBXL Social

I think the "quadratic at scale" concern is the one thing I failed to share from my summary thread of the differences between "shared heap" in ATProto and "message passing" in ActivityPub.

In short: if everyone fully self hosts in message passing, you just send messages to relevant recipients

In a shared heap approach, to *not* miss relevant messages, all users must receive copies of all messages (including irrelevant ones), which is quadratic if everyone fully self-hosts

This is from the ""Message passing" vs "shared heap" architectures" subsection of https://dustycloud.org/blog/how-decentralized-is-bluesky/

> A world of full self-hosting is not possible with Bluesky. In fact, it is worse than the storage requirements, because the message delivery requirements become quadratic at the scale of full decentralization: to send a message to one user is to send a message to all. Rather than writing one letter, a copy of that letter must be made and delivered to every person on earth.

You might say "well, gossip helps with this!" or something, but it doesn't.

Bluesky *strongly emphasizes* in their documentation that they are aiming for "no missed message replies". Without directed message delivery, everyone needs to *receive* every message.

Regardless of how the messages are distributed, it's O(n^2) in the best case if everyone fully self hosts and there is no directed delivery.
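
To put rough numbers on the comparison, here is a minimal sketch that counts deliveries when every user self-hosts. The function names, the fixed follower count, and the one-post-per-user default are assumptions made for illustration; they do not come from either protocol.

```python
def directed_deliveries(num_users, followers_per_user, posts_per_user=1):
    """Message passing: each post goes only to the author's followers."""
    return num_users * posts_per_user * followers_per_user


def shared_heap_deliveries(num_users, posts_per_user=1):
    """No directed delivery: every other self-hosted node must receive
    every post so that it can pick out the relevant ones locally."""
    return num_users * posts_per_user * (num_users - 1)


# Hypothetical network sizes; followers_per_user=5 is an arbitrary assumption.
for n in (10, 100, 1000):
    print(n, directed_deliveries(n, followers_per_user=5), shared_heap_deliveries(n))
```

With a bounded follower count the first figure grows linearly in the number of users, while the second grows with the square of it. The count is about how many copies must be received in total, so it does not depend on which route each copy takes.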

@cwebber adjacent thought but do you ever think about how I'm reading this toot on my laptop which is powered using -- solar by day, probably coal by night, but also a battery ... transmitted over my wifi which is powered by solar right now but passed along ... cables? and then if I open my phone and turn off the wifi, the same thing would be relayed to me via ... satellites? towers?

sometimes I think about it all and need to lie down.

@platypus I do think about how messages get around sometimes and it's pretty overwhelming, lol

@cwebber I'm interested to do more big-O comparisons as well.

for a large reply thread, say a thousand actively replying users on hundreds of separate instances, the number of AP messages that need to be rapidly distributed to assemble complete reply-thread view on each instance is pretty huge, no? N^2? and then also fan-out to followers?

each also needs to fetch/render media and social cards (O(instances)). and that doesn't cover *viewers* distinct from participants/followers.

Now SSB gets around this by only fetching messages from your followers and several degrees out from them, for instance. You don't receive all messages, but it's lossy, lossier than ActivityPub and "missing replies" by far, and you still receive *many* more messages than are relevant to you.

Of course the general approach taken by Bluesky is: "well, we aren't aiming for full self-hosting".

Message delivery is presently centralized. As more nodes are added, message delivery grows quadratically. Inherently, ATProto's approach relies on only having a few meganodes delivering.

This means that "credible exit" is still a viable path, but "full decentralization" is not possible unless Bluesky pivots to incorporating "message passing" into its architecture in some way or another, or accepts a great degree of lossy replies, which is explicitly stated as something to avoid.

@bnewbold It's not n^2 because it's sending messages to *followers*. It's O(n), because inherently sending messages directly to interested parties becomes O(n).

The thing that makes it n^2 is the need to receive all the messages that are not relevant to you, every time.

@bnewbold It's a bit like... when I wrote my first platformer video game I was like "why is it getting so slow when I am adding so many enemies?" And that's because I knew very little about algorithms and had every enemy do a collision check against every other enemy. O(n^2), of course. As I learned more I learned how to filter down the information of what's relevant to that enemy.

Directed delivery is one way to do that.
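
The collision-check analogy can be sketched as follows. This is an illustrative example only: the grid bucketing, the cell size, and the function names are assumptions, and the grid version is simplified so that it ignores pairs that straddle a cell boundary.

```python
from collections import defaultdict
from itertools import combinations


def naive_pairs(positions):
    """Every enemy is checked against every other enemy: n*(n-1)/2 pairs."""
    return list(combinations(range(len(positions)), 2))


def grid_pairs(positions, cell=10.0):
    """Filter first: only enemies in the same grid cell are compared.

    Simplification: neighbours across a cell boundary are not checked;
    a real game would also look at adjacent cells.
    """
    buckets = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        buckets[(int(x // cell), int(y // cell))].append(i)
    pairs = []
    for members in buckets.values():
        pairs.extend(combinations(members, 2))
    return pairs


enemies = [(1, 1), (2, 2), (50, 50), (51, 51), (95, 5)]
print(len(naive_pairs(enemies)), len(grid_pairs(enemies)))
```

The filtering step plays the same role as directed delivery: each participant only looks at the small subset that can matter to it, instead of everything in the world.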

@joeyh Yeah, that's accurate. Regardless, there's a filtering over the set of all information possible in the network to make it not quadratic in SSB.

@joeyh (Although "what happens if everyone self-hosted their own hubs" is something SSB is also just not designed for considering.)

@cwebber @bnewbold gotta be honest Christine, I loved your article and thread from the other day but the main thing I take issue with is your insistence on public, non-DM replies only being relevant to the people they are addressed to. A reply is public speech and serves a rhetorical function of speaking out loud in a public space, rather than passing a note. Its value to people over a DM *is* its ability to be read by as many unaddressed parties as possible

@darius @bnewbold Yes, but a publish-subscribe architecture with message delivery does change how that information gets filtered and flows, especially in terms of information routing.

@laurenshof @bnewbold The ddos effect is because of fetching post previews tho, which is not part of AP

@cwebber @laurenshof I'm sympathetic to Laurens' focus on real issues, but I think the link preview fetching phenomenon isn't super baked-in to AP-the-protocol and could be "solved" in isolation.

eg, would be easy to do it the way bsky does it (author populates), though that has its own issues/controversy.

@bnewbold @cwebber "assemble complete reply-thread view"

Why is that even a goal though? On a large thread there's no way to display, let alone read, all branches of the tree. I'm usually perfectly fine reading the portion of the thread that my instance just happens to know about for random reasons (usually, on account of someone on my instance following the author of that post)

@nemobis @cwebber I plan to get in to this more in a longer response to Christine's blog post, but a design goal for atproto is to have "no compromises" compared to a centralized platform. we don't want to try and convince/educate users that they don't "need" consistent and complete views of public conversations (or accurate "counts", or low-latency notifications, etc)

(A reason I sometimes hear about is that "reply-guys" would somehow be discouraged from posting very popular replies if they saw how popular they are, i.e. that someone else has already posted the same thing elsewhere. I don't know whether this is true but if it is there are easier solutions, such as 1) fetch more replies when someone attempts to post a reply, 2) when a thread has a gazillion replies, give a warning "uh this thread is huge, maybe move it to a forum or take a walk in the park?".)

@bnewbold I get it, but giant threads are broken on Twitter as well, just as they are on email. People keep adding and removing ccs. There are a million different ways of displaying the tree so everyone gets surprised by whichever display sequence Twitter happens to pick. You need to click every child of every post to surface other branches of the tree which weren't displayed for whatever reason. Notifications are haphazard. Etc.

@cwebber

@bnewbold I guess what I'm trying to say is that perhaps "no missing replies ever" can be replaced by an easier goal that covers the needs of 75 % of the easiest cases and people will be happier at a lower cost.

@cwebber

@nemobis @cwebber I think this conversation really gets at a difference in approach. we (Bluesky) are trying to migrate mass numbers of users off incumbent centralized platforms into alternatives with "credible exit" and interoperation. we want to make that as seamless and low-friction as possible, and asking folks to change expectations and behavior *at the same time* cuts against that.

can see this w/ quote posts, interaction counts, recommendation feeds, etc

@bnewbold As I've said before, it's a design space choice. Architectural decisions do flow from that, I'm just trying to help people understand what that means from a topology, including a topology of power distribution.

There are tradeoffs; it's best if people understand which ones are chosen.

@bnewbold @nemobis @cwebber credible exit confuses me, how am I supposed to migrate my Bluesky specific data to another instance or service? There is no other Bluesky and afaik the data in a pds is not compatible with mastodon or anything else?

@fleeky @cwebber @nemobis @bnewbold I suppose one way to look at it would be if Bluesky PBC goes rogue, a group of volunteers/some other non-profit org can stand up a "Newsky" (if you will) and your PDS can work with that.

@icy @fleeky @nemobis @bnewbold That's exactly the "credible exit" thing

@darius @cwebber @bnewbold mods ban this user

(/hj)

@pearl @bnewbold naw @darius is great, treasured even <3

@cwebber Forgive my ignorance, but is this similar to decentralization with Bitcoin/blockchains: it's decentralized in the sense that anyone can get the whole public ledger, but to do that you need oodles of storage, so people end up using things like Coinbase?

Or am I way off in thinking this?

edit: I didn't notice the whole long conversation this sparked; I will pore over your previous responses in case I missed an answer to this!

@rwg It is a similarish mindset (not fully but there's large overlap in "big global brain" thinking) tho I think touching that is touching some... sensitive territory, so I haven't really talked about it

@rwg @cwebber supposedly to run a full bitcoin node the space requirement is around 700gb, and for ethereum 1 terabyte... these requirements will also grow over time.

that said, it's not like mastodon / activitypub do *not* have this problem... it's just not as pronounced.

@fleeky @rwg Those problems do exist in blockchains too, that's true. Though also bitcoin and ethereum have been notably online for a heck of a lot longer. Also I don't pose blockchains as solutions.

AP still *scales down* though. Of course it grows over time. But the "scale of scaling" isn't even close to the same thing.

@cwebber @fleeky@prsm.space

I'm sure I speak for everyone in saying, "thank you!!!" so much for all the analysis you've done, and for taking the time to answer questions. I know you want us to check out Spritely, so here's a plug!

https://spritely.institute/about/

@rwg Aw thanx :D

@cwebber couldn't, at least in theory, relays be organized in a cascading arrangement (à la *cough* NNTP or BBS servers back in the days *cough*), possibly hierarchically? If so, wouldn't that solve the O(n^2) problem? What am I missing? (Sure, that would result in uneven distribution times, but that's a different problem.)

@zacchiro this is "quadratic as all users become their own relays", which is *not* the cascading arrangement

@cwebber it's entirely possible I'm missing something egregious here, but I don't understand why "everyone fully self-hosts" implies "everyone sends messages to everyone". (Which is what I understand from what you are saying, maybe wrongly.) Is there something in ATProto that intrinsically forbids a distribution scheme where everyone receives messages from a single upstream feed and delivers them to N downstream feeds? (Yes, I see a lot of problems with this, but not the O(n²) one.)

@zacchiro I'm talking about whether or not you can apply the design to a non-hierarchical system

@bnewbold @nemobis @cwebber Where is the hardcoded labelling service hardcoded? At the app view (bsky.app) level?

@Hyolobrika @nemobis @cwebber in the client app; labeler gets sent as HTTP header on every request

Is it removed in any third-party clients?