@tw @crschmidt @cshabsin link preview bots all ignore robots.txt, so mastodon is at least following precedent here.
Except that I think Mastodon's implementation is wrong: on a centralized network the preview is created at the 'request' of the person sharing, so robots.txt doesn't apply. But here it's created fully automatically, so it really should apply. The fix would be to capture the site at sharing time and send it along in the post, which is also more efficient (though prone to abuse?)
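To make the "capture at sharing time and send it along" idea concrete, here is a minimal sketch of a post that carries its own preview. It is an illustration only, not something Mastodon implements: the function name `build_note_with_preview`, the use of the ActivityStreams `preview` property for this purpose, and the example.com URLs are all assumptions.

```python
import json


def build_note_with_preview(content, link, title, description, image_url):
    """Build an ActivityStreams-style Note that embeds a link preview.

    Hypothetical shape: the author's server fetches the page once at
    sharing time and ships the result, so receiving servers do not
    need to crawl the link themselves.
    """
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Note",
        "content": content,
        # Assumption: the preview travels as a nested object on the Note.
        "preview": {
            "type": "Page",
            "url": link,
            "name": title,
            "summary": description,
            "image": {"type": "Image", "url": image_url},
        },
    }


note = build_note_with_preview(
    content="Some maps of housing in Cambridge",
    link="https://example.com/maps",           # placeholder URL
    title="Housing maps",
    description="Parcel-level maps",
    image_url="https://example.com/card.png",  # placeholder URL
)
serialized = json.dumps(note, indent=2)
```

The abuse concern is visible in the sketch: a receiving server has no way to confirm that the embedded title and image match what the linked page really serves unless it fetches the page itself.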
@jefftk @tw @cshabsin yeah, being prone to abuse and "hard to standardize across all implementations" are the reasons it was rejected in 2017 and has languished as an untouched feature request since 2020 (respectively). Time to rethink that. (I don't love that a single implementation is 95% of the fediverse, but it is; standardization is frankly secondary to making sure the core implementation works well.)
That's not actually true, you just can't see us from across The Great Wall of Mastodon.
I'd love to put AP federation on the project I am working on eventually but I think it would be better suited to not tie in with what's currently out there
This would be for future reference tho because it's not even in alpha yet lol
@kroner @crschmidt @tw @cshabsin @jefftk Just use custom Object types. Most servers will drop them if they don’t understand.
Over the years, I made a handful of maps of various things in Cambridge; I have collected some, but not all of them, on this page about housing things in Cambridge.
This includes things like maps of where you could legally build a fourplex (short answer: not many places!); the distribution of tax paid per parcel (Kendall Square pays a lot!) and more.
Fun fact: sharing this link on Mastodon caused my server to serve 112,772,802 bytes of data, in 430 requests, over the 60 seconds after I posted it (>7 r/s). Not because humans wanted them, but because of the LinkFetchWorker, which kicks off 1-60 seconds after Mastodon indexes a post (and possibly before it's ever seen by a human).
Every Mastodon instance fetches and stores its own local copy of my 750 kB preview image.
(I was inspired to look by @jwz's post: https://mastodon.social/@jwz/109411593248255294.)
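For anyone checking the ">7 r/s" figure, the arithmetic from the post above works out as follows. This is only a worked calculation on the numbers quoted in the post; the variable names are mine.

```python
# Figures quoted in the post above.
total_bytes = 112_772_802
request_count = 430
window_seconds = 60

# Average request rate over the first minute.
requests_per_second = request_count / window_seconds

# Average payload per request, in bytes.
mean_bytes_per_request = total_bytes / request_count

print(f"{requests_per_second:.2f} requests/second")
print(f"{mean_bytes_per_request:,.0f} bytes/request on average")
```

That comes to roughly 7.17 requests per second and about 262 kB per request on average.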
lol, and because @jwz boosted _this_ post -- which does not include my URL in it! -- I got _another_ stampede... because Mastodon fetches the "context" of the post as well, so all the Mastodon servers with someone following jwz got both this post and the parent post indexed, and those servers all crawled mine as well.
(At least I had already fixed the HTTP->HTTPS problem in my og: information, which caused _3_ requests per server in the first round!)
Now up to 1.7GB, across 4846 requests, from 2302 different instances.
Interestingly, this includes servers that my instance has suspended; e.g. poa.st, which I wouldn't have expected to see this post; I guess this is because boosts from a server that isn't defederated with them can make it through. That has some side effects I don't like.
Chart attached shows new instances requesting pages from my server per minute since I posted it. List of instances at https://crschmidt.net/fediverse/linkfetcher-instances.txt .
Additional fun things to learn:
- Mastodon does not appear to use range headers or limit the size of a download to identify opengraph information, so linking a large file means 1000 servers are gonna read the whole damn thing?
- Pleroma does a HEAD request first to (I expect) check the content-type: for the .txt link above, Pleroma does _not_ fetch the content of the file (Mastodon does)
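The two behaviors in the list above (check the content type with a HEAD request first, and cap how much of the body is read) can be sketched roughly like this. It is a minimal sketch, not how Mastodon or Pleroma actually implement their fetchers: the names `is_html`, `read_capped`, `fetch_preview_bytes` and the 64 KiB cap are all assumptions.

```python
import io
import urllib.request

# Hypothetical cap: OpenGraph tags live in <head>, so a small prefix
# of the document is usually enough to find them.
MAX_PREVIEW_BYTES = 64 * 1024


def is_html(content_type: str) -> bool:
    """Return True if a Content-Type header value names an HTML document."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def read_capped(stream, max_bytes: int = MAX_PREVIEW_BYTES) -> bytes:
    """Read at most max_bytes from a file-like object, then stop."""
    chunks = []
    remaining = max_bytes
    while remaining > 0:
        chunk = stream.read(min(8192, remaining))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def fetch_preview_bytes(url: str) -> bytes:
    """HEAD first; only GET a bounded prefix when the target is HTML."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=10) as resp:
        if not is_html(resp.headers.get("Content-Type", "")):
            return b""  # e.g. a .txt or a large binary: skip the body
    # Ask for a byte range; servers may ignore it, so the read is capped too.
    get = urllib.request.Request(
        url, headers={"Range": f"bytes=0-{MAX_PREVIEW_BYTES - 1}"}
    )
    with urllib.request.urlopen(get, timeout=10) as resp:
        return read_capped(resp)


# Offline check of the cap using an in-memory stream.
sample = read_capped(io.BytesIO(b"x" * 100_000), max_bytes=1024)
```

With something like this, a link to a large file costs the origin one HEAD response per server, and at most the capped prefix when the target is HTML, instead of a full download.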
(Granted there's some stuff that doesn't support ActivityPub on here, but it's technically all part of the fediverse)
@sj_zero @tw @jefftk 95% by users and by "traffic delivered by the fediverse botnet". I'm aware there's a long tail of other services out there, but 96.7% of the requests to fetch my page metadata over the past 24 hours have been Mastodon instances. And 96% of active users are on Mastodon servers. https://mastodont.cat/@fediverse/109415451209328633
@crschmidt @jefftk @tw @cshabsin
Funny how everybody who pointed out the 95% figure is wrong was blocked in advance.
@sj_zero @crschmidt @tw @jefftk Mastodon wins at number of servers and total number of users.
Pleroma and its forks win at activity - of people actually doing stuff. One Pleroma user probably posts 10x more than a Mastodon user and is likely to stick around for longer.
@sj_zero @tw @jefftk But if it makes more sense, you can re-read my post as "This particular problem of aggressively DDOSing websites is primarily a Mastodon problem" (and, to a lesser extent, Pleroma).
So if other ActivityPub/Fediverse implementations don't create this problem, then this is really a Mastodon problem that can have a Mastodon-specific solution without worrying about the rest of the Fediverse. (I assume that some extra ActivityStreams metadata won't break other implementations!)
@sj_zero @crschmidt @tw @jefftk (Look at https://fedi.ninja sorted by most active)
@PaterSnape @jefftk @tw @cshabsin I don't really know what you mean here?
Specifically, what I mean is "95% of the servers making these requests are Mastodon servers, so this is largely a Mastodon problem." My understanding is Mastodon is also the home to the overwhelming majority of the Fediverse _users_ (even though it's only a bare majority of _servers_). My wording choice was poor, but I don't have anyone intentionally blocked?
@crschmidt @jefftk @tw @cshabsin
There are three people who replied to your post and didn't show up on the public thread of your instance. So I think they are blocked.
@crschmidt @jefftk @tw @cshabsin
I don't appear there either, so I don't know what that means.
@PaterSnape @crschmidt @tw @cshabsin link to one of the posts that seem blocked?
@PaterSnape @jefftk @crschmidt @tw @cshabsin MIT is blocking me? Lol.
@PaterSnape Okay. Well, that's through no action of mine. However, looking at https://joinfediverse.wiki/FediBlock likely gives the easy explanation here.
- NoAgendaSocial is relatively widely blocked, nearly as much as poa.st. (On better.boston, this is "limited".)
- Gleasonator is also widely blocked, and is marked "Suspend" on my instance.
I have no particular opinions on these particular moderation choices, but they weren't made by me, and certainly not to block people trying to correct me personally.
That's why I said they were blocked in advance. I personally don't do any instance counting, but they disagree with your numbers. And maybe your estimation is wrong, because you blocked most instances that don't run Mastodon.
@PaterSnape (re-adding @jefftk to the thread, since he was interested in the behavior; Jeff, see previous post here. This is based on how Mastodon handles instance blocks -- for Pater, a "limit", so I can search for his posts and reply to them, and would see them if I followed him; for Gleasonator, the instance is Suspend, so I can't find them even if I search explicitly.)
@kroner @alex @crschmidt @tw @cshabsin @jefftk @PaterSnape See how easily he can ignore the steak.
My impression is, users who have been around for longer, feel relaxed and are more likely to engage freely. The ‘rules for engaging’ are learned by observation and trial-and-error and are ‘mastered’ by those who stick around. This I observe mostly on Pleroma/Rebased.
To a lesser extent, on Diaspora. On that platform, I think, it’s more long format - so people post less frequently. This may be the reason.
@alex @PaterSnape No, not MIT; better.boston is (likely because they took the list from Fediblock as a starting point for blocking instances.) https://joinfediverse.wiki/FediBlock
@alex @PaterSnape @jefftk @crschmidt @tw @cshabsin That’s better than a degree from them. Wild cards drastically change the game.