• 1 Post
  • 34 Comments
Joined 1Y ago
Cake day: Jun 14, 2023


Power management is going to be a huge emerging issue with the deployment of transformer model inference to the edge.

I foresee some backpedaling from this idea that “one model can do everything”. LLMs have their place, but sometimes a good old LSTM or CNN is a better choice.


Yeah, this is actually a pretty great application for AI. It’s local, privacy-preserving and genuinely useful for an underserved demographic.

One of the most wholesome and actually useful applications for LLMs/CLIP that I’ve seen.


Ideally you want something that gracefully degrades.

So, my media library is hosted by Plex/Jellyfin and a bunch of complex firewall and reverse proxy stuff… And it’s replicated using Syncthing. But at the end of the day it’s on an external HDD that they can plug into a regular old laptop and browse on pretty much any OS.

Same story for old family photos (Photoprism, indexing a directory tree on a Synology NAS) and regular files (mostly just direct SMB mounts on the same NAS).

Backups are a bit more complex, but I also have fairly detailed disaster recovery plans that explain how to decrypt/restore backups and access admin functions, if I’m not available (in the grim scenario, dead - but also maybe just overseas or otherwise indisposed) when something bad happens.

Aside from that, I always make sure that all of the selfhosting stuff in my family home is entirely separate from the network infra. No DNS, DHCP or anything else critical ever runs on my hosting infra.


It would be better to have this as a FUSE filesystem though - you mount it on an empty directory, point the tool at your unorganised data and let it run its indexing and LLM categorisation/labelling, and your files are resurfaced under the mountpoint without any potentially damaging changes to the original data.

The other option would be just generating a bunch of symlinks, but I personally feel a FUSE implementation would be cleaner.

It’s pretty clear that actually renaming the original files based on the output of an LLM is a bad idea though.
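The symlink variant could be sketched roughly like this (a minimal sketch; the `labels` dict is a hypothetical stand-in for whatever the LLM's categorisation step produces):

```python
import os

def resurface(labels, mountpoint):
    """Build a label-based directory tree of symlinks under `mountpoint`,
    leaving the original files completely untouched.
    `labels` maps an absolute file path -> a category/label string
    (in practice the label would come from the LLM's output)."""
    for path, label in labels.items():
        target_dir = os.path.join(mountpoint, label)
        os.makedirs(target_dir, exist_ok=True)
        link = os.path.join(target_dir, os.path.basename(path))
        # Skip links that already exist so re-runs are idempotent
        if not os.path.islink(link):
            os.symlink(path, link)
```

If the model mislabels something, you delete a symlink, not a file, which is the whole point.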


I don’t think it’s necessarily a bad thing that an AI got it wrong.

I think the bigger issue is why the AI model got it wrong. It got the diagnosis wrong because it is a language model and is fundamentally not fit for use as a diagnostic tool. Not even a screening/aid tool for physicians.

There are AI tools designed for medical diagnoses, and those are indeed a major value-add for patients and physicians.


Precisely. Many of the narrowly scoped solutions work really well, too (for what they’re advertised for).

As of today though, they’re nowhere near reliable enough to replace doctors, and any breakthrough on that front is very unlikely to be a language model IMO.


Exactly. So the organisations creating and serving these models need to be clearer about the fact that they’re not general purpose intelligence, and are in fact contextual language generators.

I’ve seen demos of the models used as actual diagnostic aids, and they’re not LLMs (plus require a doctor to verify the result).


There are some very impressive AI/ML technologies that are already in use as part of existing medical software systems (think: a model that highlights suspicious areas on an MRI, or even suggests differential diagnoses). Further, other models have been built and demonstrated to perform extremely well on sample datasets.

Funnily enough, those systems aren’t using language models 🙄

(There is Google’s Med-PaLM, but I suspect it wasn’t very useful in practice, which is why we haven’t heard anything since the original announcement.)


It is quite terrifying that people think these unoriginal and inaccurate regurgitators of internet knowledge, with no concept of or heuristic for correctness… are somehow an authority on anything.


I know of at least one other case in my social network where GPT-4 identified a gas bubble in someone’s large bowel as “likely to be an aggressive malignancy”, leading to said person fully expecting they’d be dead by July, when in fact they were perfectly healthy.

These things are not ready for primetime, and certainly not capable of doing the stuff that most people think they are.

The misinformation is causing real harm.


Ohh, my bad! I thought the person you were replying to was asking about Gitea. Yeah, Forgejo seems truly free and also looks like it has a strong governance structure that is likely to keep things that way.


This sadly isn’t true anymore - they now have Gitea Enterprise, which contains additional features not available in the open source version.


From here:

  • SAML
  • Branch protection for organizations
  • Dependency scanning (yes, there are other tools for this, but it’s still a feature the open source version doesn’t get).
  • Additional security controls for users (IP allowlisting, mandatory MFA)
  • Audit logging

Don’t use Gitea, use Forgejo - it’s a hard fork of Gitea after Gitea became a for-profit venture (and started gating their features behind a paywall).

Codeberg has switched to Forgejo as well.

Also, there’s some promising progress being made towards ActivityPub federation in Forgejo! Imagine a world where you can comment on issues and send/receive pull requests on other people’s projects, all from the comfort of a small homeserver.


I saw a job posting for Senior Software Engineer position at a large tech company (not Big Tech, but high profile and widely known) which required candidates to have “an excellent academic track record, including in high school.” A lot of these requirements feel deliberately arbitrary, and like an effort to thin the herd rather than filter for good candidates.


Songs and albums that I’ve uploaded from my own collection have disappeared from Apple Music, despite my physically owning them on CD and Apple advertising the ability to store my CD rips in the cloud.

It’s unacceptable. I’m still on Apple Music for now, but moving my music library to Jellyfin looks more appealing by the day.


Idk… in theory they probably don’t need to store a full copy of the page for indexing, and could move to a more data-efficient format if they do. Also, not serving it means they don’t need to replicate the data to as many serving regions.

But I’m just speculating here. Don’t know how the indexing/crawling process works at Google’s scale.


This is probably an attempt to save money on storage costs. Expect cloud storage pricing from Google to continue to rise as they reallocate spending towards ML hardware accelerators.

Never been happier to have a proper NAS setup with offsite backup 🙃


It’s a risk that I’m willing to take, personally.

But tbf I do make sure that I own my primary mail domain.

Website hosting and such things? Njal.la all the way. Never had an issue with them.

Edit: oof, clearly some irrational hate for njal.la here. I state my personal preference and get downvoted… is this reddit now?!


I would’ve been delighted to receive a managed Ethernet switch as a kid! I hope it came with some useful SFP modules and a USB serial adapter 😜


The reddest of red flags.

> Open source vulnerabilities typically stem from poorly written code

Yeah, because paid programmers never write bad closed-source code…


I found it much more barebones in my tinkering. It doesn’t seem to support pulling via SSH (and definitely doesn’t support signing commits). Configuration options appear extremely limited, both in documentation and the UI.

It looks nice, but I don’t really see the point to it when Gitea Actions is now a thing. Gitea is a more mature product, and is similarly fast and lightweight.

Edit: s/Gitea/Forgejo. Gitea has moved to a for-profit model since I made this comment.


This is why self-hosted, to me, means actually running it on my own hardware, in a location where I have at least some control over physical access.

That said, an ISP could perform the same attack on a server hosted in your home using the HTTP-01 ACME challenge, so really no one is safe.

HSTS with certificate pinning, plus monitoring newly issued certificates for your domains via Certificate Transparency logs (crt.sh can be used to view these), is probably the only way to catch this kind of thing.
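The CT-monitoring half can be automated fairly easily against crt.sh's JSON endpoint. A minimal sketch (the `issuer_name` field and the `%` wildcard query are how I understand crt.sh's JSON output to work; verify against the site before relying on this):

```python
import json
import urllib.request

def fetch_ct_entries(domain):
    """Query crt.sh for all CT log entries covering a domain and its
    subdomains (%25 is a URL-encoded '%' wildcard). Live network call."""
    url = f"https://crt.sh/?q=%25.{domain}&output=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def unexpected_issuers(entries, allowed_cas):
    """Return issuer names seen in CT logs that match no CA on the
    allow-list - a cheap tripwire for certificates you never requested."""
    seen = {e["issuer_name"] for e in entries}
    return sorted(i for i in seen if not any(ca in i for ca in allowed_cas))
```

Run that on a cron job and alert on any non-empty result, and you'd at least find out about a mis-issued cert quickly.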


Njalla is mine. I like the privacy protections they offer.


Are Cloudflare, Amazon or Microsoft any better? Google at least takes security (if not privacy) very seriously.

In general it seems bad to have any huge profit-driven organisation exercising significant control over open standards, but I do think Google is a lesser evil than many of the others.


SFF PCs with NVMe slots
I'm currently trying to build out a ZFS array with a few 8TB drives I have lying around. I have one of [these](https://www.digikey.com.au/en/products/detail/seeed-technology-co.,-ltd/103990543/13536270) 5-port SATA controllers (M.2 NVMe form factor) and am looking for advice on which SFF PC to buy. I had a spare NUC that I thought had an NVMe slot, but it turns out it's SATA only.

Does anyone have any recommendations for reasonably cheap (second hand is fine) machines that have:

  • Gigabit ethernet
  • USB 3.0+
  • An M.2 slot that supports NVMe

Thanks in advance!

Sonarr and Radarr with Ombi for requests if desired. Transmission + OpenVPN for the download side.

Or you could manually rip DVDs/Blu-rays, if you can still get hold of them for the stuff you want to watch.


Did they ever satisfactorily resolve that issue, or did the media just stop covering it as aggressively? Last I heard they were trying to add solar shields to the satellites to reduce their albedo.



Transmission with OpenVPN, using the haugene/transmission-openvpn Docker image.

I mostly torrent via API using Sonarr and Radarr.
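For anyone wanting to replicate this, a minimal docker-compose sketch for that image might look like the following. The provider name, credentials and network range are placeholders, and the environment variable names are as I remember them from the image's docs, so double-check against the haugene/transmission-openvpn README:

```yaml
version: "3"
services:
  transmission:
    image: haugene/transmission-openvpn
    cap_add:
      - NET_ADMIN            # required so the container can create the VPN tunnel
    environment:
      - OPENVPN_PROVIDER=PIA        # placeholder: your VPN provider
      - OPENVPN_USERNAME=user       # placeholder
      - OPENVPN_PASSWORD=pass       # placeholder
      - LOCAL_NETWORK=192.168.0.0/24  # placeholder: your LAN range
    ports:
      - "9091:9091"          # Transmission web UI / RPC API
    volumes:
      - ./data:/data
    restart: unless-stopped
```

Sonarr and Radarr then just point at Transmission's RPC endpoint on port 9091 as their download client.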


I’d argue the bigger moral is that you should always own your online identity. You should buy your own domain (@yourname.xyz or something like that) and make your email on that. So if Google bans you, you just switch email providers and keep your address.


IIRC DuckDuckGo wasn’t a fan of the Australian media bargaining bill either. I suspect they will also deindex news sites in Canada should amendments not be made.

I haven’t seen the Canadian one and this is honestly the first I’ve heard of it, but the idea that a referrer has to pay a news website for directing traffic to them is ludicrous to me.


Looks like a very cool project, thanks for building it and sharing!

Based on the formula you mentioned here, it sounds like an instance with one user who has posted at least one comment will have a maximum score of 1. Presumably the threshold would usually be set to greater than 1, to catch instances with lots of accounts that have never commented.
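If the score really is accounts per commenting account (an assumption on my part; I'm inferring the formula from the behaviour you describe), it could be sketched as:

```python
def suspicion_score(total_accounts, commenting_accounts):
    """Hypothetical suspicion metric: registered accounts per account
    that has posted at least one comment. A one-user instance where
    that user has commented scores exactly 1 (its maximum), while a
    farm of mostly silent bot accounts scores far higher."""
    return total_accounts / max(commenting_accounts, 1)
```

That would make a threshold strictly greater than 1 the natural cut-off for flagging instances full of never-commenting accounts.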

This has given me another thought though: could spammers not just create one instance per spam account? If you own something like blah.xyz, you could in theory create ephemeral spam instances on subdomains and blast content out using those (e.g. spamuser@esgdf.blah.xyz, spamuser@ttraf.blah.xyz, etc.)

Spam management on the Fediverse is sure to become an interesting issue. I wonder how practical the instance blocking approach will be - I think eventually we’ll need some kind of portable “user trustedness” score.


Maybe I’m being stupid, but how does this service actually determine suspicious-ness of instances?

If I self-host an instance, what are my chances of getting listed on here and then unilaterally blocked simply because I have a low active user count or something?


There has been some good commentary about this on Mastodon, but the long and short of it seems to be that federation is actually a pretty terrible way to harvest data.

The entire fediverse is based heavily on openly accessible APIs - Meta doesn’t need to federate with your instance to scrape your data, and there’s really not much that can be done about that.

The real solution to Meta’s unethical behaviour is unfortunately going to be legislative, not technical.