I’m excited to announce the first alpha preview of this project that I’ve been working on for the past 4 months. I’m initially posting about this in a few small communities, and hoping to get some input from early adopters and beta testers.
The DHT crawler is Bitmagnet’s killer feature that (currently) makes it unique. Well, almost unique, read on…
So what is it? You might be aware that you can enable DHT in your BitTorrent client, and that this allows you to find peers who are announcing a torrent’s hash to a Distributed Hash Table (DHT), rather than to a centralized tracker. DHT’s lesser-known feature is that it allows you to crawl the info hashes it knows about. This is how Bitmagnet’s DHT crawler works - it crawls the DHT network, requesting metadata about each info hash it discovers. It then further enriches this metadata by attempting to classify it and associate it with known pieces of content, such as movies and TV shows. It then allows you to search everything it has indexed.
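To give a rough idea of what "crawling the info hashes" can look like on the wire, here is a minimal sketch that builds (but does not send) a DHT query asking a node for a sample of the info hashes it stores. It assumes the `sample_infohashes` query from the BEP 51 extension, which is my assumption about the relevant mechanism rather than a description of Bitmagnet's actual code; the function names `bencode` and `sample_infohashes_query` are made up for illustration.

```python
def bencode(value):
    """Encode ints, bytes, lists and dicts using BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        return b"d" + b"".join(
            bencode(k) + bencode(value[k]) for k in sorted(value)
        ) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def sample_infohashes_query(node_id, target, tid=b"aa"):
    """Build a KRPC query message (hypothetical helper, nothing is sent).

    node_id: our own 20-byte DHT node ID
    target:  20-byte ID used to steer which part of the keyspace we sample
    tid:     short transaction ID echoed back in the response
    """
    return bencode({
        b"t": tid,
        b"y": b"q",                      # "q" marks this message as a query
        b"q": b"sample_infohashes",      # query name assumed from BEP 51
        b"a": {b"id": node_id, b"target": target},
    })


message = sample_infohashes_query(b"A" * 20, b"B" * 20)
```

A real crawler would send such messages over UDP to many nodes, collect the info hashes from the responses, and then request the torrent metadata for each hash from peers; none of that is shown here.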
This means that Bitmagnet is not reliant on any external trackers or torrent indexers. It’s a self-contained, self-hosted torrent indexer, connected via the DHT to a global network of peers and constantly discovering new content.
The DHT crawler is not quite unique to Bitmagnet; another open-source project, magnetico, was first (as far as I know) to implement a usable DHT crawler, and was a crucial reference point for implementing this feature. However, that project is no longer maintained, and does not provide the other features, such as content classification and integration with other software in the ecosystem, that greatly improve usability.
If this project interests you then I’d really appreciate your input:
Thanks for your attention. If you’re interested in this project and would like to help it gain momentum then please give it a star on GitHub, and expect further updates soon!
Hi, this is a great point and one that I’ve already given consideration to. I’ll address separately the issue of the primary datastore, i.e. Postgres, and the Redis dependency:
Postgres as the only option for the data store
There are 2 reasons for this:
Redis dependency
Redis is currently used only for the asynchronous task queue. I would like to have put this in Postgres, but there simply is not a good out-of-the-box solution that works well with Postgres and GoLang, and is actively maintained. I looked at quite a few queuing libraries and eventually settled on asynq (https://github.com/hibiken/asynq), which is a great library and does the job well - but could really do with support for non-Redis backends.
Using Redis here was a pragmatic decision that allowed me to make progress, rather than an optimal one. I guess I could have built a simple Postgres-based queue myself, but that would have been a distraction and probably sub-optimal compared with a mature, separately developed library. It remains an option. Since I looked into this, a new project has sprung up which I’m keeping an eye on - https://www.tork.run/ - it has a Postgres backend and looks like it might be up to the job, but is very new.
So yes, I’m very aware that the additional Redis dependency is not ideal and it may well disappear at some point.
Hi, those points are certainly valid and I have nothing against these picks!
I just wanted to chime in that performance might not be as big of a problem as you might expect. 5k/hour is about 1.4/sec, which SQLite should for sure be able to handle.
In fact, you can do hundreds to thousands of writes/sec, as long as you batch them in transactions (as by default each query is executed in its own transaction).
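As a minimal sketch of what batching looks like, here is an example using Python's built-in `sqlite3` module with an in-memory database. The `torrents` table and its columns are hypothetical, chosen only for illustration, and the snippet shows the pattern rather than measuring throughput.

```python
import sqlite3


def insert_batched(conn, rows):
    """Insert all rows inside one transaction instead of one per statement."""
    # Using the connection as a context manager commits once on success
    # and rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO torrents (info_hash, name) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
# Hypothetical schema, not Bitmagnet's actual one.
conn.execute("CREATE TABLE torrents (info_hash TEXT PRIMARY KEY, name TEXT)")

# One hour's worth of rows at the rate mentioned above.
rows = [(f"{i:040x}", f"torrent {i}") for i in range(5000)]
insert_batched(conn, rows)

count = conn.execute("SELECT COUNT(*) FROM torrents").fetchone()[0]
rate_per_sec = 5000 / 3600  # 5k/hour expressed per second
```

On a file-backed database, committing once per batch avoids paying the cost of a disk sync for every single row, which is where most of the gain comes from.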
Thank you for such a detailed response. I would love to contribute; however, my capacity is rather limited at the moment, otherwise I’d be willing to add an SQLite adapter. From your description it sounds like the architecture is currently tightly coupled to PostgreSQL features. In my daily job I love PostgreSQL for big apps and stacks, but I’m also aware of how “hungry” PG can be, which is why I’m wondering whether it’s “too big of a hammer” for this particular problem. Also, setting up a single service is easier for novices than maintaining several. Docker Compose is nice, but it has its limitations.