• 1 Post
  • 18 Comments
Joined 1Y ago
Cake day: Aug 02, 2023


Here’s my way of doing it. TLDR: LUKS with an encryption key hosted on my router

https://nowicki.io/self-hosting-lvm-raid1-with-key-over-ftp/


I keep my drives encrypted with a key currently hosted on my router, hoping a thief wouldn’t take that too. I’m thinking of actually moving the key to the cloud so I can disable it remotely.

It was quite a ride to make everything work, so I wrote a blog post explaining it to remember what I did.

https://nowicki.io/self-hosting-lvm-raid1-with-key-over-ftp/




Thanks, but don’t expect too much yet. Many sources are still missing. If you notice something that should be there but isn’t even being crawled, feel free to reach me on Mastodon or add it directly via PR here: https://github.com/Kukei-eu/spider/blob/main/index-sources.js



I’m on an iPhone 12 mini. I love that small design and I strongly believe phones should be small.

Thanks for the good words! Highly appreciate it!


  1. SO and Reddit are on the TODO list. The index even had SO once (at the bottom indeed), not via crawling but via the SO Search API. It gave very poor quality results and was super slow, so I had to remove it while thinking of a better solution. Crawling the entire SO might be a little too much for this project at this stage, though. If I have enough courage and hours at night, I might parse that 20GB Stack Overflow archive dump and try doing something useful with it.

Same for Reddit, but here I have mixed feelings about it in general and hope it’s going to die soon, replaced by amazing Lemmy communities.

I also used to type a question and end it with “reddit” in Google to get good quality content, but with kukei the experiment is whether the blogosphere can properly replace it when the index promotes it.

  2. Why blogs?

This is my main thing: to promote good quality blogs that I tried to follow via RSS but somehow never did. Having them all indexed (and more, since the Mastodon community gave me some amazing links to index) makes me actually visit them often.

As for the “SEO cancer”, that’s where curation comes into play. Before crawling, I check blogs unknown to me and decide whether something goes in or not.


Great ideas. For the source code I’m not sure, but I’ll put it on the backlog of cool things I get from Lemmy and work on them one by one. Thanks!


The crawler takes only the sources that are defined in the crawler repo (it’s open source, check the GitHub org or kukei-spider).

So it’s “curated” in the sense that it would not add anything else to the index.
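To illustrate the idea, here is a minimal sketch of allowlist-based crawling. It is not taken from the kukei-spider code: the `sources` array, its example URLs and the `isAllowed` helper are all hypothetical, and the real source list lives in index-sources.js.

```javascript
// Hypothetical curated source list (assumption, not the real index-sources.js).
const sources = [
  "https://blog.example.com/",
  "https://docs.example.org/",
];

// Collect the allowed hostnames once, so every lookup is a cheap Set check.
const allowedHosts = new Set(sources.map((s) => new URL(s).hostname));

// A discovered link is followed only if its host is on the curated list.
function isAllowed(link) {
  try {
    return allowedHosts.has(new URL(link).hostname);
  } catch {
    return false; // not a valid absolute URL, so never crawl it
  }
}

console.log(isAllowed("https://blog.example.com/post/1"));
console.log(isAllowed("https://spam.example.net/"));
```

With a check like this, links that point outside the list are simply skipped, so nothing reaches the index unless someone added its source by hand.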


Thx for the comments. I’ll fix the mobile view and will definitely redesign it all a bit over the weekend. I see a lot of room for improvement.

I’ll also check how to submit it to Lenses. Highly appreciate it!

EDIT: the mobile view is fixed, and I also made some small adjustments to the whitespace between result items.


Good idea. I once thought about doing some narrow indexing of websites. Stack Overflow, for example, is a big issue: indexing all of it is crazy, while picking specific tags feels like tons of work. In the end I adjust the whole project as it grows, hoping that it gets better after every tuning.

As long as I have fun with it I’ll continue :D


For ?? I guess it already has decent results. I’ll periodically check these kinds of cases once the index covers more languages.

https://kukei.eu/?q=js+%3F%3F+operator


Thanks! If you have some suggestions in the future, I’m always open to hearing them.


Ah. This will never happen. I have zero motivation to do any GDPR stuff in this project. Even for analytics I anonymize visitors’ IPs so Plausible doesn’t get them.

Also, in this case it would be nonsense. For general search it makes sense that Bing knows I’m after parceljs, not shipping companies, when I type “parcel”. For such a narrow search engine the user persona is known.
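The comment does not say how the anonymization is done. One common approach is to mask part of the address before it is passed on to the analytics service. The sketch below assumes that approach: the `anonymizeIp` helper is hypothetical, it zeroes the last IPv4 octet, and it drops anything it does not recognize as IPv4 entirely.

```javascript
// Hypothetical helper (assumption): mask an IP before forwarding it anywhere.
function anonymizeIp(ip) {
  const parts = ip.split(".");
  // Accept only four numeric groups, each in the range 0-255.
  const isIpv4 =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  // Anything else (IPv6, garbage) is replaced by a fixed placeholder.
  if (!isIpv4) return "0.0.0.0";
  parts[3] = "0"; // zero the last octet so a single visitor is not identifiable
  return parts.join(".");
}

console.log(anonymizeIp("203.0.113.42"));
console.log(anonymizeIp("2001:db8::1"));
```

Masking only the last octet keeps rough geography for statistics. Replacing the whole address with the placeholder would be the stricter choice.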


If it has, it’s totally accidental.

What’s the use case for searching for those kinds of symbols? I’ll check if I can tune it for this.


It’s still in MVP, work in progress, hence the index is not “full”.

For me “web development” is everything that we might need for, well, the web. Servers, Mongo docs, it all goes into the index (I’m adding sources basically every day, but it also takes some time to index stuff, and I observe how the whole thing works as the index grows).

ASP.NET goes into the index, of course. If your website has dev resources and blog posts, that would go into it as well. Recently one person suggested tons of Haskell blogs and they are being indexed as we speak.

I also have a different problem: dev.to has a lot of good resources but also tons of SEO spam and low quality content. It’s also freaking huge, and while it was in the index for some time, I had to remove it and think about it some more.

Where would you draw lines on mixed content or technologies

For now the line is: does this website have anything that web devs would need? Yes? Then it might get in.

If it’s a blog about locomotive CPU programming then maybe not, although mostly due to infrastructure costs. Indexing costs money in the end, but having some unrelated stuff in the index should not hurt the results.

All of what I wrote is the state as of today. I change my mind often as it’s still in the “having fun” stage.

PS. also thanks for the feedback!


I like how the first queries you guys make are attempts to SQL inject and XSS it.

EDIT: if you find something let me know, PRs are also welcome ;)


I’m creating a curated search engine for web developers. Asking for feedback

So, in the era of increasingly good AI powered tools and general search engines full of SEO spam, last week I started creating something a little old school and against the trends.

For now it's a have-fun-and-find-out project whose main aim is to provide good search results for general web development queries, with a special focus on independent blog authors. The thesis is that no SEO spam website is in the index, which already filters out most of the annoying noise on Google/Bing. Search results are grouped by type: docs, blogs and magazines (e.g. blog platforms or bigger websites).

For now it's far from done in terms of having a full index, but in most cases it already replaces my go-to search engine when I'm looking up some stuff during work.

I'm looking forward to hearing what y'all think, and if you think it makes sense overall, I can only encourage you to post some links to blogs or docs that are still missing in the index. I'm more than happy to add them to the crawler. Responses like "nei, total shit, who would need that" are also accepted, but constructive critique is more appreciated ;)

EDIT: many thanks, everyone, for all your voices and comments. I'm super grateful for all of them and happy that we have a place like Lemmy!