Does anyone know of an off-the-shelf tool (online or offline) that finds duplicates across several DNS blocklists and merges them into one?
Context: I am running AdGuard on a GL.iNet router with ~10 blocklists, some of them pretty huge. Most of the time, when the lists are updated, the router grinds to a halt, and I often have to reboot it the old power-off-and-on way.
I would rather download the lists myself from time to time and merge them into one file, with the duplicates removed somehow.
If I’m understanding you correctly, you could use a shell script for this: download the lists with wget, combine them into a single large file, and then write a new file with no duplicates using awk '!visited[$0]++', which prints each line only the first time it appears.
wget URL1 URL2 URL3
cat *.txt > all.txt (This overwrites all.txt. On a re-run, delete all.txt and no_duplicates.txt first so that *.txt does not pick them up.)
awk '!visited[$0]++' all.txt > no_duplicates.txt
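The three steps above could be wrapped into a small function along these lines. This is only a rough sketch: the name merge_blocklists is made up for illustration, and it assumes one rule per line, with lines starting with ! or # being comments that can be dropped. Check that assumption against your actual lists before relying on it.

```shell
#!/bin/sh
# Sketch: merge already-downloaded blocklists into one deduplicated file.
# Assumption (not from the thread): one rule per line, and lines that
# start with '!' or '#' are comments.

merge_blocklists() {
    # usage: merge_blocklists OUTPUT INPUT...
    out=$1
    shift
    # Remove carriage returns and trailing whitespace, skip blank and
    # comment lines, then keep only the first occurrence of each line.
    cat -- "$@" \
        | tr -d '\r' \
        | awk '{ sub(/[ \t]+$/, "") } /^[!#]/ || $0 == "" { next } !seen[$0]++' \
        > "$out"
}
```

For example, after downloading into a separate directory with wget -P lists/ URL1 URL2 URL3, running merge_blocklists merged.txt lists/* writes the result outside lists/, so the output file is never read back in as an input on the next run.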
When no tool is available, bash to the rescue. Thank you for this, it actually seems simpler than I thought :)