So, I'm self-hosting Immich. The issue is that we tend to take a lot of pictures of the same scene or subject so we can pick the best one later, so we can end up with 5 to 10 photos that are basically duplicates, but not quite.
Some duplicate-finding programs rate those images at 95% similarity or more.

I'm wondering if there's any way, probably at the filesystem level, for these near-identical images to be compressed together.
Maybe deduplication?
Have any of you guys handled a similar situation?
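On the deduplication idea: filesystem-level dedup (for example in ZFS) roughly works by hashing fixed-size blocks and storing each identical block only once. The minimal sketch below assumes 4 KiB blocks (an illustrative choice, not any particular filesystem's setting) and measures how many blocks of one file already exist in another. It hints at why dedup does little for burst shots: two JPEGs of nearly the same scene are compressed independently, so their bytes rarely line up into identical blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks, chosen only for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and hash each one, as block-level dedup does."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_block_ratio(a: bytes, b: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of b's blocks whose exact contents already appear in a (i.e. could be deduplicated)."""
    stored = set(block_hashes(a, block_size))
    incoming = block_hashes(b, block_size)
    if not incoming:
        return 0.0
    return sum(h in stored for h in incoming) / len(incoming)
```

For example, on 16 KiB of data, overwriting one byte in the first block still lets 3 of the 4 blocks be shared (ratio 0.75), while inserting a single byte at the start shifts every block boundary so that none match.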

@tehnomad@lemm.ee

Not sure if you're aware, but Immich has a duplicate finder.

Bakkoda

And immich-go can run one via the CLI.

lemmyvore

From what I understand OP’s images aren’t the same image, just very similar.

@tehnomad@lemm.ee

Yeah, I think the duplicate finder uses a neural network to find duplicates. I went through my wedding album, which had a lot of burst shots, and it detected similar images well.

@ShortN0te@lemmy.ml

Would be surprised if there is any AI involved. Finding duplicates is a solved problem.

AI is only involved in object detection and face recognition.
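One classic non-AI technique for near-duplicate detection is perceptual hashing. Below is a minimal difference-hash (dHash) sketch; it is an illustration of the general technique, not a claim about what Immich itself does. It assumes the photo has already been converted to grayscale and shrunk to a thumbnail of 8 rows by 9 columns (in practice an image library would do that step). Similar photos produce hashes that differ in only a few bits, measured by Hamming distance.

```python
def dhash(pixels: list[list[int]]) -> int:
    """Difference hash of an 8-row x 9-column grayscale thumbnail.

    Each of the 64 bits records whether a pixel is brighter than its
    right-hand neighbour, so uniform brightness shifts don't change the hash.
    """
    if len(pixels) != 8 or any(len(row) != 9 for row in pixels):
        raise ValueError("expected an 8x9 grayscale thumbnail")
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits in which two hashes differ (0 = identical hashes)."""
    return bin(a ^ b).count("1")
```

A slightly brighter copy of an image keeps the same hash, while an inverted image flips all 64 bits; tools then treat a small Hamming distance as "probably the same shot".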

@tehnomad@lemm.ee

I wasn’t sure if it was AI or not. According to the description on GitHub:

Utilizes state-of-the-art algorithms to identify duplicates with precision based on hashing values and FAISS Vector Database using ResNet152.

Isn’t ResNet152 a neural network model? I was careful to say neural network instead of AI or machine learning.
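The quoted description combines hashing with feature vectors from ResNet152 stored in FAISS. The general idea is that a network turns each image into a vector, and near-duplicates are images whose vectors point in nearly the same direction. The sketch below shows that comparison with cosine similarity and a brute-force nearest-neighbour search; the tiny vectors and file names are hypothetical stand-ins (real embeddings are much longer and come from the model), and this is not the tool's actual code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query: list[float], index: dict[str, list[float]]) -> tuple[str, float]:
    """Brute-force nearest neighbour by cosine similarity.

    A vector search library such as FAISS performs this kind of lookup
    efficiently over large collections of vectors.
    """
    best_name = max(index, key=lambda name: cosine_similarity(query, index[name]))
    return best_name, cosine_similarity(query, index[best_name])
```

For instance, with a hypothetical index of `{"IMG_001": [1.0, 0.0, 0.0], "IMG_002": [0.0, 1.0, 0.0]}`, a query of `[0.9, 0.1, 0.0]` comes back closest to `IMG_001`.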

@ShortN0te@lemmy.ml

Thanks for that link.

AI is the umbrella term for ML, neural networks, etc.

ResNet152 seems to be used only to recognize objects in the image to help when comparing images. I was not aware of that, and I'm not sure I would classify it as an actual tool for image deduplication, but I have not looked at the code to see how much they are doing with it.

As of now, they still state that they want to use ML technologies in the future, so either they forgot to update the README or they don't actually use it yet.

Bakkoda

You can also adjust the threshold, but that's probably not a great idea unless you want to manually accept or reject the duplicates.
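To see why loosening the threshold calls for manual review: grouping is usually transitive, so one loose match can chain shots together. The sketch below groups photos whose perceptual hashes are within `max_distance` bits using union-find; the function name, file names, and hash values are hypothetical, and this is not Immich's implementation.

```python
from itertools import combinations


def group_near_duplicates(hashes: dict[str, int], max_distance: int) -> list[set[str]]:
    """Group photos whose hashes differ in at most max_distance bits.

    Groups are transitive (union-find): if A~B and B~C, then A, B and C end
    up together even when A and C are further apart than max_distance.
    """
    parent = {name: name for name in hashes}

    def find(name: str) -> str:
        # Walk up to the group's root, halving the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(hashes, 2):
        if bin(hashes[a] ^ hashes[b]).count("1") <= max_distance:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in hashes:
        groups.setdefault(find(name), set()).add(name)
    # Only report actual duplicate candidates, not singletons.
    return [group for group in groups.values() if len(group) > 1]
```

With hashes `{"a": 0b0000, "b": 0b0001, "c": 0b0011, "d": 0b11110000}`, a distance of 1 groups `a`, `b` and `c` even though `a` and `c` differ in 2 bits, and raising it to 4 pulls `d` into the same group as well.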
