@noli

@noli@programming.dev

Tar Xtract Ze Vucking File

Edit: apparently someone else already mentioned this, oops

@noli@programming.dev

Is there any reason why you didn’t just switch the keyboard layout to US if that’s what you’re used to?

I switched to US at some point because many of the keys for programming were just so much easier to access. If I have to use a PC for any decent amount of time, I just switch the OS layout to US now regardless of the layout that’s printed on the keyboard.

@noli@programming.dev

Oh yeah, it’s actually pretty extensive and expressive. If you’re interested in this sort of stuff it’s worth checking out the IR language reference a bit. Apparently you can even specify the specific garbage collection strategy on a per-function basis if you want to. They do however specify the following: “Note that LLVM itself does not contain a garbage collector, this functionality is restricted to generating machine code which can interoperate with a collector provided externally” (source: https://llvm.org/docs/LangRef.html#garbage-collector-strategy-names )

If you’re interested in this stuff it’s definitely fun to work through a part of that language reference document. It’s pretty approachable. After going through the first few chapters I had some fun writing some IR manually for some toy programs.

@noli@programming.dev

LLVM is designed in a very modular way and the LLVM IR allows you to specify e.g. if memory management should be manual/garbage collected.

You could make a frontend (design a language) for LLVM that exposes those options through some compiler directives.

In general I’d heavily recommend looking into LLVM’s documentation.

@noli@programming.dev

Reminds me of the old joke that monads are easy to understand, you just have to realize monads are just monoids in the category of endofunctors.

@noli@programming.dev

Couldn’t you do something like JWT except allow the client to slap on their credentials to any initial request?

From the backend side that means that if there is no valid token, you can check the request body for the credentials. If they’re not there, then it’s an unauthorized request.

You’re eliminating a single request over a long period of time at the cost of adding complexity to both client and backend, but if the customer wants to be silly that’s their fault.
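
A rough sketch of what that backend check could look like, in Python. Everything here is made up for illustration: the `USERS` store, the `SECRET` key, the header and body field names, and the HMAC-signed token, which is only a simplified stand-in for a real JWT and not a replacement for a proper JWT library.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"        # hypothetical signing key
USERS = {"alice": "hunter2"}   # hypothetical credential store (plain text only for the demo)
TOKEN_TTL = 3600               # token lifetime in seconds


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user):
    """Return 'user.expiry.signature', a simplified stand-in for a JWT."""
    payload = f"{user}.{int(time.time()) + TOKEN_TTL}"
    return f"{payload}.{_sign(payload)}"


def verify_token(token):
    """Return the user name if the token is authentic and unexpired, else None."""
    try:
        user, expiry, sig = token.rsplit(".", 2)
        expiry_ts = int(expiry)
    except ValueError:
        return None
    expected = _sign(f"{user}.{expiry}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    if expiry_ts < time.time():
        return None
    return user


def handle_request(headers, body):
    """Return (status, response) for a request given as plain dicts."""
    token = headers.get("Authorization", "").removeprefix("Bearer ")
    user = verify_token(token)
    if user is not None:
        return 200, {"user": user}
    # No valid token: fall back to credentials attached to the request body.
    name, password = body.get("username"), body.get("password")
    if name in USERS and hmac.compare_digest(
        USERS[name].encode(), str(password).encode()
    ):
        return 200, {"user": name, "token": issue_token(name)}
    # Neither a valid token nor valid credentials: unauthorized.
    return 401, {"error": "unauthorized"}
```

The first request carries the credentials and gets a token back in the same response; every later request sends only the token and never touches the credential check.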

@noli@programming.dev

JS was a mistake.

@noli@programming.dev

That’s actually a really nice application, in this case to reduce bandwidth constraints as opposed to the usual use case of memory constraints!

@noli@programming.dev

Cool, so in this case your filter is basically a classifier ML model. How would you set the hash functions then though?

@noli@programming.dev

Interesting. Do I understand it correctly if I say it’s a bloom filter where instead of setting a bit to 1 for each of the hashes, you increment a counter for that hash?

How do you infer the count then, take the minimum of all matching hashes? Because intuitively it seems to me like you would need a lot more space to avoid counts being too high
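
For reference, the structure I’m describing (one counter per hash, query by taking the minimum) is what is usually called a count-min sketch. Below is a minimal Python sketch of my understanding of it. The `width` and `depth` values and the salted BLAKE2b hashing are my own assumptions, not something taken from the parent comment.

```python
import hashlib


class CountMinSketch:
    """depth rows of width counters; each row uses its own hash function."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salting the hash with the row number gives one hash function per row.
        digest = hashlib.blake2b(
            item.encode(), salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row instead of setting a bit.
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over all rows
        # is the least inflated value and never undercounts.
        return min(
            self.rows[row][self._index(item, row)] for row in range(self.depth)
        )
```

The estimate can be too high but never too low, which is why space is the tradeoff: wider rows mean fewer collisions and less overcounting.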

@noli@programming.dev

I know they are used in Google’s BigTable. All data there is stored in separate SSTables and you can specify that a locality group should have bloom filters generated for its SSTables. Apparently Cassandra has them too.

Both are the same general application though and you already mentioned databases.

I did think about using them at some point for authentication purposes in a webservice. The idea being to check for double uses of a refresh token. This way the user database would need to store only a small amount of extra storage to check for the reuse of a refresh token and if you set the parameters accordingly, the false positives are kind of a benefit in that users cannot infinitely refresh and they actually have to reauthenticate sometimes.
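
A minimal sketch of that refresh token idea in Python, just to make it concrete. The filter size, the number of hashes, the salted BLAKE2b hashing and the `redeem_refresh_token` helper are all hypothetical choices of mine; I never actually built this.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One salted hash per index, each mapped onto a bit position.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def redeem_refresh_token(used, token):
    """Return True if the token may be redeemed, False if the user must log in again."""
    if token in used:
        # Either a real reuse or a false positive; both force reauthentication.
        return False
    used.add(token)
    return True
```

A reused token is always caught because bloom filters have no false negatives. As the filter fills up, fresh tokens start getting rejected more often too, which is the "users cannot infinitely refresh" effect.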

Edit to add: I also read a paper recently that uses a data structure called a collage, which is closely related to bloom filters, to perform in-network calculations in a sensor network. If I understand correctly, the basic idea there is that every node in the network adds a bit to the data structure while it is in transit, so data from the entire network is aggregated. The result can then be fed to a classifier ML model. (Source: Oostvogels, J., Michiels, S., & Hughes, D. (2022). One-Take: Gathering Distributed Sensor Data Through Dominant Symbols for Fast Classification.)

@noli@programming.dev

It does create a MITM vulnerability, the question is just whether it matters or not. With HTTPS a third party can only see which host you’re connecting to, not the full URL or the contents. With HTTP they can see exactly what data is transferred and can modify that data at will.

So adding HTTPS here accomplishes:

hiding which exact page of the hacker’s dictionary you’re accessing
hiding the exact contents of the page
ensuring that this page doesn’t get modified in transit

None of these are really an issue, so using HTTP in this situation is fine. In general though, I’d consider not having HTTPS a bug for most sites, unless you’re extremely resource constrained on either side of the connection and you’ve thought carefully about the security and privacy implications.

@noli@programming.dev

Here: “yes it does support TLS1.3”