
Universal Paperclips is one of the best clicker games.

In particular: because it isn't a clicker game. It only starts off as one. There are only about two sections, IIRC, that are "clicker": the start (before auto-clippers kick in), and then the quantum computer.

I guess you have to launch your first 20 or 30 probes at the space stage, and that's done one-click-at-a-time… but I don't think that counts as a "clicker" game, since it's so few clicks in the grand scheme of things. At no other point is rapid clicking that useful.


I had a pretty standard linear-list scan initially. Each time the program started, I'd check the list for some values. The list, of course, grew each time the program started. I capped the list size at something like 2MB (I forget the exact figure), but it was in the millions of entries and therefore in the MB range. I figured it was too small for me to care about optimization.

I was somewhat correct: even when I simulated a full-sized list, the program booted faster than I could react, so I didn't care.


Later, I wrote some test code that exhaustively tested startup conditions. Instead of just running the startup once, I was running it millions of times. Suddenly I cared about startup speed, so I replaced the list with a hash table so that my test code would finish within 10 minutes (instead of the projected 3 days to exhaustively test all startup conditions).
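For flavor, a minimal sketch of that kind of swap (hypothetical names, not my actual code): the linear scan becomes an O(1)-average membership test once the startup values move into a hash table.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Before: O(n) scan per lookup. Fine for one boot...
bool seenLinear(const std::vector<uint64_t> &list, uint64_t v) {
    return std::find(list.begin(), list.end(), v) != list.end();
}

// After: O(1) average per lookup, which is what matters once the
// startup path runs millions of times under a test harness.
bool seenHashed(const std::unordered_set<uint64_t> &seen, uint64_t v) {
    return seen.count(v) != 0;
}

int main() {
    std::vector<uint64_t> list{3, 1, 4, 1, 5};
    std::unordered_set<uint64_t> seen(list.begin(), list.end());
    return (seenLinear(list, 4) && seenHashed(seen, 4)) ? 0 : 1;
}
```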


Honestly, I'm more impressed by the opposite. This is perhaps one of the few times I've actually taken a linear list and optimized it into a hash table. Almost all other linear lists I've used in the last 10 years of my professional coding life remain just that: a linear scan, with no one caring about performance. I've got linear lists doing some crazy things, even with MBs of data, and no one has ever come back to me and said they need optimization.

Do not underestimate the power of std::vector. It's probably faster than you expect, even with O(n^2) algorithms all over the place. std::map and std::unordered_map certainly have their uses, but there are a lot of situations where std::vector is far, far, far easier to think about, so it's my preferred solution rather than prematurely optimizing to std::map ahead of time.
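As a rough illustration of the point (a made-up example, not from any real codebase): even an O(n^2) dedup over a contiguous std::vector is plenty fast for the small-to-medium n most code actually sees, and there's far less to think about than with a map.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// O(n^2) worst case, but contiguous memory keeps the constant tiny.
// For typical list sizes this routinely beats reaching for
// std::map / std::unordered_map, and the code stays obvious.
std::vector<std::string> dedup(const std::vector<std::string> &in) {
    std::vector<std::string> out;
    for (const std::string &s : in)
        if (std::find(out.begin(), out.end(), s) == out.end())
            out.push_back(s);
    return out;
}
```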


Rasp. Pi's products, be it the RP2040 or the Rasp. Pi SBCs in general, have always had horrendous power-consumption specs and even worse sleep/idle states.


How many layers does the Orange Pi Zero pcb have?

Answer: good luck finding out. That's not documented. But based on the layout and what I can see in screenshots, far more than 4 layers.


A schematic alone is kind of worthless. Knowing whether a BGA is designed for 6, 8, or 10 layers makes a big difference. Seeing a reference PCB implementation with exactly that layer count, so the EE knows how to modify the design for themselves, is key to customization. There's all sorts of EMI and trace-length matching that needs to happen to get that CPU-to-DDR connection up and running.

Proving that a 4-layer layout like this exists is a big deal. It means that a relative beginner can work with the SAM9x60's DDR interface on cheap 4-layer PCBs (though as I said earlier: 6 layers offer more room and are available at OSHPark, so I'd recommend a beginner work with 6 instead).


With regards to the SAM9x60D1G-I/LZB SOM vs the Orange Pi Zero: the SAM9x60D1G-I/LZB SOM provides you with all remaining pins of access… 152 pins… to the SAM9x60, meaning a full development board with full access to every feature. It serves a fundamentally different purpose: the SOM is a learning tool and a development tool for customization.


Well, my self-deprecating humor aside, I've of course thought about it more deeply over the course of my research. So I don't want to sell it too short.

The SAM9x60 has a proper GPU (albeit a 2D one), full-scale Linux, and DDR2 support (easily reaching 64MB, 128MB, or beyond of RAM). At $3 for DDR2 chips, the cost-efficiency is absurd (https://www.digikey.com/en/products/detail/issi-integrated-silicon-solution-inc/IS43TR16640C-125JBL/11568766): a QSPI 8Mbit (1MB) SRAM chip basically costs the same as 1Gbit (128MB) of DDR2 RAM.

Newhaven Display offers various 16-bit TFT/LCD screens (https://newhavendisplay.com/tft-displays/standard-displays/) at a variety of price points. Let's take, say… a 400x300-pixel 16-bit screen. How much RAM do you need for the framebuffer? (I dunno: this one, https://newhavendisplay.com/4-3-inch-ips-480x272px-eve2-resistive-tft/, or something close.)


Oh right, 400 x 300 x 2 bytes per pixel and we're already at 240kB, meaning the entire field of MSP430, ATMega328, ARM Cortex-M0, and even ARM Cortex-M4 chips is dead on the framebuffer alone. Now let's say we have 10 frames of animation we'd want to play, and bam, we're already well beyond what a $3 QSPI SRAM chip will offer us.
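Spelling out that arithmetic (same numbers as above; the 10-frame scenario is just an example):

```cpp
#include <cstdio>

int main() {
    const long frame = 400L * 300L * 2L;  // 16bpp framebuffer: 240,000 bytes
    const long tenFrames = 10L * frame;   // 2,400,000 bytes of animation
    const long qspiSram = 1L << 20;       // an 8Mbit QSPI SRAM chip: 1MB

    printf("one frame : %ld bytes (~240kB)\n", frame);
    printf("ten frames: %ld bytes vs %ld bytes of QSPI SRAM\n",
           tenFrames, qspiSram);
}
```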

But let's look at one of the brother chips really quick: Microchip's SAMA5D4. Though more difficult to boot up, this one comes with an H.264 decoder. Forget "frames of animation", this baby straight up supports MP4 videos on a full-scale Linux platform.

Well, maybe you want a Rasp. Pi to run that, but a Rasp. Pi 4 can hit 6000mW of power consumption, far beyond the means of typical battery packs of the ~3-inch variety. Drop the power consumption to 300mW (SAMA5D4 + DDR2 RAM) + 300mW (LCD screen), and suddenly we're in the realm of AAA batteries.
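Back-of-envelope on that claim, assuming ~1.2Wh per AAA cell (roughly 1000mAh NiMH at 1.2V) and ignoring converter losses and high-drain derating:

$$
t_{\text{SAMA5D4}} \approx \frac{3 \times 1.2\,\text{Wh}}{0.6\,\text{W}} = 6\,\text{h},
\qquad
t_{\text{RPi4}} \approx \frac{3 \times 1.2\,\text{Wh}}{6\,\text{W}} \approx 0.6\,\text{h}
$$

Three AAA cells buy you hours of runtime at 600mW, versus barely half an hour at Rasp. Pi 4 draw.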


Entry Level Microprocessors: Linux, 600MHz and 128MB of RAM
As computer programmers, our code runs on a wide variety of machines: from 2TB-of-RAM dual-EPYC servers with 128+ cores / 256 hardware threads, to tiny single-core Arduinos running at 4MHz with 4kB of RAM. While hobbyists and programmers around the world have become enamored with Arduinos, ESP32s, STM32 Pills, and Rasp. Pi SBCs... there's a noticeable gap in the typical hobbyist's repertoire that should be looked at more carefully. This gap is the entry-level MPU market, perhaps best represented by Microchip's SAM9x60, though ST's STM32MP1, NXP's i.MX 6ULL, and TI's AM335x chips tightly compete in this space. I hope to muse upon this category of processors: why it's unpopular, but... why maybe today, you should give it a closer look.

Impedance-controlled 6-layer PCBs USED to be too complex for a hobbyist... but they're accessible today
--------------------

This section's title says it all. Typical MPUs require PCB complexity that... at least 10 years ago, was well beyond a hobbyist's means. In the 2010 era of the fledgling "Maker" movement, 2-layer PCBs were the most complex you could hope for. Not just from a manufacturing perspective, but also from a software perspective: EagleCAD just didn't support more layers, and no manufacturer catered to hobbyists to make anything more complex. Paying $500 NRE fees each time you set up a board just wasn't good on a hobbyist's budget.

But today, OSHPark offers 6-layer boards (https://docs.oshpark.com/services/six-layer/) at reasonable prices, with tolerances specified for their dielectric (and therefore, impedance-controlled boards are a thing). Furthermore, KiCAD 7+ is more than usable today, meaning we have free OSS software that can lay out delay-matched PCB traces, with online libraries like UltraLibrarian offering KiCAD footprints and symbols sponsored by Microchip/TI/etc. There's also DKRed's 4-layer service, JLCPCB's services from China, and plenty of competitors around the world that can take your 6-layer+ Gerbers and give you a good board. We live in a new era where hobbyists have access to far more complexity and can feasibly build a bigger electronics project than ever dreamed of before.

The classic team: Arduino and Rasp. Pi...
---------------------

Arduino and Rasp. Pi stick together like peanut butter and jelly. They're a barbell strategy. On one end, a low-cost, cheap, easy-to-customize chip (ATMega328p and other Arduino-level chips) operating at single-digit mW of power, with a large suite of analog sensors, low latency, and simplicity. On the other end, Rasp. Pi offers Linux-level compute, "grown up" C++ programs, Python, server-level compute... albeit at 6W (for Rasp. Pi 4) or beyond, pushing laptop-level power consumption. That gives us a good team that handles a lot of problems cheaply and effectively.

Or... does it? This barbell strategy is popular for good reasons from a problem-solving perspective, but as soon as any power and/or energy constraint comes up, it's hopelessly defeated. Intermediate devices such as the ESP32 have popped up as a "more powerful Arduino", so to speak, providing more services (WiFi / Bluetooth, RAM, and compute power) than an Arduino can deliver, but still far less than what Rasp. Pi programmers are used to. What does a typical programmer want?

SAM9x60: ARMv5 at 600MHz, 128MB DDR2, Linux 6.1.x, dual-Ethernet 10/100, USB in 30mm x 30mm
---------------------------------------

![](https://lemmy.world/pictrs/image/6da8f092-e3c6-4f8a-a188-c7893cbb2841.png)

When Rasp. Pi launched a bit over 10 years ago with 256MB of RAM, a 700MHz processor, and full Linux support, it set off a wave of hobbyists experimenting with the platform. Unfortunately, Rasp. Pi has left this "tier" of compute power, chasing the impossible dream of competing with laptops / desktops. IMO, the original Rasp. Pi 1 hit a niche and should have stuck with that platform. Fortunately, alternatives exist today.

Though the SAM9x60D1G-I/LZB SOM module above is far more complex than a Rasp. Pi, it's a good representation of what's possible with a modern entry-level MPU. Yeah yeah yeah, it's $60, but stick with me a bit longer. The SOM module is a bad value, but it shows the minimal system it takes to boot this chip. This is very different from Rasp. Pi indeed. The SAM9x60 chip is fully open source, and fully documented at https://linux4sam.org. You get a full buildroot environment; a fully documented stage1, stage2, and stage3 (U-Boot) bootloader; and all 2000+ pages of documentation.

And perhaps most importantly: SAM9x60's reference design fits on 4-layer boards, with fully open reference designs (bill of materials, customization, etc. etc.). Note however that I'd personally only be comfortable with a 6-layer design here. (SAM9x60's reference design is a signal/ground/power/signal stackup, which is frowned upon by modern PCB theory. Signal/ground/power/signal/ground/signal would be a superior stackup... and 6 layers is cheap and available today anyway, so might as well go for 6.)

At $8 per SAM9x60, $3 to $5 for 128MB of DDR2 (depending on vendor), and $3 to $5 for the power chip, you'll get a minimal booting Linux box on a fully customized motherboard / PCB doing whatever you want.

Cool... but why would I need this?
-----------------

Well, to tell you the truth... I don't know yet. Power constraints are the **obvious** benefit to running with these chips (SAM9x60 + LPDDR RAM will use 1/10th the power of a Rasp. Pi 4, while still delivering a full Linux environment). But beyond that, I'm still thinking in the abstract here. I'm mostly writing this post because I've suddenly realized that a fully custom MPU board comparable to a first-generation Rasp. Pi is **doable** by a modern hobbyist. Albeit a well-studied hobbyist comfortable with trace-matched, impedance-controlled transmission-line theory on PCBs... but I took those college classes for a reason, damn it, and maybe I can actually do this.

It's a niche that was unthinkable 10 years ago: hobbyists cheaply making their own SBCs from scratch. But today, not only is it possible, there are 4 or 5 different vendors (Microchip's SAM9x60, TI's AM335x, ST's STM32MP1, etc. etc.) catering to hobbyists with full documentation, BSPs, and more. We're no longer constrained to the designs that Rasp. Pi decides to release. We can have those 2x Ethernet ports we've always wanted (for... some reason), or build a bare-metal, OS-free design using only 8MB of SRAM, or use LPDDR2 low-power RAM and build a battery-operated portable device.

Full customization costs money. Whatever hobby project we do with this will cost far more than an RP4's or even RP5's base price. But... full custom means we can build new solutions that never existed before. And the possibilities intrigue me. Full control over the full motherboard means we have absolute assurances of our power constraints, our size, the capabilities, supporting chips, and other decisions. Do you want LoRa (long-range radio)? Bam, just a module. And you might be surprised at how much cheaper this is today than it's ever been before.

Conclusion
--------------

Thanks for hearing my rant today. This form factor is really intriguing to me and I'll definitely be studying it moving forward as a hobby. Hopefully I've managed to inspire someone else out there!

That’s not what storage engineers mean when they say “bitrot”.

"Bitrot", in the scope of ZFS and BTRFS, means the situation where a hard drive's "0" gets randomly flipped to a "1" (or vice versa) in storage. It is a well-known problem and can happen within months. Especially as a 20-TB drive these days is a collection of 160 trillion bits, there's a high chance that at least some of those bits malfunction over a period of ~double-digit months.
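To see why the scale matters: if each bit independently flips with some small probability p over the storage period (p here is purely illustrative, not a measured spec), the expected number of flips on a 20-TB drive is

$$
E[\text{flips}] = 1.6 \times 10^{14} \cdot p,
$$

so even an absurdly small p = 10⁻¹² still gives ~160 flipped bits. Tiny per-bit failure rates produce real corruption at modern drive sizes.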

Each problem has a solution. In this case, bitrot is "solved" by the above procedure because:

  1. Bitrot usually doesn't happen within single-digit months, so regular ~6-month scrubs nearly guarantee that any bitrot problems you find will be limited in scope: just a few bits at most.

  2. Filesystems like ZFS or BTRFS are designed to handle many, many bits of bitrot safely.

  3. Scrubbing is a process where you read, and if necessary restore, any files where bitrot has been detected.

Of course, if hard drives are of noticeably worse quality than expected (ex: if you do have a large number of failures in a shorter time frame), or if you’re not using the right filesystem, or if you go too long between your checks (ex: taking 25 months to scrub for bitrot instead of just 6 months), then you might lose data. But we can only plan for the “expected” kinds of bitrot. The kinds that happen within 25 months, or 50 months, or so.

If you’ve gotten screwed by a hard drive (or SSD) that bitrots away in like 5 days or something awful (maybe someone dropped the hard drive and the head scratched a ton of the data away), then there’s nothing you can really do about that.


If you have a NAS, then just put iSCSI disks on the NAS, and network-share those iSCSI fake-disks to your mini-PCs.

iSCSI is "pretend to be a hard drive over the network". iSCSI can sit "on top of" ZFS or BTRFS, meaning your scrubs / scans will fix any issues. So your mini-PC can have a small C: drive, but be configured so that most of its storage lives on the D: iSCSI network drive.

iSCSI is very low-level. Windows literally thinks it's dealing with a (slow) hard drive over the network. As such, it works even in complex situations like Steam installations, albeit at slower network speeds (it has to talk to the NAS before the data comes in) rather than faster direct hard drive (or SSD) speeds.


Bitrot is a solved problem. It is solved by using bitrot-resilient filesystems with regular scans / scrubs. You build everything on top of solved problems, so that you never have to worry about the problem ever again.


Wait, what’s wrong with issuing “ZFS Scan” every 3 to 6 months or so? If it detects bitrot, it immediately fixes it. As long as the bitrot wasn’t too much, most of your data should be fixed. EDIT: I’m a dumb-dumb. The term was “ZFS scrub”, not scan.

If you're playing with multiple computers, "choosing" one to be a NAS and being extremely careful with the data it's storing makes sense. Regularly scanning all files and attempting repairs (which is just a few clicks with most NAS software) is incredibly easy, and probably could be automated.


Professor Lemire, btw, is a high-performance-computing professor who has been writing a lot of AVX512 techniques / articles for the past few years. His blog posts are very popular on Hacker News (news.ycombinator.com). Pretty cool guy; I think it's well worth following his blog if you're into low-level assembly, low-level memory optimizations, and the like.


pext (and the reverse, pdep) are basically a 64-bit bitwise gather and a 64-bit bitwise scatter instruction. On Intel, they execute in 1 tick, but on AMD they took 19 ticks (at least, a few years ago). Rumor is that the newest AMD chips are faster at them.

pdep and pext are some of my favorite instructions, because gather/scatter is an important supercomputer / parallelism concept, and Intel invented an extremely elegant way to describe bit movement within 64-bit registers. Given how important gather/scatter has been to supercomputer algorithms over the past 40 years, I expect many, many more applications of pdep/pext.
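A minimal demo of the gather/scatter view (compile with -mbmi2; the _pext_u64 / _pdep_u64 intrinsics live in immintrin.h):

```cpp
#include <cstdint>
#include <cstdio>
#include <immintrin.h>  // BMI2 intrinsics: _pext_u64 / _pdep_u64

int main() {
    uint64_t mask = 0xF0F0F0F0ull;  // the bit positions we care about
    uint64_t x    = 0x12345678ull;

    // pext = bitwise gather: pull the bits of x sitting under the
    // mask down into the contiguous low bits of the result.
    uint64_t gathered = _pext_u64(x, mask);         // 0x1357

    // pdep = bitwise scatter: spread low bits back out to the mask
    // positions. pdep(pext(x, m), m) round-trips the masked bits.
    uint64_t scattered = _pdep_u64(gathered, mask); // 0x10305070

    printf("gathered=%llx scattered=%llx\n",
           (unsigned long long)gathered, (unsigned long long)scattered);
}
```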

My own experiment with pdep and pext was to create a small bit-scale relational database for solving 4-coloring-theorem(-like) problems. I was able to implement "select" with a pext, and "join" with a pdep. (4 bits is a single-column table; 16 bits, a dual-column table; 64 bits, a triple-column table.)


It's not so easy.

GPU programmers are the experts in AoS vs SoA formats. And when you look at how RGB values are stored, it's… incredibly complex. Sometimes you've got RRRRGGGGBBBB, sometimes it's RGBARGBARGBA, sometimes it's YYYYUUVV. What's best for performance changes dramatically from system to system, requiring lots of benchmarking and ultimately… a massive slew of processor-specific / ARM NEON instructions that convert between every format imaginable.

Oh right, GPUs don't need those processor-specific instructions, because permute and bpermute exist (a 32-way crossbar: any-data-to-any-lane movement, and vice versa, any lane pulling from any data; permute and bpermute respectively). CPUs do need them though.
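For concreteness, the two layouts in plain C++ terms (a toy sketch; real pipelines add alignment, strides, and those format-conversion shuffles):

```cpp
#include <cstdint>
#include <vector>

// AoS / interleaved: RGBARGBARGBA... one struct per pixel.
struct PixelAoS { uint8_t r, g, b, a; };
using ImageAoS = std::vector<PixelAoS>;

// SoA / planar: RRRR...GGGG...BBBB...AAAA... one plane per channel.
struct ImageSoA {
    std::vector<uint8_t> r, g, b, a;
};

// A whole-channel operation touches 1/4 the memory in SoA form and
// auto-vectorizes trivially; in AoS form it strides past the other
// channels' bytes. Which layout wins depends on the access pattern,
// hence all the benchmarking.
void boostRed(ImageSoA &img) {
    for (uint8_t &v : img.r) v = uint8_t(v >= 240 ? 255 : v + 16);
}
```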



You’re not describing composition.

Go Files do not "hasa" reader. You don't do file.reader.read(); you just do file.read(). That's inheritance: file has inherited the read() method.


But the fact that TCPStreams isa file-descriptor, Files isa file-descriptor, Pipes isa file-descriptor, and other such “stream-like objects” in the Linux kernel proves that the read/recv and write/send system calls are generic enough to work in a wide variety of circumstances.

Yeah, they're all different. But as far as the Linux API goes, they're all file descriptors under it all, with abstractions that work well for inheritance in practice. In many cases inheritance doesn't work. But in many cases it works, and works well, and has for decades.
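A tiny Linux/POSIX illustration of that abstraction (error handling elided): the same read() call services a regular file and a pipe, because both are "just" file descriptors to the kernel.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Works whether fd refers to a file, a pipe, a socket, a TTY...
static void dumpSome(int fd) {
    char buf[256];
    ssize_t got = read(fd, buf, sizeof buf);
    if (got > 0) fwrite(buf, 1, (size_t)got, stdout);
}

int main() {
    int file = open("/etc/hostname", O_RDONLY);
    if (file >= 0) { dumpSome(file); close(file); }

    int p[2];
    if (pipe(p) == 0) {
        write(p[1], "hello from a pipe\n", 18);
        dumpSome(p[0]);  // same call, different "subclass" of fd
        close(p[0]); close(p[1]);
    }
}
```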


Inheritance is useful.

However, "Dog isa Animal" is worse than useless: it actively hampers your code and makes your life worse.

Useful inheritance patterns are all over the place in GUI / Model-View-Controller code, however: "Button isa Window", "FileStream isa Stream", and "StringStream isa Stream" in C++. If you stick with SOLID principles, inheritance helps your code significantly.
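The C++ streams case in miniature: code written against the base class runs unchanged on either derived stream.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Written once, against the base class...
static void firstWord(std::istream &in) {
    std::string w;
    if (in >> w) std::cout << w << '\n';
}

int main() {
    std::istringstream ss("StringStream isa Stream");
    firstWord(ss);            // ...works on an in-memory buffer,

    std::ifstream f("/etc/hostname");
    if (f) firstWord(f);      // ...and on a file, unchanged.
}
```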


OpenSSL / Heartbleed was the event when this comic came out IIRC.


Brainstorm: How to implement Fumen for Lemmy (for a Hypothetical Tetris community)?
I'm a programmer, but mostly C in my job. I've played with PHP5 (yeah, a long time ago), dabbled in HTML5 when it was new, touched upon some ancient version of Javascript, etc. etc. I have done bare-bones system administrator work, but only enough for test instances. My web knowledge is therefore lacking, so I'm seeking discussion with more experienced peers (especially on the web and/or administrator sides) to brainstorm this idea. I'm probably never going to implement it, but... I feel like the discussion may help me solidify the work needed to build future communities.

----------

Let's say I want to make a competitive Tetris community akin to HardDrop.com. Among the most important features of Tetris is the web application "Fumen", a Javascript application created by Japanese programmers. The web GUI isn't the most intuitive, but you can demo it here: https://harddrop.com/fumen/ . The code has all sorts of Japanese comments that I can't read, but presumably with enough copy/paste and effort, I could get it off of Github (or wherever it originated from) and copy/paste it into my own server somehow.

When the HardDrop forums were open, people would discuss Tetris strategies by crafting them on a Fumen, and then discussing them. I made such a post long ago here, on the TKI3 strategy, if you want to see it: https://harddrop.com/forums/index.php?showtopic=7889

----------

Now let's say I want to import my old Fumen for my own discussion purposes, somewhere in Lemmy. What's the best way to present it? I'm assuming I'll need:

1. Copying the Fumen renderer into my own Javascript/HTML.

2. A friendly administrator who is willing to give me Javascript import (or host the Javascript locally on the server). This presumably would be myself making my own server, but alternatively another admin who is aligned with me could do this too.

3. Ideally, a local copy of Fumen running. (I wouldn't want to hammer HardDrop's web GUI too hard.)

4. Some kind of Lemmy plugin that allows people to post text of the form (v115@BhilFeAtglR4Beg0RpBtR4Ceg0RpAtzhAeh0JeAgWW?AURVSASYNuEw488AQr78AwKY5DkoBAAvhBtsuAAlsBzgQ4I?eR4CeRpwhBeglQ4CeRpwhAeAtglFewhBthlEewhAtKeAAPX?AS1STAS4kcDnoo2AMoo2AQieeEFcxCA), which [coincides with the Fumen I wrote](https://harddrop.com/fumen/?v115@BhilFeAtglR4Beg0RpBtR4Ceg0RpAtzhAeh0JeAgWW?AURVSASYNuEw488AQr78AwKY5DkoBAAvhBtsuAAlsBzgQ4I?eR4CeRpwhBeglQ4CeRpwhAeAtglFewhBthlEewhAtKeAAPX?AS1STAS4kcDnoo2AMoo2AQieeEFcxCA). Upon recognizing a string of this form (or maybe of the form [[fumen@v115...]]), render such a Fumen inline with the post.

5. Obviously, it would only render correctly on my "local instance" with my "local users". Assume this is sufficient for my personal community. But if anyone can think of how this could be solved "cross-Lemmy", I'd also be interested in that.

I'm not sure the Tetris community is in a state where this is worth pushing. But I can see "future video games", especially competitive ones, needing other online GUIs, calculators, and other such web-application support to help discussion. Any web experts out there wanna muse on this subject with me?

The refcount absolutely is shared state across threads.

If Thread#1 thinks the refcount is 5, but Thread#2 thinks the refcount is 0, you’ve got problems.


Meta: Hmmm… replying to kbin.social users appears to be bugged from my instance (lemmy.world).

I’m replying to you instead. It doesn’t change the meaning of my post at least, but we’re definitely experiencing some bugs / growing pains with regards to Lemmy (and particularly lemmy.world).


GC overhead is mostly memory-based too, not CPU-based.

Because modern C++ (and Rust) are almost entirely based around refcount++ and refcount-- (and if refcount==0, then call the destructor), the CPU usage of such calls is surprisingly high in a multithreaded environment. That refcount++ and refcount-- need to be synchronized between threads (atomics + memory barriers, or lock/unlock), which is slower than people expect.
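A rough sketch of what every shared_ptr copy/destroy does under the hood (simplified; the real control block also tracks weak counts):

```cpp
#include <atomic>

std::atomic<long> refcount{1};

void onCopy() {
    // The increment itself can be relaxed...
    refcount.fetch_add(1, std::memory_order_relaxed);
}

bool onDestroy() {
    // ...but the decrement needs acquire/release ordering so the
    // destructor observes every thread's writes. That synchronization
    // is the hidden cost that surprises people.
    return refcount.fetch_sub(1, std::memory_order_acq_rel) == 1;
    // true => last owner: run the destructor / free the object.
}
```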

Even then, C malloc/free isn't really cheap either. It's just that in C we can do tricks like struct Foo { char endOfStructTrick[0]; } and allocate with malloc(sizeof(struct Foo) + 255) (or whatever the size of the end-of-struct string is) to collapse multiple mallocs / frees into one, and otherwise abuse memory layouts for faster code.
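A sketch of that end-of-struct idiom (the classic "struct hack", shown with a 1-byte tail array; C99 spells it as a flexible array member, char name[]):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

struct Foo {
    int len;
    char name[1];   // overlaid onto the tail of a bigger allocation
};

// One malloc carries the header AND the variable-length payload,
// collapsing what would be two allocations (and two frees) into one.
static Foo *makeFoo(const char *s) {
    size_t n = strlen(s);
    Foo *f = (Foo *)malloc(sizeof(Foo) + n); // name[1] already holds the NUL
    f->len = (int)n;
    memcpy(f->name, s, n + 1);
    return f;
}

int main() {
    Foo *f = makeFoo("end-of-struct trick");
    printf("%d: %s\n", f->len, f->name);
    free(f);   // one free for header + payload together
}
```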

If you don’t use such tricks, I don’t think that C’s malloc/free is much faster than GC.


Furthermore, fragmentation is worse in C's malloc/free land (many GCs can compact and fix fragmentation issues). Once we take fragmentation into account, the memory advantage diminishes.

Still, C and C++ almost always seem to use less memory than Java and other GC languages, so the memory savings are substantial. But CPU-power savings? I don't think that's a major concern. Maybe it's just that CPUs are so much faster today that memory is what we practically care about.