
Universal Paperclips is one of the best clicker games.

In particular: because it isn't a clicker game. It only starts off as one. There are only about two sections, IIRC, that are "clicker": the start (before auto-clippers kick in), and then the quantum computer.

I guess you have to launch your first 20 or 30 probes at the space stage, and that's done one-click-at-a-time… but I don't think that counts as a "clicker" game, since it's so few clicks in the grand scheme of things. At no other point is rapid clicking that useful.


I had a pretty standard linear-list scan initially. Each time the program started, I'd check the list for some values. The list, of course, grew each time the program started. I capped the list size at something like 2MB (I forget the exact figure), but it was in the millions of entries and therefore in the MB range. I figured it was too small for me to care about optimization.

I was somewhat correct: even when I simulated a full-sized list, the program booted faster than I could react, so I didn't care.


Later, I wrote some test code that exhaustively tested startup conditions. Instead of just running the startup once, I was running it millions of times. Suddenly I cared about startup speed, so I replaced the list with a hash table so that my test code would finish within 10 minutes (instead of the projected 3 days to exhaustively test all startup conditions).
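For flavor, a minimal sketch of that kind of swap (hypothetical names, not my actual code): the linear scan becomes an O(1)-average membership test once the startup values move into a hash table.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Before: O(n) scan per lookup. Fine for one boot...
bool seenLinear(const std::vector<uint64_t> &list, uint64_t v) {
    return std::find(list.begin(), list.end(), v) != list.end();
}

// After: O(1) average per lookup, which is what matters once the
// startup path runs millions of times under a test harness.
bool seenHashed(const std::unordered_set<uint64_t> &seen, uint64_t v) {
    return seen.count(v) != 0;
}

int main() {
    std::vector<uint64_t> list{3, 1, 4, 1, 5};
    std::unordered_set<uint64_t> seen(list.begin(), list.end());
    return (seenLinear(list, 4) && seenHashed(seen, 4)) ? 0 : 1;
}
```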


Honestly, I'm more impressed by the opposite. This is perhaps one of the few times I've actually taken a linear list and optimized it into a hash table. Almost all other linear lists I've used in the last 10 years of my professional coding life remain just that: a linear scan, with no one caring about performance. I've got linear lists doing some crazy things, even with MBs of data, and no one has ever come back to me and said they need optimization.

Do not underestimate the power of std::vector. It's probably faster than you expect, even with O(n^2) algorithms all over the place. std::map and std::unordered_map certainly have their uses, but there are a lot of situations where std::vector is far, far, far easier to think about, so it's my preferred solution rather than prematurely optimizing to std::map ahead of time.
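As a rough illustration of the point (a made-up example, not from any real codebase): even an O(n^2) dedup over a contiguous std::vector is plenty fast for the small-to-medium n most code actually sees, and there's far less to think about than with a map.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// O(n^2) worst case, but contiguous memory keeps the constant tiny.
// For typical list sizes this routinely beats reaching for
// std::map / std::unordered_map, and the code stays obvious.
std::vector<std::string> dedup(const std::vector<std::string> &in) {
    std::vector<std::string> out;
    for (const std::string &s : in)
        if (std::find(out.begin(), out.end(), s) == out.end())
            out.push_back(s);
    return out;
}
```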


Rasp. Pi's products, be it the RP2040 or the Rasp. Pi SBCs in general, have always had horrendous power-consumption specs and even worse sleep/idle states.


How many layers does the Orange Pi Zero pcb have?

Answer: good luck finding out. That's not documented. But based on the layout and what I can see in screenshots, far more than 4 layers.


A schematic alone is kind of worthless. Knowing whether a BGA is designed for 6, 8, or 10 layers makes a big difference. Seeing a reference PCB implementation with exactly that layer count, so the EE knows how to modify the design for themselves, is key to customization. There's all sorts of EMI and trace-length matching that needs to happen to get that CPU-to-DDR connection up and running.

Proving that a 4-layer layout like this exists is a big deal. It means that a relative beginner can work with the SAM9x60's DDR interface on cheap 4-layer PCBs (though as I said earlier: 6 layers offer more room and are available at OSHPark, so I'd recommend a beginner work with 6 instead).


With regards to the SAM9x60D1G-I/LZB SOM vs the Orange Pi Zero: the SAM9x60D1G-I/LZB SOM provides you with all remaining pins of access… 152 pins… to the SAM9x60, meaning a full development board with full access to every feature. It serves a fundamentally different purpose: the SOM is a learning tool and a development tool for customization.


Well, my self-deprecating humor aside, I've of course thought about it more deeply over the course of my research. So I don't want to sell it too short.

The SAM9x60 has a proper GPU (albeit a 2D one), full-scale Linux, and DDR2 support (easily reaching 64MB, 128MB, or beyond of RAM). At $3 for DDR2 chips, the cost-efficiency is absurd (https://www.digikey.com/en/products/detail/issi-integrated-silicon-solution-inc/IS43TR16640C-125JBL/11568766): a QSPI 8Mbit (1MB) SRAM chip basically costs the same as 1Gbit (128MB) of DDR2 RAM.

Newhaven Display offers various 16-bit TFT/LCD screens (https://newhavendisplay.com/tft-displays/standard-displays/) at a variety of price points. Let's take, say… a 400x300-pixel 16-bit screen. How much RAM do you need for the framebuffer? (I dunno: this one, https://newhavendisplay.com/4-3-inch-ips-480x272px-eve2-resistive-tft/, or something close.)


Oh right, 400 x 300 x 2 bytes per pixel and we're already at 240kB, meaning the entire field of MSP430, ATMega328, ARM Cortex-M0, and even ARM Cortex-M4 chips is dead on the framebuffer alone. Now let's say we have 10 frames of animation we'd want to play, and bam, we're already well beyond what a $3 QSPI SRAM chip will offer us.
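Spelling out that arithmetic (same numbers as above; the 10-frame scenario is just an example):

```cpp
#include <cstdio>

int main() {
    const long frame = 400L * 300L * 2L;  // 16bpp framebuffer: 240,000 bytes
    const long tenFrames = 10L * frame;   // 2,400,000 bytes of animation
    const long qspiSram = 1L << 20;       // an 8Mbit QSPI SRAM chip: 1MB

    printf("one frame : %ld bytes (~240kB)\n", frame);
    printf("ten frames: %ld bytes vs %ld bytes of QSPI SRAM\n",
           tenFrames, qspiSram);
}
```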

But let's look at one of the brother chips really quick: Microchip's SAMA5D4. Though more difficult to boot up, this one comes with an H.264 decoder. Forget "frames of animation", this baby straight up supports MP4 videos on a full-scale Linux platform.

Well, maybe you want a Rasp. Pi to run that, but a Rasp. Pi 4 can hit 6000mW of power consumption, far beyond the means of typical battery packs of the ~3-inch variety. Drop the power consumption to 300mW (SAMA5D4 + DDR2 RAM) + 300mW (LCD screen), and suddenly we're in the realm of AAA batteries.
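Back-of-envelope on that claim, assuming ~1.2Wh per AAA cell (roughly 1000mAh NiMH at 1.2V) and ignoring converter losses and high-drain derating:

$$
t_{\text{SAMA5D4}} \approx \frac{3 \times 1.2\,\text{Wh}}{0.6\,\text{W}} = 6\,\text{h},
\qquad
t_{\text{RPi4}} \approx \frac{3 \times 1.2\,\text{Wh}}{6\,\text{W}} \approx 0.6\,\text{h}
$$

Three AAA cells buy you hours of runtime at 600mW, versus barely half an hour at Rasp. Pi 4 draw.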


Entry Level Microprocessors: Linux, 600MHz and 128MB of RAM
As computer programmers, our code runs on a wide variety of machines: from 2TB-of-RAM dual-EPYC servers with 128+ cores / 256 hardware threads, to tiny single-core Arduinos running at 4MHz with 4kB of RAM. While hobbyists and programmers around the world have become enamored with Arduinos, ESP32s, STM32 Pills, and Rasp. Pi SBCs... there's a noticeable gap in the typical hobbyist's repertoire that should be looked at more carefully. This gap is the entry-level MPU market, perhaps best represented by Microchip's SAM9x60, though ST's STM32MP1, NXP's i.MX 6ULL, and TI's AM335x chips tightly compete in this space. I hope to muse upon this category of processors: why it's unpopular, but... why maybe today, you should give it a closer look.

Impedance-controlled 6-layer PCBs USED to be too complex for a hobbyist... but they're accessible today
--------------------

This section's title says it all. Typical MPUs require PCB complexity that... at least 10 years ago, was well beyond a hobbyist's means. In the 2010 era of the fledgling "Maker" movement, 2-layer PCBs were the most complex you could hope for. Not just from a manufacturing perspective, but also from a software perspective: EagleCAD just didn't support more layers, and no manufacturer catered to hobbyists to make anything more complex. Paying $500 NRE fees each time you set up a board just wasn't good on a hobbyist's budget.

But today, OSHPark offers 6-layer boards (https://docs.oshpark.com/services/six-layer/) at reasonable prices, with tolerances specified for their dielectric (and therefore, impedance-controlled boards are a thing). Furthermore, KiCAD 7+ is more than usable today, meaning we have free OSS software that can lay out delay-matched PCB traces, with online libraries like UltraLibrarian offering KiCAD footprints and symbols sponsored by Microchip/TI/etc. There's also DKRed's 4-layer service, JLCPCB's services from China, and plenty of competitors around the world that can take your 6-layer+ Gerbers and give you a good board. We live in a new era where hobbyists have access to far more complexity and can feasibly build a bigger electronics project than ever dreamed of before.

The classic team: Arduino and Rasp. Pi...
---------------------

Arduino and Rasp. Pi stick together like peanut butter and jelly. They're a barbell strategy. On one end, a low-cost, cheap, easy-to-customize chip (ATMega328p and other Arduino-level chips) operating at single-digit mW of power, with a large suite of analog sensors, low latency, and simplicity. On the other end, Rasp. Pi offers Linux-level compute, "grown up" C++ programs, Python, server-level compute... albeit at 6W (for Rasp. Pi 4) or beyond, pushing laptop-level power consumption. That gives us a good team that handles a lot of problems cheaply and effectively.

Or... does it? This barbell strategy is popular for good reasons from a problem-solving perspective, but as soon as any power and/or energy constraint comes up, it's hopelessly defeated. Intermediate devices such as the ESP32 have popped up as a "more powerful Arduino", so to speak, providing more services (WiFi / Bluetooth, RAM, and compute power) than an Arduino can deliver, but still far less than what Rasp. Pi programmers are used to. What does a typical programmer want?

SAM9x60: ARMv5 at 600MHz, 128MB DDR2, Linux 6.1.x, dual-Ethernet 10/100, USB in 30mm x 30mm
---------------------------------------

![](https://lemmy.world/pictrs/image/6da8f092-e3c6-4f8a-a188-c7893cbb2841.png)

When Rasp. Pi launched a bit over 10 years ago with 256MB of RAM, a 700MHz processor, and full Linux support, it set off a wave of hobbyists experimenting with the platform. Unfortunately, Rasp. Pi has left this "tier" of compute power, chasing the impossible dream of competing with laptops / desktops. IMO, the original Rasp. Pi 1 hit a niche and should have stuck with that platform. Fortunately, alternatives exist today.

Though the SAM9x60D1G-I/LZB SOM module above is far more complex than a Rasp. Pi, it's a good representation of what's possible with a modern entry-level MPU. Yeah yeah yeah, it's $60, but stick with me a bit longer. The SOM module is a bad value, but it shows the minimal system it takes to boot this chip. This is very different from Rasp. Pi indeed. The SAM9x60 chip is fully open source, and fully documented at https://linux4sam.org. You get a full buildroot environment; a fully documented stage1, stage2, and stage3 (U-Boot) bootloader; and all 2000+ pages of documentation.

And perhaps most importantly: SAM9x60's reference design fits on 4-layer boards, with fully open reference designs (bill of materials, customization, etc. etc.). Note however that I'd personally only be comfortable with a 6-layer design here. (SAM9x60's reference design is a signal/ground/power/signal stackup, which is frowned upon by modern PCB theory. Signal/ground/power/signal/ground/signal would be a superior stackup... and 6 layers is cheap and available today anyway, so might as well go for 6.)

At $8 per SAM9x60, $3 to $5 for 128MB of DDR2 (depending on vendor), and $3 to $5 for the power chip, you'll get a minimal booting Linux box on a fully customized motherboard / PCB doing whatever you want.

Cool... but why would I need this?
-----------------

Well, to tell you the truth... I don't know yet. Power constraints are the **obvious** benefit to running with these chips (SAM9x60 + LPDDR RAM will use 1/10th the power of a Rasp. Pi 4, while still delivering a full Linux environment). But beyond that, I'm still thinking in the abstract here. I'm mostly writing this post because I've suddenly realized that a fully custom MPU board comparable to a first-generation Rasp. Pi is **doable** by a modern hobbyist. Albeit a well-studied hobbyist comfortable with trace-matched, impedance-controlled transmission-line theory on PCBs... but I took those college classes for a reason, damn it, and maybe I can actually do this.

It's a niche that was unthinkable 10 years ago: hobbyists cheaply making their own SBCs from scratch. But today, not only is it possible, there are 4 or 5 different vendors (Microchip's SAM9x60, TI's AM335x, ST's STM32MP1, etc. etc.) catering to hobbyists with full documentation, BSPs, and more. We're no longer constrained to the designs that Rasp. Pi decides to release. We can have those 2x Ethernet ports we've always wanted (for... some reason), or build a bare-metal, OS-free design using only 8MB of SRAM, or use LPDDR2 low-power RAM and build a battery-operated portable device.

Full customization costs money. Whatever hobby project we do with this will cost far more than an RP4's or even RP5's base price. But... full custom means we can build new solutions that never existed before. And the possibilities intrigue me. Full control over the full motherboard means we have absolute assurances of our power constraints, our size, the capabilities, supporting chips, and other decisions. Do you want LoRa (long-range radio)? Bam, just a module. And you might be surprised at how much cheaper this is today than it's ever been before.

Conclusion
--------------

Thanks for hearing my rant today. This form factor is really intriguing to me and I'll definitely be studying it moving forward as a hobby. Hopefully I've managed to inspire someone else out there!

That’s not what storage engineers mean when they say “bitrot”.

"Bitrot", in the scope of ZFS and BTRFS, means the situation where a hard drive's "0" gets randomly flipped to a "1" (or vice versa) in storage. It is a well-known problem and can happen within months. Especially as a 20-TB drive these days is a collection of 160 trillion bits, there's a high chance that at least some of those bits malfunction over a period of ~double-digit months.
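To see why the scale matters: if each bit independently flips with some small probability p over the storage period (p here is purely illustrative, not a measured spec), the expected number of flips on a 20-TB drive is

$$
E[\text{flips}] = 1.6 \times 10^{14} \cdot p,
$$

so even an absurdly small p = 10⁻¹² still gives ~160 flipped bits. Tiny per-bit failure rates produce real corruption at modern drive sizes.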

Each problem has a solution. In this case, bitrot is "solved" by the above procedure because:

  1. Bitrot usually doesn't happen within single-digit months, so regular ~6-month scrubs nearly guarantee that any bitrot problems you find will be limited in scope: just a few bits at most.

  2. Filesystems like ZFS or BTRFS are designed to handle many, many bits of bitrot safely.

  3. Scrubbing is a process where you read, and if necessary restore, any files where bitrot has been detected.

Of course, if hard drives are of noticeably worse quality than expected (ex: if you do have a large number of failures in a shorter time frame), or if you’re not using the right filesystem, or if you go too long between your checks (ex: taking 25 months to scrub for bitrot instead of just 6 months), then you might lose data. But we can only plan for the “expected” kinds of bitrot. The kinds that happen within 25 months, or 50 months, or so.

If you’ve gotten screwed by a hard drive (or SSD) that bitrots away in like 5 days or something awful (maybe someone dropped the hard drive and the head scratched a ton of the data away), then there’s nothing you can really do about that.


If you have a NAS, then just put iSCSI disks on the NAS, and network-share those iSCSI fake-disks to your mini-PCs.

iSCSI is "pretend to be a hard drive over the network". iSCSI can sit "on top of" ZFS or BTRFS, meaning your scrubs / scans will fix any issues. So your mini-PC can have a small C: drive, but be configured so that most of its storage lives on the D: iSCSI network drive.

iSCSI is very low-level. Windows literally thinks it's dealing with a (slow) hard drive over the network. As such, it works even in complex situations like Steam installations, albeit at slower network speeds (it has to talk to the NAS before the data comes in) rather than faster direct hard drive (or SSD) speeds.


Bitrot is a solved problem. It is solved by using bitrot-resilient filesystems with regular scans / scrubs. You build everything on top of solved problems, so that you never have to worry about the problem ever again.


Wait, what’s wrong with issuing “ZFS Scan” every 3 to 6 months or so? If it detects bitrot, it immediately fixes it. As long as the bitrot wasn’t too much, most of your data should be fixed. EDIT: I’m a dumb-dumb. The term was “ZFS scrub”, not scan.

If you're playing with multiple computers, "choosing" one to be a NAS and being extremely careful with the data it's storing makes sense. Regularly scanning all files and attempting repairs (which is just a few clicks with most NAS software) is incredibly easy, and probably could be automated.


Professor Lemire, btw, is a high-performance-computing professor who has been writing a lot of AVX512 techniques / articles for the past few years. His blog posts are very popular on Hacker News (news.ycombinator.com). Pretty cool guy; I think it's well worth following his blog if you're into low-level assembly, low-level memory optimizations, and the like.


pext (and the reverse, pdep) are basically a 64-bit bitwise gather and a 64-bit bitwise scatter instruction. On Intel, they execute in 1 tick, but on AMD they took 19 ticks (at least, a few years ago). Rumor is that the newest AMD chips are faster at them.

pdep and pext are some of my favorite instructions, because gather/scatter is an important supercomputer / parallelism concept, and Intel invented an extremely elegant way to describe bit movement within 64-bit registers. Given how important gather/scatter has been to supercomputer algorithms over the past 40 years, I expect many, many more applications of pdep/pext.
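A minimal demo of the gather/scatter view (compile with -mbmi2; the _pext_u64 / _pdep_u64 intrinsics live in immintrin.h):

```cpp
#include <cstdint>
#include <cstdio>
#include <immintrin.h>  // BMI2 intrinsics: _pext_u64 / _pdep_u64

int main() {
    uint64_t mask = 0xF0F0F0F0ull;  // the bit positions we care about
    uint64_t x    = 0x12345678ull;

    // pext = bitwise gather: pull the bits of x sitting under the
    // mask down into the contiguous low bits of the result.
    uint64_t gathered = _pext_u64(x, mask);         // 0x1357

    // pdep = bitwise scatter: spread low bits back out to the mask
    // positions. pdep(pext(x, m), m) round-trips the masked bits.
    uint64_t scattered = _pdep_u64(gathered, mask); // 0x10305070

    printf("gathered=%llx scattered=%llx\n",
           (unsigned long long)gathered, (unsigned long long)scattered);
}
```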

My own experiment with pdep and pext was to create a small bit-scale relational database for solving 4-coloring-theorem(-like) problems. I was able to implement "select" with a pext, and "join" with a pdep. (4 bits is a single-column table; 16 bits, a dual-column table; 64 bits, a triple-column table.)


It's not so easy.

GPU programmers are the experts in AoS vs SoA formats. And when you look at how RGB values are stored, it's… incredibly complex. Sometimes you've got RRRRGGGGBBBB, sometimes it's RGBARGBARGBA, sometimes it's YYYYUUVV. What's best for performance changes dramatically from system to system, requiring lots of benchmarking and ultimately… a massive slew of processor-specific / ARM NEON instructions that convert between every format imaginable.

Oh right, GPUs don't need those processor-specific instructions, because permute and bpermute exist (a 32-way crossbar: any-data-to-any-lane movement, and vice versa, any lane pulling from any data; permute and bpermute respectively). CPUs do need them though.
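For concreteness, the two layouts in plain C++ terms (a toy sketch; real pipelines add alignment, strides, and those format-conversion shuffles):

```cpp
#include <cstdint>
#include <vector>

// AoS / interleaved: RGBARGBARGBA... one struct per pixel.
struct PixelAoS { uint8_t r, g, b, a; };
using ImageAoS = std::vector<PixelAoS>;

// SoA / planar: RRRR...GGGG...BBBB...AAAA... one plane per channel.
struct ImageSoA {
    std::vector<uint8_t> r, g, b, a;
};

// A whole-channel operation touches 1/4 the memory in SoA form and
// auto-vectorizes trivially; in AoS form it strides past the other
// channels' bytes. Which layout wins depends on the access pattern,
// hence all the benchmarking.
void boostRed(ImageSoA &img) {
    for (uint8_t &v : img.r) v = uint8_t(v >= 240 ? 255 : v + 16);
}
```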



You’re not describing composition.

Go Files do not "hasa" reader. You don't do file.reader.read(); you just do file.read(). That's inheritance: file has inherited the read() method.


But the fact that TCPStreams isa file-descriptor, Files isa file-descriptor, Pipes isa file-descriptor, and other such “stream-like objects” in the Linux kernel proves that the read/recv and write/send system calls are generic enough to work in a wide variety of circumstances.

Yeah, they're all different. But as far as the Linux API goes, they're all file descriptors under it all, with abstractions that work well for inheritance in practice. In many cases inheritance doesn't work. But in many cases it works, and works well, and has for decades.
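A tiny Linux/POSIX illustration of that abstraction (error handling elided): the same read() call services a regular file and a pipe, because both are "just" file descriptors to the kernel.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Works whether fd refers to a file, a pipe, a socket, a TTY...
static void dumpSome(int fd) {
    char buf[256];
    ssize_t got = read(fd, buf, sizeof buf);
    if (got > 0) fwrite(buf, 1, (size_t)got, stdout);
}

int main() {
    int file = open("/etc/hostname", O_RDONLY);
    if (file >= 0) { dumpSome(file); close(file); }

    int p[2];
    if (pipe(p) == 0) {
        write(p[1], "hello from a pipe\n", 18);
        dumpSome(p[0]);  // same call, different "subclass" of fd
        close(p[0]); close(p[1]);
    }
}
```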


Inheritance is useful.

However, "Dog isa Animal" is worse than useless: it actively hampers your code and makes your life worse.

Useful inheritance patterns are all over the place in GUI / Model-View-Controller code, however: "Button isa Window", "FileStream isa Stream", and "StringStream isa Stream" in C++. If you stick with SOLID principles, inheritance helps your code significantly.
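The C++ streams case in miniature: code written against the base class runs unchanged on either derived stream.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Written once, against the base class...
static void firstWord(std::istream &in) {
    std::string w;
    if (in >> w) std::cout << w << '\n';
}

int main() {
    std::istringstream ss("StringStream isa Stream");
    firstWord(ss);            // ...works on an in-memory buffer,

    std::ifstream f("/etc/hostname");
    if (f) firstWord(f);      // ...and on a file, unchanged.
}
```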


OpenSSL / Heartbleed was the event when this comic came out IIRC.


Brainstorm: How to implement Fumen for Lemmy (for a Hypothetical Tetris community)?
I'm a programmer, but mostly C in my job. I've played with PHP5 (yeah, a long time ago), dabbled in HTML5 when it was new, touched upon some ancient version of Javascript, etc. etc. I have done bare-bones system administrator work, but only enough for test instances. My web knowledge is therefore lacking, so I'm seeking discussion with more experienced peers (especially on the web and/or administrator sides) to brainstorm this idea. I'm probably never going to implement it, but... I feel like the discussion may help me solidify the work needed to build future communities.

----------

Let's say I want to make a competitive Tetris community akin to HardDrop.com. Among the most important features of Tetris is the web application "Fumen", a Javascript application created by Japanese programmers. The web GUI isn't the most intuitive, but you can demo it here: https://harddrop.com/fumen/ . The code has all sorts of Japanese comments that I can't read, but presumably with enough copy/paste and effort, I could get it off of Github (or wherever it originated from) and copy/paste it into my own server somehow.

When the HardDrop forums were open, people would discuss Tetris strategies by crafting them on a Fumen, and then discussing them. I made such a post long ago here, on the TKI3 strategy, if you want to see it: https://harddrop.com/forums/index.php?showtopic=7889

----------

Now let's say I want to import my old Fumen for my own discussion purposes, somewhere in Lemmy. What's the best way to present it? I'm assuming I'll need:

1. Copying the Fumen renderer into my own Javascript/HTML.

2. A friendly administrator who is willing to give me Javascript import (or host the Javascript locally on the server). This presumably would be myself making my own server, but alternatively another admin who is aligned with me could do this too.

3. Ideally, a local copy of Fumen running. (I wouldn't want to hammer HardDrop's web GUI too hard.)

4. Some kind of Lemmy plugin that allows people to post text of the form (v115@BhilFeAtglR4Beg0RpBtR4Ceg0RpAtzhAeh0JeAgWW?AURVSASYNuEw488AQr78AwKY5DkoBAAvhBtsuAAlsBzgQ4I?eR4CeRpwhBeglQ4CeRpwhAeAtglFewhBthlEewhAtKeAAPX?AS1STAS4kcDnoo2AMoo2AQieeEFcxCA), which [coincides with the Fumen I wrote](https://harddrop.com/fumen/?v115@BhilFeAtglR4Beg0RpBtR4Ceg0RpAtzhAeh0JeAgWW?AURVSASYNuEw488AQr78AwKY5DkoBAAvhBtsuAAlsBzgQ4I?eR4CeRpwhBeglQ4CeRpwhAeAtglFewhBthlEewhAtKeAAPX?AS1STAS4kcDnoo2AMoo2AQieeEFcxCA). Upon recognizing a string of this form (or maybe of the form [[fumen@v115...]]), render such a Fumen inline with the post.

5. Obviously, it would only render correctly on my "local instance" with my "local users". Assume this is sufficient for my personal community. But if anyone can think of how this could be solved "cross-Lemmy", I'd also be interested in that.

I'm not sure the Tetris community is in a state where this is worth pushing. But I can see "future video games", especially competitive ones, needing other online GUIs, calculators, and other such web-application support to help discussion. Any web experts out there wanna muse on this subject with me?

The refcount absolutely is shared state across threads.

If Thread#1 thinks the refcount is 5, but Thread#2 thinks the refcount is 0, you’ve got problems.


Meta: Hmmm… replying to kbin.social users appears to be bugged from my instance (lemmy.world).

I’m replying to you instead. It doesn’t change the meaning of my post at least, but we’re definitely experiencing some bugs / growing pains with regards to Lemmy (and particularly lemmy.world).


GC overhead is mostly memory-based too, not CPU-based.

Because modern C++ (and Rust) are almost entirely based around refcount++ and refcount-- (and if refcount==0, then call the destructor), the CPU usage of such calls is surprisingly high in a multithreaded environment. That refcount++ and refcount-- need to be synchronized between threads (atomics + memory barriers, or lock/unlock), which is slower than people expect.
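A rough sketch of what every shared_ptr copy/destroy does under the hood (simplified; the real control block also tracks weak counts):

```cpp
#include <atomic>

std::atomic<long> refcount{1};

void onCopy() {
    // The increment itself can be relaxed...
    refcount.fetch_add(1, std::memory_order_relaxed);
}

bool onDestroy() {
    // ...but the decrement needs acquire/release ordering so the
    // destructor observes every thread's writes. That synchronization
    // is the hidden cost that surprises people.
    return refcount.fetch_sub(1, std::memory_order_acq_rel) == 1;
    // true => last owner: run the destructor / free the object.
}
```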

Even then, C malloc/free isn't really cheap either. It's just that in C we can do tricks like struct Foo { char endOfStructTrick[0]; } and allocate with malloc(sizeof(struct Foo) + 255) (or whatever the size of the end-of-struct string is) to collapse multiple mallocs / frees into one, and otherwise abuse memory layouts for faster code.
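A sketch of that end-of-struct idiom (the classic "struct hack", shown with a 1-byte tail array; C99 spells it as a flexible array member, char name[]):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

struct Foo {
    int len;
    char name[1];   // overlaid onto the tail of a bigger allocation
};

// One malloc carries the header AND the variable-length payload,
// collapsing what would be two allocations (and two frees) into one.
static Foo *makeFoo(const char *s) {
    size_t n = strlen(s);
    Foo *f = (Foo *)malloc(sizeof(Foo) + n); // name[1] already holds the NUL
    f->len = (int)n;
    memcpy(f->name, s, n + 1);
    return f;
}

int main() {
    Foo *f = makeFoo("end-of-struct trick");
    printf("%d: %s\n", f->len, f->name);
    free(f);   // one free for header + payload together
}
```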

If you don’t use such tricks, I don’t think that C’s malloc/free is much faster than GC.


Furthermore, fragmentation is worse in C's malloc/free land (many GCs can compact and fix fragmentation issues). Once we take fragmentation into account, the memory advantage diminishes.

Still, C and C++ almost always seem to use less memory than Java and other GC languages, so the memory savings are substantial. But CPU-power savings? I don't think that's a major concern. Maybe it's just that CPUs are so much faster today that memory is what we practically care about.