I’ve started noticing articles and YouTube videos touting the benefits of branchless programming, making it sound like this is a hot new technique (or maybe a hot old technique) that everyone should be using. But it seems like it’s only really applicable to data processing applications (as opposed to general programming) and there are very few times in my career where I’ve needed to use, much less optimize, data processing code. And when I do, I use someone else’s library.
How often does branchless programming actually matter in the day to day life of an average developer?
Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!
Cross posting is strongly encouraged in the instance. If you feel your post or another person’s post makes sense in another community cross post into it.
Hope you enjoy the instance!
Follow the wormhole through a path of communities !webdev@programming.dev
Not that much - it’s useful when you have a very hot loop where branches can cause the branch predictor to guess wrong and have to rollback computation it already did unnecessarily.
This StackOverflow question explains it fairly well: https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
I understand the principles, how branch prediction works, and why optimizing to help out the predictor can help. My question is more of, how often does that actually matter to the average developer? Unless you’re a developer on numpy, gonum, cryptography, digital signal processing, etc, how often do you have a hot loop that can be optimized with branchless programming techniques? I think my career has been pretty average in terms of the projects I’ve worked on and I can’t think of a single time I’ve been in that situation.
I’m also generally aggravated at what skills the software industry thinks are important. I would not be surprised to hear about branchless programming questions showing up in interviews, but those skills (and algorithm design in general) are irrelevant to 99% of development and 99% of developers in my experience. The skills that actually matter (in my experience) are problem solving, debugging, reading code, and soft skills. And being able to write code of course, but that almost seems secondary.
Personally I try to keep my code as free of branches as possible for simplicity reasons. Branch-free code is often easier to understand and easier to predict for a human. If your program is a giant block of if statements it’s going to be harder to make changes easily and reliably. And you’re likely leaving useful reusable functionality gunked up and spread out throughout your application.
Every piece of software actually is a data processing pipeline. You take some input, do some processing of some sort, then output something, usually along with some side effects (network requests, writing files, etc). Thinking about your software in this way can help you design better software. I rarely write code that needs to process large amounts of data, but pretty much any code can benefit from intentional simplicity and design.
I am all aboard the code readability train. The more readable code is, the more understandable and therefore debuggable and maintainable it is. I will absolutely advocate for any change that increases readability unless it hurts performance in a way that actually matters. I generally try to avoid nesting ifs and loops since deeply nested expressions tend to be awful to debug.
This article has had a significant influence on my programming style since I read it (many years ago). Specifically this part:
Yep, that’s how I write my code too. I took a class in college, comparative programming languages, that really changed how I thought about programming. The first section of the class was Ruby, and the code most of us wrote was pretty standard imperative style code. If statements, loops, etc. Then we spent a month or so in Haskell, basically rewriting parts of the standard library by only using more basic functions. I found it insanely difficult to wrap my head around but eventually did it.
Then we went back and wrote some more Ruby. A program that might have been 20-30 lines of imperative Ruby could often be expressed in 3 or 4 lines of functional style code. For me that was a huge eye opener and I’ve continued to apply functional style patterns regardless of the language I’m using (as long as it’s not out of style for the project, or makes anything less maintainable/reliable).
Then one day a coworker showed us a presentation from Netflix (presentation was done by Netflix software engineers, not related to the service) and how to think about event handlers differently. Instead of thinking of them as “events”, think about them as async streams of data - basically just a list you’re iterating over (except asynchronously). That blew my mind at the time, because it allows you to unify both synchronous and asynchronous programming paradigms and reuse the same primitives (map/filter/reduce) and patterns in both.
This is far beyond just eliminating if statements, but it turns out if you can reduce your code to a series of map/filter/reduce, you’re in an insanely good spot for any refactoring, reusing functionality, easily supporting new use cases, flexibility, etc. The downside would be more junior devs almost never think this way (so tough for them to work on), and it can get really messy and too abstract on large projects. You can’t take these things too far and need to stay practical, but those concepts really changed how I looked at programming in a major way.
It went from “a program is a step by step machine for performing many types of actions” to “a program is a pipeline for processing lists of data”. A step by step machine is complex and can easily break down, esp when you start changing things. Pipelines are simple + reliable, and as long as you connect them up properly the data will flow where it needs to flow. It’s easy to add new parts without impacting and existing code. And any data is a list, even if it’s a list of a single element.
Do you recall what the presentation was called? I built a pipelined packet processing system (for debugging packets sent over an RF channel) which sounds like a fairly representative example of what you’re talking about, but it’s not obvious to me how to naturally extend that to other types of projects.
I don’t remember the presentation, but luckily I did remember the concept and here’s an article: https://netflixtechblog.com/reactive-programming-in-the-netflix-api-with-rxjava-7811c3a1496a
It’s called “reactive” programming and that article goes over some of the basic premises. The context of the presentation was in front-end (web) code where it’s a god awful mess if you try to handle it in an imperative programming style. React = reactive programming. If you’ve ever wondered why React took off like it did, it’s because these concepts transformed the hellish nightmare landscape of jquery and cobbled together websites into something resembling manageable complexity (I’m ignoring a lot of stuff in between, the best parts of Angular were reactive too).
Reactive programming is really a pipeline of your data. So the concepts are applicable to all sorts of development, from low level packet processing, to web application development on both the front and back end, to data processing, to anything else. You can use these patterns in any software, but unless your data is async it’s just “functional programming”.
I wonder how relevant this is to Go (which is what I work in these days), at least for simple data retrieval services. I can see how transforming code to a functional style could improve clarity, but Go pretty much completely eliminates the need to worry about threads. I can write IO bound code and be confident that Go will shuffle my routines between existing threads and create new OS threads as the existing ones are blocked by syscalls. Though I suppose to achieve high performance I may need to start thinking about that more carefully.
On the other hand, the other major component of the system I’m working on is responsible for executing business logic. It’s probably too late to adopt a reactive programming approach, but it does seem like a more interesting problem than reactive programming for a data retrieval service.