How do you wrap your head around large established software projects in order to contribute to them?

@MonkderZweite@feddit.ch

That’s the point. Don’t contribute to software too difficult to understand.

Or at least start with easier ones.

@MidwayTheMagnificent@wayfarershaven.eu

I’m not sure I agree - your definition would almost certainly include the kernel, office suites, and a host of other things.

@Confused_Scallup@lemmy.ml

I’m a programmer for work and honestly feel the same. Everyone says I’m doing a good job but I feel like it takes me forever to understand a new code base. I can’t just read a program and understand it. I need to copy and paste bits and try to copy a feature to get my head around it. Like if there’s a button on the GUI then I follow it right the way through, creating my own button. But I don’t know if there’s a better way to learn than that.

Domi

When I come across pieces of code I don’t understand just by reading them I like to run them through ChatGPT and ask it what it does.

It does a really good job at explaining them and you can even ask follow up questions and it will go into more detail.

It’s essentially StackOverflow but nobody calls you an idiot for asking stupid questions.

Vhostym

I would be careful with this advice. If you are asking AI for an explanation of code, you may not have the experience to tell when it is correct and when it is confidently incorrect.

@catastrophicblues@lemmy.ca

Also be wary about sharing confidential code. At work I don’t use ChatGPT unless it’s for extremely general questions.

Domi

The good thing about code is that explanations can easily be followed up with a quick search in the documentation once you know the terms to look up.

But you are correct, as with everything related to ChatGPT, don’t let it bullshit you.

Domi

If the project does not properly document how to get started with contributing either in the readme or the contribute file, either ask if you really want to help out or don’t bother.

If contributions are wanted in larger projects they will have very thorough documentation on how to set up your development environment.

Always try to get the project running as is before trying to make any changes.

Next up, start small. Like, very small. Many projects have issues that are marked as “good first issue”; these are usually small changes that can be done even by newcomers.

As for syntactic issues, you have 2 options. Either take a closer look at the coding guidelines for the project and the language or use ChatGPT. Just tell it which language and libraries are involved and ask it what the following sample code does. You can ask follow up questions if anything is still unclear.

I use it primarily for C++ and C since I’m really bad at it and some C code looks like it summons Cthulhu.

bahmanm

I’m a software engineer by profession and passion and have been writing programs for well over 20 years now. I believe your experience is totally natural - at least I share the same feelings:

1. Large code bases take time getting to know and understand: most definitely true. It takes time and effort and is an investment you need to make before being able to feel confident. You don’t need to fully comprehend every aspect of the project before you can contribute but you sure need to have a decent enough idea of how to build, test, run and deploy a particular feature. See point (2).
2. Don’t let the size of the project intimidate you. Start small and expand your knowledge base as you go. Usually one good starting point is simply building the project, running tests and deploying it (if applicable.) Then try to take on simple tasks (eg from the project’s issue tracker) and deliver on those (even things like fixing the installation docs, typos, …) That’ll have the additional impact of making you feel good about the work that you’re doing and what you’re learning. I’m sure at this stage you will “know” when you’re confident enough to work on tasks which are a bit bigger.
3. During (1) and (2), please please do NOT be tempted to just blindly copy-paste stuff at the first sign of trouble. Instead invest some time and try to understand things, what is failing and why it is so. Once you do, it’s totally fine to copy-paste.

After all, there’s no clear cut formula. Each project is a living and breathing creature and “not one of them is like another.” The only general guideline is patience, curiosity and incremental work.

Herowyn

Point 3 is so important, not just for large open source projects but also for any project, from small to big, as a hobby or for your job.

Understanding the project will help a lot when fixing issues. You’ll more easily find the root cause of the issue instead of fixing the symptoms.

@verdigris@lemmy.ml

You don’t, you look at the outstanding issues and pick one that sounds at all intelligible.

The experience of jumping into a big unfamiliar project is very much like Robert De Niro parachuting into the middle of hideously complex pipework to surgically fix a single leak in the movie Brazil. Nothing around you makes sense at first, and all you can do is look exactly where you’ve been told the problem is, and stare at it hard enough to finally see the crack in the pipe. Then as time goes on you can start checking the other pipes, and eventually you might think “Hey, why are these apartments all connected in this order? Wouldn’t it be more efficient to just have a central tank that then supplies each unit?” And then maybe you try that out, and maybe it’s a notable improvement and other engineers applaud you for your contribution… Or maybe it was a dumb idea for reasons that you hadn’t yet grasped, in which case you get to learn why.

It can be a fun process, just give yourself time to explore. One of the best things about version control is the ability to just fuck things up in experimentation without actually damaging the project.

arc

This is me with the lemmy-ui repository lol

It’s a struggle. I’ve found an issue I’d like to work on, but now I’m struggling to get it running on my Windows PC with WSL… the front end just errored when installing dependencies because I am using WSL.

@Reva@startrek.website

Why not just use a GNU/Linux system?

@Efwis@lemmy.zip

That kinda follows the same question of why use Windows. For some people that’s all they know; others still believe in the myth that GNU/Linux is only for geeks that want to be or are power users. Then there is the other faction that goes along the lines that there is no compatible alternative to what they use or are used to. A lot of alt programs still have a learning curve that is mostly caused by workflow muscle memory.

arc

I was considering it on my laptop but not for now

Windows is just the OS I’m most familiar with

@m_randall@sh.itjust.works

It’s easy! Don’t. It’s not possible to do.

Focus on one small area instead of the whole project. If there isn’t a “beginner” ticket selection then find one (or give yourself a goal). Figure out where that code is and start playing around with it.

As you branch out and work on more and more tickets you’ll gain more and more experience. You’ll understand how different blocks and systems interact and gain a better overall understanding of the code base in general but you’ll never be able to keep everything in your head. It’s just not needed.

And I don’t think it’s been said yet but as a former vi guy a good IDE was a huge boost to productivity. Ease of navigation around the code, intelligent searching, etc really helped out in the exploratory phase.

OpenStars

Some thoughts:

If there is something you see that is missing - particularly documentation - then perhaps that is an excellent place to start? The older devs may have just been waiting for someone like you to come along and could be ecstatic to hear that you want to make that. Maybe they used/continue to work together in a company or are old friends or sth and did not need that, so you could break the project wide open, making it easier for everyone who comes after you, possibly also changing the very culture of the project and encouraging the more senior devs to write documentation as well, as they make new things or solidify an existing foundation before extending into new territory. And there are so many forms of documentation - Pre/Post conditions, listing dependencies/interactions, plus overall description of assumptions made - that even if some of that exists, the project could perhaps still benefit from adding more, especially from the perspective of a newer team member.

Do not neglect the “people” side of things - maybe try to connect to some of the more senior devs on Discord or wherever they are first? Like on the plus side they could give you pointers, tell you what you can ignore, send you links to documentation that would have been hard to find on your own, etc. Seriously: imagine spending 6 months writing documentation for an enormously-complicated aspect of the code (like a major, central class + all of its dependencies), only to see the entire thing discarded & replaced, and you find out only then that it was always intended that way from the start. (Still not a deal-breaker, b/c most of that “6 months” would be you learning stuff and getting familiar with generalities, so not entirely wasted, yet not entirely productive either if you could have been told to pick a different entry point into the project.) While on the minus side, if you see that they are just flat-out idiots, then you can abandon the project now and move on - that is a thing that can happen, and it is better to know ASAP than to only really be confronted by that a year or two in. :-(

Perhaps also consider your “fit” for the specific project. If you are good at many things, but not at the specific things involved there, then there will be a greater cost for you to work in that area, and you will spend more time “learning” and less time “contributing” (plus, how much time will people be willing to devote to helping you do the former, when you have done none of the latter yet?). Ngl, depending on the number and styles of languages involved - e.g. a script that calls an optimized C++ library that then feeds data into making an SQL query that uses a REGEXP into a database that has literally zero documentation anywhere… and so on - and your prior amount of experience with each of them, could take a good several YEARS to catch up, as only a side-project. Even if your expertise could help them - e.g. if you are great at UI/UX while the senior devs are more full-stack but almost exclusively focused on the back-end side - there is still the matter of you needing a way to deliver your contributions to them, i.e. understanding the existing codebase enough to be able to modify it to implement your ideas.

I hope this helps! :-)

@groucho@lemmy.sdf.org

I’ve been a dev for 20+ years and yeah, learning a new repo is hard. Here’s some stuff I’ve learned:

Before digging into the code:

get the thing running and get familiar with exercising it: test happy path, edge cases, and corner cases. We’re not even looking at code yet; we’re just getting a feel for how it behaves.
next up, see if there’s existing documentation. That’s not an end-all solution, but it’s good to see what the people that wrote the thing say about it.

Digging into the code:

grep is your very best friend. Pick a behavior or feature you want to try and search for it in the codebase. User-facing strings and log statements are a good place to start. If you’re very lucky, you can trace it down to a line of code and search up and down from there. If you’re unlucky, they’ll take you to a localization package and you’ll have to search based on that ID.
git blame is also your very best friend. Once you’ve got an idea where you’re working, use the blame feature on GitHub to tie commits to PRs. This will give you a good idea of what contributing to the project looks like, and what changes you’ll have to make for an acceptable PR.
unit tests are also a good method of stealth documentation. You can see what different areas of the code look like in isolation, what they require, and how they behave.
keep your own documentation file with your findings. The act of writing things down reinforces those things in your mind. They’ll be easier to recall and work with.
if there’s an official channel for questions / support, make use of it. Try to strike a balance here: you don’t want to blow them up every five minutes, but you also don’t want to churn on a thing for days if there’s an easy answer. This is a good skill to develop in general: knowing when to ask for help, knowing when an answer will actually be helpful, and knowing when to dig for a few minutes first.
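
A rough sketch of the "search for a user-facing string" step from the list above, for anyone who wants it scripted. This is not grep itself, just a small Python stand-in; the function name `find_string` and the `extensions` filter are made up for illustration.

```python
import os
from typing import Iterator, Tuple


def find_string(
    root: str, needle: str, extensions: Tuple[str, ...] = ()
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line under root containing needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so it is never searched.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            # An empty extensions tuple means "search every file".
            if extensions and not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # Unreadable file: skip it and keep going.
```

Something like `list(find_string("path/to/checkout", "Setup..."))` would then list every file and line that mentions the label; the path is a placeholder. If the only hit is in a translation file, the next search is for the ID you find there.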

There’s no silver bullet. Just keep acquiring information until you’re comfortable.

@astral_avocado@programming.dev

This is excellent advice and makes me feel less crazy…

@RustySharp@programming.dev

grep is your very best friend.

This. And also, in many cases, an ‘adjacent’ grep may help. Say you want to move the “OK” button on one screen. Searching for the string “OK” would be overwhelming as that would be all over the shop.

But you notice there’s a “Setup…” button next to it. Searching for that could potentially cut down your search results by orders of magnitude. The more obscure the text, the better.
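
One way to sketch the "adjacent grep" idea in Python: count how often each visible label appears in the tree and start with the rarest. The helper names `count_hits` and `rank_by_rarity` are hypothetical, and plain substring matching is an assumption; a real UI might build its labels differently.

```python
import os
from typing import Iterable, List, Tuple


def count_hits(root: str, needle: str) -> int:
    """Count the lines under root that contain needle."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so it is never searched.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            try:
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    count += sum(1 for line in handle if needle in line)
            except OSError:
                continue  # Unreadable file: skip it.
    return count


def rank_by_rarity(root: str, candidates: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (label, hit_count) pairs, rarest first, dropping labels with no hits."""
    counts = [(text, count_hits(root, text)) for text in candidates]
    found = [pair for pair in counts if pair[1] > 0]
    return sorted(found, key=lambda pair: pair[1])
```

Feeding it every label you can see on the screen, e.g. `rank_by_rarity("path/to/checkout", ["OK", "Setup..."])`, puts the most distinctive one at the top, and that is the one worth opening first.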

@groucho@lemmy.sdf.org

Yep! Good point.

@apollo440@lemmy.world

I faced a similar challenge a few years ago; I went from my own projects to a large policy management system at an insurance company with literally hundreds of thousands of source code files.

Of course it was extremely daunting at first, but what definitely helped me most was coming up with a strategy to find code where similar things had already been done.

This is easiest with a good IDE (search for aptly named classes, functions, etc.), and of course then asking somebody more experienced if you found the right bit of code. The last point cannot be overstated; most people love to show off their knowledge. If you come to someone with a question about a specific piece of code, they are usually extremely helpful. And importantly: if you don’t know who to ask, simply ask someone whom to ask! Again, make people show off their knowledge.

@midnightlightning@beehaw.org

It seems like you might be describing two different beasts, which could be part of your difficulty:

A codebase that has “dozens and dozens of classes and header files” sounds like a back-end project (written in C or similar), where the end product is an EXE or server app. A codebase where you’d help by updating “placement of a button” is a front-end project (written in HTML or JavaScript), where the output is HTML.

If you’ve cut your teeth contributing to front-end projects, you’ll likely feel more at home contributing to projects where the output is a website. There is a vast difference between working on a project that uses NextJS and contributing to the NextJS engine codebase itself. A project that uses a library you know would likely be much easier to contribute to than the library itself.

@Reva@startrek.website

It’s a front-end project written in C++; a desktop environment for Unix-like systems.

@OldMrFish@lemmy.one

I’ve been working with software for 15 years and still feel like this when faced with a new codebase - it simply doesn’t want to make sense to me. As others have stated, codebases are living things, and are as much a map of previous developers’ minds as they are about being functional. The older a project is, the more convoluted and obscure the structure becomes due to changes, adaptations, new features and changing contributors.

Some developers seem to enjoy making their code obscenely difficult to understand, either because it actually makes sense to them that way, or because it makes them feel smarter. These projects are better left alone for the sake of your own sanity. If you encounter dozens of header files, walk away. C and C++ are high-performance languages, and projects use them for a reason. If you have no experience with them, the result is very unlikely to make any sense to you.

I’ve also found it quite difficult to find any project small enough to help on. The large projects have many contributors, and any manageable bugs are quickly fixed, leaving only the stuff that no one wants to touch.

Is there some sort of hobby you enjoy, where an open source tool is (or could be) used? The more obscure the better! Having some prior understanding of the subject usually makes understanding the codebase a little easier.

@wicked@programming.dev

Some developers seem to enjoy making their code obscenely difficult to understand, either because it actually makes sense to them that way, or because it makes them feel smarter.

Be wary about this mindset. This type of explanation sets you up for conflicts with existing developers. Several times I’ve seen developers come into a team and complain about the code, creating conflicts that can last the entire working relationship for no good reason.

Much of the time the people who constantly work with code are already aware of the problems and may not be happy with it, but there’s no time or big benefit in improving working code. Or it’s complicated for good reasons which may not be immediately apparent (i.e. inherent complexity).

Here are a couple of benign reasons which probably will serve you much better.

It’s much more difficult and time consuming to make code that is easy to understand. Even in open source, there’s a limited amount of time to spend on any particular thing. This explanation is like a variation of the line often attributed to Twain (it goes back to Pascal), “I didn’t have time to write a short letter, so I wrote a long one instead”, or more abrasively Hanlon’s razor: “Never attribute to malice that which is adequately explained by ~~stupidity~~ time pressure”.
When writing the code, the developer has the entire context of his thought process available. You don’t have that, and that’s also the reason why your own code can make no sense a while later. Also it’s just much harder to read code than to write it.

@echindod@programming.dev

And sometimes coding habits are obtuse to people with different coding habits. These habits aren’t bad per se, but can be difficult to grok.

@OldMrFish@lemmy.one

While I agree with all of the above in principle (and even I have trouble reading my own code at times), this part was specifically in response to the section about ‘code optimized to irrecognizability’ and should not be taken as a general statement on finding other people’s code incomprehensible. Deliberately using non-descriptive naming is unfortunately a thing, although thankfully I rarely seem to encounter it anymore.

@zlatko@programming.dev

One thing to add that I haven’t seen is that for big projects, there’s often nobody that could understand it all. People either get their individual components or they understand how stuff interacts. It’s very rarely expected that new people in the project, even if very experienced, can just understand everything at once.

What you said that maintainers know every single fob is very frequently not the case at all! But since they get the big picture, they know in which part to look, and with their experience, they’ll know what to look for in that part, it may seem to you like magic. It’s not, it’s just experience.

Don’t get discouraged though!

Getting into big open source projects as a junior-level developer can be difficult, but often isn’t that hard - a lot of projects often need help and will take anything they can get. And if your experience already partially aligns with what you’re getting into, even better. If you reach out and are upfront about it, you’ll usually get pointed in the right direction.

Now, you seem to only have worked on your own, with smaller code bases. That means you haven’t had the problem of code organisation yet, and you can’t understand a solution if you don’t know what the problem is.

So how would you go about it?

My suggestion is to maybe get the 10,000 ft overview. Also, understand the project workflow. Projects usually have specific ways of doing things - how to build, test, run things. Try to figure out how to build and run the software on your own. If you make it, that’s a great step!

Then dig into one specific component/module/part. After a bit of study, you may be able to understand that component and find a simple thing that you can change about it. If you get this far you’re golden, you’re doing more than the majority of users of that software.

Now if you’re interested, you can dig more, or reach out to devs, saying what your experience is and how far you got, and ask them if you can help. And take it from there.

@draagon@infosec.pub

Some projects try to make this easy. Check out this list / tag: https://github.com/topics/good-first-issue

People who know the code can quickly determine if an issue will be easy. Some things that may seem easy will be difficult, so as an outsider it’s hard to guess. Working on issues tagged good-first-issue is the safest bet that the issue won’t be overly complex to solve and that the maintainers are willing to work with new contributors.
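
A small sketch for finding those issues in one specific project, assuming it is hosted on GitHub. GitHub’s REST search endpoint (`/search/issues`) accepts the same qualifiers as the search box on the site. The repository name below is a placeholder, and the label text varies by project: `good first issue` is GitHub’s default label, while some projects use `good-first-issue` instead. The function name is made up, and it only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode


def good_first_issue_search_url(repo: str, label: str = "good first issue") -> str:
    """Build a GitHub issue-search URL for open issues in repo carrying label."""
    # Same qualifiers you would type into the search box on github.com.
    query = f'repo:{repo} is:issue state:open label:"{label}"'
    return "https://api.github.com/search/issues?" + urlencode({"q": query})


# "example-owner/example-project" is a hypothetical repository name.
url = good_first_issue_search_url("example-owner/example-project")
```

Opening the resulting URL in a browser or fetching it with any HTTP client returns JSON; if the list comes back empty, try the project’s other spelling of the label before concluding there is nothing for newcomers.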

Lil' Bobby Tables

There’s a lot of great advice here already; I’ve got one more piece from trying to figure out GIMP a few years ago.

Read the old commit notes and implementations. Treat them like your favorite soap opera. Get emotional about it; have your heroes and villains.

By the time you’ve gone from 1998 to 2015, you’ll have a thorough understanding of what’s going through everybody’s heads when they look at the code.
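
If the history lives in Git, reading it from the beginning can be sketched like this. It assumes `git` is installed and on the PATH; the names `LOG_COMMAND`, `parse_log` and `oldest_commits` are made up for illustration.

```python
import subprocess
from typing import List

# --reverse lists the oldest commit first; the format is
# "short-hash date author: subject", one commit per line.
LOG_COMMAND = [
    "git", "log", "--reverse", "--date=short", "--format=%h %ad %an: %s",
]


def parse_log(output: str, limit: int) -> List[str]:
    """Return the first `limit` non-empty lines of git log output."""
    lines = [line for line in output.splitlines() if line.strip()]
    return lines[:limit]


def oldest_commits(repo_path: str, limit: int = 20) -> List[str]:
    """Return the `limit` oldest commit summaries of the repository at repo_path."""
    # The slicing happens in Python on purpose: git applies -n before
    # --reverse, so "git log --reverse -n 20" would return the 20 newest
    # commits in reversed order rather than the 20 oldest.
    result = subprocess.run(
        LOG_COMMAND, cwd=repo_path, capture_output=True, text=True, check=True
    )
    return parse_log(result.stdout, limit)
```

From there, `git show <hash>` on any entry that looks interesting gives you the full message and the diff, which is where the heroes and villains start to show up.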

@Reva@startrek.website

You know what, that sounds super enjoyable. I’m going to do that.

jamyang

Now that’s the spirit!
