Yep, Longest Common Subsequence diffing is usually greedy, so what you get is the earliest set of matching lines that satisfies the search. That happens when the tool just treats a file as a flat list of lines and only matches those.
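
To make the "earliest set of lines" point concrete, here is a minimal sketch of a line-based LCS that takes the first match it can. The function name `lcs_pairs` and the sample lines are made up for illustration, and real diff tools use faster algorithms than this textbook table.

```python
def lcs_pairs(a, b):
    """Return (index_in_a, index_in_b) pairs for one LCS of two line lists.

    Hypothetical helper for illustration: a textbook dynamic-programming LCS
    whose walk takes a match as soon as one is available.
    """
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            # Greedy step: accept the earliest possible match.
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


old = ["}", "foo", "}"]
new = ["}", "bar", "}", "foo", "}"]
print(lcs_pairs(old, new))  # [(0, 0), (1, 3), (2, 4)]
```

The first `}` in `old` gets paired with the first `}` in `new`, even though pairing it with the second one is just as long a match and would keep the block together. Nothing in a line-only comparison knows which brace belongs to which block.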

You can get better results with more syntax or content awareness. Chunk into paragraphs or code blocks or functions, then sentences or statement lists, then lines, then words, etc. I think Beyond Compare can do this.
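
A rough sketch of the coarse-to-fine idea, using Python's standard `difflib`. It only does two levels (lines, then words) rather than the full paragraph/function/sentence hierarchy, the helper names `word_changes` and `refine` are invented here, and `SequenceMatcher` is not a strict LCS, so treat it as an illustration rather than how any particular tool works.

```python
import difflib


def word_changes(old_line, new_line):
    """Word-level differences between two lines (hypothetical helper)."""
    old_words, new_words = old_line.split(), new_line.split()
    sm = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return [
        (tag, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


def refine(old_text, new_text):
    """Diff by lines first, then re-diff changed lines by words."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()
    sm = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of lines on both sides: drop down to word level.
            for old_line, new_line in zip(old_lines[i1:i2], new_lines[j1:j2]):
                report.append(("line changed", word_changes(old_line, new_line)))
        else:
            # Pure insert/delete or uneven replace: report at line level.
            report.append((tag, old_lines[i1:i2], new_lines[j1:j2]))
    return report


old = "a\nint total = 0;\nb"
new = "a\nint count = 0;\nb"
print(refine(old, new))
# [('line changed', [('replace', ['total'], ['count'])])]
```

The same pattern extends upward: split into functions or paragraphs first, match those, and only run the finer diff inside chunks that were paired up.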

xedrak
11Y

8,000+ lines in a single file??? I’m going to be sick

Oh that’s not uncommon in the industry. Especially when dealing with legacy code.

Personal best was 40k lines in a file called misc.c containing all the global functions that don’t fit anywhere else.

Runner-up was the one where each developer dumped their miscellaneous functions in their own file so they didn't have to deal with merge conflicts. Which meant we had x1.c, x2.c, x3.c … etc.

YMS
11Y

Best I can offer is a combined UI and logic class with 12,500 lines currently. It started out with fewer than 3,000 lines in the year 2000 (using the brand new Java 1.3), grew to 14,000 over time, and survived our recent year-long, project-wide cleanup with only minor losses of code lines.
