The problem I see is mainly the divergence between hype and reality now, and a lack of a clear path forward.

Currently, AI is almost completely unable to work unsupervised. It fucks up constantly and is like a junior employee who sometimes shows up on acid. That’s cool and all, but it has relatively little practical use. I also don’t see how this will improve over time. With computers or smartphones, you could see relatively early on what the potential was, and the progression was steady and could be extrapolated somewhat reliably. With AI that’s not possible. We have no idea whether the current architectures will hit a wall tomorrow and stop improving. It could become an asymptotic process, where we need massive increases in resources for marginal gains.

Those two things combined mean we currently only have toys, and we don’t know if these will turn into tools anytime soon.
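To put a rough number on the “massive increases for marginal gains” worry: the sketch below assumes, purely for illustration, that error falls as a power law in compute, `error = k * compute ** (-alpha)`. Both the power-law form and the `alpha` values are assumptions, not measurements of any real model.

```python
def compute_multiplier(error_reduction: float, alpha: float) -> float:
    """Factor by which compute must grow to divide error by `error_reduction`.

    Assumes the hypothetical law error = k * compute ** (-alpha), so
    (c2 / c1) ** (-alpha) = 1 / error_reduction, i.e.
    c2 / c1 = error_reduction ** (1 / alpha).
    """
    if error_reduction <= 0 or alpha <= 0:
        raise ValueError("error_reduction and alpha must be positive")
    return error_reduction ** (1.0 / alpha)


if __name__ == "__main__":
    # Illustrative exponents only; smaller alpha means flatter returns.
    for alpha in (0.5, 0.1, 0.05):
        factor = compute_multiplier(2.0, alpha)
        print(f"alpha={alpha}: halving the error needs {factor:,.0f}x compute")
```

Under that assumption, a small exponent means each halving of the error rate costs orders of magnitude more compute. Nobody in this thread knows what the exponent is, or whether it stays constant.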

aiccount

Yeah, it’s a trajectory thing. Most people see the one-shot responses of something like ChatGPT’s current web interface on OpenAI’s website and think that’s where we are. It isn’t, though. The cutting edge of what is openly available right now is things like CrewAI or AutoGen running agents powered by models like Claude Opus or Llama 3, and maybe the latest GPT-4 update.

When you use agents, you don’t have to baby every response: the agents can run code, test code, check the latest information on the internet, and more. That way you can give a complex instruction, let it run, and come back to a finished product.

I say it is a trajectory thing because when you compare what was cutting-edge just one year ago (basically one-shot GPT-3.5) to an agent network with today’s latest models, the difference is stark, and when you go a couple of years further back to GPT-2, it is way beyond stark. When you go a step further and realise that lots of custom hardware is being built (basically LLM ASICs, which traditionally give a ~10,000x speedup over general-purpose GPUs), you can see that instant agent-based responses will soon be the norm.

All this compounds when you consider that we have not hit a plateau: better datasets and more compute are still producing better models. Not to mention that other architectures, like the state-space model Mamba, are making remarkable achievements with very little compute so far. We have no idea how powerful things like Mamba would be if they were given the datasets and training that the current popular models get.

Even agents suffer from the same problem stated above: you can’t trust them.

Compare it to a traditional SQL database. If the DB says that it saved a row or that there are 40 rows in the table, then that’s true. Databases do have bugs, obviously, but in general you can trust them.

AI agents don’t have that level of reliability. They’ll happily tell you that the empty database has all 509 entries you expect it to have. Sure, you can improve reliability, but you won’t get anywhere near the DB example.
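A minimal sketch of the contrast, using Python’s built-in `sqlite3`: the database’s own `COUNT(*)` is the ground truth, and an agent’s statement about the table is just a claim to check against it. The `entries` table and the `agent_claim` value are hypothetical stand-ins; no real agent is called here.

```python
import sqlite3


def actual_row_count(conn: sqlite3.Connection, table: str) -> int:
    """Ask the database itself how many rows the table holds."""
    # Table names cannot be bound as SQL parameters, so only accept
    # names that exist in the schema catalog before interpolating.
    known = {
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    if table not in known:
        raise ValueError(f"unknown table: {table!r}")
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def claim_holds(conn: sqlite3.Connection, table: str, claimed_rows: int) -> bool:
    """Compare a claimed row count against what the database reports."""
    return actual_row_count(conn, table) == claimed_rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, body TEXT)")

agent_claim = 509  # hypothetical: what an agent says it stored
print(claim_holds(conn, "entries", agent_claim))  # False: the table is empty
```

The check itself is trivial. The point is that it has to exist at all, because the agent’s report carries no guarantee on its own.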

And I think that’s what makes it so hard to extrapolate progress. AI fails miserably at absolutely basic tasks and doesn’t even see that it failed. Success seems more chance than science. That’s the opposite of how every technology before it worked: simple problems first, and once those are solved, you push towards the next challenge. AI, in contrast, is remarkably good at some highly complex tasks, but then fails at basic reasoning a minute later.

aiccount

I think having it give direct quotes and specific sources would help your experience quite a bit. I absolutely agree that if you just use the simplest forms of current LLMs and the “hello world” agent setups, there are hallucination issues and such, but a lot of this is no longer an issue when you get deeper into it. It’s just a matter of time until the stuff that most people can easily use has this baked in; none of it is impossible. I mean, I pretty much always have my agents tell me exactly where they get all their information from. The exception is when I have them write code, because there the proof is in the results.
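One mechanical piece of this can be sketched in a few lines: if an agent returns direct quotes, you can at least check that each quote really appears in the source text you gave it. This is a hypothetical helper, not part of CrewAI, AutoGen, or any other framework, and it only proves the quote exists, not that it supports the claim built on it.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(quotes: List[str], source_text: str) -> List[str]:
    """Return the quotes that do NOT appear verbatim in the source text."""
    haystack = normalize(source_text)
    missing = []
    for quote in quotes:
        needle = normalize(quote)
        # An empty quote would trivially match, so treat it as unverified.
        if not needle or needle not in haystack:
            missing.append(quote)
    return missing


source = "The cache is flushed\non every write.  Reads are never blocked."
quotes = ["flushed on every write", "reads are always blocked"]
print(unverified_quotes(quotes, source))  # ['reads are always blocked']
```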

And what is the result? Either you have to check whether the sources really say what the agent claims they do, or you don’t check them, in which case the whole thing is useless, since the agent might come up with garbage anyway.

I think you’re arguing on a different level than I am. I’m not interested in mitigations or workarounds. That’s fine for a specific use case, but I’m talking about the usage in principle. You inherently cannot trust an AI. It does hallucinate. And unless we get the “shroominess” down to an extremely low level, we can’t trust the system with anything important. It will always be just a small tool that needs professional supervision.

aiccount

This is an issue with many humans I’ve hired, though. Maybe they try to cut corners and do a shitty job, but I check occasionally, and if they are bad at their job, I warn them, correct them, and maybe eventually fire them. For lots of stuff, AI can be interacted with in a very similar way.

This is so similar to many people’s complaints about self-driving cars. Sure, accidents will still happen; they are not perfect, but neither are human drivers. If we hold AI to some standard that is way beyond people, then yes, it’s not there. But if we say it just needs to be better than people, then it is there for many applications, and more importantly, it is rapidly improving. Even if it were only as good as people at something, it would still be way cheaper and faster. For some things, it’s worth it even if it isn’t as good as people yet.

I have very few issues with hallucinations anymore. When I use an LLM to get anything involving facts, I always tell it to give sources for everything, and I can have another agent independently verify the sources before I see them. Oftentimes I provide the books or papers that I want it to source from specifically. Even if I am going to check all the sources myself after that, it is still way more efficient than if I did the whole thing myself.

The thing is, with the setups I use, I literally never have it make up sources anymore. I remember that kind of thing happening back in the days when AI didn’t have internet access and there really weren’t agents yet. I realize some people are still back there, but in the future (that many of us are in) it’s basically solved.

There are still logic mistakes and such, and that stuff can’t be 100% depended on. But if you have a team of agents going back and forth to find an answer, then pass it to another team of agents to independently verify the answer, and have it cycle back if a flaw is found, many issues just go away. Maybe some mistakes make it through this whole process, but the same thing sometimes happens with people.
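The propose, verify, cycle-back loop described above can be sketched roughly like this. The `propose` and `review` callables are hypothetical stand-ins for whatever agent teams you wire up; the toy versions below are deterministic stubs so the control flow can run without any model.

```python
from typing import Callable, Optional


def solve_with_review(
    task: str,
    propose: Callable[[str, Optional[str]], str],
    review: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Cycle between a proposer and an independent reviewer.

    `review` returns None when it finds no flaw, otherwise a description
    of the flaw, which is fed back into the next proposal.
    Returns None if no answer passes review within `max_rounds`.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        answer = propose(task, feedback)
        feedback = review(task, answer)
        if feedback is None:
            return answer
    return None


def toy_propose(task: str, feedback: Optional[str]) -> str:
    # Stub for an LLM call: wrong on the first try, corrected after feedback.
    return "5" if feedback is None else "4"


def toy_review(task: str, answer: str) -> Optional[str]:
    # Stub for a verifying agent: only accepts the known-correct answer.
    return None if answer == "4" else "expected 4"


print(solve_with_review("What is 2 + 2?", toy_propose, toy_review))  # 4
```

Note the `max_rounds` cap and the `None` return: the loop can still fail to converge, and the reviewer can be wrong too, so this reduces mistakes rather than ruling them out.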

I don’t have the link on hand, but there have been studies showing that GPT-3.5 working in agentic cycles performs as well as or better than GPT-4 out of the box. The article I saw that in was saying that, basically, there are already people using what GPT-5 will most likely be, just by running teams of agents with the latest models.



A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.


This community’s icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
