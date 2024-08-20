Goldman Sachs went against the grain when it published a report saying Gen AI had too much spend and no "killer application." The bank's CIO, Marco Argenti, has often lauded the potential of the technology, but in a recent interview, he hinted at another hurdle we've yet to jump regarding the tech: hallucinations.

"You can't just drop a model into an environment like Goldman," Argenti told Bloomberg's Odd Lots podcast, noting that even "0.1% inaccuracy is totally unacceptable." The firm had previously experimented with building its own models but decided its "time was spent much better using existing models." The issue, then, is that several studies suggest the technology is unreliable.

A paper from HKUST, last revised in July, noted that around a quarter of summaries generated by state-of-the-art language generation models contained hallucinated content in some form. Hallucination rates can reach as high as 41% for closed-domain tasks in GPT-3 without intervention. Noisy data is one of the main drivers of hallucinations, and banks often have more data than they know what to do with.

Refining the data used to train LLMs, therefore, is key. Argenti says Goldman has "standardized" the way it refines its data, using RAG (retrieval-augmented generation), which involves searching for data outside of the model's training data to accurately answer queries.
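
For readers unfamiliar with RAG, the idea can be sketched in a few lines of Python. This is a toy illustration only, not Goldman's setup: the function names, the word-overlap scoring and the sample documents are all assumptions made for the example, and a production system would typically use embedding-based search and pass the resulting prompt to an actual model.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, k=2):
    """Return up to k documents sharing the most words with the query.

    Word overlap is a stand-in (an assumption for this sketch) for the
    vector search a real RAG system would more likely use.
    """
    query_counts = Counter(tokenize(query))
    scored = []
    for doc in documents:
        doc_counts = Counter(tokenize(doc))
        overlap = sum(min(query_counts[t], doc_counts[t]) for t in query_counts)
        scored.append((overlap, doc))
    # Highest overlap first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Assemble a prompt that grounds the model in retrieved passages."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"- {p}" for p in passages) or "- (no relevant passages found)"
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say \"I don't know\".\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical internal documents, invented for illustration.
docs = [
    "The settlement cycle for US equities is T+1.",
    "The cafeteria opens at 8am.",
    "Bond coupons are paid semi-annually.",
]
print(build_prompt("What is the settlement cycle for US equities?", docs, k=1))
```

The point of the design is that the model is asked to answer from the retrieved passages rather than from memory, which is why the quality of the underlying data matters so much.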

This is not a surefire fix to the problem, however. A recent study from Stanford created the HaluBench benchmark, which tests for hallucinations when using RAG. It found that GPT-4o hallucinated in 12.1% of its responses (with a sample size of 500). GPT-3.5, which a Columbia study said was the most frequently used in a trading context due to "cost-effectiveness and lower latency", hallucinated in 27.8% of its responses. None of the 11 models tested scored over 90%. Banks, which have abundant data with which to train their models, could reach significantly lower hallucination rates with careful pretraining, but the study still highlights a glaring issue.

Hallucinations can sometimes be difficult to spot. Argenti says it's "almost impossible" to get an AI model to say it doesn't know something: "you're always going to get an answer." You can eventually get it to do so by gently convincing the model it's okay to say 'I don't know'... but such tribulations defeat the point of a productivity gain, and failing to spot hallucinations can be catastrophic from a data governance perspective.
