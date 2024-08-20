
The other problem Goldman Sachs faces with LLMs

by Alex McMurray

Goldman Sachs went against the grain when it published a report arguing that generative AI has attracted too much spending and still has no "killer application." The bank's CIO, Marco Argenti, has often lauded the technology's potential, but in a recent interview he hinted at another hurdle yet to be cleared: hallucinations.

"You can't just drop a model into an environment like Goldman," Argenti told Bloomberg's Odd Lots podcast, noting that even "0.1% inaccuracy is totally unacceptable." The firm had previously experimented with building its own models but decided the firm's "time was spent much better using existing models." The issue, then, is that many studies have found the technology to be unreliable.

A paper from HKUST, last revised in July, noted that around a quarter of summaries generated by state-of-the-art language generation models contained hallucinated content in some form. Hallucination rates can reach as high as 41% for closed-domain tasks in GPT-3 without intervention. Noisy data is one of the main drivers of hallucinations, and banks often have more data than they know what to do with.

Refining the data used to train LLMs, therefore, is key. Argenti says Goldman has "standardized" the way it refines its data, using RAG (retrieval-augmented generation), which involves searching for data outside of the model's training data to answer queries accurately.
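
For readers unfamiliar with RAG, the retrieval step can be sketched in a few lines of Python. This is a toy illustration only, not Goldman's system: the documents, the function names (`tokenize`, `retrieve`, `build_prompt`) and the word-overlap scoring are all assumptions made for the example, and the final call to a language model is left out. Production systems typically use learned embeddings and a vector index instead.

```python
import math
import re
from collections import Counter

# Hypothetical in-house documents; a real system would index far more text.
DOCUMENTS = [
    "The desk settles equity trades two business days after execution.",
    "Expense reports are due on the fifth business day of each month.",
    "Client onboarding requires a completed know-your-customer review.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return the top_k documents whose wording best matches the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, documents, top_k=1):
    """Put the retrieved text in front of the question for the model to use."""
    context = "\n".join(retrieve(query, documents, top_k))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


prompt = build_prompt("When are expense reports due?", DOCUMENTS)
print(prompt)
```

The model is asked to answer from the retrieved passage, not from memory, which is why the quality of the underlying data matters so much.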

This is not a surefire fix to the problem, however. A recent study from Stanford created the HaluBench benchmark, which tests for hallucinations when using RAG. It found that GPT-4o hallucinated in 12.1% of its responses (with a sample size of 500). GPT-3.5, which a Columbia study said was the model most frequently used in a trading context due to "cost-effectiveness and lower latency", hallucinated in 27.8% of its responses. None of the 11 models tested scored over 90% accuracy. Banks, which have abundant data with which to train their models, could reach significantly lower hallucination rates with careful pretraining, but the study still highlights a glaring issue.
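
A sample of 500 leaves some room for error around those percentages. As a rough guide, the sketch below computes a standard 95% margin (a normal-approximation interval) around each reported rate. It assumes the responses are an independent random sample and that the same sample size of 500 applies to GPT-3.5, neither of which is confirmed here; `wald_interval` is a name chosen for this example.

```python
import math


def wald_interval(rate, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    standard_error = math.sqrt(rate * (1 - rate) / n)
    return rate - z * standard_error, rate + z * standard_error


# Reported hallucination rates; n=500 for GPT-3.5 is an assumption.
reported = [("GPT-4o", 0.121), ("GPT-3.5", 0.278)]

for model, rate in reported:
    low, high = wald_interval(rate, 500)
    print(f"{model}: {rate:.1%} reported, roughly {low:.1%} to {high:.1%}")
```

Even at the low end of either range, the rates sit far above the 0.1% inaccuracy Argenti calls unacceptable.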

Hallucinations can sometimes be difficult to spot. Argenti says it's "almost impossible" to get an AI model to say it doesn't know something: "you're always going to get an answer." You can eventually get it to do so by gently convincing the model it's okay to say 'I don't know'... but such tribulations defeat the point of a productivity gain, and failing to spot hallucinations can be catastrophic from a data governance perspective.

