Goldman Sachs has a new programming language for data jobs
Goldman Sachs is all about data. The firm is in the process of building itself a data lake (AKA a 'strategic repository for enterprise data') and is hiring for a data architecture team made up of data lake engineers, data-oriented technologists who interact with the lake, and data-oriented developers who create and manage the software that manages the data in the lake.
If this sounds like a big undertaking, it clearly is. And if you want to land one of these data jobs at Goldman, or anywhere else in finance, it might help to learn Goldman's in-house data coding language, known internally as PURE and now open sourced as the Legend Language.
PURE/Legend is a logical modeling language developed by Goldman to describe its data. It's used by the firm in conjunction with a system known until yesterday as Alloy. Alloy, the visual frontend, uses PURE to interrogate Goldman's databases and to generate models as anything from SQL to Java and JSON.
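To give a rough sense of what generating several outputs from one logical model can look like, here is a minimal Python sketch. It is not PURE syntax and not how Alloy is implemented; the names `Property`, `ModelClass`, `to_sql` and `to_json_schema`, along with the type mappings, are hypothetical and chosen purely for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str
    type: str  # hypothetical logical types: "String", "Integer", "Float"


@dataclass(frozen=True)
class ModelClass:
    name: str
    properties: tuple  # tuple of Property, kept immutable


# Assumed mappings from logical types to target-specific types.
SQL_TYPES = {"String": "VARCHAR(255)", "Integer": "INTEGER", "Float": "DOUBLE PRECISION"}
JSON_TYPES = {"String": "string", "Integer": "integer", "Float": "number"}


def to_sql(model: ModelClass) -> str:
    """Render the logical model as a SQL CREATE TABLE statement."""
    columns = ",\n".join(f"  {p.name} {SQL_TYPES[p.type]}" for p in model.properties)
    return f"CREATE TABLE {model.name} (\n{columns}\n);"


def to_json_schema(model: ModelClass) -> str:
    """Render the same logical model as a JSON Schema document."""
    schema = {
        "title": model.name,
        "type": "object",
        "properties": {p.name: {"type": JSON_TYPES[p.type]} for p in model.properties},
        "required": [p.name for p in model.properties],
    }
    return json.dumps(schema, indent=2)


if __name__ == "__main__":
    account = ModelClass("Account", (Property("id", "Integer"), Property("owner", "String")))
    print(to_sql(account))
    print(to_json_schema(account))
```

The point of the sketch is that the business concept ("Account") is defined once, and each output format is derived from that single definition rather than maintained by hand.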
Speaking at the FINOS Open Source Strategy Forum a year ago, Neema Raphael, global head of engineering and chief data officer at Goldman Sachs, explained how Legend works. "Say you want to use some data for analysis or to share some data with a broader team," said Raphael, "...and that comes from different sources, with different attributes and also has linkages to other datasets, what Alloy lets you do is to simply and consistently define those concepts as business concepts and to normalize it as self-service for users."
Also speaking at FINOS' forum, Pierre De Belen, head of the data model engineering team at Goldman and godfather of the Legend system, said the firm was using PURE/Legend to build a "conceptual graph of our information."
"We then map to the many databases storing our data using an advanced document schema adding constraints of transformation and derivation," said De Belen, adding that his team of 1,000 data modelers was busy modeling "pretty much all the information we have in the firm so that people can navigate it easily."
In other words, if you want a data modeling job at Goldman, you probably need to learn Legend/PURE. And if the Legend project takes off in the way Goldman hopes, the platform and the language could yet become the de facto standard in the banking industry.
By making Legend open source, Goldman intends to create APIs that will allow its clients to self-serve data and to build their own tools using Goldman's platform.
So what is Legend? FINOS describes it as "an immutable functional language based on the Unified Modeling Language (UML) and inspired by Object Constraint Language (OCL)." It has the advantage of speeding up data modeling so that it becomes usable in a trading environment. It also makes it much easier to add executable constraints, derivations, and model-to-model mappings. The guide to getting started on the language is here.
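For readers unfamiliar with those terms, the following Python sketch shows roughly what an immutable model with an executable constraint, a derivation and a model-to-model mapping might look like. It is an analogy only, not Legend syntax; the `Trade` and `RiskPosition` classes, their fields and the `trade_to_risk_position` function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: instances cannot be modified after creation
class Trade:
    trade_id: str
    quantity: int
    price: float
    trade_date: date
    settlement_date: date

    def __post_init__(self):
        # Executable constraints: checked every time a Trade is created.
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.settlement_date < self.trade_date:
            raise ValueError("settlement date cannot precede trade date")

    @property
    def notional(self) -> float:
        # Derivation: computed from other properties, never stored.
        return self.quantity * self.price


@dataclass(frozen=True)
class RiskPosition:
    position_id: str
    exposure: float


def trade_to_risk_position(trade: Trade) -> RiskPosition:
    # Model-to-model mapping: a pure function from one model to another.
    return RiskPosition(position_id=f"POS-{trade.trade_id}", exposure=trade.notional)


if __name__ == "__main__":
    trade = Trade("T1", 100, 2.5, date(2020, 10, 19), date(2020, 10, 21))
    print(trade_to_risk_position(trade))
```

In this sketch, an invalid trade (for example one with a quantity of zero) is rejected at construction time, which is the sense in which the constraints are "executable" rather than just documentation.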
This isn't the first time Goldman has built its own programming language. It's also got Slang, which underpins SecDB, and which, depending upon who you ask, is either a route to a great software career at Goldman or a career cul-de-sac. As a more contemporary language (Slang dates back to the 1980s) and one that's been open sourced, Legend is unlikely to suffer the same problem. Anyone currently working on extract, transform and load (ETL) systems that copy data into destinations (like data lakes) might want to familiarize themselves with Legend especially quickly: if Legend becomes the norm, many of these roles could become redundant.