Here’s how Cleary Gottlieb law firm uses genAI for pre-trial discovery and more
Thursday, October 17, 2024, 12:00 PM, from Computerworld
Corporate law is nothing like what you see on television. To prepare for a case, 150 attorneys might be tasked with traveling to remote warehouses to comb through tens of millions of documents gathering dust, or with tracking down amorphous electronic communications. It’s a process known as discovery.
For more than a decade, law firms have been using machine learning and artificial intelligence tools to help them hunt down paper trails and digital documents. But it wasn’t until the arrival two years ago of OpenAI’s generative AI (genAI) conversational chatbot, ChatGPT, that the technology became easy enough to use that even first-year associates straight out of law school could rely on it for electronic discovery (eDiscovery).

Today, you’d be hard pressed to find a law firm that hasn’t deployed genAI, or isn’t at the very least kicking the tires on its ability to speed discovery and reduce workloads. For all intents and purposes, no one practicing law today studied AI in school, which means it falls to firms to integrate the fast-evolving tech into their workplaces and to train young lawyers on matching AI capabilities to client needs while remaining accountable for its output. This is the essence of turning AI into a copilot for all manner of chores, from wading through data to analyzing documents to improving billing.

In that vein, longtime IT workers are no longer just on call for computer glitches and AV setups; they have moved to the forefront of running a law firm, handling AI’s role in winning cases, retaining clients, growing revenue and, inevitably, helping attract the best and brightest new talent.

Multinational law firm Cleary Gottlieb is a prime example. Cleary has been able to dramatically cull the number of attorneys used for pre-trial discovery and has even launched a technology unit and genAI legal service: ClearyX. (ClearyX is essentially an arbitrage play — an alternative legal service provider [ALSP] for offshoring eDiscovery and automating electronic workflows.)

While Cleary readily admits that genAI isn’t perfect at retrieving 100% of the documents related to a case or at always creating an accurate synopsis of them, neither are humans. At this point in the technology’s development, it’s good enough most of the time to reduce workloads and costs. Still, cases do pop up where customizing a large language model to suit specific needs can be more expensive than deploying those dozens of eager attorneys seeking to prove themselves.

Computerworld spoke with Christian “CJ” Mahoney, counsel and global head of Cleary’s e-Discovery and Litigation Technology group, and Carla Swansburg, CEO of ClearyX, about how the firm uses genAI tools. The following are excerpts from that interview:

Why is AI being adopted in the legal profession?

Mahoney: “Because the legal profession is seeing an explosion of information and data created by their clients, and it’s become increasingly challenging to digest that information strictly through a team of attorneys looking through documents. That explosion probably started two decades ago. It’s been growing more and more challenging.

“I just had a case where we were measuring the amount of data we were looking at, and for one case, we had 15 terabytes we had to analyze. It was over 50 million documents, and we had to do it in a matter of weeks to find out what we had to provide to the opposing party.

“Secondly, we wanted to find out what’s interesting in the documents and what supported our advocacy. Traditional ways of looking through that type of information and getting a grasp of the case are really not feasible anymore. You need to incorporate AI into the process for analysis now.”

Swansburg: “One of the big shifts with OpenAI and genAI, in particular, is that for the first time there’s ubiquity. Everyone’s hearing about it.
“Secondly, sophisticated clients are starting to approach it — even the formerly untouched Wall Street firms and other large firms with an eye on cost sensitivity.

“Fast forward to now. There’s a bit of an expectation that with the advent of genAI, things should be quicker and cheaper. Second of all, [there’s] the accessibility of AI through natural language processing. The third thing is the explosion of purpose-designed tools for the legal profession, and that does go back about a decade, when you had diligence tools and tools for contract automation.”

Christian “CJ” Mahoney, counsel and global head of Cleary’s e-Discovery and Litigation Technology group, and Carla Swansburg, CEO of ClearyX. (Image: Cleary Gottlieb)

How have the expectations of clients changed regarding the use of genAI?

Swansburg: “A year-and-a-half ago, we were getting messaging from clients saying, ‘You’d better not be using AI because it seems really risky.’ Now, we’re getting requests from clients asking, ‘How are you using AI to benefit me, and how are you using it to make your practices more efficient for me?’

“There are a lot of changing dynamics. Legal firms that were historically reluctant to embrace this technology are asking for it — ‘When can I get some of this generative AI to use in my practice?’”

How has the job of an attorney changed with genAI?

Swansburg: “Nobody went to law school to do this. I used to go through banker’s boxes with sticky notes as a litigator. Nobody wants to do that. Nobody wants to read 100 leases to highlight an assignment clause for you. The good thing is [genAI is] moving up the value chain, but it’s starting with things that people really don’t want to be doing anyways.”

Is genAI replacing certain job titles, filling job roles?

Mahoney: “I’d say we’re not at the place where it’s replacing entire categories of jobs. It’s certainly making us more efficient, such that if I would have needed a team of 60 attorneys on work I’m doing, I may need a team of about 45 now. That’s the type of efficiency we’re talking about.

“I had over 60 [attorneys] just this weekend working on just one case. It’s the big data explosion of evidence there is to comb through.

“We’re using more complex workflows using AI. I said I saw a 60-person to 45-person reduction. But on this kind of case, I would have had probably 150 attorneys doing this 15 years ago. Back then, it would just be like, ‘OK guys, here’s a mountain of evidence — go through it.’

“Now, we are using several AI strategies to help classify documents for what we need to turn over, to help narrow the amount of content we have to look over. It’s helping us to summarize before we even look at the documents, so that we have a summary going in to help us digest the information faster.”

Swansburg: “In my world, it’s not really replacing jobs yet, but it’s changing how you do jobs. So, it’s allowing people to move up the value chain a little bit. It’s taking away rote and repetitive work.

“Our experience has been — and we’ve kicked the tires on a lot of language models and purpose-designed tools — [genAI tools] are not good enough to replace people for a lot of the work we do. For something like due diligence…, you often must be right. You need to know whether you can get consent to transfer something. In other use cases, such as summarization and initial drafting, that sort of thing is a little more accessible.”

What does that big data you’re discovering look like? Is it mostly unstructured?

Mahoney: “Most of my data sets are unstructured.
“We’re talking about email and messages on someone’s laptop, or a portion of a document repository on a file server. These days, we’re talking about chats on platforms like Teams or mobile devices. Often, we’ll target those collections through good attorney investigations, but a lot of times we have unstructured data sources like mailboxes to comb through. What we’re doing there is using a large language model algorithm.

“We are reviewing some samples, some of them random and some of them with training approaches we developed to target documents we think will help the model understand what we’re trying to teach it quicker. We’re reviewing a few thousand documents to train the model to predict whether a document is responsive to the other [opposing] side’s document requests. We’re then running that model over millions of documents. We find that, through iterative model training and improvement processes, we are approaching and sometimes surpassing the type of performance we’d expect from that team of 150 attorneys looking at all these documents.

“So, we use that as our starting point and sometimes our only process for identifying what we need to deliver to the other side. But once we have that set, we are using similar processes to identify things like attorney-client privilege in the documents. And again, to identify which of these documents are interesting and useful for our advocacy.

“Now we’re also coupling that with generative AI workflows where, in addition to this training strategy, where we’ve identified small samples of the [document] universe, we’re also using prompt-based genAI queries on portions of the data set to find documents that support our advocacy.”

Have you found other uses for AI that you didn’t initially expect?

Mahoney: “We’re using genAI to look at files that we could have never used old-school keyword searches on, because they don’t have any text in them. They could be images or movies. We created a genAI process using some of the really new algorithms out there to analyze things like images and video files to find more interesting information.

“We’ve also created genAI workflows for when we claim attorney-client privilege; we have to create a whole attorney-client privilege log, and we’ve created genAI workflows to help us draft it. It’s the same concept as using genAI to summarize a document. We’re using it to summarize the privileged portion of a document, but summarize it in a way that meets our privilege log obligation without revealing what the privileged advice is.

“Then a lot of our human-in-the-loop practices are taking a look at those AI results and doing validation, making some improvements here and there, rather than relying entirely on the AI. The level of that validation depends on what the task is.”

AI has the tendency to go off the rails with errors and hallucinations. How do you address that?

Swansburg: “In CJ’s world, they work off of percentages — like 80% accurate. For us, largely we need to be 100% accurate. For a lot of what we do, whether it’s contract analysis and management or transactional diligence, we have a set of context materials. So, the potential for hallucinations is more limited. Having said that, some of the tools in the market will still hallucinate. So, you’ll say, ‘Find me the address of the leased property,’ and it’ll totally make something up.

“One of the key things we do, and some of the development work we’re doing, is to say, ‘Show me in the document where that reference is.’ So, there’s a quick and easy way to validate information.
“You’ve got a reference; you tell me what it says. You’re extracting a piece of it, so we have a really fast way to validate.

“For us, it’s always a discrete set of context documents. So, we can, first of all, solve it through prompting and tailoring it to whichever set of documents they want us to use; but, second of all, we’re always confirming there’s a way to ensure the provenance of the information.

“Some of the work we’re doing: we’ve developed a way to prompt a model to tell us when the termination date of an NDA is. If a person’s reading it, they can usually tell. But NDAs have an effective date and then they have a term that can be written in any number of ways: two years, three years, and then there are often continuing obligations.

“So if you just said, ‘When does this NDA terminate?’ a lot of AI models will get it wrong. But if you generate a way to say, ‘Find me the effective date, find me the clause, find me the period of time or continuing obligations,’ it’s typically 100% accurate. It’s a combination of focused context documents, proper prompt engineering and a validation process.”

Are you using retrieval-augmented generation (RAG) to fine-tune these models, and how effective has it been at that task?

Mahoney: “We are using RAG to put guardrails on how the large language model responds and what it’s looking at in its response. I think at times that’s certainly a helpful tool to use on top of the LLM.

“I’d also say, even though we are more aggressively using LLMs and genAI in the discovery space, the process Carla described looks exactly the same. The difference would be our tolerance for errors as part of that validation process.

“That’s comparing it to what human results would look like. What we find historically on various tasks in electronic discovery over several decades is that humans usually get things right about 75% of the time. So, when we’re looking at LLMs and genAI, we want to be careful that it’s working well, but we also want to be careful that we’re not holding it to too high a standard.

“If you’re writing a brief, 75% accuracy would be horrible and unacceptable. But when you’re looking through two million documents, that might be perfectly acceptable. That’s how the process looks a little different, even though the structure of the process looks the same in terms of steps.”

Small language models, as opposed to large proprietary models from Amazon, Meta, and OpenAI, are growing in popularity because you can create a model for every application need. What kinds of AI models do you use?

Mahoney: “We’ve actually been using open large language models for five years now. We started with what was the largest language model at the time, but it’s probably closer to a small language model now. We use a version of BERT a lot when we’re doing supervised learning.

“We are very LLM-agnostic, in that we’re able to look at the different tasks and see which one is right for a particular task. For image analysis, or the multimedia analysis, we’re using the latest and greatest, such as ChatGPT Omni. It’s unique in having capabilities for drafting [client-privilege] log lines. Depending on the data, we’re shifting between GPT-4 or GPT-3.5 Turbo.

“We’re actually looking at where we’re getting reasonable performance and comparing that to things like costs.”

Is price an issue you consider when adopting a model?

Mahoney: “Different LLMs have very different price points. For some of our data sets, the way GPT-3.5 Turbo is performing on log lines is actually quite good.
“So, we wouldn’t want to spend the extra money on GPT-4 there.

“On the small language model front, I’d say we’re doing tuning rather than a separate small language model for each application…. We’re taking an existing model — but where we have an industry that might look very different from what that model was built on — [and] we’re doing some fine-tuning on top of that to introduce the model to a dataset before it starts making predictions on it.”

So, essentially, some LLMs are better at some tasks than others?

Mahoney: “Some language models are better at certain tasks, in summarizing or pinpointing whatever it is. Ideally, you have a workflow with six steps and you’re using a different LLM at different steps. You never know who’s going to emerge tomorrow and be better at X or Y.

“We’ve been using OpenAI [LLMs] since before they were publicly launched. And we’ve been testing Meta and Claude and using the ones that we think make the most sense for a particular task.”

Data scientists and analysts, prompt engineers — what roles do you have, or have you added, to address your LLM needs?

Swansburg: “For the work CJ does, and the work we do, the larger the data set, the more the need for data scientists. So, he does work with data scientists on his side.

“On my side, in terms of prompt engineers, we have good software developers that can do that for you. We have people who are pure developers, and we have people who sit in the middle that we call ‘legal technologists.’ Those are the translators who take client and lawyer requirements and feed those back and do the customization to the platforms we build.

“We don’t have any data scientists yet, because we use discrete data sets. So it’s more about being able to engineer the prompts — and the team we have now has been able to do that on the developer side. As we grow, and right now we’re recruiting another half-dozen developers, we will get more nuanced and look for people with prompt engineering experience and experience building APIs with LLMs and other tools.

“So, it’s constantly changing.”

Are you mostly using proprietary rather than open-source models?

Mahoney: “Right now, we’re just using proprietary models and plugging them in and testing them — OpenAI being the more common example. We’re building things through prompts, like contract termination dates, to extract the data we need, and building bundles of questions that will be generated based on the automatic determination of what the system is ingesting. All of that is being tested now.

“Some of them are really expensive. Something like ChatGPT is very accessible. Even the enterprise models can do the trick, and they’re accessible and affordable.”

If legal departments and law firms were already using AI and ML, why is ClearyX needed?

Swansburg: “We’re trying to build a model that’s a lot less expensive than contract management software… and to have much higher quality than a lot of providers and provide a service.

“A lot of companies don’t have people to own and operate these programs. So, they have shelfware. They buy a contract lifecycle management tool, and it takes three years to get their return on investment; then people don’t use it because it’s not custom designed. So, we’re trying to build custom solutions for clients that work the way they work, and that are affordable.

“We’re not venture capital owned. We’re owned by the partnership, so we’re able to build things in the right way. We’re not just serving clients of the Cleary law firm; we also have a mandate to get outside clients.
“We started out thinking we weren’t going to be a development shop. We were going to use existing solutions and weave them together using APIs, but a couple of things happened. The tools on the market weren’t doing what we wanted them to do. We weren’t able to customize them in the nuanced way that made clients actually delighted to use them.

“The other is the ubiquity of AI, and the ability to customize these tools is way easier than it was three years ago. So, over the last eight months or so, we’ve been able to pivot to something that allows us to customize it more easily and collaborate with clients to figure out how they want it to work.”
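The workflows described above map onto familiar patterns; the short sketches that follow are not Cleary’s or ClearyX’s code, but they illustrate roughly what each pattern looks like in practice. First, the supervised review workflow Mahoney describes: attorneys review a few thousand sample documents, a model is trained to predict responsiveness, and that model is then run over millions of documents. The sketch substitutes a scikit-learn pipeline and invented placeholder documents for the BERT-family model and real case data the firm works with.

```python
# Minimal sketch of a "predictive coding" responsiveness classifier.
# The firm describes fine-tuning a BERT-family model; a TF-IDF plus
# logistic-regression pipeline stands in here to keep the example small.
# All documents and labels below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A few thousand attorney-reviewed samples would go here: (text, responsive?).
seed_docs = [
    "Re: Q3 pricing discussion with the counterparty",  # responsive
    "Lunch menu for the holiday party",                 # not responsive
]
seed_labels = [1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(seed_docs, seed_labels)

# The trained model then scores the full corpus (millions of documents in
# practice). Low-confidence documents can be routed back to attorneys for
# another labeling round, the iterative improvement Mahoney describes.
corpus = ["Draft term sheet attached for review", "Fantasy football standings"]
for doc, p in zip(corpus, model.predict_proba(corpus)[:, 1]):
    print(f"{p:.2f}  {doc}")
```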
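Swansburg’s point about decomposing the NDA question (effective date, term clause, continuing obligations, each tied back to the text) can be sketched as a single structured prompt. The client setup, model name, and prompt wording below are illustrative assumptions, not ClearyX’s actual stack.

```python
# Sketch of decomposed prompting for NDA termination dates: ask for the
# effective date, the term clause, and continuing obligations separately,
# each with a supporting quote, instead of one broad question.
# Model name and client configuration are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

nda_text = "...full text of the NDA under review goes here..."  # placeholder

prompt = f"""Using only the contract below, answer each item and quote the
exact sentence that supports it. If an item is not stated, say "not stated".
1. Effective date
2. Term clause (length of the term)
3. Continuing obligations that survive termination
4. Based on 1-3, the termination date

Contract:
{nda_text}"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[{"role": "user", "content": prompt}],
    temperature=0,        # deterministic extraction, not creative drafting
)
print(response.choices[0].message.content)
```

Asking for the supporting quote alongside each answer is what makes the “show me in the document where that reference is” validation step quick for a reviewer.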
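Mahoney’s description of RAG as a guardrail, constraining what the model looks at and requiring it to answer from a discrete set of context documents, follows the usual retrieve-then-answer pattern. The sketch below uses TF-IDF similarity as a simple stand-in for a production retriever and only builds the grounded prompt; the passages and question are invented.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve the most
# relevant passages from a fixed set of case documents, then constrain the
# model to answer only from that retrieved context and to cite its source.
# TF-IDF similarity stands in for a production embedding store.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "The lease for 12 Main St. commences on 1 March 2021.",
    "Assignment of the lease requires the landlord's prior written consent.",
    "Rent is payable quarterly in advance.",
]
question = "Can the tenant assign the lease without consent?"

vec = TfidfVectorizer().fit(passages + [question])
scores = cosine_similarity(vec.transform([question]), vec.transform(passages))[0]
top = [passages[i] for i in scores.argsort()[::-1][:2]]  # keep the top-2 passages

grounded_prompt = (
    "Answer using only the context below and cite the passage you relied on. "
    "If the context does not contain the answer, say so.\n\n"
    "Context:\n- " + "\n- ".join(top) + f"\n\nQuestion: {question}"
)
print(grounded_prompt)  # this prompt would then go to whichever LLM fits the task
```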
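Finally, being “LLM-agnostic” and weighing performance against price, as Mahoney describes, amounts to routing each workflow step to the cheapest model that is good enough for it. The mapping below is a toy illustration with placeholder model names, not the firm’s actual configuration.

```python
# Toy illustration of per-task model routing: cheaper models where they are
# good enough (e.g., privilege-log lines), larger or multimodal models where
# the task demands it. Model names are placeholders, not a real configuration.
TASK_ROUTES = {
    "privilege_log_line": "gpt-3.5-turbo",        # cheaper, "quite good" for this task
    "image_and_video_analysis": "gpt-4o",         # multimodal task needs a multimodal model
    "responsiveness_scoring": "fine-tuned-bert",  # supervised classifier, not a chat model
}

def pick_model(task: str) -> str:
    """Return the model configured for a task, defaulting to the cheaper option."""
    return TASK_ROUTES.get(task, "gpt-3.5-turbo")

print(pick_model("privilege_log_line"))
```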
https://www.computerworld.com/article/3564813/heres-how-cleary-gottlieb-law-firm-uses-genai-for-pre-...