The Immediate Future of AI in Law: An Overview of Natural Language Processing Algorithms

By Barry Wang, Daniel Lee Aniceto and Jacky Zeng

Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics concerned with the interaction between algorithms and human language. The role and characteristics of language fundamentally underpin the legal profession. Any algorithm capable of understanding, manipulating, and expressing language will have wide-ranging impacts on the legal profession. We will survey the development of NLP, then analyse the short- and long-term implications for the legal profession.

What is Natural Language Processing?

To better appreciate the implications of NLP, it is helpful to draw a distinction between code-driven and data-driven algorithms.[1] Code-driven algorithms use a form of decisional logic that is predictable and operates on a set of pre-programmed rules, often in an "if this then that" format. Data-driven algorithms, by contrast, make dynamic inferences by identifying complex patterns in data and creating their own rules and logic, which are often too complex and numerous to pre-program.[2] NLP falls into the latter category: it uses training data to develop complex models of language, and its capabilities have steadily developed over the past two decades.
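The distinction can be made concrete with a small Python sketch. Everything in it is an illustrative assumption rather than a description of any real product: the "risky clause" task, the hand-written rule, the four labelled clauses and the word-counting scheme are all hypothetical. The first function is code-driven (a fixed rule written in advance); the second pair is data-driven (the weights are inferred from labelled examples).

```python
from collections import Counter


def rule_based_is_risky(clause: str) -> bool:
    """Code-driven: a hypothetical hand-written 'if this then that' rule."""
    return "unlimited liability" in clause.lower()


def learn_word_weights(examples: list[tuple[str, bool]]) -> Counter:
    """Data-driven: infer a weight per word from labelled clauses.

    A word gains a point each time it appears in a risky clause and
    loses a point each time it appears in a non-risky clause.
    """
    weights: Counter = Counter()
    for clause, risky in examples:
        for word in set(clause.lower().split()):
            weights[word] += 1 if risky else -1
    return weights


def learned_is_risky(clause: str, weights: Counter) -> bool:
    """Score a new clause with the learned weights (unseen words count as 0)."""
    return sum(weights[word] for word in set(clause.lower().split())) > 0


# Hypothetical training data labelled by a human reviewer.
examples = [
    ("supplier accepts unlimited liability", True),
    ("customer may terminate without notice", True),
    ("invoices are payable within thirty days", False),
    ("notices are sent by email", False),
]
weights = learn_word_weights(examples)

new_clause = "vendor may terminate without cause"
print(rule_based_is_risky(new_clause))        # the fixed rule does not fire
print(learned_is_risky(new_clause, weights))  # the learned weights do
```

The hand-written rule only ever catches the phrase its author anticipated, whereas the learned weights generalise, crudely, to wording that was never written into a rule. Real NLP models learn far richer representations than word counts, but the division of labour is the same.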

Two decades ago, NLP was only capable of rudimentary language processing tasks and would not be described as intelligent or creative in any sense. The capabilities of these early models included splitting a continuous line of text into words, removing inflectional endings from words to recover the base word, and identifying nouns, verbs, and phrases within a sentence.
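The first two of these tasks can be sketched in a few lines of Python. This is a deliberately naive illustration under our own assumptions: the suffix list and the minimum stem length are hypothetical choices, and established stemmers handle many more cases. Identifying nouns and verbs requires a trained tagger and is left out.

```python
import re

# Hypothetical, deliberately short list of inflectional endings,
# checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    """Split a line of text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word: str) -> str:
    """Strip the first matching ending, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = tokenize("The tenant signed contracts.")
print(tokens)                           # ['the', 'tenant', 'signed', 'contracts']
print([naive_stem(t) for t in tokens])  # ['the', 'tenant', 'sign', 'contract']
```

Even this toy version shows why such processing was never mistaken for intelligence: it manipulates character strings without any notion of what the words mean.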

A decade ago, the capabilities of language models improved rapidly in parallel with increases in computing power and the availability of data. New and improved language models started incorporating lexical semantics, the ability to understand words in their context. This resulted in the development of capabilities such as Named Entity Recognition, which allows the model to identify places, people, and other categories of entity, and sentiment analysis, which can identify the emotions expressed in a piece of text. Language models employing these capabilities have proven accurate and are still widely used commercially today.

Fast forward to the present: NLP models can legitimately be described as intelligent and creative, and are capable of completing high-level language tasks at a human level. The capabilities exhibited by NLP models include summarising text, writing credible news articles and creative pieces, reading comprehension, and answering general questions on any subject matter. One way to measure and track an NLP model’s understanding of language is to have it complete a specifically designed reading comprehension test. On the widely used SQuAD dataset, modern NLP models have already surpassed human reading comprehension scores.[3]
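To give a sense of how such a test is scored, the sketch below compares a model's answer with a reference answer using exact match and token-level F1, the two kinds of score reported for SQuAD. It is a simplified illustration modelled on that style of metric, not the official evaluation script; the function names and the example answers are our own.

```python
import re
import string
from collections import Counter


def normalize(answer: str) -> list[str]:
    """Lowercase, drop punctuation and articles, and split into tokens."""
    text = answer.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, truth: str) -> bool:
    """True when the normalised answers are identical."""
    return normalize(prediction) == normalize(truth)


def f1(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred, gold = normalize(prediction), normalize(truth)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The High Court.", "high court"))
print(round(f1("the High Court of Australia", "High Court"), 3))
```

A claim that models have "surpassed" humans on such a benchmark therefore means that their averaged scores on these overlap measures exceed those of the human annotators, which is narrower than a general claim about understanding.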

In 10 years' time, it is likely that these NLP models and their capabilities will become widely used commercially. The legal profession will undoubtedly be impacted by NLP models that are able to summarise cases, answer broad legal questions, and generate legal arguments. We will explore these short- and long-term implications for the legal profession.

Current use of NLP 

The ever-increasing volume of documents involved in transactions calls for smarter systems to manage them, including during the discovery process in litigation. The most common use for NLP models in the legal profession at the moment is in document review and management.

A variety of companies, such as Legartis, offer AI-assisted programs that help sift through contracts by identifying relevant, irrelevant and problematic clauses. Currently, NLP models still require the involvement of a human legal professional to be effective. In the case of document review, all documents believed to be relevant are uploaded to the AI-powered tool. The tools are not yet sophisticated enough to categorise and review the documents without any context or examples. A sample dataset of documents is therefore required from the legal professional operating the software, who identifies which terms are to be considered relevant. From this training dataset, the NLP model learns what to look for and can sift through millions of documents, flagging only those relevant to the matter.
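The workflow described above (a lawyer labels a sample, the model learns from it, the model flags the rest) can be sketched as follows. This is a minimal illustration under our own assumptions and does not describe how Legartis or any other vendor's tool works: the word-level log-odds scoring with add-one smoothing, the zero threshold and the sample documents are all hypothetical, and class priors are ignored for simplicity.

```python
import math
from collections import Counter


def words(text: str) -> list[str]:
    return text.lower().split()


def train(labelled: list[tuple[str, bool]]) -> dict[str, float]:
    """Learn a log-odds weight per word from lawyer-labelled documents."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in labelled:
        counts[is_relevant].update(words(text))
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    size = len(vocab)
    return {
        w: math.log((counts[True][w] + 1) / (totals[True] + size))
        - math.log((counts[False][w] + 1) / (totals[False] + size))
        for w in vocab
    }


def flag_relevant(documents: list[str], model: dict[str, float],
                  threshold: float = 0.0) -> list[str]:
    """Return documents whose summed word weights exceed the threshold."""
    return [
        doc for doc in documents
        if sum(model.get(w, 0.0) for w in words(doc)) > threshold
    ]


# Hypothetical sample reviewed by the legal professional.
sample = [
    ("breach of the supply agreement", True),
    ("termination notice under the agreement", True),
    ("team lunch on friday", False),
    ("car park closed on friday", False),
]
model = train(sample)

unreviewed = ["notice of breach of agreement", "friday lunch menu"]
print(flag_relevant(unreviewed, model))
```

The sketch makes the dependence on the human reviewer visible: the model knows nothing about relevance beyond what the labelled sample tells it, so the quality of the flagged set is bounded by the quality of that sample.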

This process has already been incredibly effective in reducing delays and other inefficiencies in the workflow of various firms, and has received judicial approval, such as from Vickery J in McConnell Dowell Constructors (Aust) Pty Ltd v Santam Ltd (No 1).

NLP in the short term

Large law firms that already harness these technologies in their workflows are beginning to build in-house incubators to further develop the capabilities of NLP models, such as Allen & Overy's Fuse program, which helps explore, develop and test legal-tech products. Current technology can confidently handle predictable and standard decision-making. One example is Neota Logic, deployed by King & Wood Mallesons, which is able to determine whether a specific deal requires approval from the Foreign Investment Review Board.

However, in the short term, NLP has not reached commercial viability for many high-complexity language tasks. The main hurdle preventing AI technologies from achieving full autonomy in high-complexity tasks such as legal writing is the depth of understanding required and the limits on a model's ability to acquire it on its own from training data.[4] The requirements for a successful piece of legal writing are also vaguer than deciding whether documents fit certain criteria to be considered 'relevant' or 'requiring approval from a body'. Indeed, a recent National Taiwan University study attempted to fine-tune OpenAI’s Generative Pre-trained Transformer 2 (GPT-2), a language prediction algorithm impressively capable of coherent text generation, to generate patent claims.[5] Although the outcome was mixed, the researchers concluded that the reasonable level of success they achieved was promising given that these NLP models are still considered to be at an early stage of development in the deep learning field.

The future

In 2020, OpenAI released GPT-3, which has over 100 times more parameters than GPT-2 and is far and away the most powerful NLP transformer released to date.[6] GPT-3 could well provide the leap in complexity required to automate legal writing, and it would be unsurprising to see successful research models surface in the immediate future.

Further advances in NLP, such as deep reinforcement learning algorithms, combine the process of synthesising language with the aim of optimising strategies for achieving goals. The algorithm generates text until it reaches an optimal score on a pre-defined metric. In other words, the automation of legal writing is merely a matter of defining the metrics of ‘good’ legal writing.
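The generate-and-score loop can be illustrated with a toy Python sketch. It is not deep reinforcement learning: a reinforcement learning system would use the score to update the generator itself, whereas this sketch only samples candidates and keeps the best one. The metric (reward two required terms, penalise length), the sentence fragments standing in for a language model, and the target score are all hypothetical.

```python
import random

# Hypothetical metric: the terms a 'good' sentence is assumed to need.
REQUIRED_TERMS = {"breach", "damages"}


def score(text: str) -> float:
    """Pre-defined metric: one point per required term, minus a length penalty."""
    tokens = text.lower().split()
    hits = sum(1 for term in REQUIRED_TERMS if term in tokens)
    return hits - 0.01 * len(tokens)


def generate_candidate(rng: random.Random) -> str:
    """Stand-in for a language model: assemble a sentence from fixed fragments."""
    openings = ["The defendant", "The supplier"]
    middles = ["is liable for breach", "acted reasonably", "is in breach"]
    endings = ["and owes damages", "under the contract", "at all times"]
    return " ".join([rng.choice(openings), rng.choice(middles), rng.choice(endings)])


def search(target: float, max_steps: int = 200, seed: int = 0) -> tuple[str, float]:
    """Generate candidates until one reaches the target score or steps run out."""
    rng = random.Random(seed)
    best_text, best_score = "", float("-inf")
    for _ in range(max_steps):
        text = generate_candidate(rng)
        candidate_score = score(text)
        if candidate_score > best_score:
            best_text, best_score = text, candidate_score
        if best_score >= target:
            break
    return best_text, best_score


best_text, best_score = search(target=1.5)
print(best_text, best_score)
```

The sentence that wins is simply the one the metric rewards. A metric that counts terms and words will happily accept a sentence that is legally wrong, which shows how much work the definition of ‘good’ legal writing has to do.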

Implications

It is clear that developments in NLP algorithms enable the increasing automation of the legal industry. The next domino to fall will be legal writing, which encompasses the drafting of emails, advices, court documents, subpoenas and case notes, and which is currently performed by clerks and junior lawyers alike. Like cashiers with the arrival of self-checkout machines, law clerks and junior lawyers who are currently employed to do the highly repetitive tasks of drafting and reviewing will be swiftly replaced by algorithms which are more efficient and accurate.

This trend is likely to impact the larger corporate law firms, where a team of clerks and paralegals may be replaced with a single clerk overseeing and operating numerous legal software tools. As a result, we are likely to see the next generation of lawyers become proficient software users, which may lead to a flattening of the hierarchical structures of these multi-tiered law firms.

Nevertheless, legal practice is significantly more complex than mere legal writing, with legal reasoning, both judicial and in practice, carrying an intrinsically human factor. There is considerable discussion to the effect that this human nature of law acts as a limiting factor on the future implications of AI and automation. There is, after all, no ‘perfect’ legal argument, only a viewpoint which prevails in the current, ever-evolving context of legislation and precedent. As Allsop CJ wrote:

Law, being society’s relational rules and principles that govern and control all exercises of power, must have a character and form that is adapted to, and suited for, application to law’s human task.[7]

This has been echoed in the recent US Court of Appeals for the Second Circuit case of Lola v. Skadden,[8] where the Court accepted that document review of a kind that could be performed entirely by a machine is not practising law, as the reviewer “exercised no legal judgement whatsoever”.

As such, the role of solicitors and barristers in society is likely to remain. Indeed, some commentators have argued that the future of AI and legal labour will be complementary, with algorithms serving as tools which free lawyers from tedious tasks and enable them to pursue services which progress the legal market and society at large.[9]

Endnotes

[1] Hildebrandt, M. (2020). Code Driven Law. Scaling the Past and Freezing the Future (January 19, 2020).

[2] Ibid.

[3] Rajpurkar, Pranav, Robin Jia, and Percy Liang. "Know what you don't know: Unanswerable questions for SQuAD." arXiv preprint arXiv:1806.03822 (2018).

[4] Haney, Brian S., Applied Natural Language Processing for Law Practice (October 27, 2019), 2020 B.C. Intell. Prop. & Tech. F. (2020). Available at SSRN: https://ssrn.com/abstract=3476351 or http://dx.doi.org/10.2139/ssrn.3476351

[5] Jieh-Sheng Lee and Jieh Hsiang, “Patent Claim Generation by Fine-Tuning OpenAI GPT-2”, Department of Computer Science and Information Engineering, National Taiwan University.

[6] https://openai.com/blog/openai-api/

[7] Chief Justice James Allsop, ‘The Rule of Law is Not a Law of Rules’ (Speech, Annual Quayside Oration, 1 November 2018)

[8] Lola v. Skadden, Arps, Slate, Meagher & Flom LLP, No. 13-cv-5008 (RJS), 2014 WL 4626228, at *1–2 (S.D.N.Y. Sept. 16, 2014).

[9] Haney, above n 4.