Artificial Intelligence in Drug Development

Introduction

The pharmaceutical and biotechnology industry is experiencing a technological revolution in drug discovery and development with the promise of artificial intelligence (“AI”).  AI has the potential to make the drug discovery and development process faster, more efficient and more accurate and, importantly, may enable researchers and companies to discover and deliver novel medicines for patients with unmet needs.

This revolution has been occurring over the past two decades but has accelerated dramatically in recent years with the development of more advanced technologies.  For example, the Food and Drug Administration (“FDA”) reported in its May 2023 discussion paper entitled “Using Artificial Intelligence & Machine Learning in the Development of Drug & Biological Products” (“Discussion Paper”) that, as of 2021, there were over 100 submissions for drug and biological product applications that included AI components.1 These submissions span a range of therapeutic areas, and the uses of AI they describe cover many aspects of the drug development process, including drug discovery, clinical trials, endpoint assessment and post-market safety surveillance.

In addition, the AI ecosystem in the pharmaceutical and biotechnology space has grown significantly, with a variety of AI-focused companies collaborating with and providing services to the industry.  The trend toward using AI in drug discovery and development has also been reflected in a series of recent notable mergers and acquisitions and partnering transactions.

Notwithstanding these recent trends and the potential for developing novel drugs in silico, there are a number of legal and commercial risks and limitations relating to using AI in drug discovery and development, particularly given that laws and regulations relating to the use of AI in major market jurisdictions are only beginning to be established and will evolve over years to come.

Accordingly, in this chapter, we provide an overview of some of the key legal and commercial considerations arising from the use of AI in drug discovery and development and discuss potential strategies for managing and mitigating associated risks.

What is AI?

Before we discuss how AI is being used in connection with drug discovery and drug development and the associated legal and commercial considerations, it is important to first define what AI is, given that the term has been used differently in various contexts.  For the purposes of this chapter, given our focus on the drug discovery and development process, we use a definition of AI that has been described by the FDA in its Discussion Paper.  More specifically, the FDA defines AI as a “branch of computer science, statistics, and engineering that uses algorithms or models to perform tasks and exhibit behaviors such as learning, making decisions, and making predictions.”  In addition, we note that the FDA defines machine learning (“ML”) as a “subset of AI that allows ML models to be developed by ML training algorithms through analysis of data, without models being explicitly programmed.”  Importantly, AI encompasses a range of technologies, which, in addition to ML, may include deep learning and neural networks, and uses predictions and automation to optimize and solve complex tasks.  A common characteristic of these technologies is that they rely on the iterative identification of patterns across large datasets without the need for explicit programming by humans.
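
As a concrete illustration of the distinction drawn above, the following minimal sketch (in Python, using the open-source scikit-learn library) shows a model whose decision rule is learned by a training algorithm from example data rather than being explicitly programmed.  The dataset and feature values are synthetic and purely illustrative.

```python
# Minimal sketch of the ML definition above: the decision rule is learned from
# example data by a training algorithm rather than being hand-coded.
# The data below are synthetic and illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)

# Hypothetical data: two numeric features per example and a binary "active" label.
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The training algorithm infers the relationship between features and label.
model = LogisticRegression().fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```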

How is AI being used in drug development?

AI is being used or has the potential to be used in virtually all aspects of the drug discovery and development process as well as post-market safety surveillance and manufacturing.  With respect to the early-stage aspects of drug discovery and development, AI may improve the identification, selection and prioritization of drug targets.  AI can be used to mine and analyze data from disparate sources, including genomic and proteomic data and data from scientific publications, to identify molecules of therapeutic interest for particular conditions, even in circumstances where the relevant disease pathway may be poorly understood.  Such molecules may in turn be investigated as potential drug candidates.

Similarly, AI may allow for more efficient screening of drug candidates, including by matching compounds to therapeutically relevant targets and predicting the physicochemical properties, bioactivity, toxicity and binding affinity of such compounds.  Given the capacity of AI to analyze large datasets, this analysis can be performed across libraries that include billions of compounds.  AI may also prove to be an essential tool for de novo drug design given its capacity to more accurately predict three-dimensional structures and protein folding characteristics.
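
By way of illustration only, the following simplified sketch uses the open-source RDKit toolkit to compute physicochemical descriptors for a small set of compounds and apply a rule-of-five-style filter; the compounds and thresholds are hypothetical stand-ins for the far more sophisticated ML-based property and affinity models described above.

```python
# Simplified descriptor-based screen; not a production screening pipeline.
# Compounds and thresholds are illustrative examples only.
from rdkit import Chem
from rdkit.Chem import Descriptors

library = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
    "octanol": "CCCCCCCCO",
}

for name, smiles in library.items():
    mol = Chem.MolFromSmiles(smiles)
    props = {
        "mol_wt": Descriptors.MolWt(mol),
        "logp": Descriptors.MolLogP(mol),
        "h_bond_donors": Descriptors.NumHDonors(mol),
        "h_bond_acceptors": Descriptors.NumHAcceptors(mol),
    }
    # Simple Lipinski-style rule-of-five filter as a stand-in for an ML-based
    # property or binding-affinity model.
    drug_like = (
        props["mol_wt"] < 500
        and props["logp"] < 5
        and props["h_bond_donors"] <= 5
        and props["h_bond_acceptors"] <= 10
    )
    print(f"{name}: drug-like={drug_like}, {props}")
```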

With respect to later stages of drug development, AI algorithms can be used to improve the efficiency of pre-clinical research as well as the accuracy of pharmacokinetic and pharmacodynamic models.  AI can also be used to more efficiently design and conduct clinical trials, including by helping to identify, recruit, select and stratify trial participants, identify trial sites, collect and analyze data, as well as detect safety issues.  Moreover, AI may help optimize product manufacturing and improve process controls, and regulators have also identified that AI may be used to efficiently identify and report adverse events as part of post-marketing safety surveillance.
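
As a simplified illustration of one of the clinical trial use cases noted above, the sketch below stratifies hypothetical trial participants into subgroups by clustering standardized baseline measurements using scikit-learn; the variables and data are synthetic, and the approach is illustrative rather than a validated stratification methodology.

```python
# Illustrative participant stratification from synthetic baseline data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=1)

# Hypothetical baseline data: age, biomarker level, disease severity score.
patients = rng.normal(loc=[55.0, 2.0, 40.0], scale=[10.0, 0.8, 12.0], size=(300, 3))

# Standardize features so no single measurement dominates the clustering.
scaled = StandardScaler().fit_transform(patients)
strata = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(scaled)

for label in sorted(set(strata)):
    print(f"Stratum {label}: {int((strata == label).sum())} participants")
```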

Given that the drug development process is highly capital-intensive and time-consuming, the broad spectrum of AI use cases, ranging from early-stage drug discovery through post-marketing safety surveillance, provides companies with compelling reasons to consider AI to create efficiencies and reduce development timelines and costs.

Key legal and commercial issues arising from using AI in drug development

Although the benefits and potential of AI-enabled drug discovery and development are clear, the use of AI also presents various legal and commercial risks that pharmaceutical and biotechnology companies should consider.  At the outset, it is important to emphasize that there is no “one size fits all” approach to identifying and managing the risks of using AI in drug discovery and development, particularly because the types and severity of risks that may arise depend on the applicable use case.  Moreover, the risk profile will depend on factors such as the type of data that was used to train, or is otherwise input into, any AI system and whether the relevant technology has been developed in-house or is being procured and managed by an external service provider.  Furthermore, the use of AI may also involve various ancillary transactions that raise specific issues, such as the licensing of large-scale datasets to use as training data.

Input risk

AI systems rely on very large volumes of data.  The use of data, both to train AI models and as the subject of analysis by such models, raises a number of legal and commercial issues.

First, training or other input data may include material that is protected by data privacy, intellectual property or other similar rights and laws.  The use of AI for drug development is likely to see companies rely increasingly on large datasets that include information about individuals’ genetic makeup or medical history.  These datasets may be used to, amongst other things, find patterns in the distribution of a particular disease across a particular genetic population.  To the extent not deidentified, these data may be subject to various laws concerning data privacy and security.  While the specific laws that will apply depend on both the type of information involved and the jurisdiction where that information is collected or processed, many data privacy regimes around the world impose additional obligations with respect to the collection, storage and use of this kind of sensitive, health-related information.  In addition to this increased regulatory burden, the storage of such sensitive information on an ongoing basis may also expose companies to increased liability and reputational risk in the event of a cyberattack or other security incident.
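
As a simplified illustration of one common technical safeguard, the sketch below replaces a direct identifier with a salted one-way hash (pseudonymization) before the record is used further; the field names are hypothetical, and this step alone would not constitute de-identification under any particular privacy regime.

```python
# Illustrative pseudonymization of a direct identifier via a salted one-way hash.
# Field names and values are hypothetical; this is not a complete de-identification
# methodology under HIPAA, GDPR or any other regime.
import hashlib
import secrets

SALT = secrets.token_hex(16)  # in practice, generated once and stored securely

def pseudonymize(identifier: str) -> str:
    """Return a salted SHA-256 hash of a direct identifier."""
    return hashlib.sha256((SALT + identifier).encode("utf-8")).hexdigest()

record = {"patient_id": "MRN-0012345", "diagnosis_code": "E11.9", "age_band": "60-69"}
pseudonymized = {**record, "patient_id": pseudonymize(record["patient_id"])}
print(pseudonymized)
```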

In certain scenarios, training data also may include material protected by intellectual property laws.  For example, scientific literature and other written materials used as training data may be protected by copyright, which subsists upon creation and independently of registration.  Furthermore, in circumstances where individual datapoints are not separately copyrightable, for example because they consist of unprotectable ideas or facts, databases including such data may be protectable as copyrightable compilations or, in certain jurisdictions, by specific database rights.  Copyright owners enjoy certain exclusive rights, including the exclusive rights to reproduce, distribute copies of and create derivatives from their copyrighted material.  Such rights may be infringed by, for example, cleaning or re-ordering such material to allow it to be used as training data.  Where relevant material is protected by copyright, companies seeking to use such material will generally either need to obtain a license from the owner or rely on a fair use defense.  While there is ongoing uncertainty, and a significant amount of pending litigation, around how the fair use doctrine applies to material used to train AI systems, whether the defense applies requires a context-dependent, case-by-case analysis, including consideration of, among other factors, the specific purpose for which the copyrighted material is being used.

Second, in addition to legal restrictions on the use of training or other input data, companies that license datasets to third parties may impose contractual limitations on how such data can be used.  Within the pharmaceutical and biotechnology industry, and particularly with respect to arrangements involving academic, research and other non-profit institutions, it is not uncommon for data or other assets to be licensed solely for the purposes of research and development or other limited uses.  Such limitations may also be applied to data licensed from research institutions for the purpose of training AI systems, such as where models are trained on data published in scientific literature.  In these circumstances, companies may be required to obtain a separate license from the relevant provider to use such datasets – or even models trained on such datasets – for commercial drug development projects.

Third, uploading data into an AI system owned and operated by an external service provider may give rise to confidentiality concerns.  Data sharing may prove to be an inescapable part of using AI for drug discovery and development; however, the type of information uploaded to such a platform will often be commercially sensitive, including by potentially revealing information about a company’s development strategy.  For example, the physicochemical properties or binding affinities a company is seeking to identify using an AI system may disclose the biological targets on which the company’s drug discovery program is focused.

Finally, the data used to train AI systems may contain biases or be inaccurate or incomplete.  For example, AI models trained on patient data may include data only from certain ethnic groups or geographical areas.  These limitations may compromise the utility of such models as tools for drug development, particularly if such limitations are unknown to the drug developer and unable to be adjusted for in the applicable model.
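
As an illustration of the kind of check a developer might run before relying on a dataset, the sketch below uses the pandas library to summarize how hypothetical training records are distributed across demographic groups; heavily skewed distributions would flag the representativeness risk described above.

```python
# Illustrative representativeness check on a synthetic training dataset.
# Column names and values are hypothetical.
import pandas as pd

training_data = pd.DataFrame({
    "ancestry_group": ["European"] * 850 + ["East Asian"] * 90 + ["African"] * 60,
    "region": ["North America"] * 700 + ["Europe"] * 250 + ["Asia"] * 50,
})

# Share of records per group; a heavy skew signals a potential bias in any model
# trained on these data.
print(training_data["ancestry_group"].value_counts(normalize=True).round(3))
print(training_data["region"].value_counts(normalize=True).round(3))
```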

Output risk

There are also important legal and commercial issues to consider involving the output of AI systems, particularly in situations where pharmaceutical and biotechnology companies partner with external technology providers.

In the pharmaceutical and biotechnology industry, as in other industries in which intellectual property rights are of critical importance, a key threshold issue is the extent to which patent or other intellectual property rights will subsist in the output of an AI system.  There is currently significant uncertainty around the patentability of such output, which in turn creates uncertainty in how rights can be allocated between drug developers and their partners.  Courts in jurisdictions around the world have recently been required to apply traditional patent laws to inventions and material developed by, or with the assistance of, AI.  At the same time, policymakers have begun to consider the extent to which these laws should be reformed to account for AI.

The traditional legal view in the U.S. is that only a natural person can be an inventor for the purposes of patent law.2 In April 2023, the U.S. Supreme Court declined to review a 2022 decision of the U.S. Court of Appeals for the Federal Circuit confirming that an AI system cannot be listed as the inventor on a U.S. patent application.3  Importantly, as patent rights are contingent on registration, under current U.S. law, no patent rights arise in an AI-created invention, notwithstanding that such an invention would be entitled to patent protection if created by a human inventor.  Other jurisdictions, including the United Kingdom, have taken a similar approach when applying existing patent laws to AI-generated inventions.4

However, these decisions do not provide any meaningful guidance on the patentability of inventions generated through a combination of human ingenuity and AI.  It is reasonable to expect that most inventions made in the context of AI-enabled drug development would fall into this category.  For example, where an AI system is used to screen a compound library for potential drug candidates, one or more humans may be required to formulate the parameters of the screening process or otherwise provide the prompts that direct the AI model.  In these scenarios, the key issue is likely to be whether the degree of human involvement in the invention was sufficient to mean that there is an inventor that is a natural person.  If there is a human inventor, then such inventor can form the basis of any patent application and, importantly, assign or otherwise transact for any related patent rights.

There is little judicial guidance yet as to the degree of human involvement necessary to establish human inventorship.  However, on February 12, 2024, the U.S. Patent and Trademark Office (“USPTO”) issued guidance confirming that, based on existing law (including the requirements for inventorship discussed above), each claim of a patent requires a human inventor, joint inventor, or co-inventor who has “significantly” contributed to the claim’s conception.5  Further, it is likely that similar issues will arise in other areas of intellectual property law, which may help inform the approach taken by courts with respect to patentability.  For example, the U.S. Copyright Office recently declined to find that the inputting of text prompts into the Midjourney generative AI system was sufficient for the person who authored such prompts to be deemed the creator of the images generated by Midjourney for the purposes of U.S. copyright law.6  It is not difficult to foresee similar issues arising in the context of AI-enabled drug discovery and development.  Whether courts take a similar approach to that of the U.S. Copyright Office remains to be seen, particularly given that courts in other jurisdictions – notably the People’s Republic of China – have taken a more permissive approach to AI authorship.7

The uncertainty around the patentability of AI-enabled inventions is likely to continue.  Even if a body of judicial decisions addressing this issue emerges, those decisions are likely to be fact-specific and may be superseded over time given the rapid pace of technological advances in AI.  While the USPTO’s February 2024 guidance – and subsequent USPTO practice – may help provide some clarity, we expect that the factors in the U.S. that ultimately guide companies on the extent to which AI-generated inventions may be patentable will likely evolve through case law (absent new legislation clearly addressing the topic).

To the extent the output of AI systems is protectable, the question arises as to which party owns, or otherwise has rights in, such output as between a third-party AI service provider and the service recipient.  Ownership of, and rights to, output will generally be determined by the contractual terms agreed between such parties.  These terms may cover matters such as which party has the right to publish any output data and which party owns any intellectual property rights arising from the use of such data.  Importantly, even where the output may not be patentable or protected by other intellectual property rights, contractual terms may nonetheless impose restrictions on, or grant permissions around, how such output can be used.  For example, platform providers may seek to reserve certain rights, including the ability to use output data, or data input into the model by a user, as training data moving forward.

More generally, it may be difficult to identify inaccuracies or other errors in the output of AI systems.  Such errors may arise for various reasons, including because of the training data limitations discussed above.  Many AI systems also lack explainability, meaning the model cannot articulate the reasons for its behavior, including why one compound was identified instead of another.  As a result, it may be difficult for companies to identify potential limitations in drug candidates assessed as promising by an AI system absent further research.
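
One partial mitigation is the use of model-agnostic interpretability techniques.  The sketch below illustrates permutation feature importance using scikit-learn on a synthetic dataset; it estimates how much each input feature contributes to a model’s predictions, although such techniques do not fully resolve the explainability concerns described above.

```python
# Illustrative model-agnostic explainability check via permutation importance.
# Features and data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(seed=2)

# Hypothetical compound features; only the first two actually drive the label.
X = rng.normal(size=(400, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=2).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=2)

# Higher values indicate features the model relies on more heavily.
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: importance {importance:.3f}")
```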

Finally, as with input data, there is also a risk that material generated by AI systems, and the subsequent use or other exploitation of this material, may infringe or otherwise violate third-party intellectual property rights.  Several commercially available large language models include technical measures designed to avoid model output incorporating material that infringes copyright.  However, these safeguards may not be effective, or may not be included in more specialized systems.  In addition, model output may include material protected by other intellectual property laws, for example a patented compound.  While AI system developers and providers have been the main targets thus far in litigation by intellectual property rights owners over the creation and use of AI system output, users of AI systems are not insulated from liability for infringement; and although some AI system providers may be willing to grant an indemnity to their users, those contractual rights, and the protection they offer, may be limited.  Moreover, AI system providers may instead require indemnification from their users, which can further heighten the risk associated with the use of AI system output.

Regulatory risk

As the use of AI increases, governments in various jurisdictions are turning their attention to regulating the technology, which may create additional compliance risks for pharmaceutical and biotechnology companies.  The European Union is currently finalizing its Artificial Intelligence Act (“AI Act”), which is expected to provide a comprehensive, risk-based legal framework for AI systems.

While the prospects of holistic, federal AI regulation in the U.S. remain uncertain, individual agencies may take a more proactive approach, particularly where use cases raise safety or national security concerns.  For example, President Biden’s recent Executive Order, signed on October 30, 2023, directed government agencies to further study the risks of using AI in synthetic biology and to impose new funding conditions for synthetic nucleic acid procurement.  The lack of holistic regulation in the U.S. also may result in the EU’s AI Act operating as a de facto global standard in a similar manner to the EU’s General Data Protection Regulation.  Nonetheless, there remains significant ongoing uncertainty about how such AI-specific regulations will be interpreted and applied by regulators and courts, and how existing legal frameworks, including intellectual property, data privacy and product liability laws, will be applied to AI systems.

As discussed above, the FDA and other similar regulatory authorities have also recognized the increasing role of AI in drug development.  However, the regulatory framework governing AI in this context is still being developed, particularly compared to recent efforts to provide guidance around the use of AI as part of medical devices (such as the Good Machine Learning Practice guiding principles jointly developed by the FDA, the United Kingdom’s Medicines & Healthcare products Regulatory Agency and Health Canada).8  In May 2023, the FDA released its Discussion Paper outlining its experience with AI to date and requesting stakeholder feedback on different approaches that could be used to provide regulatory clarity around the use of AI in drug development.  The paper notes that AI-led drug development may involve unique risks, with the FDA appearing particularly concerned about explainability problems, such as those discussed above, and the risk of AI amplifying errors and biases in training data.  Ultimately, we expect that any regulatory framework regarding the use of AI in drug discovery and development will be grounded in ensuring the safety and efficacy of approved medicines.  In that regard, the FDA has indicated it plans to develop a regulatory framework for the use of AI in drug discovery and development in a manner that promotes innovation while also ensuring patient safety.9

Risk management and mitigation strategies

AI and the risks associated with such technology are likely to become an inescapable part of economic activity, including in connection with drug discovery and development.  Nonetheless, there are various mitigation and risk management strategies that pharmaceutical and biotechnology companies can deploy to navigate the legal and commercial issues inherent in using AI as part of drug discovery and development.  Determining which strategies are most appropriate will depend on the specific way the AI system is being exploited.  Accordingly, we conclude by setting out some high-level considerations for users of AI.

Governance

Good governance is essential for companies seeking to deploy AI systems, including for the purposes of drug development.  Companies should develop and maintain clear policies around the use of AI, including policies governing how they engage third party providers of such technology.  In addition, companies should put in place appropriate reporting and supervisory structures to allow boards and senior leaders to have oversight of key AI projects and how this technology is being applied across their businesses more broadly.  Importantly, appropriate governance structures should also form part of any joint governance protocols agreed as part of any contractual arrangements that involve use of AI.  These may be particularly important in partnerships between established pharmaceutical or biotechnology companies, on the one hand, and AI-focused “techbio” startups or other technology companies, on the other hand.

Due diligence

Companies procuring access to AI systems developed by external service providers should seek to understand as much as possible about how such models were developed and trained.  This is important for two key reasons.  First, it places companies in the best position to identify any significant errors or biases which may compromise models and adversely affect their return on investment.  Second, it allows companies to better assess the legal and reputational risks that may flow from partnering with a particular provider.  For example, companies should satisfy themselves that AI service providers have acceptable policies and controls to ensure that the ingestion of training data does not violate any third-party intellectual property or data privacy rights, including by understanding the extent to which any service provider is relying on fair use in connection with any training data protected by copyright.  Companies, whether licensing-in an AI system or working with a service provider, should also review any applicable open source software licenses to ascertain whether so-called “copyleft” provisions apply and to ensure such licenses do not impose any unacceptable restrictions on their use of the AI system or any output.  In addition, companies should satisfy themselves from a technical perspective that any open source software does not create security concerns.  Furthermore, where a company is uploading confidential or other regulated data to an AI service provider, it should undertake sufficient diligence around the provider’s and its key vendors’ cybersecurity practices.

Similarly, companies in-licensing datasets for use with AI systems should also seek to understand how such data were collected and what consents or other permissions were obtained.  Again, this is particularly important where datasets include personal or other sensitive information.  Further, companies should seek, where feasible, to license such datasets for both research and development and commercial uses, as the costs of obtaining a commercial license may increase significantly once it becomes apparent that the license relates to a promising drug candidate.

To the extent that service providers continue to update their AI systems, companies may also wish to seek audit rights to allow them to undertake diligence on any material platform updates or any material changes to how the model is trained.

Defining rights and obligations

AI service providers and users should clearly agree, and written agreements should clearly delineate, their respective rights in the output of any AI system.  Users should seek to include appropriate restrictions, including stringent confidentiality obligations, around any sensitive information used as input data.  This is essential to preserve the user’s trade secrets and may also play a role in preserving the patentability of inventions created using AI.  Companies should also consider what, if any, rights service providers should have in any model output.  Finally, relevant agreements should also set out the parties’ respective liabilities for the various intellectual property infringement and other risks that can arise when using AI systems.  For example, several large AI service providers currently agree, subject to certain terms and conditions, to defend their enterprise customers against copyright infringement claims resulting from the use of output generated by their platforms.  The scope and availability of these kinds of protections are likely to evolve as the treatment of AI under intellectual property laws becomes clearer.

However, it is also important to note that contractual protections may not completely de-risk the use of AI.  Even where a service provider agrees to take on the financial risk of any intellectual property claims, intellectual property proceedings against the service provider or its customers may nonetheless interrupt companies’ use of AI systems (for example, if injunctive relief is obtained) and cause reputational harm, and the service provider may not have the financial ability to stand behind its indemnity obligations.  This reinforces the importance of robust diligence to give pharmaceutical and biotechnology companies a sufficient understanding of their overall risk exposure.

Both users and providers of AI systems should also consider patentability issues as part of their partnership arrangements and consider including protocols to maximize the chances that valuable output may be patentable, including by documenting the input of natural persons into the invention process.  In addition, given the ongoing legal uncertainty around patentability, companies should consider fully utilizing other mechanisms to protect valuable inventions, including trade secrets and regulatory exclusivity.

Validating results

Separate from the diligence process, companies seeking to develop in-house AI capabilities for drug development, or partner with external providers, should also implement appropriate validation mechanisms.  The lack of explainability inherent in AI systems means that it can be difficult to assess whether such systems are working as intended and, more importantly, whether such systems are providing value.  Even if a particular compound is identified as an attractive drug candidate by an AI system, further research will be required to ascertain the drug’s characteristics, effectiveness and safety.  Accordingly, implementation of appropriate validation protocols is necessary for any development program using AI.
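
As a simplified illustration of such a protocol, the sketch below compares an AI system’s hypothetical activity predictions for a set of compounds against subsequently generated assay results using standard classification metrics; the figures are invented for illustration.

```python
# Illustrative validation of model predictions against laboratory results.
# Prediction and assay values are hypothetical.
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# 1 = predicted/confirmed active, 0 = predicted/confirmed inactive.
model_predictions = [1, 1, 1, 0, 1, 0, 0, 1, 0, 1]
assay_results = [1, 0, 1, 0, 1, 0, 1, 1, 0, 0]

print("Confusion matrix:\n", confusion_matrix(assay_results, model_predictions))
print("Precision:", precision_score(assay_results, model_predictions))
print("Recall:", recall_score(assay_results, model_predictions))
```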

Conclusion

The potential impact of AI on drug discovery and development is profound.  AI promises to help pharmaceutical and biotechnology companies identify and design novel medicines, and to more efficiently bring such products to market.  However, the increasing use of AI raises various legal and commercial risks, many of which stem from the current legal and regulatory uncertainty around the use of this technology.  Accordingly, pharmaceutical and biotechnology companies will need to continue to deploy risk management strategies to navigate these uncertainties and optimize the potential benefits of AI to their drug discovery and development activities.

Footnotes:

1 U.S. Food & Drug Administration, Using Artificial Intelligence & Machine Learning in the Development of Drug and Biological Products – Discussion Paper and Request for Feedback (May 2023).  Available at: https://www.fda.gov/media/167973/download.

2 See, e.g., Beech Aircraft Corp. v. EDO Corp., 990 F.2d 1237 (Fed. Cir. 1993).

3 Thaler v. Vidal, 43 F.4th 1207 (Fed. Cir. 2022).

4 See, e.g., Thaler v. Comptroller-General of Patents, Designs and Trade Marks [2023] UKSC 49.

5 U.S. Patent and Trademark Office, Inventorship Guidance for AI-Assisted Inventions (Feb. 13, 2024).  Available at: https://www.federalregister.gov/documents/2024/02/13/2024-02623/inventorship-guidance-for-ai-assisted-inventions.

6 U.S. Copyright Office, Registration Decision on Zarya of the Dawn (Feb. 21, 2023).  Available at: https://www.copyright.gov/docs/zarya-of-the-dawn.pdf.

7 Li v. Liu, Beijing Internet Court Civil Judgment (2023) Jing 0491 Min Chu No. 11279.

8 U.S. Food & Drug Administration, Good Machine Learning Practice for Medical Device Development: Guiding Principles (Oct. 27, 2021).  Available at: https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning-practice-medical-device-development-guiding-principles.

9 U.S. Food & Drug Administration, Artificial Intelligence and Machine Learning (AI/ML) for Drug Development (May 16, 2023).  Available at: https://www.fda.gov/science-research/science-and-research-special-topics/artificial-intelligence-and-machine-learning-aiml-drug-development.