Developmental Toxicity Testing in the 21st Century: An Opportunity to Take a New Pathway


Ed Carney, The Dow Chemical Company

Published: June 24, 2009

About the Author(s)
Ed Carney serves as the Science & Technology Leader for Developmental & Reproductive Toxicology (DART) and Neurotoxicology at The Dow Chemical Company in Midland, MI, where he has worked since 1992.  He oversees regulatory toxicology testing programs in these disciplines, as well as investigational research programs focusing on in vitro alternatives to whole animal toxicology testing, developmental pharmacokinetics, and chemical mixtures.  Ed earned a B.S. in Animal Science from Cornell University (1981), and graduate degrees in reproductive physiology from University of Wisconsin-Madison (M.S. 1986) and Cornell University (Ph.D. 1990).  Immediately prior to joining Dow, Ed was a post-doctoral research fellow with Janet Rossant and Steve Lye at the Samuel Lunenfeld Research Institute at Mount Sinai Hospital in Toronto.

To date, Ed has contributed 69 peer-reviewed publications to the scientific literature.  He currently serves on the U.S. National Toxicology Program’s Board of Scientific Councilors and the editorial boards of Reproductive Toxicology and Toxicological Sciences, and is a member of the OECD Extended One-Generation Study Expert Group, the ILSI-DART subcommittee, and the Boards of the Toxicology Forum and the Humane Society’s Human Toxicology Project.  He also is Vice President-Elect of the Society of Toxicology’s Reproductive & Developmental Toxicology Specialty Section, holds an adjunct faculty position with the University of Michigan – School of Public Health, and lectures in the University of Surrey Master’s Programme in Toxicology.  Ed is a past Councilor of the Teratology Society, continues to serve on several Teratology Society committees, and is a Past President of the Midwest Teratology Association.

Edward W. Carney, Ph.D.
1803 Building
The Dow Chemical Company
Midland, MI  48674

Historical background. Since its inception following the thalidomide disaster in the early 1960s, testing for the potential effects of drugs and chemicals on prenatal development has remained largely unchanged. Standard protocols promulgated by worldwide regulatory agencies require the administration of test agents to pregnant mammals, followed by evaluation of viability, growth and structural abnormalities in full term fetuses. Over its nearly half century of use, this approach has served as the principal basis for countless public health decisions regarding pharmaceuticals, pesticides, industrial chemicals, herbals and other chemical entities. The longevity of this system is a tribute to its early designers.

The pioneers who drafted these protocols wanted to ensure that another thalidomide did not slip through undetected, and so they built in a number of features to maximize detection. These included testing in two mammalian species (typically the rat and rabbit, although monkeys, dogs and other species are sometimes used), exposure throughout the entire period of organogenesis, and the use of high doses, up to and including the maximal dose tolerated by the pregnant female. The default route of exposure was generally oral gavage, which tends to maximize detection by generating very high peak blood levels of the test agent. Large numbers of animals, typically 100 pregnant females of each species, were specified in order to generate a sufficient number of offspring (ranging from ~700-1400 each) to ensure detection of rare or unusual treatment-induced terata (birth defects). Finally, a suite of fetal endpoints was included, some of which were highly detailed and sensitive. For example, the tests require manual examination of more than 200 bones of the developing fetal skeleton, which are assessed for subtle differences in degree of ossification in addition to gross malformations.

Drivers for change. While there is some comfort in the protection afforded by this “fortress” of a testing system, both scientific progress and societal demands are exerting pressure on the traditional approach to testing. Many are not comfortable with a system in which the testing of a single compound requires more than 2000 animals, several hundred thousand dollars, and several months of effort. Also, testing resources are concentrated on a relatively small proportion of the universe of chemical entities, while regulatory initiatives such as the European Union’s REACH program call for comprehensive toxicity data on approximately 30,000 chemicals. Clearly, the demand for toxicity data is now outstripping available resources.

The ability of animal models to accurately predict human developmental toxicity has also been criticized on the basis of discordant responses between test species (see, for example, Bailey’s Way Forward essay), which are not uncommon in standardized testing. If a rat and rabbit do not respond similarly to a test compound, then how can we rely on them to predict the human response? While it is not quite that simple, interspecies discordance certainly can be a problem in the current system.

Others question the continued requirement to test at maximally tolerated dose levels, even in cases where human exposures are orders of magnitude lower. We now know that high doses can overwhelm normal detoxification processes, often leading to major shifts in compound metabolism and/or saturation of toxicokinetic processes such as renal clearance. High doses also induce maternal toxicity, which raises animal welfare issues, confounds data interpretation, and has been a perennial thorn in the side of developmental toxicity risk assessment. In the case of many industrial and agricultural chemicals, these unique high-dose responses are unlikely to occur at lower doses characteristic of human exposure (Holsapple and Wallace, 2008, Toxicol. Lett. 180, 85-92), rendering the data of questionable relevance for human risk assessment.

So what can be done to remedy this situation? Decades of effort have been focused on the development of short-term, inexpensive, alternative assays, such as the micromass assay, whole embryo culture, and the embryonic stem cell test. These assays have been useful as early stage screens, and as research tools to elucidate mechanism of action. However, mammalian development is far too complex for these assays to fully replace the traditional tests, at least within the current paradigm of safety testing. We will never achieve full replacement if alternatives are designed in the image of the existing tests. What we need to do is rethink the image in light of 21st century needs.

A New Paradigm. One radically new and different paradigm receiving a great deal of attention is the National Research Council’s “Toxicity Testing in the 21st Century: A Vision and a Strategy” (see reference in Andersen, et al. Way Forward essay). According to the vision, whole animal, effects-based testing would largely be replaced by assessment of cellular-molecular level signaling pathways using a battery of high-throughput assays. Many of these assays would utilize human cell lines. This new paradigm would be risk-based, in contrast to hazard-based systems, such as the EU Classification and Labeling system, which increasingly dominate industrial chemicals management. This would be achieved by integrating toxicity data with human exposure models and toxicokinetic modeling, providing a stronger context for choosing test doses, interpreting data, and assessing human risk. Linkages across this spectrum would be strengthened by the use of internal dose metrics (tissue or fluid concentration), rather than the applied dose (mg/kg/day).

Even the authors of the NRC report appreciate that its full realization will take many years and extensive resources, with analogies to the Human Genome Project or Manhattan Project already being made. For some, the vision might even seem impossible. In terms of developmental toxicity, one might ask whether we can realistically expect changes in specific signaling pathways to predict malformations such as missing or extra ribs. Even potent teratogens rarely have a penetrance exceeding 20-30%, with 70-80% of similarly exposed fetuses appearing completely normal. Some malformations, such as retroesophageal subclavian artery (the subclavian passes dorsal, rather than ventral, to the trachea), involve just a slightly altered course of an otherwise normally formed structure.

Is it reasonable to believe that these effects would be detected by examining alterations in signaling pathways? Perhaps not if one is designing in the image of traditional testing and risk assessment, but again, we have an opportunity to rethink the image. Particularly for the thousands of chemicals which are present at very low levels in the environment, what society needs most is a means of determining whether or not current exposures to these chemicals, either individually or as a mixture, represent a significant risk. For these chemicals, it is neither necessary nor prudent to characterize every possible effect, especially when those effects occur exclusively at doses which are far above expected human exposures. The NRC vision opens the door to a new paradigm in which we can focus more on critical events at the molecular and cellular level which can be used to more accurately estimate thresholds demarcating safe from potentially unsafe exposure levels.

While the challenge of identifying signals that predict altered development apart from the range of normal signaling should not be underestimated, it is a direction the field of developmental toxicology should aggressively pursue. Keep in mind that distinguishing low-incidence treatment-related effects from spontaneous background changes has always been a problem in developmental toxicology. Those of us involved in safety testing are often faced with borderline changes in the incidence of even a single fetal abnormality, yet we know that highly consequential risk assessment, classification and labeling decisions ride on the interpretation of these changes. Was that missing tail in one high-dose fetus a fluke, or was it a harbinger of potential risk in humans (especially considering that most humans are missing their tails!)?

The tools we currently rely on to interpret these data are fairly limited. Statistics offer little help for interpreting low incidence fetal abnormalities. Historical control data give us some sense of the normal range over a period of time, but are not a substitute for concurrent controls. Our best tool is the experience and expert judgment of the investigator, but sometimes it simply comes down to intuition. Wouldn’t it be better to arm investigators with mechanistic data to help make these important calls? In the case of that missing tail, I’d much rather know whether or not Hox gene expression was altered, not to mention having data on other highly conserved signaling pathways. At a bare minimum, research to identify predictive signals is certain to enhance the knowledge base of our field and would help close the ever increasing gap between developmental toxicology and basic developmental biology. However, if we cling exclusively to descriptive developmental toxicity, there is little chance of progress.

The other critical aspect of the NRC vision is its integration of assay results with toxicokinetics and human exposure modeling to constitute a risk-based, rather than hazard-based, evaluation system. This risk context is ultimately what is needed to manage chemicals safely, but without denying the public access to the benefits of these chemicals. The risk-based NRC paradigm, which calls for stronger links between exposure modeling, kinetic modeling and signaling pathway data, represents a platform to foster continual improvements in risk assessment. As the technology to estimate human exposures and internal dosimetry improves, so will the accuracy and relevance of human risk assessments. This, in turn, should enhance public confidence in our safety evaluation system.

Interim approaches. As an interim step along this journey, alternative models such as zebrafish and C. elegans seem to be a logical place to start building bridges between mammalian test species and in vitro models, such as human embryonic stem cells. These intact organisms capture the complexity and inherent interactions involved in embryogenesis, and their rapid rates of development facilitate coverage of a wide swath of ontogeny. As favored models in basic developmental biology, they come “fully loaded” with a rich understanding of developmental regulation on the cellular and molecular level. In fact, this depth of understanding is in stark contrast with that of rabbit or rat embryogenesis, which can be thought of as proverbial black boxes.

Another immediate step we can take down this new pathway is to add toxicokinetic measures to our existing studies. This has been done for many years in the pharmaceutical industry, and is something that should be increasingly incorporated into the testing of industrial and agricultural compounds. While toxicokinetics in whole animals is not a replacement alternative, it is a refinement that increases the amount and quality of data from a given set of animals and can reduce some of the uncertainty in extrapolations between animal data and human exposures.
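To make the idea of an internal dose metric concrete, the sketch below computes two common toxicokinetic summaries, peak concentration (Cmax) and area under the concentration-time curve (AUC, via the linear trapezoidal rule), from a concentration-time profile. The sampling times and plasma concentrations are purely hypothetical illustration values, not data from any actual study.

```python
# Minimal sketch of deriving internal dose metrics (Cmax, AUC) from a
# concentration-time profile. All numbers are hypothetical.

def auc_trapezoidal(times, concs):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    auc = 0.0
    for i in range(1, len(times)):
        auc += 0.5 * (concs[i] + concs[i - 1]) * (times[i] - times[i - 1])
    return auc

# Hypothetical plasma samples after a single oral gavage dose
times = [0, 1, 2, 4, 8, 24]            # hours post-dose
concs = [0.0, 4.2, 3.1, 1.8, 0.6, 0.1]  # µg/mL

cmax = max(concs)                       # peak plasma concentration (µg/mL)
auc = auc_trapezoidal(times, concs)     # internal dose metric (µg·h/mL)
print(f"Cmax = {cmax} µg/mL, AUC(0-24h) = {auc:.2f} µg·h/mL")
```

Metrics like these, rather than the applied dose in mg/kg/day, are what allow animal data to be linked to modeled human exposures in a risk-based framework.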

The Way Forward. Taking all of this into consideration, which path should we choose as the way forward? Clearly the current system is the most reliable means we have of evaluating developmental toxicity potential, but its long-term sustainability is doubtful. The pathways approach as conceptualized in the NRC vision holds promise for modernizing and revitalizing developmental toxicology, enhancing risk-based decision making, reducing animal use and restoring public confidence. The shift to a pathways-based test system will require an enormous amount of research, but as outlined above, there are interim steps along the way that can move us in the right direction. Integrative research which firmly links the entire product safety assessment chain from human exposure modeling, to internal dose metrics and effects in test models is certain to yield myriad benefits.

The shift to a new paradigm needs to be implemented carefully and supported by robust validation data, as there are serious consequences for a system that generates an excess of false negatives or false positives. Broad stakeholder engagement also will be a requirement for the ultimate success of a new system. While there is always uncertainty in charting a new course, this new path needs to be vigorously pursued if developmental toxicity testing is to remain scientifically credible and able to meet the societal demands of the future.
©2009 Ed Carney