How data science is ushering in a new era of modern medicine
Algorithms, artificial intelligence, machine learning and other technologies are transforming the way physicians identify, treat and manage diseases. Here’s how Johnson & Johnson is putting the latest tools to work.
It used to be that the process of understanding a disease could be slow and laborious. Years ago, to see whether their hypotheses were borne out, scientists often had to sort through and review data by hand to start building a clearer picture of how diseases behaved.
Today, thanks to cutting-edge tools and applications in data science, researchers can grasp the way a disease behaves on a much swifter timeline. Think about it: Just two years after the emergence of the SARS-CoV-2 virus and COVID-19 (the illness it causes), scientists already know a great deal about how the virus infects the body, how to help treat it and how to lower the risk of severe disease—thanks in no small part to the unprecedented sharing of data among researchers from all over the world.
“An analogy to looking through large sets of data for clues about treating diseases is finding the needle in a haystack,” says Michael Morrissey, Global Head, Early Detection & Data Science, Lung Cancer Initiative, Johnson & Johnson. But by applying “rigorous statistical methods,” he says, “data scientists hope to figure out exactly where that needle is.”
At Johnson & Johnson, these rigorous methods may help pave the way toward improving treatments for—and potentially even intercepting the onset of—many deadly diseases.
“We are leveraging data science from the discovery phase—when we figure out what’s driving a disease—all the way to the stage when we make a medicine available to patients,” says Najat Khan, Ph.D., Chief Data Science Officer and Global Head of Strategy and Operations for Research & Development, Janssen Pharmaceutical Companies of Johnson & Johnson. “We currently have over 120 projects ongoing, covering about 90% of our pipeline. We’re coupling cutting-edge analytical approaches—like artificial intelligence (AI), machine learning, real-world evidence and digital health—with a massive amount of anonymized patient data being analyzed to gain transformational insights and drive concrete impact for our pipeline and for patients.”
Lung cancer, pulmonary arterial hypertension and diversifying clinical trials are just three areas where this cutting-edge work is taking place. Here, we explore how the company is leveraging data in these spaces to drive innovation and help change the way doctors identify, manage and treat disease.
Finding Lung Cancer Before It Spreads
Catching lung cancer early can make all the difference for a patient’s prognosis. In the United States, only 7% of people whose lung cancer has already metastasized at diagnosis are still alive five years later.
And yet detecting lung cancer early can be difficult. Symptoms such as persistent cough or fatigue can be vague and mirror signs of other conditions. By the time a patient is symptomatic, the window of opportunity to intervene with favorable long-term outcomes may have closed. And not everyone has access to screening resources they might need to find disease early, when it is most treatable.
But Johnson & Johnson data scientists are working to help reduce delays in diagnoses. Through the company’s Lung Cancer Initiative (LCI), researchers are harnessing data and technology to help doctors identify and treat lung cancer before it progresses.
Data science efforts like these augment clinicians’ expertise, and what is already in the medical literature, by recording what patients are actually dealing with in the real world.
One example? The Precancer Genome Atlas. Built in partnership with Boston University, the atlas contains both a registry of patients with precancerous lesions and nodules and a catalog of genetic changes in those nodules that can precede lung cancer. By building this massive data set, which is currently in progress, and mining it for patterns, researchers hope to help doctors diagnose and treat patients at a very early stage, rather than waiting for a patient to have symptoms, and better understand how precancerous changes progress to malignancies, explains Morrissey.
As Khan puts it: “Data science efforts like these are allowing us to better understand disease drivers and dynamics at a molecular level, filling the gaps in knowledge to aid in earlier disease detection and in the creation of more effective therapies and vaccines for patients.”
Early detection tools are critical in treating early-stage lung cancer. When a radiologist spots a suspicious nodule on a scan, it’s not always immediately obvious whether that nodule is malignant or likely to become malignant in the future. So-called “indeterminate” lung nodules can delay diagnosis and time-sensitive treatment when a malignancy is present, or lead to unnecessary procedures for individuals whose nodule is benign.
In 2019, the LCI announced a collaboration with Veracyte, developers of the Percepta Nasal Swab, a noninvasive test that helps physicians detect gene-expression changes in the nasal epithelium. These changes reflect a “field of injury” in the respiratory tract that is associated with lung cancer in current or former smokers.
AI algorithms can also be trained to detect cancer-like features, helping physicians identify which patients are likely to benefit from invasive treatments and which could benefit from close follow-up and subsequent screening.
The LCI Data Science team is also integrating complex molecular data and radiology scans from a number of collaborations and consortia, recognizing that tests like CT scans and nasal swabs may shift lung cancer detection earlier in the patient journey to give patients the best chance for a durable treatment response.
Putting PAH Patients on a Faster Path to Diagnosis
Researchers and data scientists at Janssen Research & Development are using similar radiomics technology to help with early detection of pulmonary arterial hypertension (PAH), a rare but deadly type of high blood pressure in the lungs caused by a narrowing of the arteries.
PAH, a sub-type of pulmonary hypertension (PH), often has no known cause and, if untreated, can lead to serious complications, including right heart failure and potentially early death. Unfortunately, like lung cancer, it can also be difficult to spot early. As a result, for many patients, it can take years to receive the correct diagnosis, says Mona Selej, M.D., M.S., Senior Director of Data Science Portfolio Management for Cardiovascular Medicine and Pulmonary Hypertension at Janssen R&D.
“The average prediagnosis time window for patients is currently unacceptably high,” says Dr. Selej, a pulmonologist and former ICU physician who’s treated many PAH patients. “For some, it can be up to four years.”
To help shorten PAH patients’ timeline from symptoms to diagnosis, Dr. Selej and the R&D data science team are developing software in partnership with AI companies Us2.ai and nference, as well as Mayo Clinic and the University of California San Francisco health system. The software can help assess a patient’s PH risk based on two often-used cardiovascular tests: transthoracic echocardiogram (TTE), which uses ultrasound (high-frequency sound waves) to generate images of the heart, and electrocardiogram (ECG), a recording of the heart’s electrical activity. These AI-based technologies are envisioned as early detection tools for PH, which in turn could help find PAH sooner.
“While these common bedside tests are being performed, the software runs in the background and can more effectively discern multiple subtleties,” Dr. Selej says. Afterwards, the software generates a report that flags whether a patient’s findings are consistent with risk for PH.
The definitive confirmation of PAH requires an invasive test called right heart catheterization. These early detection tools, based on ECG and TTE, may help physicians discern whether a patient’s PAH risk profile warrants moving ahead with such testing, says Dr. Selej. “The hope is that these tools can be optimized to expedite the PAH diagnosis and improve both the patient journey and outcome.”
Prioritizing Patients Who Have Been Overlooked
Early diagnosis is just one area of focus for Janssen data scientists. The company is also leveraging data to make clinical trials more diverse and inclusive—which is crucial when it comes to developing treatments that work for all patients.
“When you have a lack of representation in trials, you don’t know whether a therapy will be effective for those patients who weren’t represented,” explains Denise Bronner, Ph.D., Director of Diversity, Equity and Inclusion in Clinical Trials, Janssen R&D. “If you want health equity for all, that needs to be reflected throughout the entire trial process.”
Not to mention, clinical trials can also offer patients the unique opportunity to access potentially life-saving therapies. But patients who don’t know about trials or can’t access them can’t reap those potential benefits.
To ensure the company’s trials are representative and inclusive, Bronner’s team has forged a close partnership with the Janssen R&D Data Science group.
“In general, fewer than 5% of eligible patients are enrolled in clinical trials for certain diseases,” says Khan. “Why is that? For one, many people might not be able to take time off of work to get to trial sites, which may be located far from their homes. So we’re using machine learning models to go where the patients are versus just to academic centers where we have established relationships. And we’re incorporating large amounts of census data so that our models send us to areas of high racial diversity. By doing this in a data-driven way, we’re really increasing the probability of success that we’re going to develop something transformational for the patients who need it.”
One example of this cutting-edge work: In a collaboration announced last December, Janssen R&D data scientists are working with Persephone Biosciences on the colorectal cancer arm of Persephone’s Argonaut study, which will collect and analyze stool and blood samples from 1,350 advanced-stage colorectal cancer patients and healthy individuals from racially diverse backgrounds with varying levels of cancer risk. Data from the study will be used to identify clinically actionable cancer-specific biomarkers and risk factors for colorectal cancer with the goal of developing new therapies that target the microbiome.
To help with study recruitment, the team is using data and technology, including social media, to recommend trial sites with diverse patient populations.
Using a tool called Brandwatch, “we’ll be looking for health influencers, including investigators and academic institutions that work with diverse patient populations,” says Shivani Mehta, Associate Director, Data Science, Janssen R&D. By finding and building a database of those influencers, Mehta hopes to forge new connections with potential trial sites.
Roughly 60 to 80% of eligible patients do not live near existing clinical trial site networks, explains Khan. That’s why going where the patients are—versus the other way around—in a very data-driven way is essential. “Leveraging machine learning-driven models also allows us to incorporate socioeconomic and racial diversity factors when determining where to place clinical trial sites, leading to greater diversity in our clinical trials across our pipeline,” Mehta continues.
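To make the idea of data-driven site selection concrete, here is a minimal sketch in Python. It is not Janssen's model, which the article does not describe in detail. It assumes a simple weighted score that blends an estimated count of eligible patients with a diversity measure (the Simpson index, one common choice) computed from census-style population shares. The class name, field names, weighting and sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateArea:
    name: str
    eligible_patients: int          # hypothetical estimate of eligible patients
    group_shares: dict[str, float]  # hypothetical census-style shares, summing to 1


def diversity_index(shares: dict[str, float]) -> float:
    """Simpson index: 1 minus the sum of squared shares (0 means a single group)."""
    return 1.0 - sum(p * p for p in shares.values())


def rank_areas(areas: list[CandidateArea],
               diversity_weight: float = 0.5) -> list[tuple[str, float]]:
    """Rank areas by a weighted blend of normalized patient count and diversity."""
    max_patients = max(a.eligible_patients for a in areas) or 1
    scored = []
    for a in areas:
        patient_score = a.eligible_patients / max_patients
        score = ((1 - diversity_weight) * patient_score
                 + diversity_weight * diversity_index(a.group_shares))
        scored.append((a.name, round(score, 3)))
    # Highest score first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative, made-up inputs
areas = [
    CandidateArea("Area A", 400, {"g1": 0.9, "g2": 0.1}),
    CandidateArea("Area B", 300, {"g1": 0.4, "g2": 0.3, "g3": 0.3}),
]
ranking = rank_areas(areas)
```

With these made-up inputs, Area B ranks above Area A despite having fewer eligible patients, because its population is more evenly spread across groups. A real model would likely weigh many more factors, such as travel distance and socioeconomic data.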
“So many patients and communities have been overlooked,” says Bronner. Thanks to the power of data and data science, Johnson & Johnson is finding new ways to serve them—just one important step on the path to achieving health equity for all.