How can AI and data science help to fight the coronavirus?
Hundreds of Ericsson volunteers have come together to take on the COVID-19 Open Research Dataset Challenge. In doing so, they are sourcing and building crucial data sets to help the world find an answer to the top challenge right now: the coronavirus. Learn more about them and their work below.
The challenge is based on the CORD-19 dataset, which contains tens of thousands of research papers. It is growing rapidly – faster than any researcher can read and digest. The challenge is to build tools to help medical researchers extract the information they need, even as more information continues to be published.
On Monday, March 16, the White House in the US, together with the National Institutes of Health, Georgetown University and others, released the CORD-19 dataset and issued a call to action.
In just a few days, more than 360 people from Ericsson raised their hands and offered to participate. The volunteers include data scientists, data engineers, data visualizers, project managers, task managers, leaders, and writers.
The challenge is divided into nine tasks with multiple sub-tasks. All volunteers have been divided into smaller teams working on sub-tasks. In addition, all work is being done virtually.
The nine tasks include:
- What is known about transmission, incubation, and environmental stability?
- What do we know about COVID-19 risk factors?
- What do we know about virus genetics, origin, and evolution?
- What has been published concerning ethical considerations for research?
- What do we know about the effectiveness of non-pharmaceutical interventions?
- What do we know about diagnostics and surveillance?
- What has been published about medical care?
- What has been published about information sharing and inter-sectoral collaboration?
- What do we know about vaccines and therapeutics?
Let’s meet a few of the volunteers and learn which of the tasks above they’re working on:
What’s been published from a public health surveillance perspective?
Nikhil Korati Prasanna works at GAIA as a Data Scientist. His day job involves spatial anomaly detection, computer vision, knowledge graphs, natural language processing and machine learning. Together with his team, he is helping to figure out what has been published concerning diagnostics from the public health surveillance perspective. By doing so, Nikhil expects to be able to predict clinical outcomes. The team is also trying to extract data from the research to determine how widespread the current exposure is, in order to guide policy recommendations on mitigation measures.
Specifically, they are building a model to learn what the literature says about national guidance to states on best practices, for example how states might leverage universities and private laboratories for testing purposes. The team is also trying to understand what the literature says about how to communicate the situation to public health officials and the public.
To do so, Nikhil’s team is working on text summarization to find the relevant data in a huge number of articles. They will then develop a classification model to predict positive and negative clinical outcomes based on what has been published.
The tools they develop will provide up-to-date data, even as more articles are added to the dataset, including the various diagnostic approaches and the affected demographics. In the hands of medical researchers and public health officials, Nikhil hopes his team’s work will lead to more positive clinical outcomes.
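To give a feel for what text summarization can look like, here is a minimal sketch of one simple approach, frequency-based extractive summarization. It is an illustration only and not the method Nikhil’s team uses; the stopword list and the function names (`split_sentences`, `tokenize`, `summarize`) are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list for illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for",
             "with", "on", "that", "this", "was", "were", "by", "as", "it"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase words and numbers, with stopwords removed."""
    words = re.findall(r"[a-z0-9\-]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favored.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Re-sort the chosen indices so the summary keeps the original order.
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling `summarize(abstract, max_sentences=2)` on a paper abstract returns its two highest-scoring sentences in their original order. A production system would more likely rely on a trained language model, but the idea of ranking and keeping the most informative sentences is the same.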
What do we know about drugs being developed to fight coronavirus?
Emmett Moore, a software developer at Ericsson, is leading a team that is finding information in the dataset about vaccines and therapeutics, such as the effectiveness and risks of the drugs being developed to combat the virus.
His team started by brainstorming a set of keywords, phrases and topics to use as a base when they are parsing the literature to try and find which articles are relevant to the questions they are trying to answer. Emmett says: “As very few of us have a background in medicine or pharmaceuticals, this is proving quite interesting. For example, if we’re trying to find papers about vaccines, what words or phrases would tend to appear? We’re hoping this would help us discard any off-topic paper and give us a smaller dataset.”
In parallel, the data visualizers on the team are figuring out how best to present the team’s findings. For instance, they may use word clouds or more concise summaries of the relevant papers to demonstrate the key findings.
Emmett hopes his team’s delivery will help medical researchers looking into the topic of treatment and prevention of COVID-19. Since this dataset contains tens of thousands of full text papers and continues to grow, it is impossible for a single person or even a large group of researchers to read it all. Emmett is optimistic that new technologies like machine learning can really benefit society.
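The keyword-based filtering Emmett describes could be sketched roughly as follows. This is a simplified illustration under stated assumptions: the term list is a made-up starting point rather than the team’s actual vocabulary, and the `title` and `abstract` field names and the function names are hypothetical.

```python
import re

# Hypothetical starter vocabulary; the team's real keyword list is not
# given in this article.
VACCINE_TERMS = ["vaccine", "vaccination", "immunization",
                 "antiviral", "therapeutic", "clinical trial"]


def keyword_hits(text, terms=VACCINE_TERMS):
    """Count matches per term, anchored at a word start so plurals count."""
    lowered = text.lower()
    hits = {}
    for term in terms:
        pattern = r"\b" + re.escape(term)
        count = len(re.findall(pattern, lowered))
        if count:
            hits[term] = count
    return hits


def filter_papers(papers, terms=VACCINE_TERMS, min_hits=1):
    """Keep papers whose title plus abstract reach min_hits total matches."""
    kept = []
    for paper in papers:
        text = paper.get("title", "") + " " + paper.get("abstract", "")
        if sum(keyword_hits(text, terms).values()) >= min_hits:
            kept.append(paper)
    return kept
```

Raising `min_hits` makes the filter stricter and shrinks the dataset further, at the cost of possibly discarding relevant papers that use different wording. That trade-off is why choosing the vocabulary is the hard part for a team without a medical background.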
What has been published about medical care and COVID-19?
Becky Prince, a former English and writing teacher, now works as a technical writer. Becky’s team has been assigned Task 8, which is to explore what has been published about medical care and COVID-19.
The data scientists and data engineers on her team are currently generating ideas and testing various algorithms in code. Becky’s job is then to document their findings. The submissions will be scored in three categories: accuracy, documentation and presentation. Each category has a maximum of five points, so documentation and presentation together account for two thirds of the possible score.
Becky hopes the impact of her team’s work will be to help researchers more quickly and easily access the specific data they need as it relates to medical care in the fight against COVID-19.
How can we answer questions about things like incubation periods and how contagious the virus is?
Manjeet Attri works as a developer in Ottawa. Manjeet has a Master of Science degree from the University of Alberta and also worked for Ericsson in India for more than two years. In his team, his work is to test, verify and troubleshoot the test cases.
His team’s task is to find answers about the range of incubation periods, how long individuals are contagious and how the virus is transmitted. The team is starting by reducing the huge dataset to a smaller, more manageable one that contains only the most relevant data. Manjeet is responsible for the final report and has already prepared a report outline. The outline will give the data scientists and engineers a clear goal to work towards.
Manjeet says: “We are rearranging thousands of research papers to help medical professionals to find answers to their questions in seconds by looking at our answers.”
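One common way to pick out the most relevant papers for a question is to rank them with TF-IDF scores. The sketch below illustrates that general idea and is not a description of what Manjeet’s team has built; the function names and the smoothed inverse document frequency formula are choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_papers(question, abstracts):
    """Return (index, score) pairs, most relevant abstract first."""
    docs = [Counter(tokenize(a)) for a in abstracts]
    n_docs = len(docs)
    # Document frequency: the number of abstracts containing each term.
    df = Counter(term for doc in docs for term in doc)
    question_terms = set(tokenize(question))

    scores = []
    for i, doc in enumerate(docs):
        length = sum(doc.values()) or 1
        score = 0.0
        for term in question_terms:
            if term in doc:
                tf = doc[term] / length
                # Smoothed IDF: terms found in fewer abstracts weigh more.
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
                score += tf * idf
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Keeping only the top few results of `rank_papers("incubation period", abstracts)` turns thousands of abstracts into a short reading list for a single question.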
We win by giving participants hope
I’ve been overwhelmed by the response to the challenge and the work the volunteers have already done. I thought we might get 20, maybe 30 volunteers. Instead, we now have over 360. We are working hard to build a structure so that every volunteer knows what they are working on and with whom they are working. We’ve also tried to create processes for information flow so that we are sharing best practices.
If we are successful, we will help doctors and public health practitioners access critical information. This could help save lives and reduce the impact of COVID-19 on society and our communities. This is giving participants hope, which is critical in times like this. To me, that's how we win this competition.
To learn more about Ericsson’s response to the coronavirus pandemic, read the blog post by our CEO: How we are responding to the coronavirus at Ericsson
We will also keep sharing information about our approach to the coronavirus and the measures we take on our coronavirus FAQ page.