Webinar: Data and Knowledge Sharing During PHEs

Wednesday 26 August 2020
16:00 - 17:00




This webinar examined the role of data and knowledge sharing within the clinical research community during a public health emergency (PHE). It addressed the following questions:

  1. Has the COVID-19 pandemic affected the way the clinical research community shares data and knowledge? In what ways has COVID-19 facilitated data and knowledge sharing? In what ways has it hindered data and knowledge sharing? 
  2. What are the hurdles to sharing data and knowledge in clinical research? The scientific/technological hurdles? The societal/investment hurdles? 
  3. What is needed to make data and knowledge sharing more effective and more the norm? 

Preparedness training webinar series

This webinar was part of our series on Preparedness Planning for Public Health Emergencies.


Meet the speakers

John Wise, Consultant at Pistoia Alliance & Programme Coordinator, PRISME Forum


  • Professor Barend Mons, Biosemantics, Leiden University Medical Center; Board Member, Leiden Centre of Data Science; Chair, High Level Expert Group of the European Commission on the European Open Science Cloud; Director of the International Office of GO FAIR; The Netherlands
  • Professor Elizabeth (Betsy) Ogburn, Department of Biostatistics, Johns Hopkins University & COVID-19 Collaborative Platform (covidcp.org), USA
  • Lynsey Chediak, Project Lead, Shaping the Future of Health and Healthcare, World Economic Forum


A message from Prof. Huanming YANG

Unfortunately, Prof. Huanming (Harry) YANG, who was invited to be one of the speakers, could not attend this webinar. However, he sent the message below:


First of all, please accept my apology for my absence from the meeting last Thursday, surely no excuse.

Since 1997, when I worked on the International Bioethics Committee of UNESCO, my institute and I have been dedicated to the free and immediate sharing of data, which at that time, in 1999, meant the human genome sequence. We then advocated the free sharing of the genome sequences of other animals, such as pig and chicken, and of plants such as rice and wheat, all of which we have contributed to. During SARS in 2003, we published the first genome sequence in China. That is why I fully support data sharing in pharmacology, which is even more challenging because of its economic significance. It is also a challenge to our ethical beliefs, as well as to our wisdom.

Thanks to the contributions of you and all the other colleagues, the previous webinars have been fruitful. I have made so many new friends and have learned so much from you all.


Professor Huanming

Co-founder, director and chairman of BGI Group

Meeting report

John Wise opened the meeting by introducing the expert panelists. He then put a series of questions to the panel members and invited them to share their thoughts.


Has the COVID-19 pandemic affected the way the clinical research community shares data and knowledge? And in what ways has sharing been facilitated or hindered?


Professor Mons started by saying that under high pressure, everything becomes fluid. Many things that were prevented by all kinds of bureaucracy have certainly become possible because of the high pressure exerted by COVID-19. He said that effective data sharing has always been technically possible; in 90% of the cases where it does not happen, the reasons are not technical but social, and very much related to data not being seen as valuable.

He mentioned that data need to be findable, accessible and interoperable for machines; only then are they reusable. Professor Mons stated that, most of the time, when the term “data sharing” is used, the majority of professionals think of sending physical data over the internet to someone else. They regard this as a loss of control and as potentially violating GDPR. So we are better off moving from data sharing to data visiting: the data stay where they are, and smart algorithms go to the data, request access, and run their models to test their hypotheses.

COVID has exposed the problem of data sharing enormously. We have struggled for three months to get sufficient data to build a conceptual model and find some new drugs; it is like no other disease, except perhaps Ebola. It has been extremely difficult to physically share data between centers. There is no way the Chinese government will allow data to be sent to the United States, or the other way around. This has nothing to do with GDPR; the entire problem revolves around politics, because the course of the pandemic has been so heavily politicized. The pandemic has made clear that, in the current epidemic, we can only learn from real-world observations by visiting the data rather than sharing it centrally.


John Wise stated that the idea of data visiting is extremely important. He then mentioned that some of his colleagues in England working in genomics used a different analogy, describing themselves as a reading library rather than a lending library.


Professor Mons went further, explaining Docker images and virtual machines in simple terms. If you have a bookshelf and I have a bookshelf, then instead of my giving you one of my books, you call me and ask permission to quote page 8 of book number 12. The virtual machine takes only the results with it, never the personal data. As the custodian of your own FAIR data station, you decide which virtual machines to let in, and you check them again when they leave to ensure they do not take any data they are not supposed to take. This also helps with GDPR compliance, but the political pressure around COVID is unprecedented.
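To make the data-visiting idea more concrete, here is a minimal Python sketch. It is a hypothetical illustration, not the actual FAIR data station or virtual-machine software discussed in the webinar: the DataStation class, the list of approved analyses and the "numbers only" exit check are assumptions chosen to mirror the bookshelf analogy, in which the custodian decides which analyses may enter and inspects what they carry out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical record type: one patient's data, held only by the custodian.
Record = dict


@dataclass
class DataStation:
    """Hypothetical local station: the data stay here and approved analyses visit them."""
    records: list[Record]
    approved: set[str] = field(default_factory=set)

    def visit(self, name: str, analysis: Callable[[list[Record]], dict]) -> dict:
        # Entry check: the custodian decides which visiting analyses are let in.
        if name not in self.approved:
            raise PermissionError(f"analysis {name!r} is not approved by the custodian")
        result = analysis(self.records)
        # Exit check (simplified assumption): only numeric aggregates may leave,
        # so record-level data such as lists of patients are blocked.
        if not all(isinstance(v, (int, float)) for v in result.values()):
            raise ValueError("result contains non-aggregate values; blocked at exit")
        return result


def count_and_mean_age(records: list[Record]) -> dict:
    """A hypothetical visiting analysis that returns only summary numbers."""
    ages = [r["age"] for r in records]
    return {"n": len(ages), "mean_age": sum(ages) / len(ages)}


station = DataStation(
    records=[{"id": "p1", "age": 54}, {"id": "p2", "age": 61}, {"id": "p3", "age": 47}],
    approved={"count_and_mean_age", "dump_all"},
)

print(station.visit("count_and_mean_age", count_and_mean_age))  # only aggregates leave

try:
    # Even an approved analysis cannot carry raw records out: the exit check blocks it.
    station.visit("dump_all", lambda rs: {"rows": rs})
except ValueError as err:
    print("blocked:", err)
```

A real deployment would also need authentication, auditing and much stronger exit checks; a numeric-only rule is a crude stand-in, since aggregates over very few patients can still identify individuals.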


Professor Ogburn reaffirmed that the pandemic had highlighted many of the problems that arise from the lack of data sharing. In particular, the clinical trial landscape has been flooded with trials that, on their own, will never be able to give solid evidence for or against any particular clinical decision. But if the data could be shared and trials coordinated, we would be in a much better position right now.

There was a study that the FDA in the US commissioned to look at all the COVID-19 randomized clinical trials registered on ClinicalTrials.gov. This internal study found that 90% of these trials were not actionable, meaning they were of such low quality, or their sample sizes were so small, that the FDA would never base any clinical decision-making on them. These trials were simply a waste of time and resources. This issue is not yet recognized enough to have changed behavior, but in a post-mortem it will be really hard to ignore. After the Ebola epidemic, guidelines were issued that pointed out exactly the same problems, but those problems are still present and are affecting many of the people facing the current pandemic. She hoped that in five years, when we come back for another webinar, we would be able to say that COVID changed things for the better.

On the flip side, clinical research has been slower to embrace open science than many other scientific areas. She expressed concern that the way non-peer-reviewed preprints have been used in this pandemic is going to make that embrace even harder. The peer-review system is already broken.


Professor Mons added that people so often confuse ‘FAIR’ with ‘open science’.

And the worst thing is that they consider open science to be equal to open-access articles. He mentioned that, according to one figure, 450 papers about COVID come out per day. There is no way we can read all these papers and put them through peer review. We have had a proposal accepted to mine them all with a very high-level tagger and look only at the new cardinal assertions, for example, someone claiming that hydroxychloroquine will treat COVID. That way, only those claims in the papers that are actually new, rather than repetitions, will be reviewed. There is no possible way to read all the papers, let alone judge them. The data should be published with the article as a supplement, not the other way around. Data are currently hidden behind papers, which are a nightmare for machines: they cannot follow the links, which are often broken. Data need to be published with FAIR metadata so that virtual machines can visit them easily. The virtual machines should carry FAIR metadata themselves, and the software should be FAIR compliant. The data can no longer be separated from the machine-reading mechanism.

The situation with COVID is very bad. We now have FAIR data points in many African countries. We have installed one in China, but when we asked for real-world data, the answer was no. The government does not even allow data transfer from hospitals in China to a trusted professional in charge of the Chinese science cloud. We are now generating 50,000 synthetic patients, using algorithms to fill in the WHO case report form (CRF), which comprises a tremendous number of values and data. Because these fake patients are labeled as synthetic cases, there are no GDPR issues, and we can distribute the data all over the world.


As Betsy mentioned, 90% of the trials are unusable. Shouldn’t these trials be subject to an institutional review board or an ethics committee? Isn’t it within their remit to comment on whether a trial is worth doing? Running trials whose statistical size makes them meaningless is irresponsible.


Professor Ogburn stated that although this should be happening, it is not, because of faults in the way that institutional review boards (IRBs) operate. Another significant problem is that there are not enough patients to enroll in these trials. Even if an IRB approves a 400-person trial, it may have enrolled only around eight patients after being up and running for months. This is a problem that data sharing can solve. Of the hundreds of clinical trials, many were of hydroxychloroquine, and if those trials had come together and shared their data, we would have one huge, well-powered trial.
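The benefit of pooling can be illustrated with a rough power calculation. The Python sketch below uses the standard normal approximation for comparing event proportions between two trial arms; the event rates (20% in the control arm, 15% with treatment) and the trial sizes are hypothetical numbers chosen for illustration, not data from any hydroxychloroquine trial.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def approx_power(p_control: float, p_treat: float, n_per_arm: int, z_crit: float = 1.96) -> float:
    """Approximate power of a two-arm trial comparing event proportions.

    Normal approximation for a two-sided test at alpha = 0.05 (z_crit = 1.96),
    ignoring the negligible chance of rejecting in the wrong direction.
    """
    se = math.sqrt(p_control * (1 - p_control) / n_per_arm
                   + p_treat * (1 - p_treat) / n_per_arm)
    return normal_cdf(abs(p_control - p_treat) / se - z_crit)


# Hypothetical effect: the event rate falls from 20% to 15% with treatment.
for n_per_arm in (4, 200, 2000):  # one eight-patient trial vs. increasingly pooled data
    print(n_per_arm, round(approx_power(0.20, 0.15, n_per_arm), 2))
```

Under these assumptions the approximate power is about 4% for a single eight-patient trial, about 26% with 200 patients per arm, and about 99% with 2,000 per arm (the pooled equivalent of 500 such small trials). The approximation is crude at very small sizes, but the direction of the effect is the point.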

Although there is some onus on individual IRBs, the real issue is that no single decision-maker has access to the knowledge required to make a well-informed decision. Another drawback of the lack of data sharing is that we are flooding the landscape with repeated trials of the same clinical question running at the same time, which inflates the false positive rate when you look across the research landscape. This is something that review boards should be wary of, but they simply do not have access to the knowledge that would allow them to make good decisions about it.
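The false positive inflation described above follows from simple arithmetic. Assuming k independent trials each test the same ineffective treatment at significance level alpha, the chance that at least one of them reports a "significant" effect is 1 - (1 - alpha)^k. The Python sketch below evaluates this; independence and alpha = 0.05 are simplifying assumptions, and real trials also differ in design and endpoints.

```python
def prob_any_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent trials of an ineffective
    treatment reports a statistically significant result at level alpha."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20, 100):
    print(f"{k:>3} trials: {prob_any_false_positive(k):.0%} chance of at least one false positive")
```

With 20 concurrent trials of the same question, the chance of at least one spurious positive result is already about 64%, even though each trial on its own keeps the usual 5% error rate.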

She then added that there are many researchers who are wary of making their data public because they are worried that somebody is going to go on a fishing expedition in their data and publish lots of papers that will flood the literature with exactly the same content. These researchers feel that making the data public is going to exacerbate the problem rather than fix it.


Professor Mons added that, as doctors, we are the custodians of the patients’ data. If I allow a virtual machine to access my data pool, I can ask, among other restrictions, to co-author the research paper. This way, I am able to control what is done with my data.


Lynsey Chediak stated that there was an opportunity to point out why we need to adopt new models to work together better. As a specialist in genomic data policy, specifically for rare diseases, she has found the many parallels between the data-sharing barriers in rare diseases and those that have come out in COVID fascinating. When working on a rare disease, you have to access data at a global level because the sample size is so small. You may be dealing with a disease that affects only one patient in California, but if you scale that up to a global level, tens of thousands of people have the same disease. So it is crucial to be able to access one another’s clinical records. There is a lot of pushback when trying to implement new policy frameworks that enable greater access to data globally.

She asked: if not COVID, what else could make the case for a global approach to disease management? When will we realize that we do not need to have all the answers and data in individual hospitals or individual research ecosystems? We need to rethink how we handle complex disease cases, and COVID has been a great opportunity to see that clearly.

She mentioned that the healthcare coalition in the US, led by the Mayo Clinic, quickly figured out that it could use a federated data system to visit clinical data across hospitals anonymously. Such approaches should continue even after the current pandemic is over.

In terms of how data and knowledge sharing have been hindered, the impact of COVID-19 on clinical trials is going to continue. Numerous partnerships that were on the cusp of success, in either the rare disease space or the cancer space, have now been paused. It is unclear when resources will be allocated to them again, especially with hospital budgets pinched and revenue adversely affected. As a consequence, what impact this will have on patients seeking answers about rare diseases is still unclear.

It is about time that we worked together on data sharing and data access. We need to re-evaluate what is at stake and ensure that providing patients with more answers stays at the forefront.


John Wise backed Lynsey Chediak’s point about escaping from the idea of ‘my data’. He went on to say that Mons has managed to revitalize this whole subject and bring it to the forefront.


Professor Mons added that Americans are better at acronyms than Europeans: they now read ‘FAIR’ as ‘Fully AI-Ready’.

He pointed out that FAIR data points were set up in Africa after the Ebola outbreak. This came about because, once the Ebola epidemic was under control, 99% of the data had been taken away by the donors who came to help, and Africans could not even access the data on their own epidemic. They are now adamant that data on their health will never leave Africa again. Instead, the data can be visited by machines and shared via virtual systems with Europe, the United States or China.



What are the hurdles to sharing data and knowledge in clinical research? Discuss the following categories: scientific, technological, societal, and investment hurdles.



Professor Ogburn started by saying that 90% of the reasons are social and not technical. By far the two biggest hurdles are the way that research products are credited, and inertia. The easiest thing we can work on changing going forward is the incentive structure in academia, which is aligned solidly against sharing. We are currently in a situation where, when researchers share data, humanity benefits but the individual researchers suffer. The current incentive is to hoard all data and publish as many papers as you can before anybody else is allowed to make any discoveries from it. Many easy-to-implement solutions have been proposed, and it is still unclear why they have not taken off.

When federal money is funding research, it is imperative that the research is judged on its usefulness and not on how many papers it produces. Real-world use should be considered: did it help clinical decision-making, were the data used to inform new standards of care, and so on. This incentivizes sharing, because this is not something one author can achieve on their own. Right now, credit is tied almost exclusively to authorship on papers. This is short-sighted, and it is still unclear why funding agencies such as the NIH have not yet jumped on board.

The scientific and technological hurdles are real, but they are minor compared to the cultural and systemic hurdles, and they are easier to deal with than those two.


Professor Mons added that if funders and governments want data sharing to happen, they need to invest in the underlying infrastructure. But the top investors are not interested in developing infrastructure; instead, they want to affiliate themselves with top science. It is like everybody wanting to pay for rocket science while nobody is willing to pay for the launch pad.


Lynsey Chediak took the conversation forward by combining Betsy’s and Mons’s comments. She summarized the problem in two parts, the people and the processes, and said she believed these are the real hurdles going forward.

It is all about incentives. Especially in rare disease cases, we want to allow greater access to the data because that will enable us to cut short the diagnostic odyssey, which may help in finding better and more effective treatments for the 95% of people with a rare disease who do not yet have a treatment available. Although these moral incentives exist, they still do not seem to be enough.

One incentive that seems to be constantly referenced is the idea of being innovative and of ensuring that one’s institution is world-renowned. But to achieve that, you need to ensure that your processes are top-notch and that you can change them dynamically. When partnering with someone else, there is always room for continuous learning and improvement.

On a societal note, if the role of benefit sharing were brought a bit more front and center, we might see more proofs of concept of successful data-sharing partnerships. She added that most patients do not care about their information being leaked; all they want are answers about their diseases. However, this is not where most genomic institutes stand. Without taking anything away from patient data privacy, we should not let privacy protocols hinder the actual goal of data sharing.

Scientific and technical hurdles are not necessarily what is stopping data and knowledge sharing internally; it is really the people and the processes that affect data sharing the most.


Professor Mons stated that the hurdles in clinical research are slightly different from those in generic research, where patient privacy is not in question. If there is one area where distributed learning and distributed analytics have been proven to work, it is clinical research. He said that his fight was not with scientists or technologists but with doctors, because doctors themselves are often unwilling to share clinical data, which creates a big hurdle to knowledge sharing. It all comes back to the social hurdles. We need to break through the hegemony of the old systems, which is very tough.

He quoted one of his friends: “The Internet brought us from a situation of many computers and many datasets to a situation of one computer and many datasets. The Internet of FAIR data and services will bring us one computer and one dataset.”

For virtual machines, it does not matter where the data are: they can visit them at lightning speed, as long as they are allowed to visit them.


What is needed to make data and knowledge sharing more effective and more the norm?


Lynsey Chediak shared that there is a six-step guide to sharing genomic data. Four of the steps concern people, and only two relate to technical and scientific aspects, which is in itself a very interesting indication.

In the last seven years, 14 countries have invested 4 billion USD in national genomic initiatives. The volume of genomic data is not going away; it is only increasing. However, not a single national genomics initiative has invested in a mechanism to share its data with any other country. It is all about how we, as a nation, can amass genomic data for our own use. Yet rare diseases and cancer do not differentiate on the basis of nationality. The differences in our processes are not going away.

If we can propel data access partnerships and make that the norm, it will add great value to the data itself.


Professor Mons went on to state that we scientists are generally hard-headed people. There are two points in a scientist’s life cycle when he or she listens: first, when funding is required, and second, when he or she needs to publish.

So funders have to require FAIR data. The publishing hegemony should be broken: kill the journal impact factor, prohibit supplementary data, and publish data with a supplementary article. Moreover, it is also crucial to invest in core resources such as curated databases and in infrastructure for sharing.


Professor Ogburn backed the idea of requirements at the funding and publication level. She said that allowing data stewards to be gatekeepers at any point in the process is highly risky. Without repeating everything that Lynsey Chediak and Mons had said, she argued that requiring data to be made available in a way that removes the original data collector as gatekeeper will be crucial for circumnavigating all of the human issues that are actually the strongest barriers.

Ground-up movements like FAIR can solve some of the problems, but some really do require top-down organization. One example is the proliferation of clinical trials when no individual IRB is able to adjudicate a proposed trial in the context of the full landscape.

She was optimistic about the problems that individual scientists can solve by coming up with novel ideas and promoting them through their networks until they eventually catch on. But many of the problems we have seen with COVID just cannot be solved by ground-up efforts and need top-down leadership from bodies such as the WHO or NIH.

She hoped that after COVID, there would be a will to change among people going forward.


Sharing thoughts on:

  1. diversity in clinical trials 
  2. data repository and data retention


Professor Ogburn pointed out that if there is treatment effect heterogeneity across subgroups that are not represented in clinical trial populations, we miss the signal that tells us how we might want to make different clinical decisions in different settings and how we might want to make global recommendations. This is something the WHO has to worry about: it makes global recommendations that should apply to all nations, but it often bases them on data that represent only certain populations.

In many cases, it is impossible to answer the most important scientific questions without collaborating and sharing data. It is impossible for a single research group to get a truly representative sample of the global population. A much more efficient and readily available solution is to allow research to remain local while making sure that its products are global.


Lynsey Chediak started by saying that the reference genome is entirely representative of US and European populations, so what we think of as discoveries do not actually apply to the global landscape.

It is important to know why people participate in clinical trials. Is it for their individual disease, for the greater good, or something in between? And how should expectations for participation be set?

One model that could increase diversity in clinical trials comes from Australian Genomics, which has a dynamic consent platform called CTRL that lets participants complete a comprehensive survey.

It all comes down to not homogenizing a global population: thinking things through rather than making assumptions about what every participant in a clinical trial wants. There is also an education side: how to actually reach people and help them understand what you are doing.

Other important questions include: what do you do once you have the data, how do you treat participants in a study, and is a benefit provided if desired?

It is still unclear if something as sensitive as genomic data can lead the way in exploring new models for engagement.


Professor Mons stated that one of the most important incentives for people to give consent to the reuse of their data is feedback. However, feedback on variants in an individual citizen’s genome is often not something people want to know, because it is extremely difficult for the average layperson to judge the value of an x% chance of getting a particular disease at the age of 40. So we need to be very careful when feeding back results to the citizens who gave consent; for some it can be a very big incentive, while for others, not so much.

Genetic information can be quite dangerous, as real knowledge is required to understand the relative risk of a particular variant.


John Wise added that the biggest unmet medical need was in the Asian phenotype. This shows how skewed our clinical development programs are toward the Western phenotype.

In the UK, grant-giving bodies insist that data are collected and deposited when they support a publication. However, from an industry point of view, we need to be aware that it takes money to look after data. The second important point is that data age: data collected today may not be particularly useful in the future. So if we are to have these data repositories, should there not be some kind of data management protocol attached to them? Then, if data are not being used or accessed and are aging, we can get rid of them and save the cost.


Professor Mons points out that instead of using the term ‘data management,’ it’s better to use the term ‘data stewardship.’

It is important to consider both the FAIRness and the quality of the data: FAIR has nothing to do with data quality. Perfectly synthetic, fake data can be made FAIR.


Professor Ogburn took up the topic of wearable computing and its significance in healthcare, medicine and clinical research. She stated that we do not yet fully understand the errors built into most wearable devices, especially those manufactured for commercial rather than research purposes. It is also still unclear what these devices are measuring and which of their outputs are useful biomarkers. We need to be very careful about choosing them as endpoints in studies that will inform immediate clinical decision-making. Wearable devices are extremely useful for collecting data for exploratory analysis, such as understanding disease processes, but they are not yet ready for research that informs clinical decisions.


What did you do with your COVID-19 data?


Professor Alessandra mentioned that they had collected data on 1,100 patients, 25 of whom were asymptomatic or partially symptomatic. They are trying to understand the genetic basis of the host response and clinical outcomes. She mentioned that they had developed a new method that can predict clinical outcomes with up to 80% accuracy. There is a dire need to share these methods and validate them in other cohorts. The idea of sharing data and knowledge at a global level is crucial, as we need to face the current challenge together.


Elaboration on Benefit-sharing


Lynsey Chediak stated that we need to involve patients more in clinical research. If we consider genomic medicine as the end game, there has to be an opportunity for greater information sharing, both in general and in clinical trial settings.

She shared the concern that wearable data are not the right approach for sophisticated biomarkers. However, it would be ideal if wearables could be used as an addition to clinical records.

Unlike Professor Mons, Lynsey Chediak is of the view that data sharing with patients is not entirely a bad thing.

She also added that telemedicine is transforming how patients allow the doctor into their home without the doctor physically being there. All this was made possible by COVID-19.

On the topic of informed consent, Professor Ogburn stated that when subjects agree to participate in clinical trials, they are often motivated by wanting to be as useful to as many people as possible. It is an ethical failing when researchers fail to make the data they collect from individuals as useful as possible to as broad an audience as possible. Researchers have an obligation to share data responsibly for as many useful scientific ends as possible, and this should be baked into the way we think about the ethical obligations researchers enter into when they accept patients’ informed consent.