Cancer Research Data Commons Newsletter
June 17, 2021
News & Stories
What Is Interoperability and Why Do We Need It?
By Melissa Haendel, Oregon Health & Science University; Samuel Volchenboum, University of Chicago; Kathryn Blumhardt, University of Chicago; Monica Munoz-Torres, Oregon State University; Tanja Davidsen, NCI
The Center for Cancer Data Harmonization (CCDH) team hosted and presented at the virtual NIH Cloud Platform Interoperability (NCPI) workshop on May 3 and 4, 2021, where it facilitated a discussion on data interoperability.
Because clinical trials, basic research, and healthcare lack common standards, data do not move smoothly or efficiently between clinical systems and research platforms. At best, this slows progress. At worst, it leads to error-prone research and faulty conclusions that can harm patients. Common standards and FAIR (Findable, Accessible, Interoperable, Reusable) data practices make data easier to query and help researchers use and reuse them, and developing them will help foster a culture of interoperability. We expect that this approach will lead to better data, ultimately bridging clinical and basic research into a single healthcare data “universe.”
Interoperability faces four main challenges:
-
Legal/licensing - For reuse, restrictively licensed data can be combined only with permissively licensed data. For redistribution, only permissively licensed data can be combined.
-
Regulatory - Access controls must match the regulatory permissions recorded in each dataset's provenance.
-
System - Platforms and tools often cannot talk to one another to move data and analyses.
-
Data - Data are often not encoded at all, or they are encoded using different data models and terminologies, which limits search and integrated analytics.
To make analytics more efficient and effective across domains and systems, we must address all four challenges.
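The legal/licensing rule in the list above is simple enough to state as code. The sketch below is a minimal illustration only. It reduces every license to "permissive" or "restrictive" and assumes that two restrictively licensed datasets can never be combined with each other. `License` and `can_combine` are hypothetical names, and real license compatibility needs legal review.

```python
from enum import Enum
from typing import Sequence


class License(Enum):
    """Simplified two-way license classification (an assumption for this sketch)."""
    PERMISSIVE = "permissive"
    RESTRICTIVE = "restrictive"


def can_combine(licenses: Sequence[License], purpose: str) -> bool:
    """Return True if datasets under these licenses may be combined for `purpose`.

    Assumes two restrictively licensed datasets can never be combined with each
    other, so:
      - "reuse": at most one restrictive dataset; all others permissive
      - "redistribution": every dataset must be permissive
    """
    restrictive = sum(1 for lic in licenses if lic is License.RESTRICTIVE)
    if purpose == "reuse":
        return restrictive <= 1
    if purpose == "redistribution":
        return restrictive == 0
    raise ValueError(f"unknown purpose: {purpose!r}")


if __name__ == "__main__":
    mixed = [License.RESTRICTIVE, License.PERMISSIVE]
    print("reuse:", can_combine(mixed, "reuse"))                    # reuse: True
    print("redistribution:", can_combine(mixed, "redistribution"))  # redistribution: False
```

The same mixed pair of datasets passes the reuse check and fails the redistribution check. This asymmetry is why licensing has to be tracked per dataset rather than per commons.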
System interoperability has been front and center in NCPI activities, but data interoperability has not yet been a major focus. The CCDH works to improve data harmonization and interoperability within the Cancer Research Data Commons (CRDC) ecosystem, and it offered this perspective to the NCPI workshop attendees.
Specifically, data interoperability across data commons must be supported at four levels:
-
Semantic - Common meaning and context, established using ontologies and terminologies such as Mondo and the NCI Thesaurus (NCIt).
-
Syntactic - How the data are modeled, including data models, data structures, data dictionaries, and data schemas such as the CRDC-H.
-
Data format - How the data are exchanged via common data formats defined for encoding, decoding, and representation, such as PFB, RDF, and VCF.
-
Data architecture - How data are exchanged and accessed via networks, computers, applications, and web services. Examples include APIs and Docker.
Data modeling and terminologies
CCDH presented its use of, and contributions to, the LinkML modeling language, in which the CRDC-H harmonized data model is implemented. LinkML is a “born interoperable” semantic data modeling framework designed for data dictionaries, data submission forms, data commons, and complex biomedical schemas. Many of its features support the data interoperability goals above. For example, a schema is written in simple YAML that serves as the single source of truth. JSON Schema, Python dataclasses, GraphQL, RDF Turtle, OWL, and Shape Expressions can all be derived from that YAML and used to build APIs, write ETL, validate data, and power other downstream applications.
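The “single source of truth” idea can be illustrated without LinkML itself. The toy sketch below is not LinkML's API or generators. A tiny, hypothetical LinkML-inspired schema is held as a plain Python dict, and two artifacts are derived from that one definition: a JSON Schema object and a Python dataclass. The `Specimen` class and its slots are invented for illustration.

```python
import dataclasses
import json
from typing import Optional

# Hypothetical schema for illustration only. Real LinkML schemas are YAML files
# processed by LinkML's own generators; this toy mimics the idea with a dict.
SCHEMA = {
    "classes": {
        "Specimen": {
            "slots": {
                "identifier": {"range": "string", "required": True},
                "age_at_collection": {"range": "integer", "required": False},
            }
        }
    }
}

# How each toy "range" maps onto JSON Schema types and Python types.
JSON_TYPES = {"string": "string", "integer": "integer"}
PY_TYPES = {"string": str, "integer": int}


def to_json_schema(schema: dict, class_name: str) -> dict:
    """Derive a JSON Schema object definition for one class."""
    slots = schema["classes"][class_name]["slots"]
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": class_name,
        "type": "object",
        "properties": {
            name: {"type": JSON_TYPES[slot["range"]]} for name, slot in slots.items()
        },
        "required": [name for name, slot in slots.items() if slot["required"]],
    }


def to_dataclass(schema: dict, class_name: str) -> type:
    """Derive a Python dataclass for the same class from the same definition."""
    slots = schema["classes"][class_name]["slots"]
    fields = []
    # Required slots first: dataclasses reject non-default fields after defaulted ones.
    for name, slot in sorted(slots.items(), key=lambda item: not item[1]["required"]):
        py_type = PY_TYPES[slot["range"]]
        if slot["required"]:
            fields.append((name, py_type))
        else:
            fields.append((name, Optional[py_type], dataclasses.field(default=None)))
    return dataclasses.make_dataclass(class_name, fields)


if __name__ == "__main__":
    print(json.dumps(to_json_schema(SCHEMA, "Specimen"), indent=2))
    Specimen = to_dataclass(SCHEMA, "Specimen")
    print(Specimen(identifier="S-001"))
```

Because both outputs are generated from the same definition, a change such as making `age_at_collection` required propagates to the validation schema and the code types together. Keeping one definition is what prevents drift between them.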
Terminology management and bindings to the CRDC-H model were also presented. To ensure that the same meaning is used across different data commons, the enumerations and code sets referenced within any given data model must be recorded with unique identifiers, full provenance, and versioning.
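A minimal sketch of what such a recorded binding might hold is shown below, assuming a simple immutable record per enumeration value. The field names, the version strings, and the `same_meaning` rule are hypothetical. The NCIt concept identifier is used for illustration only. The strict rule treats two bindings as interchangeable only when they point to the same concept in the same terminology release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptBinding:
    """An enumeration value bound to a terminology concept, with provenance.

    Field names are hypothetical; they stand for the information the text says
    must be recorded: a unique identifier, provenance, and a version.
    """
    value: str           # permissible value as it appears in a data model
    concept_id: str      # CURIE that uniquely identifies the concept
    system_version: str  # terminology release the binding was made against
    source: str          # provenance: who or what asserted the binding

    def same_meaning(self, other: "ConceptBinding") -> bool:
        """Strict check: same concept in the same terminology release."""
        return (self.concept_id, self.system_version) == (
            other.concept_id,
            other.system_version,
        )


if __name__ == "__main__":
    # Two commons use different labels; the version strings are placeholders.
    a = ConceptBinding("Malignant Neoplasm", "NCIT:C9305", "2021.04", "commons A")
    b = ConceptBinding("Cancer", "NCIT:C9305", "2021.04", "commons B")
    print(a.same_meaning(b))  # True: different labels, same concept and release
```

Comparing identifiers and versions rather than display labels lets two commons agree on meaning even when their local wording differs.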
Takeaways
The CCDH recommended creating a common data model for use across data commons to support cross-commons search and analytics. An implementation-independent language such as LinkML affords flexibility across platforms and contexts. Terminology services and bindings to the model can then be managed separately in a fit-for-purpose manner. Computable resources such as the Cancer Data Standards Registry and Repository (caDSR) can serve as a repository for creating and validating Common Data Element (CDE) value sets, which further supports semantic interoperability. The harmonization strategies and tools that the CCDH is developing for the CRDC could readily be extended to NCPI resources.
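Validating submitted data against a CDE value set can be sketched as a membership check. The example below is hypothetical throughout. It does not call caDSR. `ValueSet`, `validate`, and the “Laterality” value set are invented for illustration, and the permissible values are hard-coded where a real pipeline would retrieve them from the registry.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class ValueSet:
    """A locally cached, hypothetical copy of a CDE's permissible values.

    In practice these values would be retrieved from a registry such as caDSR;
    here they are hard-coded for illustration.
    """
    name: str
    version: str
    permissible_values: FrozenSet[str]


def validate(
    records: List[Dict[str, Any]], element: str, value_set: ValueSet
) -> List[Tuple[int, Any]]:
    """Return (row index, value) for every value outside the value set.

    Missing values are reported as None.
    """
    return [
        (i, record.get(element))
        for i, record in enumerate(records)
        if record.get(element) not in value_set.permissible_values
    ]


if __name__ == "__main__":
    laterality = ValueSet(
        "Laterality (hypothetical)", "1.0",
        frozenset({"Left", "Right", "Bilateral", "Unknown"}),
    )
    rows = [{"laterality": "Left"}, {"laterality": "L"}, {}]
    print(validate(rows, "laterality", laterality))  # [(1, 'L'), (2, None)]
```

Recording the value set's version alongside its values means a validation result can later be traced back to exactly which list of permissible values it was checked against.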
Meet Our Team
Meet Alan Zheng, PhD, Clinical Informatics Program Manager
This month we spoke with Alan Zheng, the CRDC’s new program manager, to learn more about him and his work. Alan has 20 years of experience in IT product development and project management and holds a PhD in biochemistry. His skills include SDLC management, Agile Scrum methodology, business analysis, quality assurance, and project and program management. He is also a certified Project Management Professional (PMP). Alan joined Essex in February 2021 as a senior associate.
What are your professional background and interests?
I was trained in medicine and biochemistry and conducted neuroscience research at NINDS early in my career. I spent the past sixteen years at the American Society of Clinical Oncology (ASCO), managing technology programs for scientific conferences, digital education, and quality improvement. I am interested in machine learning and AI, and in how bioinformatics can be used in precision medicine to improve cancer treatment outcomes.
What is your new role at CRDC? What are you looking to accomplish?
I am the program manager for CRDC. In this role, I support the CRDC program by working with various project managers to provide timely and accurate project status updates. I also facilitate communication among stakeholders and address some of the challenges that come with the program's fast growth. In the near term, I am focusing on processes and tools to improve information sharing.
Why does CRDC matter to you?
I take cancer research and cancer care very personally, because too many of my relatives, friends, and colleagues have been affected by cancer. I believe sharing research data across all disciplines is critical to bringing results from the bench to the bedside, and the CRDC plays a key role in enabling researchers to contribute, access, and analyze vast amounts of cancer research data.
What advice would you like from folks at CRDC?
How do you prioritize your work and remain focused while multitasking on different projects?
What is something you would like people to know about you outside work?
I am a curious person and always want to understand why. I read about things like “why gravity is not a force but rather a spacetime curvature” or “why intermittent fasting may promote health and longevity.” I am interested in space travel and often fantasize about living on Mars.
Letter from the Editors
Since its inception in January 2021, the NCI CRDC Insight has covered a variety of topics. Our stories would not be possible without your contributions and ideas. We would like to take this opportunity to thank all our authors, contributors, and readers for making this newsletter a reality.
Based on input that the CRDC-wide team provided at the Fall 2020 All-Hands meeting, we shared stories about science and data releases, lessons learned, tools and resources, our people, as well as relevant events and opportunities.
On average, 51% of users opened the CRDC Insight, and 12% of users clicked on links within it. We are proud to share that these numbers exceed the benchmarks for internal newsletters, which average 40-45% and 6-7%, respectively.
We will take a break in July and August, but stay tuned for new stories in fall 2021. If you missed past issues of the monthly newsletter, you can find them on the CRDC Wiki.
Thank you for your engagement!
Marcia Fournier, Tanja Davidsen, Erika Kim, Allen Dearry & Eve Shalley