Introducing Livewire’s Metadata and Quality Characterization
In this issue of Livewire news, you will learn what metadata is and why it is so important. To review Livewire dataset descriptions and see metadata in action, go to livewire.energy.gov/dataset-search or sign up for a Livewire account at livewire.energy.gov/register.
What is metadata? Metadata is simply data about data. Both projects and datasets have metadata. A project’s metadata is descriptive information about the project, while a dataset’s metadata captures information about the content and structure of the data itself. Almost everything you see when you visit the U.S. Department of Energy’s Livewire Data Platform is (or is enabled by) metadata: data describing the projects in Livewire’s catalog and data describing the data in a project’s datasets. When you look at a project’s page in Livewire, the project title, description, point of contact, references, and keywords are all captured in the project’s metadata, as is everything else Livewire needs to know about that project. When you use Livewire’s search function, the search looks through every project’s metadata for your search terms. And as of late 2023, those searches also look through more detailed metadata for an expanding portion of Livewire’s datasets, making search even more powerful and informative.
Where does Livewire’s metadata come from? The Livewire team assembles each project’s metadata based primarily on information gathered through conversations with the project team or data owner. We do this to remove that burden from the project team and to capture all needed information in the same way across the Livewire catalog. It’s part of our curation effort to ensure consistency and findability throughout Livewire. The more detailed metadata describing the datasets in each project is also produced by the Livewire team, using largely automated processes to analyze datasets, determine their structure, and evaluate their content. Each dataset’s detailed metadata becomes part of the Livewire platform, where it supports searches and other capabilities, and it is made available to Livewire users in different user-readable and system-usable formats. Wondering how Livewire’s search knows that a particular project’s dataset has references to your search terms in a table or column name, or even in the data values themselves? That’s all captured in the detailed metadata. It’s something Livewire provides so that researchers and data owners don’t have to.
What is the quality characterization shown on many Livewire dataset pages? A key product of our process for producing a dataset’s detailed metadata is a characterization of the quality of the data itself. That characterization is the result of statistical analysis of the data, with the intent of identifying likely outliers and invalid or missing information, and then assessing the extent to which those potential issues affect other aspects of the dataset. Accuracy, quality, and completeness are assessed at the lowest row or column level and then aggregated to produce an overall quality metric, which is shown on the dataset’s page in Livewire. Knowing that one dataset is likely more complete or has fewer likely inaccuracies can help Livewire users decide which datasets best meet their research needs.
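For readers who want a feel for how a column-level assessment can roll up into a single number, here is a minimal sketch in Python. It is not Livewire’s actual method: the function names, the interquartile-range outlier rule, and the way the two measures are combined are all assumptions chosen purely for illustration.

```python
from statistics import quantiles


def column_quality(values):
    """Assess one numeric column; None marks a missing value (hypothetical helper)."""
    present = [v for v in values if v is not None]
    completeness = len(present) / len(values) if values else 0.0

    outliers = 0
    if len(present) >= 4:
        # Assumed outlier rule: more than 1.5 * IQR outside the quartiles.
        q1, _, q3 = quantiles(present, n=4)
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        outliers = sum(1 for v in present if v < low or v > high)

    outlier_share = outliers / len(present) if present else 0.0
    return {"completeness": completeness, "outlier_share": outlier_share}


def dataset_quality(columns):
    """Aggregate per-column results into one overall score between 0 and 1."""
    per_column = {name: column_quality(vals) for name, vals in columns.items()}
    # Assumed aggregation: completeness discounted by outlier share, averaged.
    scores = [
        c["completeness"] * (1 - c["outlier_share"]) for c in per_column.values()
    ]
    overall = sum(scores) / len(scores) if scores else 0.0
    return per_column, overall
```

Calling dataset_quality on a dictionary that maps column names to lists of values returns the per-column results along with the overall score, so a column with missing entries or a stray extreme value pulls the overall score down.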
Curious to know more? Contact the Livewire team at livewirecontact@lyris.pnnl.gov.