After your organization forms a general plan for tackling its cybersecurity and privacy risk management issues, it needs specific, state-of-the-art tools to put that plan into practice. Computer security and privacy experts at NIST have the answer: an updated toolbox of safeguards for protecting an organization’s operations and assets, as well as the personal privacy of individuals.
Tools that help everyday users find the information they are looking for on the internet are more urgently needed than ever, given the massive amount of data generated every day, hour, and minute! This information comes in different formats, such as text (e.g., the latest news or answers to questions), audio (e.g., soundtracks on SoundCloud), images (e.g., pictures for a PowerPoint presentation), and video (e.g., movies, tutorials about how to make or fix stuff, or cats).
For example, YouTube users watch about 1 billion hours of video each day, and more than 2 billion users log onto the service every month. How can YouTube keep those billions of users happy by delivering the videos they are looking for? This is where a major research field called information retrieval (IR) comes into play.
New NIST Resources Aid Searches of Coronavirus Dataset
NIST has made available four new resources for searching the COVID-19 Open Research Dataset (CORD-19), a one-stop shop of tens of thousands of research articles about SARS-CoV-2 and the coronavirus family of viruses. Some of these resources were developed with advanced data science and natural language processing, while the GitHub repository will be useful for people who want to apply their own algorithms to the dataset.
- The NIST Scientific Indexing Resource uses the NIST-developed “root and rule” method to determine keywords and help a user find relevant and related articles.
- The COVID-19 Data Repository relies on the Configurable Data Curation System, developed at NIST for structuring datasets that lack organization, and offers multiple ways to query the dataset.
- The COVID-19 Registry, also based on the Configurable Data Curation System, is a web application that collects descriptions of resources including other repositories, databases, services, portals, websites, and organizations. It relies on contributions from across the research community.
- cord19-cdcs-nist, hosted on GitHub, provides quick access to CORD-19 data that has already been screened for incomplete, irrelevant, or corrupt entries and is therefore ready for analysis with any programming language.
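For readers who want to "apply their own algorithms," here is a minimal sketch of one classic information retrieval technique, TF-IDF keyword ranking, in Python. It does not reproduce NIST's "root and rule" method or any of the tools listed above, and it does not read the actual repository format. The record IDs and texts in `DOCS`, as well as the function names `tokenize`, `build_index`, and `search`, are hypothetical stand-ins chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for pre-screened CORD-19 records (ID -> title text).
# These strings are invented for illustration; they are not real CORD-19 entries.
DOCS = {
    "doc-a": "Spike protein binding of SARS-CoV-2 to the ACE2 receptor",
    "doc-b": "Transmission dynamics of coronavirus in household settings",
    "doc-c": "Structural analysis of the coronavirus spike protein",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return per-document term counts and per-term document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # count each term once per document
    return term_counts, doc_freq


def search(query, docs, top_k=3):
    """Rank documents by the summed TF-IDF weight of the query terms they contain."""
    term_counts, doc_freq = build_index(docs)
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                # Smoothed IDF: terms that appear in fewer documents weigh more.
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
                score += counts[term] * idf
        if score > 0:  # drop documents that match no query term
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]
```

Calling `search("spike protein ACE2", DOCS)` ranks `doc-a` above `doc-c`: both mention "spike" and "protein," but only `doc-a` contains "ace2," which is rarer across the collection and so carries a higher IDF weight. `doc-b` matches no query term and is omitted. For simplicity, this sketch rebuilds the index on every query; a real search tool over tens of thousands of articles would build the index once and reuse it.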
These tools were developed in response to the March 16, 2020, White House Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset, which asked the artificial intelligence community to develop ways to make the collection’s text and data easily searchable by biomedical researchers.