The speaker's face flushed and glowed; Hilary Vireo, always glad andstrong in look and bearing, was grandly joyful when the power of thegospel he had to preach came upon him; the gospel of a full,perfect, and unstinted hope.
The Astronomical Data Center (ADC) at the NASA Goddard Space Flight Center is a major archive for machine-readable astronomical data tables. Many ADC tables are derived from published journal articles. Article tables are reformatted to be machine-readable and documentation is crafted to facilitate proper reuse by researchers. The recent switch of journals to web based electronic format has resulted in the generation of large amounts of tabular data that could be captured into machine-readable archive format at fairly low cost. The large data flow of the tables from all major North American astronomical journals (a factor of 100 greater than the present rate at the ADC) necessitates the development of rigorous standards for the exchange of data between researchers, publishers, and the archives. We have selected a suitable markup language that can fully describe the large variety of astronomical information contained in ADC tables. The eXtensible Markup Language XML is a powerful internet-ready documentation format for data. It provides a precise and clear data description language that is both machine- and human-readable. It is rapidly becoming the standard format for business and information transactions on the internet and it is an ideal common metadata exchange format. By labelling, or "marking up", all elements of the information content, documents are created that computers can easily parse. An XML archive can easily and automatically be maintained, ingested into standard databases or custom software, and even totally restructured whenever necessary. Structuring astronomical data into XML format will enable efficient and focused search capabilities via off-the-shelf software. The ADC is investigating XML's expanded hyperlinking power to enhance connectivity within the ADC data/metadata and developing XSL display scripts to enhance display of astronomical data. The ADC XML Definition Type Document can be viewed at -TREE.html
Free text data searching of earth science datasets has been implemented with varying degrees of success and completeness across the spectrum of the 12 NASA earth sciences data centers. At the JPL Physical Oceanography Distributed Active Archive Center (PO.DAAC) the search engine has been developed around the Solr/Lucene platform. Others have chosen other popular enterprise search platforms like Elasticsearch. Regardless, the default implementations of these search engines leveraging factors such as dataset popularity, term frequency and inverse document term frequency do not fully meet the needs of precise relevancy and ranking of earth science search results. For the PO.DAAC, this shortcoming has been identified for several years by its external User Working Group that has assigned several recommendations to improve the relevancy and discoverability of datasets related to remotely sensed sea surface temperature, ocean wind, waves, salinity, height and gravity that comprise a total count of over 500 public availability datasets. Recently, the PO.DAAC has teamed with an effort led by George Mason University to improve the improve the search and relevancy ranking of oceanographic data via a simple search interface and powerful backend services called MUDROD (Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery) funded by the NASA AIST program. MUDROD has mined and utilized the combination of PO.DAAC earth science dataset metadata, usage metrics, and user feedback and search history to objectively extract relevance for improved data discovery and access. In addition to improved dataset relevance and ranking, the MUDROD search engine also returns recommendations to related datasets and related user queries. This presentation will report on use cases that drove the architecture and development, and the success metrics and improvements on search precision and recall that MUDROD has demonstrated over the existing PO.DAAC search 2b1af7f3a8