The Earth Observer



March/April 1996, Vol.8, No.2

Quality Assurance Methodology for EOS Products

--Bob Lutz (rlutz@ltpmail.gsfc.nasa.gov), Hughes STX Corp., ESDIS Science Office

Introduction

This article presents a strategy for Quality Assurance (QA) for the generation and archiving of EOS products. The process will require the participation of the Instrument Science Teams (ITs), the DAACs, the Interdisciplinary Science (IDS) investigators and representatives from the general science community. One of the goals and challenges of EOS will be to ensure that EOSDIS satisfies the QA science requirements of all these communities. The material is being presented here to expose the proposed QA methodology to as much of the EOS community as possible. Information is provided at the end of this article on how to become involved in the process.

Before the EOS/EOSDIS era (i.e., before 1990), detailed QA procedures were often incorporated into the processing algorithms only after the satellite had been launched. This approach was sometimes ad hoc and incomplete, and the organization and content of the archived QA parameters were poor. Within EOS, budgetary constraints and capacity allocations mean that the real-time processing load and the storage demands of including QA data in EOSDIS must be planned in advance. Furthermore, defining QA procedures early in EOS will maximize the potential for long-term use of these data.

The general issue of QA for EOS products was first raised at a Data Processing Focus Team (DPFT) meeting in April 1994. Scoping of the effort began within the EOSDIS Project Science Office (Steve Wharton) and has since transitioned to the ESDIS Science Office (H. K. Ramapriyan). Draft versions of QA Procedural Plans have been circulated among the ITs and DAACs for comment. The latest draft (June 1995) has been presented at several science team meetings and workshops. In addition, a QA Functional Components document has been prepared to serve as a template for the anticipated IT-DAAC QA Plans.

Definition of Quality Assurance

The Panel on Data Quality (Mike Freilich, Chairperson) has proposed that quality control consists of three entities: calibration, quality assurance, and validation.

QA is defined as a process whose objective is to identify and flag data products that obviously and significantly deviate from the expected accuracies for the particular product type. The QA process will be performed at the granule level or finer, where a granule is defined as the smallest entity of a dataset for which inventory entries are maintained. We recommend that the QA definition also include any quality control process that can be done before the product is released to the general science community.

The QA analysis of EOS products will consist of one or more of three possible functional components:

1) Product Generation Executive (PGE) QA Analysis
Within this component, the data products are produced (presently at a DAAC) from science algorithms supplied by the instrument science teams. It is anticipated that numerous QA parameters (operational and product-related) will also be generated from these algorithms. As a part of this process, some of these generated QA parameters may be summarized and possibly subsetted. These QA parameters will then be sorted and subdivided amongst the product metadata, the data product, possible external QA products, and operational processing logs.

2) DAAC QA Analysis (Optional)
The role of the DAACs in the science QA process will be determined by the Instrument Science Teams. Some DAACs may perform extensive science QA as negotiated with their respective ITs, while others may perform no science QA at all. Possible DAAC QA functions include monitoring the operational QA parameters and summary QA statistics generated by the PGE analysis described above. Visual examination of the data products or statistical analyses of the QA parameters may also be done within this component. It is anticipated that the results of these analyses, and the specification of any problems found, will be sent electronically to Science Computing Facility (SCF) personnel in the form of QA reports. The SCF scientists would then investigate the problem and evaluate the situation.

3) SCF QA Analysis
The instrument teams will be ultimately responsible for the science QA of their data products. Part or all of a data product (and the related external QA products) may be examined visually or statistically by scientists at the SCF. The SCF scientists may also want some or all of the operational QA information and summary QA statistics. In addition, the instrument teams may receive QA reports from the DAAC(s) processing the data. As a result of the SCF QA analysis, it is possible (though not probable) that the SCF scientists will modify the data products; a more likely scenario is that only the metadata files within the product will be updated. In the final step of this analysis, the instrument team will recommend that: 1) the data be archived and released to the general public; 2) the data be put in temporary storage pending further investigation; or 3) the data are incorrect and must be reprocessed.

Metadata

Quality control information will be inserted into the core metadata at two levels: validation parameters at the collection level (e.g., a data set comprising many granules), and QA measures at the granule level. QA attributes within the core granule metadata include:

  1. QA Collection Stats
    A set of three general QA flags will be used to indicate the overall quality assurance level of the granule:

  2. QA Stats
    Generic numerically based flags will be associated with each granule. These flags include:

Note that some of these flags may not be informative at every processing level; for example, all Level 3 data are interpolated.
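As an illustration of how generic, numerically based granule flags might be derived, the Python sketch below summarizes a granule's data points into percentage statistics. Because the specific flags are not enumerated above, the field names (`percent_missing`, `percent_out_of_bounds`, `percent_interpolated`), the function `compute_qa_stats`, and the valid-range test are hypothetical assumptions for illustration, not EOSDIS metadata definitions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GranuleQAStats:
    """Hypothetical granule-level QA statistics (field names are illustrative)."""
    percent_missing: float
    percent_out_of_bounds: float
    percent_interpolated: float


def compute_qa_stats(
    values: Sequence[Optional[float]],
    interpolated: Sequence[bool],
    valid_min: float,
    valid_max: float,
) -> GranuleQAStats:
    """Summarize a granule's data points into percentage-based QA flags.

    values: data points; None marks a missing value.
    interpolated: True where a point was filled by interpolation.
    valid_min, valid_max: assumed physically plausible range for the product.
    """
    if len(values) != len(interpolated):
        raise ValueError("values and interpolated must have the same length")
    n = len(values)
    if n == 0:
        return GranuleQAStats(0.0, 0.0, 0.0)

    missing = sum(1 for v in values if v is None)
    # Only values that are present can fall outside the valid range.
    out_of_bounds = sum(
        1 for v in values if v is not None and not (valid_min <= v <= valid_max)
    )
    interp = sum(1 for flag in interpolated if flag)

    def pct(count: int) -> float:
        return 100.0 * count / n

    return GranuleQAStats(pct(missing), pct(out_of_bounds), pct(interp))


stats = compute_qa_stats([1.0, None, 5.0, 250.0], [False, False, True, False], 0.0, 100.0)
print(stats)
# GranuleQAStats(percent_missing=25.0, percent_out_of_bounds=25.0, percent_interpolated=25.0)
```

In this four-point example, one missing value, one value outside the assumed 0 to 100 range, and one interpolated point each account for 25% of the granule. As noted above, a percent-interpolated statistic would carry little information for Level 3 data, where every point is interpolated.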

To convey product-specific QA information, the instrument teams will establish specific QA measures, whose attributes will be contained within the non-core metadata. We strongly suggest that, where possible, the ITs develop a common approach for including these non-core QA results in the metadata, so that users can interpret project-wide QA in a consistent format. In addition, these product-specific metadata parameters should be general and adaptable enough to accommodate a changing QA data stream over the life of the project.

Users of QA parameters

QA parameters may be used by several "types" of users:

  1. ITs will use QA parameters for the monitoring of the "health" of their data products. It is possible that some of these data, based on decisions of the ITs, may only be "internal" and not archived at the DAAC, e.g., algorithm QA parameters.

  2. ITs whose products use other ITs' products as inputs may need supplemental QA information from those ITs. Some of the desired incoming QA parameters may be "internal" in nature, so caution and careful documentation are required if non-archived QA information is used in any decisions.

  3. The IDS teams and non-EOS-funded researchers may need extensive QA information, e.g., individual data point QA flags, in generating higher-level EOS products.

  4. The general science community may use QA statistics quite differently from the above groups, principally to "screen" data for potential usefulness. For this community, the metadata QA statistics may well be the most important parameters. This group may also provide non-binding recommendations on the characteristics of the non-core QA data, i.e., at what resolution, and which QA parameters from the PGE analysis, should ultimately be archived.
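The screening use described in item 4 can be sketched in Python as a filter over granule-level metadata. This is a minimal sketch under stated assumptions: the record type `GranuleSummary`, its fields, the flag values ("passed", "failed", "being investigated"), and the default thresholds are hypothetical choices for illustration, not project-defined values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class GranuleSummary:
    """Hypothetical subset of core granule QA metadata used for screening."""
    granule_id: str
    science_qa_flag: str          # illustrative values: "passed", "failed", "being investigated"
    percent_missing: float        # hypothetical generic QA statistic, 0-100
    percent_out_of_bounds: float  # hypothetical generic QA statistic, 0-100


def screen_granules(
    granules: Iterable[GranuleSummary],
    accepted_flags: Tuple[str, ...] = ("passed",),
    max_missing: float = 10.0,
    max_out_of_bounds: float = 5.0,
) -> List[str]:
    """Return IDs of granules whose metadata QA suggests they merit a closer look.

    The thresholds stand in for a user's own criteria; they are not EOS limits.
    """
    selected = []
    for g in granules:
        if g.science_qa_flag not in accepted_flags:
            continue  # skip granules whose overall QA status the user does not accept
        if g.percent_missing > max_missing:
            continue  # too much missing data for this user's purpose
        if g.percent_out_of_bounds > max_out_of_bounds:
            continue  # too many implausible values
        selected.append(g.granule_id)
    return selected


catalog = [
    GranuleSummary("A", "passed", 2.0, 0.0),
    GranuleSummary("B", "failed", 0.0, 0.0),
    GranuleSummary("C", "passed", 15.0, 0.0),
    GranuleSummary("D", "being investigated", 5.0, 1.0),
]
print(screen_granules(catalog))  # ['A']
print(screen_granules(catalog, accepted_flags=("passed", "being investigated")))  # ['A', 'D']
```

Because only metadata are consulted, a user can narrow a search before ordering any data, which is why the metadata QA statistics may matter most to this community.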

Implementation

A proposed scheme for developing a comprehensive QA methodology for the ITs, the DAACs, the IDS teams, and the user community is presented here. The procedure is a two-stage iterative process. In the first stage, information is gathered independently from each group by soliciting its QA procedures and needs. The collected information will be compiled and distributed to the various groups, and a workshop will be convened at which representatives from all groups will help formulate a project-wide QA approach. In the second stage, each group may fine-tune its own plans to accommodate the needs of the others.

Involvement of the ITs and the DAACs

As algorithms mature and lessons are learned from implementing earlier versions of the software, the QA methodology will evolve. We therefore recommend that QA Plans be written in a sequence timed to the growing need for QA information at successive IT software deliveries. The first plan in this sequence would be written before the workshop. A suggested QA Plan has been formulated and circulated among the ITs. It solicits information on the general characteristics of the functional QA components, as well as a detailed description of the inputs and outputs of each component. The proposed QA Plan also provides a section in which the ITs can indicate the QA statistics they would like to receive from other ITs.

  1. Draft QA Plan for Version 1
    In this version of the QA Plan, the ITs may not be able to provide specific details of their QA products because their algorithms are still immature. However, general QA elements should already be known, and specifying this preliminary information will help data-dependent ITs plan their software. We recommend that the ITs generate draft QA Plans between their Beta and Version 1 releases (June 1996). This will leave sufficient time after the draft plans are written for an information exchange workshop (October 1996).

  2. QA Plan for Version 2
    Based on information gathered at the workshop and lessons learned from implementing Version 1, the ITs will generate final QA Plans. We recommend that these plans be delivered (April 1997) several months before the implementation date of the Version 2 software, to give data-dependent ITs ample time to understand and incorporate the incoming QA data products.

Involvement of the IDS Teams and the User Community

The IDS teams will be notified, through their panel chairpersons, that the project needs their input on QA procedures within EOS. Their needs for QA information will be gathered through the Ad Hoc Working Group on Consumers (AHWGC).

A panel will be formed of researchers representative of the science user community. Members of the DAAC User Working Groups (UWGs) may make up part of the panel, and other interested members of the science community will be welcome to join. We recommend that the AHWGC work with this panel to formulate a method for soliciting comments from the user community on the proposed content of the archived QA metadata. At the workshop, representatives of the panel will be encouraged to comment on the proposed content of the sub-granule QA information as specified in the IT Draft QA Plans.

Summary

The successful completion of this activity will allow:

This article, the fourth draft of the QA Procedural Plan, and the QA Functional Components document are available on the ESDIS homepage at http://eos.nasa.gov/esdis. Please use the comment option found there, or e-mail Bob Lutz (rlutz@ltpmail.gsfc.nasa.gov), to indicate interest in the information exchange workshop planned for October 1996.
