Data Quality

What is Data Quality?

Data quality measures how well a dataset meets criteria such as accuracy, completeness, validity, consistency, uniqueness, timeliness, and fitness for scientific use. In Earth System Science (ESS), we employ observational, chemical, physical, and modeling techniques to understand the Earth system. Data quality practices, including quality control, performance monitoring, calibration, and validation, need to be applied systematically so that collected datasets align with mission objectives and the broader standards of the Earth science community.
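
As a rough illustration of how some of these criteria can be expressed as numbers, the Python sketch below computes completeness, uniqueness, and validity for a small table of temperature readings. Everything in it is an assumption made for illustration: the sample records, the field names (station, time, temp_c), the function names, and the plausibility range of -90 to 60 degrees Celsius. It is not a prescribed method, and real ESS datasets would call for community-agreed definitions and thresholds.

```python
# Hypothetical sample: hourly air-temperature readings in degrees Celsius.
# None marks a missing value; all station IDs and values are invented.
records = [
    {"station": "A1", "time": "2024-01-01T00:00", "temp_c": 3.2},
    {"station": "A1", "time": "2024-01-01T01:00", "temp_c": None},
    {"station": "A1", "time": "2024-01-01T01:00", "temp_c": None},   # duplicate row
    {"station": "A1", "time": "2024-01-01T02:00", "temp_c": 999.0},  # implausible value
]


def completeness(rows, field):
    """Fraction of rows in which `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, key_fields):
    """Fraction of rows that carry a distinct key (here: station + time)."""
    keys = {tuple(r[f] for f in key_fields) for r in rows}
    return len(keys) / len(rows)


def validity(rows, field, low, high):
    """Fraction of non-missing values that fall inside [low, high]."""
    values = [r[field] for r in rows if r[field] is not None]
    if not values:
        return 0.0
    return sum(low <= v <= high for v in values) / len(values)


# Assumed plausibility bounds for near-surface air temperature (illustrative only).
report = {
    "completeness": completeness(records, "temp_c"),
    "uniqueness": uniqueness(records, ("station", "time")),
    "validity": validity(records, "temp_c", -90.0, 60.0),
}
print(report)
```

Each score is a fraction between 0 and 1, so the scores can be reported alongside a dataset and compared across versions. Criteria such as accuracy and timeliness generally need reference data or acquisition metadata and are not covered by this sketch.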

Why is Data Quality important?

The quality of data directly affects the robustness of scientific explanations in ESS. Accurate datasets enable the scientific community to strengthen its understanding of Earth’s processes and to manage data more reliably. In the era of AI and machine learning, datasets are increasingly reused as training material. Flawed or “dirty” datasets (those containing errors, biases, or missing information) can significantly mislead analyses and distort conclusions. Therefore, rigorous evaluation and publication of data quality metrics have become fundamental for scholarly datasets. These metrics are also critical for regulating and informing the creation of future datasets.

What will be offered in this article collection?

The "Data Quality Collection" is planned to cover topics such as profiling, validation, standardization, cleansing, enrichment, deduplication, monitoring, and governance. Together, these practices aim to ensure accuracy, completeness, and consistency from the point of data entry onward. We also encourage the use of international standards and tools for data quality control. For more details on contributing, please consult our article template.