SPIDER: Data Quality & Data Cleaning Project

Mohammad Sadoghi, D. Stantic, and N. Koudas.

In IBM CASCON, 2005.

Abstract

Data quality is a serious concern in every organization that relies on data. Data quality is often poor for many reasons, including spelling mistakes, abbreviations, a lack of standards, and inconsistent notation. SPIDER is a declarative data cleaning tool. It incorporates a set of algorithms that can be used to improve data quality in any relational data source.
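
The abstract does not describe SPIDER's algorithms in detail, but the class of problem it targets can be illustrated with a small sketch: flagging near-duplicate records whose values differ only by spelling mistakes, abbreviations, or inconsistent notation. The Python snippet below applies a simple edit-distance-style similarity over a toy customer relation; the table, threshold, and function names are illustrative assumptions and are not taken from SPIDER itself.

# Illustrative sketch only: near-duplicate detection over a small relational
# table using edit-distance-style similarity. The relation, threshold, and
# function names are assumptions for demonstration, not SPIDER's algorithm.
from difflib import SequenceMatcher
from itertools import combinations

# Toy "customers" relation exhibiting spelling mistakes, abbreviations, and
# inconsistent notation -- the kinds of errors listed in the abstract.
customers = [
    (1, "International Business Machines", "Toronto, ON"),
    (2, "Intl. Business Machines", "Toronto, Ontario"),
    (3, "Acme Corporation", "Markham, ON"),
    (4, "Acme Corp", "Markham ON"),
]

def similarity(a, b):
    """Normalized string similarity in [0, 1] via difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_near_duplicates(rows, threshold=0.7):
    """Return pairs of row ids whose name and address both look alike."""
    matches = []
    for (id1, name1, addr1), (id2, name2, addr2) in combinations(rows, 2):
        if (similarity(name1, name2) >= threshold
                and similarity(addr1, addr2) >= threshold):
            matches.append((id1, id2))
    return matches

if __name__ == "__main__":
    for id1, id2 in find_near_duplicates(customers):
        print(f"rows {id1} and {id2} are likely the same entity")

A declarative tool such as SPIDER would let the user state matching conditions like these as rules rather than hand-coding the pairwise loop; the sketch only conveys the kind of fuzzy comparison involved.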

Readers who enjoyed the above work may also like the following:


  • Optimizing Key-Value Stores for Hybrid Storage Architectures.
    Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, and Hans-Arno Jacobsen.
    In Proceedings of CASCON, 2014.
    Tags: key-value stores, leveldb
  • Adaptive Parallel Compressed Event Matching.
    Mohammad Sadoghi and Hans-Arno Jacobsen.
    In 30th IEEE International Conference on Data Engineering, 2014.
    Tags: content-based matching, publish/subscribe, event processing
  • CaSSanDra: An SSD Boosted Key-Value Store.
    Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, and Hans-Arno Jacobsen.
    In 30th IEEE International Conference on Data Engineering, pages 1162-1167, 2014.
    Tags: cassandra, big data, key-value store, nosql