A Survey on Removal of Duplicate Records in Database
Journal Title Indian Journal of Science and Technology
Journal Abbreviation indjst
Publisher Group Informatics (India) Limited (gjeis)
Website http://gjeis.org/index.php/indjst
Authors Anand, S. Krishna
Abstract Deduplication is the task of identifying one or more records in a repository that represent the same object or entity. The problem is that the same data may be represented differently in each database. When databases are merged, duplicate records appear in differing forms because of different schemas, writing styles, or misspellings; these are called replicas. Removing replicas from repositories yields higher-quality information and saves processing time. This paper presents a thorough analysis of similarity metrics for identifying similar fields in records, together with a set of algorithms and duplicate detection tools for detecting and removing replicas from a database.
Publisher Indian Society for Education and Environment (ISEE)
Date 2013-04-01
Source Indian Journal of Science and Technology Volume 6, Issue 4, April 2013

 
