Wednesday, March 16, 2011

Feeling Fuzzy

SimMetrics
"SimMetrics is an open source extensible library of Similarity or Distance Metrics, e.g. Levenshtein Distance, L2 Distance, Cosine Similarity, Jaccard Similarity etc etc. SimMetrics provides a library of float based similarity measures between String Data as well as the typical unnormalised metric output."
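The quote distinguishes a "float based similarity" from the "typical unnormalised metric output". Below is a minimal Python sketch of that distinction for two of the metrics it names. It is not the SimMetrics API, which is a Java library. The normalisation 1 - distance / max(len) is an assumption: it is one common way to map an edit distance into [0, 1], not necessarily the one SimMetrics uses. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Unnormalised edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation of the distance into [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the strings' whitespace-separated token sets."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(levenshtein("kitten", "sitting"))                        # 3
print(round(levenshtein_similarity("kitten", "sitting"), 3))   # 0.571
print(round(jaccard_similarity("fuzzy lookup table", "fuzzy table"), 3))  # 0.667
```

The raw distance (3) depends on string length, while the normalised score lets you compare pairs of different lengths against a single cut-off.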


Python: difflib
"This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce difference information in various formats..."
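For fuzzy matching, the parts of difflib most likely to be useful are `SequenceMatcher.ratio()` and `get_close_matches()`. `unified_diff()` covers the "difference information" side. A short sketch using only standard-library calls; the sample strings and file names are illustrative.

```python
import difflib

# Similarity ratio in [0, 1]: 2*M/T, where M is the number of matched
# characters and T is the total length of both strings.
matcher = difflib.SequenceMatcher(None, "abcd", "bcde")
print(matcher.ratio())  # 0.75

# Best fuzzy matches from a candidate list (defaults: n=3, cutoff=0.6).
close = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
print(close)  # ['apple', 'ape']

# Line-based difference output in unified diff format.
old = ["one", "two", "three"]
new = ["one", "2", "three"]
diff_lines = list(difflib.unified_diff(old, new, fromfile="old.txt",
                                       tofile="new.txt", lineterm=""))
for line in diff_lines:
    print(line)
```

Raising the `cutoff` argument of `get_close_matches()` trades recall for precision.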


SSIS: Fuzzy Lookup Transformation
"The Fuzzy Lookup transformation performs data cleaning tasks such as standardizing data, correcting data, and providing missing values."
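The core idea of a fuzzy lookup can be sketched outside SSIS: match each incoming "dirty" value against a clean reference table, keep the best-scoring candidate, and reject it below a similarity threshold. This is a swapped-in technique. It scores with difflib's `SequenceMatcher`, not the matching algorithm SSIS uses internally. The reference list, the `fuzzy_lookup` helper, and the 0.8 threshold are all hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical clean reference table (the "lookup" side).
REFERENCE_CITIES = ["Seattle", "San Francisco", "Los Angeles", "New York"]


def fuzzy_lookup(value: str, reference: list[str],
                 threshold: float = 0.8) -> tuple[Optional[str], float]:
    """Return (best reference match, score), or (None, best score) below threshold.

    Hypothetical helper: scores every reference row with SequenceMatcher.ratio().
    """
    best, best_score = None, 0.0
    for candidate in reference:
        # Case-insensitive comparison, so "seattle" and "Seattle" score 1.0.
        score = difflib.SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


dirty_rows = ["Seatle", "san fransisco", "New Yrok", "Chicago"]
for row in dirty_rows:
    match, score = fuzzy_lookup(row, REFERENCE_CITIES)
    print(f"{row!r:18} -> {match!r:18} score={score:.2f}")
```

Scanning every reference row is fine for a small table. For large reference data you would want an index rather than a linear scan.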