Introduction

mSQL provides high-performance, scalable, and robust deduplication through two interfaces: SQL CLR stored procedures (SQL Server 2005 or newer) and .NET tasks for use in SSIS (SQL Server Integration Services) packages.

Stored Procedures:

These are typically executed within T-SQL scripts. A number of stored procedures are available, each performing a discrete task, and they can be executed either in sequence or concurrently via multiple T-SQL scripts.

The stored procedures are configured using XML-formatted configuration files (‘configs’). A config contains all the settings required for an operation: one or more data sources (each with a database connection string, a list of tables, and a list of columns with the type of data each contains), the fuzzy match keys to use, the names of output tables, and various processing options.
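
To make the shape of a config more concrete, the fragment below sketches how the settings listed above might be laid out. It is illustrative only: every element name, attribute name, and value (including the server, database, table, and column names) is a hypothetical placeholder and not the product's actual schema, which should be taken from a shipped sample config or the config UI.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative sketch only. All element and attribute names are assumptions, not the real schema. -->
<config>
  <!-- One or more data sources, each with a connection string, tables, and typed columns -->
  <dataSources>
    <dataSource name="Customers">
      <connectionString>Data Source=localhost;Initial Catalog=CRM;Integrated Security=SSPI;</connectionString>
      <tables>
        <table name="dbo.Customers">
          <columns>
            <column name="FullName" dataType="Name" />
            <column name="Address1" dataType="Address" />
            <column name="Postcode" dataType="Postcode" />
          </columns>
        </table>
      </tables>
    </dataSource>
  </dataSources>
  <!-- Fuzzy match keys to use when comparing records (hypothetical key names) -->
  <matchKeys>
    <matchKey name="NameAndPostcode" />
  </matchKeys>
  <!-- Names of the tables that receive the results (hypothetical) -->
  <output>
    <table role="matches" name="dbo.Customers_Matches" />
  </output>
  <!-- Processing options (hypothetical) -->
  <options>
    <option name="MinimumScore" value="80" />
  </options>
</config>
```

The point of the sketch is the grouping: data sources, match keys, output tables, and options each sit in their own section, so a single file fully describes one operation.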

Configs can be edited using any XML or text editor. An optional browser-based ASP.NET UI is available as an alternative for creating and editing configs.

SSIS Packages:

Control Flow components for use in SSIS packages are also provided. These are created and edited visually within a Visual Studio shell (BIDS or SSDT) and provide near real-time feedback on running tasks.

The components use the same core .NET code as the stored procedures, so the two interfaces are functionally identical.

Deployment:

The product can be deployed in a number of different environments. For example, it can be installed on a computer with a single SQL Server instance that contains the data being deduplicated, or on a computer running SQL Server that accesses data located elsewhere on the network.

Glossary

SSIS = SQL Server Integration Services
SSMS = SQL Server Management Studio
BIDS = Business Intelligence Development Studio (SQL Server 2005 and 2008)
SSDT = SQL Server Data Tools (SQL Server 2012 or newer)
Config = An XML-formatted configuration file that is used by the stored procedures.