Testbed

mHUB includes three versions of a testbed ("HubTest") that can be used for simple testing and demonstration purposes. Each version takes a delimited text file or database table, identifies the duplicate records within it, and outputs the results. Alternatively, two files or tables can be passed in, in which case the records that overlap (i.e. appear in both) are output.

 

C++ HubTest (delimited files)

HubTest is a command line program. To see a full list of available arguments, execute:

hubtest /?

The general form of the command line is:

hubtest /license=<licenseFile> /settings=<settingsFile>
          /input=<inputFile1> [/delimiter=<delimiter1>]
          [/input=<inputFile2> [/delimiter=<delimiter2>]]
          /output=<outputFile> [/encoding=<encoding>]
          [/stats=<statsFile>] [switches]

Where:
  <licenseFile>   The name of the file containing the activation code.
  <settingsFile>  The name of a mHUB XML settings file
                  (see the Configuration Guide for details).
  <inputFile1>    The name of the input file for a matching process or
                  the name of the first input file for an overlap process.
  <delimiter1>    The field delimiter used in the first or both input file(s)
                  (default ',').
  <inputFile2>    The name of the second input file for an overlap process.
  <delimiter2>    The field delimiter used in the second input file, if
                  different from the first.
  <outputFile>    The name of the output file.
  <encoding>      The character encoding used in all files (default ANSI).
  <statsFile>     The name of a file to write statistics to (XML format).

  <delimiter> can be any single non-alphanumeric character.
  <encoding> can be ANSI, UTF8 or UTF16.

Switches:
  /yes            Overwrite output file without asking.
  /header         Input files have header record.
  /help or /?     Display usage.

The following example runs a matching process on a single input file:

hubtest /settings=settings.xml /input=contacts.txt /output=results.txt /license=activation.txt "/delimiter=|"

The mHUB configuration settings are supplied in an XML-formatted text file (settings.xml); please refer to the Configuration Guide for details on how to create and customize configuration settings files.

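The input data is read from contacts.txt, a delimited text file that uses a pipe character ('|') as the delimiter. The /delimiter argument is enclosed in double quotes because the command prompt would otherwise interpret the pipe character as its own pipe operator. Results are written to results.txt, which is similarly delimited.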
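
When HubTest is launched from another program rather than typed at a prompt, the same command line can be assembled as a list of arguments. The following Java sketch is illustrative only: HubTestCommand and its methods are hypothetical names that are not part of mHUB, and it assumes that the hubtest executable is on the PATH and that encoding names must be spelled exactly as in the usage text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of mHUB) that assembles a hubtest command line. */
final class HubTestCommand {

    // Encoding names as listed in the usage text; exact-match comparison is an assumption.
    private static final Set<String> ENCODINGS = Set.of("ANSI", "UTF8", "UTF16");

    /** The usage text allows any single non-alphanumeric character as a delimiter. */
    static boolean isValidDelimiter(String delimiter) {
        return delimiter != null
                && delimiter.length() == 1
                && !Character.isLetterOrDigit(delimiter.charAt(0));
    }

    /** Builds the argument list for a matching run on a single input file. */
    static List<String> build(String license, String settings, String input,
                              String output, String delimiter, String encoding) {
        if (!isValidDelimiter(delimiter)) {
            throw new IllegalArgumentException("Delimiter must be one non-alphanumeric character");
        }
        if (!ENCODINGS.contains(encoding)) {
            throw new IllegalArgumentException("Encoding must be ANSI, UTF8 or UTF16");
        }
        List<String> args = new ArrayList<>();
        args.add("hubtest"); // assumes the executable is on the PATH
        args.add("/license=" + license);
        args.add("/settings=" + settings);
        args.add("/input=" + input);
        args.add("/delimiter=" + delimiter);
        args.add("/output=" + output);
        args.add("/encoding=" + encoding);
        return args;
    }
}
```

The resulting list can be handed to java.lang.ProcessBuilder, for example new ProcessBuilder(HubTestCommand.build("activation.txt", "settings.xml", "contacts.txt", "results.txt", "|", "ANSI")).inheritIO().start(). Because each argument is a separate list element and no command shell is involved, the pipe character needs no quoting. For unattended runs, the /yes switch can be appended so that an existing output file is overwritten without a prompt.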

mHUB requires a valid license. This will usually be supplied as a text file, the contents of which must be read and passed in when a mHUB engine is initialized. HubTest does this automatically, using the specified license file.
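
An application that embeds mHUB has to perform this step itself. A minimal Java sketch of reading the activation code follows. LicenseReader is a hypothetical name. The sketch assumes that the file is plain text readable as UTF-8 and that surrounding whitespace is not significant, neither of which is confirmed by this guide. The call that passes the code to the engine is omitted because it depends on the mHUB API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of mHUB) that loads an activation code from a file. */
final class LicenseReader {

    static String readActivationCode(Path licenseFile) {
        try {
            // Assumption: the license file is plain text readable as UTF-8.
            String contents = new String(Files.readAllBytes(licenseFile), StandardCharsets.UTF_8);
            // Assumption: leading and trailing whitespace (including newlines) is not significant.
            return contents.trim();
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read license file " + licenseFile, e);
        }
    }
}
```

The returned string would then be supplied wherever the mHUB engine expects the license contents at initialization.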

 

C# & Java HubTest (database tables)

The C# and Java versions of HubTest are similar to the C++ version, but their inputs and outputs are database tables rather than delimited files. (The C# version works with MS SQL Server; the Java version works with MS SQL Server, Oracle and MySQL.)

hubtest /testconfig=<testConfigFile>
          /settings=<settingsFile>
          /license=<licenseFile>
          [/stats=<statsFile>]
          [switches]

Where:
  <testConfigFile>  The name of a HubTest XML configuration file.
  <settingsFile>    The name of a mHUB XML settings file
                    (see the Configuration Guide for details).
  <licenseFile>     The name of the file containing the activation code.
  <statsFile>       The name of a file to write statistics to (XML format).

Switches:
  /help or /?       Display usage.

The difference from the C++ command line usage is that, instead of the names of input and output files, these versions take a /testconfig=<testConfigFile> argument, which names an XML configuration file that describes the database access. For example:

<?xml version="1.0" encoding="utf-8" ?>
<config>

  <input>
    <!-- Define one data source for single table Matching -->
    <dataSource>
      <connectionString>database connection string</connectionString>
      <table>TABLE</table>
      <columns>UniqueRef,Prefix,Forenames,Surname,Address1,Address2,Address3,Address4,Address5,Postcode</columns>
    </dataSource>

    <!-- Define two data sources for Overlap Matching -->
    <!--
    <dataSource>
    </dataSource>
    -->
  </input>

  <!-- Output database in which to create tables for MatchingPairs, GroupedMatchingPairs, MatchingGroups, DedupedData, DuplicateData, DeletedPairs.
       Only the output types enabled in the mHUB configuration file will be output. -->
  <output>
    <connectionString>database connection string</connectionString>
  </output>

</config>
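
To illustrate how the elements of this file relate to one another, the following Java sketch reads the layout shown above using the standard JAXP DOM parser. It is not the code used by the HubTest samples: TestConfigReader and DataSource are hypothetical names, the sketch performs no validation of missing elements, and it assumes that one <dataSource> element means a matching run and two mean an overlap run, as the comments in the example indicate.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader (not the HubTest sample code) for the configuration layout shown above. */
final class TestConfigReader {

    /** Values taken from one <dataSource> element. */
    static final class DataSource {
        final String connectionString;
        final String table;
        final List<String> columns;

        DataSource(String connectionString, String table, List<String> columns) {
            this.connectionString = connectionString;
            this.table = table;
            this.columns = columns;
        }
    }

    /** Returns one entry per <dataSource>; commented-out elements are ignored by the parser. */
    static List<DataSource> readInputs(String xml) {
        NodeList nodes = parse(xml).getElementsByTagName("dataSource");
        List<DataSource> sources = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element ds = (Element) nodes.item(i);
            List<String> columns = new ArrayList<>();
            for (String column : childText(ds, "columns").split(",")) {
                columns.add(column.trim());
            }
            sources.add(new DataSource(childText(ds, "connectionString"),
                                       childText(ds, "table"), columns));
        }
        return sources;
    }

    /** Returns the connection string inside the <output> element. */
    static String readOutputConnectionString(String xml) {
        Element output = (Element) parse(xml).getElementsByTagName("output").item(0);
        return childText(output, "connectionString");
    }

    // Text of the first descendant element with the given tag name.
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid HubTest configuration", e);
        }
    }
}
```

With the example file, readInputs returns a single entry whose columns list starts with UniqueRef and ends with Postcode. Uncommenting and completing the second <dataSource> element would yield two entries.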

 

Sample Source Code

mHUB includes full sample source code for all three versions of the HubTest program.

To access the sample code, open the Start menu and navigate to matchIT Hub -> Install Sample Files. Clicking this runs an installer that extracts the sample code to:

C:\Users\<username>\Documents\matchIT Hub\sample code

where <username> is the account name of the current user. Windows Explorer then opens this folder automatically.

The sample code is supplied in three versions:

  • The C++ version reads data from a delimited text file and writes results to a separate delimited text file;
  • The C# version (Windows only) uses ADO.NET to read from and write to MS SQL Server tables;
  • The Java version uses JDBC to read from and write to MS SQL Server, Oracle or MySQL database tables.

Note that these programs are not intended for production use in their current form, but they can serve as a starting point for developing an application that uses mHUB.