mHUB is currently available for these operating systems:
- Windows (XP, Vista, 7, 10, 2008, 2012);
- Linux (including RHEL 5-7)*;
- Solaris (10 or 11)*.
mHUB can be provided for other operating systems on request.
(*) Note that mHUB requires GCC 4.8.x or newer, which isn't available out-of-the-box on Solaris and some older Linux distributions. Please refer to the Runtime Requirements section for further details.
mHUB is fully multithreaded and highly scalable. By default it uses all available processor cores, but it can be restricted to fewer if necessary. The more cores mHUB runs on, the faster it can process data.
mHUB can run entirely in memory. Memory requirements increase with the volume of data, so it is highly recommended to run mHUB on a machine with enough memory to process the data without resorting to disk storage.
As a rough guideline:
- a machine with 8 GB of RAM should comfortably process 15 million rows;
- a machine with 16 GB of RAM should comfortably process 30 million rows;
- a machine with 32 GB of RAM should comfortably process 60 million rows;
- a machine with 48 GB of RAM should comfortably process 80 million rows.
If overlapping two sources of data, apply these guidelines to their summed row counts (for example, 100 million vs. 20 million rows would require 80 GB of RAM).
Note that these figures are highly dependent on factors such as:
- the average size of each row (these figures assume an average row size of 150 bytes);
- which match keys are used (refer to the Configuration Guide for details on match keys);
- the amount of duplication in the data.
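The guideline figures above can be turned into a rough estimator. The sketch below is an illustration only and is not part of mHUB: the function names are hypothetical, and both the linear interpolation between the listed points and the extrapolation beyond 80 million rows (using the slope of the last interval) are assumptions. That extrapolation happens to reproduce the 80 GB figure quoted for 120 million rows. Results inherit all the caveats listed above (row size, match keys, duplication).

```python
# Guideline points from this section: (RAM in GB, rows in millions).
GUIDELINE = [(8, 15), (16, 30), (32, 60), (48, 80)]


def estimate_ram_gb(rows_millions: float) -> float:
    """Rough RAM estimate (GB) for a given row count in millions.

    Assumptions (not stated by the product documentation):
    - anything at or below the first point needs the smallest listed machine;
    - values between points are linearly interpolated;
    - values past the last point continue along the last interval's slope.
    """
    if rows_millions < 0:
        raise ValueError("row count must be non-negative")

    smallest_ram, smallest_rows = GUIDELINE[0]
    if rows_millions <= smallest_rows:
        return float(smallest_ram)

    # Find the interval containing the row count; if none matches, the loop
    # ends on the last interval, which is then used for extrapolation.
    for (ram0, rows0), (ram1, rows1) in zip(GUIDELINE, GUIDELINE[1:]):
        if rows_millions <= rows1:
            break

    return ram0 + (rows_millions - rows0) * (ram1 - ram0) / (rows1 - rows0)


def estimate_overlap_ram_gb(rows_a_millions: float, rows_b_millions: float) -> float:
    """When overlapping two sources, apply the guideline to the summed rows."""
    return estimate_ram_gb(rows_a_millions + rows_b_millions)


if __name__ == "__main__":
    print(estimate_ram_gb(45))               # between the 30M and 60M points
    print(estimate_overlap_ram_gb(100, 20))  # the 100M vs. 20M example
```

Treat the output as a starting point for sizing, then validate against a representative sample of the real data.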
To work with such high volumes of data, it is necessary to use the 64-bit edition of the mHUB component. If the 32-bit edition is used, the process can typically allocate only 2 GB of RAM. This limits the amount of data that can be held in RAM and processed without resorting to disk for overflow storage, which has a significant impact on performance. In practice, the limit will be 1 or 2 million rows of data, but this depends entirely on factors including the quality of the data and which matching levels and match keys are in use.
Normalization: Note that when an engine is configured for normalization, each row of data added to the engine is discarded immediately after it has been processed and output; it is not otherwise retained in RAM. The above RAM requirements therefore do not apply, and memory usage is minimal.
mHUB can fall back to storing data on disk, for example if memory usage exceeds a predetermined threshold. This can significantly impact performance, but will allow for processing greater volumes of data. Should disk usage be necessary, then fast disks (such as SSDs) are highly recommended.