Working Around the 2 GB File Size Limit

The 2 GB limit is essentially a by-product of DOS and, subsequently, the Windows file systems, which were limited to 2 GB at a time when the largest hard disk available for purchase was approximately 180 MB. The problem is not limited to FoxPro: Microsoft Access and SQL Server's MSDE product, amongst others, also have a 2 GB limit.

What does the 2 GB limit mean in terms of matchIT? 2 GB is 2,147,483,648 bytes. To work out how many records this represents, find out how many bytes each record occupies in the file. The Table tab in the Main File Layout Table Designer shows the total length of each record, which lets you calculate the maximum number of records that can be loaded into one table: 1024 * 1024 * 1024 * 2 divided by the record length.
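The formula above can be sketched in a few lines of Python. This is only an illustration: the function name max_records and the 500-byte record length are assumptions made for the example, and the result ignores the table header, so treat it as an upper bound.

```python
# 2 GB expressed in bytes, exactly as in the formula above.
LIMIT_BYTES = 1024 * 1024 * 1024 * 2  # 2,147,483,648


def max_records(record_length: int) -> int:
    """Return how many whole records of the given length fit in 2 GB.

    record_length is the total record length in bytes, as shown on the
    Table tab. The table header is ignored (assumption), so the real
    maximum is slightly lower.
    """
    if record_length <= 0:
        raise ValueError("record_length must be a positive number of bytes")
    return LIMIT_BYTES // record_length


# Hypothetical 500-byte record.
print(max_records(500))
```

Halving the record length doubles the number of records that fit, which is why the steps that follow concentrate on shortening or removing fields.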

To access the Table tab, open a DBF file, click Tools, then Main File Layout Table Designer, and finally click the Table tab.

 

Steps to take to ensure the 2 GB limit is not reached:

1. Maximize the number of records that can be imported by reducing record lengths: reduce the size of the input fields as much as possible (name and address fields can often be limited to 20 or 30 characters) and, prior to import, remove any fields that mDesktop does not need for processing.

2. Reduce or remove some of mDesktop’s generated fields as follows:

  • Name - make this 1 byte if you aren't doing Individual level matching
  • Coy_name - delete if you aren't doing Business level matching
  • Name_key - delete this; if you want to use it in a match key, it is just NAME1 + LEFT(NAME2, 1)
  • Coy_key - delete unless you are doing both Business and Individual matching
  • Match_ref - delete unless you are exporting matching info to another system
  • Overlap_ref - delete unless you are exporting matching info to another system
  • Set_dups - delete unless you need to count duplicate group sizes
  • Premise - keep it short, 3-6 bytes depending on your data
  • Salutation, Contact - delete and generate later, either on a split file or on a file in which the fields that were required only for matching have been limited to one byte.

There are other fields that mDesktop usually adds that can be eliminated – please contact our support team for advice.
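To get a feel for how much the reductions above can buy, the sketch below compares record capacity before and after trimming. Every field width in it is a made-up assumption for illustration, not a matchIT default; use the record length shown on the Table tab for real figures. The helper names record_length, trim and capacity are also hypothetical.

```python
LIMIT_BYTES = 2 * 1024 ** 3  # 2 GB in bytes


def record_length(layout):
    """Sum the field widths (in bytes) of a layout given as {field: width}."""
    return sum(layout.values())


def trim(layout, drop=(), resize=None):
    """Return a copy of the layout with some fields dropped or resized."""
    resize = resize or {}
    return {
        field: resize.get(field, width)
        for field, width in layout.items()
        if field not in drop
    }


def capacity(layout):
    """Approximate number of records that fit in 2 GB (header ignored)."""
    return LIMIT_BYTES // record_length(layout)


# Hypothetical widths, chosen only to illustrate the arithmetic.
before = {
    "name": 60, "coy_name": 60, "address": 120, "name_key": 20,
    "coy_key": 20, "match_ref": 10, "premise": 12,
}

# Individual-level matching only, nothing exported to another system.
after = trim(
    before,
    drop=("coy_name", "name_key", "coy_key", "match_ref"),
    resize={"premise": 6},
)

print(record_length(before), capacity(before))
print(record_length(after), capacity(after))
```

Which fields are safe to drop depends on the matching level and on whether match information is exported, as listed above.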

If the file still exceeds 2 GB, consider the following alternatives:

  • If it is the master file in a multiple file job that exceeds 2 GB, keep large suppression files out of the multiple file wizard and use them in Find Overlap instead. This is also much more efficient.
  • If using suppressIT, it is easier to keep the suppression files separate than it is with Find Overlap. In addition, suppressIT includes the suppression files in the multiple file reports.
  • If it is not a multiple file job, or it has no large suppression files, split the file, for example on the first character of the postal code. If many records contain no postal code, dedupe each file individually and then find the overlap between them. If the postal code is populated consistently and reliably, finding the overlap is unnecessary.
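The split on the first character of the postal code can be sketched as follows. This is a generic illustration working on in-memory records, not a matchIT feature: the function name, the "postcode" field name and the "_blank" bucket for records with no postal code are all assumptions.

```python
from collections import defaultdict


def split_by_postcode_initial(records, field="postcode"):
    """Group records by the first character of their postal code.

    records is an iterable of dicts. Records with a missing or empty
    postal code go into a separate "_blank" bucket (assumed name) so
    that they can be handled on their own.
    """
    buckets = defaultdict(list)
    for record in records:
        postcode = (record.get(field) or "").strip()
        key = postcode[0].upper() if postcode else "_blank"
        buckets[key].append(record)
    return dict(buckets)


# Hypothetical sample data.
sample = [
    {"name": "A Smith", "postcode": "SW1A 1AA"},
    {"name": "B Jones", "postcode": "s10 2tn"},
    {"name": "C Brown", "postcode": "M1 1AE"},
    {"name": "D White", "postcode": ""},
]

for key, group in sorted(split_by_postcode_initial(sample).items()):
    print(key, len(group))
```

A large "_blank" bucket is a sign that the postal code is not populated reliably, in which case the split files should still be checked against each other with Find Overlap.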