mHUB - 360Science for Spark - Sample Applications

The sample applications demonstrate deduplication of CSV data loaded as a JavaRDD and of database tables loaded via JDBC into Datasets.

Each sample application folder contains:

src                               Folder containing the sample source code.
(build script)                    Script to build the application using Maven and the pom.xml file.
<app>-jar-with-dependencies.jar   Pre-built executable jar.
pom.xml                           Maven build configuration.
readme                            Text file with an overview of the application.
(run script)                      Example script to run the application.
sampleconfig.xml                  Example configuration file.

Additionally, the DedupeTextFile sample application folder contains an example1.txt input file.

You don’t need to build the sample applications, because pre-built binaries are included. The build scripts are provided in case you want to modify the source to tailor the applications.