The sample applications demonstrate deduplication of CSV data loaded as a JavaRDD, and of database tables loaded via JDBC into Datasets.
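As a rough illustration of what deduplicating CSV records involves, the plain-Java sketch below keeps the first occurrence of each record after trimming and lower-casing its fields. It is a simplified assumption, not the samples' code. The class name `CsvDedupeSketch` is hypothetical. The sketch does exact matching only, splits naively on commas (quoted fields are not handled), and works on an in-memory list rather than a Spark JavaRDD. The sample applications' actual matching behavior is defined by their own source and sampleconfig.xml.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not taken from the sample applications.
public class CsvDedupeSketch {

    // Builds a normalized key for one CSV line: each field is trimmed and
    // lower-cased, so "Alice,New York" and "alice , new york" compare equal.
    // Naive split on ',': quoted fields that contain commas are NOT handled.
    static List<String> normalizedKey(String line) {
        List<String> key = new ArrayList<>();
        for (String field : line.split(",", -1)) {
            key.add(field.trim().toLowerCase(Locale.ROOT));
        }
        return key; // List has value-based equals/hashCode, so it works as a map key
    }

    // Keeps the first occurrence of each normalized record, preserving input order.
    static List<String> dedupe(List<String> lines) {
        Map<List<String>, String> firstSeen = new LinkedHashMap<>();
        for (String line : lines) {
            firstSeen.putIfAbsent(normalizedKey(line), line);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "Alice,New York",
                "Bob,Boston",
                "alice , new york",
                "Bob,Boston");
        // Prints "Alice,New York" then "Bob,Boston"
        for (String line : dedupe(input)) {
            System.out.println(line);
        }
    }
}
```

In Spark, the closest built-in operations for exact duplicates are `JavaRDD.distinct()` and `Dataset.dropDuplicates()`. Whether the samples use these, or a fuzzier matching approach, is not stated in this section.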
Each sample application folder contains:
|File or folder|Description|
|---|---|
|src|Folder containing the sample source code.|
|build.sh|Script that builds the application with Maven, using the pom.xml file.|
|<app>-jar-with-dependencies.jar|Pre-built executable jar.|
|pom.xml|Maven build configuration.|
|readme|Text file with an overview of the application.|
|run.sh|Example script to run the application.|
|sampleconfig.xml|Example configuration file.|
Additionally, the DedupeTextFile folder contains an example1.txt input file.
You don’t need to build the sample apps, because pre-built binaries are included. The build scripts are provided in case you want to modify the source to tailor an application to your needs.