
Pyspark: Ship Jar Dependency With Spark-submit

I wrote a PySpark script that reads two JSON files, coGroups them, and sends the result to an Elasticsearch cluster. Everything works (mostly) as expected when I run it locally; I do […]
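For context, a script like the one described might look roughly like the sketch below. The file paths, the "id" join key, and the index name are all assumptions, and it relies on the elasticsearch-hadoop connector, which is exactly the jar that needs to be shipped:

from pyspark import SparkContext
import json

sc = SparkContext(appName="cogroup-to-es")

# Read two JSON-lines files and key each record by a shared field
# ("id" here is an assumption about the data).
left = sc.textFile("/path/to/first.json").map(json.loads).map(lambda d: (d["id"], d))
right = sc.textFile("/path/to/second.json").map(json.loads).map(lambda d: (d["id"], d))

# cogroup yields (key, (iterable_of_left, iterable_of_right)); serialize
# each merged record back to a JSON string for the connector.
merged = left.cogroup(right).mapValues(
    lambda pair: json.dumps({"first": list(pair[0]), "second": list(pair[1])}))

# Write through elasticsearch-hadoop's EsOutputFormat; this class lives
# in the jar that has to be shipped with --jars.
merged.saveAsNewAPIHadoopFile(
    path="-",
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.nodes": "localhost:9200",      # assumed cluster address
        "es.resource": "myindex/mytype",   # assumed index/type
        "es.input.json": "true",           # values are pre-serialized JSON strings
    },
)

Since EsOutputFormat comes from the connector jar, the job fails at runtime unless that jar is made available to the driver and executors.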

Solution 1:

The --jars option does work; the problem is how the spark-submit job is invoked in the first place. The correct way to execute it is:

./bin/spark-submit [options] <script> [script arguments]

Therefore the --jars option must be placed before the script:

./bin/spark-submit --jars /path/to/my.jar myscript.py
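Incidentally, if the script needs more than one jar, --jars accepts a comma-separated list; the paths below are placeholders:

./bin/spark-submit --jars /path/to/my.jar,/path/to/other.jar myscript.py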

This becomes obvious once you realize that this is the only way to pass arguments to the script itself, since everything after the script name is treated as an input argument for the script:

./bin/spark-submit --jars /path/to/my.jar myscript.py --do-magic=true
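To illustrate, whatever follows the script name on the spark-submit command line shows up in sys.argv, exactly as with any Python program; the flag name below comes from the example above, and this simple membership check is just one way to parse it:

# myscript.py
import sys

if __name__ == "__main__":
    # ./bin/spark-submit --jars /path/to/my.jar myscript.py --do-magic=true
    # gives: sys.argv == ['myscript.py', '--do-magic=true']
    do_magic = "--do-magic=true" in sys.argv
    print("magic enabled:", do_magic)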
