Pyspark: Ship Jar Dependency With Spark-submit
I wrote a PySpark script that reads two JSON files, coGroups them, and sends the result to an Elasticsearch cluster; everything works (mostly) as expected when I run it locally. I do
Solution 1:
The --jars option just works; the problem was how I was running the spark-submit job in the first place. The correct way to execute it is:
./bin/spark-submit <options> scriptname
Therefore the --jars option must be placed before the script:
./bin/spark-submit --jars /path/to/my.jar myscript.py
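For reference, here is a minimal sketch of the kind of myscript.py described above, assuming the shipped jar is the elasticsearch-hadoop connector; the file names, index name, and Elasticsearch settings are placeholders, not details from the original question:

import json
from pyspark import SparkContext

sc = SparkContext(appName="cogroup-to-es")

# Read the two JSON files into (key, value) pair RDDs,
# assuming one JSON document per line.
left = sc.textFile("first.json").map(json.loads).map(lambda d: (d["id"], d))
right = sc.textFile("second.json").map(json.loads).map(lambda d: (d["id"], d))

# coGroup the two RDDs on their key and merge each group into one JSON document.
docs = left.cogroup(right).map(
    lambda kv: (kv[0], json.dumps({"left": list(kv[1][0]), "right": list(kv[1][1])}))
)

# EsOutputFormat lives in the external jar, which is why it has to be shipped
# with --jars so it is on the executors' classpath.
docs.saveAsNewAPIHadoopFile(
    path="-",
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.nodes": "localhost",
        "es.resource": "myindex/mytype",
        "es.input.json": "yes",
    },
)

If the jar is left out of --jars, a job like this would typically fail at runtime with a ClassNotFoundException for the output format class.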
This is obvious when you consider that this is the only way to pass arguments to the script itself, since everything after the script name is treated as input arguments for the script:
./bin/spark-submit --jars /path/to/my.jar myscript.py --do-magic=true
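Inside myscript.py, those trailing arguments arrive as ordinary program arguments; a hypothetical way to read them (the --do-magic flag is just the example value from the command above):

import argparse
from pyspark import SparkContext

# Everything after the script name on the spark-submit line is handed to the
# script untouched, so standard argument parsing works.
parser = argparse.ArgumentParser()
parser.add_argument("--do-magic", default="false")
args = parser.parse_args()

sc = SparkContext(appName="myscript")
print("do-magic =", args.do_magic)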