Creating and customizing your PySpark image

  1. Downloads the base Spark code
  2. Builds the Spark Dockerfile for PySpark, without changing anything
  3. Adds another Dockerfile with the GCS jar and additional Python requirements, as an example.
    You can also find a sample job that reads a CSV inside.
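
As a rough illustration of the kind of sample job mentioned in step 3, here is a minimal sketch of a PySpark script that reads a CSV. It is not the job shipped in the image: the file name `read_csv_job.py`, the helper `reader_options`, the app name, and the chosen CSV options are all assumptions made for this example, and it assumes `pyspark` is installed in the image.

```python
import sys


def reader_options(header=True, infer_schema=True):
    """Build CSV reader options (hypothetical helper, not from the original job).

    Spark's CSV reader takes its options as strings, so booleans are
    converted to "true"/"false".
    """
    return {
        "header": str(header).lower(),
        "inferSchema": str(infer_schema).lower(),
    }


def run(path):
    """Read the CSV at `path`, print a preview, and return the row count."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    # "read-csv-sample" is an arbitrary app name chosen for this sketch.
    spark = SparkSession.builder.appName("read-csv-sample").getOrCreate()
    try:
        df = spark.read.options(**reader_options()).csv(path)
        df.printSchema()
        df.show(5)
        return df.count()
    finally:
        spark.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        # No path given: print usage instead of starting a Spark session.
        print("usage: read_csv_job.py <csv-path>", file=sys.stderr)
    else:
        print(run(sys.argv[1]))
```

The path argument can be a local file or, if the GCS jar from step 3 is on the classpath, presumably a `gs://` location as well.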
make run_local
  • Using git-sync with a Spark Docker image
  • Reading binary files and extracting their metadata (in my case, DICOM files)
