SparkException: ExecutorLostFailure (executor driver lost)

In an attempt to simplify things, I was running a self-contained (standalone) Spark application with “sbt run” (as suggested by earlier versions of the docs). However, it was quite unstable, with Spark throwing cryptic errors (see the example below). Moreover, the same piece of code would sometimes run successfully and at other times fail.

It turns out that using “sbt run” was the cause of the problem. Switching to spark-submit got rid of most of the exceptions, and when an exception was still thrown, the message was much more informative.

Running things with spark-submit is quite straightforward. First, you need a pre-built version of Spark somewhere on your machine. Then package your program with “sbt package” and invoke spark-submit on the resulting jar (make sure to specify the correct path to the jar).
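As a rough sketch, the two steps might look like the script below. The Spark location, the jar path, and the local[*] master are placeholders I am assuming for illustration; the main class name Driver is taken from the stack trace in this post. The script only prints the commands unless RUN=1 is set, so you can check the paths before anything is executed.

```shell
#!/bin/sh
# All paths and names here are placeholders (assumptions); adjust them for your setup.
SPARK_HOME="${SPARK_HOME:-/path/to/spark}"                   # where the pre-built Spark lives
APP_JAR="${APP_JAR:-target/scala-2.10/myapp_2.10-1.0.jar}"   # hypothetical jar produced by "sbt package"
MAIN_CLASS="${MAIN_CLASS:-Driver}"                           # main class (Driver, as in the stack trace)
MASTER="${MASTER:-local[*]}"                                 # assumed: run locally on all cores

# Dry run by default: print each command. Set RUN=1 to actually execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Step 1: build the application jar.
run sbt package

# Step 2: submit the jar to Spark instead of using "sbt run".
run "$SPARK_HOME/bin/spark-submit" --class "$MAIN_CLASS" --master "$MASTER" "$APP_JAR"
```

Saved under a name of your choice (say submit.sh, a hypothetical name), running "sh submit.sh" prints the two commands, and "RUN=1 sh submit.sh" executes them. sbt writes the packaged jar under target/scala-<version>/, so the jar path has to match your project's name, version, and Scala version.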

Example of the errors produced when running with “sbt run”:

15/09/13 19:09:46 INFO DAGScheduler: Resubmitted ShuffleMapTask(33, 832), so marking it as still running
15/09/13 19:09:46 INFO DAGScheduler: Resubmitted ShuffleMapTask(33, 114), so marking it as still running
15/09/13 19:09:46 INFO DAGScheduler: Resubmitted ShuffleMapTask(33, 2725), so marking it as still running
15/09/13 19:09:46 INFO DAGScheduler: Resubmitted ShuffleMapTask(33, 1980), so marking it as still running
15/09/13 19:09:46 INFO DAGScheduler: Resubmitted ShuffleMapTask(33, 1639), so marking it as still running
15/09/13 19:09:46 INFO DAGScheduler: Resubmitted ShuffleMapTask(33, 1298), so marking it as still running
15/09/13 19:09:46 INFO DAGScheduler: Resubmitted ShuffleMapTask(33, 186), so marking it as still running
15/09/13 19:09:46 INFO DAGScheduler: Resubmitted ShuffleMapTask(33, 2734), so marking it as still running
15/09/13 19:09:46 INFO TaskSchedulerImpl: Cancelling stage 33
15/09/13 19:09:46 INFO DAGScheduler: ShuffleMapStage 33 (combineByKey at BinaryClassificationMetrics.scala:152) failed in 19.215 s
15/09/13 19:09:46 INFO DAGScheduler: Job 21 failed: collect at BinaryClassificationMetrics.scala:193, took 20.474752 s
15/09/13 19:09:46 INFO DAGScheduler: Executor lost: driver (epoch 9)
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 3343 in stage 33.0 failed 1 times, most recent failure: Lost task 3343.0 in stage 33.0 (TID 147765, localhost): ExecutorLostFailure (executor driver lost)
[error] Driver stacktrace:
15/09/13 19:09:46 INFO BlockManagerMasterEndpoint: Trying to remove executor driver from BlockManagerMaster.
15/09/13 19:09:46 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(driver, localhost, 60878)
15/09/13 19:09:46 INFO BlockManagerMaster: Removed driver successfully in removeExecutor
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3343 in stage 33.0 failed 1 times, most recent failure: Lost task 3343.0 in stage 33.0 (TID 147765, localhost): ExecutorLostFailure (executor driver lost)
Driver stacktrace:
 at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
 at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
 at scala.Option.foreach(Option.scala:257)
 at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
 at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:905)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
 at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
 at org.apache.spark.rdd.RDD.collect(RDD.scala:904)
 at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.x$4$lzycompute(BinaryClassificationMetrics.scala:193)
 at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.x$4(BinaryClassificationMetrics.scala:147)
 at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.confusions$lzycompute(BinaryClassificationMetrics.scala:149)
 at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.confusions(BinaryClassificationMetrics.scala:149)
 at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.createCurve(BinaryClassificationMetrics.scala:225)
 at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.roc(BinaryClassificationMetrics.scala:88)
 at Driver$.classify(Driver.scala:270)
 at Driver$.main(Driver.scala:100)
 at Driver.main(Driver.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)

About Neil Rubens

see http://ActiveIntelligence.org
