Spark was throwing a cryptic error while doing a simple logistic regression on some text (see the exception below).

It took me a while to track down the cause. Basically, Spark was running out of memory (it would have been nice if the exception had mentioned that). In my case, reducing the number of features did the trick. (Thanks to this post for providing a clue.)
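Why does the feature count matter here? MLlib's LogisticRegression aggregates dense gradient vectors of length numFeatures (that's the treeAggregate call in the stack trace), so memory use grows with the size of the feature space. Assuming the text features came from something like MLlib's HashingTF (the post doesn't spell out the pipeline, so this is a guess), the knob to turn is numFeatures. Here is a minimal plain-Scala sketch of the hashing trick, without Spark, just to show that shrinking the feature space is a one-parameter change:

```scala
// Minimal sketch of the hashing trick behind MLlib's HashingTF
// (an assumption -- the post doesn't say how its features were built).
// The point: the dense gradient vectors that LogisticRegression's
// treeAggregate allocates have length numFeatures, so shrinking
// numFeatures shrinks memory roughly in proportion.
object HashingSketch {
  // Map a term to a bucket in [0, numFeatures), HashingTF-style.
  def termIndex(term: String, numFeatures: Int): Int = {
    val raw = term.hashCode % numFeatures
    if (raw < 0) raw + numFeatures else raw
  }

  // Sparse term-frequency vector for one document: bucket -> count.
  def termFrequencies(doc: Seq[String], numFeatures: Int): Map[Int, Double] =
    doc.groupBy(termIndex(_, numFeatures))
       .map { case (i, terms) => i -> terms.size.toDouble }

  def main(args: Array[String]): Unit = {
    val doc = Seq("spark", "ran", "out", "of", "memory", "spark")
    // Same document, two feature-space sizes; the sparse vectors look
    // similar, but the dense vectors LR would allocate are 2^20 vs 2^12
    // doubles each.
    println(termFrequencies(doc, 1 << 20))
    println(termFrequencies(doc, 1 << 12))
  }
}
```

With the real API the change is a single constructor argument, e.g. new HashingTF(1 << 12) instead of the default 1 << 20 (again assuming HashingTF; the same idea applies to any vectorizer with a dimensionality parameter).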


Exception

15/08/31 18:48:47 INFO DAGScheduler: Resubmitted ShuffleMapTask(1, 7006), so marking it as still running
15/08/31 18:48:47 INFO TaskSchedulerImpl: Cancelling stage 1
15/08/31 18:48:47 INFO DAGScheduler: ShuffleMapStage 1 (treeAggregate at LogisticRegression.scala:114) failed in 152.579 s
15/08/31 18:48:47 INFO DAGScheduler: Job 1 failed: treeAggregate at LogisticRegression.scala:114, took 152.902223 s
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 10095 in stage 1.0 failed 1 times, most recent failure: Lost task 10095.0 in stage 1.0 (TID 10096, localhost): ExecutorLostFailure (executor driver lost)
[error] Driver stacktrace:
15/08/31 18:48:47 INFO DAGScheduler: Executor lost: driver (epoch 0)
15/08/31 18:48:47 INFO BlockManagerMasterEndpoint: Trying to remove executor driver from BlockManagerMaster.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 10095 in stage 1.0 failed 1 times, most recent failure: Lost task 10095.0 in stage 1.0 (TID 10096, localhost): ExecutorLostFailure (executor driver lost)
Driver stacktrace:
 at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
 at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
 at scala.Option.foreach(Option.scala:257)
 at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
[trace] Stack trace suppressed: run last compile:run for the full output.
15/08/31 18:48:47 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(driver, localhost, 52910)
15/08/31 18:48:47 INFO BlockManagerMaster: Removed driver successfully in removeExecutor
15/08/31 18:48:47 INFO DAGScheduler: Host added was in lost list earlier: localhost
15/08/31 18:48:47 ERROR Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.InterruptedException
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
 at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:65)
 at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1215)
 at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
15/08/31 18:48:47 ERROR ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
 at java.lang.Object.wait(Native Method)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
 at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:157)
 at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1215)
 at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
 at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:67)
15/08/31 18:48:47 INFO SparkUI: Stopped Spark web UI at http://192.168.1.107:4040
15/08/31 18:48:47 INFO DAGScheduler: Stopping DAGScheduler
java.lang.RuntimeException: Nonzero exit code: 1
 at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 190 s, completed Aug 31, 2015 6:48:47 PM

About Neil Rubens

see http://ActiveIntelligence.org
