sortByKey at BinaryClassificationMetrics.scala:134 failed: SparkException

I ran into a strange error while computing evaluation metrics with BinaryClassificationMetrics (the full log is below).

The fix was to add a “key” field to the case class. In my case it looked as follows:

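case class Email(
    id: String,
    body: String,
    headers: Option[Headers] = None) extends Serializable { // Headers is a project-specific type (not shown)
  // Key built from the id plus the creation timestamp.
  // Needed for sortBy (metrics); otherwise an exception is thrown.
  val key = id + java.lang.System.currentTimeMillis()
}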

Exception

15/08/27 13:17:37 INFO DAGScheduler: ResultStage 11 (sortByKey at BinaryClassificationMetrics.scala:134) failed in 19.436 s
15/08/27 13:17:37 INFO DAGScheduler: Job 2 failed: sortByKey at BinaryClassificationMetrics.scala:134, took 66.423785 s
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 9552 in stage 11.0 failed 1 times, most recent failure: Lost task 9552.0 in stage 11.0 (TID 57584, localhost): ExecutorLostFailure (executor driver lost)
[error] Driver stacktrace:
15/08/27 13:17:37 INFO DAGScheduler: Executor lost: driver (epoch 5)
15/08/27 13:17:37 INFO BlockManagerMasterEndpoint: Trying to remove executor driver from BlockManagerMaster.
15/08/27 13:17:37 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(driver, localhost, 48434)
15/08/27 13:17:37 INFO BlockManagerMaster: Removed driver successfully in removeExecutor
org.apache.spark.SparkException: Job aborted due to stage failure: Task 9552 in stage 11.0 failed 1 times, most recent failure: Lost task 9552.0 in stage 11.0 (TID 57584, localhost): ExecutorLostFailure (executor driver lost)
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

About Neil Rubens

see http://ActiveIntelligence.org