Spark: SparkContext: wholeTextFiles: InvalidInputException: Input path does not exist

Spark has a handy function, `wholeTextFiles`, for loading all the files in a directory into an RDD of (filename, content) pairs.

While using it I got a cryptic error (below) that took me a while to debug.

The problem was with the path: Hadoop's `FileInputFormat` (which `wholeTextFiles` uses under the hood) silently skips files and directories whose names begin with an underscore ("_") or a dot ("."), treating them as hidden (this is how marker files like `_SUCCESS` get ignored). With every entry filtered out, the input path is reported as nonexistent. Renaming the folder so it no longer starts with "_" fixed the problem.
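The hidden-path rule itself is simple; here is a minimal sketch of it in Python (the real check lives in Hadoop's `FileInputFormat.hiddenFileFilter`; the helper name below is illustrative, not part of any Spark or Hadoop API):

```python
import os

def is_hidden(path):
    # Mimics Hadoop's hiddenFileFilter: any path component whose
    # base name starts with "_" or "." is treated as hidden and
    # excluded from the input listing.
    name = os.path.basename(path.rstrip("/"))
    return name.startswith("_") or name.startswith(".")

# A "_"-prefixed directory is filtered out, so Spark sees no input:
print(is_hidden("/data/_logs"))   # filtered -> "Input path does not exist"
print(is_hidden("/data/logs"))    # visible to wholeTextFiles
```

So the directory was never actually missing; it was just invisible to the input format's listing.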

[error] (run-main-0) org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file: _
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file: _
 at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
 at org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55)
 at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:266)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:121)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
 at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
 at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
 at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
 at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
 at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
 at scala.collection.AbstractTraversable.map(Traversable.scala:104)
 at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:121)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1781)
 at org.apache.spark.rdd.RDD.count(RDD.scala:1099)
 at Driver$.fun2(Driver.scala:55)
 at Driver$.delayedEndpoint$Driver$1(Driver.scala:20)
 at Driver$delayedInit$body.apply(Driver.scala:18)
 at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
 at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
 at scala.App$$anonfun$main$1.apply(App.scala:76)
 at scala.App$$anonfun$main$1.apply(App.scala:76)
 at scala.collection.immutable.List.foreach(List.scala:381)
 at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
 at scala.App$class.main(App.scala:76)
 at Driver$.main(Driver.scala:18)
 at Driver.main(Driver.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)

About Neil Rubens

see http://ActiveIntelligence.org
