Showing posts with label BigData. Show all posts

July 21, 2017

Spark word count example

example:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object Wordcount {
  def main(args: Array[String]): Unit = {

    // Check whether sufficient params are supplied before starting Spark
    if (args.length < 2) {
      println("Usage: Wordcount <input file> <output dir>")
      System.exit(1)
    }

    // Create conf object
    val conf = new SparkConf()
      .setMaster("local") // run Spark locally inside this JVM
      .set("spark.driver.memory", "1g") // to resolve the memory issue (Could not reserve enough space for 3145728KB object heap)
      .setAppName("WordCount") // application name

    // Create spark context object
    val sc = new SparkContext(conf)

    // Read file and create RDD
    val rawData = sc.textFile(args(0))

    // Convert the lines into words using flatMap operation
    val words = rawData.flatMap(line => line.split(" "))

    // Count the individual words using map and reduceByKey operation
    val wordCount = words.map(word => (word, 1)).reduceByKey(_ + _)

    // Save the result (args(1) is created as a directory and must not exist yet)
    wordCount.saveAsTextFile(args(1))

    // Print a few results; printing the RDD itself only shows its toString, not its contents
    wordCount.take(10).foreach(println)

    // Stop the spark context
    sc.stop()
  }
}
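
To see what the flatMap, map and reduceByKey steps above compute without starting Spark, here is a minimal sketch of the same pipeline on plain Scala collections. The object name WordCountLocal and the sample lines are made up for illustration, and groupBy followed by a sum stands in for reduceByKey, which exists only on Spark pair RDDs.

```scala
// Local sketch of the word count pipeline on plain Scala collections (no Spark needed).
// WordCountLocal and the sample input are hypothetical, for illustration only.
object WordCountLocal {

  // Mirrors flatMap -> map -> reduceByKey from the Spark job.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // lines -> words
      .map(word => (word, 1))             // word -> (word, 1)
      .groupBy(_._1)                      // collect the pairs of each word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // add up the 1s per word

  def main(args: Array[String]): Unit = {
    val sample = Seq("to be or not", "to be")
    // Sort by word only to get a stable print order
    countWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Note that split(" ") produces empty strings when a line contains consecutive spaces, so "" can show up as a word in the result. Splitting on "\\s+" and filtering out empty strings is one way to avoid that, if it matters for your data.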



To run in Eclipse:
Before running the above word count program in Eclipse, set the following configuration:
1. Add the Spark jars to the classpath
2. Add the Scala and Java libraries to the build path
3. Set up the run configuration:
   Program arguments:
"C:\Programs\spark-2.1.1-bin-hadoop2.7/README.md" "C:\Users\svm6kor\Documents\Trainings\SparkWithScala\Code\output5"
   VM Arguments:
-Xms1336m -Xmx1336m
-Dspark.driver.memory=2g
-Djava.net.preferIPv4Stack=true
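
If you are not sure whether Eclipse actually picked up the VM arguments above, a small throwaway program can print what the JVM sees. This is only a sketch: the object name JvmArgsCheck and the helper toMiB are hypothetical, and it needs to be run with the same run configuration as the word count program.

```scala
// Sketch: print the heap limit and the -D properties seen by the running JVM.
// JvmArgsCheck and toMiB are hypothetical names used only for this example.
object JvmArgsCheck {

  // Converts a byte count to whole mebibytes.
  def toMiB(bytes: Long): Long = bytes / (1024L * 1024L)

  def main(args: Array[String]): Unit = {
    // Upper bound on the heap, controlled by -Xmx (the reported value can be slightly lower)
    println("Max heap: " + toMiB(Runtime.getRuntime.maxMemory) + " MiB")

    // Properties passed with -D; getProperty returns null if the flag was not supplied
    println("spark.driver.memory = " + System.getProperty("spark.driver.memory"))
    println("java.net.preferIPv4Stack = " + System.getProperty("java.net.preferIPv4Stack"))
  }
}
```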

To run from the command prompt:
Using spark-submit:
From outside the project workspace:
C:\Programs\spark-2.1.1-bin-hadoop2.7\bin>spark-submit --class Wordcount --master local C:/Users/userName/Documents/Trainings/SparkWithScala/Code/Wordcount.jar C:/Users/userName/Documents/Trainings/SparkWithScala/Code/WordcountData output

or

From the working folder (both the input path and the output folder are required):
C:\Users\userName\Documents\Trainings\SparkWithScala\Code>c:\Programs\spark-2.1.1-bin-hadoop2.7\bin\spark-submit --class Wordcount --master local Wordcount.jar WordcountData output
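
After the job finishes, the output argument is a folder, not a single file: saveAsTextFile writes one part-NNNNN file per partition, and each line is the text form of a pair, such as (spark,3). Below is a rough sketch for reading those files back with plain Scala. The object name ReadWordCountOutput and the helper parseLine are hypothetical, and the folder to read is assumed to be passed as the first argument.

```scala
import java.io.File
import scala.io.Source
import scala.util.Try

// Sketch: read the part files written by saveAsTextFile and print the (word, count) pairs.
// ReadWordCountOutput and parseLine are hypothetical names used only for this example.
object ReadWordCountOutput {

  // Parses a line such as "(spark,3)". Splits on the last comma so that
  // words which themselves contain a comma are still handled.
  def parseLine(line: String): Option[(String, Int)] = {
    val body = line.stripPrefix("(").stripSuffix(")")
    val idx = body.lastIndexOf(',')
    if (idx < 0) None
    else Try((body.substring(0, idx), body.substring(idx + 1).toInt)).toOption
  }

  def main(args: Array[String]): Unit = {
    val dir = new File(args(0)) // assumed: the output folder given to the Spark job
    // listFiles returns null when the folder does not exist, hence the Option wrapper
    val parts = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(_.getName.startsWith("part-")) // skips _SUCCESS and the hidden .crc files
      .sortBy(_.getName)

    for (part <- parts) {
      val src = Source.fromFile(part, "UTF-8")
      try src.getLines().flatMap(line => parseLine(line)).foreach(println)
      finally src.close()
    }
  }
}
```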