
July 24, 2017

How to start an Apache Spark with Kafka application on Windows?

To configure Apache Kafka in Windows:
https://www.youtube.com/watch?v=MVdGjs0YQBQ
https://www.youtube.com/watch?v=LS6WiHs_m5w

Apache Spark With Kafka:
https://spark.apache.org/docs/latest/streaming-kafka-integration.html
http://kafka.apache.org/documentation.html

Integrating Spark Streaming with Apache Kafka:
https://www.youtube.com/watch?v=5V-Wtfrc38U&t=621s

Advanced Apache Spark Training - Sameer Farooqui (Databricks):
https://www.youtube.com/watch?v=7ooZ4S7Ay6Y

Intro to Apache Spark Streaming | NewCircle Training:
https://www.youtube.com/watch?v=2STfulBcorA

How to start the ZooKeeper/Kafka servers after installation, with path configuration:

ZooKeeper server start:
C:\Programs\zookeeper-3.4.10\bin>zkserver

Kafka server start:
C:\Programs\kafka_2.12-0.11.0.0>.\bin\windows\kafka-server-start.bat .\config\server.properties

kafka-topics:
C:\Programs\kafka_2.12-0.11.0.0\bin\windows>kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test7

or, to describe an existing topic:

C:\Programs\kafka_2.12-0.11.0.0\bin\windows>kafka-topics.bat --describe --zookeeper localhost:2181 --topic test

kafka-console-producer:
C:\Programs\kafka_2.12-0.11.0.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic test7
> my test message 7


kafka-console-consumer:
C:\Programs\kafka_2.12-0.11.0.0\bin\windows>kafka-console-consumer.bat --zookeeper localhost:2181 --topic test7
my test message 7
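
The integration guide linked above boils down to a small Scala program. Below is a minimal sketch, assuming the spark-streaming-kafka-0-10 package is on the classpath and the test7 topic created above; names like KafkaSparkDemo and spark-demo-group are illustrative:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaSparkDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("KafkaSparkDemo")
    val ssc = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

    // Consumer settings matching the local broker started above
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-demo-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    // Subscribe to the topic created with kafka-topics.bat
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("test7"), kafkaParams))

    // Print the message values received in each batch
    stream.map(record => record.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}

To try it, submit with --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.1 and type messages into the console producer above.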

July 21, 2017

Spark word count example

Example:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object Wordcount {
  def main(args: Array[String]) {

    // Create the conf object
    val conf = new SparkConf()
      .setMaster("local") // set the environment
      .set("spark.driver.memory", "1g") // resolves the memory issue (Could not reserve enough space for 3145728KB object heap)
      .setAppName("WordCount") // application name

    // Create the Spark context object
    val sc = new SparkContext(conf)

    // Check whether sufficient params are supplied
    if (args.length < 2) {
      println("Usage: Wordcount <input file> <output dir>")
      System.exit(1)
    }

    // Read the file and create an RDD
    val rawData = sc.textFile(args(0))

    // Split the lines into words using the flatMap operation
    val words = rawData.flatMap(line => line.split(" "))

    // Count the individual words using the map and reduceByKey operations
    val wordCount = words.map(word => (word, 1)).reduceByKey(_ + _)

    // Save the result
    wordCount.saveAsTextFile(args(1))

    // Stop the Spark context
    sc.stop()
  }
}



To run in Eclipse:
Before running the above word count program in Eclipse, set the below configurations:
1. Add the Spark jars to the classpath
2. Add the Scala and Java libraries
3. Run the program with these settings:
   Program arguments:
"C:/Programs/spark-2.1.1-bin-hadoop2.7/README.md" "C:/Users/userName/Documents/Trainings/SparkWithScala/Code/output5"
   VM arguments:
-Xms1336m -Xmx1336m
-Dspark.driver.memory=2g
-Djava.net.preferIPv4Stack=true
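
As an alternative to adding the jars by hand, the same dependencies can be declared with sbt. A minimal build.sbt sketch, assuming Spark 2.1.1 on Scala 2.11 (the versions used in this post):

// build.sbt
name := "Wordcount"
version := "1.0"
scalaVersion := "2.11.8"

// "provided" because spark-submit supplies the Spark jars at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1" % "provided"

Running sbt package then produces a jar that can be passed to spark-submit as shown below.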

To run from the command prompt with spark-submit:

Outside the project workspace:
C:\Programs\spark-2.1.1-bin-hadoop2.7\bin>spark-submit --class Wordcount --master local C:/Users/userName/Documents/Trainings/SparkWithScala/Code/Wordcount.jar C:/Users/userName/Documents/Trainings/SparkWithScala/Code/WordcountData output

or

From the working folder (the program needs both the input and output arguments):
C:\Users\userName\Documents\Trainings\SparkWithScala\Code>c:\Programs\spark-2.1.1-bin-hadoop2.7\bin\spark-submit --class Wordcount --master local Wordcount.jar WordcountData output

July 20, 2017

Situation: Creating the Spark context twice by mistake while setting up the Spark MongoDB configuration


val spark = SparkSession.builder()
  .master("local")
  .appName("SparkMongoDB")
  .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/Spark.sparkCollection")
  .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/Spark.sparkCollection")
  .getOrCreate()

// The mistake: creating a second SparkContext next to the one already inside the SparkSession
//val sparkConf = new SparkConf().setAppName("SparkMongoDB").set("spark.driver.allowMultipleContexts","true"); val sc = new SparkContext(sparkConf);

// The fix: reuse the context the SparkSession already holds
val sc = spark.sparkContext

I got the below exception in the console:
Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)

Action: Verified https://issues.apache.org/jira/browse/SPARK-2243 and removed the unnecessary SparkContext object.

Result: The application runs without any issue.
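
For completeness, a minimal sketch of the corrected program, assuming the mongo-spark-connector package (e.g. org.mongodb.spark:mongo-spark-connector_2.11) is on the classpath; SparkMongoDemo is an illustrative name:

import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession

object SparkMongoDemo {
  def main(args: Array[String]): Unit = {
    // One SparkSession only; its builder creates the single SparkContext
    val spark = SparkSession.builder()
      .master("local")
      .appName("SparkMongoDB")
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/Spark.sparkCollection")
      .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/Spark.sparkCollection")
      .getOrCreate()

    // Reuse the session's context instead of calling new SparkContext(...)
    val sc = spark.sparkContext

    // Load the collection configured in spark.mongodb.input.uri as a DataFrame
    val df = MongoSpark.load(spark)
    df.printSchema()

    spark.stop()
  }
}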

June 04, 2017

Why is VirtualBox only showing 32-bit guest versions on my 64-bit host OS?

Step 1:
Download and install VirtualBox. If the 64-bit versions are not shown while configuring a VM, follow Step 2 and Step 3.


Step 2:
Check the BIOS settings: is hardware virtualization (Intel VT-x / AMD-V) enabled? If not, enable it. You can also check this from a command prompt, as shown below.
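
Before rebooting into the BIOS, a quick check from a Windows command prompt (Windows 8 and later; output abbreviated):

C:\>systeminfo

Hyper-V Requirements: VM Monitor Mode Extensions: Yes
                      Virtualization Enabled In Firmware: Yes

If "Virtualization Enabled In Firmware" shows No, enable it in the BIOS as described above.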


Step 3:
Check VirtualBox again. If the 64-bit versions still do not appear, please click on the below link:

How to fix the issue with a 64-bit host OS?
