Jul 11, 2021 | Spark Streaming Checkpoint Directory explained | Spark Streaming is one of the best ways to process data as it arrives, and many organizations use it for streaming ETL. It is a fault-tolerant, efficient, and reliable way of processing real-time data. The fault tolerance of Spark… | Spark Streaming | 3 min read
Feb 26, 2021 | How to add SSL security to your Kafka brokers and securely transmit and receive data | Hi folks, I am going to explain how to add SSL to your Kafka brokers and transmit, receive, and access data securely. | Kafka | 3 min read
May 10, 2020 | Configuring Hive metastore for Spark | Hi folks, this post is about how to configure the Hive metastore to access your Hive tables in Spark SQL. By default, Spark uses its own metastore, which is specific to the session. 1) In spark-env.sh: add your MySQL connector JAR path to the classpath and set the Hive home variable as shown below. | Hive | 2 min read
Mar 15, 2020 | Running a spark-submit command on Docker | Hi folks, this blog is about how to run spark-submit in a Docker container. I spent some time fixing an issue with running spark-submit inside a Docker container. … | Docker | 1 min read
Feb 20, 2020 | Automatic loading of data to Snowflake using Snowpipe from an external source when new files are pushed to the source | Hi folks, I have configured automatic loading of data to Snowflake from an external source when new files are pushed to the source. Note that there is no screenshot for it, but you can refer to the video below. Automatically Ingesting Streaming Data with Snowpipe: See how anyone can use Snowpipe to automatically ingest their streaming data from S3 directly into Snowflake. You can… (resources.snowflake.com) | Snowflake | 2 min read
Feb 20, 2020 | Usage of Deequ Suite by Amazon for DQC (Data Quality Checks) for your data | Hey folks, big data is becoming trendier nowadays. As technology grows, the collection of data to provide a good experience for subscribers also tends to increase. On such occasions there will be places where you don’t… | Data Quality | 4 min read
Feb 16, 2020 | Spark-HBase integration | Hey folks, I thought of sharing a solution to an issue that took me a week or so to figure out. HBase is a NoSQL technology that runs over Hadoop. As huge amounts of data are generated every minute, some of it may be without a schema, so it… | Spark | 3 min read