How to use spark-submit with --jars on the terminal to store data in Avro format in a directory


Tue Jun 28 2022 07:07:15 GMT+0000 (Coordinated Universal Time)

Saved by @irfan309 #apache-spark #spark-submit #--jars #avro #storing-in-avro-format

# When we store data to a directory in Avro file format, one extra step is needed: download the spark-avro jar from Maven.
# The jar must match the Spark version. For example, if our Spark is version 3.0.3, search for "spark avro 3.0.3" and download the jar file from the browser.
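
# A minimal sketch of how the jar file name and the Maven coordinate follow from the versions.
# It assumes the usual spark-avro naming scheme (group org.apache.spark, artifact spark-avro_<scala version>);
# the helper names are hypothetical, and Scala 2.12 is taken from the jar name used in this snippet.

```python
def avro_jar_name(spark_version: str, scala_version: str = "2.12") -> str:
    """Return the expected spark-avro jar file name for the given versions."""
    return f"spark-avro_{scala_version}-{spark_version}.jar"


def avro_maven_coordinate(spark_version: str, scala_version: str = "2.12") -> str:
    """Return the group:artifact:version coordinate for spark-avro."""
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"


# Example for Spark 3.0.3 built against Scala 2.12
print(avro_jar_name("3.0.3"))
print(avro_maven_coordinate("3.0.3"))
```

# The coordinate form can be passed to the spark-submit '--packages' option, which fetches the jar from Maven instead of a manual download.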

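# Spark configuration, including the jar setting in our conf
from pyspark import SparkConf
from pyspark.sql import SparkSession
my_conf = SparkConf()
my_conf.set("spark.app.name","write API")
my_conf.set("spark.master","local[*]")
my_conf.set("spark.jars","/Downloads/spark-avro_2.12-3.0.3.jar")

# create the session from this conf; `spark` is used for reading below
spark = SparkSession.builder.config(conf=my_conf).getOrCreate()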


# Standard way of creating the DataFrame by loading the CSV file
order_df = spark.read.format("csv")\
            .option("header",True)\
            .option("inferSchema",True)\
            .option("path","/Downloads/orders.csv")\
            .load()

# Write the data in Avro format. This needs the spark-avro jar downloaded and configured in SparkConf.
# The Avro files will be stored in the path below, which is newfolder_data.
# save() returns nothing, so its result is not assigned to a variable.
order_df.write.format("avro")\
        .mode("overwrite")\
        .option("path","/Users/Desktop/newfolder_data")\
        .save()
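
# A small sketch for checking the result of the write, assuming Spark's usual output layout:
# a _SUCCESS marker plus one or more part-*.avro files in the target directory.
# The function name summarize_avro_output is hypothetical; only the Python standard library is used.

```python
from pathlib import Path


def summarize_avro_output(output_dir: str) -> dict:
    """List the Avro part files in output_dir and report whether _SUCCESS exists."""
    out = Path(output_dir)
    part_files = sorted(p.name for p in out.glob("part-*.avro"))
    return {
        "success_marker": (out / "_SUCCESS").exists(),
        "part_files": part_files,
    }


if __name__ == "__main__":
    # Same output path as in the write step above
    print(summarize_avro_output("/Users/Desktop/newfolder_data"))
```

# An empty part_files list or a missing _SUCCESS marker suggests the write did not finish.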

#===============================================================================

# Now, how to submit this script with the jar from the terminal:
# use 'spark-submit' with the '--jars' option, then <path-of-jar-with-jarfile_name.jar>, then our python_filename.py
# Example below:
spark-submit --jars C:\Downloads\spark-avro_2.12-3.0.3.jar Writer_api.py
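
# The same command can also be assembled from Python, which helps when there are several jars:
# '--jars' accepts a comma-separated list. This is only a sketch; build_spark_submit_command is a
# hypothetical helper, and the jar path is the one from the conf above.

```python
import shlex


def build_spark_submit_command(script: str, jars: list) -> list:
    """Build the argument list for: spark-submit --jars <jar1,jar2,...> <script>."""
    if not jars:
        raise ValueError("at least one jar path is required")
    return ["spark-submit", "--jars", ",".join(jars), script]


cmd = build_spark_submit_command(
    "Writer_api.py",
    ["/Downloads/spark-avro_2.12-3.0.3.jar"],
)
# Show the command as it would be typed on a POSIX shell
print(shlex.join(cmd))
# To actually run it, pass the list to subprocess.run(cmd, check=True)
```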

# done!



How to display human time (x days ago, or now) in django.

# Django has a contrib package called "django.contrib.humanize".
# Add it to your INSTALLED_APPS,
# then load it in your template:
{% load humanize %}
# After that, you can use the "naturaltime" template filter as "value|naturaltime", where "value" is your date.

ref : https://stackoverflow.com/questions/40345450/how-to-display-human-time-x-days-ago-or-now-in-django-admin

#apache-spark #vs-code #ssh #ssh-from-vs-code

To connect to a remote machine from VS Code, add a host entry like this to your SSH config file:

Host My-Application-Name-or-project-name
    HostName XX.XXX.XXX.XX
    # user name used to log in (ssh) to the machine
    User ubuntu
    IdentityFile C:\Users\JOHN\.ssh\mykeypair.pem
