pyspark DataFrame to RDD
This article demonstrates how to convert between a pyspark DataFrame and an RDD, shared here for reference.
# -*- coding: utf-8 -*-
from __future__ import print_function
from pyspark.sql import SparkSession
from pyspark.sql import Row

if __name__ == "__main__":
    # Initialize the SparkSession
    spark = SparkSession \
        .builder \
        .appName("RDD_and_DataFrame") \
        .config("spark.some.config.option", "some-value") \
        .getOrCreate()

    # Read the text file and parse each comma-separated line into a Row
    sc = spark.sparkContext
    lines = sc.textFile("employee.txt")
    parts = lines.map(lambda l: l.split(","))
    employee = parts.map(lambda p: Row(name=p[0], salary=int(p[1])))

    # Convert the RDD to a DataFrame
    employee_temp = spark.createDataFrame(employee)
    # Show the DataFrame contents
    employee_temp.show()
    # Register a temporary view
    employee_temp.createOrReplaceTempView("employee")
    # Filter the data with SQL
    employee_result = spark.sql(
        "SELECT name,salary FROM employee "
        "WHERE salary >= 14000 AND salary <= 20000")
    # Convert the DataFrame back to an RDD
    result = employee_result.rdd \
        .map(lambda p: "name: " + p.name + " salary: " + str(p.salary)) \
        .collect()
    # Print the RDD contents
    for n in result:
        print(n)
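Spark aside, the pipeline above is just a split, a record-building map, and a salary-range filter over comma-separated lines. The same logic can be sketched in plain Python without a Spark cluster; the sample records here are hypothetical, since the article does not show the contents of employee.txt:

```python
# Hypothetical sample data standing in for employee.txt
# (names and salaries are made up for illustration).
lines = ["Alice,15000", "Bob,22000", "Carol,18000"]

# Equivalent of lines.map(lambda l: l.split(","))
parts = [l.split(",") for l in lines]

# Equivalent of building Row(name=..., salary=...) records
employees = [{"name": p[0], "salary": int(p[1])} for p in parts]

# Equivalent of the SQL filter: salary >= 14000 AND salary <= 20000
result = ["name: " + e["name"] + " salary: " + str(e["salary"])
          for e in employees
          if 14000 <= e["salary"] <= 20000]

for n in result:
    print(n)
# name: Alice salary: 15000
# name: Carol salary: 18000
```

This mirrors why the round trip in the Spark version is cheap: `employee_result.rdd` does not copy data, it just exposes the DataFrame's rows as an RDD of Row objects, so the final map and collect behave like the list comprehension above.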
Summary
The above is the full content of pyspark DataFrame to RDD as collected by 生活随笔; hopefully it helps you solve the problem you ran into.