StructField, StructType, and Schema

Published: 2023/12/8 Programming Q&A 28 豆豆


1. StructField

Source definition:

case class StructField(name: String, dataType: DataType, nullable: Boolean = true, metadata: Metadata = Metadata.empty) {}

A field inside a StructType.
name: The name of this field.
dataType: The data type of this field.
nullable: Indicates if values of this field can be null values.
metadata: The metadata of this field. The metadata should be preserved during transformation if the content of the column is not modified, e.g., in selection.

A StructField inside a struct is like a column in SQL: it carries the full description of that column. See the following example:

import org.apache.spark.sql.types._

def schema_StructField() = {
  /**
   * StructField is a case class. nullable defaults to true and
   * metadata defaults to empty.
   * It describes a single field of a StructType.
   */
  val sf = new StructField("b", IntegerType)
  println(sf.name)     // b
  println(sf.dataType) // IntegerType
  println(sf.nullable) // true
  println(sf.metadata) // {}
}
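The example above relies on the defaults for nullable and metadata. As a minimal sketch of setting all four arguments explicitly, the following attaches metadata through MetadataBuilder. The object name, the field name "age", and the metadata key "comment" with its value are made up for illustration and do not come from the original article.

```scala
import org.apache.spark.sql.types._

object StructFieldMetadataExample {
  // Hypothetical metadata: the key "comment" and its value are illustrative only.
  val meta: Metadata = new MetadataBuilder()
    .putString("comment", "user age in years")
    .build()

  // All four constructor arguments are given explicitly instead of using defaults.
  val age: StructField =
    StructField("age", IntegerType, nullable = false, metadata = meta)

  def main(args: Array[String]): Unit = {
    println(age.nullable)                      // false
    println(age.metadata.contains("comment"))  // true
    println(age.metadata.getString("comment")) // user age in years
  }
}
```

Because StructField is a case class, a variant of an existing field can also be produced with copy, for example age.copy(nullable = true), which leaves the metadata untouched.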

2. StructType

A StructType object can be constructed by

StructType(fields: Seq[StructField])

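A StructType object can hold multiple StructFields, and a field can be retrieved by its name, much as a Map returns a value for a key. What StructType returns, however, is the full definition of the field.

In the source, StructType is a case class: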

case class StructType(fields: Array[StructField]) extends DataType with Seq[StructField] {}

It extends Seq, so every Seq operation is available on it; each element of the sequence is a StructField.
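Since each element is a StructField, ordinary collection methods such as map, filter, and exists can be applied to a schema directly. The following is a small sketch of that idea; the object and value names are illustrative, and the result of filter is wrapped back into a StructType by hand because the Seq methods return a plain Seq.

```scala
import org.apache.spark.sql.types._

object StructTypeAsSeqExample {
  val schema: StructType = StructType(Seq(
    StructField("a", IntegerType),
    StructField("b", LongType, nullable = false),
    StructField("c", BooleanType, nullable = false)))

  // map comes from Seq; each element is a StructField.
  val names: Seq[String] = schema.map(_.name)

  // filter also comes from Seq and returns a plain Seq[StructField],
  // so it is wrapped in StructType again to get a schema back.
  val requiredOnly: StructType = StructType(schema.filter(!_.nullable))

  def main(args: Array[String]): Unit = {
    println(names.mkString(", "))                   // a, b, c
    println(requiredOnly.fieldNames.mkString(", ")) // b, c
    println(schema.exists(_.dataType == LongType))  // true
  }
}
```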

package Dataset

import org.apache.spark.sql.types._

/**
 * Created by root on 9/21/16.
 */
object schemaAnalysis {

  // ---------- StructType analysis ----------
  val struct = StructType(
    StructField("a", IntegerType) ::
    StructField("b", LongType, false) ::
    StructField("c", BooleanType, false) :: Nil)

  def schema_StructType() = {
    // A schema built up field by field with add
    val schemaTyped = new StructType().add("a", "int").add("b", "string")
    schemaTyped.foreach(println)
    /**
     * StructField(a,IntegerType,true)
     * StructField(b,StringType,true)
     */
  }

  def structType_extracted() = {
    // Extract a single StructField.
    val singleField_a = struct("a")
    println(singleField_a)
    // nullable was omitted, so it defaults to true:
    // StructField(a,IntegerType,true)
    val singleField_b = struct("b")
    println(singleField_b)
    // StructField(b,LongType,false)

    // val nonExisting = struct("d")
    // println(nonExisting)
    // java.lang.IllegalArgumentException: Field "d" does not exist.

    // Extract multiple StructFields. Field names are provided in a set.
    // A StructType object will be returned.
    val twoFields = struct(Set("b", "c"))
    println(twoFields)
    // StructType(StructField(b,LongType,false), StructField(c,BooleanType,false))

    // According to the official documentation, a missing name returns null
    // for a single lookup and is ignored for a set lookup, so
    // struct(Set("b", "c", "d")) should behave like struct(Set("b", "c")).
    // In practice the call below fails with "Field d does not exist":
    // both lookups go through apply, which does not handle this case.
    // Tested on the initial Spark 2.0 release.
    // val ignoreNonExisting = struct(Set("b", "c", "d"))
    // println(ignoreNonExisting)
  }

  def structType_operation() = {
    /**
     * Source: case class StructType(fields: Array[StructField]) extends DataType with Seq[StructField] {
     * It extends Seq, so StructType supports every Seq operation.
     * For the Seq operations, see: http://www.scala-lang.org/api/current/#scala.collection.Seq
     */
    val tmpStruct = StructType(StructField("d", IntegerType) :: Nil)

    // Collection with collection
    println(struct ++ tmpStruct)
    // println(struct ++: tmpStruct)
    // List(StructField(a,IntegerType,true), StructField(b,LongType,false), StructField(c,BooleanType,false), StructField(d,IntegerType,true))

    // Collection with element
    println(struct :+ StructField("d", IntegerType))

    // add can be used as well
    println(struct.add("e", IntegerType))
    // StructType(StructField(a,IntegerType,true), StructField(b,LongType,false), StructField(c,BooleanType,false), StructField(e,IntegerType,true))

    // First element
    println(struct.head)
    // StructField(a,IntegerType,true)

    // Last element
    println(struct.last)
    // StructField(c,BooleanType,false)

    println(struct.apply("a"))
    // StructField(a,IntegerType,true)

    println(struct.treeString)
    /**
     * root
     *  |-- a: integer (nullable = true)
     *  |-- b: long (nullable = false)
     *  |-- c: boolean (nullable = false)
     */

    println(struct.contains(StructField("f", IntegerType)))
    // false

    println(struct.mkString)
    // StructField(a,IntegerType,true)StructField(b,LongType,false)StructField(c,BooleanType,false)

    println(struct.prettyJson)
    /**
     * {
     *   "type" : "struct",
     *   "fields" : [ {
     *     "name" : "a",
     *     "type" : "integer",
     *     "nullable" : true,
     *     "metadata" : { }
     *   }, {
     *     "name" : "b",
     *     "type" : "long",
     *     "nullable" : false,
     *     "metadata" : { }
     *   }, {
     *     "name" : "c",
     *     "type" : "boolean",
     *     "nullable" : false,
     *     "metadata" : { }
     *   } ]
     * }
     */
    // For more operations, see the API: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.StructType
  }

  def main(args: Array[String]): Unit = {
    // schema_StructType()
    // structType_extracted()
    structType_operation()
  }
}
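The listing above notes that looking up a field that does not exist throws an exception rather than returning null or skipping the name. One way to work around this, sketched below, is to lean on the inherited Seq methods. The helper names fieldOption and selectExisting are hypothetical and are not part of the Spark API.

```scala
import scala.util.Try
import org.apache.spark.sql.types._

object SafeFieldLookupExample {
  val struct: StructType = StructType(
    StructField("a", IntegerType) ::
    StructField("b", LongType, false) ::
    StructField("c", BooleanType, false) :: Nil)

  // Hypothetical helper: find is inherited from Seq and returns an Option,
  // so a missing name yields None instead of an exception.
  def fieldOption(schema: StructType, name: String): Option[StructField] =
    schema.find(_.name == name)

  // Hypothetical helper: keep the requested fields that exist and
  // silently drop the names that do not.
  def selectExisting(schema: StructType, names: Set[String]): StructType =
    StructType(schema.filter(f => names.contains(f.name)))

  def main(args: Array[String]): Unit = {
    println(fieldOption(struct, "a").map(_.dataType)) // Some(IntegerType)
    println(fieldOption(struct, "d").isEmpty)         // true
    println(selectExisting(struct, Set("b", "c", "d")).fieldNames.mkString(", ")) // b, c
    // The direct lookup still throws, which Try turns into a Failure.
    println(Try(struct("d")).isFailure)               // true
  }
}
```

Checking struct.fieldNames.contains("d") before calling struct("d") is another option when only a yes-or-no answer is needed.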

3. Schema

A schema is the description of the structure of our data.

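A schema describes the structure of a data set (for example, the structure of a JSON file). It can be implicit, inferred at runtime, or explicit, known at compile time. It is expressed as a StructType, which is a collection of StructField objects, each one a triple of name, type, and nullability. A StructField actually carries four pieces of information, so why call it a triple? The fourth one, metadata, is simply left out of that description; it is still there and can be read from the field.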
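To make the explicit and derived variants concrete, the sketch below writes one schema out by hand and derives another from a case class with Encoders.product, without starting a SparkSession. The object name SchemaExample is illustrative, and the Person class mirrors the one used in this article's own example.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types._

// Declared at the top level so that an encoder can be derived for it.
case class Person(name: String, age: Long)

object SchemaExample {
  // Explicit schema, written out by hand.
  val explicitSchema: StructType = new StructType()
    .add("name", StringType, nullable = true)
    .add("age", LongType, nullable = false)

  // Schema derived from the case class definition.
  val derivedSchema: StructType = Encoders.product[Person].schema

  def main(args: Array[String]): Unit = {
    // The (name, type, nullability) triple of every field.
    derivedSchema.foreach(f => println((f.name, f.dataType, f.nullable)))
    // The fourth piece of information, metadata, is still available per field.
    println(derivedSchema("name").metadata) // {}
    println(derivedSchema("age").nullable)  // false
  }
}
```

The age field comes out as non-nullable because Long is a primitive type, while String is a reference type and is therefore nullable.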

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructField

// Declared outside the method: Spark generally cannot derive an encoder
// for a case class that is defined inside a method.
case class Person(name: String, age: Long)

def schema_op() = {
  val sparkSession = SparkSession.builder().appName("data set example").master("local").getOrCreate()
  import sparkSession.implicits._
  val rdd = sparkSession.sparkContext.textFile("hdfs://master:9000/src/main/resources/people.txt")
  val dataSet = rdd.map(_.split(",")).map(p => Person(p(0), p(1).trim.toLong)).toDS()
  println(dataSet.schema)
  // StructType(StructField(name,StringType,true), StructField(age,LongType,false))

  /**
   * Relevant source:
   *
   * def schema: StructType = queryExecution.analyzed.schema
   *
   * def apply(name: String): StructField = {
   *   nameToField.getOrElse(name,
   *     throw new IllegalArgumentException(s"""Field "$name" does not exist."""))
   * }
   */
  val tmp: StructField = dataSet.schema("name")
  println(tmp)          // StructField(name,StringType,true)
  println(tmp.name)     // name
  println(tmp.dataType) // StringType
  println(tmp.nullable) // true
  println(tmp.metadata) // {}
}
