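Once a DataFrame has been created, it can be manipulated with the methods of DataFrame and Column and with the functions in the functions object; these are called domain-specific-language (DSL) functions.
Like RDD operations, DataFrame functions are divided into actions and transformations. Transformations are lazy: they are not evaluated until an action is called.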
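The split between lazy transformations and eager actions can be illustrated without a cluster. The sketch below is only an analogy in plain Scala, not Spark itself: a collection view plays the role of a transformation (it records the function but runs nothing), and sum plays the role of an action (it forces the pipeline). The object name LazyEvalSketch is hypothetical.

```scala
object LazyEvalSketch {
  // Returns (calls before the "action", result of the "action", calls after the "action")
  def run(): (Int, Int, Int) = {
    var calls = 0
    val rows = Seq(1, 2, 3)
    // Analogue of a transformation: view.map only records the function, nothing runs yet
    val doubled = rows.view.map { x => calls += 1; x * 2 }
    val callsBefore = calls
    // Analogue of an action: sum forces the pipeline to actually evaluate
    val total = doubled.sum
    (callsBefore, total, calls)
  }
}
```

Calling LazyEvalSketch.run() shows zero evaluations before sum and three after it, which mirrors how a chain of DataFrame transformations does no work until an action such as show or count runs.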
Spark 1.5.1 API reference:
dataframe:
http://spark.apache.org/docs/1.5.1/api/scala/index.html#org.apache.spark.sql.DataFrame
column:
http://spark.apache.org/docs/1.5.1/api/scala/index.html#org.apache.spark.sql.Column
functions:
http://spark.apache.org/docs/1.5.1/api/scala/index.html#org.apache.spark.sql.functions$
Adding a new column whose name already exists raises an error
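import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.functions.col
// Run in spark-shell, where sc and sqlContext.implicits._ (needed for toDF) are already in scope
val words1 = sc.parallelize(Seq((1, Array("sun", "rises", "in"))))
val words2 = sc.parallelize(Seq((1, Map(("k1", "v1"), ("k2", "v2"), ("k3", "v3")))))
// Array column: explode yields one row per element, so a single new column name is enough
words1.toDF("key", "values").withColumn("value", explode(col("values"))).show
words1.toDF("key", "values").select(col("key"), explode(col("values"))).toDF("key", "value").show
// Map column: explode yields two columns (map key and map value)
// words2.toDF("key", "values").withColumn("key", explode(col("values"))).show  // error: two columns, but only one column name
words2.toDF("key", "values").select(col("key"), explode(col("values"))).toDF("key", "subkey", "value").show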
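To see why the map case needs two names, the row-level behaviour of explode can be modelled with flatMap on plain Scala collections. This is a simplified sketch of the semantics, not Spark's implementation, and the names ExplodeSketch, explodeArray and explodeMap are hypothetical.

```scala
object ExplodeSketch {
  // Array column: each input row (key, values) becomes one output row per element,
  // so the exploded part contributes a single column
  def explodeArray(rows: Seq[(Int, Seq[String])]): Seq[(Int, String)] =
    rows.flatMap { case (key, values) => values.map(v => (key, v)) }

  // Map column: each element is a (mapKey, mapValue) pair, so the exploded part
  // contributes TWO columns; one name in withColumn cannot cover both
  def explodeMap(rows: Seq[(Int, Map[String, String])]): Seq[(Int, String, String)] =
    rows.flatMap { case (key, m) => m.toSeq.map { case (k, v) => (key, k, v) } }
}
```

With the data from words1 and words2, explodeArray returns three (key, value) rows and explodeMap returns three (key, subkey, value) rows. In Spark itself, the Column docs linked above also describe an as(aliases: Seq[String]) variant for naming the multiple outputs of a generator such as explode; it is worth checking its availability against the 1.5.1 reference before relying on it instead of the trailing toDF rename.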