apache spark - Using MatrixUDT as column in SparkSQL Dataframe


I'm trying to load a set of medical images into a Spark SQL DataFrame, where each image is loaded as a matrix column of the DataFrame. I see that Spark added MatrixUDT to support this kind of case, but I can't find a sample of using it in a DataFrame.

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/linalg/MatrixUDT.scala

Can anyone help me with this?

I would really appreciate the help.

Thanks,

Karthik Vadla

Actually, MatrixUDT has been a part of o.a.s.mllib.linalg since 1.4 and has since been copied to o.a.s.ml.linalg. Since it has never been public, you cannot use it to declare the correct schema, and I doubt it is intended for general applications. Not to mention its API is arguably too limited to be useful in practice.

Nevertheless, basic conversions work just fine. All you need is an RDD or Seq of Product types (once again, it is not possible to define a schema explicitly) and you're good to go:

    // in spark-shell, where spark.implicits._ is already in scope
    import org.apache.spark.ml.linalg.Matrices

    // Matrices.dense expects Array[Double]
    Seq((1, Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0)))).toDF
    // org.apache.spark.sql.DataFrame = [_1: int, _2: matrix]

    Seq((1, Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0)))).toDS
    // org.apache.spark.sql.Dataset[(Int, org.apache.spark.ml.linalg.Matrix)]
    //   = [_1: int, _2: matrix]
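One detail worth noting when flattening image pixel data into a matrix column: `Matrices.dense` takes its values in column-major order. A minimal sketch outside of Spark SQL (the 2x2 values here are made up for illustration, and only the spark-mllib linalg classes are needed):

```scala
import org.apache.spark.ml.linalg.Matrices

object ColumnMajorDemo {
  def main(args: Array[String]): Unit = {
    // Column-major layout: the first `numRows` values fill column 0,
    // the next `numRows` values fill column 1, and so on.
    val m = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))

    println(m(0, 0)) // 1.0 (row 0, col 0)
    println(m(1, 0)) // 2.0 (row 1, col 0)
    println(m(0, 1)) // 3.0 (row 0, col 1)
    println(m(1, 1)) // 4.0 (row 1, col 1)
  }
}
```

So if your image pixels are stored row by row, transpose them (or fill the array column by column) before calling `Matrices.dense`, otherwise the matrix will be silently transposed.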
