class ExtendedDataFrame extends AnyRef
DataFrame extension class.
Instance Constructors
- new ExtendedDataFrame(df: DataFrame)
  - df
    DataFrame to extend functionality.
Value Members
- def cache(): DataFrame
  Caches the result of the DataFrame and creates a new DataFrame whose operations won't affect the original DataFrame.
  - returns
    New cached DataFrame.
- def collectAsList(): List[Row]
  Implementation of Spark's collectAsList function. Collects the DataFrame and converts it to a java.util.List[Row] object.
  - returns
    A java.util.List[Row] representation of the DataFrame.
- def columns: Seq[String]
  Returns a Seq of strings with the DataFrame's column names.
  - returns
    List of columns in the DataFrame.
- def dropDuplicates(columns: Seq[String]): DataFrame
  Overload of dropDuplicates to comply with Spark's implementation of the dropDuplicates function. Unspecified columns from the DataFrame are preserved but are not considered when detecting duplicates; when rows differ only on unspecified columns, the first row is kept.
  - columns
    List of columns to group by to detect the duplicates.
  - returns
    DataFrame without duplicates on the specified columns.
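The "first row is kept" semantics described above can be sketched with plain Scala collections; the Row model and helper below are hypothetical stand-ins, not the library's implementation:

```scala
// Hypothetical sketch of the described dropDuplicates semantics, using
// Maps as rows instead of a real DataFrame.
type Row = Map[String, Any]

def dropDuplicates(rows: Seq[Row], columns: Seq[String]): Seq[Row] =
  rows
    .groupBy(row => columns.map(row.get)) // key = values of the specified columns
    .values
    .map(_.head)                          // keep the first row of each group
    .toSeq

val rows = Seq(
  Map[String, Any]("id" -> 1, "name" -> "a", "note" -> "x"),
  Map[String, Any]("id" -> 1, "name" -> "a", "note" -> "y"), // duplicate on (id, name)
  Map[String, Any]("id" -> 2, "name" -> "b", "note" -> "z")
)
val deduped = dropDuplicates(rows, Seq("id", "name"))
// Unspecified columns ("note") are preserved; the first duplicate wins.
```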
- def filter(conditionExpr: String): DataFrame
  Overload of filter to comply with Spark's implementation of the filter function when receiving a SQL expression.
  - conditionExpr
    SQL conditional expression to filter the DataFrame on.
  - returns
    DataFrame filtered on the specified SQL expression.
- def head(n: Int): Array[Row]
  Equivalent to Spark's head. Returns the first N rows. Spark's default behavior is to throw an error when executing this function on an empty DataFrame; that behavior is not replicated exactly here.
  - n
    Number of rows to return.
  - returns
    Array with the number of rows specified in the parameter.
- def head(): Option[Row]
  Equivalent to Spark's head. Returns the first row. Since the result is an Option, a .get is required to obtain the actual row. Spark's default behavior is to throw an error when executing this function on an empty DataFrame; that behavior is not replicated exactly here.
  - returns
    The first row of the DataFrame.
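The empty-safe behavior described above mirrors Scala's headOption; a minimal sketch with a Seq standing in for the DataFrame's rows:

```scala
// Sketch of the Option-based head() behavior: None on an empty input
// instead of an exception, Some(first row) otherwise.
def safeHead[A](rows: Seq[A]): Option[A] = rows.headOption

val first = safeHead(Seq("row1", "row2"))
val none  = safeHead(Seq.empty[String])
// first.get yields "row1"; none is None rather than throwing.
```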
- def orderBy(sortCol: String, sortCols: String*): DataFrame
  Alias for Spark's orderBy function. Receives column names; unlike Spark's orderBy, this overload accepts only column names, not SQL expressions.
  - sortCol
    First column name.
  - sortCols
    Additional column names.
  - returns
    DataFrame ordered by the specified columns.
- def orderBy(sortExprs: Column*): DataFrame
  Alias for Spark's orderBy function. Receives columns or column expressions.
  - sortExprs
    Column expressions to order the dataset by.
  - returns
    The dataset ordered by the specified expressions.
- def printSchema(): Unit
  Alias for Spark's printSchema; a shortcut for schema.printTreeString(). Prints the schema of the DataFrame in a tree format, including column names, data types, and whether they are nullable. The output is not identical to Spark's implementation, but it is very similar.
- def selectExpr(exprs: String*): DataFrame
  Equivalent to Spark's selectExpr. Selects columns based on the specified expressions, which can be column names or calls to other functions such as conversions and case expressions.
  - exprs
    Expressions to select from the DataFrame.
  - returns
    DataFrame with the selected expressions as columns. Unspecified columns are not included.
- def startCols: ColumnsSimplifier
  Column simplifier object to increase performance of the withColumns functionality.
  - returns
    Column simplifier class.
- def take(n: Int): Array[Row]
  Equivalent to Spark's take. Returns the first N rows. In Spark, take and head have the same functionality on paper but different implementations: head is mostly used for small numbers of rows, whereas take can be used for larger amounts. This implementation makes no such distinction from head.
  - n
    Number of rows to return.
  - returns
    Array with the number of rows specified in the parameter.
- def toJSON: DataFrame
  Implementation of Spark's toJSON function. Converts each row into a JSON object and returns a DataFrame with a single column.
  - returns
    DataFrame with one column whose value corresponds to a JSON object of the row.
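The per-row conversion described above can be sketched as follows; this hand-rolled JSON builder is purely illustrative (no escaping or nested values), not the library's implementation:

```scala
// Illustrative sketch: each row (an ordered list of column/value pairs)
// becomes one JSON object string, i.e. one value of the single-column result.
def rowToJson(row: Seq[(String, Any)]): String =
  row.map {
    case (k, s: String) => s""""$k":"$s""""
    case (k, v)         => s""""$k":$v"""
  }.mkString("{", ",", "}")

val jsonColumn = Seq(
  Seq("id" -> 1, "name" -> "a"),
  Seq("id" -> 2, "name" -> "b")
).map(rowToJson)
```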
- def transform(func: (DataFrame) ⇒ DataFrame): DataFrame
  Transforms the DataFrame according to the function passed as the parameter.
  - func
    Function to apply to the DataFrame.
  - returns
    DataFrame with the transformation applied.
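The value of transform is composable pipelines; a minimal sketch under the assumption that a Seq of row Maps stands in for the DataFrame:

```scala
// Sketch of the transform pattern: apply a table => table function,
// allowing reusable pipeline steps to be chained.
type Table = Seq[Map[String, Any]]

def transform(t: Table)(func: Table => Table): Table = func(t)

val addFlag: Table => Table = _.map(_ + ("flag" -> true))
val keepEven: Table => Table = _.filter(_("id").asInstanceOf[Int] % 2 == 0)

val data: Table = Seq(Map("id" -> 1), Map("id" -> 2))
val result = transform(transform(data)(addFlag))(keepEven)
```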
- def withColumnRenamed(existingName: String, newName: String): DataFrame
  Returns the DataFrame with a column renamed.
  - existingName
    Name of the column to rename.
  - newName
    New name to give to the column.
  - returns
    DataFrame with the column renamed.
snowpark-extensions
Snowpark by itself is a powerful library, but some utility functions can always help.
The source code for this library is available here
Installation
With Maven you can add something like this to your POM:
or with sbt use
Usage
Just import it at the top of your file and it will automatically extend your Snowpark package.
For example:
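The library's exact import line is not reproduced here; as an illustration of the mechanism such extension libraries rely on, a Scala implicit class adds methods to an existing type when it is in scope (all names below are made up):

```scala
// Hypothetical illustration of the implicit-class extension pattern:
// importing Extensions._ makes secondOption available on any Seq[Int],
// analogous to how importing this library extends Snowpark's DataFrame.
object Extensions {
  implicit class ExtendedSeq(xs: Seq[Int]) {
    def secondOption: Option[Int] = xs.drop(1).headOption
  }
}

import Extensions._
val second = Seq(10, 20, 30).secondOption
```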
Extensions
See Session Extensions
See Session Builder Extensions
See DataFrame Extensions
See Column Extensions
See Function Extensions