object functions
Collection of helper functions to provide more equivalency between snowpark and spark
- Linear Supertypes: AnyRef, Any
Type Members
- sealed trait ColumnOrInt[T] extends AnyRef
- sealed trait ColumnOrString[T] extends AnyRef
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
array(colName: String, colNames: String*): Column
Wrapper for Snowflake built-in array function. Creates an array from column names.
- returns
The array.
-
def
array(c: Column*): Column
Wrapper for Snowflake built-in array function. Creates an array from columns.
- c
Columns to build the array.
- returns
The array.
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
asc(colname: String): Column
Function to convert a column name into a column and order it in an ascending manner.
- colname
Column name.
- returns
Column object ordered in an ascending manner.
-
def
base64(col: Column): Column
Computes the BASE64 encoding of a column.
- returns
the encoded column
-
def
bround(colName: Column): Column
Implementation for Spark bround. This function receives a column and rounds it to 0 decimals with the HALF_EVEN rounding mode, often called "banker's rounding": if the number is equidistant from an even and an odd number, it rounds to the even one.
- colName
Column to round.
- returns
Rounded number.
-
def
bround(colName: Column, scale: Int): Column
Implementation for Spark bround. This function receives a column with a number and rounds it to scale decimal places with the HALF_EVEN rounding mode, often called "banker's rounding": if the number is equidistant from an even and an odd number, it rounds to the even one.
- colName
Column to round.
- scale
Number of decimals to preserve.
- returns
Rounded number.
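The HALF_EVEN rule described above can be reproduced locally with java.math.BigDecimal. This is only an illustration of the rounding semantics (the halfEven helper is hypothetical), not the library call, which runs inside Snowflake:

```scala
import java.math.{BigDecimal, RoundingMode}

// Banker's rounding: ties go to the nearest even digit,
// so 2.5 rounds down to 2 while 3.5 rounds up to 4.
def halfEven(value: Double, scale: Int): Double =
  new BigDecimal(value.toString).setScale(scale, RoundingMode.HALF_EVEN).doubleValue

println(halfEven(2.5, 0))  // 2.0
println(halfEven(3.5, 0))  // 4.0
println(halfEven(1.25, 1)) // 1.2
```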
-
def
cbrt(columnName: String): Column
Wrapper for Spark CBRT(String) function, used to calculate the cubic root of a number. Slight differences were found: cbrt(341) -> Spark: 6.986368027818106, Snowflake: 6.986368027818107 (note the last decimal); cbrt(2147483647) -> Spark: 1290.159154892091, Snowflake: 1290.1591548920912 (note the difference at the end). This difference is acceptable.
- returns
Column object.
-
def
cbrt(e: Column): Column
Wrapper for Spark CBRT(Column) function, used to calculate the cubic root of a number. Slight differences were found: cbrt(341) -> Spark: 6.986368027818106, Snowflake: 6.986368027818107 (note the last decimal); cbrt(2147483647) -> Spark: 1290.159154892091, Snowflake: 1290.1591548920912 (note the difference at the end). This difference is acceptable.
- returns
Column object.
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
def
collect_list(s: String): Column
Wrapper for Snowflake built-in collect_list function. Gets the values of an array column.
- s
Column name to be collected.
- returns
The array.
-
def
collect_list(c: Column): Column
Wrapper for Snowflake built-in collect_list function. Gets the values of an array column.
- c
Column to be collected.
- returns
The array.
-
def
collect_set(e: String): Column
Aggregate function: returns a set of objects with duplicate elements eliminated.
- e
The column to collect the list values
- returns
A list with unique values
-
def
collect_set(e: Column): Column
Aggregate function: returns a set of objects with duplicate elements eliminated.
- e
The column to collect the list values
- returns
A list with unique values
-
def
conv(c: Column, fromBase: Int, toBase: Int): Column
Wrapper for Snowflake built-in conv function. Converts a number from one base to another.
- c
Column to be converted.
- fromBase
Base the input number is expressed in.
- toBase
Base to convert the number to.
- returns
Column object.
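The base conversion that conv performs can be sketched on plain strings with java.lang.Long; this local convStr helper is illustrative only, since the actual function executes inside Snowflake:

```scala
// Parse the value in fromBase, then render it in toBase.
def convStr(value: String, fromBase: Int, toBase: Int): String =
  java.lang.Long.toString(java.lang.Long.parseLong(value, fromBase), toBase)

println(convStr("ff", 16, 10))  // 255
println(convStr("255", 10, 2))  // 11111111
```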
-
def
date_add(start: Column, days: Column): Column
Returns the date that is days days after start.
- start
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS.
- days
The number of days to add to start; can be negative to subtract days.
- returns
A date, or null if start was a string that could not be cast to a date.
-
def
date_add(start: Column, days: Int): Column
Returns the date that is days days after start.
- start
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS.
- days
The number of days to add to start; can be negative to subtract days.
- returns
A date, or null if start was a string that could not be cast to a date.
-
def
date_format(c: Column, s: String): Column
Wrapper for Snowflake built-in date_format function. Converts a date into a string using the specified format.
- c
Column to convert to string.
- s
Date format.
- returns
Column object.
-
def
date_sub(start: Column, days: Int): Column
Implementation for Spark date_sub. This function receives a date or timestamp, or a properly formatted string, and subtracts the specified number of days from it. If it receives a string, the string is cast to a date using try_cast; if the cast is not possible, it returns null. If it receives a timestamp, it is cast to a date (removing its time part).
There is a functional difference with this function:
- Snowflake infers timestamp formats better than Spark when the column is a string. For example, for 05/15/2021 06:54:34 Spark returns null whereas Snowflake returns the expected value.
- start
Date, Timestamp or String column to subtract days from.
- days
Days to subtract.
- returns
Column object.
-
def
desc(c: String): Column
Function to convert column name into column and order in a descending manner.
Function to convert column name into column and order in a descending manner.
- c
Column name.
- returns
Column object ordered in a descending manner.
-
def
element_at[T](column: Column, index: T)(implicit arg0: ColumnOrInt[T]): Column
Returns the element of an array at the given index, if the column is an array. Mostly an overload for Snowpark's get_path.
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
expr(s: String): Column
Function to convert a string into an SQL expression.
Function to convert a string into an SQL expression.
- s
SQL Expression as text.
- returns
Converted SQL Expression.
-
def
first(c: Column, nulls: Boolean): Column
Wrapper for Snowflake built-in first function. Gets the first value of a column according to its grouping.
- c
Column to get the first value of.
- nulls
Consider null values or not.
- returns
Column object.
-
def
first(s: String, nulls: Boolean): Column
Wrapper for Snowflake built-in first function. Gets the first value of a column according to its grouping.
- s
Column name to get first value.
- nulls
Consider null values or not.
- returns
Column object.
-
def
first(s: String): Column
Wrapper for Snowflake built-in first function. Gets the first value of a column according to its grouping.
- returns
Column object.
-
def
first(c: Column): Column
Wrapper for Snowflake built-in first function. Gets the first value of a column according to its grouping.
- c
Column to obtain first value.
- returns
Column object.
-
def
format_number(x: Column, d: Int): Column
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.
If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.
- x
numeric column to be transformed
- d
Number of decimal places for the number format.
- returns
Number cast to the specified string format.
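The '#,###,###.##' formatting with HALF_EVEN rounding can be sketched locally with java.text.DecimalFormat. This formatNumber helper is illustrative only (it requires d >= 0, whereas the real function returns null for negative d), and pins the US locale so the grouping separator is ',':

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.math.RoundingMode
import java.util.Locale

// Build a grouped pattern with d decimal places and banker's rounding.
def formatNumber(x: Double, d: Int): String = {
  val pattern = if (d <= 0) "#,##0" else "#,##0." + ("0" * d)
  val df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US))
  df.setRoundingMode(RoundingMode.HALF_EVEN)
  df.format(x)
}

println(formatNumber(1234567.891, 2)) // 1,234,567.89
println(formatNumber(1234567.891, 0)) // 1,234,568
```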
-
def
format_string(format: String, arguments: Column*): Column
Formats the arguments in printf-style and returns the result as a string column.
- format
the printf-style format
- arguments
arguments for the formatting string
- returns
formatted string
- Note
this function requires the format_string UDF to be previously created
-
def
from_json(e: Column): Column
Wrapper for Spark from_json column function. This function converts a JSON string to a struct in Spark (a variant in Snowflake). Spark has several overloads for this function, in which you specify a schema to convert the JSON to the desired names and data types. Since we are using TRY_PARSE_JSON to replicate the functionality, we are not able to specify the structure; Snowflake parses everything automatically, but as strings, so manual casts need to be performed.
There were differences found between Spark from_json and this function when performing a select on the resulting column:
- Float type: Spark returns only 7 floating points, whereas Snowflake returns more.
- Timestamp type: Spark interprets the input date as UTC and transforms it to a local timestamp, whereas Snowflake leaves the timestamp as-is.
- Spark allows the selection of values within the resulting object by navigating through the object with '.' notation, for example df.select("json.relative.age"). On Snowflake this does not work; it is required to use the selectExpr function, and the same example must be translated to "json['relative']['age']" to access the value.
- Since Spark receives a schema definition for the JSON string, it reads the values from the JSON and converts them to the specified data types. On Snowflake the values are converted automatically, but as variants, meaning that the printSchema function returns different data types. To convert a value so that it is printed with the expected data type, read it in selectExpr as "json['relative']['age']::integer".
For example:

val data_for_json = Seq(
  (1, "{\"id\": 172319, \"age\": 41, \"relative\": {\"id\": 885471, \"age\": 29}}"),
  (2, "{\"id\": 532161, \"age\": 17, \"relative\": {\"id\": 873513, \"age\": 47}}")
)
val data_for_json_column = Seq("col1", "col2")
val df_for_json = session.createDataFrame(data_for_json).toDF(data_for_json_column)

val json_df = df_for_json.select(from_json(col("col2")).as("json"))
json_df.selectExpr(
  "json['id']::integer as id",
  "json['age']::integer as age",
  "json['relative']['id']::integer as rel_id",
  "json['relative']['age']::integer as rel_age"
).show(10, 10000)

-----------------------------------------
|"ID"    |"AGE"  |"REL_ID"  |"REL_AGE"  |
-----------------------------------------
|172319  |41     |885471    |29         |
|532161  |17     |873513    |47         |
-----------------------------------------
- e
String column to convert to variant.
- returns
Column object.
-
def
from_unixtime(ut: Column, f: String): Column
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
- ut
A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
- f
A date time pattern that the input will be formatted to
- returns
A string, or null if ut was a string that could not be cast to a long, or f was an invalid date time pattern.
-
def
from_unixtime(ut: Column): Column
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.
- ut
A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch
- returns
A string, or null if the input was a string that could not be cast to a long
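The epoch-seconds-to-string semantics can be sketched locally with java.time. This fromUnixtime helper is illustrative only; the zone is made explicit here, whereas the library formats in the current system time zone:

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// Render epoch seconds as a timestamp string in the given zone and pattern.
def fromUnixtime(seconds: Long, pattern: String, zone: ZoneId): String =
  DateTimeFormatter.ofPattern(pattern).withZone(zone).format(Instant.ofEpochSecond(seconds))

println(fromUnixtime(0L, "yyyy-MM-dd HH:mm:ss", ZoneId.of("UTC"))) // 1970-01-01 00:00:00
```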
-
def
from_utc_timestamp(ts: Column, tz: Column): Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
- ts
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS.
- tz
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
- returns
A timestamp, or null if ts was a string that could not be cast to a timestamp, or tz was an invalid value.
-
def
from_utc_timestamp(ts: Column, tz: String): Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
- ts
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS.
- tz
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
- returns
A timestamp, or null if ts was a string that could not be cast to a timestamp, or tz was an invalid value.
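The "interpret as UTC, render in target zone" semantics can be sketched with java.time. This fromUtcTimestamp helper is illustrative only and assumes the 'yyyy-MM-dd HH:mm:ss' input shape:

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

// Parse a naive timestamp, pin it to UTC, then render the same instant
// in the requested region-based zone.
def fromUtcTimestamp(ts: String, tz: String): String = {
  val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  LocalDateTime.parse(ts, fmt)
    .atOffset(ZoneOffset.UTC)
    .atZoneSameInstant(ZoneId.of(tz))
    .format(fmt)
}

println(fromUtcTimestamp("2017-07-14 02:40:00", "Asia/Tokyo")) // 2017-07-14 11:40:00
```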
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
hex(c: Column): Column
Wrapper for Snowflake built-in hex_encode function. Returns the hexadecimal representation of a string.
- c
Column to encode.
- returns
Encoded string.
-
def
instr(str: Column, substring: String): Column
Locate the position of the first occurrence of substr column in the given string. Returns null if either of the arguments are null.
- Note
The position is not zero-based, but a 1-based index. Returns 0 if substr could not be found in str.
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isnull(c: Column): Column
Wrapper for Snowflake built-in isnull function. Returns a boolean indicating whether the value is NULL.
- c
Column to analyze for a null value.
- returns
Column object.
-
def
json_tuple(json: Column, fields: String*): Seq[Column]
Wrapper for Spark json_tuple. This leverages JSON_EXTRACT_PATH_TEXT and improves functionality by allowing multiple columns in a single call, whereas JSON_EXTRACT_PATH_TEXT must be called once for every column.
There were differences found between Spark json_tuple and this function:
- Float type: Spark returns only 6 floating points, whereas Snowflake returns more.
- Timestamp type: Spark interprets the input date as UTC and transforms it to a local timestamp, whereas Snowflake leaves the timestamp as-is.
- Complex JSON path expressions: This function allows the retrieval of values within json objects, whereas Spark only allows values from the root.
- Identifiers with spaces: Snowflake returns error when an invalid expression is sent, whereas Spark returns null.
Usage:
val df = session.createDataFrame(Seq(("CR", "{\"id\": 5, \"name\": \"Jose\", \"age\": 29}")))
  .toDF(Seq("nationality", "json_string"))

When the result of this function is the only part of the select statement, no changes are needed:

df.select(json_tuple(col("json_string"), "id", "name", "age")).show()

----------------------
|"C0"  |"C1"  |"C2"  |
----------------------
|5     |Jose  |29    |
----------------------

However, when specifying multiple columns, an expression like this is required:

df.select(
  col("nationality"),
  json_tuple(col("json_string"), "id", "name", "age"):_*  // Notice the :_* syntax.
).show()

-------------------------------------------------
|"NATIONALITY"  |"C0"  |"C1"  |"C2"  |"C3"      |
-------------------------------------------------
|CR             |5     |Jose  |29    |Mobilize  |
-------------------------------------------------
- json
Column containing the JSON string text.
- fields
Fields to pull from the JSON file.
- returns
Column sequence with the specified strings.
-
def
last(c: Column, nulls: Boolean): Column
Wrapper for Snowflake built-in last function. Gets the last value of a column according to its grouping. Functional difference with window functions: in Snowpark an ORDER BY is needed, since SQL doesn't guarantee row order.
- c
Column to get last value.
- nulls
Consider null values or not.
- returns
Column object.
-
def
last(s: String, nulls: Boolean): Column
Wrapper for Snowflake built-in last function. Gets the last value of a column according to its grouping. Functional difference with window functions: in Snowpark an ORDER BY is needed, since SQL doesn't guarantee row order.
- s
Column name to get last value.
- nulls
Consider null values or not.
- returns
Column object.
-
def
last(s: String): Column
Wrapper for Snowflake built-in last function. Gets the last value of a column according to its grouping.
- returns
Column object.
-
def
last(c: Column): Column
Wrapper for Snowflake built-in last function. Gets the last value of a column according to its grouping. Functional difference with window functions: in Snowpark an ORDER BY is needed, since SQL doesn't guarantee row order.
- c
Column to obtain last value.
- returns
Column object.
-
def
locate(substr: Column, str: Column, pos: Int): Column
Locate the position of the first occurrence of substr in a string column, after position pos.
- substr
string to search
- str
value where string will be searched
- pos
index for starting the search
- returns
returns the position of the first occurrence.
- Note
The position is not zero-based, but a 1-based index. Returns 0 if substr could not be found in str. This function just leverages the Snowflake POSITION built-in.
-
def
locate(substr: String, str: Column, pos: Int = 0): Column
Locate the position of the first occurrence of substr in a string column, after position pos.
- substr
string to search
- str
value where string will be searched
- pos
index for starting the search
- returns
Returns the position of the first occurrence
- Note
The position is not zero-based, but a 1-based index. Returns 0 if substr could not be found in str. This function just leverages the Snowflake POSITION built-in.
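The 1-based position semantics can be mirrored in plain Scala on top of the 0-based indexOf. The locate1 helper below is a hypothetical local analogue, not the library function:

```scala
// 1-based position of substr in str, searching from 1-based position pos;
// 0 means "not found", unlike indexOf's -1.
def locate1(substr: String, str: String, pos: Int = 1): Int =
  str.indexOf(substr, math.max(pos - 1, 0)) + 1

println(locate1("bar", "foobar"))    // 4
println(locate1("bar", "foobar", 5)) // 0 (not found at or after position 5)
```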
-
def
log10(columnName: String): Column
Computes the logarithm of the given column in base 10.
- columnName
Column to apply this mathematical operation to.
- returns
log10 of the given column.
-
def
log10(expr: Column): Column
Computes the logarithm of the given column in base 10.
- expr
Column to apply this mathematical operation to.
- returns
log10 of the given column.
-
def
log1p(col: Column): Column
Computes the natural logarithm of the given value plus one.
- col
the value to use
- returns
the natural logarithm of the given value plus one.
-
def
log1p(columnName: String): Column
Computes the natural logarithm of the given value plus one.
- columnName
the value to use
- returns
the natural logarithm of the given value plus one.
-
def
log2(columnName: String): Column
Computes the logarithm of the given column in base 2.
- columnName
Column to apply this mathematical operation
- returns
log2 of the given column
-
def
log2(expr: Column): Column
Computes the logarithm of the given column in base 2.
- expr
Column to apply this mathematical operation
- returns
log2 of the given column
-
def
monotonically_increasing_id(): Column
A column expression that generates monotonically increasing 64-bit integers.
-
def
months_between(end: Column, start: Column): Column
Returns the number of months between dates start and end. A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.
For example:

months_between("2017-11-14", "2017-07-14")           // returns 4.0
months_between("2017-01-01", "2017-01-10")           // returns 0.29032258
months_between("2017-06-01", "2017-06-16 12:00:00")  // returns -0.5

- end
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS.
- start
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS.
- returns
A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start.
-
def
nanvl(expr1: Column, expr2: Column): Column
Returns expr1 if it is not NaN, or expr2 if expr1 is NaN.
- expr1
Expression to return when it is not NaN.
- expr2
Expression to return when expr1 is NaN.
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def ntile(n: Int): Column
-
def
randn(seed: Long): Column
Wrapper for Spark randn(seed). It returns a call to the Snowflake RANDOM function. The return values differ between Snowflake and Spark, but this difference is accepted: Spark returns a float between -9 and 9 with 15-17 decimal places, whereas Snowflake returns integers of 17-19 digits.
- seed
Seed to use in the random function.
- returns
Random number.
-
def
randn(): Column
Wrapper for Spark randn. It returns a call to the Snowflake RANDOM function. The return values differ between Snowflake and Spark, but this difference is accepted: Spark returns a float between -9 and 9 with 15-17 decimal places, whereas Snowflake returns integers of 17-19 digits.
- returns
Random number.
-
def
regexp_extract(colName: Column, exp: String, grpIdx: Int): Column
Implementation for Spark regexp_extract. This function receives a column and extracts group grpIdx from the string after applying the exp regex. Spark returns an empty string when the string doesn't match and null if the input is null; this behavior has been replicated, since Snowflake returned null in both cases.
This function applies the case-sensitive and extract flags. It doesn't apply multiline nor ".* matches newlines". Even though Snowflake is capable of doing so with the regexp_substr function, Spark's behavior for this function doesn't allow it. If these flags need to be applied, use builtin("REGEXP_SUBSTR") instead and apply the desired flags.
Note: this function ensures the same functionality on Snowflake as on Spark. However, it doesn't guarantee that regular expressions written for Spark will work the same on Snowflake; they still have different engines, and the regex itself needs to be analyzed manually. For example, non-greedy tokens such as .*? don't work on Snowflake as they do on Spark.
- colName
Column to apply the regex to.
- exp
Regex expression to apply.
- grpIdx
Group to extract.
- returns
Column object.
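The group-extraction semantics, including the empty string on no match, can be sketched in plain Scala. The regexpExtract helper is an illustrative local analogue, not the library function:

```scala
// Extract group grpIdx of the first match, or "" when the pattern
// doesn't match (mirroring the Spark behavior replicated above).
def regexpExtract(input: String, pattern: String, grpIdx: Int): String =
  pattern.r.findFirstMatchIn(input).map(_.group(grpIdx)).getOrElse("")

println(regexpExtract("100-200", "(\\d+)-(\\d+)", 2)) // 200
println(regexpExtract("abc", "(\\d+)", 1))            // "" (no match)
```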
-
def
regexp_replace(strExpr: Column, pattern: String, replacement: String): Column
Wrapper for Snowflake built-in regexp_replace function. Replaces parts of a string with the specified replacement value, based on a regular expression.
- strExpr
String to apply the replacement to.
- pattern
Regex pattern to find in the expression.
- replacement
Value to replace the matches with.
- returns
Column object.
-
def
regexp_replace(strExpr: Column, pattern: Column, replacement: Column): Column
Wrapper for Snowflake built-in regexp_replace function. Replaces parts of a string with the specified replacement value, based on a regular expression.
- strExpr
String to apply the replacement to.
- pattern
Regex pattern to find in the expression.
- replacement
Column to replace the matches with.
- returns
Column object.
-
def
reverse(c: Column): Column
Wrapper for Snowflake built-in reverse function. Gets the reversed string.
- c
Column to be reversed.
- returns
Column object.
-
def
shiftleft(c: Column, numBits: Int): Column
Wrapper for bitshiftleft. Shifts numBits bits of a number to the left. There is a slight difference between Spark's and Snowflake's implementations: when shifting an integer value, if Snowflake detects that the shift will exceed the maximum integer value, it converts it to Long, whereas Spark makes it negative and wraps around. Example: shiftleft(2147483647, 5) returns -32 on Spark, but 68719476704 on Snowflake. On Spark this is fixed by casting the column to LongType. In general, this difference is accepted.
- c
Column to modify.
- numBits
Number of bits to shift.
- returns
Column object.
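The overflow difference described above can be observed directly on the JVM, where a 32-bit Int wraps while a widened Long keeps the full value (matching Snowflake's automatic widening):

```scala
// Int.MaxValue is 2147483647; shifting left by 5 overflows a 32-bit Int.
val asInt: Int   = Int.MaxValue << 5        // wraps to -32
val asLong: Long = Int.MaxValue.toLong << 5 // 68719476704

println(asInt)
println(asLong)
```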
-
def
shiftright(c: Column, numBits: Int): Column
Wrapper for bitshiftright. Shifts numBits bits of a number to the right. There might be differences in this function similar to the ones explained for shiftleft.
- c
Column to modify.
- numBits
Number of bits to shift.
- returns
Column object.
-
def
signum(columnName: String): Column
Wrapper for Spark signum function. Returns the sign of the given column: 1 for positive, 0 for 0 or NaN, -1 for negative, and null for null.
There are some functional differences with Spark:
- Spark returns NaN when NaN is provided.
- Both Spark and Snowflake can receive a string and attempt to cast it. If the cast succeeds, the calculation is returned; if not, Spark returns null whereas Snowflake returns an error.
- columnName
Name of the column to calculate the sign.
- returns
Column object.
-
def
signum(colName: Column): Column
Wrapper for Spark signum function. Returns the sign of the given column: 1 for positive, 0 for 0 or NaN, -1 for negative, and null for null.
There are some functional differences with Spark:
- Spark returns NaN when NaN is provided.
- Both Spark and Snowflake can receive a string and attempt to cast it. If the cast succeeds, the calculation is returned; if not, Spark returns null whereas Snowflake returns an error.
- returns
Column object.
-
def
size(c: Column): Column
Wrapper for Snowflake built-in size function. Gets the size of an array column.
- c
Column to get the size.
- returns
Size of array column.
- def substring_index[TStr, TDelim, TCount](str: TStr, delim: TDelim, count: TCount)(implicit arg0: ColumnOrString[TStr], arg1: ColumnOrString[TDelim], arg2: ColumnOrInt[TCount]): Column
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
to_utc_timestamp(ts: Column, tz: Column): Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
- ts
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS.
- tz
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
- returns
A timestamp, or null if ts was a string that could not be cast to a timestamp, or tz was an invalid value.
-
def
to_utc_timestamp(ts: Column, tz: String): Column
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
- ts
A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS.
- tz
A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.
- returns
A timestamp, or null if ts was a string that could not be cast to a timestamp, or tz was an invalid value.
-
def
unbase64(col: Column): Column
Decodes a BASE64 encoded string.
- returns
the decoded column
-
def
unhex(c: Column): Column
Wrapper for Snowflake built-in hex_decode_string function. Returns the string representation of a hexadecimal value.
- c
Column to decode.
- returns
Decoded string.
-
def
unix_timestamp(s: Column, p: String): Column
Converts time string with given pattern to Unix timestamp (in seconds).
- s
A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS.
- p
A date time pattern detailing the format of s when s is a string.
- returns
A long, or null if s was a string that could not be cast to a date, or p was an invalid format.
-
def
unix_timestamp(s: Column): Column
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale.
- s
A date, timestamp or string. If a string, the data must be in the yyyy-MM-dd HH:mm:ss format.
- returns
A long, or null if the input was a string not of the correct format.
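The string-to-epoch-seconds semantics can be sketched with java.time. This unixTimestamp helper is illustrative only; the zone is explicit here, whereas the library uses the session's default time zone:

```scala
import java.time.{LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

// Parse 'yyyy-MM-dd HH:mm:ss' in the given zone and return epoch seconds.
def unixTimestamp(s: String, zone: ZoneId): Long =
  LocalDateTime.parse(s, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
    .atZone(zone).toEpochSecond

println(unixTimestamp("1970-01-02 00:00:00", ZoneId.of("UTC"))) // 86400
```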
-
def
unix_timestamp(): Column
Returns the current Unix timestamp (in seconds) as a long.
- Note
All calls of unix_timestamp within the same query return the same value.
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
def
when(condition: Column, result: Any): ExtendedCaseExpr
Wrapper for the when expression to facilitate some castings.
- condition
Expression to evaluate.
- result
The value to return if the expression is true.
- returns
Column object.
- object ColumnOrInt
- object ColumnOrString
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated @deprecated
- Deprecated
(Since version ) see corresponding Javadoc for more information.
snowpark-extensions
Snowpark by itself is a powerful library, but still some utility functions can always help.
The source code for this library is available here
Installation
With Maven you can add something like this to your POM:
or with sbt use
Usage
Just import it at the top of your file and it will automatically extend your Snowpark package.
For example:
Extensions
See Session Extensions
See Session Builder Extensions
See DataFrame Extensions
See Column Extensions
See Function Extensions