Packages

  • package root

    snowpark-extensions

    Snowpark by itself is a powerful library, but some extra utility functions can always help.

    The source code for this library is available on GitHub.

    Installation

    With Maven you can add something like this to your POM:

    <dependency>
        <groupId>net.mobilize.snowpark-extensions</groupId>
        <artifactId>snowparkextensions</artifactId>
        <version>0.0.9</version>
    </dependency>
    

    or with sbt use

    libraryDependencies += "net.mobilize.snowpark-extensions" % "snowparkextensions" % "0.0.16"
    

    Usage

    Just import it at the top of your file and it will automatically extend your Snowpark package.

    For example:

    import com.snowflake.snowpark_extensions.Extensions._
    import com.snowflake.snowpark.Session
    
    val new_session = Session.builder.from_snowsql().appName("app1").create()
    

    Extensions

    See Session Extensions
    See Session Builder Extensions
    See DataFrame Extensions
    See Column Extensions
    See Function Extensions

    Definition Classes
    root
  • package com
    Definition Classes
    root
  • package snowflake
    Definition Classes
    com
  • package snowpark_extensions
    Definition Classes
    snowflake
  • object Extensions
    Definition Classes
    snowpark_extensions
  • functions

object functions

Collection of helper functions to provide more equivalency between snowpark and spark

Linear Supertypes
AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. functions
  2. AnyRef
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Type Members

  1. sealed trait ColumnOrInt[T] extends AnyRef
  2. sealed trait ColumnOrString[T] extends AnyRef

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def array(colName: String, colNames: String*): Column

    Wrapper for Snowflake built-in array function. Creates an array from column names.

    returns

    The array.

  5. def array(c: Column*): Column

    Wrapper for Snowflake built-in array function. Creates an array from columns.

    c

    Columns to build the array.

    returns

    The array.
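
    A minimal sketch of both overloads (assuming an existing Session named session, and that these helpers live in com.snowflake.snowpark_extensions.functions):

    import com.snowflake.snowpark.functions.col
    import com.snowflake.snowpark_extensions.functions._

    // Build the same ARRAY column from Column objects and from column names.
    val df = session.createDataFrame(Seq((1, 2), (3, 4))).toDF(Seq("a", "b"))
    df.select(array(col("a"), col("b")).as("arr")).show() // [1, 2] and [3, 4]
    df.select(array("a", "b").as("arr")).show()           // same result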

  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def asc(colname: String): Column

    Function to convert column name into column and order in an ascending manner.

    colname

    Column name.

    returns

    Column object ordered in an ascending manner.

  8. def base64(col: Column): Column

    Computes the BASE64 encoding of a column.

    returns

    the encoded column

  9. def bround(colName: Column): Column

    Implementation for Spark bround. This function receives a column and rounds it to 0 decimals with the HALF_EVEN rounding mode, often called "banker's rounding". This means that if the number is at the same distance from an even and an odd number, it rounds to the even number.

    colName

    Column to round.

    returns

    Rounded number.

  10. def bround(colName: Column, scale: Int): Column

    Implementation for Spark bround. This function receives a column with a number and rounds it to scale decimal places with the HALF_EVEN rounding mode, often called "banker's rounding". This means that if the number is at the same distance from an even and an odd number, it rounds to the even number.

    colName

    Column to round.

    scale

    Number of decimals to preserve.

    returns

    Rounded number.
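
    A minimal sketch of the banker's rounding behavior (assuming a session and imports as in the array example above):

    // HALF_EVEN: ties round toward the even neighbor.
    val df = session.createDataFrame(Seq((2.5, 1), (3.5, 2), (2.25, 3))).toDF(Seq("v", "id"))
    df.select(bround(col("v")).as("r0"),    // 2.5 -> 2, 3.5 -> 4, 2.25 -> 2
              bround(col("v"), 1).as("r1")) // 2.5 -> 2.5, 3.5 -> 3.5, 2.25 -> 2.2
      .show()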

  11. def cbrt(columnName: String): Column

    Wrapper for Spark CBRT(String) function. Used to calculate the cubic root of a number. Slight differences were found: cbrt(341) -> Spark: 6.986368027818106, Snowflake: 6.986368027818107 (notice the last decimal); cbrt(2147483647) -> Spark: 1290.159154892091, Snowflake: 1290.1591548920912 (notice the difference at the end). This difference is acceptable.

    returns

    Column object.

  12. def cbrt(e: Column): Column

    Wrapper for Spark CBRT(Column) function. Used to calculate the cubic root of a number. Slight differences were found: cbrt(341) -> Spark: 6.986368027818106, Snowflake: 6.986368027818107 (notice the last decimal); cbrt(2147483647) -> Spark: 1290.159154892091, Snowflake: 1290.1591548920912 (notice the difference at the end). This difference is acceptable.

    returns

    Column object.

  13. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  14. def collect_list(s: String): Column

    Wrapper for Snowflake built-in collect_list function. Collects the values of the column into an array.

    s

    Column name to be collected.

    returns

    The array.

  15. def collect_list(c: Column): Column

    Wrapper for Snowflake built-in collect_list function. Collects the values of the column into an array.

    c

    Column to be collected.

    returns

    The array.

  16. def collect_set(e: String): Column

    Aggregate function: returns a set of objects with duplicate elements eliminated.

    e

    The column to collect the list values

    returns

    A list with unique values

  17. def collect_set(e: Column): Column

    Aggregate function: returns a set of objects with duplicate elements eliminated.

    e

    The column to collect the list values

    returns

    A list with unique values
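
    A small sketch of both collectors in an aggregation (assuming a session and imports as above):

    val df = session.createDataFrame(Seq(("a", 1), ("a", 1), ("a", 2))).toDF(Seq("g", "v"))
    df.groupBy(col("g"))
      .agg(collect_list(col("v")).as("all_vals"), // [1, 1, 2]
           collect_set(col("v")).as("uniq_vals")) // [1, 2]
      .show()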

  18. def conv(c: Column, fromBase: Int, toBase: Int): Column

    Wrapper for Snowflake built-in conv function. Converts a number from one base to another.

    c

    Column to be converted.

    fromBase

    Base the input number is in.

    toBase

    Base to convert the number to.

    returns

    Column object.
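
    For example (a hedged sketch, assuming a DataFrame df with a string column bits):

    // Interpret the value as base 2 and render it in base 10.
    df.select(conv(col("bits"), 2, 10)).show() // "1111" -> "15"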

  19. def date_add(start: Column, days: Column): Column

    Returns the date that is days days after start

    start

    A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

    days

    The number of days to add to start, can be negative to subtract days

    returns

    A date, or null if start was a string that could not be cast to a date

  20. def date_add(start: Column, days: Int): Column

    Returns the date that is days days after start

    start

    A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

    days

    The number of days to add to start, can be negative to subtract days

    returns

    A date, or null if start was a string that could not be cast to a date

  21. def date_format(c: Column, s: String): Column

    Wrapper for Snowflake built-in date_format function. Converts a date into a string using the specified format.

    c

    Column to convert to string.

    s

    Date format.

    returns

    Column object.

  22. def date_sub(start: Column, days: Int): Column

    Implementation for Spark date_sub. This function receives a date or timestamp, as well as a properly formatted string, and subtracts the specified number of days from it. If it receives a string, the string is cast to a date using try_cast; if the cast is not possible, it returns null. If it receives a timestamp, it is cast to a date (removing its time part).

    There is a functional difference with this function:

    • Snowflake infers timestamp formats better than Spark when the column is a string. For example, for the input 05/15/2021 06:54:34 Spark returns null, whereas Snowflake returns the expected value.
    start

    Date, Timestamp or String column to subtract days from.

    days

    Days to subtract.

    returns

    Column object.
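
    A minimal sketch combining date_add and date_sub (assuming a DataFrame df with a date column d):

    df.select(date_add(col("d"), 7).as("next_week"), // 7 days after d
              date_sub(col("d"), 7).as("prev_week")) // 7 days before d
      .show()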

  23. def desc(c: String): Column

    Function to convert column name into column and order in a descending manner.

    c

    Column name.

    returns

    Column object ordered in a descending manner.

  24. def element_at[T](column: Column, index: T)(implicit arg0: ColumnOrInt[T]): Column

    Returns the element of an array column at the given index. Mostly an overload for Snowpark get_path.

    See also

    Snowpark get_path
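
    A hedged sketch (assuming a DataFrame df with an array column arr; the index semantics are those of the underlying get_path):

    df.select(element_at(col("arr"), 0).as("elem")).show()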

  25. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  27. def expr(s: String): Column

    Function to convert a string into an SQL expression.

    s

    SQL Expression as text.

    returns

    Converted SQL Expression.

  28. def first(c: Column, nulls: Boolean): Column

    Wrapper for Snowflake built-in first function. Gets the first value of a column according to its grouping.

    c

    Column to get first value.

    nulls

    Consider null values or not.

    returns

    Column object.

  29. def first(s: String, nulls: Boolean): Column

    Wrapper for Snowflake built-in first function. Gets the first value of a column according to its grouping.

    s

    Column name to get first value.

    nulls

    Consider null values or not.

    returns

    Column object.

  30. def first(s: String): Column

    Wrapper for Snowflake built-in first function. Gets the first value of a column according to its grouping.

    returns

    Column object.

  31. def first(c: Column): Column

    Wrapper for Snowflake built-in first function. Gets the first value of a column according to its grouping.

    c

    Column to obtain first value.

    returns

    Column object.

  32. def format_number(x: Column, d: Int): Column

    Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.

    If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.

    x

    numeric column to be transformed

    d

    Number of decimal places for the number format

    returns

    Number cast to the specified string format
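
    For instance (a sketch, assuming a DataFrame df with a numeric column x):

    df.select(format_number(col("x"), 2)).show() // 1234567.891 -> "1,234,567.89"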

  33. def format_string(format: String, arguments: Column*): Column

    Formats the arguments in printf-style and returns the result as a string column.

    format

    the printf-style format

    arguments

    arguments for the formatting string

    returns

    formatted string

    Note

    this function requires the format_string UDF to be previously created

  34. def from_json(e: Column): Column

    Wrapper for Spark from_json column function. This function converts a JSON string to a struct in Spark (a variant in Snowflake). Spark has several overloads for this function, where you specify the schema in which to convert it, with the desired names and datatypes.

    Since we are using TRY_PARSE_JSON to replicate the functionality, we're not able to specify the structure. Snowflake will automatically parse everything, but it parses all values as strings, so manual casts need to be performed, as in the example below.

    There were differences found between Spark from_json and this function when performing a select on the resulting column:

    • Float type: Spark returns only 7 floating-point digits, whereas Snowflake returns more.
    • Timestamp type: Spark interprets the input date as UTC and converts it to a local timestamp, whereas Snowflake leaves the timestamp as-is.
    • Spark allows the selection of values within the resulting object by navigating through the object with '.' notation, for example df.select("json.relative.age"). On Snowflake this does not work; the df.selectExpr function is required, and the same example translates to "json['relative']['age']".
    • Since Spark receives a schema definition for the JSON string to read, it reads the values from the JSON and converts them to the specified data types. On Snowflake the values are converted automatically, but as variants, meaning that the printSchema function returns different datatypes. To cast a value to its expected datatype, read it in selectExpr as "json['relative']['age']::integer".
      val data_for_json = Seq(
        (1, "{\"id\": 172319, \"age\": 41, \"relative\": {\"id\": 885471, \"age\": 29}}"),
        (2, "{\"id\": 532161, \"age\": 17, \"relative\":{\"id\": 873513, \"age\": 47}}")
      )
      val data_for_json_column = Seq("col1", "col2")
      val df_for_json = session.createDataFrame(data_for_json).toDF(data_for_json_column)
      
      val json_df = df_for_json.select(
        from_json(col("col2")).as("json")
      )
      
      json_df.selectExpr(
        "json['id']::integer as id"
        , "json['age']::integer as age"
        , "json['relative']['id']::integer as rel_id"
        , "json['relative']['age']::integer as rel_age"
      ).show(10, 10000)
      

    -----------------------------------------
    |"ID"    |"AGE"  |"REL_ID"  |"REL_AGE"  |
    -----------------------------------------
    |172319  |41     |885471    |29         |
    |532161  |17     |873513    |47         |
    -----------------------------------------
    

    e

    String column to convert to variant.

    returns

    Column object.

  35. def from_unixtime(ut: Column, f: String): Column

    Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.

    ut

    A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch

    f

    A date time pattern that the input will be formatted to

    returns

    A string, or null if ut was a string that could not be cast to a long or f was an invalid date time pattern

  36. def from_unixtime(ut: Column): Column

    Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the yyyy-MM-dd HH:mm:ss format.

    ut

    A number of a type that is castable to a long, such as string or integer. Can be negative for timestamps before the unix epoch

    returns

    A string, or null if the input was a string that could not be cast to a long
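
    A minimal sketch of both overloads (assuming a DataFrame df with a column ut holding epoch seconds):

    df.select(from_unixtime(col("ut")).as("ts"),                // yyyy-MM-dd HH:mm:ss
              from_unixtime(col("ut"), "yyyy-MM-dd").as("day")) // custom pattern
      .show()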

  37. def from_utc_timestamp(ts: Column, tz: Column): Column

    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.

    ts

    A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

    tz

    A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.

    returns

    A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value

  38. def from_utc_timestamp(ts: Column, tz: String): Column

    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.

    ts

    A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

    tz

    A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.

    returns

    A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value
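
    For example (a sketch, assuming a DataFrame df with a timestamp column ts):

    // Interpret ts as UTC and render it in Los Angeles local time.
    df.select(from_utc_timestamp(col("ts"), "America/Los_Angeles")).show()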

  39. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  40. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  41. def hex(c: Column): Column

    Wrapper for Snowflake built-in hex_encode function. Returns the hexadecimal representation of a string.

    c

    Column to encode.

    returns

    Encoded string.

  42. def instr(str: Column, substring: String): Column

    Locate the position of the first occurrence of substr column in the given string. Returns null if either of the arguments are null.

    Note

    The position is not zero-based, but a 1-based index. Returns 0 if substr could not be found in str.

  43. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  44. def isnull(c: Column): Column

    Wrapper for Snowflake built-in isnull function. Returns a boolean indicating whether the value is NULL.

    c

    Column to analyze for a null value.

    returns

    Column object.

  45. def json_tuple(json: Column, fields: String*): Seq[Column]

    Wrapper for Spark json_tuple. This leverages JSON_EXTRACT_PATH_TEXT and improves functionality by allowing multiple columns in a single call, whereas JSON_EXTRACT_PATH_TEXT must be called once for every column.

    There were differences found between Spark json_tuple and this function:

    • Float type: Spark returns only 6 floating-point digits, whereas Snowflake returns more.
    • Timestamp type: Spark interprets the input date as UTC and converts it to a local timestamp, whereas Snowflake leaves the timestamp as-is.
    • Complex JSON path expressions: this function allows the retrieval of values within JSON objects, whereas Spark only allows values from the root.
    • Identifiers with spaces: Snowflake returns an error when an invalid expression is sent, whereas Spark returns null.

    Usage:

    val df = session.createDataFrame(Seq(("CR", "{\"id\": 5, \"name\": \"Jose\", \"age\": 29}"))).toDF(Seq("nationality", "json_string"))
    

    When the result of this function is the only part of the select statement, no changes are needed:

    df.select(json_tuple(col("json_string"), "id", "name", "age")).show()
    

    ----------------------
    |"C0"  |"C1"  |"C2"  |
    ----------------------
    |5     |Jose  |29    |
    ----------------------
    

    However, when combining it with other columns, the resulting sequence must be expanded into the argument list:

    df.select(
      Seq(col("nationality")) ++ json_tuple(col("json_string"), "id", "name", "age"): _* // notice the : _* expansion
    ).show()
    

    --------------------------------------
    |"NATIONALITY"  |"C0"  |"C1"  |"C2"  |
    --------------------------------------
    |CR             |5     |Jose  |29    |
    --------------------------------------

    json

    Column containing the JSON string text.

    fields

    Fields to pull from the JSON file.

    returns

    Column sequence with the specified strings.

  46. def last(c: Column, nulls: Boolean): Column

    Wrapper for Snowflake built-in last function. Gets the last value of a column according to its grouping. Functional difference with window functions: in Snowpark an explicit order by is required, since SQL does not guarantee the order.

    c

    Column to get last value.

    nulls

    Consider null values or not.

    returns

    Column object.

  47. def last(s: String, nulls: Boolean): Column

    Wrapper for Snowflake built-in last function. Gets the last value of a column according to its grouping. Functional difference with window functions: in Snowpark an explicit order by is required, since SQL does not guarantee the order.

    s

    Column name to get last value.

    nulls

    Consider null values or not.

    returns

    Column object.

  48. def last(s: String): Column

    Wrapper for Snowflake built-in last function. Gets the last value of a column according to its grouping.

    returns

    Column object.

  49. def last(c: Column): Column

    Wrapper for Snowflake built-in last function. Gets the last value of a column according to its grouping. Functional difference with window functions: in Snowpark an explicit order by is required, since SQL does not guarantee the order.

    c

    Column to obtain last value.

    returns

    Column object.

  50. def locate(substr: Column, str: Column, pos: Int): Column

    Locate the position of the first occurrence of substr in a string column, after position pos.

    substr

    string to search

    str

    value where string will be searched

    pos

    index for starting the search

    returns

    returns the position of the first occurrence.

    Note

    The position is not zero-based, but a 1-based index. Returns 0 if substr could not be found in str. This function just leverages the Snowflake POSITION built-in.

  51. def locate(substr: String, str: Column, pos: Int = 0): Column

    Locate the position of the first occurrence of substr in a string column, after position pos.

    substr

    string to search

    str

    value where string will be searched

    pos

    index for starting the search

    returns

    Returns the position of the first occurrence

    Note

    The position is not zero-based, but a 1-based index. Returns 0 if substr could not be found in str. This function just leverages the Snowflake POSITION built-in.
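
    A small sketch of instr and locate (assuming a DataFrame df with a string column s; positions are 1-based):

    df.select(instr(col("s"), "bar").as("i"),  // "foobar" -> 4
              locate("bar", col("s")).as("l")) // same result via POSITION
      .show()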

  52. def log10(columnName: String): Column

    Computes the logarithm of the given column in base 10.

    columnName

    Column to apply this mathematical operation

    returns

    log10 of the given column

  53. def log10(expr: Column): Column

    Computes the logarithm of the given column in base 10.

    expr

    Column to apply this mathematical operation

    returns

    log10 of the given column

  54. def log1p(col: Column): Column

    Computes the natural logarithm of the given value plus one.

    col

    the value to use

    returns

    the natural logarithm of the given value plus one.

  55. def log1p(columnName: String): Column

    Computes the natural logarithm of the given value plus one.

    columnName

    the value to use

    returns

    the natural logarithm of the given value plus one.

  56. def log2(columnName: String): Column

    Computes the logarithm of the given column in base 2.

    columnName

    Column to apply this mathematical operation

    returns

    log2 of the given column

  57. def log2(expr: Column): Column

    Computes the logarithm of the given column in base 2.

    expr

    Column to apply this mathematical operation

    returns

    log2 of the given column

  58. def monotonically_increasing_id(): Column

    A column expression that generates monotonically increasing 64-bit integers.

  59. def months_between(end: Column, start: Column): Column

    Returns number of months between dates start and end.

    A whole number is returned if both inputs have the same day of month or both are the last day of their respective months. Otherwise, the difference is calculated assuming 31 days per month.

    For example:

    months_between("2017-11-14", "2017-07-14")  // returns 4.0
    months_between("2017-01-01", "2017-01-10")  // returns 0.29032258
    months_between("2017-06-01", "2017-06-16 12:00:00")  // returns -0.5
    end

    A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

    start

    A date, timestamp or string. If a string, the data must be in a format that can cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

    returns

    A double, or null if either end or start were strings that could not be cast to a timestamp. Negative if end is before start

  60. def nanvl(expr1: Column, expr2: Column): Column

    Returns expr1 if it is not NaN, or expr2 if expr1 is NaN.

    expr1

    expression to return when it is not NaN

    expr2

    expression to return when expr1 is NaN

  61. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  62. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  63. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  64. def ntile(n: Int): Column
  65. def randn(seed: Long): Column

    Wrapper for Spark randn(seed). It returns a call to the Snowflake RANDOM function. The return values differ between Snowflake and Spark, but this difference is accepted: Spark returns a float between -9 and 9 with 15 to 17 decimal digits, whereas Snowflake returns integers of 17 to 19 digits.

    seed

    Seed to use in the random function.

    returns

    Random number.

  66. def randn(): Column

    Wrapper for Spark randn. It returns a call to the Snowflake RANDOM function. The return values differ between Snowflake and Spark, but this difference is accepted: Spark returns a float between -9 and 9 with 15 to 17 decimal digits, whereas Snowflake returns integers of 17 to 19 digits.

    returns

    Random number.

  67. def regexp_extract(colName: Column, exp: String, grpIdx: Int): Column

    Implementation for Spark regexp_extract. This function receives a column and extracts group grpIdx from the string after applying the exp regex. Spark returns an empty string when the string doesn't match and null if the input is null. This behavior has been replicated, since Snowflake returned null in both cases.

    This function applies the case sensitive and extract flags. It doesn't apply multiline nor .* matches newlines. Even though Snowflake is capable of doing so with the regexp_substr function, Spark's behavior for this function doesn't allow this. If these flags need to be applied, use builtin("REGEXP_SUBSTR") instead and apply the desired flags.

    Note: This function ensures that the function has the same functionality on Snowflake as it does on Spark. However, it doesn't guarantee that regular expressions on Spark will work the same on Snowflake. They still have different engines and the regex itself needs to be analyzed manually. For example: non-greedy tokens such as .*? don't work on Snowflake as they do on Spark.

    colName

    Column to apply regex.

    exp

    Regex expression to apply.

    grpIdx

    Group to extract.

    returns

    Column object.
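
    For instance (a hedged sketch, assuming a DataFrame df with a string column s):

    // Extract the first capture group: the digits before the dash.
    df.select(regexp_extract(col("s"), "(\\d+)-(\\d+)", 1)).show() // "12-34" -> "12"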

  68. def regexp_replace(strExpr: Column, pattern: String, replacement: String): Column

    Wrapper for Snowflake built-in regexp_replace function. Replaces parts of a string with the specified replacement value, based on a regular expression.

    strExpr

    String to apply replacement.

    pattern

    Regex pattern to find in the expression.

    replacement

    Value to replace the matched pattern with.

    returns

    Column object.

  69. def regexp_replace(strExpr: Column, pattern: Column, replacement: Column): Column

    Wrapper for Snowflake built-in regexp_replace function. Replaces parts of a string with the specified replacement value, based on a regular expression.

    strExpr

    String to apply replacement.

    pattern

    Regex pattern to find in the expression.

    replacement

    Value to replace the matched pattern with.

    returns

    Column object.

  70. def reverse(c: Column): Column

    Wrapper for Snowflake built-in reverse function. Gets the reversed string.

    c

    Column to be reversed.

    returns

    Column object.

  71. def shiftleft(c: Column, numBits: Int): Column

    Wrapper for bitshiftleft. Shifts numBits bits of a number to the left. There is a slight difference between Spark's and Snowflake's implementations: when shifting an integer value, if Snowflake detects that the shift will exceed the maximum integer value, it converts the result to Long, whereas Spark makes it negative, counting from the bottom. Example: shiftleft(2147483647, 5) returns -32 on Spark, but 68719476704 on Snowflake. This is fixed by casting the column in Spark to LongType. In general, this difference is accepted.

    c

    Column to modify.

    numBits

    Number of bits to shift.

    returns

    Column object.
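
    For example (a sketch using Snowpark's lit):

    import com.snowflake.snowpark.functions.lit

    df.select(shiftleft(lit(2), 3)).show() // 2 << 3 = 16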

  72. def shiftright(c: Column, numBits: Int): Column

    Wrapper for bitshiftright. Shifts numBits bits of a number to the right. There might be differences on this function similar to the ones explained on shiftleft.

    c

    Column to modify.

    numBits

    Number of bits to shift.

    returns

    Column object.

  73. def signum(columnName: String): Column

    Wrapper for Spark signum function. Returns the sign of the given column: 1 for positive, 0 for zero or NaN, -1 for negative, and null for null.

    There are some functional differences with Spark:

    • Spark returns NaN when NaN is provided.
    • Both Spark and Snowflake can receive a string and attempt to cast it. If it casts correctly, the calculation is returned; if not, Spark returns null whereas Snowflake returns an error.
    columnName

    Name of the column to calculate the sign.

    returns

    Column object.

  74. def signum(colName: Column): Column

    Wrapper for Spark signum function. Returns the sign of the given column: 1 for positive, 0 for zero or NaN, -1 for negative, and null for null.

    There are some functional differences with Spark:

    • Spark returns NaN when NaN is provided.
    • Both Spark and Snowflake can receive a string and attempt to cast it. If it casts correctly, the calculation is returned; if not, Spark returns null whereas Snowflake returns an error.
    returns

    Column object.

  75. def size(c: Column): Column

    Wrapper for Snowflake built-in size function. Gets the size of an array column.

    c

    Column to get the size.

    returns

    Size of array column.

  76. def substring_index[TStr, TDelim, TCount](str: TStr, delim: TDelim, count: TCount)(implicit arg0: ColumnOrString[TStr], arg1: ColumnOrString[TDelim], arg2: ColumnOrInt[TCount]): Column
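
    This member is otherwise undocumented; judging by the Spark function of the same name, it presumably returns the substring of str before count occurrences of the delimiter delim. A hedged sketch (assuming Spark-like semantics and a DataFrame df with a string column host):

    df.select(substring_index(col("host"), ".", 2)).show() // "www.example.com" -> "www.example"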
  77. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  78. def toString(): String
    Definition Classes
    AnyRef → Any
  79. def to_utc_timestamp(ts: Column, tz: Column): Column

    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

    ts

    A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

    tz

    A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.

    returns

    A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value

  80. def to_utc_timestamp(ts: Column, tz: String): Column

    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

    ts

    A date, timestamp or string. If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

    tz

    A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous.

    returns

    A timestamp, or null if ts was a string that could not be cast to a timestamp or tz was an invalid value

  81. def unbase64(col: Column): Column

    Decodes a BASE64 encoded string.

    returns

    the decoded column

  82. def unhex(c: Column): Column

    Wrapper for Snowflake built-in hex_decode_string function. Returns the string represented by a hexadecimal value.

    c

    Column to decode.

    returns

    Decoded string.

  83. def unix_timestamp(s: Column, p: String): Column

    Converts time string with given pattern to Unix timestamp (in seconds).

    s

    A date, timestamp or string. If a string, the data must be in a format that can be cast to a date, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS

    p

    A date time pattern detailing the format of s when s is a string

    returns

    A long, or null if s was a string that could not be cast to a date or p was an invalid format

  84. def unix_timestamp(s: Column): Column

    Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale.

    s

    A date, timestamp or string. If a string, the data must be in the yyyy-MM-dd HH:mm:ss format

    returns

    A long, or null if the input was a string not of the correct format

  85. def unix_timestamp(): Column

    Returns the current Unix timestamp (in seconds) as a long.

    Note

    All calls of unix_timestamp within the same query return the same value

  86. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  87. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  88. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  89. def when(condition: Column, result: Any): ExtendedCaseExpr

    Wrapper for the when expression to facilitate some castings.

    condition

    expression to evaluate

    result

    the value to return if expression is true

    returns

    Column object.
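
    A minimal sketch (assuming ExtendedCaseExpr supports an otherwise clause like Snowpark's CaseExpr, and a DataFrame df with a numeric column age):

    import com.snowflake.snowpark.functions.lit

    df.select(when(col("age") >= lit(18), "adult").otherwise(lit("minor")).as("group")).show()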

  90. object ColumnOrInt
  91. object ColumnOrString

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated @deprecated
    Deprecated

    See the corresponding Javadoc for more information.

Inherited from AnyRef

Inherited from Any

Ungrouped