7  Snowflake

WIP

7.1 Setup

You will first need to install Snowflake’s ODBC Driver. Then configure the DSN for either Windows or macOS.

7.2 Making the Connection

I wrapped the connection into a function:

connect_to_snowflake <- function() {
  conn <- DBI::dbConnect(
    odbc::odbc(), 
    "<data_source_name>", 
    uid = "<user_id>", 
    pwd = rstudioapi::askForPassword())
  return(conn)
}

con <- connect_to_snowflake()

This works fine for interactive analysis, but you will need to stash your password as an environment variable for various workflows, or perhaps within {targets} projects. (see below)

7.3 Using SQL

Like any other database connection, you can pass SQL queries to the connection as a string.

Warning

If you set a default schema when configuring your Snowflake data source, you must explicitly reference other schemas when querying tables outside of it.

DBI::dbGetQuery(
  con,
  "select table_name, last_altered
  from information_schema.tables 
  where table_name like 'STG%' limit 5
  ")

7.4 Using R

is <- tbl(con, in_schema("INFORMATION_SCHEMA", "TABLES"))
is %>% 
  select(TABLE_NAME, LAST_ALTERED) %>% 
  filter(str_detect(TABLE_NAME, '^STG')) %>%
  head(5) %>% 
  collect()

7.5 Snowflake tables as targets

We can use tarchetypes::tar_change() and the LAST_ALTERED field referenced above as a trigger in {targets} pipelines. If the date changes, the target will rerun alongside with downstream targets. Here’s a small example of how that might work:

_targets.R

library(targets)
library(tarchetypes)

tar_option_set(
  packages = c(
    "tibble",
    "DBI",
    "odbc",
    "dplyr",
    "dbplyr"
    )
  )

connect_to_snowflake <- function() {
  conn <- DBI::dbConnect(
    odbc::odbc(),
    "<dsn_name>",
    uid = "<user_id",
    pwd = "<password>")
  return(conn)
}

tar_plan(
  snowflake_con = connect_to_snowflake(),
  tar_change(
    my_table,
    collect(tbl(snowflake_con, "MY_TABLE")),
    change = dbGetQuery(
      snowflake_con,
      "select last_altered from db.information_schema.tables where table_name = 'MY_TABLE' and table_schema = 'PROD'"
      )
  ),
  my_table_transformed = head(my_table)
)