--- title: "Getting Started with qryflow" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with qryflow} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} library(qryflow) ``` # What is `qryflow`? `qryflow` lets you write multi-step SQL workflows in plain `.sql` files and run them from R with a single function call. Specially formatted tags tell R how to execute each SQL chunk and what to name the results. This allows you to: - Keep multiple SQL statements in the same file. - Control how each SQL "chunk" is executed. - Return results as named R objects. - Pass metadata that can be used later in R workflows In short: You can define and run **multi-step SQL workflows** with one function call, and get your results back as a structured R object. # Basic usage The main function is `qryflow`, which accepts SQL tagged with special comments and a connection to DBI-compliant database. Note, the SQL can be a character vector, like in the example below, or a filepath to a file that contains SQL. ```{r example} # Connection to In-Memory DB with table populated from mtcars con <- example_db_connect(mtcars) sql <- " -- @exec: drop_cyl_6 DROP TABLE IF EXISTS cyl_6; -- @exec: prep_cyl_6 CREATE TABLE cyl_6 AS SELECT * FROM mtcars WHERE cyl = 6; -- @query: df_cyl_6 SELECT * FROM cyl_6; " # Pass tagged SQL to `qryflow` results <- qryflow(con, sql, verbose = TRUE) # Access the results from the chunk named `df_cyl_6` head(results$df_cyl_6) ``` By default, the package supports `@exec` tags, which are executed with `DBI::dbExecute()` and `@query` tags, which are executed with `DBI::dbGetQuery()`. When you run `qryflow()`: 1. The SQL script is split into chunks using tag lines like `-- @query: df_mtcars`. 2. Each chunk is assigned a type (e.g., `query` or `exec`) 3. Chunks are executed in order, using the associated execution type 4. The results are returned as named objects # Defining a Chunk In `qryflow`, a chunk is a grouped section of SQL code, representing a single executable unit within a larger multi-step SQL workflow, and preceded by one or more tag lines (e.g., the pattern `-- @: `). - Tagged lines act as markers that start a new chunk. - All lines (comments and SQL) immediately following a contiguous group of tagged lines belong to that chunk until another tag line starts the next chunk. - If the script has no tags, the entire script is treated as one single chunk. # Tags and Aliases Each SQL chunk must be tagged with a `type` so `qryflow` knows how to execute it. If a chunk is not provided with a tag, the `qryflow` engine will use the value of the `default_type` argument, which can be provided directly or set with `getOption("qryflow.default_type", "query")`. It defaults to "query", as getting data out is the most common use case. Tags use SQL-style comments (`--`) and follow the format: ```sql -- @: ``` ## Important Tags Each chunk should have both a `name` (the name of the object when returned to R) and a `type` (execution mode for the chunk). Users can set these explicitly with the following tags: - `@type` apecifies execution type (`-- @type: query`) - `@name` assigns a name to the chunk’s result (`-- @name: df_users`) For registered types, users can use shorthand to supply both name and type in one line. For example, `@query` and `@exec` are aliases for setting both `@type` and `@name` in one line. **Aliased form (preferred):** ```sql -- @query: df_mtcars SELECT * FROM mtcars; ``` **Explicit form (equivalent):** ```sql -- @type: query -- @name: df_mtcars SELECT * FROM mtcars; ``` ## Type Identification During parsing, `qryflow` determines its type using the following rules: 1. If a chunk includes an explicit `-- @type:` tag, that value is used as the chunk type. 2. If there is no `@type:` tag, `qryflow` checks for any other tag that matches a registered type (`@query:`, `@exec:`, etc.) . The first match found is used as the type. 3. If no recognized tag is found, the type defaults to the value of `getOption("qryflow.default_type", "query")`. ## Passing Additional Tags You can include additional tags to carry metadata into your R workflow, that follow the tagging structure: ```sql -- @exec: df_mtcars -- @src: dbo.mtcars -- @topic: cars SELECT * FROM mtcars; ``` # Important Arguments ## on_error The `on_error` argument controls what happens when a single chunk fails: - `"stop"` (default): halts execution immediately and raises an error. - `"warn"`: records the error and signals a warning, but continues running remaining chunks. - `"collect"`: silently collects all errors across all chunks and raises a single combined error at the end. ```{r, error=TRUE} # on_error = "stop" (default): halts on first failure bad_sql <- " -- @exec: prep_cyl_6 CREATE TABLE cyl_6 AS SELECT * FROM mtcars WHERE cyl = 6; -- @query: df_missing SELECT * FROM nonexistent_table; -- @query: df_mtcars SELECT * FROM mtcars; " qryflow(con, bad_sql, on_error = "stop") ``` ```{r} # Warn collects errors and signals a warning qryflow(con, bad_sql, on_error = "warn") ``` ```{r, error=TRUE} # on_error = "collect": runs everything, then reports all failures together qryflow(con, bad_sql, verbose = TRUE, on_error = "collect") ``` ## verbose By default, `qryflow` is quiet. However, for long running queries with multiple chunks, you may want feedback on which chunks are currently running. You can use `verbose = TRUE` to get updates during execution. ## simplify When `simplify = TRUE`, in the case where there is only one chunk, `qryflow()` will return a single object (as opposed to a named list of results). For example: ```{r} sql1 <- " -- @query: df_mtcars SELECT * FROM mtcars; " sql2 <- " -- @query: df_mtcars SELECT * FROM mtcars; -- @query: df_mtcars_cyl6 SELECT * FROM mtcars WHERE cyl = 6; " # Pass tagged SQL to `qryflow` res1 <- qryflow(con, sql1, simplify = TRUE) res2 <- qryflow(con, sql2, simplify = TRUE) res3 <- qryflow(con, sql1, simplify = FALSE) class(res1) # simplifies the result to the single data.frame() because only one chunk class(res2) # returns named list class(res3) # returns named list, because simplify = FALSE ``` This design choice is to facilitate easy interactive use and is a common use-case. Because `qryflow()` might return a named `list` or a single `data.frame` depending on the input, the `qryflow` package exports other functions so users can prioritize reliability in return objects. The next section explores functions like `qryflow_run()` and `qryflow_results()` further. # The Core API While `qryflow()` covers most use cases, users who want more control and consistency may prefer to use the functions that `qryflow()` leverages: - `qryflow_run()` - `qrflow_results() ` - `qryflow_parse()` - `qryflow_execute` ## `qryflow_run()` and `qryflow_results()` `qryflow_run()` performs parsing *and* execution, returning a full `qryflow` object - including all chunk metadata, not just the query results. ```{r} obj <- qryflow_run(con, sql) # A qryflow object class(obj) # Chunk names are top-level list names names(obj) obj # Print Method ``` Each element is a `qryflow_chunk`: ```{r} class(obj$df_cyl_6) # Print the chunk obj$df_cyl_6 ``` To extract only the query results (equivalent to what `qryflow()` returns), use `qryflow_results()`: ```{r} results <- qryflow_results(obj) class(results$df_cyl_6) head(results$df_cyl_6) ``` ## `qryflow_parse()` and `qryflow_execute()` For even more control, you can parse and execute separately: ```{r} # Step 1: Parse the SQL into structured chunks filepath <- example_sql_path() workflow <- qryflow_parse(filepath) class(workflow) length(workflow) names(workflow) # Inspect a chunk before execution workflow$df_mtcars ``` Each `qryflow_chunk` contains: - `$type`: the execution type (e.g., `"query"`) - `$name`: the chunk name - `$sql`: the SQL body - `$tags`: any additional tags - `$results`: `NULL` before execution; populated after ```{r} # Step 2: Execute the parsed workflow executed <- qryflow_execute(con, workflow) class(executed) names(executed) executed ``` # Metadata Both the worfklow object (`qryflow`) and the chunk objects (`qryflow_chunk`) store metadata about the execution. You can access this information with the `qryflow_meta()` function: ```{r} qryflow_meta(executed) # The whole workflow ``` ```{r} qryflow_meta(executed[[1]]) # The whole chunk ``` # Summary | Function | What it does | |---|---| | `qryflow()` | Parse + execute + return query results. | | `qryflow_run()` | Parse + execute, returning a full `qryflow` object with metadata. | | `qryflow_results()` | Extract query results from a `qryflow` object. | | `qryflow_parse()` | Parse SQL into structured `qryflow` object - No execution. | | `qryflow_execute()` | Execute a parsed `qryflow` object against a connection. | | `qryflow_meta()` | Access metadata (status, duration, timing) on a workflow or chunk. | For a guide on registering custom chunk types and extending `qryflow`'s behaviour, see `vignette("extend-qryflow", package = "qryflow")`. # Examples **Example 1 - Script with no tags** ```sql CREATE TABLE cyl_6 AS SELECT * FROM mtcars WHERE cyl = 6; ``` Result - The entire script is one chunk containing all lines. - Why? Without tags, `qryflow` treats the whole script as a single step. **Example 2 - Script with one tag at the start** ```sql -- @query: get_6cyl SELECT * FROM mtcars WHERE cyl = 6; ``` Result - One chunk starting at the tag, containing the rest of the script. Because the tag is at line 1, the chunk starts there and continues to the end. **Example 3 - Script with one tag in the middle** ```sql SELECT * FROM mtcars WHERE cyl = 6; -- @query: df_mtcars SELECT * FROM mtcars; ``` Result - Two chunks: - Chunk 1: lines before the tag (untagged SQL). - Chunk 2: from the tag line to the end. This preserves any pre-tag SQL as a separate chunk. **Example 4 - Script with multiple tags** ```sql -- @exec: drop_cyl_6 DROP TABLE IF EXISTS cyl_6; -- @exec: prep_cyl_6 CREATE TABLE cyl_6 AS SELECT * FROM mtcars WHERE cyl = 6; -- @query: df_mtcars SELECT * FROM mtcars; -- @query: df_cyl_6 SELECT * FROM cyl_6 ``` - Result - Four chunks, each starting at its respective tag line. - Each chunk is parsed and executed independently in sequence. ```{r, echo=FALSE, include=FALSE} DBI::dbDisconnect(con) ```