Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Introduction

DAGraph is a Data Analytics Graph or a Directed Acyclic Graph.

Caveat

As of the time of writing this book, DAGraph is still evolving quickly and still in early-access.

Some features may change, or be removed. Some features may move to the PRO tier (paid).

DAG - Directed Acyclic Graph

Directed acyclic graph. The data flows from the input to the output, there are no cycles.

The app will prevent you from creating cycles.

General UI overview

The window is split in 4 main parts:

  • the menu (top bar)
  • the tool bar (below the menu)
  • the document / canvas view
  • the panels

Panels

Panels can be reorganised by drag-and-dropping the tabs.

Selection

A selection can contain nodes and edges you can operate on, such as moving, deleting or connecting.

A selection can have a primary selection when multiple items are in the selection. You can differentiate them by the hue difference.

On mobile devices: to region-select, long-press on the canvas until you see a blue rectangle appear and shrink. The app runs on mobile devices (phones and tablets), but a bigger screen and a pointing device (mouse or trackpad) are desirable.

Inspector

The inspector will show details of the primary selection.

Chapter 1 - Arithmetic nodes

A node can be a value or an operation (like an arithmetic function or a SQL script).

A scalar is a basic value, as opposed to a table.

Values can be:

  • scalar:
    • just a singular value: a decimal number (“float”)
    • a string (like “hello”) these have more limited use, and are mostly useful as SQL node input (see later)
  • tables: a collection of columns, with a type per column

Transformations can be:

  • scalar operations such as:
    • function1’s: unary function like negation, absolute value
    • function2’s: binary function like addition, multiplication
  • table transformation,

Node name

A node has a name that is unique across the document - no 2 nodes can have the same name.

Node anatomy

To move a node on the canvas you can drag the handle which is on the top-left corner.

The type indicator show shows what type of node it is:

  • N is for number (floats)
  • T is for text (UTF-8)

Node label editing

A node’s label can be edited to rename the node.

Node names are unique, there cannot be 2 nodes with the same name within a document.

If you try to rename a node with an existing name, the app will warn you by highlighting the name with a yellow background.

Scalar nodes

There are 2 types of scalar nodes:

  • float: contains a float value: used in basic arithmetic and as input to SQL nodes.
  • text: contains a string (mostly useful when connecting to a SQL node)

Moving nodes

To move nodes, use the handle of the node located in the top left corner of the widget. Function nodes can be dragged from anywhere.

Connecting nodes

Drag from the arrow pointing down (↓) in the node to the function node.

A function without all its inputs cannot be computed, hence the “error” output in the result node; once the function node has all its inputs the output will compute.

Disconnecting and reconnecting nodes

Nodes can be disconnect by deleting edges. To delete an edge, select it and press back space or the “delete” button in the toolbar.

Nodes can be reconnected, nodes can be connected multiple times.

Changing function nodes

Select a function node and click on another function in the panel. The result will update accordingly.

Chaining functions

To apply a function to a result node, select the result node and click on the button of the function.

Make sure the selection only contains 1 element for a function1.

“Function 1” and “Function 2” are respectively shorthand for “function of arity 1” and “function of arity 2”, a fancy way of saying how many arguments the function takes.

Chapter 2 - Table nodes

Table node and SQL nodes

Input table

Input table are for manual input tables. They are meant to input small tables by hand.

Table from file

This is the preferred way to do calculation on large tables. Parquet and CSV files are supported.

Parquet files are the preferred way to read data from files.

CSV tables are also supported, as a convenience.

Files are copied into the browser’s file system (OPFS).

SQL nodes

DAGraph uses DataFusion SQL, which is pretty close to Postgres SQL.

As it’s an OLAP tool, DAGraph SQL really only supports DQL (select queries).

Schema definition is done either by creating input tables or importing data from files.

DAGraph is functional, there are no side effect, SQL queries cannot do INSERT, UPDATE or DELETE.

Parameterized SQL queries

Queries with positional parameters, such as:

select
    $1 as number,
    $2 as text,
    $3 as inline

will be parsed and the widget will have “ports”. Ports can be connected, or values can be input.

Connecting scalar nodes to SQL params

Drag and drop

Drag from the scalar node output to the params input.

Typing the input node name

You can also type in the node name, if you select “node”.

Removing edges from scalar nodes to SQL params

Select the edge and delete the edge.

Table output

SQL nodes have output table nodes.

When a table node is selected, the inspector shows information about the table, such as its dimensions and schema. You can export the table as a Parquet file (preferred) or CSV.

Chapter 3 - Export

You can export the document from the workspace. You should export your document regularly, the workspace, as its name suggest is mainly for temporary storage while doing the work. In the browser the app uses OPFS, which comes with some limitation, and maybe purged by the system to reclaim space. The constraints are browser-dependent. To avoid any loss of data, export your documents regularly. Future version of DAGraph will be able to connect directly to version control system.

Archive

.dagraph files are actually zip bundles that contain:

  • sql scripts (one file per node)
  • graph description
  • data files (optional)

It’s possible to see inside .dagraph files: on some systems, you just need to rename the file (e.g. foo.dagraph to foo.zip, or foo.dagraph.zip) and then unzip the archive.

The format is evolving, but the principle will remain the same.

Flat file

This format is deprecated and should not be used.

Misc

DAGraph website: https://dagraph.com