Mode
Important Capabilities
Capability | Status | Notes |
---|---|---|
Asset Containers | ✅ | Enabled by default |
Column-level Lineage | ✅ | Supported by default |
Descriptions | ✅ | Enabled by default |
Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion.remove_stale_metadata |
Extract Ownership | ✅ | Enabled by default |
Platform Instance | ✅ | Enabled by default |
Table-Level Lineage | ✅ | Supported by default |
This plugin extracts Charts, Reports, and associated metadata from a given Mode workspace. This plugin is in beta and has only been tested on a PostgreSQL database.
Report
The /api/{account}/reports/{report} endpoint is used to retrieve the following report information.
- Title and description
- Last edited by
- Owner
- Link to the Report in Mode for exploration
- Associated charts within the report
Chart
The /api/{workspace}/reports/{report}/queries/{query}/charts endpoint is used to retrieve the following information.
- Title and description
- Last edited by
- Owner
- Link to the chart in Mode
- Datasource and lineage information from Report queries.
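As a rough illustration of the two endpoints above, the sketch below builds the request URLs and an HTTP Basic authorization header from the recipe's connect_uri, workspace, token, and password values. The helper names and the sample report and query identifiers are hypothetical, and the use of Basic authentication with the Key ID and Secret is an assumption; this is not the connector's actual code.

```python
import base64


def report_url(connect_uri: str, workspace: str, report: str) -> str:
    """Build the report endpoint URL (hypothetical helper)."""
    return f"{connect_uri.rstrip('/')}/api/{workspace}/reports/{report}"


def charts_url(connect_uri: str, workspace: str, report: str, query: str) -> str:
    """Build the charts endpoint URL for one query of a report (hypothetical helper)."""
    return f"{report_url(connect_uri, workspace, report)}/queries/{query}/charts"


def basic_auth_header(token: str, password: str) -> dict:
    """Encode the API Key ID and Secret as an HTTP Basic Authorization header.

    Assumption: the Mode API accepts Basic auth in the form token:password.
    """
    raw = f"{token}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    base = "https://app.mode.com"
    # "abc123" and "q1" are made-up identifiers used only for illustration.
    print(report_url(base, "datahub", "abc123"))
    print(charts_url(base, "datahub", "abc123", "q1"))
```

Keeping URL construction separate from the HTTP call makes it easy to check the paths without contacting Mode.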
The following properties for a chart are ingested into DataHub.
Chart Information
Name | Description |
---|---|
Filters | Filters applied to the chart |
Metrics | Fields or columns used for aggregation |
X | Fields used in X-axis |
X2 | Fields used in second X-axis |
Y | Fields used in Y-axis |
Y2 | Fields used in second Y-axis |
Table Information
Name | Description |
---|---|
Columns | Column names in a table |
Filters | Filters applied to the table |
Pivot Table Information
Name | Description |
---|---|
Columns | Column names in a table |
Filters | Filters applied to the table |
Metrics | Fields or columns used for aggregation |
Rows | Row names in a table |
CLI based Ingestion
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: mode
  config:
    # Coordinates
    connect_uri: http://app.mode.com
    # Credentials
    token: token
    password: pass
    # Options
    workspace: "datahub"
    default_schema: "public"
    owner_username_instead_of_email: False
    api_options:
      retry_backoff_multiplier: 2
      max_retry_interval: 10
      max_attempts: 5
sink:
  # sink configs
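To give a feel for what the api_options values in the recipe above imply, the following sketch computes a capped exponential backoff schedule. The function name and the exact formula (the multiplier times 2 raised to the retry index, capped at max_retry_interval) are assumptions made for illustration; the connector's real retry logic may differ.

```python
from typing import List


def backoff_schedule(
    retry_backoff_multiplier: float = 2,
    max_retry_interval: float = 10,
    max_attempts: int = 5,
) -> List[float]:
    """Return the wait before each retry (assumed formula, not the connector's code).

    With max_attempts total attempts there are max_attempts - 1 waits between them.
    """
    waits = []
    for retry_index in range(max_attempts - 1):
        # Exponential growth: multiplier * 2^0, multiplier * 2^1, ...
        wait = retry_backoff_multiplier * (2 ** retry_index)
        # Never wait longer than the configured cap.
        waits.append(min(wait, max_retry_interval))
    return waits


if __name__ == "__main__":
    print(backoff_schedule())
```

Under these assumptions, the recipe's values give waits of 2, 4, 8, and 10 time units between the five attempts, so raising max_retry_interval only matters once the exponential term exceeds the cap.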
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
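For instance, a dotted name such as api_options.max_attempts refers to the max_attempts key nested under api_options. The small sketch below resolves such a name against a parsed recipe config; get_nested is a hypothetical helper written for this illustration, not part of DataHub.

```python
from typing import Any, Dict


def get_nested(config: Dict[str, Any], dotted_name: str) -> Any:
    """Look up a dotted field name such as 'api_options.max_attempts' in a nested config."""
    value: Any = config
    for key in dotted_name.split("."):
        value = value[key]  # raises KeyError if the field is absent
    return value


# A parsed fragment of the `config` block from a recipe (illustrative values).
config = {
    "workspace": "datahub",
    "api_options": {"max_attempts": 5, "max_retry_interval": 10},
}

if __name__ == "__main__":
    print(get_nested(config, "api_options.max_attempts"))
```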
Field | Description |
---|---|
password ✅ string(password) | When creating workspace API key this is the 'Secret'. |
token ✅ string | When creating workspace API key this is the 'Key ID'. |
workspace ✅ string | The Mode workspace name. Find it in Settings > Workspace > Details. |
connect_uri string | Mode host URL. Default: https://app.mode.com |
default_schema string | Default schema to use when a schema is not provided in an SQL query. Default: public |
exclude_restricted boolean | Exclude restricted collections. Default: False |
ingest_embed_url boolean | Whether to ingest the embed URL for Reports. Default: True |
owner_username_instead_of_email boolean | Use the username instead of the email for the owner URN. Default: True |
platform_instance_map map(str,string) | A holder for platform -> platform_instance mappings to generate correct dataset urns |
tag_measures_and_dimensions boolean | Tag measures and dimensions in the schema. Default: True |
env string | The environment that all assets produced by this connector belong to. Default: PROD |
api_options ModeAPIConfig | Retry/wait settings for the Mode API to avoid the "Too many Requests" error. See Mode API Options below. Default: {'retry_backoff_multiplier': 2, 'max_retry_interva... |
api_options.max_attempts integer | Maximum number of attempts to retry before failing. Default: 5 |
api_options.max_retry_interval One of integer, number | Maximum interval to wait when retrying. Default: 10 |
api_options.retry_backoff_multiplier One of integer, number | Multiplier for exponential backoff when waiting to retry. Default: 2 |
api_options.timeout integer | Timeout setting: how long to wait for the Mode REST API to send data before giving up. Default: 40 |
space_pattern AllowDenyPattern | Regex patterns for Mode spaces to filter in ingestion (spaces named 'Personal' are filtered by default). Specify a regex that matches only the space name; for example, to ingest only the space named analytics, use the regex 'analytics'. Default: {'allow': ['.*'], 'deny': ['^Personal$'], 'ignoreC... |
space_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
space_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] |
space_pattern.allow.string string | |
space_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
space_pattern.deny.string string | |
stateful_ingestion StatefulStaleMetadataRemovalConfig | Base specialized config for Stateful Ingestion with stale metadata removal capability. |
stateful_ingestion.enabled boolean | Whether or not to enable stateful ingestion. Enabled automatically if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified; otherwise disabled. Default: False |
stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
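The space_pattern fields in the table above can be pictured with the following sketch of allow/deny filtering. The function name is hypothetical, and the matching semantics (deny takes precedence, patterns are matched from the start of the name as with re.match) are assumptions rather than a statement of how the connector behaves.

```python
import re
from typing import Sequence


def space_allowed(
    name: str,
    allow: Sequence[str] = (".*",),
    deny: Sequence[str] = ("^Personal$",),
    ignore_case: bool = True,
) -> bool:
    """Return True if a space name passes the allow/deny patterns.

    Assumption: deny patterns take precedence, and each pattern is matched
    from the start of the name (re.match), not anywhere inside it.
    """
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


if __name__ == "__main__":
    for space in ("analytics", "Personal", "marketing"):
        print(space, space_allowed(space, allow=["analytics"]))
```

Under prefix matching, the pattern 'analytics' would also admit a space named analytics_eu; anchoring it as '^analytics$' restricts the match to that exact name.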
The JSONSchema for this configuration is inlined below.
{
"title": "ModeConfig",
"description": "Base configuration class for stateful ingestion for source configs to inherit from.",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"platform_instance_map": {
"title": "Platform Instance Map",
"description": "A holder for platform -> platform_instance mappings to generate correct dataset urns",
"type": "object",
"additionalProperties": {
"type": "string"
}
},
"stateful_ingestion": {
"$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
},
"connect_uri": {
"title": "Connect Uri",
"description": "Mode host URL.",
"default": "https://app.mode.com",
"type": "string"
},
"token": {
"title": "Token",
"description": "When creating workspace API key this is the 'Key ID'.",
"type": "string"
},
"password": {
"title": "Password",
"description": "When creating workspace API key this is the 'Secret'.",
"type": "string",
"writeOnly": true,
"format": "password"
},
"exclude_restricted": {
"title": "Exclude Restricted",
"description": "Exclude restricted collections",
"default": false,
"type": "boolean"
},
"workspace": {
"title": "Workspace",
"description": "The Mode workspace name. Find it in Settings > Workspace > Details.",
"type": "string"
},
"default_schema": {
"title": "Default Schema",
"description": "Default schema to use when schema is not provided in an SQL query",
"default": "public",
"type": "string"
},
"space_pattern": {
"title": "Space Pattern",
"description": "Regex patterns for mode spaces to filter in ingestion (Spaces named as 'Personal' are filtered by default.) Specify regex to only match the space name. e.g. to only ingest space named analytics, use the regex 'analytics'",
"default": {
"allow": [
".*"
],
"deny": [
"^Personal$"
],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"owner_username_instead_of_email": {
"title": "Owner Username Instead Of Email",
"description": "Use username for owner URN instead of Email",
"default": true,
"type": "boolean"
},
"api_options": {
"title": "Api Options",
"description": "Retry/Wait settings for Mode API to avoid \"Too many Requests\" error. See Mode API Options below",
"default": {
"retry_backoff_multiplier": 2,
"max_retry_interval": 10,
"max_attempts": 5,
"timeout": 40
},
"allOf": [
{
"$ref": "#/definitions/ModeAPIConfig"
}
]
},
"ingest_embed_url": {
"title": "Ingest Embed Url",
"description": "Whether to Ingest embed URL for Reports",
"default": true,
"type": "boolean"
},
"tag_measures_and_dimensions": {
"title": "Tag Measures And Dimensions",
"description": "Tag measures and dimensions in the schema",
"default": true,
"type": "boolean"
}
},
"required": [
"token",
"password",
"workspace"
],
"additionalProperties": false,
"definitions": {
"DynamicTypedStateProviderConfig": {
"title": "DynamicTypedStateProviderConfig",
"type": "object",
"properties": {
"type": {
"title": "Type",
"description": "The type of the state provider to use. For DataHub use `datahub`",
"type": "string"
},
"config": {
"title": "Config",
"description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
"default": {},
"type": "object"
}
},
"required": [
"type"
],
"additionalProperties": false
},
"StatefulStaleMetadataRemovalConfig": {
"title": "StatefulStaleMetadataRemovalConfig",
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"type": "object",
"properties": {
"enabled": {
"title": "Enabled",
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"default": false,
"type": "boolean"
},
"remove_stale_metadata": {
"title": "Remove Stale Metadata",
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"ModeAPIConfig": {
"title": "ModeAPIConfig",
"type": "object",
"properties": {
"retry_backoff_multiplier": {
"title": "Retry Backoff Multiplier",
"description": "Multiplier for exponential backoff when waiting to retry",
"default": 2,
"anyOf": [
{
"type": "integer"
},
{
"type": "number"
}
]
},
"max_retry_interval": {
"title": "Max Retry Interval",
"description": "Maximum interval to wait when retrying",
"default": 10,
"anyOf": [
{
"type": "integer"
},
{
"type": "number"
}
]
},
"max_attempts": {
"title": "Max Attempts",
"description": "Maximum number of attempts to retry before failing",
"default": 5,
"type": "integer"
},
"timeout": {
"title": "Timeout",
"description": "Timeout setting, how long to wait for the Mode rest api to send data before giving up",
"default": 40,
"type": "integer"
}
},
"additionalProperties": false
}
}
}
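Since the schema above marks token, password, and workspace as required and sets additionalProperties to false at the top level, a config can be sanity-checked before running ingestion. The sketch below is a hand-written check under those two rules only; check_config is a hypothetical helper and does not replace DataHub's own validation.

```python
# Field names copied from the "required" and "properties" sections of the schema.
REQUIRED_FIELDS = {"token", "password", "workspace"}
KNOWN_FIELDS = REQUIRED_FIELDS | {
    "env",
    "platform_instance_map",
    "stateful_ingestion",
    "connect_uri",
    "exclude_restricted",
    "default_schema",
    "space_pattern",
    "owner_username_instead_of_email",
    "api_options",
    "ingest_embed_url",
    "tag_measures_and_dimensions",
}


def check_config(config: dict) -> list:
    """Return a list of problems found at the top level of a Mode source config."""
    present = set(config)
    problems = [
        f"missing required field: {name}" for name in sorted(REQUIRED_FIELDS - present)
    ]
    problems += [
        f"unknown field: {name}" for name in sorted(present - KNOWN_FIELDS)
    ]
    return problems


if __name__ == "__main__":
    # "workspce" is a deliberate typo to show the unknown-field report.
    print(check_config({"token": "token", "workspce": "datahub"}))
```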
See Mode's Authentication documentation for how to generate the token and password.
Code Coordinates
- Class Name: datahub.ingestion.source.mode.ModeSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Mode, feel free to ping us on our Slack.