Superset
Important Capabilities
Capability | Status | Notes |
---|---|---|
Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion |
Domains | ✅ | Enabled by domain config to assign domain_key |
Table-Level Lineage | ✅ | Supported by default |
This plugin extracts the following:
- Charts, dashboards, and associated metadata
See the documentation for Superset's /security/login endpoint at https://superset.apache.org/docs/rest-api for more details on Superset's login API.
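As a rough illustration of what that login call involves, the sketch below builds the JSON body for a POST to /api/v1/security/login from the same username, password, and provider values used in a recipe. The helper names build_login_payload and fetch_access_token are hypothetical, and urllib is used only to keep the example dependency-free; this is not necessarily how the connector itself authenticates.

```python
import json
import urllib.request


def build_login_payload(username: str, password: str, provider: str = "db") -> dict:
    """Build the JSON body for Superset's /api/v1/security/login endpoint."""
    return {
        "username": username,
        "password": password,
        "provider": provider,  # e.g. "db" or "ldap"
        "refresh": True,  # also ask for a refresh token
    }


def fetch_access_token(connect_uri: str, payload: dict) -> str:
    """POST the payload to a running Superset instance and return the access token."""
    request = urllib.request.Request(
        url=f"{connect_uri.rstrip('/')}/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

Calling fetch_access_token("http://localhost:8088", build_login_payload("user", "pass", "ldap")) needs a reachable Superset instance; build_login_payload on its own has no side effects.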
CLI based Ingestion
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: superset
  config:
    # Coordinates
    connect_uri: http://localhost:8088
    # Credentials
    username: user
    password: pass
    provider: ldap
sink:
  # sink configs
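The same recipe can also be expressed as a Python dictionary and run programmatically. The following is a minimal sketch, assuming the DataHub ingestion package is installed with the Superset plugin; the function names are hypothetical, and the datahub-rest sink with a local server address is an assumption standing in for the sink configuration left unspecified above.

```python
from typing import Any, Dict


def build_superset_recipe(
    connect_uri: str, username: str, password: str, provider: str = "db"
) -> Dict[str, Any]:
    """Return the starter recipe as a dict, mirroring the YAML structure."""
    return {
        "source": {
            "type": "superset",
            "config": {
                "connect_uri": connect_uri,
                "username": username,
                "password": password,
                "provider": provider,
            },
        },
        # Assumption: a datahub-rest sink pointing at a local DataHub instance.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe; requires the DataHub ingestion package to be installed."""
    # Imported lazily so the recipe can be built without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

For example, run_recipe(build_superset_recipe("http://localhost:8088", "user", "pass", "ldap")) would run the starter recipe against the assumed sink.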
Config Details
Note that a dot (.) is used to denote nested fields in the YAML recipe.
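To make the dot notation concrete, the sketch below expands dotted field names into the nested structure they correspond to in a recipe. The function expand_dotted_fields is a hypothetical helper written for illustration, not part of DataHub.

```python
from typing import Any, Dict


def expand_dotted_fields(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b": 1} into {"a": {"b": 1}}, one nesting level per dot."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create the intermediate mapping the first time it is seen.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

For instance, stateful_ingestion.enabled in the table corresponds to an enabled key nested under stateful_ingestion in the source config.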
Field | Type | Description |
---|---|---|
connect_uri | string | Superset host URL. Default: http://localhost:8088 |
database_alias | map(str,string) | Can be used to change the mapping of database names in Superset to what you have in DataHub. Default: {} |
display_uri | string | Optional URL to use in links (if connect_uri is only for ingestion). |
ingest_charts | boolean | Enable to ingest charts. Default: True |
ingest_dashboards | boolean | Enable to ingest dashboards. Default: True |
ingest_datasets | boolean | Enable to ingest datasets. Default: False |
options | object | Default: {} |
password | string | Superset password. |
platform_instance | string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. |
provider | string | Superset provider. Default: db |
username | string | Superset username. |
env | string | The environment that all assets produced by this connector belong to. Default: PROD |
domain | map(str,AllowDenyPattern) | Regex patterns for tables to filter to assign domain_key. Each value is an AllowDenyPattern (a class to store allow/deny regexes). Default: {} |
domain.key.allow | array | List of regex patterns to include in ingestion. Default: ['.*'] |
domain.key.allow.string | string | |
domain.key.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
domain.key.deny | array | List of regex patterns to exclude from ingestion. Default: [] |
domain.key.deny.string | string | |
stateful_ingestion | StatefulStaleMetadataRemovalConfig | Superset Stateful Ingestion Config. |
stateful_ingestion.enabled | boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. |
stateful_ingestion.remove_stale_metadata | boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
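The allow, deny, and ignoreCase options under domain can be read as follows: a name is excluded if any deny pattern matches, and otherwise included if any allow pattern matches. The sketch below illustrates that reading under stated assumptions: patterns are matched from the start of the name with re.match, and deny takes precedence over allow. The functions is_allowed and domains_for and the domain key "marketing" are hypothetical and written for illustration; the connector's own matching may differ in detail.

```python
import re
from typing import Dict, List, Optional


def is_allowed(
    name: str,
    allow: Optional[List[str]] = None,
    deny: Optional[List[str]] = None,
    ignore_case: bool = True,
) -> bool:
    """Apply allow/deny regex lists using the defaults from the table."""
    allow = [".*"] if allow is None else allow
    deny = [] if deny is None else deny
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: a deny match always wins over an allow match.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


def domains_for(name: str, domain_config: Dict[str, dict]) -> List[str]:
    """Return the domain keys whose patterns accept the given name."""
    return [
        key
        for key, patterns in domain_config.items()
        if is_allowed(
            name,
            patterns.get("allow"),
            patterns.get("deny"),
            patterns.get("ignoreCase", True),
        )
    ]
```

With a config such as {"marketing": {"allow": ["marketing_.*"], "deny": [".*_tmp"]}}, a table named marketing_sales would be assigned to the marketing key, while marketing_tmp would not.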
The JSONSchema for this configuration is inlined below.
{
  "title": "SupersetConfig",
  "description": "Base configuration class for stateful ingestion for source configs to inherit from.",
  "type": "object",
  "properties": {
    "platform_instance": {
      "title": "Platform Instance",
      "description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.",
      "type": "string"
    },
    "env": {
      "title": "Env",
      "description": "The environment that all assets produced by this connector belong to",
      "default": "PROD",
      "type": "string"
    },
    "stateful_ingestion": {
      "title": "Stateful Ingestion",
      "description": "Superset Stateful Ingestion Config.",
      "allOf": [
        {
          "$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
        }
      ]
    },
    "connect_uri": {
      "title": "Connect Uri",
      "description": "Superset host URL.",
      "default": "http://localhost:8088",
      "type": "string"
    },
    "display_uri": {
      "title": "Display Uri",
      "description": "optional URL to use in links (if `connect_uri` is only for ingestion)",
      "type": "string"
    },
    "domain": {
      "title": "Domain",
      "description": "regex patterns for tables to filter to assign domain_key. ",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "$ref": "#/definitions/AllowDenyPattern"
      }
    },
    "username": {
      "title": "Username",
      "description": "Superset username.",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "Superset password.",
      "type": "string"
    },
    "ingest_dashboards": {
      "title": "Ingest Dashboards",
      "description": "Enable to ingest dashboards.",
      "default": true,
      "type": "boolean"
    },
    "ingest_charts": {
      "title": "Ingest Charts",
      "description": "Enable to ingest charts.",
      "default": true,
      "type": "boolean"
    },
    "ingest_datasets": {
      "title": "Ingest Datasets",
      "description": "Enable to ingest datasets.",
      "default": false,
      "type": "boolean"
    },
    "provider": {
      "title": "Provider",
      "description": "Superset provider.",
      "default": "db",
      "type": "string"
    },
    "options": {
      "title": "Options",
      "default": {},
      "type": "object"
    },
    "database_alias": {
      "title": "Database Alias",
      "description": "Can be used to change mapping for database names in superset to what you have in datahub",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    }
  },
  "definitions": {
    "DynamicTypedStateProviderConfig": {
      "title": "DynamicTypedStateProviderConfig",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "description": "The type of the state provider to use. For DataHub use `datahub`",
          "type": "string"
        },
        "config": {
          "title": "Config",
          "description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
          "default": {},
          "type": "object"
        }
      },
      "required": [
        "type"
      ],
      "additionalProperties": false
    },
    "StatefulStaleMetadataRemovalConfig": {
      "title": "StatefulStaleMetadataRemovalConfig",
      "description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
      "type": "object",
      "properties": {
        "enabled": {
          "title": "Enabled",
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "default": false,
          "type": "boolean"
        },
        "remove_stale_metadata": {
          "title": "Remove Stale Metadata",
          "description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    },
    "AllowDenyPattern": {
      "title": "AllowDenyPattern",
      "description": "A class to store allow deny regexes",
      "type": "object",
      "properties": {
        "allow": {
          "title": "Allow",
          "description": "List of regex patterns to include in ingestion",
          "default": [
            ".*"
          ],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "deny": {
          "title": "Deny",
          "description": "List of regex patterns to exclude from ingestion.",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "ignoreCase": {
          "title": "Ignorecase",
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    }
  }
}
If you were using database_alias in one of your other ingestion recipes to rename your databases based on business needs, you can rename them in Superset as well:
source:
  type: superset
  config:
    # Coordinates
    connect_uri: http://localhost:8088
    # Credentials
    username: user
    password: pass
    provider: ldap
    database_alias:
      example_name_1: business_name_1
      example_name_2: business_name_2
sink:
  # sink configs
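The effect of database_alias can be thought of as a simple lookup: a database name reported by Superset is replaced by its alias if one is configured, and left unchanged otherwise. The sketch below shows that behavior; resolve_database_name is a hypothetical helper, and treating unmapped names as pass-through is an assumption rather than something stated above.

```python
from typing import Dict


def resolve_database_name(superset_name: str, database_alias: Dict[str, str]) -> str:
    """Return the alias for a Superset database name, or the name itself if unmapped."""
    # Assumption: names without an alias are passed through unchanged.
    return database_alias.get(superset_name, superset_name)
```

With the recipe above, example_name_1 would resolve to business_name_1, while a database with no entry in database_alias would keep its Superset name.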
Code Coordinates
- Class Name: datahub.ingestion.source.superset.SupersetSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Superset, feel free to ping us on our Slack.