Configure the OPC UA extractor
To configure the OPC UA extractor, you must edit the configuration file. The file is in YAML format, and the sample configuration file contains all valid options with default values.
You can leave many fields empty to let the extractor use the default values. The configuration file separates the settings by component, and you can remove an entire component to disable it or use the default values.
Sample configuration files
In the extractor installation folder, the `/config` subfolder contains sample complete and minimal configuration files. The values wrapped in `${}` are replaced with environment variables with that name. For example, `${COGNITE_PROJECT}` will be replaced with the value of the environment variable called `COGNITE_PROJECT`.
The configuration file also contains the global parameter `version`, which holds the version of the configuration schema used in the configuration file. This document describes version 1 of the configuration schema.
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
Minimal YAML configuration file

```yaml
version: 1

source:
  # The URL of the OPC-UA server to connect to
  endpoint-url: 'opc.tcp://localhost:4840'

cognite:
  # The project to connect to in the API, uses the environment variable COGNITE_PROJECT.
  project: '${COGNITE_PROJECT}'
  # Cognite authentication
  # This is for Microsoft as IdP. To use a different provider,
  # set implementation: Basic, and use token-url instead of tenant.
  # See the example config for the full list of options.
  idp-authentication:
    # Directory tenant
    tenant: ${COGNITE_TENANT_ID}
    # Application Id
    client-id: ${COGNITE_CLIENT_ID}
    # Client secret
    secret: ${COGNITE_CLIENT_SECRET}
    # List of resource scopes, ex:
    # scopes:
    #   - scopeA
    #   - scopeB
    scopes:
      - ${COGNITE_SCOPE}

extraction:
  # Global prefix for externalId in destinations. Should be unique to prevent name conflicts.
  id-prefix: 'gp:'
  # Map OPC-UA namespaces to prefixes in CDF. If not mapped, the full namespace URI is used.
  # Saves space compared to using the full URL. Using the ns index is not safe as the order can change on the server.
  # It is recommended to set this before extracting the node hierarchy.
  # For example:
  # namespace-map:
  #   "urn:cognite:net:server": cns
  #   "urn:freeopcua:python:server": fps
  #   "http://examples.freeopcua.github.io": efg
```
ProtoNodeId
You can provide an OPC UA `nodeId` in several places in the configuration file with a YAML object with the following structure:

```yaml
node:
  # The identifier part of the node ID, for example i=123 or s=MyNode
  node-id: i=123
  # The full namespace URI of the node
  namespace-uri: opc.tcp://test.test/
```

To find the node IDs, we recommend using the UaExpert tool. Locate the data type/event type/node in the hierarchy, then find the node ID on the right side under Attribute > NodeId. Find the namespace URI by matching the NamespaceIndex on the right to the list on the left.
If either part is left empty, it's converted to a different node ID based on context. This happens automatically for events if you use the configuration tool released with version 1.1. If a mapping is specified in namespace-map, you can use the mapped value in place of namespace-uri.
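To illustrate, a ProtoNodeId set as the browse root could look like the following sketch. The values are placeholders: `i=85` is the well-known identifier of the Objects folder in the base OPC UA namespace, and your own nodes will use different identifiers and URIs.

```yaml
extraction:
  root-node:
    # Identifier part of the node ID
    node-id: i=85
    # Full namespace URI of the node. If extraction.namespace-map
    # maps this URI, the mapped value can be used here instead.
    namespace-uri: http://opcfoundation.org/UA/
```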
Timestamps and intervals
In most places where time intervals are required, you can use a CDF-like syntax of `[N][timeunit]`, for example, `10m` for 10 minutes or `1h` for 1 hour. `timeunit` is one of `d`, `h`, `m`, `s`, or `ms`. You can also use a cron expression where this makes sense.

For history start and end times, you can use a similar syntax: `[N][timeunit]` and `[N][timeunit]-ago`. `1d-ago` means 1 day in the past from the time history starts, and `1h` means 1 hour in the future. For instance, you can use this syntax to configure the extractor to read only recent history.
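For example, the interval syntax can be combined in the history and extraction sections described below. The values here are illustrative only:

```yaml
history:
  # Only read the last two weeks of history
  start-time: 14d-ago
  # Restart history reads every six hours
  restart-period: 6h
extraction:
  # Push to destinations once per second
  data-push-delay: 1s
```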
Source
This part of the configuration file concerns the extraction from the OPC UA server.
Parameter | Description |
---|---|
endpoint-url | The URL of the OPC UA server to connect to. In practice, this is the URL of the discovery server, where multiple levels of security may be provided. The OPC UA extractor attempts to use the highest security possible based on the configuration. Required. |
alt-endpoint-urls | List alternative endpoint URLs the extractor can attempt when connecting to the server. Use this for non-transparent redundancy. See the OPC UA standard part 4, section 6.6.2. We recommend setting force-restart to true . Otherwise, the extractor will reconnect to the same server each time. |
endpoint-details | Details to override default endpoint behavior. This is used to make the client connect directly to an OPC UA endpoint, for example if the server is behind NAT (Network Address Translation), circumventing server discovery. This parameter contains one field: override-endpoint-url, which overrides the URL of the selected endpoint. |
redundancy | Additional configuration options related to redundant servers. The OPC UA extractor supports Cold redundancy, as described in the OPC UA standard part 4, section 6.6.2. See the complete sample configuration file for the available sub-options. |
reverse-connect-url | The local URL used for reverse-connect. This is the URL the server should connect to. You should also specify an endpoint-url . The server is responsible for initiating connections, so it can be placed behind a firewall. Leave empty to use direct connections. |
auto-accept | Set to true to automatically accept connections from servers. If you set this to false and try to connect to a server with higher security than None , the connection fails. A certificate is placed in the rejected certificates folder (by default application_dir/pki/rejected/ ), but you can manually move it to the accepted certificates folder (application_dir/pki/accepted ). A simple solution is to set this to true once on the first connection, then change it to false . |
username/password | Used for server sign-in. Leave username empty to use no authentication. |
x509-certificate | Specifies the configuration for using a signed x509 certificate to connect to the server. See the complete sample configuration file for the available sub-options. |
secure | Try to connect to an endpoint with security above None . |
ignore-certificate-issues | Ignore all suppressible certificate errors on the server certificate. You can use this setting if you receive errors such as Certificate use not allowed. CAUTION: This is potentially a security risk. Bad certificates can open the extractor to man-in-the-middle attacks from the server or similar. If the server security is located elsewhere (it's running locally, over a secure VPN, or similar), it's most likely fairly safe. Some errors aren't suppressible and must be remedied on the server. |
publishing-interval | Sets the interval (n milliseconds) between publishing requests to the server. This limits the maximum frequency of points pushed to CDF but not the maximum frequency of points on the server. In most cases, this can be set to the same as Extraction.DataPushDelay . If you set it to 0 , the server chooses the interval according to the specification. |
force-restart | If true, the OPC UA extractor won't attempt to reconnect using the OPC UA reconnect protocol on a disconnect from the server, but will restart completely. Use this option for servers that don't support reconnecting. |
exit-on-failure | If true , the OPC UA extractor won't automatically restart after a crash, but defer to some external mechanism. |
restart-on-reconnect | If true , the OPC UA extractor will be restarted on reconnect. This may not be required if the server is expected to be static and if it handles reconnects well. Setting this to true lowers restart times. |
keep-alive-interval | Specifies the interval in milliseconds between each keep-alive request to the server. The connection times out if a keep-alive request fails twice (2 * interval + 100ms). This typically happens if the server hangs on a heavy operation and doesn't manage to respond to keep-alive requests or if the server goes down. In the first case, waiting can be a good option. In the second case, it's better to time out quickly. |
node-set-source | Read from NodeSet2 files instead of browsing the OPC UA node hierarchy. This is useful for smaller servers, where the full node hierarchy is defined. In general, it can be used to lower the load on the server if parts of the hierarchy are known beforehand. See the complete sample configuration file for the available sub-options. |
limit-to-server-config | The default value true uses the Server_ServerCapabilities object to limit chunk sizes. Set this to false only if you want to set the limits higher and are certain that the server is reporting the wrong limits. If the real server limits are exceeded, the extractor will typically crash. |
alt-source-background-browse | If true , browses the OPC UA node hierarchy in the background when reading nodes from NodeSet files or from CDF RAW. This setup doesn't reduce the load on the server but can speed up startup. |
browse-chunk | Sets the maximum number of desired results from each call of the Browse service to OPC UA. Most servers have some limits, but the default of 1000 is usually reasonable. The server should also usually limit this on its own. |
browse-nodes-chunk | Sets the maximum number of nodes to browse per browse service call. If set too high, the browse operation may fail. Most servers have an upper limit to the number of operations per service call, and this value may also affect speed. We don't recommend setting this to 1, but it may be necessary for some servers. |
attributes-chunk | Specifies the maximum number of attributes to fetch per operation. If the server fails with a TooManyOperations exception during attribute read, it may help to lower this value. 1000 should be fine for most servers and may even be set higher for higher-spec servers. For very large servers, 1000 will take a long time, and this should be set as high as possible, even if that requires increasing the keep-alive-interval. |
subscription-chunk | Sets the maximum number of new MonitoredItems to create per operation. If the server fails with TooManyOperations , try to lower this value. Unless there are a large number of nodes on the server, 1000 per chunk is generally fine. |
browse-throttling | Configuration object for throttling browses. See the complete sample configuration file for the available sub-options. |
certificate-expiry | Specifies the default certificate expiration in months. You can also replace the certificate with your own by modifying the .xml configuration file. Defaults to 5 years as of v2.5.3. |
retries | Specify the retry policy for requests to the OPC UA server. See the complete sample configuration file for the available sub-options. |
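A sketch of a source section combining some of the options above; the endpoint URL and the values are placeholders to adapt to your server:

```yaml
source:
  # Discovery URL of the server. The extractor picks the most secure endpoint.
  endpoint-url: opc.tcp://myserver:4840
  # Require an endpoint with security above None
  secure: true
  # Accept the server certificate automatically on first connect,
  # then consider setting this back to false.
  auto-accept: true
  # Publishing interval in milliseconds
  publishing-interval: 500
  # Keep-alive request every 5 seconds
  keep-alive-interval: 5000
```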
History
The OPC UA extractor supports reading from data and event history in OPC UA. For data, the Historizing attribute must be set on the nodes to be read. For events, you must specify explicitly the node IDs of the emitters in the configuration.
Parameter | Description |
---|---|
enabled | Set to false to disable history read. This overrides all other history configurations and disables these entirely for both events and data points. |
data | Set to false to disable history for data points. The default value is true . Use this to only enable history for events. |
backfill | Enable backfill, meaning that data is read both backward and forward through history. This lets the extractor start streaming live values without first completing the history read, which is useful if there is a lot of history. If set to false (the default), the behavior is as before version 1.1: data is read from the beginning of history to the end before any live streaming begins. |
require-historizing | Set to true to require Historizing to be set on time series to read history. |
restart-period | Time in seconds to wait between each restart of history. Setting this too low may impact performance. Leave at 0 to disable periodic restarts. The syntax is described in Timestamps and intervals; this option also allows cron expressions. |
data-chunk | Maximum number of results to request per HistoryRead call when reading variables. Generally, this is limited by the server, so it can safely be set to 0 . |
data-nodes-chunk | Maximum number of nodes to query per HistoryRead call when reading variables. If Granularity is set, this is applied afterward. |
event-chunk | Maximum number of results to request per HistoryRead call when reading events. Generally, this is limited by the server, so it can safely be set to 0 . |
event-nodes-chunk | Maximum number of nodes to query per HistoryRead call when reading events. |
granularity | Granularity in seconds for chunking history read operations. Variables with the latest timestamp within the same chunk have their history read together. Reading more variables per operation is more efficient, but if the granularity is set too high, then a large number of duplicates are fetched. This can be inefficient for very large granularities. The best choice for this value is a few times the expected update frequency of your variables. The syntax is described in Timestamps and intervals. |
start-time | Earliest timestamp to read from in milliseconds since January 1, 1970. The syntax is described in Timestamps and intervals, -ago can be added to make a timestamp in the past. |
end-time | Timestamp to be considered the end of forward history. Only relevant if max-read-length is set. In milliseconds since 1/1/1970. The default is the current time, if this is 0. The syntax is described in Timestamps and intervals, -ago can be added to make a timestamp in the past. |
ignore-continuation-points | Set to true to attempt to read history without using ContinuationPoints, instead using the Time of events and SourceTimestamp of data points to incrementally change the start time of the request until no points are returned. |
max-read-length | Maximum length of each read of history, in seconds. If this is set greater than zero, history is read in chunks of at most this size until the end. This can potentially take a very long time if end-time is much larger than start-time . The syntax is described in Timestamps and intervals. |
throttling | Configuration object for throttling history reads. See the complete sample configuration file for the available sub-options. |
log-bad-values | Log bad history data points: counts per read at debug level, and each individual data point at verbose level. The default value is true . |
error-threshold | The threshold in percent for a history run to be considered failed. For example, if this is set to 10.0 , the history read will be considered failed if more than 10% of nodes fail to read at some point. Retries still apply. This only applies to nodes that fail even after retries. This is safe in terms of data loss. A node that has failed during history will not receive state updates from streaming. |
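As an example, a history section enabling backfill of recent data might look like this. The values are illustrative, not recommendations:

```yaml
history:
  enabled: true
  # Read backward and forward through history simultaneously
  backfill: true
  # Let the server decide the number of results per read
  data-chunk: 0
  # Read up to 100 variables per HistoryRead call
  data-nodes-chunk: 100
  # Group variables with latest timestamps within 30 seconds of each other
  granularity: 30s
  # Only read the last week of history
  start-time: 7d-ago
```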
Dry run
The dry-run option is on the top level. If this is set to true, the extractor reads from OPC UA but doesn't push anything to CDF. This is useful for debugging the extractor setup.
Cognite - CDF API
Configuration for pushing directly to the CDF API.
Parameter | Description |
---|---|
project | The CDF project. Required. Can be left out if the OPC UA extractor is set to debug mode. |
host | The CDF service URL. |
read-extracted-ranges | Specifies whether to read start/end points on startup, where possible. At least one pusher should be able to do this. Otherwise, the back/frontfill will run for the entire history on every restart. The CDF pusher can't read start/end points for events, so if reading historical events is enabled, another pusher able to do this should be enabled. If the server has a lot of variables, this can be extremely slow, and we recommend using the state-store instead. |
data-set-id | The internal ID of the CDF data set to be used for all new time series, assets, and events. Already created items won't be affected. |
data-set-external-id | The data set to use for new objects, overridden by data-set-id . Requires the capability datasets:read for the given data set. |
nan-replacement | Replacement value for values that are non-finite, for instance NaN, +Infinity, and -Infinity. If this is left empty, these points are ignored. |
metadata-targets | Configuration for targets for metadata, meaning assets, time series metadata, and relationships. |
metadata-targets/clean | Configuration for enabling writing metadata to clean. See the complete sample configuration file for the available sub-options. |
metadata-targets/raw | Configuration for writing metadata to CDF RAW. See the complete sample configuration file for the available sub-options. |
raw-metadata | Configuration for using CDF RAW to store assets and time series metadata. This is deprecated in favor of cognite.metadata-targets. |
raw-node-buffer | Read from CDF instead of OPC UA when starting the extractor to speed up starting on slow servers. This requires extraction.expand-node-ids and extraction.append-internal-values to be set to true . Generally, this would be enabled along with skip-metadata or raw-metadata. Reading from CDF RAW into clean using this is generally not supported. If browse-on-empty is set to true and raw-metadata is configured with the same database and tables, the extractor will read from the server on first startup only, then use CDF RAW for all further reads. With this enabled, rebrowse/updates are generally pointless. |
metadata-mapping | Contains two string/string maps named assets and timeseries. It lets you define mappings between properties in OPC UA and CDF attributes. For example, it's quite common for variables in OPC UA to have an EngineeringUnits field, which ideally should be mapped to unit in CDF. This can be done with timeseries: EngineeringUnits: unit. Valid attributes are name , description , and parentId , and additionally unit for time series. parentId must be the parent external ID of the time series, and it must be an asset mapped by the OPC UA extractor. It may be a string ID or a node ID. |
skip-metadata | If true , assets won't be written to CDF, and only basic time series will be created. This is the same as when raw-metadata is enabled, except that nothing will be pushed to CDF RAW either. This is deprecated in favor of cognite.metadata-targets. |
idp-authentication | Configuration for authentication using a bearer access token. See OAuth 2.0 client credentials flow. Required fields are client-id , tenant , secret , and scopes . min-ttl is an optional minimum time-to-live in seconds for the token; the default value is 30 . The authentication flow is inferred from whether you enter tenant or token-url ; you can only set one. If you set tenant , MSAL is used for authentication. If you set token-url , basic authentication is used. authority is the identity provider endpoint; the default is https://login.microsoftonline.com/ . |
cdf-retries | Configure automatic retries on requests to CDF. See the complete sample configuration file for the available fields. Note that long retry periods can delay the point at which failure-buffering starts, which may be necessary if there is a lot of data. |
cdf-chunking | Configure chunking of data on requests to CDF. Note that some of these values reflect actual limits in the API, and increasing them may cause requests to fail. See https://docs.cognite.com/api/v1/ and the complete sample configuration file for the available fields. |
cdf-throttling | Configure how requests to CDF should be throttled. Each entry is the maximum allowed number of parallel requests to CDF. Fields: timeseries , assets , datapoints , raw , ranges (first/last data point), and events . |
sdk-logging | Configuration for logging using the .NET SDK. This provides additional debug information about requests, showing in detail which requests fail and how long they take. See the complete sample configuration file for the available fields. |
extraction-pipeline | Configure an extraction pipeline manager. The pipeline must be created beforehand. See the complete sample configuration file for the available fields. |
browse-callback | Call a Cognite Function with the number of assets, time series, and relationships created and updated after each browse and rebrowse operation. The function is called with a JSON object describing the result of the operation. Requires the capability functions:WRITE scoped to the function given by external ID or ID, and functions:READ if an external ID is used. This is a YAML object; see the complete sample configuration file for the available fields. |
delete-relationships | If this is set to true , relationships deleted from the source will be hard-deleted in CDF. |
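For instance, a cognite section using a data set together with the metadata-mapping option described above might look like this. The data set external ID is a placeholder:

```yaml
cognite:
  project: '${COGNITE_PROJECT}'
  # Data set applied to new time series, assets, and events
  data-set-external-id: my-data-set
  # Map the OPC UA EngineeringUnits property to the time series unit attribute
  metadata-mapping:
    timeseries:
      EngineeringUnits: unit
```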
Influx
Configuration for pushing to an InfluxDB database. Data points and events will be pushed, but no context or metadata.
Parameter | Description |
---|---|
host | The URL of the InfluxDB server. |
username | The username for connecting to the database. |
password | The password for connecting to the database. |
database | The database to connect to on the server. The database won't be created automatically. |
read-extracted-ranges | Whether to read start/endpoints on startup, where possible. |
read-extracted-event-ranges | Whether to read start/endpoints for events on startup, where possible. |
point-chunk-size | Maximum number of points per push. Try to increase if the pushing seems to be slow. |
non-finite-replacement | Replacement value for values that are non-finite, e.g. NaN, +Infinity and -Infinity. Leave empty to ignore these points. |
MQTT
The MQTT pusher pushes to CDF one-way over MQTT. It requires that the MQTTCDFBridge application is running somewhere with access to CDF.
Parameter | Description |
---|---|
host | The address of the TCP MQTT broker. This needs to be running for the pusher to function. |
port | The port on the TCP MQTT broker. |
username | The MQTT broker username. Leave empty to connect without authentication. |
password | The MQTT broker password. Leave empty to connect without authentication. |
client-id | The MQTT Client ID. This needs to be unique for each broker. |
data-set-id | The internal ID of the CDF data set to be used for all new time series, assets, and events. Already created items won't be affected. |
asset-topic | The topic to use for assets. Needs to match the configuration of MQTTCDFBridge (it does by default). |
ts-topic | The topic to use for time series. |
event-topic | The topic to use for events. |
datapoint-topic | The topic to use for data points. |
raw-topic | The topic to use for raw rows. |
local-state | Set to enable storing a list of created assets/time series in a local database. Requires the StateStorage.Location property to be set. The value of this option is the table name. The default value is empty. Using this with raw state-storage doesn't make sense. |
invalidate-before | Timestamp in ms since epoch to invalidate stored states. Any objects created before this will be replaced the next time the OPC UA extractor is restarted. |
non-finite-replacement | The replacement value for values that are non-finite e.g. NaN, +Infinity and -Infinity, or not between -10^100 and 10^100. If this is left empty, these points are ignored. |
raw-metadata | Configuration for using CDF RAW to store assets and time series metadata. |
raw-metadata/database | The CDF RAW database to store metadata in, required for this feature to be enabled. |
raw-metadata/assets-table | The CDF RAW table to store assets in. If this is set along with database, assets aren't pushed to the asset hierarchy but instead written to RAW. Time series won't be contextualized in this case, but if timeseries-table is set, the asset external ID will be stored there. The assets are pushed as full asset JSON objects with all the data available from extraction. |
raw-metadata/timeseries-table | The CDF RAW table to store time series in. If this is set along with database, time series are pushed with minimum information (isStep , isString , externalId ). Everything else is stored in CDF RAW as full time series JSON objects. |
metadata-mapping | Contains two string/string maps named assets and timeseries. It lets you define mappings between properties in OPC UA and CDF attributes. For example, it's quite common for variables in OPC UA to have an EngineeringUnits field, which ideally should be mapped to unit in CDF. This can be done with timeseries: EngineeringUnits: unit. Valid attributes are name , description , and parentId , and additionally unit for time series. parentId must be the parent externalId of the time series, and it must be an asset mapped by the OPC UA extractor. It may be a string ID directly or a node ID. |
skip-metadata | If true , assets won't be written to CDF, and only basic time series will be created. This is the same as when raw-metadata is enabled, except that nothing will be pushed to CDF RAW either. |
allow-untrusted-certificates | If true , allow untrusted certificates when connecting to the MQTT broker. This is a security risk. We recommend using custom-certificate-authority instead. |
custom-certificate-authority | Path to a custom certificate file for a certificate authority the broker SSL certificate will be verified against. |
Logger
Log entries are one of `Fatal`, `Error`, `Warning`, `Information`, `Debug`, or `Verbose`, in order of decreasing importance. Each level also includes all levels of higher importance.
Parameter | Description |
---|---|
console/level | The level of messages to write to console. If not present, or invalid, logging to console is disabled. One of fatal , error , warning , information , debug , or verbose . |
file/level | The level of messages to write to file. If not present, or invalid, logging to file is disabled. One of fatal , error , warning , information , debug , or verbose . |
file/path | The path to a log file, logs are rotated. |
file/retention-limit | The maximum number of logs to keep in log folder. The oldest are deleted. |
file/rolling-interval | A rolling interval for log files. Either day or hour . The default value is day . |
ua-trace-level | Capture OPC-UA tracing at this level or above. One of fatal , error , warning , information , debug , or verbose . This parameter is optional. |
ua-session-tracing | Log data sent to and received from the OPC UA server. |
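A sketch of a logger section using the options above; the file path is a placeholder:

```yaml
logger:
  console:
    level: information
  file:
    level: debug
    path: logs/opcua-extractor.log
    # Keep at most 31 rotated log files
    retention-limit: 31
    rolling-interval: day
```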
StateStorage
A local LiteDb database or a table in CDF RAW that stores various persistent information between runs. It can be used as a replacement for reading first/last data points from CDF, and also allows storing first/last times for events.
Parameter | Description |
---|---|
location | The path to the .db file used for storage, or the name of the CDF RAW database. |
interval | The time between each time the state store is updated. Use syntax described in Timestamps and intervals. Defaults to 10s . |
database | Which type of database to use. Valid options are None , Raw , LiteDb . |
variable-store | The name of the table or litedb collection to store information about extracted OPC UA variables. |
event-store | The name of the table or litedb collection to store information about extracted events. |
influx-variable-store | The name of the table or litedb collection to store information about variable ranges in influxdb failure buffer. |
influx-event-store | The name of the table or litedb collection to store information about event ranges in influxdb failure buffer. |
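For example, a state store backed by a local LiteDb file could be configured as follows. The table names are illustrative; see the complete sample configuration file for the exact layout:

```yaml
state-storage:
  # Path to the .db file (or the name of a CDF RAW database)
  location: state.db
  database: LiteDb
  # Update the store every 10 seconds
  interval: 10s
  variable-store: variable_states
  event-store: event_states
```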
FailureBuffer
If the connection to a destination goes down, the OPC UA extractor supports buffering data points and events in influxdb or a local file. This is helpful if the connection is unstable.
Parameter | Description |
---|---|
datapoint-path | The path to the binary file where data points are buffered. Leave empty to disable pushing data points to file. Buffering to file is very fast, and is generally hardware bound. |
enabled | Set to true to enable the FailureBuffer for all pushers. |
event-path | The path to the binary file where events are buffered. Leave empty to disable pushing events to file. |
influx | Set to true to enable buffering in influxdb. This requires influxdb to be running. This serves as an alternative to a local file, but should only be used if pushing to influxdb is required for other reasons. |
influx-state-store | Set to true to enable storing the state of the influxdb buffer to a local database. This makes the influxdb buffer persistent even if the OPC UA extractor stops before it's emptied. Requires the StateStorage.Location option to be set. |
max-buffer-size | Sets the maximum size in bytes for the buffer file. If the file exceeds this size, no new data points or events will be written to their respective buffer files, and any further ephemeral data is lost. Note that if both data point and event buffers are enabled, the potential disk usage is twice this number. |
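As an illustration, buffering data points and events to local files could be configured like this. The file paths are placeholders:

```yaml
failure-buffer:
  enabled: true
  # Binary files used while destinations are unavailable
  datapoint-path: buffer/datapoints.bin
  event-path: buffer/events.bin
  # Stop writing when a buffer file reaches roughly 50 MB
  max-buffer-size: 50000000
```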
Metrics
The OPC UA extractor can push some metrics about usage to a Prometheus pushgateway server.
Parameter | Description |
---|---|
server/host | The hostname for a locally hosted Prometheus server, used for scraping. |
server/port | The port used for a locally hosted Prometheus server. |
push-gateways | A list of pushgateway configurations. The OPC UA extractor will periodically push to each of these in turn. |
push-gateways/host | The pushgateway URL root. For example, the host my.prometheus.server and the job myjob give the final endpoint my.prometheus.server/metrics/jobs/myjob . |
push-gateways/job | The job to use in the destination. |
push-gateways/username | The username for the Prometheus target. |
push-gateways/password | The password for the Prometheus target. |
nodes | Use to treat certain OPC UA nodes as metrics. See the complete sample configuration file for the available fields. |
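A sketch of a metrics section pushing to a single pushgateway; the host, job, and credentials are placeholders:

```yaml
metrics:
  server:
    host: localhost
    port: 9000
  push-gateways:
    - host: https://my.prometheus.server
      job: opcua-extractor
      username: prom-user
      password: prom-pass
```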
Extraction
Contains configuration settings for most extraction options, such as mapping, datatypes, and filters.
External ID generation
IDs used in OPC UA are special `nodeId` objects with an identifier and a namespace that need to be converted to a string for destination systems. However, a direct conversion has several problems:
- It will use the namespaceIndex, which isn't necessarily preserved between server restarts.
- The namespace table may be modified, in which case all old nodeIds are invalidated.
- NodeIds aren't unique between OPC UA servers and frequently just count up from 1, which makes reading from multiple OPC UA servers impossible.
- Node identifiers can be duplicated on different namespaces.
The solution is a nodeId on the following form:

`IdPrefix + namespace + identifiertype (i, s, g, etc.) + = + identifier value as string (+ [index in array if applicable])`

For example, the node with nodeId (`SomeId`, `http://my.namespace.url`), using the ID prefix `gp:`, will be mapped to `gp:http://my.namespace.url:i=SomeId`. You can specify a namespace mapping in extraction/namespace-map to, for example, turn this into `gp:mnu:i=SomeId`. If it's an array, it turns into an object with the above ID, plus several time series with IDs like `gp:mnu:i=SomeId[1]`.
Alternatively, you can manually override each `nodeId`.
Parameter | Description |
---|---|
id-prefix | Prefix used when generating externalIds from NodeIds . |
ignore-name-prefix | DEPRECATED, use transformations. List of strings used to filter out prefixes on the DisplayName of nodes during browsing. This means that children of these nodes are also filtered out. |
ignore-name | DEPRECATED, use transformations. List of full DisplayNames to ignore instead of just a prefix. |
data-push-delay | Time between each push to destinations, in ms. The syntax is described in Timestamps and intervals. |
root-node | A single ProtoNodeId (as described above) used as the origin of the browse. An empty ProtoNodeId (no identifier or no namespace) is treated as the Objects folder. Combined with root-nodes, if specified. If neither root-node nor root-nodes is specified, this defaults to the Objects folder. |
root-nodes | A list of ProtoNodeIds to use as root nodes when browsing. These will generally be created as root assets in CDF. If a node set as root node is discovered as a descendant of another root node, it will be ignored, but it's best to avoid this situation entirely. |
node-map | Map from strings, representing externalIds, to ProtoNodeIds . This can be used to override the externalIds, for example to place the hierarchy as children of an asset in CDF. For example, if UaRoot is set to the same value as the RootNode , all the nodes in the tree will be placed as children of the node with externalId UaRoot. |
namespace-map | Used as described above to map namespaces to shortened identifiers. |
data-types | Sub-object containing configuration for how data types and arrays should be handled by the OPC UA extractor. |
data-types/custom-numeric-types | Used to manually set types in OPC UA to be numeric. This can be used to make custom types be treated as numbers, etc. The conversion is done with the C# Convert functionality. If no valid conversion exists, this will fail. |
data-types/ignore-data-types | List of ProtoNodeId (as described above), describing data types on variables to filter out. |
data-types/unknown-as-scalar | Assume variables with non-specific ValueRanks in OPC UA (ScalarOrOneDimensions and Any ) are scalar if they do not have ArrayDimensions set. If such a variable produces an array, only the first element will be mapped to CDF. To properly extract arrays to CDF, ArrayDimensions must be set. |
data-types/max-array-size | Maximum length of arrays to be mapped to destinations. If this is set to 0 , only scalar values are mapped. Each array-type variable in the source system is converted to an object in the destination system, then each entry in the array is added as a child variable of that object. (In CDF this will mean that you get an asset with the externalId corresponding to the original variable, with time series for each entry in the array.) This requires the ArrayDimensions property to be set and be of length 1. |
data-types/allow-string-variables | Set to true to map variables of non-numeric types to strings in destination systems. |
data-types/auto-identify-types | Map out the data type hierarchy before starting. This is useful if there are custom or enum types, and is necessary for enum metadata and for enums-as-strings to work. If set to false , any custom numeric types must be added manually. This causes some extra work on startup. |
data-types/enums-as-strings | If set to false and auto-identify-types is set to true (or there are manually added enums in custom-numeric-types), enums will be mapped to numeric time series, and labels are added as metadata fields. If set to true , enums will be mapped to string time series with values equal to the mapped labels, and labels aren't added to metadata. |
data-types/data-type-metadata | Add a metadata property dataType which contains the name or ID of the OPC UA data type. Built-in types can always be mapped to a name; custom types require auto-identify-types to be set to true . |
data-types/null-as-numeric | Treat null data types as numeric. This can be useful on servers without string variables and faulty data types. |
data-types/expand-node-ids | Add attributes such as NodeId , ParentNodeId , and TypeDefinitionId to nodes in CDF RAW , as full NodeIds encoded reversibly. |
data-types/append-internal-values | Add internal attributes like ValueRank , ArrayDimensions , AccessLevel , and Historizing to nodes in CDF RAW . |
data-types/estimate-array-sizes | If max-array-size is set, the extractor looks for the MaxArraySize property on each node with a one-dimensional ValueRank. If this isn't found, it also tries to read the value and use its current size. ArrayDimensions is still the preferred way to identify array sizes; this option isn't guaranteed to generate reasonable or useful values. |
auto-rebrowse-period | Time in minutes between each automatic re-browse of the node hierarchy. Since only new nodes are pushed to destinations, this is usually quite fast. The syntax is described in Timestamps and intervals; this option also accepts cron expressions. |
enable-audit-discovery | The OPC UA extractor listens to AuditAddNodes and AuditAddReferences events on the server node, then uses the information in these to browse the hierarchy. This is more efficient than browsing periodically, but requires server support for auditing. |
map-variable-children | By default, children of variables are treated as properties. If this is set to true , they can be treated as objects or variables instead. This will cause some variables to be mapped to both time series and assets, to allow time series to have time series children. |
update | Update data in destinations on re-browse or restart. Set auto-rebrowse-period to some value to do this periodically. Consists of two objects, objects and variables, controlling updates of assets and time series, respectively. For each, name, description, context, and metadata can be configured separately. context refers to the structure of the node graph in OPC UA (assetId and parentId in CDF). metadata refers to any information obtained from OPC UA properties (metadata in CDF). Enabling any of these will increase the startup and re-browse time of the OPC UA extractor; enabling metadata will increase it the most. |
relationships | Map OPC UA non-hierarchical references to relationships in CDF. The generated relationships will have external IDs of the form [prefix][reference type name (or inverse-name)];[namespace source][id source];[namespace target][id target] . Only relationships between mapped nodes will be added. This may be relevant if the server contains functional relationships, like connected components, a non-hierarchical reference based system for location, etc. |
relationships/enabled | Enable mapping non-hierarchical relationships to CDF. This is also required for any kind of relationship mapping to occur at all. |
relationships/hierarchical | Map hierarchical references to relationships in CDF. |
relationships/inverse-hierarchical | Create inverse relationships for each hierarchical reference. For efficiency these are inferred, not read. |
node-types | Config related to mapping object- and variable-types to destinations. |
node-types/metadata | Add the TypeDefinition as a metadata field to all nodes. |
node-types/as-nodes | Allow discovered types to be treated as nodes and mapped to CDF assets. Requires these to be inside the browsed hierarchy; one solution is to specify the Types folder as a root node. |
transformations | A list of transformations to be applied to the source nodes before pushing. The possible transformations are:
|
rebrowse-triggers | Configure the extractor to trigger a rebrowse of the server when there are changes to specific namespace metadata nodes. Options:
|
deletes | Configuration for soft deletes. When this is enabled, all read nodes are written to a state store after browsing. Nodes that are missing on subsequent browses are marked as deleted in CDF with a configurable marker. A notable exception is relationships in CDF, which have no metadata to hold the marker; these are hard-deleted if cognite.delete-relationships is enabled. Options:
|
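As a sketch of how several of the options above combine, the extraction section below shows one plausible configuration. The node ID, namespace URI, and other values are illustrative placeholders, not defaults; adjust them to your server.

```yaml
extraction:
  # Global externalId prefix
  id-prefix: 'gp:'
  # Placeholder root node; namespace-uri and node-id are example values
  root-nodes:
    - namespace-uri: 'urn:example:namespace'
      node-id: 'i=85'
  data-types:
    # Map non-numeric variables to string time series
    allow-string-variables: true
    # Map arrays with up to 4 elements; requires ArrayDimensions on the server
    max-array-size: 4
    # Map the data type hierarchy on startup, for custom and enum types
    auto-identify-types: true
  # Re-browse the node hierarchy every 10 minutes
  auto-rebrowse-period: 10m
  relationships:
    # Map non-hierarchical references to CDF relationships
    enabled: true
    hierarchical: false
```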
Subscriptions
A few options for subscriptions to events and data points. Subscriptions in OPC UA consist of Subscription objects on the server, which contain a list of MonitoredItems. By default, the extractor produces a maximum of four subscriptions:
- DataChangeListener - handles data point subscriptions.
- EventListener - handles event subscriptions.
- AuditListener - handles audit events.
- NodeMetrics - handles subscriptions used as metrics.
Each of these can contain a number of MonitoredItems.
Parameter | Description |
---|---|
data-points | The default value is true . Enables subscriptions on data points. |
events | The default value is true . Enables subscriptions for events. |
data-change-filter | Modify the DataChangeFilter used for data point subscriptions. See OPC UA reference part 4 7.17.2 for details. These are passed to the server in DataChangeListener.
|
ignore-access-level | Ignore the AccessLevel attribute and subscribe to all Variables, reading history from all nodes with Historizing set to true . This is the pre-2.3 behavior. |
log-bad-values | Log bad subscription data points. |
sampling-interval | Sets the sample rate of subscriptions on the server. The server usually defines a set of permitted sample-rates and picks the closest to what you specify here. Many servers don't support more than a single sample rate. Set the interval to 0 to use the server default. This setting generally sets the maximum rate of points from the server (in milliseconds). On many servers, sampling is an internal operation, but on some, this may access external systems. Setting this very low can increase the load on the server significantly. It typically limits the density of the points from the server, but not always. |
queue-length | Specifies the length of the internal server queue for points and events. Normally, this can be set to the same as publishing-interval/sampling-interval. Higher numbers increase the strain on the server. Many servers have a limited maximum queue size or ignore this parameter entirely and use a fixed size for everything. |
keep-alive-count | The number of publish requests without a response before the server should send a keep-alive message. Default 10. |
lifetime-count | The number of publish requests without a response before the server should close the subscription. Must be at least 3 * keep-alive-count . Default 1000. |
alternative-configs | List of alternative subscription configurations. The first entry with a matching filter will be used for each node. Contains data-change-filter, sampling-interval, and queue-length, as well as filter, which contains the following fields:
|
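A minimal subscription configuration might look like the sketch below; the values are illustrative, not recommendations, and the server may clamp or ignore some of them as described above.

```yaml
subscriptions:
  data-points: true
  events: true
  # Request samples every 100 ms; 0 uses the server default
  sampling-interval: 100
  # Server-side queue length per monitored item
  queue-length: 10
  keep-alive-count: 10
  # Must be at least 3 * keep-alive-count
  lifetime-count: 1000
```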
Events
Events in OPC UA are usually custom types, and servers that support events often have a large number of them active. In OPC UA, any node may set the EventNotifier
attribute, which indicates that it emits events and optionally stores historical events.
By default, all events will be read. If all-events is set to false, only events that do not belong to the base namespace will be read.
The attributes of each event are automatically mapped out, and a few general properties are filtered out. Others may be used as metadata in CDF or other destination systems, or in some cases be mapped directly to event properties.
If the event has a SourceNode
that refers to a node in the mapped hierarchy, it will be used to set the assetId
property on the event in CDF.
The old options event-ids, emitter-ids, and historizing-emitter-ids are deprecated, but will still work and may be used as a workaround for servers that aren't fully compliant with the OPC UA standard.
Parameter | Description |
---|---|
enabled | True to enable reading events from the server. If this is false , no events will be read. |
history | True to enable reading historical events. |
all-events | True to read all events, not just custom events. The default value is true . |
read-server | True to also check the server node when looking for event emitters. The default value is true . |
exclude-event-filter | Regex filter on event type DisplayName ; matching event types won't be extracted. |
exclude-properties | List of BrowseNames for properties of events to be excluded from metadata or other consideration. By default, only Time and Severity are used from the BaseEventType , all properties of subtypes are included. |
destination-name-map | Map source browse names to other values in the destination. For CDF, internal properties may be overwritten; by default, Message is mapped to description, SourceNode is used for context, and EventType is used for type. These may also be excluded or replaced by overrides in DestinationNameMap . If multiple properties are mapped to the same value, the first non-null is used. If StartTime , EndTime , or SubType are specified, either directly or through the map, these are used as event properties instead of metadata. StartTime and EndTime should be either DateTime , or a number corresponding to the number of milliseconds since January 1, 1970. If neither StartTime nor EndTime is specified, both are set to the Time property of BaseEventType . Type may be overridden case-by-case using NodeMap in the Extraction configuration, or in a dynamic way here. If no Type is specified, it's generated from the event NodeId in the same way externalIds are generated for normal nodes. |
event-ids (deprecated) | List of ProtoNodeIds (as described above) to be mapped to destinations. Events must be ObjectTypes and subtypes of BaseEventType in the OPC UA hierarchy. An empty ProtoNodeId defaults to the BaseEventType . This serves as an allowlist. If not specified, all events will be extracted. |
emitter-ids (deprecated) | List of ProtoNodeIds used as emitters. An empty ProtoNodeId defaults to the server node. This allows specifying additional event emitters. This is used to add extra emitters that aren't in the extracted node hierarchy, or that don't correctly specify the EventNotifier attribute. |
historizing-emitter-ids (deprecated) | List of ProtoNodeIds that must be a subset of the EmitterIds . These emitters will have their event history read. The server must support this. The events.history option must be set for this to work. This is used to supplement the EventNotifier property, so that events that do not have the EventNotifier property set may still have their events read. Note that attempting to read historical events from non-historizing emitters may cause issues. |
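As an illustrative sketch of the non-deprecated options above, an events section restricted to custom events might look like this; the regex is a placeholder, not a recommended filter.

```yaml
events:
  enabled: true
  # Don't read historical events
  history: false
  # Read only custom events, skipping the base namespace
  all-events: false
  # Example regex; event types with a matching DisplayName are skipped
  exclude-event-filter: 'Diagnostics'
```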
Pub-Sub
This is an experimental feature that allows the extractor to subscribe to OPC UA PubSub for data points, instead of using OPC UA subscriptions. It requires the OPC UA server to be available and to expose the full PubSub configuration, as described in Part 14 of the OPC UA standard. It currently only supports MQTT.
Note that this doesn't disable subscriptions; you may want to consider setting subscriptions: data-points: false to avoid getting duplicate data points. Time series aren't created from the OPC UA PubSub configuration, but must be discovered in the OPC UA node hierarchy.
Parameter | Description |
---|---|
enabled | The default value is false . Enables pub-sub discovery. |
prefer-uadp | The default value is true . If set to true , the extractor prefers the UADP binary format when the same datasets are exposed through multiple DataSetWriters; if set to false , JSON is preferred. |
file-name | Save or read configuration from a file. If the file doesn't exist, it will be created from server configuration. If this is pre-created manually, the server doesn't need to expose pubsub configuration. |
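Putting this together, a pub-sub section might be sketched as below. The file path is an example, and as noted above you may want to disable regular data point subscriptions alongside it to avoid duplicates.

```yaml
pub-sub:
  enabled: true
  # Prefer the UADP binary format over JSON
  prefer-uadp: true
  # Example path; created from server configuration if it doesn't exist
  file-name: 'config/pubsub-config.json'
# Avoid duplicate data points from regular subscriptions
subscriptions:
  data-points: false
```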
High availability
The extractor can run with a rudimentary form of redundancy. Multiple extractors on different machines are on standby, with one actively extracting from the OPC UA server. Each extractor must have a unique index.
Parameter | Description |
---|---|
index | A unique index for this extractor. Indices must be unique, or high availability will not work correctly. |
raw | Use the CDF staging area as a shared store for the extractor. This configuration must be the same for each redundant extractor.
|
redis | Use a Redis store as shared state for the extractor. This configuration must be the same for each redundant extractor.
|
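A minimal sketch of a high-availability setup using the CDF staging area is shown below. The section key and the RAW field names are assumptions based on the sample configuration file, and the database and table names are placeholders; each redundant extractor would use the same raw block but a different index.

```yaml
high-availability:
  # Unique per extractor instance
  index: 1
  raw:
    # Example database/table names in CDF RAW; shared by all instances
    database-name: 'extractor-ha'
    table-name: 'state'
```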