
Configure the SAP extractor

To configure the SAP extractor, you must create a configuration file. The file must be in YAML format.

You can use the sample minimal configuration file included with the extractor packages as a starting point for your configuration settings.

The configuration file contains the global parameter version, which holds the version of the configuration schema. This article describes version 1.

Tip

You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.

Logger

Use the optional logger section to set up logging to a console or files.

| Parameter | Description |
| --- | --- |
| `console` | Set up the console logger configuration. See the Console section. |
| `file` | Set up the file logger configuration. See the File section. |

Console

Use the console subsection to enable logging to a standard output, such as a terminal window.

| Parameter | Description |
| --- | --- |
| `level` | Select the verbosity level for console logging. Valid options, in decreasing order of verbosity, are DEBUG, INFO, WARNING, ERROR, and CRITICAL. |

File

Use the file subsection to enable logging to a file. The files are rotated daily.

| Parameter | Description |
| --- | --- |
| `level` | Select the verbosity level for file logging. Valid options, in decreasing order of verbosity, are DEBUG, INFO, WARNING, ERROR, and CRITICAL. |
| `path` | Insert the path to the log file. |
| `retention` | Specify the number of days to keep logs. The default value is 7 days. |
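For example, a logger section that writes INFO messages to the console and DEBUG messages to a daily-rotated file could look like this (the file path is an illustrative value):

```yaml
logger:
  console:
    level: INFO
  file:
    level: DEBUG
    path: logs/sap-extractor.log   # illustrative path
    retention: 14                  # keep logs for 14 days instead of the default 7
```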

Cognite

Use the cognite section to describe which CDF project the extractor will load data into and how to connect to the project.

| Parameter | Description |
| --- | --- |
| `project` | Insert the CDF project name. This is a required parameter. |
| `host` | Insert the base URL of the CDF project. The default value is https://api.cognitedata.com. |
| `api-key` | API key authentication is deprecated. Use `idp-authentication` instead. |
| `idp-authentication` | Insert the credentials for authenticating to CDF using an external identity provider. You must authenticate with either an API key (deprecated) or IdP authentication. See the Identity provider (IdP) authentication section. |

Identity provider (IdP) authentication

Use the idp-authentication subsection to enable the extractor to authenticate to CDF using an external identity provider, such as Azure AD.

| Parameter | Description |
| --- | --- |
| `client-id` | Enter the client ID from the IdP. This is a required parameter. |
| `secret` | Enter the client secret from the IdP. This is a required parameter. |
| `scopes` | List the scopes. This is a required parameter. |
| `resource` | Insert the resource to include in token requests. This is an optional parameter. |
| `token-url` | Insert the URL to fetch tokens from. You must enter either a token URL or an Azure tenant. |
| `tenant` | Enter the Azure tenant. You must enter either a token URL or an Azure tenant. |
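Putting the cognite section and its idp-authentication subsection together, a minimal sketch with placeholder values (the project name, client ID, secret, scope, and tenant shown here are illustrative) could look like:

```yaml
cognite:
  project: my-cdf-project                      # placeholder project name
  host: https://api.cognitedata.com
  idp-authentication:
    client-id: my-client-id                    # placeholder IdP client ID
    secret: my-client-secret                   # placeholder IdP client secret
    scopes:
      - https://api.cognitedata.com/.default
    tenant: my-azure-tenant-id                 # or use token-url instead of tenant
```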

Extractor

Use the optional extractor section to add tuning parameters.

| Parameter | Description |
| --- | --- |
| `mode` | Set the execution mode. Options are `single` and `continuous`. Use `continuous` to run the extractor in a continuous mode, executing the OData queries defined in the endpoints section. The default value is `single`. |
| `upload-queue-size` | Enter the size of the upload queue. The default value is 50 000 rows. |
| `parallelism` | Insert the number of parallel queries to run. The default value is 4 queries. |
| `state-store` | Set up a state store to enable incremental load. By default, no state store is configured and incremental load is deactivated. See the State store section. |
| `chunk_size` | Enter the number of rows to extract from SAP OData on every run. The default value is 1000 rows, as recommended by SAP. |
| `delta_padding_minutes` | Internal extractor parameter that controls the incremental load padding. Do not change. |
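As an illustration, an extractor section tuned for continuous operation could look like this (the values are examples, not recommendations):

```yaml
extractor:
  mode: continuous       # repeat the queries instead of the default single run
  upload-queue-size: 50000
  parallelism: 4
  chunk_size: 1000       # rows fetched per run, as recommended by SAP
```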

State store

Use the state store subsection to save extraction states between runs. Use this if data is loaded incrementally. The extractor supports multiple state store types, but you can configure only one at a time.

| Parameter | Description |
| --- | --- |
| `local` | Local state store configuration. See the Local section. |
| `raw` | RAW state store configuration. See the RAW section. |

Local

Use the local section to store the extraction state in a JSON file on a local machine.

| Parameter | Description |
| --- | --- |
| `path` | Insert the file path to a JSON file. |
| `save-interval` | Enter the interval in seconds between each save. The default value is 30 seconds. |

RAW

Use the RAW section to store the extraction state in a table in the CDF staging area.

| Parameter | Description |
| --- | --- |
| `database` | Enter the database name in the CDF staging area. |
| `table` | Enter the table name in the CDF staging area. |
| `upload-interval` | Enter the interval in seconds between each save. The default value is 30 seconds. |
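For example, to keep the extraction state in a RAW table and enable incremental load, the state store could be configured like this (the database and table names are illustrative, and the example assumes the state-store subsection sits under the extractor section):

```yaml
extractor:
  state-store:
    raw:
      database: sap-extractor-state   # illustrative database name
      table: endpoint-states          # illustrative table name
      upload-interval: 30
```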

Metrics

Use the metrics section to describe where to send performance metrics for remote monitoring of the extractor. We recommend sending metrics to a Prometheus pushgateway, but you can also send metrics as time series in the CDF project.

| Parameter | Description |
| --- | --- |
| `push-gateways` | List the Pushgateway configurations. See the Pushgateways section. |
| `cognite` | List the Cognite metrics configurations. See the Cognite section. |

Pushgateways

Use the pushgateways subsection to define a list of metric destinations, each using the following schema:

| Parameter | Description |
| --- | --- |
| `host` | Enter the address of the host to push metrics to. This is a required parameter. |
| `job-name` | Enter the value of the exported_job label to associate metrics with. This separates several deployments on a single pushgateway and should be unique. This is a required parameter. |
| `username` | Enter the username for the pushgateway. This is a required parameter. |
| `password` | Enter the password for the pushgateway. This is a required parameter. |
| `clear-after` | Enter the number of seconds to wait before clearing the pushgateway. When this parameter is present, the extractor will stall after the run is complete before deleting all metrics from the pushgateway. The recommended value is at least twice the scrape interval of the pushgateway, to ensure that the last metrics are gathered before deletion. |
| `push-interval` | Enter the interval in seconds between each push. The default value is 30 seconds. |

Cognite

Use the cognite subsection to send metrics as time series to the CDF project configured in the main cognite section above. Only numeric metrics, such as Prometheus counters and gauges, are sent.

| Parameter | Description |
| --- | --- |
| `external-id-prefix` | Insert a prefix for all time series used to represent metrics for this deployment. This creates a scope for the set of time series created by this metrics exporter and should be unique per deployment across the entire project. This is a required parameter. |
| `asset-name` | Enter the name of the asset to attach the time series to. The asset is created if it doesn't already exist. |
| `asset-external-id` | Enter the external ID for the asset to create if the asset doesn't already exist. |
| `push-interval` | Enter the interval in seconds between each push. The default value is 30 seconds. |
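Combining the two subsections, a metrics section that pushes to a Prometheus pushgateway and also writes time series to CDF could be sketched as follows (the host, job name, credentials, and prefix are placeholder values):

```yaml
metrics:
  push-gateways:
    - host: https://pushgateway.example.com   # placeholder address
      job-name: sap-extractor-prod            # must be unique per deployment
      username: metrics-user                  # placeholder credentials
      password: metrics-password
      push-interval: 30
      clear-after: 120   # at least twice the pushgateway scrape interval
  cognite:
    - external-id-prefix: "sap_extractor:"    # placeholder prefix
      asset-name: SAP Extractor Metrics       # created if it doesn't exist
      push-interval: 30
```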

SAP

The sap section contains a list of SAP sources. The schema for each SAP source configuration depends on which SAP source type you are connecting to. These are distinguished by the type parameter. The supported SAP sources are:

  • OData
  • SOAP
  • RFC

This is the schema for SAP OData sources:

| Parameter | Description |
| --- | --- |
| `type` | The type of SAP source connection. Set to `odata` for SAP OData sources. |
| `source_name` | Enter a unique name for this SAP source. Endpoints reference this name in their source_name parameter. This is a required parameter. |
| `gateway_url` | Insert the SAP NetWeaver Gateway URL. This is a required parameter. |
| `client` | Enter the SAP client number. This is a required parameter. |
| `username` | Enter the SAP username to connect to the SAP NetWeaver Gateway. This is a required parameter. |
| `password` | Enter the password to connect to the SAP NetWeaver Gateway. This is a required parameter. |
| `certificates` | Certificates needed for authentication towards the SAP instance. This is an optional parameter. See the Certificates section. |
| `timezone` | Specify how the extractor should handle the source time zone. Valid values are `local` and `utc`. The default value is `local`. |
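For example, an SAP OData source entry in the sap list could look like this (the name, URL, client, and credentials are placeholder values):

```yaml
sap:
  - type: odata
    source_name: my-sap-odata    # unique name, referenced by endpoints
    gateway_url: https://sap-gateway.example.com:44300   # placeholder URL
    client: "100"                # placeholder SAP client number
    username: extractor-user     # placeholder credentials
    password: extractor-password
    timezone: utc
```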

This is the schema for SAP SOAP sources:

| Parameter | Description |
| --- | --- |
| `type` | The type of SAP source connection. Set to `soap` for SAP SOAP sources. |
| `source_name` | Enter a unique name for this SAP source. Endpoints reference this name in their source_name parameter. This is a required parameter. |
| `wsdl_url` | Insert the SOAP WSDL URL of the SAP ABAP web service. This is a required parameter. |
| `client` | Enter the SAP client number. This is a required parameter. |
| `username` | Enter the SAP username to connect to the SAP web service. This is a required parameter. |
| `password` | Enter the password to connect to the SAP web service. This is a required parameter. |
| `certificates` | Certificates needed for authentication towards the SAP instance. This is an optional parameter. See the Certificates section. |
| `timezone` | Specify how the extractor should handle the source time zone. Valid values are `local` and `utc`. The default value is `local`. |

This is the schema for SAP RFC sources:

| Parameter | Description |
| --- | --- |
| `type` | The type of SAP source connection. Set to `rfc` for SAP RFC sources. |
| `source_name` | Enter a unique name for this SAP source. Endpoints reference this name in their source_name parameter. This is a required parameter. |
| `ashost` | Insert the SAP application host address. This is a required parameter. |
| `sysnr` | The SAP system number, a technical identifier for internal processes in SAP. It consists of a two-digit number from 00 to 97. This is a required parameter. |
| `client` | Enter the SAP client number. This is a required parameter. |
| `username` | Enter the SAP username to connect to the SAP server. This is a required parameter. |
| `password` | Enter the password to connect to the SAP server. This is a required parameter. |
| `saprouter` | Enter the SAPRouter address, when applicable. This is an optional parameter. |
| `snc_partnername` | Enter the SAP SNC (Secure Network Communication) partner name, when applicable. This is an optional parameter. |
| `snc_lib` | Enter the path to the SAP SNC library needed when using SNC authentication. This is an optional parameter. |
| `x509cert` | Enter the path to the user X.509 certificate, when applicable. This is an optional parameter. |
| `timezone` | Specify how the extractor should handle the source time zone. Valid values are `local` and `utc`. The default value is `local`. |

Certificates

Use the certificates subsection to define the certificates used for authentication towards SAP instances.

Three certificates are needed to perform the authentication: a certificate authority certificate (ca_cert), a public key (public_key), and a private key (private_key).

If needed, refer to the documentation on how to generate the three certificates from a .p12 certificate file.

When setting up certificate authentication, note that all three certificates are needed and must be placed in the folder where the extractor runs.

| Parameter | Description |
| --- | --- |
| `ca_cert` | Enter the path to the CA certificate file. |
| `public_key` | Enter the path to the public key file. |
| `private_key` | Enter the path to the private key file. |
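As a sketch, a certificates subsection inside an SAP source entry could look like this (the file names are illustrative; all three files must sit in the folder the extractor runs from):

```yaml
certificates:
  ca_cert: ca.pem          # illustrative file names, placed in
  public_key: cert.pem     # the extractor's working folder
  private_key: key.pem
```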

Endpoints

Use the endpoints subsection to specify the endpoints to extract data from.

| Parameter | Description |
| --- | --- |
| `name` | Enter the name of the SAP endpoint used to extract data from an SAP source. The name must be unique for each query in the configuration file. This is a required parameter. |
| `source_name` | Enter the name of the SAP source related to this endpoint. This must be one of the SAP sources configured in the sap section. This is a required parameter. |
| `sap_service` | Enter the name of the related SAP service. For odata endpoints, it's the SAP OData service. For soap endpoints, it's the operation defined in the WSDL document. For rfc endpoints, it's the name of the SAP function module exposed through the RFC protocol. This is a required parameter. |
| `sap_entity` | Enter the name of the SAP entity related to the SAP OData service. This is a required parameter. |
| `destination` | The destination of the data in CDF. One of several destination types; see the Destination section. This is a required parameter. |
| `sap_key` | Enter the list of fields of the SAP entity to use as keys when ingesting data into CDF staging. This is a required parameter when using raw as the CDF destination. |
| `request` | Enter the request to send to the SAP server. This is a required parameter for rfc and soap endpoints. See the Request section. |
| `incremental_field` | Enter the name of the field to use as the reference for incremental runs. This is an optional parameter. If you leave this field empty, the extractor fetches a full data load on every run. |
| `schedule` | Schedule the interval at which the queries are executed towards the SAP service. See the Schedule section. |
| `extract_schema` | Extracts the SAP entity schema to the CDF staging area. It expects database and table parameters, the same as the RAW destination. This is an optional parameter. |
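For example, an endpoints entry that extracts an OData entity into RAW could be sketched as follows (the service, entity, field, database, and table names are all placeholder values):

```yaml
endpoints:
  - name: functional-locations       # unique per query
    source_name: my-sap-odata        # must match a source in the sap section
    sap_service: ZPM_FUNCLOC_SRV     # placeholder OData service name
    sap_entity: FunctionalLocationSet   # placeholder entity name
    incremental_field: ChangedAt     # placeholder field; omit for full loads
    destination:
      type: raw
      database: sap                  # placeholder RAW database
      table: functional_locations    # placeholder RAW table
    sap_key:
      - FunctionalLocation           # placeholder key field, required for raw
```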

Request

The request parameter is required for rfc and soap endpoints.

Both the SOAP and RFC communication protocols require a request to the SAP server to retrieve data.

SOAP requests

SAP ABAP web services are SOAP-based, meaning the requests to the SAP server must be in a valid XML format.

The SAP extractor expects this XML to be added as a string in the request parameter. This is an example of a valid XML request to an SAP ABAP web service, generated from an SAP function module:

    request: |
      <n0:BAPI_FUNCLOC_GETLIST xmlns:n0="urn:sap-com:document:sap:rfc:functions">
        <FUNCLOC_LIST>
          <item>
            <FUNCTLOCATION>String 57</FUNCTLOCATION>
            <FUNCLOC>String 58</FUNCLOC>
            <LABEL_SYST>S</LABEL_SYST>
            <DESCRIPT>String 60</DESCRIPT>
            <STRIND>Strin</STRIND>
            <CATEGORY>S</CATEGORY>
            <SUPFLOC>String 63</SUPFLOC>
            <PLANPLANT>Stri</PLANPLANT>
            <MAINTPLANT>1010</MAINTPLANT>
            <PLANGROUP>Str</PLANGROUP>
            <SORTFIELD>String 67</SORTFIELD>
          </item>
        </FUNCLOC_LIST>
        <MAINTPLANT_RA>
          <item>
            <SIGN>I</SIGN>
            <OPTION>EQ</OPTION>
            <LOW>1010</LOW>
            <HIGH>1010</HIGH>
          </item>
        </MAINTPLANT_RA>
      </n0:BAPI_FUNCLOC_GETLIST>

RFC requests

The SAP RFC communication protocol remotely triggers an SAP function module on a target SAP server. SAP function modules expect import parameters in order to run and return the processed result.

The SAP extractor expects the SAP function module parameters to be sent as a JSON request inside the request parameter. This is an example of a valid SAP RFC call to the RFC_READ_TABLE SAP function module:

    request: |
      {
        "QUERY_TABLE": "QMEL",
        "FIELDS": ["QMNUM", "QMART", "QMTXT"]
      }

Schedule

Use the schedule subsection to schedule runs when the extractor runs as a service.

| Parameter | Description |
| --- | --- |
| `type` | Insert the schedule type. Valid options are `cron` and `interval`. `cron` uses regular cron expressions, while `interval` expects an interval-based schedule. |
| `expression` | Enter the cron or interval expression to trigger the query. For example, `1h` repeats the query hourly, and `5m` repeats the query every 5 minutes. |
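For example, a schedule that runs a query every hour can be written either as an interval or as a cron expression:

```yaml
schedule:
  type: interval
  expression: 1h      # repeat the query hourly

# equivalent cron-based schedule:
# schedule:
#   type: cron
#   expression: "0 * * * *"
```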

Destination

Use the destination parameter of an endpoint to specify where the extracted data is written in CDF. The supported destination types are RAW, time series, assets, and events.

RAW

The raw destination writes data to the CDF staging area (RAW). The raw destination requires the sap_key parameter in the endpoint configuration.

| Parameter | Description |
| --- | --- |
| `type` | The type of CDF destination. Set to `raw` to write data to RAW. |
| `database` | Enter the CDF RAW database to upload data into. The database is created if it doesn't exist. This is a required parameter. |
| `table` | Enter the CDF RAW table to upload data into. The table is created if it doesn't exist. This is a required parameter. |

Time series

The time_series destination inserts the resulting data as data points in time series.

Two parameters are mandatory to use time series as a destination:

  • type: Set to time_series to write data to CDF time series.
  • field_mapping: To ingest data into time series, SAP entity fields must be mapped to the following CDF time series fields:
    • externalId: Required SAP entity field.
    • timestamp: Required SAP entity field.
    • value: Required SAP entity field.
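As an illustration, a time_series destination could map SAP entity fields to the CDF time series fields like this (the SAP field names are placeholders):

```yaml
destination:
  type: time_series
  field_mapping:
    externalId: MeasuringPoint   # placeholder SAP entity field
    timestamp: ReadingDate       # placeholder SAP entity field
    value: ReadingValue          # placeholder SAP entity field
```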

Assets

The assets destination inserts the resulting data as CDF assets.

Two parameters are mandatory to use CDF assets as a destination:

  • type: Set to assets to write data to CDF assets.
  • field_mapping: To ingest data into assets, SAP entity fields must be mapped to the following CDF asset fields:
    • externalId: Required SAP entity field.
    • parentExternalId: Optional SAP entity field.
    • description: Optional SAP entity field.
    • source: Optional SAP entity field.

Any other columns returned by the endpoint call are mapped to key/value pairs in the metadata field of the assets.
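As an illustration, an assets destination could be mapped like this (the SAP field names are placeholders; any unmapped columns end up in the asset metadata):

```yaml
destination:
  type: assets
  field_mapping:
    externalId: FunctionalLocation          # placeholder, required
    parentExternalId: SuperiorFuncLocation  # placeholder, optional
    description: Description                # placeholder, optional
    source: SystemStatus                    # placeholder, optional
```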

Events

The events destination inserts the resulting data as CDF events.

Two parameters are mandatory to use CDF events as a destination:

  • type: Set to events to write data to CDF events.
  • field_mapping: To ingest data into events, SAP entity fields must be mapped to the following CDF event fields:
    • externalId: Required SAP entity field.
    • startTime: Optional SAP entity field.
    • endTime: Optional SAP entity field.
    • description: Optional SAP entity field.
    • source: Optional SAP entity field.

Any other columns returned by the endpoint call are mapped to key/value pairs in the metadata field of the events.