Configure the SAP extractor
To configure the SAP extractor, you must create a configuration file in YAML format.
You can use the sample minimal configuration file included with the extractor packages as a starting point for your configuration settings.
The configuration file contains the global parameter `version`, which holds the version of the configuration schema. This article describes version 1.
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
Logger
Use the optional `logger` section to set up logging to a console or files.
Parameter | Description |
---|---|
console | Set up console logger configuration. See the Console section. |
file | Set up file logger configuration. See the File section. |
Console
Use the `console` subsection to enable logging to standard output, such as a terminal window.
Parameter | Description |
---|---|
level | Select the verbosity level for console logging. Valid options, in decreasing verbosity levels, are DEBUG , INFO , WARNING , ERROR , and CRITICAL . |
File
Use the `file` subsection to enable logging to a file. The files are rotated daily.
Parameter | Description |
---|---|
level | Select the verbosity level for file logging. Valid options, in decreasing verbosity levels, are DEBUG , INFO , WARNING , ERROR , and CRITICAL . |
path | Insert the path to the log file. |
retention | Specify the number of days to keep logs for. The default value is 7 days. |
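Putting the two subsections together, a complete logger configuration might look like this sketch (the path and levels are illustrative):

```yaml
logger:
  # Log INFO and above to the terminal
  console:
    level: INFO
  # Log DEBUG and above to a daily-rotated file, kept for 7 days
  file:
    level: DEBUG
    path: "logs/sap-extractor.log"
    retention: 7
```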
Cognite
Use the `cognite` section to describe which CDF project the extractor will load data into and how to connect to the project.
Parameter | Description |
---|---|
project | Insert the CDF project name. This is a required parameter. |
host | Insert the base URL of the CDF project. The default value is https://api.cognitedata.com. |
api-key | Deprecated. API key authentication is no longer supported. Use idp-authentication instead. |
idp-authentication | Insert the credentials for authenticating to CDF using an external identity provider. This is a required parameter. See the Identity provider (IdP) authentication section. |
Identity provider (IdP) authentication
Use the `idp-authentication` subsection to enable the extractor to authenticate to CDF using an external identity provider, such as Azure AD.
Parameter | Description |
---|---|
client-id | Enter the client ID from the IdP. This is a required parameter. |
secret | Enter the client secret from the IdP. This is a required parameter. |
scopes | List the scopes. This is a required parameter. |
resource | Insert the resource to request tokens for. This is an optional parameter. |
token-url | Insert the URL to fetch tokens from. You must enter either a token URL or an Azure tenant. |
tenant | Enter the Azure tenant. You must enter either a token URL or an Azure tenant. |
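As a sketch, a cognite section using IdP authentication against an Azure tenant might look like this (all values are placeholders for your own project and app registration):

```yaml
cognite:
  project: my-cdf-project
  host: https://api.cognitedata.com
  idp-authentication:
    client-id: <client-id-from-idp>
    secret: <client-secret-from-idp>
    scopes:
      - https://api.cognitedata.com/.default
    tenant: <azure-tenant-id>
```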
Extractor
Use the optional `extractor` section to add tuning parameters.
Parameter | Description |
---|---|
mode | Set the execution mode. Options are single or continuous . Use continuous to run the extractor in continuous mode, executing the OData queries defined in the endpoints section. The default value is single . |
upload-queue-size | Enter the size of the upload queue. The default value is 50 000 rows. |
parallelism | Insert the number of parallel queries to run. The default value is 4 queries. |
state-store | Configure the state store to enable incremental load. If not set, no state store is used and incremental load is deactivated. See the State store section. |
chunk_size | Enter the number of rows to be extracted from SAP OData on every run. The default value is 1000 rows, as recommended by SAP. |
delta_padding_minutes | Extractor internal parameter to control the incremental load padding. Do not change. |
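For example, a tuned extractor section for continuous runs might look like this sketch (the values shown are the documented defaults, except for mode):

```yaml
extractor:
  mode: continuous        # keep executing the queries defined under endpoints
  upload-queue-size: 50000
  parallelism: 4
  chunk_size: 1000        # rows per extraction request, as recommended by SAP
```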
State store
Use the `state-store` subsection to save extraction states between runs. Use this if data is loaded incrementally. We support multiple state store types, but you can only configure one at a time.
Parameter | Description |
---|---|
local | Local state store configuration. See the Local section. |
raw | RAW state store configuration. See the RAW section. |
Local
Use the `local` subsection to store the extraction state in a JSON file on the local machine.
Parameter | Description |
---|---|
path | Insert the file path to a JSON file. |
save-interval | Enter the interval in seconds between each save. The default value is 30 seconds. |
RAW
Use the `raw` subsection to store the extraction state in a table in the CDF staging area.
Parameter | Description |
---|---|
database | Enter the database name in the CDF staging area. |
table | Enter the table name in the CDF staging area. |
upload-interval | Enter the interval in seconds between each save. The default value is 30 seconds. |
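As an illustration, the state store can be configured in one of the following two ways, but not both at once (the path, database, and table names are placeholders):

```yaml
# Option 1: keep the extraction state in a local JSON file
state-store:
  local:
    path: "states.json"
    save-interval: 30

# Option 2: keep the extraction state in a CDF staging (RAW) table
# state-store:
#   raw:
#     database: extractor-db
#     table: sap-extractor-states
#     upload-interval: 30
```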
Metrics
Use the `metrics` section to describe where to send performance metrics for remote monitoring of the extractor. We recommend sending metrics to a Prometheus Pushgateway, but you can also send metrics as time series in the CDF project.
Parameter | Description |
---|---|
push-gateways | List the Pushgateway configurations. See the Pushgateways section. |
cognite | List the Cognite metrics configurations. See the Cognite section. |
Pushgateways
Use the `push-gateways` subsection to define a list of metric destinations, each with the following schema:
Parameter | Description |
---|---|
host | Enter the address of the host to push metrics to. This is a required parameter. |
job-name | Enter the value of the exported_job label to associate metrics with. This separates several deployments on a single pushgateway, and should be unique. This is a required parameter. |
username | Enter the credentials for the pushgateway. This is a required parameter. |
password | Enter the credentials for the pushgateway. This is a required parameter. |
clear-after | Enter the number of seconds to wait before clearing the pushgateway. When this parameter is present, the extractor will stall after the run is complete before deleting all metrics from the pushgateway. The recommended value is at least twice that of the scrape interval on the pushgateway. This is to ensure that the last metrics are gathered before the deletion. |
push-interval | Enter the interval in seconds between each push. The default value is 30 seconds. |
Cognite
Use the `cognite` subsection to send metrics as time series to the CDF project configured in the main cognite section above. Only numeric metrics, such as Prometheus counters and gauges, are sent.
Parameter | Description |
---|---|
external-id-prefix | Insert a prefix for all time series used to represent metrics for this deployment. This creates a scope for the set of time series created by this metrics exporter and should be unique per deployment across the entire project. This is a required parameter. |
asset-name | Enter the name of the asset to attach to time series. This will be created if it doesn't already exist. |
asset-external-id | Enter the external ID for the asset to create if the asset doesn't already exist. |
push-interval | Enter the interval in seconds between each push. The default value is 30 seconds. |
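A metrics section combining both destinations might look like this sketch (the host, job name, credentials, and prefix are placeholders):

```yaml
metrics:
  push-gateways:
    - host: https://pushgateway.example.com
      job-name: sap-extractor-prod
      username: <pushgateway-user>
      password: <pushgateway-password>
      push-interval: 30
  cognite:
    external-id-prefix: "sap_extractor:"
    asset-name: SAP extractor
    asset-external-id: sap-extractor
    push-interval: 30
```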
SAP
The `sap` section contains a list of SAP sources. The schema for each SAP source configuration depends on which SAP source type you are connecting to, distinguished by the `type` parameter. The supported SAP sources are:
- OData
- SOAP
- RFC
This is the schema for SAP OData sources:
Parameter | Description |
---|---|
type | Type of SAP source connection, set to odata for SAP OData sources. |
source_name | Enter a unique name for this SAP source. Endpoints reference the source by this name. This is a required parameter. |
gateway_url | Insert the SAP NetWeaver Gateway URL. This is a required parameter. |
client | Enter the SAP client number. This is a required parameter. |
username | Enter the SAP username to connect to the SAP NetWeaver Gateway. This is a required parameter. |
password | Enter the password to connect to the SAP NetWeaver Gateway. This is a required parameter. |
certificates | Certificates needed for authentication towards SAP instance. This is an optional parameter. See the Certificates section. |
timezone | Specify how the extractor should handle the source time zone. Valid values are local and utc . The default value is local . |
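For example, an OData source entry might look like this sketch (the URL, client, and credentials are placeholders):

```yaml
sap:
  - type: odata
    source_name: sap-odata-prod
    gateway_url: https://sap-gateway.example.com:44300
    client: "100"
    username: <sap-username>
    password: <sap-password>
    timezone: utc
```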
This is the schema for SAP SOAP sources:
Parameter | Description |
---|---|
type | Type of SAP source connection, set to soap for SAP SOAP sources. |
source_name | Enter a unique name for this SAP source. Endpoints reference the source by this name. This is a required parameter. |
wsdl_url | Insert the SOAP WSDL URL for the SAP ABAP web service. This is a required parameter. |
client | Enter the SAP client number. This is a required parameter. |
username | Enter the SAP username to connect to the SAP web service. This is a required parameter. |
password | Enter the password to connect to the SAP web service. This is a required parameter. |
certificates | Certificates needed for authentication towards SAP instance. This is an optional parameter. See the Certificates section. |
timezone | Specify how the extractor should handle the source time zone. Valid values are local and utc . The default value is local . |
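A SOAP source entry might look like this sketch (the WSDL URL path and credentials are placeholders; use the WSDL URL published for your SAP ABAP web service):

```yaml
sap:
  - type: soap
    source_name: sap-soap-prod
    wsdl_url: https://sap.example.com/<path-to-wsdl>?wsdl
    client: "100"
    username: <sap-username>
    password: <sap-password>
    timezone: local
```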
This is the schema for SAP RFC sources:
Parameter | Description |
---|---|
type | Type of SAP source connection, set to rfc for SAP RFC sources. |
source_name | Enter a unique name for this SAP source. Endpoints reference the source by this name. This is a required parameter. |
ashost | Insert the SAP application host address. This is a required parameter. |
sysnr | Enter the SAP system number, a technical identifier for internal processes in SAP. It consists of a two-digit number from 00 to 97. This is a required parameter. |
client | Enter the SAP client number. This is a required parameter. |
username | Enter the SAP username to connect to the SAP NetWeaver Gateway. This is a required parameter. |
password | Enter the password to connect to the SAP NetWeaver Gateway. This is a required parameter. |
saprouter | Enter the SAPRouter address when applicable. This is an optional parameter. |
snc_partnername | Enter the SAP SNC (Secure Network Communication) partner name when applicable. This is an optional parameter. |
snc_lib | Enter the path to the SAP SNC library needed when using SNC authentication. This is an optional parameter. |
x509cert | Enter the path to the user X509 certificate when applicable. This is an optional parameter. |
timezone | Specify how the extractor should handle the source time zone. Valid values are local and utc . The default value is local . |
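An RFC source entry might look like this sketch (the host, system number, and credentials are placeholders):

```yaml
sap:
  - type: rfc
    source_name: sap-rfc-prod
    ashost: sap-app.example.com
    sysnr: "00"
    client: "100"
    username: <sap-username>
    password: <sap-password>
    timezone: local
```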
Certificates
Use the `certificates` subsection to specify the certificates used for authentication towards SAP instances.
Three certificates are needed to perform the authentication: a certificate authority certificate (`ca_cert`), a public key (`public_key`), and a private key (`private_key`).
Please check this documentation on how to generate the three certificates from a .p12 certificate file, if needed.
When setting up certificate authentication, note that all three certificates must be placed in the same folder where the extractor runs.
Parameter | Description |
---|---|
ca_cert | Enter the path to the CA certificate file. |
public_key | Enter the path to the public key file. |
private_key | Enter the path to the key file. |
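Inside a source entry, the certificates subsection might look like this (the file names are placeholders; the files must sit in the folder the extractor runs from):

```yaml
certificates:
  ca_cert: ca.pem
  public_key: client-cert.pem
  private_key: client-key.pem
```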
Endpoints
Use the `endpoints` subsection to specify the endpoints to extract data from.
Parameter | Description |
---|---|
name | Enter the name of the SAP endpoint used to extract data from an SAP source. The name must be unique for each endpoint in the configuration file. This is a required parameter. |
source_name | Enter the name of the SAP source related to this endpoint. This must be one of the SAP sources configured in the sap section. This is a required parameter. |
sap_service | Enter the name of the related SAP service. For odata endpoints, it's the SAP OData service. For soap endpoints, it's the operation defined in the WSDL document. For rfc endpoints, it's the name of the SAP function module exposed through the RFC protocol. This is a required parameter. |
sap_entity | Enter the name of the SAP entity related to the SAP OData service. This is a required parameter. |
destination | The destination of the data in CDF. This is one of several destination types; see the Destination section. This is a required parameter. |
sap_key | Enter a list of fields related to the SAP entity to be used as keys while ingesting data into CDF staging. This is a required parameter when using raw as the CDF destination. |
request | Enter the request to be sent to the SAP. This is a required parameter for rfc and soap endpoints. See Request section. |
incremental_field | Enter the name of the field to be used as reference for the incremental runs. This is an optional parameter. If you leave this field empty, the extractor will fetch full data loads every run. |
schedule | Schedule the interval at which the queries are executed against the SAP service. See the Schedule section. |
extract_schema | Extracts the SAP entity schema to CDF staging area. It expects database and table parameters, same as RAW destination. This is an optional parameter. |
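Tying these parameters together, an endpoints entry for an OData source might look like this sketch (the service, entity, and field names are hypothetical):

```yaml
endpoints:
  - name: notifications
    source_name: sap-odata-prod
    sap_service: <odata-service-name>
    sap_entity: <entity-set-name>
    incremental_field: ChangedAt     # hypothetical field; omit for full loads
    destination:
      type: raw
      database: sap
      table: notifications
    sap_key:
      - NotificationNumber           # hypothetical key field
```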
Request
The `request` parameter is required for `rfc` and `soap` endpoints. Both the SOAP and RFC communication protocols need a request to the SAP server in order to retrieve data.
SOAP requests
SAP ABAP web services are SOAP-based, meaning requests to the SAP server must be in a valid XML format. The SAP extractor expects this XML to be added as a string in the `request` parameter. This is an example of a valid XML request to an SAP ABAP web service generated from an SAP Function Module:
```yaml
request: |
  <n0:BAPI_FUNCLOC_GETLIST xmlns:n0="urn:sap-com:document:sap:rfc:functions">
    <FUNCLOC_LIST>
      <item>
        <FUNCTLOCATION>String 57</FUNCTLOCATION>
        <FUNCLOC>String 58</FUNCLOC>
        <LABEL_SYST>S</LABEL_SYST>
        <DESCRIPT>String 60</DESCRIPT>
        <STRIND>Strin</STRIND>
        <CATEGORY>S</CATEGORY>
        <SUPFLOC>String 63</SUPFLOC>
        <PLANPLANT>Stri</PLANPLANT>
        <MAINTPLANT>1010</MAINTPLANT>
        <PLANGROUP>Str</PLANGROUP>
        <SORTFIELD>String 67</SORTFIELD>
      </item>
    </FUNCLOC_LIST>
    <MAINTPLANT_RA>
      <item>
        <SIGN>I</SIGN>
        <OPTION>EQ</OPTION>
        <LOW>1010</LOW>
        <HIGH>1010</HIGH>
      </item>
    </MAINTPLANT_RA>
  </n0:BAPI_FUNCLOC_GETLIST>
```
RFC requests
The SAP RFC communication protocol calls an SAP Function Module remotely on a target SAP server. SAP Function Modules expect import parameters in order to run and return the processed result. The SAP extractor expects the SAP Function Module parameters to be sent as a JSON request inside the `request` parameter. This is an example of a valid SAP RFC call to the RFC_READ_TABLE SAP function module:
```yaml
request: |
  {
    "QUERY_TABLE": "QMEL",
    "FIELDS": ["QMNUM", "QMART", "QMTXT"]
  }
```
Schedule
Use the `schedule` subsection to schedule runs when the extractor runs as a service.
Parameter | Description |
---|---|
type | Insert the schedule type. Valid options are cron and interval . cron uses regular cron expressions. interval expects an interval-based schedule. |
expression | Enter the cron or interval expression to trigger the query. For example, 1h repeats the query hourly, and 5m repeats the query every 5 minutes. |
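For example, the two schedule types might be written as follows:

```yaml
# Interval-based: repeat the query every hour
schedule:
  type: interval
  expression: 1h

# Cron-based: repeat the query at minute 0 of every hour
# schedule:
#   type: cron
#   expression: "0 * * * *"
```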
Destination
The `raw` destination writes data to the CDF staging area (RAW). The raw destination requires the `sap_key` parameter in the endpoint configuration.
Parameter | Description |
---|---|
type | Type of CDF destination, set to raw to write data to RAW. |
database | Enter the CDF RAW database to upload data into. This will be created if it doesn't exist. This is a required value. |
table | Enter the CDF RAW table to upload data into. This will be created if it doesn't exist. This is a required value. |
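A raw destination block inside an endpoint might look like this (the database and table names are placeholders):

```yaml
destination:
  type: raw
  database: sap
  table: notifications
```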
Time series
The `time_series` destination inserts the resulting data as data points in time series.
There are two mandatory parameters for the time series destination:

- `type`: Set to `time_series` to write data to CDF time series.
- `field_mapping`: To ingest data into a time series, SAP entity fields must be mapped to the following CDF time series fields:
  - `externalId`: Required SAP entity field.
  - `timestamp`: Required SAP entity field.
  - `value`: Required SAP entity field.
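A time_series destination might be mapped like this (the SAP field names on the right are hypothetical):

```yaml
destination:
  type: time_series
  field_mapping:
    externalId: MeasuringPoint   # SAP field holding the time series external ID
    timestamp: ReadingDate       # SAP field holding the data point timestamp
    value: ReadingValue          # SAP field holding the data point value
```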
Assets
The `assets` destination inserts the resulting data as CDF assets.
There are two mandatory parameters for the assets destination:

- `type`: Set to `assets` to write data to CDF assets.
- `field_mapping`: To ingest data into assets, SAP entity fields must be mapped to the following CDF asset fields:
  - `externalId`: Required SAP entity field.
  - `parentExternalId`: Optional SAP entity field.
  - `description`: Optional SAP entity field.
  - `source`: Optional SAP entity field.
Any other columns returned by the endpoint call are mapped to key/value pairs in the `metadata` field for assets.
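An assets destination might be mapped like this (the SAP field names on the right are hypothetical):

```yaml
destination:
  type: assets
  field_mapping:
    externalId: FunctionalLocation
    parentExternalId: SuperiorFunctionalLocation
    description: Description
    source: SourceSystem
```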
Events
The `events` destination inserts the resulting data as CDF events.
There are two mandatory parameters for the events destination:

- `type`: Set to `events` to write data to CDF events.
- `field_mapping`: To ingest data into events, SAP entity fields must be mapped to the following CDF event fields:
  - `externalId`: Required SAP entity field.
  - `startTime`: Optional SAP entity field.
  - `endTime`: Optional SAP entity field.
  - `description`: Optional SAP entity field.
  - `source`: Optional SAP entity field.
Any other columns returned by the endpoint call are mapped to key/value pairs in the `metadata` field for events.
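An events destination might be mapped like this (the SAP field names on the right are hypothetical):

```yaml
destination:
  type: events
  field_mapping:
    externalId: NotificationNumber
    startTime: CreatedAt
    endTime: CompletedAt
    description: ShortText
    source: SourceSystem
```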