Defining a schema: Bringing your own properties into the graph
You can create your own schemas to enrich instances with custom properties. The schemas consist of three key elements:
- Containers: define physical storage which contains properties.
- Views: establish logical schemas which map properties.
- Data models: group one or more views for graph data consumption and ingestion.
All three elements are scoped to a space, just like instances.
Containers
Containers are the physical storage for properties. They are defined within a space, and hold a set of properties that logically belong together. You must define types for your properties. You can also add optional constraints that the data must adhere to, and define indexes to optimize query performance.
Containers store properties for instances (nodes and edges). An instance can have properties in multiple containers.
You can populate the containers for an instance. The example below populates them for a node.
This data:
externalId: 'xyz42'
equipment:
  manufacturer: 'Acme Inc.'
pump:
  maxPressure: 1.2
translates to this:
You can define containers in a different space from the space holding the instances. This is useful if you want to use the same schema for nodes in different spaces, which is often the case given the access control model.
As you add data to these containers for more nodes, the physical storage of the containers will look similar to this:
Note that only node.{space, externalId, type} is included in the Node base container for brevity.
This is similar to relational database schemas where (space, externalId) constitutes a foreign key to the core node table, and results in a snowflake schema. Importantly, this data lives on a different plane than the graph data discussed in the previous section. For example, nothing ensures that a node has data in Pump just because it has node.type set to [types, pump]. Validation of data content is left to the client, but you can use views to make it more ergonomic.
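The storage layout described above can be pictured with a small in-memory sketch. This is only an illustration under assumptions, not how CDF stores data: the `plant` space, the second node `abc13`, and the `read_node` helper are hypothetical names introduced here.

```python
# Hypothetical in-memory picture of container storage (not the CDF implementation).
# Every table is keyed by (space, externalId), which acts like a foreign key
# to the core node table.

# Core node table: one row per node.
nodes = {
    ("plant", "xyz42"): {"type": ("types", "pump")},
    ("plant", "abc13"): {"type": ("types", "pump")},
}

# One table per container, keyed by the same (space, externalId).
containers = {
    "Equipment": {
        ("plant", "xyz42"): {"manufacturer": "Acme Inc."},
        ("plant", "abc13"): {"manufacturer": "Acme Inc."},
    },
    "Pump": {
        ("plant", "xyz42"): {"maxPressure": 1.2},
        # No row for abc13, even though its type is [types, pump].
    },
}


def read_node(key):
    """Join the core node row with every container row that shares its key."""
    result = {"space": key[0], "externalId": key[1], **nodes[key]}
    for name, rows in containers.items():
        if key in rows:
            result[name] = rows[key]
    return result
```

Reading `abc13` returns its type and its Equipment data but no Pump data, which illustrates that the node type and the container contents are independent of each other.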
Which types of instances can you use a container for?
The usedFor field lets you define which types of instances the container can be used for. Specify one of these values:
- node: the container can only be used to populate properties on a node.
- edge: the container can only be used to populate properties on an edge.
- all: the container can be used to populate properties on both nodes and edges.
If you use all, ingesting to the container will be more expensive than using only node or edge.
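As a rough sketch of the rule above, the check a client could run before ingesting looks like this. The function name `can_populate` is hypothetical and is not part of any CDF API.

```python
def can_populate(used_for, instance_type):
    """Return True if a container with this usedFor value accepts the instance type."""
    if used_for not in {"node", "edge", "all"}:
        raise ValueError(f"unknown usedFor value: {used_for!r}")
    if instance_type not in {"node", "edge"}:
        raise ValueError(f"unknown instance type: {instance_type!r}")
    # "all" accepts both nodes and edges; otherwise the values must match.
    return used_for == "all" or used_for == instance_type
```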
Properties
When you define a container, you must specify the properties it will contain. Data modeling supports the following basic data types for properties:
Property type | Description |
---|---|
text | A string of characters. |
int64 | A 64-bit integer. |
float64 | A 64-bit floating point number. |
float32 | A 32-bit floating point number. |
boolean | A boolean value. |
timestamp | A timestamp (with timezone). |
date | A date (without timezone). |
json | A JSON object. |
direct | A direct relation to another instance. |
In addition to these property types, we support native reference types that point to resources in other CDF APIs. This lets you reference data not suited for storage in a property graph. We support the following native resource reference types:
Native resource reference type | Description |
---|---|
TimeSeries | A reference to one specific time series. You can use GraphQL queries to expand data from the time series, including data points. |
File | A reference to a file stored in CDF and uploaded through the files service. |
Sequences | A reference to a sequence stored in CDF. |
With the exception of the direct type, we support declaring all of these base and reference types as lists. For example, to store a list of file references: files: [File]
You can specify whether the property is nullable, and provide a default value.
The full specification of a required string property can look like this:
name: myStringProperty
description: A string property
nullable: false
defaultValue: foo
type:
  type: text
  list: false
When creating your containers, it is important to consider how they will be queried. Identify the properties that are likely to be used for filtering and sorting, and create indexes or composite indexes to support those queries. Since indexes add update overhead when data is ingested or mutated, verify through testing and performance validation that each index you create is useful and necessary.
Indexes
Well-designed indexes speed up data access, support constraints such as uniqueness, and enable efficient cursoring with custom sort operations. Data modeling supports up to 10 indexes for a container.
An index belongs to a container and is not a flag on a property. When you lay out your physical schema, remember that you can only build an index using properties from the same container. Indexes cannot span properties hosted in different containers.
We support two index types:
- btree: Use a btree index on primitive base types for efficient lookups and range scans. You can set btree indexes to be cursorable, and enable efficient cursoring with custom sorts.
- inverted: Use an inverted index for list-type properties to enable efficient searching for values that appear within the list.
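To build intuition for the difference between the two index types, the sketch below imitates them with plain Python structures: a sorted list stands in for a btree (range scans), and a dictionary from value to row IDs stands in for an inverted index (membership in a list property). The `tags` property and the row IDs are hypothetical, and this says nothing about how the service implements its indexes.

```python
import bisect
from collections import defaultdict

rows = {
    "pump-1": {"maxPressure": 1.2, "tags": ["critical", "outdoor"]},
    "pump-2": {"maxPressure": 3.5, "tags": ["outdoor"]},
    "pump-3": {"maxPressure": 2.0, "tags": ["critical"]},
}

# btree-like structure: (value, row id) pairs kept in sorted order.
sorted_pressure = sorted((row["maxPressure"], key) for key, row in rows.items())


def range_scan(low, high):
    """Return row ids with low <= maxPressure <= high, in sorted order."""
    start = bisect.bisect_left(sorted_pressure, (low, ""))
    end = bisect.bisect_right(sorted_pressure, (high, chr(0x10FFFF)))
    return [key for _, key in sorted_pressure[start:end]]


# inverted-index-like structure: each list element points back to its rows.
inverted_tags = defaultdict(set)
for key, row in rows.items():
    for tag in row["tags"]:
        inverted_tags[tag].add(key)


def rows_containing(tag):
    """Return row ids whose tags list contains the given value."""
    return sorted(inverted_tags.get(tag, set()))
```

Because the sorted structure already holds rows in value order, a reader can resume a scan from the last value seen, which is the idea behind cursoring with a custom sort.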
Constraints
Use constraints to restrict the values that can be stored in a property or in a container. Constraints help ensure that the data has integrity and reflects the real world. We support up to 10 constraints per container.
Currently, we support two constraint types:
- uniqueness: Ensures that the values of a property or a set of properties are unique within the container.
- requires: Points to another container, and requires that an instance has data in that other container in order to populate this container.
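The two constraint types can be pictured with the following sketch. It assumes, for illustration only, that a violated constraint rejects the write. The helper names, the `plant` space, and the `serialNumber` property are hypothetical.

```python
# Hypothetical container tables keyed by (space, externalId).
equipment_rows = {("plant", "xyz42"): {"manufacturer": "Acme Inc."}}
pump_rows = {}


def upsert_pump(key, data):
    """requires: reject Pump data for a node that has no Equipment data."""
    if key not in equipment_rows:
        raise ValueError(f"{key} has no data in the required Equipment container")
    pump_rows[key] = data


def check_unique(rows, properties, key, data):
    """uniqueness: reject data whose property values already exist on another row."""
    candidate = tuple(data.get(p) for p in properties)
    for other_key, other in rows.items():
        if other_key != key and tuple(other.get(p) for p in properties) == candidate:
            raise ValueError(f"uniqueness violated on {properties}")
```

In this sketch, writing Pump data for `xyz42` succeeds because the node already has Equipment data, while the same write for a node without Equipment data raises an error.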
Example container definition
This example defines two containers, Equipment and Pump:
- It sets usedFor to node on both containers to allow them to be populated for nodes, not edges.
- The btree index on Equipment.manufacturer enables efficient sorting/filtering on the manufacturer property. The index is cursorable and lets you efficiently cursor through equipment nodes when sorting on manufacturer.
- The requires constraint on Pump ensures that any node with data in the Pump container also has data in the Equipment container.
- space: equipment
  externalId: Equipment
  usedFor: node
  properties:
    - manufacturer:
        type:
          type: text
          list: false
        nullable: false
  indexes:
    manufacturer:
      type: btree
      properties:
        - manufacturer
      cursorable: True
- space: equipment
  externalId: Pump
  usedFor: node
  properties:
    - maxPressure:
        type:
          type: float64
          list: false
        nullable: false
  constraints:
    requireEquipment:
      constraintType: requires
      require:
        space: equipment
        externalId: Equipment
Views
Use views to create logical schemas to consume and populate a graph tailored for specific use cases. Like containers, views contain a group of properties. You define the views by either mapping container properties, or by creating connection properties to express the expected relationships in the graph.
You query data through your defined views. Data is not queried directly from the containers.
Mapped properties
Views let you map properties from different containers into a "flat" object, and rename or alias properties.
For example, this view creates a flat object with the properties manufacturer and maxPressure from the Equipment and Pump containers. It also renames the manufacturer property to producer:
You can use the view to populate a node with data from both the Equipment and Pump containers at the same time, and query for the properties when retrieving the nodes.
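The mapping described above can be sketched as a lookup table from view property to container property. This is an illustration only: the helper names are hypothetical, and the real service resolves mappings from the view definition.

```python
# View property name -> (container, container property).
# "producer" is the alias for Equipment.manufacturer, as in the text above.
view_properties = {
    "producer": ("Equipment", "manufacturer"),
    "maxPressure": ("Pump", "maxPressure"),
}


def read_through_view(mapping, container_data):
    """Flatten per-container data into one object using the view's property names."""
    flat = {}
    for name, (container, prop) in mapping.items():
        if container in container_data and prop in container_data[container]:
            flat[name] = container_data[container][prop]
    return flat


def write_through_view(mapping, flat):
    """Split a flat object back into per-container data."""
    container_data = {}
    for name, value in flat.items():
        container, prop = mapping[name]
        container_data.setdefault(container, {})[prop] = value
    return container_data
```

Writing `{"producer": "Acme Inc.", "maxPressure": 1.2}` through this mapping populates both containers in one step, and reading it back yields the same flat object.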
Connection properties
Connection properties let you describe that you expect certain relations to exist between nodes in the graph. For example, you can express that you expect nodes with data in BasicPump to have flows-to edges to Valve nodes. When this metadata is persisted, you can retrieve related data when viewing instances through a particular view. For example:
The example above encodes that nodes with data in BasicPump can have flows-to edges to nodes with data in BasicValve. You can describe this with these fields in a connection property:
- type: the fully qualified external ID of the node representing the edge type.
- source: a reference to the view through which you can view the node at the other end.
- direction: the direction to traverse the edge (inwards/outwards).
- edgeSource: an optional reference to a view.
A connection property from a pump to its valves would look like this:
type:
  space: types
  externalId: flows-to
source:
  space: equipment
  externalId: BasicValve
  version: v1
direction: outwards
Currently, we only support a single type of connection property, which lets you represent single-hop edge traversals.
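A single-hop traversal of this kind can be sketched as a filter over an edge list. The edge records and the `traverse` helper are hypothetical and only illustrate how the type and direction fields narrow the result.

```python
# Hypothetical edges; "start" and "end" hold node external IDs.
edges = [
    {"type": ("types", "flows-to"), "start": "pump-1", "end": "valve-1"},
    {"type": ("types", "flows-to"), "start": "pump-1", "end": "valve-2"},
    {"type": ("types", "mounted-on"), "start": "pump-1", "end": "skid-1"},
]


def traverse(node, edge_type, direction):
    """Follow edges of one type a single hop from the node, in the given direction."""
    if direction == "outwards":
        return [e["end"] for e in edges if e["type"] == edge_type and e["start"] == node]
    if direction == "inwards":
        return [e["start"] for e in edges if e["type"] == edge_type and e["end"] == node]
    raise ValueError(f"unknown direction: {direction!r}")
```

Traversing outwards from `pump-1` along `flows-to` returns the two valves, while the `mounted-on` edge is ignored because its type does not match.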
Implementing other views
Views can implement other views and inherit their properties. This is useful when you want to create a view that combines the properties of multiple other views. For example:
- space: equipment
  externalId: Equipment
  version: v1
  properties:
    manufacturer:
      container:
        space: equipment
        externalId: Equipment
      containerPropertyIdentifier: manufacturer
- space: equipment
  externalId: Pump
  version: v1
  implements: # <-- Declares that the view implements the Equipment view
    - space: equipment
      externalId: Equipment
      version: v1
  properties:
    maxPressure:
      container:
        space: equipment
        externalId: Pump
      containerPropertyIdentifier: maxPressure
The effective properties of the Pump view are now manufacturer and maxPressure.
The effective properties of a view are resolved at query time. We do not allow breaking changes to views, but if a view implemented by another view is deleted, the inherited properties will be removed from the implementing view. This could break clients if it removes any required properties.
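The sketch below illustrates why resolving effective properties at query time matters. It is a simplified model with hypothetical names: the views are plain dictionaries, and a deleted parent view simply contributes nothing.

```python
def effective_properties(name, views):
    """Collect a view's own properties plus those inherited through implements."""
    view = views[name]
    props = set(view["properties"])
    for parent in view["implements"]:
        if parent in views:  # a deleted parent no longer contributes properties
            props |= effective_properties(parent, views)
    return props


views = {
    "Equipment": {"implements": [], "properties": ["manufacturer"]},
    "Pump": {"implements": ["Equipment"], "properties": ["maxPressure"]},
}
```

With both views present, Pump resolves to manufacturer and maxPressure. If the Equipment entry is removed from `views`, the same call returns only maxPressure, which is the situation that could break clients relying on the inherited property.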
Implemented property conflicts and precedence
For example, assume you have four views, A, B, C, and D, each with a single property. With the following implements graph, you can see the effective properties on the right.
If you introduce conflicting property identifiers in this graph, they are resolved by sorting the implements graph topologically. The order beneath a node is determined by the order of the implements array, where later entries are preferred.
If B implements [C, D], the order of precedence is A, B, D, C.
If B implements [D, C], the order is A, B, C, D.
In these examples B implements [C, D]:
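The precedence rule can be sketched in a few lines. The sketch assumes that A implements B, which the orderings above suggest but the text does not state, and it only covers tree-shaped implements graphs like this one. Graphs where two views share a parent would need a full topological sort. The function names are hypothetical.

```python
def precedence(view, implements):
    """Return views from highest to lowest precedence, starting at `view`.

    Later entries in an implements array are preferred, so parents are
    visited in reverse order.
    """
    order = []

    def visit(name):
        if name in order:
            return
        order.append(name)
        for parent in reversed(implements.get(name, [])):
            visit(parent)

    visit(view)
    return order


def resolve(view, implements, properties):
    """Pick each property from the highest-precedence view that defines it."""
    effective = {}
    for name in precedence(view, implements):
        for prop, definition in properties.get(name, {}).items():
            effective.setdefault(prop, definition)  # earlier views win
    return effective
```

For example, if both C and D define a property called `name` and B implements [C, D], `resolve` picks the definition from D because D comes before C in the precedence order.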
View versioning
Views are versioned, and you cannot introduce breaking changes without changing the version.
You can adapt the versioning scheme to your needs. If you don't have a preference, we recommend using integers starting from 1 and incrementing by one for each new version.
Use decimals such as 1.1, 1.2, and 2.0 if you want higher granularity. The whole number part increments each time you make a more significant change that requires migrating the business logic in your applications. The fractional number part increments when you make minor changes that break the API but don't substantially break the logic required to use the API.
Semantic versioning is a widely used versioning scheme, but it is designed for software development, not for data modeling. You can choose to implement semantic versioning, but it requires more attention to the reusability factor designed for CDF data modeling. In the a.b.c scheme of semantic versioning, a change in a signals a breaking change, while changes in b and c signal backward-compatible additions and fixes.
In CDF data modeling, we allow non-breaking changes to the data model without incrementing the version, so every version increment signals a breaking change. For example, adding a new data type to a data model is a non-breaking change that requires no new version. In semantic versioning, the same change would require a new version number.
View filters
All views have a filter field that lets you filter the nodes that are included when querying the view. For most higher-level query endpoints in DMS, the filters are applied automatically. For advanced endpoints, you have to apply the filters manually.
If no filter is specified, the default hasData filter is applied on the list of views specified when querying. Learn more about hasData filters in the querying article.
Equipment example view
This example illustrates a view definition for equipment:
- space: equipment
  externalId: BasicEquipment
  properties:
    producer:
      container:
        space: equipment
        externalId: Equipment
      containerPropertyIdentifier: manufacturer
- space: equipment
  externalId: BasicValve
  version: v1
  # Since this only maps properties in the Equipment view, we can't rely on hasData filtering.
  # We add a custom filter to make sure we only include nodes of the correct type.
  filter:
    equals:
      property: ['node', 'type']
      value: { 'space': 'types', 'externalId': 'valve' }
  implements: # Inherit the properties from the BasicEquipment view
    - space: equipment
      externalId: BasicEquipment
- space: equipment
  externalId: BasicPump
  version: v1
  implements: # Inherit the properties from the BasicEquipment view
    - space: equipment
      externalId: BasicEquipment
  properties:
    maxPressure:
      container:
        space: equipment
        externalId: Pump
      containerPropertyIdentifier: maxPressure
    valves:
      type: # The edge type to traverse
        space: types
        externalId: flows-to
      source: # The view to view the other node in
        space: equipment
        externalId: BasicValve
        version: v1
      direction: outwards
Polymorphism in views
You can achieve polymorphism for views in two ways:
- Using implements and implicit view hasData filtering
- Using explicit view filters on type
In the sections below, BasicPump and BasicValve are subtypes of BasicEquipment.
Using implements and implicit view hasData filtering
When the BasicPump view implements the BasicEquipment view, the default filter on BasicPump is a hasData filter across the underlying containers: Pump and Equipment.
If you filter using the BasicEquipment view, you'll get anything with data in the Equipment container. If you filter using the BasicPump view, you'll get anything with data in both the Equipment and the Pump containers.
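The behavior of the implicit filter can be sketched as a subset check: a node matches a view when it has data in every container backing that view. The node IDs and helper names below are hypothetical.

```python
# Containers backing each view, including those inherited through implements.
view_containers = {
    "BasicEquipment": {"Equipment"},
    "BasicPump": {"Equipment", "Pump"},
}

# Containers each node has data in.
node_data = {
    "pump-1": {"Equipment", "Pump"},
    "valve-1": {"Equipment"},
}


def has_data(node_containers, view):
    """True if the node has data in all containers backing the view."""
    return view_containers[view] <= node_containers


def list_nodes(view):
    """Return the nodes that pass the implicit hasData filter for the view."""
    return sorted(n for n, containers in node_data.items() if has_data(containers, view))
```

Listing through BasicEquipment returns both nodes, while listing through BasicPump returns only `pump-1`.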
This approach to polymorphism resembles structural subtyping: "if it looks like a duck and quacks like a duck, it's a duck."
This breaks down in some cases. For example, if a view only has connection properties, there are no backing containers to apply the hasData filtering on. In this case, you can use the type property and view filters, as described in the next section.
Using explicit view filters on type
If, for example, you associate all pump nodes with the type [types, pump], you can use a view filter to only include nodes with that type:
filter:
  equals:
    property: ['node', 'type']
    value: { 'space': 'types', 'externalId': 'pump' }
To list all equipment nodes, you must explicitly include the subtypes of the view in your filter:
filter:
  in:
    property: ['node', 'type']
    values:
      [
        { 'space': 'types', 'externalId': 'pump' },
        { 'space': 'types', 'externalId': 'valve' },
      ]
This approach to polymorphism resembles nominal subtyping: "if it's a duck, it's a duck."
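To make the two filter shapes concrete, here is a minimal evaluator for equals and in filters on node.type. It is a sketch under assumptions, not the service's filter engine: the node records and the `matches` and `lookup` helpers are hypothetical, and only the two filter kinds shown above are handled.

```python
def lookup(node, path):
    """Resolve a property path such as ['node', 'type'] against a node record.

    The leading 'node' refers to the instance itself, so it is skipped.
    """
    value = node
    for part in path[1:]:
        value = value[part]
    return value


def matches(node, flt):
    """Evaluate an equals or in filter against a node."""
    if "equals" in flt:
        spec = flt["equals"]
        return lookup(node, spec["property"]) == spec["value"]
    if "in" in flt:
        spec = flt["in"]
        return lookup(node, spec["property"]) in spec["values"]
    raise ValueError("only equals and in filters are handled in this sketch")


pump_type = {"space": "types", "externalId": "pump"}
valve_type = {"space": "types", "externalId": "valve"}

pump_filter = {"equals": {"property": ["node", "type"], "value": pump_type}}
equipment_filter = {"in": {"property": ["node", "type"], "values": [pump_type, valve_type]}}
```

A node typed as a valve fails `pump_filter` but passes `equipment_filter`, which is why the equipment filter has to list every subtype explicitly.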
Data models
Use data models to group views that belong together for a purpose. For example, you might define an EquipmentInspection data model containing a BasicValve and a BasicPump view.
space: equipment
externalId: EquipmentInspection
version: v1
views:
  - space: equipment
    externalId: BasicPump
    version: v1
  - space: equipment
    externalId: BasicValve
    version: v1