ML Event Registry
At Coinbase, there are thousands of different events across hundreds of Kafka topics that may be useful for ML applications, each with its own schema. For our framework to support an evolving set of events, we explicitly register each event that will be consumed by an ML application.
Machine learning engineers (MLEs) then interact with the event registry to generate Tecton data sources by selecting one or more events and their desired metadata.
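To make the registry concrete, here is a minimal plain-Python sketch of what registering and selecting events could look like. The class name, field names, and example events are all illustrative assumptions, not Coinbase's actual registry API:

```python
from dataclasses import dataclass, field

# Hypothetical registry entry; fields are illustrative assumptions.
@dataclass
class RegisteredEvent:
    name: str                 # canonical event name
    kafka_topic: str          # streaming source
    delta_table: str          # batch/backfill source
    metadata_fields: list = field(default_factory=list)

# A tiny in-memory registry keyed by event name.
EVENT_REGISTRY = {
    "user_login": RegisteredEvent(
        name="user_login",
        kafka_topic="auth.events",
        delta_table="warehouse.auth_events",
        metadata_fields=["device_type", "country"],
    ),
    "trade_executed": RegisteredEvent(
        name="trade_executed",
        kafka_topic="trading.fills",
        delta_table="warehouse.trading_fills",
        metadata_fields=["asset", "side", "notional_usd"],
    ),
}

def select_events(names, metadata_fields=None):
    """Look up registered events, optionally restricting metadata fields."""
    selected = []
    for n in names:
        event = EVENT_REGISTRY[n]  # raises KeyError for unregistered events
        if metadata_fields is not None:
            event = RegisteredEvent(
                name=event.name,
                kafka_topic=event.kafka_topic,
                delta_table=event.delta_table,
                metadata_fields=[
                    f for f in event.metadata_fields if f in metadata_fields
                ],
            )
        selected.append(event)
    return selected
```

An MLE's selection then drives data source generation: `select_events(["user_login", "trade_executed"], metadata_fields=["country", "asset"])` returns the registry entries with only the requested metadata, and the framework can generate one data source function per entry.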
Under the hood, our framework autogenerates Tecton Spark data source functions that:
- Load each event from various Kafka topics (for streaming) and Delta Lake tables (for batch and backfills)
- Transform the unique schema for each event into a standardized schema:
  - user_id: StringType
  - event_name: StringType
  - timestamp: TimestampType
  - metadata: StructType
- Union the transformed events into a single Spark dataframe for feature computation
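The transform-and-union steps above can be sketched in plain Python (standing in for the actual Spark transformations, which we don't reproduce here). Each per-event function maps that event's unique fields onto the shared four-field schema, and the results are combined into one collection, much as `unionByName` would combine the per-event dataframes. Event names and raw field names are hypothetical:

```python
from datetime import datetime

# Illustrative raw events, each with its own source-specific schema.
login_events = [
    {"uid": "u1", "login_at": "2024-01-05T12:00:00", "device": "ios"},
]
trade_events = [
    {"user": "u1", "filled_at": "2024-01-05T12:05:00", "asset": "BTC", "side": "buy"},
]

def standardize_login(e):
    # Map this event's unique fields onto the standardized schema.
    return {
        "user_id": e["uid"],
        "event_name": "user_login",
        "timestamp": datetime.fromisoformat(e["login_at"]),
        "metadata": {"device": e["device"]},
    }

def standardize_trade(e):
    return {
        "user_id": e["user"],
        "event_name": "trade_executed",
        "timestamp": datetime.fromisoformat(e["filled_at"]),
        "metadata": {"asset": e["asset"], "side": e["side"]},
    }

# Union the standardized events into a single, time-ordered collection.
unioned = sorted(
    [standardize_login(e) for e in login_events]
    + [standardize_trade(e) for e in trade_events],
    key=lambda row: row["timestamp"],
)
```

Because every event lands in the same four-field shape, downstream feature computation can treat the unioned collection uniformly, regardless of which events an MLE selected.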
By introducing an event registry and simplifying data source generation, we enable MLEs to quickly discover and select useful events. In practice, MLEs have used as many as 50 events in a single sequence feature and dozens of metadata fields.