Overview
For tabular data, Gradient-Boosted Tree (GBT) packages enable training models with minimal pre-processing. Similarly, when datasets are purely natural language, NLP packages enable model training or fine-tuning on popular architectures with minimal effort. In contrast, training a deep learning model on tabular and multi-modal data requires:
Pre-processing for different modalities – e.g. normalizing numerical features, encoding and truncating categorical/sequence features, tokenizing natural language.
Different architecture components – e.g. embeddings for categorical features, LSTMs for numerical sequences, transformer layers for categorical sequences.
Custom training logic – e.g. configuring optimizers, dataloaders, early stopping, and logging (a minimal sketch of this boilerplate follows the list below).
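To make this concrete, here is a minimal sketch in plain PyTorch (not EasyTensor) of the kind of boilerplate implied above: standardizing a numerical feature, embedding a categorical feature, and writing the training loop by hand. All feature names, shapes, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# --- Pre-processing: normalize a numerical feature, integer-encode a categorical feature ---
num_raw = torch.randn(1000, 1) * 50 + 10                 # raw numerical feature (synthetic)
num_feat = (num_raw - num_raw.mean()) / num_raw.std()    # standardize
cat_feat = torch.randint(0, 20, (1000,))                 # categorical feature, already integer-encoded
labels = torch.randint(0, 2, (1000,)).float()            # binary target

# --- Architecture: an embedding for the categorical feature, an MLP on top ---
class TabularNet(nn.Module):
    def __init__(self, n_categories: int, emb_dim: int = 8):
        super().__init__()
        self.emb = nn.Embedding(n_categories, emb_dim)
        self.mlp = nn.Sequential(nn.Linear(emb_dim + 1, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, num_x, cat_x):
        x = torch.cat([num_x, self.emb(cat_x)], dim=-1)
        return self.mlp(x).squeeze(-1)

# --- Training logic: optimizer, dataloader, loss, epoch loop ---
model = TabularNet(n_categories=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
loader = DataLoader(TensorDataset(num_feat, cat_feat, labels), batch_size=64, shuffle=True)

for epoch in range(5):
    for num_x, cat_x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(num_x, cat_x), y)
        loss.backward()
        opt.step()
```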
Despite this added complexity, deep learning models offer potential advantages over off-the-shelf models like GBT: native support for sequence features, full control of feature representations and interactions, and more sophisticated model structures.
EasyTensor aims to simplify this with a developer experience similar to that of an off-the-shelf model, while offering the capabilities of deep learning models. The core abstraction that enables this is the EasyTensor block, which bundles both the pre-processing and the neural network layers for a feature. To build and train a model, a machine learning engineer simply picks the blocks appropriate for the feature types in a given problem; EasyTensor handles feature processing, such as fitting encoders, as well as constructing and training the neural network. This lowers the barrier to entry for experimenting with bigger and more capable deep learning models.
Example EasyTensor block for sequences of categories (e.g. user events).
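The snippet below is a hypothetical sketch of the block-based workflow described above. The import path, class names (NumericBlock, CategoryBlock, CategorySequenceBlock, EasyTensorModel), parameters, and method signatures are assumptions made purely for illustration, not EasyTensor's actual API.

```python
# Hypothetical usage sketch only: imports, class names, and signatures are assumed,
# not taken from EasyTensor's documented API.
from easytensor import NumericBlock, CategoryBlock, CategorySequenceBlock, EasyTensorModel  # assumed

# Pick one block per feature type; each block owns its pre-processing and its layers.
blocks = {
    "price": NumericBlock(),                              # assumed: fits a scaler, feeds a dense input
    "category": CategoryBlock(),                          # assumed: fits an encoder, adds an embedding
    "user_events": CategorySequenceBlock(max_length=50),  # assumed: encodes/truncates a categorical sequence
}

# train_df / val_df / test_df are assumed to be pandas DataFrames with the columns above.
model = EasyTensorModel(blocks=blocks, target="purchased")   # assumed constructor
model.fit(train_df, validation_data=val_df)                  # assumed: fits encoders, builds and trains the network
preds = model.predict(test_df)
```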
EasyTensor also includes various features and tooling to enhance the developer experience:
Logging of training and validation metrics, model checkpoints, and code to MLflow
Fault-tolerant restarts from uploaded checkpoints
Feature importance computation
Automatic inference optimization with TensorRT