Most existing tools for training and deploying machine learning (MLOps) were built as if for traditional software. They focus on the code rather than the data and they target narrow slices of the ML development pipeline. There are MLOps tools for monitoring, for feature stores, for model versioning, for dataset versioning, for model training, for evaluation stores and more. Almost none of these tools make it easy to actually look at and understand the data that the systems are learning from.

I can’t comment much on it, except I strongly believe that data should be structured in the first place. Getting that right is critical, and then iterate on the machine learning data sets with increasing complexity.