Original article was published by Theodoros Ntakouris on Deep Learning on Medium
Most machine learning pipelines read data from a structured source (a database, CSV files/Pandas DataFrames, TFRecords), perform feature selection, cleaning, and possibly preprocessing, and then pass a raw multidimensional array (tensor) to a model, along with another tensor representing the correct prediction for each input sample.
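As a rough illustration of that flow, here is a minimal sketch using only the Python standard library. The dataset, the column names (`age`, `income`, `clicked`), and the helper `load_xy` are all made up for this example and are not from the original article.

```python
import csv
import io

# Hypothetical toy dataset; the column names are invented for illustration.
RAW_CSV = """age,income,clicked
34,52000,1
27,31000,0
45,88000,1
"""

def load_xy(text, feature_columns, label_column):
    """Read CSV text and return (X, y) as plain nested lists."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Feature selection: keep only the requested columns, in the given order.
    X = [[float(row[c]) for c in feature_columns] for row in rows]
    # The label column becomes the second "tensor" of correct predictions.
    y = [int(row[label_column]) for row in rows]
    return X, y

X, y = load_xy(RAW_CSV, ["age", "income"], "clicked")
print(X)  # [[34.0, 52000.0], [27.0, 31000.0], [45.0, 88000.0]]
print(y)  # [1, 0, 1]
```

Note that once `X` is built, the column names are gone: the model only ever sees positions 0 and 1.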
Reorder or rename input features in production? → Useless results, or the client side breaks
Absent features? Missing data? Bad output value interpretation? Integer indices mixed up by mistake? → Useless results, or the client side breaks
Want to know which feature columns were used for training, so you can provide the same ones for inference? → You can’t: misinterpretation errors
Want to know what the output values represent? → You can’t: misinterpretation errors
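The failure modes above can be reproduced with a toy example. This is only a sketch under stated assumptions: the linear "model", its weights, and the names `TRAINING_COLUMNS`, `predict_positional`, and `predict_named` are hypothetical, chosen to show how a positional contract fails silently while a name-based one can fail loudly.

```python
# Hypothetical linear "model": weights are tied to positions, not names.
TRAINING_COLUMNS = ["age", "income"]  # assumed order used at training time
WEIGHTS = [2.0, 0.5]

def predict_positional(values):
    """Score a raw list of numbers; position is the only contract."""
    return sum(w * v for w, v in zip(WEIGHTS, values))

def predict_named(features):
    """Score a dict of named features, validated against the training columns."""
    missing = [c for c in TRAINING_COLUMNS if c not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    # Reorder by name so the caller's ordering does not matter.
    return predict_positional([features[c] for c in TRAINING_COLUMNS])

correct = predict_positional([30.0, 4.0])  # passed as (age, income)
swapped = predict_positional([4.0, 30.0])  # passed as (income, age): no error
print(correct, swapped)  # 62.0 23.0
```

With the named variant, `predict_named({"income": 4.0, "age": 30.0})` gives the same score as the correctly ordered call, and a missing `income` key raises a `ValueError` instead of quietly producing a number.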