This is a poor example from the Quertle blog, which claims that the algorithm “discriminated.” I don’t have any objective proof, but an algorithm is only as good as its data. Instead of addressing the important question of how to structure data, we are trying to reinvent the wheel by running algorithms on unstructured and “missing” data. It is the familiar GIGO principle: “garbage in, garbage out.”
There is currently no definitive answer, but if the algorithm did discriminate, the problem most likely lies in the data. Data sets are crucial for training an AI decision process, and, as mentioned above, if the assumptions underlying the training are biased, the results will be biased too. For example, if one group (in this case, black patients) was underserved in the first place, the training set might wrongly associate that group with lower-cost procedures.
This raises a critically important consideration when building AI for healthcare: data quality. Patient demographics, ethnicity, genetics, treatment history, and much more need to be accounted for. Any AI training set must give equal representation to all of these variables across all possible treatment pathways.
This also speaks to a general problem with deep learning: the resulting algorithms will always reflect the limitations of their training sets. The basic premise is that the volume of data contributes more to a model’s validity than the accuracy of that data does. I think, though, that people often fail to recognize that, regardless of volume, the training sets absolutely must not be biased.
Data quality is extremely difficult to standardize unless a central repository exists to “prepopulate” the forms. Databases exist in silos, and APIs to connect them don’t exist. Even when someone attempts to build them, the process is tortuous and expensive.