Parmy Olson, writing for Bloomberg Opinion:
Designed to loosely emulate the human brain, deep-learning AI systems can spot tumors, drive cars and write text, showing spectacular results in a lab setting. But therein lies the catch. When it comes to using the technology in the unpredictable real world, AI sometimes falls short. That’s worrying when it is touted for use in high-stakes applications like healthcare.
The stakes are also dangerously high for social media, where content can influence elections and fuel mental-health disorders, as revealed in a recent exposé of internal documents from a whistleblower. But Facebook’s faith in AI is clear on its own site, where it often highlights machine-learning algorithms before mentioning its army of content moderators. Zuckerberg also told Congress in 2018 that AI tools would be “the scalable way” to identify harmful content. Those tools do a good job at spotting nudity and terrorist-related content, but they still struggle to stop misinformation from propagating.
As I have mentioned before, and as Parmy echoes in this brilliant opinion piece, the privacy creep of big tech is disconcerting. AI is also being used to fan misinformation, as documented in the recent WSJ exposé on the “Facebook Files”, which drew on internal documents provided by a whistleblower. Better data will lead to more robust models (and hopefully better insights). Parmy links to another interesting paper in the opinion piece, and I am quoting its abstract below (ironically, it comes from Google).
AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact, impacting predictions like cancer detection, wildlife poaching, and loan allocations. Paradoxically, data is the most under-valued and de-glamorised aspect of AI. In this paper, we report on data practices in high-stakes AI, from interviews with 53 AI practitioners in India, East and West African countries, and USA. We define, identify, and present empirical evidence on Data Cascades—compounding events causing negative, downstream effects from data issues—triggered by conventional AI/ML practices that undervalue data quality. Data cascades are pervasive (92% prevalence), invisible, delayed, but often avoidable. We discuss HCI opportunities in designing and incentivizing data excellence as a first-class citizen of AI, resulting in safer and more robust systems for all.
These are important takeaways.
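To make the idea of treating data quality as a first-class citizen a little more concrete, here is a minimal sketch of the kind of cheap, upstream data checks the paper's framing argues for. This is my own illustration, not code from the paper; the column names, thresholds, and helper function are all assumptions.

```python
# A minimal, hypothetical sketch of running basic data-quality checks *before*
# training, so data issues surface upstream instead of cascading into
# downstream predictions. Column names and thresholds are illustrative only.

import pandas as pd


def run_data_quality_checks(df: pd.DataFrame, label_col: str = "label") -> list[str]:
    """Return a list of human-readable data-quality warnings."""
    warnings = []

    # Missing values often enter silently and degrade models later.
    missing = df.isna().mean()
    for col, frac in missing[missing > 0.05].items():
        warnings.append(f"{col}: {frac:.1%} missing values")

    # Exact duplicate rows inflate apparent data volume and can leak across splits.
    dup_frac = df.duplicated().mean()
    if dup_frac > 0.01:
        warnings.append(f"{dup_frac:.1%} duplicate rows")

    # Severe label imbalance is worth knowing about before training, not after.
    if label_col in df.columns:
        counts = df[label_col].value_counts(normalize=True)
        if counts.max() > 0.95:
            warnings.append(f"label '{counts.idxmax()}' makes up {counts.max():.1%} of rows")

    return warnings


if __name__ == "__main__":
    # Toy example: a tiny frame with missing values and a duplicated row.
    df = pd.DataFrame({
        "age": [34, None, 51, 51],
        "income": [42_000, 55_000, None, None],
        "label": [0, 0, 1, 1],
    })
    df = pd.concat([df, df.tail(1)], ignore_index=True)  # trigger the duplicate check
    for w in run_data_quality_checks(df):
        print("WARNING:", w)
```

Nothing about these checks is sophisticated; the point the paper makes is that this kind of unglamorous data work is routinely skipped, and the cost only shows up much later in the pipeline.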
The full paper is available here: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/0d556e45afc54afeb2eb6b51a9bc1827b9961ff4.pdf