Machine Learning in Science: Sage words of advice

  1. Garbage in, garbage out. Always curate input data, just as surface preparation is essential for a good paint job (a minimal curation sketch follows this list).
  2. The unknowns outnumber the knowns. What looks good in the lab doesn’t necessarily carry over to the real clinical setting, which may be why drug research is so expensive: failure rates are high.
  3. GPT-3 is hype.
  4. Layering GPT-3 onto genomes only adds more orders of confusion.
  5. There has to be a limit to what we want to discover and to how we can recoup the “investments”. It is good to push through an obscure pathway, but there is always a “fear of missing out” kind of research-driven agenda. If we don’t know about something, it is good enough to work with what we have.
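
Point 1 is the most directly actionable item, so here is a minimal sketch of what “curating input data” could look like in practice. It assumes a tabular CSV loaded with pandas; the column names (`compound_id`, `assay_value`) are hypothetical placeholders, not anything from the original post.

```python
# Minimal data-curation sketch. Assumptions: tabular CSV, pandas available;
# the column name "assay_value" is a hypothetical placeholder.
import pandas as pd

def curate(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)

    # Drop exact duplicate rows that would otherwise be double-counted.
    df = df.drop_duplicates()

    # Coerce the target column to numeric; unparseable entries become NaN.
    df["assay_value"] = pd.to_numeric(df["assay_value"], errors="coerce")

    # Drop rows missing the value we intend to model on.
    df = df.dropna(subset=["assay_value"])

    # Flag (rather than silently delete) extreme outliers for manual review.
    mean, std = df["assay_value"].mean(), df["assay_value"].std()
    df["is_outlier"] = (df["assay_value"] - mean).abs() > 3 * std

    return df.reset_index(drop=True)
```

The specific steps matter less than the fact that each one is explicit and reviewable: that is the “surface preparation” before any model ever sees the data.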

We’re a long way from the world of Neuromancer – probably a good thing, too, considering how the AIs behave in it. The best programs that we are going to be making might be able to discern shapes and open patches in the data we give them, and infer that there must be something important there that is worth investigating, or be able to say “If there were a connection between X and Y here, everything would make a lot more sense – maybe see if there’s one we don’t know about”. I’ll be very happy if we can get that far. We aren’t there now.
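
To make the “open patches” idea slightly more concrete: one crude, present-day approximation is simply to flag the sparsest regions of a dataset as candidates worth a human look. The sketch below is purely an assumption about what such a helper could look like, not anything described above; it uses scikit-learn’s nearest-neighbour search on a hypothetical numeric feature matrix `X`.

```python
# Crude sketch of flagging "open patches": points whose neighbourhood is
# unusually sparse relative to the rest of the dataset. Illustrative only;
# the feature matrix X, k, and the quantile threshold are all assumptions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sparse_regions(X: np.ndarray, k: int = 10, quantile: float = 0.95) -> np.ndarray:
    """Return indices of points sitting in the sparsest parts of feature space."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own nearest neighbour
    dists, _ = nn.kneighbors(X)
    mean_dist = dists[:, 1:].mean(axis=1)            # skip the zero self-distance
    threshold = np.quantile(mean_dist, quantile)
    return np.where(mean_dist > threshold)[0]

# Tiny usage example: a few points placed far from the main cluster
# show up as the "patches" worth investigating.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(500, 5)), rng.normal(8, 1, size=(5, 5))])
print(sparse_regions(X))
```

This obviously is not the same as inferring that something important must be there, but it is the kind of mechanical scaffolding current tools can offer in the meantime.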

You can read more here.