This is an interesting bit of work on neural-network processing on GPUs.
Instead of training on the entire 100 million outcomes (product purchases, in this example) MACH divides them into three "buckets," each containing 33.3 million randomly selected outcomes. Now, MACH creates another "world," and in that world, the 100 million outcomes are again randomly sorted into three buckets. Crucially, the random sorting is separate in World One and World Two: they each have the same 100 million outcomes, but their random distribution into buckets is different for each world.

With each world instantiated, a search is fed to both a "world one" classifier and a "world two" classifier, with only three possible outcomes apiece. "What is this person thinking about?" asks Shrivastava. "The most probable class is something that is common between these two buckets."

(From "Deep Learning breakthrough made by Rice University scientists," Ars Technica.)
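To make the "worlds and buckets" idea above concrete, here is a minimal toy sketch of it in Python. The function names (`make_world`, `predict`) and the tiny sizes (12 classes, 3 buckets, 2 worlds) are my own illustration, not the paper's implementation; real MACH uses 2-universal hash functions for the assignments and averages the small classifiers' probability scores rather than taking a hard set intersection.

```python
import random

def make_world(n_classes, n_buckets, seed):
    """One 'world': randomly assign every class to one of n_buckets."""
    rng = random.Random(seed)
    return {c: rng.randrange(n_buckets) for c in range(n_classes)}

def predict(worlds, bucket_predictions):
    """Given each world's predicted bucket, return the classes that are
    consistent with every world (the intersection described above)."""
    candidates = None
    for assignment, bucket in zip(worlds, bucket_predictions):
        in_bucket = {c for c, b in assignment.items() if b == bucket}
        candidates = in_bucket if candidates is None else candidates & in_bucket
    return candidates

# Toy setup: 12 classes, 3 buckets, 2 independently shuffled worlds.
worlds = [make_world(12, 3, seed=0), make_world(12, 3, seed=1)]
true_class = 7
# Each small classifier only has to name the bucket holding the true class.
buckets = [w[true_class] for w in worlds]
print(predict(worlds, buckets))  # a small candidate set containing class 7
```

The point of the sketch: each classifier solves a 3-way problem instead of a 100-million-way one, and because the two worlds shuffle classes independently, the intersection of their predicted buckets pins down far fewer candidates than either bucket alone.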
While I was looking into why it matters, here's the paper on arXiv:
Here’s something from the forum discussion:
Bigger batches are good, but they result in locking. Picking a good batch size relative to how much data you have is important. This new technique lets you, effectively, buy a "meta batch" for free (that is a terrible analogy, but it's the best I can do).

As batches get bigger and can't fit inside a single GPU or single compute node, your challenge becomes data transport. So anything that can decouple your computational agents is a win.

In this case, it's a more clever way of decoupling your agents. Normally asynchronous batches are awful, but this is a very clever way of allowing for asynchronous batching of your data.
If I may opine on the matter, I think we're reaching a point where machine learning researchers should start thinking about abandoning Python as a programming medium. For example, the other decoupling strategy (decoupled neural net back propagation) doesn't really seem like something I would want to write in Python, much less debug in someone else's code. Python is really not an appropriate language for tackling difficult problems in distribution and network coordination.
It is not really a "breakthrough," but I'll leave that for the readers to decide.