
Ben Dickson, writing for VentureBeat:
Karpathy acknowledged that vision-based autonomous driving is technically more difficult because it requires neural networks that function incredibly well based on the video feeds only. “But once you actually get it to work, it’s a general vision system, and can principally be deployed anywhere on earth,” he said.
The main argument against the pure computer vision approach is that there is uncertainty on whether neural networks can do range-finding and depth estimation without help from lidar depth maps.
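To make that depth question concrete, here is a minimal PyTorch sketch of what monocular depth estimation looks like as a learning problem: a single camera frame goes in, a dense map of per-pixel distances comes out. The tiny encoder-decoder and its layer sizes are purely illustrative assumptions on my part, not any production network.

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Toy encoder-decoder: one RGB frame in, a dense depth map out.
    Architecture and sizes are illustrative assumptions, not a real system."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            nn.Softplus(),                      # depth must be positive
        )

    def forward(self, x):                       # x: (batch, 3, H, W)
        return self.decoder(self.encoder(x))    # (batch, 1, H, W) predicted depth

net = TinyDepthNet()
frame = torch.randn(1, 3, 128, 192)             # a stand-in for one camera frame
print(net(frame).shape)                         # torch.Size([1, 1, 128, 192])
```

The open question in the article is not whether such a network can be written down, but whether it can be trained to match lidar-grade accuracy from video alone.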
It is difficult to rely on a mainstream publication for a scholarly understanding of these technologies. Most of the papers in this area are beyond my limited grasp of the field, and much of the terminology is still unfamiliar to me, but the article is a good starting point for a discussion with a subject expert.
For example, Samsung's Bixby Vision uses the on-device camera to give a contextual understanding of objects. It is gimmicky in its present iteration, but if general-purpose on-device processors keep improving, what would stop you from pointing the camera at an X-ray and getting an instant report? General-purpose computer vision is being pushed hard for cars, and the technology could be adapted for healthcare with relatively minor tweaks. Here's an interesting demonstration of how Tesla recognises objects:

The company created a hierarchical deep learning architecture composed of different neural networks that process information and feed their output to the next set of networks. The deep learning model uses convolutional neural networks to extract features from the videos of eight cameras installed around the car and fuses them together using transformer networks. It then fuses them across time, which is important for tasks such as trajectory-prediction and to smooth out inference inconsistencies.
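Reading that description, a rough PyTorch sketch helps me picture the idea: a shared CNN extracts features from each of the eight camera feeds, a transformer fuses them across cameras, and a recurrent layer stands in for the fusion across time. Every layer size, the use of a GRU, and the toy output head are my assumptions; this is not Tesla's actual network.

```python
import torch
import torch.nn as nn

class PerCameraCNN(nn.Module):
    """Shared convolutional backbone applied to every camera feed."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # collapse to one feature vector per camera (a simplification)
        )

    def forward(self, x):                       # x: (batch, 3, H, W)
        return self.net(x).flatten(1)           # (batch, out_dim)

class MultiCameraFusion(nn.Module):
    """CNN features per camera -> transformer fusion across cameras -> fusion across time."""
    def __init__(self, dim=256, cameras=8):
        super().__init__()
        self.backbone = PerCameraCNN(dim)
        self.cam_embed = nn.Parameter(torch.zeros(cameras, dim))        # learned "which camera" tag
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.camera_fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.temporal_fusion = nn.GRU(dim, dim, batch_first=True)       # stand-in for fusion across time
        self.head = nn.Linear(dim, 4)                                   # toy output, e.g. a short trajectory

    def forward(self, frames):                  # frames: (batch, time, cameras, 3, H, W)
        b, t, c, ch, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * t * c, ch, h, w)).reshape(b, t, c, -1)
        feats = feats + self.cam_embed                                   # tag each feature with its camera
        per_step = self.camera_fusion(feats.reshape(b * t, c, -1)).mean(dim=1)   # fuse the eight views
        over_time, _ = self.temporal_fusion(per_step.reshape(b, t, -1))          # fuse across time steps
        return self.head(over_time[:, -1])                               # predict from the latest step

model = MultiCameraFusion()
clips = torch.randn(2, 4, 8, 3, 64, 96)         # 2 clips, 4 time steps, 8 cameras of 64x96 RGB
print(model(clips).shape)                       # torch.Size([2, 4])
```

Even this toy version makes the compute problem obvious: every added camera and time step multiplies the work done by the backbone.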

My concern is that this approach demands intensive computing resources, first to train the models and then to label the data (whether automatically or with human annotators), and will therefore require extensive funding. Will healthcare enterprises shift towards this sort of data modelling? I doubt it.
Before you get overjoyed at the thought of Tesla ferrying you to work without a driver, here's the kicker:
Deep learning models also struggle with making causal inference, which can be a huge barrier when the models face new situations they haven’t seen before. So, while Tesla has managed to create a very huge and diverse dataset, open roads are also very complex environments where new and unpredicted things can happen all the time.
Autonomous driving is a chimera. I am not suggesting that general-purpose research is futile. My contention is that this sort of funding and implementation causes "data concentration", which won't diffuse into a greater public good. Computer vision (and object recognition) sits on the fringes of consumer-facing AI, and consumers will only benefit from the eventual spin-offs. It won't take much for Tesla to open another start-up or business line to apply this elsewhere: in self-checkout stores, for example, or in a healthcare application where further data and refinement act as a flywheel.
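To illustrate how "minor" such tweaks could be in principle, here is a hedged transfer-learning sketch: a backbone pretrained on ordinary photographs, with only a new classification head fine-tuned on X-ray-like inputs. The choice of ResNet-18, the two-class head, and the fake batch are assumptions for illustration only, not a claim about any real healthcare product.

```python
import torch
import torch.nn as nn
from torchvision import models

# Take a backbone pretrained on everyday photos (downloads weights on first use)
# and retrain only a small new head for a hypothetical two-class X-ray task.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False                            # keep the general-purpose features frozen
backbone.fc = nn.Linear(backbone.fc.in_features, 2)        # new head: normal vs. abnormal

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a fake batch standing in for preprocessed X-rays.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = loss_fn(backbone(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```

The hard part is not this code; it is the curated, labelled medical data and the regulatory validation, which is exactly where the flywheel (and the concentration) lies.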
The real overlords of AI are the ones who own the proprietary technology.