How Tesla’s FSD Works - Part 2

Back in November, we dug into how FSD works based on Tesla’s patents, and Tesla has recently filed additional patents covering how it trains FSD.
This particular patent is titled “Predicting Three-Dimensional Features for Autonomous Driving” - and it’s all about using Tesla Vision to establish a ground truth - which enables the rest of FSD to make decisions and navigate through the environment.
This patent essentially explains how FSD can generate a model of the environment around it and then analyze that information to create predictions.
Time Series
Creating a sequence of data over time - a Time Series - is the basis for how FSD understands the environment. Tesla Vision, in combination with the internal vehicle sensors (for speed, acceleration, position, etc.), establishes data points over time. These data points come together to create the time series.
By analyzing that time series, the system establishes a “ground truth” - a highly accurate and precise representation of the road, its features, and what is around the vehicle. For example, FSD may observe a lane line from multiple angles and distances as the vehicle moves through time, allowing it to determine the line’s precise 3D shape in the world. This system helps FSD to maintain a coherent truth as it moves forward - and allows it to establish the location of things in space around it, even if they were initially hidden or unclear.
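To make the idea concrete, here is a toy sketch - not Tesla’s actual pipeline, and every name and number is illustrative - of fusing repeated, noisy sightings of the same lane-line point across multiple frames into one stable “ground truth” estimate:

```python
from statistics import mean

def fuse_observations(sightings):
    """Average repeated (x, y, z) sightings of the same lane-line point,
    gathered as the vehicle moves, into one stable estimate."""
    xs = [s[0] for s in sightings]
    ys = [s[1] for s in sightings]
    zs = [s[2] for s in sightings]
    return (mean(xs), mean(ys), mean(zs))

# The same point seen from three vantage points over time;
# each individual sighting is slightly noisy.
sightings = [(2.0, 10.1, 0.02), (2.1, 9.9, -0.01), (1.9, 10.0, -0.01)]
ground_truth = fuse_observations(sightings)
```

The point is that no single frame needs to be accurate - accuracy emerges from the series.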
Author’s Note
Interestingly, Tesla’s patent actually mentions the use of sensors other than Tesla Vision. It goes on to mention radar, LiDAR, and ultrasonic sensors. While Tesla doesn’t use radar (despite HD radars being on the current Model S and Model X) or ultrasonic sensors anymore, it does use LiDAR for training.
However, this LiDAR is used only to establish accurate ground-truth sensor data for training purposes. No Tesla vehicle actually ships with a LiDAR sensor. You can read about Tesla’s LiDAR training rigs here.
Associating the Ground Truth
Once the ground truth is established, it is linked to specific points in time within the time series - usually a single image or the amalgamation of a set of images. This association is critical - it allows the system to predict the complete 3D structure of the environment from just that single snapshot. These associations also serve as a learning tool to help FSD understand the environment around it.
Imagine FSD has worked out the exact curve of a lane line using data from the time series. It then connects that knowledge to the particular image in the sequence where the lane line was visible. Finally, it applies what it has learned - the exact curve, plus the image and its associated data - to predict the 3D shape of the line going forward, even where it cannot yet see the line directly.
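A minimal sketch of that association step - pairing each individual frame with the ground truth recovered from the whole series, so a model can learn to predict the full curve from a single snapshot. None of these names come from the patent:

```python
def build_training_pairs(time_series, ground_truth_curve):
    """Pair each camera frame in the series with the ground-truth 3D
    curve recovered from the *entire* series - the single frame becomes
    the input, the full curve becomes the prediction target."""
    return [
        {"frame": frame_id, "target": ground_truth_curve}
        for frame_id, _partial_view in time_series
    ]

# Each entry: (frame_id, the partial lane-line points visible in that frame)
series = [
    ("frame_001", [(2.0, 10.0, 0.0)]),
    ("frame_002", [(2.0, 10.0, 0.0), (2.2, 20.0, 0.1)]),
]
# The curve recovered by analyzing the whole series at once
curve = [(2.0, 10.0, 0.0), (2.2, 20.0, 0.1), (2.5, 30.0, 0.3)]
pairs = build_training_pairs(series, curve)
```

Note that even frame_001, which only saw one point of the line, is labeled with the complete curve - that is what teaches the model to extrapolate.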
Author’s Note
This isn’t part of the patent, but when you combine that predictive knowledge with precise and effective map data, FSD can better understand the lay of the road and plan its maneuvers ahead of time. We do know that FSD takes mapping information into account. However, live road information from the ground truth takes priority - mapping is just context, after all.
That is why when roads are incorrectly mapped, such as the installation of a roundabout in a location where a 4-way stop previously existed, FSD is still capable of traversing the intersection.
Three Dimensional Features
Representing the features the system picks up in 3D is essential, too. This means that lane lines, to continue our earlier example, must be tracked as they move up and down, left and right, and through time. This 3D understanding is vital for accurate navigation and path planning, especially on roads with curves, hills, or any varying terrain.
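As an illustration only - the field names and values are ours, not the patent’s - a 3D feature sample tracked through time might look like:

```python
from dataclasses import dataclass

@dataclass
class LaneLineSample:
    """One sampled point on a lane line, tracked in full 3D over time."""
    t: float  # timestamp (s)
    x: float  # lateral offset (m)
    y: float  # distance ahead (m)
    z: float  # elevation (m) - lets the model handle hills and crests

# A lane line climbing a hill: y advances while z rises over time.
line = [
    LaneLineSample(t=0.0, x=2.0, y=10.0, z=0.0),
    LaneLineSample(t=0.5, x=2.0, y=20.0, z=0.4),
    LaneLineSample(t=1.0, x=2.1, y=30.0, z=1.1),
]
```

A purely 2D, image-space representation would collapse that elevation change - which is exactly why the patent stresses full 3D features.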
Automated Training Data Generation
One of the major advantages of this entire 3D system is that it generates training data automatically. As the vehicle drives, it collects sensor data and creates time series associated with ground truths.
Tesla does exactly this when it uploads data from your vehicle and analyzes it with its supercomputers. The machine learning model uses all the information it receives to improve its prediction capabilities. This is becoming an increasingly automated process, as Tesla moves away from manually labeling data and instead labels it automatically with AI.
Semantic Labelling
The patent also discusses the use of semantic labeling - a topic covered in our AI Labelling Patent. In short, Tesla labels lane lines as “left lane” or “right lane,” depending on the 3D environment generated through the time series.
On top of that, vehicles and other objects can also be labeled - for example as “merging” or “cutting in.” All of these automatically applied labels help FSD prioritize how it analyzes information and what it expects the environment around it to do.
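A hypothetical sketch of how such behavioral labels could be assigned from tracked motion - the thresholds, field names, and label strings here are invented for illustration, not taken from the patent:

```python
def label_vehicle(track):
    """Assign a coarse semantic label to a tracked vehicle based on its
    motion relative to the ego lane (hypothetical rules and thresholds)."""
    # Drifting quickly toward the ego lane and already close to it
    if track["lateral_speed"] > 0.5 and track["distance_to_ego_lane"] < 1.0:
        return "cutting in"
    # Moving laterally while traveling in a merge lane
    if track["in_merge_lane"] and track["lateral_speed"] > 0.2:
        return "merging"
    return "tracking"

cut_in = {"lateral_speed": 0.8, "distance_to_ego_lane": 0.5,
          "in_merge_lane": False}
merger = {"lateral_speed": 0.3, "distance_to_ego_lane": 3.0,
          "in_merge_lane": True}
```

In practice Tesla derives labels like these with neural networks rather than hand-written rules, but the output - a behavioral tag attached to each tracked object - is the same idea.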
How and When Tesla Uploads Data
Tesla doesn’t simply upload everything its vehicles capture - even though it did pull an astounding 1.28 TB from the author’s Cybertruck once it received FSD V13.2.2. Instead, it transmits selective sensor information in response to triggers. These triggers can include incorrect predictions, user interventions, or failures to correctly conduct path planning.
Tesla can also request all data from certain vehicles based on vehicle type and location - hence the absurd 1.28 TB pulled from one of the first Canadian Cybertrucks. This lets Tesla collect data from specific driving scenarios - which it needs to build models that adapt to more circumstances - while keeping data collection focused and training efficient.
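The selection logic described above can be sketched roughly as follows - the trigger names and the fleet-request filter are invented for illustration, not drawn from Tesla’s actual telemetry system:

```python
# Hypothetical trigger names for events worth uploading
TRIGGERS = {"wrong_prediction", "driver_intervention",
            "path_planning_failure"}

def should_upload(event, fleet_request=None):
    """Decide whether a clip's sensor data gets uploaded: either the
    event matched a trigger, or the vehicle falls inside a fleet-wide
    data request (e.g. a specific model in a specific region)."""
    if event["trigger"] in TRIGGERS:
        return True
    # Tesla can also sweep data from whole vehicle/region cohorts.
    if (fleet_request
            and event["vehicle"] == fleet_request["vehicle"]
            and event["region"] == fleet_request["region"]):
        return True
    return False
```

Both paths serve the same goal: only clips that are likely to improve the model leave the car.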
How It Works
To wrap that all up, the model applies predictions to better navigate through the environment. It uses data collected over time, encapsulated in a 3D model of the environment around the vehicle. Using that 3D environment, Tesla’s FSD formulates predictions about what the environment ahead of it will look like.
This process provides a good portion of the context that is needed for FSD to actually make decisions. But there are quite a few more layers to the onion that is FSD.
Adding in Other Layers
The rest of the decision-making process lies in understanding moving and static objects on the road, as well as identifying and reducing risk to vulnerable road users. Tesla’s 3D mapping also identifies and predicts the pathing of other moving objects, which enables it to conduct its path planning. While this isn’t part of this particular patent per se, it is still an essential element of the entire system.
If all that technical information is interesting to you, we recommend you check out the rest of our series on Tesla’s patents:
How FSD Works Part 2 (this article)
We’ll continue to dive deep into Tesla’s patents, as they provide a unique and interesting way to explain how FSD actually works behind the curtain. It’s an excellent chance to peek at the silicon brains that make the decisions in your car, and to see how Tesla’s engineers actually structure FSD.