Hello friends! Once again, it has been quite a while since I last posted. Heads up, this will be a long post. Grab some popcorn! This post will cover how I started at NVIDIA, worked on cutting-edge autonomous cars, and then transitioned into the equally cutting-edge field of generative AI.
PS: the visuals in this post were generated by DALL-E 3 through OpenAI’s ChatGPT (GPT-4).
My last post, Experimenting with Light Fields, was back in April 2021, when I was just about to graduate from Carnegie Mellon with my MS in Robotic Systems Development. After graduation, I joined NVIDIA as a systems software engineer on the DRIVE Autonomous Vehicles (AV) team.
Understanding NVIDIA’s Role in Accelerated Computing
When I told people that I was building self-driving cars at NVIDIA, their immediate response was usually, “wait, NVIDIA works on autonomous cars?” The answer is yes, we do. NVIDIA was founded in 1993 to build hardware chips and systems for computer graphics. That mission has since expanded into a wide variety of solutions under the umbrella of “accelerated computing”. As a result, NVIDIA chips are used across nearly every major scientific and industrial vertical, including graphics, cinematography, supercomputing, climate and weather simulation, quantum computing, self-driving, artificial intelligence and robotics. This is what makes it one of the most valuable companies (by market cap) in the world today.
NVIDIA has two approaches to the self-driving market. The first is to build high-performance self-driving SoCs (systems-on-chip) like the NVIDIA Xavier, Orin and Thor. These are ARM-based supercomputers on a chip, with integrated CPUs, GPUs, and custom deep learning and computer vision accelerators, and they are certified for automotive functional safety. They ship with a low-level operating system called DRIVE OS, but beyond that, it is up to the vehicle manufacturer or AV company to build additional software functionality. A number of companies buy this DRIVE SoC hardware and customize it for their needs, including Tesla (from 2016-2018), Zoox, Rivian, Lucid and more.
The second approach is to partner with a large automotive OEM and own the entire stack end-to-end: the hardware, the software and the systems in between, with the automotive company manufacturing only the physical vehicles. This is the NVIDIA DRIVE AV Solution, or NDAS. NVIDIA has partnered with Mercedes-Benz and Jaguar Land Rover on these capabilities. Starting in 2024 and 2025 respectively, every new vehicle from these companies will ship with NVIDIA hardware and software on board!
My Role in the Fleet Engineering Team
In the NVIDIA AV team, I was one of the first members of the Fleet Engineering initiative. This team was responsible for building tools and workflows that accelerated our day-to-day in-car testing and ensured that our vehicles could get out on test drives as quickly as possible.
The team was hugely cross-functional. We were the interface through which all other teams interacted with the vehicles. We built core infrastructure for vehicle setup, including installing custom software builds, running various tests at a regular cadence, and setting up standardized methods of validating drive quality. Our team was regularly in the AV garage, and we had the unique position of being able to watch the software quality improve over time.
The other advantage of being a founding engineer of a small team at a large company is that it operates just like a startup with infinite capital. You have the freedom to take on anything that can make the product or company better, and to deploy it into the vehicles immediately. Rapid iteration like a startup, at the scale of a large company: the idea that software I built would go into production vehicles selling at over 2 million cars per year (Mercedes-Benz) was thrilling.
About a year to a year and a half into this role, the world began seeing renewed interest in the field of generative AI. Generative AI, simply put, is a set of techniques for creating new content in the form of text, images, audio and more. The current wave (the “AI Spring”) began in August 2022 with the launch of Stable Diffusion, a deep learning model that could convert text into images, and it absolutely exploded with the launch of ChatGPT by OpenAI in November 2022.
Project Athena: Bridging AI and Practical Solutions
Around the same time, I began experimenting with how we could automate or speed up the resolution of issues within our AV fleet testing. Our vehicles would go out on daily drives along fixed routes, with test plans generated by developers or by our software QA teams. During these drives, if the test operators faced issues, they would post them on a Slack hotline channel with a detailed description of the problem, a screenshot of the issue, and so on. Wouldn’t it be great if we could automatically analyze the screenshot or message, identify the issue, check whether it had been discussed before, and then provide the user with suggestions on how to fix it?
With the above problem statement in mind, I began working on Project Athena in September 2022. Athena was a Slack bot that would automatically respond to user messages on the hotline channel and provide solutions based on similar issues from the past. Here’s a screenshot of Athena’s description:
So how does Athena work? It started off pretty simple. I built on some research work I had done in the past: “Enriching Social Media Personas with Personality Traits: A Deep Learning Approach Using the Big Five Classes”, which won the Best Paper Award in the AI in HCI track at the HCI International 2020 Conference in Copenhagen, Denmark, and has over 30 citations as of January 2024. In that work, we classified tweets into one of the Big Five personality classes using pre-trained language models like BERT and DistilBERT. These models convert text into mathematical vector representations with certain useful properties. One of these properties is that semantically similar phrases and paragraphs are mapped to nearby locations in vector space, meaning the vector distance between them is small. This lets us find which sentences or paragraphs share the same semantic meaning.
For Athena, this meant that we could convert every new user issue into a semantic vector embedding, then compare it against all the past issue embeddings to find which past issues were similar to the new one. Once we had those, we could simply retrieve the solutions to those past issues and show them to the user.
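To make this concrete, here is a minimal sketch of that retrieval step. It uses the open-source sentence-transformers library and a generic public model name purely as stand-ins; the actual embedding model, issue database and infrastructure behind Athena were internal, so treat every name and value below as illustrative.

```python
# Minimal sketch of Athena's first version: embed past issues once,
# then find the closest matches for each new issue by semantic similarity.
# The model name and sample data are placeholders, not the internal setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

# Past issues and their known fixes (in reality, pulled from Slack history)
past_issues = [
    "Recording service fails to start after installing a new build",
    "GPS signal drops intermittently on the highway loop route",
    "Camera feed shows a black screen after the car boots",
]
past_fixes = [
    "Re-flash the build and restart the recording daemon",
    "Check the antenna connection and re-run the localization test",
    "Power-cycle the camera ECU and verify the cable seating",
]

# Pre-compute embeddings for all past issues
past_embeddings = model.encode(past_issues, convert_to_tensor=True)

def find_similar_issues(new_issue: str, top_k: int = 5):
    """Return the top_k most semantically similar past issues and their fixes."""
    query_embedding = model.encode(new_issue, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, past_embeddings, top_k=top_k)[0]
    return [(past_issues[h["corpus_id"]], past_fixes[h["corpus_id"]], h["score"])
            for h in hits]

for issue, fix, score in find_similar_issues("camera output is completely black"):
    print(f"{score:.2f}  {issue}  ->  {fix}")
```

The key point is that no keyword matching is involved: the new report and the historical ones only need to mean the same thing, not share the same words.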
It started out great. Athena would send a response to every new user query on the channel. But we soon realized that Athena was still limited: she would respond with five past issues or their fixes, but it was up to the user to filter through them and try them out to see if they worked. The burden of verification was still on the user; Athena was just a slightly smarter information retrieval system.
The next obvious improvement was for Athena to respond more like ChatGPT: generate an entirely new answer based on knowledge from past issues, recent bugs, documentation, user manuals and more. This would make Athena something like an in-car intelligent assistant for troubleshooting and fixing problems. For this, I began experimenting with large language models, or LLMs. ChatGPT itself relies on an LLM created by OpenAI, called GPT-3 (and now GPT-4). These models are defined by a neural network architecture consisting of layers of mathematical operations connected by learned weights. From a layman's perspective, these generative LLMs take plain language (usually English) as input and can respond in various languages, including human ones like English but also programming languages like Python, C++ or Java.
In line with NVIDIA’s vision of accelerated computing, the company has invested since 2006 in a software and systems stack named CUDA (which I have blogged about before in Programming in CUDA). This stack forms the basis for the large majority of AI model training, deployment and inference today. NVIDIA also provides other frameworks, like NeMo, for generative AI, and our internal NeMo team had hosted various LLMs for use by employees through an API.
Using these internal LLMs, I was able to spin up a new-and-improved version of Athena that took the retrieved past issues as context and generated an entirely new reply: one that incorporated that information but gave the user a precise, customized response. It could also draw on the memory of previous interactions to provide a high-quality, interactive troubleshooting experience.
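Conceptually, this is what is now commonly called retrieval-augmented generation: retrieve the most similar past issues, pack them into a prompt along with the conversation so far, and let the LLM write a tailored answer. Here is a rough sketch of that flow. It reuses the `find_similar_issues` helper from the earlier sketch, and `llm_generate` is a hypothetical placeholder for whichever hosted LLM endpoint you have access to (in Athena's case, the internal NeMo-hosted API), so its name and signature are assumptions rather than a real interface.

```python
# Rough sketch of the retrieval-augmented flow behind the upgraded Athena.
# find_similar_issues() is the retrieval step sketched earlier; llm_generate()
# is a hypothetical wrapper around whatever hosted LLM API is available.

def build_prompt(new_issue: str, retrieved: list[tuple[str, str, float]],
                 history: list[str]) -> str:
    """Assemble retrieved issues, prior conversation turns, and the new issue
    into a single prompt for the LLM."""
    context = "\n".join(
        f"- Past issue: {issue}\n  Fix that worked: {fix}"
        for issue, fix, _score in retrieved
    )
    chat_history = "\n".join(history)
    return (
        "You are Athena, an assistant that troubleshoots issues reported by "
        "autonomous-vehicle test operators.\n\n"
        f"Similar past issues and their fixes:\n{context}\n\n"
        f"Conversation so far:\n{chat_history}\n\n"
        f"New issue: {new_issue}\n"
        "Suggest concrete troubleshooting steps, citing which past issue "
        "each step is based on."
    )

def answer_issue(new_issue: str, history: list[str]) -> str:
    """Retrieve similar past issues, prompt the LLM, and record the exchange."""
    retrieved = find_similar_issues(new_issue, top_k=5)
    prompt = build_prompt(new_issue, retrieved, history)
    reply = llm_generate(prompt)           # placeholder for the hosted LLM call
    history.append(f"User: {new_issue}")   # simple conversational memory
    history.append(f"Athena: {reply}")
    return reply
```

The appended `history` list is what gives Athena its memory of previous turns, so follow-up questions can build on earlier suggestions instead of starting from scratch.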
Working on Athena reminded me of my interest in building applications with deep learning and AI. Moreover, this was now the cutting edge. Startups were launching and research papers were being released every day in 2023, and an enormous amount of funding flowed into the field. NVIDIA’s share price more than tripled over the year (an increase of roughly 240%), catapulting its valuation well past 1 trillion dollars. I wanted to keep working on these applications, and I also felt that such a fast-moving field would give me the best opportunity to grow.
A New Chapter: Senior Software Architect for Generative AI
And so, in October 2023, I decided to transition to the role of Senior Software Architect for Generative AI. This role is part of the Solutions Architecture and Engineering group at NVIDIA, which works directly with enterprise customers on building applications with NVIDIA products. So far, I believe it has been a great decision! I have had the opportunity to work on really exciting projects, the role is at least as cross-functional as my previous one, and I feel a sense of excitement and urgency brought on by the rapid pace of development in the worldwide open-source and research community.
One of the most interesting things to happen in my new role blended my past experience in autonomous vehicles with my current interests in generative AI. Cerence, a company whose in-vehicle experiences ship in 475 million vehicles worldwide, announced that it would partner with NVIDIA to build its own custom CaLLM (Cerence Automotive Large Language Model). In fact, the first version is a prototype I helped build, and it will be demoed at CES 2024!
This post has become quite detailed already, but I will strive to write more posts around the various ideas mentioned here. Most importantly, I want to share how I believe generative AI will revolutionize almost every industry and experience, even the ones that seem unlikely today. With AI assistants to aid us in our day-to-day work, we will see productivity and new opportunities grow!
Thank you so much if you read this far. If you found this useful, I would really appreciate it if you like or share the post. If you have suggestions on what I should write about next, please feel free to leave a comment below. Thanks!