How to get into the AI space

The landscape of AI has shifted significantly in the last three years. While a PhD is still the “Golden Ticket” for a pure Research Scientist role (formulating new theories/math), the rise of Large Language Models (LLMs) and massive-scale computing has created a desperate need for Research Engineers.

Being a Product Software Engineer is actually an asset, provided you can pivot your skills toward AI Systems and Engineering.

Here are some ideas and a roadmap for getting into a frontier lab without spending 5+ years on a PhD.


1. Shift Target: The “Research Engineer”

Frontier labs are constrained more by compute engineering than by new math right now. They need people who can make training runs stable, optimize GPU usage, and manage massive datasets.

  • Researcher: Invents the algorithm. (Almost always needs a PhD.)
  • Research Engineer: Implements the algorithm, scales it across 10,000 GPUs, and fixes the bugs. (Needs strong SWE skills + ML fluency).

My Strategy: Leverage product engineering discipline (testing, CI/CD, clean code, distributed systems) and apply it to ML.

2. The “Residency” Route (The Golden Path)

Almost every major AI lab has a “Residency” or “Fellowship” program specifically designed for experienced engineers or brilliant students who want to transition into AI research without a PhD.

  • OpenAI Residency: Designed to bridge the gap.
  • Google AI Residency: The classic program.
  • Meta AI (FAIR) Residency: Highly prestigious.
  • Anthropic: Often hires “Members of Technical Staff” who are generalist engineers.

Reality Check: These are extremely competitive. To get in, I need an impressive portfolio.

3. Build “Proof of Work” (The Portfolio)

I cannot apply with a resume that only lists React/Java/Python product work. I need to signal that I understand the AI stack.

A. Replicate Papers (The Standard Advice)

Don’t just read papers; implement them. This is the advice echoed by many great research engineers.

  • Take a classic paper (e.g., “Attention is All You Need” or “LoRA”) and implement it from scratch in PyTorch without looking at existing code until you are stuck.
  • Write a blog post explaining your implementation and the engineering challenges.
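As a feel for what “implement it from scratch” means, here is a minimal sketch of the scaled dot-product attention at the heart of “Attention is All You Need”, written in NumPy so it stays self-contained (in practice you would write this in PyTorch with batching, masking, and multiple heads):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of queries, keys, and values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of every query to every key
    weights = softmax(scores, axis=-1)  # each row is a probability distribution
    return weights @ V                  # weighted average of the value vectors

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Getting those ~15 lines to match a reference implementation, then extending them to multi-head attention with causal masking, is exactly the kind of exercise that makes a replication blog post worth reading.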

B. Go Lower Level (The High-Value Advice)

Frontier labs value systems optimization. If you learn CUDA programming or Triton, you become dramatically more hireable.

  • Project Idea: Write a custom CUDA kernel to speed up a specific layer of a Transformer.
  • Project Idea: Optimize a model for inference using quantization techniques.
  • Why: PhDs often suck at this. They know the math; you know how to make it run fast.
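To make the second project idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in NumPy, the core trick behind int8 inference (assumptions: real inference stacks like vLLM or TensorRT add per-channel scales, calibration data, and fused int8 kernels on top of this):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization:
    # map the range [-max|w|, +max|w|] onto the int8 range [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding means each weight is off by at most half a quantization step.
max_err = np.abs(w - w_hat).max()
print(q.dtype, max_err <= scale / 2 + 1e-6)
```

A project that measures how this error propagates through a real model’s layers (and where per-channel scales rescue accuracy) is the sort of engineering-flavored work these teams notice.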

C. Contribute to Open Source

Don’t fix typos. Fix bugs or add features to:

  • Hugging Face (Transformers, Accelerate)
  • PyTorch Lightning
  • vLLM (very hot right now for inference)
  • LangChain / LlamaIndex

4. The “Infrastructure Backdoor”

If you are a Senior/Staff Engineer in product, you likely know distributed systems, Kubernetes, or data pipelines.

The Strategy: Apply to an AI Lab for an Infrastructure or Platform role, not a Research role.

  • AI labs need massive infrastructure to run experiments.
  • Once you are inside, you are in the lunch line with the researchers. You can learn by osmosis, help them optimize their code, and eventually transfer internally to the research team. This is a very common path at Google and Meta.

5. The “Stepping Stone” Startup

Going from “Generic SaaS Corp” to “OpenAI” is a massive leap. It is easier to make two jumps:

  1. Jump 1: Join a funded Applied AI startup (e.g., a company building tools for LLMs, vector databases like Pinecone, or specialized models like Jasper/Midjourney). Get “AI” on your resume.
  2. Jump 2: Use that experience to apply to a frontier lab.

6. The Study Plan (What to learn specifically)

You don’t need to know everything. Focus on the modern stack:

  1. The Frameworks: PyTorch (Non-negotiable). JAX (Good for Google/DeepMind).
  2. The Architecture: Transformers. You must understand the Transformer architecture inside and out.
  3. Distributed Training: Learn about Data Parallelism (DDP), Tensor Parallelism, and FSDP. This is how large models are actually trained.
  4. The “Karpathy” Curriculum: Watch Andrej Karpathy’s “Zero to Hero” YouTube series. It is the single best resource for an engineer to understand the guts of GPT.
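The core idea behind data parallelism is simple to state: each worker computes gradients on its own shard of the batch, the gradients are averaged across workers (an all-reduce), and every replica applies the identical update. A toy single-process simulation of that averaging, assuming a simple least-squares loss (real DDP does this across processes with `torch.distributed`):

```python
import numpy as np

def local_grad(w, X, y):
    # Gradient of 0.5 * mean((X @ w - y)^2) on this worker's shard.
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(2)
X = rng.standard_normal((32, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w

# Shard the batch evenly across 4 simulated workers.
shards = np.array_split(np.arange(32), 4)
w = np.zeros(4)
grads = [local_grad(w, X[idx], y[idx]) for idx in shards]

# "All-reduce": average the per-worker gradients so every replica
# applies the same update and the model copies stay in sync.
avg_grad = np.mean(grads, axis=0)
full_grad = local_grad(w, X, y)
print(np.allclose(avg_grad, full_grad))  # True for equal-sized shards
```

Tensor parallelism and FSDP answer a different question (how to split the *model* itself, not just the batch, when it no longer fits on one GPU), which is why they are worth learning separately.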

Summary Checklist

  1. Don’t do a PhD. It takes too long, and the field moves too fast.
  2. Target “Research Engineer” roles.
  3. Watch Karpathy’s videos and implement a GPT from scratch.
  4. Learn CUDA/Triton if you want to be undeniable.
  5. Build a public GitHub repo specifically for AI projects.
  6. Apply to Residencies or Infrastructure roles at the labs.