Until last week, whenever the question of how to build a great foundation model came up, the thinking stopped at compute.
The one with more compute wins.
That changed last week, when DeepSeek rewrote the thesis by cutting compute costs dramatically while delivering output of comparable quality.
This opened the door for talent to think differently, and a new community group has started discussing how to build foundation models.
Below is a summary of that discussion, along with a call to action on how to get started.
Why should you care about foundation models? 🤔
After DeepSeek's breakthrough, there's growing excitement about democratizing AI development. Whether you're a seasoned developer or just starting out, understanding how to build foundation models can open new opportunities in the AI landscape.
Q: Isn't building foundation models too complex for beginners?
A: Not anymore! As one of our community members put it: "Most people can build a basic LLM within three months." The key is structured learning and hands-on practice.
Q: What makes DeepSeek's approach special?
A: They succeeded through clever engineering (a minimal sketch of the first idea follows this list):
Using smaller matrices with a mixture-of-experts architecture
Using 8-bit floating point (FP8) to reduce memory usage
Optimizing GPU communication
Taken together, this shows that innovation isn't just about scale!
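To make the first point concrete, here is a minimal mixture-of-experts layer in PyTorch. The sizes, the router, and the top-2 routing below are assumptions made for this sketch, not DeepSeek's actual design; the point is simply that each token activates only a couple of small expert networks instead of one giant weight matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Illustrative mixture-of-experts layer: many small expert MLPs,
    but each token only activates top_k of them."""

    def __init__(self, d_model=64, d_hidden=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the best top_k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(16, 64)       # a batch of 16 token vectors
print(TinyMoE()(x).shape)     # torch.Size([16, 64])
```

With 8 experts and top-2 routing, only a quarter of the expert parameters are used per token; at scale, that is the kind of saving DeepSeek pushed much further.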
Q: How do I get started? 📚
Our community recommends this learning path:
Foundation Skills
Python basics (Recommended: sentdex YouTube channel)
Basic math (Linear Algebra, Calculus, Probability)
Neural network fundamentals
Core Resources
Practical Projects
Start with perceptron models (a minimal example follows this list)
Build CNNs for image classification
Progress to RNNs for text processing
Graduate to transformer architectures
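If you want a feel for the very first project on that list, here is a minimal perceptron trained on the AND function using nothing but NumPy. The dataset, learning rate, and epoch count are purely illustrative.

```python
import numpy as np

# Minimal perceptron learning the AND function (illustrative example).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)   # step activation
        error = target - pred
        w += lr * error * xi                # perceptron update rule
        b += lr * error

print([int(np.dot(w, xi) + b > 0) for xi in X])   # expected: [0, 0, 0, 1]
```

Once this feels obvious, swapping the step activation for a differentiable one and stacking layers is the natural bridge to the CNN and RNN projects that follow.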
Q: What tools do I need? 🛠️
Essential tools include (a quick environment check follows this list):
Jupyter Notebook
Python with NumPy
PyTorch
Lightning Studio
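A quick way to sanity-check the Python side of that stack, assuming NumPy and PyTorch are already installed (for example via pip install numpy torch):

```python
# Quick environment check for the Python side of the toolchain.
import numpy as np
import torch

print("NumPy:", np.__version__)
print("PyTorch:", torch.__version__)
print("GPU available:", torch.cuda.is_available())

# One small tensor operation to confirm PyTorch works end to end.
print(torch.matmul(torch.randn(2, 3), torch.randn(3, 2)))
```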
Q: Are there any ready-to-use resources?
Yes! Check out:
Bumblebee Project - Community learning initiative
LLMs from Scratch - Step-by-step guide
Q: What's the end goal? 🎯
Our community is working towards building a small language model (a rough sizing sketch follows this list) that can:
Run on a standard laptop
Use Wikipedia data as training corpus
Demonstrate fundamental LLM principles
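To give a rough sense of what "runs on a standard laptop" can mean, here is a sketch of a tiny decoder-only model in PyTorch along with its parameter count. The hyperparameters are arbitrary assumptions for illustration, not the community's agreed design.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Laptop-sized decoder-only language model (illustrative sizes only)."""

    def __init__(self, vocab_size=32_000, d_model=256, n_heads=4,
                 n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):                                   # ids: (batch, seq)
        seq = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(seq, device=ids.device))
        mask = nn.Transformer.generate_square_subsequent_mask(seq)  # causal mask
        return self.head(self.blocks(x, mask=mask))

model = TinyLM()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")   # roughly 20M, comfortable on a laptop CPU
```

A model in this size range can be trained on a Wikipedia-derived text corpus one batch at a time, which is exactly the setting for demonstrating tokenization, attention, and next-token prediction end to end.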
Q: Beyond Transformers: What's Next? 🚀
Keep an eye on:
Mamba architecture (used in the Falcon Mamba family of models); a toy state-space sketch follows these links
Check out: Mamba docs
Research: Mamba paper
Example: Falcon LLM
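For intuition, here is a toy linear state-space recurrence in PyTorch: the building block that Mamba refines by making the parameters input-dependent ("selective") and by computing the scan efficiently on GPU. This is a conceptual sketch only, not the Mamba implementation.

```python
import torch

def ssm_scan(u, A, B, C):
    """Toy state-space recurrence: h_t = A h_{t-1} + B u_t, y_t = C h_t.
    Mamba makes A, B, C depend on the input and fuses this scan into a
    fast GPU kernel; this explicit loop only shows the core idea."""
    batch, seq = u.size(0), u.size(1)
    h = torch.zeros(batch, A.size(0))
    ys = []
    for t in range(seq):
        h = h @ A.T + u[:, t] @ B.T   # state update
        ys.append(h @ C.T)            # readout
    return torch.stack(ys, dim=1)

u = torch.randn(2, 10, 8)             # (batch, seq, d_in)
A = torch.eye(16) * 0.9               # (d_state, d_state): slowly decaying state
B = torch.randn(16, 8)                # (d_state, d_in)
C = torch.randn(8, 16)                # (d_out, d_state)
print(ssm_scan(u, A, B, C).shape)     # torch.Size([2, 10, 8])
```

Because the state is a fixed-size vector, the per-token cost at inference stays constant regardless of context length, which is what makes this family attractive as a transformer alternative.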
Want to join the movement? 💪
Our community believes in:
Learning together
Building from scratch to understand deeply
Supporting each other's growth
The only requirement? Commitment and dedication!
Pro Tips from the Community:
"Focus on understanding before optimization"
"Start small, build up gradually"
"Theory and practice must go hand in hand"
"Don't be afraid to experiment"
Ready to dive in?
Community link: Here
Meeting notes, Jan 29th 2025: Here
Models Mafia: Team Introductions 🚀
A diverse group of professionals came together to build foundation models. Here's who we are (anonymized):
Cloud Platform SRE
Works at a major cloud provider on an H-1B visa
From UP (Uttar Pradesh)
In US for 7 years
Loves mathematics and going to the gym
Tech Consultant & Coach
Teaches teams about tech, including AI and low-level systems
Background: ~3 years biology, ~7 years CS
Lives in SF with a Jersey connection
Focused on helping teams stay current with tech
Tech Giant SWE
Long-time US resident
Background in operations research, simulation
Currently works in distributed systems
Works in cloud infrastructure team
Dual-Role AI Consultant
Day: Consultant at an AI R&D firm in Noida
Night: Consultant at a US-based AI agents company
Skills: Python, ML/DL libraries, software project management, applied research
Passionate about AI development
FinTech Backend Engineer
Works for a major "too big to fail" bank in NYC
Specializes in Java and backend technologies
Building robust financial systems
Media AI Product Manager
Works at major entertainment company
Based in Los Angeles
Personal: Ran 3 marathons, enjoys surfing, reading biographies
Researches AI in free time
Data Engineer
Works with data pipelines
Experience with basic AI models (supervised/unsupervised)
Built up to neural networks
Worked on image processing models for classification
Startup PM
PM at a logistics startup
Builder by night
IIT Kharagpur graduate (2022)
Experience with agentic applications and basic fine-tuning
Active in tech community
AI Startup Founder
Founded company working on satellite imagery
Developing real-time insights at the edge
Using advanced GenAI models
Innovating in space tech
Materials Scientist
Background in materials science
Experience in process engineering
Expertise in micro- and nanofabrication and advanced materials characterization
Working on personal AI project
Enterprise Success Leader
Customer Success Manager at major CRM company
10 years of IT development experience
Currently brainstorming a side hustle
Interested in LLM building and collaboration