Until last week, whenever the question of how to build a great foundation model came up, the thinking stopped at compute.
The one with more compute wins.
That changed last week, when DeepSeek rewrote the thesis by cutting compute costs dramatically while delivering output of comparable quality.
This opened the door for talent to think differently, and a new community group has started discussing how to build foundation models.
Below is a summary of that discussion, along with a call to action on how to get started.
Why should you care about foundation models? 🤔
After DeepSeek's breakthrough, there's growing excitement about democratizing AI development. Whether you're a seasoned developer or just starting out, understanding how to build foundation models can open new opportunities in the AI landscape.
Q: Isn't building foundation models too complex for beginners?
A: Not anymore! As one of our community members put it: "Most people can build a basic LLM within three months." The key is structured learning and hands-on practice.
Q: What makes DeepSeek's approach special?
A: They succeeded through clever engineering (a minimal sketch of the first idea follows this list):
Using smaller matrices with a mixture-of-experts architecture
Using 8-bit floating point (FP8) to reduce memory usage
Optimizing GPU communication
Taken together, this shows that innovation isn't just about scale!
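To make the first point concrete, here is a minimal mixture-of-experts layer in PyTorch. The sizes, the router, and the top-2 routing below are assumptions made for this sketch, not DeepSeek's actual design; the point is simply that each token activates only a couple of small expert networks instead of one giant weight matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Illustrative mixture-of-experts layer: many small expert MLPs,
    but each token only activates top_k of them."""

    def __init__(self, d_model=64, d_hidden=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the best top_k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(16, 64)       # a batch of 16 token vectors
print(TinyMoE()(x).shape)     # torch.Size([16, 64])
```

With 8 experts and top-2 routing, only a quarter of the expert parameters are used per token; at scale, that is the kind of saving DeepSeek pushed much further.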
Q: How do I get started? 📚
Our community recommends this learning path:
Foundation Skills
Python basics (Recommended: sentdex YouTube channel)
Basic math (Linear Algebra, Calculus, Probability)
Neural network fundamentals
Core Resources
Practical Projects
Start with perceptron models (a minimal example follows this list)
Build CNNs for image classification
Progress to RNNs for text processing
Graduate to transformer architectures
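If you want a feel for the very first project on that list, here is a minimal perceptron trained on the AND function using nothing but NumPy. The dataset, learning rate, and epoch count are purely illustrative.

```python
import numpy as np

# Minimal perceptron learning the AND function (illustrative example).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)   # step activation
        error = target - pred
        w += lr * error * xi                # perceptron update rule
        b += lr * error

print([int(np.dot(w, xi) + b > 0) for xi in X])   # expected: [0, 0, 0, 1]
```

Once this feels obvious, swapping the step activation for a differentiable one and stacking layers is the natural bridge to the CNN and RNN projects that follow.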
Q: What tools do I need? 🛠️
Essential tools include (a quick environment check follows this list):
Jupyter Notebook
Python with NumPy
PyTorch
Lightning Studio
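A quick way to sanity-check the Python side of that stack, assuming NumPy and PyTorch are already installed (for example via pip install numpy torch):

```python
# Quick environment check for the Python side of the toolchain.
import numpy as np
import torch

print("NumPy:", np.__version__)
print("PyTorch:", torch.__version__)
print("GPU available:", torch.cuda.is_available())

# One small tensor operation to confirm PyTorch works end to end.
print(torch.matmul(torch.randn(2, 3), torch.randn(3, 2)))
```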
Q: Are there any ready-to-use resources?
Yes! Check out:
Bumblebee Project - Community learning initiative
LLMs from Scratch - Step-by-step guide
Q: What's the end goal? 🎯
Our community is working towards building a small language model (a rough sizing sketch follows this list) that can:
Run on a standard laptop
Use Wikipedia data as training corpus
Demonstrate fundamental LLM principles
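To give a rough sense of what "runs on a standard laptop" can mean, here is a sketch of a tiny decoder-only model in PyTorch along with its parameter count. The hyperparameters are arbitrary assumptions for illustration, not the community's agreed design.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Laptop-sized decoder-only language model (illustrative sizes only)."""

    def __init__(self, vocab_size=32_000, d_model=256, n_heads=4,
                 n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):                                   # ids: (batch, seq)
        seq = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(seq, device=ids.device))
        mask = nn.Transformer.generate_square_subsequent_mask(seq)  # causal mask
        return self.head(self.blocks(x, mask=mask))

model = TinyLM()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")   # roughly 20M, comfortable on a laptop CPU
```

A model in this size range can be trained on a Wikipedia-derived text corpus one batch at a time, which is exactly the setting for demonstrating tokenization, attention, and next-token prediction end to end.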
Q: Beyond Transformers: What's Next? 🚀
Keep an eye on:
Mamba architecture (used in the Falcon Mamba family of models); a toy state-space sketch follows these links
Check out: Mamba docs
Research: Mamba paper
Example: Falcon LLM
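For intuition, here is a toy linear state-space recurrence in PyTorch: the building block that Mamba refines by making the parameters input-dependent ("selective") and by computing the scan efficiently on GPU. This is a conceptual sketch only, not the Mamba implementation.

```python
import torch

def ssm_scan(u, A, B, C):
    """Toy state-space recurrence: h_t = A h_{t-1} + B u_t, y_t = C h_t.
    Mamba makes A, B, C depend on the input and fuses this scan into a
    fast GPU kernel; this explicit loop only shows the core idea."""
    batch, seq = u.size(0), u.size(1)
    h = torch.zeros(batch, A.size(0))
    ys = []
    for t in range(seq):
        h = h @ A.T + u[:, t] @ B.T   # state update
        ys.append(h @ C.T)            # readout
    return torch.stack(ys, dim=1)

u = torch.randn(2, 10, 8)             # (batch, seq, d_in)
A = torch.eye(16) * 0.9               # (d_state, d_state): slowly decaying state
B = torch.randn(16, 8)                # (d_state, d_in)
C = torch.randn(8, 16)                # (d_out, d_state)
print(ssm_scan(u, A, B, C).shape)     # torch.Size([2, 10, 8])
```

Because the state is a fixed-size vector, the per-token cost at inference stays constant regardless of context length, which is what makes this family attractive as a transformer alternative.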
Want to join the movement? 💪
Our community believes in:
Learning together
Building from scratch to understand deeply
Supporting each other's growth
The only requirement? Commitment and dedication!
Pro Tips from the Community:
"Focus on understanding before optimization"
"Start small, build up gradually"
"Theory and practice must go hand in hand"
"Don't be afraid to experiment"
Ready to dive in?
Community link: Here
Meeting notes, Jan 29th 2025: Here
Models Mafia: Team Introductions 🚀
A diverse group of professionals came together to build foundation models. Here's who we are (anonymized):
Cloud Platform SRE
Works at a major cloud provider on an H-1B visa
From UP (Uttar Pradesh)
In US for 7 years
Loves mathematics and going to the gym
Tech Consultant & Coach
Teaches teams about tech, including AI and low-level systems
Background: ~3 years biology, ~7 years CS
Lives in SF with a Jersey connection
Focused on helping teams stay current with tech
Tech Giant SWE
Long-time US resident
Background in operations research, simulation
Currently works in distributed systems
Works in cloud infrastructure team
Dual-Role AI Consultant
Day: Consultant at an AI R&D firm in Noida
Night: Consultant at a US-based AI agents company
Skills: Python, ML/DL libraries, software project management, applied research
Passionate about AI development
FinTech Backend Engineer
Works for a major "too big to fail" bank in NYC
Specializes in Java and backend technologies
Building robust financial systems
Media AI Product Manager
Works at major entertainment company
Based in Los Angeles
Personal: Ran 3 marathons, enjoys surfing, reading biographies
Researches AI in free time
Data Engineer
Works with data pipelines
Experience with basic AI models (supervised/unsupervised)
Built up to neural networks
Worked on image processing models for classification
Startup PM
PM at a logistics startup
Builder by night
IIT Kharagpur graduate (2022)
Experience with agentic applications and basic fine-tuning
Active in tech community
AI Startup Founder
Founded company working on satellite imagery
Developing real-time insights at the edge
Using advanced GenAI models
Innovating in space tech
Materials Scientist
Background in materials science
Experience in process engineering
Expertise in micro- and nanofabrication and advanced materials characterization
Working on personal AI project
Enterprise Success Leader
Customer Success Manager at major CRM company
10 years of IT development experience
Currently brainstorming a side hustle
Interested in LLM building and collaboration