Introduction
In the rapidly evolving world of AI, foundation models have become the backbone of various applications, from chatbots to advanced research tools. If you're looking to get started with building your own foundation model, this guide is for you. Based on an in-depth discussion among AI enthusiasts and professionals, we’ve compiled key takeaways, methodologies, and resources to help you begin your journey.
Understanding Precision and Bit Representations
One of the earliest topics discussed was floating-point (FP) precision, particularly moving from 32-bit to 16-bit and even to 8-bit representations.
FP8 is gaining traction due to its efficiency in model training and inference.
Researchers have explored binary neural networks, where each weight is constrained to one of just two values (commonly -1 or +1).
The number of bits needed depends on the range and precision of values the model must represent.
FP8 has specific applications, with hardware support today concentrated in NVIDIA's Hopper-generation GPUs such as the H100.
For those interested in learning more, NVIDIA provides an FP8 primer.
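As a quick illustration, here is a minimal sketch of an FP8 round trip using PyTorch's float8 dtype (available from PyTorch 2.1 onward; FP8 compute kernels need Hopper-class hardware, but this conversion runs on any CPU):
import torch

# Cast FP32 values to FP8 (e4m3: 4 exponent bits, 3 mantissa bits) and back.
x = torch.tensor([0.1234, 1.5, 3.14159, 100.0], dtype=torch.float32)
x_fp8 = x.to(torch.float8_e4m3fn)
x_back = x_fp8.to(torch.float32)
print(x_back)      # values rounded to the nearest representable FP8 number
print(x - x_back)  # the quantization error introduced by the narrow format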
Key Players and Their Roles
The group discussing these topics comprises professionals from diverse backgrounds:
Cloud Platform SREs
Tech Consultants & Coaches
Engineers from major tech companies
Embedded chip designers
AI researchers
This mix of expertise allows for deep technical discussions and practical implementation strategies.
Training vs. Inference: Where Does FP8 Fit?
FP8 is primarily used for inference but is increasingly being adopted for training.
DeepSeek models have been run on H100s for inference, which underscores that access to high-end GPUs is important for using FP8 effectively.
GPU rental marketplaces such as SFCompute provide H100s at affordable rates (under $2 per hour), making them accessible to independent researchers.
Setting Up Infrastructure for Model Training
To build a foundation model, you need the right infrastructure. Here are the key steps outlined in the discussion:
Step 0: Set up a Proof of Concept (PoC) using personal devices on a local network.
Step 1: Create a network gateway to connect devices into a secure shared network.
Step 2: Scale the setup by adding more devices and optimizing compute resources.
Available Compute Options
Several cost-effective options exist for AI model training:
SFCompute - Low-cost GPU access.
ExoLabs - Connect multiple devices into a cluster.
Vast.ai - Offers cheaper GPU rentals.
Jetson Orin Nano Super - A powerful $250 hardware alternative for local experimentation.
Step 0: Setting Up ExoLabs for Distributed Compute
ExoLabs enables running AI models across multiple devices. Steps to set it up:
Clone the ExoLabs repo:
git clone https://github.com/exo-explore/exo.git
Change into the repo directory and install dependencies:
cd exo
pip install -e .
Start exo on each device; nodes on the same local network discover each other automatically.
Download a model and test inference speed.
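Once the cluster is up, exo exposes a ChatGPT-compatible HTTP API, so the inference test can be scripted. A minimal sketch, assuming the default local endpoint (the port and model id below are assumptions; check exo's console output for the real address and use a model you have downloaded):
import requests

# Query the ChatGPT-compatible endpoint the exo node serves.
resp = requests.post(
    "http://localhost:52415/v1/chat/completions",  # port is an assumption
    json={
        "model": "llama-3.2-1b",  # hypothetical id; substitute your model
        "messages": [{"role": "user", "content": "Hello from the cluster!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])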
Challenges in Distributed Computing
One of the major roadblocks in scaling foundation models is network latency. ExoLabs uses a ring topology to distribute the model, but bandwidth limitations can slow down inference speeds. Possible solutions include:
Using high-speed switches and 10Gbps Ethernet connections.
Implementing federated learning to perform local computations before aggregating model updates.
Sharding models smartly based on network speed and compute capacity (a simple proportional scheme is sketched below).
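To make the last idea concrete, here is an illustrative sketch (not ExoLabs' actual algorithm) that splits a model's layers across devices in proportion to a rough capacity score per device:
def shard_layers(num_layers, device_scores):
    """Assign contiguous layer ranges proportional to each device's score."""
    total = sum(device_scores)
    shards, start = [], 0
    for i, score in enumerate(device_scores):
        # Give the last device the remainder to avoid rounding gaps.
        if i == len(device_scores) - 1:
            count = num_layers - start
        else:
            count = round(num_layers * score / total)
        shards.append(range(start, start + count))
        start += count
    return shards

# Example: a 32-layer model across a desktop GPU, a laptop, and a Jetson.
print(shard_layers(32, [4.0, 2.0, 1.0]))  # [range(0, 18), range(18, 27), range(27, 32)]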
Selecting a Niche for Model Training
While general-purpose models exist, niche foundation models can offer superior accuracy in specialized fields. Some suggested areas:
Health AI – Training a model on medical records for better diagnosis.
Legal AI – AI-assisted legal document review.
Religious & Historical Texts – Training on Vedic texts or Upanishads for philosophical inquiries.
Agriculture & Weather – Providing accurate climate and farming predictions in regional languages.
Benchmarking and Evaluating Models
Before deploying models, it's crucial to benchmark them for accuracy and efficiency. Some useful benchmarks include:
CRM Benchmark by Salesforce – Tests structured reasoning capabilities.
Perplexity Measurement – Evaluates model fluency and prediction accuracy (a minimal computation is sketched after this list).
Cross-lingual Testing – Validating models in multiple languages for broader applicability.
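For reference, perplexity is simply the exponential of the average per-token cross-entropy loss. A minimal sketch with Hugging Face transformers (the model name is just an example):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # any causal LM you can load locally
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

text = "The quick brown fox jumps over the lazy dog."
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    # Passing labels=input_ids makes the model return mean cross-entropy.
    loss = model(ids, labels=ids).loss
print(f"perplexity = {torch.exp(loss).item():.2f}")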
Next Steps and Collaboration
Moving to Step 1: Expanding the Network
Now that the PoC is underway, the next phase involves creating a secure gateway to connect devices across different networks. The focus areas will be:
Setting up VPN-based secure access.
Ensuring visibility of all connected devices within ExoLabs.
Optimizing load distribution across compute resources.
Community-Driven Open Source Development
To accelerate progress, the team is maintaining open-source repositories for collaborative work. Contributions are welcome in areas such as:
Implementing a Hindi foundation model.
Improving ExoLabs for better distributed inference.
Experimenting with knowledge distillation techniques for training efficiency (a minimal loss function is sketched below).
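For anyone new to distillation, the standard Hinton-style loss has a small student model match a larger teacher's softened output distribution. A minimal sketch (illustrative, not a specific recipe from the group):
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

# Example with random logits for a batch of 4 over a 10-token vocabulary.
print(distillation_loss(torch.randn(4, 10), torch.randn(4, 10)))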
Conclusion
Building a foundation model is an ambitious but achievable goal. With the right infrastructure, compute power, and collaborative effort, even small teams can contribute to the growing AI ecosystem.
Whether you're an AI researcher, engineer, or enthusiast, this guide provides the roadmap to get started. Stay engaged, keep experimenting, and push the boundaries of what's possible in AI.
Ready to dive in?
Community link: Here
Meeting notes, Jan 29th 2025: Here
Previous post link:
New Members:
Criston Mascarenhas – Fullstack developer working in the assistive technology space.
Paul John – BSc in electrical and electronics engineering, specializes in embedded chip design, familiar with RTOS, SPI, CAN, IoT, currently studying engineering management in the US. Interested in AI for enhancing hardware solutions.
Arun Jalanila – Data Scientist at a healthcare company, works on NLP pipelines and GenAI products.
Rishal Raj – Interested in building and optimizing AI-based solutions.
Kawaljeet Kumar – Owns a café, experimenting with AI use cases in the food sector.
Pratyush Shukla – Experienced in resource management for AI workloads.
Krishna Kumar Soni – Interested in AI applications for specific industry use cases.
Current Challenges:
Challenges in Distributed Computing
Network Latency & Bottlenecks: When running distributed inference across multiple devices, network speed becomes a critical factor; the slowest link bottlenecks the entire pipeline.
Bandwidth Constraints: Standard local-network speeds (1Gbps or even 10Gbps Ethernet) may not be sufficient for efficient distributed inference, while high-speed networking (100Gbps) is expensive.
Sharding Complexity: Efficiently partitioning and distributing model layers across devices without excessive latency is a challenge.
Federated Learning Issues: While federated learning helps in distributed training, it still requires well-optimized aggregation mechanisms (a bare-bones FedAvg sketch follows this list).
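As a reference point for what "aggregation" means here, this is a bare-bones FedAvg sketch that averages model weights in proportion to how many samples each device trained on; real systems add compression, secure aggregation, and staleness handling on top:
import torch

def fedavg(state_dicts, sample_counts):
    """Weighted average of model state dicts, weighted by local sample counts."""
    total = sum(sample_counts)
    return {
        key: sum(sd[key] * (n / total) for sd, n in zip(state_dicts, sample_counts))
        for key in state_dicts[0]
    }

# Example: average two tiny "models" trained on 100 and 300 samples.
a = {"w": torch.ones(2)}
b = {"w": torch.zeros(2)}
print(fedavg([a, b], [100, 300]))  # {'w': tensor([0.2500, 0.2500])}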
Challenges in Model Training & Optimization
FP8 Adoption & Accuracy Trade-offs: While FP8 offers efficiency gains, it is not always viable for every part of the model. Sensitive areas like embeddings require higher precision.
Compute Cost & Availability: Training a full-scale foundation model requires high-end GPUs like H100s, which are expensive and not always accessible.
Resource Allocation for Hindi & Other Niche Models: Training models for Indian languages or domain-specific use cases requires custom datasets and computational resources.
Memory & Compute Limits on Local Devices: Training on consumer hardware is limited by VRAM, requiring optimization techniques like quantization and model parallelism (a toy quantization sketch follows this list).
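To illustrate the quantization point, here is a toy 8-bit absmax weight quantizer; production libraries such as bitsandbytes are far more sophisticated, but the core idea is the same:
import torch

def quantize_int8(w):
    scale = w.abs().max() / 127.0            # absmax scaling for the tensor
    q = torch.round(w / scale).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.to(torch.float32) * scale

w = torch.randn(4, 4)
q, s = quantize_int8(w)
print((w - dequantize(q, s)).abs().max())    # worst-case quantization error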
Challenges in Infrastructure & Deployment
Setting Up Secure Networks: Step 1 involves creating a gateway to securely connect multiple devices across different locations, which requires expertise in networking.
Scaling Compute Efficiently: Expanding from a small setup to a large-scale distributed system introduces challenges in device synchronization and load balancing.
Cost Optimization for Cloud & On-Premise Resources: Using cloud services like SFCompute or Vast.ai can be expensive for long-term training unless optimized correctly.
Ensuring Model Trustworthiness & Bias Reduction: Current LLMs often exhibit biases, and training models in local languages like Hindi or domain-specific knowledge areas requires careful dataset curation.