Breaking Barriers in AI: Foundation Models - Funding, Challenges and Approach
Key Developments and Next Steps
Overview
The third day of collaboration in the foundation models group brought significant momentum in both technical discussions and execution. From documentation strategies to technical bottlenecks and funding solutions, the group continues to push forward with strong coordination. The key themes of the day revolved around model training infrastructure, distributed inference, and strategic funding.
Technical Documentation and Coordination
Documentation: Various options for hosting and managing technical documentation were considered. GitHub was chosen as the primary repository for now, with Confluence noted as a possible alternative.
Funding and Grants: Blogs and technical documentation are expected to play a key role in securing grants.
Collaboration Tools: To ensure smooth coordination, members discussed leveraging VPN-based network setups and optimizing model sharding for efficient inference across multiple devices.
Technical Challenges and Solutions
1. Inference Engine and Model Support
Exo's Compatibility: The group identified a key constraint with inference engines: Exo supports both TinyGrad and MLX, but MLX runs only on Apple silicon, so Linux devices are restricted to TinyGrad, limiting model availability.
Mac-Based Optimization: Thunderbolt connections were suggested to optimize networking speeds between Mac devices.
Multi-GPU Support: Workarounds were discussed for better multi-GPU integration in Exo, especially in networked environments.
2. Model Sharding and Network Limitations
Latency Issues: Sharding over network gateways poses challenges due to hardware constraints and bandwidth limitations.
Potential Solutions: High-throughput switches and optimized network routing were considered.
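To make the latency discussion concrete, here is a back-of-envelope sketch of the per-token communication cost when a model is sharded across networked devices: each generated token ships a hidden activation across every shard boundary. All numbers (hidden size, link latency, bandwidth) are illustrative assumptions, not measurements from our setup.

```python
# Back-of-envelope sketch: per-token communication overhead for a model
# sharded across devices over a network. Numbers are illustrative.

def per_token_overhead_ms(hidden_size, num_boundaries, latency_ms,
                          bandwidth_gbps, bytes_per_val=2):
    """Estimated extra milliseconds per generated token.

    hidden_size     -- activation width shipped at each shard boundary
    num_boundaries  -- device-to-device hops in the pipeline
    latency_ms      -- one-way link latency per hop (ms)
    bandwidth_gbps  -- usable link bandwidth (gigabits/s)
    bytes_per_val   -- 2 for fp16 activations
    """
    payload_bits = hidden_size * bytes_per_val * 8
    transfer_ms = payload_bits / (bandwidth_gbps * 1e9) * 1e3
    return num_boundaries * (latency_ms + transfer_ms)

if __name__ == "__main__":
    # Llama-style hidden size 4096, fp16, 3 shard boundaries.
    gige = per_token_overhead_ms(4096, 3, 0.5, 1.0)     # gigabit Ethernet
    tbolt = per_token_overhead_ms(4096, 3, 0.05, 20.0)  # Thunderbolt-class link
    print(f"1 GbE: {gige:.3f} ms/token, Thunderbolt: {tbolt:.3f} ms/token")
```

The takeaway is that per-hop latency, not raw bandwidth, dominates at these payload sizes, which is why faster interconnects like Thunderbolt and better switches were on the table.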
3. Distributed Training Feasibility
Current Status: Exo does not yet support efficient distributed training across multiple GPUs.
DiLoCo Methodology: Distributed Low-Communication training (DiLoCo) was proposed as a way to cut synchronization overhead: each worker runs many local optimizer steps between infrequent global syncs, rather than communicating every step.
Training Over Networks: Members explored using cloud-based GPUs to reduce reliance on local infrastructure.
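The DiLoCo idea above can be sketched at toy scale. Each worker runs H local steps from the shared weights, workers average their parameter deltas ("pseudo-gradients"), and an outer optimizer applies the average, so communication happens once per round instead of once per step. The real method uses AdamW inner and Nesterov-momentum outer optimizers on transformer weights; this 1-D quadratic with plain SGD is only meant to show the loop structure.

```python
# Toy sketch of DiLoCo's inner/outer loop on a 1-D quadratic objective.
# Illustrative only: real DiLoCo trains transformers with AdamW (inner)
# and Nesterov momentum (outer).

def diloco(worker_targets, rounds=50, inner_steps=20,
           inner_lr=0.1, outer_lr=1.0):
    """Worker i minimizes (w - t_i)^2 locally; sync once per round."""
    w_global = 0.0
    for _ in range(rounds):
        deltas = []
        for t in worker_targets:
            w = w_global                       # start from shared weights
            for _ in range(inner_steps):       # H local steps, no communication
                grad = 2.0 * (w - t)
                w -= inner_lr * grad
            deltas.append(w_global - w)        # this worker's pseudo-gradient
        avg_delta = sum(deltas) / len(deltas)  # one all-reduce per round
        w_global -= outer_lr * avg_delta       # outer optimizer step
    return w_global

if __name__ == "__main__":
    # Workers pull toward different optima; the synced model converges
    # near their consensus (here, the mean of the targets).
    print(diloco([1.0, 2.0, 3.0]))
```

With 20 inner steps per round, each worker communicates 20x less often than synchronous data parallelism, which is exactly the property that makes training over slow network links plausible.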
Key Progress Points
🚀 Step 0 (PoC) Completed:
Successfully installed Exo (from ExoLabs) on Mac and Linux devices.
Validated distributed compute approach with smaller models.
Identified network latency as a bottleneck.
Confirmed Exo’s model sharding works but has limitations.
🔜 Step 1: Expanding the Network (Ongoing)
VPN Integration: Securely connecting multiple devices across networks.
Optimizing Sharding: Reducing network overhead for improved performance.
Model Selection: Testing different models across Exo’s supported inference engines.
Funding and Infrastructure Strategies
Cloud Computing Credits and Resource Allocation
Members are pooling resources to rent high-performance GPUs (H100s on Vast.ai, DigitalOcean GPU droplets, etc.).
Free Credits: AWS, GCP, and Microsoft provide startup credits. Applications are underway.
Strategic Funding: Exploring grants via the India AI mission and private donors.
Project Nalanda: A New Initiative
Goal: Build a foundation model trained on Vedic literature, Sanskrit texts, and Indic knowledge.
First Steps: Train an LLM on Hindi Wikipedia as a proof-of-concept.
Long-Term Vision: Recreate lost knowledge from Nalanda, incorporating translations and cross-cultural texts.
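Before a Wikipedia-scale run, the proof-of-concept pipeline can be exercised end to end at toy scale. The sketch below trains a character-bigram "model" on a few words of Devanagari text; the corpus, function names, and scale are all illustrative stand-ins for the actual Hindi Wikipedia training run.

```python
# Minimal character-bigram language model over Devanagari text -- a toy
# stand-in for the Hindi Wikipedia proof-of-concept. Corpus is illustrative.
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each character, which characters follow it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, ch):
    """Most frequent character observed after `ch` in the corpus."""
    return counts[ch].most_common(1)[0][0]

if __name__ == "__main__":
    corpus = "नमस्ते दुनिया नमस्ते दुनिया नमस्ते"
    model = train_bigrams(corpus)
    print(predict_next(model, "न"))
```

Swapping the bigram counter for a small transformer and the toy corpus for a Hindi Wikipedia dump is the same pipeline at larger scale, which is what makes this a useful smoke test.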
Next Steps and Immediate Priorities
MVP Development:
Train a small Hindi language model using Wikipedia data.
Deploy a basic chatbot for religious literature discussions.
Optimize inference using Exo’s model partitioning.
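The partitioning step in the MVP plan can be sketched as follows: assign each device a contiguous range of layers proportional to its available memory, in the spirit of Exo's memory-weighted ring partitioning. The function name, device list, and numbers here are illustrative, not Exo's actual API.

```python
# Sketch of memory-weighted layer partitioning, in the spirit of Exo's
# ring partitioning strategy. Names and numbers are illustrative,
# not Exo's actual API.

def partition_layers(num_layers, device_memory_gb):
    """Assign contiguous layer ranges proportional to each device's memory."""
    total = sum(device_memory_gb)
    bounds, start, cum = [], 0, 0.0
    for i, mem in enumerate(device_memory_gb):
        cum += mem
        # Last device always takes the remainder so every layer is covered.
        end = (num_layers if i == len(device_memory_gb) - 1
               else round(num_layers * cum / total))
        bounds.append((start, end))   # half-open range [start, end)
        start = end
    return bounds

if __name__ == "__main__":
    # 32-layer model over a 24 GB Mac, a 16 GB Mac, and an 8 GB Linux box.
    print(partition_layers(32, [24, 16, 8]))
```

Proportional splits like this keep the biggest device from idling while a small one thrashes, which matters most in the mixed Mac/Linux fleet described above.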
Infrastructure & Compute Planning:
Finalize cloud GPU rentals and distributed computing setup.
Implement optimizations for cross-network training.
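One standard bandwidth-saving optimization for cross-network training (not something already implemented in our stack) is top-k gradient sparsification: ship only the largest-magnitude gradient entries each step and carry the rest forward in a local residual. A minimal sketch, with an illustrative gradient vector:

```python
# Sketch of top-k gradient sparsification with error feedback, a common
# bandwidth-saving technique for training over slow links. Illustrative
# values; not part of our current pipeline.

def sparsify(grad, residual, k):
    """Return (sparse_update, new_residual).

    sparse_update -- {index: value} for the k largest-magnitude entries
    new_residual  -- entries not sent, deferred to the next step
    """
    full = [g + r for g, r in zip(grad, residual)]
    top = sorted(range(len(full)), key=lambda i: abs(full[i]), reverse=True)[:k]
    sparse = {i: full[i] for i in top}
    new_residual = [0.0 if i in sparse else full[i] for i in range(len(full))]
    return sparse, new_residual

if __name__ == "__main__":
    grad = [0.9, -0.05, 0.02, -1.2]
    sparse, residual = sparsify(grad, [0.0] * 4, k=2)
    print(sparse)     # the two largest-magnitude entries, by index
    print(residual)   # small entries accumulated for the next step
```

Sending 2 of 4 entries halves the payload here; at model scale, k is typically a small fraction of the parameter count, and the residual ensures no gradient signal is permanently dropped.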
Community Expansion & Awareness:
Onboard new contributors and align efforts across sub-projects.
Explore government and private funding avenues.
Maintain active documentation and technical blogging for transparency and credibility.
The third day’s discussions solidified our technical foundation, clarified our immediate priorities, and laid the groundwork for securing funding based on progress.
The enthusiasm within the group remains strong, with promising steps towards both short-term deliverables and long-term innovation in open-source foundation models.
Community link: Here
Previous posts: