
AMD launches AMD-135M: AMD’s first series of small language models trained from scratch on AMD Instinct™ MI250 accelerators using 670B tokens


AMD has recently presented its new language model, AMD-135M (also known as AMD-Llama-135M), an important addition to the landscape of AI models. Based on the LLaMA2 model architecture, the model has a robust structure with 135 million parameters and is optimized for performance on AMD's latest accelerators, specifically the Instinct MI250. This launch marks a crucial milestone for AMD as it works to establish a strong position in the competitive AI industry.

Background and technical specifications

AMD-135M is built on the LLaMA2 model architecture and integrates features that support a range of applications, particularly text generation and language understanding. The model works seamlessly with the Hugging Face Transformers library, making it accessible to developers and researchers. With a hidden size of 768, 12 layers (blocks), and 12 attention heads, it can handle complex tasks while maintaining high efficiency. The activation function is SwiGLU, layer normalization is based on RMSNorm, and positional information is encoded with rotary positional embeddings (RoPE), improving the model's ability to understand and generate contextual information.
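To make the reported architecture concrete, the following is a minimal sketch of a LLaMA2-style configuration in the Hugging Face Transformers library using the dimensions described above (hidden size 768, 12 layers, 12 attention heads, 2048-token context with RoPE, SwiGLU activations, RMSNorm). The intermediate_size and vocab_size values are illustrative assumptions, not AMD's published settings.

    from transformers import LlamaConfig, LlamaForCausalLM

    # LLaMA2-style configuration mirroring the dimensions reported for AMD-135M.
    config = LlamaConfig(
        hidden_size=768,               # reported hidden size
        num_hidden_layers=12,          # reported number of layers (blocks)
        num_attention_heads=12,        # reported attention heads
        max_position_embeddings=2048,  # reported context window (RoPE positions)
        intermediate_size=2048,        # assumed SwiGLU feed-forward width
        vocab_size=32000,              # assumed LLaMA2-style tokenizer vocabulary
        hidden_act="silu",             # SwiGLU gating uses SiLU in this implementation
    )

    # Random initialization, shown only to check that the sizes land near 135M parameters.
    model = LlamaForCausalLM(config)
    print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")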

The launch of this model is not just about hardware specifications but also about the software and datasets that power it. AMD-135M has been pre-trained on two key datasets: SlimPajama and Project Gutenberg. SlimPajama is a deduplicated version of RedPajama, drawing on sources such as Common Crawl, C4, GitHub, Books, arXiv, Wikipedia, and StackExchange. The Project Gutenberg dataset provides access to a vast repository of classical texts, exposing the model to a wide variety of linguistic structures and vocabularies.

AMD-135M Key Features

AMD-135M has notable features that differentiate it from other models on the market. Some of these key features include:

  • Parameter size: 135 million parameters, allowing for efficient text processing and generation.
  • Number of layers: 12 layers with 12 attention heads for in-depth analysis and contextual understanding.
  • Hidden size: 768, which offers the ability to handle various language modeling tasks.
  • Attention type: Multi-head attention, which allows the model to focus on different aspects of the input data simultaneously.
  • Context window size: 2048, ensuring that the model can effectively handle larger input data sequences.
  • Pre-training and fine-tuning data sets: The SlimPajama and Project Gutenberg datasets are used for pre-training, and the StarCoder dataset is used for fine-tuning, ensuring comprehensive understanding of the language.
  • Training settings: The model was trained with a 6e-4 peak learning rate under a cosine learning-rate schedule, over multiple epochs, for effective training and tuning (see the sketch after this list).
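As a rough illustration of those training settings, the sketch below sets up an optimizer at the reported 6e-4 peak learning rate with a cosine decay schedule in PyTorch. The optimizer choice (AdamW), the step count, and the training_step/next_batch helpers are hypothetical placeholders, and `model` stands for a causal LM like the one configured earlier; the article only specifies the learning rate and the cosine schedule.

    import torch
    from torch.optim.lr_scheduler import CosineAnnealingLR

    total_steps = 100_000  # assumed; the actual step/epoch count is not given here

    optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)   # reported peak learning rate
    scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)  # cosine decay of the learning rate

    for step in range(total_steps):
        loss = training_step(model, next_batch())  # hypothetical helpers, not AMD's code
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()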

Implementation and use

AMD-135M can be easily implemented and used through the Hugging Face Transformers library: users can load the model with the `LlamaForCausalLM` and `AutoTokenizer` classes. This ease of integration makes it a favorable option for developers looking to incorporate language modeling capabilities into their applications. Additionally, the model supports speculative decoding with CodeLlama, further expanding its usability for code generation tasks. This makes AMD-135M particularly useful for developers working on programming-related text generation and other NLP applications.
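A minimal loading-and-generation sketch along those lines is shown below. The "amd/AMD-Llama-135m" repository name is an assumption about the Hugging Face model ID; consult the official model card for the exact identifier and recommended settings.

    from transformers import AutoTokenizer, LlamaForCausalLM

    model_id = "amd/AMD-Llama-135m"  # assumed Hugging Face repository name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = LlamaForCausalLM.from_pretrained(model_id)

    # Simple greedy text generation with the loaded model.
    inputs = tokenizer("AMD Instinct MI250 accelerators are", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

For speculative decoding, one option in recent Transformers releases is assisted generation, where a small model such as this one is passed as the assistant_model argument to generate() on a larger target model like CodeLlama; the exact pipeline AMD used may differ.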

Performance evaluation

The performance of AMD-135M has been evaluated with the LM Evaluation Harness on several NLP benchmarks, including SciQ, WinoGrande, and PIQA. The results indicate that the model is highly competitive, offering performance comparable to other models in its parameter range. For example, it achieved a pass rate of approximately 32.31% on the HumanEval dataset using MI250 GPUs, a strong result for a model of this size. This suggests that AMD-135M can be a reliable model for commercial and research applications in natural language processing.
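For readers who want to reproduce this kind of benchmark run, the sketch below uses the Python API of EleutherAI's LM Evaluation Harness. The model ID is the same assumed Hugging Face identifier as above, and scores will depend on the harness version and settings, so results may not exactly match the figures reported here.

    import lm_eval  # EleutherAI lm-evaluation-harness

    # Evaluate the (assumed) AMD-135M checkpoint on a few of the cited benchmarks.
    results = lm_eval.simple_evaluate(
        model="hf",                                  # Hugging Face backend
        model_args="pretrained=amd/AMD-Llama-135m",  # assumed model ID
        tasks=["sciq", "winogrande", "piqa"],
        batch_size=8,
    )
    print(results["results"])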

In conclusion, the launch of AMD-135M underscores AMD’s commitment to advancing AI technologies and providing affordable, high-performance models to the research community. Its robust architecture and advanced training techniques position the AMD-135M as a formidable contender in the rapidly evolving landscape of AI models.


Check out the model on Hugging Face for further details. All credit for this research goes to the researchers of this project.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform has more than 2 million monthly visits, illustrating its popularity among readers.


