
Apple OpenELM

Quick Take: Apple’s OpenELM Small Language Model

Apple this week unveiled OpenELM, a set of small AI language models designed to run on local devices such as smartphones rather than relying on cloud-based data centers. This reflects a growing trend toward smaller, more efficient AI models that can operate on consumer devices without significant computational resources.

OpenELM models are currently available on the Hugging Face platform under an Apple Sample Code License, allowing developers to explore these models’ capabilities.
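For developers who want to experiment, the checkpoints can be pulled with the standard Hugging Face transformers workflow. The sketch below is a rough illustration, not an official recipe: the repository ID, tokenizer choice, and prompt are assumptions, and the model cards on Hugging Face document the exact usage.

```python
# Minimal sketch of loading an OpenELM checkpoint with the transformers library.
# The repo ID and tokenizer below are illustrative assumptions; consult the
# model card on Hugging Face for the exact, supported usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-270M-Instruct"  # assumed repo ID for the smallest instruct variant

# OpenELM ships custom modeling code, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# The model cards pair OpenELM with a Llama-family tokenizer; this ID is an assumption.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Write a haiku about on-device AI.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```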

Apple’s OpenELM suite contains eight models with 270 million to 3 billion parameters. The models are divided into two categories: four “pretrained” models, which are raw, next-token predictors, and four “instruction-tuned” models, designed for instruction-following applications such as chatbots and AI assistants.

This parameter range contrasts with much larger models such as Meta's Llama 3, which has a 70-billion-parameter version, and OpenAI's GPT-3, with 175 billion parameters. However, smaller models are gaining ground in both capability and efficiency, thanks to recent research into optimizing AI performance with fewer resources.

Apple trained OpenELM models using a 2048-token context window on a mix of public datasets totaling around 1.8 trillion tokens. The models incorporate a “layer-wise scaling strategy,” which improves efficiency and accuracy by allocating parameters across layers more effectively. This approach has yielded a 2.36% improvement in accuracy over Allen AI’s OLMo 1B model while requiring half as many pre-training tokens.
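The layer-wise scaling idea can be sketched simply: rather than giving every transformer block the same attention-head count and feed-forward width, the per-layer budget grows with depth, so early layers are narrower and later layers wider. The snippet below is an illustrative simplification with made-up ranges, not Apple's published configuration.

```python
# Illustrative sketch of layer-wise scaling: interpolate the attention-head
# count and feed-forward multiplier across transformer layers instead of
# keeping them uniform. The specific ranges here are invented for illustration
# and are not OpenELM's actual hyperparameters.

def layerwise_config(num_layers: int,
                     min_heads: int = 4, max_heads: int = 16,
                     min_ffn_mult: float = 1.0, max_ffn_mult: float = 4.0):
    """Return (heads, ffn_multiplier) per layer, scaled linearly with depth."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append((heads, ffn_mult))
    return configs

for layer, (heads, ffn_mult) in enumerate(layerwise_config(num_layers=8)):
    print(f"layer {layer}: heads={heads}, ffn_multiplier={ffn_mult:.2f}")
```

The intuition is that a fixed parameter budget is spent where it helps most, which is how the approach can improve accuracy without increasing total model size.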

Analysis

OpenELM furthers a broader trend in the AI landscape, where companies want to bring sophisticated AI capabilities directly to consumer devices, reducing reliance on cloud-based data centers.

Apple’s emphasis on transparency and reproducibility with OpenELM is significant. The company released source code, model weights, and training materials, encouraging a collaborative approach to AI development. This level of openness is rare among major tech companies and particularly unusual for Apple.

Apple hasn’t yet integrated OpenELM into its consumer devices, but the upcoming iOS 18 update, expected to be announced at WWDC, may feature AI capabilities that leverage on-device processing. This could provide a much-needed boost to Apple’s virtual assistant, Siri, which has struggled to keep pace with competitors.

Apple isn’t alone in developing small language models. Microsoft recently unveiled Phi-3, a 3.8-billion-parameter model that aims to deliver effective language processing while maintaining a smaller footprint. Mistral AI, a European startup, also recently released a 7-billion-parameter model, providing a lightweight alternative to larger models like OpenAI’s GPT-3.

Small language models like these will bring the natural language capabilities of generative AI to end-user devices in a portable way, supporting everyday consumer use cases in environments that are often disconnected. It’s a powerful capability.

Disclosure: The author is an industry analyst, and NAND Research is an industry analyst firm, that engages in, or has engaged in, research, analysis, and advisory services with many technology companies, which may include those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.