This machine learning study tests the transformer’s length generalization ability using the task of adding two integers

Written By Adarsh Shankar Jha

Transformer-based models have transformed the fields of Natural Language Processing (NLP) and Natural Language Generation (NLG), demonstrating excellent performance in a wide range of applications. Prominent examples include Google's recently introduced Gemini models and OpenAI's GPT models. Several studies have shown that these models perform well on mathematical reasoning, code synthesis, and theorem-proving tasks, yet they struggle with length generalization: the ability to apply what they have learned to sequences longer than those seen during training.

This limitation raises an important question: do Transformers truly learn the underlying algorithm of a task, or do they rely on shortcuts and surface-level memorization that break down on longer, more complex inputs? The researchers set out to determine whether Transformers have an inherent design flaw that prevents successful length generalization.

To investigate this, a team of researchers from Google DeepMind carried out a methodical analysis of the Transformer's length generalization ability, focusing on the N-digit decimal addition problem. Although addition is simple compared with natural language, the study treats it as a synthetic language-learning task to gain insight into how well a Transformer can internalize a basic algorithm.
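As a rough illustration of how addition can be framed as a synthetic language, the sketch below generates N-digit addition problems as plain character strings and maps each character to a token ID. The `a+b=c` layout, the character-level vocabulary, and the function names are illustrative assumptions, not the exact setup used by the DeepMind team.

```python
import random

# Character-level vocabulary: ten digits plus two operator symbols.
# (Illustrative assumption; the study's exact vocabulary may differ.)
VOCAB = list("0123456789+=")
TOKEN_ID = {ch: i for i, ch in enumerate(VOCAB)}


def random_n_digit(n: int, rng: random.Random) -> int:
    """Return a uniformly random integer with exactly n decimal digits."""
    if n == 1:
        return rng.randint(0, 9)
    return rng.randint(10 ** (n - 1), 10 ** n - 1)


def make_example(n: int, rng: random.Random) -> str:
    """Build one 'sentence' of the synthetic language, e.g. '57+68=125'."""
    a, b = random_n_digit(n, rng), random_n_digit(n, rng)
    return f"{a}+{b}={a + b}"


def tokenize(example: str) -> list[int]:
    """Map each character of an example to its integer token ID."""
    return [TOKEN_ID[ch] for ch in example]


rng = random.Random(0)
sample = make_example(3, rng)
print(sample, tokenize(sample))
```

In this framing, "length" simply means the number of digits per operand, so longer problems are longer token sequences drawn from the same tiny vocabulary.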

The team used integer addition as a lens for studying the Transformer's length generalization. The results revealed an important interdependence: a Transformer's ability to handle longer sequences depends not only on its architecture and size but also, to a large extent, on the format of its training data and the type of positional encoding it uses. According to the team, the positional encoding, which gives the model a sense of sequence order, and the data format, which describes how information is presented to the model, are critical in determining whether the model can generalize at all.
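To make these two factors concrete, the sketch below shows (a) two possible data formats for the same addition problem, with the answer written most-significant digit first or reversed so it is emitted least-significant digit first, and (b) the classic sinusoidal absolute positional encoding from the original Transformer paper as one example of how position information can be injected. Both are illustrative choices: the article does not say which formats or encodings the team compared, and the sinusoidal scheme is not claimed to be the one that generalized best.

```python
import math


def format_standard(a: int, b: int) -> str:
    """Answer written most-significant digit first, e.g. '57+68=125'."""
    return f"{a}+{b}={a + b}"


def format_reversed(a: int, b: int) -> str:
    """Answer written least-significant digit first, e.g. '57+68=521'.

    A left-to-right decoder then produces each output digit after the
    lower-order digits that determine its carry have already been generated.
    """
    return f"{a}+{b}={str(a + b)[::-1]}"


def sinusoidal_encoding(num_positions: int, d_model: int) -> list[list[float]]:
    """Sinusoidal positional encoding (Vaswani et al., 2017).

    Row p is the vector added to the token embedding at position p:
      PE[p][2i]   = sin(p / 10000^(2i / d_model))
      PE[p][2i+1] = cos(p / 10000^(2i / d_model))
    """
    table = []
    for pos in range(num_positions):
        row = []
        for j in range(d_model):
            i = j // 2  # dimension pair index shared by sin/cos
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


print(format_standard(57, 68))   # 57+68=125
print(format_reversed(57, 68))   # 57+68=521
pe = sinusoidal_encoding(num_positions=8, d_model=4)
print(len(pe), len(pe[0]))       # 8 4
```

The sketch only shows what each knob controls; the study's point is that the right *combination* of the two matters more than either choice in isolation.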

By experimenting with different combinations of positional encodings and data formats, the team found configurations that allow standard Transformers to extrapolate to sequences 2.5 times longer than those seen during training, far beyond their training range. This shows that Transformers can handle longer sequences successfully when given the right training setup.
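Operationally, extrapolating to sequences 2.5 times longer means training only on problems up to some maximum number of digits and then measuring exact-match accuracy on longer problems. The sketch below assumes, for illustration, a 40-digit training limit and evaluation up to 100 digits (a 2.5 times ratio). `predict` is a hypothetical stand-in for a trained model, and an answer counts as correct only if every digit matches.

```python
import random
from typing import Callable

TRAIN_MAX_DIGITS = 40   # assumed longest operand length seen in training
TEST_MAX_DIGITS = 100   # assumed longest operand length at test time


def random_operand(n: int, rng: random.Random) -> int:
    """Uniformly random integer with exactly n digits (n >= 1)."""
    low = 0 if n == 1 else 10 ** (n - 1)
    return rng.randint(low, 10 ** n - 1)


def length_accuracy(
    predict: Callable[[int, int], int],
    n_digits: int,
    num_samples: int,
    rng: random.Random,
) -> float:
    """Exact-match accuracy of `predict` on n-digit + n-digit problems."""
    correct = 0
    for _ in range(num_samples):
        a, b = random_operand(n_digits, rng), random_operand(n_digits, rng)
        if predict(a, b) == a + b:
            correct += 1
    return correct / num_samples


def extrapolation_report(
    predict: Callable[[int, int], int], num_samples: int = 100, seed: int = 0
) -> dict[int, tuple[str, float]]:
    """Accuracy for every length, tagged as in-range or extrapolation."""
    rng = random.Random(seed)
    report = {}
    for n in range(1, TEST_MAX_DIGITS + 1):
        split = "train-range" if n <= TRAIN_MAX_DIGITS else "extrapolation"
        report[n] = (split, length_accuracy(predict, n, num_samples, rng))
    return report


print(TEST_MAX_DIGITS / TRAIN_MAX_DIGITS)  # 2.5
```

An exact oracle such as `lambda a, b: a + b` scores 1.0 at every length; a model that fails to length-generalize would show its accuracy dropping in the `"extrapolation"` rows.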

Unlike in-distribution generalization, where models are only expected to perform consistently on data similar to their training set, length generalization is a more delicate achievement: reliable extrapolation depends on a complex interplay between training dynamics, data representation, and model design.

The team summarized their main contributions as follows.

  1. The strategic choice of positional encoding and data format was found to be critical for successful length generalization in language models, especially on tasks such as integer addition. Optimizing these two choices extended the models' capabilities, allowing them to handle sequences up to 2.5 times longer than those on which they were trained.
  1. Several data formatting and augmentation approaches were studied, and their effectiveness in improving length generalization was found to depend heavily on the type of positional encoding applied. This highlights the importance of choosing the positional encoding and data format together, as a coordinated strategy, to get the best results.
  1. The models achieved remarkable generalization, extrapolating to lengths far beyond their training range. However, this ability proved notably fragile: performance varied greatly between training runs because of factors such as random weight initialization and the order in which the training data is presented.
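Because results reportedly swing with the random seed, which controls both weight initialization and the order of training examples, a single training run says little. The sketch below shows seed-controlled data ordering and a summary of one metric across several seeded runs; the function names are hypothetical, and the per-seed accuracies are assumed to come from the reader's own experiments.

```python
import random
import statistics


def training_order(examples: list[str], seed: int) -> list[str]:
    """Deterministic shuffle: same seed gives the same order, a new seed a new one.

    In a real setup the same seed would also be passed to the framework's
    weight initializer, so each seed defines one complete training run.
    """
    order = list(examples)  # copy so the caller's list is not mutated
    random.Random(seed).shuffle(order)
    return order


def summarize_runs(accuracy_by_seed: dict[int, float]) -> dict[str, float]:
    """Summarize one metric across independently seeded training runs."""
    values = list(accuracy_by_seed.values())
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }
```

Reporting the min and max alongside the mean makes the kind of run-to-run fragility described above visible instead of hiding it behind a single best run.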

Check out the Paper. All credit for this research goes to the researchers of this project.


Tanya Malhotra is a senior at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking abilities, along with a keen interest in acquiring new skills, leading teams, and managing work in an organized manner.

