Attention
In AI, “attention” is the term for the technique a model uses to determine which words in a text are most relevant to one another. These relationships capture context, and context is what gives language its meaning. For example, in the sentence “The bank raised interest rates,” attention helps the model establish that “bank” refers to a financial institution, not a riverbank. Through attention, conceptual relationships become quantified, stored as numbers in a neural network. Attention also governs how AI language models decide which information is “most important” when generating each word of their response.
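To make that concrete, here is a minimal sketch of scaled dot-product attention, the core operation the article is describing. The token list and random embedding vectors are invented purely for illustration; no real model works with vectors this small.

```python
import numpy as np

# Toy example: 4 tokens, each represented by a small embedding vector.
# The vectors are random stand-ins, not real learned embeddings.
tokens = ["the", "bank", "raised", "rates"]
embeddings = np.random.rand(4, 8)  # 4 tokens x 8 dimensions

# Scaled dot-product attention: every token is scored against every other token.
queries, keys, values = embeddings, embeddings, embeddings
scores = queries @ keys.T / np.sqrt(keys.shape[1])  # 4x4 relevance matrix

# Softmax turns raw scores into weights that sum to 1 for each token.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Each token's new representation is a weighted blend of all the others,
# which is how "bank" can absorb context from "rates" and lean toward
# the financial sense rather than the riverbank one.
contextualized = weights @ values
print(weights.round(2))
```

The 4x4 weight matrix is the quantified version of “which words matter for which”: every token gets a score for every other token, which is exactly what becomes expensive as the text grows.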
Computing context this way is difficult, and it was not practical at scale until chips such as graphics processors (GPUs), which can calculate these relationships in parallel, reached a certain level of capability. Even so, the original Transformer architecture from 2017 checked the relationship of every word in a prompt against every other word by brute force. So if you fed 1,000 words of prompt into the AI model, that meant 1,000 x 1,000, or 1 million, relationship comparisons. With 10,000 words, it becomes 100 million. The cost grows quadratically, which creates a fundamental bottleneck for processing long conversations.
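The arithmetic behind that quadratic growth can be checked in a few lines. This snippet simply multiplies the prompt length by itself, matching the 1 million and 100 million figures above; the 100,000-token row is an extrapolation for illustration.

```python
# Quadratic growth of pairwise comparisons with prompt length.
for n_tokens in [1_000, 10_000, 100_000]:
    comparisons = n_tokens * n_tokens
    print(f"{n_tokens:>7} tokens -> {comparisons:>17,} pairwise comparisons")
```

Making the prompt 10 times longer makes the attention work 100 times larger, which is why very long contexts strain both memory and compute.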
Although OpenAI likely uses some sparse attention techniques in GPT-5, long conversations still incur performance penalties. Every time you send a new message to ChatGPT, the AI model behind it processes the context of the entire conversation history all over again.
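Here is a rough sketch of why that reprocessing adds up. The function name, message structure, and word counts are hypothetical; the point is only that a chat model receives the whole accumulated history on every turn, not just the newest message.

```python
# Each turn resends (and reprocesses) the entire history.
history = []

def send(user_message):
    history.append({"role": "user", "content": user_message})
    # The model sees all prior messages again on every turn, so the
    # attention cost is paid over the full accumulated context each time.
    words_in_context = sum(len(m["content"].split()) for m in history)
    print(f"Turn {len(history)}: model re-reads ~{words_in_context} words of context")
    history.append({"role": "assistant", "content": "(model reply goes here)"})

send("What is attention in AI?")
send("Why do long chats get slower?")
```

Because the context grows with every exchange, and attention cost grows quadratically with context length, late turns in a long conversation are far more expensive than early ones.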
Of course, the researchers behind the original Transformer model designed it for machine translation with relatively short sequences (perhaps a few hundred tokens, the chunks of data that represent words), where quadratic attention was manageable. When people began scaling up to thousands or tens of thousands of tokens, the quadratic cost became prohibitive.