The Retentive network (RetNet) is a foundational architecture for large language models (LLMs), proposed as an alternative to transformers. RetNet was introduced by researchers at Microsoft Research and Tsinghua University, Beijing, in a paper submitted on July 17, 2023. The paper, titled "Retentive Network: A Successor to Transformer for Large Language Models", was authored by Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, and Furu Wei. Alongside the paper, the researchers released code on GitHub allowing users to develop their own RetNet models. The code is available through TorchScale, a PyTorch library of foundation architectures.
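As a rough illustration of how the released code can be used, a RetNet decoder could be constructed along the following lines. This is a minimal sketch based on the example usage in the TorchScale repository; the module paths and class names (RetNetConfig, RetNetDecoder) are assumptions that may differ between library versions.

```python
import torch

# Module paths and class names follow the TorchScale examples; treat them as
# assumptions that may change across versions of the library.
from torchscale.architecture.config import RetNetConfig
from torchscale.architecture.retnet import RetNetDecoder

# Build a RetNet decoder with a given vocabulary size (other hyperparameters
# are left at the library defaults for this sketch).
config = RetNetConfig(vocab_size=64000)
retnet = RetNetDecoder(config)

# Inspect the resulting stack of retention-based decoder layers.
print(retnet)
```

In practice, the decoder would be paired with a token embedding and an output projection and trained like any other autoregressive language model.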
RetNet derives a connection between recurrence and attention (a key concept in the transformer architecture), proposing the retention mechanism for sequence modeling, which supports three computation paradigms: parallel, recurrent, and chunkwise recurrent.
In particular, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, improving decoding throughput, latency, and GPU memory usage. The chunkwise recurrent representation allows for efficient long-sequence modeling with linear complexity, with each chunk encoded in parallel while the chunks are summarized recurrently.
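The parallel and recurrent forms compute the same outputs, which is what lets RetNet train like a transformer while decoding with a fixed-size state like an RNN. The sketch below shows a single retention head in both forms and checks that they agree; it is a simplified illustration that omits the paper's xPos-style rotation, per-head (multi-scale) decay, group normalization, and the chunkwise form, and the function names are illustrative rather than taken from the released code.

```python
import torch

def retention_parallel(Q, K, V, gamma):
    """Parallel form: (Q K^T, masked by a causal decay matrix D) applied to V."""
    T = Q.shape[0]
    n = torch.arange(T).unsqueeze(1)   # query positions, shape (T, 1)
    m = torch.arange(T).unsqueeze(0)   # key positions, shape (1, T)
    # D[n, m] = gamma^(n - m) if n >= m else 0 (causal mask with exponential decay)
    D = (gamma ** (n - m).clamp(min=0)) * (n >= m)
    return (Q @ K.transpose(-1, -2) * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: a fixed-size state S is updated per step, so decoding is O(1)."""
    d_k, d_v = Q.shape[-1], V.shape[-1]
    S = torch.zeros(d_k, d_v)
    outputs = []
    for t in range(Q.shape[0]):
        # S_t = gamma * S_{t-1} + K_t^T V_t (decayed accumulation of key-value outer products)
        S = gamma * S + K[t].unsqueeze(1) @ V[t].unsqueeze(0)
        # o_t = Q_t S_t
        outputs.append(Q[t].unsqueeze(0) @ S)
    return torch.cat(outputs, dim=0)

# The two computation paths produce the same result for the same inputs.
T, d = 16, 8
Q, K, V = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
gamma = 0.9
assert torch.allclose(retention_parallel(Q, K, V, gamma),
                      retention_recurrent(Q, K, V, gamma), atol=1e-4)
```

During training, the parallel form processes the whole sequence at once; during inference, the recurrent form replaces the growing key-value cache of attention with the single state matrix S.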
Transformers have become the primary architecture for LLMs, but their training parallelism comes at the cost of inefficient inference: decoding has O(N) complexity per step and relies on a memory-bound key-value cache. With growing sequence lengths, this deficiency increases GPU memory consumption and latency while reducing inference speed. RetNet is a potential next-generation architecture aiming to retain the training parallelism and competitive performance of transformers while improving inference efficiency.
In their paper, the team at Microsoft Research and Tsinghua University conducted a series of experiments showing that RetNet is competitive with transformers and their variants in terms of both scaling curves and in-context learning. The paper also states that the inference cost of RetNet is length-invariant. For a 7B-parameter model and an 8k sequence length, RetNet decoded 8.4x faster and saved 70% of the memory compared to transformers with key-value caches. During training, RetNet achieves 25-50% memory savings and 7x acceleration compared to standard transformers.