Methods, non-transitory machine-readable media, and computing devices that provide improved dictionary-based compression are disclosed. With this technology, a first portion of an input data stream is compressed using a first dictionary. A second dictionary is trained when the first dictionary is determined to be stale. The first dictionary can be determined to be stale based, for example, on the size of the portion of the input data stream compressed using the first dictionary or on the compression ratio decreasing by a threshold. The first dictionary can be stored with metadata associated with the compressed first portion of the input data stream. Accordingly, this technology improves compression ratios, eliminates the need for reference counting, and facilitates improved reclamation of orphan dictionaries, among other advantages.
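
The staleness check and retraining step described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class and function names, the `max_bytes` and `ratio_drop` parameters, the reading of "decreasing by a threshold" as a relative drop from the first ratio observed with a dictionary, and the placeholder "training" routine (which simply reuses the tail of recently seen data as a zlib preset dictionary) are all assumptions made for illustration. Each compressed chunk keeps its dictionary alongside its metadata, which is one possible reading of the storage arrangement.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Chunk:
    """Compressed data plus metadata; the dictionary travels with the metadata."""
    dictionary: bytes
    payload: bytes


def train_dictionary(sample: bytes, max_size: int = 32 * 1024) -> bytes:
    # Placeholder for a real trainer (assumption): reuse the tail of recent data.
    return sample[-max_size:]


def compress_with(dictionary: bytes, data: bytes) -> bytes:
    c = zlib.compressobj(zdict=dictionary) if dictionary else zlib.compressobj()
    return c.compress(data) + c.flush()


def decompress_with(dictionary: bytes, payload: bytes) -> bytes:
    d = zlib.decompressobj(zdict=dictionary) if dictionary else zlib.decompressobj()
    return d.decompress(payload) + d.flush()


class DictionaryCompressor:
    def __init__(self, first_dictionary: bytes,
                 max_bytes: int = 1 << 20, ratio_drop: float = 0.2):
        self.dictionary = first_dictionary
        self.max_bytes = max_bytes      # hypothetical size limit per dictionary
        self.ratio_drop = ratio_drop    # hypothetical relative ratio-drop threshold
        self.bytes_since_training = 0
        self.baseline_ratio = None
        self.recent = b""

    def is_stale(self, ratio: float) -> bool:
        # Stale if enough data was compressed with this dictionary ...
        if self.bytes_since_training >= self.max_bytes:
            return True
        # ... or if the ratio fell by the threshold relative to the baseline.
        return (self.baseline_ratio is not None
                and ratio < self.baseline_ratio * (1 - self.ratio_drop))

    def compress(self, data: bytes) -> Chunk:
        payload = compress_with(self.dictionary, data)
        chunk = Chunk(self.dictionary, payload)
        ratio = len(data) / len(payload)
        self.bytes_since_training += len(data)
        self.recent = (self.recent + data)[-32 * 1024:]
        if self.baseline_ratio is None:
            self.baseline_ratio = ratio
        if self.is_stale(ratio):
            # Train a second dictionary; later chunks use it.
            self.dictionary = train_dictionary(self.recent)
            self.bytes_since_training = 0
            self.baseline_ratio = None
        return chunk


def decompress(chunk: Chunk) -> bytes:
    return decompress_with(chunk.dictionary, chunk.payload)
```

In this sketch a chunk is always decompressed with the dictionary recorded in its own metadata, so chunks written before a retraining remain readable after the compressor has moved on to a newer dictionary.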