novelai_api.Tokenizer

class SentencePiece

Bases: sentencepiece.SentencePieceProcessor

Wrapper around sentencepiece.SentencePieceProcessor that provides its own encode and decode methods.

encode(s: str) → List[int]

Encode the provided text using the SentencePiece tokenizer. This workaround is needed because sentencepiece cannot handle some tokens.

decode(t: List[int])

Decode the provided tokens using the SentencePiece tokenizer. This workaround is needed because sentencepiece cannot handle some tokens.
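The override pattern described above can be sketched as follows. This is a minimal, hypothetical illustration, not novelai_api's actual code: `BaseProcessor` is a toy stand-in for `sentencepiece.SentencePieceProcessor` so the example runs without a model file, and it assumes the "tokens sentencepiece cannot handle" are a fixed set of special strings (`SPECIAL_TOKENS`, invented here) that the wrapper maps to reserved ids itself, delegating everything else to the base class.

```python
import re
from typing import Dict, List


class BaseProcessor:
    """Toy stand-in for sentencepiece.SentencePieceProcessor (hypothetical).

    Encodes each character as its Unicode code point and knows nothing about
    special tokens; decode() raises ValueError on ids above 0x10FFFF.
    """

    def encode(self, s: str) -> List[int]:
        return [ord(c) for c in s]

    def decode(self, t: List[int]) -> str:
        return "".join(chr(i) for i in t)


# Assumption: the tokens the base processor cannot handle are these special
# strings, each mapped to a reserved id outside the base processor's range.
SPECIAL_TOKENS: Dict[str, int] = {"<|endoftext|>": 0x110000}
SPECIAL_IDS: Dict[int, str] = {i: s for s, i in SPECIAL_TOKENS.items()}
# The capturing group keeps the special tokens in the output of re.split()
SPECIAL_RE = re.compile("(" + "|".join(re.escape(s) for s in SPECIAL_TOKENS) + ")")


class SentencePieceSketch(BaseProcessor):
    """Hypothetical wrapper that overrides encode/decode around special tokens."""

    def encode(self, s: str) -> List[int]:
        tokens: List[int] = []
        for piece in SPECIAL_RE.split(s):
            if piece in SPECIAL_TOKENS:
                tokens.append(SPECIAL_TOKENS[piece])  # handled by the wrapper
            elif piece:
                tokens.extend(super().encode(piece))  # delegated to the base
        return tokens

    def decode(self, t: List[int]) -> str:
        parts: List[str] = []
        run: List[int] = []  # consecutive ordinary ids awaiting base decode
        for i in t:
            if i in SPECIAL_IDS:
                if run:
                    parts.append(super().decode(run))
                    run = []
                parts.append(SPECIAL_IDS[i])
            else:
                run.append(i)
        if run:
            parts.append(super().decode(run))
        return "".join(parts)
```

With this sketch, `SentencePieceSketch().encode("hi<|endoftext|>")` returns `[104, 105, 1114112]` and `decode` restores the original string, whereas `BaseProcessor().decode` on the same ids raises `ValueError`. Keeping the base class untouched and intercepting only the problematic tokens is what lets the wrapper stay a drop-in subclass.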

class Tokenizer

Bases: object

Abstraction of the tokenizer behind each Model.

classmethod get_tokenizer_name(model: novelai_api.Preset.Model) → str

Get the name of the tokenizer that a model uses.

Decode the provided tokens using the chosen tokenizer.

Parameters:

Returns:

Text that the provided tokens decode into

Encode the provided text using the chosen tokenizer.

Parameters:

Returns:

List of tokens that the provided text encodes into
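The per-model dispatch that Tokenizer abstracts over can be sketched like this. Everything in the example is hypothetical: the `Model` enum members, the tokenizer names, the `_tokenizer_names` and `_tokenizers` tables, and the `(text, model)` / `(tokens, model)` argument order are placeholders for illustration rather than novelai_api's actual signatures, and a toy character-level tokenizer replaces the real ones.

```python
from enum import Enum
from typing import Dict, List


class Model(Enum):
    """Hypothetical stand-in for novelai_api.Preset.Model."""

    ModelA = "model-a"
    ModelB = "model-b"


class CharTokenizer:
    """Toy tokenizer: one token per character, shifted by an offset."""

    def __init__(self, offset: int) -> None:
        self.offset = offset

    def encode(self, s: str) -> List[int]:
        return [ord(c) + self.offset for c in s]

    def decode(self, t: List[int]) -> str:
        return "".join(chr(i - self.offset) for i in t)


class Tokenizer:
    # Hypothetical lookup tables: model -> tokenizer name -> tokenizer object
    _tokenizer_names: Dict[Model, str] = {Model.ModelA: "tok_a", Model.ModelB: "tok_b"}
    _tokenizers: Dict[str, CharTokenizer] = {
        "tok_a": CharTokenizer(0),
        "tok_b": CharTokenizer(1000),
    }

    @classmethod
    def get_tokenizer_name(cls, model: Model) -> str:
        # Several models may share one tokenizer, so names are looked up per model
        return cls._tokenizer_names[model]

    @classmethod
    def decode(cls, tokens: List[int], model: Model) -> str:
        # Route to the tokenizer the model uses, then decode with it
        return cls._tokenizers[cls.get_tokenizer_name(model)].decode(tokens)

    @classmethod
    def encode(cls, text: str, model: Model) -> List[int]:
        # Route to the tokenizer the model uses, then encode with it
        return cls._tokenizers[cls.get_tokenizer_name(model)].encode(text)
```

For example, `Tokenizer.encode("hi", Model.ModelB)` returns `[1104, 1105]`, and `Tokenizer.decode([1104, 1105], Model.ModelB)` returns `"hi"`. Callers only pass a model; which tokenizer does the work stays an internal detail of the class.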