novelai_api.Tokenizer
- class SentencePiece
Bases:
sentencepiece.SentencePieceProcessor
Wrapper around sentencepiece.SentencePieceProcessor that adds the encode and decode methods
- trans_table_ids: Dict[int, str]
- trans_table_str: Dict[str, int]
- trans_regex_str: re.Pattern
- class Tokenizer
Bases:
object
Abstraction of the tokenizer behind each Model
- classmethod get_tokenizer_name(model: novelai_api.Preset.Model) -> str
Get the tokenizer name a model uses
- Parameters:
model – Model to get the tokenizer name of
- classmethod decode(model: novelai_api.Preset.Model | novelai_api.ImagePreset.ImageModel, o: List[int]) -> str
Decode the provided tokens using the chosen tokenizer
- Parameters:
model – Model to use the tokenizer of
o – List of tokens to decode
- Returns:
Text the provided tokens decode into
- classmethod encode(model: novelai_api.Preset.Model | novelai_api.ImagePreset.ImageModel, o: str) -> List[int]
Encode the provided text using the chosen tokenizer
- Parameters:
model – Model to use the tokenizer of
o – Text to encode
- Returns:
List of tokens the provided text encodes into
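The encode and decode signatures above suggest a simple round trip: text to tokens, then tokens back to text. The following is a minimal sketch of that call shape only. So that it runs without the package installed, it uses ByteTokenizer, a hypothetical stand-in that is not part of novelai_api and simply maps each UTF-8 byte to one token. The helper name round_trip is also an assumption made for illustration.

```python
from typing import Any, List, Tuple


def round_trip(tokenizer: Any, model: Any, text: str) -> Tuple[List[int], str]:
    """Encode text, then decode the tokens back, using the documented call shape."""
    tokens = tokenizer.encode(model, text)     # (model, o: str) -> List[int]
    decoded = tokenizer.decode(model, tokens)  # (model, o: List[int]) -> str
    return tokens, decoded


class ByteTokenizer:
    """Hypothetical stand-in, NOT part of novelai_api: one token per UTF-8 byte."""

    @classmethod
    def encode(cls, model: Any, o: str) -> List[int]:
        return list(o.encode("utf-8"))

    @classmethod
    def decode(cls, model: Any, o: List[int]) -> str:
        return bytes(o).decode("utf-8")


# The stand-in ignores the model argument, so None is passed here.
tokens, decoded = round_trip(ByteTokenizer, None, "Hello")
print(tokens)   # [72, 101, 108, 108, 111]
print(decoded)  # Hello
```

With the library installed, the same helper would presumably be called with the Tokenizer class from novelai_api.Tokenizer and a member of novelai_api.Preset.Model in place of the stand-in and None. The token values would then depend on the tokenizer that model uses, and the reference above does not state whether decoding always reproduces the original text exactly.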