Cirilla

Ciri from The Witcher 4 trailer

Cirilla is an open source learning project aimed at implementing various LLMs. It focuses mainly on showing how to build, train, run inference with, and deploy an LLM from scratch using PyTorch and a budget-friendly GPU (RTX 4060 Ti 16 GiB, ~$500).

Who is Cirilla

Fig.1 Ciri Gwent card by Bogna Gawrońska

Cirilla Fiona Elen Riannon, known as Ciri, is one of the central characters in The Witcher saga by Andrzej Sapkowski and its adaptations. She is the princess of Cintra, granddaughter of Queen Calanthe, and the sole heir to a powerful lineage marked by the mysterious Elder Blood.

Ciri is defined by her destiny, adaptability, and potential. Unlike kings who wield authority by birthright, her strength comes from surviving chaos, learning from mentors like Geralt and Yennefer, and unlocking extraordinary powers.

Her unique abilities make her one of the most pivotal figures in the saga. Known as the Lady of Space and Time, the Lion Cub of Cintra, and the Child of the Elder Blood, she can manipulate space and time, travel between worlds, and influence the course of events in ways few can.

Why name an LLM Cirilla

Unlike rulers who inherit authority, Cirilla embodies potential realized through learning, experience, and adaptability. She is resilient, capable of navigating complex and unpredictable worlds, and able to respond to challenges with skill and precision—qualities that mirror how a language model can shift between tasks, domains, and contexts.

Guided by mentors and shaped by hardships, Ciri develops her abilities quickly, mastering both strategy and instinct while remaining flexible in the face of unforeseen circumstances.

Her combination of innate talent, adaptability, and the capacity for growth makes her a fitting symbol for a language model designed to acquire knowledge, evolve over time, and connect information across domains.

Fig.2 Ciri Gwent card by Anna Podedworna

What is an LLM

On a high level: imagine a toddler with a huge amount of knowledge but still possessing a toddler-like way of reasoning and understanding.

On a lower level: an LLM is a neural network trained on large amounts of data to recognize patterns, generate human-like responses, and predict the most likely next word in a given context. While it can process and recall information efficiently, it lacks true understanding, reasoning, or consciousness, relying only on statistical correlations rather than genuine comprehension. The reasoning abilities of LLMs are being improved in projects such as DeepSeek, which focus on enhancing the ability to understand context and simulate human-like reasoning.
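To make "predict the most likely next word" concrete, here is a toy PyTorch sketch. The vocabulary and scores are made up purely for illustration; a real LLM produces the logits with a trained network conditioned on the whole context:

import torch
import torch.nn.functional as F

# toy "LLM": random scores over a 10-word vocabulary, conditioned only on the previous token
vocab = ["the", "witcher", "ciri", "is", "a", "princess", "of", "cintra", "sword", "<eos>"]
torch.manual_seed(0)
logits = torch.randn(len(vocab), len(vocab))  # (prev_token, next_token) scores

prev_token = vocab.index("ciri")
probs = F.softmax(logits[prev_token], dim=-1)  # turn scores into a probability distribution
next_token = torch.argmax(probs).item()        # greedy pick of the most likely next word

print(vocab[next_token], probs[next_token].item())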

Documentation


OllamaCurate

class cirilla.synth_data.OllamaCurate(model, system_prompt, response_template)

General class for creating synthetic datasets with ollama

| Argument | Type | Description |
| --- | --- | --- |
| model | str | name of the ollama model, e.g. "llama3.1:8b" |
| system_prompt | str | system prompt for the chosen ollama model, used by its functions: __call__, dynamic_hierarchical_summary and single_pass_summary |
| response_template | pydantic.BaseModel | template for structured responses, used by its functions: __call__, dynamic_hierarchical_summary, single_pass_summary and multi_turn |
__call__(paths, save_to, seed, checkpoint, skip) → None

Create synthetic instructions (question-answer pairs).
To see how to turn the instructions into a .jsonl file, see the rm_duplicate_instructs function.

| Argument | Type | Description |
| --- | --- | --- |
| paths | list[Path] | paths to .txt files containing texts to create question answer pairs from |
| save_to | Path | folder to save generated question answer pairs to |
| seed | int | seed for the ollama model |
| checkpoint | int | save the generated question answer pairs to the save_to folder after checkpoint iterations |
| skip | bool | if True, skip the already existing question answer pairs, else save as existing_name_{i} where i is the number of already existing files of the same name |

Example:

import os
import random

from pydantic import BaseModel, Field

from cirilla.synth_data import OllamaCurate

class Response(BaseModel):
    question: str = Field(description="What question is appropriate to this text?")
    answer: str = Field(description="Answer to the question")

sys_prompt = """
You are an expert dataset annotator for instruction-tuning large language models. 
Your task is to create high-quality question-answer pairs from provided texts 
for training instruct models.

Guidelines:
- Keep the question relevant and informative for learners.
- Avoid using markdown or any unnecessary formatting.
- You can ask to elaborate based on a keyword or phrase in the text.
- You can ask about the plot if the text is a story.
- Do not use overly formal language.
- Use only the information provided in the text.
- If the text states that any part of it is from Netflix, or mentions that a section is from Netflix,
  ignore that part and do not include it in the question or answer.
- If user specifies already created question and answer pair, find a different question and answer
  pair that is different from the one provided. If this is impossible use different words then the ones
  provided.
- Return the output strictly as a JSON with two fields: "question" and "answer".
"""

folder = "./witcher_fandom"
paths = os.listdir(folder)
paths = [os.path.join(folder, p) for p in paths]
print(f"{len(paths)} paths found")

for _ in range(3):
    dual = OllamaCurate(
        model="qwen3:8b",
        system_prompt=sys_prompt,
        response_template=Response
    )
    dual(
        paths,
        save_to="./witcher_synthetic_instruct/qwen3:8b",
        seed=random.randint(0, 1000),
        checkpoint=10,
        skip=False,
    )

Example created output file: ./witcher_synthetic_instruct/qwen3:8b/Deadly Plot.json

{
    "question": "What location must the player meet Dijkstra at for the A Deadly Plot quest?",
    "answer": "Passiflora."
}

The output file's contents depend on whatever was present in the corresponding text file:
./witcher_fandom/Deadly Plot.txt

single_pass_summary(paths, save_to, seed, num_predict, use_response_template) → None

Create a simple summary of each provided .txt file.
To see how to turn the summaries into a dataset, see gather_summaries and summaries_to_instruct.

| Argument | Type | Description |
| --- | --- | --- |
| paths | list[Path] | paths to .txt files containing texts to summarize |
| save_to | Path | folder to save generated summaries to |
| seed | int | seed for the ollama model |
| num_predict | int | maximum number of tokens the ollama model can generate |
| use_response_template | bool | whether to use the response template that is set when initializing the OllamaCurate class |

Example:

import os
import random
from pydantic import BaseModel, Field

from cirilla.synth_data import OllamaCurate

class Response(BaseModel):
    summary: str = Field(description=(
        "Final summary of the entire text. Pure summary only, "
        "no introduction or reasoning."
    ))

paths = os.listdir("./witcher_fandom")
paths = [os.path.join("./witcher_fandom", p) for p in paths]

print(f"{len(paths)} paths found")

for m in ["granite3.1-moe:3b"]:
    model = OllamaCurate(model=m, system_prompt="", response_template=Response)
    model.single_pass_summary(
        paths,
        save_to=f"./witcher_synth_summaries/{m}",
        seed=random.randint(0, 1000),
        num_predict=4096,
        use_response_template=True,
    )
dynamic_hierarchical_summary(paths, save_to, chunk_lines, seed, num_predict, max_words_summary, use_response_template) → None

For provided .txt files create a hierarchical summary. The text is split into chunks, a summary is generated for each chunk, and a final summary is then generated from the chunk summaries.

| Argument | Type | Description |
| --- | --- | --- |
| paths | list[Path] | paths to .txt files containing texts to summarize |
| save_to | Path | folder to save generated summaries to |
| chunk_lines | int | the number of lines by which to chunk the text, e.g. for a text of 1000 lines and chunk_lines=200 the text will be divided into 5 chunks of 200 lines each; every chunk gets its own summary and a final summary is generated from them |
| seed | int | seed for the ollama model |
| num_predict | int | maximum number of tokens the ollama model can generate |
| max_words_summary | int | maximum number of words ollama should target for the summary |
| use_response_template | bool | whether to use the response template that is set when initializing the OllamaCurate class |

Example:

import os
import random
from pydantic import BaseModel, Field

from cirilla.synth_data import OllamaCurate

class Response(BaseModel):
    summary: str = Field(description=(
        "Summary of the text, without the thinking process and without any "
        "introduction. Provide only pure summary, be expressive but stick to "
        "the maximum number of words that were provided."
    ))

paths = os.listdir('./witcher_texts')
paths = [os.path.join('./witcher_texts', p) for p in paths]

print(f"{len(paths)} paths found")

for m in ['llama3.2:3b', 'llama3.1:8b', 'qwen3:8b']:

    model = OllamaCurate(model=m,
                         system_prompt="",
                         response_template=Response)

    model.dynamic_hierarchical_summary(paths,
                                       save_to=f'./synth_summaries/witcher_texts/{m}',
                                       chunk_lines=100,
                                       seed=random.randint(0, 1000),
                                       num_predict=2048,
                                       max_words_summary=500,
                                       use_response_template=True)
multi_turn(paths, save_to, bar, n_turns_range, seed, prob_chance_new_context) → None

Create synthetic multi turn instructions (question-answer pairs)

| Argument | Type | Description |
| --- | --- | --- |
| paths | list[Path] | paths to .txt files containing texts to create multi turn question answer pairs from |
| save_to | Path | folder to save generated multi turn question answer pairs to |
| bar | tqdm.tqdm | tqdm bar to track progress |
| n_turns_range | tuple[int, int] | range of the number of turns to generate, e.g. (2, 4) will generate question answer pairs with 2 to 4 turns (the actual number of turns is randomly chosen from this range and is consistent for all the files) |
| seed | int | seed for the ollama model |
| prob_chance_new_context | float | probability of starting a new context for the question answer pairs, i.e. with the given chance the multi turn conversation continues with a new context from another randomly chosen .txt file |

Example:

import os
import random

from pydantic import BaseModel, Field
from tqdm import tqdm

from cirilla.synth_data import OllamaCurate

class Response(BaseModel):
    question: str = Field(description="What question is appropriate to this text?")
    answer: str = Field(description="Answer to the question")

paths = os.listdir('./witcher_fandom')
paths = [os.path.join('./witcher_fandom', p) for p in paths]

bar = tqdm(total=3*3*len(paths))
for model in ['qwen3:8b', 'phi4', 'llama3.1:8b']:

    for _ in range(3):
        ol = OllamaCurate(model, "", Response)
        ol.multi_turn(paths,
                      save_to=f'./synth_multi_round/{model}',
                      bar=bar,
                      n_turns_range=(2, 5),
                      seed=random.randint(0, 1000),
                      prob_chance_new_context=0.3)

rm_duplicate_instructs

func cirilla.synth_data.rm_duplicate_instructs(main_dir, save_to) → None

remove duplicate synthetic instructions

| Argument | Type | Description |
| --- | --- | --- |
| main_dir | Path | path to the directory containing the synthetic instructions |
| save_to | Path | path to save the cleaned instructions to |

Example:

from cirilla.synth_data import rm_duplicate_instructs

main_dir = './witcher_synthetic_instruct'
save_to = './witcher_synthetic_instruct.jsonl'

rm_duplicate_instructs(main_dir, save_to)

Example input file: ./witcher_synthetic_instruct/qwen3:8b/Deadly Plot.json

{
    "question": "What is the objective of the A Deadly Plot quest in The Witcher 3: Wild Hunt?",
    "answer": "The objective of the A Deadly Plot quest is to help Dijkstra ..."
}

Example input file: ./witcher_synthetic_instruct/qwen3:8b/Deadly Plot_1.json

{
    "question": "What is the main objective of the A Deadly Plot quest in The Witcher 3: Wild Hunt?",
    "answer": "The objective of the A Deadly Plot quest is to help Dijkstra ..."
}
Since the second instruction is nearly identical, it will be deleted in the final .jsonl file.

Example created output file (obviously the actual .jsonl file is saved in single lines): ./witcher_synthetic_instruct.jsonl

{"subject": "Silver sword", "text": [
    {"role": "user", "content": "What material are silver swords made from in The Witcher?"},
    {"role": "assistant", "content": "Meteoric iron coated with silver and inscribed with magical runes"}
    ], "data type": "conv", "model": "llama3.1:8b"}
...
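The deduplication shown above (dropping Deadly Plot_1.json because it nearly repeats Deadly Plot.json) can be approximated with a simple string-similarity check. This is only a rough sketch of the idea, not the actual rm_duplicate_instructs implementation:

from difflib import SequenceMatcher

def is_near_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    # compare the question strings of two instruction pairs
    ratio = SequenceMatcher(None, a["question"].lower(), b["question"].lower()).ratio()
    return ratio >= threshold

q1 = {"question": "What is the objective of the A Deadly Plot quest in The Witcher 3: Wild Hunt?"}
q2 = {"question": "What is the main objective of the A Deadly Plot quest in The Witcher 3: Wild Hunt?"}

print(is_near_duplicate(q1, q2))  # True -> the second pair would be dropped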

gather_summaries

func cirilla.synth_data.gather_summaries(in_path, out_path) → None

turn summaries into a training dataset

| Argument | Type | Description |
| --- | --- | --- |
| in_path | Path | path to the directory containing summaries in .txt files; they can be nested |
| out_path | Path | path to save the instructions to |

Example:

from cirilla.synth_data import gather_summaries

in_path = './witcher_synth_summaries'
out_path = './witcher_summaries_gathered.jsonl'

gather_summaries(in_path, out_path)

Example input file: ./witcher_synth_summaries/llama3.1:8b/Alchemist.txt

In the Witcher lore, Alchemists are individuals who practice alchemy ...

Example created output file (obviously the actual .jsonl file is saved in single lines): ./witcher_summaries_gathered.jsonl

{
    "subject": "sorcerers",
    "text":"Mages, or sorcerers/sorceresses, ...",
    "data type": "plain text", "source": "fandom", "model": "llama3.2:3b"
}
...

summaries_to_instruct

func cirilla.synth_data.summaries_to_instruct(in_path, out_path) → None

turn summaries into simple instructions. The user question is chosen from a predefined list of questions

| Argument | Type | Description |
| --- | --- | --- |
| in_path | Path | path to the directory containing summaries in .txt files; they can be nested |
| out_path | Path | path to save the instructions to |

Example:

from cirilla.synth_data import summaries_to_instruct

in_path = './witcher_synth_summaries'
out_path = './witcher_summaries_gathered_instr.jsonl'

summaries_to_instruct(in_path, out_path)

Example input file: ./witcher_synth_summaries/llama3.1:8b/Alchemist.txt

In the Witcher lore, Alchemists are individuals who practice alchemy ...

Example created output file (obviously the actual .jsonl file is saved in single lines): ./witcher_summaries_gathered_instr.jsonl

{
    "subject": "Ravik",
    "text": [
    {"role": "user", "content": "What is notable about Ravik?"},
    {"role": "assistant", "content": "Ravik, also known as Ravvy, was a friend ..."}
    ],
    "data type": "plain text", "source": "fandom", "model": "llama3.2:3b"
}
...
As you may notice, the user's question asks about Ravik, which is the name of the file ./.../Ravik.txt
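A rough sketch of how such an instruction could be formed from a summary file; the question templates below are invented purely for illustration and need not match the actual list used by summaries_to_instruct:

import os
import random

QUESTION_TEMPLATES = [  # hypothetical templates
    "What is notable about {subject}?",
    "Tell me about {subject}.",
    "Who or what is {subject}?",
]

def summary_to_instruction(path: str) -> dict:
    subject = os.path.splitext(os.path.basename(path))[0]  # e.g. "Ravik" from ".../Ravik.txt"
    with open(path, encoding="utf-8") as f:
        summary = f.read().strip()
    question = random.choice(QUESTION_TEMPLATES).format(subject=subject)
    return {
        "subject": subject,
        "text": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": summary},
        ],
        "data type": "plain text",
    }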

get_synth_reasoning_dataset

func cirilla.synth_data.get_synth_reasoning_dataset(out_path, n_samples, specs) → None

create synthetic reasoning dataset with reasoning_gym

| Argument | Type | Description |
| --- | --- | --- |
| out_path | Path | path to save the synthetic reasoning dataset to |
| n_samples | int | how many samples of the synthetic reasoning dataset to create (each sample contains 100 data points) |
| specs | list[reasoning_gym.composite.DataSpec] | specs for creating the synthetic reasoning dataset |

Example:

from cirilla.synth_data import get_synth_reasoning_dataset

out_path = './reason_gym_synth.jsonl'
n_samples = 400 # will contain 40'000 data points

get_synth_reasoning_dataset(out_path, n_samples)

Example output file: ./reason_gym_synth.jsonl (obviously the actual .jsonl file is saved in single lines)

{"subject": "simple_equations", "text": [
    {"role": "user", "text": "Solve for g: 65*g = 3185"},
    {"role": "assistant", "text": "49"}],
    "data type": "conv"}
{"subject": "needle_haystack", "text": [
    {"role": "user", "text": "Ismaeel worships lions. Yann embraces travel blogging. Malakhy
    scorns historical documentaries. Kabeer cherishes sketching. \nWho scorns historical
    documentaries? Reply only with a name."},
    {"role": "assistant", "text": "Malakhy"}],
    "data type": "conv"}
...
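The specs argument is not used above; if you want to control which reasoning_gym tasks are mixed in, a spec list could look roughly like the sketch below. The field names (name, weight, config) are assumptions about reasoning_gym.composite.DataSpec and may need adjusting to the installed version:

from reasoning_gym.composite import DataSpec  # type name taken from the table above

# hypothetical mixture: mostly simple equations, some needle-in-a-haystack tasks
specs = [
    DataSpec(name="simple_equations", weight=0.7, config={}),
    DataSpec(name="needle_haystack", weight=0.3, config={}),
]

get_synth_reasoning_dataset('./reason_gym_synth.jsonl', n_samples=400, specs=specs)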

vllm_multi_turn

func cirilla.synth_data.vllm_multi_turn(paths, save_to, batch_size, system_prompt, n_turns, template, model, prob_chance_new_context) → None

create synthetic multi turn conversations about given topics with vLLM
to see how to turn the conversations into a .jsonl file, see the multi_turn_gather function

| Argument | Type | Description |
| --- | --- | --- |
| paths | list[Path] | list of paths to the contexts for the conversations |
| save_to | Path | path to save the synthetic multi turn dataset to |
| batch_size | int | batch size for the conversations |
| system_prompt | str | system prompt for the conversations |
| n_turns | int | number of turns for the conversations; it is usually the maximum number of turns, since some turns may fail and will thus be empty |
| template | pydantic.BaseModel | template for the conversations |
| model | str | model from huggingface for the conversations |
| prob_chance_new_context | float | probability of a new context being added into the conversations |

Example:

import os

from cirilla.synth_data import vllm_multi_turn

paths_ = ['./summaries/granite3.1-moe:3b',
          './summaries/llama3.1:8b',
          './summaries/llama3.2:3b',
          './summaries/qwen3:8b']

paths = [[os.path.join(p, f) for f in os.listdir(p)] for p in paths_]

for model in ["unsloth/granite-3.2-2b-instruct-unsloth-bnb-4bit"]:
    for i, sub_paths in enumerate(paths):
        for _ in range(1):

            vllm_multi_turn(sub_paths,
                            save_to=f'./synth_multi_round/{model.split("/")[1]}/{paths_[i].split("/")[-1]}',
                            model=model)

Where the summaries may come from OllamaCurate.single_pass_summary

Example created output file ./synth_multi_round/granite-3.2-2b-.../granite3.1-moe:3b/A Book of Tales.json:

[
  {
    "question": "Which specific adventure or mystery introduced in The Book of Tales ... ?",
    "answer": "The given text does not specify a single adventure where the ...",
    "context": "A Book of Tales"
  },
  {
    "question": "Which three new playable races \u2013 Gnomes, ... ?",
    "answer": "Gnomes are native to the mountainous region of Mount Carbon in Kovir ...",
    "context": "A Book of Tales"
  },
  {
    "question": "What central role does Radko hold in The Witcher ... ?",
    "answer": "Radko, a soldier under the Bloody Baron's command ...",
    "context": "Radko"
  }
]

multi_turn_gather

func cirilla.synth_data.multi_turn_gather(input_path, save_to) → None

gather multi turn conversations into a .jsonl file

| Argument | Type | Description |
| --- | --- | --- |
| input_path | Path | path to a folder with .json files containing multi turn conversations; they can be nested |
| save_to | Path | path to save the gathered conversations to |

Example:

from cirilla.synth_data import multi_turn_gather

inp = './synth_multi_round'
outp = './multi_round.jsonl'

multi_turn_gather(inp, outp)

Example created output file ./multi_round.jsonl (obviously the actual .jsonl file is saved in single lines):

{
    "subject": "A Little Sacrifice", "text":
    [
    {"role": "user", "content": "What does Sh'eenaz do to help resolve the ... ?"},
    {"role": "assistant", "content": "Sh'eenaz agrees to give up her tail for ..."},
    {"role": "user", "content": "What does Sh'eenaz do to ... ?"},
    {"role": "assistant", "content": "Sh'eenaz ..."},
    {"role": "user", "content": "What is the nature of Geralt's ... ?"},
    {"role": "assistant", "content": "Geralt ..."}
    ],
    "data type": "conv", "source": "fandom",
    "metadata": {"contexts": ["A Little Sacrifice", "A Little Sacrifice", "A Little Sacrifice"]}
}

get_activation

func cirilla.LLM_pieces.get_activation(path) → HF kernel

get an optimized kernel from the Huggingface kernel hub
those kernels mostly work only on CUDA

| Argument | Type | Description |
| --- | --- | --- |
| path | Path | path to a Huggingface kernel, e.g. "kernels-community/activation" |

Example:

import torch

from cirilla.LLM_pieces import get_activation

dim = 256
activation = get_activation('Motif-Technologies/activation')
rmsnorm = activation.layers.RMSNorm(dim=dim).cuda()

print(rmsnorm(torch.randn(1, dim, device='cuda')).shape)
# torch.Size([1, 256])

BertAttention

class cirilla.LLM_pieces.BertAttention(args, rope)

BERT attention with grouped query

| Argument | Type | Description |
| --- | --- | --- |
| args | BertAttentionArgs | arguments for the BertAttention class |
| rope | cirilla.LLM_pieces.RoPE | rotary positional embeddings class |

Signature:

RMSNorm → Wqkv → RoPE → FlashAttention3

Example:

import torch

from cirilla.Cirilla_model import benchmark_model_part
# BertAttentionArgs is assumed to be exported alongside BertAttention
from cirilla.LLM_pieces import BertAttention, BertAttentionArgs, RoPE

att = BertAttention(BertAttentionArgs(), RoPE(128, 512)).cuda().to(torch.bfloat16)
x = torch.randn(4, 512, 128*16, device='cuda', dtype=torch.bfloat16)
benchmark_model_part(att, x, "BertAttention")
# [BertAttention]
# Forward time:   1.85 ms
# Backward time:  4.03 ms
# Forward memory: 56.12 MB
# Backward memory:-36.12 MB

out = att(x)

RoPE

class cirilla.LLM_pieces.RoPE(head_dim, seq_len, device, theta, dtype)

rotary positional embeddings

| Argument | Type | Description |
| --- | --- | --- |
| head_dim | int | size of the head dimension |
| seq_len | int | (maximum) sequence length |
| device | Union[torch.device, str] | device to use |
| theta | float | theta for sin and cos |
| dtype | torch.dtype | dtype to use |

Example:

import torch

from cirilla.LLM_pieces import RoPE

rope = RoPE(128, 512)
xq = torch.randn(2, 512, 4, 128, device='cuda', dtype=torch.bfloat16) # (b, seq_len, head, head_dim)
xk = torch.randn(2, 512, 4, 128, device='cuda', dtype=torch.bfloat16)
xq_out, xk_out = rope.apply_rotary_embeddings(xq, xk)

print(xq.shape, xq_out.shape, xq_out.dtype, xq_out.device)
print(xk.shape, xk_out.shape, xk_out.dtype, xk_out.device)
# torch.Size([2, 512, 4, 128]) torch.Size([2, 512, 4, 128]) torch.bfloat16 cuda:0
# torch.Size([2, 512, 4, 128]) torch.Size([2, 512, 4, 128]) torch.bfloat16 cuda:0

SlidingWindowAttention

class cirilla.LLM_pieces.SlidingWindowAttention(args, rope, mask, score_mod)

sliding window attention with grouped query

| Argument | Type | Description |
| --- | --- | --- |
| args | AttentionArgs | arguments for the SlidingWindowAttention class |
| rope | cirilla.LLM_pieces.RoPE | rotary positional embeddings class |
| mask | Union[BlockMask, create_dynamic_block_mask] | attention mask or a function to create it |
| score_mod | Callable | a function to modify the attention scores, e.g. soft-capping |

Signature:

RMSNorm → Wqkv → RoPE → FlexAttention

Example:

import torch

from attn_gym.mods import generate_tanh_softcap

# AttentionArgs and sliding_window_causal (the mask_mod used below) are assumed
# to be exported from cirilla.LLM_pieces alongside the attention layer
from cirilla.LLM_pieces import (AttentionArgs, RoPE, SlidingWindowAttention,
                                create_dynamic_block_mask, create_static_block_mask,
                                sliding_window_causal)

SOFT_CAP = 20

x = torch.rand((1,2048,128*16), device='cuda', dtype=torch.bfloat16) # (b, seq, head_dim*h)

""" static mask """
static_mask = create_static_block_mask(sliding_window_causal, 2048, 2048)
softcap = generate_tanh_softcap(SOFT_CAP, approx=False)
# or you can do: softcap = None

rope = RoPE(128, 2048)

attention_layer = SlidingWindowAttention(AttentionArgs(),
                                        rope,
                                        mask=static_mask,
                                        score_mod=softcap).to('cuda', dtype=torch.bfloat16)
out = attention_layer(x)
print(out.shape) # torch.Size([1, 2048, 2048])

"""" dynamic mask - won't trigger recompilation """
dynamic_args = AttentionArgs(static_mask=False)
attention_layer = SlidingWindowAttention(dynamic_args,
                                        mask=create_dynamic_block_mask,
                                        rope=rope,
                                        score_mod=softcap).to('cuda', dtype=torch.bfloat16)
out = attention_layer(x)
print(out.shape) # torch.Size([1, 2048, 2048])

x = torch.rand((1,512,128*16), device='cuda', dtype=torch.bfloat16) # (b, seq, head_dim*h)

out = attention_layer(x)
print(out.shape) # torch.Size([1, 512, 2048])


x = torch.rand((1,256,128*16), device='cuda', dtype=torch.bfloat16) # (b, seq, head_dim*h)

out = attention_layer(x)
print(out.shape) # torch.Size([1, 256, 2048])

x = torch.rand((1,2048,128*16), device='cuda', dtype=torch.bfloat16) # (b, seq, head_dim*h)

out = attention_layer(x)
print(out.shape) # torch.Size([1, 2048, 2048])

print(create_dynamic_block_mask.cache_info()) # how many times the mask template was reused
# CacheInfo(hits=1, misses=3, maxsize=32, currsize=3)

Expert

class cirilla.LLM_pieces.Expert(args)

a single expert used in Sparse Mixture of Experts (SMoE)

| Argument | Type | Description |
| --- | --- | --- |
| args | ExpertArgs | arguments for the Expert class |

Signature:

SwiGLU (W1 → [SiLU(x[:dff]) ⊙ x[dff:]] → W2)
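A standalone sketch of the SwiGLU feed-forward described by the signature above; the dim and dff values are illustrative and the real ExpertArgs fields may be named differently:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """Minimal SwiGLU block: W1 -> [SiLU(x[:dff]) * x[dff:]] -> W2."""
    def __init__(self, dim: int = 128, dff: int = 512):
        super().__init__()
        self.dff = dff
        self.w1 = nn.Linear(dim, 2 * dff, bias=False)  # produces gate and value halves
        self.w2 = nn.Linear(dff, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.w1(x)
        gate, value = h[..., :self.dff], h[..., self.dff:]
        return self.w2(F.silu(gate) * value)

expert = SwiGLUExpert()
print(expert(torch.randn(4, 1024, 128)).shape)  # torch.Size([4, 1024, 128])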

SMoE

class cirilla.LLM_pieces.SMoE(args, experts)

pytorch implementation of SMoE

| Argument | Type | Description |
| --- | --- | --- |
| args | SMoEArgs | arguments for the SMoE class |
| experts | list[Expert] | list of experts to use |

Signature:

RMSNorm → Gating → Experts

Example:

import torch

# SMoEArgs and ExpertArgs are assumed to be exported from cirilla.LLM_pieces
from cirilla.LLM_pieces import Expert, ExpertArgs, SMoE, SMoEArgs

moe = SMoE(
    SMoEArgs(num_experts=4, k=2),
    [Expert(ExpertArgs()) for _ in range(4)]
).to("cuda") # HF kernels only work on CUDA

x = torch.randn(4, 1024, 128,
                device='cuda', requires_grad=True) # (b, seq_len, dim) ; requires grad for smoe

out = moe(x)

MegablockMoE

class cirilla.LLM_pieces.MegablockMoE(args)

megablocks implementation of MoE

| Argument | Type | Description |
| --- | --- | --- |
| args | MegablockArgs | arguments for the MegablockMoE class |

Example:

import torch

# MegablockArgs is assumed to be exported from cirilla.LLM_pieces
from cirilla.LLM_pieces import MegablockMoE, MegablockArgs

megamoe = MegablockMoE(MegablockArgs())
x = torch.randn(4, 1024, 128,
                device='cuda', requires_grad=True) # (b, seq_len, dim) ; requires grad for smoe

out = megamoe(x)

MegablockdMoE

class cirilla.LLM_pieces.MegablockdMoE(args)

megablocks implementation of dropless MoE (dMoE)

| Argument | Type | Description |
| --- | --- | --- |
| args | MegablockArgs | arguments for the MegablockdMoE class |

Example:

import torch

# MegablockArgs is assumed to be exported from cirilla.LLM_pieces
from cirilla.LLM_pieces import MegablockdMoE, MegablockArgs

megadmoe = MegablockdMoE(MegablockArgs())
x = torch.randn(4, 1024, 128,
                device='cuda', requires_grad=True) # (b, seq_len, dim) ; requires grad for smoe

out = megadmoe(x)

CirillaBERT

class cirilla.Cirilla_model.CirillaBERT(args)

implementation of ModernBERT with Cirilla LLM pieces

| Argument | Type | Description |
| --- | --- | --- |
| args | BertArgs | arguments for the CirillaBERT class |

Signature:

Input Embeddings → N×(BertAttention → MoE) → Mean Pooling / tokens / classes / (RMSNorm → Wout)

Example:

import torch.nn as nn

# BertArgs and TrainingArgs are assumed to be exported from cirilla.Cirilla_model
from cirilla.Cirilla_model import (CirillaBERT, BertArgs, CirillaTrainer,
                                   CirillaTokenizer, JSONLDataset, TrainingArgs)

model = CirillaBERT(BertArgs(output_what='classify'))

targs = TrainingArgs(hf_repo_id='AnthonyPa57/HF-torch-demo-R', local_checkpoint_folder='./test_model_bert')
trainer = CirillaTrainer(model, targs)

tokenizer = CirillaTokenizer(hub_url='AnthonyPa57/HF-torch-demo2')
dl = JSONLDataset(['./example_bert.jsonl', './example_bert.jsonl'],
                    shuffle_path=True, tokenizer=tokenizer, max_len=model.args.context_window)

from types import MethodType

def new_training_step(self, data):
    out = self.model.pred(data[0], data[1]) # tokens, mask
    loss = self.criterion(out, data[2])
    return loss

trainer.training_step = MethodType(new_training_step, trainer)
trainer.criterion = nn.CrossEntropyLoss()

trainer.train(dl, dl)
pred(x, attention_mask) → torch.Tensor

forward pass of the model, can return mean pooling / tokens / classes / llm output

| Argument | Type | Description |
| --- | --- | --- |
| x | torch.Tensor | tensor of shape (b, seq_len) |
| attention_mask | Optional[torch.Tensor] | attention mask, used for mean pooling |

Example:

vocab_size = 60_000
context_window = 2048
x = torch.randint(0, vocab_size,
    (4, context_window),
    dtype=torch.long)

mask = torch.zeros_like(x)
mask[x != 0] = 1

model = CirillaBERT(BertArgs(output_what='classify', n_classes=2))

print(model.pred(x, mask).shape) # torch.Size([4, 2])
pull_model_from_hub(hf_repo_id) → None

pull model from huggingface, the pulled model has to be compatible with CirillaBERT

| Argument | Type | Description |
| --- | --- | --- |
| hf_repo_id | str | huggingface repo id to pull the model from |

Example:

model = CirillaBERT(BertArgs())

model.pull_model_from_hub('AnthonyPa57/HF-torch-demo-R')

JSONLDataset

class cirilla.Cirilla_model.JSONLDataset(paths, shuffle_path, device, tokenizer, max_len, pad_token, eos_token, sos_token, user_token, bert_append_tokens, random_spelling_mistake_prob, random_missing_char_prob)

basic dataset for training Cirilla models with .jsonl files

| Argument | Type | Description |
| --- | --- | --- |
| paths | Union[Path, tuple[Path]] | path to a folder with .jsonl file(s) that contain training data |
| shuffle_path | bool | whether to shuffle the order of the .jsonl files (in place) |
| device | torch.device | what device to transfer data to |
| tokenizer | CirillaTokenizer | tokenizer to use |
| max_len | int | maximum length of the sequence |
| pad_token | str | padding token <pad> |
| eos_token | str | end of sequence token <eos> |
| sos_token | str | start of sequence token <sos> |
| user_token | str | user token <\|user\|> |
| bert_append_tokens | list[str] | tokens to append to the end of the sequence for BERT models, e.g. <cls> |
| random_spelling_mistake_prob | float | probability of adding a random spelling mistake |
| random_missing_char_prob | float | probability of removing a random character |

Example:

import time
import numpy as np
from torch.utils.data import DataLoader

from cirilla.Cirilla_model import CirillaTokenizer, JSONLDataset

tokenizer = CirillaTokenizer(hub_url='AnthonyPa57/HF-torch-demo2')

dl = JSONLDataset(['./example.jsonl', './example.jsonl'],
                    shuffle_path=True, tokenizer=tokenizer, max_len=32)
print(len(dl))
times = []
dl = DataLoader(dl, batch_size=2)
for _ in range(2):
    times.append(time.time())
    for i in dl:
        print('-'*50)
        print(i[0], i[1], sep='\n')
        times.append(time.time())
print(np.mean(np.diff(times)))
dataset = JSONLDataset(['./example.jsonl', './example.jsonl'],
                        random_missing_char_prob=0.01, random_spelling_mistake_prob=0.02)
for _ in range(4):
    print(dataset._apply_random_spelling_mistake('hello world, I am a sentence'))

# hwllo world, I am a sentence
# hello worls, I am a sentence
# helo world, I am a sentemce
# helloworld, I am a sentence

Cirilla

class cirilla.Cirilla_model.Cirilla(args)

implementation of modern transformer model with Cirilla LLM pieces

| Argument | Type | Description |
| --- | --- | --- |
| args | Args | arguments for the Cirilla class |

Signature:

Input Embeddings → N×(SlidingWindowAttention → MoE) → RMSNorm → Wout

Example:

# Args and TrainingArgs are assumed to be exported from cirilla.Cirilla_model
from cirilla.Cirilla_model import (Cirilla, Args, CirillaTrainer,
                                   CirillaTokenizer, JSONLDataset, TrainingArgs)

model = Cirilla(Args())

targs = TrainingArgs(hf_repo_id='AnthonyPa57/HF-torch-demo-R',
                        local_checkpoint_folder='./test_model')

trainer = CirillaTrainer(model, targs)

tokenizer = CirillaTokenizer(hub_url='AnthonyPa57/HF-torch-demo2')
dl = JSONLDataset(['./example.jsonl', './example.jsonl'],
                    shuffle_path=True, tokenizer=tokenizer,
                    max_len=model.args.context_window)

trainer.train(dl, dl)
pred(x) → torch.Tensor

forward pass of the model, can return llm output of shape (b, seq_len, vocab_size)

| Argument | Type | Description |
| --- | --- | --- |
| x | torch.Tensor | tensor of shape (b, seq_len) |

Example:

vocab_size = 60_000
context_window = 2048
x = torch.randint(0, vocab_size,
    (4, context_window),
    dtype=torch.long)

model = Cirilla(Args())

print(model.pred(x).shape) # torch.Size([4, context_window, vocab_size])
pull_model_from_hub(hf_repo_id) → None

pull model from huggingface, the pulled model has to be compatible with Cirilla

| Argument | Type | Description |
| --- | --- | --- |
| hf_repo_id | str | huggingface repo id to pull the model from |

Example:

model = Cirilla(Args())

model.pull_model_from_hub('AnthonyPa57/HF-torch-demo-R')

benchmark_model_part

func cirilla.Cirilla_model.benchmark_model_part(model, x, label) → None

benchmark a part of the model

| Argument | Type | Description |
| --- | --- | --- |
| model | Callable | model or a piece of the model to benchmark |
| x | torch.Tensor | input to the model or piece of the model |
| label | str | label for the benchmark |

Example:

att = BertAttention(BertAttentionArgs(), RoPE(128, 512)).cuda().to(torch.bfloat16)
x = torch.randn(4, 512, 128*16, device='cuda', dtype=torch.bfloat16)
benchmark_model_part(att, x, "BertAttention")

# [BertAttention]
# Forward time:   1.85 ms
# Backward time:  4.03 ms
# Forward memory: 56.12 MB
# Backward memory:-36.12 MB

CirillaTokenizer

class cirilla.Cirilla_model.CirillaTokenizer(path, hub_url)

tokenizer for the Cirilla models, it retains the functionality of Huggingface tokenizers

| Argument | Type | Description |
| --- | --- | --- |
| path | Path | local path to a tokenizer file |
| hub_url | str | huggingface repo id to pull the tokenizer from |

Example:

from cirilla.Cirilla_model import CirillaTokenizer

tokenizer = CirillaTokenizer(hub_url='AnthonyPa57/HF-torch-demo2')

chat = [
    {'role': 'system', 'content': 'What is the capital of France?'},
    {'role': 'user', 'content': 'What is the capital of France?'},
]

print(tokenizer.decode(tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=False)))
print(tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=False))
train(dataset, special_tokens, save_to_path, **kwargs) → self.tokenizer

train the tokenizer on an iterator dataset

| Argument | Type | Description |
| --- | --- | --- |
| dataset | Union[Iterator[str], Iterator[Iterator[str]]] | dataset to train on |
| special_tokens | dict[str, str] | special tokens to add to the tokenizer |
| save_to_path | Path | local path to save the tokenizer to |
| **kwargs | Any | **kwargs for tokenizer training |

Example:

from cirilla.Cirilla_model import CirillaTokenizer, JSONLDataset

dl = JSONLDataset('./example.jsonl', shuffle_path=True)

tokenizer = CirillaTokenizer()
# SPECIAL_TOKENS is a user-defined dict of special tokens, e.g. {"pad_token": "<pad>", ...}
tokenizer.train(dl, special_tokens=SPECIAL_TOKENS, min_frequency=2)

tokenizer.push_to_hub('AnthonyPa57/HF-torch-demo2')

print(tokenizer.decode(tokenizer.encode('hello world')))
print(tokenizer.encode('What is the capital of France?'))
print(tokenizer.decode(tokenizer.encode('What is the capital of France?')))

CirillaTrainer

class cirilla.Cirilla_model.CirillaTrainer(model, training_args)

trainer for the Cirilla models

| Argument | Type | Description |
| --- | --- | --- |
| model | torch.nn.Module | model to train |
| training_args | TrainingArgs | arguments for the trainer |

Example:

model = Cirilla(Args())

targs = TrainingArgs(hf_repo_id='AnthonyPa57/HF-torch-demo-R', local_checkpoint_folder='./test_model')
trainer = CirillaTrainer(model, targs)

tokenizer = CirillaTokenizer(hub_url='AnthonyPa57/HF-torch-demo2')
dl = JSONLDataset(['./example.jsonl', './example.jsonl'], shuffle_path=True,
                    tokenizer=tokenizer, max_len=model.args.context_window)

trainer.train(dl, dl)
train(dataset, valid_dataset) → None

train the model on a dataset

| Argument | Type | Description |
| --- | --- | --- |
| dataset | JSONLDataset | dataset to train on |
| valid_dataset | JSONLDataset | dataset to validate on |

Example:

model = CirillaBERT(BertArgs(output_what='classify'))

targs = TrainingArgs(hf_repo_id='AnthonyPa57/HF-torch-demo-R',
                        local_checkpoint_folder='./test_model_bert')
trainer = CirillaTrainer(model, targs)

tokenizer = CirillaTokenizer(hub_url='AnthonyPa57/HF-torch-demo2')
dl = JSONLDataset(['./example_bert.jsonl', './example_bert.jsonl'],
                    shuffle_path=True, tokenizer=tokenizer, max_len=model.args.context_window)

from types import MethodType

def new_training_step(self, data):
    out = self.model.pred(data[0], data[1]) # tokens, mask
    loss = self.criterion(out, data[2])
    return loss

trainer.training_step = MethodType(new_training_step, trainer)
trainer.criterion = nn.CrossEntropyLoss()

trainer.train(dl, dl)
benchmark() → None

benchmark the model on randomly generated data with a batch size of 4

Example:

model = Cirilla(Args())
trainer = CirillaTrainer(model, TrainingArgs(hf_repo_id='AnthonyPa57/HF-torch-demo-R',
                                             local_checkpoint_folder='./test_model'))

trainer.benchmark()
# average time for epoch: 0.8215 (seconds)

scrape_fandom

func fandom_scraper.scrape_fandom(in_path, out_path, instruct_path, n_workers, wiki, lang) → None

scrape a given fandom wiki. Uses Hugging Face's SpanMarker model (Named Entity Recognition) to search for new pages to scrape

fandom_scraper is a standalone package

| Argument | Type | Description |
| --- | --- | --- |
| in_path | Path | path to a folder with .json files containing lists with keywords to search for first (so-called seeds) |
| out_path | Path | path to save the scraped texts to |
| instruct_path | Path | path to save the scraped instructions to |
| n_workers | int | how many async workers to use to fetch the fandom pages |
| wiki | str | what fandom wiki to scrape |
| lang | str | what language to use for the fandom |

Example:

in_path = "./witcher_json"
out_path = "./async_fandom"
instruct_path = "./async_fandom_instruct"

scrape_fandom(in_path,
            out_path,
            instruct_path,
            n_workers = 50,
            wiki = "Witcher",
            lang = "en"
            )

Example input file: ./witcher_json/witcher_1.json

[
    "Geralt of Rivia", "Triss Merigold", "Vesemir", "Leo", "Lambert", 
    "Eskel", "Alvin", "Shani", "Zoltan Chivay", "Dandelion (Jaskier)", 
    "King Foltest", "Adda the White", ...
]

Example created output file: ./async_fandom/Geralt of Rivia.txt

Geralt of Rivia
Sub-Pages:
Main
Biography
Geralt was born as the son of the sorceress Visenna and presumably, the warrior Korin. Shortly after his
birth, his mother left him with the School of the Wolf at the stronghold of Kaer Morhen. There, Geralt
was made and trained to become a Witcher.
As a child, he ...

Example created output file: ./async_fandom_instruct/Geralt of Rivia.json

{
    "Who is Geralt of Rivia?": "Geralt of Rivia is a witcher and the protagonist from The Witcher ...",
    "What is Geralt's nickname?": "Geralt has a number of nicknames, with the more well known ones ...",
    "Who trained Geralt to become a Witcher?": "Geralt was trained by his mentor Vesemir, who he ...",
    "What is Geralt's profession?": "Geralt is a monster slayer for hire. He travels the world on ...",
    "What is the Trial of The Grasses?": "The Trial of The Grasses was a painful process that ..."
}
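The NER-driven page discovery mentioned above could, in spirit, look like the sketch below. It assumes the span-marker package and an off-the-shelf checkpoint; the model name is illustrative, not necessarily the one fandom_scraper uses:

from span_marker import SpanMarkerModel

# hypothetical checkpoint; any SpanMarker NER model would do for the sketch
ner = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-fewnerd-fine-super")

scraped_text = (
    "Geralt was born as the son of the sorceress Visenna and presumably, "
    "the warrior Korin. His mother left him at the stronghold of Kaer Morhen."
)

# each prediction contains a "span" (the entity text) and a "label"
entities = ner.predict(scraped_text)
new_seeds = {e["span"] for e in entities}
print(new_seeds)  # candidate page names to scrape next, e.g. Visenna, Korin, Kaer Morhen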

instructions_into_conv

func fandom_scraper.instructions_into_conv(input_path, out_path) → None

convert scraped fandom instructions into a .jsonl file

| Argument | Type | Description |
| --- | --- | --- |
| input_path | Path | path to a folder with scraped instructions |
| out_path | Path | path to save the .jsonl file to |

Example:

from fandom_scraper import instructions_into_conv

instructions_into_conv(input_path="./async_fandom_instruct",
                       out_path="./async_fandom_instruct_gathered.jsonl")

Example input file: ./async_fandom_instruct/Geralt of Rivia.json

{
  "Who is Geralt of Rivia?": "Geralt of Rivia is a witcher and the protagonist from The Witcher ...",
  "What is Geralt's nickname?": "Geralt has a number of nicknames, with the more well known ones ...",
  "Who trained Geralt to become a Witcher?": "Geralt was trained by his mentor Vesemir, who he ...",
  "What is Geralt's profession?": "Geralt is a monster slayer for hire. He travels the world on ...",
  "What is the Trial of The Grasses?": "The Trial of The Grasses was a painful process that ..."
}

Example created output file (obviously the actual .jsonl file is saved in single lines): ./async_fandom_instruct_gathered.jsonl

{"subject": "Geralt of Rivia", "text": [
    {"role": "user", "content": "Who is Geralt of Rivia?"},
    {"role": "assistant", "content": "Geralt of Rivia is a witcher and the protagonist from ..."}
    ], "data type": "conv", "source": "fandom"}
{"subject": "Geralt of Rivia", "text": [
    {"role": "user", "content": "What is Geralt's nickname?"},
    {"role": "assistant", "content": "Geralt has a number of nicknames, with the more well known ones ..."}
    ], "data type": "conv", "source": "fandom"}
...